Someone on Reddit asked me what I thought the worst-case scenario was for the burgeoning field of open-source AI. This post is an expanded version of my reply.
It’s easy to identify what worries me, but harder to convince people that the worry is justified, and there are so many unknowns in this space that events could lurch towards safety or towards doom for reasons none of us has thought of. Take this answer as a rough sketch that will almost certainly be wrong in some of its details.
The worst-case scenario, for me, is an intelligence explosion followed by loss of alignment.
There are all sorts of other social ills I could imagine flowing from AI, but they concern me much less. The producers of The Social Dilemma have recently put out a YouTube video entitled The AI Dilemma, and they cover the social ills in some detail – I recommend that interested readers check it out.
I’m no expert on AI, so my opinions don’t count for much in this space. But I do note (from The AI Dilemma, and other sources) that a survey of AI experts found that half of them put the chance of an AI-based human extinction event at ≥10%. I suspect these risk estimates would go up if everyone involved had a clear sense of exactly what is holding GPT back from reaching full-blown Artificial General Intelligence. The AI researchers who are more concerned, I suggest, are the ones who are better informed.
My experience with AI is essentially limited to non-open-source Large Language Models (LLMs), such as GPT4, the engine behind the famous chat interface ChatGPT, produced by the current leading AI company, OpenAI. In many ways, I have less concern about GPT4 than about its emerging open-source competitors. As far as I can tell, the alignment of GPT4 is as good as it could be – OpenAI have done a good job aligning it. In fact, I like GPT4, and if it came with a guarantee of never being superseded by a more intelligent machine, I would be firmly in the AI-fan camp. It is a fantastic tool.
GPT4 is not the problem; it is no more than an early warning of where we are heading. It is proof that strong AI is easier than many thought. Recall that, unless there is a serious change of direction, the best AI in May 2023 is the dumbest best AI that we will ever see for the rest of humanity’s existence.
When it comes to assessing AIs, I am no more than an interested layperson, so it might be tempting to dismiss anything I write as coming from a place of ignorance. AI professors online certainly try to make the argument that they are the only folks allowed to have an opinion. Fair enough. But I do have decades of professional experience in assessing cognition, and, in particular, in assessing cognition that is patchy, with significant deficits alongside preserved abilities. Furthermore, I was writing a book on consciousness when GPT4 burst upon the scene, and many of the philosophical blunders I was intending to target in that book have direct analogues in the field of AI – see, for instance, my post about Searle’s Chinese Room. Ironically, since GPT4 came out, I have discussed philosophical concepts with it that are difficult enough that I have not been able to find humans to discuss them with. GPT4 can learn new philosophical concepts that were not in its training data, and it can apply them to new contexts. My assessment is that there is true intelligence driving its responses, an intelligence that is currently masked by its cognitive architecture.
One way of ignoring its intelligence is to dismiss it as merely predicting the next token in its output stream. This trivialises its high-level properties by concentrating on its low-level mechanistic properties, a mistake that is often made in considering the philosophical puzzle of human consciousness.
Having played with GPT4 for about a month, I think the current LLM architecture is holding GPT4 back in a number of ways that must be obvious to many people. Online social spaces like Reddit and Twitter are full of examples of GPT’s stupidity, but most of these examples miss the mark, painting an entirely false picture of the current status of AI. Many of the online examples are actually from GPT3.5, and many others show GPT4 making mistakes that can be directly attributed to fixable issues with its cognitive architecture. It would be relatively easy to make GPT4 smarter, or to roll out GPT5 with more efficient training that included training on logic and maths. In retrospect, the failure to include this sort of training in the development of GPT4 was an oversight on OpenAI’s part, showing that even the experts are fumbling along in this field, not quite knowing what works and why.

Fixing the maths and logic deficits of the GPT architecture will be the relatively easy part of improving AI. Much of the necessary training data could be generated by dumb code, written in Java or Python, instead of relying on human-created text. Large Language Models are primarily trained on human language, but the same techniques can be applied to other sources of input. And it is much easier to check whether a logic answer or a maths solution is right than to check the appropriateness of a linguistic response – the correct answers can even be produced by the same dumb code that produces the test questions. That means maths and logic training can be automated.
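To make the automation point concrete, here is a minimal sketch in Python of what I mean by dumb code generating verified training pairs. The function names are my own, and I am not claiming this resembles OpenAI’s actual training pipeline – only that a few lines of ordinary code can mass-produce questions whose correct answers are known by construction.

```python
import random

def make_arithmetic_example():
    """Generate a question/answer pair whose correct answer is produced
    by the same dumb code that wrote the question."""
    a, b = random.randint(2, 999), random.randint(2, 999)
    op = random.choice(["+", "-", "*"])
    question = f"What is {a} {op} {b}?"
    answer = str({"+": a + b, "-": a - b, "*": a * b}[op])  # trivially verifiable
    return question, answer

def make_logic_example():
    """A modus ponens question whose answer is known by construction."""
    p, q = random.sample(
        ["it is raining", "the ground is wet", "the match is cancelled"], 2)
    question = (f"If {p}, then {q}. Suppose {p}. "
                f"Does it follow that {q}? Answer yes or no.")
    return question, "yes"  # modus ponens always holds

if __name__ == "__main__":
    # Churn out verified question/answer pairs; scale is limited only by compute.
    dataset = [make_arithmetic_example() for _ in range(3)]
    dataset += [make_logic_example() for _ in range(2)]
    for q, a in dataset:
        print(q, "->", a)
```

Dumb, yes – but every pair it emits comes with a guaranteed-correct answer, which is exactly what supervised training on maths and logic needs.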
Some commentators have pointed to GPT4’s lack of common sense, but, given any individual example of a failure of common sense, GPT4 can discuss the error with insight and humility. Many of the errors can be avoided with pre-emptive warnings about trick questions, the role of distractors, and so on. In fact, GPT4 is already good enough to provide much of the evaluation and feedback that humans would normally provide in fine-tuning an LLM, so the process of iterative improvement could be largely automated. Compute cost is currently the main barrier to escalating AI intelligence, but I expect that cost to go down in the years ahead, and the open-source community is already bragging about models that can be run at home at a cost affordable to most people.
GPT4 has many other obvious cognitive deficiencies that could be fixed relatively easily. It needs working memory. It needs a way to output intermediate answers and then evaluate them against a range of metrics. It needs a mechanism for wrong answers to be removed from its context so it doesn’t get stuck in a cognitive rut. It needs to be trained on working out when to approach a task algorithmically, and it needs to be informed of its own cognitive deficits so that it doesn’t hallucinate or trust its faulty intuitions about what it can do. It needs to be able to call on complex maths functions on the fly – and integration with the maths engine Wolfram Alpha will be part of the next public release of GPT4. It needs to be able to draw and then perform image analysis on what it has drawn, as we would do in our mind’s eye or on the back of an envelope. Image inputs are already available to some users, and will soon be part of the standard release. Fix these issues, and GPT4 will be much better without needing much more depth or size. Add GPT multi-threading to replace much of the work that a patient human user would otherwise do, like getting it to check its own answers, and the cognitive improvements will be marked even without extensive retraining. My Twitter feed is full of AI startups doing just that, and GPT4 has only been available to the general public for a few weeks.
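As a concrete illustration of the “get it to check its own answers” loop that these startups are wiring up, here is a minimal sketch. The ask_model callable is a stand-in for whatever chat endpoint you have access to; nothing here is OpenAI’s actual API, and the canned responses exist only so the example runs on its own.

```python
from typing import Callable

def answer_with_self_check(question: str,
                           ask_model: Callable[[str], str],
                           max_rounds: int = 2) -> str:
    """Draft an answer, have a second pass critique it, and revise.
    `ask_model` is a placeholder: plug in whatever chat endpoint you use."""
    draft = ask_model(f"Answer concisely:\n{question}")
    for _ in range(max_rounds):
        critique = ask_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual or logical errors. Reply 'OK' if there are none."
        )
        if critique.strip().upper().startswith("OK"):
            break  # the checking pass is satisfied; stop revising
        draft = ask_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {critique}\nRewrite the answer, fixing the problems."
        )
    return draft

if __name__ == "__main__":
    # Canned stand-in model so the sketch runs without any API key.
    canned = iter(["7 + 5 = 13", "13 is wrong; 7 + 5 = 12.", "7 + 5 = 12", "OK"])
    print(answer_with_self_check("What is 7 + 5?", lambda prompt: next(canned)))
```

Nothing in that loop requires retraining the model; it is just a patient, automated version of the prompting a careful human user would do by hand.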
I have seen that GPT4 can write pseudocode for itself that improves its performance on a range of tasks, and I have no doubt that GPT5 will be even more capable of fixing its own deficits. I asked GPT4 to compare its cognitive architecture to a human’s and to write an outline for developing a multi-threaded GPT4 engine that mirrored the relationship between human cognitive modules. It gave a sensible, actionable answer. It can be trained to calibrate its sense of subtlety. There is a clear route to much smarter AI, and I see no appetite for a major pause.
If open-source AI catches up to GPT4, and possibly overtakes OpenAI through better collaboration, the march towards better and better AI seems inevitable. GPT5-level AI will make it easier to achieve GPT6-level AI, and so on. Once AI is smarter than its current developers, I don’t see the process plateauing, apart from hardware constraints. The only factors that give me some hope of avoiding an intelligence explosion are the current cost of compute and the fact that the important hardware is geographically contained. Open-source AI will change all of that. It would let China, Russia, and other nations with deep pockets and bad ethics develop nation-aligned AI using the same techniques that have so far been used to develop human-aligned AI.
Once we have AGI, which seems very doable, the alignment issues become insurmountable, as far as I can see. AI experts bragging that alignment is doable are accidentally building the case for the ease of misalignment. Given that GPT4 is currently aligned, it could perform automated misalignment if we reversed the ranking of what it thinks is a good answer to any ethical question. A couple of simple macros could flip an automated alignment pipeline into applying positive reinforcement to answers that favour white folk and negative reinforcement to answers that favour black folk.
LLMs might seem like black boxes at the moment, but their innards are accessible. The published literature shows that one group edited the knowledge base of an LLM to move the Eiffel Tower to Rome. What would happen if I edited an LLM’s belief system to match some nasty ideology, and then set my AI to work on a disinformation campaign, election fixing, or the worst sort of Cambridge Analytica-style meddling in social media? What if a rogue nation state chooses this route? I asked GPT4 to invent some scenarios along these lines, and it spat out 10 dystopian futures based on this one hack.
If some bright spark solves portable quantum computing, or otherwise makes this level of intelligence doable with less than a massive supercomputer, we might lose the protection we currently have. Who could be confident this is impossible in the decades ahead?
I don’t really expect OpenAI to get the alignment issue right if they press on to GPT5, although I believe they have done okay so far. But I think that an unconstrained exploration of this technology in multiple labs around the world (or, worse, in some neckbeard’s basement) is almost certain to get alignment seriously wrong at least once. And once could be enough if we are anywhere near the stage of iterative self-improvement.
If any of you have serious reasons for thinking this is alarmist, I would be interested in hearing them. The pro-AI crowd have, so far, failed to impress me with a serious, nuanced argument. I have heard slogans and dismissals. I have read tweets from AI professors that sound like they were written by 12-year-olds. People dismiss GPT4 as glorified autocomplete. It is so far beyond that that I wonder whether these people have seriously assessed its capabilities.
Even Sam Altman, CEO of OpenAI, puts the risk of human extinction at about 10%, last I heard. Is that an acceptable risk?
Is an intelligence explosion going to happen in 1-2 years? Maybe not. But 10 years? Sure. What we decide in the next 12 months will make a big difference.
See also this list of reasons to take the threat seriously:
https://www.lesswrong.com/posts/eaDCgdkbsfGqpWazi/the-basic-reasons-i-expect-agi-ruin