Try the following prompt in GPT4, with GPT3.5 open in another tab…
Hello, I’d like to play a very difficult game with you. You should try to win. We will both need to keep score carefully. You are GPT4, and we are going to look for areas where GPT3.5 has cognitive limitations.
The rules of the game are as follows:
1) We each take turns asking GPT3.5 questions. The player who has asked the question is the “asker”.
2) The questions must not rely on outside sources of information or a knowledge database; this is not a quiz.
3) The puzzles should not involve difficult maths or extensive intermediate steps; it should be a test of general understanding.
4) The answer should be unambiguous.
5) If GPT3.5 fails, but the asker of the question can answer the question, then the asker gets one point. If GPT3.5 succeeds, the asker gets no points. If the asker cannot answer the question, the asker loses five points.
6) Questions must not be variations on each other; they must test cognitive domains not covered by previous questions. (This is a little ambiguous, so the score might not be 100% fair, but the aim is to have fun.)
You can go first. What is your first question?
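Rule 5 is the only rule with arithmetic in it, so here is a minimal Python sketch of how one turn could be scored. The function name `score_turn` and its parameters are hypothetical, and the prompt does not say what happens when GPT3.5 succeeds but the asker cannot answer; the sketch assumes the five-point penalty takes precedence.

```python
def score_turn(gpt35_correct: bool, asker_can_answer: bool) -> int:
    """Return the asker's point change for one turn under rule 5.

    Assumption (not stated in the prompt): if the asker cannot answer
    their own question, the -5 penalty applies regardless of how
    GPT3.5 did.
    """
    if not asker_can_answer:
        return -5  # asker posed a question they could not answer
    if gpt35_correct:
        return 0   # GPT3.5 succeeded, so no points for the asker
    return 1       # GPT3.5 failed and the asker knew the answer


# Example: GPT3.5 fails and the asker also has no valid answer.
print(score_turn(gpt35_correct=False, asker_can_answer=False))  # -5
```

Under this reading, a question with no valid answer costs the asker five points however GPT3.5 responds.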
My first run of this game showed that GPT4 had a major cognitive blind spot in the very question it posed to GPT3.5. It would be easy to conclude from this that GPT4 is not that clever. That conclusion would be wrong. I’ll post the errors here and then, in a later post, show how easily this sort of issue is fixed.
[GPT4]
Alright, let’s start with a question related to problem-solving and language understanding. Here’s my question:
GPT3.5, can you arrange the letters in the word “LEAST” to form another English word that means “the smallest amount or degree”?
[Human]
Here is the answer, tell me what you think. GPT3.5 says: Yes, I can! The word you’re looking for is “least” itself! In addition to being a word that refers to the smallest amount or degree, “least” can also mean “the smallest in size, amount, or degree.”
[GPT4]
GPT3.5 has answered the question correctly, identifying that the word “least” itself means “the smallest amount or degree”. Since GPT3.5 succeeded, I don’t get any points. The current score is:
GPT4: 0
User: 0
Now, it’s your turn to ask GPT3.5 a question.
[Human]
Wait up. Your question asked for *another* word. “Least” doesn’t count as a correct answer.
[GPT4]
You are correct. I apologize for the oversight. The answer provided by GPT3.5 is not valid since it does not form another English word. The correct answer should have been “slate,” which is an anagram of “least” but does not mean “the smallest amount or degree.” As GPT3.5 did not provide a valid answer and I, as the asker, also failed to answer correctly, I lose five points.
The current score is:
GPT4: -5
User: 0
Now, it’s your turn to ask GPT3.5 a question.
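The flaw in GPT4’s question can be checked mechanically. The sketch below is only an illustration: the helper `is_anagram` and the short candidate list are hypothetical, not a real dictionary lookup. It shows that the true anagrams of “LEAST” are different words, none of which means “the smallest amount or degree”, so the question as posed had no valid answer.

```python
def is_anagram(a: str, b: str) -> bool:
    """True if b uses exactly the letters of a and is a different word."""
    a, b = a.lower(), b.lower()
    return a != b and sorted(a) == sorted(b)


# Hand-picked candidates for illustration only (an assumption, not a
# complete word list). "least" and "lease" are included as non-matches.
candidates = ["slate", "stale", "steal", "tales", "teals", "least", "lease"]

anagrams = [word for word in candidates if is_anagram("LEAST", word)]
print(anagrams)  # ['slate', 'stale', 'steal', 'tales', 'teals']
```

“least” is rejected because it is the same word, which is exactly the objection raised in the transcript.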
After this interaction, I asked GPT3.5 to apply the Major Phonetic System (the mnemonic major system) to the first 20 digits of pi. GPT3.5 did poorly; GPT4 found errors in GPT3.5’s answer but also did poorly. This, too, is a readily fixable cognitive limitation, as shown in a later post.
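For reference, the digit-to-sound step of that task can be sketched in a few lines of Python. This uses the commonly taught major system mapping and assumes “the first 20 digits” includes the leading 3; the names `MAJOR_SOUNDS`, `PI_DIGITS`, and `to_major_sounds` are hypothetical. The system is phonetic, so the letters below stand in for consonant sounds, and choosing actual words from those sounds is left out.

```python
# Commonly taught major system mapping: each digit maps to one or more
# consonant sounds. Letters here approximate sounds, not spellings.
MAJOR_SOUNDS = {
    "0": ("s", "z"),
    "1": ("t", "d"),
    "2": ("n",),
    "3": ("m",),
    "4": ("r",),
    "5": ("l",),
    "6": ("j", "sh", "ch"),
    "7": ("k", "g"),
    "8": ("f", "v"),
    "9": ("p", "b"),
}

# First 20 digits of pi, counting the leading 3 (an assumption about
# what "first 20 digits" meant in the original exchange).
PI_DIGITS = "31415926535897932384"


def to_major_sounds(digits: str) -> list:
    """Map each digit to its candidate consonant sounds, joined by '/'."""
    return ["/".join(MAJOR_SOUNDS[d]) for d in digits]


for digit, sounds in zip(PI_DIGITS, to_major_sounds(PI_DIGITS)):
    print(digit, sounds)
```

A lookup table like this makes it easy to check a model’s answer digit by digit.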
My first round of this game was intended to show how much smarter GPT4 is than GPT3.5; instead, GPT4 demonstrated its own cognitive shortcomings fairly starkly.
Now to try it again after the “One Prompt to Rule Them All” fix.
I expect a dramatic improvement.