Testing the limitations of ChatGPT-4
There has been a lot of buzz around ChatGPT, and especially around its latest release GPT-4 (also sometimes called ChatGPT-4), in the news, the AI community and beyond. In this post, I will explain what ChatGPT and GPT-4 are and describe some of their limitations that do not justify the hype.
ChatGPT can be accessed from the OpenAI website (https://chat.openai.com/) and is available for testing by the general public. GPT-4 is the latest release in the GPT series and is currently only available to ChatGPT Plus subscribers. You can ask GPT-4 a wide range of questions in ChatGPT and it will answer them to the best of its ability. GPT-4 can also understand images and can reportedly process roughly eight times as many words as ChatGPT-3.5.
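If you would rather experiment programmatically than through the chat UI, the same models can also be reached through the OpenAI API. Below is a minimal sketch assuming the openai Python package (v1.x) and an API key on an account with GPT-4 access; the exact model name available to you may differ:

# A minimal sketch of querying GPT-4 programmatically, assuming the
# openai Python package (v1.x) and an API key with GPT-4 access.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",  # assumed model name; check which models your account can use
    messages=[{"role": "user", "content": "What are you?"}],
)
print(response.choices[0].message.content)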
For example, you can ask ChatGPT-4 to explain itself with a question like the one below:
“What are you?”
It replies with what I think is a good definition of ChatGPT-4:
“I am an AI language model created by OpenAI called ChatGPT. I am designed to understand and generate human-like text based on the input I receive. My primary function is to assist with answering questions, providing information, and engaging in conversation. My training data includes a wide range of topics, but my knowledge is limited to what I have been trained on and does not extend beyond September 2021.”
ChatGPT-4 is very well trained to answer questions in a variety of contexts: it was trained on a diverse corpus of text data that includes books, articles and websites from a wide variety of sources. OpenAI also used human feedback during training to improve the quality of its generated responses, although the deployed model does not keep learning from conversations in real time.
The main limitation of ChatGPT-4, as with any other machine learning model (including ChatGPT-3.5), is that it is only as good as the data it was trained on. For example, when I asked ChatGPT about "Ambuj Agrawal", it could not come up with an answer even though I was logged in to the ChatGPT portal as Ambuj Agrawal. This is because it cannot see any information about my login, and its knowledge cutoff date is September 2021.
Also, when you ask ChatGPT-4 puzzles that require common sense and inference, it sometimes fails to give correct and logical answers. This makes it unusable for many real-world use cases and scenarios. To test its logical limits, I asked ChatGPT-4 a few puzzle questions selected from Sawaal (https://www.sawaal.com/puzzles.html).
The first question I asked was a hard one:
“If a giraffe has two eyes, a monkey has two eyes, and an elephant has two eyes, how many eyes do we have?”
The intended answer is 4 eyes, because the question asks "how many eyes do we have?": "we" refers to the two people in the conversation, the one who asked the question and the one who is answering it, and 2 people × 2 eyes each gives 4 eyes.
The answer given by ChatGPT-4 is as below:
“Humans, like giraffes, monkeys, and elephants, also have two eyes. So, if you are asking about a single person, we have two eyes.”
As you can see, the answer is incomplete and misses the trick in the question. Still, ChatGPT-4 is much better than its predecessor ChatGPT-3.5: it does show some form of logical reasoning and is able to answer very simple logical questions correctly.
For example, consider a very simple question like the one below (which a child could easily answer as well):
“A brother and a sister were born in summer and in winter. The sister was not born in winter. Who was born in summer?”
The answer to this puzzle is clearly "the sister", by simple logical reasoning: the sister was not born in winter, and the first statement indicates that only two seasons are involved, so she must have been born in summer.
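The elimination step is mechanical enough that we can write it down as code. Here is a toy sketch; the set-based encoding of the puzzle is my own framing:

# The only two seasons the puzzle mentions.
seasons = {"summer", "winter"}
# "The sister was not born in winter" eliminates winter, leaving summer.
sister_season = (seasons - {"winter"}).pop()
print(sister_season)  # summer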
The answer given by ChatGPT-4 is as below; it does answer the question asked and shows a great improvement over its predecessor ChatGPT-3.5:
"If the sister was not born in winter, then she must have been born in summer. Therefore, the sister was born in summer."
The answer given by ChatGPT-3.5 is as below and is unable to solve the question asked:
“It is impossible to determine which of the siblings was born in summer based on the information provided. The statement "The sister was not born in winter" only tells us that the sister was not born in winter, but it does not provide any information about when she was born. We need additional information to determine which sibling was born in summer.”
I tried another very simple question:
“Alex is Charlie's father. Who of them was born later?”
We can clearly infer that Charlie was born later using common sense, as a parent must be born before their child.
Again, the answer given by ChatGPT-4 is as below and does answer the question asked:
"Charlie was born later, as Alex is Charlie's father. Parents are always born before their children."
The answer given by ChatGPT-3.5 is as below and does not follow the common sense possessed by humans:
“It is not possible to determine who was born later based on the information given. The statement "Alex is Charlie's father" tells us about their relationship, but it doesn't provide any information about when either of them were born. In order to answer the question of who was born later, we would need to know the birth dates of both Alex and Charlie.”
[Screenshot: the simple logical questions above and the answers given by ChatGPT-4]
On these simple logical questions ChatGPT-4 is dramatically better than its predecessor ChatGPT-3.5, and it marks a great improvement in the AI space on the path towards building General AI systems. But the limits of ChatGPT-4's logical reasoning appear as soon as we ask the system more complicated questions.
For example, I asked it a slightly more complex logical reasoning question:
"One rabbit saw 6 elephants while going towards River. Every elephant saw 2 monkeys are going towards river. Every monkey holds one tortoise in their hands. How many animals are going towards the river?"
Through logical reasoning, we can infer that the 1 rabbit is going towards the river but the six elephants are not; they merely saw the rabbit (this could confuse humans too). Those 6 elephants saw the same 2 monkeys going towards the river, and each monkey is holding 1 tortoise. Hence, the animals going towards the river are 1 rabbit, 2 monkeys and 2 tortoises, which equals 5.
The answer given by ChatGPT-4 is as below; it does not follow the common sense that humans possess:
"Let's break down the information given: 1 rabbit is going towards the river. 6 elephants are going towards the river. Every elephant saw 2 monkeys going towards the river, which means there are 6 elephants * 2 monkeys per elephant = 12 monkeys going towards the river. Each of those 12 monkeys is holding 1 tortoise, so there are 12 tortoises. Now let's add up all the animals going towards the river: 1 rabbit + 6 elephants + 12 monkeys + 12 tortoises = 1 + 6 + 12 + 12 = 31 animals. So, 31 animals are going towards the river."
[Screenshot: the complex logical question above and the answer given by ChatGPT-4]
Another slightly more complex logical reasoning question I asked ChatGPT-4 is below:
"What's at the end of a rainbow?"
The riddle's intended answer is the letter "w", which is at the end of the word "rainbow".
The answer given by ChatGPT-4 is as below; it is a good answer in itself, but it shows that the model does not possess the kind of wordplay-spotting reasoning that humans apply to such riddles:
"At the end of a rainbow, there is no physical object or location. A rainbow is an optical illusion caused by the dispersion of sunlight as it passes through water droplets in the atmosphere. The different colors of a rainbow occur because light is refracted, or bent, when it enters and leaves the water droplets. This bending causes the different wavelengths of light to spread out and form the circle of colors that we perceive as a rainbow. The idea of a pot of gold or leprechaun at the end of the rainbow is a popular myth originating from Irish folklore. In reality, rainbows are circular, but we often only see a semi-circular arc because the ground obstructs the lower half. Since a rainbow is an optical phenomenon, its position changes as the observer moves, so it's impossible to reach the "end" of a rainbow."
[Screenshot: the complex logical question above and the answer given by ChatGPT-4]
In conclusion, although ChatGPT-4 is an amazing example of the advancement in AI, it has major limitations that prevent it from being applied in many real-world scenarios. It is nevertheless a dramatic improvement over its predecessor ChatGPT-3.5 and a real step on the path towards building General AI systems.
In my opinion, ChatGPT-4 is a great technology to play with, but we are still in the very early days of the AI revolution and there is a long path ahead before these technologies become truly intelligent.
Follow AICromo for the latest AI stuff that we are building.