GPT-4.5 Passes Three-Way Turing Test
Researchers ran a three-way Turing test on four AI systems: ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5. GPT-4.5 scored the highest.
In a paper published March 31, 2025, Cameron Jones and Benjamin Bergen of the Department of Cognitive Science at the University of California, San Diego, reported the results of the experiment.
They used a three-party version of the test, in which each participant held simultaneous five-minute text conversations with two interlocutors (another person and one of the AI systems) and then judged which of the two was human. This setup is harder for a machine to pass than the two-party version, in which a person talks only with the machine.
In 73% of cases, participants judged GPT-4.5 to be human. The other systems scored lower:
- LLaMa-3.1 — 56%;
- ELIZA — 23%;
- GPT-4o — 21%.
The Turing test is a conceptual test proposed by British mathematician Alan Turing in 1950 to determine whether a computer can exhibit intelligent behavior indistinguishable from that of a human.
The essence of the test:
- A person conducts a text conversation with two interlocutors: another person and an artificial intelligence.
- If the subject cannot determine with certainty which of them is the machine, the computer is considered to have passed the test.
The Turing test has been run many times on popular AI models. For example, in June 2024, people were unable to distinguish ChatGPT from a human interlocutor in 54% of cases. In that study, ELIZA scored 22%, GPT-3.5 — 50%, and humans — 67%.
In a similar 2023 study by Jones, GPT-4 scored 41%, GPT-3.5 — 14%, and ELIZA — 27%. Humans scored 63%.
Recall that in February 2025, OpenAI released GPT-4.5, a new version of its chatbot model with advanced “emotional intelligence”.