Can large models pass the Turing test? AI21 Labs built "Human or Not", an online game with over a million players

Author: Jiangmen Ventures
Paper Link:

https://arxiv.org/abs/2305.20010

Project Address:

https://www.humanornot.ai/

"I believe that in about fifty years' time it will be possible to programme computers ... to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning."

- Alan Turing, 1950

The passage above describes the famous Turing test, proposed in 1950 by Alan Turing, widely regarded as the father of computer science and artificial intelligence, in his paper "Computing Machinery and Intelligence" [1]. In this seminal paper, Turing laid out the procedure and evaluation criterion of the test, six years before the term "artificial intelligence" was coined at the 1956 Dartmouth Conference. The test is commonly summarized as follows: if a computer answers a series of questions from human testers for five minutes and fools the testers into believing it is human more than 30% of the time, it is considered to have passed the Turing test and to possess some ability to think. Turing framed the test as an "imitation game."

Recently, the Israeli company AI21 Labs, which has released its own large language model Jurassic-2 [2], published research on applying the Turing test to large language models. The team designed a large-scale online game called "Human or Not", which has so far attracted more than 1.5 million unique users and more than 10 million games. In each game, the player holds an anonymous two-minute conversation and must then guess whether the other party was a human or an AI. Given its scale and method, "Human or Not" can be regarded as a modern, large-scale version of the Turing test. The results are striking: according to the paper, users guessed their partner's identity correctly in only 68% of games, and in only 60% of the games in which the partner was an AI bot, which is not far above chance. This reflects how capable current AI models have become in conversation.

1. Introduction

The Turing test began as a thought experiment about whether a machine could think like a human. Turing himself probably did not expect that the game he devised would become one of the best-known benchmarks of machine intelligence in the field of artificial intelligence. The most widely cited program said to have passed the test is Eugene Goostman, a chatbot developed by a Russian team. In a 2014 test it posed as a 13-year-old boy and convinced 33% of the judges that it was human.

[Figure: the "Human or Not" game interface]

The "Human or Not" online game described in the paper runs a Turing-style test on current large language models. The figure above shows the game interface. In this example the other party speaks first, and the user then chats with it within a fixed time limit. When the conversation ends, a dialog box asks the user to decide whether they were talking to a bot or a human, and the system then reveals whether the guess was correct. According to the authors, the game attracted a large number of users within the first month of its release, which greatly helped the experiment. The authors also note that their results agree with Turing's 1950 prediction: after a short conversation, a human tester correctly identifies an AI less than 70% of the time.

2. Design and development of "Human or Not"

More and more people now use ChatGPT and other large models in their work and daily life. For example, creators use them as brainstorming partners, and elderly people can ease loneliness by talking to them. These uses are possible because large models can now simulate human conversation reasonably well. The core design goal of "Human or Not" is to make the AI bots behind the conversations hard to tell apart from people; in Turing's original vision, only then can a machine be credited with some "intelligence". The authors therefore defined a set of human characters for the AI to play. The characters are diverse, and each bot has its own personality and goals, which keeps the conversations interesting and non-repetitive.

2.1 Robot Role Definition

To define each bot's character, the authors wrote a set of prompts that specify a name, age, and occupation, along with personality traits such as wit, humor, or seriousness. The example below introduces Maria, a 42-year-old production worker who is witty and lively, habitually uses slang, and refuses to answer factual questions, but is generally friendly and funny.

[Figure: example prompt defining the character Maria]
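A character definition of this kind can be pictured as a small template that turns a few attributes into a prompt. The sketch below is only an illustration: the `Persona` class, the `build_persona_prompt` function, and the prompt wording are assumptions made for this example, not the prompts AI21 Labs actually used. Only Maria's attributes come from the example described above.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical container for the attributes the article lists."""
    name: str
    age: int
    occupation: str
    traits: list[str] = field(default_factory=list)
    rules: list[str] = field(default_factory=list)


def build_persona_prompt(p: Persona) -> str:
    """Render a persona as a plain-text prompt (illustrative wording only)."""
    lines = [
        f"You are {p.name}, a {p.age}-year-old {p.occupation}.",
        "You are chatting with a stranger in a short online game.",
    ]
    if p.traits:
        lines.append("Personality: " + ", ".join(p.traits) + ".")
    # Behavioural rules are listed one per line so they are easy to scan.
    for rule in p.rules:
        lines.append(f"- {rule}")
    return "\n".join(lines)


# Attributes taken from the Maria example; everything else is assumed.
maria = Persona(
    name="Maria",
    age=42,
    occupation="production worker",
    traits=["witty", "lively", "friendly", "funny"],
    rules=["Use slang.", "Refuse to answer factual questions."],
)
print(build_persona_prompt(maria))
```

Keeping the attributes in a structured object rather than a hand-written string makes it straightforward to generate many distinct characters from the same template.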

In addition, the prompts include a description of the game so that the bot is aware of the context it is operating in. Some bots are also given a distinctive backstory, which keeps testers highly engaged.

2.2 Context Information Integration

In-context learning (ICL), a technique that has attracted much recent attention, has been shown to make generated text considerably more realistic when relevant information is placed in a large language model's context. The authors therefore use ICL in "Human or Not" to give bots real-time, context-sensitive information such as local news and weather data. In the example shown in the figure below, the bot is first given the weather conditions in the Honolulu area and then told about recent local events.

[Figure: example prompt supplying Honolulu weather and recent local events]

The authors hope that when a user asks a bot about recent events, its response will then be as close to the real world as possible. Supplying factual information through ICL also helps the bot get details right during the conversation and reduces the "hallucination" common to large models.
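One way to picture this step is as plain string assembly: location-specific facts are appended to the bot's prompt before the conversation starts. The following is a minimal sketch under that assumption. The function name `add_local_context`, the layout of the context block, and the weather and headline strings are all placeholders invented for illustration; the article does not say how the authors fetch or format this data.

```python
def add_local_context(base_prompt, city, weather, headlines):
    """Append weather and news snippets to a prompt (hypothetical layout)."""
    parts = [
        base_prompt,
        "",
        f"Local context for {city}:",
        f"Weather: {weather}",
    ]
    if headlines:
        parts.append("Recent local news:")
        parts.extend(f"- {h}" for h in headlines)
    # Ask the model to use the facts naturally instead of reciting them.
    parts.append("Mention these details only if the conversation calls for it.")
    return "\n".join(parts)


# Placeholder values; a real system would pull these from live data sources.
prompt = add_local_context(
    "You are Maria, a 42-year-old production worker.",
    city="Honolulu",
    weather="<current weather summary>",
    headlines=["<local headline 1>", "<local headline 2>"],
)
print(prompt)
```

Because the facts sit in the prompt itself, the model can answer questions about "today" without retraining, which is the property the authors rely on.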

2.3 Dialogue style

To make conversations more diverse, the development team started by varying the underlying model. They used several different backbone language models, including GPT-4, AI21 Labs' own Jurassic-2, and a model from Cohere.

[Figure: three example conversations, with bot messages on the left and human messages on the right]

To make the dialogue more realistic, the authors also prompted the models to imitate a variety of conversational styles. For example, some bots are required to write without any punctuation errors, while others are told to use slang and make deliberate grammatical errors. In the three examples shown in the figure above, the bot's messages are on the left and the human user's messages are on the right.
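The combination of several backbone models with several writing styles can be sketched as a random draw per conversation. This is an assumption about how such variety might be produced, not a description of the authors' system: the `sample_bot_config` function, the style names, and the instruction texts are hypothetical, and the backbone labels are just readable names for the models mentioned above, not real API identifiers.

```python
import random

# Readable labels only; these are not real API model identifiers.
BACKBONES = ["gpt-4", "jurassic-2", "cohere"]

# Hypothetical style instructions based on the two styles described above.
STYLES = {
    "careful": "Write with correct punctuation and grammar.",
    "casual": "Use slang and make occasional deliberate grammar mistakes.",
}


def sample_bot_config(rng):
    """Pick one backbone and one dialogue style for a new conversation."""
    backbone = rng.choice(BACKBONES)
    style = rng.choice(sorted(STYLES))  # sorted for a stable draw order
    return {
        "backbone": backbone,
        "style": style,
        "style_instruction": STYLES[style],
    }


# A seeded generator makes the draw reproducible for debugging.
config = sample_bot_config(random.Random(0))
print(config)
```

Drawing the backbone and the style independently means every model can appear in every style, so players cannot learn one model's habits and rely on them.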

3. Experimental results and analysis

Within its first month, "Human or Not" logged more than 10 million conversations from more than 1.5 million users, giving the authors a very rich dataset to analyze. By studying anonymous conversations from users around the world, the authors identified several strategies that people use to tell AI from humans, which illustrate the cognitive flexibility and creativity of the human mind.
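A headline figure such as the share of correct guesses can be computed from game records with a simple tally. The sketch below assumes each game is stored as a `(partner, guess)` pair; that record format, the `correct_guess_rates` function, and the eight toy games are invented for illustration and are not the study's data.

```python
from collections import defaultdict


def correct_guess_rates(games):
    """Return the share of correct guesses overall and per partner type.

    `games` is an iterable of (partner, guess) pairs, where each value
    is either "human" or "bot".
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for partner, guess in games:
        for key in ("overall", partner):
            totals[key] += 1
            correct[key] += int(guess == partner)
    return {key: correct[key] / totals[key] for key in totals}


# Toy records, made up for this example.
toy_games = [
    ("bot", "bot"), ("bot", "human"), ("bot", "bot"), ("bot", "human"),
    ("human", "human"), ("human", "human"), ("human", "bot"), ("human", "human"),
]
rates = correct_guess_rates(toy_games)
print(rates)

# Turing's prediction concerns a correct-identification rate of at most 70%.
print("within Turing's 70% bound:", rates["overall"] <= 0.70)
```

Splitting the rate by partner type matters because players may be much better at recognizing fellow humans than at spotting bots, and a single overall number hides that difference.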

3.1 Grammatical errors or spelling errors

For example, some users assume that only humans make grammatical or spelling mistakes, so they watch closely for typos, grammatical errors, and slang, which they treat as telltale signs of a human. In fact, some bots already imitate these human quirks well, so this cue is unreliable.

3.2 Private and philosophical issues

Another common strategy is to ask personal questions, probing for private experiences or individual opinions in order to tell humans from AI. Because the bots were given fictional personal histories in advance, this approach often fails. Interestingly, some users also probe the bot's emotional and philosophical capacities by asking abstract or ethical questions, for example about the meaning of life, political conflicts, or religious beliefs, to gauge how deeply it understands and how thoughtfully it can answer. Even with such difficult questions, the bots give reasonable, contextually relevant responses. In the two examples below, the user asks who created humans, and the AI simply replies: God.

[Figure: two example conversations in which the user asks who created humans]

3.3 Real-time information judgment

Some users also ask whether their conversation partner is aware of current trends, presumably reasoning that a large model has a training cutoff and may have no network access to real-time information. For example, some users ask whether the other party knows about a dance that recently became popular on TikTok, or bring up recent developments in the war in Ukraine, as shown in the figure below.

[Figure: example conversations about recent trends and current events]

4. Summary

The paper revives the classic Turing test in the form of an online game, "Human or Not". Its experiments provide valuable data on how people interact with current large models. The authors acknowledge, however, that the analysis is limited: the game can currently be played only in English, so the results may not reflect the full range of human cultures, languages, and ages.

Amid the current boom in large language models, "Human or Not" can be seen as an important milestone in evaluating AI capabilities and as a template for future research on human-like AI and Turing-style tests. As AI continues to develop, its potential impact on every industry is becoming increasingly clear, which calls for a more complete framework for assessing AI ethics and safety. The original Turing test may be outdated, but its motivating question, whether a machine can think, remains of real practical importance. We hope that Turing-style tests will help us build safer, more trustworthy, and more responsible AI systems.

References

[1] Alan M. Turing. Computing Machinery and Intelligence. Mind, 59(236):433–460, 1950.

[2] AI21 Labs. Announcing Jurassic-2 and Task-Specific APIs, 2023. URL https://www.ai21.com/blog/introducing-j2

Author: seven_
