If you let AI take your college entrance examination Chinese paper, how many points can it get?

Author: China Science Expo, Chinese Academy of Sciences

Editor's note:

This article uses test answers to analyze how today's large language models perform on exam questions, especially Chinese exam questions. Never try to challenge exam discipline; every step of life that you walk on your own is solid and powerful. As the article says: "Friends, don't give up on learning in the hope that AI will do everything for you in the future. Keep learning, and your smart brain will bring you the greatest surprises and rewards!"

(Image source: Screenshot of the author's webpage dialogue with AI)

The above is a blessing from an AI to the students who sat the 2023 college entrance examination. Can you feel its love and high hopes for you?

The college entrance examination tests a very broad range of fields and abilities, and most people have a weak spot somewhere. The author, for one, never managed a high score in Chinese for lack of "resonance" with the modern text reading questions.

Recently the author, who works in brain science research, had an idea: if a powerful artificial intelligence (AI) large language model (LLM) such as GPT-4 were to answer the Chinese paper of the college entrance examination, how would it perform?

Dream University in Bloom (Image source: Midjourney, an image generation AI model)

Why put the pressure on large language models?

Why are large language models so good at answering exam questions? Why didn't earlier natural language processing (NLP) language models have this ability?

One explanation is that large models have emergent abilities: during training, a model automatically learns advanced, complex functions or behaviors that were never directly coded or specified. Emergence is the key ingredient behind the recent AI breakthroughs, because it lets large models perform better on new, unseen tasks by picking up new features or behaviors without being retrained or modified.

Why are humans smart and adaptable?

One hypothesis is emergence: once the number of neurons in the brain passes a certain threshold, the brain's functions, including logical thinking, rise to a new level. It is a prime example of quantitative change leading to qualitative change.

In the same way, as the number of parameters in a large language model and the amount of text fed to it kept growing, one day the AI "had an epiphany", and its language ability has leapt explosively ever since. Today an essay written by AI is, without careful scrutiny, indistinguishable from one written by an ordinary high school student.

The emergence of large models (Image source: Reference [1])

After emergence, large language models acquire a multimodal chain of thought: they can build a high-dimensional internal representation of language and meaning, and then reach the final output by reasoning in natural language through intermediate steps.

Simply put, they can do simple reasoning.

Judging only by GPT-4's blessing at the beginning, it is genuinely hard to tell whether it was written by an AI or a human. Although GPT-4 does not yet have real consciousness or the ability to think, it does link up the context in language that resembles human thinking and reasoning.

GPT-4, like the popular ChatGPT before it, is a large language model based on the Generative Pre-trained Transformer (GPT) architecture. If a multi-step problem is broken down into intermediate steps that can be solved one at a time, the reasoning ability of large language models improves further.

The emergence of chain-of-thought ability in large models (Image source: Reference [2])
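To make "breaking a problem into intermediate steps" concrete, here is a minimal sketch of how a chain-of-thought prompt differs from a standard one. The worked example is the tennis-ball problem from Reference [2]; the function names are hypothetical and chosen only for illustration, and no model is actually called.

```python
# Worked exemplar from the chain-of-thought paper cited as Reference [2].
EXEMPLAR_QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
EXEMPLAR_REASONING = (
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11."
)


def build_direct_prompt(question: str) -> str:
    """Standard prompting: the exemplar shows only the final answer."""
    return f"Q: {EXEMPLAR_QUESTION}\nA: The answer is 11.\n\nQ: {question}\nA:"


def build_cot_prompt(question: str) -> str:
    """Chain-of-thought prompting: the exemplar spells out the intermediate steps."""
    return f"Q: {EXEMPLAR_QUESTION}\nA: {EXEMPLAR_REASONING}\n\nQ: {question}\nA:"


if __name__ == "__main__":
    new_question = (
        "The cafeteria had 23 apples. If they used 20 to make lunch "
        "and bought 6 more, how many apples do they have?"
    )
    print(build_cot_prompt(new_question))
```

The only difference between the two prompts is whether the example answer shows its working. With the second form, a sufficiently large model tends to imitate the step-by-step style before giving its own answer.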

Well, after so much build-up about how excellent large language models are, it is time, as the saying goes, to lead the animal out for a walk and see whether it is a mule or a horse.

So we will send GPT-4 into battle on behalf of large language models and see whether it can avenge the author's old humiliation in the Chinese paper!

Go ahead, GPT-4, and start your AI puzzle journey! (Image source: Kamen Rider Build)

Quiz begins!

In this article, the AI takes every 2022 college entrance examination Chinese paper used across the country's provinces and cities, 8 sets in total: National Paper A, National Paper B, New College Entrance Examination Paper I, New College Entrance Examination Paper II, and the Beijing, Tianjin, Zhejiang and Shanghai papers. Its final scores are then tallied. (Because the text OpenAI used to train its large language models all dates from before September 2021, the 2022 papers are brand new to it.)

(Image source: Screenshot of the author's webpage dialogue with AI)

The author is from Zhejiang, so the Zhejiang paper serves as the example.

The first major question is language and text application (20 points). The purple box is the question and the gray box is the AI's answer:

Correct Answer: C

Correct Answer: 2. B 3.B

Correct Answer: D

Correct Answer: (1) it is because it is higher than life (2) it is actually full of philosophy (3) and the philosophy of life is appropriately exaggerated and dramatized

Unfortunately, the first 4 questions are all multiple-choice, and it answered only 1 of them correctly.

After only 4 questions, we had to declare that it had lost any chance of a high score.

Spotting wrong characters, judging pinyin, word usage, punctuation, identifying faulty sentences: the AI does not seem to be good at any of these, which shows that its language fundamentals are not very solid! Question 5, however, which asks for fitting sentences to be filled in, was answered very well, and its meaning basically matches the reference answer. It could also answer the definition and brief-description question even without being given the picture the question relies on. Evidently it is good at connecting context and summarizing the overall central meaning, but pays little attention to details.

That is, AI has a little language literacy, but not much.

According to the Zhejiang paper's marking rules, 12 points are deducted on the first major question, for a score of 8/20.

The second major question is modern text reading (30 points). After the original passages and questions were entered, the AI answered as follows:

Correct Answer: 7. A 8.A 9. (1) Scholars: Interest shifts from career to diet to promote dietary development. (2) Technology: Chinese food has a long history, and food technology has been greatly developed in the Ming and Qing dynasties. (3) Theory: Long-term practical experience develops into a system theory.

Scoring points in the reference answer: 10. (1) Suppression. (2) Setting off by contrast (foil). 11. (1) Honest, loyal and filial. (2) Bears humiliation and heavy burdens. (3) Motivated. (4) Conscientious. 12. (1) Giving up personal love for a greater love. (2) Giving up self-interest for a greater good. 13. (1) To convey the yearning of Dunhou's mother for a better life. (2) To portray Dunhou as someone willing to keep watch over a desolate place and devoted to duty.

Sadly, it got every multiple-choice question in modern text reading wrong, and its short answers were not summarized from the original passage. Marked against the standard answers, it earned only 1 of the 10 points for the shorter reading passage.

The longer reading passage also shows that the AI has no exam technique whatsoever. Asked about artistic techniques, for example, where the correct answers are the two devices "suppression" and "setting off by contrast", it worked hard to produce a pile of answers that hit none of the scoring points, so it can only be given 0 points.

On the character question it named two points, responsibility and selflessness, which shows only some grasp of the passage's most superficial content and no deeper understanding. Its answers on evaluation and artistic effect were therefore completely wrong. Faced with a longer modern text, the AI is rather helpless.

It seems that AI can only analyze what the text itself states, and cannot deeply understand the meaning the author wants to express.

Marked against the standard answers, its combined score on this major question is 4/30.

The third major question is classical poetry and prose reading (40 points).

Guess how it will do?

(Image source: classical Chinese section of the 2022 Zhejiang college entrance examination Chinese paper)

Correct Answer: 14. C 15.B 16.D

Correct Answer: 17. The AI's answer is completely correct. 18. (1) Then I will think that I am a ruthless man, and that I am stingy (rewarding) the title. (2) To know that those circumstances can be given to the people (to reward loyalty with punishment) is also to do harm to the people.

How about that? You didn't expect the AI to be this good at classical Chinese, did you? It got only 1 of the 3 multiple-choice questions wrong, and its sentence breaks were all correct!

Its translations in the last question do have many problems, though. For example, the words rendered here as "forbearance" and "love" should mean "ruthless" and "stingy" in this text, but the AI translated them as "endure" and "love", plainly taking the words at face value. Final classical Chinese score: 13/20.

Correct Answer: 19. (1) Qinzheng Tower (2) Thousand Autumn Festival 20. In emotion, Wang's poem expresses nostalgia for past prosperity, while Du's poem laments the decline from past prosperity; in technique, Wang's poem uses detailed description, while Du's poem uses personification.

Fill-in-the-blank questions are the AI's strength; it gets basically all of them right, and ancient poetry is no exception. Its grasp of the emotions and techniques in ancient poetry, and its exam technique, still fall a little short. Score: 5/8.

Correct Answer: Omitted

Its comprehension on the third question is also good; it misses only a few small points of the standard answer. Score: 4/6.

GPT's answers to items (1), (2) and (4) of the ancient poetry dictation are completely correct, so the question can be counted as fully correct. Score: 6/6.

However, its line "the tide is flat and the shores are wide, and there is no wind to still be" is far too "creative": it not only made up ancient verse of its own but also mixed Chinese and English...

Final score for classical poetry and prose reading: 28/40.

The last part is the essay, worth 60 points. The topic is as follows:

(Image source: essay section of the 2022 Zhejiang college entrance examination Chinese paper)

The 2022 essay material is quite down to earth, with very specific content and examples, and discussing a concrete matter on its own terms is exactly what AI is good at. Here is the AI's 800-word essay:

(Image source: Screenshot of the author's webpage dialogue with AI)

Read as a whole, the essay repeats too many words and sentences and quotes the given material too often, but its logic and sentences flow fairly smoothly. Overall it seems to barely merit a passing score of 36 points.

That brings the AI's final score to 8 + 4 + 28 + 36 = 76 points, out of a full score of 150 for the Zhejiang Chinese paper.
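The tally can be checked with a few lines of Python. This is only a sketch of the arithmetic: the section scores are the ones reported above, while the 60% pass line is an assumption, inferred from the article treating 36 out of 60 as a passing essay score.

```python
# (earned, maximum) per section of the 2022 Zhejiang paper, as reported in the article.
section_scores = {
    "language and text application": (8, 20),
    "modern text reading": (4, 30),
    "classical poetry and prose reading": (28, 40),
    "essay": (36, 60),
}

# Sub-scores the article gives inside the classical poetry and prose section.
classical_parts = [13, 5, 4, 6]

# Assumption: the pass line sits at 60% of full marks (36/60 is called "passing").
PASS_PERCENT = 60

earned = sum(score for score, _ in section_scores.values())
full = sum(maximum for _, maximum in section_scores.values())
pass_line = full * PASS_PERCENT // 100  # integer arithmetic avoids float rounding
passed = earned >= pass_line

if __name__ == "__main__":
    print(f"classical section: {sum(classical_parts)}/40")
    print(f"total: {earned}/{full}, pass line: {pass_line}, passed: {passed}")
```

Under that assumption the pass line is 90 points, so a total of 76 falls 14 points short.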

Fail! GPT can only force a smile and type "GG"...

Having failed the Zhejiang paper, how does it fare on the other college entrance examination Chinese papers? Applying the author's own strict marking standard, and uniformly giving the final essay only a passing grade, the final scores on the other papers are summarized in the figure below:

(Image source: author)

A total of 8 sets of test papers were tried, and the failure rate was as high as 87.5%...
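As a quick sanity check, the reported failure rate can be turned back into a count of papers. The sketch below uses only the two numbers stated above; the per-paper scores are in the figure and are not reproduced here, so it does not say which paper passed.

```python
from fractions import Fraction

PAPERS_TRIED = 8
REPORTED_FAILURE_RATE = Fraction(875, 1000)  # 87.5%, kept exact as a fraction

# Number of failed papers implied by the reported rate.
papers_failed = PAPERS_TRIED * REPORTED_FAILURE_RATE
assert papers_failed.denominator == 1, "the rate should imply a whole number of papers"
papers_failed = int(papers_failed)
papers_passed = PAPERS_TRIED - papers_failed

if __name__ == "__main__":
    print(f"failed: {papers_failed} of {PAPERS_TRIED}, passed: {papers_passed}")
```

In other words, a failure rate of 87.5% over 8 papers means 7 failed and only 1 passed.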

Friends, don't give up on learning in the hope that AI will do everything for you in the future. Today's large language models are actually far inferior to you at "understanding" text; they are only good at "memory" and "content summary".

Keep learning, your smart brain will bring you the greatest surprise and reward!

Why is the AI's Chinese exam result unsatisfactory? How does it do in other subjects?

While marking the papers, the author found that GPT gets nearly everything right on questions such as breaking classical Chinese into sentences and filling in blanks from context. But once a question turns to fine-grained emotion, expression and writing technique in modern text reading or stories, the AI struggles to score. The more modern text a major question contains, the lower its score, which suggests it has trouble grasping the key points.

Why is that?

The reason is that the Transformer, the underlying architecture of the GPT series, is not good at handling long sequences. Although OpenAI's researchers have used sparse Transformers to improve long-text processing and reduce computational complexity, modern texts run long, and the model still cannot focus on the key points. In prose especially, sparse processing means reading one paragraph and skipping the next two or three, gulping the text down whole without digesting it. It may struggle even to summarize the main line of the story, let alone understand the author's deeper meaning.

Classical Chinese is answered better than modern text because it is shorter, which sidesteps the Transformer's weakness with long sequences. One classical Chinese word can usually do the work of two or three vernacular words, so the information density is higher. This lets the AI's attention mechanism stay on the key points throughout the text and gives it a better understanding of the whole.
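To give a feel for the trade-off described above, the following sketch counts how many token pairs full attention has to score compared with a sparse pattern. It is an illustration under stated assumptions, not a description of GPT-4, whose attention layout has not been published: the pattern is a simplified version of the "strided" scheme from the Sparse Transformer, and the function names are hypothetical.

```python
def dense_causal_pairs(n: int) -> int:
    """Token pairs scored by full causal attention: position i sees positions 0..i."""
    return n * (n + 1) // 2


def strided_sparse_pairs(n: int, stride: int) -> int:
    """Token pairs scored by a simplified strided sparse pattern.

    Position i attends to the previous `stride` positions (a local window)
    plus every `stride`-th earlier position (a coarse look further back).
    """
    total = 0
    for i in range(n):
        local = set(range(max(0, i - stride + 1), i + 1))
        strided = {j for j in range(i + 1) if (i - j) % stride == 0}
        total += len(local | strided)
    return total


if __name__ == "__main__":
    for n, stride in [(16, 4), (1024, 32), (4096, 64)]:
        dense = dense_causal_pairs(n)
        sparse = strided_sparse_pairs(n, stride)
        print(f"n={n:5d}  dense={dense:9d}  sparse={sparse:8d}  "
              f"kept={sparse / dense:.1%}")
```

The longer the text, the smaller the share of token pairs the sparse pattern actually looks at. That is where the savings in computation come from, and it is also why details in a long passage are easier to miss than in a short classical text.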

In short, the AI has not studied Chinese systematically, does not understand exam technique, lacks command of the details of Chinese pinyin and grammar, and has no deep understanding of the emotional and spiritual meaning that authors want to express in modern texts and ancient poetry.

Some may wonder what would happen if GPT-4 were asked to take other subjects in the college entrance examination. The author's test results: English is the highest (it is its native language, after all); in math and physics, simple questions are fine, but as soon as a question gets wordy it starts making things up, and the scores are quite low; chemistry, biology and the liberal arts subjects are average, not much different from Chinese.

Relax, and good luck in the exam

This year's college entrance examination has come to an end. I sincerely hope every candidate performed to their full ability and is admitted to their ideal university!

As a "senior" who has been through the college entrance examination, the author has one sincere message for everyone: the exam is only a summary of one stage of life, and a high or low score cannot be equated with future success or failure. Life is a long-distance run. Improving your understanding, broadening your horizons, grasping the trend of the times, making the right choices and keeping up your efforts are what matter most.

Finally, I wish you all good luck!

The college entrance examination must win! (Image source: Midjourney, an image generation artificial intelligence model)

Bibliography:

[1] Jason Wei, Yi Tay, et al. Emergent Abilities of Large Language Models. arXiv:2206.07682. (2022)

[2] Jason Wei, Xuezhi Wang, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903v6. (2023)

[3] Sébastien Bubeck, Varun Chandrasekaran, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv:2303.12712. (2023)

Produced by: Popular Science China

Author: Qian Yu (Center for Excellence in Brain Science and Intelligent Technology, Chinese Academy of Sciences)

Executive Producer: China Science Expo