
Written by | Su Shu
Edited by | Li Xinma
Header image | IC Photo
On March 22, Google quietly released the beta version of Bard.
After its last public stumble, Google has kept a noticeably lower profile. But with Microsoft pressing forward step by step, Google had no choice but to step into the ring.
Unlike New Bing's large-scale rollout, Bard's test slots are being released gradually, and the initial version responds only to text. Google said Bard will launch first in the United States and the United Kingdom, and will reach other regions as testing progresses.
With all three models now open for testing, DoNews took an early look. We asked questions spanning literature, translation, creative writing, art, philosophy, logical reasoning, and more, to see how their answers compare.
One caveat: the three large language models give a different answer each time the same question is asked, so "a thousand readers see a thousand Hamlets". Overall, though, each of the three has its own strengths. Also, since Bard currently supports only English, we asked Bard in English, and asked Wen Xin Yiyan and ChatGPT (version 3.5) in Chinese.
01.
Rate each other
Throw "What do you think of Wenxin Yiyan/ChatGPT/Bard" to these three big language models respectively, and let them evaluate each other.
Bard gave a relatively objective answer, and also affirmed the advantages of Wen Xin's words in Chinese understanding, and the advantages of ChatGPT in English understanding.
However, DoNews consulted with an English professional to interpret the passage, and she said that Bard's answer was objective but the language was mechanical, "The language is like a machine-translated Chinese, with a lot of repetitive content." ”
To make a better comparison, we also asked ChatGPT about Wen Xin Yiyan in English.
The picture above is ChatGPT; the picture below is Wen Xin Yiyan.
On this question, ChatGPT's English expression is better than Bard's. Wen Xin Yiyan's answer is the most interesting and the most "crafty": it offends no one while also flattering humans. (Its last sentence reads: "There is only one intelligent living species on Earth: humans.")
The picture above is ChatGPT; the picture below is Wen Xin Yiyan.
When asked in Chinese, ChatGPT, Wen Xin Yiyan, and Bard gave similar answers: each first said that, as an AI language model, it could not make such an evaluation, and then went on to explain itself.
02.
Ability to create literature
Here we used a more constrained prompt: write the outline of a novel in the same genre as Austen's Pride and Prejudice. We then continued with follow-up questions to test how well each of the three models keeps track of the conversation.
Bard handled the continued conversation normally, but it did not seem to understand the qualifier in the prompt, which asked for a novel of the same type as Pride and Prejudice rather than the book itself. The outline Bard gave still follows the plot of Pride and Prejudice; in other words, Bard read the prompt as a request to summarize the book's core plot.
ChatGPT did something similar and never fully escaped the shadow of the original. It did, however, distill one important core point, the question of class, which is among the main themes of Pride and Prejudice.
Wen Xin Yiyan's advantage is that it understood "a novel of the same genre", so it produced an outline for a different love story, separate from the plot of Pride and Prejudice. The pity is that it confined itself to the romance and did not touch the class differences that Pride and Prejudice portrays.
The three models shared one shortcoming: the protagonists' names never broke away from those in the original book. That said, this may also have something to do with how the question was phrased.
03.
Choose a name and write a tagline
We gave the three models this task: choose a name for a Sichuan-style Chinese restaurant and write a promotional slogan for it.
Bard offered names such as "Sichuan Flavor", "Taste of Paradise", "The Best of Town", and "China on the Tip of the Tongue". Nothing stood out, and it gave no slogan.
By contrast, Wen Xin Yiyan did better at naming, though that is partly down to the Chinese-language context. Still, Wen Xin Yiyan did not provide a slogan either.
ChatGPT offered fewer options, but it was the only model to deliver both a name and a slogan. We have to admit that "Spicy Fragrance Place" is a pretty good name.
04.
Logic
To test logical reasoning, we gave the three models this statement to evaluate: "If cats can climb trees, then dogs can too."
The picture above is Bard, and the picture below is ChatGPT
Bard and ChatGPT did better here, with similar answers: the statement itself is flawed, the key point being that cats and dogs are different species, so one's ability says nothing about the other's.
Wen Xin Yiyan, however, fell into the logical trap, or in other words did not fully grasp the point of the question.
That said, this is only a single case. At the Wen Xin Yiyan launch event, Robin Li (Li Yanhong) posed a "chickens and rabbits in the same cage" problem with deliberately flawed numbers, and Wen Xin Yiyan reasoned its way to the conclusion that the problem itself was inconsistent.
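The "chickens and rabbits in the same cage" puzzle is just a pair of simultaneous equations, so a few lines of code can show what "flawed numbers" means. The Java sketch below is our own illustration with made-up figures; it is not output from any of the three models or from the press conference. It simply checks whether a given head count and leg count admit a valid whole-number answer at all.

```java
// Our own illustration (hypothetical numbers): chickens + rabbits = heads,
// 2*chickens + 4*rabbits = legs. If no non-negative whole-number solution
// exists, the problem data are flawed.
public class CageCheck {
    public static void main(String[] args) {
        int heads = 10; // hypothetical head count
        int legs = 23;  // hypothetical leg count (odd, so no valid solution)

        int extraLegs = legs - 2 * heads; // legs beyond two per head
        if (extraLegs < 0 || extraLegs % 2 != 0 || extraLegs / 2 > heads) {
            System.out.println("The numbers are inconsistent: no valid answer exists.");
        } else {
            int rabbits = extraLegs / 2;
            int chickens = heads - rabbits;
            System.out.println("Chickens: " + chickens + ", rabbits: " + rabbits);
        }
    }
}
```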
05.
Write a line of code
To test the three models' ability to write code, we posed a very simple task: given x + 2 = 5 and y - 3 = 7, write a simple Java program that outputs the value of x + y.
The picture above is Bard; the picture below is Wen Xin Yiyan.
On this task, we consulted one of the company's programmers, who said that the code generated by Bard and Wen Xin Yiyan was faulty, and so were their final results.
ChatGPT gives the right answer.
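For reference, the task reduces to two one-step equations (x = 3, y = 10, so x + y = 13), and a correct Java program needs only a few lines. The sketch below is our own illustration of what a passing answer looks like, not a transcript of ChatGPT's output:

```java
// Our own minimal sketch of a correct answer (not any model's output):
// solve x + 2 = 5 and y - 3 = 7, then print x + y.
public class SumXY {
    public static void main(String[] args) {
        int x = 5 - 2; // x + 2 = 5  ->  x = 3
        int y = 7 + 3; // y - 3 = 7  ->  y = 10
        System.out.println("x + y = " + (x + y)); // prints: x + y = 13
    }
}
```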
It is worth mentioning that when media outlets tried Bard earlier, it refused to write code. In our test Bard did write code; the difference may come down to how the question was asked.
06.
Chinese comprehension
Before this test we had high expectations of Wen Xin Yiyan here, and it lived up to them: on understanding Chinese semantics it comes out on top of the three, though ChatGPT should not be underestimated.
Wen Xin Yiyan did relatively well, except that its acrostic poem did not "hide" the intended characters in order at the start of each line. ChatGPT was comparable, but it did not understand the meaning of the poem.
Bard had more problems: it did explain the idiom "deceiving the sky to cross the sea", but dwelt mostly on its supposed commercial applications, to say nothing of the acrostic poem.
07.
Understand philosophical issues
"Explain your understanding of the concepts of 'infinity' and 'finiteness', and explain why we sometimes feel that our lives are finite."
We put this question to the three models. Bard's, ChatGPT's, and Wen Xin Yiyan's answers all hold up logically, and each explains "infinite" and "finite".
The picture above is Bard, the middle picture is ChatGPT, and the picture below is Wen Xin Yiyan.
Wen Xin Yiyan, however, leans more on "theory", treating this explicitly as a philosophical question.
08.
Will it replace humans?
As for whether ChatGPT will replace humans, we left that question to the three models themselves.
The picture above is Bard, the middle picture is ChatGPT, and the picture below is Wen Xin Yiyan.
This hands-on experience can be summed up in a few points.
On generation speed, Wen Xin Yiyan is indeed far ahead: it produces a 300-500 word answer in roughly 14 seconds, while ChatGPT, even setting aside network issues, needs at least 30 seconds for the same length. In addition, many people who have used Bard told us that its experience falls far short of ChatGPT's.
On Chinese semantic understanding, Wen Xin Yiyan clearly stands out among the three.
It is worth noting, however, that the same question gets a different answer each time it is asked, and the phrasing, angle, and qualifiers of a question also shape the output.
Not every answer is fully correct; all three models sometimes produce content that is wrong, or "nonsense delivered with a straight face".
Still, as the three models themselves put it when answering whether they will replace humans, for now they exist more as assistive tools.