Hands-on report: the reporter tries Baidu's "Wen Xin Yiyan" for the first time

"Science and Technology Innovation Board Daily", March 16 (reporter Huang Xinyi): Baidu held a press conference for Wen Xin Yiyan today and announced the opening of invitation-only testing. A reporter from the "Science and Technology Innovation Board Daily" immediately obtained an internal test code and tried Wen Xin Yiyan out. Overall, it can largely handle the question answering and image generation demonstrated at the press conference, although its understanding of some questions still needs improvement.

First, the "Science and Technology Innovation Board Daily" asked a common question: What is the difference between you and ChatGPT?

Wen Xin Yiyan organizes its Chinese well and answers smoothly.

Next, we asked which Chinese companies are working on pre-trained large models. Wen Xin Yiyan's answer was fairly comprehensive.

The reporter then tried a question that required searching for information.

What were Tesla's sales in China over the past five months and over the past year? Wen Xin Yiyan's answer was:

A direct Baidu search turned up the answer: according to statistics released by the China Passenger Car Association, Tesla sold 66,051 vehicles in China in January 2023, up 18% month-on-month, and exported 39,208 units that month.

Wen Xin Yiyan failed to find the correct data source here, so this area still needs further optimization.

At the press conference, Baidu focused on demonstrating Wen Xin Yiyan's overall capabilities in five scenarios. Judging from the demo, Wen Xin Yiyan not only has the strengths of large language models in literary creation, commercial copywriting, and mathematical calculation, but also shows Chinese-language understanding and multimodal generation.

The "Science and Technology Innovation Board Daily" reporter tested it on each of these aspects.

First up was literary creation: the reporter asked it to write an 800-word science fiction story in the style of "The Three-Body Problem".

Wen Xin Yiyan's answer was:

After that, the reporter asked it to write a children's song about a little yellow duck, which Wen Xin Yiyan finished in just over ten seconds.

For many office workers, writing a work summary is a headache, so the reporter also asked Wen Xin Yiyan for help with one.

Judging by the answers, the work summary for a programmer seems to be the more neatly written one.

Next, the reporter tested Wen Xin Yiyan's commercial copywriting.

We asked it to write marketing copy for children's clothing for the Double 11 (Singles' Day) promotion, with the keywords "healthy materials, great value for money".

Wen Xin Yiyan clearly understood the keywords and worked "healthy materials, great value for money" into the copy.

The reporter then had it generate Mid-Autumn Festival poster copy for an AI medical company.

The answer shows that Wen Xin Yiyan accurately understood the Chinese prompt, but its literary style still needs work.

The reporter also asked it to come up with a name for a big data and business intelligence company.

The company names Wen Xin Yiyan suggested can only be described as very "hopeful".

After that, the reporter asked Wen Xin Yiyan to write an acrostic poem on "Happy Birthday to You".

This time it performed well: the poem was finished in about 10 seconds, and it rhymed.

Robin Li (Li Yanhong) believes that acrostic poetry tests an AI's understanding of the Chinese language and Chinese culture, and clearly demonstrates Wen Xin Yiyan's strengths in Chinese. "Correspondingly, however, Yiyan's current training on English and coding scenarios is insufficient, and its performance there is not yet good enough. We must step up training and keep improving these capabilities."

Next, the reporter set out to test Wen Xin Yiyan's mathematical and logical reasoning. At the press conference, Wen Xin Yiyan had correctly solved the classic "chickens and rabbits in the same cage" problem.

The "Science and Technology Innovation Board Daily" found several elementary school math problems online.

For example: chickens and rabbits are kept in the same cage. Counting from above there are 29 heads, and from below there are 92 feet. Q: How many chickens and how many rabbits are in the cage?

Wen Xin Yiyan answered 12 chickens and 17 rabbits, which matches the standard answer.
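The article gives only the final answer. A short worked solution, our own illustration using the usual substitution method (the variables c and r are our notation, not the article's), shows why 12 chickens and 17 rabbits is correct:

```latex
% c = number of chickens, r = number of rabbits (our own notation)
\begin{aligned}
c + r &= 29 && \text{(heads)} \\
2c + 4r &= 92 && \text{(feet)} \\
(2c + 4r) - 2(c + r) &= 92 - 2 \cdot 29 \;\Rightarrow\; 2r = 34 \;\Rightarrow\; r = 17 \\
c &= 29 - 17 = 12
\end{aligned}
```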

On other math problems, however, Wen Xin Yiyan got the answer wrong.

There are 36 coins, some worth 2 cents and some worth 5 cents, with a total value of 99 cents. Q: How many coins of each type are there?

The correct answer is 27 two-cent coins and 9 five-cent coins, but Wen Xin Yiyan did not get it right.
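The same elimination technique, again as our own illustration (x and y are our notation), confirms the standard answer to the coin problem:

```latex
% x = number of 2-cent coins, y = number of 5-cent coins (our own notation)
\begin{aligned}
x + y &= 36 && \text{(coins)} \\
2x + 5y &= 99 && \text{(cents)} \\
(2x + 5y) - 2(x + y) &= 99 - 2 \cdot 36 \;\Rightarrow\; 3y = 27 \;\Rightarrow\; y = 9 \\
x &= 36 - 9 = 27
\end{aligned}
```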

A math competition has 20 questions in total. Each correct answer earns 5 points, and 1 point is deducted for each wrong or unanswered question. Xiaohua took part and scored 64 points. Q: How many questions did Xiaohua answer correctly?

The correct answer is 14, but Wen Xin Yiyan answered 16.
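A one-variable equation, our own worked illustration (x is our notation), gives 14 and also shows that 16 cannot be right, since it would produce 76 points rather than 64:

```latex
% x = number of correct answers; 20 - x are wrong or unanswered (our own notation)
\begin{aligned}
5x - 1 \cdot (20 - x) &= 64 \\
6x - 20 &= 64 \;\Rightarrow\; 6x = 84 \;\Rightarrow\; x = 14 \\
\text{check: } 5 \cdot 14 - 6 &= 64, \qquad 5 \cdot 16 - 4 = 76 \neq 64
\end{aligned}
```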

Robin Li said at the press conference that Wen Xin Yiyan has some reasoning ability and can learn relatively complex tasks such as mathematical deduction and logical reasoning. At this stage, however, its accuracy is not 100%, and it needs more time to learn and grow.

Robin Li also demonstrated multimodal generation, such as creating a poster for the 2023 World Intelligent Transportation Conference. Here is the demo result:

In our own test, however, Wen Xin Yiyan did not appear to be able to generate a poster for the conference; it offered some design suggestions instead.

For simple images described by keywords, though, Wen Xin Yiyan did a good job.

The resulting images were basically up to standard, and generation was very fast, taking only about ten seconds.

Earlier, several employees at major Internet companies told a "Science and Technology Innovation Board Daily" reporter that they had started using ChatGPT to generate business code and refactor code automatically.

So the reporter checked whether Wen Xin Yiyan could write code just as smoothly.

Question: I need a piece of bubble sort code in Java
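The generated code itself is not reproduced in the text. For readers who want a baseline to compare such output against, here is a conventional bubble sort in Java. It is our own sketch, not Wen Xin Yiyan's answer, and the class and method names (`BubbleSort`, `bubbleSort`) are hypothetical.

```java
import java.util.Arrays;

public class BubbleSort {

    // Sorts the array in place in ascending order using bubble sort.
    // After pass i, the largest i + 1 elements have "bubbled" to the end,
    // so each pass can stop one position earlier than the previous one.
    public static void bubbleSort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            boolean swapped = false;
            for (int j = 0; j < n - 1 - i; j++) {
                if (arr[j] > arr[j + 1]) {
                    // Swap adjacent elements that are out of order.
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    swapped = true;
                }
            }
            // A full pass with no swaps means the array is already sorted.
            if (!swapped) {
                break;
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```

The `swapped` flag is an optional early exit: input that is already sorted finishes after a single pass instead of running all n - 1 passes.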

Question: A JS script that draws ovals

The reporter asked a programmer friend to check the output. The friend said: "At a rough glance there are no problems, and I think it will be a good assistant tool for programmers in the future."

Robin Li said that the current version of Wen Xin Yiyan can already generate text, images, and audio. "Because generating video is relatively expensive, it is not yet open to all users, and we will roll it out gradually. But those familiar with creating content on Baijiahao have probably already experienced this feature: every day, tens of thousands of articles are converted into video content with it and distributed on Baidu."

Robin Li pointed out that multimodality is a clear trend in generative AI. As Baidu's unified multimodal large model grows stronger, Wen Xin Yiyan's multimodal generation capabilities will continue to improve.

In the reporter's experience, Wen Xin Yiyan can already answer questions fairly smoothly and accurately, but some Q&A scenarios still need optimization.

Robin Li said at the press conference: "Overall, large language models like this are still far from mature. They sometimes perform amazingly, but in many scenarios there are obvious flaws in the details, and there is a lot of room for improvement. They will certainly develop and change rapidly in the coming period."
