Unable to translate between Chinese and English, Baidu's Wenxin has become a "poor student" in Chinese

author:Acorn Business Review
Written by Zhao Xiang, Jiang Ruiying

Edited by Qin Tuo

The "reskinning" controversy surrounding Baidu's Wenxin Yiyan has recently intensified.

An industry insider told Acorn Business Review that the so-called "reskinning" most likely refers to Wenxin Yiyan translating Chinese into English in order to use open-source models, and adding open-source data to its training.

According to papers published by Baidu, the text-to-image function of Wenxin Yiyan used Baidu Translation for Chinese-English translation during training.

In fact, Baidu Translation has long lagged behind the industry, and its machine translation quality trails that of other artificial intelligence companies by a wide margin.

Acorn Business Review compared Baidu Translation with iFLYTEK Translation, Tencent Translation, and NetEase Translation and found that Baidu Translation's handling of nouns and phrases suffers from problems such as ambiguity and polysemy.

The industry insider said:

If the translation is wrong, it directly degrades the output of the generative model.

Yet translation is a road that AI in China cannot avoid.

As of 2021, English content accounted for 60.4% of the world's top 10 million websites, while Chinese content accounted for only 1.4%.

Chinese AI relies on a large number of English-language datasets for training.

Translation has become a "mountain" standing in front of Wenxin Yiyan.

Faced with these technical problems, Baidu has never responded directly; instead, it has repeatedly stressed in public that "domestic products" need time.

Such statements seem intended to protect Robin Li's pride in having presented Wenxin Yiyan as a "top student in Chinese."

With Wenxin Yiyan stuck on Chinese-English translation, where does the future of Chinese AI lie?

One: Wenxin Yiyan suspected of "reskinning"; is its self-portrait actually a "white male"?

On March 22, the blogger @Mr. Liu Dake alleged that Wenxin Yiyan was "reskinning, painting over, and faking".

@Mr. Liu Dake said that if you ask Wenxin Yiyan to make a picture, it translates the Chinese into English and then uses the foreign open-source text-to-image model Stable Diffusion to generate the picture.

Stable Diffusion, like OpenAI's DALL-E 2, is a text-to-image AI model; it was released by the British company Stability AI in August 2022.

Take "one can beans" as an example. The phrase is not a complete expression in Chinese; if you translate it directly into English with Baidu Translation, the result is "One can beans".

The Chinese word meaning "is able to" is rendered as the English "can", which in this context reads as a container used as a unit of measure, so the whole phrase is understood as "a can of beans".

So if you ask Wenxin Yiyan to draw "one can beans", you get a picture of "a can of beans".

This is not an isolated case:

1. The Chinese phrase that reads literally as "beef can" comes out of Baidu Translation as "Beef can";

2. "A refrigerator can" comes out as "A refrigerator can";

3. "Milk Road" comes out as "Milk Road".

Accordingly, Wenxin Yiyan generates pictures of "a can of beef", "a refrigerator full of cans", and "the Milky Way".

Besides the mistakes caused by literal translation, words with multiple meanings also leave Wenxin Yiyan drawing "blind":

1. The Chinese word for "crane" is translated by Baidu Translation as "Crane";

2. The Chinese word for "Turkey" is translated as "Turkey".

In English, "Crane" more commonly refers to the bird, and "Turkey" is also the word for the bird "turkey", so Wenxin Yiyan generates pictures of a crane's head and a turkey.

In addition, when Wenxin Yiyan is asked to generate a "portrait" without the keyword "Chinese", every figure it draws is Caucasian.

On March 16, Acorn Business Review asked Wenxin Yiyan to draw a "self-portrait" and got a picture of a "white male".

On March 23, Baidu issued a statement urging the public to "neither believe nor spread rumors", and saying that Wenxin Yiyan's text-to-image capability comes from Wenxin's cross-modal large model ERNIE-ViLG.

Two: Baidu Translation, the "hopeless teammate", drags Wenxin Yiyan into the "reskinning" storm

Wenxin Yiyan itself gave a different answer.

On March 23, The Paper reported that when asked in conversation whether it used Stable Diffusion, Wenxin Yiyan not only said that it did, but also said it used deep learning models such as Transformer and GRU to generate images.

In fact, this does not prove that Wenxin Yiyan's text-to-image function is "reskinned".

According to Baidu's official description, Wenxin Yiyan's text-to-image function comes from ERNIE-ViLG 2.0.

According to Baidu's paper "ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Hybrid Denoising Experts", the training data of ERNIE-ViLG 2.0 consists of 170 million image-text pairs, including publicly available English datasets and Baidu's internal Chinese datasets.

However, during the ERNIE-ViLG 2.0 training phase, some of the training data was translated between Chinese and English automatically by Baidu Translation.

Industry insider Tang Zhe (a pseudonym) told Acorn Business Review that Baidu Translation has fallen behind, and that its machine translation lags well behind that of other artificial intelligence companies.

In particular, its translation of nouns and phrases suffers from ambiguity and polysemy, and a wrong translation directly affects the output of the generative model.

Take the "beef can" example above.

Acorn Business Review found that Baidu translates it as "Beef can" (canned beef), Tencent as "Beef is fine", NetEase as "Beef can" (canned beef), and iFLYTEK as "Beef is OK".

Of these translation tools, only Tencent Translation and iFLYTEK Translation got it right.
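To make the comparison concrete, here is a minimal Python sketch of why the two translations that end in a bare "can" are the risky ones once the English text is handed to an image generator. The four strings are the outputs reported above; the function names and the "ends in can" rule are illustrative assumptions only, not how any of these products or ERNIE-ViLG actually work.

```python
from typing import Dict, List

# Translations of the same Chinese input, as reported in this article.
REPORTED: Dict[str, str] = {
    "Baidu": "Beef can",
    "Tencent": "Beef is fine",
    "NetEase": "Beef can",
    "iFLYTEK": "Beef is OK",
}


def reads_as_canned_goods(prompt: str) -> bool:
    """Toy heuristic (an assumption, not a real parser).

    A prompt that ends in the bare word "can" leaves the modal verb with
    nothing to attach to, so an English reader or an English-trained model
    is likely to take "can" as a container instead.
    """
    words = prompt.lower().rstrip(".").split()
    return bool(words) and words[-1] == "can"


def risky_translators(reported: Dict[str, str]) -> List[str]:
    """Return, in alphabetical order, translators whose output invites the noun reading."""
    return sorted(name for name, text in reported.items() if reads_as_canned_goods(text))


if __name__ == "__main__":
    print(risky_translators(REPORTED))
```

The point of the sketch is that the image model only ever sees the English string, so whatever meaning the translation step loses cannot be recovered downstream.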

Tang Zhe also cited the recently popular phrase "tiger-headed fat boy".

Baidu translates it as "Big fat kid with a tiger's head and brain", and NetEase Translation and Tencent Translation give similar results.

iFLYTEK's result, "A tiger-headed fat boy", is relatively close.

Tang Zhe explained that Wenxin Yiyan's drawing ability emphasizes the entities in the input text: given "fat boy with tiger head", it treats "tiger head" as a separate entity.

This not only ignores sentence-level semantic understanding, but also flatly contradicts the "strong Chinese comprehension" claimed when Wenxin Yiyan was launched.

Three: A "poor student" disguised as an "honor student": a game of pride for Robin Li?

At the Wenxin Yiyan launch event on March 16, Baidu CEO Robin Li used a pre-recorded demonstration video to show off its Chinese comprehension.

For example, Wenxin Yiyan was asked about the idiom "Luoyang paper is expensive" and about acrostic ("hidden head") poems.

In the end, Robin Li concluded that Wenxin Yiyan is a large language model rooted in the Chinese market, with the most advanced natural language processing capabilities in the Chinese-language field.

For now, Wenxin Yiyan appears far from the "top student in Chinese" that Robin Li described.

Tang Zhe believes that since Baidu relies on Chinese-English translation, whether to use open-source models or to add open-source data to training, it should pay more attention to Chinese language understanding and to Chinese-English translation.

Regrettably, Wenxin Yiyan has technical problems, and in the face of doubts Baidu has given no public, professional answer.

Instead, it wears down the patience and confidence of domestic users with appeals such as "learning and growing takes time" and "give self-developed products confidence and time".

Moreover, when Wenxin Yiyan's answers are criticized, those answers "disappear".

On the afternoon of March 23, Acorn Business Review found while using Wenxin Yiyan that "XX can" prompts similar to "beef can" appeared to have been blocked.

Answers were not restored until that evening. Given the prompt "beef can", Wenxin Yiyan no longer drew "canned beef" but "a plate of beef".

Does this look like an "upgrade iteration"?

But when then asked for "Milk Road", it still produced an image of "the Milky Way".

This looks less like an improvement in Baidu's "learning ability" and more like blocking and correction carried out by humans.

Even so, Robin Li said in an exclusive media interview that he believes Wenxin Yiyan can reach, within two months, the level ChatGPT had in January of this year.

Tang Zhe believes that, from the user's point of view, the gap between Wenxin Yiyan and ChatGPT is not small.

ChatGPT can basically meet users' needs. Wenxin Yiyan, by contrast, falls short in both drawing and comprehension: "Judging from its drawings, Wenxin Yiyan is better suited to playing the game of guessing idioms from pictures."

Four: With Wenxin Yiyan stuck on Chinese-English translation, will it be hard for China to have its own ChatGPT?

How far is Wenxin Yiyan from ChatGPT?

Tang Zhe believes that Wenxin Yiyan's answers in knowledge Q&A, dictionary lookups, literature retrieval and similar areas are basically passable, because they draw on the strengths of Baidu's own search engine.

But in mathematics, code, inductive reasoning, translation and the like, it is left far behind by ChatGPT.

Tang Zhe repeatedly asked about the uniquely Chinese lunar calendar; even though Baidu has revised Wenxin Yiyan many times, it still does not give the correct answer.

On some common kinship and reasoning questions, the logic Wenxin Yiyan presents looks reasonable but is in fact nonsense delivered with a straight face.

Tang Zhe concluded:

1. In Chinese writing and grammatical analysis, its accuracy and precision are low;

2. In text generation, rewriting, and composition, it produces a basic structural framework, but the content does not hold up to close reading and is less detailed and nuanced than GPT's;

3. In recalling the large volume of information memorized in its parameters, and in answering complex logical and reasoning questions, its replies do not match the user's original intent.

Secondly, on sensitive topics related to security, Wenxin Yiyan is very cautious.

Tang Zhe said that Wenxin Yiyan's range of sensitive words is broad and its handling logic somewhat crude.

For example, if user A asks a question containing a sensitive word, Wenxin Yiyan forcibly closes A's dialog box to prevent A from continuing to post.

On such questions ChatGPT is comparatively more objective, and adds positive guidance and reassurance.

Tang Zhe believes these examples show that Baidu's internal handling is not fine-grained or tiered enough, and is inflexible.

In fact, the most important thing for Wenxin Yiyan at this stage is to find an alternative to Baidu Translation.

Some insiders note that although the number of Simplified Chinese internet users is similar to the number of English-language internet users, as of 2021 English content accounted for 60.4% of the world's top 10 million websites and Chinese content for only 1.4%.

Chinese AI needs to rely on a large number of English datasets for training; otherwise it will be at a disadvantage.

If Wenxin Yiyan wants to become the "Chinese version" of ChatGPT, it must first get Chinese-English translation right.

While Wenxin Yiyan's text-to-image function is still "guessing idioms from pictures", OpenAI has, according to the latest news, announced that ChatGPT supports third-party plug-ins and has launched 11 of them.

Among them, the Browsing plug-in lets ChatGPT search internet content in real time.

Once unshackled, ChatGPT will have countless possibilities.

There is no doubt that the era of all-around AI assistants is coming.

But to this day, China is still "blocked" by ChatGPT.