Tencent Yuanbao cured my information anxiety

A side-by-side review of five large models: only one of them could explain a 100,000-word paper in 500 words.

Author | Ma Ruilei, Lin Jiexin
Editor | Lin Jiexin

Recently, while scrolling through my photo album, I came across a picture from March and suddenly realized how much my reading volume has skyrocketed since I started covering AI.

Large models are reshaping the role of thinking in the workflows of many industries. They have also given those of us who cover the field a permanent case of information-intake anxiety, because researchers in every discipline are wildly inventive. For example, Stanford University had AI agents play different people to build an AI society. That inspired a Tsinghua University team to use AI to run a game design company. Later, AI was used to simulate 6,000 years of human social development, and the AI humans turned selfish in order to survive. Those studies are the easy ones: fun to read and easy to understand.

The real headaches are the hardcore ones:
- AI successfully editing human genes.
- AI learning to predict plasma tearing instabilities to advance controlled nuclear fusion.
- An AI system that proves Euclidean plane geometry theorems without human demonstrations.

(Just looking at them gives me a headache, but the very material that used to put me to sleep in class is now part of my job.) For a long time, I have been testing how well each large model reads papers, and I have worked out a set of prompts: "Summarize the content of the paper. What is the research background? What methods are used to demonstrate the results? What breakthroughs have been made, and what advantages does it have over similar research? What impact will it have on the lives of ordinary people? If the technical method is complicated, please use an analogy or metaphor to help me understand."

This prompt quickly pins down the purpose and use of a study. At the same time, it shows how the research will affect ordinary people like us. The problem is that most papers run from tens of thousands to hundreds of thousands of words and are full of field-specific jargon. AI can read them, but the result is often a hollow outline. Metaphors are even harder: because the AI does not understand the material deeply enough, it cannot put it into plain words that actually help...

Half a year ago, I found that Kimi worked best. Over two months, I used it to read papers totaling 11.83 million words, and my whole soul was elevated. Of course, people always chase the new and tire of the old. Half a year has passed, and I want to see what the other AIs can do, so let's run a side-by-side review. I opened my chat history with Kimi to see what problems I had run into before, and put on my pain mask...

Yes, looking back through the chat history, I remembered that Kimi could only recognize text via OCR and could not read graphs. As a result, it could not interpret the statistical charts in many papers, and faced with their many curves and data charts, Kimi was effectively blind. Take the chart above: it shows how the personalities of AI humans changed over 1,000 generations in the simulated society. If the paper does not spell this out in the text, I have no idea how the personalities changed... The critical information simply isn't available.

So in this evaluation, I set out to find a model whose long-text comprehension is no worse than Kimi's and that can also understand graphics. Ideally it would be a domestic model, so that I can use it anytime.

1

Basic Image Comprehension Test

First up is a simple image comprehension test.

A disclaimer here: as everyone knows, I like to be tricky when testing AI. I have no choice, because some AI vendors like to game the classic test questions everyone has already used. Take the once-popular task of telling dogs from fried chicken. One day, all the AIs suddenly became enlightened at once. Then some netizens shuffled the order of the pictures, and the AIs could no longer recognize them. (Draw your own conclusions.)

For this round, I originally planned to challenge the AIs' image reading comprehension with a question from the 2016 Guangdong college entrance examination (gaokao) Chinese paper. I worried the AIs had quietly trained on it, though, so I had an idea and added a layer of noise to the image. I won't pick on Kimi this round. Instead, the large models already confirmed to have image recognition face off head to head: Alibaba's Tongyi Qianwen, Baidu's Wenxin Yiyan, ByteDance's Doubao, and Tencent's Yuanbao.

Note that I tested Tongyi with the original, noise-free image here. It recognized the numbers accurately but could not interpret the expressions, the slap, or the kiss. To keep the experiment rigorous, I also uploaded a picture of a parking lot, and it accurately recognized the Ford logo. So it is not that Tongyi cannot read images; it simply has not been trained on this kind of task.

Here is Doubao's performance: it even misread the numbers, so we stopped there...

Wenxin Yiyan... It read the scores, but when I then asked whether it saw the slap and the kiss, it replied with a "hee-hee". I &*%$#?!

To be honest, I had given up hope for Yuanbao. As I recall, Wenxin Yiyan, Tongyi Qianwen, and Doubao all launched at least half a year before it, and Yuanbao had almost no presence in my mind. So what happened, folks? It read the picture correctly, and that was the noisy version??? Tencent had been quietly holding back. When I then asked about the facial expressions, it also interpreted what might be going on. So Yuanbao took the lead in the first round.

Every model is now confirmed to read images, so the next step is to raise the difficulty with long papers that combine figures and text.

2

Long-Paper Reading Test

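Paper: "An evolutionary model of personality traits related to cooperative behavior using a large language model". The paper mainly describes using a large model to generate AIs with different personalities and simulating 1,000 generations of human social development. By the end, the AIs had collectively shifted to selfish personalities. This new study, published in a Nature journal, suggests that AI left unconstrained may trend toward selfishness as a whole.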

I chose this paper mainly because it contains many curves. To understand how the AI humans' personalities changed in order to survive, you have to read the text alongside the charts. So here I want to see how well each large model summarizes long texts together with figures. Given Kimi's excellent long-text comprehension, I still use it as the benchmark to measure the others. But this is no longer just a domestic large-model civil war: Claude, currently a top-tier (T0) model abroad, joins in directly. Kimi's prompt: "Summarize the content of the paper, explain the research background, research methods, and results, and state what data the experimenters provide to support their experiment."

I first asked Kimi to summarize the paper to get the details. I learned that it is about AI simulating the development of human society and the resulting changes in human personality.

So I asked what trend the generations followed. Kimi gave an answer, but honestly, it did not read as though it had taken in the whole text coherently.

Even after follow-up questions, Kimi's answers did not reflect the fluctuations in this chart. Its answer roughly boils down to "selfish first, then more cooperative, then possibly selfish again." That is a serious miss, because by the 900th generation the AIs were overwhelmingly selfish. In other words, the information Kimi extracted was inaccurate.

Tencent Yuanbao

With Yuanbao, I again started by asking for the main content. I suspect Yuanbao's training drew on studies of users' reading habits, or on a team of people obsessed with efficient reading. Either way, it produces a format with clear priorities, covering:
- Research background.
- Research methods.
- Experimental design.
- Result analysis.
- Overall conclusions.

It feels like reading with a top student's notes in hand. It also lays out which model the experiment used and the key data, which Kimi did not provide under the same prompt.

Compared with Kimi, I think the biggest difference is in the generational trend: Yuanbao can describe the fluctuations of the curve. The cooperation ratio moves as follows:
- Rapid rise through the initial stage, which lasts until about the 300th generation, reaching 0.55 around the 350th generation.
- Drop to about 0.40 around the 450th generation.
- Repeated rises and falls after that, peaking at about 0.75 around the 850th generation.
- A rapid fall to about 0.15 after that peak.

From these fluctuations, it concludes that the distribution of the AI humans' personality genes in two-dimensional space shifts several times during evolution, reflecting an alternation between cooperative and selfish traits. In other words, the AI humans' evolution keeps swinging back and forth between selfishness and cooperation, and Yuanbao gives specific time periods. (History really does go around in circles~)

I also noticed an extra button in the lower-left corner labeled "Deep Read Document". Click it, and Teacher Yuanbao walks you through the paper like a lesson. At that point I was sold: take me along!

It ties the figures directly to the content, turning the paper into courseware. In the past, whenever I opened a paper, the charts made me numb, because I still had to read the fine print to understand what each chart described. Now, when I open a chart in Yuanbao, my mind is blown, because I understand it right away. I half suspect Tencent hired a gold-medal lecturer to prepare the lessons. The visual design of the whole UI fits reading habits very well:
- An outline of the paper sits on the left.
- The text is paired with the figures.
- If you don't understand something, you can ask questions about the content in real time.

It really gets me.

At the end, it even provides key questions and answers, which shocked me. Folks, anyone who has been through a thesis defense knows how valuable this feature is, right? It is Professor Yuanbao running a mock graduation defense with you, like a teacher marking the key points before the exam. You can also refresh to get different questions.

It will even evaluate the paper. In other words, upload a paper you wrote to Yuanbao, and Yuanbao will show you how to revise it, then run a mock defense with you when you are done. Brother Bao is not just for reading papers; I suspect it also works wonders for thesis writing and mock defenses.

Tongyi Qianwen

The overall structure looks good. The opening introduces the paper's research focus concisely and clearly, and the body covers the research's features and results. But dig into the specifics and you find it is not very comprehensive and a bit vague. After reading it, I felt I had not gained much.

Claude-3.5

At a glance, Claude's reply is really concise. It mainly summarizes some of the paper's main points and is not especially systematic. I have to admit, though, that maybe because it is so short, I actually read it through. Still, it is too concise: after reading it, nothing stuck, which is not very friendly to a beginner like me.

Of course, like Yuanbao, Tongyi Qianwen and Claude-3.5 also summarize the specific values. The difference is that Claude-3.5 makes clear which figure each conclusion corresponds to, while Tongyi Qianwen does not. However, Claude-3.5 does not place the figures alongside the text the way Yuanbao does. You have to flip back and forth through the figures, which is a hassle.

While testing Kimi, Tongyi Qianwen, Tencent Yuanbao, and Claude 3.5, I also unexpectedly found that the interaction design of Kimi and Tencent Yuanbao is very smooth. After you ask a question and get an answer, you can click the share icon in the bottom-right corner of the answer. Both then quickly generate a long image of the content or a link to it. Tongyi Qianwen also offers a share option, but for now you can only copy a link to the answer; it cannot generate an image. That is something Tongyi could improve.

Beyond summarizing papers, I also wanted to know how each model performs on research reports, so let's try that and see.

3

Research Report Analysis

Next, I uploaded a PDF titled "Insights into the Heat Trends of the 2024 Paris Olympics" with the prompt: "Help me analyze this research report and summarize the most important information in no more than 500 words." Tongyi Qianwen produced a very short paragraph. On closer inspection, it only covered platform and brand partnerships, which is not comprehensive.

Tencent Yuanbao

Next up is Yuanbao. It summarizes the core points of the research report and breaks the specifics down into an Olympic buzz overview, topic insights, and brand insights, which is very clear.

If you are a short-video operator or a merchant, you will see how valuable Yuanbao's information is. It tells you:
- The main hot spots.
- The two most popular social platforms: Weibo, which accounted for 68.3% of Olympic content across the internet, and Douyin, which accounted for 69.4% of Olympic topic interactions.
- That brands mainly run commercial promotions on Xiaohongshu, because Xiaohongshu's hot topics focus more on sports and athletes, while Douyin's focus on patriotic topics.
- The consumer trends: Xiaohongshu has more female users, Douyin has more male users, and people aged 25 to 34 are the main group.

Doesn't the consumer profile suddenly become clear? If every research report were summarized like this, I could read 100 a day.

What stands out is that its deep-read mode can still summarize the key information alongside the figures, and each deep read ends with another round of key questions and answers.

Claude-3.5

It gives a decent, very concise summary of some of the information you want. Overall, Yuanbao is indeed stronger at intensive reading of long texts, and both its content and its formatting are spot on. It seems to understand users' reading habits very well. The outline in deep-read mode, the pairing of figures and text, and the ability to ask questions about the article in real time make it very comfortable to use!

4

Bonus Tests

Of course, testing AI's ability to understand memes and reason through math problems has also been popular online lately. So here are a few of the tests people love to run, to see how each model performs.

I uploaded a meme and asked: "What does this meme actually mean?"

Tongyi Qianwen

As you can see, it takes understanding the meme very seriously. It covers the literal level but misses the chemistry, the humor, and the sense of burnout.

Tencent Yuanbao

Yuanbao really knows how to strike a nerve with working people, and pins down the emotion directly and clearly: "complaining about an unsolvable problem" or "feeling powerless about a situation".

Claude-3.5

Claude reads a mix of emotions into this one, and seems better at describing everyday helplessness than I am.

Next comes a simple mathematical reasoning test. To rule out the chance that the AI had been trained on the problem, I also tested the same figure in reverse order.

Wenxin Yiyan

Nope, Wenxin Yiyan gave itself away. The forward version posed no problem, but for the reversed version it still answered "simpler" or "similar to the square"...

Tongyi Qianwen passed without issue.

Yuanbao also passed without issue.

As an aside, while using Tencent Yuanbao today, I also wanted to check whether it can search the internet in real time for the latest information. Most AIs now have internet access, but they usually pull up fairly old material as references. When I searched for how AI is being applied in Yiwu, it actually found the article I wrote last Friday, and Yuanbao also summarized its content.

Across this side-by-side test, I got the feeling that the large models have slackened a bit since last year's "war of a hundred models". As a user, I really want to see the companies keep competing hard, so that better products will help me "work". Seriously, the strength of AI products lies in continuous evolution: there are no permanent winners, only permanent innovators. It is a long race, and a better user experience is the only constant.

Reproduction in any form on web pages, forums, or communities without authorization from "AI Technology Review" is strictly prohibited!

To obtain authorization to repost on an official account, leave a message in the backend of the "AI Technology Review" official account. Reposts must credit the source and include this official account's business card.
