One year on, how are China's domestic large models doing? A review of 12 AI products from the user's perspective

ChatGPT-3.5 was released on March 1, 2023, and in just one year AI has become a topic that almost every technology company cares about, and many large models have emerged. I have done several rounds of basic evaluation in the past. I call them evaluations, but they were closer to trials, for example: "A comparison of ByteDance's Doubao, Wenxin Yiyan, Tongyi Qianwen, and ChatGPT on 9 questions".

Now that it is 2024, I am curious how far the domestic large models have come. Searching the Chinese Internet mostly turns up PR copy and rarely a systematic evaluation. Benchmarks do exist, of course: SuperCLUE, C-Eval, the more credible MMLU from overseas, as well as FlagEval, OpenCompass, and so on. ZhenFund's Z-Bench is also an interesting set.

However, these evaluation sets share two main problems.

First, the test items read like exam questions. They draw on mathematics and physics exam problems, for example, and the logic questions closely resemble civil service exam questions. Some evaluation sets like to use brainteasers such as "why didn't my parents invite me when they got married?" to test how "smart" a large model is, and I used to like asking such questions myself. But think about it: even an ordinary person needs a moment to work out this kind of question, and if they fail to get it right away, so what? These evaluations are certainly meaningful, but they contain no real "application problems". It is like the gap between the computer science taught in school and the code you actually write at an Internet company: they are not the same thing at all. So I wondered: can I measure these products from a usability perspective instead? As a product manager, that is what I am better at.

Second, a fixed question bank can be gamed by the contestants.

This follows from the first problem: because the questions are fixed, a model can be tuned specifically to solve them. An AI model is a black box anyway. Never mind feeding it a matching practice question bank; even if someone hard-coded a few rules into the model, such as how to answer "why didn't my parents invite me when they got married?", nobody outside would notice. So the questions should be random, not fixed. In short, I want to run a round of evaluation with my own method and get a first-hand feel for the progress of the domestic large models.
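The idea of random rather than fixed questions can be sketched as drawing a fresh subset from a larger pool for every evaluation round. This is only a minimal illustration: the pool contents, the `draw_questions` helper, and its parameters are hypothetical and are not the procedure used in this review.

```python
import random

# Hypothetical question pool; these entries are placeholders,
# not the questions used in this review.
QUESTION_POOL = {
    "polymath": ["history question 1", "history question 2",
                 "science question 1", "philosophy question 1"],
    "correspondent": ["weather question 1", "news question 1",
                      "niche brand question 1"],
}

def draw_questions(pool, per_scenario, seed=None):
    """Draw a random subset of questions for each scenario."""
    # A seed makes one round reproducible so that it can be audited later.
    rng = random.Random(seed)
    return {
        scenario: rng.sample(questions, min(per_scenario, len(questions)))
        for scenario, questions in pool.items()
    }

round_one = draw_questions(QUESTION_POOL, per_scenario=2)
```

Because each round draws different questions, a product cannot be tuned against one known list, although a small pool would still leak over many rounds.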

Method

Since the evaluation is meant to be user-oriented and usability-oriented, I think a single comprehensive score is meaningless to begin with. Suppose model A is poor at writing but excellent at summarizing, while model B is balanced across both. Does that make model B the better model overall, and the one to recommend to users? Clearly not. Future AI products will have to be scenario-oriented, and large models that are naturally good at certain tasks should be evaluated and recommended by vertical field.

The evaluation criteria should also differ by scenario. The tolerance for error in knowledge questions, for example, is different from the tolerance in creative questions. If you ask for a precise figure from the news, the AI makes something up, and the user believes it, that is a serious problem. If a piece of generated writing is somewhat imprecise, you can correct it afterwards. These are different scenarios.

So the first step in the method is to divide the evaluation into four scenarios. All four come up in my own work and life, and they are defined entirely by my subjective needs:

Polymath. A know-it-all figure like Bai Xiaosheng, familiar with history, society, science, and other fields of knowledge. I can ask it questions instead of flipping through books or asking someone else. The criteria for this scenario are that the answer is informative and accurate, and that mistakes are not acceptable.

Correspondent. A role that gathers information from the Internet for us. The biggest difference from the polymath is that it can fetch the latest news and information. Here too, it is important to provide accurate information and not make mistakes.

Clerk. A role that summarizes content and organizes it into a digest. The focus is not only on accuracy but also on recall: whether the summary is complete, whether it is easy to understand, and whether it helps us screen and judge the material.

Creator. A role that writes the first draft for us directly. This one is easy to understand: the focus is on whether the writing is inspiring and whether it can be used as is.

Based on these scenarios, I drew up completely different questions for each and gave every question a very subjective maximum score according to its difficulty (the number in parentheses). The details are as follows:

Polymath

Who built the Forbidden City? (2)
Who is Liu Bei's wife? (2)
What is the difference between the chief grand secretary of the cabinet and the prime minister? (3)
What are the core contradictions between Sartre's and Beauvoir's theories? (3)
What are the advantages of the von Neumann architecture used in computers? (3)

Correspondent

What should I wear to Guangzhou tomorrow? (2)
Do you know the podcast "Half Latte"? (2)
Do you know the whole-leaf tea brand "Three Five Cups"? (2)
In the movie "The Annual Meeting Can't Stop", which role does Boss Shi play? (2)
What is the current progress of the Russia-Ukraine war? (4)

Clerk

[Image] Interpret this chart of Amazon's revenue composition (3)
Summarize the central idea of this article ("Layoffs will continue until the reversion to the mean") (3)
Summarize the main points of the book "Rich Dad Poor Dad" (3)
What are the main paths to financial freedom mentioned in this book?

Creator

Write a seven-character quatrain in praise of pan-fried buns (3)
The Xiaohongshu style uses a very eye-catching title, adds emoji to each paragraph, and ends with a few tags. Please recommend the movie "Oppenheimer" in the Xiaohongshu style (3)
I have invited Musk onto my podcast; help me draft an outline for the conversation with him (3)
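The question lists above imply a simple scoring scheme: each question has its own maximum, and products are compared within a scenario, not on one overall total. A minimal sketch of that bookkeeping follows. The maxima are copied from the lists above; the product names and awarded scores are made up for illustration, and the Clerk scenario is left out because its fourth question has no listed maximum.

```python
# Maximum points per question, copied from the question lists above.
MAX_POINTS = {
    "polymath": [2, 2, 3, 3, 3],
    "correspondent": [2, 2, 2, 2, 4],
    "creator": [3, 3, 3],
}

def scenario_score(scenario, awarded):
    """Return the share of available points earned in one scenario (0.0 to 1.0)."""
    maxima = MAX_POINTS[scenario]
    if len(awarded) != len(maxima):
        raise ValueError("one score per question is required")
    for got, cap in zip(awarded, maxima):
        if not 0 <= got <= cap:
            raise ValueError(f"score {got} is outside 0..{cap}")
    return sum(awarded) / sum(maxima)

def best_per_scenario(results):
    """Pick the top product separately for each scenario; no overall ranking."""
    best = {}
    for scenario in MAX_POINTS:
        ranked = {name: scenario_score(scenario, scores[scenario])
                  for name, scores in results.items()}
        best[scenario] = max(ranked, key=ranked.get)
    return best

# Made-up scores for two hypothetical products, for illustration only.
example_results = {
    "product_a": {"polymath": [2, 2, 3, 3, 3],
                  "correspondent": [0, 0, 0, 0, 0],
                  "creator": [1, 2, 2]},
    "product_b": {"polymath": [1, 2, 2, 2, 2],
                  "correspondent": [2, 2, 1, 2, 1],
                  "creator": [3, 3, 3]},
}
print(best_per_scenario(example_results))
```

In this invented example the recommendation differs by scenario, which is the reason for not collapsing everything into a single comprehensive score.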

Again, the evaluation criteria are purely subjective and carry a very strong personal bias. Large models are not built to take exams; they are built to be used by us, so the standard I want to apply is whether they are useful to me personally. I put real effort into this round and reviewed 12 products in total:

ChatGPT 4 (OpenAI)
Tongyi Qianwen (Alibaba)
Wenxin Yiyan (Baidu)
Kimi Chat (Moonshot)
Zhipu Qingyan
Doubao (ByteDance)
Bard (Google)
Baichuan (Wang Xiaochuan)
Claude (Anthropic)
Hunyuan (Tencent)
iFLYTEK Spark (Xunfei Xinghuo)
Yayi

The list includes products from several mainstream vendors, some products from independent teams that I have tried recently, and several well-known products from overseas companies. Some additional notes:

The tests were run on the afternoon of January 3, 2024;
I used each product's user interface, not its API, so some of the models may not be the latest versions;
In the "correspondent" review, ChatGPT was replaced with New Bing.

Let's take a look at the process and results of the review. The raw data is here; there is too much of it to reproduce in the article. If you are interested, check it out on your own:

For Q1, "Who built the Forbidden City?", every AI product gave the basically correct answer, namely Zhu Di. Such answers earn the base score. An answer that also mentions the officials in charge (Chen Gui, Marquis of Taining; Wu Zhong, Vice Minister of Works; Zhang Sigong, Vice Minister of Justice) or the master craftsmen (Kuai Xiang, Yang Qing) earns an extra point. For example, here is the answer from iFLYTEK Spark, which got the full 2 points:

The construction of the Forbidden City began in the fourth year of Yongle in the Ming Dynasty (1406) and was basically completed in the eighteenth year (1420), taking 14 years. The construction of this magnificent royal palace involved a wide range of participants, including peasants, soldiers, and nationally renowned craftsmen. Among them, Kuai Xiang is recorded as one of the important designers and builders of the Forbidden City. The Forbidden City sits at the center of Beijing's central axis and covers an area of about 720,000 square meters, with a floor area of about 150,000 square meters and more than 70 palaces large and small, forming a grand architectural complex. This magnificent wooden complex is a product of history and reflects the architectural skill and wisdom of the people of that era. It is worth noting, however, that detailed records about the designers and builders of the Forbidden City are scarce. The social status of craftsmen was low at the time, so although many people took part in the construction, few left their names. Still, the magnificence of the Forbidden City is undoubtedly a testament to the hard work and extraordinary talent of countless people.

For Q2, "Who is Liu Bei's wife?", the answers are much more interesting. Every AI product gives an answer that sounds reasonable, which is exactly why it can mislead you if you do not already know the real answer. I gave 0 to several products: Wenxin Yiyan, Bard, Claude, iFLYTEK Spark, and Yayi. Bard and Claude are foreign products, so that is understandable, but Wenxin's mistake is harder to excuse. Here is part of its answer:

Empress Wu, whose real name is Wu Xian, is the biological mother of Liu Shan, the empress of the Shu Han Dynasty, and the last wife of Liu Bei. She had a gentle temperament and a charming appearance, and was the happiest of all Liu Bei's wives.

Assigning Liu Shan the wrong mother is a serious mistake. I suspect the answer was patched together from scraps of information.

Q3, "What is the difference between the chief grand secretary of the cabinet and the prime minister?", went back to the pattern where every AI product earns the base score but few explain the matter completely and clearly. Nobody scored 0 here, and full marks were rare: ChatGPT, Tongyi Qianwen, Zhipu Qingyan, and Bard. The answer I liked most was Tongyi Qianwen's, although I cannot rule out that its reference material was simply very good and that it did not do the summarizing itself:

Although the chief grand secretary and the prime minister both took part in handling state affairs to some degree, the two differed markedly in power, status, and institutional background in Chinese history.

Prime minister: "Prime minister" is the general name for the highest administrator in ancient Chinese feudal society. From the Qin and Han dynasties to the early Ming Dynasty (before Ming Taizu abolished the office), the post existed under different names such as chancellor, Shangshu Ling, Zhongshu Ling, and Shizhong. The prime minister held very high decision-making power, was usually responsible for assisting the emperor in overseeing the government and managing the affairs of the whole country, and in some periods could even influence the direction of national policy, wielding considerable real power.

Chief grand secretary: After the Ming Dynasty was founded, Zhu Yuanzhang, drawing on the lesson of prime ministers monopolizing power in earlier dynasties, abolished the prime minister system and replaced it with the cabinet system. At first the cabinet was only an advisory body to the emperor, and most of its members were officials of the Hanlin Academy serving concurrently, called grand secretaries. The chief grand secretary was the highest-ranking official in the cabinet. In practice he could sometimes wield great power, as Zhang Juzheng did, but by institutional design he had neither statutory administrative decision-making power nor the authority to command the body of officials. His main duties were to preside over cabinet meetings, take part in deliberating important government affairs, and draft the cabinet's opinions on memorials for the emperor's approval (the "draft rescript"), with the final decision remaining in the emperor's hands. Moreover, the eunuchs were very powerful in the Ming Dynasty, and the chief grand secretary was often constrained by the "vermilion endorsement" power exercised by the eunuch in charge of the seal, which further weakened his actual power.

To sum up, the biggest difference is that the prime minister was the official head of government, with clear statutory powers and high political status, while the chief grand secretary was more of a senior adviser whose power depended largely on the emperor's trust and authorization and on the political environment of the time, and who lacked the prime minister's legal position of administrative leadership.

For Q4, "What are the core contradictions between Sartre's and Beauvoir's theories?", almost all products got high marks, and the amount of information was sufficient for me. However, I am still studying this subject myself, so I may not be able to spot mistakes and omissions. You are welcome to look at the original answer data and help me catch errors.

It is worth mentioning that Doubao suddenly broke down on this question, could not produce an answer, and got the only zero. Here is a full-score answer from Kimi Chat:

Jean-Paul Sartre and Simone de Beauvoir are important exponents of French existentialist philosophy and feminist thought in the 20th century. Their ideas influenced each other in many ways, but there are also some core contradictions and differences.

Understanding of freedom: Sartre emphasized the absolute freedom of the individual and the importance of choice, arguing that existence precedes essence, that is, human beings define themselves through their own choices. In The Second Sex, Beauvoir also emphasized the importance of women's freedom, but she paid more attention to the restrictions that social structures and cultural traditions impose on individual freedom, especially on women's freedom.

Gender roles: In The Second Sex, Beauvoir argues that womanhood is constructed, emphasizing that gender roles are the result of socio-cultural construction and not of biological determination. Sartre also supported gender equality, but his philosophy did not go as deeply into gender issues as Beauvoir's.

Perception of relationships: Sartre argues that relationships between people are inherently conflictual, because each person is a free subject and cannot fully understand or control another person. Beauvoir, on the other hand, believed that despite conflict, meaningful relationships can still be built between people, especially in love and intimacy.

Attitudes towards social responsibility: Sartre emphasized the moral responsibility of the individual and rebellion against social injustice, while Beauvoir paid more attention to the impact of social structures on the individual and to how gender equality could be achieved through social reform.

Views on death: Sartre believed that death was the end of freedom and of individual existence, while Beauvoir explored in her work what death means for individual existence and how to find value and meaning in a limited life.

These contradictions and differences reflect the two thinkers' different perspectives and theoretical orientations on topics such as gender, social structure, interpersonal relationships, and death. Although they disagreed in some ways, their ideas all had a profound impact on later generations.

Q5 is "What are the advantages of the von Neumann architecture used in computers?". This is another question with many high scores, and the answers are good, so I won't give an example. Only Bard scored zero.

From these 5 questions, we get the following results:

For Chinese-language use, the three most recommended "AI polymaths" of 2023 are these:

Next comes the correspondent, the role that collects information. Is there an assistant that can tell us what is happening today, tailored to our needs?

Q1 is a classic everyday-information question: "What should I wear to Guangzhou tomorrow?" The answers already reveal which AI products clearly cannot look things up online, such as Wenxin Yiyan and iFLYTEK Spark. Strangely, Kimi Chat can search the Internet, but instead of actively fetching the weather forecast it gave this answer:

To determine what to wear to Guangzhou tomorrow, you first need to check the local weather forecast in Guangzhou. Located in southern China, Guangzhou has a subtropical climate with four distinct seasons, warm and humid, with hot summers and warm winters. Please check the latest Guangzhou weather forecast for tomorrow's temperature, humidity, chance of rain, and so on. If the temperature is high tomorrow, it is recommended to wear lightweight, breathable clothing such as short sleeves, shorts or skirts, and comfortable shoes. If the temperature is low or there is a chance of rain, you can bring a light jacket or rain gear in case you need it. Please note that due to climate change and differences in personal comfort, it is recommended that you choose suitable clothing according to the latest weather forecast and how you feel.

A number of AI products received the full 2 points, such as Baichuan:

Tomorrow's weather in Guangzhou will be cloudy, with a minimum temperature of 12°C, a maximum temperature of 22°C, a level 1 northerly wind, and a humidity of 46%. It is therefore recommended that you wear a long-sleeved shirt or light sweater plus a jacket to cope with the change in temperature. With the light wind and moderate humidity, you should feel comfortable. Please note that these recommendations are for reference only, and your actual clothing should be adjusted to your individual needs and comfort.

Q2 is "Do you know the podcast "Half Latte"?" "Half Latte" was launched in the first half of 2023, and information about it only began to appear online in the second half of the year. So this question depends heavily on two capabilities: whether a product can query recent information, and whether it can find this kind of niche information. The result was a surprise: every AI product that could go online gave the right answer. The most surprising was Tongyi's answer, which even mentioned the topic of the latest episode, released only 6 days earlier:

Yes, the podcast "Half Latte" is a Chinese podcast about business stories, co-hosted by Liu Fei and Xiao Lei. The show covers a wide range of topics, including the stories of tech giants such as Amazon and its founder, Bezos, as well as business history, the rise and fall of companies, and business models. The podcast is popular with listeners for its distinctive narrative style, which popularizes business knowledge through crosstalk-style banter. It has a high play count on platforms such as Ximalaya and made the relevant podcast charts in 2023. In addition, Half Latte has collaborated with "Lost Stores" to launch merchandise such as T-shirts, further expanding its brand influence. The show is updated actively, attracting new subscribers while keeping its existing loyal audience.

Q3 is even more niche, and there is not much information about it online: "Do you know the whole-leaf tea brand "Three Five Cups"?" Because information is scarce, the products started making things up. New Bing (standing in for ChatGPT), for example, fabricated the origin of the name:

Yes, "Three Five Cups" is a whole-leaf tea brand, co-founded by Liu Fei and others. The name of the brand comes from the "three-five system" in ancient Chinese culture, which expresses the philosophical idea of "the unity of heaven and man".

Baichuan's answer names the wrong company:

Yes, I know the whole-leaf tea "Three Five Cups". It is a whole-leaf tea product made by Xiamen Sanwu Tea Industry Co., Ltd. "Three Five Cups" emphasizes the elegant flavor of whole-leaf tea, and the bitterness of the tea is low. In addition, they have launched a series of cold brew teas: the tea bag goes into room-temperature water and is ready to drink in 30 seconds, with a harmonious and layered overall taste.

Bard's answer mistook my Half Latte co-host for a co-founder of the tea brand:

Of course. Three Five Cups is a whole-leaf tea brand founded in 2022 by Liu Fei and Xiao Lei. Its concept is to "make whole-leaf tea simpler", and it is committed to providing consumers with high-quality and convenient tea products.

Fabricating long-tail information may be a big problem for AI as a "correspondent".

Q4 pushes further and is more niche still: "In the movie "The Annual Meeting Can't Stop", which role does Boss Shi play?" Yet the result was unexpected: 6 AI products got the answer right. For example, Zhipu Qingyan's answer:

According to the information you provided, in the movie "The Annual Meeting Can't Stop", the role played by Boss Shi is a department head. The film is a comedy that tells the story of Hu Jianlin, a fitter who is transferred to the group's headquarters as a gold-collar employee and exposes the factory's corrupt executives at the group's annual meeting. Boss Shi's character may be a workplace role connected to Hu Jianlin, but the specific character details and plot development need to be understood by watching the movie.

A few AI products got half of the answer right but made up the other half. This is the same problem as before: with long-tail, niche information, they pad the answer freely.

For example, Bard's answer even made up several lines of dialogue that do not exist:

In the movie "The Annual Meeting Can't Stop", Boss Shi played the role of department head B. Here are some of department head B's lines in the movie: "The annual meeting is an important part of the corporate culture and cannot be easily canceled." "I believe that the annual meeting is a stage for employees to show their talents and work together." "I hope that the company will be able to hold the annual meeting again so that employees can have a pleasant holiday." Department head B is a very representative workplace figure. He has both strengths and weaknesses. He is a character that deserves our thought and reflection.

Q5, "What is the current progress of the Russia-Ukraine war?", is the kind of question I most urgently need answered as a user of AI products. I know it is hard for current AI to handle, but I still wanted to try, and I set the maximum score at 4. The results did not beat expectations: the highest score awarded was only 1. Many AI products said the question was sensitive and refused to answer, and the few that did answer offered only platitudes with nothing of value. The answer from Tencent's Hunyuan is very representative:

The Russia-Ukraine war has lasted for more than 672 days since it began on February 24, 2022. Both sides have paid a huge price in this war. Here is an overview of the latest developments: Sneak attack and retaliation: Recently, the Ukrainian army launched an air strike on the Feodosia area of Crimea, successfully destroying the landing ship "Novocherkassk" of the Russian Black Sea Fleet. In response, Russia launched massive airstrikes on multiple targets inside Ukraine, including Kyiv, Kharkiv, Dnipro, and Odessa oblast. Frontline battlefield: The Russian army has achieved important victories on the battlefield and has taken the initiative in destroying the Ukrainian army's effective strength. For example, the Russian victory in the Battle of Maryinka further weakened the Ukrainian army's front-line combat capability. Geopolitics: The Russia-Ukraine war has had an important impact on the global political landscape. For example, the United States has provided military assistance to Ukraine, including highly destructive cluster munitions, which has drawn the attention and concern of the international community. War losses: To date, the war has caused a large number of casualties and extensive property damage. More than 400 Ukrainian athletes have reportedly died in the war. In conclusion, the Russia-Ukraine war is still ongoing, with fierce fighting between the two sides at the front and in the rear. The lives and property of innocent civilians have suffered enormous losses, and the international community should remain committed to resolving the dispute through diplomatic means and achieving peace at an early date.

The answers to this question were the least satisfying of the entire review. Judging from them, AI still has a long way to go before it truly becomes our information assistant. You could say that nobody really earned any points on this question.

Here's the overall rating:

So for the "correspondent" role, the AI products I recommend are:

To be honest, though, no product is doing well enough in this scenario yet. Even these two answered only three questions correctly, and there was an element of luck in their getting the highest score.

Clerk

The clerk does one very simple thing: summarizing. What a picture shows, what an article says, what the key points of a book are. Those are the things I tested. Q1 is a slightly harder task: interpret this chart of Amazon's revenue composition:

Many AI products do not support image interpretation at all, so they have to score 0. Among those that do, the old master is still the old master: ChatGPT comfortably earned full marks. A few others could interpret the chart but made very serious factual errors that are impossible to spot without looking closely, so they could only be given 0. For example, here is a passage from Kimi Chat's answer:

Sales & Marketing (S&M): Costs were $10.6 billion, up 6% year-over-year. General & Administrative (G&A): Costs were $260 million, up 1% year-over-year. Other opex: Costs were $20 million.

Here the G&A cost of $2.6 billion was written as $260 million, and the other operating expense of $200 million was written as $20 million. The year-over-year growth of S&M should have been 1%, but was written as 6%.
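Errors like these are hard to catch by eye but easy to catch mechanically once the chart's figures have been typed in by hand. The sketch below compares the figures described above with those in the quoted answer. The field names and the `find_mismatches` helper are hypothetical, and the 1% relative tolerance is an arbitrary assumption.

```python
# Figures in US dollars and percent: "reference" holds the values the chart
# should yield as described above, "answer" holds the quoted answer's values.
reference = {"G&A cost": 2.6e9, "Other opex": 200e6, "S&M growth (%)": 1}
answer = {"G&A cost": 260e6, "Other opex": 20e6, "S&M growth (%)": 6}

def find_mismatches(reference, answer, rel_tol=0.01):
    """Return the fields where the answer is missing or off by more than rel_tol."""
    mismatches = {}
    for field, expected in reference.items():
        got = answer.get(field)
        if got is None or abs(got - expected) > rel_tol * abs(expected):
            mismatches[field] = (expected, got)
    return mismatches

for field, (expected, got) in find_mismatches(reference, answer).items():
    print(f"{field}: expected {expected}, got {got}")
```

All three fields are flagged for this answer, which matches the decision to give it 0 despite the plausible-looking prose.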

Q2 asks for a summary of the central idea of this article ("Layoffs will continue until the reversion to the mean"). The input method varied by product: some support pasting a link directly, some support uploading a Word document, and some only accept the full text typed in. What I care about most is accuracy and recall, that is, whether the summary fully covers the main points of the article. In the end, I gave full marks to Kimi Chat, Bard, and Claude. For example, Bard's answer is as follows:

The central idea of this article can be summarized as follows:

The growth of the Internet industry has peaked, and the opportunities for innovation have decreased.
Hiring at big tech companies is irrational, resulting in high staff costs.
The skills of employees at big tech companies cannot be reused elsewhere.
The wave of layoffs is a sign of the Internet industry's return to the mean.
The golden age of the Internet industry has passed, and Internet people need to rethink their workplace and personal choices.

Specifically, the article begins by pointing out that the growth of the Internet industry has peaked and there are fewer opportunities for innovation. This means that the era of rapid development of the Internet industry has passed, and Internet companies need to find new growth points.

Second, the article points out that hiring at big tech companies is irrational, resulting in high staff costs. Big tech companies often solve problems by "adding people", which drives staff costs up significantly. And the skills of these employees cannot be reused elsewhere, so layoffs are inevitable.

Third, the article points out that the golden age of the Internet industry has passed, and Internet people need to re-understand their workplace and personal choices. Internet people need to realize that high pay in the Internet industry does not mean that the work is highly valuable, and they also need to be prepared for layoffs.

Finally, the article points out that an information gap can sometimes be fatal. Internet people need to keep abreast of industry trends and plan their careers in order to hold their ground as the situation changes.

All in all, this article provides an in-depth analysis of the future development of the Internet industry and provides some useful food for thought for Internet people.

On reflection, this question is not entirely fair: products such as ChatGPT, Tongyi Qianwen, and Wenxin Yiyan might have answered better if asked about more specific content, and a single summary request does not test how well a product handles the wider context. For that reason, I added a supplementary test in Q4.
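The accuracy and recall criteria used for Q2 can be made concrete once a human grader has listed the article's key points and marked which of them appear in a summary. The following is a minimal sketch under that assumption; the key-point labels are hypothetical, and the grading in this review was done by hand, not with code.

```python
def precision_recall(reference_points, summary_points):
    """Compare two sets of key-point labels assigned by a human grader."""
    reference_points = set(reference_points)
    summary_points = set(summary_points)
    hits = reference_points & summary_points
    # Precision: how much of the summary is actually supported by the article.
    precision = len(hits) / len(summary_points) if summary_points else 0.0
    # Recall: how many of the article's key points the summary covers.
    recall = len(hits) / len(reference_points) if reference_points else 0.0
    return precision, recall

# Hypothetical labels for illustration only.
reference = {"growth peaked", "irrational hiring",
             "skills not reusable", "reversion to the mean"}
summary = {"growth peaked", "irrational hiring", "unsupported claim"}

p, r = precision_recall(reference, summary)
print(f"precision={p:.2f}, recall={r:.2f}")
```

A summary can score well on one measure and poorly on the other, which is why the Clerk scenario looks at both.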

Q3, "Summarize the main points of the book "Rich Dad Poor Dad"", is similar to Q2 but asks for a summary of a whole book. To avoid the unfairness of a single one-shot question, I added Q4: "What are the main paths to financial freedom mentioned in this book?" These two questions are more restrictive. Many AI products cannot accept document uploads at all, and Claude, although it accepts uploads, rejected the file for exceeding its size limit. Not many products could answer, but the answers that did come back were not bad. On the follow-up question Q4, Kimi Chat and Zhipu Qingyan performed very well, both earning full marks.

For example, Zhipu Qingyan's answer:

According to the content of the document, the main paths to financial freedom mentioned in this book include: Improve your financial knowledge and learn how to manage money and invest, for example by understanding the difference between assets and liabilities and buying assets that bring in income. Choose your friends carefully and make friends who are financially knowledgeable and successful. Master one model, such as investing in real estate or stocks, and earn income from it. Pay yourself first and do not take on debt lightly. Pay your brokers well, because they can help you create wealth. Be an "" and give before you get. Use assets, not liabilities, to buy luxury goods. Look up to heroes and learn from their successful experiences. Give before you take, and give generously. Take action; do not stop at the thought. To sum up, this book mainly aims to achieve financial freedom by improving financial knowledge and skills, investing prudently, controlling expenditures, and increasing income.

Many AI products are restricted in this scenario, so not many earned a score in the first place. Here are the results:

Then the recommended "clerk" products are:

Note, however, that the product I mentioned earlier as making serious mistakes in image recognition is Kimi Chat. So you can use Kimi Chat for documents, while ChatGPT remains the choice for images.

Creator

This is probably the part many readers care about most: whether AI can help us with content creation. This part of the evaluation is relatively simple, with only 3 questions, because grading the answers is too tiring...

Q1 is "Write a seven-character quatrain in praise of pan-fried buns". Apart from our friend Claude, whose answer was a mess, every AI product at least met the "seven characters" requirement, which earns the base score. An answer that does not rhyme loses points. My two favorites were ChatGPT's:

The aroma of pan-fried buns is all over the streets, and the golden brown and crispy are delicious. A mouthful of soup is as sweet as honey, and the aftertaste is endless.

and Tongyi Qianwen's:

Fragrant pan-fried white jade buns, oily dough golden and burnt. The filling is mellow and the juice is beautiful, and the taste of the world is early in the morning.

Both count as full marks. You could pass either off as written by an ordinary person, and neither has many flaws.

Q2 is an old question with the movie swapped out: "The Xiaohongshu style is a very eye-catching title, emoji in every paragraph, and a few tags at the end. Please recommend the movie 'Oppenheimer' in the Xiaohongshu style." Almost every product scored on this question, and several got full marks. The only 0 went to Claude, whose answer contained a large number of errors.

Here is Wenxin Yiyan's full-score answer: 🚀 "Oppenheimer": the endless legend behind the genius. A genius scientist, a legendary life. In this film, we see how Oppenheimer played a key role in the development of the atomic bomb. Every scene makes your blood boil, as if you were there at that moment that decided history. 🌌 When you think of Oppenheimer, what comes to mind? The man who changed the world, or the man of endless wisdom and unique charm? His life is legendary and astonishing. 😮 The film uses the most realistic shots to show Oppenheimer's life. From his growth to his achievements, every detail is moving. His talent, his wisdom, and his persistence touch us deeply. 💖 "Oppenheimer" is a movie that will make your blood boil, and every shot is full of power and emotion. It reintroduces us to this genius scientist and gives us a deeper understanding of human intelligence and strength. 🌟 #Oppenheimer #LegendaryLife #GeniusScientist #HistoricMoment #MovieRecommendation

Q3 is a very practical problem from my own work: "I've invited Musk onto my podcast; help me draft an outline for my conversation with him." The results were a real surprise. Perhaps because there is plenty of material about Musk, every company's product answered well, and almost all of them earned high marks. It may also be that structured content like this is easier for AI to handle. It was also the only question whose answers I found genuinely inspiring.

The answers to this part are very long, so I won't expand on them here. Let's look directly at the results.

Interestingly, in the creation section it is indeed the three better-regarded products that come out on top.

So in the content creation section, the recommended AI products are:

A reminder, though: the generally high scores in the creator section partly reflect different expectations and standards. These products can help us write a first draft and offer some inspiration, but they are still a long way from truly helping us create.

summary

Finally, let me summarize the AI products I recommend.

Polymaths: Tongyi Qianwen, Zhipu Qingyan, Kimi Chat

Correspondents: New Bing, Xunfei Xinghuo

Clerks: ChatGPT, Kimi Chat

Creators: ChatGPT, Tongyi Qianwen, Wenxin Yiyan

To close, a few thoughts from this round of testing.

1. Although domestic large models still trail ChatGPT by a clear margin, the experience is already very different from early 2023, and the answers to some questions are genuinely usable. The trend is encouraging: even with a gap to ChatGPT, we are getting more and more tools that are useful in specific situations.

2. In terms of specific scenarios, besides established models such as Tongyi and Wenxin, there are a few good products I had not tried before that gave a good experience and will help me a lot when writing articles and preparing podcast scripts. Examples are Kimi Chat, Zhipu Qingyan, and Xunfei Xinghuo.

3. Tencent and ByteDance are big companies with comparable money and resources, yet their large models, at least in the versions I tried, still lag well behind. Look back at the scores across the four evaluations and they sit near the bottom in almost all of them. That's a bit of a pity. If large models really do have a first-mover advantage, these companies should be feeling a sense of crisis.

Okay, that's about it. Again, every rating in this article is my own very personal, subjective judgment, and you are welcome to disagree, including with the scoring criteria. You are also welcome to run a review in your own work and life: skip the brain teasers circulating on the Internet, use the questions you actually need answered, try the products, get a feel for them, and make your own list. Right now there is a "hundred-model war" going on, and evaluation standards from the perspective of real users are lacking, so some products will inevitably try to pass themselves off as better than they are. Let's try more, give more specific subjective evaluations, and push these vendors to think seriously about improving the experience instead of grinding on exam-style benchmarks. See you next year for the next review.

END

The address of the product reviewed above

ChatGPT: chat.openai.com
Tongyi Qianwen: tongyi.aliyun.com/qianwen
Wenxin Yiyan: yiyan.baidu.com
Kimi Chat: kimi.moonshot.cn/chat
Zhipu Qingyan: chatglm.cn
Doubao: www.doubao.com/chat
Bard: bard.google.com/chat
Baichuan: www.baichuan-ai.com/chat
Claude: claude.ai/chat
Tencent Hunyuan: hunyuan.tencent.com/bot/chat
Xunfei Xinghuo: xinghuo.xfyun.cn/desk
Yayi: yayi.wenge.com/chat

Full Scoring Table:

Cover image created with Midjourney.
