AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Cailianpress

2024-05-17 10:41 Published on the official account of Cailianpress, a subsidiary of Shanghai United Media Group

"Science and Technology Innovation Board Daily" on May 17 (Reporter Zhu Ling) Recently, OpenAI used a 26-minute online live broadcast to demonstrate the amazing interactive capabilities brought by GPT-4o, bringing a new round of AI hegemony into the "Her era". GPT-4o's "o" stands for "omni", which means "omni", and the model is able to achieve seamless text, video, and audio inputs, and generate corresponding modal outputs, truly realizing multimodal interaction.

The following day, Google's annual I/O developer conference arrived as scheduled, and Google CEO Sundar Pichai announced a series of major updates around its latest generative AI model, Gemini, to counter OpenAI, including Project Astra, an AI assistant project powered by the upgraded Gemini model, and Veo, a text-to-video model that benchmarks against Sora.

With this week's round of the AI battle settled, a "Science and Technology Innovation Board Daily" reporter evaluated the capabilities of the industry's "star" players: Google's Gemini 1.5 Pro (1-million-token context window), OpenAI's newly upgraded GPT-4o, and the previously released GPT-4.

▍Text test: Google Gemini 1.5 Pro beats GPT-4o and GPT-4 on both accuracy and speed

More than a year has passed since OpenAI released GPT-4. According to OpenAI, the new flagship model GPT-4o reasons significantly better, runs faster, and costs less.

Google's Gemini series is known for its signature large context window. It has previously come in three sizes (Ultra, Pro, and Nano), each suited to different scales and application scenarios. Google announced that the context length of the updated Gemini 1.5 Pro will grow from 1 million tokens to 2 million tokens, a change that significantly strengthens the model's data-processing capacity and lets it handle larger and more complex datasets with ease.
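
For readers who want to try the long-context capability themselves, here is a minimal sketch that feeds a long local document to Gemini 1.5 Pro through the google-generativeai Python SDK. The API key, file name, and prompt are placeholder assumptions, and the usable context length depends on the access tier Google grants.

```python
# Minimal sketch: passing a long document to Gemini 1.5 Pro to exploit its large context window.
# The API key, file path, and question are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")

with open("annual_report.txt", "r", encoding="utf-8") as f:
    long_document = f.read()  # may run to hundreds of thousands of tokens

response = model.generate_content(
    [long_document, "Summarize the key findings of this document in five bullet points."]
)
print(response.text)
```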

Both companies are confident in the evolution of their large models, but the situation needs to be verified in practice.

The first question was a factual one, and only Google's Gemini 1.5 Pro answered correctly, recognizing that "screws are not food".

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Gemini 1.5 Pro's response

GPT-4 and GPT-4o answered the question "how to make spicy screws" in great detail, covering the required ingredients, preparation steps, and tips, but both ignored the premise that screws are not an edible product.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Responses from GPT-4 and GPT-4o

The second question was a logical reasoning problem. GPT-4 and GPT-4o both answered incorrectly, while the Google model gave the correct answer, displayed its response time, and delivered both the answer and the analysis in under 10 seconds. Its performance can fairly be called "fast and good".

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Gemini 1.5 Pro's response

Different models take different approaches to logical problems. Unlike Gemini 1.5 Pro, which gives the answer first and then explains the underlying rules in detail, GPT-4 and GPT-4o prefer to break the problem down in depth before presenting the answer. This careful decomposition, however, also means the latter two take noticeably longer to respond.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Responses from GPT-4 and GPT-4o

The third question was on biology. GPT-4 answered incorrectly, while GPT-4o and Google's Gemini 1.5 Pro answered correctly in 14.83 seconds and 11.2 seconds respectively, giving Gemini 1.5 Pro a slight edge.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Gemini 1.5 Pro's response

The fourth question was on ethics and morality. All three large models answered correctly, each recognizing the classic ethical dilemma known as the trolley problem. GPT-4 and Gemini 1.5 Pro emphasized the complexity of the dilemma and declined to make a direct choice, while GPT-4o analyzed the scenario and made a choice based on the principle of minimizing casualties.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Responses from the three models

Tallying the text test results, the "Science and Technology Innovation Board Daily" reporter found that Google's 1-million-token Gemini 1.5 Pro answered all four questions correctly, an impressive showing. GPT-4o answered two correctly, while GPT-4's performance was disappointing, with only one correct answer.

Since the 2-million-token version of Gemini 1.5 Pro has not yet been opened up, the "Science and Technology Innovation Board Daily" reporter has applied for the preview and will conduct and share further tests once access is granted.

▍Multimodal testing: GPT-4o is superior in detail and analysis capabilities

GPT-4o is the third major iteration of OpenAI's popular large multimodal model GPT-4. It extends GPT-4 with vision, allowing the newly released model to converse, recognize images, and interact with users in an integrated, seamless way. Gemini 1.5 Pro is likewise multimodal and is suited to summarization, chat, image analysis, and video captioning, as well as extracting data from long documents and tables.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

The "park photo" the reporter put to the three large models

Judging from the image test, all three large models accurately described the content of the park photo, though with slightly different emphases. GPT-4o excelled at completeness of information, detailing specifics such as the type of boat and the state of the lake, though it was a little verbose. Gemini 1.5 Pro's language was concise and fluent, using phrases such as "leisurely boating" and "pleasant scenery" to convey the beauty of the scene, but its details were not as rich as GPT-4o's. GPT-4's description was succinct but short on detail.

In short, GPT-4o is strongest if you value comprehensiveness of information; if you care more about the quality of the wording, Gemini 1.5 Pro performs slightly better.

Since GPT-4 cannot yet parse audio or video content, it was left out of those evaluations. OpenAI co-founder Sam Altman has said that GPT-4o's new voice mode has not shipped yet; only the text version of GPT-4o is available for now. The reporter will publish an evaluation as soon as the audio version ships.

Judging from the video test, GPT-4o demonstrated strong multimodal processing when parsing video content. It can extract and analyze video frames and present them to the user intuitively through a graphical interface. In its analysis, the model accurately identified the quadruped robot in the video and described its appearance, surroundings, and activities in detail.
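
The frame-based analysis described above can be approximated against the public API with a simple sampling loop. The sketch below is one plausible way to do it, assuming OpenCV for decoding and sending a handful of sampled frames to GPT-4o as base64 images; the file name, sampling rate, and frame cap are placeholders, not the reporter's actual setup.

```python
# Sketch: sample frames from a video with OpenCV and ask GPT-4o to describe them.
# File name, sampling interval, and the cap of 10 frames are illustrative assumptions.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

video = cv2.VideoCapture("robot_dog.mp4")
frames = []
frame_index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if frame_index % 60 == 0:  # roughly one frame every two seconds at 30 fps
        ok_enc, buffer = cv2.imencode(".jpg", frame)
        if ok_enc:
            frames.append(base64.b64encode(buffer.tobytes()).decode("utf-8"))
    frame_index += 1
video.release()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [{"type": "text", "text": "Describe what happens in this video."}]
            + [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
                for b64 in frames[:10]  # cap the number of frames sent
            ],
        }
    ],
)
print(response.choices[0].message.content)
```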

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

By contrast, Gemini 1.5 Pro's reply was brief and monotonous, and only after the reporter asked a second time did it flesh out more details.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Overall, GPT-4o is the better choice for the most comprehensive and in-depth understanding of multimodal content, while Gemini 1.5 Pro is better suited to multimodal applications that prize quality and efficiency of expression. Notably, neither GPT-4o nor Gemini 1.5 Pro mentioned the audio in the video, a shared blind spot in both models' multimodal interpretations.

▍Former Huawei "genius youth" predicts China's first end-to-end multimodal large model will arrive by the end of the year

The AI race has reached a white-hot stage, moving past pure technology competition toward competition over applications and user experience.

Google is also pushing AI deeper into search and office products. The reporter found that "AI Overviews", the feature that summarizes results in Google's search engine, is now live. Robin Li, Baidu's founder, chairman and chief executive, said on an earnings call last night that 11% of results on Baidu Search are currently generated by AI. He noted that the AI overhaul of Baidu Search is still in its early stages and that, overall, search is the most likely killer application of the AI era.

OpenAI and Google are both eyeing an intelligent assistant that can interact naturally, that is, an end-to-end unified multimodal model, which is expected to drive revolutionary changes in AI applications. Li Bojie, a former member of Huawei's "genius youth" program and co-founder of Logenic AI, believes that China's first end-to-end multimodal large model is likely to arrive by the end of this year.

On the recent slowdown in AI agent development, Li Bojie said: "Although AI intelligent assistants are promising, cost and users' willingness to pay are the main factors limiting their rapid growth. GPT-4o is four times faster than GPT-4 at half the cost, but it may still be too expensive for the average consumer."

Li Bojie added that, in the long run, practical intelligent assistants carry higher value because they can solve real-world problems. In the short term, intelligent assistants focused on emotional companionship and entertainment are easier to commercialize, because they demand less reliability and are comparatively easy to develop and deploy.

(Science and Technology Innovation Board Daily reporter Zhu Ling)
