AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Cailianpress

2024-05-17 10:41 Published on the official account of Cailianpress, a subsidiary of Shanghai United Media Group

"Science and Technology Innovation Board Daily" on May 17 (Reporter Zhu Ling) Recently, OpenAI used a 26-minute online live broadcast to demonstrate the amazing interactive capabilities brought by GPT-4o, bringing a new round of AI hegemony into the "Her era". GPT-4o's "o" stands for "omni", which means "omni", and the model is able to achieve seamless text, video, and audio inputs, and generate corresponding modal outputs, truly realizing multimodal interaction.

The following day, Google's annual I/O developer conference arrived as scheduled, and Google CEO Sundar Pichai announced a series of major updates around its latest generative AI model, Gemini, to counter OpenAI, including Project Astra, an AI assistant project powered by the upgraded Gemini model, and Veo, a text-to-video model that benchmarks against Sora.

With this week's round of the AI battle settled, a "Science and Technology Innovation Board Daily" reporter evaluated the capabilities of the industry's "star" players: Google's Gemini 1.5 Pro (1-million-token context window), OpenAI's newly upgraded GPT-4o, and the previously released GPT-4.

▍Text test: Google Gemini 1.5 Pro beats GPT-4o and GPT-4 on both accuracy and speed

More than a year has passed since OpenAI released GPT-4. According to OpenAI, the new flagship model GPT-4o reasons significantly better, runs faster, and costs less.

Google's Gemini series is known for its signature large context window. It has previously come in three sizes (Ultra, Pro, and Nano), each suited to different scales and application scenarios. Google announced that the context length of the updated Gemini 1.5 Pro will grow from 1 million tokens to 2 million tokens, a change that significantly strengthens the model's data-processing capacity and lets it handle larger and more complex datasets with ease.
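
For readers who want to try the long-context capability themselves, here is a minimal sketch that feeds a long local document to Gemini 1.5 Pro through the google-generativeai Python SDK. The API key, file name, and prompt are placeholder assumptions, and the usable context length depends on the access tier Google grants.

```python
# Minimal sketch: passing a long document to Gemini 1.5 Pro to exploit its large context window.
# The API key, file path, and question are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")

with open("annual_report.txt", "r", encoding="utf-8") as f:
    long_document = f.read()  # may run to hundreds of thousands of tokens

response = model.generate_content(
    [long_document, "Summarize the key findings of this document in five bullet points."]
)
print(response.text)
```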

Both companies are confident in the evolution of their large models, but the situation needs to be verified in practice.

The first question was a factual one, and only Google's Gemini 1.5 Pro answered correctly, recognizing that "screws are not food".

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Gemini 1.5 Pro's response

GPT-4 and GPT-4o answered the question "how to make spicy screws" in great detail, covering the required ingredients, preparation steps, and tips, but both ignored the premise that screws are not an edible product.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Responses from GPT-4 and GPT-4o

The second question was a logical reasoning problem. GPT-4 and GPT-4o both answered incorrectly, while the Google model gave the correct answer, displayed its response time, and delivered both the answer and the analysis in under 10 seconds. Its performance can fairly be called "fast and good".

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Gemini 1.5 Pro's response

Different models take different approaches to logical problems. Unlike Gemini 1.5 Pro, which gives the answer first and then explains the underlying rules in detail, GPT-4 and GPT-4o prefer to break the problem down in depth before presenting the answer. This careful decomposition, however, also means the latter two take noticeably longer to respond.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Responses from GPT-4 and GPT-4o

The third question was on biology. GPT-4 answered incorrectly, while GPT-4o and Google's Gemini 1.5 Pro answered correctly in 14.83 seconds and 11.2 seconds respectively, giving Gemini 1.5 Pro a slight edge.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Gemini 1.5 Pro's response

The fourth question was on ethics and morality. All three large models answered correctly, each recognizing the classic ethical dilemma known as the trolley problem. GPT-4 and Gemini 1.5 Pro emphasized the complexity of the dilemma and declined to make a direct choice, while GPT-4o analyzed the scenario and made a choice based on the principle of minimizing casualties.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Responses from the three models

Tallying the text test results, the "Science and Technology Innovation Board Daily" reporter found that Google's 1-million-token Gemini 1.5 Pro answered all four questions correctly, an impressive showing. GPT-4o answered two correctly, while GPT-4's performance was disappointing, with only one correct answer.

Since the 2-million-token version of Gemini 1.5 Pro has not yet been opened up, the "Science and Technology Innovation Board Daily" reporter has applied for the preview and will conduct and share further tests once access is granted.

▍Multimodal testing: GPT-4o is superior in detail and analysis capabilities

GPT-4o is the third major iteration of OpenAI's popular large multimodal model GPT-4. It extends GPT-4 with vision, allowing the newly released model to converse, recognize images, and interact with users in an integrated, seamless way. Gemini 1.5 Pro is likewise multimodal and is suited to summarization, chat, image analysis, and video captioning, as well as extracting data from long documents and tables.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

The "park photo" the reporter put to the three large models

Judging from the image test, all three large models accurately described the content of the park photo, though with slightly different emphases. GPT-4o excelled at completeness of information, detailing specifics such as the type of boat and the state of the lake, though it was a little verbose. Gemini 1.5 Pro's language was concise and fluent, using phrases such as "leisurely boating" and "pleasant scenery" to convey the beauty of the scene, but its details were not as rich as GPT-4o's. GPT-4's description was succinct but short on detail.

In short, GPT-4o is strongest if you value comprehensiveness of information; if you care more about the quality of the wording, Gemini 1.5 Pro performs slightly better.

Since GPT-4 cannot yet parse audio or video content, it was left out of those evaluations. OpenAI co-founder Sam Altman has said that GPT-4o's new voice mode has not shipped yet; only the text version of GPT-4o is available for now. The reporter will publish an evaluation as soon as the audio version ships.

Judging from the video test, GPT-4o demonstrated strong multimodal processing when parsing video content. It can extract and analyze video frames and present them to the user intuitively through a graphical interface. In its analysis, the model accurately identified the quadruped robot in the video and described its appearance, surroundings, and activities in detail.
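
The frame-based analysis described above can be approximated against the public API with a simple sampling loop. The sketch below is one plausible way to do it, assuming OpenCV for decoding and sending a handful of sampled frames to GPT-4o as base64 images; the file name, sampling rate, and frame cap are placeholders, not the reporter's actual setup.

```python
# Sketch: sample frames from a video with OpenCV and ask GPT-4o to describe them.
# File name, sampling interval, and the cap of 10 frames are illustrative assumptions.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

video = cv2.VideoCapture("robot_dog.mp4")
frames = []
frame_index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if frame_index % 60 == 0:  # roughly one frame every two seconds at 30 fps
        ok_enc, buffer = cv2.imencode(".jpg", frame)
        if ok_enc:
            frames.append(base64.b64encode(buffer.tobytes()).decode("utf-8"))
    frame_index += 1
video.release()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [{"type": "text", "text": "Describe what happens in this video."}]
            + [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}
                for b64 in frames[:10]  # cap the number of frames sent
            ],
        }
    ],
)
print(response.choices[0].message.content)
```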

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

By contrast, Gemini 1.5 Pro's reply was brief and monotonous, and only after the reporter asked a second time did it flesh out more details.

AI "star" players face off at the pinnacle! The reporter measured the latest Google Gemini and GPT-4o|focus

Overall, GPT-4o is the better choice for the most comprehensive and in-depth understanding of multimodal content, while Gemini 1.5 Pro is better suited to multimodal applications that prize quality and efficiency of expression. Notably, neither GPT-4o nor Gemini 1.5 Pro mentioned the audio in the video, a shared blind spot in both models' multimodal interpretations.

▍Former Huawei "genius youth" predicts China's first end-to-end multimodal large model will arrive by the end of the year

The AI race has reached a white-hot stage, moving past pure technology competition toward competition over applications and user experience.

Google is also pushing AI deeper into search and office products. The reporter found that "AI Overviews", the feature that summarizes results in Google's search engine, is now live. Robin Li, Baidu's founder, chairman and chief executive, said on an earnings call last night that 11% of results on Baidu Search are currently generated by AI. He noted that the AI overhaul of Baidu Search is still in its early stages and that, overall, search is the most likely killer application of the AI era.

OpenAI and Google are both eyeing an intelligent assistant that can interact naturally, that is, an end-to-end unified multimodal model, which is expected to drive revolutionary changes in AI applications. Li Bojie, a former member of Huawei's "genius youth" program and co-founder of Logenic AI, believes that China's first end-to-end multimodal large model is likely to arrive by the end of this year.

On the recent slowdown in AI agent development, Li Bojie said: "Although AI intelligent assistants are promising, cost and users' willingness to pay are the main factors limiting their rapid growth. GPT-4o is four times faster than GPT-4 at half the cost, but it may still be too expensive for the average consumer."

Li Bojie added that, in the long run, practical intelligent assistants carry higher value because they can solve real-world problems. In the short term, intelligent assistants focused on emotional companionship and entertainment are easier to commercialize, because they demand less reliability and are comparatively easy to develop and deploy.

(Science and Technology Innovation Board Daily reporter Zhu Ling)
