
OpenAI Launches Its Latest Large Model "GPT-4o", Which Can Read Your Joy and Sorrow

Author: Data Ape

If Jensen Huang is tech's Taylor Swift, winning legions of fans with his warmth and appeal, then Sam Altman is a bit like AI's Kim Kardashian, always adept at creating buzz and stealing the limelight.


Over the past two weeks, rumors swirled that OpenAI was about to launch a search engine, with the spotlight fixed on Altman. Just as anticipation was peaking, Silicon Valley's man of the moment jumped out last Friday, May 10, and announced that OpenAI's spring launch event would be held that Monday, May 13, the day before Google's I/O developer conference. He also promised on Twitter to bring some "magical" updates. This one-two marketing punch not only built momentum for OpenAI but instantly muted Google's own warm-up.

So, at Monday's press conference, what exactly was the "magical" product OpenAI launched?

GPT-4o: OpenAI's First Multimodal Large Language Model That Can Analyze Emotions

At 10 a.m. Pacific Time, OpenAI's chief technology officer, Mira Murati, appeared on the livestream to walk the audience through the big spring update, which includes a desktop version of ChatGPT, a refreshed user interface, and, most importantly, a new flagship model: GPT-4o.


(Murati at the press conference)

The "o" in GPT-4o stands for "Omnimodal", and as the name suggests, this is a multimodal large model based on GPT-4.

More noteworthy still, GPT-4o can interact with users in a variety of tones and accurately pick up on their emotional shifts, a big improvement. Unlike previous versions, which handled voice input only through speech-to-text conversion, GPT-4o processes voice input in real time and responds to the user's emotion and tone.

During the livestream, two OpenAI employees demonstrated GPT-4o's new capabilities.

1. Sensing user emotions: Mark Chen, head of frontier research, asked GPT-4o to listen to his breathing. The chatbot detected his rapid breaths and humorously advised him not to breathe like a vacuum cleaner and to slow down. Chen then took a deep breath, and GPT-4o confirmed that this was the right way to breathe.

2. Voices with different emotions: Chen demonstrated how GPT-4o can read an AI-generated story in different voices, including a super-dramatic recitation, a robotic tone, and even singing.


(GPT-4o shifting its tone on command, leaving the audience in stitches)

3. Real-time visual capabilities: Researcher Barret Zoph demonstrated how GPT-4o can solve a math problem in real time through the phone's camera, like a real math teacher at your side guiding each step. GPT-4o can also observe the user's facial expression through the front-facing camera and analyze their emotions.


(Barret Zoph demonstrates solving an equation under GPT-4o's step-by-step guidance)

4. More immediate voice interaction: GPT-4o's response time has been shortened, making exchanges with the user feel more instantaneous. Murati and Chen used the new chatbot to demonstrate real-time translation, transitioning seamlessly between English and Italian.

As you can see, the focus of this update is to make the chatbot less mechanical and detached and closer to a real human, one able to understand and express emotion. So how does GPT-4o achieve emotion recognition?

OpenAI has not released further technical details, but according to the overview on its official website, before GPT-4o, ChatGPT's voice mode relayed every request through three mutually independent models (a sketch of this cascaded pipeline appears after the list):

1. The first model converts audio to text;

2. GPT-3.5 or GPT-4 then processes the text input and outputs a text response;

3. The last model converts the text back to audio.

This often results in a large loss of information: intonation cannot be captured, multiple speakers and background noise cannot be distinguished, and the model cannot produce laughter, singing, or other emotional expression.
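For illustration, here is a minimal sketch of such a cascaded pipeline built from OpenAI's public endpoints (whisper-1 for transcription, tts-1 for synthesis). It mirrors the relay architecture described above; it is not OpenAI's internal voice-mode implementation.

```python
# Minimal sketch of a pre-GPT-4o cascaded voice pipeline.
# Illustrative only: built from OpenAI's public whisper-1 / tts-1 endpoints,
# not OpenAI's internal voice-mode implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cascaded_voice_reply(audio_path: str) -> bytes:
    # Stage 1: audio -> text (intonation, laughter, and speaker
    # identity are already lost at this step)
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # Stage 2: text -> text (the LLM only ever sees the flat transcript)
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # Stage 3: text -> audio (the voice is synthesized after the fact,
    # so it cannot react to the user's tone)
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy",
        input=chat.choices[0].message.content,
    )
    return speech.content
```

Note that everything the language model sees in stage 2 is a flat transcript; whatever tone or emotion the user's voice carried is gone before the model ever runs.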

What is innovative about GPT-4o is that it is OpenAI's first model to integrate text, visual, and audio inputs and outputs. By training a single new model end to end, all inputs and outputs are processed by the same neural network.

In addition to multimodal input and output, GPT-4o also has a faster response time: it can respond to audio input in as little as 232 milliseconds, with an average response time of 320 milliseconds, which is close to the response time of a human in a conversation.

GPT-4o matches GPT-4 Turbo on English text and code, performs significantly better on non-English text, and its API is faster while costing 50% less. Compared with existing models, GPT-4o is particularly strong at visual and audio understanding.

To make the comparison more intuitive, we asked GPT-4 to generate a table comparing GPT-4o and GPT-4 Turbo:

(Table: GPT-4o vs. GPT-4 Turbo comparison, generated by GPT-4)

Tech blogger "All About AI" also demonstrated the response speeds of GPT-4o and GPT-4 Turbo on YouTube (pictured below).

(Screenshot: All About AI's side-by-side speed test of GPT-4o and GPT-4 Turbo)

When GPT-4o (left) and GPT-4 Turbo (right) were asked the same question at the same time ("write three paragraphs about life in Paris in the 19th century"), GPT-4o had already finished responding while GPT-4 Turbo was still generating its output.

GPT-4o processed 574 tokens in 5,216 milliseconds (5.216 seconds), about 110 tokens per second; GPT-4 Turbo processed 474 tokens in 23,442 milliseconds (23.442 seconds), about 20 tokens per second. The former is roughly 5.44 times faster than the latter.
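The arithmetic is simply tokens divided by elapsed seconds; a quick sanity check of the figures above:

```python
# Sanity check of the throughput figures quoted above.
gpt4o_tps = 574 / 5.216    # ≈ 110 tokens per second
turbo_tps = 474 / 23.442   # ≈ 20 tokens per second
print(f"{gpt4o_tps:.0f} vs {turbo_tps:.0f} tokens/s, "
      f"ratio {gpt4o_tps / turbo_tps:.2f}x")  # ratio ≈ 5.44x
```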

After the launch, an OpenAI researcher confirmed in a tweet that the mysterious "gpt2-chatbot" that had previously appeared on the LMSys testing arena was indeed GPT-4o.


"GPT-4o is our latest cutting-edge model. We have tested a version on LMSys, which is im-also-a-good-gpt2-chatbot. William Fedus introduced it on his Twitter and got a retweet from Altman.

"ELO scores can end up being limited by the difficulty of the prompts. We found that on the more difficult set of prompts — especially programming — GPT-4o's ELO was 100 points higher than our previous best model," the engineer added.

As the chart below shows, GPT-4o (that is, im-also-a-good-gpt2-chatbot) is in a league of its own, scoring far above the other large models.

(Chart: LMSys Elo scores, with im-also-a-good-gpt2-chatbot far ahead of other models)

Murati also announced at the spring launch that GPT-4o's text and image capabilities have already started rolling out to paid ChatGPT Plus and Team users and will soon reach enterprise users. Free users will also gain access gradually, subject to rate limits. GPT-4o's voice capabilities are expected to become available in the coming weeks.

Currently, developers can access GPT-4o's text and vision capabilities through the API.
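As a rough sketch, a combined text-and-image request to GPT-4o through the Chat Completions API looks like the following (the image URL is a placeholder; audio input was not exposed through the API at launch):

```python
# Sketch of a text + vision request to GPT-4o via the Chat Completions API.
# The image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What equation is on this whiteboard?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```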

In addition, OpenAI has also optimized the user interface (UI) of ChatGPT and launched the ChatGPT app for macOS, which is available to paying users. The company said it will also launch a Windows version of the ChatGPT app later this year.

Will Apple Replace Its Own Voice Assistant Siri with GPT-4o?

The launch of GPT-4o drove Apple's stock price up slightly.

On Friday, Bloomberg reported that Apple is considering integrating ChatGPT technology into its next-generation iOS 18 system. If a deal with OpenAI is reached, Apple could launch a ChatGPT-based chat assistant as one of a series of new AI features the company plans to release in June.


(Bloomberg report)

For many years, Apple was a favorite of top investors and institutions, including Warren Buffett, and long ranked as the largest technology company by market capitalization, but it has recently underperformed the other tech giants.

Apple's stock is down about 2% year to date, while Microsoft's is up more than 10%. Thanks to its leadership in AI, especially its deep partnership with OpenAI and the infusion of AI into its cloud business and office suite, Microsoft has become the world's most valuable company, a lead that looks set to hold for some time.

Looking at the market capitalization of the rest of the Magnificent 7: Google, with Gemini, grew 20%; Meta, owner of the open-source large language model LLaMA, rose 32%; Amazon, an investor in star AI startup Anthropic, gained 22%; and Nvidia, the chipmaker known as the AI industry's "arms dealer", surged as much as 82%. (Note: the Magnificent 7 are seven tech companies with monopoly or oligopoly positions, pricing power, and long-term profitability: Microsoft, Google, Meta, Amazon, Nvidia, Apple, and Tesla.)

Analysts generally attribute Apple's slowdown to weak growth in its core iPhone business and the absence of new AI product lines. Although Siri launched in 2011 as an AI voice assistant, it lags far behind rivals from Google, Amazon, and OpenAI in accuracy and usefulness.

Meanwhile, smartphone competitors have beaten Apple to new AI features. For example, Samsung Electronics' recently launched high-end Galaxy phones ship with the latest generative AI technology, offering real-time language translation, note summarization, and photo editing.

In the face of pressure from all sides, Apple announced in February that it would cancel a decade-long plan to build a car and transfer some employees to the generative AI team, signaling that AI will become a focus for the company's future development.

On a May 2 conference call with analysts, Tim Cook said Apple's ability to seamlessly integrate hardware, software, and services gives it an edge in the AI era. The CEO said he had tried ChatGPT last year and felt it still had many problems to solve. He repeatedly stressed that Apple would introduce new AI features "very deliberately", which may explain why it has been slow to roll out an AI product line.

So does GPT-4o live up to Cook's criteria? We should find out at Apple's annual Worldwide Developers Conference in June.
