
OpenAI Launches Its Latest Large Model "GPT-4o", Which Can Read Your Joy and Sorrow

Author: Data Ape

If Jensen Huang is tech's Taylor Swift, winning legions of fans with his warmth and appeal, then Sam Altman is a bit like AI's Kim Kardashian, always adept at creating buzz and stealing the limelight.


Over the past two weeks, rumors swirled that OpenAI was about to launch a search engine, with the spotlight fixed on Altman. Just as anticipation was peaking, Silicon Valley's man of the moment jumped out last Friday, May 10, and announced that OpenAI's spring launch event would be held that Monday, May 13, the day before Google's I/O developer conference. He also promised on Twitter to bring some "magical" updates. This one-two marketing punch not only built momentum for OpenAI but instantly muted Google's own warm-up.

So, at Monday's press conference, what exactly was the "magical" product OpenAI launched?

GPT-4o: OpenAI's First Multimodal Large Language Model That Can Analyze Emotions

At 10 a.m. Pacific Time, OpenAI's chief technology officer, Mira Murati, appeared on the livestream to walk the audience through the big spring update, which includes a desktop version of ChatGPT, a refreshed user interface, and, most importantly, a new flagship model: GPT-4o.


(Murati at the press conference)

The "o" in GPT-4o stands for "Omnimodal", and as the name suggests, this is a multimodal large model based on GPT-4.

More noteworthy still, GPT-4o can interact with users in a variety of tones and accurately pick up on their emotional shifts, a big improvement. Unlike previous versions, which handled voice input only through speech-to-text conversion, GPT-4o processes voice input in real time and responds to the user's emotion and tone.

During the livestream, two OpenAI employees demonstrated GPT-4o's new capabilities.

1. Sensing user emotions: Mark Chen, head of frontier research, asked GPT-4o to listen to his breathing. The chatbot detected his rapid breaths and humorously advised him not to breathe like a vacuum cleaner and to slow down. Chen then took a deep breath, and GPT-4o confirmed that this was the right way to breathe.

2. Voices with different emotions: Chen demonstrated how GPT-4o can read an AI-generated story in different voices, including a super-dramatic recitation, a robotic tone, and even singing.


(GPT-4o shifting its tone on command, leaving the audience in stitches)

3. Real-time visual capabilities: Researcher Barret Zoph demonstrated how GPT-4o can solve a math problem in real time through the phone's camera, like a real math teacher at your side guiding each step. GPT-4o can also observe the user's facial expression through the front-facing camera and analyze their emotions.


(Barret Zoph demonstrates solving an equation under GPT-4o's step-by-step guidance)

4. More immediate voice interaction: GPT-4o's response time has been shortened, making exchanges with the user feel more instantaneous. Murati and Chen used the new chatbot to demonstrate real-time translation, transitioning seamlessly between English and Italian.

As you can see, the focus of this update is to make the chatbot less mechanical and detached and closer to a real human, one able to understand and express emotion. So how does GPT-4o achieve emotion recognition?

OpenAI has not released further technical details, but according to the overview on its official website, before GPT-4o, ChatGPT's voice mode relayed every request through three mutually independent models (a sketch of this cascaded pipeline appears after the list):

1. The first model converts audio to text;

2. GPT-3.5 or GPT-4 then processes the text input and outputs a text response;

3. The last model converts the text back to audio.

This often results in a large loss of information: intonation cannot be captured, multiple speakers and background noise cannot be distinguished, and the model cannot produce laughter, singing, or other emotional expression.
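For illustration, here is a minimal sketch of such a cascaded pipeline built from OpenAI's public endpoints (whisper-1 for transcription, tts-1 for synthesis). It mirrors the relay architecture described above; it is not OpenAI's internal voice-mode implementation.

```python
# Minimal sketch of a pre-GPT-4o cascaded voice pipeline.
# Illustrative only: built from OpenAI's public whisper-1 / tts-1 endpoints,
# not OpenAI's internal voice-mode implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def cascaded_voice_reply(audio_path: str) -> bytes:
    # Stage 1: audio -> text (intonation, laughter, and speaker
    # identity are already lost at this step)
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # Stage 2: text -> text (the LLM only ever sees the flat transcript)
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # Stage 3: text -> audio (the voice is synthesized after the fact,
    # so it cannot react to the user's tone)
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy",
        input=chat.choices[0].message.content,
    )
    return speech.content
```

Note that everything the language model sees in stage 2 is a flat transcript; whatever tone or emotion the user's voice carried is gone before the model ever runs.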

What is innovative about GPT-4o is that it is OpenAI's first model to integrate text, visual, and audio inputs and outputs. By training a single new model end to end, all inputs and outputs are processed by the same neural network.

In addition to multimodal input and output, GPT-4o also has a faster response time: it can respond to audio input in as little as 232 milliseconds, with an average response time of 320 milliseconds, which is close to the response time of a human in a conversation.

GPT-4o matches GPT-4 Turbo on English text and code, performs significantly better on non-English text, and its API is faster while costing 50% less. Compared with existing models, GPT-4o is particularly strong at visual and audio understanding.

To make the comparison more intuitive, we asked GPT-4 to generate a table comparing GPT-4o and GPT-4 Turbo:

(Table: GPT-4o vs. GPT-4 Turbo comparison, generated by GPT-4)

Tech blogger "All About AI" also demonstrated the response speeds of GPT-4o and GPT-4 Turbo on YouTube (pictured below).

(Screenshot: All About AI's side-by-side speed test of GPT-4o and GPT-4 Turbo)

When GPT-4o (left) and GPT-4 Turbo (right) were asked the same question at the same time ("write three paragraphs about life in Paris in the 19th century"), GPT-4o had already finished responding while GPT-4 Turbo was still generating its output.

GPT-4o processed 574 tokens in 5,216 milliseconds (5.216 seconds), about 110 tokens per second; GPT-4 Turbo processed 474 tokens in 23,442 milliseconds (23.442 seconds), about 20 tokens per second. The former is roughly 5.44 times faster than the latter.
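The arithmetic is simply tokens divided by elapsed seconds; a quick sanity check of the figures above:

```python
# Sanity check of the throughput figures quoted above.
gpt4o_tps = 574 / 5.216    # ≈ 110 tokens per second
turbo_tps = 474 / 23.442   # ≈ 20 tokens per second
print(f"{gpt4o_tps:.0f} vs {turbo_tps:.0f} tokens/s, "
      f"ratio {gpt4o_tps / turbo_tps:.2f}x")  # ratio ≈ 5.44x
```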

After the launch, an OpenAI researcher confirmed in a tweet that the mysterious "gpt2-chatbot" that had previously appeared on the LMSys testing arena was indeed GPT-4o.


"GPT-4o is our latest cutting-edge model. We have tested a version on LMSys, which is im-also-a-good-gpt2-chatbot. William Fedus introduced it on his Twitter and got a retweet from Altman.

"ELO scores can end up being limited by the difficulty of the prompts. We found that on the more difficult set of prompts — especially programming — GPT-4o's ELO was 100 points higher than our previous best model," the engineer added.

As the chart below shows, GPT-4o (that is, im-also-a-good-gpt2-chatbot) is in a league of its own, scoring far above the other large models.

(Chart: LMSys Elo scores, with im-also-a-good-gpt2-chatbot far ahead of other models)

Murati also announced at the spring launch that GPT-4o's text and image capabilities have already started rolling out to paid ChatGPT Plus and Team users and will soon reach enterprise users. Free users will also gain access gradually, subject to rate limits. GPT-4o's voice capabilities are expected to become available in the coming weeks.

Currently, developers can access GPT-4o's text and vision capabilities through the API.
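As a rough sketch, a combined text-and-image request to GPT-4o through the Chat Completions API looks like the following (the image URL is a placeholder; audio input was not exposed through the API at launch):

```python
# Sketch of a text + vision request to GPT-4o via the Chat Completions API.
# The image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What equation is on this whiteboard?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/whiteboard.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```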

In addition, OpenAI has also optimized the user interface (UI) of ChatGPT and launched the ChatGPT app for macOS, which is available to paying users. The company said it will also launch a Windows version of the ChatGPT app later this year.

Will Apple Replace Its Own Voice Assistant Siri with GPT-4o?

The launch of GPT-4o drove Apple's stock price up slightly.

On Friday, Bloomberg reported that Apple is considering integrating ChatGPT technology into its next-generation iOS 18 system. If a deal with OpenAI is reached, Apple could launch a ChatGPT-based chat assistant as one of a series of new AI features the company plans to release in June.


(Bloomberg report)

For many years, Apple was a favorite of top investors and institutions, including Warren Buffett, and long ranked as the largest technology company by market capitalization, but it has recently underperformed the other tech giants.

Apple's stock is down about 2% year to date, while Microsoft's is up more than 10%. Thanks to its leadership in AI, especially its deep partnership with OpenAI and the infusion of AI into its cloud business and office suite, Microsoft has become the world's most valuable company, a lead that looks set to hold for some time.

Looking at the market capitalization of the rest of the Magnificent 7: Google, with Gemini, grew 20%; Meta, owner of the open-source large language model LLaMA, rose 32%; Amazon, an investor in star AI startup Anthropic, gained 22%; and Nvidia, the chipmaker known as the AI industry's "arms dealer", surged as much as 82%. (Note: the Magnificent 7 are seven tech companies with monopoly or oligopoly positions, pricing power, and long-term profitability: Microsoft, Google, Meta, Amazon, Nvidia, Apple, and Tesla.)

Analysts generally attribute Apple's slowdown to weak growth in its core iPhone business and the absence of new AI product lines. Although Siri launched in 2011 as an AI voice assistant, it lags far behind rivals from Google, Amazon, and OpenAI in accuracy and usefulness.

Meanwhile, smartphone competitors have beaten Apple to new AI features. For example, Samsung Electronics' recently launched high-end Galaxy phones ship with the latest generative AI technology, offering real-time language translation, note summarization, and photo editing.

In the face of pressure from all sides, Apple announced in February that it would cancel a decade-long plan to build a car and transfer some employees to the generative AI team, signaling that AI will become a focus for the company's future development.

On a May 2 conference call with analysts, Tim Cook said Apple's ability to seamlessly integrate hardware, software, and services gives it an edge in the AI era. The CEO said he had tried ChatGPT last year and felt it still had many problems to solve. He repeatedly stressed that Apple would introduce new AI features "very deliberately", which may explain why it has been slow to roll out an AI product line.

So does GPT-4o live up to Cook's criteria? We should find out at Apple's annual Worldwide Developers Conference in June.
