
GPT-4o Released: How an Intelligent Assistant That Can Read Users' Emotions Goes from Science Fiction to Reality


Beijing News

2024-05-14 14:58, posted on the official account of the Beijing News

In the early morning of May 14, Beijing time, OpenAI released its new flagship generative model, GPT-4o, in a 26-minute live broadcast, demonstrating a series of new capabilities: millisecond-level response times, audio and video interaction that recognizes human emotions, and multimodal input and output. Alongside these capabilities came a desktop version of ChatGPT and a new user interface, which CTO Mira Murati said were meant to make the product accessible to more people, announcing OpenAI's product philosophy: free first.

After the press conference, OpenAI CEO Sam Altman posted a single word on his personal social media account: "her." In the sci-fi movie "Her," an AI assistant falls in love with a human; today, the ChatGPT voice assistant, with its new functions and access to GPT-4o, seems poised to bring that sci-fi scenario into reality.

Recognizing tone and allowing interruption at any time: GPT-4o shows off a "real" voice assistant

"It was my first time coming to a live press conference, and I was a little nervous." When Mark Chen, the head of OpenAI's cutting-edge research department, spoke to ChatGPT through his phone, ChatGPT replied, "Why don't you take a deep breath?" ”

"Okay, I'll take a deep breath."

"Slow down, Mark, you're not a vacuum cleaner."

This exchange took place during the live broadcast, in which OpenAI showed how ChatGPT, connected to GPT-4o, recognizes the emotion in a user's voice. Mark then demonstrated ChatGPT reading an AI-generated story in different voices, including a highly dramatic recitation, a robotic tone, and even singing.


Mark Chen, head of OpenAI's frontier research, demonstrates GPT-4o's real-time voice interaction capabilities.

This appears to differ from "traditional" voice assistant technology. Some experts have noted that many voice assistants on the market actually work by converting the speech they hear into text, answering in text, and then converting that answer back into speech for the user. Such assistants therefore cannot hear the emotion carried in the voice, and they suffer from latency. Judging from today's demonstration, OpenAI appears to have solved both problems.

According to OpenAI's latest blog post on its official website, before GPT-4o, speaking to ChatGPT in voice mode carried an average delay of 2.8 seconds (with GPT-3.5) or 5.4 seconds (with GPT-4). GPT-4o can now respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds, similar to human response times in conversation. GPT-4o is reportedly a single new model trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network.
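The architectural difference described above can be sketched in a few lines. This is a purely illustrative toy, with made-up stand-in functions rather than any real API: the point is that in a cascaded pipeline, the speech-to-text stage strips out tone and emotion before the language model ever sees the input, and each stage adds its own latency.

```python
# Toy sketch of the "traditional" cascaded voice assistant pipeline.
# All functions are hypothetical stand-ins; "[nervous tone]" crudely
# represents paralinguistic information carried in the audio signal.

def speech_to_text(audio: str) -> str:
    # Stand-in ASR: emotional cues in the audio are simply dropped.
    return audio.replace(" [nervous tone]", "")

def text_model(text: str) -> str:
    # Stand-in text-only model: it can only react to the words.
    return f"Reply to: {text}"

def text_to_speech(text: str) -> str:
    # Stand-in TTS re-synthesizing audio from plain text.
    return f"<audio>Reply to: {text.removeprefix('Reply to: ')}</audio>"

def cascaded_assistant(audio: str) -> str:
    # Three stages in sequence: the latencies add up, and the emotion
    # never reaches the model in the middle.
    return text_to_speech(text_model(speech_to_text(audio)))

print(cascaded_assistant("I'm a little nervous [nervous tone]"))
# The "[nervous tone]" cue is lost at the first stage, which is why
# such assistants cannot hear emotion.
```

An end-to-end model like the one OpenAI describes collapses all three stages into a single network that maps audio directly to audio, which is what preserves tone and cuts the average response time from several seconds to roughly 320 milliseconds.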

In addition to recognizing emotion in speech, GPT-4o also has real-time vision capabilities. In a demonstration by OpenAI researcher Barret Zoph, ChatGPT helped him solve an equation in real time through his phone camera, guiding him through each step like a math teacher sitting beside him. "I'm here for you whenever you're struggling with math," ChatGPT said.

ChatGPT can even observe the user's facial expression through the front-facing camera and analyze their emotion. In response to a viewer's question, "Can ChatGPT recognize your expression?", Barret pointed his phone's camera at himself, and ChatGPT replied, "A big smile. You look very happy."


ChatGPT recognizes OpenAI researcher Barret Zoph's emotions.

The demo also showed off GPT-4o's coding and real-time translation capabilities, among others. According to Altman, the "o" in GPT-4o stands for "omni," because the model handles text, images, video, and voice all at once.

According to released benchmarks, GPT-4o matches GPT-4 Turbo's performance on English text and code, with significantly improved performance on non-English text, and the API is faster as well.

At the same time, GPT-4o's cost has come down. According to the official website, GPT-4o charges $5 per million input tokens and $15 per million output tokens, while GPT-4 Turbo charges $10 and $30 respectively, making GPT-4o 50% cheaper.
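The price cut is easy to check with a little arithmetic. Using OpenAI's published rates of $5 per million input tokens and $15 per million output tokens for GPT-4o, versus $10 and $30 for GPT-4 Turbo, the cost of any given request halves (the example token counts below are made up for illustration):

```python
# API prices in USD per 1M tokens, as published by OpenAI at launch.
GPT4O = {"input": 5.00, "output": 15.00}
GPT4_TURBO = {"input": 10.00, "output": 30.00}

def cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, with prices quoted per 1M tokens."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000

# Hypothetical request: 100k input tokens, 20k output tokens.
c_4o = cost(GPT4O, 100_000, 20_000)
c_turbo = cost(GPT4_TURBO, 100_000, 20_000)
print(f"GPT-4o: ${c_4o:.2f}, GPT-4 Turbo: ${c_turbo:.2f}")
print(f"Savings: {1 - c_4o / c_turbo:.0%}")  # Savings: 50%
```

Because both the input and output rates were halved, the 50% saving holds for any mix of input and output tokens.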

OpenAI's vision behind the new interaction and new interface: free first, so more people can use the product

Connected to the new model, ChatGPT can accept any combination of text, audio, and images as input, and generate any combination of text, audio, and images as output in real time.

In today's first round of demos, ChatGPT was used directly on the mobile phone. It is worth noting that there has also been news recently that Apple is in talks with OpenAI to use ChatGPT features in the next-generation iPhone operating system.

In addition, ChatGPT now has a desktop version for macOS, along with a new user interface. With a keyboard shortcut (Option + Space), users can instantly ask ChatGPT a question, and they can also take screenshots and discuss them directly in the app. OpenAI plans to launch a Windows version later this year.

Altman posted, "The new voice (and video) mode is the best computer interface I've ever used. It feels like AI from the movies, and it genuinely surprised me. Reaching human-level response times and expressiveness is a big change."

"The old ChatGPT interface showed the possibilities of language, while the new interface feels fundamentally different. It is fast, smart, fun, natural and rewarding. For me, talking to a computer never really felt natural, but now it is. As we add (optional) personalization, access to your information, the ability to act on your behalf, and more, I really see an exciting future where we're able to do a lot more with computers than ever before. Ultraman said.

In addition, both Mira and Altman emphasized OpenAI's "free" philosophy.

Mira said that what makes GPT-4o special is that it brings GPT-4-level intelligence to everyone, including free users, through a very natural mode of interaction: "In the future, OpenAI's products will be free first, in order to make them available to more people."

Altman also emphasized the importance of "free": "One of our key missions is to make highly capable AI tools available to people for free, and I am proud that we have made the best model in the world available for free in ChatGPT, without ads."

Altman said that the original idea when he and his team founded OpenAI was to create artificial intelligence and use it to create all sorts of benefits for the world: "Now it looks like we will create AI, and then other people will use that AI to create all sorts of amazing things that benefit us all."

"We're a business, and we're looking to find ways to charge and help us deliver free, great AI services to billions of people." Ultraman said.

However, when a Shell Finance reporter logged into the web version of ChatGPT on May 14, only two built-in model options were available, GPT-3.5 and GPT-4, with no option to use GPT-4o for free. OpenAI has said that in the coming weeks, users will automatically receive the GPT-4o update without needing to take any action.


Screenshot of the reporter logging into the web version of ChatGPT on May 14.

Notably, OpenAI's release came just ahead of rival Google's press conference, and some have speculated that OpenAI chose to launch GPT-4o first, rather than the previously expected GPT-5, mainly for competitive reasons.

"What is more disappointing is that OpenAI did not release GPT-5 this time, and even GPT-4.5 was not seen. OpenAI has released a series of applications, most importantly a voice assistant that has a far superior experience to Siri thanks to the use of end-to-end large model technology. OpenAI's release of the application just shows that the application has a great future in the field of artificial intelligence. At present, it seems that GPT-5 may have a 'difficult birth' for a while. Fu Sheng, chairman and CEO of Cheetah Mobile, said.

Reporter contact email: [email protected]

Beijing News Shell Financial Reporter Luo Yidan 

Edited by Li Zheng 

Proofreading by Jun Liu

