
OpenAI model is finally updated! GPT-4o, with its powerful audio-visual capabilities, will be available to all users

author:51CTO

Edit | Yifeng

Produced by | 51CTO Technology Stack (WeChat ID: blog51cto)

Spring is finally here! The GPT series has received its long-awaited update: GPT-4o has surfaced.

Moreover, the previously mysterious "im-also-a-good-gpt2-chatbot" turns out to have been its test version.

Sam Altman did not appear in this launch; instead, it was hosted by OpenAI CTO Mira Murati. She has previously drawn some controversy in interviews over a lack of clarity about OpenAI's training data.

What did OpenAI announce at its spring launch? In a word, GPT-4o is faster, more multimodal, and cheaper!


1. The latest model, GPT-4o

The model update that made Altman exclaim "amazing work" is here!


As you can see, GPT-4o's benchmark performance is hard to beat. (As an aside, Alibaba's Tongyi Qianwen model quietly appears on the right side of this chart.)

The new large language model, trained on massive amounts of data from the Internet, is better at working with text and audio, and supports 50 languages.

OpenAI's updated GPT-4o generative AI model will officially roll out to developers and consumers in the coming weeks. The new model will be available to all users, though paid users will continue to "have five times the capacity limit of free users."

OpenAI CTO Mira Murati said GPT-4o provides "GPT-4 level" intelligence while improving on GPT-4's capabilities across text, vision, and audio.

Speaking in a keynote at OpenAI's office, Murati said, "The strength of GPT-4o is that it spans speech, text, and vision. This is very important, because we are looking at the future of human-machine interaction."

GPT-4, OpenAI's previous flagship model, was trained on a combination of images and text; it can analyze both, handling tasks such as extracting text from images or describing image content. GPT-4o adds voice capabilities on top of this foundation.

This matches the direction many had guessed: "ChatGPT + Voice Agent"!


Nvidia scientist Jim Fan shared his predictions before the live stream

2. GPT-4o's powerful "audio-visual" capabilities

OpenAI CEO Sam Altman posted that the model is "natively multimodal," meaning it can generate content from, and understand commands in, speech, text, or images.

What exactly can GPT-4o achieve in terms of speech?

GPT-4o dramatically improves the experience of ChatGPT, OpenAI's viral AI chatbot. ChatGPT has long offered a voice mode that uses a text-to-speech model to read ChatGPT's responses aloud. GPT-4o improves on this, letting users interact with ChatGPT more like an assistant.

For example, a user can ask ChatGPT, which is powered by GPT-4o, a question and interrupt ChatGPT as it answers. OpenAI says the model can provide a "real-time" response and can even capture emotion in the user's voice and generate a "range of different emotional styles" of speech.

GPT-4o also improves ChatGPT's visual abilities. Given a photo or a desktop screenshot, ChatGPT can now quickly answer related questions, from "What's going on in this code?" to "What brand of shirt is this person wearing?"

"We know that these models are getting more and more complex, but we want the interactive experience to actually become more natural and easy, so that you don't have to focus on the user interface at all, but only on working with [GPT]," Murati said.

OpenAI claims that GPT-4o is also more multilingual, with improved performance across 50 languages. Altman added on X that developers can access GPT-4o through the API at half the price and twice the speed of GPT-4 Turbo.

3. Closing thoughts

The launch of OpenAI's model GPT-4o with powerful audio capabilities gives us a further glimpse into the future of virtual assistants.

A tech blogger familiar with the matter said the timing of this release also signals that OpenAI and Apple have reached a deal, meaning the future of Siri may be powered by ChatGPT!


If OpenAI, Microsoft, and Apple all join hands, then Google, the "Wang Feng of AI" (a Chinese joke about always being upstaged), will truly be left in the awkward position of fighting alone.

Tomorrow, Google's developer conference begins. OpenAI's rush to release a product update now is clearly an attempt to steal Google's limelight!

So, what do you think Google can release to turn the tables?

Reference Links:

1. https://techcrunch.com/2024/05/13/openais-newest-model-is-gpt-4o/

2. https://www.theverge.com/2024/5/13/24155493/openai-gpt-4o-launching-free-for-all-chatgpt-users?showComments=1

Source: 51CTO Technology Stack
