Artificial intelligence can also provide "emotional value": OpenAI releases new large model GPT-4o

Author: Modern Express

In the early morning of May 14, Beijing time, the artificial intelligence research company OpenAI announced at its spring update event that it had officially launched GPT-4o, a new multimodal large model. The release is a notable step forward for generative AI and gives users a more natural way to interact with it. GPT-4o quickly drew wide attention in the industry for its text, audio, and image processing capabilities, its fast responses, and the fact that it is available free of charge.

Responses in as little as 232 milliseconds, with conversation comparable to talking to a real person

GPT-4o is OpenAI's latest flagship model. The "o" in its name stands for "omni", meaning "all-round". The model accepts not only text but any combination of text, audio, and images as input, and can generate text, audio, and image output in response. This makes GPT-4o far more flexible in human-computer interaction.

At the event, OpenAI demonstrated GPT-4o's real-time interactive capabilities. Whether the input is speech or an image, GPT-4o can give an accurate response within a short time. Users can talk to ChatGPT as naturally as they would to an assistant, and can interrupt it while it is answering. The new model can also pick up on the emotion in a user's voice and generate speech in different emotional styles, much as a real person would.

For audio input in particular, GPT-4o can respond in as little as 232 milliseconds, with an average of 320 milliseconds, which is comparable to human response times in conversation. This near-real-time interaction gives GPT-4o considerable potential in voice assistants, intelligent customer service, and similar fields.

For example, in a demo video released by OpenAI, when the user points the camera at a birthday cake and candles, GPT-4o quickly infers that someone is celebrating a birthday. When the user asks for a birthday song, GPT-4o sings it like a real person, in a playful rather than stiff tone.

Notably, GPT-4o will be available free of charge to all users. This lowers the barrier to using AI and lets more people experience the convenience it brings. OpenAI is also offering additional benefits to Plus subscribers, including a usage limit up to five times higher.

GPT-4o's capabilities come from end-to-end training across text, vision, and audio. This means all inputs and outputs are processed by the same neural network, which allows information from the different modalities to be integrated and generated efficiently. This training approach not only improves overall performance but also makes GPT-4o particularly strong at image and audio understanding.

OpenAI CEO Sam Altman said that the original ChatGPT showed a hint of what a language interface could be, while the new ChatGPT feels very different. "It's fast, smart, fun, natural, and useful."

"For me, talking to a computer has never really been natural, but now it is. I really see an exciting future where we can do more things with computers than ever before," Altman said.

GPT-4o may set off a new wave of AI applications

The multimodal model makes significant advances in text, voice, and video, greatly expanding the application potential of AI. The launch of GPT-4o may speed up the rollout of AI applications and push AI technology into a wider range of fields. At the same time, its multimodal interaction capabilities bring both new opportunities and new challenges for the technology.

Sun Linjia, a Silicon Valley serial entrepreneur and the founder and CEO of Traini, told a reporter from Yicai (China Business Network) that OpenAI's latest release has upgraded and changed the way people interact with AI. "Voice is becoming a more stable interface, widening the physical boundaries of human interaction with products. And when you talk to GPT in real time, GPT can respond emotionally, which is cool. AGI was lacking in emotion before." He also said that the field of artificial intelligence is moving toward AI applications, which will create real value in everyday life.

Others, however, are more cautious about GPT-4o's prospects. They believe that although GPT-4o is a major technical advance, problems and challenges remain in practical use, such as how to protect privacy and security when the model processes user data, and how to prevent it from making misjudgments or misleading users.

In response, OpenAI said it has built new safety systems to provide guardrails on GPT-4o's voice output. GPT-4o also underwent extensive external evaluation by more than 70 experts in areas such as social psychology, bias and fairness, and misinformation, to identify risks that the newly added modalities introduce or amplify.

Commenting on a demo in which Khan Academy founder Sal Khan used GPT-4o to tutor his son on a math problem, AI software developer Mckay Wrigley wrote on the social platform X: "This demo is crazy. Students share their iPad screens with GPT-4o, and the AI converses with them to help them learn in real time. Imagine if every student in the world could learn like this. The future would be so bright."

Some internet users feel that OpenAI has further widened its lead over Apple, saying that GPT-4o's features "instantly crush Siri".

According to foreign media reports, however, Apple is close to reaching an agreement with OpenAI to bring some of the latter's technology to the iPhone this year, offering a chatbot powered by ChatGPT as part of the artificial intelligence features in iOS 18.

Apple is also reported to be in talks with Google about licensing the Gemini chatbot. Google will hold its I/O developer conference one day after OpenAI's spring update event. It has already referred to the "Gemini era" in the official blog post for the conference, and is expected to announce the latest developments in its Gemini large model.

Compiled by Modern Express/Modern+ reporter Long Qiuli
