
OpenAI Releases New Large Model GPT-4o

Author: Rui Finance

Recently, OpenAI, a leader in the field of artificial intelligence, released a new AI model, GPT-4o, a breakthrough hailed as "changing the history of human-computer interaction overnight". GPT-4o not only supports voice chat but also enables real-time video interaction that feels as fluid as talking to another person. The arrival of this technology will undoubtedly open up new opportunities in the field of artificial intelligence.


OpenAI's ambitions

OpenAI's flagship product, ChatGPT, can understand natural language and answer users' questions, but because it is pre-trained on a fixed corpus, it cannot search for up-to-the-minute content. In addition, the generation mechanism of large language models means ChatGPT cannot entirely avoid "talking nonsense in all seriousness", that is, hallucination. So people who want to stay current with real-time content still have to turn to search engines.

Traditional search engines are built on keyword matching: they identify the search scope from the keywords the user enters and match them against a massive pool of information that may fit the user's intent. The pain point of traditional search is the redundancy and inconsistency that come with huge volumes of information from different sources, so users retrieve a flood of results yet often find nothing useful.
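
As a rough illustration of that redundancy problem, here is a minimal keyword-matching sketch in Python; the toy corpus, query, and overlap scoring are invented for illustration and bear no relation to any real engine's ranking.

# Toy keyword-matching retrieval: score documents by how many query
# terms they contain. Corpus and query are invented for illustration.

corpus = [
    "GPT-4o supports real-time voice and video interaction",
    "GPT-4o real-time voice demo draws attention",          # near-duplicate
    "Voice assistants: a history of real-time interaction",
    "Cooking recipes for a quick dinner",
]

def keyword_search(query: str, docs: list[str]) -> list[tuple[int, str]]:
    terms = set(query.lower().split())
    scored = []
    for doc in docs:
        words = set(doc.lower().split())
        score = len(terms & words)          # count of overlapping terms
        if score:
            scored.append((score, doc))
    # Highest term overlap first; redundant near-duplicates rank side
    # by side, which is exactly the pain point described above.
    return sorted(scored, reverse=True)

for score, doc in keyword_search("real-time voice interaction", corpus):
    print(score, doc)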

OpenAI clearly wants to be a key connection point between humans and data, and ChatGPT alone (even the smartest GPT) can meet only part of that demand, so launching a search engine is imperative. What the industry most wants to know is what form OpenAI's search engine will take, and whether it can really shake the search-market ecosystem Google has dominated for so long.

Before OpenAI, the United States already had a generative search engine: Perplexity. Founded in 2022, Perplexity is a Silicon Valley startup that uses artificial intelligence to provide direct answers to search queries instead of a list of website links. Perplexity weaves videos, images, and other media into its answers and sometimes links directly to sources. It has won fans including Nvidia CEO Jensen Huang and reached 10 million monthly active users within a year and a half of its founding.

So, will OpenAI's search engine resemble Perplexity, or will it bring bigger surprises? We will have to wait for OpenAI's final reveal.

GPT-4o is not only completely free but also available in desktop and mobile apps, with greatly improved performance; it can process text, images, and audio in one model, making human-computer interaction more natural and simple. For example, users can have GPT-4o join a web conference and produce a summary of the discussion.

What exactly is GPT-4o for? Users can hand it the problem at hand and greatly improve their productivity, or hold real-time voice conversations with it that feel as natural and smooth as chatting with a real person. The model responds at human conversational speed and can even read the user's emotions and react accordingly.
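
For developers, the same multimodal ability is exposed through OpenAI's public API. A minimal sketch, assuming the official openai Python SDK, an OPENAI_API_KEY in the environment, and a placeholder image URL:

# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixing text and an image; gpt-4o handles both natively.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this image and note anything unusual."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)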

Steal the limelight from Microsoft

Facing OpenAI's deliberately timed launch and its seizure of the limelight, what AI products did Google bring to today's I/O conference, and were they shocking and novel enough?

The Google I/O developer conference is in its 16th year, and AI has long since become its absolute, even sole, protagonist. Google CEO Sundar Pichai joked at the end that "AI" had been said 121 times over the course of the event, drawing laughter from the audience. Although competitors were never mentioned, Pichai set out to showcase Google's AI prowess from the start of the keynote, declaring that Google has fully entered the Gemini era. He stressed that Google has been investing in AI for more than a decade, across every layer of the stack: research, products, and infrastructure.

While AI upstart OpenAI has a first-mover advantage in product launches, Google holds an overwhelming advantage in research papers, user scale, product breadth, and computing power, which is a direct reason OpenAI must ally with Microsoft: neither company could compete with Google alone.

Pichai also announced that the Gemini model now reaches 2 billion users across Google's platforms, with more than 1 million users signing up in just three months. The natively multimodal Gemini 1.5 Pro, released two months ago, is already used by more than 1.5 million developers.

In terms of performance, Google is the Thanos of the AI industry. Gemini 1.5 Pro pushed the context window to the million-token level, comprehensively overpowering the slower GPT-4 Turbo. Three months on, Google announced today that the improved Gemini 1.5 Pro is generally available to Gemini Advanced users in 35 languages.

More brutal still, Google has doubled Gemini 1.5 Pro's context window to 2 million tokens (available only to developers for now), a figure OpenAI can so far only aspire to. Pichai called this a significant step toward the ultimate goal of infinite context.

What kind of experience does Gemini 1.5 Pro bring to users? Google used Workspace to demonstrate the dramatic productivity gains AI can deliver. For example, in a remote meeting on Google Meet, you can ask Gemini to record the session and compile the minutes for you, even if you cannot attend.

With Gemini, Gmail gains a soul. Drafting an email is now a basic operation: users can ask Gemini to organize and summarize the mountain of messages in Gmail, for example collating recent receipts and credit-card statements into a professional, itemized list of spending.
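
These features live inside Workspace, but developers can approximate the same long-context summarization through the Gemini API. A minimal sketch, assuming the google-generativeai Python SDK, an API key in the GOOGLE_API_KEY environment variable, and a placeholder transcript file:

# pip install google-generativeai
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# The million-token window means a long transcript can be passed whole,
# with no chunking or retrieval pipeline in between.
model = genai.GenerativeModel("gemini-1.5-pro")

with open("meeting_transcript.txt", encoding="utf-8") as f:  # placeholder file
    transcript = f.read()

response = model.generate_content(
    ["List the key decisions and action items from this meeting:", transcript]
)
print(response.text)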

Equipping AI with eyes and a mouth

Zhou Hongyi noted that, judging from the brief technical overview at OpenAI's launch event, GPT-4o departs from the traditional approach of converting speech to text, processing the text, and converting the result back into speech. Instead, it processes speech directly in a single integrated large-model engine, understanding voice input directly, including the emotion, feeling, intonation, and accent it carries, and producing speech directly as output.

"This brings a new experience, that is, the delay is only about 300 milliseconds, which reaches the response speed of human conversation, so that not only can you understand the emotions in your words, but also can be accompanied by happiness, sadness, disappointment, excitement or more complex feelings when outputting answers." Zhou Hongyi said.

Zhou Hongyi also pointed out that beyond the astonishing voice capability, there is an easily overlooked point: GPT-4o can directly open the phone's camera, gaining a far more powerful ability to see through it. This may not be as striking as Sora, but it is a step up from GPT-4V's still-image input. "To sum up, GPT-4 gave artificial intelligence the ability to understand knowledge, the equivalent of a brain; GPT-4V added a rudimentary ability to see; and GPT-4o adds eyes that can truly understand the world, ears that can understand human speech, and a mouth that can freely express its feelings and emotions."

In Zhou Hongyi's view, some people will be disappointed that OpenAI did not launch GPT-5 this time, but the path to artificial general intelligence is not only about surpassing humans in reasoning, knowledge, and logic; the ability to interact with people matters even more. When AI can see the world through both phone cameras and ubiquitous IoT cameras, and can interact with people at human response speed, it becomes formidable indeed: "that is what makes artificial intelligence truly more human-like."

To sum up, artificial intelligence is advancing at a breathtaking pace, and every technological breakthrough brings new surprises. OpenAI's release of GPT-4o and Google's Gemini 1.5 Pro announcements at I/O are both important breakthroughs that will open new opportunities for the field and bring more convenience to our lives. At the same time, AI development still faces many challenges, such as how to keep AI safe and how to prevent its abuse, questions we must keep thinking about and exploring as the technology develops.
