
Google's two-hour I/O conference mentioned "AI" 121 times and released more than a dozen updates and new products, but "lacked surprises"

Author: Tech Financial Times

In the early morning of May 15, Beijing time, Google's annual I/O developer conference 2024 was held at the Shoreline Amphitheatre near the company's headquarters in Mountain View, California. The two-hour event was hosted by Google CEO Sundar Pichai.

Just one day earlier, OpenAI had released GPT-4o and a new version of ChatGPT, so expectations were high for what kind of AI answer Google would deliver at its developer conference.

The Paper (www.thepaper.cn) noted that, according to a tally Sundar Pichai said was produced by Gemini, "AI" was mentioned 121 times during the event, and more than a dozen product updates and new products were released. Outside commentators, however, said that compared with OpenAI's GPT-4o launch, which ran less than 30 minutes, Google's event "lacked surprises".

The scene of Google's annual developer I/O conference

At this developer conference, Google carried out its most thorough AI transformation of the search business, updated the Gemini 1.5 Pro model, and launched Gemini 1.5 Flash, a lightweight small model. In addition, Google launched Veo, a generative video model it claims performs better than Sora; Gemini Live, a voice-and-vision interaction similar to GPT-4o; and Project Astra, an AI agent.

Gemini upgraded, with a lighter version added

At the conference, Google announced an update to the Gemini model family. Google had previously launched Gemini 1.5 Pro, a mid-sized multimodal model with a context window of 1 million tokens.

At the developer conference, Sundar Pichai announced that the upgraded Gemini 1.5 Pro, with improvements to its data and training algorithms, is better at code generation, logical reasoning and planning, multi-turn dialogue, and audio and image understanding. It can follow increasingly complex and nuanced instructions, including those that specify production-level behaviors such as role, format, and style. Developers using the API and Google Cloud customers can join a waitlist to access Gemini 1.5 Pro with a 2-million-token context window.
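
For developers, access to the larger context window goes through the Gemini API. Below is a minimal sketch assuming the google-generativeai Python SDK; the model name, placeholder API key, and file path are illustrative only, and the 2-million-token window itself still requires waitlist approval.

```python
# Minimal sketch of calling Gemini 1.5 Pro through the Gemini API.
# Assumes the google-generativeai Python SDK; the model name, API key,
# and file path are placeholders for illustration only.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder

model = genai.GenerativeModel("gemini-1.5-pro-latest")

# A long document can be passed directly in the prompt, relying on the
# model's long context window instead of manual chunking.
with open("long_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = model.generate_content(
    ["Summarize the key decisions in this report:", document]
)
print(response.text)
```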

The upgraded Gemini 1.5 Pro will be available to developers worldwide, supporting 35 languages in more than 150 countries.

In addition, to meet users' needs for low latency and low cost, Google released a lightweight model, Gemini 1.5 Flash, at this press conference.

Gemini 1.5 Flash

Compared with Gemini 1.5 Pro, this version is characterized by faster response times and a cost as low as $0.35 per million tokens. Gemini 1.5 Pro, aimed at users who need higher-quality output, charges $7 per million tokens, which makes Flash roughly one-twentieth the price.

Despite its small size, Gemini 1.5 Flash offers a long context window of 1 million tokens, and developers can register to try 2 million tokens. It is suitable for tasks such as summarization, chat applications, image and video captioning, and data extraction from long documents and tables. According to Google, these capabilities come from a process called "distillation", in which the larger Gemini 1.5 Pro is used to train the smaller model, transferring the larger model's core knowledge and skills to a smaller, more efficient one.
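
The article names distillation but does not describe Google's training recipe. Purely as a generic illustration of the idea, the sketch below shows the classic teacher-student setup in PyTorch, where a small "student" model learns to match the softened output distribution of a large "teacher"; the teacher, student, optimizer, and batch objects are hypothetical.

```python
# Generic knowledge-distillation sketch (not Google's actual recipe):
# a small "student" is trained to match the temperature-softened output
# distribution of a large, frozen "teacher", plus the usual hard-label loss.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, batch, T=2.0, alpha=0.5):
    inputs, labels = batch
    with torch.no_grad():
        teacher_logits = teacher(inputs)   # large model, frozen
    student_logits = student(inputs)       # small model being trained

    # Soft targets: KL divergence between temperature-softened distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```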

Released the AI agent Project Astra

"For a long time, we have had a dream to create a universal AI agent to help make people's lives more convenient. Now, we've been honing our sword for years to launch Project Astra's universal AI agent. Sundar Pichai said at the press conference. Based on the Gemini model, the agent is able to process information faster by continuously encoding video frames, combining video and voice inputs into an event timeline, and caching this information for efficient recall.

At the conference, Demis Hassabis, CEO of Google DeepMind, took the stage to explain Project Astra and show a video of the prototype in operation.

In the presentation, Google showed a person holding a mobile phone, pointing its camera at various parts of an office and interacting with it by voice: "Let me know when you see something that makes a sound." In the video demonstration, Astra was able to recognize various objects, and even code, and interact with the person in real time.

During the demo, the user suddenly asked Astra a question it had not been prompted about before: "Do you remember where I put my glasses?"

"Your glasses are next to the apples on the table." Astra replied. This process caused exclamations from the scene.

This suggests that Astra had "seen" the glasses earlier, as the camera panned the room, and had recorded them visually.

But for those who had watched the GPT-4o demo the day before, Astra's demonstration seemed to lack surprises.

Google says that in the future, people will be able to get on-the-go help from an expert AI assistant through their phones or glasses. However, these capabilities will not arrive in Google products such as the Gemini app and web experience until later this year.

Released Veo, a video generation model to rival Sora

At the press conference, Demis Hassabis announced that Google has officially released a new video generation model, Veo, positioning it as a fierce new rival to Sora.

Google claims that Veo can create high-quality 1080p videos longer than 60 seconds from text and image prompts, with users able to set lighting, camera language, color style, and more. Veo is also able to understand film and visual techniques, such as the concept of a time-lapse.

Users can generate a video simply by writing a text prompt, for example: "Pan across tranquil mountain terrain, the camera slowly revealing snow-capped peaks, granite rocks, and a clear lake reflecting the sky," or "A spaceship shuttles through the vastness of space, stars streaking past, high speed, sci-fi."

Like Sora, Veo is not yet publicly available; for now it is open only to a small number of creators.

Search engine upgrade, combined with Gemini

Liz Reid, Google's head of search, said at the launch that Google Search has gone through many technological changes over the past 25 years: "We are constantly reimagining and expanding what Google Search can do."

Reid said that with the help of AI, Google Search can now do more than people think. Google is combining Gemini's capabilities, including multi-step reasoning, planning, and multimodality, with its search system to launch AI Overviews. With AI Overviews, users can upload a video demonstrating the problem they want to solve, and the search will look through forums and other corners of the internet to find solutions.

In addition, users can ask complex questions of custom Gemini models. Even when users are not sure exactly what to ask, Google can make recommendations and brainstorm with them. Users can also chat directly with Gemini to pull detailed information from their entire inbox.

"From answering, planning, and customizing requirements to organizing and video searches, Google does it for you, and all you have to do is ask questions." "However, AI Overviews will be launched in the United States first.

Trillium: an update to AI infrastructure

Training a large model requires enormous computing power. Halfway through the conference, The Paper noticed that Sundar Pichai quietly announced Google's sixth-generation tensor processing unit (TPU), Trillium. Google calls it "the most performant and energy-efficient TPU to date," with peak compute performance per chip improved 4.7x over the previous-generation TPU v5e. Google will offer Trillium to its cloud customers later this year.

It is also worth noting that Google launched a series of new AI features on the Android platform. The "Circle to Search" feature already let users search without switching apps; now, Google says, it can also serve as a study companion for complex problems such as math questions and charts. The feature is currently available on more than 100 million Android devices, a number expected to double by the end of the year.

Source: The Paper (Reporter Yu Yan)

Editor: Wang Shu