
A stronger AI suite and a unified Gemini: watching Google I/O's last-stand counterattack

Author: Silicon Star Man

This year, Google's Shoreline Amphitheatre feels more like a gladiatorial arena than ever before.

The day before, OpenAI turned the world upside down with GPT-4o and the new ChatGPT, and how Google would respond at this year's Google I/O — its most important in years — seemed to be the only topic anyone cared about.

In fact, according to OpenAI sources, GPT-4o had been in development for at least two years, and Silicon Valley's AI circle is small and trades information constantly. So not only did OpenAI have the chance to deliberately time its strike just before Google's conference — Google would also have seen it coming.

So, when Pichai took center stage, the counterfire began.


In this two-hour launch, Google played both offense and defense.

It gave Search, its home-turf service, its most thorough AI transformation yet, and once again refreshed the entire Gemini model family.

It defended the fronts where OpenAI is attacking, and counterattacked at the same time.

On one hand, it released Veo, a video model squarely aimed at Sora — and, unlike Sora, one that people can apply to try immediately. It also showed Gemini Live, a voice-and-vision interaction feature similar to GPT-4o, and introduced Project Astra, an AI agent more radical than anything competitors such as OpenAI have shown.

What follows is a report from the scene.

Gemini, Gemini, and more Gemini

When Google CEO Sundar Pichai took the stage, the word "Gemini" came up even more often than "Google" in his first few minutes.

Gemini was the core model officially launched at Google I/O last year, and a year later, Google has used it to complete an internal "unification". The model is Gemini, the smart assistant is Gemini, and the core of Android is Gemini. Pichai no longer even calls Google's people Googlers; now they are:

Geminary.


At the conference, the Gemini models were the first to be updated. A few months after the long-context Gemini 1.5 Pro launched in preview, it is now officially available to everyone. The previous version offered a context window of 1 million tokens, and Pichai, almost understating it, announced:

The new version doubles that, reaching 2 million tokens.

The developers at the scene erupted in the first cheers of the day.

"We're officially in the Gemini era," Pichai said, getting straight to the point. More than 1.5 million developers are now building with Gemini, and Gemini has passed 1 million subscribers in the past three months.

The details on Gemini came, of course, from Demis Hassabis, CEO of DeepMind. It was also the legend's first Google I/O appearance.


The first release in his presentation was Gemini 1.5 Flash, a lightweight, faster model that also comes in 1-million and 2-million-token versions. It seems to point to Google's ambitions for efficient, at-scale deployment.



"We always have many models training at the same time, and we use our strongest models to help train the smaller ones."

Google has also updated the standalone Gemini app and launched a higher-tier subscription service, Gemini Advanced — the top tier that benchmarks against ChatGPT Plus.

Within this service, one new feature looks like a direct answer to yesterday's ChatGPT update: Gemini Live. You can hold real-time, low-latency voice conversations with the AI — exactly what GPT-4o demonstrated yesterday. Unfortunately, this part was only mentioned in passing; Google seemed mostly intent on telling the world that, even a day late, OpenAI is not the only one who can do this.

Still, there was a hint of disappointment in the room; people clearly wanted to see a more tit-for-tat response.

An AI Agent with Visual Memory

Then came the big one.

An AI agent full of ambition.

In the face of OpenAI's offense, you can't just defend. Google also needed something more radical to fight back with. That something is Project Astra, an AI agent still under development. Pichai described it as Google's long-standing dream: to build a truly powerful AI agent.

Hassabis, CEO of Google DeepMind, took the stage to explain it and show a video of the Astra prototype in action.


Yes, the video deliberately captures a meaningful handoff

At first, the demonstration looked like the AI agents we have already seen: it recognizes objects through the camera the user turns on and converses with the user by voice in real time. The amazing moment came at the end. After the user had carried Astra around for a while, they suddenly asked a question that had never come up:

"Do you remember where I put my glasses?"

The glasses had never been mentioned, but Astra had "seen" them as the camera panned past — the agent had actually retained a visual memory of them.

"Your glasses are next to the apples on the table." Astra replied.

The audience gasped, and it drew the longest applause of the event.

In addition, Google clearly went on the offensive against Sora, releasing a brand-new video generation model: Veo. It is the culmination of many of Google's past visual models — and can also be read as an internal consolidation of resources forced by Sora.

In Veo's demo, users can keep extending a generated video by clicking "extend", letting it run past Sora's initial one minute while maintaining consistency.


Google also highlighted its collaboration with artists in developing these products. It seemed aimed squarely at the artists unhappy with Sora: come over here — I'm better.

Finally, Search: the biggest makeover in Google Search's history

Beyond the response to OpenAI's attack, there was one thing everyone wanted to know: how is Google revamping Search?

What Google would do with Search was the big moment everyone was waiting for. OpenAI's earlier feints and Perplexity's constant needling had made Google look far too quiet. This time, Search finally gets its biggest update yet.

When hundreds of millions of U.S. users open Google today, they'll see the biggest changes in recent years.

AI Overviews — AI-generated summaries of search answers — now appear under everyone's search box.

Moreover, the summary is not a fixed template; it adapts to your question.

For example, Google can plan for you based on your query. The in-progress steps appear under the search box, and then the Overview shows you different cards that organize the information you need.


Google says only a strong real-time search foundation makes this possible — the implication being that companies without a search infrastructure of their own shouldn't try to pick this fight.

And the revamp of Search is just the beginning; Search looks set to become the super-portal through which Google stimulates users' demand for AI.

For example, Google showed a scenario where, even when users don't know what to ask, Google can make suggestions and brainstorm with you. Here the search interface changes completely, into a feed of different cards, each of which can be acted on further.

"Google will Google for you." That's Pichai's definition of this.

Going a step further, Google showed searching through a real-time video conversation — also the first live demo, half an hour in.


The room went quiet for a while as people waited for the live demo; a stir went through the crowd when a cart with a computer was wheeled onstage

Say you've bought a record player you know nothing about, and it has a playback problem you can't identify — you can simply turn on the camera and ask.

And Google directly returned answers and solutions organized by AI.

"This is search in the Gemini era," Pichai said. The applause rang out again.


The whole app suite gets an AI upgrade

A staple of Google I/O is the showcase of new features across Google's suite of apps. With the Gemini era arriving, those apps naturally get updates too.

The first application Pichai showed was "Ask Photos". Google Photos launched nine years ago; today, 6 billion photos and videos are uploaded every day. Gemini now makes AI-powered search and editing easy.

You can now hold a conversation with your photo library. For example, ask the Photos app, "What is my license plate?" Gemini finds your car in your photos and tells you the answer.

Or you can ask Photos, "When did my daughter learn to swim?" and then follow up with, "How is she progressing?" Photos shows you the corresponding photos and videos. This is genuinely useful for parents who document their children's growth on their phones every day.


This demo, too, drew cheers from the audience.

Workspace also has a raft of new features. Google showed a teaching tool built on multimodal capability: ask for a lesson by voice — say, "give me an example explaining the physics of basketball" — and it responds aloud in a natural voice.

Another feature that caught the live audience's eye was Android's use of Gemini. In a live demo, a scam call came in — the kind we all encounter — and after some stern pressure, the caller asked for the money to be transferred to a "safe account".

The moment those words were spoken, Gemini was triggered: a warning box popped up, stopping the call from going any further.

The audience erupted in what may have been the second-longest cheer of the day.

At the very end, Pichai joked that someone out there must be counting how many times AI was said today.

"No need to count — Gemini has already done it."

The big screen then showed the number: 120.

"That's how many times I've said AI."


Then Gemini ticked the counter up by one, to 121.

The room burst out laughing.

It's clear that Google is still consolidating its resources. Whether it's upgrading the whole app suite or transforming Search, the logic behind it is the same: take the capabilities and resources Google has accumulated over the years, make Gemini the single brain, transform everything, and both keep existing users and win new ones.

Google isn't leaving the table without a fight, and the AI war will go on.