
Google's Gemini 1.5 Pro upgraded to a 2-million-token context, now open to developers worldwide

Author: DeepTech

On the afternoon of May 14, local time, Google held its annual I/O developer conference in Mountain View, California.

Over the 110-minute keynote, "artificial intelligence" (AI) was mentioned 121 times, underscoring Google's all-in posture and unmistakable ambition in the field.

Gemini, Google's flagship model, and its various iterations stole the show. Google is integrating it into nearly all of its own products, including Android, Search, the Chrome browser, and Gmail, and the demos were dazzling.

Previously, Google Gemini came in three versions, Ultra, Pro, and Nano, differing in size, performance, and intended use cases.

At the conference, Google unveiled a new version, Gemini 1.5 Flash. Google says the multimodal model is as capable as Gemini 1.5 Pro but is optimized for "high-frequency, low-latency tasks," making it better suited to producing quick responses.


(Source: Google)

Google has also upgraded Gemini 1.5, which it says improves the model's translation, reasoning, and coding abilities. In addition, Google says it has doubled Gemini 1.5 Pro's context window (the amount of information the model can take in at once) from 1 million tokens to 2 million.
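For developers, the larger window surfaces through the Gemini API. Below is a minimal sketch, assuming preview access via Google's google-generativeai Python client; the file path is a placeholder, and the exact model identifier carrying the 2-million-token window during the preview may differ:

```python
# pip install google-generativeai
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# "gemini-1.5-pro" is the long-context family; 2M-token access may
# require joining the preview waitlist.
model = genai.GenerativeModel("gemini-1.5-pro")

# Load a large document (placeholder path) and check it fits the window.
with open("big_document.txt", encoding="utf-8") as f:
    document = f.read()
print(model.count_tokens(document))  # verify we stay under the context limit

response = model.generate_content(
    ["Summarize the key points of this document:", document]
)
print(response.text)
```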

Both Gemini 1.5 Pro and 1.5 Flash are currently in public preview. Google also disclosed that more than 1.5 million developers now build with Gemini, and that more than 2 billion users have experienced Gemini-powered features.

With Gemini, several Google products are gaining new features. For example, Google Photos will add Ask Photos later this year, making it easier to search photos, distinguish photo backgrounds, and even find a photo by its license plate number or answer other questions about a photo's contents.

Google CEO Sundar Pichai said on stage that Gemini can "turn any input into any output." This means that it can extract information from text, photos, audio, social or web videos, and live video from your phone's camera, integrate it, and finally summarize it and answer questions.

Google showed a demo video in which a person scans the books on a shelf with a camera, and the titles are recorded in a database for later retrieval.


(Source: Google)

Another highlight of the conference: Google announced Astra, a new system launching later this year that it promises will be the most powerful and advanced AI assistant it has ever shipped.

Current AI assistants such as ChatGPT can retrieve information and provide answers, but little more. This year, Google is rebranding its assistants as more advanced "agents," which it says have reasoning, planning, and memory skills and can take multiple steps to complete a task.

Oriol Vinyals, vice president of research at Google DeepMind, told MIT Technology Review that people will be able to use Astra through smartphones and possibly desktop computers, and that the company is also exploring other form factors, such as embedding it in smart glasses or other devices.

It's worth noting that sharp-eyed viewers spotted what appeared to be a prototype of Google Glass in the demo video shown at I/O, suggesting Google may have revived its earlier, failed smart-glasses project.


(Source: Sean Hollister / The Verge)

"We're in the early stages [of AI agent development]." Google CEO Pichai said this in a conference call ahead of the I/O conference.

"We have always wanted to build a general-purpose agent that can be useful in everyday life." Demis Hassabis, CEO and co-founder of Google's DeepMind, said.

"Imagine these agents seeing and hearing what we're doing, better understanding the environment we're in, and reacting quickly in conversations, making the speed and quality of interactions more natural." He added, "This is what Astra will look like in the future." ”

The day before Google's I/O conference, its rival OpenAI unveiled its own super-assistant, GPT-4o. Google DeepMind's Astra responds to audio and video input in a very similar way.

In Google's demo video, a user points a smartphone camera and smart glasses at objects and asks Astra to explain what it is seeing. When the user points the device out a window and asks, "What neighborhood do you think I'm in?", the system identifies the King's Cross area of London, where Google DeepMind is headquartered.

It can also remind the user that their glasses are on the table, because it registered this in an earlier interaction.

Vinyals said the demo showcased Google DeepMind's vision for real-time multimodal AI that can process many types of input at once, including voice, video, and text.
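Astra itself is not a public API, but the same kind of multimodal request can be approximated today with the public Gemini API. A minimal sketch, assuming access to a gemini-1.5-flash preview model and a hypothetical camera still named window_view.jpg:

```python
# Approximates an Astra-style query: one camera frame plus a question.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash")  # low-latency variant

frame = Image.open("window_view.jpg")  # hypothetical still from a phone camera

# Interleave the image and the question in one prompt; the model grounds
# its answer in the pixel content of the frame.
response = model.generate_content(
    [frame, "What neighborhood do you think I'm in?"]
)
print(response.text)
```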

"We're very excited to be able to really get close to our users and provide them with any help they want in the future." He said. Google has also upgraded its AI model, Gemini, to handle larger amounts of data, an upgrade that helps it process larger documents and videos, and have longer conversations.

Tech companies are vying for dominance in AI and eager to show they are pushing the technology's frontier, and AI agents have become their darlings.

Many tech companies, including OpenAI and Google DeepMind, have made AI agents central to their narratives. Their stated goal is artificial general intelligence (AGI), a super-capable AI system that remains largely a vision.

"Ultimately, you'll have an agent who really understands you, can do a lot for you, and can work across multiple tasks and domains," said Chirag Shah, a professor at the University of Washington who specializes in online search. ”

That is still a vision, but Google's launches are its latest effort to keep pace with rivals. By shipping these products, Shah said, Google can collect data from more than a billion users about how they use its models and which ones work.

At the I/O conference, Google introduced more new AI features in addition to AI agents.

Google is integrating AI more deeply into Search through a new feature called AI Overviews, which gathers information from across the web and condenses it into short summaries presented as part of the search results. The feature is already live in the United States and will reach more countries and regions later.

Felix Simon, a researcher on AI and digital journalism at the Reuters Institute for the Study of Journalism, said this could speed up search and give users more specific answers to complex, niche questions.

"I think that's where search has been hard to get right." He said.

Another new capability of Google's AI search is planning. People will soon be able to ask Search for dining and travel suggestions, much as they might ask a travel agent to recommend restaurants and hotels.


Figure | Artificial intelligence helps solve math problems (Source: Google)

Give Gemini a recipe and it can help the user plan what to do and what to buy. Users can also talk to the system and ask it to complete tasks ranging from the simple, such as reporting the weather, to the complex, such as helping prepare for a job interview or an important speech.

People can also interrupt Gemini mid-response and ask clarifying questions, just as they would with another person. Notably, the GPT-4o that OpenAI demonstrated a day earlier has the same capability.

To further counter rival OpenAI, Google also launched Veo, a new generative video AI system. Veo can generate short videos and understands prompts such as "time-lapse" or "aerial shot of a landscape," giving users more control over the style of their clips.

Google has a significant advantage in training video-generation models because it owns YouTube. The company has already announced collaborations with artists such as Donald Glover and Wyclef Jean, who are using its technology to create their own work.

Earlier this year, when asked whether OpenAI's models were trained on YouTube data, OpenAI chief technology officer Mira Murati did not give a definitive answer.

Douglas Eck, senior research director at Google DeepMind, was similarly vague when MIT Technology Review asked about the data used to train Veo, saying only that "training may be done on some YouTube content based on our agreements with YouTube creators."

Shah pointed to a tension: on the one hand, Google promotes its generative AI as a tool artists can use to create; on the other, these tools likely learn to create something new from the work of existing artists.

AI companies such as Google and OpenAI face a series of lawsuits from writers and artists who claim their intellectual property has been used without consent or compensation.

"It's a double-edged sword for artists." Shah said.

Finally, to better distinguish AI-generated content from real content, Google has expanded its SynthID watermarking tool, which is designed to help flag AI-generated misinformation, deepfakes, and phishing spam.

SynthID embeds an imperceptible watermark in generated content, one that is invisible to humans but can be detected by software that analyzes pixel data. The tool can now scan content from the Gemini app, content on the web, and video generated by Veo. Google says it plans to release SynthID as an open-source tool later this summer.
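Google has not published SynthID's exact algorithm. As a rough intuition for how a keyed, imperceptible pixel watermark can work, here is a toy spread-spectrum-style sketch; it is illustrative only and is not SynthID itself:

```python
# Toy illustration of a keyed, imperceptible pixel watermark --
# NOT SynthID's actual algorithm, which is not public in detail.
import numpy as np

def embed(image: np.ndarray, key: int, strength: float = 2.0) -> np.ndarray:
    """Add a low-amplitude pseudo-random pattern derived from a secret key."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    return np.clip(image + strength * pattern, 0, 255)

def detect(image: np.ndarray, key: int) -> float:
    """Correlate the image with the keyed pattern; a clearly positive
    score suggests the watermark is present, near zero suggests not."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    return float(np.mean((image - image.mean()) * pattern))

rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(256, 256))
marked = embed(original, key=42)

print(detect(marked, key=42))    # ~2.0: watermark detected
print(detect(original, key=42))  # ~0.0: no watermark
```

The key idea is that the embedded pattern is statistically invisible without the key, while correlating against the keyed pattern makes detection straightforward.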

References:

https://www.wired.com/story/everything-google-announced-at-io-2024/

https://www.technologyreview.com/2024/05/14/1092407/googles-astra-is-its-first-ai-for-everything-agent/

https://www.theverge.com/2024/5/14/24156518/google-glass-prototype-ar-glasses-io-2024

Support: Ren

Typesetting: Luo Yi
