
Google "Strikes Back" at OpenAI's New Model GPT-4o

Author: Data Ape

Yesterday, OpenAI's latest multimodal large model GPT-4o stole the limelight; today, Google struck back at its I/O developer conference, with product updates benchmarked against OpenAI at every turn.

Before the event, Google posted a video on its official Twitter account showing a person chatting with Gemini while pointing a phone camera at the I/O stage.

When asked what it saw through the camera, Gemini replied, "It looks like people are preparing for a big event, maybe a conference or a presentation. Is there anything special that catches your eye?"

When asked to interpret the letters displayed on stage, Gemini said they stood for Google I/O, and, following a further prompt, said it was excited to learn about new advances in artificial intelligence and how they can help people in their daily lives.

Its smooth, human-like tone of speech and its ability to recognize its surroundings are reminiscent of GPT-4o, which was launched only yesterday.

Like GPT-4o's demo, this appears to be a closed beta of the latest Gemini that is not yet available to the public. GPT-4o, too, is currently limited to text and image capabilities for users; its real-time voice mode will roll out in the coming weeks.

Later at the conference, Google showcased its voice AI assistant Gemini Live and Project Astra, a multimodal AI project that could provide technical support for the new Gemini.

Gemini Live supports real-time interaction, and users can interrupt the chatbot mid-conversation at any time. Integrated with Google Lens, it lets users search by recording video and describing what they see, and its large context window allows it to draw on large amounts of information quickly, making interactions with the AI assistant more natural and fluid.

Gemini Live will be available in 10 voice options and will go live later this year, when Google will make Gemini Live available to Gemini Advanced subscribers.

Project Astra is led by Demis Hassabis, head of Google DeepMind, who envisions Astra as an always-on, all-around assistant, ever-present like the communicator in Star Trek or the AI voice in the film Her.

Coincidentally, just yesterday OpenAI CEO Sam Altman also compared GPT-4o to the movie Her on Twitter.

Astra is designed to operate in real-time, answering questions or assisting with tasks through conversation, supporting a variety of interactions, including voice, text, drawing, photography, and video.

In the demo video, Astra helps an employee at Google's London office find his missing glasses, reviews code on a whiteboard, and more, all in real time and in a conversational manner.

And that was not even the most tit-for-tat moment: during the keynote, Google CEO Sundar Pichai and a string of executives kept emphasizing that "Gemini was built to be multimodal from its very inception," which reads as a pointed jab at OpenAI, which had just launched its first multimodal large model, GPT-4o.

Google also updated its flagship AI model, Gemini. The latest Gemini 1.5 Pro gains a larger context window, growing from the previously supported 1 million tokens to 2 million tokens in the future.

Launched in February, Gemini 1.5 Pro is a mid-size multimodal model optimized for scaling across tasks, with a standard context window of 128,000 tokens. Through AI Studio and Vertex AI, a small group of developers and enterprise customers can already use an extended context window of 1 million tokens, which means Gemini 1.5 Pro can process up to 1 hour of video, 11 hours of audio, a codebase of more than 30,000 lines, or a document of more than 700,000 words in a single pass.
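Taking the 1-million-token capacity figures above at face value, one can sketch rough per-medium token rates and a simple budget check. The rates below are back-of-envelope derivations from this article's numbers, not official tokenization figures:

```python
# Rough token rates implied by the 1M-token context figures above
# (approximate; actual tokenization varies by content).
CONTEXT_TOKENS = 1_000_000

VIDEO_TOKENS_PER_HOUR = CONTEXT_TOKENS / 1    # ~1 hour of video fills 1M tokens
AUDIO_TOKENS_PER_HOUR = CONTEXT_TOKENS / 11   # ~11 hours of audio fill 1M tokens
TOKENS_PER_WORD = CONTEXT_TOKENS / 700_000    # ~700k words fill 1M tokens

def fits_in_context(video_h=0.0, audio_h=0.0, words=0, budget=CONTEXT_TOKENS):
    """Estimate token usage for a mixed payload and whether it fits the budget."""
    used = (video_h * VIDEO_TOKENS_PER_HOUR
            + audio_h * AUDIO_TOKENS_PER_HOUR
            + words * TOKENS_PER_WORD)
    return used, used <= budget

# e.g. half an hour of video plus a 100k-word transcript:
used, ok = fits_in_context(video_h=0.5, words=100_000)
```

On these rates, that example consumes roughly 640,000 tokens and fits comfortably in the 1M-token window; the forthcoming 2M-token window simply doubles the budget.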

The faster, more efficient, and cheaper Gemini 1.5 Flash was also unveiled at the conference. Gemini 1.5 Pro starts at $7 per 1 million tokens, and Gemini 1.5 Flash starts at $0.35 per 1 million tokens.
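At these starting rates, per-call costs are easy to estimate. A minimal sketch, using only the prices quoted above and ignoring any input/output or tiered-pricing distinctions the full price list may include:

```python
# Starting per-million-token prices quoted in the announcement (USD).
PRICE_PER_MTOK = {"gemini-1.5-pro": 7.00, "gemini-1.5-flash": 0.35}

def estimate_cost(model: str, tokens: int) -> float:
    """Estimated USD cost for processing `tokens` tokens at the starting rate."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]

# A full 2M-token context call on 1.5 Pro vs. the same load on 1.5 Flash:
pro_cost = estimate_cost("gemini-1.5-pro", 2_000_000)      # 14.0
flash_cost = estimate_cost("gemini-1.5-flash", 2_000_000)  # 0.7
```

The 20x price gap is the point of Flash: the same maxed-out context that costs $14 on 1.5 Pro would cost about $0.70 at Flash's starting rate.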

Starting today, developers can try out Gemini 1.5 Flash through Google AI Studio and Vertex AI.


Google says that the Gemini 1.5 Pro will soon be available in the side panel of Workspace, automating workflows across apps.

Gemini is also coming to Google Photos. With the new "Ask Photos" feature, users can query their photos through the chatbot instead of manually scrolling through thousands of them. For example, to recall your license plate number, you can simply ask Gemini, "What is my license plate number?" rather than typing the keyword "license plate" and browsing every matching photo; Gemini will identify and extract the plate number belonging to your vehicle. The feature is scheduled to roll out to all Google Photos users later this summer.

As part of this update to the AI model lineup, Gemma 2, the next generation of Google's open models, will launch in June. Built on a new architecture with 27 billion parameters, it reportedly outperforms models twice its size and can run on a single TPU host in Vertex AI. Google also introduced PaliGemma, the first vision-language model in the Gemma family. Notably, the Gemma models released earlier this year came only in 2-billion- and 7-billion-parameter sizes, so this update significantly expands the family's scale and capabilities.

LearnLM, a family of models fine-tuned from Gemini and grounded in educational research, was also presented at the conference.


There is also Imagen 3, a new image model that benchmarks against DALL-E 3 and reportedly renders an "incredible level of detail," producing realistic images with fewer telltale AI artifacts.


(Image generated by Imagen 3)

Three months after OpenAI demonstrated its text-to-video model Sora, Google launched a competitor, Veo, which supports a wide range of visual and cinematic styles and can generate high-quality 1080p videos longer than a minute.

Veo boasts advanced natural language understanding and accurately interprets cinematic terms such as "time-lapse" and "aerial shot."

At this year's Google I/O developer conference, AI remained at the center of every conversation, and almost every feature update was tied to it: Gemini continues to be woven into Google Search, integrates more tightly with Gmail, powers AI Teammate (similar to Microsoft's Copilot office assistant), and will bring more AI services to Android phones.

Judging from this conference, the gap between Google and OpenAI appears to be narrowing. The two companies are competing fiercely not only at the technical level but also in pushing AI into a wider range of real-world applications. It is hard to say who will ultimately win, but the competition will clearly drive both to keep innovating and shipping more cutting-edge technologies and solutions.

At the end of the event, Pichai, whom Chinese netizens have nicknamed "Brother Firewood" (a pun on his surname), humorously noted that he had done the counting for everyone: "AI" was mentioned 120 times during the keynote.
