
Google "Strikes Back" at OpenAI's New Model GPT-4o

Author: Data Ape

Yesterday, OpenAI's latest multimodal large model GPT-4o stole the limelight; today, Google struck back at its I/O developer conference, with product updates benchmarked against OpenAI at every turn.

Before the event, Google posted a video on its official Twitter account showing a person chatting with Gemini while pointing a phone camera at the I/O stage.

When asked what it saw through the camera, Gemini replied, "It looks like people are preparing for a big event, maybe a conference or a presentation. Is there anything special that catches your eye?"

When asked to interpret the letters displayed on stage, Gemini said they stood for Google I/O, and, following a further prompt, said it was excited to learn about new advances in artificial intelligence and how they can help people in their daily lives.

Its smooth, human-like tone of speech and its ability to recognize its surroundings are reminiscent of GPT-4o, which was launched only yesterday.

Like GPT-4o's demo, this appears to be a closed beta of the latest Gemini that is not yet available to the public. GPT-4o, too, is currently limited to text and image capabilities for users; its real-time voice mode will roll out in the coming weeks.

Later at the conference, Google showcased its voice AI assistant Gemini Live and Project Astra, a multimodal AI project that could provide technical support for the new Gemini.

Gemini Live supports real-time interaction, and users can interrupt the chatbot mid-conversation at any time. Integrated with Google Lens, it lets users search by recording video and describing what they see, and its large context window allows it to draw on large amounts of information quickly, making interactions with the AI assistant more natural and fluid.

Gemini Live will be available in 10 voice options and will go live later this year, when Google will make Gemini Live available to Gemini Advanced subscribers.

Project Astra is led by Demis Hassabis, head of Google DeepMind, who envisions Astra as an always-on, all-around assistant, ever-present like the communicator in Star Trek or the AI voice in the film Her.

Coincidentally, just yesterday OpenAI CEO Sam Altman also compared GPT-4o to the movie Her on Twitter.

Astra is designed to operate in real-time, answering questions or assisting with tasks through conversation, supporting a variety of interactions, including voice, text, drawing, photography, and video.

In the demo video, Astra helps an employee at Google's London office find his missing glasses, reviews code on a whiteboard, and more, all in real time and in a conversational manner.

And that was not even the most tit-for-tat moment: during the keynote, Google CEO Sundar Pichai and a string of executives kept emphasizing that "Gemini was built to be multimodal from its very inception," which reads as a pointed jab at OpenAI, which had just launched its first multimodal large model, GPT-4o.

Google also updated its flagship AI model, Gemini. The latest Gemini 1.5 Pro gains a larger context window, growing from the previously supported 1 million tokens to 2 million tokens in the future.

Launched in February, Gemini 1.5 Pro is a mid-size multimodal model optimized for scaling across tasks, with a standard context window of 128,000 tokens. Through AI Studio and Vertex AI, a small group of developers and enterprise customers can already use an extended context window of 1 million tokens, which means Gemini 1.5 Pro can process up to 1 hour of video, 11 hours of audio, a codebase of more than 30,000 lines, or a document of more than 700,000 words in a single pass.
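Taking the 1-million-token capacity figures above at face value, one can sketch rough per-medium token rates and a simple budget check. The rates below are back-of-envelope derivations from this article's numbers, not official tokenization figures:

```python
# Rough token rates implied by the 1M-token context figures above
# (approximate; actual tokenization varies by content).
CONTEXT_TOKENS = 1_000_000

VIDEO_TOKENS_PER_HOUR = CONTEXT_TOKENS / 1    # ~1 hour of video fills 1M tokens
AUDIO_TOKENS_PER_HOUR = CONTEXT_TOKENS / 11   # ~11 hours of audio fill 1M tokens
TOKENS_PER_WORD = CONTEXT_TOKENS / 700_000    # ~700k words fill 1M tokens

def fits_in_context(video_h=0.0, audio_h=0.0, words=0, budget=CONTEXT_TOKENS):
    """Estimate token usage for a mixed payload and whether it fits the budget."""
    used = (video_h * VIDEO_TOKENS_PER_HOUR
            + audio_h * AUDIO_TOKENS_PER_HOUR
            + words * TOKENS_PER_WORD)
    return used, used <= budget

# e.g. half an hour of video plus a 100k-word transcript:
used, ok = fits_in_context(video_h=0.5, words=100_000)
```

On these rates, that example consumes roughly 640,000 tokens and fits comfortably in the 1M-token window; the forthcoming 2M-token window simply doubles the budget.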

The faster, more efficient, and cheaper Gemini 1.5 Flash was also unveiled at the conference. Gemini 1.5 Pro starts at $7 per 1 million tokens, and Gemini 1.5 Flash starts at $0.35 per 1 million tokens.
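At these starting rates, per-call costs are easy to estimate. A minimal sketch, using only the prices quoted above and ignoring any input/output or tiered-pricing distinctions the full price list may include:

```python
# Starting per-million-token prices quoted in the announcement (USD).
PRICE_PER_MTOK = {"gemini-1.5-pro": 7.00, "gemini-1.5-flash": 0.35}

def estimate_cost(model: str, tokens: int) -> float:
    """Estimated USD cost for processing `tokens` tokens at the starting rate."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[model]

# A full 2M-token context call on 1.5 Pro vs. the same load on 1.5 Flash:
pro_cost = estimate_cost("gemini-1.5-pro", 2_000_000)      # 14.0
flash_cost = estimate_cost("gemini-1.5-flash", 2_000_000)  # 0.7
```

The 20x price gap is the point of Flash: the same maxed-out context that costs $14 on 1.5 Pro would cost about $0.70 at Flash's starting rate.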

Starting today, developers can try out Gemini 1.5 Flash through Google AI Studio and Vertex AI.


Google says that the Gemini 1.5 Pro will soon be available in the side panel of Workspace, automating workflows across apps.

Gemini is also coming to Google Photos. With the new "Ask Photos" feature, users can query their photos through the chatbot instead of manually scrolling through thousands of them. For example, to recall your license plate number, you can simply ask Gemini, "What is my license plate number?" rather than typing the keyword "license plate" and browsing every matching photo; Gemini will identify and extract the plate number belonging to your vehicle. The feature is scheduled to roll out to all Google Photos users later this summer.

As part of this update to the AI model lineup, Gemma 2, the next generation of Google's open models, will launch in June. Built on a new architecture with 27 billion parameters, it reportedly outperforms models twice its size and can run on a single TPU host in Vertex AI. Google also introduced PaliGemma, the first vision-language model in the Gemma family. Notably, the Gemma models released earlier this year came only in 2-billion- and 7-billion-parameter sizes, so this update significantly expands the family's scale and capabilities.

LearnLM, a family of models fine-tuned from Gemini and grounded in educational research, was also presented at the conference.


There is also Imagen 3, a new image model that benchmarks against DALL-E 3 and reportedly renders an "incredible level of detail," producing realistic images with fewer telltale AI artifacts.


(Image generated by Imagen 3)

Three months after OpenAI demonstrated its text-to-video model Sora, Google launched a competitor, Veo, which supports a wide range of visual and cinematic styles and can generate high-quality 1080p videos longer than a minute.

Veo boasts advanced natural language understanding and accurately interprets cinematic terms such as "time-lapse" and "aerial shot."

At this year's Google I/O developer conference, AI remained at the center of every conversation, and almost every feature update was tied to it: Gemini continues to be woven into Google Search, integrates more tightly with Gmail, powers AI Teammate (similar to Microsoft's Copilot office assistant), and will bring more AI services to Android phones.

Judging from this conference, the gap between Google and OpenAI appears to be narrowing. The two companies are competing fiercely not only at the technical level but also in pushing AI into a wider range of real-world applications. It is hard to say who will ultimately win, but the competition will clearly drive both to keep innovating and shipping more cutting-edge technologies and solutions.

At the end of the event, Pichai, whom Chinese netizens have nicknamed "Brother Firewood" (a pun on his surname), humorously noted that he had done the counting for everyone: "AI" was mentioned 120 times during the keynote.
