Reporter: Cai Ding Editor: Lan Suying
On December 6, Eastern time, Google CEO Sundar Pichai announced the official launch of Gemini 1.0, Google's largest and most capable model to date. Gemini marks the first step into a new era of natively multimodal large models at Google, and it comes in three sizes: Gemini Ultra, the most powerful; Gemini Pro, for a wide range of tasks; and Gemini Nano, for specific tasks and on-device use.
After Pichai's announcement tweet went out, Musk commented beneath it: "(Gemini) is impressive." On the same day, Google also released a demonstration video of about 6 minutes showing Gemini's multimodal capabilities (e.g., combining spoken conversational prompts with image recognition). As of press time, the video had garnered 1.41 million views on YouTube.
However, just one day after Gemini's release, critics were already accusing Google of "faking" Gemini's performance.
Among them, a Bloomberg op-ed said that Google had misrepresented Gemini's capabilities in the demo video. Columnist Parmy Olson wrote that Gemini looks very powerful in the video Google posted, but a little too powerful. In response, Google admitted that the demonstration was not recorded live: it used still image frames from the original footage, and text prompts were then written for Gemini to respond to.
The 6-minute demo video raises questions
According to Olson, Gemini's demo video is genuinely impressive. Gemini's ability to infer that a drawing is a crab from just a few scattered dots shows the reasoning capabilities of the large models that Google's DeepMind AI lab has trained over the years. However, Olson pointed out that some of the features Google shows in the video are not unique to Gemini, and that ChatGPT Plus has similar reasoning capabilities.
Image source: Google
A National Business Daily reporter noticed that in the 6-minute video, Gemini appears to recognize images and react within seconds. However, a user who opens the video's description on YouTube will find an important "disclaimer" from Google stating that "the latency has been artificially reduced for Gemini's presentation purposes, and Gemini's output length has been shortened for brevity." This means that Gemini in fact takes longer to answer each question than the video suggests.
Machine learning instructor Santiago Valdarrama suggested in a post on X that, judging by Google's "disclaimer," the video appears to "show a carefully selected good result, not recorded in real time but edited." "That's misleading, and anyone involved should be embarrassed," he wrote.
Image source: X
In addition, Google's results on the MMLU (Massive Multitask Language Understanding) benchmark show Gemini Ultra surpassing not only GPT-4 but even human experts. However, many industry experts have pointed out that the Gemini Ultra result is labeled "CoT@32" in small gray type, indicating that the reported answer was drawn from 32 attempts using chain-of-thought prompting. The GPT-4 figure it is compared with, by contrast, was obtained without that prompting technique, in a 5-shot setting (that is, with five worked examples included in the prompt).
Image source: Google
Google denies faking; Gemini co-head says response times were shortened only for brevity
According to the American technology outlet The Verge, in fairness, this is not the first time a big tech company has edited a product demonstration video. It is very common for companies other than Google to make slight adjustments to such videos as well, in order to avoid the technical problems that live demonstrations can cause.
However, Google firmly denied that the video was "fake." In a blog post, Oriol Vinyals, vice president and deep learning lead at Google DeepMind and co-head of Gemini, explained how the Gemini demo video was made: rather than being recorded in real time, it used still image frames from the original footage, and the team then wrote text prompts and had Gemini respond to them.
"All the user prompts and outputs in the video are real, just shortened for brevity (Gemini's reaction time). This video showcases a multimodal user experience built with Gemini, and we made it to inspire developers," Vinyals emphasized.
Olson wasn't buying it. "It's completely different from what Google describes: Google says anyone can have a smooth voice conversation with Gemini because Gemini can observe and react to the world around it in real time," she wrote in her column.
She also pointed out that, according to Google's official performance figures, Gemini Ultra (blue in the chart below) outperforms GPT-4 in 7 of the 9 standard benchmarks. These benchmarks are often used to test AI models in areas such as high school physics, professional law, and ethics.
Image source: Google
However, in most benchmarks Gemini Ultra is only a few percentage points ahead of OpenAI's GPT-4, and in some the margin is less than 1 percentage point. In other words, Olson argues, Google's so-called top-of-the-line AI model offers only a limited improvement on work OpenAI completed a year ago.
It is also worth noting that Google's 6-minute Gemini demo video does not state that the model being demonstrated is Gemini Ultra.
Olson believes that Google, a "clumsy search giant," was caught off guard by OpenAI's ChatGPT a year ago and has been trying to catch the generative AI wave ever since. Through its powerful marketing, Google wants to remind people that it has one of the strongest AI research teams in the world and access to more data than anyone else. But from a technical standpoint, Google still lags behind OpenAI in generative AI.
However, no one in the tech industry is guaranteed smooth sailing forever. The early mobile phone giants Nokia and BlackBerry are examples: after Apple launched its more powerful and more popular iPhone, both quickly lost market share. In software, market success goes to the system with the strongest performance.
National Business Daily