laitimes

Google's large model is accused of faking its demo right after going viral!

Technology giant Google's new large model Gemini became an overnight sensation and won over the market, but some analysts say Google appears to have overstated the model's capabilities in its promotional materials.

On December 6, local time, Google announced the launch of Gemini, which it describes as its "largest, strongest, and most versatile" large language model. According to Google, Gemini will be the first large model to run directly on a mobile phone, and it is being used in Google's Pixel 8 Pro smartphone and its chatbot Bard. Gemini is seen as a direct response to GPT-4, the latest large model from leading AI company OpenAI, and it signals that Google, which was caught on the back foot by the chatbot ChatGPT, is finally back in the race.

According to Google, Gemini scored 90.0% on MMLU (Massive Multitask Language Understanding), making it the first model to outperform human experts on that test. Gemini comes as a suite of models in three sizes: Gemini Ultra is positioned as a competitor to GPT-4, Gemini Pro outperforms GPT-3.5, and Gemini Nano targets specific tasks and mobile devices.

With its powerful performance, Gemini became an overnight hit and attracted the attention of Wall Street. On December 7, shares of Google's parent company Alphabet (Nasdaq:GOOG) rose 5.31% to close at $136.93, their best day since August 29 this year, giving the company a total market capitalization of $1.72 trillion.

Bank of America analysts pointed out on the 6th that Alphabet has been under some pressure this year due to concerns about Google's AI capabilities, and that a "highly competitive model with a good brand image" may attract more consumers to Google search and boost sales of cloud services: "The data shows that Google has first-class, unrepeatable AI capabilities, which may have a positive impact on the company's stock trend in the first half of 2024."

JPMorgan analysts wrote in a note on the 6th that although the market did not give a significant reaction to Gemini on the day, it was "encouraging" to see Google's progress in "this major technological shift". However, JPMorgan Chase & Co. also pointed out that there is "uncertainty about the monetization path of large models in the search space", which may bring some headwinds in the future.

In a report on the 7th, analysts at JPMorgan Chase wrote: "While it is still early in its development, the launch of Gemini symbolizes a major innovation by Google in the second year that generative AI has been widely commercialized and widely disseminated."

For now, Wall Street's main interest is how Google will commercialize Gemini across its business in general, and in search in particular. Google plans to license Gemini to customers through Google Cloud later this month and to integrate it into other Google products and services in the coming months, but it has not yet announced a further commercialization strategy.

Analysts at Wells Fargo said the launch of Gemini should be enough to quell the debate about where Google goes from here, but the key question is how Google can monetize Gemini: "In short, I think Google has shown that they still have some competitiveness."

Analysts at KeyBanc also said that Gemini is the "pinnacle" of Google's many AI announcements this year, but it will take time for AI to have a positive impact on Google's performance growth and profitability: "Gemini is still trying to get into core products like search, so we recommend being patient and watching the impact."

In contrast to Wall Street's overall bullishness, some voices in the tech community suspect Google of "exaggerated publicity" around Gemini.

Shortly after Gemini launched on the 6th, some netizens pointed out problems in the promotional materials. For example, Google says Gemini's MMLU score beats GPT-4's 86.4%. But in Google's 60-page technical report, Gemini Ultra's MMLU result carries a small-print note, "CoT@32", indicating that it was obtained with chain-of-thought prompting over 32 sampled attempts, with the final answer chosen from among those samples. GPT-4's 86.4%, by contrast, was measured with 5-shot prompting (five worked examples and no chain of thought), and under that standard Gemini Ultra actually scores 83.7%, below GPT-4's 86.4%.

When GPT-4 is evaluated with the same CoT@32 method, it scores 87.29%, which is still below Gemini Ultra's 90.0% but by a narrower margin.
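To make the difference between the two scoring setups concrete, here is a minimal Python sketch. It assumes that the final answer under a "@32" setup is chosen by a simple majority vote over 32 sampled answers, which is only one plausible reading and not Google's actual evaluation code. The `sample_answers` function is a hypothetical stand-in for a model, and the 60% per-sample accuracy is an invented illustration, not a measured figure.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most common answer and the share of samples that chose it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def sample_answers(correct, options, p_correct, k, rng):
    """Hypothetical stand-in for a model: draw k answers to one
    multiple-choice question, each correct with probability p_correct."""
    wrong = [o for o in options if o != correct]
    return [
        correct if rng.random() < p_correct else rng.choice(wrong)
        for _ in range(k)
    ]


options = ["A", "B", "C", "D"]
rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed numbers for illustration only: 32 samples, 60% per-sample accuracy.
samples = sample_answers("B", options, p_correct=0.6, k=32, rng=rng)

single = samples[0]                    # one attempt, as in a single-answer setup
voted, share = majority_vote(samples)  # consensus over all 32 attempts

print("single attempt:", single)
print("majority of 32:", voted, f"({share:.0%} of samples)")
```

The point of the sketch is that a score produced from many attempts per question and a score produced from one attempt per question measure different things, so the two should not be compared directly.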

Comparison of MMLU test scores between Gemini and GPT under various conditions. Source: Google

Even if, as Jeff Dean, chief scientist at Google DeepMind, responded, the figures were only meant to show a comparison between the two methods, the doubts about Gemini's demo video are harder to rebut.

After the launch of Gemini, Google released a six-minute demo video showing some interesting interactions between testers and Gemini, including asking Gemini to recognize images and describe them in multiple languages, having Gemini design quizzes from a map, and playing cup games and deduction games with Gemini. Throughout the process, Gemini responds very quickly, generates audio and pictures to support answers, and uses some colloquial and even humorous expressions, which can be described as eye-opening.

However, some netizens soon spotted a text disclaimer at the beginning of the video and took it to mean that the video shows carefully selected results and was edited rather than recorded in real time. Later, in a blog post explaining the multimodal interaction process, Google in effect acknowledged that the results shown in the demo video were achieved by stitching together static images and multiple text prompts.

For example, in the blog post, Google admits that, unlike the quick reaction to the hand gestures in the video, Gemini only concludes that the game is rock-paper-scissors if all three gestures are shown at the same time along with a hint that it is a game. Source: screenshot of Google's official website

Some analysts pointed out that this is completely different from what Google implied in the video, which suggests that Gemini can observe and react to the world around it in real time and that users can hold a smooth voice conversation with it. Ethan Mollick, a professor at the Wharton School, also demonstrated on the X platform that, with static images and multiple prompts, ChatGPT Plus can replicate Gemini's performance.

Ethan Mollick showed ChatGPT Plus multiple screenshots from Google's demo video at the same time, and ChatGPT Plus was able to give a similar answer.

As the doubts grew, Eli Collins, vice president of product at Google DeepMind, told the media that the duck-drawing demonstration in the video (in which a stick-figure duck is drawn and Gemini correctly describes each step) is indeed a research-grade capability that is not yet available in Google's actual products.

Oriol Vinyals, VP of Research and Deep Learning lead at Google DeepMind, also published a lengthy post on the X (formerly Twitter) platform explaining how the team made the video: "All the user prompts and output in the video are real, just shortened for brevity. This video shows what a multimodal user experience built with Gemini looks like," Vinyals wrote. "We do this to motivate developers."

However, Vinyals' response sparked more controversy. One netizen commented: "If you want to motivate developers, why not publish authentic content? This is both insincere and misleading."

Some Google employees told foreign media that they thought the video painted an "unrealistic picture". One employee said he wasn't surprised by the exaggerated demo, as staff were used to some degree of exaggeration in product marketing: "I think most employees who have used large language model technology know to take all of this with a grain of salt."

Some foreign media believe that Google's "huge bureaucratic system and product managers at all levels" have until now kept it from launching products as nimbly as OpenAI. For a society still dealing with the impact of the AI transformation, that is not a bad thing. But Google's recent rapid progress should be viewed with some reservations.
