
Large models pile in: China competes on price, overseas players compete on capability

Written by | Cao Shuangtao

Edited by | Yang Bocheng

Cover image | IC Photo

In the competition between China and the United States over large AI models, two different paths are emerging.

At its spring launch event in the early hours of May 14, Beijing time, OpenAI released GPT-4o, which can listen, see, and speak. In the early hours of May 15, Beijing time, at the Google I/O developer conference, Google CEO Sundar Pichai unveiled dozens of Google AI products, a "family bucket" lineup that surrounds OpenAI on every front.

These include Gemini 1.5 Pro and Gemini 1.5 Flash, which support long contexts of up to 2 million tokens; Veo, positioned against Sora; the open-source model Gemma 2; AI Overviews; and the sixth-generation TPU.

The highlight of the developer conference was Astra, Google's AI voice assistant, which can recognize objects, code, and many other things through the camera. In the live demo video, the user asks Astra to tell her when it sees something that makes sound, and the assistant replies that it can see a speaker, which makes sound. Astra also correctly recalled that an apple that had flashed past the camera was next to the glasses.

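Besides Astra, Google also launched several general-purpose AI agent sub-series products based on Gemini, such as NotebookLM for audio, Music AI Sandbox for music, Veo for video, and Imagen 3 for images, directly targeting OpenAI's GPT-4o, Dall-E, and Sora.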

While Google and OpenAI battle over technology, China's large models may be entering an era of price wars. At the 2024 Spring Volcano Engine Force Conference held by ByteDance on May 15, ByteDance launched three AI products: the AI image product PicPic, the AI education product Hippo Aixue, and the AI interactive story product Cat Box.

In addition to these three products, ByteDance's consumer-facing (to-C) products also include Gauth for AI education; Doubao and CiCi for AI dialogue; ChitChop and Little Wukong, positioned as AI tools; Coze and "Clasp", positioned as AI bot creation platforms; and BagelBel, positioned as an AI interactive story product.

More notably, ByteDance took the lead in starting an industry price war. Tan Dai, president of Volcano Engine, said that the Doubao model will be offered as a paid commercial service at prices far below the industry level. Taking the Doubao general-purpose model pro-32k as an example, the input price for model inference is only 0.0008 yuan per 1,000 tokens. Models of the same specification on the market are generally priced at 0.12 yuan per 1,000 tokens, which is 150 times the price of the Doubao model.
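The 150x figure can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the two input prices quoted above; the function name `input_cost_yuan` and the 1-million-token workload are illustrative assumptions, not figures from Volcano Engine.

```python
from fractions import Fraction

# Input prices quoted in the article, in yuan per 1,000 tokens.
# Fractions keep the arithmetic exact (no floating-point rounding).
DOUBAO_PRO_32K_PRICE = Fraction("0.0008")
INDUSTRY_TYPICAL_PRICE = Fraction("0.12")


def input_cost_yuan(tokens: int, price_per_1k: Fraction) -> Fraction:
    """Cost in yuan of `tokens` input tokens at a per-1,000-token price."""
    return price_per_1k * tokens / 1000


# How many times more expensive the typical market price is.
price_ratio = INDUSTRY_TYPICAL_PRICE / DOUBAO_PRO_32K_PRICE

# Hypothetical workload, chosen only to make the gap concrete.
workload_tokens = 1_000_000
doubao_cost = input_cost_yuan(workload_tokens, DOUBAO_PRO_32K_PRICE)
industry_cost = input_cost_yuan(workload_tokens, INDUSTRY_TYPICAL_PRICE)

print(f"price ratio: {price_ratio}x")
print(f"1M input tokens at the quoted Doubao price: {float(doubao_cost)} yuan")
print(f"1M input tokens at the typical price: {float(industry_cost)} yuan")
```

Under these quoted prices, one million input tokens would cost 0.8 yuan versus 120 yuan, which is the same 150x gap expressed per workload rather than per 1,000 tokens.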

After this round of price cuts, other domestic large model vendors may follow. However, it remains debatable whether price cuts can bring these vendors more new users and paying users.

OpenAI and Google both advertise major capability gains for GPT-4o and Gemini at this stage, but which of the two models is actually stronger? To find out, we ran multi-dimensional tests on GPT-4o and Gemini.

1. Text output: Gemini and GPT-4o are drawing closer, and Gemini has caught up in some capabilities

Since GPT-4o and Gemini are both top-tier large models globally, we raised the difficulty right away when testing their text output.

Why are many countries still dominated by fuel vehicles? What factors affect the growth of new energy vehicle penetration overseas? We put this question to GPT-4o and Gemini. Both pointed to charging infrastructure, purchase costs, technological progress, policy support, and consumer habits as reasons why many countries are still dominated by fuel vehicles.

However, unlike GPT-4o, Gemini recognized that we were asking two questions and answered both of them. It also covered automakers and public awareness and education, which GPT-4o did not mention. In that sense, Gemini's answer is probably more complete.

Source: DoNews chart based on content generated by Gemini and GPT-4o

We then asked both models to write a 10,000-word report on the global new energy vehicle industry, covering the industry price war, battery technology, future development directions, and industry trends.

Here the two models diverged significantly. GPT-4o produced a seven-chapter outline, with sub-sections under each chapter, but it did not write the content we asked for, which may reflect GPT-4o's current weakness at long-form writing.

Source: GPT-4o official website

Gemini did give us actual text, but the full piece ran to only 1,679 words, far short of the 10,000-word report we requested. With the content so heavily compressed, its overall quality was also relatively poor.

For example, on the development of the new energy vehicle industry, Gemini listed industrial chain integration, cross-industry cooperation, and international competition, but summarized each in a single sentence. In other words, on questions that require real industry expertise, both Gemini and GPT-4o fall short to varying degrees.

Source: Gemini's official website

Raising the difficulty further, we asked why global commodity prices have kept rising this year, what the impact of the increase is, and whether prices will fall back. On the first question, GPT-4o and Gemini gave the same answer, both pointing to supply chains, geopolitical conflicts, the global economy, and other factors. Their forecasts of the future price trend were also basically the same.

On the impact of the increase, however, Gemini arguably gave the more complete answer. In particular, GPT-4o did not mention the impact on finance, corporate profits, or society.

Source: DoNews chart based on content generated by Gemini and GPT-4o

To test rapid analysis of text, we had both models analyze the risk points in Anker Innovations' 2024 Q1 financial report. The risks GPT-4o identified include reduced cash flow, high selling and administrative expenses, large fluctuations in financial expenses, and losses from fair-value changes.

Source: GPT-4o official website

Gemini, however, gave six points: slowing revenue growth, a sharp decline in operating activities, a significant increase in selling and administrative expenses, an increase in inventory write-down losses, a sharp increase in foreign exchange, and dependence on government subsidies. It is not hard to see that Gemini's answer is more complete.

Source: Gemini's official website

We then asked both models to write a 2,000-word essay on how to help a heartbroken person move on, requiring that the essay take a clear point of view and include matching images and audio. Here, Gemini completely outclassed GPT-4o.

At the beginning of the essay, Gemini placed a soothing, playable piece of music. Under each subsection, Gemini retrieved images related to the content directly from the web, achieving the arbitrary combination of text, audio, and images that OpenAI has talked about.

Source: Gemini's official website

By contrast, GPT-4o's output is somewhat inferior. Apart from one image at the beginning of the essay, the text contains no images related to the content, and no audio.

Source: GPT-4o official website

After the overall test, we found that Google has gone from lagging behind to catching up in generative AI, especially in text output, and has even surpassed GPT-4o in content quality and in combining content types.

2. By comparison, Gemini's overall capabilities cannot be ignored

During testing, we found that Gemini supports not only text questions but also voice questions. However, because of network restrictions in China, we could not test the voice function for now, so we cannot determine whether this is the Astra shown at Google's launch event. By comparison, GPT-4o still supports only text questions.

Source: Gemini's official website

Source: GPT-4o official website

Google's many years in the search industry have also given the current Gemini AI retrieval capabilities. This retrieval covers not only images and web pages but also videos. When we asked Gemini to produce a 20-30 second video centered on car safety, Gemini first gave us a specific video script.

When we followed up by asking whether it could generate the video for us directly, Gemini's answer somewhat exceeded our expectations: it gave us several YouTube links. These links can be played directly inside Gemini without jumping to YouTube.

Source: Gemini's official website

Source: Gemini's official website

GPT-4o can likewise output a video script on request, but it lacks these Gemini features.

Source: GPT-4o official website

It is worth noting that neither Gemini nor GPT-4o currently supports audio and video content recognition, and Gemini does not currently support image generation. GPT-4o, which supports image generation, also has some problems.

For example, we asked GPT-4o to generate a picture containing the four mythical beasts of traditional Chinese mythology. The picture does show four beasts, but only the green dragon roughly matches its mythological prototype; the other three differ greatly from theirs. This may be related to the model's limited learning of traditional Chinese mythology.

Source: GPT-4o official website

In image recognition, however, Gemini is building more scenario-based services on top of recognition. We picked a typical noodle picture from an online platform. After identifying it as egg noodles, Gemini gave us keywords such as egg noodles and Chinese noodles to help with a follow-up search. It also directly recommended several egg noodle recipes.

Source: Gemini's official website

In contrast, GPT-4o only briefly introduces the noodles when it recognizes the content of the image as noodles, and does not elaborate too much.

Source: GPT-4o official website

When we raised the difficulty of image understanding, we picked a typical bamboo forest photo from an online platform and asked Gemini where it was taken. Gemini's candidate locations included Arashiyama Bamboo Forest in Kyoto, Japan; Sagano Bamboo Forest in Kyoto; Yaeyama Bamboo Forest in Okinawa, Japan; a moso bamboo forest in Sichuan; Anji Bamboo Forest in China; and South America or Southeast Asia. It also pointed out the importance of bamboo forests.

Source: Gemini's official website

GPT-4o only pointed out that such scenery is extremely common in East Asian countries such as China and Japan, citing Arashiyama Bamboo Forest in Kyoto, Japan, and the Anji Bamboo Sea in China as famous bamboo forest scenic spots. Not only did it list fewer locations than Gemini, it also did not identify the specific shooting location.

Source: GPT-4o official website

To test logical reasoning, we chose the harder final questions from the 2023 national college entrance examination in mathematics. GPT-4o's answers were disappointing.

Source: 2023 national college entrance examination mathematics paper

For example, on the two parts of question 20 of the national paper, GPT-4o gave only incomplete solution steps and did not output any definite answer.

Source: GPT-4o official website

On question 21, which has three parts, GPT-4o turned the three parts into two. Moreover, the first two parts of this probability problem call for specific numbers, but GPT-4o's answers were left indeterminate in terms of a variable N.

Gemini did not perform well either. For example, when finding the general-term formula in the first part of question 20, Gemini gave two solutions whose answers were completely different. This suggests that Gemini may simply scrape relevant links from Chinese websites without a second check of the content and its accuracy.

Source: Gemini's official website

On the whole, Gemini currently leads GPT-4o in many aspects of overall capability and in the speed of product launches. On price, Google's Gemini 1.5 Flash costs 35 cents per 1 million tokens, far below GPT-4o's $5 per 1 million tokens. With a product portfolio that performs no worse than GPT-4o and a low price, Google may be playing its trump card.

However, given the strong technical capabilities OpenAI has accumulated in large models, it remains debatable how long Google can hold its slight lead in some areas. The ongoing contest between Google and OpenAI over large-model technology may push the technical capabilities of American large models to new heights.
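To put "much lower" into numbers, here is a small Python sketch based only on the two per-million-token prices quoted above. The article does not say whether these are input or output prices, so the sketch treats each as a single flat rate; the dictionary name, the function name `cost_usd`, and the 10-million-token workload are hypothetical.

```python
from fractions import Fraction

# Prices quoted in the article, in US dollars per 1 million tokens.
# Assumption: each is treated as one flat rate for all tokens.
PRICE_USD_PER_MILLION = {
    "Gemini 1.5 Flash": Fraction("0.35"),
    "GPT-4o": Fraction("5"),
}


def cost_usd(model: str, tokens: int) -> Fraction:
    """Cost in USD of processing `tokens` tokens with `model`."""
    return PRICE_USD_PER_MILLION[model] * tokens / 1_000_000


# How many times more expensive GPT-4o is at the quoted prices.
gpt4o_vs_flash = (
    PRICE_USD_PER_MILLION["GPT-4o"] / PRICE_USD_PER_MILLION["Gemini 1.5 Flash"]
)

# Hypothetical workload, chosen only for illustration.
workload_tokens = 10_000_000
for model in PRICE_USD_PER_MILLION:
    print(f"{model}: ${float(cost_usd(model, workload_tokens)):.2f}")
print(f"GPT-4o costs about {float(gpt4o_vs_flash):.1f}x as much")
```

At these quoted prices the gap is roughly 14x, noticeably smaller than the 150x gap claimed in the domestic price war, although the two comparisons involve different models and currencies.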

3. China's price competition may accelerate an industry reshuffle

It is not hard to understand why ByteDance took the lead in starting the industry price war: the commercialization of domestic large models on the to-C side is currently following the path the mobile internet took years ago.

Relying on price wars, mobile internet companies (here, large model vendors) keep growing new users and daily active users, then gradually derive revenue from advertising, e-commerce, and other scenarios that fit closely with their core business. This not only maximizes the value of each user for the platform, but also helps these companies improve cash flow and steadily reduce losses.

Subsequently, as these companies keep launching price wars, small and medium-sized vendors without enough capital are shaken out, and market share continues to concentrate among the leaders. With the leading vendors in a commanding position, more commercialization opportunities emerge on both the supply and demand sides, and the industry ends up with a Matthew effect in which the strong get stronger.

This applies not only to the to-C side but, in the future, to the to-B side as well. Taking the SaaS industry as a benchmark, price remains one of the core advantages of domestic SaaS companies, especially given the industry's severe homogeneity in products and scenarios, domestic business owners' weak willingness to pay, the high churn rate among small and medium-sized customers, poor compliance, and centralized decision-making.

It should be pointed out, however, that to-C price wars in the internet era were mostly built on services in niche scenarios. With such services, what consumers really care about is service quality, and where the demand is a necessity, the weight of service quality is diluted further.

In the era of large AI models, consumers may not demand much from image services such as Meitu. In other scenarios, however, users are essentially paying for high-quality content from large models.

In other words, what consumers really value is the processing power of the big model and the ability to complete the task efficiently, not the price. If you don't perform well in the ability to complete tasks, no matter how low the price is, it is useless.

This is especially true for industries such as finance and investment research, which demand high-quality content and fast data generation from large models. It is even more true of enterprise-customized large models, where the generated content and data cannot deviate in the slightest.

Perhaps domestic large model vendors hope to use the price war so that large models start driving revenue growth as soon as possible, offsetting the heavy early-stage R&D costs and the related hardware investment.

However, as domestic large model vendors keep intensifying price competition, many start-ups with superior technical capabilities but insufficient funding may be affected. Will this widen the gap between China and the United States in large-model technology?
