OpenAI releases GPT-4o, a multimodal large model, in a major update

Wan Da-hsien

2024-05-17 20:11 Posted in Heilongjiang Science and Technology Creators

OpenAI held its spring event and launched a blockbuster update. Can domestic large models keep up?

At the event just now, OpenAI released its latest multimodal large model, GPT-4o, which can reason across text, audio, and vision (images and video) in real time.

Compared with the previous GPT-4 Turbo, GPT-4o is also both faster and cheaper.

For example, voice mode previously had to chain three models: one for transcription, one for the intelligence itself, and one for text-to-speech. That introduced delay, so conversations were generally a strict question-and-answer exchange with waits of several seconds. GPT-4o's performance is remarkable by comparison. In the live demonstration it responded in effectively "real time", much like a person-to-person chat: the user can interrupt it, ask a new question without waiting for the current answer to finish, and make all sorts of requests.

Because GPT-4o can perceive emotion, it can also generate speech in different emotional styles, such as gently telling a bedtime story, and the effect is striking.

In terms of price, GPT-4o is 50% cheaper than GPT-4 Turbo and has a rate limit 5 times higher. These efficiency gains are why OpenAI has decided to make GPT-4o available to all users.

That is, free users will also be able to use GPT-4o. Specifically, Plus users get a message limit up to 5 times that of free users, and Team and Enterprise users get a somewhat higher limit still. The free quota is capped, however, and once it is used up, ChatGPT automatically switches to GPT-3.5.
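The quota rule described above can be sketched in a few lines of Python. This is only an illustration: the function names, the model identifier strings, and the free limit of 10 messages are hypothetical (the article gives no absolute figure), and only the 5x Plus multiplier and the fallback to GPT-3.5 come from the text.

```python
PLUS_MULTIPLIER = 5  # from the article: Plus limit is 5x the free limit


def message_limit(plan: str, free_limit: int) -> int:
    """Return the GPT-4o message limit for a plan, given a free-tier limit."""
    if plan == "plus":
        return free_limit * PLUS_MULTIPLIER
    return free_limit


def pick_model(messages_used: int, limit: int) -> str:
    """Use GPT-4o while quota remains, then fall back to GPT-3.5."""
    return "gpt-4o" if messages_used < limit else "gpt-3.5"


# Hypothetical free limit of 10 messages, chosen only for the example.
free_limit = message_limit("free", 10)
plus_limit = message_limit("plus", 10)
```

With these assumed numbers, a free user's eleventh message would be answered by GPT-3.5, while a Plus user would still be on GPT-4o.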

In terms of features, free users can also upload and analyze images, use browsing to search for real-time information, and more.

In addition, GPT-4o is available through the API, on which developers can build and deploy AI applications, so a free and open GPT-4o should help grow the number of GPTs developers.

OpenAI has also released a desktop version of ChatGPT, though it is currently available only for macOS. Users can press a shortcut key to capture the desktop and ask ChatGPT about it, which means it can recognize and analyze whatever is on the computer screen.

For example, when code appears on the desktop, it can parse it, recognizing that the code fetches daily weather data and explaining in detail what it does with that data. After the code is run, ChatGPT analyzes the output: it not only identifies the exact July to August period, but also describes the maximum temperature reached during it.

However, Windows users need not fret, as the Windows version will be available later this year.

In addition, GPT-4o outscored GPT-4 Turbo, Claude 3 Opus, and Gemini Pro 1.5 in a variety of tests. Clearly, GPT-4o's capabilities already lead the industry.

So, faced with GPT-4o, how will our domestic large models perform?

To be honest, because GPT-4o has only just been released, it may take a little time for domestic large models to follow up. On the whole, current domestic large models are close to GPT-4.

For example, SenseTime's InternLM (Shusheng Puyu) has 123 billion parameters. Across a total of 300,000 questions in 51 well-known evaluation sets worldwide, its overall score ranks second in the world, and it ranks first in ten evaluations, including the comprehensive exam AGIEval, the knowledge quiz CommonsenseQA, reading comprehension, and reasoning, with scores exceeding GPT-4.

As another example, iFLYTEK said that its Spark cognitive model V3.5 has surpassed GPT-4 Turbo in language understanding and mathematics, and reaches 96% of GPT-4 Turbo's coding ability and 91% of GPT-4's multimodal understanding.

Then there is Tongyi Qianwen 2.5, just released by Alibaba Cloud. According to media reports, its performance has fully surpassed GPT-4 Turbo, making it the "strongest on Earth" Chinese large model, while Tongyi Qianwen's 110-billion-parameter open-source model has achieved the best results in multiple benchmarks, surpassing Meta's Llama-3-70B to become the most powerful model in the open-source field.

There is also Baidu's Wenxin model version 4.0, which Robin Li said is not inferior to GPT-4. In Chinese especially, Wenxin Yiyan may perform even better.

Therefore, although GPT-4o performs strongly, domestic AI models will soon catch up.

In fact, I have been using Wenxin Yiyan for a long time, and in practice I find it perfectly adequate: it greatly improves my work efficiency and meets my requirements. I use the free version, too, and I believe the paid version is better still.

In other words, current domestic large models can already meet real-world needs.

In terms of performance, there may still be a certain gap with GPT-4o, but hardware probably accounts for a large share of that gap. After all, the foundation of AI is computing power, and, as everyone knows, the United States has been restricting exports of advanced AI chips to us.

The cost is also high: reportedly, ChatGPT costs about 12 cents for every 1,000 words it generates. Large models of this kind are not yet so important to us as to be irreplaceable, and there is no need to pour huge sums of money into them. As a productivity tool, a model only has to be good enough for daily use. We must not fall into the trap of vicious competition, like the "Star Wars" arms race between the United States and the Soviet Union, in which the side that overextends is the one dragged down in the end.
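To put the quoted figure in perspective, here is a small back-of-the-envelope calculation in Python. It assumes the reported rate of about 12 cents per 1,000 generated words scales linearly; the function name and the 50,000-word example volume are hypothetical, and the rate itself is the article's unverified figure.

```python
CENTS_PER_1000_WORDS = 12  # figure quoted in the article, not verified


def generation_cost_cents(words: int) -> float:
    """Estimate generation cost in cents, assuming linear scaling with word count."""
    return words / 1000 * CENTS_PER_1000_WORDS


# Hypothetical workload: 50,000 generated words.
example_cost = generation_cost_cents(50_000)
print(f"{example_cost:.0f} cents")
```

Under that assumption, 50,000 generated words would come to about 600 cents.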

In addition, our AI development strategy differs from others'. Our main goal is to bring intelligence to industry: to improve production efficiency and product quality and to reduce production costs. Answering questions, helping to write documents, generating voice and video, and so on are not that significant to the national strategy for the time being, and existing domestic large models are sufficient for them. The logic is simple, and I believe everyone can understand it.

In short, when it comes to large models, we do not need to sell ourselves short. On the contrary, domestic large models have reached a world-class level even under extreme blockade and suppression by the United States, and that is the best proof. As the domestic semiconductor supply chain continues to develop and improve, the production capacity and performance of AI chips will gradually increase, and catching up with ChatGPT is only a matter of time. We have a massive market of demand and an environment for real-world practice that no other country can match. Mobile payment, short video, e-commerce, intelligent driving, and other fields were able to develop rapidly in China precisely because we are the world's largest and best demand market. The pace of Chinese artificial intelligence development will therefore not stop, and will only get faster.
