Machine vision companies are playing a game they can't afford to play

A well-known consulting firm once predicted that in the future there would be only two kinds of companies: those doing artificial intelligence, and those that are unprofitable.

What it perhaps did not expect is a third kind: unprofitable AI companies.

Last year, we reported on the "disappearing machine vision companies": the former "four AI tigers" (SenseTime, Megvii, CloudWalk, and YITU) were each struggling to turn a profit. But with the GPT series setting off another wave of model-refining fever, these AI companies have stirred again.

SenseTime has disclosed that its next strategic direction is artificial general intelligence (AGI): it will continue to push "large device + large model," and it has released "SenseChat," a Chinese large language model with 180 billion parameters.


Megvii Technology has likewise stated that it will invest firmly in the R&D of generative large models to maintain a long-term lead in core technical capabilities.

In its disclosed private placement plan, CloudWalk Technology intends to raise no more than 3.635 billion yuan for the R&D of its "industry wizard" large-model project.

YITU Technology has made no public announcements, but in its previous financing rounds it was viewed favorably precisely because of AI large models and domestic chips.

Whether it is the previous round of "pre-training + fine-tuning" large models represented by BERT and GPT-3, or the "pre-training + fine-tuning + prompting + RLHF (reinforcement learning from human feedback)" large language models represented by ChatGPT, GPT-4, and Baidu's ERNIE Bot, large models have become an important way for the major tech companies to flex their muscles and compete with one another.

Google, Baidu, and the other giants have rushed in, and their large models are locked in a clash of titans. This carnival has become a game that machine vision companies have to play, yet cannot afford to play.

Embarrassing "tunic"

Recently, the CV companies joining the large-model game have displayed a curious pattern: talking big one moment, hedging the next.

In public statements, they all said they would increase investment to tackle fundamental technologies and fundamental problems. CloudWalk's management said it would "invest one or two billion yuan to solve the computing power problem," and that "we are a technology company; our R&D investment will not be low." People at SenseTime said the company must build a "unified and standardized large model" and "accelerate building the core capabilities of artificial general intelligence." Megvii benchmarked itself against OpenAI, wanting to "do AI technology innovation that affects the physical world."


But when it comes to large-model technology and the products themselves, the confidence falters.

This says that "the basic large model must have a long-term layout, NLP has many difficulties, and there will be a big gap with overseas leading companies in the short term", and the one that says "Chinese AI companies have the pressure of commercialization and cannot innovate at any cost like OpenAI".

"Expectation management" is something you can understand.

Young people often describe themselves as "Kong Yiji, unable to take off his long gown," after Lu Xun's down-and-out scholar who clings to the gown that marks his learned status. The awkward position of CV companies with respect to large models is much the same.

CV companies' accumulation in underlying technology, infrastructure, talent, capital, and ecosystem is not as deep as that of the leading tech companies. They are naturally in no position to go toe-to-toe with Google, OpenAI, or BATH (Baidu, Alibaba, Tencent, Huawei) and burn money building a general-purpose foundation model.

The new round of large language models sets extremely high thresholds: a complete technology stack, engineering capability, computing power costs, and accumulated data. For AI companies, the difficulty of developing their own large language models is unprecedented. OpenAI spent $544 million in 2022 against revenue of only $36 million, a burn rate that no domestic CV company can match.

Of course, the outside world should not overstate the responsibility of CV companies and pile onto them innovation pressure that only the giants can bear.

Still, CV companies carry the aura of "AI-native enterprises" and have genuinely accumulated technical reserves, so they cannot simply lie flat and lean on the big players the way ISV integrators and software companies do, happily waiting to be integrated or to call someone else's APIs.

The former "AI four tigers" still have to support the shelf of "technical self-reliance" and strive to integrate into this wave of large model refining, so the competition between the number of models and the scale of parameters has been raised to a new level of competition.

For example, CloudWalk has pre-trained models in both NLP and vision, while SenseTime's new model system, built on its "SenseCore AI large device," includes general vision models, Chinese language models, and image generation models... Among them, the "SenseChat" large model alone has a parameter scale comparable to GPT-3's.

Today everyone sighs that it is not easy for Kong Yiji to take off the gown. But look at it from another angle: does the "large model" gown need to be worn by CV companies at all?

A game they cannot afford

From the pre-trained large models of 2018 to the large language models of 2023, large models have gone through a small cycle from germination to boom. Their types and capabilities have grown richer, and many AI companies, universities, research institutions, and industry players have set out to build large models of every kind.

Here's the problem:

First, the "intelligence emergence" of large models requires ultra-large-scale data and sufficient training to appear, and only the basic model without investment can do it.

Many industry-oriented pre-trained large models, short on data and training, never reach the critical point of "emergent intelligence." That is why, although so many pre-trained models came before, it was only ChatGPT's arrival that confirmed the feasibility of "artificial general intelligence."

Today, just as the robustness and generalization of foundation models have improved greatly, blindly "training yet another large model" amounts to redundant construction: foundation models and industry models together consume already scarce computing power, push compute costs still higher, and leave AI companies carrying a heavier burden.


Second, among the commercialization paths for large models, the standardized API is the most basic, and foundation-model APIs have a siphon effect.

Put simply, when AI capabilities are delivered through APIs, technology is the decisive factor: a foundation model with strong capabilities and a broad audience can easily commercialize through the API economy, while an industry-specific large model faces a narrow field and struggles to dilute R&D costs through economies of scale.
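To make the "API economy" concrete, here is a minimal sketch in Python, assuming a hypothetical OpenAI-style chat-completions endpoint (the URL, model name, and environment variable are all placeholders). The point is that the entire capability sits behind one standardized call, which is exactly why the strongest foundation model siphons up demand:

```python
import os
import requests

# Hypothetical OpenAI-compatible endpoint; URL, model name, and key are placeholders.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ["FOUNDATION_MODEL_API_KEY"]

def ask_foundation_model(prompt: str) -> str:
    """Call a foundation model through a standardized chat-completions API."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "base-llm",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# The downstream vendor adds value around the call, not inside the model.
print(ask_foundation_model("Summarize this inspection report: ..."))
```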

As one large model after another has been brought to market, everyone has suddenly realized that what is lacking is not large models, but a path to commercialization.

At present, the commercialization of large models remains limited: consumer-facing general products are priced near cost, and enterprise-side profit prospects are uncertain. According to a16z's survey of US LLM startups, pure model vendors can capture only 0-10% of the value; and anyone benchmarking OpenAI's pricing strategy over the long term will face heavy commercialization pressure.

When general foundation models and industry large models face the same market and customers, the result is a contest over the distribution of business value. The clash of titans among the AI giants will produce general foundation models that absorb most of the attention from industry and users.

A large number of industry large models will then end up in one of three places: nobody uses them after training, wasting the initial investment; they fail to meet industry needs, limiting their commercialization prospects; or they overlap with the capabilities of general foundation models, so commercialization falls short of expectations.

Li Zhifei, founder of fellow AI startup Mobvoi, put it bluntly in an interview: "Not everyone has to build a general large model. Entering rashly is very difficult, the business competition is fierce, and working out the business model at the end will be very painful."

The model-refining rush, then, may be a game that CV companies simply cannot afford right now.

Travel light

You may ask: with large models this popular, how can a CV company capture this wave of dividends and build an advantage in the new AI boom without training a large model itself?

CV companies need to travel light. A few approaches are worth trying to explore the opportunities of the large-model boom:

1. Establish a closer connection with foundation-model platforms.

Developing large models in-house is simply too hard: training and storage costs are too high, and community and ecosystem support is insufficient. Instead, a CV company can stand on the shoulders of giants, tapping the capabilities of a foundation model to build small models, a business model distinct from that of the foundation model itself.

Previously, one challenge to CV companies' profitability was that machine vision had to serve the mid-tail and long-tail market, where demand is massively fragmented: customers are individually small but collectively numerous, and projects are modest in scale, all of which places high demands on a CV company's development efficiency.

Mature general-purpose algorithms cannot meet these segmented needs, yet having algorithm engineers custom-build each one is neither realistic nor cost-effective. Foundation models push algorithm development into an industrialized stage: they reduce programming workload, raise development efficiency, and improve the cost-performance of customized algorithms to a point enterprises can more readily accept.
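As a sketch of what "industrialized" algorithm production can look like in practice, consider reusing one pretrained vision backbone across many small projects and training only a tiny task head per order. The PyTorch example below is illustrative; the defect-grading task and its class count are hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

# Reuse a pretrained backbone as-is and train only a small task head,
# so each fragmented, small-scale demand needs labels, not a new model.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()      # strip the ImageNet classifier, keep 512-dim features
for p in backbone.parameters():
    p.requires_grad = False      # freeze shared, reusable features

num_defect_classes = 4           # hypothetical niche task: defect grading
head = nn.Linear(512, num_defect_classes)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on the tiny head; the backbone is shared across projects."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```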

For CV companies, once algorithms enter this stage of industrial mass production, fully covering fragmented demand and being reused at scale, overall revenue capacity naturally rises.

2. Go deep into specific industries and build differentiated application products.

For a foundation model to reach a particular industry, it must be further fine-tuned, and here CV companies hold real advantages.

Many highly specialized or complex jobs, such as finance, architectural design, programming, office work, and customer service, require precise vertical knowledge; in certain fields, such as healthcare and justice, unstructured data is scarce. Without enough corpus to "feed" it, a foundation model will lack some "common sense" in these scenarios; GPT-4's trouble writing Chinese poetry is one example.

It is said that GPT-3.5's training data were all private datasets, and that 89.3% of the data in the key SFT training set was custom-built.

Most CV companies have their own vertical focus areas, such as YITU's smart healthcare, Megvii's IoT, CloudWalk's smart parks, and SenseTime's smart cities and smart mobility. They can combine the differentiated datasets accumulated in those fields with fine-tuning or prompting to build more accurate, more reliable small models that are easier to deploy, accelerating the landing of AI applications.
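One common way to realize "fine-tuning on differentiated data" without paying for full retraining is parameter-efficient fine-tuning such as LoRA. Below is a minimal sketch using the Hugging Face transformers and peft libraries; the checkpoint name is a placeholder, and the target module names vary by model architecture:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Hypothetical checkpoint; any open causal LM with q_proj/v_proj layers works similarly.
BASE = "open-base-llm-7b"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# LoRA trains a few million adapter weights on the vertical corpus
# while the foundation model's billions of parameters stay frozen.
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, typical for LLaMA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, train on domain text (medical reports, park-operations logs, ...)
# with an ordinary causal-LM objective; the adapter is small enough to ship per customer.
```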

3. Build a more resilient moat of ecosystem cooperation.

The large-model technology CV companies have accumulated will be a trump card in the AI 2.0 era, and it can also serve as a bargaining chip in ecosystem cooperation with AI giants and computing power providers.

For example, this round of large models places high demands on prompt engineering and on reinforcement learning from human feedback (RLHF), which teach a model, under human guidance, to use knowledge and understand human preferences. This is still a very new field in China, with few prompt engineers or professional annotators. According to media reports, 52.6% of OpenAI's annotators hold bachelor's degrees and 36.8% hold master's degrees; such work cannot rely entirely on crowdsourced data annotation and requires dedicated vertical annotation teams.
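To see why annotator quality matters so much, consider the reward-modeling step at the heart of RLHF: annotators rank pairs of answers, and a reward model is trained so that the preferred answer scores higher. A minimal sketch in plain PyTorch, with random tensors standing in for a real reward model's outputs:

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy illustration: in practice these scores come from a reward model
# applied to (prompt, answer) pairs ranked by human annotators.
score_chosen = torch.randn(4, requires_grad=True)    # annotator-preferred answers
score_rejected = torch.randn(4, requires_grad=True)  # rejected answers
loss = preference_loss(score_chosen, score_rejected)
loss.backward()  # gradients push preferred answers toward higher scores
```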


Another example is the medical field: medical imaging has yet to build databases as large as those for natural images, and annotating medical images is hard. Unlike natural images, where anyone can tell at a glance what is pictured, labeling medical images involves specialized knowledge of organs, cancers, and the like, and must be accumulated deliberately.

High-level technical talent of exactly this kind is an important resource for AI-native enterprises such as CV companies. With it, they can cooperate more closely with upstream and downstream partners in the industry chain, keep their products and services competitive and sustainable, and attract customers to put more data into their products, forming a Matthew effect.

Large models open a new path of great value and possibility, and they carry many expectations and ambitions. But having large-model capabilities does not mean training a large model yourself.

The craze for redundant construction will eventually fade, and by then the real test, commercializing large models, will have only just begun.

For CV companies, taking off the "large model" gown is how they hold on to their commercial bottom line. The collective rush is only a moment's excitement; only by preserving their strength can they go further in the AI arena.
