
In the first year of large model implementation, Lanzhou Technology wants to open up the last mile of AI applications

Author: Ebang Power

AI applications have long been stuck in the "last mile": aggregating, analyzing, and interpreting multi-dimensional data; handling the needs, responses, and feedback of different roles; and adapting to the particular characteristics of each industry and each enterprise. These problems have long plagued AI applications and bear directly on the cash-flow lifeline of AI companies.

As one of the world's top NLP scientists, Dr. Zhou Ming, founder and CEO of Lanzhou Technology, pays close attention not only to the technical progress of large models themselves but also to enterprise customers' technical capabilities, industry characteristics, scenario details, and core needs. After surveying more than 100 enterprises, Zhou Ming mapped out the many intersections between enterprise needs and large model capabilities, between scenario details and model performance, and between deployment costs and actual results. He examined every joint in the path from model R&D to real-world deployment and, based on these overall considerations, proposed a set of practical methodologies covering model selection, training, deployment, industry adaptation, and security, aiming to crack the "last mile" problem of AI adoption.

At Lanzhou Technology's large model technology and product launch event on March 18, 2024, Zhou Ming shared his judgment on technological trends, summarized his methodology for AI adoption, and offered his thoughts on the development of China's large model industry.


The following is the text of the speech, which has not been reviewed by the speaker:

Dear District Mayor Xu, Dr. Kai-Fu Lee, distinguished guests, good afternoon. The theme I am sharing today is "How to Drive Innovation with the Two Wheels of Large Model Technology and Application".

First, let me introduce Lanzhou Technology. I come from Microsoft Research and have worked on natural language processing for many years. At the end of 2020, I had a premonition that large models would become part of the country's infrastructure, so I went to Sinovation Ventures to work under the guidance of Dr. Kai-Fu Lee, founded Lanzhou Technology in June 2021, and released the Mengzi lightweight model that same year. In March 2023 we released Mengzi GPT, and in January 2024 Mengzi GPT2, which has been approved by the Cyberspace Administration of China and can be offered to the public. That is our journey so far.

In terms of business direction, we have always focused primarily on to-B, with to-C as a supplement.

That is because I believe the biggest application prospect for large models in China is enterprise services. I have also always attached importance to doing useful research so that technology and scenarios reinforce each other. To this end, I hope to build an ecosystem where industry, academia, and research institutes sit at the same table, cultivate talent, and find opportunities to innovate. Since its founding, Lanzhou Technology has paid special attention to cultivating this ecosystem.

1. 2024 is the first year of large model implementation

Let's start with a brief recap of the technological developments of the past year. Since the emergence of ChatGPT, everyone has poured into large model research, and the main progress includes:

First, the large-scale model technology itself is developing rapidly.

Second, large models have matured to the point where users can build their own GPTs on top of them.

Third, open source. Open source has changed the ecosystem around large models, making them more practical and more widely used.

Fourth, multimodality. For example, Sora generated a 60-second video, and the results were very good.

Fifth, the development of deployment-related technical tools, such as RAG, which help large models land quickly.

Finally, large-model-native applications such as Character.ai and Perplexity have reached hundreds of millions of users, and using large models to power search has also developed very well.

These are all things we care about. On the one hand, we need to grasp the technical landscape; on the other hand, we have to think about how to play to our strengths and avoid our weaknesses. These technologies all come from the United States, and it is good that we can follow up quickly, but I keep asking myself: how can we build on our strengths, avoid our weaknesses, and ultimately surpass?

I think there are two aspects to consider: first, whether we can do better than others on certain key technologies; second, whether we can grasp the general trend of national development to achieve innovation.

What is that general trend? Large models must be put into practice. 2024 is the first year of large model implementation. First, large models have developed so well over the past year that they are now capable of being deployed. Second, enterprises have accumulated a great deal of data and scenarios in their drive to cut costs and increase efficiency. This year's government work report also emphasizes empowerment by large models, so the right time, place, and people for large model adoption are all in place this year.

What should we do? We should take advantage of the general trend of national development, create value through implementation, and stimulate innovation, rather than simply catching up with the United States.

Implementing large models does not mean that "there is gold everywhere". I can say responsibly that many areas remain undeveloped: how to break through the last mile of large models, what the business model should be, how to strengthen delivery capabilities and improve product standardization. These are only just beginning to take shape.

2. The nine-character formula for large model implementation

So what is the secret of our approach to implementation? I have thought about this a lot. I think of Mr. Lei Jun's seven-character formula for Internet entrepreneurship: focus, perfection, word of mouth, and speed. I would like to add two more characters: cost. A large model startup cannot ignore cost. No matter how much financing you have, once it runs out you must be able to generate your own cash flow. So we pay attention to cost in R&D, in delivery, and in every other link.

This nine-character formula runs through everything Lanzhou Technology does. Let's look at how we put it into practice.

First, positioning. Lanzhou Technology's positioning is very clear and very simple: to be a comprehensive service company combining large model technology with enterprise scenario applications.

Second, PMF (product-market fit). Through more than two years of research, we have found four major problems in implementing large models; if they are not solved well, large model applications are basically a bubble.

1) Training and deployment costs. A model that costs tens of millions is unacceptable to customers.

2) Industry adaptation issues.

3) The hallucination problem. The output of many large models looks amateurish to insiders yet expert to laymen; how can companies dare to use it?

4) Data security issues. Enterprises are reluctant to hand over their data, especially central state-owned enterprises: they are unwilling to let it be used for training large models, and sometimes unwilling to let it be accessed at all. How do we resolve these concerns?

So how does Lanzhou Technology solve these four major problems?

The first is cost, that is, choosing the right model size for to-B use.

The bigger the model, the better the capability, but also the higher the cost. We surveyed 102 companies across a wide range of industries and found that their needs for large models centered on a handful of capabilities: language understanding, multi-turn dialogue, text generation, machine translation, and so on. They told us: don't lecture me about large models; explain these capabilities clearly and I will use them to the fullest.

That made me go back and ask: how big does a model need to be to do these capabilities well?

I drew a very crude curve, which many people now call the "Zhou Ming Curve".


You can see that for language understanding, the effect plateaus somewhere between 10 billion and 100 billion parameters; going any larger is no longer cost-effective in ROI terms. This is the certainty we found amid the uncertainty: if we train 10-billion to 100-billion-parameter models well, we can deliver the capabilities enterprises care about.
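To make the intuition behind this curve concrete, here is a purely illustrative Python sketch. The saturating capability function and every number in it are made up for demonstration and are not the actual "Zhou Ming Curve" data; it only shows the qualitative pattern of capability flattening out while cost keeps rising, so the capability gained per billion parameters drops sharply beyond a certain scale.

```python
# Purely illustrative: made-up numbers showing the qualitative shape of
# diminishing returns on model size; not Langboat's measurements.
import math

def illustrative_capability(params_billions: float) -> float:
    """Hypothetical 'language understanding' score that saturates with scale."""
    # A logistic curve in log-parameter space; the constants are arbitrary.
    return 1.0 / (1.0 + math.exp(-2.0 * (math.log10(params_billions) - 1.0)))

def illustrative_roi(params_billions: float) -> float:
    """Hypothetical capability gained per unit of (roughly size-proportional) cost."""
    return illustrative_capability(params_billions) / params_billions

for size in [1, 7, 13, 40, 100, 500, 1000]:
    print(f"{size:>5}B  capability={illustrative_capability(size):.2f}  "
          f"roi={illustrative_roi(size):.4f}")
```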

Second, how to train the model.

When the team was first established, funding was limited. How could we build a system to train these models quickly and effectively? We set up a training system for the Mengzi large models and carried it out in an orderly way.

The Mengzi models have a full-pipeline safety system. As a responsible company, we need a compliant, high-quality corpus, a secure pre-training process, reinforcement learning to address safety and alignment, and filtering on both input and output.
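As a hedged illustration of just the last item, input and output filtering, the sketch below wraps a model call with simple keyword checks. The blocklist, the call_model placeholder, and the refusal messages are all hypothetical; the talk does not describe Langboat's actual safety stack, and a production system would rely on trained classifiers rather than keyword matching.

```python
# A minimal sketch of input/output filtering around a model call.
# BLOCKLIST, call_model(), and the refusal messages are hypothetical placeholders.
BLOCKLIST = {"example_banned_term"}  # placeholder terms, not a real policy list

def call_model(prompt: str) -> str:
    """Placeholder standing in for the actual large-model inference call."""
    return "model answer for: " + prompt

def violates_policy(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def safe_generate(prompt: str) -> str:
    if violates_policy(prompt):      # input-side filter
        return "Sorry, this request cannot be processed."
    answer = call_model(prompt)
    if violates_policy(answer):      # output-side filter
        return "Sorry, no compliant answer could be generated."
    return answer
```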

With these assurances in place, data becomes the most important thing: how to get more data, how to filter it better, how to use it better. That comes down to how and where the data is collected.

We collected petabytes of data, parsed pages with high precision, filtered it layer by layer for higher quality, and then applied text deduplication and careful data adjustment, using the limited data as cleverly as possible.
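As a rough sketch of the deduplication step only, the code below drops exact duplicates by hashing normalized text. It is an assumption-level illustration: real pretraining pipelines typically add near-duplicate detection (for example MinHash/LSH), quality scoring, and language filtering, none of which are shown here, and nothing in it reflects Langboat's actual pipeline.

```python
# A minimal sketch of exact text deduplication by content hash.
import hashlib
from typing import Iterable, Iterator

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies collide."""
    return " ".join(text.lower().split())

def deduplicate(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document once, keyed by the hash of its normalized text."""
    seen: set[str] = set()
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc

# Usage: unique_docs = list(deduplicate(raw_docs))
```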

Building on this data and the company's earlier work, we quickly iterated to the Mengzi V3 version.

The V3 dataset is 2.7 times the size of V2's, with markedly better overall quality, covering encyclopedias, papers, books, code, web pages, and more. We currently have 7B, 13B, and 40B models. The 13B model has achieved leading results in basic Chinese and English capabilities and is trending upward in coding and mathematics. We will open it up at the end of this month, and we would love for everyone to use it and give us feedback.


Having covered large model technology, let's look at large model applications: how to solve the last mile.

We were the first in China to propose a layered large model system with L1, L2, L3, and L4 levels.


L1 is the general-purpose large model. L2 is the industry model. As I just mentioned, we train each key industry model on a strong foundation, adding industry data and industry hot topics. This requires us to understand the industry: where the data is, what the tasks are, and what the last-mile pain points are, all of which feed into model training. That is why, as you can see, each of our industry models is trained together with leading partners in that industry.

L3 is the scenario model, where we fine-tune in detail for key scenarios. For example, in finance, the general model or even the industry model sometimes falls short: it handles the 0-to-1 stage well but not the 1-to-10 or 10-to-100 stages, which means the model is not competitive in that industry. So we have to push things to the extreme with the scenario in mind.

L4 is the AI assistant. The user issues a natural-language instruction, and the assistant automatically decides which level of model or which scenario to use to meet the user's needs. It essentially uses the large model as the brain to decompose the task, call the appropriate models, and finally summarize the outputs into a report.
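To make the L4 idea concrete, here is a hedged sketch of such an assistant loop: decompose the instruction into sub-tasks, route each to a model tier, then assemble the outputs. The decompose() and call_tier() functions are hypothetical placeholders for LLM-driven planning and for the L1/L2/L3 models; Langboat's actual implementation is not described in the talk.

```python
# A hedged sketch of an L4-style assistant: decompose, route, then summarize.
# decompose() and call_tier() are placeholders, not Langboat's implementation.
from dataclasses import dataclass

@dataclass
class SubTask:
    description: str
    tier: str  # "L1" general, "L2" industry, or "L3" scenario model

def decompose(instruction: str) -> list[SubTask]:
    """Placeholder: in practice an LLM would plan the sub-tasks and pick tiers."""
    return [SubTask(description=instruction, tier="L1")]

def call_tier(task: SubTask) -> str:
    """Placeholder for calling the chosen model tier."""
    return f"[{task.tier}] result for: {task.description}"

def assistant(instruction: str) -> str:
    results = [call_tier(task) for task in decompose(instruction)]
    # In practice a final model call would summarize the results into a report.
    return "\n".join(results)

print(assistant("Summarize this quarter's contract risks"))
```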

The third problem is hallucination, and the fourth is security. To save time, I will discuss them together.

The hallucination problem is the model giving answers that sound plausible but are wrong. The security problem is keeping user data confidential. How do we fix them?

You can see that this diagram is a little complicated; in essence, our large model is responsible for three things.

First, the large model relies on itself: when a question comes in, it checks whether it can answer from its own knowledge.

Second, if it finds that it cannot answer well, it accesses the enterprise's local data: after understanding the intent, it looks things up in the enterprise's own materials.

Third, if that is still not enough, it searches a broader range of data.

If the model can get the answer on its own, it outputs the result directly; if it cannot decide, it fuses these three sources, and we use a neural network to rerank the three kinds of information and produce the final result. This is how we address hallucination and data security together.
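A hedged sketch of that three-source flow is below: gather candidate answers from the model itself, from enterprise-local data, and from broader external search, then let a reranking step choose the final output. All four functions are hypothetical placeholders; in particular, rerank_score() stands in for the neural reranker mentioned above and is not a real model.

```python
# A hedged sketch of the three-source answer flow with a reranking step.
# All functions here are placeholders; the real system uses a trained neural reranker.
def model_answer(question: str) -> str:
    return "draft answer from the model's own knowledge"  # source 1 (placeholder)

def search_enterprise_data(question: str) -> str:
    return "answer grounded in the enterprise's local documents"  # source 2 (placeholder)

def search_external(question: str) -> str:
    return "answer grounded in broader external sources"  # source 3 (placeholder)

def rerank_score(question: str, candidate: str) -> float:
    """Stand-in for a neural reranker scoring question/answer relevance."""
    return float(len(candidate))  # trivial heuristic, not a real model

def answer(question: str) -> str:
    candidates = [
        model_answer(question),
        search_enterprise_data(question),
        search_external(question),
    ]
    # Reorder the three information sources and return the best-scoring one.
    return max(candidates, key=lambda c: rerank_score(question, c))
```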

On this basis, we have launched a series of Mengzi large models spanning N horizontal and vertical directions.


3. Innovation and implementation promote each other

I just talked about the entrepreneur's mindset: first focus, second perfection, third word of mouth, fourth speed, and finally cost. We embody this nine-character formula in practice: decisions such as how big a model to build and which product features to develop on top of it are all driven by customer needs, picking the most important things to do first, so as to improve our delivery capability.

Finally, to summarize today's sharing: I have introduced Lanzhou Technology's entrepreneurial philosophy and hope everyone will remember this nine-character formula. I also want to emphasize that innovation and implementation complement each other; do not innovate blindly or implement blindly, but link the two together so that they can iterate quickly.

The second thing I want to emphasize is the ecosystem. No matter how strong a company is, it cannot do everything well. We should cooperate with other companies to create a good environment for innovation and entrepreneurship, so that everyone has the opportunity to grow in it, which will also help advance science and technology in our country.

The implementation of the large model industry is just kicking off. Let us create a better future together, innovate, and win together. Thank you.
