
Huawei's AI storage "solves" the large model problem

Author: Leifeng.com

Compared with the red-hot large model race itself, little attention goes to the upstream of the industry chain.

After all, next to ChatGPT's fluent answers, entertainment value, and productivity gains, hardware such as chips and storage looks unglamorous, even a little dull.

Yet one fact cannot be ignored: in an era of highly specialized division of labor, industry progress has never been the work of a single vendor, but of the entire industry chain moving together.

Behind ChatGPT's stunning debut stand not only OpenAI's engineers grinding through technical problems day and night, but also indispensable hardware: Nvidia's GPUs, Samsung's memory, Intel's CPUs, and more.

At a more fundamental level, large models such as ChatGPT rest on two elements: massive, usable data and powerful computing.

Among the three pillars of artificial intelligence, data is the raw material of production, computing power is the infrastructure, and the algorithm is the logical expression of the model itself.

There is no doubt that the edifice of large models is built on a foundation of data and computing power.

Today, industries across China are swept up in a large model boom, a chaotic free-for-all, and it is still unclear who will produce a Chinese version of ChatGPT.

What is certain is that the GPUs and storage required to train large models have found a new growth opportunity.

At the end of this May, the market capitalization of GPU leader Nvidia crossed one trillion US dollars, a sign that the large model fire spread first to the upstream of the industry chain, letting those companies capture the first wave of the technology dividend.

Storage, which sits alongside data, is likewise heading into a technological shift and a market leap driven by large models.

First, data under the large model boom: huge in volume, heterogeneous, and noisy

Start with the data explosion: data volumes are growing rapidly, from terabytes to petabytes to staggering zettabytes, and how to store these masses of data is a problem that data centers and enterprises must solve.

At the same time, the multimodal AI behind large models deals with far more complex data structures and types than single-modal AI, and with larger volumes of data.

With these two trends stacked together, the market is about to demand significantly more storage.

Looking more closely, an enterprise developing a large model must go through the following stages: data aggregation, data preprocessing, model training, and inference and application, and none of these stages can do without storage.

In the data aggregation stage, large models demand both large amounts and a wide variety of data. For storage, beyond simply expanding capacity to hold the data, the harder job is to integrate all kinds of unstructured data and let it flow securely so the enterprise can actually use it.

This is not easy: data formats, types, and protocols differ, and enterprises must spend considerable manpower and resources to break down those barriers, or even establish standards and ecosystems, which requires both technical strength and commercial clout.

In the model training stage, the quality of the data sets the ceiling for the model.

In other words, saying that large models depend on data is imprecise; it is more accurate to say they depend on valid data.

In earlier training setups, the XPU typically pulled in all of the data for training.

But within that mass of data, not everything is usable; some of it degrades training quality and lengthens the training cycle.

So, before training, data can be preprocessed and consolidated in advance to strip out this "noise" and keep only clean, effective data, which in turn reduces the model's "hallucinations".
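To make the idea concrete, here is a minimal, illustrative Python sketch of the kind of filtering such a preprocessing step might apply; the rules and thresholds are assumptions for illustration, not Huawei's actual pipeline.

```python
# Illustrative pre-training cleaning pass over plain-text samples.
# Rules and thresholds are assumptions, not a vendor's real pipeline.
import hashlib

def clean_corpus(samples, min_chars=200, max_symbol_ratio=0.3):
    seen = set()
    for text in samples:
        text = text.strip()
        if len(text) < min_chars:                    # too short to carry signal
            continue
        symbols = sum(not c.isalnum() and not c.isspace() for c in text)
        if symbols / len(text) > max_symbol_ratio:   # likely markup or garbage
            continue
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen:                           # exact duplicate
            continue
        seen.add(digest)
        yield text

# usage: clean_texts = list(clean_corpus(raw_texts))
```

The point is that this pass happens once, before the expensive training loop, so the XPU never wastes cycles on noisy or duplicated samples.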

Beyond that, network fluctuations and XPU failures frequently interrupt large model training, forcing a restart from a saved checkpoint. During recovery, training falls back to the last checkpoint, so part of the work is effectively redone, which both lengthens the schedule and burns extra XPU power.

The crux of the problem is reading the checkpoint data back quickly so training can resume with as little lost time as possible, which places extremely high demands on storage concurrency and bandwidth.
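As a rough sketch of why checkpoint I/O dominates recovery time, here is a minimal save/resume loop written with PyTorch; the file path and what gets saved are illustrative assumptions.

```python
# Minimal checkpoint/resume sketch with PyTorch; path and interval are illustrative.
import os
import torch

CKPT = "ckpt.pt"  # in practice this lands on shared, high-bandwidth storage

def save_checkpoint(model, optimizer, step):
    torch.save({"step": step,
                "model": model.state_dict(),
                "optim": optimizer.state_dict()}, CKPT)

def resume(model, optimizer):
    if not os.path.exists(CKPT):
        return 0
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optim"])
    return state["step"]          # training falls back to this step

# For a model whose weights and optimizer state run to tens or hundreds of GB,
# the wall-clock cost of save_checkpoint/resume is dominated by how fast the
# storage layer can absorb and serve these files, hence the bandwidth focus.
```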

The final stage is inference and application, where the large model faces the customer directly; this is the most visible window onto how well the model works.

Because this window shapes the user experience, the response time requirements on large models are stringent.

Take ChatGPT as an example: for simple questions, users expect an answer within about ten seconds. If it takes much longer, the experience sours, trust in the model erodes, and negative reviews follow.

This is the latency problem of large model inference. When models are roughly comparable in accuracy, higher latency simply means a worse experience, so cutting latency matters greatly to large model vendors, and much of that work can be done in storage and in model optimization.

Returning to first principles, the main difficulties in training large models all revolve around one core question: how to make good use of data.

As the hardware underlying data, storage does not merely record it; it participates deeply in the entire large model workflow of collecting, moving, and using data.

Hundreds of domestic large models are competing, yet the first winner has been GPU maker Nvidia. By the same growth logic that lifted the XPU, storage should be able to replicate Nvidia's windfall.

The way to win: whichever storage vendor first untangles the pain points of large model training will seize the high ground and be the first to reap the rewards.

Second, the three elements of AI storage: accuracy, efficiency, and energy consumption

In the past, the way to train models was simple and crude: pile on data and human labeling, throw powerful compute at it, and keep tuning until accuracy improved.

This brute-force approach worked, but it was extremely costly, and it became a burden many AI companies could not shed.

In fact, among the three pillars of AI, optimizing any link can cut cost and raise efficiency. Earlier training approaches focused on compute: some companies bought powerful XPUs and training did speed up, yet problems kept surfacing, including mediocre training results, low efficiency, high power consumption, and poor accuracy.

The root cause is that computing power is only a tool, while data is the factor of production; improving the tool without optimizing the data puts the focus in the wrong place.

It is like the saying that even a clever cook cannot make a meal without rice: no matter how skilled the chef, a fine dish is hard to make without good ingredients.

Under this same brute-force approach, domestic companies have amassed large amounts of compute over the past few years. The question now is how to use that compute to create value rather than leaving it redundant, idle, or wasted.

Zhang Ji, chief storage scientist at Huawei's Zurich Research Institute, argues that once computing power is sufficient, training efficiency has been pushed about as far as it can go; further gains in efficiency and model quality have to come from the data, and, one step further, from technical innovation in the storage where that data lives.

Take the data preprocessing mentioned earlier. In the old training approach, the XPU pulled in all the data for training and wrote it back to storage afterwards, and this has several problems.

First, the XPU pulls in all the data, noise included, which hurts training quality. Second, moving data back and forth costs the XPU both energy and time. Finally, because the data volume is so large, main memory alone is nowhere near enough, so external storage has to be brought in, and the data faces security risks as it moves.

Here's a simple example:

Suppose a phone holds 10,000 photos: how do you find a particular one quickly and accurately?

The traditional way is to open the photo folder and scroll through all 10,000 images; if the resolution is high, the phone takes time just to load them, and the user then compares them one by one, which is slow and error-prone.

What is more common now is that when a photo is stored, the storage layer has already extracted its feature values and organized them. When the user later searches, they just enter a label and get back the photos matching those features, drastically narrowing the search.

From a storage perspective, the logic here is that the storage itself does the preprocessing: when the CPU goes looking for a picture, it works from the feature values and touches only a small subset of the 10,000 photos, so the search is fast, accurate, low on energy, and light on compute.
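A toy sketch of that "index once at write time, touch only a small subset at query time" idea, with a stand-in tagging function rather than any real feature extractor:

```python
# Illustrative sketch of "preprocess at the storage layer, look up a small subset".
# tag_fn is a stand-in; a real system would extract richer features per photo.
from collections import defaultdict

def build_index(photos, tag_fn):
    """Done once, when photos are written: map each tag to matching photo ids."""
    index = defaultdict(list)
    for photo_id, photo in photos.items():
        for tag in tag_fn(photo):
            index[tag].append(photo_id)
    return index

def lookup(index, tag):
    """Query time: only the small candidate set is touched, not all 10,000 photos."""
    return index.get(tag, [])

# usage sketch:
# index = build_index(photos, tag_fn=lambda p: p["labels"])
# candidates = lookup(index, "beach")
```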

For large models, the above logic still holds.

On the storage side, enterprises can preprocess the data first, removing invalid data (noise) and organizing what remains, so that the XPU fetches only the data it actually needs. Training becomes faster and more efficient, and XPU utilization rises.

Moreover, the storage layer touches the data directly and is therefore the first line of defense for data security; encrypting and protecting data at the storage layer goes a long way toward keeping it safe as it flows.
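As a rough illustration of protecting data at the storage layer (not Huawei's actual mechanism), the sketch below uses Python's cryptography package to encrypt records before they leave the system and decrypt them only for an authorized reader; key handling is deliberately simplified.

```python
# Minimal encryption sketch using the `cryptography` package.
# Key handling is simplified; a real system would use a key management service.
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice, held by the storage system / KMS
cipher = Fernet(key)

def write_protected(record: bytes) -> bytes:
    """Encrypt a record before it is persisted or shipped off the array."""
    return cipher.encrypt(record)

def read_protected(token: bytes) -> bytes:
    """Decrypt only for an authorized consumer holding the key."""
    return cipher.decrypt(token)

# usage:
# token = write_protected(b"customer table shard 0")
# plain = read_protected(token)
```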

If brute force improves training results by piling on compute, preprocessing data at the storage layer instead lowers storage cost and improves efficiency and accuracy across the whole training process.

The former approach is widely used and has hit its limits; the latter is just beginning to sweep through the AI industry.

Third, how does AI storage help enterprises put large models to work?

Look at the history of ChatGPT: iterating from version 1.0 through 4.0, OpenAI has invested hundreds of millions of dollars, and even today a single training run costs millions.

At this stage, large models remain a money-burning business; without deep pockets and a strong bench of talent, you cannot even get a seat at the table.

One view in the industry holds that general-purpose foundation models can only be a game for the giants. But that does not mean smaller companies cannot have models of their own.

Building industry-specific large models on top of a foundation model has become a common business pattern.

For enterprises lacking AI capabilities, standing on the shoulders of giants is clearly the shortcut that saves time, effort, and money.

The comparative advantage of these enterprises is proximity to data, and that data is real and valid.

The advantage is also a weakness: many of these companies do not know how to use that data.

So they have to cooperate with foundation model vendors, opening up their data to train models.

But for some organizations, data security matters even more than the data itself.

So how can data flow securely while still being put to good use, its value unlocked and the business expanded?

The answer is close at hand: at the storage layer, convert the enterprise's data into the form the model needs by turning it into vectors, and rely on the storage system's own security management to keep the data flowing safely.

This way, enterprises can train industry models while maintaining only this small slice of data, lowering the barrier to entry; they also keep ownership of the data in their own hands, which raises the margin of safety.
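A minimal sketch of the vectorize-and-retrieve idea follows; embed() is a placeholder for whatever embedding model an enterprise would actually use, and the cosine-similarity search stands in for a real vector store.

```python
# Illustrative "vectorize enterprise data, retrieve only what the model needs".
# embed() is a placeholder, not a real embedding model.
import numpy as np

def embed(texts):
    # Placeholder: a real deployment would call an embedding model here.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))

def build_vector_store(documents):
    vectors = embed(documents)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    return documents, vectors

def retrieve(query, documents, vectors, top_k=3):
    q = embed([query])[0]
    q /= np.linalg.norm(q)
    scores = vectors @ q                     # cosine similarity on unit vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

# The model then sees only the retrieved snippets, not the raw corpus,
# which is what keeps the sensitive data inside the enterprise's storage.
```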

With this in mind, Huawei has launched two AI storage products: the OceanStor A310 and the FusionCube A3000.

The OceanStor A310 is a deep learning data lake storage product that supports enterprises training large models across the whole pipeline, from data aggregation and preprocessing through model training and inference.

In terms of specifications, the OceanStor A310 offers lossless multi-protocol convergence, holds 96 flash drives, delivers up to 400 GB/s of bandwidth and 12 million IOPS, and scales out to as many as 4,096 nodes.

The OceanStor A310 also has in-storage computing capability: its built-in compute lets large AI models reach raw data scattered across locations and provides a globally unified data view and scheduling across systems, regions, and clouds, simplifying data aggregation.

The FusionCube A3000 is a hyper-converged appliance for training and inference that integrates storage, networking, compute, and a model development platform, with OceanStor A300 storage nodes built in. Aimed at applications of models in the tens of billions of parameters, it supports one-stop installation and deployment, getting applications running within two hours, and offers mainstream large model services through the Blue Whale Application Store.

The FusionCube A3000 is not only delivered as a one-stop package; it also accommodates third-party large model software, compute platforms, and networking. On top of its built-in OceanStor A300 storage nodes, vendors can integrate their own GPUs and software platforms to build hyper-converged nodes that suit them.

In short, the OceanStor A310 and FusionCube A3000 are designed to address the lack of technical support many enterprises face when building industry models.

The two products also serve different customers: the former offers general-purpose storage capability, while the latter provides one-stop delivery, lowering the threshold for enterprises to roll out large model applications.

Fourth, the future of AI storage

From industrial society to information society, new technologies have driven exponential growth in the total volume of data.

Making good use of data has become the key question, and it is tightly bound to storage technology; the two shape and reinforce each other.

As technologies like large models begin to show emergent capabilities, market demand for new storage architectures and technologies suddenly accelerates.

In Zhang Ji's view, under the traditional XPU-centric computing architecture, everything is organized around the XPU, and the data movement this entails brings problems of energy consumption, efficiency, and security.

These problems can be tackled by separating the data path from the control path: with the right technical innovation, some data can bypass the CPU and be "fed" directly to the GPU, which both reduces CPU load and raises GPU utilization, cutting overhead across the whole training process.
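The contrast can be sketched roughly as below; the conventional path uses standard PyTorch, while the direct path is only a hypothetical placeholder, since the article does not describe the actual mechanism.

```python
# Conceptual sketch of the two data paths. load_via_cpu() uses standard PyTorch;
# load_direct() is a hypothetical placeholder, not a real API call.
import torch

def load_via_cpu(path):
    # Conventional path: disk -> host (CPU) memory -> GPU memory.
    tensor = torch.load(path, map_location="cpu")      # lands in host RAM first
    return tensor.pin_memory().to("cuda", non_blocking=True)

def load_direct(path):
    # Data/control separation: the control plane still runs on the CPU, but the
    # payload would move from storage straight into GPU memory, skipping the
    # host bounce buffer. Placeholder only.
    raise NotImplementedError("illustrative placeholder for a direct storage-to-GPU read")
```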

"Model training should come back to the data itself." Leifeng NetLeifeng Net