When the large model joins hands with retail: the technical breakthrough of the nine-number algorithm middle platform

Whip Bull News on January 11 that the essence of retail revolves around cost, efficiency and experience. When the era of artificial intelligence represented by large models comes, and from the soaring "100 model war" to the rational "landing is king", it is difficult to balance the "impossible triangle" that originally existed in the retail industry, and there is a new solution.

The nine-digit algorithm middle platform, a MaaS tool tempered in JD.com's internal high-concurrency and high-complexity collaborative retail scenarios, supports traditional model and large model training. It has been connected to multiple open source models such as JD's Yanxi model, adhering to the concept of "multi-cloud and multi-mode, device-cloud collaboration", and continues to deepen technology, aiming to promote retail e-commerce scenarios to reduce costs, improve efficiency, and optimize experience.

In 2023, JD 11.11, the nine-number algorithm middle platform and a number of model applications on it will obtain large-scale practical operation opportunities, realize kilocalorie-level distributed scheduling in terms of computing power, provide core algorithm services for 800+ e-commerce businesses, and realize the understanding and modeling of hundreds of millions of users and goods.

Efficient intelligent scheduling of computing power: Achieve the "optimal solution" of resource allocation and reduce the cost of computing power

Computing power is an important "competition point" for large models to achieve a jump in value. However, in the process of implementing large models, they still face challenges such as exponential growth in computing power demand, high computing power cost, high heterogeneous complexity, and cross-domain and multi-dimensional scheduling. Therefore, it is particularly important to achieve unified, efficient, and low-cost scheduling of computing power.

Refined in the nine-digit algorithm middle platform of JD's retail business, a new generation of heterogeneous cross-domain intelligent computing power scheduling system is built at the level of underlying computing power, which carries out refined management and control of computing resource scheduling, realizes the approximate "optimal solution" of computing resource allocation, and helps retail algorithm computing power reduce costs.

When the large model joins hands with retail: the technical breakthrough of the nine-number algorithm middle platform

This technical architecture is optimized from the five stages of the algorithm task cycle, covering dynamic queues, multi-dimensional perception, scheduling decisions, efficient execution, and intelligent attribution, which not only unifies the computing power of multiple computer rooms up to 2,000 kilometers apart into computing power clusters to achieve cross-domain scheduling and efficient resource matching, but also optimizes computing tasks through GPU operators, operator fusion, IO optimization, RDMA and other technologies, and squeezes hardware performance to the extreme, increasing GPU utilization by 1 times and greatly reducing computing costs.

Multi-mode: Efficient fine-tuning "out-of-the-box" to improve business efficiency

The large model that has undergone technological ramp-up is now moving towards application, showing great potential to promote the upgrading of industrial digital intelligence. However, the road to application is not as smooth as imagined, and problems such as "hallucinations", poor timeliness, lack of professional knowledge, and data security have yet to be overcome in large models.

In response to the above problems, the Nine Numbers Algorithm Middle Platform focuses on building a complete set of large model application capability framework, supporting high-performance high-speed fine-tuning and RAG knowledge retrieval technology, greatly improving the efficiency of model training, solving business problems of different complexity, and striving to provide a better service experience.

SFT (Efficient Fine-Tuning) technology is used to solve simple business problems in a single step. For example, when a user asks "what are the basic functions of an Apple mobile phone", SFT technology can fine-tune the large model based on the pre-trained pedestal model and use the data in the vertical field of retail e-commerce to obtain a vertical large model with knowledge of the specific business domain, and then answer the user's inquiry.

At present, the nine-number algorithm middle platform integrates multiple mainstream LLM models, including the Yanxi large model, and develops the 9N-SFT framework to unify the sample standards and training modes of the model, so that one sample and configuration can be switched between multiple models at will. In layman's terms, a number of mainstream LLM models have been configured by algorithm engineers one by one, which can be "used out of the box" in a nine-digit environment, allowing large models to "try faster" when invoking, and improving the performance by about 40% compared with pure open source code. This self-developed framework has been applied to many of JD's internal businesses to achieve low-cost application of SFT technology.

RAG (Retrieval Enhanced Generation) technology is used to deal with relatively fixed process complex business problems. Specific to the retail scenario, whether it is a product consultation from C-end users or a platform entry consultation from B-end merchants, the requirements for timeliness, professionalism and accuracy are higher, and the large model also needs to have the ability to understand multiple rounds of dialogue. RAG+LLM technology can give full play to the capabilities of artificial intelligence combined with contextual semantic understanding to provide users with a better experience.

Specifically, RAG technology consists of three components: indexing, retrieval, and generation, and realizes the connection between large language models and external knowledge bases through LangChain. For example, when a user asks "what is the difference between two different brands of mobile phones", RAG technology uses indexing to "plug-in" the knowledge base of different parameters, different attribute data, and the latest hot trends of the two mobile phones for the large model, and finds accurate product parameters and other information in the product knowledge base through retrieval technology, and compares the differences between the two mobile phones in which important dimensions are different through the generation ability of the large model, so as to efficiently and accurately output the differences between the two mobile phones to the user.

In the future, the Nine Numbers Algorithm Middle Platform is committed to realizing a new product interaction method of "intent-based result designation", providing services to users through AI Agent (agent) and solving more complex business problems in a more intelligent way.

Device-cloud synergy: "Lightweight deployment" of large models optimizes user experience

Undoubtedly, the application of large models shows an exponential increase in the demand for local computing, and it is not realistic to hand over all computing to cloud computing for centralized processing. A more reasonable path is to give full play to the advantages of cloud computing, mobilize the agility of terminal computing, and activate "device-cloud collaboration". In this context, the device-cloud collaboration of large and small models has become more realistic.

JD.com judges that the collaboration of large and small models will be an important path for the implementation of large model technology in the future. On the one hand, the large model is responsible for outputting general capabilities, and the small model is responsible for actual inference execution, which not only improves the coverage and accuracy of the system, but also reduces the inference delay and ensures the security of private data.

The Nine Numbers Algorithm Middle Platform builds an AI technology system for device-cloud collaboration, places AI models on mobile phones, provides AI capabilities in the whole interactive link, understands user demands in real time and quickly, and performs real-time calculations, improves the full-link user interaction experience, and optimizes business goal prediction. In terms of technical implementation, pythonVM is compatible with mainstream operating systems and more than 95% of models, based on the self-developed efficient inference engine and a variety of compression and compilation technologies in parallel to promote the lightweight development of large models, and through the collaborative training of large and small models, it can be used in the cloud after one training, so as to improve the intelligent effect of the whole link.

At present, the Nine Numbers Algorithm Middle Platform is exploring two core application scenarios of terminal intelligence technology: First, in the search and push scenario, the search and recommendation business has extremely high latency requirements, and the more real-time data is used, the greater the effect of the model. With device-cloud collaboration, you can recommend more accurate products to users based on the most real-time data of users on the device. Second, in data security, the intelligent computing structure of the terminal naturally has the role of data isolation to ensure that sensitive data is not uploaded and data security is ensured.

This end of the intelligent technology will also be applied to more scenarios. For example, optimizing the courier experience and migrating the courier loading inspection from cloud detection to mobile phone detection can ensure the system response speed and improve the operation experience even in a weak network environment.

In the future, JD.com will continue to deepen its technology, combine its digital intelligence experience in the field of retail e-commerce, and continue to promote the large model to the depths of the industry.

Further reading: About JD Cloud vGPU pooling scheme

Facing the demand for digital intelligence computing power in the era of large models, JD Cloud has launched the vGPU pooling scheme based on the self-developed hybrid multi-cloud and multi-cluster scheduling operating system CloudShip, which improves the AI operation efficiency and reduces the cost through the pooling of GPU heterogeneous resources, which has very significant advantages and practical application value. The vGPU pooling solution provides one-stop GPU computing power pooling capabilities, centrally managing and scheduling distributed GPU resources, and improving GPU utilization by up to 70%.

When the large model joins hands with retail: the technical breakthrough of the nine-number algorithm middle platform

Read on

Google Releases Blockbuster AI Models! Predicting all the biomolecules on Earth will greatly accelerate research into the treatment of diseases such as cancer

The new large model version 2.5 is on the back of Alibaba Cloud

The first domestic large cruise ship, hydrogen energy products, algorithm large model, Shanghai IP unveiled at the 2024 China Brand Day

Trading experience: a minimalist model that is off to a good start

How to design large-scale model products in order to truly integrate business and make users feel value?

Google Launches Next-Generation AI Model for Drug R&D! Can AI healthcare make a comeback?

The "Lenovo Department" model company has completed nearly 100 million yuan in Pre-A round of financing, and AI+ manufacturing is coming

【Financial Analysis】Not only "bonsai", but also "landscape" AI large model to accelerate the exploration of industrial landing

The in-depth application of AI makes the pre-ride prediction of hitchhiking more comprehensive and accurate The Tick Cool Technology Science Experience Center explains the principle of the order receiving prediction model in detail

OpenAI released the first version of the "Model Specification" to restrain ChatGPT from crossing the line and breaking the law

"Chaoxing Future" completed hundreds of millions of yuan in Pre-B round of financing, adding edge-side large model inference chips

Tongyi Qianwen 2.5 is officially released! The 110 billion parameter open-source model surpasses Llama 3

Catch up with GPT-4 in an all-round way, and Xiaomi mobile phones will be carried! Alibaba Cloud Tongyi Qianwen 2.5 model was released

26-year-old sprint large model with a team of 100 people, CTO of wall-facing intelligent genius: efficiency is more important than parameters

It's so powerful that you don't dare to use it for ordinary people! How did Sora, an epic model, transform industries overnight?

In the era of large models, what new sparks will AI and database technology collide with?