Sun Qiming: How to Build a Vertical Large Model for Life Services? Self-Development and Open Source in Parallel

Author: Zhidongxi (Smart Things)

Source | GenAICon 2024

The 2024 China Generative AI Conference was held in Beijing on April 18-19, and at the AIGC application session, the main venue on the second day of the conference, Sun Qiming, head of large language model algorithms at TEG-AI Lab, delivered a speech on the theme of "Construction and Application of Vertical Large Language Models in the Field of Life Services".

Sun Qiming introduced in detail the ideas and experience behind building a vertical large-model system for life services. According to him, the company is undergoing an overall transformation and industrial upgrade, striving to move the information flow of the entire service chain online and digitize it. To support this transformation, its AI Lab has built an AI platform with leading models that is agile and easy to use, facilitating the rapid implementation of AI applications across the company's four internal business lines.

In Sun Qiming's view, a general-purpose large model plus prompting will not replace everything; application teams need to fine-tune their own large models for their own business scenarios. The company currently runs a large number of large-model training tasks every day, all four major business lines use large models to improve the service experience, and more than 200 models are online so far. The company has built a platform to support large language model training and inference, and on top of it launched the vertical large language model Lingxi (ChatLing), which has achieved better results than the official open-source models. In addition to self-developed models, the company also actively integrates open-source general-purpose large models and can respond quickly when the latest open-source models are released.

For example, on the evening of April 18, the latest Llama 3 model was open-sourced, and by the next afternoon the company had already launched this new open-source model on its own AI platform.

The following is a transcript of Sun Qiming's speech:

I will mainly introduce how we built a large model platform for our vertical fields and used it to empower the company's online business products, which brought considerable online revenue.

First, let me introduce myself briefly. Since joining the company after graduation, my focus has been on recommender systems, NLP (natural language processing), and large model technology. I am currently responsible for the company's large language model work and led the construction of the internal large model platform from 0 to 1.

As a service platform, the company's main business covers four directions: life services, real estate, recruitment, and automobiles.

We currently divide our business growth strategy into a "first curve" and a "second curve". The "first curve" is our traditional traffic model: potential customers become B-end merchants by purchasing the platform's membership services, and these merchants publish listings on the platform. C-end users browse these posts and then contact and communicate with the merchants directly; this is our traffic business model. Each of our business lines, including home services, real estate, and so on, has corresponding showcase cases.

In this way, the platform not only provides a channel for publishing information, but also promotes direct contact between B-end merchants and C-end consumers, matching the needs of both sides.

1. The service platform leverages AI for transformation, and the intelligent assistant handles preliminary screening and lead capture

The construction and application of large language models will further optimize the traffic business model, improve the user experience, and enhance the platform's service quality and efficiency. The company is undergoing an overall transformation and industrial upgrade, striving to transform its business model through the second-curve strategy.

Our goal is to move the information flow of the entire service chain online and digitize it, seamlessly connecting the upstream and downstream links and providing one-stop service, so that customers can complete more transactions on our platform rather than just a simple traffic business. For example, users can directly find nannies and confinement nannies on the platform, or complete related work in the real estate field.

To support this transformation, the AI Lab is committed to building a leading, agile, and easy-to-use AI platform that facilitates the rapid implementation of AI applications across the various business lines.

The architecture we designed starts from the underlying AI computing power, including GPU, CPU, NPU, and other general technology platform resources.

At the technology platform layer, we designed the overall algorithm engine, including computing power management, large-scale cluster scheduling, and offline and online performance acceleration.

At the algorithm model level, our platform covers images, speech, traditional NLP algorithms, 3D modeling, as well as emerging large language models and multimodal large models.

On top of the technology platform layer, we have built an application platform layer, which provides services including intelligent dialogue, customer service, VR house viewing, AIGC picture generation, digital human cloning interaction, etc. In addition, our agents contain workflow and knowledge base plug-ins to adapt to the application needs of different fields.

Ultimately, based on the entire AI application platform, we can further empower AI applications within the company, covering sales, customer service, online products, operations, and office work.

At present, this process runs quite smoothly within the company, with a large number of large-model training tasks every day, and all four major business lines use large models to improve their services. To date, more than 200 trained models are running online.

In terms of applications, we have launched an intelligent chat assistant for B-end merchants. It is mainly used in recruitment scenarios on the platform, and we have found that many blue-collar positions use it.

A recruiter may not have a customer service team large enough to respond to every candidate who submits a resume, so our bot takes over the conversation. Based on the large model and job-related information, it can proactively ask the candidate whether they have the basic work experience to meet the position's requirements. At the same time, if users have questions about the company's location, compensation, or other information, the intelligent assistant can answer based on the large model and carry out simple communication.

The core functions of the intelligent chat assistant are AI-driven lead capture and preliminary screening. For job seekers, the AI needs to communicate with them effectively, obtain their resumes, and determine whether they meet the recruiter's basic requirements.

Ideally, we want AI to go one step further by providing recruiters directly with candidates who have already passed the interview. AI may even complete the entire interview process and directly determine whether the candidate is suitable for employment.

If this can be achieved, it will be a huge improvement over the traditional business model of providing only traffic. The advantage of the large-model approach is that it can better understand and handle complex conversation scenarios, providing a more precise and personalized interactive experience.

We tried two large model application scenarios.

First, we tried an end-to-end approach in which the large model took over the entire chat process, including timely responses and communication. However, we found the results were not ideal. We then turned to traditional NLP methods for reference, which include auto-replies, text classification, text matching, and Q&A knowledge bases.

In addition, there is the active guidance strategy, which responds based on a state machine: for example, after answering a question, the system asks the next question at the right time based on configured scripts. Finally, there is slot identification, which designs responses based on the content the user has provided. However, these traditional NLP methods rely too heavily on the knowledge base and are expensive to maintain. Especially when information changes rapidly, such as shifts in economic trends or the emergence of new job types, updating and maintaining a traditional knowledge base is particularly difficult.
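The state-machine guidance and slot identification described above can be sketched in a few lines. This is a toy illustration, not the production system: the states, slots, and keyword-based extraction here are hypothetical stand-ins for the real configured scripts and NLU models.

```python
# Toy state-machine responder: answer, fill slots, then ask the next
# configured question. States and scripts below are illustrative only.

QUESTIONS = {                       # state -> next scripted question
    "ask_experience": "Do you have relevant work experience?",
    "ask_city": "Which city are you currently in?",
    "done": None,                   # all slots filled, nothing left to ask
}

def extract_slots(utterance: str) -> dict:
    """Toy slot identification: keyword spotting stands in for a real NLU model."""
    slots = {}
    if "years" in utterance:
        slots["experience"] = utterance
    if "Beijing" in utterance or "Shanghai" in utterance:
        slots["city"] = utterance
    return slots

def next_state(filled: dict) -> str:
    if "experience" not in filled:
        return "ask_experience"
    if "city" not in filled:
        return "ask_city"
    return "done"

def respond(filled: dict, utterance: str):
    """Update slots from the user's utterance and return the next question."""
    filled = {**filled, **extract_slots(utterance)}
    return filled, QUESTIONS[next_state(filled)]
```

The maintenance cost the speaker mentions shows up here directly: every new job type or policy change means editing the state table and scripts by hand.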

Our team is responsible for the knowledge base services of many scenarios on the platform. Facing constantly changing knowledge base information, we have tried automated mining techniques to reduce the need for manual updates, but these techniques still cannot match the quality of manual updates, especially in terms of diversity.

We evaluate the AI conversation system very strictly, especially regarding conversation fluency: with appropriate criteria, we evaluate the appropriateness of every sentence the bot says in a conversation, and any reply that does not meet the standard lowers the conversation's fluency score. Our goal is to reach 80% conversation fluency, which is one indicator of whether the bot can take over work from humans.
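One plausible reading of this fluency metric: label every bot reply as appropriate or not, count a conversation as fluent only if all of its replies pass, and report the share of fluent conversations. The exact production criteria are not stated in the talk, so this is an illustrative sketch of that reading.

```python
# Sketch of a strict conversation-fluency metric (assumed definition):
# a conversation is fluent only if every bot reply was judged appropriate.

def conversation_fluency(conversations):
    """conversations: list of per-conversation lists of per-reply
    appropriateness labels (True = reply met the standard)."""
    fluent = sum(all(replies) for replies in conversations if replies)
    return fluent / len(conversations)
```

Under this definition, 4 out of 5 fully-appropriate conversations yields 0.8, i.e. the 80% target mentioned above.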

2. Bringing together self-developed and third-party models, Llama 3 has been launched on the platform

We followed the development trend of general large models such as ChatGPT last year and began to explore various models.

Our consensus with the industry is that general-purpose large models combined with prompt engineering cannot solve every problem; in practical applications, we cannot rely on chat capabilities alone. Although some open-source or closed-source commercial models perform well in everyday conversation, they struggle to reach an accuracy of more than 99.9% in business scenarios.

So we set out to build a platform to support the entire training and inference process for large language models, integrated with APIs that already work well across the industry. We want every business unit in the company to be able to use the best models available today.

Our platform architecture is divided into several layers, starting with the facilities layer, which provides the necessary hardware resources and computing platform. At the model layer, we have integrated a variety of open-source model series, including Llama, Qwen, Baichuan, Yi, and others. Our self-built "Lingxi" platform also trains and integrates our own model, ChatLing.

At the tool layer, we provide inference acceleration frameworks with excellent performance, such as vLLM, TGI, TensorRT-LLM, and EnergonAI, as well as fine-tuning tools for training, including alignment tools and tools that encapsulate MoE modeling methods.

This packaged design allows R&D personnel in our business lines, even those without algorithm backgrounds, to avoid going into intricate technical details. All they need to do is prepare the data, use the platform's simple point-and-click operations and dataset configuration, take advantage of the good default parameters we provide or use our tools to adjust them, and they can train a large language model suitable for their industry.
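The "good defaults, optionally overridden" idea behind that point-and-click flow can be sketched as a config builder. The parameter names and default values below are illustrative assumptions, not the platform's real settings.

```python
# Sketch of a fine-tuning config builder: sensible defaults that a
# business-line user can override selectively. All names/values are
# hypothetical, not the platform's actual defaults.

DEFAULTS = {
    "method": "lora",        # parameter-efficient fine-tuning by default
    "lora_rank": 8,
    "learning_rate": 2e-4,
    "epochs": 3,
}

def build_finetune_config(dataset_path: str, **overrides) -> dict:
    """Merge user overrides onto defaults, rejecting unknown parameters
    so a typo fails loudly instead of being silently ignored."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"dataset": dataset_path, **DEFAULTS, **overrides}
```

Rejecting unknown keys is the design choice that makes this safe for non-experts: anything not covered by the curated defaults simply cannot be set.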

We built the Lingxi large language model, ChatLing, benefiting from the large-model platform mentioned earlier, which enables the three-stage process of pre-training, SFT domain fine-tuning, and reinforcement learning alignment to proceed smoothly.

We did not fully adopt a train-from-scratch strategy, as that requires a large amount of high-quality corpus; current model training usually needs to process at least 1.5T to 2T tokens.

To use resources efficiently and iterate quickly, we chose existing open-source models that have performed well at home and abroad as the so-called "pedestal" (base). On top of these bases, we combine the company's business data, use tens of billions of tokens of cleaned, high-quality corpus for incremental pre-training, and then perform fine-tuning and reinforcement learning to further improve model performance.

By completing these three stages of training on an open-source general-purpose base, we obtained the Lingxi large model, which is then used to empower the company's business applications.

The flexibility of this process is reflected in our team's ability to respond quickly as soon as a new open-source model is released. On the evening of April 18th, Llama 3 was open-sourced; by the afternoon of April 19th we had completed fine-tuning and reinforcement learning based on it, and the new model was launched. This means that on the evening of April 19th our business units could start using a model fine-tuned from Llama 3.

We are committed to developing and adopting a variety of technologies to optimize the use of inference resources for large language models, and these technologies are currently being actively used in our business.

One of these technologies is MoE, for which we have created an automated process to build MoE models. This process allows different models to choose from a variety of implementations based on their needs, including methods such as those of Databricks, Mistral, or TM2, to generate their base MoE models. In addition, we have completed fine-tuning and training of MoE-based models so that they can serve specific business scenarios more accurately.
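The automated MoE construction pipeline is not described in detail, but the core routing math that any such model shares can be sketched: a gate scores the experts, the top-k are selected, and their outputs are mixed by softmax weights. This toy version uses plain lists and callables in place of real expert networks.

```python
import math

# Toy MoE forward pass: top-k gating over a list of "experts" (callables).
# Illustrates the routing math only, not any specific MoE construction method.

def moe_forward(x, experts, gate_weights, top_k=2):
    """x: input vector; experts: list of functions vector -> vector;
    gate_weights: one score vector per expert (dot with x gives its score)."""
    scores = [sum(g * xi for g, xi in zip(gw, x)) for gw in gate_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    exp_scores = [math.exp(scores[i]) for i in top]   # softmax over selected experts
    z = sum(exp_scores)
    out = [0.0] * len(x)
    for i, e in zip(top, exp_scores):
        y = experts[i](x)                              # only top-k experts run
        out = [o + (e / z) * yi for o, yi in zip(out, y)]
    return out
```

The efficiency argument is visible in the loop: only the selected experts are evaluated, so capacity grows with the expert count while per-token compute stays near constant.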

We also use S-LoRA technology. LoRA is a widely used fine-tuning method, and although some have questioned how it differs from direct BERT-style fine-tuning, we have integrated it so that a single base model can be compatible with up to 1,000 LoRA adapters, enabling personalized customization for each scenario. Requests can be batched according to the diversity of online traffic and combined with the base model for inference, which greatly saves resources.
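The resource saving comes from the structure of the LoRA computation itself: one shared base weight plus a small low-rank delta selected per request. A minimal sketch of that per-request selection, with toy list-based matrix math standing in for batched GPU kernels:

```python
# Sketch of the S-LoRA serving idea: y = W_base @ x + B @ (A @ x), where
# (A, B) are the low-rank factors of the adapter chosen for this request.
# Toy math on plain lists; real systems batch many adapters on one GPU.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def lora_forward(W_base, adapters, adapter_id, x):
    """adapters: dict mapping scenario id -> (A, B) low-rank factors."""
    base = matvec(W_base, x)              # shared across all 1,000 scenarios
    A, B = adapters[adapter_id]           # tiny per-scenario weights
    delta = matvec(B, matvec(A, x))       # rank-r correction
    return [b + d for b, d in zip(base, delta)]
```

Because `W_base` is stored once and each `(A, B)` pair is tiny relative to it, hosting hundreds of scenario-specific models costs little more than hosting one.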

For training and inference acceleration of large models, we use solutions including Unsloth, as well as fine-tuning techniques implemented on Flamer and Flash Attention in incremental pre-training. In addition, we use HQQ-based inference acceleration. Hardware resources do have an upper limit; for example, using two 4090 graphics cards to support inference and fine-tuning of the Qwen 72B model is our current limit.

3. The large language model supports a variety of cooperation modes and flexibly responds to different business parties

Our platform is built to support a variety of cooperation modes.

For applications without an internal algorithm team, we provide an Agent platform similar to the capabilities offered by Zhipu and Qianwen. This in-house agent platform allows users to quickly create bots in a low-code or even zero-code way and build large-model workflows by dragging and dropping.

For example, to create a bot that queries the weather or a service that calls internal interfaces, users only need to drag and drop the corresponding process modules, such as the large-model processing module, the knowledge base module, the interface-call module, and a module that has the large model self-check the correctness of the workflow. After the workflow is completed, the user can publish it with a single click to generate an API for the business to access directly.
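The publish step can be pictured as composing the dragged modules into one callable. The weather-bot modules below are hypothetical stand-ins matching the example in the talk, not the platform's actual components.

```python
# Sketch of "drag-and-drop workflow -> published API": each module is a
# step function, and publishing chains them into a single callable.

def make_workflow(*steps):
    """Compose step functions left to right into one callable endpoint."""
    def api(payload):
        for step in steps:
            payload = step(payload)
        return payload
    return api

# Hypothetical modules for the weather-query example:
parse_city = lambda q: {"city": q.strip().title()}
call_weather_api = lambda d: {**d, "forecast": "sunny"}  # stand-in for an interface-call module
format_reply = lambda d: f"Weather in {d['city']}: {d['forecast']}"

weather_bot = make_workflow(parse_city, call_weather_api, format_reply)
```

A self-check module of the kind mentioned would simply be one more step inserted before `format_reply`, validating the intermediate payload.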

This ease of use has improved everyone's satisfaction, and the platform has been widely used within the company.

For Application 4, we provide API interfaces directly, without the agent platform. This approach suits scenarios that can be served straightforwardly without fine-tuning the model. We have connected a variety of commercial models, such as Wenxin Yiyan, Zhipu AI (GLM), and Tongyi Qianwen, combined with our own ChatLing model for rapid deployment.

For scenarios that need to be fine-tuned, we offer three different support options.

Application 3 is our most common practice: a dedicated fine-tuning team provides customized fine-tuning services for the business based on the Lingxi large model, realizing the platform's empowerment of the business.

For Application 2, the business side has its own algorithm team, so the ChatLing team does not need to participate directly. In this case, the algorithm team can perform in-depth customized fine-tuning and related operations directly on our large-model platform.

For business parties with particularly strong algorithm capabilities (Application 1), they may not need to fine-tune based on the Lingxi large model or its instruction-tuned variant. In such cases, we still provide support, including necessary parameter configuration and subsequent prompt optimization services, to help the business side optimize and reproduce fine-tuning data with chain-of-thought methods.

4. The Lingxi model better understands life services, and its online inference is 2.6 times faster than closed source

In terms of effect evaluation, we have trained and launched multiple versions of the Lingxi large model based on open-source large models of different sizes, including an MoE-architecture implementation.

We tested model capabilities on public evaluation platforms such as OpenCompass and MT-Bench, and provide four versions of the model, including Turbo and Plus. Compared with other open-source models, ours shows certain performance advantages on MMLU, C-Eval, and other benchmarks.

In addition to evaluations on open-source datasets, we also evaluated the models on internal data. For NLP and NLU tasks, our fine-tuned open-source model performs better than directly using the official open-source model.

This improvement is mainly because we incorporated a large amount of industry data into ChatLing's development and built a large model with industry characteristics, making our model more accurate and capable in understanding the life services and recruitment fields.

We conducted an experiment: we purchased the services of the top 1 and top 2 vendors in the market and tested them against our ChatLing.

We used ChatLing Turbo, which has about 10 billion parameters, and compared it with the commercial vendors' 100-billion-parameter large models. In this comparison, we made sure that the data and conditions used were as consistent as possible, with only the model itself differing.

The results show that at the 10-billion-parameter scale, our fine-tuned ChatLing model outperforms the commercial vendors' 100-billion-parameter models. This finding greatly strengthens our confidence in applying small-scale models to industry-specific data.

We broke down the capabilities of the large model and designed a scheme with four separate modules. Each module is a self-contained large model, deployed on a single base model via S-LoRA technology.

The first module is intent recognition, which judges the user's current intent. The second is question answering, which judges whether the user's question can be answered and generates the corresponding answer, involving both NLG and NLU tasks. The third is follow-up question generation, which identifies what is missing from the information the user has provided and generates questions to ask the user. The last is information extraction, which extracts key data from the information the user provides.

Among these four capabilities, intent recognition, information extraction, and part of the question answering module can be regarded as relatively direct NLU tasks, on which the large model shows high accuracy. Question answering and follow-up question generation involve more complex NLG tasks, and we optimized the model by combining NLG and NLU techniques to reduce the probability of hallucination.
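The four-module split can be sketched as a dispatcher: intent recognition runs first, then routes the turn to answering, extraction plus follow-up generation, or a fallback. Plain functions stand in for what would each be a LoRA-adapted model in the described system.

```python
# Sketch of the four-capability dialogue turn: intent recognition first,
# then routing to the other three modules. Each module would be its own
# S-LoRA-adapted model in production; callables stand in for them here.

def dialogue_turn(utterance, modules, extracted):
    """modules: dict of the four capabilities; extracted: accumulated slots."""
    intent = modules["intent"](utterance)
    if intent == "question":
        return modules["answer"](utterance)          # NLG + NLU: answer or decline
    if intent == "provide_info":
        extracted.update(modules["extract"](utterance))
        return modules["ask_followup"](extracted)    # ask for what's still missing
    return "Sorry, could you rephrase that?"
```

Splitting the turn this way is what makes per-module evaluation possible, which is how the comparison against the closed-source model in the next section is carried out.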

Through cases implemented with AI central control or Agents, we conducted an in-depth comparison and analysis of different large-model application strategies. In particular, we compared the approach of splitting the large model into four independent capabilities against fine-tuning the closed-source 100-billion-parameter large model and against GPT-4 Turbo directly managing the dialogue.

From these comparisons, we draw some valuable conclusions.

Compared with the commercial closed-source model, when we split the large model into four independent capabilities (intent recognition, question answering, follow-up question generation, and information extraction), each capability performs better than the closed-source model. This shows that training large models for this specific industry domain has been very successful.

Second, in terms of conversation fluency, if every round of the conversation must be fluent and accurate, traditional NLP reaches a high level that GPT-4's end-to-end scheme does not match. Whether using the closed-source large model or our Lingxi large model, the four-capability split significantly improves on traditional NLP.

In terms of inference speed, our 10-billion-parameter models are much smaller than the commercial models, and through inference acceleration our online inference speed is 2.6 times that of the fastest commercial closed-source large model.

In summary, vertical-domain large models have significant performance advantages over general-purpose open-source large models. On both open-source and closed-source data, vertical-domain large models can meet or exceed open-source models on traditional evaluation standards, while showing better performance in specific internal scenarios. Even compared with a fine-tuned commercial 100-billion-parameter general-purpose model, a 10-billion-parameter vertical-domain model is not inferior.

The democratization of large models is becoming increasingly apparent. Even a smaller model, such as the open-source Llama 3 8B, when trained with a corpus from a specific vertical domain, may outperform directly using or fine-tuning a commercial 100-billion-parameter general-purpose model in specific scenarios. This is valuable experience we have accumulated through practice.

The above is a complete summary of the content of Sun Qiming's speech.
