
Practice and Application of the XuanYuan Large Model | ML-Summit 2024

Author: CSDN

In the wave of artificial intelligence, big models are reshaping the future of fintech in their own unique way. At the 2024 Global Machine Learning Technology Conference on April 26, Yang Qing, general manager of Du Xiaoman's data intelligence department and executive chairman of the technical committee, shared the practice and application of the Xuanyuan model in the financial industry. This article will comprehensively introduce the implementation of the Xuanyuan model in the financial field.

Slides: Yang Qing, "XuanYuan Large Model Practice and Application" (PDF)

Author | Yang Qing

Published by | 青哥谈 AI


Redefining Finance: A Paradigm Revolution Led by Large Models

Large models are leading the financial industry toward a cognitive-intelligence paradigm, bringing about a financial ecosystem of "human-machine symbiosis". The industry has passed through three stages: traditional finance, Internet finance, and intelligent finance. In the traditional stage, the industry followed an information-intermediary paradigm: institutions delivered services largely by hand, and IT systems were used mainly for back-office data processing. Service efficiency was constrained by human factors, making this a "human-led" model. In the Internet-finance stage, the Internet broke the limitations of time and space, letting financial services reach a far wider population; online, mobile, and platform-based services became mainstream, and cloud computing and big data allowed the strengths of humans and machines to complement each other, achieving a "1+1>2" effect. In the current intelligent-finance stage, technological innovation drives business transformation: large models reshape business processes, empower traditional AI services such as intelligent risk control and intelligent management, and large-model Copilots and Agents are now widely deployed. Across these three stages, the human-machine relationship has evolved from "human-led" through "human-machine collaboration" to "human-machine symbiosis".


In this context, financial institutions need to embrace the new paradigm of cognitive intelligence and let large models lead the financial transformation. Concretely, this means accelerating the digital and intelligent transformation of the financial industry by drawing on the multi-dimensional capabilities of large models: understanding, memory, generation, knowledge, and logic. These capabilities can be applied in four areas, namely product and service development, intelligent customer experience, business process redesign, and the construction of a human-machine-symbiosis management system, with the aim of making interaction natural, decision-making collaborative, and execution automated.


Building a financial large model: technology integration and iterative upgrading

Advancing a financial large model from general knowledge to scenario specialization resembles a person's growth from middle-school student to working professional. In the "middle school" stage, the model receives a general education, mastering broad language understanding and information-processing skills as a foundation for later learning. In the "university" stage, it receives a professional education: training on large-scale text in the financial domain gives it a deep grasp of financial terminology, industry cases, expert experience, and best practices, gradually building the professional capabilities the industry requires. Finally, as a "working professional", the model is guided by on-the-job output, with feedback from real financial scenarios further improving its performance and adaptability. These three phases correspond to the stages of model building: data optimization, financial enhancement, value alignment, and application enhancement.


First, data-driven training enables the large model's breakthrough in intelligence. Screening massive financial data is a process of "panning for gold in sand": the XuanYuan team built an intelligent data-processing pipeline covering text extraction, data cleaning, and quality and safety assessment. Through rule filtering, model filtering, deduplication, and quality filtering, it retained 32% of the original Chinese data as the high-quality essence, producing model training data comprising 10TB of general corpus and 1TB of financial corpus.
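The staged filtering described above can be sketched as a simple pipeline. This is a hypothetical illustration of the rule / deduplication / quality stages, not the team's actual code; the length threshold, MD5-based exact deduplication, and the `score_fn` quality classifier stand-in are all assumptions.

```python
import hashlib

def rule_filter(doc: str) -> bool:
    # Rule stage: drop documents that are too short to be useful.
    return len(doc) >= 50

def dedup_filter(doc: str, seen: set) -> bool:
    # Dedup stage: exact dedup via content hashing; production pipelines
    # often use approximate methods such as MinHash/LSH instead.
    h = hashlib.md5(doc.encode("utf-8")).hexdigest()
    if h in seen:
        return False
    seen.add(h)
    return True

def quality_filter(doc: str, score_fn, threshold: float = 0.5) -> bool:
    # Quality stage: score_fn stands in for a learned quality classifier.
    return score_fn(doc) >= threshold

def run_pipeline(docs, score_fn):
    seen = set()
    kept = []
    for doc in docs:
        if rule_filter(doc) and dedup_filter(doc, seen) and quality_filter(doc, score_fn):
            kept.append(doc)
    return kept
```

Chaining the cheap rule and hash filters before the model-based quality filter keeps the expensive classifier off documents that would be dropped anyway.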


The team's quality-model library includes a text-quality discrimination model, a knowledge discrimination model, and a content-structure discrimination model, providing all-round data quality control to safeguard model training. Manual evaluation showed that the quality of the filtered data improved significantly, by 48%, which in turn led to a marked improvement in model performance.


In addition, the team built an industry-leading content security system based on multi-domain content security standards to identify malicious information efficiently and firmly hold the security bottom line of financial development. The system combines an active-learning annotation process with automatic adversarial generation by large models, improving both data-production efficiency and the system's prevention and control capabilities, and reducing sensitive or malicious content across multiple domains to below 1%.


Second, pre-training builds the AI foundation for finance. A Chinese large model must address vocabulary construction: a single Chinese character may be encoded as multiple Unicode byte tokens, slowing decoding and lengthening encoded sequences. Since a larger vocabulary benefits long-text modeling and inference efficiency, the team adopted a high-compression-granularity expansion method, adding 7K Chinese characters and 25K Chinese words to bring the new vocabulary to 64K. Pre-training proceeds in two stages. In the first stage, only the model's embedding layer and decoding linear layer are updated, so the model adapts to the newly added vocabulary while its original decoding behavior is preserved; the data distribution and types stay consistent with the original model, with Chinese and English each accounting for 50% of the data. In the second stage, all parameters are updated and the proportion of general Chinese and financial-domain data is increased: Chinese accounts for 60%, English for 25%, and the financial domain for 15%. This two-stage pre-training makes the convergence of the domain model more stable. By adjusting the data mix and adding Chinese and financial data, the team expected English ability to be maintained, Chinese knowledge to be enhanced, and financial ability to improve; in practice, all three abilities improved over the course of training.
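The stage-2 data mixture described above can be illustrated with a tiny weighted sampler that draws each next training document from a source according to the stated proportions (Chinese 60%, English 25%, financial 15%). The source names and the sampling mechanism here are assumptions for illustration, not the team's training code.

```python
import random

# Stage-2 target mixture from the text; source names are placeholders.
STAGE2_MIX = {"zh_general": 0.60, "en_general": 0.25, "finance": 0.15}

def sample_source(mix=STAGE2_MIX, rng=random):
    # Pick the source of the next training document by mixture weight.
    sources = list(mix)
    weights = [mix[s] for s in sources]
    return rng.choices(sources, weights=weights, k=1)[0]

def estimate_mix(n=100_000, seed=0):
    # Sanity check: the empirical draw frequencies approach the target mix.
    rng = random.Random(seed)
    counts = {s: 0 for s in STAGE2_MIX}
    for _ in range(n):
        counts[sample_source(rng=rng)] += 1
    return {s: counts[s] / n for s in counts}
```

Sampling per document (rather than concatenating corpora) keeps the mixture stable at every point in training, which matters when the goal is to shift the distribution gradually without destabilizing convergence.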


In addition, long-context capability is a key requirement for deploying large models in financial scenarios. The team pushed the model's context length to 100K and summarized practical experience with three implementation approaches: direct extrapolation, extrapolation plus short-context training, and extrapolation plus long-context training.


Next, instruction fine-tuning builds a comprehensive financial cognitive "brain". For data construction, pursuing low-cost, high-quality SFT data, the team adopted its self-developed Self-QA method, which builds data through a three-step strategy: knowledge-guided instruction generation, machine reading comprehension, and construction and filtering. The instruction data spans the general and financial domains. The general domain accounts for 80%, covering 8 categories and 50 subcategories: encyclopedic common sense, creative generation, code programming, safety and harmlessness, logical reasoning, summarization, mathematical calculation, and information extraction. The financial-domain data covers 4 categories and 20 subcategories: financial encyclopedia, financial calculation, research-report interpretation, and customer-service skills. Combining hybrid fine-tuning with instruction fine-tuning balances the model's general and financial capabilities and avoids catastrophic forgetting.
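The three Self-QA steps can be sketched as a small loop over unlabeled documents: generate questions from a document, answer them with the document as context, then filter weak pairs. The `llm` callable, the prompt wording, and the length-based filter are illustrative assumptions; the published Self-QA method uses stronger quality filtering than this.

```python
def self_qa(document: str, llm, min_answer_len: int = 20):
    # Step 1: knowledge-guided instruction generation — ask the model to
    # propose questions the document can answer.
    questions = llm(f"Read the text and write questions it can answer:\n{document}")
    pairs = []
    for q in questions:
        # Step 2: machine reading comprehension — answer grounded in the text.
        a = llm(f"Answer using only this text:\n{document}\nQuestion: {q}")
        # Step 3: filtering — here a crude length check stands in for the
        # real quality filters.
        if len(a) >= min_answer_len:
            pairs.append({"instruction": q, "output": a})
    return pairs
```

Because every answer is generated with the source document in context, the resulting instruction pairs stay grounded in real corpus knowledge rather than the generator's free associations.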


Finally, reinforcement learning lets the model "surpass itself". Reinforcement learning is a machine learning approach that learns an optimal policy by interacting with an environment. Compared with traditional supervised learning, reinforcement learning from human feedback (RLHF) learns from environmental feedback: it can explore more and broader samples, reinforcing positive examples and suppressing negative ones, and its loss is based on soft labels of advantage values, which gives the model better generalization. By carefully designing the reward model, overcoming the difficulties of PPO training, and innovating on the training process, the team achieved notable reinforcement-learning results, aligning the model better with human preferences in response style, language register, and answer content.
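At the heart of the RLHF reward-model stage is the standard pairwise ranking loss, -log σ(r_chosen − r_rejected), which pushes the reward of the human-preferred response above the rejected one. The scalar scores here are plain floats for illustration; in practice they come from a reward head on the language model.

```python
import math

def reward_pair_loss(r_chosen: float, r_rejected: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected)): near zero when the chosen
    # response already scores much higher, large when the order is wrong.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The trained reward model then supplies the scalar signal that PPO optimizes against, which is why reward-model quality largely bounds what the PPO stage can achieve.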


To evaluate model performance scientifically and guide optimization precisely, the team built a comprehensive evaluation system: "horizontal evaluation" across different models reveals gaps, while "vertical evaluation" of the same model across stages tracks progress. The pre-training stage focuses on training anomalies and base-model quality; the instruction fine-tuning stage on whether dialogue ability and generalization are sufficient; the reinforcement-learning stage on whether safety improves while usefulness is maintained. Evaluation runs in two modes: real-time evaluation, where each checkpoint automatically triggers the evaluation pipeline, and stage evaluation, which uses an all-round "automatic + manual" system; intelligent evaluation tooling and specifications are essential for efficiency and consistency. Building on this practice, the team open-sourced FinanceIQ, an automatic evaluation set for financial large models. FinanceIQ focuses on Chinese financial tasks, covering 10 financial categories and 36 subcategories with 7,173 questions in total, drawn mainly from authoritative financial examinations such as Certified Public Accountant (CPA), tax agent, economist, banking qualification, fund qualification, securities qualification, futures qualification, insurance qualification (CICE), and certified financial planner.
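Scoring a model on a FinanceIQ-style multiple-choice set reduces to comparing predicted option letters against gold answers and reporting per-category accuracy. The record layout and field names below are an assumption about how such a set might be organized, not FinanceIQ's actual schema.

```python
from collections import defaultdict

def score(records, predict):
    # records: [{"category": ..., "question": ..., "answer": "A"}, ...]
    # predict: callable mapping a question to an option letter.
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        if predict(r["question"]) == r["answer"]:
            correct[r["category"]] += 1
    # Per-category accuracy, e.g. one entry per exam type (CPA, tax agent...).
    return {c: correct[c] / total[c] for c in total}
```

Per-category breakdowns matter more than a single aggregate here, since an exam-based set mixes tasks (calculation-heavy CPA items versus recall-heavy qualification items) on which a model can diverge sharply.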


The implementation of financial large models: practical challenges and solutions

Large models will create incremental value for the financial industry, but deployment faces many challenges, chiefly around financial knowledge, financial capability, and application cost. On financial knowledge, large models can suffer from hallucination, accuracy, and forgetting problems, undermining usability and reliability. On financial capability, they need logical, reasoning, and decision-analysis abilities, yet these often require human intervention to be effective. On application cost, large models consume substantial GPU compute along with adaptation, inference, and maintenance costs, making them expensive to apply.

To address these three challenges, the XuanYuan model has made effective explorations in retrieval-augmented generation (RAG), agents, and model quantization, easing the difficulties of deploying large models in finance.
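Of these, RAG is the most direct answer to the hallucination problem: retrieve relevant documents first, then have the model answer only from that context. The sketch below uses a toy word-overlap retriever and a placeholder `llm` callable; a real system would use dense embeddings and a vector index, so treat every name here as an assumption.

```python
def retrieve(query: str, docs, k: int = 2):
    # Toy retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_answer(query: str, docs, llm):
    # Ground the generation in the retrieved context.
    context = "\n".join(retrieve(query, docs))
    return llm(f"Answer from the context only.\nContext:\n{context}\nQuestion: {query}")
```

Because the prompt restricts the model to retrieved text, factual claims can be traced back to source documents, which is exactly the auditability financial scenarios demand.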


Value Creation of Financial Models: Empowerment, Innovation and Change

Overall, built on the comprehension, generation, logic, and memory capabilities of large language models, the core capabilities of financial large models center on personalized generation, interaction enhancement, knowledge enrichment, and predictive analysis. The financial large model reshapes the financial value chain point by point across financial services, financial management, operations, marketing, office work, and R&D. In financial services, customer-information labeling and voice-based recommendation shift the focus from cost reduction and efficiency gains to value creation. In financial management, analyst assistants and financial consultants provide professional-grade financial analysis. In operations, NL2SQL and investment-research and advisory tools enable a new end-to-end operating model. In marketing, communication insights and intelligent delivery support an integrated marketing workshop. In the office, intelligent search and knowledge assistants raise employee productivity. In R&D, code generation and unit-test generation help improve development quality and efficiency.


"Xuanyuan" model: beyond cognition, towards AGI

To meet the challenges of deploying large models in financial scenarios and to share practical experience and results with the whole industry, we open-sourced the XuanYuan series of large models. Du Xiaoman's XuanYuan is the first open-source Chinese financial large model in China. In May 2023, the hundred-billion-parameter-scale Chinese model XuanYuan-176B was released as open source. In September 2023, XuanYuan-70B topped all open-source models on the two authoritative leaderboards C-Eval and CMMLU. In March 2024, the XuanYuan series added 12 new financial models, including base, chat, and int4/int8 quantized models at the 6B, 13B, and 70B parameter scales, all fully open source for developers to download and use.


The "Xuanyuan" model has excellent ability to understand and generate content in the financial field. On the FinanceIQ test set, XuanYuan-70B-V2 shows a level that surpasses GPT-4 and demonstrates expert-level financial knowledge capabilities. In terms of the ability to solve practical financial tasks, the manual evaluation results of financial experts show that the Xuanyuan large model of each parameter size has the strength of "fighting big with small", reaching the model level of 2-5 times the number of parameters. "Xuanyuan" not only has excellent results in the financial field, but also has model capabilities covering multiple general ability dimensions such as mathematical calculation, scenario writing, logical reasoning, and text summarization, and has performed well in mainstream evaluation sets including MMLU, CEVAL, CMMLU, GSM8K, HumanEval, etc., and has even surpassed GPT-4 in multiple Chinese evaluation lists.

  • "Xuanyuan" open source address: https://github.com/Duxiaoman-DI/XuanYuan
