
Lv Zhongtao, CTO of ICBC: Two feasible paths for banks to apply large models

Author: Finance

"The basic large model requires a large amount of data, high computing-power costs, and difficult algorithms, so it is built by leading AI companies. Although its general knowledge capability is strong, it lacks financial expertise, so its application to financial scenarios is limited." Lv Zhongtao, academic director of the New Finance Alliance and Chief Technology Officer of ICBC, said this on June 10 at the internal seminar "Digital and Intelligent Transformation of Financial Institutions and the Application of Large Model Technology" held by the New Finance Alliance.

Lv Zhongtao believes there are two feasible paths to realizing large-scale application of large models in the financial industry:

For large financial institutions, with their massive financial data and rich application scenarios, it is advisable to introduce industry-leading basic large models and build financial-industry and enterprise large models in-house; given the long construction period, fine-tuning can be used to form task-level large models in professional fields to empower the business quickly.

For small and medium-sized financial institutions, it is advisable to weigh the cost-effectiveness of application output against input costs and introduce public-cloud APIs or privately deployed services of various large models on demand, directly meeting enablement requirements.

At the meeting, Shen Zhiyong, General Manager of the Data Management Department of Minsheng Bank; Liu Jinmiao, General Manager of the Digital Asset Management and R&D Center of Ping An Bank; and Hu Shiwei, Co-founder of Fourth Paradigm, also delivered keynote speeches. Li Lihui, Chairman of the New Finance Alliance and former Governor of Bank of China, and Zhao Xiaofei, Deputy Director of the Fintech Research Center of the China Academy of Information and Communications Technology, offered comments.

More than 170 guests from 56 banks and non-bank institutions and 55 technology companies participated in the conference online and offline. The meeting was chaired by Wu Yushan, Secretary-General of the New Finance Alliance, with academic support from the China Finance 40 Forum. For details of the meeting, see "How to Release the Value of Large Models to the Financial Industry?" The following is the full text of Lv Zhongtao's speech, which he has personally reviewed.

Exploration and practice in the industrial application of AI large models

Text | Lv Zhongtao


Hello, distinguished guests! It is an honor to share with the leaders and industry experts present ICBC's exploration and practice in applying artificial intelligence large models.


What is a large model

Since the release of ChatGPT in November 2022, large-model AI technology has become a hot topic across industries. The authoritative international journal Nature Machine Intelligence defines large models as pre-trained deep learning algorithms with more than 100 million network parameters. Through training on massive data, large models acquire powerful capabilities such as language understanding and expression and chain-of-thought reasoning, and show significant advantages and great potential in AI tasks such as text and image understanding and content generation.

Compared with the traditional one-model-per-scenario approach of AI, large models have stronger general capabilities, can handle a variety of tasks, and better solve the fragmentation problem of traditional models. Their characteristics can be summarized as "three bigs and one fast". The "three bigs" refer to training large models on "big computing power + big data + big algorithm parameter networks" to preset general massive knowledge. The "one fast" refers to the fact that, thanks to the strong general capability of large models, industries can use them directly or "stand on the shoulders of giants": through retraining on top of a large model, they can quickly learn new knowledge and quickly empower business applications.

Large models can be classified according to three dimensions: parameter scale, data mode, and modeling mode.

From the perspective of parameter scale, large models generally refer to deep learning models with more than 100 million parameters. As a complex neural network loosely analogous to the human brain, the larger a model's parameter scale, the more knowledge it can hold and the stronger its capabilities. By parameter count, large models can be divided into billion-, tens-of-billions-, hundreds-of-billions-, and even trillion-parameter models. Billion-parameter models have only simple recognition and analysis capabilities and are used for simple tasks such as text classification and text similarity. Tens-of-billions models have some text generation and general capabilities; they can handle simple logical reasoning and relatively easy tasks such as article summarization and casual chat, but struggle with logically complex, highly specialized tasks. Hundreds-of-billions models have larger "storage space", forget information less easily, and can learn from massive data; compared with tens-of-billions models, they perform markedly better on complex, highly professional tasks such as knowledge question answering, reading comprehension, logical reasoning, and article writing. Trillion-parameter models, because of their enormous computing-power consumption, will be difficult to commercialize and apply in the short term.

Compared with tens-of-billions models, hundreds-of-billions models have stronger data "memory", logical reasoning, and generation capabilities; compared with trillion-parameter models, their cost-effectiveness advantage is clear. Hundreds-of-billions models have therefore been the focus of development and application in recent years.
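
As a rough illustration of these tiers (an approximation not given in the speech), the non-embedding parameter count of a decoder-only transformer is often estimated as 12 x layers x width squared; the configurations below are hypothetical and chosen only to land in the tiers described above:

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Rough non-embedding parameter count of a decoder-only transformer:
    ~4*d^2 per layer for attention plus ~8*d^2 for the feed-forward block."""
    return 12 * n_layers * d_model ** 2

# Hypothetical configurations, one per capability tier described above.
for tier, layers, width in [
    ("billion-scale", 24, 2048),          # ~1.2B parameters
    ("tens of billions", 40, 5120),       # ~12.6B parameters
    ("hundreds of billions", 96, 12288),  # ~174B parameters
]:
    print(f"{tier}: ~{approx_params(layers, width) / 1e9:.1f}B")
```

The last configuration roughly matches the widely cited 175B figure for GPT-3-sized models, which is why the estimate is a useful sanity check.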

From the perspective of data modality: modality refers to the form in which data is expressed, usually including text, images, audio, and video. By the number of data modalities supported, large models can be divided into single-modal and multimodal models. Single-modal large models include natural language models that process text and vision models that process images; multimodal large models can process several data types at the same time, such as images and text. The relatively mature examples today are text-to-text and text-to-image multimodal models.

In November 2022, OpenAI launched the ChatGPT service based on a hundreds-of-billions-parameter natural language model, demonstrating excellent performance on general text tasks. Single-modal natural language models have become the focus of recent development thanks to their strong understanding ability and convenient mode of interaction. Although GPT-4 has moved into multimodality, single-modal language and vision models still have room to develop; not everything will become multimodal at once.

Artificial intelligence modeling refers to using various algorithms and technologies to build models that analyze, understand, and generate answers to all kinds of real-world problems.

From the perspective of modeling method, large models divide into analytical and generative models. The analytical large model, also called the discriminative large model, classifies or predicts unknown data by learning historical patterns in the training data; it is generally used for analysis and understanding tasks with simpler contexts, such as text classification, with Google's BERT as a typical example. The generative large model learns the patterns by which data is generated, so it can both analyze and understand data and create new sample content; it can be used for intelligent content-creation tasks such as article writing and code generation, with the models behind OpenAI's ChatGPT as typical examples.

Compared with analytical models, generative large models have powerful content generation and analytical reasoning capabilities, marking artificial intelligence's leap from traditional recognition and analysis to generation and creation. The industry calls this AIGC (AI-generated content), and it has become a hot application area.

For example, OpenAI's GPT-4 is a hundreds-of-billions-scale, multimodal, generative large model.
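
To make the analytical/generative distinction concrete, here is a deliberately tiny sketch (illustrative only; it has nothing to do with BERT's or GPT-4's internals): the discriminative part maps input text to a label, while the generative part samples new text from learned bigram statistics.

```python
import random
from collections import defaultdict

# Discriminative sketch: map input text to a label via keyword evidence.
POSITIVE = {"gain", "growth", "profit"}
NEGATIVE = {"loss", "risk", "default"}

def classify(text: str) -> str:
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    return "positive" if pos >= neg else "negative"

# Generative sketch: learn bigram statistics, then sample new sequences.
def train_bigrams(corpus: str) -> dict:
    model = defaultdict(list)
    tokens = corpus.split()
    for a, b in zip(tokens, tokens[1:]):
        model[a].append(b)
    return model

def generate(model: dict, start: str, length: int = 5) -> str:
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

print(classify("quarterly profit growth"))  # prints "positive"
model = train_bigrams("rates rise and markets fall and rates rise again")
print(generate(model, "rates"))  # new text sampled from the learned patterns
```

The first model can only assign known labels; the second produces content that did not exist in the training data, which is the essence of the generative leap described above.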

Since the birth of ChatGPT, the wave of large models at home and abroad has continued to rise, and an industrial-chain ecosystem spanning algorithm R&D and product application has formed. However, compared with the international leading level, domestic large models still lag by a generation. In the algorithm ecosystem, the strongest large model is OpenAI's GPT-4, a multimodal model that reaches human level on professional and academic benchmarks in many fields. In China, AI companies have entered the market one after another, launching products such as Baidu's Wenxin Yiyan (ERNIE Bot), Tsinghua's GLM, Alibaba's Tongyi Qianwen, and iFLYTEK's Xinghuo (Spark), each with its own strengths. At the application level, Microsoft, relying on its investment in GPT-4, has launched intelligent products in traditional fields such as search, office, and security; domestic applications are still at an early stage and need further exploration.


The relationship between large models and traditional models

Commercial banks cannot rely on a single large model to do everything. Large models and traditional models are related, and the two should advance in parallel.

To clarify the relationship between large models and traditional models, we must first understand the position of large models within AI technology. In ICBC's practice, large-model technology is not a standalone algorithm or service but a complex systems project, encompassing computing-power cluster construction, algorithm accumulation, supporting pipeline tools, and model services; through the empowerment of large models, vertical technology platforms such as natural language processing, image recognition, and knowledge graphs have been iteratively upgraded.

After more than five years of construction, ICBC has accumulated more than 3,000 AI models, including traditional machine learning models, traditional deep learning models, and large models. First, traditional machine learning models, thanks to their strong interpretability, are widely used in intelligent decision analysis such as fraudulent-transaction prediction and wealth-management product marketing recommendations. Second, traditional deep learning models are widely used in perception and recognition tasks such as OCR, face recognition, and speech recognition, effectively improving labor-intensive work. Third, after exploration and practice, large models can be used for AIGC tasks on text and images, improving the quality and efficiency of knowledge-intensive work.

In terms of trend, as general capabilities grow, large models will gradually surpass traditional models. But constrained by problems such as high computational complexity and poor interpretability, large models and traditional models will coexist in the short term; meanwhile, a large model can serve as a central controller that calls traditional models as skills. In the future, as computational cost falls and interpretability improves, large models will, on overall cost-effectiveness, gradually replace traditional models.
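
The "central control plus skills" pattern described above can be sketched as a simple router. All names below are hypothetical: the intent detector is a keyword stand-in for a real large-model call, and the skill functions are placeholders for deployed traditional models.

```python
# Sketch of "large model as central control, traditional models as skills".

def fraud_model(request: str) -> str:
    # Placeholder for a traditional fraud-prediction model.
    return f"fraud score computed for: {request}"

def ocr_model(request: str) -> str:
    # Placeholder for a traditional OCR model.
    return f"text extracted from document in: {request}"

SKILLS = {"fraud": fraud_model, "ocr": ocr_model}

def detect_intent(prompt: str) -> str:
    """Keyword stand-in for the large model's understanding of the request."""
    for intent in SKILLS:
        if intent in prompt.lower():
            return intent
    return "unknown"

def central_control(prompt: str) -> str:
    skill = SKILLS.get(detect_intent(prompt))
    return skill(prompt) if skill else "no matching skill"

print(central_control("run a fraud check on transaction T-1001"))
```

In a real system, the large model would both parse the request and compose the skill's output into a natural-language reply; the dispatch structure, however, stays the same.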

Large models and traditional models differ in development and application mode. Compared with traditional models: first, in development, large models demand more in computing-power investment, data accumulation, personnel skills, and algorithm complexity, and are generally built by a dedicated team training one model with many people; second, in application, access evolves from fragmented API calls to unified prompt-based calls, reducing the difficulty of technology integration and development.


The application value of large models for commercial banks

Large models are a new type of AI technology, and ICBC actively explores and applies them to raise the level of intelligence in business areas such as intelligent customer service, smart office, operations management, marketing content creation, and intelligent R&D, genuinely solving the pain points of front-line employees.

In the field of intelligent customer service, there are a large number of handling regulations for credit cards, deposits, loans, and other business. In the traditional mode, agents serving a customer must interact with the system many times; the whole process takes time, and customers wait long. Using the document understanding, analysis, and generation capabilities of large models, comprehensive, professional, and accurate responses can be automatically summarized and refined from the vast body of banking regulations, giving agents a reference and improving response efficiency and customer satisfaction.
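
The first step of such an agent-assist flow, finding the regulation passage relevant to a question, can be sketched as a term-overlap retriever. This is a toy stand-in with made-up regulation text; in the setup described, a large model would then summarize the retrieved passage into a reply.

```python
# Toy retrieval step for agent assistance: pick the regulation passage
# that best matches the customer's question by shared terms.

REGULATIONS = [
    "Credit card late fees are waived once per year on request.",
    "Fixed deposits withdrawn early earn the demand deposit rate.",
    "Loan prepayment requires a written application five days ahead.",
]

def retrieve(question: str) -> str:
    q_terms = set(question.lower().split())
    return max(REGULATIONS,
               key=lambda p: len(q_terms & set(p.lower().split())))

print(retrieve("can a late fee on my credit card be waived"))
```

Production systems would use embeddings rather than raw term overlap, but the retrieve-then-generate shape is the same.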

In the field of operations management, branch employees face daily pain points such as difficulty retrieving rules and standards, complex business handling, and interpreting professional terms. Large-model capabilities such as summarization and information extraction can convert large numbers of "static" documents into scenario-based, process-oriented "live" guidance, improving branch employees' business response and communication and helping create high-quality service.

In the field of smart office, large-model AIGC capabilities help write meeting summaries, draft reports, edit documents, and make posters, improving office efficiency. For example, in meeting-minutes generation, the large model quickly produces a first draft of the minutes from the meeting dialogue, reducing the cost of manual note-taking. Meanwhile, large-model code generation and code completion can improve front-line developers' coding efficiency and quality.


Risks and challenges faced by large model applications

In essence, a large model is a deep learning algorithm with massive parameters. Constrained by factors such as model black-box behavior and high computational complexity, it suffers from problems such as irrelevant or fabricated answers and technology-ethics risks. For example, ChatGPT generates large amounts of content that appears logical but may be untrue or even fabricated, creating security risks such as illegal exploitation and rumor-mongering.

The state attaches great importance to the security of large-model applications. The Cyberspace Administration of China has made clear that content generated by artificial intelligence "should reflect the core socialist values"; caution is required in customer-facing uses, and customer-facing scenarios require unified approval.

Although large models carry various security risks, they also bring new opportunities for the digital transformation of the banking industry. In this process, many challenges in data, computing power, algorithms, and applications need to be solved.

First, large models need big data. A data-driven approach releases the value of data as a production factor, accelerates the building of financial-industry and enterprise large models, and speeds up the banking industry's digital transformation.

Second, large models need large computing power. At present, the computing-power market at home and abroad faces a complex situation: short supply of computing power, integration of heterogeneous hardware from multiple vendors, an immature domestic AI ecosystem, and demanding data-center and network construction. Financial institutions need to deepen cooperation with industry partners to jointly solve the challenges of deploying and applying computing power at scale.

Third, large models need broad cooperation. The banking industry should accelerate its exploration of strategies and practices for introducing large-model technologies common to the industry, and strengthen large-model capabilities by promoting the application of large-model algorithms in banking, thereby improving how well large models serve the financial industry.

Fourth, large models need big innovation. For large models to be applied deeply in banks, it is necessary to explore and form a high-standard, low-threshold application mode for financial large models, so that the banking industry can quickly deepen the application of artificial intelligence in finance.


An implementation plan for commercial banks

Regarding the application of large models, there is currently no standard methodology in the industry. Depending on how general or specialized the scenario is, enterprises can use basic large models, industry large models, enterprise large models, and task large models; across these four layers, the scale of training data and the computing power invested decrease step by step, while professional specificity increases layer by layer.

Among them, the basic large model, because of its huge data input, high computing-power cost, and difficult algorithms, is built by leading AI companies. Although its general knowledge capability is strong, it lacks financial expertise, so its application to financial scenarios is limited.

To realize the large-scale application of large models in the financial industry, there are two feasible paths:

For large financial institutions, with their massive financial data and rich application scenarios, it is advisable to introduce industry-leading basic large models and build financial-industry and enterprise large models in-house. Considering the long construction period, fine-tuning can be used to form task-level large models in professional fields to empower the business quickly. For example, in the early stage, our bank worked with Pengcheng Laboratory and, through fine-tuning, took the lead in the industry in putting an AI large model into application.

For small and medium-sized financial institutions, it is advisable to weigh the cost-effectiveness of application output against input costs and introduce public-cloud APIs or privately deployed services of various large models on demand, directly meeting enablement requirements.
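
The speech does not name a fine-tuning method, but one common parameter-efficient approach is a LoRA-style low-rank update: the frozen base weight matrix W is adapted as W + scale * (A @ B), where only the small matrices A and B are trained. A minimal pure-Python sketch under that assumption:

```python
# Hypothetical illustration of low-rank (LoRA-style) fine-tuning.
# The base weight W stays frozen; only adapters A (d x r) and B (r x d)
# are trained, giving the adapted weight W' = W + scale * (A @ B).

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_update(W, A, B, scale=1.0):
    delta = matmul(A, B)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# 3x3 frozen base weight with rank-1 adapters: 6 trainable numbers
# instead of 9, and the saving grows quadratically with width.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0], [0.0], [2.0]]
B = [[0.1, 0.2, 0.3]]
print(lora_update(W, A, B))
```

This is why fine-tuning can produce a task model quickly: the base model's knowledge is kept, and only a small fraction of parameters is updated for the professional field.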

From ICBC's early practice, we believe large models have clear advantages in AIGC capabilities for text, images, and other modalities, but they are not yet mature at this stage, and problems such as technology-ethics risks remain. Therefore, we do not recommend using them directly with customers in the short term; priority should go to knowledge-intensive scenarios such as financial text and financial image analysis and understanding, improving business staff's quality and efficiency in the form of assistants and human-machine collaboration.

After early exploration of large models in digital employees, ICBC found that large-model human-computer interaction and information aggregation can integrate multiple capabilities. On the one hand, system entrances can be unified: through natural-language interaction, every employee gains a new mode of financial work spanning analysis, prediction, and monitoring, effectively equipping each employee with an AI assistant. On the other hand, integrating large models, traditional models, and business transaction processes is more conducive to unleashing the multiplier effect of data as a production factor and to making business processes more efficient.

We believe that with the help of large models, financial institutions will achieve more intelligent human-machine collaboration, business decision-making, and business processes, better realize digital transformation, and ultimately empower the real economy and people's better lives.

This article is from the New Finance Alliance (NFA).