How was the "Cao Zhi" large language model born? Hear it from the CTO of Daguan Data

Author: Shangguan News
"Cao Zhi composed a poem within seven paces, and his most famous work, Luoshenfu, is a classic long text of ancient literature. That is also the specialty of the 'Cao Zhi' large model: intelligent analysis and writing of long documents." At the 2023 World Artificial Intelligence Conference (hereinafter "WAIC2023"), Chen Yunwen, chairman of Daguan Data, officially released the "Cao Zhi" vertical-domain large language model (hereinafter the "Cao Zhi" large model).

This is the first independently controllable domestic GPT-style large language model in China dedicated to vertical industries. It can accurately write long texts of many types and complex structures, automatically draft many kinds of documents, and in the future generate multimodal content within long documents, such as tables, charts, and pictures. So how was "Cao Zhi" born? Here is the account of Ji Daqi, CTO of Daguan Data.

Deeply rooted in the field of NLP

Daguan Data was founded in 2015 and grew up in Shanghai Pudong Software Park. Its founding team members are all veteran programmers who have worked with Chinese text for more than a decade and are deeply engaged in the field of NLP (natural language processing). Since releasing the "Cao Zhi" large model in March this year as a vertical, dedicated, independently controllable domestic counterpart to ChatGPT, Daguan Data has continued to push the deep integration of NLP technology into different industries.

NLP is known as the crown jewel of AI. Expanding from the Internet sector to a wider range of industries, Daguan Data has accumulated a large amount of data, talent, and traditional NLP architecture in vertical fields such as finance, government affairs, and manufacturing. After extensive exchanges with customers in these industries, Ji Daqi, co-founder and CTO of Daguan Data, gradually found that NLP technology has broad application prospects in office documents.

In 2017, Google published a paper proposing two technical routes for NLP: "understanding" and "generation". "Based on Daguan Data's strengths and future development, we chose the 'understanding' route from the beginning," Ji Daqi said. That same year, the IDP intelligent document review system, developed by Ji Daqi and the R&D team using knowledge graphs, text recognition, and other technologies, entered the market.

As artificial intelligence continues to develop, the need for machines to process long text intelligently is becoming increasingly urgent. Daguan Data subsequently committed itself to developing a large language model, with Ji Daqi as the overall project lead. This was the starting point of today's "Cao Zhi" large model.

"Cultivating" an artificial intelligence version of "Cao Zhi"

"We want to 'cultivate' an artificial intelligence version of 'Cao Zhi', hoping that it can quickly generate long texts like China's historical figure Cao Zhi." Speaking of how the "Cao Zhi" large model got its name, Ji Daqi said with a smile, "Our employees voted for it out of forty or fifty candidate names."

"Long text" is the target task of the "Cao Zhi" large model. Unlike simple question-and-answer short text generation, the "Cao Zhi" large model can accurately write long texts of many types and complex structures and automatically draft various kinds of documents. Its features include automatic typesetting, intelligent error correction, text polishing, and automatic summary generation. It can also generate multimodal content within long documents, such as tables, charts, and pictures. It supports writing in dozens of languages, including Chinese, English, French, German, Japanese, and Korean, helping staff greatly improve office efficiency. For long document translation, it restores the layout of the original text's titles, paragraphs, and other content 1:1 and provides a real-time translation experience, and it is widely used in scenarios that involve intensive processing of multilingual documents.

It is also among the first large language models in China to reach industrial application level and be deployed in practice, and it has already been put to use in multiple AIGC scenarios in the financial field. Based on the "Cao Zhi" system, the "Cao Zhi" large model further consolidates the intelligent foundation of Daguan Data's industry applications and comprehensively enhances the capabilities of its full AI product matrix.

Responsible Editor: Yang Linyu

Text: Lu Xiaoyu

Source: Pudong Release
