How was the "Cao Zhi" large language model born? Daguan Data's CTO tells the story

Author: Shangguan News 2023-08-29 17:21:00

"Cao Zhi composed a poem in seven steps, and his most famous work, Luoshenfu, is a classic long text of ancient literature. That is also the specialty of the 'Cao Zhi' large model: intelligent analysis and writing of long documents." At the 2023 World Artificial Intelligence Conference (hereinafter "WAIC2023"), Chen Yunwen, chairman of Daguan Data, officially released the "Cao Zhi" vertical-domain large language model (hereinafter the "Cao Zhi" large model).

This is the first independent and controllable domestic GPT large language model in China dedicated to vertical industries. It can accurately write long texts of many types and complex structures, automatically draft many kinds of documents, and, in the future, generate multimodal content such as the tables, charts, and pictures found in long documents. So how was "Cao Zhi" born? Let's hear the account of Ji Daqi, CTO of Daguan Data.

Deeply engaged in the field of NLP

Daguan Data was founded in 2015 and grew up in Shanghai Pudong Software Park. Its founding team members are all veteran programmers who have worked with written text for more than a decade and are deeply engaged in the field of NLP (natural language processing). Since releasing the "Cao Zhi" large model in March this year, a vertical, dedicated, independent and controllable domestic counterpart to ChatGPT, Daguan Data has kept pushing the deep integration of NLP technology into different industries.

NLP is known as the crown jewel of AI. Expanding from the Internet sector to a wider range of industries, Daguan Data has accumulated extensive data, talent, and traditional NLP architectures in vertical fields such as finance, government affairs, and manufacturing. After extensive exchanges with customers in these industries, Ji Daqi, co-founder and CTO of Daguan Data, gradually found that NLP technology has broad application prospects in office documents.

In 2017, Google published a paper proposing two technical routes for NLP: "understanding" and "generation". "Given Daguan Data's strongest resources and its future development, we chose the 'understanding' route from the very beginning," Ji Daqi said. That year, the IDP intelligent document review system, which Ji Daqi and the R&D team built with knowledge graph, text recognition, and other technologies, entered the market.

As artificial intelligence continued to develop, the need for machines to process long text intelligently became increasingly urgent. Daguan Data then committed itself to developing a large language model, with Ji Daqi as overall lead of the project. This was the starting point of today's "Cao Zhi" large model.

"Cultivate" an artificial intelligence version of "Cao Zhi"

"We want to 'cultivate' an artificial intelligence version of 'Cao Zhi', hoping that it can generate long texts as quickly as Cao Zhi, the celebrated figure from Chinese history." Speaking of how the "Cao Zhi" large model got its name, Ji Daqi said with a smile, "Our employees voted for it out of forty or fifty candidate names."

"Long text" is the target task of the "Cao Zhi" large model. Unlike simple question-and-answer generation of short text, the "Cao Zhi" large model can accurately write long texts of many types and complex structures and automatically draft various kinds of documents, with features such as automatic typesetting, intelligent error correction, text polishing, and automatic summary generation. It can also generate multimodal content, such as the tables, charts, and pictures in long documents. It supports writing in dozens of languages, including Chinese, English, French, German, Japanese, and Korean, helping staff greatly improve office efficiency. For long-document translation, it restores the layout of the original text 1:1, including titles, paragraphs, and other content, and provides a real-time translation experience. This capability is widely used in scenarios that involve intensive processing of multilingual documents.

It is also among the first batch of large language models in China to reach industrial-grade deployment, and it has been put to use in multiple AIGC scenarios in the financial field. Built on the "Cao Zhi" system, the "Cao Zhi" large model further consolidates the intelligent foundation of Daguan Data's industry applications and comprehensively strengthens its full AI product matrix.

Responsible Editor: Yang Linyu

Text: Lu Xiaoyu

Source: Pudong Release
