
Liu Cong, Vice President of iFLYTEK: Technological Leaps and Disruptive Impacts of Cognitive Intelligence Large Models | WISE2023 "Disruption · AIGC" Industry Development Summit

Author: 36Kr

On May 23, 36Kr held the "Disruption · AIGC" Industry Development Summit. The summit brought together players from across the industry to discuss how enterprises and sectors can respond to the changes underway, share their thinking, identify the most promising companies and most valuable technologies, and chart a way forward in a turbulent environment.

At the conference, Liu Cong, Vice President of iFLYTEK and President of the iFLYTEK Research Institute, delivered a keynote speech entitled "Technological Leaps and Disruptive Impacts of Cognitive Intelligence Large Models". Liu Cong believes that the "emergence of intelligence" in large models has opened a new paradigm of "human-like" machine learning through natural language interaction, which will change how information is distributed, accessed, produced and interacted with, disrupt traditional programming, accelerate scientific research, and raise productivity.

Liu Cong pointed out that although general large models still have problems such as misattributing facts (the Chinese idiom "Zhang Guan Li Dai", putting Zhang's hat on Li's head) and lacking the human "flash of inspiration", according to the "ripple effect" view iFLYTEK proposed earlier, systematic errors will gradually shrink as the model accumulates more data and feedback, aided by knowledge-intensive data annotation engineering.

On the difference between the AI 1.0 and 2.0 eras, Liu Cong quoted iFLYTEK co-founder Xu Jingming: in the AI 1.0 era, technology went looking for scenarios and problems, and because its generality was limited it had to be customized for specific scenarios and industries, making it costly and unsustainable. The emergence of cognitive large models breaks this deadlock in three ways: the "hammer" drives nails for different scenarios and different tools more efficiently and automatically, it can drive all kinds of nails, and the hammer itself has become cheaper.


Liu Cong, Vice President of iFLYTEK and President of the iFLYTEK Research Institute

The following is the transcript of Liu Cong's speech (compiled and edited by 36Kr):

Hello everyone. I am very happy to be here today at 36Kr's "Disruption · AIGC" industry forum.

The topic of my talk today is "Technological Leaps and Disruptive Impacts of Cognitive Intelligence Large Models". As the title suggests, I will share our understanding of cognitive large model technology on the one hand, and on the other hand interpret the "1+N" system of the iFLYTEK Spark (Xunfei Xinghuo) cognitive large model released on May 6.

First, let's look at the technological leap of cognitive intelligence large models.

Large model technology has been widely discussed for some time. Its rapid evolution, and the iteration it forces on industries and products, has caused some anxiety among practitioners in related fields, including those of us working in technology R&D.

ChatGPT was released on November 30 and reached more than 100 million active users within two months of launch. Bill Gates said the technology is no less historically significant than the birth of the PC or the Internet. After GPT-4 was released, much attention went to its multimodal capabilities, but objectively speaking its improvements across many language capabilities deserve more attention. Google Brain merged with DeepMind and launched PaLM 2, whose results are also worth watching.

The emergence of intelligence displayed by cognitive large models has propelled a technological leap toward general artificial intelligence. Microsoft Research published a paper, "Sparks of Artificial General Intelligence", cataloguing and analyzing GPT-4's capabilities, from which it can be seen that the model-plus-data route is feasible. When ChatGPT was released, its capabilities were described across 48 task types. On the domestic side, the Politburo meeting of the CPC Central Committee on April 28 this year also called for "attaching importance to the development of general artificial intelligence".

Therefore, combining the 48 main task types given for ChatGPT with an analysis of the needs of more than 4 million developers on iFLYTEK's open AI innovation platform, we distilled seven dimensions of general artificial intelligence: text generation, language understanding, knowledge question answering, logical reasoning, mathematical ability (solving math problems is not easy for text models and calls for dedicated capability), code ability, and, extending outward, multimodal ability.

I very much agree with the two points Microsoft's Wei Qing made just now. The first is that how we evaluate something should be tied closely to the goal we are pursuing. Building a large model cannot focus on only one or two problem directions, which is why we emphasize that developing and evaluating cognitive large model capabilities must start from a scientific, systematic evaluation system.

Second, I very much agree with Mr. Wei's point that practice is the sole criterion for testing truth. Judging a large model's ability from only a handful of test questions is not scientific; what matters is giving everyone a comprehensive picture of a model's capabilities and how they hold up in practical application. From a technical point of view, I think ChatGPT is a very good and successful product precisely because hundreds of millions of users are actually using it and voicing their impressions and opinions.

Therefore, we believe cognitive large models need more comprehensive evaluation standards and systems. The State Key Laboratory of Cognitive Intelligence, jointly run by the University of Science and Technology of China and iFLYTEK, together with the Chinese Academy of Sciences Artificial Intelligence Industry-University-Research Innovation Alliance and the Yangtze River Delta Artificial Intelligence Industry Chain Alliance, designed a general evaluation system for cognitive large models; after joint discussion, it comprises 481 subdivided task types covering the 7 categories.

Let's review how cognitive large models achieve the emergence of intelligence.

ChatGPT is still, in essence, a deep neural network large model, and a conversational AI system; note, a conversational AI system, not merely a dialogue system. Some people call ChatGPT a chat tool, but we think it does far more than upgrade the chatbot: more importantly, through the now-familiar prompting approach it feeds all kinds of tasks into one large model in a unified form, so that a single general model can handle this many tasks and capabilities.
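
To make that "one model, many tasks via prompts" idea concrete, here is a minimal sketch; the `complete` function and the task templates are hypothetical placeholders for illustration, not any vendor's actual API:

```python
# A minimal sketch of "one model, many tasks via prompts".
# `complete` is a hypothetical stand-in for a call to any large language
# model; the task templates are illustrative, not a specific product's API.

def complete(prompt: str) -> str:
    """Stand-in for a large language model call; wire up a real client here."""
    raise NotImplementedError

TASK_TEMPLATES = {
    "translate": "Translate the following sentence into English:\n{text}",
    "summarize": "Summarize the following passage in one sentence:\n{text}",
    "qa":        "Answer the following question:\n{text}",
    "code":      "Write a Python function that does the following:\n{text}",
}

def run_task(task: str, text: str) -> str:
    # Every task is reduced to the same interface: build a prompt, call one model.
    prompt = TASK_TEMPLATES[task].format(text=text)
    return complete(prompt)
```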

Why is the emergence of intelligence meaningful? Because ChatGPT's emergent intelligence opens a new paradigm of "human-like" machine learning through natural language interaction. Text is itself an abstraction suited to human communication; whatever field the knowledge comes from, machines can now learn it the way humans do, genuinely mastering and using the core language and knowledge.

On March 14, GPT-4 was officially launched. We saw its multimodal capabilities, but more importantly its competence on many tasks kept improving, its answers became safer and more controllable, and it could handle longer contexts. We believe the core of GPT-4's success is still the language capability that runs through everything. The technology behind GPT-4's multimodality is to jointly encode images, OCR text extracted from images, and text inputs, and to align image features into a unified semantic space through joint training.
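
As a rough illustration of that idea of aligning image features into a shared text semantic space through joint encoding, here is a minimal PyTorch-style sketch; the module names, dimensions and wiring are assumptions for illustration, not GPT-4's actual architecture:

```python
# Illustrative sketch: encode image, OCR text, and prompt text, project the
# image features into the text embedding space, and pass one joint sequence
# to the language model. Dimensions and module names are assumed examples.
import torch
import torch.nn as nn

class MultimodalFrontend(nn.Module):
    def __init__(self, vision_encoder, text_embedder, d_vision=1024, d_model=4096):
        super().__init__()
        self.vision_encoder = vision_encoder          # e.g. a ViT producing patch features
        self.text_embedder = text_embedder            # token embedding of the language model
        self.project = nn.Linear(d_vision, d_model)   # align image features to the text space

    def forward(self, image, ocr_token_ids, prompt_token_ids):
        img_feats = self.project(self.vision_encoder(image))   # (B, patches, d_model)
        ocr_emb = self.text_embedder(ocr_token_ids)             # (B, ocr_len, d_model)
        txt_emb = self.text_embedder(prompt_token_ids)          # (B, txt_len, d_model)
        # One joint sequence in a shared semantic space, trained end to end
        return torch.cat([img_feats, ocr_emb, txt_emb], dim=1)
```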

We also need to understand what general large models are really for, and the practical problems they still have. For example, misattributing facts ("Zhang Guan Li Dai") is a flaw inherent to large models, because output is generated character by character rather than copied verbatim from source fragments. There are also problems such as the difficulty of updating knowledge in time, the fact that a deployed model runs in "read-only mode" and cannot keep learning, and the absence of the human "flash of inspiration".

The advent of large models has also changed how data iterates. As the front-runner with outstanding scientific and technological results, ChatGPT has experts and professionals around the world contributing their wisdom to it and to GPT-4. The self-evolution of a large model's intelligence must be fed with knowledge and user feedback from all over the world, and this is critical to how large models evolve.

Here I also want to share the "ripple effect" idea we proposed, because the ripple effect is accelerating the "emergence of intelligence" in cognitive intelligence.

In 2010, after iFLYTEK launched the iFLYTEK Cloud Platform (later the iFLYTEK Open Platform) and the iFLYTEK Input Method, we put forward the "ripple effect": as AI technologies are used by more and more people, those users keep contributing data and feedback, and the system's error gets smaller and smaller, spreading outward like ripples on water.

Perception fields, with speech as a representative example, do benefit from the ripple effect. Take our speech recognition system: once the algorithms, data and the rest were assembled, its error rate fell by more than 30% a year for 8 to 10 consecutive years.
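
To see how such a relative reduction compounds, here is a quick back-of-the-envelope calculation; the 30% annual relative reduction is the figure from the talk, while the 10% starting word error rate is only an assumed illustration:

```python
# Compounding a 30% relative error reduction per year (figure from the talk).
# The 10% starting word error rate is an assumed illustration, not a quoted number.
wer = 0.10
for year in range(1, 9):
    wer *= 1 - 0.30                      # 30% relative reduction each year
    print(f"year {year}: WER = {wer:.4f}")
# 0.7 ** 8 is about 0.058, so after 8 years the error rate is under 6% of where it started.
```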

There is now a new element: data annotation engineering for cognitive intelligence. The "ripple effect" still applies, but it works differently from the speech and image fields just mentioned. Data annotation used to be labor-intensive, something ordinary people could do after simple training; cognitive large models, however, span a wide range of fields and specialties, so annotation has become knowledge-intensive, and a steady stream of incremental knowledge data is the solid foundation for the emergence of large model intelligence.

A brief summary:

First, as things stand, the ceiling for large models is very high, and there is hope that machine intelligence approaching human intelligence will emerge in the future.

Second, conversational AI systems in the "pure-text world" will remain very important for a long time; with the AI system and human-machine collaboration well designed, they can form a self-consistent closed loop, and text resources are abundant.

Third, a unified deep neural network large model has strong generality; it is a spark on the road toward general intelligence, and further research is still needed.

Fourth, cognitive large models can in the future be extended to fields such as motion intelligence, multimodal intelligence, and embodied intelligence, with huge room to grow in both technology and industry.

Next, I will introduce the research progress and application practice of the iFLYTEK Spark cognitive large model released on May 6.

The "intelligence emergence" of cognitive big models brings new opportunities to solve human needs, and we distilled six changes brought about by cognitive big models in February this year:

First, changing how information is distributed and accessed. Whether in traditional search or video streaming, information distribution will change;

Second, revolutionizing content production, so that writing becomes easier;

Third, enabling new natural interaction; all kinds of interaction will change in an Internet-of-Everything world;

Fourth, realizing expert-level virtual assistants, so that more people can benefit from resources in education, healthcare and other sectors;

Fifth, disrupting traditional manual programming;

Sixth, becoming an accelerator for scientific research and greatly improving productivity; already it can integrate the extraction and analysis of literature content.

Why was iFLYTEK able to build the Spark cognitive large model within half a year of starting the effort on December 15 last year?

"A lot of the present you see is the invisible past." In fact, iFLYTEK has made many years of source core technology reserves for the emergence of large model intelligence, in 2012 our voice evaluation passed the human expert level for the first time, in 2014 we proposed the "iFLYTEK Super Brain" plan, that is, to make machines understand and think, in 2017, the State Key Laboratory of Cognitive Intelligence was approved, and in 2022 the "iFLYTEK Super Brain 2030" plan was further launched, so that machines understand knowledge, learn well, and can evolve, and we continue to win championships in various international authoritative technology competitions. All are the accumulation of technology in the past ten years.

We also have three national platforms: the State Key Laboratory of Cognitive Intelligence, the National Engineering Research Center for Speech and Language Information Processing, and the National New Generation Artificial Intelligence Open Innovation Platform.

On that basis, we launched the "1+N" large model research plan on December 15 last year: not only building the "1", the base model, but also launching products around education, healthcare, interaction, office, automotive and other scenarios. From day one our technical route has been clear: benchmark comprehensively against the 48 task capabilities defined for ChatGPT and advance step by step according to plan.

On May 6 we officially released the iFLYTEK Spark cognitive large model. Here is a quick look at Spark's seven core capabilities; some of the more interesting prompts come from questions submitted by users. In the demonstration of multimodal capabilities you can see a virtual human that is generated automatically with semantic continuity.

Turning to industry applications: in education, the iFLYTEK AI Learning Machine equipped with the Spark cognitive model can grade and comment on Chinese and English essays level by level. Based on analysis of the grading records of hundreds of teachers, the Spark model has surpassed the level of ordinary teachers in correction accuracy, error recall, and the elegance of suggested sentence revisions for Chinese and English compositions. For language learning, Spark provides a free, open-ended oral practice environment that helps learners avoid "mute English".

In the office field, iFLYTEK has upgraded capabilities such as meeting minutes, tidied-up phrasing, one-click drafting and reading summaries, making office work more efficient. Take today's talks: as long as you upload the recording to iFLYTEK and choose the angle you want for the write-up, you can generate the corresponding manuscript with one click. There are also new changes in automotive, digital employees, and more.

Here I want to touch on a very interesting topic, a point made by iFLYTEK co-founder Xu Jingming at the beginning of this year. In the past, we used AI technology to go looking for scenarios and problems, but its general capability was hard to make practical across different scenarios. It was like carrying a hammer around looking for nails, only to find that every nail is different, which led to complex industry customization, high cost and unsustainability, and the hammer's value was ground away in fierce market competition.

The emergence of cognitive large models, however, is like "Mjolnir" breaking the deadlock in three ways: it drives nails for different scenarios and different tools more efficiently and automatically, it can drive all kinds of nails, and the hammer itself has become cheaper. On top of cutting costs and raising efficiency, tightly coupling cognitive large models with industry scenarios continuously feeds back into the evolution of the model's own capabilities.

After the May 6 release, iFLYTEK Spark will be upgraded continuously through the year, with three key milestones: on June 9, a breakthrough in open-ended question answering plus upgrades to multi-turn dialogue and mathematical ability; on August 15, a breakthrough in code ability and an upgrade to multimodal interaction; and by October 24, a general model benchmarked against ChatGPT, surpassing it in Chinese and matching it in English.

Therefore, I believe that developing China's cognitive large models cannot rely only on "overtaking on the curve"; we also need the courage to aim straight at the benchmark, pay it due respect, and then catch up with and surpass it, which means "charging hard on the straightaway" as well. Industry and academia also need to integrate and collaborate deeply, continuously injecting inexhaustible power into the long-distance race of large models.

From Wintel in the PC era, to the iOS and Android ecosystems of the mobile Internet era, to the search ecosystem, we believe that in the future the large model itself can drive a whole chain of upstream and downstream industries and technologies to form a new ecosystem.

At present, the iFLYTEK Open Platform is pairing with iFLYTEK Spark to empower more developers to create more valuable AI applications and build the "Spark" ecosystem.

We believe the spark of general artificial intelligence will surely grow into a prairie fire in China. We hope to take the iFLYTEK Spark cognitive model as a new starting point and ignition point, and work with all walks of life to build a better world with artificial intelligence.

End | Shen Xiao
