
Zhou Ming, founder of Lanzhou Technology: From perceptual intelligence to cognitive intelligence, what innovations should be made in the field of NLP?

Speaker: Zhou Ming

Editor: Du Wei

In natural language processing, what is the next window of opportunity after perceptual intelligence? Zhou Ming, founder of Lanzhou Technology, gives a detailed review of the development of "cognitive intelligence", its prospects, and the problems that still need to be solved.


On March 23, the Machine Heart AI Technology Annual Conference was held online. At the afternoon artificial intelligence forum, Zhou Ming, chief scientist of Sinovation Ventures, founder of Lanzhou Technology, former president of the ACL, and vice president of the CCF, gave a speech on the theme "The Innovation Era of Cognitive Intelligence".

https://www.bilibili.com/video/BV15Z4y1B76d

Machine Heart has edited and organized the content of the speech without changing its original meaning.

Thanks to Machine Heart's invitation, I have the opportunity to introduce the cognitive intelligence work we are engaged in at Lanzhou Technology. The title of my speech is "The Innovation Era of Cognitive Intelligence".

As everyone knows, artificial intelligence has experienced ups and downs over the past few decades. From the earliest Turing test to the Dartmouth conference that marked the birth of AI, to the expert systems of the 1960s, artificial intelligence then unfortunately entered its first winter, when few people believed in it. By the 1980s, the rise of Japan's fifth-generation computer project brought new hope, along with technologies such as the Prolog programming language. But soon afterwards, AI entered its second winter.

By the 1990s, machine learning began to emerge, that is, data-driven statistical machine learning. Statistical machine translation and other applications started to become practical. The real leap in AI came with the rise of deep learning, around 2006. One milestone was in 2016, when image recognition on ImageNet surpassed human performance. In 2017, AlphaGo defeated the human Go champion. For natural language, pre-trained models began to emerge in 2018, and later AlphaFold predicted protein structures with high accuracy. All of these are milestones in the development of artificial intelligence.

In general, artificial intelligence technology falls roughly into two schools. The first is the early school based on symbolic computation; the second is the neural network school represented by recent deep learning. Each has its own advantages: the former is more explainable, but it relies on experts to hand-craft knowledge and is brittle; the latter relies on big data and lacks interpretability.


In any case, the AI technology brought by deep learning in recent years has profoundly changed human life. From images to speech to natural language processing, knowledge graphs, search, and recommendation have all improved greatly, and technologies such as autonomous driving, security, machine translation, and medical diagnosis have become deeply integrated into people's lives.

We work on natural language understanding, and we care about where the opportunities lie for natural language after perceptual intelligence. Our judgment is that, following perceptual intelligence, cognitive intelligence has begun to rise and is driving the development of the industry.

There are a few key points here. The first is that recent research based on pre-trained models has driven leaps in many natural language processing tasks. One representative result came in 2019, when Google used the BERT pre-trained model for reading comprehension and exceeded human annotator performance. Coupled with advances in knowledge graphs and reasoning, people are full of expectations for the rise of cognitive intelligence, represented by natural language.

What exactly are the problems that cognitive intelligence is trying to solve? In fact, cognitive intelligence addresses language understanding, problem solving, decision support, and prediction and planning. It has a very wide range of applications: machine translation, search, chat, expert systems, advertising, sentiment analysis, dialogue, information extraction, fault diagnosis, reasoning, knowledge graphs, affective computing, and so on.

With cognitive intelligence, people can go from big data to information retrieval, to knowledge and reasoning, and then to insight discovery, fully strengthening big-data-based intelligence engines, promoting the digital transformation of all walks of life, and driving business upgrades.


What does Lanzhou Technology do in the field of cognitive intelligence?

We incubated a team at Sinovation Ventures, Lanzhou Technology, which aims to promote the development of cognitive intelligence.

We first built pre-trained models: our self-developed Mencius (Mengzi) lightweight models, which handle multiple languages and multiple modalities, support both understanding and generation, and can be customized to meet the needs of different fields and scenarios.

We then build a series of natural language processing tasks on top of pre-training. Taking machine translation as an example, we use pre-trained models and multilingual joint training, coupled with terminology recognition and translation technology, to achieve translation between the world's major languages centered on Chinese. In many vertical fields we have reached the industry's top level, and through cooperation with companies such as Transsion, we help translators improve productivity.

The third is text generation. In text generation, the user supplies some keywords or topics and the computer generates an article or even a novel. We used our self-developed pre-trained model to develop interactive, controllable text generation technology supported by general and domain-specific big data. Users can specify keywords, knowledge units, or application scenarios to generate a text. This can be applied to marketing copy generation (in cooperation with Digital Storytelling-Ronghui), news summarization, novel or screenplay writing, and so on.

The fourth is search engines. We built a new search engine from scratch on top of a pre-trained model. Twenty years ago, everyone manually defined large numbers of features based on TF-IDF; many search engines used tens of thousands of features for ranking. With pre-trained models, we avoid defining so many features and instead improve relevance and recall through end-to-end learning, while using knowledge graphs to implement the whole process from search to inference to insight discovery. We want to help fields such as finance, marketing, law, and government affairs improve the efficiency of search and analysis.
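To make the contrast concrete, here is a minimal sketch comparing the two approaches: hand-built TF-IDF features versus end-to-end relevance scoring with a pre-trained cross-encoder. The checkpoint and example texts are illustrative placeholders, not our actual search stack.

```python
# Contrast sketch: sparse lexical features vs. end-to-end neural relevance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import CrossEncoder

docs = ["central bank raises interest rates", "new energy vehicle sales grow"]
query = "interest rate policy"

# Old way: TF-IDF similarity, one of thousands of hand-engineered ranking signals.
vec = TfidfVectorizer().fit(docs + [query])
lexical_scores = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]

# New way: a pre-trained cross-encoder scores (query, doc) pairs end to end.
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
neural_scores = ranker.predict([(query, d) for d in docs])
```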

In 2021, our work won first prize in the HICOOL Global Entrepreneurship Competition: out of 4,800 participating teams, six won first prize, and we took first place in the AI and finance track.


However, cognitive intelligence sounds mysterious. Having built a lot of technology, how do you apply it in all walks of life? This brings us to the question of cognitive intelligence solutions. Our thinking goes like this. At the bottom layer, we build large-scale pre-trained models, covering GPU clusters, data, training, fine-tuning, compression, and model lightweighting. On this basis, we train monolingual, multilingual, and multimodal pre-trained models to support tasks ranging from search engines to text understanding, machine translation, text generation, speech recognition and synthesis, and image and video annotation and generation. Note that these are all grounded in natural language, extending through multimodality to the understanding and processing of other modalities.

On this basis, we deliver these capabilities through a flexible AI cloud. By flexible AI cloud we mean that users can compose a business workflow quickly through drag-and-drop, WYSIWYG operations. In practice, the corresponding services can be delivered through SaaS or deep customization.


The path to lightweight model training

Our pre-trained models have taken a contrarian path. Many companies are pursuing ever larger pre-trained models, on the assumption that bigger is better. We believe pre-trained models can, to a certain extent, be more refined, more accurate, and more lightweight, so that users can deploy them easily.

Here is the general idea of large-scale pre-trained models. First, you need a large amount of text and a large amount of computing power to train a language model. The language model is then fine-tuned for downstream tasks; sometimes people work on zero-shot methods that need no fine-tuning, as with GPT-3, before completing downstream tasks. The advantage of this approach is that it solves the fragmentation problem: as long as you have the data to train the model, what it has learned can be transferred, and a smaller labeled dataset suffices to fine-tune it for a new task, achieving a relatively high level of performance.
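As a concrete illustration of this pre-train-then-fine-tune recipe, here is a minimal sketch using the Hugging Face transformers library. The base model, the CSV file (assumed to have "text" and "label" columns), and the hyperparameters are illustrative assumptions, not our actual training setup.

```python
# Minimal fine-tuning sketch: a pre-trained encoder plus a new task head,
# trained on a small labeled dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # classification head is new; body is pre-trained

# A relatively small labeled set suffices: the pre-trained weights already
# carry most of the linguistic knowledge.
data = load_dataset("csv", data_files="labeled_samples.csv")  # hypothetical file

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
).train()
```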

This new paradigm has brought a significant increase in the productivity of natural language processing, and it marks NLP's entry into the stage of industrialization and deployment, which is undoubtedly a good thing. So everyone is working on pre-trained models; the main architectures now are the Encoder type (such as BERT), the Decoder type (such as GPT), and the Encoder-Decoder type (such as T5).
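For reference, the three architecture types can be loaded as follows; the checkpoints here are the public reference models for each family, not our Mencius models.

```python
# The three pre-training architectures, loaded via transformers.
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

encoder = AutoModel.from_pretrained("bert-base-uncased")    # Encoder (BERT): understanding
decoder = AutoModelForCausalLM.from_pretrained("gpt2")      # Decoder (GPT): generation
enc_dec = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # Encoder-Decoder (T5): both
```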

Many current pre-trained models follow these lines. The ideas are essentially: use more data or a larger model; study more efficient pre-training methods; study how to use knowledge to enhance the pre-trained model; or study few-shot learning and unified fine-tuning mechanisms.

Why do we focus on lightweight models? The cost of training these models is very high. As shown in the figure below, it reportedly cost $4.6 million to train a GPT-3 model initially; that number is much smaller now, but it is still very expensive. Over the past few years, the parameter counts of pre-trained models have increased by more than three orders of magnitude. Although hardware capacity is also growing, it grows much more slowly than model parameter counts, so training cost has still risen by two orders of magnitude.
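A back-of-envelope calculation shows why cost tracks model size. The sketch below uses the rough 6 x parameters x tokens FLOPs rule of thumb from the scaling-laws literature; all numbers are illustrative assumptions, not figures from the talk.

```python
# Rough training-cost estimate using the ~6 * N * D FLOPs rule of thumb.
params = 175e9        # parameter count N (GPT-3 scale, for illustration)
tokens = 300e9        # training tokens D (assumed)
total_flops = 6 * params * tokens          # ~3.15e23 FLOPs

a100_peak = 312e12    # A100 BF16 peak throughput, FLOP/s
utilization = 0.3     # assumed sustained utilization
gpu_hours = total_flops / (a100_peak * utilization) / 3600
print(f"~{gpu_hours:,.0f} A100-hours")     # on the order of one million GPU-hours
```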

Many of our colleagues in the industry are looking at how to reduce training cost, but it remains a big number. Training cost mainly depends on the following factors: the number of model parameters, GPU and TPU compute power, and the amount of data. In real tasks, adapting a big model to downstream tasks carries a large deployment cost, and users cannot afford to buy enough GPUs for inference. In view of this, we need to reduce cost, improve training capability, and speed up training; researching lightweight models is now a top priority for Lanzhou Technology.


Different lightweight model techniques

We have explored many techniques for lightweight models; here is a brief introduction.

The first is model optimization: for different types of pre-training, we have done the corresponding model optimization.

The second is knowledge enhancement, including enhancement based on entity extraction, common-sense and domain knowledge, event dependency and causality, and knowledge of the multimodal world, studying from all angles how the corresponding knowledge can improve a model's capability at the same model size. We also enhance the model with linguistic knowledge, such as dependency relations.

Finally, we consider data enhancement, including domain data enhancement, that is, continuing to train an existing model on domain text; task data enhancement, such as obtaining Q&A pairs through information retrieval for question answering tasks; and cross-lingual resource enhancement, such as transferring knowledge from high-resource languages to low-resource languages through multilingual pre-training.
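As a sketch of the first kind of data enhancement, continued masked-LM pre-training on domain text is a common recipe; the corpus file and base model below are placeholders, not our actual setup.

```python
# Domain-adaptive continued pre-training: keep training the masked LM on
# in-domain text before fine-tuning on a downstream task.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

corpus = load_dataset("text", data_files="finance_corpus.txt")  # hypothetical domain text

def tok(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

corpus = corpus.map(tok, batched=True, remove_columns=["text"])

# The collator masks 15% of tokens on the fly, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-adapted", num_train_epochs=1),
    train_dataset=corpus["train"],
    data_collator=collator,
).train()
```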

Work like this means that the small models we train are not necessarily weaker, and they can be quickly customized for new domains. Currently, we have open-sourced four small models, covering text analysis, text generation, image understanding, and finance.


The chart below reflects our results on the CLUE leaderboard from July to September 2021. Our model has 1 billion parameters, yet compared with other companies' models of 10 billion and 100 billion parameters it is not inferior, and in places even better. Across a number of natural language processing tasks, such as semantic similarity, our model ranked first in the overall aggregate of all tasks.

Our model is characterized by small size and low cost, yet it is relatively refined, thanks to the large amount of knowledge it incorporates. Another big feature is speed: we can train a new model in a few days, and adapt it to a new task in half a day. Then there is specialization: a pre-trained model can be customized for each field and each task, and this degree of fit certainly exceeds the capability of a general-purpose large model.


Our Mencius open source models were also selected among China's "50 Best Open Source Products". These models include Mengzi-BERT-base, Mengzi-BERT-base-fin, Mengzi-T5-base, and Mengzi-Oscar-base. The paper and models can be found below:

Paper: https://arxiv.org/abs/2110.06696

Project Address: https://github.com/Langboat/Mengzi
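Assuming the published checkpoints follow standard Hugging Face conventions (the exact identifiers are documented in the Langboat repo above and should be verified against its README), loading them looks like this:

```python
# Loading the open-sourced Mengzi checkpoints.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          T5ForConditionalGeneration, T5Tokenizer)

bert_tok = AutoTokenizer.from_pretrained("Langboat/mengzi-bert-base")
bert = AutoModelForMaskedLM.from_pretrained("Langboat/mengzi-bert-base")

t5_tok = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
t5 = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
```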

Capability expansion and corresponding models

We have recently added many image-text capabilities to these models, such as image-to-text, where a picture generates a rich piece of text describing its content, and text-to-image, where a given small piece of text generates a picture. The results are quite good, and because our models are relatively lightweight, the cost of use is relatively low. As I said, many of our models have been open-sourced, and through full discussion and exchange, many people in the open source community have deepened their understanding of pre-trained models and enhanced their business capabilities.


On this basis, we studied machine translation. The machine translation here includes general translation centered on Chinese, covering translation between major language pairs such as Chinese-English, Chinese-German, and Chinese-French. The figure below shows the performance of Chinese-English translation in various vertical fields, much of it in cooperation with Transsion. Compared with today's popular translation systems, ours shows a good improvement; in finance, automotive, law, contracts, machinery, engineering, oil, electricity, and other fields, it is now at a first-class level.


Based on the Mencius pre-trained model, we are also doing technical research on text generation. We study controllable text generation: users can input topics, keywords, knowledge graphs, styles, personas, and so on, and our system generates text that contains this user-provided information and truly reflects the user's intent. We call this controllable text generation.

The figure below shows an example of marketing copy generation that we did in cooperation with Digital Storytelling-Ronghui. The user enters the title "Bring your skin back to 18 years old", keywords such as "ginger juice, whitening, mask", and some knowledge graph input, that is, triples describing factual points; users can enter as many knowledge points or factual points as they like. Our system "Mencius" then generates a relatively fluent piece of marketing copy.
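To illustrate one plausible way such controllable inputs can be fed to a model, the sketch below serializes the title, keywords, and triples into a single input for a seq2seq model. The serialization format and generation settings are assumptions for illustration, not our production interface.

```python
# Sketch: serialize controllable-generation inputs into a single prompt.
from transformers import T5ForConditionalGeneration, T5Tokenizer

title = "Bring your skin back to 18 years old"
keywords = ["ginger juice", "whitening", "mask"]
triples = [("ginger juice", "effect", "brightens skin")]  # invented fact point

prompt = (f"title: {title} | keywords: {', '.join(keywords)} | facts: "
          + "; ".join(f"{s} {p} {o}" for s, p, o in triples))

tok = T5Tokenizer.from_pretrained("Langboat/mengzi-t5-base")
model = T5ForConditionalGeneration.from_pretrained("Langboat/mengzi-t5-base")
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_length=200, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```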

Compared with popular models such as GPT, our model has three characteristics: first, the generated text can be richer; second, the context is more coherent; third, it reflects the facts the user entered. Each sentence GPT outputs may be fluent, yet the sentences can be incoherent with one another or counterfactual. We have done in-depth research on these problems and overcome them.


Based on the Mencius pre-trained model, we have built a new generation of industry search engines; take our financial search engine as an example. We can search for general stock price information; we can search news, announcements, and annual reports; we can obtain new factual points in question-and-answer form; and we can retrieve companies' financial information.

One feature is that we can guide the search along the industry chain and the event chain. For example, a user enters a keyword and we return some results. But if the user wants to understand the impact on the upstream and downstream of the industry chain, we can generate new search keywords according to the industry chain, and the user gets new search results. Likewise, if the user wants to know what new or important events appear in the search results, we extract the events and then move up and down the event graph to find "what events will be affected by such an event" or "what events such an event indicates will occur".

In this way, we built a search experience guided by the industry chain and events, helping investment researchers analyze which important events affect the industry chain, what the downstream impact is, or which new events they indicate, so that they can take action.
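A toy version of this industry-chain query expansion might look like the following; the graph and keywords are invented for illustration.

```python
# Toy industry-chain guided search: expand a query one hop along
# upstream/downstream edges of a (hand-made, hypothetical) industry graph.
industry_chain = {
    "new energy vehicles": {"upstream": ["lithium batteries", "power semiconductors"],
                            "downstream": ["charging stations", "ride hailing"]},
    "lithium batteries": {"upstream": ["lithium mining"],
                          "downstream": ["new energy vehicles"]},
}

def expand_query(keyword, direction="upstream"):
    """Generate new search keywords by walking the industry chain one hop."""
    node = industry_chain.get(keyword, {})
    return [f"{keyword} {neighbor}" for neighbor in node.get(direction, [])]

print(expand_query("new energy vehicles", "upstream"))
# ['new energy vehicles lithium batteries', 'new energy vehicles power semiconductors']
```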

Based on our pre-trained-model text generation technology, we are also working on intelligent research report generation. In intelligent research reporting, customers provide topics; the traditional method requires manually searching the Internet for evidence and documents and then manually integrating and extracting them.

We want to automate all of these steps. Given a topic, we retrieve a large number of relevant research reports through search; then, based on the Mencius lightweight model, we use knowledge graphs, few-shot learning, and contrastive learning to perform structured event extraction, sentiment and public-opinion analysis, abstract generation, opinion extraction, and intelligent question answering, and then assemble these pieces into a research report.

You can see the example in the figure below. Given the keyword "new energy vehicle", Lanzhou's search engine retrieves many related research reports and news items from the Internet; through integration we obtain common question-answer pairs, event extraction, abstract generation, and public-opinion analysis; all of this content is then fed into our engine to generate a research report, including title, outline, and specific content.


Based on this technology, we can also generate corporate ESG social responsibility reports in the same way. The user enters the title of a company's corporate responsibility report, and according to the theme, the corresponding writing outline is automatically generated, covering responsibility management, market performance, social performance, environmental performance, report afterword, and so on, producing major headings, subheadings, and final summaries and suggestions.

For each major heading and subheading in the outline, we use information extraction to pull out key information and then generate the corresponding text; once every paragraph is generated, they form the whole report.

Of course, these results cannot replace human experts; human experts still need to verify, correct, and improve them to ensure correctness. We hope AI can work together with human experts to improve the efficiency of the entire workflow.


Future challenges for cognitive intelligence

Finally, let's talk about some of the challenges facing cognitive intelligence in the future.

The first challenge is the lack of common sense and reasoning.

Consider the interesting question in the figure below, built on the fact "Trump is the forty-fifth president of the United States". Working through the question-and-answer process below, you will find that humans, even children, can answer, but machines sometimes cannot. For example: who is the President of the United States? Both machines and people can answer. Is Trump the most powerful man in the United States? A person can answer; the machine can answer only if such evidence or statement appears in a document, otherwise it cannot. Reasoning is needed here: the president of the United States should be the most powerful person in the United States; this is common sense. Without this common sense, the machine cannot answer such a question. How to organize common sense and use it for reasoning is a shortcoming of current pre-trained models.
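A toy illustration of the missing step: one commonsense rule lets the machine bridge from a stored fact to a question that no document answers directly. The fact base and rule below are invented for illustration.

```python
# One commonsense rule bridges a stored fact to an otherwise unanswerable question.
facts = {("president_of", "USA"): "Trump"}

def most_powerful_person(country):
    # Commonsense rule: a country's president is its most powerful person.
    return facts.get(("president_of", country))

# "Is Trump the most powerful man in the United States?"
print(most_powerful_person("USA") == "Trump")  # True, by reasoning, not retrieval
```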


The second is how to ensure consistency in multi-turn dialogue, which is also a challenge we encounter in text generation: sentences that are inconsistent with one another, such as inconsistencies in time, space, or logic.


These challenges prompt a lot of thinking about what the next generation of AI, including cognitive intelligence, should do and where our focus should be. Based on my own understanding, I have listed four important problems here.

The first is interpretability. For now, our end-to-end learning takes an input and produces a classification or a result with no actual explanation, so in many applications, such as finance, users dare not use it.

The second is few-shot learning. End-to-end learning currently requires a lot of labeled text; if the labeled corpus is relatively small, learning works poorly. This requires solving the problem of learning from small samples.

The third is reasoning: as just mentioned, given a knowledge graph or common sense, how to produce a reasoning chain from input to output that yields a conclusion.

Finally, there is the problem of common sense, which has been mentioned earlier and will not be repeated here.

To recap: artificial intelligence and cognitive intelligence actually involve two systems. The first, System 2, uses symbols to reason; it has inputs and outputs, logic, and reasoning. When people encounter unfamiliar facts, they habitually do some logical reasoning. Compared with System 2, System 1 is what deep learning does now: using experience and data, it can quickly produce an output from an input without a deep reasoning process, and thus lacks interpretability.


If you combine the two, you get the ability both to give results quickly and to supply the logic behind them. However, rule-based symbolic systems are not differentiable, so there is no way to adjust the network structure based on the loss of the output result; neural networks, on the other hand, are differentiable but not interpretable.

I have wondered whether we could build foundational competencies, called Foundation Skills. One inspiration is that when people tackle something big, like deriving a math problem or writing an essay, they draw on many basic skills learned elsewhere beforehand; there is no need to learn end to end for each new task. If every basic capability is built well, they can be quickly assembled when facing a big task. If we can solve the differentiability problem, then whether a basic capability is data-based or logic-based, it can be quickly stitched into a large system, effectively solving the few-shot learning problem.

So, to study the solution of complex reasoning problems through few-shot learning on top of basic capabilities, we are working on automatically answering the LSAT, the American Law School Admission Test. The LSAT has three question types: analytical reasoning, logical reasoning, and reading comprehension. In the example below, an analytical reasoning question gives six known conditions and asks "If something holds, which of the following answers is most likely".

To solve this problem, we must first do natural language understanding, turning the natural language input into a logical expression. The second step requires reasoning: starting from the initial state, step-by-step inference yields the possible final states. Then, from the possible final states, we check which satisfy the constraints and extract the answers that do.
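A toy version of the reasoning step: once the conditions have been parsed into predicates, the solver can enumerate candidate states and keep those satisfying all constraints. The puzzle below is invented for illustration; it is not an actual LSAT item.

```python
# Toy analytical-reasoning solver: state-space enumeration under constraints.
from itertools import permutations

people = ["A", "B", "C", "D"]

# Step 1 (understanding) would turn the natural-language conditions into
# predicates like these over a seating order:
constraints = [
    lambda order: order.index("A") < order.index("B"),            # A before B
    lambda order: abs(order.index("C") - order.index("D")) == 1,  # C next to D
]

# Step 2 (reasoning) searches the state space for states meeting all constraints:
valid = [p for p in permutations(people) if all(c(p) for c in constraints)]
print(valid)  # all seatings consistent with the conditions
```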

How do we solve the natural language understanding problem here? Because this is few-shot learning (the LSAT has only a few thousand questions in total), it is very difficult to learn logical understanding end to end. So, can we learn with the basic capabilities just mentioned plus a fine-tuning capability? That is, word segmentation, semantic representation, and logical expression generation are all learned through other channels or with other data, and are then rapidly adapted and transferred to this new dataset, to see whether the problem can be solved. It also involves how common sense is embedded into the whole process of logical understanding and reasoning.

In summary, the LSAT is a very good dataset for driving research on complex reasoning tasks.


Finally, to conclude: cognitive intelligence is developing better and better, and everyone is full of expectations for it. There is a good opportunity now, because pre-trained models plus fine-tuning have largely solved the fragmentation problem, and the SaaS model promises to solve the last-mile problem of putting services into users' hands. Of course, opportunities and challenges coexist. The biggest challenges lie in knowledge, lightweighting, and ethics; we also need to address few-shot learning, explainability, and common-sense reasoning, which are goals for the next 5-10 years. Lanzhou has already done some work, namely the fusion of neural networks and symbolic systems, plus the ideas of basic capabilities and fine-tuning, in an effort to advance the relevant experiments.

Lanzhou Technology is a cognitive intelligence company serving the digital transformation of business scenarios. It provides business insight products based on natural language processing, including functional engines built on pre-trained models, such as search, generation, translation, and dialogue, as well as SaaS products for vertical industry scenarios. We aspire to be a world-class NLP technology company.

We recruit researchers, engineers, product managers, and interns all year round; those interested can visit our website for details.

