
What to do when ChatGPT goes crazy? Xiaoice's Li Di: I can crack the two key problems

Xiao Xiao, Heng Yu | From Aofeisi

Qubits | Official account QbitAI

After Microsoft plugged the GPT model into Bing, the results were not as good as everyone expected - it went crazy.

Now Microsoft has stepped in urgently and cut the most popular feature of the new Bing: its ability to express opinions.

Users are clearly not buying it, complaining that the new Bing has lost its most interesting part and that the current version's experience is not even as good as Siri 1.0.

Some netizens hope that the big model behind Bing will be upgraded:

Maybe it's only the GPT-3 version now, and GPT-4 hasn't been released yet.

However, according to reports from The New York Times and others, the model behind Bing is likely already GPT-4...

We put this phenomenon to one of the people in China most familiar with the field of AI chat: Li Di of Xiaoice.

Right off the bat, he poured cold water on the red-hot large models:

The bugs currently exhibited by the new Bing and ChatGPT reflect a key problem that large models must solve.

The problem can be ignored for a while, but anyone building a large model, or even a ChatGPT-like product, will eventually hit a wall if it goes unsolved.

That problem is the logical reasoning ability of large models.

Large models: logic makes them, logic breaks them

The story starts with GPT-3.5, the model behind ChatGPT.

Starting with GPT-3.5, large models have demonstrated a breakthrough capability: Chain of Thought (CoT), that is, step-by-step logical reasoning.

For example, on math problems, instead of directly outputting an answer, the model can reason step by step until it reaches the correct one, demonstrating the chain-of-thought ability:
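
As a stand-in illustration, here is a minimal sketch of the contrast, assuming a generic text-completion API; the question and the sample answers are illustrative, not drawn from the article's figure:

```python
# A minimal sketch of direct vs. chain-of-thought prompting.
# `complete` is a hypothetical stand-in for any text-completion API.

def complete(prompt: str) -> str:
    """Hypothetical stand-in: wire this to an LLM of your choice."""
    raise NotImplementedError

question = (
    "A cafeteria had 23 apples. It used 20 for lunch "
    "and bought 6 more. How many apples does it have?"
)

# Direct prompting: the model must produce the answer in one step,
# where it often slips (e.g. answering 27).
direct_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompting: nudge the model to reason first,
# e.g. "23 - 20 = 3 apples remain; 3 + 6 = 9. The answer is 9."
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# Compare complete(direct_prompt) with complete(cot_prompt).
```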

However, this ability has not been observed in small and medium-sized models, or even in some large ones; the academic community regards it as an "emergent" feature unique to certain large models.

Built on this emergent feature, the wildly popular ChatGPT was born, appearing to think like a human when answering questions and expressing opinions.

However, Li Di believes that this logical ability of GPT-3.5 is unstable, uncontrollable and even dangerous.

First, its way of thinking is opaque; second, it doesn't even cite its sources.

OpenAI seems to paper over these two problems with large amounts of carefully hand-labeled data and a huge parameter count, but once uncontrollable factors are introduced (connecting to the Internet as Bing does, modifying parameters, and so on), the model may break down at any time.

Therefore, logical reasoning is becoming a double-edged sword for large models -

Used well, it ushers them into a new era; once out of control, it only makes large models harder to deploy.

To illustrate the problem with large models' logical ability, Li Di pointed to Xiaoice's latest product release: the Xiaoice Chain.

The Xiaoice Chain, or X-CoTA (X-Chain of Thought & Action), is likewise a language model that helps people answer questions through dialogue.

But its most distinctive difference is that it realizes the chain of thought with a model of only 2% of GPT-3's parameters, while keeping the thinking process transparent.

In terms of scale, it is not a hundreds-of-billions-parameter model like the GPT series; the models behind it have only tens of billions of parameters, and at the low end as few as 3.5 billion.

As for functionality, it declines to generate summaries, homework, and speeches the way ChatGPT does, but it offers more capabilities elsewhere: besides not shying away from expressing opinions on current events and actively searching for answers online, it can flexibly call various models or knowledge bases to complete tasks.

Specifically, the architecture of the Xiaoice Chain divides into three modules (a sketch of the whole pipeline follows the module descriptions).

Module 1 handles incoming statements using the Chain of Thought (CoT) capability.

This part can be implemented by calling a large model with CoT capability, but it can also use the roughly 3.5-billion-parameter medium model mentioned above to convert the input statement into an Action instruction for a specific operation.

Module 2 is responsible for execution (Action): it receives the Action instructions output by Module 1 and carries out the corresponding tasks.

Depending on the instruction, Module 2 calls different models and data; there are at least three major modes of use:

Online or local knowledge-base search. It can crawl the Internet for the latest hot topics, even jump across web pages, or retrieve answers from a specific knowledge base.

Calling a specific model to do something. For example, calling a well-performing diffusion model to paint a picture, or a speech model to synthesize audio.

Controlling behavior in the physical world. Turning on lights, buying plane tickets, hailing a taxi, and so on; these are not necessarily explicit user instructions, but conclusions the model draws after inference.

Module 3 handles natural language generation: it describes the results of the thinking and action in plain human language and reports them back to the user.
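
Put together, the flow is roughly "reason, act, narrate". Below is a speculative sketch of that pipeline; X-CoTA's internals are not public, so every name, signature, and branch here is a hypothetical illustration of the architecture as described:

```python
# Hypothetical sketch of the described three-module pipeline.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str      # "search" | "call_model" | "physical"
    payload: str   # what to search for, generate, or control

def reason(user_input: str) -> Action:
    """Module 1: a mid-size CoT model turns the input into an Action."""
    # e.g. "draw me a cat" -> Action("call_model", "diffusion: a cat")
    return Action("search", user_input)  # placeholder

def search_web_or_kb(query: str) -> str: ...        # hypothetical backend
def call_specialist_model(spec: str) -> str: ...    # hypothetical backend
def control_device(command: str) -> str: ...        # hypothetical backend

def execute(action: Action) -> str:
    """Module 2: dispatch the Action to the matching backend."""
    if action.kind == "search":
        return search_web_or_kb(action.payload)
    if action.kind == "call_model":
        return call_specialist_model(action.payload)
    return control_device(action.payload)

def narrate(user_input: str, result: str) -> str:
    """Module 3: report the result back in plain language."""
    return f"Here's what I found for '{user_input}': {result}"

def xiaoice_chain(user_input: str) -> str:
    return narrate(user_input, execute(reason(user_input)))
```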

In short, the Xiaoice Chain takes ChatGPT's hottest asset, its "way of thinking," breaks it out into a separate model, and keeps shrinking that model's size.

Li Di believes that even though the core model of the Xiaoice Chain is only medium-sized, it can exhibit reasoning on some problems similar to that of a large model.

△ It can also go online and grab the latest hot gossip firsthand

Based on this view, Li Di went against the mainstream rush to "build China's ChatGPT": not only did he decline to promote a ChatGPT product of his own, he even launched the Xiaoice Chain while stressing that "this is not ChatGPT."

It seems a bit contrarian (tongue firmly in cheek).

Is there really a theoretical basis for doing this?

On the technical basis of CoT, there have indeed been many related studies abroad, including the viral paper from a while back on "coaxing GPT-3 into skyrocketing accuracy":

In their research, the team found that simply telling GPT-3 "let's think step by step" enables it to correctly answer logical reasoning questions it previously got wrong, such as the following example from the MultiArith dataset:

Of 16 balls, half are golf balls, and half of the golf balls are blue. How many blue golf balls are there?

These examples specifically test the ability of language models to do mathematical problems, especially logical reasoning.

In the zero-shot setting (never having seen similar question types before), GPT-3 was originally only 17.7% accurate, but when asked to think step by step, its accuracy soared to 78.7%.

This method, called CoT, was first identified and proposed by the Google Brain team in January of last year.

Its core idea is prompting-based: through prompts, large models learn to think step by step and solve practical problems logically:
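
A minimal sketch of few-shot CoT prompting in that spirit: the prompt carries a worked, step-by-step exemplar so the model imitates the reasoning pattern on the next question. The exemplar below reuses the golf-ball problem quoted above; the exact wording is illustrative, not taken from the paper.

```python
# Few-shot CoT prompting: prepend a worked exemplar so the model
# "shows its work" before answering the new question.

exemplar = (
    "Q: Of 16 balls, half are golf balls, and half of the golf balls "
    "are blue. How many blue golf balls are there?\n"
    "A: Half of 16 balls is 16 / 2 = 8 golf balls. Half of those are "
    "blue, so 8 / 2 = 4. The answer is 4.\n\n"
)

def few_shot_prompt(question: str) -> str:
    """Build a prompt whose in-context exemplar models the reasoning."""
    return exemplar + f"Q: {question}\nA:"

# Feed few_shot_prompt("...") to a large model; with the exemplar in
# context it tends to reproduce the same step-by-step structure.
```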

But the above Chain of Thought (CoT) papers remain essentially confined to studies of large models.

Li Di, by contrast, believes the logical ability represented by the chain of thought is not exclusively a product of large models.

In China, "AI" may already be a well-known word, and it is also a wave of innovation in full swing.

If the approach Li Di describes is validated, then industrial AI applications may have a way forward other than the large-model route of piling on parameters and pouring in money.

Deploying AI in China: the field splits three ways

ChatGPT's performance and popularity have given those on the large-model route a glimmer of hope, but that does not mean large models are the only possible route to AI industrialization.

Rather, ChatGPT's popularity throws the current state and trends of AI deployment, at home and abroad, into sharper relief.

Broadly, the main paths divide into three.

The first is to build the underlying large model directly.

This is the most straightforward, the easiest to understand, and at the same time the most difficult path to take.

On the one hand, large models require vast amounts of training data, while in reality usable training data is scarce, especially Chinese-language data.

Take a recent hot topic: the biggest shortcoming of MOSS, China's first ChatGPT-like product launched by Professor Qiu Xipeng's team at Fudan University, is that its Chinese is not good enough, and one important reason is the lack of a high-quality Chinese corpus when training the large model behind it.

On the other hand, the parameters of large models are massive. Every seemingly short answer from ChatGPT mobilizes 175 billion parameters.

The sheer parameter count first created a mountain of annotation work: OpenAI hired large numbers of workers in Kenya at less than $2 per hour to screen and label data around the clock. In China, only giants such as ByteDance and Baidu could throw that much manpower at labeling.

Both of these aspects ultimately point to the same problem: cost, enormous cost.

OpenAI CEO Sam Altman once revealed on Twitter that ChatGPT's compute cost per conversation is "eye-watering." Five cents may sound trivial, but multiply it by the number of conversations each user has per day, and by a user base that keeps growing, and it adds up to a terrifying sum.

John Hennessy, chairman of Google's parent company Alphabet, said this week that an exchange with a large language model could cost more than ten times as much as a traditional search. Morgan Stanley previously estimated that Google's 3.3 trillion search queries in 2022 cost about 0.2 cents each, and that plugging in a product like Bard would push this figure up in proportion to the length of the AI-generated text.

If a ChatGPT-like AI were to handle ordinary search queries with answers of around 50 words, Google's annual costs would rise by some $6 billion.
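
As a sanity check, the cited figures are at least mutually consistent; here is a back-of-envelope sketch using only the article's numbers (estimates, not measured values):

```python
# Back-of-envelope check of the cost figures cited above.

queries_2022 = 3.3e12      # Morgan Stanley: Google searches in 2022
cost_per_search = 0.002    # ~0.2 cents per traditional query

baseline = queries_2022 * cost_per_search
print(f"baseline search cost ≈ ${baseline / 1e9:.1f}B per year")  # ~$6.6B

extra_total = 6e9          # the cited $6B annual increase
extra_per_query = extra_total / queries_2022
print(f"implied extra cost ≈ {extra_per_query * 100:.2f} cents per query")
# ≈ 0.18 cents: even short 50-word AI answers nearly double the
# per-query cost, before counting longer generations.
```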

Notably, whichever domestic player manages to stack up a large model comparable to GPT-3.5 or even GPT-4, it still has to find an application scenario where the model can actually run, and only by closing a commercial loop can all that effort avoid coming to nothing.

The second way is to extract the essence from the big model.

Spelled out, this means cutting the parameter count while preserving, or even improving, a single capability of the large model as much as possible: using a smaller model to deliver the functions the large model expresses.

If you picture the large-model route as riding a bicycle, stacking parameters is the slow, laborious way to reach a given effect. Discarding the dross and keeping the essence achieves the same effect without the bicycle's crawl; it is like building a rocket to the same destination.

Amazon is on this path, starting directly from small models. But the route only works under a key premise: that small and medium-sized models can approach, or even match, the practical abilities large models have shown.

Pruning away unneeded branches and probing how small a model with a specific function can get can, to some extent, relieve the cost pressure of large-model training.
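
The article does not specify how such slimming is done. Knowledge distillation is one standard technique for pursuing "smaller model, same capability"; as a purely generic illustration (not Xiaoice's disclosed method), its core is a loss that pulls a small student model's outputs toward a large teacher's:

```python
# Generic knowledge-distillation loss (PyTorch), illustrating one
# standard way to shrink a model while keeping a capability.

import torch.nn.functional as F
from torch import Tensor

def distillation_loss(student_logits: Tensor,
                      teacher_logits: Tensor,
                      temperature: float = 2.0) -> Tensor:
    """Pull the student's output distribution toward the teacher's."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence, scaled by t^2 per the usual distillation recipe.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t * t
```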

This route is controversial, though. First, ChatGPT's large model has already demonstrated practical feasibility, so insisting on this approach means swimming against the technical current; second, even if the costs look better, no real-world case yet proves that this route can win the race to deploy AI applications.

The third way differs from the previous two not in technology, but in seeking competitive advantage directly from the angle of commercialization.

Players here don't need to make much of a statement on technology; what's tested is business innovation. This is the "find the nail first, then look for a hammer" mode: work out the application scenario, then pick the model.

Abroad, there are already reference cases on this road, such as the AI startup Jasper, which builds services on GPT-3's public API, using AI to generate text content such as blog posts, social media posts, and web copy.

Where the product experience is good enough, or the scenario resources rich enough, such players can accumulate large user bases and build core competitiveness of their own.

Flip that around: precisely because the core competitiveness is not technical, companies on this path live with a sword of Damocles overhead. With the fate of the product, even the company, resting in someone else's hands, how can they not constantly worry about being choked off at any moment?

Three routes lie ahead, and their trade-offs are already clear: the first means enormous cost; the second, an approach yet to be verified; the third, core means of production outside one's control.

Which one leads to Rome? Or, beyond these three, might there be a hidden shortcut to deploying AI applications?

Li Di says they chose the second path. The Xiaoice Chain is built on it, and in essence it still approaches commercial AI from the perspective of "explainable artificial intelligence," keeping both costs and risks controllable.

As for verifying the approach, the wait may not be long: Li Di said the Xiaoice Chain will work with Bing to apply this method to search engines.

We will wait and see how it will be applied in practice.

