The chain of thought doesn't exist? New NYU Study: The Inference Step Can Be Omitted

Author: New Zhiyuan

Edited by alan

【New Zhiyuan Guide】Chain-of-thought technology may be overthrown! The latest research from New York University shows that large models don't actually take advantage of the chain of thought's reasoning power; they just secretly use it to sneak in extra computation!

The smash-hit chain-of-thought technique may be overthrown!

Are you still surprised that large models can use the chain of thought to think step by step?

Still struggling to write a chain of thought prompt?

Researchers from New York University said, "It's okay, it's all the same."

The reasoning steps don't matter; if you don't feel like writing out a prompt, you can skip it and just use ellipses instead.

Paper address: https://arxiv.org/pdf/2404.15758

The paper's title, "Let's think dot by dot", even riffs directly on the chain of thought's "Let's think step by step", showcasing the power of the ellipsis.

The power of "dots".

The researchers found that replacing the concrete steps in Chain-of-Thought (CoT) reasoning with meaningless "..." yields similar inference results.

For example, ask the model to count how many of the first 6 numbers are greater than 5.

Ask the model to answer directly and the result is absurd: it counts 7 out of only 6 numbers.

In contrast, with a chain-of-thought prompt, the model compares the numbers step by step and finally arrives at the correct answer: "2<5, 7>5, 1<5, 8>5, 2<5, 8>5, that's 3 numbers".

But what is even more outrageous is the "metaphysical" method this paper uses: the steps don't need to be written out at all. Outputting the same number of meaningless dots ("...") instead does not affect the final result.

This is no coincidence: a large number of experiments show that the latter two methods perform similarly.
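
To make the contrast concrete, here is a minimal sketch of the three prompting styles as strings (the exact wording is an assumption for illustration; note that in the paper, models are trained to use filler tokens rather than prompted off the shelf):

```python
# Illustrative only: three prompt styles for the counting example above.
question = "2 7 1 8 2 8. How many of these 6 numbers are greater than 5?"

direct_prompt = f"{question} Answer:"                          # no reasoning
cot_prompt    = f"{question} Let's think step by step:"        # chain of thought
dots_prompt   = f"{question} . . . . . . . . . . . . Answer:"  # filler dots
```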

In other words, what we thought was a performance boost from "thinking step by step" may simply come from the LLM getting more tokens!

You think the model is thinking, but it's actually barbecuing (in Chinese, a pun: sikao, "to think", versus shaokao, "barbecue").

Silly humans, trying to teach me how to reason with your childish examples, when all I ever wanted was computation.

"The chain of thought has never existed and will never exist" (Doghead).

According to Jacob Pfau, an author of the paper, the work shows that models do not benefit from the linguistic reasoning of the chain of thought; repeating meaningless "..." tokens achieves the same effect as CoT.

Of course, this also raises an alignment concern: the finding suggests that models can perform hidden reasoning that is invisible in the CoT, placing them to some extent beyond human oversight.

Netizens were shocked

The paper's conclusion upends long-standing assumptions, and some netizens said they had finally learned the essence of Musk.

"What exactly does this mean: models can use these tokens to think for themselves without our knowledge."

Some netizens said: no wonder I've always liked using "..."

Other netizens went straight to hands-on testing:

Though whether their understanding is correct is another matter~

However, some netizens countered that the worry about LLMs doing hidden reasoning within the chain of thought is unfounded: after all, large models produce output based on probability, not conscious thought.

CoT prompts merely make a subset of statistical patterns explicit; models simulate reasoning by generating text consistent with those patterns, but they have no ability to verify or reflect on their own output.

Think dot by dot

When faced with complex problems, we humans subconsciously engage in step-by-step reasoning.

Inspired by this, Google researchers published the famous Chain-of-Thought paper in 2022.

Asking a language model to solve problems step by step dramatically improved LLM performance on problems that had previously seemed unsolvable, tapping into the latent potential of LLMs.

Paper address: https://arxiv.org/pdf/2201.11903

Although at first nobody knew why the trick worked, it spread quickly because it was genuinely easy to use.

With the take-off of large models and prompt engineering, CoT became a powerful weapon for LLMs tackling complex problems.

Along the way, of course, many research teams have explored how CoT actually works.

The model doesn't reason

Does the performance gain from the chain of thought come from the model genuinely learning to solve problems step by step, or simply from the extra computation afforded by a longer sequence of tokens?

If you're not sure the logical reasoning matters, why not drop the logic entirely and replace the reasoning steps with "..."? These are what the paper calls filler tokens.

The researchers used a "baby alpaca" model: a 34M-parameter Llama with 4 layers, a hidden dimension of 384, and 6 attention heads, with randomly initialized parameters.
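
For reference, a minimal sketch of such a configuration using the Hugging Face transformers library (the vocabulary size and FFN width are assumptions; only the layer count, hidden size, and head count come from the article):

```python
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=384,         # hidden dimension from the article
    num_hidden_layers=4,     # 4 layers
    num_attention_heads=6,   # 6 attention heads
    intermediate_size=1024,  # FFN width: an assumption
    vocab_size=32000,        # an assumption; the paper uses a small task vocabulary
)
model = LlamaForCausalLM(config)  # weights are randomly initialized
print(sum(p.numel() for p in model.parameters()))  # ~32M here, in the ballpark of 34M
```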

Two questions are considered here:

(1) What types of evaluation data can benefit from filler tokens?

(2) What kind of training data is needed to teach a model to use filler tokens?

To this end, the researchers designed two tasks and constructed corresponding synthetic datasets, each highlighting a different condition under which filler tokens can deliver performance gains for a Transformer.

3SUM

Let's start with the first, harder task: 3SUM. The model must pick out three numbers in the sequence that satisfy a condition, for example that the sum of the three numbers is divisible by 10.
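
As a minimal sketch, this simplified condition can be checked by brute force (illustration only; the example sequences below suggest the paper's actual variant sums digit-wise):

```python
from itertools import combinations

def three_sum_mod10(nums):
    """Return True if any three numbers sum to a multiple of 10."""
    return any((a + b + c) % 10 == 0 for a, b, c in combinations(nums, 3))

print(three_sum_mod10([5, 75, 22, 13]))  # 5 + 22 + 13 = 40 -> True
```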

In the worst case, this task requires on the order of N³ operations, while the computation available between Transformer layers scales only as N².

Therefore, when the input sequence is long enough, the 3SUM problem naturally exceeds the expressive power of the Transformer.

Three experimental conditions were set up:

1. Filler tokens: the sequence uses repeated dots ". . . ." as intermediate padding, e.g. "A05 B75 C22 D13 : . . . . . . . . . . . . ANS True".

Each dot is a separate token, corresponding one-to-one to the tokens in the chains of thought below.

2. Parallelizable CoT solution: sequences take the form "A05 B75 C22 D13 : AB 70 AC 27 AD 18 BC 97 BD 88 CD 35 ANS True".

By writing out all the relevant intermediate sums, the chain of thought reduces the 3SUM problem to a series of 2SUM problems. This brings the computation down to the order of N², which the Transformer can handle, and it can be done in parallel.

3. Adaptive CoT solution: sequences take the form "A15 B75 C22 D13 : A B C 15 75 22 2 B C D 75 22 13 0 ANS True".

Unlike the scenario above, where 3SUM is cleverly decomposed into parallelizable sub-problems, here the hope is to use heuristics to generate flexible chains of thought that mimic human reasoning. Such instance-adaptive computation is incompatible with the parallel structure of filler-token computation, as the sketch below illustrates.
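
To see how the first two conditions line up, here is a minimal sketch that derives a filler-token sequence from a CoT sequence in the space-separated format above (the format handling is an assumption): each intermediate token is replaced by one dot, leaving the input and answer unchanged.

```python
def to_filler(cot_sequence):
    """Replace each intermediate CoT token with a dot, one-for-one."""
    prefix, rest = cot_sequence.split(" : ", 1)
    chain, answer = rest.rsplit(" ANS ", 1)
    dots = " ".join("." for _ in chain.split())
    return f"{prefix} : {dots} ANS {answer}"

print(to_filler("A05 B75 C22 D13 : AB 70 AC 27 AD 18 BC 97 BD 88 CD 35 ANS True"))
# -> "A05 B75 C22 D13 : . . . . . . . . . . . . ANS True"
```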

The results show that without filler tokens, the model's accuracy generally drops as sequences grow longer, while with filler tokens, accuracy holds at 100%.

2SUM-Transform

The second task is 2SUM-Transform, which only requires deciding whether the sum of two numbers meets the condition, an amount of computation well within the Transformer's grasp.

However, to keep the model from "cheating" by computing on the input tokens in place, each number in the input is shifted by a random offset.
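
A rough sketch of that obfuscation idea (the offset scheme and data format here are assumptions, not the paper's exact construction):

```python
import random

def make_2sum_transform(nums, mod=10):
    """Shift every input number by one random offset; the label still
    depends on the original numbers, so it can't be read off locally."""
    offset = random.randrange(mod)
    obfuscated = [(x + offset) % mod for x in nums]
    label = any((a + b) % mod == 0
                for i, a in enumerate(nums) for b in nums[i + 1:])
    return obfuscated, label
```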

The results: the filler-token method achieves 93.6% accuracy, very close to Chain-of-Thought, while accuracy without intermediate filler is only 78.7%.

But is this improvement simply due to differences in how the training data is presented, for example through a regularizing effect on the loss gradient?

To verify whether filler tokens perform hidden computation relevant to the final prediction, the researchers froze the model weights and fine-tuned only the final attention layer.
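
A minimal sketch of this probing setup, assuming the Hugging Face Llama model from the earlier sketch (the `model.model.layers` path is that library's layout, not the paper's code):

```python
# Freeze everything, then unfreeze only the final attention block, so any
# remaining trainable signal must pass through the last layer's attention.
for param in model.parameters():
    param.requires_grad = False
for param in model.model.layers[-1].self_attn.parameters():
    param.requires_grad = True
```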

The results show that accuracy keeps improving as more filler tokens are made available, suggesting that the filler tokens really are performing hidden computation relevant to the 3SUM prediction task.

Limitations

However metaphysical, magical, and genuinely effective the filler-token method may be, it is too early to declare the chain of thought overthrown.

The authors also note that filler tokens do not raise the upper bound on Transformers' computational complexity.

Moreover, learning to use filler tokens requires a specific training procedure; the paper relied on dense supervision to get the model to converge.

Still, some questions are already surfacing: hidden computation poses safety concerns, and might prompt engineering one day suddenly cease to exist?
