Google Revolutionizes Large Model Memory! Feedback Attention Mechanism Leads to a New Era of Infinite Memory

Author: Not bald programmer

Google has finally made a move: no more tolerating the "amnesia" of large models.

TransformerFAM has arrived, and it promises to give large models unlimited memory!

Without further ado, let's take a look at TransformerFAM's "therapeutic effect":

The performance of large models on long-context tasks is significantly improved!

In the chart above, tasks such as Isabelle and NarrativeQA require the model to understand and process large amounts of contextual information and give accurate answers or summaries to specific questions. The FAM configuration outperformed all other BSWA configurations on every task, and beyond a certain point, increasing the number of BSWA memory segments no longer improved its memory capacity.

It seems that on the road to long texts and long dialogues, FAM's "can't-forget" large model really does have something going for it.

According to Google researchers, FAM (Feedback Attention Memory) is a novel Transformer architecture that uses a feedback loop to let the network attend to its own latent representations. This fosters the emergence of working memory inside the Transformer and allows it to process sequences of unlimited length.

To put it simply, the strategy resembles how we manually fight a large model's "amnesia": re-entering a prompt every time we talk to it. FAM, however, is more sophisticated: when the model processes a new block of data, it reintegrates the previously processed information (i.e., the FAM) into the current computation as a dynamically updated context.

In this way, the "loves to forget" problem can be addressed. Even better, although a feedback mechanism is introduced to maintain long-term working memory, FAM is designed to stay compatible with pre-trained models and requires no additional weights. So, in theory, the stronger memory does not make the large model slower or more compute-hungry.

So how did this remarkable TransformerFAM come about, and what is the technology behind it?

Starting from the challenge: why does TransformerFAM help large models "remember more"?

The concept of Sliding Window Attention (SWA) is crucial to the design of TransformerFAM.

In the traditional Transformer model, the complexity of self-attention grows quadratically with sequence length, which limits the model's ability to process long sequences.
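To make the quadratic cost concrete, here is a minimal NumPy sketch (purely illustrative, not from the paper; attention_scores is a name of my own): for a sequence of length n, the score matrix has n × n entries, so doubling the length roughly quadruples the compute and memory.

```python
# Minimal sketch: full self-attention builds an (n, n) score matrix,
# so cost grows quadratically with sequence length n.
import numpy as np

def attention_scores(q, k):
    # q, k: (n, d) arrays; the result is an (n, n) matrix
    return q @ k.T / np.sqrt(q.shape[-1])

for n in (512, 1024, 2048):
    q = np.random.randn(n, 64)
    k = np.random.randn(n, 64)
    print(n, attention_scores(q, k).shape)  # (n, n): ~4x the entries each time n doubles
```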

"In the film Fragments of Memory (2000), the protagonist suffers from anterograde amnesia, which means that he cannot remember what happened in the last 10 minutes, but his long-term memories are intact and he has to tattoo important information on his body to remember them. This is similar to the current state of large language models (LLMs)," the paper reads.

Screenshot from the movie "Memento" (image from the Internet)

Sliding Window Attention is an improved attention mechanism for processing long sequences, inspired by the sliding-window technique in computer science. In natural language processing (NLP) tasks, SWA lets the model attend only to a fixed-size window of the input sequence at each step rather than to the entire sequence, which significantly reduces the amount of computation.
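As a rough illustration of the idea (a NumPy sketch, not the paper's implementation; sliding_window_mask is a hypothetical helper name), a sliding-window mask restricts each position to the most recent window of positions, so the per-token cost no longer depends on the full sequence length:

```python
# Sketch of a causal sliding-window attention mask:
# position i may only attend to the previous `window` positions (including itself).
import numpy as np

def sliding_window_mask(seq_len, window):
    idx = np.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]          # no attending to the future
    local = idx[:, None] - idx[None, :] < window   # stay within the local window
    return causal & local

print(sliding_window_mask(6, 3).astype(int))
# Each row has at most 3 ones: attention is confined to a local window,
# so compute grows linearly with sequence length instead of quadratically.
```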

But SWA has a limitation: its attention span is bounded by the window size, so the model cannot take into account important information that falls outside the window.

TransformerFAM adds feedback activations that feed the context representation back into each block of sliding-window attention, enabling integrated attention, block-level updates, information compression, and global context storage.

In TransformerFAM, the improvement is achieved through a feedback loop. Specifically, when the model processes the current block of the sequence, it attends not only to the elements within the current window but also reintroduces previously processed contextual information (the earlier "feedback activations") as an additional input to the attention mechanism. In this way, even as the attention window slides over the sequence, the model retains memory and understanding of earlier information.
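The following is a simplified conceptual sketch of that feedback loop (my own illustration under assumed details, not the paper's code; attend and fam_forward are hypothetical names, and projections, multiple heads, and layers are omitted): each block attends to its local tokens plus the FAM activations carried over from the previous block, and the FAM is then updated by attending to the same context, compressing it into working memory for the next block.

```python
# Conceptual sketch of block-wise processing with feedback activations (FAM).
import numpy as np

def attend(queries, keys, values):
    # plain single-head scaled dot-product attention
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

def fam_forward(x, block_size=64, num_fam=8):
    # x: (seq_len, d) token activations; fam: (num_fam, d) working-memory slots
    fam = np.zeros((num_fam, x.shape[-1]))
    outputs = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        context = np.concatenate([fam, block], axis=0)
        # block tokens see the local window plus the feedback memory
        outputs.append(attend(block, context, context))
        # the feedback loop: FAM re-attends to the block and to itself,
        # compressing the context into updated working memory for the next block
        fam = attend(fam, context, context)
    return np.concatenate(outputs, axis=0)

out = fam_forward(np.random.randn(256, 32))
print(out.shape)  # (256, 32): per-block cost stays bounded as the sequence grows
```

Because only a handful of FAM slots are carried between blocks, information from earlier content can persist while the compute per block stays bounded, which mirrors the property described above.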

So, with this improvement, TransformerFAM gives LLMs the potential to handle sequences of infinite length!

With working memory, large models continue to move toward AGI

TransformerFAM has shown promise in research, and it stands to improve AI's performance on understanding and generating long texts, such as document summarization, story generation, and question answering.

At the same time, whether it's a smart assistant or an emotional companion, an AI with an infinite memory sounds more appealing.

Interestingly, the design of TransformerFAM draws inspiration from memory mechanisms in biology, which dovetails with AGI's pursuit of simulating natural intelligence. The paper is an attempt to bring a concept from neuroscience, attention-based working memory, into the field of deep learning.

TransformerFAM introduces working memory to large models through feedback loops, allowing the model to not only remember short-term information, but also maintain the memory of key information in long-term sequences.

Through bold imagination, researchers build hypothetical bridges between the real world and abstract concepts. As innovations like TransformerFAM continue to emerge, technological bottlenecks will be broken one after another, and a smarter, more connected future is slowly unfolding before us.
