Xiao Xiao, Zephyr | Reporting from Aofeisi
QbitAI | Official account QbitAI
Chen Danqi, an alumna of Tsinghua's Yao Class, gave a new talk at ACL 2023!
The topic is a research direction that has been very hot lately:
whether (large) language models such as GPT-3 and PaLM need to rely on retrieval to make up for their own shortcomings, so that they can be better deployed in practice.
In the talk, she and three other speakers jointly introduced several research directions on this topic, including training methods, applications, and challenges.
The audience response was enthusiastic: many attendees earnestly raised their own questions, and the speakers did their best to answer them.
As for how the talk went over? Some netizens went straight to the comment section with a "recommended".
So what exactly did they cover in this three-hour talk, and what is worth hearing?
Why do large models need "plug-in" databases?
The core theme of the talk is "retrieval-based language models", which contains two elements: retrieval and the language model.
By definition, it means "plugging" a retrieval datastore into the language model and querying this datastore during inference (and other stages), with the final output conditioned on the retrieved results.
A model with this kind of plug-in datastore is also known as a semi-parametric or non-parametric model.
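To make the definition concrete, here is a minimal, purely illustrative sketch of the retrieve-then-condition loop. The toy datastore, the random stand-in "embeddings", and helper names such as `retrieve` and `build_prompt` are all invented for illustration; a real retrieval-based language model would use a trained dense retriever and a neural language model instead.

```python
# Toy sketch of a retrieval-based LM pipeline (illustrative only).
import numpy as np

# The "plug-in" datastore: text chunks with precomputed embeddings.
# The embeddings here are random stand-ins for a real encoder's output.
chunks = [
    "ACL 2023 hosted a tutorial on retrieval-based language models.",
    "A datastore can be updated without retraining the language model.",
    "Dense retrieval maps queries and documents into the same vector space.",
]
rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(len(chunks), 64))
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

def retrieve(query_vec, k=2):
    """Return the k chunks most similar to the query (cosine similarity)."""
    scores = chunk_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return [chunks[i] for i in top]

def build_prompt(question, query_vec):
    """Prepend retrieved chunks so the LM conditions its answer on them."""
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# A real system would now feed this prompt to the language model.
query_vec = rng.normal(size=64)
query_vec /= np.linalg.norm(query_vec)
print(build_prompt("Why add a datastore to a language model?", query_vec))
```

The key point is that the output is conditioned on text pulled from the datastore, which can be swapped or updated without touching the model's parameters.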
The reason for studying this direction is that (large) language models such as GPT-3 and PaLM, while delivering strong results, also have some headache-inducing "bugs", mainly three problems:
1. Their parameter counts are enormous, so retraining them on new data is prohibitively expensive;
2. Their memory is poor (with long text, they remember what comes later and forget what came earlier), they hallucinate as time goes on, and they easily leak data;
3. Even with current parameter counts, it is impossible for them to memorize all knowledge.
Hence the proposal of an external retrieval corpus: "plug" a datastore into the large language model so that it can look up information whenever it answers a question. And because the datastore can be updated at any time, there is no need to worry about the cost of retraining.
With the definition and background covered, the talk turned to the specifics of this research direction: architecture, training, multimodality, applications, and challenges.
On architecture, it mainly covered what a retrieval-based language model retrieves, how it retrieves, and when it retrieves.
Specifically, such models mainly retrieve tokens, text chunks, or entity mentions, and the ways and timing of using retrieval are also diverse, making this a very flexible model architecture.
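As a concrete reference point for the token-level variant, one widely cited design in this family (kNN-LM, not named explicitly above) retrieves the nearest-neighbor tokens from the datastore and mixes their distribution with the parametric model's prediction. The interpolation below is the standard formulation from that line of work, shown here only as an illustration:

```latex
% Token-level retrieval (kNN-LM style): blend the distribution induced by the
% retrieved nearest-neighbor tokens with the parametric LM's distribution.
p(y \mid x) = \lambda \, p_{\mathrm{kNN}}(y \mid x) + (1 - \lambda) \, p_{\mathrm{LM}}(y \mid x)
```

Chunk-level and entity-level variants change what is stored and retrieved, but keep the same idea of conditioning the output on retrieved items.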
In terms of training methods, the talk introduced approaches such as independent training (training the language model and the retrieval model separately), continual learning, and multi-task learning (joint training).
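To illustrate the "independent training" option: a dense retriever trained separately from the language model is commonly optimized with a contrastive objective over a query, one relevant (positive) passage, and several irrelevant (negative) passages. The loss below is the standard form from the dense-retrieval literature (e.g. DPR), included as background rather than quoted from the talk:

```latex
% Contrastive objective for training a dense retriever independently:
% score the query q against one positive passage and n negative passages.
\mathcal{L}\bigl(q, p^{+}, p^{-}_{1}, \dots, p^{-}_{n}\bigr)
  = -\log \frac{\exp\bigl(\mathrm{sim}(q, p^{+})\bigr)}
               {\exp\bigl(\mathrm{sim}(q, p^{+})\bigr) + \sum_{j=1}^{n} \exp\bigl(\mathrm{sim}(q, p^{-}_{j})\bigr)}
```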
As for applications, such models cover a wide range: not only tasks like code generation, classification, and knowledge-intensive NLP, but also methods such as fine-tuning, reinforcement learning, and retrieval-based prompting.
Application scenarios are also flexible: long-tail scenarios, scenarios that require knowledge updates, and scenarios involving privacy and security all have a place for this type of model.
And not only for text, of course. Such models also have the potential for multimodal extension, applying to tasks beyond text.
This type of model sounds full of advantages, but retrieval-based language models still come with challenges.
In her closing remarks, Chen Danqi highlighted several major problems that this research direction still needs to solve.
First, does "small language model + (ever-growing) large datastore" essentially mean that the language model's parameter count is still huge? How can this problem be solved?
For example, although the parameter count of such a model can be very small, only 7 billion parameters, the plug-in datastore can reach 2T...
Second, the efficiency of similarity search. How to design algorithms that maximize retrieval efficiency is a very active research direction at present (see the sketch after this list).
Third, completing complex language tasks, including open-ended text generation and complex text reasoning. How to use retrieval-based language models to complete such tasks is also a direction that needs continued exploration.
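To make the second challenge (similarity-search efficiency) more tangible, the sketch below contrasts exact nearest-neighbor search with an approximate inverted-file index using the FAISS library. The dimensions, dataset size, and index parameters are arbitrary placeholders chosen only for illustration:

```python
# Illustrative comparison of exact vs. approximate nearest-neighbor search,
# assuming the faiss library is installed (pip install faiss-cpu).
import faiss
import numpy as np

d, n_vectors, n_queries, k = 64, 100_000, 10, 5
rng = np.random.default_rng(0)
datastore = rng.normal(size=(n_vectors, d)).astype("float32")
queries = rng.normal(size=(n_queries, d)).astype("float32")

# Exact search: accurate but scans every vector, which gets costly at scale.
flat_index = faiss.IndexFlatL2(d)
flat_index.add(datastore)
_, exact_ids = flat_index.search(queries, k)

# Approximate search: cluster the datastore (inverted file) and probe only
# a few clusters per query, trading a little recall for much lower latency.
nlist = 256                       # number of clusters
quantizer = faiss.IndexFlatL2(d)
ivf_index = faiss.IndexIVFFlat(quantizer, d, nlist)
ivf_index.train(datastore)
ivf_index.add(datastore)
ivf_index.nprobe = 8              # clusters searched per query
_, approx_ids = ivf_index.search(queries, k)

# Rough recall of the approximate index against exact search.
recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx_ids, exact_ids)])
print(f"recall@{k} of IVF vs. exact search: {recall:.2f}")
```

Approximate indexes like this trade a small amount of recall for large speedups, which is exactly the kind of trade-off flagged as an active research area.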
Of course, Chen Danqi also noted that these topics are research opportunities as much as challenges. Anyone still hunting for a thesis topic might consider adding them to their research list~
It is worth mentioning that this talk was not conjured "out of thin air": the four speakers thoughtfully posted links to the papers it references on the official website.
From model architecture, training methods, applications, and multimodality to challenges, whichever of these topics interests you, you can find the corresponding classic papers on the official website:
Answering the audience's questions live
Behind such a content-packed talk, the four speakers are no obscure names either, and during the session they patiently answered the questions raised by the audience.
First, let's take a look at who the speakers are.
The first was Danqi Chen, assistant professor of computer science at Princeton University, who led the talk.
She is one of the most talked-about young Chinese scholars in computer science recently, and she is also an alumna of Tsinghua's Yao Class of 2008.
In informatics-competition circles, she is quite a legend: the CDQ divide-and-conquer technique is named after her, and in 2008 she won an IOI gold medal as part of the Chinese team.
Her 156-page doctoral thesis, "Neural Reading Comprehension and Beyond", was also a hit: it not only won Stanford's best doctoral dissertation award that year, but also became one of Stanford's hottest dissertations of the past decade.
Today, in addition to being an assistant professor of computer science at Princeton University, Chen Danqi is co-leader of the Princeton NLP group, which she helped build from scratch, and a member of the AIML group.
Her research focuses on natural language processing and machine learning; she is interested in simple yet reliable methods that are practical, scalable, and generalizable to real-world problems.
Also from Princeton University is Chen Danqi's student, Zexuan Zhong.
Zhong Zexuan is a fourth-year PhD student at Princeton University. He received his master's degree from the University of Illinois Urbana-Champaign, advised by Tao Xie, and his bachelor's degree from the Department of Computer Science at Peking University; he also interned at Microsoft Research Asia under the supervision of Zaiqing Nie.
His recent research focuses on extracting structured information from unstructured text, extracting factual information from pre-trained language models, analyzing the generalization ability of dense retrieval models, and developing training techniques for retrieval-based language models.
The other two speakers were Akari Asai and Sewon Min, both from the University of Washington.
Akari Asai is a fourth-year PhD student in natural language processing at the University of Washington; she received her bachelor's degree from the University of Tokyo.
Her main passion is developing reliable and adaptable natural language processing systems to improve people's access to information.
Recently, her research has focused on general knowledge retrieval systems and efficient adaptive NLP models.
Sewon Min is a PhD candidate in the natural language processing group at the University of Washington; during her doctoral studies she has also spent four years as a part-time researcher at Meta AI, and she received her bachelor's degree from Seoul National University.
Recently, she has focused on language modeling, retrieval, and the intersection of the two.
During the presentation, the audience enthusiastically asked many questions, for example, why perplexity was used as the main evaluation metric in the talk.
The speakers gave a careful answer:
Perplexity (PPL) is often used when comparing parametric language models, but whether improvements in perplexity translate into downstream applications is still an open research question.
Studies have shown that perplexity correlates well with downstream tasks (especially generation tasks), and perplexity generally gives very stable results because it can be evaluated on large-scale unlabeled data; downstream tasks, by contrast, can suffer from prompt sensitivity and a lack of large-scale labeled data, which makes their results unstable.
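For readers unfamiliar with the metric, perplexity is simply the exponentiated average negative log-likelihood the model assigns to each token of an (unlabeled) evaluation corpus; this standard definition is added here for reference and was not part of the quoted answer:

```latex
% Perplexity of a model p_theta over an evaluation corpus x_1, ..., x_N.
\mathrm{PPL} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_{\theta}\bigl(x_i \mid x_{<i}\bigr) \right)
```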
Some netizens raised questions like this one:
Regarding the claim that "language models are expensive to train, and introducing retrieval may solve this problem", haven't you simply traded time complexity for space complexity (data storage)?
The speakers' answer was roughly as follows:
Our discussion focused on how language models can be made much smaller, reducing both time and space requirements. However, the datastore does add extra overhead, which needs to be carefully weighed and studied; we consider this one of the current challenges.
Compared with training a language model with more than ten billion parameters, I think the most important thing at present is to reduce the cost of training.
If you want the slides from this talk, or are waiting for the full recording, you can head over to the official website~
Official Website:
https://acl2023-retrieval-lm.github.io/
Reference Links:
[1]https://twitter.com/AkariAsai/status/1677455831439384582
[2]https://twitter.com/cosmtrek/status/1678077835418955781
[3]https://app.sli.do/event/ok8R2jMMvNjp9uMkxi63Qi/live/questions
— End —
QbitAI · Signed creator on Toutiao
Follow us to be the first to know about the latest developments in cutting-edge technology