
Based on the application scope and dilemma of LLM large model agent


This article explains how large models typically work, i.e., question answering driven by prompts, and points out two main problems: managing historical conversation information and the limit on the number of tokens. It discusses two application scenarios, knowledge-base Q&A and the personal assistant, and analyzes the dilemmas each faces, such as the knowledge base being unable to effectively handle multimodal information and large documents, and the personal assistant being constrained by the complexity of tool parameters and token length. The article also covers fine-tuning as a method to improve model performance and its potential application in different fields. Finally, it shares the expectation that fine-tuning will become a standard operating procedure, along with suggestions for improving existing platforms and infrastructure.


Background

At present, LLM large models are in full swing, and the major Internet vendors are basically all training and launching self-developed large models: ChatGPT, Qwen (Qianwen), Moonshot's Kimi, and so on. Many applications have emerged on top of these models. However, no phenomenal application has appeared yet; Miaoya Camera (妙鸭相机) might count as one, but it was also very short-lived. Driven by the requirements of business scenarios, the author also explored agent solutions based on large models and tried to use them in real business scenarios. But there is still a certain gap between them and reality.

In the agent-based solution, there are many open source frameworks, such as:

  1. LangChain: an application development framework based on large models. It lets developers combine large-model inference with storage, tools, indexes, prompts, and other modules to build personal assistants, task automation, and Q&A applications. Its main focus is integrating the service capabilities of a single agent, and many current AI applications basically carry its imprint.
  2. AutoGen: Microsoft's open-source agent framework. If we think of LangChain as an agent that interacts with a person, then AutoGen is a multi-agent task framework in which agents interact and cooperate with each other to accomplish a goal. A human only needs to give AutoGen a goal, and it can complete that goal through multiple rounds of dialogue and execution among the agents.
  3. MetaGPT: the official introduction reads: "MetaGPT is a powerful open-source software that leverages a multi-agent framework (product design, technical design, and programmer) to handle your needs. Simply enter the requirements, and MetaGPT can plan, design, and generate product documentation, test code, and main runtime code, so you can start running your software right away. Multi-agents are more efficient and flexible than a single agent. This is a major breakthrough in AI technology that makes software development easier and more efficient."

Well, no matter how fancy it all sounds, what matters is whether it works. Let's take LangChain's two best application scenarios as examples: one is knowledge-base Q&A, and the other is the personal assistant.

Before discussing these two scenarios, we need to be familiar with how a large model typically works.

The general workflow is: prompt => large model => the large model produces questions and answers based on the prompt.

General format of the prompt:

PREFIX = """Answer the following questions as best you can. You have access to the following tools:"""
FORMAT_INSTRUCTIONS = """Use the following format:


Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question"""
SUFFIX = """Begin!


Question: {input}
Thought:{agent_scratchpad}"""           

To put it simply, we tell the large model to answer the question in a specific format, which tools are available, the parameter formats, and so on. The point is to require the large model to work according to a specific paradigm and output results accordingly. The above is the prompt of the MRKL agent in a chat scenario; if you are interested, you can read the paper behind MRKL.

MRKL address: https://arxiv.org/pdf/2205.00445.pdf

During actual operation, the placeholders are filled with the corresponding data: for example, {tool_names} is replaced with tool information, and the input includes the historical session information, as shown in the following figure:

[Figure: the prompt template filled with tool descriptions, conversation history, and user input]
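As a minimal sketch of how that filling happens, the framework simply interpolates the template constants above. The two tools below are hypothetical placeholders, not real LangChain tools:

# Minimal sketch: filling the MRKL-style template above with tool data.
# The two tools here are hypothetical placeholders for illustration.
tools = {
    "order_lookup": "Look up an order by its order ID.",
    "whitelist_check": "Check whether a user ID is on the whitelist.",
}

tool_descriptions = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
prompt = "\n".join([
    PREFIX,
    tool_descriptions,
    FORMAT_INSTRUCTIONS.format(tool_names=", ".join(tools)),
    SUFFIX.format(input="Is user 42 on the whitelist?", agent_scratchpad=""),
])
print(prompt)  # this full string is what actually gets sent to the LLM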

There are two problems here:

  1. The first problem is that LLMs themselves hold no historical dialogue information, so agent frameworks generally provide a memory capability: every time the large model is called, the historical dialogue has to be fetched and put back into the prompt (a sketch of trimming that history to a budget follows this list);
  2. The second problem is that the number of tokens per call is limited, which is actually fatal. If you hit a scenario involving a very large knowledge base, how do you solve it? You cannot bring all the knowledge-base information along every time. There are generally two solutions: the first is to bring over only the relevant information on each call, which is the general knowledge-base scheme described below; the other is fine-tuning done in advance, using prior knowledge to solve expert problems.
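A common mitigation for both problems is to keep only as much recent history as the budget allows. A minimal sketch, using character count as a crude stand-in for a real tokenizer:

# Minimal sketch: trim conversation history to fit a token budget.
# Character count is a crude stand-in for a real tokenizer.
def trim_history(history, budget_chars=2000):
    kept, used = [], 0
    for turn in reversed(history):          # walk from the newest turn backwards
        if used + len(turn) > budget_chars:
            break
        kept.append(turn)
        used += len(turn)
    return list(reversed(kept))             # restore chronological order

history = ["user: hi", "bot: hello", "user: check order 123", "bot: shipped"]
print(trim_history(history, budget_chars=50))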

The effect of the knowledge base

[Figure: knowledge-base Q&A pipeline — document splitting, embedding, vector storage, retrieval, prompt construction]

Many readers have probably seen the figure above: we split PDF, Yuque, Word, and other documents into chunks, embed them, and store them in a vector database. During Q&A, we pull the relevant chunks, build a prompt for the large model, and let it output the corresponding answer.

In essence this solves two problems: the limit on the number of tokens, and the ability to distill the precise question.

  1. Text splitter: an algorithm or method that splits large pieces of text into smaller chunks or fragments. The goal is to create manageable fragments that can be processed individually, which is often necessary when working with large documents or datasets.
  2. Embedding: put simply, vectorization; ultimately the data across different dimensions needs to be normalized and made comparable. The best at this is OpenAI's text-embedding-ada-002. In fact, that ChatGPT does better than other large models is to a large extent due to the quality of its embedding results: under the Transformer architecture, multi-head attention depends heavily on embedding quality.
  3. Vector store: a vector database that stores the vectors just embedded. Common vector databases are available, such as Hologres, Redis, etc., which support computing Euclidean distances, vector inner products, and so on.
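To make the pipeline concrete, here is a minimal, self-contained sketch. The hash-based embed() is a toy stand-in for a real embedding model such as text-embedding-ada-002, doc.txt is a hypothetical input file, and a real system would use a vector database rather than an in-memory list:

import math
import hashlib

def split_text(text, chunk_size=200, overlap=50):
    # Text splitter: fixed-size chunks with overlap so context is not cut mid-thought.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text, dim=64):
    # Toy embedding: bag-of-words hashed into a fixed-size, normalized vector.
    # A stand-in for a real embedding model; do not use for actual retrieval quality.
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

# "Vector store": an in-memory list of (vector, chunk) pairs.
store = [(embed(c), c) for c in split_text(open("doc.txt").read())]

def retrieve(question, k=3):
    # Pull the k chunks most similar to the question.
    q = embed(question)
    return [c for _, c in sorted(store, key=lambda p: -cosine(q, p[0]))[:k]]

question = "How do I apply for a refund?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"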

The principle is very simple. Within the group there are, if not thousands, then at least hundreds of knowledge-base bots. On basic questions they answer passably, but on complex questions I have not yet seen a truly powerful knowledge-base bot appear.


The dilemma of the knowledge base

The first problem is that the knowledge-base documents we vectorize often contain many pictures, and in terms of expressive power, pictures often carry more semantics than text. Image processing may therefore become a required link, and even more so video. So document processing cannot simply be NLP processing; it also has to support multimodality, and such a capability does not seem to have appeared yet. The figure below is actually an example.

[Figure: example of an image-heavy knowledge-base document]

The second thing that affects results is document size. If the knowledge base is too large, then after splitting, too much data is recalled, and discarding part of it inevitably leads to incomplete information. However, this can be mitigated by making multiple calls and then integrating the results, as in the sketch below.
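The "multiple calls, then integrate" idea is essentially a map-reduce over the recalled chunks. A minimal sketch, with call_llm() standing in for whatever model API you actually use:

def call_llm(prompt):
    # Stand-in for a real model API call (e.g., an OpenAI or Qwen endpoint).
    raise NotImplementedError

def answer_over_chunks(question, chunks, batch_size=4):
    # Map: answer the question against each batch of recalled chunks.
    partial = []
    for i in range(0, len(chunks), batch_size):
        context = "\n".join(chunks[i:i + batch_size])
        partial.append(call_llm(f"Context:\n{context}\n\nQuestion: {question}"))
    # Reduce: merge the partial answers into one final answer.
    merged = "\n".join(partial)
    return call_llm(f"Combine these partial answers into one:\n{merged}\n\nQuestion: {question}")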

In addition, results correlate with how the system is used: if the question is not discriminative enough, the returned result also suffers from being too generic. One characteristic of a knowledge-base capability is that the knowledge it provides must be accurate and useful. As a user, if what you give me is wrong, I will simply stop using it: I am not an expert, and I have no way to judge the correctness of the results. This is a different mindset from that of an expert assistant.


Effect of personal assistant

The basic logic of the personal assistant is: after input, assemble the parameters and tell the large model the goal and the tool descriptions; let the large model tell us whether a tool should be used, and which one. In this process, the large model also helps us build the tool parameters (provided the tool describes its parameter types clearly). LangChain then makes the call according to the chosen tool and parameters, and after getting the result, the large model is called again. This is the basic ReAct idea. For details, please refer to the ReAct paper.

ReAct Address: https://arxiv.org/pdf/2210.03629.pdf
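At its core, the ReAct loop is: send the prompt, parse Action / Action Input from the model's reply, run the tool, append the Observation to the scratchpad, and repeat until a Final Answer appears. A minimal sketch reusing the template constants from earlier (call_llm and the tools dict are placeholders; tool descriptions are omitted for brevity):

import re

def react_loop(question, tools, call_llm, max_steps=5):
    # tools: dict mapping tool name -> Python callable taking one string.
    # call_llm: stand-in for a real model API call.
    scratchpad = ""
    for _ in range(max_steps):
        prompt = "\n".join([
            PREFIX,
            FORMAT_INSTRUCTIONS.format(tool_names=", ".join(tools)),
            SUFFIX.format(input=question, agent_scratchpad=scratchpad),
        ])
        reply = call_llm(prompt)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        action = re.search(r"Action: *(.+)", reply)
        action_input = re.search(r"Action Input: *(.+)", reply)
        if not (action and action_input):
            return reply  # the model broke the format; give up
        observation = tools[action.group(1).strip()](action_input.group(1).strip())
        scratchpad += f"{reply}\nObservation: {observation}\nThought: "
    return "stopped after too many steps"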


In fact, following this idea, if you write a good prompt and describe the tools clearly, you will find that this can indeed produce some value; at least as an auxiliary tool assistant it can achieve good results, for example checking whitelists, order information, and so on.


Personal assistant dilemma

  1. One obvious dilemma is that tool parameters must not be too complex, because the parameters come from what the large model recommends, and even the latest GPT is prone to problems in parameter mapping: even a simple string array is prone to inconsistent formatting. So the tool's input parameters must be simple enough, otherwise transformation and validation will be painful (a sketch of such defensive parsing follows this list).
  2. Since each call carries the full history, too much background knowledge also causes the tokens-too-long problem. It is like asking someone to do customer service for you: they first need enough prior knowledge, and because no fine-tuning has been done, the full information comes along on every call and the model is slow to process it.
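Because tool parameters come back as model-generated text, defensive parsing and validation are unavoidable. A minimal sketch of the kind of checking involved (the schema and example tool are hypothetical):

import json

def parse_tool_args(raw, schema):
    # schema: dict mapping argument name -> expected Python type.
    # Models often wrap JSON in prose or code fences, so extract the object first.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError(f"no JSON object found in: {raw!r}")
    args = json.loads(raw[start:end + 1])
    for name, typ in schema.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], typ):
            args[name] = typ(args[name])  # best-effort coercion, e.g. "10" -> 10
    return args

# Hypothetical tool with a deliberately simple schema: one string, one int.
schema = {"user_id": str, "limit": int}
print(parse_tool_args('Sure! {"user_id": "u42", "limit": "10"}', schema))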

So in the end, if you want good results, you still need to prepare clear enough business-domain knowledge (including multimodal data) and do fine-tuning.


Musings on fine-tuning

Before talking about fine-tuning in the NLP field, let's first talk about fine-tuning in the text-to-image field. There is currently a very popular website in text-to-image called Civitai (C站), which is built almost entirely on the LoRA idea: fine-tuned models. Take the Stable Diffusion base model, add a fine-tuned model and a prompt, and you can easily produce images in the corresponding style.

For details, please refer to the LoRA paper: https://arxiv.org/pdf/2106.09685.pdf
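The core idea in the LoRA paper is to freeze the base weight W and learn a low-rank update BA, so the layer computes h = Wx + BAx and only A and B are trained. A minimal PyTorch sketch, assuming torch is available (the paper's dropout and merging details are omitted):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base linear layer plus a trainable low-rank update B @ A (LoRA).
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # h = W x + (alpha/r) * B A x  -- only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])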

  1. When the inference ability of open-source base models reaches the level of GPT-4, fine-tuning will become a standard operating procedure for solving problems in various professional fields (you can also call it style, domain knowledge, or contextual background). But whether an ecosystem like Civitai's can form is hard to say: image styles are universal and everyone can borrow from each other, whereas professional background differs for every product.
  2. Based on current large-model capabilities, it is still difficult to achieve good results in fields requiring precise computation, such as invoking complex tools. But good results can be obtained in the expert-assistant field, such as Copilot for code generation. Although the generated code is not guaranteed to need zero modification, getting 80% of the code valid and fixing the problematic 20% with your own expert knowledge still saves a huge amount of effort.
  3. Finally, a word on DingTalk: DingTalk now supports streaming responses, which makes it easy to achieve the typing effect of a large model. You can try it.

Author: Feng Ling

Source: WeChat public account "Big Taobao Technology"

Source: https://mp.weixin.qq.com/s/c05e7n4JmaAkbyaWszdc1Q
