
The AI Framework LangChain in Practice

author:Flash Gene

01

Background


Nowadays, there are all kinds of AI applications for improving work efficiency. For example:

1. Tools that let users upload local documents and then summarize and analyze the content, such as ChatPDF.

2. Browser plug-ins that aggregate and analyze information from web pages or videos, such as Monica or New Bing.

We want to borrow the approach behind these tools to achieve one goal: users can automatically generate usable front-end page code simply by feeding in requirements documents or interface documents.

02

Study and research


All of the later operations are based on OpenAI, so we need to apply for an API key on its official website in advance.


Each account receives a $5 credit, which is enough for ordinary study and experimentation for a while.

While using it, we found some limitations of ChatGPT's AI models:

  • Large texts exceed OpenAI's token limit. Current OpenAI models mostly offer 4K, 8K, or 16K token contexts; calling the OpenAI API directly with an oversized input returns a token-limit error.
  • No Internet access, so real-time information is unavailable.
  • Short-term memory: once a conversation exceeds the model's token limit, the AI can no longer remember earlier instructions.
  • Poor multitasking: ask the AI to do many things at once and it gets confused about the priority and order of tasks.
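As a rough illustration of the first limitation, here is a pre-flight check in Node.js. The 4-characters-per-token ratio is a common rule of thumb for English text, not OpenAI's real tokenizer (tiktoken), so treat the numbers as estimates:

```javascript
// Rough pre-flight token check before calling the API.
// Assumption: ~4 characters per token (heuristic, not tiktoken).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function fitsInContext(text, modelLimit = 4096) {
  return estimateTokens(text) <= modelLimit;
}

const bigDoc = "x".repeat(20000); // ~5000 estimated tokens
console.log(fitsInContext(bigDoc));         // false: would trigger a token-limit error
console.log(fitsInContext("short prompt")); // true
```

A document that fails this check is exactly the case the splitter + vector-store pipeline below is designed to handle.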

Why use LangChain


To quote a vivid phrase: AI models have only a powerful "brain" but no "arms", and cannot interact with the outside world. The LangChain framework gives developers tools that fit an AI model with "arms".

As the "chain" in its name suggests, the framework lets you chain all sorts of things together. It has become popular only in recent months and is currently the second fastest-growing repository on GitHub; as of July 2023 it has 54.7K stars.

It supports both Node.js and Python; the example code that follows uses Node.js.

Read local documents


LangChain supports loading csv, docx, epub, json, markdown, pdf, text, and other file types. Here a Word document is used to demonstrate LangChain reading information from an interface document (the document contains mock data and no sensitive information).

The interface document's contents are shown in the figure (note the circled information; the code later corresponds to it).


Example 1: Search for document-related information


As the results returned by the code show, LangChain successfully loaded the Word document and can find the information corresponding to the question.

Example 2: Summarize API information

Have LangChain extract the following from the interface information in the document:

  • Interface address
  • Fields
  • Headers
  • Request method (POST in the example)

It is then asked to output the result as axios code. You can see that the output matches the information in the document exactly, and the code is usable.


After seeing the example, you may wonder: didn't we just say AI models have a token limit? How can such a large document be read directly?

There are two key calls in the code: the splitter and the vectorStore. For example, to read a 10,000-character string with a model limited to 4K tokens, the steps are:

  • Use a text splitter to cut the 10,000-character string into ten 1,000-character document fragments
  • Use vector storage to convert the plain-text fragments into vectors
  • Use vector search to find the fragments most relevant to the question
  • Have the AI model refine, summarize, and output the information
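The steps above can be sketched with plain Node.js. This is a stand-in for illustration only: real LangChain would use a text splitter, OpenAI embeddings, and a vector store, while here a bag-of-words vector and cosine similarity play those roles, and the final "summarize with the AI model" step is omitted:

```javascript
// Split → vectorize → search, with naive bag-of-words "embeddings".
function splitText(text, chunkSize = 1000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Term-frequency vector; a real pipeline would call an embedding model.
function toVector(text) {
  const vec = new Map();
  for (const word of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    vec.set(word, (vec.get(word) ?? 0) + 1);
  }
  return vec;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const [w, x] of a) { dot += x * (b.get(w) ?? 0); na += x * x; }
  for (const [, y] of b) nb += y * y;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Return the chunk most relevant to the question; the LLM then only
// has to summarize this chunk, not the whole oversized document.
function search(chunks, question) {
  const q = toVector(question);
  return chunks
    .map((c) => ({ c, score: cosine(toVector(c), q) }))
    .sort((x, y) => y.score - x.score)[0].c;
}

const doc =
  "The announce_list endpoint uses GET. ".repeat(30) +
  "The delete_announce endpoint uses POST and takes an id. ".repeat(30);
const chunks = splitText(doc, 400);
console.log(search(chunks, "Which method does delete_announce use?"));
```

The retrieved chunk is what gets stuffed into the prompt as context, which is how a 4K-token model can "read" a much larger document.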

The ChatPDF tool mentioned at the beginning uses a similar process; if you are interested, you can look at comparable open-source projects on GitHub.

Read the web page


As mentioned earlier, ChatGPT's models cannot access the Internet, and their training data only goes up to 2021. If you ask, say, what the weather is like today, the model will make up an answer.

In July 2023, ChatGPT Plus opened up plug-ins with Internet access, so paying users can simply bypass the no-Internet limitation.

Another approach is to crawl web pages, organize the extracted information, and then feed it to the large model as context.

Here is an example of crawling the Apifox docs (h861y5qddl.apifox.cn/) with Puppeteer.

To make actions such as auto-typing and auto-clicking easier to demonstrate, headless mode is turned off for now.

The full effect can be viewed in the original video (duration 00:55).

The video moves quickly and many readers may not catch everything, so here is a summary of the operations involved:

  • After the Node script runs, Puppeteer automatically launches a browser
  • Puppeteer automatically performs actions such as focusing, clicking, and typing, crawling the HTML of every page it clicks through
  • cheerio selects the relevant HTML fragments and formats them into text (removing tags, classes, ids, etc.)
  • LangChain performs text splitting, vector storage, and key-information search, then produces output via prompt + LLM. The output here contains only code, and the code is high quality and ready to use!
  • Node's fs module writes the code into the project's files

Implementation steps


Initialize the OpenAI model

An important parameter here is temperature, whose value ranges from 0 to 1.

The lower the temperature, the more stable the generated content. We set it to 0: if the AI cannot find matching content to reply with, it will simply say "I don't know", which avoids AI nonsense and suits our scenario well.

If the scenario calls for divergent content, such as having the AI tell jokes or simulate a human customer-service agent, increase the temperature.
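What temperature actually does can be sketched with a softmax over a few candidate next tokens. The scores below are made up for illustration; the real model applies the same idea across its whole vocabulary, and a temperature of exactly 0 is special-cased by APIs as greedy decoding (dividing by 0 is undefined), so 0.2 stands in for "low" here:

```javascript
// Softmax with temperature over hypothetical token scores.
function softmax(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);              // subtract max for numeric stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1]; // made-up scores for three candidate tokens

// Low temperature → probability mass concentrates on the top token
// (stable, repeatable answers — our scenario).
console.log(softmax(logits, 0.2).map((p) => p.toFixed(3)));

// Higher temperature → flatter distribution (more divergent output,
// good for jokes or open-ended chat).
console.log(softmax(logits, 1.0).map((p) => p.toFixed(3)));
```

This is why low temperature suits code generation: we want the model to pick the single most supported answer rather than sample creatively.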

Initialize the headless browser

Puppeteer is commonly used in Node.js for web crawling; it can simulate all kinds of interactions and is very powerful.

Because web pages contain a lot of unnecessary information and tags, we first need to observe where the main content of the page lives.

Taking the Apifox documentation as an example, the content we want to crawl sits under the #main element, and the interface entries have classes starting with ui-tree-node.


If we need to crawl all the interfaces under announcement management, we can do fuzzy regex matching inside the Puppeteer function to pick out the corresponding HTML slices and then merge them.

Note: when crawling non-SSR (server-side rendered) pages, Puppeteer needs an appropriate wait time or a listener for a landmark element to appear; otherwise it cannot capture the desired content.
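The "wait for a landmark element" idea can be sketched without Puppeteer: poll until a condition holds or a timeout expires. Puppeteer's page.waitForSelector does the same thing against the live DOM; the simulated "DOM" below is just a flag that flips after a delay:

```javascript
// Poll a condition until it holds or a timeout expires.
async function waitFor(condition, { timeout = 2000, interval = 50 } = {}) {
  const start = Date.now();
  while (Date.now() - start < timeout) {
    if (await condition()) return true;
    await new Promise((resolve) => setTimeout(resolve, interval));
  }
  throw new Error("timed out waiting for condition");
}

// Simulate a client-side-rendered page whose content appears late.
let domReady = false;
setTimeout(() => { domReady = true; }, 200);

waitFor(() => domReady).then(() => console.log("landmark element appeared"));
```

Without such a wait, a crawler reads the page before client-side rendering finishes and sees an empty shell.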

Formatting HTML

Another commonly used HTML-processing library in Node.js is cheerio. Although it is not as capable as Puppeteer at crawling pages, it can perform all kinds of formatting operations on HTML.

To get more useful content, we use cheerio to filter the HTML and keep only the text.
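A rough stand-in for this formatting pass, without cheerio: strip tags (and the classes and ids inside them) and keep only the visible text. cheerio's selectors are far more robust than these regexes; this only shows what the step is for:

```javascript
// Reduce HTML to plain text for the LLM context window.
function htmlToText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, " ")   // drop inline styles
    .replace(/<[^>]+>/g, " ")                    // drop tags with their class/id attributes
    .replace(/\s+/g, " ")                        // collapse whitespace
    .trim();
}

const html =
  '<div id="main"><ul class="ui-tree-node"><li>get /api/v1/announce/announce_list</li>' +
  "<li>post /api/v1/announce/delete_announce</li></ul></div>";
console.log(htmlToText(html));
// "get /api/v1/announce/announce_list post /api/v1/announce/delete_announce"
```

Every tag and attribute removed here is a token saved, which matters when the model's context is only 4K tokens.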

Text Cutting + Vector Storage

Even with only the useful content retained, the crawled text may still exceed the 4K token limit. Therefore we call LangChain's APIs for text splitting + vector storage, and then extract the information we want in the retrieval Q&A chain.

Create the RetrievalQAChain (retrieval Q&A chain)

An important parameter here is verbose, which prints the flow and reasoning of the chain calls to the console; based on this information we can tune the prompt to get more accurate output.
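Conceptually, verbose makes each step of the chain log its intermediate result so you can see where a bad prompt goes wrong. The toy chain below is a hand-rolled illustration of that idea, not LangChain's implementation, and the "retrieve" and "answer" steps are fake:

```javascript
// A toy chain: run steps in order, logging each result when verbose.
function runChain(steps, input, { verbose = false } = {}) {
  let value = input;
  for (const [name, fn] of steps) {
    value = fn(value);
    if (verbose) console.log(`[chain] ${name} ->`, value);
  }
  return value;
}

const steps = [
  // Pretend retrieval: append the "found" context to the question.
  ["retrieve", (q) => `${q} | context: delete_announce uses POST`],
  // Pretend answering: read the answer out of the context.
  ["answer", (ctx) => (ctx.includes("POST") ? "POST" : "unknown")],
];

console.log(runChain(steps, "Which method does delete_announce use?", { verbose: true }));
// → "POST", with each intermediate step printed along the way
```

Seeing the intermediate value between "retrieve" and "answer" is exactly the debugging signal verbose gives you in the real chain.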

For example, our search query below is:

Summarize the following information:

1. The addresses of all interfaces in the text

2. The request parameters corresponding to the API

The corresponding result is as follows:

'The API addresses are: \n' +
'1. get/api/v1/announce/announce_list\n' +
'2. post/api/v1/announce/delete_announce\n' +
'3. post/api/v1/announce/update_announce\n' +
'4. post/api/v1/announce/add_announce\n' +
'\n' +
'The request parameters are: \n' +
'1. get/api/v1/announce/announce_list: announce_content, start_timestring, end_timestring, page, page_size\n' +
'2. post/api/v1/announce/delete_announce: id\n' +
'3. post/api/v1/announce/update_announce: id, announce_type, is_top, announce_content, publish_time\n' +
'4. post/api/v1/announce/add_announce: announce_type, is_top, announce_content, publish_time'           

As you can see, all the information we need has been summarized, but the output format is not ideal:

  • The descriptions are not in Chinese
  • It is not real code and cannot run as-is

We can use LangChain together with a prompt to initialize an LLM chain and reformat the output.
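Before calling the model, a prompt template chain substitutes variables like {formatContent} into the template. LangChain's PromptTemplate behaves along these lines; the function below is a hand-rolled stand-in, not its actual API:

```javascript
// Fill {name} placeholders in a prompt template from a vars object.
function fillTemplate(template, vars) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in vars ? vars[name] : match // leave unknown placeholders untouched
  );
}

const template =
  "Context:\n<article>{formatContent}</article>\n" +
  "Prompt: generate ts code by interface";

const prompt = fillTemplate(template, {
  formatContent: "get /api/v1/announce/announce_list",
});
console.log(prompt);
```

The filled-in string is what actually reaches the LLM, which is why the placement of the context block in the template matters so much.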

prompt

A good prompt should contain the following sections to clearly communicate the task requirements and the desired output:

  • Topic or task description

Clearly state the topic or task of the prompt so the model knows what needs to be done. For example: "translate Chinese to English", "generate a dialogue", etc.

  • Instructions

Provide clear guidance on how to complete the task. Instructions should be concise and avoid vague or ambiguous wording.

  • Examples

Provide a few examples of the desired inputs and outputs. Examples help the model understand the task and output requirements.

  • Constraints

Specify the task's constraints and conditions, which may cover output length, format, language, etc. Constraints help keep the answers in line with expectations.

  • End of prompt

Mark the end of the prompt with a specific flag or piece of text so the model knows where the instructions stop.

  • Context

This part is optional; if the result must be generated from context, place the context at the very beginning or the very end of the prompt, otherwise the quality of the answers drops sharply.

Here is one of the templates I wrote:

Context:
<article>{formatContent}</article>
Prompt: generate ts code by interface
Instructions:
You are a skilled front-end engineer, proficient in TypeScript.
Based on the given context, generate the corresponding code.
Ignore the "*" symbol in the context.
Examples:
Input: /api/v1/[xxx]
Output: {listRequest}
Input: /api/v1/[xxx]
Output: {updateRequest}
Input: /api/v1/[xxx]
Output: {listResponse}
Input: /api/v1/[xxx]
Output: {listResponse}
Input: /api/v1/[xxx]
Output: {deleteResponse}
Constraints:
Do not generate any content other than the code.
Do not generate duplicate [typescript interface].
Task:
Generate TypeScript declarations and corresponding request interfaces.
End of Prompt           

Final output

After invoking the LLM, the AI model will generate the following code according to my prompt:

typing.ts

export interface AddDictionaryRequest {
  name?: string;
  key?: string;
  val?: string;
}


export interface AddDictionaryResponse {
  id?: number;
}


export interface DeleteDictionaryRequest {
  ids: number[];
}


export interface DeleteDictionaryResponse {
  object: {};
}


export interface EditDictionaryRequest {
  id: number;
  name?: string;
  key?: string;
  val?: string;
}


export interface EditDictionaryResponse {
  id?: number;
}


export interface GetDictionaryRequest {
  id: number;
}


export interface GetDictionaryResponse {
  config?: {
    id?: number;
    name?: string;
    key?: string;
    val?: string;
    created_at?: string;
    updated_at?: string;
  };
}


export interface ListDictionaryRequest {
  key?: string;
  name?: string;
  page?: number;
  page_size?: number;
}


export interface ListDictionaryResponse {
  headers?: {
    label?: string;
    key?: string;
    sort?: boolean;
    tips?: string;
    merge?: boolean;
    mergeField?: string;
  }[];
  rows?: {};
  sums?: {};
  counts?: number;
}


index.ts

import type {
  AddDictionaryRequest,
  AddDictionaryResponse,
  DeleteDictionaryRequest,
  DeleteDictionaryResponse,
  EditDictionaryRequest,
  EditDictionaryResponse,
  GetDictionaryRequest,
  GetDictionaryResponse,
  ListDictionaryRequest,
  ListDictionaryResponse,
} from "./typing.ts";
// utils
import request from "@/utils/request";
/**
 * Add dictionary entry
 */
export async function addDictionary_api(
  params: AddDictionaryRequest
): Promise<AddDictionaryResponse> {
  return request.post("/api/waldon/test-dictionary/add", params);
}


/**
 * Delete dictionary entry
 */
export async function deleteDictionary_api(
  params: DeleteDictionaryRequest
): Promise<DeleteDictionaryResponse> {
  return request.post("/api/waldon/test-dictionary/delete", params);
}


/**
 * Edit dictionary entry
 */
export async function editDictionary_api(
  params: EditDictionaryRequest
): Promise<EditDictionaryResponse> {
  return request.post("/api/waldon/test-dictionary/edit", params);
}


/**
 * Get dictionary entry
 */
export async function getDictionary_api(
  params: GetDictionaryRequest
): Promise<GetDictionaryResponse> {
  return request.get("/api/waldon/test-dictionary/get", params);
}


/**
 * Dictionary list
 */
export async function listDictionary_api(
  params: ListDictionaryRequest
): Promise<ListDictionaryResponse> {
  return request.get("/api/waldon/test-dictionary/list", params);
}


Comparing this with the first output, you can see that every instruction in the prompt hit its mark!

The generated content is code only, with no superfluous descriptions, and the quality is high. This code quality and variable naming can beat 70% of handwritten code!

Once you have the desired result, you can use the Node.js fs API to write the text into project files.

03

Learning resources


Here are some tutorials and videos I found helpful while learning prompt engineering and LangChain:

  • Geek Time
    • time.geekbang.org/column/intr… courses on using AI, setting up AI roles, and writing good prompts
  • langchainjs
    • github.com/liaokongVFX… Chinese documentation (strongly recommended)
    • www.youtube.com/playlist?li… YouTube beginner tutorial
    • www.pinecone.io/learn/serie… relatively basic documentation
    • www.youtube.com/watch?v=9qq… comparison of search approaches
    • js.langchain.com/docs/api/ve… faiss usage documentation
  • openai
    • github.com/openai/open… AI-written unit tests
    • github.com/openai/open… OpenAI repository; focus on the examples
    • github.com/chatanywher… OpenAI accounts
    • github.com/openai/open… OpenAI official resources

04

Summary


LangChain is a great framework with many more features than shown here; this article only demonstrated two scenarios, reading local files and crawling web pages. Moreover, it is not tightly bound to any one company's AI model: even if OpenAI or other foreign models become unavailable later, they can be swapped for a domestic one. If you have a better way of learning this, you are welcome to share~

Author: Deqin

Source: WeChat public account "Sanqi Mutual Entertainment Technical Team"

Source: https://mp.weixin.qq.com/s/36LDqIlBsAHpoM7M5aBtjA