
Learn about LangChain

Author: Big data and artificial intelligence sharing

In day-to-day work, we often focus on building end-to-end applications. Many automated machine learning platforms and continuous integration/continuous delivery (CI/CD) pipelines can be used to automate our machine learning workflows. We also have tools such as Roboflow and Andrew Ng's Landing AI for automating or creating end-to-end computer vision applications.

If we wanted to create an application based on large language models from OpenAI or Hugging Face, we previously had to do it manually. Now, to achieve the same goal, we have two well-known libraries, Haystack and LangChain, which help us create end-to-end applications or pipelines based on large language models.


Let's take a closer look at LangChain.


What is LangChain?


LangChain is an innovative framework that is revolutionizing the way we develop applications driven by language models. By introducing advanced principles, LangChain is redefining the limits of what traditional APIs can achieve. In addition, LangChain applications have the characteristics of intelligent agents, allowing the language model to interact with and adapt to its environment.

LangChain consists of several modules. As its name suggests, the main purpose of LangChain is to chain these modules together: we can link each module to the others and use this chain structure to call all of them in one go.

These modules consist of the following parts:

Model

As discussed in the introduction, models here mainly refer to large language models (LLMs). A large language model is a neural network model with a large number of parameters, trained on large-scale unlabeled text. Tech giants have launched a variety of large language models, such as:

  • Google's BERT
  • OpenAI's GPT-3
  • Google LaMDA
  • Google PaLM
  • Meta AI's LLaMA
  • OpenAI's GPT-4
  • ……

With LangChain, interacting with large language models becomes easier. The interfaces and features provided by LangChain help easily integrate the power of LLM into your work applications. LangChain leverages the asyncio library to provide asynchronous support for LLM.

For network-bound scenarios that require concurrent calls to multiple LLMs at the same time, this asynchronous support is especially useful: by freeing the thread that handles the request, the server can assign it to other tasks until the response is ready, maximizing resource utilization.

Currently, LangChain provides asynchronous support for OpenAI, PromptLayerOpenAI, ChatOpenAI, and Anthropic models; asynchronous support for other LLMs is planned for the future. You can use the agenerate method to call an OpenAI LLM asynchronously. In addition, you can write custom LLM wrappers, so you are not limited to the models LangChain supports out of the box.
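For example, here is a minimal sketch of calling an OpenAI LLM concurrently with agenerate (assuming the OPENAI_API_KEY environment variable is already set; the prompt text is purely illustrative):

# Minimal sketch: firing off several generations concurrently with agenerate
# (assumes OPENAI_API_KEY is set; the prompt is illustrative)
import asyncio
from langchain.llms import OpenAI

async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [llm.agenerate(["Tell me a one-line joke"]) for _ in range(3)]
    results = await asyncio.gather(*tasks)
    for result in results:
        print(result.generations[0][0].text)

asyncio.run(generate_concurrently())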

I use OpenAI in my application and mostly use the Davinci, Babbage, Curie, and Ada models to solve my problems. Each model has its own benefits, token usage, and use cases.

For more information on these models, read:

https://subscription.packtpub.com/book/data/9781800563193/2/ch02lvl1sec07/introducing-davinci-babbage-curie-and-ada

Case 1:

# Importing modules 
from langchain.llms import OpenAI

#Here we are using text-ada-001 but you can change it 
llm = OpenAI(model_name="text-ada-001", n=2, best_of=2)

#Ask anything
llm("Tell me a joke")
           

Output 1:

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'           

Case 2:

llm_result = llm.generate(["Tell me a poem"]*15)           

Output 2:

[Generation(text="\n\nWhat if love neverspeech\n\nWhat if love never ended\n\nWhat if love was only a feeling\n\nI'll never know this love\n\nIt's not a feeling\n\nBut it's what we have for each other\n\nWe just know that love is something strong\n\nAnd we can't help but be happy\n\nWe just feel what love is for us\n\nAnd we love each other with all our heart\n\nWe just don't know how\n\nHow it will go\n\nBut we know that love is something strong\n\nAnd we'll always have each other\n\nIn our lives."),  
 Generation(text='\n\nOnce upon a time\n\nThere was a love so pure and true\n\nIt lasted for centuries\n\nAnd never became stale or dry\n\nIt was moving and alive\n\nAnd the heart of the love-ick\n\nIs still beating strong and true.')]
           

Prompt

As we all know, prompts are the inputs we provide to the system in order to tailor the answer precisely to our use case. Often we want more than just text back; we want more structured information. Many new object detection and classification algorithms based on contrastive pre-training and zero-shot learning use prompts as valid inputs for predicting outcomes. For example, OpenAI's CLIP and Meta's Grounding DINO both use prompts as input to their predictions.

In LangChain, we can set up prompt templates as needed and connect them with the main chain for output prediction. In addition, LangChain provides output parsers for further refining the results. The role of an output parser is to (1) guide the format of the model output, and (2) parse the output into the desired format (including retries if necessary).
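As a quick illustration, here is a minimal sketch of one of LangChain's built-in output parsers, CommaSeparatedListOutputParser, which both supplies format instructions for the prompt and parses the raw model text into a Python list (the example answer string is hypothetical):

# Minimal sketch of an output parser: it guides the output format and
# parses the raw model text into a structured Python object
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],
    partial_variables={"format_instructions": format_instructions},
)
# prompt.format(subject="ice cream flavors") is what would be sent to the model

# Suppose the model answered with a hypothetical comma-separated string:
parsed = output_parser.parse("vanilla, chocolate, strawberry, mango, pistachio")
print(parsed)  # -> ['vanilla', 'chocolate', 'strawberry', 'mango', 'pistachio']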

In LangChain, we can provide prompt templates as input. A template is the specific format or blueprint in which we want the answer. LangChain provides pre-designed prompt templates for generating prompts for different types of tasks. However, in some cases the preset templates may not meet your needs, in which case we can use a custom prompt template.

Case:

from langchain import PromptTemplate
# This template will act as a blue print for prompt

template = """
I want you to act as a naming consultant for new companies.
What is a good name for a company that makes {product}?
"""

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)
prompt.format(product="colorful socks")
# -> I want you to act as a naming consultant for new companies.
# -> What is a good name for a company that makes colorful socks?
           

Memory

In LangChain, chains and agents operate in stateless mode by default, i.e., they process each incoming query independently. However, in some applications, such as chatbots, it is important to keep a record of previous interactions, in both the short and long term. This is where the concept of "memory" comes in.

LangChain provides memory components in two forms. First, LangChain provides helper tools for managing and manipulating previous chat messages; these are designed to be modular and work well regardless of the use case. Second, LangChain provides an easy way to integrate these tools into a chain structure, making it very flexible and adaptable to various situations.

Case:

from langchain.memory import  ChatMessageHistory  
  
history  = ChatMessageHistory()  
history.add_user_message("hi!")  
  
history.add_ai_message("whats up?")  
history.messages           

Output:

[HumanMessage(content='hi!', additional_kwargs={}),  
 AIMessage(content='whats up?', additional_kwargs={})]
           

Chain

Chains provide a way to combine various components into a unified application. For example, you can create a chain that receives input from the user, formats it with a PromptTemplate, and then passes the formatted prompt to an LLM (large language model). More complex chain structures can be built by combining multiple chains with other components.

An LLMChain is considered the most common way to query an LLM object. It formats the provided input key values and memory key values (if present) according to the prompt template, sends the formatted string to the LLM, and returns the LLM's output.

After a language model is invoked, a series of subsequent steps can be taken, sequencing multiple model calls. This practice is especially valuable when you want to use the output of one call as the input to another. In such a sequential chain, each chain has an input and an output, and the output of one step serves as the input to the next.

#Here we are chaining everything
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

human_message_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(
        template="What is a good name for a company that makes {product}?",
        input_variables=["product"],
    )
)
chat_prompt_template = ChatPromptTemplate.from_messages([human_message_prompt])

# Temperature controls randomness: the higher the temperature, the more random the answer
chat = ChatOpenAI(temperature=0.9)

# Final chain
chain = LLMChain(llm=chat, prompt=chat_prompt_template)
print(chain.run("colorful socks"))
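
The sequential pattern described above can be sketched with SimpleSequentialChain, where the output of one LLMChain becomes the input of the next. This is only a sketch; the slogan prompt below is hypothetical and exists just to illustrate the chaining:

# Minimal sketch: two LLMChains in sequence, the first output feeds the second
from langchain.chains import SimpleSequentialChain, LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=0.7)

name_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["product"],
        template="What is a good name for a company that makes {product}?",
    ),
)
slogan_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["company_name"],
        template="Write a one-sentence slogan for the company {company_name}.",
    ),
)

overall_chain = SimpleSequentialChain(chains=[name_chain, slogan_chain], verbose=True)
print(overall_chain.run("colorful socks"))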
           

Agent

Some applications require not just a predetermined sequence of calls to LLMs or other tools, but a sequence that depends on the user's input and is not known in advance. Such a sequence involves an "agent" that has access to a variety of tools. Based on the user's input, the agent decides which, if any, of these tools to invoke and what input to pass to them.

According to the documentation, the high-level pseudocode for an agent looks roughly like this:

  1. Receive user input.
  2. Based on the input, the agent decides whether to use a tool and what the tool's input should be.
  3. Invoke the tool and record the observations (that is, the output from the call using the tool and input).
  4. The history of tools, tool inputs, and observations is passed back to the agent, which decides which steps it should take next.
  5. Repeat the previous steps until the agent decides that the tool is no longer needed, and then respond directly to the user.


Case:

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

tools = load_tools(["serpapi", "llm-math"], llm=llm)

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

agent.run("Who is Leo DiCaprio's girlfriend? What is her current age raised to the 0.43 power?")
           

Let's summarize everything in the following diagram.


Understanding all the modules and how they are chained together is important for building pipeline applications on top of large language models with LangChain. This is just a brief introduction to LangChain.


Practical application of LangChain


Without further ado, let's go straight to building simple applications using LangChain. One of the most interesting applications is creating a question answering bot on custom data.

Disclaimer/Warning: This code is only meant to show how the app is built. I do not guarantee that the code will be optimized, and further improvements may be required depending on the specific problem statement.

Start by importing modules

Import LangChain and OpenAI for the large language model section. If you haven't installed them yet, install them first.

#    IMPORTS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains import ConversationalRetrievalChain
from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS
from PyPDF2 import PdfReader
from langchain import OpenAI, VectorDBQA
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain

from langchain.document_loaders import TextLoader
# from langchain import ConversationalRetrievalChain
from langchain.chains.question_answering import load_qa_chain
from langchain import LLMChain
# from langchain import retrievers
import langchain
from langchain.chains.conversation.memory import ConversationBufferMemory           

PyPDF2 is a tool for reading and processing PDF files. In addition, there are different types of memory, such as ConversationBufferMemory and ConversationBufferWindowMemory, each with specific functions. I'll go into more detail about memory in the last section.
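As a quick preview, here is a brief sketch of how ConversationBufferMemory plugs into a ConversationChain so that earlier messages are carried into later turns (assuming the OpenAI API key is configured as shown in the next step):

# Brief sketch: ConversationBufferMemory keeps the chat history inside a chain
# (assumes the OpenAI API key has been set as shown in the next step)
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

conversation = ConversationChain(
    llm=OpenAI(temperature=0),
    memory=ConversationBufferMemory(),
    verbose=True,
)
conversation.predict(input="Hi, my name is Chinmay.")
print(conversation.predict(input="What is my name?"))  # recalled from the memory buffer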

Set up the environment

I think you know how to get the OpenAI API key, but I still want to clarify:

  1. Go to the OpenAI API page,
  2. Click "Create new secret key",
  3. That will be your API key. Paste it below.

import os  
os.environ["OPENAI_API_KEY"] = "sk-YOUR API KEY"

Which model to use? Davinci, Babbage, Curie or Ada? GPT-3, GPT-3.5, GPT-4? There are many questions about models, all of which are suitable for different tasks. Some models are less expensive and some are more accurate.

For simplicity, we will use the most affordable model, "gpt-3.5-turbo". Temperature is a parameter that affects the randomness of the answer: the higher the temperature value, the more random the answer we get.

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")

Here, you can add your own data. You can use any format, such as PDF, text, Word document, or CSV. Depending on your data format, uncomment or comment out the corresponding lines below.

# Custom data
from langchain.document_loaders import DirectoryLoader
pdf_loader = PdfReader(r'Your PDF location')

# excel_loader = DirectoryLoader('./Reports/', glob="**/*.txt")
# word_loader = DirectoryLoader('./Reports/', glob="**/*.docx")           

We can't feed all the data in at once. We split the data into chunks and send them off to create embeddings.

Embeddings are numeric vectors (arrays) that capture the substance and contextual information of the tokens the model processes and generates. These embeddings are derived from the parameters or weights of the model and are used to encode and decode input and output text.


This is how embeddings are created.

In simple terms, in LLM, Embedding is a way of representing text as a vector of numbers. This enables language models to understand the meaning of words and phrases and perform tasks such as text classification, summarization, and translation.

In layman's terms, Embedding is a way to convert words into numbers. This is achieved by training machine learning models on a large corpus of text. The model learns to associate each word with a unique vector of numbers. This vector represents the meaning of the word, as well as the relationship to other words.


Let's do exactly the same thing as shown in the image above.

#Preprocessing of file

raw_text = ''
for i, page in enumerate(pdf_loader.pages):
    text = page.extract_text()
    if text:
        raw_text += text

# print(raw_text[:100])


text_splitter = CharacterTextSplitter(        
    separator = "\n",
    chunk_size = 1000,
    chunk_overlap  = 200,
    length_function = len,
)
texts = text_splitter.split_text(raw_text)           

In a real-world scenario, when a user issues a query, a search is performed in the vector store and the most relevant indexes are retrieved and passed to the LLM. The LLM then reformulates the retrieved content to give the user a well-formatted answer.

I recommend digging further into the concepts of vector storage and Embedding to enhance your understanding.

embeddings = OpenAIEmbeddings()
# vectorstore = Chroma.from_documents(documents, embeddings)
vectorstore = FAISS.from_texts(texts, embeddings)           

Embedding vectors are stored directly in a vector database. There are many vector databases available, such as Pinecone, FAISS, etc. Here, we will use FAISS.
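To see what retrieval looks like on its own, a quick sketch against the FAISS store created above might look like this (the query string is only illustrative):

# Quick sketch: embed the query and pull back the most similar chunks
docs = vectorstore.similarity_search("What is this document about?", k=3)
for doc in docs:
    print(doc.page_content[:200])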

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say GTGTGTGTGTGTGTGTGTG, don't try to make up an answer.
{context}
Question: {question}
Helpful Answer:"""
QA_PROMPT = PromptTemplate(
    template=prompt_template, input_variables=['context',"question"]
)           

You can use your own prompts to refine the query and the answer. After writing the prompt, let's link it with the final chain.

Let's call the last chain, which brings together everything linked so far. We use ConversationalRetrievalChain here. It helps us have conversations in a human-like way and remembers previous chats.

qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(temperature=0.8), vectorstore.as_retriever(),qa_prompt=QA_PROMPT)           

We'll use simple Gradio to create a web application. You can choose to use Streamlit or other front-end technologies. In addition, there are many free deployment options to choose from, such as deploying to Hugging Face or localhosting, which we can do later.

# Front end web app
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("## Grounding DINO ChatBot")
    chatbot = gr.Chatbot()
    msg = gr.Textbox()
    clear = gr.Button("Clear")
    chat_history = []

    def user(user_message, history):
        print("Type of user msg:", type(user_message))
        # Get response from QA chain
        response = qa({"question": user_message, "chat_history": history})
        # Append user message and response to chat history
        history.append((user_message, response["answer"]))
        print(history)
        return gr.update(value=""), history

    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False)
    clear.click(lambda: None, None, chatbot, queue=False)

if __name__ == "__main__":
    demo.launch(debug=True)

This code will create a link locally where you can ask questions and see the answers. At the same time, in your integrated development environment (IDE), you will see the maintenance of your chat history.


LangChain snapshot

This is a simple introduction that shows how to create the final chain by connecting different modules. By tweaking modules and code, you can implement many different features. I would say that play is the highest form of research!


LangChain's tokens and models


Token

Tokens can be thought of as pieces of words. Before processing the prompt, the API splits the input into tokens. A token does not necessarily begin or end exactly at a word boundary; it may also include trailing spaces or even subwords.

In natural language processing, we usually apply a tokenizer to split a paragraph into sentences or words. Here, too, sentences and paragraphs are broken into small chunks made up of word pieces.


The image above shows how to split text into tokens. Different colors represent different tokens. As a rule of thumb, a token is equivalent to approximately 4 characters in common English text. This means that 100 tokens are equivalent to approximately 75 words.

If you want to check the number of tokens for a specific text, you can check it directly on OpenAI's Tokenizer.

Another way to count tokens is to use the Tiktoken library.

import tiktoken

# Write a function that takes a string and a model name and returns the number of tokens
def num_tokens_from_string(string: str, model_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

Finally, use the above function:

# `data` is assumed to be a list of dicts with 'prompt' and 'completion' keys
prompt = []
for i in data:
    prompt.append(num_tokens_from_string(i['prompt'], "davinci"))

completion = []
for j in data:
    completion.append(num_tokens_from_string(j['completion'], "davinci"))

res_list = []
for i in range(0, len(prompt)):
    res_list.append(prompt[i] + completion[i])

no_of_final_token = 0
for i in res_list:
    no_of_final_token += i
print("Number of final token", no_of_final_token)
           

Output:

Number of final token 2094           

The choice of different models is affected by the number of tokens.

First, let's learn about the different models offered by OpenAI. In this blog, I focus on OpenAI models; we can also use models from Hugging Face and Cohere.

Let's start with the basic model.

Model

GPT is powerful because it is trained on large datasets. However, great power comes with a price, so OpenAI offers several models to choose from, also known as engines.

Davinci is the largest and most capable engine: it can perform all the tasks that the other engines can. Babbage is the next most capable engine; it can perform the tasks that Curie and Ada can. Ada is the least capable engine, but it is the fastest and has the lowest price.

As GPT continues to evolve, many different versions of the models are available to choose from: there are more than 50 models in the GPT series.


Screenshot from OpenAI's official model page

There are therefore different models for different purposes, including generating and editing images, processing audio, and writing code. For text processing and natural language processing, we want to choose a model that performs the task accurately. In the image above, we can see three available model families:

  • GPT-3
  • GPT-3.5
  • GPT-4

However, we cannot use GPT-4 directly at this time, as it is still in a limited testing phase and only available to certain authorized users. We need to join the waiting list and wait for access. So, for now, we are left with two options: GPT-3 and GPT-3.5.


Screenshot from OpenAI's official model page

The figure above shows the models available for GPT-3 and GPT-3.5. You can see that these models are based on different versions of Davinci, Babbage, Curie, and Ada.

If you look at the chart above, you'll see a column called "Max tokens". "Max tokens" is the OpenAI model parameter that limits the number of tokens that can be processed in a single request; the limit covers both the prompt tokens and the completion tokens.

In other words, if your prompt occupies 1,000 tokens, you can only generate up to roughly 3,000 tokens of completion text. In addition, the "Max tokens" limit is enforced by OpenAI's servers: if you try to generate text that exceeds the limit, your request will be rejected.

GPT-3-based models have a lower "Max tokens" value (2,049), while GPT-3.5-based models have a higher value (4,096). Therefore, GPT-3.5 models can handle larger amounts of data.
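To make the token budget concrete, here is an illustrative calculation with a hypothetical 1,000-token prompt and the two limits mentioned above:

# Illustrative arithmetic only: remaining completion budget for a
# hypothetical 1,000-token prompt under each model's "Max tokens" limit
MAX_TOKENS = {"text-ada-001": 2049, "gpt-3.5-turbo": 4096}

prompt_tokens = 1000  # hypothetical prompt length
for model, limit in MAX_TOKENS.items():
    print(f"{model}: up to {limit - prompt_tokens} completion tokens remain")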

Next, let's take a look at pricing for different models.


We can choose the "gpt-3.5-turbo" model, which is based on GPT-3.5.

Let's say I have 5000 words and I use the "gpt-3.5-turbo" model, then:

5000 words equals approximately 6667 tokens.

Now, for 1000 tokens, we need $0.002.

So, for 6667 tokens, we need about $0.0133.

We can roughly calculate how much our processing will cost. At the same time, the number of iterations changes the number of tokens, so this also needs to be considered in the calculation.
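As a sanity check, the worked example above can be reproduced with a few lines of arithmetic (assuming roughly 0.75 words per token and the $0.002 per 1,000 tokens price quoted here):

# Rough cost estimate for the worked example above
words = 5000
tokens = words / 0.75             # ~6,667 tokens
cost = (tokens / 1000) * 0.002    # ~$0.0133
print(f"{tokens:.0f} tokens -> ${cost:.4f}")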

Now you can understand the importance of tokens. This is why we have to do very clean and proper preprocessing to reduce noise in documents while also reducing the cost of processing tokens. It is therefore very important to clean up the text properly, for example by removing noise; even removing extra spaces can save money on your API usage.

Let's view all the models in one mind map.


Summary


Tokens are essential for question answering and other LLM-related tasks. Preprocessing data in a way that lets you use cheaper models is a real game-changer. The choice of model depends on the trade-offs you wish to make: the Davinci family will serve you with greater speed and accuracy, but at a higher cost, while a GPT-3.5 Turbo-based model will save money, but will be slower.

Written by Chinmay Bhalerao

Source: Distributed Labs

Original: https://pub.towardsai.net/tokens-and-models-understanding-langchain-%EF%B8%8F-part-3-e471aececf19