
LangChain Handbook: Models Module - Generic Functionality

Author: AI讓生活更美好

This section of the documentation covers the different types of models used in LangChain. On this page we introduce the model types at a high level, but there is a separate page for each model type. Those pages contain more detailed "how-to" guides for working with that model, as well as a list of different model providers.

Language Models

Large Language Models (LLMs) are the first type of model we cover. These models take a text string as input and return a text string as output.

Chat Models

Chat models are the second type of model we cover. These are usually backed by a language model, but their APIs are more structured. Specifically, these models take a list of chat messages as input and return a chat message.
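For example, calling a chat model means passing in a list of messages (for instance a SystemMessage followed by a HumanMessage) and getting back an AIMessage. A minimal sketch, assuming an OpenAI API key is configured in the environment:

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI()
# The input is a list of chat messages; the output is a single AIMessage.
chat([
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="say hi!"),
])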

Text Embedding Models

The third type of model we cover is the text embedding model. These models take text as input and return a list of floats.
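A minimal sketch of this interface, using the OpenAI embeddings wrapper and assuming an OpenAI API key is configured in the environment:

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
# embed_query returns a single list of floats; embed_documents returns one list per input text.
vector = embeddings.embed_query("Hello world")
len(vector)  # the length depends on the embedding model used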

Getting Started

One of the core value propositions of LangChain is that it provides a standard interface to models. This makes it easy to swap between models. At a high level, there are two main types of models:

  • Language models: good for text generation
  • Text embedding models: good for turning text into a numerical representation

Language Models

Language models come in two different flavors:

  • LLMs: these wrap APIs which take text in and return text
  • ChatModels: these wrap models which take chat messages in and return a chat message

This is a subtle difference, but one of LangChain's value propositions is that we provide a unified interface across both. This is nice because although the underlying APIs are actually quite different, you often want to use them interchangeably.

To see this, let's look at OpenAI (a wrapper around OpenAI's LLM) and ChatOpenAI (a wrapper around OpenAI's ChatModel).

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
           
llm = OpenAI()
           
chat_model = ChatOpenAI()
           

The text -> text interface

llm.predict("say hi!")

'\n\nHi there!'

chat_model.predict("say hi!")

'Hello there!'

The messages -> message interface

from langchain.schema import HumanMessage

llm.predict_messages([HumanMessage(content="say hi!")])

AIMessage(content='\n\nHello! Nice to meet you!', additional_kwargs={}, example=False)

chat_model.predict_messages([HumanMessage(content="say hi!")])

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={}, example=False)

LLMs

Large Language Models (LLMs) are a core component of LangChain. LangChain is not a provider of LLMs, but rather provides a standard interface through which you can interact with a variety of LLMs.

The following sections of documentation are provided:

  • Getting Started: An overview of all the functionality the LangChain LLM class provides.
  • How-To Guides: A collection of how-to guides. These highlight how to accomplish various objectives with our LLM class (streaming, async, etc.).
  • Integrations: A collection of examples on how to integrate different LLM providers with LangChain (OpenAI, Hugging Face, etc.).
  • Reference: API reference documentation for all LLM classes.

Getting Started

This notebook goes over how to use the LLM class in LangChain.

The LLM class is designed as a standard interface to LLM providers. There are many providers (OpenAI, Cohere, Hugging Face, etc.), and this class is meant to expose a standard interface for all of them. In this part of the documentation we focus on generic LLM functionality. For details on working with a specific LLM wrapper, please see the examples in the How-To section.

For this notebook, we will work with the OpenAI LLM wrapper, although the functionality highlighted is generic to all LLM types.

from langchain.llms import OpenAI
           
llm = OpenAI(model_name="text-ada-001", n=2, best_of=2)
           

Generate text: The most basic functionality of an LLM is simply calling it, passing in a string and getting back a string.

llm("Tell me a joke")
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
           

Generate: More broadly, you can call it with a list of inputs and get back a more complete response than just the text. This complete response includes multiple top responses as well as LLM-provider-specific information.

llm_result = llm.generate(["Tell me a joke", "Tell me a poem"]*15)
           
len(llm_result.generations)
           
30
           
llm_result.generations[0]
           
[Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'),
 Generation(text='\n\nWhy did the chicken cross the road?\n\nTo get to the other side.')]
           
llm_result.generations[-1]
           
[Generation(text="\n\nWhat if love neverspeech\n\nWhat if love never ended\n\nWhat if love was only a feeling\n\nI'll never know this love\n\nIt's not a feeling\n\nBut it's what we have for each other\n\nWe just know that love is something strong\n\nAnd we can't help but be happy\n\nWe just feel what love is for us\n\nAnd we love each other with all our heart\n\nWe just don't know how\n\nHow it will go\n\nBut we know that love is something strong\n\nAnd we'll always have each other\n\nIn our lives."),
 Generation(text='\n\nOnce upon a time\n\nThere was a love so pure and true\n\nIt lasted for centuries\n\nAnd never became stale or dry\n\nIt was moving and alive\n\nAnd the heart of the love-ick\n\nIs still beating strong and true.')]
           

You can also access the provider-specific information that is returned. This information is not standardized across providers.

llm_result.llm_output
           
{'token_usage': {'completion_tokens': 3903,
  'total_tokens': 4023,
  'prompt_tokens': 120}}
           

Number of tokens: You can also estimate how many tokens a piece of text will be for a given model. This is useful because models have a context length (and cost more for more tokens), which means you need to be aware of how long the text you are passing in is.

Note that by default the tokens are estimated using tiktoken (except for legacy Python versions < 3.8, where a Hugging Face tokenizer is used).

llm.get_num_tokens("what a joke")
           
3           
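For comparison, a rough sketch of making a similar estimate with tiktoken directly (assuming the tiktoken package is installed; exact counts depend on which tokenizer a given model uses):

import tiktoken

# Use the GPT-2 encoding as a rough, model-agnostic estimate.
enc = tiktoken.get_encoding("gpt2")
len(enc.encode("what a joke"))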

Generic Functionality

The examples here are all "how-to" guides for working with LLMs.

  • How to use the async API for LLMs
  • How to write a custom LLM wrapper
  • How (and why) to use the fake LLM
  • How (and why) to use the human input LLM
  • How to cache LLM calls
  • How to serialize LLM classes
  • How to stream LLM and Chat Model responses
  • How to track token usage

How to use the async API for LLMs

LangChain provides async support for LLMs by leveraging the asyncio library.

Async support is particularly useful for calling multiple LLMs concurrently, as these calls are network-bound. Currently OpenAI, PromptLayerOpenAI, ChatOpenAI, and Anthropic are supported, but async support for other LLMs is on the roadmap.

You can use the agenerate method to call an OpenAI LLM asynchronously.

import time
import asyncio

from langchain.llms import OpenAI

def generate_serially():
    llm = OpenAI(temperature=0.9)
    for _ in range(10):
        resp = llm.generate(["Hello, how are you?"])
        print(resp.generations[0][0].text)


async def async_generate(llm):
    resp = await llm.agenerate(["Hello, how are you?"])
    print(resp.generations[0][0].text)


async def generate_concurrently():
    llm = OpenAI(temperature=0.9)
    tasks = [async_generate(llm) for _ in range(10)]
    await asyncio.gather(*tasks)


s = time.perf_counter()
# If running this outside of Jupyter, use asyncio.run(generate_concurrently())
await generate_concurrently() 
elapsed = time.perf_counter() - s
print('\033[1m' + f"Concurrent executed in {elapsed:0.2f} seconds." + '\033[0m')

s = time.perf_counter()
generate_serially()
elapsed = time.perf_counter() - s
print('\033[1m' + f"Serial executed in {elapsed:0.2f} seconds." + '\033[0m')
           
I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, how about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about yourself?


I'm doing well, thank you! How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you! How about you?


I'm doing well, thank you. How about you?
Concurrent executed in 1.39 seconds.


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?

I'm doing well, thank you. How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about yourself?


I'm doing well, thanks for asking. How about you?


I'm doing well, thanks! How about you?


I'm doing well, thank you. How about you?


I'm doing well, thank you. How about yourself?


I'm doing well, thanks for asking. How about you?
Serial executed in 5.77 seconds.           

How to write a custom LLM wrapper

This notebook goes over how to create a custom LLM wrapper, in case you want to use your own LLM or a wrapper other than the ones supported in LangChain.

There is only one required thing that a custom LLM needs to implement:

  1. A _call method that takes in a string and some optional stop words, and returns a string

There is a second optional thing it can implement:

  1. An _identifying_params property that is used to help with printing of this class. It should return a dictionary.

Let's implement a very simple custom LLM that just returns the first n characters of the input.

from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
           
class CustomLLM(LLM):
    
    n: int
        
    @property
    def _llm_type(self) -> str:
        return "custom"
    
    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")
        return prompt[:self.n]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"n": self.n}
           

We can now use this as we would any other LLM.

llm = CustomLLM(n=10)
           
llm("This is a foobar thing")
           
'This is a '
           

We can also print the LLM and see its custom print output.

print(llm)
           
CustomLLM
Params: {'n': 10}           

How (and why) to use the fake LLM

We expose a fake LLM class that can be used for testing. This allows you to mock out calls to the LLM and simulate what would happen if the LLM responded in a certain way.

In this notebook we go over how to use this.

We start by using the FakeLLM in an agent.

from langchain.llms.fake import FakeListLLM
           
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
           
tools = load_tools(["python_repl"])
           
responses=[
    "Action: Python REPL\nAction Input: print(2 + 2)",
    "Final Answer: 4"
]
llm = FakeListLLM(responses=responses)
           
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
           
agent.run("whats 2 + 2")
           
> Entering new AgentExecutor chain...
Action: Python REPL
Action Input: print(2 + 2)
Observation: 4

Thought:Final Answer: 4

> Finished chain.
           
'4'           

How (and why) to use the human input LLM

Similar to the fake LLM, LangChain provides a pseudo LLM class that can be used for testing, debugging, or educational purposes. This allows you to mock out calls to the LLM and simulate how a human would respond if they received the prompts.

In this notebook we go over how to use this.

We start by using the HumanInputLLM in an agent.

from langchain.llms.human import HumanInputLLM
           
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
           
tools = load_tools(["wikipedia"])
llm = HumanInputLLM(prompt_func=lambda prompt: print(f"\n===PROMPT====\n{prompt}\n=====END OF PROMPT======"))
           
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
           
agent.run("What is 'Bocchi the Rock!'?")
           
> Entering new AgentExecutor chain...

===PROMPT====
Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, historical events, or other subjects. Input should be a search query.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: What is 'Bocchi the Rock!'?
Thought:
=====END OF PROMPT======
I need to use a tool.
Action: Wikipedia
Action Input: Bocchi the Rock!, Japanese four-panel manga and anime series.
Observation: Page: Bocchi the Rock!
Summary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Bocchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.
An anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.

Page: Manga Time Kirara
Summary: Manga Time Kirara (まんがタイムきらら, Manga Taimu Kirara) is a Japanese seinen manga magazine published by Houbunsha which mainly serializes four-panel manga. The magazine is sold on the ninth of each month and was first published as a special edition of Manga Time, another Houbunsha magazine, on May 17, 2002. Characters from this magazine have appeared in a crossover role-playing game called Kirara Fantasia.

Page: Manga Time Kirara Max
Summary: Manga Time Kirara Max (まんがタイムきららMAX) is a Japanese four-panel seinen manga magazine published by Houbunsha. It is the third magazine of the "Kirara" series, after "Manga Time Kirara" and "Manga Time Kirara Carat". The first issue was released on September 29, 2004. Currently the magazine is released on the 19th of each month.
Thought:
===PROMPT====
Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, historical events, or other subjects. Input should be a search query.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: What is 'Bocchi the Rock!'?
Thought:I need to use a tool.
Action: Wikipedia
Action Input: Bocchi the Rock!, Japanese four-panel manga and anime series.
Observation: Page: Bocchi the Rock!
Summary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Bocchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.
An anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.

Page: Manga Time Kirara
Summary: Manga Time Kirara (まんがタイムきらら, Manga Taimu Kirara) is a Japanese seinen manga magazine published by Houbunsha which mainly serializes four-panel manga. The magazine is sold on the ninth of each month and was first published as a special edition of Manga Time, another Houbunsha magazine, on May 17, 2002. Characters from this magazine have appeared in a crossover role-playing game called Kirara Fantasia.

Page: Manga Time Kirara Max
Summary: Manga Time Kirara Max (まんがタイムきららMAX) is a Japanese four-panel seinen manga magazine published by Houbunsha. It is the third magazine of the "Kirara" series, after "Manga Time Kirara" and "Manga Time Kirara Carat". The first issue was released on September 29, 2004. Currently the magazine is released on the 19th of each month.
Thought:
=====END OF PROMPT======
These are not relevant articles.
Action: Wikipedia
Action Input: Bocchi the Rock!, Japanese four-panel manga series written and illustrated by Aki Hamaji.
Observation: Page: Bocchi the Rock!
Summary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Bocchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.
An anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.
Thought:
===PROMPT====
Answer the following questions as best you can. You have access to the following tools:

Wikipedia: A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, historical events, or other subjects. Input should be a search query.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [Wikipedia]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: What is 'Bocchi the Rock!'?
Thought:I need to use a tool.
Action: Wikipedia
Action Input: Bocchi the Rock!, Japanese four-panel manga and anime series.
Observation: Page: Bocchi the Rock!
Summary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Bocchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.
An anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.

Page: Manga Time Kirara
Summary: Manga Time Kirara (まんがタイムきらら, Manga Taimu Kirara) is a Japanese seinen manga magazine published by Houbunsha which mainly serializes four-panel manga. The magazine is sold on the ninth of each month and was first published as a special edition of Manga Time, another Houbunsha magazine, on May 17, 2002. Characters from this magazine have appeared in a crossover role-playing game called Kirara Fantasia.

Page: Manga Time Kirara Max
Summary: Manga Time Kirara Max (まんがタイムきららMAX) is a Japanese four-panel seinen manga magazine published by Houbunsha. It is the third magazine of the "Kirara" series, after "Manga Time Kirara" and "Manga Time Kirara Carat". The first issue was released on September 29, 2004. Currently the magazine is released on the 19th of each month.
Thought:These are not relevant articles.
Action: Wikipedia
Action Input: Bocchi the Rock!, Japanese four-panel manga series written and illustrated by Aki Hamaji.
Observation: Page: Bocchi the Rock!
Summary: Bocchi the Rock! (ぼっち・ざ・ろっく!, Bocchi Za Rokku!) is a Japanese four-panel manga series written and illustrated by Aki Hamaji. It has been serialized in Houbunsha's seinen manga magazine Manga Time Kirara Max since December 2017. Its chapters have been collected in five tankōbon volumes as of November 2022.
An anime television series adaptation produced by CloverWorks aired from October to December 2022. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.
Thought:
=====END OF PROMPT======
It worked.
Final Answer: Bocchi the Rock! is a four-panel manga series and anime television series. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim.

> Finished chain.
           
"Bocchi the Rock! is a four-panel manga series and anime television series. The series has been praised for its writing, comedy, characters, and depiction of social anxiety, with the anime's visual creativity receiving acclaim."           

How to cache LLM calls

This notebook covers how to cache the results of individual LLM calls.

from langchain.llms import OpenAI
           

In Memory Cache

import langchain
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

# To make the caching really obvious, lets use a slower model.
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

CPU times: user 35.9 ms, sys: 28.6 ms, total: 64.6 ms
Wall time: 4.83 s

"\n\nWhy couldn't the bicycle stand up by itself? It was...two tired!"

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

CPU times: user 238 µs, sys: 143 µs, total: 381 µs
Wall time: 1.76 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

SQLite Cache

!rm .langchain.db

# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
langchain.llm_cache = SQLiteCache(database_path=".langchain.db")

%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")

CPU times: user 17 ms, sys: 9.76 ms, total: 26.7 ms
Wall time: 825 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

%%time
# The second time it is, so it goes faster
llm("Tell me a joke")

CPU times: user 2.46 ms, sys: 1.23 ms, total: 3.7 ms
Wall time: 2.67 ms

'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'

Redis Cache

Standard Cache

Use Redis to cache prompts and responses.

# We can do the same thing with a Redis cache
# (make sure your local Redis instance is running first before running this example)
from redis import Redis
from langchain.cache import RedisCache

langchain.llm_cache = RedisCache(redis_=Redis())
           
%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
           
CPU times: user 6.88 ms, sys: 8.75 ms, total: 15.6 ms
Wall time: 1.04 s
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
           
%%time
# The second time it is, so it goes faster
llm("Tell me a joke")
           
CPU times: user 1.59 ms, sys: 610 µs, total: 2.2 ms
Wall time: 5.58 ms
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
           

Semantic Cache

Use Redis to cache prompts and responses, and evaluate hits based on semantic similarity.

from langchain.embeddings import OpenAIEmbeddings
from langchain.cache import RedisSemanticCache


langchain.llm_cache = RedisSemanticCache(
    redis_url="redis://localhost:6379",
    embedding=OpenAIEmbeddings()
)
           
%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
           
CPU times: user 351 ms, sys: 156 ms, total: 507 ms
Wall time: 3.37 s
           
"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."
           
%%time
# The second time, while not a direct hit, the question is semantically similar to the original question,
# so it uses the cached result!
llm("Tell me one joke")
           
CPU times: user 6.25 ms, sys: 2.72 ms, total: 8.97 ms
Wall time: 262 ms
           
"\n\nWhy don't scientists trust atoms?\nBecause they make up everything."
           

GPTCache

We can use GPTCache for exact-match caching, or to cache results based on semantic similarity.

Let's first start with an example of exact match.

from gptcache import Cache
from gptcache.manager.factory import manager_factory
from gptcache.processor.pre import get_prompt
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    cache_obj.init(
        pre_embedding_func=get_prompt,
        data_manager=manager_factory(manager="map", data_dir=f"map_cache_{hashed_llm}"),
    )

langchain.llm_cache = GPTCache(init_gptcache)
           
%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
           
CPU times: user 21.5 ms, sys: 21.3 ms, total: 42.8 ms
Wall time: 6.2 s
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
           
%%time
# The second time it is, so it goes faster
llm("Tell me a joke")
           
CPU times: user 571 µs, sys: 43 µs, total: 614 µs
Wall time: 635 µs
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
           

Now let's show an example of similarity caching.

from gptcache import Cache
from gptcache.adapter.api import init_similar_cache
from langchain.cache import GPTCache
import hashlib

def get_hashed_name(name):
    return hashlib.sha256(name.encode()).hexdigest()

def init_gptcache(cache_obj: Cache, llm: str):
    hashed_llm = get_hashed_name(llm)
    init_similar_cache(cache_obj=cache_obj, data_dir=f"similar_cache_{hashed_llm}")

langchain.llm_cache = GPTCache(init_gptcache)
           
%%time
# The first time, it is not yet in cache, so it should take longer
llm("Tell me a joke")
           
CPU times: user 1.42 s, sys: 279 ms, total: 1.7 s
Wall time: 8.44 s
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
           
%%time
# This is an exact match, so it finds it in the cache
llm("Tell me a joke")
           
CPU times: user 866 ms, sys: 20 ms, total: 886 ms
Wall time: 226 ms
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
           
%%time
# This is not an exact match, but semantically within distance so it hits!
llm("Tell me joke")
           
CPU times: user 853 ms, sys: 14.8 ms, total: 868 ms
Wall time: 224 ms
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side.'
           

SQLAlchemy Cache

# You can use SQLAlchemyCache to cache with any SQL database supported by SQLAlchemy.

# from langchain.cache import SQLAlchemyCache
# from sqlalchemy import create_engine

# engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
# langchain.llm_cache = SQLAlchemyCache(engine)
           

Custom SQLAlchemy Schemas

# You can define your own declarative SQLAlchemyCache child class to customize the schema used for caching.
# For example, to support high-speed fulltext prompt indexing with Postgres, use:

from sqlalchemy import Column, Integer, String, Computed, Index, Sequence
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy_utils import TSVectorType
from langchain.cache import SQLAlchemyCache

Base = declarative_base()


class FulltextLLMCache(Base):  # type: ignore
    """Postgres table for fulltext-indexed LLM Cache"""

    __tablename__ = "llm_cache_fulltext"
    id = Column(Integer, Sequence('cache_id'), primary_key=True)
    prompt = Column(String, nullable=False)
    llm = Column(String, nullable=False)
    idx = Column(Integer)
    response = Column(String)
    prompt_tsv = Column(TSVectorType(), Computed("to_tsvector('english', llm || ' ' || prompt)", persisted=True))
    __table_args__ = (
        Index("idx_fulltext_prompt_tsv", prompt_tsv, postgresql_using="gin"),
    )


engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")
langchain.llm_cache = SQLAlchemyCache(engine, FulltextLLMCache)

Optional Caching

You can also turn off caching for specific LLMs if you like. In the example below, even though global caching is enabled, we turn it off for a specific LLM.

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2, cache=False)
           
%%time
llm("Tell me a joke")
           
CPU times: user 5.8 ms, sys: 2.71 ms, total: 8.51 ms
Wall time: 745 ms
           
'\n\nWhy did the chicken cross the road?\n\nTo get to the other side!'
           
%%time
llm("Tell me a joke")
           
CPU times: user 4.91 ms, sys: 2.64 ms, total: 7.55 ms
Wall time: 623 ms
           
'\n\nTwo guys stole a calendar. They got six months each.'
           

Optional Caching in Chains

You can also turn off caching for particular nodes in chains. Note that, because of certain interfaces, it is often easier to construct the chain first and then edit the LLM afterwards.

As an example, we will load a summarizer map-reduce chain. We will cache results for the map step, but not for the combine step.

llm = OpenAI(model_name="text-davinci-002")
no_cache_llm = OpenAI(model_name="text-davinci-002", cache=False)
           
from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.mapreduce import MapReduceChain

text_splitter = CharacterTextSplitter()
           
with open('../../../state_of_the_union.txt') as f:
    state_of_the_union = f.read()
texts = text_splitter.split_text(state_of_the_union)
           
from langchain.docstore.document import Document
docs = [Document(page_content=t) for t in texts[:3]]
from langchain.chains.summarize import load_summarize_chain
           
chain = load_summarize_chain(llm, chain_type="map_reduce", reduce_llm=no_cache_llm)
           
%%time
chain.run(docs)
           
CPU times: user 452 ms, sys: 60.3 ms, total: 512 ms
Wall time: 5.09 s
           
'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure. In response to Russian aggression in Ukraine, the United States is joining with European allies to impose sanctions and isolate Russia. American forces are being mobilized to protect NATO countries in the event that Putin decides to keep moving west. The Ukrainians are bravely fighting back, but the next few weeks will be hard for them. Putin will pay a high price for his actions in the long run. Americans should not be alarmed, as the United States is taking action to protect its interests and allies.'
           

When we run it again, we see that it runs substantially faster but the final answer is different. This is due to caching at the map steps but not at the reduce step.

%%time
chain.run(docs)
           
CPU times: user 11.5 ms, sys: 4.33 ms, total: 15.8 ms
Wall time: 1.04 s
           
'\n\nPresident Biden is discussing the American Rescue Plan and the Bipartisan Infrastructure Law, which will create jobs and help Americans. He also talks about his vision for America, which includes investing in education and infrastructure.'
           
!rm .langchain.db sqlite.db           

How to serialize LLM classes

This notebook goes over how to write and read an LLM configuration to and from disk. This is useful if you want to save the configuration for a given LLM (e.g., the provider, the temperature, etc.).

from langchain.llms import OpenAI
from langchain.llms.loading import load_llm
           

Loading

First, let's go over loading an LLM from disk. LLMs can be saved on disk in two formats: json or yaml. No matter the extension, they are loaded in the same way.

!cat llm.json
           
{
    "model_name": "text-davinci-003",
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "n": 1,
    "best_of": 1,
    "request_timeout": null,
    "_type": "openai"
}
           
llm = load_llm("llm.json")
           
!cat llm.yaml
           
_type: openai
best_of: 1
frequency_penalty: 0.0
max_tokens: 256
model_name: text-davinci-003
n: 1
presence_penalty: 0.0
request_timeout: null
temperature: 0.7
top_p: 1.0
           
llm = load_llm("llm.yaml")
           

Saving

If you want to go from an LLM in memory to a serialized version of it, you can do so easily by calling the .save method. Again, this supports both json and yaml.

llm.save("llm.json")
           
llm.save("llm.yaml")           

How to stream LLM and Chat Model responses

LangChain provides streaming support for LLMs. Currently, we support streaming for the OpenAI, ChatOpenAI, and ChatAnthropic implementations, but streaming support for other LLM implementations is on the roadmap. To utilize streaming, use a CallbackHandler that implements on_llm_new_token. In this example, we use StreamingStdOutCallbackHandler (a sketch of a custom handler follows the first example below).

from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI, ChatAnthropic
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage
           
llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = llm("Write me a song about sparkling water.")
           
Verse 1
I'm sippin' on sparkling water,
It's so refreshing and light,
It's the perfect way to quench my thirst
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 2
I'm sippin' on sparkling water,
It's so bubbly and bright,
It's the perfect way to cool me down
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.

Verse 3
I'm sippin' on sparkling water,
It's so light and so clear,
It's the perfect way to keep me cool
On a hot summer night.

Chorus
Sparkling water, sparkling water,
It's the best way to stay hydrated,
It's so crisp and so clean,
It's the perfect way to stay refreshed.
           

We still have access to the final LLMResult if we use generate. However, token_usage is not currently supported for streaming.

llm.generate(["Tell me a joke."])
           
Q: What did the fish say when it hit the wall?
A: Dam!
           
LLMResult(generations=[[Generation(text='\n\nQ: What did the fish say when it hit the wall?\nA: Dam!', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {}, 'model_name': 'text-davinci-003'})
           

Here's an example with the ChatOpenAI chat model implementation:

chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])
           
Verse 1:
Bubbles rising to the top
A refreshing drink that never stops
Clear and crisp, it's oh so pure
Sparkling water, I can't ignore

Chorus:
Sparkling water, oh how you shine
A taste so clean, it's simply divine
You quench my thirst, you make me feel alive
Sparkling water, you're my favorite vibe

Verse 2:
No sugar, no calories, just H2O
A drink that's good for me, don't you know
With lemon or lime, you're even better
Sparkling water, you're my forever

Chorus:
Sparkling water, oh how you shine
A taste so clean, it's simply divine
You quench my thirst, you make me feel alive
Sparkling water, you're my favorite vibe

Bridge:
You're my go-to drink, day or night
You make me feel so light
I'll never give you up, you're my true love
Sparkling water, you're sent from above

Chorus:
Sparkling water, oh how you shine
A taste so clean, it's simply divine
You quench my thirst, you make me feel alive
Sparkling water, you're my favorite vibe

Outro:
Sparkling water, you're the one for me
I'll never let you go, can't you see
You're my drink of choice, forevermore
Sparkling water, I adore.
           

Here's an example with the ChatAnthropic chat model implementation, which makes use of their claude model.

chat = ChatAnthropic(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])
           
Here is my attempt at a song about sparkling water:

Sparkling water, bubbles so bright, 
Dancing in the glass with delight.
Refreshing and crisp, a fizzy delight,
Quenching my thirst with each sip I take.
The carbonation tickles my tongue,
As the refreshing water song is sung.
Lime or lemon, a citrus twist,
Makes sparkling water such a bliss.
Healthy and hydrating, a drink so pure,
Sparkling water, always alluring.
Bubbles ascending in a stream, 
Sparkling water, you're my dream!           

How to track token usage

This notebook goes over how to track your token usage for specific calls. It is currently only implemented for the OpenAI API.

Let's first look at an extremely simple example of tracking token usage for a single LLM call.

from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
           
llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)
           
with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    print(cb)
           
Tokens Used: 42
	Prompt Tokens: 4
	Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
           

Anything inside the context manager will be tracked. Here's an example of using it to track multiple calls in sequence.

with get_openai_callback() as cb:
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")
    print(cb.total_tokens)
           
91
           

If a chain or agent with multiple steps in it is used, it will track all of those steps.

from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
           
with get_openai_callback() as cb:
    response = agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")
           
> Entering new AgentExecutor chain...
 I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557

Thought: I now know the final answer.
Final Answer: Harry Styles, Olivia Wilde's boyfriend, is 29 years old and his age raised to the 0.23 power is 2.169459462491557.

> Finished chain.
Total Tokens: 1506
Prompt Tokens: 1350
Completion Tokens: 156
Total Cost (USD): $0.03012           
