Preface
In the previous article, I introduced a simple RAG practice using SemanticKernel/C#. There I used an online API compatible with the OpenAI format, but many scenarios call for running fully offline. Today I would like to show you how to use the chat model and the embedding model from Ollama in SemanticKernel/C# for local offline scenarios.
Start practicing
The chat model used in this article is gemma2:2b and the embedding model is all-minilm:latest; you can download both in Ollama ahead of time.
On February 8, 2024, Ollama added compatibility with the OpenAI Chat Completions API; see https://ollama.com/blog/openai-compatibility for details.
Therefore, using Ollama's chat model in SemanticKernel/C# is relatively simple.
var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion(modelId: "gemma2:2b", apiKey: null, endpoint: new Uri("http://localhost:11434"))
    .Build();
This is how you can build the kernel.
Let's try it out:
public async Task<string> Praise()
{
    var skPrompt = """
        You are an expert at complimenting people. Reply with one sentence of praise.
        Your reply should be a single sentence, neither too long nor too short.
        """;
    var result = await _kernel.InvokePromptAsync(skPrompt);
    var str = result.ToString();
    return str;
}
With that, we have successfully used Ollama's chat model in SemanticKernel.
Now let's look at the embedding model. Since Ollama's embeddings API was not compatible with OpenAI's format at the time of writing, it cannot be used directly.
Ollama's format looks like this (the original screenshot is unavailable; this is Ollama's documented /api/embeddings request, which returns a bare embedding array):
curl http://localhost:11434/api/embeddings -d '{
  "model": "all-minilm:latest",
  "prompt": "Your text string goes here"
}'
OpenAI's request format looks like this:
curl https://api.openai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-3-small"
  }'
OpenAI's return format looks like this:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ... (omitted for spacing)
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
Because the request and response formats differ, simply forwarding requests to Ollama won't work.
Someone raised this question in an Ollama GitHub issue before, and there also seems to be a ready-made implementation that adds OpenAI compatibility to the embeddings endpoint, but when I tried it, it was still not compatible.
To use Ollama's embedding model in SemanticKernel, I would need to implement a few interfaces myself, but after searching I found that someone has already done this. GitHub address: https://github.com/BLaZeKiLL/Codeblaze.SemanticKernel.
For usage, see https://github.com/BLaZeKiLL/Codeblaze.SemanticKernel/tree/main/dotnet/Codeblaze.SemanticKernel.Connectors.Ollama
The author implemented ChatCompletion, EmbeddingGeneration, and a TextGenerationService. If you only need EmbeddingGeneration, you can read the source and add a few classes to your own project to cut down on package dependencies.
Here, for convenience, I install the Codeblaze.SemanticKernel.Connectors.Ollama package directly:
Build an ISemanticTextMemory:
public async Task<ISemanticTextMemory> GetTextMemory3()
{
    var builder = new MemoryBuilder();
    var embeddingEndpoint = "http://localhost:11434";
    builder.WithHttpClient(new HttpClient());
    // Use Ollama's local embedding model
    builder.WithOllamaTextEmbeddingGeneration("all-minilm:latest", embeddingEndpoint);
    // Persist vectors in a local SQLite database
    IMemoryStore memoryStore = await SqliteMemoryStore.ConnectAsync("memstore.db");
    builder.WithMemoryStore(memoryStore);
    var textMemory = builder.Build();
    return textMemory;
}
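As a quick sketch of how the memory built above can be used — the collection name "private-docs" and the sample text are my own illustrations, and this assumes Ollama is running locally:

```csharp
var textMemory = await GetTextMemory3();

// Save a piece of text: it is embedded via all-minilm and persisted in memstore.db.
await textMemory.SaveInformationAsync(
    collection: "private-docs",
    text: "The Q&A QQ group number is 123123123.",
    id: "doc-1");

// Search semantically; SearchAsync returns an IAsyncEnumerable<MemoryQueryResult>.
await foreach (var result in textMemory.SearchAsync(
    "private-docs", "What is the QQ group number?", limit: 1, minRelevanceScore: 0.3))
{
    Console.WriteLine($"{result.Relevance:F2}: {result.Metadata.Text}");
}
```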
Now let's try it out. Building on the previous article's example, today I upload a txt file.
The private document looks like this (private information has been replaced):
Dear students:
Hello. To help everyone spend their university years safely and smoothly, the school has introduced an "Internet+" campus safety education service platform, which offers online micro-courses on safety knowledge that can be studied anytime, anywhere on a mobile phone. University life is rich and colorful; firmly master safety knowledge and comprehensively improve your safety skills and awareness. Please be sure to complete the course and the exam within the prescribed study period.
Please complete the study and exam on your own as follows:
1. Mobile platform entry: follow the WeChat official account "XX University" or scan the QR code below, then tap the menu item [Academic Navigation] → [XX Micro-courses], enter your account (student ID) and password (student ID), tap [Login] to bind your information and enter the learning platform.
2. Web platform entry: open a browser and log in at www.xxx.cn; once on the platform you can start studying the safety course.
3. Platform availability: April 1, 2024 – April 30, 2024. You must finish all the course content before taking the exam. The exam has 50 questions, with a full score of 100 and a passing score of 80; you have 3 attempts and the best score counts.
4. Q&A QQ group number: 123123123.
Learning platform login process
1. Mobile platform entry:
Scan the QR code below and follow the WeChat official account "XX University";
In the account menu, tap [Academic Navigation] → [XX Micro-courses], select your school, enter your account (student ID) and password (student ID), then tap [Login] to bind your information and enter the learning platform;
If you run into problems, tap [Online Support] or [FAQ] for help (support hours: Monday–Sunday 8:30–17:00).
2. Web platform entry:
Open a browser and log in at www.xxx.cn; once on the platform you can start studying.
3. Safety micro-course study and exam
1) Micro-course study
On the home page, tap [2024 Spring Safety Education] under [Learning Tasks] to enter the course;
Expand the micro-course list and tap a micro-course to start studying;
Most micro-courses advance by tapping Continue; a few advance by swiping up or left;
When a micro-course is finished, a message "Congratulations, you have completed this micro-course" appears; you must tap [OK] and then [Return to course list] for the completion status to be recorded;
2) Final exam
After finishing all the micro-courses in the project, tap [Exam Arrangement] → [Take Exam] to take the final exam.
Upload the document:
It is cut into three chunks:
Store the data:
Ask it a question, such as "What is the Q&A QQ group number?":
Although it took a while, on the order of tens of seconds, the answer was correct:
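Under the hood, answering such a question follows the usual RAG pattern: search the memory for relevant chunks, stuff them into the prompt, and let the chat model answer. A rough sketch — the prompt wording and the "private-docs" collection name are illustrative, not the article's exact code:

```csharp
var question = "What is the Q&A QQ group number?";

// 1. Retrieve relevant chunks from the local embedding store.
var context = new StringBuilder();
await foreach (var item in textMemory.SearchAsync(
    "private-docs", question, limit: 3, minRelevanceScore: 0.3))
{
    context.AppendLine(item.Metadata.Text);
}

// 2. Ask the chat model, grounding it in the retrieved context.
var prompt = $"""
    Answer the question using only the context below.
    Context: {context}
    Question: {question}
    """;
var answer = await _kernel.InvokePromptAsync(prompt);
Console.WriteLine(answer);
```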
Let's try another question:
The answer quality is not great, and because my machine is underpowered, local inference is also very slow. If you have the hardware, you can switch to a larger model; if not, and fully offline operation isn't required, you can use a free online API for chat combined with the local embedding model.
Switching to the online API with Qwen/Qwen2-7B-Instruct, the results are quite good:
Summary
The main takeaway from this practice is how to use Ollama's chat model and embedding model in SemanticKernel for local offline scenarios. While practicing RAG, I found two main settings that affect the quality of the results.
The first is the chunk size used when splitting the text:
var lines = TextChunker.SplitPlainTextLines(input, 20);
var paragraphs = TextChunker.SplitPlainTextParagraphs(lines, 100);
The second is how many relevant records to retrieve and the minimum relevance setting:
var memoryResults = textMemory.SearchAsync(index, input, limit: 3, minRelevanceScore: 0.3);
If minRelevanceScore is set too high, not even one record may be retrieved; too low, and irrelevant content gets pulled into the prompt.
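SearchAsync returns an IAsyncEnumerable, so the results above are consumed with await foreach. A small sketch — the variable names follow the article's code, while the loop body is illustrative:

```csharp
var memoryResults = textMemory.SearchAsync(index, input, limit: 3, minRelevanceScore: 0.3);

await foreach (var item in memoryResults)
{
    // item.Relevance is the similarity score; inspect it when tuning minRelevanceScore.
    Console.WriteLine($"{item.Relevance:F2}: {item.Metadata.Text}");
}
// If minRelevanceScore is set too high, this loop yields nothing at all.
```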