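Installing Milvus

About Milvus

Milvus is an open-source platform that aims to be an integrated solution for vector retrieval, much as Elasticsearch is an integrated solution for search. For details, see the official documentation: https://www.milvus.io/cn/docs/v0.11.0/overview.md

Installation

Installation is straightforward and the steps are clear, although the image download is rather slow.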

>> docker pull milvusdb/milvus:0.11.0-cpu-d101620-4c44c0
0.11.0-cpu-d101620-4c44c0: Pulling from milvusdb/milvus
75f829a71a1c: Pull complete
e654e509dcd3: Pull complete
482d74c614ad: Pull complete
85d20808a7e5: Pull complete
2f8820d4255e: Pull complete
Digest: sha256:6a5dc00b26dc18be5e5bfddc8cfb36370188e4c951e62ffafa30fbe3f4b1ad60
Status: Downloaded newer image for milvusdb/milvus:0.11.0-cpu-d101620-4c44c0
docker.io/milvusdb/milvus:0.11.0-cpu-d101620-4c44c0

Startup

docker run -d --name milvus_cpu_0.11.0 \
	-p 19530:19530 \
	-p 19121:19121 \
	-v /home/$yourname/milvus/db:/var/lib/milvus/db \
	-v /home/$yourname/milvus/conf:/var/lib/milvus/conf \
	-v /home/$yourname/milvus/logs:/var/lib/milvus/logs \
	-v /home/$yourname/milvus/wal:/var/lib/milvus/wal \
	milvusdb/milvus:0.11.0-cpu-d101620-4c44c0

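One small tip: do not pass -d on the first start. Run the container in the foreground so you can see any errors, and switch to -d only once it starts cleanly.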
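Once the container appears to start cleanly, a quick way to confirm that the server is accepting connections is a plain TCP check on the mapped port. The following is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and the host in the usage comment is a placeholder for wherever the container runs.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Usage (19530 is the port mapped by the docker run command above):
# print(port_is_open("127.0.0.1", 19530))
```

A successful TCP connection only shows that something is listening on the port; it does not prove that Milvus finished initializing, so the container logs remain the primary source of truth.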

Also, if you cannot download milvus.yaml directly, you may need to go through a proxy. I have put a copy here as well: https://download.csdn.net/download/iterate7/13081889

Installing the admin UI

>> docker pull milvusdb/milvus-em:v0.5.0
>> docker run -d -p 3000:80 -e API_URL=http://192.168.13.218:3000 milvusdb/milvus-em:v0.5.0

You can then inspect Milvus through the web UI.

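Vector retrieval with Milvus, part 1: reverse image search. Contents: Installing Milvus; Experimenting with vector search; Building an index; Searching by image; Vectorizing images with pic-search-webserver; Page interaction with pic-search-webclient; Summary; References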

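Experimenting with vector search

This mainly covers the basic create, read, update and delete operations. Reading the code is the most direct way to see them.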

import numpy as np
import random
from milvus import Milvus
from milvus import Status

_HOST = '192.168.xx.xx'
_PORT = 19530

# Connect to Milvus Server
milvus = Milvus(_HOST, _PORT)

# Close client instance
# milvus.close()

# Returns the status of the Milvus server.
server_status = milvus.server_status(timeout=4)
print(server_status)


# Vector parameters
_DIM = 8  # dimension of vector

_INDEX_FILE_SIZE = 32  # max file size of stored index

# the demo name.
collection_name = 'example_collection_'
partition_tag = 'demo_tag_'
segment_name= ''

# 10 vectors with 8 dimension, per element is float32 type, vectors should be a 2-D array
vectors = [[random.random() for _ in range(_DIM)] for _ in range(10)]
ids = [i for i in range(10)]

print(vectors)

# Returns the version of the client.
client_version= milvus.client_version()
print(client_version)

# Returns the version of the Milvus server.
server_version = milvus.server_version(timeout=10)
print(server_version)

print("has collection:",milvus.has_collection(collection_name=collection_name, timeout=10))


from milvus import DataType
# Information needed to create a collection. Default index_file_size=1024 and metric_type=MetricType.L2
collection_param = {
    "fields": [
        #  Milvus doesn't support string type now, but we are considering supporting it soon.
        #  {"name": "title", "type": DataType.STRING},
        {"name": "duration", "type": DataType.INT32, "params": {"unit": "minute"}},
        {"name": "release_year", "type": DataType.INT32},
        {"name": "embedding", "type": DataType.FLOAT_VECTOR, "params": {"dim": 8}},
    ],
    "segment_row_limit": 4096,
    "auto_id": False
}

# ------
# Basic create collection:
#     Create the collection if it does not exist yet, then create a partition tagged "American",
#     meaning the films we are about to insert are American.
# ------
if collection_name not in milvus.list_collections():
    milvus.create_collection(collection_name, collection_param)
    milvus.create_partition(collection_name, "American")


# ------
# Basic create collection:
#     You can check the collection info and partitions we've created by `get_collection_info` and
#     `list_partitions`
# ------
print("--------get collection info--------")
collection = milvus.get_collection_info(collection_name)
print(collection)
partitions = milvus.list_partitions(collection_name)
print("\n----------list partitions----------")
print(partitions)

# ------
# Basic insert entities:
#     We have three films of The_Lord_of_the_Rings series here with their id, duration, release_year
#     and fake embeddings to be inserted. They are listed below to give you an overview of the structure.
# ------
The_Lord_of_the_Rings = [
    {
        "title": "The_Fellowship_of_the_Ring",
        "id": 1,
        "duration": 208,
        "release_year": 2001,
        "embedding": [random.random() for _ in range(8)]
    },
    {
        "title": "The_Two_Towers",
        "id": 2,
        "duration": 226,
        "release_year": 2002,
        "embedding": [random.random() for _ in range(8)]
    },
    {
        "title": "The_Return_of_the_King",
        "id": 3,
        "duration": 252,
        "release_year": 2003,
        "embedding": [random.random() for _ in range(8)]
    }
]

# ------
# Basic insert entities:
#     To insert these films into Milvus, we have to group values from the same field together like below.
#     Then these grouped data are used to create `hybrid_entities`.
# ------
ids = [k.get("id") for k in The_Lord_of_the_Rings]
durations = [k.get("duration") for k in The_Lord_of_the_Rings]
release_years = [k.get("release_year") for k in The_Lord_of_the_Rings]
embeddings = [k.get("embedding") for k in The_Lord_of_the_Rings]

hybrid_entities = [
    # Milvus doesn't support string type yet, so we cannot insert "title".
    {"name": "duration", "values": durations, "type": DataType.INT32},
    {"name": "release_year", "values": release_years, "type": DataType.INT32},
    {"name": "embedding", "values": embeddings, "type": DataType.FLOAT_VECTOR},
]

# ------
# Basic insert entities:
#     We insert the `hybrid_entities` into our collection, into partition `American`, with ids we provide.
#     If succeed, ids we provide will be returned.
# ------
for _ in range(2000):
    ids = milvus.insert(collection_name, hybrid_entities, ids, partition_tag="American")
    print("\n----------insert----------")
    print("Films are inserted and the ids are: {}".format(ids))


# ------
# Basic insert entities:
#     After inserting entities into the collection, we need to flush the collection to make sure
#     the data is on disk, so that we are able to retrieve it.
# ------
before_flush_counts = milvus.count_entities(collection_name)
milvus.flush([collection_name])
after_flush_counts = milvus.count_entities(collection_name)
print("\n----------flush----------")
print("There are {} films in collection `{}` before flush".format(before_flush_counts, collection_name))
print("There are {} films in collection `{}` after flush".format(after_flush_counts, collection_name))

# ------
# Basic insert entities:
#     We can get the detail of collection statistics info by `get_collection_stats`
# ------
info = milvus.get_collection_stats(collection_name)
print("\n----------get collection stats----------")
print(info)

# ------
# Basic search entities:
#     Now that we have 3 films inserted into our collection, it's time to obtain them.
#     We can get films by ids, if milvus can't find entity for a given id, `None` will be returned.
#     In the case we provide below, we will only get 1 film with id=1 and the other is `None`
# ------
films = milvus.get_entity_by_id(collection_name, ids=[1, 200])
print("\n----------get entity by id = 1, id = 200----------")
for film in films:
    if film is not None:
        print(" > id: {},\n > duration: {}m,\n > release_years: {},\n > embedding: {}"
              .format(film.id, film.duration, film.release_year, film.embedding))

# ------
# Basic hybrid search entities:
#      Getting films by id is not enough, we are going to get films based on vector similarities.
#      Let's say we have a film with its `embedding` and we want to find `top3` films that are most similar
#      with it by L2 distance.
#      Other than vector similarities, we also want to obtain films that:
#        `released year` term in 2002 or 2003,
#        `duration` larger than 250 minutes.
#
#      Milvus provides Query DSL(Domain Specific Language) to support structured data filtering in queries.
#      For now milvus supports TermQuery and RangeQuery, they are structured as below.
#      For more information about the meaning and other options about "must" and "bool",
#      please refer to DSL chapter of our pymilvus documentation
#      (https://pymilvus.readthedocs.io/en/latest/).
# ------
query_embedding = [random.random() for _ in range(8)]
query_hybrid = {
    "bool": {
        "must": [
            {
                "term": {"release_year": [2002, 2003]}
            },
            {
                # "GT" for greater than
                "range": {"duration": {"GT": 250}}
            },
            {
                "vector": {
                    "embedding": {"topk": 3, "query": [query_embedding], "metric_type": "L2"}
                }
            }
        ]
    }
}

# ------
# Basic hybrid search entities:
#     And we want to get all the fields back in results, so fields = ["duration", "release_year", "embedding"].
#     If searching successfully, results will be returned.
#     `results` have `nq`(number of queries) separate results, since we only query for 1 film, The length of
#     `results` is 1.
#     We ask for top 3 in-return, but our condition is too strict while the database is too small, so we can
#     only get 1 film, which means length of `entities` in below is also 1.
#
#     Now we've gotten the results, and known it's a 1 x 1 structure, how can we get ids, distances and fields?
#     It's very simple, for every `topk_film`, it has three properties: `id, distance and entity`.
#     All fields are stored in `entity`, so you can finally obtain these data as below:
#     And the result should be film with id = 3.
# ------
results = milvus.search(collection_name, query_hybrid, fields=["duration", "release_year", "embedding"])
print("\n----------search----------")
for entities in results:
    for topk_film in entities:
        current_entity = topk_film.entity
        print("- id: {}".format(topk_film.id))
        print("- distance: {}".format(topk_film.distance))

        print("- release_year: {}".format(current_entity.release_year))
        print("- duration: {}".format(current_entity.duration))
        print("- embedding: {}".format(current_entity.embedding))

# ------
# Basic delete:
#     Now let's see how to delete things in Milvus.
#     You can simply delete entities by their ids.
# ------
# milvus.delete_entity_by_id(collection_name, ids=[1, 2])
# milvus.flush()  # flush is important
# result = milvus.get_entity_by_id(collection_name, ids=[1, 2])
#
# counts_delete = sum([1 for entity in result if entity is not None])
# counts_in_collection = milvus.count_entities(collection_name)
# print("\n----------delete id = 1, id = 2----------")
# print("Get {} entities by id 1, 2".format(counts_delete))
# print("There are {} entities after delete films with 1, 2".format(counts_in_collection))
#
# # ------
# # Basic delete:
# #     You can drop partitions we create, and drop the collection we create.
# # ------
# milvus.drop_partition(collection_name, partition_tag='American')
# if collection_name in milvus.list_collections():
#     milvus.drop_collection(collection_name)

# ------
# Summary:
#     Now we've gone through all the basic communications pymilvus can do with the Milvus server, hope it's helpful!
# ------
#https://github.com/milvus-io/pymilvus/tree/0.3.0#insert-entities-in-a-collection
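To make concrete what the hybrid query above asks for, the same logic can be sketched without a server: filter by the `term` and `range` conditions, then rank the survivors by L2 distance to the query vector. This is a brute-force illustration in plain Python, not how Milvus executes the query; the function names are hypothetical, and the distance values Milvus reports may be scaled differently from the plain Euclidean distance used here.

```python
import math
import random


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_hybrid_search(films, query, topk, years, min_duration):
    """Apply the term and range filters, then rank the survivors by L2 distance."""
    candidates = [
        f for f in films
        if f["release_year"] in years and f["duration"] > min_duration
    ]
    ranked = sorted(candidates, key=lambda f: l2_distance(f["embedding"], query))
    return [(f["id"], l2_distance(f["embedding"], query)) for f in ranked[:topk]]


random.seed(0)  # fixed seed so the fake embeddings are reproducible
films = [
    {"id": 1, "duration": 208, "release_year": 2001,
     "embedding": [random.random() for _ in range(8)]},
    {"id": 2, "duration": 226, "release_year": 2002,
     "embedding": [random.random() for _ in range(8)]},
    {"id": 3, "duration": 252, "release_year": 2003,
     "embedding": [random.random() for _ in range(8)]},
]
query = [random.random() for _ in range(8)]

hits = brute_force_hybrid_search(
    films, query, topk=3, years={2002, 2003}, min_duration=250
)
for film_id, distance in hits:
    print("id:", film_id, "distance:", distance)
```

Only the film with id 3 satisfies both filters, so the result has a single entry even though `topk` is 3, which matches the behaviour described in the comments of the script above.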

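Building an index

The steps above only insert data into the store. For real search workloads you still need to build an index.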

from milvus import Milvus

# IVF_FLAT index with the L2 metric; nlist is the number of cluster buckets
ivf_param = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 4096}}
# the demo name.
collection_name = 'example_collection_'
_HOST = '192.168.xx.xx'
_PORT = 19530

# Connect to Milvus Server
client = Milvus(_HOST, _PORT)
# Build the index on the vector field "embedding"
client.create_index(collection_name, "embedding", ivf_param)

Once the index is built, searches become much faster.
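The reason an IVF-style index speeds things up can be shown with a toy version of the idea: vectors are grouped into `nlist` buckets around centroids, and a query scans only the `nprobe` buckets whose centroids are closest. The sketch below is an illustration in plain Python, not Milvus's implementation; as a simplifying assumption, centroids are sampled from the data instead of being trained with k-means, and all function names are hypothetical.

```python
import random


def sq_l2(a, b):
    """Squared L2 distance; enough for ranking, no square root needed."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, nlist, seed=0):
    """Assign each vector index to the bucket of its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, nlist)  # sampled, not trained (assumption)
    buckets = [[] for _ in range(nlist)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(nlist), key=lambda c: sq_l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def ivf_search(vectors, centroids, buckets, query, topk, nprobe):
    """Scan only the nprobe buckets whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: sq_l2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    return sorted(candidates, key=lambda i: sq_l2(vectors[i], query))[:topk]


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(1000)]
centroids, buckets = build_ivf(vectors, nlist=16)
query = [rng.random() for _ in range(8)]

# Exhaustive scan for comparison.
exact = sorted(range(len(vectors)), key=lambda i: sq_l2(vectors[i], query))[:3]
# Approximate search: only 4 of the 16 buckets are scanned.
approx = ivf_search(vectors, centroids, buckets, query, topk=3, nprobe=4)
# Probing every bucket degenerates to the exhaustive scan.
full = ivf_search(vectors, centroids, buckets, query, topk=3, nprobe=16)
print("exact:", exact)
print("approx:", approx)
print("full:", full)
```

With a small `nprobe` only a fraction of the vectors is compared against the query, which is where the speed-up comes from, at the cost of possibly missing a true neighbour that sits in an unprobed bucket.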

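Searching by image

See https://tutorials.milvus.io/how-to-do-reverse-image-search-with-milvus/index.html for a rough reference. In essence, each image is converted into a vector, and Milvus then indexes and searches those vectors.

After a search, the returned ids are mapped back to their images so the results can be displayed.

The logic is very simple.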
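The id-to-image step can be sketched as a plain dictionary lookup. The example below assumes ids are assigned consecutively when the image vectors are inserted; the file names, the hit list and both helper names are hypothetical and only illustrate the shape of the mapping.

```python
def build_id_to_path(image_paths, first_id=0):
    """Assign consecutive integer ids to image paths and return the id -> path mapping."""
    return {first_id + i: path for i, path in enumerate(image_paths)}


def resolve_hits(hits, id_to_path):
    """Turn (id, distance) search hits into (path, distance) pairs, skipping unknown ids."""
    return [(id_to_path[i], d) for i, d in hits if i in id_to_path]


# Hypothetical image files and search hits, for illustration only.
image_paths = ["cat.jpg", "dog.jpg", "bird.jpg"]
id_to_path = build_id_to_path(image_paths)
hits = [(2, 0.12), (0, 0.47), (99, 0.80)]  # id 99 has no known image

results = resolve_hits(hits, id_to_path)
for path, distance in results:
    print(path, distance)
```

In a real deployment the mapping would need to be persisted somewhere (a file or a database), since the vector store in this walkthrough holds only ids and vectors, not file names.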


Vectorizing images with pic-search-webserver

docker run \
-v /Users/xx/milvus/data/VOCdevkit/VOC2012/JPEGImages:/tmp/pic1 \
-p 35000:5000 -e "DATA_PATH=/tmp/images-data" \
-e "MILVUS_HOST=192.168.xx.xx" milvusbootcamp/pic-search-webserver:0.7.0

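This command starts a service that vectorizes the images. A later chapter will look at this part in detail.

As a prerequisite, put some images into the JPEGImages folder. Also pull the image beforehand: docker pull milvusbootcamp/pic-search-webserver:0.7.0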
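Before starting the container, it can help to confirm that the folder you are about to mount actually contains images. The sketch below lists image files in a folder with the standard library; the helper name and the set of accepted suffixes are assumptions, since the source does not state which formats the service accepts.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to whatever your dataset uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return the sorted image files found directly inside folder."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage (the path is the host side of the -v mount in the docker run command):
# images = list_images("/Users/xx/milvus/data/VOCdevkit/VOC2012/JPEGImages")
# print(len(images), "images found")
```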

Page interaction with pic-search-webclient

>> docker pull milvusbootcamp/pic-search-webclient:0.1.0
>> docker run --name zilliz_search_images_demo_web  --rm -p 8001:80 \
-e API_URL=http://0.0.0.0:35000 \
milvusbootcamp/pic-search-webclient:0.1.0

Once it is running, you can try the search from the web page.


Summary

Hands-on installation steps and some basic vector operations
Image vectorization
Vector indexing and search
Docker deployment

References

https://tutorials.milvus.io/how-to-do-reverse-image-search-with-milvus/index.html
https://github.com/milvus-io/pymilvus/tree/0.3.0#insert-entities-in-a-collection
https://zilliz.blog.csdn.net/article/details/103884272
