Building a Facial Recognition System with Elasticsearch and Python: Elastic Stack Practical Handbook

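Author: 劉曉國

Have you ever tried to search for objects in images? Elasticsearch can help you store, analyze, and search for objects in images or video.

In this article, we show you how to build a facial recognition system with Python. You will learn how to detect and encode facial information, and how to find matches with a search.

We will refer to the code at https://github.com/liu-xiao-guo/face_detection_elasticsearch. You can download it to your local machine: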

$ pwd
/Users/liuxg/python/face_detection
$ tree -L 2
.
├── README.md
├── getVectorFromPicture.py
├── images
│   ├── shay.png
│   ├── simon.png
│   ├── steven.png
│   └── uri.png
├── images_to_be_recognized
│   └── facial-recognition-blog-elastic-founders-match.png
└── recognizeFaces.py

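The repository contains two Python files:

getVectorFromPicture.py: reads the images in the images directory, encodes the faces, and imports the encodings into Elasticsearch
recognizeFaces.py: recognizes the faces in the image files under the images_to_be_recognized directory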

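Basics

Facial recognition

Facial recognition is the process of identifying a user by their facial features, for example to implement an authentication mechanism (such as unlocking a smartphone). It captures, analyzes, and compares patterns based on a person's facial details. The process can be divided into three steps:

Face detection: identify the faces in a digital image
Face data encoding: convert the facial features into a numerical representation
Face matching: search for and compare facial features

In the example, we will walk you through each step.

128-dimensional vector

Facial features can be converted into a set of numbers, here a 128-dimensional vector, so that they can be stored and analyzed.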
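
As a minimal sketch of what such a numerical representation looks like in practice, the hypothetical helper below (is_valid_encoding is not part of the repository or of any library) checks that a candidate encoding is a list of 128 finite numbers before it is stored. The default of 128 comes from the text above; everything else is an assumption made for illustration.

```python
import math


def is_valid_encoding(vector, dims=128):
    """Return True if `vector` looks like a face encoding of `dims` finite numbers."""
    if not isinstance(vector, (list, tuple)) or len(vector) != dims:
        return False
    return all(
        isinstance(x, (int, float)) and not isinstance(x, bool) and math.isfinite(x)
        for x in vector
    )
```

A check like this could run just before a document is indexed, so that a malformed vector is rejected early instead of failing inside Elasticsearch.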

Vector data type

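Elasticsearch provides the dense_vector data type to store dense vectors of float values. The number of dimensions in a vector must not exceed 2048, which is more than enough to store a facial feature representation.

Now, let's put all these concepts into practice.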

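Prerequisites

To detect faces and encode the information, you need the following:

Python: we use Python 3 in this example
Elasticsearch cluster: you can use Alibaba Cloud Elasticsearch for free to start a cluster. In this article, I use a local deployment of Elasticsearch and Kibana.
Face recognition library: face_recognition, a simple face recognition library for Python.
Python Elasticsearch client: the official Python client for Elasticsearch.

Client documentation: https://elasticsearch-py.readthedocs.io/en/v7.10.1/
Python tutorial: https://elasticstack.blog.csdn.net/article/details/111573923
Python download: https://www.python.org/downloads/

Note that we have tested the following instructions on Ubuntu 20.04 LTS and Ubuntu 18.04 LTS. Depending on your operating system, some changes may be needed. Although the installation steps below target Ubuntu, you can follow the same sequence on macOS (some commands will differ).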

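Installing Python and the Python libraries

Ubuntu 20.04 and other Debian-based Linux releases ship with Python 3 installed.

If that is not the case on your system, you can download and install Python from https://www.python.org/downloads/.

To make sure your installed packages are up to date, run the following commands: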

sudo apt update 
sudo apt upgrade

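Confirm that the Python version is 3.x:

python3 -V

Or, if the python command is available on your system:

python --version

Install pip3 to manage Python libraries:

sudo apt install -y python3-pip

Install cmake, which the face_recognition library requires:

pip3 install CMake

Add the cmake bin folder to $PATH (replace $CMake_bin_folder with the actual folder on your machine):

export PATH=$CMake_bin_folder:$PATH

In my testing, the PATH step above was not necessary. If typing cmake in any terminal runs the command, you can skip it.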
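
The same check can be done from Python. The sketch below uses the standard library's shutil.which to see whether cmake is already on the PATH; find_executable and path_step_needed are hypothetical helper names introduced only for this illustration.

```python
import shutil


def find_executable(name="cmake"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def path_step_needed(name="cmake"):
    """True when `name` cannot be found, i.e. the PATH export is still required."""
    return find_executable(name) is None
```

If path_step_needed() returns False, the export command can be skipped.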

Finally, before writing the main scripts, install the following libraries:

pip3 install dlib 
pip3 install numpy 
pip3 install face_recognition  
pip3 install elasticsearch
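
Before moving on, it can help to confirm that the four libraries are actually importable. The following is a small sketch built on the standard library's importlib.util.find_spec; missing_modules and REQUIRED are hypothetical names, and the module names are assumed to match the package names installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: each pip package above exposes a module of the same name.
REQUIRED = ["dlib", "numpy", "face_recognition", "elasticsearch"]

# Example usage:
# print(missing_modules(REQUIRED))
```

An empty list means every library was found; any name that is returned still needs to be installed.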

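Detecting and encoding facial information from images

With the face_recognition library, we can detect faces in an image and convert the facial features into a 128-dimensional vector.

To do this, we create a file called getVectorFromPicture.py: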

getVectorFromPicture.py

import face_recognition 
import numpy as np 
import sys
import os
from pathlib import Path
from elasticsearch import Elasticsearch
 
es = Elasticsearch([{'host':'localhost','port':9200}])
 
cwd = os.getcwd()
print("cwd: " + cwd)
 
# Get the images directory
rootdir = cwd + "/images"
print("rootdir: " + rootdir)
 
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
        file_path = os.path.join(subdir, file)
 
        image = face_recognition.load_image_file(file_path)
 
        # detect the faces from the images
        face_locations = face_recognition.face_locations(image)
 
        # encode the 128-dimension face encoding for each face in the image
        face_encodings = face_recognition.face_encodings(image, face_locations)
 
        # Display the 128-dimension for each face detected
        for face_encoding in face_encodings:
            print("Face found ==>  ", face_encoding.tolist())
            print("name: " + Path(file_path).stem)
            name = Path(file_path).stem
            face_encoding = face_encoding.tolist()
 
            # format a dictionary to be indexed
            e = {
                "face_name": name,
                "face_encoding": face_encoding 
            }
 
            # index the document (the type defaults to _doc, so doc_type is not needed)
            res = es.index(index='faces', body=e)

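First, note that if your Elasticsearch is not running at localhost:9200, you need to change the Elasticsearch address in the code above. The code is very simple: it scans every file in the images subdirectory of the current directory and encodes each face it finds. We use the Python client API to import the data into Elasticsearch. The images folder contains four files.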
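
One consequence of the script above is that every run indexes new documents with auto-generated IDs, so running it twice would leave eight documents in the index rather than four. As a hedged sketch (not part of the repository), the helper below builds actions for the client's elasticsearch.helpers.bulk function and uses the face name as the document ID, so that a rerun overwrites instead of duplicating. build_actions and index_faces are hypothetical names, and using the file stem as _id assumes that each image contains a single face.

```python
def build_actions(encodings_by_name, index="faces"):
    """Yield one bulk action per (name, encoding) pair.

    `encodings_by_name` maps a face name to its encoding,
    given as a plain list of floats.
    """
    for name, encoding in encodings_by_name.items():
        yield {
            "_index": index,
            "_id": name,  # assumption: one face per name, so reruns overwrite
            "_source": {"face_name": name, "face_encoding": list(encoding)},
        }


def index_faces(es, encodings_by_name):
    """Send all actions in one request; `es` is an Elasticsearch client instance."""
    from elasticsearch import helpers  # imported lazily so build_actions works alone

    return helpers.bulk(es, build_actions(encodings_by_name))
```

The dictionary passed to index_faces could be filled inside the existing loop, for example with the file stem as the key and face_encoding.tolist() as the value.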

Before importing the data, we need to create an index called faces in Kibana:

PUT faces
{
  "mappings": {
    "properties": {
      "face_name": {
        "type": "keyword"
      },
      "face_encoding": {
        "type": "dense_vector",
        "dims": 128
      }
    }
  }
}
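
The same index could also be created from Python rather than from Kibana. The sketch below is only an illustration under stated assumptions: build_faces_mapping and create_faces_index are hypothetical helpers, the 2048 limit is the one quoted earlier in this article, and the call to es.indices.create with a body argument assumes the 7.x Python client linked above.

```python
MAX_DIMS = 2048  # limit quoted in this article for dense_vector


def build_faces_mapping(dims=128):
    """Return the request body for the `faces` index shown above."""
    if not 1 <= dims <= MAX_DIMS:
        raise ValueError("dims must be between 1 and %d" % MAX_DIMS)
    return {
        "mappings": {
            "properties": {
                "face_name": {"type": "keyword"},
                "face_encoding": {"type": "dense_vector", "dims": dims},
            }
        }
    }


def create_faces_index(es, index="faces"):
    """Create the index unless it already exists; `es` is an Elasticsearch client."""
    if not es.indices.exists(index=index):
        es.indices.create(index=index, body=build_faces_mapping())
```

Keeping the body in a separate function makes it possible to inspect or test the mapping without a running cluster.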

Let's run getVectorFromPicture.py to obtain the facial feature representations from the images of the Elastic founders.

python3 getVectorFromPicture.py

The facial feature representations are now stored in Elasticsearch.

We can see four documents in Elasticsearch:

GET faces/_count

{
  "count" : 4,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

We can also view the documents in the faces index:

GET faces/_search

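Matching faces

Suppose we have indexed four documents in Elasticsearch, each containing the facial representation of one of the Elastic founders. We can now use other images of the founders to match against them.

To do this, we need to create a file called recognizeFaces.py.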

recognizeFaces.py

import face_recognition
import numpy as np
from elasticsearch import Elasticsearch
import sys
import os
 
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
 
cwd = os.getcwd()
# print("cwd: " + cwd)
 
# Get the images directory
rootdir = cwd + "/images_to_be_recognized"
# print("rootdir: {0}".format(rootdir))
 
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
        file_path = os.path.join(subdir, file)
 
        image = face_recognition.load_image_file(file_path)
 
        # detect the faces from the images
        face_locations = face_recognition.face_locations(image)
 
        # encode the 128-dimension face encoding for each face in the image
        face_encodings = face_recognition.face_encodings(image, face_locations)
 
        # Display the 128-dimension for each face detected
        i = 0
        for face_encoding in face_encodings:
            i += 1
            print("Face", i)
            response = es.search(
                index="faces",
                body={
                    "size": 1,
                    "_source": "face_name",
                    "query": {
                        "script_score": {
                            "query": {
                                "match_all": {}
                            },
                            "script": {
                                "source": "cosineSimilarity(params.query_vector, 'face_encoding')",
                                "params": {
                                    "query_vector": face_encoding.tolist()
                                }
                            }
                        }
                    }
                }
            )
 
            # print(response)
 
            for hit in response['hits']['hits']:
                # double score=float(hit['_score'])
                print("score: {}".format(hit['_score']))
                if float(hit['_score']) > 0.92:
                    print("==> This face  match with ", hit['_source']['face_name'], ",the score is", hit['_score'])
                else:
                    print("==> Unknown face")

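This file is also very simple. It reads the files to be recognized from the images_to_be_recognized directory and recognizes the faces in each image. We use the cosineSimilarity function to compute the cosine similarity between the given query vector and the document vectors stored in Elasticsearch.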

# Display the 128-dimension for each face detected
i = 0
for face_encoding in face_encodings:
    i += 1
    print("Face", i)
    response = es.search(
        index="faces",
        body={
            "size": 1,
            "_source": "face_name",
            "query": {
                "script_score": {
                    "query": {
                        "match_all": {}
                    },
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'face_encoding')",
                        "params": {
                            "query_vector": face_encoding.tolist()
                        }
                    }
                }
            }
        }
    )
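
To make the scoring concrete, here is a small standard-library sketch of the quantity that cosineSimilarity computes: the dot product of two vectors divided by the product of their lengths. It illustrates the formula only and is not Elasticsearch's implementation; cosine_similarity is a hypothetical helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The result lies between -1 and 1, and vectors pointing in the same direction score 1. As far as I know, script_score does not accept negative scores, and the examples in the Elasticsearch documentation add 1.0 to the result for that reason. The script in this article omits the offset, which assumes that the face encodings being compared never produce a negative similarity.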

We assume that a score that does not exceed 0.92 indicates an unknown face:

for hit in response['hits']['hits']:
    # double score=float(hit['_score'])
    print("score: {}".format(hit['_score']))
    if float(hit['_score']) > 0.92:
        print("==> This face  match with ", hit['_source']['face_name'], ",the score is", hit['_score'])
    else:
        print("==> Unknown face")
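
The thresholding logic can be isolated into a small function, which also makes it easy to handle a search that returns no hits at all. This is a sketch: label_for_response is a hypothetical helper, the response layout mirrors the hits structure used above, and 0.92 is simply the threshold assumed in this article.

```python
def label_for_response(response, threshold=0.92):
    """Return the best-matching face name, or None for an unknown face."""
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None  # empty index or no documents matched
    best = hits[0]  # by default, hits are sorted by descending score
    if float(best["_score"]) > threshold:
        return best["_source"]["face_name"]
    return None
```

Returning None rather than printing lets the caller decide how to report an unknown face.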

Run the Python code above (recognizeFaces.py). The script detects every face whose match score is higher than 0.92.

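Advanced search

Facial recognition and search can be combined for advanced use cases. You can use Elasticsearch to build more complex queries, such as geo queries (geo_queries), bool queries (query-dsl-bool-query), and aggregations (search-aggregations).

For example, the following query applies the cosineSimilarity search to documents within a 200 km radius of a specific location: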

GET /_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "must": {
            "match_all": {}
          },
          "filter": {
            "geo_distance": {
              "distance": "200km",
              "pin.location": {
                "lat": 40,
                "lon": -70
              }
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'face_encoding')",
        "params": {
          "query_vector": [
            -0.14664565,
            0.07806452,
            0.03944433,
            ...
            ...
            ...
            -0.03167224,
            -0.13942884
          ]
        }
      }
    }
  }
}
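
From Python, the same request body could be assembled by a small function. The sketch below is illustrative only: build_geo_face_query is a hypothetical helper, and the pin.location field name is taken from the query above.

```python
def build_geo_face_query(query_vector, lat, lon, distance="200km",
                         location_field="pin.location"):
    """Return a script_score query restricted by a geo_distance filter."""
    return {
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "must": {"match_all": {}},
                        "filter": {
                            "geo_distance": {
                                "distance": distance,
                                location_field: {"lat": lat, "lon": lon},
                            }
                        },
                    }
                },
                "script": {
                    "source": "cosineSimilarity(params.query_vector, 'face_encoding')",
                    "params": {"query_vector": list(query_vector)},
                },
            }
        }
    }
```

With the client used in this article, the result could be passed as the body argument of es.search. Note that the index being searched would need a geo_point field named pin.location, which the faces mapping in this article does not define.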

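Combining cosineSimilarity with other Elasticsearch queries opens up a wide range of more complex use cases.

Conclusion

Facial recognition is relevant to many use cases, and you may already be using it in your daily life. The concepts described above can be generalized to detecting any object in images or video, so you can extend them to a very broad range of applications.

Reference: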

https://www.elastic.co/blog/how-to-build-a-facial-recognition-system-using-elasticsearch-and-python
