Making your own Face Recognition System

by Sigurður Skúli

Face recognition is the latest trend when it comes to user authentication. Apple recently launched their new iPhone X, which uses Face ID to authenticate users. The OnePlus 5 is getting the Face Unlock feature from the OnePlus 5T soon. And Baidu is using face recognition instead of ID cards to allow their employees to enter their offices. These applications may seem like magic to a lot of people. But in this article we aim to demystify the subject by teaching you how to make your own simplified version of a face recognition system in Python.

Github link for those who do not like reading and only want the code

Background

Before we get into the details of the implementation, I want to discuss the details of FaceNet, which is the network we will be using in our system.

FaceNet

FaceNet is a neural network that learns a mapping from face images to a compact Euclidean space where distances correspond to a measure of face similarity. That is to say, the more similar two face images are, the smaller the distance between them.

Triplet Loss

FaceNet uses a distinct loss method called Triplet Loss to calculate loss. Triplet Loss minimises the distance between an anchor and a positive (images that contain the same identity), and maximises the distance between the anchor and a negative (images that contain different identities).

  • f(a) refers to the output encoding of the anchor

  • f(p) refers to the output encoding of the positive

  • f(n) refers to the output encoding of the negative

  • alpha is a constant used to make sure that the network does not try to optimise towards f(a) - f(p) = f(a) - f(n) = 0.

  • […]+ is equal to max(0, sum)

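Putting the bullet points together, the Triplet Loss shown in Figure 1 of the original article can be written as (this reconstruction matches the `triplet_loss` code later in the article, summing over the N triplets in a batch):

```latex
\mathcal{L} = \sum_{i=1}^{N} \left[ \lVert f(a_i) - f(p_i) \rVert_2^2 \;-\; \lVert f(a_i) - f(n_i) \rVert_2^2 \;+\; \alpha \right]_+
```

The first term pulls the anchor towards the positive, the second term pushes it away from the negative, and alpha is the margin constant described above.
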
Siamese Networks

FaceNet is a Siamese Network. A Siamese Network is a type of neural network architecture that learns how to differentiate between two inputs. This allows it to learn which images are similar and which are not. These images could contain faces.

Siamese networks consist of two identical neural networks, each with the exact same weights. First, each network takes one of the two input images as input. Then, the outputs of the last layers of each network are sent to a function that determines whether the images contain the same identity.

In FaceNet, this is done by calculating the distance between the two outputs.

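That comparison can be sketched in a few lines of NumPy. The encodings below are made-up 128-dimensional vectors, not real FaceNet outputs, and `encoding_distance` is a hypothetical helper name:

```python
import numpy as np

def encoding_distance(enc_a, enc_b):
    # Euclidean (L2) distance between two face encodings:
    # small distance -> likely the same identity
    return np.linalg.norm(enc_a - enc_b)

# Two made-up 128-dimensional "encodings"
enc_a = np.zeros(128)
enc_b = np.zeros(128)
enc_b[0], enc_b[1] = 3.0, 4.0

print(encoding_distance(enc_a, enc_b))  # -> 5.0
```

In the real system, a threshold on this distance decides whether the two images show the same person.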
Implementation

Now that we have clarified the theory, we can jump straight into the implementation.

In our implementation we’re going to be using Keras and Tensorflow. Additionally, we’re using two utility files that we got from deeplearning.ai’s repo to abstract all interactions with the FaceNet network:

  • fr_utils.py contains functions to feed images to the network and get the encodings of images

  • inception_blocks_v2.py contains functions to prepare and compile the FaceNet network

Compiling the FaceNet network

The first thing we have to do is compile the FaceNet network so that we can use it for our face recognition system.

import os
import glob
import numpy as np
import cv2
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *
from keras import backend as K

K.set_image_data_format('channels_first')

FRmodel = faceRecoModel(input_shape=(3, 96, 96))

def triplet_loss(y_true, y_pred, alpha = 0.3):
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))
    return loss

FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)

We’ll start by initialising our network with an input shape of (3, 96, 96). That means that the Red-Green-Blue (RGB) channels are the first dimension of the image volume fed to the network. And that all images that are fed to the network must be 96x96 pixel images.

我們将從初始化輸入形狀為(3,96,96)的網絡開始。 這意味着紅綠藍(RGB)通道是饋送到網絡的圖像量的第一維。 并且所有送入網絡的圖像都必須是96x96像素的圖像。

Next we’ll define the Triplet Loss function. The function in the code snippet above follows the definition of the Triplet Loss equation that we defined in the previous section.

接下來,我們将定義三重損失函數。 上面的代碼片段中的函數遵循我們在上一節中定義的Triplet Loss方程的定義。

If you are unfamiliar with any of the Tensorflow functions used to perform the calculation, I’d recommend reading the documentation (I have added links for each function), as it will improve your understanding of the code. But comparing the function to the equation in Figure 1 should be enough.

Once we have our loss function, we can compile our face recognition model using Keras. And we’ll use the Adam optimizer to minimise the loss calculated by the Triplet Loss function.

Preparing a Database

Now that we have compiled FaceNet, we are going to prepare a database of individuals we want our system to recognise. We are going to use all the images contained in our images directory for our database of individuals.

NOTE: We are only going to use one image of each individual in our implementation. The reason is that the FaceNet network is powerful enough to only need one image of an individual to recognise them!

def prepare_database():
    database = {}

    for file in glob.glob("images/*"):
        identity = os.path.splitext(os.path.basename(file))[0]
        database[identity] = img_path_to_encoding(file, FRmodel)

    return database

For each image, we will convert the image data to an encoding of 128 float numbers. We do this by calling the function img_path_to_encoding. The function takes in a path to an image and feeds the image to our face recognition network. Then, it returns the output from the network, which happens to be the encoding of the image.

Once we have added the encoding for each image to our database, our system can finally start recognising individuals!

将每個圖像的編碼添加到資料庫後,我們的系統最終可以開始識别個人!

Recognising a Face

As discussed in the Background section, FaceNet is trained to minimise the distance between images of the same individual and maximise the distance between images of different individuals. Our implementation uses this information to determine which individual the new image fed to our system is most likely to be.

def who_is_it(image, database, model):
    encoding = img_to_encoding(image, model)

    min_dist = 100
    identity = None

    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():
        dist = np.linalg.norm(db_enc - encoding)

        print('distance for %s is %s' % (name, dist))

        if dist < min_dist:
            min_dist = dist
            identity = name

    if min_dist > 0.52:
        return None
    else:
        return identity

The function above feeds the new image into a utility function called img_to_encoding. The function processes an image using FaceNet and returns the encoding of the image. Now that we have the encoding we can find the individual that the image most likely belongs to.

上面的函數将新圖像饋送到名為img_to_encoding的實用程式函數中。 該函數使用FaceNet處理圖像并傳回圖像的編碼。 現在我們有了編碼,我們可以找到圖像最有可能屬于的個人。

To find the individual, we go through our database and calculate the distance between our new image and each individual in the database. The individual with the lowest distance to the new image is then chosen as the most likely candidate.

Finally, we must determine whether the candidate image and the new image contain the same person or not, since by the end of our loop we have only determined the most likely individual. This is where the following code snippet comes into play.

if min_dist > 0.52:
    return None
else:
    return identity
  • If the distance is above 0.52, then we determine that the individual in the new image does not exist in our database.

  • But, if the distance is equal to or below 0.52, then we determine they are the same individual!

Now, the tricky part here is that the value 0.52 was arrived at through trial and error for my specific dataset. The best value might be much lower or slightly higher, and it will depend on your implementation and data. I recommend trying out different values and seeing what fits your system best!

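One simple way to do that trial and error more systematically is to sweep candidate thresholds over a small labelled set of distances and keep the most accurate one. This sketch is not from the original article; the distances, labels, and `best_threshold` helper below are all made up for illustration:

```python
import numpy as np

def best_threshold(distances, same_identity, candidates):
    # Pick the candidate threshold with the highest accuracy on
    # a labelled set of (distance, same-person?) pairs.
    best_t, best_acc = None, -1.0
    for t in candidates:
        preds = [d <= t for d in distances]          # <= t means "same person"
        acc = float(np.mean([p == s for p, s in zip(preds, same_identity)]))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Made-up distances between image pairs, and whether each pair
# really showed the same person
distances     = [0.31, 0.44, 0.71, 0.93]
same_identity = [True, True, False, False]

print(best_threshold(distances, same_identity, [0.3, 0.52, 0.8]))  # -> (0.52, 1.0)
```

With a larger labelled set you would also want to weigh false accepts against false rejects, since an authentication system usually cares more about the former.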
Building a System using Face Recognition

Now that we know the details on how we recognise a person using a face recognition algorithm, we can start having some fun with it.

In the Github repository I linked to at the beginning of this article is a demo that uses a laptop’s webcam to feed video frames to our face recognition algorithm. Once the algorithm recognises an individual in the frame, the demo plays an audio message that welcomes the user using the name of their image in the database. Figure 3 shows an example of the demo in action.

Conclusion

By now you should be familiar with how face recognition systems work, and how to make your own simplified face recognition system using a pre-trained version of the FaceNet network in Python!

If you want to play around with the demonstration in the Github repository and add images of people you know then go ahead and fork the repository.

Have some fun with the demonstration and impress all your friends with your awesome knowledge of face recognition!

Translated from: https://www.freecodecamp.org/news/making-your-own-face-recognition-system-29a8e728107c/