Face Recognition: "FaceNet: A Unified Embedding for Face Recognition and Clustering"

2023-03-16 03:03:26

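The authors argue that the choice of triplets used for training matters: trained on carefully selected triplets, the model reaches 99.63% accuracy on LFW. A deep convolutional network learns a Euclidean embedding of each image.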

Method:

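Two network architectures are used: a Zeiler&Fergus style network and an Inception network. The overall system is structured as follows: a batch input layer and a deep CNN, followed by L2 normalization, produce the face embedding, and the triplet loss is applied to that embedding during training.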

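The triplet loss serves recognition, verification, and clustering. The goal is to learn an embedding function $f(x)$ that maps an image $x$ into a feature space $\mathbb{R}^d$ such that the squared distance between faces of the same identity is small, while the squared distance between faces of different identities is large.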

1. Triplet loss

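The distance between faces of the same person should be small and the distance between different people large. For an anchor $x_i^a$, a positive $x_i^p$ (same person), and a negative $x_i^n$ (different person), with margin $\alpha$:

$$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2$$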

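The loss to be minimized is:

$$L = \sum_{i=1}^{N} \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+$$

where $[z]_+ = \max(z, 0)$ and $N$ is the number of triplets.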
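As a minimal sketch of the triplet loss, the following plain-Python snippet sums the hinge term (anchor-positive squared distance, minus anchor-negative squared distance, plus the margin, clipped at zero) over a list of triplets. The function names, the toy 2-D embeddings, and the default margin `alpha=0.2` are assumptions made for illustration, not values taken from this summary.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(triplets, alpha=0.2):
    """Sum of max(d(a, p) - d(a, n) + alpha, 0) over (anchor, positive, negative) embeddings.

    alpha is the margin; 0.2 is an illustrative default, not a value from the text.
    """
    total = 0.0
    for anchor, positive, negative in triplets:
        violation = squared_l2(anchor, positive) - squared_l2(anchor, negative) + alpha
        total += max(violation, 0.0)
    return total


# Toy 2-D embeddings, made up for illustration.
easy = ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])  # negative is far away: contributes 0
hard = ([0.0, 0.0], [0.5, 0.0], [0.4, 0.0])  # negative is closer than the positive

print(triplet_loss([easy]))        # easy triplets already satisfy the margin
print(triplet_loss([easy, hard]))  # only the hard triplet contributes
```

Triplets that already satisfy the margin add nothing to the loss, which is why the choice of triplets matters so much for training.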

The effect of learning with this loss is as follows:

[Figure: the triplet loss pulls the anchor toward the positive and pushes it away from the negative.]

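The key is to select hard triplets so that the model converges quickly. In principle, given an anchor $x_i^a$, one picks a hard positive $x_i^p$ that maximizes $\|f(x_i^a) - f(x_i^p)\|_2^2$ and a hard negative $x_i^n$ that minimizes $\|f(x_i^a) - f(x_i^n)\|_2^2$. Doing so, however, can let mislabeled or poor-quality images dominate the hard positives and negatives.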

The authors describe two ways to avoid this:

Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data.

Generate triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-batch.
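The online variant can be sketched as follows: for one anchor, search only the current mini-batch for the same-label sample with the largest squared distance and the different-label sample with the smallest. This is an illustrative sketch in plain Python; the function name `hardest_in_batch`, the list-based batch layout, and the toy data are assumptions, not the authors' implementation.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hardest_in_batch(embeddings, labels, anchor_idx):
    """Return (hard_positive_idx, hard_negative_idx) for one anchor in a mini-batch.

    hard positive: same label as the anchor, largest squared distance (argmax)
    hard negative: different label, smallest squared distance (argmin)
    An index is None when the batch holds no sample of that kind.
    """
    anchor = embeddings[anchor_idx]
    anchor_label = labels[anchor_idx]

    def dist(i):
        return squared_l2(anchor, embeddings[i])

    positives = [i for i, y in enumerate(labels) if y == anchor_label and i != anchor_idx]
    negatives = [i for i, y in enumerate(labels) if y != anchor_label]
    hard_pos = max(positives, key=dist) if positives else None
    hard_neg = min(negatives, key=dist) if negatives else None
    return hard_pos, hard_neg


# Toy mini-batch: three images of person "A", two of person "B" (made-up embeddings).
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.9, 0.0], [0.5, 0.0], [2.0, 0.0]]
labels = ["A", "A", "A", "B", "B"]

print(hardest_in_batch(embeddings, labels, anchor_idx=0))  # (2, 3)
```

Restricting the search to a mini-batch keeps the argmax and argmin cheap and limits how often a single bad image can be picked.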

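Picking the very hardest negatives is also avoided. Instead, semi-hard negatives that satisfy the following condition are selected; they are farther from the anchor than the positive, but still lie inside the margin $\alpha$:

$$\|f(x_i^a) - f(x_i^p)\|_2^2 < \|f(x_i^a) - f(x_i^n)\|_2^2$$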
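A small sketch of semi-hard selection, under stated assumptions: a negative is kept when it is farther from the anchor than the positive but its squared distance is still less than the anchor-positive squared distance plus the margin. The explicit upper bound encodes the "inside the margin" remark and is this sketch's interpretation; the function name, the toy embeddings, and `alpha=0.2` are likewise illustrative.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def semi_hard_negatives(anchor, positive, negatives, alpha=0.2):
    """Indices of negatives with d(a, p) < d(a, n) < d(a, p) + alpha.

    The upper bound (inside the margin) is an assumption of this sketch;
    alpha=0.2 is an illustrative default.
    """
    d_ap = squared_l2(anchor, positive)
    return [
        i
        for i, negative in enumerate(negatives)
        if d_ap < squared_l2(anchor, negative) < d_ap + alpha
    ]


anchor = [0.0, 0.0]
positive = [0.5, 0.0]    # d(a, p) = 0.25
negatives = [
    [0.4, 0.0],          # closer than the positive: hardest, skipped
    [0.6, 0.0],          # farther than the positive, inside the margin: semi-hard
    [1.0, 0.0],          # beyond the margin: easy, skipped
]

print(semi_hard_negatives(anchor, positive, negatives))  # [1]
```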

Deep convolutional networks

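The authors use two networks, one large and one small, to study how FLOPS and parameter count affect the results; they are intended for the data center and for mobile phones, respectively. The first, based on Zeiler&Fergus (ZF), is 22 layers deep, has 140 million parameters, and needs about 1.6 billion floating-point operations per image. The second is based on GoogLeNet's Inception model and has far fewer parameters and FLOPS.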

Relationship between FLOPS and accuracy:

[Figure: FLOPS versus accuracy trade-off.]
