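Notes:
This is a machine learning assignment from Udacity.
I think it is a fairly typical project that combines several common deep learning techniques, so it is worth sharing. The work covers data augmentation, transfer learning, building network models, training, and evaluation.
This is only a hands-on transfer learning exercise; the focus is on the practical workflow, and the underlying theory is not analyzed.
Limitation: because the training set is small, the results may not accurately reflect model performance. For example, ResNet50 initialized with pretrained weights ends up slightly less accurate after training than the same network with random initialization, which is the opposite of what theory would suggest.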
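Project 3: Face Recognition
Welcome to the third project of the Machine Learning Engineer Nanodegree! Some sample code is already provided in this file, but you still need to implement more functionality to make the project run. Unless explicitly asked, you do not need to modify any of the provided code. Each part comes with detailed guidance, and the parts you need to implement are marked with 'TODO' in the comments. Please read all the hints carefully!
Besides writing code, you must also answer some questions about the project and your implementation. Each question you need to answer is titled 'Question X'. Read each question carefully and write a complete answer in the 'Answer' text box that follows it. Your submission is graded on your answers and on the functionality your code implements.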
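Task Overview
Face recognition is a computer vision task: given an image containing a face, identify whose face it is. Any discussion of face recognition will involve FaceNet, developed by Google in 2015. The model is widely used because of its strong performance, and a trained version has been released as open source.
In this project we therefore learn the face recognition task step by step. We first build a convolutional neural network by hand using what was taught in the course. We then repeat the task with a more advanced architecture such as ResNet50. Finally, we use a pretrained FaceNet model. Along the way we also use data augmentation and face extraction to improve recognition accuracy.
We use the open-source Five Celebrity Faces Dataset, which is also available on Kaggle. It has already been downloaded to ./5-celebrity-faces-dataset. The dataset contains photos of five celebrities: Ben Affleck, Elton John, Jerry Seinfeld, Madonna, and Mindy Kaling. The folder is split into train and val.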
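Data Preparation
We first take a quick look at the data, then process the images with data augmentation and face extraction. After completing these steps, think about and answer the related questions.
Displaying an Image
Store the file names of all images under train in the images list, and store the corresponding person names, in the same order, in images_name.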
import cv2
import matplotlib.pyplot as plt
import os
import random
import pandas as pd
%matplotlib inline
data_root = "./5-celebrity-faces-dataset/train/"
import csv
def read_file_log(pathName, cvsName):
    # Write one "image path, person name" row per image found under pathName
    with open(cvsName, 'w', encoding='utf-8') as f_cvs:
        csv_writer = csv.writer(f_cvs)
        all_dirs = os.listdir(pathName)
        for dir_name in all_dirs:
            all_files = os.listdir(pathName + dir_name)
            for file_name in all_files:
                # pathName already ends with '/', so the result contains '//' (harmless)
                child = '%s/%s/%s' % (pathName, dir_name, file_name)
                label = dir_name
                csv_writer.writerow([child, label])
train_log_file = './5-celebrity-faces-dataset/train_log.csv'
read_file_log(data_root, train_log_file)
def read_csv(file):
    # Read the CSV log back into two parallel lists: image paths and person names
    images = []
    images_name = []
    with open(file, encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile)
        for line in reader:
            images.append(line[0])
            images_name.append(line[1])
    return images, images_name
# TODO: store the file names of all images under train in images, and the person names, in the same order, in images_name
images, images_name = read_csv(train_log_file)
print(images[0])
print(images_name[0])
./5-celebrity-faces-dataset/train//elton_john/httpssmediacacheakpinimgcomxfefdacfbfdeadajpg.jpg
elton_john
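The CSV round trip above is one way to collect paths and labels. As a minimal alternative sketch, the same two lists can be built directly from the folder layout with pathlib. The helper name list_images is hypothetical and not part of the notebook, and it assumes the root/<person>/<image> layout described earlier.

```python
from pathlib import Path

def list_images(root):
    """Return (paths, labels) for a root/<person>/<image> folder layout."""
    paths, labels = [], []
    for person_dir in sorted(Path(root).iterdir()):
        if not person_dir.is_dir():
            continue  # skip stray files such as a CSV log
        for image_path in sorted(person_dir.iterdir()):
            if image_path.is_file():
                paths.append(str(image_path))
                labels.append(person_dir.name)  # folder name is the label
    return paths, labels
```

Calling list_images("./5-celebrity-faces-dataset/train/") would return the paths without the doubled slash, sorted by person and file name.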
Pick a random image from images, read it with cv2.imread, and display it with pyplot.imshow. Note: you also need to show the name of the person in the image and print the image's shape.
# TODO: pick a random image from images and get the name of the person in it
from random import randrange
def random_sample(images=images, images_name=images_name):
    print("随機選取一張照片:")  # prints "Randomly selected photo:"
    random_index = randrange(0, len(images))
    # TODO: pick an image path and the matching person name from images and images_name
    im_file, im_name = images[random_index], images_name[random_index]
    # TODO: read the image file with cv2.imread
    img = cv2.imread(im_file)
    img2 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # cv2 loads images in BGR order
    # TODO: display the image with plt.imshow and plt.show()
    plt.imshow(img2)
    plt.show()
    # Print the person's name
    print(im_name)
    # Print the image shape
    print(img.shape)
    return im_file, im_name
random_sample(images, images_name)
随機選取一張照片:
elton_john
(353, 236, 3)
('./5-celebrity-faces-dataset/train//elton_john/httpssmediacacheakpinimgcomxfefdacfbfdeadajpg.jpg',
'elton_john')
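You can run the code above several times to look at more images and get a rough feel for the data.
Read all the data with cv2.imread into train_x. Then use 0, 1, 2, 3, 4 to label Ben Affleck, Elton John, Jerry Seinfeld, Madonna, and Mindy Kaling, and store the labels for all entries of images_name in train_y.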
train_x = []
train_y = []
# The dataset folder for Ben Affleck is spelled 'ben_afflek'
dict_name = {'ben_afflek': 0,
             'elton_john': 1,
             'jerry_seinfeld': 2,
             'madonna': 3,
             'mindy_kaling': 4}
for file, name in zip(images, images_name):
    train_x.append(cv2.imread(file))
    train_y.append(dict_name[name])
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
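The name-to-integer mapping above is written out by hand. A minimal sketch of deriving it from the names themselves, and of turning the integers into one-hot vectors, is shown below. The helpers build_label_map and one_hot are hypothetical names introduced only for illustration. Because the five folder names sort alphabetically, sorted order reproduces the same 0 to 4 assignment.

```python
def build_label_map(names):
    """Map each distinct name to an integer index, in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(set(names)))}

def one_hot(index, num_classes):
    """Return a list with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

# A tiny illustrative sample, not the real training list
names = ["elton_john", "ben_afflek", "madonna", "elton_john"]
label_map = build_label_map(names)
encoded = [label_map[n] for n in names]
vectors = [one_hot(i, len(label_map)) for i in encoded]
print(label_map)
print(encoded)
```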
Data Augmentation
First, print the number of images in the training set.
93
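As we can see, the training set contains few images, which makes it harder for a model to learn the mapping from image data to name labels, so data augmentation is needed here. A good resource for understanding data augmentation is the article titled Data Augmentation.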
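from keras.preprocessing.image import ImageDataGenerator
# TODO: build the image augmenter
data_gen = ImageDataGenerator(
    rescale=.1,              # TODO: multiply every pixel value by this fixed factor (not random)
    rotation_range=0.15,     # TODO: maximum random rotation, in degrees
    zoom_range=0.1,          # TODO: random zoom; a float z means the range [1-z, 1+z]
    width_shift_range=0.2,   # TODO: random horizontal shift (fraction of total width)
    height_shift_range=0.2,  # TODO: random vertical shift (fraction of total height)
    horizontal_flip=True,    # randomly mirror images left to right
)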
Using TensorFlow backend.
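To make two of these transforms concrete, here is a minimal NumPy sketch of a horizontal flip and a horizontal shift on a tiny array. It is not how Keras implements them: the function names are hypothetical, and the shift fills the exposed columns with zeros, whereas ImageDataGenerator defaults to fill_mode='nearest'.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an H x W x C image left to right."""
    return image[:, ::-1, :]

def shift_right(image, pixels):
    """Shift an image right by a non-negative number of columns, zero-filling the gap."""
    shifted = np.zeros_like(image)
    if pixels > 0:
        shifted[:, pixels:, :] = image[:, :-pixels, :]
    else:
        shifted[:] = image
    return shifted

# A 2 x 3 single-channel "image" whose pixel values are 0..5
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)
flipped = horizontal_flip(image)
shifted = shift_right(image, 1)
print(flipped[:, :, 0])
print(shifted[:, :, 0])
```

Both outputs keep the original shape, which is why augmented copies can be stacked with the originals into one training array.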
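Use flow_from_directory to iterate over the dataset in ./5-celebrity-faces-dataset/data and observe the effect of augmentation. We first obtain an image iterator that reads a batch of images from the directory on each step and transforms them according to the augmenter's rules.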
# Run as is: this creates an image iterator that reads a batch of images from the directory and transforms them with the augmenter
dataflow_generator = data_gen.flow_from_directory(
    "./5-celebrity-faces-dataset/data",
    target_size=(160, 160),
    batch_size=3,
    color_mode='rgb',
    class_mode='categorical')
print(dataflow_generator.filenames)
Found 5 images belonging to 5 classes.
['ben_afflek/httpcsvkmeuaeccjpg.jpg', 'elton_john/httpftqncomymusicLxZeltonjohnjpg.jpg', 'jerry_seinfeld/httpgraphicsnytimescomimagessectionmoviesfilmographyWireImagejpg.jpg', 'madonna/httpiamediaimdbcomimagesMMVBMTANDQNTAxNDVeQTJeQWpwZBbWUMDIMjQOTYVUXCRALjpg.jpg', 'mindy_kaling/httpgonetworthcomwpcontentuploadsthumbsjpg.jpg']
# TODO: read images from the iterator and display them
from keras.preprocessing.image import array_to_img
sample_count = 10  # number of batches to draw, not number of images
i = 0
filenames = dataflow_generator.filenames
labels = dataflow_generator.class_indices
print(filenames)
print(len(filenames))
print(labels)
# Each image_data is an (images, labels) pair:
# image_data[0] holds the images of one batch
# image_data[1] holds the matching one-hot labels
for image_data in dataflow_generator:
    # TODO: display the images with plt.imshow and plt.show()
    print(len(image_data[1]))
    for j in range(0, len(image_data[1])):
        if i >= 12:
            break
        plt.subplot(3, 5, 1 + i)
        image = image_data[0][j].astype('uint8')
        plt.imshow(array_to_img(image))
        i += 1
        print(image_data[1][j])        # label
        print(image_data[0][0].shape)  # image shape
    sample_count -= 1
    if sample_count <= 0:
        break
['ben_afflek/httpcsvkmeuaeccjpg.jpg', 'elton_john/httpftqncomymusicLxZeltonjohnjpg.jpg', 'jerry_seinfeld/httpgraphicsnytimescomimagessectionmoviesfilmographyWireImagejpg.jpg', 'madonna/httpiamediaimdbcomimagesMMVBMTANDQNTAxNDVeQTJeQWpwZBbWUMDIMjQOTYVUXCRALjpg.jpg', 'mindy_kaling/httpgonetworthcomwpcontentuploadsthumbsjpg.jpg']
5
{'ben_afflek': 0, 'elton_john': 1, 'jerry_seinfeld': 2, 'madonna': 3, 'mindy_kaling': 4}
3
[0. 0. 0. 0. 1.]
(160, 160, 3)
[0. 1. 0. 0. 0.]
(160, 160, 3)
[0. 0. 1. 0. 0.]
(160, 160, 3)
2
[0. 0. 0. 1. 0.]
(160, 160, 3)
[1. 0. 0. 0. 0.]
(160, 160, 3)
3
[0. 0. 0. 1. 0.]
(160, 160, 3)
[0. 0. 0. 0. 1.]
(160, 160, 3)
[0. 1. 0. 0. 0.]
(160, 160, 3)
2
[0. 0. 1. 0. 0.]
(160, 160, 3)
[1. 0. 0. 0. 0.]
(160, 160, 3)
3
[0. 0. 1. 0. 0.]
(160, 160, 3)
[0. 0. 0. 1. 0.]
(160, 160, 3)
2
3
2
3
2
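Question 1: Looking at the face images above, briefly describe which augmentations are visible in the generated images, then explain in detail your thinking about data augmentation, including why it helps face recognition. You should consult some papers and list your references.
Answer:
Horizontal shifts, vertical shifts, and rotation.
Convolution and pooling give a CNN a degree of translation invariance, but a CNN is not inherently invariant to rotation; training on shifted and rotated copies helps the network learn to tolerate these changes.
The same person can be photographed from different angles and with different framing, so augmentation helps the model generalize.
Data augmentation enlarges the dataset, reduces overfitting, and improves generalization to images outside the training set. The training set in this project is small, which makes it hard for a model to learn from; augmentation lets us generate enough training data from very few images to build an image classifier.
Common ways to improve generalization are data augmentation and regularization. Typical augmentation methods include random cropping, random horizontal flipping, translation, rotation, adding noise, and generative-network methods (the first two are the most widely used and tend to be the most effective).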
https://nanonets.com/blog/data-augmentation-how-to-use-deep-learning-when-you-have-limited-data-part-2/
https://machinelearningmastery.com/how-to-configure-image-data-augmentation-when-training-deep-learning-neural-networks/
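Face Extraction
In face recognition, a commonly used image-processing step is face detection. Face detection automatically locates the face in an input image: it predicts a rectangular bounding box that localizes the face within the whole image. The bounding box is defined by the coordinates of its top-left corner together with its width and height. Face detection is a fairly mature task. In this project we use the Multi-Task Cascaded Convolutional Neural Network (MTCNN); you can also consult the paper "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks" to learn about face detection.
# Run the following to install mtcnn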
!pip install mtcnn
Requirement already satisfied: mtcnn in /home/leon/anaconda3/lib/python3.7/site-packages (0.1.0)
Requirement already satisfied: keras>=2.0.0 in /home/leon/anaconda3/lib/python3.7/site-packages (from mtcnn) (2.3.1)
Requirement already satisfied: opencv-python>=4.1.0 in /home/leon/anaconda3/lib/python3.7/site-packages (from mtcnn) (4.2.0.32)
Requirement already satisfied: keras-applications>=1.0.6 in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (1.0.8)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (1.1.0)
Requirement already satisfied: six>=1.9.0 in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (1.13.0)
Requirement already satisfied: pyyaml in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (5.2)
Requirement already satisfied: h5py in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (2.8.0)
Requirement already satisfied: scipy>=0.14 in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (1.3.2)
Requirement already satisfied: numpy>=1.9.1 in /home/leon/anaconda3/lib/python3.7/site-packages (from keras>=2.0.0->mtcnn) (1.17.4)
# Define the face extraction function
from PIL import Image
from mtcnn.mtcnn import MTCNN
import numpy as np
def extract_face(filename, image_size=(160, 160)):
    # Load the image
    image = Image.open(filename)
    # Convert to RGB
    image = image.convert('RGB')
    # Convert to a numpy array
    image_data = np.asarray(image)
    # Create a face detector
    detector = MTCNN()
    # Detect faces in the image
    results = detector.detect_faces(image_data)
    # The result holds the bounding boxes of all detected faces; our images contain a single face, so take the first one
    box_x, box_y, width, height = results[0]['box']
    # Handle negative coordinates
    box_x, box_y = abs(box_x), abs(box_y)
    box_x_up, box_y_up = box_x + width, box_y + height
    # Crop the face region
    face = image_data[box_y:box_y_up, box_x:box_x_up]
    print("face.shape", face.shape)
    # TODO: resize the extracted face to the required size and return it as a numpy array
    image = Image.fromarray(face)
    image = image.resize(image_size)
    face_array = np.asarray(image)
    return face_array
ran_img_file, ran_img_name = random_sample()
img = extract_face(ran_img_file)
plt.imshow(img)
#plt.show()
print(img.shape)
随機選取一張照片:
madonna
(315, 214, 3)
WARNING:tensorflow:From /home/leon/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4070: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
WARNING:tensorflow:From /home/leon/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
face.shape (44, 33, 3)
(160, 160, 3)
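The extraction function handles negative box coordinates with abs(), which moves the crop window rather than trimming it. As a possible alternative (not what the notebook does), the box can be clipped to the image bounds. The helper clamp_box below is a hypothetical name, and the sketch assumes the (x, y, width, height) box format with (x, y) at the top-left corner.

```python
def clamp_box(x, y, width, height, image_width, image_height):
    """Clip an (x, y, width, height) box to the image; return (x1, y1, x2, y2)."""
    x1 = max(0, x)
    y1 = max(0, y)
    x2 = min(image_width, x + width)
    y2 = min(image_height, y + height)
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box lies entirely outside the image")
    return x1, y1, x2, y2

# A box that starts 5 pixels left of a 100 x 100 image
print(clamp_box(-5, 10, 40, 50, 100, 100))  # (0, 10, 35, 60)
```

With the returned corners, the crop would be image_data[y1:y2, x1:x2]. For the same box, the abs() approach would instead crop columns 5 to 45.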
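Question 2: After running the code above several times and looking at the extracted faces, do you think face detection helps face recognition? Why? You should consult some papers and list your references.
Answer:
It certainly helps: recognition can only proceed once the face region has been detected accurately.
From a data-cleaning perspective, checking whether an image contains a face at all helps clean the training images.
Face recognition relies on both features and structure. The rectangular bounding box produced by face detection is the most basic structural information about a face.
Face recognition pipelines often include face alignment, and face detection is the basis for that step. Aligned face images make recognition easier for the model.
Real-world images may contain a lot of background rather than the face we care about. Face detection and alignment help separate the face from the background so that the model can focus on facial features.
Bruce Cheen. Face Detection and Face Recognition (人臉檢測和人臉識别).
https://blog.csdn.net/czp_374/article/details/81162923
How to Apply the MTCNN and FaceNet Models for Face Detection and Recognition (如何應用MTCNN和FaceNet模型實作人臉檢測及識别).
http://www.uml.org.cn/ai/201806124.asp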
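Building the Dataset
We can now apply the augmentation and face detection techniques above to build the complete dataset.
Here you can directly reuse the augmenter data_gen defined earlier and call ImageDataGenerator's random_transform to apply a random augmentation to a single image.
In addition, before building the dataset you need a mapping from person name to class, so that string names are converted to integer classes when the labels are built.
Programming exercise:
- Build a name dictionary that maps ben_afflek, elton_john, jerry_seinfeld, madonna, and mindy_kaling to 0, 1, 2, 3, and 4 respectively.
- Define a load_dataset function that walks the train or val folder, reads the five person folders inside it, and maps each folder name to a label from 0 to 4. Then iterate over all image files in each person folder and read each image. For images under train, use data_gen.random_transform to augment each image augment_times times; images under val need no augmentation.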
def extract_face2(detector, image, image_size=(160, 160)):
    # The image is already a numpy array
    image_data = image
    # Detect faces in the image with the detector passed in
    results = detector.detect_faces(image_data)
    if len(results) > 0:
        # The result holds the bounding boxes of all detected faces; our images contain a single face, so take the first one
        box_x, box_y, width, height = results[0]['box']
        # Handle negative coordinates
        box_x, box_y = abs(box_x), abs(box_y)
        box_x_up, box_y_up = box_x + width, box_y + height
        # Crop the face region
        face = image_data[box_y:box_y_up, box_x:box_x_up]
        # TODO: resize the extracted face to the required size and return it as a numpy array
        flag = True
        face_array = cv2.resize(face, image_size, interpolation=cv2.INTER_CUBIC)
    else:
        # No face found: signal failure to the caller
        flag = False
        face_array = None
    return flag, face_array
# TODO: build the name dictionary that maps ben_afflek, elton_john, jerry_seinfeld, madonna, mindy_kaling to 0-1-2-3-4
name_dict = {'ben_afflek': 0,
             'elton_john': 1,
             'jerry_seinfeld': 2,
             'madonna': 3,
             'mindy_kaling': 4}
# TODO: define the data-loading function. data_dir is the folder path, augment_times is the number of augmentations, and is_train tells training data from test data (test data is not augmented)
# This is an earlier attempt; the load_dataset function defined further below is the one actually used.
def load_dataset2(data_dir="./5-celebrity-faces-dataset/train/", augment_times=2, is_train=True):
    data_x = []
    data_y = []
    images = []
    labels = []
    if is_train:
        data_gen2 = ImageDataGenerator(rescale=.3, rotation_range=0.2, zoom_range=0.2,
                                       width_shift_range=0.2, height_shift_range=0.2)
        dataflow_generator2 = data_gen2.flow_from_directory(
            data_dir, target_size=(160, 160), batch_size=1,
            color_mode='rgb', class_mode='categorical')
        labels_dict = dataflow_generator2.class_indices
        print("labels_dict", labels_dict)
        sample_count = len(dataflow_generator2.filenames) * augment_times
        print('sample_count:', sample_count)
        # The generator loops forever, so stop once sample_count images have been drawn
        for image_data in dataflow_generator2:
            for j in range(0, len(image_data[1])):
                images.append(image_data[0][j].astype('uint8'))
                labels.append(image_data[1][j])  # already one-hot
            sample_count -= len(image_data[1])
            if sample_count <= 0:
                break
        labels = np.array(labels)
    else:
        all_dirs = os.listdir(data_dir)
        print("all_dirs:", all_dirs)
        for dir_name in all_dirs:
            all_files = os.listdir(data_dir + dir_name)
            for file_name in all_files:
                im_file = '%s/%s/%s' % (data_dir, dir_name, file_name)
                img = Image.open(im_file).convert('RGB')
                images.append(np.asarray(img))
                labels.append(name_dict[dir_name])
        # Images differ in size, so they stay in a list; integer labels become one-hot rows
        labels = np.eye(5)[np.array(labels)]
    detector = MTCNN()
    for i in range(0, len(labels)):
        flag, face = extract_face2(detector, images[i])
        if flag:
            data_x.append(face)
            data_y.append(labels[i])
    data_x = np.array(data_x)
    data_y = np.array(data_y)
    return data_x, data_y
def extract_face3(detector, filename, image_size=(160, 160)):
    # Load the image
    image = Image.open(filename)
    # Convert to RGB
    image = image.convert('RGB')
    # Convert to a numpy array
    image_data = np.asarray(image)
    # Detect faces in the image
    results = detector.detect_faces(image_data)
    # The result holds the bounding boxes of all detected faces; our images contain a single face, so take the first one
    box_x, box_y, width, height = results[0]['box']
    # Handle negative coordinates
    box_x, box_y = abs(box_x), abs(box_y)
    box_x_up, box_y_up = box_x + width, box_y + height
    # Crop the face region
    face = image_data[box_y:box_y_up, box_x:box_x_up]
    # TODO: resize the extracted face to the required size and return it as a numpy array
    image = Image.fromarray(face)
    image = image.resize(image_size)
    face_array = np.asarray(image)
    return face_array
# TODO: build the name dictionary that maps ben_afflek, elton_john, jerry_seinfeld, madonna, mindy_kaling to 0-1-2-3-4
name_dict = {'ben_afflek': 0,
             'elton_john': 1,
             'jerry_seinfeld': 2,
             'madonna': 3,
             'mindy_kaling': 4}
# TODO: define the data-loading function. data_dir is the folder path, augment_times is the number of augmentations, and is_train tells training data from test data (test data is not augmented)
def load_dataset(data_dir="./5-celebrity-faces-dataset/train/", augment_times=2, is_train=True):
    data_x = []
    data_y = []
    detector = MTCNN()  # create the detector once and reuse it for every image
    # TODO:
    for subdir in os.listdir(data_dir):
        path = os.path.join(data_dir, subdir)
        for filename in os.listdir(path):
            face = extract_face3(detector, os.path.join(path, filename))
            data_x.append(face)
            data_y.append(name_dict[subdir])
            # Test data is not augmented
            if is_train:
                for _ in range(augment_times):
                    face_aug = data_gen.random_transform(face)
                    data_x.append(face_aug)
                    data_y.append(name_dict[subdir])
    return data_x, data_y
train_x, train_y = load_dataset("./5-celebrity-faces-dataset/train/", augment_times=2, is_train=True)
test_x, test_y = load_dataset("./5-celebrity-faces-dataset/val/", is_train=False)
# Build the final training and test arrays
train_X = np.asarray(train_x)
train_Y = np.eye(5)[np.array(train_y)]
test_X = np.asarray(test_x)
test_Y = np.eye(5)[np.array(test_y)]
print(train_X.shape)
print(train_Y.shape,train_Y)
(279, 160, 160, 3)
(279, 5) [[0. 1. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 1. 0. 0. 0.]
...
[0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1.]
[0. 0. 0. 0. 1.]]
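The shape (279, 160, 160, 3) follows from the loading loop: each of the 93 training images contributes its extracted face plus augment_times augmented copies. A minimal sketch of that arithmetic, with a hypothetical helper name and assuming a face is found in every image, is:

```python
def augmented_size(num_images, augment_times):
    """Each image contributes itself plus `augment_times` augmented copies."""
    return num_images * (1 + augment_times)

print(augmented_size(93, 2))  # 279
```

For the val folder the loader is called with is_train=False, so no copies are added and the count stays at the number of images in that folder.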
index = [i for i in range(len(train_y))]
random.shuffle(index)
train_X = train_X[index]
train_Y = train_Y[index]
print(len(train_X))
print(train_X[0].shape)
print(len(train_Y))
print(train_Y)
plt.imshow(train_X[10])
279
(160, 160, 3)
279
[[1. 0. 0. 0. 0.]
[0. 0. 0. 0. 1.]
[0. 0. 1. 0. 0.]
...
[0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 0. 0. 0. 1.]]
<matplotlib.image.AxesImage at 0x7f59b0724be0>
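The shuffle above works because the same index list reorders both train_X and train_Y, so every image keeps its label. The idea can be sketched in plain Python as follows; shuffle_together is a hypothetical helper, and the seed parameter is added here only to make the example repeatable.

```python
import random

def shuffle_together(samples, labels, seed=None):
    """Return copies of both lists reordered by one shared random permutation."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    return [samples[i] for i in order], [labels[i] for i in order]

xs = list(range(10))
ys = [x * 2 for x in xs]  # each label is tied to its sample
shuffled_xs, shuffled_ys = shuffle_together(xs, ys, seed=0)
print(shuffled_xs)
print(shuffled_ys)
```

Shuffling the two lists independently would break the pairing, which is the mistake this pattern avoids.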
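Building a Convolutional Neural Network
Build a convolutional neural network to classify the faces. At the end of your code block, call model.summary() to print a summary of your model.
Question 3: In the code block below, use Keras to build a convolutional network architecture and answer the related questions.
- You may design your own convolutional network, in which case describe the steps you took (which layers you used) and why you built it that way.
- You may also follow the steps suggested in the figure above, in which case explain how well that architecture can perform on this problem.
Answer: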
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential
model = Sequential()
### TODO: define your network architecture
model.add(Conv2D(filters=32, kernel_size=3, padding='valid', activation='relu', input_shape=(160, 160, 3)))
#model.add(Conv2D(32, (3,3), input_shape=(160, 160, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=2))
#model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))
model.add(Conv2D(filters=64, kernel_size=3, padding='valid', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.5))
model.add(Conv2D(filters=128, kernel_size=3, padding='valid', activation='relu'))
model.add(GlobalAveragePooling2D())
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_40 (Conv2D) (None, 158, 158, 32) 896
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 79, 79, 32) 0
_________________________________________________________________
dropout_4 (Dropout) (None, 79, 79, 32) 0
_________________________________________________________________
conv2d_41 (Conv2D) (None, 77, 77, 64) 18496
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 38, 38, 64) 0
_________________________________________________________________
dropout_5 (Dropout) (None, 38, 38, 64) 0
_________________________________________________________________
conv2d_42 (Conv2D) (None, 36, 36, 128) 73856
_________________________________________________________________
global_average_pooling2d_2 ( (None, 128) 0
_________________________________________________________________
dropout_6 (Dropout) (None, 128) 0
_________________________________________________________________
dense_23 (Dense) (None, 5) 645
=================================================================
Total params: 93,893
Trainable params: 93,893
Non-trainable params: 0
_________________________________________________________________
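The output shapes and parameter counts in this summary can be checked by hand. The sketch below recomputes them for the three 3x3 'valid' convolutions, the two 2x2 max-pooling layers, and the final dense layer. The helper names are hypothetical and only the layer settings come from the model definition above.

```python
def conv2d_valid(size, kernel, in_channels, out_channels):
    """Output size and parameter count of a stride-1 'valid' square convolution."""
    out_size = size - kernel + 1
    params = kernel * kernel * in_channels * out_channels + out_channels  # weights + biases
    return out_size, params

def max_pool(size, pool):
    """Output size of non-overlapping max pooling (floor division)."""
    return size // pool

def simple_cnn_summary(input_size=160, num_classes=5):
    size, total, channels = input_size, 0, 3
    for block, filters in enumerate((32, 64, 128)):
        size, params = conv2d_valid(size, 3, channels, filters)
        total += params
        channels = filters
        if block < 2:  # the third convolution is followed by global average pooling
            size = max_pool(size, 2)
    total += channels * num_classes + num_classes  # final dense layer
    return size, total

print(simple_cnn_summary())  # (36, 93893)
```

The result matches the 36 x 36 feature map of the last convolution and the 93,893 total parameters reported by model.summary().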
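keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)
filepath can contain named formatting options, which are filled in with the value of epoch and the keys in logs (passed to on_epoch_end).
For example, if filepath is weights.{epoch:02d}-{val_loss:.2f}.hdf5, the saved file names will include the epoch number and the validation loss.
Parameters
filepath: string, the path where the model is saved.
monitor: the quantity to monitor.
verbose: verbosity mode, 0 or 1.
save_best_only: if save_best_only=True, the latest best model according to the monitored quantity will not be overwritten.
mode: one of {auto, min, max}. If save_best_only=True, the decision to overwrite the saved file is based on either the maximum or the minimum of the monitored quantity. For val_acc the mode should be max, for val_loss it should be min, and so on. In auto mode the direction is inferred from the name of the monitored quantity.
save_weights_only: if True, only the model's weights are saved (model.save_weights(filepath)); otherwise the full model is saved (model.save(filepath)).
period: the interval (number of epochs) between checkpoints.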
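The filepath template is an ordinary Python format string, so its effect can be previewed without Keras. The sketch below fills the example template with made-up values; the logs dictionary and its numbers are illustrative assumptions, not results from this project.

```python
template = "weights.{epoch:02d}-{val_loss:.2f}.hdf5"
logs = {"val_loss": 0.4567, "val_accuracy": 0.8}  # hypothetical metric values

# Unused keys in logs are simply ignored by str.format
filename = template.format(epoch=3, **logs)
print(filename)  # weights.03-0.46.hdf5
```

A template that names a metric missing from logs (for example val_loss when no validation data is passed to fit) would raise a KeyError when formatted this way.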
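from keras.callbacks import ModelCheckpoint
filepath = 'face.weights.best.hdf5'
# Overwrite the saved file each time the monitored metric improves.
# The default monitor is val_loss, which only exists when fit() receives validation data;
# without it Keras skips saving and emits a RuntimeWarning, as the training log shows.
checkpointer = ModelCheckpoint(filepath=filepath, verbose=1, save_best_only=True)
callbacks_list = [checkpointer]
# Run as is: compile and train the model
# Compile the model
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
history1 = model.fit(train_X, train_Y, batch_size=16, epochs=50, callbacks=callbacks_list)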
Epoch 1/50
279/279 [==============================] - 7s 24ms/step - loss: 39.8889 - accuracy: 0.1828
Epoch 2/50
/home/leon/anaconda3/lib/python3.7/site-packages/keras/callbacks/callbacks.py:707: RuntimeWarning: Can save best model only with val_acc available, skipping.
'skipping.' % (self.monitor), RuntimeWarning)
...
Epoch 49/50
279/279 [==============================] - 6s 23ms/step - loss: 1.0212 - accuracy: 0.5986
Epoch 50/50
279/279 [==============================] - 6s 22ms/step - loss: 0.9619 - accuracy: 0.6057
Model Testing
You need to write a function that automatically measures the model's accuracy.
from sklearn.metrics import fbeta_score, accuracy_score
from sklearn.metrics import classification_report
def metric_accuracy(model, test_X, test_Y, model_name):
    preds_Y = model.predict(test_X)
    correct = 0.
    for pr, y in zip(preds_Y, test_Y):
        # The predicted class is the index of the largest probability
        pr_cls = np.argmax(pr)
        if y[pr_cls] == 1:
            correct += 1
    accuracy = correct / len(preds_Y)
    print()
    print("%s Accuracy: %.3f" % (model_name, accuracy))
def metric_accuracy2(model, test_X, test_Y, model_name):
    preds_Y = model.predict(test_X)
    # TODO: compute the accuracy from the predictions preds_Y and the ground truth test_Y
    max_index = np.argmax(preds_Y, axis=1)  # argmax along each row
    print(len(max_index), max_index)
    # Turn the probabilities into one-hot predictions
    preds_Y = np.zeros(preds_Y.shape)
    for i in range(0, len(max_index)):
        preds_Y[i][max_index[i]] = 1
    print(preds_Y.shape)
    print(preds_Y[0])
    print("=" * 88)
    print(test_Y.shape)
    print(test_Y[0])
    accuracy = accuracy_score(test_Y, preds_Y)
    print()
    print("%s Accuracy: %.3f" % (model_name, accuracy))
metric_accuracy(model, test_X, test_Y, "Simple CNN")
Simple CNN Accuracy: 0.600
Simple CNN Accuracy: 0.835
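The accuracy computation does not depend on Keras: it only compares the argmax of each predicted probability row with the position of the 1 in the one-hot label. A minimal pure-Python sketch with made-up numbers is shown below; the function names and the sample values are illustrative assumptions.

```python
def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(probabilities, one_hot_labels):
    """Fraction of rows whose most likely class matches the one-hot label."""
    if not probabilities:
        raise ValueError("need at least one prediction")
    correct = sum(argmax(p) == argmax(y) for p, y in zip(probabilities, one_hot_labels))
    return correct / len(probabilities)

# Four hypothetical predictions over three classes
probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
truth = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(accuracy(probs, truth))  # 0.75
```

The third row is the only miss (predicted class 2, true class 1), giving 3 correct out of 4.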
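Advanced CNN Architecture: ResNet50
Computer vision has a number of more complex, advanced CNN architectures, such as ResNet, VGG, and Inception, which produce very good image representations. Their parameters have already been trained on very large image datasets, so the pretrained models provide strong image features. Features learned on large-scale image data can be transferred to represent face images.
In this section we use a pretrained ResNet50 to extract image features and then perform face recognition. Although ResNet50 was pretrained on images of all kinds, what it has learned about image structure can still help prediction in the face recognition task.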
import keras
from keras.models import Model, Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# The base of the model uses ResNet50 to model the raw images and extract features
resnet50_weights = "./models/resnet50_weights.h5"
resnet = keras.applications.resnet50.ResNet50(weights=None, include_top=False, input_shape=(160, 160, 3))
resnet.load_weights(resnet50_weights)
# TODO: define the top of the model yourself, using the extracted features for face recognition
resnet_face = Sequential()
resnet_face.add(Flatten(input_shape=resnet.output_shape[1:]))
#resnet_face.add(Dense(2048, activation="relu"))
#resnet_face.add(Dropout(0.5))
resnet_face.add(Dense(1024, activation="relu"))
resnet_face.add(Dropout(0.5))
resnet_face.add(Dense(5, activation='softmax'))
resnet_face_model = Model(inputs=resnet.input, outputs=resnet_face(resnet.output))
resnet_face_model.summary()
/home/leon/anaconda3/lib/python3.7/site-packages/keras_applications/resnet50.py:265: UserWarning: The output shape of `ResNet50(include_top=False)` has been changed since Keras 2.2.0.
warnings.warn('The output shape of `ResNet50(include_top=False)` '
Model: "model_71"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_61 (InputLayer) (None, 160, 160, 3) 0
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 166, 166, 3) 0 input_61[0][0]
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 80, 80, 64) 9472 conv1_pad[0][0]
_______________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 80, 80, 64) 256 conv1[0][0]
__________________________________________________________________________________________________
activation_295 (Activation) (None, 80, 80, 64) 0 bn_conv1[0][0]
__________________________________________________________________________________________________
...
________________________________________________________________________________________________
bn5c_branch2c (BatchNormalizati (None, 5, 5, 2048) 8192 res5c_branch2c[0][0]
__________________________________________________________________________________________________
add_112 (Add) (None, 5, 5, 2048) 0 bn5c_branch2c[0][0]
activation_340[0][0]
__________________________________________________________________________________________________
activation_343 (Activation) (None, 5, 5, 2048) 0 add_112[0][0]
__________________________________________________________________________________________________
sequential_25 (Sequential) (None, 5) 52434949 activation_343[0][0]
==================================================================================================
Total params: 76,022,661
Trainable params: 75,969,541
Non-trainable params: 53,120
__________________________________________________________________________________________________
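The summary shows that the small classification head holds most of the parameters. This can be verified from the shapes alone: the backbone output is 5 x 5 x 2048, which Flatten turns into 51,200 inputs for Dense(1024). The sketch below uses a hypothetical helper name; all numbers are taken from the summary above.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

flattened = 5 * 5 * 2048  # Flatten output size: 51200
head = dense_params(flattened, 1024) + dense_params(1024, 5)
total = 76_022_661  # "Total params" reported by the summary
backbone = total - head

print(head)                    # 52434949
print(backbone)                # 23587712
print(round(head / total, 3))  # 0.69
```

About 69% of the weights sit in the two dense layers, which is worth keeping in mind when training on only 279 images.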
print(len(resnet_face_model.layers))
print(resnet_face_model.layers[0])
176
<keras.engine.input_layer.InputLayer object at 0x7f1023f25e80>
# Freeze the first 10 layers so their weights are not updated during training
for layer in resnet_face_model.layers[:10]:
    layer.trainable = False
# Use the same training settings; run as is
## Compile the model
resnet_face_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
resnet_face_model.fit(train_X, train_Y, batch_size=16, epochs=50)
Epoch 1/50
279/279 [==============================] - 38s 137ms/step - loss: 19.1508 - accuracy: 0.3907
...
Epoch 49/50
279/279 [==============================] - 34s 121ms/step - loss: 0.5168 - accuracy: 0.9749
Epoch 50/50
279/279 [==============================] - 34s 121ms/step - loss: 1.2976 - accuracy: 0.9570
Model Testing
# Run as is to measure the accuracy of resnet_face_model
metric_accuracy(resnet_face_model, test_X, test_Y, "ResNet50")
ResNet50 Accuracy: 0.560
ResNet50 Accuracy: 0.556
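Question 5: Compare the results of the ResNet50 model and the CNN model. Why can ResNet50 achieve better results?
Answer:
1. ResNet50 is deeper than the simple CNN and has more convolution kernels, so it can extract more features.
2. ResNet50 comes with parameters pretrained on a large amount of data.
Question 6: Above we used a pretrained ResNet50, i.e. resnet.load_weights(resnet50_weights). Does loading pretrained parameters help on this task? Run a comparison experiment: skip loading the pretrained parameters, rerun the ResNet50 model in the code block below, and use the result to show whether pretraining helps.
Answer:
1. With pretrained parameters, the model converges faster.
2. Judging from the experiment, the model with pretrained parameters generalizes worse than the one trained from scratch? (This is the opposite of what I expected; I would welcome the instructor's guidance.)
# Rerun the ResNet50 model without loading pretrained parameters; write the complete code here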
import keras
from keras.models import Model, Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# The base of the model uses ResNet50 to model the raw images and extract features
resnet50_weights = "./models/resnet50_weights.h5"
resnet2 = keras.applications.resnet50.ResNet50(weights=None, include_top=False, input_shape=(160, 160, 3))
#resnet.load_weights(resnet50_weights)
# TODO: define the top of the model yourself, using the extracted features for face recognition
resnet_face2 = Sequential()
resnet_face2.add(Flatten(input_shape=resnet2.output_shape[1:]))
resnet_face2.add(Dense(1024, activation="relu"))
resnet_face2.add(Dropout(0.5))
resnet_face2.add(Dense(5, activation="softmax"))
resnet_face_model2 = Model(inputs=resnet2.input, outputs=resnet_face2(resnet2.output))
resnet_face_model2.summary()
Model: "model_33"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_33 (InputLayer) (None, 160, 160, 3) 0
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D) (None, 166, 166, 3) 0 input_33[0][0]
__________________________________________________________________________________________________
conv1 (Conv2D) (None, 80, 80, 64) 9472 conv1_pad[0][0]
...
__________________________________________________________________________________________________
add_96 (Add) (None, 5, 5, 2048) 0 bn5c_branch2c[0][0]
activation_291[0][0]
__________________________________________________________________________________________________
activation_294 (Activation) (None, 5, 5, 2048) 0 add_96[0][0]
__________________________________________________________________________________________________
sequential_9 (Sequential) (None, 5) 52434949 activation_294[0][0]
==================================================================================================
Total params: 76,022,661
Trainable params: 75,969,541
Non-trainable params: 53,120
__________________________________________________________________________________________________
/home/leon/anaconda3/lib/python3.7/site-packages/keras_applications/resnet50.py:265: UserWarning: The output shape of `ResNet50(include_top=False)` has been changed since Keras 2.2.0.
warnings.warn('The output shape of `ResNet50(include_top=False)` '
## Compile the model
resnet_face_model2.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
resnet_face_model2.fit(train_X, train_Y, batch_size=16, epochs=50)
Epoch 1/50
279/279 [==============================] - 39s 138ms/step - loss: 128.2206 - accuracy: 0.2151
...
Epoch 49/50
279/279 [==============================] - 34s 122ms/step - loss: 0.3852 - accuracy: 0.9211
Epoch 50/50
279/279 [==============================] - 34s 121ms/step - loss: 0.0479 - accuracy: 0.9892
ResNet50 Accuracy: 0.600
ResNet50 Accuracy: 0.753
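The first accuracy line of each run is 0.560 with pretrained weights and 0.600 without. Whether such a gap means anything depends on how many images the accuracy was measured on. The sketch below estimates the binomial standard error of an accuracy value; the function name is hypothetical, and the sample size n = 25 is an assumption for illustration, not a figure taken from the notebook.

```python
import math

def accuracy_standard_error(accuracy, num_samples):
    """Binomial standard error of an accuracy estimated from num_samples images."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_samples)

n = 25  # hypothetical validation-set size, not taken from the notebook
for acc in (0.560, 0.600):
    print(acc, round(accuracy_standard_error(acc, n), 3))
```

Under that assumption each estimate carries an uncertainty of roughly 0.1, more than twice the 0.04 gap between the runs, so a difference this small would not by itself show that pretraining hurts.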
FaceNet
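In the previous section we used a pretrained ResNet50 to extract image features; in this section we use a pretrained FaceNet to extract face features. ResNet50 learns image features from large-scale data that covers many kinds of images and is not limited to faces, whereas FaceNet is a tool built specifically for extracting features from faces.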
from keras.models import load_model
# load the model
model = load_model('./models/facenet_keras.h5')
# summarize input and output shape
print(model.inputs)
print(model.outputs)
model.summary()
[<tf.Tensor 'input_1_1:0' shape=(?, 160, 160, 3) dtype=float32>]
[<tf.Tensor 'Bottleneck_BatchNorm/cond/Merge:0' shape=(?, 128) dtype=float32>]
Model: "inception_resnet_v1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 160, 160, 3) 0
__________________________________________________________________________________________________
Conv2d_1a_3x3 (Conv2D) (None, 79, 79, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
Conv2d_1a_3x3_BatchNorm (BatchN (None, 79, 79, 32) 96 Conv2d_1a_3x3[0][0]
__________________________________________________________________________________________________
...
__________________________________________________________________________________________________
Block8_6_Branch_0_Conv2d_1x1_Ba (None, 3, 3, 192) 576 Block8_6_Branch_0_Conv2d_1x1[0][0
__________________________________________________________________________________________________
Block8_6_Branch_1_Conv2d_0c_3x1 (None, 3, 3, 192) 576 Block8_6_Branch_1_Conv2d_0c_3x1[0
__________________________________________________________________________________________________
Block8_6_Branch_0_Conv2d_1x1_Ac (None, 3, 3, 192) 0 Block8_6_Branch_0_Conv2d_1x1_Batc
__________________________________________________________________________________________________
Block8_6_Branch_1_Conv2d_0c_3x1 (None, 3, 3, 192) 0 Block8_6_Branch_1_Conv2d_0c_3x1_B
__________________________________________________________________________________________________
Block8_6_Concatenate (Concatena (None, 3, 3, 384) 0 Block8_6_Branch_0_Conv2d_1x1_Acti
Block8_6_Branch_1_Conv2d_0c_3x1_A
__________________________________________________________________________________________________
Block8_6_Conv2d_1x1 (Conv2D) (None, 3, 3, 1792) 689920 Block8_6_Concatenate[0][0]
__________________________________________________________________________________________________
Block8_6_ScaleSum (Lambda) (None, 3, 3, 1792) 0 Block8_5_Activation[0][0]
Block8_6_Conv2d_1x1[0][0]
__________________________________________________________________________________________________
AvgPool (GlobalAveragePooling2D (None, 1792) 0 Block8_6_ScaleSum[0][0]
__________________________________________________________________________________________________
Dropout (Dropout) (None, 1792) 0 AvgPool[0][0]
__________________________________________________________________________________________________
Bottleneck (Dense) (None, 128) 229376 Dropout[0][0]
__________________________________________________________________________________________________
Bottleneck_BatchNorm (BatchNorm (None, 128) 384 Bottleneck[0][0]
==================================================================================================
Total params: 22,808,144
Trainable params: 22,779,312
Non-trainable params: 28,832
__________________________________________________________________________________________________
/home/leon/anaconda3/lib/python3.7/site-packages/keras/engine/saving.py:341: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
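# TODO: use load_model to load the model from `./models/facenet_keras.h5`
from keras.models import load_model, Sequential, Model
from keras.layers import Dense, Dropout
# The bottom of the model uses FaceNet to model the raw images and extract features.
# Load the pretrained FaceNet model.
facenet_model = load_model('./models/facenet_keras.h5')
# TODO: define your own top layers that use the extracted features for face recognition
facenet_face = Sequential()
#facenet_face.add(Dense(1024, input_shape=facenet_model.output_shape, activation="relu"))
#facenet_face.add(Dropout(0.5))
# Softmax head with 5 outputs, one per celebrity in the dataset
facenet_face.add(Dense(5, activation='softmax'))
# Chain the head onto FaceNet's 128-dimensional output
facenet_face_model = Model(inputs=facenet_model.input, outputs=facenet_face(facenet_model.output))
facenet_face_model.summary()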
Model: "model_42"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 160, 160, 3) 0
__________________________________________________________________________________________________
Conv2d_1a_3x3 (Conv2D) (None, 79, 79, 32) 864 input_1[0][0]
__________________________________________________________________________________________________
Conv2d_1a_3x3_BatchNorm (BatchN (None, 79, 79, 32) 96 Conv2d_1a_3x3[0][0]
__________________________________________________________________________________________________
...
__________________________________________________________________________________________________
Block8_6_ScaleSum (Lambda) (None, 3, 3, 1792) 0 Block8_5_Activation[0][0]
Block8_6_Conv2d_1x1[0][0]
__________________________________________________________________________________________________
AvgPool (GlobalAveragePooling2D (None, 1792) 0 Block8_6_ScaleSum[0][0]
__________________________________________________________________________________________________
Dropout (Dropout) (None, 1792) 0 AvgPool[0][0]
__________________________________________________________________________________________________
Bottleneck (Dense) (None, 128) 229376 Dropout[0][0]
__________________________________________________________________________________________________
Bottleneck_BatchNorm (BatchNorm (None, 128) 384 Bottleneck[0][0]
__________________________________________________________________________________________________
sequential_21 (Sequential) (None, 5) 645 Bottleneck_BatchNorm[0][0]
==================================================================================================
Total params: 22,808,789
Trainable params: 22,779,957
Non-trainable params: 28,832
__________________________________________________________________________________________________
/home/leon/anaconda3/lib/python3.7/site-packages/keras/engine/saving.py:341: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually.
warnings.warn('No training configuration found in save file: '
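The parameter counts in the summary above can be checked by hand. The sketch below assumes the usual Keras formulas (a Dense layer has in × out weights plus one bias per output; a Conv2D layer has kh × kw × in × out weights plus one bias per filter) and assumes that the first convolution and the Bottleneck layer have no bias, which is what their reported counts imply. The helper names are illustrative and not part of the project code.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Number of parameters in a fully connected layer."""
    return n_in * n_out + (n_out if use_bias else 0)


def conv2d_params(kh, kw, c_in, c_out, use_bias=True):
    """Number of parameters in a 2D convolution layer."""
    return kh * kw * c_in * c_out + (c_out if use_bias else 0)


# Softmax head: 128 FaceNet features -> 5 classes, with bias
head = dense_params(128, 5)
# Bottleneck (Dense): 1792 -> 128, assumed to have no bias
bottleneck = dense_params(1792, 128, use_bias=False)
# Conv2d_1a_3x3: 3x3 kernel, 3 input channels, 32 filters, assumed to have no bias
first_conv = conv2d_params(3, 3, 3, 32, use_bias=False)

# Total reported for the FaceNet backbone alone
backbone_total = 22_808_144

print(head, bottleneck, first_conv, backbone_total + head)
# 645 229376 864 22808789
```

The last number matches the total reported for the combined model, so the only parameters added on top of FaceNet are the 645 in the softmax head. The trainable count grows by the same 645, which shows that the FaceNet layers were left trainable rather than frozen.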
train_XX = []
for x in train_X:
x_image = Image.fromarray(x)
x_image = x_image.resize((160, 160))
train_XX.append(np.asarray(x_image))
train_X2 = np.array(train_XX)
train_X2[0].shape
(160, 160, 3)
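The same resize loop is repeated for the test set later. A small helper would avoid the duplication; this is a minimal sketch, assuming each image is an H×W×3 `uint8` array. `resize_batch` is a hypothetical name, and the random array only stands in for `train_X`.

```python
import numpy as np
from PIL import Image


def resize_batch(images, size=(160, 160)):
    """Resize each HxWx3 uint8 image to `size` and stack the results."""
    resized = [np.asarray(Image.fromarray(img).resize(size)) for img in images]
    return np.stack(resized)


# Stand-in data: 4 random RGB images of 96x96 pixels (not the real dataset)
dummy = np.random.randint(0, 256, size=(4, 96, 96, 3), dtype=np.uint8)
batch = resize_batch(dummy)
print(batch.shape)  # (4, 160, 160, 3)
```

With such a helper, the training and test sets would each be prepared in one call, for example `train_X2 = resize_batch(train_X)`.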
# Compile the model
facenet_face_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
facenet_face_model.fit(train_X2, train_Y, batch_size=8, epochs=50)
Epoch 1/50
279/279 [==============================] - 29s 106ms/step - loss: 0.7062 - accuracy: 0.7491
Epoch 2/50
279/279 [==============================] - 18s 64ms/step - loss: 0.3518 - accuracy: 0.8781
...
Epoch 49/50
279/279 [==============================] - 18s 64ms/step - loss: 0.0768 - accuracy: 0.9749
Epoch 50/50
279/279 [==============================] - 18s 64ms/step - loss: 0.0727 - accuracy: 0.9821
<keras.callbacks.callbacks.History at 0x7f1188d7f160>
test_XX = []
for x in test_X:
x_image = Image.fromarray(x)
x_image = x_image.resize((160, 160))
test_XX.append(np.asarray(x_image))
test_X2 = np.array(test_XX)
test_X2[0].shape
preds_Y = facenet_face_model.predict(test_X2)
correct = 0.
for pr, y in zip(preds_Y, test_Y):
pr_cls = np.argmax(pr)
if y[pr_cls] == 1:
correct += 1
accuracy = correct / len(preds_Y)
print("FaceNet Accuracy: %.3f" % accuracy)
FaceNet Accuracy: 0.960
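The counting loop above can also be written as a vectorized comparison. This is a sketch of an equivalent computation, assuming the labels are one-hot encoded as in `test_Y`. `one_hot_accuracy` is a hypothetical helper name and the small arrays are made-up stand-ins for `preds_Y` and `test_Y`.

```python
import numpy as np


def one_hot_accuracy(probs, one_hot_labels):
    """Fraction of rows whose most probable class matches the one-hot label."""
    pred_cls = np.argmax(probs, axis=1)
    true_cls = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(pred_cls == true_cls))


# Made-up predictions for 4 samples and 3 classes
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],  # predicted class 2, true class 0: a miss
    [0.6, 0.3, 0.1],
])
labels = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
])

print(one_hot_accuracy(probs, labels))  # 0.75
```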
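Question 7: Evaluate the FaceNet model and explain why it outperforms the ResNet from the previous section.
Answer:
FaceNet performs far better: it reaches a test accuracy of 0.960, against 0.56 to 0.60 for ResNet50, and it converges faster (training accuracy is already about 0.88 after the second epoch). The reason is that its parameters were pretrained specifically on face images.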
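The FaceNet paper listed in the references trains its embeddings with a triplet loss. This loss pulls two images of the same person together and pushes images of different people apart, which helps explain why a simple softmax head on top of the embeddings works so well. Below is a minimal NumPy sketch of that loss for a single triplet. It is not the training code of the pretrained model used here, and the function name, toy vectors, and default margin are illustrative assumptions.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one (anchor, positive, negative) embedding triple.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (in squared Euclidean distance).
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))


# Toy 2-D "embeddings" (real FaceNet embeddings have 128 dimensions)
a = np.array([1.0, 0.0])
same_person = np.array([1.0, 0.0])
other_person = np.array([0.0, 1.0])

# Well separated triplet: no loss
print(triplet_loss(a, same_person, other_person))
# Roles swapped: the "positive" is far away, so the loss is large
print(triplet_loss(a, other_person, same_person))
```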
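Question 8:
- First, draw a table below that lists the results of all the experiments above.
- Then summarize the project: which techniques do you think matter most for recognition accuracy in this face recognition project? Base your analysis on the experimental results above.
- Finally, briefly describe which other techniques help substantially with face recognition, and list your references.
Answer: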
Summary of experimental results
| Model | Accuracy |
|---|---|
| CNN | 0.44 |
| ResNet50 (no pretraining) | 0.60 |
| ResNet50 (pretrained) | 0.56 |
| FaceNet | 0.96 |
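The factors that matter most for recognition accuracy in this face recognition project:
1. Face detection: the face must be detected (extracted) accurately;
2. Network design: deeper, more complex networks are better at extracting features;
3. Training data: training on the augmented dataset gives better results;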
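As an illustration of the third point, one of the simplest augmentations for face images is a horizontal flip, which doubles the training set without changing the labels. The sketch below assumes the images are stored as an N×H×W×C array. It is not the augmentation pipeline used earlier in this project, and `augment_with_flips` is a hypothetical helper name.

```python
import numpy as np


def augment_with_flips(images, labels):
    """Append a horizontally mirrored copy of every image (labels are reused)."""
    flipped = images[:, :, ::-1, :]  # reverse the width axis
    return np.concatenate([images, flipped]), np.concatenate([labels, labels])


# Stand-in data: 3 random 8x8 RGB images with one-hot labels for 5 classes
images = np.random.randint(0, 256, size=(3, 8, 8, 3), dtype=np.uint8)
labels = np.eye(5)[[0, 2, 4]]

aug_images, aug_labels = augment_with_flips(images, labels)
print(aug_images.shape, aug_labels.shape)  # (6, 8, 8, 3) (6, 5)
```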
References:
https://arxiv.org/pdf/1503.03832.pdf
https://www.cnblogs.com/lijie-blog/p/10168073.html