A Simple, Fast Style Transfer (Using VGG19): Notes

What is Style Transfer

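Style transfer means rendering image A in the style of image B while leaving the content of image A unchanged. For example, suppose the left image below is your own work and the middle image is Van Gogh's The Starry Night; the right image is then the output of the style transfer algorithm.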

How to do it?

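There are many ways to implement style transfer. The more sophisticated algorithms use GANs and produce very polished results, but they need (a lot of) compute (and money). Some refinements bring the cost of GAN-style models down to a reasonable level, for example by cutting an image into many small tiles, transferring the style of each tile separately, and stitching the tiles back together.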

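But there is a simpler, faster method that needs no GAN at all, only a pre-trained image classification CNN. To see why it works, recall how a CNN classifies images. A CNN classifier is a stack of many convolutional layers. The shallow layers see only a small patch of the image, so they pick up fine details such as animal fur or the texture of a metal surface. Deeper layers see progressively larger regions (in VGG every kernel is 3×3; it is the effective receptive field that grows with depth, not the kernel size), so they pick up more complete features such as eyes, tails, or wheels. In other words, what the shallow layers describe is close to the style of an image (brush texture in an oil painting, the distribution of color patches), while the deeper layers describe mostly its content (eyes, tails, wheels). A simple style transfer model exploits exactly this: different layers of a CNN classifier capture different things, which makes fast, low-cost style transfer possible.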
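To make the "shallow layers see small patches, deep layers see large regions" point concrete, here is a small plain-Python sketch that estimates the theoretical receptive field of each VGG19 layer. It assumes the standard VGG19 layout (3×3 convolutions with stride 1, 2×2 max-pooling with stride 2, and 2, 2, 4, 4, 4 convolutions per block); the helper names `vgg19_layers` and `receptive_fields` are made up for this illustration.

```python
def vgg19_layers():
    """List (name, kernel_size, stride) for the convolutional part of VGG19."""
    convs_per_block = [2, 2, 4, 4, 4]
    layers = []
    for block, n_convs in enumerate(convs_per_block, start=1):
        for k in range(1, n_convs + 1):
            layers.append((f"block{block}_conv{k}", 3, 1))
        layers.append((f"block{block}_pool", 2, 2))
    return layers

def receptive_fields(layers):
    """Theoretical receptive field (in input pixels) after each layer."""
    rf, jump = 1, 1          # jump = distance in input pixels between neighbouring outputs
    result = {}
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        result[name] = rf
    return result

rf = receptive_fields(vgg19_layers())
for name in ["block1_conv1", "block3_conv1", "block5_conv2"]:
    print(name, rf[name])
# block1_conv1 3
# block3_conv1 24
# block5_conv2 188
```

So even though every kernel is 3×3, a unit in block1_conv1 looks at a 3×3 patch, while a unit in block5_conv2 is influenced by a region roughly 188 pixels wide, which is most of a 224×224 image.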

How it is done

Tools needed: VGG19 (in TensorFlow), Python

(We don't even need a GPU.)

Flow

Feed the input image into a pre-trained image classification architecture such as VGG or ResNet.
Compute the loss:

1) Content: take the content-layer output of the content image, $F^{l} \in \mathcal{R}^{m,n}$, and flatten it into a vector $\mathbf{f}^{l} \in \mathcal{R}^{m*n,1}$. Flatten the same layer's output for the generated image, $P^{l} \in \mathcal{R}^{m,n}$, into a vector $\mathbf{p}^{l} \in \mathcal{R}^{m*n,1}$ in the same way. The content loss is then half the squared Euclidean distance between $\mathbf{f}^{l}$ and $\mathbf{p}^{l}$:

$$L_{content}(\mathbf{p},\mathbf{f},l)=\frac{1}{2}\sum_{i,j}(F_{i,j}^l-P_{i,j}^l)^2$$

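2) Style loss: the dot product of two vectors measures how similar they are (that is, how closely they point in the same direction). Taking dot products between flattened feature maps therefore tells us which features tend to respond together. Note that because each feature map is flattened into a vector, the dot product carries no spatial information; it only describes finer-grained texture.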

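$$G^l_{i,j}=\sum_k F^l_{i,k}F^l_{j,k}$$

Here $G^l$ is the Gram matrix of layer $l$: entry $(i,j)$ is the inner product of flattened feature maps $i$ and $j$. The style loss $L_{style}$ is then the squared difference between the Gram matrices of the generated image and of the style image.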

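3) A somewhat intuitive explanation of why the content loss uses a difference and the style loss uses a dot product: the content features extracted from VGG are like greyscale sketches of the content image. That is, $F_{content}$ can be pictured as black-and-white lines outlining the content, so to check whether the generated image has the content represented by $F_{content}$, we only need to check whether each position holds a similar value. Style, by contrast, is a local texture: think of the brushwork in an oil painting, or the large swirling gradients in Van Gogh's The Starry Night. So rather than whether individual values are high or low, what matters is whether they vary in the same direction as in the style image, and that direction is captured well by the dot product.
Compute the gradients w.r.t. the input image pixels $P$. Note that these gradients are not back-propagated into the VGG weights; they are back-propagated to the input image, and the VGG weights stay fixed throughout.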
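The flow above (frozen network, content loss, gradient with respect to the input) can be sketched in plain Python on a toy problem. This is only an illustration under loud assumptions: the "network" is a fixed 2×3 linear map `W` standing in for VGG, the "images" are 3-element vectors, and all names (`features`, `content_loss`, `grad_wrt_input`) are made up for this example.

```python
# Toy stand-in for a frozen network: features = W @ x. W is never updated.
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5]]

def features(x):
    """Apply the frozen 'network' to an input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def content_loss(P, F):
    """0.5 * sum of squared differences, as in L_content."""
    return 0.5 * sum((f - p) ** 2 for f, p in zip(F, P))

def grad_wrt_input(x, F):
    """Gradient of the content loss w.r.t. the input x: W^T (P - F)."""
    P = features(x)
    diff = [p - f for p, f in zip(P, F)]
    return [sum(W[r][c] * diff[r] for r in range(len(W))) for c in range(len(x))]

def clip_0_1(x):
    """Keep 'pixel' values inside [0, 1] after each update."""
    return [min(1.0, max(0.0, xi)) for xi in x]

content = [0.2, 0.8, 0.4]      # hypothetical content "image"
F = features(content)          # target features, computed once
x = [0.5, 0.5, 0.5]            # the generated "image" we optimize

initial_loss = content_loss(features(x), F)
for _ in range(200):
    g = grad_wrt_input(x, F)
    x = clip_0_1([xi - 0.1 * gi for xi, gi in zip(x, g)])   # update the input only
final_loss = content_loss(features(x), F)

print(initial_loss, final_loss)
```

Only `x` changes during the loop; `W` is read but never written, which is the same division of roles as between the input image and the VGG weights. Note also that `x` need not end up equal to `content`: it only has to produce the same features.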

Implementation

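First we load the content image and the style image. VGG was trained on 224×224 inputs, so here both images are resized to 224×224. (With include_top=False, as used below, the convolutional part also accepts other sizes.)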

content_image = ...  # load your content image here: float32 tensor, shape (1, 224, 224, 3), values in [0, 1]
style_image = ...  # load your style image here, same format

Load the VGG19 model from Keras:

import tensorflow as tf
vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')

Let's see which layers VGG19 contains:

print([layer.name for layer in vgg.layers])

block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool

It may look mundane, but this step is where the magic happens: we pick the content layer and the style layers from VGG19. (Try picking different layers to represent content and style, and see what you get.)

content_layers = ['block5_conv2']
style_layers = ['block1_conv1',
				'block2_conv1',
				'block3_conv1',
				'block4_conv1',
				'block5_conv1']
num_content_layers = len(content_layers)
num_style_layers = len(style_layers)

Write a function that wraps the chosen layers into a model:

def vgg_layers(layer_names):
	"""Creates a VGG model that returns a list of intermediate output values."""
	vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
	vgg.trainable = False # freeze the VGG weights: what we optimize is the input, not the weights
	# vgg.get_layer(name).output is a symbolic tensor, and so is vgg.input below;
	# with include_top=False the spatial size of vgg.input is left unspecified
	outputs = [vgg.get_layer(name).output for name in layer_names]
	model = tf.keras.Model([vgg.input],outputs)
	return model
style_extractor = vgg_layers(style_layers)
style_outputs = style_extractor(style_image*255)

Next, a function that builds the Gram matrix used in the style loss; this is the dot product of feature vectors mentioned earlier. We take the first conv layer of each of the 5 blocks as style layers. Within each layer, we compute the dot product of every channel's flattened feature map with every channel's (including its own), which gives a (num_channels × num_channels) matrix, averaged over the number of spatial locations. gram_matrix implements this:

$$G^l_{cd}=\frac{\sum_{ij}F^l_{ijc}(x)F^l_{ijd}(x)}{IJ}$$

def gram_matrix(input_tensor):
	# (b,i,j,c) = (batch index, row i, column j, feature channel c)
	result = tf.linalg.einsum('bijc,bijd->bcd', input_tensor, input_tensor)
	input_shape = tf.shape(input_tensor)
	# dim[0] is the batch size, so IJ = dim[1]*dim[2]
	num_locations = tf.cast(input_shape[1]*input_shape[2],tf.float32)
	return result/num_locations
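To see what the einsum 'bijc,bijd->bcd' computes, here is a plain-Python version for a single image written with explicit loops. It is only a hand-checkable sketch: `gram_matrix_loops` and the tiny 2×2×2 feature map are invented for illustration and are not part of the original code.

```python
def gram_matrix_loops(feat):
    """feat[i][j][c]: feature map with I rows, J columns, C channels.

    Returns the C x C matrix G[c][d] = sum_ij feat[i][j][c] * feat[i][j][d] / (I*J).
    """
    I, J, C = len(feat), len(feat[0]), len(feat[0][0])
    G = [[0.0] * C for _ in range(C)]
    for i in range(I):
        for j in range(J):
            for c in range(C):
                for d in range(C):
                    G[c][d] += feat[i][j][c] * feat[i][j][d]
    return [[v / (I * J) for v in row] for row in G]

# 2 x 2 spatial positions, 2 channels
feat = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
G = gram_matrix_loops(feat)
print(G)  # [[21.0, 25.0], [25.0, 30.0]]

# Same values, spatial positions rearranged: the Gram matrix does not change.
shuffled = [[[7.0, 8.0], [5.0, 6.0]],
            [[3.0, 4.0], [1.0, 2.0]]]
print(gram_matrix_loops(shuffled) == G)  # True
```

The second check illustrates the earlier remark that the Gram matrix carries no spatial information: moving features around the image leaves it unchanged, so it describes texture rather than layout.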

Wrap loss into the model:

class StyleContentModel(tf.keras.models.Model):
	def __init__(self, style_layers, content_layers):
		super(StyleContentModel, self).__init__()
		self.vgg = vgg_layers(style_layers+content_layers)
		self.style_layers = style_layers
		self.content_layers = content_layers
		self.num_style_layers = len(style_layers)
		self.vgg.trainable = False
	
	def call(self, inputs):
		"""Expects float input in [0,1]"""
		inputs = inputs*255.0
		preprocessed_input = tf.keras.applications.vgg19.preprocess_input(inputs)
		outputs = self.vgg(preprocessed_input)
		style_outputs, content_outputs = (outputs[:self.num_style_layers],
									outputs[self.num_style_layers:])
		style_outputs = [gram_matrix(style_output) for style_output in style_outputs]
		content_dict = {content_name:value for content_name,value in zip(self.content_layers, content_outputs)}
		style_dict = {style_name:value for style_name,value in zip(self.style_layers, style_outputs)}
		return {'content':content_dict, 'style':style_dict}

extractor = StyleContentModel(style_layers, content_layers)
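The model scales its input by 255 because tf.keras.applications.vgg19.preprocess_input expects pixel values in [0, 255]. As a rough plain-Python sketch of what that preprocessing does to a single pixel (convert RGB to BGR, then subtract the ImageNet per-channel means, with no further scaling); `preprocess_pixel` and `IMAGENET_BGR_MEAN` are illustrative names, not part of Keras:

```python
# ImageNet channel means in BGR order, as used by the VGG preprocessing
IMAGENET_BGR_MEAN = [103.939, 116.779, 123.68]

def preprocess_pixel(rgb_0_1):
    """Mimic VGG19 preprocessing for one RGB pixel given in [0, 1]."""
    r, g, b = [v * 255.0 for v in rgb_0_1]   # back to the [0, 255] range
    bgr = [b, g, r]                          # RGB -> BGR
    return [v - m for v, m in zip(bgr, IMAGENET_BGR_MEAN)]

pure_red = preprocess_pixel([1.0, 0.0, 0.0])
print(pure_red)  # approximately [-103.939, -116.779, 131.32]
```

This is why the images are kept in [0, 1] everywhere outside the model: the conversion to the range VGG was trained on happens in one place, inside call.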

Compute the gradients and start backpropagating:

style_targets = extractor(style_image)['style']
content_targets = extractor(content_image)['content']

# the image being optimized: a tf.Variable initialized from the content image
image = tf.Variable(content_image)

# the model multiplies its input by 255 internally, so the image itself is kept in [0,1]
def clip_0_1(image):
	return tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=1.0)

# define an optimizer
opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)

# how much the style loss and the content loss each contribute to the total loss
style_weight, content_weight=1e-2, 1e4
def style_content_loss(outputs):
	style_outputs = outputs['style']
	content_outputs = outputs['content']
	style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2) for name in style_outputs.keys()])
	style_loss *= style_weight/num_style_layers
	content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2) for name in content_outputs.keys()])
	content_loss *= content_weight/num_content_layers
	return style_loss + content_loss
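To get a feel for how the two weights combine, here is the same arithmetic in plain Python on made-up numbers. The dictionaries of flat lists stand in for the per-layer tensors, and `mse` and `total_loss` are hypothetical helpers mirroring style_content_loss, not part of the original code.

```python
def mse(a, b):
    """Mean squared difference between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def total_loss(style_out, style_tgt, content_out, content_tgt,
               style_weight=1e-2, content_weight=1e4):
    """Per-layer MSEs, averaged over layers, then weighted and summed."""
    style = sum(mse(style_out[k], style_tgt[k]) for k in style_out)
    style *= style_weight / len(style_out)
    content = sum(mse(content_out[k], content_tgt[k]) for k in content_out)
    content *= content_weight / len(content_out)
    return style + content

# Made-up values: two style layers (Gram entries), one content layer
style_out = {"s1": [10.0, 20.0], "s2": [0.0, 0.0]}
style_tgt = {"s1": [0.0, 0.0], "s2": [0.0, 0.0]}
content_out = {"c1": [0.1, 0.0]}
content_tgt = {"c1": [0.0, 0.0]}

loss = total_loss(style_out, style_tgt, content_out, content_tgt)
print(loss)  # approximately 51.25 (style part 1.25, content part 50.0)
```

The weights differ by six orders of magnitude, presumably because Gram entries (averaged products of activations) and raw activation differences live on very different scales; changing the ratio is the main knob for trading style against content.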

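That is the whole setup. What remains is the training loop (in TensorFlow 2 this is a tf.GradientTape loop rather than a tf.Session). If you are interested, continue with the second link below; I won't keep copying and pasting here (otherwise I couldn't tag this post as original, haha), so this is where I stop. Thanks for reading.

References: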

https://arxiv.org/abs/1508.06576

https://www.tensorflow.org/tutorials/generative/style_transfer
