
Transfer Learning in Practice: Creating Images with a Different Style Using Deep Learning

Contents

  • 1. Introduction
  • 2. Code Walkthrough

1. Introduction

Training environment: Google Colab

Training time: < 15 min

Paper: https://arxiv.org/abs/1508.06576

In this tutorial, we will learn how to use deep learning to compose an image in the style of another image (Picasso or Van Gogh style). This is known as neural style transfer! It is a technique outlined in Leon Gatys' paper, A Neural Algorithm of Artistic Style, which is well worth reading.

So, the question is: what is neural style transfer?

Answer: neural style transfer is an optimization technique that takes three images: a content image, a style reference image (such as an artwork by a famous painter), and the input image whose style you want to transform. It blends them together so that the input image is transformed into a new image that combines the content of the content image with the style of the style reference image.

For example, let's take this sea turtle and Katsushika Hokusai's The Great Wave off Kanagawa:

[Image: a green sea turtle]
[Image: Katsushika Hokusai's The Great Wave off Kanagawa]

What would it look like if Hokusai had decided to paint this turtle in that style? Something like this?

[Image: the turtle rendered in the style of The Great Wave off Kanagawa]

Is this magic, or just deep learning? Fortunately, there is no sorcery involved: style transfer is a fun technique that showcases the capabilities and internal representations of neural networks.

The principle of neural style transfer is to define two distance functions: one that describes how different the content of two images is, and one that describes the difference in style between two images. Then, given three images (a desired style image, a desired content image, and the input image, initialized with the content image), we try to transform the input image so as to minimize its content distance to the content image and its style distance to the style image. In summary, we take the base input image, a content image we want to match, and a style image we want to match. We transform the base input image by minimizing the content and style distances (losses) with backpropagation, creating an image that matches the content of the content image and the style of the style image.
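In symbols (a small sketch in our own notation, with x the generated image, p the content image, a the style image, and weights alpha and beta that are not named in the text above):

\[ L_{\text{total}}(x) \;=\; \alpha \, L_{\text{content}}(p, x) \;+\; \beta \, L_{\text{style}}(a, x) \]

We minimize L_total with respect to the pixels of x, which is exactly what the training loop later in this post does with the Adam optimizer (content_weight and style_weight play the roles of alpha and beta).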

其中将涉及的具體概念:在這個過程中,我們将圍繞以下概念建立實踐經驗和發展直覺。

  • Eager Execution: use TensorFlow's imperative programming environment, which evaluates operations immediately.
  • Using the Functional API to define a model: we will build a subset of our model that gives us access to the necessary intermediate activations via the Functional API.
  • Leveraging the feature maps of a pretrained model: learn how to use a pretrained model and its feature maps.
  • Creating a custom training loop: we will look at how to set up an optimizer to minimize a given loss with respect to the input parameters.

We will follow the general steps to perform style transfer: visualize the data, do some basic preprocessing/preparation of the data, set up the loss functions, create the model, and optimize for the loss function.

2. Code Walkthrough

Download the images (environment: Google Colab)

import os
img_dir = '/tmp/nst'
if not os.path.exists(img_dir):
    os.makedirs(img_dir)
!wget --quiet -P /tmp/nst/ https://upload.wikimedia.org/wikipedia/commons/d/d7/Green_Sea_Turtle_grazing_seagrass.jpg
!wget --quiet -P /tmp/nst/ https://upload.wikimedia.org/wikipedia/commons/0/0a/The_Great_Wave_off_Kanagawa.jpg
!wget --quiet -P /tmp/nst/ https://upload.wikimedia.org/wikipedia/commons/b/b4/Vassily_Kandinsky%2C_1913_-_Composition_7.jpg
!wget --quiet -P /tmp/nst/ https://upload.wikimedia.org/wikipedia/commons/0/00/Tuebingen_Neckarfront.jpg
!wget --quiet -P /tmp/nst/ https://upload.wikimedia.org/wikipedia/commons/6/68/Pillars_of_creation_2014_HST_WFC3-UVIS_full-res_denoised.jpg
!wget --quiet -P /tmp/nst/ https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg/1024px-Van_Gogh_-_Starry_Night_-_Google_Art_Project.jpg
# save the content image (an insect photo) as chongzi.jpg, the name used below as content_path
!wget --quiet -O /tmp/nst/chongzi.jpg https://img-blog.csdnimg.cn/20210403204754742.jpg


After the code runs, the nst folder under /tmp contains the images we requested.

Import the required dependencies and configuration

import matplotlib.pyplot as plt
import matplotlib as mpl

mpl.rcParams['figure.figsize'] = (10,10)
mpl.rcParams['axes.grid'] = False

import numpy as np
from PIL import Image
import time
import functools           


%tensorflow_version 1.x
import tensorflow as tf

from tensorflow.python.keras.preprocessing import image as kp_image
from tensorflow.python.keras import models 
from tensorflow.python.keras import losses
from tensorflow.python.keras import layers
from tensorflow.python.keras import backend as K           


我們将從啟用 Eager execution 開始。Eager execution允許我們以最清晰和最易讀的方式完成這項技術。

tf.enable_eager_execution()
print("Eager execution: {}".format(tf.executing_eagerly()))           


Define our image paths

# Set up some global values here
content_path = '/tmp/nst/chongzi.jpg'
style_path = '/tmp/nst/The_Great_Wave_off_Kanagawa.jpg'           


Visualize the input images

def load_img(path_to_img):
  max_dim = 512
  img = Image.open(path_to_img)
  long = max(img.size)
  scale = max_dim/long
  img = img.resize((round(img.size[0]*scale), round(img.size[1]*scale)), Image.ANTIALIAS)
  
  img = kp_image.img_to_array(img)
  
  # We need to broadcast the image array such that it has a batch dimension 
  img = np.expand_dims(img, axis=0)
  return img           


def imshow(img, title=None):
  # Remove the batch dimension
  out = np.squeeze(img, axis=0)
  # Normalize for display 
  out = out.astype('uint8')
  plt.imshow(out)
  if title is not None:
    plt.title(title)
  plt.imshow(out)           


plt.figure(figsize=(10,10))

content = load_img(content_path).astype('uint8')
style = load_img(style_path).astype('uint8')

plt.subplot(1, 2, 1)
imshow(content, 'Content Image')

plt.subplot(1, 2, 2)
imshow(style, 'Style Image')
plt.show()           


[Output: the content image (left) and the style image (right) displayed side by side]

The output we expect is the input image transformed into a new image that combines the content of the content image with the style of the style reference image.

Prepare the data: let's create some methods that make loading and preprocessing the images easy. We perform the same preprocessing steps as expected by the VGG training process. VGG networks are trained on images in which each channel is normalized by the mean [103.939, 116.779, 123.68] (channels in BGR order).

def load_and_process_img(path_to_img):
  img = load_img(path_to_img)
  img = tf.keras.applications.vgg19.preprocess_input(img)
  return img           
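As a quick sanity check of that preprocessing (our own side example, not part of the original walkthrough): vgg19.preprocess_input converts RGB to BGR and subtracts those channel means, so a pixel equal to the means should map to zeros.

import numpy as np
import tensorflow as tf

# one RGB pixel exactly at the VGG channel means
pixel = np.array([[[[123.68, 116.779, 103.939]]]], dtype=np.float32)
out = tf.keras.applications.vgg19.preprocess_input(pixel.copy())
print(out)  # expect roughly [[[[0. 0. 0.]]]] (BGR order, means removed)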


In order to view the outputs of our optimization, we need to perform the inverse preprocessing step. Furthermore, since our optimized image may take values anywhere between negative and positive infinity, we must clip it to keep the values within the 0-255 range.

def deprocess_img(processed_img):
  x = processed_img.copy()
  if len(x.shape) == 4:
    x = np.squeeze(x, 0)
  assert len(x.shape) == 3, ("Input to deprocess image must be an image of "
                             "dimension [1, height, width, channel] or [height, width, channel]")
  if len(x.shape) != 3:
    raise ValueError("Invalid input to deprocessing image")
  
  # perform the inverse of the preprocessing step
  x[:, :, 0] += 103.939
  x[:, :, 1] += 116.779
  x[:, :, 2] += 123.68
  x = x[:, :, ::-1]

  x = np.clip(x, 0, 255).astype('uint8')
  return x           


Define the content and style representations: in order to get both the content and style representations of our image, we look at some intermediate layers within the model. As we go deeper into the model, these intermediate layers represent higher and higher order features. In this case, we use the VGG19 architecture, a pretrained image classification network. These intermediate layers are necessary to define the representation of content and style from our images. For an input image, we will try to match the corresponding style and content target representations at these intermediate layers.

Why intermediate layers?

In order for a network to perform image classification (which our network has been trained to do), it must understand the image. This involves taking the raw image as input pixels and building an internal representation that turns the raw pixels into a complex understanding of the features present in the image. This is also partly why convolutional neural networks generalize well: they capture the invariances and defining features within classes (e.g., cats vs. dogs) that are agnostic to background noise and other nuisances. Thus, somewhere between where the raw image is fed in and the classification label comes out, the model serves as a complex feature extractor; hence, by accessing intermediate layers, we are able to describe the content and style of input images.
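If you want to see which intermediate layers are available to pick from, one quick way (a small sketch, assuming the imports above) is to instantiate VGG19 and print its layer names:

# list the layer names available in VGG19 (block1_conv1, block1_conv2, ..., block5_pool)
vgg = tf.keras.applications.vgg19.VGG19(include_top=False, weights='imagenet')
for layer in vgg.layers:
  print(layer.name)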

Specifically, we will pull the following intermediate layers from our network:

# Content layer where will pull our feature maps
content_layers = ['block5_conv2'] 

# Style layer we are interested in
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1', 
                'block4_conv1', 
                'block5_conv1'
               ]

num_content_layers = len(content_layers)
num_style_layers = len(style_layers)           


構模組化型:在這種情況下,我們加載VGG19,并将我們的輸入張量輸入到模型中。這将允許我們提取内容、樣式和生成的圖像的特征映射(以及随後的内容和樣式表示)。我們使用VGG19,正如論文中建議的那樣。此外,由于 VGG19 是一個相對簡單的模型(與ResNet、Inception等相比),是以功能映射實際上更适合于樣式轉換。

In order to access the intermediate layers corresponding to our style and content feature maps, we grab the corresponding outputs and, using the Keras Functional API, define our model with the desired output activations. With the Functional API, defining a model simply involves defining the inputs and outputs: model = Model(inputs, outputs).

def get_model():
  """ Creates our model with access to intermediate layers. 
  
  This function will load the VGG19 model and access the intermediate layers. 
  These layers will then be used to create a new model that will take input image
  and return the outputs from these intermediate layers from the VGG model. 
  
  Returns:
    returns a keras model that takes image inputs and outputs the style and 
      content intermediate layers. 
  """
  # Load our model. We load pretrained VGG, trained on imagenet data
  vgg = tf.keras.applications.vgg19.VGG19(include_top=False, weights='imagenet')
  vgg.trainable = False
  # Get output layers corresponding to style and content layers 
  style_outputs = [vgg.get_layer(name).output for name in style_layers]
  content_outputs = [vgg.get_layer(name).output for name in content_layers]
  model_outputs = style_outputs + content_outputs
  # Build model 
  return models.Model(vgg.input, model_outputs)           


In the code snippet above, we load our pretrained image classification network. Then we grab the layers of interest that we defined earlier. We then define a model by setting the model's inputs to an image and its outputs to the outputs of the style and content layers. In other words, we have created a model that takes an input image and outputs the intermediate content and style layers!
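As a quick check (our own sketch, not from the original walkthrough), the model returned by get_model is callable like a function in eager mode; with the five style layers and one content layer defined above, it should return six tensors:

model = get_model()
outputs = model(load_and_process_img(content_path))
print(len(outputs))      # 6: five style outputs followed by one content output
print(outputs[0].shape)  # feature map from block1_conv1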


Computing the content loss: we will add our content losses at each layer. That way, each iteration, when we feed our input image through the model (which in eager is simply model(input_image)!), all the content losses through the model will be properly computed, and because we are executing eagerly, all the gradients will be computed as well.

\[ L^l_{\text{content}}(p, x) = \sum_{i,j}\bigl(F^l_{ij}(x) - P^l_{ij}(p)\bigr)^2, \qquad L_{\text{content}}(p, x) = \sum_{l \in L} w_l \, L^l_{\text{content}}(p, x) \]

where we weight the contribution of each layer's loss by some factor w_l. In our case, we weight each layer equally (w_l = 1/|L|).
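The helper that computes this per-layer content distance, get_content_loss, is called in compute_loss later in this post but is never defined in the snippets above. A minimal definition consistent with how it is called (it uses the mean rather than the sum over positions, which only changes a constant factor):

def get_content_loss(base_content, target):
  # mean squared distance between the generated image's feature map
  # and the content image's feature map at a given layer
  return tf.reduce_mean(tf.square(base_content - target))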

Computing style loss

Again, we express our loss as a distance metric.
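In our own shorthand, the code below compares Gram matrices of the feature maps. At a layer l with feature maps F^l, the Gram matrix and the resulting style loss are roughly:

\[ G^l_{ij} = \sum_{k} F^l_{ik} F^l_{jk}, \qquad L_{\text{style}}(a, x) = \sum_{l} w_l \,\operatorname{mean}_{i,j}\bigl(G^l_{ij}(x) - G^l_{ij}(a)\bigr)^2 \]

with w_l = 1/|L| as above. Note that gram_matrix below additionally divides by the number of spatial positions, and the classical normalization from the paper is left commented out in get_style_loss.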

def gram_matrix(input_tensor):
  # We make the image channels first 
  channels = int(input_tensor.shape[-1])
  a = tf.reshape(input_tensor, [-1, channels])
  n = tf.shape(a)[0]
  gram = tf.matmul(a, a, transpose_a=True)
  return gram / tf.cast(n, tf.float32)

def get_style_loss(base_style, gram_target):
  """Expects two images of dimension h, w, c"""
  # height, width, num filters of each layer
  # We scale the loss at a given layer by the size of the feature map and the number of filters
  height, width, channels = base_style.get_shape().as_list()
  gram_style = gram_matrix(base_style)
  
  return tf.reduce_mean(tf.square(gram_style - gram_target))# / (4. * (channels ** 2) * (width * height) ** 2)           


Apply style transfer

Run gradient descent: if you aren't familiar with gradient descent/backpropagation, or need a refresher, you should definitely read up on it first. In this case, we use the Adam optimizer to minimize our loss. We iteratively update our output image so that it minimizes the loss: we don't update the weights associated with our network, but instead we train our input image to minimize the loss. In order to do this, we must know how to compute the loss and the gradients.

Note that L-BFGS, which is recommended if you are familiar with that algorithm, isn't used in this tutorial, because a primary motivation here is to illustrate best practices with eager execution; by using Adam, we can demonstrate the autograd/gradient-tape functionality with a custom training loop.
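To make the "train the image, not the weights" idea concrete, here is a tiny standalone toy sketch (our own example with a stand-in loss, assuming the same TF 1.x eager setup used throughout this post):

# toy example: update the pixels of an image variable with Adam,
# the same pattern the real training loop below uses
img_var = tf.Variable(tf.random_normal([1, 64, 64, 3]))
opt = tf.train.AdamOptimizer(learning_rate=5, beta1=0.99, epsilon=1e-1)

with tf.GradientTape() as tape:
  loss = tf.reduce_mean(tf.square(img_var))  # stand-in loss
grads = tape.gradient(loss, img_var)
opt.apply_gradients([(grads, img_var)])      # the image changes, the network never does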

接下來我們将定義一個小助手函數,它将加載内容和樣式圖像,并通過我們的網絡轉發它們,然後該網絡将輸出模型中的内容和樣式特征表示。

def get_feature_representations(model, content_path, style_path):
  """Helper function to compute our content and style feature representations.

  This function will simply load and preprocess both the content and style 
  images from their path. Then it will feed them through the network to obtain
  the outputs of the intermediate layers. 
  
  Arguments:
    model: The model that we are using.
    content_path: The path to the content image.
    style_path: The path to the style image
    
  Returns:
    returns the style features and the content features. 
  """
  # Load our images in 
  content_image = load_and_process_img(content_path)
  style_image = load_and_process_img(style_path)
  
  # batch compute content and style features
  style_outputs = model(style_image)
  content_outputs = model(content_image)
  
  
  # Get the style and content feature representations from our model  
  style_features = [style_layer[0] for style_layer in style_outputs[:num_style_layers]]
  content_features = [content_layer[0] for content_layer in content_outputs[num_style_layers:]]
  return style_features, content_features           


Computing the loss and gradients: here we use tf.GradientTape to compute the gradients. It lets us take advantage of automatic differentiation by tracing operations so the gradients can be computed later. It records the operations during the forward pass and then computes the gradients of our loss function with respect to the input image for the backward pass.

def compute_loss(model, loss_weights, init_image, gram_style_features, content_features):
  """This function will compute the loss total loss.
  
  Arguments:
    model: The model that will give us access to the intermediate layers
    loss_weights: The weights of each contribution of each loss function. 
      (style weight, content weight, and total variation weight)
    init_image: Our initial base image. This image is what we are updating with 
      our optimization process. We apply the gradients wrt the loss we are 
      calculating to this image.
    gram_style_features: Precomputed gram matrices corresponding to the 
      defined style layers of interest.
    content_features: Precomputed outputs from defined content layers of 
      interest.
      
  Returns:
    returns the total loss, style loss, content loss, and total variational loss
  """
  style_weight, content_weight = loss_weights
  
  # Feed our init image through our model. This will give us the content and 
  # style representations at our desired layers. Since we're using eager
  # our model is callable just like any other function!
  model_outputs = model(init_image)
  
  style_output_features = model_outputs[:num_style_layers]
  content_output_features = model_outputs[num_style_layers:]
  
  style_score = 0
  content_score = 0

  # Accumulate style losses from all layers
  # Here, we equally weight each contribution of each loss layer
  weight_per_style_layer = 1.0 / float(num_style_layers)
  for target_style, comb_style in zip(gram_style_features, style_output_features):
    style_score += weight_per_style_layer * get_style_loss(comb_style[0], target_style)
    
  # Accumulate content losses from all layers 
  weight_per_content_layer = 1.0 / float(num_content_layers)
  for target_content, comb_content in zip(content_features, content_output_features):
    content_score += weight_per_content_layer* get_content_loss(comb_content[0], target_content)
  
  style_score *= style_weight
  content_score *= content_weight

  # Get total loss
  loss = style_score + content_score 
  return loss, style_score, content_score           


Then computing the gradients is easy:

def compute_grads(cfg):
  with tf.GradientTape() as tape: 
    all_loss = compute_loss(**cfg)
  # Compute gradients wrt input image
  total_loss = all_loss[0]
  return tape.gradient(total_loss, cfg['init_image']), all_loss           


The optimization loop

import IPython.display

def run_style_transfer(content_path, 
                       style_path,
                       num_iterations=1000,
                       content_weight=1e3, 
                       style_weight=1e-2): 
  # We don't need to (or want to) train any layers of our model, so we set their
  # trainable to false. 
  model = get_model() 
  for layer in model.layers:
    layer.trainable = False
  
  # Get the style and content feature representations (from our specified intermediate layers) 
  style_features, content_features = get_feature_representations(model, content_path, style_path)
  gram_style_features = [gram_matrix(style_feature) for style_feature in style_features]
  
  # Set initial image
  init_image = load_and_process_img(content_path)
  init_image = tf.Variable(init_image, dtype=tf.float32)
  # Create our optimizer
  opt = tf.train.AdamOptimizer(learning_rate=5, beta1=0.99, epsilon=1e-1)

  # For displaying intermediate images 
  iter_count = 1
  
  # Store our best result
  best_loss, best_img = float('inf'), None
  
  # Create a nice config 
  loss_weights = (style_weight, content_weight)
  cfg = {
      'model': model,
      'loss_weights': loss_weights,
      'init_image': init_image,
      'gram_style_features': gram_style_features,
      'content_features': content_features
  }
    
  # For displaying
  num_rows = 2
  num_cols = 5
  display_interval = num_iterations/(num_rows*num_cols)
  start_time = time.time()
  global_start = time.time()
  
  norm_means = np.array([103.939, 116.779, 123.68])
  min_vals = -norm_means
  max_vals = 255 - norm_means   
  
  imgs = []
  for i in range(num_iterations):
    grads, all_loss = compute_grads(cfg)
    loss, style_score, content_score = all_loss
    opt.apply_gradients([(grads, init_image)])
    clipped = tf.clip_by_value(init_image, min_vals, max_vals)
    init_image.assign(clipped)
    end_time = time.time() 
    
    if loss < best_loss:
      # Update best loss and best image from total loss. 
      best_loss = loss
      best_img = deprocess_img(init_image.numpy())

    if i % display_interval== 0:
      start_time = time.time()
      
      # Use the .numpy() method to get the concrete numpy array
      plot_img = init_image.numpy()
      plot_img = deprocess_img(plot_img)
      imgs.append(plot_img)
      IPython.display.clear_output(wait=True)
      IPython.display.display_png(Image.fromarray(plot_img))
      print('Iteration: {}'.format(i))        
      print('Total loss: {:.4e}, ' 
            'style loss: {:.4e}, '
            'content loss: {:.4e}, '
            'time: {:.4f}s'.format(loss, style_score, content_score, time.time() - start_time))
  print('Total time: {:.4f}s'.format(time.time() - global_start))
  IPython.display.clear_output(wait=True)
  plt.figure(figsize=(14,4))
  for i,img in enumerate(imgs):
      plt.subplot(num_rows,num_cols,i+1)
      plt.imshow(img)
      plt.xticks([])
      plt.yticks([])
      
  return best_img, best_loss           


Run the model!

best, best_loss = run_style_transfer(content_path, style_path, num_iterations=1000)           


[Output: intermediate images and loss values printed during optimization]
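If you want to keep the result, best is simply a uint8 NumPy array, so it can be saved directly (the filename here is our own choice):

from PIL import Image
Image.fromarray(best).save('/tmp/nst/stylized_result.png')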

Visualize the output

We "deprocess" the output image to remove the processing applied to it, and then display it.

def show_results(best_img, content_path, style_path, show_large_final=True):
  plt.figure(figsize=(10, 5))
  content = load_img(content_path) 
  style = load_img(style_path)

  plt.subplot(1, 2, 1)
  imshow(content, 'Content Image')

  plt.subplot(1, 2, 2)
  imshow(style, 'Style Image')

  if show_large_final: 
    plt.figure(figsize=(10, 10))

    plt.imshow(best_img)
    plt.title('Output Image')
    plt.show()           


show_results(best, content_path, style_path)           


[Output: the content image, the style image, and the final stylized result]

Transfer learning gives an image a whole different style: even a photo of a corn borer can become a work of art. Looks pretty good, right?

Recommended reading:

Neural_Style_Transfer_with_Eager_Execution

https://zhuanlan.zhihu.com/p/93388054

https://blog.csdn.net/weixin_43264420/article/details/104441258