
Invertible Image Rescaling (ECCV 2020): Paper Walkthrough

Invertible Image Rescaling


https://arxiv.org/pdf/2005.05650.pdf


https://github.com/pkuxmq/Invertible-Image-Rescaling

Contents

Invertible Image Rescaling

Abstract

Introduction

Related Work

Methods

Related Post

Abstract

High-resolution digital images are usually downscaled to fit various display screens or save the cost of storage and bandwidth, meanwhile the post-upscaling is adopted to recover the original resolutions or the details in the zoom-in images. However, typical image downscaling is a non-injective mapping due to the loss of high-frequency information, which leads to the ill-posed problem of the inverse upscaling procedure and poses great challenges for recovering details from the downscaled low-resolution images. Simply upscaling with image super-resolution methods results in unsatisfactory recovering performance.

In this work, we propose to solve this problem by modeling the downscaling and upscaling processes from a new perspective, i.e. an invertible bijective transformation, which can largely mitigate the ill-posed nature of image upscaling. We develop an Invertible Rescaling Net (IRN) with deliberately designed framework and objectives to produce visually-pleasing low-resolution images and meanwhile capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process. In this way, upscaling is made tractable by inversely passing a randomly-drawn latent variable with the low-resolution image through the network.

Experimental results demonstrate the significant improvement of our model over existing methods in terms of both quantitative and qualitative evaluations of image upscaling reconstruction from downscaled images.

Motivation: the paper starts from application needs (displaying on different screens, saving storage and bandwidth costs), then points out the shortcomings of current upscaling (downscaling is a non-injective mapping, so upscaling loses details) and of super-resolution methods (unsatisfactory recovery).

This work: proposes an invertible bijective transformation and builds the Invertible Rescaling Net (IRN). As for why this network works, the authors' one-sentence explanation here is rather abstract...

Experimental conclusion: the method solves the problem of upscaling reconstruction from downscaled images well.

Introduction

Paragraph 1 (skipped): the research background, needs, and the problems of upscaling.

Paragraph 2 (skipped): the problems of traditional super-resolution methods.

Paragraph 3 (skipped): the problems of joint downscaling-and-upscaling methods [26][34][49].

[26] ECCV 2018: Task-aware image downscaling

[34] TIP 2018: Learning a convolutional neural network for image compact-resolution

[49] TIP 2018: Learned image downscaling for upscaling using content adaptive resampler

In this paper, with inspiration from the reciprocal nature of this pair of image rescaling tasks, we propose a novel method to largely mitigate this ill-posed problem of the image upscaling. According to the Nyquist-Shannon sampling theorem, high-frequency contents are lost during downscaling. Ideally, we hope to keep all lost information to perfectly recover the original HR image, but storing or transferring the high-frequency information is unacceptable. In order to well address this challenge, we develop a novel invertible model called Invertible Rescaling Net (IRN) which captures some knowledge on the lost information in the form of its distribution and embeds it into model's parameters to mitigate the ill-posedness. Given an HR image x, IRN not only downscales it into a visually-pleasing LR image y, but also embeds the case-specific high-frequency content into an auxiliary case-agnostic latent variable z, whose marginal distribution obeys a fixed pre-specified distribution (e.g., isotropic Gaussian). Based on this model, we use a randomly drawn sample of z from the pre-specified distribution for the inverse upscaling procedure, which holds the most information that one could have in upscaling.

downscale and upscale are a complementary pair of processes.

The Nyquist-Shannon sampling theorem says that the downscale process necessarily loses information, and that information cannot be fully recovered during upscaling.

The idea of this paper: during downscaling, besides producing a high-quality LR image, the case-specific high-frequency content is embedded into an auxiliary case-agnostic latent variable whose marginal distribution follows a fixed pre-specified distribution, such as an isotropic Gaussian.

Based on this model, the method draws a random sample z from the pre-specified distribution for the inverse upscaling process; this sample carries the most information one could have for upscaling.

Yet, there are still several great challenges needed to be addressed during the IRN training process. Specifically, it is essential to ensure the quality of reconstructed HR images, obtain visually pleasing downscaled LR ones, and accomplish the upscaling with a case-agnostic z, i.e., z ~ p(z), instead of a case-specific z.

To this end, we design a novel compact and effective objective function by combining three respective components: an HR reconstruction loss, an LR guidance loss and a distribution matching loss. The last component is for the model to capture the true HR image manifold as well as for enforcing z to be case-agnostic.

Neither the conventional adversarial training techniques of generative adversarial nets (GANs) [21] nor the maximum likelihood estimation (MLE) method for existing invertible neural networks [15,16,29,4] could achieve our goal, since the model distribution doesn’t exist here, meanwhile these methods don’t guide the distribution in the latent space.

Instead, we take the pushed-forward empirical distribution of x as the distribution on y, which, in independent company with z, is the actually used distribution to inversely pass our model to recover the distribution of x. We thus match this distribution with the empirical distribution of x (the data distribution).

Moreover, due to the invertible nature of our model, we show that once this matching task is accomplished, the matching task in the (y, z) space is also solved, and z is made case-agnostic.

We minimize the JS divergence to match the distributions, since the alternative sample-based maximum mean discrepancy (MMD) method [3] doesn’t generalize well to the high dimension data in our task.

This long passage makes the following points:

The challenges the method must face: ensuring the quality of the reconstructed HR images; obtaining visually satisfying downscaled LR images; and, crucially, accomplishing the upscaling with a case-agnostic z, i.e., z ~ p(z), rather than a case-specific z.

To this end, the paper proposes a compact objective function combining an HR reconstruction loss, an LR guidance loss and a distribution matching loss.

The last loss lets the model capture the true HR image manifold and enforces z to be case-agnostic.

Neither the conventional adversarial training of GANs nor the maximum likelihood estimation (MLE) used for existing invertible neural networks can achieve this goal, because the model distribution does not exist here, and these methods do not guide the distribution in the latent space.

The paper takes the pushed-forward empirical distribution of x as the distribution on y; this distribution, paired independently with z, is also the distribution used in the inverse pass to recover x. Therefore, this distribution is matched against the empirical distribution of x (the data distribution).

Moreover, due to the invertible nature of the model, the paper shows that once this matching task is accomplished, the matching task in the (y, z) space is also solved, and z becomes case-agnostic.

The paper minimizes the JS divergence to match the distributions, because the alternative sample-based maximum mean discrepancy (MMD) method does not generalize well to the high-dimensional data in this task.

Our contributions are concluded as follows:

– To our best knowledge, the proposed IRN is the first attempt to model image downscaling and upscaling, a pair of mutually-inverse tasks, using an invertible (i.e., bijective) transformation. Powered by the deliberately designed invertibility, our proposed IRN can largely mitigate the ill-posed nature of image upscaling reconstruction from the downscaled LR image.

– We propose a novel model design and efficient training objectives for IRN to enforce the latent variable z, with embedded lost high-frequency information in the downscaling direction, to obey a simple case-agnostic distribution. This enables efficient upscaling based on the valuable samples of z drawn from the certain distribution.

– The proposed IRN can significantly boost the performance of upscaling reconstruction from downscaled LR images compared with state-of-the-art downscaling-SR and encoder-decoder methods. Moreover, the amount of parameters of IRN is significantly reduced, which indicates the light-weight and high-efficiency of the new IRN model.

The contributions are summarized as:

first to use an invertible transformation to solve the joint downscaling-and-upscaling problem;

an efficient objective function that forces the latent variable z, which embeds the high-frequency information lost in the downscaling direction, to obey a simple case-agnostic distribution (a very important innovation);

good performance with far fewer parameters.

Related Work

Image Upscaling after Downscaling

Super resolution (SR) is a widely-used image upscaling method and gets promising results in the low-resolution (LR) image upscaling task. Therefore, SR methods could be used to upscale downscaled images. Since the SR task is inherently ill-posed, previous SR works mainly focus on learning strong prior information by example-based strategy [18,20,46,27] or deep learning models [17,36,60,59,14,50]. However, if the targeted LR image is pre-downscaled from the corresponding high-resolution image, taking the image downscaling method into consideration would significantly help the upscaling reconstruction.

Traditional image downscaling approaches employ frequency-based kernels, such as Bilinear, Bicubic, etc. [41], as a low-pass filter to sub-sample the input HR images into target resolution. Normally, these methods suffer from resulting over-smoothed images since the high-frequency details are suppressed. Therefore, several detail-preserving or structurally similar downscaling methods [31,42,51,52,38] are proposed recently. Besides those perceptual-oriented downscaling methods, inspired by the potentially mutual reinforcement between downscaling and its inverse task, upscaling, increasing efforts have been focused on the upscaling-optimal downscaling methods, which aim to learn a downscaling model that is optimal to the post-upscaling operation.

For instance, Kim et al. [26] proposed a task-aware downscaling model based on an auto-encoder framework, in which the encoder and decoder act as the downscaling and upscaling model, respectively, such that the downscaling and upscaling processes are trained jointly as a united task. Similarly, Li et al. [34] proposed to use a CNN to estimate downscaled compact-resolution images and leverage a learned or specified SR model for HR image reconstruction. More recently, Sun et al. [49] proposed a new content-adaptive-resampler based image downscaling method, which can be jointly trained with any existing differentiable upscaling (SR) models.

Although these attempts have an effect of pushing one of downscaling and upscaling to resemble the inverse process of the other, they still suffer from the ill-posed nature of image upscaling problem. In this paper, we propose to model the downscaling and upscaling processes by leveraging the invertible neural networks.

Traditional downscaling methods use frequency-based kernels, which directly discard the detail information.

Therefore, to preserve details, detail-preserving or structurally similar downscaling methods have been proposed.

In addition, so that downscaled images can be upscaled into better images, upscaling-optimal downscaling methods have been proposed, i.e., downscaling models learned to be optimal for the post-upscaling operation.

Examples are the methods of Kim et al. [26], Li et al. [34] and Sun et al. [49] described above.

Although these attempts push downscaling and upscaling to resemble each other's inverse process, they still suffer from the inherently ill-posed nature of the image upscaling problem.

Invertible Neural Network

The invertible neural network (INN) [15,16,29,32,22,8,13] is a popular choice for generative models, in which the generative process x = f_θ(z) given a latent variable z can be specified by an INN architecture f_θ. The direct access to the inverse mapping f_θ^{-1} makes inference much cheaper. As it is possible to compute the density of the model distribution in INN explicitly, one can use the maximum likelihood method for training. Due to such flexibility, INN architectures are also used for many variational inference tasks [44,30,10].

The invertible neural network (INN) is a popular choice for generative models: an INN architecture f_θ specifies the generative process x = f_θ(z) for a latent variable z, and the inverse mapping f_θ^{-1} is directly accessible.

Since the density of the model distribution can be computed explicitly in an INN, maximum likelihood can be used for training.

Thanks to this flexibility, INN architectures are also used in many variational inference tasks.

INN is composed of invertible blocks. In this study, we employ the invertible architecture in [16]. For the l-th block, the input h^l is split into h_1^l and h_2^l along the channel axis, and they undergo the additive affine transformations [15]:

h_1^{l+1} = h_1^l + φ(h_2^l),    h_2^{l+1} = h_2^l + η(h_1^{l+1}),    (1)

where φ, η are arbitrary functions. The corresponding output is [h_1^{l+1}, h_2^{l+1}]. Given the output, its inverse transformation is easily computed:

h_2^l = h_2^{l+1} − η(h_1^{l+1}),    h_1^l = h_1^{l+1} − φ(h_2^l).    (2)

To enhance the transformation ability, the identity branch is often augmented [16]:

h_1^{l+1} = h_1^l + φ(h_2^l),    h_2^{l+1} = h_2^l ⊙ exp(ρ(h_1^{l+1})) + η(h_1^{l+1}).    (3)

This is the INN block introduced in an ICLR 2017 paper; for why it is designed this way and what its benefits are, consult the original:

[16] Density estimation using real NVP [ICLR 2017]

https://arxiv.org/pdf/1605.08803.pdf
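The coupling equations above can be sketched in a few lines of NumPy. The functions phi, eta and rho below are hypothetical stand-ins for the arbitrary transformations (the paper uses Dense Blocks); invertibility of the block does not depend on their form:

```python
import numpy as np

# Hypothetical stand-ins for the arbitrary functions φ, η, ρ.
phi = lambda t: np.tanh(t)
eta = lambda t: 0.5 * np.sin(t)
rho = lambda t: np.tanh(t)  # kept bounded, as the paper bounds ρ to avoid exp overflow

def invblock_forward(h1, h2):
    """Enhanced affine coupling, as in Eq. (3)."""
    out1 = h1 + phi(h2)                        # additive update of the first branch
    out2 = h2 * np.exp(rho(out1)) + eta(out1)  # affine update of the second branch
    return out1, out2

def invblock_inverse(out1, out2):
    """Exact inverse of the block, computed in reverse order."""
    h2 = (out2 - eta(out1)) * np.exp(-rho(out1))
    h1 = out1 - phi(h2)
    return h1, h2
```

Round-tripping any input recovers it to machine precision, which is exactly the property IRN relies on to upscale by running the network backwards.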

Some prior works studied using INN for paired data (x, y). Ardizzone et al. [3] analyzed real-world problems from medicine and astrophysics. Compared to their tasks, image downscaling and upscaling bring more difficulties because of notably larger dimensionality, so that their losses do not work for our task. In addition, the ground-truth LR image y does not exist in our task. Guided image generation and colorization using INN is proposed in [4] where the invertible modeling between z and x is conditioned on a guidance y. The model cannot generate y given x thus is unsuitable for the image upscaling task. INN is also applied to the image-to-image translation task [43] where the paired domain (X, Y) instead of paired data is considered, thus is again not the case of image upscaling.

[3] Analyzing inverse problems with invertible neural networks [ICLR 2019]

[4] Guided image generation with conditional invertible neural networks [2019]

[43] Reversible GANs for memory-efficient image-to-image translation [CVPR 2019]

Some prior works studied using INN on paired data (x, y). Ardizzone et al. analyzed real-world problems from medicine and astrophysics. Compared with their tasks, image downscaling and upscaling have notably larger dimensionality and are therefore harder, so their losses do not work for this task. Moreover, a ground-truth LR image y does not exist in this task. [4] proposed guided image generation and colorization with INN, where the invertible modeling between z and x is conditioned on a guidance y; that model cannot generate y given x, so it is unsuitable for the image upscaling task. INN has also been applied to image-to-image translation [43], which considers a paired domain (X, Y) rather than paired data, and is thus again not the image upscaling setting.

Other Related Fields

  • Image Compression

Image compression is data compression applied to digital images to reduce their storage or transmission cost. It can be lossy (e.g., JPEG, BPG) or lossless (e.g., PNG, BMP). In recent years, deep-learning-based image compression methods have achieved good results in both visual quality and compression ratio. However, the resolution of the image is not changed by compression; that is, a compressed image is only a bit-stream without a visually meaningful low-resolution image. Therefore, image compression methods cannot fulfill this task.

  • Image Super-resolution

Note that image upscaling and super-resolution are different tasks. In this paper's scenario, the real HR image exists at the beginning, but in some applications one has to discard it temporarily and store/transmit the LR version instead, hoping to recover the HR image from the LR image later. For SR, the real HR image does not exist in the application, and the task is to generate new HR images for the LR input.

Methods

Model Specification

The sketch of our modeling framework is presented in Fig. 1. As explained in Introduction, we mitigate the ill-posed problem of the upscaling task by modeling the distribution of lost information during downscaling. We note that according to the Nyquist-Shannon sampling theorem [47], the lost information during downscaling an HR image amounts to high-frequency contents. Thus we firstly employ a wavelet transformation to decompose the HR image x into low and high-frequency components, denoted as x_L and x_H respectively. Since the case-specific high-frequency information will be lost after downscaling, in order to best recover the original x as possible in the upscaling procedure, we use an invertible neural network to produce the visually-pleasing LR image y, meanwhile modeling the distribution of the lost information by introducing an auxiliary latent variable z. In contrast to the case-specific x_H (i.e., determined by the particular x), we force z to be case-agnostic (i.e., independent of the particular case) and obey a simple specified distribution, e.g., an isotropic Gaussian distribution. In this way, there is no further need to preserve either x_H or z after downscaling, and z can be randomly sampled in the upscaling procedure, which is used to reconstruct x combined with the LR image y by inversely passing the model.

Model definition:

First, the input high-resolution image is decomposed by a wavelet transform into low and high-frequency components.

Then, a stack of invertible neural network blocks does two things: it outputs a high-quality downscaled image, and it models the lost high-frequency component with an auxiliary latent distribution that is independent of any component of the input image, i.e., case-agnostic, and follows an isotropic Gaussian. The benefit is that the network does not need to store the case-specific x_H or z; at upscaling time, only the low-resolution image and a random sample from the isotropic Gaussian are needed to recover a high-quality high-resolution image.
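This workflow can be illustrated with a toy stand-in: below, a random orthogonal matrix plays the role of the invertible network (a hypothetical simplification; IRN itself is a deep nonlinear INN). The forward pass splits the output into a small "LR" part y and a latent part z; keeping the true z gives exact recovery, while at test time z is redrawn from a Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
# A random orthogonal matrix as a toy invertible "network": Q.T undoes Q exactly.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))

def downscale(x_hr):
    """Forward pass: HR vector -> (LR part y, latent part z)."""
    out = Q @ x_hr
    return out[:4], out[4:]

def upscale(y, z):
    """Inverse pass: (y, z) -> HR vector."""
    return Q.T @ np.concatenate([y, z])

x = rng.normal(size=16)
y, z = downscale(x)
x_exact = upscale(y, z)                        # keeping z: perfect reconstruction
x_sampled = upscale(y, rng.normal(size=12))    # discarding z, resampling at test time
```

With the true z the reconstruction is exact; with a resampled z it is only approximate, and IRN's training objectives are what make the resampled case work well for real images.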

[Figure 1 omitted: sketch of the IRN modeling framework.]

Invertible Architecture

[Figure 2 omitted: the architecture of IRN, stacked Downscaling Modules each containing a Haar Transformation block and several InvBlocks.]

The general architecture of our proposed IRN is composed of stacked Downscaling Modules, each of which contains one Haar Transformation block and several invertible neural network blocks (InvBlocks), as illustrated in Fig. 2. We will show later that both of them are invertible, and thus the entire IRN model is invertible accordingly.

  • The Haar Transformation
We design the model to contain certain inductive bias, which can efficiently learn to decompose x into the downscaled image y and case-agnostic high-frequency information embedded in z. To achieve this, we apply the Haar Transformation as the first layer in each downscaling module, which can explicitly decompose the input images into an approximate low-pass representation, and three directions of high-frequency coefficients [53][35][4]. More concretely, the Haar Transformation transforms the input raw images or a group of feature maps with height H, width W and channel C into a tensor of shape (1/2H, 1/2W, 4C). The first C slices of the output tensor are effectively produced by an average pooling, which is approximately a low-pass representation equivalent to the Bilinear interpolation downsampling. The rest three groups of C slices contain residual components in the vertical, horizontal and diagonal directions respectively, which are the high-frequency information in the original HR image. By such a transformation, the low and high-frequency information are effectively separated and will be fed into the following InvBlocks.

The Haar transform is standard material, so it is not elaborated here.
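As a concrete sketch, the (H, W, C) → (H/2, W/2, 4C) Haar step can be written with NumPy as below. The 1/4 normalization is one common choice, picked here so that the first C channels are exactly the 2x2 average-pooled low-pass band described above; the paper's exact scaling may differ:

```python
import numpy as np

def haar_downscale(x):
    """Forward Haar transform: (H, W, C) -> (H/2, W/2, 4C).

    First C channels: average-pooled low-pass band; remaining 3C channels:
    vertical, horizontal and diagonal high-frequency residuals.
    """
    a = x[0::2, 0::2, :]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2, :]  # top-right
    c = x[1::2, 0::2, :]  # bottom-left
    d = x[1::2, 1::2, :]  # bottom-right
    ll = (a + b + c + d) / 4.0  # low-pass (average pooling)
    lh = (a + b - c - d) / 4.0  # vertical residual
    hl = (a - b + c - d) / 4.0  # horizontal residual
    hh = (a - b - c + d) / 4.0  # diagonal residual
    return np.concatenate([ll, lh, hl, hh], axis=-1)

def haar_upscale(y):
    """Inverse Haar transform: (H/2, W/2, 4C) -> (H, W, C), exact inverse."""
    C = y.shape[-1] // 4
    ll, lh, hl, hh = (y[..., :C], y[..., C:2*C], y[..., 2*C:3*C], y[..., 3*C:])
    H2, W2 = y.shape[0], y.shape[1]
    x = np.empty((2 * H2, 2 * W2, C), dtype=y.dtype)
    x[0::2, 0::2, :] = ll + lh + hl + hh
    x[0::2, 1::2, :] = ll + lh - hl - hh
    x[1::2, 0::2, :] = ll - lh + hl - hh
    x[1::2, 1::2, :] = ll - lh - hl + hh
    return x
```

The transform is lossless: the inverse recovers the input exactly, so no information is discarded at this layer; it is only rearranged into low and high-frequency channels.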

  • InvBlock
Taking the feature maps after the Haar Transformation as input, a stack of InvBlocks is used to further abstract the LR and latent representations. We leverage the general coupling layer architecture proposed in [15,16], i.e. Eqs. (1,3).
Utilizing the coupling layer is based on our considerations that (1) the input has already been split into low and high-frequency components by the Haar transformation; (2) we want the two branches of the output of a coupling layer to further polish the low and high-frequency inputs for a suitable LR image appearance and an independent and properly distributed latent representation of the high-frequency contents. So we match the low and high-frequency components respectively to the split of h_1^l, h_2^l in Eq. (1). Furthermore, as the short-cut connection is proved to be important in the image scaling tasks [36,50], we employ the additive transformation (Eq. 1) for the low-frequency part h_1, and the enhanced affine transformation (Eq. 3) for the high-frequency part h_2 to increase the model capacity, as shown in Fig. 2.

1. The Haar transform has already split the input into low and high-frequency components.

2. The low-frequency component serves as h_1 in Eq. (1), and the high-frequency component as h_2.

3. The additive transformation (Eq. 1) is applied to h_1, and the enhanced affine transformation (Eq. 3) is applied to h_2.

Note that the transformation functions φ(·), η(·), ρ(·) in Fig. 2 can be arbitrary. Here we employ a densely connected convolutional block, which is referred to as Dense Block in [50] and demonstrated for its effectiveness in the image upscaling task. Function ρ(·) is further followed by a centered sigmoid function and a scale term to prevent numerical explosion due to the exp(·) function. Note that Figure 2 omits the exp(·) in function ρ.

The transformation functions φ(·), η(·), ρ(·) can be arbitrary. Here a Dense Block is used, which has been shown effective for the image upscaling task. Function ρ(·) is followed by a centered sigmoid function and a scale term to prevent numerical explosion caused by the exp(·) function; note that Fig. 2 omits the exp(·) in ρ.

  • Quantization
To save the output images of IRN in a common image storage format such as RGB (8 bits for each of the R, G and B color channels), a quantization module is adopted which converts the floating-point values of the produced LR images to 8-bit unsigned integers. We simply use the rounding operation as the quantization module, store our output LR images in PNG format and use them in the upscaling procedure. One obstacle should be noted: the quantization module is non-differentiable. To ensure that IRN can be optimized during training, we apply the Straight-Through Estimator [9] to the quantization module when calculating the gradients.

To save IRN's output images in a common storage format such as RGB (8 bits per R, G, B channel), a quantization module converts the floating-point values of the produced LR images to 8-bit unsigned integers. The paper simply uses rounding as the quantization module, stores the output LR images as PNG, and uses them in the upscaling procedure. Note that quantization is non-differentiable; to keep IRN trainable, the Straight-Through Estimator is used for this module when computing gradients.
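A minimal sketch of the rounding quantizer together with the straight-through idiom x + stop_gradient(round(x) − x). The function names here are illustrative, not from the paper's code; in NumPy the stop-gradient is a no-op, but the identity below is exactly what a framework would differentiate through:

```python
import numpy as np

def quantize(x):
    """Round LR pixel values in [0, 1] to 8-bit levels (forward pass)."""
    return np.clip(np.round(x * 255.0), 0, 255) / 255.0

def quantize_ste(x):
    """Straight-through form: the forward value equals quantize(x), but it is
    written as x + (quantize(x) - x); wrapping the bracketed term in a
    stop-gradient makes the backward pass treat rounding as the identity."""
    return x + (quantize(x) - x)
```

In a deep learning framework the bracketed residual would be detached from the graph, so gradients flow straight through the rounding step while the saved PNG still holds true 8-bit values.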

Training Objectives

Based on Section 3.1, our approach for invertible downscaling constructs a model that specifies a correspondence between the HR image x and the LR image y, as well as a case-agnostic distribution p(z) of z. The goal of training is to drive these modeled relations and quantities to match our desiderata and the HR image data {x^(n)}_{n=1}^N. This includes three specific goals, as detailed below.

  • LR Guidance

The downscaled LR image is compared with the Bicubic-downscaled image using an L1 or L2 loss:

L_guide = Σ_{n=1}^N ℓ(y_guide^(n), y^(n)),

where y_guide^(n) is the Bicubic-downscaled LR image, y^(n) is the LR image produced by IRN, and ℓ is the L1 or L2 loss.
  • HR Reconstruction

Although the model is invertible, the correspondence between x and y alone is no longer invertible once z is not transferred. For a specific downscaled LR image y, the model should be able to recover the original HR image using an arbitrary sample z from the case-agnostic distribution p(z).

This loss is an L1 or L2 loss between the recovered HR image and the original HR image:

L_recon = Σ_{n=1}^N ℓ(x^(n), f^{-1}(y^(n), z)), with z drawn from p(z).

  • Distribution Matching

The purpose of the distribution matching loss is to encourage the network to capture the distribution p(x) of the input HR image data, so that the HR images reconstructed by the model follow the real HR data distribution as closely as possible.

Notation:

x: the input HR image;

y: the LR image transformed from x;

z: the randomly sampled signal;

x_recon: the recovered HR image.

The loss is then the divergence between the distribution of the recovered HR images and the data distribution, computed via the JS divergence:

L_distr = JS(q(x_recon), p(x)),

where q(x_recon) denotes the distribution of HR images reconstructed from (y, z) with z ~ p(z).
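Putting the three objectives together, here is a hedged sketch of the total training loss. The weights and the `js_estimate` argument are placeholders: the paper's exact weighting scheme is not reproduced, and the JS term needs its own (e.g. adversarial) estimator in practice:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error, one choice for the pixel losses above."""
    return float(np.mean(np.abs(a - b)))

def irn_loss(x, x_recon, y_lr, y_guide, js_estimate=0.0,
             w_recon=1.0, w_guide=1.0, w_distr=1.0):
    """Weighted sum of the HR reconstruction, LR guidance and distribution
    matching terms. `js_estimate` stands in for the JS-divergence term.

    x       : original HR image
    x_recon : HR image recovered by the inverse pass with z ~ N(0, I)
    y_lr    : LR image produced by the forward pass
    y_guide : Bicubic-downscaled reference LR image
    """
    return (w_recon * l1(x, x_recon)
            + w_guide * l1(y_lr, y_guide)
            + w_distr * js_estimate)
```

When the reconstruction is perfect and the LR output matches the Bicubic guide, only the distribution matching term remains, which is what keeps z case-agnostic.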

Related Post

See the blog post "ECCV 2020 | 對損失資訊進行建模,實現信號處理高保真還原" (modeling the lost information for high-fidelity restoration in signal processing).
