High-resolution digital images are usually downscaled to fit various display screens or to save storage and bandwidth cost, while post-upscaling is adopted to recover the original resolution or the details in zoomed-in images. However, typical image downscaling is a non-injective mapping due to the loss of high-frequency information, which makes the inverse upscaling procedure ill-posed and poses great challenges for recovering details from the downscaled low-resolution images. Simply upscaling with image super-resolution methods results in unsatisfactory recovery performance.
In this work, we propose to solve this problem by modeling the downscaling and upscaling processes from a new perspective, i.e., as an invertible bijective transformation, which can largely mitigate the ill-posed nature of image upscaling. We develop an Invertible Rescaling Net (IRN) with a deliberately designed framework and objectives to produce visually pleasing low-resolution images and, meanwhile, to capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process. In this way, upscaling is made tractable by inversely passing a randomly drawn latent variable, together with the low-resolution image, through the network.
Experimental results demonstrate the significant improvement of our model over existing methods in terms of both quantitative and qualitative evaluations of image upscaling reconstruction from downscaled images.
[34] TIP18 : Learning a convolutional neural network for image compact-resolution
[49] TIP18 : Learned image downscaling for upscaling using content adaptive resampler
In this paper, inspired by the reciprocal nature of this pair of image rescaling tasks, we propose a novel method to largely mitigate the ill-posed problem of image upscaling. According to the Nyquist-Shannon sampling theorem, high-frequency contents are lost during downscaling. Ideally, we would keep all the lost information so as to perfectly recover the original HR image, but storing or transferring this high-frequency information is unacceptable in practice. To address this challenge, we develop a novel invertible model called the Invertible Rescaling Net (IRN), which captures knowledge about the lost information in the form of its distribution and embeds it into the model's parameters to mitigate the ill-posedness. Given an HR image x, the model produces a downscaled LR image y together with a latent variable z, whose marginal distribution obeys a fixed pre-specified distribution (e.g., isotropic Gaussian). Based on this model, we use a randomly drawn sample of z, together with the LR image, in the inverse upscaling procedure.
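The modeling idea can be sketched with a toy one-dimensional bijection (our own illustration, not the paper's network): downscaling splits a signal into a kept low-frequency part y and a lost high-frequency part z, and upscaling inverts the map exactly; at test time, z is re-drawn from the case-agnostic prior since the true z is discarded.

```python
import random

# Toy sketch of the modeling idea: downscaling as a bijection
# x -> (y, z), where y is the "LR" part and z carries the lost
# high-frequency detail, modeled only by its distribution.

def downscale(x1, x2):
    y = (x1 + x2) / 2.0   # low-frequency part, kept as the LR signal
    z = (x1 - x2) / 2.0   # high-frequency part, discarded after downscaling
    return y, z

def upscale(y, z):
    # exact inverse of downscale
    return y + z, y - z

def upscale_with_prior(y, rng):
    # the true z is unavailable at upscaling time, so a sample is
    # drawn from the pre-specified prior (here a standard Gaussian)
    return upscale(y, rng.gauss(0.0, 1.0))
```

With the true z the round trip is lossless, which is the property the invertible design exploits.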
Yet, several great challenges still need to be addressed during the IRN training process. Specifically, it is essential to ensure the quality of the reconstructed HR images, to obtain visually pleasing downscaled LR images, and to accomplish the upscaling with a case-agnostic sample of z.
To this end, we design a novel, compact and effective objective function combining three components: an HR reconstruction loss, an LR guidance loss, and a distribution matching loss. The last component drives the model to capture the true HR image manifold, as well as enforcing the latent variable z to obey the pre-specified distribution.
Neither the conventional adversarial training techniques of generative adversarial nets (GANs) [21] nor the maximum likelihood estimation (MLE) method for existing invertible neural networks [15,16,29,4] can achieve our goal, since an explicit model distribution does not exist here; moreover, these methods do not guide the distribution in the latent space.
Instead, we take the pushed-forward empirical distribution induced by the model as the distribution to be matched.
We minimize the JS divergence to match the distributions, since the alternative sample-based maximum mean discrepancy (MMD) method [3] does not generalize well to the high-dimensional data in our task.
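For intuition, the JS divergence has a simple closed form when both distributions are explicit and discrete, as sketched below (our own illustration); in the high-dimensional setting of this task it must instead be estimated from samples, e.g., with adversarial-style training.

```python
import math

# Jensen-Shannon divergence between two discrete distributions p and q,
# given as probability lists over the same support. JS is symmetric and
# bounded by log(2), unlike the KL divergence it is built from.

def kl(p, q):
    # KL(p || q); terms with p_i = 0 contribute nothing
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]  # mixture midpoint
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Note that m is nonzero wherever p or q is, so the KL terms are always well defined.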
This long passage makes the following points:
The challenges the method must address are: ensuring the quality of the reconstructed HR images; obtaining visually pleasing downscaled LR images; and accomplishing the upscaling with a case-agnostic sample of z.
– To the best of our knowledge, the proposed IRN is the first attempt to model image downscaling and upscaling, a pair of mutually inverse tasks, with an invertible (i.e., bijective) transformation. Powered by the deliberately designed invertibility, our proposed IRN can largely mitigate the ill-posed nature of image upscaling reconstruction from the downscaled LR image.
– We propose a novel model design and efficient training objectives for IRN to enforce the latent variable z, which embeds the lost high-frequency information in the downscaling direction, to obey a simple case-agnostic distribution. This enables efficient upscaling based on randomly drawn samples of z.
– The proposed IRN can significantly boost the performance of upscaling reconstruction from downscaled LR images compared with state-of-the-art downscaling-SR and encoder-decoder methods. Moreover, the number of parameters of IRN is significantly smaller, indicating that the new IRN model is lightweight and efficient.
Super-resolution (SR) is a widely used image upscaling method that achieves promising results on the low-resolution (LR) image upscaling task; hence, SR methods can be used to upscale downscaled images. Since the SR task is inherently ill-posed, previous SR works mainly focus on learning strong priors via example-based strategies [18,20,46,27] or deep learning models [17,36,60,59,14,50]. However, if the target LR image is pre-downscaled from the corresponding high-resolution image, taking the image downscaling method into consideration can significantly help the upscaling reconstruction.
Traditional image downscaling approaches employ frequency-based kernels, such as Bilinear and Bicubic [41], as low-pass filters to sub-sample the input HR images to the target resolution. These methods tend to produce over-smoothed images since high-frequency details are suppressed. Therefore, several detail-preserving or structurally similar downscaling methods [31,42,51,52,38] have been proposed recently. Beyond these perceptually oriented downscaling methods, and inspired by the potential mutual reinforcement between downscaling and its inverse task, upscaling, increasing effort has been devoted to upscaling-optimal downscaling methods, which aim to learn a downscaling model that is optimal for the post-upscaling operation.
For instance, Kim et al. [26] proposed a task-aware downscaling model based on an auto-encoder framework, in which the encoder and decoder act as the downscaling and upscaling models respectively, so that the downscaling and upscaling processes are trained jointly as a unified task. Similarly, Li et al. [34] proposed to use a CNN to estimate downscaled compact-resolution images and to leverage a learned or specified SR model for HR image reconstruction. More recently, Sun et al. [49] proposed a content-adaptive-resampler based image downscaling method, which can be jointly trained with any existing differentiable upscaling (SR) model.
Although these attempts push one of downscaling and upscaling to resemble the inverse process of the other, they still suffer from the ill-posed nature of the image upscaling problem. In this paper, we propose to model the downscaling and upscaling processes by leveraging invertible neural networks.
Traditional downscaling methods employ frequency-based kernels, which directly discard the detail information.
Therefore, to preserve detail information, detail-preserving or structurally similar downscaling methods have been proposed.
This makes inference much cheaper. Since the density of the model distribution can be computed explicitly in an INN, the maximum likelihood method can be used for training. Due to such flexibility, INN architectures are also used for many variational inference tasks [44,30,10].
Ardizzone et al. [3] analyzed real-world problems from medicine and astrophysics. Compared to their tasks, image downscaling and upscaling are more difficult because of the notably larger dimensionality, so their losses do not work for our task. In addition, the ground-truth LR image y does not exist in our task. Guided image generation and colorization using INNs is proposed in [4], where the invertible modeling is between
The sketch of our modeling framework is presented in Fig. 1. As explained in the Introduction, we mitigate the ill-posed problem of the upscaling task by modeling the distribution of the information lost during downscaling. We note that, according to the Nyquist-Shannon sampling theorem [47], the information lost when downscaling an HR image amounts to its high-frequency contents. Thus we first employ a wavelet transformation to decompose the HR image x into low- and high-frequency components.
The general architecture of our proposed IRN is composed of stacked Downscaling Modules, each of which contains one Haar Transformation block and several invertible neural network blocks (InvBlocks), as illustrated in Fig. 2. We will show later that both of them are invertible, and thus the entire IRN model is invertible accordingly.
The Haar Transformation
We design the model to contain certain inductive bias, so that it can efficiently learn to decompose the input into low- and high-frequency components. To achieve this, we apply the Haar Transformation as the first layer in each downscaling module, which explicitly decomposes the input images into an approximate low-pass representation and three directions of high-frequency coefficients [53,35,4]. More concretely, the Haar Transformation turns an input raw image, or a group of feature maps, with height H, width W and C channels into a tensor of shape (H/2, W/2, 4C). The first C slices of the output tensor are effectively produced by average pooling, which is approximately a low-pass representation equivalent to Bilinear interpolation downsampling. The remaining three groups of C slices contain the residual components in the vertical, horizontal and diagonal directions respectively, which constitute the high-frequency information of the original HR image. By such a transformation, the low- and high-frequency information is effectively separated and fed into the following InvBlocks.
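A minimal single-channel sketch of this step (our own illustration; the sub-band normalization and direction naming are ours): each 2×2 block yields one low-pass coefficient, which equals a 2×2 average pooling, plus three detail coefficients, and the transform is exactly invertible.

```python
# Single-level 2D Haar transformation on one channel: an H x W image
# (H, W even) becomes four (H/2) x (W/2) sub-bands.

def haar_transform(img):
    h, w = len(img), len(img[0])
    low, hor, ver, dia = [], [], [], []
    for i in range(0, h, 2):
        rl, rh, rv, rd = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            rl.append((a + b + c + d) / 4)  # low-pass: 2x2 average pooling
            rh.append((a - b + c - d) / 4)  # detail, left-right differences
            rv.append((a + b - c - d) / 4)  # detail, top-bottom differences
            rd.append((a - b - c + d) / 4)  # detail, diagonal differences
        low.append(rl); hor.append(rh); ver.append(rv); dia.append(rd)
    return low, hor, ver, dia

def inverse_haar(low, hor, ver, dia):
    # exact inverse: each pixel is a signed sum of its four coefficients
    h2, w2 = len(low), len(low[0])
    img = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            l, hh, v, d = low[i][j], hor[i][j], ver[i][j], dia[i][j]
            img[2 * i][2 * j] = l + hh + v + d
            img[2 * i][2 * j + 1] = l - hh + v - d
            img[2 * i + 1][2 * j] = l + hh - v - d
            img[2 * i + 1][2 * j + 1] = l - hh - v + d
    return img
```

Stacking the four sub-bands channel-wise gives exactly the (H/2, W/2, 4C) shape described above.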
The Haar transform itself is basic material, so no further elaboration is needed here.
InvBlock
Taking the feature maps after the Haar Transformation as input, a stack of InvBlocks is used to further abstract the LR and latent representations. We leverage the general coupling layer architecture proposed in [15,16], i.e. Eqs. (1,3).
Utilizing the coupling layer is based on two considerations: (1) the input has already been split into low- and high-frequency components by the Haar transformation; (2) we want the two output branches of a coupling layer to further polish the low- and high-frequency inputs, for a suitable LR image appearance and an independent, properly distributed latent representation of the high-frequency contents. So we match the low- and high-frequency components respectively to the two branches of the split in Eq. (1). Furthermore, as the short-cut connection is proved to be important in image scaling tasks [36,50], we employ the additive transformation (Eq. 1) for the low-frequency part, and the enhanced affine transformation for the high-frequency part.
Note that the transformation functions φ(·), η(·), ρ(·) in Fig. 2 can be arbitrary. Here we employ a densely connected convolutional block, referred to as a Dense Block in [50], whose effectiveness has been demonstrated for the image upscaling task. Function ρ(·) is further followed by a centered sigmoid function and a scale term to prevent numerical explosion due to the exp(·) function. Note that Fig. 2 omits the exp(·) in function ρ.
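The coupling scheme can be sketched on scalar branches as follows (our own toy illustration; phi, rho and eta are arbitrary placeholders standing in for the Dense Blocks, and rho is bounded as in the sigmoid-scaled variant): the additive step acts on the low-frequency branch, the affine step on the high-frequency branch, and both invert exactly regardless of what the inner functions are.

```python
import math

# One InvBlock-style coupling step on scalar branches.
# x1: low-frequency branch (additive coupling)
# x2: high-frequency branch (affine coupling)

def phi(u):  return 0.5 * u        # placeholder for phi(.)
def rho(u):  return math.tanh(u)   # placeholder for rho(.), kept bounded
def eta(u):  return 0.3 * u + 0.1  # placeholder for eta(.)

def invblock_forward(x1, x2):
    y1 = x1 + phi(x2)                      # additive transformation
    y2 = x2 * math.exp(rho(y1)) + eta(y1)  # affine transformation
    return y1, y2

def invblock_inverse(y1, y2):
    x2 = (y2 - eta(y1)) * math.exp(-rho(y1))  # undo affine step
    x1 = y1 - phi(x2)                          # undo additive step
    return x1, x2
```

Invertibility holds by construction: each step only transforms one branch using the other, so it can be undone in reverse order.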
To save the output images of IRN in common image storage formats (8 bits for each of the R, G and B color channels), a quantization module is adopted to convert the floating-point values of the produced LR images to 8-bit unsigned integers. We simply use rounding as the quantization module, store the output LR images in PNG format, and use them in the upscaling procedure. One obstacle should be noted: the quantization module is nondifferentiable. To ensure that IRN can be optimized during training, we apply the Straight-Through Estimator [9] to the quantization module when computing gradients.
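A minimal sketch of the quantization step and its straight-through gradient (function names are ours): the forward pass rounds to an 8-bit level, while the backward pass treats the rounding as the identity so the upstream gradient passes through unchanged.

```python
# Rounding quantization to 8-bit levels with a straight-through estimator.

def quantize(y):
    """Map a float in [0, 1] to an 8-bit integer level (what the PNG stores)."""
    return max(0, min(255, round(y * 255)))

def dequantize(q):
    """Map an 8-bit level back to a float in [0, 1] for the upscaling pass."""
    return q / 255.0

def ste_grad(upstream_grad):
    """Straight-through estimator: the nondifferentiable rounding is treated
    as identity in the backward pass, so its 'gradient' is 1."""
    return upstream_grad
```

In an autodiff framework this is typically implemented as a custom op whose backward method simply forwards the incoming gradient.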
Training Objectives
Based on Section 3.1, our approach to invertible downscaling constructs a model that specifies a correspondence between the HR image x and the LR image y, as well as a case-agnostic distribution p(z) of z. The goal of training is to drive these modeled relations and quantities to match our desiderata and the HR image data {x^(n)}, n = 1, ..., N. This involves three specific goals, as detailed below.
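The three goals can be sketched as one weighted objective on toy vectors (our own illustration: the weight names and default values are placeholders, not the paper's settings, and the distribution-matching term is taken as a precomputed scalar).

```python
# Combined IRN-style training objective:
# HR reconstruction + LR guidance + distribution matching.

def l2(a, b):
    """Squared L2 distance between two equal-length vectors (as lists)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def irn_objective(x, x_rec, y, y_guide, distr_loss,
                  lam_rec=1.0, lam_guide=1.0, lam_distr=1.0):
    """x: HR image; x_rec: inversely reconstructed HR image;
    y: produced LR image; y_guide: guidance target (e.g. a Bicubic
    downscaling of x); distr_loss: precomputed distribution-matching term."""
    return (lam_rec * l2(x, x_rec)
            + lam_guide * l2(y, y_guide)
            + lam_distr * distr_loss)
```

Each term maps onto one goal: reconstruction quality, visually pleasing LR output, and a case-agnostic latent distribution.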