laitimes

Too many other tourists in the photo? Samsung researchers propose the LaMa model to erase them all with one click!

Reporting by XinZhiyuan

Editor: LRS

[XinZhiyuan Introduction] Is there something unwanted in your photo that is too much trouble to edit out? A handy tool has arrived! Samsung researchers recently proposed an image inpainting model, LaMa, which handles high-resolution input without excessive computation and produces impressive results!

Everyone has probably had this experience when taking pictures: the background is always full of other tourists, and afterwards it takes ages just to find yourself in the photo.

Besides other tourists, a trash can or too many unrelated elements in the frame can also ruin the look of the whole photo. For those who are not skilled with Photoshop, removing these elements from the picture is really difficult.

But the whole point of developing artificial intelligence is to make this kind of work simple!

With just one click, you can remove all the unwanted elements from a picture, leaving no trace of editing!

Image repair

For a long time, many researchers have studied how to remove elements from a picture and plausibly fill in the background, a task known as image inpainting.

The task looks simple but is quite difficult in practice: the background hidden behind the removed object is completely unknown to the AI, so it has to be imagined by the model.

Moreover, what is hidden is not always a regular background pattern; it can be quite complex content.

But since pioneering work on image inpainting was published in 2016, results have become quite impressive. Face restoration still involves some "imagination", but removing objects from a background is now a piece of cake.

When humans mentally complete an image, they naturally draw on their understanding of the three-dimensional world, whereas the only information an AI receives is the pixels of a two-dimensional image. This gap in available information is one of the difficulties of AI image inpainting.

Humans can also infer the whole of an object from a part of it, using visual common sense. So for an AI to learn image inpainting, we first need to teach the machine one thing: what does the world really look like?

The ImageNet dataset provides a large number of two-dimensional images, which makes it easier to teach a machine what the world looks like.

Another problem is that the real photos people want to repair are usually high resolution, so the computational cost is higher, yet most current inpainting methods focus on low-resolution images. It is possible to shrink the image, repair the small version, and then apply the result back to the original, but the final result is generally not as good as repairing the original image directly.
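
The shrink, repair, and paste-back workaround described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `toy_inpaint` is a hypothetical placeholder that fills the hole with the mean of the known pixels, not a real inpainting model, and the image sides are assumed to be divisible by `factor`.

```python
import numpy as np

def downscale(img, factor):
    """Average-pool an (H, W) image by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upscale(img, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def toy_inpaint(img, mask):
    """Hypothetical stand-in for a model: fill masked cells with the mean of known cells."""
    out = img.copy()
    out[mask] = img[~mask].mean()
    return out

def repair_via_low_res(image, mask, factor=4):
    """Repair a small copy of the image, then paste the result into the original."""
    small_img = downscale(image, factor)
    # A low-resolution cell counts as masked if any of its pixels is masked.
    small_mask = downscale(mask.astype(float), factor) > 0
    small_fixed = toy_inpaint(small_img, small_mask)
    big_fixed = upscale(small_fixed, factor)
    # Original pixels stay untouched outside the mask; the repair fills the hole.
    return np.where(mask, big_fixed, image)

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
mask = np.zeros((64, 64), dtype=bool)
mask[16:32, 16:32] = True
result = repair_via_low_res(image, mask)
```

Whatever model replaces `toy_inpaint`, the pasted region can hold no more detail than the low-resolution repair contained, which is why this route tends to look worse than repairing at full resolution.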

High-resolution images allow more realistic inpainting but require more time for training and processing. Is there really no way to have both?

LaMa model

In response to this problem, Samsung's researchers proposed a new model, LaMa (LArge MAsk inpainting), which can remove arbitrary elements from high-resolution images.

LaMa's main innovations are: a new inpainting network architecture that uses fast Fourier convolutions, which have an image-wide receptive field; a perceptual loss with a high receptive field; and large training masks, which unlock the potential of the first two components.

The model also generalizes well to resolutions higher than those seen during training, and achieves performance comparable to strong baselines with fewer parameters and lower computational cost.

Paper: https://arxiv.org/abs/2109.07161

Code: https://github.com/saic-mdal/lama

For example, the various trees, window sills, street lights, and cars in the picture below can be removed with one click.

The main architecture of the model is as follows. The input consists of a black-and-white mask and the original image. The mask is stacked with the image and fed into the inpainting network, where the input is downsampled to a lower resolution, passed through several fast Fourier convolution (FFC) residual blocks, and finally upsampled to produce a high-resolution inpainted image.
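
As a rough sketch of the data flow just described, the snippet below builds the stacked input and traces how the spatial size changes through the network. The function names are hypothetical, blanking the masked pixels before stacking is an assumption about the preprocessing, and the three stride-2 stages are an illustrative choice rather than a confirmed detail of LaMa.

```python
import numpy as np

def make_network_input(image, mask):
    """Stack the masked image with its mask.

    image: (H, W, 3) floats in [0, 1]; mask: (H, W) bool, True marks pixels to remove.
    Returns an (H, W, 4) array.
    """
    m = mask.astype(image.dtype)[..., None]      # (H, W, 1)
    masked_image = image * (1.0 - m)             # blank out the region to be filled (assumption)
    return np.concatenate([masked_image, m], axis=-1)

def trace_spatial_sizes(height, width, n_down=3):
    """List feature-map sizes through downsampling, FFC blocks, and upsampling.

    n_down=3 stride-2 stages is an assumption made for illustration.
    """
    sizes = [(height, width)]
    for _ in range(n_down):                      # each stage halves the resolution
        height, width = height // 2, width // 2
        sizes.append((height, width))
    sizes.append((height, width))                # FFC residual blocks keep the size
    for _ in range(n_down):                      # each stage doubles it again
        height, width = height * 2, width * 2
        sizes.append((height, width))
    return sizes

image = np.full((256, 256, 3), 0.5)
mask = np.zeros((256, 256), dtype=bool)
mask[100:150, 100:150] = True
x = make_network_input(image, mask)
sizes = trace_spatial_sizes(256, 256)
```

Here `x` has four channels (three color channels plus the mask), and `sizes` starts and ends at the full resolution, with the FFC blocks running on the smallest feature map in between.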

Like a typical inpainting network, LaMa must understand the image and fill in the pixels it considers most suitable. To reduce computation, it also shrinks the image at the start of the network. However, LaMa uses some special techniques when processing the image so that the result keeps the quality of the original high-resolution image as far as possible.

The network works in two main steps.

First, the model compresses the image, trying to keep only the important information. The network ends up retaining mostly general information about the image, such as color, overall style, or the common objects that appear, but not precise details. The model then applies the same principle in reverse to rebuild the image. Tricks such as skip connections are typically used to carry information from the first few layers of the network over to the second step, so that the model can place the reconstructed content correctly.

Put simply, the model may know that the picture contains a tower, a blue sky, and trees, which is called global information, but it still needs skip connections to recognize that it is the Eiffel Tower standing in the center of the picture.

Finer-grained details, such as where the clouds are and what colors the trees have, are what the researchers call local information.

But there is still a problem: the model is working on a lower-quality image, which reduces the quality of the inpainting. What is special about LaMa is that it does not use convolutions and skip connections to maintain local knowledge as a regular convolutional network would, but instead uses fast Fourier convolution. This means the network works in both the spatial and frequency domains and does not need to go back to earlier layers to understand the context of the image.

Each layer processes local features with ordinary convolutions in the spatial domain and analyzes global features with Fourier convolutions in the frequency domain.

The frequency domain is a little unusual. The input image is decomposed into all possible frequencies, so each pixel of the transformed image represents not a color but one frequency spanning the entire spatial image, together with how much of that frequency is present. The frequencies here are of course not sound frequencies but repeating patterns at different scales.
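
To make this concrete, here is a small NumPy sketch (an illustration, not code from LaMa) that builds a synthetic image from two repeating patterns and reads their strengths back from the 2D Fourier transform. The image size and the two patterns are arbitrary choices made for the example.

```python
import numpy as np

n = 64
y, x = np.mgrid[0:n, 0:n]
# A coarse pattern (2 repeats across the width) plus a weaker, finer one
# (10 repeats down the height).
image = 1.0 * np.cos(2 * np.pi * 2 * x / n) + 0.5 * np.cos(2 * np.pi * 10 * y / n)

spectrum = np.fft.rfft2(image)                 # shape (n, n // 2 + 1), complex
amplitude = np.abs(spectrum) / (n * n / 2)     # rescale so a unit cosine reads 1.0

coarse_strength = amplitude[0, 2]    # horizontal pattern, 2 cycles per image
fine_strength = amplitude[10, 0]     # vertical pattern, 10 cycles per image
```

Each entry of `amplitude` corresponds to one repeating pattern covering the whole image: `coarse_strength` comes out at about 1.0 and `fine_strength` at about 0.5, matching the weights used to build the image, while entries for patterns that are absent are essentially zero.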

Convolving over this Fourier image therefore lets the model work on the entire image at every convolution step, so it gains a good understanding of the image even in the first few layers without much computational cost, something conventional convolution cannot do.

The global and local results are then saved and sent to the next layer, which repeats these steps, eventually producing a final image that can be upsampled back to the original size.

Working in the Fourier domain also lets the model scale to larger images, because image resolution does not change the Fourier domain: it uses the frequencies of the whole image, rather than colors, as features, and the repeating patterns it looks for are the same whatever the image size. This means the same effect can be achieved even when the network is trained on small images.
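
The difference in reach between a spatial convolution and a Fourier-domain operation can be checked numerically. The sketch below is a simplified stand-in for the global branch, assuming it amounts to a real FFT, a pointwise (1x1) channel mixing with ReLU, and an inverse FFT. The weights are random, and names such as `spectral_transform` are hypothetical rather than taken from the LaMa code.

```python
import numpy as np

def spectral_transform(x, weight):
    """x: (C, H, W) real features; weight: (2C, 2C) pointwise mixing matrix."""
    c, h, w = x.shape
    freq = np.fft.rfft2(x, norm="ortho")                      # (C, H, W // 2 + 1) complex
    stacked = np.concatenate([freq.real, freq.imag], axis=0)  # real and imaginary parts as channels
    mixed = np.einsum("oc,chw->ohw", weight, stacked)         # 1x1 convolution over channels
    mixed = np.maximum(mixed, 0.0)                            # ReLU applied in the frequency domain
    out = mixed[:c] + 1j * mixed[c:]
    return np.fft.irfft2(out, s=(h, w), norm="ortho")         # back to (C, H, W)

def local_conv3x3(x, kernel):
    """x: (H, W); 3x3 zero-padded 'same' filter in the spatial domain."""
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out

rng = np.random.default_rng(0)
features = rng.standard_normal((1, 16, 16))
weight = rng.standard_normal((2, 2))
kernel = np.full((3, 3), 1.0 / 9.0)

poked = features.copy()
poked[0, 8, 8] += 1.0                                         # change a single input pixel

changed_local = int(np.sum(np.abs(
    local_conv3x3(poked[0], kernel) - local_conv3x3(features[0], kernel)) > 1e-12))
changed_global = int(np.sum(np.abs(
    spectral_transform(poked, weight) - spectral_transform(features, weight)) > 1e-12))
```

Changing one input pixel alters only the 3x3 neighbourhood in the spatial branch (`changed_local` is 9), whereas the frequency-domain branch spreads the change over most or all of the 16x16 output in a single layer.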
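
One way to read this claim is that a pattern is identified by how many times it repeats across the image, not by how many pixels it spans. The NumPy sketch below, with hypothetical helper names and a synthetic striped image, shows the same pattern landing at the same position in the spectrum at two different resolutions. It illustrates the idea only and is not a test of LaMa itself.

```python
import numpy as np

def dominant_frequency(image):
    """Return the (row, column) frequency-bin index of the strongest non-constant pattern."""
    spectrum = np.abs(np.fft.rfft2(image))
    spectrum[0, 0] = 0.0                       # ignore the mean-brightness term
    row, col = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    return int(row), int(col)

def stripes(size, repeats=8):
    """Vertical stripes repeating `repeats` times across a size x size image."""
    x = np.arange(size)
    return np.tile(np.cos(2 * np.pi * repeats * x / size), (size, 1))

small = dominant_frequency(stripes(64))
large = dominant_frequency(stripes(256))
```

Both `small` and `large` point at column 8 of the spectrum (eight cycles per image), even though the second image has sixteen times as many pixels.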

The researchers ran inpainting experiments on the CelebA-HQ dataset, using Learned Perceptual Image Patch Similarity (LPIPS) and FID as quantitative metrics. Almost all competing models perform worse than the LaMa-Fourier model (marked with a red up arrow). The table also reports metrics for different test mask generation strategies, namely narrow, wide, and segmentation masks; LaMa-Fourier remains stronger there, indicating that the proposed method makes more efficient use of its trainable parameters.
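
The article does not spell out FID. For reference, the usual definition of the Fréchet Inception Distance compares the statistics of Inception-network features from real and generated images, and lower values are better. The symbols are labels chosen here: μ_r and Σ_r are the feature mean and covariance for real images, μ_g and Σ_g those for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```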

Here are some inpainting examples produced by the model.

There are also some examples where the repair is not very good.

Although not every result is perfect, the LaMa model still performs well and represents an important step towards real-world applications.

Resources:

https://www.louisbouchard.ai/lama/
