
OpenAI's DALL·E gets an upgrade: not only generating images from text, but also creating derivative works from existing images

Report by the Machine Heart editorial department

When it comes to taking your breath away, OpenAI never disappoints.


On January 6 last year, OpenAI released a new model, DALL·E, which generates images directly from text. It broke down the wall between natural language and vision, drawing cheers from across the AI community.

More than a year later, DALL·E has received an upgraded version: DALL·E 2.


Compared with DALL·E, DALL·E 2 generates user-described images at higher resolution and with lower latency. The new version also adds capabilities such as editing an existing image.

However, OpenAI is not making DALL·E 2 directly available to the public. For now, researchers can sign up online to preview the system, and OpenAI hopes to make it available to third-party applications in the future.

Waitlist address: https://labs.openai.com/waitlist

OpenAI also released the DALL·E 2 research paper, "Hierarchical Text-Conditional Image Generation with CLIP Latents". Prafulla Dhariwal, an OpenAI research scientist and co-author, said: "This neural network is amazing; it can generate corresponding images from text descriptions."

Paper address: https://cdn.openai.com/papers/dall-e-2.pdf

Netizens have been posting images generated with DALL·E 2, such as a panda skateboarding.


Another example is a child and a puppy sitting on the floor and watching the stars.


DALL·E 2's generative art masterpieces

How does DALL·E 2 perform? Let's take a sneak peek. First of all, DALL·E 2 can create original, realistic images and art from a text description, combining concepts, attributes, and styles in the generated image. For example, an astronaut riding a horse:


More than one image can be generated per prompt (the official website shows 10 examples), and the results for the same description (an astronaut riding a horse) are remarkably varied:
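As a concrete illustration, here is a minimal sketch of what requesting several candidates for one prompt could look like programmatically. No public API existed at the time of writing; this uses the Image.create call from the Python SDK OpenAI released later, and the key and prompt are placeholders:

```python
# Hedged sketch: programmatic text-to-image generation via OpenAI's
# later-released Images endpoint. Key, prompt, and counts are placeholders.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.Image.create(
    prompt="an astronaut riding a horse in a photorealistic style",
    n=4,                # ask for several candidates for the same prompt
    size="1024x1024",   # DALL·E 2's native output resolution
)

for item in response["data"]:
    print(item["url"])  # each URL points to one generated candidate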

DALL·E 2 can edit existing images based on natural-language captions. It can add and remove elements while taking shadows, reflections, and textures into account. In the image below, the original is on the left and the DALL·E 2-edited version is on the right. Comparing the two, the left image is marked with the numbers 1, 2, and 3; clicking one of these positions adds an element, such as a corgi, at that spot. The image below adds a corgi at position 1.

A corgi can also be added at position 3.

DALL·E 2 can also take an original image and create different variations of it:

You might ask what makes DALL·E 2 better than its predecessor. Simply put, DALL·E 2 produces more realistic and accurate images at 4x the resolution. For example, given the prompt "a fox sitting in a field at sunrise, in the style of Claude Monet", DALL·E 2 produces the more accurate image.

Having seen the demonstrations above, we can summarize DALL·E 2's characteristics as follows. One of its new features is inpainting, which builds on DALL·E 1's text-to-image generation but works at a finer-grained level within an image. Users can start with an existing picture and select a region for the model to edit; for example, a painting on a living-room wall can be swapped for a different one, or a vase of flowers can be placed on a coffee table. The model can fill in (or remove) objects while accounting for details such as the direction of shadows in the room. A hedged sketch of what such an edit could look like programmatically appears below.
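The Image.create_edit call here is from the Python SDK OpenAI released later (no public API existed when this article was written); the file names and prompt are placeholders, and the mask is a PNG whose transparent pixels mark the region to repaint:

```python
# Hedged sketch: inpainting a region of an existing image. File names and
# prompt are placeholders; transparent pixels in mask.png mark the edit area.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.Image.create_edit(
    image=open("living_room.png", "rb"),  # original square PNG
    mask=open("mask.png", "rb"),          # transparency marks the edit region
    prompt="a corgi sitting on the floor",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])  # URL of the edited image
```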

Another DALL·E 2 feature is generating different variations of an image: the user uploads an image and the model creates a series of similar variants. In addition, DALL·E 2 can blend two pictures to produce an image containing elements of both. The generated images are 1024 x 1024 pixels, a significant jump from DALL·E 1's 256 x 256 pixels. A similar sketch for the variations feature follows.
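Again, this is a hedged illustration using a call (Image.create_variation) from the later-released Python SDK, with placeholder file names:

```python
# Hedged sketch: generating variants of an uploaded image.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.Image.create_variation(
    image=open("input.png", "rb"),  # the user-uploaded source image
    n=3,                            # number of similar variants to create
    size="1024x1024",
)
for item in response["data"]:
    print(item["url"])  # each URL is one variant of the source image
```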

Iterating on the generative model

DALL·E 2 builds on CLIP. OpenAI research scientist Prafulla Dhariwal says: "DALL·E 1 simply took our GPT-3 approach from language and applied it to image generation: compressing images into a series of words, and learning to predict what comes next."

This is the GPT approach used by many text-based AI applications. But word-by-word prediction doesn't always match what people expect, and it limits how realistic the images can be. CLIP, by contrast, was designed to look at images and summarize their contents the way a human would, and OpenAI iterated on it to create an inverted version, "unCLIP", which starts from a description and works toward an image; DALL·E 2 generates the image itself through a process called diffusion.

The training dataset consists of pairs (x, y) of images x and their corresponding captions y. For a given image x, let z_i and z_t denote its CLIP image and text embeddings, respectively. OpenAI's generative stack produces images from captions using two components:

a prior P(z_i|y), which generates a CLIP image embedding z_i conditioned on the caption y;

a decoder P(x|z_i, y), which produces an image x conditioned on the CLIP image embedding z_i (and optionally the text caption y).

The decoder lets researchers invert images given their CLIP image embeddings, while the prior lets them learn a generative model over the image embeddings themselves. Stacking these two components yields a generative model P(x|y) of images x given captions y:
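P(x|y) = P(x, z_i|y) = P(x|z_i, y) P(z_i|y),

where, as the paper notes, the first equality holds because z_i is a deterministic function of the image x.

To make the two-stage sampling concrete, here is a toy Python sketch. The three functions are stand-ins for the diffusion models the paper describes (they just return random arrays here), so this illustrates only the flow of data, not the real models:

```python
import numpy as np

DIM = 512  # assumed embedding width, for illustration only

def clip_text_embed(caption: str) -> np.ndarray:
    # Stand-in for CLIP's text encoder: a pseudo-embedding z_t of the caption.
    rng = np.random.default_rng(abs(hash(caption)) % 2**32)
    return rng.standard_normal(DIM)

def prior_sample(z_t: np.ndarray) -> np.ndarray:
    # Stand-in for the prior P(z_i|y): sample an image embedding near z_t.
    return z_t + 0.1 * np.random.standard_normal(DIM)

def decoder_sample(z_i: np.ndarray, caption: str) -> np.ndarray:
    # Stand-in for the diffusion decoder P(x|z_i, y): return pixel data.
    return np.random.standard_normal((1024, 1024, 3))

def generate_image(caption: str) -> np.ndarray:
    z_t = clip_text_embed(caption)       # embed the caption
    z_i = prior_sample(z_t)              # stage 1: prior samples z_i from y
    return decoder_sample(z_i, caption)  # stage 2: decoder renders x from z_i

image = generate_image("an astronaut riding a horse")
```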

The full DALL·E model was never publicly released, but other developers have built tools imitating some of its features. One of the most popular mainstream apps is Wombo's Dream mobile app, which generates images from whatever the user describes.

OpenAI already has some protections built in. The model was trained on a dataset from which objectionable material had been removed, which ideally limits its ability to produce objectionable content.

To prevent misuse of generated images, DALL·E 2 stamps a watermark on each output to indicate that the work was generated by AI. In addition, the model cannot generate recognizable faces based on a name.
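As a purely illustrative aside, a visible corner signature of this kind can be stamped with a few lines of image code. This sketch is not OpenAI's implementation; the palette, square size, and placement are all assumptions:

```python
# Illustrative only: stamp a strip of colored squares in the bottom-right
# corner of an image. Not OpenAI's actual watermarking mechanism.
from PIL import Image, ImageDraw

def stamp_signature(img: Image.Image) -> Image.Image:
    out = img.copy()
    draw = ImageDraw.Draw(out)
    sq = max(out.width // 64, 4)  # assumed square size
    colors = ["#ffff66", "#42ffff", "#51da4c", "#ff6e3c", "#3c46ff"]  # assumed palette
    x0, y0 = out.width - sq * len(colors), out.height - sq
    for i, color in enumerate(colors):
        draw.rectangle(
            [x0 + i * sq, y0, x0 + (i + 1) * sq - 1, y0 + sq - 1],
            fill=color,
        )
    return out

stamp_signature(Image.new("RGB", (1024, 1024), "white")).save("signed.png")
```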

DALL·E 2 will be tested by vetted partners, subject to several requirements: users are prohibited from uploading or generating images that are "likely to cause harm". They must also disclose AI's role in generating their images, and they may not serve generated images to others through an app or website.

But OpenAI hopes to eventually add DALL·E 2 to the organization's API toolset so that it can power third-party applications. Dhariwal said: "We want to go through this process in stages, continuously evaluating from the feedback we get how to release this technology safely."

Reference Links:

https://openai.com/dall-e-2/

https://www.theverge.com/2022/4/6/23012123/openai-clip-dalle-2-ai-text-to-image-generator-testing
