
"Regeneration" Dalí + robot wall-e, text generation picture AI upgrade is coming!

This is an image generated by the AI system DALL-E 2 from the text description "Shiba Inu dog wearing a beret and black turtleneck."

A year on, the upgraded version of DALL-E has arrived!

On April 6, local time, the artificial intelligence research lab OpenAI released DALL-E 2, its text-to-image generation program. With higher resolution and lower latency, DALL-E 2 was preferred over its predecessor in human evaluations 71.7% of the time for caption matching and 88.8% of the time for photorealism, generates images at 4 times the resolution, and can combine concepts, attributes, and styles to create more vivid images, such as a fox on a prairie painted in the style of Claude Monet.

Two new features have been added: local editing of an image guided by fine-grained text, and generation of multiple stylistic variations of an original image.

The former looks like this!

Adding a flamingo swim ring at area 2 of the original image

Adding a puppy at area 1 and area 2 of the original image, respectively

DALL-E 2 applies DALL-E's text-to-image capabilities at a finer level. The user can start with an existing picture, select an area, and tell the model how to modify it. The model can add or remove objects while accounting for details such as shadow direction, reflections, and texture.
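For readers who want to see what such a local edit could look like in code, here is a minimal sketch assuming access to OpenAI's image-edit API endpoint, which was published after this announcement; the file names and prompt are hypothetical.

```python
# A minimal sketch of a text-guided local edit (inpainting), assuming
# access to OpenAI's images.edit endpoint (published after this
# announcement). File names and the prompt are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The mask marks the editable region: fully transparent pixels are
# where the model may repaint; opaque pixels are left untouched.
result = client.images.edit(
    model="dall-e-2",
    image=open("original.png", "rb"),      # the existing picture
    mask=open("area_2_mask.png", "rb"),    # transparent over "area 2"
    prompt="a flamingo swim ring floating on the water",
    n=1,
    size="1024x1024",
)

print(result.data[0].url)  # URL of the edited image
```

The mask is what expresses "select an area": everything outside it is preserved, which is why details such as shadows and reflections in the untouched region stay consistent.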

The latter looks like this!

Using the same image as a baseline, the model creates versions in different styles or arrangements.

The resulting images are 1024 x 1024 pixels, a leap from the 256 x 256 pixels produced by the original model.
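A comparable sketch for the variations feature, again assuming the image-variation endpoint OpenAI published later; the input file name is hypothetical.

```python
# A minimal sketch of generating stylistic variations of an existing
# image, assuming OpenAI's images.create_variation endpoint (published
# after this announcement). The input file name is hypothetical.
from openai import OpenAI

client = OpenAI()

result = client.images.create_variation(
    image=open("baseline.png", "rb"),  # the image used as a baseline
    n=4,                               # number of variations to request
    size="1024x1024",                  # DALL-E 2's upgraded resolution
)

for item in result.data:
    print(item.url)  # each URL is a different take on the baseline
```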

DALL-E takes its name from the artist Salvador Dalí and the robot WALL-E from Pixar's film of the same name; the first version debuted in January 2021. DALL-E is a 12-billion-parameter version of the 175-billion-parameter GPT-3 model, trained on paired text-image data to produce images from text descriptions.

Salvador Dalí

The robot WALL-E, protagonist of Pixar's WALL-E

Prafulla Dhariwal, a research scientist at OpenAI, said: "DALL-E 1 just took our GPT-3 approach from language and applied it to producing images: we compressed images into a series of words and learned to predict what comes next."
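To make that "predict what comes next" recipe concrete, here is a toy sketch in which a bigram frequency table stands in for the 12-billion-parameter transformer; the token sequences are invented for illustration.

```python
# Toy sketch of the DALL-E 1 recipe Dhariwal describes: an image is
# represented as discrete tokens appended to the text tokens, and the
# model learns to predict the next token. A bigram frequency table
# stands in for the 12-billion-parameter transformer.
from collections import Counter, defaultdict

# Invented training "sequences": text tokens followed by image tokens.
sequences = [
    ["shiba", "inu", "<img>", 17, 42, 42, 8],
    ["shiba", "dog", "<img>", 17, 42, 8, 8],
]

bigrams = defaultdict(Counter)
for seq in sequences:
    for prev, nxt in zip(seq, seq[1:]):
        bigrams[prev][nxt] += 1

# Generation is just repeated next-token prediction.
token, generated = "<img>", []
for _ in range(4):
    token = bigrams[token].most_common(1)[0][0]
    generated.append(token)

print(generated)  # a (toy) sequence of image tokens: [17, 42, 8, 8]
```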

But this word-by-word matching does not necessarily capture the qualities humans care about most, and the prediction process limits the realism of the images. CLIP, a computer vision system OpenAI released last year, was designed to look at images and summarize their contents the way a human would.
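CLIP's key operation is scoring how well a caption matches an image. The sketch below uses the publicly released CLIP weights through Hugging Face's transformers library; the image file and candidate captions are made up for illustration.

```python
# Scoring caption-image agreement with the publicly released CLIP
# weights via Hugging Face transformers. The image file and captions
# are hypothetical.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("candidate.png")
captions = [
    "a Shiba Inu wearing a beret and black turtleneck",
    "an avocado-shaped armchair",
    "a fox on a prairie in the style of Claude Monet",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns
# them into a probability distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0].tolist()):
    print(f"{p:.3f}  {caption}")
```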

Images automatically created by the DALL-E system from the text "Avocado-shaped armchair"

CLIP underpinned the capabilities of the original DALL-E; DALL-E 2 combines the strengths of CLIP with those of diffusion models. DALL-E 2's "diffusion" image-generation process can be understood as starting from a "bag of dots" of pure noise and filling in more and more detail. A characteristic of diffusion models is that they can greatly improve the fidelity of generated images at some expense of diversity.
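As an intuition for that "bag of dots to details" process, here is a toy denoising loop; the stand-in predictor replaces the trained neural network a real diffusion model uses, so it shows only the shape of the procedure, not DALL-E 2's actual model.

```python
# Toy illustration of the diffusion idea: start from pure noise and
# take many small "denoising" steps toward an image. The stand-in
# predictor below replaces the learned network of a real model.
import numpy as np

rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))       # pretend this is the "true" image
x = rng.standard_normal((64, 64, 3))   # start from a "bag of dots" (noise)

for step in range(50):
    # A real diffusion model would predict the denoised image (or the
    # noise to subtract) with a trained network conditioned on the text.
    predicted_denoised = target                 # stand-in prediction
    x += 0.1 * (predicted_denoised - x)         # small denoising step
    x += 0.02 * rng.standard_normal(x.shape)    # inject a little fresh noise,
                                                # as diffusion samplers do

print("mean distance to target:", float(np.abs(x - target).mean()))
```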

An image generated by DALL-E 2 from the description "Teddy bears mixing sparkling chemicals as mad scientists, steampunk."

To prevent misuse of generated images, OpenAI has implemented several built-in safeguards.

The model is trained on datasets scrubbed of objectionable material and will be tested by OpenAI-vetted partners. Users are barred from uploading or generating images that are "not G-rated" or "potentially harmful," as well as any images involving hate symbols, nudity, obscene gestures, or "major conspiracies or events related to major geopolitical events that are taking place."

The model also cannot generate any recognizable face from a name, even when a face as famous as the Mona Lisa's is requested. In addition, DALL-E 2 watermarks each generated image to indicate that the work was produced by AI. Ideally, these measures will limit the system's capacity to produce harmful content.

As before, the tool is not being released directly to the public. Researchers can apply to preview the system, and OpenAI hopes to eventually incorporate DALL-E 2 into the organization's API toolset, making it available for third-party applications.

Dhariwal said: "We want to carry out this process in stages, continuously evaluating from the feedback we receive how to release this technology safely."
