
OpenAI's DALL·E gets an upgrade: not only generating images from text, but also creating derivative works from existing images

Report by the Machine Heart editorial department

When it comes to taking your breath away, OpenAI never disappoints.


On January 6 last year, OpenAI released a new model, DALL·E, which generates images directly from text. It broke down the wall between natural language and vision, drawing cheers from across the AI community.

More than a year later, DALL·E has received an upgraded version: DALL·E 2.


Compared with DALL·E, DALL·E 2 generates user-described images at higher resolution and with lower latency. The new version also adds capabilities such as editing an existing image.

However, OpenAI is not making DALL·E 2 directly available to the public. For now, researchers can sign up online to preview the system, and OpenAI hopes to make it available to third-party applications in the future.

Waitlist address: https://labs.openai.com/waitlist

OpenAI also released the DALL·E 2 research paper, "Hierarchical Text-Conditional Image Generation with CLIP Latents". Prafulla Dhariwal, an OpenAI research scientist and co-author, said: "This neural network is amazing; it can generate corresponding images from text descriptions."

Paper address: https://cdn.openai.com/papers/dall-e-2.pdf

Netizens have been posting images generated with DALL·E 2, such as a panda skateboarding.


Another example is a child and a puppy sitting on the floor and watching the stars.


DALL·E 2's generative art masterpieces

How does DALL·E 2 perform? Let's take a sneak peek. First of all, DALL·E 2 can create original, realistic images and art from a text description, combining concepts, attributes, and styles in the generated image. For example, an astronaut riding a horse:


More than one image can be generated per prompt (the official website shows 10 examples), and the results for the same description (an astronaut riding a horse) are remarkably varied:
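As a concrete illustration, here is a minimal sketch of what requesting several candidates for one prompt could look like programmatically. No public API existed at the time of writing; this uses the Image.create call from the Python SDK OpenAI released later, and the key and prompt are placeholders:

```python
# Hedged sketch: programmatic text-to-image generation via OpenAI's
# later-released Images endpoint. Key, prompt, and counts are placeholders.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.Image.create(
    prompt="an astronaut riding a horse in a photorealistic style",
    n=4,                # ask for several candidates for the same prompt
    size="1024x1024",   # DALL·E 2's native output resolution
)

for item in response["data"]:
    print(item["url"])  # each URL points to one generated candidate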

DALL·E 2 can edit existing images based on natural-language captions. It can add and remove elements while taking shadows, reflections, and textures into account. In the image below, the original is on the left and the DALL·E 2-edited version is on the right. Comparing the two, the left image is marked with the numbers 1, 2, and 3; clicking one of these positions adds an element, such as a corgi, at that spot. The image below adds a corgi at position 1.

A corgi can also be added at position 3.

DALL·E 2 can also take an original image and create different variations of it:

You might ask what makes DALL·E 2 better than its predecessor. Simply put, DALL·E 2 produces more realistic and accurate images at 4x the resolution. For example, given the prompt "a fox sitting in a field at sunrise, in the style of Claude Monet", DALL·E 2 produces the more accurate image.

Having seen the demonstrations above, we can summarize DALL·E 2's characteristics as follows. One of its new features is inpainting, which builds on DALL·E 1's text-to-image generation but works at a finer-grained level within an image. Users can start with an existing picture and select a region for the model to edit; for example, a painting on a living-room wall can be swapped for a different one, or a vase of flowers can be placed on a coffee table. The model can fill in (or remove) objects while accounting for details such as the direction of shadows in the room. A hedged sketch of what such an edit could look like programmatically appears below.
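The Image.create_edit call here is from the Python SDK OpenAI released later (no public API existed when this article was written); the file names and prompt are placeholders, and the mask is a PNG whose transparent pixels mark the region to repaint:

```python
# Hedged sketch: inpainting a region of an existing image. File names and
# prompt are placeholders; transparent pixels in mask.png mark the edit area.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.Image.create_edit(
    image=open("living_room.png", "rb"),  # original square PNG
    mask=open("mask.png", "rb"),          # transparency marks the edit region
    prompt="a corgi sitting on the floor",
    n=1,
    size="1024x1024",
)
print(response["data"][0]["url"])  # URL of the edited image
```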

Another DALL·E 2 feature is generating different variations of an image: the user uploads an image and the model creates a series of similar variants. In addition, DALL·E 2 can blend two pictures to produce an image containing elements of both. The generated images are 1024 x 1024 pixels, a significant jump from DALL·E 1's 256 x 256 pixels. A similar sketch for the variations feature follows.
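Again, this is a hedged illustration using a call (Image.create_variation) from the later-released Python SDK, with placeholder file names:

```python
# Hedged sketch: generating variants of an uploaded image.
import openai

openai.api_key = "sk-..."  # placeholder API key

response = openai.Image.create_variation(
    image=open("input.png", "rb"),  # the user-uploaded source image
    n=3,                            # number of similar variants to create
    size="1024x1024",
)
for item in response["data"]:
    print(item["url"])  # each URL is one variant of the source image
```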

Iterating on the generative model

DALL·E 2 builds on CLIP. OpenAI research scientist Prafulla Dhariwal says: "DALL·E 1 simply took our GPT-3 approach from language and applied it to image generation: compressing images into a series of words, and learning to predict what comes next."

This is the GPT approach used by many text-based AI applications. But word-by-word prediction doesn't always match what people expect, and it limits how realistic the images can be. CLIP, by contrast, was designed to look at images and summarize their contents the way a human would, and OpenAI iterated on it to create an inverted version, "unCLIP", which starts from a description and works toward an image; DALL·E 2 generates the image itself through a process called diffusion.

The training dataset consists of pairs (x, y) of images x and their corresponding captions y. For a given image x, let z_i and z_t denote its CLIP image and text embeddings, respectively. OpenAI's generative stack produces images from captions using two components:

a prior P(z_i|y), which generates a CLIP image embedding z_i conditioned on the caption y;

a decoder P(x|z_i, y), which produces an image x conditioned on the CLIP image embedding z_i (and optionally the text caption y).

The decoder lets researchers invert images given their CLIP image embeddings, while the prior lets them learn a generative model over the image embeddings themselves. Stacking these two components yields a generative model P(x|y) of images x given captions y:
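P(x|y) = P(x, z_i|y) = P(x|z_i, y) P(z_i|y),

where, as the paper notes, the first equality holds because z_i is a deterministic function of the image x.

To make the two-stage sampling concrete, here is a toy Python sketch. The three functions are stand-ins for the diffusion models the paper describes (they just return random arrays here), so this illustrates only the flow of data, not the real models:

```python
import numpy as np

DIM = 512  # assumed embedding width, for illustration only

def clip_text_embed(caption: str) -> np.ndarray:
    # Stand-in for CLIP's text encoder: a pseudo-embedding z_t of the caption.
    rng = np.random.default_rng(abs(hash(caption)) % 2**32)
    return rng.standard_normal(DIM)

def prior_sample(z_t: np.ndarray) -> np.ndarray:
    # Stand-in for the prior P(z_i|y): sample an image embedding near z_t.
    return z_t + 0.1 * np.random.standard_normal(DIM)

def decoder_sample(z_i: np.ndarray, caption: str) -> np.ndarray:
    # Stand-in for the diffusion decoder P(x|z_i, y): return pixel data.
    return np.random.standard_normal((1024, 1024, 3))

def generate_image(caption: str) -> np.ndarray:
    z_t = clip_text_embed(caption)       # embed the caption
    z_i = prior_sample(z_t)              # stage 1: prior samples z_i from y
    return decoder_sample(z_i, caption)  # stage 2: decoder renders x from z_i

image = generate_image("an astronaut riding a horse")
```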

The full DALL·E model was never publicly released, but other developers have built tools imitating some of its features. One of the most popular mainstream apps is Wombo's Dream mobile app, which generates images from whatever the user describes.

OpenAI already has some protections built in. The model was trained on a dataset from which objectionable material had been removed, which ideally limits its ability to produce objectionable content.

To prevent misuse of generated images, DALL·E 2 stamps a watermark on each output to indicate that the work was generated by AI. In addition, the model cannot generate recognizable faces based on a name.
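As a purely illustrative aside, a visible corner signature of this kind can be stamped with a few lines of image code. This sketch is not OpenAI's implementation; the palette, square size, and placement are all assumptions:

```python
# Illustrative only: stamp a strip of colored squares in the bottom-right
# corner of an image. Not OpenAI's actual watermarking mechanism.
from PIL import Image, ImageDraw

def stamp_signature(img: Image.Image) -> Image.Image:
    out = img.copy()
    draw = ImageDraw.Draw(out)
    sq = max(out.width // 64, 4)  # assumed square size
    colors = ["#ffff66", "#42ffff", "#51da4c", "#ff6e3c", "#3c46ff"]  # assumed palette
    x0, y0 = out.width - sq * len(colors), out.height - sq
    for i, color in enumerate(colors):
        draw.rectangle(
            [x0 + i * sq, y0, x0 + (i + 1) * sq - 1, y0 + sq - 1],
            fill=color,
        )
    return out

stamp_signature(Image.new("RGB", (1024, 1024), "white")).save("signed.png")
```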

DALL·E 2 will be tested by vetted partners, subject to several requirements: users are prohibited from uploading or generating images that are "likely to cause harm". They must also disclose AI's role in generating their images, and they may not serve generated images to others through an app or website.

But OpenAI hopes to eventually add DALL·E 2 to the organization's API toolset so that it can power third-party applications. Dhariwal said: "We want to go through this process in stages, continuously evaluating from the feedback we get how to release this technology safely."

Reference Links:

https://openai.com/dall-e-2/

https://www.theverge.com/2022/4/6/23012123/openai-clip-dalle-2-ai-text-to-image-generator-testing
