Cressy, from Aofei Temple
QbitAI | WeChat official account QbitAI
An AI drawing tool that can accurately render text, Chinese characters included, is finally here!
It supports four languages, Chinese among them, and the position of the text can be specified arbitrarily.
From now on, we can finally say goodbye to the illegible scribbles of AI image generators.
The tool, called AnyText and developed by Alibaba, lets you add text to generated images exactly where you want it.
Previous image-generation models generally could not render text accurately, and even when they could, complex scripts such as Chinese were beyond them.
At present, AnyText supports four languages: Chinese, English, Japanese, and Korean. Not only are the glyphs accurate, the text also blends seamlessly into the image.
Besides generating text in new images, it can also modify text already present in an image, or even add new words to it.
We also gave AnyText a hands-on try.
All kinds of styles, handled with ease
The official deployment tutorial is provided in the GitHub documentation, and AnyText can also be tried out in the ModelScope community.
In addition, some netizens have packaged it into a notebook that can be deployed locally or launched in Colab with one click; this is the method we used.
AnyText accepts both Chinese and English prompts, though judging from the program log, Chinese prompts are automatically translated into English.
For example, we asked AnyText to put Musk in a white T-shirt and give QbitAI a shout-out.
Just enter the prompt, set the position of the text, and run.
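The "set the position" step amounts to supplying a mask that marks where the text should appear in the image. Below is a minimal numpy sketch of building such a position mask; the region coordinates are invented for illustration (in the actual demo UI you draw the region by hand):

```python
import numpy as np

# Canvas matching the output image resolution (e.g. 512x512).
H, W = 512, 512

def make_position_mask(box, h=H, w=W):
    """Build a binary mask that is 1 inside the text region `box`
    (top, left, bottom, right) and 0 elsewhere."""
    mask = np.zeros((h, w), dtype=np.float32)
    top, left, bottom, right = box
    mask[top:bottom, left:right] = 1.0
    return mask

# Mark a hypothetical region on the chest for the T-shirt text.
mask = make_position_mask((300, 150, 360, 380))
print(mask.shape, int(mask.sum()))  # (512, 512) 13800
```

A mask like this is what tells the model "put the glyphs here, and leave the rest of the image alone."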
If you need to adjust parameters such as the image dimensions, expand the menu at the top; if you get stuck, bilingual Chinese and English instructions are also provided.
In the end, running on a Colab V100, AnyText took a little over ten seconds to generate four images.
The results are quite good: both the image itself and the text look essentially flawless.
AnyText can also faithfully imitate all kinds of text materials, such as chalk writing on a blackboard, or even traditional calligraphy…
Street-view signage, and even e-commerce promotional posters, pose no problem for it either.
And it is not limited to flat text; three-dimensional lettering works too.
The text-editing feature can likewise modify text in existing images, leaving almost no visible trace.
In benchmark tests, AnyText also performed well: its text accuracy in both Chinese and English was significantly higher than ControlNet's, and its FID score was much lower.
In addition, if you deploy it yourself, you can use a custom font by supplying a font file and making a small change to the code.
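Font customization works because the model consumes rendered glyph images, so swapping the font file changes the glyphs it is asked to reproduce. Here is a hedged Pillow sketch of that rendering step; `font_path`, the canvas size, and the function name are all our own illustration (the snippet falls back to Pillow's built-in bitmap font so it runs without any .ttf present):

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_glyphs(text, font_path=None, size=64, canvas=(256, 80)):
    """Render `text` as a white-on-black glyph image, the kind of
    conditioning input a glyph branch consumes."""
    if font_path:
        # Point this at your custom .ttf to change the glyph style.
        font = ImageFont.truetype(font_path, size)
    else:
        font = ImageFont.load_default()  # portable fallback
    img = Image.new("L", canvas, color=0)
    ImageDraw.Draw(img).text((10, 10), text, fill=255, font=font)
    return np.asarray(img)

glyphs = render_glyphs("QbitAI")
print(glyphs.shape, glyphs.max())  # (80, 256) 255
```

Note that Pillow reports canvas size as (width, height) but the resulting array is (height, width).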
So, how did the researchers get AnyText to learn to write?
Text rendering is done independently
AnyText is built on a diffusion model and consists mainly of two modules; the text-generation process is kept relatively independent from the rest of the pipeline.
The two modules are an auxiliary latent module and a text-embedding module.
The auxiliary latent module encodes three kinds of information, the glyph, the text position, and a mask, and combines them into a latent feature map that assists the generation of legible characters.
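Conceptually, this fusion amounts to stacking the three encoded maps into one multi-channel latent tensor. A toy numpy illustration, with channel counts and the latent resolution invented for the example:

```python
import numpy as np

h, w = 64, 64  # assumed latent resolution

glyph_feat   = np.random.rand(1, h, w)  # encoded glyph image
position_map = np.random.rand(1, h, w)  # where the text should go
masked_img   = np.random.rand(4, h, w)  # latents of the masked source image

# The auxiliary latent: all three conditions fused channel-wise,
# ready to be consumed by the diffusion backbone.
aux_latent = np.concatenate([glyph_feat, position_map, masked_img], axis=0)
print(aux_latent.shape)  # (6, 64, 64)
```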
The text-embedding module decouples the semantic part of the prompt from the text to be rendered: an image-encoding module extracts the glyph information separately and then fuses it with the semantic information.
In practice, each piece of text to be rendered is replaced in the prompt by a placeholder symbol (an asterisk), reserving a slot for it in the embedding sequence.
The rendered glyph image is then fed into a pre-trained OCR model to extract glyph features, whose dimensions are adjusted before they are substituted into the reserved placeholder slots, yielding a new embedding sequence.
Finally, this sequence is fed into the CLIP text encoder to form the conditioning that ultimately guides image generation.
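The placeholder-replacement step described above can be sketched in a few lines: embed the prompt tokens as usual, then overwrite the asterisk slot with the OCR glyph feature after a linear projection. All dimensions, the placeholder index, and the random projection matrix below are stand-ins for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_text, d_ocr, seq_len = 768, 512, 10            # assumed dimensions
token_emb = rng.normal(size=(seq_len, d_text))   # prompt token embeddings
placeholder_idx = 4                              # slot of the '*' token

# Glyph feature from the pre-trained OCR model, projected to d_text
# (in the real model this projection is a learned linear layer).
ocr_feat = rng.normal(size=(d_ocr,))
proj = rng.normal(size=(d_ocr, d_text))
glyph_emb = ocr_feat @ proj

# Substitute the glyph embedding into the reserved slot; the resulting
# sequence is what goes on to the CLIP text encoder.
token_emb[placeholder_idx] = glyph_emb
print(token_emb.shape)  # (10, 768)
```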
This "divide and conquer" approach not only improves the accuracy of the rendered text but also helps the text stay visually consistent with the background.
In addition, AnyText can be plugged into other diffusion models, giving them text-generation capability as well.
Paper:
https://arxiv.org/abs/2311.03054
GitHub:
https://github.com/tyxsspa/AnyText
ModelScope:
https://modelscope.cn/models/damo/cv_anytext_text_generation_editing/summary
Colab notebook:
https://colab.research.google.com/github/camenduru/AnyText-colab/blob/main/AnyText_colab.ipynb
— END —
QbitAI · Signed author on Toutiao
Follow us and be the first to know about cutting-edge technology trends