One minute! This AI's creativity has exploded...

2021-10-27 06:55:19

Designing clothing and furniture in one minute, and serving as a powerful assistant to designers:

This is M6, the industry's largest Chinese multimodal pre-trained AI model, jointly released by Alibaba and Tsinghua University.

With a parameter scale of up to 100 billion, M6 is the largest model in the history of multimodal pre-training. Taking image generation as an example, M6 can design images for more than 30 item categories, including clothing, footwear, furniture, jewelry, and books, and a design can be completed in as little as one minute.


How does M6 achieve fast and sophisticated design?

M6 is a "multimodal pre-trained model". As a new approach to training AI, multimodal pre-training breaks through the bottleneck of traditional deep learning methods and gives AI cognitive capabilities.

The training path of M6 is as follows: first, the model automatically learns from a large amount of text and image data, memorizing and understanding the rich prior knowledge of human beings; it then learns information from specialized fields, so that the AI masters common sense and professional knowledge at the same time.

The breakthrough of M6 stems from a number of underlying technological innovations. Building on its self-developed Whale distributed framework, the Alibaba research team expanded the parameter scale to 100 billion and used large-scale data parallelism and model parallelism to increase training speed by more than 10 times. Pre-training on hundreds of millions of data items takes only 1-2 days.
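As a rough illustration of the data parallelism mentioned above, the following minimal sketch splits a batch across two simulated workers, lets each compute a gradient on its own shard, and averages the results before updating the weight. It is not the Whale framework's API; the toy model `y = w * x` and the names `grad_mse` and `data_parallel_step` are assumptions made up for this example.

```python
# Toy data parallelism: each "worker" holds one shard of the batch.
# This is an illustrative sketch, not Whale's actual interface.

def grad_mse(w, shard):
    """Gradient of the mean squared error of y = w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr):
    """One update step: per-shard gradients, averaged, then applied."""
    grads = [grad_mse(w, shard) for shard in shards]  # computed independently
    avg_grad = sum(grads) / len(grads)                # stands in for an all-reduce
    return w - lr * avg_grad

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two equally sized shards

new_w = data_parallel_step(0.0, shards, lr=0.01)
```

With equally sized shards, the averaged gradient matches the gradient computed on the full batch, which is why the work can be spread across machines without changing the result of the update.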

The Alibaba M6 model automatically designs pictures based on text content

In addition, M6 is the first to apply a multimodal pre-trained model to text-based image generation. Combined with a vector-quantized generative adversarial network (VQGAN), it learns to jointly model text and image codes, which allows it to generate high-definition, richly detailed images.
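The vector quantization step can be pictured with a small sketch: each image patch vector is replaced by the index of its nearest entry in a codebook, so an image becomes a sequence of discrete codes that can be modeled alongside text tokens. The codebook values, patch vectors, and function names below are hypothetical and chosen only for illustration; M6's actual codebook and encoder are not described in this article.

```python
# Toy vector quantization: map each vector to its nearest codebook entry.
# Codebook and patches are made-up values for illustration only.

def quantize(vectors, codebook):
    """Return, for each vector, the index of the closest codebook entry
    (squared Euclidean distance)."""
    indices = []
    for v in vectors:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])),
        )
        indices.append(best)
    return indices

def dequantize(indices, codebook):
    """Look the codes back up to get the approximate vectors."""
    return [codebook[i] for i in indices]

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 0.8], [0.2, 0.9]]

codes = quantize(patches, codebook)        # discrete image codes
approx = dequantize(codes, codebook)       # vectors reconstructed from codes
```

In this toy case the three patches map to codes 0, 1, and 2 respectively. In a VQGAN-style setup, a decoder would then turn such codes back into pixels, while a separate model learns to predict the code sequence from text.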

"Multimodal pre-training is the foundation of the next generation of artificial intelligence. The M6 model has achieved a number of breakthroughs in areas such as training efficiency and generation accuracy, and is currently the best-performing model on many Chinese multimodal downstream tasks," said Yang Hongxia, senior algorithm expert at the Intelligent Computing Laboratory of Alibaba DAMO Academy.

As one of the earliest technology companies in China to invest in cognitive intelligence research, Alibaba has had more than 30 research results in the field accepted at top international conferences.

As a next step, the research team will develop a larger, trillion-parameter multimodal pre-trained model, continue to push the limits of computing power and pre-trained models, and ultimately aim for high-quality generation of a broad range of content in general domains.
