It can design clothing and furniture in one minute and serve as a powerful assistant to designers.
This is M6, the industry's largest Chinese multimodal pre-trained AI model, jointly released by Alibaba and Tsinghua University.
With up to 100 billion parameters, M6 is the largest multimodal pre-trained model to date. Taking image generation as an example, M6 can design images for more than 30 item categories, including clothing, footwear, furniture, jewelry, and books, and a finished design can be produced in as little as one minute.

How does M6 achieve fast and sophisticated design?
The answer is that M6 is a "multimodal pre-trained model". Multimodal pre-training is a new approach to training AI that breaks through the bottleneck of traditional deep learning methods and gives AI cognitive capabilities.
M6 is trained in two stages. First, it automatically learns from a large amount of text and image data, memorizing and understanding the rich prior knowledge of human beings. It then goes on to learn information from specialized fields, so that the AI masters common sense and professional knowledge at the same time.
The breakthrough of M6 stems from a number of underlying technological innovations. Building on its self-developed Whale distributed framework, the Alibaba research team expanded the parameter scale to 100 billion and used large-scale data parallelism and model parallelism to increase training speed by more than 10 times. Pre-training on hundreds of millions of data items takes only 1-2 days.
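To make the idea of data parallelism concrete, here is a minimal toy sketch. It is not the Whale framework, whose API is not described here; the function names, the one-parameter model, and the data are all hypothetical. Each "worker" computes a gradient on its own shard of the batch, and the per-worker gradients are then averaged, which for equal-sized shards matches the gradient of the full batch.

```python
# Toy illustration of data parallelism (hypothetical names, not Whale's API).

def grad_mse(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_grad(w, batch, num_workers):
    """Split the batch into equal shards, compute per-shard gradients, average them."""
    assert len(batch) % num_workers == 0, "sketch assumes equal-sized shards"
    size = len(batch) // num_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    # On real hardware each shard's gradient would be computed on a separate device.
    local_grads = [grad_mse(w, shard) for shard in shards]
    # Averaging stands in for the all-reduce step between devices.
    return sum(local_grads) / num_workers

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = 0.5
full = grad_mse(w, batch)
parallel = data_parallel_grad(w, batch, num_workers=2)
```

Model parallelism, by contrast, splits the parameters of a single model across devices rather than splitting the data; it is not shown in this sketch.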
The Alibaba M6 model automatically designs pictures based on text content
In addition, M6 is the first to apply a multimodal pre-trained model to a text-to-image generation task. It is combined with a vector-quantized generative adversarial network to jointly model text and image encodings, which allows it to generate high-definition, richly detailed images.
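The vector quantization step can be illustrated with a small sketch. This shows only the general technique of mapping continuous vectors to the nearest entry in a codebook, so that an image becomes a sequence of discrete tokens that can be modeled alongside text tokens. The codebook, the two-dimensional "patch" vectors, and the function name are assumptions made for illustration and are not taken from M6.

```python
# Minimal sketch of vector quantization (hypothetical codebook and data).

def quantize(vectors, codebook):
    """For each vector, return the index of the nearest codebook entry
    under squared Euclidean distance."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [
        min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
        for v in vectors
    ]

codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # in practice learned, here fixed by hand
patches = [(0.1, -0.2), (0.9, 1.2), (0.2, 0.8)]   # stand-ins for encoded image patches
tokens = quantize(patches, codebook)              # discrete image tokens
```

In a real system the codebook is learned together with an image encoder and decoder, and the adversarial training that the article mentions is a separate component not covered here.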
"Multimodal pre-training is the foundation of the next generation of artificial intelligence. The M6 model has achieved a number of breakthroughs in areas such as training efficiency and generation accuracy, and it is currently the best-performing model on many Chinese multimodal downstream tasks," said Yang Hongxia, senior algorithm expert at the Intelligent Computing Laboratory of Alibaba DAMO Academy.
As one of the earliest technology companies in China to invest in cognitive intelligence research, Alibaba has had more than 30 research results in the field accepted at top international conferences.
As a next step, the research team will develop an even larger, trillion-parameter multimodal pre-trained model, continue to push the limits of computing power and pre-trained models, and ultimately achieve high-quality generation of all kinds of content in general domains.