Author | Cheng Qian
Editor | Yun Peng
Zhidongxi reported on July 5 that this afternoon, at the Tencent Forum of the 2024 World Artificial Intelligence Conference (WAIC), Jiang Jie, vice president of Tencent Group, revealed that the Tencent Hunyuan model adopts the MoE (mixture-of-experts) architecture, that its total parameter count has reached the trillion level, and that its training corpus exceeds 7 trillion tokens, placing it in the first echelon of domestic large models.
In addition, Tencent has built a full-link product matrix for putting large models into practice, spanning the infrastructure at the bottom to intelligent applications at the top. It includes self-developed general large models, model development platforms, agent development platforms, and intelligent application solutions customized for different scenarios.
Wu Yunsheng, Vice President of Tencent Cloud, Head of Tencent Cloud Intelligence, and Head of Tencent Youtu Lab, believes that large model technology is rapidly evolving toward multimodality, zero-shot learning, and 3D and video generation, and that Tencent is continuing to advance AI research and application. Tencent Cloud has upgraded its large model knowledge engine, image creation engine, and video creation engine to accelerate the application of large models in fields such as medical care and entertainment. In addition, for cultural research, Tencent Cloud announced that it has open-sourced the world's largest oracle bone script multimodal dataset.
Wu Yunsheng said he sees two views on the Scaling Law: one holds that progress along this technical path has slowed, and the other holds that it is still advancing. In his personal view, the Scaling Law as a whole will continue to play a role for some time; for example, there are still many new breakthroughs in multimodal research. It is therefore not yet possible to draw a definitive conclusion about the potential of this technical path, and exploration should continue based on different scenarios and on how the technology evolves.
Wu Yunsheng said that since its founding, Youtu Lab has held that technology generates value only when it is put to use in industry, which is also the direction of AI development, and its work has centered on this point. At the technical level, Youtu Lab is exploring multimodal technology and few-shot and zero-shot learning. At the platform level, it strives to bring together more players in the industry chain, continuously iterate on platform capabilities, and lower the threshold for AI applications. At the tool level, the knowledge engine, image creation engine, and video creation engine encapsulate tool capabilities so that users can quickly build intelligent applications and deploy them in specific scenarios.
▲Wu Yunsheng, Vice President of Tencent Cloud, Head of Tencent Cloud Intelligence, and Head of Tencent Youtu Lab
1. From unimodal to multimodal to full modality, while large models still lack killer applications
Jiang Jie, vice president of Tencent Group, said that the focus of this year's WAIC is undoubtedly large models. As of the end of April, China had launched more than 300 large models, more than 100 of which have over 1 billion parameters. In addition, OpenAI recently announced that it would cut off API access in mainland China, which shows the value and necessity of full-link independent research and development of large models.
At present, the Tencent Hunyuan model adopts the MoE architecture, its parameter count has reached the trillion level, and its training corpus has reached 7 trillion tokens. On this basis, Jiang Jie summarized several major trends in the current large model industry.
In the future, general-purpose models will become infrastructure like water, electricity, and coal, and there will be large models for different scenarios, in different modalities, and of different sizes, which enterprises can call and fine-tune according to their own needs.
Jiang Jie said that the capabilities of large models are still far from reaching their ceiling. As parameter scale continues to grow, models of different sizes will appear, and collaboration between large and small models will improve performance while meeting enterprises' customization needs. Large models mainly provide strong general capabilities and generalization, while small models are optimized for specific scenarios to deliver faster and more accurate inference.
The Tencent Hunyuan model has been opened to enterprise and individual developers through Tencent Cloud in several parameter sizes, including the trillion, hundred-billion, and ten-billion levels. Next, Tencent Hunyuan MoE models of various sizes will also be open-sourced to the public, supporting diverse deployment scenarios such as mobile phones, PCs, clouds, and data centers.
In addition, the large-scale model industry is undergoing an evolution from unimodal to multimodal and then to full modality.
He added that in text-to-image generation, models based on the DiT architecture incorporate the Transformer architecture, which was previously used mainly for text generation, and show significant advantages in image and video generation tasks. In text-to-video generation, models are moving toward higher resolution, longer duration, and finer detail, and some strong models can already generate high-definition videos several minutes long, opening up broad possibilities for applications.
In terms of application, Jiang Jie believes that application scenarios will be a decisive factor in the future competition among large models. At present, large models are deployed mainly to improve work efficiency, which is still some distance from core business, and killer applications are lacking.
At present, Tencent has nearly 700 internal businesses and scenarios connected to the large model, with about 300 million calls per day. In the future, Tencent will open up its capabilities and experience based on these practices to the outside world through Tencent Cloud.
2. The Hunyuan large model's training corpus exceeds 7 trillion tokens, and multiple versions have been opened up
Tencent has built a full-link product matrix in the field of large models, covering rich infrastructure at the bottom to multiple intelligent applications at the top. It includes self-developed general large models, model development platforms, agent development platforms, and intelligent application solutions customized for different scenarios.
Wu Yunsheng said that recent progress has brought AI technology to a new high point, but in practice single-modal technology performs well only in some scenarios, and adding multimodality can expand the ways large models are applied. For example, multimodal large models can combine vision and language understanding to break through the limitations of earlier applications and achieve more accurate semantic analysis.
In terms of learning paradigm, traditional models are limited by being built separately for each task and by the need for large amounts of labeled data, while zero-shot and few-shot learning can simplify the R&D process. For example, a digital human can now be generated from a single photo without an additional customization process. Wu Yunsheng said that this kind of technology can also be applied to industrial quality inspection, where simple text prompts or example photos of defects can improve the efficiency and accuracy of defect detection.
In addition, in terms of content presentation, with the development of technologies such as video generation, users can obtain a more immersive experience, such as breaking the limitations of traditional physics simulation through 3D generation technology, and improving the speed and quality of generated content.
Building on these breakthroughs in underlying technology, Tencent's large model is also constantly evolving. In September last year, Tencent released the full-link self-developed Tencent Hunyuan model, which has reached the trillion-parameter scale based on the MoE architecture, with a pre-training corpus of more than 7 trillion tokens. Wu Yunsheng said that the Tencent Hunyuan model is firmly in the first echelon of domestic large models, and that the number of tokens called in a single day has exceeded 100 billion.
Tencent Cloud has opened up the 256K Lite version, the multimodal Vision version, and sub-models and interfaces for tasks such as code generation and role-playing, to meet the needs of different enterprises and developers.
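For readers unfamiliar with the MoE (mixture-of-experts) architecture mentioned above, the general idea can be sketched in a few lines: a gating function scores every expert for each input, only the top-scoring experts are run, and their outputs are mixed by the renormalized gate scores. This is why total parameter count can grow very large while the compute per token stays limited. The sketch below is a generic toy illustration in plain Python, not Tencent's implementation; the function names, the toy experts, and the gate weights are all assumptions made up for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route one input vector through the top_k highest-scoring experts.

    Hypothetical toy layer: gate_weights has one row per expert, and each
    expert is a callable mapping a vector to a vector of the same length.
    """
    # One gate logit per expert: dot product of the input with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Sparse activation: only the top_k experts are evaluated at all.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the chosen experts only.
    probs = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy experts (assumed for illustration): each scales the input by a different factor.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

output, used = moe_forward([2.0, 1.0], gate_weights, experts, top_k=2)
print("experts used:", used)
print("output:", output)
```

In this toy run only two of the four experts are evaluated for the input, which mirrors how an MoE model activates a fraction of its parameters per token.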
3. The full-link product matrix: knowledge engine, image creation, and video creation are upgraded and put into practice
In terms of model tool products, in May this year Tencent launched three large model PaaS products: a knowledge engine, an image creation engine, and a video creation engine. All three have since been upgraded.
The knowledge engine's retrieval capability has been upgraded to support cross-search between images and text, image-to-image search, and similar functions, and it can give answers that combine pictures and text based on the images and text fragments retrieved from the knowledge base. Tencent has also expanded the range of enterprise knowledge types covered and upgraded the conversational data Q&A experience, which now supports very large tables, multi-step reasoning across scenarios, multi-condition filtering, and summation.
In the image creation engine, the number of image styles has been increased to 33, and a generation mode dedicated to avatars has been introduced, so that generated pictures retain the person's facial features while blending in a variety of artistic styles. The tool also adds product background generation, model outfit changes, creative outfit changes, and other functions, which can reduce production costs in the marketing and film and television industries.
In the video creation engine, Tencent has added more than 20 popular dance moves and, based on 3D modeling and generation technology, has improved the naturalness, similarity, and speaking rate of the audio in generated videos.
At the same time, to help enterprises build their own model applications more quickly, Tencent's machine learning platform TI has comprehensively upgraded its fine-tuning data preparation, fine-tuning training, and model validation capabilities.
The TI platform has built-in open-source and extensible data construction capabilities, which help developers prepare data and improve data annotation. For fine-tuning training, the platform supports one-click launch of fine-tuning tasks, and it ensures the continuity and performance of large model training through a three-layer stability mechanism and the self-developed Angel framework. For model validation, the TI platform adopts a three-stage evaluation process: lightweight experience, objective decision-making, and subjective evaluation.
At present, Tencent provides users with full-link model services across products for office collaboration, data analysis, knowledge management, and intelligent marketing. Wu Yunsheng said that Tencent has applied the knowledge engine's capabilities to its large-model-based customer service text bot, and that for complex tasks such as bill inquiries and returns and exchanges, the configuration cost of the large model bot can be more than 50% lower than that of a traditional text bot.
Tencent Lexiang, a knowledge learning collaboration platform serving enterprises, has served more than 300,000 customers and has hundreds of millions of users.
Wu Yunsheng gave examples: on the knowledge production side, AI can complete collaboration tasks from a single sentence and can also generate training knowledge points to improve training efficiency across departments; on the knowledge consumption side, Tencent has launched intelligent Q&A, which lets AI answer more questions about internal business knowledge and improves the efficiency of knowledge acquisition.
4. Deployed in medical care, entertainment, and scientific research, with the world's largest oracle bone script multimodal dataset open-sourced
In terms of industrial implementation, Tencent has made breakthroughs in the fields of medical care, entertainment, and scientific research.
In the medical field, Tencent and the Shanghai Digital Medicine Innovation Center have jointly developed a large medical model, and implemented applications such as physical examination reports and electronic medical record generation in Ruijin Hospital. In terms of physical examination report generation, the large model can generate physical examination reports in an average of 5 seconds, saving doctors more than 50% of their writing time.
In the entertainment industry, Tencent and China Literature Group provide writers with an AI writing assistant based on the large model's text generation capabilities, including descriptive inspiration, outline extraction, character extraction, and other functions, which can help writers write and produce illustrations.
In addition, Wu Yunsheng also said that the application of multiple models will also bring many practical problems to enterprises.
First of all, the company's engineering team has limited resources, and it is complicated to build inference clusters and service platforms on their own for models that iterate quickly. Second, the inference cost of the model is very high, and the inference deployment of tens of billions or hundreds of billions of models will face bottlenecks in terms of throughput and latency.
Tencent's TI platform is the answer to these problems.
In terms of inference, Tencent has helped China Literature improve the speed of content generation under the same resource conditions, and the platform also provides more intuitive monitoring and management tools to help it manage tasks and resources.
Beyond industrial deployment, the value of large models in scientific computing and cultural research is also growing.
In the field of astronomy, with the help of AI technology, Tencent has discovered 3 fast radio bursts and 41 pulsars. According to Wu Yunsheng, fast radio bursts are a focus of current astronomical research. Compared with pulsars, they are harder to find because the bursts are very short, occur infrequently, and offer little data for AI training.
To address this, Tencent has designed a new end-to-end AI algorithm that introduces multiple-instance learning and the attention mechanism used in large models, which improves both the accuracy of the model and the speed of data processing.
In the field of culture, Tencent today open-sourced the world's largest comprehensive oracle bone script multimodal dataset, which includes rubbings and facsimiles, the corresponding position of each character, the corresponding prefix, and more. This dataset can help researchers develop and generate oracle bone script detection and recognition templates.
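The report does not describe the algorithm in detail, but the combination of multiple-instance learning and attention has a well-known general form: a long observation is split into a "bag" of instances, each instance gets an attention score, and the bag is summarized as the attention-weighted sum of its instances, so that a rare, short event can dominate the summary. The sketch below shows that generic pooling step in plain Python. It is an illustration of the technique under stated assumptions, not Tencent's algorithm; the function name, the parameters `V` and `w`, and the toy data are all hypothetical.

```python
import math


def attention_mil_pool(instances, V, w):
    """Attention-based multiple-instance pooling over one bag.

    For each instance h_k: score_k = w . tanh(V h_k).
    Attention weights are softmax(scores); the bag vector is sum_k a_k * h_k.
    V (a matrix) and w (a vector) would normally be learned; here they are fixed.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * hj for v, hj in zip(row, h))) for row in V]
        scores.append(sum(wi * ai for wi, ai in zip(w, hidden)))
    # Stable softmax over the instance scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of instance vectors gives one vector for the whole bag.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Toy bag (assumed data): three time windows, only the second holds a strong feature.
bag_instances = [[0.1, 0.0], [3.0, 0.2], [0.0, 0.1]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [2.0, 0.0]

bag_vector, attn = attention_mil_pool(bag_instances, V, w)
print("attention weights:", attn)
print("bag vector:", bag_vector)
```

A classifier would then be applied to the bag vector, and the attention weights indicate which windows the model considered most informative.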
Finally, Wu Yunsheng said that whether in industrial deployment, scientific and cultural exploration, or the advancement of large model technology itself, progress depends on coordination across the whole industry chain and on building the ecosystem together.
Conclusion: Tencent has accumulated full-link self-developed technology for large models
At present, in terms of large models, Tencent has accumulated full-link self-developed technologies from computing infrastructure to machine learning platforms and upper-layer applications.
With the deepening of the application of cutting-edge AI technologies such as large models in the real economy, cultural protection, scientific discovery and other fields, and the acceleration of industrial chain collaboration and ecological co-construction in the upstream, midstream and downstream, AI is becoming a strong driving force for the intelligent upgrading of the whole society.