The Tencent Hunyuan large model logo. Image: Visual China
The "war of a hundred models" has gained another giant: Tencent has officially joined the fight.
On September 7, at the 2023 Tencent Global Digital Ecosystem Conference, Tencent officially released its Hunyuan large model and announced that it would be made available externally through Tencent Cloud.
According to the announcement, Tencent Hunyuan is a general-purpose large model developed by Tencent. More than 50 Tencent businesses and products have connected to it for testing, including Tencent Cloud, Tencent Advertising, Tencent Games, Tencent Fintech, Tencent Meeting, Tencent Docs, WeChat Search, and QQ Browser.
This is not the first time Tencent has disclosed progress on large models. On June 19, Tencent announced that it was building a curated store of industry large models on the Tencent Cloud TI platform, offering one-stop MaaS (Model-as-a-Service) services for building dedicated large models and intelligent applications.
"The large model competition is still in the first mile of the marathon, and the industry is too new for anyone to have a clear market share," Tang Daosheng, senior executive vice president of Tencent Group and CEO of its Cloud and Smart Industries Group, said frankly in a September 7 interview with The Paper and other media. "Everyone likes to release a general model at the drop of a hat. In my view, that road is a bit off track, and it cannot solve the industry's actual problems and pain points."
On the prospects for deploying large models, he was equally blunt: "Some cutting-edge investments and bets may take 3 to 5 years to show commercial returns. It is too early to talk about the commercial prospects of large models."
Tencent's large model "roadmap"
In the first half of this year, many major companies released large models, and the "war of a hundred models" became a focus of public attention. Tencent did not release a general large model until September, which looks somewhat late.
"When Tencent builds a large model, it looks only at itself, not at others," Jiang Jie, vice president of Tencent, told The Paper on September 7 when asked about the release timing. "The Hunyuan model has been in internal testing at Tencent for a long time. As for launch timing, we never thought of it as a race against our peers."
In terms of basic specifications, Tencent Hunyuan currently has more than 100 billion parameters and a pre-training corpus of more than 2 trillion tokens. Its capabilities include Chinese-language writing, logical reasoning in complex contexts, and task execution.
What distinguishes Hunyuan from other large models? The Paper's review found that its most prominent feature is how it handles "hallucinations." A hallucination is when a large model answers a question with nonsense. Hunyuan addresses this in the pre-training stage with its "truth-profiling" algorithm, and the measured hallucination rate is reduced by 30%-50%.
"Other vendors often use knowledge graphs or search plugins to make large models' retrieval more accurate, but plugins introduce new hallucinations, so Tencent decided to solve this problem in the pre-training stage," Jiang Jie said.
In addition, Hunyuan can handle ultra-long text: it can produce answers of more than 4,000 words, whereas GPT-3.5 produces only a little over 1,000 words for the same prompt. On data sources, Jiang Jie said that Tencent does not use personal private data when it builds small models, large models, or large language models. Tencent's own content products also supply Hunyuan with a large and diverse corpus, from which it can learn language knowledge and contextual understanding across a range of application scenarios.
On specific benchmarks, Hunyuan surpasses GPT-3.5 in many categories, including code, STEM, college entrance examination, and mathematics, but it is still well behind GPT-4. Jiang Jie said that when Chinese companies build large language models, they need to achieve each technical breakthrough step by step, stay down-to-earth, and honestly face the technical gaps with international companies.
Outside observers are curious: why did Tencent launch a general large model after it had already launched industry large models?
Tang Daosheng has previously said that a general large model can solve 70%-80% of the problems in 100 scenarios, but may not meet 100% of an enterprise's needs in any one scenario. Rather than blindly adopting a general large model, an enterprise may be better off building its own dedicated model on top of an industry large model. Such models have fewer parameters than general-purpose large models, cost less to train and run, and are easier to optimize.
On this point, Tang Daosheng told The Paper that the launch came late because Tencent had been developing and applying the model all along. Tencent began internal testing of Hunyuan some time ago, but chose not to announce specific progress until the model had been fully integrated into applications and proven in practice. The released product has been thoroughly polished, and it will continue to be updated and iterated.
Where is the path to commercialization of large models?
Now that the hype around the "war of a hundred models" has cooled somewhat, attention has turned to how effectively large models are actually being deployed.
Asked about Hunyuan's commercial prospects, Jiang Jie said frankly that generating revenue on the To B (enterprise) side still needs to be explored. The model is not yet mature or comprehensive enough in handling complex tasks, so it cannot fully unlock more specialized scenarios, and its applications still need improvement.
"Hunyuan was not built for the sake of a release. From the start it was developed and adapted around Tencent's own applications, such as WeChat and QQ, and designed to integrate deeply with them, in order to offset the high equipment, training, and labor costs behind a large model," Jiang Jie said.
Previously, Zhu Ye, vice president of Baidu Intelligent Cloud, said in an interview with The Paper and other media: "From the perspective of measuring the commercial value of large models, if they cannot be applied, the whole large model effort is hard to sustain. It genuinely requires investment, and a thriving ecosystem and applications are very important. We judge that two scenarios, marketing services and office productivity, may be the first to reach large-scale deployment, and I think we will see applications gradually deployed and scaling up over the next few months."
On the high cost of large model research and development, Tang Daosheng said that Tencent is tilting its resource allocation toward this work. Hunyuan is one of the most important projects inside Tencent and is treated as a top priority, and staffing and resource coordination within the company are currently "running smoothly." Even so, the large model competition is still in the first mile of the marathon, and returns may not be seen for 3 to 5 years.
Not long ago, Baidu announced that Wenxin Yiyan (ERNIE Bot) would be fully open to the public, along with a number of newly rebuilt AI-native applications, so that users can experience the four core capabilities of generative AI: understanding, generation, logic, and memory.
Will Hunyuan also be open to the public? Jiang Jie said that whether to offer C-end (consumer) services is only a matter of time and choice. "At present we are still focused on trying it in internal application scenarios. Whether or not to go To C is just a 'switch.' Building up our own capabilities and improving accuracy are what we care about most."