
100-Billion-Parameter Yi-Large Released, New MoE Model Takes Aim at GPT-5

Source | Zhidongxi

Author | Cheng Qian

Editor | Xinyuan

Zhidongxi reported on May 13 that today, on the first anniversary of its founding, the domestic AI large-model unicorn Zero One Everything (01.AI) announced a sweeping upgrade across its product line.

In terms of closed-source models, Zero One Everything released Yi-Large, a closed-source model with 100 billion parameters.

In the open-source field, the previously released small- and medium-sized open-source models (the Yi-34B, Yi-9B, and Yi-6B versions) have been upgraded to the Yi-1.5 series, and each version achieves SOTA performance among models of the same size.

Open source address: https://huggingface.co/01-ai

ModelScope: https://www.modelscope.cn/organization/01ai
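For developers who want to try the open-source weights, below is a minimal sketch of loading one of the Yi-1.5 chat models from the 01-ai Hugging Face organization with the transformers library; the exact repository name used here (Yi-1.5-6B-Chat) and the chat-template usage are assumptions that should be checked against the organization page above.

```python
# Minimal sketch: load an open-source Yi-1.5 chat model from the 01-ai Hugging Face org.
# The repository name below is an assumption; see https://huggingface.co/01-ai for the full list.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-1.5-6B-Chat"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 6B model on a single GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Introduce the Yi-1.5 model series in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```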

At the same time, Kai-Fu Lee introduced Wanzhi, a one-stop AI work platform available as a PC web app and a WeChat mini program. It can produce meeting minutes, weekly reports, and writing assistance; interpret financial reports, papers, and other documents; and generate a PPT within 1-2 minutes.

Zero One Everything was founded in May last year and has heavily upgraded its product matrix on its first anniversary. Since releasing its first pre-trained large model, Yi-34B, in November 2023, it has built a complete lineup spanning open source, closed source, the B-end, and the C-end.

In the media exchange session, Kai-Fu Lee shared that revenue from Zero One Everything's overseas C-end productivity applications is expected to reach 100-200 million yuan this year, paid mainly by overseas users.

Artificial general intelligence (AGI) has been Kai-Fu Lee's dream for more than 40 years. He said he promised investors a year ago that he would not cash out for 10 years, and that going public is a goal the Zero One Everything team will work toward over the next two years.

Kai-Fu Lee is optimistic about the development of domestic chips, saying Zero One Everything will adopt them at the right time, and Sinovation Ventures has been tracking investments in this area. Zero One Everything itself takes a more pragmatic approach to models and will keep exploring how to train the best model with the fewest chips at the lowest cost.

In addition, Kai-Fu Lee mentioned that he recently opened a Douyin account, where he shares his views on technology and products and hosts livestreams.

1. The 100-billion-parameter closed-source model surpasses GPT-4 in evaluations; Yi-XLarge MoE, now in training, targets GPT-5-level performance

Today, Zero One Everything officially released Yi-Large, its 100-billion-parameter closed-source flagship model.

Kai-Fu Lee revealed that in testing, Yi-Large's evaluation results are at least on par with GPT-4, and on some indicators it has surpassed GPT-4.


In third-party evaluations, Yi-Large ranked second on Stanford's English-language evaluation, behind only GPT-4 Turbo, and it topped the domestic large-model list in the Chinese SuperCLUE results.


In addition, preliminary training results for Yi-XLarge MoE, a larger MoE-architecture model that Zero One Everything is currently training, show it surpassing Yi-Large on all indicators; it is positioned to challenge GPT-5-level performance.


On top of the previously open-sourced 6B and 34B models, the company today announced a synchronized upgrade to the Yi-1.5 open-source series, releasing 6B, 9B, and 34B parameter scales with both pre-trained base and chat variants: Yi-34B Base + Chat, Yi-9B Base + Chat, and Yi-6B Base + Chat.


Kai-Fu Lee said the Yi-1.5 series surpasses the Gemma, Mistral, and Llama-3-8B models. Evaluation results show the 34B model in the Yi-1.5 series holding a clear lead in the 34B-50B class, and on some indicators it is not inferior even to 70B models.

On the API side, Zero One Everything's API platform has launched a range of model APIs, including Yi-Large, Yi-Large-Turbo, Yi-Medium, Yi-Medium-200K, Yi-Vision, and Yi-Spark.

In addition, there are relatively low-cost APIs, including a model fine-tuned from the open-source 34B model, the multimodal vision model Yi-Vision, and the smaller-parameter Yi-Spark.
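As a hedged illustration of how a developer might call one of these models, the sketch below assumes the platform exposes an OpenAI-compatible chat-completions endpoint; the base URL, environment variable, and model identifier ("yi-large") are assumptions rather than details confirmed by the article, so consult the official API documentation before use.

```python
# Hypothetical sketch: calling a Yi model through an OpenAI-compatible client.
# Base URL, env var name, and model id are assumptions; check the official API docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["YI_API_KEY"],           # assumed env var holding your API key
    base_url="https://api.lingyiwanwu.com/v1",  # assumed OpenAI-compatible endpoint
)

resp = client.chat.completions.create(
    model="yi-large",                           # assumed model identifier
    messages=[{"role": "user", "content": "Summarize TC-PMF in two sentences."}],
)
print(resp.choices[0].message.content)
```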


At present, Zero One Everything provides a free quota for developers, and Kai-Fu Lee revealed that 80% of the developers initially contacted chose to migrate from their original model to Zero One Everything's models.

According to Lan Yuchuan, head of Zero One Everything's API platform, the Yi-Large API is priced at 20 yuan per one million tokens, roughly 1/3 of GPT-4 Turbo's cost and pricing, and it is also very competitive against other large models. Zero One Everything additionally offers the faster and cheaper Yi-Large-Turbo.
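To put the quoted pricing in perspective, the back-of-the-envelope calculation below uses the 20 yuan per million tokens figure from the article; the GPT-4 Turbo list price and the exchange rate are illustrative assumptions only, not figures from the article.

```python
# Back-of-the-envelope cost comparison based on the article's quoted pricing.
# GPT-4 Turbo price and CNY/USD rate are assumptions for illustration only.
YI_LARGE_CNY_PER_M_TOKENS = 20.0      # stated in the article
GPT4_TURBO_USD_PER_M_INPUT = 10.0     # assumed list price (input tokens)
CNY_PER_USD = 7.2                     # assumed exchange rate

def yi_large_cost_cny(tokens: int) -> float:
    """Estimated Yi-Large API cost in CNY for a given token count."""
    return tokens / 1_000_000 * YI_LARGE_CNY_PER_M_TOKENS

def gpt4_turbo_input_cost_cny(tokens: int) -> float:
    """Estimated GPT-4 Turbo input-token cost in CNY under the assumptions above."""
    return tokens / 1_000_000 * GPT4_TURBO_USD_PER_M_INPUT * CNY_PER_USD

if __name__ == "__main__":
    n = 5_000_000  # e.g. five million tokens of monthly traffic
    print(f"Yi-Large:    {yi_large_cost_cny(n):.0f} CNY")
    print(f"GPT-4 Turbo: {gpt4_turbo_input_cost_cny(n):.0f} CNY (input only)")
```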

2. One-stop AI workstation Wanzhi goes live: it quickly reads ultra-long documents and generates a PPT in about 2 minutes

Zero One Everything's one-stop AI workstation Wanzhi is available as a WeChat mini program and a PC web version. Users can read large volumes of material and analyze charts and text, and can also generate a PPT in 1-2 minutes.


Kai-Fu Lee said that all applications should be AI-native, and that the way productivity tools are used will be completely overturned in the future.

He demonstrated the capabilities of Wanzhi on the spot.

First, when asked, "I want to make Taiwanese braised pork rice; please give me a table of ingredients and a mind map of the time and steps", Wanzhi presented the braised pork rice recipe in a table and laid out the preparation steps in a mind map.


When asked about popular performances in Beijing, Wanzhi gave specific times and venues and presented them in a Gantt chart, so users can match performances against their free time at a glance.

Wanzhi can also quickly read and comprehend PDF documents, including the diagrams in them. After a PDF is uploaded and read, a summary and suggested questions are generated on the right.

For an individual chart in a PDF, Wanzhi generates content based on the surrounding context. For the chart "Percentage of Granted AI Patents by Geographic Region between 2010 and 2022" shown below, Wanzhi gave the chart's source and the clear trends it shows to help users understand it.


As a productivity tool, Wanzhi can also quickly generate PPTs. In a demo of introducing AI to students, where the slides needed to be simple with appealing pictures, Wanzhi first automatically summarized the key points the PPT should cover and then generated the slides directly. If users are not satisfied with an image on a slide, they can quickly swap it via AI-enhanced image search.

Kai-Fu Lee said the PPTs Wanzhi generates are better than those from Microsoft Office Copilot, and producing one takes about 1-2 minutes.

3. TC-PMF is the key to the development of large models in the AI 2.0 era

Reflecting on lessons learned and the outlook ahead, Kai-Fu Lee noted a recent hot debate: some believe we should pursue AGI all-out, since AGI, once achieved, will rewrite everything; others believe that the bigger the model, the harder it is to use, and that companies should quickly find PMF (product-market fit).

He believes both views are valid but incomplete: no company can stay ahead of everyone else for long on technology alone; it must also rely on non-technical advantages, which is to say that in the end the product wins.

Therefore, enterprises should not forget the importance of PMF, but in the AI 2.0 era, they also need to consider TC-PMF (Product-Market-Technology-Cost Fit), where T represents technology and C represents cost.


At the same time, inference costs across the industry are too high; many applications such as social networking, e-commerce, and short video cannot fully embed AI, so companies need to keep driving inference costs down.

On the application side, building the best AI-first applications generally requires a top-tier model, but sometimes certain applications take off first and can be built on smaller models.

In response to these phenomena and the industry's pain points, Kai-Fu Lee laid out the four development principles Zero One Everything has consistently followed.

First, Zero One Everything's products target the global market. A single product's revenue has reached 100 million yuan this year, its ROI is close to 1, and its overseas product approached 10 million users within 9 months of launch.


The second principle is co-building the model and its infrastructure. Kai-Fu Lee said foreign vendors now have more than 10 times as many GPUs as domestic startups; against this backdrop, Zero One Everything keeps its model team and its infrastructure/inference team at roughly the same scale.

On the training side, Zero One Everything has worked with NVIDIA and become one of the top three teams in the world to achieve FP8 training, meaning FP8 precision can be used end to end for faster training. He added that with this accumulated technology, its training cost can be roughly half that of its peers.
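For readers unfamiliar with end-to-end FP8 training, below is a minimal sketch using NVIDIA's Transformer Engine library; it is a toy illustration of the technique rather than Zero One Everything's actual training stack, and the recipe parameters are placeholder values.

```python
# Toy sketch of FP8 training with NVIDIA Transformer Engine (requires Hopper-class GPUs).
# Illustrates the technique only; not Zero One Everything's actual training code.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling FP8 recipe: E4M3 for forward tensors, E5M2 for gradients (HYBRID format).
fp8_recipe = recipe.DelayedScaling(
    fp8_format=recipe.Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)

layer = te.Linear(4096, 4096, bias=True).cuda()  # FP8-capable replacement for nn.Linear
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)                    # GEMM executes in FP8
    loss = y.float().pow(2).mean()  # dummy loss for illustration

loss.backward()                     # gradients flow through FP8 GEMMs
optimizer.step()
```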

The third principle is a continued focus on user experience: a model's value lies in the value it delivers to users. A typical example is the analysis of charts, as opposed to the understanding of ordinary photos.


Finally, a test for all large-model players is that companies must simultaneously develop products, understand the market and users, and keep pace with the speed of technological progress, the evolution of model capabilities, and falling costs. This requires know-how: sufficient foresight, plus the ability to execute on one's own strength.


On this basis, Kai-Fu Lee believes Zero One Everything has four major advantages: researchers across its technology, product, and Sinovation Ventures teams can anticipate where the technology is heading; it has the patience to polish products; its investors bring strong foresight; and its inference team keeps driving down inference costs.

On the market feedback and data indicators that matter when turning large-model capabilities into productivity, Cao Dapeng, head of productivity products at Zero One Everything, shared that in the 0-to-1 stage the product focuses on long-term retention, including whether it generates word of mouth among users, while in the 1-to-100 stage it pays more attention to whether growth is fast enough, the business model, paid conversion, and similar indicators.

When building AI-first applications, companies should not only reach a world-leading level in model capability but also, from the user's perspective, meet user needs on both price and quality.

Conclusion: Driving the open-source and closed-source matrix in tandem, aiming at AI-native applications

One year after its founding, Zero One Everything's product matrix spans open-source and closed-source models, with applications covering both the B-end and the C-end. Building on the strong understanding and reasoning ability of its underlying large models, it digs into core pain points in users' daily lives, such as making PPTs and analyzing charts, so that AI-native applications can truly emerge.

Since the beginning of this year, as domestic large models have been comprehensively catching up with, and in places surpassing, the top foreign models, many domestic large-model applications have taken off, penetrating every aspect of users' lives and work, and the focus of competition in the industry is shifting.
