10,000 words to sort out the debate of the China AIGC Industry Summit, and the most comprehensive industry reference for the application of large models is here

Author: Kōko Kōnen

With computing power, it is possible to surpass Sora.

70% of code problems cannot be solved by the base model alone.

There is only a two-year window for large-scale model application innovation based on vertical scenarios.

ROI is the first criterion to measure the application value of AIGC.

AI gives everyone a chance to break through themselves.

……

At the China AIGC Industry Summit, 20 industry leaders took the stage to share and debate. From software applications and intelligent terminals to embodied intelligence, AIGC is sweeping across the board, and "Hello, New Applications!" became the theme of this year's AIGC Summit.

Enterprise players from AIGC's infrastructure, model, and application layers, along with observers from the market and academia, discussed the opportunities and challenges of this trillion-scale market in the first year of large model deployment.

The 500-seat venue was packed, with standing room only.

Millions of viewers watched and discussed the event online, and dozens of well-known industry media outlets livestreamed and covered the conference, for a total exposure of more than 10 million.

To help more readers understand the content of this AIGC summit comprehensively and systematically, and get a deeper sense of this wave of the times, Qubit has compiled this 10,000-word review, hoping to provide a valuable industry reference.

This review covers five aspects: AIGC's model layer, application layer, and infrastructure layer players, industry insight perspectives, and finally the highlights of the roundtable discussion.

AIGC model layer: Microsoft, Alibaba, Qualcomm and other players talk about landing

Microsoft's Li Mian: AI applications have entered a new stage, and Microsoft helps enterprise applications land globally

Li Mian, General Manager of the Azure Cloud business unit at Microsoft Greater China, shared how Microsoft Copilot and the Azure AI platform help enterprise-level applications land globally.

Li Mian believes that AI has gone through several iterations in the past 12 months and that AI applications have now entered a new stage. When enterprises think about how to build their own applications and realize the real value AI brings, they can approach implementation from four angles: improving employee productivity, reshaping interaction with users, reshaping internal enterprise processes, and strengthening products and services.

He highlighted a range of support that Microsoft can provide to enterprises as they build their own apps.

At the AI model level, Li Mian introduced three types of models supported by the Azure platform: OpenAI series models, third-party open-source models, and enterprise self-developed models (BYOM). He also described the application prospects of small language models (SLMs) in specific scenarios.

For development tools, Li Mian mentioned that Azure provides a low-code, no-code Microsoft Copilot Studio workbench and Azure AI Studio for deep customization, which is convenient for enterprises to quickly develop AI applications.

Considering the needs of enterprise-level applications, Li Mian also said that Microsoft supports enterprises not only at the top model layer but also with a series of supporting services below it, including the orchestration layer, hardware layer, and cloud data centers.

At the end of his speech, Li Mian reaffirmed Microsoft's commitment to data privacy and security:

"The customer's data is the customer's data; customer data is not used to train other models; and all customer data is protected with enterprise-grade protection, governed by comprehensive corporate compliance and security controls."

Kunlun Wanwei's Fang Han: The Tiangong SkyMusic music model will greatly lower the threshold and cost of music creation

Fang Han, Chairman and CEO of Kunlun Wanwei, shared "The Evolution and Landing of the Tiangong Multimodal Large Model". On the day of the conference, Kunlun Wanwei released "Tiangong 3.0" and announced that the "Tiangong 3.0" base model and the "Tiangong SkyMusic" music model, the first model to reach SOTA level in Chinese music AIGC, had officially entered public beta.

"Tiangong 3.0" has 400 billion parameters, surpassing Grok-1's 314 billion, making it the world's largest open-source MoE model. On the MMBench and MMBench-CN test sets, "Tiangong 3.0" comprehensively outperforms GPT-4V.

Through dedicated agent training, the current large model can "search, write, read, converse, speak, draw, listen, and sing" to meet a variety of complex content-creation needs. For example, it can recognize "Chengdu Disneyland" as an internet meme and offer a visiting guide, automatically summarize literature and generate outlines, PPTs, and mind maps, and build agents through no-code methods.

Fang Han introduced the "Tiangong SkyMusic" music model, which, thanks to training data of 20 million songs and a unique model architecture, has surpassed Suno in vocal quality and sound fidelity. "Tiangong SkyMusic" supports generating music that matches the characteristics of a reference source and singer, and supports synthesis in multiple dialects, greatly lowering the threshold and cost of music creation:

Songs used across all kinds of industries can be generated by AI, with the cost dropping rapidly from tens of thousands of dollars to mere cents.

Finally, Fang Han shared Kunlun Wanwei's vision: "Realize artificial general intelligence, so that everyone can better shape and express themselves." He believes the evolution of large models will eventually lead to AGI, and that the popularization of AIGC capabilities will help break the monopoly of dominant cultures and achieve cultural equality. As a global internet company, Kunlun Wanwei hopes to use AI technology to empower users worldwide.

Alibaba Tongyi Qianwen's Lin Junyang: A truly intelligent model should integrate understanding of vision and speech

Lin Junyang, head of open source for Alibaba's Tongyi Qianwen, shared on site the efforts the Tongyi Qianwen models have made to "move toward a general model".

Lin Junyang said that since going open source, the Tongyi Qianwen Qwen series (Qwen being a romanization of "Qianwen" chosen for easier English pronunciation) has received extensive attention from developers at home and abroad.

The Qwen series has been open-sourced and continuously updated since August last year, starting at the 7B and 14B parameter scales and going up to an open-sourced 72B parameter version. In the latest move, the Tongyi Qianwen family also gained a "small member": a 14B-parameter MoE model. Urgent demand from the developer community also prompted Alibaba to quickly open-source a 32B model, which approaches the performance of the 72B model and in some respects has advantages over the MoE model.

Lin Junyang emphasized on site that Tongyi Qianwen is also very focused on building an ecosystem around large model usage.

First, Tongyi Qianwen's code has been officially merged into the Hugging Face codebase, so developers can use the Tongyi Qianwen models more conveniently.

Second, Tongyi Qianwen has made a lot of progress in supporting third-party frameworks; platforms including Ollama can run the Qwen series models with one click.
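
As a concrete illustration of that Hugging Face integration, the minimal sketch below loads an open-source Qwen chat checkpoint through the transformers library; the checkpoint name and generation settings are illustrative choices, not details stated in the talk.

```python
# Minimal sketch: loading an open-source Qwen checkpoint through Hugging Face
# transformers, now that Qwen support is merged into the library.
# The checkpoint id below is illustrative; any published Qwen chat model works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-7B-Chat"  # illustrative checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "Briefly introduce the Qwen model family."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

On the Ollama side, a command along the lines of `ollama run qwen` pulls and serves a packaged Qwen build in one step (the exact model tag depends on the release).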

Lin Junyang also shared progress on capabilities such as multilingual support, long sequences, post-training, agents, and multimodality.

Multilingual: The model is inherently multilingual, not just bilingual in Chinese and English, and the team has tested and optimized multilingual capabilities.

Long sequences: the Qwen series has been steadily pushing on long context, which is not easy; the goal is not just "long" but also good quality. The 32k version is now relatively stable, and needle-in-a-haystack evaluations show the long-sequence capability holds up in chatbot use (a minimal sketch of such a test follows after this list).

Post-training: optimize post-training on the data side, through SFT and related techniques, so that the potential of the large model can be fully unlocked.

Agent: one approach is to do more data annotation and to research how agents are actually used.

Multimodality (Qwen-VL): a truly intelligent model should incorporate understanding of vision and speech; this year the team will focus on the video modality and think about how to build a VL-Agent.
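
The needle-in-a-haystack evaluation mentioned in the long-sequence item works roughly as follows; the harness below is a generic illustration with a dummy stand-in model, not Qwen's internal test suite.

```python
# Generic "needle in a haystack" long-context check (hypothetical harness):
# hide one fact at a chosen depth inside a long filler document and verify
# that the model can retrieve it.

def build_haystack(needle: str, total_sentences: int = 2000, depth: float = 0.5) -> str:
    filler = "The sky was clear and the market was quiet that day."
    sentences = [filler] * total_sentences
    sentences.insert(int(total_sentences * depth), needle)
    return " ".join(sentences)

def needle_test(ask_model, depth: float) -> bool:
    """ask_model is any callable(prompt: str) -> str, e.g. a wrapper around a chatbot API."""
    needle = "The secret passcode for the archive is 7432."
    prompt = (
        build_haystack(needle, depth=depth)
        + "\n\nQuestion: What is the secret passcode for the archive?"
    )
    return "7432" in ask_model(prompt)

if __name__ == "__main__":
    # Dummy stand-in model so the sketch runs end to end without any API access.
    dummy = lambda p: "The passcode is 7432." if "7432" in p else "I don't know."
    print([needle_test(dummy, d) for d in (0.1, 0.5, 0.9)])
```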

Qualcomm's Wan Weixing: The Qualcomm AI Engine, with its heterogeneous computing architecture, can fully meet the diverse requirements of generative AI

Wan Weixing, head of Qualcomm's AI product technology in China, said in his speech that as a chip manufacturer, Qualcomm is promoting the large-scale expansion of AIGC-related industries by providing leading products and solutions.

He pointed out that Qualcomm believes that the era of end-side generative AI has arrived.

With the third-generation Snapdragon 8 and Snapdragon X Elite products Qualcomm released last October, large language models have been moved entirely to the device side, powering many AI phones and AI PCs. Amid the multimodal trend, in February this year Qualcomm also brought multimodal large models to the device side, demonstrating on a Snapdragon X Elite-based Windows PC the world's first multimodal large model with audio reasoning.

Wan Weixing said that generative AI use cases in different fields have diverse requirements, the AI models behind them differ greatly, and no single processor can be perfectly suited to every use case.

To address this, Qualcomm offers the Qualcomm AI Engine, built on a heterogeneous computing architecture with multiple processor components to fully meet the diverse requirements of generative AI. The centerpiece is the NPU. Driven by evolving user needs and device use cases over the years, the Qualcomm NPU has been continuously upgraded. The Hexagon NPU in the third-generation Snapdragon 8 integrates a Transformer acceleration module built specifically for generative AI, along with advanced AI technologies such as micro-architecture upgrades, an independent power rail, and micro-tile inferencing.

Wan Weixing also revealed that Qualcomm will focus this year on supporting multimodal models on the device side, as well as on deploying language models with even larger parameter counts on devices.

After covering hardware design, Wan Weixing introduced Qualcomm's key AI software products, including the Qualcomm AI Stack, a unified solution across platforms and devices.

Developers only need to complete the optimal deployment of a model on one Qualcomm platform, and that work can then be migrated to other Qualcomm product lines very conveniently.

In addition, Qualcomm launched the Qualcomm AI Hub at this year's MWC Barcelona. The product is aimed at third-party developers and partners, and helps developers make full use of the hardware computing power of Qualcomm and Snapdragon chips to build their own innovative AI applications.

He concluded by summarizing Qualcomm's strengths in AI as "unmatched hardware design, top-of-the-line heterogeneous computing capabilities, scalable AI software tools, and extensive ecosystem and model support."

Ant's Li Jianguo: More than 70% of code problems cannot be solved by the base model alone

More than 70% of problems require end-to-end code generation capabilities to solve; the base model alone is far from enough.

At the China AIGC Industry Summit, Li Jianguo, head of CodeFuse at Ant Group, said that although code models are advancing rapidly at both the base-model and product level, many challenges remain before they can significantly improve R&D efficiency inside enterprises.

Looking at the full software R&D lifecycle, from initial requirements design through code development, testing and build, release and operations, and data insight, writing code may account for only one fifth of the workload, or even less.

Li Jianguo said Ant Group hopes to build an "R&D agent" that distributes and connects tasks through intelligent agents, linking all stages together to comprehensively improve R&D efficiency.

When CodeFuse was first released, the stated goal was to "build a large code model for the full lifecycle". CodeFuse now has 13 open-source repositories covering eight major areas of software development, including code model training, testing, DevOps, program analysis, and evaluation. Li Jianguo said this is open source across the board.

Finally, looking across the field and combining external statistics with Ant's own practice, the base model can solve only about 30% of the problems encountered in real applications; the remaining 70% still require end-to-end code generation capabilities. Continued evolution is also needed in agent reasoning, requirement decomposition, and cross-modal interaction.

Li Jianguo also emphasized that in vertical scenarios such as finance, the security, trustworthiness, and reliability requirements for generated code are problems Ant is focused on overcoming.

Although the challenges are many and the road is long, Li Jianguo believes that, pulled along by the "Moore's Law of everything", Ant and the open source community can largely solve these problems within the next two to three years.

Xiaoice's Xu Yuanchun: Real market operators keep things simple

Xu Yuanchun, co-founder and COO of Xiaoice and head of its AI Creativity Lab, gave a speech titled "Digital Human + Large Model: Creating New Business Application Scenarios".

"How an algorithm company makes money, and how an AIGC industry company makes money, are the last questions to answer. The question to answer first is: how do the people using this thing make money?" Xu Yuanchun said.

He used several concrete examples to show how Xiaoice helps its users make money.

The first is an individual blogger in the beauty industry, who used Xiaoice's digital human and large model platform to create her own digital avatar and share outfit and styling content on short video platforms. In just over 40 days, a single video reached 2 million views, and she now draws 6-8 prospective customers to her offline store every day, which has helped her business grow.

The second is a small or mid-sized enterprise that started in software development, technical enablement, and back-end support; using Xiaoice's technology platform, it has transformed into an AI service provider, delivering AI enablement services to 300 small and medium-sized enterprises in Yunnan within four months.

The third is a larger industry leader that has deeply integrated Xiaoice's digital human and large model technology into its own hardware products to achieve out-of-the-box use, so that every hardware device with a screen can become a new interaction carrier.

In Xu Yuanchun's view, where industrial applications can truly take root is not in the halls of power, but out in the field among ordinary practitioners:

When you look at the people actually running the market, the practitioners, they don't have complicated ideas about AI; they keep things very simple.

He further added that Xiaoice embeds large models and digital humans deeply into enterprises' workflows and task systems; a digital employee effectively has a closed-loop brain that integrates enterprise knowledge and data, making business processes and customer communication smoother.

Finally, Xu Yuanchun talked about closing the business loop. There is the software-plus-hardware loop of "cloud + device", and there is also the loop of interaction plus content. Today, helping real enterprises and individuals use the technology to gain more competitiveness and run their businesses better is in fact the most important node in all of these loops.

"Finding and activating each key node is the only way to truly close the loop on commercializing the technology."

AIGC Application Layer: How Can Ordinary People Do AI?

Meitu's Wu Xinhong: The window for large model application innovation in vertical scenarios is only two years

Wu Xinhong, founder, chairman and CEO of Meitu, shared the exploration of Meitu's video model.

Over its 16 years of development, Meitu has focused on imaging and design products, forming three major AI product categories: image, video, and design.

Wu Xinhong showed a 60-second AI short film produced in just half a day using a series of AI tools such as Kaipai, WHEE, and Wink, which greatly lowers the production threshold and improves efficiency compared with traditional animation workflows.

Wu Xinhong expects that in the second half of this year a number of domestic Sora-like models will launch, and Meitu's will be among them.

We believe three things matter most as competition intensifies: first, creativity that goes beyond reality; second, workflow integration; and third, capabilities for vertical scenarios.

As for large model application innovation based on vertical scenarios, Wu Xinhong believes there is a two-year window.

Looking ahead, Wu Xinhong believes that beyond text-to-video, video models will as standard offer more generation modes such as image-to-video, video-to-video, and audio-to-video, with very broad application scenarios.

This year's video generation, represented by Sora, is just the beginning. As video models deepen their understanding of the physical world, they are expected to gain more professional capabilities such as plot design, storyboarding, and transitions, integrate deeply with video production workflows, and eventually generate videos of 1-5 minutes.

Yao Dong of Kingsoft Office: WPS is no longer a document editor

Yao Dong, Vice President of Kingsoft Office and General Manager of the R&D Middle Platform Division, shared Kingsoft Office's thinking and practice in embracing the AI wave at the conference.

As an office software company, Kingsoft Office has made "multi-screen, content, cloud, collaboration, and AI" its strategic focus over the past five years, with particular attention to AI and collaboration in the past two.

Just a few days ago, Kingsoft Office released the enterprise-level product WPS 365.

Yao Dong said that today's WPS is no longer just a document editor, but an office platform covering enterprise data collaboration, knowledge management, communication, and various model-based services. In the newly released WPS 365, the included WPS AI Enterprise Edition focuses on building an "enterprise brain" for customers around three types of capability: AI Hub, AI Docs, and Copilot Pro.

Among them, AI Hub is the foundation for enterprises to use AI capabilities. It provides a unified interface and development framework compatible with the various large models on the market, allowing enterprises to flexibly select and switch among suitable models.
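
The talk did not show the interface itself; the sketch below is only a generic illustration of the "unified interface with swappable model backends" idea it describes, with hypothetical class and vendor names rather than the actual WPS AI Hub API.

```python
# Generic illustration (hypothetical, not the actual WPS AI Hub API) of a unified
# chat interface that lets an enterprise switch the underlying large model
# without changing application code.
from typing import Protocol

class ChatBackend(Protocol):
    def chat(self, prompt: str) -> str: ...

class VendorABackend:
    def chat(self, prompt: str) -> str:
        return f"[vendor A reply to] {prompt}"  # real call to vendor A's API would go here

class VendorBBackend:
    def chat(self, prompt: str) -> str:
        return f"[vendor B reply to] {prompt}"  # real call to vendor B's API would go here

class AIHub:
    """Routes requests to whichever registered backend is currently selected."""
    def __init__(self) -> None:
        self._backends: dict[str, ChatBackend] = {}
        self._active = ""

    def register(self, name: str, backend: ChatBackend) -> None:
        self._backends[name] = backend

    def switch(self, name: str) -> None:
        self._active = name

    def chat(self, prompt: str) -> str:
        return self._backends[self._active].chat(prompt)

hub = AIHub()
hub.register("vendor_a", VendorABackend())
hub.register("vendor_b", VendorBBackend())
hub.switch("vendor_a")
print(hub.chat("Summarize this quarter's sales report."))
hub.switch("vendor_b")  # flip to another model without touching application code
print(hub.chat("Summarize this quarter's sales report."))
```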

AI Docs is used to help enterprises revitalize massive unstructured data assets.

Employees write documents every day, and these are actually very important assets for a business. But the long-standing problem is that this knowledge cannot be reused because it is unstructured.

Traditional keyword search struggles to pinpoint the knowledge inside documents, but with large models and multimodal technology, WPS 365 enables intelligent reading comprehension, search, and Q&A over documents of various formats within the enterprise, while strictly following document permission controls.

Copilot Pro uses AI to drive natural-language interactive office work. Take data analysis: the traditional way requires writing scripts, designing formulas, and drawing charts, which has a very high threshold. With Copilot Pro, users only need to express their requirements in natural language and let the AI automate the entire process.

Yao Dong emphasized that document data that does not flow between people is effectively a data island; today's office work is no longer just about writing documents and analyzing data in isolation, but, more importantly, about collaboration between people and between people and AI.

Evernote's Tang Yi: The AI-driven "second brain" gives users freedom and reduces information-management anxiety

Tang Yi, Chairman and CEO of Evernote, has extensive experience in technology entrepreneurship, multinational enterprise management, and investment and financing.

Since March last year, Evernote has used its self-developed Impression large model to power its "Impression AI" products and services, which now empower its full line of software and smart hardware products.

Tang Yi's sharing focused on knowledge management. In his view, the development of AIGC is still in an early boom stage, where challenges and opportunities coexist.

He believes that compared with the rapid expansion of computing power, datasets, and model scale, progress in model algorithms has been relatively slow, and the return on computing power investment is disproportionate. In addition, as public-domain human data for model training is exhausted, the growing share of synthetic data will also degrade model output quality.

At the same time, practice and competition show that model capabilities driven by domain-specific data keep strengthening, and the trend toward smaller, more efficient models is becoming increasingly prominent.

Speaking of the evolution of Evernote's large model and products, Tang Yi said that from the perspective of a Compound AI System, the company will improve its self-developed Impression large model while fully leveraging its advantages in users, data, scenarios, carriers, and interaction to build a genuine AI super application.

Driven by AI, Evernote will help users intelligently aggregate information, read and absorb efficiently, assist with capturing inspiration and creating content, and automatically organize and distill knowledge, becoming a truly intelligent "second brain" for users.

LimX Dynamics' Zhang Li: Humanoid robots will become platform-style applications in the future

Zhang Li, co-founder and COO of general-purpose robotics startup LimX Dynamics, shared his insights at the China AIGC Industry Summit on the development of humanoid robots and their relationship with AGI.

At present, humanoid robots have made substantial breakthroughs in bipedal locomotion, but their manipulation abilities are still limited, because AI cannot yet fully derive its own behavior from multimodal scene understanding. How to use multimodal large models to generate autonomous robot motion and control is an area that industry and academia are racing to research.

In hardware and software algorithms, and especially in the coordination between the "brain" and the "cerebellum", humanoid robots still need more breakthroughs.

Zhang Li envisions future humanoid robots as platform-style applications, much like today's iPhone + App model: by installing different applications and using their own motion-control capabilities to perform various tasks, robots can greatly expand their range of applications.

In essence, a robot is an electromechanical system whose mobility, computing, and perception approach or surpass a human's. Technically, pre-planned motion control is a relatively traditional approach; richer interaction with the outside world, such as environmental cognition, object detection, and contact feedback, requires new technology. Here, the impact of AGI on robotics is enormous.

Through continuous R&D and product iteration, LimX Dynamics has built key new technologies such as imitation learning, deep reinforcement learning, and perception-based motion control, and has launched humanoid, bipedal, and quadruped robots.

Zhang Li shared his views on the market prospects for humanoid robots:

Whether to B or to C, embodied intelligence has very large application scenarios ahead of it.

As the technology boundary keeps expanding, it is very important to commercialize relatively mature technologies and products along the way ("laying eggs along the road"), build autonomous locomotion and mobile manipulation capabilities, and connect robots with AGI and AIGC to strengthen scene cognition and understanding, achieve task decomposition, and better complete planning and decision-making.

Dedao's Kuaidao Tsing Yi: AI has given many people a chance to break through their own limits

Kuaidao Tsing Yi, co-founder of Dedao and head of its AI Learning Circle, gave a speech titled "Hexagonal Warrior: the AI-Driven Revolution in Personal Ability".

The "six sides" here refer to product ability, output ability, efficiency-improvement ability, innovation ability, management ability, and design ability. In his view, the development of AI technology has comprehensively upgraded his personal hexagon of abilities.

He shared his experience over the past year from two angles: innovation and output.

First, Kuaidao Tsing Yi believes the source of AI innovation can be assessed from four angles: a product you yourself want to use, a pain point that has bothered you for a long time, a huge foreseeable change in an industry you know well, and something you are passionate about and find challenging:

If two of the four are met, or better yet three, it is well worth the time you spend studying it.

Starting from this, Kuaidao Tsing Yi introduced a self-developed AI sparring mini program, "Start Training", which lets employees practice realistic conversations with an AI partner and receive AI feedback. After such practice, employees can handle real customers' questions with ease.

He then shared the original motivation for the mini program. At first he wanted it for his company's programmers, but the programmers said their work depended on writing code, not on communication. Later, after he posted it on WeChat Moments, the owner of a beauty salon chain found it especially useful for front-line beauticians introducing products to customers...

Kuaidao Tsing Yi reflected: "The starting point may turn out different from what you imagined, and many things may change along the way."

He also emphasized the importance of an enterprise's proprietary knowledge base and proprietary data, and said he set several constraints for the team on this AI project: the team must not exceed three people; missing capabilities should be filled in with AI; do not touch hardware; do not train a large model; and only build training scenarios that improve users' abilities.

Recognize your own abilities, do what you are better at, and don't assume you can do anything just because AI is powerful.

On improving output, Kuaidao Tsing Yi shared his shift from simply keeping his account alive to updating it every single day for 365 days, plus a weekly AI-themed livestream discussing what others are building. All of this, he said, is the boost in output capacity this AI wave has given him.

Finally, Kuaidao Tsing Yi quoted Jordan: "I can accept failure, but I can't accept not trying."

AIGC Infrastructure Layer: How to Support the Digital Transformation of Industries?

Amazon Web Services Wang Xiaoye: Four key points for enterprises to seize the opportunities of generative AI

The era of generative AI has begun, and it's not something that will happen in the future.

Wang Xiaoye, technical director of the product department at Amazon Web Services Greater China, said in his speech that generative AI will disrupt every industry within 18 months, bringing the world a market opportunity worth up to 4.4 trillion US dollars.

On how enterprises can seize the generative AI opportunity, Wang Xiaoye summarized four keys: choosing the right scenarios, choosing the right tools and partners, treating data as the enterprise's core competitiveness, and paying attention to talent development and AI-related governance.

He pointed out that generative AI has great potential in six categories of scenarios, including cross-language communication, business decision-making and insight, intelligent customer service and marketing content generation, and overall operational efficiency improvement.

Wang Xiaoye noted that thanks to improvements in model capability and cost, generative AI is expanding from early, limited applications such as text-to-image, marketing, and chatbots into a much wider range of fields. For example, with the support of large models such as Claude, scenarios like language translation, emotional companionship, and game content moderation are quietly landing. He emphasized that multimodal interaction will be an important trend in the development of large models.

To help enterprises apply generative AI, Amazon Web Services proposes "three layers of atomic capability": the underlying infrastructure and acceleration layer; a middle layer of tools such as Amazon Bedrock for building generative AI applications on top of foundation models; and a top layer of out-of-the-box generative AI applications.
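
As a rough illustration of that middle tool layer, the sketch below calls a foundation model through Amazon Bedrock with boto3; the region, model ID, and prompt are illustrative assumptions, and access to the chosen model must already be enabled in the AWS account.

```python
# Minimal sketch: invoking a foundation model through Amazon Bedrock with boto3.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": "Draft a short product description for a travel mug."}
    ],
}

response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # illustrative model ID
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```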

From e-commerce to cloud computing, Amazon has kept using technology and AI to disrupt and reinvent existing industries. Wang Xiaoye concluded that the next area Amazon is building and continuing to invest in is these three layers of generative AI atomic capability, hoping to win the generative AI era together with its customers.

SenseTime's Yang Fan: Building an AI infrastructure ecosystem is the key to lowering the threshold for AI applications

"AI applications in China are multiplying and more and more new scenarios are opening up; we will see China's generative AI market explode in the second half of this year or the first half of next year."

Yang Fan, co-founder of SenseTime and president of the large device business group, made such a judgment at the conference.

Yang Fan's analysis was that scaling laws still dominate AI's technological iteration, and the core problem for the AI industry's development is that "the industry's input-output ratio is not good enough". As the cost of producing and applying AI rises, lowering the threshold for use by reducing costs is an inevitable trend.

The construction of AI infrastructure is the key to solving this problem.

Only by standardizing these general-purpose capabilities and turning them into infrastructure and services, whether large computing clusters, large model APIs, or, in the future, complete systems built around ultra-large-scale data, can the entire AI industry gain a lower threshold for innovation and better cost-effectiveness, so that more people can come in and make money on top of it.

On SenseTime's investment in this area, Yang Fan first described the latest progress of SenseTime's intelligent computing center in Lingang:

As of the end of last year, seven or eight nodes, including Lingang, have been built and interconnected, and many new nodes are under construction. The interconnected computing power exceeds 12,000 petaFLOPS, with single-site computing power approaching 10,000 petaFLOPS, among the leaders. SenseTime has also cooperated extensively with the industry chain at the chip level, and domestic chips now account for more than 15% of the computing power at the Lingang intelligent computing center.

Beyond consolidating the computing power foundation, Yang Fan also described the software products and service systems SenseTime has launched at different levels, including a full set of solutions for reducing the cost of model calls.

He also shared progress on SenseTime's own large models: whereas last year the focus was mostly on language tasks, the company now offers different base models for images, video, 3D reconstruction, and other domains.

Overall, SenseTime hopes to use its infrastructure and platform capabilities to support a more prosperous application ecosystem.

AIGC observers: Scaling laws are the key

Peking University's Yuan Li: We were among the first to use retrieval augmentation to address large model hallucination

Yuan Li, assistant professor at Peking University's Shenzhen Graduate School, shared his team's practical experience with vertical applications of multimodal models.

He said that toys for small talk do not meet users' real needs; AI must be turned into real productivity, and that transformation happens through vertical domains.

Professor Yuan Li introduced several representative products his team has developed on Pengcheng Cloud Brain and self-built computing power, trained on both general and industry data:

ChatExcel: a multimodal AI assistant for spreadsheet processing that lets users manipulate tables directly with text, for data visualization and marketing strategy analysis. The product has been adopted by a luxury-goods giant, and the PhD student who built it went on to found Yuankong AI.

ChatLaw: a vertical application for Chinese law that provides users and lawyers with services such as information analysis, structured extraction, and generation of legal documents. The product uses retrieval augmentation and references a legal text database, which effectively mitigates large model hallucination.

At the time, we were among the first in the industry to practice retrieval augmentation, even if we did not coin the concept: let the large model do what large models do, and let retrieval do what retrieval does.
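
To make the idea concrete, here is a minimal retrieval-augmented generation sketch; it is not ChatLaw's actual pipeline, and the TF-IDF retriever and toy statute corpus are stand-ins for the real legal text database.

```python
# Minimal retrieval-augmented generation sketch (not ChatLaw's actual pipeline):
# retrieve the most relevant statute passages first, then ask the model to answer
# strictly from the retrieved text, which is the basic idea for reducing hallucination.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy legal corpus; a real system would index the full statute database.
corpus = [
    "Article 10: A contract is established when the offer is accepted.",
    "Article 52: A contract concluded under coercion is invalid.",
    "Article 107: A party that fails to perform its obligations bears liability for breach.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    passages = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the legal passages below; cite the article number.\n"
        f"Passages:\n{passages}\n\nQuestion: {question}"
    )

# The prompt would then be sent to the large model; here we just print it.
print(build_prompt("What happens if a party does not perform the contract?"))
```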

Finally, Professor Yuan Li introduced the Open-Sora Plan, an open-source Sora-reproduction effort jointly initiated with Peking University alumni company Rabbitpre, with the goal of building a "visual LLaMA". The project is divided into three technical parts: Video Codec, Diffusion Transformer, and Conditional Injection.

The first version of the pre-trained model and CausalVideoVAE have already been open-sourced, attracting wide attention in the open-source community and nearly 10,000 stars on GitHub. The framework's standout feature is long-video generation, thanks to the compression of long video clips fed in during training.

Next, the project will pursue higher reproduction goals in three phases: the first phase has been open-sourced; the second aims to open-source support for generating 20-second 720p video; and the third hopes, with the help of industrial-scale computing power, to surpass the original SOTA.

Professor Yuan Li said that open source has driven AI's prosperity, and the team hopes to give back to the community through open source, so that both academia and industry can share in the technical achievements.

Silicon Valley Fusion Fund's Zhang Lu: Startups can take a "cocktail" approach at this stage

As a top investor who has been focusing on and deploying the AI field for a long time, Zhang Lu, founding partner of Silicon Valley Fusion Fund and visiting lecturer at Stanford University, shared her in-depth insights into the development of AI technology and industry in the world, especially in Silicon Valley.

Zhang Lu pointed out that AI is becoming an industry-wide digital transformation tool, and the emergence of massive high-quality data has laid the foundation for the large-scale application of AI.

Against this backdrop, AI will present opportunities 10 times greater than the internet era, but only a third of them will be left for startups.

For a startup, finding the right industry, the right application scenario, and the right entry point is critical, and data is the core: how do you obtain high-quality data, and how do you turn it into your competitive advantage?

If start-ups want to seize the opportunity in the AI wave, they must find their own innovation entry point and make full use of the ecological platform built by large companies to achieve common development.

At this stage, startups can basically adopt a "cocktail" model: call the APIs of the most cutting-edge large models, layer open-source models on top, and then make modifications to optimize the models.

"In this optimization process, two characteristics quickly became apparent," Zhang Lu said. The first is that data quality matters more than data quantity; the second is that there is no need for a single model to solve every problem.

In terms of investment direction, Zhang Lu said that the Fusion Fund focuses on two dimensions: the application layer and infrastructure of AI.

Among them, the application layer mainly focuses on fields with massive high-quality data and broad application prospects, such as healthcare, finance and insurance, robotics, and space, while the infrastructure layer lays out various technology nodes from chips to the cloud, aiming to break through the key bottlenecks of AI development such as computing power, energy consumption, and privacy.

In her speech, Zhang Lu said that with the vigorous development of the open source community, small models and industry-specific models will also become an important trend in AI applications.

She emphasized that for entrepreneurs, the acquisition and application of high-quality data is more critical than massive data, and the performance of customized small models in specific scenarios can even be comparable to that of general large models.

Renmin University's Lu Zhiwu: With enough computing power, it is possible to surpass Sora

Lu Zhiwu, a professor at the Hillhouse School of Artificial Intelligence at Renmin University of China, gave a talk titled "VDT: General-Purpose Diffusion Video Generation Based on Transformer".

VDT stands for Video Diffusion Transformer. It is a project led by Lu Zhiwu, published on arXiv in May last year and accepted to ICLR.

Its innovations are applying the Transformer to video generation, long before OpenAI released Sora, and introducing unified spatiotemporal mask modeling into the model.

Why is video generation moving from U-Net-based diffusion models to Transformer-based ones?

Lu Zhiwu said the Transformer has an advantage in capturing long-range or irregular temporal dependencies, which is especially important for video, and its parameter count can be scaled as needed, providing flexibility for improving model performance.

In his presentation, Lu Zhiwu highlighted the key spatiotemporal Transformer block in the VDT model and explained how it differs in detail from existing models such as Sora. He noted that due to computing power constraints, the team decoupled spatial and temporal attention in the design to improve efficiency.
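
For readers unfamiliar with this design choice, the schematic below shows what decoupled spatial and temporal attention looks like in code; it is an illustrative PyTorch sketch with assumed dimensions, not the released VDT implementation.

```python
# Schematic of decoupled spatial/temporal attention in a video Transformer block
# (illustrative only; the released VDT implementation differs in detail).
import torch
import torch.nn as nn

class FactorizedSpaceTimeBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim)
        b, t, p, d = x.shape

        # Spatial attention: patches attend to each other within every frame.
        xs = x.reshape(b * t, p, d)
        xs = xs + self.spatial_attn(self.norm1(xs), self.norm1(xs), self.norm1(xs))[0]

        # Temporal attention: each patch position attends across frames.
        xt = xs.reshape(b, t, p, d).permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt = xt + self.temporal_attn(self.norm2(xt), self.norm2(xt), self.norm2(xt))[0]

        return xt.reshape(b, p, t, d).permute(0, 2, 1, 3)

tokens = torch.randn(2, 8, 16, 256)              # 2 clips, 8 frames, 16 patches each
print(FactorizedSpaceTimeBlock()(tokens).shape)  # torch.Size([2, 8, 16, 256])
```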

How does VDT compare with SOTA models like Sora? In Lu Zhiwu's analysis, the two differ in how they handle spatiotemporal attention, but the difference is not essential.

We speculate that Sora's powerful physical world simulation ability comes mainly from unified spatiotemporal tokenization and the attention mechanism.

Lu Zhiwu said in closing that the team's experiments found that the VDT model's quality depends only on the computing power consumed, which is consistent with the conclusions of the DiT image generation model.

"The more computing power, the better. It's not impossible that, with more computing power, we could surpass Sora."

Panel Discussion: ROI is the first criterion for measuring the application value of AIGC

"Hello, new application!" The summit set up a roundtable forum, and the topic of discussion was very pragmatic: how to land, how to make money?

From the advent of ChatGPT to today, over the past year and a half AIGC has shown a clear trend: it has gradually moved from building the foundational layer to actually being used. Many consider this year the first year of AIGC applications, and at this point it is worth sitting down to talk about the down-to-earth topics around AIGC.

The three representative guests invited this time are:

Gao Yushi, vice president of technology of Easy Group, led the research and development of the group's AI intelligent system in the field of health protection.

Xu Dong, head of Alibaba Cloud's Tongyi large model business, has in-depth practice in the fields of cloud native, device-cloud architecture and AI large model.

Zhou Jian, founder and CEO of Lanma Technology, who has accumulated valuable experience in the field of AI and enterprise services.

Moderated by Jin Lei, editor-in-chief of Qubit, the roundtable focused on three topics: how large model applications are being used, the different ways AI makes money, and whether the benefits of the "hundred-model war" outweigh the drawbacks.

How are large model applications being used?

Gao Yushi said the Dr.GPT upgrade has brought great convenience to both doctors and patients. On the doctor side, clinical research efficiency has doubled, popular-science content creation has reached a monthly output of 10,000 articles, the adoption rate of intelligent diagnosis-and-treatment assistance has reached 86%, and diagnosis time has been shortened from 10 minutes to 1-2 minutes. On the patient side, the health consultant covers more than 300,000 users with a 70% activity rate.

Zhou Jian's Lanma Technology has built enterprise-level AI agents on top of large language models to serve augmented automation and new business development in enterprises' daily office scenarios, with typical applications in insurance, banking, government affairs, and other sectors, where expert knowledge empowers front-line employees and improves management efficiency.

Xu Dong offered a two-part view from the perspective of the Tongyi large model. The first category he currently sees is large models shaping an industry's core business model, such as NPCs in gaming, role-playing in social products, and on-device applications in intelligent hardware.

There are different ways to make money with AI

On AIGC commercialization, Xu Dong said there is no killer product among AIGC applications yet, and whether an innovative subscription-based business model emerges in the future remains to be seen.

Gao Yushi said they mainly profit from value-added services for C-end users, such as medical and health insurance, an in-app mall, and paid popular-science content; for the B side, it is mainly pay-as-you-go.

Zhou Jian mentioned that one possibility is to charge monthly for AI agents or digital employees built on large language models: bundle new production factors such as expert knowledge, models, and computing power into a service, and charge or share revenue by usage for industries such as finance.

As for how to judge the value of an AIGC product, the three panelists agreed that it comes down to whether it improves ROI, whether by cutting costs and improving efficiency, increasing revenue, or improving user experience; the specific way to measure it depends on the characteristics of the industry and the scenario.

The benefits of the "hundred-model war" outweigh the drawbacks

On whether last year's "hundred-model war" was necessary, Gao Yushi believes it was valuable for accelerating technological development, but the waste of resources deserves attention. He predicts an eventual reshuffle among tech giants and the startups they invest in.

Zhou Jian argued that in the future only a few general-purpose large models may be needed, but there could be hundreds of specialized vertical-domain models, which will require more startups to participate.

Xu Dong also believes the "hundred-model war" was not pure extravagance and waste: it cultivated talent and accumulated experience with models and data, and competition among non-homogeneous models is welcome. The talent and experience gained will help large models land across thousands of industries, which will greatly benefit the future commercialization of AIGC.
