
Society News | The Large Model and General Artificial Intelligence Forum of the Chinese Artificial Intelligence Industry Annual Conference Concludes Successfully

Author: Chinese Association for Artificial Intelligence (CAAI)

On the morning of April 14, the Large Model and General Artificial Intelligence Forum of the 13th Wu Wenjun Artificial Intelligence Science and Technology Award Ceremony and the 2023 Chinese Artificial Intelligence Industry Annual Conference, themed "Innovation-Driven, Digital-Intelligent Power", concluded successfully at the Hilton Hotel in Suzhou Industrial Park. Experts and scholars from well-known universities across China gathered to explore theoretical advances in key AI technologies such as embodied intelligence, multi-modal multi-task learning, and semantic space alignment, as well as application topics such as intelligent human-computer interaction, OCR, and content generation, and jointly discussed key technologies, innovation difficulties, and development trends in artificial intelligence.


Expert Views

Embrace the era of artificial intelligence

The forum was co-chaired by Professor Xu Mai of Beihang University, Deputy Director of the Youth Working Committee of the China Society of Image and Graphics, and Researcher Zhao Jian, a young scientist at the China Telecom Artificial Intelligence Research Institute. Experts from academia and industry attended and delivered invited reports, including Xiong Hongkai, Distinguished Professor at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Jin Lianwen, Second-Level Professor at South China University of Technology; Ye Qixiang, Distinguished Professor at the University of Chinese Academy of Sciences; Wu Fei, Director of the CAAI Education Working Committee and Director of the Institute of Artificial Intelligence, Zhejiang University; Sun Baigui, Head of AIGC at Alibaba Tongyi Laboratory; Li Shengxi, Professor at the School of Electronic and Information Engineering, Beihang University; Zi Ran, Director of the Security GPT Business at Sangfor Technologies; and Zhao Bin, Associate Professor at Northwestern Polytechnical University and young scientist at the Shanghai Artificial Intelligence Laboratory. Zhao Jian also served as the forum's moderator.


Researcher Zhao Jian

Professor Xu Mai of Beihang University, Deputy Director of the Youth Working Committee of the China Society of Image and Graphics, delivered the welcome speech: "With the rapid development of artificial intelligence technology, large models have become an important force driving social progress and industrial upgrading."

He said: "This forum brings together many leaders in the field of large models to discuss the latest research results, application cases, and future trends. I believe that through exchange and cooperation we can better understand the potential and challenges of large models and more effectively promote the development and application of large model technology." He added that the development of large models is not all smooth sailing: the huge demand for computing resources, the strict requirements on data quality, and the ethical and privacy issues that may arise urgently require scholars, developers, and users in the field to explore and innovate together to ensure the healthy development and wide application of large model technology. Everyone should embrace new trends with a positive attitude and face the challenges of technical logic, demand logic, and application scenarios in the new era.


Professor Xu Mai

Riding the Wave of Large AI Models

Focusing on the Opportunities and Challenges of the Times

Xiong Hongkai, Distinguished Professor at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, delivered a keynote report entitled "Sparse Optimization and Generalization Design for General Vision Large Models", discussing efficient generalization methods for different scenarios and geometric-structure data in the design of the Transformer, the basic architecture of large models. Addressing the problem of information forgetting in modal hybrid adaptation, he constructed a multi-task dynamic model topology based on lossless adaptive adjustment of information via reversible normalizing flows. The Transformer structure is extended into learnable anisotropic filters to achieve multi-scale geometric frequency analysis. For generalized manifold signals, dynamic routing learns the composition, and gauge-equivariant networks are designed to improve generalization performance under different local coordinate systems, 3D mesh structures, and resolutions.


Professor Xiong Hongkai

Jin Lianwen, Second-Level Professor at South China University of Technology, delivered a keynote report entitled "Some Thoughts on Visual Foundation Models and OCR Vertical Large Models". With the rise of large language models (LLMs), artificial general intelligence (AGI) for natural language processing has made major breakthroughs. He briefly reviewed recent progress in representative technologies related to multimodal large models, visual foundation models, and foundation models for the OCR vertical domain, and introduced his team's latest research: a construction method and technical route for a pixel-level low-level processing foundation model for OCR-oriented document images. The participants then discussed and looked ahead to the development trends and future research directions of vertical fields such as OCR in the era of large models, bringing novel insights to the audience.


Professor Jin Lianwen

Ye Qixiang, Distinguished Professor at the University of Chinese Academy of Sciences, delivered a keynote report entitled "Structural Design and Physical Inspiration of Visual Representation Models". He first analyzed the complementary and dialectical relationship between local convolution operations and global attention operations, coupling local features with global features to form the Conformer network structure, which significantly enhances visual representation ability and raises the lower performance bound of representation models. He then discussed the information-leakage problem that local convolution operations cause in masked image modeling (MIM) self-supervised learning, and proposed a token-merging operation to break through the locality constraints of convolution, forming the efficient hierarchical Transformer representation HiViT and the fully pre-trained Transformer Pyramid Network iTPN. From the perspective of model structure design, this series of studies has raised the performance of tasks such as visual object detection and segmentation to new heights.
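The local-global coupling idea described in the report can be illustrated with a minimal sketch. This is a hypothetical simplification, not the actual Conformer implementation: a local averaging branch stands in for convolution, a single-head self-attention branch provides global context, and the two are fused by addition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(x):
    # Single-head self-attention over all positions (global context).
    # x: (n_tokens, d)
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # token-to-token affinities
    return softmax(scores, axis=-1) @ x    # globally mixed features

def local_branch(x, kernel=3):
    # Local sliding-window averaging as a stand-in for convolution.
    pad = kernel // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + kernel].mean(axis=0) for i in range(len(x))])

def coupled_block(x):
    # Conformer-style coupling: fuse the local (convolutional) and
    # global (attention) views of the same feature map.
    return local_branch(x) + global_attention(x)

x = np.random.default_rng(0).normal(size=(8, 16))  # 8 tokens, 16 dims
y = coupled_block(x)
print(y.shape)  # (8, 16)
```

The output keeps the input's shape, so such blocks can be stacked like ordinary Transformer layers.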


Professor Ye Qixiang

Exploring the Innovative Development of Large Models

Breaking Down the Barriers Between Technology and Typical Scenarios

Zi Ran, Director of the Security GPT Business at Sangfor Technologies, delivered a keynote report entitled "Practice and Research of Large Language Models in the Field of Network Security". He first introduced the latest progress and implementation practices of large language models in network security at home and abroad, including attack detection, threat research and judgment, and data security. He then discussed how security-domain large language models will evolve alongside rapidly developing techniques such as retrieval-augmented generation (RAG), ultra-long context, and AI agents. Finally, from a product perspective, he introduced the intuitive scenarios in which security-domain large models will be presented to customers.
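The RAG technique mentioned above can be sketched in a few lines. This is an illustrative toy, not Sangfor's system: relevance is scored by naive word overlap (a stand-in for a real embedding-based retriever), and the top passages are prepended to the prompt so the language model answers grounded in retrieved context.

```python
def overlap(a, b):
    # Naive relevance score: number of shared lowercase words.
    return len(set(a.lower().split()) & set(b.lower().split()))

def build_rag_prompt(query, corpus, k=2):
    # Retrieve the k most relevant passages and prepend them to the query.
    ranked = sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "Phishing emails often contain suspicious links.",
    "Firewalls filter inbound network traffic.",
    "SQL injection attacks target database queries.",
]
prompt = build_rag_prompt("How do phishing emails work?", corpus)
print(prompt.splitlines()[1])  # most relevant passage ranks first
```

Production systems replace word overlap with vector similarity over an indexed document store, but the prompt-assembly pattern is the same.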


Dr. Zi Ran

Sun Baigui, Head of AIGC at Alibaba Tongyi Laboratory, shared a keynote report entitled "Application and Innovation of AIGC FaceChain", drawing on his rich research experience. He believes that, driven by the wave of AIGC technology, image content generation has shown broad application potential on both the consumer (C) and business (B) sides. Combining practical applications, he shared FaceChain's research results in popular scenarios such as portrait photos, virtual try-on, and character videos. The related technologies have been successfully deployed in applications such as Fliggy Digital Travel Photography and the Tongyi Wanxiang Photo Museum, and the open API offers out-of-the-box use, custom templates, flexible style configuration, and a training-free technical path. The FaceChain team actively promotes open-source community building, has gained more than 8.1K stars on GitHub, and has won six domestic and international open-source project and individual awards. He hopes this sharing will help more people understand FaceChain and pay greater attention to its development and future applications.


Researcher Sun Baigui

Li Shengxi, Professor at the School of Electronic and Information Engineering, Beihang University, delivered a keynote report entitled "Representation and Compression Methods for Visual Semantic Reconstruction". He noted that in the era of big data and large models, continuous progress in intelligent algorithms is often accompanied by steady improvement in their representation capabilities, and probabilistic generative models play a key role in artificial intelligence thanks to advantages such as unsupervised probabilistic interpretation of signals. Focusing on representation and invertibility methods for generative adversarial networks oriented to visual semantic reconstruction, he analyzed the representation performance of generative adversarial networks, then introduced GANs for semantic reconstruction that use characteristic functions as statistical measures, whose theoretical completeness guarantees the completeness of the semantic representations.


Professor Li Shengxi

Wu Fei, Director of the CAAI Education Working Committee and Director of the Institute of Artificial Intelligence, Zhejiang University, delivered a keynote report entitled "Technical Links and Bottleneck Challenges from Text Synthesis to Video Synthesis". He traced the development of the field's core algorithms: the self-attention-based Transformer proposed by Google in 2017, which captures local and global associations between words in text; the Vision Transformer, which Google extended from the text domain to the image domain in 2021; the Stable Diffusion image-generation model released by Stability AI in 2022; and Diffusion Transformers (DiTs), the image synthesis technique proposed by UC Berkeley and New York University in 2023. Through the development of these core algorithms, he revealed the mechanism, and the ceiling, of meaningfully associating and combining the smallest units in synthesized content.

Regarding vertical-domain and general large models, he also outlined research hotspots for the future: how to evolve large language models into cross-media large models; how to make the training and empowerment of large language models a two-wheel drive of data and knowledge; how to let large language models interact with the environment to guide or evaluate an agent's actions; how to design better large-language-model tools to tackle challenges in basic scientific research and engineering; and how to break through the barrier between vertical-domain large models and lightweight on-device inference through device-cloud collaboration, forming device-cloud collaborative research between large and small models. These challenges pose concrete scientific research requirements for the development of the large model field.
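The scaled dot-product self-attention at the heart of the Transformer line of work reviewed above can be sketched minimally. This is a simplified single-head version with illustrative random weights, not any production model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Every token attends to every other token, capturing the
    # local/global word associations described in the report.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot products
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
x = rng.normal(size=(n_tokens, d))           # toy token embeddings
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(x, wq, wk, wv)
print(out.shape, attn.shape)  # (5, 8) (5, 5)
```

The Vision Transformer applies the same operation to image patches instead of words, and DiTs apply it inside a diffusion denoiser, which is what makes the mechanism a common thread through the algorithms listed above.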


Professor Wu Fei

Zhao Bin, Associate Professor at Northwestern Polytechnical University and young scientist at the Shanghai Artificial Intelligence Laboratory, delivered a keynote report entitled "Integration of Artificial Intelligence Software and Hardware". He noted that since the origin of life, the evolution of biological intelligence has been reflected not only in ways of thinking but also in the transformation of body structures such as shape and limbs. Artificial intelligence is a family of technologies modeled on biological intelligence, and its theoretical development and technical implementation require the collaboration of software and hardware. Driven by this idea, attention to integrated software-hardware research is needed to advance the application of artificial intelligence.

His sharing distilled the three-dimensional interaction mode of biological intelligence, "thinking and computing, embodied control, environmental perception", and focused on research on large models driving embodied agents, covering technologies such as high-level semantic understanding, self-skill cognition, and complex task execution, providing new ideas for the development of AI software and hardware in the era of large models. He said: "At present, the problem-solving ability of large models is still relatively weak, and many long-tail problems remain unsolved when interacting with the real environment, which also points the way for future research. I hope that in the future artificial intelligence can touch everything, make the world more interesting, and give rise to more new concepts."


Associate Professor Zhao Bin

Focusing on Problems and Challenges

Discussing the Future Development of Large Models

The roundtable dialogue brought together some of the speakers and specially invited Researcher Shan Shiguang of the Institute of Computing Technology, Chinese Academy of Sciences, to discuss three topics: the impact of large models on vision research; whether large models will dominate everything; and how to realize general artificial intelligence and combine it with specialized artificial intelligence.

The guests shared inspiring insights on these topics, giving the audience a clearer understanding of the development of large models and the future of general artificial intelligence. They argued that large models cannot dominate everything, and that artificial intelligence learning will ultimately return to human-like learning. Whether large models can achieve breakthroughs in self-creation and self-invention remains a long-term and difficult question.


As large models are promoted, how to achieve batch and scale benefits is a common topic across the industry. As their application value becomes tangible, the adoption of large models in the financial industry will grow. Large models will undoubtedly usher in a new era, which requires deep cooperation and collaborative innovation among government, industry, academia, research, and application.

Encouraging innovation and taking on responsibility intelligently, this forum provided new ideas and suggestions for the development of the digital-intelligence industry by discussing technology and application trends of large models and general artificial intelligence, while also promoting industry exchange and cooperation and the popularization of innovative AI technologies. In the future, the conference will continue to share new technologies, policies, and trends in the field of artificial intelligence, build bridges for industry communication, and jointly promote the high-quality development of China's artificial intelligence industry.
