
Experts from Xiaomi, Facewall Intelligence (ModelBest), Volcano Engine, and Kuaishou jointly interpret the latest multimodal technology

Author: InfoQ

With the wide application of large models, multimodal technology is widely regarded as the direction of future development. Despite its broad prospects, it still faces many challenges, such as technical problems in data fusion, model fusion, and cross-modal information fusion. At the AICon Global Artificial Intelligence Development and Application Conference and Large Model Application Ecology Exhibition, we have curated a dedicated track on multimodal technologies and applications. Meng Erli, technical director of the machine learning team at Xiaomi AI Lab, serves as the track producer, and the following four experts have been selected to share their insights:

How sound foundation models drive sound understanding and generation

First, we are honored to have Wang Yujun, head of voice technology at Xiaomi and head of the acoustics and speech direction of the AI Lab under Xiaomi's Technical Committee. He has 20 years of experience in acoustics and speech across academia and industry, and his research interests include the perception, understanding, generation, and presentation of sound. He leads the acoustics and speech team, which was founded in 2017 and covers three areas (speech understanding, generation, and measurement) with 17 sub-directions, including speech recognition, voice analysis and restoration, and speech synthesis. The team provides voice services for Xiaomi's mobile and AIoT platform, handling an average of 1.26 billion service requests per day, and has won 7 domestic and international acoustics and speech challenge championships.

In his talk, Wang Yujun will focus on the evolution of Xiaomi's sound foundation model and on how it supports sound understanding and generation from both the encoder and decoder sides. The audience will learn about the role sound foundation models play in driving sound understanding and generation, as well as the current challenges and future prospects.

Towards practical multimodal large models

Second, we are honored to have Yao Yuan, a researcher at Facewall Intelligence (ModelBest) and a postdoctoral fellow in the Department of Computer Science at Tsinghua University. He has extensive research experience in multimodal large models, information extraction, and knowledge graphs. His talk covers the move towards practical multimodal large models and highlights the team's latest work and results in this field.

In his talk, he will first analyze the challenges multimodal large models face in practical use, including limits on parameter scale, computational cost, image perception resolution, and language ability. He will then share the team's recent explorations, covering the construction of device-side foundation models, multimodal large models for high-resolution images, cross-language generalization of multimodal capabilities, and reinforcement learning from multimodal human feedback.

In particular, he will focus on MiniCPM-V 2.0, the efficient device-side multimodal large model built by the team. The model has a total parameter size of 2.8B and several notable characteristics: leading performance, with a comprehensive score better than that of mainstream models on commonly used evaluation benchmarks; strong OCR capabilities; support for high-resolution image encoding; and significant results in bilingual support and trustworthy behavior. MiniCPM-V 2.0 has performed well on the open source platform Hugging Face and has received wide attention and recognition.

From his talk, the audience will gain an in-depth understanding of the challenges multimodal large models currently face in practical use, and will learn optimization strategies and technical methods for addressing them in real scenarios.

The practice and prospect of multimodal large models in the financial industry

We are also honored to have Zhou Siji, Director of Financial Solutions and Head of Financial Models at Volcano Engine. She is committed to promoting the application of artificial intelligence in the financial industry, and has in-depth research and industry experience in natural language processing, machine learning, and computer vision. Her talk covers the practice and prospects of multimodal large models in the financial industry and examines the key issues in this field.

In the talk, Zhou will point out that the shift of large models from unimodal to multimodal brings new productivity tools to every industry and may lead to fundamental changes in business models. In the financial industry in particular, multimodal methods that jointly process text, numerical, tabular, and visual data make it possible to understand professional financial documents as a whole, which improves how well the technology works in financial applications.

She will also analyze the application scenarios of multimodal large model technology in finance, discuss development trends in China and abroad along with the opportunities and challenges they bring, and look ahead to how financial multimodal large models can be put into practice.

Application practice of Kuaishou's "Ketu" text-to-image large model

We have also invited Li Yan, leader of the Kuaishou "Ketu" large model team, who holds a Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences. He has more than 10 years of experience in algorithm development, business implementation, and management, and extensive experience in multimodal content understanding and generation. His talk covers the application and practice of Kuaishou's "Ketu" text-to-image large model. He will introduce the self-developed text-to-image large model that Kuaishou has released for the first time, as well as its application and results in the Kuaishou app, with the aim of inspiring further development in the industry.

In his talk, Li Yan will review the development history of text-to-image large models and the research and development of Kuaishou's own model, and will discuss its technical path and implementation in depth. He will also share the peripheral plug-in capabilities of the Kuaishou text-to-image model and an analysis of its application and value in the Kuaishou app. The audience can expect suggestions on four questions: how to develop a Chinese text-to-image foundation model from scratch, how to evaluate a text-to-image model accurately and objectively, how to choose the deployment scenario with the highest ROI, and how to avoid the application risks of text-to-image models.

From his talk, the audience will learn how the text-to-image model is applied at Kuaishou and how to apply this technology in their own work to achieve more efficient and valuable business outcomes.

Event Recommendation:

There is only 1 day left until the conference opens, and tickets are about to sell out.

To purchase tickets or ask other questions, please contact the ticketing staff at 13269078023, or scan the QR code above to add the conference assistant and receive the conference benefits package.

Original link: https://sourl.co/ytYqVL
