
The era of generative AI on the device side has arrived, and Qualcomm empowers AIGC application innovation with leading AI software and hardware technologies

Author: Zhongguancun Online

On April 17, the China AIGC Industry Summit was held in Beijing. Under the theme "Hello, New Applications", the summit invited representatives from generative AI applications, AI infrastructure, and the model layer to share insights into the latest developments and trends in generative AI. At the summit, Wan Weixing, head of Qualcomm's AI product technology in China, delivered a keynote speech titled "Promoting the Arrival of the Era of Generative AI on the Device Side". He emphasized that the era of on-device generative AI has arrived, noting that Qualcomm's third-generation Snapdragon 8 and Snapdragon X Elite platforms already power, or will soon power, many AI phone and AI PC products. Wan Weixing gave a detailed introduction to the Qualcomm AI Engine and its components, and demonstrated an end-to-end use case built on Qualcomm's heterogeneous computing capabilities. He also introduced the Qualcomm AI software stack and Qualcomm AI Hub, which greatly improve developers' efficiency in model development, optimization, and deployment, and in turn help create more innovative AI applications.


The full text of the speech is as follows:

Good morning. I am very pleased to participate in this China AIGC Industry Summit, to welcome the arrival of the generative AI era together with all the guests and friends here, and to share how the products and solutions Qualcomm provides as a chip manufacturer can promote the large-scale expansion of AIGC-related industries.

We believe the era of on-device generative AI has arrived. On the third-generation Snapdragon 8 and the Snapdragon X Elite, released in October 2023, Qualcomm has moved large language models entirely to on-device execution, and these platforms already power, or will soon power, many AI phones and AI PCs. On the phone side, many Android flagships released by OEMs including Samsung, Xiaomi, Honor, OPPO, and vivo at the end of last year and the beginning of this year can already run generative AI on the device.

The development of multimodal large models based on image semantic understanding is an important current trend. During MWC Barcelona in February this year, Qualcomm demonstrated the world's first large multimodal model (LMM) running on an Android phone. Specifically, on our reference design powered by the third-generation Snapdragon 8, we ran LLaVA, a large language-and-vision assistant model with more than 7 billion parameters that takes image and text input and generates multi-turn conversations about the image. Multimodal large models with language and visual comprehension enable many use cases, such as identifying and discussing complex visual patterns, objects, and scenes. Imagine a user with a visual impairment using this technology on the device to navigate a city. Qualcomm also demonstrated the world's first audio-reasoning multimodal large model running on a Windows PC, on the Snapdragon X Elite.

Next, let's look at how Qualcomm, as a chip manufacturer, meets the diverse requirements of generative AI. Generative AI use cases in different domains have very different requirements, spanning on-demand, sustained, and always-on scenarios, and the AI models behind them differ just as much; no single tool can be perfectly applied to all generative AI use cases, let alone non-generative ones. For example, some use cases require sequential control and are latency-sensitive, some are sustained and sensitive to compute and power consumption, and some are always on and especially sensitive to power consumption.

The Qualcomm AI Engine is a leading heterogeneous computing system composed of multiple processor components: the general-purpose CPU and GPU, the NPU built for high-compute workloads, and the Qualcomm Sensor Hub. Each plays a different role in AI inference. The sequentially executed, on-demand tasks mentioned above can run on the CPU or GPU; sustained tasks that demand high AI compute, such as image processing and generative AI, can run on the NPU; and always-on tasks that are particularly power-sensitive can run on the Qualcomm Sensor Hub.
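The mapping described above can be sketched as a tiny dispatcher. This is an illustrative toy only: `Workload` and `pick_processor` are hypothetical names invented here to make the routing logic concrete, not part of any Qualcomm API.

```python
# Toy sketch of the task-to-processor routing described in the speech.
# All names here are hypothetical, not a real Qualcomm SDK.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    sequential: bool = False        # on-demand, latency-sensitive control flow
    sustained_compute: bool = False  # persistent, compute-hungry (e.g. generative AI)
    always_on: bool = False          # ultra-low-power background sensing

def pick_processor(w: Workload) -> str:
    if w.always_on:
        return "sensor_hub"   # always-on, especially power-sensitive tasks
    if w.sustained_compute:
        return "npu"          # sustained, high-compute tasks
    return "cpu_or_gpu"       # sequential, latency-sensitive tasks

print(pick_processor(Workload("wake-word detection", always_on=True)))   # sensor_hub
print(pick_processor(Workload("LLM decoding", sustained_compute=True)))  # npu
print(pick_processor(Workload("UI control logic", sequential=True)))     # cpu_or_gpu
```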

Let me briefly introduce the evolution of the Qualcomm NPU, a very typical case of low-level hardware design driven by upper-level use cases. In 2015 and earlier, AI was mainly used for relatively simple image recognition and classification, so we equipped the NPU with scalar and vector accelerators. From 2016 to 2022, computational photography became popular, and we shifted our focus from image classification to AI imaging and AI video, including support for natural language understanding and processing and for Transformer models, adding tensor accelerators to the NPU on top of the scalar and vector accelerators. In 2023, with large models surging in popularity, we were the first in the industry to support large models on the device side, adding a dedicated Transformer acceleration module to the NPU. In 2024, we will focus on supporting on-device deployment of multimodal models and of large language models with higher parameter counts.


Next, a deeper look at the Qualcomm Hexagon NPU. The Hexagon NPU on the third-generation Snapdragon 8 not only received a micro-architecture upgrade, but also gained a dedicated power rail configured for optimal energy efficiency. We also use micro-tile inferencing technology to support deep network fusion and achieve higher performance. In addition, the Hexagon NPU integrates advanced technologies such as a Transformer acceleration module designed specifically for generative AI, higher DDR bandwidth, and higher IP clock frequency. Together, these technologies make the Hexagon NPU an industry-leading NPU for on-device generative AI.

Let's look at a concrete example: a virtual avatar AI assistant. This is a very typical end-to-end use case that leverages Qualcomm's heterogeneous computing power and involves many complex AI workloads. First, an automatic speech recognition (ASR) model converts the speech signal into text; this can run on the Qualcomm Sensor Hub. Next, a large language model processes the text input and generates the reply; this can run on the NPU. The text output is then converted back into a speech signal by a text-to-speech (TTS) model, which can run on the CPU. Finally, the GPU renders the avatar in sync with the speech output. The result is an end-to-end, voice-interactive avatar assistant.
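The four stages above chain together as a simple pipeline. The sketch below is purely illustrative: each function is a stub standing in for the real ASR/LLM/TTS/rendering models, and the function names and processor annotations are hypothetical, mirroring the speech rather than any actual SDK.

```python
# Illustrative pipeline sketch of the avatar-assistant use case.
# Each stage is a stub; comments note which processor the speech
# says the real workload would run on.

def asr(audio: bytes) -> str:             # would run on the sensor hub
    return "hello assistant"               # stub: speech signal -> text

def llm_reply(text: str) -> str:          # would run on the NPU
    return f"you said: {text}"             # stub: text -> generated reply

def tts(text: str) -> bytes:              # would run on the CPU
    return text.encode("utf-8")            # stub: text -> speech signal

def render_avatar(speech: bytes) -> str:  # would run on the GPU
    return f"avatar lip-syncing {len(speech)} bytes of audio"

def avatar_assistant(audio: bytes) -> str:
    reply = llm_reply(asr(audio))          # ASR, then LLM
    return render_avatar(tts(reply))       # TTS, then rendering

print(avatar_assistant(b"\x00\x01"))
```

The point of the structure is that each stage can be scheduled on the processor best suited to its workload profile, rather than serializing everything on one core.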

Having covered the hardware, let me now share the AI performance of Qualcomm platforms. In smartphones, the third-generation Snapdragon 8 far outperforms competitors both in overall scores on AI benchmarks such as Master Lu AIMark V4.3 and AnTuTu AITuTu, and in the model-specific inference tests of MLCommons MLPerf Inference: Mobile V3.1. On the PC side, the Snapdragon X Elite likewise outperformed x86-based competitors in the UL Procyon AI inference benchmark for Windows.

Beyond leading hardware platform design, Qualcomm has also launched a unified software stack spanning platforms, device types, and operating systems: the Qualcomm AI Stack. It supports all current mainstream training frameworks and runtimes, and provides developers with optimization interfaces at different levels along with a complete compilation toolchain, so that models can be developed, optimized, and deployed more efficiently on Snapdragon platforms. It is worth emphasizing that the Qualcomm AI Stack is a unified cross-platform, cross-device solution: once developers complete the optimized deployment of a model on one Qualcomm or Snapdragon platform, they can easily migrate that work to all other Qualcomm and Snapdragon products.


During this year's MWC Barcelona, Qualcomm released a major product: Qualcomm AI Hub. Aimed at third-party developers and partners, it helps developers make full use of the hardware compute of the underlying Qualcomm and Snapdragon chips to build their own innovative AI applications. Developing an application with Qualcomm AI Hub is as simple as the old joke about putting an elephant in a refrigerator, in three steps: first, select the required model for the use case; second, select the Qualcomm or Snapdragon platform to deploy to; third, write just a few lines of script to complete the model deployment and see the application or algorithm running on the device.
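The three-step workflow can be sketched as follows. This is a local, self-contained illustration only: the real Qualcomm AI Hub client is a Python package requiring an account and network access, so the catalog, device names, and function names below (`select_model`, `select_device`, `deploy`) are hypothetical stand-ins that show only the shape of the flow.

```python
# Illustrative sketch of the three-step AI Hub workflow from the speech.
# Everything here is a local stand-in, not the real AI Hub client.

CATALOG = {
    "image-classification": "example-classifier",   # hypothetical model names
    "text-generation": "example-llm",
}
DEVICES = {
    "phone": "third-generation Snapdragon 8",
    "pc": "Snapdragon X Elite",
}

def select_model(use_case: str) -> str:        # step 1: pick a model for the use case
    return CATALOG[use_case]

def select_device(form_factor: str) -> str:    # step 2: pick the target platform
    return DEVICES[form_factor]

def deploy(model: str, device: str) -> str:    # step 3: "a few lines of script"
    return f"compiled {model} for {device}"

print(deploy(select_model("text-generation"), select_device("phone")))
# -> compiled example-llm for third-generation Snapdragon 8
```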

Qualcomm AI Hub currently supports more than 100 models, including the generative AI models everyone is most interested in, covering language, text, and image generation, as well as traditional AI models for image recognition, image segmentation, natural language understanding, natural language processing, and more. For details on specific models, please visit the Qualcomm AI Hub website (AIHUB.QUALCOMM.COM).

Finally, to summarize Qualcomm's AI leadership: first, Qualcomm offers unparalleled on-device AI performance; second, Qualcomm has top-tier heterogeneous computing capabilities, running AI across the entire SoC and fully exposing the CPU, GPU, NPU, and Qualcomm Sensor Hub to application developers; third, we provide scalable AI software tools, such as the aforementioned Qualcomm AI Stack; and finally, we support a broad range of ecosystems and AI models.
