
Research on image processing SoC solutions (video encoder and DLA)

Author: Xu Dan's writing space

Recently, while chatting with followers of my official account, the topic of an image-processing SoC came up: a design combining a CPU, ISP, video encoder, DLA, AXI/AHB/APB buses, and peripherals. I find it very interesting and worth studying, especially the two concepts of the video encoder and the DLA.

1 Video Encoder

A video encoder is a tool that compresses a digital video signal and converts it into a specific format. Encoders use dedicated algorithms that make video files smaller and easier to store and transmit. Their development was driven by the growth of the Internet: the real-time data rate of high-definition video is enormous, so to transmit such a large volume of video over limited bandwidth, audio/video systems use encoding equipment to compress the images, greatly reducing the amount of data before sending it over the network.
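To see how enormous that raw data rate is, a quick back-of-envelope calculation helps. The numbers below (30 fps, 24-bit color, a ~5 Mbit/s streaming target) are illustrative assumptions, not fixed standards:

```python
# Raw bitrate of uncompressed 1080p video at 30 fps with 24-bit color,
# and the compression ratio needed to fit an assumed ~5 Mbit/s stream.
width, height, fps, bits_per_pixel = 1920, 1080, 30, 24

raw_bps = width * height * fps * bits_per_pixel   # bits per second, uncompressed
target_bps = 5_000_000                            # assumed streaming bitrate

ratio = raw_bps / target_bps                      # roughly 300:1 compression
print(f"raw: {raw_bps / 1e9:.2f} Gbit/s, needs ~{ratio:.0f}:1 compression")
```

Uncompressed 1080p already approaches 1.5 Gbit/s, which is why a roughly 300:1 reduction by the encoder is the only practical way to stream it.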

Video encoders serve three main functions:

Compression: change the way video data is stored, reducing the size of video files while maintaining high quality.

Transcoding: convert videos between formats so that they can be played on different devices and platforms.

Quality control: trade image quality against file size by increasing or decreasing the compression factor.

Common video coding standards include H.264, H.265, MPEG-4, and VP9.

H.264/AVC: one of the most popular formats today, widely used on mobile devices, smart TVs, digital signage, and similar scenarios. It is mainly used for 1080p video; for 4K (4096×2160) and 8K (8192×4320) content its compression efficiency is not sufficient for practical transmission.

H.265/HEVC: the successor to H.264, offering a further improved compression ratio. It retains the mature techniques of the mainstream H.264 standard and inherits its strengths, while adding advanced tools such as quadtree-based coding partitioning and predictive coding. Its compression efficiency is roughly double that of H.264 (about half the bitrate at the same quality), which makes 1080p transmission easy at low bandwidth and supports the transmission of 4K and 8K high-definition images.
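The quadtree partitioning mentioned above can be sketched in a few lines. Real HEVC encoders decide splits by rate-distortion cost; in this toy version a simple pixel-variance threshold stands in for that decision, purely to show the recursive structure (the 64×64 coding tree unit size and the thresholds are illustrative):

```python
# Toy sketch of HEVC-style quadtree partitioning of a coding tree unit (CTU):
# keep a block whole if it is homogeneous, otherwise split it into four
# quadrants and recurse, down to a minimum coding-unit size.
from statistics import pvariance

def split_ctu(block, x=0, y=0, size=64, min_size=8, threshold=100.0):
    """Return a list of (x, y, size) coding units covering the block."""
    pixels = [block[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or pvariance(pixels) <= threshold:
        return [(x, y, size)]            # flat enough: keep as one coding unit
    half = size // 2                     # otherwise split into four quadrants
    units = []
    for dy in (0, half):
        for dx in (0, half):
            units += split_ctu(block, x + dx, y + dy, half, min_size, threshold)
    return units

# A 64x64 CTU: flat background with a detailed 16x16 patch in the top-left corner.
ctu = [[0] * 64 for _ in range(64)]
for j in range(16):
    for i in range(16):
        ctu[j][i] = (i * j * 37) % 256

units = split_ctu(ctu)  # flat regions stay large; the detailed corner splits down
```

The flat quadrants come back as single large coding units while the detailed corner is subdivided, which is exactly why quadtree partitioning spends bits where the image has detail.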

MPEG-4: another popular video compression standard, also widely used in fields such as digital TV and online video.

VP9: a royalty-free video encoder launched by Google, with strong compression performance.


2 DLA

The Deep Learning Accelerator (DLA) is an NVIDIA product. It is an application-specific integrated circuit: a fixed-function accelerator engine for deep learning that efficiently executes the fixed operations common in modern neural network architectures, such as convolution, deconvolution, fully connected, activation, pooling, and batch-normalization layers. The DLA does not support explicit quantization. According to NVIDIA's official documentation, the DLA supports roughly 15 major AI operator types; unsupported operators are offloaded to the GPGPU for computation.
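The offloading behavior described above amounts to a simple per-layer placement pass: walk the network in order and assign each layer to the DLA if its operator type is supported, otherwise fall back to the GPU. The supported set below is a small illustrative subset, not NVIDIA's actual operator list:

```python
# Illustrative sketch of DLA/GPU operator partitioning. DLA_SUPPORTED is a
# made-up subset for illustration, not NVIDIA's real supported-operator list.
DLA_SUPPORTED = {"conv", "deconv", "fc", "relu", "pooling", "batchnorm"}

def partition(layers):
    """layers: list of (name, op_type) -> dict mapping name to 'DLA' or 'GPU'."""
    return {name: ("DLA" if op in DLA_SUPPORTED else "GPU")
            for name, op in layers}

net = [("conv1", "conv"), ("bn1", "batchnorm"), ("act1", "relu"),
       ("nms", "non_max_suppression"),   # not on the DLA: offloaded to the GPU
       ("fc1", "fc")]
placement = partition(net)
```

In a real deployment stack (e.g. TensorRT) this decision also accounts for data-transfer cost between the two engines, but the supported/unsupported split is the core idea.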

Although the DLA does not support as many layer types as the GPU, it still covers the layers used in many popular neural network architectures, and in many cases its layer support is sufficient for a given model. For example, the NVIDIA TAO Toolkit includes a variety of DLA-supported pre-trained models, ranging from object detection to action recognition. Note that DLA throughput is typically lower than the GPU's, but the DLA is very energy-efficient.

Software invocation process:

From the application's point of view, the call chain is: application -> model loading -> user-mode runtime library -> device file system (devfs ioctl) -> kernel driver -> NPU hardware. This is the same development approach as for the VIP.

NVIDIA Xavier uses the GPU as its computing core, and its SoC scheme is CPU + GPU + ASIC, with four main modules: CPU, GPU, Deep Learning Accelerator (DLA), and Programmable Vision Accelerator (PVA). The GPU occupies the largest die area, followed by the CPU, supplemented by the two ASICs, the DLA and the PVA.

The counterpart to this is Tesla. The Tesla FSD chip uses an NPU (an ASIC) as its computing core, with three main modules: CPU, GPU, and Neural Processing Unit (NPU). The most important module, and the one with the largest die area, is Tesla's self-developed NPU, used mainly to run deep neural networks; the GPU mainly runs the post-processing stage of those networks. The overall SoC scheme is likewise CPU + GPU + ASIC, and the NPU contains DLA-like functionality under a different name.

