Xe-HPG lands in the mobile market: Intel Arc mobile discrete graphics explained

From Ultrabook to Evo, and from USB 1.0 to Thunderbolt 4, Intel has focused on platform innovation for years. In discrete graphics, however, its earlier efforts produced little. Now that Intel has officially released the Arc series of mobile discrete graphics, can it become Intel's next major milestone, fill the gap in Intel's platforms, and make them truly complete?

On March 30, 2022, Intel officially released the Arc A-series discrete graphics for mobile platforms. There is no doubt that Arc in laptops marks the next big step in Intel's journey. The first laptops with Intel Arc 3 graphics are available now, and the more powerful Arc 5 and Arc 7 series will follow this summer. All Arc mobile discrete GPUs share a common architecture and an advanced feature set, including support for DirectX 12 Ultimate and Intel's advanced AI and media engines. The Arc 3 series brings enhanced 1080p gaming and content-creation performance to thin-and-light Evo laptops. The Arc 5 and Arc 7 series offer the same leading content-creation capabilities, with higher graphics and compute performance.

Xe-HPG, the foundation for gaming and creation

Intel Arc A-series products are based on Intel's Xe-HPG discrete graphics architecture, which includes a powerful AI engine and an enhanced media engine that supports next-generation codec standards. In addition, Intel has created a next-generation Xe display engine and a new graphics pipeline to handle a variety of display tasks.

Xe-HPG: Render Slice

To understand the Xe-HPG architecture, start with the render slice, the basic reusable IP building block of Xe-HPG. In the Xe-HPG microarchitecture, every four Xe cores form a render slice. Each Xe core contains a substantial set of compute units, such as the XVE vector engines and the XMX matrix engines. Xe-HPG also integrates other mainstream graphics technologies, such as mesh shading and sampler feedback.

Xe-HPG's biggest strength is flexibility: by combining different numbers of render slices (at least two, at most eight), Intel can build different SoCs, which makes Xe-HPG scalable and the product line broader. Xe-HPG delivers 1.5x the performance per watt of the previous-generation Xe-LP microarchitecture. Each render slice also contains dedicated hardware ray tracing units supporting DirectX 12 Ultimate, Microsoft DXR, and Vulkan RT, four per slice, for real-time ray tracing. In other words, the Xe-HPG architecture has hardware-accelerated ray tracing, which gives gamers more choice.
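
The scaling rule above is simple multiplication. The short Python sketch below derives Xe core and ray tracing unit counts from the number of render slices, using only the figures in this article (four Xe cores and four ray tracing units per slice, two to eight slices per SoC). The function and constant names are illustrative, not Intel's.

```python
# Per-slice figures taken from the text.
XE_CORES_PER_SLICE = 4
RT_UNITS_PER_SLICE = 4
MIN_SLICES, MAX_SLICES = 2, 8


def slice_config(num_slices: int) -> dict:
    """Return Xe core and ray tracing unit counts for an SoC built from num_slices render slices."""
    if not MIN_SLICES <= num_slices <= MAX_SLICES:
        raise ValueError(f"Xe-HPG SoCs use {MIN_SLICES} to {MAX_SLICES} render slices")
    return {
        "render_slices": num_slices,
        "xe_cores": num_slices * XE_CORES_PER_SLICE,
        "rt_units": num_slices * RT_UNITS_PER_SLICE,
    }


for n in (MIN_SLICES, MAX_SLICES):
    print(slice_config(n))
```

The two endpoints, 8 and 32 Xe cores, match the two Alchemist chips described in the product preview.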

Xe-HPG: Xe Core

Next, the core components of Xe-HPG. The Xe core replaces the EU (execution unit) concept of earlier designs and is the most basic execution unit of the Xe-HPG architecture. Each Xe core includes 16 vector engines, each a 256-bit-wide SIMD unit, which perform most of the work of traditional graphics shaders. Because the computational core of AI revolves almost entirely around large matrix multiply-accumulate operations, Intel also built dedicated hardware acceleration into each Xe core: 16 matrix engines, each 1024 bits wide, designed to speed up AI operations. To meet the high bandwidth needs of the matrix, vector, and ray tracing units, Intel gave each Xe core a large 192KB block of local memory, which can be dynamically split between L1 cache and shared local memory (SLM) according to the needs of each workload.
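
To put these widths in perspective, the sketch below converts them into element counts. It assumes FP32 shading splits each 256-bit vector engine into 32-bit lanes (8 per engine, the same 8 parallel multiplications used in Intel's SIMD example), and the constant names are illustrative.

```python
# Per-Xe-core figures quoted in the text.
VECTOR_ENGINES_PER_CORE = 16
VECTOR_ENGINE_BITS = 256
MATRIX_ENGINES_PER_CORE = 16
MATRIX_ENGINE_BITS = 1024


def elements_per_engine(engine_bits: int, element_bits: int) -> int:
    """How many values of element_bits fit side by side in one engine-wide operand."""
    return engine_bits // element_bits


def fp32_lanes(xe_cores: int) -> int:
    """Total FP32 vector lanes, assuming every vector engine is split into 32-bit lanes."""
    per_core = VECTOR_ENGINES_PER_CORE * elements_per_engine(VECTOR_ENGINE_BITS, 32)
    return xe_cores * per_core


print(elements_per_engine(VECTOR_ENGINE_BITS, 32))  # 8 FP32 lanes per vector engine
print(elements_per_engine(MATRIX_ENGINE_BITS, 8))   # 128 INT8 values per matrix-engine operand
print(fp32_lanes(1), fp32_lanes(32))                # 128 4096: one Xe core vs. a 32-core chip
```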

The Xe core's vector engine has an improved ALU: floating-point (FP) instructions can execute concurrently with integer (INT) instructions, including fast INT8 computation via DP4a. Intel has further strengthened AI capability with the new XMX matrix engine for high-throughput matrix multiplication, covering the most common AI data types, including BF16 and INT8. To raise execution efficiency and throughput, Xe-HPG can schedule and execute FP, INT, and XMX instructions simultaneously, and pairs of engines run in lockstep while sharing resources.

Intel gave an example. Xe-HPG's vector engine uses basic SIMD vector instructions to perform 8 parallel multiplications followed by 8 parallel additions, for a total of 16 operations per clock.

DP4a is optimized for AI computations that do not need 32-bit precision. It splits each 32-bit input into four 8-bit pieces and multiplies them independently. Across the 8 lanes this gives 32 parallel multiplications (the purple squares in Intel's diagram), followed by 32 accumulations, for a total of 64 operations per clock, 4 times the throughput of a standard SIMD MAC.
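
DP4a's semantics can be emulated in a few lines. The sketch below packs four signed 8-bit values into each 32-bit word, multiplies the pieces pairwise, and adds the four products to an accumulator, which is the usual definition of a DP4a-style instruction. The helper names are hypothetical, and it models the arithmetic only, not Intel's hardware.

```python
from typing import List


def pack_int8x4(values: List[int]) -> int:
    """Pack four signed 8-bit integers into one 32-bit word (element 0 in the lowest byte)."""
    if len(values) != 4 or not all(-128 <= v <= 127 for v in values):
        raise ValueError("expected four values in the int8 range")
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_int8x4(word: int) -> List[int]:
    """Split a 32-bit word back into four signed 8-bit integers."""
    out = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        out.append(byte - 256 if byte >= 128 else byte)
    return out


def dp4a(a: int, b: int, acc: int) -> int:
    """acc plus the four int8 x int8 products: 4 multiplies and 4 adds in one lane.

    Python integers never overflow; hardware accumulates into a 32-bit register.
    """
    return acc + sum(x * y for x, y in zip(unpack_int8x4(a), unpack_int8x4(b)))


def simd_dp4a(a_lanes: List[int], b_lanes: List[int], acc_lanes: List[int]) -> List[int]:
    """Apply dp4a independently in each lane, like an 8-lane vector engine."""
    return [dp4a(a, b, c) for a, b, c in zip(a_lanes, b_lanes, acc_lanes)]


a = pack_int8x4([1, 2, 3, 4])
b = pack_int8x4([5, 6, 7, 8])
print(dp4a(a, b, 10))  # 10 + 5 + 12 + 21 + 32 = 80
```

With 8 lanes, one `simd_dp4a` call performs the 32 multiplications and 32 additions counted above.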

Finally, the XMX matrix engine goes further by pipelining the multiply-accumulate four stages deep. As with DP4a, each operand is split into four pieces, which are multiplied and accumulated independently, giving 64 operations per stage (the purple tiles in the diagram). With a four-deep pipeline, each clock produces 256 operations, 16 times the throughput of a traditional 32-bit SIMD MAC.
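
All three throughput figures follow one formula: lanes × pieces per operand × 2 (a multiply and an add) × pipeline depth. The sketch below reproduces Intel's 16, 64, and 256 operations per clock and the 4x and 16x ratios; the 8 lanes come from the SIMD example, and the names are illustrative.

```python
VECTOR_LANES = 8  # 256-bit vector engine / 32-bit operands


def ops_per_clock(pieces_per_operand: int, pipeline_depth: int) -> int:
    """Multiplies plus adds per clock for the per-lane scheme described in the text.

    Each lane multiplies `pieces_per_operand` pairs and accumulates each product,
    so it performs 2 * pieces_per_operand operations per pipeline stage.
    """
    return VECTOR_LANES * pieces_per_operand * 2 * pipeline_depth


simd_mac_ops = ops_per_clock(pieces_per_operand=1, pipeline_depth=1)  # 16
dp4a_ops = ops_per_clock(pieces_per_operand=4, pipeline_depth=1)      # 64
xmx_ops = ops_per_clock(pieces_per_operand=4, pipeline_depth=4)       # 256
print(simd_mac_ops, dp4a_ops, xmx_ops,
      dp4a_ops // simd_mac_ops, xmx_ops // simd_mac_ops)              # 16 64 256 4 16
```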

XeSS: Intel's answer to DLSS and FSR

One of the main applications of the XMX matrix engine is AI-assisted real-time rendering, which led directly to XeSS. XeSS is a super-sampling technique that delivers higher in-game performance than native high-resolution rendering: it uses a neural network, aided by motion vectors, to reconstruct sharp high-resolution images from lower-resolution renders. If this sounds familiar, it should. NVIDIA DLSS and AMD FSR would reply in unison, "I know this one!" At the briefing, however, Intel did not disclose further technical details or real-world performance figures for XeSS; these will have to be verified by later testing.

Intel announced that 14 games will support XeSS technology, and more are expected to be added in the coming months.

Xe Media Engine: Optimized codecs for faster media creation and playback

Intel considers the Xe media engine in Xe-HPG to be one of the most advanced media accelerators available. It has built-in hardware encoding and decoding for the industry's common formats, including H.265/HEVC, H.264/MPEG-4 AVC, and VP9, plus hardware-accelerated AV1 encoding and decoding. In hardware AV1 encoding in particular, Intel is at the forefront.

AV1 is 50% more efficient than H.264, the most common codec, and 20% more efficient than HEVC, enabling video creators to deliver higher picture quality at lower bandwidth and with smaller files. Hardware-accelerated AV1 encoding on Arc graphics is up to 50 times faster than traditional software encoding. FFmpeg, HandBrake, Adobe, and XSplit have already integrated support for Arc's AV1 encoding.

Xe Display Engine: High output specifications, and Speed Sync tackles screen tearing

For display output, Xe-HPG supports HDMI 2.0b and DisplayPort 1.4a, which means gamers can run 1080p at 360Hz, or drive four displays at 4K 120Hz with HDR. This is in line with current mainstream NVIDIA and AMD graphics cards.
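
A quick way to see what these modes demand is to multiply out the pixel data rate. The sketch below counts active pixels only (no blanking, no compression), and the bit depths (8 bits per channel for 1080p, 10 bits per channel for 4K HDR) are assumptions for illustration. It compares the results with the DisplayPort 1.4 HBR3 payload of 25.92 Gbit/s (four lanes at 8.1 Gbit/s with 8b/10b coding).

```python
def active_data_rate_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Uncompressed pixel data rate in Gbit/s, counting active pixels only (no blanking)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# DisplayPort 1.4 HBR3: 4 lanes x 8.1 Gbit/s, 80% usable after 8b/10b encoding.
HBR3_PAYLOAD_GBPS = 4 * 8.1 * 0.8

modes = {
    "1080p @ 360 Hz, 8-bit RGB": (1920, 1080, 360, 24),
    "4K @ 120 Hz, 10-bit HDR RGB": (3840, 2160, 120, 30),
}
for name, mode in modes.items():
    rate = active_data_rate_gbps(*mode)
    print(f"{name}: {rate:.2f} Gbit/s, fits HBR3 uncompressed: {rate <= HBR3_PAYLOAD_GBPS}")
```

By this estimate, 1080p at 360Hz (about 17.92 Gbit/s) fits within one HBR3 link, while 4K 120Hz at 10 bits per channel (about 29.86 Gbit/s) does not. Modes like that are normally carried with Display Stream Compression, which the DisplayPort 1.4 specification includes, or with chroma subsampling.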

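Also worth mentioning is Speed Sync in the Xe display engine. As the name suggests, it targets screen tearing, which occurs when the graphics card's output is not synchronized with the display's refresh. Rather than varying the monitor's refresh rate as AMD FreeSync and NVIDIA G-Sync do, Speed Sync lets the GPU render freely and shows the most recently completed frame at each refresh, an approach closer to NVIDIA Fast Sync or AMD Enhanced Sync.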

Besides Speed Sync, Arc also brings another new Intel technology, Smooth Sync. It applies a dither filter to blur the tear line, so tearing is less visible and the image looks more coherent, which makes the game more immersive. Intel says all Arc graphics cards will support Smooth Sync.

Alchemist product preview

Intel's current generation of Arc A-series graphics, code-named Alchemist, uses two different chip designs. The larger chip, ACM-G10, shown on the left of Intel's figure, contains 32 Xe cores and ray tracing units, 16MB of L2 cache, a 256-bit GDDR6 interface, and 16 PCIe 4.0 lanes. The smaller chip, ACM-G11, on the right, contains 8 Xe cores and ray tracing units, 4MB of L2 cache, a 96-bit memory interface, and 8 PCIe 4.0 lanes. Both chips include two Xe multi-format codec engines and a 4-channel display output engine.
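
The article gives the memory interface widths but not the memory speed. Peak bandwidth is the bus width in bytes times the per-pin data rate; the sketch below uses 16 Gbps per pin purely as a hypothetical GDDR6 rate to show the arithmetic, and also derives the render slice count from the four-Xe-cores-per-slice rule. The `Chip` type and constant names are illustrative.

```python
from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    xe_cores: int
    bus_width_bits: int


CHIPS = [Chip("ACM-G10", 32, 256), Chip("ACM-G11", 8, 96)]
ASSUMED_GBPS_PER_PIN = 16.0  # hypothetical GDDR6 data rate, not stated in the article


def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width / 8 bits per byte) x per-pin rate in Gbit/s."""
    return bus_width_bits / 8 * gbps_per_pin


for chip in CHIPS:
    slices = chip.xe_cores // 4  # four Xe cores per render slice
    bw = peak_bandwidth_gbs(chip.bus_width_bits, ASSUMED_GBPS_PER_PIN)
    print(f"{chip.name}: {slices} render slices, {bw:.0f} GB/s at {ASSUMED_GBPS_PER_PIN} Gbps/pin")
```

Products built on these chips can use a narrower bus or a different memory speed, so substitute the real figures when they are known.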

On the product side, Intel Arc A-series mobile discrete graphics span everything from low-power mainstream graphics for thin-and-light notebooks to high-performance graphics for gaming laptops.

As the figure shows, the Arc mobile graphics Intel released this time comprise the A350M and A370M in the Arc 3 series, the A550M in the Arc 5 series, and the A730M and A770M in the Arc 7 series.

Both Arc 3 products are based on the 8-Xe-core chip, come with 4GB of GDDR6 memory, and have TDPs from 25W up to 50W. The A550M has roughly twice the A370M's Xe core count and memory bus width, so it is presumably a cut-down version of the 32-Xe-core chip. Finally, the Arc 7 series also has two products, the A730M and A770M, with up to 32 Xe cores, 16GB of memory, and a 256-bit bus. As for availability: laptops with the Arc 3 series A350M and A370M are reaching consumers now, while products with Arc 5 and Arc 7 graphics will arrive this summer. The entire Arc A-series mobile lineup supports DirectX 12 Ultimate, including ray tracing, variable rate shading, mesh shading, and sampler feedback.

▲ According to Intel's data, at 1080p with medium or high quality settings, the A370M delivers up to twice the performance of the integrated graphics in 12th-generation Core mobile processors. These games also run at 1080p 60fps, which makes for a good experience.

The A370M can reach 90fps in most competitive games at 1080p, measured at medium or high quality settings.

▲ Compared with 12th-generation Core integrated graphics, design and content-creation performance is also significantly higher on platforms with the A370M discrete GPU. For video transcoding, taking DaVinci Resolve as an example, 4K H.264 to H.265 conversion is up to 60% faster. For AI features, such as the two application scenarios shown in Adobe Premiere Pro, performance roughly doubles.

Deep Link: The secret behind the performance gains

The large content-creation gains of the Arc A-series shown above do not come from the discrete GPU alone; they also benefit from Deep Link, a technology that works across the whole system.

What is Deep Link? It is an umbrella term for several technologies, including Dynamic Power Share, Hyper Encode, and Hyper Compute.

Let's start with Dynamic Power Share, which maximizes CPU or GPU performance within the system's power limit. Intel introduced the first version of dynamic power sharing back in 2016, dynamically distributing power between the CPU and GPU. In general, when a workload needs more CPU power, the system allocates more power to the CPU, and likewise for the GPU. The goal is better application performance for a fixed total power budget.

As the figure shows, when the system detects a heavy GPU load, it adjusts the GPU/CPU power ratio to give more power to the GPU. When the GPU load is light, as in everyday office work, it shifts power toward the CPU. Once a good balance is reached, the current ratio is maintained. All laptops with 12th-generation Core processors and Arc graphics can enable this technology.
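
The control loop described here can be sketched as a simple rebalancing step. This is a hypothetical illustration, not Intel's algorithm: the total budget stays fixed, power moves in fixed steps toward the busier component, a deadband keeps the ratio unchanged when loads are balanced, and a floor stops either side from being starved. All wattages and thresholds are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerSplit:
    cpu_w: float
    gpu_w: float


def rebalance(split: PowerSplit, cpu_util: float, gpu_util: float,
              step_w: float = 5.0, floor_w: float = 10.0, deadband: float = 0.1) -> PowerSplit:
    """Move step_w watts toward the busier of CPU and GPU while keeping the total constant.

    Utilizations are fractions in [0, 1]. If they differ by no more than `deadband`,
    the current split is kept, mirroring "maintain the ratio once balanced".
    """
    total = split.cpu_w + split.gpu_w
    diff = gpu_util - cpu_util
    if abs(diff) <= deadband:
        return split
    shift = step_w if diff > 0 else -step_w
    gpu_w = min(max(split.gpu_w + shift, floor_w), total - floor_w)
    return PowerSplit(cpu_w=total - gpu_w, gpu_w=gpu_w)


split = PowerSplit(cpu_w=45.0, gpu_w=45.0)  # hypothetical 90 W shared budget
for _ in range(3):                           # heavy game: GPU busy, CPU lightly loaded
    split = rebalance(split, cpu_util=0.4, gpu_util=0.95)
print(split)  # PowerSplit(cpu_w=30.0, gpu_w=60.0)
```

A real controller would also weigh thermals and platform limits; the fixed step and deadband here just keep the example easy to follow.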

Next is Hyper Encode, which uses the media engines of both graphics devices (integrated and discrete) at the same time to greatly improve encoding throughput. This cooperation is coordinated through the oneVPL API. oneVPL is an open, cross-platform framework through which applications can discover and invoke the multiple media engines on a platform, making full use of its video-processing capability. When Hyper Encode is active, groups of decoded raw frames are handed to oneVPL through specific API functions; oneVPL assigns the groups to different media engines and copies them into the corresponding memory. However many frames each group has, the integrated or discrete media engine encodes it in the configured format. oneVPL then handles the packaging, stitching the encoded frames into the final output video. This parallel processing significantly raises encoding throughput compared with a single graphics device.
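
The split, encode, and stitch flow can be illustrated without any real encoder. In the sketch below, `fake_encode` is a hypothetical stand-in for a hardware media engine, and the engine names, group size, and round-robin assignment are assumptions; it does not use or imitate the actual oneVPL API. What it shows is the key property: groups are encoded in parallel, yet the output is reassembled in the original frame order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_groups(frames: Sequence[int], group_size: int) -> List[List[int]]:
    """Cut the decoded frame sequence into consecutive groups of group_size frames."""
    return [list(frames[i:i + group_size]) for i in range(0, len(frames), group_size)]


def fake_encode(engine: str, group: List[int]) -> List[str]:
    """Hypothetical stand-in for a media engine: tags each frame with the engine that 'encoded' it."""
    return [f"{engine}:f{frame}" for frame in group]


def parallel_encode(frames: Sequence[int], group_size: int = 4,
                    engines: Sequence[str] = ("igpu", "dgpu")) -> List[str]:
    groups = split_into_groups(frames, group_size)
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Round-robin: group 0 -> first engine, group 1 -> second engine, and so on.
        futures = [pool.submit(fake_encode, engines[i % len(engines)], g)
                   for i, g in enumerate(groups)]
        # Collecting results in submission order keeps the stitched stream in frame order,
        # no matter which engine finishes first.
        encoded = [f.result() for f in futures]
    return [packet for group in encoded for packet in group]


print(parallel_encode(range(8)))
# ['igpu:f0', 'igpu:f1', 'igpu:f2', 'igpu:f3', 'dgpu:f4', 'dgpu:f5', 'dgpu:f6', 'dgpu:f7']
```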

Finally, Hyper Compute. Every laptop with Intel Arc discrete graphics can draw on the discrete GPU's compute power, but the integrated graphics in Intel CPUs also provide a compute engine. To distribute work sensibly across these engines, Intel designed MLS, a machine-learning-based service.

MLS is a framework in OpenVINO that intelligently distributes workloads across compute blocks. Based on the characteristics of the current application or workload, such as latency sensitivity, throughput, performance requirements, and power consumption, MLS decides whether to send work to the discrete GPU, the integrated GPU, or the CPU. When processing video, for example for denoising, super-resolution, or sharpening, the incoming video is passed to MLS frame by frame, and each frame is broken into blocks that are placed in a work queue. MLS starts worker threads that automatically assign these blocks to different compute engines as needed. As shown in the figure, some blocks go to the integrated GPU's compute engine and some to the discrete GPU's matrix engine. As each GPU finishes its current task, MLS dispatches new ones, until all blocks have been processed and the enhanced frames are packaged as output.
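
The block-and-queue scheme can be sketched with Python threads. This is a hypothetical model, not the OpenVINO or MLS API: a frame is a list of pixel rows, `enhance` is a placeholder for the AI filter, and each named device gets one worker thread that keeps pulling blocks from a shared queue until it is empty, so a faster device naturally takes more blocks. Results are stitched back by block index.

```python
import queue
import threading
from typing import Dict, List, Sequence, Tuple

Frame = List[List[int]]  # a frame as rows of 8-bit pixel values


def split_frame(frame: Frame, block_rows: int) -> List[Tuple[int, Frame]]:
    """Cut a frame into horizontal blocks, each tagged with its position index."""
    return [(i, frame[r:r + block_rows])
            for i, r in enumerate(range(0, len(frame), block_rows))]


def enhance(block: Frame) -> Frame:
    """Placeholder for an AI filter (denoise, super-resolution, ...): brightens and clamps."""
    return [[min(p + 10, 255) for p in row] for row in block]


def process_frame(frame: Frame, block_rows: int,
                  devices: Sequence[str]) -> Tuple[Frame, Dict[str, int]]:
    work: "queue.Queue[Tuple[int, Frame]]" = queue.Queue()
    for item in split_frame(frame, block_rows):
        work.put(item)

    results: Dict[int, Frame] = {}
    blocks_done = {d: 0 for d in devices}
    lock = threading.Lock()

    def worker(device: str) -> None:
        # Each device keeps taking blocks until the queue is drained.
        while True:
            try:
                idx, block = work.get_nowait()
            except queue.Empty:
                return
            out = enhance(block)
            with lock:
                results[idx] = out
                blocks_done[device] += 1

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Stitch the enhanced blocks back together in their original order.
    enhanced = [row for idx in sorted(results) for row in results[idx]]
    return enhanced, blocks_done


frame = [[0, 100, 250] for _ in range(6)]
out, done = process_frame(frame, block_rows=2, devices=["igpu", "dgpu_xmx"])
print(out[0], sum(done.values()))  # [10, 110, 255] 3
```

A production scheduler would also weigh the latency, throughput, and power factors listed above when choosing a device; this sketch only shows the shared-queue dispatch.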

Arc Control: A feature-rich driver panel

Alongside the Arc mobile discrete graphics, Intel also released a new graphics driver control interface called Arc Control, the Intel Arc control panel. Arc Control is a one-stop place for settings and information related to Arc graphics: users can quickly update drivers, monitor graphics workloads in real time, configure a virtual camera, automatically capture game highlights, and use streaming settings that help anyone become a professional streamer. Its functionality is similar to NVIDIA's GeForce Experience (GFE) and AMD's Radeon Software.

The Arc Control dashboard uses an advanced overlay that runs independently of the operating system, so it is unlikely to consume much processor capacity, hurt overall performance, or interrupt the task the user is working on. Users can open and close Arc Control at any time and interact with it through hotkeys, which is very convenient. The dashboard also provides easy installation and automatic updates: users are notified whenever a new game launches or a new driver is released for a game, and those who do not want these notifications can adjust them to suit their needs.

Arc Control also provides performance monitoring, presenting specific metrics and charts for reference. These give users a complete picture of the GPU workload so they can make adjustments as needed. A monitoring overlay can also float over the game, so users can see the status of the whole system while playing.

Arc Control also offers convenient live-streaming features and settings, so users can quickly start a stream and share their gameplay on a streaming platform. With hotkeys, users can quickly turn on the virtual camera with background removal and automatic framing, and capture and save highlight moments in the game. At launch, about 10 games support capturing highlight screenshots or videos.

Note that Arc Control works with all Intel graphics, not only Intel's discrete GPUs but its integrated graphics as well.

Arc is coming

At this briefing, Intel said the world's first laptop with Arc graphics is the Samsung Galaxy Book2 Pro with the Arc A350M, which is already on pre-sale in some regions but not yet available in China. Intel's product plan ranges from ultra-thin notebooks to high-performance gaming laptops. Laptops with Arc 3 graphics are expected to reach China starting in the second quarter, and Arc 5 and Arc 7 products are expected this summer. Prices start at $899. Early adopters, stay tuned!
