
A more advanced process + better energy efficiency + an AI boost - a detailed look at Meteor Lake's four tiles and AI functions

author:Trendy electronics

After writing my first interpretation of Meteor Lake, my friends and I both felt that a deeper look was needed to fully explain the most complex "four-in-one" architecture in PC history. I remember an animated movie from my childhood in which four robots of different forms combined into one larger robot, and its combat strength came from how well the parts were combined.

Meteor Lake is the same. Although it is divided into four functional modules (GPU Tile, SoC Tile, IO Tile, and Compute Tile), each IP block still has to be placed where it fits best, in line with this generation's design goals of high performance per watt and smooth communication between blocks. This is the real core competitiveness of Meteor Lake. So let's look at how Intel assembles the building blocks, one functional module at a time.


01 Intel 4 process technology determines the performance of the Compute Tile

For a detailed interpretation of each tile, let's start with the Intel 4 manufacturing process, which determines the performance of the Compute Tile. We have discussed this process, a critical node in Intel's evolution, several times before. At this Intel ON Technology Innovation Summit, Intel shared some qualitative information about its performance. The final quantitative data will have to wait until the official release of the Core Ultra processors, but what we have is already very exciting.


Intel describes the improvement of the Intel 4 process using the area of its high-performance logic library as the indicator: compared with Intel 7, area shrinks by a factor of 2, and performance per watt improves by more than 20%. Besides the upgrade of the manufacturing equipment (EUV), the intuitive result is that the die has become smaller. The new 8 VTs also allow frequency and voltage to be matched better, which makes the underlying power delivery more efficient.


Intel's process nodes are often discussed in terms of which nodes from other manufacturers they are equivalent to, and one of the key indicators for that comparison is transistor density. As far as I can remember, Intel has had a higher transistor density than anyone else since Cannon Lake. However, semiconductor manufacturers use different cell libraries, so even processes of the same generation are hard to compare directly by transistor density, and the figure cannot fully reflect a manufacturer's process level. For example, transistors are not evenly distributed across a die. In semiconductor manufacturing, transistor density is therefore used more as a reference quantity.

Within a single manufacturer's product line, however, the higher transistor density that comes from a shorter cell library does mean more performance on the die. Comparing Intel 4 with Intel 7, Intel announced that the high-performance library height was reduced from 408 to 240, which shrinks the die area to 0.59X (a smaller die area also means a higher transistor density). Of course, for FinFETs, increasing the fin height or decreasing the fin pitch effectively increases drive current, and the reductions in contacted gate pitch and M0 pitch also shrink the die area. In fact, the halving of die area is the product of the 0.59X from the library height and the 0.83X from the gate pitch.
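
The scaling arithmetic above can be checked in a few lines. This is only a sketch of the calculation: it uses the library heights 408 and 240 and the 0.83X gate pitch factor quoted in the text, and it treats 408 as the Intel 7 height and 240 as the Intel 4 height, which is the assignment that reproduces the quoted 0.59X. The variable names are illustrative.

```python
# Figures quoted in the text; the variable names are illustrative only.
LIB_HEIGHT_INTEL7 = 408   # high-performance library height, Intel 7
LIB_HEIGHT_INTEL4 = 240   # high-performance library height, Intel 4
GATE_PITCH_SCALE = 0.83   # area factor attributed to the gate pitch reduction

# Area factor from the shorter library alone.
height_scale = LIB_HEIGHT_INTEL4 / LIB_HEIGHT_INTEL7

# The two factors multiply to give the overall area scaling.
area_scale = height_scale * GATE_PITCH_SCALE

# A smaller area for the same logic implies a proportionally higher density.
density_gain = 1 / area_scale

print(f"library height scale: {height_scale:.2f}x")
print(f"combined area scale:  {area_scale:.2f}x")
print(f"implied density gain: {density_gain:.2f}x")
```

The combined factor comes out just under one half, which is where the "2 times" area reduction for the high-performance library comes from.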


Process technology advances in step with lithography equipment. As mentioned above, this is the first time EUV lithography is used for an Intel PC processor, and it brings finer results and higher process efficiency to the whole Intel 4 flow. In addition, according to Intel, the world's first High-NA EUV tool (0.55 numerical aperture) will also be installed at Intel. This means that EUV in its current form will serve only two nodes of Intel's PC processor roadmap before production moves to a new line, and that is the real difficulty of Intel's plan of five process nodes in four years.

Returning to the Intel 4 process: it is the introduction of EUV lithography, with its quadruple exposure process, that optimizes the 18-layer metal stack of the interconnect, which contains 13 copper interconnect layers and 5 enhanced copper layers. The densest enhanced copper layer reaches a 30nm metal pitch, a large improvement in both layer count and density.


Intel's changes to the contact materials in the interconnect layers also deserve a mention. Throughout its process development, Intel has kept optimizing these materials to improve electron mobility or, put simply, to reduce resistance. Before Intel 7, the interconnect layers used tungsten. Intel 7 uses two different special metal layers (a tantalum barrier layer with cobalt wiring, and a nitride with copper alloy) to achieve lower resistance and longer lifetime, but it was hard to balance lifetime against electron mobility with these two materials. Intel 4 therefore pushes the use of new materials further: the dense enhanced copper layers use tantalum/cobalt with pure copper, combining long lifetime with high electron mobility.


The biggest contribution of EUV is to the manufacturing flow itself: it enables more precise patterning with a more streamlined process, which is also the basis for higher transistor density. With EUV lithography, Intel processes a layer in a single EUV pass instead of the earlier sequence of repeated lithography and polishing steps, so both the total number of masks and the total number of process steps are greatly reduced. It is also worth mentioning that with the finer EUV patterning, the interconnect structures inside the chip are more regular and the earlier non-standard structures are gone, which makes APR (automatic place and route) simple and efficient.


These improvements are what keep the yield of Intel 4 very high from the start, unlike the earlier 14nm and 10nm processes, which needed a second generation of products before yield became good. The experience accumulated on Intel 4 will also lay a solid foundation for the future Intel 20A and Intel 18A.

02 Building a new SoC Tile is a leap forward in energy efficiency

Although the SoC Tile appears for the first time in Meteor Lake, there is no need to treat it as something mysterious. Before the SoC Tile existed, Intel grouped IP that is not compute-intensive, such as the Wi-Fi module, the display output unit, and the memory controller, under the Uncore category (as opposed to the compute-intensive Core). The main reason for creating an SoC Tile is to achieve better energy efficiency, so this functional module had a clear goal from the very start of its design.


In the previous Meteor Lake article, we said that the SoC Tile can itself be regarded as a small CPU, but that is not entirely accurate. The name SoC is still taken from System On Chiplet, but unlike a standalone SoC it is mostly a collection of functional IP packaged together to improve the energy efficiency of the whole CPU. I divide its IP into three categories: newly added blocks, such as the NPU and LP E-Core; the functional IP that used to sit in the Uncore category; and IP migrated from other functional modules. The IP composition of the SoC Tile is therefore quite complex: NPU, LP E-Core, memory controller, system agent, wireless controller, IO cache module, power management module, image processing module, display output module, and more. Everyone knows that in a meeting full of people, if you want to talk to one person, the sensible approach is to speak with them directly (point-to-point communication) rather than to walk up and grab the microphone (the ring bus) and disturb everyone.


To learn more about the SoC Tile, let's first go through its four design principles:

1. Compute-intensive IP has been repartitioned to optimize its power and greatly improve energy efficiency without compromising performance.

2. I/O bandwidth has been expanded so that the main IP blocks inside the SoC Tile can keep pace with larger system memory.

3. A very low-power core has been introduced among the cores of the SoC.

4. Some power management algorithms have been reorganized.

In a word: replan the existing IP and introduce new IP, rebuild the buses and IO channels, make hardware resource scheduling independent, and standardize the Uncore modules.


Next, let's look at how these design principles are applied in the SoC Tile and how they achieve the ultimate goal of better energy efficiency. Start with the previous generation of hybrid-architecture chips: the Graphics Complex (graphics core) is attached to the Core Complex (the performance cores and efficiency cores together), the two share a Ring Fabric (ring bus), and the media codec sits inside the graphics core.


Therefore, whether a performance core, an efficiency core, the graphics core, or the media codec wants to access memory, the request has to travel along the same path: ring bus, System Agent, memory controller. Access efficiency is very high, but pulling one thread moves the whole body. Even if I only want the media codec to play a video, all the logic units have to be activated and the entire ring bus has to be powered up. This is the situation described earlier as "grabbing the microphone in a meeting to talk to one person": the microphone does let that person hear you clearly, but it also distracts everyone else. From the point of view of energy efficiency, this is very uneconomical.


You might say: then take the media codec out of the graphics core. That is exactly what Intel does. In addition, the system agent and the memory controller are taken off the ring bus, and all of them are placed on the new SoC bus. Now everyone is part of one family, and whoever needs something simply requests it over the SoC bus. The media codec and the GPU Tile can access the memory controller directly, the Compute Tile can also request resources independently, and functional modules that are not in use can stay powered down.
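
The difference between the two layouts can be sketched as a small table of which blocks must be awake for the media codec to reach memory. This is a toy model built only from the description above, not from Intel documentation; the block names and the `blocks_awake` helper are hypothetical.

```python
# Blocks that must be powered for a requester to reach memory.
# Both tables simplify the prose above; they are not Intel documentation.
RING_BUS_LAYOUT = {
    # Media codec lives inside the graphics core and shares the ring bus
    # with the CPU cores, so everything on the ring has to wake up.
    "media_codec": ["graphics_core", "cpu_cores", "ring_bus",
                    "system_agent", "memory_controller"],
}
SOC_FABRIC_LAYOUT = {
    # Media codec sits on the SoC bus next to the memory controller.
    "media_codec": ["soc_fabric", "memory_controller"],
}

def blocks_awake(layout, requester):
    """Return the requester plus every block on its path to memory."""
    return [requester] + layout[requester]

old = blocks_awake(RING_BUS_LAYOUT, "media_codec")
new = blocks_awake(SOC_FABRIC_LAYOUT, "media_codec")
print("ring bus layout:  ", old)
print("SoC fabric layout:", new)
```

In this model the CPU cores and graphics core drop out of the video playback path entirely, which is the energy saving the paragraph describes.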

After more than an hour of explanation, everyone should already be familiar with the layout: our Xe-LPG graphics sit on the Graphics Tile. At the same time, both the media engine and the display engine have moved to the SoC Tile, while the Display PHYs on the IO Tile are responsible for driving the output signal. We have upgraded the Meteor Lake media engine to support up to 8K 60 10-bit HDR decoding and 8K 30 10-bit HDR encoding. We support a variety of advanced and legacy formats, including VP9, AVC, HEVC, and AV1. Usage scenarios differ between tasks and between users. Whether you are doing video playback or streaming, basic or advanced video editing, gaming, productivity, or AI, the Intel Media Engine provides very good support.

Next, let's talk about Intel's Display Engine. We did a few key things. The first is further optimization of the display pipeline and of display power. The second is that the display engine can apply compression along the full path. When the display output and the display resolution do not match, this compression still delivers the output well while keeping power consumption under control. Several modes, including the low-power ones, reduce the load on the CPU, memory, and graphics at the same time, and so reduce power consumption.

As far as standards are concerned, we support the HDMI 2.1, DP 2.1, and full eDP 1.4 output specifications, with resolutions up to one 8K60 HDR display or four 4K60 HDR displays, or 1080p and 1440p at higher refresh rates.
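
The "one 8K60 or four 4K60" limit is easier to see as a raw pixel data rate. The sketch below is a back-of-the-envelope calculation, not a link budget: it assumes 8K means 7680x4320 and 4K means 3840x2160, three 10-bit color channels with no chroma subsampling, and it ignores blanking intervals and link encoding overhead. The function name `raw_gbps` is made up for this example.

```python
def raw_gbps(width, height, fps, bits_per_channel=10, channels=3, streams=1):
    """Uncompressed pixel data rate in Gbit/s (no blanking, no link overhead)."""
    bits_per_second = width * height * fps * bits_per_channel * channels * streams
    return bits_per_second / 1e9

one_8k60 = raw_gbps(7680, 4320, 60)              # a single 8K60 10-bit stream
four_4k60 = raw_gbps(3840, 2160, 60, streams=4)  # four 4K60 10-bit streams

print(f"one 8K60:  {one_8k60:.1f} Gbit/s")
print(f"four 4K60: {four_4k60:.1f} Gbit/s")
```

Under these assumptions the two configurations carry exactly the same number of pixels per second, which is why they appear as alternative maximums.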


Next, it's time to talk about I/O and bandwidth. The 12th and 13th generation Core processors used the same ring bus, and the way to relieve bandwidth bottlenecks and reduce latency was to give some commonly used IP higher priority. On Meteor Lake, however, the Uncore IP gathered in the new SoC Tile, and the new IP in particular, needs very high bandwidth, so continuing with the original bandwidth scheme would constantly cause congestion.


Bus bandwidth was insufficient and the ring bus was an uneconomical way to communicate, so the most direct solution is to build a new bus with more bandwidth. Intel calls it the NOC bus, and it offers 128GB/s. Second, it improves the energy efficiency of each IP block's memory accesses: the bandwidth needs of the IP inside the SoC are matched in real time, which resolves congestion between IP blocks and between IP and the bus. Third, it lets the IP attached to the bus communicate independently, which is why Intel's engineers also call it a "scalable fabric". In my opinion it closely resembles distributed communication and could be called a "distributed scalable bus", although this is of course not the official name of the NOC bus.


In addition, to remove some communication bottlenecks between I/O blocks, Intel added a second bus to the SoC Tile, the IO Fabric, and placed an I/O cache block between the two buses to manage I/O ordering and address translation. I will save this new bus for the detailed interpretation of the IO Tile.


Next comes a very important improvement of Meteor Lake's disaggregated hybrid architecture, which is also located in the SoC Tile. In the past, using IP resources in the Uncore required the CPU's compute units to control and coordinate them, so all compute units had to be fully powered and active, which greatly hurt energy efficiency.


Intel's solution is to add a very low-power island to the SoC Tile. It contains two cores, the low-power island efficiency cores (LP E-Cores), which form the third tier of the three-tier computing cores. To watch a video, only these cores are needed to drive the media IP, while the Compute Tile and other IP can rest, which reduces overall power consumption.


Interestingly, although the cores sit in different tiles, Intel has fully unified how the system calls the three kinds of compute units, and the Windows Task Manager even shows the utilization of each of the three tiers of cores.


With the processor divided into four functional modules (tiles), power management has also been redesigned. Each tile integrates a dedicated power management controller (PMC), and an overall management unit (PUNIT) sits on the SoC Tile. Together they form a real-time, scalable power management architecture. It is this architecture that allows the power supply of each tile, and of the individual IP blocks in the SoC, to be separated, so that power can be saved and controlled on demand.


In terms of reducing power consumption, Intel has also made some additional optimizations:

1. Integrated voltage regulators (DLVRs) for finer voltage control.

2. Dynamic bus frequency adjustment, lowering the frequency in real time according to IP demand to save total power.

3. Active tuning of software and hardware for different workloads.

To summarize: the SoC Tile is a new module that integrates a variety of Uncore IP and newly added functional IP. It reworks the bus and power architecture, adds a third tier of computing cores, and integrates the AI functions, all with the goal of better energy efficiency. It is currently Intel's most efficient design for the non-compute (Uncore) part of the chip, and it will profoundly affect the architecture of future generations of CPUs.

03Flexible and efficient IO Tile design, with dual-bus architecture to adapt to different IP and expansion requirements

Having read the interpretation of the SoC Tile, you should be able to see that Meteor Lake aims to be a scalable architecture, so each tile is meant to be flexible enough to optimize for and solve its own set of problems, and the IO Tile is no exception.


The figure makes it clearer that the IP blocks attached to the NOC bus are the ones that need high bandwidth and fast response, so that they can access the whole of memory quickly and with low power consumption.


The SoC's internal sensing, the IO Tile, and the output interfaces such as Thunderbolt, PCIe, WiFi, and USB 4 are all strung together by a dedicated high-speed bus, the IO Fabric. Two further IP blocks are responsible for security at different layers: SSE, Meteor Lake's new chip-level security engine, and CSME, the platform-level reliability and manageability security module. In addition, the IO Tile provides USB 4 and PCIe outputs. Since it hangs directly off the IO Fabric, I specifically confirmed with Intel that its performance and responsiveness are the same as for IP attached directly to the IO Fabric.

04 Optimizing how resources are called with the new three-tier computing cores


As mentioned earlier, Intel has unified how the system calls the three kinds of compute units. This is a very important piece of execution logic in the disaggregated architecture, so it is worth a closer look. Compared with the earlier two-tier hybrid architecture, the hardware thread scheduler needs further optimization so that the "low-power island" is included in sensible task allocation.


The new framework assigns tasks to compute units mainly according to power consumption, required performance, and response speed. How exactly a task moves between different cores requires a more sophisticated underlying strategy from the hardware thread scheduler. Note that, like the previous hardware thread scheduler, it does not assign processes to a specific core itself. Instead, it reports the real-time hardware capabilities of the P-Cores, E-Cores, and LP E-Cores to the operating system as recommendations.

Specifically, threads are divided into four classes, Class 0 to Class 3, according to how many instructions they can execute per clock, which reflects the number of instructions a P-Core or E-Core would execute for the given work. The weighting between E and Perf (E stands for the pursuit of energy efficiency, Perf for the pursuit of performance) then determines which range the thread falls into, and therefore which core is the right choice. The mechanism is somewhat complicated, but in a word: get the right thread running on the right core at the right time.
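
The class-and-weight idea can be illustrated with a small lookup. This is only a sketch of the concept described above, not Intel's implementation: the capability scores in `CAPS` are invented numbers, and `recommend` with its single `perf_weight` parameter is a hypothetical stand-in for how an operating system might combine the Perf and E hints.

```python
# Hypothetical capability scores per core type, indexed by class 0..3.
# "perf" = performance hint, "eff" = energy-efficiency hint. All values invented.
CAPS = {
    "P-Core":    {"perf": [200, 230, 250, 150], "eff": [80, 85, 90, 60]},
    "E-Core":    {"perf": [150, 155, 140, 130], "eff": [160, 160, 150, 140]},
    "LP E-Core": {"perf": [40, 40, 35, 35],     "eff": [200, 200, 190, 180]},
}

def recommend(thread_class, perf_weight):
    """Pick the core type with the best weighted score for this class.

    perf_weight = 1.0 means "performance only", 0.0 means "efficiency only".
    """
    eff_weight = 1.0 - perf_weight

    def score(core):
        caps = CAPS[core]
        return (perf_weight * caps["perf"][thread_class]
                + eff_weight * caps["eff"][thread_class])

    return max(CAPS, key=score)

print(recommend(0, 1.0))   # performance matters most
print(recommend(0, 0.5))   # balanced
print(recommend(0, 0.0))   # efficiency matters most
```

With these made-up scores, the same Class 0 thread lands on a P-Core, an E-Core, or an LP E-Core depending only on the weighting, which is the behavior the paragraph describes.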


Meteor Lake mainly improves the feedback given to the OS. When an IP block draws power, the core power budget is dynamically redistributed, and the capabilities of the cores as a whole and of each individual core are reported more accurately. For example, suppose a high-performance foreground task is assigned to 4 P-Cores and two more processes are then placed on the E-Cores. If the P-Core task finishes while the two smaller processes are still running on the E-Cores, the hardware scheduler advises the OS to move the two processes to the LP E-Cores of the SoC, so that the entire Compute Tile can be shut down. The LP E-Core therefore not only coordinates the other IP blocks but also takes part in the overall computing process.
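
The consolidation example above can be written as a simple rule. This is a toy sketch under stated assumptions, not the real scheduler: each thread is tagged by hand as light or heavy, the rule only fires when every thread left on the Compute Tile is light, and `consolidation_hint` is a hypothetical helper name.

```python
COMPUTE_TILE_CORES = {"P-Core", "E-Core"}

def consolidation_hint(threads):
    """Suggest moves that would let the Compute Tile power down.

    threads maps a thread name to (core_type, is_light).
    Returns {thread_name: "LP E-Core"} for each suggested move, or {} if
    the Compute Tile is still needed (or already empty).
    """
    on_compute = {name: info for name, info in threads.items()
                  if info[0] in COMPUTE_TILE_CORES}
    # Only hint when everything left on the Compute Tile is light work.
    if on_compute and all(is_light for _, is_light in on_compute.values()):
        return {name: "LP E-Core" for name in on_compute}
    return {}

# Foreground task still running on the P-Cores: no hint.
busy = {"render": ("P-Core", False),
        "sync": ("E-Core", True),
        "indexer": ("E-Core", True)}
print(consolidation_hint(busy))

# Foreground task finished: the two light threads can move.
idle = {"sync": ("E-Core", True), "indexer": ("E-Core", True)}
print(consolidation_hint(idle))
```

As in the text, the result is only a recommendation; the operating system remains the one that actually moves the threads.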


Having covered how the three-tier computing cores are called, let's go on to how the new AI functions are called. As we said before, the NPU is a low-power AI acceleration engine; the CPU is the compute unit with very fast response, suited to sporadic AI requests that need a quick answer; and the GPU is suited to large-scale AI workloads. An AI task therefore engages different processing units on the processor depending on its nature.
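
The division of labor described above can be sketched as a simple selection function. This is an illustration only: the two flags and the order in which they are checked are assumptions made for the example, and `pick_engine` is not part of any Intel software.

```python
def pick_engine(needs_fast_response=False, large_scale=False):
    """Choose an AI engine following the roles described in the text.

    The priority order (scale first, then latency, else low power)
    is an assumption made for this sketch.
    """
    if large_scale:
        return "GPU"   # large-scale AI workloads
    if needs_fast_response:
        return "CPU"   # sporadic requests that need a quick answer
    return "NPU"       # default: low-power AI acceleration

print(pick_engine())                          # background AI task
print(pick_engine(needs_fast_response=True))  # short, latency-sensitive task
print(pick_engine(large_scale=True))          # heavy workload
```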

05 GPU cores with the same architecture as Arc graphics cards achieve a 2x performance improvement


Moving the media IP and display IP out of the GPU Tile leaves more die area for the GPU itself. Intel's figures show that Xe delivers a 2x performance improvement over the previous generation.


The new media engine and display engine, now separated from the GPU, work together with the Display PHYs output unit on the IO Tile to form a more efficient video output path.


The two engines have also been upgraded. The media IP supports up to 8K 60 10-bit HDR decoding and 8K 30 10-bit HDR encoding, and handles a wide range of formats including VP9, AVC, HEVC, and AV1.


The display IP supports the HDMI 2.1, DP 2.1 (20G), and eDP 1.4 output interface specifications.


Intel also showed the internal structure of the new GPU Tile for the first time. It uses the same architecture as the Arc graphics cards, with 8 Xe cores and 128 geometry rendering pipelines distributed across two Render Slices, 1.33 times the pixel and sampler throughput, and 8 new hardware ray tracing units. In terms of overall performance, it reaches higher frequencies at lower voltage than the previous-generation GPU.


The main technical features built up by the Arc graphics cards have also been inherited, such as better optimization for DX12 Ultimate and support for ray tracing and XeSS. An out-of-order sampling function has also been added to further improve the accuracy of data sampling.


In the Blender rendering test, the GPU was more than 2x faster than the CPU. A performance comparison with competing products will have to wait until the official release of Meteor Lake or our own review.

Intel also presented its thinking on AI in client devices such as PCs, and how that thinking evolves with Meteor Lake. In fact, the hardware architecture discussed above already settles how the various AI applications are distributed and run on the new processors, including the evolution of AIGC (generative AI). I will not expand on that topic here. Instead, drawing on Intel's recent progress in AI capability on its CPU and GPU products, I will talk about the driving factors in the evolution of AI.


In promoting AI technology, Intel places more emphasis on client-side AI capability, that is, meeting the AI needs of as many users as possible on the device itself. For example, small ISVs can use Meteor Lake and later processors for local AI computing, and once these are deployed at scale, they add up to AI computing power comparable to the cloud. In addition, doing AI computing on PCs means far less is spent on server construction, power, and bandwidth, which lets software developers outside the top tier move their projects forward smoothly. Client-side AI computing also keeps working without a network connection, and it protects user privacy better.


Of course, Intel also provides further AI computing cores on the client side in addition to the CPU, GPU, and NPU, and thanks to its early investment in OpenVINO, all of this computing power can be opened up to serve AI, which is also in line with Intel's XPU product strategy. At present the efficiency of AI computing is not high, but by applying different libraries and algorithms, running powerful AIGC locally on PC processors is not out of reach in the future.

Written at the end:

Meteor Lake represents a huge architectural change, one that goes beyond anything else in the industry. For Intel, the most important tasks are to master the Intel 4 manufacturing process brought by EUV equipment and to keep improving core computing power once the Co-EMIB packaging process is in large-scale mass production. Because this generation of processors represents Intel's ability to manufacture and integrate chiplets under the disaggregated architecture, it is also the best opportunity for Intel to promote its IDM 2.0 foundry strategy.

Amid the wave of AI, Intel also used the Meteor Lake architecture announcement to present its own ideas for AI on client devices, and sounded the clarion call for AI for everyone. Of course, what moves us most is that a chip giant founded more than 50 years ago has once again picked up the pace, and Meteor Lake will be the most important step in its return to the peak of process technology.