How good is Huawei's latest Kirin 9010? Large-core IPC is greatly improved: a detailed microarchitecture analysis

Author: Zhenyi Technology

1. Performance analysis

Apart from the standard version, the other three models in the Pura70 series are equipped with a new processor, the Kirin 9010. Judging by the model number alone, and by earlier Kirin naming rules, it was generally assumed to be an overclocked version of the Kirin 9000S.

However, once the CPU specifications came out, they showed that the clock frequency of the large core had been cut sharply. The medium and small cores are clocked slightly higher, and the chip has the same hyper-threading design as the Kirin 9000S.

The GPU is still the Maleoon 910 used in the Kirin 9000S, so the only change in this generation is the CPU. The CPU benchmark scores show that although the large core's frequency has decreased, its performance has increased.

Clearly, this is not a downclocked Kirin 9000S but a new processor with an upgraded CPU core microarchitecture. In Geekbench 6 (GB6), single-core performance is up 8.4%, and multi-core performance is also up 8.4%.

Because this gain comes at a lower clock frequency, the GB6 score alone implies that the large-core IPC (instructions per clock) of the Kirin 9010 has risen by about 23%. Let's see how the integer and floating-point IPC of its large core have changed.

At the same power consumption, the Kirin 9010's large-core integer performance is 5.8% ahead of the Kirin 9000S's; in other words, its energy efficiency is higher. Factoring in the difference between the two clock frequencies, its integer IPC has improved by about 20%.

In large-core floating-point performance, the Kirin 9010 has a 7.25% advantage over the Kirin 9000S, and the power figures tell the same story: energy efficiency is significantly improved. Factoring in the difference between the two clock frequencies, its floating-point IPC has improved by 21.7%.

  • In summary, the large-core IPC of the Kirin 9010 is more than 20% higher than that of the Kirin 9000S. And because its clock frequency is set relatively low, its energy efficiency is also more than 5% higher.
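The IPC figures above come from combining each performance gain with the clock ratio. Below is a minimal sketch of that arithmetic in Python. It assumes a large-core clock of 2.62 GHz for the Kirin 9000S (an assumption, since this article does not state it) and 2.30 GHz for the Kirin 9010 (the 2.3GHz figure given in the process section). The function name `ipc_gain` is made up for illustration. With these assumed clocks the results land within about a percentage point of the quoted 23%, 20% and 21.7%, so the author presumably used slightly different clock values or rounding.

```python
# Assumed large-core clocks in GHz. The Kirin 9000S value is NOT given in
# the article; 2.30 is the Kirin 9010 figure quoted in the process section.
FREQ_9000S_GHZ = 2.62
FREQ_9010_GHZ = 2.30


def ipc_gain(perf_gain: float, old_freq: float, new_freq: float) -> float:
    """Return the fractional IPC gain implied by a fractional performance gain.

    Performance is modeled as IPC * frequency, so
    IPC_new / IPC_old = (perf_new / perf_old) * (old_freq / new_freq).
    """
    perf_ratio = 1.0 + perf_gain
    return perf_ratio * (old_freq / new_freq) - 1.0


# Performance gains reported in the article (Kirin 9010 vs Kirin 9000S).
reported = {
    "GB6 single-core": 0.084,
    "large-core integer": 0.058,
    "large-core floating point": 0.0725,
}

for name, gain in reported.items():
    ipc = ipc_gain(gain, FREQ_9000S_GHZ, FREQ_9010_GHZ)
    print(f"{name}: +{gain:.1%} performance -> about +{ipc:.1%} IPC")
```

The simple model treats benchmark score as proportional to IPC times frequency, which ignores memory effects and boost behavior, so it is only a first-order check on the article's numbers.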

It also shows that, to improve CPU energy efficiency on a limited process technology, Huawei chose to upgrade the microarchitecture substantially while lowering the clock frequency, achieving a performance improvement at the same power consumption.

2. Microarchitecture analysis

Previously, the CPU core of the Kirin 9000S was slightly behind the Snapdragon 888 in IPC. This time the CPU core of the Kirin 9010 matches the Snapdragon 8+ Gen1 in integer IPC and surpasses it in floating-point IPC.

In other words, the Kirin 9010's self-developed large-core microarchitecture performs at a level fairly close to the Cortex-X2 ultra-large core in the Snapdragon 8+ Gen1. Let's explore the similarities between the two at the microarchitecture level.

Starting with the Cortex-X2: its front end has an issue width of 8, the ROB (reorder buffer) of its out-of-order window is 288 entries deep, its back end has 4 integer execution units, and its pipeline has been shortened from 11 stages in the X1 to 10.

As for the Kirin 9010's large-core microarchitecture, from what is known, only its 8-wide issue matches the X2. Its other parameters, such as its 8-wide decode and 6 integer execution units, exceed the X2's 6-wide decode and 4 integer execution units.

The cache systems of the Kirin 9010 and the Snapdragon 8+ Gen1 are also very similar: the front-end L1 instruction cache, the back-end L1 data cache, and the L2 cache all match. However, no exact figures are known for the Kirin 9010's L3 cache and SLC (system-level cache).

Since the L3 cache and SLC mainly affect the energy efficiency of the large cores, the cache system can be treated as a controlled variable. That is, even with a wider front end and back end, the Kirin core surpasses the X2 only in floating-point IPC.

The reason lies in the manufacturing process. Before getting to that, we have to look at the direction of the microarchitecture, because the design route Huawei has taken with the new Kirin coincides with that of Apple's cores.

For example, the front end of the Kirin 9010's large core is 8-wide decode and 8-wide issue, the same front-end width as the Apple A14, and both have 6 integer units in the back end. However, Apple's cache system has a huge overall advantage, and its ROB depth is as high as 630.

In the end, the integer IPC of the A14's large core is 31% higher than that of the Kirin 9010's large core.

3. Process analysis

Finally, we come back to the crux of everything: Huawei's microarchitecture design is severely constrained by the limited manufacturing process, and the Kirin 9010's current CPU architecture is the result of balancing performance against energy efficiency.

For example, if Huawei wanted to follow Apple in piling on cache and deepening the ROB, it would need to add a large number of transistors, which would push up costs, increase power consumption, and reduce yields. There is only one solution: upgrading the manufacturing process.

As for what a process upgrade brings, moving from TSMC's 7nm process to TSMC's 5nm process increases logic density by about 1.84 times and SRAM density by 1.35 times, while reducing power consumption at the same frequency by 30%, which is extremely significant.

Of these, logic density sees the largest increase. It greatly reduces the circuit size of every functional module in the SoC, which means that more transistors fit in the same area, or that the chip area shrinks substantially for the same number of transistors.

SRAM density, on the other hand, bears on every level of cache: it allows larger caches to fit in the same area as before. Because the cache system is closely tied to energy efficiency, having enough cache in place leaves much more leeway for performance design.

Taken together, these two large density increases leave a lot of room to improve microarchitectural IPC, in both performance and efficiency, because more transistors fit in the same or a smaller chip area.
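To make the density figures concrete, here is a rough Python sketch of how they translate into chip area or transistor budget. It assumes ideal scaling, with the 1.84x logic and 1.35x SRAM factors quoted above applied uniformly; real designs scale less cleanly. The function names and the 60/40 logic-to-SRAM area split in the example are hypothetical and chosen only for illustration.

```python
LOGIC_DENSITY_GAIN = 1.84  # 7nm -> 5nm logic density factor quoted in the article
SRAM_DENSITY_GAIN = 1.35   # 7nm -> 5nm SRAM density factor quoted in the article


def scaled_area(logic_area: float, sram_area: float) -> float:
    """Area of the same design on the denser process, under ideal scaling."""
    return logic_area / LOGIC_DENSITY_GAIN + sram_area / SRAM_DENSITY_GAIN


def extra_budget(density_gain: float) -> float:
    """Fraction of additional transistors (or cache bits) that fit in the same area."""
    return density_gain - 1.0


# Hypothetical block: 60% logic and 40% SRAM by area on the older process.
old_logic, old_sram = 0.60, 0.40
new_area = scaled_area(old_logic, old_sram)

print(f"same design on the denser process: {new_area:.2f}x the original area")
print(f"same area: +{extra_budget(LOGIC_DENSITY_GAIN):.0%} logic transistors, "
      f"+{extra_budget(SRAM_DENSITY_GAIN):.0%} SRAM bits")
```

Under these assumptions, a cache-heavy block shrinks less than a logic-heavy one, which is why the SRAM factor matters so much for designs that lean on large caches.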

At this point it is basically clear why the Kirin 9010's large core, despite its new microarchitecture, still has its clock frequency reduced to 2.3GHz. In essence, the process technology limits the design space of the microarchitecture: a large performance gain and good energy efficiency cannot both be had.

Given a comparable process, the large-core microarchitecture of the Kirin 9010 could, like the Apple A14 built on TSMC's 5nm process, add enough transistors to build up its cache and ROB and so narrow the IPC gap between the two.

The final question is: how did Huawei's and Apple's CPU microarchitecture routes come to coincide? It is actually quite a coincidence. Huawei was forced to bring its self-developed Kunpeng core from the computer side down to Kirin, while Apple extended its cores from the A series to the M series.
