Apple launched the M4 chip, mediocre?

author:175500; yse
Apple today unveiled the M4, the latest chip to deliver exceptional performance to the new iPad Pro. Built with second-generation 3nm technology, M4 is a system-on-chip (SoC) that improves the industry-leading energy efficiency of Apple silicon and enables the incredibly thin and light design of the iPad Pro. It also features an all-new display engine that drives the groundbreaking Ultra Retina XDR display on iPad Pro with stunning precision, color and brightness.

The new chip's CPU boasts up to 10 cores, while the new 10-core GPU builds on the next-generation GPU architecture introduced by M3 and brings dynamic caching, hardware-accelerated ray tracing, and hardware-accelerated mesh shading capabilities to the iPad for the first time. The M4 has Apple's fastest Neural Engine, capable of performing up to 38 trillion operations per second, which is faster than the neural processing unit of any AI PC today. Combined with faster memory bandwidth, next-generation machine learning (ML) accelerators in the CPU, and a high-performance GPU, M4 makes the new iPad Pro an extremely powerful AI device.

Johny Srouji, Apple's senior vice president of Hardware Technologies, said: "The new iPad Pro with M4 is a great example of how building best-in-class custom chips enables breakthrough products. The M4's energy-efficient performance and its all-new display engine make the iPad Pro's thin and light design and game-changing display possible, while fundamental improvements to the CPU, GPU, Neural Engine, and memory system make the M4 ideal for the latest applications that leverage artificial intelligence. All in all, this new chip makes the iPad Pro the most powerful device in its class."

TSMC's second-generation 3nm process

Consisting of 28 billion transistors, M4 is built with second-generation 3nm technology, further enhancing the power of Apple silicon. The M4 also features an all-new display engine designed with groundbreaking technology that delivers the incredible precision, color accuracy, and brightness uniformity of the Ultra Retina XDR display, a state-of-the-art display created by combining light from two OLED panels.

From this description we can be fairly confident about the process. Apple's "second-generation 3nm" wording lines up perfectly with TSMC's second-generation 3nm process, N3E. This enhanced version of the 3nm node is something of a sidegrade from the N3B process used by the M3 series of chips: N3E is not as dense as N3B, but according to TSMC it offers slightly better performance and power characteristics. The differences are small enough that architecture plays a bigger role, but in the race for energy efficiency, Apple will take whatever advantage it can get.

Apple's position as TSMC's new process node launch partner has been well established over the years, and Apple appears to be the first company to launch an N3E process chip. However, they won't be the last, as almost all of TSMC's high-performance customers are expected to adopt N3E next year. So, as usual, Apple's immediate advantage in chip manufacturing is only temporary.

Apple's early lead may also explain why we are seeing the M4 first in the iPad (a relatively low-volume device for Apple) rather than in the MacBook line. At some point, TSMC's N3E capacity will catch up, and then some. I won't risk speculating on Apple's plans for the MacBook line at that point, because I can't really see Apple stopping production of M3 chips so soon, but it also puts the company in the awkward position of selling M3 Macs while the M4 exists.

Apple has not yet disclosed the die size of the new chip (or released a die photo), but the total transistor count is 28 billion, only slightly more than the M3, which suggests that Apple has not invested heavily in new hardware blocks.

M4 CPU architecture: four performance cores, six efficiency cores

Starting with the CPU side, the design of Apple's M4 CPU cores is something of a mystery. Apple is tight-lipped and has offered no performance comparison with the M3, so we have little to go on when comparing CPU designs. It remains to be seen whether the M4 represents a watershed moment in Apple's CPU design (a new Monsoon/A11) or a minor update similar to the Everest CPU cores in the A17. Of course, we would hope for the former, but absent more details, we will work with what we know.

Apple's short keynote segment on the SoC noted improved branch prediction for both the performance and efficiency cores, as well as wider decode and execution engines for the performance cores. However, these are the same broad claims Apple made for the M3, so on their own they do not indicate a new CPU architecture.

According to Apple, the M4 has a brand-new CPU with up to 10 cores: up to four performance cores and, now, six efficiency cores. The next-generation cores have improved branch prediction, with wider decode and execution engines for the performance cores and a deeper execution engine for the efficiency cores. Both types of cores also feature enhanced, next-generation machine learning accelerators.

Compared to the powerful M2 in the previous iPad Pro, the M4 has 1.5x faster CPU performance. Whether you're working with complex orchestral files in Logic Pro or adding demanding effects to 4K video in LumaFusion, M4 boosts performance across your entire professional workflow.

However, what makes Apple's M4 CPU claims unique is that both CPU core types are said to feature "next-generation machine learning accelerators." This goes hand in hand with Apple's broader focus on ML/AI performance in the M4, although the company hasn't elaborated on what these accelerators are. With the NPU doing all the heavy lifting, the purpose of AI enhancements on the CPU cores is less about total throughput and more about handling light inference work mixed into general-purpose workloads, without spending the time and resources to invoke the dedicated NPU.

One educated guess is that Apple has updated its little-documented AMX matrix units, which have been part of the M-series SoCs since the beginning. However, recent AMX versions already support common ML number formats such as FP16, BF16, and INT8, so if Apple has made changes here, it is not something as simple and straightforward as adding (more) common formats. At the same time, if it is AMX, it is a bit surprising to see Apple mention it at all, as the company has been very secretive about these units.

Another plausible explanation is that Apple has changed the SIMD units within its CPU cores to add common ML number formats, as developers can access these units more directly. But Apple has also been pushing developers toward higher-level frameworks from the start (which is how AMX is accessed), so this could really go either way.

In any case, whatever CPU cores underpin the M4, one thing is certain: there are more of them. The full M4 configuration includes 4 performance cores and 6 efficiency cores, which is 2 more efficiency cores than the M3. The lower-tier iPad Pro models get a 3P+6E configuration, while the higher-tier models get the full 4P+6E configuration, so the difference in performance between them may be noticeable.

All else being equal, adding two efficiency cores won't significantly improve CPU performance compared to the M3's 4P+4E configuration. But Apple's efficiency cores shouldn't be underestimated either, because they are relatively powerful thanks to their use of out-of-order execution. There is a lot of room for energy efficiency gains, especially when fixed workloads can stay on the efficiency cores rather than being promoted to the performance cores.

Other than that, Apple hasn't released any detailed performance charts for the new SoC/CPU cores, so there's little hard data to discuss. But the company claims that the M4 has 50% faster CPU performance than the M2. This is presumably for multi-threaded workloads that can take advantage of M4's CPU core count. In addition, Apple also claimed in the keynote that they can deliver M2 performance at half the power, which seems like a reasonable proposition when combined with process node improvements, architecture improvements, and an increase in the number of CPU cores.

However, as always, we will have to see how independent benchmarks turn out.
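The half-power claim above can be restated as a performance-per-watt ratio. The short Python sketch below does that arithmetic using only the figures Apple quoted; the function name `perf_per_watt_gain` is a hypothetical helper introduced for illustration, and the calculation assumes the claim describes a single operating point.

```python
def perf_per_watt_gain(relative_performance: float, relative_power: float) -> float:
    """Return the perf-per-watt ratio of a new chip versus a baseline.

    Both arguments are expressed relative to the baseline chip (baseline = 1.0).
    """
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_performance / relative_power


# Apple's iso-performance claim: M2-level performance (1.0x) at half the power (0.5x).
iso_performance_gain = perf_per_watt_gain(1.0, 0.5)

# Apple's peak claim: 1.5x the M2's CPU performance. Apple did not state the power
# drawn at that point, so the power value here is a placeholder assumption
# (equal power) and not a quoted figure.
peak_gain_if_equal_power = perf_per_watt_gain(1.5, 1.0)

print(f"Iso-performance perf/W gain: {iso_performance_gain:.1f}x")
print(f"Peak perf/W gain (assuming equal power): {peak_gain_if_equal_power:.1f}x")
```

The two claims describe different points on the chip's power curve, so they should not be multiplied together into a single "3x" figure.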

M4 GPU architecture: ray tracing and dynamic caching

The M4's new 10-core GPU is built on the next-generation graphics architecture of the M3 series of chips. It has Dynamic Caching, an Apple innovation that dynamically allocates local memory in the hardware in real-time, significantly increasing the average utilization of the GPU. This significantly improves the performance of the most demanding professional applications and games.

The GPU situation on the M4 is much simpler than the CPU situation. The new GPU architecture was only recently introduced with the M3 (Apple doesn't iterate on this block as often as it does on CPUs), and Apple has all but confirmed that the GPU in the M4 uses the same architecture as the M3.

With 10 GPU cores, the high-level configuration is the same as on the M3. Whether the various blocks and caches are truly identical to the M3's remains to be seen, but Apple hasn't made any claims about the M4's GPU performance that could be read as making it superior to the M3's GPU. In fact, the iPad's smaller form factor and more limited cooling capacity mean that the GPU will be thermally constrained under any sustained workload, especially compared to how the M3 performs in actively cooled devices like the 14-inch MacBook Pro.

In any case, this means that the M4 comes with all the major new architectural features introduced by the M3 GPU: ray tracing, mesh shading, and dynamic caching. Apple also highlighted that hardware-accelerated ray tracing is also coming to the iPad for the first time, enabling more realistic shadows and reflections in games and other graphics-rich experiences. Hardware-accelerated mesh shading is also built into the GPU, providing greater geometry processing power and efficiency, enabling more visually complex scenes in gaming and graphics-intensive applications. M4 has given a huge boost to professional rendering performance in applications like Octane and is now four times faster than M2.

We won't recap ray tracing here, but mesh shading is an important next-generation method of geometry processing. Dynamic caching, meanwhile, is Apple's term for its improved memory allocation technology on M-series chips, which avoids over-allocating memory from Apple's unified memory pool to the GPU.

With these improvements to the CPU and GPU, the M4 maintains the industry-leading performance per watt of Apple silicon. The M4 delivers the same performance as the M2 at half the power. Compared to the latest PC chips found in thin and light laptops, the M4 delivers the same performance with only a quarter of the power consumption.

Beyond GPU rendering, the M4 also inherits the M3's updated media engine block, which, coming from the M2, is a relatively important change for iPad use. Most notably, the M3/M4 media engine adds support for decoding AV1, the next-generation open video codec. While Apple is more than happy to pay royalties for HEVC/H.265 to ensure it's available in its ecosystem, the royalty-free AV1 codec is expected to play an important role in the coming years, and the iPad Pro is better off supporting the latest codecs (or at least not having to decode AV1 inefficiently in software).

What is new on the display side of the M4, however, is the display engine. This block is responsible for compositing images and driving the displays attached to the device. Apple has never paid it much attention publicly, but when the company does update it, the update usually brings some functional improvements right away.

The key change here seems to be support for Apple's new stacked "tandem" OLED panel configuration, which debuts in the iPad Pro. The iPad's Ultra Retina XDR display stacks two OLED panels directly on top of each other so that together they can meet Apple's brightness target of 1600 nits, which is apparently not possible with a single OLED panel. This, in turn, requires a display controller that knows how to drive the paired panels: not only operating them as a mirrored set, but also accounting for the losses caused by one panel sitting beneath the other.

While not directly related to the iPad Pro, it will be interesting to see if Apple uses this opportunity to increase the total number of displays that the M4 can drive, as the base M-series SoC is usually limited to 2 displays, much to the consternation of MacBook users. The fact that the M4 can drive the tandem OLED panels and an external 6K display is promising, but we'll have to wait until the M4 arrives on the Mac to see how that translates to the Mac ecosystem.

M4 NPU Architecture: New Stuff, Faster Stuff

Arguably, the biggest focus of Apple's M4 SoC is the company's NPU, also known as Neural Engine. Since the M1, the company has been rolling out 16-core designs (smaller designs on the A-series chips before that), with each generation delivering modest performance gains. But Apple says that with the advent of the M4 generation, they have made an even bigger leap in performance.

The M4 NPU is still a 16-core design and is rated at 38 TOPS, just over double the 18 TOPS Neural Engine in the M3 and, coincidentally, only a few TOPS higher than the Neural Engine in the A17. As a headline claim, then, Apple says the M4 NPU is far more powerful than the one in the M3, let alone the M2 that powered the previous iPad Pro. Reaching further back, it is 60 times faster than the A11's NPU.
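As a quick sanity check on these figures, the Python sketch below computes the M4-to-M3 ratio and the A11 throughput implied by the "60 times" claim. It uses only the numbers quoted in this article; the variable names are illustrative, and the implied A11 figure is a derived estimate, not a number Apple stated here.

```python
# Figures quoted in the article (trillions of operations per second).
M4_TOPS = 38.0
M3_TOPS = 18.0
A11_SPEEDUP_CLAIM = 60.0  # "60 times faster" than the A11's Neural Engine

# Ratio of M4 to M3 throughput, ignoring any difference in number format.
m4_over_m3 = M4_TOPS / M3_TOPS

# Throughput the A11's NPU would need for the 60x claim to hold exactly.
implied_a11_tops = M4_TOPS / A11_SPEEDUP_CLAIM

print(f"M4 vs M3: {m4_over_m3:.2f}x")
print(f"Implied A11 throughput: {implied_a11_tops:.2f} TOPS")
```

The ratio comes out at about 2.11x, consistent with "just over double", but it is only meaningful if both TOPS figures were measured at the same precision.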

Unfortunately, the devil is (again) in the details, as Apple doesn't state the all-important precision behind the number: whether it is based on INT16, INT8, or INT4. As the prevailing precision for ML inference at the moment, INT8 is the most likely option, especially since this is what Apple quoted for the A17 last year. But the freedom to mix precisions, or simply not to disclose them, is a headache to say the least, and it makes like-for-like comparison of specifications difficult.
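To see why the undisclosed precision matters, the Python sketch below normalizes a TOPS figure from one integer width to another. It rests on a simplifying assumption that is not from Apple: that peak throughput scales inversely with operand width (halving the width doubles the operations per second). The helper `normalize_tops` is hypothetical, and real hardware need not behave this way.

```python
def normalize_tops(quoted_tops: float, quoted_bits: int, target_bits: int) -> float:
    """Rescale a TOPS figure to a different operand width.

    Assumption (illustrative only): throughput is inversely proportional to
    operand width, so an INT8 figure is twice the equivalent INT16 figure.
    """
    if quoted_bits <= 0 or target_bits <= 0:
        raise ValueError("bit widths must be positive")
    return quoted_tops * quoted_bits / target_bits


# If the M4's 38 TOPS were an INT8 figure, its INT16-equivalent would be:
m4_int16_equivalent = normalize_tops(38.0, quoted_bits=8, target_bits=16)

# If it were instead an INT4 figure, the INT8-equivalent would be:
m4_int8_if_quoted_int4 = normalize_tops(38.0, quoted_bits=4, target_bits=8)

print(f"38 TOPS @ INT8 -> {m4_int16_equivalent:.1f} TOPS @ INT16")
print(f"38 TOPS @ INT4 -> {m4_int8_if_quoted_int4:.1f} TOPS @ INT8")
```

Under this assumption, an INT8-based 38 TOPS would be roughly on par with an 18 TOPS figure measured at 16-bit precision, which is why the choice of number format can account for most of a headline increase.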

In any case, even if most of the improvement comes from INT8 support rather than INT16/FP16 support, the M4 NPU is expected to bring a significant boost to AI performance, similar to what has already happened with the A17. Since Apple was one of the first chip vendors to ship a consumer-grade SoC with what we now call an NPU, the company isn't afraid to make a big splash here, especially given what's happening in the PC space. And because Apple offers a complete hardware/software ecosystem, it has the advantage of being able to shape its software around its own NPUs, rather than waiting for killer apps to be invented for them.

According to Apple's description, the M4 has an extremely fast Neural Engine, an IP block in the chip designed specifically to accelerate AI workloads. It's Apple's most powerful Neural Engine ever, capable of performing 38 trillion operations per second, a staggering 60 times faster than the first Neural Engine in the A11 Bionic. The Neural Engine, along with next-generation machine learning accelerators in the CPU, a high-performance GPU, and higher-bandwidth unified memory, makes the M4 an extremely powerful AI chip. With AI features in iPadOS, such as Live Captions for real-time audio captions and Visual Look Up for identifying objects in videos and photos, the new iPad Pro allows users to quickly complete amazing AI tasks on the device.

iPad Pro with M4 makes it easy to separate subjects from their backgrounds in 4K video in Final Cut Pro with a single tap, and automatically create scores in real time in StaffPad just by listening to someone play the piano. Inference workloads can be completed efficiently and privately while minimizing the impact on application memory, application responsiveness, and battery life. The Neural Engine in M4 is Apple's most powerful Neural Engine to date, more powerful than any Neural Processing Unit in any AI PC today.

M4 memory: adopting faster LPDDR5X

Last but not least, the memory capabilities of the M4 SoC have also been significantly improved. Given the memory bandwidth figure (120GB/s) that Apple quoted for the M4, all indications are that the company has finally adopted LPDDR5X in its new SoC.

LPDDR5X is a mid-generation update to the LPDDR5 standard that allows higher memory clock speeds than LPDDR5, which topped out at 6400 MT/s. While LPDDR5X is currently available at speeds of up to 8533 MT/s (with faster speeds to come), Apple's 120GB/s figure for the M4 puts the memory clock speed at around LPDDR5X-7700.
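The relationship between bandwidth and transfer rate can be sketched in a few lines of Python. The article does not state the M4's memory bus width, so the 128-bit value below is an assumption (it matches earlier base M-series chips), and the function names are hypothetical. Under that assumption, 120GB/s works out to 7500 MT/s, a little below the rough figure cited above.

```python
def transfer_rate_mts(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Return the transfer rate in MT/s needed to reach a given peak bandwidth."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_transfer / 1e6


def peak_bandwidth_gb_s(rate_mts: float, bus_width_bits: int) -> float:
    """Return the peak bandwidth in GB/s for a given transfer rate and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return rate_mts * 1e6 * bytes_per_transfer / 1e9


ASSUMED_BUS_WIDTH_BITS = 128  # assumption: not stated in the article

# Transfer rate implied by Apple's 120GB/s figure.
m4_rate = transfer_rate_mts(120.0, ASSUMED_BUS_WIDTH_BITS)

# Ceiling for plain LPDDR5 (6400 MT/s) on the same assumed bus.
lpddr5_ceiling = peak_bandwidth_gb_s(6400.0, ASSUMED_BUS_WIDTH_BITS)

print(f"Implied M4 transfer rate: {m4_rate:.0f} MT/s")
print(f"LPDDR5-6400 ceiling: {lpddr5_ceiling:.1f} GB/s")
```

Because 120GB/s exceeds what LPDDR5-6400 could deliver on a bus of this width, the quoted figure points to LPDDR5X regardless of the exact speed grade.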

Since the M4 is debuting in the iPad, we don't know its maximum memory capacity yet. The M3 can accommodate up to 24GB of RAM, and while Apple is unlikely to regress in this regard, there's also no indication that it will be able to increase the memory to 32GB. For now, iPad Pro models will all come with either 8GB or 16GB of RAM, depending on the model.

Link to original article

https://www.anandtech.com/show/21387/apple-announces-m4-soc-latest-and-greatest-starts-on-ipad-pro

Source | Semiconductor Industry Watch (ID: icbank) Compiled from Apple
