
Tesla is becoming the leading AI chip company

Author: Xinzhixun

Tesla aspires to be one of the world's leading AI companies. To date, it has not deployed a state-of-the-art self-driving system; that honor belongs to Alphabet's Waymo. Tesla is also nowhere to be seen in generative AI. That said, thanks to its data collection advantages, purpose-built compute, culture of innovation, and leading AI researchers, Tesla has the potential to make a leap forward in self-driving cars and robotics.


Tesla currently has very little AI infrastructure in-house: only about 4,000 NVIDIA V100s and about 16,000 NVIDIA A100s. That is a very small number compared to other big tech companies; firms like Microsoft and Meta have more than 100,000 GPUs each and want to double those numbers in the short to medium term. Tesla's weak AI infrastructure is due in part to repeated delays of its in-house D1 training chip.

Now the situation is changing rapidly.

Tesla plans to increase its AI capacity more than 10-fold over the next 1.5 years. That is partly for its own needs, but a large portion is also for Musk's newly formed artificial intelligence company, X.AI.

Today, we want to take a deep dive into Tesla's AI capabilities, including how many H100s and Dojo D1s it has and how those numbers are growing quarter over quarter, as well as the unique requirements stemming from Tesla's model architecture, training infrastructure, and edge inference, including HW 4.0. Finally, we want to discuss X.AI, Musk's competitor to OpenAI, which has poached many well-known engineers from OpenAI.


The story of the D1 training chip is a long and difficult one. It faced problems from silicon design to power delivery, but Tesla now claims the chip is ready, with mass production starting.

Tesla has been designing in-house AI chips for its cars since 2016 and for data center applications since 2018. Before the D1 chip was unveiled, SemiAnalysis exclusively disclosed the special packaging technology it uses, called InFO SoW (integrated fan-out system-on-wafer). Simply put, think of it as a wafer-scale fan-out package. The principle is similar to Cerebras' full-wafer AI chips, but with the advantage of allowing known-good-die testing. This is the most unique and interesting aspect of Tesla's architecture: the InFO SoW carries 25 dies and has no directly attached memory.


Back in 2021, SemiAnalysis also discussed the advantages and disadvantages of this chip architecture in more detail. Since then, the most interesting development has been that, due to insufficient on-chip memory, Tesla had to build another chip on a PCIe card to provide memory connectivity.

Tesla was supposed to ramp production in 2022 but did not, due to silicon and system issues. Now, in mid-2023, the D1 chip is finally ramping. The architecture is well suited to Tesla's unique use case, but it is worth noting that it is not useful for LLMs, where it is severely memory-bandwidth constrained.
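
To see why, here is a minimal back-of-the-envelope sketch of why autoregressive LLM decoding is bandwidth-bound: each generated token must stream roughly all of the model's weights from memory. All numbers here are hypothetical and for illustration only; none come from the article.

```python
# Hypothetical roofline estimate: tokens/s in LLM decoding is capped near
# memory_bandwidth / bytes_of_weights, regardless of available compute.

params = 70e9                  # hypothetical 70B-parameter model
weight_bytes = params * 2      # bf16 weights, 2 bytes per parameter
bw = 400e9                     # hypothetical 400 GB/s of memory bandwidth

print(f"~{bw / weight_bytes:.1f} tokens/s per replica")   # ~2.9 tokens/s
```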

Tesla's use case is unique in that it must focus on image networks, so its architectures are very different. We have discussed in the past that deep learning recommendation networks and transformer-based language models require very different architectures. Image/video recognition networks likewise require a different mix of compute, on-chip communication, on-chip memory, and off-chip memory.

During training, the utilization of these convolutional models on GPUs is very low. With NVIDIA's next generation further optimizing for transformers, especially sparse MoE models, Tesla's investment in its differentiated, convolution-optimized architecture should pay off. These image networks must also fit within the limits of Tesla's inference infrastructure.
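
As an illustration of that difference, the sketch below compares the arithmetic intensity (FLOPs per byte moved) of a convolution layer against a small-batch transformer-style matmul. All shapes are hypothetical and chosen only to show the contrast; this is not Tesla's actual workload.

```python
# Arithmetic intensity = compute performed per byte of memory traffic.
# High intensity (convs) favors compute-dense designs; low intensity
# (small-batch matmuls) is memory-bandwidth-bound.

def conv2d_intensity(h, w, c_in, c_out, k, bytes_per_el=2):
    """Intensity of a KxK convolution, stride 1, 'same' padding."""
    flops = 2 * h * w * c_in * c_out * k * k              # MACs count as 2 ops
    weights = k * k * c_in * c_out * bytes_per_el
    acts = (h * w * c_in + h * w * c_out) * bytes_per_el  # input + output maps
    return flops / (weights + acts)

def matmul_intensity(m, k, n, bytes_per_el=2):
    """Intensity of an MxK @ KxN matmul (e.g. a transformer projection)."""
    flops = 2 * m * k * n
    bytes_moved = (m * k + k * n + m * n) * bytes_per_el
    return flops / bytes_moved

# A mid-network conv layer vs. a batch-8 transformer projection:
print(f"conv   : {conv2d_intensity(56, 56, 128, 128, 3):,.0f} FLOPs/byte")
print(f"matmul : {matmul_intensity(8, 4096, 4096):,.0f} FLOPs/byte")
```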

Tesla HW 4.0, the second-generation FSD chip

In addition to the D1 training chip, which is manufactured by TSMC, Tesla runs AI inference inside its electric cars on what it calls the Full Self-Driving (FSD) chip. The models deployed on Tesla vehicles are extremely constrained, because Tesla stubbornly believes it does not need huge performance to achieve full self-driving. In addition, Tesla's cost constraints are much stricter than Waymo's or Cruise's, because Tesla actually ships in volume. Alphabet's Waymo and General Motors' Cruise, meanwhile, used full-size GPUs costing 10 times more in their cars during development and early testing, and want to build faster (and more expensive) SoCs for their own vehicles.

Tesla's second-generation FSD chip, which has shipped in cars since February 2023, has a design very similar to the first generation's. The first generation is built on Samsung's 14nm process around three quad-core clusters, for a total of 12 Arm Cortex-A72 cores running at 2.2 GHz. In the second-generation design, Tesla increased the CPU count to five quad-core clusters, for a total of 20 Cortex-A72 cores.

The most important part of the second-generation FSD chip is its three NPU cores. Each of the three cores uses 32 MB of SRAM to store model weights and activations. Every cycle, 256 bytes of activation data and 128 bytes of weight data are read from SRAM into the multiply-accumulate (MAC) array. The MACs are arranged as a grid; each NPU core has a 96x96 grid, for 9,216 MACs and 18,432 operations per clock cycle. With three NPUs per chip running at 2.2 GHz, total compute is 121.651 trillion operations per second (TOPS).
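
These figures can be sanity-checked directly. The short calculation below reproduces the 121.651 TOPS number from the grid size and clock quoted above, counting the multiply and the accumulate as two operations, as TOPS figures conventionally do.

```python
# Reconstruct the quoted per-chip TOPS from the NPU grid dimensions.

grid_macs = 96 * 96                     # 9,216 MACs per NPU core
ops_per_cycle = 2 * grid_macs           # 18,432 ops/cycle per core (mul + add)
clock_hz = 2.2e9                        # 2.2 GHz
num_npus = 3                            # three NPU cores per chip

tops = num_npus * ops_per_cycle * clock_hz / 1e12
print(f"{grid_macs} MACs/core, {ops_per_cycle} ops/cycle/core")
print(f"chip total: {tops:.3f} TOPS")   # 121.651 TOPS, matching the article
```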


The second-generation FSD board has 256 GB of NVMe storage and 16 GB of Micron GDDR6 running at 14 Gbps on a 128-bit memory bus, providing 224 GB/s of bandwidth. The memory is the most notable change, as bandwidth increases about 3.3x generation over generation. That bandwidth grew faster than FLOPS suggests HW3 was difficult to fully utilize. Each HW 4.0 board carries two FSD chips.
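
The 224 GB/s figure follows directly from the bus width and the per-pin data rate:

```python
# GDDR6 bandwidth = bus width (bits) * per-pin rate (Gbps) / 8 bits per byte.

bus_width_bits = 128
pin_rate_gbps = 14
bandwidth_gbs = bus_width_bits * pin_rate_gbps / 8
print(f"{bandwidth_gbs:.0f} GB/s")      # 224 GB/s, as quoted above
```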

HW4.0's increased board-level performance comes at the cost of additional power. The idle power consumption of the HW4.0 board is about twice that of the HW3.0 board, and peak consumption should be expected to be higher as well. The external HW4.0 enclosure is rated at 16 amps at 10 volts, which works out to 160 watts.

Despite HW4.0's improved performance, Tesla wants to deliver FSD on HW3.0 as well, probably because it does not want to retrofit existing HW3.0 vehicles whose owners buy FSD.

The infotainment system uses an AMD GPU/APU. Unlike the previous generation, which put it on a separate daughterboard, it now sits on the same board as the FSD chips.


The HW4.0 platform supports 12 cameras, one of which is reserved for redundancy, leaving 11 in use. In the old setup, the front-facing camera hub used three lower-resolution 1.2-megapixel cameras; the new platform uses two higher-resolution 5-megapixel cameras.

Tesla does not currently use lidar or other non-camera sensing. It did use radar in the past, but dropped it midway through the third generation. This reduced manufacturing cost, which Tesla relentlessly optimizes, and the company believes camera-only sensing is a viable route to self-driving. However, Tesla has also noted that if a radar worth integrating exists, it will combine it with the camera system.

In the HW4.0 platform, there is an internally designed radar called Phoenix. Phoenix combines the radar and camera systems, leveraging more data to make vehicles safer. The Phoenix radar uses the 76-77 GHz band, with a peak effective isotropic radiated power (EIRP) of 4.16 watts and an average EIRP of 177.4 milliwatts. It is a non-pulsed automotive radar with three sensing modes. The radar PCB includes a Xilinx Zynq XA7Z020 FPGA for sensor fusion.
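
Regulatory documents typically quote EIRP in dBm rather than watts; for reference, here is a quick conversion of the figures above using the standard formula (the conversion is ours, not from the article):

```python
# Convert radiated power in watts to dBm: 10 * log10(power in milliwatts).
import math

def watts_to_dbm(p_watts: float) -> float:
    return 10 * math.log10(p_watts * 1000)

print(f"peak:    {watts_to_dbm(4.16):.1f} dBm")     # ~36.2 dBm
print(f"average: {watts_to_dbm(0.1774):.1f} dBm")   # ~22.5 dBm
```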

Tesla AI model differentiation

Tesla's goal is to produce foundational AI models that power both its self-driving cars and its robots. Both need to perceive their surroundings and navigate through them, so the same type of AI model can serve both. Creating efficient models for future autonomous platforms requires a lot of research and, more concretely, a lot of data. Furthermore, inference for these models must run at extremely low power and latency, and these hardware limits greatly reduce the maximum model size Tesla can deploy.

Of all companies, Tesla has the largest dataset available for training its deep neural networks. Every vehicle on the road captures sensor and image data; multiplied across the fleet of Teslas on the road, this yields an enormous dataset. Tesla calls this collection process fleet-scale auto-labeling. Each Tesla records dense 45-60 second logs of sensor data, including video, inertial measurement unit (IMU) readings, GPS, odometry, and more, and sends them to Tesla's training servers.
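
As a purely hypothetical illustration, here is a sketch of what one such log entry might contain, built only from the fields the article lists; the names and types are our own invention, not Tesla's schema.

```python
# Hypothetical structure for a single fleet log clip (illustrative only).
from dataclasses import dataclass, field

@dataclass
class FleetLogClip:
    vehicle_id: str                                     # anonymized vehicle ID
    duration_s: float                                   # dense 45-60 s window
    video_frames: list = field(default_factory=list)    # per-camera frames
    imu_samples: list = field(default_factory=list)     # accel/gyro readings
    gps_track: list = field(default_factory=list)       # (lat, lon, t) tuples
    odometer_km: float = 0.0                            # odometry reading

clip = FleetLogClip(vehicle_id="example-001", duration_s=52.0)
print(clip)
```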

Tesla's models are trained on segmentation, masking, depth, point matching, and other tasks. With millions of vehicles on the road, Tesla has abundant, well-labeled, well-documented data sources, enabling continuous training on the Dojo supercomputers at its facilities.

Tesla's belief in data is at odds with the infrastructure it actually has in place: Tesla uses only a fraction of the data it collects. Because of its strict inference limits, Tesla overtrains its models, a practice known for squeezing the best possible accuracy out of a given model size.

Overtraining small models can lead to stagnating full-self-driving performance and an inability to use all the data collected. Many companies similarly choose to train at the largest scale possible, but they also use much more powerful automotive inference chips. For example, NVIDIA plans to offer automotive customers DRIVE Thor, with more than 2,000 teraflops of compute, more than 15 times that of Tesla's new HW4.0. The NVIDIA architecture is also more flexible for other models.
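
For reference, the ratio implied by these figures checks out against the per-chip TOPS computed earlier:

```python
# Rough ratio from the article's numbers: DRIVE Thor's quoted 2,000 TOPS
# against the ~121.65 TOPS calculated above for one FSD chip.

thor_tops = 2000
fsd_chip_tops = 121.651
print(f"{thor_tops / fsd_chip_tops:.1f}x")   # ~16.4x, consistent with ">15x"
```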

Editor: Xinzhixun - Lin Zi. Compiled from: SemiAnalysis