
Advancements in deep learning hardware: GPUs, TPUs, etc

Author: Semiconductor Industry Vertical

This article is compiled by Semiconductor Industry Vertical (ID: ICVIEWS).

Deep learning and the hardware that drives it are constantly evolving.


Deep learning has dramatically changed industries from healthcare to autonomous driving. However, these advancements would not have been possible without the parallel development of hardware technology. Let's explore the evolution of deep learning hardware, with a focus on GPUs and TPUs and what's next.

The rise of GPUs

Graphics processing units (GPUs) played a key role in the deep learning revolution. Originally designed for computer graphics and image processing, GPUs are highly efficient at the matrix and vector operations that sit at the core of deep learning.

There are a few reasons why GPUs are well suited to deep learning:

First, compared with a CPU, a GPU has far more independent high-throughput compute lanes and fewer control units, so it is not burdened with the many non-compute tasks a CPU must juggle. This purer compute environment lets deep learning and neural network models complete their computations more efficiently on a GPU.

Second, the core of deep learning is parameter learning, and these parameter computations are largely independent of one another and can be processed in parallel, while GPUs excel at exactly this kind of logically simple, highly parallel processing.

In addition, the GPU architecture is well suited to compute-intensive, data-parallel programs, and deep learning workloads are both.

Finally, GPUs provide very high memory bandwidth while hiding memory latency behind massive thread parallelism, which is another reason GPUs suit deep learning.
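As a minimal illustration of why these properties matter, the sketch below uses NumPy (the batch size, layer width, and data are arbitrary assumptions, not figures from this article) to express a dense-layer forward pass as one large matrix multiplication, the kind of operation a GPU spreads across thousands of parallel lanes:

```python
import numpy as np

# Hypothetical sizes: a batch of 64 inputs through a 512 -> 256 dense layer.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 512))   # batch of input activations
W = rng.standard_normal((512, 256))  # layer weights
b = np.zeros(256)                    # layer bias

# The whole forward pass is one matrix multiply plus a broadcast add.
# Each of the 64 * 256 output elements is an independent dot product --
# exactly the data-parallel structure a GPU exploits.
Y = X @ W + b
print(Y.shape)  # (64, 256)
```

On a CPU these dot products are computed a few at a time; a GPU assigns them to thousands of threads at once, which is where the speedup comes from.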

The introduction of the TPU

A tensor processing unit (TPU) is a custom ASIC designed from the ground up by Google specifically for machine learning workloads. TPUs provide compute for Google's major products, including Translate, Photos, Search, Assistant, and Gmail, among others. Cloud TPU offers the TPU as a scalable cloud computing resource, providing compute to developers and data scientists running cutting-edge ML models on Google Cloud.

To be faster than GPUs, Google designed the TPU as a dedicated neural network processor, further sacrificing general-purpose versatility to focus on matrix operations. The TPU does not support the wide variety of applications a GPU does; it runs only the large-scale multiplication and addition operations that neural networks require.

Because matrix multiplication is a single, fixed computation pattern known in advance, the TPU implements it as a large physical array of thousands of directly connected multipliers and adders. For example, Cloud TPU v2 includes two 128×128 compute matrices, equivalent to 32,768 ALUs.
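A quick arithmetic check of the ALU count quoted above (a worked example only, not additional information about the chip):

```python
# Each 128x128 compute matrix holds one multiply-accumulate unit per cell.
alus_per_matrix = 128 * 128      # 16,384 ALUs per matrix
matrices_per_chip = 2            # Cloud TPU v2 has two such matrices
total_alus = matrices_per_chip * alus_per_matrix
print(total_alus)  # 32768, matching the figure in the text
```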

The TPU computes as follows: the parameters W are loaded from memory into the matrix of multipliers and adders, and then the TPU streams the data X in from memory. As each multiplication completes, its result is passed to the next multiplier while being summed along the way, so the output is the sum of all the products between the data and the parameters. The key property of this design is that no memory accesses are needed during the entire bulk computation and data movement.
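To make that dataflow concrete, here is a minimal Python simulation of the pass-and-accumulate pattern for one column of the array (the array size and inputs are illustrative assumptions; a real TPU pipelines many such columns simultaneously in hardware):

```python
import numpy as np

def systolic_column(w_col, x):
    """Accumulate the dot product the way a systolic array does:
    each cell multiplies its fixed, preloaded weight by the incoming
    value and adds the partial sum handed down from the previous cell."""
    partial_sum = 0.0
    for w_i, x_i in zip(w_col, x):
        partial_sum = partial_sum + w_i * x_i  # multiply, then pass the sum onward
    return partial_sum

rng = np.random.default_rng(1)
x = rng.standard_normal(128)         # data X streamed into the array
W = rng.standard_normal((128, 128))  # weights W preloaded into the cells

# One partial sum flows through each column; no memory reads occur mid-flow.
y = np.array([systolic_column(W[:, j], x) for j in range(128)])
assert np.allclose(y, x @ W)  # matches an ordinary matrix-vector product
```

The point of the simulation is the inner loop: once W is loaded, results move cell to cell rather than back and forth to memory, which is where the efficiency and power savings come from.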

This approach not only improves the computing efficiency of neural networks but also saves power, and the low resource consumption translates into lower cost. By Google's own estimate, computing on a TPU costs about one fifth as much as on non-TPU hardware, and the TPU service on Google Cloud is priced accordingly affordably.


Future technology beyond GPUs and TPUs

The landscape of deep learning hardware is constantly evolving. Here are some of the emerging technologies that could shape the future:

FPGAs (field-programmable gate arrays): Unlike GPUs and TPUs, FPGAs can be reconfigured after manufacturing, which provides flexibility for specific applications: they can be programmed and configured for a particular deep learning model or algorithm. FPGAs offer parallel computing capability that can rapidly execute the many matrix and tensor operations in deep learning. These calculations are very time-consuming in deep learning models, but by parallelizing them at the hardware level an FPGA can execute many computational tasks simultaneously, significantly accelerating them.

ASICs (application-specific integrated circuits) are tailored to specific applications and deliver the best performance and energy efficiency. ASICs for deep learning are still in their early stages, but they hold promise for future optimization.

Neuromorphic computing: The idea of neuromorphic computing is to take inspiration from the brain and design computer chips that fuse memory and processing power. In the brain, synapses provide direct memory access to neurons that process information. This is how the brain achieves impressive computing power and speed with very little power consumption. By mimicking this architecture, neuromorphic computing provides a way to build intelligent neuromorphic chips that consume very little energy and are computationally fast. This technology is expected to dramatically reduce power consumption while dramatically improving processing efficiency.

Challenges and future directions

While advances in deep learning hardware are impressive, they also come with a set of challenges:

High cost: Developing custom hardware such as TPUs and ASICs requires significant investment in research, development, and manufacturing.

Software compatibility: Ensuring that new hardware works seamlessly with existing software frameworks requires ongoing collaboration between hardware developers, researchers, and software programmers.

Sustainability: As hardware becomes more powerful, it also consumes more energy. Making these technologies sustainable is critical to their long-term viability.

Conclusion

Deep learning and the hardware that drives it are constantly evolving. Whether it's through improvements in GPU technology, wider adoption of TPUs, or groundbreaking new technologies like neuromorphic computing, the future of deep learning hardware looks exciting and promising. The challenge for developers and researchers is to balance performance, cost, and energy efficiency to continue to drive innovations that can transform our world.

*Disclaimer: This article was created by its original author and reflects the author's personal views. We reprint it only for sharing and discussion, which does not imply our endorsement. If you have any objections, please contact us.
