
Original: Tan Jing
DPU is the next hot spot after artificial intelligence chips.
Industry insiders joke that investors are more enthusiastic about studying DPUs than the people who actually build them.
When someone asks what a DPU chip is, a familiar female voice drifts through the startup: "Alipay payment received, 100 million yuan." Yes, financing rounds often run to hundreds of millions.
And it is not only the big Internet companies: the state-backed big funds are also lurking with knives drawn, eyes fixed on DPU, waiting for their chance.
Cloud vendors' desire for DPU chips borders on hysteria. It has pushed the people who build DPUs to nosebleeds and eyes on fire.
There is a saying that DPUs come in only two kinds: those of Amazon Cloud and Alibaba Cloud, and everyone else's.
The story of DPU begins a long time ago.
(i)
In 1998, on a playing field at Stanford University in the United States, sunlight brushed the tips of the green grass; children shrieked and cheered, their jerseys bright, running like the wind.
Off the pitch, parents waited to pick up their children and take them home. Some of them were Stanford colleagues and their families.
Standing side by side, two of the parents struck up a conversation.
"What have you been up to lately?"
"Started a business."
"What is the direction of entrepreneurship?"
"You can run multiple operating systems on a single computer."
The man who had asked, courteous and refined, heard the answer; his eyes lit up and he blurted out: "That is a very good idea, very fresh."
Nothing beats one expert meeting another: a few words and sparks fly.
On the sideline, the two parents reached an investment agreement on the spot.
This parent, or rather this decisive early investor, was elegant and personable.
He was Zhang Shoucheng, the famous Chinese physicist, a favorite student of Yang Zhenning, a researcher in condensed matter physics, and a tenured professor at Stanford University.
The other parent was Diane Greene, a gifted manager who served as an executive from the company's founding and ran it for a decade.
Later, she also managed Google Cloud for a while.
The whole exchange left the children bewildered.
The father of the other child, Diane Greene's husband, is Stanford professor Mendel Rosenblum (Professor Luo), the company's chief scientist and a world-class expert in operating systems.
At that time, the couple had just set out on the entrepreneurial road, and the company was called VMware.
More than two decades later, the company is a giant of virtualization technology.
Computer terminology usually gleams with rationality, but the word "virtualization" is like flowers in a mirror or the moon in water: illusory, with an otherworldly, mystical air that it wears to perfection.
Not surprisingly, in the early days of virtualization technology, academic research took center stage.
A handful of universities studied it, such as Stanford University in the United States and the University of Cambridge in the United Kingdom.
A handful of companies explored it: IBM, Intel, Microsoft.
The 1970s was the golden age of virtualization academic research, and a number of academic papers laid the theoretical foundation for this direction.
In science and technology, papers alone are not enough; you have to build something, and it has to be usable.
So the field waited for representative companies to be born, waiting year after heartbreaking year, from the seventies to the nineties, a full twenty years.
According to Diane Greene, then VMware's CEO, Professor Luo came home one night, talked about work, and said: "I want to revisit virtualization, introduce isolation into the operating system, and run old and new code at the same time, without having to build a new operating system."
Ideas poured out like a flood, and Professor Luo was so excited that he began building a prototype the very next day.
Soon after, that surge of action allowed Professor Luo and his students to virtualize the x86 server.
Scientists tend to have stellar circles of friends. The founders gathered five experts, drew up their formation, and launched a charge on virtualization, one that would go down in history.
They "attacked" x86, shifting virtualization from defense to offense.
By all accounts, the culture of a virtualization company differs from Internet-company culture.
When Stanford students picked among offers with their eyes closed, they said that Google's culture attracts the young, with dogs at work and pajamas in the office, while VMware is where employees can marry and raise children.
After that, academia passed the ball out wide, and PC virtualization received it.
Although virtualization was not a hard requirement in the era of small PCs, it created a new way to play with hardware and set geeks' blood boiling.
Ask cautiously: is virtualization "cheating" the CPU?
The blunt answer: the CPU has the "ability" to be deceived.
Virtualization technology is powerful, but it must guard against arrogance and impatience; the road ahead is still long.
(ii)
VMware was undoubtedly successful. But when the open source world rose, software shook off the dominance of commercial vendors, and open source geeks took the stage.
Please remember two spiritual leaders of virtualization technology, because they are crucial to the development of DPUs.
In stroke order of surname: Anthony Liguori, American; Zhang Xiantao, Chinese.
In 2001, at the Computer Laboratory of the University of Cambridge in the United Kingdom, Professor Ian Pratt led several doctoral students on a now well-known virtualization project, the Xen project.
Xen is pronounced a bit like "then" but without the tongue between the teeth, so it comes out closer to "zen".
Xen has a large, dynamic development community that has profoundly impacted cloud computing, virtualization, and security.
Two years later, the first stable version of the Xen x86 virtual machine monitor was released.
In 2004, Zhang Xiantao was studying for a doctorate at Wuhan University. After years of hard study with no shortage of effort, he had strong technical skills and even stronger hands-on ability. A technical ace need not worry about where to go, but Zhang Xiantao was still a little worried, because openings in virtualization technology were too few.
For a PhD, diligence matters, and so does a good doctoral supervisor.
Zhang Xiantao's doctoral supervisor was the world-renowned cryptographer Qing Sihan. Zhang must have spent a great deal of luck to meet such a mentor.
He said kindly to Zhang Xiantao: "Virtualization is a research direction. I have a collaboration with Intel; go there and intern first. Don't worry, I will help arrange it. The rest depends on you."
In hindsight, Professor Qing Sihan sent Zhang Xiantao to intern at a company entirely in the student's interest. (Some doctoral supervisors are unwilling to let their students work outside; three thousand words are omitted here.)
So Zhang Xiantao started as an intern at Intel and stayed an intern for three years. Time flew, his skills soared, and he officially joined Intel in 2008.
He probably did not expect the job to last nine years.
Pan the camera to IBM in 2002, another frontier of virtualization.
A college student named Anthony Liguori interned at IBM throughout his time at school, 20 hours a week for four consecutive years, through wind, frost, rain, and snow, never stopping.
In 2006, Anthony joined the IBM Linux Technology Center as a software engineer and stayed for seven years.
When a person finds a technology they truly love, then pours in time and patience at any cost and polishes it wholeheartedly, the harvest is more than a usable skill.
More importantly, it is a reborn self.
Perhaps one evening, as red light stained the sky, Zhang Xiantao was looking at the clouds and Anthony was looking at the clouds too, and perhaps for a moment each realized that he would spend a lifetime with this technology of the very poetic name.
Such are the causes and conditions of this world.
(iii)
In the jianghu of chips there are people, there is wine, there are stories, and there are voices that criticize Intel.
In the era when the PC was king, virtualization was not mainstream, and the Intel x86 instruction set was very unfriendly to virtualization.
Intel had to carry that blame, and it carried it until 2005.
Only that year did Intel turn friendly, adding extended instructions to the CPU.
That heavy blow turned the tide for virtualization.
In November of that year, Intel announced that its products supported hardware virtualization (VT-x, VT-d), and AMD followed suit with its own hardware virtualization support (SVM).
It was late in coming, but the technology was hardcore enough, and building it was a remarkable feat.
But Intel's "crime" was also obvious: inefficiency.
VMware was quick to slam Intel.
Imitating Luo Binwang's famous proclamation against Wu Zetian, it produced a denunciation of its own and went to war with Intel.
It accused the virtualization instruction-set extensions of being inefficient, with performance not yet as good as its own in-house technique (in this case, VMware's "binary translation").
Blaming others while taking the chance to praise yourself: nothing wrong with that.
The incident also offers a glimpse of VMware's standing in the jianghu: to criticize others, you need a straight back.
After this change, virtualization had full fire support from the CPU makers, as if someone had pressed the double-speed play button.
This was the first time virtualization's shortcomings were rescued.
Intel's work was not in vain; it licked its fingers to count the banknotes, its heart in bloom. It supported Xen exactly as it supported Linux: virtualization drives the ecosystem, and everyone loves user stickiness.
The sunlight outside the window was just right, it was mealtime again, and white clouds drifted across the sky; nothing earth-shattering had happened in the world, yet cloud computing had quietly changed everything.
Amazon Cloud grabbed the first ticket to cloud computing. As soon as the clarion call sounded, open source quickly took center stage.
Xen was full of confidence. Ian Pratt, its chief scientific officer and a professor at the University of Cambridge Computer Laboratory, said: "Microsoft is on its way to catching up with us."
And it was true: Microsoft's product was inspired by Xen.
For a time, Xen was everywhere: Red Hat, Sun Microsystems, SUSE Linux.
For a moment, it seemed that anything new would sprout from Xen.
Xen was surrounded by applause, and even the cloud giant extended an olive branch. In 2006, Amazon Cloud's EC2 introduced its first instance type (m1.small), built on Xen.
Xen boarded the first cloud and has since become a force in cloud computing.
Virtualization was red-hot, and Microsoft would not concede. It hastened to announce a timeline for a new virtual machine management product for Windows Server, and it had Softricity, a virtualization vendor based in Boston, in its sights.
"Don't rush, I'm buying it."
What followed showed Microsoft's true colors once again: it bought up virtualization startups several times in a row.
From the side, this also shows that virtualization technology is niche, difficult, and critical; even Microsoft was like a mute tasting bitter herbs, with a bitterness it could not voice.
In 2007, Intel released an enhanced VT-x with many added functions, and it surpassed VMware's binary translation technology.
VMware's inner monologue: criticizing Intel was hasty of us.
In 2009, research firm Gartner predicted that three technologies would dominate virtualization: VMware's ESX Server, Xen, and Microsoft's Viridian hypervisor.
Some technology predictions are closer to fortune-telling: listen, but don't take them too seriously.
The positive reviews said loudly that Xen was one of the industry standards of its time and very mature.
The negative reviews whispered that Xen's architecture was very complex, its code paths very long, and its changes to the kernel relatively large.
Xen is a very good project, but it really is too complicated. No more than 50 people in the world truly understand the Xen architecture; most stay at the level of merely being able to use it.
In addition, Xen is a traditional virtualization technology.
Xen's traditional structure gives it a particularly heavy burden. It is busy with many things: protecting physical hardware, protecting the BIOS, virtualizing CPUs, virtualizing storage, virtualizing networks, and many other management functions.
Xen was too bulky and was destined to bow out, but its emergence opened the door of the open source world to virtualization.
A dream of decades, a long wind across ten thousand miles, blew virtualization into the open source world.
In 2007, KVM, which is also open source, arrived in the mainline Linux kernel.
As a member of the Linux family (in the form of a component), this lightweight hypervisor swept the world with its nimble posture.
By then, VMware's days were no longer so rosy: it is closed-source commercial software that costs money, and cloud vendors cannot easily modify it.
Don't forget that traces of Xen and KVM can be seen in the early DPU architectures of Amazon Cloud and Alibaba Cloud.
(iv)
Ask the world whether this mountain is the highest, or if there is another place higher than the sky.
(Beat the rhythm by hand: Hey! Ha!)
The gifted young men behind the open source community hid away in the open source jianghu, practicing their skills hard.
The virtualization technology of more than a decade ago was not mature.
Looking around the world, not many people were doing virtualization research.
The most capable and respected people in an open source community share a single title: Maintainer (software maintainer). A Maintainer is also a high-level code contributor who is in charge of the design and planning of the project, understands the big picture deeply, and has unique insight into the future.
The world is sometimes big, sometimes small.
Anthony is the Maintainer of QEMU.
Xiantao Zhang is the author and maintainer of KVM cross-platform support and KVM/IA64.
QEMU was once the world's premier system emulator and virtualizer. QEMU supports Xen and KVM and is widely deployed in most cloud environments.
From Xen to KVM, Anthony and Zhang Xiantao's skills advanced by leaps and bounds, breaking through the ceiling day after day. As long as the performance challenge was solved, any approach would do.
Bear in mind that at the bottom of those systems, fixing a performance fault or a bug is very hard; one wrong move can choke the entire project so that nobody can move.
Anthony and Zhang Xiantao keep running into "coincidences".
Zhang Xiantao and Anthony crossed paths in the Xen and KVM open source communities, where both were well-known geeks whom countless open source players looked up to.
In many interviews and reports, when introducing themselves, each has said the same sentence: "I have always worked in the direction of virtualization."
The same words, one speaking in Chinese and the other in English.
Whenever virtualization security made the news, foreign technical media were keen to interview Anthony. Zhang Xiantao was one of the first participants in KVM in China.
Behind coincidence often lies inevitability.
In 2013, Anthony joined Amazon Cloud.
In 2014, Zhang Xiantao joined Alibaba Cloud.
In the past, they were the two most "powerful" people in the open source virtualization community.
Today, at the leading cloud computing vendors, they lead the DPU technology transformation.
More coincidentally still, one is responsible for the Nitro system and the other for the Shenlong (Divine Dragon) system.
The DPUs of Amazon Cloud and Alibaba Cloud both draw on the essence of open source virtualization software Xen and KVM.
Cloud computing has brought about the prosperity of virtualization technology and achieved a technological transition.
By this time, virtualization experts had gone from being the treasures of hardware makers to the treasures of cloud vendors and large Internet companies.
Talk of being "phased out at 35" is, in front of these people, pure nonsense.
Everyone realized that virtualization technology was "valuable" and flocked to it, but unfortunately, the threshold was high.
Virtualization is a very difficult technology. A virtual machine is an abstraction of a real computing environment, and many people get stuck on the word "abstraction".
Operating-system kernel work is already a sweeping-monk-level skill (the hidden master of martial arts novels), and virtualization stands alone above even that.
Zhang Xiantao said: "We used to think the operating system kernel was the hardest to understand and the most complex system software. Many very senior kernel engineers in the industry have switched to virtualization, yet they cannot understand it and cannot do it well."
Why?
Because virtualization adds another layer of abstraction, the difficulty rises sharply: hardware (functions) must be implemented in software.
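What "using software to implement hardware" means can be shown with a toy sketch. Everything in it is hypothetical and invented for illustration (the class names, the register layout, the base address); it is not how QEMU, Xen, or KVM are actually written. It only shows the idea that a guest believes it is poking a device register, while a piece of software intercepts the access and plays the part of the device.

```python
# Toy illustration only: all names and the register layout are hypothetical.

class EmulatedSerialPort:
    """A 'serial port' that exists purely as software."""
    DATA_REG = 0x0    # guest writes a byte here to "transmit" it
    STATUS_REG = 0x1  # guest reads here; bit 0 set means "ready"

    def __init__(self):
        self.output = []  # bytes the guest has "sent over the wire"

    def write(self, reg, value):
        if reg == self.DATA_REG:
            self.output.append(chr(value & 0xFF))

    def read(self, reg):
        # This toy device is always ready to accept another byte.
        return 0x1 if reg == self.STATUS_REG else 0x0


class ToyHypervisor:
    """Intercepts guest I/O accesses and routes them to emulated devices."""

    def __init__(self):
        self.devices = {}  # base address -> emulated device

    def attach(self, base, device):
        self.devices[base] = device

    def guest_io_write(self, addr, value):
        base, reg = addr & ~0xF, addr & 0xF
        self.devices[base].write(reg, value)

    def guest_io_read(self, addr):
        base, reg = addr & ~0xF, addr & 0xF
        return self.devices[base].read(reg)


def run_guest(hv, base, message):
    """Stand-in for guest driver code: poll the status bit, then send a byte."""
    for ch in message:
        while (hv.guest_io_read(base + EmulatedSerialPort.STATUS_REG) & 0x1) == 0:
            pass
        hv.guest_io_write(base + EmulatedSerialPort.DATA_REG, ord(ch))


hv = ToyHypervisor()
port = EmulatedSerialPort()
hv.attach(0x3F0, port)
run_guest(hv, 0x3F0, "hi")
print("".join(port.output))
```

Every register access in this sketch costs a detour through software. That detour is the kind of overhead that, later in the story, the cloud vendors set out to move into dedicated hardware.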
In the era when cloud vendors had no virtualization experts of their own, even Amazon Cloud looked to Intel's people to solve problems.
The story of Intel engineers rushing to the rescue of a cloud vendor more than a decade ago is almost forgotten.
In 2010, Alibaba Cloud was preparing to release its product (ECS 1.0) on May 10.
Engineers from three companies crowded into Alibaba Cloud to storm the problem. They attacked it for more than a month, and some nearly tore their hair out. With the date in sight, one hurdle still could not be cleared.
Across about 1,000 servers, after a night of running, something strange always happened: a hard disk could no longer be found.
The hard disk felt wronged too: "I dropped offline."
Backed into a corner, the team offered a responsible inference: the problem was either in the chipset or in the chip. The team roared: we have to ask Intel to send someone, and fast.
With life hanging by a thread, whoever Intel sent would be stared at relentlessly, with people itching to time him with a stopwatch.
Unexpectedly, after the Intel expert arrived on site, he looked over all the configurations, thought for a while, and then said: "Just change this parameter."
The timekeeper looked at the stopwatch: from receiving the Alibaba Cloud server logs to solving the problem took about 3,000 seconds (50 minutes).
The relief of that moment is something the Alibaba Cloud engineers present will never forget.
Grab a few silk scarves; it is time to swing our arms and square-dance.
The episode made one outside helper briefly famous inside Alibaba Cloud.
Soon afterwards, Zhang Wensong of Alibaba Cloud asked the team a question: if we want to poach the best person in virtualization, who should it be?
Who is Zhang Wensong? Founder of Linux Virtual Server, open source legend, former CTO of Alibaba Cloud, chief scientist.
Without taking their hands off the keyboard or lifting their heads, the Alibaba Cloud engineers answered at once: Zhang Xiantao.
Coincidentally.
Someone told me: "In the 2000s, before Amazon Cloud had brought Anthony in, it could not solve its virtualization problems and had to rely on Intel. Amazon Cloud, too, lacked top virtualization engineers."
Amazon Cloud, with Anthony.
Alibaba Cloud, with Zhang Xiantao.
Seeking the hermit and not finding him: where, then, were the masters of virtualization?
Deep in the clouds, whereabouts unknown: they had gathered at IBM, Intel, and Red Hat.
According to well-informed sources, around 2008 the virtualization team at Intel's Shanghai office numbered about a dozen people. After cloud computing made virtualization popular, the whole world came recruiting.
Since then, many virtualization specialists have remained in the United States to this day.
A person fights for pride as a Buddha fights for a stick of incense. Why do cloud vendors grit their teeth and pounce on chips?
The answer: only the one who suffers knows the pain.
Zhang Xiantao's earnest tone was impressive: "Even if it were not the Shenlong team, another team at Alibaba Cloud would make the DPU."
Everyone knows how large cloud vendors' server fleets are now. As scale expands and users grow, the desire for a DPU becomes urgent.
Hundreds of thousands of servers, day after day, waiting to be fed.
In his mind, Anthony must have asked himself about the nature of the DPU many times:
"To get a better product, we have to design the hardware; we have to design a hardware platform dedicated to virtualization. Not generic software, not generic hardware."
Looking back, there was no way back. In the best window for technological change, the DPU emerged. Customizing hardware acceleration with a DPU became the most correct direction.
(v)
Don't say I didn't warn you: virtualization for cloud computing is very different from the virtualization of earlier generations.
Earlier generations of products and the DPU are separated by a bottomless chasm; leap across, and the road to heaven lies ahead.
The question is, how to jump?
Since 2012, the Amazon Cloud team, especially EC2 Virtualization, had been thinking:
The "super administrator" called the hypervisor needed to be a little bolder and a little more capable. So the question was: could the world build a better hypervisor than a pure software architecture allows?
As far as I can find, this is the earliest point, mentioned by Anthony in a foreign media interview, at which Amazon Cloud's idea of a DPU germinated.
But at that time there was not yet even a shadow of Nitro.
Later, the Nitro System came to public attention through a well-known acquisition.
The acquired company, Annapurna Labs, is named after the peak Annapurna and has research and development centers in both Israel and the United States.
To mountaineering enthusiasts, the name is very familiar.
It is Annapurna, one of the ten highest peaks in the Himalayas.
Its horn is sharp, its ridges straight, its lines crisp; covered in snow above a violently rolling sea of clouds, it beckons climbers all over the world: "Come on up~"
Coincidentally, the company's two founders, Billy and Nafea, are also mountaineering enthusiasts who would be proud to climb this peak. Although they have not reached it in person, their hearts have; they designed the horn-shaped peak into a logo and printed it on the packaged chip.
Mountaineering is individual heroism, and DPU is team-based collectivism.
Annapurna Labs was a gift from heaven for Amazon Cloud. Had domestic cloud vendors landed such a deal at the start, they would have woken up laughing in the middle of the night.
For an acquisition, money alone is not enough; a good "acquisition target" is extremely rare.
This "mountaineering enthusiast" company, in addition to mountaineering, has several unique skills.
The first is the Graviton chip, the first Arm chip from a cloud vendor.
The second is ENA, a shortcut technology for virtual machines.
(ENA, in full Elastic Network Adapter, is a network card driver that can be used in virtual and physical machines; it is an open source project published on GitHub.)
This technology uses four ounces to move a thousand pounds: the virtual machine bypasses software (the kernel and user-space network handling procedures) and operates the hardware (the network card) directly, improving network efficiency.
The once-obscure ENA became a key technology for Amazon Cloud's network virtualization and is now part of the famous Nitro.
The cooperation went silky smooth, so just buy it; after all, the world's richest man was in charge of Amazon at the time.
In 2015, the purchase price was $350 million.
Don't look at how much was spent then; look at how much was saved later. It is a near-perfect acquisition that saves Amazon a great deal of money every year.
Because one of the DPU's specialties is that it fights very well: with a set of Eighteen Dragon-Subduing Palms, it defeats virtualization overhead with ease.
Less overhead, of course, means money saved.
The card developed by Annapurna Labs offloads not only the VPC network function but also the EBS storage network function.
This is the aforementioned "task offload" technique.
According to Brendan Gregg, Nitro's performance loss is very small (less than 1%), and its virtualization performance is close to bare metal.
Amazon's culture has a theory of "one-way door" and "two-way door" decisions. The translation is rather obscure.
"One-way door" tasks are like the series "Squid Game": most of the work is done with a gun pointed at your head.
If the mission fails, a shot rings out. Thrilling or not? Exciting or not?
"Two-way door" means that if something does not work well in one scenario, it can be moved elsewhere and may still be useful; the effort is not wasted, the KPI is saved, and everything is negotiable.
The DPU is dedicated, and "dedicated" means it is "useless" anywhere else.
When the Nitro system was being developed, the gun was a few millimeters from the head.
Failure and success were a hair's breadth apart.
Describing those difficult years of development, the R&D team sounded like liberal arts students, using four adverbs in one breath.
They said: "This time we made the decision methodically, cautiously, slowly, and thoughtfully."
Those in the know understand that this was no ordinary task; its requirements exceeded the capabilities of traditional virtualization technology. Breaking with tradition means being reborn from the fire.
The R&D team wrote on its technology blog: "Only innovation could do it, but we did not rush into decisions. The whole journey of exploration lasted five years; we proceeded carefully and iteratively, verifying at every step that we were heading in the right direction."
In 2013, the Amazon Cloud R&D team launched the first Nitro offload card (with the C3 instance type), offloading network processing to hardware.
In 2014, EBS storage was offloaded to hardware (with the C4 instance type), and for the first time the R&D team worked with a company called Annapurna Labs.
The Nitro R&D team described the timeline: "In 2017, we offloaded the last components, including the control plane and the remaining I/O, and we introduced a new hypervisor with the full Nitro system in the C5 instance type."
What the code looked like, no one can recall now, but the engineers still remember how they felt:
"This was money poured in, body and mind exhausted, a commitment fulfilled, an incredible moment. When the Nitro system was launched, five years of hard work had become a rare thing in this life."
What does Nitro bring to the Amazon Cloud?
Nitro's iterations have pushed the EC2 product family at the core of Amazon Cloud to evolve toward larger, faster, more secure, more stable, more varied, and more cost-effective instances.
The Nitro system gives Amazon Cloud the ability to deliver a cloud with 100 Gbps enhanced Ethernet networks, supporting higher throughput or network-constrained workloads such as HPC applications.
With the Nitro system, virtualization functions are offloaded to dedicated hardware, breaking down EC2's architecture into smaller chunks. Assembled in many different ways, these blocks provide the flexibility to design and deliver EC2 instances quickly, with increasing compute, storage, memory, and networking options.
Werner Vogels, CTO of Amazon Cloud, once said: "At Amazon Cloud, 90% to 95% of new projects come from customer feedback, and the remaining 5% are innovative attempts made from the customer's point of view."
The Nitro system is one of these projects. It was born in 2013, matured in 2017, is still evolving, and reached its fifth generation in 2021.
(vi)
The most important point: the Amazon Cloud technology team saw it, and the Alibaba Cloud DPCA team saw it too.
Anthony saw it, and Zhang Xiantao saw it too.
Moving traditional virtualization technology directly into cloud computing is deeply flawed; after all, it was not designed for cloud servers.
Spend your time on the most thought-provoking issues.
Around 2016, Dr. Zhang Xiantao was thinking about the same question every day: What kind of virtualization technology is suitable for cloud computing?
The shortcomings of traditional virtualization in the data center (that is, all the problems of performance, resources, and isolation) had to be solved at the root.
The "Shenlong (Divine Dragon) system" in his mind slowly became clear.
During that year, Dr. Zhang Xiantao traveled frequently and quietly between Beijing and Hangzhou, intending to persuade a number of big-name chip architects to join Alibaba Cloud.
There is a sentence that was very moving then, and is even more exciting now that it has come true:
"The outside world can't understand the determination of Internet companies to do DPU, this thing is absolutely unprecedented, it can change the most core technology in cloud computing."
What technical value does the DPCA chip bring to Alibaba Cloud?
Zhang Xiantao believes that the first is solving the complete isolation of CPU and memory. There are two levels of isolation here: isolation for security and isolation for performance.
The second is that the I/O path is the most prone to security vulnerabilities. QEMU, the emulator carried over from traditional virtualization, was completely obsolete by the time of the first-generation Shenlong chip.
"Obsolete" here covers two points. First, the code is open source and visible to everyone. Second, it has many security vulnerabilities, and virtual machine escapes occur from time to time.
In the world of the public cloud, before you can finish saying the words "virtual machine escape", a crowd will have pounced to cover your mouth.
Virtual machine escape = Absolutely not allowed.
The DPU solves the performance problem, and it also solves the security problem.
From the outset the Shenlong chip had a clear design: rather than solving the problem with multiple cards, it emphasizes all-in-one, implementing a variety of functions on a single card, which lowers complexity and improves stability.
The two front-runners solve the same problem with different implementation ideas.
Foshan has its Shadowless Kick; Shenlong has its Shadowless Blade.
One key to a DPU is "where to cut" and "how to cut", and the answers are full of mystery and Zen.
This recalls the parable of Cook Ding butchering the cow. To answer where the bone is, where the meat is, and where bone and flesh join,
you have to know the structure of the whole cow; the rest is feel, with the knife technique in the mind and in the muscles.
And that is still not enough; the problem is that every cloud vendor's software is different.
How to deal with the software interface of distributed storage and distributed network?
Which should be placed in the control path?
Which ones are placed in the data path?
If you don't understand virtualization, you don't know how to cut, or the performance is not good after cutting.
The DPU team is angry on the surface, and the heart is suffocated, who broke the problem?
Or maybe some DPU teams haven't seen where the cows are.
DPU this thing, light hardware ideas, or light software ideas, will definitely have big problems.
At this point in the story, virtualization knowledge alone is not enough, and we must turn to another of Zhang Xiantao's experiences at Intel.
Shanghai Hongqiao is a famous transportation hub, and housing prices around it have always been sky-high.
In 2005, Zhang Xiantao had just arrived at Intel as an intern. His salary was low and his wallet thin. Looking for a place in the greater Hongqiao area, he settled on Moutai Road near Xianxia Road, in an old residential compound called Tianshan Five Villages.
Housing prices in greater Hongqiao forced Zhang Xiantao and a senior schoolmate to share one room. Two single beds squeezed into it made things cramped enough. Little did he know that tighter squeezes lay ahead.
As soon as he joined Intel, Zhang Xiantao's stress level went through the roof.
Why? He found that his six years of studying computer science seemed to have been for nothing: good grief, he could not understand what Intel's experts were saying. The reason was that what they talked about involved expertise from inside the chip.
A school that can dominate an era, sitting at the top of the semiconductor industry chain, naturally holds many secret manuals.
After a few days of bewilderment, Zhang Xiantao's refusal to admit defeat kicked in.
On the advice of the masters, he headed straight for the scripture library to find the treasured text.
The System Development Manual is the kind of book that, on first reading, you are guaranteed not to understand.
In principle, a computer's operating system is written according to it. For example, Intel's 64-bit processor uses IA-64, and the accompanying Manual runs to several volumes.
The Manual cheats no one, either: each volume is as thick as a brick, and it dares you to finish it.
At night, with his senior schoolmate asleep, Zhang Xiantao did not dare turn on the main light. He felt under his pillow and pulled out a flashlight. So everything, coughs and farts included, stayed under one quilt as he read the Manual by flashlight.
The black roof tiles of the old compound blended into the night. Scattered lights flickered on each floor, showing through the square window frames; the light in Zhang Xiantao's room came from under the quilt.
Once he had started, he discovered that the pain came in layers.
He read the Manual every day, and he also had to study the operating system's kernel code. To learn why a line of code was written the way it was, he had to look up the answer in the programming manual. Even that was not enough: there was the code for Linux and Xen to read as well.
And when he did not understand? Intel also had a "senior helps junior" mechanism, a little like the top student in a red scarf helping classmates: if you don't know, ask your senior.
A software programming manual in his left hand, a hardware programming manual in his right, plus the Linux kernel code, all worked through line by line.
And when he still did not understand? He went to the American engineers for advice.
A bite of programming manual dipped in a few lines of Linux kernel code became Zhang Xiantao's daily fare.
Day after day, his understanding of the CPU and the operating system grew deeper and deeper.
At Intel, Zhang Xiantao learned a little-known fact.
Before any chip "walks" out of Intel's doors, internal employees may have had their hands on the "unreleased chips" three to five years in advance.
Engineers have to exercise every new feature of the CPU with software.
To put it bluntly, the chips in hand have not yet entered mass production. They will have all kinds of problems, and you have to work out whether each "problem" comes from the software or the hardware.
Without that depth of understanding, you would never dare to suspect that the CPU itself is at fault.
A DPU technical lead needs to understand chips, chipsets, the PCIe bus, operating systems, and virtualization, to the point of finding things with the lights off as easily as with the lights on.
Judgment that looks as effortless as drifting clouds and flowing water is formed silently, day after day, year after year, like the volcanic ash that falls from morning to night in the hot wind after an eruption, burying every technical difficulty.
Wake from the ash, and a reshaped new world comes into view.
Deploying the DPU is tantamount to replacing the windproof skin of a high-speed train while it is running, or the waterproofing of a submarine while it works in the deep sea.
From 2017 to 2021, both Amazon Cloud and Alibaba Cloud entered a new world: a virtuous cycle of DPU product iteration.
In the summer of 2021, Dr. Zhang Xiantao said to me: "Before, no one believed that Internet companies needed chip technology. Now, everyone believes it."
(vii)
The referee blew a long whistle, and the announcer's magnetic voice rang out: Dear viewers, this is the data center arena, the IaaS-layer final, the last match for cloud computing infrastructure.
Once an excellent DPU arrived, this round of the battle among domestic cloud computing vendors at the IaaS layer was declared over.
The cloud computing vendor that developed the DPU said: "I'm showing my cards. I win."
Just five years ago, even if a cloud computing vendor had posted an opening for chip experts on a job site, who would have dared to go? To do what? Even the most senior HR staff were baffled by the job description; they had never dealt with chip people.
The software development cycle is so fast, the hardware development cycle so slow. Everyone said this looked like a relationship that would not last.
The veteran chip companies turned their heads, the corners of their eyes full of doubt (read: contempt).
Cloud vendors are only good at software; how could they take on chips?
The scenarios cloud vendors face are extremely complex; how could chips handle them?
Who set this problem? It is so hard.
The problem is as complex as problems get, and the need is as rigid as needs get.
Apologies for the belated primer: a DPU is a dedicated chip for servers in the cloud.
This sentence has two key phrases, "servers in the cloud" and "dedicated chip".
First, servers in the cloud.
A cloud server is somewhat like a public bathhouse: it can be used by one person or shared by many, and the trouble comes from the sharing.
When a public bathhouse is shared, partitions are best. If I can see you and you can see me, there is no modesty (read: security) left.
If Coca-Cola and Pepsi were on the same cloud and could read each other's documents, they would fall out on the spot and not even stop to pick up the soap.
So what is to be done?
The answer: plug in a DPU, and plug one into every server. 100,000 servers, 100,000 DPUs. Antivirus software protects security with software; one of the DPU's functions is to protect security with hardware.
On security, hardware beats software; there is no need to belabor the point.
Second, dedicated chips.
At the mention of dedicated chips, people who make money mining and trading cryptocurrency get excited and rush to say: "I know this best." Different cryptocurrencies call for different mining machines, and the better matched the machine, the more profitable the mining.
The whiff of money tells us: leave specialized work to specialized chips.
There is still controversy now, but in the future it will be clear that the DPU is standard equipment for cloud computing.
Moore and Dennard, two elder gentlemen, ruthlessly pointed out the "helplessness of reality", and the CPU became the most expensive "wage worker".
So the DPU, as dedicated hardware, must not only provide security but also lighten the CPU's load.
From several streets away you can hear the DPU nagging: "Oh, CPU, my dear ancestor, put that down! How dare you touch this? You must not waste your resources on network and storage loads."
The CPU said: Save the children. My life is too hard.
(The CPU cried for help because it was overwhelmed: handling a large number of upper-layer applications, maintaining the underlying software infrastructure, and dealing with all sorts of special IO protocols.)
Take the "burden" off the CPU, and the DPU is poised to become the representative chip that shoulders it.
The CPU is also delighted by the DPU's arrival: if you can do it, you go ahead.
Indeed, some people praise the DPU as the "third" main chip after the CPU and the GPU.
But do not let all the flowers and applause mislead you about what the DPU can do.
The CPU sits firmly on the headliner's throne: a CPU can stand in for a DPU, and a CPU can also stand in for a GPU, but the reverse does not hold.
A CPU can do what a DPU does; however, a CPU is far more expensive than a DPU. The ox-cleaver costs too much, and whoever is killing a chicken is naturally reluctant to use it.
Cloud vendors rely on virtualization technology to let everyone "bathe together". Virtualization is good, but it brings a pile of "bad things" with it, such as performance loss. Some people even compare this loss to "paying taxes", and the taxes, of course, add up.
The loss is like having half the water wasted in the pipes before the bath has even begun, with no time left to rinse off the soap suds.
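To put a rough number on the "tax" metaphor, here is a minimal Python sketch. The 64-core server and the 8 cores spent on network, storage, and hypervisor work are illustrative assumptions, not figures from this article or from any vendor, and the function names are hypothetical.

```python
def sellable_cores(total_cores: int, overhead_cores: int) -> int:
    """Cores left for tenants after the host reserves some for virtualization work."""
    if not 0 <= overhead_cores <= total_cores:
        raise ValueError("overhead_cores must be between 0 and total_cores")
    return total_cores - overhead_cores


def tax_rate(total_cores: int, overhead_cores: int) -> float:
    """Fraction of the server consumed by the virtualization 'tax'."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    return overhead_cores / total_cores


# Hypothetical server: 64 cores, 8 of them busy with network/storage/hypervisor work.
before = sellable_cores(64, 8)  # the tax is paid on the CPU
# Assumption: offloading to a DPU frees all 8 of those cores.
after = sellable_cores(64, 0)
rate = tax_rate(64, 8)
```

Under these assumed numbers, 56 cores are sellable before offloading and 64 after, a tax of 12.5%. Real overheads vary with workload and are not stated in the text.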
The harder the technical problem, the more excited the geeks become, unconsciously raising the little leather whips in their hands.
Virtualization is the essence of the DPU. Its history is almost as old as the computer itself, and it is one of the greatest ideas in the history of computer science, one that gave rise to the great technology and market of cloud computing.
Its core is "creating illusions" and "hiding details": giving the upper-layer application an illusion, and reducing the complexity of using lower-layer resources.
The operating system we use every day also embodies the virtualization idea: it virtualizes hardware resources.
On a PC, it "turns" computing cores into processes and "turns" storage media into a file system.
In the gun smoke of cloud computing, the camouflage-painted ammunition depot of virtualization can no longer stay hidden.
(viii)
So much for keeping a low profile: who expected the DPU to break straight through the dimensional wall and have viewers posting "fire tongs Liu Ming" (a pun on "leaving my name before this blows up") in the bullet comments.
Some were already building DPUs before the DPU caught fire. They are Alibaba Cloud's "Shenlong chip" and Amazon Cloud's "Nitro System".
Both, excellent.
Not only were they built, they were deployed at scale.
Not only were they deployed at scale, the payoff in cloud scenarios has been huge.
Alibaba Cloud's technical team is the best among domestic cloud computing vendors.
Amazon Cloud has never disappointed on technology (PR and advertising spending is another story).
The teams that built their DPUs are like a mighty army crossing the river, turning the world upside down.
Since then, cloud vendors have been divided into two columns: those with DPUs and those without DPUs.
The Chinese men's soccer team smiles and says nothing: "trailing by a wide margin" describes the cloud vendors without DPUs.
Amazon Cloud and Alibaba Cloud are both revolutionaries and have chosen the same technology direction.
Yuncan Xiapu is also a proud person at the end of the world.
SA at Amazon Cloud stands for Solutions Architect. They are formidable: they show code at the drop of a hat, and almost every one of them is a match for a startup CTO.
One SA told me privately, "Simply put, a DPU amounts to delegating the different workloads of virtualization to different cards."
Note the verb "delegating"; it takes a moment of chewing before it clicks. The word is wonderfully chosen, and the technical term behind it is "offloading".
"Nitro is a card that takes the load (the hypervisor virtualization layer, storage, network) onto itself. That is, whatever affects the security, performance, and stability of virtualization is moved onto the board."
"It's not a card, it's a set of cards. Each card has a different goal."
"The Nitro System is called a system because it consists of three separate parts: the Nitro cards, the Nitro security chip, and the Nitro hypervisor."
In the past, you had to cook a couple of dishes from scratch yourself; now ready-made APIs are on hand.
Not only can you cook, you can invent your own dishes.
Learning to cook (read: innovating) is not so hard after all.
Because the Nitro System is a "box of basic building blocks" that can be assembled in many different ways, AWS can flexibly design and rapidly deliver EC2 instance types, with compute, storage, memory, and networking all available as options to combine.
Anyone who struggles with choices took one look and quickly sipped some coffee to calm the shock.
Amazon Cloud employees also say that this approach extends the microservices architecture of cloud computing into hardware, making it easier to "innovate through APIs".
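The "delegating workloads to different cards" idea described by the SA can be sketched as a toy dispatcher in Python. This is only an illustration of the offloading concept under simple assumptions: the class names, the card names, and the workload categories are hypothetical and do not reflect how Nitro or Shenlong is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """A hypothetical offload card that records the tasks it was given."""
    name: str
    handled: list = field(default_factory=list)

    def run(self, task: str) -> None:
        self.handled.append(task)


class Host:
    """Toy host: infrastructure work goes to a matching card, everything else to the CPU."""

    def __init__(self) -> None:
        # Assumed mapping: one card per infrastructure workload type.
        self.cards = {
            "network": Card("network-card"),
            "storage": Card("storage-card"),
        }
        self.cpu_tasks: list = []

    def submit(self, kind: str, task: str) -> str:
        """Route a task and return the name of the unit that handled it."""
        card = self.cards.get(kind)
        if card is not None:
            card.run(task)  # offloaded: the host CPU never sees this task
            return card.name
        self.cpu_tasks.append(task)  # tenant work stays on the CPU
        return "cpu"


host = Host()
net = host.submit("network", "forward packet")
disk = host.submit("storage", "write block")
app = host.submit("tenant", "run customer application")
```

In this sketch the CPU's task list ends up holding only the tenant workload, which is the point of offloading: infrastructure work no longer competes with customer code for CPU time.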
In 2017, the onlookers who love a spectacle were gawking at the Shenlong MOC card; little did they know that what they were gawking at was a DPU.
An employee of Alibaba Cloud's heterogeneous computing team told me privately, "MOC can be understood as a small server. As the name suggests, micro-server on chip. As of 2021, though, our external messaging uniformly says Shenlong chip, not MOC card."
Alibaba Cloud employees also said: "As for the details of the Shenlong chip, the company would prefer that we say little. Quite a few people are asking around."
On October 20, 2021, Shenlong launched its fourth generation, known in the jianghu as Shenlong 4.0.
Compared with the third-generation Shenlong, how much have the key performance indicators improved?
To name the two key ones: the key network performance indicators more than doubled, and the key storage performance indicators doubled. In a world first, DPCA 4.0 is equipped with a large-scale elastic RDMA high-performance network, and overall network latency is greatly reduced.
As a network communication technology, RDMA is not new, but Alibaba Cloud's elastic RDMA lets RDMA move from the niche field of high-performance computing (HPC) into the public cloud.
Getting RDMA to work in large-scale networks had been a problem the entire industry could not solve.
Elastic RDMA will bring substantial performance gains to cloud-native microservices, serverless computing applications, and even applications using the Netty network programming framework in Java.
In the fall of 2021, Zhang Xiantao said: "The Shenlong chip is currently the best DPU in the industry, bar none."
The DPU has all the talent and all the looks one could ask for, but it also has "two oddities".
The first oddity: Amazon Cloud and Alibaba Cloud do not offer their DPUs to the outside world.
A DPU is a dedicated chip: nobody else needs to understand it, as long as its owner does.
The second oddity: many cloud computing vendors, at the mention of developing their own DPU, say "goodbye".
Never mind that Qingyun and UCloud are publicly listed; they are still losing money.
What's more, building a DPU means, whichever way you cut it, slapping at least three hundred million yuan on the table.
(ix)
On the earthen wall at the entrance to the village, slogans are painted in red on a white background:
DPU, early to own, early to get rich.
DPU, keep it safe.
DPU, well isolated.
DPU, save a lot of money.
If you must have one and have no money to develop your own, you can use NVIDIA's DPU. In 2020, NVIDIA acquired Mellanox for $6.9 billion, and the DPU was what it was after.
Unfortunately, it is not "tailor-made": it does not fit well and is painful to use. Some experts have criticized it mercilessly, dissatisfied with NVIDIA's existing features.
The leaves on the trees turned from green to yellow, and the work orders that cloud computing vendors had submitted to Broadcom were still in the queue.
The north wind blew, the branches went bare, and the work orders were still in the queue.
Both Alibaba Cloud's and Amazon Cloud's DPUs were released in 2017.
In all the years since, has any cloud vendor caught up?
The crowd shook their heads and fell silent.
Amazon Cloud and Alibaba Cloud might say: "Forgive us, we couldn't hold it in and laughed out loud."
In a market of products that are not sold publicly, "people in the know" are urgently needed.
As it happens, one famous domestic cloud vendor went to customers to promote (read: brag): isn't a DPU just a smart network card? Our company has had one since 2012, much faster than Shenlong and Nitro.
A knowledgeable customer fired back a soul-searching question, and the scene turned into instant "social death".
"If your DPU is really that good, then why don't you use it yourselves?"
Does bragging spread from person to person?
By no coincidence, look around the jianghu and the rest of the DPU products either remain at the level of "not very usable",
or have only just groped their way to a prototype and remain at the proof-of-concept stage.
The Chinese men's football team smiled and said: sorry, we just can't break through.
Fans fired back: "After spending that much money, you want to tell us that taking part is what counts?"
More coincidentally still, someone told "Dear Data" that a number of companies have quietly been sending employees to chat with Alibaba Cloud people every day: why was this done this way, why was that interface designed like that.
The chip business runs deep, and some "inside information" always leaks from the supply chain. One cloud vendor has been copying for several years, pixel for pixel, and still has not produced a decent copy.
Worse still, as their scale keeps growing, they cannot hold out.
The cloud vendors with DPUs get hot upgrades, as happy as can be, with iteration whizzing along.
The cloud vendors without DPUs have it rough; I hear one of them has to reboot its servers once a month.
(I thought rebooting was just the liberal arts student's standard fix; don't ask me how I know.)
The DPU is the ultimate humblebrag prop on WeChat Moments.
A cloud vendor posts to Moments, wishing its peers an early success in building a world-class DPU.
On realizing that a peer really has built a world-class DPU, it silently deletes the post.
Fungible posted on Moments that in 2019 it defined the DPU.
A commenter below wrote: "The company is worth quite a lot, and the SoftBank Vision Fund has invested heavily."
Unfortunately, the product is mediocre and its understanding of cloud computing falls short, so the post cannot be given a like.
Intel could not sit still and released the IPU (Infrastructure Processing Unit) to express a different view of this "DPU" thing.
It hopes the post will rack up likes.
Cloud vendors line up in the comments to like it, but sigh inwardly: in the world of the DPU, Intel cannot "unify the jianghu" with a single decree.
(x)
When investing in DPUs, there are at least two "do not invest" rules.
First, teams that are not familiar with the needs of the cloud business.
Second, teams whose understanding of hardware-software integration is relatively shallow.
Unfortunately, in the pond of DPU investment, the water is never at its muddiest; it only gets muddier.
There are two well-known miscarriages of justice concerning the DPU. Before the DPU existed, the SmartNIC (smart network card) arrived a step earlier to lighten the networking load.
First impressions run deepest. So some people still mistakenly believe that a DPU is a SmartNIC.
A SmartNIC accelerates the network, but it solves a much smaller problem than the DPU does.
At this point the ETC automatic barrier-lifter (read: the professional contrarian) comes online: "Answer me this: isn't the DPU's most basic function that of a network card?"
Even ordinary people are familiar with the "5G" and "gigabit fiber" in the news, not to mention the industrial Internet and the Internet of Vehicles.
Demands on the network keep rising, and network bandwidth in cloud computing has moved from the once-mainstream 10 Gbps to an easy 100 Gbps.
Unfortunately, although the DPU can help with the network, it is not a smart network card.
When a product has changed dramatically, we might as well give it a new name.
Unfortunately, following the path of smart network cards, you will never reach the oasis of the DPU.
Still, talking about the DPU at the "2021 Smart Network Card Summit" is a hallmark of this particular period.
All misunderstandings are passing clouds.
"Is a smart network card the only road to the DPU? What do you make of the idea of building a smart network card first, getting it solid, and then building a DPU?"
Huang Chaobo, author of the book "Software and Hardware Integration: The Road to Innovation of Ultra-large-scale Cloud Computing Architecture" (Electronic Industry Press) and former head of chip and hardware R&D at the cloud computing vendor UCloud, sees it this way:
"At the level of functionality, it is certainly a progression from simple to complex, and in that sense the statement is correct."
What comes after the "but" is usually the real point.
"From the perspective of implementation, the statement is debatable. The smart network card path usually follows NVIDIA's approach: first the NIC, then the SmartNIC, then the SoC. Network functions are implemented in a custom ASIC (application-specific integrated circuit). Amazon Cloud and Alibaba Cloud, however, 'did not take the usual road'. At the start everything was implemented on the CPU alone, and various accelerators were added gradually. In short, this DPU evolved from CPU to DPU."
As mentioned earlier, Amazon Cloud and Alibaba Cloud share the same technical direction, but each walks its own road.
Savor that carefully: NVIDIA's technology route runs from customized acceleration toward general purpose. That is the exact opposite of the evolution at Amazon Cloud and Alibaba Cloud, which runs from general purpose toward added customization.
The other miscarriage of justice is taking the name DPU literally.
Sure enough, you can't just look at the surface.
The DPU's full name is Data Processing Unit, that is, a data processor. There has been data ever since Cybertron split heaven from earth. Can't a CPU process data? Can't a GPU? Since they obviously can, why are you the one called "data processing"?
The CPU and GPU clenched their fists, resisting the urge to deliver a big slap, and shouted: "Nobody gets off easy today."
Not to mention the Data Security Law pounding on the door: "Spot inspection. I hear you have data in here, and low-level data at that?"
At this rate, security will have to pull out the yellow warning tape; the scene is about to get out of hand.
The miscarriages of justice obscure the difficulty.
The DPU is software-defined hardware: hardware adapted to software in order to accelerate it. To understand the DPU, you have to understand a great many things: chips, system software, computer architecture, cloud computing services, and virtualization.
The success stories of the two cloud vendors also play down the difficulty.
Investors often hear people say: "Alibaba Cloud and Amazon Cloud have already built their DPUs; there is not much time left for the startups (read: the national team)."
In 2021, a string of domestic DPU companies received financing one after another.
Hold out both hands and count: Cloud Leopard Intelligence, Yisi Core, Hefei Edge Wisdom Core, Nebulas Zhilian, Qingyun Semiconductor, Dayu Zhixin, Zhongke Yishu, Core Qiyuan, Deep Storage Intelligence, and so on.
DPU start-ups exist in Beijing, Shanghai, Zhuhai and other places.
Public business registration records show that the big Internet companies have also made their moves:
Tencent invested in Cloud Leopard Intelligence.
Meituan invested in Nebulas Zhilian.
ByteDance invested in Cloud Pulse Core Union.
The positives for the DPU are many. China's cloud computing market is a multi-cloud market. For example, after industry clouds led by the telecom cloud appeared, more industry clouds such as financial clouds and logistics clouds have gradually emerged. There will even be "local clouds" and "a bureaucratic cloud". The top cloud computing vendors are not the DPU's only customers.
In addition, experts from the China Computer Federation have estimated that the number of DPUs used in data centers will reach the same order as the number of data center servers, with tens of millions of new units added every year; counting replacement of the installed base, overall demand over five years will exceed 200 million.
This is more than the demand for standalone GPU cards.
It can even be said that a server may have no GPU, but it cannot be without a DPU.
It is like Wi-Fi in a hotel: every room must have it, or the front desk phone will ring off the hook.
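The five-year estimate quoted above can be reproduced with simple arithmetic. The Python sketch below is only an illustration: the text gives "tens of millions" of new units per year and a total above 200 million, but the specific split used here (20 million new units and 25 million replacements per year) is an assumption chosen to show how such a total could arise, and the function name is hypothetical.

```python
def five_year_demand(new_per_year: int, replaced_per_year: int, years: int = 5) -> int:
    """Total units needed: newly added servers plus replacement of installed stock."""
    if min(new_per_year, replaced_per_year, years) < 0:
        raise ValueError("inputs must be non-negative")
    return years * (new_per_year + replaced_per_year)


# Assumed inputs, NOT figures from the article:
ASSUMED_NEW_PER_YEAR = 20_000_000       # "tens of millions" of new units per year
ASSUMED_REPLACED_PER_YEAR = 25_000_000  # replacement of existing stock per year

total = five_year_demand(ASSUMED_NEW_PER_YEAR, ASSUMED_REPLACED_PER_YEAR)
exceeds_estimate = total > 200_000_000
```

With these assumed inputs the total is 225 million units, consistent with the "more than 200 million" estimate; different assumptions about replacement rates would move the figure.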
As far as the eye can see, the situation looks excellent, a picture of flourishing prosperity (read: money being burned).
In fact, niche and specialized key technologies are hard to get even a glimpse of.
The essence of the DPU is to solve the many problems of applying traditional virtualization to cloud computing. Early virtualization technology was used mostly on desktop systems, and traditional desktop virtualization, carried over directly, does not work well in the cloud.
The essence of DPU design is closely tied to virtualization: it exists to solve the "bad things" (performance, resources, isolation, and so on) that come with virtualization.
Simply put, virtualization falls into four main types: CPU virtualization, memory virtualization, network virtualization, and storage virtualization. Only the DPU is the last stop, fundamentally solving the shortcomings of applying traditional virtualization in the data center.
Intel VT-x addresses only CPU and memory virtualization. Network virtualization and storage virtualization are legacy problems that have never been effectively solved, especially in cloud computing scenarios. Functionally they can be made to work, but performance, scalability, and isolation have never been handled well.
Some of the problems are solved; what about the others?
The DPU solves "the others". That is, the DPU is the last stop in fixing the shortcomings of virtualization.
The DPU was built by targeting the real pain points of hardware virtualization in cloud computing.
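The division of labor described above can be written down as a small lookup in Python. This is a sketch of the article's own claim (VT-x for CPU and memory, the DPU for network and storage), not a description of any real API; the names `VIRTUALIZATION_TYPES`, `VT_X_COVERS`, and `addressed_by` are hypothetical.

```python
# The four types of virtualization named in the text.
VIRTUALIZATION_TYPES = ("cpu", "memory", "network", "storage")

# Per the text: Intel VT-x addresses CPU and memory virtualization only.
VT_X_COVERS = frozenset({"cpu", "memory"})


def addressed_by(kind: str) -> str:
    """Return which technology the article credits with addressing this type."""
    if kind not in VIRTUALIZATION_TYPES:
        raise ValueError(f"unknown virtualization type: {kind}")
    return "VT-x" if kind in VT_X_COVERS else "DPU"


coverage = {kind: addressed_by(kind) for kind in VIRTUALIZATION_TYPES}
left_for_dpu = [kind for kind, tech in coverage.items() if tech == "DPU"]
```

Here `left_for_dpu` contains exactly the two types, network and storage, that the text calls unsolved legacy problems.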
With that much power, what technologies does the DPU involve?
Put it this way: the range of technology involved is so wide that Zhang Xiantao, head of elastic computing at Alibaba Cloud, said: "For the Shenlong chip, I mobilized nearly all of Alibaba Cloud's first-class experts."
Perhaps in the eyes of some CPU makers, building a DPU is relatively simple. If I can build something as complex as a CPU, taking on the DPU is a dimension-reduction strike.
But is a DPU really easy to build?
If you don't understand virtualization, don't understand system software, and don't understand cloud computing scenarios, yet want to build a DPU knowing only chips, then five big black characters are all yours: the ignorant know no fear.
The DPU brings together the best of many schools of technology: software, hardware, compute, networking, storage, virtualization, security, accelerators, drivers, frameworks, and applications, with their essences intertwined.
Maybe one day, the DPU will call the CPU.
Finally, let us stand up and applaud those who are truly technological revolutionaries.
After all, a local technological revolution can be more exciting than any ball game.
"The player is still a few steps from the six-yard box, and his teammates are shouting nearby: cross! Cross! The goalkeeper's face twitches, as if he is hesitating. Seize the chance, shoot hard from a tight angle, and the ball is in the net!"
"What are you standing there for? That was a goal. Applaud."
(End of story)
Author's aside: "Seventeen thousand original words do not come easy. Share the article before you go."