
Ant Group's He Zhengyu: "Green Computing" Is the Secret Weapon Behind Double Eleven's Carbon Reduction | Talking about Carbon

Author | Deng Yongyi

Editor | Su Jianxun

"Talking about Carbon" is an interview column launched by 36 Carbon, a 36Kr channel, on the theme of "double carbon and ESG". We seek out the key people behind the "double carbon" efforts of large companies, CEOs of high-profile enterprises, and representatives from academia and industry for in-depth discussions of carbon neutrality strategy, sustainable development, corporate social responsibility, and related topics.

The following is the third issue of "Talking about Carbon". 36 Carbon spoke exclusively with He Zhengyu, Chairman of Ant Group's Infrastructure Committee and President of its Trusted Native Technologies Division. Since 2019, He Zhengyu and his team have developed a "green computing" technology system, which was applied to e-commerce business for the first time during Double Eleven in 2021. It greatly improved the utilization of computing resources: over a single Double Eleven, it saved Ant Group 640,000 kWh of electricity and cut carbon dioxide emissions by 394 tons, equivalent to keeping 30,000 gasoline-powered cars parked for a day.

With the metaverse and 5G on the rise, technology companies' demand for computing power will grow exponentially, which means enormous energy consumption. In this issue of "Talking about Carbon", He Zhengyu shares how Ant Group's "technology emission reduction" idea took shape, how it was put into practice, and what its commercialization prospects are.


Like most Internet technology companies, Ant Group faced an exam question when the "double carbon" wave arrived: its own business is not carbon-intensive, so if it wants to cut carbon, where should it cut, and how?

He Zhengyu was handed this problem.

He Zhengyu has a textbook academic résumé: he was admitted to Beijing Institute of Technology at 15 and earned a doctorate from Georgia Institute of Technology. He then worked on Google's kernel team, where he founded and led the open source project gVisor and became a rising star in infrastructure technology.

In 2018, He Zhengyu returned to China and joined Ant Group, where his first assignment was the company's technical architecture upgrade. He led the creation of the "Trusted Native Technology Department", which specializes in infrastructure technology. Its first goal was to upgrade the technical architecture, allocate computing resources more rationally, and improve the overall operating efficiency of Ant's infrastructure.

In 2020, China announced its "carbon peaking and carbon neutrality" goals. The teams working on "green computing" at Ant Group, including He Zhengyu's, took fuller stock of their work since 2019 and used it to set the company's carbon neutrality roadmap. Today, He Zhengyu serves as President of Ant Group's Trusted Native Business Unit and Chairman of Ant Group's Infrastructure Committee.

Saving energy and cutting carbon through IT has precedent in the industry, and "green computing" is a new concept the industry has proposed in response. Although it has no precise definition, the industry generally agrees that its core is improving resource efficiency.

In practice, green computing has two main aspects. At the physical level, it means lowering the data center's PUE (power usage effectiveness, a core energy metric defined as the data center's total energy consumption divided by the energy consumption of its IT equipment). At the computing level, it means allocating computing resources rationally.
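The PUE ratio described above can be written as a small calculation. This is a minimal sketch: the function names and the sample energy figures are made up for illustration and are not Ant Group data.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total data center energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value: float) -> float:
    """Fraction of total energy spent on non-IT overhead (cooling, lighting, ...)."""
    return 1.0 - 1.0 / pue_value


# Hypothetical figures: 1,500 kWh drawn by the facility, 1,000 kWh by IT equipment.
example = pue(1500.0, 1000.0)
print(f"PUE = {example:.2f}, overhead share = {overhead_share(example):.1%}")
```

A PUE of 1.0 would mean that every kilowatt-hour entering the building reaches the IT equipment, which is why the ratio cannot usefully go lower.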

A number of technologies developed in He Zhengyu's Trusted Native Technology Department have been gathered into the "green computing" system, together with research results from the database and technology risk departments. During Singles Day 2021, the "green computing technology system" was applied at scale in e-commerce business for the first time. Ant's computing resources became "tidal lanes", allocated to different tasks according to the time of day. At lunchtime, for example, when business is quiet, computing resources can go to jobs with low real-time requirements; at midnight, they can fully support the peak in payments and order placement. The time needed to shift resources fell from several hours to about one minute.
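The "tidal lane" idea can be illustrated with a toy allocator that splits one shared pool between online and offline work by hour of day. This is only a sketch under stated assumptions: the `Allocation` type, the `tidal_allocation` function, and the hourly shares are hypothetical and do not describe Ant's actual scheduler, which the article says also relies on demand prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    online_cores: int   # latency-sensitive work such as payments and orders
    offline_cores: int  # batch work with low real-time requirements


def tidal_allocation(hour: int, total_cores: int,
                     online_share_by_hour: dict[int, float],
                     default_online_share: float = 0.5) -> Allocation:
    """Split one shared pool between online and offline work for a given hour."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    share = online_share_by_hour.get(hour, default_online_share)
    online = round(total_cores * share)
    return Allocation(online_cores=online, offline_cores=total_cores - online)


# Hypothetical schedule: lunchtime is quiet, midnight is the payment peak.
SCHEDULE = {12: 0.2, 0: 0.9}

for h in (0, 12, 18):
    print(h, tidal_allocation(h, total_cores=1000, online_share_by_hour=SCHEDULE))
```

The same physical cores appear on both sides of the split at different hours, which is the point of the lane analogy: capacity is reassigned rather than duplicated.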

Serving different workloads with the same computing resource at different times of day greatly improves utilization. Last year, over a single Double Eleven, green computing saved Ant 640,000 kWh of electricity, equivalent to the annual residential electricity consumption of 820 people, and cut carbon dioxide emissions by 394 tons, equivalent to keeping 30,000 gasoline-powered cars parked for a day.
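The equivalences quoted above can be unpacked with simple arithmetic. The sketch below only divides the article's own figures by one another; the derived values are implied ratios, not numbers published by Ant Group, and the variable names are illustrative.

```python
SAVED_KWH = 640_000        # electricity saved over one Double Eleven (from the article)
AVOIDED_CO2_TONNES = 394   # carbon dioxide avoided (from the article)
PEOPLE = 820               # people whose annual electricity use this equals (from the article)
CARS = 30_000              # cars kept parked for one day (from the article)

# Implied grid emission factor in kg CO2 per kWh.
emission_factor = AVOIDED_CO2_TONNES * 1000 / SAVED_KWH
# Implied annual electricity use per person in kWh.
kwh_per_person_year = SAVED_KWH / PEOPLE
# Implied emissions of one car driven for one day, in kg CO2.
kg_co2_per_car_day = AVOIDED_CO2_TONNES * 1000 / CARS

print(f"{emission_factor:.3f} kg CO2/kWh")
print(f"{kwh_per_person_year:.0f} kWh per person per year")
print(f"{kg_co2_per_car_day:.1f} kg CO2 per car per day")
```

Under these figures the implied emission factor is about 0.616 kg of CO2 per kWh, roughly 780 kWh per person per year, and about 13.1 kg of CO2 per car per day.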

2021 Ant Group Double 11 Emission Reduction Report

Compared with three years ago, Ant Group's server utilization has risen more than twofold, which, for the same scale of business, is equivalent to halving the electricity consumed per unit of computing power.

Achieving this level of emission reduction in such a short time depended on every step of the architecture upgrade Ant began in 2019.

"Over the past three years, Ant's technical architecture upgrade has mainly done two things: first, moving the business onto a cloud-native architecture; second, establishing a unified scheduling center to schedule all computing resources," He Zhengyu recalled to 36 Carbon.

Ant had previously completed its migration to the cloud, where the first priority was simply to get business software running and "available" on the cloud. As the business grew rapidly, however, internal computing resources became scattered: departments with heavy computing needs, such as business and AI teams, each had their own technology stack, and the problem of reinventing the wheel became increasingly obvious.

Upgrading to a cloud-native architecture therefore amounted to tearing down the underlying runtime environment and rebuilding the system around the cloud, while ensuring security and trustworthiness. Developers no longer build software first and deploy it to the cloud afterwards; they work directly on the cloud from the start of development.

On this foundation, the core technologies developed by the Trusted Native Technology Department have significantly improved computing efficiency. Ant's self-developed secure container technology is akin to letting workloads from Android and iOS devices coexist in one environment while keeping them isolated, so that each runs independently. Even when CPU load exceeds 95 percent, computing efficiency is rarely affected.

Behind the technology-driven emission reduction, Ant Group's organizational mechanisms and restructuring also provide support.

On the organizational side, Ant Group first exercises control through financial discipline: each year's budget is set according to actual resource usage in the previous year. Business and technology teams then determine that year's technology investment and emission reduction targets according to their needs.

Through the cloud-native upgrade, Ant has also consolidated the allocation of computing power under the CTO line, and set up departments such as the Trusted Native Technology Department to tackle the related infrastructure technologies.

"In the past, departments with relatively large computing needs would have their own technology stacks and servers, which inevitably left resources idle. After going cloud native, Ant consolidated computing power under the CTO line for deployment, which technically eliminated a lot of waste and made green computing possible," He Zhengyu told 36 Carbon.

In terms of the carbon neutrality timeline, it has been only a little over a year since China announced its "double carbon" goals, and the technology giants' road to carbon reduction has only just begun.

After the "double carbon" goals were proposed, Tencent released its carbon neutrality goal and roadmap in February this year, setting out its first-stage tasks: saving energy, increasing its use of green electricity, and exploring new technology routes and business models through in-house research and investment. Building on its existing ICT business, Huawei established "Huawei Digital Energy" in June last year and now offers its green data center, base station, and other solutions externally.

In March 2021, Ant announced its carbon neutrality target, committing to net zero emissions by 2030 (Scope 1, 2 and 3), and in April announced its own carbon neutrality route.

In terms of approach, beyond the mainstream measures of lowering data center PUE, purchasing green electricity, investing in green technology, and offsetting with carbon sinks, Ant's path to carbon neutrality places more weight on technology-driven emission reduction. In this year's carbon neutrality report, Ant specifically noted that green computing cut nearly 30,000 tons of carbon for the company in 2021.

Ant is also sharing its green computing technology through open source, patents opened free of charge, and published papers. One part, the elastically scalable distributed database, is a step ahead and has already been commercialized: the OceanBase database helps customers with matching needs improve efficiency and save electricity, and now serves more than 400 customers.

There's a lot more to do. He Zhengyu said that Ant's goal is to catch up with world-class emission reduction practices within 3-5 years.

Foreign tech giants started on carbon neutrality earlier. Google announced its own carbon neutrality as early as 2007, and has since built its own data centers and launched energy-saving products such as the Nest thermostat. In terms of technical architecture, Google has built an integrated foundation spanning storage to computing, which has also greatly reduced energy consumption.

By 2021, Google had taken its net zero goal a step further: by 2030, its data centers are to "operate with carbon-free energy 24/7", which means green energy will be used throughout the entire operating cycle.

For the 3-5 year target, He Zhengyu said that Ant mainly bets on breakthroughs in basic software technology. He believes that the potential of technology in green computing is far from being fully exploited.

One piece of evidence: data center PUE reductions are quickly approaching their limit, leaving little room for further cuts, whereas through rational scheduling of computing resources, Ant's resource utilization has more than tripled compared with three years ago, and this year's expected gain is also considerable. He Zhengyu believes technology-driven emission reduction still has plenty of dividends to offer, and that basic technology R&D generally enjoys a latecomer's advantage, so the team will solve problems faster and faster.

Next, the Trusted Native Technology Department is also reaching into new areas: this year, He Zhengyu's team will focus on problems such as storage resource pooling. "Right now we are scheduling the upper layer, computing power, but storage sits lower in the stack and is harder to migrate. Once computing and storage are connected, the efficiency of business operations will improve qualitatively, which will further reduce energy consumption. Our goal this year is to increase resource utilization by about 15 percent," he told 36 Carbon.

He Zhengyu

The following is an edited transcript of the conversation between 36 Carbon and He Zhengyu, President of Ant Group's Trusted Native Business Unit and Chairman of Ant Group's Infrastructure Committee:

36 Carbon: China's "double carbon" target was proposed in September 2020, and Ant Group announced its own carbon neutrality target six months later, committing to net zero emissions by 2030. What is the background behind this goal?

He Zhengyu: We declared carbon neutrality in response to the national "3060" carbon neutrality target. On the surface, we announced our target in March 2021 and moved quickly. In fact, Ant had been exploring and practicing in this direction much earlier. On technology-driven emission reduction, for example, our exploration dates back to 2019.

In 2019, we already served hundreds of millions of users. Once volume reaches a certain scale, a company's pursuit of high-quality development becomes inevitable, and we had anticipated that. So we began upgrading the technical architecture and going fully cloud native, which later proved an important opening for our "technology emission reduction" work. At the time, we knew the most important direction was to improve resource utilization and consume energy more efficiently.

36 Carbon: What was Ant's level of energy consumption in 2019?

He Zhengyu: From a carbon reduction perspective, our business is rooted in financial technology. At the outset, our energy consumption was similar to that of the financial industry: financial services place high demands on continuity and availability, and often trade extra energy consumption for availability.

So the goal we set then was to benchmark against the emission reduction practices of the most advanced foreign technology companies, such as Google, which has been reducing emissions since 2009. From when we began reducing emissions until now, the energy efficiency of each of our businesses has reached about twice the industry level. In the future, we also hope to catch up with the world's most advanced emission reduction practices within 3-5 years.

36 Carbon: After the "double carbon" target came out, Ant announced its own target just half a year later, which was very fast. Internally, what adjustments did Ant make to its emission reduction targets as a result?

He Zhengyu: When the country proposed the double carbon target in 2020, our architecture upgrade was already aligned with it, which gave us something of a first-mover advantage. After the "double carbon" goal came out, we took fuller stock of what we were doing, for example calculating how much energy the efficiency gains could save, and set our carbon neutrality roadmap. In fact, every year since 2019 we have reviewed that year's improvements in technical energy efficiency; it is a long-term process.

36 Carbon: Breaking the target down, how does Ant divide up the task of reducing emissions?

He Zhengyu: Under the greenhouse gas accounting system, Ant Group's emissions fall into three scopes: Scope 1 covers direct emissions from fossil fuel combustion and fugitive emissions; Scope 2 covers indirect emissions from purchased energy such as electricity and heat; and Scope 3 covers related indirect emissions in the supply chain. Our goal is to achieve carbon neutrality for Scope 1 and Scope 2 operating emissions from 2021, and net zero emissions across Scopes 1, 2 and 3 by 2030.

For tech companies, the largest energy consumption comes from data centers: electricity, cooling, and so on. Computing power is one part of this, and what the Trusted Native Technologies Department is exploring is how to reduce Scope 3 emissions through technology.

36 Carbon: The upgrade of the group's technical architecture was an important precondition for the "green computing" technology system. What did Ant Group do at the time?

He Zhengyu: Before going cloud native, Ant's technical architecture resembled that of many technology companies today. A department with relatively large computing needs might hold its own share of computing resources, such as storage and databases, and develop independently. But outside business peaks, many of those resources sat idle.

So in 2019 Ant set up middle-office departments such as the Trusted Native Technology Department to do basic technology R&D. In short, for technology-driven emission reduction we did two things: first, moving all business onto a cloud-native architecture; second, establishing a unified scheduling center to schedule all computing resources.

36 Carbon: What indicators does Ant Group use to measure emissions reduction? Compared with industry practice, which aspects does Ant emphasize more?

He Zhengyu: The indicators are multi-dimensional. We are not simply pursuing lower core energy metrics such as PUE; we also look at overall resource utilization, R&D efficiency, stability, security, and so on.

Suppose we buy an energy-saving LED lamp, which costs a little more than an ordinary incandescent one. If we never turn it off after bringing it home, it still wastes energy. What we want is to switch the lamp on and off dynamically, saving as much energy as possible without sacrificing business continuity.

36 Carbon: Can you share a specific business scenario in which energy is saved while business continuity is ensured? What key technologies were applied?

He Zhengyu: Many domestic technology companies are operations-driven, which means demand for computing resources is bound to have peaks. For example, the computing resources required for Double 11 may be 100 times the usual level, which is the most prominent problem in the domestic technology industry today.

Taking Ant as an example, roughly half of our tasks are online, and most of the other half are offline. We apply a variety of green technologies to adjust dynamically, such as online-offline hybrid deployment, cloud-native time-sharing scheduling, and AI elastic capacity.

Take our tidal hybrid deployment technology: computing resources work like tidal lanes. At noon, when everyone is eating and business demand is low, we give up those lanes to other tasks that are not latency-sensitive. We also analyze internal business data to predict future peaks in computing demand, and we keep tuning the algorithm.

36 Carbon: How do you improve the efficiency of computing resources? Where does the difficulty lie?

He Zhengyu: The key technical difficulty is whether computing resources can really be freed up when a real peak arrives. Here, making Ant's overall architecture cloud native laid a good foundation for unified scheduling of computing resources.

For core cloud components such as containers, Ant has always insisted on developing the technology in-house, which shows how important they are to the green computing system. By analogy, it is like developing a new operating system that can run Android, iOS, and others at the same time. All computing tasks run on the same machine, and even when overall CPU utilization reaches 80 or 90 percent they do not interfere with one another, which greatly improves operating efficiency.

With that precondition in place, we can schedule computing resources: run offline tasks alongside online tasks, and online tasks alongside the database, using real-time dynamic configuration driven by service requirements.
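Running offline tasks next to online ones can be sketched as an admission rule: online demand is served first, a reserve is held back for bursts, and offline tasks fill whatever remains. The function below is a hypothetical illustration; its name, the greedy smallest-first policy, and the 10 percent reserve are assumptions, and real co-location also depends on the container isolation described above.

```python
def place_offline_tasks(capacity_cores: float, online_demand_cores: float,
                        offline_requests: list[float],
                        reserve_fraction: float = 0.1) -> list[float]:
    """Greedily admit offline tasks into capacity left over by online tasks.

    Online work is always served first; a reserve is kept free for bursts.
    Returns the offline requests (in cores) that were admitted.
    """
    reserve = capacity_cores * reserve_fraction
    free = capacity_cores - online_demand_cores - reserve
    admitted: list[float] = []
    for cores in sorted(offline_requests):  # smallest first, to pack more tasks
        if cores <= free:
            admitted.append(cores)
            free -= cores
    return admitted


# Hypothetical 64-core machine at a quiet moment and at a busy moment.
print(place_offline_tasks(64, online_demand_cores=16, offline_requests=[8, 16, 32]))
print(place_offline_tasks(64, online_demand_cores=56, offline_requests=[8, 16, 32]))
```

When online demand is low the machine accepts the 8-core and 16-core batch jobs; when online demand is high nothing offline is admitted, so the online service keeps its headroom.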

36 Carbon: The industry generally approaches green computing from two directions: lowering data center PUE and allocating computing resources rationally. How does Ant view the emission reduction efficiency of each?

He Zhengyu: PUE reflects the energy consumed beyond computing itself, such as data center lighting and cooling. The industry's advanced PUE level is now around 1.1, and bringing it down to 1.0 is essentially the limit, so there is only about a 10% dividend left. Computing efficiency, however, still has a great deal of room for improvement. Over the past three years, our overall resource utilization has more than tripled.
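The comparison in this answer can be made concrete with a rough calculation. It is a sketch only, and the second function rests on a loudly simplifying assumption: that the energy needed for a fixed amount of work is inversely proportional to utilization, which ignores how server power draw varies with load. The function names are illustrative.

```python
def saving_from_pue(pue_before: float, pue_after: float) -> float:
    """Fraction of total facility energy saved for the same IT load."""
    return 1.0 - pue_after / pue_before


def saving_from_utilization(utilization_gain: float) -> float:
    """Fraction of IT energy saved for the same work.

    Simplifying assumption: energy per unit of work is inversely proportional
    to utilization (a gain of 3.0 means utilization tripled).
    """
    return 1.0 - 1.0 / utilization_gain


print(f"PUE 1.1 -> 1.0: {saving_from_pue(1.1, 1.0):.1%}")
print(f"Utilization x3: {saving_from_utilization(3.0):.1%}")
```

Under these assumptions, closing the entire gap from PUE 1.1 to the theoretical 1.0 saves about 9 percent of facility energy, while tripling utilization cuts the energy for the same work by about two thirds.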

So in tackling carbon, we must first adjust the energy mix: technology companies mostly consume electricity, so we improve by purchasing green electricity and similar measures. Then we raise computing efficiency and tune resource utilization to the optimum.

36 Carbon: Were there organizational adjustments inside Ant that accompanied the move to cloud native from 2019 onward? How did they affect the consolidation of computing resources?

He Zhengyu: There were. Ant's organization is still mainly a "large middle office, small front office" structure. For the technical architecture upgrade and technology-driven emission reduction, the infrastructure technical committee works with the finance and security teams to set targets internally: on the premise of business stability, we set energy consumption and efficiency improvement goals each year, and then decide technology investment and procurement volumes.

In implementation, the CTO line takes the lead: the computing resources of all business units are placed under the CTO line for unified planning, procurement, and configuration. We have strong incentives to save resources, and there are market-based settlement mechanisms within the organization.

36 Carbon: How does Ant strike a balance between the cost of reducing emissions and ensuring business continuity?

He Zhengyu: On emission reduction, Ant has a fairly strong structured control process. The first control is financial: if machine consumption or utilization falls short of target, new purchases may not be approved, so the mechanism itself provides a guarantee.

Then, on the technology side, we estimate how many resources business growth will need over the year, plan and allocate them accordingly, and then decide which technologies to invest in.

On the business side, we always put resource guarantees first, so that the business's computing needs are met first. To avoid disrupting the business, we deploy technologies such as tidal hybrid deployment at the lower layers to help it improve efficiency. This is also why a middle-office department such as the Trusted Native Technology Department was established.

36 Carbon: Looking back over the past three years, what experience of Ant's is worth sharing?

He Zhengyu: First, I think you need a strong enough basic technology team. When we go down to the systems level, whether middleware, the operating system, or even the database, we have a team for each doing R&D toward efficiency and emission reduction. If everything in your hands is a black box, all procured externally, there is basically nothing you can do. Our investment in self-developed technology is paying off now.

Second, from a technical point of view, it is important to define clear goals. The national double carbon target, for example, gives enterprises very positive guidance. For many engineers, the worry is not that the technology is hard or takes time, but that there is no definite goal or problem. Defining the problem clearly matters a great deal.

36 Carbon: You just mentioned the goal of reaching world-class emission reduction practices in 3-5 years. Do you think that is rather aggressive?

He Zhengyu: The timeline is certainly tight. We have technical advantages, and we face challenges too.

Technically, we believe there is often a latecomer advantage. In computer infrastructure development in particular, there are two concepts: Green Field and Brown Field. (Green Field refers to developing a system in a new environment without legacy code and similar issues; Brown Field refers to developing on or improving an existing system.)

Ant has a good governance tradition here: every three years we carry out a major generational upgrade of the technical architecture, which lets us deal with technical legacy issues better and solve problems faster. So we are fairly confident of reaching this goal.

36 Carbon: Building on last year, what is Ant Group's emissions reduction target for this year? Which key technologies will you focus on?

He Zhengyu: Through trusted native technology, we achieved a reduction of 27,000 tons of carbon dioxide last year. This year, we hope to raise resource utilization by roughly another 15 percent.

On the technology side, this year we will focus on storage systems and further connect storage with computing, which will significantly improve the efficiency of dynamic scheduling.

In addition, we are strong and enthusiastic supporters of open source, and have already open-sourced a major project, Kata Containers, a core container isolation technology. Going forward, we will also open-source technologies involving operating systems and underlying cloud-native components, and share them through academic papers and algorithms.

36 Carbon: Are Ant's technology emission reduction practices currently being offered externally and commercialized?

He Zhengyu: Of course, we hope the technology benefits the whole industry. The "green computing" system currently includes two technical categories: cloud native and the native distributed database. Our cloud-native technology is open to the outside world through open source, patents opened free of charge, and published papers.

As for the native distributed database, we serve external customers through products. Our distributed database OceanBase supports green computing technology and currently serves more than 400 customers. Building on online-offline hybrid deployment, lossless elasticity, and intelligent time-sharing scheduling, OceanBase improves resource efficiency across computing, storage, and networking.

36 Carbon: Globally, what are some of the best technology-driven emission reduction practices worth sharing?

He Zhengyu: As I see it, technology companies follow two kinds of routes. One is the "Party B" companies, meaning suppliers such as cloud computing vendors and hardware manufacturers that mainly provide computing power, as well as consulting firms. The other is the "Party A" companies, which consume large amounts of computing power.

Party B companies want to help customers achieve carbon neutrality, and offer a full technology stack from hardware to software. Utilization of their own stacks can reach very high levels: IBM, for example, by combining software and hardware, can push technology stack utilization to 99%, which is remarkable. That is because they have to solve customers' problems, and by drawing on AI, data, and so on, they are good at predicting demand for computing power.

Among Party A companies, the best is Google. Google's greatest advantage is that many of its systems are built in-house rather than procured, which lets it see clearly what every part of the business is doing. Its technology stack effectively treats all servers as one computer and does everything possible to raise that computer's utilization. Even a gain of a few percentage points, at that server scale, yields an impressive efficiency improvement.

So great companies will emerge in both directions, one helping customers and the other helping themselves. Both are goals we should pursue, and Ant will explore both.

36 Carbon: For the tech industry, where do you think the biggest challenges in reducing emissions currently lie?

He Zhengyu: I read a data center report the other day that reflects some of the problems. Judging by market shipments, data center growth has not slowed and is still rising every year. And the power consumption of a single server or CPU has not come down.

As for the challenge of reducing emissions, I think it really comes down to technological breakthroughs. If we avoid unnecessary consumption in processing, storage, and transmission, the technology industry's overall energy consumption can be reduced exponentially.

This means tech companies need to approach the problem more responsibly. With 5G, the metaverse, and other emerging technology trends, the computing power we will need is enormous. If the tech industry does nothing, an energy crisis is foreseeable.
