
Putting data centers in the Arctic Circle may not be as profitable as this green computing path

Original article by Heart of the Machine

Author: Shanshan

While COVID-19 continues to spread around the world and command much of the public's attention, the latest report from the Intergovernmental Panel on Climate Change (IPCC) warns that the window for averting climate catastrophe is closing rapidly, and action on carbon reduction remains urgent.

April 22, 2022 is the 53rd Earth Day. Each year Earth Day sets a theme for action, and the theme for 2022 is "Invest in Our Planet", which calls for building healthy cities, countries and economies through a green economy and sustainable business models.

International tech giants have invested in this area for a long time and are now stepping up. In 2020 Amazon purchased more than 4 GW of capacity across 35 wind and solar power projects, making it the largest corporate buyer of renewable energy to date. Google has pledged to run on carbon-free energy around the clock worldwide by 2030, shifting its zero-carbon accounting from an annual to an hourly basis. Microsoft has pledged to be carbon negative by 2030 and to remove all of its historical carbon emissions by 2050.

Chinese internet technology companies have also taken action in recent years. In 2021, companies including Alibaba, Ant Group and Tencent put forward their own carbon neutrality goals, most of them taking 2030 as the key milestone. On Earth Day this year, Ant Group and Alibaba announced that they had joined the Low Carbon Patent Pledge, an international platform that advocates sharing low-carbon technology patents, and opened some of their energy conservation and emission reduction patents to the world free of charge.

In fact, many domestic companies with large-scale data centers have already made many attempts to cut emissions. The main approach is hardware retrofitting: using more advanced cooling technology, such as various water-cooling and liquid-cooling solutions, to reduce data center energy consumption, which is collectively referred to as lowering PUE. This practice has been explored worldwide for many years, and Google's PUE is now very close to 1, which saves it a great deal of cost. But hardware retrofitting takes a long time, carries high O&M costs and yields relatively limited returns, because most of a data center's power is consumed by the servers themselves. As long as server utilization is low, energy is being wasted, and this waste far exceeds what PUE improvements can recover: according to Gartner, the CPU utilization of servers in data centers worldwide is only 6% to 12%.

Domestic enterprises are now developing green computing as latecomers, which makes this a good time to look ahead and invest in technologies with more sustainable prospects. Taking Ant Group as an example, this article explains how, beyond keeping PUE low, a company can focus on technologies that raise the utilization of existing computing power and so take a steadier and more promising "green computing" path. The results of this green computing project also won the 2021 Cloud Native Technology Innovation Solution Award from the China Academy of Information and Communications Technology.

Built on "trusted native", the road to green computing is "difficult but right"

In the era of big data, data is becoming a new driving force for the national economy. According to IDC estimates, China is expected to generate 48.6 ZB of data by 2025 (1 ZB, or zettabyte, is 10^21 bytes), accounting for 27.8% of the world's total and contributing an average of 1.5% to 1.8% of GDP growth per year.

However, releasing the potential of data requires the support of a strong computing power system. As networks of specialized equipment for processing massive amounts of data, data centers consume a great deal of electricity in normal operation. According to the China Academy of Information and Communications Technology, data centers nationwide consumed about 76 billion kWh in 2020, roughly 1% of total electricity consumption across society (7,511 billion kWh). Converted into carbon dioxide, the country's data centers emitted nearly 40 million tons in 2020.

When it comes to energy conservation and carbon reduction, domestic internet technology companies have followed broadly similar paths in building green data centers: lowering power usage effectiveness (PUE) by optimizing heat dissipation, cooling systems and server performance. PUE, the ratio of a facility's total energy consumption to the energy consumed by its IT equipment, is one of the key evaluation indicators for green data centers. Its theoretical limit is 1, and the closer the value is to 1, the better the energy efficiency.

"We have noticed that trying to save energy and cut emissions simply by lowering PUE has run into some challenges. PUE is low-hanging fruit that has almost all been picked. Ten years ago PUE was the key technology that determined whether a data center was green, but by three to five years ago that was no longer the case," Wu Peng, a senior technical expert at Ant Group, explained to Heart of the Machine.

"Ten years ago, the industry-wide level was between 1.5 and 1.8. Ten years on, the figure has dropped to around 1.3, and some excellent companies can get below 1.1. However, pushing further from 1.1 toward 1 requires nonlinear additional investment and brings other technical risks."

This also means that, for technology companies, energy-saving technology alone will not be enough to meet the zero-carbon challenge.

"Over the past decade, the industry has kept iterating toward large-scale, intelligent and energy-efficient technologies. A few years ago Ant began building on its own technical strengths, aiming to further improve energy utilization and the business output per unit of energy while staying low-carbon. The combination of this series of technologies is Ant's green computing technology system."

It is understood that, with the support of green computing technology, the average daily utilization across all of Ant Group's data centers in 2021 was twice that of 2019, and the utilization of its hybrid-deployment clusters exceeded 40%, catching up with leading international companies such as Facebook (now Meta).
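The trade-off between lowering PUE and raising utilization can be made concrete with a little arithmetic. The sketch below assumes the standard definition of PUE (total facility energy divided by IT equipment energy) and, as a deliberate simplification, that the energy needed for a fixed amount of work scales inversely with server utilization. The function names and the 1,000,000 kWh load are illustrative and do not come from Ant Group.

```python
def facility_energy(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy implied by PUE = total facility energy / IT energy."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1")
    return it_energy_kwh * pue


def saving_from_pue(it_energy_kwh: float, old_pue: float, new_pue: float) -> float:
    """Fraction of facility energy saved by lowering PUE at a constant IT load."""
    old = facility_energy(it_energy_kwh, old_pue)
    new = facility_energy(it_energy_kwh, new_pue)
    return (old - new) / old


def saving_from_utilization(old_util: float, new_util: float) -> float:
    """Fraction of energy saved when the same work runs at higher utilization.

    Simplifying assumption (not from the article): energy scales with the
    number of servers powered on, so the same work at twice the utilization
    needs half the servers.
    """
    return 1.0 - old_util / new_util


it_kwh = 1_000_000.0  # hypothetical annual IT load
print(f"PUE 1.8 -> 1.3: {saving_from_pue(it_kwh, 1.8, 1.3):.1%}")
print(f"PUE 1.1 -> 1.0: {saving_from_pue(it_kwh, 1.1, 1.0):.1%}")
print(f"utilization 10% -> 20%: {saving_from_utilization(0.10, 0.20):.1%}")
```

Under these assumptions, going from a PUE of 1.8 to 1.3 saves about 27.8% of facility energy and going from 1.1 to 1.0 saves about 9.1%, while doubling utilization halves the energy needed for the same work. Real servers draw power even when idle, so the utilization figure is an upper bound rather than a prediction.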


Green computing technology architecture.

Ant began developing "green computing" technology in 2019, before its carbon neutrality target was set, which understandably reflects the internal needs of a technology company that has grown to a certain scale. Today this set of technologies addresses key industry problems such as rational resource allocation across large-scale clusters, effective minute-level scheduling and intelligent traffic prediction. The capabilities come from multiple technical teams, including trusted native, technical risk, the native distributed database OceanBase, and the intelligent engine team.

"Trusted native is a large-scale infrastructure technology and the underlying technology of green computing," Yang Tongkai, a senior technical expert at Ant Group, told Heart of the Machine.

Trusted native is a concept Ant Group put forward in response to the demands of next-generation financial infrastructure. Developers can use it to build a more stable, secure, efficient and easy-to-use large-scale technology infrastructure that meets the strict business requirements of the broader financial industry. For green computing specifically, the three core technologies of trusted native are "online-offline hybrid deployment", "cloud-native time-sharing scheduling" and "AI elastic capacity".

"Online-offline hybrid deployment" means running online and offline workloads together on the same computing resources. The traditional approach is to place online tasks and offline tasks in separate clusters to avoid possible conflicts, but isolating the two clusters from each other leaves a large amount of computing power idle and makes the fleet as a whole inefficient.

"The difficulty of hybrid deployment lies in the technology itself. Ensuring that online and offline workloads run smoothly and safely on the same physical machine without interfering with each other is a problem the industry recognizes as hard," Yang Tongkai said.

Ant Group's solution was an industry first: it uses the strong isolation of Kata secure containers to deploy offline tasks on the same servers that host online services. With strong isolation in place, even when CPU utilization on a single machine exceeds 80%, Ant's online services are not affected by the co-located offline tasks and continue to run stably within their service-level requirements.
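The placement side of hybrid deployment can be illustrated with a small sketch. It is not Ant Group's scheduler: it only shows the general idea of filling the spare capacity left by online services with offline tasks up to a utilization target. The `Node` class, the `place_offline` function and the 0.8 target (borrowed from the 80% figure above) are illustrative assumptions, and the isolation provided by Kata containers is not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cores: float                 # total CPU cores on the machine
    online_cores: float          # cores currently used by online services
    offline_cores: float = 0.0   # cores handed to offline (batch) tasks
    tasks: list = field(default_factory=list)

    def utilization(self) -> float:
        return (self.online_cores + self.offline_cores) / self.cores


def place_offline(nodes, tasks, target_util=0.8):
    """Greedily place (name, cores) offline tasks, largest first.

    Each task goes to the least-utilized node that would stay at or below
    target_util; tasks that fit on no node are returned as pending.
    """
    pending = []
    for name, cores in sorted(tasks, key=lambda t: -t[1]):
        for node in sorted(nodes, key=Node.utilization):
            used = node.online_cores + node.offline_cores
            if used + cores <= target_util * node.cores:
                node.offline_cores += cores
                node.tasks.append(name)
                break
        else:
            pending.append(name)
    return pending


# Hypothetical cluster: two 64-core machines already running online services.
node_a = Node(cores=64, online_cores=12)
node_b = Node(cores=64, online_cores=20)
left_over = place_offline([node_a, node_b],
                          [("report", 16), ("etl", 30), ("train", 24)])
print(node_a.tasks, node_b.tasks, left_over)
```

A production system would also need to throttle or evict offline tasks when online traffic spikes, which this sketch leaves out.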


Secure container isolation for hybrid deployment.

"Cloud-native time-sharing scheduling" means arranging workloads through scheduling so that they reuse computing resources at staggered peaks, according to the load characteristics of specific scenarios. Ant Group's online services run on computing power at the million-plus scale, and because these services cover different business scenarios, they use resources at different times, for example with different periodic patterns. Time-sharing scheduling exploits these timing characteristics to hand the same resource to different applications in different time periods, which can greatly improve resource efficiency.

"With this technology Ant can neatly arrange online services that peak at different times. We can now orchestrate resources at a finer, hourly granularity, which is equivalent to one machine serving as 24 resources. This effectively improves the efficiency of each physical machine and reduces resource investment," Yang Tongkai said.


Cloud-native time-sharing scheduling technology.
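Why staggered peaks save resources can be shown with a minimal sketch. It compares the capacity needed when every application reserves its own peak with the capacity needed when a shared pool is re-assigned hour by hour. The two hourly demand profiles and the function names are invented for illustration and are not Ant Group data.

```python
def dedicated_capacity(profiles):
    """Capacity needed when every application keeps its own peak reserved."""
    return sum(max(hourly) for hourly in profiles.values())


def time_shared_capacity(profiles):
    """Capacity needed when one pool serves all applications hour by hour."""
    hours = len(next(iter(profiles.values())))
    return max(sum(hourly[h] for hourly in profiles.values())
               for h in range(hours))


# Hypothetical 24-hour demand (in cores) for two services with opposite peaks.
profiles = {
    "daytime_app":   [20] * 8 + [100] * 12 + [20] * 4,
    "nighttime_app": [80] * 8 + [10] * 12 + [80] * 4,
}

dedicated = dedicated_capacity(profiles)
shared = time_shared_capacity(profiles)
print(f"dedicated: {dedicated} cores, time-shared: {shared} cores")
print(f"capacity saved: {1 - shared / dedicated:.1%}")
```

With these made-up profiles the dedicated approach needs 180 cores and the time-shared pool needs 110, because the sum of the peaks is always at least as large as the peak of the sum.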

"AI elastic capacity technology", that is, the combination of artificial intelligence to dynamically predict the capacity of the application. Ant's business characteristics have very high stability requirements, such as double 11 and other activity scenarios, in the past, in order to cope with the traffic peak, mainly rely on manual judgment, constantly increase the server for protection. However, there are problems such as difficulty and lag in manual judgment, for which Ant has developed AI intelligent capacity technology, using big data and artificial intelligence technology, building a traffic cycle algorithm for graph calculation, and predicting traffic through deep learning, so as to achieve intelligent expansion and contraction.


AI elastic capacity technology.
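The predict-then-scale loop can be sketched in a few lines. To keep the example self-contained, a plain seasonal average stands in for the deep learning model the article describes, so this is a simplified substitute and not Ant Group's algorithm. The function names, the 20% headroom and the per-replica throughput are all assumptions made for illustration.

```python
import math


def seasonal_forecast(history, period=24):
    """Forecast one period ahead as the mean of the same slot in past periods."""
    if len(history) < period or len(history) % period != 0:
        raise ValueError("history must contain a whole number of periods")
    cycles = len(history) // period
    return [
        sum(history[c * period + slot] for c in range(cycles)) / cycles
        for slot in range(period)
    ]


def replicas_needed(predicted_qps, qps_per_replica, headroom=1.2, min_replicas=1):
    """Replica count that covers predicted traffic plus a safety margin."""
    return max(min_replicas,
               math.ceil(predicted_qps * headroom / qps_per_replica))


# Hypothetical traffic: two past cycles of four time slots each (requests/s).
history = [100, 400, 900, 200,
           120, 440, 1120, 240]
forecast = seasonal_forecast(history, period=4)
plan = [replicas_needed(qps, qps_per_replica=100) for qps in forecast]
print(forecast)
print(plan)
```

The point of the design is that capacity follows the forecast slot by slot, scaling down in quiet periods and up ahead of the peak, instead of provisioning for the peak all day.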

At present, Ant's trusted native technology is made available externally mainly through open source. It has also seen some commercial use through other product lines, such as commercial products built on SOFA technology.

The technologies above are only the green computing applications of trusted native. As a complete set of infrastructure technologies, trusted native also covers cloud native, secure containers, confidential computing, trusted hardware, the mini program runtime and more. In the long run, carbon reduction and infrastructure technology fit together well, which is why improving resource utilization is one of the key goals of Ant's trusted native technology.

Over the past few years, cloud native has led the way in large-scale cluster system architecture. Architecturally, cloud native is an operations-oriented (SRE) architecture whose core mission is to ensure system stability. When security conflicts with stability and performance, an operations-oriented architecture is more likely to compromise on security, and application developers, as users of the platform, in most cases do not want to get involved in security and trust work themselves.

But in the last year or two, technology trends have shifted to some extent. As countries gradually strengthen their privacy and data security regimes, it is not only the infrastructure that must be hardened and made trustworthy; applications need protection too. Security detection, protection and blocking are carried out at multiple levels of the system, and stricter rules are even required to keep applications that do not meet security requirements out of the software supply chain.

Based on this reading of the trend and on this technical philosophy, Ant invested in large-scale infrastructure technologies such as trusted native and put them into extensive practice. For example, it formed a secure computing team to explore confidential computing, strengthening the system's defenses against intrusion while preventing the system itself from seeing what upper-layer applications are doing. Such strong and effective protection is a prerequisite for raising resource utilization for sensitive financial applications.

In Ant's green computing technology system, OceanBase also deserves attention alongside trusted native. Domestically developed databases have been a hot topic over the past two years, and OceanBase is a well-known example, having broken the world record on the TPC-C transaction processing benchmark two years in a row.

In terms of technical principles, OceanBase reduces carbon emissions mainly in three ways:

First, advanced compression built on the LSM-Tree significantly reduces storage costs. For example, when one Alipay business migrated from Oracle to OceanBase, its data shrank from 100 TB to 33 TB.

Second, OceanBase brings the Paxos distributed consensus protocol into two-phase commit (2PC), making distributed transactions automatically fault-tolerant.

Third, SQL execution engine optimizations, including the execution plan cache (Plan Cache), fast SQL parameterization, operator pushdown and filtering, and a vectorized engine, greatly reduce SQL execution time.
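How a plan cache and SQL parameterization work together can be sketched as follows. This is a toy model and not OceanBase's implementation: literals are stripped with a simple regular expression (a real engine would do this inside its parser), and the `planner` callable stands in for the expensive optimization step. All names here are hypothetical.

```python
import re

# Matches single-quoted strings (with '' escapes) or standalone numbers.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Replace literals with '?' and return (template, extracted params)."""
    params = []

    def swap(match):
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(swap, sql), params


class PlanCache:
    """Caches one plan per parameterized SQL template."""

    def __init__(self, planner):
        self._planner = planner  # callable: template -> plan (expensive step)
        self._plans = {}
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql):
        template, params = parameterize(sql)
        if template in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[template] = self._planner(template)
        return self._plans[template], params


# Hypothetical planner that just records which template it was asked to plan.
cache = PlanCache(planner=lambda template: {"template": template})
cache.get_plan("SELECT * FROM orders WHERE id = 42 AND status = 'paid'")
cache.get_plan("SELECT * FROM orders WHERE id = 7 AND status = 'open'")
print(cache.hits, cache.misses)
```

Because both statements reduce to the same template, the second call reuses the cached plan and skips planning, which is where the CPU time is saved.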


OceanBase data migration solution.

Leading database technology naturally lends strong support to carbon reduction, and domestic vendors' investments in this area can be expected to pay off over time.

An open ecosystem benefits the whole industry

A green, low-carbon future is humanity's shared pursuit and also a shared global challenge today. It requires not just the leadership of a few companies but the cooperation of the entire industry and society. In past years it was mainly foreign companies that opened up their technologies while domestic peers learned from them. The Open Compute Project, founded by Facebook in 2011 and joined by Microsoft and Google, is one example: participants open up their data center designs to help bring costs down. In the past two years, as self-developed technology has matured, leading domestic companies have also begun opening their results and practices to give back to the industry.

In Ant's case, mature foundational technologies, such as those involving operating systems, databases and underlying cloud-native components, are open sourced, while work that is not engineering software, such as intelligent algorithms, is shared with peers through academic papers. This is also how leading companies such as Google operate.

Wang Xu, a senior technical expert at Ant Group and co-founder of Kata, said in the interview: "Ant has always stayed open in its green computing research and exploration, and I hope some of our exploratory work can help the entire industry. We now have some leading technologies, such as Kata Containers, a top-level project of the Open Infrastructure Foundation and the de facto open source standard in its field, and we have continuously contributed our practices back to the open source community. Ant's Kubernetes cluster, one of the world's largest production clusters, is also feeding its experience back to the community. In the trusted computing area, we donated Occlum LibOS to the Confidential Computing Consortium, the first project it has received from China. There is also SOFAStack, our financial-grade distributed middleware, which contains the components needed to build a financial-grade cloud-native architecture. To date, Ant has open sourced nearly 800 repositories in core areas such as cloud native, databases and front end, and has incubated nearly 20 of the world's top open source community projects. These are the small contributions we can make to the industry as technologists, and we will be even more open in the future."

Each technological revolution has been a creative response to the development of human society. Since China put forward its carbon neutrality target in 2020, the "dual carbon" goals have been written into the government work report for two consecutive years. In 2021, the Ministry of Industry and Information Technology issued the "Three-Year Action Plan for the Development of New Data Centers", which explicitly calls for vigorously promoting a new data center landscape that is technologically advanced, green and low-carbon, with computing power scaled to the growth of the digital economy. In an era of society-wide digital transformation, the meaning of "green computing" keeps growing richer, gradually expanding from hardware to the combination of software and hardware, and technology continues to innovate in search of a more future-oriented direction. For technology companies, using technology more proactively to solve the problem of energy conservation and carbon reduction, and so to answer people's expectations of a better life, is both a challenge and an opportunity.
