
The cloud database technology behind Double 11 and Double 12, revealed for the first time | Q Recommended

From 2009 to 2021, Double 11 has run for 12 years, growing from tens of millions to hundreds of billions in transaction volume. Today the annual Double 11, and Double 12 a month later, have become a genuinely nationwide shopping carnival. More than 800 million consumers took part in the recently concluded 2021 Double 11.

While transaction volume and participant numbers have kept rising, crashes of the Taobao app, the main venue for Double 11, and the Tmall app, the main venue for Double 12, have become rarer year by year. On top of that, Taobao and Tmall keep absorbing consumer feedback and improving features: in 2021, for example, shopping carts began showing the final post-coupon price in real time, and users could search their past orders... The many requests generated in the apps flow to the back end and put heavy pressure on the database.

What kind of database kept Double 11 and Double 12 stable in 2021? In the third episode of "Data Cool Talk", Zhu Cheng, Double 12 team lead in Alibaba's Taobao Technology Department; Xu Peide, Double 11 team lead for the Alibaba Business Platform; Chen Jinfu, Double 11 team lead for Alibaba Database; and Wang Yipeng, editor-in-chief of InfoQ, jointly revealed the database technology behind the two events.


Making hot flash sales a true contest of hand speed

Flash sales (literally "second kills") are a common e-commerce scenario and are now routine on Taobao, for example the rush to buy Moutai at 8 pm every night. In the early years, however, such events were hard to keep stable. When massive traffic pours in at once and hits the system like a pulse, the system goes down immediately, and the user sees the page hang. Taobao considered adding queueing to keep the system from going down within seconds, but then users only saw the result of the rush after watching a loading spinner, which was a poor experience.
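The trade-off of queueing can be illustrated with a small simulation. This is a minimal sketch, not Taobao's implementation: the function name `simulate_queue`, the tick-based model and the numbers are all assumptions made for illustration. It shows that a queue keeps a fixed-capacity system from being overwhelmed by a burst, at the price of later requests waiting longer for their result.

```python
from collections import deque


def simulate_queue(arrivals_per_tick, capacity_per_tick):
    """Return how many ticks each request waited, in arrival order.

    Hypothetical model: at most `capacity_per_tick` requests are served
    per tick; everything else waits in a FIFO queue.
    """
    if capacity_per_tick <= 0:
        raise ValueError("capacity_per_tick must be positive")
    waiting = deque()  # stores the arrival tick of each queued request
    waits = []
    tick = 0
    while tick < len(arrivals_per_tick) or waiting:
        if tick < len(arrivals_per_tick):
            waiting.extend([tick] * arrivals_per_tick[tick])
        # Serve only as many requests as the system can handle this tick.
        for _ in range(min(capacity_per_tick, len(waiting))):
            waits.append(tick - waiting.popleft())
        tick += 1
    return waits


# A burst of 10 requests hits a system that can serve 4 per tick.
waits = simulate_queue([10, 0, 0], capacity_per_tick=4)
print(waits)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

The system never processes more than it can handle, but the last requests in the burst wait two ticks, which is the "spinner" experience described above.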

During Double 11 and Double 12, the drawbacks of this approach are amplified further. According to the data, peak flash-sale transactions in the Double 11 flash-sale system exceed 500,000 per second, a very typical e-commerce flash-sale scenario. After the burst of orders at midnight, once the first wave of orders is complete, consumers quickly go back to browsing Taobao. Because consumers usually look at several products before placing an order, browsing traffic is far higher than order traffic. When millions of consumers are browsing at the same time, the query pressure on the database read path is amplified by an order of magnitude.

At the database level, a product ID usually corresponds to one row in the database. The moment a consumer places an order, or an asset or coupon is redeemed, one transaction is completed in the relational database. To make sure flash sales during big promotions can sustain this level of concurrency, Alibaba Cloud's database selection has evolved from open source MySQL to AliSQL.
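The "one product, one row, one transaction per order" pattern can be sketched as follows. This is an illustration only: it uses Python's built-in SQLite rather than MySQL or AliSQL, the table and column names are hypothetical, and it does not reproduce any hot-row optimization AliSQL may apply. It shows why every buyer of a hot item contends for the same row, and how a conditional `UPDATE` inside a transaction prevents overselling.

```python
import sqlite3


def place_order(conn, item_id, buyer):
    """Try to buy one unit; return True only if stock was decremented."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - 1 "
            "WHERE item_id = ? AND stock > 0",
            (item_id,),
        )
        if cur.rowcount == 0:  # no row changed: the item is sold out
            return False
        conn.execute(
            "INSERT INTO orders (item_id, buyer) VALUES (?, ?)",
            (item_id, buyer),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, item_id INTEGER, buyer TEXT)"
)
conn.execute("INSERT INTO inventory VALUES (1, 2)")  # hot item with 2 units
conn.commit()

# Three buyers compete for two units of the same row.
outcomes = [place_order(conn, 1, f"buyer-{i}") for i in range(3)]
remaining = conn.execute(
    "SELECT stock FROM inventory WHERE item_id = 1"
).fetchone()[0]
print(outcomes, remaining)  # [True, True, False] 0
```

Because the stock check and the decrement happen in a single statement, the third buyer is rejected immediately instead of driving the stock negative.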

At the business level, consumers who take part in a flash sale now get the result of their rush instantly, without waiting, even at the peak of the two promotions. Achieving fairness at the database level means flash sales have truly become a contest of hand speed.

In previous years, to ensure stability and get through the promotion peak smoothly, a degradation strategy was applied wherever the computation was relatively heavy or the stability risk relatively high. Starting in 2021, Alibaba put more emphasis on user experience, so getting through these hard spots without degrading became a requirement for the database team. Zhu Cheng said: "This involves reworking many basic links, and it can only be done by improving database capabilities."

In the 2021 Double 11 and Double 12, technology that guaranteed overall system stability was everywhere, for example PolarDB's extreme elasticity, massive storage and highly concurrent HTAP access. Notably, for storage-compute separation, Alibaba combined the technology stack capabilities of the DPCA ESC, ESSD and the PolarDB file system to cut some needless overhead at the kernel level, so that many database-level operations reach the hardware directly without operating system conversion. This improvement alone reduced the cluster size of the transaction order database by more than 40% at the business level and improved the efficiency of real-time analysis of business data flows by 30%. PolarDB's open source nature will also open up more room for imagination.

Beyond the traffic peaks of flash sales, day-to-day business still needs stable service at low cost, which requires the database compute node (the "machine head") to scale up and down. "During big promotions we use a relatively high-spec machine head, that is, the compute node. Some friends familiar with databases may call it the engine layer or the execution layer. It is a high-spec data execution node, but I don't need such a good node day to day, which means there is a process of upgrading and downgrading its specification. Before storage and compute were separated, doing this was actually a time-consuming operation. Swapping the machine head is a bit like a motorcycle: sometimes I need a little more horsepower, sometimes a little less, so for different scenarios I swap in a motorcycle head of a different specification," Xu Peide said.

Making the final price clearly visible in the shopping cart

Once stability became the norm, the demands of the business layer began to be met one by one. Zhu Cheng said that consumers used to focus on simply buying, whereas now they prefer to shop around, and two demands stand out. First, prices should be clearer: what the item costs and what it costs after coupons. Second, users want a single order to support multiple delivery addresses and to receive more discounts when placing an order with one click, for the best value for money.

Showing the final post-coupon price depends on two database capabilities: computing power and data access. Relying only on upgrading or scaling out RDS would make the cost hard to estimate. For the database, as an underlying product, the three key metrics are cost, efficiency and stability, and all three need to be weighed systematically when selecting a product. The Alibaba Database team therefore introduced the Tair in-memory database. "Previously, whether Tair was used as a cache or as persistent storage, it was mostly a KV-style data structure. But what we need is relational, structured access, so Tair introduced a SQL model, which we call TairSQL: a relational in-memory database that provides high-performance concurrent reads and writes at a relatively low hardware cost," Xu Peide explained.

While consumers are shopping, as soon as they place an order the coupon is redeemed or frozen and the asset status is updated in TairSQL, which places higher demands on low-latency synchronization between heterogeneous data sources. At the same time, the volume of user asset data is very large, and keeping all of it in memory would make storage very expensive. "High performance, large capacity and low cost may seem hard to balance, but Tair handles this well. The persistent memory technology Tair has developed in recent years is a good match for many scenarios like this one; it is the Tair persistent memory edition (Tair-PMEM) now sold on the cloud. Today Tair uses persistent memory so that every operation is persisted while throughput is almost equal to that of memory. The new hardware also increases total storage space by an order of magnitude," Chen Jinfu said.

In other words, combining PMEM with Tair provides very large memory-class storage that can persist and normalize dozens of asset types, such as platform red packets, store red packets, platform coupons, store coupons, store member discounts and store limited-time discounts, so that the business can fetch all the data from one place when calculating a price. In addition, the Tair in-memory database uses a shared-nothing architecture and gives users partition-level, single-threaded ACID. In a horizontally scaled cluster, each node serves dozens of partitions, and each partition uses a transaction model in which a single thread handles requests, avoiding the overhead of lock contention. Reportedly, Tair delivered a nearly flat P99 access latency during the big promotion.
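The single-thread-per-partition idea can be sketched in a few lines. This is a toy model under stated assumptions, not Tair's implementation: the `Partition` and `Cluster` classes, the CRC32 routing and the `redeem` helper are hypothetical names, and the sketch only demonstrates isolation through serial execution (no rollback, durability or replication).

```python
import queue
import threading
import zlib


class Partition:
    """Owns a slice of the data; only its single worker thread touches it."""

    def __init__(self):
        self.data = {}
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            txn, reply = self.inbox.get()
            try:
                reply.put((True, txn(self.data)))
            except Exception as exc:  # keep the worker alive if a txn fails
                reply.put((False, exc))

    def execute(self, txn):
        """Queue a transaction and block until the worker has run it."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((txn, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value


class Cluster:
    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def execute(self, key, txn):
        # A stable hash routes every operation on a key to one partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index].execute(txn)


def redeem(coupon_id):
    """Build a transaction that marks a coupon as used at most once."""
    def txn(data):
        if data.get(coupon_id) != "available":
            return False
        data[coupon_id] = "used"
        return True
    return txn


cluster = Cluster(partition_count=4)
cluster.execute("user42", lambda data: data.update({"coupon-1": "available"}))

outcomes = []

def caller():
    outcomes.append(cluster.execute("user42", redeem("coupon-1")))

callers = [threading.Thread(target=caller) for _ in range(8)]
for t in callers:
    t.start()
for t in callers:
    t.join()
print(outcomes.count(True))  # 1: exactly one caller redeems the coupon
```

No lock guards `data`: the check and the update cannot interleave with another request because each partition runs its transactions one at a time on a single thread.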

Making historical orders retrievable in real time

The second business requirement that was met is real-time retrieval of historical orders during the two festivals, a feature that in the past was also degraded during promotion peaks. Taobao has accumulated a huge amount of historical order data, and providing accurate retrieval on top of reliable storage is no easy task. Order search relies on fuzzy matching, and it is hard to deliver an accurate retrieval experience with a database alone. For one thing, the database must match the meaning of the keywords a consumer enters with high relevance: a search for "teacup", for example, should return results that include both teacups and tea sets.
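One simple way to picture fuzzy matching with relevance ranking is character n-gram overlap. The sketch below is an assumption made for illustration and is not how Taobao or ADB scores results; the function names and the sample order titles are hypothetical. It shows how a query for "teacup" can rank an exact match first while still surfacing a related "tea set".

```python
def bigrams(text):
    """Return the set of two-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def rank_orders(query, titles):
    """Score each title by the share of query bigrams it contains."""
    query_grams = bigrams(query)
    if not query_grams:
        return []
    scored = []
    for title in titles:
        score = len(query_grams & bigrams(title)) / len(query_grams)
        if score > 0:  # drop titles that share nothing with the query
            scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


ranked = rank_orders(
    "teacup",
    ["phone case", "tea set with tray", "ceramic teacup"],
)
for score, title in ranked:
    print(f"{score:.1f}  {title}")
# 1.0  ceramic teacup
# 0.4  tea set with tray
```

A production system would use proper tokenization and semantic expansion rather than raw bigrams, which is part of why the article turns to search-engine-style indexing next.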

A description like this easily makes engineers think of search engines. Search engines can indeed meet these requirements functionally and are technically mature, but adopting one just to implement this single feature inside the company costs too much. "Especially inside Alibaba, to give users a more responsive experience, you can almost assume its index is held entirely in memory," Xu Peide said. Over the past 20 years, Taobao has accumulated hundreds of billions of orders: "If the index columns for hundreds of billions of orders are all put into memory, my machine cost will definitely be unsustainable."

In the end, the Alibaba team and the database team together chose ADB (AnalyticDB). As early as 2015 and 2016, data could be loaded offline, and ad hoc queries both left order creation unaffected and offered rich search relevance ranking. In July 2019, AnalyticDB for MySQL 3.0 was released. It is highly compatible with the MySQL protocol and the SQL:2003 syntax standard and supports instant multidimensional analysis and business exploration over massive data, so that an enterprise can quickly build a data warehouse on the cloud. In the 2021 Double 11 and Double 12, ADB 3.0 truly enabled real-time historical order retrieval, in peak scenarios and otherwise.

Specifically, ADB 3.0 addresses three issues:

Full data migration and real-time synchronization. With the integrated DMS warehouse architecture and the efficient transfer capabilities of DTS, all MySQL data is migrated to ADB and kept synchronized in real time.

Row-level storage capabilities. The ADB storage format uses PAX, a hybrid row-column format. It provides efficient random lookup by row number and can also split reads for parallelism at chunk granularity. Scanning multiple chunks in parallel improves offline read throughput, so the format serves both online low-latency queries and offline high-throughput scenarios.

Adaptive indexes. Because order search requirements can change at any time, ADB developed its own adaptive indexing framework. It supports five index types: string inverted index, bitmap index, KD-tree index, JSON index and vector index. Indexes of different types on different columns can be combined under arbitrary conditions (intersection, union, difference). Compared with a traditional database, there is no need to build composite indexes manually, and index pushdown is supported for OR/NOT and similar conditions.
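The last two points can be pictured together with a toy table. This is a simplified sketch under assumptions, not ADB's storage engine: `OrderTable`, `Chunk`, the two-row chunk size and the sample data are all hypothetical. Rows are grouped into chunks and stored column by column inside each chunk (the PAX idea), each column has its own index from value to row numbers, and a query combines those per-column indexes with set intersection, union and difference instead of relying on a hand-built composite index.

```python
from dataclasses import dataclass, field

CHUNK_ROWS = 2  # tiny chunk size, for illustration only


@dataclass
class Chunk:
    """A group of rows stored column by column (PAX-style layout)."""
    columns: dict = field(default_factory=dict)  # column name -> values

    def row_count(self):
        return len(next(iter(self.columns.values()), []))


class OrderTable:
    def __init__(self, column_names):
        self.column_names = list(column_names)
        self.chunks = []
        # One inverted index per column: value -> set of row numbers.
        self.indexes = {name: {} for name in self.column_names}
        self.total_rows = 0

    def insert(self, row):
        if not self.chunks or self.chunks[-1].row_count() == CHUNK_ROWS:
            self.chunks.append(Chunk({n: [] for n in self.column_names}))
        chunk = self.chunks[-1]
        for name in self.column_names:
            chunk.columns[name].append(row[name])
            self.indexes[name].setdefault(row[name], set()).add(self.total_rows)
        self.total_rows += 1

    def fetch(self, row_number):
        """Random lookup by row number: find the chunk, then the offset."""
        chunk_id, offset = divmod(row_number, CHUNK_ROWS)
        chunk = self.chunks[chunk_id]
        return {n: chunk.columns[n][offset] for n in self.column_names}

    def lookup(self, column, value):
        return self.indexes[column].get(value, set())


table = OrderTable(["buyer", "category", "status"])
for row in [
    {"buyer": "u1", "category": "teacup", "status": "paid"},
    {"buyer": "u1", "category": "tea set", "status": "refunded"},
    {"buyer": "u2", "category": "teacup", "status": "paid"},
    {"buyer": "u1", "category": "phone", "status": "paid"},
]:
    table.insert(row)

# buyer = 'u1' AND (category = 'teacup' OR category = 'tea set')
#   AND NOT status = 'refunded'
hits = (
    table.lookup("buyer", "u1")
    & (table.lookup("category", "teacup") | table.lookup("category", "tea set"))
) - table.lookup("status", "refunded")
results = [table.fetch(n) for n in sorted(hits)]
print(results)
# [{'buyer': 'u1', 'category': 'teacup', 'status': 'paid'}]
```

Because each chunk is self-contained, a real engine could also hand different chunks to different threads for a parallel scan, which is the offline-throughput half of the trade-off described above.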

Today ADB 3.0 has brought Alibaba high satisfaction in the order search business: the number of customer complaints for this item is about 86% lower than in 2020. In Chen Jinfu's view, much of the value of the cloud-native data warehouse ADB 3.0 lies in making data available online in real time and in mining commercial value that has not yet been discovered. "The requirements for a new type of database product are actually still being explored across the entire industry."

Closing thoughts

The database support behind Double 11 and Double 12 goes far beyond this. Closing a single order involves nearly 50 requests at the database level, far more than any single database product can support. Behind the rich promotional activities and hundreds of billions in transaction volume is a combination of database products that includes RDS, PolarDB, Tair, ADB (ADB 3.0) and Lindorm. 2021 was the first Double 11 with Alibaba's business running 100% on the cloud, and also the year Alibaba Cloud's databases became fully cloud native, yet peak computing cost fell by 50% compared with 2020, which shows the great commercial value and potential of cloud databases. The future benefits and value of cloud-native databases will extend beyond the database itself.
