
The practice of the Hujiang order system sub-table project


Background

With the continuous enrichment of Hujiang's product line, the data generated by transactions has multiplied year after year. The entire trading system has already been split by business domain, including the order domain, commodity domain, marketing domain, and clearing and settlement domain. Among these, data growth in the order domain is the most prominent; its services include creating orders, modifying orders, querying orders, distribution, after-sales, and more. Based on the current growth rate and future estimates, the storage capacity of a single database table will exceed a reasonable threshold within half a year. An oversized single table consumes a large amount of database resources (such as CPU and I/O) on insert and query operations, which affects every service in the order domain. To avoid the systemic risk this poses to the entire trading service, we proposed an order table sharding project. This article describes that project in practice.

Analysis of the current situation

The current situation of the order system is mainly described from the following dimensions:

System classification

  • Internal order systems: the order system, order change system, distribution system, front-end query system, operations query system, and after-sales system, six subsystems in total. These systems operate on order data primarily through direct database access.
  • Peripheral systems: the front-end settlement page, clearing and settlement system, BI analysis system, CRM system, and business operations system. Except for the BI analysis system, which periodically exports order data to the data center, these systems operate on order data through the order service.

Data classification

According to the frequency of order generation and modification, we divide order data into two categories:

  • Hot data: orders modified frequently within 2 months of being placed successfully, such as payment status, delivery status, and delivery address.
  • Cold data: orders whose transactions are complete, used mainly by the after-sales and query systems.
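The two-month boundary can be expressed as a trivial predicate. This is only an illustration: the class and method names are hypothetical, and in a real system the threshold would be configurable rather than hard-coded.

```java
import java.time.LocalDateTime;

public class OrderTemperature {
    /** An order is considered cold once two months have passed since it was placed. */
    public static boolean isCold(LocalDateTime placedAt, LocalDateTime now) {
        return placedAt.isBefore(now.minusMonths(2));
    }
}
```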

System-to-data relationships

Peripheral systems access order data mainly through the order service. The internal order subsystems can be classified by the type of data they access:

  • Hot data access: the order system, order change system, and distribution system.
  • Cold data access: the query systems and the after-sales system.

Concurrency

Based on historical traffic and estimates for the next three years (3x growth per year), the concurrency the current system can sustain is fully sufficient for the rapid development of the business, as long as single-table capacity stays within a reasonable threshold.

Solution selection

There are many mature solutions to the problem of an oversized single table. We compared the following options:

MySQL table partitioning technology

A normal table is a logical table in which all data exists in the same file. A partitioned table is also an independent logical table, but the underlying table consists of multiple physical subtables. Partitioning is like a coarse-grained index, reducing the number of datasets accessed. When using partitioned tables, consider the following aspects:

  • Choice of partition rule: MySQL provides range, list, hash, and other rules, and the choice of rule affects query efficiency.
  • Partition maintenance: adding and removing partitions is easy, but reorganizing partitions or running ALTER statements is more complicated: you must create a temporary partition, copy the data, and finally delete the old partition, stopping the application service during this time.
  • Weak optimization options: all optimization can only be done at the database level, giving limited control.

The order system has multi-dimensional query requirements, such as user, order number, commodity, commodity type, institution, source, platform, time, status, amount, etc., so the partitioned table will not bring significant improvement to the order query service.

Horizontal table sharding


Data writes and queries are spread across multiple tables, and the benefits are clear:

  • High concurrency and strong performance, no centralized write bottleneck.
  • Strong controllability.

At the same time, it has a large impact on the existing system:

  • Order numbers must be generated in a distributed manner, and the current order number generation rules no longer meet this requirement.
  • All subsystems of the order service need to be transformed.
  • A downtime release is required.

Separation of hot and cold data


According to the characteristics of hot and cold order data and the grouping of the related order systems, cold data and hot data can be stored separately. This approach has the following advantages:

  • The scope of impact and the workload are reduced; only the order query systems are affected.
  • The number of cold databases is controllable.
  • No downtime is required.

At the same time, there are some disadvantages:

  • The single-table capacity of the history database is still too large.
  • Under high concurrency, there is a single-point bottleneck.
  • The data volume of the active database is not controllable.

Separation of hot and cold data + horizontal table sharding of cold data


Combining solutions 2 and 3 solves the oversized-single-table problem and meets current business needs. First, hot/cold separation reduces the single-table data volume of the active database; given the current system concurrency, the active database's single-table capacity can be kept within a reasonable range. Meanwhile, the database holding the cold data is sharded horizontally so that its single-table capacity stays controllable.

Queries

After horizontal table sharding, queries that do not follow the sharding rule become more complicated. To avoid traversing all the sub-tables, a global mapping between query criteria and the shard key must be maintained. This mapping can be implemented in the following ways:

  • Add a global mapping table, which reintroduces the oversized-single-table problem.
  • Use a search engine, which offers good query efficiency and horizontal scalability. The company's infrastructure group already provides a stable and efficient search service (Elasticsearch), which can be used directly to solve the query problem.

Data storage architecture

After settling on solution 4, we reviewed all business requirements again. For the few systems that depend directly on the order database, we decided to temporarily provide a full database for those business parties; the order domain's internal services do not use it. The final order data storage is divided into the following 4 parts:

  • The active database stores data generated in real time.
  • The history database stores data older than 2 months.
  • The full database stores all the data.
  • Elasticsearch stores index data for the orders in the history database.

The following figure shows the architecture of the application service and data store.

(Figure: original architecture of the application services and data stores)

After this transformation, the structure is changed to the following figure:

(Figure: architecture after the transformation)

Solution execution

In the process of implementation, the main focus is on the following aspects:

Cold data migration and horizontal sharding

Data migration uses a scheduled task to move cold data to the history database, following these rules:

  • Migrate sequentially, based on the definition of cold data. For example, orders older than two months are cold data, and orders with earlier creation times are migrated first.
  • If an exception occurs during migration, the migration is terminated and an exception notification is issued.
  • The migration process is not responsible for creating index data for the cold data.
  • The migration process is not responsible for deleting orders that have already been migrated.

We chose ID ranges as the sharding rule based on the following considerations:

  • The amount of data in a single table can be controlled.
  • Cold data has no concurrent-write pressure.
  • It is easy to maintain: adding a sub-table does not change data that has already been sharded.

The specific rules are as follows:

  • The order number range determines the table: for example, orders 1~10000 are in the tb_order_0001 table, and orders 10001~20000 are in the tb_order_0002 table.
  • All order sub-tables must use the same sharding logic as the primary table. For example, if the master record of order 123 is stored in tb_order_0001, then its order detail data must be in tb_order_detail_0001.
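As a sketch, the range rule above can be expressed as a small routing helper. The range size of 10,000 and the table-name pattern are taken from the example; in a real system these would come from configuration, and the class name here is hypothetical.

```java
public class ShardRouter {
    private static final long RANGE_SIZE = 10_000; // orders per sub-table, from the example

    /** Maps an order number to its sub-table suffix: 1..10000 -> "0001", 10001..20000 -> "0002". */
    public static String suffixFor(long orderId) {
        return String.format("%04d", (orderId - 1) / RANGE_SIZE + 1);
    }

    public static String orderTable(long orderId) {
        return "tb_order_" + suffixFor(orderId);
    }

    /** Detail rows follow the same shard as the master row, as the rule above requires. */
    public static String detailTable(long orderId) {
        return "tb_order_detail_" + suffixFor(orderId);
    }
}
```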

Create a cold data index


An index for the history database can be created in either of the following ways:

  • Full pull: the search engine pulls all the data; used for initialization and index rebuilds.
  • Incremental push: a scheduled index-creation task pulls the most recently migrated cold data and sends it to the message queue designated by the search engine.

After a message is sent, the order system cannot know whether the search engine processed it, so a scheduled inspection task is deployed to check whether indexes have been successfully created for the orders in the history database.


Since orders are migrated sequentially during the cold data migration in Step 1, the largest order number that has been successfully indexed can serve as the demarcation point of the migration. The business logic is as follows:

  • Orders with numbers no larger than this have been successfully migrated and indexed.
  • Orders with larger numbers have not been migrated or indexed.

Therefore, the scheduled inspection task stores the largest successfully indexed order number in a demarcation table and in the cache, for use by other order business logic (such as the query service and the scheduled deletion task).
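A minimal sketch of the inspection step, under two stated assumptions: the task receives migrated order numbers in ascending order, and a hypothetical `indexed` lookup reports which orders have confirmed index entries. Class and method names are illustrative, not from the original system.

```java
import java.util.List;
import java.util.Set;

public class DemarcationInspector {
    /**
     * Advances the demarcation point to the largest order number for which
     * every smaller migrated order also has a confirmed index entry.
     */
    public static long advance(long current, List<Long> migratedAscending, Set<Long> indexed) {
        long point = current;
        for (long id : migratedAscending) {
            if (id <= point) {
                continue; // already covered by the current demarcation point
            }
            if (!indexed.contains(id)) {
                break; // first unindexed order: the boundary cannot move past it
            }
            point = id;
        }
        return point;
    }
}
```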


Synchronize the full order database


We use Alibaba's open-source project Otter as a quasi-real-time synchronization tool. It is based on the MySQL binlog, is highly stable, and supports high-availability deployment. To prevent DELETE statements from being synchronized to the destination database, we modified part of Otter's source code to filter them out. For the underlying principles, see the Otter official documentation; we use a single-machine-room deployment model. Scheduled inspection tasks are also deployed to verify the real-time synchronization results.

Multi-source consistency

To address order loss or data inconsistency that may occur during cold data migration, the following measures are taken:

  • If an exception occurs during data migration, the migration task is suspended and an alert is sent.
  • If migrated data is found to have been modified, an alert is sent.
  • Before data is deleted from the active database, the successfully migrated data is verified against the history database.

In addition to Otter's binlog-based guarantee mechanism, a scheduled inspection task checks every hour that the data in the full database and the active database is consistent.
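The hourly check can be illustrated as follows, under the assumption that rows are compared by a per-order content hash; the real task would page through both databases rather than hold them in memory, and this class is a hypothetical sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ConsistencyChecker {
    /** Returns the order ids whose rows differ (or are missing) between the two databases. */
    public static List<Long> findMismatches(Map<Long, String> activeRows, Map<Long, String> fullRows) {
        List<Long> mismatched = new ArrayList<>();
        for (Map.Entry<Long, String> e : activeRows.entrySet()) {
            // A missing row in the full database also counts as a mismatch.
            if (!e.getValue().equals(fullRows.get(e.getKey()))) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }
}
```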

Order queries

Order queries use two databases: the active database and the history database. The process is as follows:

  1. Queries by order number, which follow the sharding rule.
  2. Queries by criteria other than the sharding key.

In both processes, the demarcation order number n is used to ensure that queries return no duplicate data, under the following rules:

  • The active database only serves orders with x > n.
  • The history database only serves orders with x <= n.
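The routing rule can be sketched as below, with n the demarcation order number described above; the class and enum names are illustrative.

```java
public class OrderQueryRouter {
    public enum Source { ACTIVE_DB, HISTORY_DB }

    /** Orders with id <= n are served by the history database; id > n by the active database. */
    public static Source route(long orderId, long demarcationPoint) {
        return orderId <= demarcationPoint ? Source.HISTORY_DB : Source.ACTIVE_DB;
    }
}
```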

The order query service includes two types of paginated queries:

  • Previous/next page queries.
  • Page-number pagination.

Multi-data-source pagination is essentially pagination over multiple ordered sets. For previous/next queries, each request knows the start or end position of the previous page, so adding that position to the query condition is enough to locate which data sources the page falls on. For page-number pagination, you must first traverse the data sources to obtain the total number of matching records per source and compute the total page count; then, from the page number and page size, determine which data sources the requested page falls on.
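The page-number method can be sketched as a mapping from a global page onto two ordered sources laid end to end, history records first. This assumes the per-source counts were collected beforehand, as the text describes; the class name and the array-based return value are illustrative choices.

```java
public class PageLocator {
    /**
     * Returns {historyOffset, historyLimit, activeOffset, activeLimit} for the
     * requested page, where history records precede active records globally.
     */
    public static int[] locate(int pageNo, int pageSize, int historyCount) {
        int start = (pageNo - 1) * pageSize;          // global start of the page
        int end = start + pageSize;                   // global end (exclusive)
        int historyOffset = Math.min(start, historyCount);
        int historyLimit = Math.max(0, Math.min(end, historyCount) - historyOffset);
        int activeOffset = Math.max(0, start - historyCount);
        int activeLimit = pageSize - historyLimit;    // remainder comes from the active source
        return new int[] {historyOffset, historyLimit, activeOffset, activeLimit};
    }
}
```

For example, with 10 history records and 4 records per page, page 3 takes 2 records from the history source and 2 from the active source.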

During implementation, we hit the search service's deep-pagination limit (at most 1,000 records can be returned): with 20 records per page, you can only page to page 50. In practice, some back-office queries exceed this limit, so we implement deep pagination by rewriting the query conditions. The following is a simplified description of the implementation. Suppose the search service holds 10 records and its query interface pages through offset and limit, as shown in the figure below.

(Figure: the 10 records held by the search service)

If there are 2 records per page and no pagination limit, pages 1, 2, and 3 are queried as follows:

  • Page 1: offset=0, limit=2, returns 100, 102
  • Page 2: offset=2, limit=2, returns 110, 200
  • Page 3: offset=4, limit=2, returns 201, 300

Now impose the limits offset <= 1 and limit <= 2. Either variable may exceed its limit during actual pagination, and the solutions are as follows:

  • If offset > maxOffset, rewrite the query condition so that the offset becomes 0. For example, to query page 2 (offset=2, limit=2), add orderId>102 to the query condition, and the offset can be changed to 0.
  • Once the offset constraint is satisfied, handle the limit. If limit > maxLimit, rewrite the query as a loop so that each individual query uses limit <= maxLimit. The sample code is as follows:
public static void paging(int offset, int limit) {
    List<Integer> result = null;
    if (offset <= MAX_OFFSET) {
        result = pagingForLimit(offset, limit, null);
    } else if (offset > MAX_OFFSET) {
        int skipSize = offset; // number of records to skip over
        Integer currentItem = skip(skipSize);
        result = pagingForLimit(0, limit, currentItem);
    }
    if (result != null) {
        System.out.println(result.toString());
    } else {
        System.out.println("no result");
    }
}

If the offset exceeds the maximum (offset > MAX_OFFSET), find the last order number before the requested page and rewrite the query with offset=0. Here is the code:

else if (offset > MAX_OFFSET) {
    int skipSize = offset; // number of records to skip over
    Integer currentItem = skip(skipSize);
    result = pagingForLimit(0, limit, currentItem);
}

The preceding order number is found mainly through the skip method; once it is found, the offset can be rewritten to 0, which resolves the offset limit. Part of the skip method is listed below.

/**
 * @param size the position of the preceding order number
 * @return the preceding order number
 */
private static Integer skip(int size) {
    int maxSkipStep = MAX_OFFSET + MAX_LIMIT; // maximum skip step
    if (size <= maxSkipStep) {
        return get(size, null);
    } else {
        Integer preInteger = null;
        while (size > maxSkipStep) {
            List<Integer> tmp = query(MAX_OFFSET, MAX_LIMIT, preInteger);
            preInteger = tmp.get(tmp.size() - 1);
            size -= maxSkipStep;
        }
        return get(size, preInteger);
    }
}

Skip queries use the search service's maximum page size to reduce the number of lookups. With the offset limit resolved, the limit constraint is handled next: again by advancing the preceding order number, multiple queries collect all the data of the current page.

/**
 * Pagination under the search service's limits.
 * @param offset      requested offset
 * @param limit       requested number of records
 * @param currentItem the preceding order number
 * @return the records of the requested page
 */
private static List<Integer> pagingForLimit(int offset, int limit, Integer currentItem) {
    if (limit <= MAX_LIMIT) {
        return query(offset, limit, currentItem);
    } else {
        List<Integer> result = query(offset, MAX_LIMIT, currentItem);
        limit -= MAX_LIMIT;
        while (limit > MAX_LIMIT) {
            Integer pre = null;
            if (result != null) {
                pre = result.get(result.size() - 1);
            }
            result.addAll(query(0, MAX_LIMIT, pre));
            limit -= MAX_LIMIT;
        }
        if (limit > 0) {
            Integer preInteger = result.get(result.size() - 1);
            result.addAll(query(0, limit, preInteger));
        }
        return result;
    }
}
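Putting the fragments above together, here is a self-contained simulation of the deep-pagination scheme. It assumes an in-memory sorted list stands in for the search service, with the article's example limits offset <= 1 and limit <= 2; the class name, the `DATA` values, and the `get`/`query` helpers are illustrative stand-ins for the real search client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeepPagingDemo {
    static final int MAX_OFFSET = 1;
    static final int MAX_LIMIT = 2;
    static final List<Integer> DATA =
            Arrays.asList(100, 102, 110, 200, 201, 300, 301, 400, 401, 500);

    // Stand-in for the search service: skips `offset` items after `after`
    // (or after the start if null) and returns up to `limit` items.
    static List<Integer> query(int offset, int limit, Integer after) {
        if (offset > MAX_OFFSET || limit > MAX_LIMIT) {
            throw new IllegalArgumentException("exceeds search-service limits");
        }
        int start = (after == null) ? 0 : DATA.indexOf(after) + 1;
        int from = Math.min(start + offset, DATA.size());
        int to = Math.min(from + limit, DATA.size());
        return new ArrayList<>(DATA.subList(from, to));
    }

    // Returns the item `size` positions after `after`; size <= MAX_OFFSET + MAX_LIMIT.
    static Integer get(int size, Integer after) {
        List<Integer> t = (size <= MAX_LIMIT)
                ? query(0, size, after)
                : query(size - MAX_LIMIT, MAX_LIMIT, after);
        return t.get(t.size() - 1);
    }

    // Finds the order number just before the requested page.
    static Integer skip(int size) {
        int maxSkipStep = MAX_OFFSET + MAX_LIMIT;
        Integer pre = null;
        while (size > maxSkipStep) {
            List<Integer> tmp = query(MAX_OFFSET, MAX_LIMIT, pre);
            pre = tmp.get(tmp.size() - 1);
            size -= maxSkipStep;
        }
        return get(size, pre);
    }

    // Collects `limit` records after `currentItem`, in MAX_LIMIT-sized chunks.
    static List<Integer> pagingForLimit(int offset, int limit, Integer currentItem) {
        if (limit <= MAX_LIMIT) {
            return query(offset, limit, currentItem);
        }
        List<Integer> result = query(offset, MAX_LIMIT, currentItem);
        limit -= MAX_LIMIT;
        while (limit > 0 && !result.isEmpty()) {
            Integer pre = result.get(result.size() - 1);
            result.addAll(query(0, Math.min(limit, MAX_LIMIT), pre));
            limit -= MAX_LIMIT;
        }
        return result;
    }

    // Entry point: page as if there were no offset/limit restrictions.
    static List<Integer> paging(int offset, int limit) {
        if (offset <= MAX_OFFSET) {
            return pagingForLimit(offset, limit, null);
        }
        Integer currentItem = skip(offset);
        return pagingForLimit(0, limit, currentItem);
    }

    public static void main(String[] args) {
        System.out.println(paging(4, 2)); // page 3 at 2 per page: prints [201, 300]
    }
}
```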

Launch and release

To ensure a smooth launch, the whole project was divided into 6 steps across 4 release batches. The implementation plan is as follows:

Batch 1 Release:

  • The order migration task for hot/cold data separation.
  • Otter deployment and full-database synchronization.

Batch 2 Release:

  • Scheduled incremental cold data migration tasks.
  • Connect to the search service and create order index data.

Batch 3 Release:

  • Transform the order back-end operation query service.

Batch 4 Release:

  • Transform the order front-end query service.
  • Start the scheduled task that deletes cold data from the active database.

Summary

Through this project, order table capacity has been brought under effective control; QPS for user-side order queries has doubled, and operations-side historical order queries are 4 times faster. The current solution still has issues: the full database, which exists mainly to minimize changes to peripheral systems that depend directly on the trading database, will itself grow too large. The next step is to work with those peripheral systems on a more reasonable way to obtain full order data and remove the dependence on the full database.

Author: He Jianghua

Source-WeChat public account: Hujiang Technology

Source: https://mp.weixin.qq.com/s/TGJiwqd4wcQ4KWsbJLLvfA
