laitimes

Alibaba Cloud Time Series Database real-time index construction optimization practice

author:Flash Gene
Alibaba Cloud Time Series Database real-time index construction optimization practice

This is the 30th article of 2024

( Reading time: 15 minutes )

01

Preface

Time Series Database Management System (TSDB) has attracted a lot of attention in recent years, and it plays a crucial role in performance monitoring scenarios. Khronos is Alibaba's largest time series storage engine, which is self-developed and launched by the Intelligent Engine Division, providing time series data storage capabilities and real-time analysis capabilities for the group's business. Interested readers can read the review: "Khronos: Performance Monitoring Engine Construction Practices for Trillion-Scale Timelines" In this article, we will focus on Khronos' practices in real-time index building optimization.

We define timeliness delay as the gap between when the user writes the data to when the data can be retrieved. In TSDB, users expect real-time write metrics to be immediately visible (second-level effectiveness), but we find that the general Log-Structured Merge-Tree (LSM) architecture will have jitter delays in writing effectiveness to minutes due to the increased pressure on index construction during segment switching, which seriously affects the user experience. This paper firstly analyzes the reasons for the delay in data validity based on LSM architecture through experiments, and puts forward the design idea of supplemental index, that is, relying on the index built by the previous order, and only building the index on the timeline that has not appeared before, so as to reduce the overhead of index construction. Second, we design an index based on the index structure of Minimum Excluded (Mex) to efficiently multiplex the index built by the pre-sequence segment. Thirdly, we take advantage of the non-redundancy feature of the patch index, and propose a scheme to reuse the intermediate results of the query to avoid the repeated traversal of the index during the query process. Finally, we propose an index completion strategy to avoid the query and memory overhead caused by excessive dependency.

Experiments show that the supplemental index scheme proposed by Khronos greatly reduces the effectiveness delay, the highest efficiency delay of the system is reduced from minutes to seconds, and compared with the industry time series databases InfluxDB and TimeScaleDB, the write throughput of Khronos is increased by at least 4 times, and the query latency is reduced by at least 6 times. The design of the supplemental index was also accepted by one of the top conferences in the field of information retrieval CIKM2023

论文题目:《Khronos: A Real-Time Indexing Framework for Time Series Databases on Large-Scale Performance Monitoring Systems》

Link to paper: https://dl.acm.org/doi/10.1145/3583780.3614944

[Click to read the original article at the end of the article to jump]

02

Background

KMonitor is a data monitoring platform widely used by Alibaba Group's search and recommendation business, which has the characteristics of real-time, massive metrics and multi-metric computing performance. After years of stable operation and continuous optimization, KMonitor has been widely deployed in the core clusters of the group's e-commerce business, and has gradually replaced it as the main force of basic system monitoring. At the heart of KMonitor is the Khronos timing storage engine, which has provided strong back-end support for the KMonitor platform since 2020 to meet the needs of the business that has grown exponentially every year. For four consecutive years, Khronos has been robustly handling the query and write peaks during Singles' Day, storing more than tens of petabytes of monitoring data while ensuring second-level response latency.

2.1 Data Modeling

Monitoring data is generally modeled as a multi-dimensional time series. Figure 1 (a) shows how we model data, where the series key uniquely determines a timeline, which consists of a metric metric name and a set of tags (tagk:tagv). At the same time, each timeline also contains a set of time-point sequences, each timepoint consists of a timestamp and multiple field values. For example, the last row in Figure (a) represents the application of Khronos deployed on machine 10.10.10.5, which reports the usage of each CPU core every 10 seconds. Tags generally represent the attributes and dimensions of the monitored entity (e.g., app, tenant, host).

Alibaba Cloud Time Series Database real-time index construction optimization practice

2.2 Query Mode

The query of time series data usually includes a metric name, a time range, and a set of tag conditions, which can be divided into direct matching and fuzzy matching. For example:

Alibaba Cloud Time Series Database real-time index construction optimization practice

In the above query, we expect the recall time to be between 10-35, where the machine IP satisfies a regular expression of CPU utilization under the Khronos application. app="khronos" is a direct match, and host=Regex(xxx) is a fuzzy match.

2.3 Timeline Index

In Khronos, we build inverted and prefix tree indexes for each timeline, as shown in Figure 1(b). Each timeline is assigned a unique seriesId (first column of Figure 1(a)). For the inverted index, we split the seriesKey into metric and tagk:tagv tokens, and write the seriesId to the corresponding inverted list (as shown in Figure 1(b)). For the prefix tree index, we insert the token of the seriesKey split metric-tagk-tagv into the prefix tree (as shown in Figure 1(b)). Inverted indexes are generally used for direct matching, while prefix tree indexes are used to fuzzy matches and query pulldown hints.

2.4 Write Load Analysis

  • Time locality: The time series data in performance monitoring scenarios is usually time-local, that is, a timeline usually reports data at a fixed interval, for example, each physical machine reports its CPU usage every 10 seconds.
  • Series churn problem [1]: Over time, timelines are dying out (i.e., no new points in time are being written) and new timelines are being created. This means that the total cardinality of timelines in the database grows over time, and the size of active timelines is much smaller than the total cardinality, which is often referred to in the industry as series churns. In Khronos, we find that the lifecycle of the timeline is usually less than 1 hour (as shown in Figure 2 right), which is usually caused by automatic task scheduling or scheduled short-term tasks.
  • For example, in the upper left of Figure 2, taskB is first scheduled on host 1.1.1.2, reporting three red timelines, and at a certain time, taskB is scheduled to host 1.1.1.5 on a new machine due to load balancing and other reasons, generating a new yellow timeline with different host tags. The timeline (red) on the original machine is no longer reported and becomes inactive (dead). The Series Chorn problem means that the proportion of lines that are active within a certain time T (active means that the lifetime of the line intersects with the specified time range T) is much smaller than the total number of timelines in the database.
  • For example, in Figure 2, only 4 timelines are active (3 yellow for taskB and 1 blue for taskA), while the number of buses in the database is 12.
Alibaba Cloud Time Series Database real-time index construction optimization practice

2.5 Layout Storage

Global Index (GL)

Under this data load, global indexes will introduce significant read amplification. On the one hand, during the writing process, you need to determine whether each message is a stock timeline, if so, you do not need to build an index, if not, you need to assign a seriesId to build the corresponding index. The efficiency of the search decreases as the size of the global index increases. On the other hand, although the vast majority of timelines have died out in the database and the collection of active timelines is small, the query process still needs to recall the timelines that meet the conditions in the global index, and the read amplification is obvious. Figure 3 shows the timeline cardinality of some tenants in Khronos on the left. In Figure 3, we tested the global index-based InfluxDB, and we found that its write performance and query efficiency gradually decreased with the increase of the accumulated timeline cardinality in the database, and the system could only support a cardinality of 30Mbit/s, which could not meet the cardinality scale of our online trillion timelines.

Alibaba Cloud Time Series Database real-time index construction optimization practice

LSM Index (SL) for Time Partition

Therefore, for high-cardinality scenarios such as performance monitoring, data is usually organized by segmentation according to time intervals. Write data is divided into segments based on time ranges, and each segment is self-explanatory, containing local indexes and point-in-time data for its corresponding time range. When the size or time interval of a segment exceeds the threshold, the segment will be sealed, and a new segment will be opened in memory to accept real-time writes.

This LSM-like design avoids the amplification of global indexes and is better adapted to user queries. First of all, recent data is generally accessed more frequently, it resides in memory, and query performance is better. In addition, during the query, the corresponding segment can be accessed according to the time interval specified by the user to avoid the index traversal of the extinction line in GL.

03

Motivation

3.1 Effectiveness delay jitter analysis

However, in the design of SL, when each segment starts, the segment is built from scratch, which means that each write message is considered a new line (i.e., not in the SL index) and needs to be indexed, as shown in Figure 5

Alibaba Cloud Time Series Database real-time index construction optimization practice

and

Alibaba Cloud Time Series Database real-time index construction optimization practice

at

Alibaba Cloud Time Series Database real-time index construction optimization practice

You'll need to build an index at startup (although those two lines are already there.)

Alibaba Cloud Time Series Database real-time index construction optimization practice

has been indexed). As a result, the pressure to build the index soars during the segment startup phase (Figure 4, top left). Index construction, on the other hand, introduces more computation and memory access than writing data points, so the index write capacity is lower. In summary, the design of SL will cause huge index construction pressure when the segment is started, exceeding the upper limit of system processing, resulting in data accumulation and data validity (shown in Figure 4, lower left). In the production environment, we found that this data effectiveness jitter was triggered every 30 minutes and lasted for a few minutes at a time, which greatly affected the user experience.

Alibaba Cloud Time Series Database real-time index construction optimization practice

3.2 Existing Solutions

  • Add resources [2, 3]: The most direct way is to add more computing resources, but starting more build threads when the segment starts will preempt the CPU of the query, affecting the real-time query efficiency, and we found that the index construction overhead is 10-15 times that of writing point data, and theoretically we need to add 10-15 times the computing resources to solve the effectiveness delay problem, which is very wasteful.
  • Strong schema [4-6]: Another industry solution [2,3] is to control the cardinality of the timeline by forcing the schema to limit the number of tags and fields. However, this solution limits the flexibility of use, and the configuration needs to be updated every time a dimension is added, which also increases the difficulty of configuration management.
  • Periodic cleanup of dead lines [7] :P Rometheus solves the series churn problem by deleting the index corresponding to the inactive line from the real-time index from the real-time index on a regular basis, but the deletion operation will block real-time data writes, introducing a new effectiveness delay, as shown on the right side of Figure 4.

3.3 Design Ideas and Challenges of Supplemental Indexes

The key to the effectiveness of SL indexes lies in the redundancy of indexes: if the lifecycle of a timeline spans multiple segments, it will be repeatedly indexed in each segment, such as the midline in Figure 5

Alibaba Cloud Time Series Database real-time index construction optimization practice

。 We can find that due to the temporal locality of the time series data, at least 70% of the timelines between adjacent segments overlap, which means that for these 70% of the timelines, we can save the references they built indexes in the previous segment, and the real index building is triggered by the other 30% of the new lines. This is shown in the figure below.

From this, in Khronos, we came up with the concept of a complementary index, i.e., only constructing timelines that were not in the previous segment. The patch index for each segment consists of two parts: (a) an index built by a new line in the current segment (a local index) and (b) a reference to the index of the previous segment (a dependent index). For example, in the following diagram for:

Alibaba Cloud Time Series Database real-time index construction optimization practice

, whose complement index includes (a)

Alibaba Cloud Time Series Database real-time index construction optimization practice

and (b)

Alibaba Cloud Time Series Database real-time index construction optimization practice

,

Alibaba Cloud Time Series Database real-time index construction optimization practice

Alibaba Cloud Time Series Database real-time index construction optimization practice

Figure 5 shows the index building portion of the GL, SL, and Supplemental indexes. Compared with GL, SL divides segments according to time, which avoids the problem of read amplification of GL. Compared with SL, SL eliminates the need to build redundant indexes and avoids the problem of delay in effectiveness. Complement indexes combine the advantages of GL indexes without redundancy and the extremely low read amplification of SL indexes to achieve better tradeoff.

Alibaba Cloud Time Series Database real-time index construction optimization practice

However, the design of the patch index presents a number of challenges:

  • Index organization: In order to reuse the index content of the previous segment, the index structure needs to be redesigned, such as the allocation of seriesId. The global seriesId easily exceeds the bit width of 4 bytes, which affects the storage and compression efficiency, while the self-numbering within the segment may cause seriesId conflicts during inverted index multiplexing.
  • Query efficiency: Patch indexes need to traverse the index from multiple dependent segments, which may increase query latency.
  • Index dependencies: Successive index dependencies between segments form a chain of dependencies, for example, the index in Figure 5 is actually only built in and only stores references to its index. Thus dependence, and dependence, forms a chain of dependencies of length 3. If the dependency chain is extended indefinitely, it will tend to be close to a GL, and an excessively long dependency chain will increase memory usage and affect query efficiency.

At this point, we have introduced the basic idea of supplemental indexing, that is, to reduce the effectiveness delay by reducing redundant index construction, and in the following sections, we will explain in detail the implementation details of supplemental indexes, such as how to implement index references, how to organize indexes to support the reuse of inter-segment indexes, how to ensure efficient and accurate queries, how to deal with overly long index dependencies, etc.

04

System architecture

Figure 6 illustrates our data processing process, where time-series data collected from monitored entities is periodically sent to Khronos, and each message is hashed into different partitions based on the series key, each of which is independent of each other and does not communicate with each other. In each partition, the build thread writes live write data into the building segment, where point data is written to the buffer, and a patch index is built for the new line.

In-memory data is appended and can be queried in real time. Once the time interval of a building segment exceeds a certain threshold, the system automatically creates a new empty segment for receiving real-time writes, and the previous segment is converted into a dumping segment, which is converted by an asynchronous thread to perform operations such as sorting, sorting, deduplication, and index completion to form an immutable segment, which is written to the remote storage or local disk by the asynchronous flash thread. At the same time, offline will periodically trigger a compaction task to organize data, merge or split flushed segments into compacted segments with no intersection in time intervals, and reduce the number of segments accessed by queries.

Figure 7 illustrates our query flow. The query is first (1) broadcast to all partitions via QRS. In each partition, we will only (2) access segments that have intersections with the query time interval. Within the segment, the query thread (3) traverses the patch index to recall the seriesId that satisfies the condition. Then, (4) the line and point data are taken according to the seriesId, and (5) the timeline segments of the same timeline in different segments will be merged and sorted according to the timestamp. (6) The sorted data is aggregated in the return QRS and finally returned to the user. The query details of the internal indexes of a segment are described in a later section.

Alibaba Cloud Time Series Database real-time index construction optimization practice

05

Mex-based index organization

5.1 Writing Process

Let's start with the data writing process for Khronos (Algorithm 1), where we assign a point data buffer to each timeline. while

Alibaba Cloud Time Series Database real-time index construction optimization practice

When creating an initial null, we first determine whether the entered series key exists in the previous order

Alibaba Cloud Time Series Database real-time index construction optimization practice

, if it exists, we will be in it

Alibaba Cloud Time Series Database real-time index construction optimization practice

The seriesId is stored as a reference in the seriesIdList structure, and only the point data p is written to the buffer (rows 4-6 of the algorithm), thus reducing the construction of redundant indexes. Instead, we will use the Mex function to calculate the seriesId, and actually build the index and insert the point data (rows 8-11 of the algorithm). In Algorithm 1, the seriesIdList can be thought of as a special inverted list that records all the timelines written to the segment (including references to the previous segment and the timeline when the index was actually built).

seriesIdList serves as a filter for irrelevant timelines during the query, which we will cover in detail in the next section.

Alibaba Cloud Time Series Database real-time index construction optimization practice

5.2 seriesId的分配方式

In order to ensure accurate recall, the allocation of seriesId should follow the following principles:

  • The seriesId should be monotonically incremented to support inverted list insertion and efficient intersection/or-or computation.
  • For a previously existing timeline, its seriesId should be the same as that of the previous segment to reuse the inverted index, that is, the inverted list of dependent segments can be directly intersected or without converting the seriesId.
  • The new timeline assignment should ensure that there is no overlap with the seriesIdList of the previous segment to avoid conflicts.

However, with the increase of the cardinality of the storage timeline in the database, the global seriesId can easily break through the limit of 4 bytes (especially in the single-tenant trillion-base scenario such as khronos), which increases the storage cost and reduces the compression efficiency. Therefore, we have introduced the Mex function to calculate the seriesId. Mex(S) returns the smallest integer in the set S that is not in S.

For example, Mex({0, 1, 3, 5}) = 2. In Khronos, we assign each new timeline the smallest seriesId that is not in the previous and current segments, i.e. row 8 in algorithm 1. Figure 8 illustrates how the Global Increment (Max) and Mex seriesId are assigned. The characters in the box identify the timeline, and the vertical corresponding id of the box identifies the seriesId to which it is assigned. For example

Alibaba Cloud Time Series Database real-time index construction optimization practice

middle

Alibaba Cloud Time Series Database real-time index construction optimization practice

The seriesId is assigned as 4. Figure 9 illustrates the process of building a patch inverted index (corresponding to Figure 8).

Alibaba Cloud Time Series Database real-time index construction optimization practice
Alibaba Cloud Time Series Database real-time index construction optimization practice

The essence of Mex's design is that it securely reuses inactive seriesIds. In Figures 8 and 9, seriesId 2 is in

Alibaba Cloud Time Series Database real-time index construction optimization practice

is assigned to the line

Alibaba Cloud Time Series Database real-time index construction optimization practice

however

Alibaba Cloud Time Series Database real-time index construction optimization practice

The life cycle did not carry over

Alibaba Cloud Time Series Database real-time index construction optimization practice

。 seriesId 2在和

Alibaba Cloud Time Series Database real-time index construction optimization practice

It can be easily filtered out, so seriesId 2 is paired

Alibaba Cloud Time Series Database real-time index construction optimization practice

is inactive. Based on this, for:

Alibaba Cloud Time Series Database real-time index construction optimization practice

No, no

Alibaba Cloud Time Series Database real-time index construction optimization practice

and

Alibaba Cloud Time Series Database real-time index construction optimization practice

can be safely reused. Another advantage of the Mex index is that it reduces the gap value of the id in the inverted list by multiplexing the seriesId, which is more conducive to compression, for example, in Figure 8

Alibaba Cloud Time Series Database real-time index construction optimization practice

, Max's

Alibaba Cloud Time Series Database real-time index construction optimization practice

And the Mex program

Alibaba Cloud Time Series Database real-time index construction optimization practice

, the gap value is greatly reduced.

06

Patch index query process

In this section, we will introduce the query process of the patch index, in which we propose a scheme for intermediate result reuse, avoiding duplicate index traversal and ensuring accurate result recall. Before introducing the patch index query, let's first understand how the internal index of a single segment works together to recall the set of seriesIds that meet the criteria based on the user query, as shown in Figure 10.

Among them, we recall the CPU metrics of the application Khronos, which are deployed on specific machines. (1) We traverse the prefix tree index to obtain all tag hosts with tagv prefixed with 10.10.12;(2) filter out two tagv of 10.10.12.11 and 10.10.12.15 through regular filters;(3) construct a query tree according to the metric and tag conditions specified by the query, each leaf node corresponds to an inverted list, and the intermediate node represents the and/or operation; (4) Perform intersecting or operation according to the query tree and the corresponding inverted list to obtain the set of seriesIds {0, 2, 3, 7} that meet the conditions. Then, according to the seriesId, the point data of the corresponding timeline can be retrieved in the segment for subsequent operations such as merging.

For the patch index, we first perform the index traversal shown in the figure above in the local index of each segment to get a set of seriesIds that satisfies the criteria

Alibaba Cloud Time Series Database real-time index construction optimization practice

。 In order to ensure the integrity of the recall results, we also need to iterate through the indexes of all dependent segments along the dependency chain of segments. Let's start with a simple example, assuming that the query only needs to access one segment

Alibaba Cloud Time Series Database real-time index construction optimization practice

, its index only depends

Alibaba Cloud Time Series Database real-time index construction optimization practice

so

Alibaba Cloud Time Series Database real-time index construction optimization practice

The full collection of seriesIds that should be recalled

Alibaba Cloud Time Series Database real-time index construction optimization practice

It can be expressed as:

Alibaba Cloud Time Series Database real-time index construction optimization practice

thereinto

Alibaba Cloud Time Series Database real-time index construction optimization practice

be

Alibaba Cloud Time Series Database real-time index construction optimization practice

Traversing the recall results of the local index (i.e., the actual built index) is shown in Figure 10. We'll start with

Alibaba Cloud Time Series Database real-time index construction optimization practice

The recall results and

Alibaba Cloud Time Series Database real-time index construction optimization practice

Intersect, filter out what doesn't belong

Alibaba Cloud Time Series Database real-time index construction optimization practice

of the unrelated timeline, then the results are sumed

Alibaba Cloud Time Series Database real-time index construction optimization practice

The results of the local recall

Alibaba Cloud Time Series Database real-time index construction optimization practice

Merge to get a complete and accurate set of seriesIds. Previously mentioned

Alibaba Cloud Time Series Database real-time index construction optimization practice

All timelines in a segment are recorded, and those timelines that belong only to the previous segment can be filtered out by intersecting with them. For prefix trees, the recalled tagv set may be false positive due to index reuse, but the subsequent inverted calculation can ensure the accuracy of the final recalled seriesId set.

Next, let's consider a more complex situation, hypothetical

Alibaba Cloud Time Series Database real-time index construction optimization practice

depend

Alibaba Cloud Time Series Database real-time index construction optimization practice

whereas

Alibaba Cloud Time Series Database real-time index construction optimization practice

depend

Alibaba Cloud Time Series Database real-time index construction optimization practice

so

Alibaba Cloud Time Series Database real-time index construction optimization practice

The full recall results

Alibaba Cloud Time Series Database real-time index construction optimization practice

It can be expressed as:

Alibaba Cloud Time Series Database real-time index construction optimization practice
Alibaba Cloud Time Series Database real-time index construction optimization practice

First and

Alibaba Cloud Time Series Database real-time index construction optimization practice

Intersect, filter out the absence

Alibaba Cloud Time Series Database real-time index construction optimization practice

After this step, the inactive seriesIds will not be passed backwards to continue the calculation, so this part of the seriesId can be safely stored in

Alibaba Cloud Time Series Database real-time index construction optimization practice

Medium multiplexing. The above formula further proves the correctness of the Mex index. For example, for Figure 9, if we want to start with

Alibaba Cloud Time Series Database real-time index construction optimization practice

Token A is recalled,

Alibaba Cloud Time Series Database real-time index construction optimization practice

It is important to note that for the initial segment of the dependency chain, for example

Alibaba Cloud Time Series Database real-time index construction optimization practice

Alibaba Cloud Time Series Database real-time index construction optimization practice

Let's consider a more complex case, let's say that the query recalls data from multiple segments, that is, the query intersects with multiple segments in the time range. In SL index mode, each segment is independent of each other, and the index of the same timeline will be repeatedly built in multiple segments, and the query process will also be traversed repeatedly, which is a great waste. On the other hand, in our complement index design, the recall result of the previous segment

Alibaba Cloud Time Series Database real-time index construction optimization practice

is the result of the current segment recall

Alibaba Cloud Time Series Database real-time index construction optimization practice

(see Equation 1 above).

Therefore, in each query session, we process segments in the order of dependencies, and cache the intermediate results from the previous segment for the next segment query. In this way, in each segment, we only need to access the built local index, and we do not need to repeatedly traverse the index that each segment pre-order depends, thus avoiding the traversal of duplicate indexes in SL mode. Figure 11 illustrates the process of intermediate result reuse. In conclusion, on the one hand, the supplementary index reduces the redundant index construction in the construction process, improves the data effectiveness, and on the other hand, reduces the traversal of duplicate indexes in the query process, so it has obvious benefits.

Alibaba Cloud Time Series Database real-time index construction optimization practice

07

Index completion

In this section, we analyze the additional overhead of indexing dependency chains and propose a fine-grained index completion strategy to cut off long dependency chains.

7.1 Dependency Chain Overhead

In the previous sections, we discussed the benefits of dependency chains, such as the removal of redundant index building and traversal, which are actually the advantages of global index GL. If the dependency chain can be extended indefinitely, it will revert back to the GL version, which will lead to the read amplification problem under the series churn load mentioned earlier. A long dependency chain will incur the following overheads:

  • Memory bloat: The most recent data is frequently accessed, and you want to keep the query latency very low, so the most recent data is generally kept in memory, and for patch indexes, this means that all the dependent data of the real-time segment needs to be kept in memory to ensure query latency, and if the dependency chain is too long, this part of the memory will continue to expand.
  • Query recursive overhead: In the previous section, although we reduced duplicate index access, we introduced additional recursive overhead, that is, we accessed data that did not belong to us in the previous segment due to index dependencies. For example, in Figure 9, when we recall the seriesId set corresponding to token A in , it is not related to the seriesId 5 because the index dependency also participates in the inverted calculation. The longer the dependency chain, the greater the additional recursive overhead. When the dependency chain is infinite, this part of the recursive overhead is equivalent to the traversal of the GL index over the extinction line index.

Therefore, we need to complete the indexes in the segment at a specific time to cut off the pre-order dependency. 

7.2 Index Completion Algorithm

In Khronos, we chose to do asynchronous completion of the index during the dump phase. The reason why it is processed in the dump phase is due to the following reasons: (a) the index of the memory segment depends on the disk segment will introduce additional I/O overhead and increase the query latency; (b) Persistent segments maintain the self-explanatory nature and facilitate the export of data migration.

Algorithm 2 describes our whole process of complementing. Let's sort the series keys by metric. When all the data corresponding to a metric is completed, the corresponding metric subtrees in the original prefix tree index can be released immediately, reducing the index bloat in the completion process. For inverted indexes, we use inverted lists as granular completions. The query saves the references that are accessing the index, avoiding the release of memory during the query process.

Alibaba Cloud Time Series Database real-time index construction optimization practice

As we mentioned earlier, patching indexes can reduce duplicate index traversal compared to SL schemes during queries, but at the same time introduce additional recursive overhead. In the paper, we demonstrate that when the repetition of the timeline between adjacent segments exceeds a certain threshold, the query efficiency of the patching scheme is higher. Students who are interested in the details can read Section 7 of the paper. In Khronos' online tenants, the timeline duplication of adjacent segments is generally 70%-80%, and the benefits of deduplicating index traversal are much higher than the additional recursive overhead, making topping index queries more efficient.

08

Summary of the experiment

In the experiment, we compare the write throughput, efficiency, query latency and memory overhead of SL version and the patch index version, and test the benefits of different write and query scenarios. We also tested build and query performance with industry-leading open-source databases InfluxDB[8] and TimeScaleDB[9]. Compared with the open-source TSDB, Khronos increases the write throughput by at least 4 times, reduces the effective latency by nearly 100 times, and improves the query efficiency by at least 6 times, proving the high efficiency of Khronos.

8.1 Delay in Effectiveness

The SL solution is the original version of Khronos before the patch indexing scheme was launched, while the CPL scheme is the version after the patch indexing scheme was launched. Figure 14 illustrates the change in write throughput (top) and utility latency (bottom) during segment boot before and after the upgrade. We can see that in the SL solution, when the segment is started, due to the increased pressure of index construction, the effectiveness delay jitter is reduced to 5 minutes, which lasts for nearly 8 minutes, and at the same time, due to the greater overhead of index construction, the write throughput also drops sharply, see the orange line. In the CPL patch index scheme, the effectiveness delay remains below 1s when the segment is started, which is significantly higher than that of SL, as shown in the blue line.

After the optimization of supplementary indexes was launched in the second half of 2022, the effectiveness delay of Double 11 was reduced by hundreds of times compared with 2021 (achieving a breakthrough in minutes to seconds), greatly improving the user experience. 

Alibaba Cloud Time Series Database real-time index construction optimization practice

8.2 Query Efficiency

Figure 15 shows the query latency of different tenants ASI, HI (Hippo Docker), IG (IGraph), and TPP before and after using the patch index. The arrows in the figure indicate the percentage of latency reduction for the CPL scheme compared to the SL scheme. Table 3 shows the timeline and data scale of different tenant recalls, and combined with Figure 15 and Table 3, we can see that when thousands of lines and millions of points are recalled, the engine can still guarantee a query latency of less than 200ms. To analyze the latency benefit of the patch index, we can compare the solid and dashed lines of the same color.

For the vast majority of queries, patch indexes provide a positive latency benefit, up to 19%. We found that for queries with a time range of less than half an hour, the query typically only accesses one or two segments, in which case the benefits of a patch index to reduce duplicate index traversal are difficult to exploit, and instead reduce query efficiency by up to 6% due to additional recursive dependencies (see HI tenants). However, when the query time interval exceeds 3 hours, most of them will access local disks or remote storage, so the revenue ratio of supplemental indexes is not so obvious.

Alibaba Cloud Time Series Database real-time index construction optimization practice
Alibaba Cloud Time Series Database real-time index construction optimization practice

citation

[01] Fabian Reinartz. 2022. Writing a Time Series Database from Scratch.

https://web.archive.org/web/20210622211933/https://fabxc.org/tsdb/

[02] Franco Solleza, AndrewCrotty, Suman Karumuri, Nesime Tatbul, and Stan Zdonik. 2022. Mach: A Pluggable Metrics Storage Engine for the Age of Observability. In Proc. CIDR.

[03] Rahul Potharaju, Terry Kim, Wentao Wu, Vidip Acharya, Steve Suh, Andrew Fogarty, Apoorve Dave, Sinduja Ramanujam, Tomas Talius, Lev Novik, and Raghu Ramakrishnan. 2020. Helios: Hyperscale Indexing for the Cloud & Edge. Proc. VLDB Endow. 13, 12 (2020), 3231–3244.

[04] Colin Adams, Luis Alonso, Benjamin Atkin, John Banning, Sumeer Bhola, Rick Buskens, Ming Chen, Xi Chen, Yoo Chung, Qin Jia, Nick Sakharov, George Talbot, Nick Taylor, and Adam Tart. 2020. Monarch: Google’s Planet-Scale In-Memory Time Series Database. Proc. VLDB Endow. 13, 12 (2020), 3181–3194.

[05] 2022. Timescale: Time-series data simplified.

https://www.timescale.com

[06] Zhiqi Wang and Zili Shao. 2022. TimeUnion: An Efficient Architecture with Unified Data Model for Timeseries Management Systems on Hybrid Cloud Storage. In Proc. SIGMOD. ACM, 1418–1432.

[07] 2022. Prometheus-Monitoring system & time series database.

https://prometheus.io

[08] 2022. InfluxDB: Open Source Time Series Database.

https://www.influxdata.com

[09] 2022. Timescale: Time-series data simplified.

https://www.timescale.com

Author: Liu Xinyu (Xinyu)

Source-WeChat public account: Ali Technology

Source: https://mp.weixin.qq.com/s/sdcOaN8nCntKn83pwZ46Xw

Read on