
Huawei's in-house time series database is now open source: how good is it?

Author: Not Bald Programmer

Time series databases (TSDBs) have been used across a variety of industries for decades, especially in financial and industrial control systems. However, the advent of the Internet of Things (IoT) has caused the volume of time series data to explode, which places higher demands on database performance and storage costs and drives the need for dedicated time series databases.

Legacy time series solutions suffer from outdated architectures and limited scalability. In response, a new generation of time series databases has emerged, built on modern architectures that enable distributed processing and horizontal scaling, as well as flexible deployment in the cloud or on premises.

At the end of 2022, another major product entered the open-source time series database field: openGemini, Huawei's open-source distributed time series database. In just one year it was tested and put into production at more than 60 enterprises and attracted 70+ contributors from leading universities and companies in China and abroad. openGemini focuses on storing and analyzing massive volumes of time series data. Through technical innovation, it aims to simplify business system architecture, reduce the cost of storing massive time series data, and improve storage and analysis efficiency.

Today we invited Xiang Yu, leader of the openGemini community, to tell the project's open-source story.

01 Born from internal needs, moving step by step toward in-house development

The development of openGemini originated from Huawei's own needs.

In 2019, with the establishment of HUAWEI CLOUD, data centers were built one after another in Guangzhou, Shanghai, Beijing, Guizhou, and Hong Kong, and more than 260 cloud services were launched, together producing terabytes of monitoring metrics per day on average. As the data volume grew, query efficiency dropped and storage costs kept rising.

At the time, no time series database product could keep up with this demand. InfluxDB's open-source edition was still single-node only, and the domestic options, Apache IoTDB and TDengine, were far from meeting production requirements. Huawei therefore decided to build its own database to optimize data processing and solve its most pressing business problems. This is the context in which openGemini came into being.

According to Xiang Yu, the team initially built a clustered version on top of open-source InfluxDB. However, as the number of metrics and the collection frequency grew, the daily data volume reached tens of terabytes. At that point, flaws in InfluxDB's own architecture became apparent and began to affect system performance and stability. The team therefore chose to redesign the architecture and began developing the openGemini kernel in-house.

02 Distinctive features and performance

Since its inception, openGemini has been closely tied to the needs of Huawei's own business, so every design decision reflects practical considerations. Specifically, openGemini differs from other time series databases in nine main "personalities":

Performance advantages: High performance is the most important element of openGemini's competitive differentiation. Compared with open-source InfluxDB on massive data sets, openGemini is more than 2x faster on simple queries, more than 5x faster on medium-complexity queries, and more than 10x faster on complex queries. It also shows clear performance advantages over other similar open-source products.

The official single-node write performance results are as follows (the benchmark tool is TSBS; see the official website documentation for test details):

[Figure: official single-node write performance, TSBS benchmark]

Official comparison of single-node query performance (average latency, ms) in DevOps scenarios:

[Figure: single-node query performance in DevOps scenarios, average latency in ms]

Comparison of single-node query performance (average latency, ms) in IoT scenarios:

[Figure: single-node query performance in IoT scenarios, average latency in ms]

Beyond raw performance, openGemini offers a range of practical data storage and analysis features that further set it apart:

Unique distributed architecture: openGemini is available in two editions, single-node and distributed cluster. The distributed cluster uses an MPP (massively parallel processing) layered architecture that splits the compute engine, storage engine, and metadata management into independent components: ts-sql, ts-store, and ts-meta. Each component can be scaled out independently, so the cluster can adapt flexibly to complex application scenarios.
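
To make the MPP idea concrete, here is a toy Python sketch of scatter-gather query execution. It is not openGemini code: the class names, the crc32-based series placement, and the (sum, count) push-down are assumptions chosen for illustration. The three classes only mirror the roles the text assigns to ts-meta (knowing where data lives), ts-store (holding data), and ts-sql (routing writes and merging partial results).

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StoreNode:
    """Hypothetical stand-in for a storage component (the role ts-store plays)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, series: str, value: float) -> None:
        self.data.setdefault(series, []).append(value)

    def partial_sum_count(self, prefix: str) -> tuple:
        # Push-down: each node returns (sum, count) instead of its raw points.
        vals = [v for key, vs in self.data.items() if key.startswith(prefix) for v in vs]
        return sum(vals), len(vals)


class MetaService:
    """Hypothetical stand-in for metadata (the role ts-meta plays): who owns which series."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def owner(self, series: str) -> StoreNode:
        # crc32 is stable across processes, unlike Python's salted hash().
        return self.nodes[zlib.crc32(series.encode()) % len(self.nodes)]


class SqlNode:
    """Hypothetical stand-in for the query layer (the role ts-sql plays)."""

    def __init__(self, meta: MetaService):
        self.meta = meta

    def write(self, series: str, value: float) -> None:
        self.meta.owner(series).write(series, value)

    def mean(self, prefix: str) -> float:
        # Scatter the partial aggregation to every store node, then gather and merge.
        parts = [node.partial_sum_count(prefix) for node in self.meta.nodes]
        total = sum(s for s, _ in parts)
        count = sum(c for _, c in parts)
        return total / count if count else float("nan")


sql = SqlNode(MetaService(StoreNode(f"store-{i}") for i in range(3)))
for i in range(6):
    sql.write(f"cpu,host=h{i}", float(i))
print(sql.mean("cpu"))  # 2.5 = (0+1+2+3+4+5) / 6
```

Because each store returns only a partial (sum, count), adding store nodes adds scan capacity without shipping raw points to the query layer. The simple modulo placement used here would reshuffle most series whenever the node count changes, so a real cluster needs a more careful placement scheme.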

High-cardinality engine: The high-cardinality problem (also known as the "curse of dimensionality") has long held back time series databases. It bloats the inverted index, which drives up memory consumption and degrades read and write performance. openGemini's high-cardinality engine addresses this problem by building sparse indexes designed specifically for time series, which makes it well suited to network monitoring, financial risk control, the Internet of Things, transportation, and similar fields.
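
The general idea behind a sparse index can be sketched in a few lines of Python. This is a generic illustration under our own assumptions (rows sorted by series key, fixed-size blocks, one in-memory mark per block), not openGemini's actual index format. An inverted index keeps at least one entry per distinct series or tag value, so its size tracks cardinality; the marks below grow only with the number of blocks.

```python
import bisect


class SparseIndex:
    """Hypothetical sparse index: one entry per block of rows, not one per series."""

    def __init__(self, rows, block_size=4):
        # rows: (series_key, timestamp, value); sorting keeps each series contiguous.
        self.rows = sorted(rows)
        self.block_size = block_size
        # Only the first series key of each block is kept in memory.
        self.marks = [self.rows[i][0] for i in range(0, len(self.rows), block_size)]

    def lookup(self, key):
        # The block just before the first mark >= key may still contain `key`.
        start = max(bisect.bisect_left(self.marks, key) - 1, 0)
        hits = []
        for b in range(start, len(self.marks)):
            if self.marks[b] > key:
                break  # rows are sorted, so later blocks cannot contain `key`
            block = self.rows[b * self.block_size:(b + 1) * self.block_size]
            hits.extend(r for r in block if r[0] == key)
        return hits


rows = [(f"cpu,host=h{h}", t, float(t)) for h in range(10) for t in range(3)]
idx = SparseIndex(rows)
print(len(idx.marks), len(idx.lookup("cpu,host=h3")))  # 8 3
```

With 30 rows and a block size of 4, the index holds 8 marks for 10 series; a lookup scans only the one or two blocks whose key range can contain the requested series.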


Text retrieval: Text is a common data type. openGemini supports indexes on text data using a dynamically learned word segmentation method, supports exact, phrase, and fuzzy matching, and offers low memory usage and high retrieval efficiency.
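
Exact, phrase, and fuzzy matching can be illustrated with a small positional inverted index. In this hypothetical sketch the tokenizer simply splits on non-alphanumeric characters, standing in for openGemini's dynamically learned segmentation, and "fuzzy" means a Levenshtein distance of at most one edit; neither choice is taken from openGemini.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Stand-in tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def edit_distance(a, b):
    # Classic Levenshtein distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class TextIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: [positions]}

    def add(self, doc_id, text):
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def exact(self, term):
        return set(self.postings.get(term.lower(), {}))

    def phrase(self, text):
        # Documents where all terms appear at consecutive positions.
        terms = tokenize(text)
        if not terms:
            return set()
        hits = set()
        for doc in set.intersection(*(self.exact(t) for t in terms)):
            for p in self.postings[terms[0]][doc]:
                if all(p + k in self.postings[t][doc] for k, t in enumerate(terms)):
                    hits.add(doc)
                    break
        return hits

    def fuzzy(self, term, max_edits=1):
        # Documents containing any indexed term within `max_edits` edits of `term`.
        term = term.lower()
        return {doc for t, docs in self.postings.items()
                if edit_distance(t, term) <= max_edits for doc in docs}


text_idx = TextIndex()
text_idx.add(1, "disk read error on node-3")
text_idx.add(2, "read disk error")
text_idx.add(3, "network timeout")
print(sorted(text_idx.exact("disk")), sorted(text_idx.phrase("disk read")),
      sorted(text_idx.fuzzy("eror")))  # [1, 2] [1] [1, 2]
```

Storing positions in the postings is what separates phrase matching from a plain AND of terms: document 2 contains both "disk" and "read", but not in that order.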

Streaming aggregation: Streaming aggregation is a pre-aggregation approach that downsamples data at the same time it is written. It is designed to avoid the severe I/O amplification of traditional downsampling, which reads large amounts of historical data back from disk to compute aggregates.
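
A minimal sketch of the idea, assuming fixed, aligned time windows and in-memory aggregates (the class and field names are hypothetical, not openGemini's): each write updates its window's count, sum, min, and max immediately, so producing the downsampled series never requires reading raw points back from disk.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count


class StreamingDownsampler:
    """Hypothetical write path that updates window aggregates as each point arrives."""

    def __init__(self, window_seconds: int):
        self.window = window_seconds
        self.raw = []  # raw points are still stored as usual
        self.agg = {}  # (series, window_start) -> WindowStats, kept in memory

    def write(self, series: str, ts: int, value: float) -> None:
        self.raw.append((series, ts, value))
        window_start = ts - ts % self.window
        # Update the aggregate now, so no later pass has to re-read raw data.
        self.agg.setdefault((series, window_start), WindowStats()).add(value)


ds = StreamingDownsampler(window_seconds=60)
for ts, v in [(0, 1.0), (30, 2.0), (59, 3.0), (60, 10.0)]:
    ds.write("cpu,host=h1", ts, v)
w0 = ds.agg[("cpu,host=h1", 0)]
print(w0.count, w0.mean, w0.minimum, w0.maximum)  # 3 2.0 1.0 3.0
```

A real engine would also persist these aggregates and handle late or out-of-order points; the sketch ignores both.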

Multi-level downsampling: For existing historical data, traditional downsampling still retains the detailed raw data. In some scenarios those details are unimportant and only the data's features need to be kept. Multi-level downsampling extracts the features from the detailed historical data and replaces the details in place, which can reduce storage costs by a further 50%.
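
The in-place replacement described above can be sketched as repeatedly merging window summaries into coarser windows. The feature set (count, mean, min, max) and the window sizes are assumptions for illustration; the source does not say which features openGemini keeps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Summary of one window: what survives after the raw details are dropped."""
    start: int
    count: int
    mean: float
    minimum: float
    maximum: float


def from_points(points):
    # Treat each raw (timestamp, value) point as a one-sample feature.
    return [Feature(ts, 1, v, v, v) for ts, v in points]


def downsample(features, window):
    """Merge features into coarser windows; applying it again gives the next level."""
    buckets = {}
    for f in features:
        buckets.setdefault(f.start - f.start % window, []).append(f)
    merged = []
    for start in sorted(buckets):
        group = buckets[start]
        n = sum(f.count for f in group)
        merged.append(Feature(
            start=start,
            count=n,
            mean=sum(f.mean * f.count for f in group) / n,  # count-weighted mean
            minimum=min(f.minimum for f in group),
            maximum=max(f.maximum for f in group),
        ))
    return merged


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
level1 = downsample(from_points(raw), 60)  # e.g. 1-minute features replace raw points
level2 = downsample(level1, 3600)          # e.g. 1-hour features replace level 1
print(len(level1), level2)
# 2 [Feature(start=0, count=4, mean=4.0, minimum=1.0, maximum=7.0)]
```

Keeping the count alongside the mean is what makes the second level correct: merging level-1 features with a count-weighted mean gives the same result as averaging the raw points directly.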

Anomaly detection and prediction: Anomaly detection and prediction is one of the most mature applications of time series analysis and is widely used in quantitative trading, network security detection, and routine maintenance of data centers, industrial equipment, and IT infrastructure. openGemini provides an anomaly detection library, openGemini-Castor, which packages detection algorithms for 13 common anomaly scenarios. It offers fast detection, high accuracy, and unified stream and batch processing, helping applications analyze data more efficiently.
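
openGemini-Castor's 13 algorithms are not described here, so the following is only a generic stand-in: a rolling z-score (three-sigma style) detector in Python that flags a point when it deviates from the mean of the preceding window by more than a threshold number of standard deviations. The function name and default parameters are hypothetical and do not reflect Castor's API.

```python
import math
from collections import deque


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of values more than `threshold` std devs from the trailing-window mean."""
    history = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        if len(history) == window:
            mean = sum(history) / window
            std = math.sqrt(sum((x - mean) ** 2 for x in history) / window)
            if std == 0:
                is_anomaly = v != mean  # flat history: any change stands out
            else:
                is_anomaly = abs(v - mean) / std > threshold
            if is_anomaly:
                flagged.append(i)
        history.append(v)  # the point joins the history whether or not it was flagged
    return flagged


series = [10, 11, 10, 9, 10, 11, 10, 50, 10, 9]
print(rolling_zscore_anomalies(series))  # [7]
```

Note that the spike at index 7 then inflates the window's standard deviation for the next few points, which is one reason production detectors use more robust statistics than this sketch.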

Hot and cold tiered data storage: Historical data can be offloaded to object storage, so it can be retained indefinitely at low cost and used for offline big data analysis. [This feature is planned to be released in H2]

Data reliability: Multiple compute replicas are supported to further improve data reliability. [This feature is planned to be released in H2]


03 Focused on user experience and easy to get started

openGemini is not only powerful; its design also makes it comfortable to use in practice:

In terms of getting started, openGemini is fully compatible with the InfluxDB v1.x ecosystem: its interfaces and ecosystem tools can be used directly, so InfluxDB users can migrate without barriers. openGemini also uses the same Line Protocol as InfluxDB, which keeps data modeling simple and easy to understand and is friendly to developers coming from relational databases. Finally, openGemini uses a SQL-like query language, so there is little new to learn. For cluster deployment, the community also provides a one-click deployment tool, Gemix, which saves a great deal of configuration work.
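
As an illustration of the Line Protocol format shared with InfluxDB v1.x (measurement, optional tags, fields, optional nanosecond timestamp), here is a small Python formatter. It is a hypothetical helper rather than part of any openGemini SDK, and it covers only the common escaping rules: backslash-escaping commas and spaces in measurement names; commas, equals signs, and spaces in tag keys, tag values, and field keys; and quoting string field values.

```python
def _escape(text, special):
    # Backslash-escape each special character.
    for ch in special:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value):
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement, tags, fields, ts_ns=None):
    """Build one line: measurement[,tag=value...] field=value[,field=value...] [timestamp]."""
    key_chars = (",", "=", " ")
    line = _escape(measurement, (",", " "))
    for k in sorted(tags):  # sorting tag keys is a common recommendation
        line += f",{_escape(k, key_chars)}={_escape(str(tags[k]), key_chars)}"
    line += " " + ",".join(f"{_escape(k, key_chars)}={_field_value(v)}" for k, v in fields.items())
    return line if ts_ns is None else f"{line} {ts_ns}"


print(to_line("cpu", {"region": "cn-south", "host": "server 01"},
              {"usage": 0.64, "cores": 8}, 1465839830100400200))
# cpu,host=server\ 01,region=cn-south usage=0.64,cores=8i 1465839830100400200
```

Given the stated InfluxDB v1.x compatibility, such lines would typically be posted to an InfluxDB-v1-style /write?db=<database> HTTP endpoint; check the openGemini documentation for the exact address and parameters of your deployment.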

In terms of operating systems, openGemini currently supports mainstream Linux distributions (including openEuler), Windows, and macOS, which makes application development and debugging more convenient. Both x86 and ARM64 processors are supported.

In terms of cloud native support, openGemini provides a Dockerfile and Docker images and supports deployment on platforms such as Docker, Kubernetes (K8s), and KubeEdge. Because a container's IP address changes when it restarts, openGemini added domain name support so that cluster nodes can still reach each other after a container restart. The community has also created the openGemini-operator project for one-click containerized deployment. openGemini supports Prometheus remote read and write, so it can serve as backend storage for Prometheus and solve the problem of insufficient storage capacity. [Note: native PromQL support is also under development.]

In terms of observability, the community developed the ts-monitor component, which collects node and kernel metrics: more than 260 items in 19 subcategories. Used with Grafana, it provides comprehensive monitoring of openGemini's running state. For example, CPU and memory utilization, write bandwidth, write latency, write concurrency, and QPS can all be seen at a glance in a visual dashboard, making it easy to check system status, tune database performance, and pinpoint problems at any time.

04 Battle-tested internally, then given back to open source

As a time series database, openGemini is most commonly used for IoT and O&M monitoring, where its ability to process massive amounts of data is a major advantage. As an internal Huawei project, openGemini has also been proven in production by its "own family":

HUAWEI CLOUD SRE uses openGemini as the storage foundation for monitoring data, running 25 clusters in total with up to 70 nodes per cluster, and has withstood 40 million writes per second and 50,000 concurrent queries per second. Compared with the original solution, for the same workload, end-to-end latency dropped by 50%, CPU usage by 68%, memory usage by 50%, and disk usage by more than 90%.

HUAWEI CLOUD's industrial IoT platform previously ran on the stand-alone version of InfluxDB. Since switching to openGemini, it is no longer constrained by throughput: end-to-end and query performance have improved threefold, and the number of connected devices has grown to millions.

Xiang Yu explained that openGemini has its roots in open source and benefited greatly from the InfluxDB open-source project. In keeping with the open-source spirit, all of openGemini's code is open source. The team hopes that more enterprises and developers around the world will benefit from it, and that, through an open community platform, it can work with developers to drive technical innovation and share the results.

At present, openGemini is offered only as an open-source project and as a cloud service. The team does not plan to release an offline commercial edition and is willing to donate the project to a foundation. The community still has many gaps; going forward, it will further enrich openGemini's ecosystem tools (such as data migration tools, SDKs, and big data ecosystem integration), its visual management interface, and its documentation.

"For now, the community's technical roadmap focuses on three key application scenarios: the Internet of Things, O&M monitoring, and observability. We will strengthen compatibility with the related technology ecosystems and build up kernel capabilities, and we have begun evaluating openGemini's next-generation software architecture," Xiang Yu said.

"In the short term, openGemini will not target industrial scenarios. Business scenarios in the industrial field are very complex, real-time requirements are extremely high, and industrial software vendors have deep moats, so there is only so much a time series database can do there. In addition, the community lacks industry background and does not yet understand these scenarios well enough. Later on, we will look for partners in the industrial field, such as industrial software vendors and solution providers, to cooperate and improve together," Xiang Yu said.

openGemini official website:

https://www.opengemini.org/

openGemini source code:

https://github.com/opengemini
