Database "autonomous driving": Tencent Cloud native database × AI exploration and breakthroughs!

Introduction | The latest research from Tencent Cloud's native database team was accepted at SIGMOD, a top international conference. By combining the database with AI to form an autonomous brain, the team also achieved good results in the 2022 intelligent tuning human-versus-machine competition, marking a major breakthrough for Tencent Cloud in database autonomy and a lead in performance.

Adding AI technology to form an autonomous database brain

—— Zhou Ke/Professor of Huazhong University of Science and Technology

Professor Zhou Ke said that, with data volumes growing massively, the number of DBAs (manual operations staff) cannot keep pace with the growth of data. User workloads are diverse and dynamic, so a fixed, unchanging operations model can no longer meet user needs. Adding AI to the database to form an autonomous database brain is in line with the direction in which database autonomy is developing.

The basic framework for database autonomy covers three aspects: observation, analysis, and decision-making. The system collects load data at low cost, selects an appropriate analysis method based on the collected data, and then makes the final decision on when to deploy a change. Operations are quantified so that a model can process them, and the results are fed back so that the model can learn and optimize itself. As a result, the autonomous database can configure, manage, and optimize itself for specific data and loads without human assistance.
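The observe, analyze, decide cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Observation` fields, the latency threshold, and the "double the buffer pool" rule are hypothetical stand-ins for the learned models a real autonomous database would use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Observation:
    """One cheaply collected load sample (hypothetical fields)."""
    qps: float
    p99_latency_ms: float


def analyze(window: List[Observation], latency_slo_ms: float) -> str:
    """Analysis step: classify recent load. A stand-in for a learned model."""
    avg_latency = mean(o.p99_latency_ms for o in window)
    return "overloaded" if avg_latency > latency_slo_ms else "healthy"


def decide(state: str, buffer_pool_mb: int, max_mb: int = 65536) -> int:
    """Decision step: turn the analysis into a quantified action."""
    if state == "overloaded":
        return min(buffer_pool_mb * 2, max_mb)
    return buffer_pool_mb


def autonomy_step(window: List[Observation], buffer_pool_mb: int,
                  latency_slo_ms: float = 50.0) -> Tuple[int, Dict[str, object]]:
    """Run one observe -> analyze -> decide round and return feedback
    that a model could later learn from."""
    state = analyze(window, latency_slo_ms)
    new_size = decide(state, buffer_pool_mb)
    feedback = {"state": state, "old_mb": buffer_pool_mb, "new_mb": new_size}
    return new_size, feedback
```

Calling `autonomy_step` repeatedly with fresh observation windows gives the closed loop: each round's feedback record is what a self-learning model would consume.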

In the future, the challenges facing database autonomy are threefold:

The database load is dynamic and diverse, so the database must adapt to it efficiently;

Performance must remain stable while using fewer resources;

Autonomous operations must be interpretable, so that system administrators can learn from them and the system can keep improving.

Get better tuning results in as little time as possible

—— Xing Jiashu / TEG Database R&D Department / Cloud Native Database R&D Center / Senior Engineer

A database has many parameters, which makes tuning difficult and operations staffing costly. Existing tools have limited functionality, take a long time, and deliver only average results. Some users have no full-time operations team, which makes parameter tuning even harder. To address these difficulties, the Tencent Cloud Database team launched a parameter tuning service that tunes database parameters automatically, end to end. Compared with existing methods, CDBTune (Tencent Cloud MySQL Hybrid Tuning System) does not need to subdivide load types or accumulate a large number of samples; it learns the tuning process intelligently and achieves better tuning results.

In principle, the system uses a deep reinforcement learning model. It stress-tests the database, records the database's internal and external metrics, and generates samples for learning. Genetic algorithms and expert experience provide a fast warm-up, and a parallel architecture significantly increases tuning speed. The design is end-to-end: simple, efficient, easy to train, and easy to offer as a service, so it can find the best tuning direction for users in as little time as possible.
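To make the warm-up idea concrete, here is a minimal sketch of a genetic algorithm that starts from expert-provided configurations and produces (configuration, throughput) samples that a learner could train on. It is not the team's implementation: the two knobs in `KNOBS`, their ranges, and the synthetic `stress_test` function are all assumptions made for illustration, and the deep reinforcement learning model itself is left out.

```python
import random

# Hypothetical knob ranges; real systems tune far more parameters.
KNOBS = {"buffer_pool_mb": (128, 8192), "io_threads": (1, 64)}


def stress_test(cfg):
    """Synthetic stand-in for a real stress test. Returns a throughput
    score that peaks at buffer_pool_mb=6000 and io_threads=16."""
    return (1000.0
            - abs(cfg["buffer_pool_mb"] - 6000) / 10.0
            - 5.0 * abs(cfg["io_threads"] - 16))


def random_config(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in KNOBS.items()}


def crossover(a, b, rng):
    """Take each knob from one of the two parents at random."""
    return {k: rng.choice((a[k], b[k])) for k in KNOBS}


def mutate(cfg, rng, rate=0.3):
    """Nudge some knobs by up to 10% of their range, clamped to bounds."""
    out = dict(cfg)
    for k, (lo, hi) in KNOBS.items():
        if rng.random() < rate:
            span = (hi - lo) // 10
            out[k] = max(lo, min(hi, out[k] + rng.randint(-span, span)))
    return out


def ga_warm_up(expert_configs, generations=10, pop_size=8, seed=0):
    """Seed the population with expert configs, evolve it, and return
    the best sample plus every (config, throughput) pair collected."""
    rng = random.Random(seed)
    population = list(expert_configs)
    while len(population) < pop_size:
        population.append(random_config(rng))
    samples = []
    for _ in range(generations):
        scored = [(cfg, stress_test(cfg)) for cfg in population]
        samples.extend(scored)
        scored.sort(key=lambda s: s[1], reverse=True)
        elite = [cfg for cfg, _ in scored[: pop_size // 2]]  # keep the best half
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    best = max(samples, key=lambda s: s[1])
    return best, samples
```

Because the expert configuration is evaluated in the first generation and the best half of each generation is carried forward, the best sample is never worse than the expert starting point; the collected samples are what would seed the learning model.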

At the level of engineering practice, service scheduling is separated from task execution, and workers perform the specific tasks. The Learner task is responsible for fetching samples, computing network gradients, updating the neural network, and recommending database parameters to the Actors. Actors are responsible for interacting with the training instance: setting parameters, replaying traffic, and collecting performance data. After each round, an Actor obtains new parameter recommendations from the Learner, forming a closed loop. The overall implementation is a parallel architecture with high availability, scalability, and automatic task recovery.
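The Learner/Actor closed loop can be illustrated with threads and queues. This is a toy sketch, not the production architecture: the single `knob` parameter, the synthetic `replay_traffic` function, and the simple "keep the best candidate" update (used here in place of gradient updates to a neural network) are assumptions made to keep the example short.

```python
import queue
import threading


def replay_traffic(params):
    """Synthetic stand-in for replaying traffic on a training instance.
    Performance peaks when the hypothetical knob equals 42."""
    return -(params["knob"] - 42) ** 2


def actor(actor_id, param_q, sample_q, rounds):
    """Actor: apply recommended parameters, replay traffic, report results."""
    for _ in range(rounds):
        params = param_q.get()            # recommendation from the Learner
        perf = replay_traffic(params)     # set parameters and measure
        sample_q.put((actor_id, params, perf))


def run_learner(n_actors=4, rounds=12):
    """Learner: recommend parameters, collect samples, update its belief.
    A real Learner would update a neural network instead."""
    param_qs = [queue.Queue() for _ in range(n_actors)]
    sample_q = queue.Queue()
    threads = [threading.Thread(target=actor, args=(i, param_qs[i], sample_q, rounds))
               for i in range(n_actors)]
    for t in threads:
        t.start()

    best_knob = 0
    best_perf = replay_traffic({"knob": best_knob})
    for _ in range(rounds):
        # Recommend a spread of candidates around the best-known value.
        for i, q in enumerate(param_qs):
            q.put({"knob": best_knob + (i - n_actors // 2) * 4})
        # Collect one sample per Actor, then close the loop.
        for _ in range(n_actors):
            _, params, perf = sample_q.get()
            if perf > best_perf:
                best_knob, best_perf = params["knob"], perf

    for t in threads:
        t.join()
    return best_knob, best_perf
```

Running the Actors in parallel is what shortens each round: the Learner waits for one sample per Actor instead of measuring candidates one at a time.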

AI technology practice for database autonomy: "monitor, diagnose, solve"

—— Yuan Zhang/TEG Database R&D Department/Cloud Native Database R&D Center/Expert Engineer

In a database service, the resources include memory, I/O, and CPU. Monitoring these resources and identifying and detecting anomalies is very important: only when database resources are used sensibly can the service remain stable and efficient. The anomaly detection capability of Tencent Cloud MySQL automatically detects anomalies (through memory analysis, kernel instrumentation points, and I/O latency and hardware resource statistics) and identifies them, closing the anomaly detection loop internally and reducing the O&M burden.

Tencent Cloud MySQL offers an SQL throttling function: once abnormal requests are discovered, the abnormal business SQL is restricted so that normal SQL statements can keep running.

It improves on the official MySQL histogram by introducing a Compressed histogram, which avoids choosing the wrong plan because data skew has made the statistics inaccurate.

It provides a Statement Outline function, which pins the query plan the user needs without modifying the SQL statement. The corresponding plan can be queried from the Outline table, improving the user experience.

It optimizes index creation with parallelism, introducing parallel sort and parallel B-tree construction. Compared with official MySQL, performance is better (the speedup ratio can reach 15, which is 5 times that of official MySQL) and the feature is more complete.

It supports optimizer autonomy: it tracks business SQL performance data (SQL tags, performance instrumentation points, change tracking), automatically generates optimization strategies (statistics, virtual indexes, plan intervention), and verifies each strategy before rolling it out gradually (grayscale release). This closes the SQL optimization loop and reduces the burden of large-scale operations.
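The SQL throttling idea mentioned above, restricting abnormal statements so that normal ones can still run, can be sketched as a keyword-based concurrency limiter. The `SqlThrottle` class, its keyword matching, and its method names are hypothetical and only illustrate the concept; they do not describe the actual Tencent Cloud MySQL feature or its syntax.

```python
import threading


class SqlThrottle:
    """Caps how many statements matching a keyword rule may run at once."""

    def __init__(self):
        self._rules = {}      # keyword -> maximum concurrency
        self._running = {}    # keyword -> statements currently running
        self._lock = threading.Lock()

    def add_rule(self, keyword, max_concurrency):
        key = keyword.lower()
        with self._lock:
            self._rules[key] = max_concurrency
            self._running.setdefault(key, 0)

    def _match(self, sql):
        """Return the first rule keyword contained in the statement, if any."""
        text = sql.lower()
        for key in self._rules:
            if key in text:
                return key
        return None

    def try_acquire(self, sql):
        """Return True if the statement may run now, False if it is throttled."""
        with self._lock:
            key = self._match(sql)
            if key is None:
                return True               # unrestricted statement
            if self._running[key] >= self._rules[key]:
                return False              # limit reached: reject
            self._running[key] += 1
            return True

    def release(self, sql):
        """Call when a previously admitted statement finishes."""
        with self._lock:
            key = self._match(sql)
            if key is not None and self._running[key] > 0:
                self._running[key] -= 1
```

For example, after `add_rule("from big_orders", 1)`, a second concurrent scan of the hypothetical `big_orders` table is rejected while an unrelated `SELECT 1` is still admitted.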

For performance problems in specific business scenarios, several optimizations apply. Deadlock scenarios are optimized by adding a deadlock detection switch, enriching deadlock information, and optimizing transaction locks, which reduces both the probability of deadlock and the lock resources occupied. For flash-sale scenarios in e-commerce, hotspot update protection can be turned on dynamically with one click, transparently to the business, and flash-sale performance increases by 50 times. Migration and switchover scenarios are optimized as well: synchronizing the primary and standby caches solves the problem of long HA warm-up time, so the HA switchover transitions smoothly and jitter is reduced.

Making the database "intelligent" enough to adapt to any business scenario

—— Cheng Changming/CSIG Cloud Product Department 1/Database Center/Senior Product Manager

Because business systems vary so widely, tuning parameters for a given business is a headache for database administrators. They often have to rely on experience to build a set of relatively "effective" parameter templates, and such templates often cannot cope with every situation. The goal is a database "smart" enough to adapt to any business scenario.

The Tencent Cloud MySQL team published two papers at SIGMOD, a top conference, in 2019 and 2022:

In 2019, Tencent's cloud database product team proposed CDBTune, an end-to-end cloud database parameter tuning system based on deep reinforcement learning (DRL). The research paper "An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning" was accepted as a SIGMOD Research Full Paper.

In 2022, the latest research from the Tencent Cloud Database product team was accepted as a SIGMOD Research Full Paper, titled "HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements". This marks a further breakthrough by the Tencent Cloud Database team in AI-driven database intelligence and a lead in performance.

AI-based intelligent analysis yields the best parameter tuning results, and a "one-click" method completes the otherwise complex tuning process and returns the best parameter setting suggestions. Business needs differ from stage to stage. The best practices of Tencent Cloud MySQL can be applied to the following three stages of a business:

New instance purchase stage: The optimal configuration is trained for each scenario to match the business characteristics as closely as possible, with performance gains of 15%-50% depending on the workload.

Rapid service iteration stage: Users determine the business type, fully customize scenarios according to their own situation, estimate the optimization results, and apply them to instances quickly with one click. In gaming, for example, players pour in frantically when a game launches; in e-commerce, shopping peaks occur. In both cases the best parameters can be set predictively in advance to cope with the coming peak pressure on the database.

Business stabilization stage: The database's workload characteristics are captured and replayed, monitoring metrics and SQL execution status are monitored and analyzed, and parameter values are adjusted continuously through deep learning until the best parameter values are output.

Besides database parameters, many other factors affect how efficiently a database runs. SQL execution efficiency, whether indexes are reasonable, locks, resource allocation, and so on can all be addressed with "AI".

Tencent Cloud MySQL "Intelligent Tuning" will issue a public beta invitation in May, so stay tuned!
