How does reinforcement learning do data analysis? A TKDE 2022 survey from the National University of Singapore et al.

2022-04-11 13:09:21

Reporting by XinZhiyuan

Source: Specialized

【XinZhiyuan Introduction】Data analysis is now an essential skill. Traditionally it has relied mostly on static algorithms or rules, but real-world scenarios often involve complex interactive environments, and learning better strategies in them is a very practical problem. Reinforcement learning is an effective way to address it. Researchers from the National University of Singapore and collaborators published a survey on deep reinforcement learning (DRL) for data processing and analytics in TKDE, comprehensively reviewing recent work with a focus on improving data processing and analysis using DRL.

Data processing and analysis are fundamental and universal. Algorithms play a vital role in both, and many algorithm designs incorporate heuristics and general rules derived from human knowledge and experience to improve their effectiveness.

In recent years, reinforcement learning, especially deep reinforcement learning (DRL), has been increasingly explored and used in many areas because it can learn better strategies in complex interactive environments than statically designed algorithms. Driven by this trend, we conduct a comprehensive review of recent work, with a focus on improving data processing and analysis with DRL.

First, we introduce the key concepts, theories, and methods in DRL. Next, we discuss the deployment of DRL in database systems to facilitate data processing and analysis in various aspects, including data organization, scheduling, tuning, and indexing.

We then survey the application of DRL in data processing and analytics, from data preparation and natural language processing to healthcare, fintech, and more.

Finally, we discuss the important challenges and future research directions for using DRL in data processing and analysis.

Paper link: https://arxiv.org/abs/2108.04526

In the era of big data, data processing and analytics are fundamental and ubiquitous, and critical for many organizations that are on a digital journey to improve and transform their businesses and operations. Data analytics often requires other key operations such as data acquisition, data cleansing, data integration, modeling, and more before insights can be extracted.

Big data can unlock tremendous value in many industries, such as healthcare and retail. However, the complexity of the data (e.g., high volume, high velocity, and high variety) presents many challenges to data analysis, making it difficult to derive meaningful insights. To meet this challenge and promote efficient and effective data processing and analysis, researchers and practitioners have designed a large number of algorithms and techniques, and have also developed many learning systems, such as Spark MLlib and Rafiki.

To support fast data processing and accurate data analysis, a large number of algorithms rely on rules developed from human knowledge and experience. For example, Shortest Job First is a scheduling algorithm that selects the job with the shortest execution time to run next. However, because it does not take full advantage of workload characteristics, it performs poorly compared to learning-based scheduling algorithms. Another example is packet classification in computer networks, which matches a packet to one rule in a set of rules. One solution is to construct a decision tree for classification using manually tuned heuristics. However, these heuristics are designed for a specific set of rules and therefore may not work well for other workloads with different characteristics.
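As an illustration of such a hand-written rule, here is a minimal sketch of Shortest Job First. The job names and durations are hypothetical, and all jobs are assumed to arrive at time 0, which is a simplification rather than anything stated in the survey.

```python
from typing import List, Tuple

def shortest_job_first(jobs: List[Tuple[str, float]]) -> Tuple[List[str], float]:
    """Return the execution order and mean waiting time under SJF.

    Each job is (name, execution_time). All jobs are assumed to be
    available at time 0 (an assumption made for this sketch).
    """
    if not jobs:
        return [], 0.0
    # The fixed rule: always run the job with the shortest execution time next.
    order = sorted(jobs, key=lambda job: job[1])
    clock = 0.0
    total_wait = 0.0
    for _, duration in order:
        total_wait += clock   # this job waited until the clock reached its start
        clock += duration
    return [name for name, _ in order], total_wait / len(order)

# Hypothetical workload.
jobs = [("etl", 8.0), ("report", 2.0), ("index", 4.0)]
order, mean_wait = shortest_job_first(jobs)
print(order, mean_wait)
```

The rule looks only at execution time. It ignores everything else about the workload, which is the kind of information a learned scheduler can exploit.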

We observe three limitations of existing algorithms:

First, the algorithms are suboptimal. Useful information such as the data distribution may be ignored or underutilized by rules. Second, the algorithms lack adaptability. An algorithm designed for a particular workload may not perform well on a different one. Third, algorithm design is time-consuming. Developers have to spend a lot of time trying out many rules to find one that works empirically.

Learning-based algorithms are also used for data processing and analysis. Two learning methods are commonly used: supervised learning and reinforcement learning. They achieve better performance by directly optimizing the performance objective. Supervised learning often requires a rich set of high-quality labeled training data, which can be difficult to obtain. For example, configuration tuning is important for optimizing the overall performance of a database management system (DBMS). There may be hundreds of interdependent tuning knobs, spanning both discrete and continuous spaces. In addition, the diversity of database instances, query workloads, and hardware characteristics makes collecting such data infeasible, especially in cloud environments.

In this setting, reinforcement learning performs better than supervised learning because it searches by trial and error and requires fewer training samples to find a good configuration for a cloud database.

Another concrete example is query optimization in query processing. The task of the database system optimizer is to find the best execution plan for queries to reduce query costs. Traditional optimizers typically enumerate many candidate plans and use a cost model to find the one with the least cost. The optimization process can be slow and inaccurate.
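The traditional approach can be sketched as follows: enumerate candidate join orders and pick the one with the lowest estimated cost. The table names, cardinalities, selectivities, and the cost model (sum of intermediate result sizes over left-deep plans) are all hypothetical choices for illustration; real optimizers use far richer models and pruning.

```python
from itertools import permutations

# Hypothetical table cardinalities and join selectivities (illustrative only).
CARD = {"orders": 1000, "customers": 100, "nations": 10}
SEL = {
    frozenset(("orders", "customers")): 0.01,
    frozenset(("customers", "nations")): 0.1,
}

def plan_cost(order):
    """Estimated cost of a left-deep plan: the sum of intermediate result sizes."""
    joined = [order[0]]
    rows = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Pairs with no join predicate default to selectivity 1.0 (a cross product).
        sel = 1.0
        for t in joined:
            sel *= SEL.get(frozenset((t, table)), 1.0)
        rows = rows * CARD[table] * sel
        cost += rows
        joined.append(table)
    return cost

def best_plan(tables):
    """Exhaustively enumerate join orders and return the cheapest one."""
    return min(permutations(tables), key=plan_cost)

best = best_plan(list(CARD))
print(best, plan_cost(best))
```

Exhaustive enumeration grows factorially with the number of tables, and the result is only as good as the estimated cardinalities, which is where the slowness and inaccuracy mentioned above come from.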

Without relying on an inaccurate cost model, deep reinforcement learning (DRL) methods improve execution plans by interacting with the database (for example, changing the order in which tables are joined).

When a query is sent to the agent (i.e., the DRL optimizer), the agent generates a state vector by featurizing basic information such as the relations and tables accessed. Taking the state as input, the agent uses a neural network to produce a probability distribution over an action set, which can contain all possible join operations as potential actions.

Each action represents a partial join plan on a pair of tables, and once the action is performed, the state is updated. After a sequence of such actions, a complete plan is generated, which is then executed by the DBMS to obtain a reward.

In this query optimization problem, the reward can be calculated from the actual latency. During training with reward signals, the agent improves its policy, producing better join orderings with higher rewards (i.e., lower latency).
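The loop described above (state, action, plan, reward, policy update) can be sketched in a heavily simplified form. This is not the survey's or any specific system's method: a tabular softmax policy replaces the neural network, each action appends one table to a left-deep plan, a hypothetical `simulated_latency` function stands in for executing the plan on a DBMS, and the update is a plain REINFORCE step with a running baseline. All names and numbers are assumptions.

```python
import math
import random

# Hypothetical table sizes and selectivities; simulated_latency is a stand-in
# for running the plan on a real DBMS and measuring its latency.
CARD = {"orders": 1000, "customers": 100, "nations": 10}
SEL = {frozenset(("orders", "customers")): 0.01,
       frozenset(("customers", "nations")): 0.1}

def simulated_latency(order):
    rows, cost, joined = CARD[order[0]], 0.0, [order[0]]
    for table in order[1:]:
        sel = 1.0
        for t in joined:
            sel *= SEL.get(frozenset((t, table)), 1.0)
        rows = rows * CARD[table] * sel
        cost += rows
        joined.append(table)
    return cost

def action_probs(prefs, state, actions):
    """Softmax over learned preferences for the actions available in `state`."""
    logits = [prefs.get((state, a), 0.0) for a in actions]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_episode(prefs, tables, rng, greedy=False):
    """Build one complete join order; return it with the visited steps."""
    state, steps = (), []          # state = tables joined so far, in order
    while len(state) < len(tables):
        actions = [t for t in tables if t not in state]
        probs = action_probs(prefs, state, actions)
        if greedy:
            idx = max(range(len(actions)), key=probs.__getitem__)
        else:
            idx = rng.choices(range(len(actions)), weights=probs)[0]
        steps.append((state, actions, idx, probs))
        state = state + (actions[idx],)
    return state, steps

def train(tables, episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs, baseline = {}, None
    for _ in range(episodes):
        plan, steps = run_episode(prefs, tables, rng)
        # Lower latency -> higher reward; the log scaling is a choice made here.
        reward = -math.log(simulated_latency(plan))
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline += 0.05 * advantage
        # REINFORCE: move preferences along the gradient of log pi(action | state).
        for state, actions, idx, probs in steps:
            for j, a in enumerate(actions):
                grad = (1.0 if j == idx else 0.0) - probs[j]
                prefs[(state, a)] = prefs.get((state, a), 0.0) + lr * advantage * grad
    return prefs

tables = ["orders", "customers", "nations"]
prefs = train(tables)
plan, _ = run_episode(prefs, tables, random.Random(0), greedy=True)
print(plan, simulated_latency(plan))
```

Note that the agent never consults a cost model when choosing actions. It only observes the reward of completed plans, which mirrors the point that DRL learns from interaction rather than from estimated costs.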

Figure: DRL workflow for query optimization

Reinforcement learning (RL) focuses on learning to take intelligent actions in an environment. By balancing exploration and exploitation, an RL algorithm improves itself through feedback from the environment. Over the past few decades, RL has made tremendous progress in both theory and technology.

Notably, DRL incorporates deep learning (DL) techniques to handle complex unstructured data, and is designed to learn from historical data and self-exploration to solve notoriously hard, large-scale problems (e.g., AlphaGo).
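The trade-off between exploring and exploiting can be illustrated with an epsilon-greedy multi-armed bandit, one of the simplest RL settings. This is a generic textbook sketch, not something taken from the survey; the arm means, noise level, and `epsilon` value are hypothetical.

```python
import random

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate arm values from noisy rewards using epsilon-greedy selection."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to gather information.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(len(true_means)), key=estimates.__getitem__)
        # Environmental feedback: a noisy reward around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean update of the value estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical arms: the third one has the highest expected reward.
estimates, counts = epsilon_greedy_bandit([0.0, 1.0, 3.0])
print(estimates, counts)
```

With `epsilon=0` the agent could lock onto the first arm it tries. The small amount of random exploration is what lets it discover the better arm.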

In recent years, researchers from different communities have come up with DRL solutions to solve problems in data processing and analysis. We classify existing works using DRL from two perspectives: system and application.

From a systems perspective, we focus on fundamental research topics, ranging from general ones, such as scheduling, to system-specific ones, such as database query optimization. We also highlight how each problem is formulated as a Markov decision process and discuss how DRL solves it more effectively than traditional methods. Because workload execution and data collection take a long time in real systems, techniques such as sampling and simulation are used to improve the efficiency of DRL training.

From an application perspective, we cover a variety of key applications in data processing and data analysis to provide a comprehensive understanding of the usability and adaptability of DRL. Many fields are being transformed by DRL, which helps to learn domain-specific knowledge about the application.

In this survey, our goal is to provide an extensive and systematic review of the latest advances in using DRL to solve problems in data systems, data processing, and analytics.

Figure: Taxonomy of RL techniques

References:

[1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, A. Hung Byers et al., Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, 2011.

[2] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen et al., “MLlib: Machine learning in Apache Spark,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1235–1241, 2016.

[3] W. Wang, J. Gao, M. Zhang, S. Wang, G. Chen, T. K. Ng, B. C. Ooi, J. Shao, and M. Reyad, “Rafiki: Machine learning as an analytics service system,” VLDB, vol. 12, no. 2, pp. 128–140, 2018.
