
Qingpeng Zhang's team at HKU proposes a new idea for personalized cancer treatment: using AI to control the evolution of cancer cells

Author: AI Tech Review

In modern cancer treatment, the development of drug resistance is a frequent cause of treatment failure and tumor progression, and each patient's drug resistance and tumor characteristics are highly individual.

To address the limitations of traditional intermittent androgen deprivation therapy (IADT) for prostate cancer, Qingpeng Zhang's team at the University of Hong Kong, together with collaborators at Huazhong University of Science and Technology, the Moffitt Cancer Center, and Princeton University, developed a data-driven reinforcement learning solution.

First, they developed a time-varying mixed-effects GLV (tM-GLV) model that accounts for the heterogeneous evolutionary mechanisms and drug pharmacokinetics of individual patients. They then proposed a reinforcement-learning-supported individualized IADT framework, termed I2ADT (Individualized IADT), which learns the dynamics of an individual patient's prostate tumor and derives the optimal dosing strategy. Simulation experiments using clinical trial data show that the time to disease progression of prostate cancer patients is significantly prolonged while drug doses are reduced. Moreover, because the method can be adapted using clinical data, it is equally applicable to other cancers.

In short, I2ADT is a promising tool for personalizing treatment across different tumor types.

1

Background and method

Prostate cancer is the second most common cancer in men worldwide, and treatment typically includes radiation therapy and hormone therapy. Hormone therapies such as androgen deprivation therapy (ADT) can be effective against advanced prostate cancer but also carry side effects. Drug resistance is a major difficulty in treating prostate cancer, and traditional dosing policies may allow drug-resistant cells to spread rapidly. Intermittent androgen deprivation therapy (IADT) was therefore proposed and has been validated in numerous clinical trials.

Traditional IADT suffers from two design problems: induction therapy and a rigid treatment schedule. Recent studies have shown that stopping and resuming ADT based on predetermined PSA thresholds, without induction therapy, may be more successful. However, IADT designed this way still does not take full advantage of patient-specific characteristics or the wealth of other clinical information available, such as multi-omics data.


Therefore, Qingpeng Zhang's team proposed a reinforcement-learning-supported personalized mathematical oncology framework (I2ADT) that learns patient-specific tumor evolutionary dynamics from real patient data and derives an optimal therapy based on evolution and competition. Patient-specific, treatment-specific, and tumor-specific parameters are integrated into an evolutionary model (tM-GLV) that simulates the competition and coexistence between treatment-responsive and drug-resistant tumor cells. Reinforcement learning then further accounts for patient heterogeneity and the tumor's competitive evolutionary mechanisms to derive the optimal dosing strategy for each individual patient.


Paper address: https://academic.oup.com/bib/article/25/2/bbae071/7630480?login=false#deqn01

Because of the many complex, interacting factors involved, the evolutionary dynamics of prostate cancer cannot be described exactly. Following a systems-and-control-theoretic approach, however, the cancer ecosystem can be cast as a mathematical model that captures the key processes at the cancer level, including selection, competition, mutation, and adaptation.

The research team developed a time-varying mixed-effects generalized Lotka-Volterra (tM-GLV) model incorporating these processes (Equation 1). Tumors are inherently heterogeneous, and the team hypothesized that two phenotypes of prostate cancer cells exist before treatment: responsive (hormone-dependent) and drug-resistant (hormone-independent) cells. Drug-resistant cells are initially in the minority, but under androgen suppression they gain a growth advantage. Meanwhile, because both phenotypes demand substantial resources (oxygen, etc.), competition in the tumor microenvironment is fierce. The team's innovation was to make the conventionally static interaction matrix time-varying, so as to capture the evolution of the cancer under the influence of drugs and competition, as well as the accumulation of drug resistance.
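To make this concrete, here is a minimal sketch of a two-phenotype generalized Lotka-Volterra system with a time-varying competition matrix. All parameter values are hypothetical illustrations; the paper's tM-GLV additionally includes patient-specific mixed effects and drug pharmacokinetics that are omitted here.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters: responsive (index 0) and resistant (index 1) cells.
r = np.array([0.025, 0.008])   # intrinsic growth rates (per month, assumed)
K = np.array([1.0, 1.0])       # carrying capacities (normalized)

def A(t):
    # Time-varying competition matrix: a slow drift stands in for the
    # drug- and competition-driven evolution of cell interactions.
    drift = 1.0 + 0.002 * t
    return np.array([[1.0, 0.9 * drift],
                     [0.6, 1.0]])

def glv(t, n):
    # Generalized Lotka-Volterra: dn_i/dt = r_i * n_i * (1 - (A(t) @ n)_i / K_i)
    return r * n * (1.0 - (A(t) @ n) / K)

# Start with mostly responsive cells and a small resistant subpopulation.
sol = solve_ivp(glv, (0, 120), y0=[0.5, 0.01])
print(sol.y[:, -1])  # cell populations at month 120
```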


Accurately predicting the evolution of drug resistance with such models, and using them to delay the accumulation of resistance and prolong patient survival, remains a challenge. In this work, the researchers employed reinforcement learning to learn the drug-dosing strategy, with the agent acting as a controller that helps steer the evolution and development of drug resistance.

Reinforcement learning algorithms can be divided into value-based and policy-based algorithms. The researchers tested several modern reinforcement learning algorithms, including DDPG, TRPO, PPO, and SAC. However, each algorithm has its advantages and limitations.

DDPG is a deterministic off-policy algorithm that applies only to continuous action spaces. TRPO is an on-policy algorithm that uses a KL-divergence constraint to control the update from the old policy to the new one, but its second-order optimization makes hyperparameter tuning difficult. SAC and PPO are both easy to implement and highly flexible, handling discrete or continuous state and action spaces; in their experiments, the researchers found that PPO achieved better learning efficiency and convergence than SAC.

Reinforcement learning is a sequential process in which the agent interacts with the environment over discrete time steps. At each step t, the agent receives the environment state s_t and chooses an action a_t according to its policy π; the environment then transitions to the next state s_{t+1} and returns a reward r_t for the action. After each cycle, the agent updates the policy π and the value function V(s), where π maps the state space S into the action space A, i.e., π: S → A.
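This loop can be sketched in a few lines of code. The environment and policy below are trivial, self-contained stand-ins rather than the paper's implementation, used only to illustrate the interaction cycle.

```python
import random

class ToyEnv:
    """Trivial stand-in environment: the state is a scalar that the
    agent should keep near zero."""
    def reset(self):
        self.s = 0.0
        return self.s

    def step(self, a):
        self.s += a - 0.5        # transition to s_{t+1}
        reward = -abs(self.s)    # r_t: penalize drifting from zero
        done = abs(self.s) > 5.0
        return self.s, reward, done

env = ToyEnv()
policy = lambda s: random.choice([0, 1])  # placeholder for pi(s_t)

s = env.reset()
for t in range(100):
    a = policy(s)                # a_t = pi(s_t)
    s, r, done = env.step(a)     # environment returns s_{t+1} and r_t
    # A learning agent would update pi and the value function V here.
    if done:
        s = env.reset()
```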

In RL problems where the state-action space is intractably large, storing a separate value for every possible state is impractical. Policy gradient algorithms were proposed as an alternative: the policy gradient is estimated and stochastic gradient ascent is used to improve the policy, the key feature being that the policy itself is modeled and optimized directly. PPO refines the gradient estimation so that each policy update is kept within a given maximum deviation, without having to compute the KL divergence between the old and new policies, which reduces the algorithm's complexity. PPO's gradient estimator balances the exploration-exploitation trade-off in reinforcement learning and prevents the new policy from straying too far from the old one, yielding stable and efficient learning.
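For reference, the mechanism described above is PPO's standard clipped surrogate objective (from Schulman et al.'s PPO paper; the article itself does not print the formula):

$$
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
$$

where $\hat{A}_t$ is the estimated advantage and the clipping parameter $\epsilon$ sets the maximum allowed deviation of the new policy from the old one.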

Having settled on the reinforcement learning algorithm, the researchers next needed to construct the learning environment. They built a PCaC environment based on the tM-GLV model, comprising a continuous tumor state space, drug-dosing actions, and immediate feedback (a reward function). The three key elements of reinforcement learning, the states, the action space, and the reward function, therefore had to be defined.

The tM-GLV model proposed by the investigators (Equation 1) includes the two prostate cancer cell phenotypes and a biomarker (serum PSA level). At each time step t, the cell population levels and the PSA level are therefore observed as the current state. Additional features can supply more information for model training; specifically, the instantaneous growth/decay rates, which reflect the effect of the current drug dose as well as the competitive pressure and can be computed directly from the current state, serve as a complement to the state. The state of the PCaC environment is thus given by the population levels of the two phenotypes, the PSA level, and the instantaneous growth/decay rates.

In addition, the action space consists of the doses of the two drugs. This work uses a discrete action space, but the authors note that the method extends readily to a continuous action space, i.e., continuously varying doses and dosing durations.

Finally, the reward function reflects drug efficacy and competition intensity, with an added penalty on the administered dose. Notably, under-dosing can lead to a degenerate strategy in which the agent lets the responsive cancer cell population proliferate unchecked: this does suppress the drug-resistant cells, but it also leads to metastasis and disease progression. To solve this problem, the researchers added a progression-free-time reward to the reward function and used a metastasis probability model to simulate cancer cell metastasis as a stopping criterion, preventing unbounded expansion of the responsive population.
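Putting the three elements together, the following is a schematic sketch of such an environment. The dynamics, dose grid, and reward weights below are hypothetical placeholders; the real environment uses the calibrated tM-GLV equations from the paper.

```python
import numpy as np

class PCaCEnvSketch:
    """Hypothetical sketch of a PCaC-style environment. The real
    dynamics come from the calibrated tM-GLV model, not the toy
    update rules used here."""

    # Discrete action space: a small grid of (CPA, LEU) dose pairs.
    DOSES = [(c, l) for c in (0.0, 0.5, 1.0) for l in (0.0, 0.5, 1.0)]

    def reset(self):
        self.n = np.array([1.0, 0.01])  # responsive, resistant populations
        self.t = 0
        return self._state()

    def _state(self):
        psa = self.n.sum()        # placeholder proxy for serum PSA
        growth = -0.1 * self.n    # placeholder instantaneous growth/decay rates
        return np.concatenate([self.n, [psa], growth])

    def step(self, action_idx):
        d_cpa, d_leu = self.DOSES[action_idx]
        # Toy dynamics: drugs suppress responsive cells, resistant cells creep up.
        self.n[0] *= np.exp(0.05 - 0.2 * (d_cpa + d_leu))
        self.n[1] *= np.exp(0.01)
        self.t += 1
        # Reward: tumor burden plus a penalty on the administered dose.
        reward = -self.n.sum() - 0.1 * (d_cpa + d_leu)
        # EOS: resistant cells reach 80% of (normalized) capacity, or 120 months.
        done = self.n[1] >= 0.8 or self.t >= 120
        return self._state(), reward, done
```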

2

Experimental results

[Figure 2: I2ADT dosing strategies and outcomes (left) vs. standard IADT (right)]

The results in the figure above show that reinforcement learning can significantly delay the time to progression (TTP) in drug-resistant patients. Figure (2) shows the learned dosing strategy and treatment outcome on the left, and the corresponding standard IADT dosing strategy and TTP for the same patient on the right; the gray bars indicate time off treatment and the red bars time on medication. Compared with standard IADT, we find the following differences.

[Figure: panels (b) and (c)]

First, the average length of each treatment cycle was reduced compared with standard IADT: 1.3 months instead of 13.4 months, and 3.5 months instead of 16.5 months.

As shown in panel (b) above, under the adaptive dosing strategy obtained through reinforcement learning, the responsive cancer cell population oscillates at a relatively high level before drug resistance develops. The competitive advantage of responsive cancer cells exhibits the same oscillatory pattern, suggesting that the proposed I2ADT inhibits resistant cancer cells by exerting competitive pressure through the responsive ones.

As shown in panel (c), shortening the treatment cycles avoids the biphasic pattern typically observed under IADT, in which the effect of the medication declines 6-8 months after treatment begins.

Second, the strategy learned through reinforcement learning is dynamic and tailored to each patient. In the initial stages of treatment, responsive cancer cells are given a greater competitive advantage over drug-resistant cells than under IADT or traditional continuous ADT. As treatment progresses and intratumoral competition continues, the competitive advantage of responsive cells gradually declines to zero under both IADT and ADT. Under I2ADT, however, a significant competitive advantage remains, allowing the responsive cells to keep competing with the drug-resistant ones and ultimately prolonging the survival of drug-resistant patients.

To compare efficacy with IADT and ADT, the following measures were used: time to progression (TTP), progression-free survival (PFS), and total dose. TTP is defined as the time it takes an individual patient to reach the end of simulation (EOS); PFS is the time from the start of treatment to disease progression (EOS). EOS is reached when drug-resistant cancer cells account for 80% of their carrying capacity or when the simulation reaches its maximum length (120 months).
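As a concrete illustration of the EOS criteria, the short function below computes TTP from a simulated monthly trajectory of resistant-cell counts. The trajectory and helper function are made up for the example; only the 80%-of-capacity and 120-month criteria come from the paper.

```python
import numpy as np

def time_to_progression(resistant, capacity=1.0, threshold=0.8, max_months=120):
    """TTP in months: first month the resistant population reaches 80% of
    carrying capacity, capped at the 120-month simulation limit."""
    for t, n in enumerate(resistant):
        if t >= max_months or n >= threshold * capacity:
            return t
    return min(len(resistant), max_months)

# Made-up trajectory: resistant cells grow 5% per month from 1% of capacity.
traj = 0.01 * np.exp(0.05 * np.arange(200))
print(time_to_progression(traj))  # ~88 months until 80% of capacity
```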

The simulation results show that by maintaining a high competitive advantage early on, TTP and PFS are significantly prolonged compared with standard IADT or ADT (P = 0.0019). These results suggest that adaptive drug delivery is an effective strategy for delaying the onset of drug resistance and improving patient outcomes.


Given the unavoidable adverse effects of hormone therapy, it is best to reduce the dose as long as the disease remains under control. Table (1) compares the reduction in the average per-cycle dose of CPA and LEU, and the overall proportion of time on treatment, relative to standard IADT.

[Table 1]

The results show that the doses of CPA and LEU were significantly reduced and the proportion of time on treatment also fell, indicating that I2ADT can reduce the risk of adverse reactions for prostate cancer patients and improve their quality of life.

3

Conclusion and outlook

AI makes it possible to explore and exploit big data, and combining it with traditional biophysical and mathematical models makes the resulting models more interpretable. In the field of cancer treatment especially, a huge amount of data is waiting to be mined and put to use.

In this work, Qingpeng Zhang's team proposed a therapeutic dosing strategy for prostate cancer, I2ADT, which uses reinforcement learning to inhibit drug-resistant cells by exploiting the competitive advantage of responsive cells. The framework is broadly applicable and can be used to optimize treatment for other cancer types, although the mathematical model and reinforcement learning setup would need to be adjusted for each cancer type, and diverse clinical data would be needed to support the optimization of such individualized treatment plans.

They point out that while the AI model shows strong performance in the current application to intermittent therapy for prostate cancer, its generality may be limited by the specificity of the training data, and it has not yet been tested in different clinical settings.

They also acknowledge limitations in the data: the clinical trial data focus mainly on drug administration and on PSA as a single biomarker, ignoring other physiological, genetic, and lifestyle factors. Future work therefore needs to address these limitations, gather richer information, and validate the model's efficacy and safety in different tumor settings.

In addition, the researchers note that their model treats the combined effect of the two drugs, and that the subtle differences between the drugs' interactions with disease pathways require further study.

Likewise, improving the model's effectiveness will require more detailed patient-specific clinical and pathological data, including information on drug combinations. The article also mentions the challenge of integrating such deep learning models into clinical workflows and stresses the importance of addressing it.

The article further points out limitations of the study, including the lack of data from a comprehensive biomarker panel and the absence of consideration of patients' serum testosterone recovery after treatment.

While the current work has its limitations and challenges, looking ahead, we believe that collaboration among data scientists, pharmacologists, and oncologists can further optimize I2ADT and other adaptive treatment strategies. Such interdisciplinary effort is essential to realizing the full potential of personalized medicine in improving cancer treatment outcomes.
