Application of artificial intelligence in drug discovery

Author: Tianjin Chinese herbal medicine

Over the past 20 years, biologists and chemists have worked to develop an efficient, advanced drug discovery and evaluation system [1] that not only discovers targeted therapeutic drugs efficiently but also reduces the risk of adverse drug reactions. To reduce the time and cost of new drug development, researchers have turned to computer- and big-data-based methods such as virtual screening (VS) and molecular docking, but these methods suffer from problems such as poor accuracy and low efficiency [2]. Artificial intelligence (AI), including machine learning (ML) and deep learning (DL) algorithms, is expected to overcome many of the difficulties encountered in drug design and discovery. Computer modeling based on AI principles offers a promising new approach to compound identification and verification, target identification, peptide synthesis, evaluation of drug toxicity and physicochemical properties, drug monitoring, drug efficacy assessment, and drug repositioning [3].

Google Search Trends in September 2015 showed that AI is one of the most searched terms since the introduction of ML. Some scholars describe ML as the main AI application, while others believe that ML is a subset of AI [4-6]. Russell et al. [7] believe that there are 7 sub-categories of AI, namely reasoning and problem solving, knowledge representation, planning and social intelligence, perception, machine learning, robotics (motion and manipulation), and natural language processing. In short, DL is a subset of ML, ML is a subset of AI, and the order of development of the three is AI>ML>DL.

Currently, a variety of ML techniques can help scientists find valuable features, patterns, and structures in biomedical big data. The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) [8], The Cancer Genome Atlas (TCGA), and ArrayExpress [9] are large databases containing gene expression data. By analyzing gene expression signatures, scientists can identify target genes that cause different diseases. For example, using ML methods and gene expression data, van IJzendoorn et al. [10] discovered new biomarkers and potential drug targets for rare soft tissue sarcomas.

Many large chemical databases are also available to help scientists discover drugs for specific targets. PubChem, for example, is a freely accessible chemical database that covers data on a wide range of chemical structures, including their biological, physical, chemical, and toxicological properties. The ChEMBL database contains absorption, distribution, metabolism, and excretion (ADME) profiles of many bioactive compounds, as well as target interaction information [11-12]. Similar databases used in drug development include DrugBank [13], LINCS L1000 [14], and the Protein Data Bank (PDB).

In the pharmaceutical industry, AI can help solve many of the problems in classical chemistry that affect drug discovery and development. With the advancement of information technology and the rapid development of high-performance computers over the past 10 years, the use of AI algorithms, from ML to DL, in computer-aided drug design (CADD) has grown steadily. At present, the main goal of scientists is to improve the drug discovery and development process with high accuracy and confidence through ML algorithms grounded in classical chemistry. Over the past 20 years, many techniques and tools, such as computational drug discovery, quantitative structure-activity relationship (QSAR) modeling, and free energy minimization methods, have been established [15]. Zhang et al. [16] conducted an in silico analysis of the novel coronavirus to screen for compounds that are biologically active against severe acute respiratory syndrome (SARS). These compounds were subsequently subjected to absorption, distribution, metabolism, excretion and toxicity (ADMET) analysis and docking. The results suggested that 13 traditional Chinese medicines were potentially effective against the novel coronavirus. Therefore, combining traditional chemistry-oriented drug discovery and development concepts with CADD will provide a broad research platform for drug discovery.

However, finding suitable, bioactive drug molecules is the most difficult step in the drug discovery and development process. According to incomplete statistics, about 90% of drug molecules fail to pass phase II clinical trials and regulatory approval [17-20]. Encouragingly, with the continued development of AI technology and algorithms over the past 10 years, and the steady improvement of biomedical big data, the limitations of drug discovery and development can be addressed with AI-based tools and technologies. In addition, machine learning makes it easier for researchers to work with large biomedical databases. This paper briefly reviews how AI algorithms are combined with traditional chemistry to improve the efficiency of drug discovery, and the research progress of AI in the drug discovery process, in order to provide a reference for the application of AI in drug discovery in mainland China.

1 Screening of lead compounds

AI plays a major role in screening new and potential lead compounds. There are currently about 106 million chemical structures in databases, derived from different types of studies such as genomic studies, clinical and non-clinical studies, in vivo analyses, and microarray analyses. Machine learning models such as reinforcement, logistic, regression, and generative models can be used to screen these chemical structures according to their active sites, structures, and target binding capacity.

Because it saves time and cost, AI is an urgently needed technology in primary and secondary drug screening [21]. Tasks such as cell classification, the design of new compounds, and prediction of the three-dimensional structure of target molecules can be assisted by AI, shortening development time and speeding up the drug discovery process [22-23]. Primary drug screening involves the classification of cells using AI techniques [24]. To classify the cells of interest, the ML model must first be trained to recognize the cells and their characteristics [25]. Secondary drug screening involves analyzing the physical properties, biological activity, and toxicity of compounds. Public databases such as ChEMBL, PubChem, and ZINC cover millions of compounds and their specific information, such as structures and known targets. Matched molecular pairs (MMPs) combined with ML can predict biological properties such as oral exposure, in vivo clearance, ADMET, and mode of action [26-28].

2 Peptide synthesis and discovery of small molecule drugs

In recent years, researchers have used AI to synthesize new peptides. Yan et al. [29] developed Deep-AmPEP30, a DL-based short antimicrobial peptide identification platform, in 2020 and used this platform to identify novel antimicrobial peptides from the genome sequence of Clostridium difficile present in the gastrointestinal tract. In addition, Kavousi et al. [30] developed a web server for the identification of antimicrobial peptides, IAMPE, which can identify new antimicrobial peptides. Yi et al. [31] designed ACP-DL in 2019, a DL-based tool that uses the LSTM algorithm to discover new anticancer peptides. AI-based tools can also be used to explore the therapeutic effects of small molecule drugs. Zhavoronkov et al. [32] designed GENTRL, a small molecule de novo design tool based on generative reinforcement learning, and used it to discover a novel DDR1 kinase inhibitor. McCloskey et al. [33] combined DNA-encoded small molecule library data with ML models such as GraphCNN and RF to discover novel drug-like small molecules. Xing et al. [34] integrated XGBoost, SVM, and DNN to find small molecule drugs for targets related to rheumatoid arthritis.

3 Determination of optimal dose administration

Determining the dose with the best efficacy and the fewest toxic side effects has always been a challenge in drug design [35]. With the advent of AI, many researchers now use ML and DL algorithms to help determine the optimal dose. For example, Shen et al. [36] developed the AI-PRS platform to determine the optimal doses and combinations of antiretrovirals for the treatment of AIDS. When tenofovir, efavirenz, and lamivudine were used in combination in 10 AIDS patients, analysis with AI-PRS showed that the starting dose of tenofovir could be reduced by 33% without causing viral relapse. Julkunen et al. [37] used comboFM, a new ML-driven tool, to identify the combination of the anticancer drugs crizotinib and bortezomib, which showed good efficacy in lymphoma cell lines. Tang et al. [38] used ML techniques such as artificial neural networks, Bayesian additive regression trees, boosted regression trees, and multivariate adaptive regression splines to determine the optimal dose of the immunosuppressive drug tacrolimus.

4 Design of drug-like compounds and prediction of adverse drug reactions

The design of drug-like compounds is complex and difficult. In recent years, a variety of AI-based online tools have been developed to analyze the release of drug-like compounds and to evaluate the feasibility of selected bioactive compounds as carriers. The most commonly used of these are evaluations based on chemical profiles, in which researchers use AI to identify bioactive drugs for specific targets associated with a disease. For example, Wu et al. [39] designed WDL-RF, which uses an integrated DL and random forest (RF) approach to determine the biological activity of ligands targeting G protein-coupled receptors (GPCRs).

AI can also be used to identify possible adverse drug reactions (ADRs) before drugs are marketed. For example, Dey et al. [40] used DL-based models to predict drug-related adverse reactions and even to identify the chemical substructures that cause them. In addition, Liu et al. [41] integrated the chemical, biological, and phenotypic properties of drugs to predict their associated adverse reactions through ML analysis. James et al. [42] integrated biological, chemical, and phenotypic properties to predict drug-related neurological adverse reactions through ML analysis and to identify ADRs associated with drugs targeting Alzheimer's disease.

5 Prediction of protein-protein interactions

Prediction of protein-protein interactions (PPIs) is critical for drug discovery and development. PPIs play an important role in almost all biological processes, including signal transduction, cell growth, and immune defense. Given the critical role of these interactions in homeostasis and disease response, synthetic proteins that interact with proteins in the body, such as engineered antibodies, represent one of the most transformative treatments in modern medicine.

Most synthetic proteins are developed on experimental platforms, but it is difficult for developers to know where and how these proteins bind to their target proteins. Despite many advances in computational design methods, predicting the amino acid sequences in the regions that interact with targets remains one of the most challenging problems in structural biology. Researchers' growing understanding of PPIs, the increasing availability of protein structure data, and advances in ML have laid the groundwork for methods of analyzing PPIs. For example, Bayesian networks (BN) have been used to predict PPIs. In essence, this approach uses gene co-expression similarity, gene ontology (GO), and other biological process information to integrate datasets and produce precise PPI networks, which have been used to map a comprehensive yeast interactome. Research groups have also developed a new hierarchical model, the principal component analysis-ensemble extreme learning machine (PCA-EELM), using BN combined with a yeast dataset; it predicts protein-protein interactions from protein sequence information alone [43].
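The idea of integrating several evidence sources in a Bayesian framework can be sketched with the simplest possible case, a naive Bayes combination of likelihood ratios. This is only an illustration under stated assumptions, not the method of the cited studies: the evidence sources are treated as independent, and the prior and all likelihood ratios below are hypothetical numbers chosen for the example.

```python
from math import prod


def posterior_interaction_probability(prior, likelihood_ratios):
    """Combine independent evidence sources into a posterior probability.

    prior: assumed probability that a random protein pair interacts.
    likelihood_ratios: one ratio per evidence source, i.e.
        P(evidence | interacting) / P(evidence | not interacting).
    """
    prior_odds = prior / (1.0 - prior)
    # Under the independence assumption, the odds are multiplied by each ratio.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values for one protein pair (not taken from any dataset).
prior = 0.002
evidence = {
    "gene_coexpression": 12.0,
    "shared_go_term": 8.0,
    "same_biological_process": 5.0,
}

probability = posterior_interaction_probability(prior, evidence.values())
print(f"posterior probability of interaction: {probability:.3f}")
```

Each source alone is weak evidence against a small prior, but their product moves the pair close to even odds, which is why integrating heterogeneous datasets is attractive for building PPI networks.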

6 Virtual Screening (VS) Efficiency Improvement

VS is the computational counterpart of experimental high-throughput screening (HTS): the activity of compounds in a chemical library against biomolecular targets that may have therapeutic significance for a specific disease is tested virtually, on a computer. Its essence is to select likely active compounds from a large number of candidates. In a large library, only a few compounds are active against the target. If the whole library is screened directly, nearly all of the active compounds are found, but the probability that any given tested compound is active is low and the cost is high. If a VS strategy is adopted, the probability of obtaining active compounds is greatly increased. VS is therefore one of the important methods of CADD in drug design and discovery, and an effective way to pick out therapeutically promising compounds from compound databases.
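The trade-off described above can be made concrete with a small calculation. The sketch below compares the hit rate of screening an entire library with the hit rate of a VS-selected subset, and reports the enrichment factor, a commonly used summary of that comparison. All counts are hypothetical and chosen only for illustration.

```python
def hit_rate(n_actives, n_tested):
    """Fraction of tested compounds that are active."""
    return n_actives / n_tested


def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Hit rate in the VS-selected subset divided by the hit rate of the whole library."""
    return hit_rate(actives_selected, n_selected) / hit_rate(actives_total, n_total)


# Hypothetical library: 100,000 compounds, of which 200 are active.
n_total, actives_total = 100_000, 200
# Hypothetical VS result: the top 1,000 ranked compounds contain 40 actives.
n_selected, actives_selected = 1_000, 40

baseline = hit_rate(actives_total, n_total)        # direct screening of everything
selected = hit_rate(actives_selected, n_selected)  # testing only the VS subset
ef = enrichment_factor(actives_selected, n_selected, actives_total, n_total)
recall = actives_selected / actives_total          # share of all actives recovered

print(f"baseline hit rate: {baseline:.3%}")
print(f"VS subset hit rate: {selected:.3%}")
print(f"enrichment factor: {ef:.1f}")
print(f"recall of actives: {recall:.0%}")
```

In this toy case the VS subset is far richer in actives than the full library while requiring only 1% of the assays, at the price of missing most of the actives, which mirrors the completeness-versus-cost trade-off in the text.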

However, VS itself is costly and of limited accuracy. Introducing ML into VS can speed it up and reduce its false positive rate. VS can be divided into two types: structure-based VS (SBVS) and ligand-based VS (LBVS) [44-45]. Molecular docking is the main principle applied in SBVS, and a variety of AI- and ML-based scoring algorithms have been developed, such as NNScore, CScore, SVR-SCORE, and ID-SCORE [46]. Algorithms such as RFs, SVMs, CNNs, and shallow neural networks are also used in SBVS to simulate molecular dynamics and to predict protein-ligand affinity [47]. Various algorithms and tools have also been developed for LBVS, such as SwissSimilarity, METADOCK, HybridSim-VS, PKRank, BRUSELAS, and AutoDock Bias [48-49]. SBVS and LBVS reduce the complexity of identifying potential therapeutic compounds for pathogenic targets, making the search more convenient and precise.
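As a rough illustration of the ligand-based idea, the sketch below ranks a small library by Tanimoto similarity to a known active, one widely used similarity measure for binary fingerprints. It does not reproduce any of the tools named above; the fingerprints are hypothetical sets of "on" bit positions, and the compound names are placeholders.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_library(query_fp, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint of a known active compound.
query = {1, 4, 7, 9, 15}

# Hypothetical screening library (placeholder names and bits).
library = {
    "cmpd_A": {1, 4, 7, 9, 16},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 15},
}

ranking = rank_library(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the fingerprints would come from a cheminformatics toolkit and the top-ranked compounds would be passed on for docking or experimental testing.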

7 QSAR modeling and drug repositioning

Studying the relationship between the structure and physicochemical properties of compounds and their biological activity is important in drug design and development. QSAR modeling is the use of theoretical calculations and statistical analysis methods to establish quantitative mathematical models between chemical structures and biological activities. QSAR models are roughly divided into 2 categories, namely regression models and classification models. A variety of web-based tools and algorithms, such as the VEGA platform, QSAR-Co, FL-QSAR, Transformer-CNN and Chemception, have been developed, providing a new approach for QSAR modeling [50-53].
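The two categories of QSAR model can be sketched with a deliberately tiny example: a one-descriptor least-squares regression, and a classification rule obtained by thresholding its prediction. This is a minimal sketch under stated assumptions, not one of the tools listed above; the descriptor values, activity values, and the activity threshold are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(model, descriptor):
    """Regression QSAR: predict a continuous activity value."""
    slope, intercept = model
    return slope * descriptor + intercept


def is_active(model, descriptor, threshold):
    """Classification QSAR: label a compound active if predicted activity meets the threshold."""
    return predict_activity(model, descriptor) >= threshold


# Hypothetical training set: one descriptor (e.g. a lipophilicity value)
# and a measured activity (e.g. pIC50) for four compounds.
descriptors = [1.0, 2.0, 3.0, 4.0]
activities = [5.1, 5.9, 7.1, 7.9]

model = fit_line(descriptors, activities)
print("slope, intercept:", model)
print("predicted activity at 2.5:", predict_activity(model, 2.5))
print("active at 3.5 (threshold 7.0)?", is_active(model, 3.5, threshold=7.0))
```

Real QSAR models use many descriptors and nonlinear learners, but the structure is the same: a fitted mapping from structural descriptors to activity, used either as a number or as a class label.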

Identifying new indications for approved clinical drugs can bypass many of the pre-approval tests required to develop new drugs, a process known as drug repositioning. The emergence of large datasets from genomics, proteomics, and in vivo and in vitro pharmacological studies provides a convenient route to drug repositioning. In recent years, ML algorithms have replaced traditional methods based on chemical similarity and molecular docking with new systems biology methods, and AI-based algorithms and network-based tools such as DrugNet, DRIMC, DPDR-CPI, PharmGKB, and DRRS have provided a platform for research in this field [54-58]. Hooshmand et al. [59] identified 16 potentially repurposable drugs against the novel coronavirus using neural networks, and identified 12 promising drug targets for the novel coronavirus using a multi-model DL method.

8 Prediction of physicochemical properties and drug target affinity

Physicochemical properties such as solubility, partition coefficient, degree of ionization, and permeability coefficient can affect the pharmacokinetic properties and drug-target binding efficiency of compounds. These properties must therefore be considered when designing drug molecules. A variety of AI-based tools have been developed to predict them, drawing on molecular representations such as molecular fingerprints, the SMILES format, Coulomb matrices, and potential energy measurements, all of which are used in the DNN training phase [60, 61].
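To show how a SMILES string can become numeric input for DNN training, the sketch below builds a character-level one-hot encoding. It is a simplification made for illustration: real pipelines usually tokenize multi-character atoms such as Cl and Br as single symbols, and the function names and padding length here are assumptions, not part of any cited tool.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to a column index."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: index for index, ch in enumerate(chars)}


def one_hot(smiles, vocab, max_len):
    """Encode a SMILES string as a max_len x len(vocab) matrix of 0s and 1s.

    Positions beyond the end of the string stay all-zero (padding);
    strings longer than max_len are truncated.
    """
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for position, ch in enumerate(smiles[:max_len]):
        matrix[position][vocab[ch]] = 1
    return matrix


# Three simple molecules: ethanol, benzene, acetic acid.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]

vocab = build_vocab(smiles_list)
encoded = one_hot("CCO", vocab, max_len=8)

print("vocabulary:", vocab)
for row in encoded:
    print(row)
```

Each molecule becomes a fixed-size matrix, which is the kind of regular input a neural network needs; fingerprints and Coulomb matrices serve the same purpose with different information content.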

Predicting the binding affinity of chemical molecules to therapeutic targets is an important part of drug discovery and development, and recent advances in AI algorithms have accelerated this process. Using the similarity profiles of drugs and their targets, researchers have developed network-based tools such as ChemMapper and the similarity ensemble approach (SEA). In addition, ML- and DL-based drug-target affinity prediction models such as KronRLS, SimBoost, DeepDTA, and PADME have been constructed [62].

9 Binding prediction of compounds and in vivo safety analysis

AI can predict, before a compound is synthesized, both its on-target effects and its off-target effects, and can also support in vivo safety analysis. In recent years, DL methods have been developed that analyze the molecular representations of compounds and are suitable for predicting compound toxicity. For example, applying DL to antimicrobial discovery can help select powerful new lead compounds with desirable pharmacokinetic and toxicity characteristics for further optimization. As computing power and algorithms continue to improve, so does the amount of data available for use [46-48, 63]. The potential of DL algorithms in toxicity prediction depends on the quality and quantity of datasets, so more in-depth research is needed to make AI-based algorithms more reliable for toxicity prediction.

10 Design of multi-target ligand drug molecules

One of the important outcomes of AI and ML algorithms in drug discovery and development is the prediction and estimation of the overall topology and dynamics of disease networks, drug-drug interactions, and drug-target relationships. Databases such as DisGeNET and STITCH are used to determine gene-disease associations, drug-target associations, and molecular pathways. For example, in 2020 Gu et al. [64] identified the targets of the 197 most commonly used traditional Chinese medicines (TCM) with the similarity ensemble approach, and then used the DisGeNET database to link these targets to different diseases, thereby linking TCM to treatable diseases. Polypharmacology is best suited to the design of drugs for complex diseases such as cancer, neurodegenerative diseases, diabetes, and heart failure. ML-based methods have the potential to analyze putative molecular association networks, greatly increasing the probability of discovering multi-target ligand drug molecules. In addition, ML models can help identify multi-target ligand drug molecules with different binding sites.

11 Design of clinical trials

Clinical trials are the longest and most costly part of the new drug development process. At present, the success rate of drug clinical trials in all countries around the world is low, which seriously affects drug development and wastes a lot of time and costs. In clinical trials, AI technology can be used to assist clinical trial design, patient recruitment and clinical trial data processing. For example, a clinical trial matching system developed by IBM Watson [65] creates detailed profiles by leveraging patients' medical records and large amounts of previous clinical trial data. AI models can also improve the success rate of drug clinical trials by analyzing toxicity, side effects, and other relevant parameters, thereby reducing the cost of clinical trials. In the near future, AI technology will also facilitate the good management of clinical trial data, thereby achieving the goal of personalized medicine.

12 Conclusion

AI is becoming a powerful tool for solving complex problems in medicine, life sciences, and engineering. In summary, AI can take part in every stage of the drug development process, providing a one-stop "discovery-to-market" service for new drug development: lead compound screening, peptide synthesis and small molecule drug discovery, determination of optimal dose administration, design of drug-like compounds and prediction of adverse reactions, analysis of protein-protein interactions, improvement of virtual screening efficiency, QSAR modeling and drug repositioning, prediction of physicochemical properties and drug target affinity, compound binding prediction and in vivo safety analysis, design of multi-target ligand drug molecules, and support for clinical trial design.

Recent advances in ML and DL methods present a good opportunity for drug discovery and development. Advances in AI algorithms, especially DL methods, together with improvements in hardware architecture and greater access to big data, indicate that a third wave of AI is coming, and many pharmaceutical companies are collaborating with AI companies. At present, AI has been used successfully in drug discovery, target identification, lead optimization, ADMET prediction, and clinical trial design. In December 2020, Insilico Medicine submitted a clinical study application to the US FDA for its small molecule inhibitor; the Phase I clinical trial was completed at the end of 2022, and the drug was granted orphan drug designation by the US FDA in 2023. It is the first small molecule inhibitor of a novel target to be discovered using AI-based tools.

Although AI already has some success stories in drug discovery, two major challenges remain in obtaining high-quality data. First, labels cannot be binary, because the role of drugs in biological systems is complex. Second, while databases hold vast amounts of information, little of it is high-quality data usable for drug discovery. There is therefore an urgent need for a platform that provides data that is both massive and of high quality. In the pharmaceutical industry, where open data sharing is uncommon, the Pistoia Alliance encourages companies to share data with one another and is preparing to establish a unified data format, but technical challenges remain to be solved.

On the other hand, because human diseases are complex and individuals differ, some difficulties in using AI tools in drug development are unavoidable. AI technology and algorithms therefore require more funding and research, and need to be closely integrated with traditional basic disciplines and clinical medicine. But there is no doubt that in the near future AI will revolutionize the drug discovery and development process. It is hoped that the research progress on the application of AI in drug discovery briefly reviewed in this paper can provide a reference for the application of AI in drug discovery in China.

Source: Li Shuangxing, Li Yihao, Lin Zhi, Zhang Wei, Yang Yanwei, Qu Zhe, Li Yanchuan, Huo Guitao, Lv Jianjun. Research Progress on the Application of Artificial Intelligence in Drug Discovery [J]. Pharmaceutical Evaluation Research, 2023, 46(9):2023-2036.