Artificial intelligence predicts RNA and DNA binding sites to accelerate drug discovery

Editor | Radish Peel

Structure-based drug design (SBDD) for nucleic acid macromolecules, particularly RNA, is an active research direction that has already produced several FDA-approved compounds. As with proteins, one of the key steps in SBDD for RNA is the correct identification of binding sites for putative drug candidates.

The generic structural organization of nucleic acids, combined with their dynamic nature, makes it challenging to identify binding sites for small molecules. In addition, a structure-based approach is required, because sequence information alone does not account for the conformational plasticity of nucleic acid macromolecules. Deep learning promises to solve the binding site detection problem, but it requires large amounts of structural data, which are far scarcer for nucleic acids than for proteins.

Researchers at the Skolkovo Institute of Science and Technology in Russia assembled a set of about 2,000 nucleic acid-small molecule structures comprising about 2,500 binding sites, roughly 40 times larger than previously used datasets, and used it to demonstrate BiteNetN, a structure-based deep learning method for detecting binding sites in nucleic acid structures. BiteNetN operates on arbitrary nucleic acid complexes, shows state-of-the-art performance, and facilitates the analysis of different conformations and mutant variants.

The study, titled "Structure-based deep learning for binding site detection in nucleic acid macromolecules," was published in NAR Genomics and Bioinformatics on November 26, 2021.

RNA molecules are critical to many cellular processes, such as gene regulation and the transfer of cellular information, and therefore represent a promising class of pharmacological targets. RNA-targeted drug discovery campaigns explore a variety of directions, including the design of DNA G-quadruplex stabilizers, antibiotics that target riboswitches, antisense RNA, and antiviral drugs that target RNA. RNA targets that expand the druggable genome, including those associated with "undruggable" protein targets or non-coding microRNAs, are of particular interest.

However, RNA drug development faces many barriers, including low chemical diversity and the dynamic nature of RNA structure. Like proteins, RNA molecules fold into well-defined structures that form binding sites through which small molecules can modulate them. Efficient, structure-specific detectors of RNA-small molecule ligand binding sites are therefore needed to advance the discovery of RNA-targeted drugs.

"For example, the nucleic acids DNA and RNA can be involved in signaling, and we can target that signaling or any other process they are involved in. This can be a promising strategy for undruggable protein targets, such as disordered proteins or proteins that lack convenient binding sites," said Petr Popov, the study's principal investigator. "Then there are pathogenic RNAs that are foreign to the body, such as those of viruses like SARS-CoV-2 or HIV."

Although many protein-specific methods exist, very few methods predict RNA-small molecule interaction sites. They can be roughly divided into knowledge-based, empirical, and machine learning methods. Knowledge-based approaches, such as Inforna, mine RNA motifs in a database of known RNA-small molecule binding sites. Empirical methods, such as Rsite, Rsite2, or RBind, rely on simple geometric features of the RNA structure and treat the extrema of these features as indicators of binding sites.

More recently, scientists developed a machine learning method, RNAsite, which uses a random forest model operating on computed structure-based and sequence-based RNA features. Deep learning is expected to improve RNA binding site detectors further, but it is hampered by the relatively small number of available RNA structures.

In fact, while recent deep learning methods for protein-small molecule or protein-peptide binding site detection rely on datasets of thousands of examples, the RNAsite model was trained on only 60 RNA-small molecule complexes.

In this study, the team demonstrated the first structure-based deep learning method for predicting nucleic acid-small molecule ligand binding sites. To overcome the small-dataset problem, the researchers considered both RNA and DNA complexes, interfaces formed with crystallographic symmetry mates, NMR models, and data augmentation. The resulting dataset contains about 2,000 nucleic acid-small molecule structures with about 2,500 binding site interfaces retrieved from the Protein Data Bank (PDB).
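The article does not say which augmentation the team used. As an illustration only, a common choice for voxel-based models is to apply a random rigid rotation to the atomic coordinates before voxelization, so that the network sees each complex in many orientations. The sketch below assumes that choice; the function names are hypothetical and are not taken from BiteNetN.

```python
import math
import random


def random_rotation_matrix(rng):
    """Return a uniformly random 3x3 rotation matrix (Shoemake's quaternion method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    # Components of a uniformly distributed unit quaternion (x, y, z, w).
    x = math.sqrt(1.0 - u1) * math.sin(2.0 * math.pi * u2)
    y = math.sqrt(1.0 - u1) * math.cos(2.0 * math.pi * u2)
    z = math.sqrt(u1) * math.sin(2.0 * math.pi * u3)
    w = math.sqrt(u1) * math.cos(2.0 * math.pi * u3)
    # Standard conversion from a unit quaternion to a rotation matrix.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate_about_centroid(coords, matrix):
    """Rotate a list of (x, y, z) points about their centroid."""
    n = len(coords)
    centroid = [sum(p[a] for p in coords) / n for a in range(3)]
    rotated = []
    for p in coords:
        shifted = [p[a] - centroid[a] for a in range(3)]
        rotated.append(tuple(
            centroid[i] + sum(matrix[i][k] * shifted[k] for k in range(3))
            for i in range(3)
        ))
    return rotated


def augment(coords, rng):
    """Return one randomly rotated copy of the coordinates (assumed augmentation)."""
    return rotate_about_centroid(coords, random_rotation_matrix(rng))
```

Because the rotation is rigid, interatomic distances and the centroid are unchanged; only the orientation relative to the voxel grid differs between augmented copies.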

Next, the researchers developed a voxel-based representation of nucleic acid structures, in which each voxel represents a 1 Å³ cube of physical space and stores eight channels corresponding to the densities of particular atom types. The voxelized representation is then fed to a 3D convolutional neural network that scores fragments of the nucleic acid structure with respect to binding sites. The resulting structure-based deep learning model, called BiteNetN, predicts the coordinates of the binding site interface centers, a probability score for each center, and a score for each nucleotide in the binding site.
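A minimal sketch of such a voxelization is shown below. The eight channels come from the article; everything else is an assumption made for illustration: the 1 Å voxel edge, the Gaussian smearing of each atom with `sigma` and `cutoff`, the sparse dictionary grid, and the convention that each atom arrives already labeled with a channel index from 0 to 7. The article does not list the atom types, so none are named here.

```python
import math

N_CHANNELS = 8    # from the article: eight atom-type channels
VOXEL_SIZE = 1.0  # assumed voxel edge length, in angstroms


def voxelize(atoms, origin, shape, sigma=1.0, cutoff=2.0):
    """Map atoms onto a sparse multi-channel density grid.

    atoms  : iterable of (x, y, z, channel), channel in 0..N_CHANNELS-1
    origin : (x, y, z) of the grid corner
    shape  : number of voxels along each axis
    Returns a dict {(channel, i, j, k): density}; absent keys mean zero density.
    sigma and cutoff are hypothetical smearing parameters.
    """
    grid = {}
    for x, y, z, channel in atoms:
        if not 0 <= channel < N_CHANNELS:
            raise ValueError(f"channel {channel} out of range")
        pos = (x, y, z)
        # Index range of voxels that may lie within the cutoff, clipped to the grid.
        ranges = []
        for axis in range(3):
            lo = math.floor((pos[axis] - cutoff - origin[axis]) / VOXEL_SIZE)
            hi = math.floor((pos[axis] + cutoff - origin[axis]) / VOXEL_SIZE)
            ranges.append(range(max(lo, 0), min(hi, shape[axis] - 1) + 1))
        for i in ranges[0]:
            for j in ranges[1]:
                for k in ranges[2]:
                    center = [
                        origin[a] + (idx + 0.5) * VOXEL_SIZE
                        for a, idx in enumerate((i, j, k))
                    ]
                    d2 = sum((c - p) ** 2 for c, p in zip(center, pos))
                    if d2 <= cutoff ** 2:
                        key = (channel, i, j, k)
                        # Accumulate a Gaussian contribution from this atom.
                        grid[key] = grid.get(key, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
    return grid
```

For example, `voxelize([(0.5, 0.5, 0.5, 3)], (0.0, 0.0, 0.0), (4, 4, 4))` places a single atom at the center of voxel (0, 0, 0) and writes density only into channel 3. A dense array with one 3D grid per channel would be the more usual input to a 3D convolutional network; the sparse dictionary is used here only to keep the sketch dependency-free.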

BiteNetN

Dataset

To train the BiteNetN deep learning model, the researchers built a large dataset of 1933 nucleic acid-ligand complexes of different types, including 1065 DNA-containing and 886 RNA-containing structures; 18 structures contain both DNA and RNA and are counted in both groups.

Model

Figure: BiteNetN workflow. (Source: paper)

The researchers trained BiteNetN on the selected nucleic acid structures using a 3D CNN architecture that had previously shown state-of-the-art performance for protein-small molecule and protein-peptide binding site detection. The figure above illustrates the BiteNetN workflow.

Figure: BiteNetN applied to different types of DNA and RNA structures. (Source: paper)

Comparison with other methods

To compare BiteNetN with other methods, the researchers obtained binding site predictions from four existing methods (Rsite, Rsite2, RBind, and RNAsite) on ten test sets. They calculated the weighted AP, ROC AUC, and MCC performance metrics for these methods as well as for ten BiteNetN models trained on the designed dataset.
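For readers unfamiliar with these metrics, the sketch below computes plain (unweighted) versions of average precision (AP), ROC AUC, and the Matthews correlation coefficient (MCC) from per-nucleotide labels and scores. It is an illustration under stated assumptions, not the paper's evaluation code: the article does not define the weighting scheme, so none is applied, and the 0.5 threshold used in the usage note for MCC is an arbitrary choice.

```python
import math


def mcc(labels, predictions):
    """Matthews correlation coefficient for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (ties not merged)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits
```

With labels `[1, 0, 1, 0]` (1 marks a binding site nucleotide) and scores `[0.9, 0.8, 0.7, 0.1]`, `roc_auc` gives 0.75 and `average_precision` gives about 0.833. Thresholding the scores at 0.5 yields predictions `[1, 1, 1, 0]`, for which `mcc` is about 0.577.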

"Most of the earlier methods work only for RNA, and single strands in particular. Our method applies to DNA as well, and to two or more strands. We can even see the additional sites that arise when multiple molecules are entangled together," said Igor Kozlovskii, an author of the paper.

Figure: Weighted AP, ROC AUC, and MCC performance metrics on the test sets. (Source: paper)

Case studies

A binding site is a structural and dynamic property of a macromolecule; methods for predicting binding sites should therefore distinguish between conformations with open and closed binding sites and be suitable for analyzing conformational ensembles. To demonstrate that BiteNetN can be applied to such nucleic acid-ligand binding site detection problems, the researchers tested it on the HIV-1 trans-activation response (TAR) element and an ATP aptamer.

Figure: AP, ROC AUC, and MCC performance metrics for seven TAR RNA structures bound to small molecules. (Source: paper)

Figure: Binding site scores calculated on ATP-bound and ATP-unbound MD trajectories of the wild-type ATP aptamer and its G6A mutant. (Source: paper)

In summary, the team emphasizes that nucleic acid structures differ from protein structures in atomic composition and structural fold, so protein binding site detection methods are difficult to apply directly. Here, the team designed a representation specific to nucleic acid structures that covers a variety of nucleotides and is suitable for DNA, RNA, and their multi-stranded complexes.

They designed BiteNetN, which consistently outperforms other methods on the test sets they built. BiteNetN is conformation-specific, as the team demonstrated by analyzing seven different HIV-1 TAR RNA structures bound to small molecules. It facilitates large-scale analyses, such as the analysis of conformational ensembles or mutant variants, as shown in the ATP aptamer case study. Finally, BiteNetN can handle RNA and DNA complexes, including multi-stranded ones.

Open source link: https://sites.skoltech.ru/imolecule/tools/bitenet/

Paper link: https://academic.oup.com/nargab/article/3/4/lqab111/6441762#316112271

Related report: https://phys.org/news/2022-01-artificial-intelligence-rna-dna-sites.html