Molecular optimization via one-fragment modification using deep learning

Editor | Radish Peel

Molecular optimization is a critical step in drug development, and chemical modifications can be used to improve the desired properties of a drug candidate.

Researchers from The Ohio State University have developed Modof, a novel deep generative model over molecular graphs for molecular optimization. Modof modifies a given molecule by predicting a single disconnection site on the molecule and removing and/or adding fragments at that site.

Modof-pipe chains multiple instances of the same Modof model into a pipeline so that an input molecule can be modified at multiple disconnection sites. The researchers showed that Modof-pipe retains the main molecular scaffold, allows control over intermediate optimization steps, and better constrains molecular similarity. Modof-pipe outperforms state-of-the-art methods on benchmark datasets.

The study, titled "A deep generative model for molecule optimization via one fragment modification," was published in Nature Machine Intelligence on December 9, 2021.


Molecular optimization constrains drug discovery

Molecular optimization is a critical step in drug discovery, and chemical modifications can be used to improve the desired properties of a drug candidate. For example, in lead optimization, the chemical structure of lead molecules can be altered to improve their selectivity and specificity.

Traditionally, this molecular optimization process has been planned based on the knowledge and experience of medicinal chemists and carried out through fragment-based screening or synthesis. As a result, it is neither scalable nor automated.

Recent studies have shown that in silico methods based on deep learning provide alternative, computational generative processes that can accelerate the traditional paradigm. These methods learn from string-based molecular representations (SMILES) or molecular graphs and generate new molecules with better properties accordingly (for example, by connecting atoms and bonds).

Although computationally appealing, these methods depart from the in vitro molecular optimization process in one very important way: molecular optimization requires retaining the main scaffold of the molecule, but generating a completely new molecular structure may not reproduce that scaffold. As a result, these methods are limited in their potential to inform and guide molecular optimization in vitro.

"Modifier with a fragment"

Here, the team proposes a new generative model for molecular optimization that better approximates chemical modification in silico. The method is called "modifier with a fragment", or Modof. Following a fragment-based drug design philosophy, Modof predicts a single disconnection site on a molecule and modifies the molecule by altering the fragments at that site (e.g., ring systems, linkers, and side chains).

Unlike existing molecular optimization methods that encode and decode entire molecular graphs, Modof learns from and encodes the difference between molecules before and after optimization at a single disconnection site. To modify a molecule, Modof generates only one fragment that instantiates the expected difference by decoding a sample drawn from the latent "difference" space. Modof then removes the original fragment at the disconnection site and attaches the generated fragment there.

By sampling multiple times, Modof can generate multiple optimized candidates. A pipeline of multiple identical Modof models, denoted Modof-pipe, iteratively optimizes a molecule at multiple disconnection sites, with the output molecule of one Modof model serving as the input to the next. Modof-pipe is further extended to Modof-pipem, which allows one molecule to be modified into multiple optimized molecules as the final output.
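The chaining described above can be illustrated with a minimal sketch. It is not the authors' implementation: `modof_pipe`, `modify`, `score`, and `similarity` are hypothetical names, the molecules are toy tuples of fragment labels rather than molecular graphs, and the stopping rule (stop when no candidate improves the property) and the similarity check against the starting molecule are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Molecule = Tuple[str, ...]  # toy stand-in: a molecule is a tuple of fragment labels


def modof_pipe(
    start: Molecule,
    modify: Callable[[Molecule], Iterable[Molecule]],
    score: Callable[[Molecule], float],
    similarity: Callable[[Molecule, Molecule], float],
    min_similarity: float = 0.4,
    max_stages: int = 5,
) -> Tuple[Molecule, List[Molecule]]:
    """Chain single-fragment modifiers: each stage's best output feeds the next stage."""
    current = start
    trace = [start]  # intermediate molecules, kept so every step can be inspected
    for _ in range(max_stages):
        # Keep only sampled candidates that stay similar enough to the starting molecule.
        candidates = [m for m in modify(current) if similarity(start, m) >= min_similarity]
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= score(current):
            break  # assumed stopping rule: no candidate improves the property
        current = best
        trace.append(best)
    return current, trace


# Toy property model (hypothetical): each fragment label contributes a fixed value.
FRAGMENT_VALUE = {"a": 0, "b": 1, "c": 2}


def toy_modify(mol: Molecule) -> List[Molecule]:
    """Enumerate every molecule that differs from `mol` in exactly one fragment."""
    out = []
    for i, old in enumerate(mol):
        for label in FRAGMENT_VALUE:
            if label != old:
                out.append(mol[:i] + (label,) + mol[i + 1:])
    return out


def toy_score(mol: Molecule) -> float:
    return float(sum(FRAGMENT_VALUE[f] for f in mol))


def toy_similarity(a: Molecule, b: Molecule) -> float:
    """Fraction of positions that carry the same fragment."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


best, trace = modof_pipe(("a", "a", "a", "a"), toy_modify, toy_score, toy_similarity,
                         min_similarity=0.5)
print(trace)
```

Because every stage changes one fragment only, the `trace` list exposes each intermediate molecule, which is the property the article credits for making the optimization steps easier to control and interpret.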

Illustration: Overview of the Modof model. (Source: paper)

Modof has the following advantages:

It modifies one fragment at a time. This better approximates in vitro chemical modification and retains most of the molecular scaffold, so it may better inform and guide molecular optimization in vitro.

It only encodes and decodes the fragments that need to be modified, which is conducive to better modification performance.

Modof-pipe iteratively modifies multiple fragments at different disconnection sites. This makes the intermediate modification steps easier to control and interpret, and helps explain the entire modification process.

Modof is less complex than state-of-the-art methods. It has at least 40% fewer parameters and uses 26% less training data.

Modof-pipe outperformed state-of-the-art methods on the benchmark dataset for optimizing the octanol-water partition coefficient penalized by synthetic accessibility (SA) and ring size (plogP). The improvement was 81.2% when the optimized molecules were not constrained by molecular similarity, and 51.2%, 25.6%, and 9.2% when the optimized molecules were required to have a similarity of at least 0.2, 0.4, and 0.6, respectively, to the molecules before optimization.

Modof-pipem improves the performance of Modof-pipe by at least 17.8%.

Modof-pipem and Modof-pipe also demonstrated superior performance on two other benchmark tasks: optimizing the binding affinity of molecules to the dopamine D2 receptor (DRD2) and improving drug-likeness as measured by the quantitative estimate of drug-likeness (QED).
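To make the penalized logP objective and the similarity thresholds concrete, here is a minimal sketch. It follows a formulation commonly used in the molecular optimization literature (logP minus SA score minus a penalty for rings larger than six atoms); the exact ring penalty, any normalization of the terms, and the use of Tanimoto similarity over fingerprint bit sets are assumptions here, not details confirmed by the article. The function names are hypothetical, and the inputs are assumed to be precomputed by a cheminformatics toolkit.

```python
from typing import Iterable, Optional, Set


def penalized_logp(logp: float, sa_score: float, ring_sizes: Iterable[int]) -> float:
    """logP minus the SA score minus a penalty for the largest ring beyond six atoms."""
    largest = max(ring_sizes, default=0)   # 0 when the molecule has no rings
    ring_penalty = max(0, largest - 6)     # assumed form of the ring-size penalty
    return logp - sa_score - ring_penalty


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def constrained_improvement(
    plogp_before: float,
    plogp_after: float,
    bits_before: Set[int],
    bits_after: Set[int],
    delta: float,
) -> Optional[float]:
    """Property improvement, or None if the similarity threshold `delta` is violated."""
    if tanimoto(bits_before, bits_after) < delta:
        return None
    return plogp_after - plogp_before


# Hypothetical values for one molecule before and after optimization.
before = penalized_logp(logp=2.5, sa_score=3.0, ring_sizes=[6, 8])
after = penalized_logp(logp=3.5, sa_score=2.5, ring_sizes=[6])
print(constrained_improvement(before, after, {1, 2, 3}, {2, 3, 4}, delta=0.4))
```

A stricter threshold leaves less room to change the molecule, which is consistent with the smaller improvements the article reports at higher similarity thresholds.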

Illustration: Example of Modof-pipe for plogP optimization. (Source: paper)

Discussion

Molecular optimization using simulated properties

Most of the molecular properties considered in the study's experiments were based on simulated or predicted values, not experimental measurements.

That is, a stand-alone simulation or machine learning model is first used to generate the property values for the benchmark dataset.

For example, Crippen logP is estimated with the Wildman and Crippen approach, synthetic accessibility is calculated with a scoring function over predefined fragments, the DRD2 property is predicted with a support vector machine classifier, and the QED (quantitative estimate of drug-likeness) property is predicted with a nonlinear classifier that combines multiple desirability functions of molecular properties.
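As an illustration of how a drug-likeness score (QED) can combine several desirability functions, the sketch below uses the commonly cited form: a weighted geometric mean of per-property desirability values, each in (0, 1]. The article does not spell out the formula, so this form, the function name `qed_from_desirabilities`, and the example values are assumptions for illustration only.

```python
import math
from typing import Optional, Sequence


def qed_from_desirabilities(
    desirabilities: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted geometric mean of desirability values, each expected in (0, 1]."""
    if not desirabilities:
        raise ValueError("at least one desirability value is required")
    if weights is None:
        weights = [1.0] * len(desirabilities)  # unweighted variant
    if len(weights) != len(desirabilities):
        raise ValueError("weights and desirabilities must have the same length")
    if any(d <= 0.0 or d > 1.0 for d in desirabilities):
        raise ValueError("desirability values must lie in (0, 1]")
    total = sum(weights)
    # exp of the weighted mean of logs is the weighted geometric mean.
    return math.exp(sum(w * math.log(d) for w, d in zip(weights, desirabilities)) / total)


# Hypothetical desirability values for three molecular properties.
print(qed_from_desirabilities([0.9, 0.6, 0.8]))
```

Because the mean is geometric, one very undesirable property pulls the overall score down sharply, and no single good property can fully compensate for it.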

Although all existing generative models for molecular optimization use such simulated properties, they bring both challenges and opportunities. The challenge is that when the simulations or machine learning models that predict these properties are inaccurate for any reason, generative models trained on the inaccurate property values will also be inaccurate or incorrect, and the molecules they generate could negatively impact downstream drug development tasks.

However, as these simulations and predictions continue to improve, property simulation and prediction offer an enormous opportunity to unlock the power of large-scale, data-driven learning paradigms for drug development.

Specifically, most of the deep learning-based models for drug development, many of which have proven to be very promising, would not be possible without large-scale training data.

Although it is impractical to measure the properties of interest experimentally for a large number of molecules, simulating or predicting those properties makes large amounts of training data available and so enables the development of such deep learning methods. Fortunately, thanks to the accumulation of experimental measurements and the strong learning ability of new computational methods, property simulations and prediction models are becoming more accurate. Accurate property simulation or prediction over large-scale molecular data, together with the ability to learn powerful generative models from these data, has strong potential to further advance in silico drug development.

Synthesizability and retrosynthesis

The experiments showed that Modof was also able to improve synthetic accessibility. However, this does not necessarily mean that the generated molecules can be easily synthesized. This limitation of Modof is in fact common to almost all computational methods for molecule generation. A recent study showed that many molecules generated by deep learning are not easily synthesized, which limits the translational potential of generative models to have a real impact on drug development.

On the other hand, retrosynthesis prediction with deep learning, which aims to identify a viable synthesis path for a given molecule by learning from and searching over a large number of known synthesis paths, has been an active area of research. Optimizing molecules so that they have not only better properties but also better synthesizability, ideally while identifying an explicit synthesis path, would be a very interesting and challenging direction for future research.

The team hopes to develop a comprehensive computational framework that can generate synthesizable molecules with better properties. This requires not only large amounts of data to train complex models, but also the necessary domain knowledge and human experts in the loop of the learning process.

In vitro validation

Ultimately, computer-generated molecules need to be tested in the lab to validate the computational methods. Most existing computational methods are developed in academic settings and therefore cannot easily be tested on available or proprietary molecule libraries, and, as discussed earlier, the molecules they generate may not be easy to synthesize. Even so, some success stories show that powerful computational methods have great potential to make genuinely new discoveries that go on to succeed in laboratory validation.

A comparable case to this use of deep learning for molecular optimization and discovery is AlphaFold, a deep learning method that predicts the folded structure of proteins. AlphaFold's breakthrough on a 50-year-old grand challenge in biology is strong evidence that the power of modern learning methods should not be underestimated.

Still, there is a great need to work with the pharmaceutical industry and to carry out in vitro testing in order to truly translate advances in computational methods into real impact. In addition, efficiently sampling and/or prioritizing the generated molecules to identify a feasible set of small molecules for small-scale in vitro validation may be a practical solution. This will require new sampling schemes over the molecular subspace and/or learning to prioritize molecules during generation. At the same time, large-scale in vitro validation of in silico generated molecules is a challenging but interesting direction for future research.

Other issues in computational molecular optimization

One limitation of Modof-pipe is that it employs a local greedy optimization strategy: in each iteration, the input molecule to Modof is optimized as far as possible, and if the optimized molecule has no better properties, no further Modof iterations are performed.

Illustration: Example of Modof-pipe for DRD2, QED, and multi-property optimization. (Source: paper)

Conclusion

Modof optimizes molecules one fragment at a time by learning the difference between molecules before and after optimization. With a less complex model, it achieves better or comparable performance relative to state-of-the-art methods. In addition to the limitations discussed above and the corresponding directions for future research, another limitation of Modof is that its modifications occur at the periphery of the molecule.

While this is common in in vitro lead optimization, the team is currently investigating how Modof can be enhanced to modify the inner regions of a molecule, if needed, by learning from suitable training data that cover those regions. In addition, the researchers hope to integrate domain-specific knowledge into the Modof learning process to improve interpretability during both learning and generation.
