
Exploring the Value of GenAI in Life Sciences: From Prediction to Creation

Author: Global Technology Map

Artificial intelligence has become deeply embedded in the digital and intelligent transformation of life sciences and biomedicine, and has produced many notable research results on problems of human life and health. More recently, generative artificial intelligence (GenAI), represented by ChatGPT, has further expanded the capabilities of data-driven drug development and healthcare models. Gartner predicts that by 2025 the share of new drugs developed with GenAI techniques will rise from zero today to more than 30%. However, while GenAI is expected to unlock the potential of the life and health industry, vigilance is still needed against the risks and challenges posed by bioinformatics security issues and the threat of biological weapons.

1. GenAI empowers life science research and technological innovation

As artificial intelligence technology continues to iterate, GenAI, characterized and driven by massive data and large model architectures, has set off a revolution in productivity. Applications such as ChatGPT, DALL·E, and Stable Diffusion were the first to reach users in rich, diverse, and interactive forms, while the life sciences field has continued to build momentum and explore. Compared with the predictive role that AI played in the previous stage, GenAI, with its greatly improved learning and generation capabilities, has led life sciences and biomedicine into a stage of creation and provides strong support for a wide range of downstream tasks.

(a) GenAI provides strong impetus for basic research in the life sciences

AI-based protein structure prediction opens the way to controlling cell functions and life activities, and its performance keeps improving in accuracy, coverage, and speed. In October 2023, Google DeepMind and the European Bioinformatics Institute (EMBL-EBI) launched a major upgrade, AlphaFold-latest, which builds on the ability to predict all known proteins on Earth, improves accuracy by a further 10%, and can reach atomic-level accuracy. Whereas AlphaFold's breakthrough in atomic-resolution structure prediction relies on multiple sequence alignment, research teams such as Meta in the United States have accelerated high-resolution prediction by an order of magnitude by using the internal representations of language models. Their largest protein language model, ESM-2, was used to predict more than 617 million protein structures in only two weeks. Both approaches demonstrate the potential of AI to improve the performance of protein structure prediction and to drive innovation.

Structure prediction offers a more efficient way to decode the three-dimensional mysteries of proteins. GenAI goes further and offers an end-to-end way to directly create proteins, including functional proteins that are unknown or do not exist in nature. This opens up a nearly unlimited space of protein sequences and structures and strengthens the trend toward disrupting the research paradigm of life sciences and biomedicine. At present, GenAI applications in protein design and biomedicine follow two main approaches: the Transformer architecture and the diffusion model. The former is represented by ProGen, a protein language model published in January 2023 by Profluent, an American biopharmaceutical start-up. Built on a 1.2-billion-parameter Transformer neural network, the model generates specific proteins according to desired properties and has been used to synthesize, de novo, artificial enzymes that do not exist in nature, attracting widespread attention in the life sciences. The latter approach adopts the diffusion models commonly used in image generation, such as text-to-image generation, to model the relationship between protein sequences and structures and to rapidly generate protein backbone structures. For example, in October 2022, Stanford University and Microsoft Research introduced the folding diffusion (FoldingDiff) model. Inspired by how proteins fold in vivo, it designs protein backbone structures by mirroring the natural folding process, addressing the problem of directly generating proteins with complex and diverse structures.
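To make the diffusion approach described above more concrete, the following is a minimal sketch of the forward (noising) half of a diffusion process applied to backbone angles. It is an illustration under stated assumptions, not the FoldingDiff implementation: the function names, the linear noise schedule, and the example angles are all hypothetical, and a real model would additionally train a network to reverse the noising step by step.

```python
import math
import random


def wrap_angle(theta: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule values are common illustrative defaults, not taken
    from the article or from any specific protein model.
    """
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(angles, alpha_bar, rng):
    """Sample noised angles x_t directly from clean angles x_0.

    alpha_bar close to 1 keeps the angles almost unchanged;
    alpha_bar close to 0 replaces them almost entirely with noise.
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [wrap_angle(scale * a + sigma * rng.gauss(0.0, 1.0)) for a in angles]


rng = random.Random(0)
# Hypothetical (phi, psi) dihedral angles in radians for 8 residues.
helix_like = [-1.05, -0.79] * 8
alpha_bars = linear_alpha_bars(1000)
slightly_noised = forward_noise(helix_like, alpha_bars[10], rng)
fully_noised = forward_noise(helix_like, alpha_bars[-1], rng)
```

Early steps barely perturb the angles, while the final step leaves almost no trace of the original structure. Generation runs the learned reverse process from such noise back toward a plausible backbone, which is the sense in which the process mirrors folding.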

(b) GenAI is driving technological change in pharmaceutical R&D

In drug R&D, GenAI can build drug development support models based on biological mechanisms, clinical disease data, and pharmaceutical data. On the one hand, this reduces the manpower, materials, time, and money invested in R&D. On the other hand, it helps predict the efficacy and safety of new drugs and improves the R&D success rate. For example, in 2022 Insilico Medicine used its AI drug discovery platform Pharma.AI to develop ISM001-055, the world's first AI-discovered drug candidate with a novel target and a novel molecular structure, in just 18 months at a cost of $2.7 million. Compared with the average of 14 years and $1.98 billion needed to develop a new drug, this is a large reduction in cost and time.
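The scale of the reduction can be worked out directly from the figures quoted above. The short calculation below takes those figures at face value. The variable and function names are illustrative, and the two sets of numbers may not cover the same stages of development, so the ratios are only a rough indication.

```python
# Figures quoted in the text (US dollars and months).
AI_COST_USD = 2.7e6
AI_MONTHS = 18
TRADITIONAL_COST_USD = 1.98e9
TRADITIONAL_MONTHS = 14 * 12


def reduction_factor(baseline: float, new: float) -> float:
    """Return how many times smaller `new` is than `baseline`."""
    return baseline / new


cost_factor = reduction_factor(TRADITIONAL_COST_USD, AI_COST_USD)
time_factor = reduction_factor(TRADITIONAL_MONTHS, AI_MONTHS)

print(f"Cost: roughly {cost_factor:.0f} times lower")
print(f"Time: roughly {time_factor:.1f} times shorter")
```

On these numbers the quoted cost is several hundred times lower and the timeline roughly nine times shorter than the averages cited for conventional development.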

2. Exploring innovative application scenarios for GenAI in the life sciences

Life science research has a long value chain and a complex industrial layout. GenAI integrates biology, chemistry, computational science, pharmacology, and disease treatment into a comprehensive pathway and provides efficient tools for designing, optimizing, and synthesizing small and large molecules. At present, the exploration of GenAI applications is still at an early R&D stage: platforms are beginning to emerge, real-world applications are still in their infancy, and the maturity of each technical link differs. On the whole, however, GenAI has ample potential for industrial application.

GenAI provides powerful search and optimization tools for drug discovery and antibody construction. Early-stage molecular discovery is the most difficult and expensive part of development. Here GenAI can not only link amino acid sequences to protein structures but, more importantly, search the vast protein space for new molecules that hit a given disease or target precisely, perform their function effectively, and can be tuned for the desired properties. These molecules become the most promising drug candidates for subsequent R&D, which avoids a great deal of resource-intensive trial and error and improves the success rate. For example, the University of Toronto in Canada and Stanford University in the United States used the AI-driven end-to-end drug discovery engine, comprising the PandaOmics biocomputing platform and the Chemistry42 generative chemistry platform, to select 7 molecules for synthesis and biological testing based on a protein structure predicted by AlphaFold. They identified the first hit molecule in only 30 days, the first case of successfully applying AlphaFold to early hit discovery and identification. Later, the University of Washington developed ProteinMPNN, a deep-learning-based protein sequence design method that can design binding proteins with high stability, specificity, and affinity from scratch using only three-dimensional protein structure information. This expands the range of targets previously considered undruggable and creates a new route for protein drug development.

GenAI opens up new avenues for brain image computing and brain network computing. Combined with neuroimaging, GenAI has made important breakthroughs in extracting spatiotemporal brain features and reconstructing the topological connectivity of brain networks, offering a promising way to reconstruct visual experience from human brain activity and to understand the brain. In March 2023, the Graduate School of Frontier Biosciences at Osaka University, Japan, reconstructed high-resolution images from human brain activity using a diffusion model (the technical approach is shown in Figure 1). In April of the same year, the Illinois Institute of Technology proposed a new dream recording method that combines GenAI, non-invasive brain-computer interfaces, and thought-typing software to generate signals for thought input during REM sleep, an important step in understanding and applying brain network computing.

[Figure 1: Technical approach to reconstructing images from human brain activity with a latent diffusion model]

Source: Takagi Y, Nishimoto S. High-resolution image reconstruction with latent diffusion models from human brain activity[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 14453-14463.

GenAI provides new "smart momentum" for complex clinical diagnosis and expert systems, supporting the entire process of diagnosis and treatment. First, in assisted diagnosis, GenAI can provide a valuable reference for clinical decisions, improve the quality of medical images, and take over routine tasks such as electronic medical record entry, freeing up doctors' time and attention and helping them improve their professional capabilities. Evaluations by Harvard Medical School in the United States and Dokkyo Medical University in Japan found that the generative pre-trained models GPT-3 and GPT-4 achieved an overall diagnostic accuracy of more than 90% on a series of challenging clinical cases. Second, in rehabilitation, GenAI can synthesize speech for people who have lost their voice, provide body projection for people with disabilities, and offer non-intrusive medical companionship for patients with mental illness, soothing patients in a humane way and speeding up their recovery.

GenAI can play an active role in drug repurposing. Drug repurposing, also called drug repositioning, refers to discovering new uses for existing drugs in other disease areas. By analyzing clinical data, genomic data, and other information, GenAI can identify the potential of a drug to treat other diseases and so support its clinical repositioning. This approach saves time and cost in drug development and accelerates the translation of drugs from the laboratory to the clinic. For example, researchers from IBM Research and Teva Pharmaceuticals in Israel used GenAI algorithms to simulate clinical trials and found that the sleeping pill zolpidem could also be used to treat Parkinson's disease dementia.

3. Risks and problems faced by GenAI in the field of life sciences

As GenAI continues to unlock the potential of life science research, it also raises biosecurity and data privacy risks.

First, GenAI offers a simple and convenient means of carrying out bioterrorism. New technologies make it possible to produce toxins in bacteria or cells without extracting them from their natural source, or to combine toxins with antibodies into more dangerous "fusion toxins", acting as a "multiplier" of the biological weapons threat. Andrew White, a professor of chemical engineering at the University of Rochester in the United States, red-teamed the GPT-4 model: after supplying scientific papers related to chemical weapons and a list of chemical manufacturers, he obtained from GPT-4 recommendations for nerve agents usable as chemical weapons and for sites where they could be manufactured.

Second, the tension between the need for credible generated data and GenAI's lack of explainability increases data security risks. Factual errors in large language models such as GPT, "hallucinations" that mislead or defame with false information, and the opaque "black box" nature of GenAI all undermine, to some extent, the credibility and usability of drug data. This may bias subsequent R&D decisions and make it impossible to guarantee the safety and efficacy of drugs. At the same time, limited interpretability makes it difficult to correct errors and biases in the generated content.

Third, AI drug R&D raises data privacy issues. Drug R&D involves large amounts of patient data and clinical trial data that contain personally identifiable information and health information, so privacy and security are at stake. If a GenAI model has vulnerabilities in data processing or storage, patient data may be leaked, misused, or abused, leading to potential legal action and reputational damage.

[Figure not available. Source: McKinsey website]

Fourth, data sourcing and processing are the biggest pain points holding back GenAI research in life sciences and medicine. On the one hand, the quality and quantity of current structural biology data fall far short of what is needed to train generative models. On the other hand, labeling protein sequence data is very expensive, which can put heavy financial pressure on R&D work and slow its progress.

Conclusion

GenAI has injected innovative momentum into new forms and models of life sciences, and it will continue to become more economical, efficient, and fast as computing costs fall and more large models are open-sourced. The prospects for life sciences and biomedicine are exciting. For GenAI to truly drive progress from R&D through to implementation, however, government and industry need to work together to build an industrial ecosystem, give equal weight to regulation and to promoting development, and deepen integration with industry-specific scenarios, so as to promote the safe and steady development of the AI + life sciences industry.


About the Author

Dai Ji, Research Office III, Institute of International Technology and Economics, Development Research Center of the State Council

Research interests: tracking developments in the field of biology and research on key core technologies and frontier technologies


Editor: Zheng Shi


