Entering the era of the $100 genome, how can Chinese cohort research seize the momentum?

author:Fruit shell network science

Source: Lizimo I am a builder

China does not yet seem to have an effective homegrown approach to cohort studies and biobank construction. At the recent cohort research and precision medicine forum jointly organized by Nanjing Medical University and Illumina, experts from China and abroad shared their experience with cohort research, which gave attendees much to think about. What kind of cohort research does China need? I offer this article as a starting point for discussion.

The "2023 Cohort Research and Precision Medicine Translational Academic Forum" jointly organized by Nanjing Medical University and Illumina

Which has a greater impact on people's health, air pollution or smoking? Why are some people prone to cancer without smoking? Questions such as these are often asked in real life.

In recent years, scientific research has revealed many factors that affect health, and prevention has become a major focus of healthy living. Still, individuals differ greatly. As precision medicine draws more attention across society, the interplay of genes, environment and lifestyle is also being discussed more and more.

But what exactly sustains our health, and what increases our risk of disease? These questions cannot be answered overnight; the answers are likely to depend on long-term cohort studies and the biobanks that support them.

1. What is a cohort study? Why do we need biobanks?

A cohort is a group of people who share the same characteristics or the same exposure. A cohort study takes such a group as its subject and tracks it over time.

When participants join a cohort study, investigators collect their basic information, such as age, ethnicity and other demographic characteristics, and may also collect biological, social, psychological, medical, environmental and genetic information. Together these constitute the baseline of the study. Researchers then periodically collect information on disease onset or health status at later stages of the participants' lives, which is known as follow-up. Depending on the goals of the study, follow-up may last from weeks to many years.

Comparing data at follow-up points with the baseline allows prospective evaluation of how different factors affect a person's health. In epidemiology, cohort studies are often used to identify potential risk factors that contribute to disease or shape disease patterns, such as air pollution, smoking and coronavirus infection. Beginning in the 1940s, cohort studies such as the Framingham Heart Study in the United States and the British Doctors Study were established, laying the foundation for recognizing and preventing the risks of chronic non-communicable diseases such as cardiovascular and cerebrovascular diseases and tumors.

Schematic diagram of the relationship between cohort studies and biobanks

A biobank is a collection of biological samples and their associated information, collected and stored for cohort studies. Traditionally, biobanks were set up to serve a specific planned study. Increasingly, however, researchers recognize that a complete and systematic biobank can be used by a much larger research community to guide, inspire and meet diverse research needs, and many biobanks are now designed prospectively from the outset.

The goals and governing bodies of biobanks therefore differ. Sample banks oriented to specific disease cohorts are mostly collected through hospitals, whereas population-based prospective large-scale cohorts must collect samples from the general population and require more funding, more resources and stronger organization.

Biobanks are growing ever larger, whether they support community-based prospective natural-population cohorts or hospital-based disease-specific cohorts. Most risk factors have only modest or small effects on individual health outcomes, and only studies of large populations can detect these effects with statistical reliability.

Professor Rory Collins, Principal Investigator and CEO of UK Biobank, often cites an example in his talks: as shown in the figure, the relationship between coronary heart disease risk and systolic blood pressure becomes progressively clearer and more consistent as the sample size grows from 5,000 to 50,000 to 500,000 people.
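
One way to see why the picture sharpens with sample size is to look at how the uncertainty of an estimated event rate shrinks as a cohort grows. The sketch below is only an illustration under a simple binomial model; the 5% event rate is a hypothetical value chosen for the example and does not come from UK Biobank data.

```python
import math


def standard_error(event_rate: float, n: int) -> float:
    """Standard error of an observed proportion under a simple binomial model."""
    return math.sqrt(event_rate * (1.0 - event_rate) / n)


# Hypothetical event rate, chosen only for illustration (not from the article).
ASSUMED_EVENT_RATE = 0.05

for cohort_size in (5_000, 50_000, 500_000):
    se = standard_error(ASSUMED_EVENT_RATE, cohort_size)
    half_width = 1.96 * se  # approximate 95% confidence interval half-width
    print(f"n={cohort_size:>7,}: 95% CI half-width is about {half_width:.4f}")
```

Because the standard error scales with one over the square root of the sample size, a hundredfold larger cohort narrows the interval roughly tenfold, which is consistent with the curves in the figure becoming smoother as more people are included.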

Around the world, governments have supported the establishment of population biobanks. Building large-scale, high-quality biobanks has become a key area of infrastructure development and an important resource for medical research. Combining a large prospective longitudinal cohort design with biological sample collection is increasingly common; examples include UK Biobank (UKB), All of Us, Biobank Japan, and Precision Health Research, Singapore (PRECISE).

2. Biobanks and cohort studies in the $100 genome era

Although many factors such as environment and lifestyle are involved, differences in health between individuals are ultimately reflected in molecular and metabolic differences. Thanks to the rapid development of gene chips, high-throughput sequencing and related technologies, genetic and genomic information was the first type of molecular data to be used in cohort research on a large scale.

A defect in a single gene can cause overt disease (often a rare disease), but such cases are a small minority. Most diseases, including chronic diseases, are also related to genetic factors but involve the cumulative effect of many variants in many genes. Since the Human Genome Project, it has become far more feasible to combine genotype and phenotype data in genome-wide association studies (GWAS), polygenic risk scores (PRS) and phenome-wide association studies (PheWAS). Researchers can generate genetic data from individuals and correlate them with phenotypic and clinical data from interviews, physical assessments, medical history reviews or biochemical tests to explore possible genetic effects.
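
To make the idea of a cumulative effect concrete, a polygenic risk score is commonly computed as a weighted sum of risk-allele counts, with each weight taken from a GWAS effect size. The sketch below assumes that simple additive form; the variant names, effect sizes and the choice to treat missing genotypes as zero are all hypothetical and only for illustration.

```python
from typing import Dict


def polygenic_risk_score(allele_counts: Dict[str, int],
                         effect_sizes: Dict[str, float]) -> float:
    """Sum effect-size-weighted risk-allele counts over the variants in the score."""
    score = 0.0
    for variant, beta in effect_sizes.items():
        # Assumption: a variant missing for this person contributes zero.
        # Real pipelines usually impute missing genotypes instead.
        score += beta * allele_counts.get(variant, 0)
    return score


# Hypothetical per-allele effect sizes (for example, log odds ratios from a GWAS).
effect_sizes = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}

# Hypothetical genotype: number of risk alleles (0, 1 or 2) carried at each variant.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

print(round(polygenic_risk_score(person, effect_sizes), 2))
```

Each individual variant shifts the score only slightly, which is why scores of this kind typically combine many variants and are interpreted relative to the distribution in a reference population.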

GWAS research has grown steadily since the first successful GWAS, on myocardial infarction, was published in 2002, and the insights gained have had a huge impact on disease research. However, the effect sizes of these associations are usually small, and large populations are needed to reach statistical significance. In addition, association studies establish only correlation, not causation, so detailed molecular biology studies must follow a GWAS to establish the true molecular mechanism behind a trait.

Manhattan plots are often used to display significantly associated risk loci. Each dot represents a SNP, with the X-axis showing its genomic position and the Y-axis showing the strength of association. This example comes from a GWAS of kidney stone disease, so the peaks indicate genetic variants that are more common in kidney stone patients.
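
In practice, the Y-axis of such a plot usually shows the negative base-10 logarithm of each SNP's p-value, so stronger associations rise higher. The sketch below shows that transformation and a simple significance filter. The 5e-8 cutoff is the conventional genome-wide threshold rather than a value given in this article, and the SNP names and p-values are made up for illustration.

```python
import math
from typing import Dict, List

# Conventional genome-wide significance cutoff (an assumption, not from the article).
GENOME_WIDE_THRESHOLD = 5e-8


def manhattan_y(p_value: float) -> float:
    """Height of a SNP on a Manhattan plot: -log10 of its association p-value."""
    return -math.log10(p_value)


def significant_snps(p_values: Dict[str, float]) -> List[str]:
    """Return the SNPs whose p-value falls below the genome-wide threshold."""
    return sorted(snp for snp, p in p_values.items() if p < GENOME_WIDE_THRESHOLD)


# Hypothetical association results: SNP identifier -> p-value.
example = {"snp_1": 0.03, "snp_2": 2e-9, "snp_3": 4e-7}

for snp, p in example.items():
    print(f"{snp}: height {manhattan_y(p):.2f}")
print("genome-wide significant:", significant_snps(example))
```

The strict cutoff reflects the very large number of variants tested at once, which ties back to the point that small effects need large cohorts before they stand out.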

In recent years, whole genome sequencing (WGS) has risen rapidly. By reading the entire genome, WGS provides the most comprehensive genetic dataset for each individual, supplementing and enhancing genotyping and whole exome sequencing (WES) data. As sequencing costs fall sharply toward the $100 range, more samples can be sequenced by WGS within a limited budget. Combined with advances in bioinformatics analysis and variant annotation, this enables rapid discovery of new genes and insights (especially in non-coding regions of the genome) and will undoubtedly drive the next wave of human genomics research.

Genetic and genomic analysis of large populations can improve understanding of disease risk and etiology. Examples include risk genes such as PCSK9, ANGPTL3 and Lp(a) in cardiometabolic disease, and CIDEB, HSD17B13 and PNPLA3 in nonalcoholic steatohepatitis. Such findings support a better pathophysiological understanding of disease development and help in developing corresponding drugs (therapies against these targets are already on the market or in clinical trials).

GWAS remains an important tool for deciphering the intricate relationship between genetics and complex traits and has become a cornerstone for improving healthcare strategies and targeted therapies. As of September 10, 2023, the GWAS Catalog contained 6,566 publications, 552,116 top associations and 65,590 full summary statistics.

3. Why is UKB a model for biobank-supported cohort research?

Among biobanks that have been built up continuously and have driven high-quality research, UK Biobank (UKB) is the most noteworthy.

UKB began recruiting in 2006 and reached its target of 500,000 participants four years later (equivalent to around 7 of every 1,000 Britons volunteering for the programme). UKB focused on healthy volunteers aged 40 to 69, who gave blood and urine samples, underwent body scans and imaging, filled out questionnaires about their habits and lifestyles, and agreed to long-term follow-up. In addition to the biological samples, UKB has therefore accumulated a living information resource over many years. The results of every test for each participant are linked to all of that participant's other results, so UKB continues to grow in depth and breadth.

Breadth and depth of UKB data

The UKB project was championed by British scientists amid calls for government investment in DNA research, so from the start it aimed to analyse participants' genetic sequences. Using bespoke genotyping arrays, UKB released genotyping data for its first 150,000 participants in May 2015 and for all 500,000 two years later. As sequencing technology advanced, UKB kept pace, progressively adding WES and WGS coverage of the 500,000 participants. WGS data for the first 200,000 participants were released in November 2021, WES data for 470,000 participants were made available to researchers in July 2022, and WGS data for the remaining 300,000 are due for release by the end of 2023.

Stored samples may eventually run out, but the digital information derived from them does not. Besides continually adding genetic data, UKB keeps layering on other data types, including the single-cell transcriptome, the proteome (initially about 1,500 proteins, gradually expanding to 3,000), the metabolome (about 250 lipids and some amino acids) and imaging (by 2024, 100,000 participants will have undergone MRI of the brain, heart, bones and other sites). By applying different measurement techniques, UKB extracts as much information as possible from each sample, which lays a solid foundation for continued association analysis, cross-validation across data dimensions, and broader phenotypic spectrum analysis.

UKB offers lessons in scale (only about 1,000 of the 500,000 participants have since withdrawn), depth (a wide range of data types) and duration (15 years of follow-up provide a wealth of data for ongoing health research). But UKB's success doesn't stop there.

In building the biobank, UKB has developed a successful public-private partnership model. For WES and WGS sequencing, for example, UKB chose to work with a number of pharmaceutical companies, which co-funded the sequencing projects alongside the British government. In return for their investment, the companies share in the rights to mine and use the resulting data. Companies including Regeneron have discovered a number of new drug targets through such large-scale WES analysis, such as GPR75 as a treatment target for obesity, building a strong pipeline for subsequent innovative drug discovery and development.

The genetic data of all UKB participants are available to approved researchers through the UK Biobank Research Analysis Platform, recently launched by UKB in partnership with DNAnexus. The platform is deployed on the Amazon Web Services cloud in London, ensuring secure and auditable data access. UKB provides free computing resources for early-career researchers and researchers in low- and middle-income countries. This removes the computational burden of large-scale, multimodal data and widens the pool of researchers who can access the data and make genetic discoveries from them.

This data sharing has led to extensive research and new discoveries. UKB is used by more than 30,000 registered researchers in nearly 100 countries (80% of research applications come from outside the UK). By the end of 2022, nearly 6,000 scientific papers had been published using UKB data, with more than 180,000 citations, and the number of papers was growing by thousands per year.

The number of UKB registered researchers and their published articles has grown steadily in recent years

The real value of a biobank lies in its data, and foundational data resources such as UKB can be used by researchers around the world. This open sharing delivers its greatest value because the data iterate rapidly: scientists worldwide continually use the data (output) and contribute back to them (input), which generates new insights, accumulates scientific big data, and keeps the resource refreshed and moving forward. Science is essentially about standing on the shoulders of others and climbing ever higher. Ultimately, participants benefit directly from high-quality health research, researchers can publish impactful results, and public-private partnerships create ample opportunities for translation, so that the value of a biobank is maximized as different groups keep building on the same cohort resource.

Summary of UKB's success factors

In this way, UKB has built the largest information resource to date in the UK on the genetic and environmental factors that cause or prevent disease. It has gradually become one of the few large-scale human health databases in the world and continues to play an influential role in global scientific research and industrial translation.

4. What kind of cohort studies does China need?

UKB provides a reference model for cohort research and biobank construction in China, but the experience of Western countries has never directly solved domestic problems and cannot be applied wholesale to building biobanks here.

Challenges that experts and scholars often mention in domestic cohort research and biobank construction

Foreign biobanks and their genetic sequencing data consist mainly of samples from individuals of European ancestry. Biobanks need to become more diverse to reflect differences between populations, and researchers should consider how insufficient population representation may limit their findings. China is a populous country with 56 ethnic groups, and its diversity of human genetic resources may be unmatched by any other country. At the same time, the country's vast territory and widely distributed population pose special challenges for collecting sufficiently representative samples and building standardized, high-quality biobanks.

Of course, China already has some representative and internationally influential cohort biobanks, such as the China Kadoorie Biobank (CKB), the Taizhou Longitudinal Study (TZL, the Taizhou cohort) and the Jiangsu Birth Cohort. A number of high-quality research results are based on these cohorts, and their experience, from recruitment and follow-up to standardized data production and translation of results, provides a reference for later cohorts.

At the forum, Chen Xingdong, researcher and Executive Dean of the Taizhou Institute of Health Sciences, Fudan University, described in detail how the Taizhou cohort was built

The experience of the Taizhou cohort suggests that a cohort study must serve local health management needs. Local government involvement, organization and influence can contribute positively to recruitment and follow-up, to linking medical information, and to returning research results to the community. The large-scale population cohort research actively promoted in recent years by Nanjing Jiangbei New Area and Nanjing Medical University reflects the same efficient "government-enterprise-research-medicine" mode of interaction. Top-level mechanism design and support, top-down organization, and sustained joint participation by multiple parties are proven strengths of the Chinese approach that deserve to be shared more widely.

Cohort studies built on biobanks have been shown to provide evidence for primary prevention, for developing screening and diagnostic tools, and for medical decision-making, as well as new targets and ideas for drug development. When evaluating and translating the results of cohort research, we should therefore look beyond the quantity and quality of published articles, patents or monographs. We should also attend to how new knowledge guides disease risk detection and prevention in practice (for example, incorporating newly identified oncogenes into disease screening, or promoting more cost-effective polygenic risk score testing), and consider bringing in more of the relevant industries to work on possible early screening and early diagnosis tools (the Taizhou cohort provides fertile ground for building liquid-biopsy-based methods for early tumor screening), on therapeutic drugs, and on health management tools.

China has abundant population and patient resources, and many hospitals and research institutions have already accumulated sizeable collections of biological samples for specific or common diseases. As the cost of domestic sequencing continues to fall, cohort construction and genomics research in large populations appear to face new opportunities. Making full use of biological samples does not only mean prospectively building new cohorts from scratch; it also means exploring mechanisms for reusing and iterating on existing cohort samples. Adding WGS data to existing samples makes genome sequencing available for everyone in the cohort and allows the genomic data to be linked with the phenotypic, clinical and pathological data already collected. This makes more efficient use of past investment and avoids the cost of starting from scratch. Building on an existing cohort, however, also requires considering sample quality (whether enough usable material remains), the participant pool (whether participants can still be followed up), the data management system (whether it is scalable and can be integrated) and the governance structure (how to manage the transition between old and new projects and support continued exploration).

Even so, many researchers have noted that we still need to broaden funding sources and bring in other institutions that stand to benefit from the translation of future results, for example by attracting funding and participation from domestic innovative pharmaceutical companies, to keep biobank construction sustainable.

Data governance for biological samples calls for sustainable and accessible models. On the one hand, with the explosive growth of genomic and other multidimensional data, we need to strengthen the development of data platforms and computational and analytical tools. "To dig for gold you need a mine, pickaxes and shovels, and a high-speed track to carry the ore out." On the other hand, as UKB's success shows, opening up the ecosystem, strengthening data sharing, integrating research with other high-quality cohort data, and attracting more researchers from outside the institution to analyse and interpret the data all place high demands on the mechanisms and management of cohort construction. In addition, over the past two years the state has tightened requirements on the security, privacy and supervision of big data resources. Cohort and biobank construction therefore requires comprehensive consideration of the whole process, from design, implementation, construction and follow-up to data management, data use, and the sharing and management of results. In this way, we can gradually work out the biobank development model best suited to China.

Large-scale research helps us understand health and disease, and it depends on cooperation at home and abroad. Stronger international cooperation can better support research from etiology and genetics to risk prediction. Chinese cohorts in different regions can be integrated: for example, the Chinese population in Singapore closely resembles the population of South China, so in research on complex chronic diseases, joint analysis can better reveal the differences and diversity of similar groups living in different climates and with different lifestyles. Likewise, the practical experience of other countries in translating cohort research results into public health management is worth learning from. The results and experience already translated from cohort research in the United Kingdom and Singapore, including the shift from treatment to prevention, from disease management to health management and from hospitals to communities, the continuing education of doctors and training systems for the next generation of doctors, and health economics analysis, all provide a basis for cooperation, reference and exploration of similar work in China. For many years, most cohort studies, and the associated data analysis and translation of results, have been carried out in Western populations. Through cooperation, domestic cohort studies can be compared with existing results more quickly, maximizing the value of the research and allowing participating populations to truly benefit from it.

5. Future prospects

Science is essentially an intellectual enterprise, and biobanks are a good medium for bringing different scientific communities together to share and integrate. Biobanks are also a way for society to participate directly in research. Behind each sample lies a great deal of hope, and in return researchers hope to gain as much insight and knowledge as possible from studying these samples.

Large-scale physical biobanks are a valuable long-term asset because the amount of human biological material that can be obtained during sample collection is limited. A typical biobank is still a one-off effort in which biological samples are collected from a specific population and stored. The more information that can be obtained from each finite sample, the more valuable the sample is to the biobank and to researchers. Just as large biobanks such as UKB have worked to enrich the data dimensions linked to each sample, advances in genomics and multiomics technology now offer new tools for generating biobank data.

High-quality, data-rich biobanks are essential for future research. This makes rational cohort study design, biological sample collection, efficient data production under consistent standardized planning, and scalable, accessible data management all very important.

The ultimate goal of large-scale cohort biobanks remains to serve the precise prevention, diagnosis and treatment of disease. But this will never be a one-time process. UKB has followed its participants for 15 years since completing recruitment, and only now, some 20 years after the project was created, are its scientific results gradually unfolding. This growing influence rests on continuous investment, continuous development and continuous use. For China's cohort research, too, it will take time to move from scientific research into people's daily lives, and translation of results needs to begin with the end in mind, so that it not only benefits participants but also lets the public take part and understand.

Professor Emanuele Di Angelantonio of the University of Cambridge remarked at the forum that cohort research is like wine: the longer it ages, the more mellow and aromatic it becomes. The metaphor is apt for cohort building.

China's cohort construction requires an international perspective. Building China's own influential cohorts is a widely shared aspiration. It requires exploring an effective Chinese model, but it also requires showcasing the distinctive features of Chinese cohorts, summarizing and comparing results with those of cohorts abroad, and more domestic and international cooperation and exchange across disciplines and fields. Only when the basic findings of Chinese cohorts are visible, indexed and explored by more international researchers can Chinese cohorts benefit fully from the iterative discoveries of the entire scientific community.

As the cost of domestic sequencing continues to fall, domestic cohort studies will benefit directly from more accessible genomic information. We expect more Chinese cohort studies to gain visibility in the international scientific community, and we hope that these high-quality, WGS-supported cohort studies will help make truly personalized healthcare a reality.
