The Paper's reporter Zhang Hui

Screenshot of the paper's web page
Recently, the Journal of Genetics and Genomics, an authoritative international academic journal, published online a research paper by the team of Jin Li, academician of the Chinese Academy of Sciences and professor at Fudan University, entitled "The HuaBiao Project: Whole-Exome Sequencing of 5,000 Han Chinese Individuals."
In this study, 5,000 individuals from three representative Han Chinese populations in North China (Zhengzhou), East China (Taizhou) and South China (Nanning) underwent whole-exome sequencing, and the "Huabiao" Chinese exome database (hereinafter referred to as the "Huabiao" database) was preliminarily constructed.
On August 30th, at a briefing on the latest scientific research achievements of the "Huabiao Project", Professor Wang Jiucun, one of the corresponding authors of the paper and director of the Department of Human Genetics and Anthropology at Fudan University, and Shi Leming, professor at the Human Phenome Institute of Fudan University, presented the results.
Discover genomic variants specific to native populations
Various health problems, including diseases, are significantly affected by genetic variation in the genome. This genetic variation differs not only from person to person, but often also from population to population.
In 2001, the Human Genome Project constructed the first human reference genome; in 2008, the 1000 Genomes Project was launched to sequence the whole genomes of 2,500 samples from different populations around the world, drawing the most detailed map of human genome variation to date. Through the 1000 Genomes Project, the scientific community found significant differences between populations in both the loci and the frequencies of genomic variants.
At the same time, the rapid development and maturation of next-generation sequencing technology has made it possible to conduct large-scale sequencing studies of population samples at the genome level and to systematically reveal the fine genetic structure of populations. Exons are the parts of eukaryotic genes that encode proteins; the exome is the collection of all protein-coding regions of a genome. Compared with whole-genome sequencing, whole-exome sequencing (WES), which covers only the coding regions of the genome, costs significantly less, and at high sequencing depths it can detect rare, clinically pathogenic variant sites more accurately.
At present, there are several large public whole-exome sequencing (WES) databases in the world, such as ExAC and gnomAD. But the samples in these databases are mostly Caucasian, African-American or Latino, and the number of Han Chinese samples is limited. As the most populous ethnic group in the world, the Han Chinese have high genetic diversity, and establishing a high-quality, representative exome database of the Han Chinese population is of great value for biomedical research.
In September 2017, the Key Laboratory of Modern Anthropology of the Ministry of Education at Fudan University and relevant institutions jointly launched the "Huabiao Project", a Chinese whole-exome database project and one of China's independently built public databases. Jin Li, the project's chief scientific designer, said that the goal of the first phase is to carry out high-quality sequencing of representative Han Chinese samples from across the country, systematically analyze allele frequencies in exonic regions, finely characterize the genetic structure of the Han Chinese population, and form a Chinese population genome database built independently in China. At the same time, in compliance with national regulations on the management of human genetic resources, the project will explore and promote the preservation and sharing of various types of biomedical data, including genomic data, to provide reference data sets for subsequent precision medicine research.
After nearly four years, the Fudan University team and Aiji Taikang used whole-exome capture chip technology to complete the exome capture and sequencing of 5,000 Han Chinese individuals, build the "Huabiao" database, and complete the first phase of the "Huabiao Project".
At present, the Huabiao database contains a total of 2.07 million genetic variants, of which 46.4% were newly discovered in this study. Researchers around the world can quickly look up frequency information for genetic variants of interest through a database sub-site (https://www.biosino.org/wepd) hosted on the website of the China Biomedical Big Data Center (Shanghai).
The "Huabiao" database provides a scientific basis for the accurate diagnosis and treatment of rare diseases
The samples in the Huabiao database were benchmarked against the "Chinese Family No. 1" biological reference standard (http://chinese-quartet.org/), originally developed by the Human Phenome Institute of Fudan University, and the results showed that the SNP precision of the Huabiao genetic data reached 99%.
The researchers also compared the Huabiao sequencing data with data for the same samples obtained by another technical route, genome-wide chip data, and the results showed a concordance rate of 99.8%.
The Huabiao database was also highly consistent with gnomAD (East Asian population) at common genetic loci (R² > 0.99).
Together, these results show that the variant data in the Huabiao database are of high quality and high accuracy.
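For readers who want to see what these three quality measures mean in practice, the following is a minimal Python sketch, not the team's actual pipeline. The function names, the representation of a variant as a (chromosome, position, ref, alt) tuple, and the toy inputs are all assumptions made for illustration.

```python
def snp_precision(called, truth):
    """Fraction of called SNPs that also appear in a reference truth set."""
    called, truth = set(called), set(truth)
    if not called:
        raise ValueError("no called variants")
    return len(called & truth) / len(called)


def genotype_concordance(wes_genotypes, chip_genotypes):
    """Fraction of sites typed by both methods where the genotypes agree."""
    shared = wes_genotypes.keys() & chip_genotypes.keys()
    if not shared:
        raise ValueError("no shared sites")
    matches = sum(1 for site in shared if wes_genotypes[site] == chip_genotypes[site])
    return matches / len(shared)


def r_squared(x, y):
    """Squared Pearson correlation between two lists of allele frequencies."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("frequencies must not be constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: variants are (chromosome, position, ref, alt) tuples.
called = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("3", 7, "T", "C")]
truth = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("4", 9, "A", "T")]
print(snp_precision(called, truth))
```

In this sketch, precision answers "how many reported SNPs are real", concordance answers "how often two technologies agree on the same site", and R² summarizes how closely allele frequencies in two databases track each other.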
Professor Wang Jiucun, one of the corresponding authors of the paper and director of the Department of Human Genetics and Anthropology of Fudan University, said that population databases, including whole-exome databases, are of great significance for rare disease research in the biomedical community. Rare diseases are diseases that occur in only a very small number of people, with a population prevalence of less than 1 in 10,000. Statistics show that the majority (72%) of rare diseases are hereditary, and many appear early in the patient's life, such as thalassemia and osteogenesis imperfecta.
The Huabiao database provides frequency information on low-frequency loci in the Han Chinese population, which can help researchers distinguish rare disease-causing mutations from high-frequency benign variants. This provides a scientific basis for more accurately identifying and analyzing the pathogenic molecular mechanisms and genetic mechanisms of rare diseases in the Chinese population, and for developing accurate diagnosis and treatment plans. According to the researchers, in general, the larger the population, the greater the selection pressure and the easier it is for harmful gene mutations to be removed. Because the Han population is large, the impact of rare diseases is smaller than in closed small groups.
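The frequency-based screening described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `population_af` dictionary stands in for frequencies looked up in a population database, the variant keys are made up, and the 0.001 cutoff is a hypothetical threshold, not one given by the researchers.

```python
def filter_candidate_variants(patient_variants, population_af, max_af=0.001):
    """Keep variants that are rare (or unseen) in the reference population.

    patient_variants: iterable of variant identifiers found in a patient.
    population_af:    dict mapping variant identifier -> allele frequency
                      (hypothetical stand-in for a population database).
    max_af:           assumed frequency cutoff; common variants above it are
                      treated as unlikely to cause a rare disease.
    """
    candidates = []
    for variant in patient_variants:
        # A variant absent from the database was never observed there,
        # so it is treated as having frequency 0.
        frequency = population_af.get(variant, 0.0)
        if frequency <= max_af:
            candidates.append(variant)
    return candidates


# Hypothetical example data.
population_af = {"chr1:100:A>G": 0.32, "chr2:555:C>T": 0.0004}
patient = ["chr1:100:A>G", "chr2:555:C>T", "chr7:42:G>A"]
print(filter_candidate_variants(patient, population_af))
```

The common variant is dropped, while the rare variant and the one never seen in the population remain as candidates for further analysis. A filter like this only narrows the search; it does not by itself establish that a variant is pathogenic.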
Fudan University doctoral student Hao Meng, postdoctoral fellow Pu Weilin, young associate researcher Li Yi, and Wen Shaoqing are the co-first authors of the paper; Fudan University professors Jin Li, Wang Jiucun and Li Hui and young associate researcher Wang Yi are the co-corresponding authors. The work was supported by the Shanghai Major Science and Technology Project, the Medical Science Innovation Fund of the Chinese Academy of Medical Sciences and the National Key Basic Research and Development Program.
Editor-in-Charge: Gao Wen
Proofreader: Yijia Xu