After 22 years, about 200 million missing base pairs of the human genome have been deciphered for the first time | Titanium Media science popularization

DNA bases in the complete human genome, represented by the letters A, T, C and G (Source: NHGRI)

After 22 years, researchers have finally deciphered the complete sequence of the human genome from end to end.

Titanium Media App, April 1: According to Science and Technology Daily, the leading journal Science published six papers this morning reporting the latest progress in human genome sequencing. Using new technologies, a scientific team from the Telomere-to-Telomere (T2T) consortium, which includes the US National Human Genome Research Institute (NHGRI), has produced the world's first complete, gap-free human genome sequence and, for the first time, revealed the highly identical segmental duplications in the human genome and their variation.

It is a "major upgrade" to the standard human reference genome, GRCh38, released in 2013. The new sequence adds DNA segments that had been hidden across the chromosomes, filling in the roughly 8 percent of the genome that had remained unresolved with about 200 million DNA base pairs and about 2,000 new genes.

The results are significant. The complete human genome sequence is one of the most complex puzzles in the world, and this work lets people see a complete, gap-free sequence of DNA bases for the first time. That is crucial for understanding the full spectrum of human genomic variation and the genetic contribution to certain diseases, and it will advance research on cancer, birth defects and aging.

This is also the first time in the 141 years since Science was founded that six papers on the human genome have been published in a single issue.

Ting Wang, an author of the papers and a geneticist at Washington University School of Medicine in St. Louis, said that having a complete genome will certainly improve biomedical research. "There is no doubt that this is an important achievement."

"We've seen chapters that we've never read before," said Evan Eichler, a corresponding author of the papers and a Howard Hughes Medical Institute (HHMI) investigator at the University of Washington, calling the result a major event for the entire field.

Science paper cover art

What exactly did the researchers decipher?

The human genome is made up of more than 6 billion DNA bases and about 20,000 to 30,000 protein-coding genes (there is still no agreed-upon total), roughly the same number as in chimpanzees and other primates, spread across 23 pairs of chromosomes. To read the genome, scientists first cut the DNA strands into fragments hundreds to thousands of bases long. A sequencing machine then reads the individual bases in each fragment, and the scientists try to assemble the fragments in the correct order, like piecing together a complex jigsaw puzzle.
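The puzzle-solving step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch, assuming error-free fragments and a simple greedy strategy (repeatedly merge the pair of fragments with the longest suffix-prefix overlap); real assemblers, including those used in the T2T work, rely on far more sophisticated methods. The function names `overlap` and `greedy_assemble` and the sample fragments are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether everything from that point to the end of a begins b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge fragments greedily by longest overlap; return the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate (gaps)
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


fragments = ["CAGGTCA", "ACGTTGC", "TTGCAGG"]  # made-up reads of ACGTTGCAGGTCA
print(greedy_assemble(fragments))  # → ['ACGTTGCAGGTCA']
```

When no fragment overlaps another, the loop stops and returns several separate pieces; those unjoined pieces are the toy equivalent of gaps in a genome map.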

On February 12, 2001, the International Human Genome Project, a joint effort by scientists from six countries, released the human genome map and a preliminary analysis for the first time; on April 15, 2003, the sequence map of the human genome was completed.

However, because of technical limitations, the original Human Genome Project left about 8% of the genome blank. These hard-to-sequence regions consist of highly repetitive, complex blocks of DNA that contain functional genes, as well as the centromeres within chromosomes and the telomeres at their ends.

The core challenge is that some regions of the genome repeat the same bases over and over. These repetitive regions include centromeres and ribosomal DNA, and in the past their chopped-up fragments could not be assembled in the correct order. It is like a puzzle with many identical pieces: scientists could not tell which piece went where, so large gaps remained in the genome map.
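Why repeats defeat short fragments can be shown by counting where a fragment matches. This sketch assumes exact matching against a made-up toy sequence containing two copies of the repeat `GATTACA`; it checks whether a read of a given length can be placed at only one position. The helper names `occurrences` and `fraction_ambiguous` are hypothetical.

```python
def occurrences(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly (overlaps allowed)."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


def fraction_ambiguous(genome: str, read_len: int) -> float:
    """Fraction of all read_len-long windows of `genome` that match at more than one place."""
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    ambiguous = sum(1 for w in windows if len(occurrences(genome, w)) > 1)
    return ambiguous / len(windows)


# Hypothetical toy sequence: unique flanks around two copies of the repeat GATTACA.
genome = "CCCC" + "GATTACA" + "TTTT" + "GATTACA" + "GGGG"

print(occurrences(genome, "ATTAC"))      # → [5, 16]: a short read inside the repeat fits twice
print(occurrences(genome, "CGATTACAT"))  # → [3]: a read spanning the repeat and its flanks fits once
print(fraction_ambiguous(genome, 5), fraction_ambiguous(genome, 8))  # about 0.27, then 0.0
```

Once reads are longer than the repeat, every read reaches into unique sequence on at least one side and can be placed unambiguously, which is why longer reads help close these gaps.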

In addition, most cells contain two copies of the genome, one inherited from the father and one from the mother. When researchers assembled the fragments, sequences from the two parents could become mixed together, masking the real variation within an individual's genome.
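A toy example shows how merging reads from both parents can hide variation. This sketch assumes reads that are already aligned and of equal length, and uses a simple per-position majority vote as a hypothetical "naive" assembler; the two parental sequences are invented for illustration.

```python
from collections import Counter


def naive_consensus(aligned_reads: list[str]) -> str:
    """Collapse equal-length, pre-aligned reads into one sequence by per-position majority vote."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads))


def heterozygous_sites(aligned_reads: list[str]) -> list[int]:
    """Positions where the reads disagree, i.e. where two different alleles are present."""
    return [i for i, column in enumerate(zip(*aligned_reads)) if len(set(column)) > 1]


# Invented parental sequences that differ at positions 3 and 7.
maternal = "ACGTACGT"
paternal = "ACGAACGG"
reads = [maternal] * 3 + [paternal] * 2

print(naive_consensus(reads))     # → ACGTACGT: the paternal alleles disappear
print(heterozygous_sites(reads))  # → [3, 7]: the variation the single sequence hides
```

The single collapsed sequence looks clean, but it silently drops one parent's bases at every site where the two copies differ.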

Now, using new nanopore sequencing instruments and other core technologies, the researchers have produced T2T-CHM13, a new gap-free assembly made up of 3.055 billion base pairs and 19,969 protein-coding genes. It adds nearly 200 million base pairs of new DNA sequence containing nearly 2,000 predicted genes that need further study, 99 of which may code for proteins.

Most of these candidate genes are inactive, but 115 of them may still be expressed. The team also found about 2 million additional variants in the human genome, 622 of which occur in medically relevant genes. In addition, the new sequence corrects thousands of structural errors in GRCh38.

Illustration of the acrocentric chromosomes (Source: paper)

Specifically, the gaps filled by the new sequence include the entire short arms of five human chromosomes and some of the most complex regions of the genome. These include highly repetitive DNA sequences in and around important chromosomal structures, such as the telomeres at the ends of chromosomes and the centromeres, which coordinate chromosome segregation during cell division.

The new sequence also reveals previously undiscovered segmental duplications, that is, long DNA segments copied within the genome, and gives unprecedented detail about the regions around the centromeres. Variation in these regions may provide new evidence about how human ancestors evolved.

Notably, the key to this result was new sequencing hardware: the rapidly improving sequencers made by Oxford Nanopore Technologies in the UK and Pacific Biosciences in the US.

Back in 2017, Adam Phillippy of the National Human Genome Research Institute (NHGRI) and Karen Miga of the University of California, Santa Cruz (UCSC) realized that the new nanopore machines' ability to read as many as 1 million DNA bases at a time could open the door to finally solving the genome's hardest regions.

Around the same time, a team led by Evan Eichler of HHMI at the University of Washington showed that Pacific Biosciences' sequencing technology could resolve more complex forms of genetic variation.

Together, the three founded the Telomere-to-Telomere (T2T) consortium, which brought together about 100 scientists around the world to speed up the work.

Over the next six months, the team kept running the rapidly improving nanopore sequencers and invited dozens of scientists to help assemble the fragments and analyze the results. In the end, the team produced long reads with an accuracy above 99% and combined them with Oxford Nanopore data, filling a long-standing gap in genetics research worldwide.

By the summer of 2020, the team had assembled two complete chromosomes. During the COVID-19 pandemic, the team worked remotely over communication tools such as Slack to assemble the other 21 chromosomes, ordering each one from one end, or telomere, to the other. The researchers also tackled the most difficult regions of the genome: the highly repetitive DNA sequences in the centromeres.

Through long study and close collaboration, the team ultimately assembled every chromosome, including the regions containing multiple copies of the genes that encode ribosomal RNA, about 400 copies in total.

In June 2021, the results were first posted on the preprint server bioRxiv; after peer review, the series of papers has now appeared in Science.

In an interview after the briefing, the researchers said the next phase of the study will sequence the genomes of many different people to fully understand the diversity and function of human genes, as well as the relationship between humans, their close relatives and other primates.

Growing more than 20% a year, China's 10-billion-yuan gene sequencing market has broad prospects

As biotechnology keeps advancing, new industries keep emerging. China's gene sequencing industry, the field this research belongs to, is a market worth more than ten billion yuan with broad room for growth.

According to statistics from Qianji Investment Bank, the global market for biological products, the industry that includes gene sequencing, had already reached 317.2 billion yuan in 2019 and is expected to reach the trillion-yuan level within five years. In 2019, China's gene sequencing market was about 14.9 billion yuan, growing at more than 20% a year.
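The growth figures can be checked with simple compound-growth arithmetic. This sketch assumes a constant annual rate and treats "the trillion-yuan level" as exactly 1,000 billion yuan; the outputs only illustrate what those assumptions imply and are not forecasts from the cited report.

```python
def project(value: float, annual_growth: float, years: int) -> float:
    """Value after compounding a constant annual growth rate for `years` years."""
    return value * (1 + annual_growth) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


# China's market: 14.9 billion yuan in 2019, assumed to grow exactly 20% a year for 5 years.
china_2024 = project(14.9, 0.20, 5)
print(f"{china_2024:.1f} billion yuan")  # → 37.1 billion yuan

# Global market: 317.2 billion yuan in 2019 reaching an assumed 1,000 billion yuan in 5 years.
global_rate = implied_cagr(317.2, 1000.0, 5)
print(f"{global_rate:.1%} per year")  # → 25.8% per year
```

Under these assumptions, 20% annual growth would roughly multiply China's market by 2.5 over five years, and reaching 1 trillion yuan globally would require about 26% growth per year.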

In recent years, the gene sequencing industry has grown rapidly, attracting large amounts of capital and many companies. The industry chain has three main segments: upstream instruments, midstream service providers and downstream end applications. Participants include BGI, Daan Gene and WuXi AppTec, as well as technology giants Apple, Amazon, Google and Microsoft.

The industry may look simple, but upstream sequencers and their supporting reagents carry the highest barriers to entry in the whole chain. Downstream applications span a wide range of fields: in medicine, the human genome, human microbial genomes and basic research; outside medicine, environmental management, oil reservoir detection, and crop and livestock breeding.

Decades ago, doctors tried transplanting a baboon heart into a child with congenital heart disease. Today, with the help of chimerism research, gene editing and even synthetic biology, a pig heart has been transplanted into a human.

Yin Ye, CEO of BGI Group, has said that humanity has entered the "era of life", in which we care about our own genes and health and will integrate the physical world, the information world and the living world.

As application scenarios keep broadening and sequencing capabilities keep improving, the global gene sequencing market will continue to grow. Although Chinese gene companies still lag well behind the global leaders in scale, they hold a large advantage in the domestic market. To win a larger share of the international market, they will need to further strengthen research and development, and the room for future growth is vast.

The new genome sequence is an indispensable first step for researchers and an important one for commercialization.

Evan Eichler said, "Now that we have a Rosetta Stone, we can study complete assemblies of hundreds of thousands of other genomes in the future." (Note: the Rosetta Stone is a granodiorite stele made in 196 BC whose text made it possible to decipher Egyptian hieroglyphs, the meaning of which had been lost for more than a thousand years.)

(This article was first published on the Titanium Media App. Author: Li Canon; Editor: Lin Zhijia)
