The most complete sequence of the human genome was announced early this morning!

In the early hours of this morning, the journal Science published six papers at once, presenting the complete sequence of the human genome for the first time. The result of three years of work, it fills the gaps left by the sequencing effort of 20 years ago and is a major milestone in human genome research.

The complete sequencing and analysis were carried out by 114 scientists from 33 research institutes in different countries, collectively known as the Telomere-to-Telomere (T2T) Consortium.

Fill the 8% gap

Research on the human genome goes back more than 50 years. The most famous effort is the Human Genome Project, launched in 1990. On April 14, 2003, after more than 10 years and $3 billion, laboratories in multiple countries declared the project's sequencing complete, mapping the human genome for the first time. That work greatly advanced genomics and deepened our understanding of humans and disease.

But about 8 percent of the genome was still missing from the sequence the Human Genome Project produced. The missing portions consist largely of highly repetitive DNA, and together they add up to roughly the length of an entire chromosome.

Human chromosomes 1 to 22 | Andreas Bolzer et al.

The missing 8% stems from the limitations of sequencing technology 20 years ago. The method used at the time was "short-read" sequencing, which could only read a short stretch of DNA at a time. Imagine one part of the genome as a sentence in a paragraph, such as "Xiaoming ate a bun this morning". With short-read sequencing, researchers obtain many short fragments, such as "Xiaoming", "ming ate", "a bun", and "this morning"; then, by solving the "puzzle" of how the fragments overlap, they can piece the complete sentence back together.

However, although researchers can identify the short sequences a region contains, they cannot tell how many times the region is repeated. In other words, they can piece together the sentence "Xiaoming ate a bun this morning", but they do not know whether the sentence is repeated in the paragraph, or how many times. Because this information was missing, repetitive sequences have been a major problem in genomics research for the past 20 years.

It was not until two new technologies emerged that human genomics reached a turning point. Both are "long-read" technologies. One is Oxford Nanopore DNA sequencing, which can read up to 1 million DNA bases at a time with moderate accuracy. The other is PacBio HiFi sequencing, which reads only about 20,000 bases at a time but with near-perfect accuracy. Both can read large chunks of DNA in one pass, so researchers can see an entire sentence or even a whole paragraph at once and can tell how many times a sequence is repeated consecutively.
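The difference between short and long reads on repeats can be shown with a small Python sketch. Everything in it is a made-up illustration: the repeat unit `ACGT`, the flanking sequences, and the read lengths of 6 and 30 are assumptions chosen for a toy example and bear no relation to real read lengths.

```python
import re

UNIT = "ACGT"  # hypothetical repeat unit


def toy_genome(copies: int) -> str:
    """Unique flanking sequence around `copies` tandem copies of UNIT."""
    return "GATTCAGGCT" + UNIT * copies + "TTGCATCCAG"


def reads(sequence: str, length: int) -> set:
    """Every distinct read of the given length, as an idealized sequencer would see."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}


def longest_run(read_set: set) -> int:
    """Largest number of consecutive UNIT copies visible inside any single read."""
    runs = [
        len(match.group(0)) // len(UNIT)
        for read in read_set
        for match in re.finditer(f"(?:{UNIT})+", read)
    ]
    return max(runs, default=0)


three, five = toy_genome(3), toy_genome(5)

# Short reads: the two genomes yield exactly the same set of fragments.
short_identical = reads(three, 6) == reads(five, 6)
print(short_identical)

# Long reads span the whole repeat, so the copy number can be read off directly.
print(longest_run(reads(three, 30)), longest_run(reads(five, 30)))
```

With reads shorter than the repeat array, the genome with three copies and the one with five are indistinguishable from the set of distinct fragments; a read long enough to cover the array and both flanks shows the copy number directly.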

Karen Miga and Adam Phillippy, co-chairs of the T2T Consortium | T2T Consortium

Beginning in early 2019, scientists in the T2T Consortium combined these two new technologies in a dedicated effort to resolve the repetitive sequences and fill in the remaining gaps. At the end of 2020, they announced interim results: the complete assembly of chromosomes X and 8. After two years of hard work, they have finally unveiled the truly complete sequence of the human genome, covering every chromosome from telomere to telomere.

See the entire genetic heritage

In 2001, the Human Genome Project published its first reference genome, which has been continuously improved and revised since then; the current version is called GRCh38. The new reference genome published by the T2T Consortium is called T2T-CHM13 and is an upgrade over GRCh38. The newly added DNA sequences, totaling nearly 200 million base pairs, not only fill in the previously missing short arms of five chromosomes but also reveal the most complex regions of the genome: the highly repetitive DNA sequences around telomeres and centromeres.

Schematic diagram of the CHM13 genome sequenced with HiFi | Reference [1]

The complete sequence also corrects many earlier errors, such as segmental duplications that had gone undetected. These long stretches of repetitive DNA were once regarded as "junk" regions of the genome with little practical function. In recent years, however, a growing body of research has suggested that repetitive sequences may be very important in human evolution and disease. With the last piece of the genome puzzle in place, scientists finally hold the key to the treasure chest and can begin to study the treasure inside.

One of the groups involved in the study is the UC Santa Cruz Genomics Institute at the University of California, USA. David Haussler, director of the institute, said: "Now we can stand at the top of the hill and look down on all the scenery below and see the entire genetic heritage of our humanity."


The full genomic data has been made publicly available on NCBI and GitHub | NCBI

Next, researchers will focus on important regions that were previously difficult to study, such as centromeres. About 90% of the newly added sequence comes from centromeres. Centromeres are extremely important for the inheritance of genetic information: in meiosis, paired chromosomes begin to separate at the centromeres. Scientists believe that many disease-associated genetic variants are hidden in the long, repetitive DNA of the centromeres.

Scientists will also work to sequence more complete genomes. T2T will work with the Human Pangenome Reference Consortium to sequence the complete genomes of 350 people and create a "human pangenome reference" that presents and interprets the diversity of human populations from a genomic perspective.

Adam Phillippy, co-chair of the T2T Consortium, believes that sequencing and analyzing complete genomes benefits everyone. In the near future, sequencing a person's complete genome will become cheaper and simpler, and researchers and healthcare workers will be able to identify all of a person's genetic variants, find those related to disease, and offer advice on medical care and lifestyle.

References

[1] https://www.biorxiv.org/content/10.1101/2021.05.26.445798v1

[2] https://www.eurekalert.org/news-releases/946948?

[3] https://www.eurekalert.org/news-releases/947718

[4] https://www.eurekalert.org/news-releases/947629

[5] https://www.eurekalert.org/news-releases/947636

[6] https://www.eurekalert.org/news-releases/947910

Author: Cat Swallow

Editor: Mai Mai

This article is from Guokr and may not be reproduced without authorization.