#Pinnacle of Science and Technology#
In 2022, twenty-one years after the first draft was published, scientists announced the first truly complete human genome sequence, filling in "puzzle" pieces that had been missing for two decades.
The most complete human genome sequence is released
(i) Several concepts relating to the sequence of the human genome
(1) Gene and genome sequencing
■ A gene is a segment of DNA that carries genetic information: the full nucleotide sequence needed to produce a polypeptide chain or a functional RNA. Genes underpin the basic structure and functioning of life. They store the information that governs an organism's race, blood type, gestation, growth, apoptosis, and other processes, and, through the interplay of heredity and environment, they direct key physiological processes such as reproduction, cell division, and protein synthesis. All life phenomena, including birth, growth, decline, disease, aging, and death, are related to genes, which makes them an intrinsic factor in determining life and health. Genes therefore have a dual nature: they are material (their mode of existence) and informational (their fundamental property). Not all DNA consists of genes. Of the remaining sequences, some function directly through their own structure, and others help regulate the expression of genetic information. A minimum of roughly 265 to 350 genes is needed to sustain even a simple form of life.
Gene: A piece of DNA that carries genetic information
■ A genome is all of the genetic information within a cell, that is, the complete haploid set of genetic material of a cell or organism. Humans have a single genome containing about 20,000 to 30,000 genes. It holds the secrets of human development and evolution.
The human genome consists of 23 pairs of chromosomes, including 22 pairs of autosomes and 1 pair of sex chromosomes. The human genome contains about 3.16 billion DNA base pairs.
Genome: All genetic information within a cell
Around the 1980s, the development and successful application of several key technologies made it possible to determine whole-genome sequences. These included Sanger's dideoxy sequencing, pulsed-field gel electrophoresis for separating large DNA fragments, in vitro DNA recombination, and PCR. With them, genomics came into being.
The term genomics was first proposed by Thomas Roderick in 1986 and covers genome mapping, sequencing, and analysis. The field has since broadened and is now commonly divided into structural genomics and functional genomics.
The human genome is written with only four "letters", A, T, C, and G, which represent the four bases that make up DNA. Counting both copies of each chromosome, however, the text runs to as many as 6 billion letters spread across 23 pairs of chromosomes, and their countless combinations hold the mysteries of human evolution, birth, aging, disease, and death.
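To give a feel for the scale mentioned above, here is a minimal back-of-the-envelope sketch. It assumes each of the four letters is stored in 2 bits, which is an illustrative assumption and not how sequencing data is usually stored in practice. The names `BITS_PER_BASE`, `raw_size_gigabytes`, and `pack` are hypothetical and chosen only for this example.

```python
import math

BASES = "ATCG"
# Four possible letters need log2(4) = 2 bits each (illustrative assumption).
BITS_PER_BASE = int(math.log2(len(BASES)))


def raw_size_gigabytes(num_letters: int) -> float:
    """Estimate raw storage in gigabytes (10**9 bytes) at 2 bits per letter."""
    total_bits = num_letters * BITS_PER_BASE
    return total_bits / 8 / 1e9


def pack(sequence: str) -> int:
    """Pack a short DNA string into one integer, 2 bits per letter."""
    value = 0
    for base in sequence:
        value = (value << BITS_PER_BASE) | BASES.index(base)
    return value


if __name__ == "__main__":
    # 6 billion letters at 2 bits each
    print(raw_size_gigabytes(6_000_000_000))
    # A four-letter word fits in a single byte
    print(bin(pack("GATC")))
```

Under this assumption, the 6 billion letters would occupy about 1.5 gigabytes of raw storage, which is small by modern standards. The hard part of sequencing is reading the letters in the right order, not storing them.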
The human genetic code is hidden in the sequence of base pairs. To uncover it, the genome must be sequenced; that is, the order of the base pairs must be read out and identified.
The purpose of human genome sequencing is to determine the sequence of the roughly 3 billion base pairs of human genomic DNA, discover all human genes, find their locations on the chromosomes, and decipher all human genetic information.
(2) Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)
■ DNA carries the genetic information needed to synthesize RNA and proteins, and is a biological macromolecule essential for the development and normal functioning of organisms. In the DNA molecule, two polydeoxynucleotide chains coil around a common central axis to form a double helix.
DNA is made up of bases, deoxyribose, and phosphoric acid. There are four bases: adenine (A), guanine (G), thymine (T), and cytosine (C).
■ RNA is a carrier of genetic information found in living cells and in some viruses and viroids. It is a chain-like molecule formed by ribonucleotides joined through phosphodiester bonds. Each ribonucleotide consists of phosphoric acid, ribose, and a base. RNA has four main bases: adenine (A), guanine (G), cytosine (C), and uracil (U), with uracil taking the place of the thymine (T) found in DNA. The main role of RNA in the body is to guide protein synthesis.
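The relationship between the two strands of the double helix, and between DNA and RNA, can be sketched in a few lines of Python. This is a simplified illustration, not a bioinformatics tool. It relies on the standard pairing rules (A with T, G with C), which the text above does not spell out, and the function names `reverse_complement` and `transcribe` are hypothetical.

```python
# Standard base pairing in DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Write the RNA copy of a DNA coding strand: same letters, U in place of T."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
    print(transcribe("ATGCTT"))        # AUGCUU
```

The sketch treats sequences as plain strings and will raise a `KeyError` on any letter other than A, T, G, or C, which keeps the toy example strict about its input.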
(ii) After 21 years, human genome sequencing reaches a new milestone
In 2001, the international Human Genome Project, which involved scientists from six countries including China, published a draft human genome sequence and a preliminary analysis in the British journal Nature. Because of the limitations of sequencing technology at the time, this draft contained many gaps.
Even after a new version was completed in 2013 and updated in 2019, millions of bases in the human genome sequence were still represented by the letter "N", meaning that the actual base at that position was unknown. What is more, biologically important regions making up about 8% of the human genome remained unexplored.
To fill these gaps, nearly 100 scientists from dozens of research institutions formed a large team called the T2T (Telomere-to-Telomere) Consortium, which set out to sequence each chromosome from the telomere at one end to the telomere at the other. When the results were published, one of the team leaders, Professor Evan Eichler of the University of Washington, said that we "read chapters in the Book of Life that we had never read before."
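As a small illustration of what "N" placeholders mean in practice, the sketch below scans a sequence string for runs of "N" and reports where the unknown stretches are and what share of the sequence they cover. The toy sequence and the function names `find_gaps` and `unknown_fraction` are hypothetical. Real reference genomes are distributed in formats such as FASTA and are far too large to handle this naively.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for each run of 'N'."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", sequence.upper())]


def unknown_fraction(sequence: str) -> float:
    """Fraction of positions whose base is unknown ('N')."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


if __name__ == "__main__":
    toy = "ACGTNNNNACGTNNAC"  # made-up example, not real genome data
    print(find_gaps(toy))         # [(4, 8), (12, 14)]
    print(unknown_fraction(toy))  # 0.375
```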
Genome sequencing
(iii) Filling the last 8% gap
To avoid the complications caused by differences between alleles, the team used a cell line that carries only a single genome. This special cell line was derived from a so-called molar pregnancy, an abnormal embryo that retains only one parent's copy of the genome after fertilization.
The key to overcoming the remaining difficulties was a major leap in sequencing technology. Thanks to revolutionary advances in long-read sequencing, researchers can now decode much longer stretches of DNA, reading up to millions of base pairs at a time.
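Why longer reads help with repetitive regions can be shown with a toy model. The sketch below is an illustration under simplifying assumptions, not the consortium's method: it treats a read as an error-free window of fixed length and asks what fraction of such windows occur exactly once in a made-up "genome" containing two identical repeat blocks. `unique_window_fraction`, `REPEAT`, and `TOY_GENOME` are hypothetical names.

```python
from collections import Counter


def unique_window_fraction(genome: str, read_length: int) -> float:
    """Fraction of read-length windows that occur exactly once in the genome."""
    windows = [genome[i:i + read_length]
               for i in range(len(genome) - read_length + 1)]
    if not windows:
        return 0.0
    counts = Counter(windows)
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / len(windows)


# Toy genome: two identical repeat blocks separated by a short unique stretch.
REPEAT = "TTAGGG" * 5
TOY_GENOME = "ACGT" + REPEAT + "CCGA" + REPEAT + "GATC"

if __name__ == "__main__":
    for length in (6, 40):
        print(length, unique_window_fraction(TOY_GENOME, length))
```

In this toy genome, most 6-letter windows fall entirely inside a repeat and could have come from several places, whereas every 40-letter window spans the unique stretch between the repeats and can be placed unambiguously. Real long-read assembly also has to cope with sequencing errors, which this sketch ignores.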
(iv) This achievement is a new starting point for humankind
Analysis of the complete human genome sequence will significantly deepen scientists' understanding of human chromosomes and open up new research directions. It helps answer fundamental biological questions, such as how chromosomes segregate and divide. Using the complete sequence, the research team also identified more than 2 million additional genetic variants, providing more accurate variation information for 622 medically relevant genes.
Eric Green, director of the National Human Genome Research Institute, said that completing the sequence of the entire human genome is an important scientific achievement that provides the first comprehensive view of human DNA. This most basic information will advance the understanding of all the nuanced functional differences in the human genome and facilitate genetic research into human diseases.
The significance of the human genome sequencing project is seen as comparable to that of the Apollo moon landing program. The human genome contains human genetic information, and deciphering it can bring revolutionary progress to disease diagnosis, new drug development, and new treatment exploration.
(v) The direction of future efforts
The genome sequence released this time still has a shortcoming: because the sequenced sample comes from a haploid-derived cell line, T2T-CHM13 has no Y chromosome sequence. Scientists plan to address this gap in later work.