
Professor Minoru Kanehisa, Kyoto University, Japan | Source: kanehisa.jp/
Introduction
The problem that bioinformatics has to solve is the generation, management, and mining of biological data. This kind of protracted, systematic push and support seems more likely to be overlooked and underestimated than the usual headline-grabbing conceptual and technological advances.
Bioinformatics has indeed given strong impetus to biological research and its applications in every direction. Its awkward position is that, as a practical tool, it lacks depth, and its conceptual and technological breakthroughs rely heavily on experimental design and data quality. As far as the Nobel Prize is concerned, the biggest pain point of bioinformatics is that it struggles to close the loop on its own and reach a generally recognized height.
Written by | Zhang Xiaoniu
Responsible editor | Chen Xiaoxue
● ● ●
In 2018, Professor Minoru Kanehisa of Kyoto University in Japan was listed by the American firm Clarivate Analytics as one of the possible candidates for the Nobel Prize in Physiology or Medicine, cited for "outstanding contributions to bioinformatics, in particular the development of the Kyoto Encyclopedia of Genes and Genomes". The Kyoto Encyclopedia of Genes and Genomes is known as KEGG for short. Even people who work professionally in bioinformatics may not know Professor Kanehisa, but anyone who has been exposed to bioinformatics will know KEGG.
Genes interact with one another to carry out biological functions, and a set of genes that performs a specific biological function is called a pathway; examples include metabolic pathways and signal transduction pathways. By continuously curating pathway data sets, KEGG has developed a series of bioinformatics tools and has provided long-term support, in the form of gene function information, for everyday biological research. This is a vast and complex systematic undertaking that has clearly contributed greatly to modern biological research.
But this kind of protracted, systematic push and support seems more likely to be overlooked and underestimated than the usual headline-grabbing conceptual and technological advances. This may be one of the reasons why Professor Kanehisa has not won the Nobel Prize so far. It also reflects the dilemma of traditional bioinformatics: as a practical tool it lacks depth, and its conceptual and technological breakthroughs rely heavily on experimental design and data quality.
Professor Kanehisa graduated from the Department of Physics at the University of Tokyo in 1976, did postdoctoral research at the Johns Hopkins University School of Medicine, and became a research scientist at Los Alamos National Laboratory in 1981. During this time, he was involved in the development of the biological database GenBank. This experience apparently helped him a great deal when he later developed KEGG as a specialized database. Today, GenBank is one of the most important primary genetic sequence databases in the world, and sequence data from the vast majority of published studies can be found in it.
In 1985, he returned to Kyoto University as an associate professor and was promoted to full professor in 1987. In 1995, he began the most important project of his life, the construction of the KEGG database. The KEGG database contains a large amount of pathway information expressed in the form of gene interactions (the phenomenon in which non-allelic genes jointly influence the expression of the same trait). As research technology develops, this pathway information is continually accumulated and updated. The most typical application of the KEGG database is pathway mapping: a set of target genes is mapped onto pathways, and enrichment analysis of those genes is used to predict their possible biological functions.
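As an illustration of the enrichment analysis mentioned above, the following is a minimal sketch that uses a one-sided hypergeometric test, a common choice for this kind of over-representation analysis. It is not KEGG's own implementation. The function names, gene labels, and pathway names are all hypothetical, and the toy data is invented purely for demonstration.

```python
from math import comb


def enrichment_p_value(background_size, pathway_size, target_size, overlap):
    """Probability of seeing at least `overlap` pathway genes when
    `target_size` genes are drawn at random, without replacement,
    from a background of `background_size` genes."""
    upper = min(pathway_size, target_size)
    favourable = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, target_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(background_size, target_size)


def rank_pathways(target_genes, pathways, background_genes):
    """Return (pathway name, overlap count, p-value) tuples, smallest p-value first."""
    background = set(background_genes)
    target = set(target_genes) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        hits = target & members
        p = enrichment_p_value(len(background), len(members), len(target), len(hits))
        results.append((name, len(hits), p))
    return sorted(results, key=lambda row: row[2])


# Hypothetical toy data: 20 background genes and two invented pathways.
background = [f"g{i}" for i in range(1, 21)]
toy_pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4", "g5"],
    "pathway_B": ["g6", "g7", "g8", "g9", "g10"],
}
targets = ["g1", "g2", "g3", "g6"]

ranked = rank_pathways(targets, toy_pathways, background)
for name, hits, p in ranked:
    print(f"{name}: {hits} target genes, p = {p:.4f}")
```

In practice, the p-values from many pathways would also need a multiple-testing correction, which this sketch leaves out.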
Minoru Kanehisa | Source: kyoto-u.ac.jp/
He became the first president of the Japanese Society for Bioinformatics in 1999 and a fellow of the International Society for Computational Biology in 2013. It is fair to say that Professor Kanehisa has done a great deal of solid work to advance bioinformatics both in Japan and internationally.
The KEGG database was first published in 1999, with the goal of organizing experimental data from different species at the pathway level and developing bioinformatics tools to annotate and compare pathways. The basic data unit in KEGG is the gene. Each gene has a functional identity and achieves a specific function by interacting with other genes or with small molecules; the genes and small molecules related to a specific function are organized into a pathway in the database.
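The organization described above, with genes as the basic unit and pathways grouping the genes and small molecules that serve one function, can be sketched as a small data structure. This is only an illustrative model under stated assumptions, not KEGG's actual schema: the class names, fields, and identifiers below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    gene_id: str    # hypothetical identifier, not a real KEGG ID
    function: str   # short functional annotation


@dataclass(frozen=True)
class Compound:
    compound_id: str  # hypothetical identifier for a small molecule
    name: str


@dataclass
class Pathway:
    pathway_id: str
    name: str
    genes: set = field(default_factory=set)
    compounds: set = field(default_factory=set)


def pathways_for_gene(gene, pathways):
    """List the names of all pathways in which a gene participates."""
    return [p.name for p in pathways if gene in p.genes]


# Invented example: one gene that takes part in two toy pathways.
kinase = Gene("gene_1", "toy kinase")
transporter = Gene("gene_2", "toy transporter")
substrate = Compound("cpd_1", "toy substrate")

pathway_1 = Pathway("path_1", "toy pathway 1", {kinase, transporter}, {substrate})
pathway_2 = Pathway("path_2", "toy pathway 2", {kinase}, set())

print(pathways_for_gene(kinase, [pathway_1, pathway_2]))
```

Because one gene can belong to several pathways, the lookup runs from gene to pathways rather than storing a single pathway on each gene.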
Logo of the KEGG database | Image source: KEGG official website (https://www.genome.jp/kegg/)
In the beginning, the KEGG database consisted of only a few sequenced species and hundreds of hand-drawn pathways derived from biochemical experiments. After years of development, the current KEGG2 version consists of four parts: systems information, genomic information, chemical information, and health information.
Systems information builds structured functional modules on top of pathways, so that different pathways can be organized and analyzed under a unified architecture. Genomic information includes genome sequences, gene annotation, and orthologous gene mapping. Chemical information mainly covers metabolites, glycans, biochemical reactions, and enzymes. Health information includes disease-related mutation and network information, human disease information, and drug-related information.
It can be said that KEGG represents the fullest development of the traditional association database as a technical form in the direction of gene function annotation. Through the functional annotation of genes, it has effectively promoted biological research and its applications in every direction.
The draft human genome was published in 2001, which means that KEGG's design predates the generation of large-scale genomic data. In fact, KEGG's early core pathway information was drawn by hand on the basis of experimental data. In an era of relatively small data volumes, this pathway-based way of organizing data reflected gene function very well. However, with the rapid development of sequencing technology and the explosive growth of biological sequence information, the interpretation of gene function is no longer confined to the pathway level. In recent years, for example, biological research has gradually expanded from treating genes as the basic functional unit to treating single cells as the basic functional unit.
Essentially, natural selection acts at every level: genes, cells, organs, individuals, populations, species, and even ecosystems. Combinations of gene types and gene regulation give rise to cells, combinations of cell types and their distribution give rise to organs, and so on.
KEGG provides static functional information by recording the relationships between genes, or between genes and metabolites. However, more complex life phenomena, such as cell types, are made up of combinations of different pathways, which goes beyond what KEGG's existing data architecture can express.
KEGG is an early form of bioinformatics database and an important milestone in the development of the field, and it will remain an important basic tool for bioinformatics in the future. However, its support for the exploration of complex life phenomena shows a clear ceiling effect.
There are some very strange phenomena in the field of bioinformatics. On the one hand, bioinformaticians are hard to recruit everywhere, yet those who do bioinformatics research are often considered by the mainstream to be unable to pose scientific questions. On the other hand, anyone can claim to be doing bioinformatics, and opinions differ widely on what its specific directions are.
Essentially, bioinformatics is an engineering discipline, not a scientific one. The problem that bioinformatics has to solve is the generation, management, and mining of biological data. Bioinformatics does not itself need to solve biological problems, because biological problems can also be worked around through advances in experimental technology. The focus of bioinformatics, then, should be on research and development efforts built around specific kinds of biological data.
Significant and impactful work should be directed toward goals that have long-term viability. For example, data analysis methods developed for a particular sequencing technology will lose value as sequencing technology develops, but data analysis aimed at gene function, such as KEGG, will not lose value with the passage of time.
Professor Kanehisa has not yet been recognized with the Nobel Prize, most likely because a pathway is a relatively flat representation of function, and because the architecture of KEGG itself limits its ability to analyze more complex life phenomena. The two major data types facing modern bioinformatics are biological sequences and biological images, and methodologically, machine learning tools based on big data are becoming more and more powerful. The high ground of the next wave of bioinformatics is therefore fairly clear. Conceptually, the data object that has long-term survival value and can thoroughly solve some important problems is undoubtedly the cell type.
Compared with gene types, cell types add both spatial and temporal complexity, so this data object is clearly multimodal. Specifically, integrating large-scale biological data to solve problems at the cellular level, for example advancing artificial intelligence on the basis of understanding the structure of biological brains, offers the opportunity to do Nobel Prize-level work.
References:
https://web.ornl.gov/sci/first/clarivateanalyticscitationlaureates.pdf
https://en.wikipedia.org/wiki/minoru_kanehisa
https://www.kanehisa.jp/en/kanehisa.html
https://www.kegg.jp/
Ogata, H., Goto, S., Fujibuchi, W., and Kanehisa, M. Computation with the KEGG pathway database. BioSystems 47, 119-128 (1998).