
Compiled | Xi Xiongfen Ding Yi
Proofreading | Yao Jialing
When Martin Krzywinski was a systems administrator at Canada's Michael Smith Genome Sciences Centre, he had no intention of becoming a pioneer of 21st-century biological data visualization. In fact, he had no background in biology at all, although he had completed graduate programs in physics and mathematics. But it was the late 1990s, and he knew his way around a computer.
Krzywinski built the center's first information system, hardened its security, designed an optimized keyboard layout, and did basically everything a geek could do. Along the way he began helping researchers with their projects, gradually coming to understand their data and its potential. The rest is history.
Rapidly falling DNA sequencing prices and the growing study of cellular complexity quickly unleashed a torrent of genetic data. However, the tools for collecting data far outpaced the tools for depicting it. Krzywinski said: "I was frustrated. I read a lot of scientific papers and didn't understand what they were saying. I just wanted them to be simpler. I couldn't do anything to make biology simpler, but I started telling people to make clearer diagrams."
To that end, Krzywinski developed Circos, an open-source visualization tool that arranges tabular data in a circle. The idea is simple but revolutionary: Circos has been used in thousands of visualizations and has become synonymous with today's distinctive, information-rich aesthetic.
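The core idea of laying tabular data out on a circle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Circos itself is implemented: the function names, the `gap_deg` parameter, and the segment sizes are all hypothetical. Each named segment (a chromosome, say) receives an arc whose length is proportional to its size.

```python
import math

def circular_layout(sizes, gap_deg=2.0):
    """Assign each named segment an arc (start, end) in degrees.

    Arc length is proportional to the segment's size; a fixed gap
    separates neighbouring segments. (Hypothetical helper.)
    """
    total = sum(sizes.values())
    usable = 360.0 - gap_deg * len(sizes)
    arcs = {}
    angle = 0.0
    for name, size in sizes.items():
        span = usable * size / total
        arcs[name] = (angle, angle + span)
        angle += span + gap_deg
    return arcs

def to_xy(arcs, name, fraction, radius=1.0):
    """Map a fractional position along a segment to a point on the circle."""
    start, end = arcs[name]
    theta = math.radians(start + fraction * (end - start))
    return (radius * math.cos(theta), radius * math.sin(theta))

# Hypothetical segment sizes, not real chromosome lengths.
sizes = {"A": 100, "B": 200, "C": 100}
arcs = circular_layout(sizes, gap_deg=0.0)
midpoint_b = to_xy(arcs, "B", 0.5)
```

A link between two positions is then just a curve drawn between two such points, which is what makes shared sequences between segments visible at a glance.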
Clockwise from the upper right, the genomes of humans, chimpanzees, mice, and zebrafish are arranged in a circle, with each colored square corresponding to a pair of chromosomes. The lines connect similar DNA sequences, visually emphasizing how many genes we share with other species. (Photo: Martin Krzywinski/EMBO)
This image is part of an ongoing project at the British Library that compares the genomes of 16 different species, from horses to platypuses, with our own. In each small panel, a circle represents the comparison with one human chromosome: the human genes are arranged along the lower half of the circle, and the entire genome of the given species occupies the upper half. (Photo: Martin Krzywinski)
On September 13, 1848, an explosion drove an iron rod through the skull of Phineas Gage, a railway construction foreman. Incredibly, Gage survived, but his personality and temperament changed dramatically, making him a case study in early behavioral neuroanatomy textbooks. In this image, the researchers modeled how the iron rod disrupted particular systems of the human brain, which are arranged around the circumference of the Circos diagram, with the connections between them drawn as lines. (Photo: Van Horn et al./PNAS)
For this image, Krzywinski tried to think about genomes in a new way, transforming their features (for example, the number of duplicates) into directional vectors. "Now these genomes have unexpected shapes. It's just pure path algorithms," he said. "Some figures are circular, and some look like the shapes of continents or countries. I just think it's an attractive way to look at the genome, rather than just giving a sequence." (Photo: Martin Krzywinski)
For information designers, the value of π is very attractive. To draw these two figures, Krzywinski encoded the digits with color pairs, showing the first 3,422 digits of π on the left and the first 123,201 digits on the right, arranged in an Archimedean spiral. (Photo: Martin Krzywinski)
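As a rough illustration of the spiral arrangement, the Python sketch below places digits of π along an Archimedean spiral, r = b·θ, at roughly equal spacing. It is a simplified assumption rather than Krzywinski's method: the function name and the `spacing` and `turn_gap` parameters are hypothetical, and each digit is simply paired with a position instead of being encoded as a color pair.

```python
import math

PI_DIGITS = "31415926535897932384"  # first 20 digits of pi

def archimedean_points(n, spacing=1.0, turn_gap=1.0):
    """Return n (x, y) points along the spiral r = b * theta.

    theta advances so that neighbouring points are roughly `spacing`
    apart; `turn_gap` is the distance between successive turns.
    (Hypothetical helper.)
    """
    b = turn_gap / (2 * math.pi)
    theta = 0.0
    points = []
    for _ in range(n):
        r = b * theta
        points.append((r * math.cos(theta), r * math.sin(theta)))
        # Arc length per radian on this spiral is sqrt(r^2 + b^2).
        theta += spacing / math.sqrt(r * r + b * b)
    return points

points = archimedean_points(len(PI_DIGITS))
# Pair every digit with its position; a renderer would map digit -> color.
placed = [(int(d), x, y) for d, (x, y) in zip(PI_DIGITS, points)]
```

Because the radius grows linearly with the angle, successive turns stay evenly spaced, which keeps a long digit sequence legible.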
Messy hairballs: Visualizations like this helped inspire Krzywinski's work. Commonly referred to as hairballs, they are used to visualize interactions within networks. They are very useful in the right scenario, but they live up to their nickname when networks become large and complex. Krzywinski said: "Many hairballs appear random, and many times their structure confuses us and makes us think we know something that we don't actually know." For example, the hairball above comes from a diagram of human protein interactions and suggests a structure that doesn't actually exist. The researchers write: "Obviously, the yellow band of nodes is an artifact of the graph layout algorithm." The algorithm does not account for the apparent separation of red and blue edges, yet the human eye picks it out. (Photo: Rual et al./Nature)
Krzywinski's latest visualization tool is the hive plot, in which network nodes are assigned to axes defined by attributes such as connectivity, density, and centrality. In this arrangement the structural features become apparent. In the figure, the top shows E. coli (left) and Linux (right); in the versions at the bottom, the structural features are much more obvious. Krzywinski has said that the key to designing a hive plot, or any visualization, is to understand which parameters need to be emphasized. Some informaticians still believe that, given enough data, the raw data can simply be presented according to rules. He said: "I don't believe that. You need to plan and explain. The result does not come about that way." (Photo: Martin Krzywinski)
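The axis-assignment idea can be sketched in Python as follows. The rule used here is an assumption chosen for illustration, not a description of Krzywinski's software: nodes go on a "source", "relay", or "sink" axis according to their in- and out-degree, and their position along the axis is their total degree normalized to the range 0 to 1. The function name and the sample edges are hypothetical.

```python
from collections import defaultdict

def hive_layout(edges):
    """Map each node of a directed graph to (axis, position).

    Axis rule (an illustrative assumption):
      - "source": no incoming edges
      - "sink":   no outgoing edges
      - "relay":  both incoming and outgoing edges
    Position is the node's total degree divided by the maximum degree.
    """
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    nodes = set(indeg) | set(outdeg)
    if not nodes:
        return {}
    max_deg = max(indeg[n] + outdeg[n] for n in nodes)
    layout = {}
    for n in nodes:
        if indeg[n] == 0:
            axis = "source"
        elif outdeg[n] == 0:
            axis = "sink"
        else:
            axis = "relay"
        layout[n] = (axis, (indeg[n] + outdeg[n]) / max_deg)
    return layout

# Hypothetical toy network.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
layout = hive_layout(edges)
```

Because every node's place is fixed by an explicit rule rather than by a force-directed layout, two networks drawn this way can be compared directly, and any visible pattern reflects the chosen attributes rather than the layout algorithm.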
Here, the researchers compare three strains of Arabidopsis thaliana (a plant often used to study plant genetics) with their common ancestor. Each strain's genome is placed on an axis, and two regions are connected if they derive from the same ancestral sequence. (Photo: Mandáková et al./The Plant Cell)
Circos is used not only to compare genomes but also to characterize them, as with Gloeobacter violaceus, a direct descendant of one of the most primitive photosynthetic bacterial lineages. While this graph undoubtedly means more to scientists than to a layperson, it is still compelling: far more comprehensive and significantly richer than genome visualizations of a decade ago. (Photo: Saw et al./PLoS One)
Not all of Krzywinski's work involves data visualization. This image of mouse embryonic blood vessels, a cover image from an issue of the Proceedings of the National Academy of Sciences last year, was synthesized from multiple microscopic cross-sectional images, with colors adjusted based on Hubble Space Telescope photos and Star Trek. Krzywinski said: "Now I can say I have completed one of my life goals: to make biology look like astrophysics." (Photo: Krzywinski/PNAS)
Big Data Digest Compiler Profile
Xi Xiongfen is a graduate student in wireless signal processing at Beijing University of Posts and Telecommunications. He mainly researches graph signal processing, is interested in graph data mining based on social networks, and hopes to use this platform to meet more people working in big data and make more like-minded friends. Ding Yi is a Ph.D. candidate in the Department of Pharmacology at Duke University and is interested in big data mining in bioinformatics and clinical pharmacy. Yao Jialing, a homemaker, is very interested in data analysis and data processing and is working hard to learn.