The previous 24-hour genome sequencing took the Chinese team only 7 minutes.

The end of the year is the season for all kinds of year-in-review roundups.

Case in point: a Chinese institution's feat of completing human whole-genome sequencing analysis at 30X depth in just 7 minutes is being talked about again three months after it was announced.

Even if the technical details are unfamiliar, one thing is worth knowing: this achievement suggests that genetic screening is likely to become part of routine physical examinations, and that testing for genetic diseases may become as quick and convenient as a throat swab test.

For example, sickle cell anemia, congenital heart disease, and other diseases caused by genetic abnormalities can be detected early through genetic testing, allowing early prevention and treatment, which matters especially for reproductive health.

However, most current genetic tests screen only for common genetic diseases, so some rare genetic diseases are hard to detect. Testing laboratories also typically need more than 20 days to issue a report, so the turnaround time is long.

Figure: part of BGI Medical's single-gene genetic disease testing list.

The Chinese team compressed the time needed to process a whole human genome's sequencing data to just 7 minutes. This is a major leap for the life sciences: obtaining all of an organism's genetic information becomes a matter of minutes.

To appreciate what 7 minutes means, let's first look at what whole-genome sequencing is.

Gene sequencing converts the information stored in DNA into digital data that humans can read; whole-genome sequencing does this for all of an organism's DNA.
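To make "digital information" concrete: DNA has only four bases, so each one fits in 2 bits. The Python sketch below packs a DNA string into an integer and unpacks it again. The A/C/G/T-to-bits mapping is an illustrative assumption, not the format any particular sequencer uses; real sequencing output (for example FASTQ files) stores bases as text together with per-base quality scores, so it takes considerably more space.

```python
# Hypothetical 2-bit code for each base; chosen for illustration only.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # BASES[code] recovers the letter for a 2-bit code


def encode(seq: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base (first base = highest bits)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])  # lowest 2 bits = last remaining base
        value >>= 2
    return "".join(reversed(bases))


seq = "GATTACA"
packed = encode(seq)
print(f"{seq} -> {packed:0{2 * len(seq)}b}")  # GATTACA -> 10001111000100
print(decode(packed, len(seq)))               # GATTACA
```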

Reading the base sequence of an entire DNA strand in one go is slow and error-prone. In practice, long DNA strands are cut into many small fragments that are sequenced in parallel, which greatly reduces sequencing time.
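The cutting step can be simulated with a toy "genome". This Python sketch samples fixed-length reads from random start positions; the genome, read length, and number of reads are made-up values, and real fragments vary in length and carry sequencing errors, which the sketch ignores.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Cut fixed-length reads from random start positions (a toy 'shotgun' step)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    last_start = len(genome) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # randint includes both endpoints
        reads.append(genome[start:start + read_len])
    return reads


toy_genome = "".join(random.Random(42).choices("ACGT", k=60))  # made-up 60 bp "genome"
for read in shotgun_reads(toy_genome, read_len=12, n_reads=5):
    print(read)
```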

Reading short fragments is faster and easier, but it creates a new problem: how do you correctly stitch these fragments back into a complete sequence?

Anyone who has done a jigsaw puzzle knows that to decide whether two pieces belong next to each other, you check whether their patterns fit together.

Splicing DNA fragments works the same way: whether two fragments are adjacent depends on whether the end of one overlaps exactly with the beginning of the other.

Whenever the end of one sequence is identical to the beginning of another, the two can be merged into one.
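The overlap rule above translates directly into code. This Python sketch finds the longest suffix of one fragment that matches a prefix of the other and merges them. The minimum overlap of 3 bases is an arbitrary assumption, and real assemblers must also cope with sequencing errors and repeated regions, which this exact-match version does not handle.

```python
from typing import Optional


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3) -> Optional[str]:
    """Join a and b on their overlap, or return None if they do not fit together."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


print(overlap("GATTACAGG", "ACAGGTTC"))  # 5 (shared "ACAGG")
print(merge("GATTACAGG", "ACAGGTTC"))    # GATTACAGGTTC
print(merge("GATTACAGG", "CCCCCC"))      # None
```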

Of course, this is the lucky case, in which adjacent fragments are easy to find. If you are unlucky, there may be no fragment at all that matches at a breakpoint.

To make sure the fragments cover the entire genome, the usual remedy is to win by numbers: produce enough fragments to cover the template more than ten, or even several dozen, times over (this multiple is the sequencing depth, such as the 30X mentioned above). If there are still gaps you cannot fill after that, you should go buy a lottery ticket.
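The lottery-ticket joke can be made quantitative. Under the idealized Lander-Waterman model, where reads land uniformly at random, the expected fraction of bases covered by no read at depth c is about e^(-c). The Python sketch below evaluates this for a few depths. The 3.1 billion bp genome size is a rounded assumption, and real coverage is uneven (for example in repetitive regions or regions with extreme GC content), so actual gaps are more common than this ideal figure suggests.

```python
import math

GENOME_BP = 3.1e9  # approximate human genome size in base pairs (assumption)


def uncovered_fraction(depth: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases hit by no read."""
    return math.exp(-depth)


for depth in (1, 5, 10, 30):
    frac = uncovered_fraction(depth)
    print(f"{depth:>2}X: uncovered fraction {frac:.2e}, about {frac * GENOME_BP:,.0f} bases")
```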

The direct consequence of multiplying the number of pieces, however, is that the stitching work grows far faster than the piece count: assembling a 1,000-piece puzzle takes much more than ten times as long as a 100-piece one.
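One way to see why the workload outpaces the piece count: a naive assembler that tests every ordered pair of fragments for overlap performs n(n-1) comparisons, so ten times as many pieces means roughly a hundred times as many comparisons. Practical assemblers sidestep this all-pairs search with indexing structures such as k-mer indexes or de Bruijn graphs; the Python sketch below only counts the naive case.

```python
def naive_pair_checks(n_fragments: int) -> int:
    """Ordered fragment pairs (a, b), a != b, that an all-pairs overlap search would test."""
    return n_fragments * (n_fragments - 1)


for n in (100, 1_000, 10_000):
    print(f"{n:>6} pieces -> {naive_pair_checks(n):>12,} pair checks")

ratio = naive_pair_checks(1_000) / naive_pair_checks(100)
print(f"10x the pieces -> about {ratio:.0f}x the checks")
```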

How much work is this? Let's work it out for a concrete sequencing case.
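A back-of-the-envelope version of the calculation, assuming a human genome of roughly 3.1 billion base pairs and a read length of 150 bp (both are assumptions for this sketch, not figures from the original case):

```python
GENOME_BP = 3.1e9   # approximate human genome size in base pairs (assumption)
DEPTH = 30          # 30X sequencing depth, as in the article
READ_LEN_BP = 150   # hypothetical short-read length

total_bases = GENOME_BP * DEPTH      # every base is read about 30 times
n_reads = total_bases / READ_LEN_BP  # number of fragments that must be stitched
print(f"total bases sequenced: {total_bases:.2e} bp")
print(f"number of reads:       {n_reads:.2e}")
```

Under these assumptions, 30X depth means about 93 billion sequenced bases spread over roughly 620 million reads.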

Now that we have an idea of how many reads there are, let's convert the data into a memory footprint. By a rough estimate, each base (1 bp) occupies about 3 bytes of memory, so a whole human genome sequenced at 30X depth needs nearly 300 GB.
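The "nearly 300 GB" figure follows from the article's own rough estimate of 3 bytes per base. The Python lines below reproduce it; the 3.1 billion bp genome size is an assumed round number, and the result is in decimal gigabytes.

```python
GENOME_BP = 3.1e9    # approximate human genome size in base pairs (assumption)
DEPTH = 30           # 30X sequencing depth
BYTES_PER_BASE = 3   # the article's rough per-base footprint

total_bytes = GENOME_BP * DEPTH * BYTES_PER_BASE
print(f"{total_bytes / 1e9:.0f} GB")  # 279 GB
```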

Never mind reading and analyzing the data; just storing it would overwhelm an ordinary computer, so such tasks are usually left to the powerful servers of professional sequencing companies. At the industry's current level, stitching together a whole human genome takes at least 24 hours.

By comparison, finishing 24 hours' worth of massive data processing in 7 minutes is impressive indeed. Has some kind of super CPU appeared?

The CPUs are still the same CPUs; what has changed is the way the data is processed.

Think of reading and writing data as moving parcels into a warehouse. Parcels large and small all have to go in, and if every item is shelved one by one regardless of its size, handling is slow and space is poorly used.

The better approach is to pack the small parcels into large boxes and shelve those boxes sequentially alongside the other large parcels. This improves overall space utilization and also cuts handling time.

This is one reason why 24 hours of work can be done in 7 minutes: large data is written directly, while small files are first aggregated into large files and then written, which is both faster and fits more data.
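A minimal Python sketch of this write strategy: files at or above a size threshold are set aside for direct writing, while small files are concatenated into one large blob with an index of (offset, length) entries so each can still be read back on its own. The 64 KiB threshold, the file names, and all function names are hypothetical choices for illustration, not the platform's actual design.

```python
import io


def plan_writes(files: dict[str, bytes], threshold: int):
    """Split files into 'large' (written directly) and 'small' (to be aggregated)."""
    large = {name: data for name, data in files.items() if len(data) >= threshold}
    small = {name: data for name, data in files.items() if len(data) < threshold}
    return large, small


def pack(files: dict[str, bytes]):
    """Concatenate small payloads into one blob and record (offset, length) per file."""
    buf = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))  # where this file starts, and its size
        buf.write(data)
    return buf.getvalue(), index


def read_packed(blob: bytes, index: dict, name: str) -> bytes:
    """Fetch one small file back out of the aggregated blob via its index entry."""
    offset, length = index[name]
    return blob[offset:offset + length]


files = {f"read_{i}.txt": f"ACGT{i}".encode() for i in range(1000)}
files["sample.bam"] = bytes(2_000_000)  # one large file (zero-filled placeholder)

large, small = plan_writes(files, threshold=64 * 1024)
blob, index = pack(small)
print(f"{len(large)} large file(s) written directly")
print(f"{len(small)} small files -> 1 aggregated write of {len(blob):,} bytes")
print(read_packed(blob, index, "read_42.txt"))  # b'ACGT42'
```

The index is what makes aggregation practical: one large sequential write replaces a thousand small ones, yet any individual small file can still be located cheaply.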

Another secret behind processing massive data this quickly is "great unity": one common language for all data.

Normally, different types of data do not "know" one another: each has to be reached through its own separate protocol, like a private conversation, which makes accessing them inconvenient.

To make data access more efficient, bring everyone out to the town square and let them call out to one another; finding someone in an open space is far easier than knocking on door after door in a residential compound.

Once the encryption and decryption barriers between different kinds of data are removed, a unified data access protocol is adopted, and the loading step is eliminated, all the data on the disk can be accessed quickly.
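To illustrate the "one protocol for all" idea, the Python toy below keeps a single copy of each object and exposes it through two access styles, a file-path view and an object-bucket view, with no copying or reloading between them. The class and method names are hypothetical; this is not the platform's actual interface. Production storage systems that offer this kind of multi-protocol access typically expose standard protocols such as NFS, SMB, HDFS, or S3 over a shared namespace.

```python
class SharedStore:
    """A single copy of the data (hypothetical toy, not a real storage API)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value

    def get(self, key: str) -> bytes:
        return self._objects[key]


class FileView:
    """File-style access: '/genomes/s1.fastq' maps to key 'genomes/s1.fastq'."""

    def __init__(self, store: SharedStore) -> None:
        self.store = store

    def read(self, path: str) -> bytes:
        return self.store.get(path.lstrip("/"))


class ObjectView:
    """Object-style access: ('genomes', 's1.fastq') maps to the same key."""

    def __init__(self, store: SharedStore) -> None:
        self.store = store

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.store.get(f"{bucket}/{key}")


store = SharedStore()
store.put("genomes/s1.fastq", b"@read1\nACGT\n+\nIIII\n")
via_file = FileView(store).read("/genomes/s1.fastq")
via_object = ObjectView(store).get_object("genomes", "s1.fastq")
print(via_file is via_object)  # True: both views return the same stored object, no copy
```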

Besides these two groundbreaking data processing methods, several hardware and software improvements also contributed to the 7-minute result.

For example, the drives were made more compact and the server design was modified so that more SSDs fit into the same space, providing greater storage capacity.

In addition, the platform developed a multi-threaded data read/write mode that speeds up data processing by an order of magnitude, and improved its data compression algorithm so that more data fits in less disk capacity.
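The two ideas in this paragraph, parallel (multi-threaded) processing and compression, can be combined in a short Python sketch: data is split into independent chunks that are compressed on a thread pool. zlib stands in here for whatever compression algorithm the platform actually uses, and the 1 MiB chunk size and four workers are arbitrary assumptions.

```python
import random
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk (arbitrary choice for this sketch)


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_parallel(data: bytes, workers: int = 4) -> list[bytes]:
    """Compress independent chunks on a thread pool; map() preserves chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, split(data, CHUNK_SIZE)))


def decompress_all(chunks: list[bytes]) -> bytes:
    """Decompress each chunk and reassemble the original byte string."""
    return b"".join(zlib.decompress(c) for c in chunks)


dna = "".join(random.Random(0).choices("ACGT", k=3_000_000)).encode()  # ~3 MB toy data
packed = compress_parallel(dna)
compressed_size = sum(len(c) for c in packed)
print(f"{len(dna):,} bytes -> {compressed_size:,} bytes in {len(packed)} chunks")
print(decompress_all(packed) == dna)  # True
```

Because each chunk is compressed independently, any thread can handle any chunk; `pool.map` still returns results in the original order, so decompressing and joining them reproduces the input exactly.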

Together, these innovations cut massive data analysis from 24 hours to 7 minutes. If astronomically large volumes of biological data can be processed in a matter of minutes, what else might be within reach?

The significance of these 7 minutes is not only that all of an organism's genetic information can be obtained quickly, but also that they represent a major breakthrough in data processing.

Other applications that involve precise calculations over huge amounts of data can likewise be handled quickly and securely on China's own servers.

Satellite remote sensing, drug research and development, and energy exploration, for example, all require analysis of massive amounts of data, while technologies such as autonomous driving need real-time data feedback, making high-speed computing and data processing essential.

In other words, whoever tames data holds the lifeblood of science and technology, and whoever wins at data wins the world. Every field built on this foundation will have to gear up and compete all over again.

Perhaps AR glasses, which have long struggled to take off, will soon catch on.

Author: Xingkun Editor: Surface line

Images, references:

https://e.huawei.com/cn/case-studies/storage/2021/west-china-hospital-sichuan-university
