A 24-hour man-machine war without gun smoke, this group of post-95s fought

In basic software, independent control over domestic databases is a road that must be taken.

Databases are a core piece of computing infrastructure: every application's usage and every user's behavior ends up stored in one. From the day it appeared, the database has supported application software above it and marshaled system resources below it. It sits at the heart of the IT architecture and is known as the "crown jewel of the software industry".

Oracle, IBM, and Microsoft, the world's leading database vendors, currently control 75 percent of the global database market. Oracle's market share last year was 39.8%, and its revenue in fiscal 2021 reached $40.5 billion (about 270 billion yuan), which shows how much weight the database carries.

In March 2021, the 14th Five-Year Plan was officially promulgated, highlighting the need to cultivate and expand emerging digital industries such as big data. In recent years, more and more domestic self-developed databases have emerged in China. Especially in the field of distributed databases, China has explored its own path.

However, China's database field is short of R&D talent, which seriously holds back the industry. If China's databases are to keep moving forward, the field must take in fresh blood, and it needs computing talent from universities and Internet companies to support its future development.

As the first distributed database kernel development competition in China, the "OceanBase Database Competition" hopes to gather young database talent and give them a platform for technical exchange, so that new sparks can fly among domestic database engineers.

The competition ran from August last year to April this year. The participating teams learned as they competed; over this long stretch of study and practice, their understanding of databases went from shallow to deep, from a rough overall picture to the depths of specific problems, and every team grew.

On April 28, 2022, this long programmers' marathon reached its final, and in those last 24 hours every team gave its all.

Because of the epidemic, the final faced more twists and turns than previous ones, but these young computer scientists pushed through the difficulties and delivered impressive results. The different solutions they proposed also let them learn from and improve one another.

The charm of technology lies in this: because implementation paths differ, it is sometimes hard to name an undisputed No. 1, but the collision of different ideas, with a hundred flowers blooming, will ultimately carry Chinese computing technology further.

Here's their true story:

Text | Little North

Edit | Cai Yu

At 12 noon on April 28, a group of young people ushered in a special "war".

The war had no gun smoke, but that did nothing to ease the tension. The 20 student teams on site had to code in front of their computers for 24 hours straight, locked in a contest with memory over storage space, a game between man and machine.

With the number of Chinese Internet users now on the order of 1 billion, how to store and analyze data has become a problem the Internet industry must solve. Storage cannot expand without limit, so the ability to process ever more data in limited space determines whether the Internet can run smoothly and in good order.

With this vision in mind, 20 teams, including "NoPassCET4" from Chinese University, "East Asian Boys' Team" from East China Normal University, and "push_d_" from the University of Electronic Science and Technology of China, made it to the final round of the "OceanBase Database Competition".

"What we have to do is strike the best possible balance between data and storage space: not only compress the data, but also take every aspect of performance into account," said Huang Renhuang, captain of the finalist team "NoPassCET4". In computing, he explained, data is not only stored but also read. Browsing history, shopping carts, and game saves all require retrieving compressed data, and users will not wait ten-odd seconds, let alone minutes, for a page to appear.

Figure | Group photo of the three members of NoPassCET4, Chinese University

Before this, "NoPassCET4" had competed against nearly 2,000 database enthusiasts in 1,179 teams from 246 universities and 200 enterprises at home and abroad, including Tsinghua University, Renmin University, Zhejiang University, Nankai University, University of Electronic Science and Technology of China, East China Normal University, Huazhong University of Science and Technology, Columbia University, The Chinese University of Hong Kong, Nanyang Technological University, and other universities.

At 13:00, "NoPassCET4" and the other finalists received the final problem: take a data set of 3 million rows × 48 bytes, divide it into 9 columns, and store it as a 128M data file and a 60M index file.

In the past, in school laboratories, a task of this size often took 7 to 10 days. Now they had only 24 hours, in which they needed to analyze the data, find a solution path, write the code and algorithms, implement file compression and index compression, and finally verify the results.

They also had to balance all of these steps so that the final technical result was complete, easy to use, and even elegant. Time was very tight.

For the next 24 hours, they had to go all out.

Translated literally, the team name "NoPassCET4" means "the squad that has not passed Level 4".

In programmer circles this is an old joke: if someone on the team has not even passed the College English Test Band 4 (CET-4), he must have spent all his time in the lab doing research. "After working this hard, are you still afraid the championship won't be yours?" Captain Huang Renhuang said.

The makeup of "NoPassCET4" is somewhat unusual: the three-person squad includes a couple. "It's like The Return of the Condor Heroes: they are the heroic couple, and I am the condor," said teammate Wang Yuanzhen, a fan of martial arts novels, with a smile.

At a campus publicity event, Captain Huang Renhuang and his girlfriend Tu Zhanhong learned about the "2021 OceanBase Database Competition". Organized by OceanBase, a leading domestic distributed database, together with Ant Group's academic cooperation team, it is open to database-loving college students nationwide. It helps students learn database theory systematically from scratch and also accumulate industry experience.

Huang Renhuang brought in his girlfriend Tu Jianhong and his apprentice Wang Yuanzhen, and the three signed up together. "This combination is actually quite subtle. The three of us think differently," Huang Renhuang said. "When we face a problem, we come up with different ideas and solutions, then evaluate whose is most effective and use that plan."

In this final, the members of "NoPassCET4" again chose different paths.

Huang Renhuang's idea was to compress the data column by column through encoding to save space, while keeping the remaining data as rows. "It's like a table: the numbers that follow a pattern are represented by simple functions, which saves space."

Wang Yuanzhen proposed a bolder idea: find an encoding that compresses all the columns and then read them uniformly. That meant finding encodings not only for the regular columns but also for the irregular ones, and the total amount of computation involved was very large.

For a while, neither side could persuade the other. To finish the solution as quickly as possible, they decided to work separately first.

Wang Yuanzhen compressed all 9 string columns into 20-bit strings and tried to decode them using dictionary encoding. But most of the strings followed no pattern, the computation on his side was heavy, and he began to worry that the goal could not be reached in 24 hours.
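
To make the idea concrete, here is a minimal sketch of dictionary encoding in Python. It is an illustration only, not the team's code: the function names and the sample column are invented for this example, and the contest data is not reproduced here.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code.

    Returns (dictionary, codes), where dictionary[code] is the original value.
    """
    dictionary = []  # code -> original value
    index = {}       # original value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[c] for c in codes]


# Hypothetical sample column with many repeated strings.
column = ["Shanghai", "Wuhan", "Shanghai", "Shenzhen", "Wuhan", "Shanghai"]
dictionary, codes = dictionary_encode(column)
assert dictionary_decode(dictionary, codes) == column
```

The saving comes from storing each distinct string once and a small integer per row. When most strings are unique, as with the irregular columns described above, the dictionary grows almost as large as the column itself and the technique helps much less.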

Fortunately, the Huang-Tu pair made headway within 3 hours. Taking a globally increasing data column as an example, Tu Zhanhong solved the problem of complex computation by adding hidden fields in front of the column, and quickly moved on to the next step.

Just as Huang Renhuang confidently began the next stage, another team in the final fell into anxiety: the "East Asian Boys' Team" from East China Normal University found that their data could not be exported.

The captain, Lian Xuechao, said his "mind went blank at that moment." In the run-up to the finals, Shanghai was going through a complicated epidemic situation, and the three team members were confined to their dormitory building. They could only communicate through online meeting software each day, sharing screens so that teammates could check each other's code for problems.

Figure | Group photo of the three members of the East Asian Boys' Team, East China Normal University

To boost morale, all three members changed their avatars to a "little yellow chick" head. Captain Lian Xuechao, who likes Arab culture, is a little yellow chick wearing an Arab headscarf and a badge of Mohammed; Weng Siyang, who likes Taoist culture, is a little yellow chick in a Taoist crown; and Hu Zirui gave his little yellow chick the same white hair as Marx.

They had hoped matching avatars would lift their spirits, but no one expected something to go wrong during the competition.

Like the Huang-Tu pair of "NoPassCET4", the "East Asian Boys' Team" also spotted regularities in some of the data columns. At about 1 a.m. they worked out the corresponding dictionary encoding and delta encoding and prepared to start building tables.
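
For readers unfamiliar with delta encoding, the following Python sketch shows the general technique on an increasing integer column. It is a hedged illustration, not the team's implementation: the function names and the sample values are assumptions made for this example.

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference
    from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical increasing column, e.g. record identifiers.
ids = [1000000, 1000001, 1000003, 1000004, 1000010]
deltas = delta_encode(ids)
assert delta_decode(deltas) == ids
```

On a column that only ever increases by small steps, the differences are small numbers that fit in far fewer bits than the original values, which is why regular columns are attractive targets for this kind of encoding.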

Generally, when working on the data, a team first writes the code based on its basic idea, then builds the table with it and imports the raw data for verification. But just as they were ready to go on to tackle delta encoding and the B+ tree encoding involved in indexing, the computer displayed "Table building failed!"

By this time, 12 hours of the competition had passed. Only after repeated exchanges with the staff did the East Asian Boys' Team discover that the load time in their program was too long. By the time the table was built successfully it was 4 a.m., with only 9 hours left before the end of the competition.

After so many hours without rest, everyone on the team was exhausted, but no one complained, and all of them kept working on the problem.

Like the "East Asian Boys' Team", another team, "push_d_" from the University of Electronic Science and Technology of China, was also stuck for 3 hours on a small problem, and their time grew tight as well.

"push_d_" captain Li Hao and teammates Li Shihao and Wang Shuhan started from the same idea as Wang Yuanzhen of "NoPassCET4", but instead of starting from column encodings they chose the Huffman algorithm.

The Huffman algorithm is a form of entropy coding: based on how often each character appears in the data, it represents different characters with codes of different lengths, so that frequent characters get shorter codes. "push_d_" hoped to use the Huffman algorithm to handle the other 6 irregular data columns.
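
A small Python sketch of Huffman coding shows how frequent characters end up with shorter codes. This is a textbook-style illustration under stated assumptions, not the code "push_d_" wrote: the function names and the sample string are invented, and codes are kept as strings of "0" and "1" for readability rather than packed into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data, codes):
    return "".join(codes[s] for s in data)


def decode(bits, codes):
    reverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[cur])
            cur = ""
    return "".join(out)


text = "aaaaaaaabbbccd"  # hypothetical sample with a skewed distribution
codes = huffman_codes(text)
bits = encode(text, codes)
assert decode(bits, codes) == text
```

In this sample the most frequent character gets a one-bit code and the rare ones get three bits, so the 14 characters take 23 bits instead of the 28 that a fixed two-bit code would need. The gain depends entirely on how skewed the character frequencies are.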

Team leader Li Hao was full of confidence at first. After all, the "push_d_" trio had built the basic functions of a database in their university database lab within half a year, on the design foundation laid by their senior labmates. Because their advisor's surname is Duan, they kept the letter D in the team name "push_d_", using their teacher's expectations to spur themselves on.

Figure | Group photo of the three members of push_d_, University of Electronic Science and Technology of China

The three members are more used to talking through computers. In person they seem a little shy, often look down, and appear less than confident, but in fact they were the "bravest" of all the teams and the quickest to think of compressing all the data columns directly without column encoding.

Perhaps because he was too nervous, teammate Li Shihao overlooked 0 when converting values and was stuck for 3 hours before discovering that a base value was wrong. He slapped his forehead: "How could I make a mistake like that! My head went numb!"

He sent Li Hao and Wang Shuhan off to rest and worked through the problem alone, then went over everything once more. He felt he had not done enough in the preliminaries and semi-finals, and in the final he wanted his teammates to rest a little more while he took on more of the load.

Li Shihao did not rest for a minute during the full 24 hours of the competition, and the three finally completed all their technical steps at 12 o'clock the next day, achieving the compression goal.

At 10 a.m. on April 29, the game had lasted 21 hours.

The "East Asian Boys' Team", which had lost 3 hours to the failed table build, finally finished all their code at about 10 o'clock. Seeing that they ranked first on the hackathon's live leaderboard, they breathed a sigh of relief and put their heads down on the table for a while.

When they woke up at 11:30, the name at the top of the live leaderboard was "NoPassCET4": Huang Renhuang and Tu Jianhong had also completed all their technical steps at 11:30 a.m.

Right after that, at 12 o'clock, "push_d_" caught up as well: they finally completed their Huffman implementation and their score climbed quickly.

Lian Xuechao, captain of the "East Asian Boys' Team", was stunned. He had thought their compression was the strongest, with the index compressed by 40% and the storage by 47%, and that no other team would catch up. After discovering that "NoPassCET4" had overtaken them, the "East Asian Boys" immediately checked whether their approach had any room left for compression, but it was too late.

In the end, "NoPassCET4" won the championship. During the review, "NoPassCET4" shared their byte-level optimizations of the index data, and after reading them the "East Asian Boys" commented in earnest: the method is simple and blunt and gets to the compression result very directly, and the Huffman algorithm from "push_d_" is impressive too.

Other teams in this competition also had their moments to shine.

Figure | Winners of the contest

The members of the "Two tigers eat radish" team, Shan Haikang, Chen Jingang, and Li Xiang, were split between Wuhan and Shenzhen and collaborated across regions. Their venue for the final was bare-bones and full of mosquitoes, which the team spent the night swatting. During the competition they also ran into downtime on their laboratory server and their distributed setup, but they finally got their code working at 5 a.m. and won a third prize.

East China Normal University's "lying_flat" team may be named for lying flat, but in fact it went all out. The team consists of just one person, captain Jia Yuhang, who carried it alone. His philosophy of life is "seize the day", from "Dead Poets Society".

This philosophy also let him take hold of his own life through the database competition during the Shanghai lockdown, and he finally won third place. In the hackathon's mentor comments, a mentor wrote of him: "A one-man team. His analysis of B+ tree optimization is fairly deep, he has a certain understanding of columnar storage, and his individual ability is strong."

The "Stomach and Digestion" squad from the University of Science and Technology of China originally had 2 students; after the teammate withdrew for some reason, captain Pan Renhua, who interned at a company during the day and found time to compete at night, completed the whole process alone.

Huazhong University of Science and Technology's "111111" squad also had teammates withdraw, and the captain carried on alone through the whole process. Their scores may not be high, but the perseverance and patience to see this hackathon through to the end are still moving.

In fact, this is the most moving part of a technology competition: through the collision and exchange of ideas, everyone learns from one another and makes progress together. It is not just about winning.

This 24-hour showdown is not the whole of the database competition. Since October last year, all the teams had been learning as they competed, and in the end these 20 teams reached the final.

Among these technologists, the final has another name: "hackathon". The word originally refers to programmers getting together and working closely to implement some piece of technology. It is a marathon of programming, and a carnival in which programmers communicate and clash through technology.

In the database competition, it also means that for these participants the last 24 hours are the final stretch of the marathon.

At that moment the pressure on every team was enormous, but each of them withstood it and made it through.

In the world of technology, there is no clear concept of "first".

People understand technology differently, and faced with the same technical problem they often arrive at different implementation paths. When there is more than one path, a hundred flowers can bloom, because everyone can study how others implemented their solutions, learn from one another, and make progress together in this harmony amid difference.

That's the beauty of technology.

After the competition, each team also came away with a different understanding of the technology.

Huang Renhuang of "NoPassCET4" feels that one key point of a database is balancing all aspects of performance to achieve better overall performance. The more he learned about databases, the more complex and the more fascinating he found them.

Lian Xuechao of the "East Asian Boys' Team" had always focused on performance in his academic research. After this hands-on experience, he found that although performance matters, once it reaches a certain level further gains do little for what users actually perceive, and it may be better to focus on the database ecosystem. His outlook shifted from academic toward applied.

For "push_d_", the process exposed them to knowledge they had not encountered before, and the competition's learn-as-you-compete design gave them a new experience of databases from theory to practice.

In the 14th Five-Year Plan, the state advocates technological self-reliance, and in computing, the independence of the database, a core piece of infrastructure, is particularly important.

Over the past decade, Chinese databases such as Dameng and OceanBase have been built from scratch. But this is far from the end: to keep going, China's databases will need to take in new blood and talent of many kinds.

For many of the technologists in this competition, it was the challenge of the database itself that was most interesting and engaging. They believe China's databases will not stop here, and they want to contribute to their future.
