
Analysis | As Omicron strikes, how can health codes be kept from failing when it matters most?

Guide

The health code has become a "touchstone" of government management capability. Its system architecture should follow the principle of elasticity, so that processing capacity can be guaranteed by adding equipment or services in parallel; load design must strike a balance among efficiency, security and cost.

The health code is a key link in precise epidemic control. It affects people's daily work and life during the epidemic and has become a "touchstone" of the government's management ability. Photo/People's Vision

By Qin Min, Caixin

In early 2022, a new wave of COVID-19 driven by the Omicron variant struck, and the situation became more severe than before. Cases appeared sporadically in many places, especially in Shaanxi, Tianjin and Henan. On January 12 there were 190 new confirmed cases and 123 new locally transmitted confirmed cases in the three places, and as of January 12 there were 3,460 confirmed cases nationwide.

The health code is a key link in precise epidemic control. It affects people's daily work and life during the epidemic and has become a "touchstone" of the government's management ability. Recently, health codes in many places have failed at critical moments: on January 10, the Yuekang Code used daily in Guangdong experienced abnormal access; on January 9 and January 10, the Tianjin nucleic acid testing system was intermittently "paralyzed"; earlier, Xi'an One Code Pass had crashed twice in little more than a month. Judging from official notices, these failures were mainly caused by excessive load and insufficient system capacity. For example, the Yuekang Code announcement said that, because of abnormal access, traffic reached as much as 1.4 million requests per minute, exceeding the system's limit and triggering its protection mechanism. After its first crash, Xi'an One Code Pass likewise explained that visits per second had reached more than 10 times the previous peak, causing network congestion.

An industry insider said that health codes in many places are designed for a concurrency of about one million users per minute. Capacity in the millions is usually enough, but the epidemic produces many special situations that are hard to predict in advance, so the number of health code visitors can surge within a short period and bring down the entire system.
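The arithmetic behind such a surge can be sketched in a few lines of Python. Only the design figure of about one million per minute comes from the insider quoted above; the scan total, the time windows and the function names are hypothetical, chosen purely to show how the same demand can sit comfortably under the design capacity or far above it depending on how tightly it is concentrated.

```python
# Illustrative arithmetic only. The design figure is the one quoted in the
# article; every other number and name here is a hypothetical assumption.

DESIGN_PER_MINUTE = 1_000_000  # approximate design concurrency per minute


def average_per_minute(total_requests: int, window_minutes: int) -> float:
    """Average request rate if requests are spread evenly over the window."""
    return total_requests / window_minutes


def is_overloaded(total_requests: int, window_minutes: int,
                  capacity_per_minute: int = DESIGN_PER_MINUTE) -> bool:
    """True when the average rate alone already exceeds the capacity."""
    return average_per_minute(total_requests, window_minutes) > capacity_per_minute


# Hypothetical demand: 60 million code scans during one round of mass testing.
SCANS = 60_000_000

print(is_overloaded(SCANS, window_minutes=600))  # spread over 10 hours -> False
print(is_overloaded(SCANS, window_minutes=15))   # squeezed into 15 minutes -> True
```

Spread over ten hours, the hypothetical load averages 100,000 requests per minute. Squeezed into fifteen minutes, it averages four million, four times the design figure, even before any burst within that window is counted.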

"Xi'an One Code Pass crashed twice, which is essentially a software load design problem." A person close to Xi'an One Code Pass told Caixin that its architecture was also designed for about one million concurrent users per minute, and that capacity was quickly expanded two to three times after the first crash. On January 4, however, Xi'an ordered a new round of nucleic acid testing for all residents, and sampling times in the various districts were relatively concentrated. The result was an instantaneous burst of user visits that far exceeded the expanded design peak. Once overloaded, the whole system came under pressure, particularly at the firewall, and the underlying data could not be retrieved.

A technical expert told Caixin that, architecturally, a health code usually works as follows: a user opens the health code and issues a request; firewalls and other security mechanisms are triggered and filter out invalid or even malicious requests; clean traffic is allowed into the load layer, which passes it on to the business layer; the business layer then calls the underlying data on the government cloud as needed and returns the result to the user. "Once traffic comes in, we layer it, much as users line up in a queue, some in front and some behind, to keep the service available, rather than everyone crowding the door at once so that no one can get through. We also use a distributed design, handing the data to N service areas for processing instead of processing all of it in the same back end."
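The flow the expert describes can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the design of any actual health code system: the class name, the `valid` flag standing in for the firewall check, the queue limit and the hash-based choice of service area are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional
import hashlib


@dataclass
class Request:
    user_id: str
    valid: bool = True  # False stands in for an invalid or malicious request


class HealthCodeGateway:
    """Toy model: firewall filter -> bounded queue -> N service areas."""

    def __init__(self, shard_count: int, queue_limit: int) -> None:
        self.shard_count = shard_count
        self.queue_limit = queue_limit
        self.queue: Deque[Request] = deque()
        # Records which users each hypothetical service area has handled.
        self.handled: Dict[int, List[str]] = {i: [] for i in range(shard_count)}

    def admit(self, req: Request) -> str:
        """Filter the request, then queue it if there is still room."""
        if not req.valid:
            return "blocked"      # dropped at the firewall stage
        if len(self.queue) >= self.queue_limit:
            return "retry_later"  # shed load instead of jamming the door
        self.queue.append(req)
        return "queued"

    def shard_for(self, user_id: str) -> int:
        """Pick a service area deterministically from a hash of the user ID."""
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.shard_count

    def process_next(self) -> Optional[int]:
        """Serve the request at the head of the queue (first in, first out)."""
        if not self.queue:
            return None
        req = self.queue.popleft()
        shard = self.shard_for(req.user_id)
        self.handled[shard].append(req.user_id)
        return shard


gateway = HealthCodeGateway(shard_count=4, queue_limit=2)
print(gateway.admit(Request("alice")))             # queued
print(gateway.admit(Request("bot", valid=False)))  # blocked
print(gateway.admit(Request("bob")))               # queued
print(gateway.admit(Request("carol")))             # retry_later
```

The bounded queue is what keeps the service available under a surge: requests beyond the limit are told to retry rather than being allowed to pile up until nothing gets through, while hashing the user ID spreads the admitted requests across the service areas.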

……
