
Lu Han's announcement paralyzed Weibo: a high-concurrency system design that keeps it stable | Geek Time

Recently, many readers who are going for a promotion or changing jobs have left messages saying that they keep running into "high-concurrency architecture design" questions in performance reviews and interviews.

Indeed, moving from ordinary programmer to architect, and from programming thinking to architectural thinking, is impossible without high-concurrency architecture design, and it has become a frequent interview question at large tech companies.

The difficulty is that many people never meet a high-concurrency scenario in their daily work. Today, by designing a simulated microblogging system that supports 1 billion users, we will practice high-concurrency architecture design hands-on.

This design case comes from "High Concurrent Architecture Practical Lesson", a new column by Li Zhihui, chief architect of Tongcheng Travel and Transportation. The column selects more than 15 typical high-concurrency system cases and presents them in the form architects use most often, the software design document, to show how an architect approaches high-concurrency design. To see the complete column, scroll straight to the bottom.

As we know, one characteristic of Weibo is that when a verified celebrity account (a "big V") posts something newsworthy, such as Lu Han's announcement, it triggers a flood of reads, comments, and reposts. The high-concurrency traffic caused by such sudden hot events puts extreme load on the system, and mishandling it can even crash the system.


So how should the architecture of a Weibo-like feed system be designed to withstand the sudden high-concurrency load generated by hot posts and keep the system available?

Today we will design a microblogging system that can support 1 billion users. The system is named "Weitter". There are three main steps: requirements analysis, outline design, and detailed design.

Requirements analysis

Weitter has only three core functions: posting microblogs, following friends, and browsing the feed. In addition, users can bookmark, repost, and comment on microblogs.

Load metric estimation

Text content storage: 100 GB/day

Multimedia file storage: 60 TB/day

Access concurrency estimation

Average QPS: 46,296; peak QPS, taken as twice the average: about 100,000

Network bandwidth: 4.8 Tb/s
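
The QPS figures above can be checked with a little arithmetic. The sketch below assumes roughly 4 billion feed reads per day. That number is not given in the text and is used here only because it reproduces the quoted average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Assumption: about 4 billion feed reads per day. This figure is not stated
# in the text; it is chosen because it reproduces the quoted 46,296 QPS.
daily_reads = 4_000_000_000

average_qps = daily_reads // SECONDS_PER_DAY  # integer division
peak_qps = 2 * average_qps                    # peak taken as 2x the average

print(average_qps)  # 46296
print(peak_qps)     # 92592, which the estimate rounds up to 100,000
```

Rounding the peak up to 100,000 leaves a little headroom and gives a convenient number for later capacity planning.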

Outline design

The requirements analysis shows that Weitter's business logic is relatively simple, but its concurrency and data volume are large, so the core task of the architecture is handling high concurrency. The overall deployment model of the system is as follows:

The figure shows two request paths: GET requests and POST requests.

Get request

When users access Weitter's data center, most bandwidth-heavy requests, such as those for pictures and videos, can be served from the CDN cache. In other words, the CDN can absorb more than 90% of the 4.8 Tb/s bandwidth load.

Post request

A POST request does not go through the CDN or the reverse proxy. The client reaches the application server directly through the load-balancing server. The application server writes the published microblog both to the Redis cache cluster and to the sharded database.
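
The write path can be sketched as follows. This is a minimal illustration, not the column's implementation: plain dictionaries stand in for the Redis cluster and the sharded database, and the names `Post`, `WriteService`, and `publish` are hypothetical. Writing to the database before the cache is also an assumption; the text only says that both are written.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Post:
    post_id: int
    user_id: int
    text: str
    created_at: float


@dataclass
class WriteService:
    # Plain dicts stand in for the Redis cache cluster and the sharded
    # database; both are hypothetical placeholders for illustration.
    cache: dict = field(default_factory=dict)
    database: dict = field(default_factory=dict)
    next_id: int = 1

    def publish(self, user_id, text):
        """Handle a POST request: store the new post in both places."""
        post = Post(self.next_id, user_id, text, time.time())
        self.next_id += 1
        self.database[post.post_id] = post  # durable copy (assumed order)
        self.cache[post.post_id] = post     # hot copy served to readers
        return post
```

Putting the new post into the cache at write time means the first readers of a fresh post do not all fall through to the database.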

Detailed design

The publish/subscribe problem

This is the core business problem of a microblog: after users follow their friends, how can they quickly get the latest posts those friends publish?

When the user is online, we can use "push mode": create a user subscription table, and as soon as one of the user's friends publishes a post, insert a record into that user's subscriptions containing the user ID and the ID of the friend's post.

If the user is not currently online, the system removes that user's records from the subscription table. When the user logs in and refreshes, "pull mode" rebuilds the list: the system looks up the friends the user follows, queries each friend's recent posts, and sorts all of those posts by time to build the list.
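
The combination of push and pull described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `FeedService` and its method names are hypothetical, timestamps are assumed to increase with each new post, the feed is shown newest first, and no cutoff for "recent" posts is applied.

```python
from collections import defaultdict


class FeedService:
    def __init__(self):
        self.following = defaultdict(set)  # user -> users they follow
        self.followers = defaultdict(set)  # user -> users following them
        self.posts = defaultdict(list)     # author -> [(timestamp, post_id)]
        self.inbox = {}                    # online user -> [(timestamp, post_id)]

    def follow(self, user, friend):
        self.following[user].add(friend)
        self.followers[friend].add(user)

    def login(self, user):
        # Pull mode: rebuild the list from the posts of every followed friend.
        merged = [p for friend in self.following[user] for p in self.posts[friend]]
        self.inbox[user] = sorted(merged, reverse=True)  # newest first

    def logout(self, user):
        # Offline users keep no subscription records.
        self.inbox.pop(user, None)

    def publish(self, author, timestamp, post_id):
        self.posts[author].append((timestamp, post_id))
        # Push mode: only followers who are currently online get a record.
        for follower in self.followers[author]:
            if follower in self.inbox:
                self.inbox[follower].insert(0, (timestamp, post_id))

    def feed(self, user):
        return [post_id for _, post_id in self.inbox.get(user, [])]
```

Pushing only to online followers keeps the fan-out cost of a celebrity post proportional to the number of followers who are actually online, while the pull on login pays the rebuild cost once per session.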

Cache usage policy

Weitter's overall caching architecture uses time-based eviction: all microblogs that users published within the last 7 days are cached. In one set of entries, the key is the user ID and the value is the list of IDs of the posts that user published in the last 7 days. In another, the key is the post ID and the value is the post content.

In addition, a local cache is enabled for particularly popular content: if a user has more than 1 million followers, all posts that user published within the last 48 hours are cached locally. The application server keeps this particularly popular content in memory, and when it builds a feed page it first checks whether the content for a given post ID is in the local cache.
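
The two caching rules and the lookup order can be sketched as follows. The function names `cache_tiers` and `load_post` are hypothetical, dictionaries stand in for the real caches and database, and treating the 7-day and 48-hour limits as inclusive is an assumption.

```python
SEVEN_DAYS = 7 * 24 * 3600
FORTY_EIGHT_HOURS = 48 * 3600
HOT_FOLLOWER_THRESHOLD = 1_000_000


def cache_tiers(follower_count, post_age_seconds):
    """Return the cache tiers that should hold a post under the policy above."""
    tiers = []
    is_hot_author = follower_count > HOT_FOLLOWER_THRESHOLD
    if is_hot_author and post_age_seconds <= FORTY_EIGHT_HOURS:
        tiers.append("local")        # in-memory cache on the application server
    if post_age_seconds <= SEVEN_DAYS:
        tiers.append("distributed")  # shared cache cluster
    return tiers


def load_post(post_id, local_cache, distributed_cache, database):
    """Look up post content: local cache first, then shared cache, then database."""
    for tier in (local_cache, distributed_cache):
        if post_id in tier:
            return tier[post_id]
    return database[post_id]
```

For example, `cache_tiers(2_000_000, 3600)` returns `["local", "distributed"]`, while the same post from an author with 500 followers returns only `["distributed"]`.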

Database sharding strategy

Weitter's database is a distributed database deployed in shards. The sharding rule is based on the hash of the user ID: all posts published by one user are stored on the same database server, so finding a user's posts requires visiting only one server.

However, this approach has drawbacks. For a big-V user, data access becomes a hot spot, which puts too much load on that user's server. Similarly, if a user posts very frequently, the data on that server grows too fast.

The hot-spot problem can be mitigated by optimizing the cache, and the problem of a user posting too frequently can be addressed by capping the number of posts a user may publish per day.
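
Both ideas, sharding by user ID hash and a daily posting cap, can be sketched briefly. The choice of SHA-256, the names `shard_for` and `DailyPostLimiter`, and the in-memory counter are assumptions made for illustration; the text does not specify a hash function or how the cap is enforced.

```python
import hashlib


def shard_for(user_id, shard_count):
    """Map a user ID to a shard index using a stable hash (SHA-256 assumed)."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class DailyPostLimiter:
    """Caps how many posts each user may publish per day."""

    def __init__(self, max_posts_per_day):
        self.max_posts_per_day = max_posts_per_day
        self.counts = {}  # (user_id, day) -> posts published so far

    def allow(self, user_id, day):
        key = (user_id, day)
        used = self.counts.get(key, 0)
        if used >= self.max_posts_per_day:
            return False  # cap reached: reject the post
        self.counts[key] = used + 1
        return True
```

A stable hash is used instead of Python's built-in `hash()` because the mapping from user to shard must stay the same across processes and restarts.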

How to practice a high concurrency architecture without a real-world scenario?

That is essentially the whole case study. It comes from Li Zhihui's new Geek Time column. The most attractive thing about the column is that Li Zhihui has worked out a practical learning method for the problem that ordinary programmers have no real scenarios at work and lack practice in high-concurrency architecture design:

Take the architect's perspective and experience real system design scenarios;

Follow Li Zhihui as he takes apart more than 15 typical high-concurrency architecture cases, drawing on the strengths of many different technologies;

Try modeling the software yourself and writing architecture design documents.

Most of these cases are the high-concurrency, high-performance, high-availability systems that everyone currently cares about, for example network drives, search engines, short-video apps, ride-hailing apps, dating apps, and Weibo. They are excellent "class representatives" of high-concurrency architecture design, and the techniques they use can solve more than 80% of the common high-concurrency problems found today.

Most importantly, Li Zhihui redesigns these well-known applications himself rather than analyzing how the existing applications were designed.

Who is Li Zhihui?

He is now the chief architect of Tongcheng Travel and Transportation and previously worked as an architect at Alibaba and Intel. He took part in the architectural design and development of alibaba.com and Apache Spark and, as a CTO, faced the high-concurrency challenges of growing from zero to one million orders per day.

He has published two columns on Geek Time, "Learning Big Data from 0" and "Back-end Technology Interview 38 Lectures", which have been followed and praised by nearly 30,000 people. Li Zhihui is a genuine expert in high concurrency, and he has poured 15 years of architecture design experience into this column.

A launch offer you won't lose out on

Limited-time flash sale: with the promo code "BINGFA999" the price is only 69

If you want to learn high-concurrency design in depth and broaden your skill tree, this column is recommended for serious study.

Finally, I would like to say that an engineer who cannot think from the perspective of an architect, lead the team, and complete the architectural design and development of a system as a whole will never understand how to be an architect.

Now is the best time to learn. To see how substantial the content is, take a look at the table of contents:
