Bilibili Live Universal Ranking System

Positioning and business value of the listboard system

The list covers all corners of the live broadcast related business of station B, live broadcast reward, live broadcast room interaction, paid gameplay, interactive gameplay, activities, anchor PK, chat room, popular anchor ranking, high-value user ranking, value-added collection cards, up main charging, etc., in these many business scenarios, we can see a variety of lists.

The existence of the list can stimulate the enthusiasm of the anchor to improve the performance level and improve the quality of the performance, so as to attract more audiences. The audience can also learn about other people's interactions and tips on the anchor through the rankings displayed in the list, and motivate him to participate more actively in the interaction or reward, so as to gain a sense of identity and presence. Through the list, the anchor can get higher income and more exposure traffic. In short, the list is an important bridge connecting the platform, anchors, and audiences, and has a great effect on improving the good atmosphere of the entire live broadcast. In addition, the rules for users to be on the list are diversified to ensure that the consumption tipping behavior will not be overly commercialized, and it also plays a positive role in guiding the audience to rational consumption and the healthy development of the platform.

Business Overview

The following is a product business architecture diagram of the list system, although many details are ignored, but from this diagram, we can have a preliminary understanding of the whole picture of the live list system of station B.

System introduction

As mentioned at the beginning of the article, there are many types of lists in the live broadcast business.

From the perspective of the roles of the members on the list, there are anchor lists, user lists, props lists, room lists, hall lists, and guild lists;

From the perspective of the scope of competition rankings, there are whole site lists, regional lists, event season lists, and live broadcast lists;

From the perspective of the granularity of the closing time, it can be divided into long-term list, quarterly list, monthly list, weekly list, daily list, and hourly list;

From the perspective of specific scenarios, some businesses will also divide the list into "tracks", such as regional lists, boys and girls lists, and the meaning of "track" is customized and self-understood by the business side.

It can be seen that the business carried by the list system is not only multi-scenario, but also has different access forms. Therefore, in terms of ease of access and development efficiency, there are high requirements for the versatility and flexibility of the list system. Next, let's introduce the list system from several aspects, such as list configuration, business access, and challenges.

Leaderboard configuration

After such a long period of development, the list has become a companion demand for almost every business iteration. In many cases, every time a new business requirement is launched, one or even more new lists will be launched at the same time, so the list system has become the "basic component" of daily business demand iteration.

So how do we quickly configure a new list? The way to do this is to open the management background and match it.

Basic configuration

The list ID will be automatically generated by the system when the new list is configured, and will not change after it is generated. When you read and update the list data through the API, you need to specify the list ID. The list ID is the "hard currency" in the list system.

The name of the list is the business logo, which is easy to identify.

The ranking system uses Mysql as the persistent storage and redis kv as the cache of the scores of the ranking members, and the implementation of the ranking function relies on the capabilities of Redis zset. The cache TTL of the leaderboard and the cache TTL of the leaderboard members' scores are set according to the requirements, taking into account the service traffic pressure and the memory pressure of the cache cluster. The length limit of the leaderboard must also be set, that is, the length of zset must be limited, and the large key should be avoided in the Redis storage node while meeting the product requirements.

The R&D side can also specify which set of Mysql clusters and Redis clusters to be used based on actual business conditions, so as to ensure reasonable storage and traffic isolation. After all, some of the list services may not have more than 100 read and write QPS, while some services can have more than 100,000 read QPS per day. There are also great differences in the amount of data in the list between different businesses, and some businesses have a large amount of data but a short life cycle, so you can allocate separate storage for them and formulate different regular cleaning and archiving policies.

The start time and end time of the points should not need to be explained, in fact, it is the effective time range of the list online, which is mostly used in the list that automatically adds points and is launched on a regular basis.

The score splitting configuration is related to the ranking method above. In practice, there will be sorting strategies such as "the one who gets this score first when the score is the same", and "the one with the highest fan medal rank when the score is the same". We know that Redis zset uses a double 64-bit floating point number to represent the sort weight score of a member, and the maximum significant number of digits of the double type is 16 bits. So we can specify that the n-digit integer part represents the true business score, and the (16-n) decimal part assists the secondary sorting. When you call a common API to update the score at a certain time (the timestamp is ts), there are three ways to calculate the final score according to the configuration:

Sort in chronological order. The score parameter is taken as the integer part, and the result truncated by (999999999-ts) is taken as the decimal part.
Sort in reverse chronological order. The score parameter is taken as the integer part, and the result of ts truncated by the number of digits is taken as the decimal part.
Custom sorting. The score parameter is taken as the integer part, and the subscore parameter is taken as the decimal part, that is, the integer part and the decimal part are calculated by the business.

Finally, the integer part and the decimal part are literally concatenated as the zset member score. If the final score of the list members is still the same, it follows the internal implementation of zset and sorts them according to the binary dictionary order of the list members' keys.

Configure the list dimension

Before the list is launched, you also need to configure the bonus dimension of the list, the time granularity of the list, etc.

The dimension items in the general dimension configuration are preset based on the common dimensions in the live broadcast business, and if the extra dimension is checked, the list will be divided horizontally. Here are some examples:

If you do not check any of the options, there is no need for horizontal rankings, that is, one list for the whole site. For example, the anchor popularity list is the popularity ranking of all anchors on the whole site, and only one list instance is needed.
If Streamers is selected, there will be an instance of the list for each host. For example, the anchor popularity support list is the ranking of all users who contribute to the popularity value of a particular anchor.
If the two options of anchor and session ID are checked, it means that there will be an instance of the list for each live broadcast of each anchor, such as the reward list of each live broadcast conducted by the anchor.
When the preset dimension cannot meet the logical requirements, it is left to the business side to customize the extra dimension, and the business side understands it. When updating and reading the list, the business uses a unified algorithm to calculate the dimension value.

If the time dimension is configured, each list instance will be vertically ranked according to the configured time granularity, and the time range is the natural time, and the identifier is the start time of the natural time period. For example, if you select a natural day, you will automatically switch to the new day's list at 0:00 every day. The implementation method is not complicated, that is, according to the timestamp of each bonus behavior, calculate which natural time period the current moment belongs to, and you will know which score to add points to. The same is true for the reading of the leaderboard, which leaderboard should be read based on the timestamp specified in the request.

It is conceivable that every time the list system is written and read, there will be an operation to format and calculate the time according to the timestamp, and the formatting calculation of the time is more performance-intensive. So here's an optimization point, we can cache the start and end timestamps of the natural time and the corresponding formatting results, instead of having to do a calculation every time, and only recalculate when the time is outside the effective range of the cache, which is similar in some open source high-performance logging libraries.

Automate the configuration of extra points

As the name suggests, the automatic bonus point is that when a specific event occurs, the list configured with this automatic bonus point will be triggered to update.

The origin of the automation bonus

In the business code related to live streaming, MQ is used to subscribe to upstream business messages, and if each service access needs to be registered separately and the consumption logic code must be implemented once, a lot of similarity work will be done. Therefore, in order to improve development efficiency and reduce the cost of each business party accessing MQ, a behavior system is implemented in the department, and all upstream messages are subscribed and consumed by the behavior system, and then the RPC interface of each downstream business party is called for behavior event delivery. Of course, there is another issue that cannot be ignored here, but it will not be the focus of this article.

As the downstream of the behavior system, when a behavior event occurs, the list RPC interface is called, and the behavior event will fan out to all the lists with the corresponding switch turned on. In this way, the business side only needs to configure a list in the management background, and the list can be directly displayed where it is needed, which is still very convenient.

Of course, not all services are suitable for the automatic scoring link, they need to calculate the score in their own business layer logic, and then actively call the general interface of the list system to update the list.

The logic of the specific bonus process will be introduced later.

Filter rule configuration

In the data processing process of the list system, a general filter configuration module is integrated, and the specific configuration page is shown in the figure below.

Filtering rules are expanded with the development of the business, and the scope of application is relatively narrow. Level 1 and Level 2 partition filtering indicates that points will be added only when the behavior event occurs in a specific live broadcast section, otherwise no points will be awarded. The opposite is reverse filtering.

If the behavior event is a message that the user buys a gift, you can also specify that only specific products will add points to this list, which is commonly used in some activities.

The special list bonus is a special business logic hook, which can be understood as a callback hook placed for special services in the generalization link, which is also commonly used in activities. It can also be understood that some business logic is coupled in the generalized link, and it is no longer recommended to use it for regular services.

Data storage

Combined with the infrastructure situation and its own business characteristics, in terms of data storage selection, the list system selects MySQL and Redis as persistent storage systems and cache systems respectively. In particular, it also relies on the ability provided by Redis's zset to implement leaderboard listings. In terms of storage structure design, the list system is closely related to and corresponds to the design of the list plus point dimension configuration.

MYSQL STORAGE

The structure of the MySQL storage table in the general list system is also unified, and some of the most important fields are as follows:

CREATE TABLE `some_rank` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `rank_id` int(11) NOT NULL DEFAULT '0' COMMENT '榜单ID',
  `type_id` varchar(100) NOT NULL DEFAULT '' COMMENT '子榜标识',
  `item_id` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT '榜单成员标识',
  `score` bigint(20) unsigned NOT NULL DEFAULT '0' COMMENT '榜单成员积分',
  // ......
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='xxx榜单数据表'

Fields such as id, rank_id, score, rank, and score can be easily understood from the name and comments, so there is no need to explain more. We're focusing on the type_id and item_id fields.

type_id field type is verchar, and its value is calculated according to the "Dimension Item" configuration of the list. The code implementation encapsulates the fixed logic, and assembles and combines the values of the type_id fields according to the dictionary order of the name of the fields of the selected bonus dimension. After the bonus dimension of the list is fixed, the algorithm results are also relatively stable.

For example, the bonus dimension of a business list is configured as follows:

Then, when calculating the type_id, it is concatenated in the form of a string of "${anchored sub-list starting time point}_${anchor uid}_${other}". The starting point of the anchored sub-leaderboard will be calculated based on the bonus points or the timestamp in the query request.

For example, the body and field comments of a bonus request are as follows:

{
  "rank_id": 12345, //榜单ID
  "item_id": 110000653, //榜单列表的一个成员ID，此请求传入的为用户的UID
  "score": 1980, //本次请求所增加的分数值
  "dimensions": { //加分维度必要的参数
    "ruid": 110000260, //主播的UID，也就是加分通用维度中“主播”选项对应的参数及value
    "timestamp": 1713165315, //加分行为的时间戳，用于锚定按时间切分的子榜
    // 其他可能需要的维度参数
  }
  // 其他必传参数，如必要的幂等key、业务标识等等
}

The timestamp carried by this bonus behavior is 1713165315, that is, 2024-04-15 15:15:15, and the time dimension is configured as 1 natural month, then the starting time of the natural month to which this bonus behavior belongs is 2024-04-01 00:00:00, and the timestamp is 1711900800, so the type_id calculated by this bonus request is "1711900800_110000260", item_ The value passed in the ID field is the user's UID.

Then, this bonus behavior can be interpreted as: the user (uid: 110000653) contributed 1980 points to the anchor's (uid: 110000260) list this month, and the corresponding data row (main field) is as follows.

rank_id, type_id, and item_id are the primary keys of a data row.

Similarly, if the time dimension configured for this list is 1 natural day, then the starting time of the natural day for this bonus behavior is 2024-04-15 00:00:00, and the timestamp is 1713110400, then the type_id calculated by this bonus request is "1713110400_110000260".

From this, we can also see that each individual bonus request requires the delivery of a timestamp of the business side where the behavior occurred, the gift-giving bonus behavior carries the real-time timestamp of the user's gift-giving, and the barrage bonus behavior carries the real-time timestamp of the user's barrage. In this way, even if there are occasional network jitter, MQ message delay and other abnormal situations, there will be no "yesterday's gift added points to today's sub-list", even if there is a consumption cursor reset, MQ rebalance, etc., of course, this also needs to be idempotent to ensure that there will be no repeated points.

Redis caching

When Redis caches the list, the key of zset is also composed of rank_id and type_id, so that the bonus points and queries will be anchored to the correct key according to the timestamp. I won't repeat it here.

The list has been updated

After introducing how to quickly configure and launch a list, how to update the list with the service code after the list configuration is completed. The following is a flowchart of the list bonus points as an illustration.

The flowchart shows the general steps and brief instructions for the extra points, and here are a few key points to note.

If multiple lists have the corresponding automatic bonus switch turned on, one action triggers the update of multiple lists. If a list fails to be updated, it will automatically retry internally.

If the upstream MQ resets the cursor or triggers rebalance, the message will not be repeatedly consumed.

Some lists have high requirements for the real-time perception of users, and the service code will rely on the ability of the long-connected system to broadcast the latest list data to the device as needed, so that users can see the latest list data without re-entering the room or actively refreshing the list.

Business challenges

As mentioned at the beginning of this article, the wide range of application scenarios of the list system means that the read and write traffic of the list system is very high. For some lists, it is necessary for the audience to see the changes in the list or rankings in time, especially the consumption tipping behavior. The list also plays an important role in the operation of the platform, such as the distribution of incentives to anchors and users, and the settlement of rewards for activity results, many of which rely on their rankings on the list.

Therefore, the accuracy of the list data, the robustness of the system, and the efficiency of the interface are also important health indicators.

Write performance

At present, the performance bottleneck in the list writing mostly comes from user interaction behavior events in the automatic bonus link, such as continuous viewing, liking, and barrage, which are generally continuous and massive, and are not sensitive to the experience compared to the user's consumption and reward behavior.

In the second case, when there is a large-scale event or event, there will be a single-point popular live broadcast room, and a large number of writes will cause throughput pressure on storage shards (Redis and MySQL), and data may be backlogged or eventually lost due to write timeouts, and retry avalanches may occur in some scenarios.

For the first case, the list system introduces a layer of "slow queue". After the bonus request for an interactive behavior event is preprocessed, it is not directly updated to the store, but is delivered to the slow queue. Slow queue MQ connects to the company's asynchronous event processing platform, and uses the platform's aggregation capability to aggregate data from the same write dimension before writing it to storage, greatly reducing storage pressure.

* Image from the asynchronous event processing platform documentation

For the second case, we mostly use a downgrade strategy, this kind of live broadcast room often does not need to display unnecessary lists, you can downgrade the entire list, and you can just do not write or display. It is also possible to downgrade certain event behaviors, such as when hitting the live event room, the interaction behavior will be downgraded to no points.

Read performance

The read traffic on the list is basically hit to the Redis cluster, and the Redis cluster can withstand the daily pressure. Read performance bottlenecks are still found in popular live broadcast rooms, such as daily live broadcasts, live broadcasts of large events, and live broadcasts of large parties or events, in which case the pressure on a single node in the Redis cluster is often too great. For the hot live broadcast room, we solved the problem by adding the second-level memory cache.

Connect to the CDN Popular Room SDK. This is a public component in the company, and through the configuration capabilities provided by this component, we can know whether the current live broadcast room is judged to be a popular live broadcast room. If the threshold of popular live broadcast rooms is reached, the system caches the list in memory to reduce the pressure on a single storage node in the Redis cluster.
A hot spot detection component was developed. By recording the access to the ZSET of the list, the hot ZSET key is cached at the memory level.
Configure forced hot rooms. If necessary, manually designate some live broadcast rooms as popular live broadcast rooms to force memory caching.

Adding a L2 cache will certainly come with some real-time sacrifices, but it will all be done within acceptable limits for the business.

Architecture as is

An internal simplified schematic diagram of the architecture is introduced here for reference only.

Follow-up planning

Code quality has been further improved

The design of any system cannot be separated from the actual application scenario, and it requires compromises in all aspects. The current list system is actually the product of "changing wheels while running", for example, there are many business logics that are still coupled in the general link. There have been some hierarchical design concepts before, and they have also tried to introduce more reasonable design models to enhance scalability and maintainability, but they have not been fully implemented and need to be continued.

Log governance

At present, the log volume of the list system is very large, at most nearly 20T per day. Among them, there are many unreasonable practices such as noise logs, repetition of meanings, and large-scale structure output, which need to be sorted out and governed.

Storage governance

There are many old and expired data, and there are many configurations, storage tables, and non-TTL caches left over from the list that are no longer used, and mysql data does not have the ability to automatically expire, so you can build automatic alarm and customized archiving capabilities.
Although there is cluster isolation, some business lists use multiple clusters in the code through hard code, and the advantages of the so-called "core" and "non-core" data isolation storage are also lost, and the overall feeling is that the use is somewhat confusing.
If you want to view the current usage of storage clusters in the ranking system, you need to look at the monitoring of multiple platforms of multiple MySQL and Redis clusters, such as capacity monitoring and access traffic monitoring, and then summarize and evaluate them.

The new structure of the list system

Based on some of the pain points of the current list system

More reasonable architecture layering and clearer boundary division between domains; The core of the list is supported by the business layer, which is scalable and testable in a self-consistent closed loop.
The real-time processing of bonus behavior and the optimization and improvement of data consistency.
More scientific storage selection.

Author: Line of Business

Source-WeChat public account: Bilibili Technology

Source: https://mp.weixin.qq.com/s/XCWM1OFDKXBGxa7TebrrzA