laitimes

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

author:JD Cloud developer

1. Introduction to the hit interface of the portrait system

What is a portrait system

The label portrait system is a data management and analysis tool, which integrates and analyzes multi-dimensional information such as user behavior data, transaction data, and social data to build a detailed portrait of users, helping our operators to better understand the target user group, so as to achieve precision marketing and refined operation.

What capabilities are provided: tag registration, tag precipitation, tag value; group circle selection; Core capabilities of group services (group hits, group downloads and subscriptions, etc.).

How to achieve precision marketing and refined operation

Hit Interface: It acts as a bridge between user data and marketing campaigns, ensuring that marketing messages can be accurately delivered to the most appropriate user groups.

Business scenarios

▪ Scenario 1: The cashier page of the mall, whether the coupon is displayed or not, and the amount of the discount.

▪ Scenario 2: What kind of marketing activities are displayed in the resource slots of the financial APP.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

•……

It plays a vital role in core businesses such as payment, consumer finance, and wealth, influencing key links such as user acquisition, transaction conversion, and activation.

Hit the interface implementation logic

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

challenge

•How to ensure three highs: high performance (less than 50 ms), high concurrency (million-level TPS), and high availability.

•Large amount of data: The number of groups is large, and the number of pins of a large number of groups is more than 10 million, and the number of pins in some groups has reached billions.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team
Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

2. The hit interface of the 1.0 and 2.0 versions of the image system

Storage Method (Physical Machine Memory)

Crowd data is pulled from OSS and stored in a physical machine (32C256G), because the number of groups is too large, 256G is not enough to save all the OSS files of the population, so the logic of sharding is adopted, and each shard only stores a quarter of the OSS files of the population.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Advantages and disadvantages of the solution

Merit:

1. Performance standard: based on memory value, reduce middleware dependence, single machine pressure test up to 30000TPS, TP999 up to 40ms;

2. Low storage cost: At that time, the cost was lower than the direct use of cache;

3. Alleviate the pressure of CK: Pull files from OSS and reduce the number of CK queries.

Shortcoming:

1. Initializing the group is very slow: Since the data is stored in the memory of the machine, all groups must be loaded in full every time the machine is started, but with the development of the business, the number of groups will increase and the restart will become slower and slower;

2. High expansion cost: Although the group can be expanded horizontally, after the grouping is fixed, the memory of a single machine is limited, and once the upper limit is reached, the capacity must be doubled, resulting in difficulty in expanding the group;

3. Low efficiency of troubleshooting: Because it is a self-developed storage solution that relies on machine memory, the overall structure is complex, the ability of operation and maintenance is weak, and there are few related tools for operation and maintenance.

3. The hit interface of the 3.0 version of the portrait system

background

•Moved to JDOS, JDOS does not have a physical machine with 256G memory, so we use a physical machine with 192G memory.

•The number of groups has swelled from 8,000+ to 25,000+ groups, and the memory of the physical machines is reaching the limit of the physical machines due to the increasing amount of data stored by the four-shard physical machines in versions 1.0 and 2.0.

•When you pull a group from OSS, the information of the group will be disconnected due to network jitter, which will cause repeated pulling, resulting in jitter and instability of the physical stack memory, affecting TP999 and affecting the normal operation of services.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Storage Method (Physical Machine Memory)

The crowd data is still stored in the physical machine, but the pre-split operation is used, and the original physical machine four slices are changed to eight shards. First, split the OSS file of the population into 8 parts, and then the corresponding shard of the physical machine only needs to pull the file corresponding to its own shard.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Data Tiering Scheme (Caching)

Why do you need to use caching for data tiering?

•Improve the availability and reliability of the system: In response to unpredictable failures, network problems, server failures, etc., the data layering strategy can help the system cope with these unpredictable failures and improve the availability and reliability of the system

Storage

Because of the characteristics of the set collection, the number of elements less than 512 will be automatically compressed, so we use the set collection to store the user's offset.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Fourth, the hit interface of the 4.0 version of the portrait system

background

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Since in 3.0, the offset encoding has exceeded 2 to the 32nd power. The CK version we use does not support the calculation of 32nd power bitmaps greater than 2 when processing group bitmaps, and the upgraded version is not compatible. The CK computing layer architecture is upgraded: Offset computation of less than 32 bits is added to the existing bitmap, and when offset computation of more than 32 bits, all offset - 2^32 is computed to a bitmap of 32 bits higher. Based on this, we came up with the idea of using R2M at a low cost. It is also possible to achieve the compression effect of RoaringBitmap through development.

Why compress

We know that when creating a bitmap, the length must be the length of the maximum offset. Combined with our scenario, the number of pins we have now exceeded billions, but each group of people cannot be billions, so the data stored by our bitmap is not continuous, but sparse.

Let's say we only have 64 users now, and then we have an audience with only 1 offset, 63.

Normally, we need to create a 64-bit bitmap

1 2 3 …… 63 64
…… 1

At this point, we'll split the array into two groups, with users from 1 to 32 stored in the first group and users from 33 to 64 stored in the second group. At this point, we find that the first group of arrays is full of zeros, so can the first group not be created? This allows us to store 64 bits of data in a 32-bit array.

Next, to split the 64-bit array into groups of 16, then we only need to create the last grouped array, which is the 16-bit array, to achieve further compression.

49 50 51 …… 63 64
…… 1
Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Compress the process

According to our business scenario, each bitmap size is 65536 for the best compression.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Storage method (cache)

Deploy multiple R2M clusters (or groups), split the group bitmaps, and store them in different clusters according to a certain routing policy, providing query capabilities through different clusters, as shown in the following figure

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

In the new solution, a complete set of crowd push services is added, including data addition, update, deletion, detection, retry, etc., all visualized to the page, which greatly enhances the detectability and effectiveness of crowd loading, while in the previous scheme, these effective operation and maintenance methods were very lacking.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Process comparison

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

Comparison of the advantages and disadvantages of using centralized cache to replace self-developed distributed storage

Self-developed distributed storage R2M Storage (New Solution)
inferior position advantage
stability Sudden addition or change of a large number of people information, heap memory changes drastically, and it is easy to reach the upper limit of the server • Long recovery period: After the server goes down, the recovery period is long • A single shard is relatively large, and the traffic of a single hit storage service accounts for a relatively large proportion, and the impact is relatively large when a failure occurs · R2M has the ability of automatic expansion, when it reaches a certain usage ratio, the shard will automatically expand the capacity • If the shard goes down, it will switch between master and slave, and restore instantly, and support remote computer room backup • Single shard is relatively small, and the impact range is smaller
Maintenance costs Slow startup: A single storage service starts slowly, and the startup can reach up to 30 minutes. Difficult to go online: There are a total of 103 servers, 65 groups (8 pre-release groups, 57 production groups), group management is more complicated, configuration files are more difficult to manage, and there are many things to pay attention to when starting Expansion is difficult: Because each group is stateful and requires special physical machines, the expansion is relatively complex There is no such node, there is no problem of slow startup and difficult to go online • Simple expansion: automatic expansion or expansion tickets can be raised • The cache cluster is maintained by the operation and maintenance team, and they have a professional team and tools to maintain it
Hardware costs A total of 103 physical servers, including 8 pre-release servers, 48 hit servers, 8 download servers, and 39 other servers (to be offline) -- 32C192G servers · R2M is a science and technology cluster, and the fragments will be less manageable and used, and the cache space can be used more effectively • There is no pre-issued server that has been wasting • 4 billion clusters are about 500M, if it is 13,000 people with an average of 10 billion, it is about 1.5T, and the master-slave structure is about 3T, converted into 192G physics, it is about 15 physical machines, and if the water level is kept at 60%, it is about 25 physical machines
other Pull the OSS file only once during initialization to reduce bandwidth usage

Performance comparison

The overall TP999 time consumption of the interface has dropped from 40 ms to less than 10 ms

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team
Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

5. Follow-up exploration of hit interfaces

1. Reverse storage

background

When we analyze some time-consuming requests, we find that some business parties may transmit dozens of group codes at a time, in which case we may need to determine dozens of times and then return the result to the user, and the performance will definitely be worse than only passing one group at a time. So we want to use the user's pin as the key, and then the value is the group data that the user is in. In this way, no matter how many groups a user sends at a time, we only need to fetch the cache once.

Storage

key:用户的pin的offset为key

value: the set of all groups to which the user belongs (stored as a bitmap)

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

advantage

•High performance: When users transmit multiple population codes, we only need to take the cache once, and the performance is greatly improved.

2. Determination of Rules

background

It was found that some users only use tags when creating groups, and then do logical and non-operative operations on these tags.

For example, if a user selects two tags (tag 1: male, tag 2: over 30 years old, and yes and relationship) to create a group, and now wants to determine whether a pin is in this group, what can we do? We only need to determine whether the user is male, then determine whether the user is over 30 years old, and then do logical operations in memory.

advantage

• Storage space: There is no need to store the entire group bitmap file, and the label bitmap can be reused, which greatly saves storage space

6. Relevant technical appendices

At present, there are a large number of tags and groups composed of pin sets in the CDP system, with a total of 4,000+ tags and 17,000+ effective groups so far. Moreover, a large number of groups are pins, the number of pins is more than 10 million, and the number of pins in some groups has even reached 4 billion+. Such a large number of PIN sets put forward high requirements for our storage structure.

Let's take the group as an example, if a group contains 1000W pins, through text file storage, it takes about 150M, and the 4 billion group has reached an astonishing 150*40*10=60000M, about 60G, and the number of our group has reached 17000+, plus the tag data, the required storage space will be unacceptable.

In addition, the storage of data is only one aspect, and it is also a challenge to create a more fine-grained PIN package for the subsequent computation of tags and populations.

In the face of the above problems, CDP adopts the idea of Bitmap to solve it, which not only solves the problem of storage space, but also supports the combination calculation of different tags and groups by Bitmap itself.

1. Introduction to Bitmap

The basic idea is to uniquely mark a value with a bit bit, so that it can be used to record a tuple of data with no duplicate values. And each piece of data is identified by only one bit, which can greatly save storage space.

For example, if you want to store an array of values [2,4,6,8].

If you use the byte type to store in Java, you need 4 bytes of space, and each byte has 8 bits, that is, 4*8=32bits.

If you use a larger data type, the storage space will also increase accordingly, such as using Integer (4 bytes), which requires 4 * 4 * 8 = 128 bits.

If you use the idea of bitmap, you only need to build an 8-bit space, that is, a byte of space, to store, as shown in the following figure.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

2. PIN pool encoding

Through the above example, we can see that using the Bitmap idea to store, in fact, each data is a bit, and it cannot be repeated, which is consistent with the user pin, and there is no duplicate pin.

Since only 0 or 1 can be stored in the bitmap to identify whether the current bit has a value, and the user PIN is indeed a string, it is necessary to encode billions of user pins uniquely, which is what we often call the offset offset.

Each pin corresponds to a unique offset, which has reached 4.6 billion +, that is to say, the current maximum offset is 4.5 billion +. In this way, the newly registered PIN only needs to increase the offset value sequentially.

The following is a simple diagram, suppose I have 8 pins, pin1~pin8, and the corresponding offset number is 1~8.

If I want to create a tag or group that only contains even pins, I just need to set the bits with offset to 2, 4, 6, and 8 to 1.

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

三、ClickHouse简介

Once you have bitmaps, where do they exist, and how do you calculate the intersection and difference between bitmaps. Based on these problems, we used ClickHouse (hereinafter referred to as CK), a high-performance analytical SQL database open-sourced by Yandex in Russia in 2016, which is a columnar DBMS for online analytical processing (OLAP).

It has the following features:

1. Complete database management functions, including DML (Data Operation Language), DDL (Data Definition Language), permission control, data backup and recovery, distributed computing and management.

2. Columnar storage and data compression: Data is stored in columns, which can effectively reduce the amount of data that needs to be scanned during query in the scenario of column aggregation. At the same time, storing data in columns is naturally friendly to data compression (the homogeneity of column data), reducing the pressure on network transmission and disk I/O.

3. Relational model and SQL: ClickHouse uses the relational model to describe data and provides the concepts of traditional databases (databases, tables, views, functions, etc.). At the same time, using standard SQL as a query language makes it easier to understand and learn, and easily integrates with third-party systems.

4. Data sharding and distributed query: Horizontally slices data and distributes it to each server node in the cluster for storage. At the same time, the query calculation of the data can be pushed down to each node for parallel execution to improve the query speed.

More features can be found in the official documentation: https://clickhouse.com/docs/zh/introduction/distinctive-features

You can also check out this article by a colleague: How to use Clickhouse to build a multi-billion-level user portrait platform

In addition to the features of CK above, it also has the advantages of high performance of data analysis, simple development process, high activity of the open source community, and support for bitmap compression.

Fourth, the group processing link

Based on the above introduction of bitmap and clickhouse, we use ClickHouse + Bitmap to realize the combination calculation of tag groups and the storage of groups

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

As you can see in the figure above, in addition to the data in CK, we also put a group data in OSS.

五、RoaringBitmap压缩

我们最终使用的是RoaringBitmap,一种高效的压缩位图实现,简称RBM。 于2016年由S. Chambi、D. Lemire、O. Kaser等人在论文《Better bitmap performance with Roaring bitmaps》 《Consistently faster and smaller compressed bitmaps with Roaring》中提出。

The basic implementation idea is as follows:

Taking integer int (32 bits) as an example, the data is divided into two parts: the upper 16 bits and the lower 16 bits, the lower 16 bits remain unchanged as the data bit container, and the upper 16 bits are used as the bucket number container.

For example, if I want to store the value of 65538, the high position is 65538>>>16=1, and the low position is 65538-65536*1=2, that is, it is stored in the No. 2 position of bucket 1, and the storage location is as follows:

Evolution and Innovation of Crowd Service Data Storage Architecture of Portrait System JD Cloud technical team

The current version of RoaringBitmap we are using is 0.8.13, and Container contains three implementations: ArrayContainer,

Author: Jingdong Technology Wang Jingxing

Source: JD Cloud Developer Community

Read on