
Kuaishou Big Data Security Governance Practices

Author: A Data Man's Own Place

Kuaishou was founded in 2011 and is committed to becoming the company in the world most obsessed with creating value for its customers. In Q4 2022, the company had 366 million daily active users and 640 million monthly active users. Supporting a platform of this scale requires extensive data infrastructure behind the scenes.

Kuaishou's data platform is designed to improve decision-making efficiency and performance. The platform builds data warehouses and data services through the data middle platform, including analytical decision-making, experimental decision-making, AB testing, and core asset services. At present, Kuaishou's data volume has reached the trillion level, with total data volume at the exabyte scale. This sharing focuses on Kuaishou's practice in big data security governance. It mainly includes the following parts:

1. Background

2. Platform construction

3. Governance Practices

4. Results and planning

5. Q&A session

Guest speaker | Ni Shun, Head of Big Data Management Platform, Kuaishou

Edited and organized by | Liu Yang

Content proofreading | Li Yao

Produced by | DataFun

01

Background

1. Kuaishou big data security platform positioning


As a listed company, Kuaishou is very concerned about data security. The main responsibility of the Kuaishou big data security platform is to safeguard the full link and full life cycle of big data and ensure data security. This full link consists of several stages:

  • In the data warehouse construction stage, data developers use the development capabilities provided by the platform to build data warehouses, such as creating data marts and dimension tables based on the ODS layer. The platform includes a complete data permission application and control mechanism to prevent leakage of confidential data.
  • In the data collection phase, the data platform identifies sensitive data, encrypts and desensitizes data, and controls data security when it is stored in the warehouse.
  • In the data application phase, the data platform also takes security measures to authenticate users on data services or applications to ensure the security of data assets.

2. Challenges faced by Kuaishou big data security


In the process of building a data platform, there are several challenges:

  • Versatility: The platform must cover more than 30 systems, so it needs strong versatility.
  • Refined management and control: First, refinement of resources, covering heterogeneous resources such as reports, datasets, indicators, and dimension databases and tables; second, refinement of operation types, including read and write operations; third, refinement of accounts, including individual accounts and accounts in a multi-tenant system, which require permission control and isolation.
  • High availability: Authentication and authorization sit at the core of data services; once an anomaly occurs, the impact is very broad, so availability requirements are extremely high.
  • Scalability: Business needs are flexible and changeable, and the permission control requirements of multiple business lines must be met, which places high demands on scalability.

3. Kuaishou big data security construction ideas


In order to meet the challenges of data platform construction, Kuaishou's construction ideas revolve around several directions:

  • The first is organizational norms. Kuaishou has set up virtual organizations such as a data committee and an information security committee, and formulated specifications for data classification and grading, data permissions, and data security and privacy marking; a dedicated security platform group is responsible for implementing these specifications.
  • The second is the construction principle of balancing security and efficiency: a hierarchical approval process and a coordination mechanism have been established, ensuring security while also improving efficiency.
  • Finally, in terms of security principles, we follow relevant laws and regulations and the principle of least privilege.

02

Platform construction

1. History


The development process of the big data security platform can be divided into four stages:

  • In the original stage, the data platform was built mainly around the reporting platform. Basic permission management was implemented with an RBAC permission model, and security capabilities were at the 2A level (authentication and permission application); the overall setup was relatively primitive.
  • In the development stage, the PBAC permission model was introduced, permission control was enhanced, and coverage was expanded to engine systems such as Hive.
  • In the refined construction stage, row-level permissions (PRBAC) were introduced for finer-grained control, tenant data isolation was strengthened to ensure data security, security capabilities iterated to the 4A level, and the authentication system and full-link auditing were improved.
  • In the data compliance construction stage, the focus shifted to privacy and data protection. Capabilities such as encryption, decryption, desensitization, and the security isolation warehouse were introduced, reaching 5A-level capability; system coverage was extended to platforms such as Druid, ClickHouse, Kafka, and HDFS; and data compliance construction continued to advance to ensure data security.

2. Construction ideas


The idea of building a security platform revolves around the following three aspects:

  • Global coverage, covering storage engines, middle-end systems (such as production platforms, analysis platforms), and analysis and decision-making platforms.
  • Based on the 5A methodology, we build all-round security capabilities such as authentication, authorization, access control, resource protection, and auditing.
  • Full lifecycle control: beforehand, focus on privacy data compliance, strengthening data encryption and permission control through measures such as data security marking and privacy data marking; during access, focus on the stability of authentication; afterwards, build security situational awareness on top of audit logs to identify abnormal access behavior and formulate risk policies that safeguard data security.

3. System architecture


The system adopts a multi-layer architecture, including:

  • Application layer: provides application services for users.
  • Security platform core layer: includes the plug-in layer, interface layer, service layer, and storage layer.
  • Dependency layer: provides external dependencies, such as tenant accounts and resource systems.

The core layer contains the following modules:

  • Plug-in layer: meets the characteristics of different engines and implements permission authentication.
  • Interface layer: Provides HTTP and RPC interfaces for middle-end applications and development platforms.
  • Service layer: Unified access to resources and accounts, permission granting and management services.
  • Storage layer: Automatically caches and accelerates data to improve access efficiency.

To ensure high availability and high performance, the system provides comprehensive safeguards such as monitoring, alerting, degradation, fault-tolerance plans, drills, and rate limiting.

4. Key Technologies – Authentication System


The purpose of the authentication system is to verify the identity of the user. When designing it, we faced the following challenges:

  • Lightweighting: Avoid significant impact on existing systems.
  • Localization: Integrate with the organizational system.
  • Easy to evolve: to meet new business needs such as international exploration in the future.

Drawing on mature solutions in the industry, we developed an authentication system based on a third-party scheme in which keys themselves are never transmitted. The authentication process consists of three network communications: client authentication, obtaining a time-limited access token, and token validation by the backend service. Key points of the authentication system:

  • Account system: includes individual accounts and group accounts.
  • Token types: Include regular access tokens, proxy access tokens, and downgrade tokens.
  • Downgrade token mechanism: Ensure that current access is not affected when the key distribution center is abnormal.
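The three-step flow above can be sketched with signed, time-limited tokens. This is a minimal illustration rather than Kuaishou's actual implementation; the shared secret, token format, and TTL are all assumptions:

```python
import hashlib
import hmac
import json
import time

# Assumed shared secret held by the key distribution center and backend services.
SERVICE_KEY = b"demo-shared-secret"

def issue_token(account: str, ttl_seconds: int = 3600) -> str:
    """Key distribution center: issue a time-limited access token."""
    payload = json.dumps({"account": account, "exp": time.time() + ttl_seconds})
    sig = hmac.new(SERVICE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def validate_token(token: str) -> bool:
    """Backend service: verify signature and expiry locally, without
    another round trip to the key distribution center."""
    payload, _, sig = token.rpartition("|")
    expected = hmac.new(SERVICE_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    return json.loads(payload)["exp"] > time.time()
```

Because validation needs only the shared key, backend services keep authenticating even if the key distribution center is temporarily unavailable, which is the spirit of the downgrade-token mechanism described above.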

5. Key Technology – Permission Model


The permission model is used to control a user's access to a resource. Common permission models in the industry include:

  • Access Control Lists (ACLs): Directly establish relationships between users and resources, checking whether users have permissions each time they access.
  • Role-based access control (RBAC): Introduces the concept of roles, where roles are bound to resources and users inherit permissions by joining roles.
  • Policy-based access control (PBAC): Introduces the concept of policy to comprehensively determine access based on the attributes of the principal, environment, or object.
  • Attribute-based access control (ABAC): Similar to PBAC, but with more emphasis on the role of attributes in access control.

Due to the complexity of resources and the localization of the account system, Kuaishou has developed a policy-based role access control (PRBAC) model based on RBAC and PBAC. The PRBAC model is policy-centric and covers the following four areas:

  • Entity: Custom user group and tenant account.
  • Resource: A Uniform Identifier (UIN) that consists of a company domain, a resource domain, and a unique ID.
  • Actions: Common actions such as reading and writing.
  • Condition: the key to row-level permissions, expressed as WHERE-style conditions applied to SQL queries.
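A minimal sketch of how a PRBAC-style policy check might work. The `Policy` fields mirror the four areas above (entity, resource, action, condition), but all names, the UIN format, and the return convention are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """A PRBAC-style policy: subjects + resource + actions + row-level condition."""
    subjects: set        # user groups or tenant accounts
    resource: str        # UIN-style id: company_domain.resource_domain.unique_id
    actions: set         # e.g. {"read", "write"}
    condition: str = ""  # WHERE-style row filter, e.g. "region = 'cn'"

def authorize(policies, subject, resource, action):
    """Return the row-level condition to append to the query if access is
    allowed ("TRUE" means unrestricted), or None if access is denied."""
    for p in policies:
        if subject in p.subjects and p.resource == resource and action in p.actions:
            return p.condition or "TRUE"
    return None
```

Returning the condition (rather than a bare yes/no) is what lets the engine rewrite queries for row-level filtering.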

6. Key Technologies – Unified Authentication


Systems that require authentication fall into two categories:

  • Application systems: low QPS, high latency tolerance, and well integrated with Kuaishou's middleware framework, so they can directly access the remote authentication service.
  • Big data engines: less integrated with our frameworks and built by modifying open-source engines; authentication plug-ins are provided, and local or remote authentication mode is selected according to the characteristics of each engine.

The authentication core service includes:

  • Automated Refresher: Incrementally or fully loads data.
  • Local data cache: Quickly recover after an exception.
  • Authentication Engine: calculates permission models and policy rules to implement flexible authentication rule judgment.
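The refresher-plus-cache design could be sketched as follows. The fetcher callbacks, version counter, and `ConnectionError` degradation are assumptions for illustration, not the actual service:

```python
class PolicyCache:
    """Local policy cache: full load at startup, incremental refresh afterwards,
    and last-known-good data served if the remote store is unreachable."""

    def __init__(self, fetch_full, fetch_delta):
        self._fetch_full = fetch_full    # () -> dict of all policies
        self._fetch_delta = fetch_delta  # (since_version) -> dict of changed policies
        self._policies = {}
        self._version = 0

    def refresh(self):
        try:
            if not self._policies:
                self._policies = dict(self._fetch_full())   # initial full load
            else:
                self._policies.update(self._fetch_delta(self._version))  # incremental
            self._version += 1
        except ConnectionError:
            pass  # degrade: keep serving the cached snapshot

    def get(self, key):
        return self._policies.get(key)
```

Serving the last-known-good snapshot on refresh failure trades a little staleness for the availability that authentication, as a core service, demands.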

7. Key technology – end-to-end audit logs


End-to-end auditing traces the source of data leaks across production systems, application systems, the Hive engine, HDFS servers, and other links. Based on upstream data sources, the audit collects asset operation logs, access logs, and download logs in real time. Audit logs are transformed (for example, expanding the Hive context) to facilitate subsequent auditing, and are used for asset inventory and policy building, such as approval-log policies. Features of end-to-end auditing include:

  • Full-link coverage
  • Fuse lineage information
  • The audit format is uniform
  • Real-time risk alerts are supported
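A toy sketch of what a uniform audit format plus a real-time risk rule might look like. The schema fields, the engine-log shapes, and the "sensitive write/download" alert rule are all hypothetical:

```python
import time

SENSITIVE_LEVELS = {"C4", "P4"}

def normalize_event(engine, raw):
    """Map an engine-specific log entry onto one unified audit schema."""
    return {
        "ts": raw.get("timestamp", time.time()),
        "engine": engine,                    # e.g. hive / hdfs / app
        "account": raw["user"],
        "resource": raw["table"],
        "action": raw.get("op", "read"),
        "level": raw.get("level", "C1"),     # classification of the resource
    }

def risk_alerts(events):
    """Simple real-time policy: alert on any write/download of C4 or P4 data."""
    return [e for e in events
            if e["level"] in SENSITIVE_LEVELS and e["action"] in {"write", "download"}]
```

A uniform schema is what makes the "full-link coverage" and "real-time risk alerts" features composable: every engine feeds the same rule set.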

03

Governance practices

Next, we will introduce the key problems and solutions in Kuaishou's data governance practice.

1. Data classification and grading


The first topic is classification and grading, which prioritizes protection of highly sensitive data by dividing data into different sensitivity levels.

  • Classification: Data that was previously mixed together is now separated, with private data listed on its own. Both general data and private data are then graded by sensitivity: general data from C1 to C4 (public, internal, confidential, and secret) and private data from P1 to P4.
  • Grading: Data at different sensitivity levels receives different protection. For example, C4 and P4 data require a stricter approval process involving department heads and second-level department heads, and this data is stored with protective measures such as encryption or masking.

Data classification and grading follow the following principles:

  • Principle of escalation: If sensitive information exists in a table, the entire table is processed according to the highest standard.
  • Downgrade principle: After data is desensitized or anonymized, its sensitivity level can be reduced.
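The escalation and downgrade principles can be expressed directly in code. The C1–C4 ordering follows the grading described above, while the one-level drop after desensitization is an illustrative assumption:

```python
LEVEL_ORDER = ["C1", "C2", "C3", "C4"]  # ascending sensitivity

def table_level(column_levels):
    """Escalation principle: the table takes the highest level of any column."""
    return max(column_levels, key=LEVEL_ORDER.index)

def after_masking(level):
    """Downgrade principle (assumed here as one level): desensitized data
    may drop in sensitivity, but never below C1."""
    return LEVEL_ORDER[max(LEVEL_ORDER.index(level) - 1, 0)]
```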

The data classification and grading process is divided into three stages:

  • Metadata collection: The metadata center automatically collects data sources and data-table changes from external platforms and stores them in the metadata center and graph database.
  • Automatic identification: Based on the metadata, three methods are used. Lineage recognition analyzes table lineage and task lineage to identify and mark sensitive fields; algorithm detection uses algorithms to detect specific data types, such as bank card numbers; rule template matching applies built-in personal-information rule templates, such as those for names, mobile phone numbers, and bank card numbers.
  • Dashboard analysis: After identification, results are pushed to users for secondary confirmation and marking. An asset dashboard also helps users review the distribution of assets from individual, organizational, and departmental perspectives.
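Rule-template matching might look like the following sketch; the two patterns and the 80% sample-match threshold are hypothetical, not Kuaishou's production rules:

```python
import re

# Hypothetical rule templates; real templates would be maintained centrally.
RULE_TEMPLATES = {
    "cn_mobile": re.compile(r"^1[3-9]\d{9}$"),   # mainland-China mobile number
    "bank_card": re.compile(r"^\d{16,19}$"),     # bank card number
}

def detect_field_type(samples):
    """Tag a column with the first rule template that at least 80% of
    sampled values match; return None if nothing matches."""
    for name, pattern in RULE_TEMPLATES.items():
        hits = sum(1 for v in samples if pattern.match(str(v)))
        if samples and hits / len(samples) >= 0.8:
            return name
    return None
```

In practice such detection only produces candidates; as the text notes, results are pushed to users for secondary confirmation before marking.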

2. Data Engine Security


Data Engine security has the following issues:

  • Internal norms: In the early stage there was no account system or tenant account system, asset ownership was unclear, and security responsibilities were ill-defined.
  • Security capabilities: Lack of identity authentication, lack of security audit and traceability capabilities, and lack of permission control.
  • Operational governance: It was difficult to identify real visitors, making governance work hard to advance, and multiple teams used multiple platforms, making collaboration difficult.

We have developed the following solutions to address data engine security issues:

  • Standardization: Implement the account and authentication systems, and clarify the responsibilities of administrative roles, including approval permissions for tenant administrators and security liaisons.
  • Tooling: Introduce refined permission control, such as row-level and column-level permissions, and optimize the authentication mode with hierarchical authentication based on engine level.
  • Governance: Set up a dedicated working group to drive governance for each engine, apply the 80/20 rule to focus on the head platforms, and adopt a flexible blocking strategy to gradually promote platform transformation.

3. Sensitive Data Protection


Sensitive data protection governance faces the following challenges:

  • Differences in laws and regulations: Different countries have different requirements for sensitive data, so you need to carefully study the relevant laws and regulations.
  • Centralized control: Sensitive data should be managed separately from common data to facilitate security management and risk warning.
  • Cost and efficiency: Separating sensitive data from common data involves the transformation of different links, which requires a comprehensive consideration of cost and efficiency.

The cost and efficiency of each transformation vary and need to be weighed comprehensively. The transformation covers:

  • Data storage: enhanced identification and automatic desensitization.
  • Data processing: Focus on the approval of sensitive data.

To address these challenges, our sensitive data protection solution centers on the concept of a security isolation warehouse:

  • Security Isolation Warehouse: A virtual concept used to isolate external data sources that contain sensitive information.
  • Encryption and isolation: Once an external data source containing sensitive information is identified, it is automatically encrypted and placed in the security isolation warehouse.

In addition, we have taken the following measures:

  • Standardization: Research the laws and regulations of different countries to define the types of sensitive information, desensitization methods, and requirements.
  • Tool building: Develop tools for data identification, file field encryption, and masking.
  • Data protection measures: Implement data protection measures such as field-level permission control and strict approval processes.
  • Incremental processing: Regularly scan and identify new sensitive information to promote user governance and implementation.

Through the above measures, we have established a comprehensive sensitive data protection system to ensure that sensitive data is effectively protected.
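As a small illustration of the desensitization tooling mentioned above, here is a sketch of field masking and irreversible tokenization; the function names, phone format, and salt are assumptions, not the platform's actual utilities:

```python
import hashlib

def mask_phone(value: str) -> str:
    """Masking: keep the leading 3 and trailing 4 digits, hide the middle
    (assumes an 11-digit mobile number; other values pass through)."""
    return value[:3] + "****" + value[-4:] if len(value) == 11 else value

def tokenize(value: str, salt: str = "demo-salt") -> str:
    """Irreversible tokenization for fields that only need joins or
    aggregation: same input always yields the same opaque token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]
```

Masking preserves human readability for display, while tokenization preserves joinability; which to apply depends on the desensitization specification for the field's level.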

04

Results and planning

1. Summary of results


Since its construction, Kuaishou's big data security system has been implemented in more than 30 systems, managing roughly 10 million resources and handling about 1,000 permission applications per day, with approval flows covering C2 through C4 and P4. It spans multiple layers, including web systems and authentication services. Operation has been stable overall, with no major failures, effectively guaranteeing data security and raising the level of data governance.

2. Future planning


The future plan mainly includes the following aspects:

  • Improve coverage: Drive toward 100% authentication and authorization coverage for users of the underlying engines, and improve authentication and authorization for upper-layer users of HDFS.
  • Situational Awareness Enhancement: Analyzes the distribution of data assets and sensitive data access behaviors, and detects abnormal data behaviors.
  • New technology exploration: Explore enhanced data protection technologies, such as stronger privacy data protection and secure multi-party computation, and new ideas such as data fabric, making data available but invisible.
  • Intelligent improvement: Use large models and machine learning algorithms to improve the accuracy of data classification and grading and of sensitive data identification, and explore intelligent data governance methods.

Through the above work, the protection of sensitive data is guaranteed, and the security of enterprise data is escorted.

05

Q&A session

Q1: How is tokenized data handled when it enters the data lake?

A1: The sensitivity of tokenized data is identified when it enters the lake. If the data is only used for modeling, no additional processing is required; otherwise, it is desensitized according to the desensitization specifications to ensure data security.

Q2: About cross-departmental data permission application: How does Kuaishou divide data rights and responsibilities?

A2: There are different levels of permission requests:

  • Common data: Permission owner approves the permission.
  • Important data (e.g., C4): Approval by the authority owner and the second-level department head.
  • Very important data: approval by the person in charge of authority, the head of the second-level department, and the head of the first-level department.

Applications can be made in an individual's name or a group's name, and permissions can be renewed or upgraded after the validity period expires.

Q3: About row-level record deletion on big data platforms: How does Kuaishou support row-level record deletion under privacy compliance?

A3: Data is deleted end to end, covering the business database and downstream data. Hive partition files are not suitable for row-level deletion and are costly to rewrite; the Hudi engine is recommended because it supports row-level insert, update, and delete with better performance. The deletion process is as follows:

  • The user makes a data deletion request.
  • The system verifies the legitimacy of the request.
  • Start the end-to-end data deletion process.
  • Delete the corresponding data from the business database.
  • The Hudi engine deletes the corresponding row-level data.
  • Other downstream systems delete the corresponding data synchronously.
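The six-step flow above can be sketched as one orchestration function. This is a hedged sketch: the object interfaces, the `user_events` table, and the `user_id` column in the Hudi SQL are all hypothetical:

```python
def delete_user_data(user_id, business_db, hudi_sql_runner, downstream_systems):
    """End-to-end deletion: verify the request, delete from the business
    database first, then the Hudi table, then fan out downstream."""
    if not user_id:
        raise ValueError("invalid deletion request")       # legitimacy check
    business_db.delete(user_id)                            # source of truth first
    # Hudi supports row-level deletes via SQL (assumed table/column names):
    hudi_sql_runner(f"DELETE FROM user_events WHERE user_id = '{user_id}'")
    for system in downstream_systems:                      # downstream sync
        system.delete(user_id)
```

Ordering matters: removing the source-of-truth record first prevents downstream re-ingestion from resurrecting deleted rows.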

That's all for this sharing, thank you.


Guest Speaker

Ni Shun

Kuaishou

Head of Big Data Management Platform

Focused on data management in the big data field, including data security and quality, the metadata platform, and big data resource management.
