
The implementation of Alimama's marketing privacy computing platform SDH in the public cloud

Author: Flash Gene

1. Overview

How to use data safely and compliantly in marketing scenarios while keeping the core of the online advertising business model running has become an urgent problem for enterprises in today's advertising ecosystem. Alimama has long focused on the safe and compliant use of private data in order to protect user privacy and data security to the greatest possible extent. An earlier article described the technical solution of the Alimama marketing privacy computing platform Secure Data Hub (hereinafter "SDH") in the Group's production environment (further reading: Privacy Computing Practice in Advertising and Marketing Scenarios: Alimama Marketing Privacy Computing Platform SDH). This article covers the technical implementation and application practice of SDH in the public cloud. Comments and discussion are welcome.

2. Background

As personal information protection policies have been introduced in major markets around the world, data security and user privacy protection have become increasingly important, and increasingly strictly regulated, in the Internet ecosystem. In 2019, the state listed data as a "factor of production", advocating data circulation and the realization of data value. In 2022, the "20 Data Articles" were released, accelerating the development of the data element market and the efficient circulation of data elements, and opening a new phase of open sharing of data elements. Circulation is an essential requirement for releasing the value of data, and security and compliance are the basic prerequisites for orderly circulation. Privacy computing provides a key technical foundation for solving the problems of data circulation and data value mining.

Online advertising is the largest business model of the Internet. The scale of China's online advertising market exceeded the trillion level in 2022 and was expected to maintain rapid growth of 12.9% in 2023, gradually forming a huge advertising and marketing industry with a complete ecosystem. Privacy and data security issues have had a major impact on the global advertising and marketing industry, raising a series of issues such as the ban on third-party cookies, compliant collection and use of device IDs, confirmation of data rights, and secure and compliant data circulation. In digital advertising, data is the foundation of marketing, and circulating data continuously amplifies its value. Advertising and user data are scattered across many roles in the advertising ecosystem, including users, media, advertisers, SSPs, ADXs, DSPs, DMPs, and CDPs. How to break down data silos and enable cross-domain data circulation in advertising and marketing scenarios, and how to provide timely, accurate, and secure marketing services for media, advertisers, and other marketing participants while ensuring data privacy, security, and legal compliance for every role, has become a frontier direction and a shared focus of active exploration in the global advertising and marketing industry.

Alimama Marketing Privacy Computing Platform SDH is a Data Clean Room product for advertising engines, advertisers, and third-party DSPs/DMPs to perform data fusion, privacy computing, and joint modeling in a privacy-preserving environment. Based on privacy-enhancing computing technologies such as Secure Multi-Party Computation (MPC), Federated Learning (FL), and Differential Privacy (DP), SDH provides brands with cross-domain secure and consistent data decision-making capabilities.

3. Technical architecture

3.1 Core Competencies

[Figure: SDH core competencies]

3.2 System Architecture

The architecture of the SDH public cloud system is as follows:

[Figure: SDH public cloud system architecture]

Roles:

  • Platform side: deploys the SDH services, manages basic data, and schedules and distributes tasks. It does not store or compute on business-side data.
  • Business side: deploys the SDH compute engine in its own private-domain environment, where all storage and computation of the business side's data take place.

Functional Modules:

  • Console: responsible for basic data management and for task scheduling and distribution. It does not store or compute on business-side data.
  • Agent: responsible for identity authentication, and provides APIs for instance lifecycle management, including starting, querying, and stopping instances.
  • Compute engine: responsible for generating the logical execution plan and for scheduling and executing the physical execution plan in the private-domain environment.

Network Communication:

  • Platform to business side: public IP communication is used for metadata access and task distribution. This is single-node communication with a small amount of traffic.
  • Business side to business side: private IP communication (VPC peering connection) is used to transmit plaintext and ciphertext computation data between business sides. This is distributed communication with a large amount of traffic.

3.3 Core Principles

3.3.1 Metadata Design

SDH defines the availability and visibility of data at fine granularity, per data column, so that data is "usable but invisible":

  • Availability: join-key column attribute, grouping-key column attribute.
  • Visibility: visible attribute, hash-visible attribute, grouping-visible attribute, aggregate-visible attribute.
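
As a rough illustration of column-level metadata, the sketch below models the two availability attributes and four visibility attributes as a small Python data structure. The class names, field names, and the exact meaning given to each visibility level are assumptions made for this example; the article does not describe SDH's actual metadata schema.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Visibility(Enum):
    """Hypothetical names for the four visibility attributes (assumed semantics)."""
    VISIBLE = auto()            # raw values may appear in a result
    HASH_VISIBLE = auto()       # only a hash of the value may appear
    GROUP_VISIBLE = auto()      # value may appear as a grouping output
    AGGREGATE_VISIBLE = auto()  # value may appear only inside an aggregate


@dataclass(frozen=True)
class ColumnPolicy:
    """Per-column privacy metadata (illustrative, not SDH's real schema)."""
    name: str
    join_key: bool = False              # availability: usable as a join key
    group_key: bool = False             # availability: usable as a grouping key
    visibility: frozenset = frozenset()  # set of allowed Visibility levels


def can_join_on(col: ColumnPolicy) -> bool:
    return col.join_key


def can_output(col: ColumnPolicy, how: Visibility) -> bool:
    return how in col.visibility


# A user ID that can be joined on but never shown, and a spend column
# that can only be exposed through aggregation.
user_id = ColumnPolicy("user_id", join_key=True)
spend = ColumnPolicy("spend", visibility=frozenset({Visibility.AGGREGATE_VISIBLE}))
```

With a structure like this, a column can be "usable" (as a join key) while staying "invisible" (no visibility level granted), which is the property the section describes.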

3.3.2 Execution plan generation

The SDH compute engine is built on the Flink computing framework. During execution plan generation it traverses the execution plan from the bottom up, in two main stages: validity verification, then splitting and rewriting.

  • Validity verification: SDH defines a complete set of derivation rules for data availability and visibility. The rules cover Flink's built-in operators, system functions, and user-defined functions (UDFs), and are used to verify that a query meets the table-level and column-level privacy protection requirements of the data.
  • Split and rewrite: the engine traverses the execution plan from the bottom up, colors each part of the plan according to the data holder, splits and rewrites operators, and divides the execution plan into several subgraphs.
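
The coloring and splitting step can be sketched on a toy plan tree. This is plain Python, not Flink's planner API, and the `PlanNode` type, the `JOINT` color, and the rule "an operator whose inputs belong to different holders is joint" are all assumptions made for illustration. Validity verification and operator rewriting are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

JOINT = "JOINT"  # hypothetical color for operators that need both parties


@dataclass
class PlanNode:
    op: str
    owner: Optional[str] = None  # set on scans, derived for other operators
    children: List["PlanNode"] = field(default_factory=list)


def color(node: PlanNode) -> str:
    """Post-order traversal: color the inputs first, then derive this node's color."""
    if not node.children:
        if node.owner is None:
            raise ValueError(f"scan {node.op!r} has no data holder")
        return node.owner
    colors = {color(child) for child in node.children}
    node.owner = colors.pop() if len(colors) == 1 else JOINT
    return node.owner


def split(root: PlanNode):
    """Cut every edge whose two ends have different colors.

    Returns a list of (color, [operator names]) pairs, one per subgraph.
    """
    subgraphs = []

    def walk(node, ops):
        ops.append(node.op)
        for child in node.children:
            if child.owner == node.owner:
                walk(child, ops)
            else:
                sub = (child.owner, [])
                subgraphs.append(sub)
                walk(child, sub[1])

    top = (root.owner, [])
    subgraphs.append(top)
    walk(root, top[1])
    return subgraphs


# aggregate(psi_join(filter(scan_a), scan_b)), with tables held by parties A and B
plan = PlanNode("aggregate", children=[
    PlanNode("psi_join", children=[
        PlanNode("filter", children=[PlanNode("scan_a", owner="A")]),
        PlanNode("scan_b", owner="B"),
    ]),
])
color(plan)
subgraphs = split(plan)
```

In this example the filter stays in party A's local subgraph, while the join and the aggregate above it fall into the joint subgraph that needs secret-state operators.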

3.3.3 Secret-state (ciphertext) operator implementation

  • Join operator: SDH implements a PSI Join secret-state operator based on the Elliptic Curve Diffie–Hellman (ECDH) key agreement protocol. The encryption process is shown in the following figure. In the build and probe phases of the hash join, ECDH encryption is used to evaluate the truth value of the equality condition of the join. A Bloom filter pre-filters join keys in the probe phase to improve join performance, supporting private set intersection over tens of billions of records.
[Figure: ECDH-based PSI Join encryption process]
  • Inequality operator: SDH wraps secret sharing into a secret-state comparison operator, and the expression execution engine uses it to evaluate the truth value of inequalities. It supports secret-state comparison over hundreds of millions of records while maintaining a calculation precision of 2^-32.
  • Ciphertext operation unit: based on cryptographic techniques such as ECDH, secret sharing, and homomorphic encryption (HE), SDH encapsulates various ciphertext operators. They support common logical operations (AND, OR), relational operations (<, <=, ==, !=, >=, >), and arithmetic operations (+, -, *, /). Ongoing optimization of these operators continues to improve the efficiency of the ciphertext operation unit.
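
The idea behind a Diffie–Hellman style PSI can be shown with a toy example. Note the substitution: instead of an elliptic curve, the sketch uses modular exponentiation in the multiplicative group modulo the prime 2^255 - 19, which has the same commutative property, (H(k)^a)^b = (H(k)^b)^a. Both parties are simulated in one process, the function names are hypothetical, and the Bloom filter pre-filter is not modeled. It is not production cryptography and not SDH's implementation.

```python
import hashlib
import math
import secrets

P = 2**255 - 19  # a well-known prime; toy stand-in for an elliptic-curve group


def hash_to_group(key: str) -> int:
    """Map a join key to a group element in [1, P-1]."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret() -> int:
    """Random exponent coprime to P-1, so that blinding is injective."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # in [2, P-2]
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, secret):
    return [pow(v, secret, P) for v in values]


def psi(keys_a, keys_b):
    """Return the keys of A that also appear in B, comparing only blinded values."""
    a, b = new_secret(), new_secret()  # each party's private exponent
    # Round 1: each party blinds its own hashed keys and sends them over.
    a_once = blind([hash_to_group(k) for k in keys_a], a)
    b_once = blind([hash_to_group(k) for k in keys_b], b)
    # Round 2: each party blinds the other side's values with its own secret.
    a_twice = blind(a_once, b)        # computed by B, returned in A's order
    b_twice = set(blind(b_once, a))   # computed by A
    # A matches double-blinded values; equal keys give equal H(k)^(a*b).
    return [k for k, v in zip(keys_a, a_twice) if v in b_twice]


common = psi(["u1", "u2", "u3"], ["u2", "u3", "u4"])
```

Neither side sees the other's raw keys: only values blinded with at least one secret exponent cross the wire, and a match is detected when the double-blinded values coincide.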

3.3.4 Privacy and Security Protection

  • Metadata protection: provides table-level permission control.
  • Field-level protection: provides column-level control of field availability and visibility, and supports derivation and validity verification of field privacy attributes for different operators.
  • Data protection: the platform provides a complete data authorization mechanism, applies least-privilege access control policies to cloud services, and supports multi-layer access authentication to ensure data isolation.
  • Communication protection: communication is encrypted with a combination of asymmetric and symmetric encryption. In the initial stage the two parties use asymmetric encryption to exchange a randomly generated symmetric key, and all subsequent traffic is encrypted and decrypted with that symmetric key. This ensures that no data transmitted over the network is visible in plaintext.

3.3.5 Distributed Computing Optimization

  • Distributed hash join: SDH supports (shuffle) hash join. Both parties shard their data by the join key of the equality condition, using the same rule and the same number of shards. After the shuffle, rows with the same join key end up on workers with the same shard ID on both sides, and those workers are paired point-to-point to perform the hash join.
  • Distributed communication optimization: to improve encryption performance, SDH combines asymmetric and symmetric encryption. The two parties use asymmetric encryption to exchange a randomly generated symmetric key in the initial stage, and subsequent communication is encrypted and decrypted symmetrically. To reduce network overhead, data is transmitted in batches and compressed. For multi-party secure computation tasks with relatively complex logic, optimization rules such as predicate pushdown move computation as early as possible, so that each party pre-filters its data locally and further reduces the amount of data sent over the network.
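
Two of the ideas above, identical sharding on both sides and batched, compressed transfer, can be sketched with the Python standard library. The function names, the SHA-256 based shard rule, and the JSON + zlib wire format are assumptions for illustration only. Encryption of the join keys and of the channel is omitted here.

```python
import hashlib
import json
import zlib


def shard_id(join_key: str, num_shards: int) -> int:
    """Deterministic shard rule; both parties must use the same one."""
    digest = hashlib.sha256(join_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shuffle(rows, key_field, num_shards):
    """Distribute rows to shards by join key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_id(row[key_field], num_shards)].append(row)
    return shards


def pack_batches(rows, batch_size):
    """Yield compressed payloads, each holding at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        yield zlib.compress(json.dumps(batch).encode("utf-8"))


def unpack_batch(payload):
    return json.loads(zlib.decompress(payload).decode("utf-8"))


rows_a = [{"uid": f"u{i}", "clicks": i} for i in range(10)]
rows_b = [{"uid": f"u{i}", "orders": i * 2} for i in range(5, 15)]
shards_a = shuffle(rows_a, "uid", 4)
shards_b = shuffle(rows_b, "uid", 4)
payloads = list(pack_batches(rows_a, batch_size=4))
```

Because the shard rule depends only on the join key and the shard count, worker i of one party only ever needs to talk to worker i of the other party, which is what makes the point-to-point pairing possible.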

3.3.6 Marketing Analytics Component

  • Unified external query API: SDH provides a unified, lightweight query API. Users can run queries either by submitting MPC SQL statements or by calling marketing analysis components. The analysis components automatically rewrite a request into MPC SQL and submit it to the compute engine.
  • In-service component integration: the marketing analysis components are integrated into the SDH service itself, which avoids extra deployment and network connection costs.

4. Deployment architecture

SDH provides cloud-based deployment solutions for different cloud environments (Alibaba Cloud, third-party clouds, and private clouds). On a Serverless K8s cluster, the SDH engine can be deployed with one click; the deployment is lightweight, the process is simple, and the technical integration cost is low. It also supports elastic scaling and pay-as-you-go billing of cloud resources. The SDH public cloud deployment solution is shown in the following figure, and the overall deployment process can be summarized as follows:

  1. Prepare a cloud account
  2. Apply for the scope of cloud products and configure access control
  3. Deploy the Serverless K8s cluster
  4. Deploy the SDH engine
  5. Set up the VPC peering connection
  6. Run the VPN connectivity test (applies to third-party cloud and private cloud deployments)
  7. Run the API / analysis component call test
[Figure: SDH public cloud deployment solution]

5. Application cases

5.1 Global consumer asset analysis

On the SDH marketing privacy platform, Alimama and Yili jointly built an application case for global consumer asset analysis and digital operation. Using SDH's privacy-enhanced analysis capabilities, the project connects the Yili brand's off-platform campaign audiences (audiences returning from variety shows, direct media placement audiences, and so on) with the rich "people-goods-field" user tag data and marketing strategy analysis of the "Dharma Pan" marketing strategy center. The combined global assets are used to form a delivery strategy, which is finally synchronized to the Wanxiangtai crowd supermarket for scenario-based delivery, ensuring reach and delivery effect among high-value audiences.

With the PSI and MPC privacy-enhanced analysis and computing capabilities of the SDH platform, Yili performed joint MPC computation and analysis between its first-party audience assets and the brand user assets in Dharma Pan without the data leaving its own domain. This completed the "up-turning" of the first-party audience and the analysis of global consumer assets, helped the customer accumulate global assets and release their marketing value, and brought an overall improvement of more than 30% in global asset penetration, purchase conversion rate, and ROI.


5.2 Cross-domain advertising effect tracking


Alimama and Jiahe Technology have cooperated in depth on privacy computing technology. Using the privacy-enhanced analysis and computing capabilities of the SDH marketing privacy computing platform, and on the basis of ensuring the privacy, security, and compliant use of multi-party data, the two parties explored four scenarios: cross-domain user identification before a campaign, joint modeling of algorithms during a campaign, cross-channel measurement of advertising effect after a campaign, and omni-channel user asset analysis. They completed the project "advertising cross-domain marketing tracking and global asset analysis based on privacy computing". The project addresses practical marketing pain points: advertisers cannot track and identify users across domains, the effect of public-domain advertising cannot be measured accurately, and user assets are scattered and data is fragmented.

In this project, Jiahe Technology holds public-domain advertising data and brand private-domain data, while Alimama holds platform advertising data, user tag data, and e-commerce conversion data. SDH's marketing analysis components, such as PSI, crowd portrait, and rule-based attribution, efficiently complete the MPC computation over both sides' data to achieve cross-domain user identification and user journey tracking. The result is a secure and efficient solution for cross-domain advertising effect measurement and global crowd asset analysis. It supports analysis of the characteristics of the audiences reached by cross-domain advertising, tracking and measurement of the conversion effect of advertisements in Taobao and Tmall stores, and analysis of advertisers' omni-channel crowd assets. It provides scientific and reliable reports on post-advertising conversion and user characteristics, and significantly improves data security, analysis diversity, calculation accuracy, and data timeliness compared with traditional data authorization schemes.

This SDH-based privacy data solution serves 10+ top brand advertisers under Jiahe Technology's ReachMax product, covering industries such as beauty, food, and daily chemicals. It helps advertisers improve advertising delivery, fully tap the value of advertising data, and allocate budgets more reasonably, forming a virtuous circle of "placement → drainage → growth → delivery". The solution was selected as an excellent case of big data "Galaxy" in 2023 (further reading: The cooperation results of Alimama x Jiahe Technology in privacy computing have been selected as an excellent case of big data "Galaxy" in 2023).


6. Summary and outlook

The Alimama marketing privacy computing platform SDH supports distributed data processing for complex tasks that mix plaintext and ciphertext. It can run computing tasks at a scale of 2 billion/h, including private set intersection, secret-state relational operations, arithmetic operations, and window aggregation, with a calculation precision of up to 2^-32. Based on the EFLS framework, FL training is supported for 10 to 10 billion samples. SDH also provides SQL APIs, integrates multiple types of general marketing analysis components, and supports a variety of lightweight cloud deployment solutions, which further lowers the access threshold and provides efficient marketing privacy computing and analysis. The solutions and application cases for cross-domain advertising effect measurement and global crowd asset analysis break down data silos and enable cross-domain data circulation in advertising and marketing scenarios. They explore a new paradigm of "usable but invisible" data element circulation, an innovative application of privacy computing technology to data element circulation across the advertising industry.

In the future, SDH will continue to promote the safe and compliant circulation of data elements in the advertising ecosystem, and remains committed to providing brands with cross-domain, secure, and consistent data decision-making capabilities. We will continue to improve SaaS productization and to build privacy-enhanced analysis capabilities for joint statistics and modeling with higher computational complexity, helping advertisers carry out data processing, data modeling, delivery optimization, and effect measurement safely and efficiently in advertising and marketing scenarios.

Author: Yi Yi

Source (WeChat public account): Alimama Technology

Source: https://mp.weixin.qq.com/s/UemrAjULSEvLoOsbgrwRzA
