Data Governance Issue 3 | Data Asset Center

Author: A data man's own place

01 Preface

In the first installment of our data governance series, we covered the basic concepts, goals, and strategies of data governance (see: Data governance issue 1 | Briefly talk about strategies for data governance). In this issue, we turn to the core of data governance: data asset governance. This article mainly explains the strategy for data asset governance and the ideas behind building its tooling.

02 Basic concepts

Data assets in a broad sense cover all unstructured, semi-structured, and structured data. Data assets in a narrow sense mainly include business logs on the business side, topics of streaming data, data tables of batch data, production scheduling tasks/jobs, the indicators, dimensions, and datasets of the model layer, and the reports, APIs, and applications/services of the application layer. This article focuses on data assets in the narrow sense, of which data tables, data indicators, and reports are the ones most people work with most often.

03 Problem analysis

1) User A is a data development engineer who is familiar with the structure and content of the data tables. Daily work is mainly data collection, data warehouse modeling (ETL), and operations troubleshooting. The main needs are to query the upstream and downstream production links of a data table and the execution status of production scheduling jobs, and to spot-check data fields, enumeration values, and defined functions to assist data development.

2) User B is a business-side data analyst with basic data mining and analysis skills. Daily work is mainly producing data analysis reports and configuring business indicators and reports for frontline business staff. User B needs to find out which data table holds the data required by a business request, and to know the definition and enumeration values of each field in that table, in order to judge whether it meets the query requirements.

3) User C is a data manager who is familiar with the data warehouse modeling specification and with data caliber definitions (the agreed definitions of how indicators are calculated). Daily work is mainly standardizing the data development process and reducing data storage and development costs while ensuring the timeliness and quality of business report output. User C hopes the asset center can provide unified caliber maintenance, asset monitoring, and asset evaluation capabilities.

Figure 1: Demand scenario analysis for typical users of the asset center

04 Governance objectives

In summary, the core users of the data asset center are data analysts, product managers, data operations staff, and other users on the business side. They form the consumer side of the data asset center and are the key to circulating data assets and thereby generating exchange value. The user groups on the supply side of the asset center are mainly data developers and data managers.

Therefore, for the consumer side, the asset center mainly solves the pain points of finding the right people and finding the right data, and the core governance goal is to ensure the integrity, standardization, and consistency of data asset meta-information. For the supply side, the asset center mainly solves the pain points of improving production and development efficiency and controlling resource costs, and the governance goal is to reduce costs and increase efficiency.

05 Industry research

To learn from how major tech companies have built their data asset centers, as shared at major data forums in recent years, I selected Didi and Tencent as research targets. The details are as follows:

1. Didi Data DreamWorks

1) Scenario analysis: As shown in Figure 2, Didi's main data assets fall into three categories: people, roads, and vehicles. They are characterized by large data volume, a high proportion of structured data, and a high data security level. The main requirements are data asset cost governance, data security governance, and data quality governance.

Figure 2: Didi Data asset characteristics

2) Solution ideas:

As shown in Figure 3, Didi groups its internal data servitization platform, indicator management platform, and asset management platform into the field of data content construction. This field serves the various data application platforms above it and connects to the common data layer of the data development platform below it. Taking data content as the starting point, the asset management platform serves as the unified tool for collecting and managing data asset meta-information, the indicator management platform standardizes the caliber and quality of assets, and data servitization then delivers data assets to business teams as services.

Figure 3: Didi Data Platform Business Architecture

As shown in Figure 4, Didi divides the users of the data asset platform into two categories: data processors and data managers. Data processors handle the daily production control of the various assets, and data managers handle their resource cost and security control.

Figure 4: Object design of Didi Data Asset Management Platform

3) Product introduction: Figure 5 shows a sample of the main functional modules of the Didi asset management platform.


2. Tencent's game data asset management platform

1) Scenario analysis: As shown in Figures 6 and 7, Tencent Games operates hundreds of PC client games, web games, and mobile games and holds a huge amount of data. Its pain points include diverse data, a lack of unified standards, inconsistent caliber definitions, low link quality, the inability to locate problems quickly, and difficulty in evaluating data value and cost.

Figure 6: Overview of Tencent's game big data operations

Figure 7: Pain points of Tencent game data assets

2) Solution idea: Tencent Games has mainly built two systems for asset governance: a metadata management system for data assets and a value evaluation system for data assets. The metadata management system covers metadata application, metadata management, metadata storage, and metadata collection. The value evaluation system assesses assets from three perspectives: popularity (heat), breadth, and revenue. The details are as follows:

Figure 8: Architecture design of metadata management system of Tencent's game asset management platform

Figure 9: Architecture design of data asset valuation system

Figure 10: Data asset heat "ice-cold-warm-hot" evaluation model

Figure 11: Data asset breadth "micro-small-medium-large" evaluation model

Figure 12: Data asset profitability "poor-medium-good-excellent" evaluation model
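
A tiered evaluation like the heat, breadth, and revenue models above could be sketched roughly as follows. This is only an illustration: the thresholds, the metric names (monthly_queries, distinct_teams, revenue_score), and the functions are assumptions made for the example, not values or APIs from Tencent's shared material.

```python
from bisect import bisect_right

# Hypothetical tier boundaries; the shared material does not publish real thresholds.
HEAT_BOUNDS = [1, 10, 100]            # queries in the last 30 days
HEAT_LABELS = ["ice", "cold", "warm", "hot"]
BREADTH_BOUNDS = [2, 5, 20]           # distinct teams using the asset
BREADTH_LABELS = ["micro", "small", "medium", "large"]
REVENUE_BOUNDS = [0.25, 0.5, 0.75]    # normalized revenue score in [0, 1]
REVENUE_LABELS = ["poor", "medium", "good", "excellent"]


def tier(value, bounds, labels):
    """Map a numeric metric onto an ordered tier label."""
    # bisect_right counts how many boundaries the value has reached.
    return labels[bisect_right(bounds, value)]


def evaluate_asset(monthly_queries, distinct_teams, revenue_score):
    """Return the three tier labels for one data asset."""
    return {
        "heat": tier(monthly_queries, HEAT_BOUNDS, HEAT_LABELS),
        "breadth": tier(distinct_teams, BREADTH_BOUNDS, BREADTH_LABELS),
        "revenue": tier(revenue_score, REVENUE_BOUNDS, REVENUE_LABELS),
    }


if __name__ == "__main__":
    print(evaluate_asset(monthly_queries=150, distinct_teams=6, revenue_score=0.8))
```

Keeping the boundaries in plain lists means a governance team could tune the tiers without touching the classification logic.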

3) Product introduction:

Figure 13: Mockups and functional descriptions of the main modules of Tencent's Game Data Asset Management Platform

3. Research summary

Looking at the material shared by Didi and Tencent, the two leading companies have something in common in data asset governance: both implement governance of all kinds of data assets through a platform, both pay attention to metadata standardization, security, and cost, and both provide services such as data asset retrieval and lineage link retrieval. In terms of focus, Didi's asset management tools are richer and more mature and address the pain points of both data producers and data managers, while Tencent's highlight is its distinctive design of a data asset value evaluation system, which is worth learning from.

06 Product architecture

As shown in Figure 14, the data asset center is divided into three layers: the service layer, the management layer, and the collection layer. The service layer provides data asset retrieval capabilities to data consumers such as data analysts, data product managers, and business operations staff. The management layer is mainly for data asset managers, typically the data product managers, R&D engineers, and main owners of each business line's product/technical team; it provides data asset entry and maintenance capabilities as well as asset cost governance services. The collection layer faces the data sources, including but not limited to event-tracking (buried point) meta-information, business database meta-information, report/indicator meta-information, and personnel and organization information. The collected meta-information must also be defined by asset maintainers and managers in accordance with the unified model provided by the management layer.

Figure 14: Data Asset Center product architecture diagram

07 Product design

1. Data access

Product positioning: As shown in Figure 15, the core of the data asset center is a central database of meta-information for all kinds of data assets. Meta-information is collected in two ways: automatic collection from upstream business systems and manual entry through the asset center's front-end pages. The data access module is therefore responsible for opening up sources for the asset center and for defining standards.

Figure 15: Schematic diagram of meta-information collection in the data asset center

Specification definition: The data asset center needs to collect asset meta-information with different structures, such as Hive tables, Kafka topics, ClickHouse, Druid, reports, indicators, and APIs. The main problem the asset center must solve is to define these heterogeneous data types uniformly and analyze them together in order to draw a data map. The system design should account for the complexity and differences of the various assets as well as future generality. If a unified set of meta-information collection standards is not defined, then as more data assets are brought in, problems such as rising resource control costs and declining meta-information quality will appear. As shown in Figures 16 and 17, we abstract a meta-information model that can describe all types of data assets in a general way to solve these problems:

Figure 16: Design diagram of the meta-information acquisition model of the data asset center

Figure 17: Examples of main category attribute definitions
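
One way to picture a unified meta-information model is a shared set of common fields plus a bag of type-specific attributes that is checked against a per-type standard. The sketch below is a minimal illustration under that assumption. The class name AssetMeta, the field names, and the required attributes per asset type are all hypothetical and are not taken from Figures 16 and 17.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical collection standard: required type-specific attributes per asset type.
REQUIRED_ATTRS = {
    "hive_table": {"database", "table", "columns"},
    "kafka_topic": {"cluster", "topic"},
    "report": {"url", "source_tables"},
}


@dataclass
class AssetMeta:
    """Common meta-information shared by every asset type (illustrative)."""
    asset_id: str
    asset_type: str
    name: str
    owner: str
    description: str = ""
    # Type-specific attributes live here, so new asset types need no schema change.
    attrs: Dict[str, Any] = field(default_factory=dict)

    def missing_attrs(self) -> List[str]:
        """List required type-specific attributes that were not collected."""
        required = REQUIRED_ATTRS.get(self.asset_type, set())
        return sorted(required - set(self.attrs))


if __name__ == "__main__":
    table = AssetMeta(
        asset_id="a1",
        asset_type="hive_table",
        name="dwd_order",
        owner="alice",
        attrs={"database": "dwd", "table": "dwd_order"},
    )
    print(table.missing_attrs())
```

With this shape, both automatic collectors and manual entry forms can produce the same record type, and the same check applies to each.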

2. Data map

Asset retrieval: As shown in Figure 18, the data map supports both basic search and advanced search, and recommends a structured knowledge graph to users.

Figure 18: Asset retrieval homepage demo

Asset details: The asset details page displays the basic, business, and technical information of the asset, and provides capabilities such as permission application, asset bookmarking, lineage link query, quick retrieval, and SQL template generation.

Figure 19: Asset detail page demo
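
The lineage link query mentioned above can be thought of as a graph traversal over producer-to-consumer edges. The following is a minimal sketch under that assumption. The edge list, the asset names, and the helper functions build_graph and reachable are hypothetical examples, not part of any platform described in this article.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build upstream and downstream adjacency maps from (source, target) edges."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(graph, start):
    """Breadth-first search: every asset reachable from start in the given direction."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # guard against cycles that lead back to the start
    return seen


if __name__ == "__main__":
    # Hypothetical lineage: two source tables feed a detail table, then a summary and a report.
    edges = [
        ("ods_order", "dwd_order"),
        ("ods_user", "dwd_order"),
        ("dwd_order", "dws_order_daily"),
        ("dws_order_daily", "report_gmv"),
    ]
    upstream, downstream = build_graph(edges)
    print(sorted(reachable(downstream, "dwd_order")))
    print(sorted(reachable(upstream, "dws_order_daily")))
```

Storing both directions up front lets the same traversal answer "what does this table feed?" and "where does this report come from?".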

3. Asset maintenance

As shown in Figure 20, the data asset center provides an asset entry and maintenance interface designed from the manager's perspective. With the asset center as the unified maintenance platform, it supports asset maintenance and ensures that asset meta-information is updated in a timely manner.

Figure 20: Asset meta-information maintenance demo

4. Asset governance

Quality analysis: Asset governance provides data asset quality evaluation and analysis reports, which assess the integrity, standardization, and duplication of asset meta-information.

Figure 21: Data asset quality evaluation and analysis report demo
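
As a rough illustration of what such a quality report might compute, the sketch below measures integrity as the share of required fields that are filled in, and flags duplication by looking for assets whose names collide after normalization. The required field list and both functions are assumptions made for this example; a real report would likely use richer rules.

```python
from collections import Counter

# Hypothetical list of meta-information fields every asset should fill in.
REQUIRED_FIELDS = ("name", "owner", "description", "business_domain")


def completeness(meta):
    """Fraction of required fields that hold a non-empty value."""
    filled = sum(1 for f in REQUIRED_FIELDS if str(meta.get(f) or "").strip())
    return filled / len(REQUIRED_FIELDS)


def duplicated_names(assets):
    """Names that appear more than once, ignoring case and surrounding spaces."""
    counts = Counter(a["name"].strip().lower() for a in assets)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    assets = [
        {"name": "dwd_order", "owner": "alice", "description": "Order detail", "business_domain": "trade"},
        {"name": "DWD_Order", "owner": "bob", "description": "", "business_domain": None},
    ]
    for asset in assets:
        print(asset["name"], completeness(asset))
    print(duplicated_names(assets))
```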

Governance list: Divided into an individual list and a team list. The list ranks owners by combining a quality score (integrity, standardization, uniqueness, etc.), a cost score (storage cost, growth trend, etc.), and an evaluation score (user ratings, query popularity, etc.) for the assets they are responsible for. It provides daily, weekly, and monthly rankings, and the data is reset once a month.

Figure 22: Data asset governance list demo
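
The combined ranking described above could be computed as a weighted sum of the three scores. The sketch below assumes each score is already normalized to 0-100, with a higher cost score meaning better cost control. The weights and the function names are hypothetical, since the article does not specify how the scores are combined.

```python
# Hypothetical weights in percent; they must add up to 100.
WEIGHTS = {"quality": 50, "cost": 30, "evaluation": 20}


def total_score(scores):
    """Weighted average of the quality, cost, and evaluation scores (each 0-100)."""
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS) / 100


def leaderboard(owner_scores):
    """Rank owners (individuals or teams) from highest to lowest total score."""
    ranked = sorted(
        owner_scores.items(),
        key=lambda item: total_score(item[1]),
        reverse=True,
    )
    return [(owner, total_score(scores)) for owner, scores in ranked]


if __name__ == "__main__":
    scores = {
        "alice": {"quality": 90, "cost": 60, "evaluation": 80},
        "bob": {"quality": 70, "cost": 90, "evaluation": 50},
    }
    for rank, (owner, score) in enumerate(leaderboard(scores), start=1):
        print(rank, owner, score)
```

The same function serves both the individual list and the team list if team scores are aggregated before ranking.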

5. Handover of assets

The asset center provides one-stop asset handover and disposal capabilities, reducing the risk of unmaintained assets and security hazards caused by employee departures.

Figure 23: Asset handover module demo

08 Future prospects

Looking ahead, I think the asset center, given its role in collecting and managing data content, can in fact go deeper into data applications and services. Through advanced search and AI algorithms, it could quickly provide lightweight services such as data visualization, data analysis, and attribution and prediction, so that it not only meets the business need to find people and find data but also feeds back data conclusions directly. This would simplify the analysis step that follows once the business has found the data, and improve the efficiency of data analysis.

Existing products from abroad offer a reference. ThoughtSpot (a tool that automatically produces data reports based on a search engine), shown in Figure 24, takes search as the starting point and, based on the associations built between metadata, quickly recommends and draws visual charts. It provides lightweight configuration capabilities and quickly meets users' data analysis needs:

Figure 24: ThoughtSpot, an intelligent search analytics product

Another example is Einstein Discovery (see Figure 25), which automatically correlates the user's data, analyzes and interprets its content, and provides users with interpretation reports in natural language, giving quick, lightweight answers to the questions "What happened? Why did it happen? What will happen? What should be done?":

Figure 25: Introduction to Einstein Discovery

09 Appendix: References

1. The Didi and Tencent materials come from the public presentations at the "2019 China Data Intelligent Management Summit".

2. The asset governance strategy draws on an article from the Meituan technical team's official account: systematic modeling of data governance integration practice.

3. Sources for the future outlook:

  • Data intelligence search recommendations: https://www.thoughtspot.com/
  • Einstein Discovery: Salesforce Einstein Discovery White Paper

Later generations succeed by standing on the shoulders of their predecessors. The materials above were very important references in shaping my understanding of data asset governance, and I offer special thanks to the companies, teams, individuals, and organizers mentioned above!

10 Next issue preview

This is the third issue of my data governance series. Special thanks in advance to the readers who have been supporting my data security and data governance series. Because of work, updates are slow, but I still hope you will continue to follow and support me. In the next issue, I will turn to the quality inspection and monitoring center, which provides quality inspection and monitoring services for data caliber, output, and security, and talk about the ideas behind building DQC and SLA tools. You are welcome to follow along, and see you in the next issue ~