01 Preface
In the first installment of our data governance series, we covered the basic concepts, goals, and strategies of data governance (see: Data governance issue 1 | Briefly talk about strategies for data governance). In this issue, we turn to the core of data governance: data asset governance. This article mainly explains the strategy for data asset governance and the ideas behind building its tools.
02 Basic concepts
In a broad sense, data assets cover all unstructured, semi-structured, and structured data. In a narrow sense, they mainly include business logs on the business side; topics of streaming data and tables of batch data; production scheduling tasks/jobs; metrics, dimensions, and datasets in the model layer; and reports, APIs, and applications/services in the application layer. This article focuses on data assets in the narrow sense, of which data tables, metrics, and reports are the ones people work with most often.
03 Problem analysis
1) User A is a data development engineer who is familiar with the structure and content of the data tables. The daily work is mainly data collection, data warehouse modeling (ETL), and operations troubleshooting. User A's main needs are to query the upstream and downstream production lineage of a data table and the execution status of production scheduling jobs, and to spot-check data fields, enumeration values, and function definitions to assist development.
2) User B is a business-side data analyst with basic data mining and analysis skills. The daily work is mainly producing data analysis reports and configuring business metrics and reports for frontline business teams. User B needs to find out which data table holds the data a business request calls for, and to know the definition and enumeration values of each field in that table, in order to judge whether it can satisfy the query.
3) User C is a data manager who is familiar with data warehouse modeling standards and metric (caliber) definitions. The daily work is mainly standardizing the data development process and reducing data storage and development costs while ensuring the timeliness and quality of business reports. User C hopes the asset center can provide unified maintenance of metric definitions, asset monitoring, and asset evaluation.
Figure 1: Demand scenario analysis for typical Asset Center users
04 Governance objectives
In summary, the core users of the data asset center are business-side users such as data analysts, product managers, and data operations staff. They form the consumer side of the data asset center and are the key to circulating data assets and thereby generating exchange value. The supply side of the asset center consists mainly of data developers and data managers.
Therefore, for the consumer side, the asset center mainly addresses the pain points of finding the right people and the right data, and the core governance goal is to ensure the completeness, standardization, and consistency of data asset meta-information. For the supply side, the asset center mainly addresses the pain points of improving production and development efficiency and controlling resource costs, and the governance goal is to reduce costs and increase efficiency.
05 Industry research
To learn from how major tech companies have built their data asset centers, based on what they have shared at major data forums in recent years, we selected Didi and Tencent as research targets:
1. Didi Data DreamWorks
1) Scenario analysis: As shown in Figure 2, Didi's main data assets fall into three categories: people, roads, and vehicles. They are characterized by large data volume, a high proportion of structured data, and a high data security level. The main requirements are data asset cost governance, data security governance, and data quality governance.
Figure 2: Didi Data asset characteristics
2) Solution ideas:
As shown in Figure 3, Didi groups its internal data service platform, metric management platform, and asset management platform under the domain of data content construction. This domain serves the various data application platforms above it and connects to the common intermediate data layer of the data development platform below it. Taking data content as the starting point, the asset management platform serves as the unified tool for collecting and managing data asset meta-information, the metric management platform standardizes the definitions and quality of assets, and data-as-a-service then delivers the data assets to business teams.
Figure 3: Didi Data Platform Business Architecture
As shown in Figure 4, Didi designs the data asset platform for two categories of users: data processors and data managers. Data processors handle the daily production control of assets, while data managers handle their resource cost and security control.
Figure 4: Object design of Didi Data Asset Management Platform
3) Product introduction: Figure 5 shows examples of the main functional modules of Didi's asset management platform.
2. Tencent's game data asset management platform
1) Scenario analysis: As shown in Figures 6 and 7, Tencent Games operates hundreds of PC client games, browser games, and mobile games, generating a huge amount of data. Its pain points include highly diverse data, a lack of unified standards, inconsistent metric definitions, low pipeline quality, an inability to locate problems quickly, and difficulty evaluating data value and cost.
Figure 6: Overview of Tencent's game big data operations
Figure 7: Tencent game data asset problem pain points
2) Solution idea: Tencent Games has built two major systems for asset governance: a metadata management system for data assets and a value evaluation system for data assets. The metadata management system covers metadata application, metadata management, metadata storage, and metadata collection. The value evaluation system assesses assets from three perspectives: popularity (heat), breadth, and revenue. Details are as follows:
Figure 8: Architecture design of metadata management system of Tencent's game asset management platform
Figure 9: Architecture design of data asset valuation system
Figure 10: Data asset heat "ice-cold-warm-hot" evaluation model
Figure 11: Micro-Small-Medium-Large Data Asset Breadth Valuation Model
Figure 12: Data asset profitability "poor-medium-good-excellent" evaluation model
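To make the tiering idea concrete, here is a minimal Python sketch of how a popularity measure could be bucketed into the four heat levels named in Figure 10. The thresholds and the choice of a 30-day access count as the input are purely hypothetical. The actual criteria of Tencent's model are in the figures and are not reproduced here.

```python
from bisect import bisect_right

# Labels follow the "ice-cold-warm-hot" wording of Figure 10.
HEAT_LABELS = ("ice", "cold", "warm", "hot")

# Hypothetical lower bounds (30-day access counts) for cold, warm, and hot.
HEAT_THRESHOLDS = (1, 10, 100)


def heat_tier(access_count):
    """Map an access count to a heat label using the assumed thresholds."""
    # bisect_right counts how many thresholds the value has reached.
    return HEAT_LABELS[bisect_right(HEAT_THRESHOLDS, access_count)]


for count in (0, 5, 50, 500):
    print(count, heat_tier(count))
```

The same lookup pattern would apply to the breadth and revenue dimensions, each with its own labels and cut-off values.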
3) Product introduction:
Figure 13: Mockups and functional descriptions of the main modules of Tencent's Game Data Asset Management Platform
3. Research summary
Analyzing the material shared by Didi and Tencent shows that the two leading companies have much in common in data asset governance: both implement governance of all asset types through a platform, both pay attention to the metadata standardization, security, and cost of assets, and both provide services such as data asset retrieval and lineage retrieval. In terms of focus, Didi's asset management tools are richer and more mature and address the pain points of both data producers and managers, while Tencent's highlight is its distinctive data asset value evaluation system, which is worth learning from.
06 Product architecture
As shown in Figure 14, the data asset center is divided into three layers: the service layer, the management layer, and the collection layer. The service layer provides data asset retrieval services to data consumers such as data analysts, data product managers, and business operations staff. The management layer is aimed at data asset managers, typically the data product managers, R&D engineers, and lead owners of each business line's product/technical team. It provides asset entry and maintenance capabilities as well as asset cost governance services. The collection layer connects to each data source, including but not limited to event-tracking meta-information, business database meta-information, report/metric meta-information, and personnel and organization information. The collected meta-information must also be defined by asset maintainers and managers according to the unified model provided by the management layer.
Figure 14: Data Asset Center product architecture diagram
07 Product Design
1. Data access
Product positioning: As shown in Figure 15, the core of the data asset center is a central repository of meta-information for all kinds of data assets. Meta-information is collected in two ways: automatic collection from upstream business systems and manual entry through the asset center's front-end pages. The data access module therefore handles both source onboarding and standards definition for the asset center.
Figure 15: Schematic diagram of meta-information collection in the data asset center
Specification definition: The data asset center needs to collect asset meta-information with different structures, such as Hive tables, Kafka topics, ClickHouse, Druid, reports, metrics, and APIs. Defining these heterogeneous asset types in a unified way and analyzing them together to draw a data map is the main problem the asset center has to solve. The system design should account for the complexity of and differences between asset types, as well as future generality. Without a unified set of meta-information collection standards, problems such as higher resource control costs and lower meta-information quality will appear as more data assets are onboarded. As shown in Figures 16 and 17, we abstract a meta-information model that can generalize and define all types of data assets to solve these problems:
Figure 16: Design diagram of the meta-information acquisition model of the data asset center
Figure 17: Examples of main category attribute definitions
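As a rough illustration of what such a unified model could look like in code, the following Python sketch keeps a small set of common fields for every asset and puts type-specific attributes into a free-form dictionary that is checked against per-type requirements. The field names, the `REQUIRED_ATTRS` table, and the `AssetMeta` class are assumptions made for this example. The real attribute definitions are those in Figures 16 and 17.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetType(Enum):
    # Asset types taken from the list in the text above.
    HIVE_TABLE = "hive"
    KAFKA_TOPIC = "kafkatopic"
    CLICKHOUSE_TABLE = "clickhouse"
    DRUID_DATASOURCE = "druid"
    REPORT = "report"
    INDICATOR = "indicator"
    API = "api"


# Hypothetical required attributes per asset type (illustration only).
REQUIRED_ATTRS = {
    AssetType.HIVE_TABLE: {"database", "table", "columns"},
    AssetType.KAFKA_TOPIC: {"cluster", "topic"},
    AssetType.INDICATOR: {"definition", "unit"},
}


@dataclass
class AssetMeta:
    """Common meta-information shared by every asset type."""
    asset_type: AssetType
    name: str
    owner: str
    description: str = ""
    # Type-specific attributes, validated against REQUIRED_ATTRS.
    attrs: dict = field(default_factory=dict)

    def missing_attrs(self):
        """Return the required type-specific attributes that are not set."""
        required = REQUIRED_ATTRS.get(self.asset_type, set())
        return sorted(required - set(self.attrs))


orders = AssetMeta(
    asset_type=AssetType.HIVE_TABLE,
    name="dw.orders",
    owner="alice",
    attrs={"database": "dw", "table": "orders"},
)
print(orders.missing_attrs())  # -> ['columns']
```

Keeping the common fields fixed and the type-specific part open is one way to onboard a new asset type without changing the central schema. Only a new entry in the requirements table is needed.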
2. Data map
Asset retrieval: As shown in Figure 18, the data map supports basic search and advanced search, and recommends structured knowledge graphs to users.
Figure 18: Asset retrieval homepage demo
Asset details: The asset details page displays the basic, business, and technical information of an asset, and provides capabilities such as permission requests, asset bookmarking, lineage queries, quick retrieval, and SQL template generation.
Figure 19: Asset detail page demo
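The lineage query mentioned above can be thought of as a graph traversal over "table A feeds table B" edges. The following Python sketch shows one way to answer "what is upstream of this asset?" and "what is downstream of it?" with a breadth-first search. The edge list and table names are made up for illustration, and how lineage edges are actually collected is outside the scope of this sketch.

```python
from collections import defaultdict, deque


def build_lineage(edges):
    """Build upstream and downstream adjacency maps from (source, target) pairs."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream


def reachable(graph, start):
    """Return every asset reachable from start by following graph edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical lineage edges between warehouse tables and a report.
EDGES = [
    ("ods_orders", "dwd_orders"),
    ("ods_users", "dwd_orders"),
    ("dwd_orders", "dws_orders_daily"),
    ("dws_orders_daily", "ads_gmv_report"),
]

upstream, downstream = build_lineage(EDGES)
print(sorted(reachable(upstream, "dws_orders_daily")))
print(sorted(reachable(downstream, "dwd_orders")))
```

The upstream query serves the data developer who needs to trace where a table's data comes from, and the downstream query shows which tables and reports are affected when that table changes.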
3. Asset maintenance
As shown in Figure 20, the data asset center provides an asset entry and maintenance interface designed for managers. With the asset center as the unified maintenance platform, asset meta-information can be kept up to date.
Figure 20: Asset meta-information maintenance demo
4. Asset governance
Quality analysis: Asset governance provides data asset quality evaluation and analysis reports that assess the completeness, standardization, and duplication of asset meta-information.
Figure 21: Data asset quality evaluation and analysis report demo
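A minimal Python sketch of how the three quality dimensions could be computed is shown below. It assumes each asset is a plain dictionary, that completeness means "required fields are filled in", that standardization means "the name matches a naming rule", and that duplication means "the same name appears more than once". The required fields and the naming pattern are hypothetical stand-ins for whatever rules a real asset center defines.

```python
import re
from collections import Counter

# Hypothetical required meta-information fields.
REQUIRED_FIELDS = ("name", "owner", "description")

# Hypothetical naming rule: warehouse-layer prefix plus snake_case name.
NAME_PATTERN = re.compile(r"^(ods|dwd|dws|ads)_[a-z0-9_]+$")


def completeness(asset):
    """Share of required fields that have a non-empty value."""
    filled = sum(1 for f in REQUIRED_FIELDS if asset.get(f))
    return filled / len(REQUIRED_FIELDS)


def is_standard(asset):
    """True if the asset name follows the assumed naming rule."""
    return bool(NAME_PATTERN.match(asset.get("name") or ""))


def quality_report(assets):
    """Average the three quality dimensions over a list of assets."""
    if not assets:
        return {"completeness": 0.0, "standardization": 0.0, "duplication": 0.0}
    name_counts = Counter(a.get("name") for a in assets)
    duplicated = sum(1 for a in assets if name_counts[a.get("name")] > 1)
    total = len(assets)
    return {
        "completeness": sum(completeness(a) for a in assets) / total,
        "standardization": sum(is_standard(a) for a in assets) / total,
        "duplication": duplicated / total,
    }


ASSETS = [
    {"name": "dwd_orders", "owner": "alice", "description": "order facts"},
    {"name": "dwd_orders", "owner": "bob", "description": ""},
    {"name": "TmpTable", "owner": "", "description": ""},
    {"name": "ads_gmv_daily", "owner": "carol", "description": "daily GMV"},
]
print(quality_report(ASSETS))
```

A real report would likely break these numbers down by team and asset type, but the per-asset checks would stay the same.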
Governance list: Divided into an individual leaderboard and a team leaderboard. For the assets each owner is responsible for, it combines a quality score (completeness, standardization, uniqueness, etc.), a cost score (storage cost, growth trend, etc.), and an evaluation score (user ratings, query popularity, etc.) into a ranking. Daily, weekly, and monthly rankings are provided, and the data is reset once a month.
Figure 22: Data asset governance list demo
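The ranking described above can be sketched as a weighted sum of the three scores followed by a sort. In the Python example below, the weights, the 0 to 100 score scale, and the owner names are all assumptions. The article does not specify how the three scores are combined.

```python
from dataclasses import dataclass

# Hypothetical weights for combining the three scores (they sum to 1).
WEIGHTS = {"quality": 0.5, "cost": 0.3, "evaluation": 0.2}


@dataclass
class OwnerScores:
    """Scores for the assets one person or team is responsible for (0-100)."""
    owner: str
    quality: float
    cost: float
    evaluation: float

    def total(self):
        """Weighted composite score under the assumed weights."""
        return (
            WEIGHTS["quality"] * self.quality
            + WEIGHTS["cost"] * self.cost
            + WEIGHTS["evaluation"] * self.evaluation
        )


def leaderboard(entries):
    """Sort owners by composite score, highest first, ties broken by name."""
    return sorted(entries, key=lambda e: (-e.total(), e.owner))


ENTRIES = [
    OwnerScores("alice", quality=90, cost=60, evaluation=80),
    OwnerScores("bob", quality=70, cost=90, evaluation=70),
    OwnerScores("carol", quality=80, cost=80, evaluation=90),
]

for rank, entry in enumerate(leaderboard(ENTRIES), start=1):
    print(rank, entry.owner, round(entry.total(), 1))
```

The same function serves both the individual and the team leaderboard if the `owner` field holds a team name, and the daily, weekly, and monthly rankings differ only in the time window used to compute the input scores.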
5. Handover of assets
The asset center provides one-stop asset handover and disposal capabilities, reducing the number of unmaintained assets and the security risks caused by staff departures.
Figure 23: Asset handover module demo
08 Future prospects
Looking ahead, I believe the asset center, building on its role in collecting and managing data content, can extend into the field of data applications and services. Through advanced search and AI algorithms, it could quickly provide lightweight services such as data visualization, data analysis, and attribution and prediction, so that beyond helping the business find people and data it can return data conclusions directly. This would simplify the analysis work that follows once the business has found its data, and improve data analysis efficiency.
Existing products abroad offer a reference. ThoughtSpot (a tool that automatically generates data reports based on a search engine), shown in Figure 24, takes search as the starting point and, based on the relationships built between metadata, quickly recommends and draws visual charts, provides lightweight configuration, and quickly meets users' data analysis needs:
Figure 24: ThoughtSpot, an intelligent search analytics product
Another example is Einstein Discovery (see Figure 25), which automatically correlates the user's data, analyzes and interprets it, and provides interpretation reports in natural language to answer the user's questions quickly and with little effort: "What happened? Why did it happen? What will happen? What should be done?":
Figure 25: Introduction to Einstein Discovery
09 Appendix: References
1. The Didi and Tencent material comes from presentations shared publicly at the "2019 China Data Intelligent Management Summit"
2. The asset governance strategy draws on an article from the official account of Meituan's technical team: systematic modeling of data governance integration practice
3. Some sources for the future outlook:
- Intelligent data search and recommendation: https://www.thoughtspot.com/
- Einstein Discovery: Salesforce Einstein Discovery White Paper
Later work stands on the shoulders of those who came before. The material above was a very important reference in shaping the author's understanding of data asset governance. Special thanks to the companies, teams, individuals, and organizers above!
10 Next Announcement
This is the third issue of my data governance series. Thank you in advance to the readers who have been supporting my data security and data governance articles. Updates are slow because of work, but I hope you will keep following and supporting me. In the next issue, I will turn to the quality inspection and monitoring center, which provides quality inspection and monitoring services for metric definitions, output, and security, and discuss ideas for building DQC and SLA tools. Follow along, and see you in the next issue ~