
In the era of large models, is the data middle platform outdated?

Author: Everyone is a Product Manager
Over the past two years, fewer companies seem to be talking about the "middle platform" concept. Some people even say the middle platform no longer works. Is that actually the case? Not really.

1. Is the concept of the data middle platform outdated?

After the middle platform concept took off in 2019, the large internet companies built their own middle platforms, and some data service vendors launched data middle platform products. More recently, headlines of the "XX has dismantled its middle platform" variety have appeared. But is a data middle platform really no longer needed?

The core idea of the data middle platform is to turn data output capabilities into reusable components and services, so that data moves more efficiently from collection to powering business applications. This matters even more in a sluggish economy. The catch is that the value delivered by business applications has to be weighed against the cost of building middle platform capabilities, that is, the ROI. AIGC and generative AI applications also depend on complete data assets and middle platform data capabilities; without them, every GenAI business scenario has to be custom-built, which is costly and hard to replicate quickly.

Put another way, the data middle platform integrates all data resources and services into one unified platform for centralized data management and service delivery. Its main purpose is to solve the "storage", "connectivity" and "generation" problems in enterprise data management: connect data islands, turn every business activity into data, and turn all data into business value.

By turning enterprise data into data assets, the data middle platform improves the efficiency of data development, data discovery, and data analysis. It also addresses data quality problems, including the quality of data warehouse design, the consistency of indicators, and the quality of data development work.

The data middle platform needs to solve the following problems:

  • Data silos: Different departments or systems in an enterprise often keep their own data resources, which creates data silos. By centrally managing and integrating these resources, the data middle platform enables data sharing and interoperability, breaks down the silos, and increases the value obtained from the data.
  • Data efficiency issues: Without a unified data platform, data development, data discovery, and data analysis tend to be inefficient. By providing unified data interfaces and services, the data middle platform simplifies data acquisition and processing and improves work efficiency.
  • Data quality issues: Data quality is hard to guarantee when sources are diverse and formats are inconsistent. Through data cleaning, integration, and standardization, the data middle platform improves the consistency, accuracy, timeliness, and completeness of data.
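
To make the cleaning and standardization point concrete, here is a minimal Python sketch that maps rows from two departmental systems onto one shared schema. The source names, field names, and date formats are all assumptions made up for illustration; a real middle platform would do this in its integration and processing tooling rather than in a single script.

```python
from datetime import datetime

# Hypothetical raw rows from two departmental systems with inconsistent formats.
CRM_ROWS = [{"user_id": " 1001 ", "signup": "2023-01-05", "city": "shanghai"}]
ORDER_ROWS = [{"uid": 1001, "signup_date": "05/01/2023", "City": "Shanghai "}]


def standardize(row, id_key, date_key, date_fmt, city_key):
    """Map one source-specific row onto the shared schema."""
    return {
        "user_id": int(str(row[id_key]).strip()),
        "signup_date": datetime.strptime(row[date_key], date_fmt).date().isoformat(),
        "city": row[city_key].strip().title(),
    }


def integrate_sources():
    """Standardize both sources, then drop rows that became identical."""
    unified = [standardize(r, "user_id", "signup", "%Y-%m-%d", "city") for r in CRM_ROWS]
    unified += [standardize(r, "uid", "signup_date", "%d/%m/%Y", "City") for r in ORDER_ROWS]

    seen, result = set(), []
    for record in unified:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result
```

Once both rows use the same types and formats, they turn out to describe the same user, which is the kind of inconsistency that stays hidden while the data lives in separate silos.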

2. The general architecture of the data middle platform


The goals of the data middle platform are: efficiency, cost reduction, reuse, turning business into data, turning data into assets, and operating those assets.

In practice, this means reducing the cost of obtaining and using the data that business innovation requires, and making data analysis and big data or artificial intelligence applications more convenient.

Many articles say that the output of the data middle platform is APIs, or "API as a service". But for an API to be fast and efficient, it depends on the whole chain behind it: data synchronization, data cleaning and processing, and the accumulation of data assets.

Therefore, I divide the product architecture of the data middle platform into five layers:

  1. Data service layer: analysis services, data query services, visualization services, tags, and algorithm services, all built on data assets and the output of platform tools
  2. Data asset layer: data warehouse model construction, data governance, and asset inventory
  3. Data processing layer: ETL driven by business logic, including the development, scheduling, transport, and O&M of batch and stream data
  4. Data integration layer: the first step, which synchronizes data from different sources into a unified data warehouse or data lake
  5. Infrastructure layer: the lowest layer, the big data cluster services, including storage, computing, resource scheduling, and management of the various Hadoop ecosystem components
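
The following toy Python sketch shows how the five layers hand data to each other. It is only an illustration under simplifying assumptions: a dictionary stands in for the storage cluster, and the table names, field names, and function names are all hypothetical.

```python
from collections import defaultdict

# Infrastructure layer stand-in: a dict plays the role of the shared storage cluster.
WAREHOUSE = {}


def sync_sources(sources):
    """Data integration layer: copy rows from every source into one raw table."""
    WAREHOUSE["raw_orders"] = [row for rows in sources.values() for row in rows]


def clean_orders():
    """Data processing layer: a batch ETL step that drops invalid rows."""
    WAREHOUSE["clean_orders"] = [r for r in WAREHOUSE["raw_orders"] if r["amount"] > 0]


def build_assets():
    """Data asset layer: aggregate clean rows into a reusable metric table."""
    totals = defaultdict(float)
    for row in WAREHOUSE["clean_orders"]:
        totals[row["region"]] += row["amount"]
    WAREHOUSE["revenue_by_region"] = dict(totals)


def query_revenue(region):
    """Data service layer: the only entry point business applications call."""
    return WAREHOUSE["revenue_by_region"].get(region, 0.0)


def run_pipeline(sources):
    """Run the lower layers in order so the service layer has data to serve."""
    sync_sources(sources)
    clean_orders()
    build_assets()
```

The service function is cheap only because the layers beneath it have already synchronized, cleaned, and aggregated the data, which is why the API alone is not the whole middle platform.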

3. A brief introduction to the data products in the data middle platform

1. Data collection

Positioning: Provide internal and external data collection solutions for the enterprise and supply the raw data that big data analysis applications rely on. This is the enterprise's "data crude oil".

Product modules: event tracking solution & event tracking management platform, crawler system, data entry system

2. Component management

Positioning: A big data component management platform that replaces manual command-line operation and maintenance of big data clusters and their components with configuration-driven workflows.

Product modules: cloud platform, HDFS management, Kafka management, HBase management, and Elasticsearch management

3. Development kits

Positioning: Productize and automate the flow from data collection and synchronization through processing to application, in order to improve data development efficiency, reduce development costs, and shorten the data delivery cycle for business innovation.

Product modules: data integration, offline development platform, real-time development platform, intelligent operation and maintenance platform, and machine learning platform
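
A development platform of this kind has to run tasks in dependency order. The sketch below shows that idea with Python's standard-library `graphlib.TopologicalSorter`; the task names and the `run_workflow` helper are hypothetical, and a real scheduler would add retries, timing, and alerting.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run every task after the tasks it depends on.

    tasks: maps a task name to a zero-argument callable.
    dependencies: maps a task name to the set of task names it must wait for.
    Returns the results keyed by task name, in execution order.
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name]()
    return results


# Hypothetical offline workflow: sync, then clean, then build metrics, then publish.
EXAMPLE_TASKS = {
    "sync_orders": lambda: "synced",
    "clean_orders": lambda: "cleaned",
    "build_metrics": lambda: "built",
    "publish_api": lambda: "published",
}
EXAMPLE_DEPENDENCIES = {
    "sync_orders": set(),
    "clean_orders": {"sync_orders"},
    "build_metrics": {"clean_orders"},
    "publish_api": {"build_metrics"},
}
```

Declaring dependencies instead of hand-ordering scripts is what lets a platform automate the path from collection to application.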

4. Data Assets

Positioning: Build up data assets and publish an asset catalog to make data sharing easier; monitor data quality against defined audit rules so that data is accurate and highly available from the source; and unify permission control to keep data secure.

Product modules: data map, data lineage, indicator system, data quality monitoring, model construction platform, asset management center
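
As a rough illustration of monitoring data quality against audit rules, the Python sketch below computes a pass rate per rule and flags rules that fall below a threshold. The rule names, the sample fields, and the 95% threshold are assumptions for the example, not something the products above prescribe.

```python
def not_null(column):
    """Rule factory: the column must be present and non-empty."""
    return lambda row: row.get(column) not in (None, "")


def in_range(column, low, high):
    """Rule factory: the column must be a number within [low, high]."""
    def check(row):
        value = row.get(column)
        return isinstance(value, (int, float)) and low <= value <= high
    return check


def audit(rows, rules):
    """Return the share of rows that pass each named rule."""
    if not rows:
        return {name: 1.0 for name in rules}
    return {
        name: sum(1 for row in rows if rule(row)) / len(rows)
        for name, rule in rules.items()
    }


def failed_rules(report, threshold=0.95):
    """Names of rules whose pass rate is below the alert threshold."""
    return sorted(name for name, rate in report.items() if rate < threshold)
```

Keeping rules as small named functions means the same check can be attached to many tables, which is the reuse the asset layer is after.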

5. Data governance

Positioning: Clean up inefficient or worthless data and tasks, free up storage and computing resources, and manage the cost of data assets at a fine-grained level.

Product modules: Cost Optimization Center, Data Security Center
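
One simple form of cost governance is finding tables that nobody has read for a long time. The Python sketch below assumes a hypothetical metadata list with `name`, `size_gb`, and `last_read` fields and a 90-day idle limit; both the fields and the limit are illustrative choices.

```python
from datetime import date


def find_cold_tables(tables, today, max_idle_days=90):
    """Return tables not read within max_idle_days, largest first."""
    cold = [t for t in tables if (today - t["last_read"]).days > max_idle_days]
    return sorted(cold, key=lambda t: t["size_gb"], reverse=True)


def reclaimable_gb(tables, today, max_idle_days=90):
    """Total storage that archiving or deleting the cold tables would free."""
    return sum(t["size_gb"] for t in find_cold_tables(tables, today, max_idle_days))


# Hypothetical table metadata, as a cost optimization center might collect it.
EXAMPLE_TABLES = [
    {"name": "dwd_orders", "size_gb": 500, "last_read": date(2024, 5, 30)},
    {"name": "tmp_export_2022", "size_gb": 120, "last_read": date(2023, 1, 10)},
    {"name": "ads_old_report", "size_gb": 300, "last_read": date(2023, 11, 1)},
]
```

Sorting by size puts the largest savings first, so the owners of the most expensive idle tables are the first to be asked whether the data is still needed.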

6. Analytics applications

Positioning: Aggregate and govern cross-domain data, package data capabilities as products, and apply data to business decision-making, product optimization, refined operations, and similar areas, so as to extract value from data and empower the business.

Product Modules:

Data analysis: ad hoc query, user behavior analysis system, self-service analysis, agile BI, data visualization platform, intelligent analysis platform

Product intelligence: personalized recommendation, user profiling, and precision marketing platform

7. Data Services

Positioning: Following the data middle platform approach, expose data quickly as API services, with service monitoring and management built in.

Product modules: API service platform, recommendation platform, intelligent early warning & data subscription (from people looking for data to data finding people)
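
The sketch below illustrates, in plain Python, what "output data as an API with monitoring" can mean at its simplest: query functions are published under a name, and every call is counted and timed. The `DataApiRegistry` class and its method names are hypothetical, and a real service platform would add authentication, rate limiting, and an HTTP layer.

```python
import time
from collections import defaultdict


class DataApiRegistry:
    """Publishes query functions as named APIs and records basic call metrics."""

    def __init__(self):
        self._handlers = {}
        self.metrics = defaultdict(
            lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0}
        )

    def publish(self, name, handler):
        """Expose a query function under an API name."""
        self._handlers[name] = handler

    def call(self, name, **params):
        """Invoke a published API, counting calls, errors, and elapsed time."""
        stats = self.metrics[name]
        stats["calls"] += 1
        started = time.perf_counter()
        try:
            return self._handlers[name](**params)
        except Exception:
            # Unknown API names and handler failures both count as errors.
            stats["errors"] += 1
            raise
        finally:
            stats["total_seconds"] += time.perf_counter() - started
```

Because every call goes through `call`, usage and error figures come for free, which is what makes it possible to tell which data services are actually being used.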

8. General Functions

Positioning: Abstract the modules common to all data products and provide them as unified services, so that individual products do not rebuild the same functionality and development costs go down.

Product modules: ticketing system, message center, help center, unified permissions, product navigation, and demand center


4. Summary

As a data product manager, which product area are you working in now, and which module do you want to work on as your career develops?

Here I would like to give a clear answer to a question some readers have raised: "In the era of AI, should data product managers reinvent themselves as AI product managers?" The answer is that AI applications are one way of realizing the core value of data, but for AI applications to be efficient and low-cost, enterprises still need to keep strengthening the middle platform infrastructure provided by foundational data products.

This article was published on Everyone is a Product Manager by the author [Data Dry Rice People] (WeChat public account: [Data Dry Rice People]). It is an original/authorized publication on Everyone is a Product Manager; reprinting without permission is prohibited.

Image from Unsplash, based on the CC0 license.
