laitimes

Qudian big data platform architecture

Author: Flash Gene

As the technical backbone of the group's data business, the Qudian big data platform gives the group an end-to-end data-driven solution that runs from collection, modeling, storage, and analysis through to intelligent applications. It connects to business systems and data products, so that data services drive business decision-making and make products more intelligent.

1

What is a data platform?

A big data platform collects, processes, and manages business data, exposes it as services, and feeds it back to the business. It is really a concept rather than a specific product, and it emphasizes reusability. An enterprise needs to build a data platform when it wants data-driven transformation and fine-grained operations, or when business demand and data volume reach a certain scale. A data platform is a combination of high-quality, efficient data systems and services that empower the business front office.

An analogy: when my family wants to eat, I buy the groceries and cook them in my own kitchen with ordinary utensils. A company like Foxconn has to feed tens of thousands of people, so it must build a central facility that processes and distributes ingredients and provides catering as a centralized service. In essence, "a change in the scale of demand leads to a qualitative change in the solution." Doesn't this sound like a data middle office? The name does not really matter; what matters is delivering data as a better service.

2

What needs does the data platform solve?

Corporate perspective

Service: Stable, fully functional services with high-quality data.

Efficiency: More efficient execution and more real-time business.

Cost: Keep cost growth under control as far as possible while supporting the rapid growth of the data business.

Security: Access security, storage security, and data compliance.

Technical perspective

Componentization: A complete set of big data components and data applications that meets the needs of massive data storage and computing.

Platformization: Service abstraction, data sharing, open self-service access, and support for analysis and decision-making.

3

What are the challenges in the process of building and implementing the data platform?

Service stability

Based on the Hadoop ecosystem, we have built a low-cost, highly reliable, highly scalable, highly efficient, and highly fault-tolerant data platform. It provides full-link service and data monitoring along the chain data access -> data exchange -> data conversion -> data analysis -> data visualization. It also meets service assurance goals such as distributed storage and computing, disaster recovery and backup, auto scaling, and data decoupling.
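Full-link monitoring along this chain can be illustrated with a minimal sketch. It assumes, hypothetically, that each stage reports the time of its last successful run. The stage names come from the text; `check_link` and the freshness threshold are illustrative and do not describe Qudian's actual monitoring system. Stages are checked in link order, so the first stale stage is the most upstream one and the likely root cause of any downstream alerts.

```python
from datetime import datetime, timedelta

# Stage names follow the link described in the text; everything else is hypothetical.
STAGES = ["data access", "data exchange", "data conversion",
          "data analysis", "data visualization"]


def check_link(last_success: dict[str, datetime], now: datetime,
               max_age: timedelta) -> list[str]:
    """Return stages whose last successful run is missing or older than max_age.

    Results keep link order, so the first entry is the most upstream problem.
    """
    stale = []
    for stage in STAGES:
        ts = last_success.get(stage)
        if ts is None or now - ts > max_age:
            stale.append(stage)
    return stale


# Usage: every stage ran 5 minutes ago except data conversion, which is 2 hours behind.
now = datetime(2024, 1, 1, 12, 0)
status = {stage: now - timedelta(minutes=5) for stage in STAGES}
status["data conversion"] = now - timedelta(hours=2)
print(check_link(status, now, max_age=timedelta(minutes=30)))
```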

Cost control

Unlike the business front office, which connects directly to front-line business, the data platform is not directly tied to business KPIs, so in the boss's eyes it is often a very large cost center. We keep costs under control through performance tuning, technology selection, an elastic scaling architecture, business evaluation, and other measures. This effectively resolves the tension between the needs of business growth and the growth of storage and computing costs.

Efficiency improvement

Collaboration efficiency. For example, when operations staff need data, the request travels along this path: operations staff -> analysts -> data warehouse team -> platform team. A multi-hop chain like this is extremely inefficient. To handle this multi-level data demand, we provide different data application services to different teams. This flattens cooperation between teams and raises R&D and analysis efficiency.

Development efficiency. Across every stage of platform development, we found the same pattern for statistical, algorithm, and risk control tasks. With a traditional programming mindset, the only way to scale is to add developers and pile up development, iteration, and maintenance work, and a single task may take a day or even several days. With dozens or even hundreds of new task requests a day, that approach cannot keep up. A SQL task, by contrast, takes only a few minutes to develop; our data factory, for example, runs 25,000+ routine and ad-hoc tasks per day. We have made the entire platform SQL-based and replaced a programming mindset with a SQL mindset. This lowers the barrier to using the platform and greatly frees up the upper-level business development teams.
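To make the contrast concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in SQL engine. The article does not say which engine the platform runs, and the `orders` table and its columns are hypothetical. The same per-city statistic is written once as hand-rolled grouping code and once as a single declarative query.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Xiamen", 100.0), (2, "Xiamen", 50.0), (3, "Beijing", 80.0), (1, "Xiamen", 20.0)],
)


def stats_by_hand(rows):
    """Programming mindset: the developer writes the grouping and aggregation logic."""
    totals, buyers = {}, {}
    for user_id, city, amount in rows:
        totals[city] = totals.get(city, 0.0) + amount
        buyers.setdefault(city, set()).add(user_id)
    return {city: (len(buyers[city]), totals[city]) for city in totals}


# SQL mindset: the same statistic as one declarative query; the engine plans execution.
STATS_SQL = """
    SELECT city, COUNT(DISTINCT user_id) AS buyers, SUM(amount) AS gmv
    FROM orders
    GROUP BY city
    ORDER BY city
"""

rows = conn.execute("SELECT user_id, city, amount FROM orders").fetchall()
print(stats_by_hand(rows))
print(conn.execute(STATS_SQL).fetchall())
```

In the SQL version the developer states what to compute and the engine decides how to compute it. That is what makes a few minutes of development per task realistic.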

Data governance

Because we run a big data platform in the financial industry, our requirements for data security and data quality are much higher than in most other industries. We are committed to defining data standards, building data security and privacy standards, and solving data quality and security problems in the context of concrete business scenarios.

Data quality measures: We identify, measure, monitor, and raise early warnings on data quality problems that can arise at any stage of the data life cycle: planning, acquisition, storage, sharing, maintenance, application, and retirement. Quality is monitored and analyzed along five dimensions (completeness, accuracy, reasonableness, consistency, and timeliness) to improve the enterprise's data quality.
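The quality dimensions above can be turned into automated rules. The following is a minimal sketch that assumes a batch of rows represented as Python dicts; the function name, rule format, and thresholds are hypothetical. It covers completeness, reasonableness, consistency (row count against the source system), and timeliness. Accuracy usually requires comparison against a trusted reference, so it is left out here.

```python
from datetime import datetime, timedelta


def check_quality(rows, *, required, ranges, source_count, loaded_at, now, max_delay):
    """Hypothetical rule-based checks on one batch of rows (a list of dicts).

    Returns {dimension: [problem, ...]}; an empty list means the check passed.
    """
    issues = {"completeness": [], "reasonableness": [], "consistency": [], "timeliness": []}
    for i, row in enumerate(rows):
        # Completeness: mandatory columns must not be null.
        for col in required:
            if row.get(col) is None:
                issues["completeness"].append(f"row {i}: {col} is null")
        # Reasonableness: values must fall inside configured business ranges.
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is not None and not low <= value <= high:
                issues["reasonableness"].append(f"row {i}: {col}={value} not in [{low}, {high}]")
    # Consistency: the loaded row count must match the count reported by the source system.
    if len(rows) != source_count:
        issues["consistency"].append(f"loaded {len(rows)} rows, source reports {source_count}")
    # Timeliness: the batch must have landed within the agreed delay.
    if now - loaded_at > max_delay:
        issues["timeliness"].append(f"batch is {now - loaded_at} old, limit is {max_delay}")
    return issues


# Usage with a deliberately flawed batch.
now = datetime(2024, 1, 1, 9, 0)
report = check_quality(
    [{"user_id": 1, "age": 30}, {"user_id": None, "age": 200}],
    required=["user_id"], ranges={"age": (18, 100)},
    source_count=3, loaded_at=now - timedelta(hours=3), now=now, max_delay=timedelta(hours=1),
)
for dimension, problems in report.items():
    print(dimension, problems)
```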

Data security measures: We apply appropriate security controls at the system, application, and network levels. We have also established sound mechanisms for access control, access auditing, and data masking (desensitization) to protect the enterprise's internal information.
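Data masking (desensitization) is often implemented as a set of per-field rules applied before data reaches analysts. The minimal sketch below uses hypothetical rules. It applies partial masking to an 11-digit mobile number and an identity number, and salted hashing (pseudonymization) to fields that must remain joinable. The article does not describe the specific rules Qudian uses.

```python
import hashlib
import re


def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 digits of an 11-digit mobile number."""
    if not re.fullmatch(r"\d{11}", phone):
        # Unknown format: mask everything rather than risk leaking it.
        return "*" * len(phone)
    return phone[:3] + "****" + phone[-4:]


def mask_id_number(id_number: str) -> str:
    """Keep only the first and last character of an identity number."""
    if len(id_number) <= 2:
        return "*" * len(id_number)
    return id_number[0] + "*" * (len(id_number) - 2) + id_number[-1]


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 digest so tables can still be joined on it."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


print(mask_phone("13812345678"))
print(mask_id_number("110101199001011234"))
print(pseudonymize("13812345678", salt="hypothetical-secret"))
```

Partial masking keeps values recognizable enough for customer support work. A salted hash hides the value entirely but still lets two tables be joined on it, provided the salt itself is kept secret.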

4

What does the Qudian big data platform look like?

Data scale: petabytes of hot data, processing 25,000+ tasks per day.

The architecture diagram is as follows:

[Figure: Qudian big data platform architecture diagram]

Basic services

Basic services cover core links such as data collection, the data factory, data governance, and data services. We divide data into zones by function, design the data models, and integrate all kinds of data under unified workflow scheduling. Together with the existing enterprise-level data warehouse and historical data storage system, this forms the basic data system. That system provides data applications for operations and management and supports upper-level applications. The main basic services of the Qudian big data platform are as follows:

[Figures: main basic services of the Qudian big data platform]
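Unified scheduling of the kind described above usually means running tasks as a dependency graph (DAG), so that each task starts only after all of its upstream tasks have finished. The sketch below shows that idea with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The task names are hypothetical, and the article does not name the scheduler Qudian actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
DAG = {
    "sync_orders": set(),
    "sync_users": set(),
    "clean_orders": {"sync_orders"},
    "clean_users": {"sync_users"},
    "order_user_wide": {"clean_orders", "clean_users"},
    "daily_report": {"order_user_wide"},
}


def run(dag, execute):
    """Run tasks in dependency order and return the batches that became ready together.

    Tasks within one batch have no dependencies on each other, so a real
    scheduler could dispatch them in parallel; here they run one by one.
    """
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        for task in ready:
            execute(task)
            sorter.done(task)
        batches.append(ready)
    return batches


print(run(DAG, execute=lambda task: None))
```

Because `prepare()` rejects cyclic dependencies up front, the same check also validates task graphs that users configure for themselves.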

Big data infrastructure

[Figures: big data infrastructure]

Monitoring platform

[Figure: monitoring platform]

5

How the data warehouse is built

Original layer

Data in the original layer includes app tracking events (event instrumentation), server-side logs, and data synchronized from business databases. Database synchronization can run in real time at the database or instance level, configured by users on demand. The data finally lands in HDFS and Kudu.
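Real-time business database synchronization is typically done by replaying row-level change events (inserts, updates, deletes) in commit order. The minimal sketch below uses a plain dict as a stand-in for a primary-keyed table; `ChangeEvent` and `apply_changes` are hypothetical names. One plausible reason for landing data in both systems, though the article does not say so, is the difference between the two stores. HDFS files are append-only, which suits immutable logs and tracking events. Kudu tables have primary keys and support updates, which suits mutable business rows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """A hypothetical row-level change captured from a business database."""
    op: str                # "insert", "update" or "delete"
    key: int               # primary key of the source row
    row: Optional[dict]    # full row image after the change; None for deletes


def apply_changes(table: dict, events: list) -> dict:
    """Replay change events onto a primary-keyed table (a dict standing in for Kudu).

    Inserts and updates become upserts and deletes remove the key.
    Events must arrive in commit order for the result to be correct.
    """
    for event in events:
        if event.op in ("insert", "update"):
            table[event.key] = event.row
        elif event.op == "delete":
            table.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op}")
    return table


events = [
    ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
    ChangeEvent("update", 1, {"id": 1, "status": "paid"}),
    ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
    ChangeEvent("delete", 2, None),
]
print(apply_changes({}, events))
```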

Warehouse layer

The warehouse layer is Qudian's core data service. It handles data cleaning, normalization, and confirmation of enumerated values. It is organized into four systems: data fusion, data transformation, data labeling, and business logic.
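Cleaning, normalization, and enumeration confirmation can be sketched as a per-record function. The field names, the status enumeration, and the convention of storing amounts as integer fen are all hypothetical. The fusion, transformation, labeling, and business logic systems are not shown.

```python
# Hypothetical canonical enumeration: source systems spell the same status differently.
STATUS_ENUM = {
    "paid": "PAID", "PAID": "PAID", "1": "PAID",
    "unpaid": "UNPAID", "UNPAID": "UNPAID", "0": "UNPAID",
}


def clean_record(raw: dict) -> tuple[dict, list[str]]:
    """Clean and normalize one raw record; return the result plus any problems found."""
    problems = []
    # Cleaning: trim whitespace and turn empty strings into proper nulls.
    rec = {k: (v.strip() or None) if isinstance(v, str) else v for k, v in raw.items()}
    # Normalization: store amounts in one unit (integer fen instead of decimal yuan).
    amount = rec.pop("amount_yuan", None)
    rec["amount_fen"] = None if amount is None else round(float(amount) * 100)
    # Enumeration confirmation: map source codes to the canonical enum, flag unknown values.
    status = rec.get("status")
    if status is not None:
        canonical = STATUS_ENUM.get(status)
        if canonical is None:
            problems.append(f"unknown status value: {status!r}")
        rec["status"] = canonical
    return rec, problems


print(clean_record({"user_id": 7, "status": " paid ", "amount_yuan": "12.34", "memo": "  "}))
print(clean_record({"status": "refunded"}))
```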

Application layer

The application layer faces applications directly. It holds highly aggregated data, user-level detail data, user tags, and similar data. It serves major business scenarios such as the Qudian reporting platform, the risk control A/B/C scorecard model scores, marketing, debt collection, the intelligent recommendation system, and multidimensional analysis.
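In risk control, A/B/C cards conventionally refer to application, behavior, and collection scorecards. Such cards commonly convert a model's predicted probability of a bad outcome into points using points-to-double-odds (PDO) scaling: score = offset + factor * ln(odds). Here factor = PDO / ln 2 and offset = base score - factor * ln(base odds). The sketch below implements that standard scaling with hypothetical calibration parameters; the article does not describe Qudian's actual models or parameters.

```python
import math


def scorecard_scaler(base_score: float, base_odds: float, pdo: float):
    """Build a PDO scaling function that turns a bad-outcome probability into card points.

    score = offset + factor * ln(odds), with odds = P(good) / P(bad),
    factor = pdo / ln(2) and offset = base_score - factor * ln(base_odds).
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)

    def score(p_bad: float) -> float:
        # p_bad: the model's predicted probability of a bad outcome (e.g. default).
        odds = (1 - p_bad) / p_bad
        return offset + factor * math.log(odds)

    return score


# Hypothetical calibration: 600 points at 50:1 good:bad odds, +20 points per doubling of odds.
score = scorecard_scaler(base_score=600, base_odds=50, pdo=20)
print(round(score(1 / 51)), round(score(1 / 101)), round(score(0.2)))
```

Each doubling of the good:bad odds adds a fixed PDO points. This keeps scores comparable across model versions once the base score and base odds are agreed.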

That concludes this overview of the Qudian big data platform. In future articles we will break down each layer of the big data architecture in technical detail. Interested readers are welcome to follow along.

Author: Lin Qingmin

Source: WeChat public account of the Qudian technical team

Source: https://mp.weixin.qq.com/s/OCEUS1v0844nVL4V01lPIA