Architecture design of Lalala gimbal system (abbreviation: BI (Business Intelligence) visualization platform)

Author: Program Ape Mouth

First, what is Lalala Gimbal?

Lalala Gimbal is an internally developed BI (Business Intelligence) visualization platform.

As of December 2022, Lalala's business covered 352 cities in mainland China, with 660,000 monthly active drivers and 9.5 million monthly active users. As the company's business kept expanding, the demand for data analysis and for turning data into value became increasingly pressing. The original data analysis platform could no longer fully meet the diverse needs of the business, and it could not connect directly to the company's existing permission management system or business application systems, which created risks around data permission control and data leakage. Building a self-developed BI platform therefore became increasingly urgent, and Lalala Gimbal came into being.

Second, an introduction to the Lalala Gimbal system architecture

The term "business intelligence" (BI) first appeared in 1865, when Richard Millar Devens used it in the Cyclopaedia of Commercial and Business Anecdotes to describe how the banker Sir Henry Furnese profited by collecting information and acting on it ahead of his competitors. Today, after a long evolution, BI is generally understood as a set of solutions that use technical means to transform data into knowledge, in order to support corporate decision-making and uncover business value. BI is data-centric, and its core functions include data warehousing, data ETL, data analysis, data mining, and data visualization.

Business informatization handles user data within each individual business system. Data informatization, by contrast, works across businesses and systems: it provides analytical insight and overall indicators, which in turn drives the construction of business intelligence and makes decision-making more scientific. It therefore relies more heavily on frameworks, models, and computing power, and these determine how far an enterprise's data informatization can go. With the development of natural language processing, artificial intelligence, lakehouse architecture, and related technologies, the concept of AI+BI has been proposed. Databricks (a big data platform company founded by the creators of Apache Spark) also believes that data platform infrastructure supporting both BI and ML is the future of data analysis. In the BI of the future, you would only need to ask a computer a question to get the answer. Judging from the current situation in China, however, there is still a long way to go before business "intelligence" is truly realized.

Next, let's look at Lalala's big data architecture, which is roughly divided as follows:

Basic layer and platform layer: provide data storage, data processing, and data development capabilities for the company through infrastructure such as the data warehouse and the computing platform (including data access, compute engines, and governance and development tooling).

Service layer and application layer: mainly platforms for mining data value, which build on the capabilities of the basic and platform layers to maximize the value of the company's data assets.

In addition, Lalala has built a data security system into its big data stack, including data permission control, data security auditing, and data encryption and masking.

Gimbal's visual analysis is therefore one of the most direct connections between data and users, and it plays a linking role in the big data system: it is data-centric, directly serves business users above it, and relies on the data warehouse, data ETL, data analysis, data mining, and so on below it.

The design goal of the gimbal system: to allow users to produce professional scenario-based data reports in a few minutes.

Workflow: create a data model (model configuration or custom SQL) → create visual reports (reports and dashboards) → publish (sharing to multiple business systems and report permission control).

Report creation uses Apache ECharts, an open-source visualization library, and users can easily build a report by dragging and dropping chart components and data fields.

Connect to multiple data sources:

At the data source level, Gimbal mainly connects to the types used inside the company, such as MySQL, PostgreSQL, Phoenix, Doris, and Hive. It also supports uploading Excel/CSV data files and connects to the company's internal quick-report system and indicator library. For the company's internal data sources, users can connect and query without knowing the account or password.

Data model:

As introduced earlier, users can create a data model in two ways: model configuration and custom SQL. Model configuration is friendlier to non-technical users. Gimbal smooths over the differences between the query dialects of different data sources, so it requires no code: users do not need to know how to write SQL and can create a dataset and a report by drag and drop. Apart from a few necessary restrictions, users can create data models very flexibly.

Report Production:

The report creation workbench is the main editing page for building reports. It currently supports 14 chart types, such as indicator cards, funnel charts, crosstabs, scatter charts, and stacked charts.

Users bind data by dragging and dropping dimension and measure fields, add and modify chart components entirely through the interface, and can customize sorting, totals, grouping, TopN, and year-over-year and period-over-period comparisons. The result is simple, flexible, and easy to use.

Dashboard Production:

In the dashboard interface, users drag and drop previously edited charts and filter components to generate interactive visual pages for self-service analysis.

A finished report supports data download. After publishing, it can be connected to the big data security management and control system for unified management of report permissions, which suits users with stricter requirements for sensitive data. It can also be shared to multiple business systems.

Third, the technical design of Lalala Gimbal visualization

Data source connection:

Most visualization platforms on the market share a common problem: connecting to a data source generally requires a user name and password, because general-purpose BI tools are designed for broad applicability. This is not friendly to business users. Only developers know the data source credentials, so users have to ask a developer for them every time, which brings unavoidable communication costs and data security risks.

An internally developed system can easily avoid this. Users do not need to know the connection string or password at all, only which database and which table holds the data they need. This also makes it easier to control permissions per user instead of creating many connection accounts.
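
The idea can be sketched as a server-side registry that stores credentials once and hands out a connection configuration only after a permission check. This is a minimal illustration, not the platform's actual design: `SourceConfig`, `SourceRegistry`, the sample DSN, and the user and table names are all hypothetical, and a real deployment would delegate the check to the company's permission system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceConfig:
    """Connection details that stay on the server (hypothetical fields)."""
    dsn: str
    username: str
    password: str


@dataclass
class SourceRegistry:
    """Maps a database name to stored credentials and checks user grants."""
    _configs: dict = field(default_factory=dict)  # database -> SourceConfig
    _grants: set = field(default_factory=set)     # (user, database, table)

    def register(self, database: str, config: SourceConfig) -> None:
        self._configs[database] = config

    def grant(self, user: str, database: str, table: str) -> None:
        self._grants.add((user, database, table))

    def resolve(self, user: str, database: str, table: str) -> SourceConfig:
        """Return the stored config only if the user may read the table."""
        if (user, database, table) not in self._grants:
            raise PermissionError(f"{user} may not read {database}.{table}")
        return self._configs[database]


registry = SourceRegistry()
registry.register(
    "orders_db",
    SourceConfig("mysql://db.internal/orders", "svc_bi", "secret"),
)
registry.grant("alice", "orders_db", "orders")

# The user names only the database and table; credentials never leave the server.
config = registry.resolve("alice", "orders_db", "orders")
```

A user without a grant gets a `PermissionError` instead of a connection, so access is managed per user rather than per shared database account.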

Data model:

As mentioned in the architecture introduction, and drawing on visualization platforms in the industry, Gimbal lets users create data models through model configuration or by writing SQL, including multi-table joins, and bind the results of the SQL query to visual charts by drag and drop.

The data model automatically generates dimensions and measures from the fields returned by the query and identifies each field's type, such as integer, decimal, string, or date. Numeric fields are treated as measures by default. Users can change a field's role (dimension or measure), its type, its remarks, and so on. Measures support the usual aggregations, including Sum, Count, Avg, Max, Min, and distinct count. Dimensions and measures can be edited with field expressions and support operations such as Copy, Delete, and Transform. For specific fields, we also provide column-level and row-level permission control.
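
The default classification described above can be sketched as follows. This is a simplified illustration that infers types from one sample row; the `Field` class and the function names are hypothetical, and a real implementation would more likely read column metadata from the database driver.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Field:
    name: str
    type: str  # "integer", "decimal", "string" or "date"
    role: str  # "dimension" or "measure"


def infer_type(value) -> str:
    # bool is a subclass of int in Python, so handle it before the int check.
    if isinstance(value, bool):
        return "string"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, (float, Decimal)):
        return "decimal"
    if isinstance(value, date):
        return "date"
    return "string"


def infer_fields(columns, sample_row):
    """Numeric columns default to measures; everything else is a dimension."""
    fields = []
    for name, value in zip(columns, sample_row):
        field_type = infer_type(value)
        role = "measure" if field_type in ("integer", "decimal") else "dimension"
        fields.append(Field(name, field_type, role))
    return fields


fields = infer_fields(
    ["city", "order_date", "order_count", "gmv"],
    ["Shenzhen", date(2022, 12, 1), 128, 4521.5],
)
```

Because the defaults are only a starting point, the user can still override them, for example by setting `fields[2].role = "dimension"` for a numeric column that is really an identifier.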

Flexible and diverse queries:

Queries on a visualization platform need to be very flexible and diverse: SQL WHERE conditions, layer-by-layer data filtering, data format conversion (such as decimal places and percentages), aggregation of dates and times by year/month/day, dimension grouping, sorting by dimension or measure, drill-down and linkage between charts, linkage between filters, and so on.

At present, interactive data filtering in Gimbal falls into two types: single-chart filtering and dashboard-level global filtering. Filtering a single chart is relatively simple: it selects a set of values from a single dimension of the data model. Global dashboard filtering is more complex, because the filters may be linked to one another.

For example, take three linkage relationships: A drives B, C, and D; A + B drive C; A + C drive D. This is hard to grasp at first glance, but an example makes it clear. Suppose A is a large region, B is a province, C is a city, and D is a district. Selecting South China in A narrows B, C, and D to that region, so Guangdong Province appears under South China. A + B driving C yields all cities under South China / Guangdong Province. A + C driving D is in effect A + B + C driving D, so D is all districts under South China / Guangdong Province / Shenzhen City.
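
The linkage rules can be sketched as a lookup that restricts one filter's options by the values already chosen in the filters that drive it. The rows below are illustrative sample data, and `LEVELS`, `ROWS`, and `options` are hypothetical names; a real system would run an equivalent query against the data source.

```python
LEVELS = ("region", "province", "city", "district")

# Illustrative hierarchy rows: (region, province, city, district).
ROWS = [
    ("South China", "Guangdong", "Shenzhen", "Nanshan"),
    ("South China", "Guangdong", "Shenzhen", "Futian"),
    ("South China", "Guangdong", "Guangzhou", "Tianhe"),
    ("South China", "Guangxi", "Nanning", "Qingxiu"),
    ("East China", "Zhejiang", "Hangzhou", "Xihu"),
]


def options(target, selections):
    """Return the distinct values of `target` that match all current selections."""
    target_index = LEVELS.index(target)
    matches = set()
    for row in ROWS:
        if all(row[LEVELS.index(level)] == value for level, value in selections.items()):
            matches.add(row[target_index])
    return sorted(matches)


# A drives B: provinces under the chosen region.
provinces = options("province", {"region": "South China"})
# A + B drive C: cities under the chosen region and province.
cities = options("city", {"region": "South China", "province": "Guangdong"})
# A + C drive D: the city already implies its province, so B can be omitted.
districts = options("district", {"region": "South China", "city": "Shenzhen"})
```

Each filter only needs to know which other filters drive it; the option list is recomputed whenever one of those selections changes.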

Of course, besides data filtering and filter linkage, there is also linkage and drill-down between charts. For example, clicking Zhongshan City in one chart makes a component below display the detailed data for Zhongshan City. Drill-down is a bit like a multi-level filter: a flexible interaction between filters.

Data calculation:

The calculations available to users include field expression conversion, date and time aggregation, dimension grouping, totals, TopN, and so on. These are ultimately converted into SQL statements, so the results are computed at the database level.
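
As a rough illustration of this translation, the sketch below turns a small chart configuration (dimensions, aggregated measures, filters, and TopN) into one SQL statement and runs it against an in-memory SQLite table. The `build_sql` function, its parameters, and the `orders` table are hypothetical, and SQLite stands in for the real data sources.

```python
import sqlite3


def build_sql(table, dimensions, measures, filters=(), order_by=None, top_n=None):
    """Turn a chart configuration into one aggregate SQL statement.

    Identifiers are interpolated directly, so this sketch assumes they come
    from a trusted model definition; filter values are bound as parameters.
    """
    columns = list(dimensions)
    columns += [f"{func}({column}) AS {alias}" for func, column, alias in measures]
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column, _ in filters)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    if order_by:
        sql += f" ORDER BY {order_by} DESC"
    if top_n is not None:
        sql += f" LIMIT {int(top_n)}"
    return sql, [value for _, value in filters]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("South China", "Shenzhen", 100),
        ("South China", "Shenzhen", 50),
        ("South China", "Guangzhou", 70),
        ("South China", "Zhongshan", 20),
        ("East China", "Hangzhou", 500),
    ],
)

# Top 2 cities by total amount within one region.
sql, params = build_sql(
    "orders",
    dimensions=["city"],
    measures=[("SUM", "amount", "total")],
    filters=[("region", "South China")],
    order_by="total",
    top_n=2,
)
top_cities = conn.execute(sql, params).fetchall()
```

Grouping, sorting, and the TopN limit all run inside the database, so only the two result rows travel back to the application.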

Other results require secondary computation in memory on top of the SQL query results. Crosstabs are an example: a single SQL request does not return the final result, so a second pass in memory is needed.
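
A minimal sketch of such a second pass: the database returns grouped rows in long form, and the application pivots them into a crosstab with row totals. The `pivot` function and the sample rows are hypothetical and only illustrate the general technique.

```python
from collections import defaultdict


def pivot(rows):
    """Pivot (row_key, column_key, value) tuples into a crosstab with totals."""
    cells = defaultdict(dict)
    column_keys = []
    for row_key, column_key, value in rows:
        cells[row_key][column_key] = value
        if column_key not in column_keys:
            column_keys.append(column_key)

    table = [["", *column_keys, "Total"]]
    for row_key, values in cells.items():
        # Missing combinations are filled with 0 so every row has the same width.
        line = [values.get(column_key, 0) for column_key in column_keys]
        table.append([row_key, *line, sum(line)])
    return table


# Long-form result of a query grouped by city and month.
grouped = [
    ("Shenzhen", "2022-11", 120),
    ("Shenzhen", "2022-12", 150),
    ("Guangzhou", "2022-12", 90),
]
crosstab = pivot(grouped)
```

The SQL side stays a plain GROUP BY over two dimensions, while the column layout and totals are handled in memory.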

Performance optimization:

Page queries depend on the underlying database engine. For Hive queries, for example, we use Presto, which has improved performance greatly; Doris, ClickHouse, Elasticsearch, and others are also used. Beyond the query engine, many things can be optimized at the system level, such as the following:

  1. Source data cache: on a visualization platform, users care mostly about the final data in the report and pay little attention to the full data while building it. So only part of the data needs to be queried during editing, and the full data is shown only when the report is actually displayed. For example, when editing a report with a date dimension by drag and drop, show 1 month of data instead of 1 year. Queries are faster and the report-building experience is better. Gimbal mainly uses H2 here.
  2. SQL query result cache: Redis caches SQL execution results, which improves performance.
  3. Filter drop-down option cache: drop-down option values rarely change, so they are persisted to the database and refreshed periodically, which helps query performance considerably.
  4. Connection pool under high concurrency: when the pool is full and many slow queries are running, a new connection pool handles new requests so that incoming queries are not affected. Take care not to exceed the database's max_user_connections.
  5. Crosstab (pivot) queries: because the first n rows are fixed once the rows are sorted, the row values of the current page can be used as a query condition so that only a small part of the data is queried and matched, which improves performance greatly.
  6. Interrupting large queries: use SimpleTimeLimiter from Google Guava's concurrent package to interrupt threads whose queries run too long, since such queries consume a great deal of CPU and memory.
  7. Asynchronous multithreading: for example, downloading data asynchronously.
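
Item 2 can be illustrated with a small result cache keyed by the SQL text and its parameters. This is only a sketch under stated assumptions: an in-memory dictionary stands in for Redis, and `QueryCache`, `get_or_run`, and the 60-second TTL are hypothetical choices rather than the platform's actual settings.

```python
import hashlib
import time


class QueryCache:
    """Caches query results for a fixed time-to-live, keyed by SQL and parameters."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, rows)

    @staticmethod
    def key(sql, params):
        raw = sql + "|" + repr(tuple(params))
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_run(self, sql, params, run_query):
        """Return cached rows if still fresh, otherwise run the query and store it."""
        key = self.key(sql, params)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        rows = run_query(sql, params)
        self._entries[key] = (now + self._ttl, rows)
        return rows


executed = []


def fake_query(sql, params):
    # Stand-in for a real database call; records each execution.
    executed.append(sql)
    return [("Shenzhen", 150)]


now = [0.0]  # Manually controlled clock so the example is deterministic.
cache = QueryCache(ttl_seconds=60, clock=lambda: now[0])
statement = "SELECT city, SUM(amount) FROM orders GROUP BY city"

first = cache.get_or_run(statement, [], fake_query)
second = cache.get_or_run(statement, [], fake_query)  # served from the cache
now[0] = 61.0
third = cache.get_or_run(statement, [], fake_query)   # expired, runs again
```

Hashing the statement together with its parameters keeps keys short and ensures that the same SQL with different filter values is cached separately.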

Fourth, conclusion and next steps

This article introduced the architecture design ideas behind the Lalala Gimbal visualization product from a developer's perspective. As a self-developed visualization platform, Gimbal is still in the internal incubation stage at Lalala and has many shortcomings: queries against some data sources are not fast enough, some operations are too cumbersome, prompts are not friendly enough, and so on. As more systems connect to it and the user base grows, we will work to improve its functionality, optimize the user experience, and ensure system stability, so that Gimbal becomes a powerful tool for the company to reduce costs and increase efficiency.

The next step in technical planning covers roughly two points:

  1. System stability governance: stability overrides everything. Lao Tzu said, "Governing a large country is like cooking a small fish." Governing a system is similar: you must control the heat, choose the ingredients, use suitable seasoning, and combine pan-frying, stir-frying, boiling, and deep-frying to produce a good dish. Beyond code robustness, system stability governance has three main aspects: monitoring and alerting, stress testing, and drills. The next step will focus on the robustness of the system code and on monitoring, alerting, and degradation.
  2. Code design integration: as the system gains more and more features, perfect code design becomes hard to achieve. Performance and readability often force a choice, and making trade-offs among several options is not simple. Fortunately, there are books on this subject for reference, such as "Clean Code" and "Design Patterns".
Original link: https://juejin.cn/post/7211008551761543229