—

Preface

The Peacock system SDK is the predecessor of our client-side tracking SDK. When it was first designed, Peacock mainly provided applications with various ad display styles and with statistics for ad events such as views and clicks. The ad styles included splash-screen ads, reminder pop-ups, rewind pop-up ads, general ads, and so on. Like today's third-party ad SDKs (banner, interstitial, app-open, native, etc.), the SDK returned the ad view directly and the client was responsible for displaying it.

Later, as the business grew and each app needed its own custom ad styles, returning the ad view directly was gradually phased out. Instead, the SDK returns the ad data that matches the delivery conditions, and the client parses the data and renders the ads itself. Advantages:

Backend data templates are easy to configure and extend;
Delivery conditions are diverse and easy to control (validity period, channel, city, version, target application, etc.);
The business logic is implemented inside the SDK, so the app does not need to care about it.

In addition to the ad services above, the Peacock system is also responsible for statistical data reporting. At first this was just simple counts of ad views, clicks, starts, loads, and so on; it then gradually evolved to include active reporting, event-based log messages, and custom event reporting. To drive the business with data, the completeness and correctness of the data are extremely important, which has pushed the statistics features through continuous iteration and optimization. The following sections describe the practice and evolution of our event-tracking project from several angles.

—

Concepts

The following questions need to be answered first:

What is event tracking? What should be tracked? What forms does tracking take? How do they differ, and what are their pros and cons?

Event tracking (literally "burying points") is the collection of data from applications such as web pages, apps, or backend services. The collected client data can be used to analyze and optimize the product experience, and it also provides data support for product operations. Common metrics include PV, UV, DAU, session duration, new users, and page clicks.

To collect data, code is generally added to the app so that a report is triggered when a certain condition is met (depending on the reporting policy, the data may not be uploaded to the server in real time). This process of adding code is what is called "burying points", or tracking. There are generally three forms:

Code tracking: calling the data collection API to report data when an event occurs.
For example, following the requirements from product/operations or the tracking specification, developers add behavior-reporting code to the source of the web page or app; when a certain condition is met, the code runs and reports the data to the server. This is the most basic approach. Every time a reporting condition is added or changed, a developer has to be involved, and the effect is only visible after the next release. Almost every data platform provides an SDK for this kind of reporting, wrapping the backend reporting interface in a simple client API, so developers can report behavioral data by embedding the SDK and adding a few calls at the tracking points.
Full tracking: also known as codeless tracking; every behavior in the web page or app that meets a certain condition is reported to the backend server.
For example, all button clicks in an app are reported, and product or operations staff then filter the behavioral data they need on the backend. The advantage is obvious: adding or changing a reporting condition does not require a developer to modify tracking code. The disadvantage is just as obvious: far more data is reported than with code tracking, and much of it may be worthless. This approach also tends to look at each behavior in isolation rather than in context, which makes analysis harder. Many companies provide SDKs for this as well. They "hook" the original app code statically or dynamically to monitor behavior, and they usually accumulate several events and report them together to merge requests.
Visual tracking: collection nodes are configured through a visual tool; the SDK finds those nodes while parsing the app or web page, listens for the events they generate, and reports them.
For example, when the app starts, it fetches the configuration pre-selected by product/operations from the backend server, then finds and monitors the matching elements in the app's UI; when an element meets the conditions, the behavior data is reported to the backend server. Given the brute-force nature of full tracking, tracking on demand is a natural next step, and visual tracking is exactly such a scheme. Some companies now provide SDKs of this kind. To select the elements to monitor, some offer a web management console that the phone connects to once the SDK is installed and initialized, so that users configure the elements there; others let users select elements directly on the phone.

Pros and cons:

Each tracking method has its own use cases, advantages, and disadvantages.

Analysis of the burying scheme of the Chinese perpetual calendar client

Many SDKs currently support one or more of the tracking methods above, such as Mixpanel, Sensorsdata, TalkingData, GrowingIO, and Umeng Analytics; of these, Mixpanel and Sensorsdata have been open-sourced. When building our own tracking SDK, we can study their implementation principles and borrow their better solutions.

—

The process of data collection

The above covers the concepts and methods of tracking. A typical data platform processes data in the following steps:

Data collection: deciding what data to collect, collecting it, and verifying it afterwards.
Data transmission: reporting the data, either in real time or in batches.
Data storage: after the data reaches the server, it is modeled and stored for analysis.
Data analysis: statistics, analysis, and mining of the data.
Data display: presenting the data on visual pages for reference.

The first step is the core: the accuracy, richness, and timeliness of the data directly determine how well the data platform works.

Accuracy: the tracking points must be correct. Tracked events have to meet the needs of the product and data teams, and the statistical definitions should be discussed and agreed with all parties; this matters most. If a tracking point is wrong, first, it is effectively useless and the code can only be fixed in the next release; second, it may also contaminate previously collected data.
Richness: tracking data is used for analysis, so the tracked fields should be rich enough to support the various conditions that analysis needs.
Real-time: try to upload tracking data promptly, but if every single record triggers its own report, the number of requests to the reporting interface becomes very large. The usual strategy is therefore to report on app start, report on exit, report once a certain number of records has accumulated or a time interval has elapsed, and provide an interface for immediate reporting.
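
The reporting strategy described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name BatchingReporter, its parameters, and the Consumer that stands in for the real upload call are all hypothetical and not part of the SDK described in this article. The current time is passed in explicitly so the logic is easy to try out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: buffers events and hands them to a sender in batches. */
class BatchingReporter {
    private final int maxBatchSize;               // flush once this many events are pending
    private final long maxIntervalMillis;         // or once this long has passed since the last flush
    private final Consumer<List<String>> sender;  // stands in for the real upload call
    private final List<String> pending = new ArrayList<>();
    private long lastFlushMillis;

    BatchingReporter(int maxBatchSize, long maxIntervalMillis,
                     Consumer<List<String>> sender, long nowMillis) {
        this.maxBatchSize = maxBatchSize;
        this.maxIntervalMillis = maxIntervalMillis;
        this.sender = sender;
        this.lastFlushMillis = nowMillis;
    }

    /** Records one event and flushes if the count or the interval threshold is reached. */
    synchronized void track(String event, long nowMillis) {
        pending.add(event);
        if (pending.size() >= maxBatchSize
                || nowMillis - lastFlushMillis >= maxIntervalMillis) {
            flush(nowMillis);
        }
    }

    /** Immediate reporting; app start and exit hooks could call this as well. */
    synchronized void flush(long nowMillis) {
        lastFlushMillis = nowMillis;
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending)); // pass a copy so the buffer can be reused
        pending.clear();
    }
}
```

With a batch size of 3 and an interval of 1000 ms, the third tracked event triggers an upload, and so does any event that arrives more than a second after the previous flush.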

Data verification methods:

Developers can verify through logs.
The client shows the tracking data about to be generated in a floating overlay, and a backend page displays it as well (captured packets are forwarded and the backend page refreshes in real time).
Use assertions to check the field structure.
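
As a rough illustration of the assertion approach, the sketch below checks that an event carries a set of required fields before it is queued. The field names (event, page_id, timestamp) and the class name EventSchemaCheck are assumptions made for this example; the article does not specify the actual protocol fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a field-structure check for tracking events. */
class EventSchemaCheck {
    // Assumed required fields; the real protocol's field names are not given in the article.
    static final String[] REQUIRED = {"event", "page_id", "timestamp"};

    /** Returns the names of required fields that are absent or null. */
    static List<String> missingFields(Map<String, Object> event) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (event.get(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Fails fast when an event is malformed; intended for debug builds and tests. */
    static void assertValid(Map<String, Object> event) {
        List<String> missing = missingFields(event);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Tracking event is missing fields: " + missing);
        }
    }
}
```

Calling assertValid at the point where an event is created makes a malformed event visible during development instead of after release.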

In addition, a few things need attention during data transmission:

Batch upload: several records are generally uploaded together in one request.
Compression: to reduce the upload size, the payload is generally compressed, for example with GZIP.
Encryption: to keep the data secure, an encryption policy is generally applied.
Fault tolerance: if an upload fails, the data should be kept in the database to avoid loss.
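
The compression step can be sketched with the JDK's own GZIP streams. This is a minimal example, assuming a batch is serialized as one event per line; the class name PayloadCodec and that wire format are illustrative choices, and encryption and failure persistence are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: packs a batch of events into one GZIP-compressed payload. */
class PayloadCodec {
    /** Joins the events with newlines (an assumed format) and compresses the bytes. */
    static byte[] compress(List<String> events) {
        byte[] raw = String.join("\n", events).getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // safe to read: the GZIP stream is closed and fully flushed
    }

    /** Reverses compress(); useful for verifying payloads on the receiving side. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compression pays off mainly on batches: many events share the same field names, so a batched payload compresses much better than individual records.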

The first two steps involve the client; the later steps are generally handled on the server and are not covered here.

—

The evolution of tracking in our project

Most projects today still use code tracking, because it counts user behavior accurately and, when the tracking points are well designed, is simpler and more convenient than full or visual tracking. The scheme we use today has also been refined over many iterations.

Early tracking: third-party SDKs

Our early tracking was fairly simple: we integrated a statistics SDK (Umeng, TalkingData, etc.), initialized it as documented, and added page-level statistics. For more complex cases, we used the SDK's custom events (such as Umeng's count events and calculation events).

The advantages of this approach are that it is simple and convenient, cheap to maintain, and requires no definition of data formats or upload packets. For some small apps, I personally think it is enough.

The disadvantage is that the data is not transparent: the integrator cannot get at the uploaded packets.

Tracking in the Peacock SDK

The tracking statistics in Peacock went through the following stages:

Ad statistics in the Chinese perpetual calendar app, which only counted views, clicks, loads, starts, and so on; now abandoned.
Active reporting on startup (now deprecated) and event-based log packets.
Custom event reporting.

Together with third-party SDKs, these strategies basically meet our current business needs, but they are not perfect. First, there is no independent SDK for this logic (it is coupled with Peacock). Second, the packet format is not uniform (event-based and custom uploads use different packets and interfaces). Third, the reporting and storage strategy needs optimization: previously, data was held in memory and reported once certain conditions were met, and it was written to the database only if reporting failed, which risks losing data. Fourth, some automatic statistics are missing, such as page PV and duration, app start and exit, and app usage duration.

Analytics SDK

In the third stage, we optimized the collection protocol and restructured the tracking specification to make tracking more complete and convenient. The result has the following features:

1. Independent SDK, clear business logic, easy to integrate.

2. The packet format and reporting interface are unified and simpler.

3. Richer reporting fields and expanded anti-cheat fields.

4. Automatic statistics, including app start, exit, and the corresponding durations, page PV and duration, click events of various controls, etc.

5. An optimized reporting strategy that keeps data timely (on start, on exit, on abnormal exit, after a configurable number of records, after a configurable interval, immediate reporting, etc.).

6. Support for H5 tracking statistics.

7. More flexible configuration, such as the reporting address, the number of records per report, and the interval between two reports.

—

Relevant technical points

The above is the evolution of our tracking statistics. During that evolution we ran into various problems that had to be solved; some of them are described in detail below.

Device fingerprint

Common statistics such as DAU (daily active users), DNU (daily new users), MAU (monthly active users), and retention are all computed from the device fingerprint, and the most important property of a device fingerprint is that it uniquely identifies the device. Building on common industry practice, we do the following:

First try the SP (SharedPreferences) cache; if a value is found, use it directly. Otherwise, try to read it from a file on the SD card; if found, return it and write it back to SP. If neither has it, obtain it through the system's native methods and store it in both SP and the SD card file. Lookups follow the priority SP, SD card, native method, which keeps the value as stable and unique as possible.
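
The lookup order can be sketched as follows. To keep the example runnable outside Android, SharedPreferences and the SD card file are replaced by a small IdStore interface, and the system's native method by a Supplier; all of these names are assumptions for illustration, not the SDK's actual classes.

```java
import java.util.function.Supplier;

/** Minimal store abstraction; stands in for SharedPreferences or a file on the SD card. */
interface IdStore {
    String read();           // returns null when no ID has been stored
    void write(String id);
}

/** In-memory IdStore, handy for trying the logic off-device. */
class MemoryIdStore implements IdStore {
    private String value;
    MemoryIdStore(String initial) { this.value = initial; }
    public String read() { return value; }
    public void write(String id) { this.value = id; }
}

/** Hypothetical sketch of the SP, then SD card, then native lookup order. */
class DeviceIdResolver {
    private final IdStore prefs;
    private final IdStore sdCard;
    private final Supplier<String> nativeId; // stands in for the system's native method

    DeviceIdResolver(IdStore prefs, IdStore sdCard, Supplier<String> nativeId) {
        this.prefs = prefs;
        this.sdCard = sdCard;
        this.nativeId = nativeId;
    }

    String resolve() {
        String id = prefs.read();
        if (id != null && !id.isEmpty()) {
            return id;                       // 1. SP cache hit
        }
        id = sdCard.read();
        if (id != null && !id.isEmpty()) {
            prefs.write(id);                 // 2. SD card hit: repopulate SP
            return id;
        }
        id = nativeId.get();                 // 3. fall back to the system value
        prefs.write(id);
        sdCard.write(id);
        return id;
    }
}
```

Because the value is written to both stores on first use, a later change in the system-provided value does not change the fingerprint, and clearing app data alone does not reset it as long as the SD card copy survives.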

Automatic statistics for view

While the app is in use, we need to count how often various controls are shown, i.e., the view event. To count this event more accurately and easily, we wrapped the logic in two classes, ETADLayout and ETADUtils.

ETADLayout is used to count entries. It is added as the outermost layer of each entry to be counted and does not affect the internal layout structure. It exposes several public methods for attaching the information to be counted; of these, setAdEventData must be called.

ETADUtils is a utility class. Its viewAllETADLayouts method walks all ETADLayout objects inside a ViewGroup that fall within the statistics range and calls each one's statistics method. The caller invokes it at the appropriate time.

For example, a ListView counts the currently visible entries when scrolling stops:

Add ETADLayout as the outermost layer of the entry layout

Set the information to be counted on the ETADLayout object

When the ListView stops scrolling, call ETADUtils's viewAllETADLayouts method

Notes:

ETADLayout is the smallest unit and cannot be nested;
ETADLayout deduplicates internally: the same entry ID is counted only once within 10 s;
An entry is counted once 1/2 of it is visible;
ListView and RecyclerView are counted when scrolling stops, not during fast scrolling, and the top and bottom positions are passed in.
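
The deduplication and visibility notes can be illustrated with plain Java. This is not the ETADLayout source, which the article does not show; the class ViewEventFilter and its methods are hypothetical, and the half-visibility check assumes the note means "at least half of the entry's height lies inside the viewport".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the dedup window and half-visibility rule from the notes. */
class ViewEventFilter {
    private final long windowMillis;                            // e.g. 10_000 for the 10 s rule
    private final Map<String, Long> lastCounted = new HashMap<>();

    ViewEventFilter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if a view of this entry should be counted now, and records it if so. */
    synchronized boolean shouldCount(String entryId, long nowMillis) {
        Long last = lastCounted.get(entryId);
        if (last != null && nowMillis - last < windowMillis) {
            return false;                                       // same entry seen within the window
        }
        lastCounted.put(entryId, nowMillis);
        return true;
    }

    /** True when at least half of the entry's height lies between the viewport's top and bottom. */
    static boolean isHalfVisible(int entryTop, int entryBottom, int viewportTop, int viewportBottom) {
        int height = entryBottom - entryTop;
        if (height <= 0) {
            return false;
        }
        int visible = Math.min(entryBottom, viewportBottom) - Math.max(entryTop, viewportTop);
        return visible * 2 >= height;
    }
}
```

A scroll-stopped handler would iterate the visible entries, skip those that fail isHalfVisible, and report only those for which shouldCount returns true.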

Fully buried technical scheme

Full tracking has to collect data automatically, so an ID must be generated for every page and control. The ID must be unique and stable, i.e., it must not change arbitrarily.

Page ID

This is relatively simple: the class name is generally sufficient (unless the class is deliberately renamed). Android has two kinds of pages, Activity and Fragment, and a Fragment can be embedded in different Activities, so the ID rules differ slightly between the two:

Activity: the ID rule is ActivityClassName|extra parameters;
Fragment: the ID rule is ActivityClassName[FragmentClassName]|extra parameters.
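
The two rules translate directly into string formatting. In the sketch below, the class name PageIds is hypothetical, and dropping the "|" separator when there are no extra parameters is an assumption; the article does not say how an empty extra part is represented.

```java
/** Hypothetical sketch of the page ID rules for Activities and Fragments. */
class PageIds {
    /** ActivityClassName|extra */
    static String forActivity(String activityClassName, String extra) {
        return withExtra(activityClassName, extra);
    }

    /** ActivityClassName[FragmentClassName]|extra */
    static String forFragment(String activityClassName, String fragmentClassName, String extra) {
        return withExtra(activityClassName + "[" + fragmentClassName + "]", extra);
    }

    // Assumption: the separator is omitted when there are no extra parameters.
    private static String withExtra(String base, String extra) {
        return (extra == null || extra.isEmpty()) ? base : base + "|" + extra;
    }
}
```

For instance, a hypothetical HomeFragment hosted in MainActivity with the extra parameter tab=1 would get the ID MainActivity[HomeFragment]|tab=1.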

Control ID

Compared with the page ID, defining the control ID is more complicated. First, the control's R.id value cannot be used: it is assigned at compile time and may change whenever resources change, so it does not meet the stability requirement. That leaves two options:

Use the ID name of the control
The ID name and type of a control rarely change unless the page layout is refactored, so using the ID name keeps the ID as unique and stable as possible. However, controls on different pages may share the same ID name, so the page ID is needed to tell them apart.
Rule: page ID + control ID name
Use the layout path of the control

Based on the parent-child relationships of the control in the layout, we walk from the control itself up to the root node. This yields a layout path for the control in the view tree, and conversely the path identifies the control in the view tree.
Under this rule, the path of a button might be FrameLayout[0]/LinearLayout[1]/Button[0]. The path may be the same on different pages, so the page ID is needed to tell them apart.
Rule: page ID + control layout path

Combining the two options keeps the control ID as unique and stable as possible. When reporting, we can send all of these values: page ID, control ID name, and control path. The control ID name is preferred as the unique ID; if it is empty, the control path is used instead.
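
The path rule and the fallback order can be sketched without Android by modeling the view tree with a small ViewNode class. Everything here is illustrative: ViewNode and ControlIds are hypothetical names, the index in each path segment is assumed to be the child's position among all siblings, and the "#" separator between page ID and control ID is an arbitrary choice, since the article does not specify either detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for android.view.View, just enough to show the path rule. */
class ViewNode {
    final String type;      // e.g. "Button"
    final String idName;    // resource entry name, or null if the view has no ID
    ViewNode parent;
    final List<ViewNode> children = new ArrayList<>();

    ViewNode(String type, String idName) {
        this.type = type;
        this.idName = idName;
    }

    ViewNode add(ViewNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

/** Hypothetical sketch: builds control IDs from the ID name or the layout path. */
class ControlIds {
    /** Walks from the view up to the root, producing e.g. FrameLayout[0]/LinearLayout[1]/Button[0]. */
    static String layoutPath(ViewNode view) {
        StringBuilder path = new StringBuilder();
        for (ViewNode n = view; n != null; n = n.parent) {
            // Assumption: the index is the position among all siblings; the root gets 0.
            int index = (n.parent == null) ? 0 : n.parent.children.indexOf(n);
            if (path.length() > 0) {
                path.insert(0, '/');
            }
            path.insert(0, n.type + "[" + index + "]");
        }
        return path.toString();
    }

    /** Prefers the ID name and falls back to the layout path, both prefixed with the page ID. */
    static String controlId(String pageId, ViewNode view) {
        boolean hasName = view.idName != null && !view.idName.isEmpty();
        return pageId + "#" + (hasName ? view.idName : layoutPath(view));
    }
}
```

On a real View, the ID name could be obtained from the resources by entry name and the parent chain from getParent(); the sketch leaves those platform calls out so it can run anywhere.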

Once page and control IDs are defined, automatic reporting can be implemented. Page tracking is relatively simple: tracking events can be added to onResume and onPause of each page (manual), or ActivityLifecycleCallbacks can be registered to follow the lifecycle and track pages automatically (automatic).
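
The page-level bookkeeping behind both variants is the same and can be sketched in plain Java. The class PageTracker below is a hypothetical helper: on Android its two methods would be called from onResume and onPause, or from the resumed and paused callbacks of ActivityLifecycleCallbacks. Timestamps are passed in so the logic does not depend on the platform clock.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: derives page view durations from resume and pause events. */
class PageTracker {
    private final Map<String, Long> resumedAt = new HashMap<>();

    /** Call when a page becomes visible; marks the start of a page view. */
    void onPageResumed(String pageId, long nowMillis) {
        resumedAt.put(pageId, nowMillis);
    }

    /** Call when a page is paused; returns the visible duration in ms, or -1 without a matching resume. */
    long onPagePaused(String pageId, long nowMillis) {
        Long start = resumedAt.remove(pageId);
        return (start == null) ? -1L : nowMillis - start;
    }
}
```

The returned duration, together with the page ID, is what a page PV event would carry to the reporting queue.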

For automatic control tracking, once a control has an ID, any click or long press can be counted. In Android, clicks and long presses have standard callbacks; the SDK's tracking method can be called inside the callback with the relevant view parameters. The challenge is how to insert that call into each callback, which is described below.

Implementing AOP with AspectJ

The principle is to locate, at compile time, the places in the code where events need to be reported and to insert the SDK's event-reporting code there, using the AspectJ framework.

A few concepts

AOP

AOP stands for Aspect-Oriented Programming, a technique that achieves unified maintenance of program functionality through pre-compilation and runtime dynamic proxies.

AOP is a continuation of OOP, a popular topic in software development, and an important part of the Spring framework; it is sometimes described as a paradigm derived from functional programming.

AOP isolates the parts of the business logic from one another, which reduces coupling between them, improves reusability, and speeds up development.

In short, AOP lets you add functionality to a program dynamically, without modifying its source code, through pre-compilation and runtime dynamic proxies.
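
To make the "runtime dynamic proxy" half of that definition concrete, here is a small sketch using the JDK's java.lang.reflect.Proxy. Note that this is not AspectJ and not how the SDK in this article works (it weaves at compile time); the ClickListener interface and TrackingProxy class are hypothetical stand-ins chosen so the example runs on a plain JVM.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.function.Consumer;

/** Hypothetical listener interface standing in for View.OnClickListener. */
interface ClickListener {
    void onClick(String viewId);
}

/** Sketch of AOP via a dynamic proxy: tracking is added without touching the listener's code. */
class TrackingProxy {
    /** Wraps a listener so that every onClick call is followed by a tracking record. */
    static ClickListener wrap(ClickListener target, Consumer<String> reporter) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);   // run the original business logic first
            if (method.getName().equals("onClick")) {
                reporter.accept("click:" + args[0]);       // the "after" advice
            }
            return result;
        };
        return (ClickListener) Proxy.newProxyInstance(
                ClickListener.class.getClassLoader(),
                new Class<?>[] {ClickListener.class},
                handler);
    }
}
```

The business listener stays unaware of tracking; the cost of this approach is that every listener must be wrapped at the point where it is registered, which is exactly the step compile-time weaving removes.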

AspectJ

* JoinPoint: a join point in the code (the place where we want to insert code)

* Aspect: the description of the join points to cut into

Pointcut: describes exactly which kind of join point is meant, such as where a function is called (call(MethodSignature)) or inside a function's execution (execution(MethodSignature))

Advice: describes where to insert code relative to the join point, such as before the Pointcut (@Before), after it (@After), or around the whole Pointcut (@Around)

So implementing AOP involves the following:

Define an Aspect, which must have two attributes: Pointcut and Advice.
Write the code to be injected wherever the program matches the Pointcut and Advice descriptions.
At compile time, a special Java compiler (AspectJ's ajc compiler) finds the code that matches the Aspect definition and inserts the code to be injected at the position specified by the Advice.

AspectJ is a framework that makes this kind of code instrumentation easy once it is added as a dependency. The steps are outlined below.

Implementation

1. First, define the Aspect

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.After;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;

/**
 * android.view.View.OnClickListener.onClick(android.view.View)
 */
@Aspect
public class ViewOnClickListenerAspectj {
    /**
     * The actual tracking implementation (body omitted here).
     */
    private void doAOP(final JoinPoint joinPoint) {

    }

    /**
     * Supports methods annotated with butterknife.OnClick.
     */
    @Pointcut("execution(@butterknife.OnClick * *(..))")
    public void methodAnnotatedWithButterknifeClick() {
    }

    @After("methodAnnotatedWithButterknifeClick()")
    public void onButterknifeClickAOP(final JoinPoint joinPoint) throws Throwable {
        try {
            // Only track ButterKnife click handlers when the SDK switch is on
            if (AnalyticsDataAPI.sharedInstance().isButterknifeOnClickEnabled()) {
                doAOP(joinPoint);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * android.view.View.OnClickListener.onClick(android.view.View)
     *
     * @param joinPoint JoinPoint
     * @throws Throwable Exception
     */
    @After("execution(* android.view.View.OnClickListener.onClick(android.view.View))")
    public void onViewClickAOP(final JoinPoint joinPoint) throws Throwable {
        doAOP(joinPoint);
    }

    /**
     * android.view.View.OnLongClickListener.onLongClick(android.view.View)
     *
     * @param joinPoint JoinPoint
     * @throws Throwable Exception
     */
    @After("execution(* android.view.View.OnLongClickListener.onLongClick(android.view.View))")
    public void onViewLongClickAOP(JoinPoint joinPoint) throws Throwable {
        // Long presses are tracked the same way as clicks
        doAOP(joinPoint);
    }
}

This aspect inserts a doAOP(joinPoint) call after a view's onClick method to report the tracking event. It also supports click handlers bound with ButterKnife's @OnClick annotation.

2. Next, use the ajc compiler to "weave" the Aspect code into the source code

Add the dependency in the project-level build.gradle file:

Add the dependency in the main app module's build.gradle file:

// Required for AOP-based full tracking
implementation 'org.aspectj:aspectjrt:1.8.9'

Add the build script that runs the AspectJ weaver, placing it at the end of the same file:

import org.aspectj.bridge.IMessage
import org.aspectj.bridge.MessageHandler
import org.aspectj.tools.ajc.Main

final def log = project.logger
final def variants = project.android.applicationVariants

variants.all { variant ->
    // Note: this check skips non-debuggable build types, so only debuggable builds are woven
    if (!variant.buildType.isDebuggable()) {
        log.debug("Skipping non-debuggable build type '${variant.buildType.name}'.")
        return;
    }

    JavaCompile javaCompile = variant.javaCompile
    javaCompile.doLast {
        // Run ajc over the compiled classes and write the woven classes back in place
        String[] args = ["-showWeaveInfo",
                         "-1.8",
                         "-inpath", javaCompile.destinationDir.toString(),
                         "-aspectpath", javaCompile.classpath.asPath,
                         "-d", javaCompile.destinationDir.toString(),
                         "-classpath", javaCompile.classpath.asPath,
                         "-bootclasspath", project.android.bootClasspath.join(File.pathSeparator)]
        log.debug "ajc args: " + Arrays.toString(args)

        // Forward the weaver's messages to the Gradle logger at the matching level
        MessageHandler handler = new MessageHandler(true);
        new Main().run(args, handler);
        for (IMessage message : handler.getMessages(null, true)) {
            switch (message.getKind()) {
                case IMessage.ABORT:
                case IMessage.ERROR:
                case IMessage.FAIL:
                    log.error message.message, message.thrown
                    break;
                case IMessage.WARNING:
                    log.warn message.message, message.thrown
                    break;
                case IMessage.INFO:
                    log.info message.message, message.thrown
                    break;
                case IMessage.DEBUG:
                    log.debug message.message, message.thrown
                    break;
            }
        }
    }
}

3. Check the woven class file

After the two steps above, the tracking code has been inserted into the onClick method. The decompiled class file looks like this:

public void onClick(View v) {
    JoinPoint var2 = Factory.makeJP(ajc$tjp_0, this, this, v);
    try {
        if (v.getId() == 2131165324) {
            Log.i("MainActivity", "tv_test onClick");
        } else if (v.getId() == 2131165226) {
            Log.i("MainActivity", "btn_test onClick");
            this.startActivity(new Intent(this, TestActivity.class));
        }
    } catch (Throwable var5) {
        ViewOnClickListenerAspectj.aspectOf().onViewClickAOP(var2);
        throw var5;
    }
    ViewOnClickListenerAspectj.aspectOf().onViewClickAOP(var2);
}

This is the basic usage of AspectJ. In theory the same approach can be applied to any method, for example the callbacks of TabHost or RadioGroup.

The latest version of our tracking SDK is based on this approach: it modifies the bytecode at compile time and inserts the event-reporting code.

Besides the approach above, other options include the Transform API provided by the Android Gradle plugin (1.5.0 or later), ASM, Javassist, and proxy listeners.

Implementation reference:

51 Credit Card: Android Automatic Tracking in Practice
NetEase HubbleData: Android Codeless Tracking in Practice

—

Summary and outlook

From the earliest manual tracking to today's partly automatic tracking SDK, our tracking statistics have been optimized step by step, making tracking work more convenient and more complete. Even so, manual tracking cannot be replaced entirely. The two should be combined according to the characteristics of the business: use automatic tracking for relatively stable pages and controls, and manual tracking where the business changes frequently.

In the future, a visual tracking platform could also be built to enable dynamic, on-demand tracking and make the tracking platform more complete.

Author | Li Heng

Source-WeChat public account: micro carp technical team

Source: https://mp.weixin.qq.com/s/9dLlU-FiAvGWcNp6lKfsMg
