
Technology Application | Application of machine learning model explainability in bank intelligent marketing scenarios

Author: Digitization of Finance

Text / Zeng Ling and Huang Cheng, Intelligent Operation Center, China Everbright Bank

In recent years, against the backdrop of the banking industry's digital transformation, machine learning models have been widely used in bank marketing, helping banks acquire customers precisely, activate and retain existing customers, and improve marketing efficiency. Compared with traditional statistical models, machine learning models can achieve higher accuracy, but their algorithms are mostly black boxes with complex internal mechanisms. When business personnel cannot understand how a model works internally or what its outputs are based on, their trust in the model and the effectiveness of its execution drop sharply. Addressing the interpretability of model algorithms, the "Evaluation Specification for Financial Applications of Artificial Intelligence Algorithms" issued by the People's Bank of China in March 2021 sets explicit requirements for the safety and interpretability of artificial intelligence algorithms applied in the financial field, stating that "algorithm interpretability is an important basis for judging whether an algorithm is applicable". Therefore, whether to improve the effectiveness of model applications or to meet regulatory requirements and protect consumer rights and interests, machine learning models in bank intelligent marketing scenarios should be interpretable.

Definition and classification of model interpretability

1. Definition of model interpretability

The academic community has not yet settled on a strict definition of model interpretability. From the perspective of business application, however, explainability means transforming a complex algorithm's logical structure into language that humans can intuitively understand, so that people can grasp the basis of the model's output: knowing not only what the result is, but also why it is so. For example, giving the degree of influence of each feature on the model's results, without listing every detailed calculation formula, is usually enough to meet business personnel's needs when using the model.

2. Classification of model interpretability

There are multiple perspectives and approaches to achieving model interpretability, which can be classified according to the following dimensions.

(1) According to the source of the explanation, interpretability is divided into intrinsic interpretability and post-hoc interpretability. Intrinsic interpretability means the algorithm itself is interpretable; for example, in logistic regression the impact of each feature on the model's results can be judged from its coefficient. Such algorithms are usually simple in structure but limited in accuracy. Post-hoc interpretability means applying explanatory methods after the model is trained to calculate, and give business interpretations of, each feature's contribution or importance to the prediction, for example feature importance analysis and visualization; this approach suits a wide variety of machine learning algorithms (as shown in Figure 1).


Fig. 1 The relationship between model interpretability and prediction accuracy

(2) According to the scope of the explanation, interpretability is divided into global interpretability and local interpretability. Global interpretability explains the overall structure and parameters of the model over the full data set, helping people understand how the model operates. Local interpretability attributes, for a single observation, how the model arrives at its output from that specific input, and is used to analyze individual differences.

(3) According to dependence on the model algorithm, interpretability is divided into model-related interpretability and model-independent interpretability. Model-related interpretability extracts results, parameters, or indicators from the computation process as explanations according to the principles and structure of a particular algorithm, such as the p-value and R-squared in linear models, or, in tree models, the number of times a feature is used for splitting, the number of observations it affects, and its gain contribution. Model-independent interpretability measures feature influence with a uniform method applicable to all algorithms, for example by perturbing the input features and observing changes in the output.

SHAP Interpretation Method

In machine learning applications in bank intelligent marketing, we want the model not only to be interpretable as a whole but also to attribute the model result for each individual customer; that is, the model needs both global and local interpretability. At the same time, to avoid restricting algorithm selection and to allow comparing interpretation results across different algorithms, we prefer a model-independent, post-hoc interpretation method. A common interpretation method that satisfies these conditions is SHAP.

The core of the SHAP (SHapley Additive exPlanations) method is the Shapley value, a concept from game theory originally used to allocate the gains of a cooperative game among its members: the importance of an individual is measured by that individual's marginal contribution to the cooperation. In model interpretation, each feature is treated as an individual contributor, and a feature's marginal contribution to the output, that is, the change in the output after the model includes the feature, is used as the feature's quantified contribution.

For a given feature of a sample, the marginal contribution is computed by training models with the same data set, algorithm, and parameters over all possible feature subsets, taking the difference between the output with the feature included and the output before it is included, and then taking a weighted average. Suppose a model uses M features, let xj be the j-th feature of sample x, let S be a feature subset that does not contain xj, and let f be the model output under a given feature combination. The Shapley value of xj is then calculated as:

$$\phi_j = \sum_{S \subseteq \{x_1,\dots,x_M\}\setminus\{x_j\}} \frac{|S|!\,(M-|S|-1)!}{M!}\,\big[f(S\cup\{x_j\}) - f(S)\big]$$

where the weight

$$\frac{|S|!\,(M-|S|-1)!}{M!}$$

is the ratio of the number of permutations in which the features of subset S precede xj and the remaining (M−|S|−1) features follow it, to the number of all feature permutations.

For example, consider a model with three features A, B, and C, with an output value recorded for each feature combination:

[Table: model output f(S) for each of the eight feature subsets]

There are four feature subsets that do not contain feature A: ∅, {B}, {C}, and {B, C}. The Shapley value of feature A is therefore

$$\phi_A = \tfrac{1}{3}\big[f(\{A\}) - f(\emptyset)\big] + \tfrac{1}{6}\big[f(\{A,B\}) - f(\{B\})\big] + \tfrac{1}{6}\big[f(\{A,C\}) - f(\{C\})\big] + \tfrac{1}{3}\big[f(\{A,B,C\}) - f(\{B,C\})\big]$$

where the weights follow from the formula above: 0!·2!/3! = 1/3 for the empty subset, 1!·1!/3! = 1/6 for each single-feature subset, and 2!·0!/3! = 1/3 for {B, C}.
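To make the weighting concrete, the following is a minimal brute-force sketch in Python (not from the article); the subset output values `f` are made-up stand-ins, since the article's table of outputs was published only as an image.

```python
from itertools import combinations
from math import factorial

features = ["A", "B", "C"]
# Hypothetical model outputs f(S) per feature subset (made-up numbers)
f = {
    frozenset(): 10.0,
    frozenset("A"): 16.0, frozenset("B"): 12.0, frozenset("C"): 11.0,
    frozenset("AB"): 19.0, frozenset("AC"): 18.0, frozenset("BC"): 14.0,
    frozenset("ABC"): 22.0,
}

def shapley(j):
    """Weighted average of feature j's marginal contributions over all subsets."""
    M = len(features)
    rest = [x for x in features if x != j]
    value = 0.0
    for size in range(M):                     # all subsets S that exclude j
        for S in combinations(rest, size):
            S = frozenset(S)
            weight = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
            value += weight * (f[S | {j}] - f[S])
    return value

phi = {j: shapley(j) for j in features}
print(phi)                                    # ≈ {'A': 7.0, 'B': 3.0, 'C': 2.0}
# Additivity: the Shapley values sum to f(all features) - f(no features)
assert abs(sum(phi.values()) - (f[frozenset("ABC")] - f[frozenset()])) < 1e-9
```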

Shapley values are additive: the sum of the Shapley values of all features, plus the benchmark value (the average model output over the modeling sample), equals the model's predicted output for that sample. This additivity also makes the SHAP method easier to explain in business terms.

For the model as a whole, the contribution of a feature is the average of the absolute Shapley values of that feature across all observations.
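In code, with a hypothetical matrix of per-customer Shapley values (one row per customer, one column per feature; the names below are illustrative, not the article's), this is a one-line aggregation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical Shapley matrix: 1,000 customers x 6 features
shap_matrix = rng.normal(size=(1000, 6))
feature_names = ["asset_balance", "txn_amount_1m", "aum_growth",
                 "product_count", "age", "channel_logins"]  # illustrative names

# Global contribution = mean of absolute Shapley values per feature
global_importance = np.abs(shap_matrix).mean(axis=0)
for i in np.argsort(global_importance)[::-1]:
    print(f"{feature_names[i]}: {global_importance[i]:.4f}")
```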

Application practice of SHAP in bank intelligent marketing scenarios

The SHAP interpretation method gives the importance of every feature that affects the model's output. For business people, however, this needs to be translated into easy-to-understand language before it can truly guide understanding and decision-making. CEB has actively explored applying the SHAP method in retail intelligent marketing scenarios and achieved good results. The following takes a model that predicts which customers in a given segment will upgrade to private banking as an example.

The modeling goal is to predict the likelihood that a customer in this segment grows assets and upgrades to a private banking customer within the next month. The business goal is to use the model to pinpoint high-potential target customers among the segment's large customer base, market to them proactively, and drive growth in private banking customers. The steps for applying the SHAP method are as follows.

1. Contribution calculation

After modeling is completed, the relevant information of the prediction model, including its features, algorithm, parameters, and outputs, is fed into the interpretation step, and the Shapley value of each feature is calculated both for single samples and for the model as a whole, yielding quantified feature contributions. For this model, the output value is the log-odds of the predicted probability, that is, logodds = log(p/(1−p)).
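The article does not name its tooling. As an illustration, a minimal sketch of this contribution-calculation step with the open-source `shap` package and an XGBoost classifier (both assumptions, with synthetic data standing in for customer features) could look like this:

```python
import numpy as np
import shap
import xgboost as xgb

# Synthetic stand-in data; in practice X holds the customer features
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)     # exact TreeSHAP for tree ensembles
shap_values = explainer.shap_values(X)    # (n_samples, n_features), log-odds units
base_value = explainer.expected_value     # benchmark: average raw model output

# Additivity: base value + all feature contributions = raw output in log-odds,
# which maps back to a probability via the inverse of logodds = log(p/(1-p))
log_odds = base_value + shap_values.sum(axis=1)
prob = 1.0 / (1.0 + np.exp(-log_odds))
```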

(1) A single sample. Figure 2 shows one customer's feature contributions as a waterfall chart: red arrows pointing right indicate feature values that push the model result up, and blue arrows pointing left indicate feature values that push it down. Features are listed top to bottom in descending order of absolute contribution. The model's average output over all modeled customers is −3.776. The 58 relatively unimportant features collapsed into the bottom row lower this customer's output by 0.1 in total; that is, they make the model predict a slightly lower chance of upgrading to private banking. The ninth most important feature, "current average daily asset growth rate", has the value 0.397 and raises the output by 0.12, making the predicted upgrade more likely. Likewise, the most important feature, "current asset balance", has the value 5,230,913.82 and raises the output by 1.88. With all features superimposed, the customer's final output value is 0.884, which is 4.66 above the average and ranks in the top 1% of all modeled customers; that is, the model predicts this customer's probability of upgrading to private banking is extremely high relative to the segment overall.


Figure 2 The contribution of each feature of a customer
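Continuing the sketch above, a Figure 2-style waterfall chart for one customer can be drawn with the `shap` plotting API (assumed tooling); features beyond `max_display` are collapsed into a single aggregate row at the bottom, like the "58 other features" row described here.

```python
# Continuing from the contribution-calculation sketch above
explanation = explainer(X)   # shap.Explanation carrying values and base values
# Waterfall chart for the first customer; less important features collapse
# into one aggregate row at the bottom of the chart
shap.plots.waterfall(explanation[0], max_display=10)
```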

(2) The model as a whole. Figures 3 and 4 aggregate the feature contributions over all customers. Figure 3 shows each feature's overall contribution, making the importance ranking easy to read: the first five features have a large impact and the subsequent features relatively little. Figure 4 shows the relationship between each feature's value and its contribution; each row is a scatter plot over all customers, and where a row widens, many samples are piled up. Dot colors from red to blue represent the customer's value of the feature from high to low. A pattern of red on the right and blue on the left means customers with high feature values have positive contributions and those with low values have negative ones, i.e., the feature influences the prediction positively; conversely, red on the left and blue on the right indicates a negative influence. The dispersion of the scatter also corroborates the feature's importance. During model business review, this information helps judge whether the model is reasonable and applicable, and reveals which customer groups are more likely to upgrade to private banking.


Fig.3 Feature importance


Fig.4 Scatter plot of feature contribution
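Again continuing the same sketch, Figure 3- and Figure 4-style global views correspond to the `shap` bar and beeswarm plots (assumed tooling):

```python
# Fig. 3 style: global importance as mean(|Shapley value|) per feature
shap.plots.bar(explanation)
# Fig. 4 style: one value-vs-contribution scatter row per feature,
# colored red (high feature value) to blue (low feature value)
shap.plots.beeswarm(explanation)
```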

2. Selection of important features

Business people usually care most about the factors with the highest contributions to the model result, rather than the details of all features (models commonly use dozens or even hundreds of features). Moreover, judging by the contribution ranking, whether for a single sample or for the model as a whole, feature contributions usually begin to fall off sharply around the 4th to 6th feature. In addition, although the analysis above quantifies feature contributions, the raw results still require some expertise to read. We therefore select each customer's top 5 contributing features for further interpretation.
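A minimal sketch of this selection step (with a hypothetical Shapley matrix and feature names, since the real model's features are not listed in the article):

```python
import numpy as np

rng = np.random.default_rng(0)
shap_matrix = rng.normal(size=(3, 60))                      # 3 customers x 60 features
feature_names = np.array([f"feat_{i}" for i in range(60)])  # hypothetical names

TOP_K = 5
# Per customer, indices of the K features with the largest |contribution|
top_idx = np.argsort(-np.abs(shap_matrix), axis=1)[:, :TOP_K]
for c, idx in enumerate(top_idx):
    print(f"customer {c}: {list(feature_names[idx])}")
```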

3. Feature binning

In addition to naming the important features, we also want to describe the level of each feature value qualitatively, and the simplest way to do this is binning. Although binning may already have been done during modeling, reusing exactly the same rules here is inappropriate: binning during modeling serves prediction accuracy, whereas binning here serves business understanding by describing the relative level of feature values. We therefore apply binning rules that are as simple as possible for features of the same data type, as sketched after the list below.

(1) Numeric features are divided into four bins by the quartiles of customers in the modeling scope, with some special values handled separately; for example, outliers such as 999999 may carry a special business meaning and are placed in a bin of their own.

(2) Character features are divided directly into N bins, one per distinct value.

(3) Boolean features are divided directly into two bins by value.
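A sketch of the numeric case with pandas (hypothetical values; 999999 is the sentinel from the example above):

```python
import pandas as pd

# Hypothetical values of one numeric feature; 999999 is a special sentinel
s = pd.Series([1200.0, 53000.0, 999999.0, 80.0, 7600.0, 999999.0, 310.0, 45000.0])

SENTINEL = 999999.0
normal = s[s != SENTINEL]
# Quartile binning of the ordinary values into 4 descriptive boxes
labels = ["small", "relatively small", "relatively large", "large"]
bins = pd.qcut(normal, q=4, labels=labels)

binned = pd.Series(index=s.index, dtype="object")
binned[normal.index] = bins.astype(str)
binned[s == SENTINEL] = "special"   # the sentinel gets a bin of its own
print(binned)
```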

4. Business interpretation

Based on the binning results, features are interpreted by type and converted into understandable text descriptions (a code sketch of this step follows at the end of this section).

(1) Numeric features are described as "large", "relatively large", "relatively small", or "small" according to their bin. For example, if a customer's "transaction amount in the past 1 month" feature exceeds the feature's 3/4 quantile, it is described as "the customer's transaction amount in the past 1 month is large".

Special values get their own descriptions; for example, if the "minimum maturity period of wealth management products held" feature takes the value 999999, it is described as "the customer currently holds no wealth management products".

(2) Character features are converted directly into descriptions according to their bin. For example, if a customer's "wealth management type with the largest amount held at month-end" is "foreign currency wealth management", it is described as "the customer's largest month-end wealth management holding is foreign currency wealth management".

(3) Boolean features are converted directly into descriptions according to their value. For example, if "whether the customer holds a private placement product in the current month" is 1, it is described as "the customer holds a private placement product in the current month".

The business interpretations of each customer's top 5 features are then combined into the explanation underlying that customer's model output, forming a customer description. For example: "the customer's transaction amount in the past 1 month is large and the customer holds a private placement product in the current month, ...; therefore the model predicts a high probability of upgrading to private banking in the next month."
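A sketch of this template-based conversion and concatenation (feature names, types, bins, and wording are illustrative, echoing the examples above):

```python
# Each entry: (feature description, feature type, binned value) - hypothetical
top_features = [
    ("transaction amount in the past 1 month", "numeric", "large"),
    ("holds a private placement product in the current month", "boolean", 1),
    ("wealth management type with the largest month-end amount", "char",
     "foreign currency wealth management"),
]

def describe(name, kind, value):
    """Convert one binned feature into a business-readable phrase."""
    if kind == "boolean":
        return (f"the customer {name}" if value == 1
                else f"it is not the case that the customer {name}")
    return f"the customer's {name} is {value}"   # numeric and char features

phrases = [describe(*item) for item in top_features]
description = ("; ".join(phrases)
               + "; therefore the model predicts a high probability of "
                 "upgrading to private banking in the next month.")
print(description)
```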

5. Marketing Applications

After preliminary exploration and practice, CEB has built a digitally driven marketing system for tiered, full-lifecycle operation of private banking customer value. Models screen target customers at each lifecycle stage, and precision marketing leads are assembled together with customer portrait information and pushed to account managers through the intelligent marketing platform, the private banking account manager workbench, and other system tools for outbound marketing. On top of the existing models, we embed the explanatory descriptions into the marketing leads, giving front-line marketers a convenient, intuitive view of each customer's model impact factors. This not only dispels their doubts about black-box algorithms, but also helps uncover customers' differentiated financial needs and formulate targeted marketing strategies and scripts, improving the marketing success rate. For example, for a customer with a large month-end asset balance, the gap to the private banking qualification threshold is small, and the account manager can recommend advantageous products with a lower entry threshold to lift assets quickly and drive conversion. For a customer with many recent transactions of large amounts, there may be sizeable assets outside the bank, and products and benefits can be matched to the customer's asset allocation and consumption preferences to attract funds into our bank and strengthen customer stickiness. The data show that the marketing success rate for this segment is 2 percentage points higher with the optimized leads than before.

At the same time, the model's overall feature contributions are displayed to head office and branch business managers as visual dashboards, helping them understand the overall portrait structure of the customer segment, monitor its changes, analyze business direction, and guide business decisions.

Conclusion

CEB has applied model interpretation methods widely in intelligent marketing scenarios such as lifting customer asset levels, stabilizing conversion and preventing attrition, and increasing product holdings, breaking down the barrier between black-box algorithms and business personnel, strengthening the model's supporting role in the business, and deepening the application of model strategies. Going forward, CEB will continue to explore business applications of model interpretability, so that machine learning can more effectively empower the bank's digital transformation and create business value.

(This article was published in the first semimonthly issue of April 2024.)
