laitimes

Recommended Strategies: What Product Managers Must Know (2): Three Common Recall Strategies

Author: Everybody is a product manager
Data is the starting point of everything, and recall determines the upper limit of the entire recommendation system. If recall goes wrong at the start, the recommendations of the whole system will be poor.

There are three common recall strategies:

1. Rule-based recall

The most commonly used recall strategy, and the most interpretable one.

Advantages: The strategy logic is clear, the business meaning is explicit, and interpretability is strong.

Disadvantages: Personalization is weak (everyone sees the same results), and it easily causes a Matthew effect in which head items receive more and more exposure.

Applicable scenario: When you first build a recommendation system.

Tag recall

How to use: First applied on music and movie websites. Both content and users are tagged, and the overlap between the two sets of tags is calculated.

Core issues: How to build a scientific and comprehensive tagging system, and how to tag users and content. The mainstream tagging method is still manual tagging.
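As a rough illustration of "calculating the overlap of the two tags", the sketch below scores each item by the Jaccard overlap between the user's tags and the item's tags. The Jaccard measure, the function names, and the sample tags are all assumptions made for illustration; the article does not specify which overlap measure is used.

```python
def tag_overlap(user_tags, item_tags):
    """Jaccard overlap between a user's tag set and an item's tag set."""
    union = user_tags | item_tags
    if not union:
        return 0.0
    return len(user_tags & item_tags) / len(union)


def recall_by_tags(user_tags, catalog, top_n=2):
    """Return up to top_n item ids whose tags overlap most with the user's tags."""
    scored = [(tag_overlap(user_tags, tags), item_id) for item_id, tags in catalog.items()]
    # Drop items with no shared tag, then sort by score (ties broken by id).
    scored = [(score, item_id) for score, item_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_n]]


# Hypothetical, hand-labeled tags for one user and three songs.
user_tags = {"rock", "live", "1990s"}
catalog = {
    "song_1": {"rock", "1990s"},
    "song_2": {"jazz", "live"},
    "song_3": {"classical"},
}
recalled = recall_by_tags(user_tags, catalog)
```

Here song_1 shares two of three tags with the user and ranks first, while song_3 shares none and is not recalled.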

Quality-score recall & category recall

How to use: E-commerce and content recommendation scenarios; suitable for cold start.

For example, in e-commerce the quality score of a material is evaluated from historical sales, positive-review rate, number of favorites, and so on; for content, it is evaluated from the number of views and interactions.

Note: Each quality factor carries a hyperparameter that determines its importance in the overall formula. Hyperparameters are set manually, whereas parameters are obtained from model training.

Quality factor normalization: Apply Min-Max normalization. In e-commerce, normalize within each category separately so that large differences between categories do not distort the scores.
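The quality score described above can be sketched as a weighted sum of Min-Max normalized factors, normalized separately per category. The weighted-sum form, the factor names, the weights, and the sample items are assumptions for illustration; the article only says that each factor has a manually set hyperparameter.

```python
from collections import defaultdict


def min_max(values):
    """Min-Max normalize a list of numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: treat every value as the minimum
    return [(v - lo) / (hi - lo) for v in values]


def quality_scores(items, weights):
    """Weighted sum of factors, each normalized within the item's own category."""
    by_category = defaultdict(list)
    for item in items:
        by_category[item["category"]].append(item)

    scores = {}
    for group in by_category.values():
        normalized = {f: min_max([it[f] for it in group]) for f in weights}
        for idx, it in enumerate(group):
            scores[it["id"]] = sum(weights[f] * normalized[f][idx] for f in weights)
    return scores


# Hypothetical items: socks sell far more units than phones.
items = [
    {"id": "phone_1", "category": "phones", "sales": 900, "praise_rate": 0.90, "favorites": 300},
    {"id": "phone_2", "category": "phones", "sales": 100, "praise_rate": 0.98, "favorites": 100},
    {"id": "sock_1", "category": "socks", "sales": 50000, "praise_rate": 0.95, "favorites": 10},
    {"id": "sock_2", "category": "socks", "sales": 20000, "praise_rate": 0.85, "favorites": 40},
]
# Manually chosen hyperparameters (assumed values).
weights = {"sales": 0.5, "praise_rate": 0.3, "favorites": 0.2}
scores = quality_scores(items, weights)
```

Because normalization happens inside each category, the raw sales gap between socks and phones does not push every phone to the bottom of the ranking.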

Hot recall

How to use: Recall recently popular materials; suitable as a recall strategy for new users. "Hot" is defined by the business, and the statistical periods (long, medium, short) need to be designed, denoted here as x, y, and z.
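One possible way to combine the long, medium, and short periods x, y, and z is to count interactions inside each window and weight shorter windows more heavily. The window lengths (30, 7, and 1 days), the weights, and the event format below are assumptions for illustration; in practice the business defines what "hot" means.

```python
from datetime import datetime, timedelta


def hot_recall(events, now, windows_days=(30, 7, 1), weights=(0.2, 0.3, 0.5), top_n=2):
    """Rank items by weighted interaction counts over long, medium and short windows.

    events: list of (item_id, timestamp) interaction records.
    windows_days: the periods x, y, z in days (assumed values).
    weights: importance of each window (assumed values).
    """
    scores = {}
    for item_id, ts in events:
        age = now - ts
        for days, weight in zip(windows_days, weights):
            # An event counts once for every window that contains it.
            if timedelta(0) <= age <= timedelta(days=days):
                scores[item_id] = scores.get(item_id, 0.0) + weight
    ranked = sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
    return ranked[:top_n]


# Hypothetical interaction log.
now = datetime(2024, 1, 31)
events = [("a", now - timedelta(days=20))] * 3
events += [("b", now - timedelta(hours=12)), ("c", now - timedelta(days=3))]
hot_items = hot_recall(events, now)
```

In this sample, a single very recent interaction on b outweighs three older interactions on a, which reflects the choice to favor the short window.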

High click-through-rate recall

How to use: Recall materials with a high click-through rate (CTR), the core metric of the "CTR estimation model".

Repurchase recall

How to use: Frequently used in fresh-food e-commerce.

Implementation: Aggregate purchased goods per user and normalize the purchase counts with Min-Max normalization. In general-purpose e-commerce, applying repurchase recall to big-ticket goods leaves users with a poor impression.
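The implementation above can be sketched as counting purchases per item for one user and applying Min-Max normalization to the counts. The function name and the sample order history are hypothetical.

```python
from collections import Counter


def repurchase_scores(orders):
    """Min-Max normalized purchase counts for the items bought by one user."""
    counts = Counter(orders)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        # Every item was bought equally often, so no item stands out.
        return {item: 0.0 for item in counts}
    return {item: (n - lo) / (hi - lo) for item, n in counts.items()}


# Hypothetical order history of a single user.
orders = ["milk", "milk", "milk", "eggs", "eggs", "fridge"]
scores = repurchase_scores(orders)
```

Items bought only once, such as the fridge in this sample, receive the lowest score and would not be recalled for repurchase.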

2. Collaborative filtering

Collaborative filtering is the most classic family of recommendation algorithms. It includes the item-based algorithm (Item-CF, 1998) and the user-based algorithm (User-CF, 1992). The name means "collaboration + filtering": use group behavior data to find patterns, measure the similarity between materials and between users, filter out materials and users with low similarity, and then sort the rest.

Core question: How to calculate the similarity between items and between users.

Advantages: The algorithm logic is simple and easy to implement, yet it produces good results with a certain degree of personalization.

Disadvantages: As with rule-based recall, the cold-start problem is pronounced and there is a certain Matthew effect; popular head items tend to be associated with many other products.

User-based collaborative filtering (UserCF algorithm):

1. Find the set of users who are similar to the target user, and take the users with the highest similarity as the candidate set.

Jaccard coefficient (similarity to user A): B = 0.4; C = 0.25; D = 0.2; E = 0.75, so B and E are the most similar.

2. Find the popular materials in that set, and recommend those that the target user has not yet been exposed to.

Among the products browsed by B and E, A has not browsed d and e, so A's interest in each of them is estimated.

P(A,d) = 0.4*1 + 0.75*0 = 0.4; P(A,e) = 0.4*1 + 0.75*1 = 1.15. A's interest in product e is therefore higher, so product e is recommended to the user.
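The two UserCF steps can be sketched as follows. The original data table is not available here, so the browsing sets below are hypothetical: they were chosen only so that the Jaccard similarities of B and E to A come out as the 0.4 and 0.75 used in the calculation above. The function names and the choice of k = 2 neighbours are also assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_cf_scores(target, browsed, k=2):
    """Score unseen items by the summed similarity of the k most similar users."""
    sims = {
        user: jaccard(browsed[target], items)
        for user, items in browsed.items()
        if user != target
    }
    neighbours = sorted(sims, key=lambda user: -sims[user])[:k]
    scores = {}
    for user in neighbours:
        # Only items the target user has not browsed are candidates.
        for item in browsed[user] - browsed[target]:
            scores[item] = scores.get(item, 0.0) + sims[user]
    return sims, scores


# Hypothetical browsing history (not the article's original table).
browsed = {
    "A": {"a", "b", "c"},
    "B": {"a", "b", "d", "e"},
    "C": {"a", "d"},
    "D": {"a", "d", "e"},
    "E": {"a", "b", "c", "e"},
}
sims, scores = user_cf_scores("A", browsed)
best_item = max(scores, key=scores.get)
```

With this data the two nearest neighbours of A are E and B, and product e ends up with the higher score, matching the conclusion in the text.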

Item-based collaborative filtering (ItemCF algorithm): It is widely used at major Internet companies, and item similarity is calculated with cosine similarity.

For example, 6 users and 5 products.

Calculate the similarity between products: Use cosine similarity to calculate the similarity between each pair of products.

Based on the target user's historical browsing behavior and the similarity between products, recommend products that the user is likely to be interested in and has not yet browsed.

There are only 5 products in this example. Target user A has browsed a, b, and c but has not browsed d and e, so we estimate P(A,d) and P(A,e).

P(A,d) = 0.5*1+0*1+0.67*1=1.17

P(A,e) = 0.5*1+0.35*1+0.89*1=1.74

Therefore, product e is recommended to user A first.
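A minimal sketch of the ItemCF scoring step is shown below. The similarity values are the ones quoted in the two formulas above; the cosine function is included to show how such values could be computed from item columns, and the two example columns are hypothetical because the original user-product table is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def item_cf_score(history, sim_to_candidate):
    """Sum of (similarity to the candidate) * (interaction weight) over browsed items."""
    return sum(sim_to_candidate[item] * weight for item, weight in history.items())


# User A has browsed a, b and c (interaction weight 1 each).
history = {"a": 1, "b": 1, "c": 1}
# Similarities of a, b, c to the candidates d and e, as quoted in the text.
sim_d = {"a": 0.5, "b": 0.0, "c": 0.67}
sim_e = {"a": 0.5, "b": 0.35, "c": 0.89}
p_d = item_cf_score(history, sim_d)
p_e = item_cf_score(history, sim_e)

# Hypothetical columns for two items over 6 users (1 = browsed).
example_similarity = cosine([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

The two hypothetical columns share two users out of three each, so their cosine similarity is 2/3.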

The similarities and differences between the UserCF algorithm and the ItemCF algorithm are summarized

Graph-based model methods

There are two main steps:

1) Convert the data table into a bipartite graph

2) Judge the correlation between two vertices based on the number of paths between them, the length of those paths, and the out-degree of the nodes the paths pass through.

For example, the path "A-a-B-c" has length 3. There is only one path from A to c but two paths from A to e, so A is more strongly correlated with e than with c.

To decide which of the two paths from A to e indicates the stronger correlation, compare the out-degrees of the nodes on each path (the out-degree is the number of other vertices a vertex connects to): the larger the out-degree, the weaker the correlation.
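The two steps can be sketched on a small bipartite graph. The graph below is hypothetical, built so that there is one length-3 path from A to c and two from A to e, as in the example. Weighting each path by the inverse product of the degrees of its intermediate nodes is an assumption made for illustration; the article only states that a larger out-degree means a weaker correlation.

```python
def path_scores(user_items, target):
    """Count length-3 paths (user-item-user-item) and weight them by node degree."""
    # Step 1: build the item side of the bipartite graph.
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    # Step 2: walk target -> item -> other user -> candidate item.
    counts, scores = {}, {}
    for item in user_items[target]:
        for other in item_users[item]:
            if other == target:
                continue
            for candidate in user_items[other]:
                if candidate in user_items[target]:
                    continue
                counts[candidate] = counts.get(candidate, 0) + 1
                # Assumed weighting: high-degree intermediate nodes contribute less.
                weight = 1.0 / (len(item_users[item]) * len(user_items[other]))
                scores[candidate] = scores.get(candidate, 0.0) + weight
    return counts, scores


# Hypothetical graph: A-a-B-c, A-a-B-e and A-b-C-e are the length-3 paths from A.
user_items = {
    "A": {"a", "b"},
    "B": {"a", "c", "e"},
    "C": {"b", "e"},
}
counts, scores = path_scores(user_items, "A")
```

Of the two paths to e, the one through C (degree 2) contributes more than the one through B (degree 3), which illustrates the out-degree rule.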

3. Vector-based recall

1. Latent factor model (implicit semantic model)

The most classic application of vector-based recall is the latent factor model (LFM), also translated as the implicit semantic model or latent vector model.

In practice, the user-material matrix is very sparse, which makes it hard to estimate directly. The idea of the latent factor model is to mine latent feature attributes of users and materials, map both into the same feature dimensions (generally illustrated as four quadrants), and then compare them.

Core: Decompose the co-occurrence matrix (the user-material interaction matrix) into two small matrices (a user matrix and a material matrix) that share the same vector dimension.

There are three common ways to decompose matrices:

Method 1: Eigenvalue decomposition

It applies only to N x N square matrices. Most user x material matrices are not square, so it is not applicable.

Method 2: Singular value decomposition

It applies to any M x N matrix, but it requires a dense matrix: missing values must be filled in with approximations or averages before it can be applied, which makes the computation complex and resource-intensive.

Method 3: Gradient descent method

Funk SVD, also known as LFM, compares predicted values with the actual ratings, uses the mean squared error as the loss function, and iterates with gradient descent until the model converges.
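The gradient descent approach can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the ratings, the number of latent dimensions k, the learning rate, and the L2 regularization term reg are all assumed values, and the regularization term is an addition that the article does not mention.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user vector P[u] and item vector Q[i]."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def funk_svd(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=300, seed=0):
    """Factorize observed ratings into user and item matrices with SGD.

    ratings: list of (user_index, item_index, rating) for observed cells only.
    Returns P (n_users x k), Q (n_items x k) and the MSE after each epoch.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]

    def mse():
        return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)

    history = [mse()]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error, plus assumed L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        history.append(mse())
    return P, Q, history


# Hypothetical sparse ratings for 4 users and 4 items (scale 1 to 5).
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 3, 5),
]
P, Q, history = funk_svd(ratings, n_users=4, n_items=4)
missing_prediction = predict(P, Q, 1, 1)  # estimate for an unobserved cell
```

Only observed cells enter the loss, so the sparse matrix does not need to be filled in beforehand, and any missing cell can be estimated afterwards from the two small matrices.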

Advantages and disadvantages of the latent factor (implicit semantic) model:

Advantages:

1) Strong generalization ability. It alleviates the matrix sparsity problem to a certain extent.

2) Low computational complexity. The complexity is (m+n)*k, whereas the matrix used by collaborative filtering is m*m or n*n.

3) Better flexibility and scalability. The resulting vectors can be combined or concatenated with other features, and can also be combined with deep learning neural networks.

Disadvantages: Only the interactions between users and materials are considered. It is inconvenient to add user features, material features, contextual features, and other interaction features, so the model itself has certain limitations.

2. Two-tower model

Advantages:
