Strategy Product Manager: Six commonly known algorithms for model training

Author: Everyone is a Product Manager, 2024-05-14 14:07:00

Strategy product managers need to understand some algorithm logic in order to move their work forward. In this article, the author introduces six common algorithms and discusses how to choose algorithms for product design models.

1. Commonly used algorithms in the industry

As strategy product managers who work with algorithm engineers, we need to understand the algorithms our engineering colleagues commonly use. Below I introduce the underlying logic of each algorithm and the types of tasks it is suited to.

1. Logistic regression (LR)

Model Training Category: Supervised Learning Algorithms.
Applicable problem tasks: classification.
Algorithm features: low complexity, strong interpretability, and good online performance.

Function formula:

$$ y = \frac{1}{1 + e^{-w^{T}x}} $$

Here y is the model's estimate, with values in the range (0, 1). x is the vector of feature values fed into the model, that is, the specific values of the features finally used. T denotes the transpose, a notational device with no numerical meaning of its own. w is the vector of parameters the model learns during training, one for each feature. Taking a CTR estimation model as an example, the value predicted by the logistic regression model represents, in business terms, how interested the user is in the item.

In addition, although linear regression and logistic regression are both abbreviated as LR, linear regression solves regression problems, while logistic regression solves classification problems. A logistic regression model contains a linear regression model (the weighted sum of the features) and belongs to the family of generalized linear models.
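
To make the relationship concrete, here is a minimal sketch of a logistic regression prediction. It is an illustration only: the function names `sigmoid` and `predict_ctr`, the weights, and the feature values are all hypothetical and do not come from any real model.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(weights, features):
    """Logistic regression estimate: sigmoid of the weighted feature sum w^T x."""
    # The weighted sum is the "linear regression" part of the model.
    linear_score = sum(w * x for w, x in zip(weights, features))
    # The sigmoid turns that score into a value that can be read as a probability.
    return sigmoid(linear_score)


# Hypothetical trained weights and one user's feature values (illustration only).
weights = [0.8, -1.2, 0.5]
features = [1.0, 0.5, 2.0]

estimate = predict_ctr(weights, features)
print(f"estimated click probability: {estimate:.3f}")
```

The interpretability mentioned above comes from the weights: a positive weight pushes the estimate up when its feature grows, and a negative weight pushes it down.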

2. K-nearest neighbors (KNN)

Model Training Category: Supervised Learning Algorithms.
Applicable problem tasks: classification, regression.
The value of K is the key factor; it needs to be chosen and verified by cross-validation (training set + test set).
Note: The idea behind the KNN algorithm is something that every strategy product manager needs to understand.

Classification Tasks:

1. Calculate the distance between the point to be classified (black cross) and other points of known categories.

2. Sort the known points by distance in ascending order. Among the K nearest points, the category with the largest share becomes the category of the point to be classified. Common distance measures are (1) Euclidean distance and (2) Manhattan distance.

Regression Tasks:

The overall idea is the same as for the classification task: the predicted value for a point equals the average of the values of its K nearest points.
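
The classification and regression steps above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the helper names (`k_nearest`, `knn_classify`, `knn_regress`) and the sample data are assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def k_nearest(samples, query, k, distance=euclidean):
    """Sort (point, target) pairs by distance to query and keep the k closest."""
    return sorted(samples, key=lambda s: distance(s[0], query))[:k]


def knn_classify(samples, query, k, distance=euclidean):
    """Return the most common label among the k nearest points."""
    neighbors = k_nearest(samples, query, k, distance)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def knn_regress(samples, query, k, distance=euclidean):
    """Return the average target value of the k nearest points."""
    neighbors = k_nearest(samples, query, k, distance)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical data: points with known categories, and points with known values.
labeled = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"),
           ((6.0, 5.5), "B"), ((1.2, 0.8), "A")]
valued = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

predicted_label = knn_classify(labeled, (1.1, 1.0), k=3)
predicted_value = knn_regress(valued, (2.1,), k=3)
print(predicted_label, predicted_value)
```

Note that every prediction sorts the whole sample set, which is why the cost grows with the number of samples.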

Summary:

The KNN algorithm has no model training step; it is applied directly, so its time complexity in the training stage is 0. In the application stage, however, the cost of each prediction rises sharply as the sample size and complexity grow, so KNN cannot be used in scenarios with high efficiency requirements.

3. Bayes model

Model Training Category: Supervised Learning Algorithms.
Applicable problem tasks: classification.
Model direction: "inverse probability" problems, for example email classification and weather prediction.

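Function formula (Bayes' theorem):

$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$

Here P(A) is the prior probability of A, P(B|A) is the probability of observing B when A holds, and P(A|B) is the "inverse" probability of A once B has been observed.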
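
As a hedged worked example of the "inverse probability" idea, the sketch below applies Bayes' theorem to email classification: given how often a word appears in spam and in normal mail, it estimates how likely a mail containing that word is to be spam. All probabilities are made-up numbers, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical statistics, for illustration only.
p_spam = 0.2               # P(spam): share of all mail that is spam
p_word_given_spam = 0.6    # P(word | spam): the word appears in 60% of spam
p_word_given_ham = 0.05    # P(word | not spam): it appears in 5% of normal mail

# Law of total probability: how often the word appears in any mail.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# The inverse question: given the word was seen, how likely is the mail spam?
p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(f"P(spam | word) = {p_spam_given_word:.2f}")
```

With these assumed numbers, seeing the word raises the spam estimate from the 20% prior to 75%.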

4. K-means clustering (K-Means)

Model training category: Unsupervised learning algorithms.
Applicable Problem Task: Clustering.
K-Means has no separate model training step; it uses heuristic iteration, and the value of K is chosen according to the business scenario.

Steps:

Set the value of K and divide all samples into K clusters.
Recalculate the centroid of each new cluster, then reassign every sample to the cluster with the nearest centroid.
Keep repeating until the assignments no longer change.
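
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the initial centroids are simply the first K points (real implementations usually pick them more carefully), and the names `k_means`, `nearest_centroid`, and `mean_point` plus the sample points are hypothetical.

```python
import math


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    # Assumption: use the first k points as the starting centroids.
    centroids = list(points[:k])
    assignments = None
    for _ in range(max_iterations):
        # Assign every sample to its nearest centroid.
        new_assignments = [nearest_centroid(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering is stable
        assignments = new_assignments
        # Recalculate the centroid of each cluster from its members.
        for cluster in range(k):
            members = [p for p, a in zip(points, assignments) if a == cluster]
            if members:
                centroids[cluster] = mean_point(members)
    return centroids, assignments


# Hypothetical 2-D samples forming two visible groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.8, 1.1), (8.2, 7.9), (7.9, 8.3)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

Because the result depends on the starting centroids, a different initial choice can lead to a different clustering, which is one reason the method is described as heuristic.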

5. Decision tree

Model Training Category: Supervised Learning Algorithms.
Applicable problem tasks: classification, regression.
Core idea: split the dataset step by step based on discriminating variables (features).

Basic Framework Elements:

1. Root node: contains all the original sample data, which will be further divided into multiple subsets.

2. Decision nodes and leaf nodes: A leaf node is one that is "no longer divided", although it could in principle still be split; a decision node continues to be divided according to features.

3. Parent node and child node: A node that is split into child nodes is called the parent node of those child nodes.

Types of decision trees: (1) Classification trees (2) Regression trees

Evaluating decision tree quality: which combination of features should be chosen to build the best tree?

Classification tree: evaluated with Gini impurity; the lower the impurity, the better the split.
Regression tree: evaluated with variance; the smaller the variance, the better the model fit.
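
To show how these two indicators compare candidate splits, here is a small sketch. The helper `weighted_score` (the size-weighted average of the two child nodes' scores) and the sample labels and values are assumptions chosen for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class shares; 0 means the node is pure."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def variance(values):
    """Average squared deviation from the mean of the node's values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def weighted_score(left, right, score):
    """Score of a split: child scores weighted by the share of samples in each child."""
    total = len(left) + len(right)
    return len(left) / total * score(left) + len(right) / total * score(right)


# Classification: two hypothetical ways to split the same four samples.
mixed_split = weighted_score(["yes", "no"], ["yes", "no"], gini_impurity)
clean_split = weighted_score(["yes", "yes"], ["no", "no"], gini_impurity)

# Regression: a hypothetical split that groups similar values together.
regression_split = weighted_score([10.0, 12.0], [30.0, 32.0], variance)

print(mixed_split, clean_split, regression_split)
```

The tree-building procedure would prefer `clean_split` over `mixed_split`, since its impurity is lower.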

Key Parameters of Decision Tree:

Minimum number of samples required to split a node: too large a value leads to underfitting, too small a value leads to overfitting; tune it with cross-validation.
Minimum number of samples in a leaf node: prevents the tree from growing too many leaf nodes; use a smaller value when the positive and negative samples are imbalanced.
Maximum depth of the decision tree: determined by cross-validation.
Maximum number of leaf nodes: controls the overall number of leaves.
Maximum number of features used in a split: according to modeling experience, the square root of the total number of features is a good choice.
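
As a rough illustration, the parameters above can be collected into one configuration. The key names below follow the arguments of scikit-learn's `DecisionTreeClassifier`, which is an assumption, since the article does not name a library. The default numbers and the function name `tree_params` are hypothetical starting points, not recommendations.

```python
import math


def tree_params(n_features, min_samples_split=20, min_samples_leaf=5,
                max_depth=6, max_leaf_nodes=32):
    """Bundle the key decision tree parameters into one dictionary."""
    return {
        # Minimum number of samples a node needs before it may be split.
        "min_samples_split": min_samples_split,
        # Minimum number of samples each leaf must keep.
        "min_samples_leaf": min_samples_leaf,
        # Maximum depth of the tree.
        "max_depth": max_depth,
        # Cap on the overall number of leaf nodes.
        "max_leaf_nodes": max_leaf_nodes,
        # Rule of thumb from the text: square root of the feature count.
        "max_features": max(1, math.isqrt(n_features)),
    }


params = tree_params(n_features=100)
print(params)
```

In practice each value other than `max_features` would be tuned by cross-validation rather than fixed in advance.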

6. Deep neural network (DNN)

There is plenty of material on this topic online, so readers can look up the details themselves.

To put it simply, "depth" in deep learning refers to the hidden layers, which sit between the input layer and the output layer: the more hidden layers, the greater the depth. The differences between deep learning and conventional neural network algorithms are mainly reflected in the training data, the training methods, and the number of layers.
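
The notion of depth can be illustrated with a tiny forward pass. This is only a sketch: the weights are arbitrary made-up numbers, ReLU is chosen as the activation purely for illustration, and the names `dense`, `relu`, and `forward` are hypothetical.

```python
def relu(values):
    """Activation function: negative values become 0."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias per unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_layers, output_layer):
    """Pass the inputs through every hidden layer, then through the output layer."""
    activations = inputs
    for weights, biases in hidden_layers:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = output_layer
    return dense(activations, out_weights, out_biases)


# Two hidden layers with two units each (arbitrary example weights).
hidden_layers = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),
    ([[0.3, 0.7], [-0.6, 0.2]], [0.05, 0.0]),
]
output_layer = ([[1.0, -1.0]], [0.0])

depth = len(hidden_layers)  # the "depth" is the number of hidden layers
output = forward([1.0, 2.0], hidden_layers, output_layer)
print(depth, output)
```

Adding more entries to `hidden_layers` makes the network deeper without changing the input or output layer.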

At present, in the field of product planning, deep learning can solve visual recognition problems in the security and retail industries, and it also powers speech recognition and natural language processing applications such as ChatGPT.

2. Selecting algorithms for product design models

Several algorithms may suit the same business scenario, but as product managers we need to focus on two key aspects of a model: the accuracy of its predictions and its interpretability.

For heavily regulated scenarios such as financial risk control, we prefer interpretable models, while for product experience scenarios such as search, advertising, and recommendation, we pay more attention to how well the model performs in use.

This article was originally published by @产品研习中 on Everyone is a Product Manager. Reproduction without the permission of the author is prohibited

The title image is from Unsplash and is licensed under CC0

The views in this article represent only the author's own. The Everyone is a Product Manager platform only provides information storage space services.