This article explains regression and classification from three angles: their essence, their principles, and their algorithms.

Regression and classification

1. The essence of regression and classification

Regression and classification are two basic prediction problems in machine learning. The essential difference between them is the type of output: the output of a regression problem is a continuous numerical value, and the output of a classification problem is a finite, discrete category label.

The essence of regression: to find the relationship between the independent and dependent variables so that the output value of a new, unseen data point can be predicted. For example, predicting the price of a house from its size, location, and other characteristics.

The essence of regression

Number of independent variables:
Simple regression: Regression analysis that involves only one independent variable and one dependent variable.
Multiple regression: Regression analysis involving two or more independent variables and one dependent variable.
Relationship between the independent variable and the dependent variable:
Linear regression: The relationship between the independent variables and the dependent variable is assumed to be linear, i.e., the dependent variable is a linear combination of the independent variables plus an intercept.
Nonlinear regression: The relationship between the independent variables and the dependent variable is nonlinear and usually has to be described by a nonlinear model.
Number of dependent variables:
Univariate regression: Regression analysis in which there is only one dependent variable, regardless of the number of independent variables.
Multivariate regression: Regression analysis involving multiple dependent variables. In this case, the model attempts to predict the values of several dependent variables at the same time.

The essence of classification: to assign input data to predefined categories based on its characteristics. For example, determining the category (cat, dog, flower, etc.) that an image belongs to based on its content.

The essence of classification

Binary Classification: The classification task has exactly two categories. Common algorithms for binary classification include logistic regression and support vector machines. For example, deciding whether an image shows a cat is a binary problem, because there are only two possible answers: yes or no.
Multi-Class Classification: The classification task has more than two categories. Multi-class classification assumes that each sample has one and only one label: a fruit can be an apple or a pear, but it cannot be both. Common algorithms for multi-class classification include decision trees and random forests. For example, classifying a set of fruit images, each of which could show an orange, an apple, a pear, etc., is a multi-class problem.
Multi-Label Classification: Each sample is given a set of target labels, which can be thought of as attributes of the data point that are not mutually exclusive. There are two approaches to multi-label classification: transform the problem into traditional classification problems, or adapt an existing algorithm so that it handles multiple labels directly. For example, a text may relate to religion, politics, finance, and education at the same time; this is a multi-label problem because one text can carry several tags at once.
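
The first multi-label approach (transforming the problem) can be illustrated with a small sketch. It assumes the simplest transformation, often called binary relevance, in which every label becomes its own yes/no target. The documents, tags, and the function name binary_relevance_targets are made up for illustration and are not from the original article.

```python
# Hypothetical toy data: each document carries a set of topic tags.
docs = {
    "doc1": {"politics", "finance"},
    "doc2": {"religion"},
    "doc3": {"education", "politics", "finance"},
}

def binary_relevance_targets(samples, labels):
    """Turn one multi-label problem into one binary target list per label."""
    names = sorted(samples)  # fixed sample order: doc1, doc2, doc3
    return {
        label: [1 if label in samples[name] else 0 for name in names]
        for label in labels
    }

labels = ["education", "finance", "politics", "religion"]
targets = binary_relevance_targets(docs, labels)
for label in labels:
    # Each row is the 0/1 target vector for one independent binary classifier.
    print(label, targets[label])
```

Each target list could then be handed to any binary classifier. The trade-off of this transformation is that it ignores correlations between labels.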

2. Principles of regression and classification

Linear regression VS logistic regression

The principle of regression: to explore the relationship between independent and dependent variables by building a mathematical model between them.

linear regression

Linear Regression: the main steps for solving for the weights (w) and the bias (b).

Solving for the weights (w) and the bias (b)

Initialize weights and biases: Select initial values for weights w and bias b, and prepare training data X and label y.
Define a loss function: Choose a loss function, such as mean squared error, to measure the difference between the model prediction and the actual value.
Apply the gradient descent algorithm: Use the gradient descent algorithm to iteratively update w and b to minimize the loss function until the stop condition is met.

The gradient descent algorithm iteratively updates w and b

Obtain and verify the final parameters: When the algorithm converges, the final w and b are obtained, and the model performance is checked on the validation set.
Build the final model: Build a linear regression model with the final w and b for new data prediction.
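
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single input feature, mean squared error as the loss, and a fixed number of iterations as the stop condition. The learning rate, the iteration count, the toy data, and the function names are assumptions chosen for the example.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # initial values for the weight and the bias
    n = len(xs)
    for _ in range(epochs):  # stop condition: fixed number of iterations
        # Prediction errors of the current model on the training data.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    """Apply the fitted linear model to a new input."""
    return w * x + b

# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3), round(predict(5.0, w, b), 3))
```

In practice the stop condition is often a threshold on the change in loss rather than a fixed iteration count, and the fitted parameters would be checked on held-out validation data as the steps describe.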

Prediction on new data

The principle of classification: to group things or concepts that share common characteristics into the same category, and to place things or concepts with different characteristics into different categories.

Logistic regression

Logistic Regression: A binary classification algorithm that maps linear regression results to probability through the sigmoid function.

Feature engineering: Transform and enhance the original feature to better represent the problem.
Model building: Build a logistic regression model and use the sigmoid function to map linear combinations to probabilities.
Model training: Train a model by optimizing algorithms, such as gradient descent, to minimize the loss function.
Model evaluation: Evaluate the performance of your model using a validation set or test set.
Prediction: Apply the trained model to classify and predict new data.
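
The model building, training, and prediction steps can be sketched as follows. This is a minimal illustration, assuming one input feature, the log loss (cross-entropy) as the loss function, and plain batch gradient descent. The toy data, hyperparameters, and function names are assumptions for the example, and feature engineering and evaluation on a held-out set are omitted.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_regression(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        probs = [sigmoid(w * x + b) for x in xs]
        # Gradient of the mean log loss: average of (p - y) * x and (p - y).
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_class(x, w, b, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Hypothetical one-feature data; the classes overlap slightly on purpose.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic_regression(xs, ys)
print(predict_class(0.5, w, b), predict_class(4.5, w, b))
```

The 0.5 threshold is a common default rather than a requirement; it can be moved when false positives and false negatives have different costs.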

Identification of dogs and cats

3. Regression and classification algorithms

Regression algorithm: Mainly used to predict numerical data.

Linear Regression: This is the most basic and common regression algorithm that assumes a linear relationship between the dependent and independent variables and fits the data by minimizing the squared error between the predicted and actual values.
Polynomial Regression: Polynomial regression can be used when the relationship between the independent and dependent variables is nonlinear. It captures nonlinear relationships by introducing higher-order terms of independent variables to fit the data.
Decision Tree Regression: Decision tree regression is a tree-based regression method that partitions the data space by constructing a decision tree and fits a simple model (such as a constant or a linear model) at each leaf node. Decision tree regression is easy to understand and interpret, can handle nonlinear relationships, and is insensitive to the scaling of the features.
Random Forest Regression: Random Forest Regression is an ensemble learning method that improves regression performance by constructing multiple decision trees and combining their predictions. Random forest regression is able to handle high-dimensional data and nonlinear relationships, and is robust to noise and outliers.
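
To make the idea of dividing the data space and fitting a constant on each leaf concrete, here is a minimal sketch of a regression tree with a single split (a "stump"). It assumes one input feature and squared error as the split criterion. The data and the function names are made up for illustration, and a full decision tree would apply the same search recursively to each side.

```python
def fit_regression_stump(xs, ys):
    """Find the one threshold split that minimises total squared error."""
    def mean(values):
        return sum(values) / len(values)

    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate two equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            # Each leaf predicts the mean target of its training samples.
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values to split")
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def stump_predict(x, threshold, left_value, right_value):
    """Return the constant stored in the leaf that x falls into."""
    return left_value if x <= threshold else right_value

# Hypothetical data with two clearly separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
threshold, left_value, right_value = fit_regression_stump(xs, ys)
print(threshold, left_value, right_value)
```

A random forest in the sense described above would train many deeper trees on resampled data and average their predictions.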

Classification algorithm: It is mainly used to discover category rules and predict the category of new data.

Logistic Regression: Despite the name "regression", logistic regression is a classification algorithm that is commonly used for binary classification problems. It passes the output of a linear model through the logistic (sigmoid) function, which maps it to (0,1), to obtain the probability that a sample belongs to a given category. Because the model outputs a probability for a binary dependent variable, the problem can also be seen as a regression on that probability.
Support Vector Machine (SVM): The support vector machine is a classification algorithm based on statistical learning theory. It classifies by finding the hyperplane that maximizes the margin between the classes. SVMs perform well in high-dimensional spaces and with small sample sizes, and can be extended to nonlinear problems with kernel functions.
K-Nearest Neighbors (KNN): K-Nearest Neighbors is an instance-based learning algorithm that determines the class of an input sample from the classes of its K nearest neighbors in the training data. The KNN algorithm is simple and has no explicit training phase beyond storing the data, but prediction can be slow on large datasets because distances to the stored samples must be computed.
Naive Bayes classifier: Naive Bayes is a classification algorithm based on Bayes' theorem that assumes the features are conditionally independent of each other given the class (the "naive" assumption). Although this assumption often does not hold in practice, naive Bayes classifiers still perform well in many areas, especially text classification and spam filtering.
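
Of the classifiers listed above, KNN is short enough to sketch in full. The example below is a minimal illustration, assuming Euclidean distance and a simple majority vote. The points, labels, and the function name knn_predict are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label the query with the majority class among its k nearest neighbors."""
    # "Training" is just keeping the data; all work happens at query time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors for two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 7.0), (6.5, 6.0)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(points, labels, (1.8, 1.6)))
print(knn_predict(points, labels, (6.4, 6.6)))
```

Every prediction scans the whole training set, which is why the method slows down as the dataset grows. An odd k is a common choice in binary problems because it avoids tied votes.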

