Artificial intelligence is the defining direction of our era and the development trend of the next generation of information technology over the coming 5-10 years. Whatever your role, understanding the basic ideas, main methodological frameworks, and main processes of artificial intelligence will be of great benefit when negotiating business and collaborating with technical departments. Colleagues from the Algorithm Research Center share here the overall framework of artificial intelligence technology and the basic principles of entry-level machine learning, to help us better understand artificial intelligence.
Part.1 Overall framework of artificial intelligence technology
01 What is a model?
Everyone has probably heard of the recently popular ChatGPT. It is a natural language processing model, so what is a model? A model is an abstraction of the objective world, one that is constantly revised and improved.
Take astronomy: how do we come to know the Earth? When we were young, the globe was our model, and we knew the Earth was round; with further study we corrected the model and learned that the Earth is closer to an ellipsoid.
Likewise, the periodic table is a model of the chemical elements: it expresses, in the form of a table, the underlying laws that govern the elements.
Another example is the point particle in physics. The particle itself is a model, and between middle school and university you learn that when an object can no longer be represented as a particle, it may instead be modeled as a rigid body.
Each of these so-called models is an abstraction of something, and the model itself is continually improved and revised.
02 From experience and patterns to models
As individuals, how do we understand the objective world?
The first stage is the empirical level. A classic introductory question: what is a bird? As children we look at lots of picture cards and learn what a bird is. At first we might learn that birds can fly, until one day an adult tells us that chickens cannot fly yet are still birds. These are results of learning by summarizing experience (hens are birds, sparrows are birds). The first level of our understanding of the objective world is experience, which works much like induction: it summarizes phenomena and extracts regularities from them.
The second stage is the pattern. What is a pattern? An individual's experience is limited, but by summarizing a large amount of experience we can extract a set of patterns or rules. Combining the hen and sparrow examples above, we revise our empirical understanding and may conclude that anything with feathers is a bird; that is a pattern.
The third stage uses mathematical or physical models. Suppose birds carry a specific DNA fragment; from that fragment we can determine whether an unknown species is a bird.
The fourth stage is artificial intelligence. How does AI identify birds? This is essentially an image recognition problem. The AI model receives a picture and learns features from it. We may not know whether it is looking at the feathers, the beak, or the feet, but it can capture features from the picture on its own. We only need to tell it which pictures show birds and which do not; after training on a large number of pictures, it grasps the underlying rules.
(For example, the dark areas in the figure above are the regions the AI model focuses on when judging whether each of the four birds below belongs to the specific species shown on the left; the model mainly captures the features of the bird's beak.)
Therefore, with an artificial intelligence model we can feed in data directly, without mining specific patterns or specifying laws and rules in advance, and let the model explore and learn the objective world by itself.
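To make the idea concrete, here is a minimal, hypothetical sketch of this kind of supervised image training, written with PyTorch. Random tensors stand in for a real collection of labeled photos; only the labels ("bird" / "not bird") are supplied, and the model adjusts itself to capture the distinguishing features.

```python
import torch
import torch.nn as nn

images = torch.randn(64, 3, 32, 32)          # 64 placeholder 32x32 RGB "photos"
labels = torch.randint(0, 2, (64,))          # 1 = bird, 0 = not a bird

model = nn.Sequential(                       # a tiny convolutional classifier
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                       # repeated exposure to the labeled pictures
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)    # how far predictions are from the given labels
    loss.backward()                          # compute how to adjust the weights
    optimizer.step()                         # adjust the model to make fewer mistakes
```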
Part.2 Fundamentals of entry-level machine learning
01 Simple principles of machine learning
Machine learning is a branch of artificial intelligence, and in many contexts it has become almost synonymous with it. Prominent application areas in industry today include computer vision, natural language processing, and recommendation systems.
Machine learning algorithms use statistical techniques to automatically learn and identify patterns within data. With these patterns, the algorithm can make highly accurate predictions.
In layman's terms, an algorithmic model lets a computer learn from data in order to make predictions.
Why use machine learning? "Rather than summarizing experience and knowledge ourselves and telling the computer, it is better to let the computer learn that experience and knowledge autonomously."
For example, to judge how an orange will taste, an ordinary computer algorithm requires a person to exhaust all the possibilities and decide which combinations of factors make an orange sweet.
A machine learning algorithm instead hands the problem to the computer and lets it learn which combinations make the orange sweet.
When many characteristics are involved, such as color, size, origin, rainfall, weather, and season, ordinary hand-written algorithms become very difficult to apply.
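As an illustration only, here is a toy sketch (with invented features and a made-up "sweetness" rule used to generate the data) contrasting a hand-written rule with letting a model learn the combinations from examples:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hand-written rule: a person must guess which feature combinations mean "sweet".
def sweet_by_rule(color, size, rainfall):
    return color > 0.7 and size > 0.5 and rainfall < 0.3   # hard to extend to many features

print(sweet_by_rule(0.8, 0.6, 0.2))

# Machine learning: give the computer labeled examples and let it find the combinations.
rng = np.random.default_rng(0)
X = rng.random((200, 6))          # columns: color, size, origin, rainfall, weather, season
y = (X[:, 0] + X[:, 1] - X[:, 3] > 1.0).astype(int)   # hidden toy "sweetness" rule
model = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(model.predict(X[:5]))       # predicted sweet (1) / not sweet (0) for five oranges
```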
02 The basic process of artificial intelligence
The basic process of artificial intelligence can include two stages: training and prediction.
Training phase
First comes the training data, known technically as the training set. The model learns from the training data, mines its internal regularities, and keeps adjusting itself to improve the learning result.
For example, we provide specific images together with label data, and the model builds itself and learns automatically, forming empirical knowledge and mining internal rules. If it makes mistakes, the model can be adjusted by providing more data.
Prediction phase
After training is complete, we enter the prediction/testing phase, much like a student taking an exam after studying. The values predicted by the trained model are compared with the true values; if they differ too much, the model is adjusted and given new data to learn from. The key point is that the model learns the data and its features on its own.
It should be emphasized that the training data and the test data should come from the same population, that is, they should be independent and identically distributed. For example, if pictures of birds are provided for training, images of the same kind should be provided for testing.
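A minimal sketch of this train-then-predict workflow, using scikit-learn and one of its built-in toy datasets (the actual task and data in a real project would of course differ):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Training and test data are drawn from the same collection, i.e. the same distribution.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                    # training phase: learn from labeled data
predictions = model.predict(X_test)            # prediction phase: the "exam"
print("accuracy:", accuracy_score(y_test, predictions))   # compare with the true labels
```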
Let's elaborate on what these models are.
03 Statistics-based machine learning
Linear regression
For example, take a relatively simple linear regression model. Its essence is regression analysis from mathematical statistics, which determines the quantitative relationship between two or more interdependent variables. It is expressed in the form y = w'x + e, where the error term e follows a normal distribution with mean 0.
Here the model is a mathematical expression; plotted, it is a straight line (as shown in the figure below). If the expression changes from a first-degree (linear) form to a higher-order one (which can fit the data more closely), the plot becomes a curve. The other figure below shows logistic regression used for a classification problem; plotted, it appears as a logistic (S-shaped) curve.
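A minimal sketch of fitting y = w'x + e with scikit-learn on synthetic data (the weights and noise level below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.random((100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.5 + rng.normal(0, 0.1, size=100)   # noise e ~ N(0, 0.1^2)

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # estimated weights and bias, close to the values used above
```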
SVM
SVM stands for Support Vector Machine, a common discriminative method. In the field of machine learning it is a supervised learning model, often used for pattern recognition, classification, and regression analysis.
Sometimes a model is not a straight line or a curve but a high-dimensional hyperplane in space. The figure below is a schematic (for illustration only): the points of the two colors represent different categories. When the two classes are interlaced and mixed together, even an ordinary higher-order curve can no longer separate them; with an SVM, however, we can find a more complex separating surface in a higher-dimensional space (imagine a curved surface in that space) that splits the two colors apart.
In this case, the SVM model is mathematically an optimization problem (an objective function plus constraints), and geometrically it is a hyperplane that cannot be drawn on a flat plane.
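A minimal sketch of this idea with scikit-learn: an RBF-kernel SVM separating two interlaced classes (the "two moons" toy data) that no straight line could split:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0)       # the kernel maps points into a higher-dimensional space
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```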
Decision tree
A decision tree starts from the known probabilities of various situations and, by making judgments layer by layer, builds a tree that can be used, for example, to find the probability that the expected net present value is greater than or equal to zero. It is a model construction method that intuitively uses probability analysis to simulate the layered way humans make decisions.
Here the model is a series of judgment rules in logical form, and its visual representation is a tree (or several trees, as in random forests, XGBoost, and similar methods).
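A minimal sketch with scikit-learn, training a single decision tree (and printing its learned layer-by-layer rules) and a random forest made of many trees on a built-in toy dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))                               # the learned judgment rules, layer by layer
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print("forest accuracy:", forest.score(X, y))
```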
Neural networks
Neural networks model and connect neurons, the basic units of the human brain, in order to simulate the workings of the human nervous system, with the goal of building an artificial system that has intelligent information-processing abilities such as learning, association, memory, and pattern recognition. An important feature of a neural network is its ability to learn from its environment. At its core, a neural network builds a complex network of connections between input and output, enabling automatic extraction and representation of high-dimensional features from abstract data (typically images, text, and the like).
Here the model is again mathematically an optimization problem (as shown in the figure above, gradient descent searches a high-dimensional space for an optimal solution), and its visual representation is a network.
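A minimal sketch of a small fully connected neural network trained by gradient-based optimization, using scikit-learn's MLPClassifier on a synthetic classification problem (the layer sizes and data are arbitrary choices for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32, 16),   # the "network" between input and output
                    max_iter=1000, random_state=0)
net.fit(X, y)                                      # weights adjusted iteratively to reduce the loss
print("training accuracy:", net.score(X, y))
```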
(This diagram shows more models in machine learning)
Part.3 Machine learning related extensions
01 Basic characteristics of artificial intelligence
- Large data volumes
- Large amounts of labeled data
- Complex model structures
- Strong dependence on data
In summary, we can see why data, models, and computing power are often called the three core elements of artificial intelligence technology. Data is the foundation: without enough data, deep learning and model training cannot proceed. The model is the key determinant: it directly decides how accurately AI technology solves problems in a specific application scenario. Computing power is the performance bottleneck: it determines the computing speed and processing capacity of AI technology.
In addition, you may have another question: what is the difference between machine learning, deep learning, and data mining?
As the figure shows, although different people understand these terms in different ways, the generally accepted view in industry is that traditional data mining, machine learning, and deep learning substantially overlap with one another.
The relationship between machine learning and deep learning is a particularly interesting question, and it is the focus of the next section.
02 The development of machine learning
The following will show the author's understanding of the development process of machine learning from three levels: basic theory, engineering implementation, and data scenarios.
As the figure shows, the theory of machine learning keeps expanding, taking in ensemble learning, reinforcement learning, transfer learning, and more. These foundational theories are extremely important; they are the core of machine learning's continued capacity for innovation.
At the same time, when machine learning methods are applied, dedicated solutions have been developed for the problems that arise at each stage of the pipeline, including data acquisition, feature extraction, model construction, and model deployment (for example, too little data, features that are hard to construct, traditional models with insufficient representational power, and models too large to deploy in some scenarios), as shown in the figure below.
It should be emphasized that, in feature extraction and model construction, the use of neural network methods allows a model to automatically extract and represent high-dimensional features and to be built in a more complex and flexible way.
After achieving great success on images, text, and similar data, this approach flourished and eventually developed into a dedicated branch: deep learning. (Having read this far, can you now see the difference between machine learning and deep learning?)
With the introduction of neural networks and reinforcement learning, deep learning is booming. So where is the next breakthrough? The future is worth looking forward to. (Remember the earlier point: the basic theories of machine learning, such as reinforcement learning, transfer learning, and active learning, will remain the cornerstone and an inexhaustible source of AI's continued innovation and growth.)
Finally, we show how machine learning methods have been extended to handle problems in specific data scenarios. In real problems the data are complex and diverse, so machine learning has developed a series of techniques and methods targeted at the data issues of particular scenarios.
03 Limitations of machine learning
Data distribution issues
It must be ensured that the data are independent and identically distributed; the data the model learns from and the data it is evaluated on must come from the same distribution.
Reliability of model results
The stability of results across repeated runs, and the reliability of the reasoning behind the predicted results.
Security issues
How can the security of AI models be ensured, for example against attacks based on adversarial examples?
Other
- Inference (e.g., how a model performs causal inference and how it assists decision-making)
- Explainability (is the model biased or unbiased? is the model's method reliable?)
- Challenges of large models (e.g., large models require more training data and more computing power, and their structures are more complex and harder to understand)
- Challenges with labeled data (e.g., too few labels, too few negative samples, uncertain positive samples, poor labeling quality, etc.)
About the author
Ake & Luoling
Data Link Algorithm Research Center
*Some of the pictures in this article come from the Internet; sources are indicated in the watermarks, and they will be removed upon request in case of infringement.