
Machine learning fundamentals

Author: Medium meter AI

Any aspiring data scientist has plenty of worries when starting his/her career. Why do we have to focus on machine learning today, and why is machine learning talked about so much? Is machine learning a recent development? There are dozens of definitions available and dozens of reasons why someone should go into it. We don't want to reinvent the wheel here, but we do want to address several important aspects that matter to aspiring data scientists.

Why Machine Learning

We know that a huge amount of data is generated every minute, such as retail payments, GPS, photos, blogs, videos, e-commerce, investments, insurance, healthcare, accounting, logistics, utilities, and many more. Because there is so much data, there is an opportunity to make predictions on all of these fronts. Predicting means preparing for the future by making the right decisions in the present.


Definition

Machine learning draws on computer science, statistics, and mathematics and is used to make predictions or to cluster data. The most widely used definition is that machine learning is an application of artificial intelligence (AI) that enables systems to automatically learn and improve from experience without being explicitly programmed.
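To make "learning from experience without being explicitly programmed" concrete, here is a minimal sketch (Python with scikit-learn and made-up numbers, purely illustrative): instead of hard-coding a rule, we hand the algorithm example inputs and outputs and let it infer the relationship.

```python
# A minimal sketch (scikit-learn, made-up numbers): instead of hard-coding the
# rule "y = 2x", we give the algorithm example inputs and outputs and let it
# learn the relationship from the data.
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]      # example inputs
y = [2, 4, 6, 8, 10]               # observed outputs

model = LinearRegression().fit(X, y)   # "learning from experience"
print(model.predict([[6]]))            # ~12.0, without the rule ever being written
```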


Glossary

  1. Independent variables: There is a set of variables/fields (often referred to as features) that, taken together, determine the output. For example, multiple fields/variables can be used to predict rainfall anywhere, such as the geography of the region (tropical, coastal, mountainous, etc.), the month of the year, the state of the previous day, humidity levels, etc. These fields/variables are called independent variables or features. There are usually multiple independent variables in any dataset.
  2. Dependent variable: The data point we intend to predict is the dependent variable. In the example above, the dependent variable would be the rainfall forecast (yes or no) or the amount of rainfall we expect (in mm). There is usually 1 dependent variable in any dataset, but there can also be multiple dependent variables.
  3. Dataset: The combination of independent variables and dependent variables is called a dataset. In other words, the many different data points collected around a business problem are together called a dataset. For basic machine learning problems it is usually in tabular form: each row is a data entry, and each column is a feature (independent variable). In the rainfall forecast example above, 300 examples (300 days of data) as rows and 5 columns (4 features and 1 output) together form a dataset. Each row refers to one day, while the columns refer to the different dimensions such as humidity, temperature, altitude, day of the week, and so on.
  4. Training data: The complete dataset is divided into 3 parts, with training data typically being the largest part. It is called training data because machine learning algorithms work on this set to create their models (technically, equations).
  5. Validation data: This is the second block (of the larger, full dataset) used to verify the accuracy or correctness of the model created. The model or equation (created during training) is run on this validation set, and the hyperparameters are tuned based on the results to further improve accuracy.
  6. Test data: This is the final piece of the dataset, on which the model is run to report its accuracy score.
  7. Fitting data (or training): Whenever someone says that the data is being fitted or trained, it means that a machine learning algorithm is creating a model, i.e., a generalized equation that fits the data. For example, the equation of a circle in two-dimensional space is (x-h)^2 + (y-k)^2 = r^2, where r is the radius and (h,k) is the center. This is a generalized equation: for any choice of h, k, and r it describes a circle. Similarly, once the model is created, it gives the value of the dependent variable whenever we enter new values for the independent variables.
  8. Loss: Loss is the difference between the predicted value and the actual value for a single training record. It measures how far the prediction is from the actual value.
  9. Cost function: The cost function is the average of the losses over all training examples.
  10. Optimization: This is the process of minimizing the loss by adjusting the weights or parameters. It is achieved by taking the partial derivative (differentiation) of the cost function with respect to each weight.
  11. Parameter: A parameter is a weight associated with each independent variable (feature). These weights change with each iteration of optimization. When the weights stop changing, or are no longer updated significantly, we consider the optimization complete. A minimal sketch tying these terms together follows this list.
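Here is a minimal sketch in Python (numpy only, on a made-up synthetic dataset, so every number is illustrative): it splits the data into training, validation, and test sets, then fits a linear model by gradient descent, computing the per-record loss, the cost function, and the parameter (weight) updates at each step, and stopping when the weights no longer change.

```python
# A minimal sketch tying the glossary together: train/validation/test split,
# parameters (weights), loss, cost function, and optimization by gradient descent.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: 300 rows, 4 independent variables, 1 dependent variable.
X = rng.normal(size=(300, 4))
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=300)

# Split into training, validation, and test data (roughly 60/20/20).
X_train, y_train = X[:180], y[:180]
X_val, y_val = X[180:240], y[180:240]
X_test, y_test = X[240:], y[240:]

w = np.zeros(4)          # parameters (weights), initialised to zero
learning_rate = 0.05

for step in range(500):
    pred = X_train @ w                          # model predictions
    loss = pred - y_train                       # per-record loss (error)
    cost = np.mean(loss ** 2)                   # cost function: mean of squared losses
    grad = 2 * X_train.T @ loss / len(y_train)  # partial derivatives w.r.t. each weight
    new_w = w - learning_rate * grad            # optimization step
    if np.allclose(new_w, w, atol=1e-6):        # weights no longer changing -> stop
        break
    w = new_w

print("learned weights:", w)
print("validation cost:", np.mean((X_val @ w - y_val) ** 2))
print("test cost:", np.mean((X_test @ w - y_test) ** 2))
```

In practice a library such as scikit-learn handles the fitting, but a loop like the one above is what "training a model" means underneath.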


Categories of machine learning

First, note that machine learning works on data, and specifically on numerical data. This means that all text must be converted into numbers before machine learning algorithms can be applied; this will be discussed later in the process section.

  1. Supervised learning: Such algorithms are needed when each set of independent variables (features or columns in the dataset) has an assigned dependent variable (output). The problem is then to predict the dependent variable (output) given a new set of independent variables (features).
  2. Unsupervised learning: This type of algorithm is used when the dataset needs to be clustered or segregated. For example, suppose we divide the students of a school into categories based on their characteristics (address, height, weight, age, scores obtained in the last year, drawing skills, sports medals, musical skills, wearing glasses or not, etc.). Based on these features, an unsupervised model can divide the students into 3 or 4 categories, e.g., a) studious, b) athletes, and c) artists. A short sketch contrasting supervised and unsupervised learning follows this list.

  3. Reinforcement learning: Reinforcement learning is a subset of machine learning in which the model evaluates all possible paths/options to reach the destination and then selects the path/option that provides the maximum reward (positive score) with the least penalty (negative score).
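As an illustration of the first two categories, here is a minimal sketch (Python with scikit-learn on generated data; the datasets and parameters are assumptions for demonstration): a supervised classifier learns from labelled examples, while an unsupervised clustering model groups rows without any labels.

```python
# A minimal sketch (scikit-learn, synthetic data) contrasting supervised and
# unsupervised learning. The datasets here are generated, purely for illustration.
from sklearn.datasets import make_classification, make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: features X come with known outputs y (labels).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)          # learn the mapping features -> label
print("predicted labels:", clf.predict(X[:5]))

# Unsupervised: only features, no labels; the model groups similar rows together.
X_students, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_students)
print("cluster assignments:", clusters[:10])
```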

Algorithms

Below is a list of commonly used algorithms (a short code reference follows the list):

  1. Linear regression
  2. Logistic regression
  3. Decision tree
  4. Support vector machines
  5. Naive Bayes
  6. kNN
  7. K-means
  8. Random forest
  9. Dimensionality reduction algorithms
  10. Gradient boosting algorithms, for example:

a) GBM

b) XGBoost
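For reference, here is a sketch of where these algorithms typically live in scikit-learn (XGBoost is provided by the separate xgboost package); the class names below are scikit-learn's, and this mapping is offered as a convenience rather than something the list above prescribes.

```python
# Most of the algorithms listed above as implemented in scikit-learn.
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.decomposition import PCA  # one common dimensionality reduction algorithm

models = {
    "linear regression": LinearRegression(),
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(),
    "support vector machine": SVC(),
    "naive Bayes": GaussianNB(),
    "kNN": KNeighborsClassifier(),
    "k-means (unsupervised)": KMeans(n_clusters=3, n_init=10),
    "random forest": RandomForestClassifier(),
    "dimensionality reduction (PCA)": PCA(n_components=2),
    "gradient boosting (GBM)": GradientBoostingClassifier(),
}
print(list(models))
```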


Process

We have already explained this process in the data science section, and since machine learning is a part of data science, all the steps of machine learning are similar to the steps in data science. Read this article till the end, because the last step is your bonus item

  1. Data collection: Data collection is the foundational building block of any machine learning problem. Data can be collected in a structured format (databases, available datasets, internet history) or in an unstructured format (videos, blogs, etc.).
  2. Cleaning data: Cleaning/cleansing data refers to making sure the data has no NULL values and not too many outliers, removing irrelevant columns, and so on.
  3. Exploratory data analysis: Visualize the data with charts to identify patterns, outliers, or key insights that require further action in the next step (feature engineering).
  4. Feature engineering: This step is done to arrive at the right feature set with the help of: a) adding more records to the dataset, b) adding more features, c) grouping operations (e.g., maximum, minimum, pivot values), d) normalizing/scaling the data, e) logarithmic or exponential transformations, f) redesigning features for dimensionality reduction or identifying collinearity, g) one-hot encoding, and many more (see the sketch after this list).
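Here is a minimal sketch (Python with pandas on a made-up toy table; the column names are invented for illustration) of a few of the cleaning and feature engineering steps above: dropping NULLs, removing an irrelevant column, one-hot encoding a text column, and scaling a numeric column.

```python
# A minimal cleaning / feature engineering sketch on made-up toy data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "region": ["tropical", "coastal", "mountain", "coastal", None],
    "humidity": [0.81, 0.65, np.nan, 0.70, 0.55],
    "temperature": [31.0, 28.5, 18.2, 27.0, 26.1],
    "row_id": [1, 2, 3, 4, 5],          # irrelevant identifier column
    "rainfall": [1, 0, 0, 1, 0],        # dependent variable
})

df = df.drop(columns=["row_id"])        # remove irrelevant column
df = df.dropna()                        # drop rows with NULL values

# One-hot encode the categorical text feature (machine learning needs numbers).
df = pd.get_dummies(df, columns=["region"])

# Scale/normalize a numeric feature (standardization: zero mean, unit variance).
df["temperature"] = (df["temperature"] - df["temperature"].mean()) / df["temperature"].std()

print(df.head())
```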

  5. Algorithm selection: There are multiple algorithms that can be applied to a single business problem, so we can test several of them. For example, for a classification problem we can use logistic regression, decision trees, or naïve Bayes, and pick whichever algorithm provides better accuracy (see the sketch below).
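As a sketch of how such a comparison might look (scikit-learn on synthetic data; the three candidates are the ones named above, and the scoring setup is an assumption), 5-fold cross-validation gives each algorithm an accuracy estimate to compare:

```python
# Trying several classification algorithms on the same problem and comparing
# cross-validated accuracy (synthetic data, purely illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```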

  6. Modeling: This includes training the model, which means finding the right set of weights (associated with the columns/features) to create a generalized equation. It also includes tuning the model with the help of cross-validation and updating the hyperparameters. Then the accuracy of the model is evaluated by running it on unseen data, known as test data. Once the accuracy reaches the threshold of business expectations, the model is moved to production.
  7. Production deployment: Production deployment is a very critical element for any model and is rarely talked about or explained. Here are the aspects we need to consider when deploying a model (a minimal serving sketch follows these questions):

a) Was the model developed for web-based or device-based interactions?

b) Do we need real-time scaling of the model (expecting more users over a specific period of time), or will the number of users stay roughly constant?

c) Does it need to be integrated with external devices such as webcams, e-commerce portals, etc.?

d) What are the security implications of using the model?

e) Do we want the customer to initialize the algorithm and call the prediction function every time, or do we want to use a REST API for our model?
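For question e), here is a minimal sketch of the REST API route (using Flask; the framework choice, the /predict endpoint name, and the model.pkl filename are illustrative assumptions, not something this article prescribes): the model is loaded once on the server, and clients send feature values over HTTP instead of initializing the algorithm themselves.

```python
# A minimal model-serving sketch. Flask, the /predict endpoint, and "model.pkl"
# are illustrative assumptions for this example.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the trained model once at startup, not on every request.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [[0.7, 25.1, 1, 0, 0]]}.
    payload = request.get_json()
    prediction = model.predict(payload["features"])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

A client would then POST JSON such as {"features": [[0.7, 25.1, 1, 0, 0]]} to /predict and receive the prediction back, rather than loading the model locally.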


A master plan on how to learn machine learning in detail

  1. Use YouTube (first) to learn about supervised and unsupervised learning. Park reinforcement learning for a few months until you are confident in both supervised and unsupervised learning.
  2. Enroll in a few courses on Coursera, Udemy, or any other online platform. It doesn't hurt to look for free courses or places where you can get financial aid, but take a look at the content and match it against the topics above.
  3. Practice some algorithms on multiple different datasets (you can find a lot of free datasets online).
  4. Important:

a. Create an account on Kaggle and download datasets

b. Create a profile on GitHub and showcase your work there

c. Similarly, update your profile on LinkedIn

This is explained at a very basic level, and we recommend practicing at least 2-3 algorithms for each type of machine learning to start gaining confidence.
