Machine Learning Lecture Notes
Machine Learning Lecture Notes
Machine Learning is said as a subset of artificial intelligence that is mainly concerned with the
development of algorithms which allow a computer to learn from the data and past experiences on their
own. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a
summarized way as:
Machine learning enables a machine to automatically learn from data, improve performance from
experiences, and predict things without being explicitly programmed.
With the help of sample historical data, which is known as training data, machine learning algorithms
build a mathematical model that helps in making predictions or decisions without being explicitly
programmed. Machine learning brings computer science and statistics together for creating predictive
models. Machine learning constructs or uses the algorithms that learn from historical data. The more
we will provide the information, the higher will be the performance.
With the help of sample historical data, which is known as training data, machine learning algorithms
build a mathematical model that helps in making predictions or decisions without being explicitly
programmed. Machine learning brings computer science and statistics together for creating predictive
models. Machine learning constructs or uses the algorithms that learn from historical data. The more
we will provide the information, the higher will be the performance.
This machine learning tutorial gives you an introduction to machine learning along with the
wide range of machine learning techniques such as Supervised, Unsupervised, and
Reinforcement learning. You will learn about regression and classification models, clustering
1
methods, hidden Markov models, and various sequential models.
We can train machine learning algorithms by providing them the huge amount of data and let
them explore the data, construct the models, and predict the required output automatically. The
performance of the machine learning algorithm depends on the amount of data, and it can be
determined by the cost function. With the help of machine learning, we can save both time and
money.
The importance of machine learning can be easily understood by its uses cases, Currently,
machine learning is used in self-driving cars, cyber fraud detection, face recognition, and
friend suggestion by Facebook, etc. Various top companies such as Netflix and Amazonhave
build machine learning models that are using a vast amount of data to analyze the user interest
and recommend product accordingly.
Following are some key points which show the importance of Machine Learning:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
2
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled
data to the machine learning system in order to train it, and on that basis, it predicts the output.
The system creates a model using labeled data to understand the datasets and learn about each
data, once the training and processing are done then we test the model by providing a sample
data to check whether it is predicting the exact output or not.
The goal of supervised learning is to map input data with the output data. The supervised
learning is based on supervision, and it is the same as when a student learns things in the
supervision of the teacher. The example of supervised learning is spam filtering.
o Classification
o Regression
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal
of unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful
insights from the huge amount of data. It can be further classifieds into two categories of
algorithms:
o Clustering
o Association
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning agent gets a
reward for each right action and gets a penalty for each wrong action. The agent learns
automatically with these feedbacks and improves its performance. In reinforcement learning,
the agent interacts with the environment and explores it. The goal of an agent is to get the most
reward points, and hence, it improves its performance.
The robotic dog, which automatically learns the movement of his arms, is an example of
Reinforcement learning.
3
Supervised Machine Learning
Supervised learning is the types of machine learning in which machines are trained using well
"labelled" training data, and on basis of that data, machines predict the output. The labelled
data means some input data is already tagged with the correct output.
In supervised learning, the training data provided to the machines work as the supervisor that
teaches the machines to predict the output correctly. It applies the same concept as a student
learns in the supervision of the teacher.
Supervised learning is a process of providing input data as well as correct output data to the
machine learning model. The aim of a supervised learning algorithm is to find a mapping
function to map the input variable(x) with the output variable(y).
In the real-world, supervised learning can be used for Risk Assessment, Image classification,
Fraud Detection, spam filtering, etc.
The working of Supervised learning can be easily understood by the below example and
diagram:
Suppose we have a dataset of different types of shapes which includes square, rectangle,
triangle, and Polygon. Now the first step is that we need to train the model for each shape.
4
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the model is to
identify the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it
classifies the shape on the bases of a number of sides, and predicts the output.
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the
output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which come
under supervised learning:
5
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means there
are two classes such as Yes-No, Male-Female, True-false, etc.
Spam Filtering,
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
6
Difference between Regression and Classification
Regression Algorithm Classification Algorithm
In Regression, the output variable must be of
In Classification, the output variable must be
continuous nature or real value. a discrete value.
The task of the regression algorithm is to map
The task of the classification algorithm is to
the input value (x) with the continuous output
map the input value(x) with the discrete
variable(y). output variable(y).
Regression Algorithms are used with Classification Algorithms are used with
continuous data. discrete data.
In Regression, we try to find the best fit line,
In Classification, we try to find the decision
which can predict the output more accurately.
boundary, which can divide the dataset into
different classes.
Regression algorithms can be used to solve Classification Algorithms can be used to
the regression problems such as Weather solve classification problems such as
Prediction, House price prediction, etc. Identification of spam emails, Speech
Recognition, Identification of cancer cells,
etc.
The regression Algorithm can be further The Classification algorithms can be divided
divided into Linear and Non-linear into Binary Classifier and Multi-class
Regression. Classifier.
Unsupervised learning is a type of machine learning in which models are trained using unlabeled
dataset and are allowed to act on that data without any supervision.
7
Example: Suppose the unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs. The algorithm is never trained upon the given
dataset, which means it does not have any idea about the features of the dataset. The task of the
unsupervised learning algorithm is to identify the image features on their own. Unsupervised
learning algorithm will perform this task by clustering the image dataset into the groups
according to similarities between images.
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is much similar as a human learns to think by their own
experiences, which makes it closer to the real AI.
o Unsupervised learning works on unlabeled and uncategorized data which make
unsupervised learning more important.
o In real-world, we do not always have input data with the corresponding output so to
solve such cases, we need unsupervised learning.
Here, we have taken an unlabeled input data, which means it is not categorized and
corresponding outputs are also not given. Now, this unlabeled input data is fed to the machine
learning model in order to train it. Firstly, it will interpret the raw data to find the hidden
patterns from the data and then will apply suitable algorithms such as k-means clustering,
Decision tree, etc.
8
Once it applies the suitable algorithm, the algorithm divides the data objects into groups
according to the similarities and difference between the objects.
o Clustering: Clustering is a method of grouping the objects into clusters such that
objects with most similarities remains into a group and has less or no similarities with
the objects of another group. Cluster analysis finds the commonalities between the data
objects and categorizes them as per the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method which is used for
finding the relationships between variables in the large database. It determines the set
of items that occurs together in the dataset. Association rule makes marketing strategy
more effective. Such as people who buy X item (suppose a bread) are also tend to
purchase Y (Butter/Jam) item. A typical example of Association rule is Market Basket
Analysis.
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchal clustering
o Anomaly detection
o Neural Networks
o Principle Component Analysis
9
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
The main differences between Supervised and Unsupervised learning are given below:
10
Supervised learning is not close to true Unsupervised learning is more close to the
Artificial intelligence as in this, we first train true Artificial Intelligence as it learns
the model for each data, and then only it can similarly as a child learns daily routine things
predict the correct output. by his experiences.
It includes various algorithms such as Linear It includes various algorithms such as
Regression, Logistic Regression, Support Clustering, KNN, and Apriori algorithm.
Vector Machine, Multi-class Classification,
Decision tree, Bayesian Logic, etc.
What is Hypothesis?
The hypothesis is defined as the supposition or proposed explanation based on insufficient
evidence or assumptions. It is just a guess based on some known facts but has not yet been
proven. A good hypothesis is testable, which results in either true or false.
Example: Let's understand the hypothesis with a common example. Some scientist claims that
ultraviolet (UV) light can damage the eyes then it may also cause blindness. In this example, a
scientist just claims that UV rays are harmful to the eyes, but we assume they may cause
blindness. However, it may or may not be possible. Hence, these types of assumptions are
called a hypothesis.
11
In supervised learning techniques, the main aim is to determine the possible hypothesis out of
hypothesis space that best maps input to the corresponding or correct outputs.
There are some common methods given to find out the possible hypothesis from the Hypothesis
space, where hypothesis space is represented by uppercase-h (H) and hypothesisby
lowercase-h (h). These are defined as follows:
It is often constrained by choice of the framing of the problem, the choice of model, and the
choice of model configuration.
However, if the machine learning model is not accurate, it can make predictions errors, and
these prediction errors are usually known as Bias and Variance.
In machine learning, these errors will always be present as there is always a slight difference
between the model predictions and actual predictions.
The main aim of ML/data science analysts is to reduce these errors in order to get more accurate
results.
12
Errors in Machine Learning?
In machine learning, an error is a measure of how accurately an algorithm can make predictions
for the previously unknown dataset. On the basis of these errors, the machine learning model
is selected that can perform best on the particular dataset. There are mainly two types of errors
in machine learning, which are:
o Reducible errors: These errors can be reduced to improve the model accuracy. Such errors
can further be classified into bias and Variance.
o Irreducible errors: These errors will always be present in the model regardless of which
algorithm has been used. The cause of these errors is unknown variables whose value can't be
reduced.
What is Bias?
In general, a machine learning model analyses the data, find patterns in it and make predictions.
While training, the model learns these patterns in the dataset and applies them to test data for
prediction. While making predictions, a difference occurs between prediction values made
13
by the model and actual values/expected values, and this difference is known as bias errors
or Errors due to bias. It can be defined as an inability of machine learning algorithms such as
Linear Regression to capture the true relationship between the data points. Each algorithm
begins with some amount of bias because bias occurs from assumptions in the model, which
makes the target function simple to learn. A model has either:
o Low Bias: A low bias model will make fewer assumptions about the form of the target function.
o High Bias: A model with a high bias makes more assumptions, and the model becomes unable
to capture the important features of our dataset. A high bias model also cannot perform well
on new data.
Generally, a linear algorithm has a high bias, as it makes them learn fast. The simpler the
algorithm, the higher the bias it has likely to be introduced. Whereas a nonlinear algorithm
often has low bias.
Some examples of machine learning algorithms with low bias are Decision Trees, k-Nearest
Neighbours and Support Vector Machines. At the same time, an algorithm with high bias
is Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Low variance means there is a small variation in the prediction of the target function with
changes in the training data set. At the same time, High variance shows a large variation in
the prediction of the target function with changes in the training dataset.
A model that shows high variance learns a lot and perform well with the training dataset, and
does not generalize well with the unseen dataset. As a result, such a model gives good results
with the training dataset but shows high error rates on the test dataset.
14
Ways to Reduce High Variance:
o Reduce the input features or number of parameters as a model is overfitted.
o Do not use a much complex model.
o Increase the training data.
o Increase the Regularization term.
1. Low-Bias,Low-Variance:
The combination of low bias and low variance shows an ideal machine learning model.
However, it is not possible practically.
2. Low-Bias, High-Variance: With low bias and high variance, model predictions are
inconsistent and accurate on average. This case occurs when the model learns with a large
number of parameters and hence leads to an overfitting
3. High-Bias, Low-Variance: With High bias and low variance, predictions are consistent but
inaccurate on average. This case occurs when a model does not learn well with the training
dataset or uses few numbers of the parameter. It leads to underfitting problems in the model.
4. High-Bias, High-Variance:
With high bias and high variance, predictions are inconsistent and also inaccurate on average.
15
How to identify High variance or High Bias?
High variance can be identified if the model has:
Bias-Variance Trade-Off
While building the machine learning model, it is really important to take care of bias and
variance in order to avoid overfitting and underfitting in the model. If the model is very simple
with fewer parameters, it may have low variance and high bias. Whereas, if the model has a
large number of parameters, it will have high variance and low bias. So, it is required to make
a balance between bias and variance errors, and this balance between the bias error and variance
error is known as the Bias-Variance trade-off.
Some examples of machine learning algorithms with low variance are, Linear Regression,
Logistic Regression, and Linear discriminant analysis. At the same time, algorithms with
high variance are decision tree, Support Vector Machine, and K-nearest neighbours.
16
Cross-Validation in Machine Learning
Cross-validation is a technique for validating the model efficiency by training it on the subset
of input data and testing on previously unseen subset of the input data. We can also say that it
is a technique to check how a statistical model generalizes to an independent dataset.
In ML, there is always the need to test the stability of the model. It means based only on the training
dataset; we can't fit our model on the training dataset. For this purpose, we reserve a particular sample
of the dataset, which was not part of the training dataset. After that, we test our model on that sample
before deployment, and this complete process comes under cross-validation. This is something different
from the general train-test split.
17