ml_unit_3_notes
Supervised learning is a type of machine learning in which machines are trained using well-
"labelled" training data, and on the basis of that data, they predict the output. Labelled
data means the input data is already tagged with the correct output. In supervised learning,
the training data provided to the machine acts as a supervisor that teaches the machine to
predict the output correctly, much as a student learns under the supervision of a teacher.
In supervised learning, models are trained on a labelled dataset, where the model learns
the characteristics of each type of data. Once training is complete, the model is evaluated
on test data (held out separately from the training set) and predicts the output.
The working of supervised learning can be easily understood with the following example:
Suppose we have a dataset of different types of shapes, including squares, triangles, and
hexagons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, it is labelled as a square.
o If the given shape has three sides, it is labelled as a triangle.
o If the given shape has six equal sides, it is labelled as a hexagon.
Now, after training, we test the model using the test set, and the task of the model is to
identify each shape. The machine has already been trained on all these shape types, so when
it encounters a new shape, it classifies it on the basis of its number of sides and predicts
the output.
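The labelling rules above can be sketched as a tiny rule-based classifier; the function name and inputs are illustrative, not part of a trained model:

```python
# Illustrative rule-based version of the shape-labelling rules above.
def classify_shape(num_sides, all_sides_equal):
    if num_sides == 4 and all_sides_equal:
        return "square"
    if num_sides == 3:
        return "triangle"
    if num_sides == 6 and all_sides_equal:
        return "hexagon"
    return "unknown"  # shapes not covered by the training rules

print(classify_shape(4, True))   # square
print(classify_shape(3, False))  # triangle
```

A real supervised model would learn such rules from labelled examples rather than having them hand-coded.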
1. Regression
Regression algorithms are used when there is a relationship between the input and output variables.
Regression is used for the prediction of continuous variables, such as weather forecasts, market trends, etc.
Below are some popular regression algorithms that come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means it takes one of a
finite set of classes such as Yes/No, Male/Female, True/False, etc. A typical application is spam filtering.
Popular classification algorithms include:
o Random Forest
o Decision Trees
o Logistic Regression
Advantages of supervised learning:
o With the help of supervised learning, the model can predict the output on the basis of prior
experience.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us solve various real-world problems such as fraud detection,
spam filtering, etc.
Disadvantages of supervised learning:
o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data differs significantly from
the training data.
Linear Regression
Linear regression is a machine learning algorithm based on supervised learning and is one of the
simplest and most popular algorithms. It is a statistical method used for predictive analysis:
it makes predictions for continuous/real-valued or numeric variables such as sales, salary, age,
product price, etc. Regression models a target prediction value based on independent variables,
and it is mostly used for finding the relationship between variables and for forecasting.
Advantages:
o Linear regression is simple to implement, and the output coefficients are easy to interpret.
Disadvantages:
o Outliers can have a huge effect on the regression, and the decision boundaries are restricted to being linear.
Linear regression can be further divided into two types:
o Simple Linear Regression: a single independent variable is used to predict the dependent variable.
o Multiple Linear Regression: more than one independent variable is used to predict the dependent variable.
A straight line showing the relationship between the dependent and independent variables is called
a regression line. A regression line can show two types of relationship: a positive relationship
(the dependent variable increases as the independent variable increases) or a negative relationship
(the dependent variable decreases as the independent variable increases).
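As an illustrative sketch, a simple regression line can be fitted with the closed-form least-squares solution; the data points below are made up for demonstration:

```python
# Fit y = slope*x + intercept by ordinary least squares (closed form).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that roughly follows y = 2x.
slope, intercept = fit_line([1, 2, 3, 4, 5], [2.1, 4.0, 6.2, 7.9, 10.1])
print(round(slope, 2), round(intercept, 2))
```

The fitted line is the regression line described above; new x values are predicted by plugging them into `slope*x + intercept`.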
Logistic Regression
o Logistic regression is one of the most popular Machine Learning algorithms, which comes under
the Supervised Learning technique. It is used for predicting the categorical dependent variable
using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome
must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc. However, instead of
giving exactly 0 or 1, it gives probabilistic values that lie between 0 and 1.
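A minimal sketch of how logistic regression maps a linear score to a probability between 0 and 1 via the sigmoid function; the weight and bias below are hypothetical, as if already learned:

```python
import math

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w=1.5, b=-2.0):  # hypothetical learned parameters
    p = sigmoid(w * x + b)      # probability of the positive class
    label = 1 if p >= 0.5 else 0
    return p, label

p, label = predict(2.0)  # linear score = 1.5*2 - 2 = 1.0
print(round(p, 3), label)
```

The probabilistic output is what distinguishes logistic regression from a hard classifier: the 0.5 threshold turns the probability into a discrete class only at the final step.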
On the basis of the categories, Logistic Regression can be classified into three types:
o Binomial: In binomial logistic regression, there can be only two possible types of the dependent
variable, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types
of the dependent variable, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the
dependent variable, such as "low", "medium", or "high".
Advantages:
o It extends naturally to multiple classes (multinomial regression) and gives a natural probabilistic
view of class predictions.
o It provides not only a measure of how relevant a predictor is (coefficient size) but also its
direction of association (positive or negative).
o Model coefficients can be interpreted as indicators of feature importance.
Disadvantages:
o Its major limitation is the assumption of a linear relationship between the independent variables
and the log-odds of the outcome.
o It can only be used to predict discrete outcomes, so the dependent variable is bound to a
discrete set of values.
o It is difficult to capture complex relationships with logistic regression.
Naïve Bayes Classifier
o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and
used for solving classification problems.
o The Naïve Bayes classifier is one of the simplest and most effective classification algorithms,
and it helps in building fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
o Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and
classifying articles.
The name Naïve Bayes is composed of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent
of the occurrence of other features. For example, if a fruit is identified on the basis of colour,
shape, and taste, then a red, spherical, and sweet fruit is recognised as an apple; each feature
individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
The working of the Naïve Bayes classifier can be understood with the help of the following example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using
this dataset, we need to decide whether we should play on a particular day according to the weather
conditions. To solve this problem, we follow these steps:
o Convert the given dataset into frequency tables.
o Generate a likelihood table by finding the probabilities of the given features.
o Use Bayes' theorem to calculate the posterior probability of each class.
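As a sketch, the posterior probability for a single weather condition can be worked out by hand with Bayes' theorem; all counts below are hypothetical, standing in for the frequency tables:

```python
# Hypothetical frequency counts from a 14-day weather/"Play" dataset.
total = 14
play_yes, play_no = 9, 5            # class counts for Play = Yes / No
sunny_given_yes = 2 / play_yes      # P(Sunny | Yes), from the likelihood table
sunny_given_no = 3 / play_no        # P(Sunny | No)

# Bayes' theorem: P(Yes | Sunny) is proportional to P(Sunny | Yes) * P(Yes).
score_yes = sunny_given_yes * (play_yes / total)
score_no = sunny_given_no * (play_no / total)

# Normalise the two scores so they sum to 1.
p_yes = score_yes / (score_yes + score_no)
print(f"P(Play=Yes | Sunny) = {p_yes:.2f}")
```

Since P(Yes | Sunny) comes out below 0.5 with these counts, the classifier would predict "don't play" on a sunny day.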
o Naïve Bayes is a fast and easy ML algorithm for predicting the class of a dataset.
o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values are
sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially
distributed. It is primarily used for document classification problems, i.e. determining which
category a particular document belongs to, such as sports, politics, education, etc.
The classifier uses word frequencies as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor
variables are independent Boolean variables, such as whether a particular word is present in a
document or not. This model is also popular for document classification tasks.
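A small sketch of the Bernoulli model's word-presence features; the vocabulary and documents are made up for illustration:

```python
# Hypothetical vocabulary for a tiny spam-vs-work document classifier.
vocab = ["free", "win", "meeting", "report"]

def to_bernoulli_features(document):
    # Each document becomes a 0/1 flag per vocabulary word: present or not.
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in vocab]

print(to_bernoulli_features("Win a FREE prize"))          # [1, 1, 0, 0]
print(to_bernoulli_features("Quarterly report meeting"))  # [0, 0, 1, 1]
```

A Multinomial model would instead count how many times each word occurs; the Bernoulli model deliberately discards counts and keeps only presence/absence.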
Support Vector Machine (SVM)
Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms, used for
classification as well as regression problems. However, it is primarily used for classification
problems in machine learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes, so that we can easily place new data points in the correct category
in the future. This best decision boundary is called a hyperplane.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into
two classes using a single straight line, such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be
classified using a straight line, such data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.
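A minimal sketch of how a trained linear SVM uses the hyperplane to classify a point: it evaluates w·x + b and takes the sign. The weight vector and bias below are hypothetical, as if already learned:

```python
# Classify a 2-D point with the hyperplane equation w.x + b = 0.
def svm_predict(x, w=(1.0, -1.0), b=0.0):  # hypothetical learned w and b
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Points on the positive side of the hyperplane get class +1,
    # points on the negative side get class -1.
    return +1 if score >= 0 else -1

print(svm_predict((3.0, 1.0)))  # score = 2.0  -> +1
print(svm_predict((1.0, 4.0)))  # score = -3.0 -> -1
```

Training an SVM amounts to choosing w and b so that this hyperplane separates the classes with the largest possible margin.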
The dimensions of the hyperplane depend on the number of features in the dataset: if there are
2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a
two-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e. the maximum distance between
the hyperplane and the nearest data points of either class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position are
termed support vectors. Since these vectors support the hyperplane, they are called support
vectors.
Weaknesses of SVM
• SVM is applicable only for binary classification, i.e. when there are only two classes in the problem
domain.
• The SVM model is very complex – almost like a black box when it deals with a high-dimensional
data set.
• It is slow for a large dataset, i.e. a data set with either a large number of features or a large
number of instances.
• It is quite memory-intensive.
Evaluation method
• False Negative (FN) is the number of incorrect predictions that an example is negative, i.e.
a positive-class instance incorrectly identified as negative.
Example: The given class is spam, but the classifier has incorrectly predicted it as non-
spam.
• False Positive (FP) is the number of incorrect predictions that an example is positive, i.e.
a negative-class instance incorrectly identified as positive.
Example: The given class is non-spam, but the classifier has incorrectly predicted it as
spam.
• True Positive (TP) is the number of correct predictions that an example is positive, i.e.
a positive-class instance correctly identified as positive.
Example: The given class is spam, and the classifier has correctly predicted it as spam.
• True Negative (TN) is the number of correct predictions that an example is negative, i.e.
a negative-class instance correctly identified as negative.
Example: The given class is non-spam, and the classifier has correctly predicted it as non-spam.
• Accuracy is the total number of correct classifications (either as the class of interest, i.e.
True Positives, or as not the class of interest, i.e. True Negatives) divided by the total number
of classifications made: Accuracy = (TP + TN) / (TP + TN + FP + FN).
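The four counts and accuracy can be computed directly from a list of predictions; the spam labels below are made up for illustration (1 = spam, 0 = non-spam):

```python
# Hypothetical ground-truth labels and classifier predictions.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Count each confusion-matrix cell by comparing label pairs.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```

Here one spam message was missed (the false negative) and one non-spam message was wrongly flagged (the false positive), giving 6 correct predictions out of 8.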