ml_unit_3_notes

Supervised machine learning involves training models on labeled data to predict outcomes, with applications in regression and classification. Key algorithms include Linear Regression, Logistic Regression, Naïve Bayes, and Support Vector Machines, each with distinct advantages and disadvantages. The effectiveness of these models is often evaluated using confusion matrices to assess prediction accuracy.

Supervised Machine Learning

Supervised learning is a type of machine learning in which machines are trained using well-"labelled" training data and, on the basis of that data, predict the output. Labelled data means that some input data is already tagged with the correct output. In supervised learning, the training data provided to the machine works as a supervisor that teaches the machine to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.

How Supervised Learning Works

In supervised learning, models are trained on a labelled dataset, from which the model learns the characteristics of each type of data. Once the training process is completed, the model is evaluated on test data (a held-out portion of the dataset that was not used for training), and then it predicts the output.

The working of supervised learning can be easily understood with the following example:

Suppose we have a dataset of different types of shapes, including squares, triangles, and hexagons. The first step is to train the model on each shape:

o If the given shape has four sides, and all the sides are equal, then it is labelled as a Square.
o If the given shape has three sides, then it is labelled as a Triangle.
o If the given shape has six equal sides, then it is labelled as a Hexagon.

Now, after training, we test our model using the test set, and the task of the model is to identify the shape. The machine has already been trained on all types of shapes, so when it encounters a new shape, it classifies it on the basis of its number of sides and predicts the output. A minimal sketch of this idea appears below.
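The training/testing loop above can be sketched in Python (assuming a toy encoding where each shape is described by its number of sides and whether all sides are equal; the data and labels here are made up for illustration):

from sklearn.tree import DecisionTreeClassifier

# Each shape is encoded as [number_of_sides, all_sides_equal (1 = yes, 0 = no)]
X_train = [[4, 1], [3, 1], [3, 0], [6, 1], [4, 1], [6, 1]]
y_train = ["square", "triangle", "triangle", "hexagon", "square", "hexagon"]

# Training step: the model learns the labelling rules from the labelled data
model = DecisionTreeClassifier().fit(X_train, y_train)

# Testing step: a new shape with three unequal sides is classified by the trained model
print(model.predict([[3, 0]]))   # -> ['triangle']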

Types of Supervised Machine Learning Algorithms:

Supervised learning can be further divided into two types of problems:

1. Regression

Regression algorithms are used when there is a relationship between the input variables and a continuous output variable. They are used for the prediction of continuous variables, such as weather forecasting, market trends, etc. Below are some popular regression algorithms which come under supervised learning:

o Linear Regression

o Regression Trees

o Non-Linear Regression

o Bayesian Linear Regression

o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which means there are two or more discrete classes such as Yes-No, Male-Female, True-False, spam or not spam, etc. Below are some popular classification algorithms which come under supervised learning:

o Random Forest

o Decision Trees

o Logistic Regression

o Support Vector Machines


Advantages of Supervised learning:

o With the help of supervised learning, the model can predict the output on the basis of prior
experiences.

o In supervised learning, we can have an exact idea about the classes of objects.

o Supervised learning models help us solve various real-world problems such as fraud detection,
spam filtering, etc.

Disadvantages of supervised learning:

o Supervised learning models are not suitable for handling highly complex tasks.

o Supervised learning cannot predict the correct output if the test data is very different from the training
dataset.

o Training requires a lot of computation time.

o In supervised learning, we need enough knowledge about the classes of objects.

Linear Regression

Linear Regression is a machine learning algorithm based on supervised learning, and it is one of the easiest and most popular machine learning algorithms. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc. It performs a regression task: regression models a target prediction value based on independent variables. It is mostly used for finding out the relationship between variables and for forecasting.
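As a minimal sketch of fitting and using a linear regression model (scikit-learn, with made-up experience-vs-salary data; the numbers are purely illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: years of experience (X) vs. salary (y)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([30000, 35000, 41000, 44000, 50000])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # learned slope and intercept of the regression line
print(model.predict([[6]]))           # predicted salary for 6 years of experience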
Advantages:

o Linear Regression is simple to implement, and the output coefficients are easy to interpret.

o When the independent and dependent variables have a linear relationship, this algorithm is the best to use because of its lower complexity compared to other algorithms.

o Linear Regression is susceptible to over-fitting, but this can be avoided using dimensionality reduction techniques, regularization (L1 and L2), and cross-validation.

Disadvantages:

o Outliers can have huge effects on the regression, and the decision boundaries are linear in this technique.

o Linear regression assumes a linear (straight-line) relationship between the dependent and independent variables, and it assumes independence between attributes.

o Linear regression looks at the relationship between the mean of the dependent variable and the independent variables. Just as the mean is not a complete description of a single variable, linear regression is not a complete description of relationships among variables.

Types of Linear Regression

Linear regression can be further divided into two types of algorithm:

o Simple Linear Regression:


If a single independent variable is used to predict the value of a numerical dependent variable,
then such a Linear Regression algorithm is called Simple Linear Regression.

o Multiple Linear regression:


If more than one independent variable is used to predict the value of a numerical dependent
variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

Linear Regression Line

A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

o Positive Linear Relationship:

If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a positive linear relationship.

o Negative Linear Relationship:

If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a negative linear relationship.

Logistic Regression

o Logistic regression is one of the most popular Machine Learning algorithms, which comes under
the Supervised Learning technique. It is used for predicting the categorical dependent variable
using a given set of independent variables.

o Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1, as shown in the sketch below.
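A minimal sketch of this probabilistic output (scikit-learn, with made-up hours-studied vs. pass/fail data):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied (X) vs. pass (1) / fail (0)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[3.5]]))  # probabilities for each class, each between 0 and 1
print(model.predict([[3.5]]))        # final class label after thresholding the probability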

Type of Logistic Regression:

On the basis of the categories, Logistic Regression can be classified into three types:

o Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.

o Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".

o Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".

Advantages:

o Logistic regression is easy to implement and interpret, and very efficient to train.

o It makes no assumptions about the distributions of classes in feature space.

o It extends easily to multiple classes (multinomial regression) and provides a natural probabilistic view of class predictions.

o It not only provides a measure of how appropriate a predictor (coefficient size) is, but also its direction of association (positive or negative).

o It is very fast at classifying unknown records.

o Model coefficients can be interpreted as indicators of feature importance.

Disadvantages:

o If the number of observations is less than the number of features, logistic regression should not be used; otherwise it may lead to overfitting.

o It constructs linear boundaries.

o Its major limitation is the assumption of linearity between the dependent variable and the independent variables.

o It can only be used to predict discrete functions, so the dependent variable of logistic regression is bound to the discrete number set.

o Non-linear problems cannot be solved with logistic regression because it has a linear decision surface, and linearly separable data is rarely found in real-world scenarios.

o It is tough to capture complex relationships using logistic regression.

Naïve Bayes Classifier Algorithm

o The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem and used for solving classification problems.

o It is mainly used in text classification that includes a high-dimensional training dataset.

o Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.

o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.

o Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and
classifying articles.

The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be described as:

o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.

o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

where P(A|B) is the posterior probability of class A given predictor B, P(B|A) is the likelihood of predictor B given class A, P(A) is the prior probability of the class, and P(B) is the prior probability of the predictor.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the below example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:

1. Convert the given dataset into frequency tables.

2. Generate Likelihood table by finding the probabilities of given features.

3. Now, use Bayes theorem to calculate the posterior probability.
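These steps can be sketched in plain Python. Assuming a toy version of the weather/"Play" dataset (the rows below are made up), the posterior P(Yes | Sunny) follows directly from the frequency counts:

# Toy dataset of (outlook, play) pairs
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
        ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "Yes"),
        ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rainy", "No")]

# Steps 1 and 2: frequency and likelihood tables
n = len(data)
p_yes = sum(1 for _, play in data if play == "Yes") / n                  # P(Yes)
p_sunny = sum(1 for outlook, _ in data if outlook == "Sunny") / n        # P(Sunny)
p_sunny_given_yes = (sum(1 for o, p in data if o == "Sunny" and p == "Yes")
                     / sum(1 for _, p in data if p == "Yes"))            # P(Sunny|Yes)

# Step 3: Bayes' theorem -> P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
print(p_sunny_given_yes * p_yes / p_sunny)   # -> 0.6 for this toy data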

Advantages of Naïve Bayes Classifier:

o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.

o It can be used for Binary as well as Multi-class Classifications.

o It performs well in Multi-class predictions as compared to the other Algorithms.

o It is the most popular choice for text classification problems.


Disadvantages of Naïve Bayes Classifier:

o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.

Applications of Naïve Bayes Classifier:

o It is used for Credit Scoring.

o It is used in medical data classification.

o It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.

o It is used in Text classification such as Spam filtering and Sentiment analysis.

Types of Naïve Bayes Model:

There are three types of Naive Bayes Model, which are given below:

o Gaussian: The Gaussian model assumes that features follow a normal distribution. This means if
predictors take continuous values instead of discrete, then the model assumes that these values are
sampled from the Gaussian distribution.

o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., determining which category a particular document belongs to, such as Sports, Politics, Education, etc. The classifier uses the frequency of words as the predictors.

o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also popular for document classification tasks.
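A minimal sketch of the three variants in scikit-learn (all data below is made up); the right choice depends on whether the features are continuous values, word counts, or booleans:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 1, 0, 1])                                           # class labels
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [6.7, 3.1]])  # continuous features
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 1, 2], [0, 0, 4]])    # word counts
X_bool = (X_counts > 0).astype(int)                                  # word present / absent

print(GaussianNB().fit(X_cont, y).predict([[5.0, 3.2]]))      # Gaussian: continuous values
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 2]]))  # Multinomial: frequencies
print(BernoulliNB().fit(X_bool, y).predict([[1, 0, 1]]))      # Bernoulli: booleans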

Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used
for Classification as well as Regression problems. However, primarily, it is used for Classification problems
in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-
dimensional space into classes so that we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.
Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

o Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional
space, but we need to find out the best decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.

We always create the hyperplane that has the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of either class.

Support Vectors:

The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors. Since these vectors "support" the hyperplane, they are called support vectors.
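A minimal scikit-learn sketch (with made-up 2-D points) that fits a linear SVM and inspects the support vectors it found:

import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-D data
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

print(clf.support_vectors_)   # the data points closest to the hyperplane
print(clf.predict([[4, 4]]))  # classify a new point using the learned boundary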

Strengths of SVM

• SVM can be used for both classification and regression.
• It is robust, i.e. not much impacted by noisy data or outliers.
• The prediction results using this model are very promising.
• It is effective on datasets with many features, like financial or medical data.
• It is effective in cases where the number of features is greater than the number of data points.
• Different kernel functions can be specified for the decision function. You can use common kernels, but it is also possible to specify custom kernels.
Weaknesses of SVM

• The basic SVM is applicable only to binary classification, i.e. when there are only two classes in the problem domain; multi-class problems require extensions such as one-vs-rest.
• The SVM model is very complex, almost like a black box, when it deals with a high-dimensional dataset.
• It is slow for a large dataset, i.e. a dataset with either a large number of features or a large number of instances.
• It is quite memory-intensive.

Evaluation method

• A Confusion Matrix is a tool to determine the performance of a classifier. It contains information about actual and predicted classifications.

• A matrix containing correct and incorrect predictions in the form of TPs, FPs, FNs and TNs is known as a confusion matrix.

• True Positive (TP) is the number of correct predictions that an example is positive, which means a positive class is correctly identified as positive.
Example: the given class is spam and the classifier has correctly predicted it as spam.

• True Negative (TN) is the number of correct predictions that an example is negative, which means a negative class is correctly identified as negative.
Example: the given class is non-spam and the classifier has correctly predicted it as non-spam.

• False Positive (FP) is the number of incorrect predictions that an example is positive, which means a negative class is incorrectly identified as positive.
Example: the given class is non-spam; however, the classifier has incorrectly predicted it as spam.

• False Negative (FN) is the number of incorrect predictions that an example is negative, which means a positive class is incorrectly identified as negative.
Example: the given class is spam; however, the classifier has incorrectly predicted it as non-spam.

• Accuracy is the total number of correct classifications (either as the class of interest, i.e. True Positives, or as not the class of interest, i.e. True Negatives) divided by the total number of classifications made:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
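A minimal sketch with scikit-learn (the spam labels and predictions below are made up) that extracts TN, FP, FN, TP from the confusion matrix and computes accuracy:

from sklearn.metrics import confusion_matrix, accuracy_score

# 1 = spam, 0 = non-spam (hypothetical true labels and classifier predictions)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)                   # counts of TN, FP, FN, TP
print((tp + tn) / (tp + tn + fp + fn))  # accuracy computed from the matrix
print(accuracy_score(y_true, y_pred))   # the same value via scikit-learn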

Difference between bagging and boosting


1. Bagging is the simplest way of combining predictions that belong to the same type. Boosting is a way of combining predictions that belong to different types.

2. Bagging aims to decrease variance, not bias. Boosting aims to decrease bias, not variance.

3. In bagging, each model receives equal weight. In boosting, models are weighted according to their performance.

4. In bagging, each model is built independently. In boosting, new models are influenced by the performance of previously built models.

5. In bagging, different training data subsets are randomly drawn with replacement from the entire training dataset. In boosting, every new subset contains the elements that were misclassified by previous models.

6. Bagging tries to solve the over-fitting problem. Boosting tries to reduce bias.

7. If the classifier is unstable (high variance), then apply bagging. If the classifier is stable and simple (high bias), then apply boosting.

8. Example: the Random Forest model uses bagging. Example: AdaBoost uses boosting.
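A minimal scikit-learn sketch contrasting the two on a synthetic dataset (Random Forest as the bagging-style example and AdaBoost as the boosting example, matching point 8 above):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, split into train and test sets
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging-style ensemble: independent trees on bootstrap samples, equal weight
bagging = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Boosting ensemble: each new learner focuses on previously misclassified examples
boosting = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("Bagging (Random Forest) accuracy:", bagging.score(X_te, y_te))
print("Boosting (AdaBoost) accuracy:", boosting.score(X_te, y_te))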

Topic 3.8 left to be covered
