
Machine Learning

Unit 2 - Supervised Learning

How Supervised Learning Works?

In supervised learning, models are trained on a labelled dataset, where each example is paired with its correct output, so the model learns what characterizes each type of data.

Once the training process is completed, the model is tested on held-out test data (a portion of the dataset that was not used for training), and its predictions are compared against the known labels.
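The train/test workflow above can be sketched in a few lines of Python. The 80/20 split ratio and the toy even/odd dataset here are assumptions made for illustration, not from the notes:

```python
import random

# Minimal sketch of the train/test workflow: hold out part of the labelled
# data, train on the rest, and evaluate on the held-out portion.

def train_test_split(data, test_ratio=0.2, seed=42):
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)     # fixed seed -> reproducible
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = [(x, "even" if x % 2 == 0 else "odd") for x in range(100)]
train, test = train_test_split(data)
print(len(train), len(test))   # prints: 80 20
```

Keeping the test portion untouched during training is what makes the measured accuracy an honest estimate of performance on unseen data.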
Classification in Machine Learning

Supervised machine learning algorithms can be broadly divided into Regression and Classification algorithms.

Classification is a process of categorizing a given set of data into
classes.

It can be performed on both structured and unstructured data.

A program learns from the given dataset or observations and then
classifies new observations into one of a number of classes or groups.

E.g. : Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes
are also called targets, labels, or categories.

The process starts with predicting the class of given data points.

Classification predictive modeling is the task of approximating a
mapping function from input variables to discrete output variables.
The main goal is to identify which class/category new data will fall into.
The best example of an ML classification algorithm is
Email Spam Detector.
Classification in Machine Learning

E.g. : Heart disease detection can be framed as a classification problem.
It is a binary classification, since there are only two classes: has
heart disease or does not have heart disease. The classifier needs
training data to learn how the given input variables relate to the class.
Once the classifier is trained accurately, it can be used to detect
whether a particular patient has heart disease.

The most common classification problems are speech recognition, face
detection, handwriting recognition, document classification, etc. Each of
these can be either a binary or a multi-class classification problem.
Types of Classification

The algorithm which implements the classification on a dataset is
known as a classifier.

Binary Classifier: If the classification problem has only two
possible outcomes, it is called a Binary Classifier.

Examples: YES or NO, MALE or FEMALE, SPAM or NOT
SPAM, CAT or DOG, etc.

Multi-class Classifier: If a classification problem has more than
two outcomes, it is called a Multi-class Classifier.

Example: Classification of types of music.
Types Of Learners In Classification

Lazy Learners – Lazy learners simply store the training data and wait until
test data appears. Classification is then done using the most closely related
data in the stored training set. They take more time to predict than eager
learners. Eg – k-nearest neighbors, case-based reasoning.


Eager Learners – Eager learners construct a classification model from the
given training data before receiving data for prediction. The model must
commit to a single hypothesis that works for the entire instance space.
Because of this, eager learners take a long time to train and less time to
predict. Eg – Decision Tree, Naive Bayes, Artificial Neural Networks.
Regression

Regression is the process of finding a model or function that maps the
data to continuous real values instead of discrete classes.

Regression analysis is a statistical method for modeling the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.

Regression analysis helps us to understand how the value of the dependent
variable is changing corresponding to an independent variable when other
independent variables are held fixed.

It predicts continuous/real values such as temperature, age, salary, price, etc.

E.g. : A company wants to predict its sales when its advertisement
spend is 200.
Regression

Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict a continuous
output variable based on one or more predictor variables.

It is mainly used for prediction, forecasting, time-series modeling, and
determining cause-and-effect relationships between variables.

"Regression shows a line or curve that passes through the datapoints
on the target-predictor graph in such a way that the vertical distance
between the datapoints and the regression line is minimized."
Regression Examples

Some examples of regression can be as:


Prediction of rainfall using temperature and other factors

Determining market trends

Prediction of road accidents due to rash driving
Regression Terminology

Dependent Variable: The main factor in Regression analysis which we want to predict
or understand is called the dependent variable. It is also called target variable.

Independent Variable: The variables that affect the dependent variable, or
that are used to predict its values, are called independent variables, also
called predictors.

Outliers: An outlier is an observation with either a very low or a very high
value in comparison to the other observed values.

Multicollinearity: If the independent variables are highly correlated with
each other, the condition is called multicollinearity.

Underfitting and Overfitting: If our algorithm works well with the training
dataset but not with the test dataset, the problem is called overfitting. If
our algorithm does not perform well even on the training dataset, the problem
is called underfitting.
Classification vs Regression
Basic: In classification, the mapping function maps values to predefined
classes; in regression, it maps values to continuous output.

Involves prediction of: Discrete values (classification) vs. continuous
values (regression).

Nature of the predicted data: Unordered (classification) vs. ordered
(regression).

Method of calculation: Classification is evaluated by measuring accuracy;
regression by measuring root mean square error.

Example algorithms: Classification – decision tree, logistic regression,
etc.; Regression – regression tree (random forest), linear regression, etc.
Regression Techniques

Regression Analysis is a statistical process for estimating the
relationships between a dependent variable and one or more
independent variables.

A regression problem is when the output variable is a real or
continuous value, such as “salary” or “weight”. Many different
models can be used, the simplest is linear regression.
Linear Regression

Linear Regression is a machine learning algorithm based on supervised
learning.

It is a statistical method that is used for predictive analysis. Linear regression
makes predictions for continuous/real or numeric variables such as sales, salary,
age, product price, etc.

Linear regression shows the linear relationship between the independent
variable (X-axis) and the dependent variable (Y-axis), which is why it is
called linear regression.

It finds how the value of the dependent variable is changing according to the
value of the independent variable.

A graph of this relationship presents a straight line between the dependent
and independent variables: when the value of x (the independent variable)
increases, the value of y (the dependent variable) increases likewise.

Mathematically, we can represent a simple linear regression as:

y = b0 + b1x + ε

y = Dependent variable (target variable)
x = Independent variable (predictor variable)
b0 = Intercept of the line (gives an additional degree of freedom)
b1 = Linear regression coefficient (scale factor applied to the input value)
ε = Random error
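The coefficients above can be fitted with the closed-form least-squares formulas. This is a minimal pure-Python sketch; the advertisement-spend vs. sales numbers are invented for illustration:

```python
# Minimal sketch (pure Python): fitting y = b0 + b1*x by ordinary least
# squares, using the closed-form formulas b1 = cov(x, y) / var(x) and
# b0 = mean(y) - b1 * mean(x).

def fit_simple_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical advertisement-spend vs. sales figures (illustrative only)
xs = [100, 150, 200, 250, 300]
ys = [20, 28, 41, 49, 62]

b0, b1 = fit_simple_linear(xs, ys)
print(f"y = {b0:.2f} + {b1:.2f}x")   # prints: y = -2.00 + 0.21x
print("predicted sales at x=200:", b0 + b1 * 200)
```

These formulas minimize the sum of squared vertical distances between the points and the line, which is exactly the criterion the regression-line definition above describes.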
Types of Linear Regression

Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called
Simple Linear Regression.

Multiple Linear regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.


A straight line showing the relationship between the dependent and
independent variables is called a regression line. A regression line can
show two types of relationship:

Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent
variable increases on X-axis, then such a relationship is termed as a
Positive linear relationship.


Negative Linear Relationship:
If the dependent variable decreases on the Y-axis and independent
variable increases on the X-axis, then such a relationship is called a
negative linear relationship.

Logistic Regression

Logistic regression is basically a supervised classification algorithm.

It is preferred when the dependent variable is binary in nature.

Logistic regression is a type of regression analysis technique used when the
dependent variable is discrete. Example: 0 or 1, true or false, spam or not
spam, etc.

This means the target variable can have only two values, and a sigmoid curve denotes
the relation between the target variable and the independent variable.

Logit function is used in Logistic Regression to measure the relationship between the
target variable and independent variables.
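The sigmoid curve and threshold mentioned above can be sketched as follows; the coefficients b0 and b1 are hypothetical, not fitted to any real data:

```python
import math

# Minimal sketch: the sigmoid (logistic) curve maps any real-valued score
# to a probability in (0, 1); a threshold then turns it into a 0/1 class.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, b0, b1, threshold=0.5):
    p = sigmoid(b0 + b1 * x)       # probability of the positive class
    return int(p >= threshold), p

# Hypothetical coefficients (not fitted to real data)
label, prob = predict(x=3.0, b0=-4.0, b1=2.0)
print(label, round(prob, 3))       # prints: 1 0.881
```

The threshold is what turns a continuous probability into one of the two classes, which is the key difference from linear regression noted in the comparison below.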

Linear Vs Logistic
Linear Regression is a supervised regression model; Logistic Regression is
a supervised classification model.

In Linear Regression, we predict a continuous numeric value; in Logistic
Regression, we predict 1 or 0.

In Linear Regression, no threshold value is needed; in Logistic Regression,
a threshold value is added.

In Linear Regression, the dependent variable must be numeric and the
response is continuous in value; in Logistic Regression, the dependent
variable has only two categories, and the model estimates the odds of the
outcome given a set of quantitative or categorical independent variables.

Linear regression is used to estimate how the dependent variable changes
with the independent variables (e.g., predicting house prices); logistic
regression is used to calculate the probability of an event (e.g.,
classifying whether a mail is spam or not).
Polynomial Regression

It is the same as Multiple Linear Regression with a little modification.

It is used for curvilinear data.

It is a regression algorithm that models the relationship between a dependent variable (y) and an
independent variable (x) as an nth-degree polynomial:

y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

It is also called a special case of Multiple Linear Regression in ML, because we add polynomial
terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.

It is a linear model with some modification in order to increase the accuracy.
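The idea of converting linear regression into polynomial regression by adding polynomial terms can be sketched as a feature expansion; the coefficients below are hypothetical, chosen only to illustrate the evaluation:

```python
# Minimal sketch: polynomial regression is linear regression on expanded
# features. Each input x is mapped to [x, x^2, ..., x^n], so the model
# stays linear in the coefficients b0..bn.

def polynomial_features(x, degree):
    """Expand a single value x into [x, x^2, ..., x^degree]."""
    return [x ** d for d in range(1, degree + 1)]

def predict_poly(x, coeffs):
    """coeffs = [b0, b1, ..., bn]; returns b0 + b1*x + ... + bn*x^n."""
    b0, rest = coeffs[0], coeffs[1:]
    return b0 + sum(b * f
                    for b, f in zip(rest, polynomial_features(x, len(rest))))

# Hypothetical coefficients for y = 1 + 2x + 3x^2 (illustrative only)
print(predict_poly(2.0, [1.0, 2.0, 3.0]))   # prints: 17.0
```

Because the model is still linear in b0..bn, the same least-squares machinery used for linear regression can fit it after the expansion.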


Support Vector Regression

It identifies a hyperplane with maximum margin such that the
maximum number of data points fall within that margin.

The basic idea behind SVR is to find the best-fit line. In SVR, the
best-fit line is the hyperplane that contains the maximum number of
points within its margin.

In SVR, this straight line is referred to as the hyperplane.

The data points on either side of the hyperplane that are closest to
it are called support vectors, and they are used to plot the
boundary lines.

Unlike other Regression models that try to minimize the error
between the real and predicted value, the SVR tries to fit the
best line within a threshold value.

The threshold value is the distance between the hyperplane and
boundary line.
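The threshold idea above corresponds to SVR's epsilon-insensitive loss; a minimal sketch, assuming an illustrative tube width (epsilon) of 0.5:

```python
# Minimal sketch: SVR's epsilon-insensitive loss. Prediction errors smaller
# than the threshold epsilon (the tube around the hyperplane) cost nothing;
# larger errors are penalized only by the amount they exceed epsilon.

def eps_insensitive_loss(y_true, y_pred, eps=0.5):
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(10.0, 10.3))   # inside the tube -> 0.0
print(eps_insensitive_loss(10.0, 11.2))   # outside the tube -> ~0.7
```

This is why SVR, unlike ordinary regression, ignores small errors entirely: only points outside the tube (the support vectors) influence the fitted line.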
Decision Tree Regression

It is a tree-structured classifier with three types of nodes.

The Root Node is the initial node, which represents the entire
sample and may be split into further nodes.

The Interior Nodes represent the features of a data set and the
branches represent the decision rules.

Finally, the Leaf Nodes represent the outcome. This algorithm
is very useful for solving decision-related problems.
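A depth-1 regression tree (a stump) illustrates the node types above on a small scale; the data points are invented for illustration:

```python
# Minimal sketch: a depth-1 regression "tree" (a stump). The root node
# tries every split threshold, keeps the one that minimizes squared error,
# and each leaf node predicts the mean target of its side.

def mean(vs):
    return sum(vs) / len(vs)

def fit_stump(xs, ys):
    best = None
    for t in sorted(set(xs)):                      # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        err = (sum((y - mean(left)) ** 2 for y in left)
               + sum((y - mean(right)) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, mean(left), mean(right))
    _, t, left_val, right_val = best
    return lambda x: left_val if x <= t else right_val

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.5, 4.5, 20.0, 20.5, 19.5]
tree = fit_stump(xs, ys)
print(tree(2), tree(11))   # prints: 5.0 20.0
```

A full decision tree repeats this splitting recursively at each interior node until a stopping criterion is met.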
Advantages of Supervised learning:


With the help of supervised learning, the model can predict the
output on the basis of prior experiences.

In supervised learning, we can have an exact idea about the
classes of objects.

Supervised learning models help us to solve various real-world
problems such as fraud detection, spam filtering, etc.
Disadvantages of supervised learning:


Supervised learning models are not suitable for handling
complex tasks.

Supervised learning cannot predict the correct output if the test
data is different from the training dataset.

Training requires a lot of computation time.

In supervised learning, we need enough knowledge about the
classes of objects.
