Unit 2 - NOTES1 - ML
Unit 2 - Supervised Learning
Machine Learning
How Does Supervised Learning Work?
In supervised learning, models are trained on a labelled dataset,
from which the model learns the characteristics of each type of data.
Once the training process is completed, the model is tested on test
data (a portion of the dataset held out from training), and it then
predicts the output.
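The workflow above can be sketched in a few lines of Python. This is a minimal illustration with an invented toy dataset and a trivial majority-class "model"; real systems use proper learning algorithms, but the train/test pattern is the same.

```python
# Sketch of the supervised-learning workflow: train on labelled data,
# then evaluate on held-out test data the model never saw.
# The dataset here is invented purely for illustration.
from collections import Counter
import random

labelled = [({"len": n}, "spam" if n > 5 else "ham") for n in range(20)]
random.seed(0)
random.shuffle(labelled)

split = int(0.8 * len(labelled))          # 80% train, 20% test
train, test = labelled[:split], labelled[split:]

# "Training": a trivial baseline that always predicts the majority class.
majority = Counter(label for _, label in train).most_common(1)[0][0]

# Testing: accuracy measured only on the held-out data.
correct = sum(1 for _, label in test if majority == label)
accuracy = correct / len(test)
print(f"majority class: {majority}, test accuracy: {accuracy:.2f}")
```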
Classification in Machine Learning
Supervised machine learning algorithms can be broadly classified
into Regression and Classification algorithms.
Classification is the process of categorizing a given set of data into
classes.
It can be performed on both structured and unstructured data.
A program learns from the given dataset or observations and then
classifies new observations into a number of classes or groups.
E.g. : Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. The
classes are often referred to as targets, labels, or categories.
The process starts with predicting the class of given data points.
Classification predictive modeling is the task of approximating a
mapping function from input variables to discrete output variables.
The main goal is to identify which class/category the new data will
fall into.
The best example of an ML classification algorithm is
Email Spam Detector.
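A toy version of the spam-detector idea can be written as a keyword-count rule. The keyword list and threshold below are invented for illustration; a real detector would learn its weights from labelled emails rather than use a hand-written rule.

```python
# Toy "spam detector": classify a message as spam or not spam by
# counting suspicious keywords. Keywords and threshold are invented.
SPAM_WORDS = {"free", "winner", "prize", "urgent", "offer"}

def classify(message: str) -> str:
    words = message.lower().split()
    hits = sum(1 for w in words if w in SPAM_WORDS)
    return "spam" if hits >= 2 else "not spam"

print(classify("urgent free prize inside"))  # several keywords -> spam
print(classify("meeting moved to 3pm"))      # no keywords -> not spam
```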
Classification in Machine Learning
E.g. : Heart disease detection can be identified as a classification problem.
This is binary classification, since there can be only two classes: has
heart disease or does not have heart disease. The classifier, in this case,
needs training data to understand how the given input variables are related
to the class. Once the classifier is trained accurately, it can be used to
detect whether or not a particular patient has heart disease.
The most common classification problems are speech recognition, face
detection, handwriting recognition, document classification, etc. A
classification problem can be either binary or multi-class.
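The heart-disease example above can be sketched as a binary classifier that learns a single decision threshold from training data. The feature (resting heart rate) and all values below are invented for illustration; real classifiers use many features and stronger models.

```python
# Binary classification sketch: learn, from invented training data, the
# heart-rate threshold that best separates the two classes.
train = [(62, 0), (70, 0), (74, 0), (88, 1), (95, 1), (102, 1)]  # (rate, label)

best_thresh, best_correct = None, -1
for thresh, _ in train:
    correct = sum(1 for rate, label in train
                  if (1 if rate >= thresh else 0) == label)
    if correct > best_correct:
        best_thresh, best_correct = thresh, correct

def predict(rate: float) -> int:
    """1 = has heart disease, 0 = does not (per the learned threshold)."""
    return 1 if rate >= best_thresh else 0

print(best_thresh, predict(68), predict(99))
```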
Types of Classification
The algorithm which implements the classification on a dataset is
known as a classifier.
Binary Classifier: If the classification problem has only two
possible outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT
SPAM, CAT or DOG, etc.
Multi-class Classifier: If a classification problem has more than
two outcomes, it is called a Multi-class Classifier.
Example: Classification of types of music.
Types Of Learners In Classification
Lazy Learners – Lazy learners simply store the training data and wait until
test data appears. Classification is then done using the most closely related
data in the stored training set. They take longer at prediction time than
eager learners. Eg – k-nearest neighbour, case-based reasoning.
Eager Learners – Eager learners construct a classification model from the
given training data before receiving data for prediction. They must commit
to a single hypothesis that works for the entire instance space. Due to
this, they take a long time to train and less time to predict.
Eg – Decision Tree, Naive Bayes, Artificial Neural Networks.
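The lazy-learner idea can be shown with a tiny 1-nearest-neighbour classifier: "training" is just storing the points, and all the distance computation happens at prediction time. The points are invented for illustration.

```python
# Lazy learning sketch: 1-nearest neighbour. Training = storing the data;
# all work is deferred to prediction time. Points are invented.
train_points = [((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
                ((4.0, 4.2), "dog"), ((4.5, 3.9), "dog")]

def predict_1nn(x: float, y: float) -> str:
    # Compute distances to every stored point only when asked to predict.
    def dist_sq(p):
        (px, py), _ = p
        return (px - x) ** 2 + (py - y) ** 2
    return min(train_points, key=dist_sq)[1]

print(predict_1nn(1.1, 0.9))  # nearest stored points are cats
print(predict_1nn(4.2, 4.0))  # nearest stored points are dogs
```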
Regression
Regression is the process of finding a model or function that maps the
data to continuous real values instead of classes or discrete values.
Regression analysis is a statistical method to model the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.
Regression analysis helps us to understand how the value of the dependent
variable is changing corresponding to an independent variable when other
independent variables are held fixed.
It predicts continuous/real values such as temperature, age, salary, price, etc.
E.g. : A company wants to predict its sales when it spends 200 on
advertisement.
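The advertising example can be worked out with ordinary least squares in a few lines. The (spend, sales) pairs below are invented for illustration; the closed-form slope and intercept are the standard simple-linear-regression formulas.

```python
# Simple linear regression (closed form): fit sales as a linear function
# of advertising spend, then predict sales for a spend of 200.
# The data points are invented for illustration.
ads   = [50, 100, 150, 250, 300]
sales = [12, 20, 28, 44, 52]

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(sales) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, sales))
         / sum((x - mean_x) ** 2 for x in ads))
intercept = mean_y - slope * mean_x

predicted = intercept + slope * 200
print(f"sales = {intercept:.2f} + {slope:.3f} * spend; at 200: {predicted:.1f}")
```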
Regression
Regression is a supervised learning technique which helps in finding the
correlation between variables and enables us to predict a continuous
output variable based on one or more predictor variables.
It is mainly used for prediction, forecasting, time series modeling, and
determining the causal-effect relationship between variables.
"Regression shows a line or curve that passes through all the datapoints
on target-predictor graph in such a way that the vertical distance between
the datapoints and the regression line is minimum."
Regression Examples
Some examples of regression are:
Prediction of rain using temperature and other factors
Determining Market trends
Prediction of road accidents due to rash driving.
Regression Terminology
Dependent Variable: The main factor in regression analysis that we want to predict
or understand is called the dependent variable. It is also called the target variable.
Independent Variable: The factors that affect the dependent variable, or that are
used to predict its values, are called independent variables, also called
predictors.
Outliers: An outlier is an observation with either a very low or very high value
in comparison to the other observed values.
Multicollinearity: If the independent variables are highly correlated with each
other, this condition is called multicollinearity.
Underfitting and Overfitting: If our algorithm works well on the training dataset
but not on the test dataset, the problem is called overfitting. If our algorithm
does not perform well even on the training dataset, the problem is called
underfitting.
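The multicollinearity check above can be made concrete by computing the Pearson correlation between two predictors; a value near 1 or -1 flags highly correlated features. All the data below are invented for illustration.

```python
# Multicollinearity sketch: Pearson correlation between predictors.
# Height in cm vs height in inches is the same quantity, so r is ~1.0.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

height_cm   = [150, 160, 170, 180, 190]
height_inch = [59, 63, 67, 71, 75]      # same quantity in other units
shoe_size   = [38, 44, 39, 43, 41]      # a less related predictor

print(pearson(height_cm, height_inch))  # ~1.0: multicollinear pair
print(pearson(height_cm, shoe_size))    # much weaker correlation
```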
Linear Regression vs Logistic Regression
In Linear Regression, we predict the value as a continuous number. In
Logistic Regression, we predict the value as 1 or 0.
In Linear Regression, the dependent variable should be numeric and the
response variable is continuous in value. In Logistic Regression, the
dependent variable consists of only two categories; logistic regression
estimates the odds of the outcome of the dependent variable given a set of
quantitative or categorical independent variables.
Linear regression is used to estimate the dependent variable in case of a
change in independent variables; for example, predict the price of houses.
Logistic regression is used to calculate the probability of an event; for
example, classify whether a mail is spam or not.
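The contrast above can be shown directly: a linear model outputs an unbounded continuous value, while logistic regression passes a linear score through the sigmoid to get a probability in (0, 1). The coefficients below are invented for illustration, not fitted to data.

```python
# Linear vs logistic output. Coefficients are invented for illustration.
import math

def linear(x):
    # e.g. predicted house price: any real value
    return 50.0 + 3.0 * x

def logistic(x):
    # e.g. probability that a mail is spam
    z = -4.0 + 0.8 * x             # linear score...
    return 1 / (1 + math.exp(-z))  # ...squashed into (0, 1) by the sigmoid

print(linear(10))    # unbounded continuous value
print(logistic(10))  # a probability between 0 and 1
```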
Polynomial Regression
It is the same as Multiple Linear Regression with a little modification.
It is used for curvilinear data.
It is a regression algorithm that models the relationship between a dependent variable (y) and an
independent variable (x) as an nth-degree polynomial:
y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n
It is also called a special case of Multiple Linear Regression in ML, because we add polynomial terms
to the Multiple Linear Regression equation to convert it into Polynomial Regression.
It is a linear model with some modification in order to increase the accuracy.
Support Vector Regression
It identifies a hyperplane with maximum margin such that
the maximum number of data points are within those
margins.
The basic idea behind SVR is to find the best-fit line. In SVR, the
best-fit line is the hyperplane that contains the maximum number of
data points within its margin.
In SVR, this straight line is referred to as hyperplane.
The data points on either side of the hyperplane that are closest
to it are called support vectors, and they are used to plot the
boundary lines.
Unlike other Regression models that try to minimize the error
between the real and predicted value, the SVR tries to fit the
best line within a threshold value.
The threshold value is the distance between the hyperplane and
boundary line.
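The threshold idea above corresponds to SVR's epsilon-insensitive loss: errors inside the epsilon "tube" around the hyperplane cost nothing, and only points outside the tube contribute to the loss. The epsilon value and sample errors below are invented for illustration.

```python
# SVR's epsilon-insensitive loss: zero inside the tube, linear outside.
def epsilon_insensitive_loss(y_true, y_pred, epsilon=1.0):
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(10.0, 10.4))  # inside the tube
print(epsilon_insensitive_loss(10.0, 12.5))  # outside the tube
```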
Decision Tree Regression
It is a tree-structured model with three types of nodes.
The Root Node is the initial node; it represents the entire
sample and may be split into further nodes.
The Interior Nodes represent the features of a data set and the
branches represent the decision rules.
Finally, the Leaf Nodes represent the outcome. This algorithm
is very useful for solving decision-related problems.
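The node structure above can be sketched as a depth-1 regression tree (a "stump"): the root node tries every split of a single feature, keeps the one that minimises the squared error, and each leaf node predicts the mean target of its side. The (x, y) data are invented for illustration.

```python
# Decision tree regression at depth 1: best single split by squared error,
# leaf prediction = mean of the targets on that side. Data are invented.
data = [(1, 5.0), (2, 6.0), (3, 5.5), (10, 20.0), (11, 21.0), (12, 19.5)]

def sse(ys):
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

best = None  # (total_sse, threshold, left_mean, right_mean)
for thresh, _ in data:
    left  = [y for x, y in data if x < thresh]
    right = [y for x, y in data if x >= thresh]
    if not left or not right:
        continue
    total = sse(left) + sse(right)
    if best is None or total < best[0]:
        best = (total, thresh, sum(left) / len(left), sum(right) / len(right))

_, threshold, left_mean, right_mean = best

def predict(x):
    return left_mean if x < threshold else right_mean

print(threshold, predict(2), predict(11))
```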
Advantages of Supervised learning:
With the help of supervised learning, the model can predict the
output on the basis of prior experiences.
In supervised learning, we can have an exact idea about the
classes of objects.
Supervised learning models help us to solve various real-world
problems such as fraud detection, spam filtering, etc.
Disadvantages of supervised learning:
Supervised learning models are not suitable for handling complex
tasks.
Supervised learning cannot predict the correct output if the test
data is different from the training dataset.
Training requires a lot of computation time.
In supervised learning, we need enough knowledge about the
classes of objects.