ML Unit-2 Material WORD
(IV IT – I SEM.)
UNIT – II
Supervised Learning
Learning a Class from Examples, Linear, Non-linear, Multi-class and Multi-label classification, Decision
Trees: ID3, Classification and Regression Trees (CART), Regression: Linear Regression, Multiple Linear Regression
UNIT – II
Supervised Learning
Decision Tree
Introduction

Decision Trees are a type of supervised machine learning (that is, you specify what the input is and what the corresponding output is in the training data) in which the data is continuously split according to a certain parameter. A tree can be explained by two entities, namely decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.
An example of a decision tree can be explained using a binary tree. Let's say you want to predict whether a person is fit given information such as age, eating habits, and physical activity. The decision nodes here are questions like 'What is the age?', 'Does he exercise?', and 'Does he eat a lot of pizza?', and the leaves are outcomes such as 'fit' or 'unfit'. In this case it is a binary classification problem (a yes/no type of problem). There are two main types of Decision Trees:

Classification Trees: What we have seen above is an example of a classification tree, where the outcome is a categorical variable such as 'fit' or 'unfit'.

Regression Trees: Here the decision or outcome variable is continuous, e.g. a number like 123.

Working

Now that we know what a Decision Tree is, we'll see how it works internally. There are many algorithms that construct Decision Trees, but one of the best known is the ID3 algorithm. ID3 stands for Iterative Dichotomiser 3. Before discussing the ID3 algorithm, we'll go through a few definitions.

Entropy

Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is a measure of the amount of uncertainty or randomness in data. Intuitively, it tells us about the predictability of a certain event. For example, consider a coin toss whose probability of heads is 0.5 and probability of tails is 0.5. Here the entropy is the highest possible, since there is no way of determining what the outcome might be. Alternatively,
consider a coin which has heads on both sides; the outcome of such an event can be predicted perfectly, since we know beforehand that it will always be heads. In other words, this event has no randomness, hence its entropy is zero. In general, lower entropy values imply less uncertainty while higher values imply more uncertainty.

Information Gain

Information gain, also called Kullback-Leibler divergence and denoted by IG(S, A) for a set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative change in entropy with respect to the independent variables.
Information Gain Formula:

IG(S, A) = H(S) − Σ P(x) · H(x)

where IG(S, A) is the information gain obtained by applying feature A, H(S) is the entropy of the entire set, and the second term is the entropy remaining after applying feature A, with the sum running over the possible values x of attribute A and P(x) the probability of value x.

Let's understand this with the help of an example. Consider a piece of data collected over the course of 14 days, where the features are Outlook, Temperature, Humidity and Wind, and the outcome variable is whether Golf was played on the day. Our job is to build a predictive model which takes the above 4 parameters and predicts whether Golf will be played on the day. We'll build a decision tree to do that using the ID3 algorithm.
ID3

The ID3 algorithm performs the following tasks recursively:
1. Calculate the entropy H(S) of the current set of examples.
2. For each remaining attribute, calculate the information gain obtained by splitting on that attribute.
3. Select the attribute with the highest information gain and split the set on it.
4. Repeat on each resulting subset with the remaining attributes, until every subset is pure (all examples have the same class) or no attributes remain.
Now we'll go ahead and grow the decision tree. The initial step is to calculate H(S), the entropy of the current state. In the above example, we can see that in total there are 5 No's and 9 Yes's:

Yes | No | Total
 9  |  5 |  14

H(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940
Next we calculate the second term of the information gain formula, where x ranges over the possible values of an attribute. Here, the attribute 'Wind' takes two possible values in the sample data, hence x = {Weak, Strong}, and we'll have to calculate the entropy of each subset.

Amongst all 14 examples we have 8 places where the wind is Weak and 6 where the wind is Strong.

Now, out of the 8 Weak examples, 6 of them were 'Yes' for Play Golf and 2 of them were 'No' for Play Golf. So we have:

H(S_weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) ≈ 0.811

Similarly, out of the 6 Strong examples, we have 3 examples where the outcome was 'Yes' for Play Golf and 3 where we had 'No' for Play Golf:

H(S_strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0
Remember, here half the items belong to one class while the other half belong to the other class, hence we have perfect randomness (entropy of 1). Now we have all the pieces required to calculate the Information Gain:

IG(S, Wind) = H(S) − (8/14) · H(S_weak) − (6/14) · H(S_strong) = 0.940 − (8/14)(0.811) − (6/14)(1.0) ≈ 0.048

This tells us that considering 'Wind' as the feature gives an information gain of 0.048. Now we must similarly calculate the Information Gain for all the features.
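As a quick check, the same arithmetic can be scripted. The short Python sketch below is only an illustration (the helper names entropy and info_gain are our own); the counts come from the worked example above:

from math import log2

def entropy(counts):
    # Shannon entropy of a class distribution given as a list of counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, subsets):
    # IG(S, A) = H(S) - sum over values of A of P(value) * H(subset).
    total = sum(parent_counts)
    remainder = sum(sum(s) / total * entropy(s) for s in subsets)
    return entropy(parent_counts) - remainder

print(round(entropy([9, 5]), 3))                      # H(S) = 0.94
print(round(info_gain([9, 5], [[6, 2], [3, 3]]), 3))  # IG(S, Wind) = 0.048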
We can clearly see that IG(S, Outlook) has the highest information gain of 0.246, hence we choose the Outlook attribute as the root node. At this point, the decision tree looks like this:
Here we observe that whenever the outlook is Overcast, Play Golf is always 'Yes'. This is no coincidence: the simple tree results because the attribute Outlook gives the highest information gain. Now how do we proceed from this point? We simply apply recursion; you might want to look back at the algorithm steps described earlier. Now that we've used Outlook, we have three attributes remaining: Humidity, Temperature, and Wind. And we had three possible values of Outlook: Sunny, Overcast, and Rain. The Overcast node has already ended up as the leaf node 'Yes', so we're left with two subtrees to compute: Sunny and Rain.

For the Sunny subtree, the highest Information Gain is given by Humidity. Proceeding in the same way with the Rain subtree gives us Wind as the attribute with the highest information gain. The final Decision Tree looks something like this.
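The recursion itself can then be sketched on top of the entropy and info_gain helpers from the previous snippet. This is a minimal illustration, not the exact implementation behind the example; the dictionary-of-rows representation and the names class_counts and id3 are our own choices:

from collections import Counter

def class_counts(rows, target):
    # Turn a list of example dictionaries into class counts for entropy().
    return list(Counter(r[target] for r in rows).values())

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                 # pure subset: make a leaf
        return labels[0]
    if not attributes:                        # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute whose split yields the highest information gain.
    def gain(attr):
        subsets = [[r for r in rows if r[attr] == v] for v in {r[attr] for r in rows}]
        return info_gain(class_counts(rows, target),
                         [class_counts(s, target) for s in subsets])
    best = max(attributes, key=gain)
    tree = {best: {}}
    for value in {r[best] for r in rows}:     # one branch per value of the best attribute
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree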
Classification and Regression Trees (CART)

In a regression tree, a regression model is fit to the target variable using each of the independent variables. After this, the data is split at several points for each independent variable. At each such point, the error between the predicted values and the actual values is squared to get the Sum of Squared Errors (SSE). The SSE is compared across the variables, and the variable or point which has the lowest SSE is chosen as the split point. This process is continued recursively.
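To make the SSE-based split search concrete, here is a minimal Python sketch. The toy data and the helper names sse and best_split are invented purely for illustration:

def sse(values):
    # Sum of squared errors of the values around their mean.
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    # Return the split point on feature x with the lowest total SSE of y.
    best_point, best_err = None, float("inf")
    for point in sorted(set(x)):
        left  = [yi for xi, yi in zip(x, y) if xi <= point]
        right = [yi for xi, yi in zip(x, y) if xi > point]
        err = sse(left) + sse(right)
        if err < best_err:
            best_point, best_err = point, err
    return best_point, best_err

# Toy data (made up only for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [5, 6, 5, 20, 21, 22]
print(best_split(x, y))   # picks the split between x = 3 and x = 4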
Classification and regression tree tutorials, as well as classification and regression tree ppts, exist in
abundance. This is a testament to the popularity of these decision trees and how frequently they are used.
However, these decision trees are not without their disadvantages.
There are many classification and regression tree examples where the use of a decision tree has not led to the optimal result. Here are some of the limitations of classification and regression trees.
(i) Overfitting
Overfitting occurs when the tree takes into account a lot of noise that exists in the data
and comes up with an inaccurate result.
(ii) High variance
In this case, a small variance in the data can lead to a very high variance in the
prediction, thereby affecting the stability of the outcome.
(iii) Low bias
A decision tree that is very complex usually has a low bias. This makes it very difficult
for the model to incorporate any new data.
Regression
Regression Analysis in Machine learning
Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us understand how the value of the dependent variable changes with respect to an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.

We can understand the concept of regression analysis using the below example:
Example: Suppose there is a marketing company A which runs various advertisements every year and gets sales in return. The below list shows the advertisements made by the company in the last 5 years and the corresponding sales. Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the predicted sales for this year. To solve such prediction problems in machine learning, we need regression analysis.
Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time-series modelling, and determining the causal-effect relationship between variables.

In regression, we plot a graph between the variables which best fits the given data points. Using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through all the data points on the target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum." The distance between the data points and the line tells whether a model has captured a strong relationship or not.
[Type text]
value in comparison to other observed values. An outlier may hamper the result, so it
[Type text]
should be avoided.
o Multicollinearity: If the independent variables are highly correlated with each other, then this condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variable.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not well with the test dataset, then such a problem is called overfitting. And if our algorithm does not perform well even with the training dataset, then such a problem is called underfitting.
Types of Regression

There are various types of regression which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence it is called linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear regression. And if there is more than one input variable, then such linear regression is called multiple linear regression.
o The relationship between the variables in the linear regression model can be explained using the below image, where we are predicting the salary of an employee on the basis of years of experience.
o A popular application of linear regression is salary forecasting.
Logistic Regression:
o Logistic regression is another supervised learning algorithm which is used to solve classification problems. In classification problems, we have a dependent variable in a binary or discrete format such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in terms of how it is used.
o Logistic regression uses the sigmoid function or logistic function, which is a complex cost function. This sigmoid function is used to model the data in logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output between the 0 and 1 value.

When we provide the input values (data) to the function, it gives the S-curve as follows:
o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.
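A brief Python sketch of the sigmoid and the threshold rule described above (the function names and the 0.5 threshold are our own illustrative choices, since the text does not fix a particular threshold):

from math import exp

def sigmoid(x):
    # Logistic function: maps any real number into the range (0, 1).
    return 1.0 / (1.0 + exp(-x))

def classify(x, threshold=0.5):
    # Values above the threshold map to class 1, the rest to class 0.
    return 1 if sigmoid(x) > threshold else 0

print(sigmoid(0))       # 0.5, the midpoint of the S-curve
print(classify(2.0))    # 1
print(classify(-2.0))   # 0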
Linear Regression in Machine Learning
Linear regression is one of the easiest and most popular machine learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.

The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence it is called linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

The linear regression model provides a sloped straight line representing the relationship between the variables. Consider the below image:
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε

Here,
y = Dependent variable (target variable)
x = Independent variable (predictor variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor to each input value)
ε = Random error

The values for the x and y variables are training datasets used for the linear regression model representation.
Linear regression can be further divided into two types of algorithms:

o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.

A regression line can show two types of relationship:

o Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a positive linear relationship.

o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a negative linear relationship.
Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line will have the least error.
The different values for the weights or coefficients of the line (a0, a1) give different lines of regression, so we need to calculate the best values for a0 and a1 to find the best fit line. To calculate this we use a cost function.

Cost function:
o The different values for the weights or coefficients of the line (a0, a1) give different lines of regression, and the cost function is used to estimate the values of the coefficients for the best fit line.
o We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:

MSE = (1/N) Σ (Yi − (a1xi + a0))²

Where,
N = Total number of observations
Yi = Actual value
(a1xi + a0) = Predicted value
Residuals: The distance between the actual value and the predicted value is called the residual. If the observed points are far from the regression line, then the residuals will be high, and so the cost function will be high. If the scatter points are close to the regression line, then the residuals will be small, and hence the cost function will be small as well.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the cost function.
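The following Python sketch shows one way gradient descent could update a0 and a1 to reduce the MSE. The learning rate, iteration count, and toy data are assumptions made only for illustration:

# Gradient descent for y = a0 + a1*x, minimizing the MSE cost function.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]          # toy data that actually follows y = 1 + 2x
a0, a1 = 0.0, 0.0
lr = 0.01                     # learning rate
n = len(x)

for _ in range(5000):
    pred = [a0 + a1 * xi for xi in x]
    # Partial derivatives of the MSE with respect to a0 and a1.
    d_a0 = (-2 / n) * sum(yi - pi for yi, pi in zip(y, pred))
    d_a1 = (-2 / n) * sum((yi - pi) * xi for yi, pi, xi in zip(y, pred, x))
    a0 -= lr * d_a0
    a1 -= lr * d_a1

print(round(a0, 2), round(a1, 2))   # approaches the true values 1.0 and 2.0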
Model Performance:
The goodness of fit determines how well the line of regression fits the set of observations. The process of finding the best model out of various models is called optimization. It can be achieved by the below method:

1. R-squared method:
o R-squared is a statistical method that determines the goodness of fit.
o A high value of R-squared indicates less difference between the predicted values and the actual values and hence represents a good model.
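A small Python sketch of the R-squared calculation (the toy values are made up; the formula is the standard R-squared = 1 − SS_res / SS_tot):

def r_squared(actual, predicted):
    # R-squared = 1 - (sum of squared residuals) / (total sum of squares).
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual    = [3, 5, 7, 9, 11]
predicted = [2.8, 5.1, 7.0, 9.2, 10.9]
print(round(r_squared(actual, predicted), 3))   # close to 1, i.e. a good fit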
[Type text]
multicollinearity, it may difficult to find the true relationship between the predictors and
target variables. Or we can say, it is difficult to determine which predictor variable is
affecting the target variable and which is not. So, the model assumes either little or no
multicollinearity between the features or independent variables.
o Homoscedasticity Assumption:
Homoscedasticity is a situation where the error term is the same for all values of the independent variables. With homoscedasticity, there should be no clear pattern in the distribution of the data in the scatter plot.

o No autocorrelation:
The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs if there is a dependency between the residual errors.
Simple Linear Regression

Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable. The relationship shown by a Simple Linear Regression model is linear (a sloped straight line), hence it is called Simple Linear Regression.

The key point in Simple Linear Regression is that the dependent variable must be a continuous/real value. However, the independent variable can be measured on continuous or categorical values.
The Simple Linear Regression model can be represented using the below equation:

y = a0 + a1x + ε

Where,
a0 = The intercept of the regression line (can be obtained by putting x = 0)
a1 = The slope of the regression line, which tells whether the line is increasing or decreasing
ε = The error term (for a good model it will be negligible)
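Under the least-squares criterion, a0 and a1 for Simple Linear Regression can also be computed in closed form. The Python sketch below uses made-up experience/salary pairs purely for illustration:

def fit_simple_linear_regression(x, y):
    # Closed-form least squares: a1 = cov(x, y) / var(x), a0 = mean(y) - a1 * mean(x).
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    a1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
         sum((xi - mean_x) ** 2 for xi in x)
    a0 = mean_y - a1 * mean_x
    return a0, a1

years_experience = [1, 2, 3, 4, 5]
salary           = [30, 35, 41, 44, 50]    # toy salary figures (in thousands)
a0, a1 = fit_simple_linear_regression(years_experience, salary)
print(round(a0, 2), round(a1, 2))          # intercept and slope of the best fit line
print(round(a0 + a1 * 6, 2))               # predicted salary for 6 years of experience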
Multiple Linear Regression

Multiple Linear Regression is an extension of Simple Linear Regression in which more than one independent variable is used to predict the dependent variable.

Example: Prediction of CO2 emission based on engine size and number of cylinders in a car.
o Each feature variable must model a linear relationship with the dependent variable.

MLR equation:
In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is an enhancement of Simple Linear Regression, the same idea is applied to the multiple linear regression equation, which becomes:

Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
Where,
Y = Output/response variable
b0, b1, b2, b3, ..., bn = Coefficients of the model
x1, x2, x3, ..., xn = The independent/feature variables
Assumptions for Multiple Linear Regression:
o A linear relationship should exist between the target and predictor variables.
o The regression residuals must be normally distributed.
o MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
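As a rough sketch of how the MLR equation above can be fit in practice, the snippet below uses NumPy's least-squares solver on made-up engine-size and cylinder data (all values invented for illustration):

import numpy as np

# Toy data: engine size (litres), number of cylinders, and CO2 emission (g/km).
engine_size = [1.0, 1.6, 2.0, 2.4, 3.0]
cylinders   = [4,   4,   4,   6,   6]
co2         = [99, 115, 135, 182, 198]

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(len(engine_size)), engine_size, cylinders])
y = np.array(co2)

# Solve the least-squares problem X @ b = y for the coefficients b0, b1, b2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 2))                 # [b0, b1, b2]
print(round(float(X[0] @ b), 1))      # fitted CO2 emission for the first car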