
UNIT-7

Decision Tree
Decision Tree is a supervised learning technique that can be used
for both classification and regression problems, though it is most
often preferred for classification. It is a tree-structured
classifier in which internal nodes represent the features of a
dataset, branches represent decision rules, and each leaf node
represents an outcome.
A decision tree contains two types of nodes: decision nodes and
leaf nodes. Decision nodes are used to make decisions and have
multiple branches, whereas leaf nodes are the outputs of those
decisions and contain no further branches.
It is called a decision tree because, like a tree, it starts from a
root node that expands into further branches, building a tree-like
structure.
To build the tree, we use the CART algorithm, which stands for
Classification And Regression Tree.
Decision Tree Algorithm
Input: training data set, test data set (or data points)

Steps:
Set Emin to a very large value
Do for all attributes Fi
    Calculate the entropy Ei of the attribute Fi
    If Ei < Emin then
        Emin = Ei and Fmin = Fi
    End if
End do

Draw a decision tree node containing the attribute Fmin and split
the data set into subsets using Fmin

Repeat the above steps on each subset until the full tree is drawn,
covering all attributes of the original table
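A minimal sketch of the entropy calculation and attribute selection described above, assuming a small in-memory dataset; the rows, attribute values, and labels are hypothetical:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy: -sum(p * log2(p)) over the class proportions
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def attribute_entropy(rows, attr_index):
    # Weighted average entropy of the subsets produced by splitting on one attribute
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr_index], []).append(row[-1])  # last column = label
    total = len(rows)
    return sum(len(lbls) / total * entropy(lbls) for lbls in subsets.values())

# Hypothetical toy data: (Outlook, Windy, Play)
rows = [("Sunny", "No", "Yes"), ("Sunny", "Yes", "No"),
        ("Rainy", "No", "Yes"), ("Rainy", "Yes", "No")]

# Choose Fmin, the attribute with the lowest entropy Emin, as in the steps above
Fmin = min(range(2), key=lambda i: attribute_entropy(rows, i))
print("Split on attribute index:", Fmin)  # prints 1 (Windy separates the classes)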

Strengths of Decision Trees:

1. Interpretability: Decision trees are easy to understand and
interpret, making them a valuable tool for explaining the
reasoning behind a particular decision or prediction.
2. Non-linearity: Decision trees can model complex, non-linear
relationships in data without requiring extensive data
preprocessing or feature engineering.
3. Handling both numerical and categorical data: Decision trees can
handle a mix of numerical and categorical data without the need
for one-hot encoding or other data transformations.
4. Feature selection: Decision trees implicitly perform feature
selection by giving more importance to the most informative
features at the top of the tree, which can help reduce
dimensionality and improve model performance.
5. Can handle missing values: Decision trees can handle missing
values in the dataset by considering alternative paths when a
feature's value is missing.
6. Low computational complexity for prediction: Once a decision tree
is trained, making predictions on new data is fast and efficient;
the cost is proportional to the depth of the tree, roughly O(log n)
for a reasonably balanced tree with n nodes.
Weaknesses of Decision Trees:
1. Overfitting: Decision trees are prone to overfitting, especially
when they are deep and complex. To mitigate overfitting,
techniques like pruning and setting a maximum depth or minimum
samples per leaf are often used.
2. Instability: Small changes in the data can lead to significant
changes in the structure of the tree, making decision trees
unstable compared to some other algorithms like random forests
or gradient boosting.
3. Limited expressiveness: While decision trees can model non-linear
relationships, they may struggle with highly complex patterns in
the data, which other algorithms like neural networks may handle
better.
4. Not always the most accurate: Decision trees may not always
produce the most accurate predictions, especially when the data
relationships are highly complex. Ensemble methods like random
forests or gradient boosting often outperform standalone decision
trees.
5. Lack of global optimization: Decision trees make local decisions at
each node, which may not necessarily lead to a globally optimal
model.
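To address weakness 1 (overfitting), libraries expose depth and leaf-size controls. A minimal scikit-learn sketch, assuming scikit-learn is installed; the parameter values are illustrative only, not recommended settings:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth and min_samples_leaf limit tree complexity to mitigate overfitting
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))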
Classification Algorithms
K-Nearest Neighbour (kNN)
Decision Tree
Random forest classification

Supervised Learning and Classification Steps

Supervised learning is where the model is trained on a labelled
dataset. A labelled dataset is one that contains both input
features and their corresponding output labels.
KNN Algorithm(K-Nearest Neighbour)
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on
Supervised Learning technique.
o K-NN algorithm assumes the similarity between the new case/data and available
cases and put the new case into the category that is most similar to the available
categories.
o K-NN algorithm stores all the available data and classifies a new data point based
on the similarity. This means when new data appears then it can be easily classified
into a well suite category by using K- NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but mostly
it is used for the Classification problems.
o It is also called a lazy learner algorithm because it does not learn from the training
set immediately instead it stores the dataset and at the time of classification, it
performs an action on the dataset.

Step-1: Select the number K of neighbours.
Step-2: Calculate the Euclidean distance from the new data point to every point in the dataset.
Step-3: Take the K nearest neighbours as per the calculated Euclidean distances.
Step-4: Among these K neighbours, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
Step-6: Our model is ready.
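These steps translate almost line for line into code. A minimal sketch with 2-D numeric features; the training points and labels below are hypothetical:

import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    # Steps 2-3: Euclidean distance to every training point, keep the k nearest
    nearest = sorted(train, key=lambda p: math.dist(p[0], new_point))[:k]
    # Steps 4-5: majority vote among the k nearest neighbours' labels
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training data: ((IMDb, Duration), genre)
train = [((7.2, 120), "Comedy"), ((8.0, 130), "Action"),
         ((6.9, 110), "Comedy"), ((7.8, 160), "Action")]
print(knn_predict(train, (7.4, 114), k=3))  # prints "Comedy"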

We have a new entry, but it doesn't have a class yet. To find its class,
we calculate the distance from the new entry to every other entry
in the data set using the Euclidean distance formula.
Here's the formula (Euclidean distance): √((X₂−X₁)² + (Y₂−Y₁)²)
Where:
X₂ = New entry's IMDb (7.4).
X₁ = Existing entry's IMDb.
Y₂ = New entry's Duration (114).
Y₁ = Existing entry's Duration.

For k = 3

The three nearest neighbours to the new entry are the entries at
distances 41, 46, and 54. Their genres are Action, Comedy, and
Comedy, so by majority voting we classify the new entry as Comedy.
Majority voting (Action, Comedy, Comedy) = Comedy
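The majority-vote step can be checked in one line; the genre list mirrors the example above:

from collections import Counter
print(Counter(["Action", "Comedy", "Comedy"]).most_common(1)[0][0])  # Comedy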

Define linear regression. Also explain Sum of Squares with its formula.
• Linear regression is a fundamental statistical and machine
learning technique used for modelling the relationship between a
dependent variable (target) and one or more independent variables
(features or predictors).
• The primary goal of linear regression is to find the linear equation
that best describes how the independent variables influence the
dependent variable. In simple linear regression there is one
independent variable, while in multiple linear regression there are
two or more.
• The general form of a simple linear regression equation is:
y = b₀ + b₁x + ε
Where:
• y is the dependent variable (the variable you want to predict).
• x is the independent variable (the feature used for prediction).
• b₀ is the intercept (the value of y when x = 0).
• b₁ is the slope (the change in y for a one-unit change in x).
• ε represents the error term (the part of y that cannot be explained by x).

The goal in linear regression is to determine the values of b₀ and b₁
that minimize the sum of squares of the error term, also known as
the "Sum of Squares of Residuals."

Sum of Squares (SS) is a measure of the total variation in a dataset.
In the context of linear regression, there are two important sums of
squares:

(1) Total Sum of Squares (SST):
SST measures the total variability in the dependent variable y. It is
the sum of the squared differences between each data point and the
mean of y:
SST = Σ(yᵢ − ȳ)²
(2) Residual Sum of Squares (SSE):
SSE measures the unexplained variability in the dependent variable after
applying the linear regression model. It is the sum of the squared differences
between each data point's actual value and the predicted value from the
regression model:
SSE = Σ(yᵢ − ŷᵢ)²
These two sums of squares can be used to calculate a third important measure:

(3) Explained Sum of Squares (SSR):

SSR measures the variability in the dependent variable that is explained by the
regression model. It is the difference between the total variability (SST) and the
unexplained variability (SSE).
Formula: SSR = SST − SSE
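Continuing the NumPy sketch above, the three sums of squares can be computed and the identity SSR = SST − SSE checked numerically:

y_hat = b0 + b1 * x                  # predictions from the fitted line
SST = np.sum((y - y.mean()) ** 2)    # total variability
SSE = np.sum((y - y_hat) ** 2)      # unexplained (residual) variability
SSR = SST - SSE                      # explained variability
print(SST, SSE, SSR)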
