Decision Tree in R Programming Language
R is a programming language for statistical computing and data
visualization. It has been widely adopted in the fields of data
mining, bioinformatics, and data analysis.
The core R language is augmented by a large number of extension
packages containing reusable code, documentation, and sample
data.
R is free, open-source software. It is part of the GNU Project and is
available under the GNU General Public License. It is written
primarily in C, Fortran, and R itself, and precompiled executables
are provided for various operating systems.
Working of a Decision Tree in R
Partitioning:
It refers to the process of splitting the data set into subsets.
The decision of where to make strategic splits greatly affects the
accuracy of the tree.
Many algorithms are used by the tree to split a node into sub-nodes,
which results in an overall increase in the purity of the node with
respect to the target variable.
Various splitting criteria, such as chi-square and the Gini index, are
evaluated for this purpose, and the split with the best score is chosen.
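As a concrete illustration, here is a minimal sketch of how a Gini-based
splitting criterion can be scored in R. The helpers gini and gini_split are
hypothetical functions written for this example, not part of any package.

# Gini impurity of a vector of class labels: 1 - sum(p_i^2)
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted Gini impurity of a candidate split into left/right subsets;
# a lower score means a purer (better) split
gini_split <- function(left, right) {
  n <- length(left) + length(right)
  (length(left) / n) * gini(left) + (length(right) / n) * gini(right)
}

gini_split(c("yes", "yes"), c("no", "no"))  # 0: a perfectly pure split
gini_split(c("yes", "no"), c("yes", "no"))  # 0.5: a maximally mixed split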
Pruning:
This refers to the process in which branch nodes are turned into
leaf nodes, shortening the branches of the tree.
The idea behind it is that simpler trees avoid overfitting: a highly
complex classification tree may fit the training data well but do an
underwhelming job of classifying new values.
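As an illustration, the rpart package (a common decision-tree
implementation in R, different from the library used in the example later
in this article) exposes pruning through its complexity parameter; the
sketch below uses the built-in iris dataset purely for demonstration.

library(rpart)

# Fit a deliberately deep tree, then prune it back using the
# complexity-parameter (cp) table that rpart records while fitting
fit <- rpart(Species ~ ., data = iris,
             control = rpart.control(cp = 0.0001))

# Pick the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]

# prune() collapses branches, turning internal nodes into leaves
pruned <- prune(fit, cp = best_cp)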
Selection of the tree:
The main goal of this process is to select the smallest tree that fits
the data, for the reasons discussed in the pruning section.
Important factors to consider while selecting the tree in R
Entropy:
Mainly used to determine the uniformity of the given sample.
If the sample is completely homogeneous, the entropy is 0; if it is
equally partitioned between the classes, the entropy is 1.
The higher the entropy, the more difficult it becomes to draw
conclusions from that information.
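A minimal sketch of how this quantity can be computed in R; the entropy
helper below is written for this example rather than taken from a package.

# Shannon entropy of a vector of class labels, in bits
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  -sum(p * log2(p))
}

entropy(c("a", "a", "a", "a"))  # 0: completely homogeneous sample
entropy(c("a", "a", "b", "b"))  # 1: equally partitioned sample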
Information Gain:
A statistical property that measures how well training examples are
separated based on the target classification.
The main idea behind constructing a decision tree is to find an
attribute that returns the smallest entropy and the highest
information gain.
It is basically a measure of the decrease in total entropy: it is
calculated as the difference between the entropy before the split and
the weighted average entropy after the dataset is split on the given
attribute values.
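Using the entropy() helper defined above, information gain can be sketched
as follows; info_gain is again a hypothetical helper for illustration.

# Information gain of splitting `labels` by the values of `attribute`:
# entropy before the split minus the weighted average entropy after it
info_gain <- function(labels, attribute) {
  before <- entropy(labels)
  after <- sum(sapply(split(labels, attribute), function(subset) {
    (length(subset) / length(labels)) * entropy(subset)
  }))
  before - after
}

# An attribute that separates the classes perfectly recovers
# all of the entropy
info_gain(c("yes", "yes", "no", "no"), c("x", "x", "y", "y"))  # 1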
R – Decision Tree Example
Let us now examine this concept with the help of an example, in this
case the widely used "readingSkills" dataset, by visualizing a
decision tree for it and examining its accuracy.
Installing the required libraries
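The snippet below assumes the party package, which provides the ctree()
function and ships the readingSkills dataset used in this example.

install.packages("party")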
Import the required libraries, load the readingSkills dataset,
and execute head(readingSkills)
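A minimal sketch of this step, assuming the party package installed above:

library(party)

# Load the readingSkills dataset shipped with party and
# inspect its first few rows
data("readingSkills")
head(readingSkills)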
As you can clearly see, there are 4 columns: nativeSpeaker, age,
shoeSize, and score. We are going to predict whether a person is a
native speaker or not using the other criteria, and evaluate the
accuracy of the decision tree model developed in doing so.
Splitting the dataset in a 4:1 ratio for train and test data
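A sketch of the split and model fit, assuming an 80/20 random partition
(the seed value of 42 is an arbitrary choice for reproducibility):

set.seed(42)

# Hold out 20% of the rows for testing
train_idx  <- sample(nrow(readingSkills),
                     size = floor(0.8 * nrow(readingSkills)))
train_data <- readingSkills[train_idx, ]
test_data  <- readingSkills[-train_idx, ]

# Fit a conditional inference tree: ctree(formula, data)
model <- ctree(nativeSpeaker ~ age + shoeSize + score,
               data = train_data)

# Visualize the fitted decision tree
plot(model)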
Here, formula describes the response and predictor variables, and data
is the data set used.
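To examine the model's accuracy on the held-out data, a confusion matrix
and the overall accuracy can be computed, for example:

# Predict on the test set and compare against the true labels
predictions <- predict(model, newdata = test_data)

# Confusion matrix and overall accuracy
table(Predicted = predictions, Actual = test_data$nativeSpeaker)
mean(predictions == test_data$nativeSpeaker)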
Thus, decision trees are very useful algorithms, as they are not only
used to choose between alternatives based on expected values but are
also used for the classification of priorities and for making
predictions. It is up to us to determine the accuracy of such models
in the appropriate applications.
Advantages of Decision Trees