ML Lec6
4. Additional information
Disadvantages
• Random forests have been observed to overfit on some datasets with noisy classification/regression tasks.
• For data including categorical variables with different numbers of levels, random forests are biased in favor of attributes with more levels. Therefore, the variable importance scores from a random forest are not reliable for this type of data.

Estimating the test error:
• While growing the forest, the test error can be estimated from the training samples themselves (see the sketch below).
• For each tree grown, roughly 33-36% of the samples are not selected in its bootstrap sample (the fraction (1 - 1/N)^N approaches e^(-1) ≈ 36.8% as N grows); these are called out-of-bag (OOB) samples.
• Using the OOB samples as input to the corresponding tree, predictions are made as if they were novel test samples.
• Through book-keeping, a majority vote (classification) or average (regression) is computed over all OOB predictions from all trees.
• The resulting test-error estimate is very accurate in practice, given a reasonable n.
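A minimal sketch of the OOB error estimate, assuming scikit-learn (the synthetic data from make_classification is purely illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # oob_score=True scores each training sample using only the trees
    # whose bootstrap sample did NOT contain it.
    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(X, y)

    print("OOB accuracy:", rf.oob_score_)      # estimated test accuracy
    print("OOB error:   ", 1 - rf.oob_score_)  # estimated test error

No separate validation set or cross-validation loop is needed here, which is the speed advantage noted in the Summary below.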
Estimating the importance of each input:
• Denote by ê the OOB estimate of the loss when using the original training set D.
• For each input xp, where p ∈ {1, …, k}:
  • Randomly permute the pth input to generate a new set of samples D' = {(y1, x'1), …, (yN, x'N)}.
  • Compute the OOB estimate êp of the prediction error with the new samples.
• A measure of the importance of predictor xp is êp − ê, the increase in error due to random permutation of the pth predictor (see the sketch below).
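A sketch of this permutation-importance loop, assuming scikit-learn. For simplicity it measures the error on a held-out split rather than on the OOB samples (scikit-learn does not directly expose per-tree OOB indices), but the êp − ê computation is the same idea; sklearn.inspection.permutation_importance packages the same procedure.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    e_hat = 1 - rf.score(X_val, y_val)  # baseline error ê

    rng = np.random.default_rng(0)
    for p in range(X_val.shape[1]):
        X_perm = X_val.copy()
        X_perm[:, p] = rng.permutation(X_perm[:, p])      # permute pth input
        e_p = 1 - rf.score(X_perm, y_val)                 # êp on permuted data
        print(f"importance of x{p}: {e_p - e_hat:+.4f}")  # êp − ê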
Summary:
• Fast, fast, fast!
  • RF is fast to build, and even faster to predict.
  • Practically speaking, not requiring cross-validation for model selection alone speeds up training by 10x-100x or more.
  • Fully parallelizable … to go even faster (see the sketch below)!
• Automatic selection of predictors (inputs) from a large number of candidates.
• Resistance to overtraining.
• Ability to handle data without preprocessing:
  • data does not need to be rescaled, transformed, or modified
  • resistant to outliers
  • automatic handling of missing values
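Since every tree is grown and queried independently, training and prediction parallelize trivially. A small sketch, again assuming scikit-learn, where n_jobs=-1 uses all available cores:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=5000, n_features=30, random_state=0)

    # n_jobs=-1 builds the independent trees on all available cores;
    # the same setting also parallelizes predict().
    rf = RandomForestClassifier(n_estimators=1000, n_jobs=-1, random_state=0)
    rf.fit(X, y)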
https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm