Data Mining Notes

- Naive Bayes

- Decision tree (information gain, bootstrapping, aggregation)


- General embedding
- Random forest: making weak learners that, combined, become a strong learner
➢ We do the sampling through bootstrapping
➢ Roughly 80% of the observations end up in the bag and are used to build the tree
➢ The remaining ~20% of the observations (out of the bag) help us test that single tree
➢ We do sampling on rows and sampling on columns
➢ We end up with ~80 observations on which we build a decision tree
➢ AdaBoost: each tree corrects the previous tree ⇒ a direct derivative of random forest
➢ OOB error: out-of-bag error (very important)
➢ Feature importance is one of the strengths of random forest (how exactly is feature importance computed?)
➢ Neural network: a lot of data and a large network are required to make it work
➢ Boosting: replicating what we want the system to pay attention to
Data balancing: oversample the minority class (replicate its observations) to make it bigger, OR take the whole sample and ...? (VERY IMPORTANT). We do that because the system does not work on a 1/1000 class ratio (see the sketch below).
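A minimal scikit-learn sketch tying several of these points together (the toy dataset and parameter values are assumptions, not from the lecture): bootstrap sampling, the out-of-bag error estimate, feature importance, and one way of compensating for class imbalance.

```python
# Sketch (assumed toy data): bootstrapping, OOB error, feature importance,
# and handling class imbalance in one place.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumed imbalanced binary problem (roughly 99:1 class ratio).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.99, 0.01], random_state=0)

# bootstrap=True  -> each tree trains on a bootstrap sample (rows drawn with replacement)
# oob_score=True  -> observations left out of each bag estimate the error (OOB error)
# class_weight="balanced" -> one way to compensate for imbalance
#                            (oversampling the minority class is the alternative in the notes)
rf = RandomForestClassifier(n_estimators=200, bootstrap=True, oob_score=True,
                            class_weight="balanced", random_state=0)
rf.fit(X, y)

print("OOB accuracy (1 - OOB error):", rf.oob_score_)
print("Feature importances:", rf.feature_importances_)
```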

Random Forest:
- Ensemble simply means combining multiple models. Thus a collection of models is used to make predictions rather than an individual model.
- Bagging: creates different training subsets from the sample training data with replacement, and the final output is based on majority voting. For example, Random Forest.
- Boosting: combines weak learners into strong learners by creating sequential models such that the final model has the highest accuracy. For example, AdaBoost, XGBoost.

Weak learners:
Random Forest steps:
1. Step 1: In the Random forest model, a subset of data points and a
subset of features is selected for constructing each decision tree.
Simply put, n random records and m features are taken from the
data set having k number of records.
2. Step 2: Individual decision trees are constructed for each sample.
3. Step 3: Each decision tree will generate an output.
4. Step 4: The final output is based on majority voting (for classification) or averaging (for regression).
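The sketch below walks through these four steps by hand on assumed toy data (the number of trees and the feature-subset size are also illustrative choices); a library class such as scikit-learn's RandomForestClassifier does all of this internally.

```python
# Illustrative sketch of Steps 1-4 (assumed toy setup): n random rows and
# m random features per tree, then majority voting across the trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
k, n_trees, m_features = X.shape[0], 25, 4

trees, feature_sets = [], []
for _ in range(n_trees):
    rows = rng.integers(0, k, size=k)                          # Step 1a: bootstrap sample of rows
    cols = rng.choice(X.shape[1], m_features, replace=False)   # Step 1b: random feature subset
    tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])  # Step 2: fit one tree
    trees.append(tree)
    feature_sets.append(cols)

# Steps 3-4: each tree produces an output; the forest takes the majority vote per sample.
all_preds = np.array([t.predict(X[:, cols]) for t, cols in zip(trees, feature_sets)])
majority_vote = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, all_preds)
print("Training accuracy of the ensemble:", (majority_vote == y).mean())
```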

+ Diversity and stability
+ Good with high dimensionality
+ Parallelization

1- Bagging steps:
1. Subset selection
2. Bootstrap sampling (sampling with replacement)
3. Independent model training
4. Majority voting (choosing the most frequently predicted result)
5. Aggregation: involves combining all the results and generating the final output
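A short sketch of the same pipeline using scikit-learn's BaggingClassifier (the toy data and parameter values are assumptions); bootstrap sampling, independent model training, and vote aggregation are handled inside the class.

```python
# Sketch of bagging (assumed toy data): bootstrap samples, independently
# trained decision trees, and aggregation of their votes.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base model (named base_estimator in older scikit-learn)
    n_estimators=50,                     # number of independently trained models
    bootstrap=True,                      # draw each training subset with replacement
    random_state=0,
)
print("Bagging CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```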

2- Boosting:
➢ Uses the concept of ensemble learning.
➢ A boosting algorithm combines multiple simple models (also known as
weak learners or base estimators) to generate the final output.
➢ It is done by building a model from weak models trained in series.

Examples:
⇒ AdaBoost was the first really successful boosting algorithm that was developed
for the purpose of binary classification.
⇒ AdaBoost is an abbreviation for Adaptive Boosting and is a prevalent boosting technique that combines multiple “weak classifiers” into a single “strong classifier.”
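A sketch of AdaBoost with decision stumps (depth-1 trees) as the weak learners; the toy data and hyperparameter values are illustrative assumptions.

```python
# Sketch of AdaBoost (assumed toy data): decision stumps are combined in
# sequence into a single strong binary classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # the weak learner: a stump
                                                    # (named base_estimator in older scikit-learn)
    n_estimators=100,    # number of weak learners built in series
    learning_rate=0.5,   # how strongly each new learner corrects the previous ones
    random_state=0,
)
print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```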

Last session before midterm:
- k-fold cross-validation: we split the data into K parts and each part takes a turn as the test set, with resampling, as if you increase the number of tests to increase the accuracy
- Hyperparameter tuning: we use k-fold cross-validation a lot for this (very important for supervised learning) ⇒ the technique is called grid search cross-validation
- In scikit-learn (Python): GridSearchCV(), used to improve model performance (like bagging and boosting do); see the sketch below
- Stump: the smallest possible tree (a weak learner)
- He told us to revise with ChatGPT, it covers everything
- He will do an online session on Python for us and give us a "tutorial" from which the devoir (test) will be drawn
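A minimal grid search cross-validation sketch with scikit-learn's GridSearchCV (the model, the parameter grid, and the toy data are assumptions for illustration):

```python
# Sketch of hyperparameter tuning with k-fold cross-validation (grid search CV).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation: 5 train/test splits per candidate
    scoring="accuracy",
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```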
Random Forest: An Overview
1. What is a Random Forest?
● A Random Forest is an ensemble learning method, primarily used for
classification and regression.
● It operates by constructing a multitude of decision trees at training time
and outputting the class that is the mode of the classes (in classification)
or mean prediction (in regression) of the individual trees.
2. How Does it Work?
● Bootstrap Aggregating (Bagging): Random Forests create an ensemble of
Decision Trees using bagging. Each tree is built on a bootstrap sample, a
random sample with replacement of the training data.
● Feature Randomness: When building each tree, each time a split is
considered, a random sample of features is chosen as split candidates
from the full set of features. This introduces more diversity among the
trees, and it's a key difference from a single decision tree.
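A small sketch of these two mechanisms, bagging plus per-split feature randomness, compared against a single decision tree (the dataset and settings are illustrative assumptions; exact numbers will vary):

```python
# Sketch (assumed settings): the forest bootstraps rows (bagging) and, at every
# split, considers only a random subset of the features (feature randomness).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=8, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,        # bagging: each tree trains on a bootstrap sample
    max_features="sqrt",   # feature randomness: sqrt(n_features) candidates per split
    random_state=0,
)
print("Single tree CV accuracy:", cross_val_score(single_tree, X, y, cv=5).mean())
print("Random forest CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```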
3. Key Characteristics
● Reduction in Overfitting: By averaging multiple trees, there is a
significant reduction in the risk of overfitting.
● Handling Unbalanced Data: Random Forests can handle unbalanced data.
They work well for classification where the classes are imbalanced.
● Feature Importance: They give insights into which features are important
in the prediction.
4. Applications
● Random Forests are versatile and can be used in various tasks including
but not limited to medical diagnosis, stock market prediction, and
e-commerce recommendation systems.
5. Limitations
● Interpretability: They are less interpretable than decision trees.
Understanding how a decision is made by looking at individual trees can
be challenging due to the ensemble nature.
● Performance Issues: For data including categorical variables with
different numbers of levels, Random Forests are biased in favor of those
attributes with more levels.
● Large Datasets: For very large datasets, the size of the trees can take up a
lot of memory. It can also be time-consuming to train.
6. Practical Considerations
● Number of Trees: The number of trees in the forest should be high
enough to achieve stable accuracy, but adding more trees beyond a
certain point does not improve performance.
● Tree Depth: Control overfitting by adjusting the depth of the trees.
Deeper trees can model more complex patterns but also increase the risk
of overfitting.
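A quick sketch of both considerations (the data, tree counts, and depth limit are assumptions): test accuracy typically stabilizes once enough trees have been added, while max_depth caps how complex each tree can become.

```python
# Sketch (assumed data): watching accuracy stabilize as trees are added, and
# limiting tree depth to control overfitting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_trees in (10, 50, 200, 500):
    rf = RandomForestClassifier(n_estimators=n_trees, max_depth=8, random_state=0)
    rf.fit(X_train, y_train)
    # Beyond some point, adding trees no longer changes the test accuracy much.
    print(n_trees, "trees -> test accuracy:", rf.score(X_test, y_test))
```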
7. Random Forest vs. Decision Trees
● While a single decision tree can be prone to overfitting, the Random
Forest averages multiple trees, which reduces overfitting and improves
generalization.
Conclusion
Random Forest is a powerful and versatile machine learning algorithm, suitable
for a wide range of applications. It provides a good balance between prediction
accuracy and model interpretability, especially in settings where decision trees
alone might be too simplistic or prone to overfitting. However, understanding
the nuances of its parameters and the nature of your data is key to harnessing its
full potential.
