ML Unit 3

The document discusses bootstrapping as a resampling method in statistics, explaining its application in creating smaller datasets for analysis. It introduces bagging and pasting as ensemble methods, highlighting the concept of out-of-bag scoring and the use of random forests for classification and regression tasks. It also provides a data preparation example using Python, including preprocessing steps and model training with grid search for hyperparameter tuning, and closes with notes on gradient boosting residuals, AdaBoost sample weighting, and stacking.

In statistics, bootstrapping refers to a resampling method that consists of repeatedly drawing samples with replacement from the data to form other, smaller datasets, called bootstrap samples. It is as if the bootstrapping method were running a bunch of simulations on our original dataset, so in some cases we can generalise statistics such as the mean and the standard deviation.

For example, suppose we have a set of observations: (2, 4, 32, 8, 16). If we want each bootstrap sample to contain n observations, the following are valid samples:

n=3: (32, 4, 4), (8, 16, 2), (2, 2, 2), ...
n=4: (2, 32, 4, 16), (2, 4, 2, 8), (8, 32, 4, 2), ...

Because sampling is done with replacement, the same observation can appear more than once in a bootstrap sample.
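A minimal sketch of bootstrap resampling in Python, assuming NumPy is available; the observations are the ones from the example above, and the sample size and number of repetitions are illustrative.

import numpy as np

data = np.array([2, 4, 32, 8, 16])
rng = np.random.default_rng(42)

# Draw 1000 bootstrap samples of size 3 (with replacement) and keep the
# mean of each one to approximate the sampling distribution of the mean.
boot_means = [rng.choice(data, size=3, replace=True).mean() for _ in range(1000)]

print("Estimated mean:", np.mean(boot_means))
print("Estimated std of the mean:", np.std(boot_means))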

BAGGING & PASTING
Bagging means bootstrap + aggregating, and it is an ensemble method in which we first bootstrap our data and, for each bootstrap sample, train one model. After that, we aggregate their predictions with equal weights. When sampling is done without replacement, the method is called pasting.
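A minimal scikit-learn sketch of bagging versus pasting, assuming a generic feature matrix X and labels y; the only difference between the two is the bootstrap flag.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: each of the 100 trees is trained on a sample drawn WITH replacement.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=True, random_state=42)

# Pasting: same idea, but samples are drawn WITHOUT replacement.
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=False, random_state=42)

# bagging.fit(X, y); pasting.fit(X, y)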
OUT-OF-BAG SCORING
If we are using bagging, there is a chance that a given sample is never selected, while others may be selected multiple times. The probability of not selecting a specific sample is (1 - 1/n)^n, which approaches roughly 0.37 for large n. The samples that were never used to train a given model can still be used to evaluate it; this is called out-of-bag (OOB) scoring.
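A minimal sketch of OOB scoring with scikit-learn's BaggingClassifier, assuming a feature matrix X and labels y.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True, oob_score=True, random_state=42)
bag.fit(X, y)

# Each tree is evaluated on the samples it never saw during training.
print("OOB score:", bag.oob_score_)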

RANDOM FOREST
A random forest is an ensemble of decision trees that can be used for classification or regression. For classification, the class that receives the most votes among the trees becomes the output of the model. This helps make the model more accurate and stable, preventing overfitting.

Another very useful property of random forests is the ability to measure the relative importance of each feature by calculating how much each one reduces the impurity of the model. This is called feature importance.
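A minimal sketch of training a random forest and reading its feature importances, assuming a DataFrame X of features and labels y; the hyperparameters are illustrative.

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)

# feature_importances_ sums to 1; larger values mean a larger mean impurity reduction.
for name, imp in zip(X.columns, forest.feature_importances_):
    print('{}: {:.3f}'.format(name, imp))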

DATA PREPARATION

import numpy as np
import pandas as pd

# Load the income dataset and encode the target as integer codes.
df = pd.read_csv('data/income.csv')
col = pd.Categorical(df.high_income)
df["high_income"] = col.codes
We define a transformer to preprocess our data:

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import MinMaxScaler

class PreprocessTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, cat_features, num_features):
        self.cat_features = cat_features
        self.num_features = num_features

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        df = X.copy()
        # Clean up the messiest categorical values.
        df.loc[df['workclass'] == '?', 'workclass'] = 'Unknown'
        df.loc[df['native_country'] != 'united-states', 'native_country'] = 'non_usa'
        # Encode every categorical feature as integer codes.
        for name in self.cat_features:
            col = pd.Categorical(df[name])
            df[name] = col.codes
        # Scale the numeric features to the [0, 1] range.
        scaler = MinMaxScaler()
        df[self.num_features] = scaler.fit_transform(df[self.num_features])
        return df

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df.drop('high_income', axis=1),
    df['high_income'],
    test_size=0.2,
    random_state=42,
    shuffle=True,
    stratify=df['high_income']
)

First we create a pipeline to preprocess the data with our custom transformer.
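The original text does not show the pipeline definition, so the following is only a sketch of what pipe might look like, assuming the custom transformer above, chi-square feature selection (to match the fs__score_func and fs__k parameters in the search space below), and a placeholder classifier; the cat_features and num_features lists are illustrative.

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.tree import DecisionTreeClassifier

cat_features = ['workclass', 'education', 'occupation', 'native_country']  # illustrative
num_features = ['age', 'hours_per_week']                                   # illustrative

pipe = Pipeline([
    ('prep', PreprocessTransformer(cat_features, num_features)),
    ('fs', SelectKBest(score_func=chi2, k=10)),   # matches the 'fs__' parameters below
    ('clf', DecisionTreeClassifier()),            # replaced by the grid search
])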

from sklearn.model_selection import KFold, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2
from sklearn.metrics import accuracy_score, make_scorer

search_space = [
    {
        'clf': [DecisionTreeClassifier()],
        'clf__max_leaf_nodes': [128],
        'fs__score_func': [chi2],
        'fs__k': [10],
    },
    {
        # oob_score=True so the out-of-bag score can be read after fitting.
        'clf': [RandomForestClassifier(oob_score=True)],
        'clf__n_estimators': [200],
        'clf__max_leaf_nodes': [128],
        'fs__score_func': [chi2],
        'fs__k': [10],
    }
]

scoring = {'AUC': 'roc_auc', 'Accuracy': make_scorer(accuracy_score)}
kfold = KFold(n_splits=10, shuffle=True, random_state=42)

grid = GridSearchCV(
    pipe,
    param_grid=search_space,
    cv=kfold,
    scoring=scoring,
    refit='AUC',
    verbose=1,
    n_jobs=-1
)
model = grid.fit(X_train, y_train)

best_estimator = grid.best_estimator_.steps[-1][1]
columns = X_test.columns.tolist()

print('OOB Score: {}'.format(best_estimator.oob_score_))
print('Feature Importances:')
for i, imp in enumerate(best_estimator.feature_importances_):
    print('{}: {:.3f}'.format(columns[i], imp))
GRADIENT BOOSTING
Each new tree is fitted to the residuals of the current model:

Residual = Actual Value - Predicted Value

Starting from the average of the target (for example, the average price), the prediction after one tree is:

Prediction = Average Price + Learning Rate * Residual Predicted by Decision Tree

and after N trees:

Prediction = Average Price + LR * Residual Predicted by DT1 + LR * Residual Predicted by DT2 + ... + LR * Residual Predicted by DT N
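A minimal sketch of the additive update above, assuming a numeric feature matrix X and target y (for example, house prices); it hand-rolls the residual fitting purely to mirror the formula rather than using a library boosting class.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit_predict(X, y, n_trees=3, lr=0.1):
    # Start from the average of the target (the "Average Price" above).
    prediction = np.full(len(y), y.mean())
    for _ in range(n_trees):
        residual = y - prediction                        # Actual - Predicted
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        prediction = prediction + lr * tree.predict(X)   # add LR * predicted residual
    return prediction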

ADABOOST
Each weak learner gets an "amount of say" based on its total error:

Amount of Say = (1/2) * log((1 - Total Error) / Total Error)

For example, with Total Error = 1/4 (the numbers here use log base 10):

Amount of Say = (1/2) * log((1 - 1/4) / (1/4)) = 0.239

The sample weights are then updated. For misclassified samples the weight is increased:

New Sample Weight = Sample Weight * e^(Amount of Say) = (1/4) * e^(0.239) ≈ 0.317

and for correctly classified samples it is decreased:

New Sample Weight = Sample Weight * e^(-Amount of Say) = (1/4) * e^(-0.239) = 0.197
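A small sketch reproducing the arithmetic above in Python; note it uses log base 10 to match the 0.239 value, whereas the textbook AdaBoost formula uses the natural log.

import math

total_error = 1 / 4
sample_weight = 1 / 4

amount_of_say = 0.5 * math.log10((1 - total_error) / total_error)   # 0.239

# Misclassified samples get a larger weight, correct ones a smaller weight.
weight_wrong = sample_weight * math.exp(amount_of_say)    # ~0.317
weight_right = sample_weight * math.exp(-amount_of_say)   # ~0.197

print(round(amount_of_say, 3), round(weight_wrong, 3), round(weight_right, 3))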

STACKING
Several base models are trained on the data, and a meta-model learns how to combine their predictions:

Model          Algorithm
Base Model 1   Decision Tree
Base Model 2   Neural Network
Base Model 3   Support Vector Machine
Meta-Model     Logistic Regression
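A minimal scikit-learn sketch of the stacking setup in the table, assuming training data X_train and y_train; the individual model settings are illustrative.

from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[
        ('dt', DecisionTreeClassifier()),       # Base Model 1
        ('nn', MLPClassifier(max_iter=500)),    # Base Model 2
        ('svm', SVC(probability=True)),         # Base Model 3
    ],
    final_estimator=LogisticRegression(),       # Meta-Model
)
# stack.fit(X_train, y_train)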
