ML Unit 3
BOOTSTRAPPING
Samples are drawn with replacement from the data to form other, smaller datasets, called bootstrap samples. It is as if the bootstrapping method were running a bunch of simulations of our original dataset, so in some cases we can generalise statistics such as the mean and the standard deviation.
For example, bootstrap samples of size n drawn from the original dataset:
n = 3: (32, 4, 4), (8, 16, 2), (2, 2, 2), ...
n = 4: (2, 32, 4, 16), (2, 4, 2, 8), (8, 32, 4, 2), ...
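A minimal sketch of drawing bootstrap samples with NumPy (the data values and sample size are illustrative, not from the original notes):

import numpy as np

data = np.array([32, 4, 8, 16, 2, 2])   # illustrative original dataset
rng = np.random.default_rng(seed=0)

# draw several bootstrap samples of size n=3 (sampling WITH replacement)
for _ in range(3):
    sample = rng.choice(data, size=3, replace=True)
    print(sample, "mean:", sample.mean())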
BAGGING & PASTING
Bagging means bootstrap + aggregating. It is an ensemble method in which we first bootstrap our data and, for each bootstrap sample, train one model. After that, we aggregate their predictions with equal weights. When sampling is done without replacement, the method is called pasting.
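A hedged sketch of bagging vs. pasting with scikit-learn's BaggingClassifier (the synthetic dataset and hyperparameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: bootstrap samples (with replacement), predictions aggregated by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, random_state=0).fit(X, y)

# Pasting: same idea, but sampling WITHOUT replacement
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=False, max_samples=0.8,
                            random_state=0).fit(X, y)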
OUT-OF-BAG SCORING
If we are using bagging, there is a chance that a given sample is never selected, while others may be selected multiple times. The probability of never selecting a specific sample is (1 - 1/n)^n, which approaches about 37% as n grows. The samples that are never drawn are not used to train that model, so they can be used to test it; this is called out-of-bag (OOB) scoring.
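A quick numeric check of that probability, plus a note on how the left-out samples are used in practice (a sketch, not the notes' own code):

import numpy as np

# (1 - 1/n)^n tends to 1/e ≈ 0.368 as n grows
for n in (10, 100, 1000, 10000):
    print(n, (1 - 1 / n) ** n)

# In scikit-learn, passing oob_score=True to BaggingClassifier or
# RandomForestClassifier evaluates each estimator on its out-of-bag samples
# and exposes the result as the fitted model's oob_score_ attribute.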
RANDOM FOREST
It is an ensemble of decision trees that can be used for classification or regression. The most voted prediction becomes the output of the model. This helps make the model more accurate and stable, preventing overfitting.
Another very useful property of random forests is the ability to measure the relative importance of each feature by calculating how much each one reduces the impurity of the model. This is called feature importance.
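A hedged sketch of a random forest and its impurity-based feature importances in scikit-learn (the synthetic dataset is illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# an ensemble of decision trees; each tree votes and the majority wins
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# one importance value per feature, proportional to its impurity reduction
for i, imp in enumerate(forest.feature_importances_):
    print('feature {}: {:.3f}'.format(i, imp))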
DATA PREPARATION
import numpy as np
import pandas as pd
df = pd.read_csv('data/income.csv')
col = pd.Categorical(df.high_income)
df["high_income"] = col.codes   # encode the target as integer codes
We define a transformer to pre-process our data.
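The transformer and grid-search code is missing from these notes; the sketch below is a hedged reconstruction of what it might look like, so the lines that follow make sense (the StandardScaler step, parameter grid, and train/test split are assumptions):

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# one-hot encode any categorical features and split off the target
X = pd.get_dummies(df.drop(columns=['high_income']))
y = df['high_income']
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ('scaler', StandardScaler()),   # the pre-processing transformer
    ('forest', RandomForestClassifier(oob_score=True, random_state=0)),
])
grid = GridSearchCV(pipe, {'forest__n_estimators': [50, 100, 200]}, cv=5)
grid.fit(x_train, y_train)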
best_estimator = grid.best_estimator_.steps[-1][1]   # the forest at the end of the pipeline
columns = x_test.columns.tolist()
print('OOB Score: {}'.format(best_estimator.oob_score_))
print('Feature Importance')
for i, imp in enumerate(best_estimator.feature_importances_):
    print('{}: {:.3f}'.format(columns[i], imp))
GRADIENT BOOSTING
New Prediction = Average Price + Learning Rate * Residual predicted by the decision tree
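A minimal sketch of that gradient-boosting update for regression (the house sizes, prices and learning rate are made up for illustration):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[750], [900], [1200], [1500]])       # e.g. house size
prices = np.array([150.0, 180.0, 240.0, 300.0])    # target prices

average_price = prices.mean()                      # initial prediction
residuals = prices - average_price                 # errors left to explain

# fit a small tree to the residuals, then shrink its contribution
tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
learning_rate = 0.1
prediction = average_price + learning_rate * tree.predict(X)
print(prediction)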
ADABOOST
In AdaBoost, each weak learner gets an Amount of Say based on its Total Error. With Total Error = 1/4 (the notes use a base-10 log here):

Amount of Say = 1/2 * log((1 - Total Error) / Total Error)
              = 1/2 * log((1 - 1/4) / (1/4)) = 0.239

Incorrectly classified samples have their weight increased:
New Sample Weight = (sample weight) * e^(Amount of Say) = (1/4) * e^(0.239)

Correctly classified samples have their weight decreased:
New Sample Weight = (sample weight) * e^(-Amount of Say) = (1/4) * e^(-0.239) = 0.197
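A quick numeric check of those weight updates (reproducing the notes' numbers; the base-10 log in the Amount of Say is an assumption inferred from the 0.239 value):

import numpy as np

total_error = 1 / 4
sample_weight = 1 / 4

# Amount of Say = 1/2 * log((1 - Total Error) / Total Error)
amount_of_say = 0.5 * np.log10((1 - total_error) / total_error)
print(amount_of_say)                              # ≈ 0.239

# misclassified samples are up-weighted, correctly classified ones down-weighted
w_wrong = sample_weight * np.exp(amount_of_say)   # ≈ 0.317
w_right = sample_weight * np.exp(-amount_of_say)  # ≈ 0.197
print(w_wrong, w_right)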
STACKING
Stacking trains several base models and a meta-model that learns how to combine their predictions, for example:

Model         Algorithm
Base Model 1  Decision Tree
Base Model 2  Neural Network
Base Model 3  Support Vector Machine
Meta-Model    Logistic Regression
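A hedged sketch of that stacking setup with scikit-learn's StackingClassifier (the synthetic dataset and hyperparameters are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ('tree', DecisionTreeClassifier(random_state=0)),         # base model 1
        ('nn',   MLPClassifier(max_iter=1000, random_state=0)),   # base model 2
        ('svm',  SVC(probability=True, random_state=0)),          # base model 3
    ],
    final_estimator=LogisticRegression(),                         # meta-model
)
stack.fit(X, y)
print(stack.score(X, y))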