
Ensemble learning
Key idea: Run a base learning algorithm multiple times, then
combine the predictions of the different learners to get a final
prediction.

Training models independently on the same dataset tends to yield the same result.

• For an ensemble to be useful, the trained models need to be different:

1. Use slightly different (randomized) datasets

2. Use slightly different (randomized) training procedures

Hard Voting:
In the setting of binary classification, hard voting is a simple way for an ensemble of classifiers to make predictions: the ensemble outputs the majority winner between the two classes.

Even if each classifier is a weak learner, the ensemble can be a strong learner under hard voting, provided there are sufficiently many weak yet diverse learners.
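A minimal sketch of hard voting (not from the original notebook; the choice of base classifiers, their parameters, and the random_state are illustrative assumptions): three different models vote on each test sample and the majority label wins.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# three diverse base classifiers (illustrative choices)
classifiers = [LogisticRegression(max_iter=5000),
               DecisionTreeClassifier(max_depth=3),
               GaussianNB()]

# stack each classifier's 0/1 predictions as one row
all_preds = np.array([clf.fit(X_train, y_train).predict(X_test) for clf in classifiers])

# majority vote: class 1 wins when at least 2 of the 3 classifiers predict it
majority = (all_preds.sum(axis=0) >= 2).astype(int)
print('hard-voting accuracy:', (majority == y_test).mean())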


Soft Voting
If the classifiers in the ensemble can estimate class probabilities, we may use soft voting to aggregate them.
Soft voting: the ensemble predicts the class with the highest class probability, averaged over all the individual classifiers.
Often better than hard voting.
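A minimal sketch of soft voting with scikit-learn's VotingClassifier (the estimators, their parameters, and the random_state are illustrative assumptions, not taken from the notebook): with voting='soft', the ensemble averages predict_proba over the estimators and predicts the argmax.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

soft_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=5000)),
                ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
                ('gnb', GaussianNB())],
    voting='soft')   # average the class probabilities, then pick the highest
soft_clf.fit(X_train, y_train)
print('soft-voting test accuracy:', soft_clf.score(X_test, y_test))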

Random forest
A random forest builds on the decision tree idea: take a random part of the data (say 10% of it) and build a tree that makes decisions of its own, then take another random part (another 10%) and build another tree, and so on, tree after tree, over the rest of the data.

When new information arrives and we want to make a forecast, we apply it to all the trees and take the average output of all the trees.
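A minimal sketch of the "average over all the trees" idea (an illustrative addition, assuming scikit-learn's RandomForestClassifier, whose fitted trees are exposed through estimators_): averaging the individual trees' class probabilities by hand should match the forest's own predict_proba, provided every tree saw both classes during fitting.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# average the class probabilities of the individual trees by hand ...
tree_probs = np.mean([tree.predict_proba(X[:5]) for tree in forest.estimators_], axis=0)

# ... and compare with the forest's own probability estimate
print(np.allclose(tree_probs, forest.predict_proba(X[:5])))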


Bagging
Bagging (bootstrap aggregating): train each predictor of the ensemble on a random subset of the training set, sampled with replacement, then aggregate the individual predictions.
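A minimal sketch of bagging with scikit-learn's BaggingClassifier (illustrative base estimator and parameters, not taken from the notebook): each of the 100 trees is fit on a bootstrap sample of the training set and their predictions are aggregated by voting.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# 100 trees, each trained on a bootstrap sample (sampling with replacement)
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=1.0, bootstrap=True, random_state=0)
bag_clf.fit(X_train, y_train)
print('bagging test accuracy:', bag_clf.score(X_test, y_test))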


Boosting
Boosting is an ensemble learning method in which the individual predictors are trained sequentially, each one trying to correct its predecessor (e.g., AdaBoost).
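A minimal sketch of boosting with scikit-learn's AdaBoostClassifier (illustrative base estimator and parameters, not taken from the notebook): each round re-weights the training samples that earlier rounds misclassified, so later stumps focus on the hard cases.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# 200 depth-1 trees ("stumps") trained sequentially, each correcting its predecessor
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, learning_rate=0.5, random_state=0)
ada_clf.fit(X_train, y_train)
print('AdaBoost test accuracy:', ada_clf.score(X_test, y_test))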


In [4]:  1 #Import Libraries


2 from sklearn.datasets import load_breast_cancer
3 from sklearn.model_selection import train_test_split
4 from sklearn.ensemble import RandomForestClassifier
5 from sklearn.metrics import confusion_matrix
6 import seaborn as sns
7 import matplotlib.pyplot as plt
8 #----------------------------------------------------
9 ​
10 #load breast cancer data
11 ​
12 BreastData = load_breast_cancer()
13 ​
14 #X Data
15 X = BreastData.data
16 #print('X Data is \n' , X[:10])
17 #print('X shape is ' , X.shape)
18 #print('X Features are \n' , BreastData.feature_names)
19 ​
20 #y Data
21 y = BreastData.target
22 #print('y Data is \n' , y[:10])
23 #print('y shape is ' , y.shape)
24 #print('y Columns are \n' , BreastData.target_names)
25 ​
26 #----------------------------------------------------
27 #Splitting data
28 ​
29 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
30 ​
31 #Splitted Data
32 #print('X_train shape is ' , X_train.shape)
33 #print('X_test shape is ' , X_test.shape)
34 #print('y_train shape is ' , y_train.shape)
35 #print('y_test shape is ' , y_test.shape)
36 ​
37 #----------------------------------------------------
38 #Applying RandomForestClassifier Model
39 ​
40 '''
41 ensemble.RandomForestClassifier(n_estimators='warn’, criterion=’gini’, ma
42 min_samples_split=2, min_samples_leaf=1,m
43 max_features='auto’,max_leaf_nodes=None,m
44 min_impurity_split=None, bootstrap=True,o
45 random_state=None, verbose=0,warm_start=F
46 '''
47 ​
48 RandomForestClassifierModel = RandomForestClassifier(criterion = 'gini',n
49 RandomForestClassifierModel.fit(X_train, y_train)
50 ​
51 #Calculating Details
52 print('RandomForestClassifierModel Train Score is : ' , RandomForestClass
53 print('RandomForestClassifierModel Test Score is : ' , RandomForestClassi
54 print('RandomForestClassifierModel features importances are : ' , RandomF
55 print('----------------------------------------------------')
56 ​


#Calculating Prediction
y_pred = RandomForestClassifierModel.predict(X_test)
y_pred_prob = RandomForestClassifierModel.predict_proba(X_test)
print('Predicted Value for RandomForestClassifierModel is : ' , y_pred[:10])
print('Prediction Probabilities Value for RandomForestClassifierModel is : ' , y_pred_prob[:10])

#----------------------------------------------------
#Calculating Confusion Matrix
CM = confusion_matrix(y_test, y_pred)
print('Confusion Matrix is : \n', CM)

# drawing confusion matrix
sns.heatmap(CM, center = True)
plt.show()

RandomForestClassifierModel Train Score is :  0.9606299212598425
RandomForestClassifierModel Test Score is :  0.9574468085106383
RandomForestClassifierModel features importances are :  [0.06004193 0.00853752 0.06310082 0.04467382 0.00023185 0.02855769
 0.08153966 0.09144584 0.00163715 0.00110104 0.00786124 0.00101276
 0.01108242 0.035841   0.         0.00034509 0.00142356 0.0023928
 0.00063721 0.         0.10442464 0.00883582 0.14215625 0.12409232
 0.00402041 0.01868416 0.04009237 0.10689029 0.00327017 0.00607014]
----------------------------------------------------
Predicted Value for RandomForestClassifierModel is :  [0 0 1 0 1 1 1 0 0 1]
Prediction Probabilities Value for RandomForestClassifierModel is :  [[0.73648588 0.26351412]
 [0.96361511 0.03638489]
 [0.03527063 0.96472937]
 [0.96361511 0.03638489]
 [0.22306319 0.77693681]
 [0.12475729 0.87524271]
 [0.02826541 0.97173459]
 [0.71213358 0.28786642]
 [0.94329633 0.05670367]
 [0.06490707 0.93509293]]
Confusion Matrix is :
 [[ 64   4]
 [  4 116]]


In [8]:  1 from sklearn import datasets


2 from sklearn.model_selection import cross_val_score
3 from sklearn.linear_model import LogisticRegression
4 from sklearn.naive_bayes import GaussianNB
5 from sklearn.ensemble import RandomForestClassifier
6 from sklearn.ensemble import VotingClassifier
7 ​
8 iris = datasets.load_iris()
9 X = iris.data[:, 1:3]
10 y = iris.target
11 ​
12 clf1 = LogisticRegression(solver='lbfgs', multi_class='multinomial',rando
13 clf2 = RandomForestClassifier(n_estimators=100, random_state=1)
14 clf3 = GaussianNB()
15 ​
16 eclf = VotingClassifier(estimators=[ ('rf', clf2), ('gnb', clf3)])
17 ​
18 for clf, label in zip([clf1, clf2, clf3, eclf], ['Logistic Regression',
19 scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
20 print("Accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std

Accuracy: 0.95 (+/- 0.04) [Logistic Regression]
Accuracy: 0.94 (+/- 0.04) [Random Forest]
Accuracy: 0.91 (+/- 0.04) [naive Bayes]
Accuracy: 0.93 (+/- 0.01) [Ensemble]
