0% found this document useful (0 votes)
5 views13 pages

Question - 2-Interview Question ML

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views13 pages

Question - 2-Interview Question ML

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Importing and creating Dataframe

Imports the necessary libraries and modules for data processing, model training, optimization, evaluation, and visualization.

In [1]:

!pip install scikit-optimize

Requirement already satisfied: scikit-optimize in /usr/local/lib/python3.10/dist-packages (0.9.0)


Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.10/dist-packages (from scikit-
optimize) (1.3.1)
Requirement already satisfied: pyaml>=16.9 in /usr/local/lib/python3.10/dist-packages (from scikit-o
ptimize) (23.7.0)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.10/dist-packages (from scikit
-optimize) (1.22.4)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.10/dist-packages (from scikit
-optimize) (1.10.1)
Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.10/dist-packages (from
scikit-optimize) (1.2.2)
Requirement already satisfied: PyYAML in /usr/local/lib/python3.10/dist-packages (from pyaml>=16.9->
scikit-optimize) (6.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from
scikit-learn>=0.20.0->scikit-optimize) (3.1.0)

In [2]:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from skopt import BayesSearchCV
from xgboost import XGBClassifier
import matplotlib.pyplot as plt
from tqdm import tqdm

In [3]:

csv1_path = '/content/CSV_1.csv'
csv2_path = '/content/CSV_2.csv'
df1 = pd.read_csv(csv1_path)
df2 = pd.read_csv(csv2_path)

Data Preprocessing

This line merges the two DataFrames (df_csv1 and df_csv2) based on the common column 'sid', resulting in a combined DataFrame (df_combined) with
the columns from both datasets.

In [4]:

df = pd.merge(df1, df2, on='sid')


In [5]:

df

Out[5]:

sid v1 v2 v3 v4 v5 v6 v7 v8 v9 ... v92 v93 v94 v95 v96 v97 v98 v99 v100 output1

0 caCEdA3c29 0 0 0 0 0 0 10 20 20 ... 3860 3860 3870 3870 3880 3880 3880 3890 3890 0

1 e28cD34A51 0 0 0 0 0 0 30 100 220 ... 1570 1570 1580 1580 1590 1590 1600 1600 1610 0

2 c83fffD852 0 0 70 360 610 800 940 1030 1100 ... 2070 2070 2080 2080 2080 2090 2090 2090 2100 0

3 F3b3Ca734f 0 0 10 40 180 400 530 650 750 ... 2750 2760 2770 2780 2790 2800 2810 2820 2830 1

4 cf49dBacDb 0 0 0 0 20 120 320 490 610 ... 1910 1910 1920 1920 1930 1930 1940 1940 1950 1

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

7466 edED49ae11 0 0 0 0 0 0 0 0 30 ... 3800 3800 3800 3790 3770 3770 3780 3780 3790 0

7467 7cFBcdf8Cf 0 0 0 0 40 200 440 660 870 ... 2260 2260 2270 2270 2270 2270 2280 2280 2280 0

7468 6D676fF94F 0 0 10 60 270 540 790 980 1120 ... 2330 2330 2340 2340 2340 2350 2350 2350 2360 0

7469 Cf5EeB2C06 0 0 30 100 120 180 380 590 750 ... 0 0 0 0 0 0 0 0 0 1

7470 eFD07AcF5e 0 0 20 70 210 660 1050 1370 1630 ... 3800 3800 3810 3810 3810 3820 3820 3820 3820 1

7471 rows × 102 columns

These lines split the combined DataFrame (df_combined) into feature data (X) by dropping the 'output' column, and the target variable (y) by selecting only
the 'output' column.

In [6]:

X = df.drop(['sid', 'output1'], axis=1)


y = df['output1']

In [7]:

X = X.fillna(X.mean()) #This line fills any missing values in the feature data (X) with the mean value of each co
lumn.

This line splits the resampled feature data (X_resampled) and target variable (y_resampled) into training and testing sets using the train_test_split
function. The testing set size is set to 25% of the data, and the random state is set to 42 for reproducibility.

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Fit the XGBoost model with Bayesian optimization

In [10]:

param_space = {
'n_estimators': (100, 500),
'max_depth': (3, 7),
'learning_rate': (0.01, 0.1),
'subsample': (0.7, 1.0),
'gamma': (0, 3)
}

These lines create an XGBoost classifier (xgb) and perform Bayesian optimization using BayesSearchCV from the skopt library. The param_space
dictionary defines the search space for hyperparameters. The number of iterations is set to 50, the cross-validation is performed with 3 folds, and parallel
computing is enabled (n_jobs=-1). The optimization is performed on the training data (X_train and y_train).
In [11]:

xgb = XGBClassifier(objective='binary:logistic', random_state=42)


opt_xgb = BayesSearchCV(xgb, param_space, n_iter=20, cv=3, n_jobs=-1)

with tqdm(total=20) as pbar:


def update_progress(_):
pbar.update(1)

opt_xgb.fit(X_train_scaled, y_train, callback=update_progress)

100%|██████████| 20/20 [08:53<00:00, 26.69s/it]

Training

These lines create an XGBoost classifier (xgb_model) using the best parameters obtained from Bayesian optimization (best_params). The classifier is
initialized with the objective set to 'binary:logistic' and the random state set to 42. The fit method is then called to train the model on the training data

In [12]:

best_params = opt_xgb.best_params_
xgb_model = XGBClassifier(objective='binary:logistic', random_state=42, **best_params)
xgb_model.fit(X_train_scaled, y_train, eval_set=[(X_train_scaled, y_train)], eval_metric='logloss', verbose=True)

[0] validation_0-logloss:0.68613
[1] validation_0-logloss:0.67919
[2] validation_0-logloss:0.67241

/usr/local/lib/python3.10/dist-packages/xgboost/sklearn.py:835: UserWarning: `eval_metric` in `fit`


method is deprecated for better compatibility with scikit-learn, use `eval_metric` in constructor or
`set_params` instead.
warnings.warn(

[3] validation_0-logloss:0.66573
[4] validation_0-logloss:0.65920
[5] validation_0-logloss:0.65277
[6] validation_0-logloss:0.64648
[7] validation_0-logloss:0.64027
[8] validation_0-logloss:0.63429
[9] validation_0-logloss:0.62830
[10] validation_0-logloss:0.62246
[11] validation_0-logloss:0.61671
[12] validation_0-logloss:0.61103
[13] validation_0-logloss:0.60548
[14] validation_0-logloss:0.60011
[15] validation_0-logloss:0.59481
[16] validation_0-logloss:0.58954
[17] validation_0-logloss:0.58440
[18] validation_0-logloss:0.57931
[19] validation_0-logloss:0.57437
[20] validation_0-logloss:0.56953
[21] validation_0-logloss:0.56467
[22] validation_0-logloss:0.55992
[23] validation_0-logloss:0.55527
[24] validation_0-logloss:0.55070
[25] validation_0-logloss:0.54631
[26] validation_0-logloss:0.54186
[27] validation_0-logloss:0.53754
[28] validation_0-logloss:0.53321
[29] validation_0-logloss:0.52901
[30] validation_0-logloss:0.52484
[31] validation_0-logloss:0.52079
[32] validation_0-logloss:0.51682
[33] validation_0-logloss:0.51284
[34] validation_0-logloss:0.50900
[35] validation_0-logloss:0.50522
[36] validation_0-logloss:0.50150
[37] validation_0-logloss:0.49782
[38] validation_0-logloss:0.49418
[39] validation_0-logloss:0.49062
[40] validation_0-logloss:0.48717
[41] validation_0-logloss:0.48371
[42] validation_0-logloss:0.48024
[43] validation_0-logloss:0.47691
[44] validation_0-logloss:0.47362
[45] validation_0-logloss:0.47036
[46] validation_0-logloss:0.46718
[47] validation_0-logloss:0.46405
[48] validation_0-logloss:0.46097
[49] validation_0-logloss:0.45799
[50] validation_0-logloss:0.45498
[51] validation_0-logloss:0.45198
[52] validation_0-logloss:0.44906
[53] validation_0-logloss:0.44615
[54] validation_0-logloss:0.44336
[55] validation_0-logloss:0.44053
[56] validation_0-logloss:0.43778
[57] validation_0-logloss:0.43509
[58] validation_0-logloss:0.43247
[59] validation_0-logloss:0.42985
[60] validation_0-logloss:0.42730
[61] validation_0-logloss:0.42470
[62] validation_0-logloss:0.42216
[63] validation_0-logloss:0.41969
[64] validation_0-logloss:0.41725
[65] validation_0-logloss:0.41486
[66] validation_0-logloss:0.41251
[67] validation_0-logloss:0.41016
[68] validation_0-logloss:0.40778
[69] validation_0-logloss:0.40545
[70] validation_0-logloss:0.40313
[71] validation_0-logloss:0.40090
[72] validation_0-logloss:0.39873
[73] validation_0-logloss:0.39658
[74] validation_0-logloss:0.39448
[75] validation_0-logloss:0.39237
[76] validation_0-logloss:0.39032
[77] validation_0-logloss:0.38828
[78] validation_0-logloss:0.38632
[79] validation_0-logloss:0.38426
[80] validation_0-logloss:0.38228
[81] validation_0-logloss:0.38033
[82] validation_0-logloss:0.37834
[83] validation_0-logloss:0.37651
[84] validation_0-logloss:0.37465
[85] validation_0-logloss:0.37286
[86] validation_0-logloss:0.37110
[87] validation_0-logloss:0.36933
[88] validation_0-logloss:0.36761
[89] validation_0-logloss:0.36589
[90] validation_0-logloss:0.36417
[91] validation_0-logloss:0.36249
[92] validation_0-logloss:0.36086
[93] validation_0-logloss:0.35922
[94] validation_0-logloss:0.35761
[95] validation_0-logloss:0.35601
[96] validation_0-logloss:0.35442
[97] validation_0-logloss:0.35286
[98] validation_0-logloss:0.35135
[99] validation_0-logloss:0.34991
[100] validation_0-logloss:0.34846
[101] validation_0-logloss:0.34697
[102] validation_0-logloss:0.34556
[103] validation_0-logloss:0.34412
[104] validation_0-logloss:0.34272
[105] validation_0-logloss:0.34131
[106] validation_0-logloss:0.33992
[107] validation_0-logloss:0.33863
[108] validation_0-logloss:0.33727
[109] validation_0-logloss:0.33599
[110] validation_0-logloss:0.33475
[111] validation_0-logloss:0.33346
[112] validation_0-logloss:0.33221
[113] validation_0-logloss:0.33096
[114] validation_0-logloss:0.32977
[115] validation_0-logloss:0.32851
[116] validation_0-logloss:0.32732
[117] validation_0-logloss:0.32612
[118] validation_0-logloss:0.32494
[119] validation_0-logloss:0.32374
[120] validation_0-logloss:0.32260
[121] validation_0-logloss:0.32144
[122] validation_0-logloss:0.32032
[123] validation_0-logloss:0.31918
[124] validation_0-logloss:0.31807
[125] validation_0-logloss:0.31696
[126] validation_0-logloss:0.31590
[127] validation_0-logloss:0.31486
[128] validation_0-logloss:0.31380
[129] validation_0-logloss:0.31280
[130] validation_0-logloss:0.31178
[131] validation_0-logloss:0.31083
[132] validation_0-logloss:0.30984
[133] validation_0-logloss:0.30875
[134] validation_0-logloss:0.30785
[135] validation_0-logloss:0.30687
[136] validation_0-logloss:0.30593
[137] validation_0-logloss:0.30503
[138] validation_0-logloss:0.30409
[139] validation_0-logloss:0.30325
[140] validation_0-logloss:0.30236
[141] validation_0-logloss:0.30143
[142] validation_0-logloss:0.30049
[143] validation_0-logloss:0.29964
[144] validation_0-logloss:0.29878
[145] validation_0-logloss:0.29799
[146] validation_0-logloss:0.29708
[147] validation_0-logloss:0.29623
[148] validation_0-logloss:0.29536
[149] validation_0-logloss:0.29449
[150] validation_0-logloss:0.29368
[151] validation_0-logloss:0.29287
[152] validation_0-logloss:0.29203
[153] validation_0-logloss:0.29124
[154] validation_0-logloss:0.29049
[155] validation_0-logloss:0.28972
[156] validation_0-logloss:0.28895
[157] validation_0-logloss:0.28816
[158] validation_0-logloss:0.28740
[159] validation_0-logloss:0.28672
[160] validation_0-logloss:0.28593
[161] validation_0-logloss:0.28525
[162] validation_0-logloss:0.28452
[163] validation_0-logloss:0.28382
[164] validation_0-logloss:0.28318
[165] validation_0-logloss:0.28252
[166] validation_0-logloss:0.28180
[167] validation_0-logloss:0.28115
[168] validation_0-logloss:0.28049
[169] validation_0-logloss:0.27984
[170] validation_0-logloss:0.27924
[171] validation_0-logloss:0.27859
[172] validation_0-logloss:0.27795
[173] validation_0-logloss:0.27738
[174] validation_0-logloss:0.27674
[175] validation_0-logloss:0.27615
[176] validation_0-logloss:0.27555
[177] validation_0-logloss:0.27495
[178] validation_0-logloss:0.27436
[179] validation_0-logloss:0.27379
[180] validation_0-logloss:0.27323
[181] validation_0-logloss:0.27267
[182] validation_0-logloss:0.27210
[183] validation_0-logloss:0.27157
[184] validation_0-logloss:0.27108
[185] validation_0-logloss:0.27051
[186] validation_0-logloss:0.26995
[187] validation_0-logloss:0.26935
[188] validation_0-logloss:0.26888
[189] validation_0-logloss:0.26839
[190] validation_0-logloss:0.26790
[191] validation_0-logloss:0.26742
[192] validation_0-logloss:0.26693
[193] validation_0-logloss:0.26646
[194] validation_0-logloss:0.26592
[195] validation_0-logloss:0.26540
[196] validation_0-logloss:0.26495
[197] validation_0-logloss:0.26445
[198] validation_0-logloss:0.26393
[199] validation_0-logloss:0.26350
[200] validation_0-logloss:0.26302
[201] validation_0-logloss:0.26259
[202] validation_0-logloss:0.26212
[203] validation_0-logloss:0.26164
[204] validation_0-logloss:0.26119
[205] validation_0-logloss:0.26077
[206] validation_0-logloss:0.26037
[207] validation_0-logloss:0.25992
[208] validation_0-logloss:0.25952
[209] validation_0-logloss:0.25907
[210] validation_0-logloss:0.25871
[211] validation_0-logloss:0.25828
[212] validation_0-logloss:0.25781
[213] validation_0-logloss:0.25741
[214] validation_0-logloss:0.25702
[215] validation_0-logloss:0.25660
[216] validation_0-logloss:0.25622
[217] validation_0-logloss:0.25589
[218] validation_0-logloss:0.25550
[219] validation_0-logloss:0.25511
[220] validation_0-logloss:0.25466
[221] validation_0-logloss:0.25426
[222] validation_0-logloss:0.25385
[223] validation_0-logloss:0.25348
[224] validation_0-logloss:0.25310
[225] validation_0-logloss:0.25278
[226] validation_0-logloss:0.25240
[227] validation_0-logloss:0.25205
[228] validation_0-logloss:0.25169
[229] validation_0-logloss:0.25133
[230] validation_0-logloss:0.25098
[231] validation_0-logloss:0.25056
[232] validation_0-logloss:0.25022
[233] validation_0-logloss:0.24984
[234] validation_0-logloss:0.24950
[235] validation_0-logloss:0.24920
[236] validation_0-logloss:0.24884
[237] validation_0-logloss:0.24853
[238] validation_0-logloss:0.24827
[239] validation_0-logloss:0.24798
[240] validation_0-logloss:0.24767
[241] validation_0-logloss:0.24735
[242] validation_0-logloss:0.24706
[243] validation_0-logloss:0.24678
[244] validation_0-logloss:0.24652
[245] validation_0-logloss:0.24622
[246] validation_0-logloss:0.24589
[247] validation_0-logloss:0.24562
[248] validation_0-logloss:0.24532
[249] validation_0-logloss:0.24500
[250] validation_0-logloss:0.24470
[251] validation_0-logloss:0.24440
[252] validation_0-logloss:0.24416
[253] validation_0-logloss:0.24386
[254] validation_0-logloss:0.24357
[255] validation_0-logloss:0.24331
[256] validation_0-logloss:0.24305
[257] validation_0-logloss:0.24276
[258] validation_0-logloss:0.24252
[259] validation_0-logloss:0.24229
[260] validation_0-logloss:0.24194
[261] validation_0-logloss:0.24168
[262] validation_0-logloss:0.24139
[263] validation_0-logloss:0.24110
[264] validation_0-logloss:0.24085
[265] validation_0-logloss:0.24060
[266] validation_0-logloss:0.24042
[267] validation_0-logloss:0.24023
[268] validation_0-logloss:0.23999
[269] validation_0-logloss:0.23970
[270] validation_0-logloss:0.23951
[271] validation_0-logloss:0.23933
[272] validation_0-logloss:0.23912
[273] validation_0-logloss:0.23888
[274] validation_0-logloss:0.23866
[275] validation_0-logloss:0.23843
[276] validation_0-logloss:0.23825
[277] validation_0-logloss:0.23804
[278] validation_0-logloss:0.23780
[279] validation_0-logloss:0.23755
[280] validation_0-logloss:0.23736
[281] validation_0-logloss:0.23713
[282] validation_0-logloss:0.23692
[283] validation_0-logloss:0.23667
[284] validation_0-logloss:0.23645
[285] validation_0-logloss:0.23620
[286] validation_0-logloss:0.23596
[287] validation_0-logloss:0.23576
[288] validation_0-logloss:0.23546
[289] validation_0-logloss:0.23522
[290] validation_0-logloss:0.23500
[291] validation_0-logloss:0.23482
[292] validation_0-logloss:0.23463
[293] validation_0-logloss:0.23444
[294] validation_0-logloss:0.23421
[295] validation_0-logloss:0.23403
[296] validation_0-logloss:0.23384
[297] validation_0-logloss:0.23365
[298] validation_0-logloss:0.23347
[299] validation_0-logloss:0.23326
[300] validation_0-logloss:0.23308
[301] validation_0-logloss:0.23283
[302] validation_0-logloss:0.23266
[303] validation_0-logloss:0.23248
[304] validation_0-logloss:0.23228
[305] validation_0-logloss:0.23220
[306] validation_0-logloss:0.23198
[307] validation_0-logloss:0.23179
[308] validation_0-logloss:0.23160
[309] validation_0-logloss:0.23137
[310] validation_0-logloss:0.23110
[311] validation_0-logloss:0.23090
[312] validation_0-logloss:0.23067
[313] validation_0-logloss:0.23047
[314] validation_0-logloss:0.23025
[315] validation_0-logloss:0.23007
[316] validation_0-logloss:0.22991
[317] validation_0-logloss:0.22971
[318] validation_0-logloss:0.22952
[319] validation_0-logloss:0.22933
[320] validation_0-logloss:0.22920
[321] validation_0-logloss:0.22898
[322] validation_0-logloss:0.22887
[323] validation_0-logloss:0.22871
[324] validation_0-logloss:0.22858
[325] validation_0-logloss:0.22837
[326] validation_0-logloss:0.22822
[327] validation_0-logloss:0.22810
[328] validation_0-logloss:0.22800
[329] validation_0-logloss:0.22785
[330] validation_0-logloss:0.22774
[331] validation_0-logloss:0.22757
[332] validation_0-logloss:0.22742
[333] validation_0-logloss:0.22729
[334] validation_0-logloss:0.22715
[335] validation_0-logloss:0.22703
[336] validation_0-logloss:0.22692
[337] validation_0-logloss:0.22678
[338] validation_0-logloss:0.22666
[339] validation_0-logloss:0.22648
[340] validation_0-logloss:0.22628
[341] validation_0-logloss:0.22617
[342] validation_0-logloss:0.22601
[343] validation_0-logloss:0.22593
[344] validation_0-logloss:0.22580
[345] validation_0-logloss:0.22561
[346] validation_0-logloss:0.22547
[347] validation_0-logloss:0.22528
[348] validation_0-logloss:0.22517
[349] validation_0-logloss:0.22499
[350] validation_0-logloss:0.22483
[351] validation_0-logloss:0.22467
[352] validation_0-logloss:0.22459
[353] validation_0-logloss:0.22446
[354] validation_0-logloss:0.22436
[355] validation_0-logloss:0.22422
[356] validation_0-logloss:0.22412
[357] validation_0-logloss:0.22398
[358] validation_0-logloss:0.22394
[359] validation_0-logloss:0.22380
[360] validation_0-logloss:0.22364
[361] validation_0-logloss:0.22348
[362] validation_0-logloss:0.22340
[363] validation_0-logloss:0.22322
[364] validation_0-logloss:0.22309
[365] validation_0-logloss:0.22293
[366] validation_0-logloss:0.22286
[367] validation_0-logloss:0.22270
[368] validation_0-logloss:0.22263
[369] validation_0-logloss:0.22252
[370] validation_0-logloss:0.22241
[371] validation_0-logloss:0.22223
[372] validation_0-logloss:0.22208
[373] validation_0-logloss:0.22195
[374] validation_0-logloss:0.22182
[375] validation_0-logloss:0.22170
[376] validation_0-logloss:0.22156
[377] validation_0-logloss:0.22148
[378] validation_0-logloss:0.22138
[379] validation_0-logloss:0.22123
[380] validation_0-logloss:0.22106
[381] validation_0-logloss:0.22097
[382] validation_0-logloss:0.22084
[383] validation_0-logloss:0.22072
[384] validation_0-logloss:0.22063
[385] validation_0-logloss:0.22052
[386] validation_0-logloss:0.22040
[387] validation_0-logloss:0.22029
[388] validation_0-logloss:0.22022
[389] validation_0-logloss:0.22010
[390] validation_0-logloss:0.22002
[391] validation_0-logloss:0.21988
[392] validation_0-logloss:0.21973
[393] validation_0-logloss:0.21954
[394] validation_0-logloss:0.21938
[395] validation_0-logloss:0.21929
[396] validation_0-logloss:0.21917
[397] validation_0-logloss:0.21905
[398] validation_0-logloss:0.21894
[399] validation_0-logloss:0.21885
[400] validation_0-logloss:0.21873
[401] validation_0-logloss:0.21854
[402] validation_0-logloss:0.21846
[403] validation_0-logloss:0.21841
[404] validation_0-logloss:0.21829
[405] validation_0-logloss:0.21819
[406] validation_0-logloss:0.21814
[407] validation_0-logloss:0.21804
[408] validation_0-logloss:0.21797
[409] validation_0-logloss:0.21782
[410] validation_0-logloss:0.21770
[411] validation_0-logloss:0.21752
[412] validation_0-logloss:0.21746

Out[12]:

▾ XGBClassifier
XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric=None, feature_types=Non
e,
gamma=3, gpu_id=None, grow_policy=None, importance_type=None,
interaction_constraints=None, learning_rate=0.01, max_bin=Non
e,
max_cat_threshold=None, max_cat_to_onehot=None,
max_delta_step=None, max_depth=6, max_leaves=None,
min_child_weight=None, missing=nan, monotone_constraints=None

Evaluation

In [13]:

accuracy = xgb_model.score(X_test_scaled, y_test)


print(f"Accuracy: {accuracy}")

Accuracy: 0.9163879598662207

This line uses the trained XGBoost model (xgb_model) to make predictions on the testing set (X_test). The predicted values are stored in y_pred.

In [14]:

y_pred = xgb_model.predict(X_test_scaled)
print(y_pred)

[0 0 0 ... 0 0 0]

Classification Report
This line generates a classification report using the classification_report function. classification report provides important evaluation metrics such as
precision, recall, F1-score, and support for each class in the prediction

In [15]:

from sklearn.metrics import classification_report


import seaborn as sns

In [16]:

report=classification_report(y_test, y_pred)
print(report)

precision recall f1-score support

0 0.92 1.00 0.96 1372


1 0.38 0.02 0.05 123

accuracy 0.92 1495


macro avg 0.65 0.51 0.50 1495
weighted avg 0.87 0.92 0.88 1495

In [17]:

report_data = []
lines = report.split('\n')
for line in lines[2:-5]:
row = line.split()
report_data.append(row)
df_report = pd.DataFrame(report_data, columns=['class', 'precision', 'recall', 'f1-score', 'support'])
df_report.set_index('class', inplace=True)

df_report = df_report.apply(pd.to_numeric)

plt.figure(figsize=(8, 6))
sns.heatmap(df_report, annot=True, cmap='Blues', fmt='.2f', linewidths=0.5)
plt.title('Classification Report')
plt.xlabel('Metrics')
plt.ylabel('Class')
plt.show()
Figures

In [18]:

from xgboost import plot_tree

# Get the number of boosting rounds (number of trees)


num_rounds = xgb_model.best_iteration

# Generate separate plots for all the trees


for i in range(num_rounds):
plt.figure(figsize=(100, 50), dpi=600)
plot_tree(xgb_model, num_trees=i, rankdir='LR')
plt.tight_layout()
plt.savefig(f'xgboost_tree_{i}.png', dpi=600)
plt.show()

<Figure size 60000x30000 with 0 Axes>

<Figure size 60000x30000 with 0 Axes>

<Figure size 60000x30000 with 0 Axes>

<Figure size 60000x30000 with 0 Axes>


---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-18-c25c4672fb8c> in <cell line: 8>()
10 plot_tree(xgb_model, num_trees=i, rankdir='LR')
11 plt.tight_layout()
---> 12 plt.savefig(f'xgboost_tree_{i}.png', dpi=600)
13 plt.show()

/usr/local/lib/python3.10/dist-packages/matplotlib/pyplot.py in savefig(*args, **kwargs)


1021 def savefig(*args, **kwargs):
1022 fig = gcf()
-> 1023 res = fig.savefig(*args, **kwargs)
1024 fig.canvas.draw_idle() # Need this if 'transparent=True', to reset colors.
1025 return res

/usr/local/lib/python3.10/dist-packages/matplotlib/figure.py in savefig(self, fname, transparent, **


kwargs)
3341 ax.patch._cm_set(facecolor='none', edgecolor='none'))
3342
-> 3343 self.canvas.print_figure(fname, **kwargs)
3344
3345 def ginput(self, n=1, timeout=30, show_clicks=True,

/usr/local/lib/python3.10/dist-packages/matplotlib/backend_bases.py in print_figure(self, filename,


dpi, facecolor, edgecolor, orientation, format, bbox_inches, pad_inches, bbox_extra_artists, backend
, **kwargs)
2364 # force the figure dpi to 72), so we need to set it again here.
2365 with cbook._setattr_cm(self.figure, dpi=dpi):
-> 2366 result = print_method(
2367 filename,
2368 facecolor=facecolor,

/usr/local/lib/python3.10/dist-packages/matplotlib/backend_bases.py in <lambda>(*args, **kwargs)


2230 "bbox_inches_restore"}
2231 skip = optional_kws - {*inspect.signature(meth).parameters}
-> 2232 print_method = functools.wraps(meth)(lambda *args, **kwargs: meth(
2233 *args, **{k: v for k, v in kwargs.items() if k not in skip}))
2234 else: # Let third-parties do as they see fit.

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in print_png(self, filena


me_or_obj, metadata, pil_kwargs)
507 *metadata*, including the default 'Software' key.
508 """
--> 509 self._print_pil(filename_or_obj, "png", pil_kwargs, metadata)
510
511 def print_to_buffer(self):

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in _print_pil(self, filen


ame_or_obj, fmt, pil_kwargs, metadata)
455 *pil_kwargs* and *metadata* are forwarded).
456 """
--> 457 FigureCanvasAgg.draw(self)
458 mpl.image.imsave(
459 filename_or_obj, self.buffer_rgba(), format=fmt, origin="upper",

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in draw(self)
392 def draw(self):
393 # docstring inherited
--> 394 self.renderer = self.get_renderer()
395 self.renderer.clear()
396 # Acquire a lock on the shared font cache.

/usr/local/lib/python3.10/dist-packages/matplotlib/_api/deprecation.py in wrapper(*inner_args, **inn


er_kwargs)
382 # Early return in the simple, non-deprecated case (much faster than
383 # calling bind()).
--> 384 return func(*inner_args, **inner_kwargs)
385 arguments = signature.bind(*inner_args, **inner_kwargs).arguments
386 if is_varargs and arguments.get(name):

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in get_renderer(self, cle


ared)
409 reuse_renderer = (self._lastKey == key)
410 if not reuse_renderer:
--> 411 self.renderer = RendererAgg(w, h, self.figure.dpi)
412 self._lastKey = key
413 elif cleared:

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in __init__(self, width,


height, dpi)
82 self.width = width
83 self.height = height
---> 84 self._renderer = _RendererAgg(int(width), int(height), dpi)
85 self._filter_renderers = []
86

KeyboardInterrupt:

Error in callback <function _draw_all_if_interactive at 0x7b74856d6050> (for post_execute):

---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/matplotlib/pyplot.py in _draw_all_if_interactive()
118 def _draw_all_if_interactive():
119 if matplotlib.is_interactive():
--> 120 draw_all()
121
122

/usr/local/lib/python3.10/dist-packages/matplotlib/_pylab_helpers.py in draw_all(cls, force)


130 for manager in cls.get_all_fig_managers():
131 if force or manager.canvas.figure.stale:
--> 132 manager.canvas.draw_idle()
133
134

/usr/local/lib/python3.10/dist-packages/matplotlib/backend_bases.py in draw_idle(self, *args, **kwar


gs)
2080 if not self._is_idle_drawing:
2081 with self._idle_draw_cntx():
-> 2082 self.draw(*args, **kwargs)
2083
2084 @property

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in draw(self)
392 def draw(self):
393 # docstring inherited
--> 394 self.renderer = self.get_renderer()
395 self.renderer.clear()
396 # Acquire a lock on the shared font cache.

/usr/local/lib/python3.10/dist-packages/matplotlib/_api/deprecation.py in wrapper(*inner_args, **inn


er_kwargs)
382 # Early return in the simple, non-deprecated case (much faster than
383 # calling bind()).
--> 384 return func(*inner_args, **inner_kwargs)
385 arguments = signature.bind(*inner_args, **inner_kwargs).arguments
386 if is_varargs and arguments.get(name):

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in get_renderer(self, cle


ared)
409 reuse_renderer = (self._lastKey == key)
410 if not reuse_renderer:
--> 411 self.renderer = RendererAgg(w, h, self.figure.dpi)
412 self._lastKey = key
413 elif cleared:

/usr/local/lib/python3.10/dist-packages/matplotlib/backends/backend_agg.py in __init__(self, width,


height, dpi)
82 self.width = width
83 self.height = height
---> 84 self._renderer = _RendererAgg(int(width), int(height), dpi)
85 self._filter_renderers = []
86

KeyboardInterrupt:

<Figure size 60000x30000 with 0 Axes>

You might also like