Question - 2-Interview Question ML
Imports the necessary libraries and modules for data processing, model training, optimization, evaluation, and visualization.
In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import classification_report
from skopt import BayesSearchCV
from xgboost import XGBClassifier
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
In [3]:
csv1_path = '/content/CSV_1.csv'
csv2_path = '/content/CSV_2.csv'
df1 = pd.read_csv(csv1_path)
df2 = pd.read_csv(csv2_path)
Data Preprocessing
This cell merges the two DataFrames (df1 and df2) on the shared key column 'sid', producing a combined DataFrame (df) that contains the columns from both datasets.
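A toy sketch of this kind of key-based merge (illustrative frames and values, not the project files):

```python
import pandas as pd

# Two small frames sharing a 'sid' key column
left = pd.DataFrame({"sid": ["a", "b"], "v1": [1, 2]})
right = pd.DataFrame({"sid": ["a", "b"], "output1": [0, 1]})

# pd.merge defaults to an inner join: only rows whose 'sid'
# appears in both frames survive, and columns are concatenated
combined = pd.merge(left, right, on="sid")
```

Rows whose `sid` exists in only one file would be dropped silently by the default inner join; `how='outer'` would keep them instead.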
In [4]:
df = pd.merge(df1, df2, on='sid')
df
Out[4]:
sid v1 v2 v3 v4 v5 v6 v7 v8 v9 ... v92 v93 v94 v95 v96 v97 v98 v99 v100 output1
0 caCEdA3c29 0 0 0 0 0 0 10 20 20 ... 3860 3860 3870 3870 3880 3880 3880 3890 3890 0
1 e28cD34A51 0 0 0 0 0 0 30 100 220 ... 1570 1570 1580 1580 1590 1590 1600 1600 1610 0
2 c83fffD852 0 0 70 360 610 800 940 1030 1100 ... 2070 2070 2080 2080 2080 2090 2090 2090 2100 0
3 F3b3Ca734f 0 0 10 40 180 400 530 650 750 ... 2750 2760 2770 2780 2790 2800 2810 2820 2830 1
4 cf49dBacDb 0 0 0 0 20 120 320 490 610 ... 1910 1910 1920 1920 1930 1930 1940 1940 1950 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7466 edED49ae11 0 0 0 0 0 0 0 0 30 ... 3800 3800 3800 3790 3770 3770 3780 3780 3790 0
7467 7cFBcdf8Cf 0 0 0 0 40 200 440 660 870 ... 2260 2260 2270 2270 2270 2270 2280 2280 2280 0
7468 6D676fF94F 0 0 10 60 270 540 790 980 1120 ... 2330 2330 2340 2340 2340 2350 2350 2350 2360 0
7470 eFD07AcF5e 0 0 20 70 210 660 1050 1370 1630 ... 3800 3800 3810 3810 3810 3820 3820 3820 3820 1
These lines build the feature matrix (X) by dropping the identifier and target columns from the combined DataFrame (df), and the target vector (y) by selecting the 'output1' column.
In [6]:
X = df.drop(columns=['sid', 'output1'])  # features: drop the string id and the target
y = df['output1']  # binary target
In [7]:
X = X.fillna(X.mean())  # fill any missing values in X with the mean of each column
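As a quick illustration of this mean-imputation step (a toy frame, not the project data):

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column
toy = pd.DataFrame({"v1": [10.0, np.nan, 20.0], "v2": [np.nan, 4.0, 6.0]})

# Each NaN is replaced by its own column's mean:
# v1 mean = 15.0, v2 mean = 5.0
toy_filled = toy.fillna(toy.mean())
```

Note that column means are computed over the non-missing values only, so the imputed values do not depend on how many NaNs a column has.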
This line splits the features (X) and target (y) into training and testing sets using the train_test_split function. The testing set is 20% of the data, and random_state=42 makes the split reproducible.
In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
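For an imbalanced binary target like this one, a common refinement (an option, not something this notebook does) is to pass stratify=y so that both splits preserve the class ratio. A small sketch with synthetic labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 80 zeros and 20 ones
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)

# stratify=y_demo keeps the 80/20 class ratio in both splits,
# so the 20-sample test set gets exactly 4 positives
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
```

Without stratification a small test set can, by chance, end up with very few positive examples, which distorts per-class metrics.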
In [9]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
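The scaler is fitted on the training set only and the same statistics are reused for the test set, which avoids leaking test-set information into preprocessing. A minimal check of that behaviour on toy data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[0.0], [10.0]])   # train mean = 5, train std = 5
test = np.array([[5.0], [20.0]])

scaler_demo = StandardScaler().fit(train)   # statistics come from train only
train_z = scaler_demo.transform(train)      # standardized: [-1, 1]
test_z = scaler_demo.transform(test)        # same train stats reused: [0, 3]
```

Calling fit_transform on the test set instead would re-estimate the mean and standard deviation from test data, a subtle form of leakage.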
In [10]:
param_space = {
'n_estimators': (100, 500),
'max_depth': (3, 7),
'learning_rate': (0.01, 0.1),
'subsample': (0.7, 1.0),
'gamma': (0, 3)
}
These lines create an XGBoost classifier and tune it with Bayesian optimization using BayesSearchCV from the skopt library. The param_space dictionary defines the hyperparameter search space, the number of iterations is set to 50, cross-validation uses 3 folds, and parallel computing is enabled (n_jobs=-1). The search is fitted on the scaled training data (X_train_scaled and y_train).
In [11]:
opt_xgb = BayesSearchCV(
    XGBClassifier(objective='binary:logistic', random_state=42),
    param_space,
    n_iter=50,
    cv=3,
    n_jobs=-1,
)
opt_xgb.fit(X_train_scaled, y_train)
Training
These lines create an XGBoost classifier (xgb_model) using the best parameters found by the Bayesian search (best_params). The classifier is initialized with objective='binary:logistic' and random_state=42, and fit is then called to train the model on the scaled training data, logging the training log-loss after each boosting round.
In [12]:
best_params = opt_xgb.best_params_
xgb_model = XGBClassifier(objective='binary:logistic', random_state=42, **best_params)
xgb_model.fit(X_train_scaled, y_train, eval_set=[(X_train_scaled, y_train)], eval_metric='logloss', verbose=True)
[0] validation_0-logloss:0.68613
[1] validation_0-logloss:0.67919
[2] validation_0-logloss:0.67241
...
[410] validation_0-logloss:0.21770
[411] validation_0-logloss:0.21752
[412] validation_0-logloss:0.21746
Out[12]:
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=3, gpu_id=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=0.01, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=6, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None, ...)
Evaluation
In [13]:
accuracy = xgb_model.score(X_test_scaled, y_test)  # mean accuracy on the test set
print('Accuracy:', accuracy)
Accuracy: 0.9163879598662207
This line uses the trained XGBoost model (xgb_model) to make predictions on the scaled testing set (X_test_scaled). The predicted labels are stored in y_pred.
In [14]:
y_pred = xgb_model.predict(X_test_scaled)
print(y_pred)
[0 0 0 ... 0 0 0]
Classification Report
This line generates a classification report using the classification_report function, which reports the key evaluation metrics — precision, recall, F1-score, and support — for each class in the predictions.
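An alternative to parsing the printed report string (as the later cell does) is the function's own output_dict=True mode, which returns the same metrics as a nested dict that converts directly to a DataFrame. A sketch on toy labels:

```python
import pandas as pd
from sklearn.metrics import classification_report

y_true_demo = [0, 0, 1, 1]
y_pred_demo = [0, 1, 1, 1]

# output_dict=True returns nested dicts instead of a formatted string,
# so no fragile string splitting is needed
report_dict = classification_report(y_true_demo, y_pred_demo, output_dict=True)
df_demo = pd.DataFrame(report_dict).transpose()
```

This keeps full float precision and is robust to formatting changes in the printed report.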
In [15]:
report=classification_report(y_test, y_pred)
print(report)
In [17]:
report_data = []
lines = report.split('\n')
for line in lines[2:-5]:  # keep only the per-class rows of the printed report
    row = line.split()
    if row:  # skip blank lines
        report_data.append(row)
df_report = pd.DataFrame(report_data, columns=['class', 'precision', 'recall', 'f1-score', 'support'])
df_report.set_index('class', inplace=True)
df_report = df_report.apply(pd.to_numeric)
plt.figure(figsize=(8, 6))
sns.heatmap(df_report, annot=True, cmap='Blues', fmt='.2f', linewidths=0.5)
plt.title('Classification Report')
plt.xlabel('Metrics')
plt.ylabel('Class')
plt.show()
Figures
In [18]: