Logistic Regression
   nr_employed  y
0       5228.1  0
1       5195.8  0
2       4991.6  1
3       5099.1  0
4       5076.2  1
• 5 - Cons.conf.idx: consumer confidence index - monthly indicator (numeric)
• 6 - Euribor3m: euribor 3 month rate - daily indicator (numeric)
• 7 - Nr.employed: number of employees - quarterly indicator (numeric)
The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y), where y is encoded as yes = 1 and no = 0.
[4]: # Check whether the data set is balanced
data.y.value_counts()
[4]: 0 36548
1 4640
Name: y, dtype: int64
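The counts above point to a strong class imbalance. As a minimal sketch, the class shares can be computed by re-entering the two printed counts (the dictionary and variable names here are illustrative, not from the notebook). With pandas, `data.y.value_counts(normalize=True)` should give the same proportions directly.

```python
# Class counts copied from the value_counts() output above
counts = {0: 36548, 1: 4640}

total = sum(counts.values())  # 41188 rows in total
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"y = {label}: {share:.1%}")
```

Roughly 89% of clients did not subscribe, so a model that always predicts 0 would already reach about 89% accuracy on the full data set. Accuracy alone is therefore a weak yardstick here.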
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41188 entries, 0 to 41187
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   age             41188 non-null  int64
 1   duration        41188 non-null  int64
 2   emp_var_rate    41188 non-null  float64
 3   cons_price_idx  41188 non-null  float64
 4   cons_conf_idx   41188 non-null  float64
 5   euribor3m       41188 non-null  float64
 6   nr_employed     41188 non-null  float64
 7   y               41188 non-null  int64
dtypes: float64(5), int64(3)
memory usage: 2.5 MB
print("Columns:",data.shape[1])
print("Rows:",data.shape[0])
Columns: 8
Rows: 41188
[7]: # Define x, y
# Separating the independent data matrix and the response vector
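The statements that actually define `x` and `y` are not shown in this cell. One common way to do it, offered here as an assumption rather than the notebook's own code, is to drop the response column to obtain the feature matrix and select it to obtain the response vector. The toy frame below reuses only the five rows and two columns displayed earlier.

```python
import pandas as pd

# Toy frame built from the five rows shown earlier (two of the eight columns)
data = pd.DataFrame({
    "nr_employed": [5228.1, 5195.8, 4991.6, 5099.1, 5076.2],
    "y": [0, 0, 1, 0, 1],
})

x = data.drop(columns="y")  # independent variables (feature matrix)
y = data["y"]               # response vector

print(x.shape, y.shape)
```

On the full data set the same two lines would leave seven feature columns in `x`.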
[8]: # Splitting data into training & testing sets (validation set approach)
# 20% of the data is held out as the test (validation) set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
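To see what this split produces for a data set of this size, the sketch below runs `train_test_split` on placeholder arrays with the same number of rows. The arrays are hypothetical stand-ins; only the row count comes from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 41188                          # row count reported for the data set
x = np.arange(n_rows).reshape(-1, 1)    # placeholder features (hypothetical)
y = np.zeros(n_rows, dtype=int)         # placeholder labels (hypothetical)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(len(x_train), len(x_test))
```

The test set holds 8238 rows, which matches the sum of the four confusion matrix counts reported later (7157 + 606 + 307 + 168). Because the classes are imbalanced, passing `stratify=y` is an option worth considering so that both subsets keep the same share of subscribers; the notebook itself does not use it.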
[10]: LogisticRegression()
[12]: model.intercept_
[12]: array([0.00389662])
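The intercept is the constant term of the log-odds, so it fixes the predicted probability when every feature equals zero. A minimal sketch of that relationship follows; the helper `sigmoid` is written here for illustration, and the all-zero input is not a realistic client (age 0, for instance), so the number is only a reading aid.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

intercept = 0.00389662  # value printed by model.intercept_ above

# Predicted P(y = 1) when all features are zero, so only the intercept remains
p_at_zero = sigmoid(intercept)
print(round(p_at_zero, 6))
```

An intercept this close to zero corresponds to a probability of almost exactly one half; the feature coefficients do the rest of the work.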
[0.9914955 , 0.0085045 ],
…,
[0.9921623 , 0.0078377 ],
[0.94365365, 0.05634635],
[0.99442537, 0.00557463]])
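Each row of this output holds two probabilities, ordered like `model.classes_`: the first column is P(y = 0) and the second is P(y = 1). The sketch below takes the four visible rows, checks that each sums to 1, and applies a 0.5 cut-off, which is effectively what `predict()` does for a binary logistic regression. The names `probs`, `threshold` and `labels` are illustrative.

```python
# The four rows visible in the predict_proba output above
probs = [
    [0.9914955, 0.0085045],
    [0.9921623, 0.0078377],
    [0.94365365, 0.05634635],
    [0.99442537, 0.00557463],
]

threshold = 0.5  # default cut-off for turning P(y = 1) into a class label

# Largest deviation of any row sum from 1
max_gap = max(abs(p0 + p1 - 1.0) for p0, p1 in probs)

# Predict 1 only when P(y = 1) exceeds the threshold
labels = [int(p1 > threshold) for _, p1 in probs]

print(max_gap < 1e-6, labels)
```

All four visible clients are assigned to class 0. Lowering the threshold would trade more false positives for fewer missed subscribers.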
[15]: 1.0
6 Confusion matrix
[16]: # Obtain the confusion matrix
confusion_matrix(y_test,y_pred)
[17]: sns.heatmap(confusion_matrix(y_test,y_pred),annot=True,fmt="g")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
7 Values inside the Confusion Matrix
[18]: tn,fp,fn,tp=confusion_matrix(y_test,y_pred).ravel()
[19]: 7157
[20]: 606
[21]: 307
[22]: 168
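The unpacking in cell [18] relies on the layout scikit-learn uses for a binary confusion matrix: rows are actual classes, columns are predicted classes, so `ravel()` yields tn, fp, fn, tp in that order. A small sketch on made-up labels (not the bank data) makes the order visible.

```python
from sklearn.metrics import confusion_matrix

# Tiny made-up example, chosen so that fp and fn differ
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_hat = [0, 1, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()  # row-major order: [[tn, fp], [fn, tp]]

print(cm)
print(tn, fp, fn, tp)
```

Here two actual negatives were predicted positive (fp = 2) and one actual positive was missed (fn = 1).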
8 Accuracy & Misclassification Error
[23]: #accuracy values from manual calculation
accuracy=(np.diag(confusion_matrix(y_test,y_pred)).sum())/len(y_test)
accuracy
[23]: 0.9060451565914057
[24]: 0.9060451565914057
[25]: 1-accuracy_score(y_test,y_pred)
[25]: 0.09395484340859428
[26]: 0.09395484340859428
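Both figures can be checked by hand from the four confusion matrix counts. The reported accuracy implies that 7157 and 307 are the correctly classified cells (tn and tp), which leaves 606 and 168 as the two error cells; the printed outputs do not say which of those two is fp and which is fn, so the sketch below only uses their sum.

```python
# Counts taken from the confusion matrix outputs above
correct = 7157 + 307        # diagonal cells: tn + tp
errors = 606 + 168          # off-diagonal cells: fp + fn (order not stated)
n_test = correct + errors   # size of the test set

accuracy = correct / n_test
misclassification = errors / n_test

print(n_test, accuracy, misclassification)
```

For context, always predicting 0 would score about 0.887 on the full data set (36548 / 41188), so an accuracy of about 0.906 is only a modest gain over that baseline. The class mix in the test set may differ slightly from the full data.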
[29]: #Area under Curve
auc = roc_auc_score(y_test, y_pred_probs[:,1])
auc
[29]: 0.9121246014152794
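One way to read this number: the AUC equals the probability that a randomly chosen subscriber receives a higher predicted score than a randomly chosen non-subscriber. The sketch below computes that pairwise quantity directly on a made-up four-point example (not the bank data); `roc_auc_score` should return the same value for these inputs.

```python
from itertools import product

# Made-up labels and scores for illustration
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for s, t in zip(scores, y_true) if t == 1]
neg = [s for s, t in zip(scores, y_true) if t == 0]

# Count pairs where the positive outranks the negative; ties count one half
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in product(pos, neg)
)
auc_by_pairs = wins / (len(pos) * len(neg))

print(auc_by_pairs)
```

Three of the four positive-negative pairs are ordered correctly, giving 0.75. Unlike accuracy, this measure does not depend on the 0.5 threshold, which is why it can be high (0.91 here) even when accuracy barely beats the majority-class baseline.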
[30]: # Homework: Check whether there is a way to calculate the best threshold.
# Homework: Experiment with other available classifier models on this data to classify the subscription of a term deposit.