Feature Engineering On Banks' Private Credit Data - Ipynb - Colab
import pandas as pd

# Load the credit data; the first column (Cust_No) becomes the index
df = pd.read_csv('credit.csv', index_col=0)
df.head()
Output (first 5 rows, indexed by Cust_No; 5 rows × 31 columns). Columns shown:
Target, Nation, Birth_Place, Gender, Age, Marriage_State, Highest Education, House_State, Work_Years, Title, ..., ZX_Max_Overdue_Account
<Axes: >
# Missing-value rate per column: isnull().sum() divided by the number of rows
df_missing = pd.DataFrame(df.isnull().sum() / df.shape[0], columns=['missing_rate']).reset_index()
df_missing.sort_values(by='missing_rate', ascending=False)[:10]
https://colab.research.google.com/drive/1NKCYHsHNEgWK9rQhpOYasj9JNCijPk_I?authuser=3#scrollTo=1bLaK0vL2bRm&printMode=true 1/6
9/24/24, 10:26 AM Feature Engineering on Banks' Private Credit Data.ipynb - Colab
index missing_rate
9 Title 0.603075
11 Industry 0.594120
7 House_State 0.424806
1 Nation 0.356877
5 Marriage_State 0.334404
10 Duty 0.126563
25 ZX_Max_Credits 0.000000
21 ZX_Max_Overdue_Account 0.000000
22 ZX_Link_Max_Overdue_Amount 0.000000
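The missing-rate table above can also be computed with `isnull().mean()`, which gives the same fraction in one step. The sketch below uses a small hypothetical frame (`toy`), not the credit data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the credit table.
toy = pd.DataFrame({
    "Title": [1.0, np.nan, np.nan, 2.0],
    "Nation": [3.0, 3.0, np.nan, 5.0],
    "Target": [0, 1, 0, 0],
})

# isnull().mean() is the fraction of missing entries per column,
# equivalent to isnull().sum() divided by the number of rows.
missing_rate = (
    toy.isnull().mean()
    .rename("missing_rate")
    .sort_values(ascending=False)
)
print(missing_rate.head(10))
```

The result is a Series indexed by column name, so no `reset_index()` is needed to see which column each rate belongs to.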
<ipython-input-9-ea6953e5cbbd>:3: FutureWarning: Calling int on a single element Series is deprecated and will raise a TypeError in the
df[col] = df[col].fillna(int(df[col].mode()))
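The FutureWarning above is raised because `mode()` returns a Series and `int()` is applied to the whole Series. A minimal sketch of the same mode fill that avoids the warning follows. The toy frame and the list `cols_to_fill` are hypothetical, since the notebook's actual loop is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the credit table.
toy = pd.DataFrame({
    "Title": [1.0, np.nan, 1.0, 2.0, np.nan],
    "Nation": [3.0, 3.0, np.nan, 5.0, 3.0],
})

# Columns to impute; the notebook's actual list is not shown in the source.
cols_to_fill = ["Title", "Nation"]

for col in cols_to_fill:
    # mode() returns a Series (ties are possible); take its first entry
    # explicitly instead of calling int() on the whole Series.
    most_frequent = toy[col].mode().iloc[0]
    toy[col] = toy[col].fillna(most_frequent)

print(toy)
```

When several values tie for the mode, `mode()` returns them in sorted order, so `.iloc[0]` picks the smallest one.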
# Re-check the missing-value rate per column after imputation
df_missing_2 = pd.DataFrame(df.isnull().sum() / df.shape[0], columns=['missing_rate']).reset_index()
df_missing_2.sort_values(by='missing_rate', ascending=False)[:10]
index missing_rate
0 Target 0.0
16 Ast_Curr_Bal 0.0
29 ZX_Credit_Total_Overdue_Months 0.0
28 ZX_Credit_Max_Overdu_Amount 0.0
27 ZX_Max_Overdue_Credits 0.0
26 ZX_Max_Credit_Banks 0.0
25 ZX_Max_Credits 0.0
24 ZX_Max_Overdue_Duration 0.0
23 ZX_Total_Overdu_Months 0.0
22 ZX_Link_Max_Overdue_Amount 0.0
import missingno

# Visualize where missing values remain, one column per feature
missingno.matrix(df)
<Axes: >
# Feature selection
# Filter method: row-percentage crosstab of House_State against Target
cross_table = pd.crosstab(df.House_State, columns=df.Target, margins=True)
cross_table_rowpct = cross_table.div(cross_table["All"], axis=0)
cross_table_rowpct
Target 0 1 All
House_State
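Row percentages like those computed above can also be requested directly from `pd.crosstab` with `normalize="index"`. The sketch below uses hypothetical toy values for `House_State` and `Target`.

```python
import pandas as pd

# Hypothetical toy data; the real table has many more rows and levels.
toy = pd.DataFrame({
    "House_State": [1, 1, 2, 2, 2, 3],
    "Target":      [0, 1, 0, 0, 1, 0],
})

# normalize="index" divides each row by its row total, giving the share
# of Target == 0 and Target == 1 within each House_State level.
row_pct = pd.crosstab(toy["House_State"], toy["Target"], normalize="index")
print(row_pct)
```

Unlike the `margins=True` version, this form has no "All" row or column. If the rate of `Target == 1` barely changes across levels, the feature carries little information about the target.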
[('Work_Years', 30037.98992988671),
('Birth_Place', 2337.714562647344),
('Marriage_State', 42.7575821276435),
('Duty', 30.50877073893663),
('Industry', 25.452013582101742),
('Nation', 5.256723621174332),
('Gender', 2.309012664555949),
('Highest Education', 1.2297808819626934),
('Title', 0.8774406202190365),
('House_State', 0.3184384017473372)]
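The cell that produced the scores above is not shown in the source. One common filter that yields a per-feature score list of this shape is the chi-squared test in scikit-learn. The sketch below assumes that technique and uses synthetic data with hypothetical column names.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import chi2

# Synthetic stand-in data (hypothetical, not the credit table).
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
toy_X = pd.DataFrame({
    # "Informative" depends on the target; "Noise" does not.
    "Informative": y * 3 + rng.integers(0, 2, size=n),
    "Noise": rng.integers(0, 4, size=n),
})

# chi2 requires non-negative features (e.g. integer-coded categories).
scores, p_values = chi2(toy_X, y)

# Sort (feature, score) pairs from highest to lowest score.
ranked = sorted(zip(toy_X.columns, scores), key=lambda t: t[1], reverse=True)
print(ranked)
```

A higher score means a stronger dependence between the feature and the target. Scores are not comparable across features with very different value ranges, so the ranking is a rough screen rather than a final decision.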
<Axes: >
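# Collect each pair of distinct features whose correlation is at least 0.8.
# corr_matrix comes from an earlier cell that is not shown here.
cols_pair = []
for index_ in corr_matrix.index:
    for col_ in corr_matrix.columns:
        if corr_matrix.loc[index_, col_] >= 0.8 and index_ != col_ and (col_, index_) not in cols_pair:
            cols_pair.append((index_, col_))
cols_pair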
[('ZX_Max_Account_Number', 'ZX_Max_Link_Banks'),
('ZX_Max_Credits', 'ZX_Max_Credit_Banks')]
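The same pair search can be written without the nested loop by masking the strict upper triangle of the correlation matrix. This is a sketch on synthetic data with hypothetical column names, using the same 0.8 cut-off.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical column names).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.05, size=200),  # nearly identical to "a"
    "b": rng.normal(size=200),                        # independent of "a"
})

corr = toy.corr()
threshold = 0.8  # same cut-off as the loop above

# Keep only the strict upper triangle so each pair appears once
# and the diagonal (self-correlation) is excluded.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = [
    (corr.index[i], corr.columns[j])
    for i, j in zip(*np.where(upper & (corr.to_numpy() >= threshold)))
]
print(pairs)
```

Like the loop, this flags only strong positive correlations. Comparing `np.abs(corr.to_numpy())` against the threshold would also catch strong negative ones.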
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1)
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[ 5 1 9 1 1 1 10 1 1 1 1 1 1 1 11 1 1 4 1 1 1 1 1 1
3 8 7 2 1 6]
[False True False True True True False True True True True True
True True False True True False True True True True True True
False False False False True False]
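The two arrays above have the shape of the `ranking_` and `support_` attributes of scikit-learn's `RFE`, with the mask keeping 20 of the 30 features. The cell that produced them is not shown, so the following is only a sketch of recursive feature elimination with logistic regression on synthetic data. It standardizes the inputs and raises `max_iter`, the two remedies the ConvergenceWarning suggests.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit features (hypothetical data).
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Standardizing first follows the "scale the data" hint in the warning;
# a larger max_iter addresses the iteration limit.
X_scaled = StandardScaler().fit_transform(X_toy)
estimator = LogisticRegression(max_iter=1000)

# Keep 3 features; with the default step, RFE drops the weakest one per round.
rfe = RFE(estimator, n_features_to_select=3)
rfe.fit(X_scaled, y_toy)

print(rfe.ranking_)   # 1 marks a selected feature; larger = eliminated earlier
print(rfe.support_)   # boolean mask of the selected features
```

The number of features to keep (3 here) is an arbitrary choice for the toy data. On real data, the scaler should be fitted on the training split only.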
# Embedded Method
from sklearn.ensemble import RandomForestClassifier
emb = RandomForestClassifier()
emb.fit(X,y)
RandomForestClassifier()
[(0.1313, 'Ast_Curr_Bal'),
(0.1137, 'Age'),
(0.0921, 'Year_Income'),
(0.0669, 'Std_Cred_Limit'),
(0.0421, 'ZX_Link_Max_Overdue_Amount'),
(0.0412, 'ZX_Max_Account_Number'),
(0.0373, 'Highest Education'),
(0.0349, 'Duty'),
(0.0347, 'ZX_Max_Link_Banks'),
(0.0341, 'ZX_Total_Overdu_Months'),
(0.0312, 'Industry'),
(0.0307, 'Birth_Place'),
(0.0295, 'ZX_Max_Overdue_Duration'),
(0.0295, 'ZX_Max_Overdue_Account'),
(0.0252, 'Loan_Curr_Bal'),
(0.0248, 'Couple_Year_Income'),
(0.0244, 'Marriage_State'),
(0.0222, 'L12_Month_Pay_Amount'),
(0.0198, 'ZX_Max_Credit_Banks'),
(0.0192, 'ZX_Max_Credits'),
(0.0188, 'ZX_Credit_Max_Overdu_Amount'),
(0.0155, 'Work_Years'),
(0.0142, 'ZX_Credit_Max_Overdue_Duration'),
(0.0141, 'Gender'),
(0.0124, 'Title'),
(0.012, 'ZX_Credit_Total_Overdue_Months'),
(0.0102, 'ZX_Max_Overdue_Credits'),
(0.0097, 'Nation'),
(0.0084, 'House_State'),
(0.0, 'Couple_L12_Month_Pay_Amount')]
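A sorted list of (importance, feature) pairs like the one above can be built from the fitted forest's `feature_importances_`. The cell that produced the list is not shown, so this is a sketch on synthetic data with hypothetical column names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical column names f0..f4.
X_arr, y_toy = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
X_toy = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

# Fixing random_state makes the importances reproducible between runs.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_toy, y_toy)

# Pair each rounded importance with its column name, largest first.
importances = sorted(
    zip((round(float(v), 4) for v in forest.feature_importances_), X_toy.columns),
    reverse=True,
)
print(importances)
```

Impurity-based importances sum to 1 and tend to favor features with many distinct values, which may partly explain why continuous columns such as balances and income rank near the top.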
# variable removal
deleted_col = ['ZX_Max_Overdue_Duration','Couple_L12_Month_Pay_Amount','House_State', 'ZX_Max_Overdue_Credits', 'Couple_Year_Income','Marriage
df_selected = df.drop(deleted_col, axis=1)
df_selected.head()
Output (first 5 rows, indexed by Cust_No; 5 rows × 22 columns). Columns shown:
Target, Nation, Birth_Place, Gender, Age, Highest Education, Duty, Industry, Year_Income, L12_Month_Pay_Amount, ..., Loan_Curr_Bal, ZX_M