1 - Credit Risk Log File

The document analyzes credit risk data using logistic regression. Descriptive statistics and frequency tables are presented for the variables. Logistic regression is used to predict the probability of high credit risk based on independent variables like checking/savings amounts, customer history, demographics and loan details. Several independent variables are found to have statistically significant correlations with credit risk including loan purpose, gender, and housing status.

Uploaded by

Abdullah alsilme

-----------------------------------------------------------------------------------
name: <unnamed>
log: \Credit Risk Log File_Last Version.log
log type: text

. Data : "2_Credit Risk data.dta"

. * Data description: the dependent variable is creditrisk (binary: 1 if the
> customer is high risk, 0 if the customer is low risk). The independent
> continuous variables are checking, savings, monthscustomer, monthsemployed and
> age; the independent categorical variables are loanpurpose, gender,
> maritalstatus, housing and job.
***********************************************************************************

. * Descriptive statistics: the Stata command "sum" summarizes the continuous
> variables, and the Stata command "tab1" displays the frequency tables of all
> the categorical variables at once.

.
.
. sum checking savings monthscustomer monthsemployed age

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    checking |        425    1048.014     3147.183         0      19812
     savings |        425    1812.562     3597.285         0      19811
monthscust~r |        425    22.89647      12.2676         5         73
monthsempl~d |        425    31.89647     32.25932         0        119
         age |        425    34.39765     11.04513        18         73

. tab1 loanpurpose gender maritalstatus housing job

-> tabulation of loanpurpose

loanpurpose |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         44       10.35       10.35
          2 |         23        5.41       15.76
          3 |         85       20.00       35.76
          4 |          4        0.94       36.71
          5 |        104       24.47       61.18
          6 |         12        2.82       64.00
          7 |          2        0.47       64.47
          8 |        105       24.71       89.18
          9 |         40        9.41       98.59
         10 |          6        1.41      100.00
------------+-----------------------------------
      Total |        425      100.00

-> tabulation of gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        135       31.76       31.76
          1 |        290       68.24      100.00
------------+-----------------------------------
      Total |        425      100.00

-> tabulation of maritalstatus

maritalstat |
         us |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        156       36.71       36.71
          2 |         36        8.47       45.18
          3 |        233       54.82      100.00
------------+-----------------------------------
      Total |        425      100.00

-> tabulation of housing

    housing |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        292       68.71       68.71
          2 |         81       19.06       87.76
          3 |         52       12.24      100.00
------------+-----------------------------------
      Total |        425      100.00

-> tabulation of job

        job |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         54       12.71       12.71
          2 |        271       63.76       76.47
          3 |         89       20.94       97.41
          4 |         11        2.59      100.00
------------+-----------------------------------
      Total |        425      100.00

. * frequency table of the dependent variable : creditrisk

. tab creditrisk

 creditrisk |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        214       50.35       50.35
          1 |        211       49.65      100.00
------------+-----------------------------------
      Total |        425      100.00

. * Comment: 50.35% of borrowers are ranked low risk and 49.65% are ranked high
> risk by the bank.

. * Joint distribution of creditrisk and loanpurpose, with a chi-squared test of
> association (the Stata command is: tab2 creditrisk loanpurpose, chi2)

.
.
. tab2 creditrisk loanpurpose, chi2

-> tabulation of creditrisk by loanpurpose


           |                             loanpurpose
creditrisk |      1      2      3      4      5      6      7      8      9     10 |     Total
-----------+-----------------------------------------------------------------------+----------
         0 |     21      9     42      1     39      8      1     63     28      2 |       214
         1 |     23     14     43      3     65      4      1     42     12      4 |       211
-----------+-----------------------------------------------------------------------+----------
     Total |     44     23     85      4    104     12      2    105     40      6 |       425

Pearson chi2(9) = 21.2695 Pr = 0.012

. * There is a statistically significant association between creditrisk and
> loanpurpose (p-value = 1.2% < 5%).
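
The Pearson statistic reported for the creditrisk by loanpurpose table can be recomputed from the cell counts alone. The sketch below is not part of the Stata session: it uses Python purely as a calculator, the function name `pearson_chi2` is an illustrative choice, and the counts are copied from the table above.

```python
# Observed counts copied from the creditrisk-by-loanpurpose table in the log.
low_risk = [21, 9, 42, 1, 39, 8, 1, 63, 28, 2]     # creditrisk = 0
high_risk = [23, 14, 43, 3, 65, 4, 1, 42, 12, 4]   # creditrisk = 1


def pearson_chi2(rows):
    """Return (statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, dof


chi2_stat, dof = pearson_chi2([low_risk, high_risk])
print(f"Pearson chi2({dof}) = {chi2_stat:.4f}")
```

Several loan purposes (4, 7 and 10) have very few customers, so their expected counts fall well below 5; the chi-squared approximation is usually treated with some caution in that situation.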

.
.
. * Joint distribution of creditrisk and gender, with a chi-squared test of
> association (the Stata command is: tab2 creditrisk gender, chi2)

.
.
. tab2 creditrisk gender, chi2

-> tabulation of creditrisk by gender

           |        gender
creditrisk |         0          1 |     Total
-----------+----------------------+----------
         0 |        57        157 |       214
         1 |        78        133 |       211
-----------+----------------------+----------
     Total |       135        290 |       425

Pearson chi2(1) = 5.2320 Pr = 0.022

.
.
. * There is a statistically significant association between creditrisk and
> gender (p-value = 2.2% < 5%).

. * Joint distribution of creditrisk and housing, with a chi-squared test of
> association (the Stata command is: tab2 creditrisk housing, chi2)

. tab2 creditrisk housing , chi2

-> tabulation of creditrisk by housing

           |             housing
creditrisk |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |       161         32         21 |       214
         1 |       131         49         31 |       211
-----------+---------------------------------+----------
     Total |       292         81         52 |       425

Pearson chi2(2) = 8.5524 Pr = 0.014

. * There is a statistically significant association between creditrisk and
> housing (p-value = 1.4% < 5%).
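
For 1 and 2 degrees of freedom the chi-squared tail probability has a simple closed form, so the p-values reported for the gender and housing tables can be checked without statistical software. This is a minimal sketch outside the Stata session; the function names are illustrative, and the statistics are the ones printed in the log.

```python
import math


def chi2_pvalue_df1(stat):
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper tail is erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def chi2_pvalue_df2(stat):
    # With 2 degrees of freedom the chi-squared distribution is exponential
    # with mean 2, so the upper tail is exp(-stat / 2).
    return math.exp(-stat / 2.0)


p_gender = chi2_pvalue_df1(5.2320)    # Pearson chi2(1) for creditrisk by gender
p_housing = chi2_pvalue_df2(8.5524)   # Pearson chi2(2) for creditrisk by housing
print(f"gender:  Pr = {p_gender:.3f}")
print(f"housing: Pr = {p_housing:.3f}")
```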
***********************************************************************************

. **** LOGISTIC REGRESSION****

. * Logistic regression is used when the dependent variable is binary with the
> usual coding: 0 for a negative outcome (the event did not occur) and 1 for a
> positive outcome (the event did occur). We use a logit model when we are
> interested in how the independent variables affect the probability of the event
> occurring (or not occurring).

. * Logit model: y* = c + bx + e, where y* is the latent variable behind the
> dependent variable (creditrisk = 1 when y* > 0, and 0 otherwise), x is a set of
> independent continuous and categorical variables (checking savings
> monthscustomer monthsemployed age loanpurpose gender maritalstatus housing job),
> and c (the constant, of no real significance in a logistic regression model)
> and b are parameters to be estimated. The error term e follows a standard
> logistic distribution, with mean 0 and variance π^2/3 (pi squared over three).
> Hence Pr(y=1|x) = exp(c+bx)/(1+exp(c+bx)). So a positive coefficient b indicates
> that higher levels of x are associated with an increase in Pr(y=1|x), and a
> negative coefficient indicates that higher levels of x are associated with a
> decrease in Pr(y=1|x).
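
The link between the sign of b and the direction of change in Pr(y=1|x) can be seen by evaluating the logistic function directly. The sketch below is illustrative only: it is not part of the Stata session, and the constant and slope values are hypothetical numbers chosen to show the direction of the effect, not estimates from this data set.

```python
import math


def logistic(z):
    """Inverse logit, exp(z) / (1 + exp(z)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_event(c, b, x):
    """Pr(y=1|x) for a one-variable logit model with constant c and slope b."""
    return logistic(c + b * x)


# Hypothetical parameter values (assumptions for illustration only).
xs = [0, 1, 2, 3]
rising = [prob_event(0.0, 0.5, x) for x in xs]     # b > 0: probability rises with x
falling = [prob_event(0.0, -0.5, x) for x in xs]   # b < 0: probability falls with x

print("b = +0.5:", [round(p, 3) for p in rising])
print("b = -0.5:", [round(p, 3) for p in falling])
```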

.
.
. * The Stata command to estimate a logit model is: "logit depvar indvars"

. logit creditrisk checking savings monthscustomer monthsemployed age i.loanpurpose
> i.gender i.maritalstatus i.housing i.job

Iteration 0:   log likelihood = -294.57696
Iteration 1:   log likelihood = -257.67227
Iteration 2:   log likelihood = -257.54937
Iteration 3:   log likelihood = -257.54926
Iteration 4:   log likelihood = -257.54926

Logistic regression                             Number of obs     =        425
                                                LR chi2(22)       =      74.06
                                                Prob > chi2       =     0.0000
Log likelihood = -257.54926                     Pseudo R2         =     0.1257

--------------------------------------------------------------------------------
    creditrisk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
      checking |  -.0000476   .0000348    -1.37   0.171    -.0001158    .0000206
       savings |  -.0000496   .0000316    -1.57   0.117    -.0001116    .0000124
monthscustomer |   .0502559   .0105246     4.78   0.000      .029628    .0708837
monthsemployed |  -.0039044   .0037417    -1.04   0.297     -.011238    .0034291
           age |  -.0116182   .0112195    -1.04   0.300    -.0336081    .0103717
               |
   loanpurpose |
             2 |   .1805578   .5798451     0.31   0.756    -.9559176    1.317033
             3 |   .0208036   .4146947     0.05   0.960     -.791983    .8335902
             4 |   1.511012   1.296294     1.17   0.244    -1.029677    4.051702
             5 |   .7164681   .4034342     1.78   0.076    -.0742485    1.507185
             6 |  -.7741574   .7415011    -1.04   0.296    -2.227473    .6791581
             7 |   .5033596   1.546789     0.33   0.745    -2.528292    3.535011
             8 |  -.2122486    .404214    -0.53   0.600    -1.004494    .5799963
             9 |  -1.345953   .5263792    -2.56   0.011    -2.377637   -.3142685
            10 |   .0141124   1.029348     0.01   0.989    -2.003373    2.031598
               |
      1.gender |    .129157   .5282329     0.24   0.807    -.9061604    1.164474
               |
 maritalstatus |
             2 |  -.4163424   .6175879    -0.67   0.500    -1.626792    .7941076
             3 |  -.6619016   .5082489    -1.30   0.193    -1.658051     .334248
               |
       housing |
             2 |   .5931464   .2893651     2.05   0.040     .0260012    1.160292
             3 |   .5939747   .3756847     1.58   0.114    -.1423538    1.330303
               |
           job |
             2 |  -.2862753   .3604276    -0.79   0.427    -.9927004    .4201498
             3 |  -.1103014    .412744    -0.27   0.789    -.9192648    .6986621
             4 |  -.5714215   .7860727    -0.73   0.467    -2.112096    .9692526
               |
         _cons |  -.1554973   .6956042    -0.22   0.823    -1.518857    1.207862
--------------------------------------------------------------------------------
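
The LR chi2 and Pseudo R2 in the header of the output above follow directly from two log likelihoods: the one at iteration 0 (the constant-only model) and the final one. A small check, with Python used only as a calculator outside the Stata session and with illustrative variable names:

```python
# Log likelihoods copied from the iteration log of the first logit model.
ll_null = -294.57696   # iteration 0: model with the constant only
ll_full = -257.54926   # final iteration: model with all 22 coefficients

# Likelihood-ratio statistic: twice the gain in log likelihood.
lr_chi2 = 2.0 * (ll_full - ll_null)

# McFadden's pseudo R-squared: proportional reduction in the log likelihood.
pseudo_r2 = 1.0 - ll_full / ll_null

print(f"LR chi2(22) = {lr_chi2:.2f}")
print(f"Pseudo R2   = {pseudo_r2:.4f}")
```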

. * Model 2: logit creditrisk savings monthscustomer age i.loanpurpose i.gender
> i.maritalstatus i.housing (the variables checking, monthsemployed and job, none
> of which is statistically significant in the first model, are dropped)

.
.
. logit creditrisk savings monthscustomer age i.loanpurpose i.gender
> i.maritalstatus i.housing

Iteration 0:   log likelihood = -294.57696
Iteration 1:   log likelihood = -259.73212
Iteration 2:   log likelihood = -259.65054
Iteration 3:   log likelihood = -259.65049
Iteration 4:   log likelihood = -259.65049

Logistic regression                             Number of obs     =        425
                                                LR chi2(17)       =      69.85
                                                Prob > chi2       =     0.0000
Log likelihood = -259.65049                     Pseudo R2         =     0.1186

--------------------------------------------------------------------------------
    creditrisk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
       savings |  -.0000523    .000031    -1.69   0.091     -.000113    8.43e-06
monthscustomer |   .0491996   .0102011     4.82   0.000     .0292058    .0691934
           age |  -.0135457   .0106361    -1.27   0.203    -.0343921    .0073007
               |
   loanpurpose |
             2 |   .2306476   .5752784     0.40   0.688    -.8968774    1.358172
             3 |   .0283228   .4071357     0.07   0.945    -.7696485    .8262941
             4 |   1.619859   1.280687     1.26   0.206    -.8902408    4.129959
             5 |   .7146552   .3995288     1.79   0.074    -.0684068    1.497717
             6 |  -.7673558   .7213625    -1.06   0.287      -2.1812    .6464887
             7 |   .3543893   1.514912     0.23   0.815    -2.614784    3.323563
             8 |  -.2122968   .3982476    -0.53   0.594    -.9928478    .5682542
             9 |  -1.228455   .5094138    -2.41   0.016    -2.226887   -.2300221
            10 |   .2539983   .9985979     0.25   0.799    -1.703218    2.211214
               |
      1.gender |   .1892455   .5261844     0.36   0.719     -.842057    1.220548
               |
 maritalstatus |
             2 |  -.5019915   .6130893    -0.82   0.413    -1.703625    .6996415
             3 |  -.7653826    .502026    -1.52   0.127    -1.749336    .2185704
               |
       housing |
             2 |   .5846078    .286382     2.04   0.041     .0233094    1.145906
             3 |   .5378179   .3707556     1.45   0.147    -.1888497    1.264486
               |
         _cons |  -.4400372   .5845148    -0.75   0.452    -1.585665    .7055908
--------------------------------------------------------------------------------
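
Because Model 2 is nested in the first model and both are fitted on the same 425 observations, the two log likelihoods allow a likelihood-ratio test of whether the dropped variables (checking, monthsemployed and job) matter jointly. The log itself does not run this test; the sketch below is an added check under that nesting assumption, with Python used as a calculator and `chi2_sf_df5` an illustrative helper based on the closed-form tail of a chi-squared distribution with 5 degrees of freedom.

```python
import math

# Log likelihoods and model degrees of freedom copied from the two logit outputs.
ll_model1, df_model1 = -257.54926, 22   # first model (all variables)
ll_model2, df_model2 = -259.65049, 17   # Model 2 (checking, monthsemployed, job dropped)

lr_stat = 2.0 * (ll_model1 - ll_model2)
lr_df = df_model1 - df_model2           # number of coefficients dropped


def chi2_sf_df5(x):
    """Upper-tail probability of a chi-squared variable with 5 degrees of freedom."""
    tail = math.erfc(math.sqrt(x / 2.0))
    tail += math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * (x ** 0.5 + x ** 1.5 / 3.0)
    return tail


p_value = chi2_sf_df5(lr_stat)
print(f"LR chi2({lr_df}) = {lr_stat:.2f}, Prob > chi2 = {p_value:.3f}")
```

A p-value this far above 5% gives no evidence that the five dropped coefficients are jointly different from zero, which is consistent with the move to the smaller model. Within Stata the same comparison would normally be made with `lrtest` after storing both sets of estimates.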

. * To check the predictive power of the estimated model, the stata post
estimation command is : "estat classification"

. estat classification

Logistic model for creditrisk

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       135            65  |        200
     -     |        76           149  |        225
-----------+--------------------------+-----------
   Total   |       211           214  |        425

Classified + if predicted Pr(D) >= .5
True D defined as creditrisk != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   63.98%
Specificity                     Pr( -|~D)   69.63%
Positive predictive value       Pr( D| +)   67.50%
Negative predictive value       Pr(~D| -)   66.22%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   30.37%
False - rate for true D         Pr( -| D)   36.02%
False + rate for classified +   Pr(~D| +)   32.50%
False - rate for classified -   Pr( D| -)   33.78%
--------------------------------------------------
Correctly classified                        66.82%
--------------------------------------------------
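
Every percentage in the classification report is a ratio of the four cell counts in the table above. The following sketch recomputes the main ones; it is a calculator-style check outside the Stata session, and the variable names are illustrative.

```python
# Cell counts copied from the "estat classification" table (cutoff Pr(D) >= .5).
true_pos, false_pos = 135, 65    # classified +: truly high risk, truly low risk
false_neg, true_neg = 76, 149    # classified -: truly high risk, truly low risk
n = true_pos + false_pos + false_neg + true_neg

sensitivity = true_pos / (true_pos + false_neg)   # Pr(+ | D)
specificity = true_neg / (true_neg + false_pos)   # Pr(- | ~D)
ppv = true_pos / (true_pos + false_pos)           # Pr(D | +)
npv = true_neg / (true_neg + false_neg)           # Pr(~D | -)
accuracy = (true_pos + true_neg) / n              # correctly classified

# Share obtained by assigning every customer to the larger class (low risk).
majority_share = max(true_pos + false_neg, false_pos + true_neg) / n

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("Positive predictive value", ppv),
                    ("Negative predictive value", npv),
                    ("Correctly classified", accuracy),
                    ("Majority-class share", majority_share)]:
    print(f"{name:<28}{100 * value:6.2f}%")
```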

. * With the estimated logit model, about 67% of customers are correctly
> classified as high or low risk (correctly classified: 66.82%), compared with
> 50.35% if every customer were assigned to the larger class (low risk). The model
> therefore has fairly good predictive power.

. * To report the odds ratios (exp(b)) for each independent variable, the stata
command is "logit depvar indvars, or". Standard errors and confidence intervals
are also transformed.
.
.
. logit creditrisk savings monthscustomer age i.loanpurpose i.gender
> i.maritalstatus i.housing, or

Iteration 0:   log likelihood = -294.57696
Iteration 1:   log likelihood = -259.73212
Iteration 2:   log likelihood = -259.65054
Iteration 3:   log likelihood = -259.65049
Iteration 4:   log likelihood = -259.65049

Logistic regression                             Number of obs     =        425
                                                LR chi2(17)       =      69.85
                                                Prob > chi2       =     0.0000
Log likelihood = -259.65049                     Pseudo R2         =     0.1186

--------------------------------------------------------------------------------
    creditrisk | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
       savings |   .9999477    .000031    -1.69   0.091     .9998871    1.000008
monthscustomer |    1.05043   .0107155     4.82   0.000     1.029636    1.071643
           age |   .9865457    .010493    -1.27   0.203     .9661926    1.007327
               |
   loanpurpose |
             2 |   1.259415   .7245144     0.40   0.688     .4078412    3.889079
             3 |   1.028728   .4188318     0.07   0.945     .4631758    2.284836
             4 |   5.052379   6.470514     1.26   0.206     .4105569    62.17537
             5 |   2.043482   .8164298     1.79   0.074     .9338805     4.47147
             6 |    .464239   .3348846    -1.06   0.287     .1129059    1.908826
             7 |    1.42531    2.15922     0.23   0.815     .0731836    27.75908
             8 |   .8087246   .3220727    -0.53   0.594       .37052    1.765183
             9 |   .2927446   .1491281    -2.41   0.016     .1078636    .7945161
            10 |    1.28917   1.287362     0.25   0.799     .1820967    9.126792
               |
      1.gender |   1.208338   .6358084     0.36   0.719     .4308234    3.389044
               |
 maritalstatus |
             2 |    .605324   .3711177    -0.82   0.413     .1820226    2.013031
             3 |   .4651559   .2335204    -1.52   0.127     .1738895    1.244297
               |
       housing |
             2 |   1.794287   .5138515     2.04   0.041     1.023583     3.14529
             3 |   1.712266   .6348324     1.45   0.147     .8279109     3.54127
               |
         _cons |   .6440125   .3764348    -0.75   0.452     .2048115    2.025043
--------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
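
Each row of the odds-ratio table is a transformation of the corresponding row of the coefficient table for Model 2: the odds ratio and the interval limits are exponentiated, and the standard error is rescaled by the delta method (odds ratio times the coefficient's standard error). A short check for the 2.housing row, with Python used as a calculator outside the Stata session and illustrative variable names:

```python
import math

# Coefficient-scale values for 2.housing, copied from the Model 2 output.
b, se = 0.5846078, 0.286382
ci_low, ci_high = 0.0233094, 1.145906

odds_ratio = math.exp(b)                          # exp(b)
or_se = odds_ratio * se                           # delta-method standard error
or_ci = (math.exp(ci_low), math.exp(ci_high))     # exponentiated interval limits

print(f"Odds ratio = {odds_ratio:.6f}")
print(f"Std. Err.  = {or_se:.7f}")
print(f"95% CI     = [{or_ci[0]:.6f}, {or_ci[1]:.5f}]")
```

The z statistic and p-value are unchanged because the test is still carried out on the coefficient scale.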

.
.
. * Notes: 1) for positive b, "the odds are exp(b) times larger" or "the odds
> increase by a factor of exp(b)". 2) For negative b, exp(b) is below 1, so "the
> odds are multiplied by exp(b)" or "the odds decrease by a factor of 1/exp(b)".
> 3) Odds ratios close to 1 indicate a small change (multiplying by 1.01 or 0.99
> does not change the odds much). 4) The odds of y=1 (high risk) change
> multiplicatively by exp(b) for a one-unit increase in x, holding all other
> variables constant.
. * Comments: the results in the table above show that 1) the odds of a customer
> being ranked high risk (creditrisk=1) increase by a factor of 1.05 for each
> one-unit (one-month) increase in the continuous variable monthscustomer. 2) The
> odds of a customer being ranked high risk (creditrisk=1) increase by a factor
> of 1.8 when the customer rents a house (housing=2) compared with when he owns
> his house (housing=1).

. * How to obtain coefficients that are easier to interpret: the Stata post
> estimation command "listcoef, percent" gives the percent change in the odds for
> a unit increase in x and the percent change in the odds for a standard
> deviation increase in x.

. listcoef, percent

logit (N=425): Percentage Change in Odds

Odds of: 1 vs 0

----------------------------------------------------------------------
  creditrisk |         b        z   P>|z|        %    %StdX      SDofX
-------------+--------------------------------------------------------
     savings |  -0.00005   -1.688   0.091     -0.0    -17.1  3597.2850
monthscust~r |   0.04920    4.823   0.000      5.0     82.9    12.2676
         age |  -0.01355   -1.274   0.203     -1.3    -13.9    11.0451
2.loanpurp~e |   0.23065    0.401   0.688     25.9      5.4     0.2265
3.loanpurp~e |   0.02832    0.070   0.945      2.9      1.1     0.4005
4.loanpurp~e |   1.61986    1.265   0.206    405.2     17.0     0.0967
5.loanpurp~e |   0.71466    1.789   0.074    104.3     36.0     0.4304
6.loanpurp~e |  -0.76736   -1.064   0.287    -53.6    -11.9     0.1658
7.loanpurp~e |   0.35439    0.234   0.815     42.5      2.5     0.0685
8.loanpurp~e |  -0.21230   -0.533   0.594    -19.1     -8.8     0.4318
9.loanpurp~e |  -1.22845   -2.412   0.016    -70.7    -30.2     0.2923
10.loanpur~e |   0.25400    0.254   0.799     28.9      3.0     0.1181
    1.gender |   0.18925    0.360   0.719     20.8      9.2     0.4661
2.maritals~s |  -0.50199   -0.819   0.413    -39.5    -13.1     0.2788
3.maritals~s |  -0.76538   -1.525   0.127    -53.5    -31.7     0.4983
   2.housing |   0.58461    2.041   0.041     79.4     25.8     0.3932
   3.housing |   0.53782    1.451   0.147     71.2     19.3     0.3281
----------------------------------------------------------------------

. * Table description: 1) b = raw coefficient 2) z = z-score for the test of b=0
> 3) P>|z| = p-value for the z-test 4) % = percent change in odds for a unit
> increase in X 5) %StdX = percent change in odds for an SD increase in X
> 6) SDofX = standard deviation of X.
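
The two percentage columns are both of the form 100 * (exp(b * d) - 1), with d = 1 for the "%" column and d equal to the standard deviation of X for the "%StdX" column. A brief check against the monthscustomer and 9.loanpurpose rows, using Python as a calculator outside the Stata session; `pct_change_in_odds` is an illustrative name, and the coefficients are taken from the Model 2 output.

```python
import math


def pct_change_in_odds(b, delta=1.0):
    """Percent change in the odds when x increases by delta units."""
    return 100.0 * (math.exp(b * delta) - 1.0)


b_months, sd_months = 0.0491996, 12.2676   # monthscustomer: coefficient and SDofX
b_used_car = -1.228455                     # 9.loanpurpose coefficient

pct_unit = pct_change_in_odds(b_months)             # "%" column
pct_sd = pct_change_in_odds(b_months, sd_months)    # "%StdX" column
pct_used_car = pct_change_in_odds(b_used_car)       # "%" column

print(f"monthscustomer, one more month: {pct_unit:.1f}%")
print(f"monthscustomer, one SD more:    {pct_sd:.1f}%")
print(f"9.loanpurpose vs reference:     {pct_used_car:.1f}%")
```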

. * Results: 1) the odds of being ranked high risk increase by 5% for each
> additional month of the variable monthscustomer, holding other variables
> constant. 2) The odds of being ranked high risk are 70.7% lower when the loan
> purpose is buying a used car (loanpurpose=9) than when the loan is for business
> (loanpurpose=1, the reference), holding other variables constant. 3) The odds
> of being ranked high risk are 79.4% higher for a customer renting a house
> (housing=2) than for a customer owning a house (housing=1, the reference),
> holding other variables constant.
***********************************************************************************
. *** Probability Prediction ***

. * To predict the probability that a customer is ranked high risk, the Stata
> command is "prvalue". Example: for a customer with the specific characteristics
> x(savings=5000 monthscustomer=28 age=30 loanpurpose=2 gender=1 maritalstatus=2
> housing=2), the first step is to estimate the model this way: "quietly logit
> creditrisk savings monthscustomer age loanpurpose gender maritalstatus housing".
> Preceding any Stata command with "quietly" stops Stata from displaying the
> results (we do not need them). Note that the categorical variables are entered
> here without the i. prefix, so this model treats them as continuous and is not
> identical to Model 2. The second step is to run the command "prvalue,
> x(savings=5000 monthscustomer=28 age=30 loanpurpose=2 gender=1 maritalstatus=2
> housing=2)".

. quietly logit creditrisk savings monthscustomer age loanpurpose gender
> maritalstatus housing

. * Stata does not display results...

. prvalue, x( savings=5000 monthscustomer=28 age=30 loanpurpose=2 gender=1
> maritalstatus=2 housing=2)

logit: Predictions for creditrisk

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.6512   [ 0.5224,    0.7800]
  Pr(y=0|x):          0.3488   [ 0.2200,    0.4776]

         savings  monthscust~r           age   loanpurpose        gender  maritalsta~s       housing
x=          5000            28            30             2             1             2             2

. * The predicted probability of being ranked high risk for a customer with these
> characteristics is 0.6512, with 95% CI [0.5224, 0.7800].
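
To show how a predicted probability is built from coefficients, the sketch below applies Pr(y=1|x) = exp(xb)/(1+exp(xb)) to the same customer using the Model 2 estimates printed earlier in the log. This is an assumption-laden illustration, not a reproduction of the prvalue result: the model fitted quietly for prvalue enters the categorical variables as plain numeric variables and its coefficients are not displayed, so the figure obtained here is not expected to equal 0.6512. Python is used only as a calculator, and the dictionary keys are illustrative labels.

```python
import math

# Model 2 coefficients copied from the log (categorical variables as indicators).
coef = {
    "_cons": -0.4400372,
    "savings": -0.0000523,
    "monthscustomer": 0.0491996,
    "age": -0.0135457,
    "2.loanpurpose": 0.2306476,
    "1.gender": 0.1892455,
    "2.maritalstatus": -0.5019915,
    "2.housing": 0.5846078,
}

# Customer profile from the example: continuous values, plus indicators set to 1
# for the categories loanpurpose=2, gender=1, maritalstatus=2 and housing=2.
profile = {
    "_cons": 1.0,
    "savings": 5000.0,
    "monthscustomer": 28.0,
    "age": 30.0,
    "2.loanpurpose": 1.0,
    "1.gender": 1.0,
    "2.maritalstatus": 1.0,
    "2.housing": 1.0,
}

# Linear index xb, then the inverse logit.
xb = sum(coef[name] * profile[name] for name in coef)
prob_high_risk = 1.0 / (1.0 + math.exp(-xb))

print(f"xb = {xb:.4f}")
print(f"Pr(creditrisk=1 | x) under Model 2 = {prob_high_risk:.4f}")
```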

. * To predict the probability of being ranked high risk for a customer at the
> mean of the set of independent variables, the Stata command is "prvalue,
> rest(mean)"

. prvalue, rest(mean)

logit: Predictions for creditrisk

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.4976   [ 0.4472,    0.5479]
  Pr(y=0|x):          0.5024   [ 0.4521,    0.5528]

         savings  monthscust~r           age   loanpurpose        gender  maritalsta~s       housing
x=     1812.5624     22.896471     34.397647          5.24     .68235294     2.1811765     1.4352941

. * The predicted probability of being ranked high risk is 0.4976, with 95% CI
> [0.4472, 0.5479].

.
***********************************************************************************

. log close

-----------------------------------------------------------------------------------
