Data Analysis - Using R
incorporated in FICO and have no independent value as predictors of interest rate.
Statistical Modeling
To understand and quantify the effect of FICO and other variables on Interest Rate, a linear model was used. Charts plotting residuals against variables other than FICO were created and studied. The chart for Loan Length is attached as Fig 2. Finally, a comprehensive model was fitted to see how much of the variation is explained by FICO plus the other variables. A summary of this final model was generated to look at the quantitative contribution of each of these variables. To understand potential confounders, the relationship between Interest Rate, FICO, Loan Length and Amount Requested was studied using a linear model.
Reproducibility
Even though it is optional, the author has pasted all the final R commands under the appendix. System specifications are also mentioned therein.
Results
The data offers a number of variables other than FICO which may influence interest rates. There is some overlap: FICO consists of payment history, current level of indebtedness, types of credit used, length of credit history, and new credit, which are represented in the current data by these variables: Open Credit Lines, Debt to Income Ratio, Loan Purpose, Monthly Income, Revolving Credit Balance and Inquiries in the Last 6 Months. So most of these variables will have a lesser impact on Interest Rate than FICO, and this is borne out by the analysis. The negative association between Interest Rate and FICO is non-random and highly significant (p < 2e-16). In fact, going by t values, none of the variables has an association stronger than FICO. So it can be said that, in the present dataset, FICO is the strongest predictor of Interest Rate.
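The modeling steps described under Statistical Modeling might look roughly like the following in R. This is only a sketch: the data frame is simulated, and the column names (fico, loan_length, amount, interest_rate) and the generating formula are assumptions made for illustration, not the commands from the appendix and not the loan dataset itself.

```r
# Synthetic stand-in data: simulated for illustration only.
# These are NOT the loan records analysed in this report.
set.seed(1)
n <- 500
loans <- data.frame(
  fico        = round(runif(n, 640, 830)),
  loan_length = factor(sample(c("36 months", "60 months"), n, replace = TRUE)),
  amount      = round(runif(n, 1000, 35000))
)

# Assumed generating process, chosen only so the sketch has a signal to find.
loans$interest_rate <- 70 - 0.085 * loans$fico +
  3 * (loans$loan_length == "60 months") +
  0.0001 * loans$amount + rnorm(n, sd = 2)

# Step 1: interest rate modelled on FICO alone.
fit_fico <- lm(interest_rate ~ fico, data = loans)

# Step 2: residuals from step 1 summarised against another variable
# (here the mean residual per loan length).
resid_by_length <- tapply(resid(fit_fico), loans$loan_length, mean)
print(resid_by_length)

# Step 3: a fuller model with FICO plus the other variables.
fit_full <- lm(interest_rate ~ fico + loan_length + amount, data = loans)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit_full))
print(round(coef_table, 4))

# Share of variation explained by each model.
r2_fico <- summary(fit_fico)$r.squared
r2_full <- summary(fit_full)$r.squared
print(c(fico_only = r2_fico, full = r2_full))
```

If the residuals from the FICO-only model differ systematically between loan lengths, that variable carries information FICO does not, which is the reasoning behind adding it to the fuller model.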
In descending order, the next four variables influencing interest rates are: duration of the loan (Loan Length), the amount requested (especially in the range of $19,400 to $35,000), Inquiries in the Last 6 Months and Open Credit Lines. Considering the t values, the first three variables appear to be strongly associated with interest rate, after FICO. Here are the relevant statistics:
Coefficients:
                                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                     7.250e+01   1.013e+00   71.584   < 2e-16
FICO                           -8.674e-02   1.280e-03  -67.743   < 2e-16
Loan Length (60 mo)             3.275e+00   1.127e-01   29.058   < 2e-16
Amount Requested [ 5450, 9050)  5.235e-01   1.354e-01    3.867  0.000113
Amount Requested [ 9050,12100)  8.311e-01   1.388e-01    5.986  2.47e-09
Amount Requested [12100,19400)  1.622e+00   1.442e-01   11.246   < 2e-16
Amount Requested [19400,35000]  3.269e+00   1.614e-01   20.247   < 2e-16
Inq. in the Last 6 Mo           3.291e-01   3.480e-02    9.457   < 2e-16
Open CREDIT Lines              -3.121e-02   1.066e-02   -2.929  0.003432
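As a quick consistency check on the statistics above, each t value should equal its estimate divided by its standard error, and ranking predictors by absolute t value gives the ordering discussed in the text. The sketch below uses the numbers from the coefficient table; the short row labels and the column names are illustrative choices, not identifiers from the original analysis.

```r
# Values transcribed from the coefficient table above.
# Row labels are shortened for illustration only.
coefs <- data.frame(
  term = c("(Intercept)", "FICO", "Loan Length 60 mo",
           "Amount [5450,9050)", "Amount [9050,12100)",
           "Amount [12100,19400)", "Amount [19400,35000]",
           "Inquiries last 6 mo", "Open Credit Lines"),
  estimate = c(7.250e+01, -8.674e-02, 3.275e+00, 5.235e-01, 8.311e-01,
               1.622e+00, 3.269e+00, 3.291e-01, -3.121e-02),
  std_error = c(1.013e+00, 1.280e-03, 1.127e-01, 1.354e-01, 1.388e-01,
                1.442e-01, 1.614e-01, 3.480e-02, 1.066e-02),
  t_reported = c(71.584, -67.743, 29.058, 3.867, 5.986,
                 11.246, 20.247, 9.457, -2.929),
  stringsAsFactors = FALSE
)

# t value = estimate / standard error.
coefs$t_calc <- coefs$estimate / coefs$std_error

# Rank the predictors (intercept excluded) by strength of association.
predictors <- coefs[coefs$term != "(Intercept)", ]
ranked <- predictors[order(-abs(predictors$t_calc)), c("term", "t_calc")]
print(ranked, row.names = FALSE)
```

Small differences between the recomputed and reported t values are expected, because the table shows estimates and standard errors rounded to four significant digits.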
But as discussed above, both Amount Requested and Inquiries in the Last 6 Months are confounders, i.e. strongly correlated with Interest Rate as well as FICO. So Loan Length appears to be the only predictor of interest rates other than FICO.
Criticism
While FICO may be a good predictor of interest rates, it may not forecast the actual behavior of the borrower after lending. It is becoming apparent that FICO has its failings and that it may be less and less accurate in forecasting a default. In 2001 there was an average 31-point difference in the FICO score between borrowers who had defaulted and those who paid on time. By 2006 the difference was only 10 points [vi]. A better way could be a proper analysis of a potential borrower's assets and employment. Another concern is that a desire to improve the FICO profile encourages scams and propels borrowers into an ever deepening downward spiral of indebtedness, for example by opening more debt lines or by increasing credit limits. The dangers of such trends to individuals and their impact on the economy were obvious during the recent economic recession in the USA (2007-08) and other leading economies of the world. Loans based on real assessment and meant for productive (entrepreneurial) purposes should be promoted in preference to consumptive loans.
Conclusions
In the given dataset, FICO explains the variation in Interest Rate to the best possible degree. The only other variable doing so (to a lesser extent) is Loan Length. Other variables are confounders, i.e. they have already been incorporated in FICO. But whether FICO can predict future default by the borrower has been questioned in the extant literature. This study would have benefited if information about defaults, if any, had also been provided in the dataset and related to the current variables.
Figures
Fig 1: A histogram showing the FICO distribution. It shows positive (right) skew: most scores are towards the lower end.
Fig 2: Loan length in a linear model of Interest Rate ~ FICO. Loan length has a positive association with interest rate, but not as strong as FICO.
Fig 3: Scatter plot showing the negative association between FICO and interest rates by loan length. The higher the FICO score, the lower the interest rate. This correlation is stronger for the 60-month loan length. In the case of 36 months, it holds true only between 5-15% interest rate and tapers off beyond it.
Fig 4: Scatter plot showing the positive association between interest rates and amount requested. It is strong between 15-25% interest rate for amounts between $10,000-30,000.
Endnotes:
i ii iii iv v vi