
Data Analysis - Using R

The peer-to-peer lending model removes much of the complexity of the formal banking sector and splits the savings between borrower and lender. The only challenge is developing a flexible interest rate based on objective criteria. Some agencies have worked out statistical models to assess the borrower's risk of default and base the interest rate on it. One such score, FICO, is a type of credit score that makes up a substantial portion of the credit report that lenders use to assess an applicant's credit risk and to decide whether to extend a loan and at what interest rate. FICO is an acronym for the Fair Isaac Corporation, the creators of the FICO score. Using mathematical models, the FICO score takes into account various factors in each of these five areas to determine credit risk: payment history, current level of indebtedness, types of credit used, length of credit history, and new credit. The higher the score, the better the borrower's chances of getting a loan on favorable terms, i.e. at a lower interest rate. This study explored whether any other variable had a better association with interest rate than FICO had. For example, if two people have the same FICO score, can other variables explain a difference in interest rate between them?
Copyright
© Attribution Non-Commercial (BY-NC)

Assignment 1

(from Satyendra Srivastava, Student)

Risk Assessment in Peer to Peer lending


Introduction

Access to credit and investment in such services is an important aspect of many societies. In Asia, many such peer-to-peer models have been set up in recent times to help small entrepreneurs in the social sector [i, ii]. This model removes much of the complexity of the formal banking sector and splits the savings between borrower and lender. The only challenge is developing a flexible interest rate based on objective criteria. Some agencies have worked out a statistical model to assess the risk of default by the borrower and base the interest rate on it. One such score, FICO, is a type of credit score that makes up a substantial portion of the credit report that lenders use to assess an applicant's credit risk and to decide whether to extend a loan and at what interest rate. FICO is an acronym for the Fair Isaac Corporation, the creators of the FICO score [iii]. Using mathematical models, the FICO score takes into account various factors in each of these five areas to determine credit risk: payment history, current level of indebtedness, types of credit used, length of credit history, and new credit. The higher the score, the better the borrower's chances of getting a loan on favorable terms, i.e. at a lower interest rate [iv]. This study explored whether any other variable had a better association with interest rate than FICO had. For example, if two people have the same FICO score, can other variables explain a difference in interest rate between them?

Methods

Data Collection

The data for this study consist of a sample of 2,500 peer-to-peer loans issued through the Lending Club [v], provided on Coursera as part of assignment one. A csv file, an rda file and the codebook were kindly made available for download.

Exploratory Analysis

The author explored the data (LoansData.csv): its dimensions, names (variables), types of variables and factors if any, missing values, and extreme / improbable values, using R commands like dim, is.na, boxplot, names, summary etc.
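The exploratory commands named above can be sketched in R as follows. This is only an illustration under stated assumptions: the small data frame is a made-up stand-in for LoansData.csv, and its column names are guesses modelled on the variables mentioned in this report, not the actual file layout.

```r
# Made-up stand-in for LoansData.csv (values and column names are assumptions).
# With the real file one would start from: loans <- read.csv("LoansData.csv")
loans <- data.frame(
  Amount.Requested           = c(20000, 19200, 35000, 10000, 12000, 6000),
  Amount.Funded.By.Investors = c(20000, 19200, 35000, 9975, 12000, -0.01),
  Interest.Rate              = c("8.90%", "12.12%", "21.98%",
                                 "9.99%", "11.71%", "15.31%"),
  Loan.Length                = c("36 months", "36 months", "60 months",
                                 "36 months", "36 months", "36 months"),
  FICO.Range                 = c("735-739", "715-719", "690-694",
                                 "695-699", "695-699", "670-674"),
  Open.CREDIT.Lines          = c(14, 12, 14, 10, 11, NA),
  stringsAsFactors = FALSE
)

dim(loans)            # number of observations and number of variables
names(loans)          # variable names
sapply(loans, class)  # type of each variable
summary(loans)        # per-variable summaries, including NA counts

# Missing values: total and per column
total_na  <- sum(is.na(loans))
na_by_col <- colSums(is.na(loans))

# Improbable values: a negative funded amount stands out in the minimum
min_funded <- min(loans$Amount.Funded.By.Investors)

# boxplot() flags extreme values; plot = FALSE returns the statistics
# without drawing (drop that argument to see the chart itself)
bp <- boxplot(loans$Amount.Requested, plot = FALSE)
bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
```

On the real data, the same calls would report the dimensions, the NA count and the suspicious minimum that are described in the next paragraph.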
The data had 2500 observations against 14 variables. There was a fifteenth un-named variable, some kind of Loan ID, seen in a spreadsheet program; it had no relevance to this study and hence was ignored. There were only 7 NA in the entire dataset, which were left untouched. There was one unlikely value: a minimum Amount Funded By Investors of -0.01. Some of the variables needed conversion / transformation for further analysis. This was done to remove % from interest rates, 'month' from 'Loan Length', and the hyphen from the FICO range, preparatory to conversion to numeric variables using the 'as.numeric' function in the next step. FICO was converted to a number by removing the hyphen and dividing the resulting 6 digit figure by 1000. FICO ranged from 640 to 834, with the median at 700 and a noticeable positive (right) skew, i.e. most scores lie towards the lower end (Fig 1). To explore the association between Interest Rate and FICO, these two variables were plotted. A strong negative association was found, as expected (Fig 3, by loan length). A negative association was also found between number of open credit lines and Interest rate. A positive association was found between Interest rates and amount requested, duration of loan and Inquiries made in last 6 months.

Confounders in this analysis were amount requested and Inquiries in last 6 months. Both are correlated to FICO as well as Interest rate. The positive association of Amount requested with both FICO and Interest Rate is highly significant (p = 0.00024 and p < 2e-16, respectively). In the case of Inquiries in last 6 months, the corresponding p-values are 3.95e-06 and < 2e-16. Therefore, it will be safe to assume that these two variables are already incorporated in FICO and have no independent value as predictors of interest rate.

Statistical Modeling

To understand and quantify the effect of FICO and other variables on Interest rate, a linear model was used. Charts plotting residuals against variables other than FICO were created and studied. The chart for Loan length is attached as Fig 2. Finally, an all-comprehensive model was fitted to see how much of the variation is explained by FICO plus other variables. A summary of this final model was generated, to look at the quantitative contribution of each of these variables. To understand potential confounders, the relationship between Interest Rate, FICO, Loan Length and Amount requested was studied using a linear model.

Reproducibility

Even though it is optional, the author has pasted all the final R commands under appendix. System specifications are also mentioned therein.

Results

The data offers a number of variables other than FICO which may influence Interest rates. There is some overlap: FICO consists of payment history, current level of indebtedness, types of credit used, length of credit history, and new credit, which are represented in the current data by these variables: Open Credit Lines, Debt to Income Ratio, Loan Purpose, Monthly Income, Revolving Credit Balance and Inquiries in the last 6 Months. So, most of these variables will have lesser impact on Interest Rate, as compared to FICO, and this is borne out by the analysis. The negative association between Interest Rate and FICO is non-random and highly significant (p < 2e-16). In fact, going by t values, none of the variables has an association stronger than FICO. So, it can be said that the present dataset shows that FICO is the strongest predictor of Interest rate.
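The conversions described under Exploratory Analysis and the residual check described under Statistical Modeling might look roughly like this in R. It is a minimal sketch, not the author's appendix code: the data are synthetic, and every column and object name (rate, length, fico, fit_fico and so on) is an assumption made for illustration.

```r
# Synthetic stand-in data (NOT the Lending Club sample); all names are assumed.
set.seed(1)
n        <- 200
fico_low <- sample(seq(640, 800, by = 5), n, replace = TRUE)
len      <- sample(c(36, 60), n, replace = TRUE, prob = c(0.8, 0.2))
true_rate <- 73 - 0.087 * fico_low + 3.3 * (len == 60) + rnorm(n, sd = 1)
loans <- data.frame(
  Interest.Rate = paste0(sprintf("%.2f", true_rate), "%"),
  Loan.Length   = paste(len, "months"),
  FICO.Range    = paste(fico_low, fico_low + 4, sep = "-"),
  stringsAsFactors = FALSE
)

# Strip the non-numeric characters, then convert with as.numeric
loans$rate   <- as.numeric(sub("%", "", loans$Interest.Rate, fixed = TRUE))
loans$length <- as.numeric(sub(" months", "", loans$Loan.Length, fixed = TRUE))
# "735-739" -> "735739" -> 735.739, following the description in the text
loans$fico   <- as.numeric(sub("-", "", loans$FICO.Range, fixed = TRUE)) / 1000

# Linear model of interest rate on FICO alone
fit_fico <- lm(rate ~ fico, data = loans)

# Residuals compared across another variable (here loan length): a clear
# difference between groups suggests that variable explains variation
# which FICO alone leaves unexplained.
res_by_len <- tapply(resid(fit_fico), loans$length, mean)
res_by_len

# Model with both predictors
fit_both <- lm(rate ~ fico + factor(length), data = loans)
summary(fit_both)$coefficients
```

A chart of `resid(fit_fico)` against loan length, in the spirit of Fig 2, conveys the same comparison visually.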
In a descending scale, one can see that the next four variables influencing Interest rates are: duration of the loan (Loan Length), the amount requested (especially in the range of $19,400 to $35,000), Inquiries in the last 6 months and open credit lines. Considering the t values, the first three variables appear to be strongly associated with interest rate, after FICO. Here are the relevant statistics:
Coefficients:
                                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                       7.250e+01   1.013e+00   71.584   < 2e-16
FICO                             -8.674e-05   1.280e-06  -67.743   < 2e-16
Loan Length (60 mo)               3.274e+00   1.127e-01   29.058   < 2e-16
Amount Requested [ 5450, 9050)    5.235e-01   1.354e-01    3.867  0.000113
Amount Requested [ 9050,12100)    8.311e-01   1.388e-01    5.986  2.47e-09
Amount Requested [12100,19400)    1.622e+00   1.442e-01   11.246   < 2e-16
Amount Requested [$19400,35000]   3.269e+00   1.614e-01   20.247   < 2e-16
Inq. in the Last 6 Mo             3.291e-01   3.480e-02    9.457   < 2e-16
Open CREDIT Lines                -3.121e-02   1.066e-02   -2.929  0.003432
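A coefficient table of this shape can be produced along the following lines. This is a hedged sketch on synthetic data: the variable names are invented, and binning the amount requested at its quintiles with base R `cut` is an assumption about how the amount groups were formed, not something the report states.

```r
# Synthetic stand-in data (NOT the Lending Club sample); all names are assumed.
set.seed(2)
n <- 500
loans <- data.frame(
  fico       = sample(seq(640, 830, by = 5), n, replace = TRUE),
  length     = factor(sample(c(36, 60), n, replace = TRUE, prob = c(0.8, 0.2))),
  amount     = round(runif(n, 1000, 35000), -2),
  inquiries  = rpois(n, 1),
  open_lines = rpois(n, 10)
)
loans$rate <- 72 - 0.087 * loans$fico + 3.3 * (loans$length == "60") +
  0.00015 * loans$amount + 0.33 * loans$inquiries -
  0.03 * loans$open_lines + rnorm(n, sd = 2)

# Bin the amount requested into five groups at its quintiles (assumption)
breaks <- quantile(loans$amount, probs = seq(0, 1, by = 0.2))
loans$amount_grp <- cut(loans$amount, breaks = breaks,
                        include.lowest = TRUE, right = FALSE, dig.lab = 5)

# All-comprehensive linear model and its coefficient table
fit_all <- lm(rate ~ fico + length + amount_grp + inquiries + open_lines,
              data = loans)
coefs <- summary(fit_all)$coefficients
print(coefs)

# Each t value is simply the estimate divided by its standard error
t_check <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# A candidate confounder can be screened by testing its correlation with
# both the outcome and the main predictor
p_amount_fico <- cor.test(loans$amount, loans$fico)$p.value
p_amount_rate <- cor.test(loans$amount, loans$rate)$p.value
```

The estimate-over-standard-error relation is also a quick consistency check on the table above: for Loan Length, 3.274 / 0.1127 is about 29.05, in line with the reported t value of 29.058.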

But as discussed above, both amount requested and Inquiries in last 6 months are confounders, i.e. strongly correlated with Interest rate as well as FICO. So, loan length appears to be the only predictor of interest rates, other than FICO.

Criticism

While FICO may be a good predictor of Interest rates, it may not forecast actual behavior of the borrower after lending. It is becoming apparent that FICO has its failings and it may be less and less accurate in forecasting a default. In 2001 there was an average 31-point difference in the FICO score between borrowers who had defaulted and those who paid on time. By 2006 the difference was only 10 points [vi]. A better way could be a proper analysis of a potential borrower's assets and employment. Another concern is that a desire to improve the FICO profile encourages scams and propels borrowers into an ever deepening downward spiral of indebtedness, for example by opening more debt lines or by increasing credit limits etc. The dangers of such trends to individuals and their impact on the economy were obvious during the recent economic recession in the USA (2007-08) and other leading economies of the world. Loans based on real assessment and meant for productive (entrepreneurial) purposes should be promoted in preference to consumptive loans.

Conclusions

In the given dataset, FICO explains the variation in Interest rate to the best possible degree. The only other variable doing so (to a lesser extent) is Length of loan. Other variables are confounders, i.e. they have already been incorporated in FICO. But whether FICO can predict future default by the borrower has been questioned in extant literature. This study would have benefited if information about defaults, if any, were also provided in the dataset and were related to current variables.

Figures

Fig 1: A histogram showing FICO distribution. It shows positive (right) skew: most scores are towards the lower end.

Fig 2: Loan length in a linear model of Interest Rate ~ FICO. Loan length has a positive association with interest rate, but not as strong as FICO.

Fig 3: Scatter plot showing negative association between FICO and Interest rates by Loan length. The higher the FICO score, the lower the interest rate. This correlation is stronger for 60 month loan length. In case of 36 months, it holds true only between 5-15% interest rate and tapers off beyond it.

Fig 4: Scatter plot showing positive association between Interest rates and Amount requested. It is strong between 15-25% interest rate for amount between $10,000-30,000.

Endnotes:

i. http://www.ft.com/cms/s/0/c1e2872c-072d-11e1-b874-00144feabdc0.html
ii. http://i-lend.in/
iii. http://www.investopedia.com/terms/f/ficoscore.asp
iv. Ibid.
v. https://www.lendingclub.com/home.action
vi. http://en.wikipedia.org/wiki/Credit_score_in_the_United_States
