Part 3: Comparing the Information Gain of Alternative Data and Models

Your model and Eggertopia Scores both aim to reduce uncertainty about credit card applicants' likelihood of default. Your model identifies relationships between input data and default risk, while Eggertopia Scores also use data to predict default. Both can be evaluated on a test set to calculate their information gain over the base default rate, with your model achieving a higher percentage information gain and saving the bank more per bit of information extracted. The incremental information gain from adding Eggertopia Scores to what your model already provides is negligible.

Uploaded by

mayank
Comparing the Information Gain of Eggertopia Scores and Your Model

Both the Eggertopia Scores and your binary classification model can be thought of
as tools to reduce uncertainty about future default outcomes of credit card
applicants.

Your own model, deeloped in !art ", identifies dependencies between, on the one
hand, the si# types on input data collected by the ban$, and on the other hand, the
binary outcome default%no default.

If we assume that the dependencies identified by Eggertopia Scores and by your
model on the Test Set are stable and representative of all future data (a big
assumption), we can draw some further conclusions about how much information gain,
or reduction in uncertainty, is provided by each.

)efinitions are gien in the Information Gain Calculator Spreadsheet, proided


below.

Information Gain Calculator.#ls#


Question: On your model's Test Set results, what is the conditional entropy of
default, given your test classifications?

Hint: you need your model's true positive rate from Part 1, Question 12, and "test
incidence" [proportion of events your model classifies as default] from Part 1,
Question 13. Use the condition incidence of 25% and your model's True Positive rate
to calculate the portion of TPs. Then you have the inputs needed to use the
Information Gain Calculator Spreadsheet.

Recall that the entropy of the original base rate, minus the conditional entropy of
default given your test classification, equals the Mutual Information between
default and the test.

I(X;Y) = H(X) - H(X|Y).
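
As a rough illustration of how the hint's three inputs lead to H(X|Y) and then to the Mutual Information, here is a minimal Python sketch. It is not the Information Gain Calculator Spreadsheet itself; the function names are made up for this example, and the true positive rate (0.60) and test incidence (0.30) are hypothetical placeholders to be replaced by your own Part 1 answers. Only the 25% condition incidence comes from the text.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def confusion_cells(condition_incidence, true_positive_rate, test_incidence):
    """Return (TP, FN, FP, TN) as proportions of all events."""
    tp = condition_incidence * true_positive_rate  # defaulters classified as default
    fn = condition_incidence - tp                  # defaulters classified as no default
    fp = test_incidence - tp                       # non-defaulters classified as default
    tn = 1.0 - tp - fn - fp                        # non-defaulters classified as no default
    return tp, fn, fp, tn


def conditional_entropy(tp, fn, fp, tn):
    """H(X|Y): entropy of default (X) given the test classification (Y)."""
    h = 0.0
    # One pass per test outcome: classified default, then classified no default.
    for default_cell, no_default_cell in ((tp, fp), (fn, tn)):
        p_y = default_cell + no_default_cell
        if p_y > 0:
            h += p_y * entropy([default_cell / p_y, no_default_cell / p_y])
    return h


# Hypothetical inputs: TP rate 0.60 and test incidence 0.30 are placeholders.
cells = confusion_cells(0.25, 0.60, 0.30)
h_x = entropy([0.25, 0.75])
h_x_given_y = conditional_entropy(*cells)
mutual_information = h_x - h_x_given_y  # I(X;Y) = H(X) - H(X|Y)

print(f"H(X|Y) = {h_x_given_y:.4f} bits, I(X;Y) = {mutual_information:.4f} bits")
```

With the placeholder inputs the four cells come out to roughly 0.15, 0.10, 0.15 and 0.60; your own model's cells will differ.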

The population of potential credit card customers consists of 25% future
defaulters. The base rate incidence of default (.25, .75) has an uncertainty, or
entropy, of H(.25, .75) = .25*log4 + .75*log1.333 = .8113 bits.
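
The base rate entropy can be checked in a few lines of Python. This is only a sketch; the helper name `entropy` is chosen for this example, and logarithms are taken base 2 so the result is in bits.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Base rate: 25% future defaulters, 75% non-defaulters.
base_rate_entropy = entropy([0.25, 0.75])
print(round(base_rate_entropy, 4))  # 0.8113
```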

*uestion+ n your test set results, what is the Mutual Information, or information
Gain, in aerage bits per eent-

" r

Recall that Percentage Information Gain (P.I.G.) is the ratio of I(X;Y)/H(X).
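
The ratio can also be computed directly from a table of joint proportions. The sketch below is one possible way to do that, using the standard definition of mutual information as a sum over the joint table; the function names and the example proportions are hypothetical and are not taken from the course spreadsheet or from anyone's quiz answers.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X;Y) in bits, where joint[i][j] = P(X = i, Y = j)."""
    p_x = [sum(row) for row in joint]        # marginal of the outcome X
    p_y = [sum(col) for col in zip(*joint)]  # marginal of the test Y
    return sum(
        p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
        for i, row in enumerate(joint)
        for j, p_xy in enumerate(row)
        if p_xy > 0
    )


def percentage_information_gain(joint):
    """P.I.G. = I(X;Y) / H(X), returned as a fraction between 0 and 1."""
    p_x = [sum(row) for row in joint]
    return mutual_information(joint) / entropy(p_x)


# Hypothetical joint proportions.
# Rows: default, no default. Columns: classified default, classified no default.
joint = [[0.15, 0.10],
         [0.15, 0.60]]

print(f"I(X;Y) = {mutual_information(joint):.4f} bits")
print(f"P.I.G. = {percentage_information_gain(joint):.2%}")
```

A test that carries no information about default (rows proportional to each other) gives a P.I.G. of zero, while a perfect test gives 100%.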

Question: On your Test Set results, what is the Percentage Information Gain
(P.I.G.) of your model?

Since you hae, for you model on the &est Set, a saingsAperAeent, and a bitsAperA
eent 'Mutual Information( you can calculate a saingsAperAbit. &his is a powerful
concept, because it places a financial alue directly on the information content of
a model 'or additional data source, li$e the Eggertopia scores(.
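
In other words, savings-per-bit is savings-per-event divided by bits-per-event. A minimal sketch follows; the function name and the two input values are hypothetical placeholders, not figures from the quizzes.

```python
def savings_per_bit(savings_per_event, bits_per_event):
    """Dollar value of one bit of information gain.

    savings_per_event: average dollars saved per applicant versus the base rate.
    bits_per_event: Mutual Information I(X;Y), in average bits per applicant.
    """
    if bits_per_event <= 0:
        raise ValueError("bits_per_event must be positive to value a bit")
    return savings_per_event / bits_per_event


# Hypothetical placeholders: $10.00 saved per event, 0.1 bits gained per event.
print(savings_per_bit(10.0, 0.1))
```

The guard against a non-positive information gain is a design choice for this sketch: a source that adds no information has no meaningful per-bit value.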

*uestion+ ow many dollars does the ban$ sae, for eery bit of information gain
achieed by your model-

" r

Information Gain of Eggertopia Scores oer the Base 7ate

For questions in this section, assume your model and the data it uses are not
available; the bank's choice is between Eggertopia scores and the base rate.

*uestion+ hat is the Mutual Information of the Eggertopia Scores-

In other words, on the &est Set, hat is the information gain, in aerage bits per
eent, oer the base rate of './5, .<5( offered by the Eggertopia Scores-

."3D5 bits per eent #

On the test set, what is the Eggertopia scores' Percentage Information Gain (PIG)?

"5./56 #

If Eggertopia data were free, and your model was unavailable, what would the dollar
savings per bit of information extracted be?

)ollar saings are >"/ rounded to the nearest dollarA from 2uiF /, 2uestion 

Value would be 427 per bit. x

Incremental Information Gain of Eggertopia Scores Compared to Your Model and
Available Data (any answer scores)

(For this section, assume your Model and the Data it uses are available).

*uestion+ hat is the incremental information gain of the Eggertopia scores, oer
your model from !art ", in aerage bits per eent, if any-
" r

What is the maximum (break-even) price the bank should pay for Eggertopia scores,
per score, if your model from Part 1 and data are already available?

t the aboe ma#imum 'brea$Aeen( price per score, what would be the alue per bit
of incremental information gained from the Eggertopia scores- Gie your answer in
%bit.

" r
