Part 3: Comparing the Information Gain of Alternative Data and Models
Both the Eggertopia Scores and your binary classification model can be thought of
as tools to reduce uncertainty about future default outcomes of credit card
applicants.
Your own model, developed in an earlier part of this project, identifies dependencies between, on the one hand, the six types of input data collected by the bank and, on the other hand, the binary outcome default/no default.
Recall that the entropy of the original base rate, minus the conditional entropy of default given your test classification, equals the Mutual Information between default and the test.
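As a minimal sketch of this identity, the Python below computes I(D;T) = H(D) - H(D|T) from a table of Test Set counts, together with the Percentage Information Gain, assuming P.I.G. is defined as I(D;T) / H(D) x 100. It assumes rows are your model's predicted class and columns are the actual outcome (no default, default), and that both outcomes occur so H(D) > 0. The function names are illustrative, not part of the assignment.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits of the distribution implied by a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(table):
    """Return (mutual information in bits/event, P.I.G. in percent).

    table[t][d] = number of test-set applicants whose test classification is t
    and whose actual outcome is d (column 0 = no default, column 1 = default).
    """
    n = sum(sum(row) for row in table)
    outcome_totals = [sum(col) for col in zip(*table)]
    h_base = entropy_bits(outcome_totals)              # H(D): entropy of the base rate
    h_cond = sum((sum(row) / n) * entropy_bits(row)     # H(D|T): class-weighted entropy
                 for row in table if sum(row) > 0)
    mutual_info = h_base - h_cond                       # I(D;T) = H(D) - H(D|T)
    return mutual_info, 100.0 * mutual_info / h_base
```

With default treated as the positive class and predicted class in rows, the table is `[[TN, FN], [FP, TP]]`.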
Question: On your Test Set results, what is the Mutual Information, or Information Gain, in average bits per event?
Question: On your Test Set results, what is the Percentage Information Gain (P.I.G.) of your model?
Since you have, for your model on the Test Set, a savings-per-event and a bits-per-event (Mutual Information), you can calculate a savings-per-bit. This is a powerful concept, because it places a financial value directly on the information content of a model (or of an additional data source, like the Eggertopia scores).
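The arithmetic is a single ratio of dollars per event to bits per event. A hedged Python sketch follows; the function name and the numbers in the example call are hypothetical and are not results from the assignment.

```python
def savings_per_bit(savings_per_event, bits_per_event):
    """Dollars saved per bit of information: ($ per event) / (bits per event)."""
    if bits_per_event <= 0:
        raise ValueError("bits_per_event must be positive to price information")
    return savings_per_event / bits_per_event

# Illustrative numbers only: $12 saved per applicant, 0.25 bits per applicant.
print(savings_per_bit(12.0, 0.25))  # prints 48.0
```

The same ratio applies when Eggertopia scores replace your model: divide their savings per event by their bits per event.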
Question: How many dollars does the bank save for every bit of information gain achieved by your model?
For questions in this section, assume your model and the data it uses are not available: the bank's choice is between Eggertopia scores and the base rate.
In other words, on the Test Set, what is the information gain, in average bits per event, over the base rate of (./5, .<5) offered by the Eggertopia Scores?
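A sketch of the same calculation starting from raw scores, assuming you have one Eggertopia score and one default flag per Test Set applicant, and assuming you group the scores into bins with cutoffs of your choosing (a single cutoff turns the score into a yes/no classifier). The gain over the base rate is H(D) - H(D | score bin); the function name and binning scheme are illustrative assumptions, and at least one default and one non-default must be present.

```python
import math
from collections import Counter

def entropy_bits(counts):
    """Shannon entropy in bits of the distribution implied by a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def score_gain_over_base_rate(scores, defaults, cutoffs):
    """Return (bits/event gained over the base rate, P.I.G. in percent).

    scores   -- one (hypothetical) Eggertopia score per test-set applicant
    defaults -- 1 if that applicant defaulted, else 0
    cutoffs  -- ascending bin edges; bin index = number of cutoffs <= score
    """
    n = len(scores)
    bins = [sum(score >= c for c in cutoffs) for score in scores]
    joint = Counter(zip(bins, defaults))                   # (bin, outcome) -> count
    h_base = entropy_bits(list(Counter(defaults).values()))
    h_cond = 0.0
    for b in set(bins):
        row = [joint[(b, 0)], joint[(b, 1)]]
        h_cond += (sum(row) / n) * entropy_bits(row)       # weight by bin frequency
    gain = h_base - h_cond
    return gain, 100.0 * gain / h_base
```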
On the Test Set, what is the Eggertopia scores' Percentage Information Gain (PIG)?
If Eggertopia data were free, and your model were unavailable, what would the dollar savings per bit of information extracted be?
Dollar savings are >"/, rounded to the nearest dollar, from quiz /, question
(For this section, assume your model and the data it uses are available.)
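Under this assumption, the three questions that follow can be sketched in Python. The incremental gain is H(D|M) - H(D|M,E), where M is your model's predicted class and E is a binned Eggertopia score; this equals I(D; M,E) - I(D; M). The break-even price assumes one score is bought per applicant and that you already have total dollar savings with and without the scores on the same applicants. All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy_bits(keys, outcomes):
    """H(outcome | key) in bits, using sample frequencies."""
    n = len(outcomes)
    groups = defaultdict(Counter)
    for key, y in zip(keys, outcomes):
        groups[key][y] += 1
    h = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        h -= sum((c / n) * math.log2(c / m) for c in counts.values())
    return h

def incremental_gain_bits(model_class, score_bin, defaults):
    """Extra bits/event from adding score bins to the model: H(D|M) - H(D|M,E)."""
    h_model = conditional_entropy_bits(model_class, defaults)
    h_both = conditional_entropy_bits(list(zip(model_class, score_bin)), defaults)
    return h_model - h_both

def break_even_price(savings_with_scores, savings_model_only, n_applicants):
    """Highest price per score at which buying scores does not reduce net savings.

    Assumes one score per applicant and both savings totals measured on the
    same n_applicants.
    """
    return max(0.0, (savings_with_scores - savings_model_only) / n_applicants)

def value_per_incremental_bit(price_per_score, extra_bits_per_event):
    """$/bit implied by paying price_per_score for extra_bits_per_event of information."""
    if extra_bits_per_event <= 0:
        raise ValueError("the scores add no information, so $/bit is undefined")
    return price_per_score / extra_bits_per_event
```

If the scores add no information beyond your model, the incremental gain is zero and the value per bit is undefined rather than infinite, which is why the last function raises an error in that case.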
Question: What is the incremental information gain of the Eggertopia scores over your own model, in average bits per event, if any?
What is the maximum (break-even) price the bank should pay for Eggertopia scores, per score, if your own model and its data are already available?
At the above maximum (break-even) price per score, what would be the value per bit of incremental information gained from the Eggertopia scores? Give your answer in $/bit.