Transactions of the Society of Actuaries, 1995, Vol. 47
1. INTRODUCTION
This paper addresses certain pervasive problems in using secondary data
in actuarial research. Those problems include the following situations:
• The data are summarized in a histogram or tabular (grouped data) format,
perhaps with additional mean or median information (for example, pub-
lished medical research, demographic data, and so on), which must be
incorporated into actuarial analysis.
• Two published sources yield histogram or tabular summaries of the same variable, but the two sources do not group the values of the variable the same way (for example, mortality rates grouped into age intervals can be distinctly different in different medical studies).
• The researcher wishes to answer a question by using information from several distinctly grouped data streams, and the original, detailed data underlying the published summary (which might give a better answer to the question) are unavailable.

*Dr. Brockett, not a member of the Society, is Director of the Center for Cybernetic Studies at the University of Texas at Austin.
†Dr. Golany, not a member of the Society, is Associate Professor and Associate Dean, Faculty of Industrial Engineering and Management at the Technion-Israel Institute of Technology, Technion City, Haifa, Israel.
‡Dr. Phillips, not a member of the Society, is Director, Business School, Oregon Graduate Institute of Science and Technology, Portland, Oregon.
§Dr. Song, not a member of the Society, is on the actuarial staff of the National Actuarial Services Group, Ernst & Young, LLP, New York City.
Reconciling and matching information from two or more sources is a
common analytic problem faced by practicing actuaries. Data reported by
magazines, medical journals, or government publications are often given in
grouped histogram form. Because these information sources operate independently of one another, their reports usually have incompatibly grouped data. The summary presentation of such information frequently is accompanied by values of some of its moments or of its conditional moments within certain subintervals. This data-matching problem is a specific case of the more general question, "How can we make statistical inferences from secondary data and incorporate this information into our actuarial analysis?" In
this paper we present a method (a maximum-entropy procedure) that is based
on the concepts of statistical information theory and that shows how to use
all the information available (and no other) to answer such questions.
Applications with real data often involve conflicting or missing data ele-
ments. A publication may provide a histogram along with its overall mean
and one conditional mean (that is, the mean of some subinterval), in which
the latter two do not agree because of typographical error or because they
are a summarization of two different studies.
Situations can also arise in which the data given are insufficient even to
apply information theoretic techniques, but a uniform treatment is still
needed. Accordingly, the rigorous statistical procedures detailed in Brockett
[3] must be supplemented with some heuristics to handle these cases. These
heuristic procedures also are discussed.
In this paper we present a procedure for generating maximum-entropy
density estimates from data in histogram form with the possibility that ad-
ditional means and medians may be known. With the computing power now
available, completely rigorous maximum-entropy estimates can be obtained
for nearly any consistent "information scenario" (combinations of infor-
mation about moments and conditional moments of the density function that
are consistent with at least one probability distribution). This paper provides
illustrations of this.
While in general the histograms analyzed are analogous to probability
densities, the procedure can also be used in some cases for more general measures.
2. DEFINITION
The early literature on statistical information theory was developed by
Kullback and Leibler [16] following the work of Khinchine [14] and grew
out of the engineering literature on communication theory. A complete in-
troduction and description as well as applications of information theory to
problems in actuarial science can be found in Brockett [3]. To summarize,
in information theoretic notation, the expected information for distinguishing
between two measures, p and q, is denoted by I(p|q). This expected information is mathematically quantified by the expected log-odds ratio; that is,

    I(p|q) = Σ_i p_i ln(p_i / q_i),    (2.1)

where p and q are discrete with masses p_i and q_i for each i. Extensive
discussion of the information functional (2.1) and its role as a unifying
concept for statistics can be extracted from Kullback [15]. Brockett [3] also
places the functional in perspective for actuarial science.
By applying Jensen's inequality to the function h(x) = x ln x, I(p|q) ≥ 0,
with I(p|q) = 0 if and only if p = q. As a consequence, the quantity I(p|q)
can be thought of as the (pseudo-) distance or "closeness measure" between
p and q within the space of all measures having equal total mass. In our
case, we want to choose that measure p that is "as close as possible" to
some given measure q and that satisfies certain additional knowledge we
have about p. The measure q is the benchmark, or beginning measure, and
p is the measure we want to obtain. The additional information about p is
written in the form of constraints that p must satisfy. Accordingly, our prob-
lem becomes one of minimizing I(p|q) over all possible p, subject to the
given constraints on p. The solution p* to minimizing (2.1) subject to con-
straints is referred to as the minimum discrimination information (MDI)
estimate.
In many applications, however, there is no such a priori, benchmark, or
starting-point measure q from which to derive p. In this case, we express
our ignorance about q by choosing all values of q to be equally likely; that
is, q_i = 1 for all i in the discrete measure case or q(x) = 1 for all x in the
continuous density case. Accordingly, our objective function (in the discrete
case) is of the form

    Minimize Σ_i p_i ln p_i,

or equivalently

    Maximize −Σ_i p_i ln p_i.    (2.2)
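To make this concrete, the following small sketch (ours, not part of the original paper) computes a discrete MDI/maximum-entropy estimate numerically; the five-point support, the uniform benchmark q, and the single mean constraint are all hypothetical.

    import numpy as np
    from scipy.optimize import minimize

    x = np.arange(1, 6)                  # hypothetical support points
    q = np.ones(5) / 5.0                 # benchmark measure: uniform "ignorance"
    target_mean = 3.2                    # assumed known moment of p

    def mdi(p):
        # I(p|q) = sum_i p_i ln(p_i/q_i); the epsilon guards ln(0)
        p = np.maximum(p, 1e-12)
        return np.sum(p * np.log(p / q))

    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0},
            {"type": "eq", "fun": lambda p: (p * x).sum() - target_mean}]
    res = minimize(mdi, q.copy(), method="SLSQP",
                   bounds=[(0.0, 1.0)] * 5, constraints=cons)
    p_star = res.x                       # MDI/ME estimate: an exponential tilt of q
    print(p_star, (p_star * x).sum())    # the mean of p_star is 3.2

The optimizer recovers the exponential-family form of the solution discussed in Section 5 without our having to derive it analytically.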
TABLE 1
RELATIVE MORTALITY RATIOS FOR 5,131 SPINAL CORD INJURY PATIENTS
INJURED BETWEEN 1973 AND 1980 WHO SURVIVED AT LEAST 24 HOURS
AFTER INJURY, BY NEUROLOGICAL CATEGORY
AND AGE GROUP AT TIME OF INJURY [11]

Neurological Category and Age Group at Injury    Relative Mortality Ratio
Incomplete Paraplegia
  1-24                                                  4.82
  25-49                                                 6.59
  50+                                                   3.26
Complete Paraplegia
  1-24                                                  4.93
  25-49                                                 6.93
  50+                                                   3.26
Incomplete Quadriplegia
  1-24                                                  4.22
  25-49                                                 6.71
  50+                                                   3.95
Complete Quadriplegia
  1-24                                                 12.40
  25-49                                                20.78
  50+                                                  14.11
An actuary might attempt to use these data for adjusting a mortality table
for use in such cases as wrongful injury damage award compensation cal-
culations and life insurance premium determination for medically impaired
lives. A reasonable question is: Is there a statistically rigorous way to esti-
mate, consistent with the data given in Table 1, the mortality rates for, say,
incomplete paraplegics that is as close as possible to some presupposed
standard table without actually having access to the original detailed data?¹
The answer is "yes." Brockett and Song [9] provide a life table adjustment
method based on a constrained information theoretic methodology. This
model minimizes the "information theoretic distance" (2.1) between the
adjusted mortality rates and the corresponding standard rates subject to con-
straints that reflect the known characteristics of the individual. An interesting
subproblem in their study is how to estimate the exposure level, E_x, that
must be used in the calculation. To be most accurate, E_x should be taken as
actually exhibited by the patient study population; however, when secondary
¹By "the original detailed data," we mean the original sample observations, including both the
sampling frame and the sample size--all the information that would have been available had the
actuary done the primary research.
data are used, this detailed information about the precise age distribution of
the patient study population is often unavailable. In fact, DeVivo et al. [11]
in their report give only partial information on E_x for the three age categories
in Table 1. Accordingly, the study of Brockett and Song [9] must develop
a method to derive the values Ex for the study population distribution. They
show how information theoretic techniques can be used to obtain a set of
exposure values, E_x, that are as close as possible to the exposure profile of
the standard population but that are consistent with the information about
the study patient population profile given in DeVivo et al. [11].
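As an illustration of the idea (our sketch, not code from Brockett and Song [9]; the standard age profile and the band proportions below are invented), note that when only band totals are constrained, the MDI solution simply rescales the standard profile proportionally within each band:

    import numpy as np

    ages = np.arange(1, 81)
    q = np.exp(-0.02 * ages)
    q /= q.sum()                           # hypothetical standard exposure profile
    bands = [(1, 24), (25, 49), (50, 80)]  # age bands as in Table 1
    band_mass = [0.55, 0.30, 0.15]         # assumed study-population proportions

    p = q.copy()
    for (lo, hi), m in zip(bands, band_mass):
        sel = (ages >= lo) & (ages <= hi)
        p[sel] *= m / q[sel].sum()         # proportional rescaling within the band

    # p matches the study's band totals while remaining "as close as possible"
    # (in the sense of (2.1)) to the standard profile q
    assert abs(p.sum() - 1.0) < 1e-12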
As another example of a situation in which the actuary may be asked to
use secondary data to answer questions, consider the loss distribution infor-
mation presented in Table 2.
TABLE 2
EXPECTED LOSS EXPERIENCE FOR 1,000 CLAIMS

Loss Interval [a, b]      Expected Number of Claims    Average Claim Size in the Interval
0                                   75                       $0
[1, 1,000]                         500                      900
[1,001, 5,000]                     250                    4,000
[5,001, 10,000]                    150                    9,000
[10,001, 100,000]                   20                   20,000
[100,001, 500,000]                   4                  200,000
[500,001, 1,000,000]                 0.8                650,000
1,000,000+                           0.2              1,500,000
Total / Average                  1,000                   $4,820
Note that Table 2 gives both the conditional probabilities and conditional
means of the loss size subintervals. These may have arisen as summary
statistics for a very large data set that has only been saved in "banded"
form (compare Reitano [17]) or may have come from a published secondary
data source.
The actuary may be asked to determine the probability that a claim will
exceed a certain threshold level, say $50,000, and to determine the expected
claim size if a policy were issued with this threshold level as a policy limit.
Since $50,000 is strictly interior to one of the intervals, the actuary must
"interpolate" to find such an answer. The usual actuarial methods of as-
suming a constant force or a uniform distribution within the individual
subinterval will not work, because they would produce probability distri-
butions inconsistent with the known average claim sizes within the subin-
tervals (since the mean of each subinterval is known and is not consistent
with the uniform or constant force assumption). For example, the interval
[1, 1,000] would have a mean of 500 under the uniform distribution as-
sumption; however, this is incompatible with the fact that the mean for this
interval is known to be 900.
The data matching discussed previously can arise when two histograms
are incompatible or when the subinterval endpoints on a single histogram
are not convenient to the user of the published data. Figure 1 displays data
on the use of a particular actuarial software program used in defined-benefit
pension plan calculations by consulting actuaries. The marketing actuary for
the developer of an improved software package that can be used as an adjunct to the
original actuarial software has determined the R&D costs of developing the
program and has ascertained that the purchase is cost-effective only for a con-
sulting actuary who performs this calculation more than 15 times per month.
Accordingly, it is desired to know, "How many actuaries perform this cal-
culation more than 15 times per month?" (The fact that usage is not evenly
distributed over the 10-30 interval means that a quick proportional calcu-
lation based on the histogram data would be unreliable.)
FIGURE 1
PERCENTAGE OF CONSULTING ACTUARIES REPORTING THE NUMBER OF USES
OF DEFINED-BENEFIT CALCULATIONS OF THE GIVEN TYPE PER MONTH
(THE AVAILABLE DATA DESCRIPTION LISTS THE MEAN AS 8.5 AND THE MEDIAN AS 4.5.)
[Bar chart; vertical axis: percentage of actuaries (10% to 40%); categories: Under 5.0, 5.0–9.9, 10.0–29.9, 30.0–50.0.]
    p_i(x) = h_i / [(b_i − a_i) H]  for x ∈ (a_i, b_i].    (5.2)
This is, for example, the estimate used in mortality table analysis when the
uniform distribution of deaths assumption is used and yields the familiar
"bar chart."
When further auxiliary information is available on certain moments (or
conditional moments) of f(x), then the ME procedure yields a more desirable
estimate p(x) in a manner consistent with maximum-entropy estimation theory.²
When no such information is given, p(x) becomes the final estimate of f(x).³
The conditional moments that might be available are the conditional
means E(X | a_i < X ≤ b_i) = μ_i and/or the conditional medians
M(X | a_i < X ≤ b_i) = M_i of the individual subintervals (a_i, b_i] (here M denotes the
median).
Returning to the general topic of determining a distribution that is close
to some distribution q but that satisfies certain constraints, we define the
minimum discrimination information (compare Brockett [3]) estimate of q
to be the distribution p that solves the extremization problem:

    Minimize ∫ p(x) ln[p(x)/q(x)] dx    (5.3)

    Subject to θ_i = ∫ T_i(x) p(x) dx,  i = 1, …, n.
²Since the algorithm used in this paper is concerned only with continuous density functions, we
use the notation p(x) and f(x) interchangeably in this discussion.
³However, when a_1 or b_n is infinite, different strategies must be used because the uniform distribution is not acceptable in these situations.
We can think of this extremization problem as finding the "closest" distribution to q (in the sense that the distribution
found is least distinguishable from q; compare Brockett [3]) that is consistent with the known information stated in (5.3).
The analysis given in Brockett [3] implies that the optimal solution of the
problem above is a density function p* of the form

    p*(x) = q(x) e^{Σ_i β_i T_i(x)} / ∫ q(t) e^{Σ_i β_i T_i(t)} dt.    (5.4)
Writing the known conditional mean information as

    ∫_{a_i}^{b_i} x p(x) dx / ∫_{a_i}^{b_i} p(x) dx = μ_i,  i ∈ I,    (5.5)

for that subset I of indices of subintervals for which this conditional mean
type of information is given, we rewrite (5.5) as

    ∫_{a_1}^{b_n} (x − μ_i) 1_{[a_i, b_i]}(x) p(x) dx = 0,    (5.6)

where 1_{[a_i, b_i]} denotes the indicator function of the interval [a_i, b_i]. This can
be written in the global expectation constraint formulation of (5.3) by
defining

    T_i(x) = (x − μ_i) 1_{[a_i, b_i]}(x)  and  θ_i = 0.    (5.7)
For intervals i ∉ I, we know only the mass (or histogram height) information,
which can be written as

    ∫_{a_i}^{b_i} p(x) dx = h_i / H,    (5.8)

which can also be put into the global expectational constraint form of (5.3)
by defining

    T_i(x) = 1_{[a_i, b_i]}(x)  and  θ_i = h_i / H.    (5.9)
Note that in the numerator of (5.4), only one of the T_i(x) is nonzero for any
given value of x, so that we can reformulate the solution as

    p*(x) = q(x) e^{β_i T_i(x)} / ∫ q(t) e^{Σ_j β_j T_j(t)} dt  for x ∈ (a_i, b_i].    (5.10)

⁴Note the x-axis scaling in the expressions for the exponential and truncated exponential distributions in the table. These density functions are usually applied to intervals with one endpoint at x = 0.
In analyzing histograms, it may be necessary to fit these density functions on individual intervals
of x with arbitrary endpoints; for this reason the table displays the most general forms of
the function. Also, the exponential distribution on (−∞, b] is written to be monotonically increasing.
TABLE 3
SUMMARY OF MAXIMUM ENTROPY DISTRIBUTIONS
FOR DIFFERENT KNOWN INFORMATION SCENARIOS⁴

Interval    Moments Known             ME Density
[a, b]      None                      Uniform: f(x) = 1/(b − a)
[a, b]      Mean μ                    Truncated exponential: f(x) = α e^{αx}/(e^{αb} − e^{αa}), where α is an implicit function of μ
[a, ∞)      Mean μ                    Exponential: f(x) = e^{−(x−a)/(μ−a)}/(μ − a)
(−∞, b]     Mean μ                    Exponential: f(x) = e^{−(b−x)/(b−μ)}/(b − μ)
(−∞, ∞)     Mean μ and Variance σ²    Normal: f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}

Other combinations of moments and intervals result in ME distributions that can be derived by
using the same procedure demonstrated in (5.5)-(5.10).
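The closed forms of Table 3 are straightforward to encode. The sketch below is ours; it covers the cases with explicit densities and leaves the bounded-interval case, whose parameter must be solved numerically from (6.1), unimplemented.

    import math

    def me_density(a=None, b=None, mu=None, var=None):
        # returns the ME density for one interval, following Table 3
        if mu is None:                 # no moments known: uniform on [a, b]
            return lambda x: 1.0 / (b - a)
        if var is not None:            # mean and variance on (-inf, inf): normal
            return lambda x: math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        if b is None:                  # [a, inf) with mean mu: shifted exponential
            return lambda x: math.exp(-(x - a) / (mu - a)) / (mu - a)
        if a is None:                  # (-inf, b] with mean mu: increasing exponential
            return lambda x: math.exp(-(b - x) / (b - mu)) / (b - mu)
        raise NotImplementedError("bounded interval with known mean: solve (6.1) for alpha")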
For the truncated exponential ME density on an interval [a, b] with known mean μ, the parameter α satisfies

    μ = (b e^{αb} − a e^{αa})/(e^{αb} − e^{αa}) − 1/α,    (6.1)

and the corresponding distribution function is

    F(x) = F(a) + (e^{αx} − e^{αa})/(e^{αb} − e^{αa})  for x ∈ (a, b].    (6.2)
In histogram form, on a subinterval i whose conditional mean is known, this becomes

    F*(x) = F*(a_i) + h_i(e^{αx} − e^{αa_i})/[H(e^{αb_i} − e^{αa_i})]  for x ∈ (a_i, b_i],    (6.3)

while on a subinterval j with no moment information, the uniform form applies:

    F*(x) = F*(a_j) + h_j(x − a_j)/[H(b_j − a_j)]  for x ∈ (a_j, b_j].    (6.4)
When, instead, the overall mean μ of the distribution is known in addition to the histogram, the ME density p(x) must satisfy

    ∫_{a_1}^{b_n} x p(x) dx = μ  and  ∫_{a_1}^{b_n} p(x) dx = 1,    (6.5)

and takes the form

    p(x) = [h_i/(H(b_i − a_i))] e^{−βx} / ∫ q(t) e^{−βt} dt  for x ∈ (a_i, b_i], i = 1, …, n,    (6.6)

where the same limits of integration apply. Substituting (6.6) into (6.5) yields
    μ = { Σ_i [h_i/(b_i − a_i)] [(a_i + 1/β) e^{−βa_i} − (b_i + 1/β) e^{−βb_i}] } / { Σ_i [h_i/(b_i − a_i)] [e^{−βa_i} − e^{−βb_i}] }.    (6.7)
While Equation (6.7) is not easily solvable for β in terms of μ, the value
of β can easily be obtained by numerical techniques, as in the sketch below.
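For instance (a sketch of ours, with an invented histogram and overall mean), (6.7) can be handed to a standard root finder; the bracketing interval below was chosen by inspection and is an assumption.

    import numpy as np
    from scipy.optimize import brentq

    a = np.array([0.0, 5.0, 10.0, 30.0])    # left endpoints a_i
    b = np.array([5.0, 10.0, 30.0, 50.0])   # right endpoints b_i
    h = np.array([40.0, 30.0, 20.0, 10.0])  # histogram heights h_i
    mu = 9.0                                # assumed known overall mean
    w = h / (b - a)

    def mean_of_beta(beta):
        # right-hand side of (6.7)
        num = (w * ((a + 1/beta) * np.exp(-beta*a) - (b + 1/beta) * np.exp(-beta*b))).sum()
        den = (w * (np.exp(-beta*a) - np.exp(-beta*b))).sum()
        return num / den

    beta = brentq(lambda t: mean_of_beta(t) - mu, 1e-6, 1.0)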
By integrating (6.6), with F*(a_1) = 0, the corresponding distribution function F*(x) can be obtained on each subinterval. For a rightmost interval (i = n) that is unbounded on the right with known conditional mean μ_n, the exponential tail of Table 3 applies; that is,

    F*(x) = F*(a_n) + (h_n/H)[1 − e^{(a_n − x)/(μ_n − a_n)}]  for x ∈ (a_n, ∞).    (6.16)
For the opposite instance, where the leftmost interval (i = 1) is unbounded
on the left, we assign

    μ_1 = [a_2(h_1 + h_2) − h_2 μ_2] / h_1,    (6.17)

whence

    F*(x) = (h_1/H) e^{(a_2 − x)/(μ_1 − a_2)}  for x ∈ (−∞, a_2].    (6.18)
7. APPLICATION EXAMPLES
In this section, we apply the formulas derived in Section 6 to solve the
problems posed previously.
B. Expected Loss Calculation
Table 2 concerns the expected loss experience for 1,000 claims. We note
that when the conditional mean, μ, and probability, p, for a bounded interval
[a, b] are known, the ME conditional distribution for the interval is a trun-
cated exponential. Suppose the ME conditional distribution for the interval
[a, b] is parametrically expressed as

    f_L(x) = e^{α + βx}.

Then the following set of equations must hold for the derived conditional
probability and mean to be as given:

    ∫_a^b e^{α + βx} dx = p  and  ∫_a^b x e^{α + βx} dx = p μ.    (7.1)
FIGURE 2
ADJUSTED MORTALITY RATES BY INFORMATION THEORETIC APPROACH
FOR INCOMPLETE PARAPLEGIA
[Plot of log mortality rate against age (0 to 120); the dashed curve is labeled ln(ip).]
    β = [1/(b − a)] ln[(1 + μβ − aβ)/(1 + μβ − bβ)]    (7.2a)

and

    α = ln[pβ/(e^{bβ} − e^{aβ})].    (7.2b)
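A hedged numerical sketch of (7.2a)-(7.2b): rather than iterating the implicit equation for β, we solve for β with a root finder applied to the truncated-exponential mean (6.1) and then recover α from (7.2b). The bracketing interval is our assumption; the inputs are the first row of Table 2 ([1, 1,000], p = 0.5, μ = 900).

    import math
    from scipy.optimize import brentq

    def trunc_exp_mean(beta, a, b):
        # mean of a density proportional to exp(beta * x) on [a, b]; cf. (6.1)
        ea, eb = math.exp(beta * a), math.exp(beta * b)
        return (b * eb - a * ea) / (eb - ea) - 1.0 / beta

    def fit_alpha_beta(a, b, p, mu):
        mid = 0.5 * (a + b)
        # beta > 0 pulls the conditional mean above the midpoint, beta < 0 below it
        lo, hi = (1e-9, 0.1) if mu > mid else (-0.1, -1e-9)
        beta = brentq(lambda t: trunc_exp_mean(t, a, b) - mu, lo, hi)
        alpha = math.log(p * beta / (math.exp(b * beta) - math.exp(a * beta)))  # (7.2b)
        return alpha, beta

    print(fit_alpha_beta(1.0, 1000.0, 0.5, 900.0))
    # about (-15.29, 0.0100), agreeing with the first row of Table 4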
TABLE 4
NUMERICAL RESULTS FOR ME CONDITIONAL PROBABILITY FUNCTION

Loss Interval [a, b]        α           β
[1, 1,000]              −15.294     0.009995
[1,001, 5,000]          −12.865     0.000898
[5,001, 10,000]         −18.439     0.000960
[10,001, 100,000]       −12.124    −0.000100
[100,001, 500,000]      −16.215    −0.000009
[500,001, 1,000,000]    −16.527    −0.000005
Note that the last loss interval in Table 2 (that is, $1,000,000+) is a half-
bounded interval with its conditional mean given. The results in Table 3 can
then be applied to obtain the distribution function for this interval.
Based on the ME distribution obtained for this example, those questions
raised in the introduction can be answered easily. For example, if it is desired
to know the probability that a claim will exceed a certain threshold level,
say, $50,000, and also to know the expected claim size if a policy were
issued with this threshold level as a policy limit, we then calculate
    Pr[Loss > 50,000] = 1 − ∫_0^{50,000} f_L(x) dx
                      = 1 − [∫_0^{10,000} f_L(x) dx + ∫_{10,000}^{50,000} f_L(x) dx]
                      = 1 − 0.975 − ∫_{10,000}^{50,000} e^{−12.124 − 0.0001x} dx
                      = 0.0054.
Similarly,

    E[min(Loss, 50,000)] = ∫_0^{50,000} x f_L(x) dx + 50,000 · Pr[Loss ≥ 50,000]
                         = [0 + (0.5)(900) + (0.25)(4,000) + (0.15)(9,000)
                            + ∫_{10,000}^{50,000} x e^{−12.124 − 0.0001x} dx] + (50,000)(0.0054)
                         = 0 + 450 + 1,000 + 1,350 + 378 + 270
                         = $3,448.
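These two figures are easy to verify numerically. The sketch below (ours) uses the Table 4 parameters for the interval [10,001, 100,000] together with the conditional masses and means of Table 2; it reproduces the results up to rounding.

    import math

    alpha, beta = -12.124, -0.0001   # Table 4 row for [10,001, 100,000]

    def seg_prob(x0, x1):
        # integral of exp(alpha + beta*x) over [x0, x1]
        return math.exp(alpha) * (math.exp(beta * x1) - math.exp(beta * x0)) / beta

    def seg_xmean(x0, x1):
        # integral of x * exp(alpha + beta*x) over [x0, x1] (by parts)
        g = lambda x: math.exp(alpha + beta * x) * (x / beta - 1.0 / beta ** 2)
        return g(x1) - g(x0)

    p_below_10k = 0.075 + 0.5 + 0.25 + 0.15   # masses of the first four intervals
    p_exceed = 1.0 - p_below_10k - seg_prob(10_000, 50_000)
    ev_limited = (0.0 + 0.5 * 900 + 0.25 * 4_000 + 0.15 * 9_000
                  + seg_xmean(10_000, 50_000) + 50_000 * p_exceed)
    print(round(p_exceed, 4), round(ev_limited))
    # prints 0.0054 and 3447, matching the text's $3,448 up to rounding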
A mean of 8.5 appears reasonable, and we can proceed with the estima-
tion. The results are presented in Table 5.
TABLE 5
SENSITIVITY OF THE INFORMATION THEORETIC DISTRIBUTION ESTIMATE
OF THE PERCENTAGE OF FIRMS WITH SIXTEEN OR MORE USES
OF THE GIVEN DEFINED-BENEFIT CALCULATION
AS THE SUPPLIED MEDIAN USE CHANGES
The insensitivity of the far right column to the choice of median provided
some degree of comfort with the initial market estimate for the software
product.
8. CONCLUDING REMARKS
The heuristic statistical procedure described in this paper (use what is
known and maximize the uncertainty of what is not known) can be cate-
gorized as problem-solving for grouped data with auxiliary information.
Such problems arise naturally in the actuarial analysis of secondary data.
Responding to data interpretation needs that arise in actuarial practice, the
algorithmic portion uses available information theoretic techniques when
possible. In other cases, when data are missing or in conflict, ad hoc mea-
sures (based on practical logic and experience) are taken to facilitate the use
of the same techniques. A variety of solved application examples were pro-
vided, and further application areas were indicated.
ACKNOWLEDGMENT
This paper arose in part from a problem first posed by Edward Lew at
the 1984 Actuarial Research Conference in Waterloo, Ontario. The research
was funded in part by a grant from the Actuarial Education and Research
Fund of the Society of Actuaries. The comments of an anonymous referee
are gratefully acknowledged.
REFERENCES
1. ARTHANARI, T.S., AND DODGE, Y. Mathematical Programming in Statistics. New
York: Wiley, 1981.