Concentration Index
Introduction
The concentration index [1-3] and related concentration curve (see Technical Note #6) provide a means of
quantifying the degree of income-related inequality in a specific health variable. For example, it could be
used to quantify the degree to which health subsidies are better targeted towards the poor in some countries
than others [4], or the degree to which child mortality is more unequally distributed to the disadvantage of
poor children in one country than another [5], or the extent to which inequalities in adult health are more
pronounced in some countries than in others [6]. Many other applications are possible. This Note
describes how to compute the concentration index, and how to obtain a standard error for it. Both the
grouped-data and micro-data cases are considered.
[Figure 1: Concentration curve L(p). Horizontal axis: cumulative % of persons, ranked by economic status. Vertical axis: cumulative % of ill health.]
where p is the cumulative percent of the sample ranked by economic status, L(p) is the corresponding
concentration curve ordinate, and T is the number of socioeconomic groups.
Table 1 provides a worked example. It shows the number of births in each wealth group over the period
1982-92 in India. Expressing these as percentages of the total number of births, and cumulating them gives
the cumulative percentage of births, ordered by wealth. This is what is plotted on the x-axis in the
concentration curve diagram and gives us p. (See the Technical Note on the concentration curve for the
concentration curve graph for these data.) Also shown are the under-five mortality rates (U5MR) for each
of five wealth groups. Multiplying the U5MR by the number of births gives the number of deaths in each
wealth group. Expressing these as a percentage of the total number of deaths, and cumulating them, gives
the cumulative percentage of deaths for the corresponding percentage of births. This is what is plotted on
the y-axis in Figure 1, and gives us L(p). The final column shows the terms in brackets in the formula
above, there being T-1 terms in total. The sum of these is –0.1694, which is the concentration index. The
negative concentration index reflects the higher mortality rates amongst poorer children.
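The steps of the worked example can be sketched in Python. This is a minimal sketch, not the Note's own code: it assumes the bracketed terms take the standard grouped-data form p_t·L_{t+1} − p_{t+1}·L_t, and the function name `grouped_concentration_index` is hypothetical. Because the India figures of Table 1 are not reproduced here, the inputs are the births and under-five mortality rates from the Vietnam table later in this Note.

```python
def grouped_concentration_index(counts, rates):
    """Concentration index from grouped data; groups ordered poorest to richest."""
    total_n = sum(counts)
    cases = [n * r for n, r in zip(counts, rates)]  # e.g. deaths in each group
    total_cases = sum(cases)

    # Cumulative shares of population (p) and of the health variable (L)
    p, L = [], []
    cum_n = 0.0
    cum_cases = 0.0
    for n, c in zip(counts, cases):
        cum_n += n
        cum_cases += c
        p.append(cum_n / total_n)
        L.append(cum_cases / total_cases)

    # The T-1 bracketed terms (assumed form): p_t * L_{t+1} - p_{t+1} * L_t
    terms = [p[t] * L[t + 1] - p[t + 1] * L[t] for t in range(len(p) - 1)]
    return sum(terms), terms


# Births and U5MR by consumption quintile, taken from the Vietnam table
births = [1002, 949, 1002, 1082, 1280]
u5mr = [0.060, 0.034, 0.041, 0.028, 0.022]

C, terms = grouped_concentration_index(births, u5mr)
print(round(C, 4), [round(t, 3) for t in terms])
```

With these rounded rates the index comes out close to the −0.184 reported in the table's total row; small differences in later digits reflect rounding of the published mortality rates.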
Computing a standard error for the concentration index with grouped data
A standard error can be computed for C in the grouped data case using a formula given in Kakwani et al.
[2]. Let n denote the sample size, T the number of groups, ft the proportion of the sample in the tth group,
μt the mean value of the health variable amongst the tth group, and C the concentration index. Let Rt be the
fractional rank of the tth group, defined as
$$R_t = \sum_{\gamma=1}^{t-1} f_\gamma + \tfrac{1}{2} f_t$$
and hence indicating the cumulative proportion of the population up to the midpoint of each group interval.
The variance of C is given by eqn (14) in Kakwani et al.:
$$\operatorname{var}(C) = \frac{1}{n}\left[\sum_{t=1}^{T} f_t a_t^2 - (1+C)^2\right] + \frac{1}{n\mu^2}\sum_{t=1}^{T} f_t \sigma_t^2 \,(2R_t - 1 - C)^2$$
Table 2 gives an example using data on under-five mortality (actual rates, not rates per 1000 births) from
the 1998 Vietnam Living Standards Survey (VLSS). The data were computed directly from the survey,
with children being grouped into household per capita consumption quintiles. Sample weights were not
used. The assumption made in Table 2 is that the standard errors for the mortality rates are not known.
Below, we relax this assumption. When the standard errors are not known, one computes only the first
term in the expression for var(C) above, with n replaced by T in the denominator.
Table 2, which is extracted from an Excel file, shows the values of R, q, a and f·a² for each “quintile”. Also
shown is the sum of f·a² across the five quintiles. Substituting Σf·a² = 0.680, C = -0.1841 and T = 5 into the
expression for var(C) above gives a variance of C equal to 0.0029, and hence a standard error equal to
0.0537. The t-statistic for C is therefore -3.43.
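Written out, the arithmetic behind these figures is roughly as follows. This is a sketch using the rounded inputs quoted in the text, so the last digit can differ slightly from the spreadsheet values.

```latex
\[
\operatorname{var}(C) \approx \frac{1}{T}\left[\sum_{t=1}^{T} f_t a_t^2 - (1+C)^2\right]
 = \frac{0.680 - (1 - 0.1841)^2}{5}
 = \frac{0.680 - 0.6657}{5} \approx 0.0029
\]
\[
\operatorname{se}(C) = \sqrt{\operatorname{var}(C)} \approx 0.0537,
\qquad
t = \frac{C}{\operatorname{se}(C)} = \frac{-0.1841}{0.0537} \approx -3.43
\]
```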
When the standard errors of the group mortality rates are known, one uses n (rather than T) in the
denominator of the first term in the expression for var(C) above, and one needs to compute the second
term as well as the first term. Table 3 shows the standard
errors for each quintile’s under-five mortality rate from the Vietnam data. The final column shows the
value for each quintile of the term in the summation operator in the second term of the expression for
var(C) above, as well as the sum of these across the five quintiles. Dividing this sum through by nμ² gives
1.511e-6, which is the second term of the expression for var(C). Dividing Σf·a² − (1 + C)² through by n (=5315) gives
2.717e-6, which is the first term. The sum of the two terms is the variance, equal in this case to 4.228e-6,
giving a standard error of C equal to 0.0021. This, unsurprisingly, is substantially smaller than the standard
error obtained in the previous case, and results in a t-statistic for C equal to -89.54.
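Both variance calculations can be reproduced with a short Python sketch. It is illustrative only and rests on assumptions not spelled out in this section: a_t and q_t are taken to be the grouped analogues of the micro-data definitions given later in the Note, σ_t is taken to be the standard error of the group rate, and the function name `grouped_ci_variance` is hypothetical. Inputs are the births, rates and standard errors from the Vietnam table.

```python
import math


def grouped_ci_variance(counts, means, std_errors=None):
    """Return (C, sum of f*a^2, var(C)) for grouped data, poorest group first."""
    T = len(counts)
    n = sum(counts)
    f = [c / n for c in counts]                     # group shares f_t
    mu = sum(ft * mt for ft, mt in zip(f, means))   # overall mean

    # p_t: cumulative share, q_t: curve ordinate (p_0 = q_0 = 0), R_t: midpoint rank
    p, q, R = [0.0], [0.0], []
    for ft, mt in zip(f, means):
        R.append(p[-1] + 0.5 * ft)
        p.append(p[-1] + ft)
        q.append(q[-1] + ft * mt / mu)

    # Concentration index as the sum of T-1 terms (assumed standard form)
    C = sum(p[t] * q[t + 1] - p[t + 1] * q[t] for t in range(1, T))

    # Assumed grouped form: a_t = (mu_t/mu)(2R_t - 1 - C) + 2 - q_{t-1} - q_t
    a = [means[t - 1] / mu * (2 * R[t - 1] - 1 - C) + 2 - q[t - 1] - q[t]
         for t in range(1, T + 1)]
    sum_fa2 = sum(ft * at ** 2 for ft, at in zip(f, a))
    first = sum_fa2 - (1 + C) ** 2

    if std_errors is None:
        # Standard errors unknown: first term only, with T in the denominator
        return C, sum_fa2, first / T

    # Standard errors known: n in the denominator, plus the second term
    second = sum(ft * se ** 2 * (2 * rt - 1 - C) ** 2
                 for ft, se, rt in zip(f, std_errors, R)) / (n * mu ** 2)
    return C, sum_fa2, first / n + second


births = [1002, 949, 1002, 1082, 1280]
u5mr = [0.060, 0.034, 0.041, 0.028, 0.022]
se_u5mr = [0.008, 0.006, 0.007, 0.005, 0.004]

C, sum_fa2, var_T = grouped_ci_variance(births, u5mr)
_, _, var_n = grouped_ci_variance(births, u5mr, se_u5mr)
print(C, sum_fa2, math.sqrt(var_T), math.sqrt(var_n))
```

Because the table reports rates and standard errors to three decimals, the results agree with the text only approximately: Σf·a² is about 0.680, and the two standard errors are close to 0.054 and 0.002.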
Consumption group  No. of births  R      U5MR   CI      q      a      f·a²   Std error  f·σ²·(2R−1−C)²
                                                        0
Poorest            1002           0.094  0.060  -0.024  0.312  0.648  0.079  0.008      4.631E-06
2nd                949            0.278  0.034  -0.013  0.482  0.959  0.164  0.006      4.354E-07
Middle             1002           0.461  0.041  -0.053  0.695  0.944  0.168  0.007      9.085E-08
4th                1082           0.657  0.028  -0.095  0.854  0.842  0.144  0.005      1.423E-06
Richest            1280           0.880  0.022   0.000  1.000  0.719  0.124  0.004      3.780E-06
Total/average      5315                  0.036  -0.184                0.680             1.036E-05
are used to measure inequality in malnutrition between poor and better-off children. Malnutrition is
measured by the child’s height-for-age percentile score (HAP) in a hypothetical population of well-
nourished children assembled by the US National Center for Health Statistics (NCHS). Thus a score of 50
means the child in question is at the median height-for-age in the well-nourished reference population. We
rank children by per capita household consumption (PCCONS). Initially, the commands below use sample
weights (WT), as the 1998 VLSS is not nationally representative without them. These weights, or
expansion factors, indicate the number of people in Vietnam that each sampled individual represents.
C = 2 cov(yi,Ri) / μ,
where y is the health variable whose inequality is being measured, μ is its mean, Ri is the ith individual’s
fractional rank in the socioeconomic distribution (e.g. the person’s rank in the income distribution), and
cov(.,.) is the covariance. Where the data are weighted, a weighted covariance needs to be computed, and a
weighted fractional rank needs to be generated [10].
The covariance between HAP and CONRNK is 1.1505 and the mean of the HAP is 14.024 (meaning the
average Vietnamese child is only at the fourteenth percentile in the reference population). This gives a
concentration index of 2 × 1.1505 / 14.024 = 0.1641, i.e. a tendency for better-off children in Vietnam to be taller (and better
nourished) than poor children.
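* Apply the sample weights.
WEIGHT BY wt .
* Fractional rank of per capita consumption (ascending), ties given the mean rank.
RANK VARIABLES=pccons (A) /RFRACTION into RNKCON /PRINT=YES
  /TIES=MEAN .
* Cross-products and covariance between the fractional rank and HAP.
CORRELATIONS /VARIABLES=rnkcon hap /STATISTICS XPROD
  /MISSING=PAIRWISE .
* Means of HAP and of the fractional rank.
DESCRIPTIVES VARIABLES=hap rnkcon /STATISTICS=MEAN.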
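For readers without SPSS, the covariance method C = 2 cov(y, R) / μ can be sketched in plain Python. This is an illustration under stated assumptions, not the Note's code: the weighted fractional rank is taken to be the cumulative weight share up to the midpoint of each observation, ties are not averaged (unlike /TIES=MEAN above), and the function name and the toy data are hypothetical.

```python
def concentration_index(y, ses, weights=None):
    """C = 2 * cov(y, R) / mean(y), with R the (weighted) fractional rank by ses."""
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    total_w = sum(w)

    # Weighted fractional rank: weight share below i plus half of i's own share
    # (assumed definition; tied ses values are NOT given a mean rank here)
    rank = [0.0] * n
    cum_share = 0.0
    for i in sorted(range(n), key=lambda k: ses[k]):
        share = w[i] / total_w
        rank[i] = cum_share + 0.5 * share
        cum_share += share

    # Weighted means and weighted covariance
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    mean_r = sum(wi * ri for wi, ri in zip(w, rank)) / total_w
    cov = sum(wi * (yi - mean_y) * (ri - mean_r)
              for wi, yi, ri in zip(w, y, rank)) / total_w
    return 2.0 * cov / mean_y


# Hypothetical toy data: health score rises with consumption
health = [1.0, 2.0, 3.0, 4.0]
consumption = [10.0, 20.0, 30.0, 40.0]
c_toy = concentration_index(health, consumption)

# A weight of 2 on the poorest person behaves like listing that person twice
c_weighted = concentration_index([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [2, 1, 1])
print(c_toy, c_weighted)
```

A positive value indicates that the health variable is concentrated among the better-off, as in the HAP example; the same arithmetic applied to the figures in the text is 2 × 1.1505 / 14.024.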
advantage of yielding an estimate of the concentration index itself. Neither, however, is appropriate with
weighted data. In the example, we have assumed for illustrative purposes that the VLSS data are self-
weighting. The value of C obtained ignoring the weighted character of the data is 0.1731.
$$\operatorname{var}(C) = \frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n} a_i^2 - (1+C)^2\right]$$
where
$$a_i = \frac{y_i}{\mu}\,(2R_i - 1 - C) + 2 - q_{i-1} - q_i$$
and
$$q_i = \frac{1}{\mu n}\sum_{\gamma=1}^{i} y_\gamma$$
is the ordinate of the concentration curve L(p), and q_0 = 0.
This is easily computed in Stata with the following commands, which are for the malnutrition example.
The GLCURVE command generates GLHAP, which, divided through by the mean of the health variable
HAP, gives the concentration curve ordinate CCURVE (the analogue of q or L(p)). The next two commands
generate the lagged value of L(p), or q_{i-1}. Inserting the estimated value of C in the next command generates
the variable a. The mean of a² is then obtained, which can then be used to compute var(C) manually using
the formula above. In the VLSS example, the mean of a² is equal to 2.1741, which gives a value of se(C)
equal to 0.0124.
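The same calculation can be sketched outside Stata. The Python below is a minimal, unweighted illustration of the variance formula above, not a translation of the Stata commands; the function name `ci_with_se` and the four-person data set are hypothetical, and the fractional rank is assumed to be (i − 0.5)/n for the ith person in the ranking.

```python
import math


def ci_with_se(y, ses):
    """Unweighted C and its standard error from the a_i formula."""
    n = len(y)
    ys = [y[i] for i in sorted(range(n), key=lambda k: ses[k])]  # poorest first
    mu = sum(ys) / n
    R = [(i + 0.5) / n for i in range(n)]  # assumed fractional rank, mean 0.5

    # Concentration index via the covariance formula
    cov = sum((yi - mu) * (ri - 0.5) for yi, ri in zip(ys, R)) / n
    C = 2.0 * cov / mu

    # Accumulate a_i^2, tracking q_{i-1} and q_i along the ranking (q_0 = 0)
    q_prev = 0.0
    sum_a2 = 0.0
    for yi, ri in zip(ys, R):
        q_i = q_prev + yi / (mu * n)
        a_i = (yi / mu) * (2 * ri - 1 - C) + 2 - q_prev - q_i
        sum_a2 += a_i ** 2
        q_prev = q_i

    variance = (sum_a2 / n - (1 + C) ** 2) / n
    return C, math.sqrt(max(variance, 0.0))  # guard against rounding below zero


# Hypothetical toy data
C_toy, se_toy = ci_with_se([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
print(C_toy, se_toy)
```

The term in square brackets is the mean of a² minus (1 + C)², so the manual step described in the text amounts to dividing that difference by n and taking the square root.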
$$2\sigma_R^2\left[\frac{y_i}{\mu}\right] = \alpha + \beta R_i + u_i$$
where $\sigma_R^2$ is the variance of the fractional rank variable. The estimate of β is equal to the concentration
index, C. Estimating this equation is an alternative to (but equivalent to) the convenient covariance
method. It also gives rise to an alternative interpretation of the concentration index as the slope of a line
passing through the heads of a parade of people, ranked by their consumption or SES, with each person's height
proportional to the value of their health variable, expressed as a fraction of the mean. The standard error of
β provides an estimate of the standard error of C, but is inaccurate since the nature of the fractional rank
variable induces a particular pattern of autocorrelation in the data. The formula above gets round this, but
an alternative is to use the Newey-West [11] regression estimator, which corrects for autocorrelation, as
well as any heteroscedasticity. The commands below implement this for the malnutrition example.
The GLCURVE command generates the rank variable INCRNK. The next three commands generate the
left-hand side variable (LHS) in the convenient regression. The NEWEY command then obtains the
Newey-West regression, producing a value of β (the concentration index) equal to 0.1731, and a standard error for
C equal to 0.0130. This is larger than the standard error obtained using the formula method (0.0124),
reflecting the additional adjustment for heteroscedasticity. The formula-based standard error is in turn larger
than the standard error from an OLS regression (0.0117), which takes into account neither the autocorrelation
induced by the rank nature of the fractional rank variable nor any heteroscedasticity in the data.
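The equivalence between the convenient regression and the covariance method can be checked numerically. The sketch below is a plain-Python illustration with hypothetical names and toy data; it fits the regression by ordinary least squares only and does not attempt the Newey-West correction, so it reproduces the point estimate of C but not the corrected standard error.

```python
def convenient_regression(y, ses):
    """OLS fit of 2*var(R)*y/mu on the fractional rank R; returns (alpha, beta)."""
    n = len(y)
    ys = [y[i] for i in sorted(range(n), key=lambda k: ses[k])]  # poorest first
    mu = sum(ys) / n
    R = [(i + 0.5) / n for i in range(n)]       # assumed fractional rank
    mean_r = sum(R) / n
    var_r = sum((r - mean_r) ** 2 for r in R) / n

    # Left-hand side variable of the convenient regression
    lhs = [2.0 * var_r * yi / mu for yi in ys]
    mean_lhs = sum(lhs) / n

    # Simple-regression OLS slope and intercept
    sxy = sum((l - mean_lhs) * (r - mean_r) for l, r in zip(lhs, R))
    sxx = sum((r - mean_r) ** 2 for r in R)
    beta = sxy / sxx
    alpha = mean_lhs - beta * mean_r
    return alpha, beta


# Hypothetical toy data: the slope should match 2*cov(y, R)/mu
alpha_toy, beta_toy = convenient_regression([1.0, 2.0, 3.0, 4.0],
                                            [10.0, 20.0, 30.0, 40.0])
print(alpha_toy, beta_toy)
```

The slope equals C because the factor 2σ²_R on the left-hand side cancels the variance of R in the OLS slope, leaving 2 cov(y, R) / μ.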
Bibliography
1. Wagstaff, A., P. Paci, and E. van Doorslaer, On the measurement of inequalities in health. Social
Science and Medicine, 1991. 33: p. 545-557.
2. Kakwani, N.C., A. Wagstaff, and E. Van Doorslaer, Socioeconomic inequalities in health:
Measurement, computation and statistical inference. Journal of Econometrics, 1997. 77(1): p. 87-
104.
3. Lambert, P., The distribution and redistribution of income: A mathematical analysis. 2nd ed.
1993, Manchester: Manchester University Press.
4. Castro-Leal, F., et al., Public spending on health care in Africa: do the poor benefit? Bulletin of
the World Health Organization, 2000. 78(1): p. 66-74.
5. Wagstaff, A., Socioeconomic inequalities in child mortality: comparisons across nine developing
countries. Bulletin of the World Health Organization, 2000. 78(1): p. 19-29.
6. Van Doorslaer, E., et al., Income-related inequalities in health: Some international comparisons.
Journal of Health Economics, 1997. 16: p. 93-112.
7. Fuller, M. and D. Lury, Statistics Workbook for Social Science Students. 1977, Oxford: Phillip
Allan.
8. Kakwani, N.C., Income Inequality and Poverty: Methods of Estimation and Policy Applications.
1980, New York: Oxford University Press.
9. Jenkins, S., Calculating income distribution indices from microdata. National Tax Journal, 1988.
61: p. 139-142.
10. Lerman, R.I. and S. Yitzhaki, Improving the Accuracy of Estimates of Gini Coefficients. Journal of
Econometrics, 1989. 42(1): p. 43-47.
11. Newey, W.K. and K.D. West, Automatic Lag Selection in Covariance Matrix Estimation. Review
of Economic Studies, 1994. 61(4): p. 631-53.