STA 2110 Notes
Pre-Requisites STA 2100 Probability & Statistics I
(1) Identify and compare the advantages and disadvantages of the different sources of vital statistics data.
(11) Project a population using appropriate equations and assumptions.
[2] Poston, D. and Bouvier, L., Population and Society: An Introduction to Demography, Oxford, HB 849.4 POS, 2010.
Course Journals
[2] Lee, E.T. and Wang, J., Statistical Methods for Survival Data Analysis, 3rd edition, John Wiley & Sons, ISBN: 9780471458555, 2003.
Reference Journals
[3] Journal of the Royal Statistical Society (Series B), Online ISSN: 1467-9868.
1 INTRODUCTION
Vital statistics provides an introduction to demography. The word "demography" is derived from the Greek words demos, people, and grapho, to write. Demography can be defined as the study of human populations, including their composition, distribution, density, growth and other characteristics, as well as the causes and consequences of changes in these factors. Demography focuses its attention on three readily available human phenomena:
specializes in expensive sports cars. Demographic awareness could help
the potential dealer in finding neighborhoods where young and middle-
aged affluent people live. Placing a Corvette shop in a working-class
community populated mainly by low-income middle-aged and elderly
people could be disastrous.
3. Allocation of Money for Social Services: This point also pertains to the relationship between Demography and the Government (part B). Given that demographic data are used to allocate money to poor people, one can understand why disadvantaged groups in society are at a further disadvantage when the Census Bureau undercounts those populations.
• Census
• Sample surveys
• Ad-hoc Demographic studies
1.2.1 Census
DE JURE
Advantages
Disadvantages
• Information collected regarding persons away from home is often incomplete or incorrect.
DE FACTO
Advantages
• There is less chance for the omission of persons from the count.
Disadvantages
• Vital statistics are usually distorted (in areas with high migration).
• Collecting information
• Evaluation
Qualities of a Census
1.2.3 Sample Surveys
1. What is Demography?
2 DEMOGRAPHIC MEASUREMENT TOOLS AND TECHNIQUES
After completion of this chapter the students should be able to:
• Ratios
• Proportions and
• Rates
Ratio
a/b
• The numerator and denominator (a and b) are defined for a specific
geographic area and period of time.
Proportion
Rate
• Numerator
• Denominator
2.1 Techniques of Demographic Measurement
The crude birth rate (CBR) indicates the number of live births (children born alive) per 1000 mid-year population in a given year.
Across the world, the CBR varies widely from population to population. It is high in the populations of developing countries and low in those of developed countries.
Limitations
The General Fertility Rate (GFR) is the number of live births per 1000 females aged 15-49 years (the fertile age group) in a given year. The GFR is a more sensitive measure of fertility than the CBR, since it refers to the age and sex group capable of giving birth (females 15-49 years of age). It eliminates distortions that might arise due to different age and sex distributions among the total population. The major limitation of the GFR is that not all women in the denominator are exposed to the risk of childbirth.
The GFR is a more refined measure than CBR to compare fertility across
populations. The GFR is approximately four times the CBR.
Age specific fertility rate
The age specific fertility rate (ASFR) is defined as the number of children born alive to females in a specific age group per 1000 females in that age group, for example 15-19, 20-24, ..., 45-49 years of age.
Example: Consider the following data. Find the age specific fertility rates and the general fertility rate.
GFR = (2514118 / 10590405) × 1000 ≈ 237.4 live births per 1000 women aged 15-49
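The CBR and the GFR have the same structure, events per 1000 persons in a denominator, and differ only in which population goes in the denominator. The minimal Python sketch below reproduces the GFR example. The function names are illustrative only and are not taken from the notes.

```python
def rate_per_1000(events: float, denominator: float) -> float:
    """Number of events per 1000 persons in the denominator population."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return events / denominator * 1000


def crude_birth_rate(live_births: float, midyear_population: float) -> float:
    # CBR: live births per 1000 of the total mid-year population
    return rate_per_1000(live_births, midyear_population)


def general_fertility_rate(live_births: float, women_15_49: float) -> float:
    # GFR: live births per 1000 women of childbearing age (15-49 years)
    return rate_per_1000(live_births, women_15_49)


gfr = general_fertility_rate(2_514_118, 10_590_405)
print(f"GFR = {gfr:.1f} live births per 1000 women aged 15-49")
```

Because the GFR uses only women aged 15-49 as its denominator, it is several times larger than the CBR for the same number of births.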
Total Fertility Rate (TFR)
The Total Fertility Rate is the average number of children that would be born to a woman throughout her lifetime or childbearing age (15-49 years), if she were to pass through all her childbearing years at the same rates as the women now in each age group. The TFR sums up in a single number the age specific fertility rates of all women at a given point in time. If 5-year age groups are used, the sum of the rates is multiplied by 5. This measure gives the approximate magnitude of "completed family size". The TFR is one of the most useful indicators of fertility, because it gives the best picture of how many children women are having currently. In the previous example the total fertility rate is:
TFR = 5 × (sum of ASFRs per woman)
    = 5 × (1666/1000)
    = 5 × 1.666 = 8.33, i.e. about 8 children per woman.
The TFR is the best single measure to compare fertility across populations.
Exercise: Find the general fertility rate and the total fertility rate for the following data.
TFR = 5 × 1.276 = 6.38
i.e. 6.38 children per woman in her reproductive life.
Exercise
Consider the following data on the births and the female population in a state for the year 2000. Find the age specific fertility rates and the total fertility rate.
Columns: age group, births in 2000, female population
Solution
Age group   Female population   Live births   ASFR
15-19           1,237,721          117,583    0.095
20-24             978,136          268,987    0.275
25-29             979,623          283,111    0.289
30-34             989,693          254,351    0.257
35-39             814,243          162,034    0.199
40-44             548,882           57,633    0.105
45-49             406,540           22,766    0.056
Total           5,954,838        1,166,465    1.276
Sum of ASFRs = 419.7 per 1,000 women
TFR = 5 × 419.7 = 2,098.5 live births per 1,000 female state residents in 2000 who live through their reproductive years, i.e. about 2.1 children per woman.
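To check the ASFR table for the first exercise, the short Python sketch below recomputes each ASFR (per woman) and the TFR. It uses 254,351 births for ages 30-34, since that figure matches both the printed ASFR of 0.257 and the printed column total of 1,166,465. The function names are hypothetical.

```python
# Female population and live births by 5-year age group (from the table above)
age_groups = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
women = [1_237_721, 978_136, 979_623, 989_693, 814_243, 548_882, 406_540]
births = [117_583, 268_987, 283_111, 254_351, 162_034, 57_633, 22_766]


def asfr(births_in_group: int, women_in_group: int) -> float:
    """Age specific fertility rate per woman (multiply by 1000 for per 1000)."""
    return births_in_group / women_in_group


def total_fertility_rate(rates, width: int = 5) -> float:
    """TFR = width of the age groups x sum of the ASFRs (per woman)."""
    return width * sum(rates)


rates = [asfr(b, w) for b, w in zip(births, women)]
for group, r in zip(age_groups, rates):
    print(f"{group}: ASFR = {r:.3f}")

tfr = total_fertility_rate(rates)
print(f"TFR = {tfr:.2f} children per woman")
```

If the ASFRs are expressed per 1000 women, as in the state exercise, the same multiplication by 5 gives the TFR per 1000 women.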
The Gross Reproduction Rate (GRR) is the average number of daughters that would be born to a woman throughout her lifetime or childbearing age (15-49 years), if she were to pass through all her childbearing years at the current age specific fertility rates. This rate is like the TFR except that it counts only daughters and literally measures "reproduction": a woman reproducing herself by having a daughter. The GRR is calculated by multiplying the TFR by the proportion of births that are female (which can be derived from the sex ratio at birth).
The child-woman ratio is defined as the number of children 0-4 years of age per 1000 women of childbearing age (15-49 years). This ratio is used where birth registration statistics do not exist or are inadequate. It is estimated from census data.
• The number of deaths occurring in that population group (numerator)
• A time period.
The crude death rate (CDR) is the number of deaths per 1000 population in a given year. As its name implies, the CDR is not a sensitive measure (indicator) of the health status of a population. It is particularly affected by the age structure of the population. The crude death rate also varies between populations of the world.
Death rates can be calculated for specific age groups in order to compare mortality at different ages, e.g. for infants (under one year of age), children 1-4 years of age, children under five years, etc.
The infant mortality rate (IMR) is the number of deaths of infants under one year of age (0-11 months of age) per 1000 live births in a given year. Infants (children under one year of age) are at a higher risk of death than any other age group except the very old. The infant mortality rate is considered to be a sensitive indicator of the health status of a community because it reflects the socio-economic conditions of the population, i.e. the level of education, environmental sanitation, adequate and safe water supply, communicable diseases, provision of health services etc. These factors mostly affect infants and children under five years of age. Hence, the IMR varies widely between countries of the world.
The post-neonatal age is the period between one month and one year of age. The post-neonatal mortality rate is the number of deaths of infants aged one month (four weeks) up to one year (1-11 months of age) per 1000 live births. The post-neonatal mortality rate reflects deaths due to factors related to:
• Environmental sanitation
• Nutritional problems
Maternal Mortality (Death) Rate (MMR)
Sex Specific Death Rate is the number of deaths among a specific sex group
(males or females) per 1000 population of the same sex group. Sex specific
mortality rate is used to determine which sex group is at higher risk of death
than the other.
the CDR of two regions A and B in terms of the age specific death rates:-
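$$\mathrm{CDR}_A = \frac{D^A}{P^A} \times 1000 = \frac{\sum_x M_x^A P_x^A}{\sum_x P_x^A}$$
$$\mathrm{CDR}_B = \frac{D^B}{P^B} \times 1000 = \frac{\sum_x M_x^B P_x^B}{\sum_x P_x^B}$$
where $D$ is the total number of deaths in the region, $P_x$ the population at age x, $P = \sum_x P_x$ the total population and $M_x$ the age specific death rate (per 1000) at age x.
This shows the CDR as the weighted arithmetic mean of the age specific death rates, the weights being equal to the proportion of the population in the corresponding ages, i.e.
$$\frac{P_x}{\sum_x P_x}.$$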
This implies that even if the age specific death rates in the two populations are identical, their CDRs will differ as a consequence of the difference in the age distributions of the two populations.
To remove this defect we have to use the same set of weights in taking the weighted average of the SDRs (specific death rates) of the two regions. The index thus obtained is called the standardized death rate or adjusted death rate. When the age distribution is taken as the basis for finding the standard set of weights, the standardized death rate is called the age standardized death rate or age adjusted death rate. Similarly, it may be adjusted for other characteristics such as sex, occupation etc. and similarly interpreted. A death rate may also be adjusted for more than one factor simultaneously. There are two methods of standardization:
1. Direct Method of Standardization: In this method we take some other population, called a standard population. For computing the age adjusted death rate we take the population of the different age groups in the standard population as the set of weights.
Let $P_x^S$ be the number of persons at age x in the standard population. Then the standardized death rates (adjusted death rates) for the two regions A and B are given by:
$$\mathrm{STDR}_A = \frac{\sum_x M_x^A P_x^S}{\sum_x P_x^S}, \qquad \mathrm{STDR}_B = \frac{\sum_x M_x^B P_x^S}{\sum_x P_x^S}$$
Thus the age-adjusted death rate is simply the CDR that would be observed in the standard population if it were subject to the age specific death rates of the given region. The standard population is generally taken to be the actual population of a bigger region of which both A and B are subsets.
Thus standardized death rate is a good index for comparing the mortality
conditions of two regions. Any differences in their age specific death rates
will be faithfully reflected.
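Here is a Python sketch of the direct method on hypothetical data, with the combined population of A and B as the standard, as suggested above. Region B has the older age structure, so its crude rate is inflated. After standardization the ranking of the two regions reverses. All names and figures are illustrative.

```python
def weighted_rate(asdr_per_1000, weights):
    """Weighted mean of age specific death rates using population weights."""
    return sum(m * w for m, w in zip(asdr_per_1000, weights)) / sum(weights)


def direct_standardized_rate(asdr_region, standard_population):
    # Direct method: apply the region's ASDRs to the standard age structure
    return weighted_rate(asdr_region, standard_population)


# Hypothetical data for three age groups (0-14, 15-64, 65+)
pop_a, asdr_a = [30_000, 60_000, 10_000], [20.0, 2.0, 40.0]
pop_b, asdr_b = [20_000, 50_000, 30_000], [18.0, 2.5, 35.0]
# Standard population: the combined population of regions A and B
standard = [a + b for a, b in zip(pop_a, pop_b)]

cdr_a, cdr_b = weighted_rate(asdr_a, pop_a), weighted_rate(asdr_b, pop_b)
stdr_a = direct_standardized_rate(asdr_a, standard)
stdr_b = direct_standardized_rate(asdr_b, standard)
print(f"CDR:  A = {cdr_a:.2f}, B = {cdr_b:.2f}")
print(f"STDR: A = {stdr_a:.2f}, B = {stdr_b:.3f}")
```

On crude rates A looks healthier (11.20 against 15.35). However, B has lower death rates in every age group, and the standardized rates (14.10 against 12.875) reflect that.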
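the formula for the indirect age adjusted death rate is given by
$$\mathrm{STDR}_A = \left(\frac{D^A}{\sum_x P_x^A} \times 1000\right) \times \frac{\sum_x M_x^S P_x^S \big/ \sum_x P_x^S}{\sum_x M_x^S P_x^A \big/ \sum_x P_x^A}$$
where $D^A$ is the total number of deaths in region A and $M_x^S$ are the age specific death rates of the standard population. The first factor is the CDR of region A. The ratio compares the CDR of the standard population with the CDR that region A would have if it experienced the standard age specific death rates.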
The standardized death rate suffers from the drawback that its value depends on the age and sex composition of the standard population chosen, which may be quite different from that of the region under consideration. If there is not much variation between the age distribution of the standard population and that of the region under consideration, there is not much problem. However, if the age distributions of the two regions under comparison differ significantly, it is better to take as standard the population of the region whose death rate is of more concern to us. Also, the choice of the standard region is quite subjective and may introduce bias in the results. The indirect method of standardization is used whenever the requisite data for computing the STDR by the direct method are not available. The two methods give exactly the same result if the SDRs of the given region happen to be proportional to the SDRs of the standard region.
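The indirect method needs only region A's total deaths and its age structure, not its age specific death rates. The Python sketch below checks the equivalence claimed above on hypothetical data. Region A's ASDRs are set to exactly 1.2 times the standard's, and under that condition the direct and indirect results coincide. Function names and figures are illustrative.

```python
def weighted_rate(asdr_per_1000, weights):
    """Weighted mean of age specific death rates using population weights."""
    return sum(m * w for m, w in zip(asdr_per_1000, weights)) / sum(weights)


def indirect_standardized_rate(total_deaths_a, pop_a, asdr_std, pop_std):
    """Indirect method: uses region A's total deaths and age structure only."""
    cdr_a = total_deaths_a / sum(pop_a) * 1000
    cdr_std = weighted_rate(asdr_std, pop_std)
    # CDR that region A would have if it experienced the standard ASDRs
    expected_cdr_a = weighted_rate(asdr_std, pop_a)
    return cdr_a * cdr_std / expected_cdr_a


# Hypothetical data for three age groups; A's ASDRs are 1.2 x the standard's
asdr_std, pop_std = [10.0, 1.0, 50.0], [50_000, 110_000, 40_000]
asdr_a, pop_a = [1.2 * m for m in asdr_std], [30_000, 60_000, 10_000]
deaths_a = sum(m / 1000 * p for m, p in zip(asdr_a, pop_a))  # about 1032 deaths

indirect = indirect_standardized_rate(deaths_a, pop_a, asdr_std, pop_std)
direct = weighted_rate(asdr_a, pop_std)  # direct method, for comparison
print(f"indirect = {indirect:.2f}, direct = {direct:.2f}")
```

If region A's rates were not proportional to the standard's, the two figures would generally differ.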
Example Which of the two locations, for which mortality data are given
below, is in your opinion healthier?
2.1.3 Measures of population growth
• Vital index
• Replacement index
Example The following figures were obtained from a sample registration system in a particular city in the year 2000.
Mid-year population = 35,272,000
Total births = 187,236
Total deaths = 72,299
Find the crude rate of natural increase.
$$\text{CRNI} = \frac{187236 - 72299}{35272000} \times 1000 \approx 3.26 \text{ per 1000 population}$$
Vital Index
An index that takes into account the two most vital events, namely births and deaths. The index is given by
$$\text{Vital index} = \frac{\text{Total births}}{\text{Total deaths}}$$
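Both growth measures can be computed from the same three figures. The Python sketch below applies them to the sample registration example. The function names are hypothetical. Some texts report the vital index per 100 deaths, which only rescales the result by 100.

```python
def crude_rate_of_natural_increase(births, deaths, midyear_population):
    """(Births - deaths) per 1000 mid-year population."""
    return (births - deaths) / midyear_population * 1000


def vital_index(births, deaths):
    """Births per death (multiply by 100 to express it per 100 deaths)."""
    return births / deaths


crni = crude_rate_of_natural_increase(187_236, 72_299, 35_272_000)
vi = vital_index(187_236, 72_299)
print(f"CRNI = {crni:.2f} per 1000, vital index = {vi:.2f}")
```

A vital index above 1 (about 2.59 here) means births exceed deaths, so the population grows by natural increase.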
• None of them dies before reaching the age at the upper limit of the reproductive age group
• All of them experience, throughout their reproductive span, the current level of fertility
However, this information may not always be available, although the total number of births may be known. In such a case an approximate value of the GRR may be computed from the following expression, which is based on the assumption that the sex ratio at birth is more or less constant over all ages of mothers:
$$\text{GRR} = \text{TFR} \times \frac{\text{Total number of female births}}{\text{Total number of births}}$$
The gross reproduction rate may be used for comparing the fertility situation in different regions. However, because of under-registration of births and wrong statements about the age of the mother at the time of registration, the value of the gross reproduction rate may not be accurate. Also, it assumes that none of the newly born girls dies before the end of the childbearing ages, which inflates the number of future mothers.
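The approximate GRR formula can be sketched in Python as follows. For illustration only, it combines the TFR of 6.38 from the earlier exercise with a hypothetical sex ratio at birth of 105 males per 100 females, that is, 100 female births in every 205 births. The function names are illustrative.

```python
def female_share_from_sex_ratio(sex_ratio_at_birth: float) -> float:
    """Proportion of births that are female, given male births per 100 female births."""
    return 100 / (100 + sex_ratio_at_birth)


def gross_reproduction_rate(tfr: float, female_births: float, total_births: float) -> float:
    """Approximate GRR = TFR x (female births / total births)."""
    return tfr * female_births / total_births


# Illustration only: TFR 6.38 with a hypothetical sex ratio at birth of 105
grr = gross_reproduction_rate(6.38, 100, 205)
print(f"GRR = {grr:.2f} daughters per woman")
```

A GRR above 1 means that, ignoring mortality, each woman would more than replace herself with daughters.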
actly replace itself in the next generation provided the current fertility and
mortality rates prevail in future also. In this case population will have a
tendency to remain constant in size. If net reproduction rate is greater than
one, population has a tendency to increase as in this case each female will be
replaced by more than one daughter. Similarly, if the NRR is less than 1, the population will tend to decline. However, the net reproduction rate should not be used for forecasting the size of the future population, as it assumes that current rates of fertility and mortality will also prevail in future, an assumption which is generally not true. It also ignores migration and considers the age and sex distribution of a hypothetical life table population that may be quite different from that of the actual population.
The most common ratios are the sex ratio and the dependency ratio.
The sex ratio relates the number of males to the number of females in the same population and therefore measures the numerical balance between the sexes.
$$\text{Sex ratio} = \frac{\text{Total males}}{\text{Total females}} \times 100$$
A sex ratio greater than 100 indicates an excess of males over females; a value less than 100 indicates an excess of females over males.
Example 1 The total number of male births registered in Kenya in 1960 was 16218 and the number of female births was 14219. Find the sex ratio at birth.
Solution
$$\text{SRB} = \frac{16218}{14219} \times 100 = 114.06$$
i.e. about 114 male births per 100 female births in the population.
Example 2 The total number of births in Kenya in 1970 was 24216. Find the number of male and female births using the sex ratio at birth from the previous example.
Solution
$$\text{Number of male births} = \frac{114}{214} \times 24216 = 12900$$
$$\text{Number of female births} = \frac{100}{214} \times 24216 = 11316$$
Example 3 The sex ratio at birth recorded in a given population is 113. If the true sex ratio at birth is taken to be 105 male births per 100 female births, find the number of female births that would be expected for every 113 male births.
$$\frac{105}{100} = \frac{113}{x} \quad\Rightarrow\quad x = \frac{100}{105} \times 113 = 107.6 \approx 108$$
Only 100 female births were registered for every 113 male births, compared with about 108 expected. The calculation therefore suggests a relative under-registration of about 8 percent of females in the registration system.
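The three sex ratio calculations above take only a few lines each. The Python sketch below reproduces Examples 1 to 3. The helper names are hypothetical.

```python
def sex_ratio(males: float, females: float) -> float:
    """Males per 100 females."""
    return males / females * 100


def split_births(total_births: float, sex_ratio_at_birth: float):
    """Split total births into (male, female) using males per 100 females."""
    male = total_births * sex_ratio_at_birth / (sex_ratio_at_birth + 100)
    female = total_births * 100 / (sex_ratio_at_birth + 100)
    return male, female


def expected_females(observed_males: float, true_sex_ratio: float) -> float:
    """Female births expected for `observed_males` male births at the true sex ratio."""
    return observed_males * 100 / true_sex_ratio


srb = sex_ratio(16218, 14219)              # Example 1
males, females = split_births(24216, 114)  # Example 2
expected = expected_females(113, 105)      # Example 3
print(f"{srb:.2f} {males:.0f} {females:.0f} {expected:.1f}")  # 114.06 12900 11316 107.6
```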
General sex ratio
This is the ratio of males to females in a given population. This ratio can be
obtained from census enumerations of the total population or from sample
surveys.
$$\text{GSR} = \frac{\text{All males}}{\text{All females}} \times 100$$
Example 1 The number of males from the 1989 census was 11516219 and the number of females was 13217430. Find the general sex ratio.
$$\text{GSR} = \frac{11516219}{13217430} \times 100 \approx 87$$
Dependency ratio
It is defined as the ratio of youths under 15 years of age plus persons aged 65 and above to adults aged 15 to 64 years. It indicates the relative predominance of persons in the dependent ages in relation to those in the productive ages, as broadly defined in most social and economic systems.
Find the dependency ratio.
$$\text{Dependency ratio} = \frac{4414000 + 421231}{48335231} \times 100 \approx 10.0$$
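As a quick check of the arithmetic, here is a Python sketch of the dependency ratio applied to the figures printed in the example. The function name is hypothetical.

```python
def dependency_ratio(under_15: float, aged_65_plus: float, aged_15_64: float) -> float:
    """Dependants (under 15 plus 65 and over) per 100 persons aged 15-64."""
    return (under_15 + aged_65_plus) / aged_15_64 * 100


ratio = dependency_ratio(4_414_000, 421_231, 48_335_231)
print(f"Dependency ratio = {ratio:.1f} per 100 persons aged 15-64")
```

With these inputs the result is about 10 dependants per 100 persons of working age.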
Note: There exist many other ratios in demographic analysis, such as the ratio of births to deaths, the sex ratio at death and the ratio of married females to married males.
3 ERRORS IN DEMOGRAPHIC DATA
The accuracy of demographic statistics varies from one country to another. The deficiencies are greatest in developing countries because of, among other problems, a lack of administrative machinery, individuals' ignorance about certain personal details and sometimes open hostility to some types of inquiry. Errors in demographic data are mainly of two types, namely coverage errors and content errors.
3. The coding process: errors caused by the failure to allocate information on the census or survey schedule to the proper code.
Demographic data are usually classified by age and sex. Errors in age reporting are more frequent than errors in sex reporting. In Kenyan censuses and surveys, the reporting of age shows five major forms of irregularity:
• Age heaping.
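3.1.1 Whipple's Index of Digit Preference
This gives the relative preference for the digits '0' and '5' when reporting ages in the interval 23 to 62 years. It is computed as
$$W_I = \frac{P_{25} + P_{30} + P_{35} + P_{40} + P_{45} + P_{50} + P_{55} + P_{60}}{\frac{1}{5}\sum_{x=23}^{62} P_x} \times 100$$
To test for heaping at ages ending in '0' alone, i.e. 30, 40, 50 and 60, the index is computed as
$$W_I = \frac{P_{30} + P_{40} + P_{50} + P_{60}}{\frac{1}{10}\sum_{x=23}^{62} P_x} \times 100$$
To test for heaping at ages ending in '5', i.e. 25, 35, 45 and 55, the index is computed as
$$W_I = \frac{P_{25} + P_{35} + P_{45} + P_{55}}{\frac{1}{10}\sum_{x=23}^{62} P_x} \times 100$$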
• It does not measure preference for digits other than '0' and '5'.
• It considers only the arbitrary interval 23 to 62 years and not the entire life span of 0 to 80 or 100 years.
• It does not take into consideration the decreasing nature of the age distribution due to depletion by death.
• It is applicable only to single-year age data.
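Here is a Python sketch of Whipple's index that follows the formulas above. Population counts are given by single year of age. The `digits` argument selects the combined '0' and '5' version or one of the single-digit versions. The names and test populations are hypothetical. The index runs from 100, when there is no preference, to 500, when every reported age in 23-62 ends in the tested digit(s).

```python
def whipple_index(pop_by_age, digits=(0, 5)):
    """Whipple's index over reported ages 23-62.

    pop_by_age maps single years of age to population counts. digits=(0, 5)
    gives the combined index; (0,) or (5,) gives the single-digit versions.
    """
    ages = range(23, 63)
    total = sum(pop_by_age.get(age, 0) for age in ages)
    heaped = sum(pop_by_age.get(age, 0) for age in ages if age % 10 in digits)
    # Each tested digit should hold about 1/10 of the population aged 23-62
    expected = total * len(digits) / 10
    return heaped / expected * 100


uniform = {age: 1000 for age in range(101)}                          # no preference
extreme = {age: (1000 if age % 5 == 0 else 0) for age in range(101)}  # all on 0 and 5
print(whipple_index(uniform), whipple_index(extreme), whipple_index(extreme, digits=(0,)))
# 100.0 500.0 500.0
```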
This index is used for evaluating single-year age-sex data. It can give the extent of digit preference for all the digits 0, 1, 2, ..., 9. It can be used to report errors for all ages 10-99 years.
Assumption
The underlying assumption of the method is that, in the absence of systematic irregularities in the reporting of age, the blended sum at each terminal digit should be approximately equal to 10 percent of the total blended population. If the sum at any given digit exceeds 10 percent of the total blended population, it indicates over-selection of ages ending in that digit (i.e. digit preference). On the other hand, a negative deviation, i.e. a sum that is less than 10 percent of the total blended population, indicates under-selection of the ages ending in that digit (i.e. digit avoidance). If age heaping is non-existent, the index would be approximately zero.
Example
Use Myers' blended index to assess the quality of the age data given below.
Procedure for computations
1. Sum all the populations ending in each digit over the whole range, i.e. 10-99.
3. Multiply the sums in (1) by the coefficients 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10.
5. Add the products of (3) and (4) to obtain the blended sum.
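Some steps of the procedure are missing from these notes, so the Python sketch below fills them in with assumptions, following the commonly described version of Myers' method rather than anything stated here. Step (1) sums ages 10-89 and step (2) sums ages 20-99 for each terminal digit. Steps (3) and (4) weight these sums by 1, ..., 10 and 9, ..., 0. The index is taken as half the sum of the absolute deviations from 10 percent. The function name and test populations are hypothetical.

```python
def myers_index(pop_by_age):
    """Myers' blended index for single years of age 10-99 (assumed convention).

    Returns (index, deviations), where deviations[d] is the percentage-point
    deviation of digit d's blended share from 10 percent.
    """
    blended = []
    for digit in range(10):
        s1 = sum(pop_by_age.get(10 * k + digit, 0) for k in range(1, 9))   # ages 10-89
        s2 = sum(pop_by_age.get(10 * k + digit, 0) for k in range(2, 10))  # ages 20-99
        # Coefficients 1..10 for the first sums and 9..0 for the second sums
        blended.append((digit + 1) * s1 + (9 - digit) * s2)
    total = sum(blended)
    deviations = [100 * b / total - 10 for b in blended]
    index = sum(abs(d) for d in deviations) / 2
    return index, deviations


# Linearly declining hypothetical population: blending removes the decline
smooth = {age: 2000 - 10 * age for age in range(10, 100)}
# Same population with extra reports at ages ending in 0 (age heaping)
heaped = {age: n + (300 if age % 10 == 0 else 0) for age, n in smooth.items()}

index_smooth, _ = myers_index(smooth)
index_heaped, dev = myers_index(heaped)
print(f"smooth: {index_smooth:.2f}; heaped: {index_heaped:.2f}, digit 0 deviation {dev[0]:+.2f}")
```

The smooth population scores zero even though it declines with age, which is the point of the blending. The heaped population shows a positive deviation for digit 0 (preference) and negative deviations for the other digits.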
This index, proposed by the United Nations, is used for evaluating five-year age-sex data. The index is also referred to as the Joint Score. It has three components:
(a) Average sex ratio score (S)
This score is obtained by first calculating the sex ratio for each age group. Successive differences, irrespective of sign, are added and averaged (S). The age specific sex ratio is
$$SR_x = \frac{{}_5P_x^m}{{}_5P_x^f} \times 100$$
where ${}_5P_x^m$ = males aged x to x + 5 and ${}_5P_x^f$ = females aged x to x + 5.
(b) Average male age ratio score (M)
For each age group for males, calculate the age ratio
$$AR_x = \frac{{}_5P_x}{\frac{1}{2}\left({}_5P_{x-5} + {}_5P_{x+5}\right)} \times 100$$
The deviations from 100, irrespective of sign, are added and averaged (M).
(c) Average female age ratio score (F)
For each age group for females, the age ratios are calculated using the same formula as for males. The deviations from 100, irrespective of sign, are added and averaged (F). The index is then computed as
$$\text{UNAI} = 3S + M + F$$
The reported age-sex data for a given population is presumed to be ac-
curate if the age-sex accuracy index is between 0 and 19.9, inaccurate if the
index is between 20 and 39.9, and highly inaccurate if the index is above 40.
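Here is a Python sketch of the UN age-sex accuracy index (joint score) and the classification thresholds above. It assumes the caller passes male and female counts for the same consecutive 5-year age groups, for example 10-14 up to 65-69. Age ratios are computed only for interior groups that have both neighbours. Names and data are hypothetical.

```python
def _mean_abs(values):
    values = list(values)
    return sum(abs(v) for v in values) / len(values)


def age_ratios(pop):
    """Age ratio x 100 for each interior 5-year group: 5Px / mean(5Px-5, 5Px+5)."""
    return [pop[i] / ((pop[i - 1] + pop[i + 1]) / 2) * 100 for i in range(1, len(pop) - 1)]


def un_age_sex_accuracy_index(males, females):
    """UN joint score 3S + M + F from consecutive 5-year age groups."""
    sex_ratios = [m / f * 100 for m, f in zip(males, females)]
    s = _mean_abs(b - a for a, b in zip(sex_ratios, sex_ratios[1:]))  # successive differences
    m = _mean_abs(r - 100 for r in age_ratios(males))
    f = _mean_abs(r - 100 for r in age_ratios(females))
    return 3 * s + m + f


def classify(score):
    # Thresholds as given in the notes
    if score < 20:
        return "accurate"
    if score < 40:
        return "inaccurate"
    return "highly inaccurate"


# Hypothetical populations in 12 five-year age groups
smooth_m = [5000 - 300 * i for i in range(12)]
smooth_f = [5100 - 300 * i for i in range(12)]
jagged_m = [n + (400 if i % 2 else -400) for i, n in enumerate(smooth_m)]

score_smooth = un_age_sex_accuracy_index(smooth_m, smooth_f)
score_jagged = un_age_sex_accuracy_index(jagged_m, smooth_f)
print(f"{score_smooth:.2f} {classify(score_smooth)}; {score_jagged:.2f} {classify(score_jagged)}")
```

Linear age distributions give age ratios of exactly 100, so the smooth data score close to zero. The alternating over- and under-reporting in `jagged_m` drives the score well above 40.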
Example
4 Life Table
A life table gives the mortality experience of a hypothetical group of people (called the cohort) starting life together and experiencing the same mortality conditions, as given by the observed age specific death rates, throughout their lifetime. It shows how this cohort gets depleted through deaths at each age until finally nobody is alive. One can compute the probability that a person of some given age will live for a specific number of years. It also enables one to compute the average longevity per person. The data for the construction of life tables are taken from censuses and death registers. As a number of factors cause differential mortality, life tables may be constructed on the basis of each one of them, such as religion, sex, occupation etc. The life table has been a key tool of actuaries for some 200 years and is the basis for calculating life expectancy.
• They are used by the government for determining retirement benefits
for its employees, for knowing the number of senior citizens, for know-
ing the future size of the population for estimating the school going
population etc.
• Further, one could also study the impact of this cause on expectation
of life.
The cohort or generation life table is usually constructed for groups still
living and stops at the present age.
The period (cross-sectional or conventional) life table expresses the mortality experience that a hypothetical cohort would have if it experienced the mortality rates observed in a given period.
1. Start with a hypothetical group of infants, say 100,000, all born at the same time.
2. The deaths are distributed uniformly over the age interval (x, x+1) (an
assumption which is not valid for early years of life, especially for age
0).
4. $d_x$: number of persons who attain age x and die before reaching age (x+1); $d_x = l_x - l_{x+1}$
5. $q_x$: probability that a person of exact age x will die before reaching age (x+1); $q_x = d_x / l_x$
6. $p_x$: probability that a person of exact age x will survive to age (x+1); $p_x = l_{x+1} / l_x$
7. $m_x$: central death rate, i.e. the rate at which persons in the age group x to (x+1) die while in this age group; $m_x = \frac{2q_x}{2 - q_x}$
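8. $\mu_x$: force of mortality, the ratio of the instantaneous rate of decrease in $l_x$ to the value of $l_x$; $\mu_x = -\frac{1}{l_x}\frac{dl_x}{dx} = -\frac{d \log l_x}{dx}$
10. $T_x$: total number of years lived by the cohort after attaining age x; $T_x = L_x + L_{x+1} + \cdots$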
11. $e_x^0$: complete expectation of life at age x, the average number of years lived after age x.
13. $e_x$: average number of complete years lived by the cohort after age x; $e_x = e_x^0 - \tfrac{1}{2}$
NOTE: The $q_x$ column is called the pivotal column of a life table. Once the values of $l_0$ and $q_x$ are known, the entire life table can be constructed. The values of $q_x$ can be obtained from the relationship $q_x = \frac{2m_x}{2 + m_x}$, where the values of $m_x$ are calculated from the relationship $m_x = d_x / L_x$ on the basis of census data and death registration data.
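The columns listed above can be built mechanically once $m_x$ is known. The Python sketch below constructs a single-year life table from hypothetical central death rates. It assumes deaths are spread uniformly within each year of age (assumption 2 above), so $q_x = 2m_x/(2+m_x)$ and $L_x = (l_x + l_{x+1})/2$. It also assumes the last age is open-ended, closed with $q = 1$ and $L = l/m$. The complete expectation of life is computed as $T_x / l_x$. The function name and data are illustrative.

```python
def life_table(mx, radix=100_000):
    """Single-year life table from central death rates m_x (ages 0, 1, ...).

    Assumes deaths are uniform within each year of age, so
    q_x = 2 m_x / (2 + m_x) and L_x = (l_x + l_{x+1}) / 2. The last age is
    treated as open-ended (q = 1, L = l / m). Requires 0 < m_x < 2.
    """
    n = len(mx)
    qx = [2 * m / (2 + m) for m in mx]
    qx[-1] = 1.0  # everyone alive at the last age eventually dies
    lx = [float(radix)]
    for q in qx[:-1]:
        lx.append(lx[-1] * (1 - q))
    dx = [lx[i] - lx[i + 1] for i in range(n - 1)] + [lx[-1]]
    Lx = [(lx[i] + lx[i + 1]) / 2 for i in range(n - 1)] + [lx[-1] / mx[-1]]
    # T_x = L_x + L_{x+1} + ... : cumulative sum from the oldest age downward
    Tx = [0.0] * n
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += Lx[i]
        Tx[i] = running
    ex = [T / l for T, l in zip(Tx, lx)]  # complete expectation of life
    return {"q": qx, "l": lx, "d": dx, "L": Lx, "T": Tx, "e": ex}


mx_example = [0.1, 0.2, 0.5]  # hypothetical m_x for ages 0, 1 and 2+
table = life_table(mx_example)
for x in range(len(mx_example)):
    print(x, *(round(table[k][x], 4) for k in ("q", "l", "d", "L", "T", "e")))
```

A useful check is that $d_x / L_x$ gives back the $m_x$ used as input. This holds because the $q_x$ formula and the uniform-deaths assumption for $L_x$ are consistent with each other.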
Consider a large group, or "cohort", of males, for example, who were born
on the same day. If we could follow the cohort from birth until all members
died, we could record the number of individuals alive at each birthday – age
x, say – and the number dying during the following year. The ratio of these
is the probability of dying at age x, usually denoted by q(x). It turns out
that once the q(x)’s are all known the life table is completely determined.
In practice such "cohort life tables" are rarely used, in part because individuals would have to be followed for up to 100 years, and the resulting life table would reflect historical conditions that may no longer apply. Instead, one generally works with a period, or current, life table. This summarizes the mortality experience of persons of all ages in a short period, typically one year or three years. More precisely, the death probabilities q(x) for every age x are computed for that short period, often using census information gathered at regular intervals (every ten years in Kenya). These q(x)'s are then applied to a hypothetical cohort of 100,000 people over their life span to produce a life table. An example is given below.
Table 1: Life table for U.S. Males. From: National Center for Health Statistics (1997).
• x: age
• d(x): number of deaths in the interval (x,x+1) for persons alive at age
x. Thus of the l(50)=89,867 persons alive at age 50, d(50) = 566 died
prior to age 51.
other methods are used; for details see the National Center for Health
Statistics (1997) or Schoen (1988). Note: m(x) = d(x)/L(x).]
• T(x): total number of person-years lived by the cohort from age x until
all members of the cohort have died. This is the sum of numbers in
the L(x) column from age x to the last row in the table.
Notes
1. Life expectancy is not the same as median survival time, the latter being the time at which only 50% of a cohort are still alive. For example, of the 100,000 persons alive at age 0, 51,387 are alive at age 75, and 48,565 are alive at age 76. The median survival time at birth (age 0) is thus between 75 and 76 additional years (and can be shown to be 75.5), while the life expectancy at birth is e(0) = 71.8 additional years.
There are two major types of life table which can be constructed:
Complete life table: a life table in which the mortality experience is considered in single years of age throughout the life span. All the columns of the life table are given for single years, and it is extremely detailed.
Abridged life table: a life table in which the measures are given not for every single year of age but for age groups.
Example
to be done in class
To construct a life table for a group of individuals with a common risk factor
(or factors), one may either
2. adjust the mortality rates in the basic life table by adding excess death
rates (EDRs) or multiplying by relative risks (RRs), either of which
may be available from published studies.
5 Graphical presentation
bers: the time (e.g., year) at which it occurs and the age (or other duration
measure) of the person to whom it occurs
6 POPULATION PROJECTION
Introduction
Forecasts are the basis for all forms of planning for the future. Even a
complacent assumption that life will continue much as in the past is a form
of forecast. Demographic forecasts, in particular, are fundamental to any
form of social, economic or business planning. Suppliers of any good or
service, from maternity hospitals to the manufacturers of coffins, can only
plan ahead if they have some idea how many potential users of their services
they can expect to be located where.
Some phenomena can be forecast with perfect accuracy. For example,
we know exactly when and where future solar eclipses will occur. Other
phenomena, such as the weather, are inherently impossible to forecast with
any certainty for more than a few days into the future. It seems unlikely that
we will ever be able to forecast future demographic developments with the
certainty that we can eclipses; we certainly cannot yet. Nevertheless, human
populations have two fundamental characteristics that reduce uncertainty
about how they will develop in the future:
• Second, one fundamental aspect of the human condition is that every
year that passes we all get exactly one year older until we eventually
die.
Total methods
Total methods calculate trends in the size of the population as a whole
using a mathematical model of population growth. They may then distribute
this total into sub-groups in ratio to the current structure of the population
or an extrapolated forecast of its structure. Therefore, such approaches are
sometimes known instead as ratio methods of projection.
Cohort component methods
Cohort component methods project each age group, sex and other cat-
egory of interest separately. They then aggregate the results to obtain the
total population. The term cohort emphasizes that an age group is made up
of people born at the same time who go through life together. The size of a
cohort at one age (and date) is strongly predictive of its size at other ages
(and dates).
Many population projections combine both approaches, although projec-
tions dominated largely by the cohort component approach are by far the
most common. Nevertheless, cohort component methods require many more
input data and assumptions than total methods and may be inappropriate:
extrapolate the population forward (or on occasion backward into the more
distant past).
The main steps involved in the procedure are to:
• estimate the parameters of the model from past estimates of the pop-
ulation
• extrapolate the fitted curves and read off the projected population.
The following are several mathematical functions that can be used to model
population growth
• arithmetic growth
• exponential growth
• logistic growth
• Geometric model
Arithmetic growth
The next simplest model is that of arithmetic or linear growth. This model
assumes that a constant numeric change occurs in the size of the population
in every period of the same length. A minimum of two estimates of the population for different dates are needed to estimate the annual increment in the population and project its size at other dates. The model can be fitted to a longer series of estimates of the population by means of a simple linear regression of population size on time.
Thus, if P(t) refers to the population at time t and P(t+n) refers to the population n years later:
$$P(t+n) = P(t) + a\,n, \qquad a = \frac{P(t+n) - P(t)}{n}$$
Exercise
If the population is 62,000 at time 275 and was 50,000 at time 225, what will
be the population at time 350?
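Here is a Python sketch of the arithmetic growth model. It contains a two-point version of the formulas above and an ordinary least-squares fit for a longer series of estimates, the "simple linear regression of population size on time" mentioned earlier. Both functions are illustrative, not from the notes. Applied to the exercise, the two-point version gives an increment of 240 per unit of time and a projected population of 80,000 at time 350.

```python
def arithmetic_projection(p1, t1, p2, t2, t_target):
    """Project P(t_target) assuming a constant increment a per unit of time."""
    a = (p2 - p1) / (t2 - t1)        # a = (P(t+n) - P(t)) / n
    return p2 + a * (t_target - t2)  # P(t+n) = P(t) + a * n


def fit_arithmetic(times, pops):
    """Least-squares line P(t) = b + a*t through a series of estimates; returns (a, b)."""
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(pops) / n
    sxy = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, pops))
    sxx = sum((t - t_mean) ** 2 for t in times)
    a = sxy / sxx
    return a, p_mean - a * t_mean


projected = arithmetic_projection(50_000, 225, 62_000, 275, 350)
slope, intercept = fit_arithmetic([225, 275], [50_000, 62_000])
print(projected, slope, intercept + slope * 350)
```

With only two estimates the regression line passes through both points, so it reproduces the two-point projection. With more estimates it averages out irregularities in the series.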
Exponential growth