MICS Booklet M2 Unit1 Final
The designations employed and the presentation of material in this publication do not imply the
expression of any opinion whatsoever on the part of UNICEF, UNESCO or of IIEP-UNESCO concerning
the legal status of any country, territory, city or area or of its authorities, or concerning the
delimitation of its frontiers or boundaries.
The ideas and opinions expressed in this publication are those of the authors and do not necessarily
reflect the views of UNICEF, UNESCO or IIEP-UNESCO.
LICENCE
This work is licensed under Attribution – Non-commercial use – ShareAlike 4.0 International. To see a
copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
ACKNOWLEDGEMENTS
This Module is the second and last module and one of the key documents of the UNICEF & IIEP-
UNESCO Dakar Short Course on the Use of MICS data for education sector (or situation) analysis and
monitoring.
The development and delivery of this training programme has been made possible through UNICEF’s
MICS-Education Analysis for Global Learning and Equity (MICS-EAGLE) initiative, which was launched
in 2018 with the objective of improving learning outcomes and equity in education by
addressing two critical education data problems: (i) data availability, i.e. gaps in key education data,
and (ii) data utilization, i.e. lack of effective data utilization by governments and education
stakeholders.
This module has been developed under the overall guidance of Suguru Mizunoya, Chief of the
Education Unit and Senior Advisor on Statistics and Monitoring at UNICEF HQ, Koffi Segniagbeto, Head
of IIEP-UNESCO Dakar Office and Luc Gacougnolle, Deputy Head Office. Therrezinha Fernandes Kinkin,
Head of IIEP-UNESCO Dakar Training Unit and Peggy Kelly, Statistics and Monitoring Specialist at
UNICEF HQ, have been responsible for the overall coordination of the course’s design and
development. Oswald Koussihouèdé, Education Policy Analyst at IIEP-UNESCO Dakar, Polycarp
Omondi Otieno, Education Policy Analyst and Planner at IIEP-UNESCO Dakar, Peggy Kelly and Sakshi
Mishra, Specialist in education at UNICEF HQ, have been responsible for the technical development of
the module.
CONTENTS
ACKNOWLEDGEMENTS
CONTENTS
LIST OF TABLES
LIST OF ACRONYMS
Introduction to Module 2 (M2)
Unit 1 Statistical Concepts
INTRODUCTION TO UNIT 1
1.1. STATISTICS
1.2. POPULATION AND STATISTICAL UNIT
1.3. CENSUS AND SAMPLE STUDIES
1.4. CATEGORIZATION OF VARIABLES
1.4.1 Quantitative variables
1.4.2 Categorical variables
1.5. DATA ANALYSIS
1.5.1 Measures of central tendency and choice
LIST OF TABLES
Table 1: Number of days of teacher absence
LIST OF ACRONYMS
EMIS Education Management Information System
Introduction to Module 2 (M2)
Module 1 explained how countries can work toward improving their education systems by conducting
ESAs and by monitoring progress made towards the achievement of SDG 4. The module also explained
that the quality of these two processes is highly dependent on the availability and quality of education
data, from which education indicators, at the core of said processes, are calculated. Module 1 then
introduced UNICEF’s MICS and the MICS-EAGLE initiative, which support countries in their efforts by
addressing two critical education data problems:
● Data availability i.e. gaps in key education data; and
● Data utilization i.e. lack of effective data utilization by governments and education
stakeholders.
This Module 2 on “Analyzing and interpreting key education indicators using MICS” is the core of this
training on the Use of MICS for education sector analysis and monitoring as it will focus on the
application of MICS data in the computation of selected education indicators.
Module 2 will present key statistical concepts used in analyzing surveys, highlight important education
indicators that can be calculated in MICS6, and provide a step-by-step approach for interpreting
education and SDG 4 indicators.
This module will be covered in three units:
1. The first unit will pick up from where Module 1 left off and present deeper statistical concepts
and analysis tools, including the population and statistical units, censuses and sample-based
studies, categorization of variables, types of analysis and use.
2. The second unit will focus on the identification of key education indicators calculable from
MICS data and their linkage with the SDG 4 indicators, with a brief explanation of how the
indicators are calculated.
3. The third and final unit will focus on how to interpret the indicators used in the MICS-EAGLE
factsheets, including the presentation of results and how these results can be used to inform
educational policy.
1.1. STATISTICS
In statistics, we generally distinguish between descriptive statistics and inferential statistics.
1. Descriptive statistics aim to summarize the characteristics of the dataset by analyzing the
degree of centrality (i.e., identifying the most typical or average observation) and heterogeneity
(i.e., how different or far from each other the observations are) of the data. These two objectives
are represented in measures of central tendency and measures of spread/variation.
Tables and graphs (scatter plots, curves, histograms, pie charts, etc.) may be used to present
the descriptive statistics in a way that is easy to understand and interpret. For example,
descriptive statistics can be used to describe the differences in school participation in various
regions, the evolution of enrolment over time, the distribution of scores in a class, or the
distribution of teachers by school in a region or in the country. The purpose of descriptive
statistics is to summarize data and provide overall or detailed characteristics of factors being
studied.
2. Inferential statistics help to draw conclusions or make predictions based on the data. They are based
on statistical hypothesis testing, which estimates how significant and substantive the
differences/effects across different groups and/or factors are on the outcomes studied. Unlike
descriptive statistics, inferential statistics make inferences about a population
based on samples, relying on statistical hypothesis testing. For example, in a learning
assessment where 500 learners were sampled in grade 3 in a particular country, a researcher
may use bar charts to describe the divide between learners with and without foundational skills
(descriptive statistics). Inferential statistics would take that further and make estimates or test
hypotheses about the rate of foundational learning skills of all children in grade 3 (i.e. the whole
population of grade 3) in that particular country. Inferential statistics are used for (i) estimating
population parameters and (ii) testing hypotheses, for example to answer research questions such as which
intervention is more effective in improving learning: providing teachers or providing textbooks.
Inferential statistics can also be used in answering research questions such as whether boys and
girls at the end of primary school perform similarly in reading and mathematics.
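To make the step from description to inference concrete, the grade 3 example can be sketched in Python. The count of 180 learners with foundational skills is a hypothetical figure, not a result from the text, and the normal-approximation interval assumes simple random sampling. Household surveys with more complex sample designs would need weights and design-adjusted standard errors, so this is only an illustration of the idea.

```python
import math

# Hypothetical figures: 500 sampled grade 3 learners (as in the example above),
# of whom 180 are assumed to have foundational skills.
n_sampled = 500
n_with_skills = 180


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 is roughly 95%)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * standard_error, p_hat + z * standard_error


p_hat, lower, upper = proportion_confidence_interval(n_with_skills, n_sampled)

# Descriptive statement: what was observed in the sample.
print(f"Share of sampled learners with foundational skills: {p_hat:.1%}")
# Inferential statement: a plausible range for all grade 3 children.
print(f"Approximate 95% confidence interval: {lower:.1%} to {upper:.1%}")
```

The first printed line only describes the 500 sampled learners, while the second is a statement about the whole grade 3 population, with its uncertainty made explicit.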
1.2. POPULATION AND STATISTICAL UNIT
A population is entirely defined by the information on one of the following two elements:
1. A characteristic of the individuals that constitute it: it is a characteristic possessed only by the
individuals of this population.
Examples:
● Education planners want to study the difference in school attendance between girls
and boys of primary school age. The study population is all children aged 6-11 years
(or the corresponding primary school age in the specific country). The individual is
the child.
● Researchers wish to examine the relationship between a mother’s education level and
the vaccination of her children. The study population is all women who have children
between, for instance, 0 and 5 years old. The individual is the woman.
3 Empirical knowledge or empirical evidence is information acquired by observation or experimentation instead of relying on
logic.
1.3. CENSUS AND SAMPLE STUDIES
A census is the procedure of systematically acquiring and recording information about all the
individuals of a given population. An example is the population and housing census, which is generally
carried out by countries once every ten years or so, to collect, analyze and publish demographic,
economic and social data on all the country's inhabitants at a given time.4
In the education sector, all countries conduct, conditions permitting, an annual school census,
collecting data on students, teachers, infrastructure, equipment, etc. across schools falling under the
Ministry of Education in the country. It is the central part of an EMIS. It is an outstanding source of
baseline data for determining education staffing and infrastructure requirements and for developing
policies, including education policies. However, some schools may fall outside the purview of the
annual school census, for example Koranic schools.
A sample study, on the other hand, is based on a subset of a population. For statistical purposes, a
sample is specifically designed, in terms of the number and characteristics of its individuals, so that results
of the analysis on that sample can be inferred to be valid for the whole population; we say that the
sample is representative of the whole population.5 Observing a subset of a population rather than an
entire population provides faster results at lower cost. Sample studies allow for an extensive set of
questions to be posed to respondents and often result in better quality data, as much more careful
collection is possible when few individuals (compared to a population) are surveyed. However, small
samples can result in uncertainty regarding the estimates calculated on the sample. This uncertainty is
called sampling error and can usually be quantified with confidence intervals. In general,
sample sizes for sample surveys are derived using a specific methodology that ensures
representativeness of the population while maintaining the desired level of precision for the subject being
studied.
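The link between sample size and precision can be illustrated with a short Python sketch. It uses the textbook formula for estimating a proportion under simple random sampling, n = z² p(1 − p) / e². This formula and the function name `required_sample_size` are assumptions chosen for illustration, not the methodology of any particular survey, which would typically also account for design effects and non-response.

```python
import math


def required_sample_size(expected_proportion, margin_of_error, z=1.96):
    """Textbook sample size for a proportion under simple random sampling.

    expected_proportion: anticipated share (0.5 is the most conservative choice)
    margin_of_error: desired half-width of the confidence interval
    z: critical value (1.96 corresponds to roughly 95% confidence)
    """
    n = (z ** 2) * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    return math.ceil(n)  # round up to a whole number of individuals


# A tighter margin of error requires a markedly larger sample.
for margin in (0.05, 0.03):
    size = required_sample_size(0.5, margin)
    print(f"Margin of error {margin:.0%}: about {size} individuals")
```

Moving from a margin of error of 5 to 3 percentage points nearly triples the required sample, which is why the desired precision has to be weighed against survey cost.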
There are two key types of sampling methods: probability sampling and non-probability sampling.
Probability sampling uses probability theory to choose a sample (i.e. every individual in the population has a known, non-zero chance
of being included in the sample). Within probability sampling, there are different
sampling methods, including simple random sampling, stratified sampling, cluster sampling, etc. When
working on a sample, the objective is usually to provide information that is applicable to the
population. If simple random sampling is used to draw the sample (and the sample is large enough),
then the sample estimates can be inferred directly to the population. When
other types of sampling are used (e.g., stratified random sampling), an analyst will need to compute
weights to make the link between the sample and the population. The extrapolation of the estimates
into population parameters is known as inferential statistics.
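The role of weights can be sketched with a small Python example. The two strata, their population sizes and the attendance counts are hypothetical, and the weight used here (stratum population divided by stratum sample size) is the simplest possible design weight. Real survey weights usually include further adjustments.

```python
# Hypothetical stratified sample: the same number of children is sampled in each
# stratum, although the rural stratum is four times larger in the population.
strata = [
    {"name": "urban", "population": 20000, "sampled": 200, "attending": 180},
    {"name": "rural", "population": 80000, "sampled": 200, "attending": 120},
]


def unweighted_attendance_rate(strata):
    """Attendance rate computed on the raw sample, ignoring the design."""
    attending = sum(s["attending"] for s in strata)
    sampled = sum(s["sampled"] for s in strata)
    return attending / sampled


def weighted_attendance_rate(strata):
    """Attendance rate where each sampled child counts for the children they represent."""
    weighted_attending = 0.0
    weighted_total = 0.0
    for s in strata:
        weight = s["population"] / s["sampled"]  # children represented per sampled child
        weighted_attending += weight * s["attending"]
        weighted_total += weight * s["sampled"]
    return weighted_attending / weighted_total


unweighted = unweighted_attendance_rate(strata)
weighted = weighted_attendance_rate(strata)
print(f"Unweighted sample rate: {unweighted:.1%}")
print(f"Weighted population estimate: {weighted:.1%}")
```

In this made-up case the unweighted rate overstates attendance because urban children are over-represented in the sample relative to their share of the population.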
4 In some countries, where data collection is facilitated by small distances, good infrastructure and efficient
administration, the census may be carried out every 5 years. In other countries, where vital statistics are systematically
recorded, the census information is updated annually based on a partial census.
5 A representative sample will include individuals from various groups of the population, various regions, various income
groups, etc. For instance, a sample that would only include urban schools or families would not be representative of the
whole population, as attendance and learning results are typically different in urban and rural contexts.
whereas categorical variables could be ordinal or nominal. Knowing the type of each variable is
important because it determines the statistical tools that can be used to analyze it.
Depending on the nature of the variables, many indicators can be computed. The following section
reflects on the types of analysis that can be performed, depending on the nature of the variables.
It is possible to convert a quantitative variable into a categorical variable, but not the reverse. This is
done by grouping data. For example, the age of children in a population or in a school is a quantitative
variable. This variable can be transformed into a categorical variable by grouping together students
who are of official primary school age (for instance ages 6-11) and students who are not of official
primary school age (under age 6 or over age 11). This new variable is categorical with two
levels.
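The grouping described above can be sketched in Python as follows. The list of ages is hypothetical, and the 6-11 range is the illustrative primary school age used in the text; it should be replaced with the official range of the country concerned.

```python
from collections import Counter


def primary_age_group(age, official_min=6, official_max=11):
    """Convert a quantitative age into a two-level categorical variable."""
    if official_min <= age <= official_max:
        return "official primary school age"
    return "not of official primary school age"


# Hypothetical ages of six students.
ages = [5, 6, 8, 11, 12, 14]
groups = [primary_age_group(age) for age in ages]

# Once grouped, the exact ages can no longer be recovered from the categories.
counts = Counter(groups)
for group, count in counts.items():
    print(f"{group}: {count}")
```

The conversion only works in one direction: knowing that a student is "not of official primary school age" does not tell us whether they are 5 or 14.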
Because of this one-way conversion possibility, categorical variables (e.g., age or income groups) in a
questionnaire are a potential handicap for the subsequent analysis if the right (most relevant)
categories are not selected when developing the survey. It is often preferable to ask individuals about
the quantitative variable and then empirically select the correct grouping based on the distribution of
the information collected.
6 When it is impossible or impractical to list all possible values of a categorical variable (for instance, the nationality of an
individual), a category “Other” is often used.
The choice between these different estimators depends on what question an analyst is attempting to
answer. Some statistics are considered robust, i.e., less sensitive to outliers in the distribution of a
variable. This is true of the mode and the median, which do not use all the data in their calculations.
Thus, an isolated error in the data usually has little or no effect on the median and may not affect the mode. On the other hand,
the mean considers all the data and is sensitive to outliers that may skew the result.
And although they are called “measures of central tendency”, remember that the mode and mean are
not always at the center of the distribution of a variable. Furthermore, these indicators alone cannot
summarize all the information contained in a dataset as illustrated in the following example.
Example: An inspector wants to study teacher absenteeism in the 4 schools in his district. Each of the
schools has 5 teachers whose unexcused absences are shown in the following table.
Using only the mean number of days teachers are absent to rank the schools, it appears that teachers
in school 4 are absent most often. However, a closer look at the data shows that this initial analysis
should be revised.
● School 1: the teachers had the same number of days of absence; the mean represents the
dataset well.
● Schools 2 and 3: the teachers have different numbers of days of absence, but the mean is the
same. It may be that in at least one case, the average does not represent the data well.
● School 4: only one of the 5 teachers was absent, and they were absent for a very long time,
while the others were always present. The mean is 12 and does not represent the data well.
These examples show that using the mean as the only indicator to summarize the data can be
misleading, particularly in assessing the dispersion, i.e. the way in which the values are distributed
around the mean.
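The point can be checked numerically with a short Python sketch. The figures for School 4 follow from the description above (five teachers, only one absent, mean of 12 days, hence 60 days for that teacher). The figures for Schools 1 to 3 are hypothetical values chosen only to match the pattern described, since the table itself is not reproduced here.

```python
import statistics

# Days of unexcused absence per teacher (five teachers per school).
# Schools 1-3: hypothetical values; School 4: implied by a mean of 12 with one absentee.
absences = {
    "School 1": [5, 5, 5, 5, 5],
    "School 2": [4, 4, 5, 6, 6],
    "School 3": [0, 0, 5, 10, 10],
    "School 4": [0, 0, 0, 0, 60],
}

summary = {}
for school, days in absences.items():
    summary[school] = {
        "mean": statistics.mean(days),
        "median": statistics.median(days),
        # Population standard deviation: all teachers of the school are observed.
        "std_dev": statistics.pstdev(days),
    }

for school, stats in summary.items():
    print(
        f"{school}: mean={stats['mean']}, median={stats['median']}, "
        f"standard deviation={stats['std_dev']:.1f}"
    )
```

With these values, Schools 2 and 3 share the same mean but differ in spread, and School 4 has the highest mean while its median is zero, so a measure of dispersion is needed alongside the mean to describe each school fairly.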
Comprehension Check 1
2. Select all statements that are correct about measures of central tendency and dispersion:
a. Using the mean alone to summarize data can be misleading, particularly in assessing the
dispersion
b. The mode and the median do not use all data in their calculations and are hence less sensitive to outliers
c. The mean considers all the data and is sensitive to outliers
d. The mode and mean are always at the center of the distribution of a variable.