Green Book, Part 2
You may be surprised that up to this point in the book there has not been much about
data collection and statistical analysis. That changes now! But you will still find little in the
way of mathematics or technical details. There are two reasons. First, many of the ideas
you need to design and analyse good studies can be explained and understood without
using mathematics. Secondly, there are many books around that describe the mathemat-
ics, and many of the courses in ‘research methods’, or ‘statistics’ that you will have
followed will have used a mathematical, rather than an intuitive, approach. We want to
provide an alternative.
These chapters can only be introductions to important ideas and methods. Maybe they
will be all you need. It is more likely that they will raise all sorts of questions that are
important in your research, and prompt you to seek out further understanding. They may
even help you make sense of that statistics course you took and hated so!
The more technical aspects of a research project are important, and sadly many
students have failed, or had to redo parts of their project, through failing to understand
them early enough. If the material here raises any questions or uncertainties then you
should get help. Biometricians and statisticians are experts in this stuff, so consult them!
And any successful researcher must also have a sound grasp, and should be able to help
you.
The topics covered in Part 4 are only a selection from those that could have been
included. We have used two criteria to select them:
1. The topic is essential to most projects, yet commonly misunderstood. (Included here
are the chapters on design of experiments, planning surveys, measurement, managing
and analysing data).
2. The topic is important but often omitted from research methods courses. (We have
included chapters on finding and using secondary and spatial data, and modelling).
The chapters are roughly arranged in the logical order in which they might occur in a
project. But please don’t wait until the end of your project to look at later chapters! The
design of the study depends on how you will analyse it, so you must be aware of the later
steps during early stages.
Richard Coe
4.1 Using secondary data sources
Jayne Stack
search study should be undertaken without a prior search of secondary sources (also
termed desk research). There are several grounds that give us confidence in making such a
bold statement. (The following material was adapted from Crawford and Wycoff, 1990):
• Secondary data helps you to: define a research problem, formulate research questions
and hypotheses, and select a research design. The assembly and analysis of secondary
data almost invariably makes an important contribution to the research process. A review
of existing knowledge will improve your understanding of the research problem, including
the key issues, core concepts, and on-going debates. It will reveal approaches to data
collection (e.g., useful conceptual models, variables for concepts of interest, appropriate
analysis techniques) that may improve or complement your own initial research design. In
sifting purposefully through secondary data, you may find something else that sends you
exploring new regions or ideas you may not even have thought of before. And, you might
find evidence that will actually change the shape of your ideas.
• Secondary data may be sufficient to answer the research question. Occasionally you may
find the available data are so adequate that primary data collection is unnecessary. If
useful secondary data are available, they can be used to substitute for primary data
collection at any stage during your research. It is not always necessary for you to collect
all the information required for the analysis yourself. For example, daily rainfall records for
the last 10 years obtained from the Meteorological Office allow you to draw conclusions
about the adequacy of the growing season and the problem of dry spells, or agricultural
data from a national sample survey can provide good information on the major character-
istics of a farming system.
• Data costs are substantially lower for secondary data than for primary data. A thorough
review of secondary sources can be completed at a fraction of the cost and time it takes
to complete even a modest primary data collection exercise. Finding a ‘ready made’
solution in existing sources is unlikely, but even partial solutions reduce primary data
collection needs, and therefore save time and money. For example, the current livestock
situation in a country in terms of stocking densities, grazing pressure, herd structure, and
management practices could be studied using a combination of secondary livestock data
from the Ministry of Agriculture, the veterinary services, and reports of past research
studies.
• Secondary sources of information can yield more accurate data than that obtained
through primary data collection. This is not always true, but when a government or international
agency has undertaken a large-scale survey or even a census, their results are likely to be
far more accurate than your own surveys when these are based on relatively small sample
sizes. For example, a national income and expenditure sample survey is likely to yield
more accurate results than an income and expenditure study of 200 sample households in
a single area. However, it should be remembered that all secondary data was once
someone else’s primary data. Some people who work with official statistics wrongly
conclude that their own analysis is more objective than analyses of primary data, which
is ‘soft’ data.
• Secondary sources help define the population. They can be extremely useful both in
defining the population and in structuring the sample you wish to take. For instance,
government statistics on a country’s agriculture will help to stratify a sample and, once
you have calculated your sample statistics, the stratification can be used to project
those estimates from the sample to the population.
• Secondary data can be used to make comparisons. Within and between nations and
societies, comparisons can enlarge the scope for generalisations and insights. Global and
regional data sets (e.g., those of the Food and Agriculture Organization of the United
Nations (FAO), World Resources Institute (WRI), or the World Bank) are a valuable source
of secondary data for between-country comparisons on a vast range of topics including
poverty issues, food security, trade patterns, growth rates, and technical change. Within-
country comparisons can be made using national data sets disaggregated by administra-
tive or natural regions.
• The availability of secondary data over time enables the employment of a longitudinal
research design. One can find baseline measurements in studies made in the past and
locate similar data collected more recently. With an increasing emphasis on understanding
patterns of change, secondary sources can also add a time dimension to single-point
surveys, which otherwise lack one.
• Secondary data can be used to increase the credibility of research findings obtained from
primary data. The comparative use of other research together with a comparison of data
collected during your study with official statistics on the same topic can be very valuable
when you reach the analysis stage. Research results are more credible when supported
by other studies.
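The stratification point above can be made concrete with a small numerical sketch. The Python fragment below projects stratified sample means up to a population mean and total; the stratum sizes would in practice come from government agricultural statistics, and all figures here are invented for illustration.

```python
# Sketch: projecting stratified sample estimates to the population.
# Stratum sizes (N_h) would come from secondary sources such as government
# agricultural statistics; the sample means come from your own survey.
# All numbers below are invented for illustration.
strata = {
    # name: (farms in stratum N_h, sample mean maize area in ha)
    "smallholder": (120_000, 1.4),
    "medium":      (15_000, 5.2),
    "large":       (1_200, 48.0),
}

total_farms = sum(n for n, _ in strata.values())
# Population total: sum over strata of N_h * stratum sample mean
pop_total = sum(n * mean for n, mean in strata.values())
# Population mean: weighted average of stratum means, weights N_h / N
pop_mean = pop_total / total_farms
print(f"Estimated total maize area: {pop_total:,.0f} ha")
print(f"Estimated mean area per farm: {pop_mean:.2f} ha")
```

Note how the secondary data enter twice: first in choosing the strata, then as the weights that carry the sample estimates to the population.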
potential sources of bias and errors. It helps considerably if you are able to speak to
individuals involved in the collection of the data to gain some guidance on the level of its
accuracy and limitations.
• Definitions. A common problem in using secondary data is how various terms were
defined by those responsible for its preparation. Terms such as family size, income, credit,
farm size, output sales, and price need very careful handling. For example, family size
may refer to only the nuclear family or include the extended family. In census data a
household is often a group of people who stayed the census night in the dwelling unit,
irrespective of whether they are part of the nuclear family or not. Income data often
exclude the value of own-produced goods. Credit and sales statistics often ignore trans-
actions that pass through the informal sector. Even apparently simple terms like the year
for which the data apply may need care in interpretation. For instance, in Zimbabwe, the
marketing year 2002/2003 refers to the period 1 April 2002–31 March 2003. Any crop sales
data recorded against 2002/2003 refers to sales from the 2002 harvest. Sales from the 2003
harvest are recorded under the 2003/2004 marketing year! Special care in interpreting
definitions and years is necessary in combining secondary data from several sources to
produce a derived data set.
• Timescale. Most secondary data has been collected in the past so it may be out-of-date
when you want to use it. If the data source includes estimates of growth rates this
information may be used to extrapolate figures for subsequent years. For example,
population censuses usually include an estimate of population growth that can be used to
estimate inter-census population data.
• Source bias. You should be aware of vested interests when you consult secondary
sources. The objectivity of officials may be affected when it comes to reporting situations
for which they themselves are partly responsible. Similarly, respondents may provide
biased information depending on their perceptions of the purpose of data collection (e.g.,
planning drought relief or forced destocking). Further, official economic data may be a very
inaccurate source of statistics in situations where the informal economy and/or black
market account for a significant share of economic transactions.
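The extrapolation mentioned under Timescale is simple compound growth. A minimal sketch, with an invented census figure and growth rate:

```python
# Sketch: carrying a census population forward using the published
# annual growth rate, as suggested under 'Timescale'. The census figure
# and growth rate below are invented examples, not real data.
def extrapolate(base_pop, annual_growth, years):
    """Compound growth: P_t = P_0 * (1 + r) ** t."""
    return base_pop * (1 + annual_growth) ** years

census_2002 = 1_250_000   # district population at the last census
growth_rate = 0.028       # 2.8% per year, as reported with the census
estimate_2007 = extrapolate(census_2002, growth_rate, 5)
print(f"Estimated 2007 population: {estimate_2007:,.0f}")
```

Where two census endpoints are available, it is safer to derive the rate from them than to rely on a projected figure published with the earlier census.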
Secondary sources can include:
• Government statistics. These may include population censuses, national income data,
agricultural statistics, poverty surveys, trade data, cost of living surveys, nutritional sur-
veys, the results of commissions of enquiry into particular issues (e.g., land tenure) and
possibly data on market prices
• Marketing boards, which are likely to have information on quantities purchased of differ-
ent commodities, imports and exports, buying and selling prices, and stocks
• Extension organisations who will have crop area and production estimates for various
crops and probably farm budget data for different enterprises
• Agricultural research institutes that are an important source of information on such
agronomic issues as soil fertility studies, crop and livestock breeding programmes and
technology
• Veterinary departments who may have data on livestock numbers and disease control
measures, e.g., dip tank records
• Hospitals and clinics might have data on incidence of malnutrition, particular diseases and
causes of death
• Local administration offices often have lists of households which could be useful in the
construction of sampling frames. They might also provide information on project activities
in the district, e.g., active NGOs, or registered cooperatives
• Archives are a useful source of information to help you understand patterns of change
• International organisations may have country studies available at their local information
centres or offices
• Websites. With the rapid development of information technology and computerised databases,
the scope for you to carry out a search of secondary sources and to use secondary data
sets compiled by other organisations and posted on websites, has increased dramatically.
The following is a selection of key websites providing access to statistical data of particu-
lar interest to African agricultural, environmental and rural development researchers.
• Millennium Development Goal Indicators has 48 social and economic indicators for 1985–
2000 used to monitor the implementation of the goals and targets of the United Nations
Millennium Declaration. http://milleniumindicators.un.org/
Non-official sources
• Consultants reports (which may be gathering dust on the shelves of the body sponsoring
the research!)
• Records of NGO activities including drought relief and supplementary feeding schemes
• Baseline surveys and project documents.
As you can see, secondary information can come from a bewilderingly large number of
sources. Perhaps the most efficient and effective way to begin is to talk to people. Find the
authorities in the field; search out the researchers working in your areas of interest.
Conversations with them can get you further faster than almost any other search method.
Researchers outside your own country can usually be contacted by e-mail and many are happy to
forward copies of their own publications. Develop a network of contacts in key positions and
cultivate them over time. Such contacts are particularly useful sources of semi-official and
unpublished reports from research institutions and universities. In addition, experienced
researchers have usually built up their own list of favourite websites that provide material on
key research themes in development.
• Publisher
• Place of publication.
For example:
• Study of consumption and marketing decisions of smallholders where the major source
of information is a household sample survey, but where secondary data on grain
production and marketed surplus by region are combined with official population data
to examine past trends in agricultural production and marketed output. Here the
analysis of secondary data provides a context for the analysis of the primary data.
• Investigation into the feasibility of edible insect farming using an experimental farm,
where secondary information on artificial feeding is used to identify alternative feeding
methods for field trials.
2. Research which uses aggregated secondary data as a major source of information, with
a new interpretation of this information. For example:
• International comparison of various development indicators using the World Bank’s
global data set.
• Regional human poverty comparisons made by the United Nations Development Pro-
gramme (UNDP) for Zimbabwe using a poverty assessment study survey undertaken by
the Ministry of Public Service, Labour and Social Welfare and other secondary data
sets (UNDP, 1998).
3. Research which uses disaggregated secondary data, perhaps in raw form, as a major
source of information, with a new analysis of the same data. For example:
• Modelling agricultural supply response using a data set derived from secondary data
found in official statistics
• Construction of a food balance sheet using official statistics
• Lenin’s famous analysis of peasant differentiation using Zemstvo house-to-house cen-
sus data as his major source of data (Lenin, 1961).
The conceptual and analytical tools used to interrogate secondary data will vary depend-
ing on the role that secondary data play in the study. For instance, if secondary data are the
major source of data for your research task, the analytical process (specification and
estimation) is likely to be a central component of your thesis. On the other hand, if you are
assembling secondary data to improve your understanding of the socio-economic conditions
in a field study area you are more likely to use simple descriptive statistics to highlight
important trends and characteristics.
Regardless of the way you intend to use secondary data, some general comments can be
made about methods of interrogating it.
General
Research, like any other type of thinking, can be thought of as involving two stages. [The
distinction between first-stage and second-stage thinking was first brought to my attention
in a highly recommended course called ‘Writing for effective change’, distributed by an NGO
called Fahamu, Learning for Change (Fahamu)]:
• First-stage thinking: exploration, discovery, generating ideas
• Second-stage thinking: collating, sifting, organising the ideas into a robust structure
First-stage thinking. Sometimes called ‘divergent’ or ‘radiant’ thinking; during this stage, you
explore and gather anything and everything that you think might be of interest or use to your
study.
Second-stage thinking. By contrast this is sometimes called ‘convergent’ or ‘focused’ think-
ing. It organises the material you have collected to support one or another of your ideas.
Mindmapping
Mindmapping has been around for a long time, but the person who has done most to
explain it and make it popular is Tony Buzan (1993). Mindmapping exploits our mind’s
extraordinary ability to create meaningful connections between ideas. Mindmapping helps
us to see – or make – connections in our thinking, increasing our creativity and making
thinking more efficient.
Brainstorming is the first step in mindmapping. Figures 1–3 show how a mindmap was
developed to think about how to improve feeding systems using traditional practices (the
mindmap example is adapted from ‘Writing for effective change’, distributed by an NGO
called Fahamu, Learning for Change). Begin by writing the main research question or
concept in a circle in the centre of a page. Then, jot down any ideas that come to mind
when you think of this concept (Figure 1). As you think of each new idea, new branches
are created from the central balloon and the idea is written along the line (Figure 2). The
next step is ‘free associating’ on each idea to build a verbal map of words or images that
are connected to it. Sub-sets of
Figure 1. Example mindmap (4)
Figure 2. Example mindmap (5)
importance of each sector. If information on the volume, value and prices of exports of a
particular commodity are available over a period of time, then calculating instability indices
for each variable will demonstrate the level of export earnings instability and the extent to
which this is due to either export price instability or instability in quantity exported.
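The text does not fix a formula for an instability index; a common choice is the deviation of a series around its fitted log-linear trend. A sketch under that assumption, with invented export figures:

```python
# Sketch: instability index as deviation around a log-linear trend.
# This is one common definition, assumed here for illustration; the
# export figures below are invented.
import math

def instability_index(series):
    """Percent root-mean-square deviation around a log-linear trend."""
    n = len(series)
    xs = list(range(n))
    ys = [math.log(v) for v in series]
    # Ordinary least-squares fit of log(value) on time
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    intercept = ybar - slope * xbar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return 100 * math.sqrt(sum(r * r for r in residuals) / n)

# Export earnings = price x quantity, so comparing the three indices
# shows whether earnings instability stems mainly from prices or quantities.
prices =     [100, 120, 90, 130, 95, 125]
quantities = [50, 52, 49, 51, 50, 53]
earnings = [p * q for p, q in zip(prices, quantities)]
for name, s in [("price", prices), ("quantity", quantities), ("earnings", earnings)]:
    print(name, round(instability_index(s), 1))
```

For this invented series the price index dominates the quantity index, pointing to price movements as the main source of earnings instability.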
However, diagrams can also be a useful way of conceptualising links and feedbacks within
a system. For example, the livelihoods framework illustrated in Figure 4 is useful for thinking
through livelihood circumstances of individuals, households, villages, and even communities
and districts. The limitations of any such 2-dimensional representation of a process as
complex as livelihood formation are recognised from the outset. The purpose of such a
diagram is to organise ideas into manageable categories and identify the main components
(assets, mediating processes, activities) and the critical links and dynamic processes between
them (Ellis, 2000).
Analytical frameworks
Analytical frameworks are useful tools that enable you to concentrate on the broader picture.
For example, food policy analysts may compute a food balance sheet using secondary
information to examine food availability and identify key characteristics of domestic food
consumption. Economic statisticians use accounting frameworks to prepare a country’s na-
tional accounts and balance of payments based on secondary data.
Conclusions
A lot can be learned from secondary data and you should be prepared to explore various
alternative ways of interrogating available information. Data sources should always be ac-
knowledged and some guidance provided on the reliability and limitations of data used. In
practice the collection and interrogation of secondary data is not just a first-stage activity but
is something that can and should contribute to every stage of the research cycle. As noted
in the opening section, secondary data can assist in designing a sampling frame, and in
identifying a potentially useful method of analysis or appropriate conceptual framework.
Secondary information provides a context for the analysis of primary data. The comparative
use of secondary data can be especially valuable at the analysis stage and a good researcher
will highlight areas of contrast and similarity between their own data and research findings of
earlier studies on similar topics. Whilst findings gain more credibility if they are supported by
a number of other studies, you should not be afraid to indicate where findings are different.
4.2 Spatial data and geographic information
systems
Thomas Gumbricht
• Understanding the spatial context of an agricultural and resource management problem
will probably be an important part of solving it, so cannot be ignored in your research
• Geographical information systems (GIS) allow you to manage and manipulate spatial data
• Simple manipulation of data sets that have already been prepared can be learned quickly.
However, using data from multiple sources for more complex tasks can be a major
undertaking
• Many basic spatial data sets are available for Africa but poor Internet connections may
limit access to them
• Freely available software is now sophisticated enough to be useful in many spatial
research projects

Introduction
The Earth is a sphere with an average distance to the Sun of 150 million km. The Sun
radiates energy, which is received by the rotating Earth in diurnal cycles with annual
modulation as the Earth completes its annual ellipse. The energy that reaches the Earth is
mainly dissipated at the Earth’s surface. It drives the hydrological cycle, releases nutrients
that feed the ecosystems, and powers photosynthesis (all of which have been largely
altered by man since the Industrial Revolution). Thanks to these processes life exists and
the Earth’s surface has developed a ‘natural’ logic. In dry areas with poor resources
vegetation is sparse; in valleys where water and resources accumulate the vegetation is
more luxuriant. If there is a trough and enough water, a body of water will form. In a
similar way the human landscape is also logical, with fields in fertile valleys and dwellings
along the ridges. Cities have to be close to large sources of water. These logical landscapes
are also evident on a much smaller scale. Most vegetation is bound to specific habitats
narrowly defined by conditions of climate, soil and water that can shift within a scale as
small as one metre. At an even finer scale a human thought is also dependent on energy
dissipation at interfaces – in a very well described spatial context between the synapses of
nerve cells. Image analysis and location information systems are hence very important
tools in medicine, sociology, anthropology, biology, ecology, geology, hydrology and many
other sciences.
For a particular study the spatial information needed might only be a map – as were
the descriptive studies conducted by the first European explorers. In most instances a
researcher is probably more interested in extracting more information in order to test a
hypothesis. This could be comparing two district-level data sets, perhaps one on poverty
and one on incidence of malaria.
This is easily done in a geographic information system (GIS), and
you still only need a single map, with attributes (databases) on
both malaria and poverty. But malaria is a vector-borne disease,
and the mosquito carrying the parasite breeds in water, so proxim-
ity to water is most probably important. To test that hypothesis an
additional data layer of water availability is needed. This step is a
major complication that has yet to be fully taken in the case of
malaria. Rivers and lakes can easily be found, and their proximity
to each population group calculated. But now you ideally also
want population and malaria data on village level, not just for
districts. Then you realise that mosquitoes can breed in water
129
The Green Book
tanks, small puddles, or even water trapped in an old bucket or boot. Now the comparison
becomes almost impossible, and you need to get data on rainfall and temperature in order
to calculate the daily water balance. This calculation is possible; the data are there (as you
will see below), but the calculation is not a trivial task.
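The first comparison described above — a single map with two district-level attribute tables — needs no special software. The sketch below joins two tables on district name and checks the association with a rank correlation; all district names and figures are invented for illustration.

```python
# Sketch of the 'single map, two attribute tables' comparison described
# above: district-level poverty and malaria figures joined on district
# name, then compared with a Spearman rank correlation. All figures
# are invented for illustration.
poverty = {"North": 62.0, "South": 35.0, "East": 48.0, "West": 55.0}    # % poor
malaria = {"North": 210.0, "South": 80.0, "East": 150.0, "West": 190.0}  # cases/10,000

def ranks(values):
    """Rank positions (1 = smallest); ties are not handled in this sketch."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

districts = sorted(set(poverty) & set(malaria))   # join on shared districts
rp = ranks([poverty[d] for d in districts])
rm = ranks([malaria[d] for d in districts])

# Spearman rank correlation (no ties): 1 - 6 * sum(d^2) / (n * (n^2 - 1))
n = len(districts)
d2 = sum((a - b) ** 2 for a, b in zip(rp, rm))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(f"Spearman rho over {n} districts: {rho:.2f}")
```

A GIS adds the ability to map the joined attributes; the join and the statistic themselves are ordinary table operations.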
In general, a thematic study will need more refined data, whereas an interdisciplinary
study will probably have to be satisfied with more generalised data. Often this is because
detailed data of different origin are seldom compatible in their spatial resolution. However,
the use of GIS and spatial data can be very rewarding. The first level, including a map, is
almost always welcomed and very simple; it will only take a few days. The second level,
comparing (‘overlaying’ in GIS jargon) attribute data related to the same spatial context, is
also quite simple, and will take a week to a month. The third level, analysing spatial
relations, introduces complexity, but can still be done by most standard GIS packages (and
some of the freeware packages listed on page 142); it will take some months to a year. The
fourth level, integrating GIS with dynamic (time-resolved) models, is quite complicated. This
level will demand in-depth knowledge of both GIS and modelling, and most probably of
programming as well. It will take longer than a year.
Figure 4. Interpolated 8 x 8 raster image from 64 Boolean sample points, randomly placed in each grid cell:
a. Inverse distance weights (IDW) to 8 neighbours, b. Reclassification of a, c. Spline smoothing
function to 8 neighbours, d. Reclassification of c. The reclassification is done as a threshold using
the value 0.5. Both illustrated interpolation methods can be parameterised to get a true chessboard;
that, however, demands iteration and skill, together with knowledge about the pattern of the
generated surface
Figure 5. Interpolated 8 x 8 raster image from 31 randomly selected points (see Figure 4) a. IDW to
8 neighbours, b. Reclassification of a, c. Spline smoothing function to 8 neighbours, and d. Reclassi-
fication of c. The reclassification is done as a threshold using the value 0.5
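The IDW interpolation illustrated in Figures 4 and 5 can be sketched in a few lines. The sample layout below (one Boolean point randomly placed in each cell of an 8 x 8 chessboard) follows the caption of Figure 4; the distance power of 2 is an assumed, commonly used parameter.

```python
# Sketch of inverse distance weighting (IDW) to the k nearest neighbours,
# as illustrated in Figure 4. The power p = 2 is an assumed common choice.
import math
import random

def idw(samples, x, y, k=8, p=2.0):
    """samples: list of (sx, sy, value); returns the IDW estimate at (x, y)."""
    nearest = sorted((math.hypot(sx - x, sy - y), v) for sx, sy, v in samples)[:k]
    for d, v in nearest:
        if d == 0:                # a sample exactly at the target point wins outright
            return v
    wsum = sum(1 / d ** p for d, _ in nearest)
    return sum(v / d ** p for d, v in nearest) / wsum

random.seed(1)
# One Boolean sample, randomly placed, in each cell of an 8 x 8 chessboard
samples = [(i + random.random(), j + random.random(), float((i + j) % 2))
           for i in range(8) for j in range(8)]
# Interpolate at cell centres, then reclassify with the 0.5 threshold
surface = [[idw(samples, i + 0.5, j + 0.5) for i in range(8)] for j in range(8)]
classed = [[1 if v >= 0.5 else 0 for v in row] for row in surface]
```

Because the samples sit off-centre, the reclassified grid only approximates the true chessboard — exactly the behaviour the figures are illustrating.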
Figure 7. Observations of a chess game on four occasions. At first the game is apparently static. Only
with more detailed scrutiny is it revealed that the players actually are shifted a little between
each observation. However, as we have no hypothesis or information of sub-cell pattern or
process we neglect this as observation error
now more esoteric interest among the chess community, but the general public, policy- and
decision-makers are unaware of this development.
Implications
A game of chess always aims at checkmate – which is unambiguously defined, as is the role
of each player. The rules of the game show no evolution, neither in space nor over time. If
you change the extent of the arena, the role of the players or the outcome for checkmate to
an unknown event, the computer would have little chance of winning. In a transient social or
natural environment that is how the evolutionary game is played. In the simple case of chess
there are only two scales that are of importance: that of a cell and the whole board.
Furthermore, the game as such has no influence on the arena. In a landscape all discretised
scales are arbitrarily chosen; the real landscape is a continuous nested hierarchy, but some
scales have dominance-generating spatial architectures and temporal cycles, entrapped by
keystone species and related processes. This also leads to the conclusion that the processes
are forming the patterns rather than the other way around – and that the system has feedback
loops at various scales. All those aspects can be disregarded in the special (and simple) case
of the chess game.
The general conclusion that can be drawn is that modelling in GIS is hampered by several
shortcomings, that care must be exercised when using distributed data for modelling, and
that the quality of many GIS integrated models is poor. They are also poor because they have
poor GUIs, fail to visualise the results, and hence do not reach the intended user community.
In order to secure high-quality GIS-integrated models the following issues need to be consid-
ered:
• Close co-operation between GIS model researchers in general, and particularly among
– researchers studying the same phenomena but adopting different methods and/or
scales
– researchers, planners and decision-makers
• Up- and down-scaling, and nesting models of different resolution
• Spatial and temporal domain, grain size and sampling intensity when integrating data from
various sources
• Strategies for sampling spatial phenomena to get representative data
• Selection of spatial interpolation methods and spatially correlated error tracking and
tagging
• Methods for evaluating the influence of error and error propagation on model perform-
ance, and error visualisation for communicating information on uncertainty
• Integration of remote sensing into GIS models
• Integration of temporal processes into GIS (3D- and 4D-GIS)
• Integrated systems that support a complete digital data flow from data collection with
mobile field GIS (Global Positioning Systems, GPS) to visualise and exchange results via
networks
• Formulation of versatile criteria for evaluating the prediction power of GIS-related environ-
mental models
• Compilation of high quality, accessible (shared) databases to be used as back-drops to
evaluate the predictive power of different GIS-related environmental models
• Establishing baseline and framework data
• Development of guiding GUIs that can lead the user to select the best method for the
formulated problem and the available data
• Development of friendly interfaces that promote the dissemination of GIS and integrated
models to domain experts, planners and managers.
especially powerful for analysing climate data (supplied with the software) for agriculture and
natural resource management applications. In many cases the software and bundled data are
free for use in Africa by non-profit organisations.
Several recent efforts have been made to create more-detailed (large-scale) regional framework
databases for Africa. The most comprehensive is probably the Africover project of the
Food and Agriculture Organization of the United Nations (FAO). This database also includes
detailed land cover derived from combinations of Landsat ETM data and topographic maps.
The agencies of the UN have also initiated an attempt to create a common depot for their
GIS data – which has led to the Data Exchange Platform for the Horn of Africa (DEPHA) (see
page 140 for a more complete list of framework data sources available).
Land use/cover
Two global land cover data sets covering Africa at 1-km resolution are presently available. The
latest is derived from TERRA-MODIS (Moderate Resolution Imaging Spectroradiometer) data
(2000/2001) and was created by Boston University. MODIS has also been used to
create a global tree cover database at 500-m resolution, available from the University of
Maryland. The older land cover data set was produced by the USGS from NOAA–AVHRR (Advanced
Very High Resolution Radiometer) data (1992/1993). It exists in several versions useful for
different applications and also includes monthly vegetation data from April 1992 to March
1993. The Africover database mentioned above is superior to these global databases but does
not yet cover the whole continent.
human impacts on the climate are available from the Climatic Research Unit (CRU), University
of East Anglia, UK, either directly via the Internet, or from the Intergovernmental Panel on
Climate Change (IPCC) as a CD.
Population
The best and latest population figures are the 1-km resolution Landscan project data for
2000, 2001 and 2002 from the Oak Ridge National Laboratory, USA. These figures are created
from census data and downscaled using intelligent interpolation (using relations such as light
at night, slope, or elevation, which correlate strongly with population density). The Center for
International Earth Science Information Network (CIESIN), hosted by Columbia University,
has compiled global population data for 1990 and 1995. The data has an original resolution
of 5 arc-minutes (approximately 10 km), but for Africa the data mostly represent averages for
larger regions. United Nations’ African population figures for selected countries covering the
second half of the 20th century are available from Central African Regional Program for the
Environment (CARPE) (see page 140).
Soil map
FAO has produced a Digital Soil Map of the World (DSMW) at 1:5 million scale. Soil classes
are given as polygons, with derived characteristics attributed. The soil map is only available
on CD. For some regions FAO also has a 1:1 million scale soil map.
Satellite imagery
Remote sensing (RS) data are increasingly important for creating and updating both physical/
biological and socio-economic databases. Access to RS data is constantly improving thanks
to lower prices, declassification of historical high-resolution data, a new generation of
multi-sensor satellites (TERRA and ENVISAT) now in operation, improved computing
power, and better software–user interfaces.
For national to continental studies NOAA-AVHRR and TERRA-MODIS data and their
derivatives are the most easily accessible. Other data of similar resolution that can be easily
accessed include those from the European Space Agency (ESA) ERS-2 satellite and its 7-band
ATSR sensor (which can be downloaded from the Internet in near real time), and from the
6-band SeaWiFS sensor.
Full coverage, high-resolution Landsat TM and ETM data are now also freely available for
the whole of sub-Saharan Africa via the University of Maryland. Landsat (E)TM composites of
the whole globe in MrSID compressed format are easier to download and are available
from NASA. To find all available Landsat MSS, TM and ETM scenes, and other satellite data
sources use the NASA Earth Observing System Data Gateway.
The original TERRA–MODIS and NOAA-AVHRR scenes that were used for the land use/
cover classifications (see above) are all freely available as composites from University of
Maryland (TERRA–MODIS) and USGS (NOAA–AVHRR). The Africa NOAA–AVHRR tiles for
vegetation are also available from the International Centre for Insect Physiology and Ecology
(ICIPE). Additional raw NOAA–AVHRR data are available via the NOAA Satellite Active
Archive on the Internet, or from USGS at the cost of reproduction.
Points to remember
• Increased data availability and the ease with which distributed data layers are created
from point and line data, and remote sensing, have led to a widespread coupling of GIS
and remote sensing to existing (non-topological) cause-effect models in, for example,
hydrology and erosion studies, and to updating and downscaling land cover and population
density maps
• Data availability for Africa has now reached a point where it is possible to do such
studies, often with freely available data
• The major problem for the individual researcher is in accessing the data, and in acquiring
the skills of GIS and RS needed to ‘massage’ the data into a coherent database
• The free GIS software programmes available today are powerful enough for you to learn
GIS, and to create basic databases
• The bottlenecks for using GIS for research in Africa are poor Internet access and poor GIS skills
• If you want to use GIS you should download the necessary data or order it via CD/DVD
(usually possible for a small fee), and you should learn GIS by using one of the listed free
software programmes
• As most of the software programmes have very similar interfaces, learning one means that
learning a second becomes an order of magnitude easier. So going from DIVA-GIS to
ArcView is very simple (also because they share data formats).
Expert systems
GIS and RS (or geoinformatics) have developed from being tools for data storage and
presentation to also include analyses and modelling. Overlaying two or more thematic maps
(see Figure 9) is a simple but often illustrative means of identifying relations in spatial
patterns.

Figure 9. Schematic structure of an expert system approach for spatial data analysis

More advanced analyses include using map-algebraic formulae combining several
thematic layers. Such ‘expert system’ approaches are widely used to rank vulnerability of
natural resources, food security, or water availability. One example is the DRASTIC method
(Depth to groundwater, Recharge, Aquifer media, Soil, Topography, Impact of rootzone,
Conductivity) for groundwater vulnerability analysis, where each of the seven factors has a
physically based value. Development is towards more advanced expert systems that include
object-oriented methods and consider ancillary and multi-temporal data, and spatial
relations (Figure 9). Expert systems are like the game of chess – unambiguously defined with
a set of strict rules. Expert systems are thus said to be data- or forward-driven. However, GIS
is also becoming a decision-support system (DSS), e.g., for ill-structured (localisation) prob-
lems. Used as a DSS GIS becomes more of a tool for discussions and illustration of decision
alternatives. Formal methods have been developed to involve various stakeholders in such
discussions, including multi-criteria evaluation (MCE). In contrast to expert systems based on
predefined rules and weights of physical parameters, DSS are related to different stakeholders'
perceptions, and as the aim is to reach a solution (for allocation of land use/development,
water or nature protection), the method is said to be goal-driven.
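The map-algebra calculation behind an index such as DRASTIC can be sketched in a few lines of Python. This is a sketch only: the rasters here are random ratings rather than real data, and although the weights follow the commonly published DRASTIC weighting, consult the original method before applying it.

```python
import numpy as np

# Hypothetical 3x3 rasters, one per DRASTIC factor, each cell holding a
# rating from 1 (low contribution to vulnerability) to 10 (high).
rng = np.random.default_rng(0)
factors = ["depth", "recharge", "aquifer", "soil",
           "topography", "impact", "conductivity"]
layers = {f: rng.integers(1, 11, size=(3, 3)) for f in factors}

# Commonly cited DRASTIC weights for the seven factors.
weights = {"depth": 5, "recharge": 4, "aquifer": 3, "soil": 2,
           "topography": 1, "impact": 5, "conductivity": 3}

# Map algebra: the vulnerability index is a weighted sum of the thematic
# layers, computed cell by cell across the whole raster at once.
index = sum(weights[f] * layers[f] for f in factors)
print(index)
```

In a real GIS the same weighted sum would be applied to full-size raster layers; the cell-by-cell logic is identical.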
Whether studying natural or social science, GIS can be very useful, and there is a plethora
of methods, models and techniques that you can apply to analyse or present data that deals
with spatial relations. But it is critical that you formulate a sound hypothesis and use
adequate data of sufficient quality. To avoid mistakes a parsimonious approach, and rigour-
ous meta-data description is essential. This will make it easy to update and eventually publish
your data and results.
CIESIN (Center for International Earth Science Information Network), Columbia University
Including climate data and global gridded population data from 1990 and 1995
www.ciesin.org
CARPE (Central African Regional Program for the Environment)
http://carpe.umd.edu/products/
4.3 Designing experiments
Richard Coe
• Experiments are a central part of the scientific method because they allow you to test cause-effect hypotheses
• Many students learn about experiments in the context of studies of small field plots, but the key principles of experimental design are equally important in all studies
• All aspects of the design of an experiment depend on its objectives, so the objectives have to be carefully and thoroughly developed
• The details of the design of a good experiment will balance theoretical optimality with practicality
• Every experiment should have a written protocol that can be shared with others, so the design can be improved before the experiment starts

Experimenting as part of research

Experimenting is a part of everyday life. In an informal way you experiment when you check
whether your tea is too hot to drink, whether the bus is less crowded if you leave for work
earlier, or whether your supervisor approves of your style of writing. Within formal agricultural
research, experimentation has long been the key tool. To many people 'agricultural research'
is synonymous with field plot experiments. If you visit an agricultural research station, one of
the main things you will see is small plots for comparing different crop varieties or different
management techniques. Much of the current theory and the methods for carrying out
experiments were developed in the context of agricultural experiments, most notably by
R.A. Fisher, a geneticist and statistician working at Rothamsted Experimental Station in the UK.

Today field plot experiments on research stations are not the only avenue for agricultural
research, but the ideas and methods of experimentation are still central to good research. Why?

Experimentation is concerned with the 'testing theory' step of research (Chapter 4.2).
Theories which help in problem solving often describe what will happen if a change is made:
• If we use this new variety of maize there will be less damage from stem borers
• If we substitute dairy meal with calliandra fodder, milk production will not be reduced
• If we train farmers in pest management they will be able to grow cabbages more profitably
• If communities are better informed they will be more effective in managing common grazing.

Now in order to test your theory the obvious thing to do is to make the change and observe
whether the predicted outcome occurs. This is the basis of experimentation, and the reason
it is so important.
There are situations in which it is impractical or unethical to experiment; in these situations
other ways of testing theories have to be found. It is not feasible to experiment if your
prediction is:
• If the average annual temperature rises by 2°C then maize production in Kenya will drop
by 15%
You could test the prediction by setting up simulation models
(Chapter 4.8) that describe the relationship between production
and temperature. But those models will themselves be based on
theories tested by experiment. Here is another well-known prediction made some years ago:
• Regular smoking will lead to an increased chance of lung cancer and other diseases.
It was not possible to test this by experimentation as that would have involved taking a
group of people and requiring some of them to smoke. This theory was tested largely by
surveys (Chapter 4.4) which are distinct from experiments. In a survey you observe what is
happening without making deliberate changes. Thus the effect of smoking was investigated by
comparing the health of people who smoke with those who don't, and a clear correlation
emerged. The limitations of the study design are clear: if the smokers have a higher rate of
lung cancer we cannot be sure the smoking causes the lung cancer. Perhaps there is some
unknown factor that tends to lead people both to smoke and to get lung cancer. This is a
problem of the survey approach to investigation, and means that theory testing is harder
using surveys than using experiments. In the case of the health impacts of smoking, various
possibilities for such factors were suggested (diet, genetics), and then eliminated by surveys
which controlled for them, each providing evidence in support of the theory. However there
will always be people who think of one more factor that could be the explanation. This would
not be the case if the theory could be tested by a well-designed experiment.
This chapter summarises the key decisions that have to be made if you are to conduct a
well-designed experiment. A prerequisite is understanding the principles and language of
experimental design, described in the next section.
Table 1. Results of a simple experiment

Stem borer damage (%)
Field    M512    Boreproof
1        50      20
2        20      10
3        30      20
4        60      30
5        60      40
6        20      5
7        0       0
8        40      10

Are these results more convincing? They certainly show consistency: Boreproof had less
damage than M512 in every field except field 7, which has no stem borers anyway. But look
carefully at the way the design was described. The M512 was always placed on the left-hand
plot. Maybe the difference in stem borer damage is nothing to do with variety, but due to
some other consistent difference between left- and right-hand plots. Maybe the wind blows
from the left, bringing the pests or stressing the plants. You may know that is not the case,
but could have trouble convincing others. And you can never be sure that there is
not some other systematic difference between left- and
right-hand plots. The solution is to randomise the allocation of treatments to plots. In field
1, toss a coin to decide whether Boreproof or M512 goes on the left-hand plot. Then
randomise again in field 2, and so on. In field 1 you might end up with Boreproof on a plot
with less stem borer damage for reasons unconnected with the variety, but over the whole
experiment you can be sure that the only systematic difference between plots with Boreproof
and plots with M512 is indeed the variety.
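The coin-tossing procedure can be scripted. Here is a minimal sketch in Python, using the field numbers and variety names of Table 1; the fixed seed is only there to make the example repeatable.

```python
import random

random.seed(42)  # fixed only so the example layout is repeatable
varieties = ("Boreproof", "M512")

# In each of the eight fields, 'toss a coin' to decide which variety
# goes on the left-hand plot; the other takes the right-hand plot.
layout = {}
for field in range(1, 9):
    left = random.choice(varieties)
    right = varieties[1] if left == varieties[0] else varieties[0]
    layout[field] = {"left": left, "right": right}

for field, plots in layout.items():
    print(field, plots["left"], plots["right"])
```

Each field gets its own independent toss, so no systematic left/right pattern can creep in.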
The basic ideas of experimentation described above should also help you understand
studies which, though they involve comparison, are not experiments and can not demonstrate
cause. For example, suppose a study showed that farmers in Central district have less stem
borer damage in their fields than farmers in West district. They also have higher adoption
rates for the Boreproof variety. This study involves comparing districts, but is not an
experiment, as the differences in adoption of Boreproof were not imposed by the researcher. It is
also common to make comparisons over time, for example, by comparing stem borer
damage levels before and after introduction of Boreproof in Central district. In such studies
the change may be devised by the researcher, but it should only be considered an experiment
if other features are present, e.g., some other districts in which Boreproof was not
introduced, random allocation of the introductions, and some replication.
Objective: Determine if Boreproof is more resistant to stem borer than M512
Treatments: 1. Boreproof maize; 2. M512 maize
Units: 10 m x 10 m plots of land
Measurements: Percentage of plants damaged by stem borer

Objective: Find the effect on milk production of substituting dairy meal with calliandra fodder
Treatments: 1. Base diet + dairy meal; 2. Base diet + calliandra; 3. Base diet + 50% calliandra + 50% dairy meal
Units: Dairy cows for 2 weeks of the third month of lactation
Measurements: Milk production in the second week

Objective: Check whether training in pest management allows farmers to produce cabbages more profitably
Treatments: 1. No training; 2. Attendance at farmer field school on pest management
Units: Farmers for whom cabbage production is a main enterprise
Measurements: 1. Farmers' knowledge; 2. Profitability of cabbage enterprise
Design decisions
Now you understand the basics of designing experiments you can start thinking about the
design of your experiments. You may need one substantial experiment, several smaller
related experiments or possibly no experiments. If you are going to experiment then there
are many decisions you will have to make about details of the design. How can you make
those decisions? There are several sources of help:
• The fundamental principles of experimental design – the outlines above and more details
in other texts
• The more practical ideas in the following sections, and in other texts
• Papers and reports describing similar experiments that others have done
• Other researchers who have worked on a similar topic (maybe in a different region) or
used a similar method
• Your observations of other experiments
• Your imagination
• Pilot studies in which you try out techniques and arrangements before committing yourself
to an expensive or long-term experiment.
There is no single correct way to design your trial, but there will be plenty of ways that
are wrong – designs which will not lead to valid conclusions meeting your objectives. Even if
you design a trial that will give valid results it may be inefficient – not give you as much
information as possible for the time and effort spent. Avoid these scenarios by:
1. Thinking.
2. Using all the sources of help listed above.
3. Showing your design to others and getting their comments.
4. Envisaging the data your design might produce and the way in which you would then
interpret it. Some researchers sketch out the tables and graphs they would use in the
analysis of the data, then make sure the design will generate the required numbers to
complete them.
5. Thinking of the practical as well as the theoretical requirements. You have to manage your
trial (set it up, look after it), cope with the travel requirements, have enough time and
equipment to measure all the plots, and so on. And you have to be able to afford it!
6. Iterating. Start with a possible design, think through the consequences, then go back and
revise it until you have something sound.
7. Thinking.
In the following sections the main ideas you need to make decisions on each of the key
points are described together with some of the common mistakes that you must try to avoid.
Objectives
All aspects of the design depend on the objectives. Therefore you must get the objectives right!
Objectives must be:
• Clear. If the objectives are vague it will not be possible to decide on the rest of the design
• Complete. Often the statement of objectives is incomplete, so that the experiment cannot
be designed.
• Relevant. In applied research, experiments are made to help solve real problems and fill
knowledge gaps in the process. The objectives of the experiment must be relevant to
solving the problem. It must be clear how you will be a step nearer solving the problem
once you have the results from the experiment
• Reasonable. The objectives must be reasonable given current understanding of relevant
phenomena and other observations. Avoid objectives that contain elements of alchemy or
wishful thinking
• Capable of being met by an experiment. Some research questions do not need an
experiment. Two problems which often arise here are:
– objectives that require a survey rather than an experiment
– objectives that require two or more experiments rather than a single one.
Make sure that the objectives fit in well with the overall strategy of the project. You have
to be able to explain what the next step will be after the experiment is completed.
Treatments
There are four ideas you need when choosing treatments:
1. Comparison and contrasts. Experiments involve making comparisons. The exact
comparisons that meet the objectives can be defined as contrasts, i.e., the numerical
expression of the comparison. Make sure your experiment has all the treatments needed to
make all the comparisons implied by the objectives.
2. Controls. ‘Controls’ or ‘control treatments’ are the baseline treatments against which
others are evaluated. In the stem borer experiment M512 might be considered the control.
3. Factorial treatment structure. Many experimental objectives require looking at several
‘treatment factors’. For example, in the stem borer experiment you may also want to look
at the effect of sowing date (early, mid, or late). Then the experiment might have 6
treatments (Boreproof sown early, mid, or late and M512 sown early, mid, or late). Factorial
treatment structures are important for two main reasons:
– they tell you about interaction – such as whether the difference between Boreproof
and M512 depends on when they are sown
– if there is no interaction they give information about both factors with the same
precision as would be obtained if the same amount of experimental effort went into
investigating just one of them. This is the ‘hidden replication’ described in textbooks.
4. Quantitative levels. Some experiments require varying a quantity that could have many
different levels, such as sowing date or amount of fertilizer applied. Choosing the levels to
use as treatments in the experiment depends on the exact objectives and what you
already know about the response to varying it. Generally fewer rather than more levels are
needed, and there is rarely a reason for using more than 4 different levels.
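Generating the factorial treatment set mechanically is a useful check that no combination has been forgotten. A sketch using the example factors from point 3:

```python
from itertools import product

varieties = ["Boreproof", "M512"]
sowing_dates = ["early", "mid", "late"]

# The full factorial treatment set is the cross of the two factors:
# every variety combined with every sowing date.
treatments = [f"{v} sown {d}" for v, d in product(varieties, sowing_dates)]

print(len(treatments))  # 2 varieties x 3 dates = 6 treatments
for t in treatments:
    print(t)
```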
Units
With crop experiments, decisions have to be made on the size, shape, orientation and
arrangement of plants within the plot. There are a few guidelines based on theory:
• Many small plots often give more precise results than a few large plots taking up the same
area
• Long thin plots often give more precise results than squarer plots.
These guidelines have to be modified by practical considerations:
• Plots have to be large enough to manage (sow, weed, spray, harvest) in a way that
represents what a farmer could do
• Plots have to be large enough to take measurements, allowing for the possible disturbing
effects of destructive measurements during the experiment (Chapter 4.5).
• Borders may have to be left around each plot to make sure that anything happening on
one plot does not influence what goes on in the next plot.
These considerations, particularly the last point, often overrule the theory.
If the units are not plots of land but animals, people or communities then there are often
more decisions to make and few general guidelines. Base the design of the experimental
unit on the experience of others who have done similar experiments. What did they use as
the unit? What problems did they have? How will your experiment differ from previous ones?
Does that imply any changes in unit?
Think of the experiment with factorial treatment structure, with two varieties (Boreproof
and M512) each sown early, mid, and late. A common design for this type of experiment is
the split-plot. Large plots are defined and the early, mid, or late sowing date allocated
randomly to each one. Then each large plot is divided into two, with M512 and Boreproof
randomly allocated to the two halves. A split-plot design can have practical advantages, for
example, you are less likely to disturb the early sown plots when sowing the later ones.
However it does have disadvantages. There are two sorts of plot (large plots and split plots).
This complicates the analysis because variation between both types of plot has to be
considered. The precision of a split-plot trial is generally lower than that of the alternative:
random allocation of all treatment combinations to the smaller plots. Don't use a split-plot
design unless practical considerations require it.
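The two-stage randomisation that defines a split-plot layout can be sketched for a single block of the variety-by-sowing-date example. This is a sketch only; a real design would repeat the randomisation independently for every block, and the seed is fixed just so the example is repeatable.

```python
import random

random.seed(1)  # fixed only so the example layout is repeatable
sowing_dates = ["early", "mid", "late"]
varieties = ["Boreproof", "M512"]

# Stage 1: randomise sowing dates to the three large (main) plots.
main_plots = random.sample(sowing_dates, k=len(sowing_dates))

# Stage 2: split each large plot in two and randomise the varieties
# to the halves -- this two-stage process is what makes it split-plot.
layout = [(date, random.sample(varieties, k=len(varieties)))
          for date in main_plots]

for date, (left, right) in layout:
    print(f"main plot '{date}': {left} | {right}")
```

The two stages produce the two sorts of plot mentioned above, which is why the analysis must account for two levels of variation.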
Replication
Replication (having several units of each treatment) is important for four reasons:
1. Estimating precision. The uncertainty in an average is estimated by the variation
between the observations being averaged.
2. Increasing precision. Calculating an average over more values from a replicated experi-
ment will increase precision since the calculated value will be closer to the true value.
3. Insurance. More replications in an experiment will provide some insurance against things
going wrong with one or two replicates. Without such insurance an experiment may be
rendered useless by, for example, goats getting into the field, or some participating
farmers dropping out of the study.
4. Increasing the range of validity. Replication can increase the range of validity of results
if a comparison is repeated under a range of conditions.
There should be enough replicates to satisfy all the above reasons for replicating:
1. Estimating precision. Look at the error d.f. (degrees of freedom) from the analysis of
variance; 10 d.f. can be considered a reasonable minimum. Much more than 20 has no
particular advantage.
2. Increasing precision. If you have an idea of the precision you need and the variation in
your experimental material then it is possible to estimate the number of replicates
needed. Details are in books, and software is available to help.
3. The number required for insurance must depend on the risks. A long-term trial in
a risky environment (e.g., one that might be burned in the dry season) may be worth
insuring, by adding replicates. A short-term trial that can easily be repeated if something
goes wrong is not worth insuring.
4. Increasing the range of validity. Suppose the stem borer trial had some replicates on
sandy soil, some on loam and some on clay soil. Then you could be more confident that
the results were generally valid than if the experiment had only been done on sandy soil.
The importance of this will depend on the objectives.
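For the common case of a randomised complete block analysis, the error d.f. mentioned under point 1 can be checked before the trial starts. This sketch assumes the standard randomised-block analysis of variance, with blocks supplying the replicates:

```python
def error_df(treatments: int, replicates: int) -> int:
    """Error degrees of freedom in a randomised complete block
    analysis of variance: (t - 1) * (r - 1)."""
    return (treatments - 1) * (replicates - 1)

# A two-variety trial needs 11 replicates to reach 10 error d.f. ...
print(error_df(treatments=2, replicates=11))   # 10
# ... whereas a 6-treatment trial gets there with only 3 replicates.
print(error_df(treatments=6, replicates=3))    # 10
```

With only two treatments the error d.f. grow slowly, which is why simple comparisons need many replicates.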
Site
The site(s) for the experiment will be determined by the objectives. It has to be representative
of the problem area, both on a large scale (for example, in the same agro-ecozone) and on
a small scale (for example, having the appropriate soil type and previous management).
The site also has to be practical. It should be:
• Accessible
• Secure
• Large enough
An experiment will have to be made at more than one site if any of the following apply:
1. The problem area is too variable in key characteristics for a single representative site to
be found.
2. You are unsure of the key environmental (biophysical, social or economic) characteristics
that may determine the outcome of the experiment, so cannot be sure they are represented
by a single selected site. Getting consistent results from several sites will give you
confidence that these results really do apply to a wider area.
3. The objectives of the trial require conditions to be compared that cannot be controlled as
treatments, such as soil type, rainfall or soil depth.
Cases 1 and 3 require sites to be selected in the same way that single sites are selected.
There is an argument in case 2 for sites to be chosen by random selection, but that is
rarely practical.
The same considerations apply when experiments are carried out with farmers and
communities. Do not simply choose the villages or farmers in which last researcher worked,
but look carefully at the objectives and decide on which characteristics it is important to have
represented.
Next, determine which treatment will be applied to each unit. Random allocation should
be used. Random allocation does not simply mean ‘mixed up’. Avoid any possible bias by
using an explicit random process. For example, use pieces of paper with treatment names put
into a ‘hat’. The number of pieces of paper for each treatment will be the number of
replicates. Then decide the treatment for the first unit by drawing a paper from the hat
without looking, again for the next and so on. There are computer programs to help with this.
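The 'hat' procedure is itself a small algorithm and is easy to script. In this sketch random.shuffle plays the role of mixing the slips of paper; the treatment names are hypothetical, and the seed is fixed only so the example is repeatable.

```python
import random

random.seed(7)  # fixed only so the example allocation is repeatable
treatments = ["Boreproof", "M512", "Local check"]
replicates = 4

# One slip of paper per replicate of each treatment goes into the hat.
hat = [t for t in treatments for _ in range(replicates)]
random.shuffle(hat)  # mix the slips

# Draw one slip, without looking, for each unit in turn.
allocation = {unit: hat[unit - 1] for unit in range(1, len(hat) + 1)}
for unit, treatment in allocation.items():
    print(unit, treatment)
```

Because the hat holds exactly the right number of slips, the allocation is random but every treatment still ends up with its intended number of replicates.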
The precision of almost every experiment can be improved by blocking. Whatever units
you have, you know they will vary. Some variation is predictable. Try to arrange the units into
homogeneous groups, each of which will become a block. Table 3 gives some suggestions on
characteristics that might be used to block different types of unit, but what is suitable for
your trial will depend on its objectives. For the stem borer experiment, the level
of stem borer damage in the previous season may be a good characteristic to use to group
units into blocks. However it would be irrelevant if the trial was about N leaching or weeding
regimes.
Table 3. Possible factors to use in definition of blocks
Units Characteristics used in blocking
If:
1. Every treatment will have the same number of replicates and
2. Every block has the same number of units and
3. The number of units in a block is equal to the number of treatments.
Then the best design is to put exactly one replicate of each treatment in each block. The
allocation of treatments within a block should be random. This is the randomised-block
design.
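Under these three conditions the layout is mechanical to generate: one fresh random ordering of the treatments in each block. A sketch with hypothetical treatment names:

```python
import random

random.seed(3)  # fixed only so the example layout is repeatable
treatments = ["Boreproof", "M512", "Local check"]
n_blocks = 4

# Randomised-block design: every block contains exactly one replicate
# of each treatment, in an independent random order per block.
design = {block: random.sample(treatments, k=len(treatments))
          for block in range(1, n_blocks + 1)}

for block, order in design.items():
    print(f"block {block}: {order}")
```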
If the blocks are not all the same size, or the number of units in each block is not equal
to the number of treatments, then you will have an incomplete block design. Take care when
deciding which treatments go into each block. Software is available to help you with this.
2. Assuming blocks have to be the same size and equal to the number of treatments.
Incomplete block designs can be very useful. If you have to get a bit of help designing and
analysing them it will be worth it.
Management
In a field experiment , ‘management’ means preparing the land, sowing, weeding and all the
other agronomic practices needed to raise the crop. In other types of experiments there are
equivalent management activities. The management of an experiment is often not considered
part of the design, yet it can have a large impact on the success of the trial.
1. Decide whether the objectives demand that you manage the experimental material to a
very high level (e.g., zero weeds) or a realistic level (e.g., farmers’ weeding practice). The
first may be appropriate if you are studying processes such as water or N uptake, and
don’t want weeds to obscure results. The second will be appropriate if you are evaluating
technologies and want them to represent farmers’ systems.
2. Avoid confounding treatments with management differences.
3. Aim for uniform management . Often the difference between a successful and a failed trial
is in how well the crops (or animals or people) were managed, and whether this was done
uniformly. You can improve uniformity by, for example, training fieldworkers and monitoring
the way they execute operations.
4.4 Designing surveys
Erica Keogh
Example 1
a. Problem 1. Increased elephant damage has been reported in some villages. Are the
elephants moving along normal migration routes or are they roaming more widely than
before?
Here you would be interested in describing existing conditions and, possibly, trying to make
comparisons with conditions in previous years. The research could be extended over a
period of time, thus monitoring the situation over a number of years.
b. Problem 2. Is infestation of maize fields by Striga worse when fields are suffering from
soil erosion?
In this case you could do a preliminary investigation into whether there is a measurable
relationship between Striga infestation and the level of soil erosion. Alternatively, if there is
prior information about this relationship, you can test the hypothesis that this relationship
exists and is quantifiable.
Types of surveys
There are various ways to distinguish one type of survey from another, but perhaps in the
present setting it is best to provide examples which will illustrate the wide variety of studies
that are possible.
• Street interviews to assess public opinion about price increases of seed maize
• Household interviews to measure food production and consumption for monitoring food
security
• Field observations to estimate earworm infestation in the current maize crop
• Field observations and community discussions to quantify the effects of elephant damage
to crops
• Household interviews to gauge the effects of HIV/AIDS on labour availability for house-
hold agricultural activities
• A case control study to compare old and new tillage practices in different communities
• An enumeration of tree species in quadrats within a specified area for assessing biodiversity
• A study to estimate soil fertility prior to land preparation
• A study of a sample of records from the meteorology department to track rainfall patterns
over the last 50 years
• An investigation of sections of river banks to determine silting levels arising from gold
panning.
From the examples you should realise that a survey may entail interviewing people, or
collecting specimens, or measuring items, or studying records, or a combination of one or
more of these activities. Thus, the type of survey you are planning dictates what measurement
instruments you will be using (e.g., a questionnaire for interviews or a tape measure to
check the area planted), and also the sampling scheme (the rules for choosing exactly which
things will be measured) you will be using. This matching of ‘tools’ to the type of study is one
of the classic features of surveys, with each survey having a unique set of instruments and
methodology for efficient data collection.
Example 2
Referring back to the problems introduced in Example 1, some of the terminology you are
going to meet when designing and implementing surveys can be illustrated.
Problem 1 Problem 2
Setting up a survey
An effective survey encompasses many activities, which must all come together to provide a
useful and timely report. The actual planning for a survey is as important as its implemen-
tation, and the amount of work involved in the planning should not be underestimated. The
efficient and successful management of a survey depends to a great extent on a thorough
understanding of the population, of the survey topic, and on having well-structured administrative backup available throughout. Available resources will often dictate planning decisions, but it is essential to maintain the quality of all procedures by adopting a 'global' viewpoint, i.e., by considering the impact of each decision made at a particular stage on the whole project, thereby achieving balance and consistency throughout. Some examples
of surveys have been given. Next we look at the details of survey design and implementation.
Familiarise yourself with all possible sources of existing knowledge from previous
studies. Such information can be used not only to identify gaps and thus emphasise the
need for the present study, but also to provide checks on possible sources of bias, to help
avoid duplicating work already competently carried out, or to improve estimates previously obtained. It is also important to identify all possible secondary users, i.e., those who may have use for your data in the future. Such users can be of great help with planning, avoiding conflict, and suggesting alternative approaches. You may also be in the situation where your
research project is but a small part of some on-going larger research project – in this case
it is essential to:
• Maintain contact with those implementing the larger project
• Receive information about results being obtained from other sections of the project
• Ensure your project fits in with the overall larger objectives
• Provide timely feedback on your progress to all other players
• Work with others as part of the larger team.
The flow chart shown in Figure 1 illustrates the phases of a survey, each of which needs
careful planning right from the beginning.
Right from the beginning, it is essential to:
• Be aware of all resource limitations
• Be able to identify, for each task:
– Who is going to be responsible
– How much it will cost
– How much time it will take.
Surveys involve large amounts of documentation, all of which have to be prepared in
advance and tested for ease of usage. Sometimes you will need to recruit persons who can
assist you at one stage or another.
Budgeting
Hand-in-hand with timetabling for the survey, is the survey budgeting. This is probably the
most difficult task of all since the survey design is totally dependent on the budget, and
vice versa – so which comes first?
Errors
A survey requires and combines the techniques of sampling, design of tools, data collection
and data analysis, and the accuracy of the methods employed will determine the quality of
the information finally produced. In any survey there are many potential sources of error
which may be broadly classified as sampling errors and non-sampling errors.
Sampling errors. These are errors arising because, by chance, the sample is not fully
representative of the population. Such errors can be estimated and are a random result of
the sampling procedures. Broadly speaking, the larger the sample size, the smaller the
sampling errors.
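The inverse-square-root relationship between sample size and sampling error can be sketched in a few lines. The standard deviation figure below is purely illustrative:

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Illustrative only: a guessed standard deviation of maize yield (t/ha)
sd = 1.2
for n in (25, 100, 400):
    print(n, round(standard_error(sd, n), 3))
# Quadrupling the sample size halves the sampling error: 0.24, 0.12, 0.06
```

Note the diminishing returns: each halving of the error costs four times as many units.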
Non-sampling errors. This category includes all of those errors which can arise from other
sources:
• Variation between data-collection personnel
• Inadequate tools
• Inadequate sampling frame
• Data-entry errors
• Coding errors
• Non-response
• Errors in response
• Effects caused by the way questions are worded.
Each of these can give rise to bias, which is often not measurable. Bias means that the results based on the survey are not, even on average, the same as those that would have been derived from a total census of the population, but consistently over- or under-estimate quantities.
Increasing the sample size, so as to reduce sampling error, can very well increase
non-sampling errors due to resulting poorer-quality enumeration and lower levels of supervi-
sion. Sampling and non-sampling errors and their relative magnitudes must be considered
simultaneously when determining sample size. Often only sampling errors are mentioned
since non-sampling errors are usually not measurable, and sometimes unknown. You must
remember that you will be making many measurements on your sample and that the
precision of estimates is likely to vary from factor to factor.
Sampling
The theory of sampling is covered adequately in many texts and only some brief notes are
made here. There are two inter-related decisions to make: the type of sampling and the
sample size. Decisions depend on blending the theoretically optimal with what is really
practical, in the light of the survey objectives.
Decisions about sample size must be taken in the global context of the project and must
include consideration of the following factors:
• Available resources
• Objectives of the study
• Sub-groups, within a population, that you wish to study
• Practical constraints
• The precision needed
• Homogeneity of the population.
The following are definitions, with examples illustrating them.
The population
The target population is defined as all those units in which you are interested. The study
population is defined as all those units that you can reliably identify. Ideally these two
populations should coincide but, unfortunately, this is often not the case, particularly when
the population consists of people.
Units
When you implement your survey, you are going to be dealing with a unit, i.e., you are going
to interview a person, or generate discussions with a group of people, or count the number
of Striga plants in a quadrat within a field. These are the units of study. In many studies
there is a hierarchical arrangement of units. We measure things on people, but also record
something about the household they are in, the village in which the household is found, and
the district where the village is located. This hierarchy may be used in sampling, even if
measurements are taken at only one level.
Sampling frame
The sampling frame is a ‘list’ of all the items from
which you are going to select your sample, noting
that you need a separate frame for each level in the
hierarchy of units. Careful construction of the frame
is needed since, as mentioned above, unexpected
errors can easily arise if the frame is out of date, or
if it has inaccurate or duplicate records, and so on.
If the frame is inadequate we say it exhibits over-coverage or under-coverage – these terms simply reflecting the non-match of study and target populations (Figure 2). If you are sampling households, then the ideal frame is a list of all the households. If sampling fields or rivers, for example, the sampling frame may be a map or aerial photo, i.e., an implicit list.
Figure 2. Target and study populations
Example 3
Refer back to Problem 1 in the previous examples. Not only do you want to observe and
measure the actual damage in the fields, but you also will wish to interview the villagers and
discuss with them their methods for protecting their crops. Another aspect of interest will be
gender differences in managing crop damage. Suppose the district authorities provide you
with a map on which the locations of villages in the study area are marked. Your first stage
of sampling will be to select villages and thus your target population is all villages in the
area, whilst your study population is all villages marked on the map provided to you. If the map is out of date it may mark a village which no longer exists, because its inhabitants moved out to another area 2 years ago precisely because of high rates of crop damage. We call this over-coverage since that village would potentially be selected into the sample (according to the map) and yet it does not really exist. Conversely, if a new village has been
formed, with some inhabitants of one village moving away and making it their own new
settlement area, then this village may not be marked on the map at all and so will not be
available for selection into the sample. We call this under-coverage since that village is not
(and yet should be) available for sampling. Both of these situations will give rise to non-
sampling errors that cannot be measured and you may never know they exist.
Clearly it is often useful to classify the population in more than one way, and thus you can
use techniques of stratification and clustering together. The final selection of the units you
are going to study is usually done using either simple random sampling or systematic
sampling. The following examples will clarify these notions.
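Before those examples, a small Python sketch may help fix the two selection methods. The frame of 40 villages is invented for illustration:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Each unit equally likely; selection without replacement."""
    return random.Random(seed).sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, where k = len(frame) // n."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]

villages = [f"village_{i:02d}" for i in range(1, 41)]  # invented frame of 40 villages
print(simple_random_sample(villages, 5, seed=1))
print(systematic_sample(villages, 5, seed=1))
```

Systematic sampling is often easier to carry out in the field (count off every k-th unit), but it assumes the frame's ordering carries no hidden pattern.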
Example 4
Referring back to Example 3 – recall we have a map (hopefully up-to-date) showing the
location of villages in the area of interest. As noted before, the villages marked on the map
will be the items in the sampling frame for the first stage of sampling. The actual people
resident in each selected village, i.e., the households, will represent items in another sampling
frame, for a second stage of sampling. Focus for now on this selection of households and
recall that you are interested in the gender dimensions (of head of household) of crop
protection from animal damage.
Discussions with the district officials and some initial contact with communities in the
area will provide you with information about the overall picture of wild animal marauding in the district. You discover that in one area the main cause of damage is elephants, with lesser damage caused by baboons and jackals, whilst in another part of the district, with a different vegetation type, there was apparently an influx of Quelea birds which caused severe damage during the past month. The remainder of the district suffers little from large animal
damage, with only baboons and jackals causing any measurable loss. Obviously, it will be of
interest to study the whole district, even though the elephant damage is restricted to
one area – by looking at the whole district you would hope to be able to compare and
contrast areas with different levels of elephant damage.
On the basis of the above observations you decide that there should be three strata
within the district. Within each stratum, villages can be randomly selected – probably using
a systematic sample from the map. This will constitute the first stage of sampling.
The second stage of sampling, that of households within villages, can be approached in
a number of ways. Firstly, for each selected village, the village head could be asked to
prepare two lists of names of heads of households, a male list and a female list, and a random sample of households can be drawn from each list. Alternatively, you could, with the
assistance of the village head, draw up a map showing all households in the village, marking
each one as male- or female-headed. Within each group of male and female household
heads, each household will be given a number and then, for each gender group, a systematic
sample of households can be selected.
Heads of households can be interviewed using a prepared questionnaire to extract
demographic details and obtain estimates of crop damage that has occurred in the past two
seasons. In addition, focus group discussions can be held with key informants in each village
in order to obtain in-depth information and opinions on the issues of crop damage and ways
to reduce it.
The process described in Example 4 is called multistage sampling – in other words you
sample at various stages of the population hierarchy, ensuring that at each stage you select
an adequate sample from each sampling frame. Issues of sample size at each level of
sampling will need to be discussed and finalised with someone who understands the
theoretical aspects of random sampling.
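As an illustrative sketch only — the strata, village names, and list sizes below are all invented — the multistage selection just described might look like this:

```python
import random

rng = random.Random(42)

# Invented frame: three strata (by damage pattern), each holding village names
frame = {
    "elephant_area": [f"vilA{i}" for i in range(1, 11)],
    "quelea_area":   [f"vilB{i}" for i in range(1, 9)],
    "low_damage":    [f"vilC{i}" for i in range(1, 21)],
}

sample = {}
for stratum, villages in frame.items():
    for village in rng.sample(villages, 2):          # stage 1: villages per stratum
        # Invented household frames, one list per gender of household head
        lists = {"male_headed":   [f"{village}_hhM{j}" for j in range(1, 31)],
                 "female_headed": [f"{village}_hhF{j}" for j in range(1, 16)]}
        # stage 2: households drawn separately from each gender list
        sample[(stratum, village)] = {g: rng.sample(hh, 5) for g, hh in lists.items()}

total = sum(len(s["male_headed"]) + len(s["female_headed"]) for s in sample.values())
print(total)  # 3 strata x 2 villages x (5 + 5) households = 60
```

Sampling each gender list separately guarantees both sub-groups are represented, which simple random sampling of all households would not.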
Example 5
Now let us take Example 4 further, and address the issue of data collection from the fields
and storage places of the villagers. This will constitute a third stage of the multi-stage
sampling process. One approach is to use the selected sample of households as a starting
point for selection of actual sites for measurements of crop damage. Each selected house-
hold will have a number of fields under cultivation. Depending on the size of area under
cultivation, it may be sensible to sample areas within fields for exact measurements, or
another approach could be to sample whole fields from those under cultivation. Sampling
areas within fields can be done by mapping the field, dividing it into plots of the size to be
examined, and then randomly selecting a sample of plots or quadrats – this can be easily
carried out once the map is drawn up. Additional considerations that must be taken into
account when planning this third sampling design include the type of crop planted, direction(s)
from which animals invade, location of water sources, and any other factors that may have a
bearing on crop damage.
An alternative approach would be to ignore the sample of selected households and begin
afresh, requesting the community to draw up a map of all planted fields, including crop and
animal access information, location of water sources, etc., as above. Planted areas may then
need to be clustered before sampling quadrats within each cluster.
Selection of storage places for recording types of storage and amount of damage can
again be approached in several ways.
Sample size
Decisions on sample size depend on a number of factors, including:
• What is required in terms of precision of variables measured?
• Just how much variability is there expected to be in each item to be measured?
• The practicalities – how big a sample can you actually deal with, in terms of both time and
resources?
• What sub-groups of the population are really of interest? You need to decide on the
sample size for the smallest sub-group of interest to ensure that the sample for this sub-
group is adequate for realistic estimation
• Which variable should be used to calculate sample size?
A good way to think about sample size is in terms of obtaining a confidence interval, i.e.,
what width of confidence interval will be acceptable for decision-making, based on the survey
results? The width you need will be used to determine the sample size, for each sub-group
of the population. Expressing the results in terms of confidence intervals helps in interpreting
the results more realistically. If the confidence interval is too wide then no meaningful
conclusions can be made. As mentioned earlier, the larger the sample size the narrower the
confidence interval – but increasing the sample size is likely to increase the cost and non-
sampling errors. Managing a large sample survey requires extensive resources and person-
nel if quality is to be maintained, and it is only the large agencies who can afford this type
of survey. But if the sample size is too small, then once again the quality of the estimates is
at stake, and results will not be meaningful.
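As a rough illustration, the usual normal-approximation formula for a mean, n ≈ (2zs/w)², turns a target confidence-interval width w and a guessed standard deviation s into a sample size. The figures below are invented:

```python
import math

def sample_size_for_ci(sd, full_width, z=1.96):
    """Smallest n giving a ~95% CI for a mean no wider than full_width,
    assuming simple random sampling and a guessed standard deviation sd."""
    return math.ceil((2 * z * sd / full_width) ** 2)

# Invented figures: guessed sd of 1.2 t/ha, CI to be no wider than 0.5 t/ha
print(sample_size_for_ci(1.2, 0.5))   # 89 units needed
print(sample_size_for_ci(1.2, 0.25))  # halving the width roughly quadruples n
```

The calculation must be repeated for the smallest sub-group of interest, since that is where precision runs out first.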
It is not true that the fraction (f) of the population sampled greatly influences the
accuracy of the sample. The information in a sample of 50 from a population of 10,000
(f = 50/10000 = 0.5%) is much the same as that in a sample of 50 from a population of
100,000 (f = 0.05%), other things being equal. The sampling fraction is not something to
consider when fixing the sample size, and aiming for a 10% sample or a 5% sample is not
logical. The only exception to this is when f starts to get large – say over 20%.
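This can be checked with the standard finite population correction. The sketch below (invented standard deviation) shows that the two samples of 50 give almost identical standard errors, and that the correction only bites when the sampling fraction is large:

```python
import math

def se_mean(sd, n, N):
    """Standard error of a sample mean, with finite population correction."""
    return (sd / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

sd = 1.0  # invented population standard deviation
print(round(se_mean(sd, 50, 10_000), 4))     # f = 0.5%  -> about 0.1411
print(round(se_mean(sd, 50, 100_000), 4))    # f = 0.05% -> about 0.1414
print(round(se_mean(sd, 2_000, 10_000), 4))  # f = 20%: the correction now matters
```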
You should be constantly aware that each survey study planned and implemented is a
unique case and thus ‘standard’ sample sizes do not exist. You should familiarise yourself
with previous research, but only use it to provide guidelines for your own study. The sample
size used last time may or may not be suitable for the current study and you should make
your own considerations and do your own calculations, rather than assuming that those used
in a previous study were suitable.
Questionnaires
When communicating with people, either via a structured interview or via focus group
discussions, or by any other means, it is wise to lay out a questionnaire ahead of time and
to know in advance the type of answers you can expect from each question. Questionnaire
design is extremely important – when you are interviewing people, you are assuming that:
• Everyone has the same understanding of each question
• Each question does have an answer
• Each question can be relatively easily answered
• Each question should be relevant to your study
• The question is not ‘leading’ the respondent towards a particular answer.
Remember that sensitive questions can upset people, which will lead to inaccurate
information being provided. Good questionnaire design can only come with experience and
it is wise to always ask for assistance.
Questions can be classified as open or closed. An open question is one for which any answer is accepted and recorded in full. A closed question is one in which you supply predetermined response categories into which each and every response should fit. Thus,
response categories should be:
• Non-overlapping, i.e., mutually exclusive
• Exhaustive
• Able to give an overview of the situation
• Neither too many nor too few
• Placed in a logical order.
Open questions provide more information than closed questions, but they are correspondingly harder to analyse; wherever possible it is best to use closed questions.
Focus group discussions (using open questions) are extremely useful for finding out
general information and situations on the ground.
Finally, remember that you should place your questions in a sensible and logical order so
that the interview/discussion will flow.
Pilot study
A pilot study is a small-scale trial run of the survey before the main data collection – in some ways it is like a 'mock' study of all procedures. When the pilot is complete you can
finalise the measurement tools and reproduce them in bulk as required.
The time between the pilot study and the main study should be as short as possible so
that all personnel remain in the correct frame of mind for the main study.
Data management
A survey generates a huge amount of data and thus it is essential to be absolutely
organised for every aspect. Data collection forms should bear unique ID numbers which, by
means of codes, will enable the data manager to know exactly where that data was collected,
and by whom. The team leaders should check each completed form in the field and, if there
are problems, the person who collected the information will have to return to the site and
repeat the process. Once the team leader is satisfied with a form, he/she will pass it on to
the data manager. If any coding is to be done it is now that it should occur – for instance,
categorisation and consequent coding of the content of open questions can take place at this
time. Thereafter the form is ready for data entry. Often data will be entered twice – double data entry – a recommended approach since comparing the two versions helps detect and eliminate entry errors; many statistical packages offer this facility. Those doing data entry should have been involved in
the planning so that they are aware of the survey objectives, familiar with the measurement
tools, and thus in a position to spot inconsistencies and/or errors on the data forms – in this
way cleaning of the data begins even at the data-entry stage.
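A minimal sketch of how a double-entry check might work (the form IDs and field names are invented): the two keyed versions are compared field by field, and any mismatch is flagged for checking against the paper form.

```python
# Invented (form, field) -> value records, keyed twice by different operators
entry_1 = {("HH001", "maize_yield"): "2.4", ("HH001", "hh_size"): "6",
           ("HH002", "maize_yield"): "1.8", ("HH002", "hh_size"): "5"}
entry_2 = {("HH001", "maize_yield"): "2.4", ("HH001", "hh_size"): "6",
           ("HH002", "maize_yield"): "1.9", ("HH002", "hh_size"): "5"}

# Any field where the two versions disagree must be resolved from the paper form
mismatches = [key for key in entry_1 if entry_1[key] != entry_2.get(key)]
print(mismatches)  # [('HH002', 'maize_yield')]
```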
Full-scale cleaning of the data usually takes place once all data has been entered and the
data files merged into one. Cleaning involves examining each variable in turn, looking for
outlier values and inconsistencies, particularly in respect of other variables that provide
complementary information. Once the data is pronounced clean the data analysis plan can
be put into action and results obtained for input into the final report . Additional recoding
may take place during the data analysis, e.g., merging categories of responses for more
realistic analysis.
Data storage
The importance of backing up your data files cannot be emphasised too often. At least three copies of each of the following files should be kept, preferably on CDs:
• Original data entry files, obtained before cleaning
• Cleaned data files
• On-going analysis files
• Records of comments on data collection
• Records of progress on data collection
• Records of coding and recoding
• Tables and other results of the analysis plan
• Reports.
All of this information will, eventually, feed into the data archive that should be set up on
completion of the survey.
Reporting
The survey report should follow the phases of the survey. Each phase should be reported
upon fully, including both good and bad aspects. Full details of measurement instruments,
training instructions, field reports, coding procedures, cleaning procedures, and the data
analysis plan, should be available as appendices to the main report. Don’t forget to report
on the non-sampling errors!
References
Scheaffer, R.L., Mendenhall III, W. and Ott, R.L. 1996. Elementary Survey Sampling. Duxbury Press, Kent, UK.
Thompson, S.K. 2002. Sampling. Second edition. John Wiley & Sons, New York, USA.
United Nations. 1964. Recommendations for the Preparation of Sample Survey Reports (Provisional issue). Statistical Papers, Series C, ST/STAT/SER.C/1/Rev.2. New York, USA.
United Nations. 1984. Handbook of Household Surveys (Revised edition). Series F, No. 31. New York, USA.
United Nations. 2004. Technical Report: Household Surveys in Developing and Transition Countries: Design, Implementation and Analysis (Section D, Chapter 14) (in preparation). ST/ESA/STAT/AC.85/www. New York, USA.
Internet resources
www.reading.ac.uk/ssc/develop
http://unstats.un.org/unsd/hhsurveys/index.htm
4.5 Measurements
Jane Poole
• Measurements generate the primary data in your study, whether it is a survey or experiment
• You will have to measure not only the primary quantities that meet your objectives, but those data that help explain and qualify them
• There are always alternative ways of measuring anything. Choose the method that best meets your objectives while being practically feasible
• Pay attention to quality control: careless measurement can jeopardise the whole study

Introduction
Measurement is a general term that encompasses many types of data collection. Measurements may be numbers that a scientist collects, such as yields of a crop in a field trial. But they can also be notes made of a farmer group discussion, climatic data provided by a local meteorological station, or responses to survey and interviewing questions.
Every aspect of your research study needs careful design. This includes choosing what measurements to take, when to take them and why. You must also consider how to measure and how much to measure. For large trials and surveys it may be necessary to delegate data collection to other scientists, local extension officers or farmer representatives – you will need to decide who takes the measurements. This chapter provides general guidance on how to make these choices and highlights important issues to be considered.

What are measurements?
Measurements generate the data you need for your research. You require these data and their analyses to make your research conclusions. There are many different types of measurements and your choice of which to use will depend on the objectives of the study, and on other details of the design. The measurements needed will also determine some aspects of the design.
The following are examples of measurements that may be
taken for different types of agricultural research. These examples
are just a small selection of the hundreds of possible measure-
ments you could take.
• Laboratory trials – chemical properties of soil and water
samples, pathogen growth on petri dishes, insect mating and
offspring production, eating routines of insect pests
• On-station and on-farm field trials – plant heights, insect
pest and disease levels, crop and biomass yields, root damage
of plants, farmer-participatory evaluation of varieties, labour
requirements
• Participatory research – farmer group characteristics, farmer
perception of new technologies, farmer evaluation of on-station
demonstration trials
• Biophysical surveys – site location and characteristics, plant
varieties, crop management, scientist-evaluated disease infection levels, farmers' perception of disease infection levels
• Socio-economic surveys – site location and characteristics, household and farmer char-
acteristics, farmers’ perception of crop management practices, farm labour information
• General/environmental measurements – weather data (rainfall, temperature), soil type
and properties.
Types of measurement
Qualitative and quantitative
Both qualitative (farmer opinions of new technologies) and quantitative (crop yields) data
require measurements. Quantitative measurements are necessary for many analyses and
interpretations. Qualitative data can often add insights and explanations that are hard to
capture in numbers. The distinction between the two is not always clear. Qualitative data (farmer reasons for crop failure) can be quantified after coding (e.g., by noting whether or not 'drought' is given as the reason for crop failure and then reporting the proportion of farmers who give different coded answers, such as the proportion believing 'drought' to be the reason).
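The coding idea can be sketched as follows; the responses and the coding rule are invented for illustration:

```python
# Invented open responses to "Why did your maize crop fail?"
responses = [
    "no rain at all this season", "drought", "striga infestation",
    "rains stopped early", "elephants destroyed the field", "drought and weeds",
]

def code_response(text):
    """Invented coding rule: map free text to a small set of categories."""
    if any(w in text.lower() for w in ("drought", "no rain", "rains stopped")):
        return "drought"
    return "other"

codes = [code_response(r) for r in responses]
print(codes.count("drought") / len(codes))  # 4 of 6 responses coded as 'drought'
```

In practice the coding scheme is usually drawn up after reading a sample of responses, then applied consistently to all of them.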
Example 1
An on-station researcher-managed trial was conducted to investigate sorghum varietal resist-
ance to stem borers. Quantitative measurements were taken of the number of stem borers
in the stems, stand count and crop yields. Local farmers were then invited to the station to
view the different treatments and group discussions were held to elicit farmers’ opinions on
the performances of the varieties. These additional qualitative data provided the researchers
with information about: characteristics farmers found important, the opportunities for transferring the experiments on-farm, and the likelihood for farmer uptake of the most resistant
varieties.
Repeated measures
Measurements taken on the same unit (plant, plot, household) repeatedly during a study are called 'repeated measures'. These types of data are frequently used in laboratory and field trials, e.g., plant disease levels estimated every week, the growth of a fungal pathogen on a
petri dish measured every 3 days.
When you are collecting repeated measures how often should you collect the data? In
some cases the answer to this question is simple, as when data are required after chemical
spraying or rain, then the occasions are defined. In other instances it is up to you to decide
how frequently to measure.
General guidelines when choosing the number of repeated measures to take:
• If you want to fit a (growth) curve to your data then 4–5 time points are usually sufficient
• When you don’t know which time points will give you information (plant disease levels in
a field trial may stay constant for some time) then you may need to take measurements
regularly (once a week). Note that for the plant disease example there is no point taking measurements at the start of the trial if there is no disease present. In this case you should be checking the site regularly and then start taking measurements when the disease starts to appear, otherwise you will spend a lot of time collecting a lot of zeros!
• It is not essential that the observations be taken at equal time intervals. However, it is
important to record details of each time point so that the patterns observed can be
accurately plotted (time plotted on the x-axis on the correct scale).
Example 2
A researcher wishes to measure above-ground biomass in an agroforestry trial, over a period
of 3 years. The plot size is set at 10m x 15m. He/she has several measurement options for
evaluating the amount of biomass, some ‘destructive’ and others ‘non-destructive’. What
measurement(s) could he/she take? Some options are in Table 1.
Table 1. Options for measuring above-ground biomass, with advantages and disadvantages

Option 1: Destructively sample a few plants per plot at regular intervals.
Advantages: collect large amounts of data on biomass production.
Disadvantages: lower precision of yield estimates (increase plot size to overcome this); if plant size within a plot is highly variable then a large sample is needed for a precise estimate of biomass; time requirements are high.

Option 2: Destructively sample a few plants from the guard rows at regular intervals.
Advantages: collect large amounts of data on biomass production.
Disadvantages: guard rows may not be representative of the plot; over time the guard rows will lose their ability to 'protect' the crop; time requirements are high.

Option 3: Destructively harvest the whole plot at the end of the experiment only.
Advantages: time requirement is low; does not require extra plot area.
Disadvantages: no idea of the biomass production over the 3-year time period.

Option 4: Record the plant heights at regular intervals and harvest the whole plot at the end of the experiment.
Advantages: does not require extra plot area; a large sample (the whole plot) can be measured; plants can be followed over time.
Disadvantages: the height measurements may not be representative of the biomass yields.

Option 5: Record the plant heights at regular intervals; a sample of plants grown close to the trial is harvested regularly.
Advantages: the harvest measurements from neighbouring plants can be used to calibrate the non-destructive measurements.
Disadvantages: requires a lot of experience to correctly calibrate the measurements.
Bulked samples
Some variables, like the chemical properties of soil samples, can be measured by bulking together samples collected in the plot (or laboratory, site, etc.). You take N samples from a plot/location and mix them together to form a single composite sample. M sub-samples are
then extracted from the composite mixture and measurements taken for each. Things to note
about this type of measurement:
• The variation you observe between the M sub-samples is due to measurement error and/
or poor mixing. It has nothing to do with the variation in the plot
• The closeness of the measured values to the plot value will depend on how close the
value in the bulked sample is to the plot value. This is determined by the N field samples.
The more samples you bulk together (i.e., N is large) the more representative of the site
your composite mixture will be
• If the N field samples are highly variable, or collected in a way that introduces bias (e.g.,
all samples taken from one corner of the plot), then increasing the number of sub-samples
you take (M) will not help.
Think carefully about the information you really need. Do you want to know how soil P,
for example, varies between different samples from the same plot or how it varies between
different plots? If you only need the latter then maybe M can be 1, but N may still have to
be large to make sure the bulked sample really represents the whole plot.
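A small simulation (with invented soil-P values) illustrates the point: sub-samples from a single composite differ only by measurement error, however variable the plot itself is.

```python
import random

rng = random.Random(7)

# Invented plot: true soil-P varies a lot from spot to spot (sd = 3.0)
spot_values = [rng.gauss(12.0, 3.0) for _ in range(200)]

def bulked_measurements(N, M, measurement_sd=0.5):
    """Bulk N field samples into one composite; measure M sub-samples of it."""
    composite = sum(rng.sample(spot_values, N)) / N   # mixing averages the spots
    # sub-samples now differ only through measurement error, not plot variation
    return composite, [composite + rng.gauss(0.0, measurement_sd) for _ in range(M)]

composite, subs = bulked_measurements(N=20, M=3)
print(composite, subs)
```

However many sub-samples M are taken, they tell you nothing about the spot-to-spot variation; only a larger N makes the composite more representative of the plot.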
Table 2. Suggested measurements in an aphid field trial
Primary response measurement(s): plant aphid counts (every 7 days).
Additional variable(s): yield at harvesting (to investigate the aphids' effect on yields); rainfall and temperature (changes in climatic conditions may affect aphid numbers – how do these relate to the treatment effects?); soil fertility measurements.
Socio-economic survey
Objective – Investigate farmer perceptions of the impact of Striga on their maize yields.
(Detail – 200 farmers interviewed in one district of Kenya).
NB. This survey could be combined with a ‘researcher observed’ level of Striga to compare
farmer perception to the actual levels of infection.
• Look carefully at your research objectives
• What are the primary response measurements you should take so that you can answer
your objectives?
• What are the additional variables you could measure (Table 3) that will help you to explain
the patterns you observe and enable you to compare your research to similar work?
Table 3. Suggested measurements in Striga survey
• Primary response measurement: farmer perception of Striga levels. Additional variable: maize management methods (that may affect levels of infection).
• Primary response measurement: farmer perception of yield loss due to Striga infection. Additional variable: importance of maize to farmer's livelihood (looks at the impact of the perceived Striga levels).
So, your measurement options are determined by your research objectives. But often you
will find that several different measurements could be used to answer the objectives so how
do you decide which ones to use, without duplicating the information? The answer to this
question depends on your research design, available resources, and practical considerations.
Research design
Almost all experimental designs have more than one level of hierarchy (villages/ farms/fields/
plots, or plot/row/plant/leaf) and you have to decide what measurements to take at each
level. Different quantities should be measured at different levels of the hierarchy, for example,
the wealth of a farmer is usually measured at the household level, the crop yield may be
assessed for each plot, and tree height has to be measured on individual trees. Other
variables may be measured at higher levels, for example, discussions with a farmer group will
generate village-level variables.
The type of research you are doing also determines which measurements are appropriate.
For a researcher-designed and managed trial it makes sense to take measurements on every
plot and location. In a farmer-designed and managed trial measurements may only be taken
on some plots. In one farmer-designed and managed varietal trial, for example, the objectives
required crop yields to be measured. However, on some farms the level of crop management
was very low and weeds greatly reduced yields. Yield measurements were taken on the sub-set
of well managed farms and the conclusions applied to this environment. The reasons for varying
management input were recorded on all farms, to explain the differences between the well
managed and poorly managed sites. In this example, measuring the yields on poorly managed
plots would not have provided the information necessary to explain varietal differences.
Research resources
The type and number of measurements you can take will depend on the resources available
in terms of time, money, and human resources.
It is often not possible to take as many measurements as you would like due to a lack
of these resources. So, should you take small samples of many different types of
measurements or fewer types with more samples? The answer depends on how precisely (i.e., the size
of measurement error) you want to evaluate each type of variation. It is often possible to
simplify your measurements, by using indicators and proxies, so that a larger sample can be
measured. Review the following two situations and decide which measurement option you
would take.
Situation 1
Conduct a biophysical survey to evaluate the levels of coffee berry disease in five coffee-
growing districts. You have enough resources to sample 1000 trees. The majority of farms
have around 200 trees and there are approximately 500 farms in each district (Table 4).
Table 4. Measurement options in coffee survey

Measurement options                    Gain/loss considerations
Sample (all) 200 trees on each farm;   A precise estimate of disease level on each farm, but you only
visit 1 farm in each district          have one observation per district and therefore no idea of
                                       variation within the districts
Sample 20 trees on each farm;          Estimation of variation within each farm and also within each
visit 10 farms in each district        district. Comparison of the two is also possible
Sample 1 tree on each farm;            A good estimate of disease levels within each district but no
visit 200 farms in each district       idea of variation within each farm

As an alternative to the middle option you could increase the number of farms to 20 and
decrease the trees sampled per farm to 10 – thereby increasing the precision at the district
level but decreasing precision at the farm level.
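The gains and losses in Table 4 can be checked with a simulation. This is a sketch only: the district mean and the farm- and tree-level standard deviations are made-up values:

```python
import random

random.seed(2)

def survey_estimate(n_farms, trees_per_farm, district_mean=30.0,
                    farm_sd=8.0, tree_sd=15.0):
    """Simulate one survey of a district and return the estimated
    district mean disease level."""
    farm_means = []
    for _ in range(n_farms):
        farm_level = random.gauss(district_mean, farm_sd)   # farm-to-farm variation
        trees = [random.gauss(farm_level, tree_sd)
                 for _ in range(trees_per_farm)]            # tree-to-tree variation
        farm_means.append(sum(trees) / trees_per_farm)
    return sum(farm_means) / n_farms

def se(n_farms, trees_per_farm, reps=3000):
    """Standard error of the district mean under a given allocation."""
    ests = [survey_estimate(n_farms, trees_per_farm) for _ in range(reps)]
    m = sum(ests) / reps
    return (sum((e - m) ** 2 for e in ests) / reps) ** 0.5

one_farm = se(1, 200)     # 1 farm x 200 trees
ten_farms = se(10, 20)    # 10 farms x 20 trees
many_farms = se(200, 1)   # 200 farms x 1 tree
print(one_farm, ten_farms, many_farms)
```

All three allocations measure 200 trees per district, yet the precision of the district mean differs enormously: the 200 × 1 option is by far the most precise for the district, although, as Table 4 notes, it tells you nothing about within-farm variation.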
Situation 2
Carry out a farmer-managed experiment to evaluate the yield potential of 4 sorghum varie-
ties. You have 50 farmers who are willing to participate in the trial, but you are the only
scientist on the project. The crop matures on all farms in the same 2 weeks (Table 5).
Table 5. Measurement options in sorghum trial

Option 1: Visit every farm and carry out the harvesting yourself, avoiding the edges of
plots, taking into account damaged plants and gaps in the plot, etc.
Gain/loss considerations:
1. Time – you don’t have enough of it!!
2. Should the researcher control the harvesting of a farmer-managed trial?
3. The assistance provided by you may give a bias to the farmers’ perception of the varieties
4. Does harvesting the whole plot at the same time (which you would have to do) accurately
   reflect the actions of a farmer?

Option 2: Take proxy measurements such as stand count and height prior to harvest.
Gain/loss considerations:
1. Requires less of your time and can be carried out before harvest
2. May not always be a good proxy for the crop yield

Option 3: Ask farmers to harvest their own plots and provide you with sorghum yields for
each plot, in kg/plot or as a score such as ‘poor’ to ‘excellent’.
Gain/loss considerations:
1. Less of your time is needed; you interview the farmers about their yields and perceptions
2. Farmers maintain ‘ownership’ of the trial and the trial remains ‘farmer managed’
3. You do not obtain a precise researcher-controlled crop yield, although you can use the
   farmer evaluations to answer the objectives of the trial
• What measurements do you need to take at each level of your design hierarchy?
• What resources do you have for your research – in terms of time, money and labour?
• What use of resources will give you the highest precision for your most important
measurements?
Before
Measurements taken at the start of your research can:
• Provide you with a baseline for your experimentation, e.g., soil fertility measurements of
a field-trial site
• Be used to characterise the plot/farm, e.g., wealth categorisation of farmers prior to their
participation in an on-farm ‘uptake of technology’ trial
• Assist with your design, e.g., characteristics of regional farming population used to select
a representative sample for participatory work.
During
You may want to collect data on ‘interim’ responses whilst conducting your research:
• Common measurements on crop trials include plant stand and height. Other measurements
may include labour use for different operations, insect pest and disease levels, etc.
• You might opt to include the use of participatory research tools, e.g., participatory rural
appraisal (PRA) during on-farm experimentation
• Whilst evaluating the use of agricultural information centres or extension offices you may
choose to record the daily attendance numbers.
After
Towards the end of your research there may be follow-up measurements that can help you
to complete your understanding of the results:
• You could measure soil fertility levels at the end of an on-station field trial, or farmer
perceptions and technology uptake at the end of an on-farm trial, for instance, do they
choose to continue using one of the tested technologies?
• Data could be collected to demonstrate the impact of your research, by comparing to
your baseline data.
Field trials
• Ensure data collectors are trained in how to take each of the measurements. Give your
enumerators a demonstration of the data-collection methods
• Monitor enumerators’ performance by observing at least some of the data collection
• Check through the data as soon as they are given to you and follow up on any problems
with the enumerator immediately – before their memory fades
• Remember to take photographs of significant results or events, and label them carefully
for later reference.
Further reading

Ashby, J.A. 1990. Evaluating Technology with Farmers: A Handbook. CIAT Publication no. 187. Centro Internacional
de Agricultura Tropical (CIAT), Apartado Aereo 6713, Cali, Colombia. 95 pp.
Ackroyd, S. and Hughes, J.A. 1981. Data Collection in Context. Longmans, London, UK. 155 pp.
CIMMYT Economics Program. 1993. The Adoption of Agricultural Technology: A Guide for Survey Design. Centro
Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Mexico DF, Mexico. 88 pp. [email protected]
Coe, R., Franzel, S., Beniest, J. and Barahona, C. 2003. Designing On-farm Participatory Experiments: Resources for
Trainers. World Agroforestry Centre (ICRAF), Nairobi, Kenya. www.worldagroforestrycentre.org
Feldstein, H. and Jiggins, J. 1998. Tools for the Field: Methodologies Handbook for Gender Analysis in Agriculture.
Kumarian Press, Hartford, Connecticut, USA. 270 pp.
Franzel, S. and Scherr, S.J. 2002. Trees on the Farm: Assessing the Adoption Potential of Agroforestry Practices in Africa.
CABI, Wallingford, UK. 208 pp.
Lal, R. 1994. Soil Erosion Research Methods. St Lucie Press, Florida, USA. 352 pp.
Philip, M.S. 1994. Measuring Trees and Forests. CABI, Wallingford, UK. 336 pp.
Spencer, D. 1993. Collecting meaningful data on labour use in on-farm trials. Experimental Agriculture 29: 39–46.
Schroth, G. and Sinclair, F.L. 2003. Trees, Crops and Soil Fertility – Concepts and Research Methods. CABI, Wallingford,
UK. 416 pp.
Internet resources
• Reading Statistical Services Centre (SSC) website (http://www.rdg.ac.uk/ssc/) contains sev-
eral downloadable booklets and papers. They provide ‘easy to read’ discussions and
advice on various aspects of experimental and survey design and analysis. ‘Measurements’
are often discussed within these topics.
Examples of useful information on the SSC website:
N. Marsland, I.M. Wilson, S. Abeyasekera and U. Kleih (2000). A Methodological Framework for
Combining Qualitative and Quantitative Survey Methods.
DFID Good Practice Guidelines:
• Guidelines for planning effective surveys
• On-farm Trials – Some Biometric Guidelines
• Centro Internacional de Agricultura Tropical (CIAT) website (www.ciat.cgiar.org)
Online publications:
Horne, P.M. and Stür, W.W. (2003). Developing Agricultural Solutions with Smallholder
Farmers: How to Get Started with Participatory Approaches.
TSBF Institute of CIAT (2001). Legume Cover Crop and Biomass Transfer Extension Leaflets
• Food and Nutrition Technical Assistance (FANTA) website (www.fantaproject.org)
(Downloadable: Agricultural Productivity Indicators Measurement Guide)
• International Livestock Research Institute (ILRI) website (www.ilri.cgiar.org) – check the
‘Capacity Strengthening – Training Materials’ page for some on-line materials.
• International Institute of Tropical Agriculture (IITA) website (www.iita.cgiar.org) –
on-line publications (a few of these need to be ordered from IITA).
4.6 Data management
Gerald W. Chege and Peter K. Muraya
Key points
• ‘Data management’ refers to all the steps in looking after and processing your data, from
observation in the field until the end of the study, and after
• Attention to data management is important to ensure your observations are valid, can be
processed efficiently, and will remain available for follow-up analysis at the end of your study
• Your project must have a data management strategy that describes procedures and
responsibilities
• Computing will be an important part of a data management strategy. If your data are
simple then spreadsheets may be suitable tools for data management. There are good and
bad ways of using spreadsheets
• If your data are complex then spreadsheets will not be sufficient and you will need to learn
something about database design and use
• Misunderstandings over data ownership can damage projects. Make sure all ownership
issues are resolved before data are collected

Introduction
Research work, whether experimental or survey type, generates data. Data are the resources
used by scientists to make conclusions and discoveries. As in other human activities, if you
plan to use resources you need to take care of them, because lack of care may have disastrous
effects. For example, a computer file containing medical data collected over a number of years
could become corrupted. If there was no other copy elsewhere the total value of the resource
would be wiped out.

Data management can be defined as the process of designing data collection instruments,
looking after data sheets, entering data into computer files, checking for accuracy, maintaining
records of the processing steps, and archiving the data for future access. It also includes data
ownership and responsibility issues.

Data management is important for the following reasons:
• To assure data quality. Since conclusions are based on data, accuracy is paramount;
errors resulting from wrong data entry and from incorrect methods of converting and
combining numbers must be avoided
• Documentation and archiving. Documenting or describing data and archiving it are
important so that anybody can make sense of the volumes of rows and columns of
numbers for ongoing research and future use
• Efficient data processing. Scientists spend a great deal of time preparing data for
analysis. This includes converting data to suitable formats, merging data sitting in different
files, and summarising data from field measurements. The time spent in this pre-processing
step can be greatly reduced if data are properly managed.

To see why data management is important, it may be worthwhile considering how
organisations manage financial and accounting data. Whole departments spend huge
resources on tracking transactions to ensure quality, on keeping records to document and
describe those transactions, and on ensuring the records are available for future reference,
to generate invoices, or to make payments and summary accounts. Specialist accountants
are trained and hired to do this. Unlike accountants, scientists are expected to perform
similar tasks with research data without the benefit of training.

The key steps followed in research data management are summarised in Figure 1.
Planning for data management takes into account research objectives, resources and skills
available. Appropriate field data recording sheets are designed. Data collection includes
appropriate quality control. Raw data should be checked for errors. It should be entered into
well organised computer files. Captured data must be backed up to safeguard against
catastrophes. Data are processed for analysis, the results of which are checked again for any
errors. Any data processing is logged to track data changes. Finally, data are archived for
future reference possibly by other scientists.
After reading this chapter, we hope you will be better able to manage your research data.
To appreciate the difficulties involved, some of the problems will be discussed. Such prob-
lems are both technical and people-oriented.
Technical problems include such issues as: lack of skills, lack of data documentation so
future access is not straightforward, joint access for team projects, lack of proper design so
as to meet data requests, incompatible data sets in cases where similar data are gathered at
different locations or times, or files backed up on software that is no longer supported.
‘Soft’ or people issues include: time wastage in searching for data, re-processing old data
sets, collecting data that had already been collected, and reformatting data.
Data capture
As shown in Figure 1, data capture is the activity that combines collection, checking, entry
and saving data in some permanent electronic medium. You can get lots of help from
specialists in carrying out the data management steps before and after data capture, but this
is the one step you cannot shortcut and have to do yourself without much help. It is the step
that takes most of the resources (time and money) meant for the research; and that’s why
it is so critical. The quality of the data processing that comes after this step will be
determined by many factors, including which data you capture, how you lay it out, and which
tools you use to do the job.
The tools
Some data capture needs can be sufficiently met by using word-processing software to
publish the final results in a simple table. That’s important, but it’s not the main reason you
enter data. The main reason is to help you turn raw data into more meaningful results, an
operation that is much harder to achieve with word processors than with other software tools.
Databases are the other type of tool available for data manipulation. However, they are not in
common use because they are not intuitive for users who do not have much programming experience.
Between the word processor and database extremes lie the spreadsheets that some people
prefer to use for data capture because they are easy to use for data entry, limited manipu-
lation and to display simple graphics. Here, the word limited is emphasised because the
extent of the spreadsheet limitation is something that is under your control. Used without any
discipline, a spreadsheet can be as severely limiting as a word processor; with discipline you
can use it to process your data with a flexibility coming very close to what you can achieve
with a well designed database application.
• The long names are captured in single cells and formatted in word wrapping style –
instead of the more common way of using multiple cells to break the label into small
displayable chunks
• The short names are all alphabetic. Avoiding other characters or alphanumerics is a good
discipline since most other applications will strip them out, or replace them with codes
that may change the column names to something unexpected
• The short labels are entered on the last row, just before the numbers – allowing a data
export range that excludes the long titles to be formulated and named.
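That discipline is easy to automate when labels are prepared outside the spreadsheet. A sketch in Python follows; the function and its rules are our own illustration, not part of any package:

```python
import re

def short_name(label, used=None):
    """Reduce a long column label to a short, purely alphabetic
    name that survives export to other packages unchanged."""
    letters = re.sub(r'[^A-Za-z]', '', label)   # drop digits, spaces, symbols
    name = (letters[:8] or 'col').lower()
    if used is not None:                        # keep names unique
        base = name
        while name in used:
            name = base + 'x' * (len(name) - len(base) + 1)
        used.add(name)
    return name

used = set()
names = [short_name(lbl, used) for lbl in
         ['Root collar diameter (mm)', 'Height 2003', 'Height 2004']]
print(names)   # ['rootcoll', 'height', 'heightx']
```

Note that the two height labels differ only in their digits, so once the digits are stripped a suffix is added to keep the short names distinct.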
Other descriptors
To further describe data sets, users will often go to great lengths to formulate folders and
filenames that document the data. So a folder/filename like /Western Kenya/Eva/Striga
research/2000.xls is not uncommon. There are two problems with describing data sets like
this. The first is that you lose this description if the file is copied to another folder. The
second is that the folder/filename structure gets very convoluted if you attempt to cram in
all available documentation. One way to get round these problems is to enter this other
documentation directly into the spreadsheet, rather than coding it into folders and
filenames. Entering it at the header is less likely to interfere with other spreadsheet
operations than anywhere else. A good example is shown in Table 3.
Figure 2. One entry per cell principle: a. Error in column Q1, b. Solved by another sheet for Q1
The circled entry is in error. The solution is to create another worksheet for Q1 as shown
in Figure 2b. The two worksheets are then linked using special formulae in Excel, e.g.,
vlookup(...); these are more difficult to use than the equivalent operation after exporting
the data to a database package like Access.
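What the linked worksheets achieve is a one-to-many lookup, and it helps to see the operation stripped to its essentials. A sketch in Python with made-up survey data (the field names and values are ours):

```python
# Main sheet: one row per farmer
village = {1: 'village A', 2: 'village B', 3: 'village A'}

# Q1 sheet: one row per answer, several answers per farmer
q1 = [(1, 'maize'), (1, 'beans'), (2, 'maize'),
      (3, 'maize'), (3, 'beans'), (3, 'sorghum')]

# The 'vlookup': attach each farmer's village to every Q1 answer
joined = [(farmer, crop, village[farmer]) for farmer, crop in q1]
print(joined)
```

This is exactly the join that a database package performs in one step; in a spreadsheet it takes a whole column of lookup formulae.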
The solution of moving the second part to a different sheet is not recommended because you
lose the integrating effect that allows you to analyse the data set as a single unit.
• Lines 8 and 15 are blank; the user may have inserted them for some sort of clarity. Excel,
however, interprets a blank row as the end of a data set range. So, if you used the sort
function, only the top part of your data set, up to the blank row, would be sorted, which
is probably not what you intended.
• The case for data lines 4–7, 10–14 and 17–22 also needs attention. Without rearranging
these data, it is clear to us what the implied values are in the blank entries. But this would
no longer be the case if the data were sorted. This is a very common problem when users
try to make a spreadsheet look exactly like the paper forms. The solution, of course, is to
fill in the implied values.
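Filling in the implied values can even be automated once the data leave the spreadsheet. A sketch in Python, with a made-up layout in which a blank cell means ‘same as the row above’:

```python
# Rows as exported from the sheet; '' marks a cell left blank
rows = [
    ['village A', 'farm 1', 23],
    ['',          '',       17],
    ['',          'farm 2', 31],
    ['village B', 'farm 3', 12],
    ['',          '',       28],
]

# Carry the last non-blank value down each column
filled = []
last = ['', '', '']
for row in rows:
    row = [cell if cell != '' else prev
           for cell, prev in zip(row, last)]
    filled.append(row)
    last = row

print(filled)
```

Only do this for blanks that really mean ‘same as above’; blanks that mean ‘value missing’ must be left alone (or coded explicitly), otherwise the fill invents data.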
The recommendation is to leave the columns blank. Should you need to explain further why no
value existed, use the comment feature. Unfortunately, comments are ignored when data are
exported to environments outside of Excel, thus limiting their usefulness.
[Example data listing: note the row whose last two columns are left blank, marking missing values]

1  1  1  20  1  214  4
1  1  1  20  2  252  6
1  1  1  20  3  153  2
1  1  1  20  4  183  4
1  1  2  18  1   98  1
1  1  2  18  2
1  1  2  18  3  201  3
1  1  2  18  4  192  1
1  1  3   9  1  232  8
1  1  3   9  2  201  7
1  1  3   9  3  198  4
1  1  3   9  4  152  4
1  1  4  10  1  175  2
Keeping the column headings in view while you scroll through the data is achieved by freezing
or splitting the window panes. In Excel this is done by selecting Window → Freeze Panes (or
Window → Split) when the row below the column headings is selected. To remove this effect,
select Window → Unfreeze Panes.
Figure 3. Scatter plot example showing data outlier (for details see Appendix 11)
Data auditing
For already existing data, the auditing tool allows you to check for some errors. Auditing is
set up by selecting Tools → Formula Auditing… and then clicking on Show Formula Auditing
Toolbar. On the toolbar, click on the icon Circle Invalid Data (second from the right). This
draws a ring around invalid data.
An illustration of auditing, with validation rules for both species and rcd is shown in Figure 4.
In the figure errors are circled for the variables rcd and species. All cells in a column should
have the same data type. In the case of column D (Figure 4), the data type is numeric. The
string ‘DEAD’ in cell D10 is therefore inappropriate. The entry 12.7, 13.3 in cell D2 is also
invalid (it is ringed) because it holds two numbers in one cell, and so is 198 (in D21) because
it is out of range. In cell C19, A. polyantha is wrongly spelled and hence ringed. You can see how
the auditing tool helps you to spot data entry errors before processing.
Figure 4. Auditing of existing data (see SSC, University of Reading: Disciplined Use of Spreadsheet Pack-
ages for Data Entry)
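The same checks can be scripted when the data are processed outside Excel. In the sketch below the species list and the 0–150 range for rcd are illustrative validation rules of our own, not values taken from the figure:

```python
VALID_SPECIES = {'A. polyantha', 'G. robusta'}   # illustrative list

def check_row(species, rcd):
    """Return a list of problems found in one data row."""
    problems = []
    if species not in VALID_SPECIES:
        problems.append('unknown species: %r' % species)
    if not isinstance(rcd, (int, float)):
        problems.append('rcd is not a single number: %r' % (rcd,))
    elif not 0 <= rcd <= 150:                    # illustrative range
        problems.append('rcd out of range: %r' % (rcd,))
    return problems

data = [('A. polyantha', 12.7),
        ('A. polyantha', '12.7, 13.3'),   # two values in one cell
        ('A. polyamtha', 14.1),           # mis-spelt species
        ('G. robusta',   198),            # out of range
        ('G. robusta',   'DEAD')]         # text in a numeric column

errors = {}
for i, (species, rcd) in enumerate(data):
    problems = check_row(species, rcd)
    if problems:
        errors[i] = problems
print(errors)   # rows 1-4 are flagged; row 0 is clean
```

Scripted checks like this can be re-run every time the data change, which a one-off visual audit cannot.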
Designing a database
If you are handling several related worksheet files, the logical step to take is to use a
database for these data. The key issue in a database approach in capturing research data is
the initial design of the base tables. Each data object (called an entity – like a plant) has
characteristics that define that object. For example, a plant has leaves, height, maturity stage,
shoot size at a given time, description, etc. These are the properties of the plant object. In
Access-speak, the plant entity’s properties are the fields or attributes, and each has a data
type (sample data types are whole numbers – identified as Integer and Long Integer, decimal
numbers – identified as Single and Double, and Text, amongst others).
Database design means defining each of these attributes with their data types, selecting
one or a combination of attributes as a unique identifier for the plant object (called the
primary key) and then repeating this process for other entities. After this, you can create
relationships between the different entities. For more complicated databases some real
design work is necessary (which includes normalisation). An example is the relationship
between an employee and her dependants shown in Figure 5 (both employee and dependant
are entities). This type of relationship is called one-to-many – one employee can have several
dependants.
[Figure 5: the one-to-many relationship between the Employee and Dependants entities]
Once the fields for each entity are chosen you can define a table to hold the data. The
table design screen in Figure 6 shows the design of a person-level table. Names for the fields
and their data types are defined. Once the table is created you can enter data via the
datasheet or the spreadsheet view, as shown in Figure 7. The datasheet resembles the
Excel worksheet.
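The Employee–Dependants design can be sketched in SQL. Here we use Python’s built-in sqlite3 module rather than Access, and made-up names; the principle – a primary key on the ‘one’ side, a matching foreign key on the ‘many’ side – is the same:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('PRAGMA foreign_keys = ON')

# One side: each employee identified by a primary key
con.execute('''CREATE TABLE employee (
                   emp_id INTEGER PRIMARY KEY,
                   name   TEXT NOT NULL)''')

# Many side: each dependant points back at exactly one employee
con.execute('''CREATE TABLE dependant (
                   dep_id INTEGER PRIMARY KEY,
                   emp_id INTEGER NOT NULL REFERENCES employee (emp_id),
                   name   TEXT NOT NULL)''')

con.execute("INSERT INTO employee VALUES (1, 'A. Employee')")
con.executemany('INSERT INTO dependant VALUES (?, ?, ?)',
                [(1, 1, 'First dependant'), (2, 1, 'Second dependant')])

# One employee, several dependants: the one-to-many relationship
n = con.execute('SELECT COUNT(*) FROM dependant '
                'WHERE emp_id = 1').fetchone()[0]
print(n)
```

With the foreign key declared, the database itself refuses a dependant row that refers to a non-existent employee – a structural guarantee a spreadsheet cannot give.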
multiple worksheets so that data sets can be extracted from all of them, and also to check
data validity.
Data archiving
Archiving is the process of storing data for future use. The user of archived data is not
necessarily the person who did the experiments, or carried out a survey. Indeed a well
archived data set can be used by others to derive new relationships in the data or to
compare primary data with secondary data. Funding agencies may even be attracted by the
possibility of archiving data from the findings of a proposed project.
The process of archiving data rests on three basic principles:
1. Archive the data about the project, and not only the results of the study itself (this
   description of the data is sometimes called meta-data).
2. Archive a description of why the data were collected.
3. Archive a description of the data files – their types and structure.
The third makes future retrieval easier, the first makes it possible to easily understand the
rationale of the data-collection exercise, while the second gives additional information on
the procedures and processes of data collection. Together they mean a future researcher
would be able to replicate the experiment or survey for scientific validation of the findings.
Data files – in the majority of cases computer files – need to be well structured.
Backups of computer files should be made regularly, with a strategy of keeping a master
copy far away from the archive site. This ensures continuity in case of natural hazards like
fires, floods or earthquakes.
To summarise, a good data archive should be:
• Accessible: easy to access by many users who have commonly available software
• Easy to use: the field data-collection forms and what is entered into the computer
should be similar
• Clearly defined: the units of measure and the codes used (labels for names of variables)
should be as clear as possible
• Consistent: in names, codes, units of measurement and abbreviations
• Reliable: as free from errors as possible
• Internally documented: documentation should be complete with regard to: procedures
for data collection, sampling methodology and sampling units used; the structure of the
archive (how different files are related); a list of all computer files in the archive; a full list
of variables and notes on how to treat missing values; summary statistics for cross-checking
the information in the archive; and any warnings and comments that need to be observed
for data usage
• Confidential: ensure that the data remain confidential if this is required by the sources
• Complete: if possible you should store copies of the data-capture field forms, the data
management log-book, and a description of derived/calculated variables.
Storage of, and access to, the archive is also an issue for you to consider. A good archive
includes information on how to get into it, with rules of use and of replication to other
third parties. The storage medium for computer archives is in most cases hard disk. With
the pervasiveness of the Internet, access to archives is mainly by downloading archive
files. This is true for different types of data including text, graphics, maps, photos, and
audio and video material. Using diskettes and/or CDs as distribution media is of course
an option. The other medium is good old printouts sent by mail for long-distance access.
To show the seriousness of data archiving and its place in research, it is now possible to
publish peer-reviewed data papers (http://esa.sdc.edu/Archive/E081-003/main.html).
• Time. This is necessary for a good output. Funding agencies have recognised that, besides
the final research output (analysis results), data are also an important output, achieved by
archiving. Clearly, enough time ought to be devoted to data management in the overall
research timeframe
• Financial resources. It is important to include data management within the project
proposal, otherwise the necessary tasks it involves will not be done.
1. Transformation
This describes the entire data management cycle, starting from problem definition, formulation
of research objectives/hypotheses, development of data-capture tools, data entry using
validation rules, selection of data for analysis, the actual data analysis, management of
results and, finally, publication of findings. It is a cycle because it is possible to go back to
any point in the process in case there are errors.
2. Managing meta-data
Meta-data is a description of the data to be handled in a research project. It can be used to
describe data sets, enable effective management of data resources, and to enable other
researchers to understand the data sets of a project .
The key areas of a meta-data are:
i. Title. The name of the data set or the project
ii. Authors. Names of researchers (principal researcher and others) with addresses, phone,
e-mail, and web contacts
iii. Data set overview. Introduction to the data set, location of data, time of experiments/
survey, and any references
iv. Instrument description. Brief description of data capture instrument with references
v. Data collection and processing. Description of how data were collected, computed
values, and quality control procedures
vi. Data format. Structure of data files and naming conventions, codes (if used), data format
and layout, version number and date.
Meta-data description must be done for every project.
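One simple way to keep such a description with the data is a plain structured record. The sketch below covers the six areas, with every value a made-up illustration:

```python
import json

# Meta-data record for an invented project, one key per area above
metadata = {
    'title': 'Striga perception survey',
    'authors': [{'name': 'A. Researcher',
                 'role': 'principal researcher'}],
    'dataset_overview': 'Interviews with 200 farmers in one district',
    'instrument': 'Structured questionnaire, version 2',
    'collection_and_processing': 'Enumerator-administered; '
                                 'range checks applied at data entry',
    'data_format': {'files': ['survey.csv'],
                    'missing_value_code': 'NA',
                    'version': '1.0', 'date': '2004-11-30'},
}

# Stored as plain text alongside the data files themselves
archived = json.dumps(metadata, indent=2)
print(archived)
```

A plain-text record like this survives software changes far better than notes embedded in a particular spreadsheet or database file.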
Conclusions
Data management in research is very important. It is the entire process encompassing
project initiation through all the phases up to the time a paper is published as a result of
that research. For quality results, all the phases described in the strategy above must be
managed according to the stated principles. The scientific community now accepts data
papers for publication, in addition to the traditional research output documents. Data
archives are a rich source of information, so your data should be archived following the
guidelines discussed above. These should give you enough reasons to embrace research data
management and practise it. After reading this chapter, you should have sufficient information
to manage your research data better.
Besides the referenced material, additional sources of relevant information are provided
below.
4.7 Analysing the data
Susan J. Richardson-Kageler
• Interpretation
– Understand the results
– Combine new and old information
– Develop models
– Develop new hypotheses
As you progress through your research and analyse the data as you go along, you will find
that data analysis is iterative: it is not a simple matter of following straight through the
process outlined above. You will need to stop and revisit previous steps as new information
is discovered. Even though you analysed the data as you progressed through your work, you
may need to re-organise your data and reanalyse, so you must leave plenty of time to
complete the analysis and write up the results after data collection. Remember, it always
takes longer than you expect to get the tables, figures and analyses compiled and written up
effectively.
In this chapter, two examples are used. The first is a survey investigating farmers’
perceptions of, and use of, planted fallows. A questionnaire was administered to 121 farmers
who had experience with planted (improved) fallows grown with or without rock phosphate
fertilizer in Western Kenya.
The second example is an experiment to evaluate whether the pumpkin (Cucurbita maxima
L.) variety Flat White Boer can be used as a smother crop when planted at the same time and
intercropped with the long-season maize variety PAN86 at the University Farm, Mazowe,
Zimbabwe. This evaluation is done by comparing sole maize, sole pumpkin and a pumpkin–
maize intercrop.
Analysis objectives
Two key points for this section:
• Analysis objectives are determined by, but more specific than, the overall research objec-
tives.
• Analysis objectives will evolve during the research as you gain insights and experience.
When the research proposal is put together at the beginning of the research, it is usual to
state an aim and a set of objectives in the introduction. Analysis objectives often need to
be stated separately from objectives of the research work. This is because the analysis
objectives are dealing with the specifics of the analysis. The original objectives may appear
vague in comparison as they often do not specify precisely which variables are to be
analysed and how they are to be processed. The analysis objectives will determine such
specific things as:
• What the relevant variables are and the level to which they are summarised (see the
section on Preparing for analysis later in this chapter)
• The specific comparisons that will be made
• The relationships between variables that will be investigated.
For the pumpkin–maize intercrop example, the analysis objectives include:
• Comparing the maize grain yield between sole maize and pumpkin–maize intercrops
• Comparing the weed density of the sole maize, sole pumpkin and pumpkin–maize intercrop
• Determining how the maize yield depends on pumpkin and weed cover.
The analysis objectives should be refined as you proceed through your data collection.
Your experience in the field will give you ideas and insights you did not have when you
planned the research. It is a good idea to keep a notebook and record your observations and
ideas as you go along. Things often happen for which there is no place on your data entry
field sheets, and you often find when you come to the writing up that you cannot remember
the important things that occurred during the research process. Even go so far as to keep a
notebook beside your bed at night. This will help you to sleep better, as you can write down
the things that you think of and will not have to keep yourself awake to make sure you
remember your ideas in the morning! Examples for your notebook
include the fact that one of your test animals broke out and ate your neighbour’s vegetables,
or ideas for more informative graphs and data summaries.
The summaries you choose should be related back to the objectives of the study so that the
analysis does not become misdirected. You may need to revisit this step after carrying out
part of the analysis, as you may realise then that the summaries you have calculated are
inappropriate, or some analyses may have indicated patterns that are best examined at a
different level or with different variables.
The results of this stage are data sets that are in the correct form to answer the research
objectives. If this stage was not done in a statistics package, then the data should now be
ready for transfer to a statistics package for analysis. Software for handling data entry,
modification and analyses should all be compatible and includes:
• Database management software
• Spreadsheets
• Statistics packages
• Word processors.
Compatibility means that you can move information from one package to another. For
example, you may want to add your data to your final report as an appendix: you should be
able to copy the data in a data-entry package such as Excel and paste them into the word-
processing package in which you are writing your project report, e.g., Word. Similarly,
graphs generated in a statistics package such as Genstat or Minitab can be copied from the
statistics package and pasted into your project report in the word-processing package.
Example
One student who went immediately to an ANOVA missed the most important finding of his two
and a half years of research. He was investigating the effect of different diets on the fat in ostrich
meat. He collected the data and then carried out an analysis of variance. He did not plot any graphs,
calculate any summary statistics, or check the residuals resulting from the analysis of variance. Later,
when the residuals were examined, it was found that there was at least one outlier generated by each
analysis of variance. On examination of these outliers, it was found that the birds concerned all came
from the same farmer, and that they had consistently lower fat in their meat than any of the other
ostriches. Further investigation of this farmer could reveal a diet that would answer the aim of the
research – which was to develop a diet for ostriches that results in low body fat. The farmer was
doing exactly what was required as an outcome – he was already producing birds with low body fat
and high muscle yield – but the student had missed the point entirely. The lesson from this is that
data analysis is not complete without a proper investigation of the pattern before carrying out a
confirmatory analysis.
The preliminary analysis of the data (exploration and description) should reveal the following:
• Structure/shape of the data and pattern as related to the objectives
• Outliers or unusual observations
• The need to modify the data
• Patterns suggesting new questions and the data analyses.
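These preliminary checks need not wait for a statistics package. Here is a minimal sketch in plain Python, with invented fat-content values (not the student’s data), of the kind of simple screening that would have caught the ostrich outliers:

```python
# Hypothetical fat percentages for eight birds; the low value (6.9) plays
# the role of a bird from the low-fat farmer. All numbers are invented.
fat = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 6.9]

n = len(fat)
mean = sum(fat) / n
sd = (sum((x - mean) ** 2 for x in fat) / (n - 1)) ** 0.5  # sample sd

# Flag values more than 2 sample standard deviations from the mean
outliers = [x for x in fat if abs(x - mean) > 2 * sd]
print(round(mean, 2), round(sd, 2), outliers)  # → 11.41 1.84 [6.9]
```

Even this crude screen prompts the right question: why is one bird so different? A dotplot or boxplot of the same values would show the unusual point just as clearly.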
Villages can be grouped in ways that make sense, for example, villages close together, or by
using a data-driven method such as cluster analysis. The clustering could be based on
variables decided upon after examining the objectives of the research. But also think about
the analysis objectives: do you actually need to know about differences between villages?
Maybe ‘village’ is only recorded as part of the logistics of data collection, and need not
appear in your summary tables. The point is: the tables should relate to what you need to
know.

Table 2. Numbers of people interviewed by village, summarised by gender

Village        F    M   Grand total
Eb             2    2       4
Ed             3    4       7
Ei            11    7      18
Ek             1            1
El             8    4      12
Em             1   12      13
Es             2    2       4
Et             1            1
Eu             4    3       7
Ey             1    3       4
Lu             3    4       7
Mu             4            4
Ny            12    5      17
Sa            12    6      18
So                  1       1
Sr             1    2       3
Grand total   66   55     121

One objective in this survey is to assess the use of fallows in the past. This is given in
Table 3.

Table 3. Summary of the use of natural fallows by gender

Natural fallow   F    M   Grand total
No              31   22      53
Past             5    6      11
Still           29   26      55
Unknown          1    1       2
Grand total     66   55     121

For the intercropping experiment, the data were first entered into an Excel spreadsheet. It
was noted, while entering the data, that a mistake was made in carrying out the experiment:
one of the plots that should have been Treatment 3 was accidentally assigned Treatment 6.
This is not the end of the world, and the data can still be analysed with slight adaptations
to some of the methods. As these data are in Excel it is possible to carry out some analyses
using Excel. The output shown here is not the default from Excel, which showed many
decimal places; it has been modified to give one decimal place for each average. Displaying
too many decimal places is a common mistake made by students – often because they just
copy and paste the output from one package into their document.
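Pivot tables like Tables 2–4 can also be computed outside Excel. Here is a minimal sketch in plain Python, using a few invented plot records rather than the real data, of averaging weed biomass by crop and weeding treatment:

```python
from collections import defaultdict

# Hypothetical plot-level records: (crop, weeding treatment, weed biomass)
plots = [
    ("sole maize", "none", 83.5), ("sole maize", "none", 84.5),
    ("sole maize", "3 weeks", 79.3), ("sole maize", "3 weeks", 98.4),
    ("intercrop", "none", 87.6), ("intercrop", "3 weeks", 6.2),
]

# Accumulate sums and counts for each (crop, weeding) cell
totals = defaultdict(lambda: [0.0, 0])
for crop, weeding, biomass in plots:
    cell = totals[(crop, weeding)]
    cell[0] += biomass
    cell[1] += 1

# Cell means rounded to one decimal place, as in Table 4
means = {key: round(s / n, 1) for key, (s, n) in totals.items()}
print(means[("sole maize", "none")])  # → 84.0
```

Rounding at this final step, rather than pasting raw output, avoids the many-decimal-places problem described above.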
A quick examination of Figure 1 reveals that the treatment with the highest weed biomass
was the pumpkin-only crop with no weeding. The lowest weed biomass in the sole maize and
sole pumpkin crops was for those weeded at 3+5+8 weeks. All the pumpkin–maize intercrops
that were weeded showed similar weed biomass levels. It is not clear whether these differ;
however, it appears that the best return for effort is to weed the intercrop at 3 weeks. Bar
charts can be useful displays when presenting results because they give a quick visual display
of what is going on that most people intuitively understand. In Figure 1, however, the x-axis
shows the crop treatments, each point is a mean weed biomass, and separate lines join the
points belonging to the same weeding treatment across the three crop treatments. Note that
the lines connecting the points highlight which ones are from the same weeding treatment.
They do not suggest that there is a weed biomass for some intermediate treatments.

Table 4. Two-way pivot table of average weed biomass for the three crops and four weeding
treatments

                           Weeding
Crop                       None   3 weeks   3+5 weeks   3+5+8 weeks   Average
Sole maize                 83.5     79.3       28.3          6.8        51.4
Pumpkin–maize intercrop    87.6      6.2        3.5          5.5        24.2
Sole pumpkin              162.2     30.9       40.8          3.1        59.3
Average                   111.1     35.6       23.7          5.1        44.2
So far, the analysis has summarised the data using means. Other summaries, such as the
minimum and maximum values, trimmed means, standard deviations and standard errors,
can also be calculated. These summaries are useful when dealing with larger data sets, as
they may give indications of outliers, but they are less useful for small data sets like the
pumpkin–maize intercropping data. Take care in your use of a spreadsheet: it may calculate
statistics in a way that you are not sure about – for example, in how it deals with missing
values, or whether it calculates a population or a sample standard deviation. If in doubt,
take the data into a statistics package.
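The population versus sample distinction corresponds, for example, to the `ddof` argument in NumPy. This is an illustration of the general point, with three invented plot values, not output from the book:

```python
import numpy as np

x = np.array([83.5, 79.3, 28.3])    # three hypothetical plot values

pop_sd = np.std(x, ddof=0)          # divides by n (population form)
sample_sd = np.std(x, ddof=1)       # divides by n - 1 (sample form)
print(round(pop_sd, 2), round(sample_sd, 2))  # → 25.09 30.73
```

The difference matters most with small samples like this one; for most research data the sample form (n − 1) is the one you want.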
Table 5 shows a printout of the summary statistics calculated on the pumpkin–maize
intercrop data. This is not suitable to be presented in your report as an unmodified printout,
for a number of reasons. First, notice that the variable name ‘Treatment’ has been reduced
to eight characters by the statistics package. Next, unhelpful treatment numbers rather than
informative names are given. Most importantly, there are statistics given by this printout that
may not be appropriate, or that you may not even understand. The statistics may not be
Table 5. Descriptive statistics of the weed biomass of the pumpkin–maize intercrop data
(see text for explanation as to why this table should not be presented in this form)
Descriptive Statistics
Variable Treatmen N Mean Median TrMean StDev
Biomass 1 3 83.5 84.5 83.5 27.8
2 3 79.3 98.4 79.3 58.5
3 2 28.30 28.30 28.30 4.92
4 3 6.81 5.44 6.81 5.84
5 3 87.6 73.3 87.6 36.3
6 4 6.24 7.03 6.24 2.27
7 3 3.527 2.960 3.527 1.136
8 3 5.49 7.32 5.49 3.24
9 3 162.2 160.0 162.2 18.9
10 3 30.92 34.89 30.92 9.14
11 3 40.83 38.03 40.83 6.01
12 3 3.14 1.73 3.14 3.10
Variable Treatmen SE Mean Minimum Maximum Q1 Q3
Biomass 1 16.1 55.2 110.8 55.2 110.8
2 33.8 13.7 125.9 13.7 125.9
3 3.48 24.82 31.78 * *
4 3.37 1.79 13.21 1.79 13.21
5 20.9 60.7 128.8 60.7 128.8
6 1.14 2.91 7.98 3.87 7.81
7 0.656 2.785 4.835 2.785 4.835
8 1.87 1.75 7.39 1.75 7.39
9 10.9 144.4 182.1 144.4 182.1
10 5.27 20.48 37.41 20.48 37.41
11 3.47 36.73 47.72 36.73 47.72
12 1.79 1.00 6.70 1.00 6.70
Figure 2. Dotplot of the weed biomass for the pumpkin–maize intercrop experiment showing each
of the treatments
appropriate for the data structure, and the statistics may not answer the research objectives.
The student is also allowing the statistics package to have undue influence on the analysis
and the presentation of the results, and to distract him/her from the objectives of the
research. This is also an extremely untidy table, which is difficult to read! Tables are easier
to read if each treatment occupies no more than a single row, and ‘treatment number’ would
make more sense if it were replaced by a text label.
Figure 2 shows a type of exploratory graph (dotplot). Dotplots show the spread of the
data. Notice how the points for Treatments 1, 2, 5 and 9 are more spread than those of the
other treatments. The graph is still labelled with unhelpful treatment numbers rather than
names but maybe that does not matter. This is an example of a graph which is important to
you in analysing the data – it shows that some treatments are much more variable in weed
biomass than others. That is something you may need to take into account in your analysis.
But, unless it relates to a key analysis objective, you will not need to include this graph in
your report. Hence, its inelegant layout is not a problem.
Other plots like boxplots and stem-and-leaf plots are available and should be tried. If the
data have two related variables, for example, yield and amount of fertilizer applied, scatterplots
should be plotted to check whether a regression line can be fitted.
The analysis shown so far should be repeated for each response variable in a data set .
You can see that by this stage you could already write a fair amount on the data patterns and
subjectively suggest results. For example, in the intercropping example you could suggest
which is the best crop and weeding regime to use. But there are some severe limitations to
this analysis. Two important ones are:
1. Only simple patterns can be investigated. You can look at how y varies as x varies by
plotting y against x. But what if there are several x’s, all to be considered simultaneously?
2. There has been no consideration of the uncertainty in any of the summaries that are used
to interpret the data. Yet we know there is variation in the observations, so there is
uncertainty in the results.
The formal analysis addresses these problems. But you should note that although Excel is
useful for the descriptive analyses described so far, it is not good for more formal analyses
and modelling. Use a reputable statistics package such as Minitab, Genstat or SAS.
descriptive stuff is insufficient . You are after all doing a research degree. If there is no
generalisation, then there may be no research – and you might not get your degree!
It is at this stage that you will need to get some idea of the precision and accuracy of
your results. Precision is the closeness of the data points to each other. It is often measured
using variance, whereas accuracy is the closeness of the data points to the true population value.
Regression
Statistical models are mathematical representations of the pattern in data. The simple
regression model is often taught in basic statistics courses as it illustrates many important
concepts. A regression is the fitting of a straight line to data to describe and predict the
relationship between two variables. These variables consist of a response variable or depend-
ent variable and the variable to which it responds, the independent variable. In the following
regression example (Figure 3) the relationship between inorganic soil nitrogen and crop yield
was investigated.
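A simple straight-line fit of this kind can be sketched as follows; the nitrogen and yield numbers below are invented stand-ins for the data behind Figure 3:

```python
import numpy as np

# Hypothetical data: inorganic soil nitrogen (independent variable) and
# crop yield (response, t/ha). Invented for illustration only.
nitrogen = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
yield_t = np.array([1.2, 2.1, 2.9, 4.2, 4.8])

# Fit yield = intercept + slope * nitrogen by least squares
slope, intercept = np.polyfit(nitrogen, yield_t, 1)
print(slope, intercept)   # slope ≈ 0.093, intercept ≈ 0.25
```

The slope estimates how much yield changes per unit of soil nitrogen; the fitted line can then be drawn through the scatterplot to describe and predict the relationship.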
Another common error is to include all the regression output generated by computer
software in the text of the report. If you really need to include the output, it is best put into
an appendix, with the key points summarised in the text. The same mistake is common when
carrying out an ANOVA, and can be dealt with in the same way.
Your work is not complete when you find the model or regression line. You must still
check to see if you have fitted an appropriate model and if each of the parameters should
be in the model. This is done using the residuals and carrying out a number of significance
tests. You must indicate in the methods section of your report that such checks have been
carried out , or the reader will assume they have not been done and question the validity of
the models that you have produced (see any good statistical text or ask a statistician for
details on how to check residuals).
Part of exploratory data analysis is checking patterns of the residuals – there should be
no patterns if you have picked up the pattern and structure of the data correctly. However,
you often cannot check the residuals until you have actually fitted a model because the
residuals are the result of fitting the model.
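A sketch of this fit-then-check cycle on invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # hypothetical response

slope, intercept = np.polyfit(x, y, 1)      # fit the model first...
residuals = y - (slope * x + intercept)     # ...then the residuals exist

# Least-squares residuals always sum to (numerically) zero, so the useful
# checks are for trends, unequal spread and outliers in the residuals.
print(np.allclose(residuals.sum(), 0.0))    # → True
print(float(np.abs(residuals).max()) < 0.5) # no large outlier here → True
```

In practice you would plot the residuals (against fitted values, and as a normal probability plot) rather than rely on single numbers, but the order of operations is the point: the residuals can only be examined after a model has been fitted.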
Confirmatory analysis
Having checked that a model is now possible you need to look at the confirmatory statistics
supplied in the output . This is the statistical inference part of the data analysis. There are
several concepts you need to understand before you can do the next part of the analysis.
These are estimates (point and interval) and tests of significance.
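As a concrete (invented) illustration of a point estimate and an interval estimate, here is a 95% confidence interval for a treatment mean from three plot values, using the t critical value 4.303 for 2 degrees of freedom:

```python
import numpy as np

x = np.array([83.5, 84.5, 55.2])         # three hypothetical plot values
n = len(x)

mean = x.mean()                          # point estimate of the mean
se = x.std(ddof=1) / np.sqrt(n)          # standard error of the mean
half = 4.303 * se                        # 95% half-width, t(0.975, 2 df)
print(round(mean, 1), round(mean - half, 1), round(mean + half, 1))
# → 74.4 33.1 115.7
```

The very wide interval reflects how little information three variable plots carry: the point estimate alone (74.4) would badly overstate the precision of the result.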
It is the ideas of inference and modelling that students are usually taught in university
statistics courses, but find difficult. So you may have to review the key ideas of estimation,
confidence intervals and significance testing. Often this has been covered in your statistics
course, but in ways that are difficult to relate to your needs in analysing your research data.
This is perhaps because your statistics course was too theoretical. It may not have included
realistic examples. Some courses still do not integrate the use of the computer with the
discussions of the concepts of statistical modelling. Also, you may not have been very
interested at the time, perhaps because you had convinced yourself that the ideas were
difficult.
There are now many resources that you can use to review the ideas you need without
using too much mathematics. The references at the end of this chapter are for students who
need to review such ideas. But don’t leave this too late in your thesis writing. The later you
leave it, the more pressure you will be under to finish the thesis, so you will not be able to
concentrate on reviewing ideas that are not central to the needs of a current chapter.
If learning statistics was a problem for you, then remember it might also have been a
problem for your supervisors. They may be hoping that you will be more comfortable with
statistical concepts than they were! Even if they now like statistics, be aware that some
supervisors may cling to one or two favourite methods of analysis. These may not be the
only methods that can now be used to process your data.
You should understand these ideas, because you should not do analyses that you do not
understand. The rule remains to analyse the data in ways that are dictated by the
objectives of your study. For example, suppose you use a method called principal compo-
nents in your analysis. You do not necessarily need to understand all the formulae that
underlie this method. But you must be able to explain (perhaps in an oral examination) why
you have used this method and how the results have contributed to your understanding of
the data in relation to the objectives of the analysis. It is not sufficient to say:
Table 6. Tables of mean weed biomass for the pumpkin–maize intercrop experiment

a.
Treatment                                                 Mean weed biomass (g/m2)1
1. Sole maize with no weeding                                        83.50b
2. Sole maize with weeding at 3 weeks                                79.30b
3. Sole maize with weeding at 3+5 weeks                              28.30a
4. Sole maize with weeding at 3+5+8 weeks                             6.81a
5. Pumpkin–maize intercrop with no weeding                           87.60b
6. Pumpkin–maize intercrop with weeding at 3 weeks                    6.24a
7. Pumpkin–maize intercrop with weeding at 3+5 weeks                  3.53a
8. Pumpkin–maize intercrop with weeding at 3+5+8 weeks                5.49a
9. Sole pumpkin with no weeding                                     162.20c
10. Sole pumpkin with weeding at 3 weeks                             30.92a
11. Sole pumpkin with weeding at 3+5 weeks                           40.83a
12. Sole pumpkin with weeding at 3+5+8 weeks                          3.14a
1. Means with the same letter are not significantly different (5% LSD)
There is a need to examine the appropriateness of the model, and the model fitting itself
can lead to further data exploration and understanding of the results. In the pumpkin–maize
intercrop, the ANOVA table showed some significant effects, but the analysis of the residuals
showed that the model was not appropriate (the residuals showed inconsistent variances
across the treatments and the normal probability plot of the residuals was not a straight line).
If you fit the model only to obtain significance levels for testing your null hypotheses, you
will miss the real information in your data. You might produce significance levels with no
understanding of what really happened. The question is: you found a significant result, but
so what? You have missed the patterns and information in the data, and you have yet to
show that the ‘significant’ model is actually appropriate.
Mixed modelling
ANOVA can be difficult to apply to situations where there are multiple sources of random
variation – such as those between villages, between farms within villages and between plots
within farms. An approach to modelling, called mixed modelling is now available to deal with
these situations. This is an important statistical development for analysis of many field
studies, both survey and experiment . See Allan and Rowlands (2001) for further information.
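The idea of multiple variance components can be illustrated with a toy simulation. This is only an illustration of the concept, not the mixed-model fitting itself: with village effects of variance 4 and farm-within-village effects of variance 1, the overall variance approaches 4 + 1 = 5.

```python
import numpy as np

# Toy data: 200 villages, 50 farms per village, all values invented.
rng = np.random.default_rng(0)
n_villages, n_farms = 200, 50

village_effect = rng.normal(0.0, 2.0, size=(n_villages, 1))    # sd 2 -> var 4
farm_effect = rng.normal(0.0, 1.0, size=(n_villages, n_farms)) # sd 1 -> var 1

# Each observation = overall mean + its village effect + its farm effect
y = 10.0 + village_effect + farm_effect
print(round(float(y.var()), 1))   # close to 4 + 1 = 5
```

A mixed model run in a statistics package estimates the two components (4 and 1) separately from data like these, which a plain ANOVA treating all 10 000 observations as independent cannot do.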
The methods touched on briefly in this chapter are the main methods that have been used
in agricultural research in Africa.
Note that the value of any statistical method depends on what you want to find out, and on
the nature of the data and the research design that generated them. It does not depend in
any way on whether your data came from a research station or from farms, whether the
study was participatory, or whether it relates to the biophysical, social or economic aspects
of a problem.
• A model is a simplified representation of part of the real world. In this chapter we
discuss models that can be described mathematically
• Models are based on theory. In research, models help to test theory by making
predictions that can be compared with observations
• Models also allow the implications of research results to be explored by making
predictions for new situations
• Each model is built for a specific purpose. A model that is useful for one job may be
inappropriate for another task on a similar topic
• Models vary in scope from the simple, which you can put together and use very quickly,
to the complex, which may take much of your project time to develop and use
• Computing tools designed for the job can make modelling feasible for students who are
not specialists

Introduction
Modelling can mean many things in research, and models of one sort or another play a
crucial role in much research. Experience shows that the role and use of models is rarely
explained in research methods courses. The result is that many students have only a vague
idea about what models can and should be doing for them. Modelling is often regarded as
the domain of specialists who sit hunched over computers, not of agricultural researchers
who want to solve real problems in the field. The result is that much research is less
effective than it might be. The aim of this chapter is to start to fill that gap.
The chapter is divided into three major parts. The first shows you how models are a
natural part of the research process. This is to help you develop your ideas from the general
‘models are everywhere’ to the main focus of the chapter, which is concerned with
mathematical or simulation models. The second part discusses your options if you plan to
do some mathematical modelling. Finally, details of the steps you need to follow to
construct, use and test simple models are described, using examples where modelling tools
have been applied in research studies in Kenya. Research findings can be enriched by the
use of simulation models, and this chapter is an attempt to encourage you not to shy away
from using modelling tools just because you don’t like maths!

Model types

Models are everywhere
You may not be aware of them, but you are using models all the time. They come as
physical models in all shapes and sizes, from dolls, miniaturised cars and aeroplanes to
globes, or as visual representations in maps or pictures. They may be presented as verbal
or mental models, or in more abstract arithmetic or algebraic form, in nearly all we learn.
A model is just a simplified representation of part of the real world.
Physical models have been used for centuries in research. Engineers use models of boats
to study their stability and resistance to movement through the water. In biological research
one species is often said to ‘model’ another; in the early stages of medical research, monkeys
and mice are used to model man, because they represent some aspects of human physiology
well. The images we carry in our minds, i.e., mental models, are simplified representations
of complex systems. We use them constantly to interpret the world around us and we
usually do not realise that we are doing so.
None of these models involves complete similarity between the real world and the model,
but similarity in key features. A model is useful if it behaves in a realistic way for your problem.
The scale model of a ship may be useful for investigating its stability in the water, but it will
be useless for determining the profitability of operating the ship. Different models of the
same phenomenon are useful for different things. Take a 1-ha farm as an example. A map of
the farm (a visual scale model) might be useful when the farmer is planning the location of
different crops. Physical models of the landscape, built up from clay and painted, can be
used to examine the interaction of the farm with neighbouring farms and other land areas.
Numerical input–output models help in making investment decisions. Detailed numerical
topological models can be used to understand water flow and erosion on the farm. Each of
these is a ‘model of the farm’ and each is useful for its own purpose, but inadequate for
other purposes.
Mathematical models
This chapter is about the mathematical models that are used in agricultural research. If the
relationships and rules that make up the model are sufficiently well specified, then they can
be written down mathematically and produce numerical results. In very many models the
basic mathematical relationships and rules are simple (such statements as ‘volume = mass/
density’ or ‘yield is zero until after flowering’). Complex patterns of results often emerge
because of the many interacting components, rather than because there are complex
mathematical ideas embedded in the model. This is important. It means you do not have to
be a mathematician, or even very good at using mathematics, to make effective use of
models in your research.
4.8 Models
Roles of models
Models play several roles including:
• Exploring the implications of theory. It may not be possible to see the implications of
theories that involve several interacting components without calculating what happens in
different conditions. Used in this way, models provide insights and add creativity
• Prediction or forecasting tools help users make sensible educated guesses about future
behaviour. These can be used in planning, scenario analysis and impact analysis
• Explaining observations and generating hypotheses
• Training so that learners can carry out ‘virtual experiments’, exploring the result of
making changes.
In research models can help answer such questions as:
‘Can I construct a theory that explains my observations?’
‘Is my hypothesis credible?’
‘What new phenomenon does my theory help to explain?’
Used for prediction, models can answer such questions as:
‘Given the model, what will happen in the future?’
‘Given the model, what’s going on between places where I have data?’
‘What is the likelihood of a given event?’
How to model
You have three options if you decide to use simulation models in your work: you can use an
existing model as it stands, modify an existing model, or develop a new model altogether.
Using an existing model is usually the quickest option, but be aware of some common
problems:
• You may not find a model that actually describes the phenomena in which you are
interested at the right level of simplification
• The available models may require inputs that are not available to you
• You may not fully understand how the model is constructed (the theory on which it is
based)
• The model may not run on any computer available to you, or in the way you need for your
research.
If you are considering using a model, then select it by:
1. Determining exactly what you want to do with it. You will only be able to decide whether
candidate models are suitable when your task is clear.
2. Searching literature and the Internet for references to models that tackle your problems,
and asking experts in the field.
3. Evaluating each possible model against your requirements. If you end up with more than
one candidate then choose the simplest.
Steps in modelling
The steps involved in the modelling proc-
ess are summarised in the flowchart (Fig-
ure 2). However, developing any useful
model will be an iterative process – you
will certainly have to return to early steps,
for example, if you are looking again at
the interactions in your model when it
does not seem to give sensible predic-
tions.
The model-building process can be as
enlightening as the model itself, because
it reveals what you know and what you
don’t know about the connections and
causalities in the system you are studying.
Thus modelling can suggest what might
be fruitful paths for you to study and also
help you to pursue those paths.
parameters in question. Such variations can be ignored in a long-term model but could be
important in a short-term model. Examples of scales and typical times are:
• Metabolic (enzyme-catalyzed reactions; seconds to minutes)
• Epigenetic (short-term regulation of enzyme concentration; minutes to hours)
• Developmental (hours to years)
• Evolutionary (months to years).
Constructing a model
Building a model is an iterative, trial-and-error process. A model is usually built up in steps
of increasing complexity until it is capable of describing the aspects of the system of interest.
Note: it will never ‘reproduce reality’.
The appropriate tools you need to construct a model depend on the complexity of the
model. The simplest tools may be paper and pencil. Others may use spreadsheets, while the
more complex models may require dedicated modelling software with its own language.
The simplest mathematical models take the form of equations showing how the magnitude
of one variable can be calculated from the others; spreadsheets like Excel are adequate for
this task.
More complex computer simulations use special software that allows the building and
testing of a model. There are software products available that make building and running
some types of models very easy even if you know nothing about computer programming.
Investigate such software as STELLA and ModelMaker before trying to write your own code
in lower-level computing languages. They make the job of developing and running your own
models very much simpler!
Figure 3a. Simple soil water model in STELLA

The development of the simple soil water model outlined in Figure 1 is shown here to give
you an idea of what is involved. The model represented in Figure 1 is drawn in STELLA
(Figure 3a). STELLA uses four main types of building blocks:

Stocks. These are stores of ‘stuff’, represented by rectangles. They may describe water,
money, people, biomass, … whatever you are modelling.

Flows. These are the movements of material into and out of stocks, represented by broad
arrows. The arrow can be thought of as a pipe, with a tap on it to regulate the flow. Sources
and sinks of the material are represented by ‘clouds’.
Converters. These are represented by circles. They hold values of constants and formulae
used to convert one type of material to another.
Connectors. These narrow arrows show the logical connections between components in the
model. The equations describing the model must be consistent with these connections.
The stock of soil water (W) has an inflow of rain (R) and outflows of uptake (U) and
drainage (D). The actual values of these are read from data files. The model is completed by
filling in a formula or other details in each location marked by ‘?’. The model can then be run.
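The same stock-and-flow logic can be sketched in a few lines of plain Python; the daily rain, uptake and drainage values below are invented stand-ins for the data files:

```python
# A minimal sketch of the soil water stock-and-flow model described above,
# in plain Python rather than STELLA. All daily values are hypothetical.
rain     = [10.0, 0.0, 5.0, 0.0]    # inflow R (mm/day)
uptake   = [3.0, 3.0, 3.0, 3.0]     # outflow U (mm/day)
drainage = [2.0, 0.0, 1.0, 0.0]     # outflow D (mm/day)

water = 50.0                        # initial soil water stock W (mm)
for r, u, d in zip(rain, uptake, drainage):
    water += r - u - d              # stock change = inflow - outflows
    water = max(water, 0.0)         # the stock cannot go negative
print(water)  # → 50.0
```

Each pass through the loop is one time step: the stock is updated by the net flow, exactly as the rectangle-and-arrow diagram expresses it.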
In Figure 3b the uptake is now calculated as c·P, where P is the potential evapotranspiration
(PET), also read from a file. It should be clear from this that modifying the model requires
little more than adding components to the diagram. The real challenge, of course, is
deciding how to model uptake, not changing the computer code – this is why software such
as STELLA is so important. The final step shown here (Figure 3c) displays two more changes
that the modeller thought would help. The drainage is now calculated (because there was no
Figure 3c. Simple soil water model with uptake depending on both crop biomass and soil
water

Sensitivity analysis, validation, verification and calibration

Sensitivity analysis
Through sensitivity analysis, you can gain a good overview of the most sensitive components
of the model. Sensitivity analysis attempts to provide a measure of the sensitivity of
parameters, forcing functions or sub-models with respect to the state variables of greatest
interest in the model. It helps you to
221
The Green Book
systematically explore the response of the model to changes in one or more parameters, to
see how sensitive the overall model outcome is to a change in value. This sensitivity is always
dependent on the context of the setting of other parameters, so you should be careful about
the conclusions you draw. Some parameters only matter in particular types of circumstance.
Others, however, seem to always matter, or to matter hardly at all. This type of model
analysis is used to see which parameters should get priority in a measurement programme.
You must be provided with affordable techniques for sensitivity analysis if you are to
understand which relationships are meaningful in complicated models. This is equally true
whether you are using an already developed model, modifying a model or developing one.
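As an illustration of the idea (not of any particular package), a one-at-a-time sensitivity check can be coded directly: perturb each parameter by a small percentage, rerun the model, and compare the relative change in the output of interest. The toy model, parameter names and values here are all invented for the sketch.

```python
# One-at-a-time sensitivity analysis on a toy water-balance model.
# Each parameter is perturbed by +10% in turn and the relative change
# in the model output is recorded. The model and its parameters are
# invented for illustration only.

def model(params):
    # Toy seasonal water balance: uptake coefficient c, rooting depth z,
    # runoff fraction f acting on a fixed seasonal rainfall total.
    rainfall = 300.0
    infiltration = rainfall * (1.0 - params["f"])
    uptake = params["c"] * params["z"]
    return infiltration - uptake     # crude end-of-season water store

base = {"c": 0.5, "z": 120.0, "f": 0.3}
base_out = model(base)

sensitivity = {}
for name in base:
    perturbed = dict(base)
    perturbed[name] *= 1.10          # +10% change in one parameter
    sensitivity[name] = (model(perturbed) - base_out) / base_out

# List the parameters from most to least influential
for name, s in sorted(sensitivity.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {s:+.2%}")
```

Remember the caveat in the text: because each parameter is varied with the others held at their base values, the ranking can change in a different region of parameter space.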
Modelling leaf phenology effects on growth and water use in an agroforestry system containing maize in semi-arid Central Kenya, using WaNuLCAS.
The three tree species under study were Grevillea robusta (evergreen), Alnus acuminata (semi-deciduous) and Paulownia fortunei (deciduous). The inputs included climate data, soil data, calendar of events, crop and tree parameters, agroforestry zones and layers, and leafing phenologies. The scenario outputs included soil water balance, tree and crop biomass and stem diameter. WaNuLCAS model simulations demonstrated that altering leaf phenology from evergreen through semi-deciduous to deciduous decreased tree water uptake and interception losses but increased crop water uptake and drainage rates in all the species. It was therefore concluded that deciduous tree species would compete less with crops and be more advantageous in increasing stream flow than evergreen trees. Phenology had not previously been a major consideration in determining tree selection. For more details, see Muthuri (2003).
Modelling the benefits of soil water conservation using PARCH: a case study from a semi-arid region of Kenya.
The PARCH model was used to simulate maize grain yield under three soil/water conservation scenarios: 1. a typical situation where 30% of rainfall above a 15 mm threshold is lost as runoff; 2. runoff control, where all rainfall infiltrates; and 3. runoff harvesting, which results in 60% extra ‘rainfall’ for rains above 15 mm. The study showed that runoff control and runoff
harvesting produced significant maize yield increases in both the short and the long rains.
Previously, runoff control was justified more for erosion benefits than increased crop production. For more details, see Stephens and Hess (1999).
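The three runoff scenarios are simple enough to express directly in code. The sketch below applies each rule to a daily rainfall record to give the ‘effective’ rainfall the crop sees; the rainfall values are invented, and the rules are one plausible reading of the description above (applied to whole rainfall events above the threshold), not taken from the PARCH model itself.

```python
# Effective rainfall under the three soil/water conservation scenarios
# described in the PARCH case study. Daily rainfall values are invented,
# and each rule is a paraphrase of the text, not PARCH's own code.

THRESHOLD = 15.0   # mm; the rules apply to rains above this amount

def typical(rain):
    # Events above the threshold lose 30% of their rainfall as runoff
    return 0.7 * rain if rain > THRESHOLD else rain

def runoff_control(rain):
    # All rainfall infiltrates
    return rain

def runoff_harvesting(rain):
    # Events above the threshold gain 60% extra 'rainfall'
    return 1.6 * rain if rain > THRESHOLD else rain

daily_rain = [5.0, 20.0, 40.0, 0.0, 10.0]   # mm, invented record
for scenario in (typical, runoff_control, runoff_harvesting):
    total = sum(scenario(r) for r in daily_rain)
    print(scenario.__name__, round(total, 1))
```

Even this toy version shows why the scenarios diverge most in seasons dominated by a few large storms: only rain above the threshold is redistributed.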
Conclusions
The success of models developed by physicists and chemists has led to the rapid development of modern technology, the conquest of many diseases resulting in increased life expectancy, and the improvement of human lives on earth. But no matter how successful a model has been, scientists realise there may be aspects of the world that the model fails to explain or, worse, predicts incorrectly. Nevertheless, creating and using models is one of the most powerful tools ever developed. But there is a need to revise and improve models as new information is discovered.
Shepherd, K.D. and Soule, M.J. 1998. Soil fertility management in Western Kenya: dynamic simulation of productivity, profitability and sustainability at different resource endowment levels. Agriculture, Ecosystems and Environment 71: 131–145.
Soto, R. 2003. Introducing system thinking in high school. The Connector (Connecting system thinkers around the world) 1(5). http://www.hps-inc.com/hps/zine/sepoct03/jake.html
Stephens, W. and Hess, T.M. 1999. Modelling the benefits of soil water conservation using the PARCH model—a case study from a semi-arid region of Kenya. Journal of Arid Environments 41: 335–344.
van Noordwijk, M. and Lusiana, B. 2000. WaNuLCAS version 2.0: Background on a model of water, nutrient and light capture in agroforestry systems. International Centre for Research in Agroforestry (ICRAF), Bogor, Indonesia. 186 pp.
Internet resources
Ecological models http://www.wiz.uni-kassel.de/ecobas.html
CERES crop models http://www-bioclim.grignon.inra.fr/ecobilan/cerca/ceres.html
FALLOW model http://www.icraf.cgiar.org/sea/AgroModels/FALLOW/
FLORES model http://www.cifor.cgiar.org/flores/ An example of model building in participatory research
PARCHED-THIRST http://www.cluwrr.ncl.ac.uk/projects/tanzania/modelling.html
WaNuLCAS model http://www.icraf.cgiar.org/sea/AgroModels/WaNulCAS/
STELLA software; High Performance Systems Inc. http://www.hps-inc.com/
Powersim software; the business simulation company www.powersim.com/
Vensim PLE; Ventana Systems Inc. www.vensim.com/
Management Unit of the North Sea Mathematical Models (MUMM) (2003) http://www.mumm.ac.be/EN/Models/Development/Ecosystem/how.php
Model Maker www.modelkenetix.modelmaker/index.htm
4.9 Where is the participation?
Richard Coe
• Effective projects will involve participation of stakeholders in all stages of planning, implementation and evaluation
• Participation in a research study, both who and how, should be determined by the objectives of the study
• Participation of farmers or others in a research study is no reason to forget the elements of research design that will allow you to reach valid conclusions

‘One hour of life, crowded to the full with glorious action, and filled with noble risks, is worth whole years of those mean observances of paltry decorum, in which men steal through existence, like sluggish waters through a marsh, without either honour or observation.’
Sir Walter Scott

Part 2 of the book made a strong case for ‘participation’ – involving those who will use the research in the process. So why has Part 4 of this book, which focuses on research methods, not included a specific section on these methods?
The answer is simple: appropriate methods and levels of participation are needed in all stages of a project. The subject of this book is sound research, and the principles of methods needed to do good research are much the same whoever is participating. ‘Participatory methods’ for both the social and natural sciences are widely discussed and described in the literature and they have been referenced in the appropriate chapters. Thus, just as we do not describe all the methods available for experimental design, analysing data or building models, so this book does not provide a comprehensive review of the research methods which are available when collecting and analysing data in a participatory research process. There are specialised texts to help you with this.
This answer will not satisfy everyone, so it is elaborated below. In the discussion I distinguish between participation in a project and participation in the research studies that make up the project. The role of participation at the two levels can be different, and the extent to which, as a student, you can have some influence over them is also different.
Projects
Think of the cowpea project described in Chapter 2.3. Pests were found to be an important constraint in cowpea production, so the project aimed to find ways of overcoming them. However it is conceived, a project to tackle this problem will have elements of refining understanding of the problem, devising and testing possible solutions, promoting widespread use of the solutions and assessment of their impact, with iteration and cycling around these elements. It therefore seems natural that farmers should be involved in all the stages –
• Who better understands the occurrence and impacts of cowpea pests than the farmers growing the crop?
• Who can test and evaluate solutions, but farmers who will have to use them?
• Who can evaluate their impacts, but the farmers who feel them?
These are the pragmatic reasons for participation in problem-solving projects.
Other reasons are also often given, reasons which might be described as ideological,
some discussed in Part 2. Development workers all over the world believe that ‘development’
involves giving people more control over their lives and resources – indeed for some this is
the definition of development. Farmers have a right not to be told that their problem is cowpea pests, and what to do about it, but to be facilitated in understanding for themselves their problems and the solutions which suit them. Projects that take this approach are more likely to lead to sustainable solutions that continue to have impact past the end of the project and the departure of researchers.
So, if the cowpea project leaders are aware of and agree with these arguments, they will set up a project which involves the farmers in all stages, with farmers working collegially with researchers, each bringing their own expertise and knowledge to the table. But try to do this
and many further questions arise. For example:
1. Just who should participate? Farmers, or maybe others with an interest, such as cowpea consumers and traders? Almost certainly the interests of all parties will not coincide.
2. Who decided that cowpea pests were the problem to work on? If you ask farmers their problems you are more likely to get responses such as ‘Paying school fees’ or ‘Getting a job’, rather than ‘Pests on the cowpeas’.
3. The participatory approach requires intensive engagement in villages, meaning the project
can only involve a few of them. But the problem covers large areas. How can you get the
participation of all cowpea farmers?
There are no simple answers to these types of questions. Each project has to find ways
that are best suited to the circumstances, but this has to be done knowing and understand-
ing the many options and approaches available.
Point 3 above is one of the main reasons why projects need a sound research component.
If your objectives only stretch to helping in those villages or households with which you are
immediately engaged, then maybe you do not have to pay much attention to research
methods. However there are few instances in which this is the case. Every project wants to
generate information that will be useful beyond its own bounds. The only way to do this with
known reliability is to use well planned research methods.
As a student joining a project to undertake thesis research you may not have been
involved in planning the overall project and the approach to participation adopted. However
you need to understand what the approach is and the reasoning behind it. And you must be
prepared to challenge it if necessary.
Research studies
Most projects will involve a number of specific research studies which contribute to the overall strategy. How should participation be built into these, and what participatory methods are
appropriate? The answer is the same answer that we give to most other research methods
questions: it depends on objectives. If the objectives are specified clearly enough then they
should guide you. And, referring back to the previous section, there should have been
appropriate participation in setting the objectives for each component study.
A single project may well have component studies with different levels of participation,
which are mutually agreed by all concerned. An example of a project introducing high-value
experiment and analysing the data. But they do not alter the underlying requirements for a
good experimental design.
The same is true for surveys, the other way of collecting data. Here it is maybe even more
important to recognise the elements of sound design when using participatory methods.
Many of the tools used in participatory research are actually survey tools, so think about
their use in terms of survey principles. For example, focus group discussions can be very
valuable in understanding local conditions, problems and opportunities, particularly when
linked to such techniques as resource mapping and wealth ranking. They give real insights
into the villages with which you are working. And they give the participants themselves
insights into their own situation. If that is your objective, then you can be flexible in how and
where the tools are used. But do you want to know how broadly applicable those results are,
and the extent to which they are representative of a larger population? If so, then use
sampling techniques to select the villages in which you work, and use the ideas of survey
management to make sure that information is really comparable across different villages.
The final point to make about the use of participation in a research study is ‘Beware of
packages!’. There are many guides to participatory research that present a packaged set of
tools and processes, based on what the author has found to work. But that author’s
objectives and circumstances will not be the same as yours, so do not expect to be able to
follow the same steps and use exactly the same tools. You have to learn to pick and choose
the methods and tools that meet your objectives. As a simple example, we have found that
participatory tools for matrix ranking to assess alternatives, based on the traditional mbao
game, can be used effectively to evaluate an on-station, researcher-designed experiment.
5 Where to from here?
Kay Muir-Leresche
• The more people able and determined to contribute to sustainable development the better
• Your individual ideas and actions do count. You can make a difference
• Creativity and adaptability are essential criteria for successful economies in a rapidly changing and global environment
• You can contribute to both the poor and to your own advancement with imagination
• Experience and a track record are important for getting jobs — you may need to start off in a menial position or doing voluntary work to establish your credentials
• It is possible to be an entrepreneur even without capital
• Blend modern and traditional, indigenous and conventional
• Be proud of your heritage, understand the limitations and grasp the opportunities

‘…the initiation of all wise and noble things comes…, generally at first from some one individual.’
John Stuart Mill, Representative Government

‘…there will be no injustice in compelling the philosophers who grow up in our state to have a care for the others.’
Plato, The Republic VII

Where to from here?
You have finished your thesis, you have had it examined and you have undertaken your corrections — it is bound and you have a copy which you proudly present to your family.

What now?
Remember when you started out how unsure you were – ‘How will I ever manage that?’, you thought. Well you have – and you have grown in the process. You have developed skills and, most important of all, you have gained confidence. After the strain of producing your thesis, you may be feeling a bit flat and uninspired. You probably need to renew your enthusiasm and burning desire to contribute.
Confidence, the ability to use your own initiative and the inspiration and determination to make a difference are the most valuable of all resources any country can have.
The more people there are determined and able to contribute to sustainable development, the better. You only have to look around Africa, or the world, to see that it is not necessarily rich mineral resources, nor rich agricultural land, nor rich coastal waters that make countries better able to provide for their poor. It is their human capital. It is their commitment to success and their ability to respond quickly – to be creative and adaptable – so that they can take advantage of changing technology, institutions and social relations. It is essential for us to start to take charge of our destiny; to succeed in developing our countries. We Africans want our children to grow up in an environment where they are able to chart their own course and do not feel hopeless. We need to be able to move away from corrupt practices that obtain short-term advantages. We need to earn our incomes by providing goods and services which in turn will develop Africa. We need to be able to
take control of our development at the personal, village, national, and regional levels.
Finding work that provides you with money and prestige is a common goal, and both are important. We need money to live, to repay our families who invested in our education and to provide for our futures. Social standing can be important to many people – but remember that fashions change and what is prestigious today may not be in 10 years’ time.
Happiness, however, is not limited to wealth and fame. There is considerable personal
satisfaction from contributing to society. If you can make a lasting difference to the lives of
the poor, to the development of your country, or even to one student or one farmer or one
village, you will be able to look back when you are 80 and say – Yes, I did make a difference!!
These goals do not have to be mutually exclusive.
Example 1
Nyasha is hired by Norsk Agricultural Chemical Co. to promote the sale of fertilizers and
pesticides. She earns her salary by selling to conventional markets and using the established
recommendations. Perhaps she remembers that when she was doing her graduate research,
many small-scale farmers could not afford to apply fertilizer at the recommended rate. She
has heard of someone who has adapted the established recommendations to more closely
suit small-scale, poor farmers. So in her spare time, she contacts them and then draws up a
marketing strategy that would provide farmers with access to this new information. She has
to persuade the company that, although these recommendations are for lower fertilizer use,
they will make it possible to sell to many more farmers.
In this example the sales agent used her initiative and commitment to change things for
the better. She also advanced her career. You can all do this. In every job you do, it is
possible to make the world a little better for the future. You need to believe in your own
power. You need to learn to be a self-starter and to be prepared to put in that extra effort.
You need to take an ethical stand. Do not allow your valuable skills to be used to further
corruption and the cheating of your fellow citizens. Do not contribute to the degradation of
the environment and the impoverishment of future generations. We owe it to our children to
leave a better world than we found, and you can make a difference.
Employment
The formal sector
Remember that employment does not necessarily mean working for someone in return for a
salary. Employment means using your skills and labour to produce output that will have
financial and other rewards. Professional jobs for new graduates in Africa have become
increasingly difficult to find, despite the considerable shortage of skills. This is because for
some 50 years, governments employed new graduates. They would obtain practical experience and learn to operate in the working world, which gave them credibility and led to formal employment in the private sector. Decentralisation, declining government budgets and reduced investment in research, extension, and education have all contributed to shrinking
these opportunities in most African countries. At the same time, the private sector is
reluctant to hire untested graduates. In most countries very strict legislation makes it difficult
for employers to release staff once they have been hired, and as a result they are very risk-averse in their employment policies. You will need to be much more innovative than your parents were in seeking employment.
You need to get together some evidence of your ability. Take a copy of your thesis and
of a few other projects or papers you have produced. Ensure that you include the extra-curricular activities with which you have been involved and any positions of leadership or trust
which you may have held. Speak to the people you are going to use as referees and be sure
they are happy to do this. Provide them with a copy of your CV so that it is easier for them
to write the reference.
Find out about the company before you go for an interview. See where you think you
would fit in. At the interview you should not be arrogant but you must make an opportunity
to be able to tell them how you think you could contribute to their organisation. For example,
if the job involves selling tractors, you might mention your contacts from your home area
who may be interested clients – or mention your experience working in a garage during one
of your vacations. If it is project management and budgeting, you could mention your role in
the university agricultural student society. If you don’t have anything specific you could offer,
at least be sure you understand what the organisation does and show that you have thought
through how you could play a role within it.
Something else
If you are unsuccessful in obtaining formal sector employment, you should seriously consider
voluntary service as a stepping stone. Most prospective employers would be prepared to
provide you with basic transport and food costs. If you cannot find a company to hire you
even on these terms, then prepare a research proposal and contact relevant NGOs, government research departments or even churches. Do not be ambitious for a high financial
reward even when you are contacting an international agency. Remember this first ‘job’ is
more to establish your credibility and gain experience than to provide an income. Impress
the prospective benefactor with the fact that you are prepared to sacrifice in order to get
ahead in the future and to contribute to your society. You need to realise that the world does
not owe you a living and that you have to be creative in getting that first job. Once you have
experience, if you prove yourself, it will be much easier to move up the ladder.
For many African students this is difficult. Their families have invested resources in the
graduate’s education and now they expect that person to start to contribute to the family.
Prepare your family. Show them your strategy ahead of time and I am sure you will find them
much more understanding.
Even if no-one is prepared to take you on, even as a volunteer, you may then have to go
and take a much more menial job. Look at it positively as a stepping stone and be constantly
on the look-out for how you can contribute to the success of the organisation for whom you
are working. It is surprising how many highly successful people have started in very menial
positions.
Increasingly in Africa the best way to get ahead is to become an entrepreneur yourself.
How you go about this will depend on the contacts and resources you have. If you are able
to raise capital then you can be more adventurous. If you cannot raise any capital then start
very small. Identify a need and provide for it even in a very small way.
Example 2
Tapiwa realised that there would be no bread available in the following year. He knew that
urban workers would need to have convenience food that they could afford. He went to his
aunt in the rural areas and asked her to provide him with some sweet potatoes and promised
to repay them when he harvested his own crop. He went and read up all the literature on
sweet potatoes and learned what he could about their preferred soil types, mineral requirements and ideal moisture conditions. He could not afford fertilizer, but he approached the people in his street and asked them if they would put all their vegetables and other wet refuse into bags for him. He would collect it, and this would reduce the unpleasantness of such refuse left out on the road for days. He also collected newspapers, and on a vacant lot he made a compost pit. As a result he had a bumper harvest of high-quality
sweet potatoes that fetched a high price because of the need he had identified. In due
course he became a successful market gardener, bought his own plot and was able to employ
workers.
Contributors
Gerald W. Chege is a Kenyan with a PhD in Parallel Computing from York University, UK. He
is presently Assistant Professor and Coordinator of the Information Systems and Technology Department, United States International University, Nairobi. His main areas of interest are computer networks, database technology, systems development, and Internet technology.
Richard (Ric) Coe is an Applied Statistician from the United Kingdom. He gained an MSc in
Biometry from the University of Reading, where he continued as a lecturer for 10 years. During
that time he was involved in a number of training and research projects in Africa and Asia. In
1990 he moved to the World Agroforestry Centre (ICRAF) in Nairobi, Kenya. There he is Head of
the Research Support Unit that provides technical support and training in research planning and
design, data management and analysis to all ICRAF projects and partners. His interests are in
making research for development as effective as possible through the use of sound method-
ology, and increasing capacity in Africa to do this. He has taught courses at several universities
in the region and has worked with hundreds of graduate students on their research projects.
Tony Greenfield, a graduate in Statistics from London University with a PhD in Experimental
Design from Sheffield Hallam University, was formerly Head of Process Computing and Statistics
at the British Iron and Steel Research Association, Sheffield, and Professor of Medical Computing and Statistics at Queen’s University, Belfast. He is a Visiting Professor to the Industrial
Statistics Research Unit (ISRU), at the University of Newcastle-upon-Tyne and is past President
of European Network for Business and Industrial Statistics (ENBIS). While at Queen’s University
he established a course in research methods for the medical faculty. His publications include
Research Methods: Guidance for Post-Graduates (Editor and co-author), first published by Edward Arnold
in June 1996, second edition in June 2002. This book is used in some English universities in
courses for postgraduates who intend to proceed to research degrees.
Thomas Gumbricht is a Swedish Hydrologist working with the World Agroforestry Centre. He
holds a PhD in Land Improvement and Drainage from the Royal Institute of Technology (KTH),
Sweden, and prior to his arrival in Kenya was Head of Geoinformatics at the Department of
Earth Sciences, Uppsala University, Sweden. His main interests are systems ecology and hydrology, using geoinformatics as a platform for understanding and modelling processes on a landscape scale.
Sue Hainsworth has been editing all her working life. After graduating in Agricultural Sciences
from Nottingham University she edited the Tropical Pest Management Journal and wrote the first
three titles in the Tropical Pest Management Manual series on bananas, groundnuts, and rice.
After time in Rome with the Food and Agriculture Organization of the United Nations (FAO) and
the International Plant Genetics Resource Institute (IPGRI, then IBPGR) in 1983 she joined the
International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) and rose to become
Manager, Publications before leaving to start her own Editorial and Publishing Services in 1998.
Erica Keogh is a Zimbabwean, holding an MSc in Statistics (University of Zimbabwe, 1987). She
has been employed as a lecturer at the University of Zimbabwe since 1980 but is currently on
long leave while she engages in a long-term consultancy with the UK’s Department for Interna-
tional Development (DFID) monitoring their Humanitarian Relief Programme in Zimbabwe. Since
the early 1990s she has become increasingly involved in applications of statistics and has had
extensive experience in the design and implementation of surveys in both rural and urban areas,
focussing mainly on issues of poverty and related aspects of social change.
Eric McGaw, an American national, has lived and worked in the developing world for over 30
years. He graduated from Rockford College with a degree in Fine Arts, and completed postgraduate work in Education at Boston State College, USA. After serving in the Peace Corps in
El Salvador, he worked as a university professor, a deep sea diver, a freelance writer and editor,
and a communication specialist in Colombia, Brunei, Singapore, the Philippines and India. He
has travelled widely throughout Latin America, Asia and Africa. Currently, he is employed as
Head of Communications at ICRISAT located near Hyderabad, India.
Peter K. Muraya is a Kenyan with a BSc in Electronic Computer Systems Engineering from
Loughborough University, UK. Presently he is Data Management Specialist with the responsibility
of leading the World Agroforestry Centre’s initiative to bring management of research data to
agreed standards in all regions and projects. He was earlier involved with the development of
simulation models for agroforestry systems. His main interest is in the conceptualisation, design and implementation of data management methods and software tools.
Liliosa Maveneka has a BSc in Mathematics and Botany and an MSc in Agricultural Econom-
ics. She is also an Associate Member of the Institute of Chartered Secretaries and Administrators. She has worked as a Registrar in the Faculty of Agriculture and as a Senior Administrator
for the University of Zimbabwe for 15 years. She is a consultant working on HIV/AIDS impacts,
in issues related to water allocation and pricing and in providing assistance in accessing Internet
data to post-graduate students and researchers.
Catherine Wangari Muthuri is a Lecturer in the Department of Botany, Jomo Kenyatta Univer-
sity of Agriculture and Technology (JKUAT), Kenya, where she earlier gained an MSc in Botany
(Plant Physiology). Catherine submitted her PhD thesis on the ‘Impact of agroforestry on crop
performance and water resources in semi-arid Central Kenya’ for examination in September
2003. She carried out her research work for 3½ years at the World Agroforestry Centre. For the
past 10 years, Catherine has been involved in teaching, administration and research at JKUAT.
Her research interests include environmental plant physiology in agroforestry and non-agroforestry
systems with particular emphasis on drought stress and the application of agroforestry models
in research.
Joseph Opio-Odongo is a Ugandan holding a PhD in Rural Sociology from Cornell University,
USA. He is one of the United Nations Development Programme’s (UNDP’s) Environmental Policy
Specialists, out-posted to Nairobi to provide technical backstopping to UNDP Country Offices in
sub-Saharan Africa on policy and programme development. He previously served as a Sustainable Development Advisor at the UNDP Country Office in Uganda after some years of teaching
at Makerere University in Uganda and Ahmadu Bello University in Nigeria. His research and
teaching experience has been mainly in the fields of agricultural and rural development. His research and development interests include policy analysis, empowerment of civil society, sustainable development, organisational development, science and technology policy, lobbying and advocacy, and the codification and application of indigenous knowledge and technology.
Bharati K. Patel has been working in the Food Security Division of The Rockefeller Foundation
in Africa for the past 10 years. As an Associate Director she ran the Forum on Agricultural
Resource Husbandry, a competitive grants programme designed to encourage and support
research on agricultural resources. The programme supported the staff in ten Faculties of
Agriculture in Kenya, Malawi, Mozambique, Uganda, and Zimbabwe in their training of graduate
students. A Zambian with a primary degree in Botany and a PhD in Nematology from the Waite
Institute in Australia, Bharati also worked for the Zambian Agricultural Research System for 20
years where she rose to become the first woman Director of Agricultural Research in Africa. She
also worked in ICRISAT prior to her assignment with The Rockefeller Foundation.
Aleya Pillai is an Indian, with a Commercial Arts Diploma from Jawaharlal Nehru Technological
University, Hyderabad, India. She worked as Art Director from 1984 to 2001 with various Indian
advertising agencies, the last one being Mindset in collaboration with Saatchi & Saatchi. During
that time she designed several award-winning annual reports, calendars, press ads and logos.
She is no newcomer to cartoons, having used them to get across scientific concepts in the past.
Aleya also enjoys cartooning and oil painting in her spare time.
Jane Poole is a British national and holds an MSc in Biometrics (Applied Statistics) from
Reading University, UK. She has recently returned to the UK after 6 years of working in Africa,
where she provided biometrics support to scientists and research students at the World Agroforestry
Centre and CAB International (Africa Regional Centre), both based in Nairobi, Kenya. Jane
currently works at the UK Forest Research Agency with scientists covering a wide range of
disciplines: from forestry, entomology, and pathology to ecology and environmental research.
Jane has wide experience in experimental design and analysis and in small- and large-scale
biological and socio-economic surveys. She enjoys working with students and scientists from
many disciplines, learning about their research and working with them as a member of the
research team throughout their projects.
Jayne Stack is a Senior Lecturer in the Department of Agricultural Economics and Extension,
University of Zimbabwe and has more than 20 years' experience in development training,
development programmes and research in Africa and Asia. She has taught research methods at
undergraduate and postgraduate level and contributed to the development of distance-learning
courses in research methods and data analysis for Imperial College, London. Jayne has a wide
interest in development issues ranging from crop marketing to agricultural policy reform,
household food security and livelihood analysis. Her research work aims to make a difference in the
lives of the poor and to contribute information that will enhance livelihood security of vulnerable
households.
Paul L. Woomer is a researcher working with the Sustainable Agriculture Centre for Research
Extension and Development in Africa (SACRED-Africa), a Kenya-based NGO. One of his major
interests is the adaptive research process, in which different potentially useful technologies are
compared, combined and refined to suit the needs of individual farmers. He has written or
edited four books and published over 90 papers or chapters in international journals and multi-
authored books. Paul was raised in the Hawaiian Islands where he developed a keen interest in
tropical crops and ecology. He attended the University of Hawaii, where he obtained a BSc in
Agronomy and an MSc and PhD in Soil Science. He previously worked with NifTAL-MIRCEN,
TSBF-UNESCO, The Alternatives to Slash and Burn Consortium and the University of Nairobi.
Paul has lived in Kenya since 1990 and has visited or worked in 18 different African countries.
Acronyms and abbreviations
ACSS African Crop Science Society
ACT Almanac Characterisation Tool
ADDS African Data Dissemination Service
AFRENA Agroforestry Research Network for Africa
AHI African Highlands Initiative
AI appreciative inquiry
AKIS Agricultural Knowledge and Information System (World Bank)
ANOVA analysis of variance
AVHRR Advanced Very High Resolution Radiometer
CARPE Central African Regional Program for the Environment
CBO community-based organisation
CD compact disc
CGIAR Consultative Group on International Agricultural Research
CIESIN Center for International Earth Science Information Network
COSOFAP Consortium for scaling up options for increased farm productivity in
Western Kenya
CRSP Collaborative Research Support Program (USAID)
CRU Climate Research Unit (University of East Anglia, UK)
CTA Technical Centre for Agricultural and Rural Cooperation (the Netherlands)
DCW Digital Chart of the World
DEPHA Data Exchange Platform for the Horn of Africa
DEM digital elevation model
DFID Department for International Development (UK)
DMA Defense Mapping Agency
DRASTIC depth to groundwater, recharge, aquifer media, soil media, topography, impact of the
vadose zone, conductivity
DRC domestic resource cost
DSMW Digital Soil Map of the World (FAO)
DSS decision-support system
ENBIS European Network for Business and Industrial Statistics
ESA European Space Agency
ESRI Environmental Systems Research Institute, Inc.
ETM Enhanced Thematic Mapper
FAO Food and Agriculture Organization of the United Nations
FEWS Famine Early Warning System (USAID)
GDP gross domestic product
GIS geographic information system
GPS global positioning system
GUI graphical user interface
IARC international agricultural research centre
IBPGR International Board for Plant Genetic Resources (now IPGRI)
ICIPE International Centre for Insect Physiology and Ecology
ICRAF World Agroforestry Centre
Appendices on the CD
The African Crop Science Society
P.O. Box 7062, Kampala, Uganda
ISBN 9970-866-00-1