Basic Statistics Notes

Uploaded by

JAPHARI MAYEYE

BASICS OF STATISTICS

Module Contents
• Introduction to Statistics
• Description of data
• Basics of data collection
• Basics of field work Supervision
• Data summarization

Introduction to Statistics
Outline
• Introduction
• Characteristics of statistics
• Limitations of Statistics
• Classes of Statistics
• Importance of Statistics
• Basic concepts of Statistics

..Introduction to Statistics..
• What is Statistics?
Statistics is the profession of dealing with
information expressed in numbers.
• Statistics is not easy to define.
• Statistics has been defined differently by
different authors (Bowley, Boddington,
Croxton and Cowden, Horace Secrist, etc.).
• The word Statistics has two meanings/senses:
1. Statistics in the singular:
The study of the principles and methods applied
in collecting, organizing, presenting, analyzing
and interpreting numerical data in any field
of investigation.

..Introduction to Statistics..
2. Statistics in the plural:
Statistics in the plural sense refers to numerical facts
and figures collected in a systematic manner with a
definite purpose in any field of study. Examples:
production of rice in Mbeya in 2012, statistics
from a football game, the number of students enrolled at
EASTC in 2013/2014, etc.

A statistic (singular) is also a numerical value calculated
from a sample dataset.

Characteristics of statistics..
1. Aggregate of facts
Statistics does not refer to a single figure but to
a series of figures. A single weight of 60 kg for one
first-year Bachelor's degree student is not statistics, but a series
of weights for a group of persons is
statistics. In other words, figures that relate to
the totality of facts are called statistics. Statistics is
not concerned with what happens to a
particular object.

..Characteristics of statistics..
2. Affected by Multiplicity of Causes
Statistics are not affected by one factor only;
rather, they are affected by a large number of
factors. For example, prices are affected by
conditions of demand, supply, money
supply, imports, exports and various other
factors.

..Characteristics of statistics..

3. Numerically expressed
Another characteristic of statistics is that they
are expressed in quantitative form.
Qualitative expressions such as young, old, good
and bad are not statistics.
4. Estimated according to Reasonable
Standards of Accuracy
Exactness cannot be guaranteed.

..Characteristics of statistics..
5. Collected in a Systematic Manner
For the data to be accurate and reliable, the figures
should be collected in a systematic manner. If
the figures are collected in a disorganized
manner, the reliability of the data will
decrease. Thus, for a reasonable standard of
accuracy, the data should be collected in a
systematic manner; otherwise the results
will be erroneous.

..Characteristics of statistics..
6. Collected for a Pre-determined Purpose
The usefulness of the data collected would be
negligible if the data were not collected with some
pre-determined purpose. Figures are collected with
some objective in mind; effort made without any
set objective renders the collected figures
useless. Thus the purpose of collecting data must be
decided well in advance. Besides, the objective should
be concrete and specific. For example, if we want to
collect data on prices, then we must be clear whether
we have to collect wholesale or retail prices. If we
want data on retail prices, then we have to decide the
number of goods required to serve the objective.

..Characteristics of statistics..
7. Placed in relation to each other
The collection of data is generally done with the
motive to compare. If the figures collected are not
comparable, they lose a large part of
their significance.
Question
Are figures obtained from a 'kipimajoto' (thermometer)
regarded as statistics?

Limitations of Statistics
1. Statistics does not study qualitative phenomena:
Statistics deals with facts and figures. So the quality aspect
of a variable, or a subjective phenomenon, falls outside the
scope of Statistics. For example, qualities like beauty,
honesty and intelligence cannot be studied unless they are
converted into quantitative form.
2. Statistics does not deal with isolated
measurements:
Individual facts and figures are of importance to
individuals only; Statistics does not deal with them. It
deals with mass phenomena and therefore throws
light on the whole of a given group. Statistics deals
with aggregates, though for purposes of analysis these
aggregates are very often reduced to single figures.

Limitations of Statistics
3. Statistics can be misused:
Statements supported by statistics are more
appealing and are commonly believed. For this reason,
Statistics is often misused. Statistical methods
rightly used are beneficial, but if misused they
become harmful. Statistical methods used by
inexpert hands will lead to inaccurate results. Here the
fault does not lie with the subject of statistics but
with the person who makes wrong use of it, since it
requires experience and skill to draw sensible
conclusions from the data.

Limitations of Statistics

4. Statistics cannot express the entire story:
Statistics are presented in summaries, or reduced
form; in that way they leave out a lot of
information that may be important.

Classes/Divisions of Statistics
Statistics as a science can be divided into two main
classes/divisions.
1. Statistical Methods
Statistical methods are the devices by which complex
numerical data are systematically treated so as to
present a comprehensive and understandable view of them.
These methods are: Collection, Organization, Presentation,
Analysis and Interpretation.
2. Applied Statistics
Applied Statistics deals with the application of statistical
methods to specific problems, for example agricultural
statistics, industrial statistics, labor statistics, etc.

Statistical Methods
1. Collection of data
Collection of data constitutes the first step in a statistical
investigation. Utmost care must be taken in collecting data
because they form the foundation of statistical analysis. If the data
are faulty, the conclusions drawn can never be reliable. Data
may be available from existing published sources or else may
be collected by the investigator.
2. Organization of data
Organization of data can be done in three ways:
• Editing,
• Classification, and
• Tabulation
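
The three organization steps can be sketched in Python. This is only a minimal illustration on invented records: the field names, the validity rule used for editing and the age classes are all assumptions made for the example, not part of these notes.

```python
from collections import Counter

# Hypothetical raw records (invented for illustration).
raw = [
    {"name": "A", "age": 19},
    {"name": "B", "age": 23},
    {"name": "C", "age": -4},   # impossible value, to be caught by editing
    {"name": "D", "age": 31},
    {"name": "E", "age": 22},
]

# 1. Editing: drop records that fail a simple (assumed) validity check.
edited = [r for r in raw if 0 <= r["age"] <= 120]

# 2. Classification: assign each record to an (assumed) age class.
def age_class(age):
    if age < 20:
        return "under 20"
    if age < 30:
        return "20-29"
    return "30 and over"

classes = [age_class(r["age"]) for r in edited]

# 3. Tabulation: count records per class and print a small table.
table = Counter(classes)
for label in ["under 20", "20-29", "30 and over"]:
    print(f"{label:<12} {table[label]}")
```

In practice the editing rules and the classes would come from the objectives of the investigation rather than being chosen ad hoc as here.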

Statistical Methods
3. Presentation of data
3.1 Textual presentation
Statistics are expressed in words.
3.2 Tabular presentation
A statistical table can usually be considered to consist of the
following seven basic parts:
1. Table number
This number, which precedes the title, serves to identify the
table in case many tables are presented. It is usually a serial
number with suitable sub-numbers to indicate the concerned
main topic and sub-classification.
2. Title
3. Column caption
4. Stub or row caption
5. Body of the table
6. Footnote (if any)
7. Source (if any)

Statistical Methods
3.3 Graphical presentation
• Pictographs
• Pie-charts
• Bar and column charts
• Rectilinear charts
4. Analysis
A major part of this subject is devoted to the methods used
in analyzing the presented data. The methods used in
analyzing data are numerous, ranging from simple to
complicated, and are grouped into Descriptive Statistics
and Inferential Statistics.

Statistical Methods
5. Interpretation
The last stage in a statistical investigation is
interpretation: drawing conclusions from the
data collected and analyzed. This is a difficult
task and requires a high degree of skill
and experience. If the analyzed data are not
properly interpreted, the whole objective
of the investigation may be defeated.

Importance of statistics
• Enables one to describe/understand the current
situation in any socio-economic set-up.
• Enables evidence-based/informed decisions: "If
you cannot measure it, you cannot manage it",
"Statistics are the eyes and ears of any planner
and decision taker", "Without statistics you have
no right to speak".
• Enables predictions for the future: "A
ship without radar is likely to anchor anywhere".
• Used in planning: "Planning without statistics is
like a blind man groping in the dark trying to locate
a black cat that is not there".

Basic concepts of Statistics
• Population
A complete set of items under
discussion. It is important for the
investigator to define the population
carefully and completely
before collecting the sample.

Basic concepts of Statistics
• Sampling
Sampling is the process of selecting units from a
population of interest so that, by studying the
sample, we may fairly generalize our results back
to the population from which they were chosen.
Sampling can be probability sampling or
non-probability sampling. Probability sampling
gives a so-called random sample.
• Sample
A sample is a subset of the population.
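
As a small illustration of probability sampling, the sketch below draws a simple random sample without replacement using Python's standard `random` module. The population of 100 student ID numbers, the sample size of 10 and the seed are assumptions chosen only for the example.

```python
import random

# Hypothetical population: ID numbers of 100 students (assumed for illustration).
population = list(range(1, 101))

rng = random.Random(2013)              # fixed seed so the draw can be repeated
sample = rng.sample(population, 10)    # simple random sample without replacement

print(sorted(sample))
```

Every unit in the population has the same chance of selection, and no unit can appear in the sample twice.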

Basic concepts of Statistics
• Sampling frame
A complete list of all items of a population. For
the sampling procedure to be correct, the sampling
frame must also be correct. A good sampling frame
contains no inaccurate units, is free from duplications
and omissions, and is not out of date.
• Elementary unit
Each individual member of a population giving an
observation. It is the smallest unit yielding information
which, by suitable aggregation, leads to the population
under discussion.
Example: If an age distribution is to be estimated from a
sample of households, then the person is the elementary
unit; but if the size of the household is to be estimated,
then the household is the elementary unit.
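
The household example can be made concrete with a short Python sketch. The three households and their members' ages are invented for illustration; the point is only that the same sample is read person by person for the age distribution and household by household for household size.

```python
from collections import Counter

# Hypothetical sample of households; each household lists its members' ages.
households = {
    "H1": [34, 31, 5],
    "H2": [60],
    "H3": [27, 25, 3, 1],
}

# Age distribution: the elementary unit is the person.
ages = [age for members in households.values() for age in members]
age_groups = Counter("under 18" if a < 18 else "18 and over" for a in ages)

# Household size: the elementary unit is the household.
sizes = {h: len(members) for h, members in households.items()}
mean_size = sum(sizes.values()) / len(sizes)

print(age_groups)
print(sizes, mean_size)
```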

Basic concepts of Statistics
• Attribute
A qualitative characteristic that varies between units
and cannot be measured numerically. Example: hair color.
• Parameter
A parameter is a value, usually unknown (and
which therefore has to be estimated), used to
represent a certain population characteristic.
Within a population, a parameter is a fixed value
which does not vary. Parameters are often
assigned Greek letters.

Basic concepts of Statistics
• Statistic
A numeric value calculated from a sample data set. It is
used to give information about unknown values in the
corresponding population. It is possible to draw more than
one sample from the same population, and the value of a
statistic will in general vary from sample to sample.
Statistics are often assigned Roman letters.
• Characteristic
A common mark of the elementary units of the population on
which we are interested in taking observations. Examples:
mean, standard deviation.
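
The difference between a fixed parameter and a statistic that varies from sample to sample can be seen in a small simulation. The sketch below is illustrative only: the population of 1,000 weights is simulated rather than real, and the sample size of 30, the seed and the variable names are assumptions.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 1,000 weights in kg (simulated, not real data).
population = [rng.gauss(60, 8) for _ in range(1000)]
mu = statistics.mean(population)   # parameter: fixed for this population

# Three different random samples give three different sample means (statistics).
sample_means = [statistics.mean(rng.sample(population, 30)) for _ in range(3)]

print(round(mu, 2), [round(m, 2) for m in sample_means])
```

The population mean is computed once and does not change, whereas each new sample yields its own sample mean.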

Basic concepts of Statistics
• Survey (Sample Survey)
A method of collecting detailed information
relating to a representative group.
• Census (100% sample survey)
The process by which information about every
member of a population is collected. Examples:
animal census, population and housing census, etc.
• Variable
A variable is anything that can be assigned different
values in different situations.

Basic concepts of Statistics
• Homogeneity
The property of elementary units with very similar or closely
related qualities (characteristics).
• Heterogeneity
The property of elementary units with different qualities
(characteristics).
• Data
Data are a collection of facts such as values or measurements
(the raw information from which statistics are created).

Description of data
• Classification of statistical data according to
source
1. Internal data
These are data that an organization collects within
itself so as to know how it is being run from day to
day. They mainly arise as a by-product of
administration or management.
These data may be non-statistical, e.g. data collected
for accounting purposes.

Description of data
2. External data
These are data about an organization but compiled
by an agency outside the organization. This outside
agency may be the Government as part of its work
(a statistical unit outside the organization) or just
another organization which needs the data about
this organization as an interested party.
External data are a good source because sometimes
some data have been overlooked by the organization
itself. However, great care should be taken by the
organization when using them, since the purpose for
which the data were collected may be different.

Description of data
• Classification of statistical data by purpose
1. Primary data
Data which are specially collected to solve a specific
problem. They are collected directly from the field of
enquiry and hence are original in nature. This is
done when the data required for a particular study
can be found neither in the internal records of
enterprises nor in published sources.

Classification of statistical data by
purpose
2. Secondary data
Data collected to solve a problem at one time,
but now being used to solve another problem.
Secondary data must be used with
utmost care, because such data may be
full of errors arising from bias, inadequate
sample size or errors of definition.

Description of data
Statisticians compile different types of statistics.
The statistics compiled can be grouped into three major
groups.
1. Economic statistics
These cover the economic aspects of people's lives.
Examples: statistics on industry, infrastructure, etc.
2. Social statistics
These cover all social aspects of the conditions of life
and work of the population. Examples: statistics on housing,
health, education, cultural activities, crime,
etc.

Description of data
3. Population statistics
These use statistics to analyze the characteristics of,
or changes in, a population.
Exercise:
Identify the type of data for the following cases:
1. Price of food products.
2. Statistics on entertainment and ceremonies.
3. Number of EASTC students in 2013.
4. Enrolment capacity of Primary one.
5. Total number of children.

Description of data
There are two main types of data:
1. Qualitative/categorical data
These are data expressing the qualities of the units involved in
the investigation. They deal with description:
they can be observed but not measured.
Examples: color, smell, nationality, gender, etc.
Qualitative data can be binary, ordinal or nominal.
• Binary data
These are data which involve only two categories.
Examples: gender, smoking status, attendance, etc.

Qualitative data
• Ordinal data
These are data which involve more than two ordered
categories.
Example: degree of illness, students’ opinions about
Basics of Statistics class.
• Nominal data
These are data which involve more than two
unordered categories.
Example: Nationality.

Quantitative data
2. Quantitative data
These are data on quantities. The figures involved
have their own intrinsic values. They deal with numbers
and they can be observed as well as measured.
Examples: length, weight, number of lecturers, etc.
Quantitative data can be discrete or continuous.
• Discrete data
These are data that take only specific values within
the given range. Examples: number of people, number
of doctors, etc.

Quantitative data
• Continuous data
These are data that may take any value within the
given range. Examples: weight, height, volume, etc.

Data can also be classified according to levels of
measurement. The level of measurement of the data
dictates the calculations that can be done to
summarize and present the data. It will also
determine the statistical tests that should be
performed.

Levels of measurement
There are actually four levels of measurement:
nominal, ordinal, interval, and ratio. The
lowest, or the most primitive, measurement is
the nominal level. The highest, or the level that
gives us the most information about the
observation, is the ratio level of measurement.

Levels of measurement
• Nominal data
Data values serve as labels, but the labels have no
meaningful order. Nominal data have no order, distance or
origin. This is the simplest and lowest level of
measurement. Nominal data are qualitative only.
Example: Gender, coded 1 = Male, 2 = Female.
Data presentation: pie chart and bar chart.
Analysis: Frequency tabulation
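
A frequency tabulation for nominal data can be sketched with Python's `collections.Counter`. The eight coded responses below are invented for illustration; only the 1 = Male, 2 = Female coding comes from the Gender example in these notes.

```python
from collections import Counter

# Hypothetical coded responses, using the Gender coding 1 = Male, 2 = Female.
codes = [1, 2, 2, 1, 2, 2, 1, 2]
labels = {1: "Male", 2: "Female"}

freq = Counter(labels[c] for c in codes)
total = sum(freq.values())

# Print each category with its count and percentage share.
for category, count in freq.most_common():
    print(f"{category:<8} {count:>3} {100 * count / total:5.1f}%")
```

Counting is the only calculation that makes sense here: the codes 1 and 2 are labels, so an average of the codes would have no meaning.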

Levels of measurement
• Ordinal data
Ordinal data can be qualitative or quantitative. Data
values serve as labels, and the labels have a natural
meaningful order. However, differences between values
are not meaningful. In other words, ordinal data have
order, but no distance or origin. Examples: rating of a
Mathematics lecturer (Strongly like, Like, Dislike,
Strongly Dislike), Statistics grade (A, B, C, D), position
of a student in class, etc.
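
Because ordinal categories have order but no distance, summaries that rely only on ranking, such as the median, are appropriate. The sketch below uses the lecturer-rating scale from the example; the five responses are invented for illustration.

```python
# Ordered categories from the lecturer-rating example, lowest to highest.
scale = ["Strongly Dislike", "Dislike", "Like", "Strongly like"]
rank = {label: i for i, label in enumerate(scale)}

# Hypothetical responses (invented for illustration).
responses = ["Like", "Strongly like", "Dislike", "Like", "Strongly Dislike"]

# Sorting by rank uses only the order of the categories, not any distance.
ordered = sorted(responses, key=rank.get)
median_response = ordered[len(ordered) // 2]   # middle value for an odd count

print(ordered)
print(median_response)
```

The integer ranks are used only for sorting; subtracting or averaging them would assume equal distances between categories, which ordinal data do not provide.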

Levels of measurement
• Interval data
Interval data are always quantitative. Data values have a
natural meaningful order, and the differences between
data values are meaningful. In other words, interval
measurements have both order and distance but no
origin. There is no absolute (true) zero. Examples:
temperature (in degrees Celsius or Fahrenheit), year of
birth, time of day. Addition and subtraction of values
are meaningful, but ratios are not.

Levels of measurement
• Ratio data
Ratio data have order, distance and origin. This is the highest and
most ideal level of measurement. It is suitable for
measuring properties which have natural zero points.
Ratio data are always quantitative: data values are
numerical with a natural meaningful order, the
differences between data values are meaningful and
the ratios between values are meaningful. A zero
measurement indicates absence of the quantity being
measured. Examples: number of children, distance, etc.
Data presentation: histogram, line graph.
All arithmetic operations on the numbers are possible.
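
A short worked example may help show why a true zero matters. In the sketch below (values chosen only for illustration), the ratio of two temperatures changes when the unit changes from Celsius to Fahrenheit, whereas the ratio of two distances stays the same when the unit changes from kilometres to metres.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: temperature in Celsius has no true zero.
a_c, b_c = 10.0, 20.0
ratio_c = b_c / a_c                      # 2.0 in Celsius...
ratio_f = c_to_f(b_c) / c_to_f(a_c)      # ...but 68/50 = 1.36 in Fahrenheit

# Ratio scale: distance has a true zero, so ratios survive a change of unit.
a_km, b_km = 10.0, 20.0
ratio_km = b_km / a_km
ratio_m = (b_km * 1000) / (a_km * 1000)

print(ratio_c, ratio_f)    # the two ratios disagree
print(ratio_km, ratio_m)   # the two ratios agree
```

So "20 degrees is twice as hot as 10 degrees" is not a meaningful statement, while "20 km is twice as far as 10 km" is.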

Basics of data collection
Data production cycle
1. Information/data needed to solve a particular problem
2. Data collection
3. Data processing, analysis and interpretation
4. Report writing
5. Dissemination (which leads back to step 1)

Basics of data collection
There are different methods of data collection
1. Physical observation/measurement (primary data)
2. Interviews (primary data).
3. Questionnaires (primary data).
4. Focus group (primary data).
5. Registration/administrative records (Primary and
secondary).
6. Transcription from records (secondary).

Physical Observation/measurement
• This is the method in which enumerators
take observations or measurements on the units selected
for the study in order to discover particular
information.
• Observation involves viewing the phenomenon,
recording something about it, videotaping, taking
notes about what was seen, and counting occurrences of
a phenomenon.
• Observation can be structured or
unstructured, direct or indirect, participative or
non-participative, obtrusive or unobtrusive.

Physical Observation/measurement
Physical observation is considered to be the best
method of data collection when a researcher:
• is trying to understand an ongoing process or
situation,
• is gathering data on individual behaviors,
• needs to know about a physical setting,
• needs to collect data from unwilling respondents.

Physical Observation/measurement
• Advantages
1. Free from non-response error,
2. Free from language problems,
3. Free from errors of memory failure,
4. Free from errors of prestige.

Physical Observation/measurement
• Disadvantages
1. Susceptible to observers’ bias,
2. It is not always feasible (realistic),
3. It does not increase observers’ understanding of
why people behave the way they do,
4. Can be very laborious and time consuming,
5. Can be affected by transportation and accessibility
problems.

Interviews
• An interview is a series of questions a researcher
addresses personally to respondents for the
purpose of obtaining research-relevant information.
Interviews can be structured or unstructured.
• Interviews are a useful method for investigating issues
in depth, to discover how individuals feel
about a topic and why they hold certain opinions. They
are also useful for investigating sensitive topics which
people may feel uncomfortable discussing in
a focus group.

Interviews
They can be conducted as:
1. Face-to-face interview
In a face-to-face interview, an interviewer is physically
present to ask the survey questions and to assist the
respondent in answering them. It is also known as a
personal interview. It is probably the most popular
and oldest form of survey data collection. In a
face-to-face interview the interviewer can supplement the
information given by the respondents with personal
observation.

Interviews
2. Telephone interview
Telephone interviewing stands out as the best method
for gathering information that is needed quickly. Responses
are collected by the researcher over the telephone. It is a
very fast method of data collection, but the researcher
cannot supplement respondents' explanations with the
non-verbal communication given by the respondent.

Interviews
3. Computer Assisted Personal Interview (CAPI)
CAPI is usually conducted using a portable personal
computer. A researcher conducts a face-to-face
interview using the portable computer. After the
interview the interviewer sends the data to a central
computer. With CAPI, routing problems are eliminated
and interviewers cannot miss questions, which is
rarely achieved in paper-based face-to-face interviews.
However, it takes considerable time to construct and
program a CAPI questionnaire, and
inexperienced interviewers may direct much of their
attention to their portable computers.
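
The idea of automatic routing can be illustrated with a toy sketch. This is not how any particular CAPI package works: the three questions, the routing rules and the `run_interview` helper are hypothetical and exist only to show why a programmed questionnaire cannot skip or mis-route a question.

```python
# Hypothetical questionnaire: each question names the next question to ask,
# possibly depending on the answer given (a "skip pattern").
QUESTIONS = {
    "Q1": {"text": "Do you smoke? (yes/no)",
           "next": lambda ans: "Q2" if ans == "yes" else "Q3"},
    "Q2": {"text": "How many cigarettes per day?",
           "next": lambda ans: "Q3"},
    "Q3": {"text": "What is your age?",
           "next": lambda ans: None},   # None ends the interview
}

def run_interview(answer_for, start="Q1"):
    """Ask questions in routed order; answer_for supplies each answer."""
    record, current = {}, start
    while current is not None:
        answer = answer_for(current, QUESTIONS[current]["text"])
        record[current] = answer
        current = QUESTIONS[current]["next"](answer)
    return record

# Scripted answers stand in for a live respondent.
scripted = {"Q1": "no", "Q3": "41"}
print(run_interview(lambda qid, text: scripted[qid]))
```

Because the program, not the interviewer, decides which question comes next, a non-smoker is never asked Q2 and no applicable question can be left out.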

Interviews
• Advantages of interviews
1. They usually achieve a high response rate.
2. Respondents' own words are recorded.
3. Ambiguities can be clarified.
4. Interviewees are not influenced by others in a
group.

Interviews
• Disadvantages of interviews
1. They can be limited by memory failure.
2. They can be limited by language problem.
3. They can be costly.
4. They can be affected by interviewers’ biases – in
asking and recording answers.

Questionnaires
A standard list of questions relating to the particular
investigation is prepared. This list of questions is
called a questionnaire. The data are collected by
sending the questionnaires to the respondents and
requesting them to return the questionnaire after
answering the questions. In order to make this
method successful, a very polite letter is sent to the
respondents emphasizing the need for and usefulness of
the investigation.

Questionnaires
Questionnaires can be administered on paper
(paper-and-pencil) or on the web.
• Paper-and-pencil questionnaire
Questions are presented on printed paper. The
respondent fills in his/her responses and returns the
paper questionnaire to the researcher.
• Web-based questionnaire
Questions are presented on the computer. The method
excludes people who have no computer or
internet access, or who cannot use a computer. However, the
method eliminates the need to print the questionnaires
and to deliver and collect them manually.

Questionnaires
Questionnaires are a useful method of data collection
when:
• resources are limited and you need data from many
people.
• it is important to protect the privacy of participants.
(Questionnaires are helpful in maintaining
participants' privacy because participants' responses
can be anonymous).

Questionnaires
• Advantages
1. A large part of the population can be contacted
at a relatively low cost.
2. The responses are gathered in a standardized way.
3. They can be used for sensitive topics.
4. Respondents have time to think about their
responses.
5. They are free from interviewers' bias.

Questionnaire
• Disadvantages
1. Call-backs are not feasible when questionnaires are
anonymous.
2. It is sometimes difficult to attain a high response
rate.
3. Respondents may misinterpret questions and
therefore give wrong responses.

Focus group
• A focus group is a small group discussion guided by
a trained leader. The group composition is carefully
planned to create a non-threatening environment in
which people are free to talk about a focused topic.
• It is useful for understanding why people hold
certain opinions about a certain issue; it is not
concerned with making statements about a
population.

Focus group
• Advantages
1. It yields in-depth responses, since group members
can often stimulate new thoughts in each other.
2. It provides data more quickly than if individuals were
interviewed separately.
3. It is relatively cheap.
4. The researcher can gain information from non-verbal
responses to supplement verbal responses.

Focus group
• Disadvantages
1. Some group members may feel hesitant to speak
regardless of how much the leader encourages
team members to contribute.
2. Members’ opinion can be influenced by other
members.
3. Small number and convenience sampling severely
limit ability to generalize about a large population.
4. The leader may knowingly or unknowingly bias
results by providing cues about what type of
responses are desirable.

STEPS OF PRIMARY DATA
COLLECTION
In collecting primary data one has to follow the
following steps/stages.
1. Objectives/purposes and resources
• Lay down the objectives of the census/survey (these
are in most cases given by the users or the sponsoring
agent).
• The statement(s) of the objectives should be precise,
not statements of broad aims.
• At this stage one has to ascertain the availability of
reasonable resources for the work.

STEPS OF PRIMARY DATA
COLLECTION
2. Coverage
• The population to be covered should be specified
(its geographical, demographic and other
boundaries), and it should be decided whether it is to be
fully (census) or only partially covered.
• The type of sampling design to be used.
• The appropriate sampling unit (administrative
wards, districts, household, family or individual).
• Is the sampling frame available or does it have to be
developed?
• What is the required size of the sample to give the
required accuracy? How big the sample can be
sometimes depends on the available resources.

STEPS OF PRIMARY DATA
COLLECTION
3. Questionnaire design
• The framing and arrangement of questions is
perhaps the most substantial planning task.
• The scope of the questionnaire, its layout and
printing, the definitions and instructions that go
with it, and the wording and order of the
questions should be carefully considered.
• Possible tables to be produced using the collected
data should be thought of during this stage of
questionnaire design.
• In practice, the design of a questionnaire involves more
than one person, including statistician(s), subject
specialist(s), computer specialist(s) and user(s) or
the sponsoring agent.

STEPS OF PRIMARY DATA
COLLECTION
4. Questionnaire pre-test
• After the questionnaire has been designed, it is
tested on a few units in different places to check
its feasibility and its suitability for the objectives of
the survey, and also to check the correctness of the
questions and their order, i.e. whether they are easily
understood by the interviewers as well as
respondents.
• After the pre-test, the questionnaire may need to be
restructured to include important
issues that were missing and/or to omit
irrelevant ones.
• Pre-testing the questionnaire may be done more than
once.

STEPS OF PRIMARY DATA
COLLECTION
5. Pilot survey
• This is similar to the questionnaire pre-test. The only
difference is that the aim here is to estimate,
say, the workload of each enumerator and hence
the timing and costs of the real survey.
6. Recruitment and training of supervisors and
interviewers
7. Data collection (field work).

TOOLS FOR DATA COLLECTION
The various methods of data gathering involve the use
of appropriate recording forms. These are called tools
or instruments of data collection. They consist of:
1. Observation schedule/observationnaire
This is a form on which observations of an object or a
phenomenon are recorded.
2. Interview guide
This is used for non-directive and in-depth interviews. It
does not contain a complete list of items on which
information has to be elicited from a respondent. It
contains only the broad topics or areas to be covered in
the interview.

TOOLS FOR DATA COLLECTION
3. Rating scale
4. Check list
5. Opinionnaire
6. Document schedule
7. Schedule for institutions
8. Inventories
9. Interview schedule and mailed questionnaire
Both are widely used in surveys. Both are
complete lists of questions on which information is
elicited from the respondents. The basic difference
between them lies in how responses are recorded: a schedule
is filled out by the interviewer, while a questionnaire is
completed by the respondent.

QUESTIONNAIRE DESIGN
There are no hard-and-fast rules about how to design a
questionnaire, but there are a number of points that
should be borne in mind:
1. A well-designed questionnaire should meet the
research objectives. This may seem obvious, but
many research surveys omit important aspects due
to inadequate preparatory work, or do not
adequately probe particular issues due to poor
understanding.
2. It should keep the interview brief and to the point
and be so arranged that the respondent(s) remain
interested throughout the interview.

QUESTIONNAIRE DESIGN
3. It should obtain the most complete and accurate
information possible. The questionnaire designer
needs to ensure that respondents fully understand the
questions and that the questionnaire encourages them to provide
accurate, unbiased and complete information.
4. A well-designed questionnaire should make it easy
for respondents to give the necessary information and
for the interviewer to record the answer, and it should
be arranged so that sound analysis and interpretation
are possible.

QUESTIONNAIRE DESIGN
Before an investigator starts to construct the questions to
be included in his/her questionnaire, he/she should
first decide what information should be collected,
who the target respondents will be, and what
the proper method of data collection will be. After designing
the draft questionnaire, he/she will need to evaluate
it (relevance, appropriateness,
clarity and ambiguity, practicability, validity,
logical order, length of the instrument and other
aspects).

QUESTIONNAIRE DESIGN
The revised draft must be pre-tested to check whether:
i. the instrument elicits the responses required to
achieve the research objectives,
ii. the content of the instrument is relevant and
adequate,
iii. the wording of the questions is clear,
iv. the instrument has the required quality in question
structure and sequence.
After pre-testing the questionnaire, the procedures or
instructions relating to its use must be prepared and
the format of the questionnaire designed.

QUESTIONNAIRE DESIGN
Question construction
This involves four major decision areas:
1. Question relevance and content
2. Question wording
3. Response form/Types of questions
4. Question order or sequence

QUESTIONNAIRE DESIGN
1. Question relevance and content
Any question to be included in the instrument should
pass certain tests.
Relevance test: Is it relevant to the research objective?
Can it yield significant information for answering the
research questions?
Coverage test: If the question has passed the relevance
test, we should then consider its coverage. Is it a
double-barreled question that requires splitting? Does the
question provide the information needed to interpret
the response fully? Does the question include technical
words?
75
QUESTIONNAIRE DESIGN
2. Question wording
The designer should choose wording that has the
following characteristics:
i. Shared vocabulary
ii. Uniformity of meaning
iii. Exactness
iv. Simplicity
v. Neutrality
vi. Freedom from presumptions
vii. No embarrassing matters.
76
QUESTIONNAIRE DESIGN
3. Response form/Types of questions
The third major area in question construction is to
decide which types of questions to include. The
questions may be classified as open questions
or closed questions.
The choice between open and closed questions
depends on factors such as the objective of the
interview, the respondents' level of information about the
topic and the investigator's knowledge of the topic.

77
QUESTIONNAIRE DESIGN
Advantages of closed ended questionnaires:
• It provides the respondent with an easy method of
indicating his answer - he does not have to think
about how to articulate his answer.
• It 'prompts' the respondent so that the respondent
has to rely less on memory in answering a question.
• Responses can be easily classified, making analysis
very straightforward.

78
QUESTIONNAIRE DESIGN
Disadvantages of closed ended questionnaires:
• They force a response in the researcher's terms
rather than the respondent's.
• They often do not reveal things which were not
known by the investigator.

79
QUESTIONNAIRE DESIGN
Advantages of open ended questionnaires:
• They allow the respondent to answer in his own
words, with no influence by any specific alternatives
suggested by the interviewer.
• They often reveal the issues which are most
important to the respondent, and this may reveal
findings which were not originally anticipated when
the survey was initiated.

80
QUESTIONNAIRE DESIGN
Disadvantages of open ended questionnaires:
• Respondents may find it difficult to properly and
fully explain their attitudes or motivations.
• Data collected has to be coded and reduced to
manageable categories. This can be time consuming
for analysis.
• Respondents will tend to answer open questions in
different 'dimensions'.

81
QUESTIONNAIRE DESIGN
4. Question order or sequence
The order in which questions are arranged is as
important as question wording. The questions should
progress from simple and general items to more complex
and specific items.

82
QUESTIONNAIRE DESIGN
• Common problems with questions
i. Ambiguous terms that are not understood by
respondents, e.g. "Are you interested in a small house?"
ii. Questions that beg a certain response, e.g. "You don't eat
ladies' finger, do you?"
iii. Questions that embarrass the respondent, e.g. "Have you
stopped beating your wife?"

iv. Assigning improper response scales. How often
does your family dine out?
A. Very frequently
B. Once in a while
C. 2-3 times in a week
D. Constantly.
v. Double-barreled questions, e.g. "Do you support the
competence-based system and the examination
regulations of ABC institute?"
vi. Long questions.

84
QUALITIES AND FUNCTIONS OF
ENUMERATORS
Qualities of enumerators:
• Honesty
• Interest
• Accuracy
• Adaptability
• Personality
• Independence and
• Intelligence and education

85
FUNCTIONS OF ENUMERATORS
1. Pre – enumeration responsibilities:
• Attending training of enumerators’ workshop.
• Receiving enumeration materials and equipment
from the supervisor.
• Developing an enumeration schedule/itinerary.
• Becoming familiar with the Enumeration Areas (EAs).
• Checking and amending the EA map where necessary, and
informing the supervisor.

86
FUNCTIONS OF ENUMERATORS
2. Functions during enumeration:
• Locating/selecting sample members
The enumerator will locate or select respondents according
to the sampling method used. For example, in quota
sampling the enumerators themselves make the
selection, limited only by the quotas, while in
random sampling the enumerators must
interview only the selected individuals.

87
FUNCTIONS OF ENUMERATORS
• Obtaining an interview
Having located the respondents, the interviewer
has to obtain an interview with them before starting to
ask questions. He/she should plan for good
timing, since timing can influence the accuracy of the
data obtained. The interview should begin with a
proper introduction: who the interviewer is, which
organization he/she is working for, and perhaps showing
an identification card. This can be followed by an
explanation of why the survey is being done
and what is expected from it.

88
FUNCTIONS OF ENUMERATORS
• Asking questions
The interviewer must aim at uniformity in
asking questions as well as in recording answers. To
achieve this, enumerators are
expected to ask all applicable questions, ask the
questions in the given order with no more clarification
and probing than is explicitly allowed, and make no
unauthorized variations in the wording of questions.

89
FUNCTIONS OF ENUMERATORS
• Recording answers
The interviewers are required to record the
answers given by the respondents. At the end of the
interview session, enumerators must edit the
questionnaire to check that all questions have been
asked and all responses recorded, ensure that the
right codes have been ticked/shaded, and check for
inconsistencies between answers.

90
FUNCTIONS OF ENUMERATORS
3. Post – enumeration responsibilities
• Ensure that all questionnaires and equipment are
accounted for.
• Hand over all questionnaires (filled, spoilt and
blank) with other literature and materials used in
the survey to the supervisor.
• Write a brief field report.

91
SUPERVISION OF DATA COLLECTION
• Supervision
Is the activity carried out by supervisors to oversee the
productivity and progress of employees who report
directly to the supervisors.
Supervision is a management activity and supervisors
typically are considered to have a management role in
the organization.
• Supervisor
Is someone who oversees the work or tasks of others.
A supervisor comes as a middle man.

92
SUPERVISION OF DATA COLLECTION
The data collection supervisor should have the
following skills and attributes:
• Ability to work with teams and motivate people;
• Being well organized and efficient in planning activities;
• Full knowledge of the supervisor and
enumerator manuals and the control forms, and the
ability to apply the instructions during the interviews;
• A good understanding of the objectives
of the survey;
• Good communication skills;
• Ability to manage stress.
93
SUPERVISION OF DATA COLLECTION
Roles of a Supervisor in data collection.
• Training enumerators.
• Obtaining and managing household lists and maps
for each area, or other lists to be used as the
sampling frame.
• Informing local authorities about the survey.
• Obtaining necessary venues, supplies and
equipment.
• Providing interviewers with detailed instructions
about locating households and respondents.

• Supervising the interview process and recording
daily activities.
• Ensuring data quality;
• Being available to discuss any problems interviewers
might have in the field;
• Sending progress reports to the management;
• Providing completed instruments to data entry
supervisor as per agreement.

95
ENUMERATOR’S MANUAL
• The enumerator's manual is prepared to help the
enumerator understand and
improve the execution of his/her functions.
• It presents the objectives of the survey/census.
• It also presents the fundamental concepts that the
enumerator should know, practical approaches to
field enumeration, and guidelines on how to
obtain accurate, complete, reasonable and consistent
data from the respondents.

96
SUPERVISOR’S MANUAL
• The supervisor's manual is prepared to help the
supervisor understand and improve the execution of
his/her functions.
• It contains recommendations about the coordination
of data collection (field work organization,
documents and materials for the survey), quality
control and the responsibilities and tasks the
supervisor will undertake.
• It also presents the objectives of the survey/census.

97
NORMS AND ETHICS OF STATISTICS
Statistics, like any other profession, has its norms and
ethics:
• Confidentiality
• Objectivity
• Openness
• Honesty
• Professionalism

98
DATA PRESENTATION AND
SUMMARIZATION
• Data:
Data can simply be defined as collected information.
Data can be expressed as ungrouped (single) data or
grouped data. Ungrouped data are data in which
each individual observation is recorded with its own value;
they are usually the starting point of analysis.
Grouped data are data that have been organised into
class intervals. Grouping leaves fewer values
to work with, but because the data have been classified
and some analysis has already been done,
they are no longer raw.

99
TERMS USED IN GROUPED DATA
1. Class interval
A class interval is the range of values that a class
can contain. Class intervals can be uniform or
non-uniform.
With non-uniform class intervals the classes have
different class widths, whereas uniform class intervals use
the same class width for all classes. A rule can be
used to decide on the number of intervals when
uniform class intervals are required; in it,
L is the number of class intervals and n is the number of
observations.
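The rule itself is not reproduced in these notes. One commonly used rule of this kind is Sturges' rule, L = 1 + 3.322 log10(n); treating it as the intended rule is an assumption, and rounding the result up to a whole number is a further choice made here. A minimal Python sketch (the function name is illustrative):

```python
import math

def sturges_classes(n):
    """Suggested number of classes L for n observations.

    Uses Sturges' rule, L = 1 + 3.322 * log10(n), which is an
    assumption about the rule meant in the notes. The result is
    rounded up to the next whole number.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.322 * math.log10(n))

# Sample sizes that appear in the worked examples of these notes
for n in (20, 30):
    print(n, sturges_classes(n))
```

The value is only a guide; the nature, magnitude and range of the data should still decide the final number of classes.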
100
TERMS USED IN GROUPED DATA
• We seldom use fewer than 6 or more than 15 classes,
the exact number we use in a given situation will
depend on the nature, magnitude, and range of data.
• We always make sure that each item goes into one
and only one class.
2. Class Size
It is sometimes known as class width or class
magnitude. It is theoretically determined as:
$h = \frac{R}{L}$
where R = Range,
L = Number of class intervals,
h = Class size.
The difference between Upper Class Boundary and
Lower Class Boundary of a given class interval gives
the Class Size.
3. Class limits
Are the smallest and the largest numbers that can go
into any given class interval. The smallest observation
in any class is the Lower Class Limit (L.C.L) and
largest observation in a given class is Upper Class
Limit (U.C.L).

102
TERMS USED IN GROUPED DATA
4. Correction Factor
Is half the difference between the lower limit of the second
class and the upper limit of the first class.
5. Class boundaries
• Are the adjusted class limits that leave no
gaps between consecutive classes.
• Lower Class Boundary (L.C.B) /Lower Real Limit is
the smaller boundary of each of the class intervals.
• It is found by subtracting correction factor from the
lower Class Limit.
• Upper Class Boundary (U.C.B)/Upper Real Limit is
the larger boundary of each of the class intervals.
• It is found by adding correction factor to the Upper
Class Limit.
6. Class mark
Class mark is the mean value of upper class limit and
lower class limit of any given class interval. It is also
known as Mid – point or mid – value and it is used as a
typical value.
7. Frequency
This is the number of times a data value, or a value
within a class interval, occurs in a data set.
Class frequency is the number of observations that fall
in a given class interval.
The cumulative frequency is the total frequency of all
values less than the upper class boundary of a given
class interval or all values greater than the lower class
boundary.
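The class size, correction factor, class boundaries and class marks defined in this section can be sketched in Python as follows. The function names and the three example classes are illustrative assumptions, and rounding the class size up to a whole number is a convenience rather than a rule stated in the notes.

```python
import math

def class_size(data, num_classes):
    """Class size h = R / L, rounded up to a whole number."""
    data_range = max(data) - min(data)  # R = largest - smallest value
    return math.ceil(data_range / num_classes)

def describe_classes(limits):
    """limits: list of (lower_limit, upper_limit) for consecutive classes."""
    # Correction factor: half the gap between the upper limit of the
    # first class and the lower limit of the second class.
    cf = (limits[1][0] - limits[0][1]) / 2
    rows = []
    for lcl, ucl in limits:
        lcb, ucb = lcl - cf, ucl + cf  # class boundaries
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcb, ucb),
            "class_mark": (lcl + ucl) / 2,  # mid-point of the class
            "class_size": ucb - lcb,
        })
    return cf, rows

cf, rows = describe_classes([(40, 49), (50, 59), (60, 69)])
print("correction factor:", cf)
for row in rows:
    print(row)
```

For data ranging from 40 to 74 split into 7 classes, `class_size([40, 74], 7)` divides a range of 34 by 7 and rounds up.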
FREQUENCY DISTRIBUTION
A frequency distribution is a tool for describing a
dataset. We use it to group data into categories and
show the number of observations in each category.

105
FREQUENCY DISTRIBUTION
A frequency distribution is called a
numeric/quantitative distribution if the data are
grouped according to numeric size; if the data are
grouped into categories that differ in quality, the
resulting table is called a categorical/qualitative
distribution.
Creating a Frequency Table
• Decide on the number of classes you wish to use.
• Tally the data into classes; each item or observation
should go into one and only one class.
• Construct the frequency table by counting the number of
data values in each class.
106
FREQUENCY DISTRIBUTION
Example 1:
Create a frequency distribution table on marks scored
by 20 students in their Statistics exam.
97, 92, 88, 75, 83, 67, 89, 55, 72, 78, 81, 91, 57, 63, 67, 74,
87, 84, 98, 46.
Table 1: Frequency Distribution Table
Class Interval    Frequency (f)
40 - 49           1
50 - 59           2
60 - 69           3
70 - 79           4
80 - 89           6
90 - 99           4
107
FREQUENCY DISTRIBUTION
Example 2:
A school nurse weighed 30 students, their weights (in
kg) were recorded as follows:
50, 52, 53, 54, 55, 65, 60, 70, 48, 63, 74, 40, 46, 59, 68, 44,
47, 56, 49, 58, 63, 66, 68, 61, 57, 58, 62, 52,56,58.
Present this information in a frequency distribution
table.
Note: Lower Class limit should be 40 and class size
should be 5.

108
FREQUENCY DISTRIBUTION
Table 2: Frequency Distribution Table
Class interval Frequency (f)

40 - 44 2
45 - 49 4
50 - 54 5
55 - 59 8
60 - 64 5
65 - 69 4
70 - 74 2

Exercise.
A survey was taken in 20 selected households, people were
asked how many cars were registered to their households. The
results were recorded as follows:
1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Present these data in a frequency distribution table.
109
CUMULATIVE DISTRIBUTION
Sometimes it is preferable to present data in what is
called Cumulative Frequency Distribution or simply
Cumulative Distribution.
A table which shows data together with the
corresponding cumulative frequencies is known as
Cumulative Frequency Distribution Table. This table
shows directly how many of the items are less than or
greater than various values.
Cumulative Distribution table for example 1 can be
drawn as follows:

110
CUMULATIVE DISTRIBUTION
Table 3: Cumulative Distribution Table
Class Interval   Frequency (f)   Less than Cumulative Frequency   More than Cumulative Frequency
40 - 49          1               1                                20
50 - 59          2               3                                19
60 - 69          3               6                                17
70 - 79          4               10                               14
80 - 89          6               16                               10
90 - 99          4               20                               4
Note: Less than cumulative frequencies show how
many items are less than the Upper Class Boundaries
and More than Cumulative frequencies show how
many items are more than the Lower Class
Boundaries.
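Both cumulative columns can be derived from the class frequencies alone. The sketch below uses the frequencies of Example 1; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies for 40-49, 50-59, ..., 90-99 (Example 1)
frequencies = [1, 2, 3, 4, 6, 4]

# Less than: running total from the first class downwards
less_than = list(accumulate(frequencies))

# More than: running total from the last class upwards,
# then reversed back into class order
more_than = list(accumulate(reversed(frequencies)))[::-1]

for f, lt, mt in zip(frequencies, less_than, more_than):
    print(f, lt, mt)
```

The last "less than" value and the first "more than" value should both equal the total number of observations.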

111
LESS THAN OGIVE AND MORE THAN
OGIVE
Less than Ogive
Is the graph/curve of the less than cumulative
frequency distribution which shows the number of
observations LESS THAN the upper class
boundary/class limit.
More than Ogive
Is the graph/curve of the greater than cumulative
frequency distribution which shows the number of
observations GREATER THAN the lower class
boundary/class limit.
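The points to plot for the two ogives can be listed before any drawing is done. The sketch below uses the classes of Example 1 with a correction factor of 0.5; adding a zero-frequency point at the start of the less than ogive and at the end of the more than ogive is a common convention assumed here, not something the notes prescribe.

```python
from itertools import accumulate

lower_limits = [40, 50, 60, 70, 80, 90]
upper_limits = [49, 59, 69, 79, 89, 99]
frequencies = [1, 2, 3, 4, 6, 4]
cf = 0.5  # correction factor for these classes

total = sum(frequencies)
less_cum = list(accumulate(frequencies))

# Less than ogive: (upper class boundary, cumulative frequency),
# starting from zero at the first lower class boundary
less_than_ogive = [(lower_limits[0] - cf, 0)] + [
    (ucl + cf, c) for ucl, c in zip(upper_limits, less_cum)
]

# More than ogive: (lower class boundary, number of items above it),
# ending at zero at the last upper class boundary
more_than_ogive = [
    (lcl - cf, total - c + f)
    for lcl, c, f in zip(lower_limits, less_cum, frequencies)
] + [(upper_limits[-1] + cf, 0)]

print(less_than_ogive)
print(more_than_ogive)
```

Joining each list of points with a smooth curve or straight segments gives the corresponding ogive.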

112
CUMULATIVE DISTRIBUTION
Exercise:
Prepare Cumulative Distribution table for example 2
above.
Relative Frequency Distribution/Percentage
Distribution
Often, it is better to show what percentage of the items
falls into each class of a distribution instead of
showing the actual class frequencies. To convert a
frequency distribution into a corresponding
percentage distribution we divide each class
frequency by the total number of items grouped and
multiply the quotient by 100.
113
PERCENTAGE DISTRIBUTION
Frequency distribution of example 1 above can be
converted to percentage distribution as follows:
Table 4: Percentage Distribution Table
Class Interval (f) Percentage frequency

40 - 49 1 5%
50 - 59 2 10%
60 - 69 3 15%
70 - 79 4 20%
80 - 89 6 30%
90 - 99 4 20%

We can also compute the cumulative percentage and
create a cumulative percentage table.
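As a small sketch of the conversion, the percentages and cumulative percentages for Example 1 can be computed as follows (variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies for 40-49, 50-59, ..., 90-99 (Example 1)
frequencies = [1, 2, 3, 4, 6, 4]
total = sum(frequencies)

# Divide each class frequency by the total and multiply by 100
percentages = [100 * f / total for f in frequencies]

# Running total of the percentages
cumulative_percentages = list(accumulate(percentages))

for f, p, cp in zip(frequencies, percentages, cumulative_percentages):
    print(f"{f:>2}  {p:5.1f}%  {cp:5.1f}%")
```

The cumulative percentage of the last class should be 100%.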

114
CATEGORICAL/QUALITATIVE
FREQUENCY TABLE
The general problem of constructing a categorical
frequency table is similar to that of constructing a
numeric table. Here again, you decide how many
categories (classes) to use and what kind of items each
category is to contain, making sure that all items are
accommodated and that there are no ambiguities. With
categorical distributions, we do not have to worry
about class limits, class boundaries and class marks. In
order to avoid ambiguities in defining categories, it is
advisable where possible to use standard categories.

115
STEM AND LEAF PLOT
A stem and leaf plot is a device used to group a small
data set (up to 50 data values). The data set is arranged
in ascending order while retaining all the original data
values. In a stem and leaf plot each data value is considered to
have two parts, a stem and a leaf. The leading digit(s)
of the data value form the stem, and the trailing
digit(s) form the leaf.
To construct a stemplot, we:
• Enter the stem to the left of a vertical dividing line
and the leaf to the right of the vertical dividing line for
each data value.

116
STEM AND LEAF PLOT
• Record each data value as listed in the data set to
construct stemplot.
• If a stem and leaf plot has 32*|8 0 4 7 6 then the
corresponding data are 328, 320, 324, 327, and 326
and if the stem and leaf has 8**|12 92 00 29 then the
corresponding data are 812, 892, 800, and 829
Example 1:
Consider the number of minutes taken by 30 students
to accomplish Statistics test 1.
64, 62, 57, 54, 47, 67, 58, 51, 72, 45, 51, 83, 51, 74, 59, 53,
78, 45, 69, 64, 58, 54, 42,62, 51, 45, 69, 51, 78, 67
Prepare the frequency table and a stem and leaf plot.
117
STEM AND LEAF PLOT
Table 1: Frequency distribution table
Time       Frequency
40 - 49    5
50 - 59    12
60 - 69    8
70 - 79    4
80 - 89    1

If we wanted to avoid the loss of information inherent
in the above table, we could replace the table by the stem
and leaf plot below:
4*|7 5 5 2 5
5*|7 4 8 1 1 1 9 3 8 4 1 1
6*|4 2 7 9 4 2 9 7
7*|2 4 8 8
8*|3
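A stem and leaf plot for the test times can be produced with a few lines of Python. This is a sketch for two-digit values only, with the tens digit as stem and the units digit as leaf; the function name is illustrative. Leaves are kept in the order the data are listed.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group two-digit values by tens digit (stem); keep units digits (leaves)."""
    plot = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(sorted(plot.items()))  # stems in ascending order

times = [64, 62, 57, 54, 47, 67, 58, 51, 72, 45, 51, 83, 51, 74, 59,
         53, 78, 45, 69, 64, 58, 54, 42, 62, 51, 45, 69, 51, 78, 67]

plot = stem_and_leaf(times)
for stem, leaves in plot.items():
    print(f"{stem}*|{' '.join(str(leaf) for leaf in leaves)}")
```

The number of leaves on each stem equals the class frequency in the table, which gives a quick consistency check.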
118
HISTOGRAMS AND FREQUENCY POLYGON
Histogram:
• A histogram is a graphical presentation of the
information in a frequency table using a bar graph.
It depicts the distribution of a set of data.
• The histogram should have the variable being
measured in the data set as its horizontal axis, and
the class frequency as the vertical axis.
• Each data class will be represented by a vertical bar
whose height is the frequency of the class and whose
width is the class width.

119
HISTOGRAMS AND FREQUENCY POLYGON
• The bar for each class is centered at the class
midpoint, and the bars are not separated from one
another.
• Histograms cannot be used in connection with
frequency distributions having open classes, and
they must be used with extreme care when the class
intervals are not equal.
Frequency Polygon
Is a line graph presentation of the information in a
frequency table. Like histogram, the vertical axis
represents frequency and the horizontal axis
represents the variable being measured in the data set.
120
HISTOGRAMS AND FREQUENCY
POLYGON
To construct the graph, a point is plotted for each class
at its midpoint and the height given by frequency of
the class. The points are then connected by straight
lines.
Frequency polygons are useful for comparing
distributions. This is achieved by overlaying the
frequency polygons drawn for different data sets.
Example:
Prepare frequency distribution tables using intervals
X1-X2, where X1 is included and X2 is excluded, starting
from 40-50, 50-60, 60-70 etc.,
and hence draw the frequency polygons on the same
plane for the marks scored by 31 students in
a Mathematics test and a Statistics test.
Mathematics test scores:
85, 77, 76, 48, 56, 87, 82, 81, 90, 82, 95, 68, 63, 56, 48, 96, 92, 91, 68,
52, 51, 59, 89, 72, 79, 73, 74, 73, 86, 81, 90
Statistics test scores:
55, 84, 51, 40, 41, 90, 98, 70, 72, 89, 60, 67, 83, 44, 67, 58, 61, 73, 84,
55, 67, 85, 91, 90, 50, 60, 68, 42, 52, 69, 79
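The class frequencies and the points of the two frequency polygons for this example can be prepared as sketched below. The function name and defaults are illustrative assumptions; each class includes its lower end and excludes its upper end, as the example requires. Drawing is left to any plotting tool.

```python
def half_open_counts(scores, start=40, width=10, num_classes=6):
    """Count scores in classes [start, start+width), [start+width, ...), ..."""
    counts = [0] * num_classes
    for x in scores:
        k = (x - start) // width  # index of the class containing x
        if 0 <= k < num_classes:
            counts[k] += 1
    return counts

maths = [85, 77, 76, 48, 56, 87, 82, 81, 90, 82, 95, 68, 63, 56, 48, 96,
         92, 91, 68, 52, 51, 59, 89, 72, 79, 73, 74, 73, 86, 81, 90]
stats = [55, 84, 51, 40, 41, 90, 98, 70, 72, 89, 60, 67, 83, 44, 67, 58,
         61, 73, 84, 55, 67, 85, 91, 90, 50, 60, 68, 42, 52, 69, 79]

midpoints = [45, 55, 65, 75, 85, 95]  # class midpoints of 40-50, ..., 90-100
maths_counts = half_open_counts(maths)
stats_counts = half_open_counts(stats)

# Points (midpoint, frequency) to join with straight lines
maths_polygon = list(zip(midpoints, maths_counts))
stats_polygon = list(zip(midpoints, stats_counts))
print(maths_polygon)
print(stats_polygon)
```

Plotting both lists of points on the same axes allows the two distributions to be compared directly.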

122
HISTOGRAMS
Histogram with Unequal width
When classes have unequal widths, the vertical axis of
a histogram must represent not frequency (number of
occurrences) but frequency density (frequency divided
by class width), and the class widths must be
accurately represented on the horizontal axis, so that
the AREA of each bar (not the height) represents the
frequency of that class.
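A minimal sketch of the frequency density calculation follows. The class boundaries and frequencies are hypothetical values chosen for illustration; they do not come from the notes.

```python
# Hypothetical classes with unequal widths: (lower boundary, upper boundary)
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [5, 8, 12, 8]  # hypothetical class frequencies

widths = [upper - lower for lower, upper in classes]

# Frequency density = frequency / class width (the bar height)
densities = [f / w for f, w in zip(frequencies, widths)]

# Bar area = height * width, which recovers the class frequency
areas = [d * w for d, w in zip(densities, widths)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    print(f"{lower}-{upper}: frequency {f}, density {d}")
```

Although the last two classes differ in frequency and width, comparing their densities rather than their frequencies gives a fair picture of how concentrated the data are.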

123
HISTOGRAMS
• The following are questions that a statistician should
be able to answer about any histogram.
1. What is the maximum data value as shown on the
histogram?
2. What is the minimum data value as shown on the
histogram?
3. Is the histogram symmetric, skewed to the left,
skewed to the right? (Because real data rarely
results in perfectly symmetric histograms, anything
close to this shape can be classified as such.)

4. How many peaks does the histogram have, and
where are they located? (Peaks are bars with shorter
bars on each side. A first bar that is taller than the second
bar, or a last bar that is taller than the preceding bar,
is also called a peak. Two or more adjacent bars of the
same height with neighboring shorter bars - a plateau -
are considered one peak.)
5. Does the histogram have any gaps, and if so, where
are they located?
6. Does the histogram have any extreme values, and if
so, where are they located? (An extreme value is a bar
with a large gap - two or more classes - between it and
the other bars.)
125
HISTOGRAMS

[Figure: example histogram] Answers: minimum = 18, maximum = 60, skewed left, one
peak at 45, one gap between 24 and 30, no extreme
values.

126
BOX AND WHISKER PLOTS
• A box and whisker plot (sometimes called a boxplot)
is a graph that presents information from a five-
number summary. It does not show a distribution in
as much detail as a stem and leaf plot or histogram
does, but is especially useful for indicating whether
a distribution is skewed and whether there are
potential unusual observations (outliers) in the data
set.
• Constructing box and Whisker Plots
1. Put the values in numerical order if they are not.
2. Then you find the median of your data. The
median divides the data into two halves.
3. To divide the data into quarters, you then find the
medians of these two halves. Note: If you have an even
number of values, so the first median was the average
of the two middle values, then you include the middle
values in your sub-median computations. If you have
an odd number of values, so the first median was an
actual data point, then you do not include that value in
your sub-median computations. That is, to find the
sub-medians, you're only looking at the values that
have not yet been used.

128
BOX AND WHISKER PLOTS
You now have three points: the first middle point (the
median), and the middle points of the two halves
(here called the "sub-medians"). These three points
divide the entire data set into quarters, called
"quartiles". The top point of each quartile has a name,
being a "Q" followed by the number of the quarter. So
the top point of the first quarter of the data points is
"Q1", and so forth. Note that Q1 is also the middle
number for the first half of the list, Q2 is also the
middle number for the whole list, Q3 is the middle
number for the second half of the list.

129
BOX AND WHISKER PLOTS
4. Draw a number line using the scale that will
correspond with your data, indicate the minimum
value, Q1,Q2,Q3 and the maximum value.
5. The "box" part of the plot goes from Q1 to Q3.
6. The "whiskers" are then drawn from the box to the
minimum and maximum values.
We shall discuss later how a box plot can be used to
study the spread of data.
Example
Draw a box and whisker plot of the following marks of
students: 78, 82, 85, 87, 91, 93, 100. Are the data negatively
skewed, positively skewed or symmetrical?
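The five-number summary needed for this example can be computed with the median-of-halves method described above. The sketch below is one way to write it in Python; the function name is illustrative, and it assumes at least two data values. Other software may use different quartile conventions and give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    lower = s[:mid]
    # Odd n: the median is a data point and is left out of both halves.
    # Even n: the data split cleanly into two halves.
    upper = s[mid + 1:] if n % 2 else s[mid:]
    return min(s), median(lower), median(s), median(upper), max(s)

marks = [78, 82, 85, 87, 91, 93, 100]
minimum, q1, q2, q3, maximum = five_number_summary(marks)
print(minimum, q1, q2, q3, maximum)
```

Comparing the distances Q2 - Q1 with Q3 - Q2, and the lengths of the two whiskers, indicates the direction of any skew: a longer upper side suggests positive skew, a longer lower side negative skew.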

130
GRAPHS AND CHARTS
After preparing a frequency table one can draw
graphs and charts other than the histogram,
frequency polygon and ogives. Graphs and
charts are a simple and effective way of
illustrating and comprehending a table. They give
a pictorial effect to what would otherwise be just
a mass of figures. Graphs and charts include:
• Pictographs, pie charts, bar and column
charts, line graphs and statistical maps
(GIS).

131
PICTOGRAPHS
• This form of presentation of data is used to
bring out a comparative picture of the values
of a single characteristic over time or space.
• The commonly used form of a pictograph
consists in displaying the data by using equi-
sized pictorial forms, the number of such
symbols being made proportional to the
value of the characteristic.
• You can use part of a symbol. Although this method
is quite popular it is not an accurate way of
presenting data because fractions of whole symbols
cannot be represented proportionally.

132
PIE CHART
• Pie charts are circular and present data as slices of a pie; they
are used to represent and compare the component parts of a
total.
• In a pie chart, the component parts are shown as sectors of a
circle, with the angles (and hence the areas) of the
sectors proportional to the magnitudes of the component
parts.
• Pie charts are most useful in bringing out the relative
importance of the different components of the whole.
• When to use pie charts? Consider:
• Do the parts make up a meaningful whole?
• Are the parts mutually exclusive?
• Do you want to compare the parts to each other or the parts
to the whole?
• How many parts do you have? With more than about 5 parts a pie chart becomes hard to read.
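Because each sector's angle is proportional to its component, the angles follow directly from the component values. The sketch below uses a hypothetical household budget; the category names and amounts are assumptions made for illustration.

```python
def pie_angles(components):
    """Angle in degrees for each component: 360 * value / total."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}

# Hypothetical data: parts of a monthly budget
budget = {"Food": 40, "Rent": 30, "Transport": 20, "Other": 10}

angles = pie_angles(budget)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles should add up to 360 degrees, which is a convenient check before drawing the chart.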
133
BAR AND COLUMN CHART
Bar and column charts use either horizontal or vertical bars to
show comparisons among categories.
1. Simple bar/column charts
• Are used to represent data on one characteristic
over time or space or by class intervals.
• One axis of the chart shows the specific categories
being compared, and the other axis represents the
value of the characteristic.
• They are used to represent discrete/categorical data.
• Bar graphs are drawn in such a way that the
vertical scale starts from zero.

• Usually equi- width bars are used and they are
drawn such that their length (and hence their areas)
are proportional to the magnitude of the
characteristic represented.
When a bar chart is drawn, you might look for:
• the tallest bar.
• the shortest bar.
• one bar relative to another.
• growth or shrinking of the bars over time.

135
BAR AND COLUMN CHART
Example 1:
Draw a simple bar diagram to represent the profits of
a bank for 5 years.
Year                1989  1990  1991  1992  1993
Profit (million $)  10    12    18    25    42

136
BAR AND COLUMN CHART
2. Multiple Bar Chart
• When information on more than one characteristic
is to be presented at the same time for enabling
comparison in respect of them over time or space or
class intervals, then multiple bar chart is used.
• Here the bars representing different characteristics
are put together for each classification used
(different colours or shadings are desirable).

137
BAR CHARTS
2. Example of a Multiple Bar Chart
[Figure: multiple bar chart comparing the number of students in 1980 and 1981 for Tanzania, Kenya, Uganda, Malawi, Botswana, Zimbabwe and Zambia; vertical axis from 0 to 40.]

138
BAR CHARTS
3. Sub-divided Bar Charts
• If the characteristic under consideration has two or
more components, then they can be presented in the
form of sub-divided bar chart.
• This chart is similar to the simple bar chart except
that each simple bar is subdivided into sub-bars to
represent the magnitudes of the components of the
characteristic (different colours or shadings are
desirable).

139
BAR CHARTS
Example of Sub-divided Bar Charts:
[Figure: sub-divided bar chart showing Male and Female components for each year from 2001 to 2007; vertical axis from 0 to 60.]

140
BAR CHARTS
4. 100% Sub-divided Bar Charts:
• This chart is similar to the sub-divided bar chart
with the difference that the magnitude of the
characteristic for each bar is taken as 100% and the
components are shown as percentages.
• In this type of presentation, the information on the
magnitudes of the characteristic is lost, but this is
more convenient for comparing the composition of
the characteristic over time, space and class
intervals.

141
BAR CHARTS
Consider the following table indicating the number of
students of EASTC and their nationality.

Country     Female   Male
Tanzania    21       23
Kenya       22       34
Uganda      33       23
Malawi      33       23
Botswana    12       21
Zimbabwe    23       12
Zambia      34       23

142
BAR CHARTS
The above table can be presented in the 100% sub-
divided bar as follows:
[Figure: 100% sub-divided bar chart showing the percentage of Male and Female students for Tanzania, Kenya, Uganda, Malawi, Botswana, Zimbabwe and Zambia; vertical axis from 0% to 100%.]
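The heights of the segments in a 100% sub-divided bar chart are the percentage shares within each bar. A short sketch using the EASTC table above (the function and variable names are illustrative):

```python
# (female, male) student counts by country, from the table above
students = {
    "Tanzania": (21, 23),
    "Kenya": (22, 34),
    "Uganda": (33, 23),
    "Malawi": (33, 23),
    "Botswana": (12, 21),
    "Zimbabwe": (23, 12),
    "Zambia": (34, 23),
}

def percentage_split(female, male):
    """Share of each component when the bar total is taken as 100%."""
    total = female + male
    return 100 * female / total, 100 * male / total

shares = {c: percentage_split(f, m) for c, (f, m) in students.items()}
for country, (female_pct, male_pct) in shares.items():
    print(f"{country}: female {female_pct:.1f}%, male {male_pct:.1f}%")
```

Each pair of percentages sums to 100, so the totals per country are no longer visible, only the composition.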

143
BAR CHARTS
5. Paired Bar Chart
• There are certain characteristics which are generally
considered in pairs, such as sex composition, import
and exports, income and expenditure etc.
• This chart consists of bars on both sides of the
vertical line or band representing the values of the
paired characteristics, e.g. Age/Sex Pyramid.

144
BAR CHARTS
Example of paired bar chart.

145
RECTILINEAR LINE GRAPHS
• Simple line graph
This graph is used to present the values of one
characteristic in order to study its trend over time (dates,
years) or across ordered categories. It is useful when there are
many data points and the order is important.
• Multiple line graph
This graph is obtained when more than one simple
line graph is drawn on the same graph paper with a
view to comparing the trends of different
characteristics.

146
SCATTER PLOTS
Scatter plots are used to show the relationship between
pairs of quantitative measurements made for the same
object or individual. For example, a scatter plot could
be used to present information about the examination
and coursework marks for each of the students in a
class.
By analysing the pattern of dots that make up a scatter
plot it is possible to see whether there is any
systematic relationship between the two
measurements, although a pattern alone does not show that
one causes the other. Regression lines can also be added to
the graph to summarise the relationship; judging whether
the relationship could be due to chance
requires a formal statistical test.
147
STATISTICAL MAPS (GIS)
• Maps are an excellent means of presenting statistical
information. Not only are they visually attractive,
but they also make it easier for users to relate data to
location. They are excellent for showing geographic
patterns but are poor for providing precise values
with which users can do their own calculations.
• Most Geographic Information Systems (GIS) offer a
range of ways of drawing statistical maps. Statistical
maps can be colour-shaded maps, proportional
symbol maps, and dot maps.

148
MEASURES OF CENTRAL TENDENCY
Under Measures of Central Tendency we look at
possible methods of obtaining a single value to
represent/describe a set of observations. A
measure of central tendency is sometimes referred to as an average.
A set of observations contains a lot of information,
which cannot be appreciated just by looking at the
figures. We must find methods of extracting the most
important bits of information, so that we can use them
for comparison.

149
MEASURES OF CENTRAL TENDENCY
Therefore, the objectives of the study of averages are to:
i. Get a single value that describes the characteristics
of the entire set of data. An average gives an
overall view of the entire data.
ii. Facilitate comparison. Reducing the mass of raw
data to one single figure enables comparisons to be
made, either at a point in time
or over a period of time.

150
REQUISITE OF GOOD CENTRAL
TENDENCY
Since the average is the single value that describes the
characteristics of the entire data, it should have the
following characteristics:
i. It should not be affected by extreme values.
ii. It should take into account every item in the set of data.
iii. It should be capable of further algebraic
treatment.
iv. It should always exist, i.e. it can be calculated for any set of
numerical data.
v. It should be easy to calculate and simple to follow.

151
MEASURES OF CENTRAL TENDENCY
One more thing to be remembered about averages is
that the items whose average is being calculated
should form a homogeneous group. It is absurd to talk
about the average of a man's height and his weight. If
the data from which an average is being calculated are
not homogeneous, misleading conclusions are likely to
be drawn. Thus we see that as far as possible, the data
from which an average is calculated should be a
homogeneous lot. Homogeneity can be achieved either
by selecting only like items or by dividing the
heterogeneous data into a number of homogeneous
groups.
152
MEASURES OF CENTRAL TENDENCY
The commonly used measures of Central Tendency
are:
Mean
• Simple Arithmetic (unweighted) Mean;
• Weighted Mean;
• Geometric Mean; and
• Harmonic Mean.
Median
Mode
The measures of central tendency can be calculated for
both grouped and ungrouped data.
153
SIMPLE ARITHMETIC MEAN
• Simple arithmetic mean (unweighted mean) for
ungrouped data.
The mean of a sample of n values is the sum of the
values divided by n. Mathematically;
Sample mean:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The mean of a population of N items is defined as the
sum of the N items divided by N.
Population mean:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

154
SIMPLE ARITHMETIC MEAN FOR
UNGROUPED DATA
Example:
A sample of 5 high-precision spring-driven motors is
taken from a production lot of 100,000 such motors.
The motors are wound and started, and their running
times clocked at 3.50, 3.65, 3.55, 3.58, and 3.52 minutes.
Find their mean running time.
Mean = (3.50 + 3.65 + 3.55 + 3.58 + 3.52)/5 = 17.80/5 = 3.56 minutes.
We can often use the sample mean to estimate the
population mean.
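The calculation above can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the notes.

```python
from statistics import mean

# Running times (minutes) of the five sampled motors from the example.
running_times = [3.50, 3.65, 3.55, 3.58, 3.52]

# Sample mean by definition: sum of the values divided by their count.
mean_by_definition = sum(running_times) / len(running_times)

# The standard library function should agree with the definition.
mean_by_library = mean(running_times)

print(round(mean_by_definition, 2), round(mean_by_library, 2))
```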

155
SIMPLE ARITHMETIC MEAN
Properties of arithmetic mean:
1. The algebraic sum of the deviations of a set of
observations from their mean is zero. That is,
\[ \sum (x_i - \bar{x}) = 0 \]
Proof:
\[ \sum (x_i - \bar{x}) = \sum x_i - \sum \bar{x} = n\bar{x} - n\bar{x} = 0 \]
2. The sum of squared deviations of a set of observations
from their mean is a minimum; that is, it is less than the
sum of squared deviations of the observations from any
other value. That is, \( \sum (x_i - \bar{x})^2 \) is a minimum.
Prove property 2.
156
SIMPLE ARITHMETIC MEAN FOR
GROUPED DATA
Arithmetic mean:
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]
where \( x_i \) = class midpoint (class mark) of the ith class
and \( f_i \) = frequency of the ith class.
Example:
Calculate the mean rent of the following distribution.
Rent (Dollars) Frequency
7.5 – 12.5 12
12.5 – 17.5 26
17.5 – 22.5 45
22.5 – 27.5 60
27.5 – 32.5 37
32.5 – 37.5 13
37.5 – 42.5 5
42.5 – 47.5 2

157
SIMPLE ARITHMETIC MEAN FOR GROUPED
DATA
Rent (Dollars)   Class mark (x_i)   Frequency (f_i)   f_i x_i
7.5 – 12.5       10                 12                120
12.5 – 17.5      15                 26                390
17.5 – 22.5      20                 45                900
22.5 – 27.5      25                 60                1500
27.5 – 32.5      30                 37                1110
32.5 – 37.5      35                 13                455
37.5 – 42.5      40                 5                 200
42.5 – 47.5      45                 2                 90
Total                               200               4765

Mean = 4765/200 = 23.825 dollars
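As a check on the table, the grouped mean can be computed directly from the class boundaries and frequencies. The sketch below assumes each class is stored as a (lower boundary, upper boundary, frequency) tuple; the function name `grouped_mean` is illustrative.

```python
# (lower boundary, upper boundary, frequency) for each rent class.
rent_classes = [
    (7.5, 12.5, 12), (12.5, 17.5, 26), (17.5, 22.5, 45), (22.5, 27.5, 60),
    (27.5, 32.5, 37), (32.5, 37.5, 13), (37.5, 42.5, 5), (42.5, 47.5, 2),
]

def grouped_mean(classes):
    """Return sum(f * x) / sum(f), where x is the class mark (midpoint)."""
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f

mean_rent = grouped_mean(rent_classes)
print(mean_rent)
```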


158
SIMPLE ARITHMETIC MEAN FOR
GROUPED DATA
However, we can reduce the work in our calculations
by using
i. Assumed mean method
ii. Coding method.

ARITHMETIC MEAN BY ASSUMED MEAN


If A is any assumed arithmetic mean (which may be
any number) and if \( d_j = X_j - A \) are the deviations of
\( X_j \) from A, then:
\[ \bar{X} = A + \frac{\sum_{j=1}^{k} f_j d_j}{\sum_{j=1}^{k} f_j} \]

Prove the formula


159
SIMPLE ARITHMETIC MEAN FOR
GROUPED DATA
ARITHMETIC MEAN BY CODING METHOD
If the class intervals all have the same class size c, the
deviations \( d_j = X_j - A \) can be expressed as \( cU_j \), where
the \( U_j \) are positive or negative integers or zero. Then
\[ \bar{X} = A + c \cdot \frac{\sum_{j=1}^{k} f_j U_j}{\sum_{j=1}^{k} f_j} \]

Prove the formula


Example: Consider the following table, which presents the
weights of students as measured by the
school nurse. Estimate the mean weight.

160
SIMPLE ARITHMETIC MEAN FOR
GROUPED DATA
ARITHMETIC MEAN BY CODING METHOD

Mass (kg)   Number of students
60-62       5
63-65       18
66-68       42
69-71       27
72-74       8
Method 1 (by definition): 6745/100 = 67.45
Method 2 (by assumed mean, A = 67): 67 + 45/100 = 67.45
Method 3 (by coding method, A = 67, c = 3): 67 + 3(15)/100 = 67.45
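The three methods can be compared side by side in code. This sketch assumes the class marks 61, 64, 67, 70, 73 (the midpoints of the classes above), an assumed mean A = 67 and class size c = 3, which are the values implied by the worked answers.

```python
# Class marks (midpoints) and frequencies from the mass table.
class_marks = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]
n = sum(frequencies)

# Method 1: by definition, sum(f * x) / sum(f).
mean_direct = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Method 2: assumed mean A with deviations d = x - A.
A = 67
mean_assumed = A + sum(f * (x - A) for f, x in zip(frequencies, class_marks)) / n

# Method 3: coding with u = (x - A) / c, where c is the common class size.
c = 3
mean_coded = A + c * sum(f * (x - A) / c for f, x in zip(frequencies, class_marks)) / n

print(round(mean_direct, 2), round(mean_assumed, 2), round(mean_coded, 2))
```

All three routes give the same mean; the assumed mean and coding methods only reduce the size of the numbers handled by hand.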
161
ADVANTAGES OF SIMPLE
ARITHMETIC MEAN
The arithmetic mean has the following advantages:
i. It can be calculated for any set of numerical data,
so it always exists.
ii. It is always unique.
iii. It lends itself to further statistical treatment (the
means of several sets of data can be combined into
the overall mean of all the data).
iv. It takes into account every item of the data.

162
DISADVANTAGES OF SIMPLE
ARITHMETIC MEAN
The arithmetic mean has the following disadvantages:
i. It is highly affected by extreme values.
ii. It cannot be calculated for categorical data.
iii. It is not an appropriate average for highly skewed
distributions.
iv. When there are open-ended class intervals,
assumptions about the missing class limits have to be
made carefully; otherwise the result may be misleading.
v. It is not suitable for averaging rates.
vi. It is not suitable for combining the averages of data
sets with unequal numbers of observations (a weighted
mean is needed instead).
163
WEIGHTED MEAN
The weighted mean is similar to the simple arithmetic
mean, except that instead of each data point
contributing equally to the final average, some data
points contribute more than others.
Weighted averages are often more realistic. A simple
average treats every item as equally important, but
in practice this is often not the case. By assigning
different weights to items of different importance, you
can obtain a more representative average.
\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
where the \( w_i \) are weights.

164
WEIGHTED MEAN
Example:
A student obtained 40, 50, 60, 80, and 45 marks in
Math, Statistics, Physics, Chemistry
and Biology respectively. Assuming weights of 5, 2, 4,
3, and 1 respectively for these subjects, find the
weighted arithmetic mean per subject.
Answer:
(5(40) + 2(50) + 4(60) + 3(80) + 1(45))/(5 + 2 + 4 + 3 + 1) = 825/15 = 55 marks
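The weighted mean for this example can be worked out with a short sketch. The helper name `weighted_mean` is illustrative; it simply applies the formula sum(w x) / sum(w).

```python
# Marks and weights in the order Math, Statistics, Physics, Chemistry, Biology.
marks = [40, 50, 60, 80, 45]
weights = [5, 2, 4, 3, 1]

def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

weighted_marks = weighted_mean(marks, weights)
print(weighted_marks)
```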

165
WEIGHTED MEAN
A special application of the formula for the weighted
mean arises when we must find the overall mean, or
grand mean, of k sets of data having means \( \bar{x}_i \)
and consisting of \( n_i \) observations each. The
grand mean is given by:
\[ \bar{x} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} \]
Example:
In three separate weeks a discount store chain sold
475, 310, and 420 microwave ovens at average prices of
$490, $520, and $495. What is the average price of the
ovens sold?

166
WEIGHTED MEAN
Grand mean = (475(490) + 310(520) + 420(495))/(475 + 310 + 420)
= 601850/1205 = $499.46
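The grand mean is a weighted mean in which the group sizes act as weights. A minimal sketch for the microwave oven example, with illustrative variable names:

```python
# Ovens sold each week and the average price in that week.
units_sold = [475, 310, 420]
average_prices = [490, 520, 495]

# Grand mean: each weekly mean is weighted by the number of ovens sold.
grand_mean = sum(n * p for n, p in zip(units_sold, average_prices)) / sum(units_sold)

# For comparison, the unweighted mean of the three weekly prices.
unweighted = sum(average_prices) / len(average_prices)

print(round(grand_mean, 2), round(unweighted, 2))
```

The unweighted figure ignores the fact that more ovens were sold in some weeks than in others, which is why the two results differ.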
Exercise:
Given two school classes, one with 20 students and
another with 30 students, the grades in each class on a test
were:
Morning class = 62, 67, 71, 74, 76, 77, 78, 79, 79, 80, 80, 81,
81, 82, 83, 84, 86, 89, 93, 98
Afternoon class = 81, 82, 83, 84,
85, 86, 87, 87, 88, 88, 89, 89, 89, 90, 90, 90, 90, 91, 91, 91, 92,
92, 93, 93, 94, 95, 96, 97, 98, 99.
Use the weighted mean to find the average grade of a
student.

167
GEOMETRIC MEAN
The geometric mean is defined as the nth root of the
product of n observations of a variable x.
Mathematically:
Geometric mean:
\[ G = \sqrt[n]{\prod_{i=1}^{n} x_i} \]
The geometric mean is well defined for sets of positive
real numbers.
Computing the geometric mean can be simplified by
using logarithms.
The geometric mean for ungrouped data is given by
\[ G = \operatorname{antilog}\left( \frac{\sum \log x_i}{n} \right) \]

Prove the above formula.

168
GEOMETRIC MEAN
Geometric mean for grouped data is given by
\[ G = \operatorname{antilog}\left( \frac{\sum f_i \log x_i}{n} \right), \quad \text{where } n = \sum f_i \]
Advantages of Geometric mean
• Geometric Mean is calculated based on all observations in the
series.
• Geometric Mean is clearly defined.
• Geometric Mean is less affected by extreme values in the
series than the arithmetic mean.
• Geometric Mean is amenable to further algebraic treatment.
• Geometric Mean is useful in estimating the mean percentage
growth in a population, interest rates of financial institutions,
sales and production.

169
GEOMETRIC MEAN
Disadvantages of Geometric mean
• Geometric Mean is relatively difficult to
compute.
Example:
What is the geometric mean of 4, 9, 9, and 2?
Solution:
Multiply the four numbers and take the 4th root:
\( G = \sqrt[4]{4 \times 9 \times 9 \times 2} = \sqrt[4]{648} \approx 5.045 \).
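Both forms of the geometric mean, the nth root of the product and the antilog of the mean logarithm, can be compared in a short sketch. Natural logarithms are used here as an assumption; any base gives the same result as long as the antilog uses the same base.

```python
import math

values = [4, 9, 9, 2]
n = len(values)

# By definition: nth root of the product of the n values.
gm_by_product = math.prod(values) ** (1 / n)

# Via logarithms: antilog of the arithmetic mean of the logs.
gm_by_logs = math.exp(sum(math.log(x) for x in values) / n)

print(round(gm_by_product, 3), round(gm_by_logs, 3))
```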

170
HARMONIC MEAN
The harmonic mean of n numbers \( x_1, x_2, x_3, \ldots, x_n \) is
defined as the reciprocal of the arithmetic mean of the
reciprocals of the values. Mathematically:
\[ H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
The harmonic mean has limited usefulness, but it is
appropriate for computing an average rate.
Given a series of sub-trips at different speeds, if each
sub-trip covers the same distance, then the average
speed is the harmonic mean of all the sub-trip speeds,
and if each sub-trip takes the same amount of time,
then the average speed is the arithmetic mean of all the
sub-trip speeds.
171
HARMONIC MEAN
Example 1.
A fellow travels from city A to city B. For the first
hour, he drove at the constant speed of 20 miles per
hour. Then he (instantaneously) increased his speed
and, for the next hour, kept it at 30 miles per hour.
Find the average speed of the motion.
Example 2:
A fellow travels from city A to city B. The first half of
the way, he drove at the constant speed of 20 miles per
hour. Then he (instantaneously) increased his speed
and traveled the remaining distance at 30 miles per
hour. Find the average speed of the motion.
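The two examples differ only in whether the time or the distance is split equally, and that decides which mean applies. The sketch below works both out; the half-trip length of 60 miles is a hypothetical value used only for the cross-check, since any distance gives the same average speed.

```python
from statistics import harmonic_mean, mean

speeds = [20, 30]  # miles per hour

# Example 1: equal time at each speed, so the arithmetic mean applies.
equal_time_average = mean(speeds)

# Example 2: equal distance at each speed, so the harmonic mean applies.
equal_distance_average = harmonic_mean(speeds)

# Cross-check Example 2 from first principles: total distance / total time.
half_distance = 60  # hypothetical half-trip length in miles
total_time = half_distance / speeds[0] + half_distance / speeds[1]
first_principles = 2 * half_distance / total_time

print(equal_time_average, round(equal_distance_average, 2), first_principles)
```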
172
HARMONIC MEAN
Example.
If a traveler drives 10 miles on a freeway at 60 miles
per hour and the next 10 miles of the freeway at
30 miles per hour, what is his average speed over the
entire distance?
\[ H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{2}{\frac{1}{60} + \frac{1}{30}} = 40 \text{ miles per hour} \]

173
HARMONIC MEAN
Advantages
• The H.M. is computed from every element in the
data set.
• It is less affected by extreme large values than the
arithmetic mean.
• It is capable of further algebraic treatment.
Disadvantages
• The H.M. of any data set cannot be calculated if it
has negative and/or zero elements.
• The calculation of H.M. is relatively complicated.
174
THE MEDIAN
• The median is the middle score for a set of data that
has been arranged in order of magnitude.
• The median of a finite list of numbers can be found
by arranging all the observations from lowest value
to highest value and picking the middle one.
Example:
The median of {3, 5, 9} is 5.
• If there is an even number of observations, then
there is no single middle value; the median is then
usually defined to be the mean of the two middle
values.
Example: The median of {3, 5, 7, 9} is (5 + 7) / 2 = 6
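The two cases (odd and even number of observations) can be reproduced with the standard library. The input lists are deliberately unsorted to show that the ordering step is part of finding the median.

```python
from statistics import median

odd_sample = [9, 3, 5]       # three observations: one middle value
even_sample = [7, 3, 9, 5]   # four observations: mean of the two middle values

median_odd = median(odd_sample)
median_even = median(even_sample)

print(median_odd, median_even)
```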
175
THE MEDIAN
In the case of grouped data we have the frequencies in
the various classes and not the detailed observations.
Consequently the median cannot be located as a
particular observation or midway between two
observations. In this case the median of the grouped
data is obtained by:
\[ \text{Median} = L_m + \frac{c}{f_m}\left(\frac{N}{2} - f_b\right) \]
where \( L_m \) is the lower class boundary of the median
class, \( c \) is the class size,
\( f_m \) is the frequency of the median class,
\( f_b \) is the sum of the frequencies of the classes below the
median class, and \( N \) is the total frequency.
176
THE MEDIAN
Example
Calculate the median rent of the frequency distribution
shown below:
Rent (dollars) Frequency
7.5 – 12.5 12
12.5 – 17.5 26
17.5 – 22.5 45
22.5 – 27.5 60
27.5 – 32.5 37
32.5 – 37.5 13
37.5 – 42.5 5
42.5 – 47.5 2

177
THE MEDIAN
The median class is 22.5 – 27.5; therefore \( L_m = 22.5 \),
\( c = 5 \), \( f_m = 60 \), \( N = 200 \), \( f_b = 83 \).
\[ \text{Median} = L_m + \frac{c}{f_m}\left(\frac{N}{2} - f_b\right) = 22.5 + \frac{5}{60}\left(\frac{200}{2} - 83\right) \approx 23.9 \]
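The steps of locating the median class and interpolating inside it can be sketched as a small function. It assumes the classes are given in ascending order as (lower boundary, upper boundary, frequency) tuples; the name `grouped_median` is illustrative.

```python
def grouped_median(classes):
    """Median of grouped data: L_m + (c / f_m) * (N / 2 - f_b)."""
    total = sum(f for _, _, f in classes)
    if total <= 0:
        raise ValueError("the frequency table is empty")
    below = 0  # cumulative frequency of the classes below the current one
    for lower, upper, freq in classes:
        if freq > 0 and below + freq >= total / 2:
            return lower + (upper - lower) / freq * (total / 2 - below)
        below += freq

rent_classes = [
    (7.5, 12.5, 12), (12.5, 17.5, 26), (17.5, 22.5, 45), (22.5, 27.5, 60),
    (27.5, 32.5, 37), (32.5, 37.5, 13), (37.5, 42.5, 5), (42.5, 47.5, 2),
]
median_rent = grouped_median(rent_classes)
print(round(median_rent, 1))
```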

178
ADVANTAGES AND DISADVANTAGES OF THE
MEDIAN
Advantages
• The median is less affected by outliers and skewed
data than the mean, and is therefore usually the
preferred measure when the distribution is not symmetrical.
• It is always unique and can be located graphically.
• It can easily be calculated even when there are
open-ended or irregular class intervals.
Disadvantages
• The median cannot be identified for nominal
categorical data, as such data cannot be logically ordered.
• If the values are spread erratically, the value of the
median may not be a realistic representative figure.
179
QUANTILES
Dividing ordered data into essentially equal-sized data
subsets is the motivation for quantiles; the quantiles
are the data values marking the boundaries between
consecutive subsets. The median divides the data set
into two equal parts. By dividing the data set into
four, ten and one hundred equal parts, we get quartiles,
deciles and percentiles respectively.
1. Quartiles
These are three numbers that divide the data into four
equal parts.
180
QUARTILES FOR UNGROUPED DATA
If the data are arranged in increasing order, the value
below which the first quarter of the data lies is the
first quartile (Q1). The value that divides the data
into two halves is the second quartile (Q2). The
value below which three quarters of the data lie
is the third quartile (Q3).
When the items have been arranged in ascending
order, then
\[ Q_i = \left[\frac{i(n+1)}{4}\right]^{\text{th}} \text{ item}, \quad i = 1, 2, 3 \]
Example:
Find the value of the three quartiles from the following
data set:
25,80,68,53,76,73,85,88,91,79,58
Answer
Q1 = 58,Q2 = 76,Q3 = 85
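The position rule can be turned into a short function. One assumption is added here that the notes do not cover: when i(n+1)/4 is not a whole number, the value is interpolated linearly between the two neighbouring ordered items. The function name `quartile` is illustrative.

```python
def quartile(values, i):
    """Return the i-th quartile (i = 1, 2, 3) at ordered position i*(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4           # 1-based position in the ordered data
    lower = int(position)                # whole part of the position
    fraction = position - lower          # fractional part, used to interpolate
    lower = min(max(lower, 1), n)        # keep the position inside the data
    upper = min(lower + 1, n)
    return ordered[lower - 1] + fraction * (ordered[upper - 1] - ordered[lower - 1])

data = [25, 80, 68, 53, 76, 73, 85, 88, 91, 79, 58]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)
```

With n = 11 the positions are the 3rd, 6th and 9th items, so no interpolation is needed for this data set.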
181
QUARTILES FOR GROUPED DATA
For grouped data the jth quartile is given by:
\[ Q_j = L_{Qj} + \frac{c}{f_{Qj}}\left(\frac{jN}{4} - f_{bQj}\right), \quad j = 1, 2, 3 \]
where \( L_{Qj} \) = lower class boundary of the quartile
class,
\( c \) = class size,
\( f_{Qj} \) = frequency of the quartile class,
\( f_{bQj} \) = sum of the frequencies of the classes below the
quartile class.

182
QUARTILES FOR GROUPED DATA
Example: Find the three quartiles of the following
distribution.

Class interval frequency


60-62 5
63-65 18
66-68 42
69-71 27
72-74 8
75-77 12
78-80 16

183
QUARTILES FOR GROUPED DATA
Class interval   Class boundaries   Frequency   C.f
60-62            59.5 – 62.5        5           5
63-65            62.5 – 65.5        18          23
66-68            65.5 – 68.5        42          65
69-71            68.5 – 71.5        27          92
72-74            71.5 – 74.5        8           100
75-77            74.5 – 77.5        12          112
78-80            77.5 – 80.5        16          128

Q1= 66.14, Q2= 68.43 and Q3 = 73
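The quartiles quoted above can be reproduced with one general function. This sketch assumes classes are stored as (lower boundary, upper boundary, frequency) tuples in ascending order; `grouped_quantile` is an illustrative name, and passing j/10 or j/100 as the fraction gives grouped deciles or percentiles by the same interpolation.

```python
def grouped_quantile(classes, fraction):
    """Interpolated quantile of grouped data: L + (c / f) * (fraction * N - f_b)."""
    total = sum(f for _, _, f in classes)
    target = fraction * total
    below = 0  # cumulative frequency of the classes below the current one
    for lower, upper, freq in classes:
        if freq > 0 and below + freq >= target:
            return lower + (upper - lower) / freq * (target - below)
        below += freq
    raise ValueError("fraction must lie between 0 and 1 for a non-empty table")

classes = [
    (59.5, 62.5, 5), (62.5, 65.5, 18), (65.5, 68.5, 42), (68.5, 71.5, 27),
    (71.5, 74.5, 8), (74.5, 77.5, 12), (77.5, 80.5, 16),
]
q1, q2, q3 = (grouped_quantile(classes, j / 4) for j in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```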

184
DECILES AND PERCENTILES
Deciles:
These are 9 numbers that divide the data into ten
equal parts. The value at the jth dividing
position is called the jth decile. When the items have
been arranged in ascending order, it is given by:
\[ D_j = \left[\frac{j(n+1)}{10}\right]^{\text{th}} \text{ item}, \quad j = 1, 2, \ldots, 9 \]
For grouped data, the jth decile \( D_j \) is given by:
\[ D_j = L_{Dj} + \frac{c}{f_{Dj}}\left(\frac{jN}{10} - f_{bDj}\right) \]
185
DECILES AND PERCENTILES
Percentiles:
These are 99 numbers that divide the data into 100
equal parts. The value at the jth dividing
position is called the jth percentile. When the items have
been arranged in ascending order, it is given by:
\[ P_j = \left[\frac{j(n+1)}{100}\right]^{\text{th}} \text{ item}, \quad j = 1, 2, \ldots, 99 \]
For grouped data, the jth percentile \( P_j \) is given
by:
\[ P_j = L_{Pj} + \frac{c}{f_{Pj}}\left(\frac{jN}{100} - f_{bPj}\right) \]
186
THE MODE
The mode is the most commonly occurring value in a
distribution.
For example, the mode of the sample [1, 3, 6, 6, 6, 6, 7,
7, 12, 12, 17] is 6.
The mode is not necessarily unique: in the list
[1, 1, 2, 4, 4] both 1 and 4 are modes. A data set with
two modes is said to be bimodal, while a set with more
than two modes may be described as multimodal.
Example
Find the mode of the following data set.
26, 33, 26, 21, 22, 25, 30, 30, 29,28,28,30.
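Because a data set can have more than one mode, it is convenient to use a function that returns all of them. The sketch below uses `statistics.multimode` from the Python standard library on the three data sets mentioned above.

```python
from statistics import multimode

unimodal = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
bimodal = [1, 1, 2, 4, 4]
example = [26, 33, 26, 21, 22, 25, 30, 30, 29, 28, 28, 30]

# multimode returns a list of every value that shares the highest count.
modes_unimodal = multimode(unimodal)
modes_bimodal = multimode(bimodal)
modes_example = multimode(example)

print(modes_unimodal, modes_bimodal, modes_example)
```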

187
THE MODE
In grouped data the modal class is the class with the
highest frequency. The mode of grouped data is
given by:
\[ \text{Mode} = L + \frac{f_1}{f_1 + f_2} \times c \]
where:
L = lower class boundary of the modal class,
\( f_1 \) = difference between the frequency of the
modal class and the frequency of the class just before
it,
\( f_2 \) = difference between the frequency of the
modal class and the frequency of the class just after
it,
c = class size.
188
THE MODE
Example: Find the mode of the following distribution.
Class interval frequency
60-62 5
63-65 18
66-68 42
69-71 27
72-74 8
75-77 12
78-80 16
The modal class is 66 – 68; hence L = 65.5, c = 3,
\( f_1 \) = 42 - 18 = 24, \( f_2 \) = 42 - 27 = 15.
Mode = 65.5 + (24/(24 + 15)) × 3 = 67.35
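The grouped mode calculation can be sketched as below, using the interpolation L + f1/(f1 + f2) × c that reproduces the value 67.35. Two assumptions are made for illustration: a modal class at either end of the table treats the missing neighbour as having frequency 0, and the first class with the highest frequency is taken as the modal class.

```python
def grouped_mode(classes):
    """Mode of grouped data: L + f1 / (f1 + f2) * c."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                       # index of the modal class
    lower, upper, f_modal = classes[m]
    f_before = freqs[m - 1] if m > 0 else 0           # assumption: 0 if no class before
    f_after = freqs[m + 1] if m < len(freqs) - 1 else 0
    f1 = f_modal - f_before
    f2 = f_modal - f_after
    if f1 + f2 == 0:
        raise ValueError("neighbouring classes have the same frequency as the modal class")
    return lower + f1 / (f1 + f2) * (upper - lower)

classes = [
    (59.5, 62.5, 5), (62.5, 65.5, 18), (65.5, 68.5, 42), (68.5, 71.5, 27),
    (71.5, 74.5, 8), (74.5, 77.5, 12), (77.5, 80.5, 16),
]
mode_value = grouped_mode(classes)
print(round(mode_value, 2))
```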
189
ADVANTAGES OF THE MODE
• The mode has an advantage over the median and
the mean in that it can be found for both numerical and
categorical data.
• It is easy to understand and simple to calculate.
• It is not affected by extremely large or small values.
• It can be computed when there are open-ended class
intervals.
• It can be located graphically.

190
DISADVANTAGES OF THE MODE
• The presence of more than one mode in the same
distribution can limit the ability of the mode in
describing the centre or typical value of the
distribution because a single value to describe the
centre cannot be identified.
• In some cases, particularly where the data are
continuous, the distribution may have no mode at
all (i.e. if all values are different).
• It is not capable of further mathematical treatment.

191
MEASURES OF DISPERSION
The various measures of central tendency discussed
give us a single figure that represents the entire data.
But averages alone cannot adequately describe a
set of observations unless all the observations are the
same. Since in most cases the observations are not the
same, it is necessary to describe the variability or
dispersion of the observations. Also, two or more
distributions may have the same central value and
still differ widely in how their values are spread.
Measures of variation help us study
important characteristics of a distribution,
such as the extent to which the items vary from one
another and how they vary from the central value.
192
ABSOLUTE MEASURES OF VARIATION
Absolute Measures of Dispersion (AMD):
Absolute measures of dispersion are expressed in the
same units as the original data. They can be used to
compare the variation between two series only when
the series have the same units and similar averages.
They are of limited use for comparing the variation
between two sets of data which have different units or
different averages.
The common AMD are the range, quartile deviation,
mean deviation, and standard deviation.

193
RELATIVE MEASURES OF VARIATION

Relative Measures of Dispersion:
A relative measure of dispersion is the ratio of an absolute
measure of dispersion to an appropriate
average. It is used when we want to compare the
variability of two or more series which have different
averages or different units.
The most common relative measure of dispersion is the
coefficient of variation, but one can also use the
coefficient of range, the coefficient of quartile deviation,
and the coefficient of mean deviation.

194
RANGE AND COEFFICIENT OF RANGE
Range:
• Range is defined as the difference between the
maximum and the minimum observation of the
given data: Range = \( X_m - X_0 \), where \( X_m \) is the
maximum and \( X_0 \) is the minimum observation.
• In the case of grouped data, the range is the difference
between the upper boundary of the highest class
and the lower boundary of the lowest class. It is the
simplest measure of dispersion and gives a general
idea of the total spread of the observations.
• The range can be strongly affected by outliers.

195
RANGE
• The range is based on the two extreme observations.
It gives no weight to the central values of the data. It
is a poor measure of dispersion and does not give a
good picture of the overall spread of the
observations with respect to the center of the
observations. This defect in range cannot be
removed even if we calculate the coefficient of range
which is a relative measure of dispersion.
• However, the range is very useful in quality control,
studying fluctuations in the share prices, and
weather forecast.

196
COEFFICIENT OF RANGE
It is a relative measure of dispersion and is based on
the value of the range.
Coefficient of range = \( \frac{X_m - X_0}{X_m + X_0} \)
Example:
Set A, marks of students out of 25: 10, 15, 18, 20, 20
Set B, marks of students out of 100: 30, 35, 40, 45, 50
Range for A = 20 - 10 = 10; coefficient of range = 10/30 = 0.33
Range for B = 50 - 30 = 20; coefficient of range = 20/80 = 0.25
Using the range, one might conclude that there is
greater dispersion in set B than in set A. But this is not
true, since the two sets are marked out of different totals.
197
COEFFICIENT OF RANGE
When we convert these two values into coefficient of
range, we see that coefficient of range for set A is
greater than that of set B. Thus there is greater
dispersion or variation in set A than in set B.
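The comparison between the two sets can be reproduced with a short sketch. The function names are illustrative; they apply the range and coefficient of range formulas as stated above.

```python
def value_range(values):
    """Range: maximum observation minus minimum observation."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Coefficient of range: (max - min) / (max + min)."""
    return (max(values) - min(values)) / (max(values) + min(values))

set_a = [10, 15, 18, 20, 20]   # marks out of 25
set_b = [30, 35, 40, 45, 50]   # marks out of 100

for name, marks in (("A", set_a), ("B", set_b)):
    print(name, value_range(marks), round(coefficient_of_range(marks), 2))
```

The absolute range is larger for set B, while the relative measure is larger for set A, which is the point made in the text.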
Exercise:
Following are the wages of 8 workers of a factory. Find
the range and the coefficient of range. Wages in ($)
were 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.

198
QUARTILE DEVIATION AND COEFFICIENT
OF QUARTILE DEVIATION
The quartile deviation (semi-interquartile range) is
defined as
\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
The quartile deviation gives the average amount by
which the two quartiles differ from the median. In a
symmetrical distribution the two quartiles (Q1 and Q3)
are equidistant from the median.
The quartile deviation describes the variation among
the central items of the data.
The quartile deviation is a slightly better measure of
absolute dispersion than the range, but it ignores the
observations in the tails.
199
COEFFICIENT OF QUARTILE DEVIATION
A relative measure of dispersion based on the quartile
deviation is called the coefficient of quartile deviation.
Coefficient of quartile deviation = \( \frac{Q_3 - Q_1}{Q_3 + Q_1} \)
It is a pure number, free of any units of measurement. It
can be used for comparing the dispersion in two or
more sets of data.
Example:
The wheat production (in kg) of 19 acres is given as:
1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, and
1885. Find the quartile deviation and coefficient of
quartile deviation.
200
COEFFICIENT OF QUARTILE DEVIATION
After arranging the observations in ascending order,
we get 1040, 1080, 1200, 1240, 1320, 1342, 1360, 1440,
1470, 1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880,
1885, 1960.
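With n = 19, Q1 is the 5th item and Q3 is the 15th item,
so Q1 = 1320 and Q3 = 1755.
Quartile deviation = (Q3 - Q1)/2 = (1755 - 1320)/2 = 217.5 kg
Coefficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1)
= (1755 - 1320)/(1755 + 1320)
= 435/3075
= 0.1415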
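The quartiles for this example can be checked with `statistics.quantiles` from the Python standard library. Its "exclusive" method places the ith quartile at position i(n+1)/4 in the ordered data, which matches the rule used in these notes for this data set; the remaining variable names are illustrative.

```python
from statistics import quantiles

wheat = [1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730, 1785,
         1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]

# Quartiles located at positions i * (n + 1) / 4 in the ordered data.
q1, q2, q3 = quantiles(wheat, n=4, method="exclusive")

quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, quartile_deviation, round(coefficient_qd, 4))
```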

201
MEAN DEVIATION
The mean deviation or the average deviation is
defined as the mean of the absolute deviations of
observations from some suitable average which may
be the arithmetic mean, the median or the mode.
Thus for sample data in which the suitable average is
the mean \( \bar{X} \), the mean deviation is given by:
\[ \text{Mean deviation} = \frac{\sum |X_i - \bar{X}|}{n} \]
For grouped data, the mean deviation is given by:
\[ M.D. = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} \]

202
MEAN DEVIATION
The mean deviation is a better measure of absolute
dispersion than the range and the quartile deviation.
The mean deviation is based on all the observations, a
property which is not possessed by the range and the
quartile deviation.
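The main limitation of the mean deviation is that taking
absolute values ignores the signs of the deviations, which
makes it awkward for further algebraic treatment.
Example: Find the mean deviation of 2, 3, 4, 5, 6.
Mean = 4
Mean deviation = \( \frac{\sum |X_i - \bar{X}|}{n} \) = (2 + 1 + 0 + 1 + 2)/5
= 6/5 = 1.2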

203
COEFFICIENT OF MEAN DEVIATION
A relative measure of dispersion based on the mean
deviation is called the coefficient of the mean
deviation. It is defined as the ratio of the mean
deviation to the average used in the calculation of the
mean deviation. Thus
Coefficient of MD (mean)= MD(mean)/mean
Coefficient of MD(median) = MD(median)/median
Coefficient of MD (mode) = MD (mode)/mode
Example: Find the Coefficient of MD(mean) and
MD(median) for set A and B below.
Set A marks of students out of 25 :10,15, 18, 20, 20
Set B marks of students out of 100:30, 35, 40, 45, 50
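The mean deviation and its coefficient can be computed for both sets with a short sketch. The helper `mean_deviation` is an illustrative name; it takes the centre (mean or median) as an argument so the same function serves both cases.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Mean of the absolute deviations of the values from the given centre."""
    return sum(abs(x - centre) for x in values) / len(values)

# Check against the earlier example: the mean deviation of 2, 3, 4, 5, 6.
check = mean_deviation([2, 3, 4, 5, 6], mean([2, 3, 4, 5, 6]))

set_a = [10, 15, 18, 20, 20]   # marks out of 25
set_b = [30, 35, 40, 45, 50]   # marks out of 100

for name, marks in (("A", set_a), ("B", set_b)):
    for label, centre in (("mean", mean(marks)), ("median", median(marks))):
        md = mean_deviation(marks, centre)
        # Coefficient of MD = MD divided by the average it was measured from.
        print(name, label, round(md, 2), round(md / centre, 3))
```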
204
STANDARD DEVIATION
The sum of the deviations from the arithmetic
mean is zero, and we are interested in the magnitude of
the deviations, not in their signs. One way of
eliminating the problem of signs is to consider the
squares of the deviations from the mean. If we average
the squared deviations from the mean and then take the
square root, to compensate for the fact that the
deviations were squared, we obtain the standard
deviation, given by the formula:
\[ S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N}} \]
The square of the standard deviation is known as the
variance.
205
STANDARD DEVIATION
The standard deviation shows how much variation or
dispersion from the average exists. A low standard
deviation indicates that the data points tend to be very
close to the mean; a high standard deviation indicates
that the data points are spread out over a large range
of values. A useful property of the standard deviation
is that, unlike the variance, it is expressed in the same
units as the data.
Example: Find the standard deviation of the following
data set. 2,4,4,4,5,5,7,9
Mean= 5
S=2.
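The example can be verified both from the definition and with the standard library. Note that `statistics.pstdev` divides by N, as the formula in these notes does, whereas `statistics.stdev` divides by N - 1 and would give a different value.

```python
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# By definition: square root of the average squared deviation from the mean.
data_mean = sum(data) / n
variance = sum((x - data_mean) ** 2 for x in data) / n
sd_by_definition = variance ** 0.5

# Population standard deviation from the standard library (divides by N).
sd_by_library = pstdev(data)

print(data_mean, variance, sd_by_definition, sd_by_library)
```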

206
STANDARD DEVIATION
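For grouped data:
\[ S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}} \]
This can be written in the long form as:
\[ S = \sqrt{\frac{\sum f_i x_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2} \]
And in the short form it can be written as:
\[ S = \sqrt{\frac{\sum f_i d_i^2}{\sum f_i} - \left(\frac{\sum f_i d_i}{\sum f_i}\right)^2}, \quad \text{where } d_i = X_i - A \]
which can further be simplified to
\[ S = c\sqrt{\frac{\sum f_i u_i^2}{\sum f_i} - \left(\frac{\sum f_i u_i}{\sum f_i}\right)^2}, \quad \text{where } u_i = \frac{X_i - A}{c} \]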

207
STANDARD DEVIATION
Example:
The time taken to complete the Mathematics quiz is
given by the following distribution table. Find the
standard deviation of the time taken by students to finish
the quiz.
Time     Number of students
0-10     4
10-20    9
20-30    6
30-40    4
40-50    2

208
STANDARD DEVIATION
Solution by long method:
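Time     Number of students (f_i)   x_i   f_i x_i   x_i^2   f_i x_i^2
0-10     4                          5     20        25      100
10-20    9                          15    135       225     2025
20-30    6                          25    150       625     3750
30-40    4                          35    140       1225    4900
40-50    2                          45    90        2025    4050
Total    25                               535               14825

\( S = \sqrt{\frac{14825}{25} - \left(\frac{535}{25}\right)^2} = \sqrt{593 - 457.96} = \sqrt{135.04} \)
S = 11.62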
Exercise: Use coding method and short method.
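A hand calculation by any of the methods can be checked against a short sketch. It assumes the class marks 5, 15, 25, 35, 45 from the table and, for the coding method, an assumed mean A = 25 and class size c = 10; these two values are illustrative choices, and any other A gives the same result.

```python
import math

class_marks = [5, 15, 25, 35, 45]
frequencies = [4, 9, 6, 4, 2]
n = sum(frequencies)

# Long method: sqrt(sum(f x^2)/n - (sum(f x)/n)^2).
mean_x = sum(f * x for f, x in zip(frequencies, class_marks)) / n
mean_x2 = sum(f * x * x for f, x in zip(frequencies, class_marks)) / n
sd_long = math.sqrt(mean_x2 - mean_x ** 2)

# Coding method: u = (x - A) / c, then S = c * sqrt(sum(f u^2)/n - (sum(f u)/n)^2).
A, c = 25, 10  # illustrative assumed mean and common class size
codes = [(x - A) / c for x in class_marks]
mean_u = sum(f * u for f, u in zip(frequencies, codes)) / n
mean_u2 = sum(f * u * u for f, u in zip(frequencies, codes)) / n
sd_coded = c * math.sqrt(mean_u2 - mean_u ** 2)

print(round(sd_long, 2), round(sd_coded, 2))
```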
209
STANDARD DEVIATION
Combined Standard Deviation:
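Just as it is possible to compute the combined mean of two
or more groups, we can compute
the combined standard deviation of two or more
groups. The formula for the combined standard deviation
of k groups is:
\[ S_c = \sqrt{\frac{\sum_{i=1}^{k} n_i s_i^2 + \sum_{i=1}^{k} n_i d_i^2}{\sum_{i=1}^{k} n_i}} \]
where
\( S_c \) = combined standard deviation,
\( s_i \) = standard deviation of sample i,
\( n_i \) = number of observations in sample i,
\( d_i = \bar{X}_i - \bar{X}_c \), the difference between the mean of
sample i and the combined mean.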

210
STANDARD DEVIATION
Example:
The number of workers employed, the mean wage
per week (Tsh) and the standard deviation (Tsh) in
each section of a factory are given below. Calculate the
standard deviation for all workers.
Section   No. of workers   Mean wage (Tsh)   Standard deviation (Tsh)
A         50               220               20
B         60               240               40
C         90               230               15
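One way to work this example is to compute the combined mean first and then apply the combined standard deviation formula. The sketch below stores each section as a (workers, mean wage, standard deviation) tuple; the variable names are illustrative.

```python
import math

# (number of workers, mean wage, standard deviation) for each section.
sections = {"A": (50, 220, 20), "B": (60, 240, 40), "C": (90, 230, 15)}

total_workers = sum(n for n, _, _ in sections.values())

# Combined (grand) mean: section means weighted by the number of workers.
combined_mean = sum(n * m for n, m, _ in sections.values()) / total_workers

# Within-section part: sum of n_i * s_i^2.
within = sum(n * s ** 2 for n, _, s in sections.values())

# Between-section part: sum of n_i * d_i^2 with d_i = section mean - combined mean.
between = sum(n * (m - combined_mean) ** 2 for n, m, _ in sections.values())

combined_sd = math.sqrt((within + between) / total_workers)
print(combined_mean, round(combined_sd, 2))
```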

211
COEFFICIENT OF VARIATION
The relative measure corresponding to the standard
deviation is the coefficient of variation (C.V.), the
most commonly used measure of
relative variation. It is used to compare the variability of
two or more series. The series with the greater coefficient
of variation is said to be more variable, or
conversely less consistent, less uniform, less stable or
less homogeneous. The series with the smaller
coefficient of variation is said to be less variable,
or more consistent, more uniform, more stable or more
homogeneous. The coefficient of variation is the ratio of
the standard deviation to the mean:
\[ C.V. = \frac{s}{\bar{x}} \times 100\% \]
212
COEFFICIENT OF VARIATION
Example:
The following are the means and standard deviations
of the wages ($) earned by employees of two different
institutions. Identify the institution which is more
consistent in paying its employees.
\( \bar{X}_1 = 1769.5 \), \( S_1 = 377.75 \)
\( \bar{X}_2 = 1364.5 \), \( S_2 = 310.081 \)
C.V.1 = (377.75/1769.5) × 100 = 21%
C.V.2 = (310.081/1364.5) × 100 = 23%
Institution 1 is more consistent.
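The comparison can be reproduced with a small helper that applies the coefficient of variation formula. The function name is illustrative.

```python
def coefficient_of_variation(standard_deviation, mean_value):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return standard_deviation / mean_value * 100

cv_1 = coefficient_of_variation(377.75, 1769.5)
cv_2 = coefficient_of_variation(310.081, 1364.5)

# The institution with the smaller coefficient of variation is more consistent.
more_consistent = 1 if cv_1 < cv_2 else 2
print(round(cv_1), round(cv_2), more_consistent)
```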

213
SKEWNESS AND KURTOSIS
• Skewness is a measure of the degree of asymmetry
of a distribution (it helps to measure how much the
distribution is not symmetric).
• Kurtosis is the degree of peakedness of a
distribution.
It is a measure of whether the data are peaked or flat
relative to a normal distribution (mesokurtic). Data sets
with high kurtosis tend to have a distinct
peak near the mean that declines rather rapidly
(leptokurtic). Data sets with low kurtosis tend to have
a flat top near the mean rather than a sharp peak
(platykurtic).

214
MEASURES OF SKEWNESS
Measures of skewness tell us the direction and extent
of asymmetry in a series and permit us to compare
two or more series. The extent of skewness can be
measured using the following methods.
1. Absolute measures
Skewness can be measured in absolute terms by
taking the difference between the mean and the mode.
Symbolically:
Absolute SK = \( \bar{X} \) - mode.

215
MEASURES OF SKEWNESS
2. Relative measures
2.1 Pearson’s first Coefficient of skewness.
In this method the measure of skewness is based on the
divergence of the mean from the mode in
a skewed distribution. The difference between the mean
and the mode is divided by the standard deviation to give
a relative measure. Thus, Karl Pearson’s coefficient of
skewness is given by:
Coefficient of skewness (SKp) = (\( \bar{X} \) - mode)/s.
If SKp > 0, then the distribution is positively skewed.
If SKp < 0, then the distribution is negatively skewed.

216
MEASURES OF SKEWNESS
Example:
Calculate the coefficient of skewness of the following
data by using Karl Pearson's method. 1, 2, 3, 3, 4, 4, 4
Solution
mean=3, standard deviation = 1.07, mode= 4
Coefficient of skewness, SKp=(3−4)/1.07=−0.93
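The worked value can be checked in code. The sketch assumes the population standard deviation (dividing by n), since that is what gives the 1.07 quoted above. It also computes the coefficient based on 3(mean - median)/s for the same data, for comparison.

```python
from statistics import mean, median, mode, pstdev

data = [1, 2, 3, 3, 4, 4, 4]

data_mean = mean(data)
data_sd = pstdev(data)  # population standard deviation (divides by n)

# Pearson's first coefficient: (mean - mode) / s.
sk_first = (data_mean - mode(data)) / data_sd

# Pearson's second coefficient: 3 * (mean - median) / s.
sk_second = 3 * (data_mean - median(data)) / data_sd

print(round(data_sd, 2), sk_first, sk_second)
```

For this small data set the mean and median coincide, so the second coefficient is zero even though the first is clearly negative; the empirical relation between mean, median and mode is only approximate.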
2.2 Pearson’s second Coefficient of skewness.
By applying the empirical relation
Mean – Mode = 3(Mean – Median),
we get Pearson’s second coefficient of skewness:
Coefficient of skewness = 3(mean - median)/s
217
MEASURES OF SKEWNESS
For a perfectly symmetrical distribution, the value of
Skp is 0, and in general its value must fall between -3
and 3.
2.3 Bowley’s coefficient of skewness
Another measure of skewness is based on the quartile.
In a symmetrical distribution the third quartile is the
same distance above the median as the first quartile is
below it, Q3-Median = Median – Q1
If the distribution is positively skewed, the top 25% of
the values will tend to be further from the median than
the bottom 25%, hence the Bowley’s coefficient of
Skewness is given by the formula.
218
MEASURES OF SKEWNESS
SKQ = ((Q3-Q2)-(Q2-Q1))/((Q3-Q2)+(Q2-Q1))
SKQ = (Q3+Q1-2Q2)/(Q3-Q1).
This measure is also called the quartile measure of
skewness, and it varies from -1 to 1. This measure
concentrates on the middle of the distribution and
ignores the tails.
2.4 Kelly’s coefficient of skewness
Bowley’s measure discussed above neglects the two
extreme quarters of the data. It would be better for a
measure to cover the entire data, thus Bowley’s
measure can be extended by taking any two deciles
equidistant from the median,
219
MEASURES OF SKEWNESS
or any percentiles equidistant from the median.
Kelly has suggested the following formula for
measuring skewness upon the 10th and the 90th
percentiles (or the first and ninth deciles):

SKk = ((P90-P50)-(P50-P10))/((P90-P50)+(P50-P10))
SKk = (P90+P10-2P50)/(P90-P10)

This method is not popular in practice; generally the
Karl Pearson method is used.

220
MEASURES OF SKEWNESS
Example: Calculate measures of skewness (Pearson’s
and Bowley’s coefficient of skewness ) for the
frequency distribution below.

Pearson’s Coefficient:
Mean = 23.8, Mode = 24.4, S= 7.2, SKp =(23.8-24.4)/7.2
= -0.08
221
MEASURES OF SKEWNESS
Bowley’s coefficient:
Q1 =18.8, Q2=23.9 and Q3 =28.4
SkQ = (28.4+18.8-2(23.9))/(28.4-18.8)
= -0.6/9.6 = -0.06
2.5 Measure of Skewness based on the third moment
A measure of Skewness may also be obtained by
making use of the third moment about the mean. This
is expressed in dimensionless form and is given by:
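\[ \text{Moment coefficient of skewness} = \frac{m_3}{m_2^{3/2}} \]
where
\( m_3 = \frac{\sum (x - \bar{x})^3}{n} \) and \( m_2 = \frac{\sum (x - \bar{x})^2}{n} \)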

222
MEASURES OF KURTOSIS
A measure of kurtosis uses the 4th moment about the
mean and is expressed in dimensionless form. It is
given by:
\[ \beta_2 = \frac{m_4}{m_2^2}, \quad \text{where } m_4 = \frac{\sum (x - \bar{x})^4}{n} \]
For a normal distribution (mesokurtic distribution),
\( \beta_2 = 3 \) (kurtosis can also be expressed as \( \beta_2 - 3 \)).
When \( \beta_2 > 3 \) the kurtosis is positive and the distribution
is known as leptokurtic; when \( \beta_2 < 3 \) the kurtosis is
negative and the distribution is known as platykurtic.
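The moment-based measures of skewness and kurtosis can be sketched with one helper for central moments. The data set is the one used earlier for Pearson's coefficient, chosen here only as an illustration; the function names are not from the notes.

```python
def central_moment(values, k):
    """k-th moment about the mean: sum((x - mean)^k) / n."""
    n = len(values)
    centre = sum(values) / n
    return sum((x - centre) ** k for x in values) / n

def moment_skewness(values):
    """Third moment in dimensionless form: m3 / m2^(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def moment_kurtosis(values):
    """Fourth moment in dimensionless form: m4 / m2^2 (3 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

data = [1, 2, 3, 3, 4, 4, 4]
skewness = moment_skewness(data)
kurtosis = moment_kurtosis(data)

# A value below 3 indicates a platykurtic shape, above 3 a leptokurtic one.
shape = "leptokurtic" if kurtosis > 3 else "platykurtic" if kurtosis < 3 else "mesokurtic"
print(round(skewness, 3), round(kurtosis, 4), shape)
```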

223
