
Mse1 Stat Class

The document provides an introduction to statistics, defining it as the science of collecting, organizing, summarizing, analyzing, and drawing conclusions from data. It covers important statistical terms, branches of statistics, types of data, and methods of data collection, emphasizing the significance of statistics in research and decision-making. Additionally, it discusses the classification of variables and measurement scales, along with practical examples and exercises to reinforce understanding.


Introduction to Statistics

Francis J. Majawa

Malawi University of Business and Applied Sciences

February 7, 2024

Definition.

Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data. [Bluman Chapter 1]
Statistics:
Is basic to research.
Equips students to understand the various statistical studies performed in their respective fields.
Trains students to become better consumers and citizens, i.e. to make intelligent decisions.

Important Statistical Terms.
Observation: A single member of a collection of items that we want to study, e.g.
Employee
Age
Height, etc.
Variable: A characteristic/attribute that can assume different values, e.g.
If $x + 5 = 16$, then $x$ is a variable; it can take any value that gives 16 when 5 is added to it.
For the age of students at UNIMA in 2020, "Age" is a variable; it can take many different values.
Data: The values (measurements/observations) that the variables can assume, e.g.
$x = 11$, $x = \frac{22}{2}$, $x = \sqrt{121}$, ... so $11$, $\frac{22}{2}$, $\sqrt{121}$, ... are values of a variable, hence data.
Probable ages of students at UNIMA would be 23, 26, 34, 56, 78, 40, etc.; all are values of the variable "Age".

Terms (continued)

Population: All subjects (human or not) that are being studied.
Random Variable: A variable whose values are determined by chance/probability.
Data Set:
A collection of data values.
Can consist of one variable or many variables.
If it consists of one variable, it is called a univariate data set.
If it consists of two variables, it is called a bivariate data set.
If it consists of three or more variables, it is called a multivariate data set.

Data Set       Variables    Example               Typical Task
Univariate     1            Income                Histogram, basic statistics
Bivariate      2            Income, Age           Scatter plot, correlation
Multivariate   3 or more    Income, Age, Gender   Regression modelling

Sample: A group of subjects selected from a population.

Branches of Statistics

Statistics is divided into two branches depending on how the data are used.
1 Descriptive statistics: involves the collection, organization, summarization, and presentation of data.
2 Inferential statistics: involves generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

Examples

Determine whether descriptive or inferential statistics were used.
1 The average jackpot for the top five lottery winners was K367.6 million.
2 A study done by the American Academy of Neurology suggests that older people who had a high-calorie diet more than doubled their risk of memory loss.
3 Based on a survey of 9317 consumers done by the National Retail Federation, the average amount that consumers spent on Valentine's Day in 2011 was $116.
4 Scientists at the University of Oxford in England found that a good laugh significantly raises a person's pain tolerance.

Solutions

1 Descriptive statistics were used because this is an average, and it is based on data obtained from the top five lottery winners at this time.
2 Inferential statistics were used since this is a generalization made from a sample to a population.
3 Descriptive statistics were used since this is an average based on a sample of 9317 respondents.
4 Inferential statistics were used since an inference is made from a sample to a population.

Exercise
Read the following passage and answer the questions that follow:

A study conducted at Domasi College revealed that students who attended class 95 to 100% of the time usually received an A in the class. Students who attended class 80 to 90% of the time usually received a B or C in the class. Students who attended class less than 80% of the time usually received a D or an F or eventually withdrew from the class. Based on this information, attendance and grades are related. The more you attend class, the more likely it is you will receive a higher grade. If you improve your attendance, your grades will probably improve. Many factors affect your grade in a course. One factor that you have considerable control over is attendance. You can increase your opportunities for learning by attending class more often.

Exercise Questions

1 What are the variables under study?
2 What are the data in the study?
3 Are descriptive, inferential, or both types of statistics used?
4 What is the population under study?
5 Was a sample collected? If so, from where?
6 From the information given, comment on the relationship between the variables.

Variables.

Variables in statistics are classified into two categories.
Qualitative Variable:
Also known as a categorical variable.
A variable that can be placed into distinct or specific categories according to some characteristic/attribute.
e.g. people are classified according to gender (female or male), hence gender is a categorical variable.
e.g. people are classified according to blood group (A, B, AB, O), hence blood group is a categorical variable.
Quantitative Variable:
Also known as a numerical variable.
A variable that can be counted or measured.
e.g. age, height, number of shops in a market, etc.

1. Categorical Data.

Can be represented with labels, e.g.
Vehicle type: Car, Bus, Truck
Gender: Female, Male
Can also be coded, i.e. numbers are used to represent categories to facilitate statistical analysis, e.g.
Vehicle type: 1 = Car, 2 = Truck, 3 = Bus
Gender: 0 = Male, 1 = Female
Note: Coding a category as a number does not make the data numerical.
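
The note about coding can be illustrated with a short Python sketch. The observations below are made up for illustration; only the coding scheme (1 = Car, 2 = Truck, 3 = Bus) comes from the slide. Counting categories is meaningful, whereas the average of the codes is a number that corresponds to no vehicle type.

```python
from collections import Counter
from statistics import mean

# Coding scheme from the slide: 1 = Car, 2 = Truck, 3 = Bus
VEHICLE_CODES = {"Car": 1, "Truck": 2, "Bus": 3}

# Made-up observations, for illustration only
observed = ["Car", "Bus", "Car", "Truck", "Car", "Bus"]
coded = [VEHICLE_CODES[v] for v in observed]

# Frequencies and the most frequent category are meaningful summaries
counts = Counter(observed)
most_common_type = counts.most_common(1)[0][0]

# The mean of the codes can be computed, but it does not describe
# any vehicle type, so it should not be reported
mean_of_codes = mean(coded)

print(counts)
print(most_common_type)
print(mean_of_codes)
```

Changing the codes (say 1 = Bus, 2 = Car, 3 = Truck) would change the mean but not the frequencies, which is one way to see that the codes are only labels.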

2. Quantitative Variables.

Quantitative variables are further grouped into two:
Discrete Variable: A quantitative variable whose values can be counted, e.g. number of children in a family, age in completed years, etc.
Continuous Variable: A quantitative variable that can assume an infinite number of values between any two specific values. Continuous values are obtained by measuring and often include fractions and decimals, e.g. height, weight, temperature, haemoglobin level, etc.

Question.

Classify each variable as a discrete variable or a continuous variable.
1 The highest wind speed of a hurricane
2 The weight of baggage on an airplane
3 The number of pages in a statistics book
4 The amount of money a person spends per year for online purchases

Solutions

1 Continuous: since wind speed must be measured
2 Continuous: since weight is measured
3 Discrete: since the number of pages is countable
4 Discrete: since money is counted in whole units of the smallest currency denomination (e.g. cents)

Types of Data

In addition to being classified as qualitative or quantitative, variables can be classified by how they are categorized, counted, or measured.
For example, can the data be organized into specific categories, such as area of residence (rural, suburban, or urban)?
Can the data values be ranked, such as 1st place, 2nd place, 3rd place, etc.?
Or are the values obtained from measurement, e.g. heights, IQs, or temperatures?
This type of classification, i.e. how variables are categorized, counted, or measured, uses a measurement scale (also known as a level of measurement).

Measurement Scale / Levels of Measurement

The level of measurement of the data dictates the calculations that can be done to summarize and present the data.
It also determines the statistical tests that should be performed at the analysis stage.
The four common measurement scales are: Nominal, Ordinal, Interval, and Ratio.
These levels of measurement are best understood through examples.

Nominal

Comes from the Latin word "nomen", meaning name.
Data that can be categorized but not ordered/ranked.
The variable of interest can be divided into mutually exclusive (non-overlapping) categories or outcomes.
Examples: gender (male, female), religion (SDA, RCC, CCAP), colour (green, orange, yellow), nation/region (Malawi, Namibia, Zambia), etc.

Ordinal
Data categories are represented by labels or names (high, medium, low) that have relative values.
Because of the relative values, the classified data can be ranked or ordered.
Although the data can be categorized and ranked/ordered, precise differences between the ranks do not exist.
As a result, the ranks lack the properties that are required to compute many statistics, such as the average.
Examples: grades (A, B, C, D, ...), rating scale (poor, good, excellent, ...), satisfaction level, happiness, ...
Specifically, there is no clear meaning to the distance between A and B; if ranks are coded with numbers, the difference between 1 and 2 is meaningless.
For example, what would be the distance between "Rarely" and "Never"?

Interval
Interval data include all the characteristics of the ordinal level.
In addition, precise differences between units of measure exist and are of constant size; however, there is no meaningful zero.
Equal differences in the characteristic are represented by equal differences in the measurements.
Examples include temperature (in °C or °F), test scores, IQ, ...
The interval between 60 °C and 70 °C is the same as the interval between 20 °C and 30 °C.
Since intervals between numbers represent distances, we can do mathematical operations such as taking an average.
But because there is no meaningful zero, we cannot say that 60 °C is twice as warm as 30 °C, nor that a temperature of 0 °C means there is no temperature.
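
The point about ratios can be checked numerically. The sketch below is an illustration added here, not part of the original slides: it converts the two Celsius temperatures to Kelvin, a temperature scale that does have a true zero, and compares differences and ratios on both scales.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return temp_c + 273.15

warm_c, cool_c = 60.0, 30.0

# Differences are meaningful on an interval scale and survive the conversion
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: the apparent "twice as warm" in Celsius disappears in Kelvin
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference is 30 degrees on both scales, while the ratio drops from 2 in Celsius to roughly 1.1 in Kelvin, so the "twice as warm" statement depends entirely on where zero was placed.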
Ratio

Ratio data possess all the characteristics of interval measurement.
In addition, there exists a true zero, which represents the absence of the quantity being measured.
Zero does not have to be observable in the data.
E.g. newborn babies cannot have zero weight, but weight is ratio data. What matters here is that zero is an absolute reference point.
In addition, true ratios exist when the same variable is measured on two different members of the population.
Examples: height, weight, salary, age, time, units of production, changes in stock prices, distance between branch offices, ...

Time Series vs Cross-sectional Data

They differ in terms of their use and the nature of the data.
Time-series data follow the same variables over a certain period of time, whereas cross-sectional data cover different subjects at a given point in time.
In this sense, time-series data are stable (the same variables are followed throughout), whereas the data used in cross-sectional analysis are scattered across different subjects.

Time Series

Time-series data are observations that are collected at specific intervals of time.
Therefore, time-series data may be categorized as hourly, daily, monthly, quarterly, half-yearly, or yearly.
The idea is to check the similarities and differences of data recorded at different time periods.
Time-series analysis determines the future pattern by considering past and present conditions and then extending the trend to future conditions.
Therefore, in the case of time-series data, the longer the period over which the data have been collected, the better the basis for predicting future outcomes.

Cross-sectional Data

Cross-sectional data are observations of multiple subjects at one point in time.
Cross-sectional data can also be described as observations of many different individuals (subjects, objects) at a given time, where each observation belongs to a different individual.
A simple example of cross-sectional data is the gross annual income for each of 1000 randomly chosen households in Blantyre for the year 2012.
Cross-sectional data are distinguished from longitudinal data, where there are multiple observations for each unit over time.

Data Collection

Data Collection Strategy: There is no one best way; the decision depends on:
What you need to know: numbers or stories
Where the data reside: environment, files, people
Resources and time available
Complexity of the data to be collected
Frequency of data collection
Intended forms of data analysis

Rules for Collecting Data.

Use multiple data collection methods
Use available data, but need to know
how the measures were defined
how the data were collected and cleaned
the extent of missing data
how accuracy of the data was ensured

Rules for Collecting Data (continued).

If you must collect original data:
be sensitive to the burden on others
pre-test, pre-test, pre-test
establish procedures and follow them (protocol)
maintain accurate records of definitions and coding
verify the accuracy of coding and data input

Data Collection Tools

Participatory Methods
Records and Secondary Data
Observation
Surveys and Interviews
Focus Groups
Diaries, Journals, Self-reported Checklists
Other Tools

Participatory Methods

Involve groups or communities heavily in data collection
Examples:
community meetings
mapping: this method gives participants freedom to shape discussion on a given topic with minimal intervention from researchers.
transect walks: members of the community walk through different areas of the community, interviewing passers-by and drawing a map with observations of characteristics, risks and existing solutions after the walk.

Community Meetings

One of the most common participatory methods
Must be well organized
agree on purpose
establish ground rules
who will speak
time allotted for speakers
format for questions and answers

Records and Secondary Data

Examples of sources:
files/records
computer databases
industry or government reports
other reports or prior evaluations
census data and household survey data
electronic mailing lists and discussion groups
documents (budgets, organizational charts, policies and
procedures, maps, monitoring reports)
newspapers and television reports

Using Existing Data Set

Key issues to consider: validity, reliability, accuracy, response rates, data dictionaries, and missing data rates

Advantages/Disadvantages

Advantages: Often less expensive and faster than collecting the original data again.
Disadvantages: There may be coding errors or other problems. The data may not be exactly what is needed. You may have difficulty getting access. You have to verify the validity and reliability of the data.

Observation

One sees what is happening:
traffic patterns
land use patterns
layout of city and rural areas
quality of housing
condition of roads
conditions of buildings
who goes to a health clinic

Observation is helpful when:

need direct information
trying to understand ongoing behavior
there is physical evidence, products, or outputs that can be observed
need to provide an alternative when other data collection is infeasible or inappropriate

Ways to Record Information from Observations:

Observation guide: printed form with space to record observations
Recording sheet or checklist: yes/no options, tallies, rating scales
Field notes: least structured, recorded in a narrative, descriptive style

Guidelines for Planning Observations

Have more than one observer, if feasible
Train observers so they observe the same things
Pilot test the observation data collection instrument
For a less structured approach, have a few key questions in mind

Advantages/Disadvantages

Advantage: Collects data on actual vs. self-reported behavior or perceptions. It is real-time vs. retrospective.
Disadvantage: Observer bias, potentially unreliable; interpretation and coding challenges; sampling can be a problem; can be labor intensive; low response rates.

Surveys and Interviews

Excellent for asking people about: perceptions, opinions, ideas
Less accurate for measuring behavior
Sample should be representative of the whole
Big problem with response rates

Modes of Survey

Telephone surveys
Self-administered questionnaires distributed by mail,
e-mail, or websites
Administered questionnaires, common in the development
context
In development context, often issues of language and
translation

Advantage/Disadvantage

Advantage: Best when you want to know what people think, believe, or perceive; only they can tell you that.
Disadvantage: People may not accurately recall their behavior or may be reluctant to reveal their behavior if it is illegal or stigmatized. What people think they do or say they do is not always the same as what they actually do.

Interviews.

Often semi-structured
Used to explore complex issues in depth
Forgiving of mistakes: unclear questions can be clarified
during the interview and changed for subsequent interviews
Can provide evaluators with an intuitive sense of the
situation

Challenges of Interviews.

Can be expensive, labor intensive, and time consuming
Selective hearing on the part of the interviewer may miss information that does not conform to pre-existing beliefs
Cultural sensitivity: e.g., gender issues

Focus Group

Type of qualitative research where small homogeneous groups of people are brought together to informally discuss specific topics under the guidance of a moderator
Purpose: to identify issues and themes, not just interesting information, and not "counts"

Focus Groups are Inappropriate when:

there is a language barrier
evaluator has little control over the situation
trust cannot be established
free expression cannot be ensured
confidentiality cannot be assured

Advantage/Disadvantage

Advantage: Can be conducted relatively quickly and easily; may take less staff time than in-depth, in-person interviews; allow flexibility to make changes in process and questions; can explore different perspectives; can be fun.
Disadvantage: Analysis is time consuming; participants may not be representative of the population, possibly biasing the data; group may be influenced by moderator or dominant group members.

The Population

There are two different types of population:
Target Population: Consists of the group of population units from whom we would like to collect data (e.g. all students at UNIMA)
Study or Survey Population: Consists of the group of population units from whom we can collect data (e.g. all students at UNIMA with laptops)

The Population

NOTE: Ideally, a sample survey should collect data from the Target Population, but in practice we collect data from the Study Population because of constraints.

The Sample

A sample must be:
Unbiased: The chosen sample should be representative of the entire population of interest. E.g. if we are interested in the weight of primary school children, we should select a sample that includes children from a range of primary school classes and year groups.
Taken from the correct population: The sample should only contain members of the population of interest. E.g. if we are interested in the characteristics of primary school children, the sample should not contain children from secondary school.

Sampling Methods

Grouped into two categories:
Non-Probability Sampling: Involves non-random selection based on convenience or other criteria, allowing you to easily collect initial data.
Probability Sampling: Involves random selection, allowing you to make statistical inferences about the whole group.

Non-Probability Sampling

Has the following characteristics:
No sampling frame is used, therefore the chance of someone being included in the sample cannot be calculated.
Results from the survey can be produced cheaply and quickly.
Population coverage is poor, since it only captures those who are available to contribute at the time and/or are interested enough in the subject under investigation.
It is difficult to make estimates of the population from the sample results, and any generalizations that are made must be treated with caution.
Performing non-probability sampling is considerably less expensive than probability sampling methods.

Types of Non-probability Sampling

Convenience Sampling: Data is collected from any willing and available respondent. Examples include
Street corner interviews;
Magazine and newspaper questionnaires; and
Phone-in polls.
The sample is likely to be unrepresentative of the population, because only those who feel strongly about the topic are likely to respond and interviewers may only approach one particular type of respondent, usually those that they feel comfortable with. Therefore, the results of the survey may be biased.

Types of Non-probability Sampling

Purposive Sampling:
Read up on purposive sampling and write down what it is, when to use it, and its advantages and disadvantages.

Types of Non-probability Sampling

Quota Sampling: The population is divided into different groups or classes according to different characteristics of the population, and the percentage (proportion) of each group in the total population is fixed.
In quota sampling, researchers create a sample involving individuals that represent a population.
Researchers choose these individuals according to specific traits or qualities.
Quotas are devised to reflect the characteristics of the population; hence quota sampling attempts to obtain a more representative sample than convenience sampling, and therefore more representative sample results should be obtained.

Quota Sampling Example & Steps

A study to investigate the proportion of those who eat pizza and cake at home.
Steps
Divide the group into subgroups according to some characteristic.
Identify the proportion of these subgroups in the population, e.g. N = 10: 4 cake and 6 pizza.
Lastly, select subjects to form the sample group, e.g. 50% of the cake subgroup (n = 2) and 50% of the pizza subgroup (n = 3), hence total sample n = 5.
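
The quota arithmetic in the example can be sketched in Python. The function name `quota_sizes` is hypothetical, introduced only for this illustration; the subgroup sizes (4 and 6) and the 50% fraction are taken from the example above.

```python
def quota_sizes(group_sizes: dict[str, int], fraction: float) -> dict[str, int]:
    """Quota for each subgroup when the same fraction is taken from every subgroup.

    Quotas are rounded to the nearest whole subject.
    """
    return {group: round(size * fraction) for group, size in group_sizes.items()}

# Subgroup sizes from the slide: N = 10, made up of 4 cake and 6 pizza
population = {"cake": 4, "pizza": 6}

quotas = quota_sizes(population, fraction=0.5)
total_sample = sum(quotas.values())

print(quotas)        # quota per subgroup
print(total_sample)  # total sample size n
```

Because every subgroup contributes the same fraction, the sample keeps the 4:6 composition of the population. How the individuals inside each quota are chosen is still up to the researcher, which is why the method remains non-probabilistic.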

Snowball sampling

Advantages & Disadvantages of Non-probability Sampling

Advantages:
Non-probability sampling techniques are a more convenient and practical method for researchers deploying surveys in the real world.
Getting responses using non-probability sampling is faster (more time-effective) and more cost-effective than probability sampling because the sample is known to the researcher. Respondents reply more quickly than randomly selected people because they are highly motivated to participate.
Effective when it is infeasible or impractical to conduct probability sampling.

Advantages & Disadvantages of Non-probability Sampling

Disadvantages:
Lower level of generalization of research findings compared
to probability sampling
Difficulties in estimating sampling variability and
identifying possible bias

Probability Sampling

All members of the study population have a known probability of being included in the sample.
Has the following characteristics:
Uses a sampling frame from which to select a sample
Selects samples at random from the sampling frame. Therefore every item on the sampling frame has a chance of being selected and the probability of selection can be calculated
Selects a sample that is more representative of the population (than non-probability methods)
Researchers can calculate the accuracy of the survey estimates

Example Questions

1 What is the distribution of household sizes in Mulanje district?
2 What proportion of children aged 6 and attending standard 1 in Mangochi sleep under a mosquito net?
3 What is the distribution of ages of University students in Malawi?

Some Important terms

Target population: The total population about which information is required, e.g. all university students at the time of the study
Study population: The set of individuals from which individuals to be studied will be selected, e.g. all those attending classes during the study period (when data collection takes place)
Often these are identical or very similar, but not always.

Some Important terms (continued)

Population characteristic: The aspect(s) of the population to be studied, e.g. mean age, proportion of babies who sleep under a net
Sampling units: The persons or groupings used to select sample members, e.g. households
Sampling frame: The set of sampling units, e.g. schools in a village
List: An actual list of the units in the sampling frame

Some notation

Population size: $N$
Sample size: $n$
Sampling fraction: $f = \frac{n}{N}$

Probability Sampling Methods

1 Simple random sampling
2 Systematic sampling
3 Stratified sampling
4 Cluster sampling
5 Multi-phase sampling
6 Multi-stage sampling

Simple Random sampling (SRS)

Each and every member of the study population has the same chance of being selected into the sample.
The chance is equal to the sampling fraction $f$, where $f = \frac{n}{N}$.
Requirements:
A list of all members of the sampling frame
Possible methods:
Pieces of paper in a hat / drum
Random digit tables
Use random digit methods in a software package

Replacement

Sample without replacement - once selected, a sampling unit cannot be drawn again
Sample with replacement - after being selected, a sampling unit can still be drawn again (same chance each time)

Simple Random Sample (WITHOUT Replacement)

Step 1: List the N subjects in the study population. This list is the sampling frame.
Step 2: Number the entries in the list from 1 to N
Step 3: Select n distinct random numbers between 1 and N
Step 4: Use the sampling frame to identify the individual corresponding to each ID number selected
Step 5: Locate each individual and seek their consent to participate in the survey
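
Steps 1 to 4 can be sketched in Python using the standard `random` module. The function name `simple_random_sample`, the placeholder subject names and the seed are assumptions made for illustration; `random.sample` draws without replacement, so no ID can be selected twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Select n entries from the sampling frame without replacement.

    Returns (ID, entry) pairs, with IDs numbered from 1 to N as in Step 2.
    """
    rng = random.Random(seed)
    # Step 3: n distinct random ID numbers between 1 and N
    ids = rng.sample(range(1, len(frame) + 1), n)
    # Step 4: look up the individual behind each selected ID
    return [(i, frame[i - 1]) for i in sorted(ids)]

# Steps 1 and 2: a hypothetical sampling frame of N = 500 numbered subjects
frame = [f"subject_{i}" for i in range(1, 501)]

selected = simple_random_sample(frame, n=30, seed=2024)
print(selected[:5])
```

Fixing the seed makes the selection reproducible, which can help when the sampling procedure has to be documented or audited. Step 5 (locating people and seeking consent) is fieldwork and is not represented here.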

Selecting n random numbers using Excel

Use the function: RANDBETWEEN(1, N)
Repeat at least n times, discarding any repeated numbers, until n distinct numbers have been obtained
Example
Select an SRS of 30 subjects from a population of 500
N = 500
n = 30
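
The reason for repeating "at least" n times is that independent draws between 1 and N can produce the same number more than once. The Python sketch below mimics that procedure; `random.randint(1, N)` plays the role of RANDBETWEEN(1, N) (both include the end points), and the function name `draw_until_unique` and the seed are assumptions for illustration.

```python
import random

def draw_until_unique(N, n, seed=None):
    """Draw numbers between 1 and N until n distinct values have been collected.

    Returns the distinct values in the order drawn and the number of draws used.
    """
    rng = random.Random(seed)
    chosen, seen, draws = [], set(), 0
    while len(chosen) < n:
        value = rng.randint(1, N)  # one spreadsheet-style draw, end points included
        draws += 1
        if value not in seen:      # repeated numbers are discarded
            seen.add(value)
            chosen.append(value)
    return chosen, draws

ids, draws_needed = draw_until_unique(N=500, n=30, seed=7)
print(sorted(ids))
print(draws_needed)
```

With N = 500 and n = 30 a repeat is fairly likely, so it is worth checking the selected numbers for duplicates before using them as a sample without replacement.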

Stratified random sampling

Stratification is the process of grouping the units within a population of interest into homogeneous sub-groups called strata.
All strata should be mutually exclusive, that is, every unit within the population of interest can be assigned to only one stratum.
Collectively, the strata should also be exhaustive, so that all units are covered by one of the strata.

Stratified random sampling (continued)

A stratified random sample can be chosen by following the steps below:
Divide the population into groups called strata: The population should be split into groups according to some characteristic that is related to the subject of the survey.
A sample is selected from within each stratum using the SRS method. The number of units to be selected from each stratum is determined using an allocation method, such as equal, proportional, or optimal allocation.
The samples from each stratum are combined to form the total sample of the population. This ensures that each stratum is represented in the sample.

Allocating the Sample among the Strata

Once we have split our population into strata, we need to work out how many units to sample from each stratum.
There are three methods of allocating a sample of size n among the different strata: equal allocation, proportional allocation, and optimal allocation.
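
Two of the three allocation methods can be sketched in Python as follows. The strata names and sizes are hypothetical, and the function names are introduced only for this illustration. Optimal allocation is not shown because it also needs information about the variability within each stratum, which the slides do not give.

```python
def equal_allocation(strata_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Same number of units from every stratum (assumes n divides evenly)."""
    per_stratum = n // len(strata_sizes)
    return {name: per_stratum for name in strata_sizes}

def proportional_allocation(strata_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Units per stratum in proportion to stratum size: n_h = n * N_h / N."""
    total = sum(strata_sizes.values())
    exact = {name: n * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Give any remaining units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical strata: students by year of study, N = 500 in total
strata = {"year_1": 250, "year_2": 150, "year_3": 100}

equal = equal_allocation(strata, n=30)
proportional = proportional_allocation(strata, n=30)
print(equal)
print(proportional)
```

Equal allocation gives small strata the same weight as large ones, while proportional allocation keeps the same sampling fraction in every stratum (here 30/500 = 6%).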

Advantages

1 The results of stratified random samples tend to be more accurate (have lower variance) since the grouping together of similar units controls for the variation within strata.
2 The sample obtained through stratification is more representative of the population
3 Stratification also permits separate analyses on each group, which researchers may find useful

Disadvantages

1 This method is more costly and difficult to organize, since it involves splitting the population into different strata and taking a sample from each stratum
2 There is a danger of splitting the population into too many small strata. This may mean that some of the strata may not contain any sample members or the sample may not be large enough to be spread across all of the strata
3 Sometimes there may be more than one variable that the survey needs to be stratified by

Systematic Random Sampling

Use the anticipated population size and planned sample size to determine the sampling fraction $f$ to be used
Determine a sequence in which sampling units are added to the list, e.g. entry in a register, order on a route
Determine the sampling interval $k = \frac{1}{f}$
Randomly select a number between 1 and $k$
Select this sampling unit
Then select every $k$-th sampling unit thereafter

Example

Target population: Patients attending the Out Patient Department (OPD) at QECH
Number of patients expected in study period = 20,000
Sample size = 200
Sampling fraction $f = 200/20{,}000 = 1/100$; $k = 1/f = 100$
Select a random number between 1 and 100, say 42
Approach the 42nd patient, then the 142nd, 242nd, etc.
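
The OPD example can be reproduced with a short Python sketch. The function name `systematic_sample` is hypothetical; the figures (20,000 expected patients, sample of 200, random start 42) come from the example, and the sketch assumes the population size is a whole multiple of the sample size so that k is an integer.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Positions selected by systematic sampling with interval k = N / n."""
    k = population_size // sample_size           # sampling interval k = 1/f
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start between 1 and k
    # The start position, then every k-th position thereafter
    return list(range(start, population_size + 1, k))

positions = systematic_sample(20_000, 200, start=42)
print(positions[:3])   # first patients to approach
print(len(positions))  # achieved sample size
```

Passing `start=42` reproduces the slide; leaving it out lets the function pick the random start itself.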

Cluster Random Sampling

Used when members of the study population are naturally in groups, called clusters,
e.g. villages for residence,
schools for education,
health centre catchment areas for health care, etc.
Obtain a simple random sample of clusters
Sample members from the selected clusters only
May select only a subset of them

Cluster Sampling Example

What proportion of standard 1 students sleep under a mosquito net in Mangochi district?
Study population: Standard 1 students aged 6 in Mangochi district
Population size: approximately 3,000
Number of schools = 54
Randomly select 7 schools and obtain data for every standard 1 student in the chosen schools
Final sample size is approximately $3{,}000 \times \frac{7}{54} \approx 389$

Do all members of the study population have a known probability of being included in the sample?
Yes:
probability a school is selected $= \frac{7}{54} \approx 0.13$
since all students in selected schools are selected, this is also the probability that a student is selected
Sometimes sampling of clusters uses sampling with probability proportional to size
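
The arithmetic of the Mangochi example can be checked with a few lines of Python. The figures are those given in the example; the school ID numbers and the seed are hypothetical, and the expected sample size assumes schools of roughly similar size.

```python
import random

population_size = 3000  # approximate number of standard 1 students
n_schools = 54          # clusters in the district
n_selected = 7          # clusters to be sampled

# Probability that a given school is selected; because every student in a
# selected school is included, this is also each student's selection probability
p_selected = n_selected / n_schools

# Expected sample size, assuming schools are of roughly similar size
expected_sample_size = population_size * p_selected

# Selecting the clusters: a simple random sample of 7 school IDs (1 to 54)
rng = random.Random(1)  # arbitrary seed, only for reproducibility
chosen_schools = sorted(rng.sample(range(1, n_schools + 1), n_selected))

print(round(p_selected, 2))
print(round(expected_sample_size))
print(chosen_schools)
```

If school sizes differ a lot, the achieved sample size can be far from the expected figure, which is one reason sampling in proportion to size is sometimes preferred.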

What are the sampling units?
In cluster sampling the primary sampling units are the clusters
Individuals that make up the clusters are secondary sampling units
For the standard 1 students example:
primary sampling units - schools
secondary sampling units - students

Multistage cluster sampling

The End.
