Lecture-3&4 - Measure of Centeral T

Introduction to probability

Uploaded by

Abebe
Chapter-Three

Measure of Central Tendency


Definition
Measures of central tendency
o Central tendency is the tendency of statistical data to concentrate around a particular value.
o A measure of central tendency is a single number that summarizes the characteristics of a distribution or data set. It gives insight into the typical or representative score of the data.
o The term also refers to the methods of determining the value around which the data tend to concentrate.

Why measure central tendency?
 To describe (locate) the center of the distribution
 To facilitate comparison
 To support further statistical analysis
Cont’d …
o When two or more groups are measured, the central tendency
provides the basis of comparison between them.

A good average should satisfy the following:
 It should be based on all observations
 It should not be affected by extreme values
 It should be as close as possible to the maximum number of values
 It should be rigidly defined (have a definite value)
Cont’d …
The most common measures of central tendency include:
1. Mean (Arithmetic, Weighted, Geometric, and Harmonic)
2. Median
3. Mode
4. Quantiles (Quartiles, deciles and percentiles)
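Population mean (N observations):
$$\mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
Sample mean (n observations):
$$\bar{X} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$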
Mean of a frequency distribution with k distinct values (or class marks) X_i and frequencies f_i:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
Example

Grouped Data
Special properties of A.M
Cont’d

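Combined mean of two groups:
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i} = \frac{40(350) + 60(380)}{40 + 60} = 368 \text{ Birr}$$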
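The combined-mean calculation can be checked with a short Python sketch. The function name `combined_mean` is introduced here only for illustration; the group sizes (40 and 60) and group means (350 and 380 Birr) are the figures used in the slide.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    weighted_total = sum(n * m for m, n in zip(means, sizes))
    return weighted_total / sum(sizes)

# Two groups of sizes 40 and 60 with means 350 and 380 Birr.
result = combined_mean([350, 380], [40, 60])
print(result)  # 368.0
```

The same function works for more than two groups, because each group mean is simply weighted by its group size.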
Merits and demerits of arithmetic mean
o The mean can be used as a summary measure for quantitative
data, but it is not appropriate for either nominal or ordinal data.
o For a given set of data, there is only one arithmetic mean
(uniqueness).
o Easy to calculate and understand (simple).
o Greatly affected by the extreme values.

o In the case of grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.
2. Weighted mean
o While calculating simple arithmetic mean, all items were
assumed to be of equal importance (each value in the data set
has equal weight).
o When the observations have different weight, we use weighted
average. Weights are assigned to each item in proportion to its
relative importance.
Con'd….
Con'd….
 Solution: We use a weighted mean, the weight associated with
each course being taken as the number of credits received for
the corresponding course.
Cont’d

Con'd….
Merits and Demerits of Arithmetic Mean
Merits:
 It is based on all observations
 It is suitable for further statistical analysis
 It is easy to calculate and simple to understand
Demerits:
 It is affected by extreme observations
 It cannot be used in the case of open-ended classes
 It cannot be used when dealing with qualitative
characteristics, such as intelligence, honesty, and beauty

By habtamu.A. 12/13/24
3. Geometric mean
Con'd….

Con'd….

Values 3 4 5 6
Freq. 2 3 1 2
Con'd….
4. Harmonic Mean
Con'd….
Activity: Discuss the advantages and disadvantages of A.M, G.M and H.M.
Exercise: The number of diarrhea episodes for 25 children is summarized in the following table.

Diarrhea episodes    No. of children
1                    3
2                    3
3                    f3
5                    2
6                    10
8                    f6

If the arithmetic mean is 4.8, then what are the values of f3 and f6?
 Median is as its name indicates the middle most value in the
arrangement which divides the data in to two equal parts

If n is odd:
$$\tilde{X} = X_{\left(\frac{n+1}{2}\right)}$$
If n is even:
$$\tilde{X} = \frac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right]$$
i.e.
When n = 11, the median is the 6th ordered observation.
When n = 12, the median is the 6.5th observation, which is the value halfway between the 6th and 7th ordered observations.
Example: For the same random sample, the ordered observations will be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5th observation, i.e. = (32+34)/2 = 33.
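As a minimal sketch, the odd/even rule for the median of raw data can be written in Python as follows. The helper name `median` and the variable `sample` are illustrative; the ten observations are the ones listed in the example.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n+1)/2)-th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the (n/2)-th and (n/2+1)-th

sample = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(median(sample))  # 33.0
```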
Median of Group Data

$$\tilde{X} = L_{me} + \frac{w}{f_m}\left(\frac{n}{2} - lcf_{bm}\right)$$
Lme = lower class boundary of the median class
w = width of the median class
fm = frequency of the median class
n = total number of observations
lcfbm = less-than cumulative frequency of the class before the median class
To determine the median class, take the class that contains the (n/2)th item, i.e. the first class whose less-than cumulative frequency (lcf) is greater than or equal to n/2.
Example: Find the median

Age in years   Number of people   Cumulative number of people
14.5-19.5      677                677
19.5-24.5      1908               2585
24.5-29.5      1737               4322
29.5-34.5      1040               5362
34.5-39.5      294                5656
39.5-44.5      91                 5747
44.5-49.5      16                 5763
Total          5763               -

Solution: To determine the median class, take the class that contains the
$$\left(\frac{n}{2}\right)^{th} = \left(\frac{5763}{2}\right)^{th} = 2881.5^{th} \text{ item}$$
The first lcf that is greater than or equal to 2881.5 is 4322. Hence, the median class is 24.5-29.5. Then,
Lme = 24.5, w = 5, n = 5763, fm = 1737, lcfbm = 2585
$$\tilde{X} = L_{me} + \frac{w}{f_m}\left(\frac{n}{2} - lcf_{bm}\right) = 24.5 + \frac{5}{1737}\left(\frac{5763}{2} - 2585\right) = 24.5 + 0.85 = 25.35$$
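The grouped-median formula can be sketched in Python as below. The function name `grouped_median` and its argument layout are assumptions made for this illustration; the class boundaries and frequencies are those of the age table in the example.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data: L_me + (w / f_m) * (n/2 - lcf_bm).

    boundaries: list of (lower, upper) class boundaries
    freqs: frequency of each class, in the same order
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2
    cum_before = 0  # less-than cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= half:  # first class whose lcf reaches n/2
            return lower + (upper - lower) / f * (half - cum_before)
        cum_before += f

boundaries = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
              (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
freqs = [677, 1908, 1737, 1040, 294, 91, 16]
print(round(grouped_median(boundaries, freqs), 2))  # 25.35
```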
THE MODE ( X̂ )
The mode or modal value is the value with the highest frequency in
the data set. The mode of a set of data or distribution can be:
No mode: In this case all values appear equal number of times
Unimodal: If the distribution has only one mode
Bimodal: If the distribution has two modes
Multi-modal: If the distribution has more than two modes

Example: The mode of the age distribution of males at the time of marriage, 23, 28, 28, 31, 32, 33, 34, 37, 41, 43, and 45, is 28, since it occurs twice while the other values occur only once.
Mode of Grouped Data
$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
Lmo = lower boundary of the modal class
Δ1 = fmo - f1 = difference between the frequency of the modal class and that of the class before it
Δ2 = fmo - f2 = difference between the frequency of the modal class and that of the class after it
w = class width
fmo = frequency of the modal class
f1 = frequency of the class preceding the modal class
f2 = frequency of the class succeeding the modal class
Modal class: the class with the highest frequency
Example: The following are the sizes (in millimeters) of incidental intracranial aneurysms (IIAs) of 30 patients.

IIAs     Frequency (f)
0-4      6
5-9      12
10-14    7
15-19    5
20-24    0
Total    n = 30

Since the maximum frequency is 12, the modal class is 5-9. Then,
Lmo = 4.5, w = 5, fmo = 12, f1 = 6, f2 = 7
Δ1 = fmo - f1 = 12 - 6 = 6
Δ2 = fmo - f2 = 12 - 7 = 5
$$\hat{X} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 4.5 + 5\left(\frac{6}{6 + 5}\right) = 4.5 + 2.73 = 7.23$$
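A minimal Python sketch of the grouped-mode formula, using the IIA frequency table from the example, is shown below. The function name `grouped_mode` and the choice to pass lower class boundaries as a list are assumptions made for illustration.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: L_mo + w * d1 / (d1 + d2).

    Assumes a single modal class that is strictly higher than at least one neighbour;
    otherwise d1 + d2 is zero and the formula is undefined.
    """
    i = max(range(len(freqs)), key=freqs.__getitem__)     # index of the modal class
    f_mo = freqs[i]
    f_prev = freqs[i - 1] if i > 0 else 0                 # class before the modal class
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0    # class after the modal class
    d1 = f_mo - f_prev
    d2 = f_mo - f_next
    return lower_boundaries[i] + width * d1 / (d1 + d2)

lower_boundaries = [-0.5, 4.5, 9.5, 14.5, 19.5]  # for classes 0-4, 5-9, 10-14, 15-19, 20-24
freqs = [6, 12, 7, 5, 0]
print(round(grouped_mode(lower_boundaries, freqs, 5), 2))  # 7.23
```

Note that the fraction 6/11 is multiplied by the class width 5 before being added to the lower boundary 4.5.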
Measures of position (quantiles)
 Quantiles are measures of position that divide a dataset into
equal intervals, each containing a specific proportion of the data.
 They help to describe the distribution of a dataset by identifying
values at specific points that divide the data into portions.
 The most commonly used quantiles are: quartile, decile and
percentile

Example: The following data show the ages of 30 sampled patients in JUSH: 6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25, 26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49

Find the lower, middle and upper quartiles for the above data.
Solution: n = 30
$$Q_1 = \frac{1}{4}(n + 1)^{th} \text{ value} = \frac{1}{4}(30 + 1)^{th} \text{ value} = 7.75^{th} \text{ value}$$
= 7th value + 0.75(8th value - 7th value)
= 18 + 0.75(21 - 18) = 18 + 2.25 = 20.25
This implies that one fourth (25%) of the patients are younger than 20.25 years.
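The same position rule, j(n + 1)/4 with linear interpolation between neighbouring ordered values, gives all three quartiles. The sketch below assumes that rule (other software may use slightly different quantile conventions); the helper names are illustrative and the data are the 30 ages from the example.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position, with linear interpolation."""
    k = int(position)      # whole part of the position
    frac = position - k    # fractional part of the position
    if k < 1:
        return ordered[0]
    if k >= len(ordered):
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])

def quartiles(values):
    """Q1, Q2, Q3 located at the j(n + 1)/4-th ordered values, j = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return [value_at_position(ordered, j * (n + 1) / 4) for j in (1, 2, 3)]

ages = [6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25,
        26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49]
print(quartiles(ages))  # [20.25, 25.5, 34.5]
```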
Quartile for grouped data
Deciles
Con'd….
Percentile
Con'd….
Con'd….

Income No. of person


100 - 200 15
100 - 300 33
100 - 400 63
100 - 500 83
100 - 600 100
Chapter four

Measures of Variation
Introduction
o Measures of central tendency locate the center of the distribution.
However, they do not tell how individual observations are
scattered on either side of the center. The spread of observations
around the center is known as dispersion or variability.
o In other words, the degree to which numerical data tends to
spread about an average value is called dispersion or variation of
the data.
o Measures of dispersions are statistical measures that provide
ways of measuring the extent to which data are dispersed or
spread out.
Significance of measure of dispersion
o To determine the reliability of an average: If the variation is small, the average closely represents the individual values and is highly representative. On the other hand, if the dispersion or variation is large, the average will be quite unreliable.
o To compare the variability of two or more groups: It is also
useful to determine the uniformity or consistency of two or more
groups. A high degree of variation would mean less consistency or
less uniformity as compared to the data having less variation.
o For facilitating the use of other statistical measures: Measures
of dispersion serve as the basis of many other statistical measures
such as correlation, regression, and testing of hypothesis.
Type of measure of dispersion
1. Absolute measures of variation: An absolute measure is expressed in the same unit as the original data, such as kilograms, tonnes, etc. These measures are suitable for comparing the variability of two distributions whose variables are expressed in the same units and whose averages are of about the same size.
2. Relative measures of variation: If the two sets of data are expressed in different units of measurement, the absolute measures of variation are not comparable. In such cases, measures of relative variation should be used.

Absolute measure          Relative measure
1. Range                  Relative range
2. Inter-quartile range   Coefficient of quartile deviation
3. Variance               Coefficient of variation
4. Standard deviation     Standard score
Range
o It is the difference between the largest and smallest
observations from the data.
o Example: Consider the data on the weight (in Kg) of 10
newborn children at Jimma Hospital within a month: 2.51,
3.01, 3.25, 2.02,1.98, 2.33, 2.33, 2.98, 2.88, 2.43
Solution: The range for the dataset can be computed by first
arranging all observations into ascending order as: 1.98, 2.02,
2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25

Range = Maximum – Minimum = 3.25-1.98 = 1.27


Quartile deviation and Coefficient of quartile deviation
Con'd….
The variance and standard deviation
Con'd….
Con'd….
Con'd….

Con'd….
Some important properties of variance and standard
deviation

Coefficient of variation (CV)
oWhen two data sets have different units of measurement, or their
means differ sufficiently in size, the CV should be used as a
measure of dispersion. It is used to assess the relative variability
of data.
oThe coefficient of variation is defined as the ratio of standard
deviation to the mean, usually expressed as a percent.
o A lower CV indicates less variability, i.e. greater consistency: the data are more tightly clustered around the mean.
o A higher CV indicates more variability relative to the mean: the data are more spread out.
o The CV measures variation relative to the mean and is usually presented as a percentage (%).
Con'd….
Example: Last semester, the students of the nursing and
anesthesia departments took Stat273 course. At the end of the
semester, the following information was recorded.

Department           Nursing   Anesthesia
Mean score           79        64
Standard deviation   23        11

Compare the relative dispersion of the two departments' scores.

Solution: Since the means of the two sets of data are very different, we use the coefficient of variation to compare variability.
Con'd….
 The coefficients of variation are calculated as
$$CV_{Nursing} = \frac{23}{79} \times 100\% \approx 29.1\%, \qquad CV_{Anesthesia} = \frac{11}{64} \times 100\% \approx 17.2\%$$
 Interpretation: Since the CV for Nursing students is greater than that for Anesthesia students, there is more variation relative to the mean in the distribution of Nursing students' scores than in that of Anesthesia students.
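The comparison can be reproduced with a few lines of Python. This is only a sketch: the function name `coefficient_of_variation` is illustrative, and the means and standard deviations are taken from the table in the example.

```python
def coefficient_of_variation(mean, std_dev):
    """CV = (standard deviation / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100

cv_nursing = coefficient_of_variation(mean=79, std_dev=23)
cv_anesthesia = coefficient_of_variation(mean=64, std_dev=11)

print(f"Nursing CV:    {cv_nursing:.1f}%")     # Nursing CV:    29.1%
print(f"Anesthesia CV: {cv_anesthesia:.1f}%")  # Anesthesia CV: 17.2%
```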
Standard score (Z-scores)
o Statistical measure that quantifies the number of standard
deviations a data point is away from the mean of a dataset.
o It tells us how many standard deviations a specific value is
above or below the mean value of the data set.
o It is obtained by subtracting the mean of the data set from the
value and dividing the result by the standard deviation of the
data set.
 A Z-score of 0 means the data point is exactly at the mean.
 A positive Z-score indicates the data point is above the mean
(greater than average).
 A negative Z-score indicates the data point is below the mean
(less than average).
Con'd….

Course               Average score   Std. dev
Int. to Statistics   51              12
Int. to Economics    72              16

Solution:
The Z-scores are (66 - 51)/12 = 1.25 for Int. to Statistics and (80 - 72)/16 = 0.50 for Int. to Economics.
Even though the student scored 66 in Int. to Statistics and 80 in Int. to Economics, the Z-scores tell us that the student performed better in Int. to Statistics than in Int. to Economics relative to the class.
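A short Python sketch of the standard-score calculation follows. The function name `z_score` is illustrative; the scores (66 and 80), class averages and standard deviations are those given in the solution and the table.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

z_statistics = z_score(66, mean=51, std_dev=12)
z_economics = z_score(80, mean=72, std_dev=16)

print(z_statistics)  # 1.25
print(z_economics)   # 0.5
```

The larger Z-score belongs to the course in which the student stands higher relative to classmates, even though the raw score there is lower.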
Measures of shape
o In statistics, measures of shape are used to describe how data are distributed, in particular the symmetry of the distribution and the presence of skewness or kurtosis.
o These measures help provide insights into how the data are spread or clustered around their central values.
The key tools used to measure the shape of a data distribution are:
I. Skewness
II. Kurtosis
o Skewness: The skewness of a distribution is defined as its lack of symmetry. It tells us the direction of variability from the center, not its size.
o A distribution is said to be skewed if one of its tails (either the left or the right) is longer or fatter than the other.
Con'd….
Based on the type of skewness, distributions can be:
a)Symmetrical distribution: It is neither positively nor negatively
skewed.
b)Negatively skewed distribution: occurs when the majority of
scores are at the right end of the curve and a few small scores are
scattered at the left end.
c)Positively skewed distribution: Occurs when the majority of
scores are at the left end of the curve and a few extremely large
scores are scattered at the right end.

Graphical representation

o In a symmetrical distribution, the three measures of central tendency (mean, median and mode) are approximately equal, and the mean divides the frequency distribution into two equal parts.
o If extremely low or extremely high observations are present in a distribution, the mean tends to shift towards those scores.
Some important measure of skewness
Kurtosis
o Kurtosis is the degree of peakedness or flatness of a distribution.
o It tells us how tall and sharp the central peak is relative to a standard bell curve, i.e. the degree of data concentration around the mean.
o When the curve of a distribution is flatter than the normal curve, it is called platykurtic.
o When the distribution is more peaked than the normal curve, it is called leptokurtic.
o The normal distribution, which is neither very peaked nor flat-topped, is called mesokurtic.

The diagram illustrates the shape of three different curves mentioned above
Measure of kurtosis

Interpretation of the value of α4:
If α4 > 3, the curve is leptokurtic
If α4 = 3, the curve is mesokurtic
If α4 < 3, the curve is platykurtic
Chapter Five

Elementary Probability
Introduction
o Every scientific experiment investigating patterns in natural phenomena may result in "events" that may or may not happen.
o Most events in real life are uncertain.
For example:
An increase in the gold price under a given economic condition in a country.
A drug curing cancer patients within a given period of time.
In such situations, knowledge of the chance or probability of occurrence of an event of interest is vital, so calculating that probability is essential.
Con'd….
The word probability has two basic meanings:
A quantitative measure of uncertainty, concerned with decision-making under risk and uncertainty or with the occurrence of uncertain events in the future.
A measure of the degree of belief in a particular statement or problem. Hypotheses are tested using probability, and predictions are based on probability.
o It is used to estimate the likelihood of certain events occurring.
Definitions of some probability terms
o Experiment: any process which generates a well-defined
outcome.
o Random Experiment: an experiment that can be repeated any number of times under the same conditions, but whose outcomes are uncertain and do not give a unique result.
o A random experiment has three important components:
multiplicity of outcomes,
uncertainty regarding the outcomes, and
repeatability of the experiment under identical conditions
 Example: Toss a coin many times
There are two possible outcomes (H or T) → multiplicity
In each toss there is no certainty of getting H or T → uncertainty
Tossing many times (in the same fashion) → repeatability
Con'd….
o Outcome: The result of a single trial of a random experiment.
Example: experiment outcome
Tossing a fair coin: head, tail
Rolling a fair die: 1, 2, 3, 4, 5, 6.
o Sample space: The set of all possible outcomes of a random
experiment is called sample space and is usually denoted by S
(or Ω).
o Event: A subset of the sample space is called an event and
denoted by upper case letter.
o Equally Likely Events: Events which have the same chance of
occurring.
o Elementary event: An event consisting of a single outcome.
o Impossible event: An event that can’t occur.
Con'd….
Con'd….
Fundamental principle of counting
o To calculate probabilities, we have to know:
The number of elements of an event
The number of elements of the sample space
o To determine the number of outcomes, one can use several rules of counting:
 Permutation rule
 Combination rule
1. Permutation
Con'd….

Examples:
1. Suppose we have the letters A, B, C, and D.
a. How many permutations are there taking all four?
b. How many permutations are there if two letters are used at a time?
2. How many different permutations can be made from the letters in the word “CORRECTION”?
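These counts can be checked with Python's standard library. The sketch below assumes Python 3.8 or later (for `math.perm`); the helper `distinct_arrangements` is an illustrative name for the usual rule n!/(n1! n2! ...) for words with repeated letters.

```python
from collections import Counter
from math import factorial, perm

def distinct_arrangements(word):
    """Permutations of all letters of word, dividing out repeated letters."""
    count = factorial(len(word))
    for repeats in Counter(word).values():
        count //= factorial(repeats)
    return count

print(perm(4))      # 24  -> arrangements of A, B, C, D taking all four
print(perm(4, 2))   # 12  -> arrangements taking two letters at a time
print(distinct_arrangements("CORRECTION"))  # 453600 -> 10! / (2! 2! 2!)
```

In “CORRECTION” the letters C, O and R each appear twice, which is why 10! is divided by 2! three times.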
2. Combination
Con'd….

Activity
1. A question paper contains section A with 5 questions and
section B with 7 questions. A student is required to attempt 8
questions in all, selecting at least 3 from each section. In how
many ways can a student select the questions?
2. A class contains 12 boys and 10 girls. From the class, 10 students are to be chosen for a competition under the condition that at least 4 boys and at least 4 girls must be represented. In how many ways can the selection be made?
Approaches to measuring probability

 There are three different approaches to define the probability.


1. The classical approach
2. The frequentist approach
3. Axiomatic approach
1. The classical approach

Con'd….

Example: A fair die is tossed once. What is the probability of getting
a. Number 6?
b. An odd number?
c. An even number?
d. Number 8?
Con'd….
Solution: First identify the sample space,
S = {1, 2, 3, 4, 5, 6}; N = n(S) = 6
a. Let A be the event of number 6; A = {6}
NA = n(A) = 1; P(A) = n(A)/n(S) = 1/6
b. Let A be the event of odd numbers; A = {1, 3, 5}
NA = n(A) = 3; P(A) = n(A)/n(S) = 3/6 = 0.5
c. Let A be the event of even numbers; A = {2, 4, 6}
NA = n(A) = 3; P(A) = n(A)/n(S) = 3/6 = 0.5
d. Let A be the event of number 8; A = { } (an impossible event)
NA = n(A) = 0; P(A) = n(A)/n(S) = 0/6 = 0
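The classical rule P(A) = n(A)/n(S) can be illustrated with a small Python sketch for the die example. The function name `classical_probability` is illustrative, and exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of one toss of a fair die

def classical_probability(event, space=sample_space):
    """P(A) = n(A) / n(S), counting only outcomes that belong to the sample space."""
    favourable = event & space
    return Fraction(len(favourable), len(space))

print(classical_probability({6}))        # 1/6
print(classical_probability({1, 3, 5}))  # 1/2
print(classical_probability({2, 4, 6}))  # 1/2
print(classical_probability({8}))        # 0
```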
Con'd….
Example: A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that
a. All will be defective
b. 6 will be non-defective
c. All will be non-defective
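One way to set up this example, assuming the 10 candles are drawn without replacement, is to count combinations: choose k of the 50 non-defective candles and 10 - k of the 30 defective ones, out of all ways to choose 10 from 80. The function name `prob_non_defective` is illustrative.

```python
from math import comb

DEFECTIVE = 30
NON_DEFECTIVE = 50
DRAWN = 10

def prob_non_defective(k):
    """P(exactly k non-defective candles among the 10 drawn without replacement)."""
    favourable = comb(NON_DEFECTIVE, k) * comb(DEFECTIVE, DRAWN - k)
    return favourable / comb(DEFECTIVE + NON_DEFECTIVE, DRAWN)

print(prob_non_defective(0))   # a. all 10 defective
print(prob_non_defective(6))   # b. 6 non-defective (and 4 defective)
print(prob_non_defective(10))  # c. all 10 non-defective
```

The probabilities for k = 0, 1, ..., 10 add up to 1, which is a quick sanity check on the counting.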
Activity

1. A fair die is tossed twice. What is the probability that
A. The sum of the two outcomes is an even number
B. The sum of the two outcomes is greater than 10
C. The sum of the two outcomes is equal to 12
D. The sum of the two outcomes is equal to 15
2. A class has 12 boys and 4 girls. Suppose 3 students are selected at random from the class. Find the probability that all are boys.
3. There are 5 defective items in a sample of 30 items. Find the probability that an item chosen at random from the sample is (i) defective (ii) non-defective
2. The frequentist approach
Con'd….
Con'd….
3. Axioms of Probability
Basic Theorem of probability

Conditional Probability
Activity
Chapter Six

Random variable and Probability distribution
Definitions of random variable

o Variable: any characteristic or attribute that assumes different values and can be measured or counted.
o Random variable (r.v): a variable whose values are determined by chance. A random variable is a numerical description of the outcomes of an experiment, i.e. a numerically valued function defined on the sample space, usually denoted by a capital letter.
o If X is a random variable, then it is a function from the elements of the sample space to the set of real numbers, i.e. X is a function X: S → R.
Con'd….
 There are two types of random variable

1. Discrete random variables: A random variable is said to be discrete if it takes only a finite or countably infinite number of values.
2. Continuous random variables: A random variable X is said to be continuous if it takes values in an interval.
Example:
Probability distribution of discrete random variable

Probability density function of random variable
Expectation of random variable
Variance of random variable
What is the expected proportion of surveys returned in any given
quarter?
Solution: by definition
Common discrete probability distribution

2. Poisson Distribution
The Poisson distribution describes the number of events that occur in an interval of time when the events occur at a constant rate. It provides a model for the relative frequency of the number of "rare events" that occur in a unit of time, area, volume, etc.

Examples of events whose relative frequency distribution can be modeled by a Poisson probability distribution are:
The number of new jobs submitted to a computer in any one minute,
The number of fatal accidents per month in a manufacturing plant,
The number of bacteria per small volume of fluid.
Con'd….
Con'd….
Exercise:
1. On average, five smokers pass a certain street corner every ten minutes. What is the probability that during a given 10 minutes the number of smokers passing will be
a. 6 or fewer
b. 7 or more
c. Exactly 8
2. The average number of traffic accidents per week in a small city is 3.
a. What is the probability that there will not be any accidents in the next one week?
b. What is the probability that there will be an accident within the next 2 weeks?
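Exercises of this kind can be checked numerically with the standard Poisson probability function P(X = k) = e^(-λ) λ^k / k!. The sketch below is an illustration under that formula; the helper names `poisson_pmf` and `poisson_cdf` are introduced here, and λ = 5 per ten minutes and λ = 3 per week are the rates stated in the exercise.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k), obtained by summing the probabilities of 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam_smokers = 5  # smokers per ten minutes
print(round(poisson_cdf(6, lam_smokers), 4))      # 0.7622 -> 6 or fewer
print(round(1 - poisson_cdf(6, lam_smokers), 4))  # 0.2378 -> 7 or more
print(round(poisson_pmf(8, lam_smokers), 4))      # 0.0653 -> exactly 8

lam_accidents = 3  # accidents per week
print(round(poisson_pmf(0, lam_accidents), 4))    # 0.0498 -> no accidents in one week
```

For a two-week period the rate doubles to λ = 6, since the Poisson mean scales with the length of the interval.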
Common continuous probability distribution
Properties of the normal distribution curve
 The areas under the curve that lie within one, two and three standard deviations of the mean are approximately 0.68 (68%), 0.95 (95%), and 0.997 (99.7%) respectively.
 Graphically, it can be shown as:
2. Standard normal probability distribution
Con'd….
Con'd….
Con'd….
Con'd….

Example: Find the area under the standard normal distribution which lies
A. Between Z = 0 and Z = 0.96
B. Between Z = -1.45 and Z = 0
C. To the left of Z = -0.35
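Instead of a printed Z-table, these areas can be computed from the standard normal cumulative distribution function, which can be written with the error function as Φ(z) = ½[1 + erf(z/√2)]. The function name `phi` below is an illustrative choice.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(0.96) - phi(0), 4))   # 0.3315 -> between Z = 0 and Z = 0.96
print(round(phi(0) - phi(-1.45), 4))  # 0.4265 -> between Z = -1.45 and Z = 0
print(round(phi(-0.35), 4))           # 0.3632 -> to the left of Z = -0.35
```

For a normal variable X with mean μ and standard deviation σ, the same function applies after standardizing: P(X ≤ x) = Φ((x - μ)/σ).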
Con'd….
Con'd….
Exercise: A random variable X has a normal distribution with a mean of 80 and a standard deviation of 4.8.
What is the probability that it will take a value
A. Less than 87.2
B. Greater than 76.4
C. Between 81.2 and 86.0
Chapter Seven

Sampling and Sampling Distribution


Introduction
Definitions of Some Basic Terms in Sampling
o Population: is the complete set of possible measurements for
which inferences are to be made. Population represents the target
of an investigation, and the objective of the investigation is to
draw conclusions about the population.
o Census: a complete enumeration of the population. In most real problems a census is not feasible, so we take a sample.
o Sample: Elements taken from the population under consideration.
o Sample survey: A study that asks questions of a sample drawn from some population.
o Parameter: Characteristic or measured value obtained from a population.
o Statistic: Characteristic or measured value obtained from a sample.
Con'd….
o Sampling: is the process of selecting a subset of individuals or
items from a larger population to make inferences about the
population as a whole.
o To draw valid conclusions from your results, you have to
carefully decide how you will select a sample that is
representative of the group as a whole.
o The selection of a sampling method depends on several factors
such as the research objective, the size of the population, the
resources available, and the level of precision required.
Con'd….
o Sampling unit: An element or a group of elements on which
observations can be taken during sampling is called a
sampling unit.
o Examples:
 If somebody studies the socio-economic status of households, the household is the sampling unit.
 If one studies the performance of freshman students in some college, the student is the sampling unit.
o Sampling frame: is the list of all elements in a population.
 List of households.
 List of students in the registrar office.
Con'd….
o Errors in sample survey: When we take a sample, our results
will not exactly equal the results for the whole population. That
is, our results will be subject to errors.
There are two types of errors
A. Sampling errors: the error that results from using the sample to
estimate information regarding the population.
o It is the discrepancy between sample statistic and population
parameter.
o It may also arise when inappropriate sampling techniques are applied.
o Sampling error can be reduced by increasing the sample size.
Con'd….

Advantages of sampling over complete enumeration:
 Reduced cost
 Greater speed (urgent information required)
 Greater accuracy
 Organization of work, or increased feasibility of the study
 More detailed information can be obtained
Sampling Technique
The technique of selecting a sample is important in sampling and
usually, it depends upon the nature of the investigation.
o There are two types of sampling.
1. Random sampling or probability sampling
o Probability sampling is a technique in which every unit in the population has a chance of being selected in the sample, and this chance can be accurately determined.
 Simple random sampling
 Stratified random sampling
 Cluster sampling
 Systematic sampling

Simple Random Sampling
o It is a method of selecting items from a population such that every possible sample of a specific size has an equal chance of being selected. The sample can be drawn in two possible ways:
o Without replacement: a unit, once chosen, is not placed back in the population.
o With replacement: each chosen unit is placed back in the population before the next draw.
o Simple random sampling involves randomly selecting respondents from a sampling frame; with large sampling frames, a table of random numbers or a computerized random number generator is usually used.
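As a minimal sketch of a computerized draw, Python's `random` module can select a simple random sample both ways. The sampling frame of 100 numbered units, the sample size of 10 and the seed are hypothetical values chosen only for illustration.

```python
import random

frame = list(range(1, 101))  # hypothetical sampling frame: units numbered 1 to 100
rng = random.Random(2024)    # fixed seed so the draw can be reproduced

# Without replacement: no unit can appear twice in the sample.
without_replacement = rng.sample(frame, k=10)

# With replacement: a unit may be drawn more than once.
with_replacement = rng.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```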
Stratified Random Sampling
o When the population is heterogeneous with respect to the study variable, simple random sampling is not desirable. In such cases, stratified random sampling is appropriate.
o The population is first divided into homogeneous groups called strata, and a simple random sample is then taken from each stratum.
o Elements within the same stratum should be more or less homogeneous, while elements in different strata should differ.
o The number of units to be selected from each stratum can be determined by one of the following allocation methods.
o Proportional allocation: the same sampling fraction is used for each stratum.
o Some of the criteria for dividing a population into strata are: Sex (male, female); Age (under 18, 18 to 28, and 29 to 39).
Con'd….
Example: Suppose we want to find the average height of the students in a school with classes 1 to 12. Height varies a lot, since students in class 1 are around 6 years old while students in class 10 are around 16 years old. So one can divide all the students into different subpopulations or strata such as
 Students of class 1, 2, and 3: Stratum 1
 Students of class 4, 5, and 6: Stratum 2
 Students of class 7, 8, and 9: Stratum 3
 Students of class 10, 11, and 12: Stratum 4
o Now draw samples by SRS from each of the strata 1, 2, 3, and 4. All the drawn samples combined constitute the final stratified sample for further analysis.
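The procedure can be sketched in Python with proportional allocation, where each stratum contributes the same fraction of its units. Everything numeric here is hypothetical: the stratum sizes (120, 90, 60, 30), the student labels and the total sample size of 30 are made up for illustration, and `stratified_sample` is an illustrative name.

```python
import random

def stratified_sample(strata, total_n, seed=0):
    """Draw an SRS without replacement from each stratum, proportional to its size.

    Rounding n_h to whole units means the stratum sample sizes may not always
    add up to exactly total_n.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    chosen = {}
    for name, units in strata.items():
        n_h = round(total_n * len(units) / population_size)  # same sampling fraction
        chosen[name] = rng.sample(units, n_h)
    return chosen

# Hypothetical school: student labels grouped into the four strata.
strata = {
    "Stratum 1": [f"S1-{i}" for i in range(120)],
    "Stratum 2": [f"S2-{i}" for i in range(90)],
    "Stratum 3": [f"S3-{i}" for i in range(60)],
    "Stratum 4": [f"S4-{i}" for i in range(30)],
}
sample = stratified_sample(strata, total_n=30)
print({name: len(units) for name, units in sample.items()})
# {'Stratum 1': 12, 'Stratum 2': 9, 'Stratum 3': 6, 'Stratum 4': 3}
```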
Cluster Sampling
o If you have a population dispersed over a wide geographic
region, it may not be feasible to conduct a simple random
sampling of the entire population. In such case, it may be
reasonable to divide the population into clusters (usually along
geographic boundaries).
o A simple random sample of groups or clusters of elements is
chosen and all the sampling units in the selected clusters will
be surveyed.
o Clusters are formed in a way that elements within a cluster are
heterogeneous, i.e. observations in each cluster should be more
or less dissimilar.
o Cluster sampling is useful when it is difficult or costly to
generate a simple random sample.
Con'd….

o In cluster sampling, we follow these steps:
1. Divide the population into clusters (usually along geographic boundaries).
2. Randomly sample clusters.
3. Measure all units within the sampled clusters.
o Cluster sampling is preferred when
1. No reliable listing of elements is available and it is expensive to prepare one.
2. Even if the list of elements is available, the location or identification of the units may be difficult.
Con'd….
o For example, to estimate the average annual household income in a large city we may use cluster sampling, because simple random sampling would require a complete list of households in the city from which to sample.
o Stratified random sampling would also require the list of households. A less expensive way is to let each block within the city represent a cluster.
o A sample of clusters could then be randomly selected, and
every household within these clusters could be interviewed to
find the average annual household income.
Systematic Sampling
Con'd….
2. Non-random sampling

o It is the sampling technique in which some units of the population have zero chance of selection, or the probability of selection cannot be accurately determined.
o Typically, samples are selected based on personal judgment or on criteria other than random chance, such as quota, convenience, personal choice, or the interest of the researcher.
o Non-probability sampling does not allow the estimation of sampling errors and may be subject to sampling bias.
Con'd….
o There are different types of non-random sampling methods, among them:
 Judgment sampling.
 Convenience sampling
 Quota Sampling.
Judgment Sampling
o In this case, the person taking the sample has direct or
indirect control over which items are selected for the sample.
It is subjective and can be influenced by researcher bias.
o This approach is used when a sample is taken based on
certain judgments about the overall population.

o Example: Consider a marketing expert analyzing consumer preferences for a new product. Using their expertise, they select specific focus groups based on age, income, and shopping behavior.
Convenience Sampling
o In this method, the decision maker selects participants based on their accessibility and availability to the researcher. Rather than being drawn at random from a bigger population, participants are picked because they are easily available to the researcher.
o For example, television reporters often conduct street interviews to find out how people view an issue.
o A researcher may select participants in a shopping street to learn about customer satisfaction and the service quality of a new product or service.
Quota sampling
o The population is divided into mutually exclusive subgroups
(just as in stratified sampling), and then a non-random set of
observations is chosen from each subgroup to meet a
predefined quota.
o In this method, the decision maker requires the sample to
contain a certain number of items with a given characteristic.
o Example 1: Suppose a cigarette company wants to find out
what age group prefers what brand of cigarettes in a particular
city. They apply survey quota to the age groups of 21–30, 31–
40, 41–50, and 51+.
o Many political polls are, in part, quota-sampling.
Sampling Distribution of the sample mean
o A statistic, such as the sample mean, sample proportion or sample standard deviation, is a number computed from the sample. Since a sample is random, every statistic is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty.
o A sampling distribution is simply a type of probability
distribution. Unlike the distributions studied so far, a sampling
distribution refers not to individual observations but to the values
of statistic computed from those observations, in sample after
sample.
o A sampling distribution shows how a statistic would vary in
repeated data production.
Con'd….
Con'd….
Con'd….
Con'd….
o Step 1: Draw all possible samples of size 2 (with replacement):

       6         8         10         12         14
6      (6,6)     (6,8)     (6,10)     (6,12)     (6,14)
8      (8,6)     (8,8)     (8,10)     (8,12)     (8,14)
10     (10,6)    (10,8)    (10,10)    (10,12)    (10,14)
12     (12,6)    (12,8)    (12,10)    (12,12)    (12,14)
14     (14,6)    (14,8)    (14,10)    (14,12)    (14,14)

Con'd….
o Step 2: Calculate the mean for each sample:

       6     8     10    12    14
6      6     7     8     9     10
8      7     8     9     10    11
10     8     9     10    11    12
12     9     10    11    12    13
14     10    11    12    13    14
Con'd….
o Step 3: Summarize the mean obtained in step 2 in terms of
frequency distribution.
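The three steps can be reproduced with a short Python sketch. It assumes the population consists of the five values shown in the table headings (6, 8, 10, 12, 14) and that samples of size 2 are drawn with replacement; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

population = [6, 8, 10, 12, 14]

# Step 1: all possible samples of size 2 drawn with replacement (5 x 5 = 25).
samples = list(product(population, repeat=2))

# Step 2: the mean of each sample.
means = [(a + b) / 2 for a, b in samples]

# Step 3: frequency (and probability) distribution of the sample mean.
freq = Counter(means)
for m in sorted(freq):
    print(m, freq[m], freq[m] / len(samples))

mean_of_means = sum(means) / len(means)
variance_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
print(mean_of_means)      # 10.0 (equal to the population mean)
print(variance_of_means)  # 4.0  (population variance 8 divided by sample size 2)
```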
Con'd….

o Exercise: A rowing team consists of four rowers who weigh 152, 156, 160, and 164 pounds.
 Find all possible samples of size two drawn with replacement.
 Compute the sample mean for each one.
 Find the probability distribution, the mean, and the standard deviation of the sample mean.
Central Limit Theorem
Con'd….

o In conclusion, the central limit theorem states that if you take sufficiently large samples from a population, the sample mean will be approximately normally distributed, even if the population isn't normally distributed, provided that:
a. The sample size is sufficiently large.
b. The samples are independent and identically distributed (iid).
c. The population's distribution has finite variance.
