Unit-5 Biostatistics Descriptive

Unit 5: The Sampling Distribution and the Central Limit Theorem

Objectives:
1. Define population and sample
2. Discuss Sampling distribution & Central limit Theorem
3. Discuss Percentiles & Normal distribution
4. Explain the relationship between percentile and Normal distribution

Population:
A population in statistics refers to the entire set of individuals or items that the researcher is
interested in studying. For example, if a researcher is studying the income of households in Karachi, the
population would be all households in the city, and their corresponding incomes would be the
measurements of interest. Because it is often impractical to gather data from every single unit in the
population, researchers typically select a sample (a smaller, representative subset of the population)
and use the sample data to estimate population characteristics, such as the average income. These
estimates rely on statistical principles like the sampling distribution and the Central Limit Theorem,
which help researchers draw conclusions about the population based on sample data.

Sample:
A sample is a subset of a population that is selected for study, and it is used to make inferences or
estimates about the entire population. When researchers want to estimate certain characteristics (such
as the average income) of a population, they typically select a random sample. A random sample is
chosen in such a way that every member of the population has an equal chance of being selected,
which helps ensure that the sample is representative of the population. This reduces the risk of
bias and increases the likelihood that the sample's characteristics reflect the true characteristics of the
population, allowing researchers to make sound generalizations.

Sampling Distribution
A sampling distribution is the probability distribution of a statistic obtained from repeated random
samples of a given size drawn from a population. Also known as a finite-sample distribution, it shows
how the value of the statistic varies from one sample to the next.
The sampling distribution depends on several factors: the statistic, the sample size, the sampling process,
and the underlying population. It can be described for any sample statistic, such as the mean, range,
variance, or standard deviation.

How a sampling distribution works:

• Select a random sample of a specific size from a given population.
• Calculate a statistic for the sample, such as the mean, median, or standard deviation.
• Repeat the first two steps many times, each time with a new random sample of the same size.
• Develop a frequency distribution of the sample statistics calculated in the steps above.
• Plot the frequency distribution. The resulting graph is the sampling distribution.
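
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (the integers 1 to 100), the sample size, the number of repetitions, and the helper name sampling_distribution are all hypothetical and do not come from the unit.

```python
import random
import statistics
from collections import Counter


def sampling_distribution(population, sample_size, num_samples,
                          statistic=statistics.mean, seed=0):
    """Draw repeated random samples and return the statistic of each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [statistic(rng.sample(population, sample_size))
            for _ in range(num_samples)]


# Hypothetical population, used for illustration only.
population = list(range(1, 101))
means = sampling_distribution(population, sample_size=10, num_samples=2000)

# Frequency distribution of the sample means (rounded to whole numbers),
# printed as a crude text histogram.
frequencies = Counter(round(m) for m in means)
for value in sorted(frequencies):
    print(f"{value:>3} {'#' * (frequencies[value] // 10)}")
```

The printed histogram is the (approximate) sampling distribution of the mean for this made-up population; passing statistics.median or statistics.stdev as the statistic argument gives the sampling distribution of a different statistic.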

Types of Sampling Distribution

1. Sampling distribution of mean


If you calculate the mean of every sample drawn from the population and plot all of those means, the
graph will be approximately normal when the sample size is large enough. Its center will be the mean
of the sampling distribution, which equals the mean of the entire population.

2. Sampling distribution of proportion


It gives you information about proportions in a population. You select samples from the
population and compute the sample proportion for each one. The mean of all the sample proportions
equals the proportion in the entire population.

3. T-distribution
The t-distribution is used when the sample size is small or the population standard deviation is unknown. It
is used to estimate the population mean, build confidence intervals, test statistical differences, and in linear
regression. (The t-distribution is not needed in this unit.)

Practical Example
Suppose you want to find the average height of children at the age of 10 from each continent. You
take random samples of 100 children from each continent, and you compute the mean for each sample
group.
For example, in South America, you randomly select data about the heights of 10-year-old children, and
you calculate the mean for 100 of the children. You also randomly select data from North America and
calculate the mean height for one hundred 10-year-old children.
As you continue to find the average heights for each sample group of children from each continent, you
can calculate the mean of the sampling distribution by finding the mean of all the average heights of
each sample group. Not only can it be computed for the mean, but it can also be calculated for other
statistics such as standard deviation and variance.

Importance of Using a Sampling Distribution


Since populations are typically large, it is usually only practical to study a randomly selected
subset of the population. The sampling distribution describes how much a statistic varies from sample to
sample, which helps you account for that variability when doing research or gathering statistical data.
It also makes the data easier to manage and builds a foundation for statistical inference, that is,
drawing conclusions about the whole population from a sample. Understanding statistical inference is important
because it helps individuals understand the spread of outcomes and how likely various outcomes are
within a dataset.

Population & sample measures


In statistics, the population and the sample are summarized by different types of descriptive
measures:
A parameter is a descriptive characteristic that summarizes the entire population. It is a fixed value that
typically represents some aspect of the population, such as the population mean (μ) or
population variance (σ²). For example, the true average income of all households in Karachi would be a
population mean parameter.

A statistic, on the other hand, is a descriptive characteristic that summarizes the data from a sample. It
is an estimate of the corresponding population parameter. For example, if a researcher takes a random
sample of households in Karachi and calculates the average income of the sampled households, this
sample mean is a statistic. The sample mean serves as an estimate of the population mean, but it will
likely differ from the true population mean due to sampling variability.

So, parameters describe populations, while statistics describe samples, and sample statistics (like
the sample mean) are used to estimate population parameters (like the population mean).

Measure               Population parameter    Sample statistic (estimate)
Mean                  μ                       x̄
Standard deviation    σ                       s

Sampling distribution of the mean

The sampling distribution of the mean refers to the probability distribution of the sample mean
calculated from multiple random samples taken from the population. Because no two samples are
exactly the same, each sample will likely have a slightly different sample mean. Therefore, the sample
mean itself is a random variable, and like all random variables, it has its own distribution.
Key points about the sampling distribution of the mean:

1. Variability of the Sample Mean: Each sample will produce a different sample mean, meaning that
the sample mean will vary from sample to sample. This variability is due to the randomness of the
sampling process.
2. Random Variable: Since the sample mean is a random variable, it has a distribution that can be
described. This distribution is known as the sampling distribution of the sample mean.
3. Shape of the Distribution: According to the Central Limit Theorem (CLT), if the sample size n is
sufficiently large, the sampling distribution of the sample mean will approximate a normal
distribution, regardless of the shape of the population distribution (as long as the population has a
finite variance). If the sample size is large enough (typically n ≥ 30), we can assume
that the sample means follow a normal distribution.
4. Properties of the Normal Distribution: If the sampling distribution of the sample mean is normal,
we can use the properties of the normal distribution to compute probabilities. For instance, we can
calculate the likelihood that the sample mean will fall within a certain range or estimate how far
the sample mean is likely to differ from the population mean.
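
As a rough sketch of point 4, the probability that a sample mean falls within a range can be computed with Python's standard library once the sampling distribution is treated as normal. The numbers below (population mean 100, standard deviation 15, samples of size 36) and the function name are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low <= sample mean <= high) under the CLT normal approximation."""
    # The sample mean is treated as normal with mean mu and SD sigma / sqrt(n).
    sampling_dist = NormalDist(mu, sigma / sqrt(n))
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Hypothetical values, for illustration only.
p = prob_sample_mean_between(mu=100, sigma=15, n=36, low=95, high=105)
print(round(p, 4))
```

Here the standard deviation of the sample mean is 15 / √36 = 2.5, so the range 95 to 105 is two such standard deviations either side of the mean, and the probability comes out at about 0.95.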

Properties of Sampling Distribution

1. The Mean of the Sample Means:


The mean of the sampling distribution of the sample mean will be equal to the population mean. This
means that if we repeatedly take random samples from the population and compute their means, the
average of all those sample means will be the same as the population mean μ.
Mathematically, this is expressed as: μ_x̄ = μ
where:
μ_x̄ is the mean of the sampling distribution of the sample mean,
μ is the population mean.

Interpretation: On average, the sample mean is an unbiased estimator of the population mean,
meaning that repeated sampling will not systematically overestimate or underestimate the true
population mean.

2. The Standard Deviation of the Sample Means (Standard Error):


The standard deviation of the sampling distribution of the sample mean (often referred to as the
standard error or SE) will be smaller than the standard deviation of the population whenever the
sample size is greater than 1. This is because a
sample mean tends to be less variable than individual data points in the population.
The formula for the standard error of the sample mean is: SE = σ / √n
where:
σ is the population standard deviation,
n is the sample size.

Interpretation: As the sample size n increases, the standard error decreases, meaning that the
sample mean becomes a more precise estimate of the population mean. This is why larger samples tend
to yield more reliable estimates of the population parameters.

So:
The mean of the sampling distribution of the sample mean is equal to the population mean.
The standard deviation of the sampling distribution (called the standard error of the mean) is smaller than the
population standard deviation and is given by σ_x̄ = σ / √n. The larger the sample size n, the smaller the
standard error, meaning more precise estimates of the population mean.
These properties allow us to understand the behavior of sample means and to make statistical
inferences about the population, such as estimating the population mean and determining how likely it
is that a sample mean falls within a certain range.
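
To see how the standard error shrinks as n grows, here is a minimal sketch of SE = σ / √n. The population standard deviation of 10 and the sample sizes are hypothetical values chosen so the arithmetic is easy to follow.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / sqrt(n)


sigma = 10.0  # hypothetical population standard deviation
for n in (4, 25, 100, 400):
    print(f"n = {n:>3}  SE = {standard_error(sigma, n)}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error (for example, from 1.0 at n = 100 to 0.5 at n = 400).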

Example 1

The sample mean, x̄, is to be calculated from a random sample of size 2 taken from a population
consisting of the five values ($2, $3, $4, $5, $6).
Find the sampling distribution of x̄ based on a sample of size 2. First note that the population mean (µ)
is: (2 + 3 + 4 + 5 + 6) / 5 = $4

Solution:
1. Population Information:
The population consists of the following values:
2, 3, 4, 5, 6
To calculate the population mean (μ):
μ = (2 + 3 + 4 + 5 + 6) / 5 = 20 / 5 = 4
So, the population mean is μ = 4.

2. Possible Samples of Size 2:


Since the sample size is 2, we need to list all possible combinations of 2 values drawn from the
population (without replacement, ignoring order). The number of ways to select 2 items from 5 is given
by the combination formula:
Number of combinations = C(5, 2) = (5 × 4) / (2 × 1) = 10
So, there are 10 possible samples of size 2. These are the following pairs:
(2,3)
(2,4)
(2,5)
(2,6)
(3,4)
(3,5)
(3,6)
(4,5)
(4,6)
(5,6)
3. Calculate the Sample Means:
Now, let's calculate the sample mean (x̄) for each of these possible samples:
Sample (2,3): x̄ = (2 + 3) / 2 = 5/2 = 2.5
Sample (2,4): x̄ = (2 + 4) / 2 = 6/2 = 3
Sample (2,5): x̄ = (2 + 5) / 2 = 7/2 = 3.5
Sample (2,6): x̄ = (2 + 6) / 2 = 8/2 = 4
Sample (3,4): x̄ = (3 + 4) / 2 = 7/2 = 3.5
Sample (3,5): x̄ = (3 + 5) / 2 = 8/2 = 4
Sample (3,6): x̄ = (3 + 6) / 2 = 9/2 = 4.5
Sample (4,5): x̄ = (4 + 5) / 2 = 9/2 = 4.5
Sample (4,6): x̄ = (4 + 6) / 2 = 10/2 = 5
Sample (5,6): x̄ = (5 + 6) / 2 = 11/2 = 5.5

4. Sampling Distribution of the Mean:


Now, we have the following sample means:
2.5,3,3.5,4,3.5,4,4.5,4.5,5,5.5

5. Frequency of Each Sample Mean:


Let's summarize the frequency of each unique sample mean:
2.5: 1 time
3: 1 time
3.5: 2 times
4: 2 times
4.5: 2 times
5: 1 time
5.5: 1 time

Thus, the sampling distribution of the sample mean is:


Sample Mean (x̄)    Frequency
2.5                 1
3                   1
3.5                 2
4                   2
4.5                 2
5                   1
5.5                 1

6. Properties of the Sampling Distribution:


Mean of the Sampling Distribution: The mean of the sample means (which is the sampling distribution
mean) should equal the population mean. Let's verify this:

μ_x̄ = [(2.5×1) + (3×1) + (3.5×2) + (4×2) + (4.5×2) + (5×1) + (5.5×1)] / 10
μ_x̄ = (2.5 + 3 + 7 + 8 + 9 + 5 + 5.5) / 10
μ_x̄ = 40 / 10 = 4
Thus, the mean of the sampling distribution is μ_x̄ = 4.

7. Variance of the Sampling Distribution:


The variance of the sampling distribution of the sample mean is calculated using the formula:
σ_x̄² = ∑ [(Sample Mean − μ_x̄)² × Frequency] / Total number of samples
Now, we calculate the squared differences between each sample mean and the mean of the sampling
distribution (μ_x̄ = 4):
σ_x̄² = [(2.5−4)²×1 + (3−4)²×1 + (3.5−4)²×2 + (4−4)²×2 + (4.5−4)²×2 + (5−4)²×1 + (5.5−4)²×1] / 10
σ_x̄² = [(−1.5)²×1 + (−1)²×1 + (−0.5)²×2 + (0)²×2 + (0.5)²×2 + (1)²×1 + (1.5)²×1] / 10
σ_x̄² = (2.25 + 1 + 0.5 + 0 + 0.5 + 1 + 2.25) / 10
σ_x̄² = 7.5 / 10 = 0.75
Thus, the variance of the sampling distribution is σ_x̄² = 0.75.

8. Standard Deviation of the Sampling Distribution:


The standard deviation of the sampling distribution is the square root of the variance:
σ_x̄ = √0.75 ≈ 0.866

Summary of the Sampling Distribution Properties:


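Mean of the Sampling Distribution (μ_x̄): 4 (equal to the population mean)
Variance of the Sampling Distribution (σ_x̄²): 0.75
Standard Deviation of the Sampling Distribution (σ_x̄): 0.866

So:
As expected, the mean of the sampling distribution of the sample mean equals the population mean
(μ = 4), and we have also calculated the variance and standard deviation of the sampling distribution.
Note that 0.866 is smaller than σ / √n = √2 / √2 = 1 (the population variance here is σ² = 2). The
samples are drawn without replacement from a very small population, so the variance σ² / n is multiplied
by the finite population correction (N − n) / (N − 1) = 3/4, giving (2 / 2) × 3/4 = 0.75.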
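
Example 1 is small enough to check by brute force. The sketch below enumerates every sample of size 2 with itertools.combinations and uses exact fractions to avoid rounding; the variable names are illustrative, and the finite population correction at the end is included as a cross-check (an addition, not part of the worked solution above).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

population = [2, 3, 4, 5, 6]
n = 2  # sample size

# All samples of size 2, drawn without replacement and ignoring order.
samples = list(combinations(population, n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Frequency table, mean, and variance of the sample means.
distribution = Counter(sample_means)
mean_of_means = sum(sample_means) / len(sample_means)
variance = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

# Cross-check: (sigma^2 / n) times the finite population correction.
N = len(population)
pop_mean = Fraction(sum(population), N)
pop_var = sum((x - pop_mean) ** 2 for x in population) / N
fpc_variance = pop_var / n * Fraction(N - n, N - 1)

for m in sorted(distribution):
    print(float(m), distribution[m])
print(float(mean_of_means), float(variance), round(sqrt(variance), 3))
print(fpc_variance == variance)
```

The enumeration gives a mean of 4, a variance of 0.75, and a standard deviation of about 0.866, and the corrected formula yields the same variance.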

Percentile
In statistics, a percentile is a term that describes how a score compares to other scores from the same
set.
Percentiles are position measures used in education and health-related fields to indicate the position of an
individual in a group.
• Percentiles divide the data set into 100 equal groups. About n% of the data lie below the nth
percentile, and about (100 − n)% of the data lie above it. E.g. the 90th percentile indicates
that about 90% of the data lie below it, and about 10% of the data lie above it.

For example:
If a test score is in the 90th percentile, it means that the score is higher than 90% of all the other scores,
and only 10% of the scores are higher.

Percentile Rank:
This refers to the percentage of scores in a distribution that fall below a particular score.
Percentile Rank tells you how a specific data point compares to all the other points in the dataset.

Percentile Value:
The percentile value is the actual value or observation in the data set that corresponds to a given
percentile.

For example, the 25th percentile (often called Q1, the first quartile) is the value below which 25% of
the data points fall.

Percentiles divide a data set into 100 equal parts, so there are 100 percentiles (each representing 1% of
the data). For example:

• The 1st percentile is the value below which 1% of the data points fall.
• The 25th percentile is the value below which 25% of the data points fall (this is also called the first
quartile or Q1).
• The 50th percentile is the median value, where 50% of the data points fall below it (and 50% are
above it).
• The 75th percentile is the value below which 75% of the data points fall (this is the third quartile or
Q3).
• The 100th percentile is the maximum value in the data set.
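
The difference between a percentile rank (a percentage) and a percentile value (an observation) can be illustrated with a short sketch. The test scores and the percentile_rank helper below are hypothetical. Note also that software packages use slightly different interpolation rules for quartiles, so Q1 and Q3 from statistics.quantiles may differ a little from a hand calculation.

```python
import statistics


def percentile_rank(scores, score):
    """Percentage of scores that fall strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


# Hypothetical test scores, for illustration only.
scores = [55, 60, 62, 68, 70, 71, 75, 80, 88, 95]

# Percentile rank: 7 of the 10 scores are below 80.
rank_of_80 = percentile_rank(scores, 80)
print(rank_of_80)

# Percentile values: n=4 returns the three quartile cut points Q1, Q2, Q3.
q1, median, q3 = statistics.quantiles(scores, n=4)
print(q1, median, q3)
```

In this made-up data set a score of 80 has a percentile rank of 70, while the 50th percentile value is the median of the scores.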

Normal Distribution

The normal distribution is one of the most fundamental concepts in statistics, as it is widely used in a
variety of fields, including psychology, economics, biology, and social sciences. It describes how data
values are distributed in many real-world scenarios, especially when the data is symmetrically
distributed around the mean.

Characteristics of the Normal Distribution

1. Shape: The normal distribution is a bell-shaped curve that is symmetrical around the mean. It is
sometimes referred to as the Gaussian distribution.

2. Symmetry: The distribution is perfectly symmetrical around the mean. This means that the left half
of the distribution mirrors the right half. Therefore, the mean, median, and mode of a normal
distribution are all equal and located at the center.

3. Mean, Median, and Mode:


The mean is the central value.
The median is the middle value when the data is sorted.
The mode is the most frequent value in the dataset.
In a normal distribution, these three measures of central tendency are the same.

4. Standard Normal Distribution: A standard normal distribution is a normal distribution with a mean
of 0 and a standard deviation of 1. The Z-score formula is used to standardize data points in any
normal distribution to fit the standard normal distribution:

Z = (X − μ) / σ

Where:
Z is the Z-score (how many standard deviations X is from the mean),
X is the value of the data point,
μ is the mean,
σ is the standard deviation.

68-95-99.7 Rule: In a normal distribution:


68% of the data falls within 1 standard deviation of the mean.
95% of the data falls within 2 standard deviations of the mean.
99.7% of the data falls within 3 standard deviations of the mean.
This is known as the Empirical Rule or the 68-95-99.7 Rule.
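
The Z-score formula and the 68-95-99.7 rule can both be checked with Python's statistics.NormalDist. This is a minimal sketch; the helper names z_score and area_within are made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def area_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.1f}%")

# A value of 12 in a distribution with mean 10 and SD 2 is one SD above the mean.
print(z_score(12, mu=10, sigma=2))
```

The exact areas are about 68.3%, 95.4%, and 99.7%, which the Empirical Rule rounds to 68, 95, and 99.7.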

Properties of the Normal Distribution

1. Bell-shaped Curve:
The curve is symmetrical, with the highest point at the mean (μ), and it tapers off towards
both ends. The two tails never touch the horizontal axis, but they get infinitely close to it.

2. Asymptotic:
The tails of the normal distribution extend infinitely in both directions, approaching but never
actually touching the horizontal axis. This indicates that extreme values (both high and low) are still
possible, but less likely.

3. Defined by Mean and Standard Deviation:


Mean (μ): The location of the peak of the distribution. It indicates the "center" of the
distribution.
Standard Deviation (σ): The spread of the distribution. A larger σ makes the curve
wider, and a smaller σ makes the curve narrower.

4. Area Under the Curve:


The total area under the normal distribution curve equals 1 (or 100%), which represents the total
probability. The area under any section of the curve represents the probability of an outcome
falling within that section.

Examples
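Suppose Z has a standard normal distribution.
a) Find the 84th percentile of this distribution.

84th percentile = 84% of the data lie below the point.
0.84 − 0.50 = 0.34, so find the z-score for an area of 0.34 between 0 and z.
From the standard normal table, an area of 0.34 corresponds to z ≈ 0.99, or about 1.

[Figure: standard normal curve with an area of 0.50 to the left of 0 and an area of 0.34 between 0 and z]

b) Find the 50th percentile, or the median, of the standard normal distribution.

50th percentile = 50% of the data lie below the point, so z = 0.

[Figure: standard normal curve split at 0 into two areas of 0.5]

c) Find the 16th percentile of this distribution.

16th percentile = 16% of the data lie below this point.
0.50 − 0.16 = 0.34, so find the z-score for an area of 0.34 between z and 0.
Because the curve is symmetrical, this is the mirror image of the 84th percentile: z ≈ −0.99, or about −1.

[Figure: standard normal curve with an area of 0.34 between z and 0 and an area of 0.50 to the right of 0]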
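
Instead of a table lookup, the z-score for a given percentile can be obtained from the inverse cumulative distribution function. A minimal sketch using the standard library, assuming Python 3.8 or later for statistics.NormalDist:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# inv_cdf(p) returns the z-score with a proportion p of the area to its left.
z84 = standard_normal.inv_cdf(0.84)
z50 = standard_normal.inv_cdf(0.50)
z16 = standard_normal.inv_cdf(0.16)

for label, z in (("84th", z84), ("50th", z50), ("16th", z16)):
    print(f"{label} percentile: z = {z:.2f}")
```

The 84th and 16th percentiles come out symmetric about 0, at roughly +1 and −1, which is why they sit about one standard deviation either side of the mean.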

Example:
Suppose the reaction time of a particular drug, X, has a normal distribution with a mean of 10 min and a
standard deviation of 2 min.

a) Find the 50th percentile, or median, reaction time.
b) What will be the 84th percentile of the reaction time?

a) 50th percentile: in a normal distribution the median equals the mean, so Z = 0 and X = 10 min.

b) 84th percentile = 84% or 0.84 of the data lie below this point, so 0.84 − 0.5 = 0.34. Find Z for an
area of 0.34, which gives Z ≈ 0.99. Since Z = (X − mean) / SD, X = mean + Z × SD = 10 + 0.99 × 2 = 11.98,
or about 12 min.

[Figure: normal curve with an area of 0.5 below the mean (Z = 0, X = 10) and an area of 0.34 between the mean and the unknown point (Z = ?, X = ?)]

Suppose the age in a population has a normal distribution with mean 50 years and standard deviation
of 10 yrs.
a)Find the 50th percentile of the variable age
b)Find the 65th percentile of the variable age
c)Find the 10th percentile of the variable age
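
These three percentiles can be checked numerically. The sketch below converts a percentile to a z-score and then to an age with X = μ + Z × σ; the function name percentile_value is hypothetical, and the printed values are computed rather than read from a table, so they may differ from a table-based answer in the second decimal place.

```python
from statistics import NormalDist


def percentile_value(p, mu, sigma):
    """Value below which p percent of a normal(mu, sigma) population falls."""
    z = NormalDist().inv_cdf(p / 100)  # z-score for the percentile
    return mu + z * sigma              # rearranged from Z = (X - mu) / sigma


age_50th = percentile_value(50, mu=50, sigma=10)
age_65th = percentile_value(65, mu=50, sigma=10)
age_10th = percentile_value(10, mu=50, sigma=10)

print(round(age_50th, 2), round(age_65th, 2), round(age_10th, 2))
```

The 50th percentile equals the mean (50 years), the 65th percentile lies a little above it (about 53.9 years), and the 10th percentile lies well below it (about 37.2 years).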


Tutorial 1.
If a set of scores on an epidemiology examination is approximately normally distributed with a mean of
76 and a standard deviation of 4, find:

a. What is the 33rd percentile of the variable score?
b. What percent of the students who take this examination score at most 78? What percentile is a score of
78?
c. What percent of the students who take this examination get a score of at least 67? What percentile
is a score of 67?

a. what is the 33rd percentile of the variable score

Solve the problem step-by-step:

Given Information:

The mean (μ) of the scores = 76
The standard deviation (σ) of the scores = 4
We need to find the 33rd percentile of the variable score.

Step 1: Find the Z-Score for the 33rd Percentile

The first thing we need to do is determine the Z-score corresponding to the 33rd percentile. The Z-score
is the number of standard deviations away from the mean a given value is in a normal
distribution.
To find the Z-score for the 33rd percentile, we can use a Z-table (a standard normal distribution
table). The Z-table tells us the cumulative probability to the left of a given Z-score.
For the 33rd percentile, the cumulative probability is 0.33.
From the standard normal table, the Z-score corresponding to the cumulative probability of 0.33 is
approximately:
Z = −0.44
This means that the value at the 33rd percentile is 0.44 standard deviations below the mean.

Step 2: Use the Z-Score Formula to Find the Actual Score

Once we have the Z-score, we can use the formula to convert the Z-score back to the actual value in the
original distribution.
The formula is:
Z = (X − μ) / σ
Where:
Z is the Z-score,
X is the value of the score at the 33rd percentile (this is what we're solving for),
μ is the mean (76),
σ is the standard deviation (4).
We already know:
Z = −0.44
μ = 76
σ = 4
Now, plug these values into the formula and solve for X:
−0.44 = (X − 76) / 4
Multiply both sides of the equation by 4 to eliminate the denominator:
−0.44 × 4 = X − 76
−1.76 = X − 76
Now, add 76 to both sides to solve for X:
X = 76 − 1.76 = 74.24

So:
The 33rd percentile of the exam scores is approximately 74.24.

b. What percent of the students who take this examination score at most 78? What
percentile is a score of 78?

Solution:
Given Information:

The mean (μ) of the scores = 76
The standard deviation (σ) of the scores = 4
We need to find:
The percent of students who score at most 78.
The percentile for the score of 78.

Step 1: Find the Z-Score for a Score of 78

To find the percentage of students who score at most 78, we need to convert the score of 78 into a
Z-score.
The Z-score formula is:
Z = (X − μ) / σ
Where:
X = 78 (the score we're interested in),
μ = 76 (mean),
σ = 4 (standard deviation).
Now, substitute the values:
Z = (78 − 76) / 4 = 2/4 = 0.5
Thus, the Z-score corresponding to a score of 78 is Z = 0.5.

Step 2: Find the Percent of Students Who Score at Most 78


To find the percentage of students who score at most 78, we need to find the cumulative probability for
Z = 0.5 in the standard normal distribution. This tells us the proportion of students whose scores are
less than or equal to 78.
Using a Z-table, we can look up the cumulative probability for Z = 0.5:
P(Z ≤ 0.5) = 0.6915
Thus, about 69.15% of the students score at most 78.

Step 3: Find the Percentile for the Score of 78

The percentile for a score of 78 is simply the cumulative probability found above, because the
percentile is the percentage of data points that fall below this score. For
Z = 0.5, the cumulative probability is approximately 0.6915 or 69.15%.

So:
Percent of students who score at most 78: 69.15%.
Percentile for a score of 78: 69.15th percentile.

c. What percent of the students who take this examination get a score of at least 67?
What percentile is a score of 67?

Let's solve part (c) step by step:

Given Information:

The mean (μ) of the scores = 76
The standard deviation (σ) of the scores = 4
We need to find:
The percent of students who score at least 67.
The percentile for the score of 67.

Step 1: Find the Z-Score for a Score of 67

To find the percentage of students who score at least 67, we first need to convert the score of 67 into a
Z-score.
The Z-score formula is:
Z = (X − μ) / σ
Where:
X = 67 (the score we're interested in),
μ = 76 (mean),
σ = 4 (standard deviation).
Now, substitute the values:
Z = (67 − 76) / 4 = −9/4 = −2.25
So, the Z-score corresponding to a score of 67 is Z = −2.25.

Step 2: Find the Percent of Students Who Score at Least 67

Now, we need to find the percentage of students who score at least 67, or equivalently, the proportion
of students who score greater than or equal to 67. This is the right-tail probability for Z = −2.25.
To do this:
Find the cumulative probability for Z = −2.25 from the standard normal distribution (using a Z-table or a
calculator).
Subtract that probability from 1 to find the percentage of students who score at least 67.

Step 2a: Find the Cumulative Probability for Z = −2.25

Looking up Z = −2.25 in the standard normal table, we find that the cumulative probability (i.e., the
proportion of students scoring less than 67) is approximately:
P(Z ≤ −2.25) = 0.0122
This means that about 1.22% of students score below 67.

Step 2b: Find the Percent of Students Who Score at Least 67

To find the percentage of students who score at least 67, we subtract the cumulative probability from
1:
P(Z ≥ −2.25) = 1 − P(Z ≤ −2.25) = 1 − 0.0122 = 0.9878
So, approximately 98.78% of students score at least 67.

Step 3: Find the Percentile for the Score of 67

The percentile for a score of 67 is simply the cumulative probability that corresponds to Z = −2.25,
which we found to be 0.0122.
This means that a score of 67 is at the 1.22nd percentile of the distribution.

So the results:
Percent of students who score at least 67: 98.78%.
Percentile for a score of 67: 1.22nd percentile.
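
All three parts of the tutorial can be cross-checked in a few lines. This is a sketch using statistics.NormalDist rather than a printed Z-table, so the results may differ from table-based answers in the second decimal place because the table rounds z to two digits.

```python
from statistics import NormalDist

scores = NormalDist(mu=76, sigma=4)

# (a) 33rd percentile: the score with 33% of the distribution below it.
p33 = scores.inv_cdf(0.33)

# (b) Proportion scoring at most 78, which is also the percentile of 78.
at_most_78 = scores.cdf(78)

# (c) Proportion scoring at least 67 (right tail), and the percentile of 67.
below_67 = scores.cdf(67)
at_least_67 = 1 - below_67

print(f"(a) 33rd percentile: {p33:.2f}")
print(f"(b) P(X <= 78) = {at_most_78:.4f}")
print(f"(c) P(X >= 67) = {at_least_67:.4f}, percentile of 67 = {100 * below_67:.2f}")
```

The cdf method plays the role of the Z-table lookup (score to cumulative probability), and inv_cdf plays the role of reading the table in reverse (cumulative probability to score).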
