Introduction To Data Analytics: Sampling Distributions

This document discusses sampling distributions and the central limit theorem. It defines key concepts like population, sample, random variable, and statistics. It explains that a sample statistic has a probability distribution called a sampling distribution. The central limit theorem states that as sample size increases, the sampling distribution of the sample mean will approach a normal distribution, regardless of the shape of the population distribution. Examples are provided to illustrate sampling distributions and how the central limit theorem can be applied.

Uploaded by

preethi

INTRODUCTION TO

DATA ANALYTICS

Class #9
Sampling Distributions

Dr. Sreeja S R
Assistant Professor
Indian Institute of Information Technology
IIIT Sri City
IIITS: IDA - M2021 1
IN THIS PRESENTATION…
• Basic concept of sampling distribution

• Usage of sampling distributions

• Issue with sampling distributions

• Central limit theorem

• Application of Central limit theorem

• Major sampling distributions

• $\chi^2$ distribution

• t-distribution

• F distribution
Introduction
Statistical inference usually proceeds through the following steps:

• Data collection
• Collect a sample from the population.

• Statistics
• Compute a statistic from the sample.

• Statistical inference
• From the statistic, infer statements about the values of the population
parameters.
• For example, infer the population mean from the sample mean.

Basic terminologies
Some basic terms closely associated with the above-mentioned tasks are
defined below.

• Population: A population consists of the totality of the observations with which we are
concerned.

• Sample: A sample is a subset of a population.

• Random variable: A random variable is a function that associates a real number with each
element in the sample space.

• Statistic: Any function of the random variables constituting a random sample is called a
statistic.

• Statistical inference: An analysis concerned with generalization and prediction, that is, with drawing conclusions about a population from a sample.

Basic terminologies
Probability distribution: A function that gives the probabilities of the possible outcomes of an
experiment.

Normal (Gaussian) distribution: A bell-shaped probability distribution described by two
parameters, the mean and the standard deviation. The mean is the average value, and it is also
the point where the density is highest. The standard deviation is a
measure of how spread out the values are. As the standard deviation increases, the normal
distribution curve gets wider and flatter.

Statistical Inference
Two facts are key to statistical inference.

1. Population parameters are fixed numbers whose values are usually unknown.
2. Sample statistics are known values for any given sample, but they vary from sample to
sample, even when the samples are taken from the same population.
• In fact, two samples drawn independently are unlikely to produce identical
values of a sample statistic.
• In other words, the variability of sample statistics is always present and must be
accounted for in any inferential procedure.
• This variability is called sampling variation.

Note:
A sample statistic is a random variable and, like any other random variable, it
has a probability distribution.

Sampling Distribution
• More precisely, sampling distributions are probability distributions used to describe
the variability of sample statistics.

Definition 7.1: Sampling distribution

The sampling distribution of a statistic is the probability distribution of that
statistic.

• The probability distribution of the sample mean (hereafter denoted $\bar{X}$) is called
the sampling distribution of the mean (also referred to as the distribution of the sample
mean).

• Likewise, the probability distribution of the sample variance (denoted $S^2$) is called the sampling distribution of the variance.

• Using the values of $\bar{X}$ and $S^2$ for different random samples of a population, we make
inferences about the parameters $\mu$ and $\sigma^2$ (of the population).

Sampling Distribution
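Example 7.1:
Consider five identical balls numbered and weighing 1, 2, 3, 4, and 5. Consider an experiment consisting of drawing two
balls, replacing the first before drawing the second, and then computing the mean of the values of the two balls.
The following table lists all possible samples and their means.

Sample  Mean    Sample  Mean    Sample  Mean
[1,1]   1.0     [2,4]   3.0     [4,2]   3.0
[1,2]   1.5     [2,5]   3.5     [4,3]   3.5
[1,3]   2.0     [3,1]   2.0     [4,4]   4.0
[1,4]   2.5     [3,2]   2.5     [4,5]   4.5
[1,5]   3.0     [3,3]   3.0     [5,1]   3.0
[2,1]   1.5     [3,4]   3.5     [5,2]   3.5
[2,2]   2.0     [3,5]   4.0     [5,3]   4.0
[2,3]   2.5     [4,1]   2.5     [5,4]   4.5
                                [5,5]   5.0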

Sampling Distribution
[Figure: sampling distribution of means; the horizontal axis shows the sample mean from 1.0 to 5.0 in steps of 0.5]
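The sampling distribution in Example 7.1 can be reproduced by brute-force enumeration. The sketch below assumes the five balls carry the values 1 through 5 (consistent with sample means ranging from 1.0 to 5.0); the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed ball values; chosen so that the sample means span 1.0 to 5.0.
balls = [1, 2, 3, 4, 5]

# All ordered samples of size 2 drawn with replacement: 5 * 5 = 25 samples.
samples = list(product(balls, repeat=2))

# Count how many samples give each mean, then convert counts to probabilities.
counts = Counter(Fraction(a + b, 2) for a, b in samples)
distribution = {
    float(m): Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

for sample_mean, prob in distribution.items():
    print(f"mean = {sample_mean:.1f}  probability = {prob}")
```

Because every ordered pair is equally likely, the probability of each mean is simply the number of pairs producing it divided by 25.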

Issues with Sampling Distribution
1. In practice, for a large population it is infeasible to enumerate all
possible samples, and hence to obtain the exact probability distribution of a sample statistic.

2. The sampling distribution of a statistic depends on

• the size of the population

• the size of the samples and

• the method of choosing the samples.

Theorem on Sampling Distribution
A famous theorem in statistics

Theorem 7.1: Sampling distribution of mean and variance

The sampling distribution of the mean $\bar{X}$ of a random sample of size $n$ drawn from a population
with mean $\mu$ and variance $\sigma^2$ will have mean $\mu_{\bar{X}} = \mu$ and variance $\sigma^2_{\bar{X}} = \sigma^2 / n$.
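
The result for the sample mean can be derived in two lines. The sketch below assumes the observations $X_1, \ldots, X_n$ are independent, each with mean $\mu$ and variance $\sigma^2$ (as when sampling with replacement or from a very large population), and writes $\bar{X}$ for the sample mean.

```latex
\begin{aligned}
E[\bar{X}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^{2}}\, n\sigma^{2} = \frac{\sigma^{2}}{n}.
\end{aligned}
```

The variance step uses independence; when sampling without replacement from a small finite population, the variance of $\bar{X}$ is smaller than $\sigma^2/n$.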

Example 7.2: Consider the following small population consisting of N=6 patients
who recently underwent total hip replacement. Three months after surgery they rated
their pain-free function on a scale of 0 to 100 (0=severely limited and painful
functioning to 100=completely pain free functioning). The data are shown below and
ordered from smallest to largest.
Pain-Free Function Ratings in a Small Population of N=6 Patients:
25, 50, 80, 85, 90, 100


Example 7.2 (continued): 25, 50, 80, 85, 90, 100

For the population, $\mu = 430/6 \approx 71.7$.

Suppose we did not have the population data and instead we were estimating the mean functioning
score in the population based on a sample of n=4. The table below shows all possible samples of size
n=4 from the population of N=6. The rightmost column shows the sample mean based on the 4
observations contained in that sample.

Sample   Observations in the Sample (n=4)   Mean
1        25  50  80  85                     60.0
2        25  50  80  90                     61.3
3        25  50  80  100                    63.8
4        25  50  85  90                     62.5
5        25  50  85  100                    65.0
6        25  50  90  100                    66.3
7        25  80  85  90                     70.0
8        25  80  85  100                    72.5
9        25  80  90  100                    73.8
10       25  85  90  100                    75.0
11       50  80  85  90                     76.3
12       50  80  85  100                    78.8
13       50  80  90  100                    80.0
14       50  85  90  100                    81.3
15       80  85  90  100                    88.8

From the table, the mean of the 15 sample means is $1075/15 \approx 71.7$, which equals the population mean.
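
The table in Example 7.2 can be checked by enumerating every sample of size 4 from the six ratings. The sketch below is illustrative; the helper names `mean` and `pvar` are not from the slides. It also shows one point the example leaves implicit: because these samples are drawn without replacement from a small population, the variance of the sample means matches $\sigma^2/n$ only after applying the finite population correction $(N-n)/(N-1)$.

```python
from itertools import combinations

def mean(xs):
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)

def pvar(xs):
    """Population variance (divisor len(xs))."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

population = [25, 50, 80, 85, 90, 100]
N, n = len(population), 4

# Every sample of size 4 drawn without replacement: C(6, 4) = 15 samples.
sample_means = [mean(s) for s in combinations(population, n)]

mu = mean(population)
sigma2 = pvar(population)
mean_of_means = mean(sample_means)
var_of_means = pvar(sample_means)

# sigma^2 / n scaled by the finite population correction.
fpc_variance = (sigma2 / n) * (N - n) / (N - 1)

print(f"population mean        = {mu:.2f}")
print(f"mean of sample means   = {mean_of_means:.2f}")
print(f"variance of means      = {var_of_means:.2f}")
print(f"sigma^2/n with FPC     = {fpc_variance:.2f}")
```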

Central Limit Theorem
• Theorem 7.1 is a remarkable result. It has also been verified that if we sample
from a population with an unknown distribution, the sampling distribution of $\bar{X}$ will still be
approximately normal with mean $\mu$ and variance $\sigma^2/n$, provided that the sample size is large.

This can be established with the famous “central limit theorem”, which is
stated below.

Theorem 7.3: Central Limit Theorem

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population having the
mean $\mu$ and the finite variance $\sigma^2$, then

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

is a random variable whose distribution function approaches that of the standard
normal distribution as $n \to \infty$.

Central Limit Theorem
The CLT states that the sampling distribution of
the sample mean approaches a normal
distribution as the sample size gets larger,
whatever the shape of the population
distribution, as long as its variance is finite. As a rule of thumb,
the approximation is usually adequate for sample sizes over 30.

Why is it so important to have a normal
distribution?

A normal distribution is described entirely by its
mean and standard deviation, which can
easily be calculated. If we know the
mean and standard deviation of a normal
distribution, we can compute any probability
associated with it.
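
A small simulation can make the central limit theorem concrete. The sketch below assumes a strongly skewed population (exponential with rate 1, so its mean and standard deviation are both 1) and a sample size of 40; both choices are illustrative and not taken from the slides. If the standardized sample mean is approximately standard normal, about 95% of its values should fall within ±1.96.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
mu, sigma = 1.0, 1.0
n = 40          # sample size, above the rule-of-thumb value of 30
trials = 20000  # number of repeated samples

z_values = []
for _ in range(trials):
    sample = [random.expovariate(1.0) for _ in range(n)]
    x_bar = sum(sample) / n
    # Standardize the sample mean as in the central limit theorem.
    z_values.append((x_bar - mu) / (sigma / n ** 0.5))

within = sum(abs(z) <= 1.96 for z in z_values) / trials
print(f"mean of Z            = {statistics.fmean(z_values):.3f}")
print(f"std. dev. of Z       = {statistics.pstdev(z_values):.3f}")
print(f"share with |Z|<=1.96 = {within:.3f}")
```

Rerunning with a smaller `n` (for example 2 or 5) shows the approximation degrading for this skewed population.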

Example for central limit theorem:
Different classes of lipid transport carriers (lipoproteins) can be separated (fractionated) based on their density
and where they layer out when spun in a centrifuge. High density lipoprotein cholesterol (HDL) is
sometimes referred to as the "good cholesterol," because higher concentrations of HDL in blood are
associated with a lower risk of coronary heart disease. In contrast, high concentrations of
low density lipoprotein cholesterol (LDL) are associated with an increased risk of coronary heart
disease. The illustration on the right outlines how total cholesterol levels are classified in terms of risk,
and how the levels of LDL and HDL fractions provide additional information regarding risk.

Example for central limit theorem:
Data from the Framingham Heart Study found that subjects over age 50 had a mean HDL of 54 mg/dl and a
standard deviation of 17 mg/dl. Suppose a physician has 40 patients over age 50 and wants to determine the
probability that the mean HDL cholesterol for this sample of 40 patients is 60 mg/dl or more (i.e., low risk).

• Probability questions about a sample mean can be addressed with the Central Limit Theorem, as long as
the sample size is sufficiently large.
• In this case n = 40, so the sample mean is approximately normally distributed, and we can
compute the probability that the sample mean is 60 or more by using the standard normal distribution table.
• The population mean is 54; the question is the probability that the sample mean will be 60 or more.

Solution:
$\bar{x} = 60$, $\mu = 54$, $\sigma = 17$, $n = 40$.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{60 - 54}{17/\sqrt{40}} \approx 2.23$$

$P(Z > 2.23) = 1 - 0.9871 = 0.0129$.

Therefore, the probability that the mean HDL in these 40 patients will be 60 or more is about 1.3%.
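
The HDL calculation can be checked without a normal table. The sketch below uses only the Python standard library and the identity $P(Z > z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$; the variable names are illustrative. The result should agree with the table-based value up to the rounding of z to two decimals.

```python
import math

# Values from the example: sample-mean threshold, population mean,
# population standard deviation, and sample size.
x_bar, mu, sigma, n = 60.0, 54.0, 17.0, 40

standard_error = sigma / math.sqrt(n)
z = (x_bar - mu) / standard_error

# Upper-tail probability of the standard normal distribution.
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(f"standard error = {standard_error:.4f}")
print(f"z              = {z:.4f}")
print(f"P(Z > z)       = {p_upper:.4f}")
```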

Applicability of Central Limit Theorem
• The normal approximation for $\bar{X}$ will generally be good if $n \ge 30$.
• The sample size $n = 30$ is, hence, a guideline for the central limit theorem.
• The normal approximation to the distribution of $\bar{X}$ becomes more accurate as $n$ grows larger.

One very important application of the Central Limit Theorem is the determination of
reasonable values of the population mean and variance.

[Figure: distribution of $\bar{X}$ for n = 1, n = small to moderate, and n = large]

STANDARD SAMPLING DISTRIBUTIONS

• Apart from the normal distribution, there
are some other sampling distributions that are referred to extensively in
the study of statistical inference.

• $t$: Describes the sampling distribution of the mean when $\sigma$ is unknown.

• $\chi^2$: Describes the sampling distribution of the variance.
• $F$: Describes the distribution of the ratio of two sample variances.

The 𝒕 Distribution
1. To find the sampling distribution of the mean, we make use of the Central Limit Theorem
with

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

2. The Central Limit Theorem requires the value of $\sigma$ to be known a priori.

3. However, in many situations, knowledge of $\sigma$ is no more reasonable than knowledge of
the population mean $\mu$.

4. In such situations, the only measure of the standard deviation available may be the sample
standard deviation $S$.

5. It is natural then to substitute $S$ for $\sigma$. The problem is that the resulting statistic is not
normally distributed!

6. The $t$ distribution addresses this problem. This distribution is called Student's $t$ or simply $t$.

The 𝒕 Distribution

Definition 7.4: $t$ distribution

If $\bar{X}$ is the mean of a random sample of size $n$ taken from a normal population having
the mean $\mu$ and the variance $\sigma^2$, then

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

is a random variable having the $t$ distribution with the parameter $\nu = n - 1$ (degrees of freedom).

Example for t-distribution:
A manufacturer of fuses claims that with a 20% overload, the fuses will blow in 12.40
minutes on the average. To test this claim, a sample of 20 of the fuses was subjected to a
20% overload, and the time it took them to blow had a mean of 10.63 minutes and a std.
dev. of 2.48 minutes. If it can be assumed that the data constitute a random sample from a
normal population, do they tend to support or refute the manufacturer’s claim?

Solution:
$\bar{x} = 10.63$, $\mu = 12.40$, $s = 2.48$, $n = 20$.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10.63 - 12.40}{2.48/\sqrt{20}} \approx -3.19$$

is a value of a random variable having the $t$ distribution with the parameter $\nu = n - 1 = 19$ degrees of
freedom. From the t-distribution table, $t_{0.005} = 2.861$ for $\nu = 19$, so the probability of obtaining a value of $t$ at or below $-3.19$ is less than 0.005.
Since this probability is very small, we conclude that the data refute the manufacturer’s
claim. In all likelihood, the mean blowing time of his fuses with a 20% overload is less than
12.40 minutes.
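
The arithmetic of the fuse example can be reproduced as follows. This is a minimal sketch using only the standard library; the critical value 2.861 is the tabulated $t_{0.005}$ for 19 degrees of freedom and is entered by hand rather than computed.

```python
import math

# Values from the example: sample mean, claimed mean, sample std. dev., sample size.
x_bar, mu0, s, n = 10.63, 12.40, 2.48, 20

t_stat = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1

# Tabulated t_{0.005} for 19 degrees of freedom (entered by hand).
t_crit = 2.861

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
if t_stat < -t_crit:
    print("t lies below -t_0.005, so the lower-tail probability is below 0.005")
```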

The $\chi^2$ Distribution
• A common use of the $\chi^2$ distribution is to describe the distribution of the sample
variance.
• It is concerned with the sampling distribution of the sample variance for random
samples from normal populations.

Definition 7.5: $\chi^2$ distribution

If $S^2$ is the variance of a random sample of size $n$ taken from a normal population
having the variance $\sigma^2$, then

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

is a random variable having the chi-square distribution with the parameter $\nu = n - 1$.
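
The statistic $(n-1)S^2/\sigma^2$ can be explored by simulation. The sketch below assumes a normal population with mean 50 and standard deviation 4 and samples of size 10; these numbers are illustrative only. A chi-square variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so with $n = 10$ the simulated values should average close to 9 with variance close to 18.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Assumed normal population and sample size (illustrative values).
mu, sigma = 50.0, 4.0
n = 10
trials = 20000

def sample_variance(xs):
    """Unbiased sample variance S^2 (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

values = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    values.append((n - 1) * sample_variance(sample) / sigma ** 2)

sim_mean = sum(values) / trials
sim_var = sum((v - sim_mean) ** 2 for v in values) / trials

print(f"simulated mean     = {sim_mean:.2f}  (theory: {n - 1})")
print(f"simulated variance = {sim_var:.2f}  (theory: {2 * (n - 1)})")
```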

The $F$ Distribution
• The $F$ distribution finds wide application in comparing sample variances.

Definition 7.5: $F$ distribution

If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$, respectively,
taken from two normal populations having the same variance, then

$$F = \frac{S_1^2}{S_2^2}$$

is a random variable having the $F$ distribution with the parameters $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

Therefore, if we have a sample of size $n_1$ from a normal population with variance $\sigma_1^2$
and an independent sample of size $n_2$ from another normal population with variance $\sigma_2^2$, then the
statistic

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an $F$ distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

Representation of $\chi^2$ random variable

Definition 7.6: $\chi^2$ random variable

Let $Z_1, Z_2, \ldots, Z_\nu$ be independent standard normal random variables. Then

$$\chi^2 = \sum_{i=1}^{\nu} Z_i^2$$

has a chi-square distribution with $\nu$ degrees of freedom.

Representation of 𝒕 random variable

Definition 7.7: $t$ random variable

Let the standard normal variable $Z$ and the chi-square variable $\chi^2$ with $\nu$ degrees of freedom be independent. Then

$$T = \frac{Z}{\sqrt{\chi^2/\nu}}$$

has a $t$ distribution with $\nu$ degrees of freedom.

Representation of F random variable

Definition 7.8: $F$ random variable

Let the chi-square variables $\chi_1^2$, with $\nu_1$ degrees of freedom, and $\chi_2^2$, with $\nu_2$ degrees of
freedom, be independent. Then

$$F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$$

has an $F$ distribution with $(\nu_1, \nu_2)$ degrees of freedom.
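
The three representations can be turned directly into samplers built from standard normal draws. The sketch below is illustrative: the function names and the degrees of freedom (5 and 10) are assumptions, not part of the lecture. As a sanity check it compares simulated means with the known values: $\nu$ for chi-square, 0 for $t$, and $\nu_2/(\nu_2 - 2)$ for $F$ when $\nu_2 > 2$.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable

def chi_square(nu):
    """Sum of nu squared independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))

def student_t(nu):
    """Standard normal divided by sqrt(chi-square / nu), drawn independently."""
    z = random.gauss(0.0, 1.0)
    return z / (chi_square(nu) / nu) ** 0.5

def f_ratio(nu1, nu2):
    """Ratio of two independent chi-square draws, each divided by its d.o.f."""
    return (chi_square(nu1) / nu1) / (chi_square(nu2) / nu2)

trials = 20000
nu1, nu2 = 5, 10  # illustrative degrees of freedom

chi_mean = sum(chi_square(nu1) for _ in range(trials)) / trials
t_mean = sum(student_t(nu2) for _ in range(trials)) / trials
f_mean = sum(f_ratio(nu1, nu2) for _ in range(trials)) / trials

print(f"chi-square mean = {chi_mean:.3f}  (theory: {nu1})")
print(f"t mean          = {t_mean:.3f}  (theory: 0)")
print(f"F mean          = {f_mean:.3f}  (theory: {nu2 / (nu2 - 2):.3f})")
```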

REFERENCE

The detailed material related to this lecture can be found in

Probability and Statistics for Engineers and Scientists (8th Ed.) by

Ronald E. Walpole, Sharon L. Myers, Keying Ye (Pearson), 2013.

Any questions?

You may post your question(s) at the “Discussion Forum” maintained on
the course Web page!

QUESTIONS OF THE DAY…

1. What are the degrees of freedom in the
following cases?
Case 1: A single number.
Case 2: A list of n numbers.
Case 3: A table of data with m rows and n
columns.
Case 4: A data cube with dimensions m×n×p.

QUESTIONS OF THE DAY…

2. In the following, two normal sampling distributions are shown
with parameters n, μ and σ (all symbols bear their usual
meanings).

[Figure: two normal curves, labelled $n_1, \mu_1, \sigma_1$ and $n_2, \mu_2, \sigma_2$]

What are the relations among the parameters of the two distributions?

QUESTIONS OF THE DAY…

3. Suppose $\bar{X}$ and $S$ denote the sample mean and standard
deviation of a sample of size $n$. Assume that the population follows a
normal distribution with population mean $\mu$ and standard
deviation $\sigma$. Write down the expressions for the z and t values,
along with the degrees of freedom.
