11 ASAP Basic Statistics Sampling and Sampling DistributionsSampleSize-3

The document discusses calculating z-scores and standardizing data points. It also discusses determining sample sizes using formulas that consider factors like confidence level, variance, and maximum error. Sample size is important for estimating population parameters from sample data.

Uploaded by

George Mathew

FOUNDATION TO DATA SCIENCE

Business Analytics

Unit1: BASIC STATISTICS


INTERVAL ESTIMATION AND SAMPLE SIZE

Prof. Dr. George Mathew


B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD
What are Z-scores?
The most common approach is to standardize each data point by computing a Z-
score the following way:
(Data value – Mean of data)/(Standard deviation of data)
Essentially a Z-score measures the number of standard deviations the data point
differs from average. Since 95 percent of observations from a normal random
variable are within two standard deviations of average, any data point with a Z-score
exceeding 2 in absolute value is deemed an outlier.
As an example of the computation of Z-scores, the file contains for the years 1967–
2016 the Las Vegas prediction for the number of points by which the
favorite would win the game and the number of points by which the favorite won.
For example, in 2016 the Carolina Panthers were favored by 5 and lost by 14, so they
performed -14 – 5 = -19 points relative to the point spread. We call the difference
between the point spread and actual game outcome the residual or error for the
game. You can compute Z-scores for the residuals as a measure for each game of
how unexpected or unusual the game’s outcome was. Proceed as follows:
■ Compute the residual for each game by copying from cell C6 to C7:C55 the
formula =E6-D6.
■ In cell C4, compute the mean residual (-0.39) with the formula =AVERAGE(C6:C55).
■ In cell I4, compute the standard deviation (16.10) of the errors with the formula
=STDEV(C6:C55).
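The last step, turning a residual into a Z-score, can be sketched in Python. This is a minimal illustration that assumes only the figures quoted above (mean residual -0.39, standard deviation 16.10, and the 2016 residual of -19); the worksheet itself is not reproduced here, and the function name `z_score` is hypothetical.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value differs from the mean."""
    return (value - mean) / std_dev


# Figures quoted in the text above (not recomputed from the worksheet).
MEAN_RESIDUAL = -0.39
STD_RESIDUAL = 16.10

# 2016 game: the favorite was favored by 5 and lost by 14.
residual_2016 = -14 - 5

z_2016 = z_score(residual_2016, MEAN_RESIDUAL, STD_RESIDUAL)

# Rule of thumb from the text: |Z| > 2 marks an outlier.
is_outlier = abs(z_2016) > 2

print(f"z = {z_2016:.2f}, outlier: {is_outlier}")
```

Under these figures the 2016 game sits a little more than one standard deviation below the mean residual, so it would not be flagged by the two-standard-deviation rule.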
The Normal Distribution
• Normal curve
• Bell shaped
• Almost all of its values are within plus or
minus 3 standard deviations
• I.Q. is an example
Standard Normal Curve
• The curve is bell-shaped and symmetrical
• About 68% of the observations will fall
within 1 standard deviation of the mean
• About 95% of the observations will fall
within approximately 2 (1.96) standard
deviations of the mean
• Almost all of the observations will fall
within 3 standard deviations of the mean
Standardized Normal Distribution
• Symmetrical about its mean
• Mean identifies highest point
• Infinite number of cases - a continuous
distribution
• Total area under the curve (total probability) = 1.0
• Mean of zero, standard deviation of 1
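The 68%, 95% and "almost all" figures quoted for the standard normal curve can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a standard normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


coverage = {k: share_within(k) for k in (1, 1.96, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} standard deviations: {share:.4f}")
```

The value 1.96 is included because it is the exact multiplier that encloses 95% of the area, which is why it appears in the confidence-interval formulas later in the deck.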
Standardized Values
• Used to compare an individual value to the
population mean in units of the standard
deviation

$z = \frac{x - \mu}{\sigma}$

Standard Error of the Mean

$S_{\bar{x}} = \frac{S}{\sqrt{n}}$
The area under the standard normal curve between the mean and z = 1.50 is 0.4332. Hence the
required tail probability is 0.5 - 0.4332 = 0.0668.
The probability that the lamp will fail in the first 700 hours is 0.0668.
Ref: Statistics and Probability for Engineering Applications with Microsoft Excel
Marion Dairies Problem
1. Calculate the probability that the mean score of Blugert
given by the simple random sample of Marion Dairies
customers will be 75 or less.
Data
Mean=80
Standard Deviation =25
Sample size=50
Z = (Sample mean - Mean)/(SD/sqrt(n))
= (75-80)/(25/sqrt(50)) = -1.41
Probability= 0.079
Marion Dairies Problem
2. If the Marketing Department increases the sample size to
150, what is the probability that the mean score of Blugert
given by the simple random sample of Marion Dairies
customers will be 75 or less?
Data
Mean=80
Standard Deviation =25
Sample size=150
Z = (Sample mean - Mean)/(SD/sqrt(n))
= (75-80)/(25/sqrt(150)) = -2.45
Probability= 0.007
Marion Dairies Problem
3. Explain to Marion Dairies senior management why the
probability that the mean score of Blugert given by the simple
random sample of Marion Dairies customers will be 75 or less
differs for these two sample sizes.

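When the sample size increases, the standard error of the mean (sigma x-bar = sigma/sqrt(n))
decreases. Hence the z-value for a sample mean of 75 moves farther from zero (from about -1.41 to about -2.45).
The probability of a sample mean of 75 or less therefore decreases.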
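The two Marion Dairies calculations can be reproduced with a short Python sketch. It assumes, as the problem does, that the sample mean is approximately normal with standard error SD/sqrt(n); the function name `prob_mean_at_most` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_at_most(threshold, pop_mean, pop_sd, n):
    """Return (z, probability) that the sample mean is at most threshold."""
    standard_error = pop_sd / sqrt(n)           # shrinks as n grows
    z = (threshold - pop_mean) / standard_error
    return z, NormalDist().cdf(z)               # lower-tail area of the standard normal


# Problem 1: n = 50; Problem 2: n = 150 (mean 80, standard deviation 25).
z_50, p_50 = prob_mean_at_most(75, 80, 25, 50)
z_150, p_150 = prob_mean_at_most(75, 80, 25, 150)

print(f"n=50:  z = {z_50:.2f}, P = {p_50:.3f}")
print(f"n=150: z = {z_150:.2f}, P = {p_150:.3f}")
```

Tripling the sample size divides the standard error by sqrt(3), which pushes z from about -1.41 to about -2.45 and cuts the probability from roughly 0.079 to roughly 0.007.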
A confidence interval is a range of values that gives the user a
sense of how precisely a statistic estimates a parameter.
Confidence intervals can be used with distributions that aren’t
normal—that are highly skewed or in some other way non-
normal.
Calculate the confidence interval in
Excel
=CONFIDENCE(alpha,sd,sample_size), with alpha=0.04,
sd=14, sample_size=38
=CONFIDENCE(0.04,14,38) = 4.66
The confidence value returned by the calculation is the margin
of error: it is both added to and subtracted from the sample mean
to give the range that is likely to contain the population mean.
Using the example confidence value of ±4.66 and a sample mean of
44.55, add and subtract to get the interval. This results in
(44.55 + 4.66 = 49.21) and (44.55 - 4.66 = 39.89) for a
confidence interval between 39.89 and 49.21.
You can also enter the confidence interval formula directly in the
spreadsheet, for example =44.55+(2.054*(14/SQRT(38))) for the upper
limit, where 2.054 is the z value for alpha=0.04 (use 1.96 for alpha=0.05).
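The same margin of error can be computed outside Excel. The sketch below is a rough Python counterpart of the CONFIDENCE calculation for a known standard deviation; the function name `confidence_margin` is an assumption, and the sample mean 44.55 is the one used in the example above.

```python
from math import sqrt
from statistics import NormalDist


def confidence_margin(alpha, sd, n):
    """Margin of error z * sd / sqrt(n) for a two-sided (1 - alpha) interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sd / sqrt(n)


margin = confidence_margin(0.04, 14, 38)

sample_mean = 44.55
lower, upper = sample_mean - margin, sample_mean + margin

print(f"margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

With alpha = 0.04 the critical value is about 2.054 rather than 1.96, which is why the margin comes out at about 4.66.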
Confidence Interval in t-test
Comp Time Data
6 21 17 20 7 0
8 16 29 3 8 12
11 9 21 25 15 16

n = 18
Mean = 13.556
S = 7.801
Std Error = 1.839

α = 0.10
df = 17
t = 1.740

10.357 <= Mean <= 16.755
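The Comp Time interval can be recomputed step by step. This sketch uses only the Python standard library, which has no t-distribution quantile function, so the critical value t = 1.740 is taken from the slide (alpha = 0.10, df = 17) rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Comp Time data from the slide (18 observations).
comp_time = [6, 21, 17, 20, 7, 0,
             8, 16, 29, 3, 8, 12,
             11, 9, 21, 25, 15, 16]

# Critical value copied from the slide (two-sided, alpha = 0.10, df = 17);
# it is an input here, not something this code derives.
T_CRITICAL = 1.740

n = len(comp_time)
sample_mean = mean(comp_time)
sample_sd = stdev(comp_time)            # sample standard deviation (n - 1 divisor)
std_error = sample_sd / sqrt(n)
margin = T_CRITICAL * std_error

lower, upper = sample_mean - margin, sample_mean + margin

print(f"n={n}, mean={sample_mean:.3f}, s={sample_sd:.3f}, se={std_error:.3f}")
print(f"{lower:.3f} <= Mean <= {upper:.3f}")
```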
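Population standard deviation
is unknown
When the population standard deviation
is unknown and has to be estimated from the
sample, we use the t-distribution (with n - 1
degrees of freedom) instead of the z-distribution
(normal distribution). The t-interval itself assumes an
approximately normal population, which matters most for small samples.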
Confidence Interval for Proportion
Unless the sample size is very small, it is not practical to
find confidence intervals for proportion by calculations of
individual probabilities directly from the binomial
distribution. We need to use either a normal
approximation or a computer solution.
A computer solution with Excel (except for rather small
sample sizes) involves using the function BINOMDIST
to obtain cumulative probabilities. Then the Goal Seek
tool can be used to find the upper limit or the lower
limit of the appropriate confidence interval for the
proportion p, say the probability that any one
item will be defective.
Determining Sample Size
What is sample size?
Sample size is the number of people or subjects
that are used as a sample in the study.
Sample Size Formula
The sample size formula gives the number of
observations needed for a sample statistic to estimate
the population parameter within a chosen margin of error.
To recall, the number of observations in a given sample is
known as the sample size. The sample size is usually denoted
by “n” (some sources write “N” or “SS”); the formulas here use n.
Sample Size
Sample size calculations use:
1. Variance (standard deviation)
2. Magnitude of error
3. Confidence level
Estimating the Standard Error
of the Mean

$S_{\bar{x}} = \frac{S}{\sqrt{n}}$

$\mu = \bar{X} \pm Z_{cl} \frac{S}{\sqrt{n}}$

Sample Size Formula

$n = \left( \frac{zs}{E} \right)^2$
Standard Error of the Proportion

$s_p = \sqrt{\frac{pq}{n}}$

or

$s_p = \sqrt{\frac{p(1-p)}{n}}$

Confidence Interval for a
Proportion

$p \pm Z_{cl} s_p$
Sample Size for a Proportion

$n = \frac{Z^2 pq}{E^2}$

Where:
n = Number of items in the sample

$Z^2$ = The square of the confidence level
in standard error units (the z value)

p = Estimated proportion of successes

q = (1-p), the estimated proportion of failures

$E^2$ = The square of the maximum allowance for error
between the true proportion and the sample proportion,
or $z s_p$ squared.
Calculating Sample Size
at the 95% Confidence Level

p = .6, q = .4, E = .035

$n = \frac{(1.96)^2 (.6)(.4)}{(.035)^2} = \frac{(3.8416)(.24)}{.001225} = \frac{.922}{.001225} = 753$
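The proportion calculation can be checked with a few lines of Python. This is a minimal sketch of the formula n = Z²pq/E²; the function name `sample_size_for_proportion` is hypothetical, and rounding up to the next whole respondent is a common convention assumed here.

```python
from math import ceil


def sample_size_for_proportion(z, p, max_error):
    """Unrounded sample size n = z^2 * p * q / E^2, with q = 1 - p."""
    q = 1 - p
    return (z ** 2) * p * q / (max_error ** 2)


# 95% confidence (z = 1.96), p = 0.6, maximum error E = 0.035.
raw_n = sample_size_for_proportion(1.96, 0.6, 0.035)
n = ceil(raw_n)  # round up so the error bound is still met

print(f"raw n = {raw_n:.2f}, required sample size = {n}")
```

If no prior estimate of p is available, p = 0.5 is often used because it maximizes pq and therefore gives the most conservative (largest) sample size.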
Determining Sample Size
Sample Size Formula - Example
Suppose a survey researcher,
studying expenditures on lipstick,
wishes to have a 95 percent confidence
level (Z) and a range of error (E) of
less than $2.00. The estimate of the
standard deviation is $29.00.
Sample Size Formula - Example

$n = \left( \frac{zs}{E} \right)^2 = \left( \frac{(1.96)(29.00)}{2.00} \right)^2 = \left( \frac{56.84}{2.00} \right)^2 = (28.42)^2 = 808$
Sample Size Formula - Example

Suppose, in the same example as the
one before, a range of error (E) of
$4.00 is acceptable. The required sample
size is then reduced.
Sample Size Formula - Example

$n = \left( \frac{zs}{E} \right)^2 = \left( \frac{(1.96)(29.00)}{4.00} \right)^2 = \left( \frac{56.84}{4.00} \right)^2 = (14.21)^2 = 202$
Calculating Sample Size
99% Confidence

E = 2: $n = \left( \frac{(2.57)(29)}{2} \right)^2 = \left( \frac{74.53}{2} \right)^2 = (37.265)^2 = 1389$

E = 4: $n = \left( \frac{(2.57)(29)}{4} \right)^2 = \left( \frac{74.53}{4} \right)^2 = (18.6325)^2 = 347$
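The four lipstick-survey results (95% and 99% confidence, E = $2.00 and E = $4.00) all come from the same formula, which can be sketched as below. The function name `sample_size_for_mean` is hypothetical, and the z values 1.96 and 2.57 are the ones used on the slides.

```python
def sample_size_for_mean(z, std_dev, max_error):
    """Unrounded sample size n = (z * s / E)^2 for estimating a mean."""
    return (z * std_dev / max_error) ** 2


STD_DEV = 29.00  # estimated standard deviation from the example

# (confidence label, z value from the slides, maximum error E in dollars)
scenarios = [("95%", 1.96, 2.00), ("95%", 1.96, 4.00),
             ("99%", 2.57, 2.00), ("99%", 2.57, 4.00)]

results = {(label, error): sample_size_for_mean(z, STD_DEV, error)
           for label, z, error in scenarios}

for (label, error), raw_n in results.items():
    print(f"{label} confidence, E = ${error:.2f}: n = {raw_n:.2f} (about {round(raw_n)})")
```

Doubling the acceptable error cuts the required sample size to a quarter, because E enters the formula squared. The slides round to the nearest whole number; rounding up instead, a common convention, would turn 347 into 348 in the last case.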