Basics of Statistical Inference: Drawing Inferences From A Starbucks Survey


Basics of Statistical Inference

Drawing inferences from a Starbucks Survey

Akash Goel (19314) Arnav Garg (19321)


Bhavya Jain (19326) Darpan Khurana (19328)
Ishaan Jain (19341)
About the survey
In 2019, Starbucks Coffee carried out a customer
satisfaction survey in Malaysia, one of its
most attractive markets worldwide.

The survey records the demographics
of the customers and their ratings, on a
5-point Likert scale, of different factors that
may influence their buying decisions at
Starbucks.
Objective of the project
In this project we use hypothesis testing to evaluate the following claims, based on the
sample data of the survey.
- The sampled population is normally distributed.
- The majority of service ratings are above average.
- The customers are equally concerned about the ambience and service.
- Both men and women are equally likely to be affected by sales and promotion.

The statistical tests to be used are: Test for normality, Sign test, Test for homogeneity, and
2 sample Z test.
Hypothesis Testing
Hypothesis testing is a decision-making process for evaluating claims about a
population. A statistical hypothesis is a conjecture about a population parameter.
This conjecture may or may not be true.

- State the hypotheses and identify the claim
- Find the critical values
- Compute the test value
- Make the decision to reject or not reject the null hypothesis
- Summarize the results
Parametric Tests vs. Non-Parametric Tests

Parametric tests
- Used if the data is normally distributed
- Scale of measurement is metric (interval or ratio)
- For quantitative data
- For example: t-test, paired t-test, Z test, ANOVA

Non-parametric tests
- Used for non-normal or skewed data
- Scale of measurement is nominal or ordinal
- For qualitative data
- For example: Sign test, Wilcoxon signed-rank test, Spearman's rank correlation
Test of Normality
▫ Tests whether the given data was sampled from a normal
distribution
▫ It is a test of Goodness of fit


Test Statistic
▫ The test statistic is compared with the critical value for
the given level of significance and degrees of
freedom (df)
▫ df is determined by
1. Number of groups
2. Minus 1 (for making the expected total equal the observed total)
3. Minus 2 (for estimating µ and SD from the sample)
▫ If test statistic > critical value, we reject the null
hypothesis
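
The statistic and degrees of freedom described above can be sketched as follows. This is a minimal illustration that assumes the usual chi-square goodness-of-fit form, the sum of (observed - expected)^2 / expected over the groups. The function name `chi_square_gof` and the counts are made up for illustration and are not the survey data.

```python
def chi_square_gof(observed, expected, estimated_params=2):
    """Return (statistic, df) for a chi-square goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of groups")
    # Sum of squared deviations, each scaled by its expected count.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = number of groups - 1 (totals must agree) - parameters estimated
    # from the sample (2 when both the mean and the SD are estimated).
    df = len(observed) - 1 - estimated_params
    return statistic, df


# Hypothetical counts for four groups (illustration only, not the survey data).
observed = [18, 32, 30, 20]
expected = [20.0, 30.0, 30.0, 20.0]
statistic, df = chi_square_gof(observed, expected)
print(round(statistic, 4), df)
```

With four groups and two estimated parameters, the sketch gives 1 degree of freedom.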

Application of Test of Normality


Sample Size = 122
Null Hypothesis: The underlying population of the sample is normally distributed
Alternate Hypothesis: The sampled population is not normally distributed
Test statistic: 38.771
Level of Significance: 0.05
Degrees of Freedom: 1
Critical Value: 3.841

Since the test statistic (38.771) > the critical value (3.841), we
reject the null hypothesis. At the 5% level of significance, we have sufficient
evidence that the population from which the sample was collected is not
normally distributed.
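
The decision above can also be checked through a p-value. The sketch below is a hedged illustration that uses only the Python standard library and relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The helper name `chi2_sf_df1` is hypothetical, and the helper is valid for 1 degree of freedom only.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    # With 1 df, X = Z**2 for a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


test_statistic = 38.771  # value reported for the normality test
critical_value = 3.841   # chi-square critical value at 0.05 with 1 df
alpha = 0.05

p_value = chi2_sf_df1(test_statistic)
reject_by_critical_value = test_statistic > critical_value
reject_by_p_value = p_value < alpha
print(reject_by_critical_value, reject_by_p_value)
```

Both rules lead to the same decision, because the p-value falls below the significance level exactly when the statistic exceeds the critical value.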
Excel Application
Sign Test
▫ Tests one population median, η (eta)
▫ Corresponds to t-test for one mean
▫ Assumes population is continuous
▫ Small sample test statistic: Number of sample values
above (or below) the hypothesized median, as determined by
the alternative hypothesis
▫ Uses normal approximation if n ≥ 30
Large Sample Sign Test for a Population Median η
One-Tailed Test
H0: η = η0
Ha: η > η0 [or Ha: η < η0 ]

Test statistic:

S = Number of sample measurements greater than η0
[or number of measurements less than η0]

Frequency Table of Satisfaction Score

Satisfaction Score    Frequency
1                     1
2                     4
3                     43
4                     51
5                     23
Total                 122

Application of Sign test on Satisfaction Score


Sample Size = 122
Null Hypothesis: Median Satisfaction Score <= 3
Alternate Hypothesis: Median Satisfaction Score > 3
Test statistic: 2.263394, where S = 74
Critical value at 0.05: 1.645
P value (one-tailed): 0.0118

Since the p-value of 0.0118 is less than 0.05, we reject the null hypothesis.
At the 5% level of significance, we have sufficient evidence that the population median
is greater than 3, which suggests that Starbucks is successful in satisfying its customers.
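
The sign test figures can be reproduced with a short sketch. It assumes the continuity-corrected large-sample statistic z = ((S - 0.5) - 0.5n) / (0.5 * sqrt(n)), which is consistent with the reported value of 2.263394 for S = 74 and n = 122. The function name `large_sample_sign_test` is hypothetical, and the upper-tail p-value is taken from the standard normal distribution.

```python
import math


def large_sample_sign_test(s, n):
    """Return (z, one-tailed p-value) for the large-sample sign test."""
    # Continuity-corrected normal approximation to Binomial(n, 0.5).
    z = ((s - 0.5) - 0.5 * n) / (0.5 * math.sqrt(n))
    # Upper-tail area of the standard normal distribution beyond z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Frequency table of satisfaction scores from the survey.
frequencies = {1: 1, 2: 4, 3: 43, 4: 51, 5: 23}
hypothesized_median = 3

n = sum(frequencies.values())
# S counts the responses strictly above the hypothesized median.
s = sum(count for score, count in frequencies.items()
        if score > hypothesized_median)

z, p_value = large_sample_sign_test(s, n)
print(n, s, round(z, 6), round(p_value, 4))
```

The sketch keeps n = 122, as in the slides. A common variant drops the observations equal to the hypothesized median before counting, which would change both n and z.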
Test of Homogeneity

Chi-Square Test of Homogeneity


▫ Tests the claim that different populations (or subgroups
of a population) share the same proportions of specified
characteristics
▫ The null hypothesis states that the distribution of the
categorical variable is the same for the populations (or
subgroups).
▫ The alternative hypothesis says that the distributions
differ.

Chi-Square Test of Homogeneity - Conditions


▫ For each population, the sampling method is simple
random sampling
▫ The variable under study is categorical
▫ If sample data are displayed in a contingency table
(Populations x Category levels), the expected frequency
count for each cell of the table is at least 5

Contingency Table
            Satisfied    Dissatisfied    Total
Service     117          5               122
Ambiance    113          9               122
Total       230          14              244

Satisfied: Satisfaction score >= 3
Dissatisfied: Satisfaction score < 3

Expected Values
            Satisfied    Dissatisfied    Total
Service     115          7               122
Ambiance    115          7               122
Total       230          14              244

Satisfied: Satisfaction score >= 3
Dissatisfied: Satisfaction score < 3

Chi-Square Test Statistic


            Satisfied      Dissatisfied    Total
Service     0.034782609    0.571429        0.60621118
Ambiance    0.034782609    0.571429        0.60621118
Total       0.069565217    1.142857        1.21242236
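
The expected values and the test statistic can be recomputed from the contingency table. The sketch below assumes the standard rule, expected count = row total x column total / grand total, and sums (observed - expected)^2 / expected over all cells. The variable names are illustrative.

```python
# Observed counts from the contingency table: [satisfied, dissatisfied].
observed = {"Service": [117, 5], "Ambiance": [113, 9]}

row_totals = {name: sum(row) for name, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Expected count for each cell = row total * column total / grand total.
expected = {
    name: [row_totals[name] * col / grand_total for col in col_totals]
    for name in observed
}

# Chi-square statistic: sum of (observed - expected)^2 / expected over cells.
chi_square = sum(
    (o - e) ** 2 / e
    for name in observed
    for o, e in zip(observed[name], expected[name])
)

# Degrees of freedom = (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)
print(expected, round(chi_square, 8), df)
```

Every expected count is at least 5, so the expected-frequency condition for the test is met.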



Application of the test


Null Hypothesis: Customers' ratings for the service and for the
ambiance are homogeneous with respect to being satisfied
Alternate Hypothesis: Customers' ratings for the service and for the
ambiance are not homogeneous with respect to being satisfied
Degrees of Freedom: 1
Significance level: 0.05
Critical value at 0.05: 3.841
Test Statistic: 1.21242236
Decision Rule: Reject the null hypothesis if the test
statistic is greater than 3.841

Since the test statistic (1.212422) < the critical value (3.841), we fail to reject
the null hypothesis. The data give no evidence that the proportion of satisfied
customers differs between service and ambiance, which is consistent with the claim
that customers are equally satisfied with the service and the ambiance of Starbucks.
2 Sample Z Test

2 Sample test of hypothesis (4 major cases)


1. Normally/non-normally distributed populations, population
standard deviations known, samples are large and independent
2. Normally/non-normally distributed populations, population
standard deviations not known, samples are large and
independent
3. Normally distributed populations, population standard deviations
are known, samples are small and independent
4. Normally distributed populations, population standard deviations
are not known, samples are paired and dependent

Understanding the sample


Gender    Respondents
Male      57
Female    65

Score    Female Frequency    Male Frequency
1        1                   5
2        5                   2
3        13                  17
4        25                  15
5        21                  16
Test for 2 independent sample means: Large Samples, Standard Deviations not Known
Strictly, the t-distribution is the right distribution whenever the population standard
deviations are not known. When both samples are large, however, we can use the z-test,
because the critical t-value is then only marginally higher than the corresponding z-value
and can be approximated by it.

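Test statistic: z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2), where x̄1 and x̄2 are the sample means, s1 and s2 are the sample standard deviations, and n1 and n2 are the respective sample sizes of the two samples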
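
A large-sample comparison of two means can be sketched as below. This is an illustration under stated assumptions, not the authors' calculation. It assumes the usual statistic z = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2) with sample standard deviations, a two-tailed alternative, and the score frequency tables exactly as printed in this deck. The male frequencies as printed sum to 55 rather than 57, so the sketch takes each sample size from its own table. The function names are hypothetical.

```python
import math


def summarize(freq):
    """Return (n, mean, sample variance) for a {score: frequency} table."""
    n = sum(freq.values())
    mean = sum(score * count for score, count in freq.items()) / n
    variance = sum(count * (score - mean) ** 2
                   for score, count in freq.items()) / (n - 1)
    return n, mean, variance


def two_sample_z(freq1, freq2):
    """Return (z, two-tailed p-value) for H0: the two means are equal."""
    n1, mean1, var1 = summarize(freq1)
    n2, mean2, var2 = summarize(freq2)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2) / standard_error
    # Two-tailed p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Score frequency tables as printed (assumption: transcribed without error).
female = {1: 1, 2: 5, 3: 13, 4: 25, 5: 21}
male = {1: 5, 2: 2, 3: 17, 4: 15, 5: 16}

z, p_value = two_sample_z(female, male)
print(round(z, 3), round(p_value, 3))
```

At the 0.05 level, the null hypothesis of equal means would be rejected if |z| exceeds 1.96, or equivalently if the two-tailed p-value is below 0.05.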