Presentation 1


University of Kelaniya, Sri Lanka

Revision Session 2
MBA 52063 - Business Statistics
Master of Business Administration (MBA)
Dr. Indumathi Welmilla
Structure of the question paper
• Number of questions: 07
• Answer any 5 questions
• Time: 3 hours
Focus topics
1. Introduction: population, sampling techniques, types of data collection, data collection methods
2. Probability distributions
3. Sampling and estimation: confidence intervals (interpretation), hypothesis testing (application and interpretation)
4. Correlation analysis
5. Regression analysis
6. Parametric tests: independent samples t-test, one-way ANOVA
7. Non-parametric tests: brief introduction to parametric and non-parametric tests, criteria for selecting data analysis techniques, Chi-square test (interpretation)
Model q 1
A company wants to know how effective its training programme is. The company claims that, after the training programme, each employee's performance should increase by at least 70%. Suppose the sample size was 50 and the computed 95% confidence interval is 65.5 to 72.8.

i. Interpret the above CI results.

We are 95% confident that the mean performance increase of all employees of the company (who participated in the training) is between 65.5% and 72.8%. The interval contains the 70% target, so the data are consistent with the company's claim. However, the interval also includes values below 70%, so it does not prove that the mean increase is at least 70%.

General template: We are (level of confidence) confident that the mean (variable name) of all (population) is between (values).
Model q 1
A company wants to know how effective its training programme is. The company claims that, after the training programme, each employee's performance should increase by at least 70%. Suppose the sample size was 50 and the computed 95% confidence interval is 65.5 to 72.8.

ii. If the company repeats this process and constructs 40 intervals, each from a separate independent sample, can we expect about 38 of those intervals to contain the true mean increase in employees' performance? (Explain your answer.)
Yes.
A 95% level of confidence means that if 100 confidence intervals were constructed from different samples of the same population, we would expect about 95 of them to contain the population mean. In this case, 38 of 40 is 95%.
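The arithmetic behind this answer can be checked with a short sketch. The function name `expected_covering_intervals` is a hypothetical helper introduced only for illustration; it multiplies the confidence level by the number of intervals.

```python
def expected_covering_intervals(confidence_level, n_intervals):
    """Expected number of intervals that contain the population mean."""
    return confidence_level * n_intervals

# 95% confidence and 40 independent intervals, as in the question
print(expected_covering_intervals(0.95, 40))
```

This is a long-run expectation: in any particular set of 40 intervals the actual count may be slightly higher or lower than 38.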
Model q 1
A company wants to know how effective its training programme is. The company claims that, after the training programme, each employee's performance should increase by at least 70%. Suppose the sample size was 50 and the computed 95% confidence interval is 65.5 to 72.8.

iii. Is the statement "About 95% of employees' performances are increased" true or false? Explain.
False. The confidence level does not tell us what percentage of employees improved.
• The level of confidence of a CI is the percentage of intervals that would contain the population parameter if a large number of repeated samples were obtained.
• A confidence interval does not predict with a given probability that the parameter lies within the interval.
A university wants to determine whether its new curriculum and teaching methods are effective. The university suggests that they are effective if the students' pass rate is greater than 70%. Suppose the data are normally distributed.

(i) State the hypotheses for this situation.

H1: The students' average pass rate is greater than 70%.
H0: The students' average pass rate is less than or equal to (not greater than) 70%.
Or
Let π = the pass rate (proportion).
H0: π ≤ 70%
H1: π > 70%

(ii) Are the hypotheses directional or non-directional? Explain your answer.

Directional, because H1 predicts the direction of the difference (greater than 70%).

(iii) For the answer to part (ii) to be the other option (non-directional), how would the university's criterion have to change?
The criterion would have to drop the direction: the new curriculum and teaching methods have an effect if the students' pass rate is different from 70% (H0: π = 70%, H1: π ≠ 70%).
A university wants to determine whether its new curriculum and teaching methods are effective. The university suggests that they are effective if the students' pass rate is greater than 70%. Suppose the data are normally distributed.

(iv) The calculated p-value for the hypothesis test was 0.03. At the 5% significance level, do you reject or fail to reject the null hypothesis? Give a reason for your answer.
Since the p-value (.03) is less than the significance level (.05), the null hypothesis should be rejected.

(v) Interpret the given p-value in your own words.

The p-value is the probability of obtaining a sample result at least as extreme as the one observed, assuming the null hypothesis is true. Here, if the true pass rate were not greater than 70%, there would be only a 3% chance of observing a sample pass rate this high. It is not the probability that the null hypothesis is true.

(vi) What is your conclusion for the given scenario at the 5% level of significance?
At the 5% level of significance, there is enough evidence to conclude that the new curriculum and teaching methods are effective.

(vii) If you use 1% as the significance level, does your conclusion change? If so, what is the new conclusion?
Yes, it changes. Since the p-value (.03) is greater than the significance level (.01), we fail to reject the null hypothesis.
At the 1% level of significance, there is not enough evidence to conclude that the new curriculum and teaching methods are effective.
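The decision rule used in parts (iv) and (vii) can be written as a small sketch. The function `decide` is a hypothetical helper, not part of the original slides; the p-value and significance levels are those given in the question.

```python
def decide(p_value, alpha):
    """Return the test decision for a given p-value and significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

p_value = 0.03  # p-value given in the question
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same p-value leads to different decisions because the significance level is the threshold chosen before the test, not a property of the data.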
LO 1. Explain the statistical tests, and parametric and non parametric tests

Types of statistical tests

Parametric tests
• T-tests
  - One sample
  - Two samples: paired groups, unpaired groups
• ANOVA: one-way, two-way
• Pearson's correlation
• Regression

Non-parametric tests
• Chi-square test
• Mann-Whitney U test
• Kruskal-Wallis test
• Wilcoxon signed-rank test
• Spearman ρ test

Indumathi Welmilla, MBA, University of Kelaniya


LO 1. Explain the statistical tests, and parametric and non parametric tests

Parametric and nonparametric tests

Statistical tests

Parametric tests
• Specific assumptions are made about the population.
• The test statistic is based on a distribution.
• Variables are measured at the interval or ratio level.
• Complete information about the population is available.

Nonparametric tests
• Not based on assumptions about the population.
• The test statistic is not based on a distribution.
• Variables are measured on a nominal or ordinal scale.
• No information about the population is available.
LO 7. Differentiate paired and unpaired samples

T-test
• T-test for one sample
• T-test for two samples
  - Unpaired two samples / independent two samples
  - Paired two samples / dependent two samples
LO.9- Perform a hypotheses testing using T-test for unpaired two samples

Test normality output

Skewness: .050 (between -.5 and .5)
Kurtosis: .205 (between -.5 and .5)

Kolmogorov-Smirnov (N > 50): if Sig. > .05, the data are normally distributed.
Shapiro-Wilk (N < 50): if Sig. > .05, the data are normally distributed.

In this output, Sig. (.403) > .05, so the data are normally distributed.
Lo. 2 Explain one sample T-test

One sample t-test

• Used to compare the mean of a single group of observations with a specified value.

• In a one sample t-test, the population mean is known or hypothesized. We draw a random sample from the population, compare the sample mean with that population mean, and make a statistical decision as to whether or not the sample mean differs from it.
Lo.4 Perform hypothesis testing using T-test for one sample

Example – One sample T-test

H0: The average age at first degree completion of students in Sri Lankan state universities is 26 years or less.
H1: The average age at first degree completion of students in Sri Lankan state universities is greater than 26 years.
Or
H0: μ ≤ 26
H1: μ > 26
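A minimal sketch of how the test statistic for these hypotheses could be computed, using only the Python standard library. The ages below are invented for illustration and are not from the slides; the critical value 1.895 is the one-tailed 5% value for 7 degrees of freedom taken from a standard t table.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic comparing the sample mean with the hypothesized mean mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical graduation ages, for illustration only
ages = [27, 28, 26, 29, 27, 30, 25, 28]
t_stat = one_sample_t(ages, mu0=26)
critical = 1.895  # one-tailed 5% critical value, df = 7 (t table)
print(f"t = {t_stat:.3f}, reject H0: {t_stat > critical}")
```

Because H1 is directional (μ > 26), only a large positive t value counts as evidence against H0.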
Lo 7. Differentiate paired and unpaired samples

T-test unpaired two samples

An unpaired t-test is used to compare the means of two independent groups.

An unpaired t-test (also known as an independent t-test) is a statistical procedure that compares the averages/means of two independent or unrelated groups to determine if there is a significant difference between the two.
LO.9- Perform a hypotheses testing using T-test for unpaired two samples

T-test for unpaired two samples (Example)

Research question
Are men more satisfied with their jobs than women?
Hypotheses
H0: Men are not more satisfied with their jobs than women (they are less satisfied or equally satisfied).
H0: μm ≤ μw
H1: Men are more satisfied with their jobs than women.
H1: μm > μw
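The following sketch shows one way to compute the independent samples t statistic for this kind of comparison. It assumes equal variances in the two groups (the pooled-variance form); the satisfaction scores are hypothetical and serve only to make the example runnable.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Independent-samples t statistic, assuming equal group variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / std_error

# Hypothetical job satisfaction scores, for illustration only
men = [4, 5, 3, 4, 4]
women = [3, 2, 4, 3, 3]
print(f"t = {pooled_t(men, women):.3f}, df = {len(men) + len(women) - 2}")
```

If the equal-variance assumption is doubtful, statistical packages usually also report a version of the test that does not pool the variances.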
LO 12. Explain T-test for paired two samples and its assumptions

T-test for paired two samples

A paired t-test (dependent t-test) is a statistical test that compares the averages/means of two related groups to determine if there is a significant difference between the two groups.

Paired t-tests are used when the same item or group is tested twice, which is why the test is also known as a repeated measures t-test.
LO.13 Perform a hypotheses testing using T-test for paired two samples

T-test for paired two samples (Example)

Research question
• Is the company training program (program A) effective or not?
H1: There is a significant difference between the employee performance level before and after completing the training program. (H1: μA ≠ μB)
H0: There is no significant difference between the employee performance level before and after completing the training program. (H0: μA = μB)
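A paired t-test works on the differences within each pair, which the sketch below makes explicit. The before and after scores are hypothetical; `paired_t` is an illustrative helper, not a function from any particular package.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic computed from the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical performance scores for the same five employees
before = [60, 65, 70, 58, 62]
after = [66, 70, 72, 65, 67]
print(f"t = {paired_t(before, after):.3f}, df = {len(before) - 1}")
```

The degrees of freedom are the number of pairs minus one, because the test treats the differences as a single sample.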
LO.2- Identify multiple applications where one way ANOVA test
approach is appropriate

One Way ANOVA

The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
LO.2- Identify multiple applications where one way ANOVA test
approach is appropriate

When to use a one-way ANOVA


Use a one-way ANOVA when you have collected data about,
• One categorical independent variable
• The independent variable should have at least three levels (i.e. at
least three different groups or categories).
• One quantitative dependent variable. (Continuous measure)
• ANOVA tells you if the dependent variable changes according to
the level of the independent variable.

LO.2- Identify multiple applications where one way ANOVA test
approach is appropriate

When to use a one-way ANOVA


Example:
Are rich people happier? Do different income classes report a significantly different satisfaction with life?

Independent variable (categorical): income classes
• Group 1 – low income
• Group 2 – average income
• Group 3 – satisfactory income
• Group 4 – high income

Dependent variable (continuous): life satisfaction (can be measured using Likert scales)
LO.4- Perform hypothesis testing by using One way ANOVA test

Example – One-Way ANOVA

Research question
Does age group influence job satisfaction among school teachers in Sri Lanka?

Hypotheses
H0: Job satisfaction does not differ across the age groups of school teachers in Sri Lanka. (H0: μ1 = μ2 = μ3)
H1: Job satisfaction differs across the age groups of school teachers in Sri Lanka. (H1: at least one group mean differs from the others)
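As a rough illustration of what the one-way ANOVA computes, the sketch below derives the F statistic from the between-group and within-group sums of squares. The three groups of scores are hypothetical, and `one_way_anova_f` is an illustrative helper.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical job satisfaction scores for three age groups
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread inside the groups would explain; it does not say which groups differ.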
LO.7- Identify multiple applications where two way ANOVA test
approach is appropriate

Two way ANOVA

A two-way ANOVA is used to estimate how the mean of a quantitative variable changes according to the levels of two categorical variables. Use a two-way ANOVA when you want to know how two independent variables, in combination, affect a dependent variable.
A two-way ANOVA tests the effect of two independent variables on a dependent variable.
It analyzes the main effect of each independent variable on the outcome as well as the interaction between the two independent variables.
LO.7- Identify multiple applications where two way ANOVA test
approach is appropriate
Two way ANOVA
The effective life (in hours) of batteries is compared by material type (1, 2 or 3) and operating temperature: Low (-10˚C), Medium (20˚C) or High (45˚C). 100 batteries are randomly selected from each material type and are then randomly allocated to each temperature level.

IV 1 (main effect): material type (Type 1, Type 2, Type 3)
IV 2 (main effect): operating temperature (low, medium, high)
Interaction effect: material type × operating temperature
DV: effective life (in hours) of batteries
LO.9- Perform hypothesis testing by using Two way ANOVA test

Example- Two way ANOVA

Research question: Is there an interaction between job categories and gender on job stress? (The dependent variable would be "job stress"; the independent variables are gender (two groups: male and female) and job categories (clerical, junior executive, executive).)

Hypotheses
H01: Gender will have no significant effect on job stress.
H02: Job categories will have no significant effect on job stress.
H03: The interaction between gender and job categories will have no significant effect on job stress.
LO.9- Perform hypothesis testing by using Two way ANOVA test

Example- Two way ANOVA

Research question: Is there an interaction between job categories and gender on job stress? (The dependent variable would be "job stress"; the independent variables are gender (two groups: male and female) and job categories (clerical, junior executive, executive).)

Hypotheses
Ha1: Gender will have a significant effect on job stress.
Ha2: Job categories will have a significant effect on job stress.
Ha3: The interaction between gender and job categories will have a significant effect on job stress.
LO 2 Explain the Chi-Square test

Chi-Square Test

The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related).
Example: a researcher wants to know if education level and gender are related for all people in a particular country.

It is performed mainly on frequencies. It determines whether the observed frequencies differ significantly from the expected frequencies.
LO 2 Explain the Chi-Square test

What the Chi-Square test determines

It determines whether the observed frequencies differ significantly from the expected frequencies.

The Chi-Square statistic is commonly used for testing relationships between categorical variables. The null hypothesis of the Chi-Square test is that no relationship exists between the categorical variables in the population; they are independent.
LO 2 Explain the Chi-Square test

Characteristics of data for Chi-Square test

• The data must be in the form of frequencies.
• The frequency data must have a precise numerical value and must be organized into categories or groups.
• The total number of observations must be greater than 20.
LO 2 Explain the Chi-Square test

Example – Chi Squared Test

Research question: Is there an association between students' preference for online or face-to-face instruction and their education level?

Survey items:
Are you an undergraduate or graduate student?
o Undergraduate  o Graduate
Which method of instructional delivery do you prefer?
o Face-to-face  o Online
LO 2 Explain the Chi-Square test

Example – Chi Squared Test

• H0: There is no significant association between students' educational level and their preference for online or face-to-face instruction.

• Ha: There is a significant association between students' educational level and their preference for online or face-to-face instruction.
LO 3 Interpret the results of Chi-Square test

Chi square Output – Interpretation

"Pearson Chi-Square" row: χ²(1) = 2.885, p = .089. This tells us that there is no statistically significant association between students' educational level and their preference for online or face-to-face instruction (p > .05).

Phi and Cramer's V: both are measures of the strength of association. Here the strength of association between the variables is weak (0.24).
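The reported p-value can be reproduced from the chi-square statistic, and the statistic itself can be computed from a table of counts, as the sketch below shows. The 2×2 counts are hypothetical (the slide's own cross-tabulation is not reproduced here), and both helper functions are illustrative. The p-value formula applies to 1 degree of freedom only and uses no continuity correction.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and phi for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)  # strength of association for a 2x2 table
    return chi2, phi

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# p-value for the statistic reported in the output (2.885, df = 1)
print(round(p_value_df1(2.885), 3))

# Hypothetical counts: rows = education level, columns = preferred delivery
chi2, phi = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {chi2:.3f}, phi = {phi:.3f}")
```

Phi rescales the chi-square statistic by the sample size, which is why a statistic can be non-significant yet still be reported with a (weak) strength of association.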
LO 4 Write the conclusion for output of Chi-Square test

Chi Square - decision and conclusions

• The Pearson Chi-Square test of association between students' educational level and their preference for online or face-to-face instruction is not significant, because the significance value (p = .089) is greater than .05.
• Therefore, there is not enough evidence to conclude that there is an association between students' educational level and their preference for online or face-to-face instruction.
• There is not enough evidence to reject the null hypothesis that there is no significant association between students' educational level and their preference for online or face-to-face instruction.
LO 5 Explain the Mann-Whitney U test

Mann-Whitney U test

The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.
LO 5 Explain the Mann-Whitney U test

Mann-Whitney U test- Example

Research question: Does gender affect the level of job satisfaction?

Hypotheses
H0: There is no difference in the level of job satisfaction between genders.

H1: There is a significant difference in the level of job satisfaction between genders.
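The U statistic behind this test can be obtained by counting, for every pair of observations drawn from the two groups, which one is larger. The sketch below does this directly; the scores are hypothetical and `mann_whitney_u` is an illustrative helper, not a library function.

```python
def mann_whitney_u(group_a, group_b):
    """U statistics by direct pair counting (a tie counts as one half)."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical job satisfaction scores, for illustration only
men = [7, 8, 9]
women = [4, 5, 6, 10]
u_men, u_women = mann_whitney_u(men, women)
print(f"U(men) = {u_men}, U(women) = {u_women}, test statistic = {min(u_men, u_women)}")
```

Because only the ordering of the scores is used, the test does not require normally distributed data.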
LO 8 Explain Wilcoxon signed-rank test

WILCOXON SIGNED RANK TEST

The Wilcoxon signed-rank test is a nonparametric statistical test that compares two paired groups.
The test calculates the difference within each pair and analyzes these differences to establish whether the two sets of measurements are statistically significantly different from one another.
LO 8 Explain Wilcoxon signed-rank test

WILCOXON SIGNED RANK TEST

When to use
• As the Wilcoxon signed-rank test does not assume normality in the data, it can be used when this assumption has been violated and the use of the dependent t-test is inappropriate.
• Used when two measurements of the same dependent variable are taken at different time points or under different conditions for each subject.
LO 8 Explain Wilcoxon signed-rank test

Example: Wilcoxon signed rank test

Research question: Does the new teaching method (blended learning) increase the literacy of children?

Hypotheses
• H0: There is no difference between the students' test scores before and after the blended learning session.
• H1: There is a significant difference between the students' test scores before and after the blended learning session.
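A minimal sketch of the ranking step in the Wilcoxon signed-rank test is given below. It drops zero differences and, as a simplifying assumption, requires that no two absolute differences are tied; the test scores are hypothetical.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus); assumes no tied absolute differences."""
    # Differences within each pair, dropping pairs with no change
    diffs = [a - b for a, b in zip(after, before) if a != b]
    # Rank 1 goes to the smallest absolute difference
    ordered = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus

# Hypothetical literacy test scores for the same five students
before = [50, 60, 55, 70, 65]
after = [58, 63, 54, 76, 70]
print(wilcoxon_signed_rank(before, after))
```

The smaller of the two rank sums is usually taken as the test statistic; when most of the rank weight falls on one side, the scores changed in a consistent direction.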
LO 11 Explain Kruskal- Wallis Test

Kruskal-Wallis Test

The Kruskal-Wallis test is a rank-based nonparametric test.

It can be used to determine if there are statistically significant differences between multiple groups.
LO 11. Explain Kruskal- Wallis Test

Kruskal-Wallis Test

It is an alternative to the one-way ANOVA (if the data are normally distributed, use one-way ANOVA; if not, use Kruskal-Wallis).

It is an extension of the Mann-Whitney U test (Mann-Whitney allows the comparison of two independent groups; Kruskal-Wallis allows more than two independent groups).
LO 11. Explain Kruskal- Wallis Test

When to use Kruskal-Wallis Test

1. The data do not meet the requirements for a parametric test (i.e. use it if the data are not normally distributed).
2. You have three or more conditions that you want to compare.
3. Each condition is performed by a different group of participants (i.e. you have an independent-measures design with three or more conditions).
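For three independent groups, the Kruskal-Wallis H statistic can be sketched as follows. The marks are hypothetical and the helper assumes, for simplicity, that no two values in the pooled data are tied.

```python
def kruskal_wallis_h(groups):
    """H statistic; assumes no tied values across the pooled data."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared / group size)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical exam marks for three independent groups
groups = [[55, 60, 62], [65, 70, 71], [80, 85, 90]]
print(f"H = {kruskal_wallis_h(groups):.2f}, df = {len(groups) - 1}")
```

H is compared with a chi-square distribution with (number of groups − 1) degrees of freedom; for 2 degrees of freedom the 5% critical value is 5.991.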
LO 11. Explain Kruskal- Wallis Test

Kruskal-Wallis Test requirements

The dependent variable should be measured at the ordinal or continuous level.
Ordinal variables
• Job performance (questions measured using a Likert scale)
• Job engagement (questions measured using a Likert scale)
Continuous variables
• Intelligence (measured using IQ score)
• Exam performance (measured as marks obtained, from 0 to 100)
• Weight (measured in kg)
LO 14. Explain the criteria for selecting the right statistical test

Criteria for right statistical test

• Purpose of your study
• Level of measurement
• Assumptions about the distribution
• Nature of samples
• Number of groups
LO 14. Explain the criteria for selecting the right statistical test

Criteria for right statistical test

Purpose of the study (purpose / objective / research question)

Relationship / impact
• Parametric: regression, Pearson correlation
• Nonparametric: Spearman rank correlation, Chi-square (categorical variables)

Difference
• Parametric: t-tests, ANOVA
• Nonparametric: Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis test
LO 14. Explain the criteria for selecting the right statistical test

Criteria for right statistical test

Level of measurement
• Quantitative variables (interval, ratio): parametric tests
• Qualitative variables (nominal, ordinal): nonparametric tests
LO 14. Explain the criteria for selecting the right statistical test

Variables
• Nominal: unordered / no ranking, e.g. male/female
• Ordinal: ordered / ranking, e.g. excellent/very good/good/fair/poor teaching performance
• Interval: numbers with known differences between values, e.g. time / temperature
• Ratio: numbers that have measurable intervals, where the difference can be determined, e.g. height / weight
• Continuous: can take any value within a range, e.g. height in feet / weight in kg
• Discrete: whole numerical values / typically counts, e.g. number of students in class
LO 14. Explain the criteria for selecting the right statistical test

Criteria for right statistical test

Assumptions about the distribution
• Normal distribution: parametric tests
• Non-normal distribution: nonparametric tests

LO 14. Explain the criteria for selecting the right statistical test

Criteria for right statistical test

Nature of samples

Independent samples / unpaired groups
• Parametric: independent samples t-test, ANOVA tests
• Nonparametric: Mann-Whitney U test, Kruskal-Wallis test

Dependent samples / paired groups
• Parametric: dependent samples t-test
• Nonparametric: Wilcoxon signed-rank test
LO 14. Explain the criteria for selecting the right statistical test

Criteria for right statistical test

Number of groups in the samples

One / two groups
• Parametric: one sample t-test, two samples t-test (paired/unpaired groups)
• Nonparametric: Mann-Whitney U test, Wilcoxon signed-rank test

More than two groups
• Parametric: ANOVA (one way / two way)
• Nonparametric: Kruskal-Wallis test
LO 8 Interpret the results of hypotheses testing by using multiple regression analysis

Multiple regression output

The first table of interest is the Model Summary table. This table provides the R, R², adjusted R², and the standard error of the estimate, which can be used to determine how well a regression model fits the data.

The "R" column represents the value of R, the multiple correlation coefficient. R can be considered to be one measure of the quality of the prediction of the dependent variable; in this case, WLB. A value of 0.316 indicates a moderate level of prediction.

The "R Square" column represents the R² value (also called the coefficient of determination), which is the proportion of variance in the dependent variable that can be explained by the independent variables (technically, it is the proportion of variation accounted for by the regression model above and beyond the mean model). An R square value of 0.100 means that the independent variables explain 10% of the variability of our dependent variable, WLB. However, you also need to be able to interpret "Adjusted R Square" (adj. R²) to accurately report your data.

7/18/2023 Indumathi Welmilla


LO 8 Interpret the results of hypotheses testing by using multiple regression analysis

Hypotheses testing (multiple regression output)

The F-ratio in the ANOVA table tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F(2, 261) = 14.4, p < .05 (i.e., the regression model is a good fit for the data).
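The F-ratio and the adjusted R² are both linked to R² and the degrees of freedom, which the following sketch illustrates. It uses the reported R square of 0.100 and the degrees of freedom from F(2, 261); the helper names are hypothetical, and because R² is rounded in the output the computed F differs slightly from the reported 14.4.

```python
def f_from_r_squared(r_squared, k, df_residual):
    """Overall F-ratio implied by R squared with k predictors."""
    return (r_squared / k) / ((1 - r_squared) / df_residual)

def adjusted_r_squared(r_squared, k, df_residual):
    """Adjusted R squared; the sample size is recovered as df_residual + k + 1."""
    n = df_residual + k + 1
    return 1 - (1 - r_squared) * (n - 1) / df_residual

r_squared = 0.100        # R square reported in the Model Summary
k, df_residual = 2, 261  # degrees of freedom from F(2, 261)
print(f"F = {f_from_r_squared(r_squared, k, df_residual):.1f}")
print(f"adjusted R squared = {adjusted_r_squared(r_squared, k, df_residual):.3f}")
```

Adjusted R² is always a little lower than R² because it penalizes the model for the number of predictors.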


LO 8 Interpret the results of hypotheses testing by using multiple regression analysis

Hypotheses testing (multiple regression output)

The general form of the equation to predict WLB from work demand and family demand:
Predicted WLB = 3.86 - 0.12 × work demand + .014 × family demand

Unstandardized coefficients indicate how much the dependent variable varies with an independent variable when all other independent variables are held constant.
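The prediction equation can be applied directly, as in this short sketch. The coefficients are those shown on the slide; the function name and the input scores are hypothetical.

```python
def predicted_wlb(work_demand, family_demand):
    """Predicted WLB from the unstandardized coefficients on the slide."""
    return 3.86 - 0.12 * work_demand + 0.014 * family_demand

# Hypothetical scores for one respondent
print(round(predicted_wlb(work_demand=3, family_demand=2), 3))

# Raising work demand by one unit, with family demand held constant,
# changes the prediction by the work demand coefficient (-0.12)
print(round(predicted_wlb(4, 2) - predicted_wlb(3, 2), 3))
```

The second calculation shows what "held constant" means in practice: only one input changes, and the prediction moves by exactly that variable's coefficient.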


LO 8 Interpret the results of hypotheses testing by using multiple regression analysis

Hypotheses testing (multiple regression output)

You can test for the statistical significance of each of the independent variables. This tests whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population. If p < .05, you can conclude that the coefficients are statistically significantly different from 0 (zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns, respectively.


LO 9 Write conclusion for hypotheses testing output ( for multiple regression)

Reporting results: Multiple regression

The results of the multiple regression of the independent variables (work demand and family demand) against the dependent variable (WLB) are shown in the table ….. .

The R square is .100, which indicates that 10% of the variation in WLB is explained by the 2 independent variables jointly. The F value is 14.4, which is significant at the 5% level and suggests that the 2 independent variables together explain a statistically significant share (10%) of the variation in WLB.
