Lecture 5 Inferential Stat

This document discusses inferential statistics and hypothesis testing. It defines key concepts such as: - Hypothesis: an educated guess about the relationship between variables, or the status of a situation, based on available facts; examples of different types of hypotheses are given. - Hypothesis testing: using sample data to test claims about population parameters; it involves stating the null and alternative hypotheses, choosing a significance level, computing a test statistic, and deciding whether to reject or fail to reject the null hypothesis. - Type I and II errors: rejecting a true null hypothesis is a Type I error; failing to reject a false null hypothesis is a Type II error. Common parametric and nonparametric statistical tests are also presented.

INFERENTIAL STATISTICS

[Diagram: inferences and generalizations are drawn from a smaller set (a sample of n units/observations) to a larger set (a population of N units/observations).]
Hypothesis Testing
What is a hypothesis?
A hypothesis is an educated guess. It is a
conjecture or proposition about the solution to the
problem, a tentative relationship between two or more
variables, or the status of a situation based on the
available facts or information that the researcher
already knows.

For example, the research problem is:


Is there a relationship between achievement in
mathematics and interest in mathematics of students
in secondary school?
The hypothesis might be:
• There is no significant relationship between
achievement in mathematics and interest in
mathematics of students in secondary school.
(Null and non-directional) or
• There is a significant relationship between
achievement in mathematics and interest in
mathematics of students in secondary school.
(Alternative and non-directional) or
• There is a positive relationship between
achievement in mathematics and interest in
mathematics of students in secondary school.
(Alternative and directional)
Examples of directional hypotheses:
1. Male students score higher in risk-taking
than female students.
2. Single and young teachers tend to be more
innovative in teaching than married and old
teachers.
3. The science achievement of high-ability
students exceeds that of average-ability
students.
4. As a teacher’s salary increases, his
perception of administrative personnel
also improves.
Examples of non-directional hypotheses:
1. Faculty morale is related to the frequency of
promotions.
2. There is no relationship between attitude towards
science and achievement in science.
3. The mathematics achievement of high-ability
students equals that of average-ability students.
4. There is no change in the pupils’ behaviour before
and after attending the Summer Camp.
5. There is no difference between young and single
teachers and old and married teachers in their
commitment to professional growth.
Hypothesis Testing

 Is also called significance testing


 Tests a claim about a parameter using evidence
(data in a sample)
Type I and Type II Errors
 A Type I (α) error occurs when the null
hypothesis is rejected when in fact it is true.
(Rejecting the null hypothesis when it is
true.)
 A Type II (β) error occurs when the null
hypothesis is not rejected when in fact it is
false. (Accepting the null hypothesis when
it is false.)
DECISION ERRORS
Two types of decision errors:
Type I error = erroneous rejection of true H0
Type II error = erroneous retention of false H0

                          Truth
Decision        H0 true                H0 false
Retain H0       Correct retention      Type II error
Reject H0       Type I error           Correct rejection

α ≡ probability of a Type I error

β ≡ probability of a Type II error
Significance Level
• The significance level, usually denoted by α,
reflects the degree of certainty we require in
order to reject the null hypothesis in favor of
the alternative hypothesis.
• The .05 and .01 levels of significance are
most frequently used.
Note:
A 0.05 level of significance implies that we
are willing to commit an error of 5%, and
therefore have a confidence level of 95%.
Critical Value
• The critical value, or tabular value, for
the hypothesis test is a threshold to which
the value of the test statistic in a sample is
compared to determine whether or not the
null hypothesis is rejected.
• We reject the null hypothesis if the
computed value is greater than or equal to
the critical value.
General Rule

If p-value < α, or Test Stat Value > Test
Stat Critical Value:
[Decision: Reject Ho, and accept Ha]

Otherwise: [Decision: Ho is not rejected]
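
The p-value side of this rule can be sketched in code (a minimal illustration; the function name and the example p-values are our own):

```python
# Minimal sketch of the p-value decision rule (illustrative only).
def decide(p_value, alpha=0.05):
    """Reject Ho when the p-value falls below the significance level alpha."""
    return "Reject Ho" if p_value < alpha else "Ho is not rejected"

print(decide(0.03))  # p < .05, so Ho is rejected
print(decide(0.20))  # p >= .05, so Ho is not rejected
```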


Steps in hypothesis testing
1. State the null and alternative hypotheses (Ho
and Ha)
2. Identify the level of significance
3. Determine the test statistic (statistical test) to
be used (would it be a test of difference or a
test of relationship?)
4. Perform the computation. Compare the
computed value with the critical value (others
use the p-value instead)
5. Make the decision (reject the null
hypothesis or fail to reject it).
Remarks:

 The null hypothesis (H0) is a claim of “no
difference in the population”
 The alternative hypothesis (Ha) claims “H0
is false”, meaning a difference exists.
Example

 The problem: In the 1990s, 20–29 year old men in
the Philippines had a mean body mass index (BMI)
μ of 20. The standard deviation σ was 3. We test whether
mean BMI in the population now differs.

 Null hypothesis H0: μ = 20 (“no difference”)

 The alternative hypothesis can be either
Ha: μ > 20 (one-tailed test) or
Ha: μ ≠ 20 (two-tailed test)
Exercises
State the null and alternative hypotheses for each of the
following problem statements.
1. Is there significant difference between the pretest
scores and posttest scores of the respondents?
2. Is there significant difference among three treatments
in terms of glucose level of mice?
3. Is there significant difference between the mathematics
grades of male and female respondents?
4. Is there significant difference on the weight of the
participants before and after the experiment?
5. Is there significant difference in the mean lifetimes of
the four brands of batteries?
Alternative Hypothesis (Ha)          Type of Statistical Test

Non-Directional:                     Two-tailed test
 μ1 ≠ μ2 or μ ≠ μo

Directional:                         One-tailed test
 μ1 > μ2, μ1 < μ2, μ > μo, or
 μ < μo
Test Statistic
FREQUENTLY USED INFERENTIAL STATISTICAL TOOLS

PARAMETRIC (interval/ratio level of measurement):
• Single sample: t test for single sample, Z test
• Two related samples: paired t test
• Two independent samples: t test for independent samples
• More than two related samples: ANOVA for repeated measures
• More than two independent samples: ANOVA F-test
• Correlational measures: Pearson r

NON-PARAMETRIC (ordinal level of measurement):
• Single sample: Kolmogorov-Smirnov one-sample test
• Two related samples: Sign test, Wilcoxon matched-pairs signed-ranks test
• Two independent samples: Mann-Whitney U test, Wald-Wolfowitz runs test
• More than two related samples: Friedman Rank Test
• More than two independent samples: Kruskal-Wallis H Test
• Correlational measures: Spearman rank order correlation

NON-PARAMETRIC (nominal level of measurement):
• Single sample: Chi-square one-sample test
• Two related samples: McNemar test
• Two independent samples: Chi-square test for independent samples with two subclasses
• More than two independent samples: Chi-square test with more than two subclasses
• Correlational measures: Phi Coefficient, Yule’s Q
Parametric tests (means)    Nonparametric tests (medians)

1-sample t test             1-sample Sign, 1-sample Wilcoxon
2-sample t test             Mann-Whitney test
One-Way ANOVA               Kruskal-Wallis, Mood’s median test
                            Friedman test
Non-Parametric Tests

Nonparametric statistics refers to statistical
methods in which the data are not required to fit a
normal distribution. Nonparametric statistics
often uses ordinal data, meaning it relies not
on the numbers themselves but on a ranking or
ordering of the values.
Reasons to Use Parametric Tests

1. Parametric tests can perform well with skewed and


nonnormal distributions provided they satisfy the
sample size guidelines.

Sample size guidelines for parametric analyses with nonnormal data:

• 1-sample t test: sample size greater than 20
• 2-sample t test: each group should be greater than 15
• One-Way ANOVA:
  - If you have 2-9 groups, each group should be greater than 15.
  - If you have 10-12 groups, each group should be greater than 20.
2. Parametric tests can perform well when
the spread of each group is different

3. Parametric tests have greater statistical power
Some Parametric Tests
Tests of Difference

• t-test for dependent samples


(paired)
• t- test for independent samples
• Z-test for one sample
• Z-test for two sample means
• F-test or the Analysis of Variance
t-test for Dependent Samples

• test applied to one group of samples


• used in the evaluation of a certain
program or treatment
When do we use the t-test for dependent
samples?

• “applied when the mean before and


mean after are being compared”

• Pretest → Intervention → Posttest


Why do we use the t-test for dependent
samples?
• t-test for dependent samples is used
to find out if difference exists
between the before and after means.

• If there is a difference in favor of


the posttest, then the treatment or
intervention is effective.
Example:

An experimental study was conducted


on the effect of programmed materials
in English on the performance of 20
selected college students. Before the
program was implemented the pretest
was administered and after 5 months,
the same instrument was used to get
the posttest result of the experiment.
Test to be used:

•t-test for dependent samples will be


used to determine if a significant
difference exists between the pretest
mean score and the posttest mean
score (that is to test if the program
was effective).
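
As a sketch of how this test runs in practice, here is a paired t-test on hypothetical pretest and posttest scores for the 20 students (the scores are invented for illustration, and `scipy` is assumed to be available):

```python
from scipy import stats

# Hypothetical pretest and posttest scores for the 20 students (invented data).
pretest  = [12, 15, 11, 14, 13, 10, 16, 12, 14, 11,
            13, 15, 12, 10, 14, 13, 11, 15, 12, 14]
posttest = [15, 18, 14, 16, 15, 13, 19, 14, 17, 13,
            16, 18, 14, 13, 17, 15, 14, 18, 15, 16]

# t-test for dependent (paired) samples
t_stat, p_value = stats.ttest_rel(pretest, posttest)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject Ho: the pretest and posttest means differ significantly.")
```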
Sample Problem Statements

• Is there a significant difference


between the beliefs of the teachers
before and after the training ?

• Is the slimming tea effective in reducing


body weight?
t-test for Independent Samples

- test applied to compare two


independent groups and when the
sample size in each group is less than
30.
Example:
Two groups of experimental rats were
injected with a tranquilizer at 1.0 mg and
1.5 mg doses, respectively.

The t-test for independent samples will be
used to test whether the difference in dosage
has an effect on the length of time it
took the rats to fall asleep.
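
A sketch of this test on invented sleep-onset times for the two dose groups (the numbers are illustrative only; `scipy` is assumed):

```python
from scipy import stats

# Hypothetical minutes until sleep for each dose group (invented data).
dose_1_0_mg = [9.8, 10.2, 11.0, 9.5, 10.7, 10.1, 9.9, 10.4]
dose_1_5_mg = [8.1, 7.9, 8.6, 8.3, 7.5, 8.0, 8.4, 7.8]

# t-test for two independent samples (each group is smaller than 30)
t_stat, p_value = stats.ttest_ind(dose_1_0_mg, dose_1_5_mg)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```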
Sample Problem Statements

1. Is there significant difference between


the performance of male and female
participants in mathematics?
2. Is there a significant difference
between teaching method A and
teaching method B in terms of attitude?
Z-Test
• used to determine whether two population
means are different when the variances or
standard deviations are known and the
sample size is large (n>30).

• tests the mean of a normally distributed


population with known variance or
standard deviation
Examples:

1. The manager of a candy manufacturer wants to


know whether the mean weight of a batch of
candy boxes is equal to the target value of 10
ounces. From historical data, they know that the
filling machine has a standard deviation of 0.5
ounces.
2. According to the DOLE, full-time graduate students
receive an average salary of P12,800. The dean
of graduate studies at a large state university
claims that his graduate students earn more than
this. He surveys 46 randomly selected students
and finds their average salary is P13,445 with a
standard deviation of P1,800. With alpha = 0.05,
is the dean’s claim correct?
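
For problem 2, the z statistic follows directly from the figures given (all values are from the problem statement):

```python
import math

# z = (sample mean - claimed mean) / (standard deviation / sqrt(n))
x_bar, mu0, s, n = 13445, 12800, 1800, 46
z = (x_bar - mu0) / (s / math.sqrt(n))
print(f"z = {z:.2f}")  # z ≈ 2.43

# One-tailed test at alpha = 0.05: the critical value is 1.645
print("Reject Ho" if z > 1.645 else "Fail to reject Ho")
```

Since 2.43 > 1.645, the null hypothesis is rejected and the dean's claim is supported at the 0.05 level.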
F-test or the Analysis of Variance
(ANOVA)

 test applied to compare means of two or


more groups of independent samples.
 The analysis of variance is used to test the
hypothesis that the means of three or more
populations are the same against the
alternative hypothesis that not all population
means are the same.
 It is called the analysis of variance because
the test is based on the analysis of variation
in the data obtained from different samples.
a. One-Way ANOVA
- used when there is only one factor
involved.
Example:
Suppose we have teachers at a school
who have devised three different methods to
teach arithmetic. They want to find out if these
three methods produce different mean scores.
Remark:
In testing for the equality of mean arithmetic
scores of students taught by each of the three
different methods, we are considering only one
factor, which is the effect of different teaching
methods on the scores of students.
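
A sketch of this one-way ANOVA on invented arithmetic scores for the three methods (the scores are illustrative only; `scipy` is assumed):

```python
from scipy import stats

# Hypothetical arithmetic scores under each teaching method (invented data).
method_a = [85, 88, 90, 84, 87, 86]
method_b = [78, 80, 75, 79, 77, 81]
method_c = [82, 85, 83, 80, 84, 81]

# One-way ANOVA: do the three methods produce different mean scores?
f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```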
Suppose that more than one teacher
teaches arithmetic using these three
methods; we can then analyze the effects of
teachers and teaching methods on the
scores of students. This is done by using
a two-way ANOVA.
Factor 1: Teaching method
Factor 2: Teacher

b. Two-Way ANOVA
- used when there are two factors
involved.
Sample Problem
A study needs to investigate the effect of Open-
ended teaching approach on the Mathematics
achievement of students with different learning
abilities

Independent Variables:
Teaching approach
(Open-ended and Traditional)
Learning Abilities (Low, Average, and High)

Hence, the problem will use a 2x3 Analysis of


Variance
2. Test of Relationships

 Consider Galton’s
data on heights of
fathers and first-born
sons
 Tall fathers tend to
have tall sons; short
fathers tend to have
short sons.
Correlation Analysis

 Used to measure strength of association (linear


relationship) between two numerical variables
 Only concerned with strength of the relationship
 No causal effect is implied
2. Tests of Relationship

• The Pearson Product-Moment
Correlation Coefficient r
• Chi-square Test of
Independence
Sample of Observations from Various r
Values

[Scatterplots of Y against X illustrating r = -1, r = -.6, r = 0, r = .6, and r = 1]
• The Pearson Product-Moment
Correlation Coefficient r

- used to analyze whether a relationship exists
between two variables (measured on the
interval or ratio scale), say variable x
and variable y.

Example:
Suppose we want to know if midterm
grades are significantly related to
final grades.
Remark:
If a relationship exists
between x and y, then we can determine
the extent to which x influences y using
the coefficient of determination, which is
equal to the square of r multiplied by
100%.
The coefficient of determination explains
how much the independent variable
influences the dependent variable.
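
A sketch of Pearson r and the coefficient of determination on invented midterm and final grades (illustrative data only; `scipy` is assumed):

```python
from scipy import stats

# Hypothetical midterm and final grades (invented data).
midterm = [75, 82, 90, 68, 85, 78, 92, 70]
final   = [78, 85, 93, 70, 88, 80, 95, 74]

r, p_value = stats.pearsonr(midterm, final)
r_squared = r ** 2
print(f"r = {r:.3f}, p = {p_value:.4f}")
print(f"Midterm grades explain {r_squared * 100:.1f}% of the variation in final grades.")
```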
Chi-Square Test of Independence
- used to determine the association between
two variables measured in the nominal scale

Examples
1. We may want to test if there is a
relationship between IQ level and music
preference.
2. We may want to test if there is an
association between being a man or a
woman and having a preference for
watching sports or soap operas on
television.
Example of a Contingency Table

                          IQ
Music Preference   High   Medium   Low   Row Total
Classical            40       26    17          83
Pop                  47       59    25         131
Rock                 83      104    79         266
Column Total        170      189   121         480
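
The chi-square test of independence can be run on these observed counts (a sketch; `scipy` is assumed to be available):

```python
from scipy.stats import chi2_contingency

# Observed counts from the contingency table above
# (rows: Classical, Pop, Rock; columns: High, Medium, Low IQ).
observed = [[40, 26, 17],
            [47, 59, 25],
            [83, 104, 79]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
# df = (rows - 1) * (columns - 1) = (3 - 1) * (3 - 1) = 4
```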
Why Use Chi-Square Tests of
Independence
• A doctor who knows that hypertension
depends on smoking habits can tell his
smoking patients what they should do.

• If the traffic condition (light, moderate, heavy,


standstill) is found to be dependent on vehicle
plate numbers (odd, even) a traffic officer may
decide to revise traffic law enforcement.

• If poverty status of households is found to be


correlated with family size, government ought
to adopt a viable poverty management
program.
Exercises:
State what inferential statistical tool is best
suited for the following problems.

1. A study aims to compare the achievement of


the students taught by problem solving method
and students taught by traditional method.

2. A researcher wonders if interest in sport


programs (measured as High, Moderate, or
Low) is significantly related to social class
classified as working class, middle class and
upper class.
3. A study is conducted on the
relationship of number of absences and
grades of students in English.
4. A study aims to determine if a certain
jogging program can improve the self –
esteem of a person in less than 3 weeks.

5. A study aims to find out which of the


three treatments is the best in decreasing
body weight.
Simple Linear Regression
- This statistical procedure is concerned with
prediction or forecasting
- It is a statistical method that allows us to
summarize and study relationships between two
continuous (quantitative) variables.
- In simple linear regression, we predict scores
on one variable (dependent) from the scores on
a second variable (independent).
- The variable we are predicting is called the
criterion variable and is referred to as Y. The
variable we are basing our predictions on is
called the predictor variable and is referred to as
X.
 The goal of the analyst who studies the data is
to find a functional relation between the
response variable y and the predictor
variable x.
Some examples of statistical relationships might include:

1. Height and weight — as height increases, you'd expect


weight to increase, but not perfectly.
2. Alcohol consumed and blood alcohol content — as
alcohol consumption increases, you'd expect one's
blood alcohol content to increase, but not perfectly.
3. Vital lung capacity and pack-years of smoking — as
amount of smoking increases (as quantified by the
number of pack-years of smoking), you'd expect lung
function (as quantified by vital lung capacity) to
decrease, but not perfectly.
4. Driving speed and gas mileage — as driving speed
increases, you'd expect gas mileage to decrease, but not
perfectly.
Interpretation of the Regression Model

Regression coefficients represent the mean


change in the response variable for one unit of
change in the predictor variable while holding
other predictors in the model constant. The key
to understanding the coefficients is to think of
them as slopes, and they're often called slope
coefficients.
This table provides the R and R2 values. The R value
represents the simple correlation and is 0.873 (the "R"
Column), which indicates a high degree of correlation.
The R2 value (the "R Square" column) indicates how
much of the total variation in the dependent
variable, Price, can be explained by the independent
variable, Income. In this case, 76.2% can be explained.
The ANOVA table indicates that the regression model
predicts the dependent variable significantly well.
Regression Model:

Price = 8287 + 0.564(Income)
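
Given the fitted model, prediction is a direct substitution (a sketch using the reported coefficients; the function name and the income value are our own):

```python
# Prediction from the fitted model Price = 8287 + 0.564 * Income.
def predict_price(income):
    return 8287 + 0.564 * income

# E.g., for an Income of 10000: 8287 + 0.564 * 10000 = 13927
print(round(predict_price(10000), 2))
```

The slope 0.564 means that each one-unit increase in Income is associated with a mean increase of 0.564 in Price.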
