
Week 9

The document outlines the curriculum for a statistics module, focusing on inferential statistics, hypothesis testing, and the comparison of dependent and independent samples. It details the objectives, concepts, and procedures for hypothesis testing, including the significance level and test statistics, as well as methods for analyzing proportions and means. Additionally, it covers chi-square tests for independence and the basics of correlation and regression analysis.

Uploaded by

Hạ Lý

Week 9 – 18/4/2025

First half:

- Different concepts

Second half:

- Inferential statistics
- Presentation applied statistics

Module 4: Inferential Statistics: Hypothesis Testing

Descriptive statistics
• To organize and summarize (sample) data
- Describe data visually and numerically by measurement: e.g., measures of data center, measures of data variation, etc.

Inferential statistics
• To make generalizations about an unknown population from sample data
- Use statistical tests (e.g., t-tests, chi-square tests) to estimate the population parameter from sample data and to test hypotheses

Objectives:

1. Basic concepts of inferences from two samples


2. Hypothesis testing
3. Statistical tests with two samples

Basic concepts of inferences from two samples


Dependent & Independent samples
- Two samples are independent if the sample values from one population are not
related to or somehow naturally paired or matched with the sample values from the
other population.
- Two samples are dependent (matched pairs) if the sample values are somehow
matched, where the matching is based on some inherent relationship.
➔ E.g. Which type of samples are they?
o Writing scores of group A are compared with writing scores of group B →
independent
o Pre-test writing scores and post-test writing scores of group A →
dependent

Group discussion: What are the possible comparisons, and what types of samples are they?

                     Group A   Group B
Pre-test             (1)       (2)
Post-test            (3)       (4)
Delayed post-test    (5)       (6)
- Independent samples:
o (1) & (2)
o (3) & (4)
o (5) & (6)
- Dependent samples:
o (1) & (5)
o (1) & (3)
o (3) & (5)
o (2) & (4)
o (2) & (6)
o (4) & (6)

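                     Group A   Group B   Group C
Pre-test             (1)       (2)       (7)
Post-test            (3)       (4)       (8)
Delayed post-test    (5)       (6)       (9)
- Independent samples: (1) & (2), (3) & (4), (5) & (6), (2) & (7), (4) & (8), (6) & (9), (1) & (7), (3) & (8), (5) & (9)
- Dependent samples: (1) & (3), (1) & (5), (3) & (5), (2) & (4), (4) & (6), (2) & (6), (7) & (8), (7) & (9), (8) & (9)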
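The classification above can be checked mechanically. The sketch below assumes the rule used in the group discussion: samples from the same test occasion but different groups are independent, and samples from the same group at different occasions are dependent. The `grid` layout and the function name `classify_pairs` are illustrative, not part of the course material.

```python
from itertools import combinations

# Sample labels laid out as in the three-group table above:
# rows are test occasions, columns are groups A, B, C.
grid = {
    "Pre-test": {"A": 1, "B": 2, "C": 7},
    "Post-test": {"A": 3, "B": 4, "C": 8},
    "Delayed post-test": {"A": 5, "B": 6, "C": 9},
}

def classify_pairs(grid):
    """Return (independent, dependent) lists of sample-number pairs."""
    independent, dependent = [], []
    # Same occasion, different groups -> independent samples.
    for row in grid.values():
        for a, b in combinations(sorted(row.values()), 2):
            independent.append((a, b))
    # Same group, different occasions -> dependent (matched) samples.
    groups = next(iter(grid.values())).keys()
    for g in groups:
        labels = sorted(row[g] for row in grid.values())
        for a, b in combinations(labels, 2):
            dependent.append((a, b))
    return independent, dependent

independent, dependent = classify_pairs(grid)
print("Independent:", independent)
print("Dependent:", dependent)
```

Comparisons across both group and occasion, such as (1) & (4), also involve independent samples; this sketch only enumerates the same-occasion comparisons.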

Hypothesis Testing
- Hypothesis /haɪˈpɒθ.ə.sɪs/
- Hypotheses /haɪˈpɒθ.ə.sɪz/
Basic concepts
- A hypothesis is a claim or statement about a property of a population.
- A hypothesis test (or test of significance) is a procedure for testing a claim about a
property of a population.

Types of Hypotheses
- What do the following symbols stand for?

= : equal ≠ : different > : more than < : less than

- Null hypothesis: = - Alternative hypothesis: ≠, >, <


H0 vs. HA (≠): two-tailed test
H0 vs. H1 (>) or H2 (<): one-tailed test

➔ Example: Identify H0 and HA of the following claims about the property of the
population:
o The mean score of the online class is 60
H0: μ = 60
HA: μ ≠ 60
H1: μ > 60
H2: μ < 60

- The null hypothesis (denoted by H0) is a statement that the value of a population
parameter (such as proportion, mean, or standard deviation) is equal to some
claimed value.
- The alternative hypothesis (denoted by H1 or Ha or HA) is a statement that the
parameter has a value that somehow differs from the null hypothesis. For the
methods of this chapter, the symbolic form of the alternative hypothesis must use
one of these symbols: <, >, ≠

➔ Example: Identify H0 and HA of the following claims about the property of the
population:
o There is no difference in mean scores between the online class (μ1) and the
F2F class (μ2)
H0: μ1 = μ2
HA: μ1 ≠ μ2
H1: μ1 > μ2
H2: μ1 < μ2
o The variances of the online class (σ1²) and the F2F class (σ2²) are equal
H0: σ1² = σ2²
HA: σ1² ≠ σ2²
H1: σ1² > σ2²
H2: σ1² < σ2²

Procedure for Hypothesis Testing

1. Identify the claim, H0 and Ha

2. Select significance level alpha

3. Identify the test statistics

4. Check required assumptions

5. Calculate estimate of population parameter



Significance level alpha for hypothesis testing

E.g., H0: μ1 = μ2

1. If H0 is true, how do we interpret that?

→ The [means] are equal, or there is no difference in [means].

2. If H0 is rejected, how do we interpret that?

→ H0 is not true; the means are not equal; there is some difference; a difference
exists.

3. A significance level alpha of 0.05 for a hypothesis test indicates a 5% risk of
concluding that a difference exists when, in fact, there is no actual difference
(a Type I error).
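One way to see what the 5% risk means is to simulate many studies in which H0 is true and count how often it is rejected anyway. The sketch below is only an illustration under stated assumptions: both groups are drawn from the same normal population (mean 50, standard deviation 10, both hypothetical), and a two-sample z test with known standard deviation stands in for whichever test a real study would use.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Estimate how often a true H0 (equal means) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    sigma = 10.0  # hypothetical known population standard deviation
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population, so H0 is true.
        g1 = [rng.gauss(50, sigma) for _ in range(n)]
        g2 = [rng.gauss(50, sigma) for _ in range(n)]
        z = (mean(g1) - mean(g2)) / (sigma * (2 / n) ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"Share of false rejections: {rate:.3f}")
```

The printed share should land close to alpha, which is the sense in which alpha is the risk of wrongly concluding that a difference exists.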

➔ Practice: check picture

Identify the test statistic

Statistical tests
- Hypothesis of difference
o Independent samples
o Dependent samples
- Hypothesis of association
o Categorical data
o Ordinal data
o Numerical data
Week 12 – 9/5/2025
Review
The purpose of a quantitative study: to test whether a prediction is true or not

1st step: identify the claim (H0 and Ha)

2nd step: identify the significance level

3rd step: identify the statistical test and whether it is meaningful in your study

4th step: check the required assumptions, i.e., whether your data meet the requirements or not

5th step: check if the result is valid or correct


Find the value of the test statistic & p-value method
Alpha = 0.05

H0: μ = 50

HA: μ ≠ 50

Decide if you can reject or fail to reject H0

a. p = 0.025 < 0.05 → Reject H0
b. p = 0.3 > 0.05 → Fail to reject H0
c. p = 0.000 < 0.05 → Reject H0
d. p = 0.431 > 0.05 → Fail to reject H0
e. p = 0.05 = 0.05 → Reject H0
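The decision rule applied in the exercise above (reject H0 when the p-value is less than or equal to alpha) can be written as a one-line function. This is a minimal sketch; the function name `decide` is illustrative.

```python
def decide(p_value, alpha=0.05):
    """Apply the p-value rule: reject H0 when p-value <= alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"

# The five p-values from the exercise above.
for label, p in zip("abcde", [0.025, 0.3, 0.000, 0.431, 0.05]):
    print(label, p, "->", decide(p))
```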

Final conclusion of Hypothesis Test


E.g., H0: μ1 = μ2

- p-value ≤ alpha
Decision: Reject H0
Conclusion: There is sufficient evidence to support the rejection of the claim that … (H0)
- p-value > alpha
Decision: Fail to reject H0
Conclusion: There is not sufficient evidence to support the rejection of the claim that … (H0)

Confidence Interval Method


• A confidence interval estimate of a population parameter contains the likely values of
that parameter
• Reject H0 if the claimed value of the population parameter is not included in the
confidence interval
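The confidence interval rule reduces to a containment check. The sketch below assumes the interval has already been computed elsewhere; the function name and the two example intervals are hypothetical.

```python
def reject_by_interval(claimed_value, lower, upper):
    """Reject H0 when the claimed value lies outside [lower, upper]."""
    return not (lower <= claimed_value <= upper)

# Hypothetical 95% confidence intervals for a mean, with H0: mu = 50.
outside = reject_by_interval(50, 51.2, 57.8)  # 50 is not in the interval
inside = reject_by_interval(50, 47.5, 54.1)   # 50 is in the interval
print("Reject H0" if outside else "Fail to reject H0")
print("Reject H0" if inside else "Fail to reject H0")
```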

Reporting test statistics & p-value

Module 5: INFERENCES ABOUT PROPORTIONS
Review:

Descriptive Inferential
- -
- -
- -
- -
Identify the test statistic:

Statistical tests
- Hypothesis of difference
o Independent samples: z test for proportions; F test for variances; independent t-test for means
o Dependent samples: paired-sample test for means
- Hypothesis of association
o Categorical data: Chi-square test of independence
o Ordinal data: Spearman's rho
o Numerical data: Pearson's r

Z test for proportion


Proportion
Example: IU University with 2600 students

Number of boys: 1950

Number of girls: 650

1. What is the proportion of the boys at IU?


➔ (1950/2600)x100% = 75%
2. What is the proportion of the girls at IU?
➔ 25%

Example: IU with n students

Number of boys: x1
Number of girls: x2

1. What is the proportion of the boys at IU?
p = x1 / n (as a percentage: p × 100%)
2. What is the proportion of the girls at IU?
q = x2 / n (as a percentage: q × 100%)
q is the complement of p: q = 1 – p
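As a quick check of the complement relationship, the IU example above (2600 students, 1950 boys, 650 girls) can be computed directly. The helper name `proportion` is illustrative.

```python
def proportion(x, n):
    """Proportion of n observations that fall in a category with count x."""
    return x / n

n = 2600                  # students at IU (from the example above)
p = proportion(1950, n)   # boys
q = 1 - p                 # complement: girls
print(f"p = {p:.2f} ({p:.0%}), q = {q:.2f} ({q:.0%})")
```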

Population proportion vs. Sample proportion


Decide if the following is a population proportion or a sample proportion

1. a. The proportion of male students at all universities in HCMC → population proportion
b. The proportion of male students at IU → sample proportion
2. a. The proportion of male students at Le Hong Phong high school → sample proportion
b. The proportion of male students at all high schools in HCMC → population proportion

a. Group 1 (IU) with 2600 students
b. Group 2 (LHP high school) with 2600 students
1. In group 1 (IU), there are 1950 males. What is the proportion of males at IU?
2. In group 2 (LHP high school), there are 1560 males. What is the proportion of male students at
LHP high school?

Population proportion p1 Population proportion p2

Test statistic for two proportions
Check pictures
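The notes leave the formula to a picture. A common textbook form, assumed here rather than taken from the notes, is the pooled two-proportion z statistic: z = (p̂1 − p̂2) / √(p̄ (1 − p̄) (1/n1 + 1/n2)), where p̄ = (x1 + x2) / (n1 + n2). The sketch applies it to the two groups above, treating them as samples; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-tailed p-value for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Counts from the two groups above: IU and LHP high school.
z, p_value = two_proportion_z(1950, 2600, 1560, 2600)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these counts the p-value falls below 0.05, so H0: p1 = p2 would be rejected at that significance level.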

Two variances or Standard Deviations

H0: σ1² = σ2²

HA: σ1² ≠ σ2²

Requirements:
1. The two populations are independent.
2. The two samples are simple random samples.
3. Each of the two populations must be normally distributed, regardless of their
sample size.

Test Statistic:
F = s1² / s2²

p-value ≤ alpha: reject H0 (σ1² ≠ σ2²)

p-value > alpha: fail to reject H0: σ1² = σ2²

Degrees of freedom: numerator df = n1 − 1, denominator df = n2 − 1
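A minimal sketch of the F statistic, using made-up scores for the two classes (the data and the function name are hypothetical). It returns the ratio of sample variances together with the usual degrees of freedom, n1 − 1 for the numerator and n2 − 1 for the denominator. The p-value needs the F distribution, which the Python standard library does not provide, so it is left out here.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """Return F = s1^2 / s2^2 with its numerator and denominator df."""
    s1_sq = variance(sample1)  # sample variance (divides by n - 1)
    s2_sq = variance(sample2)
    return s1_sq / s2_sq, len(sample1) - 1, len(sample2) - 1

# Hypothetical scores for an online class and a face-to-face class.
online = [62, 70, 55, 81, 67, 74, 59, 66]
f2f = [64, 68, 61, 70, 66, 69, 63, 67]
f, df1, df2 = f_statistic(online, f2f)
print(f"F = {f:.2f}, df = ({df1}, {df2})")
```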
Week 13 – 16/5/2025

INFERENCES ABOUT TWO MEANS: DEPENDENT SAMPLES (MATCHED PAIRS)
Dependent samples (within group, paired samples)
                   Before   After   Difference
Population mean    μ1       μ2      μd
Sample mean        x̄1       x̄2      d̄
Sample size        n        n       n

➔ Why n rather than n1, n2, nd? → Because it is the same sample of size n.
➔ If the sample started with 35 participants and 5 people later dropped out of the data set, exclude them (n = 30).
➔ The hypotheses are about the population. Why μ and not x̄? → Because this is inferential
statistics.

H0: μ1 = μ2

HA: μ1 ≠ μ2

Requirements:

1. The sample data are dependent (matched pairs).
2. The matched pairs are a simple random sample.
3. Either the number of pairs is large (n > 30), or the differences of the pairs are approximately normally distributed.

t value: t = (x̄1 − x̄2) / √(sd² / n), where sd is the standard deviation of the paired differences

df = n − 1
Confidence interval

μ1 − μ2 = (x̄1 − x̄2) ± E

E = t(α/2) · √(sd² / n)
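The formulas above can be sketched as follows. The scores are made up, and with only six pairs the sketch assumes that the differences are normally distributed. The critical value 2.571 is the t table entry for df = 5 at 95% confidence, and is passed in by hand because the standard library has no t distribution. Function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df, d_bar, s_d) for matched pairs, testing H0: mu1 = mu2."""
    d = [b - a for b, a in zip(before, after)]  # pairwise differences
    n = len(d)
    d_bar = mean(d)  # equals xbar1 - xbar2
    s_d = stdev(d)   # standard deviation of the differences
    t = d_bar / sqrt(s_d ** 2 / n)
    return t, n - 1, d_bar, s_d

def margin_of_error(t_crit, s_d, n):
    """E = t_(alpha/2) * sqrt(s_d^2 / n); t_crit comes from a t table."""
    return t_crit * sqrt(s_d ** 2 / n)

# Hypothetical pre-test and post-test scores for the same six learners.
pre = [60, 55, 70, 65, 58, 62]
post = [66, 59, 73, 70, 61, 69]
t, df, d_bar, s_d = paired_t(pre, post)
E = margin_of_error(2.571, s_d, len(pre))  # 2.571: t table value, df = 5, 95%
print(f"t = {t:.2f}, df = {df}, CI = ({d_bar - E:.2f}, {d_bar + E:.2f})")
```

If the resulting interval for μ1 − μ2 excludes 0, that agrees with rejecting H0 under the confidence interval method.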

Week 14 - 23/5/2025
Chi-square test for independence
Purpose:

• To compare two categorical variables in a contingency table to see if they are
related (dependent) or unrelated (independent)

Hypotheses:

• H0: the two variables (factors) are independent.


• Ha: The two variables (factors) are dependent.

Scenario: 261 students (50 freshmen, 66 sophomores, 67 juniors, 78 seniors) were surveyed to find out their
ratings of their academic motivation (1 = not at all motivated; 2 = somewhat motivated; 3 =
very motivated; 4 = highly motivated)

H0: Class level and academic motivation are independent (no relationship)

Ha: Class level and academic motivation are dependent (related)

Contingency table

• Number of students per rating level by class level

Class level    1     2     3     4     Total
Freshmen       10    20    14    6     50
Sophomores     13    13    20    20    66
Juniors        16    12    30    9     67
Seniors        24    18    26    10    78

Requirements

• Two categorical variables (nominal or ordinal), e.g., class level (freshmen, sophomores, …) and motivation rating
• Two or more categories (groups) for each variable
• Independence of observations
1. Data Type Requirements

• Categorical variables only (nominal or ordinal).

o Examples: gender, preference, satisfaction level.

• Data must be in frequency counts, not percentages or raw scores.

2. Independence of Observations

• Each observation must be independent.

o One individual = one observation.

o No repeated measures or paired data.

3. Expected Frequencies

• Expected count in each cell should ideally be:

o At least 5 for the Chi-square test to be reliable.

o Acceptable if:

▪ No expected frequency < 1, and

▪ No more than 20% of cells have expected frequency < 5.

4. Sample Size

• Should be large enough to ensure that the approximation to the Chi-square
distribution is valid.

• Small samples (especially in 2x2 tables) may require Fisher’s Exact Test.

5. Random Sampling

• The data should come from a random sample from the population to ensure
generalizability.

➔ In the Chi-square test, one of the key assumptions is that the expected frequency
in each cell of the contingency table should be 5 or more. When expected counts
are less than 5, the test may not be valid due to inaccuracies in the approximation
to the Chi-square distribution. Suggested solution: merge categories that have small expected counts.
Formula for the Chi-square Statistic

• Test statistic

χ² = Σ (Oi − Ei)² / Ei

χ² is the chi-square test statistic

Oi is the observed frequency

Ei is the expected frequency

Hypotheses

H0: The two variables (factors) are independent/There is no relationship between the two
variables

Ha: The two variables (factors) are dependent/There is a relationship between the two
variables.

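df = (number of groups in variable 1 − 1) × (number of groups in variable 2 − 1)

• p-value ≤ α: reject H0 (the two variables are dependent)

• p-value > α: fail to reject H0 (no evidence that the two variables are dependent)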
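To make the procedure concrete, the sketch below computes the expected frequencies, the chi-square statistic, and the degrees of freedom for the class level by motivation table in the scenario. The expected count for a cell is taken as row total × column total / grand total, the standard formula under independence. Because the standard library has no chi-square distribution, the decision uses the critical-value form of the test with the table value 16.919 (alpha = 0.05, df = 9) instead of a p-value. Names are illustrative.

```python
# Observed counts from the contingency table in the scenario
# (rows: class level; columns: motivation rating 1-4).
observed = {
    "Freshmen":   [10, 20, 14, 6],
    "Sophomores": [13, 13, 20, 20],
    "Juniors":    [16, 12, 30, 9],
    "Seniors":    [24, 18, 26, 10],
}

def chi_square(table):
    """Return (chi2, df, expected) for a table given as {row: [counts]}."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = chi_square(observed)
CRITICAL_05_DF9 = 16.919  # chi-square table value for alpha = 0.05, df = 9
print(f"chi2 = {chi2:.2f}, df = {df}")
print("Reject H0" if chi2 > CRITICAL_05_DF9 else "Fail to reject H0")
```

The `expected` table also lets you check the expected-frequency requirement before trusting the result.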

Correlation and Regression


Correlation
A correlation exists between two variables when the values of one variable are somehow
associated with the values of the other variable.

Linear correlation
A linear correlation exists between two variables when there is a correlation and the plotted points of paired
data result in a pattern that can be approximated by a straight line.
Scatter plots
Direction & strength of a linear correlation between two variables

Linear correlation coefficient


• for a sample: r
• for a population: ρ (rho)
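The sample coefficient r can be computed from paired data with the usual product-moment formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below uses made-up data (hours of study and test scores) and an illustrative function name. Values of r near +1 or −1 indicate a strong linear correlation, and values near 0 a weak one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data: hours of study and test score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 60, 68, 71]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```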

b is stronger
