
Chapter6 Tests Relation Variables

The document discusses various statistical tests for examining relationships between variables, including parametric and non-parametric tests. It provides information and examples of Pearson correlation tests, Spearman correlation tests, chi-square tests of independence, t-tests, ANOVA, and Kruskal-Wallis tests. Examples are given of applying these tests along with interpreting results and checking assumptions.

FACULTY OF ECONOMICS AND BUSINESS

CAMPUS BRUSSELS

Statistical Modelling
Tests for relation between 2 variables

Context

                       Parametric             Non-parametric                Non-parametric
                                              (no normality or ordinal)     (nominal)
1 sample               t-test                 Sign test,                    Binomial test
                                              Wilcoxon signed-rank test
2 paired samples       t-test on differences                                /
2 independent samples  t-test                 Mann-Whitney-Wilcoxon test    Chi-square test
More than 2            ANOVA                  Kruskal-Wallis test           Chi-square test
independent samples
Relation between       Pearson correlation    Spearman correlation          Chi-square test
2 variables            (linear relation two   (relation between ordinal     (relation two
                       quantitative           variables)                    qualitative
                       variables)                                           variables)

Chi-square test of independence
• Goal: Evaluate whether there is a statistical relation between two qualitative variables.
o $H_0$: the two variables are independent
o $H_A$: the two variables are dependent
• Method: The chi-square test statistic is based on the counts in the cross-table of the two variables. It measures the distance between
o the observed counts $O_{ij}$
o the expected counts $E_{ij}$ if the two variables are statistically independent

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

with $r$ the number of rows and $c$ the number of columns in the cross-table.
• If $H_0$ is true, the chi-square statistic has a $\chi^2$ distribution with $(r-1)(c-1)$ degrees of freedom.
• Assumptions: (1) all $E_{ij} \geq 1$, (2) not more than 20% of the cells with $E_{ij} < 5$.

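The computation behind the chi-square statistic can be sketched in plain Python. This is only an illustration under stated assumptions: the function names and the 2 x 3 table of counts are hypothetical and do not come from the course data.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for a cross-table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected


def assumptions_ok(expected):
    """Check: all expected counts >= 1 and at most 20% of the cells below 5."""
    cells = [e for row in expected for e in row]
    share_small = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_small <= 0.20


# Hypothetical 2 x 3 table of observed counts (illustration only)
table = [[30, 20, 10],
         [20, 30, 40]]
stat, df, expected = chi_square_independence(table)
print(round(stat, 2), df, assumptions_ok(expected))
```

The statistic is then compared with the chi-square distribution with `df` degrees of freedom, for instance through a table of critical values or the p-value reported by statistical software.
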
Chi-square test: approach
Example: Are the categorical variables education level and income
category related?

Chi-square test: approach
As we are at the boundary of violating the assumptions, we join the
categories college degree and post-undergraduate degree.

Pearson correlation test
• Goal: Evaluate whether two quantitative variables have a linear relation. We also aim to assess the direction and strength of the linear relation.
• We distinguish
o The population correlation coefficient $\rho$
o The sample correlation coefficient $r$
• A correlation coefficient takes values between -1 and 1, i.e. $-1 \leq r \leq 1$
o $r = 0$ means the variables have no linear relation
o $r$ close to 0 means the variables have a weak linear relation
o $r = 1$ means the variables have a perfect positive linear relation
o $r = -1$ means the variables have a perfect negative linear relation

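Pearson sample correlation
• Suppose we have a simple random sample (SRS) $(x_1, y_1), \ldots, (x_n, y_n)$ of the variables $X$ and $Y$. The sample correlation between the quantitative variables $X$ and $Y$ is defined as:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

• A positive linear relation between $X$ and $Y$ (see top panel) means that observations with an $X$-value above average usually also have a $Y$-value above average.
• A negative linear relation between $X$ and $Y$ (see bottom panel) means that observations with an $X$-value above average usually also have a $Y$-value below average.
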
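As an illustration, the sample correlation can be computed directly from deviations around the two sample means. The sketch below uses the standard textbook form of the Pearson coefficient; the function name and the small data set are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation computed from deviations around the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: above-average x tends to go with above-average y
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # positive value: positive linear relation
```

Each product in the numerator is positive when both values lie on the same side of their means, which is why a positive relation yields a positive coefficient.
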
Pearson sample correlation

$r$ measures the size and direction of the linear relation between two variables.

Pearson sample correlation
$r$ measures the size and direction of the linear relation between $X$ and $Y$.
In this example $r \approx 0$, but there is a strong non-linear (i.e., quadratic) relation between $X$ and $Y$.

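A small numerical check makes the point concrete. In the hypothetical data below, y is exactly the square of x, so the two variables are perfectly related, yet the Pearson correlation is zero because the relation is not linear. The helper function is an illustrative sketch.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with an exact quadratic relation y = x^2
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]
r_quadratic = pearson_r(x, y)
print(r_quadratic)  # 0.0: no linear relation despite perfect dependence
```

This is why a scatterplot should be inspected before a correlation near zero is read as "no relation".
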
Pearson sample correlation
Outliers can have a very big effect on the sample correlation coefficient.

In this example a single outlier markedly increases the sample correlation.

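The effect of one outlier can be reproduced with made-up numbers. In this sketch the five base observations and the added point (15, 15) are hypothetical and are not the data behind the slide.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample without any linear relation
x = [1, 2, 3, 4, 5]
y = [2, 3, 1, 3, 2]
r_without = pearson_r(x, y)

# Same sample with one extreme observation added
r_with = pearson_r(x + [15], y + [15])

print(round(r_without, 3), round(r_with, 3))  # roughly 0 versus roughly 0.95
```

A single point far from the rest dominates both the numerator and the denominator of the coefficient, so it can create a strong correlation that the bulk of the data does not support.
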
Sample Pearson correlation in SPSS
• We compute correlations between monthly wage, weekly working hours, and age for a sample of observations.
In SPSS: Analyze > Correlate > Bivariate

Correlation between age and monthly wage: $r = .302$

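Test of $H_0: \rho = 0$
• $H_0$: there is no linear relation between $X$ and $Y$: $\rho = 0$.
• $H_A$: there is a linear relation between $X$ and $Y$: $\rho \neq 0$.
• If $H_0$ is true, and if $(X, Y)$ has a bivariate Normal distribution (or if the sample size $n$ is large), then the test statistic is $t$-distributed with $n - 2$ degrees of freedom:

$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{n-2}$$
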
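A minimal sketch of the test statistic, using the standard form t = r * sqrt(n - 2) / sqrt(1 - r^2). The values r = 0.5 and n = 27 are hypothetical, and the critical value 2.060 is the usual table value for a two-sided test at the 5% level with 25 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical sample correlation and sample size
r, n = 0.5, 27
t_stat = correlation_t_statistic(r, n)
df = n - 2

# Two-sided 5% critical value of the t-distribution with 25 df (table value)
critical_value = 2.060
reject_h0 = abs(t_stat) > critical_value

print(round(t_stat, 3), df, reject_h0)
```

For a fixed value of r the statistic grows with the square root of n - 2, so even a modest correlation becomes significant in a large sample.
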
Test of $H_0: \rho = 0$
• If $X$ and $Y$ have a bivariate Normal distribution, the scatterplot has the shape of an ellipse.
[Figure: scatterplots in two panels, "Bivariate Normal distribution" and "No bivariate Normal distribution"]

Remark: if the sample size $n$ is large, the Pearson correlation test is valid, even if $X$ and $Y$ do not have a bivariate Normal distribution.

Test of $H_0: \rho = 0$ in SPSS

Spearman correlation test
• The non-parametric Spearman correlation test can be used
o to measure and test the relation between two ordinal qualitative variables
o to measure and test the relation between two quantitative variables if the assumptions of the Pearson correlation test are violated (i.e., small sample and $X$ and $Y$ do not have a bivariate Normal distribution).

The Spearman correlation (available in SPSS) is equivalent to the Pearson correlation computed on the ranks of the observations.

We do not further discuss the test in this course.

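The equivalence with a Pearson correlation on ranks can be sketched as follows. The helper names and the data are hypothetical, and tied values receive the average of their ranks, which is the usual convention.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: increasing but non-linear relation (y = x^3)
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(round(pearson_r(x, y), 3), round(spearman_r(x, y), 3))
```

Because only the ordering of the observations matters, the Spearman coefficient equals 1 for any strictly increasing relation, while the Pearson coefficient stays below 1 when that relation is not linear.
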
Overview testing the relation between variables
(Parametric // non-parametric) test
• 2 quantitative variables:
Pearson correlation // Spearman correlation

• 2 qualitative variables:
chi-square test

• 1 quantitative variable and 1 qualitative variable
o qualitative variable with 2 categories:
independent samples t-test // Mann-Whitney-Wilcoxon (MWW) test
o qualitative variable with more than 2 categories:
ANOVA // Kruskal-Wallis test

Exercise 1
• Suppose we have a sample of 4000 observations for the following variables:
o Trust of a respondent in the government, measured on a scale from 0 to 100
o Country with categories 1 = Belgium, 2 = France, 3 = the Netherlands
o Age measured in years
o Gender: nominal variable with categories 0 = male, 1 = female
• Which test can you use to test whether there is a relation between
o Country and trust
o Gender and trust
o Country and gender
o Trust and age
• Formulate the null and alternative hypothesis for each test. Discuss whether/when the proposed test is valid in the present context.

Exercise 2
• Consider the cross-table between the two qualitative variables education level and type of company for a sample of $n = 1476$ observations. The table contains observed counts and expected counts if the variables are assumed to be independent.

• Compute the expected counts for the first row of the table, compute the chi-square test statistic and test (using $\alpha = 0.05$) the null hypothesis that education level and type of company are statistically independent. Formulate a conclusion about the result of the test.

Exercise 3
• We compute the Pearson sample correlation between household income and years with current employer in a SRS of $n = 850$ employees.

Correlations
                                                   Household income   Years with current employer
Household income              Pearson correlation  1                  .625
                              N                    850                850
Years with current employer   Pearson correlation  .625               1
                              N                    850                850

• Test whether the population correlation is positive (using $\alpha = 0.05$) and draw a conclusion. Indicate whether the assumptions of the test are satisfied.

Solution Exercise 1
Relation country and trust
• $H_0: \mu_1 = \mu_2 = \mu_3$ (the population mean of trust is the same in Belgium, France, and the Netherlands)
• $H_A$: the null hypothesis is wrong
• To test $H_0$, we can use a one-way ANOVA with trust as dependent variable and country as factor.
• If the assumptions of the ANOVA are violated (residuals not normally distributed, population variances not equal), the non-parametric Kruskal-Wallis test could be used.
Relation gender and trust
• $H_0: \mu_{\text{male}} = \mu_{\text{female}}$ versus $H_A: \mu_{\text{male}} \neq \mu_{\text{female}}$
• To test $H_0$, we can use an independent samples t-test with trust as dependent variable and gender as factor.

Solution Exercise 1
• The sample is very large ($n = 4000$) and hence the t-statistic has an approximate t-distribution if the population variances for males and females are equal. If the null hypothesis of equal population variances for males and females is rejected, a Welch correction to the t-statistic can be used.
Country and gender
• $H_0$: country and gender are statistically independent
• $H_A$: country and gender are statistically dependent
• To test $H_0$ we can use a Pearson chi-square test on the cross-table country x gender. The assumptions are (1) that all expected counts are at least 1, (2) that not more than 20% of the cells in the cross-table have an expected count smaller than 5.
• Stated otherwise, the chi-square test tests the null hypothesis that the proportion of males is the same in the three countries: $H_0: p_1 = p_2 = p_3$ versus $H_A$: $H_0$ is wrong

Solution Exercise 1
Trust and age
• $H_0: \rho = 0$ versus $H_A: \rho \neq 0$
• To test that there is no linear relation between age and trust in the population we can use a Pearson correlation test. As the sample size is large ($n = 4000$), the test statistic has an approximate t-distribution and hence the test is valid.

Solution Exercise 2
• $H_0$: the variables company size and diploma are statistically independent
• $H_A$: the variables company size and diploma are statistically dependent
• Expected counts
o Small company and low education level: (794)(734)/1476 = 394.8
o Small company and average education level: (794)(505)/1476 = 271.7
o Small company and high education level: (794)(237)/1476 = 127.5

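The three expected counts can be checked with a few lines of Python. The totals 794, 734, 505, 237, and 1476 are taken from the solution; the variable names are only illustrative.

```python
# Row total for small companies and column totals per education level
row_total_small = 794
column_totals = {"low": 734, "average": 505, "high": 237}
n = sum(column_totals.values())  # 1476 observations in total

# Expected count under independence: row total * column total / n
expected_small = {
    level: row_total_small * total / n
    for level, total in column_totals.items()
}

for level, count in expected_small.items():
    print(level, round(count, 1))
```

The expected counts in a row always add up to the row total (here 794), which is a quick consistency check on the arithmetic.
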
Solution Exercise 2
• Let $\alpha = 0.05$.
• The $p$-value is smaller than $\alpha$ and hence we reject $H_0$. We conclude with 95% confidence that company size and diploma are statistically related.
• The assumptions of the test are satisfied:
o All expected counts are larger than or equal to 1
o There are no cells with an expected count smaller than 5, hence the proportion of cells with an expected count $E_{ij} < 5$ is smaller than 20%.

Solution Exercise 3
• We test $H_0: \rho = 0$ against $H_A: \rho > 0$ with $\alpha = 0.05$.
• The test statistic exceeds the critical value and hence we reject $H_0$. We conclude with 95% confidence that the population correlation between household income and years with current employer is positive.
• The scatterplot shows that the assumption of a bivariate Normal distribution for the two variables is doubtful. However, as the sample size is large ($n = 850$), the test statistic will have an approximate t-distribution and hence the test is valid.
• Remark: to reduce the influence of outliers it is recommended to apply a natural log transformation to household income.

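As a numerical check, the t statistic for this exercise can be computed from r = .625 and n = 850, both given in the exercise. The sketch uses the standard statistic t = r * sqrt(n - 2) / sqrt(1 - r^2); the one-sided 5% critical value of about 1.647 for 848 degrees of freedom is an approximate table value, and the function name is illustrative.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.625, 850  # values from the SPSS output in Exercise 3
t_stat = correlation_t_statistic(r, n)

# Approximate one-sided 5% critical value for 848 degrees of freedom
critical_value = 1.647
reject_h0 = t_stat > critical_value

print(round(t_stat, 2), reject_h0)
```

The statistic is far above the critical value, which agrees with the decision to reject the null hypothesis in favour of a positive population correlation.
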
