Chapter6 Tests Relation Variables

The document discusses various statistical tests for examining relationships between variables, including parametric and non-parametric tests. It provides information and examples of Pearson correlation tests, Spearman correlation tests, chi-square tests of independence, t-tests, ANOVA, and Kruskal-Wallis tests. Examples are given of applying these tests along with interpreting results and checking assumptions.

Uploaded by

Syed Shahzaib Asghar


FACULTY OF ECONOMICS AND BUSINESS

CAMPUS BRUSSELS

Statistical Modelling
Tests for relation between 2 variables

1
Context

                               Parametric                  Non-parametric                 Non-parametric
                                                           (no normality or ordinal)      (nominal)

1 sample                       t-test                      Sign test,                     Binomial test
                                                           Wilcoxon signed-rank test
2 paired samples               t-test on differences       Sign test,                     /
                                                           Wilcoxon signed-rank test
2 independent samples          t-test                      Mann-Whitney-Wilcoxon test     Chi-square test
More than 2 independent        ANOVA                       Kruskal-Wallis test            Chi-square test
samples
Relation between 2 variables   Pearson correlation         Spearman correlation           Chi-square test
                               (linear relation between    (relation between ordinal      (relation between two
                               two quantitative            variables)                     qualitative variables)
                               variables)

2
Chi-square test of independence
 Goal: Evaluate whether there is a statistical relation between two
qualitative variables.
o $H_0$: the two variables are independent
o $H_A$: the two variables are dependent
 Method: The chi-square test statistic is based on the counts in the cross-table
of the two variables. It measures the distance between
o the observed counts $O_{ij}$
o the expected counts $E_{ij}$ if the two variables are statistically independent

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{(\text{row total } i)(\text{column total } j)}{n}$$

with $r$ the number of rows in the cross-table and $c$ the number of columns in the cross-table.

 If $H_0$ is true, the chi-square statistic has a $\chi^2$ distribution with
$(r-1)(c-1)$ degrees of freedom.
 Assumptions: (1) all $E_{ij} \geq 1$, (2) not more than 20% of the cells with $E_{ij} < 5$.
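The computation behind the chi-square statistic can be sketched in a few lines of Python. This is only an illustration: the function names and the 2 x 3 table of counts are hypothetical and do not come from the course material.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for a cross-table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected


def assumptions_ok(expected):
    """Check: all expected counts >= 1 and at most 20% of the cells below 5."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical 2 x 3 cross-table of observed counts (illustrative numbers only).
observed = [[30, 20, 10],
            [20, 30, 40]]
statistic, df, expected = chi_square_independence(observed)
print(round(statistic, 2), df, assumptions_ok(expected))
```

The statistic is then compared with a chi-square distribution with `df` degrees of freedom, for instance through a table of critical values.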

3
Chi-square test: approach
Example: Are the categorical variables education level and income
category related?

4
Chi-square test: approach
As we are at the boundary of violating the assumptions, we join the
categories college degree and post-undergraduate degree.

5
Pearson correlation test
 Goal: Evaluate whether two quantitative variables have a linear relation.
We also aim to assess the direction and strength of the linear relation.
 We distinguish
o The population correlation coefficient $\rho$
o The sample correlation coefficient $r$
 A correlation coefficient takes values between -1 and 1, i.e. $-1 \leq \rho \leq 1$
o $\rho = 0$ means the variables are not linearly related
o $\rho$ close to 0 means the variables have a weak linear relation
o $\rho = 1$ means the variables have a perfect positive linear relation
o $\rho = -1$ means the variables have a perfect negative linear relation

6
Pearson sample correlation
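 Suppose we have a SRS $(x_1, y_1), \ldots, (x_n, y_n)$ of the variables $X$ and $Y$.
The sample correlation between the quantitative variables $X$ and $Y$ is defined
as:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_X}\right) \left(\frac{y_i - \bar{y}}{s_Y}\right)$$

with $\bar{x}$, $\bar{y}$ the sample means and $s_X$, $s_Y$ the sample standard deviations.

 A positive linear relation between $X$ and $Y$ (see top panel) means that
observations with an $X$-value above average usually also have a $Y$-value above
average.
 A negative linear relation between $X$ and $Y$ (see bottom panel) means that
observations with an $X$-value above average usually also have a $Y$-value
below average.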
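As a minimal sketch, the sample correlation can be computed directly from the deviations from the means. The helper `pearson_r` and the small data sets are hypothetical. The code uses the equivalent form in which the sum of cross-products is divided by the square root of the product of the sums of squares, so the factors $1/(n-1)$ cancel.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y_up tends to rise with x, y_down tends to fall with x.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]
y_down = [6, 4, 5, 4, 2]

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 3), round(r_down, 3))
```

A positive value of `r_up` reflects that above-average x-values go together with above-average y-values; for `r_down` the sign flips.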

7
Pearson sample correlation

$r$ measures the size and direction of the linear relation between two
variables

8
Pearson sample correlation
$r$ measures the size and direction of the linear relation between $X$ and $Y$.
In this example $r$ is close to 0, but there is a strong non-linear (i.e., quadratic)
relation between $X$ and $Y$.
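A small numerical illustration, under the assumption of a perfectly quadratic relation on x-values that are symmetric around zero (hypothetical data, not the data of the slide): the Pearson correlation is zero although $Y$ is completely determined by $X$.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: y is an exact quadratic function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(r)
```

The positive and negative cross-products cancel, so a correlation near zero only rules out a linear relation, not a relation in general.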

9
Pearson sample correlation
Outliers can have a very big effect on the sample correlation coefficient.

In this example, adding one outlier increases the sample correlation.
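The sensitivity to outliers can be reproduced with a few hypothetical points (these are not the values of the slide's example): five observations with a weak relation, followed by the same data with one extreme point added.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data with a weak linear relation.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without = pearson_r(x, y)

# Same data plus one outlier far away from the other observations.
r_with = pearson_r(x + [20], y + [20])

print(round(r_without, 2), round(r_with, 2))
```

Inspecting the scatterplot before interpreting $r$ is therefore a sensible habit.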

10
Sample Pearson correlation in SPSS
 We compute correlations between monthly wage, weekly working hours and
age for a sample of observations.
In SPSS: Analyze > Correlate > Bivariate

Correlation between age and monthly wage: $r = .302$

11
Test
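 $H_0$: there is no linear relation between $X$ and $Y$: $\rho = 0$.
 $H_A$: there is a linear relation between $X$ and $Y$: $\rho \neq 0$.
 If $H_0$ is true, and if $(X, Y)$ has a bivariate Normal distribution (or if
the sample size $n$ is large), then the test statistic is $t$-distributed with $n-2$ degrees of
freedom:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$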
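The test statistic $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom can be sketched as follows. The function name is hypothetical, the sample size is an assumed value, and the p-value uses a standard Normal approximation that is only reasonable for large samples; exact p-values require the $t$ distribution.

```python
import math
from statistics import NormalDist


def pearson_t_statistic(r, n):
    """t statistic and degrees of freedom for H0: rho = 0."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2


# r = .302 is the correlation between age and monthly wage from the SPSS slide;
# n = 100 is a hypothetical sample size, assumed for illustration only.
t, df = pearson_t_statistic(0.302, 100)

# Assumption of this sketch: approximate the two-sided p-value with the
# standard Normal distribution instead of the t distribution with df degrees
# of freedom.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(round(t, 2), df, p_approx < 0.05)
```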

12
Test
 If $X$ and $Y$ have a bivariate Normal distribution, the scatterplot has the
shape of an ellipse.
[Figure: two scatterplot panels titled "Bivariate normal distribution" and "No bivariate normal distribution"]

 Remark: if the sample size is large, the Pearson correlation test is valid, even if $X$ and $Y$ do
not have a bivariate Normal distribution.
13
Test in SPSS

14
Spearman correlation-test
 The non-parametric Spearman correlation test can be used
o to measure and test the relation between two ordinal qualitative
variables
o to measure and test the relation between two quantitative variables if
the assumptions of the Pearson correlation test are violated (i.e., small
sample and $X$ and $Y$ do not have a bivariate Normal distribution).

The Spearman correlation (available in SPSS) is equivalent to the Pearson
correlation computed on the ranks of the observations.

We do not further discuss the test in this course.
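The equivalence between the Spearman correlation and the Pearson correlation on ranks can be illustrated with a short sketch. The helper names and the data are hypothetical; tied values receive the average of their ranks, which is a common convention and an assumption here.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(round(pearson_r(x, y), 3), spearman_r(x, y))
```

Because only the ordering of the observations matters, a monotone but non-linear relation yields a Spearman correlation of 1 while the Pearson correlation stays below 1.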

15
Overview testing the relation between variables
(Parametric // non-parametric) test
 2 quantitative variables:
Pearson correlation // Spearman correlation

 2 qualitative variables:
chi-square test

 1 quantitative variable and 1 qualitative variable:

o qualitative variable with 2 categories:
independent samples t-test // MWW-test
o qualitative variable with more than 2 categories:
ANOVA // Kruskal-Wallis test
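The overview can be read as a small decision rule. The sketch below encodes it in a hypothetical helper `choose_test`; the argument names and the string labels are assumptions made for illustration.

```python
def choose_test(kind_x, kind_y, n_categories=None, parametric=True):
    """Suggest a test for the relation between two variables.

    kind_x, kind_y: "quantitative" or "qualitative".
    n_categories: number of categories of the qualitative variable
                  (only needed when exactly one variable is qualitative).
    parametric: True for the parametric test, False for the alternative.
    """
    kinds = sorted([kind_x, kind_y])
    if kinds == ["quantitative", "quantitative"]:
        return "Pearson correlation" if parametric else "Spearman correlation"
    if kinds == ["qualitative", "qualitative"]:
        return "chi-square test"
    if kinds == ["qualitative", "quantitative"]:
        if n_categories is None or n_categories < 2:
            raise ValueError("n_categories must be at least 2")
        if n_categories == 2:
            return "independent samples t-test" if parametric else "MWW-test"
        return "ANOVA" if parametric else "Kruskal-Wallis test"
    raise ValueError("kinds must be 'quantitative' or 'qualitative'")


print(choose_test("quantitative", "qualitative", n_categories=4))
print(choose_test("qualitative", "quantitative", n_categories=2, parametric=False))
```

The rule only proposes a candidate test; whether the test is valid still depends on its assumptions.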

16
Exercise 1
 Suppose we have a sample of 4000 observations for the following
variables:
o Trust of a respondent in the government measured on a scale from 0 to 100.
o Country with categories 1=Belgium, 2=France, 3= the Netherlands
o Age measured in years
o Gender: nominal variable with categories 0=male, 1=female
Which test can you use to test whether there is a relation between
o Country and trust
o Gender and trust
o Country and gender
o Trust and age
 Formulate the null and alternative hypothesis for each test. Discuss
whether/when the proposed test is valid in the present context.

17
Exercise 2
 Consider the cross-table between the two qualitative variables education level and
type of company for a sample of $n = 1476$ observations. The table contains the
observed counts and the expected counts if the variables are assumed to be
independent.

 Compute the expected counts for the first row of the table, compute the chi-square
test statistic and test (using $\alpha = 0.05$) the null hypothesis that education
level and type of company are statistically independent. Formulate a conclusion
about the result of the test.
18
Exercise 3
 We compute the Pearson sample correlation between household income
and years with current employer in a SRS of $n = 850$ employees.

Correlations
                                                    Household income   Years with current employer
Household income              Pearson correlation   1                  .625
                              N                     850                850
Years with current employer   Pearson correlation   .625               1
                              N                     850                850

 Test whether the population correlation is positive (using $\alpha = 0.05$) and
draw a conclusion. Indicate whether the assumptions of the test are
satisfied.
19
Solution Exercise 1
Relation country and trust
 $H_0: \mu_1 = \mu_2 = \mu_3$ (the population mean of trust is the same in Belgium, France and the Netherlands)
 $H_A$: the null hypothesis is wrong
 To test $H_0$, we can use a one-way ANOVA with trust as dependent variable
and country as factor.
 If the assumptions of the ANOVA are violated (residuals not normally
distributed, population variances not equal), the non-parametric Kruskal-Wallis
test could be used.
Relation gender and trust
 $H_0: \mu_{male} = \mu_{female}$
 $H_A: \mu_{male} \neq \mu_{female}$
 To test $H_0$, we can use an independent samples t-test with trust as dependent
variable and gender as factor.

20
Solution Exercise 1
 The sample is very large ($n = 4000$) and hence the t-statistic has an
approximate t-distribution if the population variances for males and females
are equal. If the null hypothesis of equal population variances for
males and females is rejected, a Welch correction to the t-statistic can be used.
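The difference between the equal-variance t-statistic and the Welch-corrected version can be sketched from summary statistics. The function names and all numbers are hypothetical; the Welch degrees of freedom follow the usual Welch-Satterthwaite approximation.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance t statistic and degrees of freedom."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, n1 + n2 - 2


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics for trust (mean, variance, group size).
equal_case = (52.0, 400.0, 2000, 50.0, 400.0, 2000)
unequal_case = (52.0, 900.0, 1000, 50.0, 100.0, 3000)

print(pooled_t(*equal_case), welch_t(*equal_case))
print(pooled_t(*unequal_case), welch_t(*unequal_case))
```

With equal variances and equal group sizes both versions coincide; with unequal variances the Welch version uses a different standard error and fewer degrees of freedom.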
Country and Gender
 $H_0$: country and gender are statistically independent
 $H_A$: country and gender are statistically dependent
 To test $H_0$ we can use a Pearson chi-square test on the cross-table country
x gender. The assumptions are (1) that all expected counts are at least 1,
(2) that not more than 20% of the cells in the cross-table have an expected count smaller than 5.
 Stated otherwise, the chi-square test tests the null hypothesis that the
proportion of males is the same in the three countries:
$H_0: p_1 = p_2 = p_3$ versus $H_A$: $H_0$ is wrong

21
Solution Exercise 1
Trust and age
 $H_0: \rho = 0$
 $H_A: \rho \neq 0$
 To test that there is no linear relation between age and trust in the
population we can use a Pearson correlation test. As the sample size is
large ($n = 4000$) the test statistic has an approximate t-distribution and
hence the test is valid.

22
Solution Exercise 2

 $H_0$: the variables company size and diploma are statistically independent
 $H_A$: the variables company size and diploma are statistically dependent
 Expected counts
o Small company and low education level: (794)(734)/1476=394.8
o Small company and average education level (794)(505)/1476=271.7
o Small company and high education level: (794)(237)/1476=127.5
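The three expected counts follow the rule (row total)(column total)/n. A short check with the totals quoted above (row total 794, column totals 734, 505 and 237, n = 1476) could look as follows; the variable names are chosen for illustration.

```python
# Totals taken from the expected-count calculations above.
row_total_small_company = 794
column_totals = {"low": 734, "average": 505, "high": 237}
n = 1476

# Expected count under independence: row total * column total / n
expected_small_company = {
    level: row_total_small_company * total / n
    for level, total in column_totals.items()
}

for level, value in expected_small_company.items():
    print(level, round(value, 1))
```

The expected counts of a row add up to the row total, which is a quick sanity check on the arithmetic.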

23
Solution Exercise 2


 Let $\alpha = 0.05$.
 The chi-square test is significant at this level and hence we reject $H_0$. We conclude with 95% confidence that
company size and diploma are statistically related.
 The assumptions of the test are satisfied:
o All expected counts are larger than or equal to 1
o There are no cells with an expected count smaller than 5, hence the
proportion of cells with $E_{ij} < 5$ is smaller than 20%.

24
Solution Exercise 3
 We test $H_0: \rho = 0$ against $H_A: \rho > 0$ with $\alpha = 0.05$.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.625\sqrt{848}}{\sqrt{1-0.625^2}} \approx 23.3$$

 The test statistic far exceeds the one-sided critical value of about 1.65 and hence we reject $H_0$. We conclude with 95% confidence that the
population correlation between household income and years with current
employer is positive.
 The scatterplot shows that the assumption of a bivariate Normal
distribution for the two variables is doubtful. However, as the sample size
is large ($n = 850$), the test statistic will have an approximate t-distribution
and hence the test is valid.
 Remark: to reduce the influence of outliers it is recommended to apply a
natural log transformation to household income.
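The test in Solution Exercise 3 can be checked numerically with $r = .625$ and $n = 850$ from the correlation table. The sketch below assumes that, with 848 degrees of freedom, the one-sided critical value may be approximated by the standard Normal quantile.

```python
import math
from statistics import NormalDist

r = 0.625   # sample correlation from the correlation table
n = 850     # sample size from the correlation table
alpha = 0.05

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumption of this sketch: Normal approximation to the t distribution
# with 848 degrees of freedom for the one-sided critical value.
critical_value = NormalDist().inv_cdf(1 - alpha)

reject_h0 = t > critical_value
print(round(t, 2), round(critical_value, 2), reject_h0)
```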