
Session 1.3 Notes

1. The document discusses various statistical techniques for analyzing relationships between variables, including correlation, linear regression, multiple regression, and using dummy variables. 2. Key terms defined include the correlation coefficient, significance testing of correlations, simple and multiple linear regression models, the coefficient of determination (R2), and how dummy variables can be used to quantify categorical variables. 3. Examples are provided of how to perform standard multiple regression and interpret results, as well as how to include dummy variables in a regression model to examine effects of categorical predictors like gender.


Essentials of Research
Day 1

Dr Shweta Pandey
Quantitative estimate of linear correlation

Karl Pearson's correlation coefficient (r): requires interval- or ratio-scale data, a linear relationship, and normally distributed variables.

Significance of the correlation coefficient: we test the null hypothesis of no correlation in the population (H0: rho = 0) against the alternative hypothesis that there is a significant correlation (H1: rho ≠ 0), using

t = rxy · √(n − 2) / √(1 − rxy²)

where rxy is the sample correlation coefficient between x and y, and the t statistic has n − 2 degrees of freedom.

If the computed absolute value of t exceeds the tabulated value of t with n − 2 degrees of freedom, the null hypothesis is rejected.

In SPSS, if the p-value < 0.05, the null hypothesis is rejected at the 5% level of significance.
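The test above can be sketched in plain Python (a minimal illustration, not SPSS output; the data and function names are hypothetical):

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
```

The computed |t| would then be compared with the tabulated t value for n − 2 degrees of freedom, exactly as described above.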
What is regression?

• Regression depicts the variation in a dependent variable (y) using one or more independent variables (x).
• It is an explanation of causation.
• If the independent variables sufficiently explain the variation in the dependent variable, the model can be used for prediction.
• Dependent variable: ratio/interval/continuous
• Independent variables: continuous or categorical
Simple Linear Regression

Depicted as y = b0 + b1x + e, where
• y is the dependent/effect variable,
• x is the independent/regressor/causal variable,
• b0 is the y intercept,
• b1 is the slope (∆y/∆x), i.e. the estimated change in the average value of y for a 1-unit change in x,
• e is the error term.

b0 and b1 are parameters that must be estimated so that the equation best represents the given data. Simple linear regression fits a straight line to the data.
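Estimating b0 and b1 by ordinary least squares can be sketched as follows (an illustration only; the function name and data are my own):

```python
def fit_simple_ols(x, y):
    """Estimate b0 (intercept) and b1 (slope) by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx  # the fitted line passes through the point of means
    return b0, b1

# data lying exactly on y = 1 + 2x recovers b0 = 1, b1 = 2
b0, b1 = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
```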
Multiple Regression

The percentage of variation in the dependent variable explained by the independent variables is known as the Coefficient of Determination, often referred to as R².

H0: There is no relationship between the dependent and independent variables (R² = 0)

H1: There is a significant relationship between the dependent and independent variables

If the probability of the F statistic for the overall regression relationship is less than or equal to the level of significance of 0.05 (here it is < 0.001), we reject the null hypothesis that R² = 0.
Coefficient of Determination

The value of R² can range between 0 and 1; the higher its value, the more accurate the regression model is.

It is independent of the units of measurement and can therefore be used for comparing the goodness of fit of two regression equations.

Rule of thumb for characterizing correlation strength:
• up to 0.20: very weak
• 0.20 to 0.40: weak
• 0.40 to 0.60: moderate
• 0.60 to 0.80: strong
• above 0.80: very strong
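The rule of thumb can be encoded as a small helper (a sketch; the bands in the slide overlap at their boundaries, so assigning boundary values to the lower band is my assumption):

```python
def strength_label(r):
    """Characterize correlation strength per the rule of thumb above.

    Boundary values go to the lower band (an assumption, since the
    slide's bands overlap at 0.20, 0.40, 0.60, and 0.80)."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"
```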
Types of Multiple Regression

1. Standard multiple regression is used to evaluate the relationships between a


set of independent variables and a dependent variable.
2. Hierarchical, or sequential, regression is used to examine the relationships
between a set of independent variables and a dependent variable, after
controlling for the effects of some other independent variables on the dependent
variable.
3. Stepwise, or statistical, regression is used to identify the subset of
independent variables that has the strongest relationship to a dependent
variable.

Standard Multiple Regression: Session 1.3 Activity 1 data

To compute a multiple
regression in SPSS,
select the Regression |
Linear command from
the Analyze menu.

Variables

First, move the dependent variable Demand to the Dependent text box.

Second, move the independent variables price and income to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression.

Fourth, click on the Statistics… button to specify the statistics options that we want.
Standard Multiple Regression: Statistics

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, mark Collinearity diagnostics to check for multicollinearity issues.

Fourth, click on the Continue button to close the dialog box, then click OK.
Standard Multiple Regression

1. All the independent variables are entered at the same time


2. R² measures the strength of the relationship between the set of independent
variables and the dependent variable.
3. An F test is used to determine if the relationship can be generalized to the
population represented by the sample.
4. A t-test is used to evaluate the individual relationship between each
independent variable and the dependent variable.

Dummy Variables:

• The dependent variable may be influenced by qualitative variables such as gender, marital status, profession, geographical region, and religion.

• To quantify qualitative variables, dummy variables are used.

• The number of dummy variables in a regression model equals the number of categories of data less one.

• A dummy variable may take any two values, such as zero and one, or ten and eleven; zero and one are conventional.

• Dummy variables can also be used to examine the moderating effect between two variables.
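The "categories less one" rule can be illustrated with a small coding helper (a sketch; the function name is hypothetical):

```python
def dummy_code(values, baseline):
    """Code a categorical variable as k - 1 dummy columns.

    One 0/1 column per non-baseline category; rows in the baseline
    category are all zeros."""
    levels = sorted(set(values))
    dummies = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if v == d else 0 for d in dummies] for v in values]
    return dummies, rows

# two categories (female/male) -> one dummy column
cols, rows = dummy_code(["female", "male", "female"], baseline="female")
```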
Dummy Variables: Categorical variables

Suppose the dependent variable is impacted by categorical/nominal variables.

Example: cosmetics ordered = f(Price, Gender)

Here we use DUMMY variables.

Number of dummy variables = number of groups − 1 = 2 − 1 = 1

Define Gender:
Value = 0 if female
      = 1 if male
Session 1.3 Activity 2 data
Dummy Variables

Salary, Y = f(Experience, Gender)

Gender (D) = 1 if male
           = 0 if female

Fitted model: Y = 17.231 + 1.545*Experience + 3.286*Gender

All p-values < 0.05, so the coefficients are significant.

Interpretation (Gender = 0 denotes female):

For females: Salary = 17.231 + 1.545*Experience

For males: Salary = 17.231 + 1.545*Experience + 3.286

Average salary of males > average salary of females when experience is kept constant.
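The two group equations can be checked numerically with the fitted coefficients from the slide (a sketch; the function name is mine):

```python
def predicted_salary(experience, male):
    """Fitted model from the slide: Y = 17.231 + 1.545*Experience + 3.286*Gender."""
    return 17.231 + 1.545 * experience + 3.286 * male

# at any fixed level of experience, the male-female gap equals the
# dummy coefficient, 3.286
gap = predicted_salary(5, male=1) - predicted_salary(5, male=0)
```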
Session 1.3 Activity 2
Multi-Collinearity

A VIF (Variance Inflation Factor) < 5 indicates no multicollinearity issue.


Session 1.3 Activity 2 dummy variables data
Moderating effects:
Suppose we want to examine the moderating impact of years of experience on the starting salary of males vs females.
Salary= f( X, DX)

X= Experience

D= Dummy variable for gender

Value=0 if female
= 1 if male
Moderating effects: Session 1.3 Activity 2 Data

Compute variable DX in SPSS


Moderating effects: Session 1.3 Activity 2
Suppose we want to see moderating impact of years of experience on the starting salary of males vs
females
Moderating effects: Session 1.3 Activity 2

Y (Salary) = 18.964 + 1.225X + 0.639DX

D = 0 for females:
Female salary = 18.964 + 1.225X

D = 1 for males:
Male salary = 18.964 + 1.225X + 0.639X = 18.964 + 1.864X

Thus male salary is 639 pesos higher per month for every year of experience compared to female lecturers.
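The interaction model above can be verified numerically (a sketch using the slide's coefficients; the function name is mine):

```python
def predicted_salary_mod(x, d):
    """Y = 18.964 + 1.225*X + 0.639*D*X, with X = experience, D = 1 for male."""
    return 18.964 + 1.225 * x + 0.639 * d * x

# the male slope exceeds the female slope by 0.639 per year of experience,
# while the intercepts (salary at zero experience) are equal
slope_gap = ((predicted_salary_mod(1, 1) - predicted_salary_mod(0, 1))
             - (predicted_salary_mod(1, 0) - predicted_salary_mod(0, 0)))
```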
Moderator Variable

Consider Y = a + b1x + b2z + b3xz

• If b3 is insignificant and b2 is significant, then z is not a moderator variable but simply an independent predictor variable.
• If b2 is insignificant and b3 is significant, then z is a PURE moderator variable.
• If both b2 and b3 are significant, then z is a QUASI moderator variable.
• In our example, both the coefficients of Experience and Gender*Experience (DX) are significant, so Gender is a quasi moderator.
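The decision rules above can be encoded as a small helper (a sketch; the function and return labels are my own):

```python
def moderator_type(b2_significant, b3_significant):
    """Classify z in Y = a + b1*x + b2*z + b3*x*z per the rules above."""
    if b3_significant and not b2_significant:
        return "pure moderator"
    if b3_significant and b2_significant:
        return "quasi moderator"
    if b2_significant:
        return "independent predictor, not a moderator"
    return "neither significant"
```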
Standard multiple regression

1. Dependent variable metric?/ Independent variables metric or dichotomous?

2. Ratio of cases to independent variables at least 5 to 1?

3. Probability of ANOVA test of regression less than/equal to level of significance?

4. Probability of relationship between each IV and DV <= level of significance?

5. Coefficient of determination value


HIERARCHICAL MULTIPLE REGRESSION

1. In hierarchical multiple regression, the independent variables are entered in


two stages.
2. In the first stage, the independent variables that we want to control for are
entered. In the second stage, the independent variables whose relationship
we want to examine after the controls are entered.
3. A statistical test of the change in R² from the first stage is used to evaluate
the importance of the variables entered in the second stage.
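The test of the change in R² is a standard F test; a sketch of the formula (the numbers below are hypothetical, not from the session data):

```python
def f_change(r2_reduced, r2_full, n, k_full, m):
    """F statistic for the change in R^2 when m predictors are added.

    Numerator df = m; denominator df = n - k_full - 1, where k_full
    counts all predictors in the larger (second-stage) model."""
    df2 = n - k_full - 1
    return ((r2_full - r2_reduced) / m) / ((1.0 - r2_full) / df2)

# hypothetical: adding 1 predictor raises R^2 from 0.20 to 0.40 with n = 23
f = f_change(0.20, 0.40, n=23, k_full=2, m=1)
```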

Hierarchical multiple regression

Happiness = f(Age, Sex, Health, Life)

Control variables: Age, Sex

To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.
Specify Control variables

First, move the dependent variable to the Dependent text box.

Second, move the independent variables to control for, Age and Sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.
Add the other independent variables

SPSS indicates that we will now be adding variables to a second block. Move the other independent variables, Health and Life, to the Independent(s) list box for block 2. Then click on the Statistics… button to specify the statistics options that we want.
Specify the statistics output options

Mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether the variables added after the controls have a relationship to the dependent variable.
Hierarchical multiple regression
1. Dependent variable metric?/ Independent variables metric or
dichotomous?
2. Ratio of cases to independent variables at least 5 to 1?
3. Probability of F test for change in R² less than or equal to level of
significance?
4. Change in R² correctly reported?
5. Probability of relationship between each IV added after controls and DV
less than or equal to level of significance?
6. Direction of relationship between each IV added after controls and DV
interpreted correctly?
STEPWISE MULTIPLE REGRESSION

1. Find the most parsimonious set of predictors that are most effective in
predicting the dependent variable.
2. Variables are added to the regression equation one at a time, using the
statistical criterion of maximizing the R² of the included variables.

Request a stepwise multiple regression

To compute a multiple
regression in SPSS,
select the Regression |
Linear command from
the Analyze menu.
Specify variables and method for selecting variables

First, move the dependent variable income98 to the Dependent text box.

Second, move the independent variables hrs1, prestg80, educ, and degree to the Independent(s) list box.

Third, select the Stepwise method for entering the variables into the analysis from the drop-down Method menu.
Open statistics options dialog box

Click on the Statistics…


button to specify the statistics
options that we want.
Specify the statistics output options

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, click on the Continue button to close the dialog box.
Request the regression output

Click on the OK
button to request
the regression
output.
Relationship between dependent and independent variables

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .492a   .242       .237                3.607
2       .532b   .283       .273                3.522

a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)

The Multiple R for the relationship between the subset of independent variables that best predicts the dependent variable is 0.532, which would be characterized as moderate.
Relationship between dependent and independent variables

The most important predictor of total family income is highest academic degree. The second most important predictor of total family income is occupational prestige score.

Variables Entered/Removed(a)

Model   Variables Entered                       Variables Removed   Method
1       RS HIGHEST DEGREE                       .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100)
2       RS OCCUPATIONAL PRESTIGE SCORE (1980)   .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100)

a. Dependent Variable: TOTAL FAMILY INCOME
Stepwise multiple regression
1. Dependent variable metric?/ Independent variables metric or dichotomous?
2. Ratio of cases to independent variables at least 5 to 1?
3. Probability of ANOVA test of regression less than/equal to level of significance?
4. Strength of relationship for included variables interpreted correctly?
5. Is the stated order of importance independent variables correct?
6. Probability of F test for change in R² less than or equal to level of significance?
7. Change in R² correctly reported?
8. Probability of relationship between each IV added after controls and DV less than or
equal to level of significance?
9. Direction of relationship between each IV added after controls and DV interpreted
correctly?
