Week 8 - 10

Uploaded by

Tunahan Sahin
Copyright
© © All Rights Reserved
Linear Regression Analysis using SPSS Statistics
• Once we’ve acquired data with multiple
variables, one very important question is how
the variables are related.
• For example, we could ask for the relationship
between people’s weights and heights, or
study time and test scores.
• Regression is a set of techniques for
estimating relationships, and we’ll focus on
them
• We begin with simple linear regression in
which there are only two variables of interest
(e.g., weight and height, or force used and
distance stretched).
• After developing intuition for this setting, we’ll
then turn our attention to multiple linear
regression, where there are more variables.
• We’re going to fit a line y = β0 + β1x to our
data.
• Here, x is called the independent variable or
predictor variable, and y is called the
dependent variable or response variable
• β1 is the slope of the line: this is one of the
most important quantities in any linear
regression analysis. A value very close to 0
indicates little to no linear relationship; positive
values indicate an increasing relationship and
negative values a decreasing one. The size of β1
depends on the units of x and y.
• β0 is the intercept of the line.
In order to actually fit a line, we’ll start
with a way to quantify how good a line
is. We’ll then use this to fit the “best”
line we can.

One way to quantify a line’s “goodness” is
to propose a probabilistic model that
generates data from lines.

Then the “best” line is the one for which
data generated from the line is “most
likely”. This is a commonly used
technique in statistics: proposing a
probabilistic model and using the
probability of data to evaluate how
good a particular model is.
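
To make the idea concrete, here is one common probabilistic model. The Gaussian noise term and the symbol σ are assumptions made for illustration; the slides do not name a specific noise model.

```latex
% Assumed model: each response is a point on the line plus independent Gaussian noise
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)

% Log-likelihood of the n observed points under this model
\ell(\beta_0, \beta_1) = -\frac{n}{2}\log(2\pi\sigma^2)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2

% Only the last sum depends on the line, so the "most likely" line
% is the one that minimizes the sum of squared errors
(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1}
  \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2
```

Under this assumption, choosing the most likely line and choosing the least squares line are the same thing.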
• Linear regression is used when we want to
predict the value of a variable based on the
value of another variable.
• The variable we want to predict is called the
dependent variable (or sometimes, the
outcome variable). The variable we are using
to predict the other variable's value is called
the independent variable (or sometimes, the
predictor variable).
Example
• A salesperson for a large car brand wants to
determine whether there is a relationship
between an individual's income and the price
they pay for a car.
• As such, the individual's "income" is the
independent variable and the "price" they pay for
a car is the dependent variable.
• The salesperson wants to use this information to
determine which cars to offer potential
customers in new areas where average income is
known.
• For example, you could use linear regression
to understand
– whether exam performance can be predicted
based on revision time;
– whether cigarette consumption can be predicted
based on smoking duration and so forth.
• If you have two or more independent
variables, rather than just one, you need to
use multiple regression.
Meaning of Regression Coefficients
• The values of the regression parameters β0
and β1 are not known. We estimate them from
data.
• β1 indicates the change in the mean response
per unit increase in X.
Estimator of β0: b0
Estimator of β1: b1
We want to compute b0 and b1
• We use the “least squares estimation” method:

$$ b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}, \qquad b_0 = \bar{y} - b_1\,\bar{x} $$
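
The least squares formulas for b0 and b1 can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS implementation; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator of b1: sum(x_i * y_i) - n * x_bar * y_bar
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    # Denominator of b1: sum(x_i^2) - n * x_bar^2
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

Because the hypothetical points lie exactly on a line, the function recovers an intercept of 1 and a slope of 2.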

My model
Y=b0+ b1X
The Sign of the Coefficient b1
• If y increases as x increases, the value
of b1 is positive.
The Sign of the Coefficient b1
• If y decreases as x increases (equivalently,
y increases as x decreases), the value of
b1 is negative.
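
The sign of b1 is determined by the numerator of the least squares formula, because the denominator is a sum of squares and is never negative. A small sketch, with a hypothetical helper name and hypothetical data:

```python
def slope_sign(xs, ys):
    """Report the sign of the least squares slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations; it has the same sign as b1
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy > 0:
        return "positive"
    if sxy < 0:
        return "negative"
    return "zero"


# Hypothetical data: y rises with x, then y falls with x
rising = slope_sign([1, 2, 3], [2, 4, 7])
falling = slope_sign([1, 2, 3], [9, 5, 1])
print(rising, falling)
```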
INDIVIDUAL HYPOTHESIS TESTS FOR
REGRESSION COEFFICIENTS
(T TESTS)
• β0
Ho: The regression coefficient is not important (it does not make a
significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (it makes a significant
contribution to the model) (β0 ≠ 0)

$$ t_{test} = \frac{b_0 - 0}{S_{b_0}} $$

Here 0 is the value of β0 under Ho and S_{b0} is the standard error of b0.

• Degrees of freedom: (n-2)

• If |ttest| > ttable then Ho is rejected (two-sided test).
• β1
Ho: The regression coefficient is not important (it does not make a
significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (it makes a significant
contribution to the model) (β1 ≠ 0)

$$ t_{test} = \frac{b_1 - 0}{S_{b_1}} $$

Here 0 is the value of β1 under Ho and S_{b1} is the standard error of b1.

• Degrees of freedom: (n-2)

• If |ttest| > ttable then Ho is rejected (two-sided test).
• The hypothesis test for β1 is the most important one. For the model
to be meaningful, β1 should be significantly different from zero.
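
The two t statistics can be computed directly from data. The sketch below uses the standard textbook formulas for the standard errors in simple linear regression, which the slides do not spell out; the function name and the data are hypothetical.

```python
import math


def t_statistics(xs, ys):
    """Return (t for b0, t for b1, degrees of freedom) for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual mean square: sum of squared residuals over n - 2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    # Standard errors of the two estimates
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return b0 / se_b0, b1 / se_b1, n - 2


# Hypothetical data
t_b0, t_b1, df = t_statistics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

t_table = 3.182  # two-sided 5% critical value for 3 degrees of freedom
reject_b0 = abs(t_b0) > t_table
reject_b1 = abs(t_b1) > t_table
print(round(t_b0, 3), round(t_b1, 3), df, reject_b0, reject_b1)
```

With only five hypothetical points, neither statistic exceeds the critical value, so neither Ho is rejected. SPSS reports a p-value in the "Sig." column instead of asking you to look up ttable.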

My model
Y=b0+ b1X
SIMULTANEOUS HYPOTHESIS TEST
(F TEST)
• On the previous slides, the contribution of
each coefficient to the model was tested
separately.
• To decide whether the model as a whole is
meaningful, the F test is performed.
• In simple linear regression, the F test in the
ANOVA table concerns the slope only, and F
equals the square of the t statistic for b1.
Ho: Model is not significant
(β1 = 0)
Ha: Model is significant
(β1 ≠ 0)
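
The F statistic is the ratio of the regression mean square (MSR) to the residual mean square (MSE). The sketch below computes it for hypothetical data and also returns the t statistic of the slope, so that the relationship F = t² in simple linear regression can be checked. The function name is an assumption made for illustration.

```python
import math


def f_and_t(xs, ys):
    """Return (F statistic, t statistic of b1) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # unexplained
    ssr = sst - sse                                               # explained
    msr = ssr / 1          # one predictor, so 1 degree of freedom
    mse = sse / (n - 2)
    t_b1 = b1 / math.sqrt(mse / sxx)
    return msr / mse, t_b1


# Hypothetical data
f_stat, t_b1 = f_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(f_stat, 3), round(t_b1 ** 2, 3))
```

Both printed numbers agree, which is why the F test and the t test for b1 give the same p-value when there is a single predictor.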
Example
• To study the relationship between monthly
food expenses and monthly income, 30 people
were asked for their monthly income and their
monthly food expenses, and the following
responses were received.
Does income have a significant impact on spending? If so, what is the mathematical
expression of the linear relationship between the two?
Expense: Dependent Variable (Y)
Income: Independent Variable (X)
Data file: Uyg_reg.sav
Let's check the linear relationship between expense (Y) and income (X) with a
scatter plot
Select Simple Scatter and click Define

Y = Expense (dependent variable)

X = Income (independent variable)
As the scatter plot shows, the relationship
between Y and X appears to be positive:
when x increases, y increases
(the value of b1 is positive).
The easy way of finding the model
Double-click on the plot
$$ \hat{y} = 0.31 + 0.28x $$
The Common Way
(Regression Analysis)
Dependent : expense
Independent : Income

Click Statistics
1) Dependent Variable: Expense

2) R2 (R Square) = 0.95. The R value (the "R" column) is the simple correlation,
0.975, which indicates a high degree of correlation between Y and X.
The R2 value (the "R Square" column) indicates how much of the total variation in
the dependent variable, expense, can be explained by the independent
variable, income. In this case, 95% can be explained, which is very large.
Interpretation of R square
• After you have fit a linear model using
regression analysis, ANOVA, or design of
experiments (DOE), you need to determine
how well the model fits the data.
• R-squared is a statistical measure of how close
the data are to the fitted regression line.
• R-squared is always between 0 and 100%:
– 0% indicates that the model explains none of the
variability of the response data around its mean.
– 100% indicates that the model explains all the
variability of the response data around its mean.
• In general, the higher the R-squared, the
better the model fits your data.
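
R-squared can be computed as one minus the ratio of unexplained variation to total variation. A minimal sketch with a hypothetical function name and hypothetical observed and fitted values:

```python
def r_squared(ys, y_hat):
    """Share of the variation in ys around its mean explained by the fitted values."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, y_hat))     # unexplained variation
    return 1 - sse / sst


# Hypothetical observed values and fitted values from the line 2.2 + 0.6x
observed = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(observed, fitted)
print(round(r2, 3))
```

Here the fitted line explains about 60% of the variation in the hypothetical responses. A perfect fit gives 1 (100%), and predicting the mean for every point gives 0.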
1) F test = 536.655, where F = MSR/MSE = 308.849/0.576
(with these rounded mean squares the quotient is about 536.2)

2) Simultaneous hypothesis test (F test)
Ho: Model is not significant (β1 = 0)
Ha: Model is significant (β1 ≠ 0)

P=0.000<0.05, so H0 is rejected. The model is significant.

This table indicates that the regression model predicts the dependent
variable significantly well. How do we know this? Look at the "Regression"
row and go to the "Sig." column. This indicates the statistical significance of
the regression model that was run. Here, Sig. is displayed as 0.000, that is,
p < 0.001, which is less than 0.05, and indicates that, overall, the regression
model statistically significantly predicts the outcome variable.
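
The numbers reported in the ANOVA table can be cross-checked against each other. The sketch below takes MSR, MSE and the sample size of 30 from this example and recomputes F and R2. It is only a consistency check on rounded values, so small differences from the SPSS output are expected.

```python
# Values as reported for the income/expense example
msr = 308.849   # regression mean square (1 degree of freedom)
mse = 0.576     # residual mean square
n = 30          # number of people surveyed

df_residual = n - 2
f_stat = msr / mse

# Recover the sums of squares from the mean squares
ssr = msr * 1
sse = mse * df_residual
r_squared = ssr / (ssr + sse)

print(round(f_stat, 1), round(r_squared, 3))
```

The recomputed R2 is about 0.950, which agrees with the "R Square" value reported for this model.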
The Coefficients table provides us with the necessary information to predict
expense from income, as well as determine whether income contributes
statistically significantly to the model (by looking at the "Sig." column).
Furthermore, we can use the values in the "B" column under the
"Unstandardized Coefficients" column, as shown below:
1) (Constant) b0 = 0.314 and b1 = 0.283

– b0 = 0.314: when x = 0, the predicted Y is 0.314.
– b1 = 0.283: if X differs by one unit, Y differs by b1 units, on average.
So when income increases by 1 unit, expense increases by 0.283 units,
on average. Expense increases because b1 is positive.
2) Standard error of b0 = 0.401 and standard error of b1 = 0.012
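$$ t_{test} = \frac{b_0 - 0}{S_{b_0}} = \frac{0.314}{0.401} = 0.783 $$

$$ t_{test} = \frac{b_1 - 0}{S_{b_1}} = \frac{0.283}{0.012} = 23.164 $$

The reported 23.164 presumably comes from unrounded values; dividing the
rounded values shown gives about 23.6.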
H0: The regression coefficient is not important (it does not
make a significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (it makes a
significant contribution to the model) (β0 ≠ 0)
P=0.440 > 0.05 -> H0 is not rejected (but this test is not important for us)

H0: The regression coefficient is not important (it does not
make a significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (it makes a
significant contribution to the model) (β1 ≠ 0)
P=0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.
In Summary
• Model; x=income y=expense

$$ \hat{y} = 0.314 + 0.283x $$
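
Once the model is established, it can be used for prediction. The sketch below wraps the fitted equation in a hypothetical function; the income value passed in is made up, and its units are whatever units the data file uses.

```python
B0 = 0.314  # intercept from the Coefficients table
B1 = 0.283  # slope from the Coefficients table


def predict_expense(income):
    """Predicted mean monthly food expense for a given monthly income."""
    return B0 + B1 * income


# Hypothetical income of 10 units
predicted = predict_expense(10)
print(round(predicted, 3))
```

Predictions are most trustworthy for incomes inside the range that was observed in the sample.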
Example
• A food company is interested in determining a
shelf-life for a new chilled food product and
hence they would like to quantify the
relationship between microbial activity and
time (shelf-life)
Does microbial activity have a significant impact on time (shelf-life)? If so, what is
the mathematical expression of the linear relationship between the two?
Time: Dependent Variable (Y)
Microbial activity: Independent Variable (X)
Let's check the linear relationship between time (Y) and microbial activity (X)
with a scatter plot
Select Simple Scatter and click Define

Y = time (dependent variable)

X = microbial activity (independent variable)
As the scatter plot shows, the relationship
between Y and X appears to be positive:
when x increases, y increases
(the value of b1 is positive).
The easy way of finding the model
Double-click on the plot
$$ \hat{y} = 244 + 2.56x $$
The Common Way
(Regression Analysis)
Dependent : time
Independent : micro_activity

Click Statistics
1) Dependent Variable: Time
2) R2 (R Square) = 0.86. The R2 value (the "R Square" column) indicates how much of
the total variation in the dependent variable, time, can be explained by the
independent variable, microbial activity.
In this case, 86% can be explained, which is very large.
1) F test = 257.314, where F = MSR/MSE = 305680.948/1187.969 = 257.314

2) Simultaneous hypothesis test (F test)
Ho: Model is not significant (β1 = 0)
Ha: Model is significant (β1 ≠ 0)

P=0.000<0.05, so H0 is rejected. The model is significant.

This table indicates that the regression model predicts the dependent
variable significantly well. How do we know this? Look at the "Regression"
row and go to the "Sig." column. This indicates the statistical significance of
the regression model that was run. Here, Sig. is displayed as 0.000, that is,
p < 0.001, which is less than 0.05, and indicates that, overall, the regression
model statistically significantly predicts the outcome variable.
1) (Constant) b0 = 243.759 and b1 = 2.556

– b0 = 243.759: when x = 0, the predicted Y is 243.759.
– b1 = 2.556: if X differs by one unit, Y differs by b1 units, on average.
So when microbial activity increases by 1 unit, time increases by
2.556 units, on average.
2) Standard error of b0 = 13.057 and standard error of b1 = 0.159
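$$ t_{test} = \frac{b_0 - 0}{S_{b_0}} = \frac{243.759}{13.057} = 18.669 $$

$$ t_{test} = \frac{b_1 - 0}{S_{b_1}} = \frac{2.556}{0.159} = 16.041 $$

The reported 16.041 presumably comes from unrounded values; dividing the
rounded values shown gives about 16.08.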
H0: The regression coefficient is not important (it does not
make a significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (it makes a
significant contribution to the model) (β0 ≠ 0)
P=0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.

H0: The regression coefficient is not important (it does not
make a significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (it makes a
significant contribution to the model) (β1 ≠ 0)
P=0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.
In Summary
• Model; x=microbial activity y=time

$$ \hat{y} = 243.759 + 2.556x $$
Example
• Families' fruit and vegetable expenditures
over 1 month were recorded in TL, and the
task is to explain the relationship between
these expenditures and the number of children
the families have.
Does the number of children have a significant
effect on fruit and vegetable expenditure? If
so, what is the mathematical expression of the
linear relationship between the two?

Expenditure:Dependent (Y)
Number of children: Independent Variable (X)
Let's check the linear relationship between expenditure (money) (Y) and number
of children (X) with a scatter plot
Select Simple Scatter and click Define

Y = money (dependent variable)

X = number of children (independent variable)
As the scatter plot shows, the relationship
between Y and X appears to be close to zero:
there is no clear increasing or decreasing
trend.
The easy way of finding the model
Double-click on the plot
$$ R^2 = 0.042 $$

$$ \hat{y} = 115 - 1.32x $$
The Common Way
(Regression Analysis)
Dependent : money
Independent : number of kids

Click Statistics
1) Dependent Variable: Money
2) R2 (R Square) = 0.0402. The R2 value (the "R Square" column) indicates how much
of the total variation in the dependent variable, money, can be explained by the
independent variable, number of children.
In this case, 4.02% can be explained, which is very small.
1) F test = 0.877, where F = MSR/MSE = 131.655/150.090 = 0.877

2) Simultaneous hypothesis test (F test)
Ho: Model is not significant (β1 = 0)
Ha: Model is significant (β1 ≠ 0)

P=0.36>0.05, so H0 is not rejected. The model is NOT significant.


1) (Constant) b0 = 114.528 and b1 = -1.318

– b0 = 114.528: when x = 0, the predicted Y is 114.528.
– b1 = -1.318: if X differs by one unit, Y differs by b1 units, on average.
So when the number of children increases by 1, money decreases by
1.318 units, on average. Money decreases because b1 is negative.
2) Standard error of b0 = 5.073 and standard error of b1 = 1.407
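$$ t_{test} = \frac{b_0 - 0}{S_{b_0}} = \frac{114.528}{5.073} = 22.576 $$

$$ t_{test} = \frac{b_1 - 0}{S_{b_1}} = \frac{-1.318}{1.407} = -0.937 $$

The square of the second statistic, about 0.877, matches the F statistic
from the ANOVA table.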
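
The slope's t statistic can be recomputed from the reported coefficient and its standard error. This short check also compares t² with the F statistic from the ANOVA table, since the two should agree in simple linear regression. The variable names are chosen for illustration.

```python
# Values as reported in the Coefficients and ANOVA tables for this example
b1 = -1.318             # slope for number of children
se_b1 = 1.407           # standard error of the slope
f_reported = 0.877      # F statistic from the ANOVA table

t_b1 = (b1 - 0) / se_b1
t_squared = t_b1 ** 2

print(round(t_b1, 3), round(t_squared, 3), f_reported)
```

The recomputed statistic is about -0.937, and its square is about 0.877, in line with the reported F value and with P=0.360.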
H0: The regression coefficient is not important (it does not
make a significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (it makes a
significant contribution to the model) (β0 ≠ 0)
P=0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.

H0: The regression coefficient is not important (it does not
make a significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (it makes a
significant contribution to the model) (β1 ≠ 0)
P=0.360 > 0.05 -> H0 is not rejected. The regression coefficient DOES NOT make a
significant contribution to the model.
In Summary
• β1 is not SIGNIFICANT
• A regression model cannot be established. We
could not establish a linear mathematical
model between the number of children and
the monthly fruit and vegetable expenditures
of the families.
