Week 8 - 10
SPSS Statistics
• Once we’ve acquired data with multiple
variables, one very important question is how
the variables are related.
• For example, we could ask for the relationship
between people’s weights and heights, or
study time and test scores.
• Regression is a set of techniques for
estimating relationships, and it is our focus
here.
• We begin with simple linear regression in
which there are only two variables of interest
(e.g., weight and height, or force used and
distance stretched).
• After developing intuition for this setting, we’ll
then turn our attention to multiple linear
regression, where there are more variables.
• We’re going to fit a line y = β0 + β1x to our
data.
• Here, x is called the independent variable or
predictor variable, and y is called the
dependent variable or response variable
• β1 is the slope of the line: this is one of the
most important quantities in any linear
regression analysis. A value very close to 0
indicates little to no linear relationship; large
positive or negative values (relative to the
units of x and y) indicate strong positive or
negative relationships, respectively.
• β0 is the intercept of the line.
In order to actually fit a line, we’ll start
with a way to quantify how good a line
is. We’ll then use this to fit the “best”
line we can.
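One common way to quantify how good a line is, sketched here in Python, is the sum of squared vertical distances between the points and the line (smaller is better). The function name and the data are made up for illustration; they do not come from the course files.

```python
def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for this illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

perfect = sum_squared_errors(xs, ys, 0, 2)  # this line passes through every point
worse = sum_squared_errors(xs, ys, 1, 1)    # a poorer line gives a larger value
print(perfect, worse)
```

The "best" line is then the one with the smallest value of this quantity.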
$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
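The formulas for b1 and b0 can be applied directly. The following is a minimal Python sketch, assuming the data are given as two plain lists; the function name `fit_line` and the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    sxx = sum(x * x for x in xs) - n * x_bar ** 2
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

Because the example points lie exactly on a line, the estimates recover its intercept and slope.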
My model
Y=b0+ b1X
The Sign of the Coefficient b1
• If y increases as x increases, the value
of b1 is positive.
The Sign of the Coefficient b1
• If y decreases as x increases (or,
equivalently, y increases as x decreases),
the value of b1 is negative.
INDIVIDUAL HYPOTHESIS TESTS FOR
REGRESSION COEFFICIENTS
(T TESTS)
H0: The regression coefficient is not important (the regression
coefficient does not make a significant contribution to the
model) (β0 = 0)
Ha: The regression coefficient is important (the regression
coefficient makes a significant contribution to the model)
(β0 ≠ 0)
The same pair of hypotheses is tested for β1.
$$t_{\text{test}} = \frac{b_0 - 0}{S_{b_0}}, \qquad t_{\text{test}} = \frac{b_1 - 0}{S_{b_1}}$$
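To show where the standard errors in these t statistics come from, here is a Python sketch using the standard textbook formulas for simple linear regression. These formulas are assumed here, since the slides only read the values from the SPSS output. The function name and the data are hypothetical.

```python
import math


def coefficient_t_tests(xs, ys):
    """Estimates, standard errors and t statistics for b0 and b1 (H0: coefficient = 0)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)
    se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))
    return {
        "b0": b0, "se_b0": se_b0, "t_b0": (b0 - 0) / se_b0,
        "b1": b1, "se_b1": se_b1, "t_b1": (b1 - 0) / se_b1,
    }


# Hypothetical data, invented for this illustration only.
result = coefficient_t_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

Each t value is compared with a t distribution with n - 2 degrees of freedom; SPSS reports the resulting p-value in the "Sig." column.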
SIMULTANEOUS HYPOTHESIS TEST
(F TEST)
• On the previous slide, the contribution of each
coefficient to the model was tested separately.
• Whether the model as a whole is meaningful
is checked with a single test of the whole
model.
• For this, the F test is performed.
H0: The model is not significant
(β1 = 0)
Ha: The model is significant
(β1 ≠ 0; with several predictors, at least one slope β is different from 0)
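The F statistic reported in the ANOVA table is the ratio of the regression mean square to the residual mean square. A Python sketch for the one-predictor case follows; the function name and the data are hypothetical.

```python
def f_statistic(xs, ys):
    """F = MSR / MSE for a simple linear regression (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    msr = ssr / 1          # one predictor, so 1 degree of freedom
    mse = sse / (n - 2)    # n - 2 residual degrees of freedom
    return msr / mse


# Hypothetical data, invented for this illustration only.
f_value = f_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f_value)
```

With a single predictor, F equals the square of the t statistic for b1, so the F test and the t test for the slope lead to the same conclusion.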
Example
• To find the relationship between monthly
food expenses and monthly income, 30 people
were asked for their monthly income and their
food expenses, and the following responses
were received.
Does income have a significant impact on spending? If so, what is the mathematical
expression of the linear relationship between the two?
Expense: Dependent Variable (Y)
Income: Independent Variable (X)
Data file: Uyg_reg.sav
Let's check the linear relationship between expense (Y) and income (X) with a
scatter plot.
Select Simple Scatter and click Define.
Y=Expense (dependent
variable)
X=Income (Independent
variable)
As the plot shows, the relationship
between Y and X appears to be positive:
when x increases, y increases
(the value of b1 is positive).
The easy way of finding the model
Double click on plot
ŷ = 0.31 + 0.28x
The Common Way
(Regression Analysis)
Dependent : expense
Independent : Income
Click Statistics
This table indicates that the regression model predicts the dependent
variable significantly well. How do we know this? Look at the "Regression"
row and go to the "Sig." column, which gives the statistical significance of
the regression model that was run. Here SPSS reports Sig. = 0.000, that is,
p < 0.001, which is less than 0.05 and indicates that, overall, the regression
model statistically significantly predicts the outcome variable.
The Coefficients table provides us with the necessary information to predict
expense from income, as well as determine whether income contributes
statistically significantly to the model (by looking at the "Sig." column).
Furthermore, we can use the values in the "B" column under the
"Unstandardized Coefficients" column, as shown below:
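$$t_{\text{test}} = \frac{b_0 - 0}{S_{b_0}} = \frac{0.314}{0.401} = 0.783$$
$$t_{\text{test}} = \frac{b_1 - 0}{S_{b_1}} = \frac{0.283}{0.012} = 23.164$$
(SPSS computes t from the unrounded coefficient and standard error.)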
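The t values can be checked by hand from the "B" and "Std. Error" columns. A small Python sketch follows; the helper name `t_statistic` is hypothetical, and the numbers are the rounded values reported above.

```python
def t_statistic(b, se, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (b - hypothesized) / se


t_b0 = t_statistic(0.314, 0.401)  # intercept
t_b1 = t_statistic(0.283, 0.012)  # slope
print(round(t_b0, 3), round(t_b1, 3))
```

For the intercept this reproduces 0.783. For the slope the rounded inputs give about 23.58 rather than the reported 23.164, presumably because SPSS works with more decimal places than it displays.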
H0: The regression coefficient is not important (the regression coefficient does not
make a significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (the regression coefficient makes a
significant contribution to the model) (β0 ≠ 0)
p = 0.440 > 0.05 -> H0 is not rejected (but the intercept test is not important for us)
H0: The regression coefficient is not important (the regression coefficient does not
make a significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (the regression coefficient makes a
significant contribution to the model) (β1 ≠ 0)
p = 0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.
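The decision rule used in these tests can be written out explicitly. The sketch below assumes the usual significance level of 0.05; the function name is hypothetical.

```python
ALPHA = 0.05  # significance level assumed throughout these slides


def decide(p_value, alpha=ALPHA):
    """Reject H0 when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "do not reject H0"


print(decide(0.440))  # intercept b0 in this example
print(decide(0.000))  # slope b1; SPSS shows 0.000 when p < 0.0005
```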
In Summary
• Model (x = income, y = expense):
ŷ = 0.314 + 0.283x
Example
• A food company is interested in determining a
shelf-life for a new chilled food product, and
hence they would like to quantify the
relationship between microbial activity and
time (shelf-life).
Does microbial activity have a significant impact on time (shelf-life)? If so, what is
the mathematical expression of the linear relationship between the two?
Time: Dependent Variable (Y)
microbial activity : Independent Variable (X)
Let's check the linear relationship between time (Y) and microbial activity (X)
with a scatter plot.
Select Simple Scatter and click Define.
Y = time (dependent variable)
X = microbial activity
(independent variable)
As the plot shows, the relationship
between Y and X appears to be positive:
when x increases, y increases
(the value of b1 is positive).
The easy way of finding the model
Double click on plot
ŷ = 244 + 2.56x
The Common Way
(Regression Analysis)
Dependent : time
Independent : micro_activity
Click Statistics
1) Dependent variable: time
2) R² (R Square) = 0.86. The R² value (the "R Square" column) indicates how much of
the total variation in the dependent variable, time, can be explained by the
independent variable, microbial activity.
In this case, 86% can be explained, which is very large.
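R² is the share of the total variation in y that the fitted line accounts for, R² = 1 - SSE/SST. A Python sketch of the calculation follows; the function name and the data are hypothetical and are not the shelf-life data.

```python
def r_squared(ys, fitted):
    """R^2 = 1 - SSE / SST: proportion of the variation in y explained by the model."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return 1 - sse / sst


# Hypothetical observations and the values fitted by the line 2.2 + 0.6x at x = 1..5.
ys = [2, 4, 5, 4, 5]
fitted = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]
r2 = r_squared(ys, fitted)
print(round(r2, 3))
```

A value of 0.86, as in this example's SPSS output, would mean that 86% of the variation is explained.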
This table indicates that the regression model predicts the dependent
variable significantly well. How do we know this? Look at the "Regression"
row and go to the "Sig." column, which gives the statistical significance of
the regression model that was run. Here SPSS reports Sig. = 0.000, that is,
p < 0.001, which is less than 0.05 and indicates that, overall, the regression
model statistically significantly predicts the outcome variable.
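$$t_{\text{test}} = \frac{b_0 - 0}{S_{b_0}} = \frac{243.759}{13.057} = 18.669$$
$$t_{\text{test}} = \frac{b_1 - 0}{S_{b_1}} = \frac{2.556}{0.159} = 16.041$$
(SPSS computes t from the unrounded coefficient and standard error.)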
H0: The regression coefficient is not important (the regression coefficient does not
make a significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (the regression coefficient makes a
significant contribution to the model) (β0 ≠ 0)
p = 0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.
H0: The regression coefficient is not important (the regression coefficient does not
make a significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (the regression coefficient makes a
significant contribution to the model) (β1 ≠ 0)
p = 0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.
In Summary
• Model (x = microbial activity, y = time):
ŷ = 243.759 + 2.556x
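Once the model is established, it can be used to predict time from a given level of microbial activity. In the Python sketch below, the coefficients are those from the SPSS output, while the function name and the input value 10 are hypothetical; the slides do not state the units or the observed range of x.

```python
B0 = 243.759  # intercept from the Coefficients table
B1 = 2.556    # slope from the Coefficients table


def predict_time(micro_activity):
    """Predicted time for a given microbial activity, using y-hat = b0 + b1 * x."""
    return B0 + B1 * micro_activity


# Hypothetical input value, chosen only to show the arithmetic.
prediction = predict_time(10)
print(round(prediction, 3))
```

Predictions are only trustworthy for x values inside the range that was actually observed.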
Example
• Families' fruit and vegetable expenditures
over 1 month were recorded in TL, and we are
asked to explain the relationship between
these expenditures and the number of children
the families have.
Does the number of children have a significant
effect on fruit and vegetable expenditure? If
so, what is the mathematical expression of the
linear relationship between the two?
Expenditure:Dependent (Y)
Number of children: Independent Variable (X)
Let's check the linear relationship between expenditure (money) (Y) and number
of children (X) with a scatter plot.
Select Simple Scatter and click Define.
Y=money (dependent
variable)
X= number of children
(Independent variable)
As the plot shows, the relationship
between Y and X appears to be zero:
there is no relationship, no
increasing or decreasing trend.
The easy way of finding the model
Double click on plot
R² = 0.042
ŷ = 115 + 1.32x
The Common Way
(Regression Analysis)
Dependent : money
Independent : number of kids
Click Statistics
1) Dependent variable: money
2) R² (R Square) = 0.0402. The R² value (the "R Square" column) indicates how much
of the total variation in the dependent variable, money, can be explained by the
independent variable, number of children.
In this case, only 4.02% can be explained, which is very small.
$$t_{\text{test}} = \frac{b_0 - 0}{S_{b_0}} = \frac{114.528}{5.073} = 22.576$$
$$t_{\text{test}} = \frac{b_1 - 0}{S_{b_1}} = \frac{1.318}{1.407} \approx 0.937$$
H0: The regression coefficient is not important (the regression coefficient does not
make a significant contribution to the model) (β0 = 0)
Ha: The regression coefficient is important (the regression coefficient makes a
significant contribution to the model) (β0 ≠ 0)
p = 0.000 < 0.05 -> H0 is rejected. The regression coefficient makes a significant
contribution to the model.
H0: The regression coefficient is not important (the regression coefficient does not
make a significant contribution to the model) (β1 = 0)
Ha: The regression coefficient is important (the regression coefficient makes a
significant contribution to the model) (β1 ≠ 0)
p = 0.360 > 0.05 -> H0 is not rejected. The regression coefficient DOES NOT make a
significant contribution to the model.
In Summary
• β1 is not significant.
• A regression model cannot be established: we
could not establish a linear mathematical
model between the number of children and
the monthly fruit and vegetable expenditures
of the families.