
Chapter 9

Uploaded by

Nurul Izzati
Copyright
© © All Rights Reserved

CHAPTER 9: REGRESSION [SIMPLE LINEAR]

• The main objective of regression analysis is to analyse the relationship between the dependent
and independent variable(s) and to formulate that relationship as a mathematical equation.
• Regression analysis has the same function as correlation analysis in identifying the
relationship between dependent variable and independent variable(s). For example:
✓ Relationship between the expenditure and the expenses of household.
✓ Relationship between chemical response and temperature.
✓ Relationship between the height of bean sprout and humidity of soil.
• Once we have obtained the mathematical equation of the relationship, the so-called regression
line, the value of the dependent variable can be predicted from the value of the independent
variable(s). In this chapter, the dependent and independent variables are denoted by y and x,
respectively.
• The basic method of finding the regression line is through a scatter plot.
✓ A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the
independent variable x and the dependent variable y.
✓ It is a visual way to describe the nature of the relationship between the independent
and dependent variables.
• However, drawing the regression line by eye on a scatter plot is not the best method because
different researchers might draw straight lines with different slopes. Thus, we need a
mathematical formulation.

Figure 1: Example Scatter Plot

©NOOR MAIZATUL NAZUHA MOHAMAD
9.1 Simple Regression Analysis

• A simple regression analysis analyses the relationship of y and one independent variable x
through a regression line given by
y = β0 + β1 x + ε

• Both β0 and β1 are population parameters, which are usually unknown and hence estimated
from the sample data. The sample regression line that is used to estimate the population
regression line can be written as:

𝑦̂ = 𝑏0 + 𝑏1 𝑥

where
𝑏0 is the estimate of the intercept
𝑏1 is the estimate of the slope
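
The chapter does not state how 𝑏0 and 𝑏1 are computed. A minimal sketch, assuming the usual least-squares estimates 𝑏1 = 𝑆𝑥𝑦 / 𝑆𝑥𝑥 and 𝑏0 = ȳ − 𝑏1 x̄ (with 𝑆𝑥𝑥 and 𝑆𝑥𝑦 the sums of squared deviations and cross-products used in Section 9.4), might look like the following. The data and the function name `fit_line` are hypothetical and are not taken from the chapter.

```python
# Hypothetical sample data for illustration only (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xx: sum of squared deviations of x; S_xy: sum of cross-products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


b0, b1 = fit_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

For this hypothetical data set, 𝑆𝑥𝑥 = 10 and 𝑆𝑥𝑦 = 6, so the fitted line is 𝑦̂ = 2.2 + 0.6𝑥.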

9.2 The Interpretation of the Coefficient

• There are two coefficients in this analysis which are 𝑏0 and 𝑏1 .


i. 𝑏0 is the intercept of the regression line.
𝑏0 describes the expected value of the dependent variable, 𝑦̂, when the x-value is
equal to zero.
ii. 𝑏1 is the slope of the regression line.
𝑏1 describes the change in the expected value of the dependent variable, 𝑦̂, for
every one-unit change in the x-value. If 𝑏1 is positive, an increase of one unit in
the x-value increases the 𝑦̂-value by 𝑏1 units. If 𝑏1 is negative, an increase of one
unit in the x-value reduces the 𝑦̂-value by |𝑏1| units.
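
As a small illustration of these two interpretations, the sketch below uses hypothetical estimates 𝑏0 = 2.2 and 𝑏1 = 0.6 (assumed values, not results from the chapter) and checks that 𝑦̂ at x = 0 equals 𝑏0 and that a one-unit increase in x changes 𝑦̂ by 𝑏1.

```python
# Hypothetical coefficient estimates, assumed for illustration only.
b0, b1 = 2.2, 0.6


def predict(x):
    """Predicted value y_hat on the sample regression line."""
    return b0 + b1 * x


# Intercept: the predicted value when x equals zero.
at_zero = predict(0)

# Slope: the change in y_hat when x increases by one unit (here from 3 to 4).
change_per_unit = predict(4) - predict(3)

print(f"y_hat at x = 0: {at_zero:.1f}")
print(f"change in y_hat per unit of x: {change_per_unit:.1f}")
```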

9.3 Model Assumption

• The sample regression line/model is valid only if the following four conditions for the error
variable (𝜀) are met:
i. The probability distribution of 𝜀 is normal.
ii. The mean of the distribution is 0; that is, 𝐸(𝜀) = 0.
iii. The variance of the error 𝜀 is a constant 𝜎², regardless of the value of x; that is,
𝑉(𝜀) = 𝜎². Under conditions (i), (ii) and (iii), we can write the distribution of the
error variable as 𝜀 ~ 𝑁(0, 𝜎²).
iv. The value of 𝜀𝑖 associated with 𝑦𝑖 is independent of the value of 𝜀𝑗 associated with
𝑦𝑗, for any 𝑖 ≠ 𝑗.

9.4 Testing the Regression Parameter (Slope), 𝛃𝟏

• If no linear relationship exists between the two variables, we would expect the regression
line to be horizontal, that is, to have a slope of zero. To see whether there is a linear
relationship, we test whether the slope β1 is something other than zero. The five steps of
the hypothesis test are as follows:

Step 1: State the hypothesis and identify the claim.


Left tail: 𝐻0 : β1 ≥ 0 vs 𝐻1 : β1 < 0
Right tail: 𝐻0 : β1 ≤ 0 vs 𝐻1 : β1 > 0
Two tail: 𝐻0 : β1 = 0 vs 𝐻1 : β1 ≠ 0

Step 2: Compute the test value.

𝑡 = (𝑏1 − β1) / √(𝑆²𝜀 / 𝑆𝑥𝑥), where 𝑆²𝜀 = (𝑆𝑦𝑦 − 𝑏1 𝑆𝑥𝑦) / (𝑛 − 2)

Step 3: Find the critical value from the appropriate table
Left tail: −𝑡𝛼,𝑛−2
Right tail: 𝑡𝛼,𝑛−2
Two tail: 𝑡𝛼⁄2,𝑛−2

Step 4: Make the decision to reject or not reject 𝐻0.


Left tail: RR: [𝑡 < −𝑡𝛼,𝑛−2 ]
Right tail: RR: [𝑡 > 𝑡𝛼,𝑛−2 ]
Two tail: RR: [|𝑡| > 𝑡𝛼⁄2,𝑛−2 ]

Step 5: Summarize the results.
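
The five steps can be sketched in code for the two-tailed case 𝐻0 : β1 = 0 vs 𝐻1 : β1 ≠ 0. This is only an illustration under stated assumptions: the data are hypothetical, the function name `slope_t_statistic` is made up for this sketch, the significance level is assumed to be 𝛼 = 0.05, and the critical value 3.182 is 𝑡 with 𝛼⁄2 = 0.025 and 𝑛 − 2 = 3 degrees of freedom, read from a standard t table.

```python
import math

# Hypothetical sample data for illustration only (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def slope_t_statistic(x, y, beta1_null=0.0):
    """Test value t = (b1 - beta1) / sqrt(S2_eps / S_xx) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    # Estimated error variance with n - 2 degrees of freedom.
    s2_eps = (s_yy - b1 * s_xy) / (n - 2)
    return (b1 - beta1_null) / math.sqrt(s2_eps / s_xx)


# Step 2: compute the test value.
t = slope_t_statistic(x, y)

# Step 3: critical value t(0.025, 3) from a t table (alpha = 0.05 is assumed).
T_CRIT = 3.182

# Step 4: two-tailed rejection region |t| > t(alpha/2, n - 2).
reject_h0 = abs(t) > T_CRIT

# Step 5: summarize.
print(f"t = {t:.3f}, critical value = {T_CRIT}, reject H0: {reject_h0}")
```

With this hypothetical data, 𝑡 ≈ 2.12, which does not exceed 3.182, so 𝐻0 is not rejected at the assumed 5% level.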

9.5 Prediction/Estimation

• The predicted value of y is denoted by 𝑦̂. The value of y can be predicted only if the
x-value is known. There are two methods of prediction: interpolation and
extrapolation. Interpolation predicts at an x-value within the range of the given data, while
extrapolation predicts at an x-value outside the range of the given data.
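
A minimal sketch of the distinction, assuming hypothetical estimates 𝑏0 = 2.2 and 𝑏1 = 0.6 and hypothetical observed x-values from 1 to 5 (none of these come from the chapter), labels each prediction according to whether the new x-value falls inside the observed range.

```python
# Hypothetical estimates and observed x-values, assumed for illustration only.
b0, b1 = 2.2, 0.6
x_observed = [1, 2, 3, 4, 5]


def predict(x_new):
    """Return (y_hat, kind), where kind is 'interpolation' or 'extrapolation'."""
    y_hat = b0 + b1 * x_new
    inside = min(x_observed) <= x_new <= max(x_observed)
    kind = "interpolation" if inside else "extrapolation"
    return y_hat, kind


for x_new in (3.5, 8):
    y_hat, kind = predict(x_new)
    print(f"x = {x_new}: y_hat = {y_hat:.2f} ({kind})")
```

A prediction flagged as extrapolation relies on the linear relationship continuing to hold beyond the observed data, which the sample itself cannot confirm.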

9.6 Coefficient of Determination

• The coefficient of determination, 𝑅², is one of the valuable tools for checking whether the
regression model used to relate the dependent and independent variable(s) is a good model
or not. The coefficient of determination is the square of the coefficient of correlation, hence
𝑅² = (𝑟)². It gives the proportion of variation in y explained by the x variable(s). If
the value of the coefficient of correlation is unavailable, 𝑅² can be calculated using the
following formula:
𝑅² = 𝑆²𝑥𝑦 / (𝑆𝑥𝑥 𝑆𝑦𝑦)

• As in ANOVA, the total variation of y can be partitioned into two parts, 𝑆𝑆𝑇 =
𝑆𝑆𝑅 + 𝑆𝑆𝐸. Thus, 𝑅² can be calculated using the formula:
𝑅² = 1 − 𝑆𝑆𝐸 / 𝑆𝑆𝑇
SSE – sum of squares of error, measures the amount of variation in y that remains
unexplained
SSR – sum of squares of regression, measures the amount of variation in y explained by
variation in x
SST – total sum of squares, measures the total variation in y

• The coefficient of determination does not have a critical value that enables us to draw
conclusions. In general, the higher the value of 𝑅 2 , the better the model fits the data. If
𝑅 2 = 0.6483, it implies that 64.83% of the variation in y can be explained by the
variation in the x variable.
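
As a check on the two routes to 𝑅², the sketch below computes it once from the sums 𝑆𝑥𝑥, 𝑆𝑦𝑦 and 𝑆𝑥𝑦 (using 𝑅² = 𝑆²𝑥𝑦 / (𝑆𝑥𝑥 𝑆𝑦𝑦), which follows from 𝑅² = 𝑟²) and once from the partition 𝑆𝑆𝑇 = 𝑆𝑆𝑅 + 𝑆𝑆𝐸. The data are hypothetical and the variable names are chosen for this illustration only.

```python
# Hypothetical sample data for illustration only (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares estimates and fitted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares: total, error (unexplained) and regression (explained).
sst = s_yy
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Two routes to the coefficient of determination.
r2_from_sums = s_xy ** 2 / (s_xx * s_yy)
r2_from_partition = 1 - sse / sst

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"R^2 = {r2_from_sums:.4f} (from S_xy) and {r2_from_partition:.4f} (from SSE/SST)")
```

For this hypothetical data, both routes give 𝑅² = 0.6, meaning 60% of the variation in y is explained by the variation in x.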

9.7 Working Example and Hypothesis Statement

The null and alternative hypotheses are defined as follows:

Simple Linear Regression test – Data 1

H0: There is no relationship between marks and total credit hours.

H1: There is a relationship between marks and total credit hours.

Simple Linear Regression test – Data 2

H0: There is no relationship between crude palm oil yield and land area.

H1: There is a relationship between crude palm oil yield and land area.

9.8 To Obtain a Simple Linear Regression

Select Analyze menu, Select Regression, Click on Linear.

Data Preparation using SPSS

*Variable View

*Data View

Step 1: Select the Analyze menu, select Regression, then click on Linear.
Step 2: Click on the dependent variable, then click the arrow button to move it into the Dependent box.
Step 3: Click on the independent variable, then click the arrow button to move it into the Independent box.
Step 4: Click on OK.

SPSS Output
Data 1
Decision Rule: Reject H0 if p-value ≤ 𝛼

p-value of good fit model

p-value of regression parameter test

Reporting
In this analysis, 𝑅² = 0.515 shows that about 51.5% of the total variation in marks is explained
by the total credit hours. The test of model fit (ANOVA) is significant (p-value = 0.000).
The regression parameter test shows that the relationship between marks and
total credit hours is statistically significant (p-value = 0.000).

Data 2

p-value of good fit model

p-value of regression parameter test

Reporting
In this analysis, 𝑅² = 0.696 shows that about 69.6% of the total variation in crude palm oil
yield is explained by the variation in land area. The test of model fit (ANOVA)
is significant (p-value = 0.000). The regression parameter test shows that the
relationship between crude palm oil yield and land area is statistically significant
(p-value = 0.000).
