
Chapter 10

Uploaded by

ipalomata272002
Copyright
© © All Rights Reserved

CHAPTER 10 – CORRELATION AND REGRESSION

a. Correlation Analysis
- its purpose is to measure the strength and direction of the linear association between two
variables
- used when we are interested in describing how two variables change relative to each other

a.1 Scatter Diagram


- a two-dimensional graph used to visualize the possible underlying relationship between
two variables by plotting individual pairs of observations

a.2. Linear Correlation Coefficient (𝝆)


- a measure of strength and direction of the linear relationship existing between two
variables that is independent of their respective scales of measurement

$$\rho = \frac{E(XY) - E(X)E(Y)}{\sqrt{Var(X)\,Var(Y)}}$$
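
As a rough illustration of the formula above, the following Python sketch computes 𝜌 for a small discrete joint distribution. The probability table `joint_pmf` and the helper `expectation` are made up for illustration and do not come from the notes.

```python
import math

# Hypothetical joint probabilities P(X = x, Y = y); the values are assumed.
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expectation(g):
    """Return E[g(X, Y)] under joint_pmf."""
    return sum(p * g(x, y) for (x, y), p in joint_pmf.items())

e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
e_xy = expectation(lambda x, y: x * y)
var_x = expectation(lambda x, y: x ** 2) - e_x ** 2
var_y = expectation(lambda x, y: y ** 2) - e_y ** 2

# rho = (E(XY) - E(X)E(Y)) / sqrt(Var(X) Var(Y))
rho = (e_xy - e_x * e_y) / math.sqrt(var_x * var_y)
print(round(rho, 4))  # 0.6
```

Here E(XY) = 0.4, E(X) = E(Y) = 0.5, and both variances are 0.25, so 𝜌 = 0.15 / 0.25 = 0.6.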

a.2.a. Properties of the Linear Correlation Coefficient


a. It can only assume values between -1 and 1, inclusive of endpoints
b. The sign of 𝜌 describes the direction of the linear relationship between X and Y
- A positive value means that the line slopes upward to the right, so when X increases, Y is expected to increase
- A negative value means that the line slopes downward to the right, so when X increases, Y is expected to decrease
c. The absolute value of 𝜌 tells us the strength of the linear relationship between X and Y.
A high value of |𝜌| implies strong linear relationship, and a low value implies otherwise
d. When 𝜌 = −1 or 1, there is a perfect linear relationship between X and Y and all the
points lie on a straight line

If 𝜌 = 0, then there is no linear relationship between X and Y. However, this does not mean a lack of association: X and Y may still be related in a nonlinear way.
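
A small Python sketch may make this concrete. It uses the sample correlation coefficient as a stand-in for 𝜌, and the data (Y equal to the square of X) are invented for illustration: Y is completely determined by X, yet the linear correlation is zero.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient of paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: Y depends on X exactly, but not linearly.
x_values = [-2, -1, 0, 1, 2]
y_values = [x ** 2 for x in x_values]

r_quadratic = pearson_r(x_values, y_values)
print(r_quadratic)  # 0.0
```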

A strong linear relationship does not mean that X causes Y or that Y causes X. Other variables may have caused the change in both X and Y, or the relationship may be due to coincidence.
a.2.a. Pearson’s r
- a point estimator of 𝜌

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left(\sum_{i=1}^{n}(x_i - \bar{x})^2\right)\left(\sum_{i=1}^{n}(y_i - \bar{y})^2\right)}}$$
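
The definition above can be sketched in Python as follows. The function name `pearson_r` and the five data pairs are assumptions chosen for illustration, not values from the notes.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical paired observations (not taken from the notes).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(round(r, 4))  # 0.7746
```

For these data the sum of cross-deviations is 6 and the two sums of squared deviations are 10 and 6, so r = 6 / √60 ≈ 0.7746.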

a.2.a.1. Computational Formula

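$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)}}$$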
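
The computational formula needs only running totals, which is convenient for hand or calculator work. A minimal Python sketch, assuming a hypothetical data set of five pairs and an illustrative function name:

```python
import math

def pearson_r_computational(xs, ys):
    """Sample correlation from raw sums (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical paired observations (not taken from the notes).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r_comp = pearson_r_computational(x_values, y_values)
print(round(r_comp, 4))  # 0.7746
```

With these data the numerator is 5(66) − (15)(20) = 30 and the denominator is √(50 × 30) = √1500, giving r ≈ 0.7746.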

a.2.b. Hypothesis Test on 𝝆

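NULL HYPOTHESIS: $\rho = \rho_0$

TEST STATISTIC (with $v = n - 2$ degrees of freedom; this form applies when $\rho_0 = 0$):

$$T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

ALTERNATIVE HYPOTHESIS      REGION OF REJECTION
$\rho < \rho_0$             $T < -t_{\alpha}(v = n - 2)$
$\rho > \rho_0$             $T > t_{\alpha}(v = n - 2)$
$\rho \neq \rho_0$          $|T| > t_{\alpha/2}(v = n - 2)$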
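
A minimal sketch of a two-sided test of 𝜌 = 0, assuming the usual statistic T = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sample values r = 0.7746 and n = 5 are hypothetical, and the critical value 3.182 is the tabulated t value for a 0.025 upper-tail area with 3 degrees of freedom.

```python
import math

def correlation_t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), for testing rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.7746              # hypothetical sample correlation
n = 5                   # hypothetical sample size
critical_value = 3.182  # t table value: alpha/2 = 0.025, v = n - 2 = 3

t_stat = correlation_t_statistic(r, n)
reject_null = abs(t_stat) > critical_value  # two-sided rejection rule
print(round(t_stat, 2), reject_null)  # 2.12 False
```

Because |T| ≈ 2.12 does not exceed 3.182, this hypothetical sample does not give enough evidence to reject 𝜌 = 0 at the 5% level, even though r looks fairly large. With only five observations the test has little power.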

b. Simple Linear Regression


- its purpose is to evaluate the relative impact of a predictor on a particular outcome
- it contains only one explanatory (independent) variable and is linear with respect to both the
regression coefficients and the response (dependent) variable

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

Legend:
- $Y_i$: value of the response variable for the ith element
- $X_i$: value of the explanatory variable for the ith element
- $\beta_0$: regression coefficient that gives the Y-intercept of the regression line. It is the value of the mean of Y when X = 0
- $\beta_1$: regression coefficient that gives the slope of the regression line. It gives the amount of change in the mean or expected value of Y for every unit increase in the value of X
- $\epsilon_i$: random error term for the ith element; the error terms are independent and normally distributed with mean 0 and variance $\sigma^2$

b.1. Purpose of Linear Regression


a. Describe the linear relationship between variables
b. Determine how much one variable affects another variable on the average
c. Predict the value of a dependent variable given a value of an independent variable

b.2. Random Error Term


- may be thought of as representing the effect of other factors that are not explicitly
stated in the model but affect the response variable to some extent
- accounts for the inherent variation or basic and unpredictable element of randomness in
response
- also accounts for measurement errors in recording the value of the response variable
b.2.a. Assumptions on the Random Error Term
a. The error terms are independent of one another
b. The error terms are normally distributed
c. The error terms all have a mean of 0
d. The error terms have constant variance $\sigma^2$

Since $\epsilon_i \sim Normal(0, \sigma^2)$, it follows that the $Y_i$'s are also normally distributed, with expected value $\beta_0 + \beta_1 x_i$ and the same variance $\sigma^2$. Thus we have $Y_i \sim Normal(\beta_0 + \beta_1 x_i, \sigma^2)$, with the $Y_i$'s independent.
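
The step from the error term to the response can be written out as below. This is a short sketch that assumes the values $x_i$ are treated as fixed (non-random) constants, which the notes imply but do not state.

```latex
\begin{align*}
E(Y_i)   &= E(\beta_0 + \beta_1 x_i + \epsilon_i)
          = \beta_0 + \beta_1 x_i + E(\epsilon_i)
          = \beta_0 + \beta_1 x_i \\
Var(Y_i) &= Var(\beta_0 + \beta_1 x_i + \epsilon_i)
          = Var(\epsilon_i)
          = \sigma^2
\end{align*}
```

Adding the constant $\beta_0 + \beta_1 x_i$ to a normal random variable shifts its mean but leaves it normal with the same variance, and the independence of the error terms carries over to the $Y_i$'s.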

b.3. Steps in Simple Linear Regression


Step 1. Obtain the equation that best fits the data
Step 2. Evaluate the equation to determine the strength of the relationship for prediction
and estimation
Step 3. Determine if the assumptions about the error terms are satisfied
Step 4. If the model fits the data adequately, use the equation for prediction and for
describing the nature of the relationship between the variables
b.4. Method of Least Squares
- It hinges on the idea that the best-fitting line is the one that minimizes
the sum of squares of the deviations of the observed values of Y from their expected values
- In this method, we minimize the sum of the squared deviations between the observed values of Y and the
expected values of Y, that is, the sum of the squared error terms

𝜖𝑖 = 𝑌𝑖 − 𝐸(𝑌𝑖 ) = 𝑌𝑖 − (𝛽𝑜 + 𝛽1 𝑋𝑖 )

So, we require 𝛽𝑜 and 𝛽1 to be those values that minimize the term

$$\sum_{i=1}^{n} \epsilon_i^2$$
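
The notes do not show the minimization itself. A standard sketch, using calculus and writing $b_0$ and $b_1$ for the minimizing values, is:

```latex
\begin{align*}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right) = 0 \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} X_i \left(Y_i - \beta_0 - \beta_1 X_i\right) = 0 \\
b_1 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
\end{align*}
```

The two equations set to zero are the normal equations; solving them simultaneously gives the slope and intercept in the last line.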

b.5. Estimated Regression Equation


- appropriate only for the relevant range of X

𝑌̂ = 𝑏𝑜 + 𝑏1 𝑋

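Legend:
- $\hat{Y}$: estimated value of Y
- $b_0$ and $b_1$: the estimates for $\beta_0$ and $\beta_1$

For every unit increase in X, $\hat{Y}$ changes by $b_1$ units (an increase if $b_1$ is positive, a decrease if it is negative). If X = 0, then $\hat{Y}$ is equal to $b_0$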
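
The estimates can be computed directly from the data. The sketch below assumes the standard least squares formulas (slope equal to the sum of cross-deviations over the sum of squared X deviations, intercept equal to the mean of Y minus the slope times the mean of X) and a hypothetical data set. The function name is illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Hypothetical paired observations (not taken from the notes).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_linear_regression(x_values, y_values)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

For these data the estimated equation is Ŷ = 2.2 + 0.6X: each unit increase in X raises the estimated mean of Y by 0.6.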

b.6. Coefficient of Determination ($R^2$)


- is the proportion of the variability in the observed values of the response variable that can
be explained by the explanatory variable through their linear relationship
- its value lies between 0 and 1, inclusive, since $R^2 = r^2$ and $-1 \le r \le 1$
- if a model has perfect predictability, then $R^2 = 1$, but if a model has no predictive
capability, then $R^2 = 0$
- an $R^2$ between 0 and 1 indicates the extent to which the dependent variable is predictable
- an $R^2$ of 0.10 means that 10 percent of the variance in Y is predictable from X, and so on

For simple linear regression, $R^2 = r^2$, where r is Pearson's correlation coefficient
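
As a check on this identity, the sketch below computes the coefficient of determination as 1 − SSE/SST (one common definition, assumed here) and compares it with the square of Pearson's r. The data are hypothetical.

```python
import math

# Hypothetical paired observations (not taken from the notes).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

n = len(x_values)
x_bar = sum(x_values) / n
y_bar = sum(y_values) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
s_xx = sum((x - x_bar) ** 2 for x in x_values)
s_yy = sum((y - y_bar) ** 2 for y in y_values)

# Least squares fit
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# SSE: unexplained variation; SST: total variation in Y
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(x_values, y_values))
sst = s_yy
r_squared = 1 - sse / sst

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r_squared, 4), round(r ** 2, 4))  # 0.6 0.6
```

Here 60 percent of the variability in Y is explained by its linear relationship with X, and the same number is obtained by squaring r ≈ 0.7746.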

b.7. Predicting Y from X


- the predicted value is computed by substituting the given value x for X in the estimated regression equation
We can predict the mean of Y for a given value of X by substituting that value into the model. However, the model should not be used to predict Y when the value of X lies outside the range of the observed X values.
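
One way to enforce the relevant range in practice is to refuse predictions outside the observed X values. The sketch below assumes a fitted equation Ŷ = 2.2 + 0.6X and an observed range of 1 to 5. Both are hypothetical, as is the function name.

```python
# Hypothetical fitted coefficients and observed range of X (assumed values).
B0, B1 = 2.2, 0.6
X_MIN, X_MAX = 1, 5

def predict_mean_y(x, b0=B0, b1=B1, x_min=X_MIN, x_max=X_MAX):
    """Predict the mean of Y at x, refusing to extrapolate."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]"
        )
    return b0 + b1 * x

prediction = predict_mean_y(3.5)
print(round(prediction, 2))  # 4.3
```

Calling `predict_mean_y(10)` raises a `ValueError` here, since the data give no information about whether the linear relationship continues to hold that far beyond the observed values.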
