Regression and Correlation Presentation Revised

The document explains correlation and linear regression as tools for analyzing relationships between variables and making predictions. It details the correlation coefficient, its interpretation, and significance testing, along with regression analysis to estimate relationships between dependent and independent variables. Real-world examples illustrate these concepts, demonstrating how to calculate and interpret correlation and regression results.

Uploaded by

khadija Nadeem
Copyright: © All Rights Reserved
Understanding Correlation and Linear Regression
Introduction
Correlation and Linear Regression are powerful
tools in data analysis.
They help us understand relationships between
variables and make predictions.
Correlation Coefficient

• The sample correlation coefficient, denoted by r, quantifies the strength and direction of a linear relationship between two variables measured on an interval or ratio scale.
• Its values range from -1.00 to 1.00:
– r = 1.00: Perfect positive relationship.
– r = -1.00: Perfect negative relationship.
– r = 0: No linear association between the variables.
Real-World Example: Correlation

• As temperatures rise, people tend to crave cold refreshments, leading to increased ice cream sales. This demonstrates a positive correlation, where higher temperatures correspond to higher ice cream sales.
• In contrast, higher temperatures are associated with a decline in hot chocolate sales, as people prefer cooler beverages. This represents a negative correlation: as temperature increases, hot chocolate sales decrease.
• There’s no meaningful connection between the time spent watching movies and the number of shoes someone buys. This is an example of no correlation, where changes in one variable are not associated with changes in the other.
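How is the Correlation Coefficient Determined?
The correlation coefficient is calculated using the formula:
\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} \]
where \bar{x} and \bar{y} are the sample means of X and Y.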
Correlation Coefficient r, Continued
Step-by-Step Example
Suppose a study examines whether hours studied (X) correlate with test scores (Y).

X: Hours Studied    4    5    6    7    8
Y: Test Scores     50   60   65   70   85
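The calculation for this table can be sketched in Python. This is a minimal illustration that assumes the standard deviation-based form of the sample correlation coefficient; the function name `pearson_r` is chosen here for illustration and does not come from the slides.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Data from the hours-studied table
hours = [4, 5, 6, 7, 8]
scores = [50, 60, 65, 70, 85]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

For these five pairs the intermediate sums are 80 for the cross-products, 10 for the squared X deviations, and 670 for the squared Y deviations, which makes the result easy to check by hand.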
• Interpretation:
For these data the sums of deviations are Σ(x − x̄)(y − ȳ) = 80, Σ(x − x̄)² = 10 and Σ(y − ȳ)² = 670, so r = 80/√6700 ≈ 0.977. This indicates a very strong positive relationship between hours studied and test scores.
• In this sample, more hours studied go together with higher test scores. Correlation alone does not show that studying longer causes the higher score.
Testing the Significance of r: Step-by-Step
• Again consider the previous scenario. We want to test whether hours studied (X) and test scores (Y) are significantly correlated in the population. The correlation coefficient r ≈ 0.977 has already been calculated.
• Step 1: State the null and the alternate hypothesis
– H0: ρ = 0. The population correlation between study hours and test scores is zero.
– H1: ρ ≠ 0. The population correlation between study hours and test scores is different from zero.
• Step 2: Select the level of significance
Let’s use α = 0.05 (a commonly chosen significance level).
• Step 3: Select the test statistic
We use the t formula:
\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \]
with n − 2 degrees of freedom.
• Step 4: Formulate the Decision Rule
– For a two-tailed test with α = 0.05, the degrees of freedom are df = n − 2 = 3, where n = 5.
– The critical values from the t-distribution are t = ±3.182.
– Decision Rule: Reject the null hypothesis (H0) if t < −3.182 or t > 3.182.
• Step 5: Make the Decision
Calculated t = 8.00 (using the unrounded r) is greater than 3.182.

Conclusion:
Reject the null hypothesis. There is enough evidence at α = 0.05 to conclude that hours studied and test scores are correlated in the population. In this example, r ≈ 0.977 suggested a very strong positive relationship, and the statistical test shows that a correlation this large is unlikely to be due to random chance alone, even with only five observations. Had t fallen between −3.182 and 3.182, we would have failed to reject H0, and the observed correlation could have been due to random chance rather than a true relationship in the population.
Regression Analysis
• Regression analysis estimates the relationship between a
dependent variable (Y) and an independent variable (X).
– Dependent Variable (Y): The variable we are trying to predict
or estimate.
– Independent Variable (X): The variable used to make
predictions or estimate the dependent variable.
• Linear Relationship:
– A simple linear regression assumes that the relationship
between the dependent and independent variables is linear,
meaning it can be represented by a straight line.
• Scale of Measurement:
– For linear regression, both the dependent and independent
variables must be measured at the interval or ratio scale.
Least Squares Regression Line
• This is the equation of a line:
\[ \hat{y} = a + bx \]
– ŷ is the estimated value of y for a selected value of x
– a is the constant or intercept
– b is the slope of the fitted line
– x is the value of the independent variable
The formulas for a and b are
\[ b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad a = \frac{\sum y_i - b\sum x_i}{n} \]
• Where n is the number of data points
• x_i and y_i are the individual data values for the independent and dependent variables.
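A minimal Python sketch of the least squares calculation follows. It assumes the usual computational form of the formulas for a and b (sums over n data points); the function name `least_squares_line` is hypothetical, and the demonstration reuses the hours-studied data from the correlation example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y_hat = a + b * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = (sum_y - b * sum_x) / n                     # intercept
    return a, b

hours = [4, 5, 6, 7, 8]
scores = [50, 60, 65, 70, 85]
a, b = least_squares_line(hours, scores)
print(f"y_hat = {a:.1f} + {b:.1f}x")

# Estimated score for a selected value of x, here 6.5 hours
predicted = a + b * 6.5
```

The fitted line is only meaningful inside the range of the observed x values; predictions far outside that range extrapolate the straight-line assumption.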
Real-World Example: Regression

Example:
• Let’s consider a simple dataset where we have sales (in units) as the
dependent variable and advertising expenditure (in thousands) as the
independent variable.
Advertising Expenditure (X)    Sales (Y)
1                              3
2                              4
3                              5
4                              6
5                              7

• Now, let’s calculate the regression equation.

Regression Equation:
Applying the least squares formulas to these data gives b = 1 and a = 2. Thus, the regression equation is:
\[ \hat{y} = 2 + 1x \]
This means that for every one-unit (one thousand) increase in advertising expenditure, sales are expected to increase by 1 unit, starting from a predicted 2 units when there is no advertising expenditure.
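As a check on the slope and intercept described above, the arithmetic can be written out. The working below assumes the standard computational formulas for the least squares coefficients, since the slide's own working is not shown.

```latex
\begin{aligned}
n &= 5,\quad \sum x_i = 15,\quad \sum y_i = 25,\quad \sum x_i y_i = 85,\quad \sum x_i^2 = 55 \\
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}
   = \frac{5(85) - (15)(25)}{5(55) - 15^2} = \frac{50}{50} = 1 \\
a &= \frac{\sum y_i - b\sum x_i}{n} = \frac{25 - 1(15)}{5} = 2 \\
\hat{y} &= 2 + 1x
\end{aligned}
```

Every observed point satisfies this equation exactly (for example, x = 3 gives ŷ = 5), so the residuals for this dataset are all zero.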
Regression Equation Slope Test
• Let's conduct a hypothesis test to determine whether the slope of the regression line is significantly greater than zero (which would indicate a positive relationship between advertising expenditure and sales).
• Step 1: State the Hypotheses:
Null Hypothesis (H₀): The slope is less than or equal to zero, meaning there is no positive relationship between advertising expenditure and sales.

Alternate Hypothesis (H₁): The slope is greater than zero, meaning there is a positive relationship between advertising expenditure and sales.

• Step 2: Select the Level of Significance:
We will use a significance level of 0.05.

• Step 3: Select the Test Statistic:
We use the t-statistic to test the hypothesis for the slope. The formula for the t-statistic is:
\[ t = \frac{b - 0}{SE_b} \]
Where b is the sample slope and SE_b is the standard error of the slope. The statistic has n − 2 degrees of freedom.


Regression Equation Slope Test, Continued

• Step 4: Formulate the Decision Rule:
We reject the null hypothesis if the t-statistic is greater than the critical value of 2.353 for a one-tailed test with 3 degrees of freedom (n − 2, with n = 5) at a 0.05 significance level.
• Step 5: Make a Decision:
Suppose the calculated t-statistic is 6.205. Since 6.205 > 2.353, we reject the null hypothesis. (The value 6.205 is illustrative: the five points in the table lie exactly on the fitted line, so the standard error of the slope is zero and t cannot be computed from them directly.)

Thus, we conclude that there is a statistically significant positive relationship between advertising expenditure and sales.
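The slope test can be sketched in Python as follows. This is a hedged illustration, not the slides' own working: the function name `slope_t_test` is hypothetical, the standard error uses the usual formula (residual standard error divided by the square root of the sum of squared x deviations), and the one-tailed critical value for df = 3 at α = 0.05 is taken from a standard t-table. The hours-studied data are used for the demonstration because they do not lie exactly on a line.

```python
import math

def slope_t_test(xs, ys):
    """Return (b, se_b, t) for testing H0: slope <= 0 in simple linear regression."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    if se_b == 0:
        # Exact fit: the ratio b / se_b has no finite value
        t = math.copysign(math.inf, b) if b != 0 else math.nan
    else:
        t = b / se_b
    return b, se_b, t

T_CRITICAL_ONE_TAILED_05_DF3 = 2.353  # from a standard t-table, df = 3

hours = [4, 5, 6, 7, 8]
scores = [50, 60, 65, 70, 85]
b, se_b, t = slope_t_test(hours, scores)
reject = t > T_CRITICAL_ONE_TAILED_05_DF3
print(f"b = {b:.2f}, SE_b = {se_b:.2f}, t = {t:.2f}, reject H0: {reject}")
```

When the residuals are exactly zero, as with the advertising table, the function returns an infinite t rather than dividing by zero; with real data a small but non-zero standard error is the normal case.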
