
Lesson #7: Regression Analysis

1. Introduction to Regression Analysis

● Definition: Regression analysis is a statistical technique used to understand the
relationship between variables. It models the relationship between a dependent variable
and one or more independent variables.
● Purpose:
○ To predict the value of the dependent variable based on the values of
independent variables.
○ To understand the strength and nature of the relationships between variables.

2. Types of Regression Analysis

● 2.1 Simple Linear Regression:
○ Definition: A method to model the relationship between a single independent
variable (X) and a dependent variable (Y) by fitting a linear equation.
○ Equation: Y = β₀ + β₁X + ε
■ Y = Dependent variable
■ X = Independent variable
■ β₀ = Y-intercept (the value of Y when X = 0)
■ β₁ = Slope (the change in Y for a one-unit change in X)
■ ε = Error term (the difference between the observed and predicted
values of Y)
○ Example: Predicting a person's weight (Y) based on their height (X); see the
fitting sketch after this list.
● 2.2 Multiple Linear Regression:
○ Definition: An extension of simple linear regression that models the relationship
between a dependent variable and two or more independent variables.
○ Equation: Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₙXₙ + ε
■ X₁, X₂, …, Xₙ = Independent variables
■ β₁, β₂, …, βₙ = Coefficients representing the impact of each
independent variable on Y
○ Example: Predicting a house's price (Y) based on its size (X₁), location (X₂), and
age (X₃).
● 2.3 Logistic Regression:
○ Definition: A type of regression used when the dependent variable is
categorical, typically binary (e.g., yes/no, success/failure).
○ Equation: log(p / (1 − p)) = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₙXₙ
■ p = Probability of the event occurring (e.g., success)
■ The log-odds of the probability is modeled as a linear combination of the
independent variables.
○ Example: Predicting whether a customer will buy a product (Y = 1) or not (Y = 0)
based on their income, age, and browsing history; see the second sketch below.
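
To make the linear models above concrete, here is a minimal Python sketch of fitting a
simple linear regression on synthetic height/weight data. It assumes the numpy and
statsmodels libraries are available; the data and variable names are invented for
illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Synthetic data: weight (kg) as a roughly linear function of height (cm).
height = rng.uniform(150, 200, size=100)
weight = -100 + 0.9 * height + rng.normal(0, 5, size=100)

# Simple linear regression: Y = b0 + b1*X + e.
X = sm.add_constant(height)  # prepend a column of ones for the intercept b0
model = sm.OLS(weight, X).fit()
print(model.params)          # estimated [b0, b1]
```

Multiple linear regression is the same call with additional predictor columns in X.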
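
Logistic regression follows the same pattern, swapping the least-squares fit for a
maximum-likelihood fit of the log-odds. Again a sketch under the same assumptions
(statsmodels available, data invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic data: purchase (0/1) driven by income and age.
income = rng.normal(50, 15, size=200)
age = rng.uniform(18, 70, size=200)
log_odds = -3 + 0.05 * income + 0.01 * age
p = 1 / (1 + np.exp(-log_odds))      # invert the log-odds to a probability
purchase = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([income, age]))
logit = sm.Logit(purchase, X).fit()  # maximum-likelihood estimation
print(logit.params)                  # [b0, b_income, b_age], on the log-odds scale
```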

3. Assumptions of Linear Regression

● 1. Linearity: The relationship between the independent and dependent variables should
be linear.
● 2. Independence: Observations should be independent of each other.
● 3. Homoscedasticity: The variance of the residuals (errors) should be constant across
all levels of the independent variables.
● 4. Normality: The residuals should be approximately normally distributed.
● 5. No Multicollinearity (for Multiple Regression): Independent variables should not be
highly correlated with each other; see the VIF check sketched below.
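
One common way to screen for violations of assumption 5 is the variance inflation factor
(VIF). A minimal sketch, assuming statsmodels is available and using deliberately
correlated synthetic predictors:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)

# Two deliberately correlated predictors plus an independent one.
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)  # nearly a copy of x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
# A VIF above roughly 5-10 is a common rule of thumb for problematic collinearity.
for i in range(1, X.shape[1]):             # skip the constant column
    print(f"VIF for x{i}: {variance_inflation_factor(X, i):.1f}")
```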

4. Evaluating the Regression Model

● 1. Coefficient of Determination (R²):
○ Definition: A measure of the proportion of the variance in the dependent variable
that is predictable from the independent variables.
○ Range: 0 ≤ R² ≤ 1
○ Interpretation:
■ R² = 0: The model explains none of the variability in the dependent
variable.
■ R² = 1: The model explains all of the variability in the dependent
variable.
● 2. Adjusted R²:
○ Definition: Adjusts the R² value for the number of predictors in the model,
providing a more accurate measure in models with multiple independent
variables.
● 3. F-Test:
○ Purpose: Tests whether at least one of the coefficients in the model is different
from zero (i.e., the model has predictive power).
● 4. P-Value of the Coefficients:
○ Purpose: Tests the null hypothesis that the coefficient is equal to zero (no
effect).
○ Interpretation: A small p-value (typically < 0.05) indicates that the coefficient is
statistically significant.
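
All four of these quantities are exposed on a fitted statsmodels OLS result. A short
sketch, with the same caveats as above (synthetic data, statsmodels assumed available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=100)  # second predictor has no effect

results = sm.OLS(y, X).fit()
print(results.rsquared)                  # R^2: share of variance explained
print(results.rsquared_adj)              # adjusted R^2: penalized for predictor count
print(results.fvalue, results.f_pvalue)  # F-test: is any coefficient nonzero?
print(results.pvalues)                   # per-coefficient p-values (the zero-effect
                                         # predictor should show a large p-value)
```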

5. Interpreting Regression Coefficients

● 1. Intercept (β0\beta_0β0):
○ Represents the expected value of Y when all independent variables are zero.
● 2. Slope Coefficients (β1,β2,…\beta_1, \beta_2, \dotsβ1,β2,…):
○ Represents the change in the dependent variable (Y) for a one-unit change in the
independent variable (X), holding other variables constant.
● 3. Significance of Coefficients:
○ Positive Coefficient: Indicates a positive relationship between the independent
variable and the dependent variable.
○ Negative Coefficient: Indicates a negative relationship.
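
As a concrete (invented) illustration: in a fitted model Weight = −100 + 0.9 × Height,
the slope 0.9 means each additional centimeter of height is associated with roughly
0.9 kg more weight, while the intercept −100 is the model's extrapolated prediction at
Height = 0, which has no physical meaning here because a height of zero lies far outside
the data.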

6. Residual Analysis

● 1. Definition: Residuals are the differences between observed and predicted values of
the dependent variable.
● 2. Purpose: To check the validity of the regression assumptions (e.g., homoscedasticity,
normality).
● 3. Diagnostic Plots:
○ Residuals vs. Fitted Plot: Used to check for linearity and homoscedasticity.
○ Normal Q-Q Plot: Used to check the normality of residuals.
○ Scale-Location Plot: Checks the homoscedasticity assumption.
○ Residuals vs. Leverage Plot: Identifies influential observations (outliers).
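
The first two of these diagnostic plots can be drawn directly from a fitted model. A
minimal sketch assuming statsmodels and matplotlib are available (the figure layout and
names are invented):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(100, 1)))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)
results = sm.OLS(y, X).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. fitted: look for a flat, structureless band around zero
# (supports linearity and homoscedasticity).
ax1.scatter(results.fittedvalues, results.resid)
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="Fitted values", ylabel="Residuals", title="Residuals vs. Fitted")

# Normal Q-Q: residual quantiles should track the reference line (normality).
sm.qqplot(results.resid, line="s", ax=ax2)
ax2.set_title("Normal Q-Q")

plt.tight_layout()
plt.show()
```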

7. Practical Applications of Regression Analysis

● Economics: Predicting GDP based on factors like investment, education, and labor.
● Medicine: Analyzing the relationship between patient outcomes and treatment methods.
● Marketing: Estimating sales based on advertising spend, price, and market conditions.
● Finance: Modeling stock prices based on economic indicators and company
performance.

8. Limitations of Regression Analysis

● 1. Outliers: Can disproportionately affect the regression line and model accuracy.
● 2. Multicollinearity: High correlation between independent variables can make it difficult
to determine the effect of each predictor.
● 3. Overfitting: Including too many predictors can lead to a model that fits the sample
data well but performs poorly on new data.
● 4. Assumption Violations: If the assumptions of regression are violated, the results
may not be reliable.
