
REGRESSION

Adverse Childhood Experiences & Depression


What is an appropriate statistic to use here?
Regression

• “Regression” is a generic term for statistical methods that attempt to fit a model to data in order to quantify the relationship between the dependent (outcome) variable and the independent variable(s) (often called ‘predictors’, ‘covariates’, or ‘features’).
• It can be used either to merely describe the relationship between the two groups of variables (explanation), or to predict new values (prediction).

What will prediction look like in the previous example?
Regression

What will explanation look like in the previous example?
Scatterplot

[Figure: scatterplot of the example data, with the independent variable on the x-axis]
Why is it called Regression?

• Termed "regression" by Sir Francis Galton in the 19th century.
• He was studying the heights of people.
• He found that while there are shorter and taller people, only outliers are very tall or very short, and most people cluster somewhere around (or "regress" to) the average.
Regression - Terminology

• Dependent variable (or target variable): the variable to predict.
• Independent variable (or predictor variable): a variable used to estimate the dependent variable.
• Outlier: an observation that differs significantly from the other observations. Outliers should be examined carefully, since they can distort the results.
• Multicollinearity: a situation in which two or more independent variables are highly linearly related.
• Homoscedasticity (or homogeneity of variance): a situation in which the error variance is the same across all values of the independent variables.
• Regression line: in the prediction of Y from X, the straight line of best fit to the Y values.
• Regression equation: the equation that locates the regression line, Y = bX + c.
Regression Equation

Y = bX + c, where b is the slope (the regression coefficient, reported as b when unstandardized and β (beta) when standardized) and c is the intercept.

Regression Line

Criterion for the regression line: the line of best fit is the one that minimizes the sum of the squares of the discrepancies between the actual values of Y and the predicted values (called the least-squares criterion).
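The least-squares criterion has a closed-form solution for simple linear regression, which can be sketched directly. The data below are invented to illustrate the hours-studied/exam-score example; this is a minimal sketch with numpy, not a full analysis.

```python
import numpy as np

# Hypothetical data for the "hours studied vs exam score" example
# (values invented for illustration).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([52.0, 55.0, 61.0, 65.0, 70.0])

def least_squares_line(x, y):
    """Return slope b and intercept c minimizing sum((y - (b*x + c))**2)."""
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    c = y.mean() - b * x.mean()
    return b, c

b, c = least_squares_line(hours, score)
pred = b * hours + c
# R^2: the share of the total variation in y explained by the line
r2 = 1 - np.sum((score - pred) ** 2) / np.sum((score - score.mean()) ** 2)
print(round(b, 2), round(c, 2), round(r2, 3))  # → 4.6 46.8 0.992
```

Here every extra hour studied is associated with about 4.6 more exam points, and the line explains over 99% of the variation in this (invented) data.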
Regression vs Correlation
Regression -Types

• Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
• Logistic Regression
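Two of the types listed above can be contrasted in a few lines. `numpy.polyfit` fits a degree-d polynomial by least squares, so degree 1 gives linear regression and degree 2 gives polynomial regression. The data below are invented (exactly quadratic) to make the difference obvious.

```python
import numpy as np

# Sketch: linear vs polynomial regression on the same data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2  # exactly quadratic, so a degree-2 fit should be near perfect

lin = np.polyfit(x, y, 1)   # [slope, intercept] of the best straight line
quad = np.polyfit(x, y, 2)  # [a, b, c] for a*x^2 + b*x + c

lin_resid = y - np.polyval(lin, x)
quad_resid = y - np.polyval(quad, x)
print(np.sum(lin_resid**2) > np.sum(quad_resid**2))  # → True
```

When the true relationship is curved, a straight line leaves large residuals; this is why checking the linearity assumption (next slide) matters before trusting a linear fit.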
Regression - Assumptions

1. Your dependent variable should be measured at the continuous level (i.e., it is either an interval or ratio variable).
2. Your independent variable should also be measured at the continuous level (i.e., it is either an interval or ratio variable).
3. There needs to be a linear relationship between the independent and the dependent variable.
4. There should be no significant outliers.
5. You should have independence of observations, and no multicollinearity among the independent variables.
6. Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line.
7. The residuals (errors) of the regression line are approximately normally distributed.
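A couple of these assumptions can be spot-checked from the residuals. The sketch below uses invented data; in practice formal tests (e.g. Shapiro-Wilk for normality, Breusch-Pagan for homoscedasticity) or residual plots are the standard tools, and this is only a rough illustration.

```python
import numpy as np

# Invented data: a true linear relationship with constant-variance noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

b, c = np.polyfit(x, y, 1)
resid = y - (b * x + c)

# With an intercept in the model, residuals average to zero by construction.
print(abs(resid.mean()) < 1e-8)
# Rough homoscedasticity check: |residuals| should not trend with x.
print(abs(np.corrcoef(x, np.abs(resid))[0, 1]))
```

A large correlation between x and the absolute residuals would suggest the error variance changes along the line, violating assumption 6.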
Confidence Interval & Prediction Interval

• Confidence interval: the 95% confidence interval is commonly interpreted as meaning there is a 95% probability that the true regression line of the population lies within the confidence interval calculated from the sample data.
• Prediction interval: an interval around the predicted value y0 for x0 such that there is a 95% probability that the real value of y (in the population) corresponding to x0 lies within this interval.
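The standard textbook formulas make the difference between the two intervals concrete: the prediction interval carries an extra "+1" term for the noise in an individual observation, so it is always wider. The data below are invented, and `t_crit` is the two-sided 95% t critical value for n − 2 = 8 degrees of freedom, hard-coded here from a t table as an assumption rather than computed.

```python
import numpy as np

# Invented data with a roughly linear trend.
x = np.array([1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9])

n = x.size
b, c = np.polyfit(x, y, 1)
resid = y - (b * x + c)
s = np.sqrt(np.sum(resid**2) / (n - 2))  # residual standard error
t_crit = 2.306                           # t_{0.975} with 8 df (from a table)

x0 = 5.5                                 # point at which to form intervals
y0 = b * x0 + c
sxx = np.sum((x - x.mean())**2)
se_mean = s * np.sqrt(1/n + (x0 - x.mean())**2 / sxx)      # for the CI
se_pred = s * np.sqrt(1 + 1/n + (x0 - x.mean())**2 / sxx)  # for the PI

ci = (y0 - t_crit * se_mean, y0 + t_crit * se_mean)  # where the true line lies
pi = (y0 - t_crit * se_pred, y0 + t_crit * se_pred)  # where a new y value lies
```

Because `se_pred` adds the full observation variance, the prediction interval is strictly wider than the confidence interval at every x0.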
Output tables for Linear Regression in PSPP

• The R value is the correlation coefficient.
• The R² value indicates how much of the total variation in the dependent variable can be explained by the independent variable.
• The ANOVA table reports how well the regression equation fits the data.
• The coefficients table reports the regression coefficients.
Multiple Regression
Output tables for Multiple Regression in PSPP

• The R value is the correlation coefficient.
• The R² value indicates how much of the total variation in the dependent variable can be explained by the independent variables.
• The ANOVA table reports how well the regression equation fits the data.
• The coefficients table reports the regression coefficients.
What makes a Multiple regression multiple?

• A multiple regression considers the effect of more than one explanatory variable on some outcome of interest.
• It evaluates the relative effect of these explanatory, or independent, variables on the dependent variable while holding all the other variables in the model constant.
Why Use a Multiple Regression Over a
Linear Regression?

• A dependent variable is rarely explained by only one variable.
• In such cases, an analyst uses multiple regression, which attempts to explain a dependent variable using more than one independent variable.
• The model, however, assumes that there are no major correlations between the independent variables.
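The home-price idea above can be sketched with ordinary least squares solved via `numpy.linalg.lstsq`. All data values below are invented for illustration; a real analysis (e.g. in PSPP) would also report standard errors and p-values for each coefficient.

```python
import numpy as np

# Hypothetical home-price data (values invented for illustration).
# Columns: bedrooms, square footage (in 100s), age of home (years)
X = np.array([
    [2, 10.0, 30],
    [3, 15.0, 10],
    [3, 14.0, 25],
    [4, 20.0,  5],
    [4, 22.0, 15],
    [5, 25.0,  8],
], dtype=float)
price = np.array([200., 310., 280., 420., 430., 500.])  # in $1000s

# Add an intercept column, then solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, price, rcond=None)
intercept, b_bedrooms, b_sqft, b_age = coef
# Each coefficient estimates that variable's effect on price
# while holding the other variables in the model constant.
```

Note that bedrooms and square footage are likely correlated, which illustrates the multicollinearity caveat: highly related predictors make the individual coefficients unstable even when the overall fit is good.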
Work in pairs!

Give me one example each of linear regression and multiple regression.
