Stat Lab 2

This document provides an introduction to regression analysis. It explains that regression is used to determine the linear relationship between two or more variables and can be used for prediction and causal inference. Regression shows how one variable co-varies with another by estimating the slope and intercept of the linear relationship, with an error term capturing unexplained variation and R² measuring how well the model fits the data. However, regression alone cannot prove causation. The document also distinguishes regression from correlation and introduces the simple and multiple linear regression models.

Uploaded by docbin
Copyright © All Rights Reserved

StatLab Workshop Series 2008 Introduction to Regression/Data Analysis

If you are interested in whether one variable differs among possible groups, for instance,
then regression isn’t necessarily the best way to answer that question. Often you can find
your answer by doing a t-test or an ANOVA. The flow chart shows you the types of
questions you should ask yourself to determine what type of analysis you should
perform. Regression will be the focus of this workshop because it is very commonly
used and quite versatile, but if you need information or assistance with any other type
of analysis, the consultants at the StatLab are here to help.

II. Regression: An Introduction

A. What is regression?

Regression is a statistical technique to determine the linear relationship between two or
more variables. Regression is primarily used for prediction and causal inference.

In its simplest (bivariate) form, regression shows the relationship between one
independent variable (X) and a dependent variable (Y), as in the formula below:

$$Y = \beta_0 + \beta_1 X + u$$

The magnitude and direction of that relation are given by the slope parameter ($\beta_1$), and
the expected value of the dependent variable when the independent variable equals zero is
given by the intercept parameter ($\beta_0$). An error term ($u$) captures the variation in Y
not predicted by the slope and intercept terms. The coefficient of determination ($R^2$)
shows how well the model fits the data, as the share of the variation in Y that it explains.
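To make the parameters concrete, here is a minimal Python sketch that estimates the bivariate model Y = β0 + β1X + u by ordinary least squares. OLS is an assumption here: this section does not say which estimation method the workshop uses, although it is the standard choice. The function names `fit_simple_ols` and `r_squared` and the toy data are hypothetical, for illustration only. The residuals stand in for the error term u, and R² is computed as 1 - SS_res / SS_tot.

```python
def fit_simple_ols(xs, ys):
    """Estimate beta_0 (intercept) and beta_1 (slope) of Y = beta_0 + beta_1*X + u
    by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cross-products and squared deviations around the sample means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X does not vary, so the slope is undefined")
    beta_1 = s_xy / s_xx              # slope: change in Y per one-unit change in X
    beta_0 = y_bar - beta_1 * x_bar   # intercept: fitted Y when X = 0
    return beta_0, beta_1


def r_squared(xs, ys, beta_0, beta_1):
    """Share of the variation in Y accounted for by the fitted line."""
    y_bar = sum(ys) / len(ys)
    # Residuals are the sample estimates of the error term u
    residuals = [y - (beta_0 + beta_1 * x) for x, y in zip(xs, ys)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, not taken from the workshop
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_simple_ols(xs, ys)
print(round(b0, 2), round(b1, 2))            # 0.09 1.97
print(round(r_squared(xs, ys, b0, b1), 3))   # 0.998
print(round(b0 + b1 * 6, 2))                 # 11.91, the predicted Y at X = 6
```

The last line uses the fitted equation for prediction, one of the two uses of regression named earlier in this section.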

Regression thus shows us how variation in one variable co-occurs with variation in
another. What regression cannot show is causation; causation is only demonstrated
analytically, through substantive theory. For example, a regression with shoe size as an
independent variable and foot size as a dependent variable would show a very high $R^2$
and highly significant parameter estimates, but we should not
conclude that higher shoe size causes higher foot size. All that the mathematics can tell
us is whether or not they are correlated, and if so, by how much.
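A quick numerical illustration of why the fit statistics cannot settle the direction of cause: in a bivariate regression with an intercept, R² equals the squared Pearson correlation between X and Y, so it comes out the same whichever variable is placed on the left-hand side. The sketch below uses hypothetical shoe-size and foot-length values (invented for illustration, not from the workshop) and hypothetical helper names.

```python
import math


def _centered_sums(xs, ys):
    """Return the means plus S_xy, S_xx and S_yy (sums around the means)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, s_xy, s_xx, s_yy


def r_squared_of_fit(xs, ys):
    """R^2 of an OLS regression of ys (dependent) on xs (independent)."""
    x_bar, y_bar, s_xy, s_xx, s_yy = _centered_sums(xs, ys)
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    ss_res = sum((y - (beta_0 + beta_1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / s_yy


def pearson_r(xs, ys):
    """Pearson correlation coefficient of xs and ys."""
    _, _, s_xy, s_xx, s_yy = _centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical measurements, invented for illustration
shoe_size = [7, 8, 9, 10, 11, 12]
foot_cm = [24.1, 24.9, 25.8, 26.6, 27.5, 28.4]

forward = r_squared_of_fit(shoe_size, foot_cm)   # foot size regressed on shoe size
backward = r_squared_of_fit(foot_cm, shoe_size)  # shoe size regressed on foot size
print(math.isclose(forward, backward))                            # True
print(math.isclose(forward, pearson_r(shoe_size, foot_cm) ** 2))  # True
```

Both directions fit equally well, so the numbers alone give no reason to prefer "shoe size causes foot size" over the reverse; that judgment has to come from substantive theory.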

It is important to recognize that regression analysis is fundamentally different from
ascertaining the correlations among different variables. Correlation measures the
strength (and direction) of the linear relationship between variables, while regression
describes that relationship in more detail, for example by estimating how much Y is
expected to change for a one-unit change in X.
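To illustrate the difference, the sketch below computes both the Pearson correlation r and the OLS slope for the same pair of variables, then rescales Y by a factor of 10 (think of a change of units). The data and helper names are hypothetical. The correlation, a unitless measure of strength, does not change; the slope, which describes the relationship in Y-units per X-unit, changes by exactly that factor. The last check confirms the standard identity slope = r × (s_Y / s_X).

```python
import math


def _centered_sums(xs, ys):
    """Return S_xy, S_xx and S_yy, the sums of products around the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy, s_xx, s_yy


def pearson_r(xs, ys):
    """Correlation: strength and direction of the linear association (unitless)."""
    s_xy, s_xx, s_yy = _centered_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def ols_slope(xs, ys):
    """Regression slope: expected change in Y per one-unit change in X."""
    s_xy, s_xx, _ = _centered_sums(xs, ys)
    return s_xy / s_xx


# Hypothetical data: hours of study (X) and exam score (Y)
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
score_tenths = [y / 10 for y in score]   # same Y measured on a different scale

r_raw, r_scaled = pearson_r(hours, score), pearson_r(hours, score_tenths)
b_raw, b_scaled = ols_slope(hours, score), ols_slope(hours, score_tenths)
print(math.isclose(r_raw, r_scaled))        # True: r ignores the scale of Y
print(math.isclose(b_raw, 10 * b_scaled))   # True: the slope carries Y's units

# Slope and correlation are linked by b = r * (s_Y / s_X)
s_xy, s_xx, s_yy = _centered_sums(hours, score)
print(math.isclose(b_raw, r_raw * math.sqrt(s_yy / s_xx)))  # True
```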

B. The linear regression model (LRM)

The simple (or bivariate) LRM is designed to study the relationship between a pair
of variables that appear in a data set. The multiple LRM is designed to study the
relationship between one variable and several other variables.
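The multiple LRM extends the bivariate formula to k independent variables, Y = β0 + β1X1 + ... + βkXk + u. Below is a minimal pure-Python sketch, assuming OLS estimation via the normal equations (XᵀX)β = Xᵀy; the names `fit_ols` and `_solve` and the data are hypothetical. Because the toy Y is generated without noise from known coefficients, the fit should recover them.

```python
def _solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        # Choose the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; are some predictors perfectly collinear?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def fit_ols(rows, ys):
    """OLS estimates [b0, b1, ..., bk] for Y = b0 + b1*X1 + ... + bk*Xk + u.

    rows holds one [x1, ..., xk] list per observation.
    """
    design = [[1.0] + [float(v) for v in row] for row in rows]  # column of 1s = intercept
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return _solve(xtx, xty)


# Hypothetical, noise-free data generated from Y = 1 + 2*X1 - 0.5*X2
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 2]]
ys = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in rows]
print([round(b, 6) for b in fit_ols(rows, ys)])  # [1.0, 2.0, -0.5]
```

Solving the normal equations directly is the textbook route and easy to follow; many statistical packages (R's `lm`, for instance) use a QR decomposition instead because it is numerically more stable when predictors are nearly collinear.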

In both cases, the sample is considered a random sample from some population. The two
variables, X and Y, are two measured outcomes for each observation in the data set. For

http://www.yale.edu/statlab
