STA2100-Regression Analysis

Uploaded by

kigsboni


STA 2100 Probability and Statistics I

Chapter 9
Relations (Simple Linear Regression)

Learning outcomes
Upon completing this topic, you should be able to:

• Define correlation, regression and know the link between these two concepts

• Know the assumptions of linear regression

• Calculate the equations of least squares regression lines, and use them to
estimate any given values for a set of data

• Interpret the meaning of the values obtained in the regression equation, and
how the values link to the correlation coefficient.


1. Introduction to Regression
As indicated in the introductory section of the previous lesson, this lesson looks
at regression, and specifically simple linear regression.
If two variables are significantly correlated, and if there is some theoretical basis
for doing so, it is possible to predict values of one variable from the other.
Regression analysis, in a general sense, means the estimation or prediction of the
unknown value of one variable from the known value of another variable. It is
one of the most important statistical tools and is used extensively in almost all
sciences: natural, social and physical.
“Regression analysis is a mathematical measure of the average relationship
between two or more variables in terms of the original units of the data.”

Definitions

Regression analysis is a measure of the average relationship between two or
more variables in terms of the original units in which the data were given.
It provides a mathematical expression, an equation for estimating or
predicting the values of one variable from the known values of one or more
other variables.

Predictions

One of the primary advantages of knowing about a relationship between two
variables is that the knowledge can be used to make predictions. Specifically,
when one has exact knowledge of an individual's score on one of the two
variables, one can use knowledge of the relationship to increase the accuracy
of a prediction of that individual's score on the other variable. Note that by
the term prediction we mean a “best guess” of what a single value or score will
be, e.g. the number of years that a 62-year-old woman with high blood pressure
will live, or the time an individual machine will run before it breaks down.
A prediction is therefore a guess about the value of an item to be drawn from a
specified population.

A predictor variable is one that provides relevant information for predicting what
scores will be on some other variable.


A predicted variable is one about which predictions are made. (An exact
relationship between the predictor and the predicted variable is essential if
we are to make the most accurate predictions possible.)

Simple Regression: Here, we use a single variable to estimate or predict another
variable.

Linear Regression Analysis: A regression analysis is called linear if the equation
of the model represents a straight line.

Curvilinear Regression Analysis: A regression analysis is called curvilinear if
the equation of the model represents a curve.

Multiple Regression: This is regression involving several independent (predictor)
variables.

Independent Variable: This is the variable whose value is known.

Dependent Variable: This is the variable whose value is to be predicted.

By convention, the independent variable is denoted by X and the dependent
variable by Y.

The simplest form of regression analysis is called simple linear regression or
straight-line regression. It involves statistical modelling of the relationship
between a single input factor X (the “regressor”) and a single output variable Y
(the “response”).

1.1. Simple Linear (least squares) Regression Model

• At some point you have probably encountered the equation of a straight
line, y = mx + b.

• In this equation m is the “slope” of the line (change in y over change in
x) and b is the “intercept” of the line, the point where the line crosses the
y-axis.

• The plot of a line with slope 2 and intercept 1, i.e. y = 2x + 1, is depicted
in the following figure:

[Figure: straight line with slope 2 and intercept 1]


• Even though the straight-line model is perfect for algebra class, in the real
world finding such a perfect linear relationship is next to impossible.

• Most real-world relationships are not perfectly linear; the relationship
between x and y looks more like the scattered points in the correlation plot
we saw earlier.

• In this case, the question literally is “Where do you draw the line?”.
Simple linear regression is the statistical technique used to answer this
question.

• Simple linear regression is a statistical model of the relationship between X
and Y in the real world, where there is random variation associated with
measured quantities. The simplest relationship between X and Y to study is a
straight line, as opposed to a more complex relationship such as a polynomial.

• Therefore in most cases we first try to fit the data to a linear model.

• Plotting X versus Y, as we did in the correlation plot, is a good first step
in determining whether a linear model is appropriate. The plot may reveal a lot
of detail about the data at hand.

• Sometimes data can be transformed (often by taking logarithms, square roots
or other mathematical functions) to fit a linear pattern if they do not do so
in the original measurement scale.

• Therefore, determining whether a straight-line relationship between X and Y is
appropriate is the first step.

• The second step, once it is determined that a linear model is a good idea,
is to determine the best-fitting line that represents the relationship.
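
As a minimal sketch of the transformation idea above, assume some made-up data that follow the exponential curve y = 2e^(0.5x); these numbers are purely illustrative and do not come from the notes. Taking logarithms gives log y = log 2 + 0.5x, which is a straight line in x, so the change in log y per unit of x is constant:

```python
import math

# Hypothetical data following y = 2 * exp(0.5 * x); not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2 * math.exp(0.5 * xi) for xi in x]

# Successive differences on the original scale keep growing,
# so y is not linear in x.
raw_steps = [y[i + 1] - y[i] for i in range(len(y) - 1)]

# After a log transform the successive differences are constant (0.5 here),
# which is what a straight-line relationship looks like.
log_y = [math.log(yi) for yi in y]
log_steps = [log_y[i + 1] - log_y[i] for i in range(len(log_y) - 1)]

print([round(s, 3) for s in raw_steps])
print([round(s, 3) for s in log_steps])
```

In practice the transformed points will not fall exactly on a line; a scatter plot of log y against x is still the quickest check.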

1.2. Assumptions of linear regression

Generally, to fit a regression model, linear or otherwise, some assumptions have
to be made. These include:

• For any given value of X, the true mean value of Y depends on X; this mean can
be written $\mu_{y|x}$.

• In regression, the line represents mean values of Y, not individual data values.

• Each observation $Y_i$ is independent of all other observations $Y_j$, $j \neq i$.

• We assume linearity between X and the mean of Y. The mean of Y is
determined by the straight-line relationship, which can be written as
$\mu_{y|x} = \beta_0 + \beta_1 X$ or y∗ = a + bx, where $\beta_0$ (or a) is the
intercept and $\beta_1$ (or b) is the slope of the line.

• The variance of the $Y_i$ is constant for all values of X (the homoscedasticity
property).

• The responses $Y_i$ are normally distributed about their mean $\mu_{y|x}$.
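
Taken together, the assumptions above are often summarised in a single model statement. The error term $\varepsilon_i$ and the common variance $\sigma^2$ are notation introduced here for illustration; they are not used elsewhere in these notes:

```latex
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently}, \quad i = 1, \dots, n
```

Under this statement the mean of $Y_i$ at a given $x_i$ is $\beta_0 + \beta_1 x_i$ and its variance is $\sigma^2$, whatever the value of $x_i$.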

1.3. Fitting the Regression Model/Equation

• Consider the correlation plot between gene1 and gene2 as shown in the
following figure.

[Figure: scatter plot of gene2 (vertical axis) against gene1 (horizontal axis)]

• We could simply take a ruler and draw in a straight line which, to our
subjective eyes, best goes through the data.

• This method is subject to much error and is unlikely to produce the
“best fitting” line. Therefore a more sophisticated method is needed.

• Regression analysis can be thought of as the flip side of correlation. It has
to do with finding the equation for the kind of straight lines we have just
looked at.

• Suppose we have a sample of size n with two sets of measures, denoted
by x and y. We can predict the values of y given the values of x by using
the equation y∗ = a + bx, where

$$ b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} $$

This can be rearranged and expressed as

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

For a we have

$$ a = \bar{y} - b\bar{x} $$

or, rewritten,

$$ a = \frac{\sum y - b\sum x}{n} $$

The symbol y∗ (also written ŷ) refers to the value of y predicted from a given
value of x by the regression equation.


• Suppose we have the linear equation y = 25 + 20x, which gives the total
cost, y, of a word processing job. Given the amount of time required, x, we
can use the equation to determine the exact cost of the job, y.

• However, things are rarely as simple as in this word processing example,
so more often than not we have to be content with rough predictions.
In fact, in many circumstances the variable being predicted will vary even
for a fixed value of the variable being used to make the prediction.

• For instance, we cannot predict the exact price of a Datsun Z car just by
knowing its age. Indeed, even for a fixed age, say three (3) years, the
price of a Datsun Z varies from car to car.

Example. Suppose we have the following data on Age vs Price of Datsun Zs.

Age (yrs)       5   7   6   6   5   4   7   6   5   5    2
Price ($100)   80  57  58  55  70  88  43  60  69  63  118

It is useful to plot the data so that we can visualize the apparent relationship
between age and price. Such a plot is known as a scatter diagram.


From the diagram, it is clear that the points do not lie on a straight line, but
they do appear to be clustered about a straight line. Hence, we can fit a
straight line to the data and then use that line to predict the price of Datsun Zs.
Since it is possible to draw many reasonable-looking straight lines through the
cluster of points, we need a method to choose the “best” line. The method used
is known as the least-squares criterion.
So how does it work?
Simple illustration:
Suppose we have two lines A and B drawn through a set of points in a scatter
diagram, say:
Line A: y = 0.5 + 1.25x
Line B: y = −0.25 + 1.5x
Then the predicted values and errors for the two lines are as follows:
x   y   ŷA     eA = y − ŷA   eA²       ŷB      eB = y − ŷB   eB²
0   2   0.50   1.50          2.2500    −0.25   2.25          5.0625
1   4   1.75   2.25          5.0625    1.25    2.75          7.5625
2   6   3.00   3.00          9.0000    2.75    3.25          10.5625
3   8   4.25   3.75          14.0625   4.25    3.75          14.0625
                       ΣeA² = 30.375                  ΣeB² = 37.25
Where,
x is the observed value of x
y is the observed value of y
eA is the error made if we use line A for prediction
eB is the error made if we use line B for prediction
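
The sums of squared errors in the table above can be checked with a short Python sketch. The helper name `sum_squared_errors` is an illustrative choice, not something defined in the notes:

```python
# Observed data from the table above.
x = [0, 1, 2, 3]
y = [2, 4, 6, 8]

def sum_squared_errors(intercept, slope, xs, ys):
    """Return the sum of squared errors for the line y = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(xs, ys))

sse_a = sum_squared_errors(0.5, 1.25, x, y)    # Line A: y = 0.5 + 1.25x
sse_b = sum_squared_errors(-0.25, 1.5, x, y)   # Line B: y = -0.25 + 1.5x

print(sse_a, sse_b)  # 30.375 37.25
```

Line A has the smaller sum of squared errors, so of these two candidates it is the better fit.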

• The rule for choosing the best line among several possible lines is to
choose the line with the smallest value of $\sum e^2$. This line gives the best
fit for the data at hand. Applying the rule directly is not easy, as we would
be forced to draw all possible lines, which is not feasible. To solve the
problem, we use the regression equation formulas as previously illustrated.
The line obtained using those formulas is the line with the least sum
of squared errors, hence the name least-squares regression!

• Hence, from the above example of lines A and B, we would choose line A
as it has the smaller sum of squared errors, i.e. it is the line of better fit
for the data if we were to consider only these two lines.

• The least-squares criterion tells us what property the best-fitting line to a
set of data points must have, but by itself it does not present a formula that
permits us to actually determine that line. (Here we simply use the formulas
for a and b, which are derived using elementary calculus.)
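
For readers curious about the calculus step mentioned above, a brief sketch of the standard derivation follows. The symbol $S(a, b)$ for the sum of squared errors is notation introduced here for illustration:

```latex
\begin{align*}
S(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = -2 \sum (y_i - a - b x_i) = 0
  &\;\Rightarrow\; \sum y = n a + b \sum x \\
\frac{\partial S}{\partial b} = -2 \sum x_i (y_i - a - b x_i) = 0
  &\;\Rightarrow\; \sum xy = a \sum x + b \sum x^2
\end{align*}
```

Solving these two equations simultaneously gives $a = (\sum y - b \sum x)/n$ and $b = (n\sum xy - \sum x \sum y)/(n\sum x^2 - (\sum x)^2)$, the computational formulas for a and b used in these notes.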

1.4. Regression Equation:

As previously indicated, the equation of the best-fitting line (regression line) to a
set of data points is given by
y∗ = a + bx
where

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

and

$$ a = \frac{\sum y - b\sum x}{n} $$
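
The two formulas can be sketched in Python as follows. The function name `fit_line` and its arguments are illustrative choices, not part of the notes, and the check uses a small made-up data set lying exactly on y = 1 + 2x:

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y* = a + b x.

    Uses the computational formulas
        b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
        a = (Sy - b*Sx) / n
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b

# Made-up points lying exactly on y = 1 + 2x, so the fit should recover
# an intercept of 1 and a slope of 2.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

The guard on the denominator reflects the fact that a slope cannot be estimated when every x value is the same.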

We can then derive the line of best fit for the Datsun Z cars example, and also
answer the following questions;
Example. Refer to Age Vs Price data for the Datsun Z’s:

1. Determine the regression equation for the data; i.e. find the equation of the
regression line.

2. Describe the apparent relationship between Age and price for Datsun Zs

3. What does the slope of the regression equation represent in terms of the
prices for Datsun Zs?


4. Use the regression equation to predict the price for a two year-old Z and a
five-year old Z.

Solution

1. To determine the regression equation, we need to compute a and b using the
formulas above. It is therefore convenient to construct a table of values for
x, y, xy and x², together with their sums Σx, Σy, Σxy and Σx², as presented
below (here n = 11):

x     y     xy      x²
5     80    400     25
7     57    399     49
6     58    348     36
6     55    330     36
5     70    350     25
4     88    352     16
7     43    301     49
6     60    360     36
5     69    345     25
5     63    315     25
2     118   236     4
Σ: 58  761   3,736   326

The slope of the regression equation is therefore:

$$ b = \frac{11(3736) - (58)(761)}{11(326) - (58)^2} = \frac{-3042}{222} \approx -13.7027 \approx -13.7 $$

While the intercept is:

$$ a = \frac{761 - (-13.7027)(58)}{11} \approx 141.43 $$

Thus, the regression equation for this data is:

ŷ = 141.43 − 13.7x


2. To graph the regression equation, we need only substitute two different
x-values to obtain two distinct points (why?). Using the x-values x = 2 and
x = 8, the corresponding y-values are:

ŷ = 141.43 − 13.7(2) = 114.03

ŷ = 141.43 − 13.7(8) = 31.83
Consequently, the regression line passes through the two points (2, 114.03)
and (8, 31.83). These points and the line through them should be plotted on the
same diagram as the data points given in the table. This is the straight line
that best fits the data points according to the least-squares criterion (i.e. the
straight line whose sum of squared errors is smallest).

3. Here, we are to describe the apparent relationship between age and price for
Datsun Zs. Since the slope of the regression line is negative, we see that
the price tends to decrease as age increases. Any surprises?

4. For this part, we are to interpret the slope of the regression equation in
terms of the prices for Datsun Zs. To begin, recall that x represents age, in
years, and y represents price, in hundreds of dollars. The slope of −13.70
(that is, −$1,370) indicates that Datsun Zs depreciate by an estimated $1,370
per year, at least in the two- to seven-year-old range.

5. Finally, we are to use the regression equation ŷ = 141.43 − 13.7x to
predict the price of a two-year-old Z and a five-year-old Z, i.e. at x = 2 and
x = 5. The predicted price of a two-year-old Z is:

ŷ = 141.43 − 13.7(2) = 114.03

or $11,403. Similarly, the predicted price of a five-year-old Z is:
ŷ = 141.43 − 13.7(5) = 72.93
or $7,293.
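
The whole Datsun Z calculation can be reproduced with a short Python sketch. The variable names and the helper `predict` are illustrative choices; the data are those given in the example:

```python
# Age (years) and price (in $100s) for the 11 Datsun Zs in the example.
age = [5, 7, 6, 6, 5, 4, 7, 6, 5, 5, 2]
price = [80, 57, 58, 55, 70, 88, 43, 60, 69, 63, 118]

n = len(age)
sum_x = sum(age)                                  # 58
sum_y = sum(price)                                # 761
sum_xy = sum(x * y for x, y in zip(age, price))   # 3736
sum_x2 = sum(x * x for x in age)                  # 326

# Slope and intercept from the computational formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

def predict(x):
    """Predicted price (in $100s) for a car that is x years old."""
    return a + b * x

print(round(b, 2), round(a, 2))                    # -13.7 141.43
print(round(predict(2), 2), round(predict(5), 2))  # 114.03 72.92
```

The prediction for a five-year-old car comes out as 72.92 rather than 72.93 because the code carries the unrounded slope and intercept through the calculation, whereas the worked solution rounds them first.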
Remark 3. Warning on use of the linear regression line: The idea behind the
regression line is based on the assumption that the data points are actually
scattered about a straight line. But data points can at times be scattered about
a curve. Unfortunately, the formulas for a and b will still work for such a data
set, but they will fit an inappropriate regression line to the data.


Remark 4. If you plan to find a regression line for a set of data points, first look at
a scatter diagram of the data. If data points do not appear to be scattered about
a straight line, do not determine a regression line.
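
The warning in the remarks above can be illustrated with a small sketch. The data here are made up (points lying exactly on the curve y = x²) and are not from the notes:

```python
# Made-up data lying exactly on the curve y = x**2 (not from the notes).
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# The formulas run without complaint...
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# ...but the fitted line is flat (intercept 2, slope 0), even though
# y is completely determined by x.
print(a, b)  # 2.0 0.0
```

A scatter diagram of these points would show the curve immediately, which is why the plot should come before the fit.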

2. Exercises

Exercise 22. Scores made by students in a statistics class in the mid-term and
final examination are given here. Develop a regression equation which may be used
to predict final examination scores from the mid-term score.

Student     1   2    3   4   5   6   7   8   9  10
Mid-term   98  66  100  96  88  45  76  60  74  82
Final      90  74   98  88  80  62  78  74  86  80

3. Revision Questions
The following is a list of questions that will assist you in your revision.
Practice Problems:

1. Let variable X be the number of hamburgers consumed at a cook-out, and variable Y
the number of beers consumed. Develop a regression equation to predict how
many beers a person will consume given that we know how many hamburgers that
person will consume.

Subject 1 2 3 4 5
Hamburgers 5 4 3 2 1
Beers 8 10 4 6 2

2. A horse owner is investigating the relationship between weight carried and the finish
position of several horses in his stable. Calculate r and R for the data given

Weight 11 11 12 11 11 11 11 12 10 10 11 11
carried
Position 0
2 3
6 0
3 5
4 0
6 5 7
4 3
2 6
1 8
4 0
1 0
3
Finished

3. The top and bottom numbers which may appear on a die are as follows. Calculate r
and R for these values. Are the results surprising?

Top 1 2 3 4 5 6

Bottom 5 6 4 3 1 2

4. Researchers interested in determining if there is a relationship between death
anxiety and religiosity conducted the following study. Subjects completed a death
anxiety scale (high score = high anxiety) and also completed a checklist designed to
measure an individual’s degree of religiosity (belief in a particular religion, regular
attendance at religious services, number of times per week they regularly pray, etc.)
(high score = greater religiosity). A data sample is provided below:

X 38 42 29 31 28 15 24 17 19 11 8 19 3 14 6
y 4 3 11 5 9 6 14 9 10 15 19 17 10 14 18

a) What is your computed answer?

b) What does this statistic mean concerning the relationship between death
anxiety and religiosity?

c) What percent of the variability is accounted for by the relation of these two
variables?

5. The data given below are obtained from student records (Grade Point Average (x)
and Graduate Record Exam score (y)). Calculate the regression equation and compute
the estimated GRE scores for GPA = 7.5 and 8.5.

Subject 11 12 13 14 15 16 17 18 19 20
X 8.3 8.6 9.2 9.8 8.0 7.8 9.4 9.0 7.2 8.6
y 2300 2250 2380 2400 2000 2100 2360 2350 2000 2260


6. A horse was tested to see how many minutes it takes to reach a point from
the starting point. The horse was made to carry luggage of various weights on 10
trials. The data collected are presented in the table below. Find the regression
equation between the load and the time taken to reach the goal. Estimate the time
taken for loads of 35 kg, 23 kg and 9 kg. Are the answers in agreement with
your intuitive feelings? Justify.

Trial 1 2 3 4 5 6 8 8 9 10
Number 11
Weight 23 16 32 12 28 29 19 25 20
(in Kgs) 13
Time 22 16 47 13 39 43 21 32 22
taken
(in mins)

7. A study was conducted to find whether there is any relationship between the weight
and blood pressure of an individual. The following set of data was arrived at from a
clinical study.

Serial 1 2 3 4 5 6 8 8 9 10
Number 78
Weight 86 72 822 80 86 84 89 68 71
Blood 140 160 134 144 180 176 174 178 128 132
Pressure
8. It is assumed that achievement test scores should be correlated with students'
classroom performance. One would expect that students who consistently perform
well in the classroom (tests, quizzes, etc.) would also perform well on a standardized
achievement test (0 - 100 with 100 indicating high achievement (x)). A teacher
decides to examine this hypothesis. At the end of the academic year, she computes a
correlation between the students' achievement test scores (she purposefully did not
look at these data until after she submitted students' grades) and the overall G.P.A. (y)
for each student computed over the entire year. The data for her class are provided
below.

X 98 96 94 88 01 77 86 71 59 6 8 7 7 7 8 8 7 9 9 6
Y 3.6 2.7 3.1 4.0 3.2 3.0 3.8 2.6 3.0 3 4 9
2 1 5 2
3 2 6 3
2 2 5 1 3 3
2 3 0 2
1
. . . . . . . . . . .
2 7 1 6 9 4 4 8 7 2 6

a) Compute the correlation coefficient.

b) What does this statistic mean concerning the relationship between
achievement test performance and G.P.A.?

c) What percent of the variability is accounted for by the relationship between
the two variables and what does this statistic mean?

d) What would be the slope and y-intercept for a regression line based on this
data?

e) If a student scored a 93 on the achievement test, what would be their
predicted G.P.A.? If they scored a 74? An 88?

9. With the growth of internet service providers, a researcher decides to examine
whether there is a correlation between cost of internet service per month (rounded to
the nearest dollar) and degree of customer satisfaction (on a scale of 1 - 10 with a 1
being not at all satisfied and a 10 being extremely satisfied). The researcher only
includes programs with comparable types of services. A sample of the data is
provided below.

Dollars 11 18 17 15 9 5 12 19 22 25
Satisfaction 6 8 10 4 9 6 3 5 2 10

a) Compute the correlation coefficient.

b) What does this statistic mean concerning the relationship between amount of
money spent per month on internet provider service and level of customer
satisfaction?

c) What percent of the variability is accounted for by the relationship between
the two variables and what does this statistic mean?

10. It is hypothesized that there are fluctuations in norepinephrine (NE) levels which
accompany fluctuations in affect in bipolar affective disorder (manic-depressive
illness). Thus, during depressive states, NE levels drop; during manic states, NE
levels increase. To test this relationship, researchers measured the level of NE by
measuring the metabolite 3-methoxy-4-hydroxyphenylglycol (MHPG, in micrograms
per 24 hours) in the urine of patients experiencing varying levels of mania/depression.
Increased levels of MHPG are correlated with increased metabolism (thus higher
levels) of central nervous system NE. Levels of mania/depression were also recorded
on a scale with a low score indicating increased mania and a high score increased
depression. The data are provided below.

MHPG 980 1209 1403 1950 1814 1280 1073 1066 880 776

Affect 22 26 8 10 5 19 26 12 23 28

a) Compute the correlation coefficient.

b) What does this statistic mean concerning the relationship between MHPG
levels and affect?

c) What percent of the variability is accounted for by the relationship between
the two variables?

d) What would be the slope and y-intercept for a regression line based on this
data?

e) What would be the predicted affect score if the individual had an MHPG level
of 1100? of 950? of 700?


4. Learning Activities
1. Daniel computed the following statistics based on the amount (X) in millions
(Kshs) that he invested in his cyber café business, and the income (Y) in
millions (Kshs) generated.

$$ n = 10, \quad \sum x_i = 93, \quad \sum x_i^2 = 999, \quad \sum x_i y_i = 293, \quad \sum y_i = 28, \quad \sum y_i^2 = 90 $$

• Using the data, fit a linear regression line of the income (y) generated on
the amount (x) invested.

• Use the regression equation to determine how much Daniel would realize if
he invested Kshs 2.5M and comment on your results.
