Regression

Regression analysis is a statistical technique used to understand the relationship between variables. Simple linear regression analyzes the relationship between a single independent variable and a dependent variable. The estimated simple linear regression equation uses the slope (b1) and the y-intercept (b0) to predict the dependent variable (y) from the independent variable (x). For example, a construction company used advertising expenditures (x) to predict sales (y) and found that sales increased by about ₱5.67 for every ₱1 increase in advertising expenditures.

Uploaded by Jonjie Milado
Copyright © All Rights Reserved

IT2011

Regression Analysis
• In statistical modeling, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or predictors). More specifically, regression analysis helps one understand how the typical value of the dependent (or criterion) variable changes when the independent variables are varied. Given below are some common applications of regression analysis in business and the social sciences.
o The marketing manager wants to know whether sales depend on factors such as advertising spending, the number of products introduced, the number of sales personnel, etc.
o The HR department wants to predict the efficiency of management trainees based on their academic performance, leadership abilities, IQ level, etc.
o A social researcher wants to predict the age at marriage of a girl based on characteristics such as her education level, her parents' education level, her number of siblings, and her parents' annual income.

Simple Linear Regression


• Simple linear regression is a statistical method used to summarize and study the linear relationship between two (2) quantitative variables. One variable, denoted by X, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted by Y, is regarded as the response, outcome, or dependent variable.
• As the word "simple" suggests, simple linear regression involves only one independent variable.

Population Simple Linear Regression Model

𝑦 = 𝛽0 + 𝛽1 𝑥 + 𝜀
where:

𝛽0 is the y-intercept of the regression line;

𝛽1 is the slope of the regression line; and

𝜀 is the error term.

Estimated Simple Linear Regression Equation (Least Squares Regression Equation)

• It is an equation that is used to predict the value of the dependent variable based on the value of
the independent variable.

𝑦̂ = 𝑏0 + 𝑏1 𝑥
where:

𝑦̂ is the predicted value of y for a given x value;

𝑏0 is the y-intercept of the line; and

𝑏1 is the slope of the line.

10 Handout 1 *Property of STI



*Note: 𝑏0 and 𝑏1 are the sample statistics used to estimate 𝛽0 and 𝛽1

• The Estimated Simple Linear Regression Equation can be formulated by finding the slope and y-
intercept of the equation.
o Slope of the equation

$$b_1 = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2}$$

where:

𝑋 is the set of observations in the independent variable;

𝑌 is the set of observations in the dependent variable;

𝑋𝑌 is the set of products of each paired observation of 𝑋 and 𝑌;

𝑋² is the set of the squares of observations in the independent variable; and

𝑛 is the sample size.

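o y-intercept of the equation

$$b_0 = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2}$$

where:

𝑋 is the set of observations in the independent variable;

𝑌 is the set of observations in the dependent variable;

𝑋𝑌 is the set of products of each paired observation of 𝑋 and 𝑌;

𝑋² is the set of the squares of observations in the independent variable; and

𝑛 is the sample size.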
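The two summation formulas above translate directly into code. The following is a minimal Python sketch, assuming the paired observations are stored in two plain lists; the function name `least_squares_line` is illustrative and not part of the handout.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) computed with the summation formulas for the y-intercept and slope."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two observations")
    n = len(xs)
    sum_x = sum(xs)                                # ∑X
    sum_y = sum(ys)                                # ∑Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # ∑XY
    sum_x2 = sum(x * x for x in xs)                # ∑X²
    denominator = n * sum_x2 - sum_x ** 2          # shared by both formulas
    if denominator == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return b0, b1


# Points that lie exactly on y = 3 + 2x, so the formulas should recover b0 = 3 and b1 = 2.
b0, b1 = least_squares_line([1, 2, 3, 4], [5, 7, 9, 11])
print(b0, b1)  # prints 3.0 2.0
```

Both formulas share the same denominator, so it is computed once; it becomes zero only when every x value is identical, in which case no unique line exists.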

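Coefficient of Determination (𝑟²)

• It is used to determine how well the estimated regression line fits the sample data. It is very useful in assessing how much the errors in predicting the dependent variable (y) can be reduced by using the information provided by the independent variable (x). It can be computed using the following formula:

$$r^2 = \left(\frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}\right)^2$$

where 𝑌² is the set of the squares of observations in the dependent variable, and the other symbols are as defined above.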
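A minimal Python sketch of the formula above, assuming the six totals (𝑛, ∑𝑋, ∑𝑌, ∑𝑋𝑌, ∑𝑋², ∑𝑌²) have already been computed; the function name is illustrative. The ratio inside the parentheses is the correlation coefficient 𝑟, and squaring it gives 𝑟².

```python
import math


def r_squared_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Coefficient of determination: the square of the correlation coefficient r."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    r = numerator / denominator
    return r ** 2


# Points exactly on y = 3 + 2x: n = 4, ∑X = 10, ∑Y = 32, ∑XY = 90, ∑X² = 30, ∑Y² = 276.
print(r_squared_from_sums(4, 10, 32, 90, 30, 276))  # prints 1.0
```

A perfect straight-line fit gives 𝑟² = 1, which is a quick sanity check before using the function on real data.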

Example:

• The manager at a construction company wants to predict sales based on advertising expenditures. The company's general manager has collected the following data on advertising expenditures and gross sales for the past 12 months.


Month        Advertising                    Sales
             (in hundred thousand pesos)    (in hundred thousand pesos)
January      1.2                            21.2
February     1.4                            21.8
March        0.5                            17.0
April        2.1                            25.5
May          2.0                            26.2
June         1.6                            22.5
July         1.0                            19.5
August       0.6                            17.3
September    0.8                            17.5
October      1.8                            24.0
November     1.9                            23.8
December     1.5                            22.3
Solution:

• The independent variable is advertising expenditures, while the dependent variable is sales.
• The estimated simple linear regression equation can be used to predict sales based on advertising expenditures. To formulate this equation, first find the slope and the y-intercept, then substitute these values into the equation 𝑦̂ = 𝑏0 + 𝑏1𝑥.

Month        Advertising (X)   Sales (Y)   XY       X²      Y²
January      1.2               21.2        25.44    1.44    449.44
February     1.4               21.8        30.52    1.96    475.24
March        0.5               17.0        8.5      0.25    289
April        2.1               25.5        53.55    4.41    650.25
May          2.0               26.2        52.4     4       686.44
June         1.6               22.5        36       2.56    506.25
July         1.0               19.5        19.5     1       380.25
August       0.6               17.3        10.38    0.36    299.29
September    0.8               17.5        14       0.64    306.25
October      1.8               24.0        43.2     3.24    576
November     1.9               23.8        45.22    3.61    566.44
December     1.5               22.3        33.45    2.25    497.29
Total:       16.4              258.6       372.16   25.72   5682.14
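Building this worksheet by hand is error-prone, so a short script can regenerate the 𝑋𝑌, 𝑋², and 𝑌² columns and their totals. This Python sketch hard-codes the 12 monthly observations from the table; rounding to two decimals only hides floating-point noise.

```python
# Monthly observations from the table (both in hundred thousand pesos).
advertising = [1.2, 1.4, 0.5, 2.1, 2.0, 1.6, 1.0, 0.6, 0.8, 1.8, 1.9, 1.5]
sales = [21.2, 21.8, 17.0, 25.5, 26.2, 22.5, 19.5, 17.3, 17.5, 24.0, 23.8, 22.3]

# Column totals needed by the slope, y-intercept, and r² formulas
totals = {
    "X": sum(advertising),
    "Y": sum(sales),
    "XY": sum(x * y for x, y in zip(advertising, sales)),
    "X^2": sum(x * x for x in advertising),
    "Y^2": sum(y * y for y in sales),
}

for column, total in totals.items():
    print(f"{column:>4}: {total:.2f}")
```

The printed values should match the Total row: 16.40, 258.60, 372.16, 25.72, and 5682.14.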


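o Slope of the equation

$$b_1 = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2} = \frac{12(372.16) - (16.4)(258.60)}{12(25.72) - (16.4)^2} = \mathbf{5.67}$$

o y-intercept of the equation

$$b_0 = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2} = \frac{(258.6)(25.72) - (16.4)(372.16)}{12(25.72) - (16.4)^2} = \mathbf{13.80}$$

o Estimated simple linear regression equation

$$\hat{y} = b_0 + b_1 x$$

$$\hat{y} = \mathbf{13.80 + 5.67x}$$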
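The same arithmetic can be checked in Python by plugging the column totals into the slope and y-intercept formulas. The totals below are copied from the Total row of the worksheet table.

```python
n = 12
sum_x, sum_y, sum_xy, sum_x2 = 16.4, 258.6, 372.16, 25.72

denominator = n * sum_x2 - sum_x ** 2                  # 12(25.72) - (16.4)^2 = 39.68
b1 = (n * sum_xy - sum_x * sum_y) / denominator        # slope
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept

print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # prints y-hat = 13.80 + 5.67x
```

Carrying more decimals gives 𝑏1 ≈ 5.6673 and 𝑏0 ≈ 13.8046, which round to the values in the equation.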

• The regression line can be seen in the following graph:

[Figure: Sales and Advertising Expenditures for a Construction Company, showing the regression line. x-axis: Advertising Expenditures (in hundred thousand); y-axis: Sales (in hundred thousand).]

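• The coefficient of determination (𝑟²) can be computed as:

$$r^2 = \left(\frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}\right)^2$$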


$$r^2 = \left(\frac{12(372.16) - (16.4)(258.6)}{\sqrt{\left[12(25.72) - (16.4)^2\right]\left[12(5682.14) - (258.6)^2\right]}}\right)^2 = \mathbf{0.9716} \text{ or } \mathbf{97.16\%}$$

Therefore, approximately 97.16% of the variability in sales can be explained by advertising expenditures. This means that the estimated linear regression equation predicts the company's sales well: only about 2.84% of the variability in sales remains unexplained after accounting for advertising expenditures.

Explained and Unexplained Variation

• The sum of squares due to regression (SSR) is the explained variation: the sum of the squared deviations between the predicted/estimated values of y and the mean of y. $SSR = \sum (\hat{y} - \bar{y})^2$
• The sum of squares due to error (SSE) is the unexplained variation: the sum of the squared deviations between the actual/observed values of y and the predicted/estimated values of y. $SSE = \sum (Y - \hat{y})^2$
• The total sum of squares (SST) is the sum of the squared deviations between the actual/observed values of y and the mean of y; equivalently, it is the sum of SSE and SSR. $SST = \sum (Y - \bar{y})^2$ or $SST = SSE + SSR$

*Note: The coefficient of determination can also be calculated using either of the following formulas:

$$R^2 = 1 - \frac{SSE}{SST} \qquad \text{or} \qquad R^2 = \frac{SSR}{SST}$$
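To see the decomposition numerically, this Python sketch refits the advertising example, computes SSR, SSE, and SST from the raw observations, and checks that both 𝑅² formulas agree with the 𝑟² of about 0.9716 found earlier. It uses the deviation form 𝑏1 = ∑(𝑥 − 𝑥̄)(𝑦 − 𝑦̄) / ∑(𝑥 − 𝑥̄)², which is algebraically equivalent to the summation formula for the slope; only the standard library is needed.

```python
# The 12 monthly observations from the construction-company example
advertising = [1.2, 1.4, 0.5, 2.1, 2.0, 1.6, 1.0, 0.6, 0.8, 1.8, 1.9, 1.5]
sales = [21.2, 21.8, 17.0, 25.5, 26.2, 22.5, 19.5, 17.3, 17.5, 24.0, 23.8, 22.3]

n = len(sales)
x_bar = sum(advertising) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept (deviation form)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales)) / sum(
    (x - x_bar) ** 2 for x in advertising
)
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in advertising]

ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)             # explained variation
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in sales)                         # total variation

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}")
print(f"SSR/SST = {ssr / sst:.4f}, 1 - SSE/SST = {1 - sse / sst:.4f}")
```

Both ratios should print 0.9716, and SSR + SSE reproduces SST (about 109.31) up to floating-point rounding; this identity holds for a least-squares line that includes an intercept.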

Multiple Regression

• Multiple regression is the extension of simple linear regression and focuses on assessing the
strength of the relationship between each of a set of independent variables and a single dependent
variable.

Population Multiple Regression Model

𝑦 = 𝛽0 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + ⋯ + 𝛽𝑝 𝑥𝑝 + 𝜀
where:
𝛽0 , 𝛽1 , 𝛽2 ,…, 𝛽𝑝 are the parameters; and
𝜀 is the random error.

Estimated Multiple Regression Equation from a Sample

𝑦̂ = 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2 + ⋯ + 𝑏𝑝 𝑥𝑝

where:

𝑏0 , 𝑏1 , 𝑏2 ,…, 𝑏𝑝 are the sample statistics (Coefficients).

Example:

Suppose we are interested in predicting the current market value of houses in a particular city. Data are collected from a random sample of 30 houses: the current value of each house (in ₱100,000s), together with its living area (in 100 square feet) and its distance in miles from the city center. The data are shown in the following table:

House   Value (y)        Area (𝑥1)         City Center Distance (𝑥2)
        (in ₱100,000s)   (100 sq. feet)    (miles)
1       122.87           12                1.2
2       153.30           15.6              1.5
3       151.20           14.6              1.6
4       143.08           14.7              2.5
5       139.75           15                2.7
6       153.65           16.6              2.6
7       118.67           12                3.2
8       129.36           13.4              3.3
9       121.00           14                4.1
10      136.95           15.6              4.2
11      114.35           13                4.4
12      129.82           14.5              4.7
13      112.24           12.3              5.1
14      145.30           17.7              5.3
15      127.60           15.3              5.5
16      130.82           13                1.3
17      141.27           14.5              1.6
18      143.37           14.3              1.9
19      153.83           15.8              2.4
20      154.41           16.9              2.6
21      157.62           17.2              4
22      138.00           15.7              3.2
23      137.36           16                4.2
24      127.66           12.5              3.9
25      140.92           15.7              3.8
26      138.70           15.6              4.5
27      105.59           12.6              4.8
28      117.33           13.4              5
29      120.30           13.7              5.1
30      139.46           17.6              6.3
Formulate a multiple regression equation that can predict or estimate the house values using the living
area and distance variables.

Solution:

• The given data set has two (2) independent variables (area (𝑥1 ) and distance from the city center
(𝑥2 )) and one (1) dependent variable (value of the house (y)).
• The current market value of a specific house can be estimated by formulating the estimated
multiple regression equation.
• MS Excel can be used to calculate the sample statistics and formulate the estimated multiple
regression equation.
a. Input the data in a worksheet. Go
to Data, then click Data Analysis.
b. In the Data Analysis box, choose
Regression.


c. Set the Input Y Range and Input X Range by selecting the observations of the dependent variable and the independent variables, respectively. Check the Labels checkbox and the Confidence Level checkbox. Set the Output Range, then click OK.

d. The calculated sample statistics are shown below:

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.956726
R Square              0.915325
Adjusted R Square     0.909053
Standard Error        4.231613
Observations          30

ANOVA
              df    SS          MS          F             Significance F
Regression    2     5226.31     2613.155    145.9329457   3.34736E-15
Residual      27    483.4768    17.90655
Total         29    5709.787

                        Coefficients   Standard Error   t Stat     P-value                 Lower 95%
Intercept               46.79047       7.038031         6.648234   0.00000039087831467     32.34962768
Area                    7.328059       0.477722         15.3396    0.00000000000000748     6.34785547
City Center Distance    -5.5225        0.563573         -9.79907   0.00000000021906578     -6.678855156
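Outside of Excel, the same coefficients can be obtained by solving the least-squares normal equations (𝑋ᵀ𝑋)𝑏 = 𝑋ᵀ𝑦. The Python sketch below does this with only the standard library, using Gauss-Jordan elimination on the 3 × 3 system; it illustrates what the Regression tool computes and is not meant to replace a numerical library. The 30 rows are copied from the house table, and the function name is illustrative.

```python
# (value y in ₱100,000s, area x1 in 100 sq. ft., distance x2 in miles) for houses 1 to 30
HOUSES = [
    (122.87, 12, 1.2), (153.30, 15.6, 1.5), (151.20, 14.6, 1.6), (143.08, 14.7, 2.5),
    (139.75, 15, 2.7), (153.65, 16.6, 2.6), (118.67, 12, 3.2), (129.36, 13.4, 3.3),
    (121.00, 14, 4.1), (136.95, 15.6, 4.2), (114.35, 13, 4.4), (129.82, 14.5, 4.7),
    (112.24, 12.3, 5.1), (145.30, 17.7, 5.3), (127.60, 15.3, 5.5), (130.82, 13, 1.3),
    (141.27, 14.5, 1.6), (143.37, 14.3, 1.9), (153.83, 15.8, 2.4), (154.41, 16.9, 2.6),
    (157.62, 17.2, 4), (138.00, 15.7, 3.2), (137.36, 16, 4.2), (127.66, 12.5, 3.9),
    (140.92, 15.7, 3.8), (138.70, 15.6, 4.5), (105.59, 12.6, 4.8), (117.33, 13.4, 5),
    (120.30, 13.7, 5.1), (139.46, 17.6, 6.3),
]


def fit_multiple_regression(rows):
    """Return [b0, b1, b2] by solving the normal equations (X^T X) b = X^T y."""
    X = [[1.0, x1, x2] for _, x1, x2 in rows]  # leading 1 gives the intercept b0
    y = [value for value, _, _ in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    M = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(k):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [a - factor * b for a, b in zip(M[row], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


b0, b1, b2 = fit_multiple_regression(HOUSES)
y_bar = sum(v for v, _, _ in HOUSES) / len(HOUSES)
sse = sum((v - (b0 + b1 * x1 + b2 * x2)) ** 2 for v, x1, x2 in HOUSES)
sst = sum((v - y_bar) ** 2 for v, _, _ in HOUSES)
print(f"y-hat = {b0:.4f} + {b1:.4f}x1 + ({b2:.4f})x2, R^2 = {1 - sse / sst:.4f}")
```

The results should agree with the Coefficients column and the R Square value in the Excel output above, up to rounding.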

Therefore, the estimated multiple regression equation is

$$\hat{y} = \mathbf{46.7905 + 7.3281x_1 - 5.5225x_2}$$

• This equation can be used to estimate the value of a house given its living area and its distance from the city center.


o Ex. What is the estimated value of a house with a living area of 1,200 square feet located 1.2 miles from the city center?
Solution:
𝑦̂ = estimated house value (in ₱100,000s)
𝑥1 = area (in 100 square feet) = 12
𝑥2 = distance from city center (miles) = 1.2

$$\hat{y} = 46.7905 + 7.3281(12) - 5.5225(1.2) \approx \mathbf{128.10}$$

The estimated value of a 1,200-square-foot house located 1.2 miles from the city center is approximately ₱12,810,000 (128.10 × ₱100,000).
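A small Python helper makes the substitution step reusable. It is a sketch that hard-codes the rounded coefficients of the estimated equation; the function name `estimate_house_value` is illustrative.

```python
B0, B1, B2 = 46.7905, 7.3281, -5.5225  # rounded coefficients from the Excel output


def estimate_house_value(area_sq_ft, distance_miles):
    """Estimated value in ₱100,000s; area is converted to the model's unit of 100 sq. ft."""
    x1 = area_sq_ft / 100
    x2 = distance_miles
    return B0 + B1 * x1 + B2 * x2


value = estimate_house_value(1200, 1.2)
print(f"{value:.4f} (x ₱100,000) = ₱{value * 100_000:,.0f}")
```

With these rounded coefficients the result is about 128.1007, or roughly ₱12.81 million; carrying Excel's unrounded coefficients instead gives about 128.1002. Converting square feet inside the function avoids the common mistake of entering 1200 instead of 12 for 𝑥1.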

Hypothesis Test of Significance for the Individual Parameters

• To test the individual significance of each beta parameter (𝛽1, 𝛽2, …, 𝛽𝑝), test whether each parameter is equal to zero. Remember that these values are slopes; if a slope is equal to 0, then there is no linear relationship between the corresponding x and y. In general, we can state the following hypotheses:
𝐻0: 𝛽𝑖 = 0 (There is no significant relationship between 𝑥𝑖 and y.)
𝐻𝑎: 𝛽𝑖 ≠ 0 (There is a significant relationship between 𝑥𝑖 and y.)
• The t-stat or the p-value can be used to test the significance of an individual parameter.
o Rejection rule using the p-value.
If the computed p-value is less than or equal to the chosen significance level (α), the decision is to "Reject the null hypothesis (𝐻0)". If the p-value is greater than the significance level, the decision is "Fail to reject the null hypothesis (𝐻0)".

o Rejection rule using the critical value.

Because the alternative hypothesis is two-sided, compare the absolute value of the computed t-value/t-stat with the critical value 𝑡(α/2, 𝑛 − 𝑝 − 1). If |t| is less than or equal to the critical value, the decision is "Fail to reject the null hypothesis (𝐻0)"; if |t| is greater than the critical value, the decision is "Reject the null hypothesis (𝐻0)".
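Both rejection rules are easy to express as functions. The Python sketch below applies them to the slope coefficients in the Excel output at α = 0.05. The critical value 2.052 is an assumption taken from a standard t-table for a two-tailed test with 𝑛 − 𝑝 − 1 = 30 − 2 − 1 = 27 degrees of freedom (the Residual df in the ANOVA table); the function names are illustrative.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Reject H0 when the p-value is at most the significance level."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


def decide_by_critical_value(t_stat, t_critical):
    """Two-tailed rule: reject H0 when |t| exceeds the critical value."""
    return "Reject H0" if abs(t_stat) > t_critical else "Fail to reject H0"


T_CRITICAL = 2.052  # assumed t-table value for alpha/2 = 0.025 and 27 degrees of freedom

# (t Stat, P-value) for each slope, copied from the Excel output
slopes = {
    "Area": (15.3396, 0.00000000000000748),
    "City Center Distance": (-9.79907, 0.00000000021906578),
}

for name, (t_stat, p_value) in slopes.items():
    print(f"{name}: {decide_by_p_value(p_value)} (p-value rule), "
          f"{decide_by_critical_value(t_stat, T_CRITICAL)} (critical-value rule)")
```

Using `abs()` matters for City Center Distance: its t-stat of −9.79907 is below 2.052, yet its absolute value is far above it, so 𝐻0 is rejected and distance is a significant predictor under both rules.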

