FBAS
Managerial decisions
- are often based on the relationship between variables.
Regression analysis
- is a statistical procedure that uses data to develop an equation showing how the variables are related.
o Dependent (or response/target) variable
- is the variable being predicted.
o Independent (or predictor) variables (or features)
- are variables used to predict the value of the dependent variable.
o Simple linear regression
- is a form of regression analysis in which a (“simple”) single independent variable, x, is used to develop a “linear” relationship (straight line) with the dependent variable, y.
The estimated simple linear regression equation is ŷ = b_0 + b_1x, where ŷ is the point estimator of E(y|x), the mean of y for a given x.
b_1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
b_0 = ȳ − b_1x̄
Equivalently, the computational form of the slope is
b_1 = [nΣx_iy_i − (Σx_i)(Σy_i)] / [nΣx_i² − (Σx_i)²]
b_1 = [10(5594) − (800)(67)] / [10(67450) − (800)²]
b_1 = (55940 − 53600) / (674500 − 640000)
b_1 = 2340 / 34500
b_1 = 0.0678
b_0 = ȳ − b_1x̄
b_0 = 6.7 − (0.0678)(80)
b_0 = 6.7 − 5.424
b_0 = 1.276
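As a sanity check on the arithmetic above, here is a minimal Python sketch, assuming the summary values implied by the worked example (n = 10, Σx_iy_i = 5594, Σx_i = 800, Σy_i = 67, Σx_i² = 67450):

```python
# Summary values implied by the worked example above.
n = 10
sum_xy = 5594
sum_x = 800      # (sum_x)**2 = 640000, so x-bar = 80
sum_y = 67       # y-bar = 6.7
sum_x2 = 67450   # n * sum_x2 = 674500

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)

print(b1)  # 0.06782... (2340 / 34500), rounded to 0.0678 in the notes
print(b0)  # 1.2739... (the notes get 1.276 by using the rounded b1)
```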
Experimental region
- is defined as the range of values of the independent variables in the data used to estimate the model.
Extrapolation
- is the prediction of the value of the dependent variable outside the experimental region; it is risky and should be avoided unless we have empirical evidence dictating otherwise.
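As an illustration, a small hedged sketch (the helper predict and the sample x values are hypothetical, not from these notes) that flags a prediction requested outside the experimental region:

```python
# Hedged sketch: a hypothetical helper that warns before extrapolating.
def predict(x_new, b0, b1, x_data):
    """Return y-hat = b0 + b1 * x_new, warning if x_new is extrapolated."""
    lo, hi = min(x_data), max(x_data)  # the experimental region
    if not lo <= x_new <= hi:
        print(f"Warning: x = {x_new} is outside [{lo}, {hi}] "
              "(extrapolation); the estimate may be unreliable.")
    return b0 + b1 * x_new

# Hypothetical sample x values; coefficients come from the example above.
x_data = [60, 70, 75, 80, 85, 90, 95, 100]
print(predict(110, 1.276, 0.0678, x_data))  # triggers the warning
```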
The Sums of Squares
Sum of squares due to error (SSE)
- is a measure of the error that results from using the ŷ_i values to predict the y_i values.
SSE = Σ(y_i − ŷ_i)²
Total sum of squares (SST)
- is a measure of how much the y_i values deviate from the sample mean ȳ.
SST = Σ(y_i − ȳ)²
Sum of squares due to regression (SSR)
- is a measure of how much the ŷ_i values deviate from the sample mean ȳ.
SSR = Σ(ŷ_i − ȳ)²
The relationship between these three sums of squares is SST = SSR + SSE.
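A short Python sketch with made-up toy data showing how the three sums of squares are computed and that they satisfy SST = SSR + SSE:

```python
# Toy data (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit, using the same formulas as the simple regression section.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares

print(round(sst, 10) == round(ssr + sse, 10))  # True: SST = SSR + SSE
```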
Coefficient of Determination
The ratio SSR/SST is called the coefficient of determination, denoted by r²:
r² = SSR / SST
The coefficient of determination can only assume values between 0 and 1 and is used to evaluate the goodness of fit for the estimated regression equation.
A perfect fit exists when y_i is identical to ŷ_i for every observation i, so that all residuals y_i − ŷ_i = 0; then SSE = 0 and r² = 1.
Poorer fits between y_i and ŷ_i result in larger values of SSE and lower r² values.
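A tiny numeric sketch (hypothetical toy values for SSE and SST) showing how r² follows from the sums of squares, and that SSE = 0 yields r² = 1:

```python
# Hypothetical toy values for the sums of squares.
sse = 1.2        # sum of squares due to error
sst = 10.0       # total sum of squares
ssr = sst - sse  # sum of squares due to regression, since SST = SSR + SSE

r_squared = ssr / sst
print(r_squared)  # 0.88 -> the fit explains 88% of the variability in y

# A perfect fit: all residuals are 0, so SSE = 0 and r-squared = 1.
print((sst - 0.0) / sst)  # 1.0
```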
Multiple Regression
The estimated multiple regression equation is ŷ = b_0 + b_1x_1 + b_2x_2 + … + b_qx_q.
Where:
ŷ is a point estimate of E(y) for a given set of q independent variables, x_1, x_2, …, x_q.
A simple random sample is used to compute the sample statistics b_0, b_1, b_2, …, b_q that are used as estimates of β_0, β_1, β_2, …, β_q.
The least squares method uses the sample data to provide the values of the sample statistics b_0, b_1, b_2, …, b_q that minimize the sum of squared errors between the y_i and the ŷ_i.
Because the formulas for the regression coefficients involve the use of matrix algebra, we rely on computer
software packages to perform the calculations.
The emphasis will be on how to interpret the computer output rather than on how to perform the multiple regression computations.
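As an illustration of leaning on software for the matrix algebra, a minimal sketch using numpy's least-squares solver (the data are hypothetical; numpy is an assumed dependency, not something these notes prescribe):

```python
import numpy as np

# Hypothetical sample: two independent variables x1, x2 and a response y.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 10.1])

# Prepend a column of ones so the intercept b0 is estimated as well.
X_design = np.column_stack([np.ones(len(y)), X])

# Least squares via matrix algebra: minimizes the sum of squared errors
# between the observed y_i and the fitted y-hat_i.
coeffs, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)  # sample statistics estimating beta_0, beta_1, beta_2

# Point estimate y-hat for a new observation with x1 = 3.0, x2 = 2.0.
print(b0 + b1 * 3.0 + b2 * 2.0)
```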