Regression PDF
Qin Gao
Contents
Simple Linear Regression
  Rationale of simple linear regression
  The Method of Least Squares
  Assessing how well the model fits the observed data
  Perform simple regression in R
Multiple Regression
  Rationale of multiple regression
  Multiple Regression: Parameter Estimation
  Partial correlation, semi-partial (part) correlation, and regression coefficients
  Perform multiple regression in R
  Interpretation of the model
Methods of Regression
Assumptions of Regression
  Straightforward Assumptions
  The More Tricky Assumptions
  DFFits
  Cook's D
  Hat values
  Model Building and Validation
Summary
Simple Linear Regression
$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$
This graph shows a scatterplot of some data with a line representing the general trend. The vertical lines
(dotted) represent the differences (or residuals) between the line and the actual data
Testing the Model: ANOVA
If the model results in better prediction than using the mean, then we expect SSM to be much greater than
SSR
$$F = \frac{MS_M}{MS_R}$$
Coefficient of determination: $R^2$
$$R^2 = \frac{SS_M}{SS_T}$$
Assessing the significance of individual predictors: t-test
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
Testing statistic:
$$t = \frac{b_{\text{observed}} - b_{\text{expected}}}{SE_b} = \frac{b_{\text{observed}}}{SE_b} \sim t(N - 2)$$
Assessing the significance of individual predictors: F-test
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
Testing statistic: for a single predictor the F-test is equivalent to the t-test above, with $F = t^2 \sim F(1, N - 2)$.
Perform simple regression in R
We run a regression analysis using the lm() function – lm stands for 'linear model'. This function takes the general form:
newModel <- lm(outcome ~ predictor(s), data = dataFrame, na.action = an action)
• na.action = na.fail: the fit fails if the data contain missing values.
• na.action = na.omit or na.exclude: cases with missing values are omitted from the fit (na.exclude additionally pads residuals and fitted values back to the original length).
Example
• A record company boss was interested in predicting record sales from advertising.
• Data: 200 different album releases
• Outcome variable:
– Sales (CDs and downloads) in the week after release
• Predictor variable:
– The amount (in units of £1000) spent promoting the record before release.
head(album1)
## adverts sales
## 1 10.256 330
## 2 985.685 120
## 3 1445.563 360
## 4 1188.193 270
## 5 574.513 220
## 6 568.954 170
albumSales.1 <- lm(sales ~ adverts, data = album1)
summary(albumSales.1)
##
## Call:
## lm(formula = sales ~ adverts, data = album1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -152.949 -43.796 -0.393 37.040 211.866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.341e+02 7.537e+00 17.799 <2e-16 ***
## adverts 9.612e-02 9.632e-03 9.979 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.99 on 198 degrees of freedom
## Multiple R-squared: 0.3346, Adjusted R-squared: 0.3313
## F-statistic: 99.59 on 1 and 198 DF, p-value: < 2.2e-16
cor(album1$sales, album1$adverts)^2
## [1] 0.3346481
Multiple Regression
In matrix form, the multiple regression model is
$$y = X\beta + \epsilon$$
When the weights for each observation are identical and the errors are uncorrelated, the least-squares estimate of the parameters is
$$\hat{\beta} = (X^T X)^{-1} X^T y$$
The fitted values are
$$\hat{y} = X\hat{\beta} = X(X^T X)^{-1} X^T y$$
where the hat matrix is
$$H = X(X^T X)^{-1} X^T$$
The residuals are
$$\hat{\epsilon} = y - \hat{y} = y - Hy = (I - H)y$$
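As an illustration (not part of the original notes), the matrix formula can be checked against lm() using the album1 data from the simple regression example:
# Sketch: OLS estimate via the matrix formula, compared with lm()
X <- cbind(1, album1$adverts)             # design matrix with an intercept column
y <- album1$sales
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta.hat                                  # intercept and slope
coef(lm(sales ~ adverts, data = album1))  # should give the same values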
• Partial correlation: measures the relationship between two variables, controlling for the effect that a third variable has on both of them.
• Semi-partial correlation: Measures the relationship between two variables controlling for the effect that
a third variable has on only one of the others. It measures the unique contribution of a predictor to
explaining the variance of the outcome.
Partial correlation
$$r_{12.3}^2 = \frac{R_{1.23}^2 - R_{1.3}^2}{1 - R_{1.3}^2}$$
• $R_{1.23}^2$ is the R² from a multiple regression with 1 being Y and 2 and 3 being the predictor variables.
• $R_{1.3}^2$ is the R² from a simple regression with 1 being Y and 3 being the single predictor variable.
Semi-partial correlation
Semipartial correlation removes the effects of additional variables from one of the variables under study (typically X):
$$r_{1(2.3)}^2 = R_{1.23}^2 - R_{1.3}^2$$
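For illustration (not in the original notes), both quantities can be computed from the R² values of two regressions; the sketch below uses the album2 data from the upcoming example, treating sales as 1, airplay as 2, and adverts as 3:
# Squared partial and semi-partial correlations from R-squared values
R2.123 <- summary(lm(sales ~ airplay + adverts, data = album2))$r.squared
R2.13  <- summary(lm(sales ~ adverts, data = album2))$r.squared
(R2.123 - R2.13) / (1 - R2.13)  # squared partial correlation of sales and airplay, controlling adverts
R2.123 - R2.13                  # squared semi-partial correlation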
Uses of Partial and Semipartial
• The partial correlation is most often used when some third variable z is a plausible explanation of the
correlation between X and Y.
• The semipartial is most often used when we want to show that some variable adds incremental variance in Y above and beyond the other X variables.
• Each regression coefficient in the regression model is the amount of change in the outcome variable
that would be expected per one-unit change of the predictor, if all other variables in the model
were held constant.
Example
• A record company boss was interested in predicting record sales from advertising.
• Data: 200 different album releases
• Outcome variable:
– Sales (CDs and downloads) in the week after release (in units of 1000)
• Predictor variables:
– The amount (in units of £1000) spent promoting the record before release.
– Number of plays on the radio: number of times played on radio the week before release
– Attractiveness of the CD cover: expert rating with a 0-10 scale
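The summary output below comes from fitting all three predictors; a call consistent with it (the object name albumSales.2 is taken from the lm.beta() example further down) would be:
albumSales.2 <- lm(sales ~ adverts + airplay + attract, data = album2)
summary(albumSales.2)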
##
## Call:
## lm(formula = sales ~ adverts + airplay + attract, data = album2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -121.324 -28.336 -0.451 28.967 144.132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.612958 17.350001 -1.534 0.127
## adverts 0.084885 0.006923 12.261 < 2e-16 ***
## airplay 3.367425 0.277771 12.123 < 2e-16 ***
## attract 11.086335 2.437849 4.548 9.49e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.09 on 196 degrees of freedom
## Multiple R-squared: 0.6647, Adjusted R-squared: 0.6595
## F-statistic: 129.5 on 3 and 196 DF, p-value: < 2.2e-16
Interpretation of the model
• F(3, 196) = 129.5, p < .001: the model results in significantly better prediction than using the mean value of album sales.
• $\beta_1 = 0.085$: as advertising increases by 1 unit (£1,000), album sales increase by 0.085 units (of 1,000 sales).
• $\beta_2 = 3.367$: when the number of plays on the radio increases by 1 unit, sales increase by 3.367 units (of 1,000 sales).
• $\beta_3 = 11.086$: when the attractiveness of the CD cover increases by 1 unit, sales increase by 11.086 units (of 1,000 sales).
The standardized regression coefficients represent the change in the response for a change of one standard deviation in a predictor:
$$\beta_i^* = \beta_i \frac{s_{x_i}}{s_y}$$
You can calculate standardised beta values using the lm.beta() function from QuantPsyc package.
library(QuantPsyc)
lm.beta(albumSales.2)
Alternatively, you can standardize the original raw data by converting each original data value to a z-score,
and then perform multiple linear regression using the standardized data. The obtained regression coefficients
are standardized.
Standardised Beta Values
• $\beta_1^* = 0.511$: as advertising increases by 1 standard deviation (£485,655), album sales increase by 0.511 of a standard deviation (0.511 × 80,699).
• $\beta_2^* = 0.512$: when the number of plays on the radio increases by 1 SD (12.27), sales increase by 0.512 standard deviations (0.512 × 80,699).
• $\beta_3^* = 0.192$: when the attractiveness of the CD cover increases by 1 SD (1.40), sales increase by 0.192 standard deviations (0.192 × 80,699).
R2 and adjusted R2
$$\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
where n is the sample size and k is the number of predictors.
The adjusted R² increases when a new explanatory variable is included only if it improves R² more than would be expected by chance.
• Akaike Information Criterion (AIC) is a measure of fit which penalizes the model for having more
variables
$$AIC = n \ln\left(\frac{SSE}{n}\right) + 2k$$
Calculating AIC in R
AIC(regressionModel) returns the AIC of a fitted model. (Note that the k argument of AIC() is the penalty per parameter, which defaults to 2; it is not the number of predictors.)
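A brief sketch (not from the original notes) comparing the two album models fitted earlier; a lower AIC indicates a better fit-complexity trade-off. AIC() is based on the full log-likelihood, while extractAIC() uses the n·ln(SSE/n) + 2k form for lm objects, so the two are on different scales but order models the same way:
AIC(albumSales.1)        # adverts only
AIC(albumSales.2)        # adverts + airplay + attract (object from the sketch above)
extractAIC(albumSales.2) # equivalent degrees of freedom and n*log(RSS/n) + 2*edf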
Methods of Regression
Interpretation of Results
• The F-test: It tells us whether using the regression model is significantly better at predicting values of
the outcome than using the mean.
• Beta values: the change in the outcome associated with a unit change in the predictor.
• Standardised beta values: tell us the same but expressed as standard deviations.
Hierarchical Regression
• Predictors are entered into the regression model in the order specified by the researcher, based on past research.
• New predictors are then entered in a separate step/block.
• Each IV is assessed in terms of what it adds to the prediction of the DV after the previous IVs have been controlled for.
• The overall model and the relative contribution of each block of variables are assessed
  – F-test of the $R^2$ change.
Example
If we control for the possible effect of promotion budget, are airplay and CD cover design still able to predict
a significant amount of the variance in CD sales?
Examine individual models
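The original code for the two fits is not shown; calls consistent with the output (the object names album.adv.only and album.full are taken from the code further down) would be:
album.adv.only <- lm(sales ~ adverts, data = album2)
album.full <- lm(sales ~ adverts + attract + airplay, data = album2)
summary(album.adv.only)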
##
## Call:
## lm(formula = sales ~ adverts, data = album2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -152.949 -43.796 -0.393 37.040 211.866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.341e+02 7.537e+00 17.799 <2e-16 ***
## adverts 9.612e-02 9.632e-03 9.979 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 65.99 on 198 degrees of freedom
## Multiple R-squared: 0.3346, Adjusted R-squared: 0.3313
## F-statistic: 99.59 on 1 and 198 DF, p-value: < 2.2e-16
summary.lm(album.full)
##
## Call:
## lm(formula = sales ~ adverts + attract + airplay, data = album2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -121.324 -28.336 -0.451 28.967 144.132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.612958 17.350001 -1.534 0.127
## adverts 0.084885 0.006923 12.261 < 2e-16 ***
## attract 11.086335 2.437849 4.548 9.49e-06 ***
## airplay 3.367425 0.277771 12.123 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.09 on 196 degrees of freedom
## Multiple R-squared: 0.6647, Adjusted R-squared: 0.6595
## F-statistic: 129.5 on 3 and 196 DF, p-value: < 2.2e-16
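The model and error sums of squares printed below can be obtained from the fitted models; a sketch consistent with the printed values:
SSM.adv.only <- sum((fitted(album.adv.only) - mean(album2$sales))^2)
SSE.adv.only <- sum(resid(album.adv.only)^2)
SSM.full <- sum((fitted(album.full) - mean(album2$sales))^2)
SSE.full <- sum(resid(album.full)^2)
cbind(SSM.adv.only, SSE.adv.only)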
## SSM.adv.only SSE.adv.only
## [1,] 433687.8 862264.2
cbind(SSM.full, SSE.full)
## SSM.full SSE.full
## [1,] 861377.4 434574.6
F=((SSM.full-SSM.adv.only)/2)/(SSE.full/196)
F
## [1] 96.44738
anova(album.adv.only, album.full) # Note: for anova(model1, model2), all predictors in model1 must also be included in model2, i.e. the models must be nested.
• A hierarchical regression can have as many blocks as there are groups of independent variables, i.e. the analyst can test a hypothesis that specifies an exact order of entry for the variables.
• A more common hierarchical regression specifies two blocks of variables: a set of control variables
entered in the first block and a set of predictor variables entered in the second block.
– Control variables are often demographics which are thought to make a difference in scores on
the dependent variable.
– Predictors are the variables whose effects our research question is really about, and whose effects we want to separate from those of the control variables.
• Support for a hierarchical hypothesis would be expected to require statistical significance for the
addition of each block of variables.
• However, many times, we want to exclude the effect of blocks of variables previously entered into the
analysis, whether or not a previous block was statistically significant. The analysis is interested in
obtaining the best indicator of the effect of the predictor variables. The statistical significance of
previously entered variables is not interpreted.
• The latter strategy is also widely adopted in research.
• The $R^2$ change, i.e. the increase when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.
• In the interpretation of individual relationships, the relationship between the predictors and the de-
pendent variable is presented.
• Similarly, in the validation analysis, we are only concerned with verifying the significance of the pre-
dictor variables. Differences in control variables are often ignored.
Reporting Hierarchical Regression
Hierarchical multiple regression was performed to investigate the ability airplay and CD cover
design to predict the variance in CD sales, after controlling for the possible effect of promotion
budget.
In the first step of the hierarchical multiple regression, advertisement budget was entered. This model was statistically significant (F(1, 198) = 99.59; p < .001) and explained 33% of the variance in CD sales.
• Report the $R^2$ change and its significance test results after the entry of each new block
After the entry of airplay and CD cover design at Step 2, the total variance explained by the model as a whole was 66% (F(3, 196) = 129.5; p < .001). The introduction of airplay and CD cover design explained an additional 33% of the variance in CD sales, after controlling for advertisement budget (F(2, 196) = 96.45; p < .001).
Source: Park, N., Kee, K. F., & Valenzuela, S. (2009). Being immersed in social networking environment:
Facebook groups, uses and gratifications, and social outcomes. CyberPsychology & Behavior, 12, 729-733.
Stepwise Regression
Stepwise Regression in R
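The calls that generated the traces below are not shown in the original; they are consistent with step() run in backward, forward, and both directions, roughly as follows (the album.null name is an assumption):
album.null <- lm(sales ~ 1, data = album2)
step(album.full, direction = "backward")                              # first trace
step(album.null, scope = formula(album.full), direction = "forward")  # second trace
step(album.null, scope = formula(album.full), direction = "both")     # third trace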
## Start: AIC=1544.76
## sales ~ adverts + attract + airplay
##
## Df Sum of Sq RSS AIC
## <none> 434575 1544.8
## - attract 1 45853 480428 1562.8
## - airplay 1 325860 760434 1654.7
## - adverts 1 333332 767907 1656.6
##
## Call:
## lm(formula = sales ~ adverts + attract + airplay, data = album2)
##
## Coefficients:
## (Intercept) adverts attract airplay
## -26.61296 0.08488 11.08634 3.36743
## Start: AIC=1757.29
## sales ~ 1
##
## Df Sum of Sq RSS AIC
## + airplay 1 464863 831089 1670.4
## + adverts 1 433688 862264 1677.8
## + attract 1 137822 1158130 1736.8
## <none> 1295952 1757.3
##
## Step: AIC=1670.44
## sales ~ airplay
##
## Df Sum of Sq RSS AIC
## + adverts 1 350661 480428 1562.8
## + attract 1 63182 767907 1656.6
## <none> 831089 1670.4
##
## Step: AIC=1562.82
## sales ~ airplay + adverts
##
## Df Sum of Sq RSS AIC
## + attract 1 45853 434575 1544.8
## <none> 480428 1562.8
##
## Step: AIC=1544.76
## sales ~ airplay + adverts + attract
##
## Call:
## lm(formula = sales ~ airplay + adverts + attract, data = album2)
##
## Coefficients:
## (Intercept) airplay adverts attract
## -26.61296 3.36743 0.08488 11.08634
## Start: AIC=1757.29
## sales ~ 1
##
## Df Sum of Sq RSS AIC
## + airplay 1 464863 831089 1670.4
## + adverts 1 433688 862264 1677.8
## + attract 1 137822 1158130 1736.8
## <none> 1295952 1757.3
##
## Step: AIC=1670.44
## sales ~ airplay
##
## Df Sum of Sq RSS AIC
## + adverts 1 350661 480428 1562.8
## + attract 1 63182 767907 1656.6
## <none> 831089 1670.4
## - airplay 1 464863 1295952 1757.3
##
## Step: AIC=1562.82
## sales ~ airplay + adverts
##
## Df Sum of Sq RSS AIC
## + attract 1 45853 434575 1544.8
## <none> 480428 1562.8
## - adverts 1 350661 831089 1670.4
## - airplay 1 381836 862264 1677.8
##
## Step: AIC=1544.76
## sales ~ airplay + adverts + attract
##
## Df Sum of Sq RSS AIC
## <none> 434575 1544.8
## - attract 1 45853 480428 1562.8
## - airplay 1 325860 760434 1654.7
## - adverts 1 333332 767907 1656.6
##
## Call:
## lm(formula = sales ~ airplay + adverts + attract, data = album2)
##
## Coefficients:
## (Intercept) airplay adverts attract
## -26.61296 3.36743 0.08488 11.08634
All-subsets-regressions (Best-subset)
• A procedure that considers all possible regression models given the set of potentially important pre-
dictors
• Model selection criteria:
– 𝑅2 . Find a subset model so that adding more variables will yield only small increases in R-squared
– Adjusted R2.
– MSE criterion
– Mallows' Cp criterion
– Other: PRESS, predicted 𝑅2 (which is calculated from the PRESS statistic)…
Mallows' Cp
$$C_p = \frac{SSE_k}{MSE_T} - (n - 2(k + 1))$$
• Identify subsets of predictors for which the Cp value is near k+1 (if possible).
– The full model always yields Cp = k+1, so don’t select the full model based on Cp.
• If all models, except the full model, yield a large Cp not near k+1, it suggests some important predic-
tor(s) are missing from the analysis. In this case, we are well-advised to identify the predictors that
are missing!
• If a number of models have Cp near k+1, choose the model with the smallest Cp value, thereby ensuring that the combination of bias and variance is at a minimum.
• When more than one model has a small Cp value near k+1, in general choose the simpler model or the model that best meets your research needs.
All-subsets-regression in R
library(leaps)
album.subsets <- regsubsets(sales~adverts+attract+airplay, data = album2)
summary(album.subsets)
## Subset selection object
## Call: regsubsets.formula(sales ~ adverts + attract + airplay, data = album2)
## 3 Variables (and intercept)
## Forced in Forced out
## adverts FALSE FALSE
## attract FALSE FALSE
## airplay FALSE FALSE
## 1 subsets of each size up to 3
## Selection Algorithm: exhaustive
## adverts attract airplay
## 1 ( 1 ) " " " " "*"
## 2 ( 1 ) "*" " " "*"
## 3 ( 1 ) "*" "*" "*"
[Figure: plots of the subset-selection results, one panel scaled by Mallows' Cp and one by adjusted R² (approximately 0.36, 0.63, and 0.66 for the one-, two-, and three-predictor subsets), showing which of adverts, attract, and airplay enter each subset.]
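These panels come from the plot() method for regsubsets objects; for example:
plot(album.subsets, scale = "Cp")
plot(album.subsets, scale = "adjr2")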
Options for plot( ) are r2, bic, Cp, and adjr2.
• It is important to note that no single criterion can determine which model is best.
• The different criteria quantify different aspects of the regression model, and therefore often yield different choices for the best set of predictors.
• Subsets regression is best used as a screening tool to reduce the large number of possible regression models to just a handful for further evaluation.
• Further evaluation and refinement might entail performing residual analyses, transforming the predictors and/or the response, adding interaction terms, and so on, until you are satisfied with a model that summarizes the trend in the data and allows you to answer your research question.
• Model selection statistics are generally not used blindly, but rather information about the field of
application, the intended use of the model, and any known biases in the data are taken into account
in the process of model selection.
• More suitable for exploratory model building
• Better to cross-validate the model with new data
Assumptions of Regression
Straightforward Assumptions
• Variable Type:
– Outcome must be continuous
– Predictors can be continuous or dichotomous.
• Non-Zero Variance: Predictors must not have zero variance.
• Linearity: The relationship we model is, in reality, linear.
• Homoscedasticity: For each value of the predictors the variance of the error term should be constant.
• Independence: All values of the outcome should come from different persons.
The More Tricky Assumptions
• No multicollinearity: Predictors must not be highly correlated.
• Independent Errors: For any pair of observations, the error terms should be uncorrelated.
– Tested with the Durbin-Watson test
– The statistic ranges from 0 to 4; a value of 2 means the errors are uncorrelated
• Normally-distributed Errors
Multicollinearity
library(car)
vif(album.full)
Testing independence in R
durbinWatsonTest(model) or dwt(model)
dwt(album.full)
Sample Size
$$N > 50 + 8k$$
where N is the number of participants and k is the number of IVs (predictors).
• An observation that is unconditionally unusual in Y value is called an outlier, but it is not necessarily
a regression outlier
• An observation that has an unusual X value—i.e., it is far from the mean of X—has leverage on (i.e.,
the potential to influence) the regression line
• Influential cases: an unusual X-value with an unusual Y-value given its X-value
• The olsrr package offers a number of tools to detect influential observations. For more use of olsrr,
check out this introduction.
• Unstandardized residuals
• Standardized residuals:
  1. Cases with absolute values greater than 3 are cause for concern.
  2. If more than 1% of cases have absolute values greater than 2.5, the level of error within the model is unacceptable.
  3. If more than 5% of cases have absolute values greater than 2, the level of error within the model is unacceptable.
• Estimating the outliers in R
  – Unstandardized residuals: resid()
  – Standardized residuals: rstandard()
One way to flag cases whose standardized residual exceeds 2 in absolute value:
large.standardized.residual <- abs(rstandard(album.full)) > 2
sum(large.standardized.residual)
library(olsrr)
ols_plot_resid_stand(album.full)
[Figure: standardized-residuals chart from ols_plot_resid_stand(album.full), with the ±2 threshold marked and the most extreme cases labelled.]
DFBeta
• DFBeta: the difference between a parameter estimated using all cases and estimated when one case is
excluded
Belsley, Kuh, and Welsch recommend 2 as a general cutoff value to indicate influential observations, and $2/\sqrt{n}$ as a size-adjusted cutoff.
For our sample, $2/\sqrt{200} = 0.14$.
ols_plot_dfbetas(album.full)
[Figure: DFBETAS influence diagnostics, one panel per coefficient (intercept, adverts, airplay, attract), each with the ±0.14 threshold marked and the most influential cases labelled.]
DFFits
• DFFit: The difference between the predicted value for a case when the model is calculated including
that case and when the model is calculated excluding that case.
• An observation is deemed influential if the absolute value of its DFFITS value is greater than:
$$2\sqrt{\frac{k + 1}{n}}$$
where n is the number of observations and k is the number of predictors.
For our sample, this equals
$$2\sqrt{\frac{3 + 1}{200}} = 0.28$$
album2$dffit <- dffits(album.full)
head(album2$dffit)
ols_plot_dffits(album.full)
[Figure: DFFITS chart for album.full, with the ±0.28 threshold marked and the most influential cases labelled.]
Cook’s D
• Cook’s distance: the impact that a case has on the model’s ability to predict all cases
  – $D_i = \dfrac{\sum_{j=1}^{n} (\hat{Y}_j - \hat{Y}_{j(i)})^2}{p \, MSE}$, where $\hat{Y}_{j(i)}$ is the prediction for case j from the model refitted without case i, and p is the number of parameters in the model.
• Since Cook's distance is in the metric of an F distribution with p and n − p degrees of freedom, the median point of $F_{(p, n-p)}$ can be used as a cut-off.
• For large n, a simple cutoff value of 1 can be used. (Weisberg, 1982)
[Figure: index plot of Cook's distance values for album.full.]
ols_plot_cooksd_chart(album.full)
[Figure: Cook's D chart from ols_plot_cooksd_chart(), threshold 0.02; case 164 has the largest distance, and all values are well below the simple cutoff of 1.]
Hat values
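Hat values (leverage) are the diagonal elements of the hat matrix H defined earlier; their average is (k + 1)/n, and a common rule of thumb (an addition here, not from the original notes) is to inspect cases exceeding two or three times that average:
album2$hatvalues <- hatvalues(album.full)  # leverage of each case
average.leverage <- (3 + 1) / 200          # (k + 1)/n for this model
which(album2$hatvalues > 2 * average.leverage)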
[Figure: index plot of album2$hatvalues (leverage values) for the fitted model.]
Model Building and Validation
• In data-driven research, the final step in the model-building process is to validate the selected regression model.
– Collect new data and compare the results.
– Cross-validation: If the data set is large, split the data into two parts and cross-validate the results
(often 80-20).
Categorical predictors and multiple regression
• Often you may have categorical variables (e.g., gender, major) which you want to include as predictors
in regression.
• To do that, you need to code the categorical predictors with dummy variables
– Create k-1 dummy variables
– Choose the baseline/control group. If you don't have a specific control group, choose the group that represents the majority of people.
– Assign the baseline values of 0 for all of your dummy variables.
– For your first dummy variable, assign the value 1 to the first group that you want to compare
against the baseline group. Assign all other groups 0 for this variable.
– For your second dummy variable, assign the value 1 to the second group that you want to compare
against the baseline group. Assign all other groups 0 for this variable.
– Repeat this until you run out of dummy variables
– Put dummy variables into the regression model
Example
Salaries is a data frame included in the carData package. It contains 397 observations on the 2008-09 nine-month academic salary of Assistant Professors, Associate Professors, and Professors at a college in the U.S.
library(car)
data("Salaries", package = "carData")
str(Salaries)
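The first model regresses salary on years of service; a call consistent with the summary output below (the object name is an assumption) would be:
salary.model1 <- lm(salary ~ yrs.service, data = Salaries)
summary(salary.model1)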
##
## Call:
## lm(formula = salary ~ yrs.service, data = Salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81933 -20511 -3776 16417 101947
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 99974.7 2416.6 41.37 < 2e-16 ***
## yrs.service 779.6 110.4 7.06 7.53e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28580 on 395 degrees of freedom
## Multiple R-squared: 0.1121, Adjusted R-squared: 0.1098
## F-statistic: 49.85 on 1 and 395 DF, p-value: 7.529e-12
If we want to examine whether gender plays a role in determining professors' salaries, we can add it to the regression model.
Note that for the sex variable in this dataset, Female is the baseline group under the default treatment contrasts:
contrasts(Salaries$sex)
## Male
## Female 0
## Male 1
You can use the relevel() function to set the baseline category to males as follows:
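A call consistent with the contrasts shown below (this reassignment is assumed rather than shown in the original):
Salaries$sex <- relevel(Salaries$sex, ref = "Male")
contrasts(Salaries$sex)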
## Female
## Male 0
## Female 1
The fact that the coefficient for sexFemale in the regression output is negative indicates that being a Female
is associated with a constant decrease (i.e., smaller intercept in regression) in salary (relative to Males).
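A call consistent with the output below (object name assumed):
salary.model2 <- lm(salary ~ yrs.service + sex, data = Salaries)
summary(salary.model2)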
##
## Call:
## lm(formula = salary ~ yrs.service + sex, data = Salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -81757 -20614 -3376 16779 101707
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 101428.7 2531.9 40.060 < 2e-16 ***
## yrs.service 747.6 111.4 6.711 6.74e-11 ***
## sexFemale -9071.8 4861.6 -1.866 0.0628 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28490 on 394 degrees of freedom
## Multiple R-squared: 0.1198, Adjusted R-squared: 0.1154
## F-statistic: 26.82 on 2 and 394 DF, p-value: 1.201e-11
• The interaction term indicates whether females' salaries grow with years of service at the same rate as males' salaries do.
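A call consistent with the output below (object name assumed; the formula is copied from the printed Call):
salary.model3 <- lm(salary ~ yrs.service + yrs.service * sex, data = Salaries)
summary(salary.model3)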
##
## Call:
## lm(formula = salary ~ yrs.service + yrs.service * sex, data = Salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -80381 -20258 -3727 16353 102536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 102197.1 2563.7 39.863 < 2e-16 ***
## yrs.service 705.6 113.7 6.205 1.39e-09 ***
## sexFemale -20128.6 7991.1 -2.519 0.0122 *
## yrs.service:sexFemale 931.7 535.2 1.741 0.0825 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28420 on 393 degrees of freedom
## Multiple R-squared: 0.1266, Adjusted R-squared: 0.1199
## F-statistic: 18.98 on 3 and 393 DF, p-value: 1.622e-11
• Note that R treats the first level of a factor as the baseline (when a factor is created from text, the levels default to alphabetical order). In the Salaries data the rank factor has levels AsstProf, AssocProf, and Prof (level 1 = AsstProf, level 2 = AssocProf, level 3 = Prof), so AsstProf is the baseline.
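The column names 2 and 3 in the contrasts below (and the rank2/rank3 coefficient names) suggest that treatment contrasts were assigned explicitly; a reconstruction consistent with the output (assumed, not shown in the original) is:
contrasts(Salaries$rank) <- contr.treatment(3)
contrasts(Salaries$rank)
salary.model4 <- lm(salary ~ yrs.service + sex + rank, data = Salaries)
summary(salary.model4)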
## 2 3
## AsstProf 0 0
## AssocProf 1 0
## Prof 0 1
##
## Call:
## lm(formula = salary ~ yrs.service + sex + rank, data = Salaries)
##
## Residuals:
## Min 1Q Median 3Q Max
## -64500 -15111 -1459 11966 107011
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82081.5 2974.1 27.598 < 2e-16 ***
## yrs.service -171.8 115.3 -1.490 0.13694
## sexFemale -5468.7 4035.3 -1.355 0.17613
## rank2 14702.9 4266.6 3.446 0.00063 ***
## rank3 48980.2 3991.8 12.270 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23580 on 392 degrees of freedom
## Multiple R-squared: 0.4, Adjusted R-squared: 0.3938
## F-statistic: 65.32 on 4 and 392 DF, p-value: < 2.2e-16
Summary
• Different types of regression methods are developed for theory-driven or data-driven research purposes.
• To compare nested models, F-test and AIC are often used.
• To compare non-nested models, more general goodness-of-fit indices are often used, including Mallows' Cp.
• It is possible for a single observation to have a great influence on the results of a regression analysis.
Methods to detect such observations include DFBeta, DFFits, Cook’s D, and hat values.
• Categorical variables can be included in linear regression models after being coded with a set of dummy variables.