Basic Regression Analysis 2
H.M.F 6
Categorical independent variables / creating dummy variables
Examples
Suppose the variable x is coded 1, 2, 3 and we want to label these values
low, medium and high:
mydata$x <- factor(mydata$x, levels = c(1, 2, 3),
                   labels = c("low", "medium", "high"))
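To see which dummy variables R builds from such a factor, the sketch below may help. The data frame contents are made up for illustration, and model.matrix() is used here only to display the indicator columns; lm() creates them automatically when a factor appears in a model formula.

```r
# Hypothetical example data: x holds the numeric codes 1, 2, 3
mydata <- data.frame(x = c(1, 2, 3, 2, 1, 3))

# Convert the codes to a labelled factor
mydata$x <- factor(mydata$x, levels = c(1, 2, 3),
                   labels = c("low", "medium", "high"))

# Show the dummy (indicator) columns R derives from the factor.
# With the default treatment contrasts, the first level ("low")
# is the reference category and gets no column of its own.
dummies <- model.matrix(~ x, data = mydata)
print(dummies)
```

Each row has a 1 in xmedium or xhigh when x takes that level, and 0 in both columns when x is low.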
Regression assumptions
Linearity of the data. The relationship between the predictor (x) and the
outcome (y) is assumed to be linear.
Normality of residuals. The residual errors are assumed to be normally
distributed.
Homogeneity of residual variance. The residuals are assumed to have a
constant variance (homoscedasticity). Plotting residuals versus fitted values
is a good way to check this.
Independence of the residual error terms.
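A minimal sketch of how the first three assumptions might be checked in base R. It assumes the built-in mtcars data set and a simple model of mpg on wt as stand-ins for the course data; the Shapiro-Wilk test is one common choice for normality, not necessarily the one used in the lecture.

```r
# Assumption: mtcars (shipped with R) stands in for the real data
model <- lm(mpg ~ wt, data = mtcars)

res <- residuals(model)  # residual errors
fit <- fitted(model)     # fitted values

# Linearity and constant variance: residuals versus fitted values.
# A curved pattern suggests non-linearity; a funnel shape suggests
# non-constant variance.
plot(fit, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of residuals: Q-Q plot and Shapiro-Wilk test
qqnorm(res)
qqline(res)
sw <- shapiro.test(res)
print(sw$p.value)
```

A small Shapiro-Wilk p-value is evidence against normality, but with small samples the Q-Q plot is usually the more informative check.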
You should check whether or not these assumptions hold true. Potential
problems include:
Non-linearity of the outcome-predictor relationship
Heteroscedasticity: non-constant variance of the error terms
Presence of influential values in the data, which can be:
Outliers: extreme values in the outcome (y) variable
High-leverage points: extreme values in the predictor (x) variables
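Outliers and high-leverage points can be flagged numerically, as in the rough sketch below. It again assumes mtcars as stand-in data, and the cutoffs (|standardized residual| > 3 and leverage > 2p/n) are common rules of thumb rather than values taken from these slides.

```r
# Assumption: mtcars (shipped with R) stands in for the real data
model <- lm(mpg ~ wt, data = mtcars)

std_res <- rstandard(model)       # standardized residuals (outliers in y)
lev     <- hatvalues(model)       # leverage (extremeness in x)
cook    <- cooks.distance(model)  # overall influence on the fit

p <- length(coef(model))  # number of estimated coefficients
n <- nrow(mtcars)         # number of observations

# Rule-of-thumb cutoffs (conventions, adjust as needed)
outliers <- names(std_res)[abs(std_res) > 3]
high_lev <- names(lev)[lev > 2 * p / n]

print(outliers)
print(high_lev)
print(head(sort(cook, decreasing = TRUE), 3))
```

Observations that are both outlying and high-leverage tend to have the largest Cook's distance and deserve a closer look before they are kept or dropped.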
Regression diagnostics
Diagnostic plots
Regression diagnostic plots can be created using the
base R function plot() or the autoplot() function from the
ggfortify package, which creates ggplot2-based
graphics.
Create the diagnostic plots with the base R function:
par(mfrow = c(2, 2))  # arrange the four plots in a 2 x 2 grid
plot(model)
Or with autoplot() from the ggfortify package:
library(ggfortify)
autoplot(model)
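The commands above assume that a fitted lm object called model already exists. A self-contained sketch, using mtcars as an assumed stand-in data set, could look like this; the ggfortify part runs only if that package is installed.

```r
# Assumption: mtcars (shipped with R) stands in for the real data
model <- lm(mpg ~ wt, data = mtcars)

# Base R: the four default diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(model)
par(op)  # restore the previous graphics settings

# A single plot can be requested with the 'which' argument,
# e.g. 1 = residuals versus fitted values
plot(model, which = 1)

# ggplot2-based version, only if ggfortify is available
if (requireNamespace("ggfortify", quietly = TRUE)) {
  library(ggfortify)
  print(autoplot(model))
}
```

Saving and restoring par() keeps the 2 x 2 layout from carrying over to later plots in the same session.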