Basic Regression Analysis 2

Regression Analysis

 Regression analysis is the appropriate statistical method when the response
variable and all explanatory variables are continuous. Here, we only discuss
linear regression, the simplest and most common form.
 The purpose of this lesson on correlation and linear regression is to provide
guidance on how R can be used to determine the association between two
variables and then to use this degree of association to predict future
outcomes (a brief sketch follows below).
 Past behavior is the best predictor of future behavior.
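As a minimal sketch of that workflow (the mtcars data set and the wt/mpg
variables are illustrative choices, not taken from these slides), association
can be measured with cor() and then used for prediction via a fitted model:

# Illustrative sketch: association between two continuous variables, then prediction
data(mtcars)

cor(mtcars$wt, mtcars$mpg)                     # Pearson correlation of weight and mpg

fit <- lm(mpg ~ wt, data = mtcars)             # simple linear regression
predict(fit, newdata = data.frame(wt = 3.0))   # predicted mpg for a 3,000 lb car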
Regression Analysis
Linear Model
 Regression analysis is a statistical technique that can be used to develop a
mathematical equation showing how variables are related.
 The basic function for fitting ordinary multiple regression models is lm(),
and a streamlined version of the call is as follows:
> fitted.model <- lm(formula, data = data.frame)
> fit <- lm(y ~ x1 + x2 + x3, data = mydata)      # with intercept
> fit <- lm(y ~ x1 + x2 + x3 - 1, data = mydata)  # omitting the intercept
> summary(fit)                                    # show results
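For example, a runnable sketch of the calls above using the built-in mtcars
data (the predictors chosen here are illustrative, not part of the slides):

# Multiple regression of mpg on three predictors from mtcars
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)             # with intercept
summary(fit)                                               # coefficients, R-squared, F-test

fit_no_int <- lm(mpg ~ wt + hp + disp - 1, data = mtcars)  # intercept omitted
coef(fit_no_int)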

Categorical independent variables / creating dummy variables
 Example: suppose the variable x is coded 1, 2, 3 and we want to label the
values low, medium and high:
> mydata$x <- factor(mydata$x, levels = c(1, 2, 3),
                     labels = c("low", "medium", "high"))

 By default, R chooses the reference (baseline) category as the 'first' level,
which is decided alphabetically or numerically (if coded as 1, 2, 3, ...). So
if you had a factor with the four values 'married', 'divorced', 'widowed',
'single', R would use 'divorced' as the reference category.
 To change the reference level, use relevel(); run table() on the original and
the releveled factor to compare them:
> varx <- relevel(varx, ref = "wanted ref")
> table(varx)
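A small runnable sketch of the two steps above (the data values are made up
for illustration; mydata follows the slide's placeholder name):

# Recode a numeric 1/2/3 variable as a labelled factor
mydata <- data.frame(x = c(1, 2, 3, 2, 1, 3))
mydata$x <- factor(mydata$x, levels = c(1, 2, 3),
                   labels = c("low", "medium", "high"))
table(mydata$x)                      # reference level is "low" (first level)

# Change the reference category used by model functions such as lm()
mydata$x <- relevel(mydata$x, ref = "medium")
table(mydata$x)                      # same counts; "medium" is now the first level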
Creating dummy variables in R
 Dummy variables are always binary, but they can also be created from
categorical variables with more than two categories.
 For instance, you might consider the geographic region of respondents. You
can use the region variable to this end, but it is a categorical variable with
four values, so one dummy is created per category except the reference
category:
> data$cat1 <- ifelse(data$var == "value", 1, 0)
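A minimal sketch, assuming a data frame data with a four-level region variable
(the region names here are illustrative):

data <- data.frame(region = c("North", "South", "East", "West", "South"))

# One 1/0 dummy per category, leaving one category (West) as the reference
data$north <- ifelse(data$region == "North", 1, 0)
data$south <- ifelse(data$region == "South", 1, 0)
data$east  <- ifelse(data$region == "East",  1, 0)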

Regression assumptions
 Linearity of the data. The relationship between the predictor (x) and the
outcome (y) is assumed to be linear.
 Normality of residuals. The residual errors are assumed to be normally
distributed.
 Homogeneity of residual variance. The residuals are assumed to have a
constant variance (homoscedasticity); plotting residuals versus fitted values
is a good check.
 Independence of the residual error terms.

 You should check whether or not these assumptions hold true. Potential
problems include:
 Non-linearity of the outcome-predictor relationships.
 Heteroscedasticity: non-constant variance of the error terms.
 Presence of influential values in the data (see the sketch after this list),
which can be:
 Outliers: extreme values in the outcome (y) variable.
 High-leverage points: extreme values in the predictor (x) variables.
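As a minimal sketch of how such influential values might be flagged
numerically (the model and the cutoff rules of thumb are illustrative
assumptions, not from the slides):

model <- lm(mpg ~ wt + hp, data = mtcars)        # illustrative fitted model

rstandard(model)[abs(rstandard(model)) > 2]                       # possible outliers
hatvalues(model)[hatvalues(model) > 2 * mean(hatvalues(model))]   # high-leverage points
cooks.distance(model)[cooks.distance(model) > 4 / nrow(mtcars)]   # influential observations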
Regression diagnostics

 Diagnostic plots
 Regression diagnostic plots can be created using the R base function plot()
or the autoplot() function [ggfortify package], which creates ggplot2-based
graphics.
 Create the diagnostic plots with the R base function:
> par(mfrow = c(2, 2))   # arrange the four diagnostic plots in a 2 x 2 grid
> plot(model)
 Or with the ggfortify package:
> library(ggfortify)
> autoplot(model)
