Manual For Regression Analysis Using R-Software
Department of Statistics
Laboratory manual
Prepared by:
Wosenie Gebireamanuel
Dessie, Ethiopia
March, 2013 E.C
Table of Contents
Introduction
Part I
1. Introduction to R and its Features
1.1. What is R?
1.2. Features of R
1.3. Installing R
1.4. Opening R
1.5. Data Entry
1.6. The Menu Bar
1.7. Basic Data Types in R
1.7.1. Vectors
1.7.2. Matrices
1.7.3. Arrays
1.7.4. Lists
1.7.5. Data Frames
2. Part II: Statistical Regression Analysis
2.1. Regression Analysis
2.1.1. Simple Linear Regression
2.1.2. Multiple Linear Regression
2.2. ANOVA Models
2.3. Generalized Linear Model
2.3.1. Binary Logistic Regression
2.4. Model Diagnostics
2.4.1. Scatter
2.4.2. Normality of Residuals
2.4.3. Outliers Checking
2.4.4. Influential Observations
2.4.5. Non-constant Error Variance (Heteroscedasticity)
2.4.6. Multicollinearity
2.4.7. Evaluate Nonlinearity
Reference
Introduction
Regression analysis answers questions about the dependence of a response variable on one or
more predictors, including prediction of future values of a response, discovering which
predictors are important, and estimating the impact of changing a predictor or a treatment on the
value of the response.
Linear statistical models for regression, analysis of variance, and experimental design are widely
used today in business administration, economics, engineering, and the social, health, and
biological sciences. Successful applications of these models require a sound understanding of
both the underlying theory and the practical problems that are encountered in using the models in
real-life situations. This module is intended as a practical guide to performing regression
analysis using R software.
The module has two parts. Part I gives an introduction to R and its features, and Part II
presents notes and R syntax for statistical regression analysis, illustrating practical
regression analysis in detail.
Part I
1. Introduction to R and its Features
1.1. What is R?
R is a programming language and software environment for statistical analysis, graphics
representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed by the R Development Core
Team. The core of R is an interpreted computer language which allows branching and looping as
well as modular programming using functions.
1.2. Features of R
As stated earlier, R is a programming language and software environment for statistical analysis,
graphics representation and reporting. The following are the important features of R:
• R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions, and input and output facilities.
• R has an effective data handling and storage facility.
1.3. Installing R
The R system for statistical computing consists of two major parts: the base system and a
collection of user contributed add-on packages. The R language is implemented in the base
system. Implementations of statistical and graphical procedures are separated from the base
system and are organized in the form of packages. A package is a collection of functions,
examples and documentation. The functionality of a package is often focused on a special
statistical methodology. Both the base system and packages are distributed via the
Comprehensive R Archive Network (CRAN) accessible under
http://CRAN.R-project.org
You can download the Windows installer version of R ("R-<recent version> for Windows (32/64
bit)") from CRAN and save it in a local directory.
1.4. Opening R
1. After installation you can locate the icon to run the Program under the Windows Program
Files. Clicking this icon brings up the R-GUI which is the R console to do R
Programming.
2. Double-click on an existing R shortcut.
3. Double-click on the .RData file in the folder.
Then, the R console (command line) window will be automatically displayed
R comes without any frills and on startup simply shows a short introductory message, including
the version number, and the prompt sign '>'. The prompt indicates that R is ready for a
command; it is a way of saying "Go ahead... do something".
The instructions you give R are called commands.
Commands are separated either by a semicolon (;) or by a new line.
Comments can be put almost anywhere, starting with a hash mark (#); everything after # on a
line is a comment.
If you see a “+” in place of the prompt that means that your last command was not
completed.
Example:
>4*
+
Don't forget that R is case sensitive!
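A minimal sketch illustrating command separation and case sensitivity (the object name x is arbitrary):
> x <- 4; x + 1    # two commands on one line, separated by a semicolon
> X + 1            # Error: object 'X' not found, because R is case sensitive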
1.6. The Menu Bar
The menu bar in R is very similar to that in most Windows-based/menu-based programs (SPSS,
MINITAB…).
It contains six pull down menus, which are briefly described below. Much of the functionality
provided by the menus is redundant with those available using standard windows commands
(CTRL+C to copy, for example) and with commands you can enter at the command line.
Nevertheless, it is handy to have the menu system for quick access to functionality.
File
Similar to other statistical packages, in R the file menu contains options for opening, saving, and
printing R documents, as well as the option for exiting the program (which can also be done
using the close button in the upper right hand corner of the main program window). The options
that begin with "load" ("Load Workspace" and "Load History") are options to open previously
saved work. The next chapter discusses the different save options available in some detail as well
as what a workspace and a history are in terms of R files. The option to print is standard and will
print the information selected.
Edit
The edit menu contains the standard functionality of cut, copy and paste, and selects all. In
addition there is an option to “Clear console or Ctrl+L” which creates a blank workspace with
> rm(list=ls(all=T))
To remove a specified number of objects use:
1.7. Basic Data Types in R
1.7.1. Vectors
Vectors are the simplest type of object in R. They can easily be created with c(), the combine
function.
There are 3 main types of vectors:
a) Numeric vectors
b) Character vectors
c) Logical vectors
A. Numeric Vector: is a single entity consisting of an ordered collection of numbers.
Example: To setup a numeric vector X consisting of 5 numbers,10,6,3,6,22, we use any one of
the following commands:
>x<-c(10,6,3,6,22)#OR
>x=c(10,6,3,6,22)#OR
>assign("x",c(10,6,3,6,22))#OR
>c(10,6,3,6,22)->x
Functions that return a single value
>length(x)#the number of elements in x
>sum(x)#the sum of the values of x
>mean(x)#the mean of the values of x
>var(x)#the variance of the values of x
>sd(x)#the standard deviation of the values of x
>min(x)#the minimum value from the values of x
>max(x)#the maximum value from the values of x
>prod(x)#the product of the values of x
>range(x)#the range of the values of x(smallest and largest)
B. Character vectors
To setup a character/string vector z consisting of 3 place names use:
>z<-c("Canberra","Sydney","Newcastle")Or
>z<-c('Canberra','Sydney','Newcastle')
Character strings are entered using either matching double ("") or single ('') quotes, but are
printed using double quotes (or sometimes without quotes).
C. Logical Vectors
A logical vector is a vector whose elements are TRUE, FALSE or NA.
Note: TRUE and FALSE are often abbreviated as T and F respectively. However, T and F are just
variables which are set to TRUE and FALSE by default; they are not reserved words and hence
can be overwritten by the user.
The comparison operators are <, <=, >, >=, == (exact equality) and != (inequality); applying
them to a numeric vector produces a logical vector.
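For example, a minimal sketch using the numeric vector x created earlier:
> x <- c(10,6,3,6,22)   # the numeric vector from above
> x > 6                 # TRUE FALSE FALSE FALSE TRUE
> y <- x >= 6           # store the logical vector TRUE TRUE FALSE TRUE TRUE in y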
1.7.2. Matrices
A matrix is a rectangular table of data. As with vectors, all the elements of a matrix must
be of the same data type.
We can use the function matrix():
> x <- matrix(c(1:8), 2, 4, byrow=F)
An equivalent expression is:
> x <- matrix(c(1:8), nrow=2, ncol=4)
Use the function cbind to create a matrix by binding two or more vectors as column vectors.
The function rbind is used to create a matrix by binding two or more vectors as row vectors.
Example:
> cbind(c(1,2,3),c(4,5,6))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> rbind(c(1,2,3),c(4,5,6))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Matrix operations
R has a number of matrix-specific operations, for example:
Name         Operation
dim()        Dimension of the matrix (number of rows and columns)
as.matrix()  Used to coerce an argument into a matrix object
%*%          Matrix multiplication
t()          Matrix transpose
det()        Determinant of a square matrix
solve()      Matrix inverse; also solves a system of linear equations
eigen()      Computes eigenvalues and eigenvectors
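As a minimal sketch of a few of these operations (the 2 x 2 matrix m below is chosen purely for illustration):
> m <- matrix(c(2,1,1,3), nrow=2)   # a 2 x 2 matrix, filled by column
> dim(m)           # 2 2
> t(m)             # transpose of m
> det(m)           # determinant: 2*3 - 1*1 = 5
> solve(m)         # inverse of m
> m %*% solve(m)   # matrix product; returns the 2 x 2 identity matrix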
1.7.4. Lists
Lists are collections of arbitrary objects. That is, the elements of a list can be objects of any type and
structure. Consequently, a list can contain another list and therefore it can be used to construct
arbitrary data structures. A list could consist of a numeric vector, a logical value, a matrix, a
complex vector, a character array, a function, and so on.
Lists are created with the list() command:
> L <- list(object1, object2, ..., objectm)
Example:
>L<-list(c(1,5,3),matrix(1:6,nrow=3),c("Hello","world"))
>L
[[1]]
[1] 1 5 3
[[2]]
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
[[3]]
[1] "Hello“ "world"
2. Part II: Statistical Regression Analysis
2.1. Regression Analysis
Regression analysis is a statistical tool for the investigation of relationships between variables.
Usually, the investigator wants to ascertain the causal effect of one variable on another. The
investigator also typically assesses the "statistical significance" of the estimated relationship
(the degree of confidence that the true relationship is close to the estimated relationship).
Regression may be simple or multiple, linear or non-linear. The template for a statistical model is
a linear regression model with independent, homoscedastic errors.
In the simple linear regression model Y = β0 + β1X + ε, the error component ε accounts for the
failure of the data to lie exactly on the straight line and represents the difference between the
true and observed realizations of Y. There can be several reasons for such a difference, e.g., the
effect of all variables omitted from the model, variables that are qualitative, inherent
randomness in the observations, etc. We assume that ε is an independent and identically
distributed random variable with mean zero and constant variance σ².
In R model formulae, Y ~ (A + B + C)^2 corresponds to the model
Y = β0 + β1A + β2B + β3C + β4AB + β5AC + β6BC,
i.e., a model including all main effects and interactions up to second order; interactions up to
order n are obtained with (...)^n. An equivalent formula in this case is Y ~ A*B*C - A:B:C.
2.1.1. Simple Linear Regression
We first consider the modelling between the dependent variable and one independent variable.
When there is only one independent variable in the linear regression model, the model is
generally termed a simple linear regression model. When there is more than one independent
variable in the model, the linear model is termed a multiple linear regression model.
The general form of a model formula is:
response ~ expression
y ~ x
y ~ 1 + x
Both imply the same simple linear regression model of y on x. The first has an implicit intercept
term and the second an explicit one.
y ~ 0 + x
This formula (equivalently y ~ -1 + x or y ~ x - 1) specifies simple linear regression of y on x
through the origin, i.e., without an intercept term.
Using the R sample data set iris, we can fit a simple linear regression of Sepal.Length on
Sepal.Width:
>linmodel=lm(Sepal.Length~Sepal.Width,data=iris)
> linmodel
Call:
lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Coefficients:
(Intercept) Sepal.Width
6.5262 -0.2234
> summary(linmodel)
Call:
lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
Residuals:
Min 1Q Median 3Q Max
-1.5561 -0.6333 -0.1120 0.5579 2.2226
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.5262 0.4789 13.63 <2e-16 ***
Information about a fitted model object can be extracted with the following generic functions:
coef(object)
Extract the regression coefficient (matrix).
Long form: coefficients(object).
deviance(object)
Residual sum of squares, weighted if appropriate.
formula(object)
Extract the model formula.
plot(object)
Produce four plots, showing residuals, fitted values and some diagnostics.
predict(object, newdata=data.frame)
The data frame supplied must have variables specified with the same labels as the original. The
value is a vector or matrix of predicted values corresponding to the determining variable values
in data.frame.
print(object)
Print a concise version of the object. Often used implicitly.
residuals(object)
Extract the (matrix of) residuals, weighted as appropriate.
step(object)
Select a suitable model by adding or dropping terms and assessing statistical significance. The
model with the smallest value of AIC (Akaike's An Information Criterion) discovered in the
stepwise search is returned.
summary(object)
Print a comprehensive summary of the results of the regression analysis.
> coef(linmodel)
(Intercept) Sepal.Width
6.5262226 -0.2233611
> resid(linmodel)
1 2 3 4 5 6 7 8
-0.64445884 -0.95613937 -1.11146716 -1.23380326 -0.72212273 -0.25511441 -1.16679494 -0.76679494
9 10 11 12 13 14 15 16
-1.47847547 -0.93380326 -0.29978662 -0.96679494 -1.05613937 -1.55613937 0.16722169 0.15656612
17 18 19 20 21 22 23 24
-0.25511441 -0.64445884 0.02254948 -0.57745052 -0.36679494 -0.59978662 -1.12212273 -0.68913105
25 26 27 28 29 30 31 32
-0.96679494 -0.85613937 -0.76679494 -0.54445884 -0.56679494 -1.11146716 -1.03380326 -0.36679494
33 34 35 36 37 38 39 40
-0.41044220 -0.08810609 -0.93380326 -0.81146716 -0.24445884 -0.82212273 -1.45613937 -0.66679494
41 42 43 44 45 46 47 48
-0.74445884 -1.51249211 -1.41146716 -0.74445884 -0.57745052 -1.05613937 -0.57745052 -1.21146716
49 50 51 52 53 54 55 56
-0.39978662 -0.78913105 1.18853284 0.58853284 1.06619674 -0.51249211 0.59918842 -0.20081158
57 58 59 60 61 62 63 64
0.51086895 -1.09015600 0.72152453 -0.72314769 -1.07950043 0.04386063 -0.03482822 0.22152453
65 66 67 68 69 70 71 72
-0.27847547 0.86619674 -0.25613937 -0.12314769 0.16517178 -0.36781990 0.08853284 0.19918842
73 74 75 76 77 78 79 80
0.33218010 0.19918842 0.52152453 0.74386063 0.89918842 0.84386063 0.12152453 -0.24548379
81 82 83 84 85 86 87 88
-0.49015600 -0.49015600 -0.12314769 0.07685231 -0.45613937 0.23320506 0.86619674 0.28750789
89 90 91 92 93 94 95 96
-0.25613937 -0.46781990 -0.44548379 0.24386063 -0.14548379 -1.01249211 -0.32314769 -0.15613937
97 98 99 100 101 102 103 104
-0.17847547 0.32152453 -0.86781990 -0.20081158 0.51086895 -0.12314769 1.24386063 0.42152453
105 106 107 108 109 110 111 112
0.64386063 1.74386063 -1.06781990 1.42152453 0.73218010 1.47787727 0.68853284 0.47685231
113 114 115 116 117 118 119 120
0.94386063 -0.26781990 -0.10081158 0.58853284 0.64386063 2.02254948 1.75451621 -0.03482822
121 122 123 124 125 126 127 128
1.08853284 -0.30081158 1.79918842 0.37685231 0.91086895 1.38853284 0.29918842 0.24386063
Since the iris data set has 150 observations, 150 residuals (one per observation) are returned.
> fitted(linmodel)
1 2 3 4 5 6 7 8 9 10 11
5.744459 5.856139 5.811467 5.833803 5.722123 5.655114 5.766795 5.766795 5.878475 5.833803 5.699787
12 13 14 15 16 17 18 19 20 21 22
5.766795 5.856139 5.856139 5.632778 5.543434 5.655114 5.744459 5.677451 5.677451 5.766795 5.699787
23 24 25 26 27 28 29 30 31 32 33
5.722123 5.789131 5.766795 5.856139 5.766795 5.744459 5.766795 5.811467 5.833803 5.766795 5.610442
34 35 36 37 38 39 40 41 42 43 44
5.588106 5.833803 5.811467 5.744459 5.722123 5.856139 5.766795 5.744459 6.012492 5.811467 5.744459
45 46 47 48 49 50 51 52 53 54 55
5.677451 5.856139 5.677451 5.811467 5.699787 5.789131 5.811467 5.811467 5.833803 6.012492 5.900812
56 57 58 59 60 61 62 63 64 65 66
5.900812 5.789131 5.990156 5.878475 5.923148 6.079500 5.856139 6.034828 5.878475 5.878475 5.833803
67 68 69 70 71 72 73 74 75 76 77
5.856139 5.923148 6.034828 5.967820 5.811467 5.900812 5.967820 5.900812 5.878475 5.856139 5.900812
78 79 80 81 82 83 84 85 86 87 88
5.856139 5.878475 5.945484 5.990156 5.990156 5.923148 5.923148 5.856139 5.766795 5.833803 6.012492
89 90 91 92 93 94 95 96 97 98 99
5.856139 5.967820 5.945484 5.856139 5.945484 6.012492 5.923148 5.856139 5.878475 5.878475 5.967820
100 101 102 103 104 105 106 107 108 109 110
5.900812 5.789131 5.923148 5.856139 5.878475 5.856139 5.856139 5.967820 5.878475 5.967820 5.722123
111 112 113 114 115 116 117 118 119 120 121
5.811467 5.923148 5.856139 5.967820 5.900812 5.811467 5.856139 5.677451 5.945484 6.034828 5.811467
122 123 124 125 126 127 128 129 130 131 132
5.900812 5.900812 5.923148 5.789131 5.811467 5.900812 5.856139 5.900812 5.856139 5.900812 5.677451
133 134 135 136 137 138 139 140 141 142 143
5.900812 5.900812 5.945484 5.856139 5.766795 5.833803 5.856139 5.833803 5.833803 5.833803 5.923148
144 145 146 147 148 149 150
5.811467 5.789131 5.856139 5.967820 5.856139 5.766795 5.856139
2.1.2. Multiple Linear Regression
Multiple linear regression examines the linear relationship between one dependent variable (Y)
and two or more independent variables (Xi); that is, it uses more than one explanatory variable
to explain the variability in the response variable. The multiple linear regression model is
Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi,
where ε is the random error. The coefficients of the multiple regression model are estimated
from sample data with k independent variables:
Ŷi = b0 + b1X1i + b2X2i + ... + bkXki
– b1 = the change in the mean of Y per unit change in X1, taking into account the effect of the
other predictors (or net of X2, ..., Xk).
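As a minimal sketch of fitting a multiple linear regression in R (again using the iris data; the choice of Sepal.Width and Petal.Length as predictors is purely illustrative), additional terms are simply added to the model formula:
> multimodel <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
> summary(multimodel)   # coefficient estimates, standard errors, t tests, R-squared
> coef(multimodel)      # the fitted b0, b1 and b2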
2.2. ANOVA Models
One-Way ANOVA: Analysis of variance is used to test the hypothesis that several means are
equal.
One-way ANOVA model: Yij = μ + αi + εij,   i = 1, 2, ..., I and j = 1, 2, ..., Ji
This function will calculate an analysis of variance table, which can be used to evaluate the
significance of the terms in single models or to compare two nested models.
The basic function for fitting ordinary ANOVA models is aov(), and a streamlined version of the
call is:
> aov(formula, data = data.frame)
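A minimal sketch of a one-way ANOVA, using the InsectSprays data set (also used in the next section), where count is the response and spray the factor:
> aovmodel <- aov(count ~ spray, data = InsectSprays)
> summary(aovmodel)                               # ANOVA table: df, sums of squares, F test
> anova(lm(count ~ spray, data = InsectSprays))   # the equivalent table obtained via lm()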
2.3. Generalized Linear Model
Generalized linear models in R are an extension of linear regression models that allow the
dependent variable to be far from normal. A general linear model makes three assumptions: the
residuals are independent, normally distributed, and have constant variance; generalized linear
models relax these requirements. As an example, a Poisson regression for the InsectSprays
count data is fitted with glm():
> modelglm <- glm(count ~ spray, family = poisson, data = InsectSprays)
> summary(modelglm)
Call:
glm(formula = count ~ spray, family = poisson, data = InsectSprays)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3852 -0.8876 -0.1482 0.6063 2.6922
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.67415 0.07581 35.274 < 2e-16 ***
sprayB 0.05588 0.10574 0.528 0.597
sprayC -1.94018 0.21389 -9.071 < 2e-16 ***
sprayD -1.08152 0.15065 -7.179 7.03e-13 ***
sprayE -1.42139 0.17192 -8.268 < 2e-16 ***
sprayF 0.13926 0.10367 1.343 0.179
---
Signif. codes: 0 „***‟ 0.001 „**‟ 0.01 „*‟ 0.05 „.‟ 0.1 „ ‟ 1
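Because the Poisson model uses a log link, exponentiating the coefficients gives results on the count scale; a minimal sketch of this interpretation step:
> exp(coef(modelglm))   # intercept: mean count for spray A; other terms: rate ratios relative to A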
2.3.1. Binary Logistic Regression
As an example, a binary response Status.mind (coded 0/1) is modelled on Age, Gender, and
Edu.level. The covariates are entered as vectors:
Age=c(23,32,57,23,34,28,24,35,39,32,43,30,25,36,33,41,24,27,28,40,23,35,50,35,26)
Gender=c(1,0,0,1,1,1,0,1,0,0,0,1,1,1,1,0,0,1,1,0,1,1,0,0,0)
Edu.level=c(1,2,2,0,0,0,2,2,2,0,1,1,1,0,0,0,1,0,0,2,1,1,1,1,0)
dataBLRM=data.frame(Status.mind,Age,Gender,Edu.level)
fittedBLRM=glm(Status.mind~Age+as.factor(Gender)+as.factor(Edu.level),
family="binomial",data=dataBLRM)
summary(fittedBLRM)
Call:
glm(formula = Status.mind ~ Age + as.factor(Gender) + as.factor(Edu.level),
family = "binomial", data = dataBLRM)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9761 -0.7033 0.4872 0.8396 1.9189
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.24021 2.61303 -0.857 0.3913
Age 0.01395 0.06441 0.217 0.8286
as.factor(Gender)1 0.57867 1.17692 0.492 0.6229
as.factor(Edu.level)1 2.17438 1.09859 1.979 0.0478 *
as.factor(Edu.level)2 3.24459 1.52683 2.125 0.0336 *
---
Signif. codes: 0 „***‟ 0.001 „**‟ 0.01 „*‟ 0.05 „.‟ 0.1 „ ‟ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 34.617 on 24 degrees of freedom
Residual deviance: 26.622 on 20 degrees of freedom
AIC: 36.622
Number of Fisher Scoring iterations: 4
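For a binary logistic model the coefficients are on the log-odds scale, so exponentiating them gives odds ratios; a minimal sketch of this interpretation step:
> exp(coef(fittedBLRM))                    # odds ratios for Age, Gender and Edu.level
> predict(fittedBLRM, type = "response")   # fitted probabilities for each observation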
2.4. Model Diagnostics
As with all statistical procedures linear regression analysis rests on basic assumptions about
the population from where the data have been derived. The results of the analysis are only
reliable when these assumptions are satisfied. Hence, the possible influence of outliers and
the checking of assumptions made in fitting the linear regression model, i.e., constant
variance and normality of error terms, can both be undertaken using a variety of diagnostic
tools, of which the simplest and most well-known are the estimated residuals, i.e., the
differences between the observed values of the response and the fitted values of the response.
In essence these residuals estimate the error terms in the simple and multiple linear
regression models. So, after estimation, the next stage in the analysis should be an
examination of such residuals from fitting the chosen model to check on the normality and
constant variance assumptions and to identify outliers. Hence, in order to check whether the
model assumptions are satisfied, residuals and fitted values of the model can be extracted and
saved using fitted(model) and residuals(model), respectively. The most useful plots of these
residuals are listed below (a minimal sketch of the corresponding R commands follows the list):
• A plot of residuals against each explanatory variable in the model. The presence of a non-
linear relationship, for example, may suggest that a higher-order term in the explanatory
variable should be considered.
• A plot of residuals against fitted values. If the variance of the residuals appears to increase
with predicted value, a transformation of the response variable may be in order.
• A normal probability plot of the residuals. After all the systematic variation has been
removed from the data, the residuals should look like a sample from a standard normal
distribution. A plot of the ordered residuals against the expected order statistics from a
normal distribution provides a graphical check of this assumption.
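A sketch of how these plots can be produced for a fitted lm object (here called model; substitute the name of your own fitted model, e.g. linmodel from above):
> plot(fitted(model), residuals(model), xlab = "Fitted values", ylab = "Residuals")  # residuals vs fitted
> abline(h = 0, lty = 2)                               # reference line at zero
> qqnorm(residuals(model)); qqline(residuals(model))   # normal probability plot of the residuals
> plot(model)   # alternatively, the four built-in diagnostic plots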
2.4.1. Scatter
Scatter is measured by the size of the residuals. A common problem is that the scatter increases
as the mean response increases; that is, the big residuals occur where the fitted values are big.
You can recognize this by a "funnel effect" in the plot of residuals versus fitted values.
As an example, the airquality data set contains daily air quality measurements in New York,
May to September 1973:
> library(car); data(airquality)   # the car package provides additional diagnostic functions
Variables are:
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at
Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000–7700
Angstroms from 0800 to 1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at
LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at La Guardia
Airport.
Month: month of the year (1–12)
Day: day of the month (1–31)
>regair=lm(Ozone~Solar.R+Wind+Temp,data=airquality)
> regair
Call:
lm(formula = Ozone ~ Solar.R + Wind + Temp, data = airquality)
Coefficients:
(Intercept) Solar.R Wind Temp
-64.34208 0.05982 -3.33359 1.65209
> summary(regair)
Call:
lm(formula = Ozone ~ Solar.R + Wind + Temp, data = airquality)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -64.34208 23.05472 -2.791 0.00623 **
Solar.R 0.05982 0.02319 2.580 0.01124 *
Wind -3.33359 0.65441 -5.094 1.52e-06 ***
Temp 1.65209 0.25353 6.516 2.42e-09 ***
---
Signif. codes: 0 „***‟ 0.001 „**‟ 0.01 „*‟ 0.05 „.‟ 0.1 „ ‟ 1
R² is 60.59%
> plot(regair)
The diagnostic plots show that the 117th observation is an outlier, so the model is refitted
without it:
>regair117=lm(Ozone~Solar.R+Wind+Temp,data=airquality,subset=-117)
>regair117
summary(regair117)
Call:
lm(formula = Ozone ~ Solar.R + Wind + Temp, data = airquality,
subset = -117)
Residuals:
Min 1Q Median 3Q Max
-38.757 -13.274 -1.993 9.972 62.314
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -76.89396 20.86408 -3.685 0.000362 ***
Solar.R 0.05405 0.02087 2.590 0.010951 *
Wind -2.76110 0.59860 -4.613 1.12e-05 ***
Temp 1.74239 0.22854 7.624 1.11e-11 ***
---
Signif. codes: 0 „***‟ 0.001 „**‟ 0.01 „*‟ 0.05 „.‟ 0.1 „ ‟ 1
Residual standard error: 19.04 on 106 degrees of freedom
(42 observations deleted due to missingness)
Multiple R-squared: 0.6369, Adjusted R-squared: 0.6267
F-statistic: 61.99 on 3 and 106 DF, p-value: < 2.2e-16
Now R² is 63.69%
2.4.2. Normality of Residuals
2.4.2.1. Histogram
A histogram is a graphical way of checking whether data are normally distributed; here it is
used to check the normality assumption for the residuals. For example, we use the airquality
model fitted above:
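A minimal sketch (the residuals are taken from the regair117 model fitted earlier; the title and axis label are illustrative):
> hist(residuals(regair117), xlab = "Residuals", main = "Histogram of residuals")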
Reference
1. Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models. Sage.