The document discusses visualising linear regression models with ggplot2. It notes that the current approach, plot.lm, is suboptimal because it ties the extraction of model quantities to a fixed set of plots. The author argues for a better strategy in which the data is separated from its representation, so that visualisations of linear models can be customised.

Model visualisation (with ggplot2)

Hadley Wickham
Rice University

Monday, 13 July 2009

1. Introducing plot.lm
2. The current state of play. Why this is suboptimal.
3. A better strategy: separate data from representation.
4. Why a canned set of plots is not good enough.

plot.lm(mod, which = 1)

[Figure: "Residuals vs Fitted" diagnostic plot. x-axis: Fitted values; y-axis: Residuals. Observations 624, 133 and 574 are labelled. Sub-caption: lm(log10(sales) ~ city * ns(date, 3) + factor(month))]

#  File src/library/stats/R/plot.lm.R
#  Part of the R package, http://www.R-project.org
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#  GNU General Public License for more details.
#
#  A copy of the GNU General Public License is available at
#  http://www.r-project.org/Licenses/

plot.lm <-
function (x, which = c(1L:3,5), ## was which = 1L:4,
          caption = list("Residuals vs Fitted", "Normal Q-Q",
          "Scale-Location", "Cook's distance",
          "Residuals vs Leverage",
          expression("Cook's dist vs Leverage " * h[ii] / (1 - h[ii]))),
          panel = if(add.smooth) panel.smooth else points,
          sub.caption = NULL, main = "",
          ask = prod(par("mfcol")) < length(which) && dev.interactive(), ...,
          id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
          qqline = TRUE, cook.levels = c(0.5, 1.0),
          add.smooth = getOption("add.smooth"),
          label.pos = c(4,2), cex.caption = 1)
{
    dropInf <- function(x, h) {
        if(any(isInf <- h >= 1.0)) {
            warning("Not plotting observations with leverage one:\n ",
                    paste(which(isInf), collapse=", "),
                    call.=FALSE)
            x[isInf] <- NaN
        }
        x
    }

    if (!inherits(x, "lm"))
        stop("use only with \"lm\" objects")
    if(!is.numeric(which) || any(which < 1) || any(which > 6))
        stop("'which' must be in 1L:6")
    isGlm <- inherits(x, "glm")
    show <- rep(FALSE, 6)
    show[which] <- TRUE
    r <- residuals(x)
    yh <- predict(x) # != fitted() for glm
    w <- weights(x)
    if(!is.null(w)) { # drop obs with zero wt: PR#6640
        wind <- w != 0
        r <- r[wind]
        yh <- yh[wind]
        w <- w[wind]
        labels.id <- labels.id[wind]
    }
    n <- length(r)
    if (any(show[2L:6L])) {
        s <- if (inherits(x, "rlm")) x$s
             else if(isGlm) sqrt(summary(x)$dispersion)
             else sqrt(deviance(x)/df.residual(x))
        hii <- lm.influence(x, do.coef = FALSE)$hat
        if (any(show[4L:6L])) {
            cook <- if (isGlm) cooks.distance(x)
                    else cooks.distance(x, sd = s, res = r)
        }
    }
    if (any(show[2L:3L])) {
        ylab23 <- if(isGlm) "Std. deviance resid." else "Standardized residuals"
        r.w <- if (is.null(w)) r else sqrt(w) * r
        ## NB: rs is already NaN if r=0, hii=1
        rs <- dropInf( r.w/(s * sqrt(1 - hii)), hii )
    }

    if (any(show[5L:6L])) { # using 'leverages'
        r.hat <- range(hii, na.rm = TRUE) # though should never have NA
        isConst.hat <- all(r.hat == 0) ||
            diff(r.hat) < 1e-10 * mean(hii, na.rm = TRUE)
    }
    if (any(show[c(1L, 3L)]))
        l.fit <- if (isGlm) "Predicted values" else "Fitted values"
    if (is.null(id.n))
        id.n <- 0
    else {
        id.n <- as.integer(id.n)
        if(id.n < 0L || id.n > n)
            stop(gettextf("'id.n' must be in {1,..,%d}", n), domain = NA)
    }
    if(id.n > 0L) { ## label the largest residuals
        if(is.null(labels.id))
            labels.id <- paste(1L:n)
        iid <- 1L:id.n
        show.r <- sort.list(abs(r), decreasing = TRUE)[iid]
        if(any(show[2L:3L]))
            show.rs <- sort.list(abs(rs), decreasing = TRUE)[iid]
        text.id <- function(x, y, ind, adj.x = TRUE) {
            labpos <-
                if(adj.x) label.pos[1+as.numeric(x > mean(range(x)))] else 3
            text(x, y, labels.id[ind], cex = cex.id, xpd = TRUE,
                 pos = labpos, offset = 0.25)
        }
    }
    getCaption <- function(k) # allow caption = "" , plotmath etc
        as.graphicsAnnot(unlist(caption[k]))

    if(is.null(sub.caption)) { ## construct a default:
        cal <- x$call
        if (!is.na(m.f <- match("formula", names(cal)))) {
            cal <- cal[c(1, m.f)]
            names(cal)[2L] <- "" # drop " formula = "
        }
        cc <- deparse(cal, 80) # (80, 75) are ``parameters''
        nc <- nchar(cc[1L], "c")
        abbr <- length(cc) > 1 || nc > 75
        sub.caption <-
            if(abbr) paste(substr(cc[1L], 1L, min(75L, nc)), "...") else cc[1L]
    }
    one.fig <- prod(par("mfcol")) == 1
    if (ask) {
        oask <- devAskNewPage(TRUE)
        on.exit(devAskNewPage(oask))
    }
    ##---------- Do the individual plots : ----------
    if (show[1L]) {
        ylim <- range(r, na.rm=TRUE)
        if(id.n > 0)
            ylim <- extendrange(r= ylim, f = 0.08)
        plot(yh, r, xlab = l.fit, ylab = "Residuals", main = main,
             ylim = ylim, type = "n", ...)
        panel(yh, r, ...)
        if (one.fig)
            title(sub = sub.caption, ...)
        mtext(getCaption(1), 3, 0.25, cex = cex.caption)
        if(id.n > 0) {
            y.id <- r[show.r]
            y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3
            text.id(yh[show.r], y.id, show.r)
        }
        abline(h = 0, lty = 3, col = "gray")
    }
    if (show[2L]) { ## Normal
        ylim <- range(rs, na.rm=TRUE)
        ylim[2L] <- ylim[2L] + diff(ylim) * 0.075
        qq <- qqnorm(rs, main = main, ylab = ylab23, ylim = ylim, ...)
        if (qqline) qqline(rs, lty = 3, col = "gray50")
        if (one.fig)
            title(sub = sub.caption, ...)
        mtext(getCaption(2), 3, 0.25, cex = cex.caption)
        if(id.n > 0)
            text.id(qq$x[show.rs], qq$y[show.rs], show.rs)
    }
    if (show[3L]) {
        sqrtabsr <- sqrt(abs(rs))
        ylim <- c(0, max(sqrtabsr, na.rm=TRUE))
        yl <- as.expression(substitute(sqrt(abs(YL)), list(YL=as.name(ylab23))))
        yhn0 <- if(is.null(w)) yh else yh[w!=0]
        plot(yhn0, sqrtabsr, xlab = l.fit, ylab = yl, main = main,
             ylim = ylim, type = "n", ...)
        panel(yhn0, sqrtabsr, ...)
        if (one.fig)
            title(sub = sub.caption, ...)
        mtext(getCaption(3), 3, 0.25, cex = cex.caption)
        if(id.n > 0)
            text.id(yhn0[show.rs], sqrtabsr[show.rs], show.rs)
    }
    if (show[4L]) {
        if(id.n > 0) {
            show.r <- order(-cook)[iid]# index of largest 'id.n' ones
            ymx <- cook[show.r[1L]] * 1.075
        } else ymx <- max(cook, na.rm = TRUE)
        plot(cook, type = "h", ylim = c(0, ymx), main = main,
             xlab = "Obs. number", ylab = "Cook's distance", ...)
        if (one.fig)
            title(sub = sub.caption, ...)
        mtext(getCaption(4), 3, 0.25, cex = cex.caption)
        if(id.n > 0)
            text.id(show.r, cook[show.r], show.r, adj.x=FALSE)
    }
    if (show[5L]) {
        ylab5 <- if (isGlm) "Std. Pearson resid." else "Standardized residuals"
        r.w <- residuals(x, "pearson")
        if(!is.null(w)) r.w <- r.w[wind] # drop 0-weight cases
        rsp <- dropInf( r.w/(s * sqrt(1 - hii)), hii )
        ylim <- range(rsp, na.rm = TRUE)
        if (id.n > 0) {
            ylim <- extendrange(r= ylim, f = 0.08)
            show.rsp <- order(-cook)[iid]
        }
        do.plot <- TRUE
        if(isConst.hat) { ## leverages are all the same
            if(missing(caption)) # set different default
                caption[[5]] <- "Constant Leverage:\n Residuals vs Factor Levels"
            ## plot against factor-level combinations instead
            aterms <- attributes(terms(x))
            ## classes w/o response
            dcl <- aterms$dataClasses[ -aterms$response ]
            facvars <- names(dcl)[dcl %in% c("factor", "ordered")]
            mf <- model.frame(x)[facvars]# better than x$model
            if(ncol(mf) > 0) {
                ## now re-order the factor levels *along* factor-effects
                ## using a "robust" method {not requiring dummy.coef}:
                effM <- mf
                for(j in seq_len(ncol(mf)))
                    effM[, j] <- sapply(split(yh, mf[, j]), mean)[mf[, j]]
                ord <- do.call(order, effM)
                dm <- data.matrix(mf)[ord, , drop = FALSE]
                ## #{levels} for each of the factors:
                nf <- length(nlev <- unlist(unname(lapply(x$xlevels, length))))
                ff <- if(nf == 1) 1 else rev(cumprod(c(1, nlev[nf:2])))
                facval <- ((dm-1) %*% ff)
                ## now reorder to the same order as the residuals
                facval[ord] <- facval
                xx <- facval # for use in do.plot section.

                plot(facval, rsp, xlim = c(-1/2, sum((nlev-1) * ff) + 1/2),
                     ylim = ylim, xaxt = "n",
                     main = main, xlab = "Factor Level Combinations",
                     ylab = ylab5, type = "n", ...)
                axis(1, at = ff[1L]*(1L:nlev[1L] - 1/2) - 1/2,
                     labels= x$xlevels[[1L]][order(sapply(split(yh,mf[,1]),
                                                          mean))])
                mtext(paste(facvars[1L],":"), side = 1, line = 0.25, adj=-.05)
                abline(v = ff[1L]*(0:nlev[1L]) - 1/2, col="gray", lty="F4")
                panel(facval, rsp, ...)
                abline(h = 0, lty = 3, col = "gray")
            }
            else { # no factors
                message("hat values (leverages) are all = ",
                        format(mean(r.hat)),
                        "\n and there are no factor predictors; no plot no. 5")
                frame()
                do.plot <- FALSE
            }
        }
        else { ## Residual vs Leverage
            xx <- hii
            ## omit hatvalues of 1.
            xx[xx >= 1] <- NA

            plot(xx, rsp, xlim = c(0, max(xx, na.rm = TRUE)), ylim = ylim,
                 main = main, xlab = "Leverage", ylab = ylab5, type = "n",
                 ...)
            panel(xx, rsp, ...)
            abline(h = 0, v = 0, lty = 3, col = "gray")
            if (one.fig)
                title(sub = sub.caption, ...)
            if(length(cook.levels)) {
                p <- length(coef(x))
                usr <- par("usr")
                hh <- seq.int(min(r.hat[1L], r.hat[2L]/100), usr[2L],
                              length.out = 101)
                for(crit in cook.levels) {
                    cl.h <- sqrt(crit*p*(1-hh)/hh)
                    lines(hh, cl.h, lty = 2, col = 2)
                    lines(hh,-cl.h, lty = 2, col = 2)
                }
                legend("bottomleft", legend = "Cook's distance",
                       lty = 2, col = 2, bty = "n")
                xmax <- min(0.99, usr[2L])
                ymult <- sqrt(p*(1-xmax)/xmax)
                aty <- c(-sqrt(rev(cook.levels))*ymult,
                         sqrt(cook.levels)*ymult)
                axis(4, at = aty,
                     labels = paste(c(rev(cook.levels), cook.levels)),
                     mgp = c(.25,.25,0), las = 2, tck = 0,
                     cex.axis = cex.id, col.axis = 2)
            }
        } # if(const h_ii) .. else ..
        if (do.plot) {
            mtext(getCaption(5), 3, 0.25, cex = cex.caption)
            if (id.n > 0) {
                y.id <- rsp[show.rsp]
                y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3
                text.id(xx[show.rsp], y.id, show.rsp)
            }
        }
    }
    if (show[6L]) {
        g <- dropInf( hii/(1-hii), hii )
        ymx <- max(cook, na.rm = TRUE)*1.025
        plot(g, cook, xlim = c(0, max(g, na.rm=TRUE)), ylim = c(0, ymx),
             main = main, ylab = "Cook's distance",
             xlab = expression("Leverage " * h[ii]),
             xaxt = "n", type = "n", ...)
        panel(g, cook, ...)
        ## Label axis with h_ii values
        athat <- pretty(hii)
        axis(1, at = athat/(1-athat), labels = paste(athat))
        if (one.fig)
            title(sub = sub.caption, ...)
        p <- length(coef(x))
        bval <- pretty(sqrt(p*cook/g), 5)

        usr <- par("usr")
        xmax <- usr[2L]
        ymax <- usr[4L]
        for(i in 1L:length(bval)) {
            bi2 <- bval[i]^2
            if(ymax > bi2*xmax) {
                xi <- xmax + strwidth(" ")/3
                yi <- bi2*xi
                abline(0, bi2, lty = 2)
                text(xi, yi, paste(bval[i]), adj = 0, xpd = TRUE)
            } else {
                yi <- ymax - 1.5*strheight(" ")
                xi <- yi/bi2
                lines(c(0, xi), c(0, yi), lty = 2)
                text(xi, ymax-0.8*strheight(" "), paste(bval[i]),
                     adj = 0.5, xpd = TRUE)
            }
        }

        ## axis(4, at=p*cook.levels, labels=paste(c(rev(cook.levels), cook.levels)),
        ##      mgp=c(.25,.25,0), las=2, tck=0, cex.axis=cex.id)
        mtext(getCaption(6), 3, 0.25, cex = cex.caption)
        if (id.n > 0) {
            show.r <- order(-cook)[iid]
            text.id(g[show.r], cook[show.r], show.r)
        }
    }

    if (!one.fig && par("oma")[3L] >= 1)
        mtext(sub.caption, outer = TRUE, cex = 1.25)
    invisible()
}

Problems

Hard to understand.
Hard to extend.
Locked into set of pre-specified graphics.
Of no use to other graphics packages.

Alternative approach

What does this code actually do?

It 1) extracts various quantities of interest from the model and then 2) plots them.
So why not perform those two tasks separately?

Quantities of interest
fortify.lm <- function(model, data = model$model, ...) {
  infl <- influence(model, do.coef = FALSE)
  data$.hat <- infl$hat
  data$.sigma <- infl$sigma
  data$.cooksd <- cooks.distance(model, infl)

  data$.fitted <- predict(model)
  data$.resid <- resid(model)
  data$.stdresid <- rstandard(model, infl)

  data
}

Note the use of the . prefix to avoid name clashes.
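The role of the . prefix can be checked with a small sketch. It assumes a made-up data frame `d` that already contains a column called `fitted`, and it uses a renamed copy of the function (`fortify_lm`, a hypothetical name chosen so that it does not mask any existing method). Neither the data nor the name comes from the slides.

```r
# Renamed copy of the fortify.lm function shown above (base R only).
fortify_lm <- function(model, data = model$model, ...) {
  infl <- influence(model, do.coef = FALSE)
  data$.hat <- infl$hat
  data$.sigma <- infl$sigma
  data$.cooksd <- cooks.distance(model, infl)
  data$.fitted <- predict(model)
  data$.resid <- resid(model)
  data$.stdresid <- rstandard(model, infl)
  data
}

# Made-up data: 'fitted' is an ordinary predictor that happens to share
# its name with a model quantity.
set.seed(1)
d <- data.frame(fitted = rep(c(0, 1), each = 10), x = seq_len(20))
d$y <- 2 + 0.5 * d$x + d$fitted + rnorm(20, sd = 0.3)

mod <- lm(y ~ x + fitted, data = d)
modf <- fortify_lm(mod)

# The original 'fitted' column survives next to the new '.fitted' column.
names(modf)
```

Without the prefix, `data$fitted <- predict(model)` would have silently overwritten the predictor.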
plot.lm(mod, which = 1)

[Figure: "Residuals vs Fitted" diagnostic plot. x-axis: Fitted values; y-axis: Residuals. Observations 624, 133 and 574 are labelled. Sub-caption: lm(log10(sales) ~ city * ns(date, 3) + factor(month))]

ggplot(mod, aes(.fitted, .resid)) +
  geom_hline(yintercept = 0) +
  geom_point() +
  geom_smooth(se = FALSE)

[Figure: ggplot2 version of the plot. x-axis: .fitted; y-axis: .resid]

Diagnostics should reflect data

[Figure: .resid (y-axis) against .fitted (x-axis)]

Use informative x variable

[Figure: .resid (y-axis) against date (x-axis, 2000 to 2008)]

Connect original units

[Figure: .resid (y-axis) against date (x-axis, 2000 to 2008)]

Colour by possible explanatory variable

[Figure: .resid (y-axis) against date (x-axis, 2000 to 2008)]

[Figure: .resid (y-axis) against date (x-axis, 2000 to 2008), one panel per city: Austin, Bryan-College Station, Dallas, Houston, San Antonio, San Marcos. Annotations on the figure: 48,000 / 86,000 and 29,000 / 50,000]

ggplot(modf, aes(date, .resid)) +
  geom_line(aes(group = city))

ggplot(modf, aes(date, .resid, colour = college_town)) +
  geom_line(aes(group = city))

ggplot(modf, aes(date, .resid)) +
  geom_line(aes(group = city)) +
  facet_wrap(~ city)
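The three plots above can be tried on made-up data. In the sketch below, the data frame `sales`, its three cities, and the `college_town` flag are synthetic stand-ins (the talk's `modf` is not reproduced here), and the model is deliberately simpler than the one in the slides. The plotting step runs only if ggplot2 is installed.

```r
# Synthetic monthly data for three hypothetical cities (not the talk's data).
set.seed(42)
sales <- expand.grid(month = 1:12, year = 2000:2008,
                     city = c("A", "B", "C"), stringsAsFactors = FALSE)
sales$date <- sales$year + (sales$month - 1) / 12
sales$college_town <- sales$city == "B"
# City B gets an extra summer bump that the model below does not include.
sales$log_sales <- 2 + 0.05 * (sales$date - 2000) +
  0.1 * sin(2 * pi * sales$month / 12) +
  ifelse(sales$college_town & sales$month %in% 6:8, 0.2, 0) +
  rnorm(nrow(sales), sd = 0.05)

mod <- lm(log_sales ~ city + date + factor(month), data = sales)

# Step 1: put the quantities of interest next to the original variables.
modf <- sales
modf$.fitted <- fitted(mod)
modf$.resid <- resid(mod)

# Step 2: plot them; guarded so the sketch also runs without ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(modf, aes(date, .resid, colour = college_town)) +
    geom_line(aes(group = city)) +
    facet_wrap(~ city)
  # print(p) draws the plot in an interactive session
}
```

Because the residuals sit in the same data frame as `city` and `date`, any variable can be mapped to an aesthetic or a facet, including ones that were never part of the model.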

fortify.lm <- function(model, data = model$model, ...) {
  infl <- influence(model, do.coef = FALSE)
  data$.hat <- infl$hat
  data$.sigma <- infl$sigma
  data$.cooksd <- cooks.distance(model, infl)

  data$.fitted <- predict(model)
  data$.resid <- resid(model)
  data$.stdresid <- rstandard(model, infl)

  data
}

# Which = 1
ggplot(mod, aes(.fitted, .resid)) +
  geom_hline(yintercept = 0) +
  geom_point() +
  geom_smooth(se = FALSE)

# Which = 2
ggplot(mod, aes(sample = .stdresid)) +
  stat_qq() +
  geom_abline()

# Which = 3
ggplot(mod, aes(.fitted, abs(.stdresid))) +
  geom_point() +
  geom_smooth(se = FALSE) +
  scale_y_sqrt()

# Which = 4
# Fortify explicitly so that a row-label column can be added to the data.
modf <- fortify(mod)
modf$row <- rownames(modf)
ggplot(modf, aes(row, .cooksd)) +
  geom_bar(stat = "identity")

# Which = 5
ggplot(mod, aes(.hat, .stdresid)) +
  geom_vline(size = 2, colour = "white", xintercept = 0) +
  geom_hline(size = 2, colour = "white", yintercept = 0) +
  geom_point() +
  geom_smooth(se = FALSE)

# Which = 6
ggplot(mod, aes(.hat, .cooksd)) +
  geom_vline(xintercept = 0, colour = NA) +
  geom_abline(slope = seq(0, 3, by = 0.5), colour = "white") +
  geom_smooth(se = FALSE) +
  geom_point()
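As a cross-check on the quantities used in these recipes, the standardised residuals that plot.lm computes inline (`r.w/(s * sqrt(1 - hii))` in its source) can be compared with the `rstandard()` value that fortify.lm stores in `.stdresid`. The sketch below does this for an unweighted model fitted to the built-in `cars` data; the choice of dataset and the variable names are assumptions made for illustration.

```r
mod <- lm(dist ~ speed, data = cars)

# Quantities as plot.lm derives them for an unweighted lm.
r <- residuals(mod)
s <- sqrt(deviance(mod) / df.residual(mod))
hii <- lm.influence(mod, do.coef = FALSE)$hat
rs_plot_lm <- r / (s * sqrt(1 - hii))

# The same quantity as fortify.lm stores it in .stdresid.
rs_fortify <- rstandard(mod, influence(mod, do.coef = FALSE))

# Compare the two vectors up to numerical tolerance.
all.equal(rs_plot_lm, rs_fortify)
```

If the two agree, the ggplot2 recipes for which = 2, 3 and 5 are plotting the same standardised residuals as plot.lm, only taken from a data frame instead of local variables.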

Other models

A work in progress: hard work because most of the functions are like plot.lm.
Models: lm, tsdiag, survreg.
Maps: maps, and sp classes. Much easier to work with data frames.

Conclusions

Separating data from visualisation improves clarity and reusability.
A pre-specified set of plots will not uncover many model problems. It should be easy to build custom diagnostics for your needs.

crantastic! http://crantastic.org
A community site for finding, rating, and reviewing R packages.
