How To Use "Qqplot": X: Independent Variable, Y: Dependent Variable

The document provides instructions on how to use qqplot in R to create quantile-quantile (Q-Q) plots for normal and other distributions like Poisson. It then discusses simple linear regression, including how to find the regression line and coefficients using the lm() function, make predictions, and identify outliers. Methods for resistant regression like least trimmed squares (lqs()) and rlm() are introduced. Finally, examples are given of adding trend lines using scatter.smooth(), smooth.spline(), and supsmu() for non-linear relationships.

Uploaded by Daniel Wu

How to use qqplot


(1) Normal quantile-quantile plot
# Generating data from the normal distribution
x <- rnorm(500)
hist(x)
qqnorm(x)
qqline(x)

(2) Q-Q plot for other distributions


# Generating data from Poisson distribution
x <- rpois(100, lambda=5)
hist(x)
par(mfrow=c(1,2), pty="s")
# Comparing against a Poisson
# First, generate theoretical quantiles. ppoints() gives evenly spaced
# probabilities strictly inside (0, 1); using seq(0, 1, ...) would include
# probability 1, whose Poisson quantile is Inf.
th.quantile = qpois( ppoints(length(x)), lambda=mean(x) )
qqplot( th.quantile, x, xlab="Theoretical Quantiles", ylab="x")
title(main="Poisson Q-Q Plot")
# Comparing against a Normal
qqnorm(x, ylab="x")
qqline(x)
par(mfrow=c(1,1))

3.4 Simple linear regression


variables x and y have a linear relationship
=> y = mx + b, where m is the slope, b the intercept.
x : independent variable, y : dependent variable

simple linear regression model :

y_i = β0 + β1 x_i + ε_i

ε_i : error term
β0 and β1 : regression coefficients
x : predictor variable
y : response variable
meaning of "linear" : applies to the way the regression coefficients are used.
meaning of "simple" : only one predictor variable is used.
The estimated regression line : ŷ = b0 + b1 x
the predicted value : ŷ_i = b0 + b1 x_i
residual : e_i = y_i − ŷ_i, the difference between the observed value and the predicted value,
i.e., the signed vertical distance of the point (x_i, y_i) to the prediction line.

Estimation method : The method of least squares

- chooses the coefficients so that the sum of the squared residuals is as small as possible :

b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
b0 = ȳ − b1 x̄
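The least-squares formulas can be checked numerically against lm(). A minimal sketch on simulated data (the variable names and simulated values are illustrative, not from the text):

```r
# Simulate data with a known linear relationship
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)

# Closed-form least-squares estimates
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() produces the same coefficients
fit <- lm(y ~ x)
c(b0, b1)
coef(fit)
```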

3.4.1 Using the regression model for prediction


- make predictions for the response value for new values of the predictor.

3.4.2 Finding the regression coefficients using lm()


lm() : function for linear model fitting
lm(model.formula)

ex) lm(y ~ x) : y is modeled by x

Example 3.5: The regression line for the Maple wood home data
homedata(UsingR) : a strong linear trend between the 1970 and the 2000 assessments.
attach(homedata)
lm(y2000 ~ y1970)
fit1 = lm(y2000 ~ y1970) # save the result
fit1

Adding the regression line to a scatterplot: abline()


plot(y1970, y2000, main="-113,000+5.43x")

abline(fit1)
Remark) abline() function can add other lines too :
abline(a,b) : the line y=a+bx
abline(h=c) : the horizontal line y=c
abline(v=c) : the vertical line x=c

Using the regression line for predictions


predict the y value for a given x value
(1)
-113000 + 5.43*50000
(2)
betas = coef(fit1)
sum(betas * c(1, 50000)) # beta0 * 1 + beta1 * 50000
Remark) Other useful extractor functions like coef()
residuals() : returns the residuals
predict() : perform predictions

Eg. Find the predicted and residual value at the data point (55100, 130200)
To specify the x value, a data frame is required with properly named variables.
predict(fit1, data.frame(y1970=55100))
130200 - predict(fit1, data.frame(y1970=55100)) # residual

More on model formulas


summary() is a generic function : its output differs depending on the input argument.
The plot() function is another example of a generic function in R :
plot( model formula ) : a scatterplot is created.
plot( the output of the density() function ) : a density plot is produced.
plot(y2000 ~ y1970)
fit1 = lm(y2000 ~ y1970)
abline(fit1)
3.4.3 Transformations of the data
Example 3.6: Kids weights: Is weight related to height squared?
kid.weights(UsingR) data set : the relationship between height and weight.
The body mass index (BMI) suggests a relationship between height squared and weight.

height.sq = kid.weights$height^2
plot(weight ~ height.sq, data=kid.weights)
fit2 = lm(weight ~ height.sq, data=kid.weights)
abline(fit2)
fit2

Using a model formula with transformations


(1) Wrong method
plot(weight ~ height^2, data=kid.weights) # not as expected
fit2 = lm(weight ~ height^2, data=kid.weights)
abline(fit2)

(2) Right method


plot(weight ~ I(height^2), data=kid.weights)
fit2 = lm(weight ~ I(height^2), data=kid.weights)
abline(fit2)
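The difference between the two methods can be verified directly: inside a model formula, ^ is a formula operator, so x^2 is interpreted as just x unless wrapped in I(). A small sketch on simulated data (the names and values are illustrative):

```r
set.seed(3)
x <- runif(40, 1, 5)
y <- x^2 + rnorm(40, sd = 0.2)

# Without I(), x^2 in a formula reduces to just x
fit.wrong <- lm(y ~ x^2)
fit.plain <- lm(y ~ x)
all.equal(unname(coef(fit.wrong)), unname(coef(fit.plain)))  # identical fits

# With I(), the square is actually computed
fit.right <- lm(y ~ I(x^2))
coef(fit.right)
```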

3.4.4 Interacting with a scatterplot


identify() : R function to identify points on a scatterplot.
usage : identify(x, y, labels=, n=)
The value n= : specifies the number of points to identify.
The argument labels= : allows for the placement of other text.
locator() : locates the (x, y) coordinates of points selected with the mouse.
called with the number of points desired, as with locator(2).

Example 3.7: Florida 2000


florida(UsingR) data set : county-by-county vote counts for the 2000 U.S. presidential election
in the state of Florida.
plot(BUCHANAN ~ BUSH, data=florida) # two outliers.
res = lm(BUCHANAN ~ BUSH, data=florida)
abline(res)
with(florida, identify(BUSH, BUCHANAN, n=2, labels=County))
florida$County[c(13,50)]
The predicted amount and residual for Palm Beach :
with(florida, predict(res, data.frame(BUSH = BUSH[50])))

residuals(res)[50]

Buchanan received an estimated 2,610 of Gore's votes, many more than the 567 that decided
the state and the presidency.

3.4.5 Outliers in the regression model


Two types of outliers:
outlier for the individual variables
outlier in the regression : points that are far from the trend or pattern of the data.
Example 3.8: Emissions versus GDP
emissions(UsingR) data set : data for several countries on CO2 emissions and per-capita
gross domestic product (GDP).
f = CO2 ~ perCapita # save formula
plot(f, data=emissions) # one isolated point that seems to pull the regression line upward
abline( lm(CO2 ~ perCapita, data=emissions) )
abline( lm(f, data=emissions, subset=-1), lty=2 )
Remark) U.S. point is an outlier for the CO2 variable, but not for the per-capita GDP.
an outlier in regression, as it stands far off from the trend set by the rest of the data.
an influential observation, as its presence dramatically affects the regression line.

3.4.6

Resistant regression lines: lqs() and rlm()

The regression coefficients are subject to strong influences from outliers.

use resistant regression methods.

(1) Least-trimmed squares


The method of least-trimmed squares
- use the sum of the q smallest squared residuals, where q is roughly n/2.
- lqs() function from the MASS package.
library(MASS)
abline( lqs(f, data=emissions), lty=3 )
(2) Resistant regression using rlm()
rlm() function, from the MASS package.
abline( rlm(f, data=emissions, method="MM"), lty=4 )
(3) Adding legends to plots
legend()
The placement : in (x, y) coordinates or done with the mouse using locator(n=1).

The labels : legend= argument.


The markings : different line types (lty=); with different colors (col=);
or with different plot characters (pch=).
the.labels = c("lm", "lm w/o 1", "least trimmed squares", "rlm with MM")
the.ltys = 1:4
legend(5000, 6000, legend=the.labels, lty=the.ltys)

3.4.7 Trend lines


When no transformation produces a linear relationship, a trend line can be superimposed
on the data using one of several smoothing techniques :
scatter.smooth() : uses the loess() function to plot both the scatterplot and a trend line.
smooth.spline() : fits the data using cubic splines.
supsmu() : performs Friedman's super smoother algorithm.
Example 3.9: Five years of temperature data
five.yr.temperature(UsingR) : five years of New York City temperature data.
- scatterplot shows a periodic, sinusoidal pattern.
attach(five.yr.temperature)
scatter.smooth(temps ~ days, col=gray(0.75))
lines(smooth.spline(temps ~ days), lty=2, lwd=2)
lines(supsmu(days, temps), lty=3, lwd=2)
legend(locator(1), lty=c(1,2,3), lwd=c(1,2,2),
       legend=c("scatter.smooth", "smooth.spline", "supsmu"))
detach(five.yr.temperature)
