Chapter4 Notes

Chapter 4 discusses linear regression, a statistical model used to predict the value of an outcome variable based on one or more predictor variables, establishing a linear relationship between them. It covers the mathematical equations for linear and multiple regression, the use of the lm() function in R to create relationship models, and the predict() function for making predictions. Additionally, the chapter addresses advanced graphics techniques in R for visualizing data and managing multiple plots.

CHAPTER - 4

Linear Regression:
Regression fits a line or curve through the data points on the target-predictor graph in such a
way that the vertical distances between the data points and the regression line are as small as
possible overall.
Linear regression is a statistical model used to predict the value of an outcome variable y on
the basis of one or more input predictor variables x.
In other words, linear regression is used to establish a linear relationship between the predictor
and response variables.
Regression analysis is a very widely used statistical tool to establish a relationship model
between two variables. One of these variables is called the predictor variable, whose value is
gathered through experiments. The other is called the response variable, whose value is derived
from the predictor variable.
Linear regression is one of the most basic statistical models.
In linear regression, predictor and response variables are related through an equation in which
the exponent (power) of both these variables is 1. Mathematically, a linear relationship denotes a
straight line, when plotted as a graph.
The general mathematical equation for linear regression is:
y = ax + b
y is the response variable (dependent variable).
x is the predictor variable (independent variable).
a and b are constants called the coefficients (a is the slope and b is the intercept).
Example:
Predicting the weight of a person when their height is known is a simple example of
regression. To predict the weight, we need a relationship between the height and
weight of a person: Weight = a*Height + b
When you estimate the age of a child based on their height, you are assuming that the older
they are, the taller they will be.


Linear Regression Line
A straight line showing the relationship between the dependent and independent variables is
called a regression line. A regression line can show two types of relationship:

Prepared by: Chandrashekhar K, Asst. Prof, Dept. of BCA Page 1

Positive Linear Relationship: If the dependent variable (Y-axis) increases as the
independent variable (X-axis) increases, the relationship is termed a positive linear
relationship.
Negative Linear Relationship: If the dependent variable (Y-axis) decreases as the
independent variable (X-axis) increases, the relationship is called a negative linear
relationship.
Hence, we try to find a linear function that predicts the response value(y) as accurately as
possible as a function of the feature or independent variable(x).
Y = β₀ + β₁X + ε
The dependent variable, also known as the response or outcome variable, is represented
by the letter Y.
The independent variable, often known as the predictor or explanatory variable, is
denoted by the letter X.
The intercept, or value of Y when X is zero, is represented by the β₀.
The slope or change in Y resulting from a one-unit change in X is represented by the β₁.
The error term or the unexplained variation in Y is represented by the ε.
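As a rough illustration of how β₀ and β₁ are estimated, the sketch below computes the least-squares slope and intercept by hand and compares them with lm(). It uses the height and weight sample from these notes; the names height, weight, b0, b1 and fit are chosen here for illustration only.

```r
# Height (predictor) and weight (response) sample used in these notes.
height <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
weight <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Least-squares estimates:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
b1 <- cov(height, weight) / var(height)
b0 <- mean(weight) - b1 * mean(height)

# lm() should give the same two numbers.
fit <- lm(weight ~ height)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

The hand-computed values should agree with the coefficients that lm() reports for the same data.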
Steps to Establish a Regression:
A simple example of regression is predicting the weight of a person when their height is known.
To do this, we need the relationship between the height and weight of a person.
1. Carry out the experiment of gathering a sample of observed values of height and
corresponding weight.
2. Create a relationship model using the lm() function in R.
3. Find the coefficients from the model created and create the mathematical equation using
these coefficients.
4. Get a summary of the relationship model to see the errors in prediction, also
called residuals.
5. To predict the weight of new persons, use the predict() function in R.
Step 1.
Input Data
Below is the sample data representing the observations −
# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48
Step 2.
Create the relationship model between the predictor and the response variable.
lm() Function:
This function creates the relationship model between the predictor and the response variable.
Syntax:
The basic syntax for the lm() function in linear regression is −
lm(formula, data)
formula is a symbolic description of the relation between x and y.
data is the data frame containing the variables named in the formula (it may be omitted when the
variables exist as vectors in the workspace).
Note: Tilde (~) is used to separate the left- and right-hand sides in a model formula.

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131) # height
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48) # weight
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
# Get the summary of the relationship model.
# print(summary(relation))

Output

Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746

The predict() Function:


Now, we will predict the weight of new persons with the help of the predict() function. There is
the following syntax of predict function:
predict(object, newdata)
object: It is the model that we have already created using the lm() function.
newdata: It is the data frame that contains the new values for the predictor variable.
Predict the weight of new persons:

# The predictor vector (height).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector (weight).
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)

# Optional: plot the data with a fitted line.
# plot(y, x, col = "blue", main = "Height & Weight Regression",
#      cex = 1.3, pch = 16, xlab = "Weight in Kg", ylab = "Height in cm")
# abline(lm(x ~ y))

Output

1
76.22869
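To connect predict() back to the regression equation, the following sketch predicts the weight for several heights at once and checks the result against intercept + slope * height computed by hand. The object names new_heights, pred and manual are illustrative and not part of the original notes.

```r
# Same sample data as above.
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)  # height
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)            # weight
relation <- lm(y ~ x)

# newdata must be a data frame whose column name matches the predictor (x).
new_heights <- data.frame(x = c(160, 170, 180))
pred <- predict(relation, new_heights)

# The same predictions from the fitted equation: intercept + slope * height.
manual <- coef(relation)[1] + coef(relation)[2] * new_heights$x

print(pred)
print(manual)
```

The second prediction (height 170) corresponds to the single value shown in the output above.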

The summary of the model, obtained with print(summary(relation)), contains the following parts:
Call: Shows the function call used to compute the regression model.
Residuals: Provide a quick view of the distribution of the residuals, which by definition
have a mean of zero. Therefore, the median should not be far from zero, and the minimum
and maximum should be roughly equal in absolute value.
Coefficients: Shows the regression beta coefficients and their statistical significance.
Predictor variables that are significantly associated with the outcome variable are marked
by stars.
Residual standard error (RSE), R-squared (R2) and the F-statistic are metrics that are
used to check how well the model fits the data.
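The parts of the summary listed above can also be pulled out individually. The sketch below is one way to do that for the height and weight model; the variable names s and res are illustrative.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)  # height
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)            # weight
relation <- lm(y ~ x)

s   <- summary(relation)
res <- residuals(relation)

print(s$coefficients)  # estimates, standard errors, t values, p-values
print(s$sigma)         # residual standard error (RSE)
print(s$r.squared)     # R-squared
print(s$fstatistic)    # F-statistic with its degrees of freedom
print(mean(res))       # numerically very close to zero
```

The RSE is the square root of the residual sum of squares divided by the residual degrees of freedom, which can be checked with sqrt(sum(res^2) / df.residual(relation)).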

Multiple Linear Regressions:


Multiple regression is an extension of linear regression to relationships among more than two
variables. In simple linear regression we have one predictor and one response variable, but in
multiple regression we have more than one predictor variable and one response variable. That is,
multiple linear regression describes how a single response variable Y depends
linearly on a number of predictor variables.
The general mathematical equation for multiple regression is −
y = a + b1x1 + b2x2 + ... + bnxn
y is the response variable.
a, b1, b2, ..., bn are the coefficients.
x1, x2, ..., xn are the predictor variables.
Example:
1. The selling price of a house can depend on the desirability of the location, the number of
bedrooms, the number of bathrooms, the year the house was built, the square footage of
the lot, and a number of other factors.
2. The height of a child can depend on the height of the mother, the height of the father,
nutrition, and environmental factors
We create the regression model using the lm() function in R. The model determines the value of
the coefficients using the input data. Next we can predict the value of the response variable for a
given set of predictor variables using these coefficients.
lm() Function
This function creates the relationship model between the predictor and the response variable.
Syntax
The basic syntax for lm() function in multiple regression is −
lm(y ~ x1+x2+x3..., data)
formula is a symbolic description of the relation between the response variable and the predictor
variables.
data is the data frame containing the variables named in the formula.
Example:
Input Data:
Consider the data set "mtcars" available in the R environment. It gives a comparison between
different car models in terms of miles per gallon ("mpg"), displacement ("disp"),
horsepower ("hp"), weight of the car ("wt") and some more parameters.
The goal of the model is to establish the relationship between "mpg" as a response variable with
"disp","hp" and "wt" as predictor variables. We create a subset of these variables from the mtcars
data set for this purpose.

input <- mtcars[,c("mpg","disp","hp","wt")]


# Create the relationship model.
model <- lm(mpg~disp+hp+wt, data = input)
# Show the model.
print(model)
# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # # ","\n")
a <- coef(model)[1]
print(a)
Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)
#z<-data.frame(disp = 221, hp = 102 ,wt = 2.91 )
#print(z)
#result<-predict(model,z)
#print(result)

Output

Call:
lm(formula = mpg ~ disp + hp + wt, data = input)

Coefficients:
(Intercept) disp hp wt
37.105505 -0.000937 -0.031157 -3.800891

# # # # The Coefficient Values # # #


(Intercept)
37.10551
disp
-0.0009370091
hp
-0.03115655
wt
-3.800891

Create Equation for Regression Model:


Based on the above intercept and coefficient values, we create the mathematical equation.
Y = a + Xdisp*x1 + Xhp*x2 + Xwt*x3
or
Y = 37.105+(-0.000937)*x1+(-0.0311)*x2+(-3.8008)*x3
Apply Equation for predicting New Values
We can use the regression equation created above to predict the mileage when a new set of
values for displacement, horsepower and weight is provided.
For a car with disp = 221, hp = 102 and wt = 2.91 the predicted mileage is −
Y = 37.105+(-0.000937)*221+(-0.0311)*102+(-3.8008)*2.91 = 22.6654
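The same prediction can be obtained with predict(), as in the simple regression case. The sketch below does this and also evaluates the equation with the unrounded coefficients; the names new_car, pred and manual are illustrative. Because predict() works with full-precision coefficients, its result can differ slightly from a hand calculation with rounded values.

```r
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = input)

# New observation: column names must match the predictor names in the formula.
new_car <- data.frame(disp = 221, hp = 102, wt = 2.91)
pred <- predict(model, new_car)

# Equation evaluated by hand: a + b1*disp + b2*hp + b3*wt.
manual <- sum(coef(model) * c(1, 221, 102, 2.91))

print(pred)
print(manual)
```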

Advanced graphics:
Handling the Graphics Device:
So far, your plotting has dealt with one image at a time. It’s possible to have multiple graphics
devices open, but only one will be deemed active at any given time.
Manually Opening a New Device:
The typical base R commands you’ve met already (such as plot, hist, boxplot, and so on) will
automatically open a device for plotting and draw the desired plot, if nothing is currently open.
You can also open new device windows using dev.new( ); this newest window will immediately
become active, and any subsequent plotting commands will affect that particular device.
As an example, first close any open graphics windows and then enter the following at the R
prompt:
R> plot(quakes$long, quakes$lat)
Now, let’s say you’d also like to see a histogram of the number of stations that detected each
event. Execute the following to open a new plotting window:
R> dev.new( )
At this point, you can enter the usual command to bring up the desired histogram in Device 3:
R> hist(quakes$stations)
If you hadn’t used dev.new, the histogram would’ve just overwritten the plot of the spatial
locations in Device 2.
Switching Between Devices:
To change something in Device 2 without closing Device 3, use dev.set followed by the device
number you want to make active.
R> dev.set(2)
quartz
2
R> plot(quakes$long,quakes$lat,cex=0.02*quakes$stations, xlab="Longitude",ylab="Latitude")
R> dev.set(3)
quartz
3
R> abline(v=mean(quakes$stations),lty=2)
Closing a Device:
To close a graphics device, use the dev.off( ) function
R> dev.off(2)
quartz
3
Then repeat the call without an argument to close the remaining device:
R> dev.off()
null device
1
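The open, switch and close sequence can also be traced without any on-screen windows. The sketch below is a minimal version under one assumption that differs from the notes: it opens off-screen pdf(NULL) devices instead of calling dev.new(), so that it runs in a non-interactive session. dev.list() and dev.cur() report which devices exist and which one is active.

```r
graphics.off()                  # start clean: only the null device (1) remains

pdf(NULL)                       # opens device 2 (off-screen, writes no file)
pdf(NULL)                       # opens device 3, which becomes the active one
print(dev.list())
active_after_open <- dev.cur()  # device 3

dev.set(2)                      # make device 2 active again
plot(quakes$long, quakes$lat)   # this plot is drawn on device 2
active_after_set <- dev.cur()   # device 2

dev.off(2)                      # close device 2; device 3 becomes active
active_after_close <- dev.cur()

dev.off()                       # close the remaining device
remaining <- dev.list()         # NULL when no devices are open
print(remaining)
```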
Multiple Plots in One Device:
You can also control the number of individual plots in any one device. par( ) function is used to
control various graphical parameters of traditional R plots.
Setting the mfrow Parameter:
The mfrow argument instructs a new (or the currently active) device to “invisibly” divide itself
into a grid of the specified dimensions, with each cell holding one plot. You pass the mfrow
option a numeric integer vector of length 2 in the order of c(rows,columns); as you might guess,
its default is c(1,1).
Now, say you want the two plots of the quakes data side by side in the same device.
You would set mfrow as a 1 × 2 grid with the vector c(1,2)—one row of plots and two columns
R> dev.new(width=8,height=4)
R> par(mfrow=c(1,2))
R> plot(quakes$long,quakes$lat, cex=0.02*quakes$stations, xlab="Longitude",ylab="Latitude")
R> hist(quakes$stations)
R> abline(v=mean(quakes$stations), lty=2)
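Because mfrow stays in effect for the device until it is changed, it is often convenient to save the previous setting and restore it afterwards. The sketch below shows one way to do this; par() returns the old values of the parameters it changes. It uses an off-screen pdf(NULL) device (an assumption made here so the example runs without a display), and old_par, grid_during and grid_after are illustrative names.

```r
pdf(NULL)                            # off-screen device for the example

old_par <- par(mfrow = c(1, 2))      # returns the previous mfrow value
plot(quakes$long, quakes$lat, cex = 0.02 * quakes$stations,
     xlab = "Longitude", ylab = "Latitude")
hist(quakes$stations)
abline(v = mean(quakes$stations), lty = 2)
grid_during <- par("mfrow")          # 1 row, 2 columns

par(old_par)                         # restore the single-plot layout
grid_after <- par("mfrow")

invisible(dev.off())
print(grid_during)
print(grid_after)
```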
Defining a Particular Layout
You can refine the arrangements of plots in a single device using the layout( ) function, which
offers more ways to individualize the panels into which the plots will be drawn.
When you use layout, you provide the dimensions in a matrix mat as the first argument; these
govern an invisible rectangular grid, just like controlling the mfrow option. The difference now
is that you can use numeric integer entries in mat to tell layout which plot number will go where.
Examine the following object:
R> lay.mat <- matrix(c(1,3,2,3),2,2)
R> lay.mat
     [,1] [,2]
[1,]    1    2
[2,]    3    3
The dimensions of this matrix create a 2 × 2 grid of plotting cells, but the values inside lay.mat
tell R that you want plot 1 to take the upper-left cell, plot 2 to take the upper-right cell, and plot 3
to stretch itself over the two bottom cells.
Calling layout as follows will either initialize the active device based on lay.mat or open a new
one (if the null device is the only device currently available) and initialize it.
R> layout(mat=lay.mat)
If you’re ever unsure of the result of your specification, you can use the layout.show( ) function
to see how plots will be placed.
R> layout.show(n=max(lay.mat))
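The three plots are then drawn in order. They use the survey data frame from the MASS package,
which must be loaded first with library("MASS").
R> plot(survey$Wr.Hnd, survey$Height, xlab="Writing handspan", ylab="Height")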
R> boxplot(survey$Height~survey$Smoke, xlab="Smoking frequency", ylab="Height")


R> barplot(table(survey$Exer), horiz=TRUE, main="Exercise")
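Since matrix() fills column by column, the layout matrix can be easier to read when it is written row by row with byrow=TRUE. The sketch below builds the same matrix that way and applies it. Two assumptions differ from the notes: the panels are filled with plots of the built-in mtcars data rather than survey, and an off-screen pdf(NULL) device is used. The names lay.byrow and n_panels are illustrative.

```r
lay.mat   <- matrix(c(1, 3, 2, 3), 2, 2)        # column-wise, as in the notes
lay.byrow <- matrix(c(1, 2,
                      3, 3), nrow = 2, byrow = TRUE)
print(identical(lay.mat, lay.byrow))            # both describe the same grid

pdf(NULL)                                       # off-screen device
n_panels <- layout(mat = lay.byrow)             # returns the number of panels

plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "MPG")   # panel 1
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders")        # panel 2
hist(mtcars$hp, main = "Horsepower", xlab = "hp")            # panel 3 (bottom row)

invisible(dev.off())
print(n_panels)
```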
Plotting Regions and Margins:
For any single plot created using base R graphics, there are three regions that make up the image.
The plot region: This is where your actual plot appears and where you’ll usually be
drawing your points, lines, text, and so on. The plot region uses the user coordinate
system, which reflects the value and scale of the horizontal and vertical axes.
The figure region is the area that contains the space for your axes, their labels, and any
titles. These spaces are also referred to as the figure margins.
The outer region, also referred to as the outer margins, is additional space around the
figure region that is not included by default but can be specified if it’s needed.
You can explicitly measure and set margin space in different ways. You specify these as vectors
of length 4 in a particular order; each of the four elements corresponds to one of the four sides:
c(bottom, left, top, right). The graphical parameters oma (outer margin) and mar (figure margin)
are used to control these amounts; like mfrow, they are initialized through a call to par before
you begin to draw any new plot.
Default Spacing:
You can find your default figure margin settings with a call to par( ) in R
R> par()$oma
[1] 0 0 0 0
R> par()$mar
[1] 5.1 4.1 4.1 2.1
You can see here that oma=c(0, 0, 0, 0)—there is no outer margin set by default. The default
figure margin space is mar=c(5.1, 4.1, 4.1, 2.1)—in other words, 5.1 lines of text on the bottom,
4.1 on the left and top, and 2.1 on the right.
R> plot(1:10)
R> box(which="figure",lty=2)
Custom Spacing:
R> par(oma=c(1,4,3,2),mar=4:7)
R> plot(1:10)
R> box("figure",lty=2)
R> box("outer",lty=3)
R> mtext("Figure region margins\nmar[ . ]",line=2)
R> mtext("Outer region margins\noma[ . ]",line=0.5,outer=TRUE)
mtext( ):
Here, you provide the text you want written in a character string as the first argument, and the
argument line instructs how many lines of space away from the inside border the text should
appear.
Clipping:
Controlling clipping allows you to draw in or add elements to the margin regions with reference
to the user coordinates of the plot itself. For example, you might want to place a legend outside
the plotting area, or you might want to draw an arrow that extends beyond the plot region to
enhance a particular observation.
The graphical parameter xpd controls clipping in base R graphics. By default, xpd is set to
FALSE, so all drawing is clipped to the available plot region only (with the exception of special
margin-addition functions such as mtext).
Setting xpd to TRUE allows you to draw things outside the formally defined plot region into the
figure margins but not into any outer margins.
Setting xpd to NA will permit drawing in all three areas—plot region, figure margins, and the
outer margins.
For example, take a look at the images in Figure 23-5, showing side-by-side boxplots of mileage
split by number of cylinders, created with the following code:
R> dev.new()
R> par(oma=c(1,1,5,1),mar=c(2,4,5,4))
R> boxplot(mtcars$mpg~mtcars$cyl,xaxt="n",ylab="MPG")
R> box("figure",lty=2)
R> box("outer",lty=3)
R> arrows(x0=c(2,2.5,3),y0=c(44,37,27),x1=c(1.25,2.25,3),y1=c(31,22,20), xpd=FALSE)
R> text(x=c(2,2.5,3),y=c(45,38,28),c("V4 cars","V6 cars","V8 cars"), xpd=FALSE)
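A common use of relaxed clipping is a legend placed in the figure margin. The sketch below is one hedged way to do this: it widens the right margin with mar, reads the plot region's user coordinates from par("usr"), and passes xpd=TRUE to legend() so the legend is not clipped. The off-screen pdf(NULL) device and the names default_xpd, old_par and usr are assumptions made for this example.

```r
pdf(NULL)                              # off-screen device
default_xpd <- par("xpd")              # FALSE: drawing is clipped to the plot region

old_par <- par(mar = c(5, 4, 4, 8))    # extra space in the right figure margin
boxplot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "MPG")

# par("usr") holds the plot region limits as c(x1, x2, y1, y2).
usr <- par("usr")

# Anchor the legend at the top-right corner of the plot region; with xpd = TRUE
# it is allowed to extend into the figure margin.
legend(x = usr[2], y = usr[4], legend = c("4 cyl", "6 cyl", "8 cyl"),
       bty = "n", xpd = TRUE)

par(old_par)
invisible(dev.off())
print(default_xpd)
```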

Point-and-Click Coordinate Interaction:


Retrieving Coordinates Silently:

The locator( ) command allows you to find and return user coordinates.
To see how it works, first execute a call to plot(1,1) to bring up a simple plot with a single point
in the middle. To use locator, you simply execute the function (with no arguments for default
behavior), which will “hang” the console, without returning you to the prompt. Then, on an
active graphics device, your mouse cursor will change to a + symbol (you may need to first click
your device once to bring it to the foreground of your computer desktop). With your cursor as the
+, you can perform a series of (left) mouse clicks inside the device, and R will silently record the
precise user coordinates. To stop this, simply right-click to terminate the command and once you
do, the coordinates you identified in the device are returned as a list with components $x and $y.
R> plot(1,1)
R> locator()
$x
[1] 0.8275456 1.1737525 1.1440526 0.8201909
$y
[1] 1.1581795 1.1534442 0.9003221 0.8630254
Visualizing Selected Coordinates:
You can also use locator( ) to plot the points you select as either individual points or as lines.
R> plot(1,1)
R> Rtist <- locator(type="o",pch=4,lty=2,lwd=3,col="red",xpd=TRUE)
R> Rtist
$x
[1] 0.5013189 0.6267149 0.7384407 0.7172250 1.0386740 1.2765699
[7] 1.4711542 1.2352573 1.2220592 0.8583484 1.0483300 1.0091491
$y
[1] 0.6966016 0.9941945 0.9636752 1.2819852 1.2766579 1.4891270
[7] 1.2439071 0.9630832 0.7625887 0.7541716 0.6394519 0.9618461
Ad Hoc Annotation:
The locator function also allows you to place ad hoc annotations, such as legends, on your plot.
For example, you can use the student survey data in the MASS package, first loading the package
by calling library("MASS").
R> library("MASS")
R> plot(survey$Height~survey$Wr.Hnd,pch=16,
col=c("gray","black")[as.numeric(survey$Sex)],
xlab="Writing handspan",ylab="Height")
R> legend(locator(n=1),legend=levels(survey$Sex), pch=16, col=c("gray","black"))
If you specify n=1, locator will automatically terminate after you left-click once in the device.

Customizing Traditional R Plots:


Now that you’re familiar with the way R places and handles plots in the graphics device, it’s
time to focus on common features of plots.
Graphical Parameters for Style and Suppression:
For an example image, let’s plot MPG against horsepower (from the ready-to-use mtcars data
set) and set each plotted point to be sized proportionally to the weight of each car. For
convenience, create the following objects:
R> hp <- mtcars$hp
R> mpg <- mtcars$mpg
R> wtcex <- mtcars$wt/mean(mtcars$wt)
The last object is the car weight vector scaled by its sample mean. This creates a vector where
cars less than the average weight have a value < 1 and cars more than the average weight have a
value > 1, making it ideal for the cex parameter to scale the size of the plotted points
accordingly.
Let’s start by focusing on some more graphical parameters usually used in the first instance of a
call to plot().
R> plot(hp,mpg,cex=wtcex)
There are two axis “styles,” controlled by the graphical parameters xaxs and yaxs. Their sole
purpose is to decide whether to impose the small amount of additional horizontal and vertical
buffer space that’s present at the ends of each axis to prevent points being chopped off at the end
of the plotting region. The default, xaxs="r" and yaxs="r", is to include that space.
The alternative, setting one or both of these to "i", instructs the plot region to be strictly defined
by the upper and lower limits of the data (or by those optionally supplied to xlim and/or ylim),
that is, with no additional padding space.
R> plot(hp,mpg,cex=wtcex, xaxs="i", yaxs="i")
This plot is almost the same as the default, but note now that there’s no padding space at the end
of the axes.
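The effect of the two axis styles can also be seen numerically by inspecting par("usr"), which holds the limits of the plot region in user coordinates. The sketch below compares the two settings; it uses an off-screen pdf(NULL) device, and usr_regular and usr_internal are illustrative names.

```r
hp  <- mtcars$hp
mpg <- mtcars$mpg

pdf(NULL)                               # off-screen device

plot(hp, mpg)                           # default styles: xaxs = "r", yaxs = "r"
usr_regular <- par("usr")               # limits extend slightly beyond the data

plot(hp, mpg, xaxs = "i", yaxs = "i")   # no padding
usr_internal <- par("usr")              # limits equal the data range

invisible(dev.off())

print(range(hp))
print(usr_regular[1:2])
print(usr_internal[1:2])
```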
To suppress the box, axes, and annotation altogether, set axes=FALSE and ann=FALSE:
R> plot(hp,mpg,cex=wtcex, axes=FALSE, ann=FALSE)
Customizing Boxes:
To add a box specific to the current plot region in the active graphics device, you use box and
specify its type with bty.
The bty argument is supplied a single character: "o" (default), "l", "7", "c", "u", "]", or "n".
You can use other relevant parameters that you’ve met already, such as lty, lwd, and col, to
further control the appearance of a box.
R> box(bty="l",lty=3,lwd=2)
R> box(bty="]",lty=2,col="gray")
Customizing Axes
Once you have the box the way you want it, you can focus on the axes. The axis ( ) function
allows you to control the addition and appearance of an axis on any of the four sides of the plot
region in greater detail.
The first argument it takes is side, provided with a single integer: 1 (bottom), 2 (left), 3 (top), or
4 (right). These numbers are consistent with the positions of the relevant margin-spacing values
when you’re setting graphical parameter vectors like mar.
R> hpseq <- seq(min(hp),max(hp),length=10)
R> plot(hp,mpg,cex=wtcex,xaxt="n",bty="n",ann=FALSE)
R> axis(side=1,at=hpseq)
R> axis(side=3,at=round(hpseq))
Specialized Text and Label Notation:
Font:
The displayed font is controlled by two graphical parameters: family for the specific font family
and font, an integer selector for controlling bold and italic typeface.
There are three generic families—"sans" (the default), "serif", and "mono"—that are always
available.
These are paired with the four possible values of font—1 (normal text, default), 2 (bold), 3
(italic), and 4 (bold and italic).
R> text(0,6,label="sans text (default)\nfamily=\"sans\", font=1")
R> text(0,5,label="serif text\nfamily=\"serif\", font=1",
family="serif", font=1)
R> text(0,4,label="mono text\nfamily=\"mono\", font=1",
family="mono", font=1)
R> text(0,3,label="mono text (bold, italic)\nfamily=\"mono\", font=4",
family="mono", font=4)
R> text(0,2, label="sans text (italic)\nfamily=\"sans\", font=3", family="sans", font=3)
R> text(0,1,label="serif text (bold)\nfamily=\"serif\", font=2", family="serif",font=2)
R> mtext("some", line=1,at=-0.5,cex=2,family="sans")
R> mtext("different", line=1,at=0,cex=2,family="serif")
R> mtext("fonts", line=1,at=0.5,cex=2,family="mono")
Here, text is used to place the content at predetermined coordinates, and mtext is used to add to
the top figure margin.
Greek Symbols:
For statistically or mathematically technical plots, annotation may occasionally require Greek
symbols or mathematical markup. You can display these using the expression( ) function.
R> par(mar=c(3,3,3,3))
R> plot(1,1,type="n",xlim=c(-1,1),ylim=c(0.5,4.5),xaxt="n",yaxt="n",
ann=FALSE)
R> text(0,4,label=expression(alpha),cex=1.5)
R> text(0,3,label=expression(paste("sigma: ",sigma," Sigma: ",Sigma)),family="mono",cex=1.5)
R> text(0,2,label=expression(paste(beta," ",gamma," ",Phi)),cex=1.5)
R> text(0,1,label=expression(paste(Gamma,"(",tau,") = 24 when ",tau," = 5")),
family="serif",cex=1.5)
R> title(main=expression(paste("Gr",epsilon,epsilon,"k")),cex.main=2)

Mathematical Expressions:
R> expr1 <- expression(c^2==a[1]^2+b[1]^2)
R> expr2 <- expression(paste(pi^{x[i]},(1-pi)^(n-x[i])))
R> expr3 <- expression(paste("Sample mean: ", italic(n)^{-1}, sum(italic(x)[italic(i)],
italic(i)==1, italic(n))==frac(italic(x)[1]+...+italic(x)[italic(n)], italic(n))))
R> expr4 <- expression(paste("f(x","|",alpha,",",beta, ")"==frac(x^{alpha-1}~(1-x)^{beta-1},
B(alpha,beta))))
These expressions are then used in the following code:
R> par(mar=c(3,3,3,3))
R> plot(1,1,type="n",xlim=c(-1,1),ylim=c(0.5,4.5),xaxt="n",yaxt="n", ann=FALSE)
R> text(0,4:1,labels=c(expr1,expr2,expr3,expr4),cex=1.5)
R> title(main="Math",cex.main=2)

Representing and Using Color:


Color plays a key role in many plots.
Red-Green-Blue Hexadecimal Color Codes:
When specifying colors in plots, your instruction to R so far has been given either in the form of
an integer value from 1 to 8 or as a character string.
One of the most common methods of color specification is to specify different saturations or
intensities of three primaries—red, green, and blue (RGB)—which are then mixed to form the
resulting target color. Each primary component of the standard RGB system is assigned an
integer from 0 to 255 (inclusive). Such mixtures are therefore able to form a total of 256^3 =
16,777,216 possible colors.
You always express these values in (R, G, B) order; the result is commonly referred to as a
triplet. For example, (0, 0, 0) represents pure black, (255, 255, 255) represents pure white, and
(0, 255, 0) is full green.
The col argument lets you select one of eight colors when you supply it an integer from 1 to 8.
You can find these eight colors with the following call (the output shown is from R versions
before 4.0.0; later versions use a different default palette):
R> palette()
[1] "black" "red" "green3" "blue" "cyan" "magenta"
[7] "yellow" "gray"
These are but a small subset of the 650+ named colors that you can list by entering colors() at the
R prompt. All of these named colors can also be expressed in the standard RGB format. To find
the RGB values for a color, supply the desired color names as a vector of character strings to the
built-in col2rgb( ) function. Here’s an example:
R> col2rgb(c("black","green3","pink"))
      [,1] [,2] [,3]
red      0    0  255
green    0  205  192
blue     0    0  203
These RGB triplets are frequently expressed as hexadecimals, a numeric coding system often
used in computing. In R, a hexadecimal, or hex code, is a character string with a # followed by
six alphanumeric characters: valid characters are the letters A through F and the digits 0 through
9. The first pair of characters represents the red component, and the second and third pairs
represent green and blue, respectively.
R> rgb(t(col2rgb(c("black","green3","pink"))), maxColorValue=255)
[1] "#000000" "#00CD00" "#FFC0CB"
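The conversions between RGB triplets and hex codes can be checked in both directions. The sketch below builds a hex code from a triplet with rgb(), recovers a triplet from a hex code with col2rgb(), and decodes the three hex pairs by hand with strtoi(). The variable names are illustrative.

```r
# Number of colors available with three components of 256 levels each.
n_colors <- 256^3
print(n_colors)

# Triplet to hex code: (255, 192, 203) is "pink".
pink_hex <- rgb(255, 192, 203, maxColorValue = 255)
print(pink_hex)

# Hex code back to a triplet: "#00CD00" is "green3".
green3_rgb <- col2rgb("#00CD00")
print(green3_rgb)

# Each pair of hex digits is one component written in base 16.
pink_pairs <- strtoi(c("FF", "C0", "CB"), base = 16L)
print(pink_pairs)
```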

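The following function, pcol, draws each supplied color at a point you click on and labels it with
its name, RGB values, and hex code:
pcol <- function(cols){
  n <- length(cols)
  dev.new(width=7,height=7)
  par(mar=rep(1,4))
  # Empty plot that only sets up the coordinate system.
  plot(1:5,1:5,type="n",xaxt="n",yaxt="n",ann=FALSE)
  for(i in 1:n){
    pt <- locator(1)   # wait for one mouse click
    rgbval <- col2rgb(cols[i])
    points(pt,cex=4,pch=19,col=cols[i])
    text(pt$x+1,pt$y,family="mono",
         label=paste("\"",cols[i],"\"","\nR: ",rgbval[1],
                     " G: ",rgbval[2]," B: ",rgbval[3],
                     "\nhex: ",rgb(t(rgbval),maxColorValue=255),
                     sep=""))
  }
}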
mycols <- c("black","blue","royalblue2","pink","magenta","purple",
"violet","coral","lightgray","seagreen4","red","red2",
"yellow","lemonchiffon3")
pcol(mycols)

3D Scatterplots
3D scatterplots allow you to plot raw observations based on three continuous
variables at once, as opposed to only two in a conventional 2D scatterplot.
Basic Syntax:
The syntax of the scatterplot3d function is similar to the default plot function. In the latter, you
supply a vector of x- and y-axis coordinates; in the former, you merely supply an additional third
vector of values providing the z-axis coordinates. With that additional dimension, you can think
of these three axes in terms of the x-axis increasing from left to right, the y-axis increasing from
foreground to background, and the z-axis increasing from bottom to top.
Install and load the scatterplot3d package.

library("scatterplot3d")
pwid <- iris$Petal.Width
plen <- iris$Petal.Length
swid <- iris$Sepal.Width
slen <- iris$Sepal.Length
scatterplot3d(x=pwid,y=plen,z=swid)