MODULE-05
Statistical Computing and R Programming
Simple linear regression, multiple linear regression, linear model selection and diagnostics.
Advanced graphics: plot customization, plotting regions and margins, point and click coordinate interaction, customizing traditional R plots, specialized text and label notation. Defining colors and plotting in higher dimensions, representing and using color, 3D scatter plots.
Regression: Regression is a statistical method for analyzing the relationship between independent and dependent variables.
In statistics, regression refers to a statistical method used to examine the relationship between
one or more independent variables and a dependent variable.
It aims to model and understand how changes in the independent variables are associated with
changes in the dependent variable.
Regression analysis helps in:
Predicting or estimating the value of the dependent variable based on the values of one
or more independent variables.
Understanding the strength and nature of the relationship between variables.
Making inferences about the population based on sample data.
There are various types of regression models:
1. Simple Linear Regression: Examining the relationship between two variables,
typically one independent variable and one dependent variable.
2. Multiple Linear Regression: Exploring the relationship between a dependent
variable and multiple independent variables.
3. Logistic Regression: Used when the dependent variable is categorical, predicting the
probability of an event occurring.
NOTE:
1) Regression analysis involves fitting a regression model to the data, assessing the
model's goodness of fit, evaluating the significance of variables, and using the model
for prediction or inference.
2) It's widely used across numerous fields to understand relationships, make predictions,
and inform decision-making based on data.
3) The dependent variable (Y): the outcome or response variable that we want to predict.
Simple Linear Regression:
Simple linear regression assumes that there is a linear relationship between the predictor variable (often denoted as X) and the outcome variable (often denoted as Y).
The simple linear regression model can be represented as:
Y = β₀ + β₁·X + ε
Where:
Y is the dependent variable (outcome or response variable).
X is the independent variable (predictor or explanatory variable).
β₀ is the intercept (the value of Y when X is 0).
β₁ is the slope (the change in Y for a one-unit change in X).
ε represents the error term (the difference between the predicted and actual values).
The goal of simple linear regression is to estimate the values of β0 and β1 that minimize the
sum of squared differences between the observed Y values and the values predicted by the
model for given X values.
Applications:
Simple linear regression is a foundational statistical technique used to establish relationships between two quantitative variables.
1.Economics and Finance: In finance, it's used to model the relationship between variables
like interest rates and stock prices or GDP and unemployment rates. Economists might employ
it to analyze the effect of inflation on consumer spending.
2.Market Research: Simple linear regression helps in analyzing the impact of advertising
expenditure on sales. It assists in understanding how changes in advertising spending might
affect product sales.
3.Healthcare and Medicine: Medical researchers often use simple linear regression to study
the relationship between a drug dosage and its effectiveness or to examine the impact of certain
lifestyle factors (like diet, exercise) on health indicators.
4.Education: Simple linear regression can be applied in educational research to investigate the relationship between study hours and exam scores or to predict student performance based on various factors.
5. Sports Analytics: Analysts might use it to study the relationship between the number of hours of
practice and athletic performance or to predict team performance based on player statistics.
The model and its fitted (estimated) forms can be written as:
Y = β₀ + β₁X + ε
ŷ = β̂₀ + β̂₁·x
OR
E[Y] = β̂₀ + β̂₁·x
OR
E[Y | X = x] = β̂₀ + β̂₁·x
The least-squares estimate of the slope is
β̂₁ = Σⁿᵢ₌₁ (xᵢ − x̄)(yᵢ − ȳ) / Σⁿᵢ₌₁ (xᵢ − x̄)²
and the corresponding intercept estimate is β̂₀ = ȳ − β̂₁x̄.
Problems:
X={2,3,5,7,9} y={4,5,7,10,15}
Step 1: Calculate the means:
x̄ = Σx / n = 26 / 5 = 5.2
ȳ = Σy / n = 41 / 5 = 8.2
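Step 2: Apply the least-squares formulas above. As a quick sketch (this computation is not shown in the original notes), the slope and intercept can be computed directly in R:
# slope and intercept from the least-squares formulas
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # 1.5183
b0 <- mean(y) - b1 * mean(x)                                     # 0.3049
c(b0, b1)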
To write this in R, use the lm() function, whose basic syntax is:
lm(formula, data)
Output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept)            x
     0.3049       1.5183
Now, we can predict the response at new predictor values with the help of the predict() function. There is the following syntax of predict():
Syntax:
predict(object, newdata)
1. object: the fitted model that we have already created using the lm() function.
2. newdata: a data frame that contains the new values for the predictor variable.
Source Code:
# Creating input vectors for the lm() function
x <- c(2,3,5,7,9)
y <- c(4,5,7,10,15)
# Fitting the model and predicting y at new (illustrative) x values
model <- lm(y ~ x)
predict(model, newdata = data.frame(x = c(4, 6)))
Plotting the Regression
Now, we plot our prediction results with the help of the plot() function. This function takes the x and y vectors as input, along with many more arguments.
plot(x, y, col = "red", main = "Simple Linear Regression",
     cex = 1.3, pch = 16, xlab = "x", ylab = "y")
abline(model)
# Closing the graphics device after saving the file
dev.off()
MULTIPLE LINEAR REGRESSION:
Multiple linear regression models the relationship between one dependent variable and two or more independent variables.
OR
Multiple linear regression predicts one dependent variable from two or more independent variables.
It is the extension of simple linear regression. The multiple linear regression model is used to determine a mathematical relationship among several random variables.
Multilinear regression, also known as multiple linear regression, is a statistical method used to analyze the relationship between multiple independent variables and a dependent variable.
EXAMPLE:
let's consider a real-life example of multiple linear regression. Imagine you're trying to
predict house prices based on various factors such as square footage, number of bedrooms,
and distance from the city center.
Solution: In the case of multiple linear regression for predicting house prices:
Using multiple linear regression, a model can be created based on historical data of houses
sold. This model will estimate the house price by considering all these factors
simultaneously. For instance, the model might suggest that for every additional square foot,
the price increases by a certain amount, or that being closer to the city center correlates with
higher prices. This information helps in making predictions about house prices for new
properties based on their features.
Y = Xβ + ε
where the design matrix X has a leading column of 1s:
X = [ 1  x₁,₁ … xₚ,₁
      1  x₁,₂ … xₚ,₂
      ⋮   ⋮       ⋮
      1  x₁,ₙ … xₚ,ₙ ]
and the least-squares estimate of the coefficient vector is
β̂ = (Xᵀ X)⁻¹ Xᵀ Y
NOTE:
* The symbol * represents matrix multiplication, the superscript T represents the transpose, and −1 represents the inverse when applied to matrices.
* Extending the size of β and X (note the leading column of 1s in X) to create structures of size p + 1 (as opposed to just the number of predictors p) allows for the estimation of the overall intercept β₀.
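As an illustration (not part of the original notes), the estimator above can be computed directly in R using solve() for the matrix inverse; the predictors x1 and x2 and the sample size here are made up:
# beta-hat = (X'X)^(-1) X'Y via the normal equations
set.seed(1)
x1 <- rnorm(20); x2 <- rnorm(20)
Y  <- 1 + 2*x1 - 0.5*x2 + rnorm(20)
X  <- cbind(1, x1, x2)                      # design matrix with leading 1s
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y
beta_hat                                    # matches coef(lm(Y ~ x1 + x2))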
Terminology:
1. Lurking variable: A lurking variable influences the response, another predictor, or both, but goes unmeasured (or is not included) in a predictive model.
2. Nuisance or extraneous variable: A nuisance or extraneous variable is a predictor of secondary or no interest that has the potential to confound relationships between other variables and so affect your estimates of the other regression coefficients.
3. Multiple Linear Regression: A statistical method to model the relationship between multiple independent variables and a dependent variable.
4. Dependent Variable: Also known as the response variable, it's the variable
being predicted or explained by the independent variables in the model.
5. Independent Variables/Predictors: Variables used to predict or explain the
variation in the dependent variable.
6. Coefficients: These represent the weights or slopes assigned to each independent variable, indicating the strength and direction of their influence on the dependent variable. In the regression equation
Y = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ + ϵ
Where:
Y is the dependent variable.
X₁, X₂, …, Xₙ are the independent variables.
β₀, β₁, …, βₙ are the coefficients.
ϵ is the error term.
Step 5: Estimate Coefficients
Use statistical methods (such as the least squares method) to estimate the coefficients that
minimize the sum of squared differences between predicted and actual values.
Step 6: Assess Model Fit
Evaluate the goodness of fit of the model by examining metrics like R-squared, adjusted R-
squared, p-values of coefficients, and residuals analysis.
Step 7: Interpret Results
Interpret the coefficients to understand the relationship between the independent variables
and the dependent variable.
Step 8: Validate the Model
Validate the model's performance on a separate dataset or using cross-validation technique s
to ensure its reliability and generalizability.
Step 9: Make Predictions
Use the validated model to make predictions on new or unseen data based on the established
relationships.
Before you begin, make sure to load the necessary R libraries. For linear regression, you'll typically use the lm() function, which is in the base R package.
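The description below refers to example code along these lines; this is a minimal sketch with made-up sample data, using the variable names Y, X1, X2, and X3 that the text assumes:
# Sample dataset with one response and three predictors
set.seed(1)
dat <- data.frame(X1 = rnorm(50), X2 = rnorm(50), X3 = rnorm(50))
dat$Y <- 2 + 1.5*dat$X1 - 0.8*dat$X2 + 0.3*dat$X3 + rnorm(50)
# Multiple linear regression of Y on X1, X2, and X3
fit <- lm(Y ~ X1 + X2 + X3, data = dat)
summary(fit)  # coefficients, p-values, R-squared, etc.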
This code creates a dataset and performs multiple linear regression using lm(), where Y is
regressed on X1, X2, and X3. The summary() function provides detailed information about
the regression results, including coefficients, p-values, R-squared, etc.
Replace Y, X1, X2, X3, and the sample dataset with your actual data and variable names to
apply multiple linear regression to your specific case in R.
Difference between Simple Linear Regression and Multiple Linear Regression
Simple linear Regression:
1. Uses a single independent variable to predict the dependent variable.
2. Y = β₀ + β₁X₁ + ϵ
Multiple linear Regression:
1. Uses two or more independent variables to predict the dependent variable.
2. Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ + ϵ
Source code:
# Creating sample data
set.seed(42)
bedrooms <- sample(1:5, 100, replace = TRUE)
house_size <- rnorm(100, mean = 1500, sd = 300)
house_price <- 50000 + 20000 * bedrooms + 100 * house_size +
               rnorm(100, mean = 0, sd = 5000)
# Fitting the model and predicting for three new houses
# (the original new_data values were not shown; these are illustrative)
model <- lm(house_price ~ bedrooms + house_size)
new_data <- data.frame(bedrooms = c(3, 4, 2),
                       house_size = c(1600, 1750, 1450))
predict(model, new_data)
OUTPUT:
1 2 3
127690.4 147650.5 107250.9
Sample data for bedrooms, house_size, and house_price is created.
A multiple linear regression model is fitted (model) using lm() with house_price as the dependent variable and bedrooms and house_size as independent variables.
New data (new_data) with values for bedrooms and house_size for three houses is created, for which we want to predict house_price.
The predict() function is used to predict house_price for the new data based on the model.
The output provides predicted house prices for the three houses specified in new_data.
The conf_intervals output will provide the lower and upper bounds of the confidence intervals for the coefficients of the fitted model.
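As a sketch (assuming the model object fitted above), such intervals can be obtained with confint():
# 95% confidence intervals for the regression coefficients
conf_intervals <- confint(model, level = 0.95)
conf_intervals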
LINEAR MODEL SELECTION AND DIAGNOSTICS:
Linear model selection and diagnostics involve choosing the appropriate linear model for a given dataset and assessing its performance.
Linear model selection methods are a critical part of statistical analysis with a wide range of applications, particularly in high-dimensional data analysis, where the number of variables is much larger than the sample size.
Linear model selection means selecting the best model according to defined criteria.
The selected model is also referred to as the statistical model.
Models are selected by balancing a few factors:
1. Goodness-of-fit
2. Complexity
1. Goodness-of-fit: Goodness-of-fit refers to the goal of obtaining a model that best represents the relationships between the response and the predictor (or predictors).
2. Complexity: Complexity describes how complicated a model is; this is always tied to the number of terms in the model that require estimation: the inclusion of more predictors and additional functions (such as polynomial transformations and interactions) leads to a more complex model.
Balancing these two factors in the face of various theoretical and practical challenges is what makes model selection complicated.
Principle of Parsimony:
Statisticians refer to the balancing act between goodness-of-fit and complexity as the principle of parsimony, where the goal of the associated model selection is to find a model that's as simple as possible (in other words, with relatively low complexity) without sacrificing too much goodness-of-fit.
LINEAR MODEL SELECTION:
In statistics, several model selection algorithms help in choosing the best-fitting model among
a set of candidate models.
A model selection algorithm sifts through your available explanatory variables in a systematic fashion to decide which to include.
1. Forward Selection: Begins with an empty model and iteratively adds predictors that
most improve the model fit until a stopping criterion is met.
2. Backward Elimination: Starts with a model containing all predictors and removes the
least significant ones iteratively until a stopping criterion is reached.
3. Stepwise Selection:Combines forward and backward selection methods, allowing both
addition and removal of predictors based on certain criteria.
4. Best Subset Selection: Fits all possible combinations of predictors and selects the
model that best fits the data based on a chosen criterion (e.g., AIC, BIC).
Partial F-test: The partial F-test compares a reduced model against a fuller version of the bigger, more complex model. Formally, let's say you've fitted two linear regression models as follows:
ŷ_redu = β̂₀ + β̂₁x₁ + … + β̂ₚxₚ
ŷ_full = β̂₀ + β̂₁x₁ + … + β̂ₚxₚ + β̂ₚ₊₁xₚ₊₁ + … + β̂_q x_q
Here, the reduced model, predicting ŷ_redu, has p predictors, plus one intercept. The full model, predicting ŷ_full, has q > p predictor terms.
The test asks whether the extra terms in the full model provide a statistically significant improvement in goodness-of-fit. The partial F-test addresses these hypotheses:
H₀ : βₚ₊₁ = βₚ₊₂ = … = β_q = 0
Hₐ : At least one of the β_j ≠ 0 (for j = p+1, …, q)
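As a brief sketch (reusing the hypothetical dat, Y, X1, X2, X3 from the earlier example), the partial F-test for two nested lm() fits can be carried out with anova():
# nested models fitted to the same data frame
reduced <- lm(Y ~ X1 + X2, data = dat)
full <- lm(Y ~ X1 + X2 + X3, data = dat)
anova(reduced, full)  # partial F-test of the extra term(s)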
Forward Selection:
Forward selection is a model selection technique in linear regression where predictors are incrementally added to the model based on statistical criteria until no further improvement is observed.
Example:
# Sample data
set.seed(123)
data <- data.frame(
  outcome = rnorm(100),
  predictor1 = rnorm(100),
  predictor2 = rnorm(100),
  predictor3 = rnorm(100)
)
# Fitting the full model (its summary is shown in the output below)
model <- lm(outcome ~ predictor1 + predictor2 + predictor3, data = data)
summary(model)
OUTPUT:
Call:
lm(formula = outcome ~ predictor1 + predictor2 + predictor3,
data = data)
Residuals:
Min 1Q Median 3Q Max
-2.35541 -0.58837 -0.08408 0.55592 2.31302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.09952 0.09309 1.069 0.288
predictor1 -0.04102 0.09547 -0.430 0.668
predictor2 -0.12493 0.09719 -1.285 0.202
predictor3 -0.04219 0.08892 -0.474 0.636
For comparison, the output for the intercept-only model (the starting point for forward selection) is:
Residuals:
Min 1Q Median 3Q Max
-2.39957 -0.58426 -0.02865 0.60141 2.09693
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.09041 0.09128 0.99 0.324
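The selection itself can be run with the built-in step() function; a minimal sketch, assuming the data frame created above:
# forward selection from the intercept-only model, adding predictors by AIC
null_model <- lm(outcome ~ 1, data = data)
step(null_model,
     scope = ~ predictor1 + predictor2 + predictor3,
     direction = "forward")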
AIC = −2 × L + 2 × (p + 2)
Here, L is a measure of goodness-of-fit named the log-likelihood, and p is the number of regression parameters in the model, excluding the overall intercept.
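As a quick sketch (assuming the fitted model object from the example above), the log-likelihood-based AIC of a model is available directly:
# AIC of the fitted model; step() compares candidate models on this criterion
AIC(model)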
ADVANCED GRAPHICS:
1. Many users are first drawn to R because of its impressive graphical flexibility and
the ease with which you can control and tailor the resulting visuals.
2. Advanced plot customization in statistics involves fine-tuning visual elements of
graphs and charts to convey data effectively.
3. It includes adjusting colors, adding annotations, changing axis scales, incorporating
multiple data series, and utilizing various plot types to enhance the representation of
statistical information.
4. An advanced graph in statistics is a visual representation of the data.
5. Advanced graph customization allows you to tailor every aspect of your visualizations.
6. This can involve adjusting axis limits, changing tick marks, customizing grid lines,
modifying line styles and marker shapes, incorporating annotations, adjusting legends,
applying color palettes, and even creating interactive or 3D plots.
Characteristics of advanced graphs
1) Color and Style:
Color Palettes: Choose meaningful color schemes for different data elements. Use
tools like color gradients or categorical color palettes.
Line Styles and Markers: Customize the appearance of lines, such as dashed or
dotted lines, and markers for data points.
2) Annotations:
Text Annotations: Add labels, titles, or captions to highlight important features or
provide additional context.
Arrows and Lines: Use arrows or lines with annotations to draw attention to
specific data points or trends.
3) Axis Customization:
Tick Marks and Labels: Adjust the appearance of tick marks and labels on both
the x and y axes.
Logarithmic Scales: Utilize logarithmic scales for axes if data spans multiple orders of magnitude.
4) Legend:
Positioning: Move the legend to a suitable position (top, bottom, left, right) for
better readability.
Custom Labels: Provide clear and concise labels for each series in the legend.
5) Faceting and Multiple Plots:
Facet Grids: Use facets to create multiple plots based on different categories,
making it easier to compare subsets of the data.
Arrangement: Adjust the arrangement of subplots to create a coherent and
informative layout.
6) Statistical Summaries:
Error Bars: Include error bars to show variability or uncertainty in data
Regression Lines: Add regression lines or other statistical summaries to illustrate
trends.
7) Background and Grid Lines:
Background Color: Customize the background color to improve contrast and
aesthetics.
Grid Lines: Adjust the appearance of grid lines to guide the viewer's eye.
8) Export and Save Options:
High-Resolution Output: Ensure that exported graphs are of high resolution for
publications or presentations.
Save in Multiple Formats: Save plots in different formats (PDF, PNG, SVG) for
versatility.
Graphics device functions:
1. dev.new() --------- To open a new graphics device.
2. dev.off() --------- To close the device plot after completion of the task.
3. dev.set() --------- Switching between devices; to change something in Device 2 without closing Device 3, use dev.set(2).
Plot customization: Advanced plot customization in advanced graphs involves fine-tuning visual
elements to convey complex data effectively.
Customizing plots in R can be done using various functions and parameters to modify the appearance
of different plot elements. Here's a basic overview of how you can customize plots in R.
1. Titles and Labels
2. Colors and Points
3. Adding a Legend
4. Axis Customization
5. Lines
6. Annotations
7. Plot Range
8. Defining a Particular Layout
1.Titles and Labels:
main: Main title of the plot.
xlab and ylab: Labels for the x and y-axes.
Example:
plot(x, y, main="Scatter Plot", xlab="X-axis label", ylab="Y-axis label")
output:
2.Colors and Points:
col: the color of plotted elements; pch: the plotting symbol for points.
Example:
plot(x, y, col="blue", pch=16)
Output:
3.Adding a Legend:
legend: Adding a legend to the plot.
Example:
legend("topright", legend=c("Group 1", "Group 2"), col=c("red", "blue"), pch=16)
output:
4.Axis Customization:
axis: Customize the axis ticks and labels.
Example:
plot(x, y, xaxt="n") # Turn off x-axis
axis(1, at=c(1, 2, 3), labels=c("Label 1", "Label 2", "Label 3"))
output:
5.Lines:
lines: Add lines to the plot.
abline: Add a straight line.
Example:
lines(x, y, col="green") abline(h=0, v=0,
col="red", lty=2)
output:
6.Annotations:
text: Add text annotations to the plot.
Example:
text(3, 5, labels="Annotation text")  # position values illustrative
7.Plot Range:
xlim and ylim: Set the range of values displayed on the x- and y-axes.
Example:
plot(x, y, xlim=c(0, 10), ylim=c(0, 20))
Output:
Plotting regions and margins:
1. Plot region: The plot region is the area in which the data are actually drawn, governed by the user coordinate system, which reflects the value and scale of the horizontal and vertical axes.
2. Figure region: The figure region is the area that contains the space for your axes, their labels, and any titles. These spaces are also referred to as the figure margins.
3. Outer region: The outer region, also referred to as the outer margins, is additional space around the figure region that is not included by default but can be specified if it's needed.
2. Plotting Margins: Margin is the space between the chart border and the canvas border. You can set the chart margins on any one of the chart's four sides.
Adjust the margins using par(mar=...). The mar parameter specifies the margin sizes in lines of text, in the order (bottom, left, top, right); the related mai parameter sets them in inches.
Example:
par(mar=c(5, 4, 4, 2)) # Set margins (bottom, left, top, right)
plot(1:10)
output:
2. Outer margin (oma)
Set the outer margins, in margin lines, with oma. The outer margins are especially useful for adding text to a combination of plots.
Access the current outer margins with par("oma").
In this example we set both the outer and the figure margins:
Example:
par(oma=c(1,4,3,2), mar=4:7)
plot(1:10)
box("figure", lty=2)
box("outer", lty=3)
Output:
Note: The graphical parameters oma (outer margin) and mar (figure margin) are used to control these regions and margins; like mfrow, they are initialized through a call to par before you begin to draw any new plot.
3. Default spacing: Inspect the default figure margin settings with a call to par in R.
Example:
par()$oma
[1] 0 0 0 0
par()$mar
[1] 5.1 4.1 4.1 2.1
Note that oma=c(0, 0, 0, 0): there is no outer margin set by default. The default figure margin space is mar=c(5.1, 4.1, 4.1, 2.1); in other words, 5.1 lines of text on the bottom, 4.1 on the left and top, and 2.1 on the right.
Consider the image on the left of Figure 23-4, created in a fresh graphics device with the following:
plot(1:10)
box(which="figure", lty=2)
4. Custom Spacing: Let's produce the same plot but with tailored outer margins so that the bottom, left, top, and right areas are one, four, three, and two lines, respectively, and the figure margins are four, five, six, and seven lines.
Example:
par(oma=c(1,4,3,2),mar=4:7)
plot(1:10)
box("figure",lty=2)
box("outer",lty=3)
Output:
To write text in the margin regions, use mtext: provide the text you want written in a character string as the first argument, and the argument line instructs how many lines of space away from the inside border the text should appear.
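A small sketch of mtext in use (the label and line value are illustrative):
# write a label one line into the top figure margin of the current plot
plot(1:10)
mtext("Figure margin text", side=3, line=1)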
5. Clipping:
Controlling clipping allows you to draw in or add elements to the margin regions with reference to the user coordinates of the plot. Clipping is controlled through the graphical parameter xpd.
Example:
# allow drawing outside the plot region, then place the legend in the margin
par(xpd=NA)
legend("topright", inset = c(-0.3, 0.1), legend = c("Group 1","Group 2"), pch = c(11,12), col = 1:2)
Output:
POINT AND CLICK COORDINATE INTERACTION:
1. Retrieving Coordinates: The locator function reads the positions of mouse clicks on the current plot and returns them as user coordinates.
Example:
plot(1,1)
locator()
$x
[1] 0.8275456 1.1737525 1.1440526 0.8201909
$y
[1] 1.1581795 1.1534442 0.9003221 0.8630254
2. Visualizing Selected Coordinates
Use locator to plot the points you select, either as individual points or as lines.
Example:
plot(1,1)
list <- locator(type="o", pch=4, lty=2, lwd=3, col="red", xpd=TRUE)
list
$x
[1] 0.5013189 0.6267149 0.7384407 0.7172250 1.0386740 1.2765699
[7] 1.4711542 1.2352573 1.2220592 0.8583484 1.0483300 1.0091491
$y
[1] 0.6966016 0.9941945 0.9636752 1.2819852 1.2766579 1.4891270
[7] 1.2439071 0.9630832 0.7625887 0.7541716 0.6394519 0.9618461
3. Ad Hoc Annotation
The locator function also allows you to place ad hoc annotations, such as legends, on your plot. Remember, since locator returns valid R user coordinates, these results can directly form the positional argument of most standard annotation functions.
Example:
library(MASS)  # provides the survey data set
plot(survey$Height ~ survey$Wr.Hnd, pch=16, col=c("gray","black")[as.numeric(survey$Sex)], xlab="Writing handspan", ylab="Height")
legend(locator(n=1), legend=levels(survey$Sex), pch=16, col=c("gray","black"))
OUTPUT:
CUSTOMIZING TRADITIONAL R PLOTS:
1. Axis padding:
Let's plot MPG against horsepower (from the ready-to-use mtcars data set) and set each plotted point to be sized proportionally to the weight of each car.
Example:
hp <- mtcars$hp
mpg <- mtcars$mpg
wtcex <- mtcars$wt/mean(mtcars$wt)  # point sizes proportional to weight (scaling assumed)
plot(hp, mpg, cex=wtcex)
plot(hp, mpg, cex=wtcex, xaxs="i", yaxs="i")
Output:
The second plot is almost the same as the default, but note now that there's no padding space at the end of the axes; the most extreme data points sit right on the axes.
2.Customizing Boxes:
To add a box specific to the current plot region in the active graphics device, you use box
and specify its type with bty.
box(bty="u")
Example:
box(bty="]",lty=2,col="gray")
Output:
The bty argument is supplied a single character: "o" (default), "l", "7", "c", "u", "]", or "n". As the help file entry for bty explains, the resulting box boundaries will follow the appearance of the corresponding uppercase letter, with the exception of "n", which suppresses the box.
3.Customizing Axes
The axis function allows you to control the addition and appearance of an axis on any of the
four sides of the plot region.
The first argument it takes is side, provided with a single integer: 1 (bottom), 2 (left), 3
(top), or 4 (right).
Syntax:
axis(side, at, labels, ...)
Parameters:
side: It defines the side of the plot the axis is to be drawn on possible values such as
below, left, above, and right.
at: Point to draw tick marks
Example:
x <- 1:5; y <- x * x
plot(x, y, axes = FALSE)
# Calling the axis() function
axis(side = 1, at = 1:5, labels = LETTERS[1:5])
axis(3)
Output:
SPECIALIZED TEXT AND LABEL NOTATION:
1. Font:
The displayed font is controlled by two graphical parameters: family for the specific font family and font, an integer selector for controlling bold and italic typeface.
Available fonts depend on both your operating system and the graphics device.
Three generic families, "sans" (the default), "serif", and "mono", are always available.
These are paired with the four possible values of font: 1 (normal text, default), 2 (bold), 3 (italic), and 4 (bold and italic).
Example:
par(mar=c(3,3,3,3))
plot(1,1,type="n",xlim=c(-1,1),ylim=c(0,7),xaxt="n",yaxt="n",ann=FALSE)
text(0,6,label="sans text (default)\nfamily=\"sans\", font=1")
text(0,5,label="serif text\nfamily=\"serif\", font=1", family="serif",font=1)
text(0,4,label="mono text\nfamily=\"mono\", font=1", family="mono",font=1)
text(0,3,label="mono text (bold, italic)\nfamily=\"mono\", font=4", family="mono",font=4)
OUTPUT:
Displaying font styles through use of the family and font graphical parameters
2. Greek Symbols
For statistically or mathematically technical plots, annotation may occasionally require
Greek symbols or mathematical markup.
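A minimal sketch (plot values arbitrary): Greek letters can be rendered by supplying an expression to text-drawing arguments such as main, xlab, or ylab:
# Greek symbols via expression()
plot(1:10, main=expression(alpha + beta), xlab=expression(mu), ylab=expression(sigma^2))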
3.Mathematical Expressions:
Formatting entire mathematical expressions to appear in R plots is a bit more complicated and is reminiscent of using markup languages.
Example:
expr1 <- expression(c^2 == a[1]^2 + b[1]^2)
expr2 <- expression(paste(pi^{x[i]}, (1-pi)^(n-x[i])))
expr3 <- expression(paste("Sample mean: ", italic(n)^{-1}, sum(italic(x)[italic(i)], italic(i)==1, italic(n)) == frac(italic(x)[1]+...+italic(x)[italic(n)], italic(n))))
expr4 <- expression(paste("f(x","|",alpha,",",beta,")" == frac(x^{alpha-1}~(1-x)^{beta-1}, B(alpha,beta))))
Output:
DEFINING COLORS:
1. Named Colors: R accepts color names such as "red", "blue", and "green".
2. Hexadecimal Codes: Colors can be specified as hex strings of the form "#RRGGBB". For example:
1. Red: "#FF0000"
2. Green: "#00FF00"
3. Blue: "#0000FF"
4. Orange: "#FFA500"
5. Purple: "#800080"
3.RGB Values:
You can define colors using RGB values, specifying the intensity of red, green, and blue channels
individually within the range of 0 to 1.
1. Red: rgb(1, 0, 0)
2. Green: rgb(0, 1, 0)
3. Blue: rgb(0, 0, 1)
4. Yellow: rgb(1, 1, 0)
5. Purple: rgb(0.5, 0, 0.5)
4. HSL Values:
HSL (Hue, Saturation, Lightness) values can also be used to define colors. The hue value represents the color (0–360), saturation represents intensity (0–1), and lightness represents the brightness (0–1). Note that base R itself provides the closely related hsv() function, whose hue, saturation, and value arguments each lie in [0, 1]. For example:
1. Red: hsl(0, 1, 0.5)
2. Green: hsl(120, 1, 0.5)
3. Blue: hsl(240, 1, 0.5)
4. Yellow: hsl(60, 1, 0.5)
5. Purple: hsl(300, 1, 0.5)
Example:
# Using different color representations in a plot
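# (sketch: the original example code was not included in the notes)
# the same red specified as a name, a hex code, rgb(), and hsv()
plot(1:4, rep(1, 4), pch = 16, cex = 4,
     col = c("red", "#FF0000", rgb(1, 0, 0), hsv(0, 1, 1)))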
OUTPUT:
2. 2D Plots: '2D' stands for 2-dimensional; a 2D line is a line drawn in two dimensions. A line in 2D means that we can move not only forward and backward but in any direction: left, right, up, down.
Scatter Plot: A scatter plot is a standard 2D plot that displays paired observations as points.
3.Surface Plot: Surface plots are diagrams of three-dimensional data. Rather than showing the
individual data points, surface plots show a functional relationship between a designated
dependent variable (Y), and two independent variables (X and Z). The plot is a companion plot
to the contour plot.
Example:
x <- seq(-10, 10, length=100)
y <- seq(-10, 10, length=100)
z <- outer(x, y, function(x, y) sin(sqrt(x^2 + y^2)))
persp(x, y, z, theta=30, phi=20, col="lightblue", shade=0.5)
Output:
2. Built-in Palettes
Being able to implement your own RGB colors is most useful when you need
many colors, the collection of which is referred to as a palette.
There are a number of color palettes built into the base R installation. These are defined by the functions rainbow, heat.colors, terrain.colors, topo.colors, cm.colors, gray.colors, and gray.
Example:
N <- 600
rbow <- rainbow(N)
heat <- heat.colors(N)
terr <- terrain.colors(N)
topo <- topo.colors(N)
cm <- cm.colors(N)
gry1 <- gray.colors(N)
gry2 <- gray(level=seq(0,1,length=N))
dev.new(width=8,height=3)
par(mar=c(1,8,1,1))
plot(1,1,xlim=c(1,N),ylim=c(0.5,7.5),type="n",xaxt="n",yaxt="n",ann=FALSE)
points(rep(1:N,7),rep(7:1,each=N),pch=19,cex=3,col=c(rbow,heat,terr,topo,cm,gry1,gry2))
OUTPUT:
Showcasing the color ranges of the built-in palettes, with default limits used in gray.colors.
3.Color Palettes:
R provides color palettes that allow you to use a sequence of colors.
Example:
# Example using a color palette
colors <- rainbow(10) # Generates 10 colors of the rainbow palette
plot(1:10, col = colors, pch = 16)  # illustrative usage
4. Color Functions:
R has functions for creating color gradients or interpolating colors.
Example:
library(RColorBrewer)
# the Brewer palette name was cut off in the notes; "Blues" is assumed here
colors <- colorRampPalette(brewer.pal(9, "Blues"))(100)
output:
5.Color Scales:
R supports various color scales, like viridis, magma, etc.
Example:
library(viridis)
plot(1:10, col=viridis(10), pch=16)
output:
3D SCATTER PLOTS.:
creating 3D scatterplots, which allow you to plot raw observations based on thre e
continuous variables at once, as opposed to only two in a conventional 2D scatterplot
Syntax:
scatterplot3d(x, y, z, ...)
One popular package for this purpose is scatterplot3d. Here's a basic example of how to create a 3D scatter plot using this package:
Example:
# Install and load the scatterplot3d package
install.packages("scatterplot3d")
library(scatterplot3d)
# Generate some random data
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
z <- rnorm(100)
# Draw the 3D scatter plot (call added; the original snippet stopped short)
scatterplot3d(x, y, z, pch = 16, main = "3D Scatter Plot")
OUTPUT: