DA R Unit-4
Histograms
Box plots
Bar Charts
Line Graphs
Scatter plots
Pie Charts
Histograms
hist(v,main,xlab,xlim,ylim,breaks,col,border)
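Following is the description of the parameters used −
v is a vector containing the numeric values used in the histogram.
main indicates the title of the chart.
xlab is used to give a description of the x-axis.
xlim is used to specify the range of values on the x-axis.
ylim is used to specify the range of values on the y-axis.
breaks suggests the number of bins, or gives the break points between bins.
col is used to set the color of the bars.
border is used to set the border color of each bar.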
In this example, we first load the mtcars dataset, which contains information about
different car models. We then pass the mpg column of the mtcars dataset to the
hist() function, which creates a histogram of the data. The main argument specifies
the title of the plot, the xlab argument specifies the label for the x-axis, the col
argument specifies the color of the bars in the plot, and the breaks argument
specifies the number of bins to use in the histogram.
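The listing that this paragraph describes is not present in these notes. A minimal sketch of what it might look like is shown here; the title, axis label, color, and bin count are assumptions chosen for illustration.

```r
# Load the built-in mtcars dataset
data(mtcars)

# Histogram of miles per gallon.
# breaks = 10 is a suggested bin count; R may pick a nearby "pretty" number.
h <- hist(mtcars$mpg,
          main = "Distribution of Miles per Gallon",  # illustrative title
          xlab = "Miles per Gallon",
          col = "lightblue",
          breaks = 10)
```

hist() also returns the bin counts invisibly, so h$counts can be inspected after plotting.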
You can also overlay histograms to compare the distribution of different
groups of data. Here's an example:
# Overlay histograms of petal length for each species in the iris dataset
data(iris)
# Shared break points keep the bins aligned across the three species
brks <- seq(1, 7, by = 0.5)
hist(iris$Petal.Length[iris$Species == "setosa"], breaks = brks, col = "blue",
     main = "Distribution of Petal Length in Iris Flowers by Species",
     xlab = "Petal Length (cm)", ylim = c(0, 40))
# add = TRUE draws on the existing axes instead of starting a new plot
hist(iris$Petal.Length[iris$Species == "versicolor"], breaks = brks, col = "green", add = TRUE)
hist(iris$Petal.Length[iris$Species == "virginica"], breaks = brks, col = "red", add = TRUE)
legend("topright", c("setosa", "versicolor", "virginica"), fill = c("blue", "green", "red"))
In this example, we first load the iris dataset, which contains information about
different species of iris flowers. We then create three histograms of the Petal.Length
variable, one per species. The first hist() call sets the title, axis label, and axis
limits. The other two calls use add = TRUE to draw on the same axes, so all three
histograms share one scale. Passing the same break points to every call keeps the bins
aligned. Finally, the legend() function adds a legend to the plot.
Histograms can be useful for identifying the shape and spread of a dataset, as well as
for detecting any outliers or unusual patterns in the data. By displaying the
frequency or density of values within each bin, histograms provide a quick and easy
way to understand the distribution of a dataset.
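The difference between plotting frequency and plotting density can be sketched as follows. This is an illustrative example, not part of the original notes; the title, color, and the choice of Sepal.Length are assumptions.

```r
data(iris)

# freq = FALSE plots densities instead of counts,
# so the total area of the bars equals 1
h <- hist(iris$Sepal.Length, freq = FALSE,
          main = "Density Histogram of Sepal Length",
          xlab = "Sepal Length (cm)", col = "grey")

# Overlay a kernel density estimate on the same scale
lines(density(iris$Sepal.Length), col = "red", lwd = 2)
```

A density scale is convenient when comparing groups of different sizes, because the bar heights no longer depend on the number of observations.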
# Create a histogram of the "Sepal.Length" variable from the iris dataset
data(iris)
hist(iris$Sepal.Length, main = "Histogram of Sepal Length", xlab = "Sepal Length",
     ylab = "Frequency", col = "blue")
In this example, we first load the iris dataset, which contains information on various
iris flowers. We then pass the Sepal.Length variable to the hist() function to create
the histogram. The main argument specifies the title of the plot, the xlab argument
specifies the label for the x-axis, and the ylab argument specifies the label for the y-
axis. The col argument specifies the color of the bars in the histogram.
You can also customize the number of bins in the histogram by passing the breaks
argument. Here's an example:
# Create a histogram of the "Petal.Length" variable from the iris dataset with about 15 bins
data(iris)
hist(iris$Petal.Length, breaks = 15, main = "Histogram of Petal Length",
     xlab = "Petal Length", ylab = "Frequency", col = "red")
In this example, we create a histogram of the Petal.Length variable from the iris
dataset and ask for 15 bins. When breaks is a single number, R treats it as a
suggestion and may choose a nearby number of bins so that the break points fall on
round values. To get exact bins, pass a vector of break points instead.
Box plots
A box plot (also known as a box-and-whisker plot) is a type of chart that displays the
distribution of a numeric variable using five summary statistics: the minimum value,
the first quartile, the median, the third quartile, and the maximum value. In R, the
whiskers by default extend only to the most extreme points within 1.5 times the
interquartile range of the box, and points beyond them are drawn individually as
outliers. Box plots are commonly used to show the distribution of continuous data
and to identify outliers.
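The summary statistics behind a box plot can be inspected directly. The following is a small sketch using the base R functions fivenum() and boxplot.stats(); the choice of the mpg column is an assumption made for illustration.

```r
data(mtcars)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mtcars$mpg)
print(five)

# Statistics that boxplot() actually draws
s <- boxplot.stats(mtcars$mpg)
print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$out)    # values beyond the whiskers, drawn as individual points
```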
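Syntax
The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
Following is the description of the parameters used −
x is a vector or a formula.
data is the data frame from which the variables in the formula are taken.
notch is a logical value. Set as TRUE to draw a notch.
varwidth is a logical value. Set as TRUE to draw the width of each box proportional
to the square root of the sample size.
names are the group labels which will be printed under each boxplot.
main is used to give a title to the graph.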
In this example, we first load the iris dataset, which contains information about
different species of iris flowers. We then pass the Sepal.Length column of the iris
dataset to the boxplot() function, which creates a box plot of the data. The main
argument specifies the title of the plot, the ylab argument specifies the label for the
y-axis, and the col argument specifies the color of the boxes in the plot.
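The listing for this example is not included in the notes. A minimal sketch consistent with the description might look like this; the title, label text, and color are assumptions.

```r
# Load the built-in iris dataset
data(iris)

# Box plot of a single numeric column
b <- boxplot(iris$Sepal.Length,
             main = "Box Plot of Sepal Length",  # illustrative title
             ylab = "Sepal Length (cm)",
             col = "lightgreen")
```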
You can also create side-by-side box plots to compare the distribution of different
variables or groups of data. Here's an example:
# Create a side-by-side box plot of the "mtcars" dataset, showing the miles per gallon
# for different car models by transmission type
data(mtcars)
boxplot(mpg ~ am, data = mtcars,
        main = "Miles per Gallon for Different Cars by Transmission Type",
        xlab = "Transmission Type", ylab = "Miles per Gallon",
        col = c("blue", "red"), names = c("Automatic", "Manual"))
In this example, we use the formula mpg ~ am to specify that we want to create a
box plot of the mpg variable, grouped by the am variable (which indicates the type
of transmission: 0 for automatic, 1 for manual). We pass the mtcars data frame to the
data argument of the boxplot() function so that R can look up both variables, along
with other arguments to specify the title, labels, colors, and group names of the plot.
Box plots can be useful for identifying the spread and skewness of a dataset, as well
as for detecting outliers and comparing the distribution of different variables or
groups of data. By displaying summary statistics such as the median and quartiles,
box plots provide a quick and easy way to understand the distribution of a dataset.
# Create a basic box plot of the "mpg" variable from the mtcars dataset
data(mtcars)
boxplot(mtcars$mpg, main = "Box Plot of MPG", ylab = "Miles per Gallon", col = "blue")
In this example, we first load the mtcars dataset, which contains information on
various cars. We then pass the mpg variable to the boxplot() function to create the
box plot. The main argument specifies the title of the plot, the ylab argument
specifies the label for the y-axis, and the col argument specifies the color of the
boxes in the plot.
You can also create side-by-side box plots by passing multiple vectors or
a matrix to the boxplot() function. Here's an example of a side-by-side box plot:
# Create a side-by-side box plot of the "mpg" and "wt" variables from the mtcars dataset
data(mtcars)
boxplot(mtcars$mpg, mtcars$wt, names = c("MPG", "Weight"),
        main = "Side-by-Side Box Plot of MPG and Weight",
        ylab = "Value", col = c("blue", "red"))
In this example, we create a side-by-side box plot of the mpg and wt variables from
the mtcars dataset. The names argument specifies the labels for the boxes in the
plot, and the col argument specifies the colors of the boxes.
You can also create grouped box plots by passing a factor variable to the boxplot()
function. Here's an example:
# Create a grouped box plot of the "mpg" variable from the mtcars dataset by number of cylinders
data(mtcars)
boxplot(mtcars$mpg ~ mtcars$cyl,
        main = "Grouped Box Plot of MPG by Number of Cylinders",
        xlab = "Number of Cylinders", ylab = "Miles per Gallon",
        col = c("blue", "red", "green"))
In this example, we create a grouped box plot of the mpg variable from the mtcars
dataset by number of cylinders. The ~ symbol indicates that we want to group the
data by the cyl variable. The xlab argument specifies the label for the x-axis, and the
col argument specifies the colors of the boxes in the plot.
Bar charts
A bar graph (also known as a bar chart) is a type of chart that displays data as a
series of rectangular bars. Bar graphs are commonly used to compare different
categories or groups of data, and are particularly useful for showing changes in data
over discrete periods.
Syntax
The basic syntax to create a bar-chart in R is −
barplot(H,xlab,ylab,main, names.arg,col)
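Following is the description of the parameters used −
H is a vector or matrix containing the numeric values used in the bar chart.
xlab is the label for the x-axis.
ylab is the label for the y-axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.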
Example
A simple bar chart is created using just the input vector and the name of each bar.
The below script will create and save the bar chart in the current R working directory.
# Create the data for the chart
H <- c(7,12,28,3,41)
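The script above stops after defining the data. One way to finish it, so that the chart is drawn and saved as described, is sketched here. The file name and the bar names are hypothetical, chosen only for illustration.

```r
# Data for the chart
H <- c(7, 12, 28, 3, 41)
# Hypothetical names for the bars
M <- c("Mar", "Apr", "May", "Jun", "Jul")

# Open a PNG device; the file is written to the current working directory
png(file = "barchart.png")  # hypothetical file name

# Draw the bar chart with one name under each bar
barplot(H, names.arg = M)

# Close the device so the file is saved
dev.off()
```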
# Create a bar graph of the number of cars by transmission type and number of
# cylinders in the mtcars dataset
data(mtcars)
cars_by_trans_cyl <- table(mtcars$am, mtcars$cyl)
barplot(cars_by_trans_cyl, main = "Number of Cars by Transmission Type and Cylinders",
        xlab = "Number of Cylinders", ylab = "Count", col = c("blue", "red"))
In this example, we first load the mtcars dataset, which contains information on
various cars. We then use the table() function to count the cars for each combination
of transmission type and number of cylinders, where 0 represents automatic
transmissions and 1 represents manual transmissions. We then pass this table to the
barplot() function to create the bar graph, which has one bar per cylinder count,
split by transmission type. The main argument specifies the title of the plot, the
xlab argument specifies the label for the x-axis, and the ylab argument specifies the
label for the y-axis. The col argument specifies the colors of the bar segments.
You can also create stacked or grouped bar graphs by passing matrices or data
frames with multiple columns to the barplot() function. Here's an example of a
stacked bar graph:
# Create a stacked bar graph of the number of cars by transmission type and number
# of cylinders in the mtcars dataset
data(mtcars)
cars_by_type_cyl <- table(mtcars$am, mtcars$cyl)
barplot(cars_by_type_cyl,
        main = "Number of Cars by Transmission Type and Number of Cylinders",
        xlab = "Number of Cylinders", ylab = "Count", col = c("blue", "red"),
        legend = c("Automatic", "Manual"), args.legend = list(x = "topright"))
In this example, we create a table of the number of cars by transmission type and
number of cylinders. We then pass this table to the barplot() function to create a
stacked bar graph. The legend argument specifies the labels for the legend, and the
args.legend argument specifies the location of the legend on the plot.
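The grouped variant mentioned earlier uses the same table with beside = TRUE, which places the bars next to each other instead of stacking them. The sketch below is illustrative; the title is an assumption.

```r
data(mtcars)
counts <- table(mtcars$am, mtcars$cyl)

# beside = TRUE draws one bar per table cell, grouped by column (cylinders)
mids <- barplot(counts, beside = TRUE,
                main = "Cars by Transmission Type and Cylinders (Grouped)",
                xlab = "Number of Cylinders", ylab = "Count",
                col = c("blue", "red"),
                legend.text = c("Automatic", "Manual"),
                args.legend = list(x = "topright"))
```

barplot() returns the bar midpoints, which is useful when adding text or error bars above each bar.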
In this example, we first load the mtcars dataset, which contains information about
different car models. We then pass the mpg column of the mtcars variable to the
barplot() function, which creates a bar graph of the data. The main argument
specifies the title of the plot, the xlab argument specifies the label for the x-axis, the
ylab argument specifies the label for the y-axis, and the col argument specifies the
color of the bars in the plot.
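The listing that this paragraph describes is missing from the notes. A minimal sketch of such a call is given here; the title, labels, color, and the use of names.arg, las, and cex.names to show readable car names are all assumptions.

```r
data(mtcars)

# One bar per car model, labelled with the row names of the dataset
mids <- barplot(mtcars$mpg,
                names.arg = rownames(mtcars),
                las = 2,          # rotate the bar labels to vertical
                cex.names = 0.6,  # shrink the labels so they fit
                main = "Miles per Gallon by Car Model",
                xlab = "Car Model", ylab = "Miles per Gallon",
                col = "steelblue")
```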
You can also plot summary statistics, such as group means, by passing a computed
vector to the barplot() function. Here's an example:
# Create a bar graph of the mean miles per gallon by transmission type
data(mtcars)
mpg_by_trans <- tapply(mtcars$mpg, mtcars$am, mean)
barplot(mpg_by_trans, main = "Mean Miles per Gallon by Transmission Type",
        xlab = "Transmission Type", ylab = "Miles per Gallon", col = c("blue", "red"),
        names.arg = c("Automatic", "Manual"))
In this example, we first use the tapply() function to calculate the mean miles per
gallon for each transmission type (0 for automatic, 1 for manual). We then pass the
output of tapply() to the barplot() function. Because the result is a vector with one
value per group, barplot() draws one bar per transmission type rather than a stacked
graph. The main argument specifies the title of the plot, the xlab argument specifies
the label for the x-axis, the ylab argument specifies the label for the y-axis, and the
col argument specifies the colors of the bars. The names.arg argument replaces the
default labels 0 and 1 with readable names.
Bar graphs can be useful for comparing different categories or groups of data, such
as sales figures for different products or the number of students enrolled in different
courses. By displaying data as rectangular bars, bar graphs make it easy to see the
differences between categories and identify trends in the data.
Line graphs
A line graph (also known as a line chart or line plot) is a type of chart that displays
information as a series of data points connected by straight lines. Line graphs are
commonly used to display trends over time, and are particularly useful for showing
changes in data over a continuous period.
Syntax
The basic syntax to create a line chart in R is −
plot(v,type,col,xlab,ylab)
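Following is the description of the parameters used −
v is a vector containing the numeric values.
type takes the value "p" to draw only the points, "l" to draw only the lines and "o" to
draw both points and lines.
col is used to give a color to both the points and the lines.
xlab is the label for the x-axis.
ylab is the label for the y-axis.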
Example
A simple line chart is created using the input vector and the type parameter as "o".
The below script will create and save a line chart in the current R working directory.
# Create the data for the chart.
v <- c(7,12,28,3,41)
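The script above ends after the data is defined. A possible completion that draws the chart and saves it is sketched here. The file name, color, labels, and title are hypothetical.

```r
# Data for the chart
v <- c(7, 12, 28, 3, 41)

# Open a PNG device in the current working directory
png(file = "line_chart.png")  # hypothetical file name

# type = "o" draws both the points and the connecting lines
plot(v, type = "o", col = "red",
     xlab = "Index", ylab = "Value",
     main = "Simple Line Chart")

# Close the device so the file is saved
dev.off()
```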
In this example, we first load the AirPassengers dataset, which contains monthly
passenger counts for an airline. We then pass the AirPassengers variable to the plot()
function, along with the type = "l" argument, which specifies that we want to create
a line graph. The main argument specifies the title of the plot, the xlab argument
specifies the label for the x-axis, the ylab argument specifies the label for the y-axis,
and the col argument specifies the color of the line in the plot.
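The call described in this paragraph is not shown at this point in the notes. A short sketch is given here, with the title, labels, and color as assumptions.

```r
# AirPassengers is a monthly time series that ships with R
data(AirPassengers)

# For a time series, plot() puts time on the x-axis automatically
plot(AirPassengers, type = "l",
     main = "Airline Passengers over Time",
     xlab = "Year", ylab = "Passenger Count",
     col = "blue")
```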
You can also draw multiple lines on the same plot by calling the lines() function
after plot(). Here's an example:
# Create a line graph of the "AirPassengers" dataset, along with a second line
# showing a moving average of the same data
# ma() is not part of base R; it comes from the forecast package
library(forecast)
data(AirPassengers)
plot(AirPassengers, type = "l", main = "Airline Passengers over Time", xlab = "Year",
     ylab = "Passenger Count", col = "blue")
lines(ma(AirPassengers, order = 12), col = "red")
In this example, we first create a line graph of the passenger counts using the plot()
function. We then pass the AirPassengers variable to the ma() function from the
forecast package, which calculates a moving average of the data with a window size
of 12 (i.e., a 12-month moving average). We then pass the output of the ma() function
to the lines() function, which adds a second line to the existing plot, this time in red.
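If installing an extra package is not an option, a centered 12-month moving average can also be computed with base R. The sketch below uses stats::filter() with a 13-term set of weights (half weight on the two end months), which is a standard way to center an even-length window; treat it as an illustration under that assumption rather than a drop-in replacement.

```r
data(AirPassengers)

# Weights for a centered 12-month average: 13 terms that sum to 1
w <- c(0.5, rep(1, 11), 0.5) / 12

# sides = 2 centers the window on each month
ma12 <- stats::filter(AirPassengers, filter = w, sides = 2)

plot(AirPassengers, type = "l", main = "Airline Passengers over Time",
     xlab = "Year", ylab = "Passenger Count", col = "blue")
lines(ma12, col = "red")
```

The first and last six months have no complete window, so the smoothed line is shorter than the original series at both ends.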
Line graphs can be useful for visualizing various types of data over time, such as
stock prices, weather patterns, or website traffic. By displaying data points and
trends in a simple and easy-to-understand way, line graphs can help you identify
patterns and make informed decisions based on the data.
Scatter plots
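A scatter plot is a type of chart that displays the relationship between two
continuous variables. Each point represents one observation. It is used to show how
one variable changes as the other changes, and to identify any patterns or trends in
the data. A visible relationship does not by itself show that one variable causes the
other.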
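Syntax
The basic syntax for creating a scatterplot in R is −
plot(x, y, main, xlab, ylab, xlim, ylim)
Example
# Use the "wt" and "mpg" columns of the mtcars dataset as input.
input <- mtcars[, c("wt", "mpg")]
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt, y = input$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     xlim = c(2.5, 5),
     ylim = c(15, 30),
     main = "Weight vs Mileage"
)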
Scatterplot Matrices
When we have more than two variables and want to see how each variable relates to
each of the others, we use a scatterplot matrix. We use the pairs() function to create
matrices of scatterplots.
Syntax
The basic syntax for creating scatterplot matrices in R is −
pairs(formula, data)
Following is the description of the parameters used −
formula represents the series of variables used in pairs.
data represents the data set from which the variables will be taken.
Example
Each variable is paired up with each of the remaining variables. A scatterplot is
plotted for each pair.
# Give the chart file a name.
png(file = "scatterplot_matrices.png")
# Plot the matrix for 4 variables, giving 12 scatterplots.
pairs(~wt+mpg+disp+cyl, data = mtcars,
      main = "Scatterplot Matrix")
# Close the device so the file is saved.
dev.off()
Scatter plots are used to display the relationship between two numeric variables.
Here's an example using the "mtcars" dataset:
# Load the dataset
data(mtcars)
# Create a scatter plot of mpg versus weight
plot(mtcars$wt, mtcars$mpg, xlab="Weight", ylab="Miles Per Gallon")
In this example, we first load the mtcars dataset, which contains information about
different car models. We then pass the wt and mpg columns of the mtcars dataset to
the plot() function, which creates a scatter plot with weight on the x-axis and miles
per gallon on the y-axis. The xlab argument specifies the label for the x-axis and the
ylab argument specifies the label for the y-axis. Optional arguments such as main,
col, and pch can be added to set the title, the color of the data points, and the shape
of the data points.
You can also create a scatter plot with a regression line to show the direction and
strength of the relationship between the variables. Here's an example:
# Create a scatter plot of the "iris" dataset, showing the relationship between petal
# length and petal width, with a regression line
data(iris)
plot(iris$Petal.Length, iris$Petal.Width, main = "Petal Length vs. Petal Width",
     xlab = "Petal Length (cm)", ylab = "Petal Width (cm)", col = iris$Species, pch = 19)
abline(lm(iris$Petal.Width ~ iris$Petal.Length), col = "red")
legend("topleft", c("setosa", "versicolor", "virginica"), col = 1:3, pch = 19)
In this example, we first load the iris dataset, which contains information about
different species of iris flowers. We then pass the Petal.Length and Petal.Width
columns of the iris dataset to the plot() function, along with other arguments to
specify the color, title, label, and shape of the data points. We also use the lm()
function to calculate a linear regression line for the relationship between the
variables, and the abline() function to add the regression line to the plot. We use the
legend() function to add a legend to the plot.
Scatter plots can be useful for identifying any patterns or trends in the relationship
between two variables. By displaying the distribution of data points in a two-
dimensional space, scatter plots provide a quick and easy way to understand the
relationship between two variables. Scatter plots can also help to identify any
outliers or unusual patterns in the data.
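What a scatter plot suggests visually can be backed up with a number. The following sketch computes the correlation coefficient and the fitted regression line for the petal measurements; it is an illustration added here, and the variable names r and fit are arbitrary.

```r
data(iris)

# Pearson correlation between petal length and petal width
r <- cor(iris$Petal.Length, iris$Petal.Width)
print(r)

# Linear model: intercept and slope of the regression line
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
print(coef(fit))
```

For a simple linear regression with one predictor, the R-squared reported by summary(fit) equals the square of this correlation.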
Pie charts
A pie chart is a type of chart that displays the proportion or percentage distribution
of different categories in a dataset. It is used to show the relative sizes of different
categories in a dataset, and is particularly useful when the number of categories is
small.
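Syntax
The basic syntax for creating a pie chart in R is −
pie(x, labels, radius, main, col, clockwise)
Following is the description of the parameters used −
x is a vector containing the numeric values used in the pie chart.
labels is used to give a description to the slices.
radius indicates the radius of the circle of the pie chart (value between −1 and +1).
main indicates the title of the chart.
col indicates the color palette.
clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise.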
A very simple pie-chart is created using just the input vector and labels. The below
script will create and save the pie chart in the current R working directory.
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
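The script above defines the data but does not draw anything. A possible completion that draws the chart and saves it is sketched here; the file name is hypothetical.

```r
# Data for the graph
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Open a PNG device in the current working directory
png(file = "city.png")  # hypothetical file name

# Draw the pie chart using just the values and their labels
pie(x, labels)

# Close the device so the file is saved
dev.off()
```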
Example
The below script will create and save the pie chart in the current R working directory.
piepercent<- round(100*x/sum(x), 1)
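Only the percentage calculation survives from this example. One way the percentages might be used is sketched below: they become the slice labels, and a legend maps colors to city names. The title, the rainbow() palette, and the legend placement are assumptions.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Share of each slice as a percentage, rounded to one decimal place
piepercent <- round(100 * x / sum(x), 1)

# Label the slices with percentages and use one color per slice
slice_cols <- rainbow(length(x))
pie(x, labels = piepercent, main = "City pie chart", col = slice_cols)

# The legend maps each color back to its city
legend("topright", labels, cex = 0.8, fill = slice_cols)
```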
3D Pie Chart
A pie chart with 3 dimensions can be drawn using additional packages. The
package plotrix has a function called pie3D() that is used for this.
# Get the library.
library(plotrix)
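The notes load the package but do not show the call. A minimal sketch is given here, assuming the plotrix package is installed; the data, labels, explode value, and title are illustrative.

```r
# Requires the plotrix package: install.packages("plotrix")
library(plotrix)

x <- c(21, 62, 10, 53)
lbl <- c("London", "New York", "Singapore", "Mumbai")

# explode pulls the slices slightly apart from the center
pie3D(x, labels = lbl, explode = 0.1, main = "Pie Chart of Cities")
```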
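# Create a pie chart of the "mtcars" dataset, showing the distribution of car models
# by number of cylinders
data(mtcars)
cyl_counts <- table(mtcars$cyl)
# The title and colors below are illustrative
pie(cyl_counts, main = "Cars by Number of Cylinders", col = c("blue", "red", "green"))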
In this example, we first load the mtcars dataset, which contains information about different car
models. We then use the table() function to create a frequency table of the number of cars with
different numbers of cylinders. We pass this frequency table to the pie() function, which creates a
pie chart of the data. The main argument specifies the title of the plot, and the col argument
specifies the colors to use for the different categories.
Pie charts can be useful for visualizing the relative sizes of different categories in a dataset. However,
they can be difficult to read if the number of categories is large, or if the differences between
categories are small. In general, it is recommended to use a pie chart when the number of categories
is small and the differences between categories are large, and to use other types of charts, such as
bar charts or stacked bar charts, when the number of categories is large or the differences between
categories are small.
Pie charts are used to display the proportion of categorical data. Here's an example using the
"ChickWeight" dataset:
data(ChickWeight)
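The notes end after loading the data, so the intended chart is unknown. As one possibility, the sketch below shows the share of observations for each diet group; the choice of the Diet column, the labels, the title, and the colors are all assumptions.

```r
data(ChickWeight)

# Number of weight measurements recorded for each diet (a factor with 4 levels)
diet_counts <- table(ChickWeight$Diet)

pie(diet_counts,
    labels = paste("Diet", names(diet_counts)),
    main = "Share of Observations by Diet",  # illustrative title
    col = rainbow(length(diet_counts)))
```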