DA R Unit-4

Uploaded by

deepikach564
Data Visualization using R

Working with R Charts and Graphs:

• Histograms
• Box plots
• Bar Charts
• Line Graphs
• Scatter plots
• Pie Charts

Histograms

A histogram is a type of chart that shows the distribution of a dataset by grouping
the data into a set of bins and counting the number of observations that fall into
each bin. Histograms are commonly used to show the shape and spread of
continuous data.
Syntax
The basic syntax for creating a histogram using R is −

hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used −

v is a vector containing numeric values used in histogram.

main indicates title of the chart.

col is used to set color of the bars.

border is used to set border color of each bar.

xlab is used to give description of x-axis.

xlim is used to specify the range of values on the x-axis.

ylim is used to specify the range of values on the y-axis.

breaks controls the binning: either a single number suggesting how many bins to use,
or a vector of explicit breakpoints between bins.


Example
A simple histogram is created using input vector, label, col and border parameters.
The script given below will create and save the histogram in the current R working
directory.
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.


png(file = "histogram.png")

# Create the histogram.


hist(v,xlab = "Weight",col = "yellow",border = "blue")

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
Range of X and Y values
To specify the range of values shown on the X axis and Y axis, we can use the xlim and
ylim parameters.
The breaks parameter controls the binning, and therefore the width of each bar.
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.


png(file = "histogram_lim_breaks.png")

# Create the histogram.


# xlim extends to 50 so that the bin containing the largest value (41) stays visible.
hist(v, xlab = "Weight", col = "green", border = "red", xlim = c(0, 50), ylim = c(0, 5),
breaks = 5)

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
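
To choose sensible xlim and ylim values, it can help to look at the bins before drawing anything. The following is a minimal sketch, assuming the same vector v as above: calling hist() with plot = FALSE returns the breakpoints and counts without drawing a chart. The names h, x_range and y_max are illustrative only.

```r
# Same data as in the example above.
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(v, breaks = 5, plot = FALSE)

# Breakpoints chosen by R and the number of observations in each bin.
print(h$breaks)
print(h$counts)

# Axis limits that cover every bar.
x_range <- range(h$breaks)
y_max <- max(h$counts)

hist(v, xlab = "Weight", col = "green", border = "red",
     breaks = h$breaks, xlim = x_range, ylim = c(0, y_max))
```

Deriving the limits from the histogram object avoids cutting off a bar when the data change.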
hist() is the basic function for creating a histogram in R. At a minimum it takes a single
argument, a numeric vector, and draws a histogram with default settings.
Histograms are used to display the distribution of a numeric variable. Here's an
example using the "mtcars" dataset:

# Load the dataset


data(mtcars)
# Create a histogram of the mpg variable
hist(mtcars$mpg)

# Create a simple histogram of the "mtcars" dataset


data(mtcars)
hist(mtcars$mpg, main = "Distribution of Miles per Gallon in Cars",
xlab = "Miles per Gallon", col = "blue", breaks = 10)

In this example, we first load the mtcars dataset, which contains information about
different car models. We then pass the mpg column of the mtcars dataset to the
hist() function, which creates a histogram of the data. The main argument specifies
the title of the plot, the xlab argument specifies the label for the x-axis, the col
argument specifies the color of the bars in the plot, and the breaks argument
specifies the number of bins to use in the histogram.
You can also overlay histograms to compare the distribution of different
groups of data. Here's an example:

# Overlay histograms of petal length for the three species in the "iris" dataset
data(iris)
# Shared breakpoints so that all three histograms use the same bins and axes
brks <- seq(1, 7, by = 0.5)
hist(iris$Petal.Length[iris$Species == "setosa"], breaks = brks, col = "blue",
main = "Distribution of Petal Length in Iris Flowers by Species",
xlab = "Petal Length (cm)", ylim = c(0, 40))
hist(iris$Petal.Length[iris$Species == "versicolor"], breaks = brks, col = "green",
add = TRUE)
hist(iris$Petal.Length[iris$Species == "virginica"], breaks = brks, col = "red",
add = TRUE)
legend("topright", c("setosa", "versicolor", "virginica"), fill = c("blue", "green", "red"))

In this example, we first load the iris dataset, which contains information about
different species of iris flowers. We then create three histograms of the Petal.Length
variable in the iris dataset, one per species. We pass the Petal.Length values for
each species to the hist() function, along with other arguments to specify the color,
title, label, and breakpoints. We pass add = TRUE to the second and third calls so that
they are drawn on top of the first plot instead of starting a new one, and we use the
legend() function to add a legend to the plot. All three calls share the same breaks
vector, so the bars line up on a common x-axis. Overlaying with par(new = TRUE) instead
would let each histogram choose its own axis scale, so the bars would not be comparable.

Histograms can be useful for identifying the shape and spread of a dataset, as well as
for detecting any outliers or unusual patterns in the data. By displaying the
frequency or density of values within each bin, histograms provide a quick and easy
way to understand the distribution of a dataset.
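
The difference between frequency and density can be sketched as follows. This example assumes the mtcars dataset used earlier; the name total_area is illustrative. With freq = FALSE the bar heights are densities, so the areas of the bars add up to 1 and a smooth density estimate can be drawn on the same scale.

```r
data(mtcars)

# freq = FALSE plots densities instead of counts, so the bar areas sum to 1.
h <- hist(mtcars$mpg, freq = FALSE, col = "lightblue",
          main = "Density of Miles per Gallon", xlab = "Miles per Gallon")

# Overlay a kernel density estimate on the same scale.
lines(density(mtcars$mpg), col = "red", lwd = 2)

# Total area of the bars: density times bin width, summed over all bins.
total_area <- sum(h$density * diff(h$breaks))
print(total_area)
```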
# Create a histogram of the "Sepal.Length" variable from the iris dataset
data(iris)
hist(iris$Sepal.Length, main = "Histogram of Sepal Length", xlab = "Sepal Length",
ylab = "Frequency", col = "blue")

In this example, we first load the iris dataset, which contains information on various
iris flowers. We then pass the Sepal.Length variable to the hist() function to create
the histogram. The main argument specifies the title of the plot, the xlab argument
specifies the label for the x-axis, and the ylab argument specifies the label for the y-
axis. The col argument specifies the color of the bars in the histogram.

You can also influence the number of bins in the histogram by passing the breaks
argument. Here's an example:
# Create a histogram of the "Petal.Length" variable from the iris dataset with about 15 bins
data(iris)
hist(iris$Petal.Length, breaks = 15, main = "Histogram of Petal Length", xlab =
"Petal Length", ylab = "Frequency", col = "red")

In this example, we create a histogram of the Petal.Length variable from the iris
dataset and ask for about 15 bins. When breaks is a single number, R treats it as a
suggestion and rounds the breakpoints to convenient values, so the actual number of
bins may differ.
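
When an exact set of bins is needed, breaks can be given as a vector of breakpoints instead of a single number. The sketch below compares the two forms on the iris data; the names h_suggested and h_exact are illustrative, and plot = FALSE is used only to inspect the result without drawing.

```r
data(iris)

# A single number is only a suggestion; R picks "pretty" breakpoints near it.
h_suggested <- hist(iris$Petal.Length, breaks = 15, plot = FALSE)

# A vector of breakpoints is used exactly as given: 13 points make 12 bins.
h_exact <- hist(iris$Petal.Length, breaks = seq(1, 7, by = 0.5), plot = FALSE)

print(length(h_suggested$counts))  # number of bins R actually used
print(length(h_exact$counts))      # 12
```

The breakpoint vector must span the full range of the data, otherwise hist() stops with an error.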
Box plots
A box plot (also known as a box-and-whisker plot) is a type of chart that displays the
distribution of a dataset using five summary statistics: the minimum value, the first
quartile, the median, the third quartile, and the maximum value. Box plots are
commonly used to show the distribution of continuous data and to identify outliers.
By default, boxplot() draws points lying more than 1.5 times the interquartile range
beyond the box as individual outliers, and the whiskers stop at the most extreme values
inside that range.
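
The summary statistics behind a box plot can be inspected directly. The sketch below uses boxplot.stats() from base R on the mpg column of mtcars; the variable name s is illustrative. Note that the box edges are "hinges", which are close to, but not always identical to, the quartiles returned by quantile().

```r
data(mtcars)

# boxplot.stats() returns the values that boxplot() would draw.
s <- boxplot.stats(mtcars$mpg)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(s$stats)

# out: points beyond the whiskers, which would be plotted individually.
print(s$out)
```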
Syntax
The basic syntax to create a boxplot in R is −

boxplot(x, data, notch, varwidth, names, main)


Following is the description of the parameters used −

x is a vector or a formula.

data is the data frame.

notch is a logical value. Set as TRUE to draw a notch.

varwidth is a logical value. Set as TRUE to draw the width of each box proportional to
the square root of its sample size.

names are the group labels which will be printed under each boxplot.

main is used to give a title to the graph.

Creating the Boxplot


The below script will create a boxplot graph for the relation between mpg (miles per
gallon) and cyl (number of cylinders).
# Give the chart file a name.
png(file = "boxplot.png")

# Plot the chart.


boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
ylab = "Miles Per Gallon", main = "Mileage Data")

# Save the file.


dev.off()

When we execute the above code, it produces the following result −


Boxplot with Notch
We can draw a boxplot with notches to compare the medians of different data groups:
if the notches of two boxes do not overlap, this suggests that their medians differ.
The below script will create a boxplot graph with a notch for each data group.
# Give the chart file a name.
png(file = "boxplot_with_notch.png")

# Plot the chart.


boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
main = "Mileage Data",
notch = TRUE,
varwidth = TRUE,
col = c("green","yellow","purple"),
names = c("High","Medium","Low")
)
# Save the file.
dev.off()
When we execute the above code, it produces the following result −

Here's an example of how to create a simple box plot in R:


# Create a simple box plot of the "iris" dataset
data(iris)
boxplot(iris$Sepal.Length, main = "Distribution of Sepal Length in Iris Flowers", ylab =
"Sepal Length (cm)", col = "blue")

In this example, we first load the iris dataset, which contains information about
different species of iris flowers. We then pass the Sepal.Length column of the iris
dataset to the boxplot() function, which creates a box plot of the data. The main
argument specifies the title of the plot, the ylab argument specifies the label for the
y-axis, and the col argument specifies the color of the boxes in the plot.

You can also create side-by-side box plots to compare the distribution of different
variables or groups of data. Here's an example:

# Create side-by-side box plots from the "mtcars" dataset, showing the miles per gallon
# for different car models by transmission type
data(mtcars)
boxplot(mpg ~ am, data = mtcars, main = "Miles per Gallon for Different Cars by Transmission Type",
xlab = "Transmission Type", ylab = "Miles per Gallon", col =
c("blue", "red"), names = c("Automatic", "Manual"))
In this example, we use the formula mpg ~ am to specify that we want to create a
box plot of the mpg variable, grouped by the am variable (which indicates the type of
transmission: 0 for automatic, 1 for manual). We pass the mtcars data frame to the data
argument so that boxplot() can find both variables, along with other arguments to
specify the title, labels, colors, and names of the plot.
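
To see what the formula grouping computes, the statistics of each box can be retrieved without drawing the chart. This is a small sketch using the same mpg ~ am formula; the names b, box_medians and direct_medians are illustrative. It checks that the line in the middle of each box is the group median.

```r
data(mtcars)

# plot = FALSE returns the statistics without drawing the chart.
b <- boxplot(mpg ~ am, data = mtcars, plot = FALSE)

# Row 3 of b$stats holds the median of each group (am = 0, then am = 1).
box_medians <- b$stats[3, ]

# The same medians computed directly.
direct_medians <- tapply(mtcars$mpg, mtcars$am, median)

print(box_medians)
print(direct_medians)
```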

Box plots can be useful for identifying the spread and skewness of a dataset, as well
as for detecting outliers and comparing the distribution of different variables or
groups of data. By displaying summary statistics such as the median and quartiles,
box plots provide a quick and easy way to understand the distribution of a dataset.

Here's an example using the "iris" dataset:


# Load the dataset
data(iris)
# Create a box plot of the petal length variable by species
boxplot(iris$Petal.Length ~ iris$Species)

# Create a basic box plot of the "mpg" variable from the mtcars dataset
data(mtcars)
boxplot(mtcars$mpg, main = "Box Plot of MPG", ylab = "Miles per Gallon", col =
"blue")

In this example, we first load the mtcars dataset, which contains information on
various cars. We then pass the mpg variable to the boxplot() function to create the
box plot. The main argument specifies the title of the plot, the ylab argument
specifies the label for the y-axis, and the col argument specifies the color of the
boxes in the plot.

You can also create side-by-side box plots of several variables by passing multiple
vectors to the boxplot() function. Here's an example:

# Create a side-by-side box plot of the "mpg" and "wt" variables from the mtcars
# dataset
data(mtcars)
boxplot(mtcars$mpg, mtcars$wt, names = c("MPG", "Weight"),
main = "Side-by-Side Box Plot of MPG and Weight", ylab = "Value", col = c("blue", "red"))

In this example, we create a side-by-side box plot of the mpg and wt variables from
the mtcars dataset. The names argument specifies the labels for the boxes in the
plot, and the col argument specifies the colors of the boxes.

You can also create grouped box plots by passing a formula with a grouping variable
to the boxplot() function. Here's an example:

# Create a grouped box plot of the "mpg" variable from the mtcars dataset by
# number of cylinders
data(mtcars)
boxplot(mtcars$mpg ~ mtcars$cyl, main = "Grouped Box Plot of MPG by Number of Cylinders",
xlab = "Number of Cylinders", ylab = "Miles per Gallon", col = c("blue",
"red", "green"))

In this example, we create a grouped box plot of the mpg variable from the mtcars
dataset by number of cylinders. The ~ symbol indicates that we want to group the
data by the cyl variable. The xlab argument specifies the label for the x-axis, and the
col argument specifies the colors of the boxes in the plot.
Bar charts
A bar graph (also known as a bar chart) is a type of chart that displays data as a
series of rectangular bars. Bar graphs are commonly used to compare different
categories or groups of data, and are particularly useful for showing changes in data
over discrete periods.
Syntax
The basic syntax to create a bar-chart in R is −

barplot(H,xlab,ylab,main, names.arg,col)
Following is the description of the parameters used −

H is a vector or matrix containing numeric values used in bar chart.


xlab is the label for x axis.
ylab is the label for y axis.
main is the title of the bar chart.
names.arg is a vector of names appearing under each bar.
col is used to give colors to the bars in the graph.

Example
A simple bar chart is created using just the input vector and the name of each bar.
The below script will create and save the bar chart in the current R working directory.
# Create the data for the chart
H <- c(7,12,28,3,41)

# Give the chart file a name


png(file = "barchart.png")

# Plot the bar chart


barplot(H)

# Save the file


dev.off()
When we execute the above code, it produces the following result −
Bar Chart Labels, Title and Colors
The features of the bar chart can be expanded by adding more parameters.
The main parameter is used to add a title. The col parameter is used to add colors to
the bars. The names.arg parameter is a vector with the same number of values as the
input vector, describing the meaning of each bar.
Example
The below script will create and save the bar chart in the current R working directory.
# Create the data for the chart
H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")

# Give the chart file a name


png(file = "barchart_months_revenue.png")

# Plot the bar chart


barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="blue",
main="Revenue chart",border="red")

# Save the file


dev.off()

When we execute the above code, it produces the following result −


Group Bar Chart and Stacked Bar Chart
We can create bar chart with groups of bars and stacks in each bar by using a matrix
as input values.
More than two variables are represented as a matrix which is used to create the
group bar chart and stacked bar chart.
# Create the input vectors.
colors = c("green","orange","brown")
months <- c("Mar","Apr","May","Jun","Jul")
regions <- c("East","West","North")

# Create the matrix of the values.


Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow = 3, ncol = 5,
byrow = TRUE)

# Give the chart file a name


png(file = "barchart_stacked.png")

# Create the bar chart


barplot(Values, main = "total revenue", names.arg = months, xlab =
"month", ylab = "revenue", col = colors)
# Add the legend to the chart
legend("topleft", regions, cex = 1.3, fill = colors)

# Save the file


dev.off()
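
The script above draws stacked bars, which is the default for a matrix. A grouped version can be sketched by adding beside = TRUE; everything else is the same data as above, and the name mids is illustrative (barplot() returns the bar midpoints, which here form one row per region and one column per month).

```r
colors <- c("green", "orange", "brown")
months <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")

Values <- matrix(c(2, 9, 3, 11, 9, 4, 8, 7, 3, 12, 5, 2, 8, 10, 11),
                 nrow = 3, ncol = 5, byrow = TRUE)

# beside = TRUE places the three regions next to each other within each month.
mids <- barplot(Values, beside = TRUE, main = "Revenue by region",
                names.arg = months, xlab = "Month", ylab = "Revenue",
                col = colors)

legend("topleft", regions, cex = 1.3, fill = colors)
```

Stacked bars emphasise the monthly totals, while grouped bars make it easier to compare regions within a month.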
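Bar charts are used to display the frequency or proportion of categorical data. Here's
an example using the "Titanic" dataset:
# Load the dataset
data(Titanic)
# Titanic is a four-dimensional table (Class, Sex, Age, Survived), not a data frame,
# so sum the counts over Sex and Age to get a Survived-by-Class table
survival_by_class <- margin.table(Titanic, c(4, 1))
# Create a bar chart of survival by class
barplot(survival_by_class, col = c("red", "green"), legend.text = TRUE)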

# Create a bar graph of the number of cars by transmission type and number of
# cylinders in the mtcars dataset
data(mtcars)
cars_by_trans_cyl <- table(mtcars$am, mtcars$cyl)
barplot(cars_by_trans_cyl, main = "Number of Cars by Transmission Type and Cylinders",
xlab = "Number of Cylinders", ylab = "Count", col = c("blue", "red"))

In this example, we first load the mtcars dataset, which contains information on
various cars. We then use the table() function to count the cars for each combination
of transmission type (am, where 0 represents automatic transmissions and 1 represents
manual transmissions) and number of cylinders (cyl). We then pass this table to the
barplot() function to create the bar graph. The main argument specifies the title of
the plot, the xlab argument specifies the label for the x-axis, and the ylab argument
specifies the label for the y-axis. The col argument specifies the colors of the bars
in the graph.

You can also create stacked or grouped bar graphs by passing a matrix or a two-way
table to the barplot() function. Bars are stacked by default; setting beside = TRUE
draws them side by side instead. Here's an example of a stacked bar graph:

# Create a stacked bar graph of the number of cars by transmission type and number
# of cylinders in the mtcars dataset
data(mtcars)
cars_by_type_cyl <- table(mtcars$am, mtcars$cyl)
barplot(cars_by_type_cyl, main = "Number of Cars by Transmission Type and Number of Cylinders",
xlab = "Number of Cylinders", ylab = "Count", col = c("blue",
"red"), legend.text = c("Automatic", "Manual"), args.legend = list(x = "topright"))

In this example, we create a table of the number of cars by transmission type and
number of cylinders. We then pass this table to the barplot() function to create a
stacked bar graph. The legend.text argument specifies the labels for the legend, and the
args.legend argument specifies the location of the legend on the plot.
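
Bar charts can show proportions as well as counts. As a minimal sketch, the table of counts can be converted with prop.table() before plotting; the names counts and shares are illustrative. With margin = 2 each column is divided by its own total, so every stacked bar has a height of 1.

```r
data(mtcars)
counts <- table(mtcars$am, mtcars$cyl)

# margin = 2 divides each column by its total, so every bar has height 1.
shares <- prop.table(counts, margin = 2)

barplot(shares, main = "Share of Transmission Types by Number of Cylinders",
        xlab = "Number of Cylinders", ylab = "Proportion",
        col = c("blue", "red"),
        legend.text = c("Automatic", "Manual"),
        args.legend = list(x = "topright"))
```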

# Create a simple bar graph of the "mtcars" dataset


data(mtcars)
barplot(mtcars$mpg, main = "Miles per Gallon for Different Cars",
xlab = "Car Model", ylab = "Miles per Gallon", col = "blue")

In this example, we first load the mtcars dataset, which contains information about
different car models. We then pass the mpg column of the mtcars variable to the
barplot() function, which creates a bar graph of the data. The main argument
specifies the title of the plot, the xlab argument specifies the label for the x-axis, the
ylab argument specifies the label for the y-axis, and the col argument specifies the
color of the bars in the plot.
You can also plot a summary statistic for each group by computing it first and passing
the result to the barplot() function. Here's an example:
# Create a bar graph of the "mtcars" dataset, showing the mean miles per gallon
# by transmission type
data(mtcars)
mpg_by_trans <- tapply(mtcars$mpg, mtcars$am, mean)
barplot(mpg_by_trans, main = "Mean Miles per Gallon by Transmission Type",
xlab = "Transmission Type", ylab = "Miles per Gallon", col = c("blue", "red"),
names.arg = c("Automatic", "Manual"))

In this example, we first use the tapply() function to calculate the mean miles per
gallon for each transmission type. We then pass the output of the tapply() function
to the barplot() function to create a bar graph with one bar per group.
The main argument specifies the title of the plot, the xlab argument specifies the
label for the x-axis, the ylab argument specifies the label for the y-axis, and the col
argument specifies the colors of the bars in the plot. The names.arg argument replaces
the group codes 0 and 1 with readable labels under the bars.

Bar graphs can be useful for comparing different categories or groups of data, such
as sales figures for different products or the number of students enrolled in different
courses. By displaying data as rectangular bars, bar graphs make it easy to see the
differences between categories and identify trends in the data.
Line graphs
A line graph (also known as a line chart or line plot) is a type of chart that displays
information as a series of data points connected by straight lines. Line graphs are
commonly used to display trends over time, and are particularly useful for showing
changes in data over a continuous period.
Syntax
The basic syntax to create a line chart in R is −

plot(v,type,col,xlab,ylab,main)
Following is the description of the parameters used −

v is a vector containing the numeric values.

type takes the value "p" to draw only the points, "l" to draw only the lines and "o" to
draw both points and lines.

xlab is the label for x axis.

ylab is the label for y axis.

main is the Title of the chart.

col is used to give colors to both the points and lines.

Example
A simple line chart is created using the input vector and the type parameter as "o".
The below script will create and save a line chart in the current R working directory.
# Create the data for the chart.
v <- c(7,12,28,3,41)

# Give the chart file a name.


png(file = "line_chart.png")

# Plot the line chart.


plot(v,type = "o")

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
Line Chart Title, Color and Labels
The features of the line chart can be expanded by using additional parameters. We
add color to the points and lines, give a title to the chart and add labels to the axes.
Example
# Create the data for the chart.
v <- c(7,12,28,3,41)

# Give the chart file a name.


png(file = "line_chart_label_colored.png")

# Plot the line chart.


plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
main = "Rain fall chart")

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
Multiple Lines in a Line Chart
More than one line can be drawn on the same chart by using the lines() function.
After the first line is plotted, the lines() function can use an additional vector as input
to draw the second line in the chart.
# Create the data for the chart.
v <- c(7,12,28,3,41)
t <- c(14,7,6,19,3)

# Give the chart file a name.


png(file = "line_chart_2_lines.png")

# Plot the line chart.


plot(v,type = "o",col = "red", xlab = "Month", ylab = "Rain fall",
main = "Rain fall chart")

lines(t, type = "o", col = "blue")

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
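
One thing to watch for: plot() sets the axis limits from the first series only, so a later line with larger values would be cut off. The sketch below shows one way to handle this by computing the limits from both series first. The second series t2 is hypothetical data chosen to exceed the range of v, and the name y_limits is illustrative.

```r
v <- c(7, 12, 28, 3, 41)
# Hypothetical second series whose largest value exceeds max(v).
t2 <- c(14, 7, 6, 19, 50)

# Compute limits that cover both series before drawing the first line.
y_limits <- range(v, t2)

plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
     main = "Rain fall chart", ylim = y_limits)
lines(t2, type = "o", col = "blue")
legend("topleft", legend = c("v", "t2"), col = c("red", "blue"),
       lty = 1, pch = 1)
```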

Line graphs are used to display trends over time or across a continuous variable.
Here's an example using the "AirPassengers" dataset:
# Load the dataset
data(AirPassengers)
# Create a line graph of the number of air passengers over time
plot(AirPassengers, type="l", xlab="Year", ylab="Number of Passengers")

In this example, we first load the AirPassengers dataset, which contains monthly
passenger counts for an airline. We then pass the AirPassengers variable to the plot()
function, along with the type = "l" argument, which specifies that we want to create
a line graph. The xlab argument specifies the label for the x-axis and the ylab
argument specifies the label for the y-axis. A title and a line color could be added
with the main and col arguments.

You can also draw multiple lines on the same plot by passing additional
series to the lines() function. Here's an example:

# Create a line graph of the "AirPassengers" dataset, along with a second line
# showing a moving average of the same data
library(forecast) # provides ma(), which is not part of base R
data(AirPassengers)
plot(AirPassengers, type = "l", main = "Airline Passengers over Time", xlab = "Year",
ylab = "Passenger Count", col = "blue")
lines(ma(AirPassengers, order = 12), col = "red")

In this example, we first create the line graph using the plot() function, this time
with a title and a blue line. We then pass the AirPassengers variable to the ma()
function from the forecast package, which calculates a moving average of the data with
a window size of 12 (i.e., a 12-month moving average). We then pass the output of the
ma() function to the lines() function to draw a second line on the same plot, this
time in red.
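
The ma() function used above is not part of base R (it is provided by the forecast package). A similar smoothed line can be sketched with base R alone using stats::filter(). This is an illustrative alternative, not a drop-in replacement: the weights w below describe a centered 12-month average with half weight on the two end months, and the names w and trend are assumptions made for this example.

```r
data(AirPassengers)

# Weights for a centered 12-month average: half weight on the two end months.
w <- c(0.5, rep(1, 11), 0.5) / 12

# sides = 2 centers the window on each month; the first and last 6 values are NA.
trend <- stats::filter(AirPassengers, w, sides = 2)

plot(AirPassengers, type = "l", main = "Airline Passengers over Time",
     xlab = "Year", ylab = "Passenger Count", col = "blue")
lines(trend, col = "red")
```

The smoothed line removes the seasonal peaks, which makes the long-term upward trend easier to see.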

Line graphs can be useful for visualizing various types of data over time, such as
stock prices, weather patterns, or website traffic. By displaying data points and
trends in a simple and easy-to-understand way, line graphs can help you identify
patterns and make informed decisions based on the data.
Scatter plots

A scatter plot is a type of chart that displays the relationship between two
continuous variables. It is used to show how one variable is affected by another
variable, and to identify any patterns or trends in the data.
Syntax
The basic syntax for creating scatterplot in R is −

plot(x, y, main, xlab, ylab, xlim, ylim, axes)


Following is the description of the parameters used −

x is the data set whose values are the horizontal coordinates.

y is the data set whose values are the vertical coordinates.

main is the title of the graph.

xlab is the label in the horizontal axis.

ylab is the label in the vertical axis.

xlim is the limits of the values of x used for plotting.

ylim is the limits of the values of y used for plotting.

axes indicates whether both axes should be drawn on the plot.

Creating the Scatterplot


The below script will create a scatterplot graph for the relation between wt (weight)
and mpg (miles per gallon).
# Get the input values.
input <- mtcars[,c('wt','mpg')]

# Give the chart file a name.


png(file = "scatterplot.png")

# Plot the chart for cars with weight between 2.5 and 5 and mileage between
# 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage"
)

# Save the file.


dev.off()
When we execute the above code, it produces the following result −

Scatterplot Matrices
When we have more than two variables and want to examine the relationship between
each variable and the remaining ones, we use a scatterplot matrix. We
use the pairs() function to create matrices of scatterplots.
Syntax
The basic syntax for creating scatterplot matrices in R is −
pairs(formula, data)
Following is the description of the parameters used −
• formula represents the series of variables used in pairs.
• data represents the data set from which the variables will be taken.
Example
Each variable is paired up with each of the remaining variables. A scatterplot is
plotted for each pair.
# Give the chart file a name.
png(file = "scatterplot_matrices.png")

# Plot the matrices between 4 variables giving 12 plots.

# One variable with 3 others and total 4 variables.

pairs(~wt+mpg+disp+cyl,data = mtcars,
main = "Scatterplot Matrix")

# Save the file.


dev.off()
When the above code is executed we get the following output.
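
A scatterplot matrix shows the relationships visually; the corresponding correlation coefficients can be computed with cor(). The sketch below uses the same four mtcars variables; the names vars and cor_matrix are illustrative.

```r
data(mtcars)

vars <- mtcars[, c("wt", "mpg", "disp", "cyl")]

# Pairwise Pearson correlations, rounded for readability.
cor_matrix <- round(cor(vars), 2)
print(cor_matrix)
```

Each cell of the matrix corresponds to one panel of the scatterplot matrix. A negative value, such as the one for wt and mpg, matches a downward-sloping cloud of points.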

Scatter plots are used to display the relationship between two numeric variables.
Here's an example of how to create a scatter plot in R:

# Create a scatter plot of the "mtcars" dataset, showing the relationship between
# horsepower and miles per gallon
data(mtcars)
plot(mtcars$hp, mtcars$mpg, main = "Horsepower vs. Miles per Gallon", xlab =
"Horsepower", ylab = "Miles per Gallon", col = "blue", pch = 20)

In this example, we first load the mtcars dataset, which contains information about
different car models. We then pass the hp and mpg columns of the mtcars dataset to
the plot() function, which creates a scatter plot of the data. The main argument
specifies the title of the plot, the xlab argument specifies the label for the x-axis, the
ylab argument specifies the label for the y-axis, the col argument specifies the color
of the data points in the plot, and the pch argument specifies the shape of the data
points.

Here's another example using the same dataset:
# Load the dataset
data(mtcars)
# Create a scatter plot of mpg versus weight
plot(mtcars$wt, mtcars$mpg, xlab="Weight", ylab="Miles Per Gallon")

You can also create a scatter plot with a regression line to show the direction and
strength of the relationship between the variables. Here's an example:
# Create a scatter plot of the "iris" dataset, showing the relationship between petal
# length and petal width, with a regression line
data(iris)
plot(iris$Petal.Length, iris$Petal.Width, main = "Petal Length vs. Petal Width", xlab =
"Petal Length (cm)", ylab = "Petal Width (cm)", col = iris$Species, pch = 19)
abline(lm(iris$Petal.Width ~ iris$Petal.Length), col = "red")
legend("topleft", c("setosa", "versicolor", "virginica"), col = 1:3, pch = 19)
In this example, we first load the iris dataset, which contains information about
different species of iris flowers. We then pass the Petal.Length and Petal.Width
columns of the iris dataset to the plot() function, along with other arguments to
specify the color, title, label, and shape of the data points. We also use the lm()
function to calculate a linear regression line for the relationship between the
variables, and the abline() function to add the regression line to the plot. We use the
legend() function to add a legend to the plot.
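
The regression line summarises direction and strength numerically as well. As a sketch, the fitted model can be stored and inspected; the names fit and r_squared are illustrative, and the formula is written with a data argument instead of the $ notation used above, which fits the same model.

```r
data(iris)

# Fit petal width as a linear function of petal length.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Intercept and slope of the fitted line.
print(coef(fit))

# R-squared: the share of variance in petal width explained by the line.
r_squared <- summary(fit)$r.squared
print(r_squared)
```

The sign of the slope gives the direction of the relationship, and an R-squared close to 1 indicates that the points lie close to the line.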

Scatter plots can be useful for identifying any patterns or trends in the relationship
between two variables. By displaying the distribution of data points in a two-
dimensional space, scatter plots provide a quick and easy way to understand the
relationship between two variables. Scatter plots can also help to identify any
outliers or unusual patterns in the data.
Pie charts

A pie-chart is a representation of values in the form of slices of a circle with different
colors. Slices are labeled with a description, and the numbers corresponding to each
slice are also shown in the chart.

A pie chart is a type of chart that displays the proportion or percentage distribution
of different categories in a dataset. It is used to show the relative sizes of different
categories in a dataset, and is particularly useful when the number of categories is
small.

Syntax
The basic syntax for creating a pie-chart in R is −

pie(x, labels, radius, main, col, clockwise)


Following is the description of the parameters used −

x is a vector containing the numeric values used in the pie chart.

labels is used to give description to the slices.

radius indicates the radius of the circle of the pie chart.(value between −1 and +1).

main indicates the title of the chart.

col indicates the color palette.

clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise.

A very simple pie-chart is created using just the input vector and labels. The below
script will create and save the pie chart in the current R working directory.

# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name.


png(file = "city.png")

# Plot the chart.


pie(x,labels)

# Save the file.


dev.off()
When we execute the above code, it produces the following result −

Pie Chart Title and Colors


We can expand the features of the chart by adding more parameters to the function. We will use
the parameter main to add a title to the chart, and the parameter col, which will use the
rainbow colour palette while drawing the chart. The length of the palette should be the same as the
number of values we have for the chart. Hence we use length(x).

Example

The below script will create and save the pie chart in the current R working directory.

# Create data for the graph.


x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name.


png(file = "city_title_colours.png")

# Plot the chart with title and rainbow color palette.


pie(x, labels, main = "City pie chart", col = rainbow(length(x)))

# Save the file.


dev.off()

Slice Percentages and Chart Legend


We can add slice percentage and a chart legend by creating additional chart
variables.
# Create data for the graph.
x <- c(21, 62, 10,53)
labels <- c("London","New York","Singapore","Mumbai")

piepercent<- round(100*x/sum(x), 1)

# Give the chart file a name.


png(file = "city_percentage_legends.png")

# Plot the chart.


pie(x, labels = piepercent, main = "City pie chart",col = rainbow(length(x)))
legend("topright", c("London","New York","Singapore","Mumbai"), cex = 0.8,
fill = rainbow(length(x)))

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
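
As an alternative to a separate legend, the city names and percentages can be combined into one label per slice. This is a small sketch using the same data; the name slice_labels is illustrative.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

piepercent <- round(100 * x / sum(x), 1)

# Combine each city name with its percentage, e.g. "London (14.4%)".
slice_labels <- paste0(labels, " (", piepercent, "%)")

pie(x, labels = slice_labels, main = "City pie chart",
    col = rainbow(length(x)))
```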

3D Pie Chart
A pie chart with 3 dimensions can be drawn using additional packages. The
package plotrix has a function called pie3D() that is used for this.
# Get the library.
library(plotrix)

# Create data for the graph.


x <- c(21, 62, 10,53)
lbl <- c("London","New York","Singapore","Mumbai")

# Give the chart file a name.


png(file = "3d_pie_chart.png")

# Plot the chart.


pie3D(x,labels = lbl,explode = 0.1, main = "Pie Chart of Cities")

# Save the file.


dev.off()
When we execute the above code, it produces the following result −
Here's an example of how to create a pie chart in R:

# Create a pie chart of the "mtcars" dataset, showing the distribution of car models by number of
# cylinders

data(mtcars)

cylinders <- table(mtcars$cyl)

pie(cylinders, main = "Distribution of Car Models by Number of Cylinders", col =
rainbow(length(cylinders)))

In this example, we first load the mtcars dataset, which contains information about different car
models. We then use the table() function to create a frequency table of the number of cars with
different numbers of cylinders. We pass this frequency table to the pie() function, which creates a
pie chart of the data. The main argument specifies the title of the plot, and the col argument
specifies the colors to use for the different categories.

Pie charts can be useful for visualizing the relative sizes of different categories in a dataset. However,
they can be difficult to read if the number of categories is large, or if the differences between
categories are small. In general, it is recommended to use a pie chart when the number of categories
is small and the differences between categories are large, and to use other types of charts, such as
bar charts or stacked bar charts, when the number of categories is large or the differences between
categories are small.
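
For comparison, the same frequency table can be drawn as a bar chart. The sketch below sorts the categories by size first, which is an optional choice made here for readability; the name sorted_counts is illustrative.

```r
data(mtcars)

cylinders <- table(mtcars$cyl)

# Sort so the largest category comes first and differences are easy to read.
sorted_counts <- sort(cylinders, decreasing = TRUE)

barplot(sorted_counts, main = "Number of Cars by Number of Cylinders",
        xlab = "Number of Cylinders", ylab = "Count",
        col = rainbow(length(sorted_counts)))
```

Bar heights share a common baseline, so small differences between categories are easier to judge than differences between slice angles.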

Pie charts are used to display the proportion of categorical data. Here's an example using the
"ChickWeight" dataset:

# Load the dataset

data(ChickWeight)

# Create a pie chart of the number of weight measurements recorded for each diet group
# (ChickWeight has one row per measurement, not one row per chick)

pie(table(ChickWeight$Diet), col=c("red", "green", "blue", "orange"), main="Measurements by Diet")
