Ex 4
Scenario:
You have a Titanic dataset and want to compare metric values across many subgroups. Since a column
chart can become cluttered when you have many groups, a box plot or violin plot is often more
effective.
R Code:
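library(ggplot2)
# Assumes a data frame `titanic` with Kaggle-style columns Pclass, Age, and Survived
ggplot(titanic, aes(x = factor(Pclass), y = Age, fill = factor(Survived))) +
  geom_boxplot() +
  labs(x = "Passenger Class",
       y = "Age",
       fill = "Survived") +
  theme_minimal()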
Explanation:
• This code generates a boxplot comparing the distribution of passengers' ages across different
passenger classes (Pclass) and survival outcomes.
• Boxplots are great for comparing distributions because they show median, quartiles, and
outliers. You can also try violin plots to visualize the density of data points across groups.
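The violin-plot suggestion above can be sketched as follows. Because the Titanic data is not bundled with ggplot2, this minimal example swaps in the mpg dataset that ships with the package (highway mileage by vehicle class). The object name violin_plot and the axis labels are chosen here for illustration only.

```r
library(ggplot2)

# mpg ships with ggplot2, so no external file is needed (swapped in for Titanic)
violin_plot <- ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin(trim = FALSE, alpha = 0.6) +      # density shape of each group
  geom_boxplot(width = 0.1, fill = "white") +   # overlay median and quartiles
  labs(x = "Vehicle Class", y = "Highway MPG", fill = "Class") +
  theme_minimal()

print(violin_plot)
```

With the Titanic data, the same two layers would presumably be applied to aes(x = factor(Pclass), y = Age), assuming those column names.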
Question 2: Plotting mpg Data from ggplot2 with displ on x-axis and hwy on y-axis
Scenario:
The mpg dataset in ggplot2 contains information about car models. We are plotting engine
displacement (displ) against highway miles per gallon (hwy).
R Code:
library(ggplot2)
data(mpg)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme_minimal()
Explanation:
• This is a scatterplot of engine displacement (displ) on the x-axis and highway miles per
gallon (hwy) on the y-axis.
• The scatterplot shows how engine size relates to fuel efficiency: in mpg, cars with larger
engines tend to have lower highway mileage.
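To put a number on the relationship described above, one option is to compute the correlation and overlay a fitted line. This is a minimal sketch: the straight-line fit is an assumption made for illustration (the relationship need not be linear), and the names displ_hwy_cor and trend_plot are hypothetical.

```r
library(ggplot2)

# Pearson correlation between engine displacement and highway mileage
displ_hwy_cor <- cor(mpg$displ, mpg$hwy)
print(displ_hwy_cor)  # negative: larger engines go with lower mileage

# Same scatterplot with a fitted straight line to make the trend visible
trend_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  theme_minimal()

print(trend_plot)
```

A correlation describes association only; it does not by itself show that engine size causes the change in mileage.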
Scenario:
Make a scatterplot of hwy vs cyl for the mpg dataset, mapping color to the class variable.
Customize the size, shape, and transparency of the points.
R Code:
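library(ggplot2)
# size, shape, and alpha are set outside aes() so they apply to every point
ggplot(mpg, aes(x = cyl, y = hwy, color = class)) +
  geom_point(size = 4, shape = 17, alpha = 0.6) +
  labs(x = "Number of Cylinders") +
  theme_minimal()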
Explanation:
• This scatterplot maps the number of cylinders (cyl) on the x-axis and highway miles per
gallon (hwy) on the y-axis. The points are colored by car class (class).
• size = 4 enlarges the points (the default is 1.5), shape = 17 draws filled triangles, and
alpha = 0.6 makes them partly transparent so overlapping points remain visible.
Scenario:
Given a dataset, you can create different charts like bar charts, line charts, and pie charts.
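R Code:
library(ggplot2)
# Bar chart; assumes `titanic` has Kaggle-style columns Sex and Survived
ggplot(titanic, aes(x = Sex, fill = factor(Survived))) +
  geom_bar(position = "dodge") +
  labs(x = "Gender", fill = "Survived") +
  theme_minimal()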
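# Line chart; `trend_data` is a name chosen here for the example data frame
trend_data <- data.frame(
  year = 2000:2020,
  value = c(100, 105, 110, 115, 120, 118, 122, 130, 140, 145, 150, 160, 170, 175, 180, 190, 200, 210,
            215, 220, 230)
)
ggplot(trend_data, aes(x = year, y = value)) +
  geom_line(color = "blue") +
  labs(x = "Year",
       y = "Value") +
  theme_minimal()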
Explanation:
• The bar chart compares the survival count by gender, while the line chart shows trends over
time.
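The scenario also mentions pie charts, which have no dedicated geom in ggplot2. A common approach, sketched here under the assumption that the mpg class counts are an acceptable stand-in dataset, is a single stacked bar drawn in polar coordinates. The names class_counts and pie_chart are hypothetical.

```r
library(ggplot2)

# Count cars per class; as.data.frame() yields columns `class` and `Freq`
class_counts <- as.data.frame(table(class = mpg$class))

# One stacked bar (x = "") wrapped around the circle gives a pie chart
pie_chart <- ggplot(class_counts, aes(x = "", y = Freq, fill = class)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(fill = "Class") +
  theme_void()

print(pie_chart)
```

Pie charts become hard to read with many slices, so a bar chart of the same counts is often the clearer choice.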
Scenario:
You need to create a boxplot for the given dataset that shows minimum, quartiles, and maximum
values.
R Code:
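library(ggplot2)
# Assumes `titanic` has Kaggle-style columns Survived and Age
ggplot(titanic, aes(x = factor(Survived), y = Age, fill = factor(Survived))) +
  geom_boxplot() +
  labs(x = "Survived",
       y = "Age",
       fill = "Survived") +
  theme_minimal()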
Explanation:
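• This boxplot displays the age distribution for passengers who survived and those who did not.
The box spans the first to third quartile, with a line at the median. By default,
geom_boxplot() draws each whisker to the most extreme value within 1.5 × IQR of the box and
plots anything beyond that as an individual outlier point, so the whisker ends equal the
minimum and maximum only when there are no outliers.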
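The numbers behind a boxplot can be checked directly in base R. The sketch below uses mpg$hwy as a stand-in for the Titanic Age column (an assumption, so that it runs without external data) and compares the five-number summary with the whisker ends a boxplot would draw. The variable names are hypothetical.

```r
library(ggplot2)  # only needed for the mpg example data

hwy <- mpg$hwy

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five_num <- fivenum(hwy)

# What a boxplot draws: whisker ends and any points beyond them
box_stats <- boxplot.stats(hwy)          # default coef = 1.5
whiskers  <- box_stats$stats[c(1, 5)]    # lower and upper whisker ends
outliers  <- box_stats$out               # values beyond the whiskers

print(five_num)
print(whiskers)
print(outliers)
```

If outliers is non-empty, the whisker ends differ from the true minimum or maximum. Note that geom_boxplot() computes the box from quantiles, which can differ slightly from the hinges used by these base R functions.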