
Ex 4

The document provides R code examples for visualizing data using ggplot2, including box plots, scatter plots, and bar charts, specifically using the Titanic and mpg datasets. It explains how to compare metric values across subgroups, plot engine displacement against highway MPG, and customize scatter plots with various aesthetics. Additionally, it demonstrates creating different types of charts like bar and line charts to represent survival counts and time series data.

Question 1: Visualization for Comparing Metric Values Across Different Subgroups

Scenario:

You have a Titanic dataset and want to compare metric values across many subgroups. Since a column
chart can become cluttered when you have many groups, a box plot or violin plot is often more
effective.

R Code:

# Load necessary libraries
library(ggplot2)

# Assumes the Titanic dataset is available as a CSV file
data <- read.csv("path_to_your_titanic_dataset.csv")

# Box plot comparing age distributions by passenger class and survival.
# factor() turns Survived (typically coded 0/1) into a discrete fill,
# so each class gets one box per survival outcome.
ggplot(data, aes(x = factor(Pclass), y = Age, fill = factor(Survived))) +
  geom_boxplot() +
  labs(title = "Age Distribution by Passenger Class and Survival",
       x = "Passenger Class",
       y = "Age",
       fill = "Survived") +
  theme_minimal()

Explanation:

• This code generates a boxplot comparing the distribution of passengers' ages across different
passenger classes (Pclass) and survival outcomes.

• Boxplots are great for comparing distributions because they show median, quartiles, and
outliers. You can also try violin plots to visualize the density of data points across groups.
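
The violin plot suggested above can be sketched as follows. This is a minimal example and not part of the original exercise: it uses ggplot2's built-in mpg data (hwy by class) in place of the Titanic CSV so that it runs without an external file, and the narrow box plot drawn inside each violin is an optional addition.

```r
library(ggplot2)

# Violin plot: the width of each shape reflects the estimated density of hwy
p_violin <- ggplot(mpg, aes(x = class, y = hwy, fill = class)) +
  geom_violin(trim = FALSE, alpha = 0.7) +
  # Narrow box plot inside each violin to mark the median and quartiles
  geom_boxplot(width = 0.1, fill = "white", outlier.shape = NA) +
  labs(title = "Highway MPG by Car Class",
       x = "Car Class",
       y = "Highway Miles per Gallon") +
  theme_minimal() +
  theme(legend.position = "none")

print(p_violin)
```

For the Titanic data, the same layers should work with x = factor(Pclass) and y = Age, assuming those columns exist in your file.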

Question 2: Plotting mpg Data from ggplot2 with displ on x-axis and hwy on y-axis

Scenario:

The mpg dataset in ggplot2 contains information about car models. We are plotting engine
displacement (displ) against highway miles per gallon (hwy).

R Code:

# Load necessary libraries
library(ggplot2)

# Load mpg dataset (it's included in ggplot2)
data(mpg)

# Plot displ vs hwy
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Engine Displacement vs Highway MPG",
       x = "Engine Displacement (liters)",
       y = "Highway Miles per Gallon") +
  theme_minimal()

Explanation:

• This is a scatterplot of engine displacement (displ) on the x-axis and highway miles per
gallon (hwy) on the y-axis.

• The scatterplot is useful for visualizing how engine size relates to fuel efficiency: in mpg,
cars with larger engines tend to get fewer highway miles per gallon.
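
One way to make the relationship in this scatterplot easier to read is to add a fitted trend line and compute the correlation. The sketch below is an optional extension, not part of the original exercise. The straight-line fit (method = "lm") is an assumption chosen for simplicity, and the relationship may not be strictly linear.

```r
library(ggplot2)

# Pearson correlation between engine displacement and highway MPG
r_displ_hwy <- cor(mpg$displ, mpg$hwy)
print(r_displ_hwy)

# Scatterplot with a linear trend line and its confidence band
p_trend <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Engine Displacement vs Highway MPG (with linear trend)",
       x = "Engine Displacement (liters)",
       y = "Highway Miles per Gallon") +
  theme_minimal()

print(p_trend)
```

A negative correlation here means that hwy tends to fall as displ rises.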

Question 3: Scatterplot with Customization in mpg Dataset

Scenario:

Make a scatterplot of hwy vs cyl for the mpg dataset, mapping point color to the class variable.
Customize the size, shape, and transparency of the points.

R Code:

# Load necessary libraries
library(ggplot2)

# Scatterplot with customizations
ggplot(mpg, aes(x = cyl, y = hwy, color = class)) +
  geom_point(size = 4, shape = 17, alpha = 0.6) +
  labs(title = "Scatterplot of Highway MPG vs Cylinder Count",
       x = "Number of Cylinders",
       y = "Highway Miles per Gallon",
       color = "Car Class") +
  theme_minimal()

Explanation:

• This scatterplot maps the number of cylinders (cyl) on the x-axis and highway miles per
gallon (hwy) on the y-axis. The points are colored by car class (class).

• The size = 4 makes the points bigger, shape = 17 uses triangle markers, and alpha = 0.6
applies slight transparency to the points.
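
Because cyl takes only a few distinct values, many points in this plot sit on top of each other even with transparency. A possible refinement, offered here as an optional sketch rather than part of the original exercise, is to spread the points horizontally with geom_jitter(). The jitter width of 0.2 and the use of factor(cyl) on the x-axis are choices made for this illustration.

```r
library(ggplot2)

# cyl has only a handful of distinct values, which causes overplotting
cyl_values <- sort(unique(mpg$cyl))
print(cyl_values)

# Jitter points horizontally only (height = 0 keeps hwy values exact)
p_jitter <- ggplot(mpg, aes(x = factor(cyl), y = hwy, color = class)) +
  geom_jitter(width = 0.2, height = 0, size = 3, alpha = 0.6) +
  labs(title = "Highway MPG vs Cylinder Count (jittered)",
       x = "Number of Cylinders",
       y = "Highway Miles per Gallon",
       color = "Car Class") +
  theme_minimal()

print(p_jitter)
```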

Question 4: Plot Different Charts for a Dataset

Scenario:

Given a dataset, you can create different charts like bar charts, line charts, and pie charts.

R Code (for bar chart):

# Bar chart example for Titanic dataset
# Assumes ggplot2 is loaded and `data` is the Titanic data frame from Question 1
ggplot(data, aes(x = factor(Sex), fill = factor(Survived))) +
  geom_bar(position = "dodge") +
  labs(title = "Survival Count by Gender",
       x = "Gender",
       y = "Count",
       fill = "Survived") +
  theme_minimal()

R Code (for line chart):

# Line chart example (using hypothetical time-series data)
library(ggplot2)

time_series_data <- data.frame(
  year = 2000:2020,
  value = c(100, 105, 110, 115, 120, 118, 122, 130, 140, 145, 150,
            160, 170, 175, 180, 190, 200, 210, 215, 220, 230)
)  # closing parenthesis was missing in the original

ggplot(time_series_data, aes(x = year, y = value)) +
  geom_line(color = "blue") +
  labs(title = "Time Series Data",
       x = "Year",
       y = "Value") +
  theme_minimal()

Explanation:

• The bar chart compares the survival count by gender, while the line chart shows trends over
time.
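
The scenario also mentions pie charts, which the original code does not cover. ggplot2 has no dedicated pie geom, so a common approach is to draw a single stacked bar and switch to polar coordinates. The sketch below illustrates that approach on the built-in mpg data (share of cars by drive type, drv). The choice of dataset and variable is an assumption made so the example runs without the Titanic file.

```r
library(ggplot2)

# Count cars per drive type; the result has columns drv and Freq
drv_counts <- as.data.frame(table(drv = mpg$drv))

# One stacked bar wrapped into a circle with polar coordinates
p_pie <- ggplot(drv_counts, aes(x = "", y = Freq, fill = drv)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Share of Cars by Drive Type",
       fill = "Drive Type") +
  theme_void()

print(p_pie)
```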

Question 5: Box Plot for Statistical Data

Scenario:

You need to create a boxplot for the given dataset that shows minimum, quartiles, and maximum
values.

R Code:

# Boxplot for statistical data
ggplot(data, aes(x = factor(Survived), y = Age, fill = factor(Survived))) +
  geom_boxplot() +
  labs(title = "Age Distribution by Survival",
       x = "Survived",
       y = "Age",
       fill = "Survived") +
  theme_minimal()

Explanation:

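• This boxplot displays the age distribution for passengers who survived and those who did not.
Each box shows the first quartile, median, and third quartile. In ggplot2 the whiskers extend to
the most extreme values within 1.5 times the interquartile range of the box, and points beyond
that are drawn individually as outliers, so the whisker ends equal the minimum and maximum only
when there are no outliers.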
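
To see the numbers behind a box plot, the summary statistics can be computed directly. The sketch below uses base R's fivenum() and boxplot.stats() on mpg$hwy, swapped in for the Titanic Age column so the example runs without an external file. Note that base R computes hinges, which can differ slightly from the quartiles ggplot2 draws.

```r
library(ggplot2)  # loaded only for the built-in mpg data

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mpg$hwy)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# Box plot statistics with the usual 1.5 * IQR whisker rule
bs <- boxplot.stats(mpg$hwy, coef = 1.5)
whiskers <- bs$stats[c(1, 5)]  # whisker ends (lower, upper)
outliers <- bs$out             # values lying beyond the whiskers
print(whiskers)
print(outliers)
```

If outliers is non-empty, the whisker ends differ from the true minimum or maximum, which is why the two should not be read as interchangeable.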
