19PDSC205 Lab Manual

The documents demonstrate various data structures and operations in R programming. The first shows how to find the maximum of three numbers using logical operators. The second defines functions to build a simple calculator program. The third and fourth documents show how to create, manipulate and perform

Uploaded by

U1 cutz
Copyright: © All Rights Reserved

1. Maximum among three numbers

a <- 5  # assumed example value; the original assignment of a is missing from the source
b <- 9
c <- 3

if (a >= b && a >= c) {
  print(a)
} else if (b >= a && b >= c) {
  print(b)
} else {
  print(c)
}

In this program, we first set the values of three variables a, b, and c, and then compare them using the
logical AND operator (&&). The first condition checks whether a is greater than or equal to both b and c.
If it is, then a is the maximum number, so we print a. If a is not the maximum, we move to the
else if block, which checks whether b is greater than or equal to both a and c. If it is, then b is the
maximum, so we print b. If neither a nor b is the maximum, we know that c must be the maximum, so we print c.

# Taking input from the user
a <- as.numeric(readline("Enter the first number: "))
b <- as.numeric(readline("Enter the second number: "))
c <- as.numeric(readline("Enter the third number: "))

# Checking for the maximum number
if (a > b) {
  if (a > c) {
    cat("The maximum number is: ", a)
  } else {
    cat("The maximum number is: ", c)
  }
} else {
  if (b > c) {
    cat("The maximum number is: ", b)
  } else {
    cat("The maximum number is: ", c)
  }
}

This program takes input from the user for three numbers and finds the maximum among them using nested
if statements and the relational operator >. The readline function returns each input as a character
string, and the as.numeric function then converts it to a number.

The first if statement checks if a is greater than b. If it is, the inner if statement checks if a is
greater than c. If both conditions are true, a is the maximum number; if only the first is true, c is the
maximum. In either case the result is printed using the cat function.

If the first condition (a > b) is false, it means b is greater than or equal to a. In that case, the if
statement in the else block checks if b is greater than c. If it is, then b is the maximum number, and
the program prints it using the cat function. If that condition is false, c is the maximum number (or
ties with it), and the program prints the result using the cat function.
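
The nested comparison described above can also be wrapped in a function and checked against R's built-in
max(). The following is only a sketch: the function name max_of_three and the test inputs are
hypothetical and not part of the original manual.

```r
# Hypothetical helper: largest of three numbers using the same nested if logic.
max_of_three <- function(a, b, c) {
  if (a > b) {
    if (a > c) a else c
  } else {
    if (b > c) b else c
  }
}

# Compare with the built-in max() for a few inputs, including ties and negatives.
inputs <- list(c(5, 9, 3), c(7, 7, 2), c(1, 1, 1), c(-4, -2, -9))
for (v in inputs) {
  result <- max_of_three(v[1], v[2], v[3])
  cat("max_of_three:", result, " built-in max():", max(v), "\n")
}
```

Because the function takes its values as arguments instead of calling readline, it can be tested without
typing input each time.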

2. Simple Calculator:

add <- function(n1, n2) {
  print(paste(n1, "+", n2, "=", n1 + n2))
}
subtract <- function(n1, n2) {
  print(paste(n1, "-", n2, "=", n1 - n2))
}
multiply <- function(n1, n2) {
  print(paste(n1, "*", n2, "=", n1 * n2))
}
divide <- function(n1, n2) {
  print(paste(n1, "/", n2, "=", n1 / n2))
}

print("*** Simple Calculator ***")
print("-----------------------------")
ch <- "y"

# || is the scalar OR operator, which is the right choice inside a while condition
while (ch == "y" || ch == "Y") {
  n1 <- as.integer(readline(prompt = "enter the value for n1: "))
  n2 <- as.integer(readline(prompt = "enter the value for n2: "))

  print("1. Addition")
  print("2. Subtraction")
  print("3. Multiplication")
  print("4. Division")
  print("enter your operation (1/2/3/4)")

  op <- as.integer(readline(prompt = "enter the operation no: "))
  # An empty or non-numeric entry becomes NA, so check for it before comparing
  if (is.na(op) || op < 1 || op > 4) {
    print("invalid operation")
  } else if (op == 1) {
    add(n1, n2)
  } else if (op == 2) {
    subtract(n1, n2)
  } else if (op == 3) {
    multiply(n1, n2)
  } else {
    divide(n1, n2)
  }
  print("do you want to continue?")
  ch <- readline(prompt = "enter y / n : ")
}

This code defines four functions: add, subtract, multiply, and divide, which perform the corresponding
arithmetic operations on two input numbers and print the result. It then enters a loop that repeatedly
prompts the user for two numbers and an operation number, and calls the matching function. The inputs are
converted with as.integer, so any fractional part is dropped. The loop continues as long as the user
answers 'y' or 'Y' when prompted; any other answer, such as 'n' or 'N', ends it.
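
As an alternative to the if / else if chain, the operation number can be looked up with switch(). This is
a minimal sketch, not the manual's method: the function name calculate is hypothetical, and returning NA
for division by zero is an assumption (plain R division would return Inf).

```r
# Hypothetical helper: pick the operation by its menu number using switch().
calculate <- function(op, n1, n2) {
  switch(as.character(op),
    "1" = n1 + n2,
    "2" = n1 - n2,
    "3" = n1 * n2,
    "4" = if (n2 == 0) NA else n1 / n2,  # assumption: report NA instead of Inf
    stop("unknown operation: ", op)      # default branch for any other number
  )
}

print(calculate(1, 8, 2))  # addition
print(calculate(2, 8, 2))  # subtraction
print(calculate(3, 8, 2))  # multiplication
print(calculate(4, 8, 2))  # division
```

Returning the value instead of printing it keeps the arithmetic separate from the user interaction, so
the same function can be reused inside the input loop.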

3. VECTOR

# Creating a numeric vector


my_numeric_vector <- c(1, 2, 3, 4, 5)
print(my_numeric_vector)

# Creating a character vector


my_character_vector <- c("John", "Mike", "Sara")
print(my_character_vector)

# Creating a logical vector


my_logical_vector <- c(TRUE, TRUE, FALSE)
print(my_logical_vector)

# Accessing the first element of a vector


first_element <- my_numeric_vector[1]
print(first_element)

# Accessing the last element of a vector


last_element <- my_numeric_vector[length(my_numeric_vector)]
print(last_element)

# Changing an element of a vector


my_numeric_vector[2] <- 6
print(my_numeric_vector)

# Adding an element to a vector


my_numeric_vector <- c(my_numeric_vector, 7)
print(my_numeric_vector)

# Removing an element from a vector


my_numeric_vector <- my_numeric_vector[-3]
print(my_numeric_vector)

# Vector addition
vector_sum <- my_numeric_vector + 2
print(vector_sum)

# Vector subtraction
vector_difference <- my_numeric_vector - 2
print(vector_difference)

# Vector multiplication
vector_product <- my_numeric_vector * 2
print(vector_product)

# Vector division
vector_quotient <- my_numeric_vector / 2
print(vector_quotient)

# Greater than
greater_than_vector <- my_numeric_vector > 2
print(greater_than_vector)

# Less than
less_than_vector <- my_numeric_vector < 2
print(less_than_vector)

# Equal to
equal_to_vector <- my_numeric_vector == 2
print(equal_to_vector)

# Concatenating two vectors
# (combining numeric and character values converts every element to character)
concatenated_vector <- c(my_numeric_vector, my_character_vector)
print(concatenated_vector)

This code demonstrates some fundamental operations on numeric, character, and logical vectors in R.
Here we:

● Created three vectors of different types: a numeric vector, a character vector,
and a logical vector.

● Accessed elements of a vector using square brackets and the index of the element
(indices in R start at 1).

● Accessed the first element and the last element of a vector.

● Changed an element of a vector by assigning a new value to its index.

● Added an element to a vector using the c() function.

● Removed an element from a vector by using negative indexing.

● Performed arithmetic operations between a vector and a single number, including addition,
subtraction, multiplication, and division. Each operation is applied element by element.

● Created logical vectors by applying comparison operators to a vector, including greater than (>),
less than (<), and equal to (==).

● Concatenated two vectors using the c() function. Because a vector holds values of a single type,
combining a numeric vector with a character vector converts the numbers to character strings.

Overall, these are important building blocks for working with vectors in R.
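
Two of the points above, type coercion in c() and logical vectors produced by comparisons, can be seen
in a few lines. The vectors below are made-up examples, not the ones from the program above.

```r
numbers <- c(1, 6, 4, 5, 7)
people <- c("John", "Mike")

# Combining numeric and character values gives a character vector.
combined <- c(numbers, people)
print(class(numbers))
print(class(combined))

# A logical vector from a comparison can be used as an index to filter values.
is_large <- numbers > 4
print(numbers[is_large])

# sum() counts the TRUE values, because TRUE is treated as 1 and FALSE as 0.
print(sum(is_large))
```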

4. LIST

# Creating a list with different types of elements


my_list <- list(Name = "John", Age = 28, Gender = "M", Salary = 50000)
my_list

# Accessing an element of a list


age <- my_list$Age
age

# Accessing an element of a list using double square brackets


age <- my_list[["Age"]]
age

# Changing an element of a list


my_list$Age <- 30
print(my_list)

# Adding an element to a list


my_list$City <- "New York"

# Removing an element from a list


my_list$Salary <- NULL

# Finding the length of a list


list_length <- length(my_list)
list_length

# Creating two lists


list1 <- list(Name = "John", Age = 28)
list2 <- list(Gender = "M", Salary = 50000)

# Concatenating two lists


concatenated_list <- c(list1, list2)
concatenated_list

# Accessing the first element of a list


first_element <- my_list[[1]]
first_element
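
# Accessing the last element of a list
last_element <- my_list[[length(my_list)]]
last_element

# Iterating over a list using a for loop
# (seq_along() is safe even when the list is empty, unlike 1:length())
for (i in seq_along(my_list)) {
  print(my_list[[i]])
}

# Iterating over a list using the lapply function
# (invisible() hides the list that lapply() returns, so each element is printed once)
invisible(lapply(my_list, print))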

This code demonstrates some fundamental operations on lists in R. Here we performed the following
operations on a list:

● Created a list with named elements of different types: two character strings (Name and Gender)
and two numeric values (Age and Salary).
● Accessed elements of a list using the dollar sign ($) and double square brackets [[]].
● Changed an element of a list by assigning a new value to it with the dollar sign ($).
● Added an element to a list using the dollar sign ($).
● Removed an element from a list by assigning NULL to it.
● Found the length of a list using the length() function.
● Created two lists and concatenated them using the c() function.
● Accessed the first element and the last element of a list using double square brackets [[]].
● Iterated over a list using a for loop and the lapply() function.
These are important building blocks for working with lists in R.
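
A point worth checking by hand is the difference between single and double square brackets on a list.
The sketch below uses a hypothetical list named person; single brackets return a shorter list, while
double brackets return the element itself.

```r
person <- list(Name = "John", Age = 28, Gender = "M")

sub_list <- person["Age"]      # single brackets: a list of length 1
age_value <- person[["Age"]]   # double brackets: the stored value itself

print(class(sub_list))
print(class(age_value))

# Arithmetic works on the extracted value, not on the one-element list.
print(age_value + 1)

# sapply() is a variant of lapply() that simplifies the result to a vector.
print(sapply(person, class))
```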

5. MATRIX

# A matrix in R is a two-dimensional array of data of the same type (numeric, character, or logical).
# Here are some basic operations that can be performed on matrices in R:

# Creating a matrix with numeric values (filled column by column by default)
my_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
my_matrix

# Creating a matrix with character values, filled row by row
# (stored under its own name so the numeric matrix is kept for the arithmetic below)
char_matrix <- matrix(c("A", "B", "C", "D", "E", "F"), nrow = 2, ncol = 3, byrow = TRUE)
char_matrix

# Accessing a specific element of a matrix
element <- my_matrix[1, 2]
element

# Accessing a specific row of a matrix
row <- my_matrix[1, ]
row

# Accessing a specific column of a matrix
col <- my_matrix[, 1]
col

# Changing an element of a matrix
my_matrix[1, 2] <- 7
my_matrix

# Adding a row to a matrix (the matrix becomes 3 x 3)
my_matrix <- rbind(my_matrix, c(8, 9, 10))
my_matrix

# Adding a column to a matrix: one value per row, so three values (the matrix becomes 3 x 4)
my_matrix <- cbind(my_matrix, c(11, 12, 13))
my_matrix

# Matrix addition (element-wise, with a single number)
matrix_sum <- my_matrix + 2
matrix_sum

# Matrix subtraction (element-wise)
matrix_difference <- my_matrix - 2
matrix_difference

# Matrix multiplication by a single number (element-wise)
matrix_product <- my_matrix * 2
matrix_product

# Matrix division (element-wise)
matrix_quotient <- my_matrix / 2
matrix_quotient

# Transposing a matrix
matrix_transpose <- t(my_matrix)
matrix_transpose

# Matrix multiplication: (3 x 4) %*% (4 x 3) gives a 3 x 3 matrix
matrix_mult <- my_matrix %*% matrix_transpose
matrix_mult

# solve() and det() need a square matrix, so a small 2 x 2 example is used here
# (square_matrix and its values are illustrative, not from the original manual)
square_matrix <- matrix(c(2, 1, 1, 3), nrow = 2)

# Inverse of a matrix
matrix_inv <- solve(square_matrix)
matrix_inv

# Determinant of a matrix
matrix_det <- det(square_matrix)
matrix_det

This code demonstrates some fundamental operations on matrices in R. Here we performed the following
operations on a matrix:

● Created a matrix with numeric values using the matrix() function.
● Created a matrix with character values using the matrix() function with the byrow parameter set
to TRUE, so that the values fill the matrix row by row.
● Accessed a specific element, row, or column of a matrix using square brackets [].
● Changed an element of a matrix using square brackets [].
● Added a row or column to a matrix using rbind() or cbind(), respectively.
● Performed basic arithmetic between a matrix and a single number, such as addition, subtraction,
multiplication, and division. These operations are applied element by element.
● Found the transpose of a matrix using the t() function.
● Multiplied two matrices using the %*% operator. The number of columns of the first matrix must
equal the number of rows of the second.
● Found the inverse of a matrix using the solve() function, which requires a square, non-singular matrix.
● Found the determinant of a matrix using the det() function, which also requires a square matrix.

Overall, this code demonstrates the basic operations that can be performed on matrices in R, which are a
powerful data structure for storing and manipulating data in a two-dimensional format.
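
The difference between the element-wise * operator and the matrix product %*%, and the meaning of the
inverse, can be checked on a small square matrix. The matrices A and B below are hypothetical examples
chosen so the results are easy to verify by hand.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # example square matrix
B <- diag(2)                           # 2 x 2 identity matrix

elementwise <- A * B     # multiplies matching entries, so off-diagonal entries become 0
product <- A %*% B       # matrix product; multiplying by the identity returns A

print(elementwise)
print(product)

# The inverse multiplied by the original matrix gives the identity
# (up to floating-point rounding, hence all.equal() instead of ==).
A_inv <- solve(A)
print(all.equal(A_inv %*% A, diag(2)))

# Determinant of A: 2 * 3 - 1 * 1
print(det(A))
```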

6. DATA FRAME

# Creating a data frame


my_data_frame <- data.frame(Name = c("John", "Mike", "Sara"),
Age = c(28, 32, 25),
Gender = c("M", "M", "F"),
Salary = c(50000, 60000, 45000))

# Displaying the data frame


print(my_data_frame)

# Accessing a specific column of a data frame


ages <- my_data_frame$Age
print(ages)

# Accessing a specific row of a data frame


row <- my_data_frame[1, ]
print(row)

# Accessing a specific element of a data frame


element <- my_data_frame[2, 3]
print(element)

# Changing an element of a data frame


my_data_frame[1, 4] <- 55000
print(my_data_frame)

# Adding a row to a data frame


new_row <- data.frame(Name = "Tom", Age = 30, Gender = "M", Salary = 70000)
my_data_frame <- rbind(my_data_frame, new_row)
print(my_data_frame)

# Adding a column to a data frame


my_data_frame$City <- c("New York", "Chicago", "Boston", "San Francisco")
print(my_data_frame)

# Removing a row from a data frame


my_data_frame <- my_data_frame[-4, ]
print(my_data_frame)

# Removing a column from a data frame


my_data_frame$City <- NULL
print(my_data_frame)

# Filtering a data frame


filtered_data <- subset(my_data_frame, Age > 27)
print(filtered_data)

# Sorting a data frame


sorted_data <- my_data_frame[order(my_data_frame$Salary), ]
print(sorted_data)

# Summarizing a data frame


summary_data <- summary(my_data_frame)
print(summary_data)

This R program demonstrates basic operations on a data frame, which is a two-dimensional table-like
data structure in R in which each column can hold a different type of data:

● Creating a data frame using the data.frame() function.
● Accessing a specific column using the $ operator, and a specific row or element using [row, column]
indexing.
● Changing a specific element in a data frame.
● Adding a row or column to a data frame using rbind() or the $ operator, respectively.
● Removing a row using negative indexing, or a column by assigning NULL to it with the $ operator.
● Filtering the data frame using the subset() function.
● Sorting the data frame by passing the result of order() as the row index.
● Summarizing the data frame using the summary() function.
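
Filtering and sorting can also be written with plain row indexing, which is what subset() and order()
build on. The sketch below uses a small hypothetical data frame named staff with made-up values.

```r
staff <- data.frame(Name = c("John", "Mike", "Sara"),
                    Age = c(28, 32, 25),
                    Salary = c(50000, 60000, 45000))

# Filtering with a logical row index; selects the same rows as subset(staff, Age > 27).
older <- staff[staff$Age > 27, ]
print(older)

# Sorting by Salary from highest to lowest.
by_salary_desc <- staff[order(staff$Salary, decreasing = TRUE), ]
print(by_salary_desc)

# Mean of each numeric column.
print(sapply(staff[, c("Age", "Salary")], mean))
```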

7. DESCRIPTIVE STATISTICS

# Descriptive statistics

dat<-iris # load the iris dataset and renamed it dat

head(dat) # first 6 observations

str(dat) # structure of dataset

min(dat$Sepal.Length) # minimum
max(dat$Sepal.Length) # maximum

rng<-range(dat$Sepal.Length)
rng
rng[1]
rng[2]

max(dat$Sepal.Length) - min(dat$Sepal.Length) # Range

# Range using function


range2 <- function(x) {
range <- max(x) - min(x)
return(range)
}

range2(dat$Sepal.Length)

mean(dat$Sepal.Length) # mean
median(dat$Sepal.Length) # median
quantile(dat$Sepal.Length, 0.5) # median using quantile() function
quantile(dat$Sepal.Length, 0.25) # first quartile
quantile(dat$Sepal.Length, 0.75) # third quartile

quantile(dat$Sepal.Length, 0.4) # 4th decile


quantile(dat$Sepal.Length, 0.98) # 98th percentile

IQR(dat$Sepal.Length) # Interquartile range


quantile(dat$Sepal.Length, 0.75) - quantile(dat$Sepal.Length, 0.25) # Interquartile range

sd(dat$Sepal.Length) # standard deviation


var(dat$Sepal.Length) # variance
# Tip: to compute the standard deviation (or variance) of multiple variables at the same time,
# use lapply() with the appropriate statistics as second argument:
lapply(dat[, 1:4], sd)

summary(dat)

by(dat, dat$Species, summary) # descriptive statistics by group

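# finding mode
tab <- table(dat$Sepal.Length) # number of occurrences for each unique value
sort(tab, decreasing = TRUE) # sort highest to lowest; the first entry is the mode

# new variable: "small" if Sepal.Length is below the median, otherwise "big"
dat$size <- ifelse(dat$Sepal.Length < median(dat$Sepal.Length), "small", "big")

# Contingency tables
table(dat$size) # frequency of each size
table(dat$Species, dat$size) # frequency of each Species and size combination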

This R code performs basic descriptive statistics on the iris dataset:

● The dataset is loaded and renamed as 'dat'.
● The first 6 observations of the dataset are displayed using the 'head()' function.
● The structure of the dataset is displayed using the 'str()' function.
● The minimum and maximum values of the 'Sepal.Length' variable are displayed using the 'min()'
and 'max()' functions.
● The range of the 'Sepal.Length' variable is computed and displayed using both manual
calculations and a user-defined function 'range2()'.
● The mean, median, and quartiles of the 'Sepal.Length' variable are displayed using the 'mean()',
'median()', and 'quantile()' functions.
● The 4th decile and 98th percentile of the 'Sepal.Length' variable are displayed using the
'quantile()' function.
● The interquartile range (IQR), standard deviation, and variance of the 'Sepal.Length' variable are
displayed using the 'IQR()', 'sd()', and 'var()' functions.
● The 'lapply()' function is used to compute the standard deviation of multiple variables at the
same time.
● The 'summary()' function is used to display summary statistics of the dataset.
● Descriptive statistics are computed by group (species) using the 'by()' function.
● The mode of the 'Sepal.Length' variable is computed by counting the number of occurrences for
each unique value and sorting them in decreasing order.
● A new variable 'size' is created based on the median of the 'Sepal.Length' variable, and a
contingency table is displayed to show the frequency of each combination of 'Species' and 'size'.
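
R has no built-in function for the statistical mode, so the table-and-sort step above can be wrapped in a
small helper. This is only a sketch; the name stat_mode is hypothetical, and returning every tied value
is a design assumption.

```r
# Hypothetical helper: return the most frequent value(s) of a vector.
stat_mode <- function(x) {
  tab <- table(x)                        # occurrences of each unique value
  modes <- names(tab)[tab == max(tab)]   # keep every value with the top count
  # table() stores the values as character names, so convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

print(stat_mode(c(2, 3, 3, 5, 5, 5, 7)))   # a single mode
print(stat_mode(c(1, 1, 2, 2)))            # a tie returns both values
print(stat_mode(iris$Sepal.Length))        # mode of the variable used above
```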

8. PREPROCESSING

# Steps in Data Preprocessing


# Step 1: Importing the Dataset
data<-read.csv("purchase.csv")
data
# Step 2: Handling the Missing Data
is.na(data$Age)# finding the missing values in age column
sum(is.na(data$Age))# counting the missing values in age column
# Replace the missing data with the average of the feature in which the data is missing:
data$Age = ifelse(is.na(data$Age),
ave(data$Age, FUN = function (x)mean(x, na.rm = TRUE)),
data$Age)

data$Salary = ifelse(is.na(data$Salary),
ave(data$Salary, FUN = function (x)mean(x, na.rm = TRUE)),
data$Salary)
# Step 3: Encoding Categorical Data.
data$Country = factor(data$Country,
levels = c('india','srilanka','pakistan'),
labels = c(1.0, 2.0 , 3.0 ))

data$Purchased = factor(data$Purchased,
levels = c('no', 'yes'),
labels = c(0.0, 1.0))
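# Treat a missing Purchased value as 0 (not purchased); the column is already a factor
data$Purchased[is.na(data$Purchased)] <- 0
data$Purchased # display the encoded column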

# Step 4: Splitting the Dataset into the Training and Test sets
library(caTools) # required library for splitting the data
set.seed(123)
# sample.split() returns TRUE if an observation goes to the training set
# and FALSE if it goes to the test set
split = sample.split(data$Purchased, SplitRatio = 0.8)

#Creating the training set and test set separately


training_set = subset(data, split == TRUE)
test_set = subset(data, split == FALSE)
training_set
test_set
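# Step 5: Feature Scaling
# Standardize the numeric Age and Salary columns (selected by name, so the
# result does not depend on the column order in purchase.csv)
training_set[, c("Age", "Salary")] = scale(training_set[, c("Age", "Salary")])
test_set[, c("Age", "Salary")] = scale(test_set[, c("Age", "Salary")])
training_set
test_set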

This code performs data preprocessing on a dataset named "purchase.csv". The dataset contains
information about customers including their age, salary, country, and whether or not they made a
purchase.

The steps involved in the data preprocessing include:

● Importing the dataset using the read.csv function.
● Handling the missing data by replacing the missing values in the Age and Salary columns with
their respective column means.
● Encoding categorical data by converting the Country and Purchased columns to factors, with the
former being assigned numerical values and the latter being assigned binary values of 0 and 1.
● Splitting the dataset into a training set and a test set using the sample.split function from the
caTools library.
● Performing feature scaling by standardizing the values in the Age and Salary columns of both the
training set and the test set using the scale function.
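
A common variation on the scaling step is to standardize the test set with the mean and standard
deviation of the training set, so that both sets share one scale. The sketch below illustrates that
variation only; the data frames train and test hold made-up values standing in for purchase.csv.

```r
# Hypothetical data standing in for the Age and Salary columns of purchase.csv.
train <- data.frame(Age = c(25, 35, 45, 55), Salary = c(30000, 50000, 70000, 90000))
test <- data.frame(Age = c(30, 50), Salary = c(40000, 80000))

# scale() stores the column means and standard deviations as attributes.
train_scaled <- scale(train)
centers <- attr(train_scaled, "scaled:center")
sds <- attr(train_scaled, "scaled:scale")

# Reuse the training parameters when scaling the test set.
test_scaled <- scale(test, center = centers, scale = sds)

print(round(colMeans(train_scaled), 10))   # training columns now have mean 0
print(test_scaled)
```
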
9. Linear Regression

# The predictor vector (height in cm).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# The response vector (weight in kg).
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y ~ x)
print(relation)
summary(relation)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation, a)
print(result)
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart: height on the x-axis and weight on the y-axis, to match lm(y ~ x).
plot(x, y, col = "blue", main = "Height & Weight Regression",
     cex = 1.3, pch = 16, xlab = "Height in cm", ylab = "Weight in Kg")
# Add the fitted regression line to the plot.
abline(relation)
# Save the file.
dev.off()

#*****************************************************************************
# Load the dataset
#data <- read.csv(file.choose())
data=women

# Check the structure of the data


str(data)

# Split the data into training and testing sets


set.seed(123)
trainIndex <- sample(1:nrow(data), 0.8*nrow(data))
trainData <- data[trainIndex,]
testData <- data[-trainIndex,]

# Fit a linear regression model


model <- lm(weight ~ ., data = trainData)

# Print the model summary


summary(model)

# Make predictions on the testing set


predictions <- predict(model, testData)

# Evaluate the model using RMSE and R-squared
library(Metrics)
# Results are stored under new names so the rmse() and r_squared() functions are not overwritten
rmse_value <- rmse(testData$weight, predictions) # rmse(actual, predicted)
r_squared <- function(predicted, actual) {
  1 - sum((actual - predicted) ^ 2) / sum((actual - mean(actual)) ^ 2)
}

# Usage
r_squared_value <- r_squared(predictions, testData$weight)

# Print the evaluation metrics
cat("RMSE:", rmse_value, "\n")
cat("R-squared:", r_squared_value, "\n")

# Visualize the results


library(ggplot2)
ggplot(data, aes(x = height, y = weight)) +
geom_point() +
geom_smooth(method = "lm") +
xlab("Predictor Variable") +
ylab("Target Variable")

This program has two parts. The first part fits a simple linear regression with the lm() function on a
small sample of heights and weights, and predicts the weight (y) for a height (x) of 170.

The second part uses the built-in women dataset and performs linear regression by splitting the
data into training and test sets, followed by evaluation metrics and a visualization.

The summary of the second part linear regression model shows the estimated coefficients for
the intercept and the predictor variable 'height', as well as the corresponding standard errors,
t-values, and p-values. The intercept is estimated to be -87.930 and the slope coefficient for
'height' is estimated to be 3.460, with a standard error of 0.100. The t-value for the slope
coefficient is 34.6 and the associated p-value is very small (9.63e-12), indicating strong evidence
against the null hypothesis that the true slope coefficient is zero. The R-squared value is 0.9917,
indicating that 99.17% of the variance in the response variable 'weight' is explained by the
predictor variable 'height'. The adjusted R-squared value, which takes into account the number
of predictor variables and the sample size, is 0.9909. The residual standard error is 1.584, which
is the estimated standard deviation of the error term in the linear regression model. The
F-statistic is 1197, with a p-value of 9.632e-12, indicating that the overall fit of the model is
significant.

The RMSE value for the model on the test set is 1.422896 and the R-squared value is 0.9697314. The
RMSE measures the typical size of the prediction error in the units of the response (pounds, for the
women dataset), so a lower RMSE indicates a better fit. The R-squared value is a measure of how much
of the variation in the target variable is explained by the predictor variable, and a higher R-squared
value indicates a better fit. In this case, the model seems to have a good fit, with both a high
R-squared and a low RMSE.
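
Both metrics can also be computed in base R without the Metrics package. The sketch below fits the model
on the full women dataset rather than on a training split (an assumption made to keep the example
short), so its numbers will differ from the ones quoted above.

```r
# Fit on the full built-in women dataset (no train/test split in this sketch).
fit <- lm(weight ~ height, data = women)
predicted <- predict(fit, women)
actual <- women$weight

# RMSE: square root of the mean squared prediction error.
rmse_manual <- sqrt(mean((actual - predicted)^2))

# R-squared: 1 minus residual sum of squares over total sum of squares.
r2_manual <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

cat("RMSE:", rmse_manual, "\n")
cat("R-squared:", r2_manual, "\n")

# On the data the model was fitted to, this agrees with the value reported by summary().
print(all.equal(r2_manual, summary(fit)$r.squared))
```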

10. Outlier Detection


# Load the dataset
data <- mtcars

# Visualize the distribution of the "mpg" variable


hist(data$mpg)
# Define a function to detect outliers based on the interquartile range (IQR)
outlier.detect <- function(x, k = 1.5) {
q <- quantile(x, probs = c(0.25, 0.75))
iqr <- q[2] - q[1]
upper <- q[2] + k * iqr
lower <- q[1] - k * iqr
return(x[x > upper | x < lower])
}

# Detect outliers in the "mpg" variable using the function


outliers <- outlier.detect(data$mpg)
outliers

# Remove the outliers


o_mpg_data <- data[!(data$mpg %in% outliers), ]

# Visualize the distribution of the "mpg" variable after removing the outliers
hist(o_mpg_data$mpg)

hist(data$wt)

# Detect outliers in the "wt" variable using the function


outliers <- outlier.detect(data$wt)
outliers

# Remove the outliers


o_wt_data <- data[!(data$wt %in% outliers), ]
dim(o_wt_data)

# Visualize the distribution of the "wt" variable after removing the outliers
hist(o_wt_data$wt)
# IQR method to detect outliers in the "wt" variable of the "mtcars" dataset
# Create a boxplot to visualize the distribution of the wt variable
data=mtcars
boxplot(data$wt)

# Calculate the upper and lower bounds for outliers using the IQR method
q1 <- quantile(data$wt, 0.25)
q3 <- quantile(data$wt, 0.75)
iqr <- q3 - q1
upper_bound <- q3 + 1.5 * iqr
lower_bound <- q1 - 1.5 * iqr

# Identify the outliers using the upper and lower bounds


outliers <- data[data$wt > upper_bound | data$wt < lower_bound,]
outliers

# Remove the outliers


o_wt_data=data[!(data$wt %in% outliers$wt), ]
dim(data)
dim(o_wt_data)
boxplot(o_wt_data$wt)

# Print the outliers
# (outliers is already a data frame of the outlying rows, so it is printed directly)
print(outliers)

This code includes:

● Loading the mtcars dataset and visualizing the distribution of the mpg variable using a
histogram.
● Defining a function called outlier.detect() to detect outliers based on the interquartile range
(IQR).
● Detecting and removing outliers in the mpg variable using the outlier.detect() function.
● Visualizing the distribution of the mpg variable after removing the outliers using a histogram.
● Detecting and removing outliers in the wt variable using the outlier.detect() function.
● Visualizing the distribution of the wt variable after removing the outliers using a histogram.
● Using the IQR method to detect outliers in the wt variable of the mtcars dataset and printing the
outliers.
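
A variation on outlier.detect() is to return a logical mask instead of the outlying values. The mask can
then select or drop whole rows directly, without matching values back with %in%. This is a sketch under
that assumption; the name iqr_outlier_mask and the small test vector are hypothetical.

```r
# Hypothetical helper: TRUE for every value outside [Q1 - k * IQR, Q3 + k * IQR].
iqr_outlier_mask <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Small made-up vector: Q1 = 2 and Q3 = 4, so the bounds are -1 and 7.
values <- c(1, 2, 3, 4, 100)
mask <- iqr_outlier_mask(values)
print(values[mask])    # the outlying value
print(values[!mask])   # the remaining values

# The same mask selects or removes rows of a data frame.
wt_mask <- iqr_outlier_mask(mtcars$wt)
print(mtcars[wt_mask, c("mpg", "wt")])
print(dim(mtcars[!wt_mask, ]))
```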

11. Exploring and manipulating data with dplyr package

In this exercise, we will use the dplyr package to explore and manipulate a dataset called mtcars.

This dataset contains information about 32 different car models, including their miles per gallon
(mpg), horsepower (hp), weight (wt), and other characteristics.

#Load the dplyr package:


library(dplyr)

#Load the mtcars dataset:


data(mtcars)

#Inspect the structure of the dataset using the str() function:


str(mtcars)

#Use the select() function to select only the columns mpg, cyl, and wt:
mtcars_select <- select(mtcars, mpg, cyl, wt)
head(mtcars_select)

#Use the filter() function to select only the rows where cyl is equal to 4:
mtcars_filter <- filter(mtcars_select, cyl == 4)
head(mtcars_filter)

#Use the arrange() function to sort the dataset by wt in descending order:


mtcars_arrange <- arrange(mtcars_select, desc(wt))
head(mtcars_arrange)
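
#Use the mutate() function to create a new column called hp_per_cyl that is equal to hp/cyl
#(hp is not among the columns of mtcars_select, so the full mtcars dataset is used here; the
#column is not named displacement because mtcars already stores displacement in disp):
mtcars_mutate <- mutate(mtcars, hp_per_cyl = hp/cyl)
head(mtcars_mutate)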

#Use the summarize() function to calculate the mean mpg, cyl, and wt for the entire dataset:
mtcars_summarize <- summarize(mtcars_select, mean_mpg = mean(mpg), mean_cyl =
mean(cyl), mean_wt = mean(wt))
mtcars_summarize

#Use the group_by() and summarize() functions to calculate the mean mpg, cyl, and wt for each
#unique value of cyl:

mtcars_group <- group_by(mtcars_select, cyl)


mtcars_summarize_group <- summarize(mtcars_group, mean_mpg = mean(mpg), mean_cyl =
mean(cyl), mean_wt = mean(wt))
mtcars_summarize_group
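
The grouped means can be cross-checked without dplyr. The sketch below uses the base R functions
aggregate() and tapply() on the same mtcars data; it is offered as a way to verify the dplyr result, not
as a replacement for it.

```r
# Mean mpg and wt for each value of cyl, using base R only.
group_means <- aggregate(cbind(mpg, wt) ~ cyl, data = mtcars, FUN = mean)
print(group_means)

# tapply() gives the same mpg means as a named vector (names are the cyl values).
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(mpg_by_cyl)
```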

12. Plots

# Load the required libraries


library(ggplot2)
library(dplyr)

# Load the data


data("iris")

# Clean and prepare the data


iris_filtered <- iris %>% filter(Species == "setosa")

# Create the plots


# Each plot is stored in a variable so that it can be displayed and saved individually

# Bar plot
bar_plot <- ggplot(iris_filtered, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Bar Plot of Sepal.Length vs Petal.Length", x = "Sepal.Length", y = "Petal.Length")
print(bar_plot)

# Line graph
line_graph <- ggplot(iris_filtered, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_line(color = "red") +
  labs(title = "Line Graph of Sepal.Length vs Petal.Length", x = "Sepal.Length", y = "Petal.Length")
print(line_graph)

# Scatter plot
scatter_plot <- ggplot(iris_filtered, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(color = "green") +
  labs(title = "Scatter Plot of Sepal.Length vs Petal.Length", x = "Sepal.Length", y = "Petal.Length")
print(scatter_plot)

# Histogram
histogram <- ggplot(iris_filtered, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.1, fill = "orange") +
  labs(title = "Histogram of Sepal.Length", x = "Sepal.Length", y = "Count")
print(histogram)

# Box plot
box_plot <- ggplot(iris_filtered, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(title = "Box Plot of Sepal.Length by Species", x = "Species", y = "Sepal.Length")
print(box_plot)

# Save the plots
# (ggsave() saves the most recent plot unless plot = is given, so each plot is passed explicitly)
ggsave("bar_plot.png", plot = bar_plot)
ggsave("line_graph.jpeg", plot = line_graph)
ggsave("scatter_plot.pdf", plot = scatter_plot)
ggsave("histogram.png", plot = histogram)
ggsave("box_plot.jpeg", plot = box_plot)
