19PDSC205 Lab Manual
b <- 9
c <- 3
In this program, we first set the values of three variables a, b, and c. Then we use logical operators &&
(AND) and || (OR) to compare these variables. We use && to check if a is greater than or equal to both b
and c. If it is, then a is the maximum number, so we print a. If a is not the maximum, we move to the
next else if block to check if b is greater than or equal to a and c. If it is, then b is the maximum, so we
print b. If neither a nor b is the maximum, we know that c must be the maximum, so we print c.
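The comparison logic described above can be sketched as follows. The value of a is not shown in the listing, so a <- 5 is an assumed placeholder, and max_value is a name introduced here for illustration; b and c use the values from the listing.

```r
a <- 5  # assumed value; not shown in the original listing
b <- 9
c <- 3

# && is TRUE only when both comparisons are TRUE
if (a >= b && a >= c) {
  max_value <- a
} else if (b >= a && b >= c) {
  max_value <- b
} else {
  # neither a nor b is the maximum, so c must be
  max_value <- c
}
print(max_value)
```

With these values the second condition holds, so the program prints 9.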
This program takes three numbers as input from the user and finds the maximum among them using nested
if statements and logical operators. Each number is read with the readline function, and the text it
returns is then converted to a number using the as.numeric function.
The first if statement checks if a is greater than b. If it is, a nested if statement checks if a is greater
than c. If both conditions are true, a is the maximum number, and the program prints the result using the
cat function. If a is greater than b but not greater than c, then c is the maximum number.
If the first condition (a > b) is false, b is greater than or equal to a. In that case, the second nested if
statement checks if b is greater than c. If it is, then b is the maximum number, and the program prints the
result using the cat function. If it is not, c is the maximum number, and the program prints the result
using the cat function.
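A minimal sketch of the nested if logic is shown below. The helper name max_nested and the prompt texts are assumptions made for illustration; the original listing may structure the code differently. The interactive part runs only in an interactive R session.

```r
# Hypothetical helper that wraps the nested if logic described above
max_nested <- function(a, b, c) {
  if (a > b) {
    if (a > c) {
      a
    } else {
      c
    }
  } else {
    if (b > c) {
      b
    } else {
      c
    }
  }
}

if (interactive()) {
  # readline() returns text, so convert it with as.numeric()
  a <- as.numeric(readline("Enter the first number: "))
  b <- as.numeric(readline("Enter the second number: "))
  c <- as.numeric(readline("Enter the third number: "))
  cat("The maximum number is", max_nested(a, b, c), "\n")
}
```

Wrapping the logic in a function keeps the comparison separate from the input handling, so it can be checked with fixed values such as max_nested(5, 1, 7).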
2. Simple Calculator:
print("1. Addition")
print("2. Subtraction")
print("3. Multiplication")
print("4. Division")
This code defines four functions: add, subtract, multiply, and divide, which perform the corresponding
arithmetic operations on two input numbers. It then enters a loop that repeatedly prompts the user for
two numbers and an operation to perform on those numbers, using the functions defined earlier. The
loop continues until the user chooses to stop by entering 'n' or 'N' when prompted.
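The structure described above might look like the following sketch. The four function names come from the description; the calculate() dispatcher, the prompt texts, and returning NA on division by zero are assumptions made for illustration.

```r
add <- function(x, y) x + y
subtract <- function(x, y) x - y
multiply <- function(x, y) x * y
divide <- function(x, y) {
  if (y == 0) {
    return(NA)  # assumed behaviour: report division by zero as NA
  }
  x / y
}

# Hypothetical dispatcher: maps the menu choice (1 to 4) to a function
calculate <- function(choice, x, y) {
  switch(as.character(choice),
         "1" = add(x, y),
         "2" = subtract(x, y),
         "3" = multiply(x, y),
         "4" = divide(x, y),
         stop("Invalid choice"))
}

if (interactive()) {
  repeat {
    x <- as.numeric(readline("Enter the first number: "))
    y <- as.numeric(readline("Enter the second number: "))
    choice <- readline("Choose an operation (1-4): ")
    cat("Result:", calculate(choice, x, y), "\n")
    again <- readline("Do you want to continue? (y/n): ")
    # stop on 'n' or 'N'
    if (tolower(again) == "n") break
  }
}
```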
3. VECTOR
# Vector addition
vector_sum <- my_numeric_vector + 2
print(vector_sum)
# Vector subtraction
vector_difference <- my_numeric_vector - 2
print(vector_difference)
# Vector multiplication
vector_product <- my_numeric_vector * 2
print(vector_product)
# Vector division
vector_quotient <- my_numeric_vector / 2
print(vector_quotient)
# Greater than
greater_than_vector <- my_numeric_vector > 2
print(greater_than_vector)
# Less than
less_than_vector <- my_numeric_vector < 2
print(less_than_vector)
# Equal to
equal_to_vector <- my_numeric_vector == 2
print(equal_to_vector)
This code demonstrates some fundamental operations on numeric, character, and logical vectors in R.
● Created three vectors of different types: a numeric vector, a character vector,
and a logical vector.
● Accessed elements of a vector using square brackets and the index of the
element.
● Accessed the first element and the last element of a vector.
● Changed an element of a vector by assigning a new value at the index of
that element.
● Created new logical vectors by applying comparison operators to a vector, including
greater than (>), less than (<), and equal to (==).
● Concatenated two vectors using the c() function.
Overall, these are important building blocks for working with vectors in R.
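The operations listed above can be sketched as follows. The vector contents and the names other than my_numeric_vector are assumed for illustration, since the original definitions are not shown.

```r
# Three vectors of different types (values are assumed)
my_numeric_vector <- c(1, 2, 3, 4, 5)
my_character_vector <- c("a", "b", "c")
my_logical_vector <- c(TRUE, FALSE, TRUE)

# Access by index; R indices start at 1
first_element <- my_numeric_vector[1]
last_element <- my_numeric_vector[length(my_numeric_vector)]

# Change the second element
my_numeric_vector[2] <- 10

# A comparison returns a logical vector of the same length
greater_than_vector <- my_numeric_vector > 2

# Concatenate two vectors with c()
combined_vector <- c(my_numeric_vector, c(6, 7))
print(combined_vector)
```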
4. LIST
This code demonstrates some fundamental operations on lists in R. We performed the following
operations on a list.
● Created a list with different types of elements, including a character string, a numeric value, and
a couple of scalar values.
● Accessed elements of a list using the dollar sign ($) and double square brackets [[]].
● Changed an element of a list by assigning a new value at the index of that
element.
● Added an element to a list using the dollar sign ($).
● Removed an element from a list by assigning NULL to it.
● Found the length of a list using the length() function.
● Created two lists and concatenated them using the c() function.
● Accessed the first element and the last element of a list using double square brackets [[]].
● Iterated over a list using a for loop and the lapply() function.
These are important building blocks for working with lists in R.
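The list operations above can be sketched as follows. The list contents and all variable names are assumed for illustration, since the original listing is not shown.

```r
# A list can hold elements of different types (contents are assumed)
my_list <- list(name = "Asha", age = 21, passed = TRUE)

# Access by name with $ or [[ ]]
my_list$name
my_list[["age"]]

# Change, add, and remove elements
my_list[["age"]] <- 22
my_list$city <- "Chennai"
my_list$passed <- NULL   # assigning NULL removes the element
length(my_list)

# Concatenate two lists with c()
other_list <- list(score = 88.5)
combined_list <- c(my_list, other_list)

# First and last elements
first_item <- combined_list[[1]]
last_item <- combined_list[[length(combined_list)]]

# Iterate with a for loop, then with lapply()
for (item in combined_list) {
  print(item)
}
item_classes <- lapply(combined_list, class)
```

The for loop is used for its side effect (printing), while lapply() returns a new list with one result per element.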
5. MATRIX
# A matrix in R is a two-dimensional array of data of the same type (numeric, character, or logical).
# Here are some basic operations that can be performed on matrices in R:
# Matrix addition
matrix_sum <- my_matrix + 2
matrix_sum
# Matrix subtraction
matrix_difference <- my_matrix - 2
matrix_difference
# Scalar multiplication (every element is multiplied by 2)
matrix_product <- my_matrix * 2
matrix_product
# Matrix division
matrix_quotient <- my_matrix / 2
matrix_quotient
# Transposing a matrix
matrix_transpose <- t(my_matrix)
matrix_transpose
# Matrix multiplication (matrix product) using %*%
matrix_mult <- my_matrix %*% matrix_transpose
matrix_mult
# Inverse of a matrix
matrix_inv <- solve(my_matrix)
matrix_inv
# Determinant of a matrix
matrix_det <- det(my_matrix)
matrix_det
This code demonstrates some fundamental operations on matrices in R: arithmetic with a scalar
(addition, subtraction, multiplication, and division), transposition, matrix multiplication, the
inverse, and the determinant. Matrices are a powerful data structure for storing and manipulating
data in a two-dimensional format.
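A small worked example may help to separate element-wise arithmetic from the matrix product. The 2 x 2 matrix below is assumed for illustration (the original definition of my_matrix is not shown); it is chosen to be invertible so that solve() succeeds.

```r
# Assumed 2 x 2 matrix, filled column by column
my_matrix <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)

# * works element by element; %*% is the matrix product
elementwise <- my_matrix * my_matrix
matrix_mult <- my_matrix %*% my_matrix

# solve() needs a square matrix with a non-zero determinant
matrix_det <- det(my_matrix)
matrix_inv <- solve(my_matrix)

# A matrix times its inverse gives the identity matrix (up to rounding)
identity_check <- my_matrix %*% matrix_inv
print(round(identity_check, 10))
```

For this matrix the determinant is 5, so the inverse exists. If the determinant were zero, solve() would stop with an error.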
6. DATA FRAME
This R program demonstrates basic operations on a data frame, which is a two-dimensional table-like
data structure in R.
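Since the listing for this section is not shown, the following is only a sketch of typical basic operations on a data frame. The students data frame, its columns, and its values are all assumed for illustration.

```r
# Assumed example data
students <- data.frame(
  name = c("Asha", "Ravi", "Meena"),
  age = c(21, 23, 22),
  marks = c(85, 72, 90),
  stringsAsFactors = FALSE
)

str(students)        # structure: column names and types
students$marks       # access a column
students[2, ]        # access a row

# Filter rows with a logical condition
high_scorers <- students[students$marks > 80, ]

# Add a new column
students$passed <- students$marks >= 75

# Add a new row with rbind(); column names must match
new_row <- data.frame(name = "Kumar", age = 24, marks = 65,
                      passed = FALSE, stringsAsFactors = FALSE)
students <- rbind(students, new_row)

summary(students$marks)
```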
7. DESCRIPTIVE STATISTICS
# Descriptive statistics
min(dat$Sepal.Length) # minimum
max(dat$Sepal.Length) # maximum
rng<-range(dat$Sepal.Length)
rng
rng[1]
rng[2]
range2(dat$Sepal.Length)
mean(dat$Sepal.Length) # mean
median(dat$Sepal.Length) # median
quantile(dat$Sepal.Length, 0.5) # median using quantile() function
quantile(dat$Sepal.Length, 0.25) # first quartile
quantile(dat$Sepal.Length, 0.75) # third quartile
summary(dat)
# finding mode
tab <- table(dat$Sepal.Length) # number of occurrences for each unique value
sort(tab, decreasing = TRUE) # sort highest to lowest
8. PREPROCESSING
data$Salary = ifelse(is.na(data$Salary),
ave(data$Salary, FUN = function (x)mean(x, na.rm = TRUE)),
data$Salary)
# Step 3: Encoding Categorical Data.
data$Country = factor(data$Country,
levels = c('india','srilanka','pakistan'),
labels = c(1.0, 2.0 , 3.0 ))
data$Purchased = factor(data$Purchased,
levels = c('no', 'yes'),
labels = c(0.0, 1.0))
data$Purchased[is.na(data$Purchased)] <- 0
as.factor(data$Purchased)
# Step 4: Splitting the Dataset into the Training and Test sets
# Training set
# Test set
library(caTools) # required library for splitting the data
set.seed(123)
# sample.split() returns TRUE if an observation goes to the training set
# and FALSE if it goes to the test set.
split = sample.split(data$Purchased, SplitRatio = 0.8)
This code performs data preprocessing on a dataset named "purchase.csv". The dataset contains
information about customers including their age, salary, country, and whether or not they made a
purchase.
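The preprocessing steps can be tried without the CSV file using the sketch below. The data values are invented for illustration, the imputation calls mean() directly instead of ave(), and the split uses base R sample() in place of caTools::sample.split() so that no extra package is needed; these are substitutions, not the original code.

```r
# Assumed stand-in for purchase.csv (values are invented)
data <- data.frame(
  Country = c("india", "srilanka", "pakistan", "india", "srilanka",
              "pakistan", "india", "srilanka", "pakistan", "india"),
  Age = c(25, 32, NA, 41, 29, 35, 38, NA, 45, 27),
  Salary = c(30000, NA, 42000, 61000, 38000, NA, 52000, 47000, 70000, 33000),
  Purchased = c("no", "yes", "no", "yes", NA, "yes", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Replace missing numeric values with the column mean
data$Age <- ifelse(is.na(data$Age), mean(data$Age, na.rm = TRUE), data$Age)
data$Salary <- ifelse(is.na(data$Salary), mean(data$Salary, na.rm = TRUE),
                      data$Salary)

# Encode categorical columns as factors
data$Country <- factor(data$Country,
                       levels = c("india", "srilanka", "pakistan"),
                       labels = c(1, 2, 3))
data$Purchased <- factor(data$Purchased, levels = c("no", "yes"),
                         labels = c(0, 1))
data$Purchased[is.na(data$Purchased)] <- 0

# 80/20 split with base R (swapped in for sample.split)
set.seed(123)
train_index <- sample(seq_len(nrow(data)), size = floor(0.8 * nrow(data)))
training_set <- data[train_index, ]
test_set <- data[-train_index, ]
```

Unlike sample.split(), this simple random split does not try to preserve the proportion of each Purchased class in the two sets.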
#*****************************************************************************
# Load the dataset
#data <- read.csv(file.choose())
data=women
# Usage
# Store the result under a different name so the rSquared function is not overwritten
rSquaredValue = rSquared(predictions, testData$weight)
The second part uses the women dataset from R and performs linear regression by splitting the
data into train and test sets with evaluation metrics and visualization.
The summary of the linear regression model in the second part shows the estimated coefficients for
the intercept and the predictor variable 'height', as well as the corresponding standard errors,
t-values, and p-values. The intercept is estimated to be -87.930 and the slope coefficient for
'height' is estimated to be 3.460, with a standard error of 0.100. The t-value for the slope
coefficient is 34.6 and the associated p-value is very small (9.63e-12), indicating strong evidence
against the null hypothesis that the true slope coefficient is zero. The R-squared value is 0.9917,
indicating that 99.17% of the variance in the response variable 'weight' is explained by the
predictor variable 'height'. The adjusted R-squared value, which takes into account the number
of predictor variables and the sample size, is 0.9909. The residual standard error is 1.584, which
is the estimated standard deviation of the error term in the linear regression model. The
F-statistic is 1197, with a p-value of 9.632e-12, indicating that the overall fit of the model is
significant.
The RMSE value for the model is 1.422896 and the R-squared value is 0.9697314. The RMSE
measures the typical size of the prediction error, in the units of the response variable, so a
lower RMSE indicates a better fit. The R-squared value measures how much of the variation in the
target variable is explained by the predictor variable, so a higher R-squared value indicates a
better fit. In this case, the model fits well, with a high R-squared value and a low RMSE.
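The workflow described above can be sketched with base R only. The split method, the seed, the rmse() helper, and the body of rSquared() are assumptions (the original definitions are not shown), so the numbers this sketch produces may differ from the values quoted above.

```r
data <- women

# Assumed 80/20 split using base R
set.seed(123)
train_index <- sample(seq_len(nrow(data)), size = floor(0.8 * nrow(data)))
trainData <- data[train_index, ]
testData <- data[-train_index, ]

# Fit weight as a linear function of height
model <- lm(weight ~ height, data = trainData)
summary(model)

predictions <- predict(model, newdata = testData)

# Root mean squared error of the predictions
rmse <- function(predicted, actual) {
  sqrt(mean((predicted - actual)^2))
}

# R-squared: 1 minus residual sum of squares over total sum of squares
rSquared <- function(predicted, actual) {
  ss_res <- sum((actual - predicted)^2)
  ss_tot <- sum((actual - mean(actual))^2)
  1 - ss_res / ss_tot
}

rmse_value <- rmse(predictions, testData$weight)
r2_value <- rSquared(predictions, testData$weight)
cat("RMSE:", rmse_value, " R-squared:", r2_value, "\n")
```

Evaluating on the held-out rows gives an estimate of how the model performs on data it was not fitted to, which is why these values can differ from the R-squared reported by summary(model).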
# Visualize the distribution of the "mpg" variable after removing the outliers
hist(o_mpg_data$mpg)
hist(data$wt)
# Visualize the distribution of the "wt" variable after removing the outliers
hist(o_wt_data$wt)
# IQR method to detect outliers in the "wt" variable of the "mtcars" dataset
# Create a boxplot to visualize the distribution of the wt variable
data=mtcars
boxplot(data$wt)
# Calculate the upper and lower bounds for outliers using the IQR method
q1 <- quantile(data$wt, 0.25)
q3 <- quantile(data$wt, 0.75)
iqr <- q3 - q1
upper_bound <- q3 + 1.5 * iqr
lower_bound <- q1 - 1.5 * iqr
This code performs the following steps:
● Loading the mtcars dataset and visualizing the distribution of the mpg variable using a
histogram.
● Defining a function called outlier.detect() to detect outliers based on the interquartile range
(IQR).
● Detecting and removing outliers in the mpg variable using the outlier.detect() function.
● Visualizing the distribution of the mpg variable after removing the outliers using a histogram.
● Detecting and removing outliers in the wt variable using the outlier.detect() function.
● Visualizing the distribution of the wt variable after removing the outliers using a histogram.
● Using the IQR method to detect outliers in the wt variable of the mtcars dataset and printing the
outliers.
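The definition of outlier.detect() is not shown in the listing, so the version below is a hypothetical reconstruction based on the 1.5 * IQR rule used in the code above. It returns a logical vector that marks the outliers; the original function may return something different.

```r
# Hypothetical reconstruction: TRUE for values outside the 1.5 * IQR fences
outlier.detect <- function(x) {
  q1 <- quantile(x, 0.25, na.rm = TRUE)
  q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  lower_bound <- q1 - 1.5 * iqr
  upper_bound <- q3 + 1.5 * iqr
  x < lower_bound | x > upper_bound
}

data <- mtcars

# Print the outliers in wt
is_wt_outlier <- outlier.detect(data$wt)
wt_outliers <- data$wt[is_wt_outlier]
print(wt_outliers)

# Keep only the rows that are not outliers
o_wt_data <- data[!is_wt_outlier, ]
nrow(o_wt_data)
```

Note that boxplot() computes its whiskers from hinges rather than from quantile(), so the points it draws as outliers may not match this rule exactly.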
In this exercise, we will use the dplyr package to explore and manipulate a dataset called mtcars.
This dataset contains information about 32 different car models, including their miles per gallon
(mpg), horsepower (hp), weight (wt), and other characteristics.
#Use the select() function to select only the columns mpg, cyl, and wt:
mtcars_select <- select(mtcars, mpg, cyl, wt)
head(mtcars_select)
#Use the filter() function to select only the rows where cyl is equal to 4:
mtcars_filter <- filter(mtcars_select, cyl == 4)
head(mtcars_filter)
#Use the summarize() function to calculate the mean mpg, cyl, and wt for the entire dataset:
mtcars_summarize <- summarize(mtcars_select, mean_mpg = mean(mpg), mean_cyl =
mean(cyl), mean_wt = mean(wt))
mtcars_summarize
#Use the group_by() and summarize() functions to calculate the mean mpg, cyl, and wt for each unique value of cyl:
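The grouped summary is not shown in the listing, so the following is only a sketch of how it could be written. It assumes the dplyr package is installed; the name mtcars_grouped and the n_cars column are introduced here for illustration. The mean of cyl is left out because within each group it simply equals the group's cyl value.

```r
library(dplyr)  # assumes the dplyr package is installed

mtcars_grouped <- mtcars %>%
  select(mpg, cyl, wt) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg),
            mean_wt = mean(wt),
            n_cars = n())   # n() counts the rows in each group
print(mtcars_grouped)
```

The result has one row per unique value of cyl (4, 6, and 8).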
12. Plots
# Line graph
ggplot(iris_filtered, aes(x = Sepal.Length, y = Petal.Length)) +
geom_line(color = "red") +
labs(title = "Line Graph of Sepal.Length vs Petal.Length", x = "Sepal.Length", y = "Petal.Length")
# Scatter plot
ggplot(iris_filtered, aes(x = Sepal.Length, y = Petal.Length)) +
geom_point(color = "green") +
labs(title = "Scatter Plot of Sepal.Length vs Petal.Length", x = "Sepal.Length", y = "Petal.Length")
# Histogram
ggplot(iris_filtered, aes(x = Sepal.Length)) +
geom_histogram(binwidth = 0.1, fill = "orange") +
labs(title = "Histogram of Sepal.Length", x = "Sepal.Length", y = "Count")
# Box plot
ggplot(iris_filtered, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
labs(title = "Box Plot of Sepal.Length by Species", x = "Species", y = "Sepal.Length")