Functions and Packages
1. Introduction to Functions
A function is a self-contained block of code that encapsulates a specific task or a group of related tasks.
Functions take inputs, perform their task, and return an output. R comes with numerous built-in
functions, and it also allows you to create your own, known as user-defined functions.
What is a function?
A function in R is a piece of code written to carry out a specified task. It takes some input, processes it,
and returns a result. Functions reduce redundancy, make code more readable, and make debugging
easier.
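Types of Functions:
• Built-in functions: functions that ship with R, such as mean() and median().
• User-defined functions: functions you write yourself for tasks the built-in ones do not cover.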
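As a minimal sketch of the distinction between built-in and user-defined functions, the snippet below calls two functions that ship with R (sum and sqrt) and then defines one of its own. The name square and the sample vector are assumptions made for illustration only.

```r
values <- c(4, 9, 16)

# Built-in functions: available as soon as R starts
total <- sum(values)
roots <- sqrt(values)

# User-defined function (hypothetical name): written by you
square <- function(x) {
  x^2
}
squares <- square(values)

print(total)
print(roots)
print(squares)
```

Because square operates on its argument with a vectorised operator, it accepts a whole vector just as the built-in functions do.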
Anatomy of a function
A typical function in R has the following components:
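The usual components (a name, arguments, a body, and a return value) can be sketched as follows. The function add_numbers and its default value are hypothetical and chosen only to label each part.

```r
# Function name: add_numbers (hypothetical, for illustration only)
# Arguments: x and y; y has a default value of 5
add_numbers <- function(x, y = 5) {
  # Body: the statements that do the work
  total <- x + y
  # Return value: the result handed back to the caller
  return(total)
}

# Call the function with one positional and one named argument
result <- add_numbers(3, y = 5)
print(result)
```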
[1] 8
# create a data vector
data <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 11)
# calculate mean
mean_data <- mean(data)
cat("mean of the data: ", mean_data, "\n")
# calculate median
median_data <- median(data)
cat("median of the data: ", median_data, "\n")
# calculate mode (R has no built-in function for the statistical mode)
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
mode_data <- get_mode(data)
cat("mode of the data: ", mode_data, "\n")
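The one-line body of get_mode packs several steps together. The sketch below unpacks them on a small vector, with intermediate names (positions, counts, winner) introduced only for illustration. It also shows that, when two values tie, this approach returns the one that appears first.

```r
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

v <- c(2, 3, 3, 5, 3, 2)
uniqv <- unique(v)              # distinct values in order of first appearance
positions <- match(v, uniqv)    # index of each element within uniqv
counts <- tabulate(positions)   # how often each distinct value occurs
winner <- which.max(counts)     # index of the first largest count

print(uniqv[winner])            # same value that get_mode(v) returns

# With a tie, the value that appears first in the vector wins
print(get_mode(c(1, 1, 2, 2)))
```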
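# main function: mean, median, and mode for all numeric variables
stat_mtcars <- function(dataset) {
  # filter out non-numeric columns
  numeric_data <- dataset[sapply(dataset, is.numeric)]
  # compute each statistic per column; get_mode() is the helper defined above
  results <- data.frame(
    mean = sapply(numeric_data, mean),
    median = sapply(numeric_data, median),
    mode = sapply(numeric_data, get_mode)
  )
  return(results)
}
# inspect the structure of mtcars, then summarise it
str(mtcars)
stat_mtcars(mtcars)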
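detect_missing_values <- function(dataset) {
  # one counter per column, named after the column
  missing_counts <- numeric(ncol(dataset))
  names(missing_counts) <- names(dataset)
  for (i in seq_len(ncol(dataset))) {
    missing_counts[i] <- sum(is.na(dataset[[i]]))
  }
  # report only the columns that contain missing values
  missing_counts[missing_counts > 0]
}
detect_missing_values(sample_data)
A B
1 1
detect_missing_values(mtcars)
named numeric(0)
detect_missing_values(data1)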
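For comparison, the same per-column count of missing values can be sketched without an explicit loop. Both the data frame demo_data and the function name count_missing are hypothetical; they are not the tutorial's own sample_data or data1.

```r
# Hypothetical data frame with one missing value in each of columns A and B
demo_data <- data.frame(
  A = c(1, NA, 3),
  B = c("x", "y", NA),
  C = c(TRUE, FALSE, TRUE)
)

# Hypothetical vectorised alternative to a column-by-column loop
count_missing <- function(dataset) {
  counts <- colSums(is.na(dataset))  # missing values per column
  counts[counts > 0]                 # keep only columns that have any
}

print(count_missing(demo_data))
print(count_missing(mtcars))
```

Here is.na() turns the data frame into a logical matrix and colSums() adds up the TRUE values in each column, so no loop index is needed.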
2. Introduction to Packages
• Efficiency: Save time and effort by using pre-written and tested code.
Installing packages
You can install packages directly from CRAN (the Comprehensive R Archive Network), from other
repositories, or from local files.
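A minimal sketch of these options, assuming an internet connection for the CRAN cases. The helper install_if_missing is hypothetical, and the local file path is a placeholder rather than a real file. The installation calls are left commented out so the snippet does not download anything unless you choose to run them.

```r
# Hypothetical helper: install a CRAN package only when it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# From CRAN (uncomment to run; requires network access):
# install_if_missing("dplyr")

# From a specific repository, chosen with the 'repos' argument:
# install.packages("dplyr", repos = "https://cloud.r-project.org")

# From a local source archive (the path is a placeholder):
# install.packages("path/to/package.tar.gz", repos = NULL, type = "source")
```

A package only needs to be installed once per R library, which is why the helper checks with requireNamespace() before calling install.packages().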
Loading packages
After installing a package, you need to load it into the R environment to use its functions.
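The sketch below shows the two common ways to reach a package's functions. It uses the tools package because it ships with R and needs no installation; the same pattern should apply to dplyr or any other installed package. The file name is made up for illustration.

```r
# Attach the 'tools' package so its functions can be called by name
library(tools)
ext_attached <- file_ext("report.csv")

# Call a function without attaching the package, using the package::function form
ext_qualified <- tools::file_ext("report.csv")

print(ext_attached)
print(identical(ext_attached, ext_qualified))
```

library() makes every exported function of the package available for the rest of the session, while the :: form is handy for a one-off call or to make clear which package a function comes from.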
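# use the 'filter' function from 'dplyr' to keep only the rows of mtcars where mpg > 20
dplyr::filter(mtcars, mpg > 20)
# list every object in the 'dplyr' namespace (exported and internal)
ls(getNamespace("dplyr"))
# open the documentation index for 'dplyr'
help(package = "dplyr")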
Exercise
1. Basic Statistics:
a. Load the iris dataset. Compute the mean, median and standard deviation for the
Sepal.Length and Sepal.Width columns.
b. Using the mtcars dataset, determine which car model has the highest miles per gallon (mpg).
2. Data manipulation:
a. From the mtcars dataset, filter only those rows where the number of cylinders (cyl) is 4.
b. Using the iris dataset, group the data by species and compute the average Sepal.Length for
each group.
3. Custom Functions:
a. Write a function that takes a data frame and a column name as input and returns the range
(min to max) of that column.
b. Develop a function that accepts a data frame and returns a list of columns that have missing
values along with the count of missing values.
4. Data Cleaning:
a. Identify and replace any negative values in the Sepal.Length column of the iris dataset with
the mean value of the column.
b. Using any dataset of your choice with missing values, impute the missing values using the
median of the respective columns.