R Programming Notes
A data frame is one of the most important data structures in R used to store tabular data. It can
hold data of different types (numeric, character, factor, etc.) in columns. Each column can have
a different data type, but all columns must have the same number of rows. Data frames are
commonly used in data analysis, as they resemble tables in databases or Excel sheets. They
are ideal for statistical modeling and manipulation of datasets.
In R, a function is called by using its name followed by parentheses. If the function requires
arguments, they are passed inside the parentheses. For example, calling the sum() function
with arguments looks like this: sum(2, 3, 5) which will return 10. User-defined functions are
also called in the same way.
Factors in R are created using the factor() function. Factors are used to represent
categorical data and store it as levels. For example:
gender <- factor(c("Male", "Female", "Male"))
This creates a factor variable with levels "Female" and "Male" (R sorts levels alphabetically by
default). Factors are useful in statistical modeling and data analysis.
x <- c(1, 2, 3, 4)
y <- c(2, 3, 2, 1)
Data casting refers to converting data from one type to another in R. For example, converting a
numeric value to a character, or a factor to numeric. This is done using functions like
as.numeric(), as.character(), as.logical(), etc. For example:
x <- as.numeric("5")
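The conversions described above can be sketched as follows. The values are illustrative only; the factor case shows a common pitfall, where as.numeric() applied directly to a factor returns the internal level codes rather than the original numbers.

```r
# Character to numeric
x <- as.numeric("5")
class(x)                      # "numeric"

# Numeric to character and numeric to logical
as.character(12.5)            # "12.5"
as.logical(c(0, 1, 2))        # FALSE TRUE TRUE

# Factor to numeric: convert to character first, otherwise
# as.numeric() returns the internal level codes
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # 1 2 1
values <- as.numeric(as.character(f))   # 10 20 10
```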
The setwd() function is used to set the working directory in R. The working directory is the
folder where R looks for files to read and where it saves files. Setting it correctly is important for
file input/output operations. Example:
setwd("C:/Users/Student/Documents")
This sets the working directory to the specified path so that you can easily read or write files
from/to that location.
1. Free and Open Source: R is completely free to use, modify, and distribute. This makes
it accessible to individuals and institutions without licensing fees.
2. Rich Collection of Packages: R has thousands of packages in CRAN (Comprehensive
R Archive Network) for different domains like bioinformatics, machine learning, finance,
and more.
3. Built-in Statistical Functions: R has a vast library of inbuilt functions for linear and
nonlinear modeling, time-series analysis, clustering, classification, and more.
4. Powerful Data Visualization: R excels at data visualization using packages like
ggplot2, lattice, and plotly. It can create detailed and publication-quality graphs.
5. Great for Data Cleaning and Manipulation: With packages like dplyr, tidyr, and
data.table, R makes it easy to manipulate and clean large datasets efficiently.
6. Platform Independent: R runs on all major operating systems (Windows, macOS,
Linux).
7. Community Support: R has an active global community of users, which means it's easy
to find help, tutorials, and packages.
8. Integration with Other Languages: R can work with C, C++, Java, and Python,
allowing integration into larger software projects.
1. Numeric: Decimal numbers, and whole numbers written without the L suffix; both are stored as
double precision by default. Example: x <- 5.6 or y <- 10.
2. Integer: Whole numbers are explicitly defined using L. Example: x <- 10L.
3. Character: Text or string data, enclosed in quotes. Example: name <- "R
Language".
4. Logical: Boolean values that are either TRUE or FALSE. Used in conditions and logical
operations. Example: x <- TRUE.
5. Complex: Numbers with real and imaginary parts. Example: z <- 4 + 5i.
6. Raw: Represent bytes and are used in advanced programming tasks like encryption or
data compression.
Each of these data types is the foundation for more complex structures like vectors, matrices,
and data frames. R provides functions like class() and typeof() to check the data type of
any object. Understanding data types is crucial for writing error-free and optimized code in R.
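As a small sketch of checking types with class() and typeof(), the snippet below creates one value of each basic type listed above. The variable names are illustrative.

```r
x    <- 5.6             # numeric (stored as double)
y    <- 10L             # integer, because of the L suffix
name <- "R Language"    # character
flag <- TRUE            # logical
z    <- 4 + 5i          # complex
b    <- charToRaw("A")  # raw byte 41 (hexadecimal)

class(x); typeof(x)     # "numeric" "double"
class(y); typeof(y)     # "integer" "integer"
class(name)             # "character"
class(flag)             # "logical"
class(z)                # "complex"
class(b)                # "raw"
```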
Q 4: Write the code of a program in R which will accept the height and
weight of all students of your class and it will display the details of all
students whose height is less than 6 feet and weight is more than 90
kg.
R Code:
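# Accepting height and weight of students
# Sample data (you can take input manually or read from a file)
students <- data.frame(
  Name = c("Ravi", "Pooja", "Amit", "Sneha", "Karan"),
  Height = c(5.5, 6.1, 5.8, 5.4, 6.0), # in feet
  Weight = c(92, 85, 95, 88, 91) # in kg
)
# Display all students' data
cat("All Students' Data:\n")
print(students)
# Keep students with height below 6 feet and weight above 90 kg
selected <- subset(students, Height < 6 & Weight > 90)
print(selected)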
✅ Output:
All Students' Data:
Name Height Weight
1 Ravi 5.5 92
2 Pooja 6.1 85
3 Amit 5.8 95
4 Sneha 5.4 88
5 Karan 6.0 91
✅ Explanation:
✅ (a) paste()
The paste() function is used to concatenate (join) multiple strings together.
Example:
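A minimal sketch of paste(); the strings are illustrative.

```r
full_name <- paste("R", "Programming")              # "R Programming"
dashed    <- paste("R", "Programming", sep = "-")   # "R-Programming"
labels    <- paste("Item", 1:3)                     # "Item 1" "Item 2" "Item 3"
```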
✅ (b) max()
The max() function returns the maximum value from a set of numeric values.
Example:
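A minimal sketch of max(), using made-up scores.

```r
scores  <- c(12, 45, 7, 30)
highest <- max(scores)      # 45
largest <- max(3, 9, 4)     # 9, arguments can also be given directly
```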
✅ (c) seq()
The seq() function is used to generate a sequence of numbers.
Example:
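A minimal sketch of seq() with a step size, a negative step, and a fixed output length.

```r
odd   <- seq(1, 10, by = 2)            # 1 3 5 7 9
down  <- seq(10, 1, by = -3)           # 10 7 4 1
parts <- seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00
```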
✅ (d) mean()
The mean() function is used to calculate the average of numeric values.
Example:
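A minimal sketch of mean(), including the na.rm argument for data with missing values. The marks are illustrative.

```r
marks   <- c(70, 80, 90)
average <- mean(marks)                          # 80

# A missing value makes the result NA unless na.rm = TRUE is set
with_na   <- mean(c(70, NA, 90))                # NA
without_na <- mean(c(70, NA, 90), na.rm = TRUE) # 80
```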
Q 6: Describe the R functions for reading a matrix or data frame from a file. Also
demonstrate with some examples. List the different problems which can be encountered
during this process.
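1. read.table()
Reads a delimited text file into a data frame; sep sets the field separator and header = TRUE
treats the first line as column names.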
Example:
data <- read.table("data.txt", header = TRUE, sep = ",")
print(data)
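2. read.csv()
A wrapper around read.table() for comma-separated files, with header = TRUE and sep = "," as
defaults.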
Example:
df <- read.csv("students.csv")
print(df)
3. readLines()
Reads a text file line by line into a character vector.
Example:
lines <- readLines("file.txt")
print(lines)
4. scan()
Reads values from a file (or the keyboard) into a vector; the values are numeric by default.
Example:
numbers <- scan("numbers.txt")
print(numbers)
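5. Converting Data Frame to Matrix
as.matrix() converts a data frame read from a file into a matrix. If the columns have mixed
types, all values are coerced to one common type.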
Example:
df <- read.csv("data.csv")
mat <- as.matrix(df)
print(mat)
Q 7: Write and explain the procedure to create and generate the R-factors
and factor levels.
Answer: In R, factors are used to represent categorical data. They store both the
values and the corresponding levels (categories).
Creating Factors
# Create a vector of genders
gender <- c("Male", "Female", "Female", "Male", "Male")
# Convert to factor
gender_factor <- factor(gender)
Output:
[1] Male Female Female Male Male
Levels: Female Male
Checking Levels
levels(gender_factor)
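Output:
[1] "Female" "Male"
Ordered Factors
# Sample education values (illustrative)
education <- c("Bachelor", "High School", "PhD", "Master")
# Ordered factor
education_factor <- factor(education,
                           levels = c("High School", "Bachelor",
                                      "Master", "PhD"),
                           ordered = TRUE)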
Q 8: Explain the apply method used in R. Also explain lapply and sapply
with suitable examples.
Answer:
R provides a family of apply functions to perform repetitive tasks over data structures without
writing explicit loops.
1. apply()
Used for matrices or data frames, applying a function over rows or columns.
Syntax: apply(X, MARGIN, FUN), where MARGIN is 1 for rows and 2 for columns.
Example:
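A minimal sketch of apply() on a small matrix; the matrix is illustrative.

```r
m <- matrix(1:6, nrow = 2)        # 2 x 3 matrix, filled column by column
row_sums  <- apply(m, 1, sum)     # MARGIN = 1: one result per row    -> 9 12
col_means <- apply(m, 2, mean)    # MARGIN = 2: one result per column -> 1.5 3.5 5.5
```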
2. lapply()
Applies a function to each element of a list or vector and always returns a list.
Example:
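A minimal sketch of lapply(); the list of scores is made up for illustration.

```r
scores <- list(math = c(80, 90), science = c(70, 85, 95))

# One result per list element, returned as a list with the same names
avg <- lapply(scores, mean)
avg$math       # 85
avg$science    # 83.33333
```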
3. sapply()
Similar to lapply(), but returns a vector or matrix instead of a list (if possible).
Example:
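A minimal sketch of sapply(), again with an illustrative list. The result is simplified to a vector because every call returns a single value.

```r
scores <- list(math = c(80, 90), science = c(70, 85, 95))

counts  <- sapply(scores, length)            # named vector: math 2, science 3
squares <- sapply(1:3, function(i) i^2)      # 1 4 9
```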
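Summary Table
Function   Input Type             Output Type                      Use Case
apply()    Matrix or data frame   Vector, matrix, or list          Row-wise or column-wise operations
lapply()   List or vector         List                             Apply a function to each element
sapply()   List or vector         Vector or matrix (if possible)   Same as lapply() with simplified output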
Q 9: Explain how multiple curves are plotted in the same graph. Illustrate
with a suitable example.
Answer:
In R, you can plot multiple curves (lines) on the same graph using the plot() function
followed by lines() or points() for additional curves.
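Step-by-Step Procedure
1. Draw the first curve with plot().
2. Add each further curve with lines() (or points()).
3. Add a legend with legend() to tell the curves apart.
Example:
# Data for curve 1
x <- 1:10
y1 <- x^2
# Data for curve 2
y2 <- x^1.5
# Plot the first curve, then add the second with lines()
plot(x, y1, type = "l", col = "blue", lty = 1, lwd = 2, xlab = "x", ylab = "y")
lines(x, y2, col = "red", lty = 2, lwd = 2)
# Add legend
legend("topleft", legend = c("y = x^2", "y = x^1.5"),
       col = c("blue", "red"), lty = c(1, 2), lwd = 2)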
Explanation:
Output:
UNIT-I of R Programming
1. Introduction to R: What is R?
2. Why R?
R is chosen by statisticians and data scientists for several reasons. It has a rich ecosystem of
packages for performing statistical tests, building predictive models, and visualizing data. R is
highly extensible, allowing users to develop their own functions and packages. Additionally, R is
supported by a strong community, with numerous tutorials, documentation, and forums available
online. It is especially useful for exploratory data analysis due to its interactive environment and
strong visualization capabilities.
RStudio is an Integrated Development Environment (IDE) for R that makes coding easier and
more organized. The R command prompt is where you can directly enter and execute R
commands interactively. The R script file is a text file where you write multiple lines of code to
execute all at once or step-by-step. These files are saved with the .R extension. Comments in
R are written using the # symbol and are used to describe the code. Comments help make the
code readable and maintainable, especially when shared with others or revisited later.
R allows users to expand its functionalities by using packages, which are collections of
functions, data, and documentation bundled together. To use a package, you must first install it
using install.packages("package_name"). Once installed, you must load it into your R
session using library(package_name). To check which packages are installed, use the
command installed.packages(). To know more about a package, use
packageDescription("package_name"). If you need help with a function or package, use
help(function_name) or simply ?function_name. You can also use
find.package("package_name") to locate the installation path.
In R, data can be entered directly from the keyboard. This is useful for testing or small inputs.
You can use the c() function to create vectors, like x <- c(1, 2, 3, 4). For entering text
data, use c("apple", "banana"). You can also use the scan() function for numeric input
from the keyboard. This kind of input is typically useful for quick data entry or interactive
prompts. R also supports reading data from files (CSV, Excel, etc.), but keyboard entry is
fundamental for learning the language.
● Vectors are the most basic data type in R and can only hold elements of the same type
(e.g., numeric or character). Example: v <- c(1, 2, 3)
● Lists can store elements of different types (e.g., numbers, strings, vectors, and even
other lists). Example: lst <- list(1, "hello", TRUE)
● Matrices are 2D data structures where all elements must be of the same type. Created
using matrix() function.
● Arrays are like matrices but with more than two dimensions. Example: array(1:8,
dim = c(2,2,2)) creates a 3D array.
● Factors are used to handle categorical data and store data as levels. Example:
factor(c("male", "female", "male"))
● Data Frames are tabular data structures where each column can contain different types
of data. Example: data.frame(Name=c("Alice","Bob"), Age=c(25,30)). Data
frames are similar to Excel sheets and are widely used for data analysis.
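The six structures listed above can be created side by side as in the sketch below. It reuses the examples from the list; the variable names are illustrative.

```r
v   <- c(1, 2, 3)                                   # vector: one type only
lst <- list(1, "hello", TRUE)                       # list: mixed types
m   <- matrix(1:6, nrow = 2, ncol = 3)              # matrix: 2 rows, 3 columns
a   <- array(1:8, dim = c(2, 2, 2))                 # array: 3 dimensions
f   <- factor(c("male", "female", "male"))          # factor: categorical data
df  <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

dim(m)       # 2 3
dim(a)       # 2 2 2
levels(f)    # "female" "male"
str(df)      # 2 observations of 2 variables
```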
In R, variables are created by assigning values using the <- or = operators. Example: x <-
10 or y = "Hello". The data type of a variable is determined by the kind of data stored in it.
You can check the type using typeof(x) or class(x).
To see all variables currently stored in the R environment, use the command ls(). This helps in
keeping track of your workspace. If you want to delete a variable, use the rm() function. For
example, rm(x) will remove the variable x from memory.
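A short sketch of creating, inspecting, and removing variables. The names are illustrative, and the output of ls() depends on what else is in the workspace.

```r
x <- 10
y = "Hello"

typeof(x)    # "double"
class(y)     # "character"

ls()         # lists the variables in the current environment
rm(x)        # removes x from memory
x_exists <- exists("x", inherits = FALSE)   # FALSE after rm(x)
```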
UNIT-II of R Programming
R Operators
R provides various types of operators that are used to perform operations on variables and
values. The Arithmetic Operators include + for addition, - for subtraction, * for multiplication,
/ for division, ^ for exponentiation, %% for modulus (remainder), and %/% for integer division. For
example, if a <- 10 and b <- 3, then a + b returns 13, a %% b returns 1, and a ^ b
returns 1000. Relational Operators are used to compare two values and return logical results
(TRUE or FALSE). These operators include == (equal to), != (not equal to), <, >, <=, and >=. For
example, a > b returns TRUE and a == b returns FALSE. Logical Operators are used for
combining multiple logical expressions. These include & (element-wise AND), | (element-wise
OR), ! (NOT), and the short-circuit forms && and ||, which expect single (length-one) values and are typical in if conditions. For example, (a > 5 & b <
5) returns TRUE, while !(a == b) returns TRUE. Assignment Operators are used to assign
values to variables. The common ones are <-, =, ->, and <<-. For example, x <- 5, or 5 ->
y both assign the value 5 to the variable. Miscellaneous Operators include : (sequence
generator) and %in% (checks if an element belongs to a vector). For example, 1:5 gives 1 2 3
4 5, and 3 %in% c(1,2,3) returns TRUE.
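The operator examples above can be run together as a quick check. This sketch uses the same values a <- 10 and b <- 3.

```r
a <- 10
b <- 3

# Arithmetic
a + b      # 13
a %% b     # 1    (remainder)
a %/% b    # 3    (integer division)
a ^ b      # 1000

# Relational and logical
a > b                              # TRUE
(a > 5 & b < 5)                    # TRUE
c(TRUE, FALSE) & c(TRUE, TRUE)     # TRUE FALSE (element-wise)
(a > 5) && (b > 5)                 # FALSE (single values only)

# Assignment and miscellaneous
5 -> y
1:5                   # 1 2 3 4 5
3 %in% c(1, 2, 3)     # TRUE
```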
R Decision Making
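The if...else statement runs a different block of code depending on which condition is TRUE.
For example:
x <- 5 # sample value
if (x > 0) {
  print("Positive")
} else if (x == 0) {
  print("Zero")
} else {
  print("Negative")
}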
This structure checks whether x is positive, zero, or negative. The switch statement is used
when you want to match a value with multiple choices and execute a corresponding block. For
example:
x <- 2
switch(x, "one", "two", "three") # Returns "two"
R Loops
Loops are used to execute a block of code repeatedly. The repeat loop continues executing a
block until a break statement is used to stop it. For example:
x <- 1
repeat {
  print(x)
  x <- x + 1
  if (x > 5) break
}
This will print numbers 1 to 5. The while loop keeps running as long as the condition is TRUE.
For instance:
x <- 1
while (x <= 5) {
  print(x)
  x <- x + 1
}
This prints 1 to 5. The for loop is useful for iterating through elements in a vector or list.
Example:
for (i in 1:5) {
  print(i)
}
This also prints 1 to 5. To control loop execution, R provides break and next statements. The
break statement exits the loop prematurely when a condition is met. The next statement skips
the current iteration and continues with the next one. Example:
for (i in 1:5) {
  if (i == 3) next
  print(i)
}
This prints 1, 2, 4, and 5; the iteration for i equal to 3 is skipped.
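For comparison, here is a minimal sketch of break inside a for loop. The printed vector is an illustrative helper that records which values were reached.

```r
printed <- c()
for (i in 1:5) {
  if (i == 4) break          # leave the loop entirely once i reaches 4
  print(i)
  printed <- c(printed, i)
}
printed    # 1 2 3
```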
R Functions
Functions in R allow for reusable blocks of code. You define a function using the function()
keyword. A simple user-defined function might look like:
This function takes two arguments and returns their sum. Built-in functions are predefined and
commonly used in R. For example:
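A minimal sketch of both cases described above. The function name add_numbers is hypothetical, and the built-in calls are a few common ones chosen for illustration.

```r
# User-defined function: takes two arguments and returns their sum
add_numbers <- function(a, b) {
  a + b
}
result <- add_numbers(2, 3)    # 5

# Built-in functions
sum(2, 3, 5)           # 10
sqrt(16)               # 4
length(c(4, 8, 15))    # 3
round(3.14159, 2)      # 3.14
```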
R Strings
R provides several functions to manipulate strings or textual data. The substr() function is
used to extract or replace substrings in a character vector. For example, substr("Welcome",
1, 4) returns "Welc". The strsplit() function splits strings into substrings based on a
delimiter and returns a list. For instance, strsplit("Hello World", " ") returns a list whose first
element is the character vector "Hello" "World". The paste() function combines strings. For example, paste("R",
"Programming") returns "R Programming"; using paste(..., sep = "-") gives a
custom separator. The grep() function searches for patterns in strings and returns matching
indices. For example, grep("a", c("cat", "dog", "rat")) returns 1 and 3 (for "cat"
and "rat"). The toupper() and tolower() functions convert text to uppercase and lowercase
respectively. Example: toupper("r language") returns "R LANGUAGE" and
tolower("HELLO") returns "hello".
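The string functions above can be tried together as in this sketch. It also shows the replacement form of substr(), which the text mentions but does not demonstrate.

```r
first4 <- substr("Welcome", 1, 4)               # "Welc"

s <- "Welcome"
substr(s, 1, 1) <- "w"                          # replace the first character
s                                               # "welcome"

# strsplit() returns a list; [[1]] extracts the character vector
parts <- strsplit("Hello World", " ")[[1]]      # "Hello" "World"

hits  <- grep("a", c("cat", "dog", "rat"))      # 1 3
upper <- toupper("r language")                  # "R LANGUAGE"
lower <- tolower("HELLO")                       # "hello"
```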
R – Vectors
Vectors are basic data structures in R. A sequence vector can be created using : or seq();
for example, 1:5 gives 1 2 3 4 5, and seq(1, 10, 2) gives 1 3 5 7 9. The rep()
function repeats elements; rep(1:3, times=2) gives 1 2 3 1 2 3. Vector access is done
using indexing, such as v[2] for the second element. You can assign names to vector
elements like names(v) <- c("a", "b", "c"). Vector math allows operations like
addition, subtraction, multiplication, and division element-wise. For example, c(1,2,3) +
c(4,5,6) results in 5 7 9. Vector recycling occurs when unequal-length vectors are used in
operations; shorter ones are reused. Sorting vectors is done with sort(), order(), and
rev(). For example, sort(c(3,1,2)) returns 1 2 3.
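A short sketch of the vector operations described above, including recycling. The values are illustrative.

```r
v <- c(10, 20, 30)
names(v) <- c("a", "b", "c")
v[2]       # second element: b 20
v["c"]     # access by name: c 30

rep(1:3, times = 2)          # 1 2 3 1 2 3
c(1, 2, 3) + c(4, 5, 6)      # 5 7 9

# Recycling: the shorter vector c(10, 20) is reused as 10 20 10 20
recycled <- c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24

u <- c(3, 1, 2)
sort(u)     # 1 2 3  (sorted values)
order(u)    # 2 3 1  (positions that would sort u)
rev(u)      # 2 1 3  (reversed order)
```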
R – List
Lists can hold different types of data. A list is created using list() like myList <-
list(name="Tom", age=25, scores=c(85,90,95)). You can use tags (names) to
access elements: myList$name. Elements can be added using indexing: myList$city <-
"Delhi", or removed using myList$age <- NULL. To get the size of a list, use
length(myList). You can merge lists using c(list1, list2), and convert a list to a
vector using unlist(myList).
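The list operations above can be followed step by step in this sketch. The lists passed to c() and unlist() are small illustrative examples.

```r
myList <- list(name = "Tom", age = 25, scores = c(85, 90, 95))

myList$name               # "Tom"
myList$city <- "Delhi"    # add an element
myList$age  <- NULL       # remove an element
length(myList)            # 3 (name, scores, city)

merged <- c(list(a = 1), list(b = "x"))    # list with elements a and b
flat   <- unlist(list(1, 2, 3))            # numeric vector 1 2 3
```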
R – Matrices
A matrix is a two-dimensional array of the same data type, created using matrix() function.
For example, matrix(1:6, nrow=2) gives a 2×3 matrix. Accessing matrix elements is
done with row and column indices like m[1,2]. Matrix computations include element-wise
addition (+), subtraction (-), multiplication (*), and division (/). For example, m1 + m2 adds two
matrices. Matrix multiplication (not element-wise) is done using %*%.
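A small sketch contrasting element-wise operations with matrix multiplication. The matrices are illustrative.

```r
m1 <- matrix(1:6, nrow = 2)    # rows: 1 3 5 / 2 4 6
m2 <- matrix(6:1, nrow = 2)    # rows: 6 4 2 / 5 3 1

m1[1, 2]     # 3 (row 1, column 2)
m1 + m2      # every entry is 7
m1 * m2      # element-wise product

# %*% is true matrix multiplication (inner dimensions must match)
sq   <- matrix(1:4, nrow = 2)  # rows: 1 3 / 2 4
prod <- sq %*% sq              # rows: 7 15 / 10 22
```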
R – Arrays
Arrays in R are used to store multi-dimensional data. An array can be created with array()
function. Example: array(1:8, dim=c(2,2,2)) creates a 3D array. You can name rows
and columns using the dimnames argument while creating the array. Accessing elements is done with
array[1,2,1] (1st row, 2nd column, 1st layer). Arrays support manipulation using indexing,
and calculations across elements can be done using functions like apply(array,
MARGIN, FUN), where MARGIN is 1 for rows, 2 for columns.
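A minimal sketch of a named 3D array, indexing, and apply() over different margins. The dimension names are made up for illustration; MARGIN = 3 applies the function across the third dimension (layers).

```r
arr <- array(1:8, dim = c(2, 2, 2),
             dimnames = list(c("r1", "r2"), c("c1", "c2"), c("k1", "k2")))

arr[1, 2, 1]             # 3 (1st row, 2nd column, 1st layer)
arr["r2", "c1", "k2"]    # 6 (same idea, using names)

row_totals   <- apply(arr, 1, sum)    # r1 16, r2 20
layer_totals <- apply(arr, 3, sum)    # k1 10, k2 26
```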
R – Factors
Factors are used for categorical data. You can create a factor using factor(). For example,
gender <- factor(c("Male", "Female", "Male")). R internally stores the levels
(categories). You can check levels using levels(gender). To generate factor levels
automatically, use gl(). Example: gl(2, 3, labels=c("Control", "Treatment"))
generates a factor with 2 levels, each repeated 3 times.
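The factor examples above can be checked as in this sketch. table() and as.integer() are added here only to show the counts per level and the internal codes.

```r
gender <- factor(c("Male", "Female", "Male"))
levels(gender)        # "Female" "Male" (alphabetical by default)
table(gender)         # Female 1, Male 2
as.integer(gender)    # 2 1 2 (internal level codes)

# gl(n, k): n levels, each repeated k times
groups <- gl(2, 3, labels = c("Control", "Treatment"))
groups    # Control Control Control Treatment Treatment Treatment
```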
R – Data Frames
Data frames are table-like structures in R with rows and columns. You can create a data frame
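using data.frame(). For example:
df <- data.frame(Name=c("Alice","Bob"), Age=c(25,30))
Access elements using column names like df$Name, or df[1,2] for a specific cell. To
understand the data, use str(df), summary(df), and head(df).
To extract data, use subset() or simple indexing. To expand a data frame, add a column
with df$NewColumn <- values or append rows with rbind().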
You can join data frames using rbind() (row-wise), cbind() (column-wise), and merge two
data frames using merge(df1, df2, by="ID"). For reshaping data, use the reshape2
package functions melt() (wide to long) and dcast() (long to wide) to convert data between formats.
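The data frame operations described in this section can be sketched with base R only. All names and values below are illustrative, and the ID column is an assumption added so that merge() has a key to join on.

```r
df <- data.frame(ID = 1:3,
                 Name = c("Alice", "Bob", "Chen"),
                 Age = c(25, 30, 28))

df$Name       # one column
df[1, 2]      # "Alice" (row 1, column 2)
str(df)       # structure: column names and types
summary(df)   # per-column summary
head(df, 2)   # first two rows

# Extract rows that meet a condition
older <- subset(df, Age > 26)

# Expand: add a column, then append a row
df$City <- c("Pune", "Delhi", "Agra")
df <- rbind(df, data.frame(ID = 4, Name = "Dev", Age = 35, City = "Goa"))

# Join with a second data frame on the shared ID column
marks  <- data.frame(ID = c(1, 2, 4), Marks = c(88, 75, 91))
joined <- merge(df, marks, by = "ID")
```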
R allows users to handle files and directories easily while working on data projects. The
working directory is the location on your computer where R reads and writes files. You can find
the current working directory using getwd(), and change it using
setwd("path/to/your/folder"). To see what files are in the working directory, use dir()
or list.files(). Properly setting the working directory ensures that R knows where to find or
save files, which is essential when reading or writing data.
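A minimal sketch of the working-directory functions. It switches to R's temporary directory so that nothing on your system is changed permanently; the file name example.txt is illustrative.

```r
old_dir <- getwd()          # remember the current working directory
setwd(tempdir())            # switch to a temporary folder

file.create("example.txt")  # create an empty file there
files <- list.files()       # now includes "example.txt"

setwd(old_dir)              # switch back
```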
CSV (Comma-Separated Values) files are widely used for storing tabular data. In R, the function
read.csv("filename.csv") is used to read CSV files into data frames. For example, data
<- read.csv("students.csv") loads the file into an object called data. Once loaded, you
can analyze the dataset using several built-in functions. The summary(data) function gives a
summary of each column (like mean, min, max for numeric data). You can use
min(data$Age), max(data$Age), and range(data$Age) to find minimum, maximum, and
range of a specific column. For central tendencies, use mean(data$Marks) for average and
median(data$Marks) for the middle value. The apply() function is used to apply a function
(like mean, sum, etc.) to rows or columns. For example, apply(data[,2:4], 2, mean) will
return column-wise means for the selected columns.
To write data into a CSV file, use write.csv(data, "output.csv"). This is helpful for
exporting processed data or results to be shared or used in other applications.
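The CSV workflow above can be tried end to end as in the sketch below. The data and the temporary file path are illustrative; row.names = FALSE keeps write.csv() from adding an extra column of row numbers.

```r
students <- data.frame(Name  = c("Ravi", "Pooja", "Amit"),
                       Age   = c(20, 21, 19),
                       Marks = c(78, 85, 92))

# Write to a temporary CSV file, then read it back
csv_path <- file.path(tempdir(), "students.csv")
write.csv(students, csv_path, row.names = FALSE)
data <- read.csv(csv_path)

summary(data)                  # per-column summary
range(data$Age)                # 19 21
mean(data$Marks)               # 85
median(data$Marks)             # 85
col_means <- apply(data[, 2:3], 2, mean)   # Age 20, Marks 85
```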
To read Excel files in R, external packages like readxl or openxlsx are required. These
packages provide functions to directly import .xlsx files. First, you install the package using
install.packages("readxl") and load it with library(readxl). Then, use
read_excel("file.xlsx") to load an Excel sheet into R as a data frame. This method
avoids converting Excel files into CSV format, and it is useful when working with Excel files that
have multiple sheets or rich formatting.
R is known for its excellent data visualization capabilities. With base R functions, you can create
various types of charts:
● Scatter Plots help in visualizing the relationship between two numeric variables. Use
plot(data$Height, data$Weight) to create one.
● Box and Whisker Plots are great for understanding the spread and outliers in data.
Example: boxplot(data$Marks).
● Heat Maps visualize data intensity using colors. Use heatmap(matrix_data) where
matrix_data is a numeric matrix.
● Contour Plots are used to represent 3D data in 2D form using contour lines. These are
useful in surface analysis and are created using contour(matrix_data).
Each of these plots can be customized with labels, colors, and styles for better presentation.
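A minimal sketch of the four base plots, using the mtcars and volcano datasets that ship with R in place of the data and matrix_data objects mentioned above. The titles and axis labels are illustrative.

```r
# Scatter plot: car weight against fuel efficiency
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon", main = "Scatter Plot")

# Box and whisker plot of one numeric column
boxplot(mtcars$mpg, main = "Box Plot", col = "lightblue")

# Heat map of a numeric matrix (columns scaled for comparability)
heat_data <- as.matrix(mtcars[, 1:4])
heatmap(heat_data, scale = "column")

# Contour plot of a matrix of heights
contour(volcano, main = "Contour Plot")
```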
The ggplot2 package is a powerful tool for creating complex and aesthetically pleasing
visualizations. It uses a layered grammar of graphics and requires installing the package using
install.packages("ggplot2") and loading it with library(ggplot2).
ggplot2 makes it easy to add themes, labels, colors, titles, and save plots as images. It is
widely used in data science and statistics for high-quality graphing.