Module_1&2_R Programming

R is an interpreted programming language primarily used for statistical computing and data visualization, developed by Ross Ihaka and Robert Gentleman. It is open-source, supports various platforms, and has a large community with numerous packages available for data analysis. RStudio is an integrated development environment for R, and the document also covers installation, basic scripting, data types, control structures, and functions in R.

Uploaded by

siddharth.tcsc

R PROGRAMMING

SIDDHARTH JAMBHAVDEKAR
WHAT IS R?

• R is an interpreted programming language.
• R is a popular language for statistical computing and graphical presentation.
• R is an implementation of the S language.
• Its most common use is to analyze and visualize data.
• R was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
• The name of the language is taken from the first letter of both developers' first names.
• The project was conceived in 1992. An initial version was released in 1995, and a stable version followed in 2000.
• At the time of writing, the current version is 4.4.2, released on 31 October 2024.
WHY USE R?

• It is a great resource for data analysis, data visualization, data science and machine learning.
• It provides many statistical techniques (such as statistical tests, classification, clustering and data reduction).
• It is easy to draw graphs in R, such as pie charts, histograms, box plots, scatter plots, etc.
• It works on different platforms (Windows, Mac, Linux).
• It is open-source and free.
• It has large community support.
• It has many packages (libraries of functions) that can be used to solve different problems, e.g. MLR (Machine Learning in R).
WHAT IS R STUDIO?

• RStudio IDE (or RStudio) is an integrated development environment for R, a programming language for statistical computing and graphics.
• It is available in two formats: RStudio Desktop is a regular desktop application, while RStudio Server runs on a remote server and allows accessing RStudio using a web browser.
• The RStudio IDE is a product of Posit PBC (formerly RStudio PBC, formerly RStudio Inc.).
INSTALLATION OF R

• To install R, go to cran.r-project.org

• Choose Download R for Windows.

• Click "install R for the first time".

• Click Download R for Windows. Open the downloaded file.

• Select the language you would like to use during the installation. Then click OK.

• Click Next.

• Select where you would like R to be installed. It will default to your Program Files on your C Drive. Click Next.

• You can then choose which installation you would like.

• (Optional) If your computer is 64-bit, you can choose the 64-bit User Installation. Then click Next.

• Then specify whether you want to customize your startup or just use the defaults. Then click Next.

• Then choose the Start Menu folder in which R should be saved, or keep the default R folder that was created. Once you have finished, click Next.

• You can then select additional shortcuts if you would like. Click Next.

• Click Finish.
INSTALLATION OF R STUDIO

• Go to https://posit.co/downloads/
• Click Download RStudio.
• Once the installer has downloaded, open it. The Welcome to RStudio Setup Wizard will open. Click Next and go through the installation steps.
• After the Setup Wizard finishes the installation, RStudio will open.
SIMPLE SCRIPTS IN R

• "Hello World!" # This simply prints Hello World! as output

• print("Hello World!") # This prints the same thing, but uses the print() function to do so
• myString <- "Hello, World!"
print(myString)

• Output: "Hello, World!"

• Here the first statement defines a string variable myString and assigns it the string "Hello, World!"; the next statement uses print() to print the value stored in myString.
COMMENTS IN R

• To write a comment in R, start it with # and then type the comment text.
• # This is a comment
"Hello World!"
• "Hello World!" # This is a comment
• # This is a comment
# written in
# more than just one line
"Hello World!"
VARIABLES IN R

• R does not have a command for declaring a variable.


• A variable is created the moment you first assign a value to it.
• To assign a value to a variable, use the <- sign.
• To output (or print) the variable value, just type the variable name:
• name <- "John"
age <- 40

name # output "John"


age # output 40
• However, R does have a print() function available if you want to use it.
• This might be useful if you are familiar with other programming languages, such as
Python, which often use a print() function to output variables.
• name <- "John Doe"

print(name) # print the value of the name variable


CONCATENATE ELEMENTS

• You can also concatenate, or join, two or more elements, by using the paste() function.
• To combine both text and a variable, R uses comma (,):
• text <- "awesome"

paste("R is", text)

• Output: R is awesome
• You can also use , to add a variable to another variable:
• text1 <- "R is"
text2 <- "awesome"

paste(text1, text2)

• Output: R is awesome
• For numbers, the + character works as a mathematical operator:
• num1 <- 5
num2 <- 10

num1 + num2

• Output: 15
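• As a supplementary sketch (not part of the original slides): paste() puts a single space between its arguments by default. The separator can be changed with the sep argument, and paste0() joins with no separator. The variable names below are illustrative only.

```r
text1 <- "R is"
text2 <- "awesome"

# Default separator is a single space
default_join <- paste(text1, text2)

# A custom separator via the sep argument
dash_join <- paste(text1, text2, sep = "-")

# paste0() joins with no separator at all
tight_join <- paste0("R", "4")

print(default_join)
print(dash_join)
print(tight_join)
```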
MULTIPLE VARIABLES

• R allows you to assign the same value to multiple variables in one line:
• # Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"

# Print variable values


var1
var2
var3
• For the above code Orange will be printed thrice as it is assigned to three variables.
VARIABLE NAMES

• # Legal variable names:
myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"

# Illegal variable names:


2myvar <- "John"
my-var <- "John"
my var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"
DATATYPES

• Variables can store data of different types, and different types can do different things.
• In R, variables do not need to be declared with any particular type, and can even change
type after they have been set:
• my_var <- 30 # my_var is of type numeric
my_var <- "Sally" # my_var is now of type character (aka string)
• Basic data types in R can be divided into the following types:
  ▪ numeric - (10.5, 55, 787)
  ▪ integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
  ▪ complex - (9 + 3i, where "i" is the imaginary part)
  ▪ character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
  ▪ logical (a.k.a. boolean) - (TRUE or FALSE)
• We can use the class() function to check the data type of a variable:

• # numeric
x <- 10.5
class(x)

# integer
x <- 1000L
class(x)

# complex
x <- 9i + 3
class(x)

# character/string
x <- "R is exciting"
class(x)

# logical/boolean
x <- TRUE
class(x)
NUMBERS

• There are three number types in R:
  ▪ numeric
  ▪ integer
  ▪ complex
• x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
NUMERIC

• A numeric data type is the most common type in R, and contains any number with
or without a decimal, like: 10.5, 55, 787:
• x <- 10.5
y <- 55

# Print values of x and y


x
y
INTEGER

• Integers are numeric data without decimals. This is used when you are certain that
you will never create a variable that should contain decimals. To create an integer
variable, you must use the letter L after the integer value:
• x <- 1000L
y <- 55L

# Print values of x and y


x
y
COMPLEX

• A complex number is written with an "i" as the imaginary part:


• x <- 3+5i
y <- 5i

# Print values of x and y


x
y
TYPE CONVERSION

• You can convert from one type to another with the following functions:
  ▪ as.numeric()
  ▪ as.integer()
  ▪ as.complex()
• x <- 1L # integer
y <- 2 # numeric

# convert from integer to numeric:
a <- as.numeric(x)

# convert from numeric to integer:
b <- as.integer(y)

# print the converted values a and b
a
b

# check the data types of a and b
class(a)
class(b)
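• One detail worth seeing in action, sketched here as an addition to the slides: as.integer() does not round, it simply drops the fractional part. The value 2.9 is chosen only for illustration.

```r
x <- 1L     # integer
y <- 2.9    # numeric with a fractional part

a <- as.numeric(x)   # integer to numeric
b <- as.integer(y)   # numeric to integer: the .9 is dropped, not rounded

print(class(a))
print(class(b))
print(b)
```

If rounding is what you want, apply round() before converting.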
MATH

• 10+5
• 10-5
• These are simple math operations we can do in R.
BUILT-IN MATH FUNCTIONS

• R also has many built-in math functions that allow you to perform mathematical tasks on numbers.
• For example, the min() and max() functions can be used to find the lowest or highest number in a set:
• max(5, 10, 15)

min(5, 10, 15)


• Output:
15

5
• The sqrt() function returns the square root of a number:
• sqrt(16)
• Output: 4
• The abs() function returns the absolute (positive) value of a number:
• abs(-4.7)
• Output: 4.7
• The ceiling() function rounds a number upwards to its nearest integer, and the floor() function rounds a number downwards to its nearest integer, and returns the result:
• ceiling(1.4)
floor(1.4)
• Output:
2
1
STRINGS

• Strings are used for storing text.


• A string is surrounded by either single quotation marks, or double quotation marks:
• "hello" is the same as 'hello'.
• If you want the line breaks to be inserted at the same position as in the code, use the cat() function:
• str <- "Lorem ipsum dolor sit
amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."

cat(str)
STRING LENGTH

• There are many useful string functions in R.


• For example, to find the number of characters in a string, use the nchar() function:
• str <- "Hello World!"

nchar(str)
• Output: 12
CHECK A STRING

• Use the grepl() function to check if a character or a sequence of characters is present in a string:
• str <- "Hello World!"

grepl("H", str)
grepl("Hello", str)
grepl("X", str)
• Output:
TRUE
TRUE
FALSE
• The related grep() function returns the positions of the matching elements in a vector:
• x <- c('Geeks', 'Geeksfor', 'Geek', 'Geeksfor', 'Gfg')

grep('Geek', x)
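• To make the difference between grepl() and grep() concrete, here is a small sketch using the same vector. The names has_geek, idx and matches are illustrative, not from the slides.

```r
x <- c("Geeks", "Geeksfor", "Geek", "Geeksfor", "Gfg")

# grepl() returns one TRUE/FALSE per element
has_geek <- grepl("Geek", x)

# grep() returns the positions of the elements that match
idx <- grep("Geek", x)

# Either result can be used to pull out the matching elements
matches <- x[has_geek]

print(has_geek)
print(idx)
print(matches)
```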
ESCAPE CHARACTERS

• To insert characters that are illegal in a string, you must use an escape character.

• An escape character is a backslash \ followed by the character you want to insert.

• An example of an illegal character is a double quote inside a string that is surrounded by double quotes. The following code produces an error:

• str <- "We are the so-called "Vikings", from the north."

str

• To fix this problem, use the escape character \":

• str <- "We are the so-called \"Vikings\", from the north."

str
cat(str)

• Output:
"We are the so-called \"Vikings\", from the north."
We are the so-called "Vikings", from the north.
IF ELSE STATEMENT

• An "if statement" is written with the if keyword, and it is used to specify a block of code to
be executed if a condition is TRUE:
• a <- 33
b <- 200

if (b > a) {
print("b is greater than a")
}
ELSE IF

• The else if keyword is R's way of saying "if the previous conditions were not true, then try this condition":
• a <- 33
b <- 33

if (b > a) {
  print("b is greater than a")
} else if (a == b) {
  print("a and b are equal")
}
IF ELSE

• The else keyword catches anything which isn't caught by the preceding conditions:

a <- 200

b <- 33

if (b > a) {

print("b is greater than a")

} else if (a == b) {

print("a and b are equal")

} else {

print("a is greater than b")

}
NESTED-IF STATEMENTS

• You can also have if statements inside if statements; these are called nested if statements.

x <- 41

if (x > 10) {
  print("Above ten")
  if (x > 20) {
    print("and also above 20!")
  } else {
    print("but not above 20.")
  }
} else {
  print("below 10.")
}
WHILE LOOP

• With the while loop we can execute a set of statements as long as a condition is TRUE:
• Print i as long as i is less than 6:

• i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
}
BREAK

• With the break statement, we can stop the loop even if the while condition is TRUE:

• Exit the loop if i is equal to 4.

• i <- 1
while (i < 6) {
  print(i)
  i <- i + 1
  if (i == 4) {
    break
  }
}
NEXT

• With the next statement, we can skip an iteration without terminating the loop:

• Skip the value of 3:

• i <- 0
while (i < 6) {
  i <- i + 1
  if (i == 3) {
    next
  }
  print(i)
}
QUIZ
(IF .. ELSE COMBINED WITH A WHILE LOOP)
• To demonstrate a practical example, let us say we play a game of Yahtzee!
• Print "Yahtzee!" if the dice number is 6:
• dice <- 1
while (dice <= 6) {
if (dice < 6) {
print("No Yahtzee")
} else {
print("Yahtzee!")
}
dice <- dice + 1
}
FOR LOOP

• A for loop is used for iterating over a sequence:


• for (x in 1:10) {
  print(x)
}

• Print every item in a list:


• fruits <- list("apple", "banana", "cherry")

for (x in fruits) {
print(x)
}
QUIZ
(IF .. ELSE COMBINED WITH A FOR LOOP)
• Print "Yahtzee!" if the dice number is 6:
• dice <- 1:6

for(x in dice) {
if (x == 6) {
print(paste("The dice number is", x, "Yahtzee!"))
} else {
print(paste("The dice number is", x, "Not Yahtzee"))
}
}
NESTED LOOP

• It is also possible to place a loop inside another loop. This is called a nested loop:
• Print the adjective of each fruit in a list:
• adj <- list("red", "big", "tasty")

fruits <- list("apple", "banana", "cherry")


for (x in adj) {
  for (y in fruits) {
    print(paste(x, y))
  }
}
FUNCTION

• A function is a block of code which only runs when it is called.


• You can pass data, known as parameters, into a function.
• A function can return data as a result.
• To create a function, use the function() keyword:
• my_function <- function() { # create a function with the name my_function
print("Hello World!")
}

my_function() # call the function


ARGUMENTS

• Information can be passed into functions as arguments.

• Arguments are specified after the function name, inside the parentheses. You can add as many arguments as you want, just separate them with
a comma.

• The following example has a function with one argument (fname). When the function is called, we pass along a first name, which is used inside the
function to print the full name:

• my_function <- function(fname) {
  paste(fname, "Griffin")
}

my_function("Peter")
my_function("Lois")
my_function("Stewie")
• my_function <- function(fname, lname) {
paste(fname, lname)
}

my_function("Peter", "Griffin")
DEFAULT PARAMETER

• If we call the function without an argument, it uses the default value:


• my_function <- function(country = "Norway") {
paste("I am from", country)
}

my_function("Sweden")
my_function("India")
my_function() # will get the default value, which is Norway
my_function("USA")
RETURN VALUE

• To let a function return a result, use the return() function:


• my_function <- function(x) {
  return (5 * x)
}

print(my_function(3))
print(my_function(5))
print(my_function(9))
NESTED FUNCTIONS

• There are two ways to create a nested function:


• Call a function within another function.
• Write a function within a function.
• Example
• Call a function within another function:
• Nested_function <- function(x, y) {
a <- x + y
return(a)
}

Nested_function(Nested_function(2,2), Nested_function(3,3))
FUNCTION WITHIN A FUNCTION

• Write a function within a function:


• Outer_func <- function(x) {
  Inner_func <- function(y) {
    a <- x + y
    return(a)
  }
  return (Inner_func)
}
output <- Outer_func(3) # To call the Outer_func
output(5)
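• The inner function remembers the x it was created with. A short sketch of that behaviour follows; the names add3 and add10 are made up for illustration.

```r
Outer_func <- function(x) {
  Inner_func <- function(y) {
    x + y            # x comes from the enclosing call to Outer_func
  }
  Inner_func         # return the inner function itself
}

# Each call to Outer_func produces a function with its own x
add3  <- Outer_func(3)
add10 <- Outer_func(10)

print(add3(5))
print(add10(5))
```

Each returned function keeps its own copy of x, so add3 and add10 do not interfere with each other.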
RECURSION

• R also accepts function recursion, which means a defined function can call itself.
• Recursion is a common mathematical and programming concept. Its benefit is that you can loop through data to reach a result.
• The developer should be very careful with recursion, as it is easy to write a function that never terminates or that uses excessive memory or processor power. However, when written correctly, recursion can be a very efficient and mathematically elegant approach to programming.
• In this example, tri_recursion() is a function that we have defined to call itself ("recurse"). We use the k variable as the data, which decrements (-1) every time we recurse. The recursion ends when k is no longer greater than 0 (i.e. when it is 0).
• For a new developer it can take some time to work out exactly how this works; the best way to find out is by testing and modifying it.
• tri_recursion <- function(k) {
  if (k > 0) {
    result <- k + tri_recursion(k - 1)
    print(result)
  } else {
    result = 0
    return(result)
  }
}
tri_recursion(6)
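• A stripped-down variant can make the two parts of a recursive function easier to see: the base case that stops the recursion and the recursive case that calls the function again. This is a sketch under the assumption that only the final total is wanted; tri_sum is a hypothetical name, not from the slides.

```r
tri_sum <- function(k) {
  if (k <= 0) {
    return(0)            # base case: nothing left to add
  }
  k + tri_sum(k - 1)     # recursive case: k plus the sum of everything below it
}

total <- tri_sum(6)      # 6 + 5 + 4 + 3 + 2 + 1
print(total)
```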
GLOBAL VARIABLE

• Variables that are created outside of a function are known as global variables.
• Global variables can be used by everyone, both inside of functions and outside.
• Create a variable outside of a function and use it inside the function:
• txt <- "awesome"
my_function <- function() {
  paste("R is", txt)
}

my_function()
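• A related point, sketched here as an addition: assigning to the same name inside a function creates a separate local variable and leaves the global one unchanged. The value "fantastic" is chosen only for illustration.

```r
txt <- "awesome"           # global variable

my_function <- function() {
  txt <- "fantastic"       # local variable; the global txt is not modified
  paste("R is", txt)
}

inside  <- my_function()       # uses the local txt
outside <- paste("R is", txt)  # uses the global txt

print(inside)
print(outside)
```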
OBJECTS

• Vectors
• Lists
• Matrices
• Arrays
• Data Frames
• Factors
VECTORS

• A vector is simply a list of items that are of the same type.

• To combine the list of items to a vector, use the c() function and separate the items by a comma.

• X <- c(61, 4, 21, 67, 89, 2)
cat('using c function', X, '\n')

Y <- seq(1, 10, length.out = 5)
cat('using seq() function', Y, '\n')

Z <- 2:7
cat('using colon', Z)
• To create a vector with numerical values in a sequence, use the : operator:

• # Vector with numerical values in a sequence
numbers <- 1:10

numbers

• You can also create numerical values with decimals in a sequence, but note that if the last element does not belong to
the sequence, it is not used:

• # Vector with numerical decimals in a sequence
numbers1 <- 1.5:6.5
numbers1

# Vector with numerical decimals in a sequence where the last element is not used
numbers2 <- 1.5:6.3
numbers2
TYPES OF VECTORS

• Numeric Vectors: Numeric vectors contain numeric values such as integers and decimal (double) numbers.
• Character Vectors: Character vectors in R contain text values, which may include letters, digits and special characters.
• Logical Vectors: Logical vectors in R contain the Boolean values TRUE and FALSE, and NA for missing values.
• To find out how many items a vector has, use the length() function:

• fruits <- c("banana", "apple", "orange")

length(fruits)

• To sort items in a vector alphabetically or numerically, use the sort() function:

• #default sorting ascending

• fruits <- c("banana", "apple", "orange", "mango", "lemon")

numbers <- c(13, 3, 5, 7, 20, 2)

sort(fruits) # Sort a string

sort(numbers) # Sort numbers

sort(fruits, decreasing=TRUE)
• You can access the vector items by referring to its index number inside brackets []. The first item has index 1, the second item has index 2, and so on:

• fruits <- c("banana", "apple", "orange")

# Access the first item (banana)

fruits[1]

• You can also access multiple elements by referring to different index positions with the c() function:

• fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access the first and third item (banana and orange)

fruits[c(1, 3)]
• You can also use negative index numbers to access all items except the ones specified:
• fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access all items except for the first item
fruits[c(-1)]
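• Two further indexing patterns, sketched as a supplement to the slides: several negative indices can be combined to drop more than one item, and a logical condition inside the brackets keeps only the items for which it is TRUE. The result names are illustrative.

```r
fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Drop the first and second items
without_first_two <- fruits[-c(1, 2)]

# Keep only names longer than five characters
long_names <- fruits[nchar(fruits) > 5]

print(without_first_two)
print(long_names)
```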
CHANGE AN ITEM IN VECTOR

• To change the value of a specific item, refer to the index number:


• Example
• fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Change "banana" to "pear"


fruits[1] <- "pear"

# Print fruits
fruits
REPEAT VECTOR

• To repeat vectors, use the rep() function:

• Repeat each value:

repeat_each <- rep(c(1,2,3), each = 3)

repeat_each

• Repeat the sequence of the vector:

• repeat_times <- rep(c(1,2,3), times = 3)

repeat_times
• Repeat each value independently:
• repeat_indepent <- rep(c(1,2,3), times = c(5,2,1))

repeat_indepent

• To make bigger or smaller steps in a sequence, use the seq() function:


• numbers <- seq(from = 0, to = 100, by = 20)

numbers
• # Deleting a vector
M <- c(8, 10, 2, 5)

M <- NULL
cat('Output vector', M)
APPENDING IN VECTOR

• Using c() function

x <- 1:5

n <- 6:10

y <- c(x, n)

print(y)

• Using append() function

x <- 1:5

x <- append(x, 6:10)

print(x)
• Appending using indexing

my_vector <- c(1, 2, 3, 4)
my_vector[5] <- 5
my_vector[6] <- 6

my_vector
RANGE FUNCTION IN VECTOR

• Range function is used to get the minimum and maximum values of the vector passed
to it as an argument.
• # R program to find the minimum and maximum element of a vector
x <- c(8, 2, 5, 4, 9, 6, 54, 18)
range(x)
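• range() returns a vector of two values, minimum first and maximum second. The sketch below picks them apart and computes the spread; the names r, smallest, largest and spread are illustrative.

```r
x <- c(8, 2, 5, 4, 9, 6, 54, 18)

r <- range(x)        # two values: minimum, then maximum
smallest <- r[1]
largest  <- r[2]
spread   <- diff(r)  # maximum minus minimum

print(r)
print(spread)
```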
FORMAT FUNCTION IN VECTOR

• format() controls how content is displayed. Alignment can be one of three types: left, right and center.

# Placing string in the left side

result1 <- format("GFG", width = 8, justify = "l")

# Placing string in the center

result2 <- format("GFG", width = 8, justify = "c")

# Placing string in the right

result3 <- format("GFG", width = 8, justify = "r")

# Getting the different string placement

print(result1)

print(result2)

print(result3)
NUMBER FORMATTING

# Rounding off to the specified number of significant digits (here 4)

result1 <- format(12.3456789, digits=4)

result2 <- format(12.3456789, digits=6)

print(result1)

print(result2)

# Getting the specified minimum number of digits to the right of the decimal point.

result3 <- format(12.3456789, nsmall=2)

result4 <- format(12.3456789, nsmall=7)

print(result3)

print(result4)
• # Getting the number in string form
result1 <- format(1234)
result2 <- format(12.3456789)
print(result1)
print(result2)

# Display numbers in scientific notation
# With scientific=TRUE the number is written as a power of 10:
# e+01 means "times 10¹", so 1.234568e+01 is 1.234568 × 10¹ = 12.34568.
result3 <- format(12.3456789, scientific=TRUE)
result4 <- format(12.3456789, scientific=FALSE)
print(result3)
print(result4)
DATE AND TIME FORMATTING

• # Current date and time
x <- Sys.time()
formatted <- format(x, format = "%Y-%m-%d %H:%M:%S")
print(formatted)

# Format date with the month's name
x <- as.Date("2023-06-27")
formatted <- format(x, format = "%B %d, %Y")
print(formatted)
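• format() turns a date into text. Going the other way, as.Date() accepts a format argument that describes how the input text is laid out. The sketch below assumes a day/month/year input string, which is made up for illustration.

```r
# Parse a day/month/year string; format describes the layout of the input
x <- as.Date("27/06/2023", format = "%d/%m/%Y")

# Write the date back out in year-month-day order
iso <- format(x, format = "%Y-%m-%d")
print(iso)

# Dates support arithmetic in whole days
next_day <- format(x + 1, format = "%Y-%m-%d")
print(next_day)
```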
REPLACE FUNCTION

• # Initializing a string vector
x <- c("GFG", "gfg", "Geeks")
# Getting the strings
x
# Calling replace() function to replace the word gfg at index 2 with the GeeksforGeeks element
y <- replace(x, 2, "GeeksforGeeks")
# Getting the new replaced strings
y

• # Initializing a string vector
x <- c("GFG", "gfg", "Geeks")
# Getting the strings
x
# Calling replace() function to replace the word GFG at index 1 and Geeks at index 3 with the A and B elements respectively
y <- replace(x, c(1, 3), c("A", "B"))
# Getting the new replaced strings
y
TOSTRING()

• # Initializing a string vector
x <- c("GFG", "Geeks", "GeeksforGeekss")

# Calling the toString() function
toString(x)
SUBSTRING()

• # Calling substring() function
substring("Geeks", 2, 3)
substring("Geeks", 1, 4)
substring("GFG", 1, 1)
substring("gfg", 3, 3)
MULTIPLE VALUES USING SUBSTRING()

• # Initializing a string vector
x <- c("GFG", "gfg", "Geeks")

# Calling substring() function
substring(x, 2, 3)
substring(x, 1, 3)
substring(x, 2, 2)
STRING REPLACEMENT

• # Initializing a string vector
x <- c("GFG", "gfg", "Geeks")

# Calling substring() function
substring(x, 2, 3) <- c("@")
print(x)
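• One behaviour that is easy to miss: the assignment form of substring() overwrites only as many characters as the replacement string contains, even if the requested range is longer. The sketch below compares a one-character and a two-character replacement; the second vector y is added purely for illustration.

```r
x <- c("GFG", "gfg", "Geeks")
substring(x, 2, 3) <- "@"     # one character supplied: only position 2 changes
print(x)

y <- c("GFG", "gfg", "Geeks")
substring(y, 2, 3) <- "@@"    # two characters supplied: positions 2 and 3 change
print(y)
```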
QUIZ

Create two vectors: a <- c(2, 4, 6, 8, 10) and b <- c(1, 3, 5, 7, 9). Compute their sum,
difference, product, and division.
a <- c(2, 4, 6, 8, 10)

b <- c(1, 3, 5, 7, 9)

sum_vec <- a + b

diff_vec <- a - b

prod_vec <- a * b

div_vec <- a / b

print(sum_vec)

print(diff_vec)

print(prod_vec)

print(div_vec)
• Given a vector nums <- c(5, 12, 18, 25, 7, 30, 45), extract elements greater than 20.
nums <- c(5, 12, 18, 25, 7, 30, 45)
filtered_nums <- nums[nums > 20]
print(filtered_nums)
• Count how many times 5 appears in x <- c(1, 5, 3, 5, 7, 5, 9, 5).
x <- c(1, 5, 3, 5, 7, 5, 9, 5)
count_5 <- sum(x == 5)
print(count_5)
• Write an R program to append values to a given empty vector.
vector = c()
values = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
for (i in 1:length(values))
vector[i] <- values[i]
print(vector)
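• Two alternative ways to write the last answer, offered as a sketch rather than the expected solution: seq_along() is a safer loop range than 1:length() when the input might be empty, and c() can append all values in one step without a loop. The names vec and vec2 are illustrative.

```r
values <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

# Loop version: seq_along(values) yields no iterations if values is empty
vec <- c()
for (i in seq_along(values)) {
  vec[i] <- values[i]
}

# Vectorized version: append everything at once
vec2 <- c(c(), values)

print(vec)
print(vec2)
```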
LISTS

• A list in R can contain many different data types inside it. A list is a collection of data which
is ordered and changeable.
• To create a list, use the list() function:
• # List of strings
thislist <- list("apple", "banana", "cherry")

# Print the list


thislist
• empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4

empList = list(empId, empName, numberOfEmp)

print(empList)

• # Creating a named list
my_named_list <- list(name = "Sudheer", age = 25, city = "Delhi")

# Printing the named list
print(my_named_list)
ACCESSING COMPONENTS BY NAMES

empId = c(1, 2, 3, 4)

empName = c("Debi", "Sandeep", "Subham", "Shiba")

numberOfEmp = 4

empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)

print(empList)

# Accessing components by names

cat("Accessing name components using $ command\n")

print(empList$Names)
ACCESSING COMPONENTS BY INDICES

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)

# Accessing a top level component by index
cat("Accessing name components using indices\n")
print(empList[[2]])

# Accessing an inner level component by index
cat("Accessing Sandeep from name using indices\n")
print(empList[[2]][2])

# Accessing another inner level component by index
cat("Accessing 4 from ID using indices\n")
print(empList[[1]][4])
MODIFYING COMPONENTS OF LIST

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before modifying the list\n")
print(empList)

# Modifying the top-level component
empList$`Total Staff` = 5

# Modifying inner level component
empList[[1]][5] = 5
empList[[2]][5] = "Kamala"

cat("After modified the list\n")
print(empList)
CONCATENATION OF LIST

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before concatenation of the new list\n")
print(empList)

# Creating another list
empAge = list("Age" = c(34, 23, 18, 45))

# Concatenation of list using concatenation operator
# (both arguments are lists, so the result is a list with four components)
empList = c(empList, empAge)

cat("After concatenation of the new list\n")
print(empList)
DELETION OF ELEMENTS FROM LIST

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
cat("Before deletion the list is\n")
print(empList)

# Deleting a top level component
cat("After Deleting Total staff components\n")
print(empList[-3])

# Deleting an inner level component
cat("After Deleting sandeep from name\n")
print(empList[[2]][-2])
• You can access the list items by referring to its index number, inside brackets. The first item has index 1, the second item
has index 2, and so on:
• thislist <- list("apple", "banana", "cherry")

thislist[1]

• To change the value of a specific item, refer to the index number:


• thislist <- list("apple", "banana", "cherry")
thislist[1] <- "blackcurrant"

# Print the updated list


thislist
• To find out how many items a list has, use the length() function:

• thislist <- list("apple", "banana", "cherry")

• length(thislist)

• To find out if a specified item is present in a list, use the %in% operator:

• Check if "apple" is present in the list:

• thislist <- list("apple", "banana", "cherry")

"apple" %in% thislist


• To add an item to the end of the list, use the append() function:
• Add "orange" to the list:
• thislist <- list("apple", "banana", "cherry")
append(thislist, "orange")

• To add an item to the right of a specified index, add "after=index number" in the append() function:
• Add "orange" to the list after "banana" (index 2):

thislist <- list("apple", "banana", "cherry")


append(thislist, "orange", after = 2)
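• Note that append() returns a new list and does not change the original. To keep the result, assign it back. The sketch below shows this; the name length_before is illustrative.

```r
thislist <- list("apple", "banana", "cherry")

# The result is returned but not stored, so thislist is unchanged
append(thislist, "orange")
length_before <- length(thislist)

# Assign the result back to keep the new item
thislist <- append(thislist, "orange", after = 2)

print(length_before)
print(thislist)
```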
• You can also remove list items. The following example creates a new, updated list without an "apple" item:

• Remove "apple" from the list:

• thislist <- list("apple", "banana", "cherry")

newlist <- thislist[-1]

# Print the new list


newlist

• You can specify a range of indexes by specifying where to start and where to end the range, by using the : operator:

• Return the second, third, fourth and fifth item:

• thislist <- list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")

• (thislist)[2:5]
• You can loop through the list items by using a for loop:

• Print all items in the list, one by one:

thislist <- list("apple", "banana", "cherry")

for (x in thislist) {
  print(x)
}

• Joining two lists.

list1 <- list("a", "b", "c")


list2 <- list(1,2,3)
list3 <- c(list1,list2)

list3
QUIZ

Question:
A company maintains an employee database using R lists. The list stores the following details:
• Employee IDs as a numeric vector (e.g., c(1, 2, 3, 4))
• Employee Names as a character vector (e.g., c("Debi", "Sandeep", "Subham", "Shiba"))
• Total number of employees as a single numeric value
The HR manager wants to perform the following operations:
1. Retrieve the list of employee names using both name-based and index-based access.
2. Update the names of employees with a new set of names.
3. Merge the employee list with another list containing department and location details.
4. Remove the total employee count from the list.
Write an R program to help the HR manager accomplish these tasks and display the results accordingly.
# Creating an employee list
empList <- list(
ID = c(1, 2, 3, 4),
Names = c("Debi", "Sandeep", "Subham", "Shiba"),
Total_Staff = 4
)

# 1. Retrieve employee names using both name-based and index-based access


cat("Accessing employee names:\n")
print(empList$Names) # Using name-based access
print(empList[[2]]) # Using index-based access

# 2. Update employee names with a new set of names


cat("\nUpdating employee names:\n")
empList$Names <- c("Amit", "Rohan", "Sneha", "Vikram") # Updating Names
print(empList$Names)

# 3. Merge employee list with department and location details


cat("\nMerging employee list with department and location:\n")
newList <- list(Department = "HR", Location = "New York")
mergedList <- c(empList, newList) # Merging lists
print(mergedList)

# 4. Remove total employee count from the list


cat("\nRemoving Total_Staff component:\n")
empList <- empList[-3] # Removing Total_Staff
print(empList)
MATRICES

• A matrix is a two dimensional data set with columns and rows.


• A column is a vertical representation of data, while a row is a horizontal representation of data.
• A matrix can be created with the matrix() function. Specify the nrow and ncol parameters to set the number of rows and columns:
• # Create a matrix
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)

# Print the matrix
thismatrix
• You can also create a matrix with strings:
• thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix
• You can access the items by using [ ] brackets. The first number "1" in the bracket
specifies the row-position, while the second number "2" specifies the column-position:
• thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[1, 2]
• The whole row can be accessed if you specify a comma after the number in the bracket:
• thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
thismatrix[2,]

• The whole column can be accessed if you specify a comma before the number in the bracket:
• thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol =
2) thismatrix[,2]
• Access More Than One Row
• More than one row can be accessed if you use the c() function:
• thismatrix <- matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)
thismatrix[c(1,2),]

• Access More Than One Column


• More than one column can be accessed if you use the c() function:
• thismatrix <- matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)
thismatrix[, c(1,2)]
• Add Rows and Columns

• Use the cbind() function to add additional columns in a Matrix:

thismatrix <- matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)
newmatrix <- cbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix
newmatrix

Note: The new column must contain one value for each row of the existing matrix (three values here).

• Use the rbind() function to add additional rows in a Matrix:

thismatrix <- matrix(c("apple", "banana", "cherry", "orange","grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)
newmatrix <- rbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix
newmatrix

Note: The new row must contain one value for each column of the existing matrix (three values here).
• Remove Rows and Columns

• Use negative indices inside the [ ] brackets to remove rows and columns from a Matrix (the c() function lets you list several indices at once):

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"), nrow = 3, ncol = 2)

# Remove the first row and the first column
thismatrix <- thismatrix[-c(1), -c(1)]

thismatrix

• Check if an Item Exists

• To find out if a specified item is present in a matrix, use the %in% operator:

• Check if "apple" is present in the matrix:

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
"apple" %in% thismatrix


• Number of Rows and Columns
• Use the dim() function to find the number of rows and columns in a Matrix:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
dim(thismatrix)

• Matrix Length
• Use the length() function to find the total number of items in a Matrix (rows multiplied by columns):
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
length(thismatrix)
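The difference between dim() and length() is easiest to see on a matrix that is not square. The sketch below uses a hypothetical 2 x 3 matrix named m, which is not part of the examples above:

```r
# Hypothetical 2 x 3 matrix used only for this illustration
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)                           # number of rows, then number of columns
length(m)                        # total number of cells
length(m) == nrow(m) * ncol(m)   # the two views always agree
```

dim() describes the shape, while length() only counts the items, so two matrices with different shapes can have the same length.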
• Loop Through a Matrix
• You can loop through a Matrix using a for loop. The loop will start at the first row, moving right:
• Loop through the matrix items and print them:
thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)
for (rows in 1:nrow(thismatrix)) {
  for (columns in 1:ncol(thismatrix)) {
    print(thismatrix[rows, columns])
  }
}
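For numeric matrices, row-wise or column-wise calculations often do not need an explicit loop. A minimal sketch, assuming a hypothetical numeric matrix named scores (not part of the examples above), using the base R apply() function:

```r
# Hypothetical numeric matrix: 3 rows, 2 columns
scores <- matrix(c(80, 75, 90, 120, 130, 140), nrow = 3, ncol = 2)

row_totals <- apply(scores, 1, sum)    # MARGIN = 1 applies sum() to each row
col_means  <- apply(scores, 2, mean)   # MARGIN = 2 applies mean() to each column

row_totals
col_means
```

The nested for loop is still the clearer choice when each item needs its own handling, such as printing.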
• Combine two Matrices

• Again, you can use the rbind() or cbind() function to combine two or more matrices together:

# Combine matrices
Matrix1 <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2, ncol = 2)
Matrix2 <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow = 2, ncol = 2)

# Adding it as rows
Matrix_Combined <- rbind(Matrix1, Matrix2)
Matrix_Combined

# Adding it as columns
Matrix_Combined <- cbind(Matrix1, Matrix2)
Matrix_Combined
QUIZ

A hospital maintains a patient monitoring system where vital signs such as heart rate, blood pressure, and oxygen levels are
stored in a matrix. Each row represents a patient, and each column represents a different health metric. The hospital's data
analysts need to perform several operations to manage and analyze the data:

• Retrieve patient information by accessing specific rows and columns.

• Add new patients and new health metrics as rows and columns.

• Remove discharged patients and outdated metrics from the matrix.

• Check if a critical value exists in the matrix to identify emergency cases.

• Find the total number of patients and health metrics in the dataset.

• Calculate the total number of recorded values in the matrix.

• Loop through the matrix to analyze each patient's data.

• Merge multiple matrices from different hospital branches for consolidated analysis.
patient_data <- matrix(c(80, 120, 98, 75, 130, 95, 90, 140, 88), nrow = 3, byrow = TRUE, dimnames = list(c("Patient1", "Patient2", "Patient3"), c("Heart Rate", "Blood Pressure", "Oxygen Level")))
print("Initial Patient Data:")
print(patient_data)

patient1_data <- patient_data[1, c("Heart Rate", "Oxygen Level")]


print("Patient 1 Data (Heart Rate & Oxygen Level):")
print(patient1_data)

new_patient <- c(85, 125, 97)


patient_data <- rbind(patient_data, new_patient)
rownames(patient_data)[nrow(patient_data)] <- "Patient4"

temperature <- c(98.6, 99.0, 98.7, 98.5) # Added data for all patients
patient_data <- cbind(patient_data, Temperature = temperature)
print("Updated Patient Data with New Patient and Metric:")
print(patient_data)

# Removing Patient 2 (2nd row) and Blood Pressure (2nd column) using index numbers
patient_data <- patient_data[-2, ]
patient_data <- patient_data[, -2]
print("Data After Removing Patient 2 and 'Blood Pressure' Metric:")
print(patient_data)

critical_values <- patient_data[, "Heart Rate"] > 100 # Check for heart rate > 100
print("Critical Values (Emergency Cases):")
print(critical_values)

total_patients <- nrow(patient_data)


total_metrics <- ncol(patient_data)
print(paste("Total Patients:", total_patients))
print(paste("Total Metrics:", total_metrics))

total_values <- total_patients * total_metrics


print(paste("Total Recorded Values:", total_values))

for (i in 1:total_patients) {
cat(paste("Analyzing Data for", rownames(patient_data)[i], ":\n"))
cat(paste("Heart Rate:", patient_data[i, "Heart Rate"], "\n"))
cat(paste("Oxygen Level:", patient_data[i, "Oxygen Level"], "\n"))
cat(paste("Temperature:", patient_data[i, "Temperature"], "\n"))
cat("\n")
}

branch2_data <- matrix(c(85, 130, 95, 90, 125, 99, 92, 135, 98), nrow = 3, byrow = TRUE, dimnames = list(c("Patient5", "Patient6", "Patient7"),c("Heart Rate", "Blood Pressure", "Oxygen Level")))

# Adding Temperature column to branch2_data to match patient_data


branch2_temperature <- c(98.4, 98.8, 98.6)
branch2_data <- cbind(branch2_data, Temperature = branch2_temperature)
branch2_data <- branch2_data[, colnames(patient_data)] # Ensure same column order

consolidated_data <- rbind(patient_data, branch2_data)


print("Consolidated Data from Multiple Branches:")
print(consolidated_data)
ARRAYS

• Compared to matrices, arrays can have more than two dimensions.


• We can use the array() function to create an array, and the dim parameter to specify the dimensions:
# A vector (one dimension) with values ranging from 1 to 24
thisarray <- c(1:24)
thisarray

# An array with more than one dimension
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray
• Access Array Items

• You can access the array elements by referring to the index position. You can use the [] brackets to access the desired elements from an array:

thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))

multiarray[2, 3, 2]

The syntax is as follows: array[row position, column position, matrix level]

• You can also access the whole row or column from a matrix in an array, by using the c() function:

thisarray <- c(1:24)

# Access all the items from the first row from matrix one
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray[c(1),,1]

• A comma (,) before c() means that we want to access the column.

• A comma (,) after c() means that we want to access the row.
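The comma rules above can be checked on the same 4 x 3 x 2 array. This is a small sketch; the variable names first_row_layer1, first_col_layer1 and second_layer are chosen here for illustration only:

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

first_row_layer1 <- multiarray[1, , 1]   # row 1 of matrix level 1
first_col_layer1 <- multiarray[, 1, 1]   # column 1 of matrix level 1
second_layer     <- multiarray[, , 2]    # the whole second matrix level (4 x 3)

first_row_layer1
first_col_layer1
dim(second_layer)
```

Leaving an index position empty means "everything along this dimension", which is why an empty row and column position returns a complete matrix level.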
Check if an Item Exists
• To find out if a specified item is present in an array, use the %in% operator:
• Check if the value "2" is present in the array:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
2 %in% multiarray
• Number of Rows and Columns

• Use the dim() function to find the number of rows, columns, and matrix levels in an array:

thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
dim(multiarray)

• Array Length

• Use the length() function to find the total number of items in an array:

thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
length(multiarray)
• Loop Through an Array
• You can loop through the array items by using a for loop:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
for(x in multiarray){
print(x)
}
A library maintains a system to track the availability of books in different genres and their popularity ratings. The library uses an
array to store the data. Each row represents a book, each column represents a different attribute (such as genre and popularity
rating), and each "layer" of the array represents different libraries within a region. The library's data analysts need to perform
several operations to manage and analyze the data:

• Retrieve book information by accessing specific rows, columns, and layers.

• Add new books and new attributes (like genre or rating) as rows and columns in the array.

• Remove discontinued books and outdated attributes from the array.

• Check if a book has a critical rating (e.g., below 3, indicating low popularity).

• Find the total number of books and attributes in the dataset.

• Calculate the total number of data points in the array.

• Loop through the array to analyze each book's details (genre, rating).

• Merge multiple arrays from different library branches for consolidated analysis.
# Build each library as a matrix first; byrow = TRUE keeps each book's genre and rating on one row
library1 <- matrix(c("Fiction", "5", "Non-fiction", "4", "Mystery", "2"), nrow = 3, byrow = TRUE)
library2 <- matrix(c("Sci-Fi", "5", "Fiction", "3", "Biography", "4"), nrow = 3, byrow = TRUE)
book_data <- array(c(library1, library2), dim = c(3, 2, 2),
  dimnames = list(c("Book1", "Book2", "Book3"), c("Genre", "Rating"), c("Library1", "Library2")))
print("Initial Book Data:")
print(book_data)

book1_data_library1 <- book_data["Book1", , "Library1"]
print("Book1 Data in Library1 (Genre & Rating):")
print(book1_data_library1)

# Add Book4 to every library by binding a new row onto each layer
new_book <- c("Romance", "4")
book_data <- array(c(rbind(book_data[, , 1], new_book), rbind(book_data[, , 2], new_book)),
  dim = c(4, 2, 2),
  dimnames = list(c("Book1", "Book2", "Book3", "Book4"), c("Genre", "Rating"), c("Library1", "Library2")))
print("Updated Book Data with New Book:")
print(book_data)

# Removing Book2 (2nd row) using its index number
book_data <- book_data[-2, , ]
# Removing Rating (2nd column); stored separately because the ratings are still needed below
genre_only <- book_data[, -2, , drop = FALSE]
print("Data After Removing Book2 and 'Rating' Attribute:")
print(genre_only)

ratings <- book_data[, "Rating", ] # Character matrix: books by libraries
mode(ratings) <- "numeric" # Convert to numbers while keeping the row and column names
critical_ratings <- ratings < 3 # Check for ratings less than 3
print("Books with Critical Ratings (Below 3):")
print(critical_ratings)

total_books <- dim(book_data)[1]
total_attributes <- dim(book_data)[2]
print(paste("Total Books:", total_books))
print(paste("Total Attributes:", total_attributes))

total_data_points <- total_books * total_attributes * dim(book_data)[3]
print(paste("Total Recorded Data Points:", total_data_points))

for (i in 1:total_books) {
  for (j in 1:dim(book_data)[3]) {
    cat(paste("Analyzing", dimnames(book_data)[[1]][i], "Data in", dimnames(book_data)[[3]][j], ":\n"))
    cat(paste("Genre:", book_data[i, "Genre", j], "\n"))
    if (!is.null(dimnames(book_data)[[2]]) && "Rating" %in% dimnames(book_data)[[2]]) {
      cat(paste("Rating:", book_data[i, "Rating", j], "\n"))
    }
    cat("\n")
  }
}

# A third library with the same three books; again built row by row
library3 <- matrix(c("Fiction", "4", "Non-fiction", "5", "Mystery", "3"), nrow = 3, byrow = TRUE)
library3_data <- array(c(library3), dim = c(3, 2, 1),
  dimnames = list(c("Book1", "Book3", "Book4"), c("Genre", "Rating"), c("Library3")))

merged_data <- array(c(book_data, library3_data), dim = c(3, 2, 3), dimnames = list(c("Book1", "Book3", "Book4"), c("Genre", "Rating"), c("Library1", "Library2", "Library3")))
print("Consolidated Book Data from Multiple Libraries:")
print(merged_data)
DATA FRAMES

• A data frame stores data in a table format.

• Data frames can hold different types of data. While the first column can be character, the second and third can be numeric or logical. However, all values within a single column must have the same type.

• Use the data.frame() function to create a data frame:

• Example

# Create a data frame
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Print the data frame
Data_Frame
SUMMARIZE THE DATA

• Use the summary() function to summarize the data from a Data Frame:

• Example

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

Data_Frame

summary(Data_Frame)
ACCESSING ITEMS

• We can use single brackets [ ], double brackets [[ ]] or $ to access columns from a data frame:

• Example

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

Data_Frame[1]

Data_Frame[["Training"]]

Data_Frame$Training
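The three access forms do not return the same kind of object, which matters when the result is passed to another function. A short sketch on the same Data_Frame, using only base R functions:

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

class(Data_Frame["Pulse"])     # single brackets keep the data frame structure
class(Data_Frame[["Pulse"]])   # double brackets return the column as a vector
mean(Data_Frame$Pulse)         # $ also returns a vector, so mean() works on it
```

As a rule of thumb, use [ ] when a smaller table is wanted and [[ ]] or $ when the values themselves are needed.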
ADD ROWS

• Use the rbind() function to add new rows in a Data Frame:

• Example

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Add a new row (passed as a one-row data frame so that Pulse and Duration stay numeric)
New_row_DF <- rbind(Data_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))

# Print the new data frame
New_row_DF
ADD COLUMNS

• Use the cbind() function to add new columns in a Data Frame:

• Example

Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Add a new column
New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))

# Print the new data frame
New_col_DF
REMOVE ROWS AND COLUMNS

Use negative indices inside the [ ] brackets to remove rows and columns in a Data Frame (the c() function lets you list several indices at once):

Example
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Remove the first row and column
Data_Frame_New <- Data_Frame[-c(1), -c(1)]

# Print the new data frame
Data_Frame_New


AMOUNT OF ROWS AND COLUMNS

Use the dim() function to find the number of rows and columns in a Data Frame:

Example
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

dim(Data_Frame)
You can also use the ncol() function to find the number of columns and nrow() to find the number of rows:

Example
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

ncol(Data_Frame)
nrow(Data_Frame)
DATA FRAME LENGTH

Use the length() function to find the number of columns in a Data Frame (similar to ncol()):

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"), Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

length(Data_Frame)
COMBINING DATA FRAMES

Use the rbind() function to combine two or more data frames in R vertically:

Example
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"), Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame2 <- data.frame (


Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)

New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)
New_Data_Frame


And use the cbind() function to combine two or more data frames in R horizontally:

Example
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"), Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame4 <- data.frame (


Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)

New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
New_Data_Frame1


QUIZ

A university maintains a system to track student enrollment and academic performance across different departments. The university uses a
data frame to store the data. Each row represents a student, and each column represents different attributes such as student name,
department, GPA, and enrollment status. The university's data analysts need to perform several operations to manage and analyze the data:
• Retrieve student information by accessing specific rows and columns.
• Add new students and new attributes (e.g., department, GPA) as rows and columns in the data frame.
• Remove students who have graduated or withdrawn from the university.
• Check if any students have a GPA below 2.0 (indicating academic probation).
• Find the total number of students and attributes in the dataset.
• Calculate the total number of records in the data frame.
• Loop through the data frame to analyze each student’s data (name, department, GPA).
• Merge multiple data frames from different departments for consolidated analysis.
student_data <- data.frame(
  StudentID = c(101, 102, 103, 104),
  Name = c("John Doe", "Jane Smith", "Mary Johnson", "Mike Lee"),
  Department = c("Computer Science", "Biology", "Chemistry", "Physics"),
  GPA = c(3.5, 2.8, 3.2, 1.9),
  EnrollmentStatus = c("Active", "Active", "Graduated", "Active")
)
print("Initial Student Data:")
print(student_data)

student_102_info <- student_data[student_data$StudentID == 102, c("Department", "GPA")]
print("Student 102 Department and GPA:")
print(student_102_info)

new_student <- data.frame(StudentID = 105, Name = "Sarah Lee", Department = "Mathematics", GPA = 3.9,
  EnrollmentStatus = "Active")
student_data <- rbind(student_data, new_student)
email_data <- data.frame(Email = c("[email protected]", "[email protected]", "[email protected]",
  "[email protected]", "[email protected]"))
student_data <- cbind(student_data, email_data)
print("Updated Student Data with New Student and Email Column:")
print(student_data)

student_data <- student_data[student_data$EnrollmentStatus != "Graduated", ]
print("Student Data After Removing Graduated Students:")
print(student_data)

students_on_probation <- student_data[student_data$GPA < 2.0, ]
print("Students on Academic Probation (GPA below 2.0):")
print(students_on_probation)

total_students <- nrow(student_data)
total_attributes <- ncol(student_data)
print(paste("Total Students:", total_students))
print(paste("Total Attributes:", total_attributes))

total_records <- total_students * total_attributes
print(paste("Total Number of Records:", total_records))

for (i in 1:total_students) {
  cat(paste("Analyzing Student ID:", student_data$StudentID[i], "\n"))
  cat(paste("Name:", student_data$Name[i], "\n"))
  cat(paste("Department:", student_data$Department[i], "\n"))
  cat(paste("GPA:", student_data$GPA[i], "\n"))
  cat(paste("Enrollment Status:", student_data$EnrollmentStatus[i], "\n"))
  cat(paste("Email:", student_data$Email[i], "\n"))
  cat("\n")
}

math_department_data <- data.frame(
  StudentID = c(106, 107),
  Name = c("Anna White", "Paul Black"),
  Department = c("Mathematics", "Mathematics"),
  GPA = c(3.6, 3.1),
  EnrollmentStatus = c("Active", "Active")
)
email_math_data <- data.frame(Email = c("[email protected]", "[email protected]"))
math_department_data <- cbind(math_department_data, email_math_data)

consolidated_data <- rbind(student_data, math_department_data)
print("Consolidated Student Data from Multiple Departments:")
print(consolidated_data)
FACTORS

Factors are used to categorize data. Examples of factors are:


Demography: Male/Female
Music: Rock, Pop, Classic, Jazz
Training: Strength, Stamina
To create a factor, use the factor() function and add a vector as argument:

Example
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Print the factor
music_genre


To only print the levels, use the levels() function:

Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
levels(music_genre)
You can also set the levels, by adding the levels argument inside the factor() function:

Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"), levels = c("Classic", "Jazz", "Pop

levels(music_genre)
FACTOR LENGTH

Use the length() function to find out how many items there are in the factor:

Example
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
length(music_genre)
ACCESS FACTORS

To access the items in a factor, refer to the index number, using [] brackets:

Example
Access the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre[3]
CHANGE ITEM VALUE

• To change the value of a specific item, refer to the index number:
• Example
• Change the value of the third item:
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre[3] <- "Pop"

music_genre[3]
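The change above works because "Pop" is already one of the factor's levels. Assigning a value that is not a level produces NA with a warning. The sketch below shows one way around that, using "Opera" as a hypothetical new category that is not in the original data:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# "Opera" is not an existing level, so register it as a level first
levels(music_genre) <- c(levels(music_genre), "Opera")
music_genre[3] <- "Opera"

music_genre[3]
levels(music_genre)
```

Extending the levels first keeps the factor free of accidental missing values.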
SPECIAL VALUES IN R

Special Value    Description
NA               Missing values (e.g., missing data in datasets)
NaN              "Not a Number" (e.g., result of 0/0)
Inf              Positive infinity (e.g., 1/0)
-Inf             Negative infinity (e.g., -1/0)
NULL             Absence of a value (e.g., empty lists)
x <- c(1, 2, NA, 4, 5)

mean(x)

mean(x, na.rm = TRUE)

y <- 0/0

print(y)

z1 <- 1/0

z2 <- -1/0

print(z1)

print(z2)

v <- NULL

length(v)
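Each special value has a matching test function in base R. The sketch below applies them to a hypothetical vector named values that mixes an ordinary number with the special values:

```r
values <- c(1, NA, NaN, Inf, -Inf)

is.na(values)         # TRUE for NA and also for NaN
is.nan(values)        # TRUE only for NaN
is.infinite(values)   # TRUE for Inf and -Inf
is.finite(values)     # TRUE only for ordinary numbers
is.null(NULL)         # NULL is tested with is.null(), not is.na()
```

Note that is.na() treats NaN as missing, so is.nan() is needed to tell the two apart.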
TREATING MISSING VALUES

data <- c(10, 20, NA, 40, NA, 60)

is.na(data) # Returns TRUE for NA values

clean_data <- na.omit(data) # Removes NA values
print(clean_data)

# Replace NAs with mean of non-missing values
data[is.na(data)] <- mean(data, na.rm = TRUE)
print(data)
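The same ideas carry over to data frames. A minimal sketch, assuming a small hypothetical data frame named df_na (not part of the examples above), that counts the missing values per column and keeps only the complete rows:

```r
# Hypothetical data frame with one missing value in each column
df_na <- data.frame(
  Age = c(23, NA, 31),
  Income = c(50000, 60000, NA)
)

missing_per_column <- colSums(is.na(df_na))    # number of NA values in each column
complete_rows <- df_na[complete.cases(df_na), ]  # rows without any NA

missing_per_column
complete_rows
```

Counting the missing values first helps decide whether dropping rows or replacing values is the better treatment.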
WORKING WITH CONTINUOUS AND CATEGORICAL
VARIABLES
# Creating a dataset
df <- data.frame(
Age = c(23, 45, 31, 52, 40), # Continuous variable
Gender = factor(c("Male", "Female", "Female", "Male", "Female")), # Categorical variable
Income = c(50000, 60000, 70000, 80000, 75000) # Continuous variable
)

summary(df) # Shows statistics for continuous variables and counts for each category


CONVERTING CONTINUOUS TO CATEGORICAL

• The cut() function in R is used to divide continuous numerical values into discrete
categories (also called binning or bucketing). This is useful for creating groups
from numerical data.
df$AgeGroup <- cut(df$Age, breaks = c(0, 30, 50, Inf),
  labels = c("Young", "Middle-aged", "Senior"))
print(df)
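Values that fall exactly on a break point deserve attention. By default cut() closes each interval on the right, and the right argument changes that. The sketch below uses a hypothetical vector named ages chosen to sit on and around the break points:

```r
ages <- c(23, 30, 31, 50, 52)   # hypothetical ages, two of them exactly on a break

# Default (right = TRUE): 30 falls in (0, 30] and 50 falls in (30, 50]
groups_right <- cut(ages, breaks = c(0, 30, 50, Inf),
                    labels = c("Young", "Middle-aged", "Senior"))

# right = FALSE: 30 falls in [30, 50) and 50 falls in [50, Inf)
groups_left <- cut(ages, breaks = c(0, 30, 50, Inf),
                   labels = c("Young", "Middle-aged", "Senior"), right = FALSE)

table(groups_right)
table(groups_left)
```

Which convention is appropriate depends on how the groups are defined, for example whether a 30-year-old should count as "Young" or "Middle-aged".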
IMPLEMENTING DATA STRUCTURES ON BUILT-IN DATA
SETS
• Vector in R (1D Homogeneous Data)

# Load built-in dataset

data("mtcars")

# Extracting a numeric vector (Miles Per Gallon - mpg)

mpg_vector <- mtcars$mpg

print(mpg_vector)

# Extracting a character vector (Converting row names to a vector)

car_names <- rownames(mtcars)

print(car_names)

# Logical vector: Identify cars with mpg > 25

high_mpg <- mpg_vector > 25

print(high_mpg)
• Lists in R
# Creating a list with different components
car_list <- list(
mpg_values = mtcars$mpg, # Numeric vector
car_names = rownames(mtcars), # Character vector
first_five = head(mtcars, 5) # Data frame (first 5 rows)
)

print(car_list)
• Matrices in R (2D Homogeneous Data)
# Extracting first 10 rows and 3 numeric columns
car_matrix <- as.matrix(mtcars[1:10, c("mpg", "hp", "wt")])
print(car_matrix)
# Matrix operations
col_means <- colMeans(car_matrix) # Column-wise mean
print(col_means)

row_sums <- rowSums(car_matrix) # Row-wise sum


print(row_sums)
• Data frames in R
# Creating a small dataset
df <- data.frame(
Car = rownames(mtcars)[1:5], # Car names (Character)
MPG = mtcars$mpg[1:5], # Numeric
HP = mtcars$hp[1:5], # Numeric
Automatic = mtcars$am[1:5] == 0 # Logical (0 = Automatic, 1 = Manual)
)

print(df)
• Factors in R
# Convert transmission type (0 = Automatic, 1 = Manual) into a factor
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Check structure
str(mtcars$am)

# Count occurrences
table(mtcars$am)
PLOT

The plot() function is used to draw points (markers) in a diagram.
The function takes parameters for specifying points in the diagram.
Parameter 1 specifies points on the x-axis.
Parameter 2 specifies points on the y-axis.
At its simplest, you can use the plot() function to plot two numbers against each other:

Example
Draw one point in the diagram, at position (1) and position (3):
plot(1, 3)
• To draw more points, use vectors:
• Example
• Draw two points in the diagram, one at position (1, 3) and one at position (8, 10):
plot(c(1, 8), c(3, 10))
MULTIPLE POINTS

• You can plot as many points as you like, just make sure you have the same number
of points on both axes:
• Example
plot(c(1, 2, 3, 4, 5), c(3, 7, 8, 9, 12))
• For better organization, when you have many values, it is better to use variables:
• Example
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

plot(x, y)
DRAW A SEQUENCE OF POINTS

If you want to draw dots in a sequence, on both the x-axis and the y-axis, use the : operator:

Example
plot(1:10)
DRAW A LINE

The plot() function also takes a type parameter with the value l to draw a line to connect all the points in the diagram:

Example
plot(1:10, type="l")
PLOT LABELS

The plot() function also accepts other parameters, such as main, xlab and ylab, if you want to customize the
graph with a main title and different labels for the x-axis and y-axis:

Example
plot(1:10, main="My Graph", xlab="The x-axis", ylab="The y axis")
GRAPH APPEARANCE

There are many other parameters you can use to change the appearance of the points.
Colors
Use col="color" to add a color to the points:

Example
plot(1:10, col="red")
SIZE

Use cex=number to change the size of the points (1 is default, while 0.5 means 50% smaller, and 2 means
100% larger):

Example
plot(1:10, cex=2)
POINT SHAPE

Use pch with a value from 0 to 25 to change the point shape format:

Example
plot(1:10, pch=25, cex=2)
LINE GRAPH

A line graph has a line that connects all the points in a diagram.
To create a line, use the plot() function and add the type parameter with a value of "l":

Example
plot(1:10, type="l")
LINE COLOR

The line color is black by default. To change the color, use the col parameter:

Example
plot(1:10, type="l", col="blue")
LINE WIDTH

To change the width of the line, use the lwd parameter (1 is default, while 0.5 means 50% thinner, and 2 means 100% thicker):

Example
plot(1:10, type="l", lwd=2)
LINE STYLES

The line is solid by default. Use the lty parameter with a value from 0 to 6 to
specify the line format.
For example, lty=3 will display a dotted line instead of a solid line:

Example
plot(1:10, type="l", lwd=5, lty=3)

Available parameter values for lty:


•0 removes the line
•1 displays a solid line
•2 displays a dashed line
•3 displays a dotted line
•4 displays a "dot dashed" line
•5 displays a "long dashed" line
•6 displays a "two dashed" line
MULTIPLE LINES

To display more than one line in a graph, use the plot() function together
with the lines() function:

Example
line1 <- c(1,2,3,4,5,10)
line2 <- c(2,5,7,8,9,10)

plot(line1, type = "l", col = "blue")


lines(line2, type="l", col = "red")
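The axis limits are fixed by the first plot() call, so a second line with larger values would be cut off. One way to handle this, sketched here with hypothetical values for line2 that exceed the range of line1, is to pass the combined range through the ylim parameter:

```r
line1 <- c(1, 2, 3, 4, 5, 10)
line2 <- c(4, 9, 14, 16, 18, 20)   # hypothetical values that go beyond line1

y_range <- range(line1, line2)     # smallest and largest value across both lines

plot(line1, type = "l", col = "blue", ylim = y_range)
lines(line2, col = "red")
```

Computing the range from all lines keeps the plot correct even when the data changes later.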
SCATTER PLOTS

You learned from the Plot chapter that the plot() function is used to plot numbers
against each other.
A "scatter plot" is a type of plot used to display the relationship between two
numerical variables, and plots one dot for each observation.
It needs two vectors of same length, one for the x-axis (horizontal) and one for the
y-axis (vertical):

Example
x <- c(5,7,8,7,2,2,9,4,11,12,9,6)
y <- c(99,86,87,88,111,103,87,94,78,77,85,86)

plot(x, y)
• The observation in the example above should show the result of 12 cars passing by.

• That might not be clear for someone who sees the graph for the first time, so let's add a header and
different labels to describe the scatter plot better:

• Example

• x <- c(5,7,8,7,2,2,9,4,11,12,9,6)
y <- c(99,86,87,88,111,103,87,94,78,77,85,86)

plot(x, y, main="Observation of Cars", xlab="Car age", ylab="Car speed")


COMPARE PLOTS

In the example above, there seems to be a relationship between the car speed and
age, but what if we plot the observations from another day as well? Will the scatter
plot tell us something else?
To compare the plot with another plot, use the points() function:

Example
Draw two plots on the same figure:
# day one, the age and speed of 12 cars:
x1 <- c(5,7,8,7,2,2,9,4,11,12,9,6)
y1 <- c(99,86,87,88,111,103,87,94,78,77,85,86)

# day two, the age and speed of 15 cars:


x2 <- c(2,2,8,1,15,8,12,9,7,3,11,4,7,14,12)
y2 <- c(100,105,84,105,90,99,90,95,94,100,79,112,91,80,85)

plot(x1, y1, main="Observation of Cars", xlab="Car age", ylab="Car speed", col="red", cex=2)
points(x2, y2, col="blue", cex=2)
PIE CHARTS

A pie chart is a circular graphical view of data.


Use the pie() function to draw pie charts:

Example
# Create a vector of pies
x <- c(10,20,30,40)

# Display the pie chart


pie(x)

As you can see the pie chart draws one pie for each value in the vector (in this
case 10, 20, 30, 40).
By default, the plotting of the first pie starts from the x-axis and
moves counterclockwise.
Note: The size of each pie is determined by comparing the value with all the
other values, by using this formula:
The value divided by the sum of all values: x/sum(x)
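The formula can be applied directly to show each share as a percentage on the chart. In this sketch the names share and pct_labels, and the chart title, are chosen for illustration only:

```r
x <- c(10, 20, 30, 40)

share <- x / sum(x)                             # proportion of the whole for each pie
pct_labels <- paste0(round(share * 100), "%")   # turn the proportions into text labels

pie(x, labels = pct_labels, main = "Share of each value")
```

Because the shares are computed from the data, the labels stay correct when the values in x change.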
START ANGLE

You can change the start angle of the pie chart with the init.angle parameter.
The value of init.angle is given in degrees, and the default angle is 0.

Example
Start the first pie at 90 degrees:
# Create a vector of pies
x <- c(10,20,30,40)

# Display the pie chart and start the first pie at 90 degrees
pie(x, init.angle = 90)
LABELS AND HEADER

Use the labels parameter to add a label to each pie (the examples abbreviate it to label, which R accepts
through partial argument matching), and use the main parameter to add a header:

Example
# Create a vector of pies
x <- c(10,20,30,40)

# Create a vector of labels
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")

# Display the pie chart with labels
pie(x, label = mylabel, main = "Fruits")
COLORS

You can add a color to each pie with the col parameter:

Example
# Create a vector of colors
colors <- c("blue", "yellow", "green", "black")

# Display the pie chart with colors


pie(x, label = mylabel, main = "Fruits", col = colors)
LEGEND

To add an explanation for each pie, use the legend() function:

Example
# Create a vector of labels
mylabel <- c("Apples", "Bananas", "Cherries", "Dates")

# Create a vector of colors


colors <- c("blue", "yellow", "green", "black")

# Display the pie chart with colors


pie(x, label = mylabel, main = "Pie Chart", col = colors)

# Display the explanation box


legend("bottomright", mylabel, fill = colors)

The legend can be positioned as either:

bottomright, bottom, bottomleft, left, topleft, top, topright, right, center
BAR CHARTS

A bar chart uses rectangular bars to visualize data. Bar charts can be displayed
horizontally or vertically. The height or length of each bar is proportional to the
value it represents.
Use the barplot() function to draw a vertical bar chart:

Example
# x-axis values
x <- c("A", "B", "C", "D")

# y-axis values
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x)

•The x variable represents values in the x-axis (A,B,C,D)


•The y variable represents values in the y-axis (2,4,6,8)
•Then we use the barplot() function to create a bar chart of the values
•names.arg defines the names of each observation in the x-axis
BAR COLOR

Use the col parameter to change the color of the bars:

Example
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, col = "red")


DENSITY / BAR TEXTURE

To change the bar texture, use the density parameter:

Example
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, density = 10)


BAR WIDTH

Use the width parameter to change the width of the bars:

Example
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, width = c(1,2,3,4))


HORIZONTAL BARS

If you want the bars to be displayed horizontally instead of vertically, use horiz=TRUE:

Example
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

barplot(y, names.arg = x, horiz = TRUE)


DATA SET

A data set is a collection of data, often presented in a table.


There is a popular built-in data set in R called "mtcars" (Motor Trend Car Road Tests), which
was extracted from the 1974 Motor Trend US magazine.
In the examples below (and in the next chapters), we will use the mtcars data set for
statistical purposes:

Example
# Print the mtcars data set
mtcars
INFORMATION ABOUT DATA SET

You can use the question mark (?) to get information about the mtcars data set:

Example
# Use the question mark to get information about the data set

?mtcars
GET INFORMATION
Use the dim() function to find the dimensions of the data set, and the names() function to view the names of the variables:

Example
# Create a variable of the mtcars data set for better organization
Data_Cars <- mtcars

# Use dim() to find the dimensions of the data set
dim(Data_Cars)

# Use names() to find the names of the variables in the data set
names(Data_Cars)
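
As a complementary sketch, a few other base R functions give a quick overview of a data frame: nrow() and ncol() return the two dimensions separately, head() shows the first rows, and str() lists each variable with its type.

```r
Data_Cars <- mtcars

# Number of rows (cars) and columns (variables)
n_rows <- nrow(Data_Cars)
n_cols <- ncol(Data_Cars)

# Show only the first three rows instead of the whole table
first_rows <- head(Data_Cars, 3)
first_rows

# Compact overview: one line per variable with its type and first values
str(Data_Cars)
```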
Use the rownames() function to get the name of each row, which is the name of each car (shown as the first column when the data set is printed):

Example
Data_Cars <- mtcars

rownames(Data_Cars)
If you want to print all values that belong to a variable, access the data frame by using the $ sign and the name of the variable (for example cyl (cylinders)):

Example
Data_Cars <- mtcars

Data_Cars$cyl
To sort the values, use the sort() function:

Example
Data_Cars <- mtcars

sort(Data_Cars$cyl)
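
sort() only returns the sorted values of one variable. As a hedged sketch of two common variations, the example below sorts in descending order and uses order() to rearrange the whole data frame by weight. The name by_weight is chosen here for illustration.

```r
Data_Cars <- mtcars

# Sort the cylinder values from largest to smallest
sort(Data_Cars$cyl, decreasing = TRUE)

# order() returns row positions, so it can reorder the entire data frame
by_weight <- Data_Cars[order(Data_Cars$wt), ]

# The lightest cars now come first
head(by_weight, 3)
```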
ANALYZING THE DATA
Now that we have some information about the data set, we can start to analyze it with some statistical numbers.
For example, we can use the summary() function to get a statistical summary of
the data:

Example
Data_Cars <- mtcars

summary(Data_Cars)
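
summary() also works on a single variable. The sketch below applies it to the weight column only; the name wt_summary is an illustrative choice.

```r
Data_Cars <- mtcars

# Summary of one variable: minimum, quartiles, median, mean and maximum
wt_summary <- summary(Data_Cars$wt)
wt_summary

# The individual statistics can be looked up by name
names(wt_summary)
```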
MEAN

Find the average weight (wt) of a car:

Example
Data_Cars <- mtcars

mean(Data_Cars$wt)
MEDIAN

Find the midpoint value (the median) of weight (wt):


Data_Cars <- mtcars

median(Data_Cars$wt)
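
To see how the mean and the median differ, here is a small worked sketch on a made-up vector that contains one very large value. The numbers are invented for illustration and are not taken from mtcars.

```r
# Made-up values with one extreme observation
values <- c(2, 3, 4, 5, 100)

# The mean is the sum divided by the number of values: 114 / 5
mean(values)                    # 22.8
sum(values) / length(values)    # same result, computed by hand

# The median is the middle value after sorting
median(values)                  # 4
```

The single large value pulls the mean upward, while the median stays at the middle observation.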
MODE

• The mode is the value that appears the most times.
• R does not have a built-in function to calculate the statistical mode (the built-in mode() function reports how an object is stored instead). However, we can write our own expression to find it:

Data_Cars <- mtcars

names(sort(-table(Data_Cars$wt)))[1]
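
The expression above can be wrapped in a reusable function. The following is a minimal sketch, assuming numeric input; get_mode is a hypothetical helper name, not a base R function. Unlike the one-liner, it returns numbers rather than text and reports every value that ties for the highest count.

```r
# get_mode is a hypothetical helper, not part of base R
get_mode <- function(v) {
  counts <- table(v)                          # how often each distinct value occurs
  is_top <- as.vector(counts == max(counts))  # TRUE for the most frequent value(s)
  as.numeric(names(counts)[is_top])           # convert the labels back to numbers
}

get_mode(c(1, 2, 2, 3, 3, 3))   # 3
get_mode(c(1, 1, 2, 2))         # 1 2 (a tie)

# Most frequent weight in the mtcars data set
get_mode(mtcars$wt)
```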
PERCENTILES

• A percentile is the value below which a given percentage of the observations fall. For example, the 75th percentile of wt is the weight that 75% of the cars fall below:

Data_Cars <- mtcars

# c() specifies which percentile you want
quantile(Data_Cars$wt, c(0.75))
If you run the quantile() function without specifying the c() vector (the probs argument), you will get the 0th, 25th, 50th, 75th and 100th percentiles:

Example
Data_Cars <- mtcars

quantile(Data_Cars$wt)
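
Several percentiles can be requested in one call by passing more than one value in the c() vector. The sketch below asks for the 10th, 50th and 90th percentiles and checks that the 50th percentile matches the median; the name pct is an illustrative choice.

```r
Data_Cars <- mtcars

# Request the 10th, 50th and 90th percentiles at once
pct <- quantile(Data_Cars$wt, probs = c(0.10, 0.50, 0.90))
pct

# The 50th percentile is the same as the median
all.equal(unname(pct["50%"]), median(Data_Cars$wt))
```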
