
Data Analysis Using R - 3

Uploaded by

harshvasudevkoli
Copyright
© All Rights Reserved

DAR

3. Data Structures in R (Contd.)

3.1 Matrices

• A matrix is a rectangular arrangement of elements in rows and columns. All elements of a matrix must be of the same type (usually numbers, but character and logical matrices are also possible).
• A column is a vertical arrangement of data, while a row is a horizontal arrangement of data.
• A matrix is a two-dimensional data set with rows and columns.
• A matrix can be created with the matrix() function.
• Syntax:
The basic syntax for creating a matrix in R is:
matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)
• data is the input vector whose values become the elements of the matrix.

• nrow is the number of rows to be created.

• ncol is the number of columns to be created.

• byrow is a logical flag. By default (byrow = FALSE) the elements are filled column by column; setting byrow = TRUE fills them row by row.

• dimnames is a list of two vectors holding the names assigned to the rows and to the columns.
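
A minimal sketch of the defaults described above (the names a, b and mixed are only illustrative): with byrow left at its default FALSE the values fill the matrix column by column, ncol can be omitted when it follows from the length of data and nrow, and mixing numbers with text makes every element character, because a matrix holds a single type.

```r
# Column-wise filling is the default (byrow = FALSE); ncol is inferred as 3
a <- matrix(1:6, nrow = 2)
print(a)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Row-wise filling with byrow = TRUE
b <- matrix(1:6, nrow = 2, byrow = TRUE)
print(b)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# A matrix holds one type: the numbers 1 and 2 become the strings "1" and "2"
mixed <- matrix(c(1, "x", 2, "y"), nrow = 2)
typeof(mixed)   # "character"
```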

3.1.1 Operations on Matrices and Element Manipulation

• Creating Matrices:

# Elements are arranged sequentially by row.
M <- matrix(3:14, nrow = 4, byrow = TRUE)
print(M)
# Elements are arranged sequentially by column.
N <- matrix(3:14, nrow = 4, byrow = FALSE)
print(N)
# Define the row and column names.
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
P <- matrix(3:14, nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(P)
Output:
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14

     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14

     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14

• Properties of a matrix: the str( ), dim( ) and length( ) functions:

• str( ): The str() function displays the internal structure of any R object in a compact way. It is a convenient alternative to printing the whole object, especially when the data set is large.
Syntax: str(object)
Let's find the structure of the matrix P above using str( ).
str(P)
Output:
 int [1:4, 1:3] 3 6 9 12 4 7 10 13 5 8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:4] "row1" "row2" "row3" "row4"
  ..$ : chr [1:3] "col1" "col2" "col3"

• dim( ): The dim( ) function gets or sets the dimensions of a matrix, array or data frame.
Syntax: dim(object)
Let's find the dimensions of the matrix M above using dim( ).
dim(M)
Output: [1] 4 3

• length( ): The length( ) function returns the length of an object, that is, how many elements it contains. For a matrix this is the number of rows multiplied by the number of columns.
Syntax: length(object)
Let's find the length of the matrix N above using length( ).
length(N)
Output: [1] 12

• Naming rows and columns of a matrix: rownames( ), colnames( ):

• rownames( ): The rownames( ) function sets (or gets) the row names of a matrix.
Syntax: rownames(matrix_name) <- value
A <- matrix(1:9, 3, 3, byrow = TRUE)
rownames(A) <- c("X", "Y", "Z")
print(A)

Output:
  [,1] [,2] [,3]
X    1    2    3
Y    4    5    6
Z    7    8    9

• colnames( ): The colnames( ) function sets (or gets) the column names of a matrix.
Syntax: colnames(matrix_name) <- value
A <- matrix(1:9, 3, 3, byrow = TRUE)
colnames(A) <- c("X", "Y", "Z")
print(A)

Output:
     X Y Z
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9

• Accessing and replacing matrix elements using an index:

• Accessing elements of a matrix using an index:

• We can access the elements using [ ] brackets. The first number in the brackets specifies the row position and the second number specifies the column position:

m1 <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"),
             nrow = 3, ncol = 3)
m1
Output:
     [,1]     [,2]        [,3]
[1,] "apple"  "orange"    "pear"
[2,] "banana" "grape"     "melon"
[3,] "cherry" "pineapple" "fig"

m1[1, 2]   #[1] "orange"

• A whole row can be accessed by putting a comma after the row number in the brackets:
m1[2, ]    #[1] "banana" "grape" "melon"

• A whole column can be accessed by putting a comma before the column number in the brackets:
m1[, 2]    #[1] "orange" "grape" "pineapple"

• More than one row can be accessed with the c() function:
m1[c(1, 2), ]
#     [,1]     [,2]     [,3]
#[1,] "apple"  "orange" "pear"
#[2,] "banana" "grape"  "melon"

• More than one column can be accessed with the c() function:
m1[, c(1, 2)]
#     [,1]     [,2]
#[1,] "apple"  "orange"
#[2,] "banana" "grape"
#[3,] "cherry" "pineapple"
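
Selecting a single row or column returns a plain vector, because R drops the unused dimension. A short sketch, using the same fruit matrix m1, of the drop = FALSE argument of [ ], which keeps the result as a one-row matrix; row2_vec and row2_mat are illustrative names.

```r
m1 <- matrix(c("apple", "banana", "cherry", "orange", "grape",
               "pineapple", "pear", "melon", "fig"),
             nrow = 3, ncol = 3)

row2_vec <- m1[2, ]                 # simplified to a character vector
row2_mat <- m1[2, , drop = FALSE]   # stays a 1 x 3 matrix

is.matrix(row2_vec)   # FALSE
is.matrix(row2_mat)   # TRUE
dim(row2_mat)         # 1 3
```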

• Replacing/modifying elements of a matrix using an index:

• In R we can modify the elements of a matrix by direct assignment.

A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)
# Change the element in row 3, column 3 from 9 to 30 by direct assignment
A[3, 3] <- 30
cat("After editing the matrix:\n")
print(A)
Output:
The 3x3 matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

After editing the matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8   30

• Adding rows and columns to a matrix: rbind( ), cbind( )

• rbind( ): We can use the rbind( ) function to add rows to a matrix.

Note: The new row must have as many elements as the matrix has columns.

• Using this method we can add a row to an existing matrix:
m1 <- matrix(letters[1:9], nrow = 3, ncol = 3)
m1
m2 <- rbind(m1, letters[10:12])
m2

Output:
     [,1] [,2] [,3]   #m1
[1,] "a"  "d"  "g"
[2,] "b"  "e"  "h"
[3,] "c"  "f"  "i"

     [,1] [,2] [,3]   #m2
[1,] "a"  "d"  "g"
[2,] "b"  "e"  "h"
[3,] "c"  "f"  "i"
[4,] "j"  "k"  "l"

• Using the same function we can combine two matrices row-wise (concatenation of two matrices with rbind( )):
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m1
m2 <- matrix(10:12, nrow = 1, ncol = 3)
m2
m3 <- rbind(m1, m2)
m3
Output:
     [,1] [,2] [,3]   #m1
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

     [,1] [,2] [,3]   #m2
[1,]   10   11   12

     [,1] [,2] [,3]   #m3
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
[4,]   10   11   12

• cbind( ): We can use the cbind( ) function to add columns to a matrix.

Note: The new column must have as many elements as the matrix has rows.

• Using this method we can add a column to an existing matrix:
m1 <- matrix(letters[1:9], nrow = 3, ncol = 3)
m1
m2 <- cbind(m1, letters[10:12])
m2

Output:
     [,1] [,2] [,3]   #m1
[1,] "a"  "d"  "g"
[2,] "b"  "e"  "h"
[3,] "c"  "f"  "i"

     [,1] [,2] [,3] [,4]   #m2
[1,] "a"  "d"  "g"  "j"
[2,] "b"  "e"  "h"  "k"
[3,] "c"  "f"  "i"  "l"

• Using the same function we can combine two matrices column-wise (concatenation of two matrices with cbind( )):
m1 <- matrix(1:4, nrow = 2, ncol = 2)
m2 <- matrix(5:8, 2, 2)
m3 <- cbind(m1, m2)
m3

Output:
     [,1] [,2] [,3] [,4]   #m3
[1,]    1    3    5    7
[2,]    2    4    6    8

• Matrix arithmetic:

• Arithmetic operations on matrices are performed with the usual R operators, and the result of the operation is also a matrix.
• The matrices involved must have the same dimensions (number of rows and columns), because the operations are applied element by element.
• The arithmetic operators are addition (+), subtraction (-), element-wise multiplication (*), division (/) and modulus (%%).
• Creating two matrices to perform operations on:
# Create two 2x3 matrices.
matrix1 <- matrix(c(10, 9, 7, 6, 3, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 1, 4, 2, 4), nrow = 2)
print(matrix2)

Output:
     [,1] [,2] [,3]   #matrix1
[1,]   10    7    3
[2,]    9    6    6
     [,1] [,2] [,3]   #matrix2
[1,]    5    1    2
[2,]    2    4    4

• Addition:
result <- matrix1 + matrix2
cat("Result of addition", "\n")
print(result)

Output:
Result of addition
     [,1] [,2] [,3]
[1,]   15    8    5
[2,]   11   10   10

• Subtraction:
result <- matrix1 - matrix2
cat("Result of subtraction", "\n")
print(result)

Output:
Result of subtraction
     [,1] [,2] [,3]
[1,]    5    6    1
[2,]    7    2    2

• Multiplication (element by element):
result <- matrix1 * matrix2
cat("Result of multiplication", "\n")
print(result)

Output:
Result of multiplication
     [,1] [,2] [,3]
[1,]   50    7    6
[2,]   18   24   24

• Division:
result <- matrix1 / matrix2
cat("Result of division", "\n")
print(result)

Output:
Result of division
     [,1] [,2] [,3]
[1,]  2.0  7.0  1.5
[2,]  4.5  1.5  1.5
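
The * operator above multiplies matching elements; it is not the matrix product of linear algebra. A sketch of that product with the %*% operator, which requires the number of columns of the first matrix to equal the number of rows of the second. Here matrix1 (2 x 3) is multiplied by t(matrix2) (3 x 2); elementwise and product are illustrative names.

```r
matrix1 <- matrix(c(10, 9, 7, 6, 3, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 1, 4, 2, 4), nrow = 2)

elementwise <- matrix1 * matrix2      # 2 x 3, element by element
# matrix1 %*% matrix2 would fail: 3 columns cannot meet 2 rows
product <- matrix1 %*% t(matrix2)     # (2 x 3) %*% (3 x 2) gives 2 x 2
print(product)
#      [,1] [,2]
# [1,]   63   60
# [2,]   63   66
```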

• Scalar arithmetic: When one operand is a single number (a scalar), R applies the operation to every element of the matrix. This works for numeric matrices only, not for character matrices.
Performing scalar arithmetic on matrix1 above:
result <- matrix1 + 1   # add 1 to every element
cat("Result of addition", "\n")
print(result)

Output:
Result of addition
     [,1] [,2] [,3]
[1,]   11    8    4
[2,]   10    7    7

result <- matrix1 - 2   # subtract 2 from every element
cat("Result of subtraction", "\n")
print(result)

Output:
Result of subtraction
     [,1] [,2] [,3]
[1,]    8    5    1
[2,]    7    4    4

result <- matrix1 * 3   # multiply every element by 3
cat("Result of multiplication", "\n")
print(result)

Output:
Result of multiplication
     [,1] [,2] [,3]
[1,]   30   21    9
[2,]   27   18   18

result <- matrix1 / 2   # divide every element by 2
cat("Result of division", "\n")
print(result)

Output:
Result of division
     [,1] [,2] [,3]
[1,]  5.0  3.5  1.5
[2,]  4.5  3.0  3.0

• Extra:

• Miscellaneous functions (applied to matrix1 above):

sum(matrix1)         #[1] 41
rowSums(matrix1)     #[1] 20 21
colSums(matrix1)     #[1] 19 13 9
min(matrix1)         #[1] 3
max(matrix1)         #[1] 10
is.matrix(matrix1)   #[1] TRUE
ncol(matrix1)        #[1] 3
nrow(matrix1)        #[1] 2

• Different ways to assign names to rows and columns:

row <- c("r1", "r2", "r3")
columns <- c("c1", "c2", "c3")
M3 <- matrix(1:9, nrow = 3, dimnames = list(row, columns))

M4 <- matrix(1:9, nrow = 3, ncol = 3)
rownames(M4) <- c("r1", "r2", "r3")
colnames(M4) <- c("c1", "c2", "c3")

• Modifying matrix elements:

M1 <- matrix(1:9, nrow = 3)
M1[1, 3] <- 0               #set the element in row 1, column 3 to 0

M2 <- matrix(1:9, nrow = 3)
M2[M2 %% 2 == 0] <- 1       #replace all even elements by 1

• Transposition of a matrix:

• Transposing a matrix turns its rows into columns and its columns into rows. In R this is done with the t() function.

row <- c("r1", "r2")
columns <- c("c1", "c2")
M3 <- matrix(1:4, nrow = 2, dimnames = list(row, columns))
M3
M3 <- t(M3)
M3

Output:
   c1 c2
r1  1  3
r2  2  4
   r1 r2
c1  1  2
c2  3  4
• Deletion of rows and columns:
m <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"), nrow = 3, ncol = 2)
m
# Remove the first row and the first column
m <- m[-1, -1]
m

Output:
     [,1]     [,2]
[1,] "apple"  "orange"
[2,] "banana" "mango"
[3,] "cherry" "pineapple"
[1] "mango" "pineapple"

Only one column is left after the deletion, so R simplifies the result to a character vector.

3.2 Dataframes

• A data frame is a heterogeneous, two-dimensional, table-like structure in which each column contains the values of one variable and each row contains one set of values, one from each column.
• Internally, a data frame is a special kind of list in which every component (column) is a vector of the same length.
• A matrix can hold only one type of data, but the columns of a data frame can have different types, such as numeric, character or factor.
• Use the data.frame() function to create a data frame.

A data frame has the following characteristics:

o The column names should be non-empty.
o The row names should be unique.
o The data stored in a data frame can be of factor, numeric or character type.
o Each column contains the same number of data items.
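
A small sketch of the difference from a matrix, using a hypothetical data frame people: each column keeps its own type, which sapply(people, class) reports column by column, while converting the same data with as.matrix() forces everything into one type.

```r
people <- data.frame(
  id = 1:3,                          # integer column
  name = c("A", "B", "C"),           # character column
  passed = c(TRUE, FALSE, TRUE),     # logical column
  stringsAsFactors = FALSE
)

sapply(people, class)        # "integer" "character" "logical", one per column
nrow(people)                 # 3: every column has the same length
typeof(as.matrix(people))    # "character": a matrix must use a single type
```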

3.2.1 Operations on Data Frames and Element Manipulation

• Creating a data frame:

friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE  # keep text as character: before R 4.0.0, text columns became
                            # factors (stored as integer codes) by default
)
print(friend.data)

Output:
  friend_id friend_name
1         1      Sachin
2         2      Sourav
3         3      Dravid
4         4      Sehwag
5         5       Dhoni

• Getting the structure of a data frame

In R, we can inspect the structure of a data frame with the built-in str() function, which shows the number of observations and variables and the type and first values of every column.

# structure of the data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using str()
str(friend.data)
Output:
'data.frame': 5 obs. of  2 variables:
 $ friend_id  : int  1 2 3 4 5
 $ friend_name: chr  "Sachin" "Sourav" "Dravid" "Sehwag" ...

• Extracting data from a data frame

To work with the values stored in a data frame we first have to extract them. Data can be extracted in three ways:

1. Specific columns, selected by column name.
2. Specific rows.
3. Specific rows of specific columns.

# Extracting the friend_name column
result <- data.frame(friend.data$friend_name)
print(result)

Output:
  friend.data.friend_name
1                  Sachin
2                  Sourav
3                  Dravid
4                  Sehwag
5                   Dhoni

Some more ways to access data in a data frame:

friend.data$friend_name[3]   #displays the 3rd value of the friend_name column
[1] "Dravid"
friend.data["friend_name"]   #displays the friend_name column as a table (a one-column data frame)
  friend_name
1      Sachin
2      Sourav
3      Dravid
4      Sehwag
5       Dhoni
friend.data$friend_id        #displays friend_id as a vector
[1] 1 2 3 4 5

friend.data[[2]]             #accesses the 2nd column and returns it as a vector
friend.data[2]               #accesses the 2nd column and returns it as a one-column data frame

friend.data$friend_name[1]   #displays the 1st value of the friend_name column

friend.data[1:2, ]           #displays the first two rows with all columns
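
The third way listed earlier, specific rows of specific columns, puts a row index and a column index together inside [ ]. A minimal sketch with friend.data (recreated so the block runs on its own); the condition friend_id > 3 and the name big_ids are only examples.

```r
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# Rows 2 and 3 of the friend_name column only (returned as a vector)
friend.data[2:3, "friend_name"]        # "Sourav" "Dravid"

# Rows chosen by a condition, both columns kept (returned as a data frame)
big_ids <- friend.data[friend.data$friend_id > 3, c("friend_id", "friend_name")]
print(big_ids)
#   friend_id friend_name
# 4         4      Sehwag
# 5         5       Dhoni
```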

• Data reshaping: adding rows and columns, merging data frames, melting and casting data frames.
• Adding rows and columns:
• R allows us to modify a data frame. As with matrices, we can modify a data frame through re-assignment.
• We can not only add rows and columns but also delete them; adding rows and columns expands the data frame.

We can:

1. Add a column by binding a column vector under a new column name with the cbind() function.
2. Add rows that have the same structure as the existing data frame with the rbind() function.
3. Delete columns by assigning NULL to them.
4. Delete rows by re-assigning the data frame without them (for example with negative indexing).

stud.data <- data.frame(
  stud_id = c(1:5),
  stud_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  age = c(17, 18, 18, 17, 18),
  stringsAsFactors = FALSE)
stud.data

#Adding a row to the data frame (a one-row data frame keeps each column's type)
x <- data.frame(stud_id = 6, stud_name = "Vaishali", age = 18, stringsAsFactors = FALSE)
stud.data <- rbind(stud.data, x)
stud.data

#Adding a column to the data frame
stud.data <- cbind(stud.data, mob = c(1234, 3456, 5478, 2354, 5725, 2354))
stud.data

#stud.data$add <- c("Pune", "Mumbai", "Nashik", "Beed", "Mumbai", "Pune")   another way to add a column

Output:

  stud_id stud_name age
1       1   Shubham  17
2       2    Arpita  18
3       3    Nishka  18
4       4    Gunjan  17
5       5     Sumit  18

  stud_id stud_name age
1       1   Shubham  17
2       2    Arpita  18
3       3    Nishka  18
4       4    Gunjan  17
5       5     Sumit  18
6       6  Vaishali  18

  stud_id stud_name age  mob
1       1   Shubham  17 1234
2       2    Arpita  18 3456
3       3    Nishka  18 5478
4       4    Gunjan  17 2354
5       5     Sumit  18 5725
6       6  Vaishali  18 2354
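
Why the new row above is built as a one-row data frame rather than with c(): c(6, "Vaishali", 18) would turn all three values into character strings, and rbind() would then convert the numeric stud_id and age columns to character as well. A sketch comparing the two approaches on a copy of the data (demo, bad and good are illustrative names):

```r
demo <- data.frame(
  stud_id = 1:5,
  stud_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  age = c(17, 18, 18, 17, 18),
  stringsAsFactors = FALSE
)

# c() coerces 6 and 18 to the strings "6" and "18"
bad <- rbind(demo, c(6, "Vaishali", 18))
class(bad$age)    # "character": the whole column was converted

# A one-row data frame keeps each column's own type
good <- rbind(demo, data.frame(stud_id = 6, stud_name = "Vaishali", age = 18,
                               stringsAsFactors = FALSE))
class(good$age)   # "numeric"
mean(good$age)    # arithmetic still works on the numeric column
```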

• Combining two data frames:

Using cbind( ):

stud_info <- data.frame(stud_id = 1:6,
                        add = c("Pune", "Mumbai", "Nashik", "Beed", "Mumbai", "Pune"))
stud_info

Output:
  stud_id    add
1       1   Pune
2       2 Mumbai
3       3 Nashik
4       4   Beed
5       5 Mumbai
6       6   Pune

sinfo <- cbind(stud.data, stud_info)
sinfo

Output:
  stud_id stud_name age  mob stud_id    add
1       1   Shubham  17 1234       1   Pune
2       2    Arpita  18 3456       2 Mumbai
3       3    Nishka  18 5478       3 Nashik
4       4    Gunjan  17 2354       4   Beed
5       5     Sumit  18 5725       5 Mumbai
6       6  Vaishali  18 2354       6   Pune

cbind() simply places the columns side by side, so stud_id appears twice; merge() (below) instead matches rows on a key column.

Using rbind( ) (both data frames must have the same column names):

emp <- data.frame(eid = 1:3, ename = c("jack", "john", "mickey"))
emp
ep <- data.frame(eid = 4:6, ename = c("jacky", "max", "jerry"))
ep
Output:

  eid  ename
1   1   jack
2   2   john
3   3 mickey

  eid ename
1   4 jacky
2   5   max
3   6 jerry

emp_info <- rbind(emp, ep)
emp_info
Output:

  eid  ename
1   1   jack
2   2   john
3   3 mickey
4   4  jacky
5   5    max
6   6  jerry

• Modifying values in a data frame

stud.data[1, "stud_name"] <- "Geeta"   #modify the value in the 1st row of the stud_name column
stud.data
Output:
  stud_id stud_name age  mob
1       1     Geeta  17 1234
2       2    Arpita  18 3456
3       3    Nishka  18 5478
4       4    Gunjan  17 2354
5       5     Sumit  18 5725
6       6  Vaishali  18 2354

stud.data$mob[1] <- 8779               #modify the 1st value of the mob column

• Merging data frames:

• A data frame is made up of three principal components: the data, the rows and the columns.
• In base R, the merge() function merges two data frames. The dplyr package provides equivalent join functions such as inner_join(), left_join(), right_join(), full_join() and anti_join().
• The most important condition for joining two data frames is that the columns on which the merge happens must be of the same type.
• merge() works like a join in a DBMS. The types of join available in R are:
1. Natural Join or Inner Join
2. Left Outer Join
3. Right Outer Join
4. Full Outer Join
5. Anti Join (shown with dplyr's anti_join())
6. Cross Join
Basic syntax of the merge() function in R:
Syntax: merge(x, y, by, by.x, by.y, all = FALSE, all.x, all.y, sort = TRUE)
Parameters:
x, y: the two data frames to be merged.
by, by.x, by.y: the names of the key columns; by when the key has the same name in both data frames, by.x and by.y when it is named differently in x and y.
all, all.x, all.y: logical values that specify the type of merge (see the join types below).
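
When the key column is named differently in the two data frames, by.x and by.y give its name in each one. A minimal sketch with two hypothetical data frames, marks and cities, whose keys are called roll and RollNo; the result keeps the name from x.

```r
marks  <- data.frame(roll = c(1, 2, 3), score = c(80, 65, 90))
cities <- data.frame(RollNo = c(2, 3, 4), city = c("Pune", "Delhi", "Goa"),
                     stringsAsFactors = FALSE)

# The key is called "roll" in marks and "RollNo" in cities
joined <- merge(marks, cities, by.x = "roll", by.y = "RollNo")
print(joined)
#   roll score  city
# 1    2    65  Pune
# 2    3    90 Delhi
```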

# Data frame 1
df1 = data.frame(StudentId = c(101:106),
Subject = c("Hindi", "English","Maths", "Science","History","Physics"))
df1
Output:

StudentId Subject
1 101 Hindi
2 102 English
3 103 Maths
4 104 Science
5 105 History
6 106 Physics

# Data frame 2
df2 = data.frame(StudentId = c(102, 104, 106,107, 108),
State = c("Mangalore", "Mysore","Pune", "Dehradun", "Delhi"))
df2
Output:

StudentId State
1 102 Mangalore
2 104 Mysore
3 106 Pune
4 107 Dehradun
5 108 Delhi

1. Natural Join or Inner Join:

An inner join keeps only the rows whose key appears in both data frames; with merge() this is the default, all = FALSE.
df = merge(x = df1, y = df2, by = "StudentId")
df
Or
library(dplyr)   #required for the %>% pipe and the join functions
df = df1 %>% inner_join(df2, by = "StudentId")
Output:
  StudentId Subject     State
1       102 English Mangalore
2       104 Science    Mysore
3       106 Physics      Pune

2. Left Outer Join

A left outer join keeps all the rows of the left data frame x and only the matching rows of the right data frame y; with merge() we specify all.x = TRUE.
df = merge(x = df1, y = df2, by = "StudentId", all.x = TRUE)
df
Or
df = df1 %>% left_join(df2, by = "StudentId")

Output:
  StudentId Subject     State
1       101   Hindi      <NA>
2       102 English Mangalore
3       103   Maths      <NA>
4       104 Science    Mysore
5       105 History      <NA>
6       106 Physics      Pune

3. Right Outer Join

A right outer join keeps all the rows of the right data frame y and only the matching rows of the left data frame x; with merge() we specify all.y = TRUE.
df = merge(x = df1, y = df2, by = "StudentId", all.y = TRUE)
df
Or
df = df1 %>% right_join(df2, by = "StudentId")
Output:
  StudentId Subject     State
1       102 English Mangalore
2       104 Science    Mysore
3       106 Physics      Pune
4       107    <NA>  Dehradun
5       108    <NA>     Delhi

4. Full Outer Join

A full outer join keeps all rows from both data frames; with merge() we specify all = TRUE.
df = merge(x = df1, y = df2, by = "StudentId", all = TRUE)
df
Or
df = df1 %>% full_join(df2, by = "StudentId")
Output:
  StudentId Subject     State
1       101   Hindi      <NA>
2       102 English Mangalore
3       103   Maths      <NA>
4       104 Science    Mysore
5       105 History      <NA>
6       106 Physics      Pune
7       107    <NA>  Dehradun
8       108    <NA>     Delhi

5. Anti Join
In terms of set theory, an anti join is a set difference: for example, if A = (1, 2, 3, 4) and B = (2, 3, 5), then A - B = (1, 4). In the same way, an anti join of df1 with df2 keeps all rows of df1 whose key is not present in df2.
# Import the required library
library(dplyr)

df = df1 %>% anti_join(df2, by = "StudentId")
df
Output:
  StudentId Subject
1       101   Hindi
2       103   Maths
3       105 History
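
The same anti join can be written in base R without dplyr, by keeping the rows of df1 whose key is not %in% the keys of df2. A sketch, recreating df1 and df2 from above (anti is an illustrative name); unlike anti_join(), this keeps the original row names (1, 3, 5).

```r
df1 <- data.frame(StudentId = 101:106,
                  Subject = c("Hindi", "English", "Maths", "Science", "History", "Physics"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(StudentId = c(102, 104, 106, 107, 108),
                  State = c("Mangalore", "Mysore", "Pune", "Dehradun", "Delhi"),
                  stringsAsFactors = FALSE)

# Keep the rows of df1 whose StudentId does not appear in df2
anti <- df1[!(df1$StudentId %in% df2$StudentId), ]
print(anti)
#   StudentId Subject
# 1       101   Hindi
# 3       103   Maths
# 5       105 History
```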

6. Cross Join
A cross join, also known as a Cartesian join, joins every row of one data frame to every row of the other data frame. In set theory this is the Cartesian product of two sets. With merge() it is obtained with by = NULL.
frame1 = data.frame(s1 = c(2, 50, 71))
frame1
frame2 = data.frame(s2 = c(11, 38, 90))
frame2
df = merge(x = frame1, y = frame2, by = NULL)
df

Output:
  s1
1  2
2 50
3 71
  s2
1 11
2 38
3 90
  s1 s2
1  2 11
2 50 11
3 71 11
4  2 38
5 50 38
6 71 38
7  2 90
8 50 90
9 71 90

• Melting and casting of data frames:

• Melting and casting are used to reshape data efficiently.
• The functions used for this are melt() and cast() (in the reshape2 package the casting functions are called dcast() and acast()).
• Many functions and packages in R expect data in a particular shape, so data often has to be reshaped before analysis.
• When each record is spread over several rows, with different details in each row, the data is said to be in long format; when each record occupies a single row with one column per detail, it is in wide format.
• In some cases the long format is more efficient and in other cases the wide format is better.
• Installing the packages required for melt() and cast():
> install.packages("reshape2")
> install.packages("reshape")
• Loading the required libraries:
> library(reshape2)
> library(reshape)

1. Melting in R:
• Melting is done to organize the data. The melt() function takes data in wide format and stacks a set of columns into a single column of data.
• To use melt() we specify a data frame, the identifier (id) variables that stay constant, and the measured variables (columns of data) to be stacked.
• By default, all columns that are not specified as id variables are treated as measured variables.
• melt() therefore converts a data frame from wide format into long format.
• Creating a data frame:

• Applying the melt( ) function to a data frame:
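
The code for these two steps is not shown above, so here is a minimal sketch of them, assuming the reshape2 package is installed and using a hypothetical wide data frame scores with one row per student and one column per subject. In reshape2, melt() takes id.vars for the constant columns, while variable.name and value.name set the names of the stacked columns.

```r
library(reshape2)   # provides melt() and dcast()

# Hypothetical wide-format data: one row per student, one column per subject
scores <- data.frame(
  student = c("A", "B"),
  maths = c(80, 70),
  science = c(60, 90),
  stringsAsFactors = FALSE
)

# student stays fixed; maths and science are stacked into subject/marks
long <- melt(scores, id.vars = "student",
             variable.name = "subject", value.name = "marks")
print(long)
#   student subject marks
# 1       A   maths    80
# 2       B   maths    70
# 3       A science    60
# 4       B science    90
```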

2. Casting in R:
• Casting reshapes molten (long-format) data with the cast() function, which takes a formula and an aggregate function and aggregates the data accordingly.
• It converts long-format data back into an aggregated (wide) format, based on the formula given to cast( ).
• Applying the cast( ) function to molten data:

• One more way to cast the data frame into wide format:

• One more example of casting:
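
A minimal sketch of casting, assuming the reshape2 package, where the data-frame version of cast is called dcast() (the older reshape package calls it cast()). The long data frame is hypothetical; student A has two maths marks, so fun.aggregate = mean averages them into one cell.

```r
library(reshape2)

# Hypothetical long-format data: one row per student, subject and test
long <- data.frame(
  student = c("A", "A", "A", "B", "B"),
  subject = c("maths", "maths", "science", "maths", "science"),
  marks = c(80, 60, 70, 50, 90),
  stringsAsFactors = FALSE
)

# One row per student, one column per subject, repeated marks averaged
wide <- dcast(long, student ~ subject, value.var = "marks", fun.aggregate = mean)
print(wide)
#   student maths science
# 1       A    70      70
# 2       B    50      90
```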


(Extra But Important)
• subset( ): The subset() function creates a subset of a data frame. It can also be used to drop columns from a data frame. We can apply conditions in subset() to retrieve exactly the rows we want.
emp <- data.frame(eid = 1:6, ename = c("jack", "john", "mickey", "jacky", "max", "jerry"),
                  sal = c(10000, 30000, 9000, 50000, 55000, 70000))
emp
subset(emp, sal > 10000)

Output:
  eid  ename   sal
1   1   jack 10000
2   2   john 30000
3   3 mickey  9000
4   4  jacky 50000
5   5    max 55000
6   6  jerry 70000

  eid ename   sal
2   2  john 30000
4   4 jacky 50000
5   5   max 55000
6   6 jerry 70000
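
The select argument of subset() picks or drops columns, which is how subset() removes columns, as mentioned above. A short sketch on the same emp data (high and no_id are illustrative names):

```r
emp <- data.frame(eid = 1:6,
                  ename = c("jack", "john", "mickey", "jacky", "max", "jerry"),
                  sal = c(10000, 30000, 9000, 50000, 55000, 70000),
                  stringsAsFactors = FALSE)

# Rows with sal > 10000, keeping only the ename and sal columns
high <- subset(emp, sal > 10000, select = c(ename, sal))
print(high)

# A negative select drops a column: here eid is removed
no_id <- subset(emp, select = -eid)
names(no_id)   # "ename" "sal"
```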

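• Modifying column names (using the emp data frame above):

colnames(emp)[1] <- "emp_id"
emp
Output:
  emp_id  ename   sal
1      1   jack 10000
2      2   john 30000
3      3 mickey  9000
4      4  jacky 50000
5      5    max 55000
6      6  jerry 70000
colnames(emp)[c(2, 3)] <- c("emp_name", "salary")
emp
Output:
  emp_id emp_name salary
1      1     jack  10000
2      2     john  30000
3      3   mickey   9000
4      4    jacky  50000
5      5      max  55000
6      6    jerry  70000

The following snippets are independent alternatives; each assumes emp has the columns emp_id, emp_name and salary as shown above.
• Other ways to rename columns:
names(emp)[1] <- "empno"                                  #rename the 1st column with names()
emp <- setNames(emp, c("new1", "new2", "new3"))           #placeholder names: give one new name per column
• Deleting columns with select() from the dplyr package:
library(dplyr)
emp <- select(emp, -salary)                #drop a single column
emp <- select(emp, -c(emp_id, salary))     #drop several columns at once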

• Deleting data:
stud.data <- stud.data[-1, -1]   #removes the 1st row and the 1st column of the data frame
stud.data
Output:
  stud_name age  mob
2    Arpita  18 3456
3    Nishka  18 5478
4    Gunjan  17 2354
5     Sumit  18 5725
6  Vaishali  18 2354

stud.data$age <- NULL            #removes the age column
stud.data
Output:
  stud_name  mob
2    Arpita 3456
3    Nishka 5478
4    Gunjan 2354
5     Sumit 5725
6  Vaishali 2354

rm(stud.data)                    #deletes the entire data frame
exists("stud.data")
Output:
[1] FALSE

• Sorting a data frame:
• Sorting a data frame reorders its rows based on the values in one or more columns. This is useful, for example, to organize data for analysis or presentation.
• Methods to sort a data frame:

1. Using the order( ) function: order() returns the row positions in sorted order, so indexing the data frame with it sorts the rows by a particular column.
order() also works with character vectors; a character column is sorted alphabetically.
Syntax: dataframe[order(dataframe$column_name, decreasing = TRUE), ]
where
• dataframe is the input data frame
• column_name is the column on which the data frame is sorted
• decreasing specifies the sort order:
if it is TRUE the data frame is sorted in descending order, otherwise (the default, FALSE) in ascending order

Example:
# create a data frame with roll no and
# subjects columns
data = data.frame(
  rollno = c(1, 5, 4, 2, 3),
  subjects = c("java", "python", "php", "sql", "c"))

print(data)

print("sort the data in decreasing order based on subjects ")
print(data[order(data$subjects, decreasing = TRUE), ])
print("sort the data in decreasing order based on rollno ")
print(data[order(data$rollno, decreasing = TRUE), ])

Output:
  rollno subjects
1      1     java
2      5   python
3      4      php
4      2      sql
5      3        c

[1] "sort the data in decreasing order based on subjects "
  rollno subjects
4      2      sql
2      5   python
3      4      php
1      1     java
5      3        c

[1] "sort the data in decreasing order based on rollno "
  rollno subjects
2      5   python
3      4      php
5      3        c
4      2      sql
1      1     java
#we can also apply the sort() function to sort a single column of a data frame:
sort(data$rollno, decreasing = TRUE)   #[1] 5 4 3 2 1
sort(data$subjects)                    #[1] "c" "java" "php" "python" "sql"
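
Sorting on more than one column, as mentioned at the start of this section, passes several vectors to order(): the second one breaks ties in the first. A sketch with a hypothetical marks column added to the same data (marks_data and sorted are illustrative names); the minus sign reverses a numeric column, so marks are sorted highest first while rollno stays ascending.

```r
marks_data <- data.frame(
  rollno = c(1, 5, 4, 2, 3),
  subjects = c("java", "python", "php", "sql", "c"),
  marks = c(70, 85, 70, 60, 85),     # hypothetical extra column
  stringsAsFactors = FALSE
)

# Sort by marks (highest first); ties are broken by rollno (lowest first)
sorted <- marks_data[order(-marks_data$marks, marks_data$rollno), ]
print(sorted)
#   rollno subjects marks
# 5      3        c    85
# 2      5   python    85
# 1      1     java    70
# 3      4      php    70
# 4      2      sql    60
```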

• Some other functions:

1. The sort() function sorts the values themselves in ascending or descending order.
By default the elements are sorted in ascending order. To sort them in descending order use decreasing = TRUE.

2. The order( ) function returns the index of each element in sorted order, that is, the positions that would sort the vector.
By default the indices are returned for ascending order. To get them for descending order use decreasing = TRUE.

3. The rank() function assigns a rank to each element, i.e. it tells us which position each element will take after sorting.

For example, for x <- c(0, 20, 10, 15) below, rank() tells us that the first value in the original vector is the smallest (rank = 1) and the second value is the largest (rank = 4).

The following code shows how to use the sort(), order() and rank() functions:

x <- c(0, 20, 10, 15)

sort(x)                       #[1]  0 10 15 20
sort(x, decreasing = TRUE)    #[1] 20 15 10  0

order(x)                      #[1] 1 3 4 2
order(x, decreasing = TRUE)   #[1] 2 4 3 1

rank(x)                       #[1] 1 4 2 3
rank(-x)                      #[1] 4 1 3 2   (ranks for descending order)

3.3 List

• Lists are R objects that can contain elements of different types, such as numbers, vectors, strings and even another list.
• A list can also contain a function or a matrix as its elements.
• A list is a generic vector that contains other objects.
• Lists are one-dimensional, heterogeneous data structures.
• A list is a data structure whose components can be of mixed data types.
• Each component of a list can have a different length.

• A list in R is created with the list() function.

• Elements of a list are accessed with an index value. In R, indexing starts at 1, unlike many other programming languages, where it starts at 0.
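
A small sketch of the points above, using a hypothetical list info whose components are a number, a character vector, a matrix, a function and another list, each of a different length; the first component is reached with index 1.

```r
info <- list(
  id = 7,                                    # numeric scalar
  tags = c("r", "stats"),                    # character vector
  grid = matrix(1:4, nrow = 2),              # a matrix
  square = function(x) x^2,                  # a function
  inner = list(city = "Pune", pin = 411001)  # a nested list
)

length(info)       # 5 components
info[[1]]          # indexing starts at 1: returns 7
info$square(3)     # calls the stored function: 9
info$inner$city    # reaches into the nested list: "Pune"
```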

3.3.1 Operations on Lists and Component Manipulation

• Creating a list

list1 <- list(29, 32, 34)          # list with data of the same type
list1

list2 <- list("Ranjy", 38, TRUE)   # list with data of different types
list2

Output:
#list1
[[1]]
[1] 29

[[2]]
[1] 32

[[3]]
[1] 34

#list2
[[1]]
[1] "Ranjy"

[[2]]
[1] 38

[[3]]
[1] TRUE

OR

student <- list(rno = 1:3, name = c("A", "B", "C"))
student

Output:
$rno
[1] 1 2 3

$name
[1] "A" "B" "C"

• Structure of a list with str():

str(list1)
str(list2)
str(student)

Output:
#list1
List of 3
 $ : num 29
 $ : num 32
 $ : num 34
#list2
List of 3
 $ : chr "Ranjy"
 $ : num 38
 $ : logi TRUE
#student
List of 2
 $ rno : int [1:3] 1 2 3
 $ name: chr [1:3] "A" "B" "C"

• Accessing list components

list1[1]            #o/p: [[1]] [1] 29
list2[2]            #o/p: [[1]] [1] 38
student$name        #o/p: [1] "A" "B" "C"
student$name[2]     #o/p: [1] "B"
student["name"]     #o/p: $name [1] "A" "B" "C"
student[["name"]]   #o/p: [1] "A" "B" "C"

Difference between [] and [[]]: a single square bracket [] returns a list (a sub-list), as shown above, while a double square bracket [[]] returns the component itself, here a character vector, not a list.
Let's understand this with an example:
Using []
stud <- student["name"]   #extract name from student and store it in stud
stud
class(stud)
Output:
$name
[1] "A" "B" "C"

[1] "list"

Using [[]]
studi <- student[["name"]]
studi
class(studi)
Output:
[1] "A" "B" "C"
[1] "character"

• Inserting components into a list

Using $ and [[ ]]:

student$marks <- c(50, 67, 84)
student[["city"]] <- c("M", "P", "N")
student
Output:
$rno
[1] 1 2 3

$name
[1] "A" "B" "C"

$marks
[1] 50 67 84

$city
[1] "M" "P" "N"

Inserting a specific value:
student[[1]][[4]] <- 4
student[[2]][[4]] <- "D"
student[["marks"]][[4]] <- 45
student[["city"]][[4]] <- "P"
student

Output:
$rno
[1] 1 2 3 4

$name
[1] "A" "B" "C" "D"

$marks
[1] 50 67 84 45

$city
[1] "M" "P" "N" "P"

• Updating elements of a list:

student$city[2] <- "M"                  #updates the city of the 2nd element
Output: $city [1] "M" "M" "N" "P"
student[["marks"]][[2]] <- 78
Output: $marks [1] 50 78 84 45

student$city <- c("a", "b", "d", "e")   #updates the city of all elements
Output: $city [1] "a" "b" "d" "e"

student[["city"]][[4]] <- NA            #stores an NA value at the 4th index
Output: $city [1] "a" "b" "d" NA

• length( ): For a list, the length() function gives the number of components in the list.
length(student)
Output: [1] 4

• Merging two lists:

• There are several ways to join, or concatenate, two or more lists in R.

• The most common way is to use the c() function, which combines its arguments into one list:

mob <- list(mobile = c(1234, 4572, 5849, 9827))   #creating a new list
mob

Output: $mobile [1] 1234 4572 5849 9827

studinfo <- c(student, mob)   #merging two lists with the c() function
studinfo

Output:
$rno
[1] 1 2 3 4

$name
[1] "A" "B" "C" "D"

$marks
[1] 50 78 84 45

$city
[1] "a" "b" "d" NA

$mobile
[1] 1234 4572 5849 9827

• Adding a vector to a list:

shift <- c("FS", "SS", "FS", "SS")   #creating a new vector
shift
class(shift)                         #determining its type
Output:
[1] "FS" "SS" "FS" "SS"
[1] "character"

studinfo[["shift"]] <- shift         #adding the vector to the list
studinfo

Output:
$rno
[1] 1 2 3 4

$name
[1] "A" "B" "C" "D"

$marks
[1] 50 78 84 45

$city
[1] "a" "b" "d" NA

$mobile
[1] 1234 4572 5849 9827

$shift
[1] "FS" "SS" "FS" "SS"

• Deleting components from a list:

• R allows us to remove items from a list. We access the element by its index and put a minus sign - in front of the index to indicate that we want to drop it.

thislist <- list("apple", "banana", "cherry")
newlist <- thislist[-2]   #removes "banana" from the list
newlist
Output:
[[1]]
[1] "apple"

[[2]]
[1] "cherry"
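
Negative indexing returns a new list and leaves thislist unchanged. A sketch of the other common way, assigning NULL to a component, which removes it from the list itself; profile is a hypothetical named list.

```r
thislist <- list("apple", "banana", "cherry")

# Negative index: a new list without the 2nd component
newlist <- thislist[-2]

# Assigning NULL removes the component from thislist itself
thislist[[2]] <- NULL
length(thislist)       # 2

# In a named list, a component can be removed by name
profile <- list(rno = 1:3, name = c("A", "B", "C"), city = c("M", "P", "N"))
profile$city <- NULL
names(profile)         # "rno" "name"
```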

3.4 Date and Time Functions in R:

• date() function: The date() function returns the current date and time as a character string.
> date()           # Output: [1] "Tue Oct 10 16:52:02 2023"

• Sys.Date() function: Sys.Date() returns the system's current date.
> Sys.Date()       # Output: [1] "2023-10-10"

• Sys.time(): Sys.time() returns the system's current date and time.
> Sys.time()       # Output: [1] "2023-10-10 17:36:23 UTC"
• Sys.timezone(): Sys.timezone() returns the current time zone.
> Sys.timezone()   # Output: [1] "Etc/UTC"

• Creating POSIXct and POSIXlt objects:

datetime <- as.POSIXct("2023-09-27 14:30:00", tz = "UTC")
datetime_lt <- as.POSIXlt("2023-09-27 14:30:00", tz = "UTC")

install.packages("lubridate")       #needed only once
library("lubridate")
cat("Year: ", year(datetime))       #Year:  2023
cat("Month: ", month(datetime))     #Month:  9
cat("Day: ", day(datetime))         #Day:  27
cat("Minute: ", minute(datetime))   #Minute:  30
cat("Second: ", second(datetime))   #Second:  0

# Date arithmetic

my_date <- as.Date("2023-09-27")

add_date <- my_date + 7        #7 days later
add_date                       #[1] "2023-10-04"

subtract_date <- my_date - 7   #7 days earlier
subtract_date                  #[1] "2023-09-20"
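
Subtracting one date from another, rather than a number from a date, gives a time difference. A short sketch using base R only (d1 and d2 are illustrative names): difftime() with units, seq() for a regular sequence of dates, and format() with the %d/%m/%Y codes.

```r
d1 <- as.Date("2023-09-27")
d2 <- as.Date("2023-10-04")

gap <- d2 - d1                                   # a difftime object
gap                                              # Time difference of 7 days
as.numeric(difftime(d2, d1, units = "weeks"))    # 1

# A regular sequence of dates, one week apart
seq(d1, by = "week", length.out = 3)   # "2023-09-27" "2023-10-04" "2023-10-11"

# Formatting a date with format codes
format(d1, "%d/%m/%Y")                 # "27/09/2023"
```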

3.5 Strings in R:

• Strings are used for storing text.

• A string is surrounded by either single quotation marks or double quotation marks:
"hello" is the same as 'hello'.
• Working with strings using different string functions:

• Assigning a string to a variable

A string is assigned to a variable with the variable name, followed by the <- operator and the string:
str <- "Hello"
str   # Output: [1] "Hello"

 Length of String
The length of a string is the number of characters it contains. R's built-in function nchar()
returns this length.
Syntax: nchar(x)
str <- "Hello World!"
nchar(str) # Output: [1] 12
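nchar() is vectorised: applied to a character vector it returns one count per element. This is different from length(), which counts elements, not characters:

```r
words <- c("apple", "kiwi", "")

print(nchar(words))        # 5 4 0 : characters in each element
print(length(words))       # 3     : number of elements in the vector
print(nchar("R is fun"))   # 8     : spaces count as characters
```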

 Accessing portions of an R string


Portions of a string (substrings) are extracted by character position with substring() or
substr(); positions are counted from 1.
Syntax: substring(x, first, last) or substr(x, start, stop)
str <- substring("Extract", 5, 7)   # same result: substr("Extract", 5, 7)
str # Output: [1] "act"
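substring() is vectorised over first and last, which gives a simple way to pull out every individual character. substr() can also appear on the left of <- to overwrite characters in place. A short sketch:

```r
word <- "Extract"
n <- nchar(word)

# One-character pieces: start positions 1..n paired with end positions 1..n
chars <- substring(word, 1:n, 1:n)
print(chars)   # "E" "x" "t" "r" "a" "c" "t"

# Overwrite characters 1 to 2 with "AB"
substr(word, 1, 2) <- "AB"
print(word)    # "ABtract"
```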

 Concatenating Strings
Strings in R are combined using the paste() function. It takes any number of arguments and
joins them, separated by sep (a single space by default).
Syntax: paste(..., sep = " ", collapse = NULL)
str1 <- "Hello"
str2 <- "World"
paste(str1, str2) # Output: [1] "Hello World"
OR
paste("Hello","World")
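The sep and collapse arguments in the syntax above control how the pieces are joined. A short sketch, also using base R's paste0(), which is paste() with sep = "":

```r
# sep sets the separator placed between arguments
print(paste("2023", "09", "27", sep = "-"))     # "2023-09-27"

# paste0() uses no separator; vectors are combined element by element
print(paste0("file", 1:3, ".csv"))              # "file1.csv" "file2.csv" "file3.csv"

# collapse joins the elements of one vector into a single string
print(paste(c("a", "b", "c"), collapse = "+"))  # "a+b+c"
```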

 cat( ) Function
print() shows a string in quotes, with line breaks displayed as \n. If you want the line breaks to
appear at the same positions as in the code, use the cat() function, which writes the text as-is:
str <- "Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."
cat(str)
# Output:
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.
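The difference between print() and cat() is easiest to see with an explicit \n escape. cat() also accepts several arguments and joins them with its sep argument (a space by default):

```r
s <- "line one\nline two"

print(s)       # [1] "line one\nline two"  (quotes and escape shown)
cat(s, "\n")   # prints the two lines as plain text

# cat() separates its arguments with sep
cat("a", "b", "c", sep = "-")   # a-b-c
cat("\n")
```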

 Splitting a String using strsplit( ):

strsplit() splits each element of a character vector into substrings wherever the split pattern
occurs, and returns a list with one element per input string.
Syntax: strsplit(x, split, fixed = FALSE)
Where:
x = the character vector (or single string) to split.
split = the pattern at which to split; the matched characters are removed.
fixed = if TRUE, split is matched literally; if FALSE (the default), it is treated as a regular expression.

> strsplit("computer_dept", split = "_")
[[1]]
[1] "computer" "dept"
#this will split the string by removing the "_" as we have given it as split argument.
#it is not mandatory to write the argument name split = ; the second argument is used as split.
> strsplit("computer_dept", "_")
[[1]]
[1] "computer" "dept"
> strsplit("Hello", "e")
[[1]]
[1] "H"   "llo"
#this will split the string by removing the "e" as we have given it as split argument.
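Because strsplit() always returns a list, [[1]] or unlist() is needed to get a plain character vector. The sketch below also shows why fixed matters: "." is a regular-expression wildcard that matches every character, so without fixed = TRUE the result contains only empty strings.

```r
# Take [[1]] to get a plain vector for a single input string
parts <- strsplit("computer_dept", "_")[[1]]
print(parts)                 # "computer" "dept"

# A character vector gives one list element per input string
pieces <- strsplit(c("a-b", "c-d-e"), "-")
print(lengths(pieces))       # 2 3
print(unlist(pieces))        # "a" "b" "c" "d" "e"

# fixed = TRUE matches "." literally
dots <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(dots)                  # "a" "b" "c"
print(strsplit("a.b.c", ".")[[1]])   # only empty strings
```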

 Changing the case - toupper() & tolower() functions


These functions change the case of characters of a string.
Syntax: toupper(x)
tolower(x)
str <- "UPpEr aNd LowEr"
print(toupper(str)) # Output: [1] "UPPER AND LOWER"
print(tolower(str)) # Output: [1] "upper and lower"
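A common use of these functions is case-insensitive comparison: convert both sides to the same case first. The sketch also uses base R's trimws() to drop stray spaces; the answers vector is made-up example data.

```r
answers <- c("Yes", "YES", "no", " yes ")

# Normalise case and surrounding spaces before comparing
is_yes <- tolower(trimws(answers)) == "yes"
print(is_yes)   # TRUE TRUE FALSE TRUE
```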

 grep( ) & grepl( ) functions:


The two functions grep() and grepl() both check whether a pattern is present in a character
string or in each element of a character vector, but they return different outputs:
- grep() returns the indices of the elements that contain the pattern (integer(0) if none do).
- grepl() returns a logical vector: TRUE for each element that contains the pattern, FALSE otherwise.

str <- c("geeks", "for", "geeks")
grep("geeks", str)   # Output: [1] 1 3
grep("l", str)       # Output: integer(0)
grepl("for", str)    # Output: [1] FALSE  TRUE FALSE
grepl("X", str)      # Output: [1] FALSE FALSE FALSE
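Two grep() arguments are often useful: value = TRUE returns the matching elements instead of their indices, and ignore.case = TRUE makes the match case-insensitive. grepl() pairs naturally with logical subsetting. The fruits vector is made-up example data:

```r
fruits <- c("Apple", "banana", "grape", "pineapple")

# value = TRUE returns the matching elements themselves
print(grep("apple", fruits, value = TRUE))                      # "pineapple"

# Matching is case-sensitive unless ignore.case = TRUE
print(grep("apple", fruits, ignore.case = TRUE, value = TRUE))  # "Apple" "pineapple"

# grepl() gives a logical vector, handy for subsetting
print(fruits[grepl("an", fruits)])                              # "banana"
```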

3.6 Control Structures in R:

Control statements are expressions used to control the execution and flow of the program based on
the conditions provided in the statements.

The types of control statements covered here are:
- if-else
- For loop
- While loop
- Repeat loop
- Next, break
- apply( ), sapply( ), lapply( ) functions
- Switch case

3.6.1 if-else :

 The if statement contains a logical condition that is evaluated first. The block of code under the
if statement is executed only when the condition is TRUE; if the condition is FALSE, that block is
skipped.
 The optional else part contains a block of code that is executed when the condition in the if
statement is FALSE.
Syntax:
if (expression) {
statements
} else {
statements
}
 Note: at the top level of a script or the console, else must be on the same line as the closing
brace } of the if block; otherwise R treats the if statement as finished and reports an error at else.
 Example:
x <- 5

# Check whether the value is greater than 10
if (x > 10) {
  print(paste(x, "is greater than 10"))
} else {
  print(paste(x, "is less than or equal to 10"))
}
#Output: [1] "5 is less than or equal to 10"
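More than two cases can be handled by chaining else if. Also, if() expects a single TRUE or FALSE; to test every element of a vector, base R's ifelse() works element by element. A short sketch (classify is a hypothetical helper written for this example):

```r
# Hypothetical helper: three-way comparison with else if
classify <- function(x) {
  if (x > 10) {
    "greater than 10"
  } else if (x == 10) {
    "equal to 10"
  } else {
    "less than 10"
  }
}
print(classify(5))    # "less than 10"
print(classify(10))   # "equal to 10"

# ifelse() applies the test to each element of a vector
marks <- c(35, 72, 50)
print(ifelse(marks >= 40, "pass", "fail"))   # "fail" "pass" "pass"
```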

3.6.2 For Loop:

 A for loop executes a block of statements once for each element of a vector (or list), assigning the current element to the loop variable on each pass; the loop ends after the last element.
Syntax:
for(value in vector){
statements
}
 Example:
x <- letters[4:10]

for(i in x)
{
print(i)
}
#Output:[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
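When the loop needs positions rather than values, base R's seq_along() generates the indices 1, 2, ..., n (and an empty sequence for an empty vector, where 1:length(x) would wrongly give 1 0). A sketch that accumulates a running total:

```r
scores <- c(12, 7, 30)
total <- 0

for (i in seq_along(scores)) {
  total <- total + scores[i]   # add the i-th score
  cat("After item", i, "total =", total, "\n")
}
print(total)   # 49
```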
3.6.3 While Loop:

 A while loop repeats its body as long as a condition remains TRUE. The condition is checked before
each pass, so the body does not run at all if the condition is FALSE from the start.
Syntax:
while(expression){
statement
}
 Example:
x = 1

# Print 1 to 5
while(x <= 5)
{
print(x)
x = x + 1
}
#Output:[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
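A while loop is most useful when the number of iterations is not known in advance. In this sketch the loop keeps doubling n until it exceeds 100 and counts how many doublings that took:

```r
n <- 1
steps <- 0

# Keep doubling while n is still at most 100
while (n <= 100) {
  n <- n * 2
  steps <- steps + 1
}
cat("n =", n, "after", steps, "doublings\n")   # n = 128 after 7 doublings
```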

3.6.4 Repeat Loop:

 repeat executes its body again and again; it has no built-in exit condition, so a break statement is
needed to leave the loop. The break statement can be used to exit any type of loop.
Syntax:
repeat {
statements
if (expression) {
break
}
}
 Example:
x = 1
# Print 1 to 5
repeat{
print(x)
x = x + 1
if(x > 5)
{
break
}
}
#Output:[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
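Because break can sit anywhere in the body, repeat fits loops whose exit test is easiest to write at a particular point. The sketch below puts the test at the top and counts the steps of the Collatz rule (halve an even number, otherwise take 3n + 1) until the value reaches 1; the rule is only an illustration.

```r
n <- 6
steps <- 0
repeat {
  if (n == 1) break                          # exit test placed at the top of the body
  n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
  steps <- steps + 1
}
print(steps)   # 8  (6, 3, 10, 5, 16, 8, 4, 2, 1)
```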

3.6.5 Next and Break:

Next Statement:
 The next statement skips the remaining statements of the current iteration and moves on to the
next iteration, without terminating the loop.
Syntax:
if (test_condition) {
next}
 Example:
x <- 1:10
# Print even numbers
for(i in x)
{
if(i%%2 != 0)
{
next #Jumps to next loop
}
print(i)
}
#Output:[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
Break Statement:
 The break keyword is a jump statement that is used to terminate the loop at a particular iteration.
Syntax:
if (test_expression) {
break
}
 Example:
a<-1
while (a < 10)
{
print(a)
if(a==5)
break
a = a + 1
}
#Output:[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
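In nested loops, break leaves only the innermost loop that contains it; the outer loop continues. A sketch that records which (i, j) pairs were visited:

```r
visited <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) break                          # exits the inner loop only
    visited <- c(visited, paste0(i, "-", j))
  }
}
print(visited)   # "1-1" "2-1" "3-1"
```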
 Switch Case :
 A switch statement is a selection control mechanism that selects one of several expressions or
values based on the value of a given expression.
 Switch case statements are a substitute for long if-else chains that compare a variable against
several values. In R, EXPR is usually a character string matched against the case names, or a number that selects the case at that position.
 Switch case in R is a multiway branch statement. It allows a variable to be tested for equality
against a list of values.
The basic syntax of the switch function is as follows:
switch(EXPR, CASE1, CASE2, ..., CASEN, DEFAULT)
Example:
day <- "Monday"
message <- switch(
day,
"Monday" = "It's the start of the workweek.",
"Tuesday" = "You're in the middle of the workweek.",
"Wednesday" = "It's hump day!",
"Thursday" = "Almost there, one more day to the weekend.",
"Friday" = "Happy Friday! It's the weekend soon.",
"Saturday" = "Enjoy your weekend!",
"Sunday" = "It's still the weekend.",
"Unknown day"
)
print(message) #Output: [1] "It's the start of the workweek."
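Two further behaviours of switch() are worth knowing: a numeric EXPR picks the argument at that position (and gives NULL, invisibly, when there is no such argument), and an empty named case falls through to the next non-empty one. A sketch (day_type is a hypothetical helper written for this example):

```r
# A numeric EXPR selects by position
print(switch(2, "red", "green", "blue"))            # "green"
print(is.null(switch(4, "red", "green", "blue")))   # TRUE: there is no 4th argument

# Hypothetical helper: "Saturday" is empty, so it falls through to "Sunday"
day_type <- function(day) {
  switch(day,
         "Saturday" = ,
         "Sunday"   = "weekend",
         "weekday")   # unnamed last argument is the default
}
print(day_type("Saturday"))   # "weekend"
print(day_type("Monday"))     # "weekday"
```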

3.6.6 apply( ),lapply( ),sapply( ) functions


 apply( )
 The apply() function applies a function to the rows or columns of a matrix or data frame.
 It takes the matrix or data frame, a margin saying whether the function is applied by row or by
column, and the function itself, and returns the results as a vector, array, or list.
 Syntax: apply(X, MARGIN, FUN, ...)
- X: the input array, including a matrix (a data frame is first converted to a matrix).
- MARGIN: if MARGIN is 1 the function is applied to each row; if it is 2 it is applied to each
column.
- FUN: the function to be applied to the input data.
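A small worked example of apply() on a 2 x 3 matrix, showing how the margin value switches between rows and columns; an anonymous function can be passed as the function argument too:

```r
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(apply(m, 1, sum))   # row sums: 9 12
print(apply(m, 2, sum))   # column sums: 3 7 11

# Spread (max - min) of each column
print(apply(m, 2, function(col) max(col) - min(col)))   # 1 1 1
```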
 lapply() function
 The lapply() function helps us in applying functions on list objects and returns a list object of the
same length.
 The lapply() function in the R Language takes a list, vector, or data frame as input and gives output
in the form of a list object.
 Since the lapply() function applies a certain operation to all the elements of the list it doesn’t need
a MARGIN.
 Syntax: lapply(X, FUN, ...)
- X: the input vector, list, or other object.
- FUN: the function to be applied to each element of the input.
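A short lapply() sketch on a named list of made-up scores; the result is always a list with one element per input element, keeping the input's names:

```r
scores <- list(math = c(70, 80, 90), science = c(60, 75))

means <- lapply(scores, mean)   # one mean per list element
print(means)                    # $math 80, $science 67.5
print(class(means))             # "list"

# Works on a vector too; each result is wrapped in a list
squares <- lapply(1:3, function(i) i^2)
print(length(squares))          # 3
```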

 sapply() function
 sapply() function takes a list, vector, or data frame as input and gives output as a vector or matrix where possible.
 Like lapply(), it applies a function to each element and returns a result of the same length as the input.
 sapply() does the same job as lapply() but simplifies the result: to a vector when every result has length 1, to a matrix when all results have the same length greater than 1, and otherwise it returns a list.
 Syntax: sapply(X, FUN, ...)
- X: the input vector, list, or other object.
- FUN: the function to be applied to each element of the input.
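The same made-up scores list run through sapply() shows the simplification rules: length-1 results become a named vector, and equal-length results (here min and max from range()) become a matrix with one column per input element:

```r
scores <- list(math = c(70, 80, 90), science = c(60, 75))

print(sapply(scores, mean))           # named numeric vector: math 80, science 67.5
print(sapply(scores, range))          # 2 x 2 matrix: each column is c(min, max)
print(sapply(1:3, function(i) i^2))   # 1 4 9
```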
