
Data Analysis Using R - 2

Uploaded by

harshvasudevkoli
Copyright
© © All Rights Reserved

DAR

2. R Programming Fundamentals

2.1 Overview of R Language :

 R is an interpreted programming language (often called a scripting language): code is executed directly by the interpreter, without a separate compilation step before it runs.
 R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is now developed by the R Development Core Team, of which John Chambers is a member.
 R is a powerful language and environment for statistical computing and graphics.
 R is available as free software.
 It is platform independent: it runs on all major operating systems, including Windows, macOS and Linux.
 It is extensible, i.e. developers can easily write their own software and distribute it in the form of R add-on packages.
 It provides a very large collection of packages for statistical modelling, machine learning, visualization etc.

2.2 Features of R :

 R is both a simple, effective, well-developed programming language and a data analysis software environment.
 R is a well-designed language that supports conditionals, loops, user-defined recursive functions, and various I/O facilities.
 R has a large, coherent, and integrated set of tools for data analysis.
 R contains a suite of operators for calculations on arrays, lists, and vectors.
 R provides highly extensible graphical techniques.
 R has an effective data handling and storage facility.
 R has a vibrant online community.
 R is free, open-source, robust, and highly extensible.

2.3 Basic Data Types and Operators in R :

 This section covers the four basic data types used most often in R (R also has complex and raw types):

1. Logical: TRUE/FALSE

A logical value is usually created by comparing values. For example:

x = 4 # Sample values
y = 3
z = x > y # Comparing two values

print(z) # print the logical value

print(class(z)) # print the class name of z

print(typeof(z)) # print the type of z

Output
[1] TRUE
[1] "logical"
[1] "logical"

2. Character :

The character data type stores text values (strings). Strings in R can contain letters, numbers, and symbols, and are written inside quotes.

char = "Geeksforgeeks" # Assign a character value to char

print(class(char)) # print the class name of char

print(typeof(char)) # print the type of char

Output
[1] "character"
[1] "character"

3. Numeric/double: Real numbers with or without a decimal point (3, 4.5)

Decimal values are called numerics in R, and numeric is the default type for numbers: a value written without a decimal point, such as 5, is still stored as a double unless it carries the L suffix. R represents numeric values as double-precision floating-point numbers.

x = 5.6 # Assign a decimal value to x
print(class(x)) # print the class name of x
print(typeof(x)) # print the type of x

y = 5 # Assign a whole number to y (still numeric, not integer)
print(class(y)) # print the class name of y
print(typeof(y)) # print the type of y

Output
[1] "numeric"
[1] "double"
[1] "numeric"
[1] "double"

4. Integer: Whole numbers written with the suffix L (1L, 30L)

R supports an integer data type for whole numbers. You can create an integer, or convert a value to one, using the as.integer() function. You can also append the capital 'L' suffix to a number to mark it as an integer.
x = as.integer(5) # Create an integer value

print(class(x)) # print the class name of x

print(typeof(x)) # print the type of x

y = 5L # Declare an integer by appending an L suffix.

print(class(y)) # print the class name of y

print(typeof(y)) # print the type of y

Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
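The type checks and conversions described above can be tried together. The following is a minimal sketch; the variable names are illustrative, and the complex value is included only to show that further atomic types exist beyond the four listed.

```r
# Inspect and convert between the basic types.
x <- 5          # double by default
y <- 5L         # integer because of the L suffix
z <- 2 + 3i     # complex, a further atomic type

is.integer(x)   # FALSE
is.integer(y)   # TRUE
typeof(z)       # "complex"

# Explicit coercion with the as.*() family
as.integer(4.9)       # 4 (truncates toward zero, does not round)
as.character(12.5)    # "12.5"
as.logical("TRUE")    # TRUE
```

Note that as.integer() drops the fractional part rather than rounding; round() can be applied first when rounding is wanted.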

 Types of the operator in R language


 Arithmetic Operators
 Logical Operators
 Relational Operators
 Assignment Operators

1. Arithmetic Operators
Arithmetic operations in R simulate various math operations, like addition, subtraction,
multiplication, division, and modulo using the specified operator between operands, which may be
either scalar values, complex numbers, or vectors.

I. Addition operator (+): The values at the corresponding positions of both operands are added.

a <- c (1, 0.1)


b <- c (2.33, 4)
print (a+b)
Output : 3.33 4.10

II. Subtraction Operator (-): The second operand values are subtracted from the first.

a <- 6
b <- 8.4
print (a-b)
Output : -2.4

III. Multiplication Operator (*) : The corresponding elements of the two operands are multiplied with the use of the '*' operator.

a <- c(4,4)
b <- c(5,5)
print (a*b)
Output : 20 20
IV. Division Operator (/) : The first operand is divided by the second operand with the use of the
‘/’ operator.

a <- 10
b <- 5
print (a/b)
Output : 2

V.Power Operator (^): The first operand is raised to the power of the second operand.

a <- 4
b <- 5
print(a^b)
Output : 1024

VI. Modulo Operator (%%): The remainder of the first operand divided by the second operand
is returned.

a <- c(2, 22)


b <-c(2,4)
print(a %% b)
Output : 0 2

# R program to illustrate
# the use of Arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on Operands


cat ("Addition of vectors :", vec1 + vec2, "\n")
cat ("Subtraction of vectors :", vec1 - vec2, "\n")
cat ("Multiplication of vectors :", vec1 * vec2, "\n")
cat ("Division of vectors :", vec1 / vec2, "\n")
cat ("Modulo of vectors :", vec1 %% vec2, "\n")
cat ("Power operator :", vec1 ^ vec2)

Output :
Addition of vectors : 2 5
Subtraction of vectors : -2 -1
Multiplication of vectors : 0 6
Division of vectors : 0 0.6666667
Modulo of vectors : 0 2
Power operator : 0 8
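Besides the operators listed above, R also has an integer-division operator, %/%, which pairs naturally with the modulo operator. A small sketch with illustrative values, including a negative operand to show how the two behave together:

```r
a <- c(7, 22, -7)
b <- c(2, 4, 2)

quotient  <- a %/% b   # integer (floor) division: 3 5 -4
remainder <- a %% b    # remainder takes the sign of the divisor: 1 2 1

# The two results always recombine into the original values
b * quotient + remainder   # 7 22 -7
```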

2. Logical Operators
Logical operations in R make element-wise decisions based on the specified operator between the operands, and evaluate to a logical TRUE or FALSE value. Numeric operands are treated as FALSE when they are zero and TRUE otherwise.
I. Element-wise Logical AND operator (&): It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if both elements are TRUE.
v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v & t)
Output :TRUE TRUE FALSE TRUE

II. Element-wise Logical OR operator (|): It combines each element of the first vector with the corresponding element of the second vector and gives TRUE if at least one of the elements is TRUE.

v <- c(3,0,TRUE,2+3i) # 0 is always considered as FALSE


t <- c(4,0,FALSE,2+3i)
print(v | t)
Output :TRUE FALSE TRUE TRUE

III. Logical NOT operator (!): Takes each element of the vector and gives the opposite logical value.

v <- c(3,0,TRUE,2+3i)
print(!v)
Output :FALSE TRUE FALSE FALSE

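IV. Logical AND operator (&&): Unlike &, this operator works on single (length-one) values and gives TRUE only if both are TRUE. Older versions of R silently used only the first element of longer vectors; since R 4.3.0, passing a vector of length greater than one is an error, so the elements are selected explicitly.

v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
print(v[1] && t[1])
Output :TRUE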

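V. Logical OR operator (||): Unlike |, this operator works on single (length-one) values and gives TRUE if at least one of them is TRUE. As with &&, vectors of length greater than one are an error since R 4.3.0, so the elements are selected explicitly.

v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])
Output :FALSE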

# R program to illustrate
# the use of Logical operators
vec1 <- c(0,2)
vec2 <- c(TRUE,FALSE)

# Performing operations on Operands


cat ("Element wise AND :", vec1 & vec2, "\n")
cat ("Element wise OR :", vec1 | vec2, "\n")
# && and || expect single values, so the first elements are compared
cat ("Logical AND :", vec1[1] && vec2[1], "\n")
cat ("Logical OR :", vec1[1] || vec2[1], "\n")
cat ("Negation :", !vec1)
Output :
Element wise AND : FALSE FALSE
Element wise OR : TRUE TRUE
Logical AND : FALSE
Logical OR : TRUE
Negation : TRUE FALSE
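The difference between the element-wise operators and the single-value operators matters most inside if() conditions. The sketch below uses a hypothetical vector of scores; any() and all() are used to collapse a logical vector to the single value that && and || expect.

```r
scores <- c(72, 45, 90)

# & is vectorised: one result per element
passed <- scores >= 50 & scores <= 100   # TRUE FALSE TRUE

# any() / all() reduce a logical vector to a single TRUE or FALSE
if (all(scores >= 0) && any(scores >= 90)) {
  result <- "valid scores, at least one distinction"
} else {
  result <- "no distinction"
}

# && short-circuits: the right-hand side is skipped when the left is FALSE
safe <- length(scores) > 5 && scores[6] > 50   # FALSE; scores[6] is never evaluated
```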

3. Relational Operators:
The relational operators in R compare the corresponding elements of the operands and return TRUE for each position where the first operand satisfies the relation with respect to the second. TRUE is treated as 1 and FALSE as 0, so TRUE is always considered greater than FALSE.

I. Less than (<): Checks if each element of the first vector is less than the corresponding element of
the second vector.

a <- c(TRUE, 0.1,"apple") # mixed values are coerced to character and compared as strings


b<- c(0,0.1,"bat")
print(a < b)
Output :FALSE FALSE TRUE

II. Less than or equal to (<=) : Checks if each element of the first vector is less than or equal to the corresponding element of the second vector.

a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a <= b)
Output : TRUE FALSE TRUE TRUE

III. Greater than (>) :Checks if each element of the first vector is greater than the corresponding
element of the second vector.

a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a > b)
Output :FALSE TRUE FALSE FALSE

IV. Greater than or equal to (>=): Checks if each element of the first vector is greater than or equal to the corresponding element of the second vector.

a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a >= b)
Output :FALSE TRUE FALSE TRUE

V. Not equal to (!=) :Checks if each element of the first vector is unequal to the corresponding
element of the second vector.

a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a != b)
Output :TRUE TRUE TRUE FALSE

VI. Equals (==): Checks if each element of the first vector is equal to the corresponding element of the second vector.

a <- c(2,5.5,6,9)
b<- c(8,2.5,14,9)
print(a == b)
Output :FALSE FALSE FALSE TRUE

# R program to illustrate
# the use of Relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on Operands


cat ("Vector1 less than Vector2 :", vec1 < vec2, "\n")
cat ("Vector1 less than equal to Vector2 :", vec1 <= vec2, "\n")
cat ("Vector1 greater than Vector2 :", vec1 > vec2, "\n")
cat ("Vector1 greater than equal to Vector2 :", vec1 >= vec2, "\n")
cat ("Vector1 not equal to Vector2 :", vec1 != vec2, "\n")

Output
Vector1 less than Vector2 : TRUE TRUE
Vector1 less than equal to Vector2 : TRUE TRUE
Vector1 greater than Vector2 : FALSE FALSE
Vector1 greater than equal to Vector2 : FALSE FALSE
Vector1 not equal to Vector2 : TRUE TRUE
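One practical caveat with == is that decimal values are stored as double-precision floating-point numbers, so two results that look equal may differ in their last bits. A minimal sketch, using the base functions all.equal() and identical():

```r
x <- 0.1 + 0.2
y <- 0.3

exact    <- x == y                    # FALSE: the two doubles differ slightly
tolerant <- isTRUE(all.equal(x, y))   # TRUE: equal within a small tolerance

# identical() compares two objects as a whole and returns a single value
same <- identical(c(1, 2, 3), c(1, 2, 3))   # TRUE
```

For numeric results of calculations, the tolerant comparison is usually the safer choice.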

4. Assignment Operator:
Assignment operators in R are used to assign values to data objects. The objects may be integers, vectors, or functions. The values are then stored under the assigned variable names.

I. Left Assignment (<- or <<- or =) :

v1 <- c("A",TRUE)
v2 <<- c(10L,5.2)
v3 = c(3,1,TRUE,2+3i)
print (v1)
print (v2)
print (v3)
Output :
[1] "A"    "TRUE"
[1] 10.0 5.2
[1] 3+0i 1+0i 1+0i 2+3i

II. Right Assignment (-> or ->>): The value is written on the left and the variable name on the right.

c("A",TRUE) -> v1
c(10L,5.2) ->> v2
print (v1)
print (v2)
Output :
[1] "A"    "TRUE"
[1] 10.0 5.2
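The difference between <- and <<- only becomes visible inside a function. The following sketch uses two hypothetical helper functions to show it: <- creates a local variable, while <<- assigns to a variable in the enclosing environment.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a new local variable; the outer one is untouched
  counter
}

increment_global <- function() {
  counter <<- counter + 1   # modifies the variable in the enclosing environment
  counter
}

local_result <- increment_local()   # 1
after_local  <- counter             # still 0
increment_global()
after_global <- counter             # now 1
```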

2.4 Data Structures in R


 A data structure is a particular way of organizing data in a computer so that it can be used
effectively.

 The idea is to reduce the space and time complexities of different tasks. Data structures in R
programming are tools for holding multiple values.

 R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and whether
they’re homogeneous (all elements must be of the identical type) or heterogeneous (the
elements are often of various types).

 This gives rise to the six data structures that are most frequently used in data analysis:

 Vectors : A vector is an ordered collection of values of a basic data type, with a given length. All the elements of a vector must be of the same data type, i.e. vectors are homogeneous. Vectors are one-dimensional data structures.

 Factors : Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, like "Male", "Female" or TRUE, FALSE. They are useful in data analysis for statistical modeling. Factors are created using the factor() function by taking a vector as input.

 Arrays : Arrays are data storage structures defined by a fixed number of dimensions. Their elements are stored in contiguous memory locations, and all elements of an array are of the same data type.
 Matrices : A matrix is a rectangular arrangement of values in rows and columns, i.e. a two-dimensional array. Rows run horizontally and columns run vertically.

 Dataframes : A data frame is a table or two-dimensional array-like structure in which each column contains the values of one variable and each row contains one set of values from each column.
 Lists : Lists are one-dimensional, heterogeneous data structures. A list can hold vectors, matrices, characters, functions, and so on. A list is created using the list() function.
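As a quick orientation, the six structures described above can each be created in one line. This is a minimal sketch with made-up values; the object names are illustrative only.

```r
v  <- c(10, 20, 30)                                  # vector (1D, homogeneous)
f  <- factor(c("low", "high", "low"))                # factor (categorical)
m  <- matrix(1:6, nrow = 2, ncol = 3)                # matrix (2D, homogeneous)
a  <- array(1:24, dim = c(2, 3, 4))                  # array (nD, homogeneous)
df <- data.frame(id = 1:3, name = c("A", "B", "C"))  # data frame (2D, columns may differ in type)
l  <- list(v, "text", TRUE)                          # list (1D, heterogeneous)

# class() reports the structure of each object
sapply(list(v, f, m, a, df, l), function(obj) class(obj)[1])
```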

2.5 Vectors
 There are 2 types of vectors:
- Atomic Vector: A sequence of elements that share the same data type.
- List: A vector that may contain elements of different types.
 The most common atomic vector types are: Logical, Integer, Numeric/double, Character.
 v1<- c(TRUE,FALSE,TRUE) # logical vector
 v2<- c(1.2, 2.5, 3) # numeric/double vector
 v3<- 50L # integer vector
 v4<- c("Hello", "Computer") # character vector

2.5.1 Properties of Vectors: type, length, attributes.


 typeof(): displays the type of a vector. e.g. typeof(v1)
 length(): displays the number of elements in a vector. e.g. length(v1)
 attributes(): displays the attributes of a vector, such as the names of its elements.
e.g. v1<-c("A"=20,"B"=20)
attributes(v1) # A and B are the names of the values stored in v1.
 is.*() functions: used to check whether a vector is of a specified type. e.g. is.integer(v1)
 c(): used to combine more than one vector/value.
 is.na(): used to check for NA values.
e.g. v1<-c(1,2,NA)
is.na(v1) #result will be FALSE FALSE TRUE
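The property functions above can be applied to one named vector to see how they relate. A short sketch with an illustrative vector v:

```r
v <- c(A = 20, B = 30, C = NA)

typeof(v)        # "double"
length(v)        # 3
attributes(v)    # a list with one component, $names
names(v)         # "A" "B" "C"
is.numeric(v)    # TRUE
is.na(v)         # FALSE FALSE TRUE (printed with the element names)
sum(is.na(v))    # 1 missing value
```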

2.5.2 Working with Vectors


 Creating and Deleting Vectors of different types using: seq( ), assign( ), vector( ),
rep( ), c( ), rm( ) functions
 Different ways to create vector:
1. Create numeric/double vector using seq( ) function:
• It is used to generate sequence of input values.
 a<-seq(from=1, to=10) or a<-seq(1,10)
 a<- seq(1,10,by=2) #sequence of odd numbers from 1 to 10.
 a<-seq_len(10) #sequence of 1 to 10 numbers.
 a<- seq(from=-5, to=5)

2.Create vector using assign( ) function:


• It is used to assign value to a vector.
 assign("a",10)
 assign("a", "Computer")
 assign("colours", c("Blue","White","Orange","Green"))

3.Create vector using vector( ) function:


• The vector() function is used to create a vector of any type. It takes the parameters mode and length: mode specifies the type and length specifies the number of elements. The elements are initialised to the default value of that type (FALSE, 0 or "").
The following example creates a logical vector with 5 elements.

 # Create Vector using vector()


x <- vector(mode='logical',length=5)
print(x) #O/P: [1] FALSE FALSE FALSE FALSE FALSE
print(is.vector(x)) #O/P:[1] TRUE
print(typeof(x)) #O/P:[1] "logical"

4. Create vector using rep( ) function:


• Repeat vector values for specified number of times.
Assume v1<-c(1,2,3)
 rep(v1,3) #O/P: 1 2 3 1 2 3 1 2 3
 rep(v1,each=3) #O/P: 1 1 1 2 2 2 3 3 3
 rep("a",2) #O/P: "a" "a"
 rep(v1,times=c(2,3,4)) #O/P: 1 1 2 2 2 3 3 3 3
 rep(c(1,2),3) #O/P: 1 2 1 2 1 2
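The creation functions above accept a few more arguments that are often useful. The sketch below shows some of them; all are standard arguments of seq(), rep() and vector(), and the values are illustrative.

```r
seq(0, 1, length.out = 5)       # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)             # 10 7 4 1
seq_along(c("a", "b", "c"))     # 1 2 3

# each is applied first, then times
rep(c("x", "y"), times = 2, each = 2)   # "x" "x" "y" "y" "x" "x" "y" "y"

# vector() pre-allocates; the default value depends on the mode
vector("numeric", 3)            # 0 0 0
vector("character", 2)          # "" ""
```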

5. Assigning values using c( ):


 Fruits<-c("apple", "mango", "watermelon")
 Age<-c(10,12,14,16)
 v1<-c(1,2); v2<-c(3,4,5); v3<-c(v1,v2)

6.removing variables using rm( ):

•The rm() function in R is used to delete or remove a variable from a workspace.


 a <- 4 # creating variables

b <- sqrt(a)

c <- b * b

d <- c * b

# ls() returns the names of all the variables in the current workspace

ls() #O/P: [1] "a" "b" "c" "d"

rm(list = c("a", "d")) # removing variables

print(ls()) #O/P:[1] "b" "c"

print(c) #O/P:[1] 4

7. Vector with single value:


 Fruit<- "apple"
 Age<-55

8. Numeric vector using colon( : ) operator:


 Age<-1:50 #vector Age with 1 to 50 values.

 How to modify a vector in R?


We can modify a vector using the assignment operator.

x <- c(-3,-2,-1,0,1,2)

x[2] <- 0; x # modify 2nd element

x[x<0] <- 5; x # modify elements less than 0

x <- x[1:4]; x # truncate x to first 4 elements

Output:

[1] -3 0 -1 0 1 2

[1] 5 0 5 0 1 2

[1] 5 0 5 0

 How to delete a vector in R?

We can remove a single element with a negative index, and delete the contents of the whole vector by assigning NULL to it.


x <- c(-3,-2,-1,0,1,2)

print(x)

x<-x[-6]

print(x)

x <- NULL

x[4]

Output:

[1] -3 -2 -1 0 1 2

[1] -3 -2 -1 0 1

NULL

 Sorting vector: sort( ) function


 The sort() function sorts the values in ascending order by default; pass decreasing = TRUE to sort in descending order.
fruits <- c("banana", "apple", "orange", "mango", "lemon")
numbers <- c(13, 3, 5, 7, 20, 2)
sort(fruits) #O/P: [1] "apple" "banana" "lemon" "mango" "orange"
sort(numbers) #O/P: [1] 2 3 5 7 13 20
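Descending order, and sorting one vector by another, can be sketched as follows. The prices vector is a hypothetical addition for illustration; order() is the base function that returns sorting positions rather than sorted values.

```r
numbers <- c(13, 3, 5, 7, 20, 2)
fruits  <- c("banana", "apple", "orange", "mango", "lemon")

sort(numbers, decreasing = TRUE)   # 20 13 7 5 3 2
order(numbers)                     # 6 2 3 4 1 5: positions that would sort the vector
numbers[order(numbers)]            # 2 3 5 7 13 20, same as sort(numbers)

# order() lets one vector be arranged by another (hypothetical prices)
prices <- c(40, 120, 60, 90, 30)
fruits[order(prices)]              # "lemon" "banana" "orange" "mango" "apple"
```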

• Assigning names to vector components:


 V1<-c(1,2,3)
names(V1)<- c("first", "second", "third")
V1["first"] # displays component with name "first" (character index)
 V2<- 1:10
V2[1:5] # displays first 5 elements.
V2[4:9] # displays elements from specified index positions.

 Dealing with NA (Not Available) values


 In R, the NA symbol marks a missing value, while the NaN symbol, which stands for "not a number", is the result of an undefined arithmetic operation such as 0/0.
 The is.na() function returns TRUE for both NA and NaN, so both are treated as missing values by the functions below.
 Missing Values in R, are handled with the use of some pre-defined functions:

 is.na() Function for Finding Missing values:

This function returns a logical vector of the same length as its input: each position holds TRUE if the corresponding element is NA and FALSE otherwise.
x<- c(NA, 3, 4, NA, NA, NA)
is.na(x)
Output: [1] TRUE FALSE FALSE TRUE TRUE TRUE
 Removing NA values:
There are two ways to remove missing values:

Extracting values except for NA values:

Example 1:
x <- c(1, 2, NA, 3, NA, 4)
d <- is.na(x)
x[! d]
Output: [1] 1 2 3 4

Example 2:
x <- c(1, 2, 0 / 0, 3, NA, 4, 0 / 0)
x
x[! is.na(x)]
Output:
[1] 1 2 NaN 3 NA 4 NaN
[1] 1 2 3 4
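Many summary functions also handle missing values directly through the na.rm argument, and na.omit() offers another way to drop them. A minimal sketch with an illustrative vector:

```r
x <- c(1, 2, 0 / 0, 3, NA, 4)

is.na(x)                # FALSE FALSE TRUE FALSE TRUE FALSE (NaN counts as missing)
is.nan(x)               # FALSE FALSE TRUE FALSE FALSE FALSE (only the NaN)

is.na(mean(x))          # TRUE: a missing value propagates through the calculation
mean(x, na.rm = TRUE)   # 2.5: computed from 1, 2, 3, 4

clean <- as.vector(na.omit(x))   # 1 2 3 4
```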

2.5.3 Vector Indexing

 Elements of a vector can be accessed using vector indexing. The vector used for indexing can be
logical, integer or character vector.

 Using integer vector as index


 Vector index in R starts from 1, unlike most programming languages where the index starts
from 0.

 We can use a vector of integers as index to access specific elements.

 We can also use negative integers to return all elements except those specified.

 But we cannot mix positive and negative integers while indexing and real numbers, if used, are
truncated to integers.

x <- 1:7

x[3] # access 3rd element

x[c(2, 4)] # access 2nd and 4th element

x[-1] # access all but 1st element

x[c(2, -4)] # cannot mix positive and negative integers

Output:

[1] 3

[1] 2 4
[1] 2 3 4 5 6 7

ERROR!

Error in x[c(2, -4)] : only 0's may be mixed with negative subscripts Execution halted

 Using logical vector as index


 When we use a logical vector for indexing, the elements at the positions where the logical vector is TRUE are returned.

x <- 1:5

x[c(TRUE, FALSE, FALSE, TRUE, TRUE)]

# filtering vectors based on conditions

x[x < 0]

x[x > 0]

Output:

[1] 1 4 5

integer(0)

[1] 1 2 3 4 5

 Using character vector as index

 This type of indexing is useful when dealing with named vectors. We can name each element of a vector.

x <- c("first"=3, "second"=0, "third"=9)

names(x)

x["second"]

Output:

[1] "first" "second" "third"

second
0
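The three kinds of index can be combined with a few helper functions. The sketch below is illustrative; which() and length() are base functions, and the vector is the named one used above.

```r
x <- c(first = 3, second = 0, third = 9)

which(x > 0)             # positions of the matching elements: first = 1, third = 3
x[c("first", "third")]   # select several elements by name
x[x > 0 & x < 5]         # combine conditions with the element-wise &
x[-length(x)]            # drop the last element
x[10]                    # indexing past the end gives NA rather than an error
```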

2.5.4 Reading data using scan( ) function.


The scan() function in R reads data into a vector or list, either typed by the user at the console or taken from a file.

scan(file, what, ...)


This function takes the following arguments.

 file: It is the text file to be scanned.


 what: It is the type of input according to the value to be scanned. This argument has the
following data types: raw, integer, logical, numeric, complex, character, and list.

 a<- scan() #read numeric /double value


 a<- scan(what=integer()) #read Integer value
 a<- scan(what=character()) #read character value
 a<- scan(what="") #read string value
 a[c(2,4)] #display 2nd & 4th components
 a[-1] #Skip 1st component in output.
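Because scan() normally waits for keyboard input, it is easier to experiment with its text argument, which reads from a string instead. This is a small sketch using the standard text, what, sep and quiet arguments; the input strings are made up.

```r
# Read numbers from a string instead of the keyboard
nums <- scan(text = "10 20 30 40", quiet = TRUE)
nums[c(2, 4)]          # 20 40

# Read comma-separated words as character values
words <- scan(text = "red,green,blue", what = character(), sep = ",", quiet = TRUE)
words[-1]              # "green" "blue"
```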

Extra points :

 Add new component/value to existing vector using append( ) :

V<-c("Hello", "World")

append(V, "Computer", after=1)

append( ) receives 3 parameters:

• 1st is the name of the vector.

• 2nd is the string/number to be added.

• 3rd is the index after which the new component is to be added.

The above command returns a new vector with the word Computer inserted after the 1st index position ("Hello" "Computer" "World"); assign the result back to V to keep the change.

 c( ) to add new component: It appends data only at the end of vector.

V<- 1:5

V<-c(V,c(6,7,8)) or V<-c(V,6:8) #append values 6,7,8 at the end of V.

 Vector arithmetic:

X <- c(5, 2, 5, 1, 51, 2) # Creating Vectors


Y <- c(7, 9, 1, 5, 2, 1)

Z <- X + Y # Addition
print('Addition:')
print(Z)

S <- X - Y # Subtraction
print('Subtraction:')
print(S)

M <- X * Y # Multiplication
print('Multiplication:')
print(M)

D <- X / Y # Division
print('Division:')
print(D)
Output:

Addition: 12 11 6 6 53 3
Subtraction: -2 -7 4 -4 49 1
Multiplication: 35 18 5 5 102 2
Division: 0.7142857 0.2222222 5.0000000 0.2000000 25.5000000 2.0000000
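The examples above use vectors of equal length. When the lengths differ, R repeats (recycles) the shorter vector, which the following sketch illustrates with made-up values.

```r
x <- c(1, 2, 3, 4, 5, 6)

x * 10             # a single value is recycled for every element: 10 20 30 40 50 60
x + c(100, 200)    # the shorter vector repeats: 101 202 103 204 105 206

# When the longer length is not a multiple of the shorter one, R still
# recycles but issues a warning (suppressed here to keep the output clean)
partial <- suppressWarnings(x + c(1, 2, 3, 4))   # 2 4 6 8 6 8
```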

 R Environment

 Environment is a virtual place to store data objects.


 Default Environment of R is R_GlobalEnv.
 Some R commands to work with environment:

 environment(): get the name of the current environment.


 ls( )/objects( ): used to list out all objects created and stored under any environment.
 new.env():is used to create new environment.
 e.g. demo<- new.env()

 Creating data objects in new environment:

 demo$a<-10 #create "a" in environment "demo" with value 10.
 assign("a",10,envir=demo)
 ls(demo) #list out data objects created in specific environment.

 Removing object from environment:

 rm(a,b) # remove objects a and b from default environment.
 rm(list=ls()) #remove all data objects from default environment.
 rm(a, envir=demo) #"a" is name of object to be removed from demo environment.

 Check whether object is present in environment:

 exists("a", envir=demo) # searching object "a" in demo environment.

 Display the value of data object from any environment:

 demo$a or get("a", envir=demo)
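The environment commands above can be run as one short session. In this sketch the object names are illustrative; inherits = FALSE is a standard argument of exists() that restricts the search to the given environment instead of also looking in its parents.

```r
demo <- new.env()

assign("a", 10, envir = demo)   # same effect as demo$a <- 10
demo$b <- c("x", "y")

sort(ls(demo))                                # "a" "b"
exists("a", envir = demo, inherits = FALSE)   # TRUE
get("a", envir = demo)                        # 10

rm("a", envir = demo)
exists("a", envir = demo, inherits = FALSE)   # FALSE: "a" is gone from demo
```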

 Save and Load objects:

• There are different functions to save R objects in a file on disk.

1. save() function: It creates an .RData file. In these files, we can store several variables.

 save(student, file="demo.RData") #save student object in demo.RData file. student can be any data object such as vector, dataframe, matrix, list, array etc.
 save(list=c("stud", "emp"), file="demo.RData") # save multiple objects.
 The save() function has a compress parameter which is turned on by default, so the resulting file uses less space on disk. For a huge dataset, the file may take longer to load later because R first has to decompress it.
 To save disk space, keep the compress parameter on. To save time, set compress = FALSE.
2. save.image() function: It is used to save the current workspace.

 It is an extended version of save() that collects all the data objects in the workspace and saves them into a single file.
 save.image(file="filename.RData"): The file argument is the name of the file to which the workspace is saved.
 a<- 1:3; b<- 4:5
 assign("colours", c("Blue","White","Orange","Green"))
 save.image(file="demo.RData") #save all data objects in demo.RData file.

3. saveRDS() function: It can only be used to save one object in one file. The file that is saved using
saveRDS( ) can be read in R environment by using readRDS() function.

 saveRDS(book, file = "data.Rds") #save book object in data.Rds file.
 book.copy <- readRDS(file = "data.Rds") # load file into book.copy object.
 The compress parameter is available for saveRDS() also.

 The load() command restores the saved objects into the current (by default, global) environment, for example after they have been deleted.

 load(file="demo.RData") # it loads data objects from file to R environment.
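A full save-and-restore round trip can be sketched with temporary files, so that nothing is left behind on disk. The objects student and marks are hypothetical; tempfile() and unlink() are base functions used here only to keep the example self-contained.

```r
student <- c(name = "Asha", grade = "A")   # hypothetical objects
marks   <- c(78, 91, 64)

rdata_file <- tempfile(fileext = ".RData")
rds_file   <- tempfile(fileext = ".rds")

save(student, marks, file = rdata_file)   # several objects, stored with their names
saveRDS(marks, file = rds_file)           # exactly one object, no name stored

rm(student, marks)
loaded <- load(rdata_file)                # restores the objects under their original names
sort(loaded)                              # "marks" "student"

marks_copy <- readRDS(rds_file)           # the caller chooses the new name
identical(marks, marks_copy)              # TRUE

unlink(c(rdata_file, rds_file))           # clean up the temporary files
```

The design difference is that load() decides the object names for you, while readRDS() returns a value that you assign to any name you like.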

 Working directory in R: It is the default folder on the computer where R reads and stores files.

• When R imports a dataset given by a relative path, it assumes that the file is stored in the working directory.

• There is a single working directory at a time, called the current working directory.

 getwd() is used to display current working directory.
 setwd() is used to change current working directory and set new one.
 e.g. setwd(dir="new path of directory")
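A common pattern is to remember the current working directory before changing it and to restore it afterwards. The sketch below assumes nothing about your folders: it switches to R's session temporary directory, and the file name demo.csv is hypothetical.

```r
old_dir <- getwd()            # remember where we started

setwd(tempdir())              # switch to a directory that is guaranteed to exist
writeLines("id,score", "demo.csv")
created <- file.exists("demo.csv")   # TRUE: relative paths resolve against the working directory

unlink("demo.csv")            # remove the demonstration file
setwd(old_dir)                # restore the original working directory
```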

2.6 Factors

 Factor is a data object that is used to store categorical data.


 It takes only predefined values, called levels.
 Factor components are displayed as character values, but internally they are stored as integers with the levels associated with them.
 They are useful for columns which have a limited number of unique values, like "Male", "Female" or TRUE, FALSE.
 They are useful in data analysis for statistical modeling.
 Factors are created using the factor() function by taking a vector as input.
x <-c("female", "male", "male", "female") # Creating a vector
print(x)

gender <-factor(x) # Converting the vector x into a factor named gender
print(gender)

Output :

[1] "female" "male" "male" "female"


[1] female male male female
Levels: female male

 Levels can also be predefined by the programmer.


# Creating a factor with levels defined by programmer
gender <- factor(c("female", "male", "male", "female"),
levels = c("female", "transgender", "male"));
gender

Output :

[1] female male male female


Levels: female transgender male

 Checking for a Factor in R


The function is.factor() is used to check whether the variable is a factor and returns “TRUE” if it is a
factor.
gender <- factor(c("female", "male", "male", "female"));
print(is.factor(gender)) #class(gender) o/p: [1] "factor"

Output : [1] TRUE

If you want to modify a factor and add value out of predefined levels, then first modify levels.
gender <- factor(c("female", "male", "male", "female" ));

# add new level


levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
gender

Output :
[1] female male other female
Levels: female male other
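The integer storage behind a factor, and what happens when a value outside the levels is assigned, can be seen in a short sketch. The sizes factor is a made-up example.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)        # "small" "medium" "large"
nlevels(sizes)       # 3
as.integer(sizes)    # 1 3 2 1: the integer codes stored internally
table(sizes)         # counts per level: small 2, medium 1, large 1

# Assigning a value that is not a level produces NA (with a warning)
sizes_bad <- sizes
suppressWarnings(sizes_bad[2] <- "huge")
is.na(sizes_bad[2])  # TRUE
```

This is why a new level has to be added first, as in the example above, before a value outside the existing levels can be stored.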

2.7 Arrays
2.7.1 Operations on Array and elements manipulation
 Arrays are data storage structures defined by a fixed number of dimensions.
 Arrays are the R data objects which can store data in more than two dimensions.
 An array is a multidimensional data structure.
 A 1D array behaves like a vector and a 2D array is a matrix.
 An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to create an array.
 The vectors from which an array is created can be of the same length or of different lengths.
 An array can be created from a single vector, two vectors, or more than two vectors.
Syntax:
array_name <- array(data, dim = c(rowsize, colsize, matrices), dimnames = names)
 "array" is the function used to create an array in R.
 "data" are the input values from which the array is created. It can be either a vector or a number sequence.
 "dim" specifies the dimensions of the array, i.e. the number of rows, the number of columns and the number of matrices in the array.
 "dimnames" is used to specify the names of the dimensions.
1. V1<-c(1,2,3,4,5)
V2<-c(6,7,8,9)
A<-array(c(V1,V2), dim=c(3,3)) # 3x3 array built from the 9 combined values
print(A)
Output:
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

2. B<-array(c(V1,V2), dim=c(3,3,2))
"B" array will be created with dimensions 3 rows, 3 columns and 2 matrices. Only 9 values are supplied for 18 positions, so the values of V1 and V2 are recycled and both 3x3 matrices hold the same values.
Output:
,,1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
,,2
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9

3. C=array(1:4,dim=c(2,2))
print(C) #:2x2 array “C” is created from sequence(1:4)
Output:
[,1] [,2]
[1,] 1 3
[2,] 2 4

 Naming of Arrays

The row names, column names and matrix names are supplied through dimnames as a list of character vectors, one per dimension, whose lengths equal the number of rows, columns and matrices respectively. By default, the rows, columns and matrices are identified by their index values.

row_names <- c("row1", "row2")


col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")

arr = array(2:13, dim = c(2, 3, 2),
dimnames = list(row_names, col_names, mat_names))
print (arr)

Output:

,, Mat1
col1 col2 col3
row1 2 4 6
row2 3 5 7
,, Mat2
col1 col2 col3
row1 8 10 12
row2 9 11 13
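Once an array has been created, its dimensions and names can be inspected and changed. A minimal sketch using the base functions dim(), length() and dimnames(); the replacement names First and Second are illustrative.

```r
arr <- array(2:13, dim = c(2, 3, 2),
             dimnames = list(c("row1", "row2"),
                             c("col1", "col2", "col3"),
                             c("Mat1", "Mat2")))

dim(arr)              # 2 3 2
length(arr)           # 12 elements in total
dimnames(arr)[[2]]    # "col1" "col2" "col3"

# Rename the third dimension in place
dimnames(arr)[[3]] <- c("First", "Second")
arr["row1", "col2", "Second"]   # 10
```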

 Accessing arrays:
 Elements in array can be accessed using row index, column index and matrix index/level.
 Syntax: array_name[row_index, column_index, matrix_level]

Accessing specific rows and columns of matrices


Rows and columns can also be accessed by both names as well as indices.
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
dimnames = list(row_names,
col_names, mat_names))

# accessing matrix 1 by index value


print ("1st column of matrix 1")
print (arr[, 1, 1])

# accessing matrix 2 by its name


print ("2nd row of matrix 2")
print(arr["row2",,"Mat2"])
Output:

[1] "1st column of matrix 1"


row1 row2
1 2
[1] "2nd row of matrix 2"
col1 col2 col3
8 10 12

The examples below use the 3x3 matrix A and the 3x3x2 array B created earlier.

 print(A[2,3]) :prints element at 2nd row and 3rd column of the matrix A.
 print(B[2,3,1]) :prints element at 2nd row and 3rd column from 1st matrix.
 print(B[2,3, ]) :prints element at 2nd row and 3rd column of both matrices.
 print(B[ , ,1 ]) :prints first matrix.
 print(B[ 1, , ]) :prints first row of both matrices.
 print(B[ ,2 ,1 ]) :prints 2nd column of matrix 1.
 print(B[ c(1,2), ,1 ]) :prints 1st & 2nd rows of matrix 1.
 print(B[ ,c(2,3), ]) :prints 2nd & 3rd columns of both matrices.
 print(B[ c(1 ,2),c(2,3), 2]) :prints the values of 2nd & 3rd columns for 1st & 2nd rows of matrix 2.
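The same index notation is used to change elements, not only to read them. The sketch below builds its own 3x3x2 array from 1:18 (an assumption made so that every position has a distinct value) and then modifies it.

```r
B <- array(1:18, dim = c(3, 3, 2))

single <- B[2, 3, 1]   # 8: row 2, column 3 of the first matrix
both   <- B[2, 3, ]    # 8 17: the same position in both matrices
dim(B[, , 1])          # 3 3: one matrix extracted from the array

# Element manipulation: assign through the same index notation
B[2, 3, 1] <- 0L
B[, 1, 2]  <- c(100L, 200L, 300L)
B[B > 100] <- NA       # a logical index works on arrays too
sum(is.na(B))          # 2
```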

 Array Arithmetic:

In R, the basic operations of addition, subtraction, multiplication, and division work element-wise on arrays, so the two arrays must have the same dimensions.

True matrix multiplication uses the %*% operator instead; for it, the number of columns of the first matrix must equal the number of rows of the second matrix.
 test_arr1 <-array(c(1:8),c(2,2,2))

test_arr2 <- array(c(9:16),c(2,2,2))

test_arr1

test_arr2

Output:

, , 1

[,1] [,2]

[1,] 1 3

[2,] 2 4

, , 2

[,1] [,2]

[1,] 5 7

[2,] 6 8

, , 1

[,1] [,2]

[1,] 9 11

[2,] 10 12

, , 2

[,1] [,2]

[1,] 13 15

[2,] 14 16

1. Addition:
 arr_add <- test_arr1+test_arr2
arr_add
Output:

, , 1

[,1] [,2]
[1,] 10 14
[2,] 12 16

, , 2

[,1] [,2]
[1,] 18 22
[2,] 20 24

2. Subtraction:
 arr_sub <- test_arr2-test_arr1
arr_sub

Output:

, , 1

[,1] [,2]
[1,] 8 8
[2,] 8 8

, , 2

[,1] [,2]
[1,] 8 8
[2,] 8 8

3. Multiplication:
 arr_mul <- test_arr1*test_arr2
arr_mul

Output:

, , 1

[,1] [,2]
[1,] 9 33
[2,] 20 48

, , 2

[,1] [,2]
[1,] 65 105
[2,] 84 128

4. Division:
 arr_div <- test_arr2/test_arr1
arr_div

Output:

, , 1

[,1] [,2]

[1,] 9 3.666667
[2,] 5 3.000000

, , 2

[,1] [,2]

[1,] 2.600000 2.142857

[2,] 2.333333 2.000000

 test_arr1[, , 1]+test_arr1[, , 2] #:add 1st and 2nd matrix of array

Output:

[,1] [,2]

[1,] 6 10

[2,] 8 12

 test_arr1[1, , 1]+test_arr1[1, , 2] #add 1st row of both matrix in array.

Output:

[1] 6 10

 sum(test_arr1) #add all values of 1st array

Output:

[1] 36

 min(test_arr1) #finds minimum value in an array

Output:

[1] 1

 max(test_arr1) #finds maximum value in an array

Output:

[1] 8
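Summaries along one dimension of an array, and true matrix multiplication of two slices, can be sketched with the base function apply() and the %*% operator. The array is the same test_arr1 used above, recreated so the example runs on its own.

```r
test_arr1 <- array(1:8, c(2, 2, 2))

per_matrix <- apply(test_arr1, 3, sum)   # total of each matrix: 10 26
per_row    <- apply(test_arr1, 1, sum)   # total of each row across both matrices: 16 20
apply(test_arr1, c(1, 2), sum)           # element-wise sum of the two matrices

# Matrix product of the two slices (not element-wise)
product <- test_arr1[, , 1] %*% test_arr1[, , 2]
product                                  # rows: 23 31 and 34 46
```

The second argument of apply() names the dimension(s) to keep: 1 for rows, 2 for columns, 3 for matrices.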
