Data Structure in

The document discusses different data structures in R including vectors, lists, matrices and arrays. Vectors are the simplest structure, allowing elements of the same type. Lists can contain elements of different types. Matrices are two-dimensional structures with rows and columns. Arrays allow multiple dimensions.

Uploaded by

tapstaps902
Copyright
© © All Rights Reserved
Data Structures in R :-

Vectors: creating, indexing, operations

Creating Vectors:-
A vector is an ordered collection of items that are all of the same type.
To combine items into a vector, use the c() function and separate
the items with commas.
In the example below, we create a vector variable called fruits that
combines strings:

Example
# Vector of strings
fruits <- c("banana", "apple", "orange")

# Print fruits
fruits
Example
# R program to illustrate Vector

# Vectors(ordered collection of same data type)


X = c(1, 3, 5, 7, 8)
# Printing those elements in console
print(X)

In this example, we create a vector that combines numerical values:

Example
# Vector of numerical values
numbers <- c(1, 2, 3)

# Print numbers
numbers

To create a vector with numerical values in a sequence, use the : operator:

Example
# Vector with numerical values in a sequence
numbers <- 1:10

numbers
You can also create numerical values with decimals in a sequence, but
note that if the last element does not belong to the sequence, it is not used:

Example

# Vector with numerical decimals in a sequence


numbers1 <- 1.5:6.5
numbers1

# Vector with numerical decimals in a sequence where the last element is not used
numbers2 <- 1.5:6.3
numbers2

Result:

[1] 1.5 2.5 3.5 4.5 5.5 6.5


[1] 1.5 2.5 3.5 4.5 5.5

Vector Length

To find out how many items a vector has, use the length() function:

Example
fruits <- c("banana", "apple", "orange")

length(fruits)

Sort a Vector
To sort items in a vector alphabetically or numerically, use
the sort() function:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")


numbers <- c(13, 3, 5, 7, 20, 2)

sort(fruits) # Sort a string


sort(numbers) # Sort numbers
Access Vectors
You can access the vector items by referring to its index number inside
brackets []. The first item has index 1, the second item has index 2, and so
on:

Example

fruits <- c("banana", "apple", "orange")

# Access the first item (banana)


fruits[1]

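IMP :- Elements of a vector are accessed using indexing. The [ ] brackets
are used for indexing. Indexing starts at position 1. A negative index
drops that element from the result. A logical vector (TRUE/FALSE) can
also be used for indexing: elements at TRUE positions are kept. Numeric 0
and 1 are treated as positions, not as a mask: 0 selects nothing and 1
selects the first element.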

Change an Item
To change the value of a specific item, refer to the index number:

Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Change "banana" to "pear"


fruits[1] <- "pear"

# Print fruits
fruits

Repeat Vectors

To repeat vectors, use the rep() function:


Example
Repeat each value:
repeat_each <- rep(c(1,2,3), each = 3)

repeat_each

Example
Repeat the sequence of the vector:
repeat_times <- rep(c(1,2,3), times = 3)

repeat_times
Example

Repeat each value independently:


repeat_independent <- rep(c(1,2,3), times = c(5,2,1))

repeat_independent

Generating Sequenced Vectors


An earlier example showed how to create a vector with numerical values
in a sequence using the : operator:
Example
numbers <- 1:10

numbers
To make bigger or smaller steps in a sequence, use the seq() function:
Example
numbers <- seq(from = 0, to = 100, by = 20)

numbers
Note: The seq() call above uses three arguments: from is where the
sequence starts, to is where the sequence stops, and by is the step
between consecutive values.
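
As a small sketch beyond the example above, seq() can also be told how many
values to produce instead of the step size, through its length.out argument
(a standard seq() argument that the text above does not cover). The variable
names evenly_spaced and countdown are only illustrative:

```r
# Five evenly spaced values from 0 to 1 (seq() works out the step itself)
evenly_spaced <- seq(from = 0, to = 1, length.out = 5)
evenly_spaced

# A descending sequence needs a negative step
countdown <- seq(from = 10, to = 0, by = -5)
countdown
```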

Example

# Accessing vector elements using position.


t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
# Accessing vector elements using 0/1 indexing (0 is ignored, 1 is the first position).
y <- t[c(0,0,0,0,0,0,1)]
print(y)

When we execute the above code, it produces the following result −


[1] "Mon" "Tue" "Fri"
[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"
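
The last result is "Sun" rather than "Sat" because numeric indices are
positions: zeros select nothing and 1 selects the first element. A minimal
sketch of how a 0/1 vector could be used as a mask instead, by converting it
to logical first (the names days, mask, by_position and by_mask are
illustrative, not from the original):

```r
days <- c("Sun", "Mon", "Tue", "Wed", "Thurs", "Fri", "Sat")
mask <- c(0, 0, 0, 0, 0, 0, 1)

# Numeric indexing: zeros are dropped, 1 picks the first element
by_position <- days[mask]

# Logical indexing: as.logical() turns 0/1 into FALSE/TRUE
by_mask <- days[as.logical(mask)]

print(by_position)
print(by_mask)
```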
R - List :-

Lists are R objects that can contain elements of different types, such as
numbers, strings, vectors and even other lists. A list can also contain
a matrix or a function as an element. A list is an ordered, changeable
collection of data.

To create a list, use the list() function:

# Create a list containing strings, numbers, a vector and a logical value.


list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

When we execute the above code, it produces the following result −

[[1]]
[1] "Red"

[[2]]
[1] "Green"

[[3]]
[1] 21 32 11

[[4]]
[1] TRUE

[[5]]
[1] 51.23

[[6]]
[1] 119.1
Matrices

A matrix is a two-dimensional data set with columns and rows.

A column is a vertical representation of data, while a row is a horizontal
representation of data.

A matrix can be created with the matrix() function. Specify the nrow and
ncol parameters to set the number of rows and columns.
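
The original gives no example for this step, so here is a minimal sketch of
matrix() with nrow and ncol. The variable names thismatrix and byrow_matrix
are illustrative, and byrow is a standard matrix() argument that the text
above does not mention:

```r
# Fill a 3 x 2 matrix column by column (the default)
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
thismatrix

# byrow = TRUE fills the matrix row by row instead
byrow_matrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
byrow_matrix

# dim() returns the number of rows and columns
dim(thismatrix)
```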

Lists Examples :-
Example

# List of strings
thislist <- list("apple", "banana", "cherry")

# Print the list


thislist

Access Lists
You can access list items by referring to their index number inside
brackets. The first item has index 1, the second item has index 2, and so
on. Single brackets [ ] return a list containing the item; double
brackets [[ ]] return the item itself:

Example
thislist <- list("apple", "banana", "cherry")

thislist[1]
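
A short sketch of the difference between the two bracket forms, using
class() to inspect what each one returns (sub_list and first_item are
illustrative names):

```r
thislist <- list("apple", "banana", "cherry")

# Single brackets return a sub-list of length 1
sub_list <- thislist[1]
class(sub_list)

# Double brackets return the element itself
first_item <- thislist[[1]]
class(first_item)
```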

Change Item Value


To change the value of a specific item, refer to the index number:

Example

thislist <- list("apple", "banana", "cherry")


thislist[1] <- "blackcurrant"

# Print the updated list


thislist

List Length

To find out how many items a list has, use the length() function:
Example

thislist <- list("apple", "banana", "cherry")

length(thislist)

Check if Item Exists


To find out if a specified item is present in a list, use the %in% operator:
Example
Check if "apple" is present in the list:
thislist <- list("apple", "banana", "cherry")

"apple" %in% thislist

Output :- [1] TRUE

Add List Items


To add an item to the end of the list, use the append() function. It returns a new list, so assign the result if you want to keep it:

Example

Add "orange" to the list:


thislist <- list("apple", "banana", "cherry")
append(thislist, "orange")

To add an item to the right of a specified index, add "after = index
number" in the append() function:

Example

Add "orange" to the list after "banana" (index 2):

thislist <- list("apple", "banana", "cherry")

append(thislist, "orange", after = 2)
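
The calls above only print the new list. A minimal sketch of keeping the
result, assuming you want thislist itself to be updated:

```r
thislist <- list("apple", "banana", "cherry")

# append() returns a new list; the original is unchanged
append(thislist, "orange")
length(thislist)

# Assign the result to keep the added item
thislist <- append(thislist, "orange", after = 2)
length(thislist)
thislist[[3]]
```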


Range of Indexes

You can specify a range of indexes by specifying where to start and
where to end the range, using the : operator:
Example
Return the second, third, fourth and fifth item:
thislist <- list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
(thislist)[2:5]

Note: The range starts at index 2 (included) and ends at index 5
(included). Remember that the first item has index 1.

Join Two Lists

There are several ways to join, or concatenate, two or more lists in R.

The most common way is to use the c() function, which combines its
arguments into a single list:

Example

list1 <- list("a", "b", "c")


list2 <- list(1,2,3)
list3 <- c(list1,list2)

list3

Arrays: -

Arrays are R data objects that can store data in more than two
dimensions. For example, if we create an array of dimension (2, 3, 4)
then it creates 4 rectangular matrices, each with 2 rows and 3 columns.
Arrays can store only one data type.

An array is created using the array() function. It takes vectors as input
and uses the values in the dim parameter to create an array.

Example

The following example creates an array of two matrices, each with 3
rows and 3 columns.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.


result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)

When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
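
Both matrices are identical because the two vectors together supply only 9
values, while dim = c(3,3,2) needs 18; array() recycles the input to fill
the remaining cells. A short sketch to check this, together with the
(2, 3, 4) case mentioned earlier (the name a is used here only for
illustration):

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
result <- array(c(vector1, vector2), dim = c(3, 3, 2))

# 9 input values, 18 cells: the values are reused for the second matrix
length(c(vector1, vector2))
length(result)
identical(result[, , 1], result[, , 2])

# An array of dimension (2, 3, 4) holds 4 matrices of 2 rows and 3 columns
a <- array(1:24, dim = c(2, 3, 4))
dim(a[, , 1])
```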

Naming Columns and Rows

We can give names to the rows, columns and matrices in the array by
using the dimnames parameter.

# Create two vectors of different lengths.


vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.


result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames =
list(row.names,column.names, matrix.names))
print(result)
When we execute the above code, it produces the following result −

, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Accessing Array Elements

# Create two vectors of different lengths.


vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames =
list(row.names, column.names, matrix.names))

# Print the third row of the second matrix of the array.


print(result[3,,2])

# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])

# Print the 2nd Matrix.


print(result[,,2])

When we execute the above code, it produces the following result −

COL1 COL2 COL3
   3   12   15
[1] 13
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

For Practice :-
print(result[,,2])
print(result[,2,2])
print(result[1,3,1])

Manipulating Array Elements

Because an array is made up of matrices stacked along additional
dimensions, operations on array elements are carried out by accessing
elements of the matrices.

# Create two vectors of different lengths.


vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))
print(array1)

# Create two vectors of different lengths.


vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
array2 <- array(c(vector3,vector4),dim = c(3,3,2))
print(array2)

# create matrices from these arrays.


matrix1 <- array1[,,2]
print(matrix1)
matrix2 <- array2[,,2]
print(matrix2)

# Add the matrices.


result <- matrix1+matrix2
print(result)
When we execute the above code, it produces the following result −

> matrix1 <- array1[,,2]
> print(matrix1)
     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

> matrix2 <- array2[,,2]
> print(matrix2)
     [,1] [,2] [,3]
[1,]    2    9    6
[2,]    6    1    0
[3,]    9    0   11

> result <- matrix1+matrix2
> print(result)
     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26

Factors in R: -

Factors are data objects used to categorize data and store it as levels.
They can store both strings and integers. They are useful for columns
that have a limited number of unique values, such as "Male"/"Female" or
TRUE/FALSE. They are useful in data analysis for statistical modelling.

Factors are created using the factor() function by taking a vector as
input.

Factors are used to categorize data. Examples of factors are:

• Demography: Male/Female
• Music: Rock, Pop, Classic, Jazz
• Training: Strength, Stamina

Example: -

# R program to illustrate factors


# Creating factor using factor()
fac = factor(c("Male", "Female", "Male",
"Male", "Female", "Male", "Female"))
print(fac)
Output:
[1] Male Female Male Male Female Male Female
Levels: Female Male

To only print the levels, use the levels() function:


# R program to illustrate factors
# Creating factor using factor()
fac = factor(c("Male", "Female", "Male",
"Male", "Female", "Male", "Female"))
levels(fac)

Output:-
[1] "Female" "Male"
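
As a small sketch of how the levels are stored, the base R functions
nlevels(), table() and as.integer() can be applied to the same factor.
These functions are not part of the original examples:

```r
fac <- factor(c("Male", "Female", "Male",
                "Male", "Female", "Male", "Female"))

# Number of distinct levels
nlevels(fac)

# Count of observations per level
table(fac)

# Underlying integer codes (1 = "Female", 2 = "Male", in alphabetical order)
as.integer(fac)
```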
R Data Frames

Data Frames are data displayed in a table format.

Data Frames can hold different types of data. While the first
column can be character, the second and third can be numeric or logical.
However, all values within a single column must have the same type.

Following are the characteristics of a data frame.

• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or
character type.
• Each column should contain the same number of data items.

Use the data.frame() function to create a data frame:

Example
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Print the data frame
Data_Frame

Summarize the Data

Use the summary() function to summarize the data from a Data Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

summary(Data_Frame)
Access Items

We can use single brackets [ ], double brackets [[ ]] or $ to access
columns from a data frame. Single brackets return a data frame; double
brackets and $ return the column as a vector:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame[1]

Data_Frame[["Training"]]

Data_Frame$Training
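
A minimal sketch that checks what each access form returns, using class()
and a numeric column so that the extracted vector can be used directly in a
calculation:

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Single brackets keep the data frame structure
class(Data_Frame[1])

# Double brackets and $ return the column as a vector
class(Data_Frame[["Pulse"]])
mean(Data_Frame$Pulse)
```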

Add Rows

Use the rbind() function to add new rows in a Data Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

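# Add a new row (passed as a one-row data frame so that Pulse and Duration stay numeric)
New_row_DF <- rbind(Data_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))

# Print the data frame with the new row
New_row_DF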
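
How the new row is supplied matters for the column types. The sketch below
compares a row built with c(), which mixes text and numbers, against a
one-row data frame (from_vector, new_row and from_frame are illustrative
names):

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# c() mixes text and numbers, so every value becomes a character string
from_vector <- rbind(Data_Frame, c("Strength", 110, 110))
class(from_vector$Pulse)

# A one-row data frame keeps Pulse and Duration numeric
new_row <- data.frame(Training = "Strength", Pulse = 110, Duration = 110)
from_frame <- rbind(Data_Frame, new_row)
class(from_frame$Pulse)
```

With the first form, numeric summaries such as mean(from_vector$Pulse) no
longer work as expected, which is why the second form is usually safer.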
Add Columns

Use the cbind() function to add new columns in a Data Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Add a new column
New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))
# Print the data frame with the new column
New_col_DF
Remove Rows and Columns

Use negative indexes inside brackets, as in
dataframe[-c(row_number), -c(column_number)], to remove rows and
columns from a Data Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Remove the first row and column


Data_Frame_New <- Data_Frame[-c(1), -c(1)]

# Print the new data frame


Data_Frame_New

Example for practice :-


# create a dataframe
data=data.frame(name=c("manoj","manoja","manoji","mano","manooj"),
age=c(21,23,21,10,22))
# display by removing 4th row
print(data[-c(4), ])
# display by removing 5th row
print(data[-c(5), ])
# display by removing 1st row
print(data[-c(1), ])
Amount of Rows and Columns

Use the dim() function to find the number of rows and columns in a Data
Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

dim(Data_Frame)

You can also use the ncol() function to find the number of columns
and nrow() to find the number of rows:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

ncol(Data_Frame)
nrow(Data_Frame)
Data Frame Length

Use the length() function to find the number of columns in a Data Frame
(a data frame is a list of columns, so its length is the column count):

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

length(Data_Frame)
Combining Data Frames

Use the rbind() function to combine two or more data frames in R
vertically:

Example
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame2 <- data.frame (


Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)

New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)


New_Data_Frame

And use the cbind() function to combine two or more data frames in R
horizontally:

Example
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame4 <- data.frame (


Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)

New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)


New_Data_Frame1
