Unit 2 R

The document provides an overview of various data structures in R, including arrays, factors, tibbles, and indexing methods for vectors, matrices, data frames, and lists. It also covers slicing techniques for data extraction and string operations using the stringr package. Each section includes examples of how to create and manipulate these data structures and perform operations on strings.

Uploaded by

swapnajakka3

Arrays

Arrays are R data objects that can store data in more than two
dimensions; they are n-dimensional data structures. For example, an array
with dimensions (2, 3, 3) consists of 3 rectangular matrices, each
with 2 rows and 3 columns. Arrays are homogeneous: every element has the
same data type.
How to create arrays in R
A <- array(c(1, 2, 3, 4, 5, 6, 7, 8),  # Sequence of elements to store
           dim = c(2, 2, 2))           # Two rectangular matrices,
                                       # each with two rows and two columns
print(A)
dim(A)  # Check the dimensions of the array
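
The (2, 3, 3) case mentioned in the description above can be sketched as follows. The name B and the values 1 to 18 are assumptions chosen for illustration; R fills an array column by column, one matrix after another.

```r
# Hypothetical data: the integers 1 to 18, filled column by column
B <- array(1:18, dim = c(2, 3, 3))

print(dim(B))      # 2 3 3
print(B[, , 1])    # the first 2 x 3 matrix (values 1 to 6)
print(B[2, 3, 1])  # row 2, column 3 of the first matrix: 6
print(B[1, 1, 3])  # row 1, column 1 of the third matrix: 13
```

Leaving an index position empty, as in B[, , 1], selects everything along that dimension.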

Factors
Factors are data objects used to categorize data and store it as levels.
They are useful for storing categorical data and can be built from both
strings and integers. They suit columns with a small set of unique values,
such as "TRUE" or "FALSE", or "MALE" or "FEMALE". They are useful in data
analysis for statistical modelling.
How to create factors in R
fact <- factor(c("Male", "Female", "Male",
                 "Male", "Female", "Male", "Female"))
print(fact)
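
As a minimal sketch of what "stored as levels" means, the snippet below rebuilds the same factor and inspects it with base R functions. By default the levels are the sorted unique values, and each element is stored internally as an integer code pointing at a level.

```r
fact <- factor(c("Male", "Female", "Male",
                 "Male", "Female", "Male", "Female"))

print(levels(fact))      # "Female" "Male" (sorted unique values)
print(nlevels(fact))     # 2
print(as.integer(fact))  # 2 1 2 2 1 2 1 (internal codes)
print(table(fact))       # counts per level: Female 3, Male 4
```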

Tibbles
Tibbles are an enhanced version of data frames in R. They are part of the
tidyverse (a collection of open-source packages for the R programming language).
How to create a tibble in R
install.packages("tibble")  # Install the package (only needed once)
library(tibble)             # Load the tibble package
data <- tibble(name = c("Sandeep", "Amit", "Aman"),
               age = c(25, 30, 35),
               city = c("Pune", "Jaipur", "Delhi"))
print(data)

Indexing data structures


In R, indexing is used to access specific elements of data structures like vectors,
matrices, data frames, and lists.
1. Vectors
Vectors are the simplest data structure in R. You can index a vector by
position, logical condition, or using names.
• By position:
a <- c(10, 20, 30, 40)
a[1] # Accesses the first element
a[2:3] # Accesses the second and third element

• By logical condition:
a[a > 20] # Accesses elements greater than 20

• By name:
v <- c(a = 10, b = 20, c = 30)
v["b"] # Accesses the element named “b”
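
The three indexing styles can select the same elements, as this minimal sketch shows. The vector scores and its values are hypothetical; the negative index, which excludes a position, is a base R feature that the notes above do not cover.

```r
# Hypothetical named vector of marks
scores <- c(math = 72, physics = 55, chemistry = 90)

by_position <- scores[c(1, 3)]                 # positions 1 and 3
by_exclusion <- scores[-2]                     # everything except position 2
by_condition <- scores[scores >= 60]           # elements that are at least 60
by_name <- scores[c("math", "chemistry")]      # elements picked by name

print(by_position)
print(identical(by_position, by_exclusion))    # TRUE
print(identical(by_position, by_condition))    # TRUE
print(identical(by_position, by_name))         # TRUE
```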

2. Matrices
Matrices are two-dimensional arrays. Indexing can be done by specifying row
and column indices.

• By position (row, column):
m <- matrix(1:9, nrow = 3, ncol = 3)
m[1, 2] # Accesses element in the first row and second column

• By row or column:
m[1, ] # Accesses the entire first row
m[, 2] # Accesses the entire second column

• By logical indexing:
m[m > 5] # Accesses elements greater than 5
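
A short sketch of what these matrix indexing forms return, using the same matrix m. The drop = FALSE argument is a base R option not mentioned above; it keeps the result as a matrix instead of simplifying it to a vector.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)

print(m[m > 5])              # 6 7 8 9 (a plain vector, in column order)
print(m[1, ])                # 1 4 7 (first row, simplified to a vector)
print(m[1, , drop = FALSE])  # first row kept as a 1 x 3 matrix
print(m[2:3, c(1, 3)])       # rows 2 and 3 of columns 1 and 3
```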

3. Data Frames
Data frames are like matrices, but each column can contain different
types of data. You can index data frames by row, column, or column
names.
• By row and column:
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
df[1, 2] # Accesses element in the first row, second column

• By column name:
df[["b"]] # Accesses the column "b"
df$b # Another way to access column “b”
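
The sketch below contrasts single and double brackets on the same data frame. Single brackets with a column name return a one-column data frame, while [[ ]] and $ return the column itself. The stringsAsFactors = FALSE argument is added here as an assumption so that the result is the same on older R versions.

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

print(class(df["b"]))     # "data.frame" (a one-column data frame)
print(class(df[["b"]]))   # "character" (the column itself)
print(df[df$a > 1, "b"])  # "y" "z" (rows filtered by a, column picked by name)
```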

4. Lists
Lists are versatile structures that can hold different types of data.

• By position:
lst <- list(a = 1, b = "hello", c = TRUE)
lst[[1]] # Accesses the first element

• By name:
lst[["b"]] # Accesses the element named "b"

• By $ operator:
lst$b # Another way to access element "b"
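
One point worth illustrating is the difference between [ ] and [[ ]] on a list. In this minimal sketch, single brackets return a smaller list, while double brackets return the element itself.

```r
lst <- list(a = 1, b = "hello", c = TRUE)

print(class(lst["b"]))       # "list" (a list of length 1)
print(class(lst[["b"]]))     # "character" (the element itself)
print(names(lst[c("a", "c")]))  # "a" "c" (a sub-list with two elements)
```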

Slicing In R
Slicing is the process of extracting or accessing particular subsets or
sections of a vector, matrix, or data frame according to given criteria.
Using start and end indices, we can select sequential ranges of elements
or subsets of data.

This can be done in three ways, listed below.


• Slicing with [ , ]
dataframeName[ fromRow : toRow , columnNumber]

stats <- data.frame(player = c('A', 'B', 'C', 'D'),  # create a data frame
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))
print("stats Dataframe")
stats

stats[2:3, c(1, 2)] # fetch rows 2 and 3 of columns 1 and 2

cat("players - ", stats[1:3, 1]) # fetch rows 1 to 3 of the 1st column
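
Columns can also be selected by name inside [ , ]. The sketch below rebuilds the same stats data frame so that it runs on its own; the variable names by_name, col_vec and col_df are hypothetical. Note that selecting a single column returns a vector unless drop = FALSE is given.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

by_name <- stats[2:3, c("player", "runs")]  # rows 2 and 3, columns picked by name
col_vec <- stats[1:3, 1]                    # one column comes back as a vector
col_df <- stats[1:3, 1, drop = FALSE]       # drop = FALSE keeps a data frame

print(by_name)
print(is.data.frame(col_vec))  # FALSE
print(is.data.frame(col_df))   # TRUE
```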

• Slicing with logical vectors
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 23),
wickets=c(17, 20, 3, 5))

print("stats Dataframe")
stats

# fetch details of players who scored
# more than 100 runs
batsmens<-stats[stats$runs>100,]
batsmens
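
Logical slicing needs some care when the column contains NA, as in the first stats data frame of this section. The following sketch, with hypothetical variable names, shows that a comparison with NA yields NA and produces an extra row of NA values, and two common ways to avoid it.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

with_na <- stats[stats$runs > 100, ]                        # NA > 100 is NA, so an all-NA row appears
no_na <- stats[!is.na(stats$runs) & stats$runs > 100, ]     # exclude NA explicitly
via_which <- stats[which(stats$runs > 100), ]               # which() keeps only TRUE positions

print(nrow(with_na))     # 3
print(no_na$player)      # "B" "C"
print(via_which$player)  # "B" "C"
```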

• Slicing with subset()
subset( x = dataframe, subset = filter_logic, select=c(columnNames))

# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 23),
wickets=c(17, 20, 3, 5))

print("stats Dataframe")
stats

# fetch details of players who took
# more than 5 wickets
subset(x=stats, subset=wickets>5, select=c(player,wickets))
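
As a further sketch, subset() also accepts combined conditions and a negative select to leave columns out. The condition and the name result are assumptions chosen for illustration; the data frame is the same as above.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(100, 200, 408, 23),
                    wickets = c(17, 20, 3, 5))

# players with more than 5 wickets AND more than 100 runs, without the wickets column
result <- subset(stats, subset = wickets > 5 & runs > 100, select = -wickets)
print(result)  # only player B qualifies (player A has exactly 100 runs)
```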

String operations using stringr package

The stringr package in R provides easy-to-use functions for handling and
manipulating strings.
Here are some common string operations using stringr:

install.packages("stringr") # If you haven't installed stringr
library(stringr) # Load it into your session

Concatenating Strings
str_c("Hello", "World", sep = " ") #str_c() is used to concatenate strings
str_c("Player", 1:3) # vectorised: returns "Player1" "Player2" "Player3"

Extracting Substrings
#str_sub() extracts substrings from a given string
str_sub("Hello World", 1, 5)
str_sub("Hello World", -5, -1)

Changing Case
#str_to_upper() converts to uppercase.
str_to_upper("hello")

#str_to_lower() converts to lowercase
str_to_lower("HELLO")

#str_to_title() converts to title case (capitalizing the first letter of each word)
str_to_title("hello world")

String Length
#str_length() returns the number of characters in a string.
str_length("Hello Students")

Detecting Patterns in Strings
#str_detect() checks if a pattern exists in a string
str_detect("Hello World", "World")
str_detect("Hello World", "Earth")

Extracting Patterns
#str_extract() extracts the first occurrence of a pattern from a string
str_extract("abc123xyz", "\\d+") # "\\d+" matches one or more consecutive digits

#str_extract_all() extracts all occurrences of the pattern
str_extract_all("abc123xyz456", "\\d+")
str_extract_all("abc123xyz456", "[a-zA-Z]+")
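
The two extraction functions return different shapes, which the sketch below illustrates on a hypothetical input vector x. It assumes the stringr package is installed. str_extract() returns one value per input string (NA when nothing matches), while str_extract_all() returns a list with one character vector per input string.

```r
library(stringr)

x <- c("abc123xyz456", "no digits")  # hypothetical input

first_match <- str_extract(x, "\\d+")      # one value per string
all_matches <- str_extract_all(x, "\\d+")  # a list, one element per string

print(first_match)       # "123" NA
print(all_matches[[1]])  # "123" "456"
print(all_matches[[2]])  # character(0), because there is no match
```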

Replacing Patterns
#str_replace() replaces the first occurrence of a pattern
str_replace("The cat and the cat", "cat", "dog")

#str_replace_all() replaces all occurrences
str_replace_all("The cat and the cat", "cat", "dog")

Trimming Whitespace
#str_trim() removes leading and trailing whitespace
str_trim(" Hello World ")

#str_squish() also collapses repeated whitespace between words into single spaces
str_squish("Hello    World")

Splitting Strings
#str_split() splits a string on a delimiter and returns a list of character vectors
str_split("a,b,c", ",")

Counting Matches
# str_count() counts the number of occurrences of a pattern
str_count("ab123cd456", "\\d")
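
The functions above are vectorised and can be combined. As a minimal sketch, assuming stringr is installed, the snippet below cleans a hypothetical vector of untidy names; the names raw and clean are chosen for illustration.

```r
library(stringr)

raw <- c("  sandeep   kumar ", "AMIT  shah", "aman")  # hypothetical untidy input

# squish whitespace first, then convert to title case
clean <- str_to_title(str_squish(raw))

print(clean)                       # "Sandeep Kumar" "Amit Shah" "Aman"
print(str_length(clean))           # 13 9 4
print(str_detect(clean, "Kumar"))  # TRUE FALSE FALSE
```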
