Unit 2 R
Arrays are R data objects that can store data in more than two
dimensions; they are n-dimensional data structures. For example, an array
with dimensions (2, 3, 3) consists of 3 rectangular matrices, each
with 2 rows and 3 columns. Arrays are homogeneous: every element has the same type.
How to create arrays in R
A <- array(c(1, 2, 3, 4, 5, 6, 7, 8), # Sequence of elements used to fill the array
           dim = c(2, 2, 2))          # Two matrices, each with two rows and two columns
print(A)
dim(A) # Check the dimensions of the array
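The (2, 3, 3) case described in the text can be checked the same way. The following is a minimal sketch; the name B and the fill values 1 to 18 are chosen only for illustration.

```r
# Fill a 2 x 3 x 3 array with the integers 1 to 18 (R fills column by column).
B <- array(1:18, dim = c(2, 3, 3))

dim(B)                   # 2 3 3

# Each slice along the third dimension is a 2 x 3 matrix.
first_slice <- B[, , 1]
dim(first_slice)         # 2 3

# A single element: row 2, column 3 of the third matrix.
B[2, 3, 3]               # 18
```

Leaving an index position empty, as in B[, , 1], selects everything along that dimension.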
Factors
Factors are data objects used to categorize data and store it as levels.
They are useful for storing categorical data and can be built from both
strings and integers. They suit columns with a small set of unique values, such as
“TRUE” or “FALSE”, or “MALE” or “FEMALE”. They are widely used in data analysis
and statistical modelling.
How to create factors in R
fact = factor(c("Male", "Female", "Male",
"Male", "Female", "Male", "Female"))
print(fact)
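To see how the levels are stored, the factor can be inspected with base R functions. This is a small sketch that rebuilds the same factor as above so it runs on its own.

```r
fact <- factor(c("Male", "Female", "Male",
                 "Male", "Female", "Male", "Female"))

levels(fact)      # "Female" "Male" (sorted alphabetically by default)
as.integer(fact)  # 2 1 2 2 1 2 1  (position of each value within the levels)
table(fact)       # Counts per level: Female 3, Male 4
```

Each value is stored as an integer code that points into the vector of levels, which is why a factor can represent repeated categories compactly.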
Tibbles
Tibbles are an enhanced version of data frames in R. They are part of the tidyverse
(a collection of open source packages for the R programming language).
How to create a tibble in R
install.packages("tibble") # First install the package (only needed once)
library(tibble) # Load the tibble package
data <- tibble(name = c("Sandeep", "Amit", "Aman"),
               age = c(25, 30, 35),
               city = c("Pune", "Jaipur", "Delhi"))
print(data)
1. Vectors
By logical condition:
a <- c(10, 25, 30, 15) # Example vector (illustrative values)
a[a > 20] # Accesses elements greater than 20
By name:
v <- c(a = 10, b = 20, c = 30)
v["b"] # Accesses the element named "b"
2. Matrices
Matrices are two-dimensional arrays. Indexing can be done by specifying row
and column indices.
By row or column:
m <- matrix(1:9, nrow = 3) # Example 3 x 3 matrix (illustrative values)
m[1, ] # Accesses the entire first row
m[, 2] # Accesses the entire second column
By logical indexing:
m[m > 5] # Accesses elements greater than 5
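The matrix indexing forms above can be tried together on a small matrix. This is a minimal sketch; the 3 x 3 matrix of the values 1 to 9 is an assumption made for illustration.

```r
m <- matrix(1:9, nrow = 3)  # Filled column by column: columns are 1:3, 4:6, 7:9

m[2, 3]      # Single element in row 2, column 3: 8
m[1, ]       # First row: 1 4 7
m[, 2]       # Second column: 4 5 6
m[m > 5]     # Logical indexing returns a plain vector: 6 7 8 9
m[1:2, 2:3]  # Rows 1 to 2 and columns 2 to 3 as a 2 x 2 submatrix
```

Note that logical indexing drops the matrix shape and returns the matching elements as a vector.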
3. Data Frames
Data frames are like matrices, but different columns can hold different
types of data. You can index data frames by row, by column position, or by
column name.
By row and column:
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
df[1, 2] # Accesses element in the first row, second column
By column name:
df[["b"]] # Accesses the column "b"
df$b # Another way to access column "b"
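The sketch below puts these access styles side by side and adds row filtering by a condition. It reuses the same df; the filter condition a > 1 is chosen only for illustration.

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

df[1, 2]        # First row, second column: "x" (a character value in R 4.0 or later)
df[["b"]]       # Column "b" as a vector
df["b"]         # Single brackets keep the result as a one-column data frame
df[df$a > 1, ]  # Rows where column a is greater than 1 (rows 2 and 3)
```

Double brackets and $ return the column itself, while single brackets return a smaller data frame.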
4. Lists
Lists are versatile structures that can hold different types of data.
By position:
lst <- list(a = 1, b = "hello", c = TRUE)
lst[[1]] # Accesses the first element
By name:
lst[["b"]] # Accesses the element named "b"
By $ operator:
lst$b # Another way to access element "b"
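A common source of confusion is the difference between single and double brackets on a list. The following sketch, using the same lst as above, shows what each form returns.

```r
lst <- list(a = 1, b = "hello", c = TRUE)

class(lst[1])      # "list": single brackets return a sub-list
class(lst[[1]])    # "numeric": double brackets return the element itself
lst[c("a", "c")]   # Several elements at once, still a list
lst$b              # "hello"
```

Use double brackets or $ when the value itself is needed, and single brackets when a smaller list is wanted.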
Slicing In R
Slicing is the process of extracting particular subsets or sections of a
vector, matrix, or data frame based on given criteria. Using start and end
indices, we can select sequential ranges of elements or subsets of
data.
print("stats Dataframe")
stats
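The stats data frame printed above is not defined in the text, so the sketch below builds a small hypothetical stand-in (the column names and values are invented for illustration) and slices it with start:end ranges.

```r
# Slicing a vector with a start:end range
v <- c(5, 10, 15, 20, 25, 30)
v[2:4]      # Elements 2 to 4: 10 15 20
v[-(1:2)]   # Everything except the first two: 15 20 25 30

# Hypothetical stand-in for the stats data frame (invented values)
stats <- data.frame(player  = c("A", "B", "C", "D"),
                    runs    = c(45, 72, 18, 60),
                    wickets = c(1, 0, 3, 2))

stats[2:3, ]                      # Rows 2 to 3, all columns
stats[, 1:2]                      # All rows, columns 1 to 2
stats[1:2, c("player", "runs")]   # Rows 1 to 2 of two named columns
```

The same start:end syntax works for rows and columns, with the comma separating the row selection from the column selection.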
Concatenating Strings
library(stringr) # Load stringr, which provides the str_*() functions used below
str_c("Hello", "World", sep = " ") # str_c() concatenates strings
str_c("Player", 1:3) # Vectorised: joins "Player" with each of 1, 2 and 3
Extracting Substrings
#str_sub() extracts substrings from a given string
str_sub("Hello World", 1, 5) # Characters 1 to 5: "Hello"
str_sub("Hello World", -5, -1) # Negative positions count from the end: "World"
Changing Case
#str_to_upper() converts to uppercase.
str_to_upper("hello")
String Length
#str_length() returns the number of characters in a string.
str_length("Hello Students")
Extracting Patterns
#str_extract() extracts the first occurrence of a pattern from a string
str_extract("abc123xyz", "\\d+") # "\\d+" matches one or more consecutive digits
Replacing Patterns
#str_replace() replaces the first occurrence of a pattern
str_replace("The cat and the cat", "cat", "dog")
Trimming Whitespace
#str_trim() removes leading and trailing whitespace
str_trim(" Hello World ")
Counting Matches
# str_count() counts the number of occurrences of a pattern
str_count("ab123cd456", "\\d")
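Several of the functions above act only on the first match. The sketch below contrasts them with the _all variants that stringr also provides, assuming the stringr package is installed.

```r
library(stringr)

first_num <- str_extract("ab123cd456", "\\d+")             # "123"
all_nums  <- str_extract_all("ab123cd456", "\\d+")[[1]]    # "123" "456"

first_only <- str_replace("The cat and the cat", "cat", "dog")      # "The dog and the cat"
every_one  <- str_replace_all("The cat and the cat", "cat", "dog")  # "The dog and the dog"

n_digits <- str_count("ab123cd456", "\\d")    # 6 individual digits
n_runs   <- str_count("ab123cd456", "\\d+")   # 2 runs of digits
```

str_extract_all() returns a list with one element per input string, which is why [[1]] is used to pull out the matches for the single input here.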