
R Unit2

This document provides an overview of data objects in R, including variables, vectors, and arrays. It covers topics such as naming and assigning values to variables, generating datasets, and utilizing control structures. Additionally, it explains vector operations, handling missing values, and creating multi-dimensional arrays with examples.

Uploaded by

Shiva Prasanna
Copyright
© All Rights Reserved

UNIT-02

Data objects in R, Series and Control Statements: Assignment, Modes, Operators,
Basic Functions, Generating Data sets, Control Structures.
Vectors: Definition, Declaration, Generating, Indexing, Naming, Adding and Removing Elements,
Operations on Vectors: Recycling, Special Operators,
Functions for Vectors, Missing Values, Null Values, Filtering and Subsetting.
Data Structures in R - Arrays: Creating Arrays, Dimensions and Naming,
Indexing and Naming, Functions on Arrays.
Variables in R-
Variables in R are used to store and manipulate data. Here's an overview of
variables in R:
1. Naming Variables:
Variable names can contain letters, numbers, periods (.), and underscores (_),
but must start with a letter, or with a period that is not followed by a number.
Variable names are case-sensitive.
Avoid using reserved keywords as variable names.
2. Assigning Values:
Values are assigned to variables using the assignment operator <- .
Example:
x <- 10
y <- "Hello"
3. Data Types:
R supports various data types, including numeric, character, logical, complex,
raw, and factors.
The data type of a variable is determined by the value assigned to it.
Example:
num_var <- 10 # Numeric
char_var <- "Hello" # Character
bool_var <- TRUE # Logical
4. Checking Data Types:
The class() function is used to check the data type of a variable.
Example:
class(num_var) # Output: "numeric"
class(char_var) # Output: "character"
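The sketch below extends the class() check to a few more of the data types listed above. The variable names int_var, cplx_var and fac_var are illustrative and are not part of the original notes.

```r
num_var  <- 10                               # numbers are stored as doubles by default
int_var  <- 10L                              # the L suffix requests an integer
cplx_var <- 3 + 2i                           # complex number
fac_var  <- factor(c("low", "high", "low"))  # factor with two levels

class(num_var)       # "numeric"
typeof(num_var)      # "double"
class(int_var)       # "integer"
class(cplx_var)      # "complex"
class(fac_var)       # "factor"
levels(fac_var)      # "high" "low" (levels are sorted alphabetically)
is.numeric(int_var)  # TRUE: integers also count as numeric
```

class() reports how R treats the object, while typeof() reports how it is stored internally, so the two can differ, as they do for num_var.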
5. Reassigning Values:
Values of variables can be reassigned at any time.
Example: x <- 20
6. Removing Variables:
The rm() function is used to remove variables from the workspace.
Example: rm(x)
7. Printing Variables:
To print the value of a variable, simply type its name in the console.
Example: num_var # Output: 10
8. Variable Scope:
• Variables can have either global or local scope.
• Global variables are accessible throughout the R session, while
local variables are limited to the environment (for example, a function body)
in which they are defined.
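A short sketch of global versus local scope is given here. The function name add_one and the variable names are assumptions chosen for illustration only.

```r
# A global variable, defined at the top level of the session
counter <- 5

add_one <- function() {
  # This assignment creates a new local 'counter';
  # the global one is only read, not changed
  counter <- counter + 1
  counter
}

local_result <- add_one()  # 6, computed inside the function
counter                    # still 5: the global variable is unchanged
```

The assignment inside the function body affects only the local copy, which disappears when the function returns.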
9. Avoiding Common Pitfalls:
• Avoid using reserved keywords as variable names.
• Be cautious with variable names that differ only by case, as they can
lead to confusion.
Input of Data
Using readline() method
In R, the readline() function always returns its input as a character string. If the user types an integer
such as 255, it is stored as the string "255". To convert the input to the desired data type, R provides
conversion functions:
as.integer(n): convert to integer
as.numeric(n): convert to numeric type (double)
as.complex(n): convert to complex number (e.g. 3+2i)
as.Date(n): convert to date, etc.
Syntax:
var <- readline()
var <- as.integer(var)
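The conversion step can be sketched as follows. Because readline() only waits for input in an interactive session, a fixed string stands in for the typed text here; the helper name parse_number and the variable raw_text are hypothetical.

```r
# Hypothetical helper: convert text typed by a user into an integer
parse_number <- function(input) {
  value <- suppressWarnings(as.integer(input))  # non-numeric text becomes NA
  if (is.na(value)) stop("Please enter a number")
  value
}

# In an interactive session the text would come from readline():
#   raw_text <- readline(prompt = "Enter a number: ")
raw_text <- "255"                 # stand-in for what readline() returns
number <- parse_number(raw_text)

class(raw_text)  # "character"
class(number)    # "integer"
```

Checking for NA after the conversion is one simple way to detect input that is not a number.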
Basic Functions In R-
Generating Data Sets-
Generating datasets in R is a crucial aspect of data analysis and statistical
modeling. R provides several built-in functions and packages to create synthetic
datasets for various purposes, including testing algorithms, illustrating concepts,
and conducting simulations. Here are some commonly used methods to
generate datasets in R:
1. Random Data Generation:
• rnorm(): Generates random numbers from a normal distribution.
Ex-
# Generate 100 random numbers from a normal distribution with mean 0 and
# standard deviation 1
random_data <- rnorm(100, mean = 0, sd = 1)
• runif(): Generates random numbers from a uniform distribution.
Ex-
# Generate 100 random numbers from a uniform distribution between 0 and 1
random_data <- runif(100)
2. Random Sampling:
sample(): Randomly samples elements from a vector.
Ex-
# Sample 10 numbers from 1 to 100 without replacement
sampled_data <- sample(1:100, 10, replace = FALSE)
3. Creating Sequences:
seq(): Generates sequences of numbers.
Ex-
# Generate a sequence from 1 to 10 with step size 2
sequence <- seq(1, 10, by = 2)
4. Generating Factor Levels:
gl(): Generates factor levels.
Ex-
# Generate a factor with three levels, each repeated 5 times
factor_levels <- gl(3, 5)
5. Generating Time Series Data:
ts(): Creates time series objects.
Ex-
# Generate a time series with random data
time_series <- ts(random_data)
6. Creating Data Frames:
data.frame(): Combines vectors into a data frame.
EX-
# Create a data frame with two columns: age and height
df <- data.frame(age = c(25, 30, 35), height = c(170, 175, 180))
7. Generating Synthetic Data:
MASS Package: Provides mvrnorm(), which draws samples from a multivariate normal
distribution and can be used to generate predictors for regression examples.
Ex-
library(MASS)
# Generate a synthetic dataset with 100 observations and 4 predictors
synthetic_data <- mvrnorm(100, mu = rep(0, 4), Sigma = diag(4))
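8. Simulating Data:
simstudy Package: Simulates data from a specified model. Variables are described
with defData() and the data set is generated with genData().
Ex-
library(simstudy)
# Define a predictor x and an outcome y = 2 + 3 * x + noise
def <- defData(varname = "x", dist = "normal", formula = 0, variance = 1)
def <- defData(def, varname = "y", dist = "normal", formula = "2 + 3 * x", variance = 1)
# Simulate a dataset with 100 observations from this linear model
simulated_data <- genData(100, def)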
These functions and packages allow you to generate synthetic datasets
efficiently for various analysis purposes in R. Depending on your requirements,
you can choose the appropriate method to create datasets that suit your
needs.
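Random generators give different values on every run unless the seed is fixed. The sketch below uses set.seed() from base R to make the output reproducible and then combines several of the generators above into one small data set; the names toy_data, id, group and score are illustrative.

```r
set.seed(42)                        # fix the random number generator state
first_draw <- rnorm(5)

set.seed(42)                        # same seed again
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE: the two draws match

# Combine generated columns into a small synthetic data set
set.seed(1)
toy_data <- data.frame(
  id    = seq(1, 6),
  group = gl(2, 3, labels = c("control", "treated")),
  score = runif(6, min = 0, max = 100)
)
str(toy_data)
```

Setting the seed once at the top of a script is usually enough to make a whole simulation repeatable.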
Vectors-
Naming Vectors:
In R, you can assign names to the individual elements of a
vector using the names() function. names() works element by
element and expects one name per element.
Here's how:
Assigning Names to Elements:
Example:
vec <- c(10, 20, 30, 40, 50)
names(vec) <- c("A", "B", "C", "D", "E")
Supplying Fewer Names than Elements:
A single name does not label the vector as a whole. It names only the
first element, and the remaining names become NA.
Example:
vec <- c(10, 20, 30, 40, 50)
names(vec) <- "Numbers"
names(vec) # "Numbers" NA NA NA NA
Accessing Named Elements:
Once you've named elements of a vector, you can access them
using their names:
Ex-
vec <- c(A = 10, B = 20, C = 30, D = 40, E = 50)
vec["C"] # Accesses the element named "C" (30)
Indexing Vectors:
In R, you can access elements of a vector using square brackets [
]. Indexing starts at 1. There are several ways to index vectors:
Single Element Indexing:
To access a single element of a vector, specify its index inside
square brackets.
Example:
vec <- c(10, 20, 30, 40, 50)
vec[3] # Accesses the third element of the vector (30)
Multiple Elements Indexing:
You can access multiple elements of a vector by specifying a
vector of indices inside square brackets.
Example:
vec <- c(10, 20, 30, 40, 50)
vec[c(2, 4)] # Accesses the second and fourth elements of the vector (20, 40)
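Removing elements in a vector-
Negative Indexing:
You can exclude specific elements from a vector using negative
indices. The original vector is not modified; assign the result
back to the variable to remove the elements permanently.
Example:
vec <- c(10, 20, 30, 40, 50)
vec[-3] # Excludes the third element of the vector (10, 20, 40, 50)
vec <- vec[-3] # vec is now 10, 20, 40, 50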
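Adding elements works in a similar way to removing them: a new vector is built and assigned back to the variable. The following sketch uses c() and append() from base R; the values are made up for illustration.

```r
vec <- c(10, 20, 30)

# Add elements at the end with c()
vec <- c(vec, 40, 50)              # 10 20 30 40 50

# Insert an element after position 2 with append()
vec <- append(vec, 25, after = 2)  # 10 20 25 30 40 50

# Remove the first and last elements and keep the result
vec <- vec[-c(1, length(vec))]     # 20 25 30 40
```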
Vector Recycling in R-
Vector recycling in R refers to the automatic repetition of
shorter vectors to match the length of longer vectors during
operations. This feature allows for more concise and efficient
code in many situations. Let's explore how vector recycling
works with examples:
Basic Example:
# Define two vectors of different lengths
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5)
# Add the vectors together
result <- vec1 + vec2
# Result: c(5, 7, 7)
In this example, since vec1 has a length of 3 and vec2 has a
length of 2, vector recycling occurs. vec2 is recycled to match
the length of vec1 during the addition operation. The elements
of vec2 are repeated to match the length of vec1, resulting in
c(4, 5, 4) before the addition takes place. Because 3 is not a
multiple of 2, R also issues a warning that the longer object
length is not a multiple of the shorter object length.
Scalar Example:
# Define a vector and a scalar
vec <- c(1, 2, 3)
scalar <- 10
# Add the scalar to the vector
result <- vec + scalar
# Result: c(11, 12, 13)
In this example, the scalar value 10 is recycled to match the
length of the vector vec, resulting in c(10, 10, 10) before
the addition takes place.
Recycling with Logical Operations:
# Define two logical vectors of different lengths
logical_vec1 <- c(TRUE, FALSE)
logical_vec2 <- c(TRUE, FALSE, TRUE, FALSE)
# Perform a logical operation
result <- logical_vec1 & logical_vec2
# Result: c(TRUE, FALSE, TRUE, FALSE)
In this example, vector recycling occurs with logical
vectors. logical_vec1 is recycled to match the length of
logical_vec2 during the logical AND operation.
Conclusion:
Vector recycling in R allows for more concise and intuitive code
by automatically matching the lengths of vectors during
operations. Understanding how vector recycling works is
essential for writing efficient and effective R code, especially
when working with vectors of different lengths.
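The sketch below makes the recycling step visible with rep_len() and shows when R stays silent and when it warns. The variable names are illustrative.

```r
short_vec <- c(4, 5)

# rep_len() shows explicitly what recycling to length 3 produces
recycled <- rep_len(short_vec, 3)      # 4 5 4

# Lengths 4 and 2: 4 is a multiple of 2, so recycling is silent
even_sum <- c(1, 2, 3, 4) + short_vec  # 5 7 7 9

# Lengths 3 and 2: the result is still computed, but R signals a warning
warned <- tryCatch(
  {
    c(1, 2, 3) + short_vec
    FALSE
  },
  warning = function(w) TRUE
)
warned                                 # TRUE
```

The warning is a useful hint that two vectors were probably meant to have the same length.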
NULL values-
In R, atomic vectors cannot contain NULL as an element. However, you
can achieve similar functionality by using NA (Not Available)
values or creating a vector with zero length. Let's discuss these
concepts:
1. NA Values in Vectors:
• NA is a special value in R that represents missing
or undefined data.
• You can create a vector with NA values by writing the
constant NA directly in place of a value.
Example:
vec <- c(1, 2, NA, 4, 5) # Create a vector with NA values
2. Empty Vectors:
An empty vector has zero length. Calling the c() function with no
arguments returns NULL, which has length zero.
Example:
empty_vec <- c() # empty_vec is NULL (length 0)
3. Working with NA Values:
• Operations involving NA values often return NA
unless explicitly handled.
• Functions like is.na() and na.omit() are used to check
for and remove NA values, respectively.
Example:
vec <- c(1, 2, NA, 4, 5)
# Check for NA values
is.na(vec) # Returns TRUE for NA values
# Remove NA values
clean_vec <- na.omit(vec)
4. Dealing with Empty Vectors:
• Empty vectors can be useful as placeholders or
for initializing variables.
• Arithmetic operations involving empty vectors typically return
empty vectors.
Example:
empty_vec <- c()
# Concatenating an empty vector with another vector
new_vec <- c(empty_vec, 1, 2, 3) # new_vec is now c(1, 2, 3)
Conclusion:
While R does not have a direct representation of null values in
vectors, NA values and empty vectors serve similar purposes.
NA values represent missing or undefined data, while empty
vectors have zero length and can be used as placeholders.
Understanding how to work with NA values and empty
vectors is important for handling missing data and initializing
variables in R.
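The difference between NA and NULL, and the usual way of handling NA in summary functions, can be sketched as follows. The variable names are illustrative.

```r
vec <- c(1, 2, NA, 4, 5)

sum(vec)                         # NA: the missing value propagates
total <- sum(vec, na.rm = TRUE)  # 12: NA is dropped before summing
n_missing <- sum(is.na(vec))     # 1 missing value

# NULL is an object of length zero; NA is a value of length one
empty_vec <- c()
is.null(empty_vec)               # TRUE
length(empty_vec)                # 0
length(NA)                       # 1

# A typed empty vector can be created with numeric(0)
typed_empty <- numeric(0)
```

Many base functions such as sum(), mean() and max() accept na.rm = TRUE, which avoids having to clean the vector first.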
Filtering a Vector in R:
Logical Indexing:
Use a logical condition inside square brackets to subset the vector.
Example: vec <- c(1, 2, 3, 4, 5)
filtered_vec <- vec[vec > 3] # Select elements greater than 3
Using subset() Function:
Use the subset() function to filter elements based on conditions.
Example: vec <- c(1, 2, 3, 4, 5)
filtered_vec <- subset(vec, vec > 3) # Select elements greater than 3
Using filter() Function (from dplyr package):
When working with data frames, use the filter() function from the dplyr
package.
Example: install.packages("dplyr") # Install dplyr package if not already installed
library(dplyr)
vec <- c(1, 2, 3, 4, 5)
# Select elements greater than 3
filtered_vec <- data.frame(vec = vec) %>% filter(vec > 3) %>% pull(vec)
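Filtering behaves differently when the vector contains missing values. This base R sketch compares logical indexing with which() and shows how to combine conditions; the data values are made up for illustration.

```r
vec <- c(8, NA, 3, 12, 5)

# Logical indexing keeps an NA wherever the condition is NA
with_na <- vec[vec > 4]              # 8 NA 12 5

# which() returns only the positions where the condition is TRUE
positions <- which(vec > 4)          # 1 4 5
no_na <- vec[positions]              # 8 12 5

# Combine conditions with & (and) or | (or)
in_range <- vec[!is.na(vec) & vec > 4 & vec < 10]  # 8 5
```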
Arrays in R-
In R, arrays are multi-dimensional objects that can store data of the same type.
You can create arrays using the array() function or by converting matrices or
vectors into arrays. Here's how to create arrays in R:
1. Using the array() Function:
You can create arrays using the array() function by specifying the data,
dimensions, and optionally dimension names.
Ex-
> A = array(c(1, 2, 3, 4, 5, 6, 7, 8), dim = c(2, 2, 2))
> print(A) # Printing an array
Output:

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

2. Converting Matrices or Vectors to Arrays:


You can convert matrices or vectors into arrays using the array() function. Just
provide the data and specify the dimensions.
From Matrix:
# Create a matrix
mat <- matrix(1:9, nrow = 3, ncol = 3)
# Convert the matrix to an array
arr_from_mat <- array(mat, dim = c(3, 3, 1))
# Display the array
arr_from_mat
From Vector:
# Create a vector
vec <- 1:8
# Convert the vector to an array
arr_from_vec <- array(vec, dim = c(2, 2, 2))
# Display the array
arr_from_vec
Example
The following example creates an array of two matrices,
each with 3 rows and 3 columns. The 9 input values are recycled
to fill all 18 cells, so both matrices are identical.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim=c(3,3,2))
print(result)

When we execute the above code, it produces the following result:
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

DIMENSION NAMES
We can give names to the rows, columns and matrices in the
array by using the dimnames parameter.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim=c(3,3,2),dimnames =
list(row.names, column.names, matrix.names))
print(result)

When we execute the above code, it produces the following
result:
, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Using Dimension Names for Indexing:
After assigning dimension names, you can use them instead of indices
for accessing elements of the array.
EX-
result["ROW2", "COL3", "Matrix1"] # Access element using dimension names (14)
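Dimension names can also be attached after an array has been created, using the dimnames() function from base R. In this sketch the array arr and the names r1, c1, layer1 and so on are illustrative.

```r
arr <- array(1:12, dim = c(2, 3, 2))

# Attach names to every dimension of an existing array
dimnames(arr) <- list(
  c("r1", "r2"),
  c("c1", "c2", "c3"),
  c("layer1", "layer2")
)

by_name  <- arr["r2", "c3", "layer1"]  # 6
by_index <- arr[2, 3, 1]               # 6, the same element
dimnames(arr)[[2]]                     # "c1" "c2" "c3"
```

Name-based and position-based indexing can be mixed freely in the same expression.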

Accessing Array Elements

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim=c(3,3,2),dimnames =
list(row.names, column.names, matrix.names))
# Print the third row of the second matrix of the array.
print(result[3,,2])
# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])
# Print the 2nd Matrix.
print(result[,,2])

When we execute the above code, it produces the following
result:

COL1 COL2 COL3
   3   12   15
[1] 13
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Manipulating Array Elements


As an array is made up of matrices in multiple dimensions, the
operations on elements of an array are carried out by accessing
elements of the matrices.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim=c(3,3,2))
# Create two vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
array2 <- array(c(vector3,vector4),dim=c(3,3,2))
# Create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]
# Add the matrices.
result <- matrix1+matrix2
print(result)

When we execute the above code, it produces the following result:

     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26
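Arithmetic can also be applied to whole arrays at once, without extracting matrices first, as long as the dimensions match. A minimal sketch with made-up arrays a and b:

```r
a <- array(1:8, dim = c(2, 2, 2))
b <- array(10, dim = c(2, 2, 2))  # the single value 10 fills every cell

total   <- a + b   # element-wise sum, dimensions preserved
doubled <- a * 2   # the scalar is recycled over all elements

# Individual elements can be replaced by index
a[1, 1, 2] <- 0L
```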

Dimensions of array in R-
In R, the dimensions of an array define its structure and determine how the data
is organized across multiple dimensions. You can specify the dimensions of an
array using the dim argument in the array() function or by assigning values to
the dim() attribute of an existing array. Let's explore how dimensions work in R
arrays:
1. Specifying Dimensions with array() Function:
You can create an array in R and specify its dimensions using the dim argument
in the array() function.
# Create a 3x3x2 array filled with random numbers
arr <- array(data = runif(18), dim = c(3, 3, 2))
# Display the array
arr
In this example, c(3, 3, 2) specifies the dimensions of the array, indicating that
it has 3 rows, 3 columns, and 2 layers.
2. Assigning Dimensions to an Existing Object:
You can also assign dimensions to an existing vector, matrix or array using the dim() function.
# Create a matrix
mat <- matrix(1:9, nrow = 3, ncol = 3)
# Copy the matrix and assign three dimensions to it
arr_from_mat <- mat
dim(arr_from_mat) <- c(3, 3, 1)
# Display the array
arr_from_mat
In this example, c(3, 3, 1) assigns dimensions to the array, specifying 3 rows, 3
columns, and 1 layer.
3. Accessing Dimensions:
You can access the dimensions of an array using the dim() function.
# Get the dimensions of the array
array_dim <- dim(arr)
# Display the dimensions
array_dim
Conclusion:
Dimensions define the structure of an array in R, specifying the number of rows,
columns, and layers (or additional dimensions) it contains. You can specify
dimensions when creating an array or assign them to an existing array.
Understanding and managing dimensions are essential for working with
multi-dimensional data in R.
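The following sketch reshapes one vector by assigning to dim() and then queries the result. All names are illustrative; the number of elements must equal the product of the dimensions.

```r
vec <- 1:24

# Give the vector three dimensions: 2 rows, 3 columns, 4 layers
dim(vec) <- c(2, 3, 4)
first_dim <- dim(vec)  # 2 3 4
length(vec)            # 24: the number of elements is unchanged
nrow(vec)              # 2
ncol(vec)              # 3

# Reshape the same data into a 4 x 6 matrix
dim(vec) <- c(4, 6)
is.matrix(vec)         # TRUE
```

Reshaping never reorders the data; values are always filled column by column.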
Indexing Arrays:
In R, arrays are multi-dimensional objects, and indexing allows you to access
specific elements or subsets of an array.
Basic Indexing:
You can access individual elements of an array by specifying indices for each
dimension within square brackets [ ].
EX-
# Create a 3x3x2 array
arr <- array(1:18, dim = c(3, 3, 2))
# Accessing elements
arr[2, 3, 1] # Access element in the second row, third column, and first layer
Slicing:
You can extract subsets of an array using slicing, specifying ranges of indices for
each dimension.
EX-
# Slicing to extract a subset of the array
subset_arr <- arr[1:2, , ] # Extract first two rows for all columns and layers
Logical Indexing:
You can use logical vectors to subset elements of an array based on specific
conditions.
EX-
# Logical indexing to filter elements
filtered_arr <- arr[arr > 10] # Extract elements greater than 10
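Two details of array indexing are easy to miss: dimensions of length one are dropped by default, and logical indexing returns a plain vector without positions. The sketch below shows drop = FALSE and which(arr.ind = TRUE) from base R; the variable names are illustrative.

```r
arr <- array(1:18, dim = c(3, 3, 2))

slice <- arr[2, , ]                # the row dimension is dropped: 3 x 2 matrix
kept  <- arr[2, , , drop = FALSE]  # stays a 1 x 3 x 2 array

dim(slice)  # 3 2
dim(kept)   # 1 3 2

# which(arr.ind = TRUE) reports row, column and layer of matching elements
hits <- which(arr > 16, arr.ind = TRUE)
hits        # one row of indices per matching element
```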
Functions for arrays in R-
1. Creating Arrays:
array(): Create an array from data and specified dimensions.
EX-
arr <- array(data = 1:12, dim = c(3, 2, 2))
2. Summarizing Arrays:
sum(): Calculate the sum of elements in an array.
EX-
total_sum <- sum(arr)
mean(): Compute the mean of elements in an array.
EX-
avg_value <- mean(arr)
min() and max(): Find the minimum and maximum values in an array.
3. Aggregating Functions:
Calculations across Array Elements
We can do calculations across the elements in an array using the
apply() function.

Syntax
apply(x, margin, fun)
Following is the description of the parameters used:
• x is an array.
• margin is the dimension (or dimensions) over which the function is applied: 1 for rows, 2 for columns, 3 for layers.
• fun is the function to be applied across the elements of the array.

Example
We use the apply() function below to calculate the sum of the
elements in the rows of an array across all the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim=c(3,3,2))
print(new.array)
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)

When we execute the above code, it produces the following result:

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

[1] 56 68 60

EX-
row_sums <- apply(arr, 1, sum) # Calculate row-wise sum
rowSums() and colSums(): Compute row-wise or column-wise sums.
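The effect of the margin argument is easiest to see by applying the same function over different dimensions. This sketch reuses the values of the example above; the result names are illustrative.

```r
new.array <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

row_totals   <- apply(new.array, 1, sum)        # one value per row: 56 68 60
col_totals   <- apply(new.array, 2, sum)        # one value per column: 34 66 84
layer_totals <- apply(new.array, 3, sum)        # one value per matrix: 92 92
cell_totals  <- apply(new.array, c(1, 2), sum)  # 3 x 3 matrix, summed over layers

# rowSums() gives the same row totals as apply() with margin 1
all.equal(row_totals, rowSums(new.array))       # TRUE
```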
4. Statistical Functions:
• quantile(): Compute quantiles of elements in an array.
• sd(): Calculate the standard deviation of elements in an array.
• var(): Compute the variance of elements in an array.
5. Conditional Functions:
• which(): Return the indices of array elements satisfying a condition.
• ifelse(): Perform conditional operations on array elements.
6. Mathematical Functions:
• log(), exp(), sqrt(): Apply logarithmic, exponential, and square
root functions to array elements.
• sin(), cos(), tan(): Apply trigonometric functions.
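A few of the functions listed above are applied to one small array in the sketch below. The array values and result names are made up for illustration.

```r
arr <- array(c(4, 9, 16, 25, 36, 49, 64, 81), dim = c(2, 2, 2))

roots   <- sqrt(arr)                       # element-wise, dimensions preserved
labels  <- ifelse(arr > 30, "high", "low") # same shape, character values
big_idx <- which(arr > 30)                 # linear indices: 5 6 7 8
spread  <- sd(arr)                         # treats the array as one long vector
largest <- max(arr)                        # 81
```

Element-wise functions keep the shape of the array, while summary functions such as sd() and max() reduce it to a single value.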
