Subset Creation in R

Uploaded by

Kakashi Hatake
Copyright
© © All Rights Reserved
Subset Creation in R

November 19, 2024

Consider the following data frame given below:

subject class marks
1       1     56
2       2     75
3       1     48
4       2     69
5       1     84
6       2     53

i. Create a subset of subject less than 4 by using the subset() function and demonstrate
the output.
ii. Create a subset where the subject column is less than 3 and the class equals 2
by using [ ] brackets and demonstrate the output.

To create the subsets from the given data frame, we'll use the `subset()` function and indexing with
square brackets `[ ]` in R. Here's how you can accomplish it:

Original Data Frame

Printed using Save ChatGPT as PDF, powered by PDFCrowd HTML to PDF API. 1/27
# Create the data frame
df <- data.frame(
  subject = c(1, 2, 3, 4, 5, 6),
  class = c(1, 2, 1, 2, 1, 2),
  marks = c(56, 75, 48, 69, 84, 53)
)
print(df)

Output

  subject class marks
1       1     1    56
2       2     2    75
3       3     1    48
4       4     2    69
5       5     1    84
6       6     2    53

Part i: Subset where `subject` is less than 4 using `subset()`

# Create subset where subject < 4
subset1 <- subset(df, subject < 4)
print(subset1)

Output

  subject class marks
1       1     1    56
2       2     2    75
3       3     1    48

Part ii: Subset where `subject` is less than 3 and `class` equals 2 using `[ ]`
brackets

# Create subset where subject < 3 and class == 2
subset2 <- df[df$subject < 3 & df$class == 2, ]
print(subset2)

Output

  subject class marks
2       2     2    75
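
As a cross-check, the sketch below applies both techniques to the same data frame and compares the results. The `select` argument of `subset()` is not part of the question; it is shown only as an optional way to keep particular columns, and the names `by_subset`, `by_brackets`, and `marks_only` are illustrative.

```r
# Data frame from the question
df <- data.frame(
  subject = c(1, 2, 3, 4, 5, 6),
  class = c(1, 2, 1, 2, 1, 2),
  marks = c(56, 75, 48, 69, 84, 53)
)

# The same filter written with subset() and with [ ] brackets
by_subset <- subset(df, subject < 3 & class == 2)
by_brackets <- df[df$subject < 3 & df$class == 2, ]

# Compare the two results
same_rows <- isTRUE(all.equal(by_subset, by_brackets))
print(same_rows)

# Optional: keep only some columns with the select argument
marks_only <- subset(df, subject < 4, select = c(subject, marks))
print(marks_only)
```

One practical difference: inside `subset()` the column names are written bare, while with brackets each column needs the `df$` prefix.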

Let me know if you'd like further clarifications or examples!

The data analyst of Argon Technology, Mr. John, needs to enter the salaries of 10
employees in R. The salaries of the employees are given in the following table:

Sr. No.  Name of employees  Salaries
1        Vivek              21000
2        Karan              55000
3        James              67000
4        Soham              50000
5        Renu               54000
6        Farah              40000
7        Hetal              30000
8        Mary               70000
9        Ganesh             20000
10       Krish              15000

i. Which R command will Mr. John use to enter these values? Demonstrate the output.
ii. Now Mr. John wants to add the salaries of 5 new employees to the existing table.
Which command will he use to join datasets with new values in R? Demonstrate the
output.

To enter and manage the given salary data in R, Mr. John can use a combination of commands to create
the initial data frame and then append additional data to it. Here's how he can proceed:

Part i: Create the Initial Dataset

To enter the salaries of the 10 employees, Mr. John can use the `data.frame()` function.

# Create the initial dataset
employees <- data.frame(
  Sr_No = 1:10,
  Name = c("Vivek", "Karan", "James", "Soham", "Renu", "Farah", "Hetal", "Mary", "Ganesh", "Krish"),
  Salaries = c(21000, 55000, 67000, 50000, 54000, 40000, 30000, 70000, 20000, 15000)
)
print(employees)

Output

   Sr_No   Name Salaries
1      1  Vivek    21000
2      2  Karan    55000
3      3  James    67000
4      4  Soham    50000
5      5   Renu    54000
6      6  Farah    40000
7      7  Hetal    30000
8      8   Mary    70000
9      9 Ganesh    20000
10    10  Krish    15000

Part ii: Add Salaries of 5 New Employees

To add new rows of employee data, Mr. John can use the `rbind()` function. Here is an example with the
new employee data.

New Employee Data:

Sr. No.  Name    Salaries
11       Rahul   45000
12       Pooja   35000
13       Ajay    60000
14       Simran  52000
15       Meera   47000

R Code to Append New Data

# Create new employee data
new_employees <- data.frame(
  Sr_No = 11:15,
  Name = c("Rahul", "Pooja", "Ajay", "Simran", "Meera"),
  Salaries = c(45000, 35000, 60000, 52000, 47000)
)

# Append new employees to the existing dataset
updated_employees <- rbind(employees, new_employees)
print(updated_employees)

Output

   Sr_No   Name Salaries
1      1  Vivek    21000
2      2  Karan    55000
3      3  James    67000
4      4  Soham    50000
5      5   Renu    54000
6      6  Farah    40000
7      7  Hetal    30000
8      8   Mary    70000
9      9 Ganesh    20000
10    10  Krish    15000
11    11  Rahul    45000
12    12  Pooja    35000
13    13   Ajay    60000
14    14 Simran    52000
15    15  Meera    47000
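
`rbind()` on data frames matches columns by name, so the new rows need the same column names as the existing table. The following is a small sketch of that behaviour on a shortened copy of the table; `ok_row`, `bad_row`, and the misspelled `Salary` column are made up for illustration.

```r
employees <- data.frame(
  Sr_No = 1:2,
  Name = c("Vivek", "Karan"),
  Salaries = c(21000, 55000)
)

# A new row whose column names match the existing table
ok_row <- data.frame(Sr_No = 3L, Name = "James", Salaries = 67000)
combined <- rbind(employees, ok_row)
print(nrow(combined))

# A row with a misspelled column name (Salary instead of Salaries)
bad_row <- data.frame(Sr_No = 4L, Name = "Soham", Salary = 50000)
outcome <- tryCatch(
  rbind(employees, bad_row),
  error = function(e) "rbind failed: column names differ"
)
print(outcome)
```

Wrapping the call in `tryCatch()` is only there so the sketch runs to the end; in an interactive session the error message itself is usually enough.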

Let me know if you'd like further clarification or assistance!

i. Write the script to sort the values contained in the following vector in ascending
order and descending order: (23, 45, 10, 34, 89, 20, 67, 99). Demonstrate the
output.
ii. Name and explain the operators used to form data subsets in R.

Part i: Sorting a Vector in R

Script for Ascending and Descending Order

# Define the vector
vec <- c(23, 45, 10, 34, 89, 20, 67, 99)

# Sort in ascending order
ascending <- sort(vec)

# Sort in descending order
descending <- sort(vec, decreasing = TRUE)

# Print the results
print("Ascending Order:")
print(ascending)

print("Descending Order:")
print(descending)

Output

[1] "Ascending Order:"
[1] 10 20 23 34 45 67 89 99
[1] "Descending Order:"
[1] 99 89 67 45 34 23 20 10
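
A related base R function, `order()`, may be useful when the goal is to sort rows of a data frame rather than a single vector. The sketch below is illustrative only; the `scores` data frame is hypothetical and not part of the question.

```r
vec <- c(23, 45, 10, 34, 89, 20, 67, 99)

# order() gives the positions that would sort the vector
positions <- order(vec)
print(positions)

# Indexing with those positions reproduces sort(vec)
print(vec[positions])

# The same idea sorts the rows of a data frame by one column
scores <- data.frame(id = 1:3, marks = c(75, 48, 56))  # hypothetical data
sorted_scores <- scores[order(scores$marks, decreasing = TRUE), ]
print(sorted_scores)
```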

Part ii: Operators Used to Form Data Subsets in R

In R, several operators are used to form subsets of data. Here’s an explanation of the most common
ones:

1. Square Brackets `[ ]`

Used to extract specific elements from vectors, matrices, data frames, or lists.
Syntax: `object[index]` for vectors and lists; `object[row, column]` for matrices and data frames
Example:

# Subsetting a vector
vec <- c(10, 20, 30, 40)
vec[2] # Returns the second element

# Subsetting a data frame
df <- data.frame(A = 1:3, B = c("x", "y", "z"))
df[1, 2] # Returns the element in the first row, second column

2. Dollar Sign `$`

Used to extract a column from a data frame, or an element from a list, by its name.
Example:

df <- data.frame(Name = c("A", "B"), Age = c(25, 30))
df$Name # Returns the 'Name' column

3. Double Square Brackets `[[ ]]`

Used to extract a single element from a list.
Example:

my_list <- list(a = 1:5, b = "Hello")
my_list[[1]] # Returns the first element (1:5)

4. Logical Operators

Logical conditions can be used inside square brackets to filter data.
Operators:
`<`, `>`, `<=`, `>=`: Comparison operators
`==`: Equality operator
`!=`: Not equal to
`&`, `|`: Logical AND and OR
Example:

vec <- c(10, 20, 30, 40)
vec[vec > 20] # Returns elements greater than 20

5. Subset Function `subset()`

A function used for creating subsets of data frames based on logical conditions.
Example:

df <- data.frame(Age = c(25, 30, 35), Name = c("A", "B", "C"))
subset(df, Age > 25) # Subset rows where Age is greater than 25

These operators make subsetting data in R flexible and powerful, allowing fine-grained data selection
for analysis.
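
The difference between `[ ]`, `[[ ]]`, and `$` is easiest to see by checking the class of what each returns. The sketch below reuses the small objects from the examples above; the variable names `kept`, `element`, and `one_col` are illustrative.

```r
my_list <- list(a = 1:5, b = "Hello")

# Single brackets keep the container: the result is still a list
kept <- my_list[1]
print(class(kept))

# Double brackets and $ return the element itself
element <- my_list[[1]]
print(class(element))
print(identical(my_list$a, my_list[["a"]]))

# On a data frame, drop = FALSE keeps a one-column data frame
df <- data.frame(A = 1:3, B = c("x", "y", "z"))
one_col <- df[, "A", drop = FALSE]
print(class(one_col))
print(class(df$A))
```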

Q) i. Create a data frame from the following 4 vectors and demonstrate the output:
emp_id = c(1:5)
emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary")
start_date = c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")
salary = c(60000, 45000, 75000, 84000, 20000)
ii. Display structure and summary of the above data frame.
iii. Extract the emp_name and salary columns from the above data frame.
iv. Extract the employee details whose salary is less than or equal to 60000.

i. Create a Data Frame

To create the data frame from the given vectors:

# Create the vectors
emp_id <- c(1:5)
emp_name <- c("Rick", "Dan", "Michelle", "Ryan", "Gary")
start_date <- as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27"))
salary <- c(60000, 45000, 75000, 84000, 20000)

# Create the data frame
employees <- data.frame(emp_id, emp_name, start_date, salary)

# Print the data frame
print(employees)

Output

  emp_id emp_name start_date salary
1      1     Rick 2012-01-01  60000
2      2      Dan 2013-09-23  45000
3      3 Michelle 2014-11-15  75000
4      4     Ryan 2014-05-11  84000
5      5     Gary 2015-03-27  20000

ii. Display Structure and Summary of the Data Frame

R Code

# Display the structure of the data frame
str(employees)

# Display the summary of the data frame
summary(employees)

Output
Structure:

'data.frame': 5 obs. of 4 variables:
 $ emp_id    : int 1 2 3 4 5
 $ emp_name  : chr "Rick" "Dan" "Michelle" "Ryan" ...
 $ start_date: Date, format: "2012-01-01" "2013-09-23" ...
 $ salary    : num 60000 45000 75000 84000 20000

Summary:

     emp_id    emp_name           start_date             salary
 Min.   :1   Length:5           Min.   :2012-01-01   Min.   :20000
 1st Qu.:2   Class :character   1st Qu.:2013-09-23   1st Qu.:45000
 Median :3   Mode  :character   Median :2014-05-11   Median :60000
 Mean   :3                      Mean   :2014-01-14   Mean   :56800
 3rd Qu.:4                      3rd Qu.:2014-11-15   3rd Qu.:75000
 Max.   :5                      Max.   :2015-03-27   Max.   :84000

iii. Extract the `emp_name` and `salary` Columns

R Code

# Extract emp_name and salary columns
emp_name_salary <- employees[, c("emp_name", "salary")]
print(emp_name_salary)

Output

  emp_name salary
1     Rick  60000
2      Dan  45000
3 Michelle  75000
4     Ryan  84000
5     Gary  20000

iv. Extract Employee Details with Salary ≤ 60000

R Code

# Subset employees with salary <= 60000
low_salary_employees <- subset(employees, salary <= 60000)
print(low_salary_employees)

Output

  emp_id emp_name start_date salary
1      1     Rick 2012-01-01  60000
2      2      Dan 2013-09-23  45000
5      5     Gary 2015-03-27  20000
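
Parts iii and iv can also be combined into a single bracket expression that filters rows and selects columns at once. The sketch below rebuilds the data frame so it runs on its own; the date filter at the end is an extra illustration that is not part of the question.

```r
employees <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  salary = c(60000, 45000, 75000, 84000, 20000)
)

# Rows with salary <= 60000, keeping only name and salary
low_paid <- employees[employees$salary <= 60000, c("emp_name", "salary")]
print(low_paid)

# Date columns can be filtered the same way (illustrative extra)
joined_2014 <- subset(employees, start_date >= as.Date("2014-01-01") &
                        start_date <= as.Date("2014-12-31"))
print(joined_2014$emp_name)
```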

Let me know if you need further assistance!

Q) List and explain various functions that allow users to handle data in R workspace with
appropriate examples.

In R, the workspace refers to the environment where all the data objects (vectors, matrices, data frames,
etc.) are stored. R provides a variety of functions that allow users to manage, manipulate, and query the
data stored in the workspace. Below is a list of key functions, along with explanations and examples, to
handle data in the R workspace:

1. `ls()` - List Objects in the Workspace

This function lists all the objects (variables, functions, etc.) currently in the R workspace.
Syntax: `ls()`

Example:

x <- 10
y <- 20
ls() # Lists the objects in the workspace

Output:

[1] "x" "y"

2. `rm()` - Remove Objects from the Workspace

This function removes specified objects from the R workspace, effectively deleting them.
Syntax: `rm(object_name)`

Example:

x <- 10
y <- 20
rm(x) # Removes the object 'x' from the workspace
ls() # Check the workspace after removal

Output:

[1] "y"

3. `rm(list = ls())` - Remove All Objects from the Workspace

This call removes all objects from the workspace by passing the names returned by `ls()` to the `list` argument of `rm()`.
Syntax: `rm(list = ls())`

Example:

x <- 10
y <- 20
rm(list = ls()) # Removes all objects from the workspace
ls() # Check if the workspace is empty

Output:

character(0)

4. `objects()` - List Objects in the Workspace (Alternative to `ls()`)

This function is similar to `ls()`, listing the names of all objects in the current environment.
Syntax: `objects()`

Example:

a <- 5
b <- 6
objects() # List all objects in the workspace

Output:

[1] "a" "b"

5. `exists()` - Check If an Object Exists

This function checks whether an object with the given name exists in the workspace.
Syntax: `exists("object_name")`

Example:

x <- 100
exists("x") # Check if 'x' exists in the workspace

Output:

[1] TRUE

6. `save()` - Save Objects to a File

This function saves one or more objects from the workspace to an external file (RData format).
Syntax: `save(object_name, file = "filename.RData")`

Example:

a <- 10
b <- 20
save(a, b, file = "data.RData") # Save 'a' and 'b' to 'data.RData'

7. `load()` - Load Objects from a File

This function loads objects from an external file into the workspace.
Syntax: `load("filename.RData")`

Example:

load("data.RData") # Load the objects from 'data.RData'
ls() # Check if the objects have been loaded

Output:

[1] "a" "b"
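
A round trip through `save()` and `load()` can be sketched as follows. To keep the example side-effect free, it writes to a temporary file and loads into a separate environment; both choices, and the names `rdata_path` and `restored`, are assumptions made for illustration rather than part of the original answer.

```r
a <- 10
b <- 20

# Write to a temporary file so nothing is left in the working directory
rdata_path <- tempfile(fileext = ".RData")
save(a, b, file = rdata_path)

# Load into a separate environment to avoid overwriting existing objects
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)
print(loaded_names)
print(get("a", envir = restored))

unlink(rdata_path)  # remove the temporary file
```

`load()` restores objects under their original names, which is why loading into the main workspace can silently overwrite objects that are already there.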

8. `saveRDS()` - Save a Single R Object to a File

This function saves a single R object to a file in RDS format, which can later be read back into R with
`readRDS()`.
Syntax: `saveRDS(object, file = "filename.rds")`

Example:

a <- c(1, 2, 3, 4, 5)
saveRDS(a, file = "vector.rds") # Save the vector 'a'

9. `readRDS()` - Read an R Object from an RDS File

This function reads an R object saved in RDS format back into R.
Syntax: `readRDS("filename.rds")`

Example:

a <- readRDS("vector.rds") # Load the vector from the RDS file
print(a)

Output:

[1] 1 2 3 4 5
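
Unlike `load()`, `readRDS()` returns the object instead of restoring it under its old name, so the result can be assigned to any variable. A minimal sketch, using a temporary file and a hypothetical `scores` data frame:

```r
scores <- data.frame(id = 1:3, marks = c(56, 75, 48))  # hypothetical data

rds_path <- tempfile(fileext = ".rds")
saveRDS(scores, file = rds_path)

# readRDS() returns the object, so it can be given any name
scores_copy <- readRDS(rds_path)
print(all.equal(scores, scores_copy))

unlink(rds_path)  # remove the temporary file
```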

10. `attach()` - Attach a Data Frame to the Search Path

This function adds a data frame to R's search path so that its columns can be accessed directly by name without
referencing the data frame.
Syntax: `attach(data_frame)`

Example:

df <- data.frame(x = 1:5, y = c(10, 20, 30, 40, 50))
attach(df) # Makes the columns 'x' and 'y' accessible directly
print(x) # Can access 'x' directly without using df$x
detach(df) # Detach the data frame

11. `detach()` - Detach a Data Frame from the Search Path

This function removes the data frame from the search path, so its columns are no longer accessible by
name alone.
Syntax: `detach(data_frame)`

Example:

detach(df) # Assumes df is currently attached; afterwards its columns are no longer accessible directly
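
For one-off calculations, base R's `with()` offers a similar convenience without modifying the search path. It is not one of the functions listed in this answer; the sketch below is included only as a comparison.

```r
df <- data.frame(x = 1:5, y = c(10, 20, 30, 40, 50))

# with() evaluates an expression using the data frame's columns,
# without changing the search path
total <- with(df, sum(x * y))
print(total)
```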

12. `ls.str()` - List Structure of All Objects

This function lists the names of all objects in the workspace along with their structure, which can
be helpful for understanding the content of larger objects.
Syntax: `ls.str()`

Example:

x <- 1:10
y <- data.frame(a = 1:5, b = letters[1:5])
ls.str() # List structure of all objects in the workspace

13. `rm(list = ls())` - Clear the Workspace

Base R has no `clear()` function; the workspace is cleared with `rm(list = ls())`, as in item 3, which removes all
objects and effectively resets the environment.
Syntax: `rm(list = ls())`

Example:

rm(list = ls()) # Clears the workspace
ls() # Verify that the workspace is empty

14. `setwd()` - Set Working Directory

This function sets the working directory, which determines where files will be saved or read from.
Syntax: `setwd("path/to/directory")`

Example:

setwd("C:/Users/YourName/Documents") # Set the working directory

Conclusion

These functions are central to managing the R workspace. They let users list, remove, save, and load
objects, which makes it easier to keep an analysis organized. Removing objects that are no longer needed
also lets R reclaim their memory, which helps when working with large datasets.
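
The listing, checking, and removing functions above all accept an `envir` argument, so they can be tried out safely in a scratch environment instead of the real workspace. The sketch below assumes a throwaway environment named `scratch`, which is purely illustrative.

```r
scratch <- new.env()  # hypothetical scratch environment

assign("x", 10, envir = scratch)
assign("y", 20, envir = scratch)
print(ls(scratch))

# exists() with inherits = FALSE looks only in this environment
print(exists("x", envir = scratch, inherits = FALSE))

# rm() removes a single object; the other one is untouched
rm("x", envir = scratch)
print(exists("x", envir = scratch, inherits = FALSE))
print(ls(scratch))
```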

Q) i. What are the advantages of using functions over scripts?


ii. Suppose you have two datasets A and B.
Dataset A has the following data: 6 7 8 9
Dataset B has the following data: 1 2 4 5.
Which function is used to combine the data from both datasets into dataset C.
Demonstrate the function with the input values and write the output.

i. Advantages of Using Functions Over Scripts

Using functions in R offers several advantages over using standalone scripts:

1. Reusability:
Functions allow you to encapsulate code logic that can be reused multiple times with different
inputs. This reduces redundancy and promotes cleaner, more modular code.
Example: Instead of copying and pasting the same code multiple times, you can write a
function once and call it whenever needed.
2. Readability:
Functions help in making your code more readable and organized. With functions, you can
abstract complex operations and give them meaningful names.
Example: A function `calculate_sum()` is more readable and easier to understand than
manually writing out the summing logic multiple times.
3. Maintainability:
Functions make the code more maintainable since you only need to update or fix logic in one
place, rather than in multiple locations in a script.
Example: If you need to change the logic for calculating a value, you only need to update the
function, not every instance where that logic is used.
4. Modularity:
Functions break down large scripts into smaller, manageable pieces. Each function can handle
a specific task, making the code easier to debug and test.
Example: Instead of having one long script, you can have several smaller functions that each
handle different tasks.
5. Flexibility:
Functions can accept parameters and return values, which allows for flexible operations on
different data inputs. This makes them ideal for dynamic and data-driven tasks.
Example: You can write a function that sorts a dataset, and call it with different datasets
without modifying the function.
6. Avoiding Code Duplication:
By using functions, you avoid repeating the same code in different parts of your script. This
helps prevent errors and inconsistencies.
Example: If you need to calculate the mean of different columns in a dataset, you can write a
function that does this for any column, reducing repetitive code.
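
The last point can be made concrete with a small sketch. `column_mean()` and the `results` data frame are hypothetical names invented for this illustration.

```r
# Hypothetical helper: mean of any numeric column, chosen by name
column_mean <- function(data, column) {
  mean(data[[column]], na.rm = TRUE)
}

results <- data.frame(marks = c(56, 75, 48), age = c(25, 30, 35))

# One definition, reused with different inputs
print(column_mean(results, "marks"))
print(column_mean(results, "age"))
```

If the averaging rule ever changes (for example, to a trimmed mean), only the body of `column_mean()` needs editing, which is the maintainability advantage described above.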

ii. Combining Datasets A and B into Dataset C

To combine the two datasets (A and B), the `c()` function in R is typically used to concatenate elements.
This function combines data from two or more vectors into one.

Given Data:

Dataset A: `6, 7, 8, 9`
Dataset B: `1, 2, 4, 5`

We can use the `c()` function to combine these datasets into Dataset C.

R Code:

# Define the datasets
A <- c(6, 7, 8, 9)
B <- c(1, 2, 4, 5)

# Combine A and B into C
C <- c(A, B)

# Print the combined dataset C
print(C)

Output:

[1] 6 7 8 9 1 2 4 5

Explanation:

The `c()` function combines the two datasets, `A` and `B`, into a single vector `C`. This is a simple
concatenation of the values from both datasets in the order they are provided.
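
Two properties of `c()` are worth checking when combining vectors: the result has the combined length, and mixing types coerces everything to a common type. The sketch below illustrates both; `append()` and the `mixed` vector are extras that the question does not ask for.

```r
A <- c(6, 7, 8, 9)
B <- c(1, 2, 4, 5)

C <- c(A, B)
print(length(C))

# append() with default arguments gives the same result as c()
print(identical(append(A, B), C))

# Mixing types coerces everything to a common type (here, character)
mixed <- c(A, "ten")
print(class(mixed))
```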

Q) Consider the following data frame given below:

course id class marks
1      11 1     56
2      12 2     75
3      13 1     48
4      14 2     69
5      15 1     84
6      16 2     53

i. Create a subset of course less than 3 by using [ ] brackets and
demonstrate the output.
ii. Create a subset where the course column is less than 3 or the class
equals 2 by using the subset() function and demonstrate the output.
Q) List and discuss various types of data structures in R.

i. Subset of `course` less than 3 using `[]` brackets

To create a subset where the `course` column is less than 3 using `[]` brackets, you can filter the data
based on the condition directly within the indexing.

R Code:

# Create the data frame
data <- data.frame(
  course = c(1, 2, 3, 4, 5, 6),
  id = c(11, 12, 13, 14, 15, 16),
  class = c(1, 2, 1, 2, 1, 2),
  marks = c(56, 75, 48, 69, 84, 53)
)

# Subset where course < 3
subset_course <- data[data$course < 3, ]
print(subset_course)

Output:

  course id class marks
1      1 11     1    56
2      2 12     2    75

Explanation:

The expression `data$course < 3` creates a logical vector that is TRUE for the rows where the `course`
value is less than 3.
The `[]` brackets keep only those rows; leaving the column position empty after the comma keeps all columns.

ii. Subset where `course` is less than 3 or `class` equals 2 using `subset()` function

To create a subset where the `course` column is less than 3 or the `class` equals 2 using the `subset()`
function, you can define the condition inside the `subset()` function.

R Code:

# Create the data frame
data <- data.frame(
  course = c(1, 2, 3, 4, 5, 6),
  id = c(11, 12, 13, 14, 15, 16),
  class = c(1, 2, 1, 2, 1, 2),
  marks = c(56, 75, 48, 69, 84, 53)
)

# Subset where course < 3 or class == 2
subset_condition <- subset(data, course < 3 | class == 2)
print(subset_condition)

Output:

  course id class marks
1      1 11     1    56
2      2 12     2    75
4      4 14     2    69
6      6 16     2    53

Explanation:

The `subset()` function allows for logical conditions to filter rows. The condition `course < 3 |
class == 2` selects rows where the `course` is less than 3 or the `class` is equal to 2. Rows 1 and 2
qualify through the first condition, and rows 4 and 6 through the second.
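
The effect of `|` (OR) is clearest when compared with `&` (AND) on the same data. The sketch below rebuilds the data frame from the question's table, assuming that `id` runs from 11 to 16 and `class` alternates 1, 2; the names `either` and `both` are illustrative.

```r
# Data frame rebuilt from the question's table (class assumed to alternate 1, 2)
data <- data.frame(
  course = 1:6,
  id = 11:16,
  class = c(1, 2, 1, 2, 1, 2),
  marks = c(56, 75, 48, 69, 84, 53)
)

# OR keeps a row when either condition holds
either <- subset(data, course < 3 | class == 2)
print(either$course)

# AND keeps a row only when both conditions hold
both <- subset(data, course < 3 & class == 2)
print(both$course)
```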

Q) List and Discuss Various Types of Data Structures in R

R provides several types of data structures to store and manipulate data. Here are the main data
structures in R:

1. Vector

A vector is a basic data structure in R. It contains elements of the same type (e.g., numeric,
character, logical).
Types:
Numeric vector: Contains numbers.
Character vector: Contains strings.
Logical vector: Contains TRUE or FALSE values.
Example:

numeric_vector <- c(1, 2, 3, 4)
character_vector <- c("apple", "banana", "cherry")
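
Because a vector holds a single type, R silently converts mixed inputs to a common type. The following sketch shows a logical vector (the third type listed above) and two coercion cases; the variable names are illustrative.

```r
logical_vector <- c(TRUE, FALSE, TRUE)
print(class(logical_vector))

# Mixing numbers and strings produces a character vector
mixed <- c(1, "apple", TRUE)
print(class(mixed))
print(mixed)

# Mixing numbers and logicals produces a numeric vector (TRUE becomes 1)
numeric_mix <- c(1.5, TRUE, FALSE)
print(numeric_mix)
```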

2. Matrix

A matrix is a two-dimensional array where elements are of the same type (e.g., all numeric). It is
useful for mathematical computations.
Example:

m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

Output:

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

3. Array

An array is a multi-dimensional data structure that can store elements of the same type. It can be
viewed as an extension of matrices to more than two dimensions.
Example:

arr <- array(1:8, dim = c(2, 2, 2))
print(arr)

Output:

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

4. Data Frame

A data frame is a two-dimensional data structure, similar to a table in a database or a data frame in
Python's Pandas. It can store columns of different data types (numeric, character, etc.).
Example:

df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 35))
print(df)

Output:

     Name Age
1   Alice  25
2     Bob  30
3 Charlie  35

5. List

A list is a flexible data structure that can hold elements of different types, including other lists. Lists
are often used to store mixed-type data.
Example:

lst <- list(Name = "Alice", Age = 25, Scores = c(90, 85, 92))
print(lst)

Output:

$Name
[1] "Alice"

$Age
[1] 25

$Scores
[1] 90 85 92

6. Factor

A factor is used to represent categorical data. Factors are treated as integers with corresponding
levels, making them useful for modeling categorical variables.
Example:

f <- factor(c("low", "medium", "high", "medium"))
print(f)

Output:

[1] low    medium high   medium
Levels: high low medium
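
Note that the levels above are in alphabetical order, which is R's default. When the categories have a natural order, the `levels` argument can set it explicitly, as in this sketch (the `sizes` vector simply repeats the example data):

```r
sizes <- c("low", "medium", "high", "medium")

# Explicit levels replace the default alphabetical order
f <- factor(sizes, levels = c("low", "medium", "high"))
print(levels(f))

# The underlying integer codes follow the level order
codes <- as.integer(f)
print(codes)

# table() counts observations per level
print(table(f))
```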

7. Tibble

A tibble is a modern version of a data frame provided by the `tibble` package. It offers more
convenient printing and handling of data frames.
Example:

library(tibble)
tb <- tibble(Name = c("John", "Paul", "George"), Age = c(28, 30, 32))
print(tb)

Output:

# A tibble: 3 × 2
  Name     Age
  <chr>  <dbl>
1 John      28
2 Paul      30
3 George    32

8. Data Table

Data tables are an enhanced version of data frames, which are optimized for speed and memory
usage. They are part of the `data.table` package.
Example:

library(data.table)
dt <- data.table(Name = c("Tom", "Jerry"), Age = c(5, 3))
print(dt)

Output:

    Name Age
1:   Tom   5
2: Jerry   3

Conclusion

These data structures provide different ways to handle data in R, depending on the type and complexity
of the task at hand. Understanding when and how to use them is crucial for efficient data analysis in R.

i. The following table shows the number of units of different products sold on
different days:

Product Monday Tuesday Wednesday Thursday Friday
Bread 12 3 5
Milk 11 21 27 18 9
Cola Cans 20 10 1 33 6 15
Chocolate bars 12 6 7 4 13
Detergent 5 8 12 12
20 23

Create five sample numeric vectors from this data.
ii. Name and explain the operators used to form data subsets in R.

i. Creating Sample Numeric Vectors from the Given Data

We can create five numeric vectors, each corresponding to the data for one of the products. Each vector
will represent the number of units sold on different days (Monday, Tuesday, Wednesday, Thursday, and
Friday).

Given Data:

Product         Monday  Tuesday  Wednesday  Thursday  Friday
Bread           12      3        5          20        23
Milk            11      21       27         18        9
Cola Cans       20      10       1          33        6
Chocolate Bars  12      6        7          4         13
Detergent       5       8        12         12        12

Now, let's create five numeric vectors, one for each product.

R Code:

# Create numeric vectors for each product
bread_sales <- c(12, 3, 5, 20, 23)
milk_sales <- c(11, 21, 27, 18, 9)
cola_cans_sales <- c(20, 10, 1, 33, 6)
chocolate_bars_sales <- c(12, 6, 7, 4, 13)
detergent_sales <- c(5, 8, 12, 12, 12)

# Print the vectors
print(bread_sales)
print(milk_sales)
print(cola_cans_sales)
print(chocolate_bars_sales)
print(detergent_sales)

Output:

[1] 12  3  5 20 23
[1] 11 21 27 18  9
[1] 20 10  1 33  6
[1] 12  6  7  4 13
[1]  5  8 12 12 12

Explanation:

Each vector represents the number of units sold for a specific product across the five days of the
week (Monday to Friday).
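
Once the vectors exist, naming their elements by day makes them easier to query, and stacking them gives a small sales table. The sketch below uses two of the vectors with the values from the answer above; the `days` vector and the `sales` matrix are illustrative additions.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

bread_sales <- c(12, 3, 5, 20, 23)
milk_sales <- c(11, 21, 27, 18, 9)

# Attach day names so elements can be looked up by label
names(bread_sales) <- days
print(bread_sales["Wednesday"])

# Stack vectors as rows of a matrix, one row per product
sales <- rbind(Bread = bread_sales, Milk = milk_sales)
colnames(sales) <- days
print(rowSums(sales))
```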

ii. Operators Used to Form Data Subsets in R

In R, there are several operators and functions that allow you to subset data, whether it is in a vector,
matrix, data frame, or list. The most commonly used operators for subsetting data are:

1. Indexing Operator `[]`

The `[]` operator is used to subset data frames, vectors, and matrices. You can specify the row and
column indexes inside the brackets to extract specific elements.
Examples:
For a vector:

vec <- c(10, 20, 30, 40, 50)
subset_vec <- vec[2:4] # Extracts elements 2 to 4
print(subset_vec)

Output:

[1] 20 30 40

For a data frame:

df <- data.frame(A = 1:5, B = 6:10)
subset_df <- df[2:4, ] # Extracts rows 2 to 4
print(subset_df)

Output:

  A B
2 2 7
3 3 8
4 4 9

2. Column Subsetting Using `$` (for Data Frames)

The `$` operator is used to extract specific columns from a data frame by name.
Example:

df <- data.frame(A = 1:5, B = 6:10)
col_A <- df$A # Extracts the 'A' column
print(col_A)

Output:

[1] 1 2 3 4 5

3. Logical Subsetting

Logical conditions can be used inside the `[]` operator to subset data based on conditions. The
condition returns a logical vector (TRUE/FALSE) and is used to filter data.
Example:

vec <- c(10, 20, 30, 40, 50)
subset_vec <- vec[vec > 30] # Extracts values greater than 30
print(subset_vec)

Output:

[1] 40 50

4. `subset()` Function (for Data Frames)

The `subset()` function is used for filtering data frames based on a condition, and it is more
readable and intuitive for complex filtering.
Example:

df <- data.frame(A = 1:5, B = 6:10)
subset_df <- subset(df, A > 2) # Subsets rows where A > 2
print(subset_df)

Output:

  A  B
3 3  8
4 4  9
5 5 10

5. `which()` Function

The `which()` function is used to identify the indices of elements that meet a condition.
Example:

vec <- c(10, 20, 30, 40, 50)
indices <- which(vec > 30) # Identifies the indices where elements are greater than 30
print(indices)

Output:

[1] 4 5
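
One situation where `which()` behaves differently from a plain logical index is when the data contains missing values. The sketch below uses a made-up vector with an `NA` to show the difference.

```r
vec <- c(10, NA, 40, 50)

# A logical index propagates the NA into the result
with_na <- vec[vec > 30]
print(with_na)

# which() returns only positions where the condition is TRUE
positions <- which(vec > 30)
print(positions)
print(vec[positions])
```

This is also why `subset()` is often convenient on data with gaps: it drops rows where the condition evaluates to `NA`.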

6. Matrix Subsetting

In a matrix, rows and columns are selected using the `[]` operator, with row and column indices.
Example:

mat <- matrix(1:9, nrow = 3, byrow = TRUE)
subset_mat <- mat[1:2, 2:3] # Extracts the first two rows and columns 2 and 3
print(subset_mat)

Output:

     [,1] [,2]
[1,]    2    3
[2,]    5    6
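
When a matrix subset has only one row or column, `[]` simplifies the result to a plain vector unless `drop = FALSE` is given. The sketch below illustrates this with the same matrix; `row_vec`, `row_mat`, and `large_values` are illustrative names.

```r
mat <- matrix(1:9, nrow = 3, byrow = TRUE)

# Selecting a single row returns a plain vector by default
row_vec <- mat[1, ]
print(is.matrix(row_vec))

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- mat[1, , drop = FALSE]
print(dim(row_mat))

# Logical conditions work on matrices too and return the matching values
large_values <- mat[mat > 6]
print(large_values)
```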

Summary of Operators for Data Subsetting:

1. `[]` – Indexing operator for vectors, matrices, data frames, and lists.
2. `$` – Extracts specific columns from data frames.
3. Logical conditions – Used within `[]` or `subset()` for conditional subsetting.
4. `subset()` – A function for filtering data frames with conditions.
5. `which()` – Identifies indices based on a condition.

Each of these methods has its use cases depending on the data structure and the task at hand, making
data manipulation in R flexible and efficient.
