Subset Creation in R
To create the subsets from the given data frame, we'll use the `subset()` function and indexing with
square brackets `[ ]` in R. Here's how you can accomplish it:
Printed using Save ChatGPT as PDF, powered by PDFCrowd HTML to PDF API. 1/27
Part ii: Subset where `subject` is less than 3 and `class` equals 2 using `[ ]`
brackets
The data analyst of Argon technology Mr. John needs to enter the salaries of 10
employees in R. The salaries of the employees are given in the following table:
Sr. No. Name of employees Salaries
1 Vivek 21000
2 Karan 55000
3 James 67000
4 Soham 50000
5 Renu 54000
6 Farah 40000
7 Hetal 30000
8 Mary 70000
9 Ganesh 20000
10 Krish 15000
i. Which R command will Mr. John use to enter these values? Demonstrate the output.
ii. Now Mr. John wants to add the salaries of 5 new employees to the existing table. Which command will he use to join datasets with new values in R? Demonstrate the output.
To enter and manage the given salary data in R, Mr. John can use a combination of commands to create
the initial data frame and then append additional data to it. Here's how he can proceed:
To enter the salaries of the 10 employees, Mr. John can use the `data.frame()` function.
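A minimal sketch of that step, using the ten rows from the question. The object name `salaries` and the column names `sr_no`, `name`, and `salary` are assumptions made for illustration.

```r
# Salary table from the question, entered column by column
# (object and column names are assumptions, not from the original answer)
salaries <- data.frame(
  sr_no  = 1:10,
  name   = c("Vivek", "Karan", "James", "Soham", "Renu",
             "Farah", "Hetal", "Mary", "Ganesh", "Krish"),
  salary = c(21000, 55000, 67000, 50000, 54000,
             40000, 30000, 70000, 20000, 15000)
)

print(salaries)
```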
To add new rows of employee data, Mr. John can use the `rbind()` function. Here is an example with the
new employee data.
Sr. No. Name Salaries
13 Ajay 60000
14 Simran 52000
15 Meera 47000
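The append step might look like the sketch below. Only rows 13 to 15 of the new employees are listed here, so only those three are appended. The object and column names are assumptions, and `rbind()` requires both data frames to have the same column names.

```r
# Existing table (names are assumptions chosen for illustration)
salaries <- data.frame(
  sr_no  = 1:10,
  name   = c("Vivek", "Karan", "James", "Soham", "Renu",
             "Farah", "Hetal", "Mary", "Ganesh", "Krish"),
  salary = c(21000, 55000, 67000, 50000, 54000,
             40000, 30000, 70000, 20000, 15000)
)

# New rows: same column names as the existing data frame
new_rows <- data.frame(
  sr_no  = 13:15,
  name   = c("Ajay", "Simran", "Meera"),
  salary = c(60000, 52000, 47000)
)

# rbind() stacks the new rows under the existing ones
all_salaries <- rbind(salaries, new_rows)
print(all_salaries)
```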
i. Write the script to sort the values contained in the following vector in ascending
order and descending order: (23, 45, 10, 34, 89, 20, 67, 99). Demonstrate the
output.
ii. Name and explain the operators used to form data subsets in R.
print("Descending Order:")
print(descending)
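Only the last two lines of the sorting script appear here. A minimal sketch of a complete script follows; the names `values` and `ascending` are assumptions, while `descending` is taken from the lines shown.

```r
values <- c(23, 45, 10, 34, 89, 20, 67, 99)

# sort() returns ascending order by default
ascending <- sort(values)

# decreasing = TRUE reverses the order
descending <- sort(values, decreasing = TRUE)

print("Ascending Order:")
print(ascending)    # 10 20 23 34 45 67 89 99
print("Descending Order:")
print(descending)   # 99 89 67 45 34 23 20 10
```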
In R, several operators are used to form subsets of data. Here’s an explanation of the most common
ones:
1. Square Brackets `[ ]`
Used to extract specific elements from vectors, matrices, data frames, or lists.
Syntax: `object[row, column]`
Example:
# Subsetting a vector
vec <- c(10, 20, 30, 40)
vec[2] # Returns the second element
4. Logical Operators
Logical operators such as `&`, `|`, and `!` combine conditions; `subset()` is a function that creates subsets of data frames based on such logical conditions.
Example:
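A minimal sketch, assuming a made-up data frame `df` with columns `A` and `B`:

```r
# Made-up data frame for illustration
df <- data.frame(A = 1:5, B = 6:10)

# Keep rows where A is greater than 2 AND B is less than 10
result <- subset(df, A > 2 & B < 10)
print(result)
```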
These operators make subsetting data in R flexible and powerful, allowing fine-grained data selection
for analysis.
Q) i. Create a data frame from the following 4 vectors and demonstrate the output:
emp_id = c(1:5)
emp_name = c("Rick","Dan","Michelle", "Ryan","Gary")
start_date = c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")
salary = c(60000, 45000, 75000, 84000, 20000)
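One way to build and inspect this data frame is sketched below. The object name `emp` is an assumption, and the calls to `str()`, `summary()`, and the two-column selection are guesses at the follow-up steps, based on the `emp_name` and `salary` listing that appears further on.

```r
emp_id     <- c(1:5)
emp_name   <- c("Rick", "Dan", "Michelle", "Ryan", "Gary")
start_date <- c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")
salary     <- c(60000, 45000, 75000, 84000, 20000)

# Each vector becomes one column; all vectors must have the same length
emp <- data.frame(emp_id, emp_name, start_date, salary)
print(emp)

# Structure and summary of the data frame
str(emp)
print(summary(emp))

# Only the name and salary columns
print(emp[, c("emp_name", "salary")])
```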
Output
emp_name salary
1 Rick 60000
2 Dan 45000
3 Michelle 75000
4 Ryan 84000
5 Gary 20000
Q) List and explain various functions that allow users to handle data in R workspace with
appropriate examples.
In R, the workspace refers to the environment where all the data objects (vectors, matrices, data frames,
etc.) are stored. R provides a variety of functions that allow users to manage, manipulate, and query the
data stored in the workspace. Below is a list of key functions, along with explanations and examples, to
handle data in the R workspace:
1. `ls()` - List Objects in the Workspace
This function lists all the objects (variables, functions, etc.) currently in the R workspace.
Syntax: `ls()`
Example:
x <- 10
y <- 20
ls() # Lists the objects in the workspace
Output:
[1] "x" "y"
2. `rm()` - Remove Objects from the Workspace
This function removes specified objects from the R workspace, effectively deleting them.
Syntax: `rm(object_name)`
Example:
x <- 10
y <- 20
rm(x) # Removes the object 'x' from the workspace
ls() # Check the workspace after removal
Output:
[1] "y"
3. `rm(list = ls())` - Remove All Objects from the Workspace
This call removes all objects from the workspace by passing the `ls()` output to `rm()`.
Syntax: `rm(list = ls())`
Example:
x <- 10
y <- 20
rm(list = ls()) # Removes all objects from the workspace
ls() # Check if the workspace is empty
Output:
character(0)
4. `objects()` - List Objects in the Workspace
This function is an alias of `ls()`; it lists the names of all objects in the current environment.
Syntax: `objects()`
Example:
a <- 5
b <- 6
objects() # List all objects in the workspace
Output:
[1] "a" "b"
5. `exists()` - Check Whether an Object Exists
This function checks whether an object with the given name exists in the workspace.
Syntax: `exists("object_name")`
Example:
x <- 100
exists("x") # Check if 'x' exists in the workspace
Output:
[1] TRUE
6. `save()` - Save Objects to a File
This function saves one or more objects from the workspace to an external file (RData format).
Syntax: `save(object_name, file = "filename.RData")`
Example:
a <- 10
b <- 20
save(a, b, file = "data.RData") # Save 'a' and 'b' to 'data.RData'
7. `load()` - Load Objects from a File
This function loads objects from an external file into the workspace.
Syntax: `load("filename.RData")`
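A round trip through `save()` and `load()` can be sketched as follows. Writing to `tempdir()` and the name `rdata_path` are assumptions made so the example does not touch the working directory.

```r
a <- 10
b <- 20

# Temporary file location (an assumption for this sketch)
rdata_path <- file.path(tempdir(), "data.RData")
save(a, b, file = rdata_path)

# Remove the objects, then restore them from the file
rm(a, b)
restored <- load(rdata_path)  # load() returns the names of the restored objects

print(restored)  # "a" "b"
print(a)         # 10
print(b)         # 20
```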
8. `saveRDS()` - Save a Single Object to a File
This function saves a single R object to a file in RDS format, which can later be read back into R with
`readRDS()`.
Syntax: `saveRDS(object, file = "filename.rds")`
Example:
a <- c(1, 2, 3, 4, 5)
saveRDS(a, file = "vector.rds") # Save the vector 'a'
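9. `readRDS()` - Read a Single Object from a File
This function reads an object saved with `saveRDS()` back into R. Unlike `load()`, it returns the object, so the result must be assigned to a name.
Syntax: `readRDS("filename.rds")`
Example:
a <- readRDS("vector.rds") # Read back the vector saved above
print(a)
Output:
[1] 1 2 3 4 5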
10. `attach()` - Attach a Data Frame
This function adds a data frame to the search path, so its columns can be accessed directly by name without referencing the data frame.
Syntax: `attach(data_frame)`
11. `detach()` - Detach a Data Frame
This function removes the data frame from the search path, so its columns are no longer accessible by name alone.
Syntax: `detach(data_frame)`
Example:
detach(df) # Detach the data frame so its columns are no longer accessible directly
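The two calls are usually paired. A minimal sketch, assuming a made-up data frame `df` with hypothetical columns `emp_age` and `emp_height`:

```r
# Made-up data frame; column names are hypothetical
df <- data.frame(emp_age = c(25, 30, 35), emp_height = c(160, 170, 180))

attach(df)                 # columns are now visible by name
mean_age <- mean(emp_age)  # no df$ prefix needed
detach(df)                 # remove df from the search path again

# After detach(), the bare column name is no longer found
still_visible <- exists("emp_age")

print(mean_age)       # 30
print(still_visible)  # FALSE
```

Because attached columns can be masked by workspace objects of the same name, `df$emp_age` or `with(df, mean(emp_age))` is often the safer choice.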
12. `ls.str()` - List Objects with Their Structure
This function lists the names of all objects in the workspace along with their structure, which can
be helpful for understanding the content of larger objects.
Syntax: `ls.str()`
Example:
x <- 1:10
y <- data.frame(a = 1:5, b = letters[1:5])
ls.str() # List structure of all objects in the workspace
13. `rm(list = ls())` - Clear the Workspace
Base R has no `clear()` function. The call `rm(list = ls())` removes all visible objects from the workspace; loaded packages and options are unaffected, and objects whose names begin with a dot remain unless `ls(all.names = TRUE)` is used.
Syntax: `rm(list = ls())`
14. `setwd()` - Set the Working Directory
This function sets the working directory, which determines where files are saved to or read from when relative paths are used.
Syntax: `setwd("path/to/directory")`
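A minimal sketch of changing and restoring the working directory. Using `tempdir()` as the target is an assumption made so the example runs on any machine; the variable names are hypothetical.

```r
old_dir <- getwd()    # remember the current working directory
target  <- tempdir()  # assumed target directory for this sketch

setwd(target)         # relative paths now resolve inside 'target'
current <- getwd()
print(current)

setwd(old_dir)        # restore the original working directory
```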
Conclusion
These functions are crucial for managing the workspace in R. They allow users to list, remove, save, and
load data, making it easier to organize and manipulate data in the R environment. These functions also
help in saving time and memory when working with large datasets.
1. Reusability:
Functions allow you to encapsulate code logic that can be reused multiple times with different
inputs. This reduces redundancy and promotes cleaner, more modular code.
Example: Instead of copying and pasting the same code multiple times, you can write a
function once and call it whenever needed.
2. Readability:
Functions help in making your code more readable and organized. With functions, you can
abstract complex operations and give them meaningful names.
Example: A function `calculate_sum()` is more readable and easier to understand than
manually writing out the summing logic multiple times.
3. Maintainability:
Functions make the code more maintainable since you only need to update or fix logic in one
place, rather than in multiple locations in a script.
Example: If you need to change the logic for calculating a value, you only need to update the
function, not every instance where that logic is used.
4. Modularity:
Functions break down large scripts into smaller, manageable pieces. Each function can handle
a specific task, making the code easier to debug and test.
Example: Instead of having one long script, you can have several smaller functions that each
handle different tasks.
5. Flexibility:
Functions can accept parameters and return values, which allows for flexible operations on
different data inputs. This makes them ideal for dynamic and data-driven tasks.
Example: You can write a function that sorts a dataset, and call it with different datasets
without modifying the function.
6. Avoiding Code Duplication:
By using functions, you avoid repeating the same code in different parts of your script. This
helps prevent errors and inconsistencies.
Example: If you need to calculate the mean of different columns in a dataset, you can write a
function that does this for any column, reducing repetitive code.
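The column-mean example from the last point can be sketched as a small reusable function. The function name `column_mean` and the data frame `scores` are hypothetical.

```r
# One function, reused for any column of any data frame
column_mean <- function(data, column) {
  mean(data[[column]], na.rm = TRUE)  # ignore missing values
}

# Made-up data for illustration
scores <- data.frame(math = c(80, 90, 100), science = c(70, 75, NA))

math_mean    <- column_mean(scores, "math")     # 90
science_mean <- column_mean(scores, "science")  # 72.5
print(c(math_mean, science_mean))
```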
To combine the two datasets (A and B), the `c()` function in R is typically used to concatenate elements.
This function combines data from two or more vectors into one.
Given Data:
Dataset A: `6, 7, 8, 9`
Dataset B: `1, 2, 4, 5`
We can use the `c()` function to combine these datasets into Dataset C.
R Code:
A <- c(6, 7, 8, 9) # Dataset A
B <- c(1, 2, 4, 5) # Dataset B
C <- c(A, B) # Concatenate A and B
print(C)
Output:
[1] 6 7 8 9 1 2 4 5
Explanation:
The `c()` function combines the two datasets, `A` and `B`, into a single vector `C`. This is a simple
concatenation of the values from both datasets in the order they are provided.
Q) Consider the following data frame given below:
course id class marks
1 11 1 56
2 12 2 75
3 13 1 48
4 14 2 69
5 15 1 84
6 16 2 53
i. Create a subset of course less than 3 by using [ ] brackets and demonstrate the output.
ii. Create a subset where the course column is less than 3 or the class equals to 2 by using subset () function and demonstrate the output.
Q) List and discuss various types of data structures in R.
To create a subset where the `course` column is less than 3 using `[]` brackets, you can filter the data
based on the condition directly within the indexing.
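A minimal sketch, assuming the table in the question has the columns `course`, `id`, `class`, and `marks`. The data frame name `data` follows the explanation; `subset_i` is a hypothetical name.

```r
# Data frame from the question (column layout is an assumption)
data <- data.frame(
  course = 1:6,
  id     = 11:16,
  class  = c(1, 2, 1, 2, 1, 2),
  marks  = c(56, 75, 48, 69, 84, 53)
)

# Rows where course < 3; the empty column position keeps all columns
subset_i <- data[data$course < 3, ]
print(subset_i)
```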
Output:
course id class marks
1 1 11 1 56
2 2 12 2 75
Explanation:
The expression `data$course < 3` creates a logical vector that is TRUE for the rows where the `course`
value is less than 3.
The `[]` brackets are used to subset the data frame based on this condition.
ii. Subset where `course` is less than 3 or `class` equals 2 using `subset()` function
To create a subset where the `course` column is less than 3 or the `class` equals 2 using the `subset()`
function, you can define the condition inside the `subset()` function.
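A minimal sketch, again assuming the columns `course`, `id`, `class`, and `marks` from the question; `subset_ii` is a hypothetical name.

```r
# Data frame from the question (column layout is an assumption)
data <- data.frame(
  course = 1:6,
  id     = 11:16,
  class  = c(1, 2, 1, 2, 1, 2),
  marks  = c(56, 75, 48, 69, 84, 53)
)

# Inside subset(), columns are referred to by bare name; | means OR
subset_ii <- subset(data, course < 3 | class == 2)
print(subset_ii)  # keeps rows 1, 2, 4 and 6
```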
Explanation:
The `subset()` function allows for logical conditions to filter rows. The condition `course < 3 |
class == 2` selects rows where the `course` is less than 3 or the `class` is equal to 2.
R provides several types of data structures to store and manipulate data. Here are the main data
structures in R:
1. Vector
A vector is a basic data structure in R. It contains elements of the same type (e.g., numeric,
character, logical).
Types:
Numeric vector: Contains numbers.
Character vector: Contains strings.
Logical vector: Contains TRUE or FALSE values.
Example:
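A minimal sketch of the three vector types, with made-up values and hypothetical variable names. The last line shows that mixing types forces everything to a single type.

```r
num_vec <- c(1.5, 2, 3)           # numeric vector
chr_vec <- c("a", "b", "c")       # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)   # logical vector

print(class(num_vec))  # "numeric"
print(class(chr_vec))  # "character"
print(class(lgl_vec))  # "logical"

# Mixed inputs are coerced to one common type (here: character)
mixed <- c(1, "a", TRUE)
print(mixed)           # "1" "a" "TRUE"
```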
2. Matrix
A matrix is a two-dimensional array where elements are of the same type (e.g., all numeric). It is
useful for mathematical computations.
Example:
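A minimal sketch with made-up values; the name `mat` is an assumption. By default `matrix()` fills column by column.

```r
# 2 rows, 3 columns, filled column-wise: 1 2 | 3 4 | 5 6
mat <- matrix(1:6, nrow = 2, ncol = 3)
print(mat)

print(dim(mat))    # 2 3
print(mat[2, 3])   # element in row 2, column 3: 6
```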
3. Array
An array is a multi-dimensional data structure that can store elements of the same type. It can be
viewed as an extension of matrices to more than two dimensions.
Example:
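A call that produces output of this shape, assuming the values 1 to 8 and the hypothetical name `arr`:

```r
# Three dimensions: 2 rows, 2 columns, 2 layers
arr <- array(1:8, dim = c(2, 2, 2))
print(arr)

# Index as [row, column, layer]
print(arr[2, 1, 2])  # 6
```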
Output:
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
4. Data Frame
A data frame is a two-dimensional data structure, similar to a table in a database or a data frame in
Python's Pandas. It can store columns of different data types (numeric, character, etc.).
Example:
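A minimal sketch; the values are chosen to match the output shown here, and the name `people` is an assumption.

```r
# A character column and a numeric column in one table
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age  = c(25, 30, 35)
)
print(people)
```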
Output:
Name Age
1 Alice 25
2 Bob 30
3 Charlie 35
5. List
A list is a flexible data structure that can hold elements of different types, including other lists. Lists
are often used to store mixed-type data.
Example:
lst <- list(Name = "Alice", Age = 25, Scores = c(90, 85, 92))
print(lst)
Output:
$Name
[1] "Alice"
$Age
[1] 25
$Scores
[1] 90 85 92
6. Factor
A factor is used to represent categorical data. Factors are treated as integers with corresponding
levels, making them useful for modeling categorical variables.
Example:
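A minimal sketch with made-up categories; the name `sizes` and the level order are assumptions.

```r
# Explicit levels fix the order of the categories
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

print(sizes)
print(levels(sizes))      # "small" "medium" "large"
print(as.integer(sizes))  # underlying integer codes: 1 3 2 1
```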
7. Tibble
A tibble is a modern version of a data frame provided by the `tibble` package. It offers more
convenient printing and handling of data frames.
Example:
library(tibble)
tb <- tibble(Name = c("John", "Paul", "George"), Age = c(28, 30, 32))
print(tb)
Output:
# A tibble: 3 × 2
Name Age
<chr> <dbl>
1 John 28
2 Paul 30
3 George 32
8. Data Table
Data tables are an enhanced version of data frames, which are optimized for speed and memory
usage. They are part of the `data.table` package.
Example:
library(data.table)
dt <- data.table(Name = c("Tom", "Jerry"), Age = c(5, 3))
print(dt)
Output:
Name Age
1: Tom 5
2: Jerry 3
Conclusion
These data structures provide different ways to handle data in R, depending on the type and complexity
of the task at hand. Understanding when and how to use them is crucial for efficient data analysis in R.
i. The following table shows the number of units of different products sold on different days:
Product Monday Tuesday Wednesday Thursday Friday
Bread 12 3 5 20 23
Milk 11 21 27 18 9
Cola Cans 20 10 1 33 6
Chocolate bars 12 6 7 4 13
Detergent 5 8 12 12 12
Create five sample numeric vectors from this data.
ii. Name and explain the operators used to form data subsets in R.
We can create five numeric vectors, each corresponding to the data for one of the products. Each vector
will represent the number of units sold on different days (Monday, Tuesday, Wednesday, Thursday, and
Friday).
Given Data:
Product Monday Tuesday Wednesday Thursday Friday
Bread 12 3 5 20 23
Milk 11 21 27 18 9
Cola Cans 20 10 1 33 6
Chocolate Bars 12 6 7 4 13
Detergent 5 8 12 12 12
Now, let's create five numeric vectors, one for each product.
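A minimal sketch using the values listed under Given Data and in the printed output; the vector names are assumptions.

```r
# Units sold Monday to Friday, one vector per product
bread          <- c(12, 3, 5, 20, 23)
milk           <- c(11, 21, 27, 18, 9)
cola_cans      <- c(20, 10, 1, 33, 6)
chocolate_bars <- c(12, 6, 7, 4, 13)
detergent      <- c(5, 8, 12, 12, 12)

print(bread)
print(milk)
print(cola_cans)
print(chocolate_bars)
print(detergent)
```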
Output:
[1] 12 3 5 20 23
[1] 11 21 27 18 9
[1] 20 10 1 33 6
[1] 12 6 7 4 13
[1] 5 8 12 12 12
Explanation:
Each vector represents the number of units sold for a specific product across the five days of the
week (Monday to Friday).
In R, there are several operators and functions that allow you to subset data, whether it is in a vector,
matrix, data frame, or list. The most commonly used operators for subsetting data are:
1. Square Brackets `[]`
The `[]` operator is used to subset data frames, vectors, and matrices. You can specify the row and
column indexes inside the brackets to extract specific elements.
Examples:
For a vector:
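A minimal sketch, assuming a five-element vector `vec` whose values are picked to match the output shown:

```r
vec <- c(10, 20, 30, 40, 50)

# Positions 2 through 4
middle <- vec[2:4]
print(middle)  # 20 30 40
```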
Output:
[1] 20 30 40
For a data frame:
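Row subsetting of a data frame can be sketched as follows, assuming a data frame `df` with columns `A` and `B` chosen to match the output shown here:

```r
df <- data.frame(A = 1:5, B = 6:10)

# Rows 2 to 4, all columns (the column position is left empty)
rows_2_to_4 <- df[2:4, ]
print(rows_2_to_4)
```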
Output:
A B
2 2 7
3 3 8
4 4 9
2. `$` Operator
The `$` operator is used to extract a single column from a data frame by name.
Example:
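A minimal sketch, assuming a made-up data frame `df` with columns `A` and `B`:

```r
df <- data.frame(A = 1:5, B = 6:10)

# Extract column A as a plain vector
col_a <- df$A
print(col_a)  # 1 2 3 4 5
```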
Output:
[1] 1 2 3 4 5
3. Logical Subsetting
Logical conditions can be used inside the `[]` operator to subset data based on conditions. The
condition returns a logical vector (TRUE/FALSE) and is used to filter data.
Example:
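A minimal sketch, assuming a vector `vec` and a threshold of 30, both picked to match the output shown:

```r
vec <- c(10, 20, 30, 40, 50)

keep <- vec > 30   # FALSE FALSE FALSE TRUE TRUE
large <- vec[keep] # elements where the condition is TRUE
print(large)       # 40 50
```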
Output:
[1] 40 50
4. `subset()` Function
The `subset()` function is used for filtering data frames based on a condition, and it is more
readable and intuitive for complex filtering.
Example:
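A minimal sketch, assuming a made-up data frame `df` and the condition `A > 2`, which yields rows 3 to 5:

```r
df <- data.frame(A = 1:5, B = 6:10)

# Column names can be used directly inside subset()
filtered <- subset(df, A > 2)
print(filtered)
```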
Output:
A B
3 3 8
4 4 9
5 5 10
5. `which()` Function
The `which()` function is used to identify the indices of elements that meet a condition.
Example:
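A minimal sketch, assuming the same kind of made-up vector `vec` and a threshold of 30:

```r
vec <- c(10, 20, 30, 40, 50)

# which() returns positions, not values
idx <- which(vec > 30)
print(idx)       # 4 5
print(vec[idx])  # the matching values: 40 50
```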
Output:
[1] 4 5
6. Matrix Subsetting
In a matrix, rows and columns are selected using the `[]` operator, with row and column indices.
Example:
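A minimal sketch, assuming a 3 x 3 matrix `mat` filled row by row; the values and indices are picked to match the output shown.

```r
# byrow = TRUE fills 1 2 3 / 4 5 6 / 7 8 9
mat <- matrix(1:9, nrow = 3, byrow = TRUE)

# Rows 1 to 2, columns 2 to 3
sub_mat <- mat[1:2, 2:3]
print(sub_mat)
```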
Output:
[,1] [,2]
[1,] 2 3
[2,] 5 6
1. `[]` – Indexing operator for vectors, matrices, data frames, and lists.
2. `$` – Extracts specific columns from data frames.
3. Logical conditions – Used within `[]` or `subset()` for conditional subsetting.
4. `subset()` – A function for filtering data frames with conditions.
5. `which()` – Identifies indices based on a condition.
Each of these methods has its use cases depending on the data structure and the task at hand, making
data manipulation in R flexible and efficient.