How to Extract a Random Sample of Rows in an R DataFrame with a Nested Condition
Last Updated: 24 Jun, 2021
In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language when the rows must satisfy a nested condition.
Method 1: Using sample()
We will use the sample() function for this task. In R, sample() draws random samples according to the arguments in the function call. Its first argument is either a vector, whose elements are sampled, or a single positive integer x, in which case it samples from 1:x.
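To make the two input types concrete, here is a small sketch (the values and variable names are ours, not from the original article). Note the last call: when x is a single positive number, sample() draws from 1:x rather than returning x itself, which matters later because which() can return a single index.

```r
set.seed(123)  # fix the random number generator so the draws can be repeated

years <- c(10, 51, 19, 126, 99)

# A vector as x: draw 3 of its values without replacement
picked <- sample(years, 3)
print(picked)

# A vector with no size argument: a random permutation of all its values
shuffled <- sample(years)
print(shuffled)

# A single positive integer as x: sample() treats 4 as the vector 1:4
from_int <- sample(4, 2)
print(from_int)
```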
The other function we will use is which(), which lets us express the condition that rows must satisfy in order to be sampled. which() takes a logical vector and returns the indices of its TRUE elements; NA values are skipped, just like FALSE.
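A quick sketch of what which() hands to sample(), using the columns of the sample data frame from this article (the conditions are illustrative): the comparison yields one TRUE/FALSE/NA per row, and which() keeps only the positions of the TRUE values.

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95))

# Logical vector with one entry per row
print(df$year > 50)

# which() returns the row positions where the condition is TRUE
idx <- which(df$year > 50)
print(idx)      # -> 2 4 5

# Comparisons with NA give NA, and which() drops those positions
idx_len <- which(df$length > 50)
print(idx_len)  # -> 4 5
```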
Syntax: df[sample(which(conditions), n), ]
Parameters:
- df: DataFrame
- n: number of rows to sample
- conditions: logical condition that the sampled rows must satisfy; it can be nested by combining comparisons with &, | and parentheses. Ex: df$year > 5
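The worked examples below use single comparisons, so here is a sketch of an actual nested condition plugged into the same syntax. The particular condition (year above 20 AND either education is "yes" or length is missing) is only an illustration chosen for this data frame.

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Nested condition: year > 20 AND (education is "yes" OR length is NA)
keep <- df$year > 20 & (df$education == "yes" | is.na(df$length))

# Positions of the qualifying rows (rows 2 and 5 for this data)
idx <- which(keep)

# Draw 2 of them; with exactly 2 qualifying rows this returns both,
# in random order
picked <- df[sample(idx, 2), ]
print(picked)
```

Parentheses matter here: without them, & binds tighter than |, so the condition would be read as (year > 20 & education == "yes") | is.na(length).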
DataFrame in Use:
|   | name    | year | length | education |
|---|---------|------|--------|-----------|
| 1 | Welcome | 10   | 40     | yes       |
| 2 | to      | 51   | NA     | yes       |
| 3 | Geeks   | 19   | NA     | no        |
| 4 | for     | 126  | 100    | no        |
| 5 | Geeks   | 99   | 95     | yes       |
To apply this approach, first create the data frame, then use which() to find the indices of the rows that meet the condition, pass those indices to sample(), and subset the data frame with the sampled indices. The implementations below use the data frame shown above.
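One caveat with df[sample(which(conditions), n), ]: if exactly one row matches, which() returns a single number such as 4, and sample() then draws from 1:4 instead of returning row 4. A defensive sketch is shown below; the helper name sample_rows_where() is hypothetical, and it follows the "sample positions within the index vector" idiom from R's ?sample help page.

```r
# Hypothetical helper (not part of the original article): sample n rows of df
# that satisfy the logical vector `cond`, safe even when only one row matches
sample_rows_where <- function(df, cond, n) {
  idx <- which(cond)  # positions of matching rows (NA treated as FALSE)
  if (n > length(idx)) {
    stop("n is larger than the number of matching rows")
  }
  # Sample positions *within* idx instead of sampling idx itself,
  # so a single matching index is never expanded to 1:idx
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99))

# Only row 4 has year > 100; the naive df[sample(which(df$year > 100), 1), ]
# could return any of rows 1 to 4
one_row <- sample_rows_where(df, df$year > 100, 1)
print(one_row)
```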
Example 1:
R
# Build the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Randomly pick 2 of the rows whose year is greater than 5
# (the rows returned change from run to run)
print("2 samples")
df[sample(which(df$year > 5), 2), ]
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
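Because the rows are drawn at random, running the example again will usually print different rows. If a reproducible draw is needed, one option is to set the random seed first; the seed value 2021 below is arbitrary.

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Same seed before each draw gives the same sampled rows
set.seed(2021)
first <- df[sample(which(df$year > 5), 2), ]

set.seed(2021)
second <- df[sample(which(df$year > 5), 2), ]

print(identical(first, second))  # -> TRUE
```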
Example 2:
R
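# Build the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Randomly pick 3 of the rows whose education is not "no"
# (only 3 rows qualify, so this returns all of them in random order)
print("3 samples")
df[sample(which(df$education != "no"), 3), ]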
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "3 samples"
name year length education
5 Geeks 99 95 yes
1 Welcome 10 40 yes
2 to 51 NA yes
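In this example exactly three rows satisfy the condition, so n = 3 is the largest sample possible without replacement. The sketch below shows what happens when more rows are requested, and how replace = TRUE (a standard argument of sample()) allows rows to be drawn more than once.

```r
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

idx <- which(df$education != "no")  # rows 1, 2 and 5

# Requesting 4 rows without replacement raises an error; capture its message
result <- tryCatch(df[sample(idx, 4), ],
                   error = function(e) conditionMessage(e))
print(result)

# With replace = TRUE the same qualifying row may appear several times
boot <- df[sample(idx, 5, replace = TRUE), ]
print(boot)
```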
Method 2: Using sample_n() function
The sample_n() function from the dplyr package takes a random sample of rows from a data frame.
Syntax: sample_n(tbl, size)
Parameters:
- tbl: data frame
- size: number of rows to select
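Since dplyr 1.0.0, sample_n() (and sample_frac()) have been superseded by slice_sample(); sample_n() still works, but newer code typically uses slice_sample() with either n = (a number of rows) or prop = (a fraction of rows). A sketch of the same filter-then-sample pattern with it:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# A fixed number of rows from the filtered data
two_rows <- df %>%
  filter(name != "to") %>%
  slice_sample(n = 2)
print(two_rows)

# A fraction of the filtered rows (half of the 4 remaining rows)
half_rows <- df %>%
  filter(name != "to") %>%
  slice_sample(prop = 0.5)
print(half_rows)
```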
Along with sample_n(), we also use the filter() function, which keeps only the rows of a data frame that satisfy a logical expression.
Syntax: filter(x, expr)
Parameters:
- x: Object to be filtered
- expr: logical expression that decides which rows are kept
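Inside filter(), columns can be referred to by their bare names, and a nested condition is written just as in base R. Conditions separated by commas are combined with AND. A sketch, using the same illustrative condition as before:

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Nested condition inside filter(); `length` here refers to the column
nested <- filter(df, year > 20 & (education == "yes" | is.na(length)))
print(nested)

# Comma-separated conditions are ANDed, so this keeps the same rows
same <- filter(df, year > 20, education == "yes" | is.na(length))
print(identical(nested, same))  # -> TRUE
```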
We load the dplyr package because it provides both filter() and sample_n(). The data frame df and the (possibly nested) condition are passed to filter(), and the filtered result is then piped with %>% into sample_n(), which draws n random rows from the rows that satisfy the condition.
Syntax: filter(df, condition) %>% sample_n(., n)
Parameters:
- df: data frame
- condition: the (possibly nested) condition that rows must satisfy. Ex: name != "to"
- n: number of rows to sample
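In filter(df, condition) %>% sample_n(., n), the "." stands for the result of the left-hand side of the pipe. The sketch below, which uses an arbitrary seed, shows that the placeholder form, a plain nested call, and the pipe without "." all draw the same rows.

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

# Pipe with an explicit "." placeholder, as in the syntax above
set.seed(7)
with_dot <- filter(df, name != "to") %>% sample_n(., 2)

# The same operation written as a nested call
set.seed(7)
nested_call <- sample_n(filter(df, name != "to"), 2)

# Pipe without the placeholder: the left-hand side becomes the first argument
set.seed(7)
without_dot <- filter(df, name != "to") %>% sample_n(2)

print(identical(with_dot, nested_call) && identical(nested_call, without_dot))  # -> TRUE
```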
Example 1:
R
library(dplyr)

# Build the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Keep the rows whose name is not "to", then randomly pick 2 of them
print("2 samples")
filter(df, name != "to") %>% sample_n(., 2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 Geeks 99 95 yes
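As the output above shows, the sampled rows are numbered 1 and 2 rather than keeping their original row numbers, unlike the base R approach in Method 1. If the original positions matter, one option is to store them in a column before filtering; the column name row_id below is our own choice.

```r
library(dplyr)

df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))

sampled <- df %>%
  mutate(row_id = row_number()) %>%  # remember each row's original position
  filter(name != "to") %>%
  sample_n(2)
print(sampled)
```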
Example 2:
R
library(dplyr)

# Build the sample data frame
df <- data.frame(name = c("Welcome", "to", "Geeks", "for", "Geeks"),
                 year = c(10, 51, 19, 126, 99),
                 length = c(40, NA, NA, 100, 95),
                 education = c("yes", "yes", "no", "no", "yes"))
df

# Keep the rows whose year is greater than 20, then randomly pick 2 of them
print("2 samples")
filter(df, year > 20) %>% sample_n(., 2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 for 126 100 no
2 to 51 NA yes