How to filter R DataFrame by values in a column?
Last Updated :
30 May, 2021
In R Programming Language, dataframe columns can be subjected to constraints, and produce smaller subsets. However, while the conditions are applied, the following properties are maintained :
- Rows are considered to be a subset of the input.
- Rows in the subset appear in the same order as the original dataframe.
- Columns remain unmodified.
- The number of groups may be reduced, based on conditions.
- dataframe attributes are preserved during data filter.
Method 1 : Using dataframe indexing
Any dataframe column in the R programming language can be referenced either through its name df$col-name or using its index position in the dataframe df[col-index]. The cell values of this column can then be subjected to constraints, logical or comparative conditions, and then a dataframe subset can be obtained. These conditions are applied to the row index of the dataframe so that the satisfied rows are returned.
- Selection based on a check of missing values or NA
Cells in dataframe can contain missing values or NA as its elements, and they can be verified using is.na() method in R language.
Example:
R
# declaring a dataframe
data_frame = data.frame(col1 = c(NA,"b",NA,"e","e") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values are not NA
data_frame_mod <- data_frame[!is.na(data_frame$col1),]
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 <NA> 0 TRUE
2 b 2 FALSE
3 <NA> 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
2 b 2 FALSE
4 e 4 TRUE
5 e 5 TRUE
- Selection based on a single comparative condition on a column
Column values can be subjected to constraints to filter and subset the data. The values can be mapped to specific occurrences or within a range.
Example:
R
# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","e","e","e") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which columns have col3
# value equivalent to true
data_frame_mod <- data_frame[data_frame$col3==TRUE,]
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 e 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
4 e 4 TRUE
5 e 5 TRUE
- Selection based on multiple comparative conditions on a column
Column values can be subjected to constraints to filter and subset the data. The conditions can be combined by logical & or | operators. The %in% operator is used here, in order to check values that match to any of the values within a specified vector.
Example:
R
# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","d","e","e") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1
# are equivalent to b or e
data_frame_mod <- data_frame[data_frame$col1 %in% c("b","e"),]
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
4 e 4 TRUE
5 e 5 TRUE
Method 2 : Using dplyr library
The dplyr library can be installed and loaded into the working space which is used to perform data manipulation.
The filter() function is used to produce a subset of the dataframe, retaining all rows that satisfy the specified conditions. The filter() method in R can be applied to both grouped and ungrouped data. The expressions include comparison operators (==, >, >= ) , logical operators (&, |, !, xor()) , range operators (between(), near()) as well as NA value check against the column values. The subset dataframe has to be retained in a separate variable.
Syntax:
filter(df , cond)
Parameter :
df - The dataframe object
cond - The condition to filter the data upon
Example:
R
library ("dplyr")
# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","d","e","e") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1
# are equivalent to b or e
data_frame_mod <- filter(data_frame,col2>1)
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 2 FALSE
2 e 4 TRUE
3 e 5 TRUE
Also, the values can be checked using the %in% operator to match the column cell values with the elements contained in the input specified vector.
Example:
R
library ("dplyr")
# declaring a dataframe
data_frame = data.frame(col1 = c("b","b","d","e","e") ,
col2 = c(0,2,1,4,5),
col3= c(TRUE,FALSE,FALSE,TRUE, TRUE))
print ("Original dataframe")
print (data_frame)
# checking which values of col1
# are equivalent to b or e
data_frame_mod <- filter(data_frame,col1 %in% c("b","e"))
print ("Modified dataframe")
print (data_frame_mod)
Output
[1] "Original dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
3 d 1 FALSE
4 e 4 TRUE
5 e 5 TRUE
[1] "Modified dataframe"
col1 col2 col3
1 b 0 TRUE
2 b 2 FALSE
4 e 4 TRUE
5 e 5 TRUE
Similar Reads
How to extract column from data frame as vector in R ? In this article, we are going to convert dataframe column into a vector in R Programming Language. Steps -Create vectorsCreate a dataframe by passing these vectorsConvert dataframe column using"[[]]" operator (indexing). [[]] is used to access the dataframe column.It is used to index the dataframe.
1 min read
How to filter R dataframe by multiple conditions? In R programming Language, dataframe columns can be subjected to constraints, and produce smaller subsets. However, while the conditions are applied, the following properties are maintained : Rows are considered to be a subset of the input.Rows in the subset appear in the same order as the original
5 min read
Drop column(s) by name from a given DataFrame in R Dropping of columns from a data frame is simply used to remove the unwanted columns in the data frame. In this article, we will be discussing the two different approaches to drop columns by name from a given Data Frame in R. The different approaches to drop columns by the name from a data frame is R
3 min read
How to Select DataFrame Columns by Index in R? In this article, we will discuss how to select columns by index from a dataframe in R programming language. Note: The indexing of the columns in the R programming language always starts from 1. Method 1: Select Specific Columns By Index with Base R Here, we are going to select columns by using index
2 min read
Get All Factor Levels of DataFrame Column in R The data frame columns in R can be factorized on the basis of its factor columns. The data frame factor columns are composed of factor levels. Factors are used to represent categorical data. Each of the factor is denoted by a level, computed in the lexicographic order of appearance of characters or
3 min read
Select DataFrame Rows where Column Values are in Range in R In this article, we will discuss how to select dataframe rows where column values are in a range in R programming language. Data frame indexing can be used to extract rows or columns from the dataframe. The condition can be applied to the specific columns of the dataframe and combined using the logi
2 min read