How to Find and Count Missing Values in R DataFrame
Last Updated: 10 Jul, 2025
In R programming, missing values are commonly represented as NA. To identify and handle these values effectively, we can use the is.na() function, which checks whether a data point is missing.
Syntax
which(is.na(data))
sum(is.na(data))
Where:
- is.na(data): Identifies missing values and returns TRUE for each NA.
- which(is.na(data)): Returns the index positions of missing values.
- sum(is.na(data)): Calculates the total number of missing values.
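As a minimal sketch of how these three calls relate, the snippet below applies them to a small numeric vector. The vector x, its values, and the variable names are made up for illustration.

```r
# Hypothetical vector with two missing entries (values are illustrative only)
x <- c(10, NA, 30, NA, 50)

na_flags <- is.na(x)             # logical vector: TRUE where x is NA
na_positions <- which(na_flags)  # indices of the TRUE entries
na_count <- sum(na_flags)        # TRUE counts as 1, FALSE as 0

print(na_flags)
print(na_positions)
print(na_count)
```

Storing the result of is.na() once and reusing it avoids scanning the data twice.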
Find and count the Missing Values from the entire Data Frame
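We create a data frame named stats and use which(is.na()) to get the positions of missing values and sum(is.na()) to get the total number. Because is.na() on a data frame returns a logical matrix, which() reports each position as a single index counted column by column, not as a row and column pair.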
- data.frame: creates tabular data from vectors.
- is.na: checks whether a value is missing (NA).
- which: returns the positions of TRUE values in a logical vector.
- sum: counts TRUE values by summing them (as TRUE = 1, FALSE = 0).
R
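# Sample data: runs and wickets each contain one NA
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))
# Positions of the NAs, counted column by column across the whole data frame
print("Position of missing values ")
which(is.na(stats))
# Total number of NAs in the data frame
print("Count of total missing values ")
sum(is.na(stats))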
Output:
[1] "Position of missing values "
[1]  8 11
[1] "Count of total missing values "
[1] 2
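The indices that which() returns for a whole data frame count cells column by column, so they can be hard to map back to a particular row and column. One possible way to get row and column numbers instead is the arr.ind argument of which(). The sketch below reuses the same stats data; the names na_matrix, na_cells and na_columns are illustrative.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# is.na() on a data frame returns a logical matrix with one cell per value
na_matrix <- is.na(stats)

# arr.ind = TRUE asks which() for (row, col) pairs instead of single indices
na_cells <- which(na_matrix, arr.ind = TRUE)
print(na_cells)

# Translate the column numbers back to column names
na_columns <- colnames(stats)[na_cells[, "col"]]
print(na_columns)
```

Each row of na_cells describes one missing cell, which is usually easier to act on than a single column-major index.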
Count the number of Missing Values with summary
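We use summary() to get descriptive statistics for each column. For numeric columns that contain missing values, the result includes an NA's entry with their count; character columns typically show only length, class and mode, so their missing values are not counted.
- summary: gives descriptive statistics per column, plus an NA count for numeric columns that contain missing values.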
R
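# Sample data: two NAs in runs, one NA in wickets
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))
# Per-column summary; numeric columns with NAs get an NA's entry
summary(stats)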
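Output:
For runs and wickets, the summary lists the minimum, quartiles, median, mean and maximum, followed by an NA's row showing 2 for runs and 1 for wickets. For player, stored as a character column in R 4.0 and later, it typically lists only Length, Class and Mode.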
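How summary() reports missing values depends on the column type. The sketch below, which assumes a made-up data frame named scores with an NA in a character column, reads the NA count out of a numeric summary and counts the character column's NAs directly with is.na().

```r
# Illustrative data: one NA in a character column, two in a numeric column
scores <- data.frame(player = c("A", NA, "C", "D"),
                     runs = c(NA, 200, 408, NA),
                     stringsAsFactors = FALSE)

runs_summary <- summary(scores$runs)      # numeric summary with an "NA's" entry
player_summary <- summary(scores$player)  # character summary: typically Length, Class, Mode

print(names(runs_summary))
print(names(player_summary))

runs_na <- runs_summary[["NA's"]]         # NA count taken from the summary
player_na <- sum(is.na(scores$player))    # NA count taken directly with is.na()
print(c(runs = runs_na, player = player_na))
```

When every column type has to be covered, counting with is.na() is the more dependable option.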
Count the number of Missing Values with colSums
We use colSums() with is.na() to count NA values in each column.
- colSums: computes the sum of each column, here to count NAs.
- is.na: checks for missing values.
R
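# Sample data: two NAs in runs, one NA in wickets
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))
# is.na() gives a logical matrix; colSums() adds up the TRUE values per column
colSums(is.na(stats))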
Output:
 player    runs wickets 
      0       2       1 
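The same logical matrix can also give the share of missing values per column. As a small sketch, colMeans() is used here instead of colSums(), and the variable names na_counts, na_percent and cols_with_na are illustrative.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

na_counts <- colSums(is.na(stats))          # number of NAs per column
na_percent <- colMeans(is.na(stats)) * 100  # share of NAs per column, in percent

print(na_counts)
print(na_percent)

# Names of the columns that contain at least one NA
cols_with_na <- names(na_counts)[na_counts > 0]
print(cols_with_na)
```

The mean of a logical vector is the fraction of TRUE values, which is why colMeans() yields a proportion.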
Find and count the Missing values in one column of a Data Frame
We check the missing values in specific columns using dataframe$column.
- $ operator: accesses a specific column from a data frame.
R
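# Sample data: two NAs in runs, one NA in wickets
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))
# Row positions of the NAs in the runs column
print("Location of missing values in runs column")
which(is.na(stats$runs))
# Number of NAs in the wickets column
print("Count of missing values in wickets column")
sum(is.na(stats$wickets))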
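Output:
[1] "Location of missing values in runs column"
[1] 1 4
[1] "Count of missing values in wickets column"
[1] 1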
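Once the missing positions in one column are known, they can be used to look up the affected rows. The following sketch assumes the same stats data; has_missing_runs, missing_rows and players_missing_runs are illustrative names. It uses anyNA() as a quick yes/no check before indexing.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8),
                    stringsAsFactors = FALSE)

# Quick check: does the column contain any NA at all?
has_missing_runs <- anyNA(stats$runs)

# Row positions of the NAs in runs
missing_rows <- which(is.na(stats$runs))

# Players whose runs value is missing
players_missing_runs <- stats$player[missing_rows]

print(has_missing_runs)
print(missing_rows)
print(players_missing_runs)
```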
Find and count missing values in all columns in Data Frame
We use sapply() to apply functions column-wise and identify NA positions and counts.
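- sapply: applies a function to each column; it returns a named vector when every result has length one (as with the counts) and a list when the result lengths differ (as with the positions).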
- function(x): defines an anonymous function to check NAs per column.
- which: returns positions of missing values.
- sum: counts total missing values in each column.
R
# Sample data: runs and wickets each contain one NA
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))
# Row positions of the NAs within each column
print("Position of missing values column-wise")
sapply(stats, function(x) which(is.na(x)))
# Number of NAs within each column
print("Count of missing values column-wise")
sapply(stats, function(x) sum(is.na(x)))
Output:
The output shows the position and count of missing values in each column. The runs column has a missing value at position 4 and wickets has one at position 3, while player has no missing values. This helps quickly locate and quantify missing data column-wise.
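Because the return type of sapply() depends on the results, a script that needs a predictable shape may prefer lapply() and vapply(). This is an alternative sketch, not the approach used above: lapply() always returns a list, and vapply() checks that each result matches a declared template, here a single integer.

```r
stats <- data.frame(player = c("A", "B", "C", "D"),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# lapply() always returns a list, one element per column
na_positions <- lapply(stats, function(x) which(is.na(x)))

# vapply() requires each result to be exactly one integer
na_counts <- vapply(stats, function(x) sum(is.na(x)), integer(1))

print(na_positions)
print(na_counts)
```

If a column's result ever fails to match the template, vapply() stops with an error instead of silently changing the output shape.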