How to Handle Error in lm.fit with createFolds Function in R
When working with linear models and cross-validation in R, you may come across the error "Error in lm.fit: 0 (non-NA) cases". This error commonly appears when folds created with the caret package's createFolds function leave one or more folds without any usable observations. In this article, you will learn why this error occurs and how to handle it in R Programming Language.
Understanding the Error
The "Error in lm. fit (0 non-na Cases)" typically occurs when:
- Improper Handling of NAs: If NAs are not properly handled or imputed before creating the folds, they can cause issues during the model fitting process.
- Data Subsetting Issues: When using functions like createFolds from the caret package for cross-validation, the data might be split in a way that one or more folds contain only missing values (NAs).
- Imbalanced Datasets: If your dataset is heavily imbalanced or contains a lot of missing values, some of the cross-validation folds might end up without any valid observations.
The error also tends to arise when analyzing sparse data or when disproportionate stratified sampling concentrates rare cases in a few folds. A quick way to diagnose it is to count the complete (non-NA) cases in each fold before fitting any model, as sketched below.
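The following is a minimal diagnostic sketch; the my_data frame and its columns are made-up placeholders standing in for your own data. Any fold that reports 0 complete cases will trigger the lm.fit error when a model is fitted on it.
R
library(caret)

# Hypothetical data frame with missing values in the predictor
my_data <- data.frame(
  x = c(NA, NA, NA, 4, 5, 6),
  y = c(2, 4, 6, 8, 10, 12)
)

set.seed(42)
folds <- createFolds(my_data$y, k = 3)

# Count the complete (non-NA) rows in each fold
sapply(folds, function(idx) sum(complete.cases(my_data[idx, ])))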
Causes and Solutions of the Error in lm.fit
Here are the main situations in which the error in lm.fit occurs, along with methods for solving each of them.
1. Presence of NA Values
If a predictor column consists entirely of NA values, lm has no complete cases to fit on and the model fitting fails. The example below reproduces this and catches the error gracefully with tryCatch.
R
# Load the caret package, installing it first if necessary
if (!requireNamespace("caret", quietly = TRUE)) {
  install.packages("caret")
}
library(caret)

# Create a dataset where 'x' contains only NA values
data <- data.frame(
  x = rep(NA, 100),  # 'x' column with 100 NA values
  y = rnorm(100)     # 'y' column with random normal values
)

# Function to fit a linear model and handle errors
fit_model <- function(data) {
  tryCatch({
    lm(y ~ x, data = data)
  }, error = function(e) {
    message("Error fitting model: ", e$message)
    return(NULL)  # Return NULL if there's an error
  })
}

# Fit the linear model (this intentionally triggers the error)
result <- fit_model(data)

# Check if model fitting was successful
if (!is.null(result)) {
  print(summary(result))  # Print a summary of the model if successful
}
Output:
Error fitting model: 0 (non-NA) cases
2. Imbalanced Datasets
This error occurs when the dataset is heavily imbalanced or contains a lot of missing values, leading some cross-validation folds to have no valid observations.
R
# Load the caret package for createFolds
library(caret)

# Example of an imbalanced dataset
data_imbalanced <- data.frame(
  x = c(rep(NA, 8), 9, 10),
  y = c(rep(2, 8), NA, 20)
)

# Create 3 folds
folds <- createFolds(data_imbalanced$y, k = 3)

# Perform cross-validation
cv_results <- lapply(folds, function(train_indices) {
  train_data <- data_imbalanced[train_indices, ]
  model <- lm(y ~ x, data = train_data)
  return(summary(model))
})
Output:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
3. Improper Handling of NAs
This error occurs if NAs are not properly handled or imputed before creating the folds, causing issues during the model fitting process.
R
# Load the caret package for createFolds
library(caret)

# Data with missing values not handled
data_with_nas <- data.frame(
  x = c(1, 2, NA, 4, 5, NA, 7, 8, 9, 10),
  y = c(2, 4, 6, NA, 10, 12, 14, 16, NA, 20)
)

# Create 5 folds
folds <- createFolds(data_with_nas$y, k = 5)

# Perform cross-validation
cv_results <- lapply(folds, function(train_indices) {
  train_data <- data_with_nas[train_indices, ]
  model <- lm(y ~ x, data = train_data)
  return(summary(model))
})
Output:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
Now let's walk through the solutions to each of these errors.
Solution 1: Remove or Impute NAs Before Creating Folds
Removing the rows that contain NAs before creating the folds resolves the issue.
R
# Load necessary libraries
library(caret)

# Example data with NAs
set.seed(123)
data <- data.frame(
  x = c(1, 2, 3, NA, 5, 6, 7, 8, NA, 10),
  y = c(2, 4, 6, 8, 10, NA, 14, 16, 18, 20)
)

# Remove rows with NAs
clean_data <- na.omit(data)

# Create folds on the clean data
folds <- createFolds(clean_data$y, k = 5)

# Perform cross-validation on the clean data
cv_results <- lapply(folds, function(train_indices) {
  train_data <- clean_data[train_indices, ]
  model <- lm(y ~ x, data = train_data)
  return(summary(model))
})
print(cv_results)
Output:
$Fold1
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16 NA NA NA
x NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
$Fold2
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0 NA NA NA
x 2 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
$Fold3
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0 NA NA NA
x 2 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
$Fold4
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20 NA NA NA
x NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
$Fold5
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10 NA NA NA
x NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
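Note that createFolds returns, by default, the indices of the held-out (test) samples for each fold (returnTrain = FALSE), which is why each fit above uses only one or two rows and reports no residual degrees of freedom. To train on the complement of each fold instead, drop the fold indices or pass returnTrain = TRUE. A minimal sketch reusing clean_data and folds from above:
R
# Train on everything except the fold, evaluate on the fold itself
cv_mse <- lapply(folds, function(test_indices) {
  train_data <- clean_data[-test_indices, ]
  test_data  <- clean_data[test_indices, ]
  model <- lm(y ~ x, data = train_data)
  # Mean squared prediction error on the held-out fold
  mean((test_data$y - predict(model, newdata = test_data))^2)
})
print(cv_mse)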
Solution 2: Check for Valid Cases Within Each Fold
For the second example, skipping any fold that contains no complete cases prevents lm from being called on data it cannot fit.
R
# Load necessary libraries
library(caret)

# Example of an imbalanced dataset
data_imbalanced <- data.frame(
  x = c(rep(NA, 8), 9, 10),
  y = c(rep(2, 8), NA, 20)
)

# Create 3 folds
folds <- createFolds(data_imbalanced$y, k = 3)

# Perform cross-validation, skipping folds with no complete cases
cv_results <- lapply(folds, function(train_indices) {
  train_data <- data_imbalanced[train_indices, ]
  if (sum(complete.cases(train_data)) == 0) {
    return(NULL)  # Skip folds where every row has an NA
  }
  model <- lm(y ~ x, data = train_data)
  return(summary(model))
})

# Keep only the folds that could be fitted
cv_results <- Filter(Negate(is.null), cv_results)
print(cv_results)
Output:
$Fold1
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20 NA NA NA
x NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
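An alternative to the explicit complete-case check is to wrap the fit in tryCatch, so any fold that fails with the 0 (non-NA) cases error is skipped instead of aborting the whole loop. A minimal sketch using the same data_imbalanced and folds:
R
cv_results <- lapply(folds, function(train_indices) {
  train_data <- data_imbalanced[train_indices, ]
  tryCatch(
    summary(lm(y ~ x, data = train_data)),
    error = function(e) NULL  # Return NULL for folds that cannot be fitted
  )
})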
Solution 3: Impute Missing Values
For the third example, imputing the missing values (here with each column's mean) ensures every fold has complete data to work with.
R
# Load necessary libraries
library(caret)

# Data with missing values (from the third example)
data_with_nas <- data.frame(
  x = c(1, 2, NA, 4, 5, NA, 7, 8, 9, 10),
  y = c(2, 4, 6, NA, 10, 12, 14, 16, NA, 20)
)

# Impute the missing values in each column with that column's mean
data_imputed <- data_with_nas
data_imputed$x[is.na(data_imputed$x)] <- mean(data_imputed$x, na.rm = TRUE)
data_imputed$y[is.na(data_imputed$y)] <- mean(data_imputed$y, na.rm = TRUE)

# Create 5 folds on the imputed data
folds <- createFolds(data_imputed$y, k = 5)

# Perform cross-validation
cv_results <- lapply(folds, function(train_indices) {
  train_data <- data_imputed[train_indices, ]
  model <- lm(y ~ x, data = train_data)
  return(summary(model))
})
print(cv_results)
Output:
$Fold1
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0 NA NA NA
x 2 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
$Fold2
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
3 6
-3 3
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9 3 3 0.205
x NA NA NA NA
Residual standard error: 4.243 on 1 degrees of freedom
$Fold3
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.5 NA NA NA
x 0.0 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: NaN, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
$Fold4
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0 NA NA NA
x 2 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
$Fold5
Call:
lm(formula = y ~ x, data = train_data)
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0 NA NA NA
x 2 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
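For larger datasets, caret's preProcess function offers a more systematic route to imputation than filling in means by hand. The sketch below assumes the data_with_nas frame from above and uses median imputation (method = "medianImpute"):
R
library(caret)

# Build an imputation recipe from the data, then apply it
pre_proc <- preProcess(data_with_nas, method = "medianImpute")
data_imputed <- predict(pre_proc, newdata = data_with_nas)

# Verify that no NAs remain before creating folds
colSums(is.na(data_imputed))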
By applying these solutions, you can handle the "Error in lm.fit: 0 (non-NA) cases" effectively and keep model training and evaluation running smoothly.
Conclusion
The "Error in lm.fit (0 non-na Cases)" when using createFolds can be frustrating, but it's often a sign of underlying data issues. By understanding the causes and implementing robust solutions, you can ensure your cross-validation process is more reliable and your machine learning models are built on solid foundations. Remember to always inspect your data thoroughly before applying machine learning techniques, and consider the nature of your dataset when choosing cross-validation strategies. With these practices in place, you'll be better equipped to handle and prevent such errors in your R-based machine learning projects.