EARLY PREDICTION OF REMAINING USEFUL LIFE OF BATTERY
A PROJECT REPORT
Submitted by
LOGESHWAR S 715520105019
SHARATH B 715520105311
BONAFIDE CERTIFICATE
Internal Examiner External Examiner
ABSTRACT
As the need for reliable energy storage systems grows, it is imperative to ensure
that batteries are used as effectively as possible. This work addresses the critical issue of
early Remaining Useful Life (RUL) prediction for lithium-ion batteries, with the goal of
enabling proactive maintenance and optimizing battery performance. In this study, a
novel approach is presented for accurately predicting the Remaining Useful Life (RUL)
of batteries from relevant operational parameters using regression models. The study
methodology includes the collection of a significant quantity of operating battery data,
including voltage characteristics, temperature profiles, and charge-discharge cycles.
Several regression models are applied to this dataset to model the relationship between
parameters such as discharge capacity, voltage, and cycle life. Feature engineering
approaches are applied to extract meaningful information that supports accurate RUL
prediction.
ACKNOWLEDGEMENT
We extend our gratitude to all the faculty and staff members, friends and
our parents for their constant support and guidance.
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 NEED FOR ANALYSIS
1.2 OBJECTIVES
1.3 OUTLINE OF THE REPORT
2 LITERATURE SURVEY
3 FEATURE EXTRACTION
3.1 INTRODUCTION
3.2 SUMMARY OF DATASET
3.3 FEATURE SELECTION AND EXTRACTION
3.4 PEARSON CORRELATION COEFFICIENT
3.5 FEATURE GENERATION AND STATISTICAL APPROACH
4 UNIVARIATE AND MULTIVARIATE BASED RUL PREDICTION
4.1 STATISTICAL MODELS
4.2 UNIVARIATE MODEL FROM ΔQ100−10(V)
4.2.1 Univariate Features and Transformations
4.3 MULTIVARIATE MODEL
4.3.1 Methods of Multivariate Model
4.3.2 Hyperparameter Optimization
4.4 LASSO REGRESSION MODEL
5 RESULTS AND DISCUSSION
6 CONCLUSION
7 APPENDIX
8 REFERENCES
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
RF Random Forest
CHAPTER 1
INTRODUCTION
This project examines the significance of precise battery
lifetime prediction and investigates the use of statistical learning approaches
for developing models in this domain. Firstly, the applications of battery lifetime
prediction are outlined, encompassing screening new components during research,
optimizing cell designs, and predicting real-world cycle life and safety events. The
challenges in modelling battery degradation are then acknowledged, leading to the
discussion of desirable criteria for data-driven models, emphasizing attributes such
as accuracy, minimal cycle requirements, small training set sizes, and
interpretability.
The role of machine learning, particularly advanced and deep learning
methods, is explored for their potential in capturing information from complex
battery cycling datasets. However, their drawbacks, including brittleness and
interpretability issues, are noted. This report underscores the underexplored realm of
statistical learning for battery lifetime prediction, referencing the work of Severson
et al., who developed models for lithium iron phosphate/graphite cells undergoing
fast charging.
Various dimensionality reduction and model-building approaches are
discussed, revealing that simple models, such as multivariate and univariate linear
models based on ΔQ100−10(V), perform comparably to more complex ones.
Interpretability is highlighted as crucial, showcasing the effectiveness of statistical
learning.
Moreover, this project sheds light on new approaches using elements of
ΔQ100−10(V) and their unique insights into the relationship with cycle life. This
project mainly underscores the importance of accurate predictions, acknowledges
challenges in modeling, and advocates for the integration of interpretable statistical
learning approaches alongside advanced machine learning methods.
Figure 1.1 Block Diagram of RUL Prediction
1.2 OBJECTIVES
● Prediction of Remaining Useful Life (RUL) within the early cycles
(approximately 100 cycles).
● Utilize a multivariate approach to estimate Remaining Useful Life (RUL)
based on historical data and current conditions.
● Feature selection for reduction of error between predicted and actual values.
● Validate the RUL estimation model using appropriate techniques and
optimize its parameters.
1.3 OUTLINE OF THE REPORT

CHAPTER 1: INTRODUCTION
A brief introduction to statistical learning for accurate battery lifetime prediction
is given, addressing the associated challenges and advocating interpretability
alongside advanced methods. The needs and objectives of the project are explained.
CHAPTER 2: LITERATURE SURVEY
Recent advancements in lithium-ion battery Remaining Useful Life (RUL)
prediction methods are summarized, including data-driven approaches, machine
learning models, and hybrid algorithms that show improved accuracy and
applicability across various battery types and conditions.
CHAPTER 6: CONCLUSION
A brief summary of the project is provided. Future extensions of the project
and the references that helped in its completion are discussed.
CHAPTER 2
LITERATURE SURVEY
Liu, J., Saxena, A., Goebel, K., Saha, B., & Wang, W. (2010). An adaptive
recurrent neural network for remaining useful life prediction of lithium-ion
batteries. Annual Conference of the Prognostics and Health Management Society,
PHM 2010. The ARNN utilizes adaptive/recurrent neural network architecture
with optimized network weights using the recursive Levenberg-Marquardt (RLM)
method, showcasing its effectiveness in prognostics.
Cheng, Y., Lu, C., Li, T., & Tao, L. (2015). Residual lifetime prediction for
lithium-ion battery based on functional principal component analysis and Bayesian
approach. Energy, 90. The study presents a novel method for real-time lithium-ion
battery residual lifetime prediction, using functional principal component analysis
(FPCA) and Bayesian updating, validated with NASA Ames Prognostics Center
data.
Zhou, Y., & Huang, M. (2016). Lithium-ion batteries remaining useful life
prediction based on a mixture of empirical mode decomposition and ARIMA
model. Microelectronics Reliability, 65. The paper proposes a novel approach for
predicting lithium-ion battery remaining useful life (RUL) in electric vehicle
battery management systems. It combines empirical mode decomposition (EMD)
with autoregressive integrated moving average (ARIMA) models, outperforming
other methods like relevance vector machine and monotonic echo state networks.
Mo, B., Yu, J., Tang, D., & Liu, H. (2016). A remaining useful life prediction
approach for lithium-ion batteries using Kalman filter and an improved particle
filter. 2016 IEEE International Conference on Prognostics and Health
Management, ICPHM 2016. The paper proposes a novel Particle Filtering (PF)
method for estimating the Remaining Useful Life (RUL) of lithium-ion batteries.
By combining Kalman filter with PF and integrating a particle swarm optimization
algorithm, it achieves higher accuracy compared to standard PF and particle swarm
optimization-based PF methods, validated using NASA's battery dataset.
Zhao, Q., Qin, X., Zhao, H., & Feng, W. (2018). A novel prediction method based
on the support vector regression for the remaining useful life of lithium-ion
batteries. Microelectronics Reliability, 85. The paper introduces real-time
measurable health indicators (HIs) for lithium-ion battery state of health (SOH)
estimation, including the time interval of an equal charging voltage difference
(TIECVD) and the time interval of an equal discharging voltage difference
(TIEDVD). These HIs, along with a novel method combining feature vector
selection (FVS) and support vector regression (SVR), enhance accuracy and
efficiency in predicting SOH and remaining useful life (RUL). Experimental
results on two datasets validate the effectiveness of the proposed approach.
Sun, Y., Hao, X., Pecht, M., & Zhou, Y. (2018). Remaining useful life prediction
for lithium-ion batteries based on an integrated health indicator. Microelectronics
Reliability, 88–90. The paper introduces an integrated health indicator for lithium-
ion battery remaining useful life (RUL) prediction, combining multiple factors
with a polynomial degradation model and particle filter algorithm. It offers insights
into RUL prediction in energy-intensive applications.
Severson, K. A., Attia, P. M., Jin, N., Perkins, N., Jiang, B., Yang, Z., Chen, M.
H., Aykol, M., Herring, P. K., Fraggedakis, D., Bazant, M. Z., Harris, S. J., Chueh,
W. C., & Braatz, R. D. (2019). Data-driven prediction of battery cycle life before
capacity degradation. Nature Energy, 4(5), 383–391. The study utilizes early-cycle
discharge data from commercial LFP/graphite batteries to develop accurate cycle
life prediction models using data-driven methods without requiring prior
knowledge of cell chemistry.
Xiong, R., Zhang, Y., Wang, J., He, H., Peng, S., & Pecht, M. (2019). Lithium-Ion
Battery Health Prognosis Based on a Real Battery Management System Used in
Electric Vehicles. IEEE Transactions on Vehicular Technology, 68(5), 4110–4121.
This paper introduces a health indicator derived from partial charge
voltage curves for lithium-ion batteries, alongside a moving-window-based method
for predicting battery remaining useful life (RUL). Tested on a real electric vehicle
battery management system, the method shows promising results in estimating
capacity and predicting RUL.
Ma, G., Zhang, Y., Cheng, C., Zhou, B., Hu, P., & Yuan, Y. (2019). Remaining
useful life prediction of lithium-ion batteries based on false nearest neighbors and a
hybrid neural network. Applied Energy, 253. The paper proposes a hybrid neural
network approach, combining false nearest neighbors for sliding window size
determination with a convolutional and long short-term memory network.
Experimental results show improved prediction accuracy for remaining useful life
estimation in lithium-ion batteries compared to existing methods.
Yang, F., Wang, D., Xu, F., Huang, Z., & Tsui, K. L. (2020). Lifespan prediction
of lithium-ion batteries based on various extracted features and gradient boosting
regression tree model. Journal of Power Sources, 476. This paper proposes using
gradient boosting regression trees (GBRT) to accurately predict battery lifespan by
modeling complex nonlinear dynamics through various extracted features,
outperforming other machine learning methods.
Liu, H., Naqvi, I. H., Li, F., Liu, C., Shafiei, N., Li, Y., & Pecht, M. (2020). An
analytical model for the CC-CV charge of Li-ion batteries with application to
degradation analysis. Journal of Energy Storage, 29. The paper explores the
relationship between constant current charge time (CCCT) and constant voltage
charge time (CVCT) in Li-ion batteries' degradation. It develops an analytical
model verified with cycling tests under various conditions and introduces the CV-
CC time ratio as a new health indicator. Additionally, two partial CC-CV charging
cases are examined.
Hasib, S. A., Islam, S., Chakrabortty, R. K., Ryan, M. J., Saha, D. K., Ahamed, M.
H., Moyeen, S. I., Das, S. K., Ali, M. F., Islam, M. R., Tasneem, Z., & Badal, F. R.
(2021). A Comprehensive Review of Available Battery Datasets, RUL Prediction
Approaches, and Advanced Battery Management. IEEE Access, 9, 86166–86193.
The review paper discusses battery data acquisition, management systems for state
estimation, and remaining useful life (RUL) estimation. It covers Li-ion battery
datasets, state estimation, RUL prognostic methods, and comparative studies of
prediction algorithms and models.
Afshari, S. S., Cui, S., Xu, X., & Liang, X. (2022). Remaining Useful Life Early
Prediction of Batteries Based on the Differential Voltage and Differential Capacity
Curves. IEEE Transactions on Instrumentation and Measurement, 71. This article
proposes a data-driven method for early prediction of battery remaining useful life
(RUL) using features extracted from differential capacity and voltage curves.
Employing Sparse Bayesian Learning (SBL), the method outperforms lasso and
elastic net methods in accuracy, offering a versatile solution applicable to various
battery types.
Azyus, A. F., Wijaya, S. K., & Naved, M. (2023). Prediction of remaining useful
life using the CNN-GRU network: A study on maintenance management. Software
Impacts, 17. The paper discusses the CNN-GRU network, an open-
source software package for predicting remaining useful lives (RULs) using
advanced deep learning techniques. It highlights ongoing research, limitations, and
future improvements aimed at enhancing the software's accuracy, efficiency, and
applicability in various industries.
Xiong, W., Xu, G., Li, Y., Zhang, F., Ye, P., & Li, B. (2023). Early prediction of
lithium-ion battery cycle life based on voltage-capacity discharge curves. Journal
of Energy Storage, 62. This paper proposes a method for early prediction of lithium-
ion battery cycle life using weighted least squares support vector machine (WLS-
SVM) with health indicators (HIs) extracted from voltage-capacity discharge
curves. By addressing nonlinearity and outlier data issues, it achieves accurate
cycle life prediction, validating its effectiveness for early prediction.
Zhou, Z., & Howey, D. A. (2023). Bayesian hierarchical modelling for battery
lifetime early prediction. IFAC-PapersOnLine, 56(2), 6117–6123. The paper
proposes a hierarchical Bayesian linear model for battery life prediction,
combining individual cell features with population-wide features. Leveraging data
from the first 100 cycles, the model outperforms baseline models with a root mean
square error of 3.2 days and mean absolute percentage error of 8.6% in 5-fold
cross-validation.
Lyu, G., Zhang, H., & Miao, Q. (2023). RUL Prediction of Lithium-Ion Battery in
Early-Cycle Stage Based on Similar Sample Fusion Under Lebesgue Sampling
Framework. IEEE Transactions on Instrumentation and Measurement, 72, 1–11.
This article suggests an RUL prediction method for early-cycle lithium-ion
batteries, employing similar sample fusion under the Lebesgue sampling
framework. It introduces a novel similarity measurement index optimized by the
jumping spider optimization algorithm (JSOA), showcasing effectiveness through
experimental results on the APR18650M1A battery dataset.
Xu, Q., Wu, M., Khoo, E., Chen, Z., & Li, X. (2023). A Hybrid Ensemble Deep
Learning Approach for Early Prediction of Battery Remaining Useful Life.
IEEE/CAA Journal of Automatica Sinica, 10(1), 177–187. The paper proposes a
hybrid deep learning model for early prediction of lithium-ion battery remaining
useful life (RUL). It combines handcrafted features with deep network-learned
features and utilizes a non-linear correlation-based method for feature selection. A
snapshot ensemble learning strategy enhances model generalization. Experimental
results show superior performance compared to other approaches, even on
secondary test sets with different distributions.
Li, X., Yu, D., Søren Byg, V., & Daniel Ioan, S. (2023). The development of
machine learning-based remaining useful life prediction for lithium-ion batteries.
In Journal of Energy Chemistry (Vol. 82). The study tracks the progress of
machine learning (ML) algorithms in predicting the remaining useful life (RUL) of
lithium-ion batteries, highlighting common approaches and exploring the potential
to extend battery lifetime through optimized charging profiles based on accurate
RUL predictions.
Wang, S., Fan, Y., Jin, S., Takyi-Aninakwa, P., & Fernandez, C. (2023). Improved
anti-noise adaptive long short-term memory neural network modeling for the
robust remaining useful life prediction of lithium-ion batteries. Reliability
Engineering and System Safety, 230. The paper presents an enhanced method for
predicting lithium-ion battery remaining useful life (RUL) in power supply
applications. It introduces an improved anti-noise adaptive long short-term
memory (ANA-LSTM) neural network with robust feature extraction and
parameter optimization, achieving significantly better prediction accuracy than
existing methods.
Ezzouhri, A., Charouh, Z., Ghogho, M., & Guennoun, Z. (2023). A Data-Driven-
Based Framework for Battery Remaining Useful Life Prediction. IEEE Access, 11.
The paper introduces a data-driven framework for predicting the Remaining Useful
Life (RUL) of lithium-ion batteries in Battery Electric Vehicles (BEVs). It includes
a novel feature extraction strategy and demonstrates high accuracy on batteries not
used during training, showcasing its adaptability to various discharge conditions.
Gong, S., He, H., Zhao, X., Shou, Y., Huang, R., & Yue, H. (2023). Lithium-Ion
battery remaining useful life prediction method concerning temperature-influenced
charging data. International Conference on Electrical, Computer, Communications
and Mechatronics Engineering, ICECCME 2023. The paper proposes an integrated
method for predicting the Remaining Useful Life (RUL) of lithium-ion batteries. It
considers temperature effects, employing a temperature correction formula for
health indicators, neural network (NN) techniques for determining battery State of
Health (SOH), and an Unscented Particle Filter (UPF) for end-of-life (EOL)
prediction. Experimental validation using Sandia National Lab data confirms its
improved accuracy over conventional methods.
Zheng, X., Wu, F., & Tao, L. (2024). Degradation trajectory prediction of lithium-
ion batteries based on charging-discharging health features extraction and
integrated data-driven models. Quality and Reliability Engineering International.
The paper introduces CSSA-ELM-LSSVR, a fusion method for predicting battery
RUL, combining CSSA, ELM, and LSSVR. It extracts HIs from charging-
discharging, correlates them with aging, and predicts degradation using ELM and
LSSVR optimized by CSSA. Fusion of CSSA-ELM and LSSVR with weighting
schemes outperforms other methods, validated on NASA and MIT datasets.
The literature review covers various methods and models for predicting the
remaining useful life (RUL) of lithium-ion batteries, essential for managing battery
health and ensuring reliable performance in electric vehicles and other
applications. Methods include data-driven approaches like neural networks,
particle filtering, and machine learning algorithms, as well as hybrid models
combining different techniques for improved accuracy.
These approaches leverage features extracted from battery data, such as voltage-
capacity curves, early-cycle discharge data, and charging-discharging health
indicators, to estimate RUL. Several studies propose novel algorithms and
frameworks for RUL prediction, addressing challenges such as temperature effects,
nonlinear dynamics, and limited data availability.
Fusion methods, Bayesian models, and ensemble learning techniques are explored
to enhance prediction performance and robustness. Experimental validation on
publicly available datasets, such as those from NASA and MIT, demonstrates the
effectiveness of these methods in accurately predicting battery RUL under various
operating conditions.
Additionally, the review discusses the significance of RUL prediction for battery
management, highlighting its potential to optimize maintenance schedules, prolong
battery lifespan, and improve overall system reliability. Overall, the literature
provides valuable insights into the state-of-the-art techniques and future directions
in battery RUL prediction, contributing to advancements in energy storage
technology and electric vehicle deployment.
CHAPTER 3
FEATURE EXTRACTION
3.1 INTRODUCTION
A key component of predictive maintenance is feature extraction, especially
when it comes to early battery Remaining Useful Life (RUL) prediction. The need for
dependable energy storage is growing, and with it comes the necessity to be able to
extract meaningful insights from raw battery data. This chapter explores the
fundamental function of feature extraction techniques and clarifies their importance in
extracting relevant data from intricate datasets that include voltage characteristics,
temperature profiles, and charge-discharge cycles. Through the use of a range of
methodologies, including statistical analysis, time-domain assessments, and frequency-
domain transformations, this research aims to reveal the fundamental patterns that
underlie battery performance and therefore enable precise RUL prediction.
This chapter delves into an exhaustive exploration of diverse feature selection
criteria and extraction methodologies, aiming to shed comprehensive light on the
transformative process of converting raw battery data into actionable insights. By
comprehensively evaluating these methods, the aim is to provide insights into their
applicability, effectiveness, and computational efficiency in the context of battery RUL
prediction.
3.2 SUMMARY OF DATASET

The dataset comprises commercial LFP/graphite cells cycled in a temperature-controlled
chamber at a constant temperature of 30°C. The cells underwent a wide range of fast-charging conditions
during the experimental protocol. These conditions ranged from the 3.6C fast-charging
rate recommended by the manufacturer to an exploratory extreme of 6C, which was
designed to mimic rapid charging scenarios with charging times as short as roughly 10
minutes.
Table 3.1 provides a brief description of each cell in the dataset. The term "batch
date" denotes the day the batch was initiated. The "charging policy" describes the
fast-charging currents applied from 0% to 80% SOC, written as "C1(Q1%)-C2", where Q1
is the SOC at which the current changes and C1 and C2 stand for the first and second
applied C-rates, respectively.
The critical parameter used to measure the lifespan and performance of a cell is its
cycle life, measured as the total number of charge-discharge cycles until the capacity
fell to 80% of the nominal capacity. The dataset covers an extensive range of cycle
lives, from about 150 cycles to 2,300 cycles, reflecting the various charging regimens
that were applied. In total, the dataset contains approximately 96,700 recorded cycles
of lithium-ion battery operation.
Critical parameters like voltage, current, cell temperature, and internal resistance
were monitored throughout the cycling process. This process enriched the dataset with
detailed insights into battery behavior under a variety of operating conditions. In
addition to providing insight into the complex dynamics of battery degradation, these
observations set the stage for the creation and validation of complex predictive models
that are precisely tailored for cycle life prediction and optimization of battery
performance in a wide range of practical applications.
The selection of features for lithium-ion batteries was based on the knowledge to
capture the electrochemical evolution of individual cells during cycling, while being
unaffected by particular chemistry or degradation mechanisms. Features such as initial
discharge capacity, charge time, and cell temperature provide insights into the battery's
performance and potential degradation.
The change in the discharge capacity curves between the 100th and 10th cycles,
ΔQ100−10(V), was important for understanding battery behavior over time. Summary
statistics such as the minimum, mean, and variance of the ΔQ100−10(V) curve were
calculated to capture changes between cycles.
These features were chosen for their predictive ability and used in models to assess
the battery's life cycle, with a focus on metrics such as root-mean-squared error
(RMSE) and average percentage error.
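As an illustration of how such summary features can be obtained, the short sketch below computes the log-transformed variance and minimum of ΔQ100−10(V) for one cell. It is a minimal example, assuming the discharge capacity curves of cycles 10 and 100 are already interpolated onto a common voltage grid; the variable names are illustrative and not taken from the project code.

import numpy as np

def delta_q_features(Qd_cycle100, Qd_cycle10):
    """Summary statistics of DeltaQ_{100-10}(V) for a single cell.

    Both inputs are discharge capacity curves sampled on the same voltage
    grid (e.g. 1000 points between 3.6 V and 2.0 V).
    """
    delta_q = Qd_cycle100 - Qd_cycle10
    return {
        "DeltaQ_var": np.log10(np.abs(np.var(delta_q))),   # log10 |variance|
        "DeltaQ_min": np.log10(np.abs(np.min(delta_q))),   # log10 |minimum|
        "DeltaQ_mean": np.mean(delta_q),
    }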
Table 3.2 Summary of Selected Features
Feature No.   Quantity
1   The variance of the change in charge between the 100th and 10th cycles, with respect to voltage, computed and log transformed.
2   The minimum change in charge between the 100th and 10th cycles, with respect to voltage, computed and log transformed.
3   The slope and intercept of the linear fit to the capacity fade curve from cycles 2 to 100.
Twelve features were selected from the raw data collected during the cycling of lithium
iron phosphate (LFP)/graphite cells and processed using MATLAB. These features are
summarized in Table 3.2. While this feature set may not represent either the minimal or
maximal possible set, these features offer valuable information for the early prediction
of the remaining useful life (RUL) of the batteries.
Figure 3.1 Discharge Capacity Curves for 100th, 10th and 2nd Cycles
Severson et al. [7] mainly used features from ΔQ100−10(V) because they found it
could predict battery cycle life well, especially in cases where degradation is uneven.
Our analysis found the same: features from ΔQ100−10(V) showed good predictive ability
here as well. These features are primarily centered around the "Delta Q" (ΔQ) metric,
shown in Figure 3.1, which represents the change in charge capacity of the battery
between different cycles. ΔQ is a fundamental measure that captures the dynamic
behavior and degradation of the battery over its lifetime.
For instance, features such as the variance and minimum change in ΔQ between
specific cycles offer valuable insights into the variability and minimum degradation
experienced by the battery. These metrics provide a comprehensive view of how the
battery's capacity fluctuates and degrades over time.
Figure 3.2 Initial Capacity Curve of Single Cycle
Utilizing the incremental capacity (IC) curve allows us to capture meaningful
information about the battery's condition and improve the accuracy of the battery SOH
estimation.
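For reference, a numerical incremental capacity (dQ/dV) curve can be approximated from a single discharge cycle as sketched below. This is an illustrative computation under the assumption that capacity and voltage are sampled along the same cycle with strictly monotonic voltage; it is not the exact procedure used in this work.

import numpy as np

def incremental_capacity(Q, V):
    """Approximate the IC curve dQ/dV from one discharge cycle.

    Q and V are 1-D arrays of discharge capacity and voltage sampled along
    the same cycle; V must be strictly monotonic for the derivative to be
    well defined.
    """
    return np.gradient(Q, V)   # numerical derivative dQ/dV

Summary statistics of this curve (for example its maximum or mean) can then serve as additional health-related features, similar to the dQ_dV quantities listed in the appendix.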
Furthermore, features related to discharge capacity at different cycles, such as the 2nd
and 100th cycles, offer essential information about the battery's initial capacity and its
capacity at later stages. These features highlight how the battery's performance evolves
over its lifetime, reflecting both its initial state and its degradation trajectory.
Figure 3.4 represents the internal resistance of the battery at a specific cycle (e.g., 2nd
cycle). Internal Resistance (Int.Res.) distribution is given for the first 1000 cycles of
cell number 5 from the dataset. Rising internal resistance suggests degradation, which
affects battery performance and lifespan. It shows the change in internal resistance
from the 2nd cycle to the 100th cycle. Positive trends in IRDiff2And100 indicate
increasing internal resistance, reflecting degradation and reduced lifespan.
Figure 3.5 Voltage Distribution of Multiple Cycles
During the discharge process of a battery, it takes some time for the voltage to drop
from a relatively high value to a lower value, as shown in Figure 3.5, which gives the
voltage distribution over multiple cycles of cell number 5 from the dataset. This time
decreases as the number of charge-discharge cycles increases: with more charge-discharge
cycles, the discharge voltage falls progressively faster.
Figure 3.6 PCC Values of Extracted Features
The PCC values for each of the 12 features were calculated and visualized. For clarity
and simplicity, only the highest PCC values are shown in Figure 3.6, keeping the focus
on the features with the strongest relationship to cycle life. By examining this figure,
the features with the highest correlation were identified and prioritized, enhancing
the accuracy and reliability of the predictive model.
The features that exhibited the highest PCC values, notably originating from discharge
capacity and internal resistance, align with expectations due to their direct influence on
battery performance and degradation. Table 3.3 shows the selected features.
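To make the selection step concrete, the sketch below ranks candidate features by the magnitude of their Pearson correlation with cycle life. It assumes the twelve features are gathered in a pandas DataFrame X (one row per cell) with a matching Series y of cycle lives; these names are illustrative rather than taken from the project code.

import pandas as pd

def rank_features_by_pcc(X: pd.DataFrame, y: pd.Series, top_k: int = 5) -> pd.Series:
    """Return the top_k features most strongly correlated with cycle life."""
    pcc = X.corrwith(y)   # Pearson correlation of each feature column with y
    order = pcc.abs().sort_values(ascending=False).index
    return pcc.reindex(order).head(top_k)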
Table 3.3 Summary of Extracted Features Based on PCC
All electrochemical discharge data for the first 100 cycles of each battery cell
can be effectively represented using a single capacity matrix, which comprises 1000
voltage points and 100 cycles, resulting in a high-dimensional dataset akin to a
grayscale image. However, neural networks come with inherent drawbacks, including
the need for many samples, computational intensity, and the requirement for domain
expertise, making them challenging to interpret.
The models were trained to predict the log10-transformed cycle life, and model
performance was evaluated using the root-mean-squared error (RMSE).
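As a minimal sketch of this evaluation (not the exact helper used in the appendix listings), the RMSE in units of cycles can be computed from log10-space targets and predictions as follows:

import numpy as np

def rmse_in_cycles(y_true_log10, y_pred_log10):
    """RMSE in cycles when targets and predictions are log10(cycle life)."""
    err = 10.0 ** np.asarray(y_pred_log10) - 10.0 ** np.asarray(y_true_log10)
    return float(np.sqrt(np.mean(err ** 2)))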
The prediction error stays within a 1% change until the voltage sampling interval
exceeds 40 mV (40 points over a 1.6 V voltage window) for the training and primary
test sets, and 160 mV (10 points) for the secondary test set.
This analysis indicates that the extensive dataset does not necessitate 1000 points to
capture crucial features in ΔQ100−10(V), prompting the adoption of reduced point
sampling strategies for enhanced computational efficiency. Such an approach holds
promise for identifying optimal sampling rates in voltage curve analysis across various
applications, particularly for extensive battery cycling datasets.
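A minimal sketch of this reduced-resolution sampling is shown below, assuming the ΔQ100−10(V) curve is stored on the 1000-point voltage grid between 3.6 V and 2.0 V used in the appendix (Vdlin); keeping 40 points corresponds to roughly 40 mV spacing over the 1.6 V window, and 10 points to roughly 160 mV.

import numpy as np

Vdlin = np.linspace(3.6, 2.0, 1000)   # 1000-point voltage grid (V)

def subsample_delta_q(delta_q, n_points):
    """Keep n_points evenly spaced samples of a 1000-point DeltaQ curve."""
    idx = np.linspace(0, len(delta_q) - 1, n_points).astype(int)
    return delta_q[idx], Vdlin[idx]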
CHAPTER 4
UNIVARIATE AND MULTIVARIATE BASED RUL PREDICTION

Severson et al. [7] identified the log10 variance of ΔQ100−10(V) as the most effective
feature among twenty generated from all data sources. While this feature has plausible
explanations, it may not fully capture battery degradation trends. The aim was to
explore if other univariate transformation-function pairings can achieve similar results
while being more physically meaningful.
The focus was on simple univariate linear models, considering fourteen summary
statistical functions of ΔQ100−10(V) and four feature transformations, resulting in 56
pairings. These functions include common summary statistics like min, mean, range,
variance, and kurtosis, as well as single elements of ΔQ100−10(V) at specific voltages.
Feature transformations such as the square root, cube root, and log10 were explored,
noting that the square root and log10 do not accept negative values (absolute values
are therefore used). Elastic net regression is used for regularization even in the
univariate models.
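The sketch below illustrates how one (statistic, transformation) pairing can be evaluated with a cross-validated elastic net; the dictionaries shown contain only a small illustrative subset of the fourteen summary statistics and four transformations described above, and the function and variable names are not from the project code.

import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

# Subset of candidate summary statistics of DeltaQ_{100-10}(V)
summary_stats = {"min": np.min, "mean": np.mean, "var": np.var,
                 "range": lambda x: np.max(x) - np.min(x)}

# Candidate transformations; sqrt and log10 act on absolute values
transforms = {"none": lambda x: x,
              "sqrt": lambda x: np.sqrt(np.abs(x)),
              "cbrt": np.cbrt,
              "log10": lambda x: np.log10(np.abs(x))}

def fit_univariate(delta_q_curves, y_log10, stat="var", trans="log10"):
    """Fit a one-feature elastic net: transform(statistic(DeltaQ)) -> log10 cycle life."""
    x = np.array([summary_stats[stat](dq) for dq in delta_q_curves])
    x = transforms[trans](x).reshape(-1, 1)
    x = StandardScaler().fit_transform(x)
    return ElasticNetCV(cv=5).fit(x, y_log10)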
Furthermore, our study underscores the effectiveness of univariate models derived from
individual elements of ΔQ100−10(V) at specific voltage points. Despite their
simplicity, these models achieve remarkable accuracy in predicting battery degradation.
Overall, our findings emphasize the nuanced interplay between feature selection,
transformation, and model choice in optimizing predictive performance for battery
degradation prediction tasks.
Data is transformed to change its scale or distribution. The initial data exhibits
unprocessed statistical information, providing a standard for comparison and exposing
the inherent properties of the data. For skewed datasets, square root transformation is a
great way to stabilize variance and lessen the impact of outliers by taking the square
root of absolute values.
Similar to the square root, the cube root transformation reduces skewness and stabilizes
variance; it works particularly well with data that is right-skewed. By taking the base-
10 logarithm of absolute values, compressing the range of big values and extending the
range of small values, distributions are made more symmetrical for simpler
interpretation. This process is known as logarithmic transformation (base 10). The
benefits of each transformation are different, and the decision is based on the aims of
the research and the properties of the dataset. To find the best transformation for a
particular situation, experimentation is frequently required.
In analyzing the difference in capacity between the 100th and 10th cycle at particular
voltage levels, various summary statistics features offer invaluable information. The
mean, minimum, and maximum provide an overview of central tendency and extremes,
indicating average, lowest, and highest capacity differences, respectively. These
metrics offer initial impressions of the dataset's behavior.
Further delving into dispersion, metrics like the Inter-Decile range (IDR), Inter-
Quartile range (IQR), and variance elucidate the spread of data. The IDR and IQR
highlight the difference between specific percentiles, delineating the middle 80% and
50% of the data's spread, respectively. Variance quantifies the spread around the mean,
offering insights into the dataset's variability. The results and outcomes of the
univariate models with these statistical features are discussed in the following chapter
in detail.
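For completeness, the dispersion measures mentioned above can be computed directly from percentiles, as in the illustrative sketch below.

import numpy as np

def dispersion_features(delta_q):
    """Inter-decile range, inter-quartile range and variance of DeltaQ_{100-10}(V)."""
    p10, p25, p75, p90 = np.percentile(delta_q, [10, 25, 75, 90])
    return {"IDR": p90 - p10,           # spread of the middle 80% of the data
            "IQR": p75 - p25,           # spread of the middle 50% of the data
            "variance": np.var(delta_q)}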
The transition from univariate to multivariate modelling marks a pivotal shift in our
approach to understanding battery degradation mechanisms. Unlike univariate models,
which analyse individual features in isolation, multivariate models enable a holistic
exploration of the complex relationships between multiple input features and the
target variable, cycle life.
In this phase of our research, the focus extends beyond singular factors like
voltage-specific capacity differences (ΔQ100−10(V)) to encompass a broader spectrum
of battery characteristics. By embracing multivariate modelling techniques, we seek to
unravel the intricate interplay between various parameters and their collective impact
on cycle life.
The dataset, consisting of data from 124 LFP/graphite cells, was used in this study
as the source of information for developing models to predict battery cycle life. The
dataset was divided into three sets: training, validation, and test. The model is
trained on the training set, performance is assessed and hyperparameters are adjusted
on the validation set, and the final model is evaluated on the test set. All cells were
subjected to one of 72 different fast-charging protocols, cycling from 0% to 80% state-
of-charge (SOC) in 9 to 13 minutes, and all were discharged at a constant rate of 4C
(a rate that would fully discharge the battery in 15 minutes).
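As a generic illustration of such a split (the study itself reuses the published train, primary test, and secondary test batches rather than a random partition), cell indices could be divided as follows:

import numpy as np

def split_indices(n_cells, frac_train=0.6, frac_val=0.2, seed=0):
    """Randomly partition cell indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cells)
    n_train = int(frac_train * n_cells)
    n_val = int(frac_val * n_cells)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]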
Figure 4.1 Dataset Distribution Histogram
Figure 4.1 denotes the histogram which visually represents the distribution of cycle life
across datasets. It helps compare the performance and variability of training, primary
test, and secondary test sets in terms of battery cycle life. From this it was observed that
there is an overfitting of data between the datasets. Hence the cells with overfitting
were removed and the modified data was produced for model tuning.
Lasso Regression, similar to Ridge Regression, incorporates a regularization
term but utilizes the L1 norm instead of the L2 norm. This choice promotes sparsity in
the coefficient estimates, effectively performing variable selection by driving certain
coefficients to zero. While Lasso Regression excels at feature selection, it may struggle
with correlated predictors, leading to instability in coefficient estimates.
By combining the best features of both Lasso and Ridge regression approaches,
Elastic Net overcomes the drawbacks of each separately. With the inclusion of L1 and
L2 penalties, Elastic Net provides a flexible method for regression modeling. Similar to
Lasso regression, the L1 penalty promotes feature selection and coefficient sparsity,
and the L2 penalty guarantees stability and robustness, similar to Ridge regression.
With the help of this hybridization, Elastic Net is better equipped to negotiate the
challenging landscape of high-dimensional data, where regularization and feature
selection are crucial for creating precise and understandable models. Advanced
techniques such as Principal Component Regression (PCR) and Partial Least Squares
Regression (PLSR) offer alternate approaches to modeling complex interactions
between target variables and input data, going beyond the conventional linear regression
paradigms.
In contrast to PLSR, which places more emphasis on optimizing the covariance
between features and the target, PCR uses Principal Component Analysis (PCA) to
lower the dimensionality of the feature space prior to performing linear regression.
These techniques are vital resources in the toolbox of the contemporary data scientist,
providing effective ways to manage high-dimensional data while revealing the
underlying structure that powers predictive success.
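A compact sketch of the two approaches using scikit-learn pipelines is given below; the number of components is an assumed hyperparameter that would normally be tuned by cross-validation.

from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# PCR: unsupervised dimensionality reduction (PCA) followed by linear regression
pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())

# PLSR: components chosen to maximize the covariance between features and target
plsr = make_pipeline(StandardScaler(), PLSRegression(n_components=5))

# Both expose the same interface, e.g. pcr.fit(X_train, y_train) and pcr.predict(X_test).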
The root mean squared error (RMSE), which is calculated for every fold and
gives an average deviation of the predicted values from the actual ones, is the
evaluation metric.
The final model trains on the whole training set without cross-validation after
identifying the ideal hyperparameters. Then, performance is evaluated for
generalization ability on the test set. Among the things to take into account are
managing imbalances in the data, choosing the best k value for statistical robustness
and computing efficiency, and properly shuffling and stratifying the data to minimize
bias.
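A minimal sketch of this k-fold procedure with RMSE as the per-fold score is shown below; model stands for any of the regressors discussed above, and X and y are assumed to be NumPy arrays of features and log-transformed cycle lives.

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def cv_rmse(model, X, y, k=5, seed=0):
    """Average RMSE of `model` over k shuffled folds."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        scores.append(np.sqrt(mean_squared_error(y[val_idx], pred)))
    return float(np.mean(scores))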
Elastic Net presents a versatile approach by amalgamating the regularization
techniques of Lasso and Ridge Regression. Here, the hyperparameters α and λ dictate
the strengths of L1 and L2 penalties, respectively. Employing methods like grid search
or random search, practitioners systematically traverse the hyperparameter space to
identify the optimal α and λ combination that minimizes MSE or other relevant
evaluation metrics, thereby achieving a balanced model with enhanced predictive
capabilities.
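As an illustration of this tuning step, a randomized search over the elastic-net penalty could be set up as below. Note that scikit-learn parameterizes the penalty with an overall strength (alpha) and an L1/L2 mixing ratio (l1_ratio) rather than two separate strengths, so the search ranges shown are assumptions for illustration.

import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "alpha": np.logspace(-4, 1, 50),          # overall penalty strength
    "l1_ratio": np.linspace(0.05, 1.0, 20),   # balance between L1 and L2 penalties
}

search = RandomizedSearchCV(
    ElasticNet(max_iter=10000),
    param_distributions=param_distributions,
    n_iter=50,
    scoring="neg_root_mean_squared_error",
    cv=5,
    random_state=0,
)
# search.fit(X_train, y_train); search.best_params_ then holds the chosen combination.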
Such tuning helps reduce errors and offers insightful information on the factors
underlying battery degradation.

4.4 LASSO REGRESSION MODEL

Lasso regression applies an L1 penalty that shrinks less informative coefficients to
zero, identifying key features and reducing model complexity. Lasso's regularization
parameter (λ) controls the amount of shrinkage, preventing overfitting and improving
generalization. Figure 4.2 is a generalized output graph depicting the interpretation
of the Lasso model.
The model is trained using Lasso regression with different regularization parameters,
which helps handle feature correlations and improves model generalizability. Cross-
validation is employed for hyperparameter tuning and model selection based on the
lowest root mean squared error (RMSE). The results of these are discussed briefly in
the following chapter.
Figure 4.3 Predicted vs Actual Cycle Life
The results include a graph, shown in Figure 4.3, comparing the actual cycle life of the
batteries with the predicted cycle life, showing how well the model's predictions align
with the true values. This comparison provides a clear visual representation of the
model's effectiveness and accuracy in predicting the remaining life of the batteries.
The model was developed to estimate the cycle life of lithium-ion batteries by utilizing
a unique combination of features and data from early cycles. This approach resulted in
a significant reduction in error, achieving an average error rate of just 6.8%, which
represents a notable improvement over existing models.
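If the average percentage error is computed as a mean absolute percentage error, mirroring the errVal computation in the MATLAB listing of the appendix, a one-function sketch is:

import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    """Average percentage error between actual and predicted cycle lives."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred) / y_true) * 100.0)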
The model leverages specific features derived from the voltage and current
measurements during the initial cycles of discharge, as well as other supplementary
data sources. This comprehensive approach enabled the capture of essential trends and
patterns in battery behaviour that could be overlooked with simpler models.
By fine-tuning the model's parameters and selecting the most relevant features, its
accuracy and reliability were enhanced. This model outperforms prior models, which
typically require larger data sets or more extensive degradation for accurate predictions.
The developed predictive model achieves an impressively low error rate of 6.8%,
which is significantly lower than the error rates reported in previous research, including
those presented by Severson et al. [7]. This highlights the enhanced performance and
precision of the current model in predicting the remaining cycle life of lithium-ion
batteries. In comparison to earlier methods, the suggested model gives better accuracy
in battery life estimates by utilizing features that have been carefully selected and by
using strong model fitting and selection strategies.
CHAPTER 5
RESULTS AND DISCUSSION
Table 5.1 depicts the results of univariate models. The analysis revealed that the
choice of data transformation significantly influences the predictive accuracy of our
models. Across various parameters, such as different voltage levels of Delta Q and
statistical measures like minimum, range, and median, employing a logarithmic
transformation consistently resulted in lower test errors compared to other
transformations like square root or cube root. This highlights the importance of
selecting an appropriate data transformation method to improve model performance.
Additionally, the top-performing single-feature model, Delta Q at 3.428V, shown in
Figure 3.6 exhibited superior predictive ability compared to a complex model from [7],
underscoring the effectiveness of focusing on specific features for accurate predictions.
Table 5.1 Root-Mean-Square Error of Univariate Models.
Figure 5.1 Root-Mean-Square Error of Univariate Models-1
Figure 5.3 Root-Mean-Square Error of Univariate Models-3
Figures 5.1, 5.2, 5.3, and 5.4 serve as a pictorial representation of the results in
Table 5.1. From the above discussion it can be concluded that the root-mean-square
error value of Delta Q(100-10)(3.248 V) was the lowest among the summary statistical
functions considered.
Despite the challenge of high collinearity among features, common in fields like
chemometrics and bioinformatics, regression methods tailored for such scenarios are
leveraged. These include ridge regression, elastic net regression, Principal
Components Regression (PCR), Partial Least Squares Regression (PLSR), and Bayesian
Lasso regression, among others, all known for handling highly correlated features
effectively.
Table 5.2 provides a comparison of root-mean-squared error (RMSE) values across
various models for predicting battery cycle life: the three models from the Severson
[7] study, a neural network model, and our model. Among the models evaluated, PLSR
stands out with the lowest RMSE values across both the training and testing sets,
indicating its effectiveness in capturing the underlying patterns in the data while
avoiding overfitting. Our final model, however, is a Lasso model, which relies on the
L1 regularization technique. In contrast, models such as Principal Component
Regression show high RMSE values on the training set, suggesting potential
underfitting.
Through careful selection and engineering of features, the aim was to capture the
intricate patterns and relationships within the data, thereby enhancing the model's
predictive performance.
From Table 5.2, The three models from the Severson [7] study—Variance,
Discharge, and Full—show varying levels of performance. The Variance model has an
RMSE of 103 on the training set and 196 on the secondary test set. The Discharge
model performs slightly better on the training set with an RMSE of 76, but its RMSE
increases on the secondary test sets (173). The Full model achieves the best
performance on the training set with an RMSE of 51, but its RMSE increases to 214 on
the secondary test set.
Our model, which was developed using different features and techniques,
demonstrates a significant improvement in performance. It achieves an RMSE of 87 on
the training set and 90 on the secondary test set, indicating that it generalizes well to
new data. This suggests that our model is more robust and provides more reliable
predictions than the models in [7].
CHAPTER 6
CONCLUSION
APPENDIX
1. Feature Extraction:
function [xTable, y] = helperGetFeatures(batch)
% HELPERGETFEATURES function accepts the raw measurement data and computes
% the following feature:
%
% Q_{100-10}(V) variance [DeltaQ_var]
% Q_{100-10}(V) minimum [DeltaQ_min]
% Slope of linear fit to the capacity fade curve cycles 2 to 100 [CapFadeCycle2Slope]
% Intercept of linear fit to capacity fade curve, cycles 2 to 100 [CapFadeCycle2Intercept]
% Discharge capacity at cycle 2 [Qd2]
% Average charge time over first 5 cycles [AvgChargeTime]
% Minimum Internal Resistance, cycles 2 to 100 [MinIR]
% Internal Resistance difference between cycles 2 and 100 [IRDiff2And100]
% Voltage feature [VoltageFeature]
% Current feature [CurrentFeature]
N = size(batch,2);   % number of cells in the batch
y = zeros(N, 1);     % cycle life target for each cell
DeltaQ_var = zeros(N,1);
V_var = zeros(N,1);
DeltaQ_min = zeros(N,1);
CapFadeCycle2Slope = zeros(N,1);
CapFadeCycle2Intercept = zeros(N,1);
CapFadeCycle2Slope2 = zeros(N,1);
CapFadeCycle2Intercept2 = zeros(N,1);
Qd2 = zeros(N,1);
Qd10 = zeros(N,1);
Qmax = zeros(N,1);
Qd100 = zeros(N,1);
IR2= zeros(N,1);
IR100= zeros(N,1);
AvgChargeTime = zeros(N,1);
MinIR = zeros(N,1);
IRDiff2And100 = zeros(N,1);
VoltageFeature = zeros(N,1);
CurrentFeature = zeros(N,1);
DeltaQ_skewness = zeros(N,1);
DeltaQ_kurtosis = zeros(N,1);
Qmax_minus_Qd2 = zeros(N,1);
Qdischarge = zeros(N,1);
Qcharge = zeros(N,1);
IntegralTemp = zeros(N,1);
dQ_dV = zeros(N,1);
dQ_dV_max = zeros(N,1);
dQ_dV_mean = zeros(N,1);
dQ_dV_sk = zeros(N,1);
dQ_dV_k = zeros(N,1);
for i = 1:N
% cycle life
y(i,1) = size(batch(i).cycles,2) + 1;
numCycles = size(batch(i).cycles, 2);
timeGapCycleIdx = [];
jBadCycle = 0;
for jCycle = 2: numCycles
dt = diff(batch(i).cycles(jCycle).t);
if max(dt) > 5*mean(dt)
jBadCycle = jBadCycle + 1;
timeGapCycleIdx(jBadCycle) = jCycle;
end
end
batch(i).summary.QDischarge(timeGapCycleIdx) = [];
batch(i).summary.IR(timeGapCycleIdx) = [];
batch(i).summary.chargetime(timeGapCycleIdx) = [];
% Difference of the linearly interpolated discharge capacity curves
% between cycle 100 and cycle 10
DeltaQ = batch(i).cycles(100).Qdlin - batch(i).cycles(10).Qdlin;
DeltaQ_var(i) = log10(abs(var(DeltaQ)));
DeltaQ_min(i) = log10(abs(min(DeltaQ)));
% Slope and intercept of linear fit for capacity fade curve from cycle
% 2 to cycle 100
coeff2 = polyfit(batch(i).summary.cycle(2:100), ...
    batch(i).summary.QDischarge(2:100), 1);
CapFadeCycle2Slope(i) = coeff2(1);
CapFadeCycle2Intercept(i) = coeff2(2);
temp = batch(i).summary.IR(2:100);   % internal resistance, cycles 2 to 100
MinIR(i) = min(temp(temp~=0));
IRDiff2And100(i) = batch(i).summary.IR(100) - batch(i).summary.IR(2);
end
% Example voltage-based feature: log-variance of the per-cycle voltage data
% (voltageData is assumed to hold the cell's raw voltage measurements)
voltageFeature = log10(abs(var(voltageData)));
end
2. Univariate model:
import copy
from functools import reduce
import glob
import json
import os
from pathlib import Path
from sklearn.svm import SVR
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import cm
from matplotlib.ticker import AutoMinorLocator
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import pandas as pd
import torch
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from scipy.stats import skew, kurtosis
from sklearn import preprocessing
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LinearRegression, RidgeCV, ElasticNetCV
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from image_annotated_heatmap import heatmap, annotate_heatmap
from sklearn.linear_model import LassoCV
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.ensemble import VotingRegressor
# ## Settings
#
# Set plotting settings:
rcParams['figure.autolayout'] = True
rcParams['lines.markersize'] = 5
rcParams['lines.linewidth'] = 1.0
rcParams['font.size'] = 7
rcParams['legend.fontsize'] = 7
rcParams['legend.frameon'] = False
rcParams['xtick.minor.visible'] = True
rcParams['ytick.minor.visible'] = True
rcParams['font.sans-serif'] = 'Arial'
rcParams['mathtext.fontset'] = 'custom'
rcParams['mathtext.rm'] = 'Arial'
rcParams['pdf.fonttype'] = 42
rcParams['ps.fonttype'] = 42
    for k, file in enumerate(files):
        cell = np.genfromtxt(file, delimiter=',')
        dataset[k, :, :] = cell  # flip voltage dimension
    return dataset
# Load three datasets (`test1` = primary test set, `test2` = secondary test set):
data_train = load_dataset('train')
data_test1 = load_dataset('test1')
data_test2 = load_dataset('test2')
y_train = np.log10(cycle_lives_train)
y_test1 = np.log10(cycle_lives_test1)
y_test2 = np.log10(cycle_lives_test2)
# Make a new version of `data_test1`, `cycle_lives_test1` and `y_test1` that exclude the outlier battery:
data_test1_mod = data_test1.copy()
cycle_lives_test1_mod = cycle_lives_test1.copy()
y_test1_mod = y_test1.copy()
data_test2_mod = data_test2.copy()
cycle_lives_test2_mod = cycle_lives_test2.copy()
y_test2_mod = y_test2.copy()
print(np.median(cycle_lives_train))
print(np.median(cycle_lives_test1_mod))
print(np.median(cycle_lives_test2_mod))
np.sign(np.min(X_test2)) == np.sign(np.max(X_test2)))
        else:  # apply transform
            transform = transforms[trans]
            X_train, X_test1, X_test2 = (transform(X_train), transform(X_test1),
                                         transform(X_test2))
        # Evaluate error
        RMSE_train[k, k2], RMSE_test1[k, k2], RMSE_test2[k, k2] = \
            get_RMSE_for_all_datasets(y_train_pred, y_test1_pred, y_test2_pred)
        # Print stats
        skews[k, k2] = skew(X_train)
        # print(f'{trans}: Skew(train) = {skews[k, k2]:0.3f}, RMSE(train) = {RMSE_train[k, k2]:0.3f}')
        # print(f'{trans}: Skew(test1) = {skews[k, k2]:0.3f}, RMSE(test1) = {RMSE_test1[k, k2]:0.3f}')
        # print(f'{trans}: Skew(test2) = {skews[k, k2]:0.3f}, RMSE(test2) = {RMSE_test2[k, k2]:0.3f}')
        print()
X_fit = np.linspace(np.min(X_train_scaled), np.max(X_train_scaled), 100)
y_fit = X_fit * enet.coef_[0] + enet.intercept_
plt.figure()
plt.plot(X_train_scaled, y_train, 'o', label='Data')
plt.plot(X_fit, y_fit, label='Fit')
plt.title(func)
plt.legend()
3. Lasso Regression:
trainDataSize = size(trainData,2);
cycleLife = zeros(trainDataSize,1);
DeltaQ_var = zeros(trainDataSize,1);
for i = 1:trainDataSize
cycleLife(i) = size(trainData(i).cycles,2) + 1;
DeltaQ_var(i) = var(trainData(i).cycles(100).Qdlin - trainData(i).cycles(10).Qdlin);
end
loglog(DeltaQ_var,cycleLife, 'o')
ylabel('Cycle life'); xlabel('Var(Q_{100} - Q_{10}) (Ah^2)');
[XTrain,yTrain] = helperGetFeatures(trainData);
head(XTrain)
rng("default")
alphaVec = 0.01:0.1:1;
lambdaVec = 0:0.01:1;
MCReps = 1;
cvFold = 4;
rmseList = zeros(length(alphaVec),1);
minLambdaMSE = zeros(length(alphaVec),1);
wModelList = cell(1,length(alphaVec));
betaVec = cell(1,length(alphaVec));
for i=1:length(alphaVec)
[coefficients,fitInfo] = ...
    lasso(XTrain.Variables, yTrain, 'Alpha', alphaVec(i), 'CV', cvFold, ...
        'MCReps', MCReps, 'Lambda', lambdaVec);
wModelList{i} = coefficients(:,fitInfo.IndexMinMSE);
betaVec{i} = fitInfo.Intercept(fitInfo.IndexMinMSE);
indexMinMSE = fitInfo.IndexMinMSE;
rmseList(i) = sqrt(fitInfo.MSE(indexMinMSE));
minLambdaMSE(i) = fitInfo.LambdaMinMSE;
end
numVal = 4;
[out,idx] = sort(rmseList);
val = out(1:numVal);
index = idx(1:numVal);
alpha = alphaVec(index);
lambda = minLambdaMSE(index);
wModel = wModelList(index);
beta = betaVec(index);
[XVal,yVal] = helperGetFeatures(valData);
rmseValModel = zeros(numVal,1);
for valList = 1:numVal
yPredVal = (XVal.Variables*wModel{valList} + beta{valList});
rmseValModel(valList) = sqrt(mean((yPredVal-yVal).^2));
end
[rmseMinVal,idx] = min(rmseValModel);
wModelFinal = wModel{idx};
betaFinal = beta{idx};
[XTest,yTest] = helperGetFeatures(testData);
yPredTest = (XTest.Variables*wModelFinal + betaFinal);
[idx,scores] = fsrftest(XTest,yTest);
bar(scores(idx))
xticklabels(strrep(XTest.Properties.VariableNames(idx),'_','\_'))
xlabel("Predictor rank")
ylabel("Predictor importance score")
figure;
scatter(yTest,yPredTest)
hold on;
refline(1, 0);
title('Predicted vs Actual Cycle Life')
ylabel('Predicted cycle life');
xlabel('Actual cycle life');
errTest = (yPredTest-yTest);
rmseTestModel = sqrt(mean(errTest.^2))
n = numel(yTest);
nr = abs(yTest - yPredTest);
errVal = (1/n)*sum(nr./yTest)*100
4. Multivariate model:
import copy
from functools import reduce
import glob
import json
import os
from pathlib import Path
from sklearn.svm import SVR
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import cm
from matplotlib.ticker import AutoMinorLocator
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import pandas as pd
import torch
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from scipy.stats import skew, kurtosis
from sklearn import preprocessing
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LinearRegression, RidgeCV, ElasticNetCV
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from image_annotated_heatmap import heatmap, annotate_heatmap
from sklearn.linear_model import LassoCV
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.ensemble import VotingRegressor
# ## Settings
#
# Set plotting settings:
rcParams['figure.autolayout'] = True
rcParams['lines.markersize'] = 5
rcParams['lines.linewidth'] = 1.0
rcParams['font.size'] = 7
rcParams['legend.fontsize'] = 7
rcParams['legend.frameon'] = False
rcParams['xtick.minor.visible'] = True
rcParams['ytick.minor.visible'] = True
rcParams['font.sans-serif'] = 'Arial'
rcParams['mathtext.fontset'] = 'custom'
rcParams['mathtext.rm'] = 'Arial'
rcParams['pdf.fonttype'] = 42
rcParams['ps.fonttype'] = 42
Vdlin = np.linspace(3.6, 2, 1000)
# Define colors for train, test1, and test2 to roughly match Severson:
colors_list = ['Blues', 'Reds', 'Oranges']
# Set `numpy` seed for consistent `sklearn` results:
np.random.seed(0)
# ## Load data
# The data is stored in the `data` directory. The data was generated by the
# `generate_voltage_array.m` MATLAB script, which creates three folders of
# "voltage arrays" for the train, test1, and test2 splits in Severson et al.
#
# We will load each dataset as 3D arrays of `[cell idx, voltage position, cycle number]`.
#
# Note that because cycle 1 is unavailable for one batch, position 0 in axis 2 is cycle 2.
# Similarly, cycle 100 is position 98 in axis 2.
# In describing the cycle numbers in the text, I assume the first cycle is cycle 1
# (i.e. references to cycle number in the text are one-indexed, not zero-indexed).
def sortKeyFunc(s):
    return int(os.path.basename(s)[4:-4])

def load_dataset(folder):
    files = glob.glob(f'./data/{folder}/*.csv')
    files.sort(key=sortKeyFunc)  # glob returns list with arbitrary order
    l = len(files)
    dataset = np.zeros((l, 1000, 99))
models_cycavg['GPR with Squared Exponential Kernel'] = gpr_sqexp_model_cycavg
models['SVM with Cubic Kernel'] = svm_cubic_model
models_cycavg['SVM with Cubic Kernel'] = svm_cubic_model_cycavg
models['K-Nearest Neighbors'] = knn_model
models_cycavg['K-Nearest Neighbors'] = knn_model_cycavg
models['Lasso Regression'] = lasso_model
models_cycavg['Lasso Regression'] = lasso_model_cycavg
models['Bayesian Ridge Regression'] = bayesian_ridge_model
models_cycavg['Bayesian Ridge Regression'] = bayesian_ridge_model_cycavg
models['Bayesian Lasso Regression'] = bayesian_lasso_model
models_cycavg['Bayesian Lasso Regression'] = bayesian_lasso_model_cycavg
# Add Gradient Boosting Regressor to the dictionary of cycavg models
models_cycavg['Gradient Boosting Regression'] = gbr_cycavg
models['Gradient Boosting Regression'] = gbr
models['Ensemble (Lasso & Ridge)'] = ensemble
models_cycavg['Ensemble (Lasso & Ridge)'] = ensemble
model_names = list(models.keys())
for k, model_name in enumerate(models):
    if model_name != "MLP":
        # Get model objects
        model = models[model_name]
        model_cycavg = models_cycavg[model_name]
        # Fit models
        model.fit(X_train_scaled, y_train)
        model_cycavg.fit(X_train_scaled_cycavg, y_train)
# Predict on train and test sets
y_train_pred = model.predict(X_train_scaled).squeeze()
y_test1_pred = model.predict(X_test1_scaled).squeeze()
y_test2_pred = model.predict(X_test2_scaled).squeeze()
y_train_pred_cycavg = model.predict(X_train_scaled_cycavg).squeeze()
y_test1_pred_cycavg = model.predict(X_test1_scaled_cycavg).squeeze()
y_test2_pred_cycavg = model.predict(X_test2_scaled_cycavg).squeeze()
    else:  # MLP model
        y_train_pred = np.log10(mlp_predictions["train_pred"])
        y_test1_pred = np.log10(mlp_predictions["test1_pred"])
        y_test2_pred = np.log10(mlp_predictions["test2_pred"])
        y_train_pred_cycavg = np.log10(mlp_predictions_cycavg["train_pred"])
        y_test1_pred_cycavg = np.log10(mlp_predictions_cycavg["test1_pred"])
        y_test2_pred_cycavg = np.log10(mlp_predictions_cycavg["test2_pred"])
    # Evaluate error
    RMSE_train[k], RMSE_test1[k], RMSE_test2[k] = \
        get_RMSE_for_all_datasets(y_train_pred, y_test1_pred, y_test2_pred)
    RMSE_train_cycavg[k], RMSE_test1_cycavg[k], RMSE_test2_cycavg[k] = \
        get_RMSE_for_all_datasets(y_train_pred_cycavg, y_test1_pred_cycavg, y_test2_pred_cycavg)

    # Display errors
    print(f'{model_name} RMSE: Train={RMSE_train[k]:.0f} cyc, '
          f'Test1={RMSE_test1[k]:.0f} cyc, Test2={RMSE_test2[k]:.0f} cyc')
    print(f'{model_name} RMSE, cycavg: Train={RMSE_train_cycavg[k]:.0f} cyc, '
          f'Test1={RMSE_test1_cycavg[k]:.0f} cyc, Test2={RMSE_test2_cycavg[k]:.0f} cyc')
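# `get_RMSE_for_all_datasets` is defined in an earlier cell of the appendix. A minimal
# sketch of the assumed behaviour (the signature beyond the three prediction arrays is
# an assumption): predictions and targets are log10(cycle life), so errors are converted
# back to cycles before the RMSE is reported.
#
#     def get_RMSE_for_all_datasets(train_pred, test1_pred, test2_pred):
#         rmse = lambda y, y_pred: np.sqrt(np.mean((10 ** y - 10 ** y_pred) ** 2))
#         return (rmse(y_train, train_pred),
#                 rmse(y_test1, test1_pred),
#                 rmse(y_test2, test2_pred))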
REFERENCES
[1]. Liu, J., Saxena, A., Goebel, K., Saha, B., & Wang, W. (2010). An adaptive
recurrent neural network for remaining useful life prediction of lithium-ion
batteries. Annual Conference of the Prognostics and Health Management
Society, PHM 2010.
[2]. Cheng, Y., Lu, C., Li, T., & Tao, L. (2015). Residual lifetime prediction for
lithium-ion battery based on functional principal component analysis and
Bayesian approach. Energy, 90.
https://doi.org/10.1016/j.energy.2015.07.022
[3]. Zhou, Y., & Huang, M. (2016). Lithium-ion batteries remaining useful life
prediction based on a mixture of empirical mode decomposition and
ARIMA model. Microelectronics Reliability, 65.
https://doi.org/10.1016/j.microrel.2016.07.151
[4]. Mo, B., Yu, J., Tang, D., & Liu, H. (2016). A remaining useful life
prediction approach for lithium-ion batteries using Kalman filter and an
improved particle filter. 2016 IEEE International Conference on
Prognostics and Health Management, ICPHM 2016.
https://doi.org/10.1109/ICPHM.2016.7542847
[5]. Zhao, Q., Qin, X., Zhao, H., & Feng, W. (2018). A novel prediction method
based on the support vector regression for the remaining useful life of
lithium-ion batteries. Microelectronics Reliability, 85.
https://doi.org/10.1016/j.microrel.2018.04.007
[6]. Sun, Y., Hao, X., Pecht, M., & Zhou, Y. (2018). Remaining useful life
prediction for lithium-ion batteries based on an integrated health indicator.
Microelectronics Reliability, 88–90.
https://doi.org/10.1016/j.microrel.2018.07.047
[7]. Severson, K. A., Attia, P. M., Jin, N., Perkins, N., Jiang, B., Yang, Z., Chen,
M. H., Aykol, M., Herring, P. K., Fraggedakis, D., Bazant, M. Z., Harris, S.
J., Chueh, W. C., & Braatz, R. D. (2019). Data-driven prediction of battery
cycle life before capacity degradation. Nature Energy, 4(5), 383–391.
https://doi.org/10.1038/s41560-019-0356-8
[8]. Xiong, R., Zhang, Y., Wang, J., He, H., Peng, S., & Pecht, M. (2019).
Lithium-Ion Battery Health Prognosis Based on a Real Battery
Management System Used in Electric Vehicles. IEEE Transactions on
Vehicular Technology, 68(5), 4110–4121.
https://doi.org/10.1109/TVT.2018.2864688
[9]. Ma, G., Zhang, Y., Cheng, C., Zhou, B., Hu, P., & Yuan, Y. (2019).
Remaining useful life prediction of lithium-ion batteries based on false
nearest neighbors and a hybrid neural network. Applied Energy, 253.
https://doi.org/10.1016/j.apenergy.2019.113626
[10]. Yang, F., Wang, D., Xu, F., Huang, Z., & Tsui, K. L. (2020). Lifespan
prediction of lithium-ion batteries based on various extracted features and
gradient boosting regression tree model. Journal of Power Sources, 476.
https://doi.org/10.1016/j.jpowsour.2020.228654
[11]. Liu, H., Naqvi, I. H., Li, F., Liu, C., Shafiei, N., Li, Y., & Pecht, M. (2020).
An analytical model for the CC-CV charge of Li-ion batteries with
application to degradation analysis. Journal of Energy Storage, 29.
https://doi.org/10.1016/j.est.2020.101342
[12]. Hasib, S. A., Islam, S., Chakrabortty, R. K., Ryan, M. J., Saha, D. K.,
Ahamed, M. H., Moyeen, S. I., Das, S. K., Ali, M. F., Islam, M. R.,
Tasneem, Z., & Badal, F. R. (2021). A Comprehensive Review of Available
Battery Datasets, RUL Prediction Approaches, and Advanced Battery
Management. IEEE Access, 9, 86166–86193.
https://doi.org/10.1109/ACCESS.2021.3089032
[13]. Afshari, S. S., Cui, S., Xu, X., & Liang, X. (2022). Remaining Useful Life
Early Prediction of Batteries Based on the Differential Voltage and
Differential Capacity Curves. IEEE Transactions on Instrumentation and
Measurement, 71. https://doi.org/10.1109/TIM.2021.3117631
[14]. Azyus, A. F., Wijaya, S. K., & Naved, M. (2023). Prediction of remaining
useful life using the CNN-GRU network: A study on maintenance
management. Software Impacts, 17.
https://doi.org/10.1016/j.simpa.2023.100535
[15]. Xiong, W., Xu, G., Li, Y., Zhang, F., Ye, P., & Li, B. (2023). Early
prediction of lithium-ion battery cycle life based on voltage-capacity
discharge curves. Journal of Energy Storage, 62.
https://doi.org/10.1016/j.est.2023.106790
[16]. Zhou, Z., & Howey, D. A. (2023). Bayesian hierarchical modelling for
battery lifetime early prediction. IFAC-PapersOnLine, 56(2), 6117–6123.
https://doi.org/10.1016/j.ifacol.2023.10.708
[17]. Lyu, G., Zhang, H., & Miao, Q. (2023). RUL Prediction of Lithium-Ion
Battery in Early-Cycle Stage Based on Similar Sample Fusion Under
Lebesgue Sampling Framework. IEEE Transactions on Instrumentation and
Measurement, 72, 1–11. https://doi.org/10.1109/TIM.2023.3260277
[18]. Xu, Q., Wu, M., Khoo, E., Chen, Z., & Li, X. (2023). A Hybrid Ensemble
Deep Learning Approach for Early Prediction of Battery Remaining Useful
Life. IEEE/CAA Journal of Automatica Sinica, 10(1), 177–187.
https://doi.org/10.1109/jas.2023.123024
[19]. Li, X., Yu, D., Vilsen, S. B., & Stroe, D. I. (2023). The development of
machine learning-based remaining useful life prediction for lithium-ion
batteries. Journal of Energy Chemistry, 82.
https://doi.org/10.1016/j.jechem.2023.03.026
[20]. Wang, S., Fan, Y., Jin, S., Takyi-Aninakwa, P., & Fernandez, C. (2023).
Improved anti-noise adaptive long short-term memory neural network
modeling for the robust remaining useful life prediction of lithium-ion
batteries. Reliability Engineering and System Safety, 230.
https://doi.org/10.1016/j.ress.2022.108920
[21]. Ezzouhri, A., Charouh, Z., Ghogho, M., & Guennoun, Z. (2023). A Data-
Driven-Based Framework for Battery Remaining Useful Life Prediction.
IEEE Access, 11. https://doi.org/10.1109/ACCESS.2023.3286307
[22]. Gong, S., He, H., Zhao, X., Shou, Y., Huang, R., & Yue, H. (2023).
Lithium-Ion battery remaining useful life prediction method concerning
temperature-influenced charging data. International Conference on
Electrical, Computer, Communications and Mechatronics Engineering,
ICECCME 2023. https://doi.org/10.1109/ICECCME57830.2023.10253076
[23]. Zheng, X., Wu, F., & Tao, L. (2024). Degradation trajectory prediction of
lithium-ion batteries based on charging-discharging health features
extraction and integrated data-driven models. Quality and Reliability
Engineering International. https://doi.org/10.1002/qre.3496