
EARLY PREDICTION OF REMAINING USEFUL LIFE OF BATTERY

A PROJECT REPORT

Submitted by
LOGESHWAR S 715520105019

SAI SHRI MOURIYA C M 715520105037

GNANA PRAKASH S 715520105303

SHARATH B 715520105311

In partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

IN

ELECTRICAL AND ELECTRONICS ENGINEERING

PSG INSTITUTE OF TECHNOLOGY AND APPLIED RESEARCH, COIMBATORE - 641062

ANNA UNIVERSITY: CHENNAI 600025

MAY 2024

BONAFIDE CERTIFICATE

Certified that this project report "EARLY PREDICTION OF REMAINING USEFUL LIFE OF BATTERY" is the bonafide work of "LOGESHWAR S, SAI SHRI MOURIYA C M, GNANA PRAKASH S, SHARATH B" who carried out the project work under my supervision.

Dr. C. L. Vasu
Head of the Department
Professor & HoD
Department of Electrical and Electronics Engineering
PSG Institute of Technology and Applied Research

Mr. S. Ravikrishna
Supervisor
Assistant Professor (Sl.Gr.)
Department of Electrical and Electronics Engineering
PSG Institute of Technology and Applied Research

Certified that the candidates were examined in the viva-voce examination held on ________.

Internal Examiner External Examiner

ABSTRACT

As the need for reliable energy storage systems grows, it is imperative to ensure
that batteries are used as effectively as possible. This work addresses the critical issue of
early Remaining Useful Life (RUL) prediction for lithium-ion batteries, with the goal of
enabling proactive maintenance and optimizing battery performance. In this study, a novel approach is presented to accurately predict the RUL of batteries from relevant operational parameters using regression models. The study methodology includes the collection of a significant quantity of operating battery data, including voltage characteristics, temperature profiles, and charge-discharge cycles. Several regression models are applied to this dataset to capture the relationship between parameters such as discharge capacity, voltage, and cycle life. Feature engineering approaches are applied to extract meaningful information that supports accurate RUL prediction.

The proposed technique is validated by experiments using real-world battery


datasets, and the performance of the regression models is compared with baseline models.
The results demonstrate that the proposed regression models outperform traditional
methods in providing accurate and timely RUL forecasts. The finding is significant
because it might lead to better battery management systems. This can result in less
downtime and proactive maintenance methods for a range of applications, including
electric vehicles, portable electronics, and renewable energy systems. The regression
models that have been developed offer a practical way to improve battery performance,
increase the lifespan of batteries, and ultimately support the reliable and sustainable
deployment of energy storage systems.

ACKNOWLEDGEMENT

We would like to express our gratitude to Dr. N. Saravanakumar, Principal


and the management of PSG Institute of Technology and Applied Research, for providing us the opportunity to carry out our project with the best infrastructure and facilities.

We extend our sincere thanks to Dr. C. L. Vasu, Professor and Head of


the Department of Electrical and Electronics Engineering for providing
consistent support through the project.

We extend our sincere thanks to our project guide Mr. S. Ravikrishna,


Assistant Professor (Sl.Gr.), Department of Electrical and Electronics Engineering, for his unfailing support throughout the project.

We extend our gratitude to all the faculty and staff members, friends and
our parents for their constant support and guidance.

TABLE OF CONTENTS

CHAPTER NO. TITLE PAGE NO.

ABSTRACT ii
LIST OF TABLES vi
LIST OF FIGURES vii
LIST OF ABBREVIATIONS ix
1 INTRODUCTION 1
1.1 NEED FOR ANALYSIS 2
1.2 OBJECTIVES 3
1.3 OUTLINE OF THE REPORT 3
2 LITERATURE SURVEY 5
3 FEATURE EXTRACTION 14
3.1 INTRODUCTION 14
3.2 SUMMARY OF DATASET 14
3.3 FEATURE SELECTION AND EXTRACTION 16
3.4 PEARSON CORRELATION COEFFICIENT 21
3.5 FEATURE GENERATION AND STATISTICAL APPROACH 23
4 UNIVARIATE AND MULTIVARIATE BASED RUL PREDICTION 26
4.1 STATISTICAL MODELS 26
4.2 UNIVARIATE MODEL FROM ΔQ100−10(V) 26
4.2.1 Univariate Features and Transformations 27
4.3 MULTIVARIATE MODEL 28

4.3.1 Methods of Multivariate Model 30
4.3.2 Hyperparameter Optimization 32
4.4 LASSO REGRESSION MODEL 34
5 RESULT AND DISCUSSION 37
6 CONCLUSION 45
7 APPENDIX 46
8 REFERENCES 62

LIST OF TABLES

S.NO. NAME PAGE NO.


3.1 Dataset Description 15

3.2 Summary of Selected Features 17

3.3 Summary of Extracted Features Based on PCC 23

5.1 Root-Mean-Square Error of Univariate Models 38

5.2 Root-Mean-Square Error of Multivariate Models 41

LIST OF FIGURES

S.NO. NAME PAGE NO.

1.1 Block Diagram of RUL Prediction 3

3.1 Discharge Capacity Curves for 100th, 10th and 2nd Cycles 21

3.2 Initial Capacity Curve of Single Cycle 19

3.3 Initial Capacity Curve of Multiple Cycles 19

3.4 Internal Resistance Distribution of First 1000 Cycles 20

3.5 Voltage Distribution of Multiple Cycles 21

3.6 PCC Values of Extracted Features 22

3.7 Voltage Vs ΔQ100-10 (V) 24

4.1 Dataset Distribution Histogram 30

4.2 Lasso Model Interpretation 34

4.3 Actual Vs Predicted Lifecycle 35

5.1 Root-Mean-Square Error Of Univariate Model-1 39

5.2 Root-Mean-Square Error Of Univariate Model-2 39

5.3 Root-Mean-Square Error Of Univariate Model-3 40


5.4 Root-Mean-Square Error Of Univariate Model-4 40

5.5 Root-Mean-Square Error Of Multivariate Models 42

LIST OF ABBREVIATIONS

GPR Gaussian Process Regression

IDR Interdecile Range

IQR Interquartile Range

KNN K-Nearest Neighbors

LCB Lower Confidence Bound

MAE Mean Absolute Error

MLP Multilayer Perceptron

PCA Principal Component Analysis

PCC Pearson Correlation Coefficient

PLSR Partial Least Squares Regression

RF Random Forest

RMSE Root Mean Square Error

RUL Remaining Useful Life

SVM Support Vector Machine

UCB Upper Confidence Bound

CHAPTER 1

INTRODUCTION

This project examines the significance of precise battery lifetime prediction and investigates the use of statistical learning approaches for developing models in this domain. First, the applications of battery lifetime prediction are outlined, encompassing screening new components during research, optimizing cell designs, and predicting real-world cycle life and safety events. The challenges in modelling battery degradation are then acknowledged, leading to a discussion of desirable criteria for data-driven models, emphasizing attributes such as accuracy, minimal cycle requirements, small training set sizes, and interpretability.
The role of machine learning, particularly advanced and deep learning
methods, is explored for their potential in capturing information from complex
battery cycling datasets. However, their drawbacks, including brittleness and
interpretability issues, are noted. The text underscores the underexplored realm of
statistical learning for battery lifetime prediction, referencing the work of Severson
et al., who developed models for lithium iron phosphate/graphite cells undergoing
fast charging.
Various dimensionality reduction and model-building approaches are
discussed, revealing that simple models, such as multivariate and univariate linear
models based on ΔQ100−10(V), perform comparably to more complex ones.
Interpretability is highlighted as crucial, showcasing the effectiveness of statistical
learning.
Moreover, this project sheds light on new approaches using elements of
ΔQ100−10(V) and their unique insights into the relationship with cycle life. This
project mainly underscores the importance of accurate predictions, acknowledges

challenges in modeling, and advocates for the integration of interpretable statistical
learning approaches alongside advanced machine learning methods.

1.1 NEED FOR ANALYSIS


The study by Severson et al. on battery lifetime prediction using statistical
learning approaches addresses several critical aspects, emphasizing the need for
such investigations:
● Battery Degradation Complexity: Traditional modeling struggles with intricate battery degradation mechanisms, favoring statistical learning for capturing complex interactions within high-dimensional datasets.
● Versatile Applications: Accurate battery lifetime prediction is vital across the
product cycle, aiding R&D, electric vehicles, and grid storage optimization,
ensuring design efficiency, cycling protocols, and cost estimates.
● Interpretability and Trust: Integrating statistical learning ensures model
interpretability, fostering trust, generalization across datasets, and
understanding battery degradation physics.
● Benchmarking: Severson et al.'s study serves as a benchmark, comparing
statistical learning with advanced methods, offering insights into predictive
modeling trade-offs.
● Dataset Considerations: The lithium iron phosphate/graphite cell dataset
underscores tailored approaches' importance, addressing dataset-specific
challenges in battery lifetime prediction.

Figure 1.1 Block Diagram of RUL Prediction

1.2 OBJECTIVES
● Prediction of Remaining Useful Life (RUL) within the early cycles
(approximately 100 cycles).
● Utilize a multivariate approach to estimate Remaining Useful Life (RUL)
based on historical data and current conditions.
● Feature selection for reduction of error between predicted and actual values.
● Validate the RUL estimation model using appropriate techniques and
optimize its parameters.

1.3 OUTLINE OF THE REPORT

CHAPTER 1: INTRODUCTION
Brief introduction about the statistical learning for accurate battery lifetime
prediction, addressing challenges and advocating interpretability alongside
advanced methods. The needs and objectives of the project are explained.

CHAPTER 2: LITERATURE SURVEY
Recent advancements in lithium-ion battery Remaining Useful Life (RUL) prediction methods are summarized, including data-driven approaches, machine learning models, and hybrid algorithms that showcase improved accuracy and applicability across various battery types and conditions.

CHAPTER 3: FEATURE EXTRACTION


Brief information about the dataset used and in-depth information of various
features and their implementation towards the final objective are discussed.

CHAPTER 4: UNIVARIATE AND MULTIVARIATE BASED RUL


PREDICTION
Various models and software, and their contribution towards the final objective of the project, are explained.

CHAPTER 5: RESULTS AND DISCUSSION


The outcomes attained when implementing different models and features are
enumerated. Tables and figures are used to illustrate the project's final result.

CHAPTER 6: CONCLUSION
Brief summary of the project is provided. Future extension of the project
and reference that helped in completion of the project are discussed.

CHAPTER 2
LITERATURE SURVEY

Liu, J., Saxena, A., Goebel, K., Saha, B., & Wang, W. (2010). An adaptive
recurrent neural network for remaining useful life prediction of lithium-ion
batteries. Annual Conference of the Prognostics and Health Management Society,
PHM 2010. The ARNN utilizes adaptive/recurrent neural network architecture
with optimized network weights using the recursive Levenberg-Marquardt (RLM)
method, showcasing its effectiveness in prognostics.

Cheng, Y., Lu, C., Li, T., & Tao, L. (2015). Residual lifetime prediction for
lithium-ion battery based on functional principal component analysis and Bayesian
approach. Energy, 90. The study presents a novel method for real-time lithium-ion
battery residual lifetime prediction, using functional principal component analysis
(FPCA) and Bayesian updating, validated with NASA Ames Prognostics Center
data.

Zhou, Y., & Huang, M. (2016). Lithium-ion batteries remaining useful life
prediction based on a mixture of empirical mode decomposition and ARIMA
model. Microelectronics Reliability, 65. The paper proposes a novel approach for
predicting lithium-ion battery remaining useful life (RUL) in electric vehicle
battery management systems. It combines empirical mode decomposition (EMD)
with autoregressive integrated moving average (ARIMA) models, outperforming
other methods like relevance vector machine and monotonic echo state networks.

Mo, B., Yu, J., Tang, D., & Liu, H. (2016). A remaining useful life prediction
approach for lithium-ion batteries using Kalman filter and an improved particle
filter. 2016 IEEE International Conference on Prognostics and Health
Management, ICPHM 2016. The paper proposes a novel Particle Filtering (PF)
method for estimating the Remaining Useful Life (RUL) of lithium-ion batteries.
By combining Kalman filter with PF and integrating a particle swarm optimization
algorithm, it achieves higher accuracy compared to standard PF and particle swarm
optimization-based PF methods, validated using NASA's battery dataset.

Zhao, Q., Qin, X., Zhao, H., & Feng, W. (2018). A novel prediction method based
on the support vector regression for the remaining useful life of lithium-ion
batteries. Microelectronics Reliability, 85. The paper introduces real-time
measurable health indicators (HIs) for lithium-ion battery state of health (SOH)
estimation, including the time interval of an equal charging voltage difference
(TIECVD) and the time interval of an equal discharging voltage difference
(TIEDVD). These HIs, along with a novel method combining feature vector
selection (FVS) and support vector regression (SVR), enhance accuracy and
efficiency in predicting SOH and remaining useful life (RUL). Experimental
results on two datasets validate the effectiveness of the proposed approach.

Sun, Y., Hao, X., Pecht, M., & Zhou, Y. (2018). Remaining useful life prediction
for lithium-ion batteries based on an integrated health indicator. Microelectronics
Reliability, 88–90. The paper introduces an integrated health indicator for lithium-
ion battery remaining useful life (RUL) prediction, combining multiple factors
with a polynomial degradation model and particle filter algorithm. It offers insights
into RUL prediction in energy-intensive applications.

Severson, K. A., Attia, P. M., Jin, N., Perkins, N., Jiang, B., Yang, Z., Chen, M.
H., Aykol, M., Herring, P. K., Fraggedakis, D., Bazant, M. Z., Harris, S. J., Chueh,
W. C., & Braatz, R. D. (2019). Data-driven prediction of battery cycle life before
capacity degradation. Nature Energy, 4(5), 383–391. The study utilizes early-cycle
discharge data from commercial LFP/graphite batteries to develop accurate cycle
life prediction models using data-driven methods without requiring prior
knowledge of cell chemistry.

Xiong, R., Zhang, Y., Wang, J., He, H., Peng, S., & Pecht, M. (2019). Lithium-Ion
Battery Health Prognosis Based on a Real Battery Management System Used in
Electric Vehicles. IEEE Transactions on Vehicular Technology, 68(5), 4110–4121.
This paper introduces a health indicator derived from partial charge
voltage curves for lithium-ion batteries, alongside a moving-window-based method
for predicting battery remaining useful life (RUL). Tested on a real electric vehicle
battery management system, the method shows promising results in estimating
capacity and predicting RUL.

Ma, G., Zhang, Y., Cheng, C., Zhou, B., Hu, P., & Yuan, Y. (2019). Remaining
useful life prediction of lithium-ion batteries based on false nearest neighbors and a
hybrid neural network. Applied Energy, 253. The paper proposes a hybrid neural
network approach, combining false nearest neighbors for sliding window size
determination with a convolutional and long short-term memory network.
Experimental results show improved prediction accuracy for remaining useful life
estimation in lithium-ion batteries compared to existing methods.

Yang, F., Wang, D., Xu, F., Huang, Z., & Tsui, K. L. (2020). Lifespan prediction
of lithium-ion batteries based on various extracted features and gradient boosting
regression tree model. Journal of Power Sources, 476. This paper proposes using
gradient boosting regression trees (GBRT) to accurately predict battery lifespan by
modeling complex nonlinear dynamics through various extracted features,
outperforming other machine learning methods.

Liu, H., Naqvi, I. H., Li, F., Liu, C., Shafiei, N., Li, Y., & Pecht, M. (2020). An
analytical model for the CC-CV charge of Li-ion batteries with application to
degradation analysis. Journal of Energy Storage, 29. The paper explores the
relationship between constant current charge time (CCCT) and constant voltage
charge time (CVCT) in Li-ion batteries' degradation. It develops an analytical
model verified with cycling tests under various conditions and introduces the CV-
CC time ratio as a new health indicator. Additionally, two partial CC-CV charging
cases are examined.

Hasib, S. A., Islam, S., Chakrabortty, R. K., Ryan, M. J., Saha, D. K., Ahamed, M.
H., Moyeen, S. I., Das, S. K., Ali, M. F., Islam, M. R., Tasneem, Z., & Badal, F. R.
(2021). A Comprehensive Review of Available Battery Datasets, RUL Prediction
Approaches, and Advanced Battery Management. IEEE Access, 9, 86166–86193.
The review paper discusses battery data acquisition, management systems for state
estimation, and remaining useful life (RUL) estimation. It covers Li-ion battery
datasets, state estimation, RUL prognostic methods, and comparative studies of
prediction algorithms and models.

Afshari, S. S., Cui, S., Xu, X., & Liang, X. (2022). Remaining Useful Life Early
Prediction of Batteries Based on the Differential Voltage and Differential Capacity
Curves. IEEE Transactions on Instrumentation and Measurement, 71. This article
proposes a data-driven method for early prediction of battery remaining useful life
(RUL) using features extracted from differential capacity and voltage curves.
Employing Sparse Bayesian Learning (SBL), the method outperforms lasso and
elastic net methods in accuracy, offering a versatile solution applicable to various
battery types.

Azyus, A. F., Wijaya, S. K., & Naved, M. (2023). Prediction of remaining useful
life using the CNN-GRU network: A study on maintenance management. Software
Impacts, 17. The paper discusses the CNN-GRU network, an open-source software
package for predicting remaining useful lives (RULs) using
advanced deep learning techniques. It highlights ongoing research, limitations, and
future improvements aimed at enhancing the software's accuracy, efficiency, and
applicability in various industries.

Xiong, W., Xu, G., Li, Y., Zhang, F., Ye, P., & Li, B. (2023). Early prediction of
lithium-ion battery cycle life based on voltage-capacity discharge curves. Journal
of Energy Storage, 62. This paper proposes a method for the early prediction of
lithium-ion battery cycle life using weighted least squares support vector machine (WLS-
SVM) with health indicators (HIs) extracted from voltage-capacity discharge
curves. By addressing nonlinearity and outlier data issues, it achieves accurate
cycle life prediction, validating its effectiveness for early prediction.

Zhou, Z., & Howey, D. A. (2023). Bayesian hierarchical modelling for battery
lifetime early prediction. IFAC-PapersOnLine, 56(2), 6117–6123. The paper
proposes a hierarchical Bayesian linear model for battery life prediction,
combining individual cell features with population-wide features. Leveraging data
from the first 100 cycles, the model outperforms baseline models with a root mean
square error of 3.2 days and mean absolute percentage error of 8.6% in 5-fold
cross-validation.

Lyu, G., Zhang, H., & Miao, Q. (2023). RUL Prediction of Lithium-Ion Battery in
Early-Cycle Stage Based on Similar Sample Fusion Under Lebesgue Sampling
Framework. IEEE Transactions on Instrumentation and Measurement, 72, 1–11.
This article suggests an RUL prediction method for early-cycle lithium-ion
batteries, employing similar sample fusion under the Lebesgue sampling
framework. It introduces a novel similarity measurement index optimized by the
jumping spider optimization algorithm (JSOA), showcasing effectiveness through
experimental results on the APR18650M1A battery dataset.

Xu, Q., Wu, M., Khoo, E., Chen, Z., & Li, X. (2023). A Hybrid Ensemble Deep
Learning Approach for Early Prediction of Battery Remaining Useful Life.
IEEE/CAA Journal of Automatica Sinica, 10(1), 177–187. The paper proposes a
hybrid deep learning model for early prediction of lithium-ion battery remaining
useful life (RUL). It combines handcrafted features with deep network-learned
features and utilizes a non-linear correlation-based method for feature selection. A
snapshot ensemble learning strategy enhances model generalization. Experimental
results show superior performance compared to other approaches, even on
secondary test sets with different distributions.

Li, X., Yu, D., Søren Byg, V., & Daniel Ioan, S. (2023). The development of
machine learning-based remaining useful life prediction for lithium-ion batteries.
Journal of Energy Chemistry, 82. The study tracks the progress of
machine learning (ML) algorithms in predicting the remaining useful life (RUL) of
lithium-ion batteries, highlighting common approaches and exploring the potential
to extend battery lifetime through optimized charging profiles based on accurate
RUL predictions.

Wang, S., Fan, Y., Jin, S., Takyi-Aninakwa, P., & Fernandez, C. (2023). Improved
anti-noise adaptive long short-term memory neural network modeling for the
robust remaining useful life prediction of lithium-ion batteries. Reliability
Engineering and System Safety, 230. The paper presents an enhanced method for
predicting lithium-ion battery remaining useful life (RUL) in power supply
applications. It introduces an improved anti-noise adaptive long short-term
memory (ANA-LSTM) neural network with robust feature extraction and
parameter optimization, achieving significantly better prediction accuracy than
existing methods.

Ezzouhri, A., Charouh, Z., Ghogho, M., & Guennoun, Z. (2023). A Data-Driven-
Based Framework for Battery Remaining Useful Life Prediction. IEEE Access, 11.
The paper introduces a data-driven framework for predicting the Remaining Useful
Life (RUL) of lithium-ion batteries in Battery Electric Vehicles (BEVs). It includes
a novel feature extraction strategy and demonstrates high accuracy on batteries not
used during training, showcasing its adaptability to various discharge conditions.

Gong, S., He, H., Zhao, X., Shou, Y., Huang, R., & Yue, H. (2023). Lithium-Ion
battery remaining useful life prediction method concerning temperature-influenced
charging data. International Conference on Electrical, Computer, Communications
and Mechatronics Engineering, ICECCME 2023. The paper proposes an integrated
method for predicting the Remaining Useful Life (RUL) of lithium-ion batteries. It
considers temperature effects, employing a temperature correction formula for
health indicators, neural network (NN) techniques for determining battery State of
Health (SOH), and an Unscented Particle Filter (UPF) for end-of-life (EOL)
prediction. Experimental validation using Sandia National Lab data confirms its
improved accuracy over conventional methods.

Zheng, X., Wu, F., & Tao, L. (2024). Degradation trajectory prediction of lithium-
ion batteries based on charging-discharging health features extraction and
integrated data-driven models. Quality and Reliability Engineering International.
The paper introduces CSSA-ELM-LSSVR, a fusion method for predicting battery
RUL, combining CSSA, ELM, and LSSVR. It extracts HIs from charging-
discharging, correlates them with aging, and predicts degradation using ELM and
LSSVR optimized by CSSA. Fusion of CSSA-ELM and LSSVR with weighting
schemes outperforms other methods, validated on NASA and MIT datasets.

The literature review covers various methods and models for predicting the
remaining useful life (RUL) of lithium-ion batteries, essential for managing battery
health and ensuring reliable performance in electric vehicles and other
applications. Methods include data-driven approaches like neural networks,
particle filtering, and machine learning algorithms, as well as hybrid models
combining different techniques for improved accuracy.

These approaches leverage features extracted from battery data, such as voltage-
capacity curves, early-cycle discharge data, and charging-discharging health
indicators, to estimate RUL. Several studies propose novel algorithms and
frameworks for RUL prediction, addressing challenges such as temperature effects,
nonlinear dynamics, and limited data availability.
Fusion methods, Bayesian models, and ensemble learning techniques are explored
to enhance prediction performance and robustness. Experimental validation on
publicly available datasets, such as those from NASA and MIT, demonstrates the
effectiveness of these methods in accurately predicting battery RUL under various
operating conditions.
Additionally, the review discusses the significance of RUL prediction for battery
management, highlighting its potential to optimize maintenance schedules, prolong
battery lifespan, and improve overall system reliability. Overall, the literature
provides valuable insights into the state-of-the-art techniques and future directions
in battery RUL prediction, contributing to advancements in energy storage
technology and electric vehicle deployment.

CHAPTER 3
FEATURE EXTRACTION

3.1 INTRODUCTION
A key component of predictive maintenance is feature extraction, especially
when it comes to early battery Remaining Useful Life (RUL) prediction. The need for
dependable energy storage is growing, and with it comes the necessity to be able to
extract meaningful insights from raw battery data. This chapter explores the
fundamental function of feature extraction techniques and clarifies their importance in
extracting relevant data from intricate datasets that include voltage characteristics,
temperature profiles, and charge-discharge cycles. Through the use of a range of
methodologies, including statistical analysis, time-domain assessments, and frequency-
domain transformations, this research aims to reveal the fundamental patterns that
underlie battery performance and therefore enable precise RUL prediction.
This chapter delves into an exhaustive exploration of diverse feature selection
criteria and extraction methodologies, aiming to shed comprehensive light on the
transformative process of converting raw battery data into actionable insights. By
comprehensively evaluating these methods, the aim is to provide insights into their
applicability, effectiveness, and computational efficiency in the context of battery RUL
prediction.

3.2 SUMMARY OF THE DATASET


The dataset selected for this research consists of 124 commercially available lithium iron phosphate (LFP)/graphite cells, A123 Systems model APR18650M1A, with a nominal capacity of 1.1 Ah. These cells were subjected to an intense cycling program in an environment carefully regulated to maintain a steady temperature of 30°C. During the experimental protocol, the cells underwent a wide range of fast-charging conditions, from the 3.6C fast-charging rate recommended by the manufacturer to an exploratory extreme of 6C, designed to mimic rapid-charging scenarios with charging times as short as roughly 10 minutes.

Table 3.1 Dataset Description

Table 3.1 provides a brief explanation of each cell in the dataset. The term "batch date" denotes the day the batch was initiated. "Charging policy" describes the currents applied from 0% to 80% SOC, written as "C1(Q1%)-C2," where C1 and C2 stand for the first and second applied C rates, respectively, and Q1 is the SOC at which the current changes.
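As an illustration of this notation, a small parser can recover the two C rates and the switchover SOC from a policy string. This is a hypothetical Python sketch (the report's data processing was done in MATLAB), assuming policies are written like "5.4(60%)-3.6", optionally with a trailing "C" on the rates:

```python
import re

def parse_charging_policy(policy):
    """Split a 'C1(Q1%)-C2' charging-policy string into its parts:
    charge at C1 until Q1% SOC, then at C2 up to 80% SOC."""
    match = re.fullmatch(r"([\d.]+)C?\((\d+)%\)-([\d.]+)C?", policy)
    if match is None:
        raise ValueError(f"unrecognized charging policy: {policy}")
    c1 = float(match.group(1))   # first applied C rate
    q1 = int(match.group(2))     # SOC (%) at which the current changes
    c2 = float(match.group(3))   # second applied C rate
    return c1, q1, c2

print(parse_charging_policy("5.4(60%)-3.6"))  # → (5.4, 60, 3.6)
```

Representing each policy as a (C1, Q1, C2) triple makes it straightforward to group cells by charging regimen when analyzing cycle life.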

The critical parameter used to measure the lifespan and performance of a cell is cycle life, defined as the total number of charge-discharge cycles until the discharge capacity falls to 80% of the nominal capacity. The dataset spans an extensive range of cycle lifetimes, from approximately 150 cycles to 2,300 cycles, reflecting the various charging regimens that were implemented. In total, the dataset records about 96,700 cycles, making it one of the largest lithium-ion battery cycling datasets currently accessible.
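This cycle-life definition can be expressed directly in code. The following is an illustrative Python sketch (not the report's MATLAB pipeline), assuming the per-cycle discharge capacities of a cell are available as an array:

```python
import numpy as np

NOMINAL_CAPACITY_AH = 1.1  # A123 APR18650M1A nominal capacity
EOL_FRACTION = 0.8         # end of life defined at 80% of nominal capacity

def cycle_life(discharge_capacity_ah):
    """Return the first cycle number at which discharge capacity drops
    below 80% of nominal, or None if the cell has not reached end of life."""
    q = np.asarray(discharge_capacity_ah, dtype=float)
    below = np.nonzero(q < EOL_FRACTION * NOMINAL_CAPACITY_AH)[0]
    if below.size == 0:
        return None
    return int(below[0]) + 1  # cycles are numbered from 1

# Synthetic fade curve for illustration: linear fade from 1.1 Ah to 0.85 Ah
capacity = np.linspace(1.1, 0.85, 1000)
```

Applying this definition per cell yields the cycle-life labels that the regression models in later chapters are trained to predict.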
Critical parameters like voltage, current, cell temperature, and internal resistance
were monitored throughout the cycling process. This process enriched the dataset with
detailed insights into battery behavior under a variety of operating conditions. In
addition to providing insight into the complex dynamics of battery degradation, these
observations set the stage for the creation and validation of complex predictive models
that are precisely tailored for cycle life prediction and optimization of battery
performance in a wide range of practical applications.

3.3 FEATURE SELECTION AND EXTRACTION

Features were selected to capture the electrochemical evolution of individual lithium-ion cells during cycling while remaining agnostic to any particular chemistry or degradation mechanism. Features such as initial discharge capacity, charge time, and cell temperature provide insights into the battery's performance and potential degradation.
The change in the discharge voltage curves between the 100th and 10th cycles, ΔQ100−10(V), was important for understanding battery behavior over time. Summary statistics such as the minimum, mean, and variance of the ΔQ100−10(V) curve were calculated to capture changes between cycles.
These features were chosen for their predictive ability and used in models to assess the battery's cycle life, with a focus on metrics such as root-mean-square error (RMSE) and average percentage error.
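These summary statistics can be sketched as follows. This is an illustrative Python version (the report used MATLAB), assuming the discharge capacities of cycles 100 and 10 have already been interpolated onto a common voltage grid:

```python
import numpy as np

def delta_q_features(q100, q10):
    """Log-transformed summary statistics of the ΔQ100−10(V) curve.
    q100 and q10 are discharge capacities (Ah) of cycles 100 and 10
    sampled on the same voltage grid."""
    dq = np.asarray(q100, dtype=float) - np.asarray(q10, dtype=float)
    return {
        "log_var_dq": np.log10(np.var(dq)),           # variance feature
        "log_min_dq": np.log10(np.abs(np.min(dq))),   # minimum feature
        "log_mean_dq": np.log10(np.abs(np.mean(dq))),  # mean feature
    }

# Synthetic example: capacity loss growing from 5 mAh to 20 mAh across the grid
q10 = np.full(100, 1.05)
q100 = q10 - np.linspace(0.005, 0.020, 100)
features = delta_q_features(q100, q10)
```

The log transform compresses the several-orders-of-magnitude spread of these statistics across cells, which is why the tabulated features below are stated as log-transformed quantities.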

Table 3.2 Summary of Selected Features

Feature No Quantity

1 The variance of the change in charge between the 100th and 10th
cycle, with respect to voltage, is computed and log transformed.

2 The minimum change in charge between the 100th and 10th cycle, with
respect to voltage, is computed and log transformed.

3 The slope and intercept of the linear fit to the capacity fade curve from
cycles 2 to 100.

4 The discharge capacity at the 2nd cycle.

5 The average charge time over the first 5 cycles.

6 The minimum internal resistance from cycles 2 to 100.

7 The difference in internal resistance between cycles 2 and 100.

8 A feature based on the variance of voltage data from cycles 2 to 100,
which is log-transformed.

9 A feature based on the variance of current data from cycles 2 to 100,
which is log-transformed.

10 The integral of temperature over cycles 2 to 100.

11 Discharge capacity at cycle 100.

12 Difference between the maximum discharge capacity and the discharge capacity at cycle 2.

Twelve features were selected from the raw data collected during the cycling of lithium
iron phosphate (LFP)/graphite cells and processed using MATLAB. These features are
summarized in Table 3.2. While this feature set may not represent either the minimal or
maximal possible set, these features offer valuable information for the early prediction
of the remaining useful life (RUL) of the batteries.

Figure 3.1 Discharge Capacity Curves for 100th, 10th and 2nd Cycles

Severson et al. [7] relied mainly on features from ΔQ100−10(V) because of its strong
ability to predict battery cycle life, especially in cases where degradation is uneven.
Our analysis confirmed this: features from ΔQ100−10(V) also showed good predictive
ability. These features are primarily centered around the "Delta Q" (ΔQ) metric, shown
in Figure 3.1, which represents the change in charge capacity of the battery between
different cycles. ΔQ is a fundamental measure that captures the dynamic behavior and
degradation of the battery over its lifetime.

For instance, features such as the variance and minimum change in ΔQ between
specific cycles offer valuable insights into the variability and minimum degradation
experienced by the battery. These metrics provide a comprehensive view of how the
battery's capacity fluctuates and degrades over time.
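As an illustrative sketch of these two summary features (written in Python with a synthetic curve standing in for measured data; the project's own processing was done in MATLAB), the log-transformed variance and minimum of ΔQ100−10(V) can be computed as:

```python
import numpy as np

# Synthetic stand-ins for discharge capacity (Ah) sampled on a shared
# 1000-point voltage grid at cycles 10 and 100 (illustrative data only).
voltage = np.linspace(3.5, 2.0, 1000)                    # discharge: high V -> low V
q_cycle10 = 1.1 * (1 - np.exp(-3.0 * (3.5 - voltage)))   # smooth capacity curve
q_cycle100 = q_cycle10 - 0.01 * (3.5 - voltage) ** 2     # mild fade at low voltage

# Delta Q_{100-10}(V): the capacity difference at each voltage point.
delta_q = q_cycle100 - q_cycle10

# Log-transformed summary features (features 1 and 2 in Table 3.2); the log
# is taken because these quantities span orders of magnitude across cells.
feat_log_var = np.log10(np.var(delta_q))
feat_log_min = np.log10(abs(np.min(delta_q)))
print(feat_log_var, feat_log_min)
```

On real data, the two capacity arrays would come from interpolating each cycle's measured discharge curve onto the common voltage grid before differencing.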

Figure 3.2 Initial Capacity Curve of Single Cycle

Figure 3.3 Initial Capacity Curve of Multiple Cycles

The IC (incremental capacity) curve serves as a valuable feature because it provides
key health indicators for analyzing battery aging and predicting remaining service life.
As shown in Figures 3.2 and 3.3, these indicators include parameters such as peak
height, position, area, width, and slope, which reflect changes in the battery's condition
over time. Several studies have demonstrated the effectiveness of extracting health
indicators from the IC curve for estimating battery State Of Health (SOH) with high
precision. Utilizing the IC curve allows us to capture meaningful information about the
battery's condition and improve the accuracy of our battery SOH estimation.
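A minimal sketch of extracting the IC curve and two of the indicators named above (peak height and peak position), using Python and a synthetic sigmoid charge curve as an illustrative stand-in for a measured LFP plateau:

```python
import numpy as np

# Synthetic charge curve: capacity vs voltage with a single plateau near
# 3.3 V, standing in for a measured LFP curve (illustrative values only).
voltage = np.linspace(3.0, 3.6, 601)
capacity = 1.1 / (1.0 + np.exp(-(voltage - 3.3) / 0.02))   # sigmoid plateau

# Incremental capacity curve: IC = dQ/dV, via finite differences.
ic = np.gradient(capacity, voltage)

# Two of the health indicators discussed in the text: peak height and position.
peak_idx = int(np.argmax(ic))
peak_height = ic[peak_idx]
peak_voltage = voltage[peak_idx]
print(round(peak_height, 2), round(peak_voltage, 3))
```

In practice the measured curve is noisy, so the capacity data is usually smoothed (e.g., with a moving average) before differentiating.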

Furthermore, features related to discharge capacity at different cycles, such as the 2nd
and 100th cycles, offer essential information about the battery's initial capacity and its
capacity at later stages. These features highlight how the battery's performance evolves
over its lifetime, reflecting both its initial state and its degradation trajectory.

Figure 3.4 Internal Resistance Distribution of First 1000 Cycles

Figure 3.4 shows the internal resistance (Int. Res.) distribution over the first 1000
cycles of cell number 5 from the dataset. Rising internal resistance suggests
degradation, which affects battery performance and lifespan. A related feature,
IRDiff2And100, captures the change in internal resistance from the 2nd cycle to the
100th cycle: positive values indicate increasing internal resistance, reflecting
degradation and reduced lifespan.

Figure 3.5 Voltage Distribution of Multiple Cycles

During the discharge process of a battery, it takes some time for the voltage to drop
from a relatively high value to a lower one, as denoted in Figure 3.5, which shows the
voltage distribution over multiple cycles of cell number 5 from the dataset. This time
decreases as the number of battery charge–discharge cycles increases: with more
cycles, the discharge voltage drops progressively faster.

3.4 PEARSON CORRELATION COEFFICIENT

The Pearson Correlation Coefficient (PCC) analysis is a statistical method used to
measure the linear correlation between two variables. In our case, PCC is used to
assess the relationship between each battery feature and battery cycle life. Features
with high absolute PCC values should be prioritized, since they indicate a stronger
linear correlation with cycle life. The goal is to select the 8 features with the
highest absolute PCC values, as they are likely to have the most significant influence
on predicting battery cycle life. This approach helps identify the most relevant
features for the predictive model.
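The ranking step above can be sketched as follows (Python, with a synthetic 41-cell feature matrix as an illustrative stand-in for the extracted features):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 41 cells x 12 features plus a cycle-life vector that
# depends strongly on the first two columns (illustrative data only).
n_cells, n_features = 41, 12
X = rng.normal(size=(n_cells, n_features))
cycle_life = 800 + 300 * X[:, 0] - 200 * X[:, 1] + 50 * rng.normal(size=n_cells)

# Pearson correlation of every feature with cycle life.
pcc = np.array([np.corrcoef(X[:, j], cycle_life)[0, 1] for j in range(n_features)])

# Keep the 8 features with the largest |PCC|, as described in the text.
top8 = np.argsort(-np.abs(pcc))[:8]
print(top8, np.round(pcc[top8], 2))
```

Ranking by the absolute value matters: a strongly negative correlation (e.g., between internal resistance growth and cycle life) is just as informative as a strongly positive one.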

Figure 3.6 PCC Values of Extracted Features

The PCC values for each of the 12 features were calculated and visualized; for clarity
and simplicity, only the highest PCC values are shown in Figure 3.6. This keeps the
focus on the features with the strongest relationship to cycle life. From this figure,
the features with the highest correlation were identified and prioritized, enhancing
the accuracy and reliability of the predictive model.

The features that exhibited the highest PCC values, notably originating from discharge
capacity and internal resistance, align with expectations due to their direct influence on
battery performance and degradation. Table 3.3 shows the selected features.

Table 3.3 Summary of Extracted Features Based on PCC

ΔQ100−10(V) features:
1. Variance of the change in charge between the 100th and 10th cycle, with respect to voltage, computed and log transformed.
2. Minimum change in charge between the 100th and 10th cycle, with respect to voltage, computed and log transformed.

Discharge capacity fade curve features:
3. Discharge capacity at cycle 100.
4. Difference between the maximum discharge capacity and the discharge capacity at cycle 2.
5. Discharge capacity at the 2nd cycle.

Other features:
6. Average charge time over the first 5 cycles.
7. Difference in internal resistance between cycles 2 and 100.
8. Internal resistance at cycle 2.

3.5 FEATURE GENERATION AND STATISTICAL APPROACH

All electrochemical discharge data for the first 100 cycles of each battery cell
can be effectively represented using a single capacity matrix, which comprises 1000
voltage points and 100 cycles, resulting in a high-dimensional dataset akin to a
grayscale image. Although such a representation lends itself to neural network models,
neural networks come with inherent drawbacks, including the need for many training
samples, computational intensity, and the requirement for domain expertise, and they
are challenging to interpret.

In this context, our study primarily focuses on exploring classical statistical
learning methods rather than deep learning models. The primary objective remains
predicting the log10-transformed cycle life, and model performance was evaluated
using root-mean-squared error (RMSE).

A critical aspect of statistical learning involves generating features that
meaningfully represent the input data, aiding in predicting the objective. Given the high
dimensionality of our dataset (100,000 features) relative to the number of observations
(41), our goal is to reduce the dataset's dimensionality to its intrinsic dimension.
Dimensionality reduction can be achieved through manual feature selection or by
employing dimensionality reduction methods.

Reducing the number of voltage points per discharge curve is a straightforward
dimensionality reduction technique, given that the original selection of 1000 points per
curve was arbitrary. Determining the optimal number of points needed to adequately
represent a voltage curve is crucial, especially for large datasets comprising thousands
or millions of cells. Figure 3.7 demonstrates the impact of downsampling the number of
points used to represent ΔQ100−10(V).

Figure 3.7 Voltage vs ΔQ100−10(V)

In Figure 3.7, ΔQ100−10(V) is depicted with varying numbers of points for an
exemplary cell. Relative to the model utilizing all 1000 points, the RMSE remains
within a 1% change until the sampling interval exceeds 40 mV (40 points over a 1.6
V voltage window) for the training and primary test sets, and 160 mV (10 points) for
the secondary test set.

This analysis indicates that the extensive dataset does not necessitate 1000 points to
capture crucial features in ΔQ100−10(V), prompting the adoption of reduced point
sampling strategies for enhanced computational efficiency. Such an approach holds
promise for identifying optimal sampling rates in voltage curve analysis across various
applications, particularly for extensive battery cycling datasets.
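A minimal sketch of this downsampling check (Python with a synthetic smooth curve; the 40 mV spacing is taken from the text, everything else is illustrative):

```python
import numpy as np

# Synthetic smooth ΔQ_100-10(V) curve on the original 1000-point grid over
# a 1.6 V window (illustrative stand-in for the measured curve).
v_full = np.linspace(3.6, 2.0, 1000)
dq_full = -0.05 * np.exp(-((v_full - 3.0) ** 2) / 0.05)

def downsample(v, dq, n_points):
    """Resample the curve onto n_points evenly spaced voltage samples."""
    v_coarse = np.linspace(v[0], v[-1], n_points)
    # np.interp needs ascending x, and the discharge grid runs high -> low.
    return v_coarse, np.interp(v_coarse, v[::-1], dq[::-1])

# ~40 mV spacing (40 points over 1.6 V), the level the text reports as
# sufficient for the training and primary test sets.
v40, dq40 = downsample(v_full, dq_full, 40)

# The key summary feature (variance) barely moves after downsampling.
rel_change = abs(np.var(dq40) - np.var(dq_full)) / np.var(dq_full)
print(rel_change)
```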

CHAPTER 4

UNIVARIATE AND MULTIVARIATE BASED RUL PREDICTION

4.1 STATISTICAL MODELS


Statistical learning methods are used to develop precise and interpretable
models for predicting battery lifespan. By using the same datasets and objectives as
Severson et al. [7] for consistency and comparison, the research serves as a
comprehensive benchmarking study of earlier work.
Simple univariate linear models using a single aspect of ΔQ100−10(V) achieve
performance comparable to the best models from Severson et al. [7]. New methods
using elements of ΔQ100−10(V) offer unique insights into the relationship between
these elements and cycle life.
The shift from univariate to multivariate models represents a fundamental
advancement in our approach to comprehending battery degradation mechanisms.
Unlike univariate models, which isolate individual features for analysis, multivariate
models allow for a comprehensive examination of the intricate relationships between
multiple input variables and the target variable, cycle life.

4.2 UNIVARIATE MODELS FROM ΔQ100−10(V)

Severson et al. [7] identified the log10 variance of ΔQ100−10(V) as the most effective
feature among twenty generated from all data sources. While this feature has plausible
explanations, it may not fully capture battery degradation trends. The aim was to
explore whether other univariate transformation-function pairings can achieve similar
results while being more physically meaningful.

The focus was on simple univariate linear models, considering fourteen summary
statistical functions of ΔQ100−10(V) and four feature transformations, resulting in 56
pairings. These functions include common summary statistics like min, mean, range,
variance, and kurtosis, as well as single elements of ΔQ100−10(V) at specific voltages.
The feature transformations explored were square root, cube root, and log10, along
with the untransformed data, though the square root and log10 do not accept negative
values. Elastic net regression is used for regularization even in univariate models.

The analysis focused solely on ΔQ100−10(V), neglecting exploration into other
combinations of cycle numbers. Despite this limitation, our findings reveal valuable
insights into the efficacy of various modeling approaches. Notably, no model
significantly surpasses log10(var(ΔQ100−10(V))), emphasizing its robustness as a
predictive feature. However, the log10(IQR) model consistently demonstrates the
lowest error rates across all datasets, suggesting its potential as a promising alternative.

Furthermore, our study underscores the effectiveness of univariate models derived from
individual elements of ΔQ100−10(V) at specific voltage points. Despite their
simplicity, these models achieve remarkable accuracy in predicting battery degradation.
Overall, our findings emphasize the nuanced interplay between feature selection,
transformation, and model choice in optimizing predictive performance for battery
degradation prediction tasks.
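The sweep over statistic-transformation pairings can be sketched as follows (Python with synthetic data; the study applied elastic net regularization, while this sketch fits ordinary least squares for brevity, and shows one statistic with all four transformations rather than the full 56 pairings):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic per-cell data: variance of ΔQ_100-10(V) spanning several orders
# of magnitude, with log10 cycle life linear in its log10 (illustrative).
n_cells = 41
var_dq = 10.0 ** rng.uniform(-5, -2, n_cells)
log_cycle_life = 2.0 - 0.35 * np.log10(var_dq) + 0.05 * rng.normal(size=n_cells)

# Four feature transformations paired with one summary statistic (variance);
# the full study sweeps 14 statistics x 4 transformations = 56 pairings.
transforms = {
    "identity": lambda x: x,
    "sqrt": np.sqrt,       # valid here because variance is non-negative
    "cbrt": np.cbrt,
    "log10": np.log10,
}

rmse = {}
for name, fn in transforms.items():
    feature = fn(var_dq)
    slope, intercept = np.polyfit(feature, log_cycle_life, 1)
    resid = log_cycle_life - (slope * feature + intercept)
    rmse[name] = float(np.sqrt(np.mean(resid ** 2)))

best = min(rmse, key=rmse.get)
print(best, round(rmse[best], 3))
```

Because the synthetic relationship is linear in the log of the variance, the log10 pairing achieves the lowest RMSE here, mirroring the pattern reported for the real dataset.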

4.2.1 UNIVARIATE FEATURES AND TRANSFORMATIONS

Data is transformed to change its scale or distribution. The initial data exhibits
unprocessed statistical information, providing a standard for comparison and exposing
the inherent properties of the data. For skewed datasets, square root transformation is a
great way to stabilize variance and lessen the impact of outliers by taking the square
root of absolute values.

Similar to the square root, the cube root transformation reduces skewness and stabilizes
variance; it works particularly well with data that is right-skewed. By taking the base-
10 logarithm of absolute values, compressing the range of big values and extending the
range of small values, distributions are made more symmetrical for simpler
interpretation. This process is known as logarithmic transformation (base 10). The
benefits of each transformation are different, and the decision is based on the aims of
the research and the properties of the dataset. To find the best transformation for a
particular situation, experimentation is frequently required.

In analyzing the difference in capacity between the 100th and 10th cycle at particular
voltage levels, various summary statistics features offer invaluable information. The
mean, minimum, and maximum provide an overview of central tendency and extremes,
indicating average, lowest, and highest capacity differences, respectively. These
metrics offer initial impressions of the dataset's behavior.

Further delving into dispersion, metrics like the Inter-Decile range (IDR), Inter-
Quartile range (IQR), and variance elucidate the spread of data. The IDR and IQR
highlight the difference between specific percentiles, delineating the middle 80% and
50% of the data's spread, respectively. Variance quantifies the spread around the mean,
offering insights into the dataset's variability. The results and outcomes of the
univariate models with these statistical features are discussed in the following chapter
in detail.

4.3 MULTIVARIATE MODEL

The transition from univariate to multivariate modelling marks a pivotal shift in our
approach to understanding battery degradation mechanisms. Unlike univariate models,
which analyse individual features in isolation, multivariate models enable a holistic
exploration of the complex relationships between multiple input features and the
target variable, cycle life.

In this phase of our research, the focus extends beyond singular factors like
voltage-specific capacity differences (ΔQ100−10(V)) to encompass a broader spectrum
of battery characteristics. By embracing multivariate modelling techniques, we seek to
unravel the intricate interplay between various parameters and their collective impact
on cycle life.

This evolution in methodology is driven by the recognition that battery degradation is
influenced by a multitude of factors operating simultaneously. Through multivariate
modelling, the aim is to uncover hidden patterns, identify key contributors to
degradation, and refine our predictive capabilities.

The dataset, consisting of data from 124 LFP/graphite cells, was gathered as part
of this study and served as a source of information for the development of models for
predicting battery cycle life. The dataset was divided into three sets namely

● Training Set: A set of 41 cells used for model training.


● Primary Test Set: A set of 43 cells used for initial model testing.
● Secondary Test Set: A set of 40 cells used for additional testing after model
development.

Three sets of the dataset are separated: test, validation, and training. The model is
trained on the training set, performance is assessed and hyperparameters are adjusted
on the validation set, and the final model is tested on the test set. All cells were
subjected to one of 72 different fast charging protocols, cycling from 0% to 80% state-
of-charge (SOC) in 9 to 13 minutes. They were all discharged at a consistent rate of 4C
(a rate that fully discharges the battery in one hour).

Figure 4.1 Dataset Distribution Histogram
Figure 4.1 denotes the histogram which visually represents the distribution of cycle life
across the datasets. It helps compare the performance and variability of the training,
primary test, and secondary test sets in terms of battery cycle life. From this it was
observed that the cycle life distributions were mismatched between the datasets. Hence
the outlying cells were removed and the modified data was produced for model tuning.

4.3.1 METHODS OF MULTIVARIATE MODEL


Within the field of linear regression techniques, Ridge Regression is a reliable
approach to address overfitting. Ridge Regression places a regularization restriction on
the model coefficients by adding a penalty term—typically the L2 norm—to the
conventional least squares objective. In turn, this regularization discourages the use of
overly large coefficients, which leads the model towards simpler solutions. As such,
Ridge Regression improves the robustness of the model by carefully balancing the need
to fit the training data while retaining a degree of generalization that applies to
unknown data.

Lasso Regression, similar to Ridge Regression, incorporates a regularization
term but utilizes the L1 norm instead of the L2 norm. This choice promotes sparsity in
the coefficient estimates, effectively performing variable selection by driving certain
coefficients to zero. While Lasso Regression excels at feature selection, it may struggle
with correlated predictors, leading to instability in coefficient estimates.
By combining the best features of both Lasso and Ridge regression approaches,
Elastic Net overcomes the drawbacks of each separately. With the inclusion of L1 and
L2 penalties, Elastic Net provides a flexible method for regression modeling. Similar to
Lasso regression, the L1 penalty promotes feature selection and coefficient sparsity,
and the L2 penalty guarantees stability and robustness, similar to Ridge regression.
With the help of this hybridization, Elastic Net is better equipped to negotiate the
challenging landscape of high-dimensional data, where regularization and feature
selection are crucial for creating precise and understandable models. Advanced
techniques such as Principal Component Regression (PCR) and Partial Least Squares
Regression (PLSR) offer alternate approaches to modeling complex interactions
between target variables and input data, going beyond the conventional linear regression
paradigms.
In contrast to PLSR, which places more emphasis on maximizing the covariance
between the features and the target, PCR uses Principal Component Analysis (PCA) to
lower the dimensionality of the feature space prior to performing linear regression.
These techniques are vital resources in the toolbox of the contemporary data scientist,
providing effective ways to manage high-dimensional data while revealing the
underlying structure that powers predictive success.

The root mean squared error (RMSE), which is calculated for every fold and
gives an average deviation of the predicted values from the actual ones, is the
evaluation metric.

The final model trains on the whole training set without cross-validation after
identifying the ideal hyperparameters. Then, performance is evaluated for
generalization ability on the test set. Among the things to take into account are
managing class imbalances, choosing the best k value for statistical robustness and
computing efficiency, and properly rearranging and stratifying the data to minimize
bias.
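The fold-wise RMSE computation and the final refit described above can be sketched as follows (Python, closed-form ridge on synthetic data; all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in for the training features and target.
n, p = 41, 8
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_rmse(X, y, lam, folds):
    """Average per-fold RMSE, the evaluation metric described in the text."""
    k = len(folds)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.sqrt(np.mean((X[test] @ w - y[test]) ** 2)))
    return float(np.mean(errs))

# Shuffle once, split into k = 5 folds, pick lambda by CV, then refit the
# chosen model on the whole training set (no cross-validation).
folds = np.array_split(rng.permutation(n), 5)
lams = [0.01, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: cv_rmse(X, y, lam, folds))
w_final = ridge_fit(X, y, best_lam)
print(best_lam, round(cv_rmse(X, y, best_lam, folds), 3))
```

Shuffling once before splitting, rather than per candidate lambda, keeps the comparison between hyperparameters fair.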

4.3.2 HYPERPARAMETER OPTIMIZATION

Hyperparameter optimization stands as a pivotal stage in the development of
machine learning models, with the overarching objective of identifying the most
effective hyperparameter configuration to maximize model performance. This process
involves methodically exploring a predefined range of hyperparameter values to
pinpoint the combination that yields optimal performance according to a chosen
evaluation metric.

In Ridge Regression, for instance, a key hyperparameter to fine-tune is the
regularization strength, typically denoted as λ. By systematically adjusting λ and
assessing the model's performance using metrics such as mean squared error (MSE),
practitioners can ascertain the optimal balance between model complexity and
generalization.

Lasso Regression employs regularization to mitigate overfitting, utilizing the L1
norm penalty to encourage sparsity in coefficient estimates and facilitate feature
selection. Here, the regularization parameter α serves as the primary hyperparameter to
optimize. Through systematic exploration of α values and evaluation of model
performance via metrics like MSE, researchers can discern the optimal degree of
regularization to enhance model effectiveness.

Elastic Net presents a versatile approach by amalgamating the regularization
techniques of Lasso and Ridge Regression. Here, the hyperparameters α and λ dictate
the strengths of L1 and L2 penalties, respectively. Employing methods like grid search
or random search, practitioners systematically traverse the hyperparameter space to
identify the optimal α and λ combination that minimizes MSE or other relevant
evaluation metrics, thereby achieving a balanced model with enhanced predictive
capabilities.
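A grid-search sketch under these assumptions (Python/scikit-learn with synthetic data; scikit-learn folds the two penalty strengths into its `alpha` and `l1_ratio` parameters, standing in for the α and λ described above):

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(5)

# Synthetic stand-in for the training features and log cycle life.
n, p = 41, 8
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)

# Grid over the overall penalty strength and the L1/L2 mix.
grid = {"alpha": [1e-3, 1e-2, 1e-1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}
search = GridSearchCV(ElasticNet(max_iter=10_000), grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

best_mse = -search.best_score_
print(search.best_params_, round(best_mse, 4))
```

Random search covers the same space by sampling configurations instead of enumerating them, which scales better when the grid has many dimensions.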

Techniques like Partial Least Squares Regression (PLSR) and Principal
Component Regression (PCR) focus on optimizing different parameters. PLSR
revolves around tuning the number of components to best capture the variance in the
target variable, while PCR concentrates on determining the number of principal
components to retain after dimensionality reduction via PCA. In both cases, the optimal
configuration is determined by minimizing MSE or similar metrics, showcasing the
iterative nature of hyperparameter optimization in honing model performance.

There were significant advances in our modeling of battery degradation on
switching from the other multivariate models to a Lasso regression model. The Lasso
model's root-mean-square error (RMSE) substantially outperformed that of the other
models after training the dataset with a variety of algorithms. Its capacity to reduce
the RMSE suggests that the Lasso regression model is better at identifying the
underlying patterns and correlations in the data. Moreover, the switch to the Lasso
regression model also addresses overfitting: by driving the coefficients of unnecessary
features to zero, it lowers the chance of overfitting and produces a more reliable and
accurate prediction model. A thorough examination of the Lasso regression model's
performance is given in the following section, emphasizing its effectiveness in
reducing errors and offering insight into the factors underlying battery degradation.

4.4 LASSO REGRESSION MODEL

Lasso regression is a regularized linear regression technique that is particularly useful
for predicting battery lifecycles. It promotes sparse models by setting some coefficients
to zero, identifying key features, and reducing model complexity. Lasso's regularization
parameter (λ) controls shrinkage, preventing overfitting and improving generalization.

Figure 4.2 Lasso Model Interpretation

Figure 4.2 is a generalized output graph depicting the lasso model interpretation.

The model is trained using Lasso regression with different regularization parameters,
which helps handle feature correlations and improves model generalizability. Cross-
validation is employed for hyperparameter tuning and model selection based on the
lowest root mean squared error (RMSE). The results of these are discussed briefly in
the following chapter.
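This tuning loop can be sketched with scikit-learn's `LassoCV` on synthetic data (Python as an illustrative stand-in; the project's model was built in MATLAB, and all data and names here are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(6)

# Synthetic training data: 8 candidate features, only three informative, so
# the L1 penalty should zero out most irrelevant coefficients (illustrative).
n, p = 41, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.5 * X[:, 5] + 0.1 * rng.normal(size=n)

# LassoCV sweeps a path of regularization strengths and keeps the one with
# the lowest cross-validated error, mirroring the tuning described above.
model = LassoCV(cv=5).fit(X, y)

selected = np.flatnonzero(model.coef_)           # features kept by the model
rmse_train = float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))
print(round(model.alpha_, 4), selected.tolist(), round(rmse_train, 3))
```

The nonzero coefficients directly identify which features the model considers relevant, which is what makes the Lasso fit interpretable.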

Figure 4.3 Predicted vs Actual Cycle Life

The results include a graph, shown in Figure 4.3, comparing the actual cycle life of the
batteries with the predicted cycle life, showing how well the model's predictions align
with the true values. This comparison provides a clear visual representation of the
model's effectiveness and accuracy in predicting the remaining life of the batteries.

The model's performance is validated on a separate dataset to identify the
best-performing model configuration. Once the optimal model is selected, it is tested on an
independent test dataset. Performance metrics such as RMSE and other error
calculations are used to evaluate the model's accuracy. Formulas (4.1) and (4.2) are
used to calculate the Root-Mean-Square Error (RMSE) and error value (%err),
respectively, for the extracted features.
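Written out, the two metrics follow their standard definitions (reconstructed here), with $y_i$ the observed and $\hat{y}_i$ the predicted cycle life of cell $i$ over $n$ cells:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\qquad \text{(4.1)}

\%\,\mathrm{err} = \frac{100}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{y_i}
\qquad \text{(4.2)}
```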

The model was developed to estimate the cycle life of lithium-ion batteries by utilizing
a unique combination of features and data from early cycles. This approach resulted in
a significant reduction in error, achieving an average error rate of just 6.8%, which
represents a notable improvement over existing models.

The model leverages specific features derived from the voltage and current
measurements during the initial cycles of discharge, as well as other supplementary
data sources. This comprehensive approach enabled the capture of essential trends and
patterns in battery behaviour that could be overlooked with simpler models.

By fine-tuning the model's parameters and selecting the most relevant features, its
accuracy and reliability were enhanced. This model outperforms prior models, which
typically require larger data sets or more extensive degradation for accurate predictions.

The developed predictive model achieves an impressively low error rate of 6.8%,
which is significantly lower than the error rates reported in previous research, including
those presented by Severson et al. [7]. This highlights the enhanced performance and
precision of the current model in predicting the remaining cycle life of lithium-ion
batteries. In comparison to earlier methods, the suggested model gives better accuracy
in battery life estimates by utilizing features that have been carefully selected and by
using strong model fitting and selection strategies.

CHAPTER 5

RESULTS AND DISCUSSION

Exploring the univariate models motivates us to understand whether simpler models,
particularly those using log10(var(ΔQ100−10(V))), can perform well. While
this feature has shown promise, its physical interpretation in battery science is not
entirely satisfying. Around 14 summary statistical functions were applied to
ΔQ100−10(V) along with four feature transformations, totaling 56 function-transformation
pairings. These include common summary statistics like min, mean, range, variance,
and kurtosis, along with transformations like square root, cube root, and log10.
Elastic net regression is used for regularization, even in univariate models, aiming for
more robust and generalizable results. Additionally, the analysis focuses on
ΔQ100−10(V) and does not explore other cycle number combinations, as previous
research indicates its relative accuracy.

Table 5.1 depicts the results of univariate models. The analysis revealed that the
choice of data transformation significantly influences the predictive accuracy of our
models. Across various parameters, such as different voltage levels of Delta Q and
statistical measures like minimum, range, and median, employing a logarithmic
transformation consistently resulted in lower test errors compared to other
transformations like square root or cube root. This highlights the importance of
selecting an appropriate data transformation method to improve model performance.
Additionally, the top-performing single-feature model, Delta Q at 3.428V, shown in
Figure 3.6 exhibited superior predictive ability compared to a complex model from [7],
underscoring the effectiveness of focusing on specific features for accurate predictions.

Table 5.1 Root-Mean-Square Error of Univariate Models.

Figure 5.1 Root-Mean-Square Error of Univariate Models-1

Figure 5.2 Root-Mean-Square Error of Univariate Models-2

Figure 5.3 Root-Mean-Square Error of Univariate Models-3

Figure 5.4 Root-Mean-Square Error of Univariate Models-4

Figures 5.1, 5.2, 5.3, and 5.4 serve as a pictorial representation of the results in Table
5.1. From the above discussion, we can conclude that the root-mean-square error value
of Delta Q(100-10)(3.248 V) was the least when compared to the other summary
statistical functions.

Multivariate models are explored to enhance our understanding beyond univariate
linear models, aiming to capture more meaningful information about the prediction
objective. By using elements of ΔQ100−10(V) directly as input features, the capacity
differences at specific voltages that have the greatest impact on cycle life can
potentially be observed. This approach allows us to gain insights into relevant
degradation mechanisms based on the sensitivity of cycle life to features at different
voltages.

Despite the challenge of high collinearity among features, common in fields like
chemometrics and bioinformatics, regression methods tailored for such scenarios are
leveraged. These include ridge regression, elastic net regression, Principal
Components Regression (PCR), Partial Least Squares Regression (PLSR), and
Bayesian Lasso regression, among others, known for handling highly correlated
features effectively.

Table 5.2 Root-Mean-Square Error of Multivariate Models.

Table 5.2 provides a comparison of root-mean-squared error (RMSE) values across
various models for predicting battery cycle life: the three models from the Severson
et al. [7] study, a neural network model, and our model. Among the regression models
evaluated, PLSR stands out with the lowest RMSE values across both the training and
testing sets, indicating its effectiveness in capturing the underlying patterns in the
data while avoiding overfitting. Our model, however, is obtained with a Lasso model
incorporating the L1 regularization technique. In contrast, models such as Principal
Component Regression show high RMSE values on the training set, suggesting that
they fail to capture the underlying structure of the data.

Figure 5.5 Root-Mean-Square Error of Multivariate Models

Ridge Regression and Elastic Net Regression exhibit moderate performance,
balancing between fitting the training data well and generalizing to unseen data.
Figure 5.5 shows that our model provides the lowest root-mean-square error value
among the models. Overall, our model emerges as the most promising for accurately
predicting the target variable while maintaining good generalization performance.

A customized approach implementing a Lasso regression model in MATLAB was
explored. The key innovation lay in the feature extraction process, where a tailored
combination of features optimized for the dataset's characteristics was devised.
Through careful selection and engineering of features, the aim was to capture the
intricate patterns and relationships within the data, thereby enhancing the model's
predictive performance.

Our model exhibited promising outcomes, as evidenced by the achieved test RMSE
of 90 and a percentage error of 6.8%. These metrics signify the model's ability to
accurately predict outcomes and outperform existing models, showcasing the
effectiveness of our tailored feature selection methodology and modelling technique.
The obtained results are shown below.

Root Mean Squared Error (Test): 90.111236
Error Validation: 6.899845
Root Mean Squared Error (Train): 87.819009
Root Mean Squared Error (Validation): 180.252659

From Table 5.2, the three models from the Severson et al. [7] study—Variance,
Discharge, and Full—show varying levels of performance. The Variance model has an
RMSE of 103 on the training set and 196 on the secondary test set. The Discharge
model performs slightly better on the training set with an RMSE of 76, but its RMSE
increases to 173 on the secondary test set. The Full model achieves the best
performance on the training set with an RMSE of 51, but its RMSE increases to 214 on
the secondary test set.

Our model, which was developed using different features and techniques,
demonstrates a significant improvement in performance. It achieves an RMSE of 87 on
the training set and 90 on the secondary test set, indicating that it generalizes well to
new data. This suggests that our model is more robust and provides more reliable
predictions than the models in [7].

CHAPTER 6

CONCLUSION

In the realm of lithium-ion battery advancement, the utilization of data-driven
modeling presents a compelling avenue for enhancing diagnostics and prognostics. Our
research endeavors have led to the creation of cycle life prediction models, which are
honed using early-cycle discharge data sourced from commercial LFP/graphite
batteries undergoing rapid charging. Remarkably, our regression model achieves a test
error rate of just 6.8% utilizing data from only the initial 100 cycles, even in the
absence of discernible capacity degradation during these early phases.
This remarkable precision is attained by extracting key characteristics from
high-rate discharge voltage curves, obviating the need for reliance on slow diagnostic
cycles or extensive prior knowledge of cell chemistry and degradation mechanisms.
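The key characteristic referred to here is the ΔQ_{100-10}(V) feature: the discharge capacity curve at cycle 10 is subtracted from that at cycle 100 on a common voltage grid, and the log-variance of the difference is taken. The sketch below uses synthetic capacity curves as stand-ins for the real interpolated Qdlin data; the actual extraction appears in the Appendix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Common 1000-point voltage grid, 3.6 V down to 2.0 V (as in the dataset)
Vdlin = np.linspace(3.6, 2.0, 1000)

# Synthetic discharge capacity vs. voltage curves at cycles 10 and 100;
# the cycle-100 curve is slightly degraded and noisy
Qd_cycle10 = 1.1 - 0.02 * (3.6 - Vdlin) ** 2
Qd_cycle100 = Qd_cycle10 - 0.005 * (3.6 - Vdlin) + rng.normal(scale=1e-4, size=1000)

# DeltaQ_{100-10}(V) and its log-variance feature
delta_q = Qd_cycle100 - Qd_cycle10
delta_q_var = np.log10(np.abs(np.var(delta_q)))
print(delta_q_var)
```

Because the difference curve shifts with cycling even before any visible capacity fade, its variance carries early-life degradation signal.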
The consistency exhibited by our model in capturing deterioration mechanisms,
which may not manifest as evident capacity decline in the early cycles but nonetheless
impact voltage curves, underscores its efficacy. This innovative approach holds
promise in complementing existing techniques reliant on specialized diagnostics and
physical or semi-empirical models.
Ultimately, our findings underscore the potential of integrating data generation with data-driven modeling to enhance our understanding and advancement of intricate systems like lithium-ion batteries. Leveraging the power of data-driven insights can drive forward strides in battery optimization and longevity, ushering in a new era of efficiency and reliability in energy storage technology.

APPENDIX

1. Feature Extraction:
function [xTable, y] = helperGetFeatures(batch)
% HELPERGETFEATURES function accepts the raw measurement data and computes
% the following features:
%
% Q_{100-10}(V) variance [DeltaQ_var]
% Q_{100-10}(V) minimum [DeltaQ_min]
% Average charge time, cycles 2 to 100 [AvgChargeTime]
% Internal resistance at cycle 2 [IR2]
% Discharge capacity at cycle 100 [Qd100]
% Discharge capacity at cycle 2 [Qd2]
% Capacity headroom at cycle 2, 1.1 Ah minus Qd2 [Qmax]
% Internal resistance difference between cycles 2 and 100 [IRDiff2And100]
% Voltage feature [VoltageFeature]
% Current feature [CurrentFeature]

N = size(batch,2); % Number of batteries

% Preallocate space for features
numFeatures = 10; % Number of features
xTable = table('Size', [N, numFeatures], 'VariableTypes', repmat({'double'}, 1, numFeatures), ...
    'VariableNames', {'DeltaQ_var', 'DeltaQ_min', 'AvgChargeTime', 'IR2', 'Qd100', ...
    'Qd2', 'Qmax', 'IRDiff2And100', 'VoltageFeature', 'CurrentFeature'});

y = zeros(N, 1);
DeltaQ_var = zeros(N,1);
V_var = zeros(N,1);
DeltaQ_min = zeros(N,1);
CapFadeCycle2Slope = zeros(N,1);
CapFadeCycle2Intercept = zeros(N,1);
CapFadeCycle2Slope2 = zeros(N,1);
CapFadeCycle2Intercept2 = zeros(N,1);
Qd2 = zeros(N,1);
Qd10 = zeros(N,1);

Qmax = zeros(N,1);
Qd100 = zeros(N,1);
IR2 = zeros(N,1);
IR100 = zeros(N,1);
AvgChargeTime = zeros(N,1);
MinIR = zeros(N,1);
IRDiff2And100 = zeros(N,1);
VoltageFeature = zeros(N,1);
CurrentFeature = zeros(N,1);
DeltaQ_skewness = zeros(N,1);
DeltaQ_kurtosis = zeros(N,1);
Qmax_minus_Qd2 = zeros(N,1);
Qdischarge = zeros(N,1);
Qcharge = zeros(N,1);
IntegralTemp = zeros(N,1);
dQ_dV = zeros(N,1);
dQ_dV_max = zeros(N,1);
dQ_dV_mean = zeros(N,1);
dQ_dV_sk = zeros(N,1);
dQ_dV_k = zeros(N,1);
for i = 1:N
% cycle life
y(i,1) = size(batch(i).cycles,2) + 1;
numCycles = size(batch(i).cycles, 2);
timeGapCycleIdx = [];
jBadCycle = 0;
for jCycle = 2: numCycles
dt = diff(batch(i).cycles(jCycle).t);
if max(dt) > 5*mean(dt)
jBadCycle = jBadCycle + 1;
timeGapCycleIdx(jBadCycle) = jCycle;
end
end

% Remove cycles with time gaps


batch(i).cycles(timeGapCycleIdx) = [];

batch(i).summary.QDischarge(timeGapCycleIdx) = [];
batch(i).summary.IR(timeGapCycleIdx) = [];
batch(i).summary.chargetime(timeGapCycleIdx) = [];

% compute Q_100_10 stats


DeltaQ = batch(i).cycles(100).Qdlin - batch(i).cycles(10).Qdlin;

DeltaQ_var(i) = log10(abs(var(DeltaQ)));
DeltaQ_min(i) = log10(abs(min(DeltaQ)));

% Slope and intercept of linear fit for capacity fade curve from cycle
% 2 to cycle 100
coeff2 = polyfit(batch(i).summary.cycle(2:100), batch(i).summary.QDischarge(2:100), 1);
CapFadeCycle2Slope(i) = coeff2(1);
CapFadeCycle2Intercept(i) = coeff2(2);

% Discharge capacity at cycle 2


Qd2(i) = batch(i).summary.QDischarge(2);

% Discharge capacity at cycle 100


Qd100(i) = batch(i).summary.QDischarge(100);

% Compute maximum capacity [Qmax] as the difference between 1.1 Ah and the
% discharge capacity at cycle 2
Qmax(i) = 1.1 - batch(i).summary.QDischarge(2);

% Avg charge time, 2- 100 cycles


AvgChargeTime(i) = mean(batch(i).summary.chargetime(2:100));

% Minimum internal resistance, cycles 2 to 100


temp = batch(i).summary.IR(2:100);

MinIR(i) = min(temp(temp~=0));

% Compute difference in internal resistance between cycles 2 and 100 [IRDiff2And100]
IRDiff2And100(i) = batch(i).summary.IR(100) - batch(i).summary.IR(2);

%internal resistance cycle 2


IR2(i) = batch(i).summary.IR(2);

%internal resistance cycle 100


IR100(i) = batch(i).summary.IR(100);

% Integral of temperature from cycles 2 to 100


tempIntT = 0;
for jCycle = 2:100
tempIntT = tempIntT + trapz(batch(i).cycles(jCycle).t, batch(i).cycles(jCycle).T);
end
IntegralTemp(i) = tempIntT;
% Calculate the voltage feature
voltageData = [];
for jCycle = 2:100
voltageData = [voltageData; batch(i).cycles(jCycle).V];
end
VoltageFeature(i) = calculateVoltageFeature(voltageData);

% Calculate the current feature


currentData = [];
for jCycle = 2:100
currentData = [currentData; batch(i).cycles(jCycle).I];
end
CurrentFeature(i) = calculateCurrentFeature(currentData);
end

xTable = table(DeltaQ_var, DeltaQ_min, AvgChargeTime, IR2, Qd100, Qd2, ...
    Qmax, IRDiff2And100, VoltageFeature, CurrentFeature);

end

function voltageFeature = calculateVoltageFeature(voltageData)


% Here you can implement your calculation for the voltage feature

% For example, let's say you want to compute the variance of the voltage data
voltageFeature = log10(abs(var(voltageData)));
end

function currentFeature = calculateCurrentFeature(currentData)


% Here you can implement your calculation for the current feature
% For example, let's say you want to compute the variance of the current data
currentFeature = log10(abs(var(currentData)));
end

2. Univariate model:
import copy
from functools import reduce
import glob
import json
import os
from pathlib import Path
from sklearn.svm import SVR
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import cm
from matplotlib.ticker import AutoMinorLocator
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import pandas as pd
import torch
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from scipy.stats import skew, kurtosis
from sklearn import preprocessing
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LinearRegression, RidgeCV, ElasticNetCV
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.cross_decomposition import PLSRegression

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from image_annotated_heatmap import heatmap, annotate_heatmap
from sklearn.linear_model import LassoCV
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.ensemble import VotingRegressor
# ## Settings
#
# Set plotting settings:

fig_width = 3.25 # ECS spec is 3.25" width


fig_width_2col = 7
fig_height = (3 / 4) * fig_width # standard ratio

rcParams['figure.autolayout'] = True
rcParams['lines.markersize'] = 5
rcParams['lines.linewidth'] = 1.0
rcParams['font.size'] = 7
rcParams['legend.fontsize'] = 7
rcParams['legend.frameon'] = False
rcParams['xtick.minor.visible'] = True
rcParams['ytick.minor.visible'] = True
rcParams['font.sans-serif'] = 'Arial'
rcParams['mathtext.fontset'] = 'custom'
rcParams['mathtext.rm'] = 'Arial'
rcParams['pdf.fonttype'] = 42
rcParams['ps.fonttype'] = 42

# Set path for saving figures:

figpath = Path.cwd() / 'figs'

# Define linearly spaced voltage vector (consistent with Severson):

Vdlin = np.linspace(3.6, 2, 1000)

def sortKeyFunc(s):
    return int(os.path.basename(s)[4:-4])

def load_dataset(folder):
    files = glob.glob(f'./data/{folder}/*.csv')
    files.sort(key=sortKeyFunc)  # glob returns list with arbitrary order

    l = len(files)
    dataset = np.zeros((l, 1000, 99))

    for k, file in enumerate(files):
        cell = np.genfromtxt(file, delimiter=',')
        dataset[k, :, :] = cell

    return dataset

# Load three datasets (`test1` = primary test set, `test2` = secondary test set):

data_train = load_dataset('train')
data_test1 = load_dataset('test1')
data_test2 = load_dataset('test2')

# Load cycle lives:

cycle_lives_train = np.genfromtxt('./data/cycle_lives/train_cycle_lives.csv', delimiter=',')


cycle_lives_test1 = np.genfromtxt('./data/cycle_lives/test1_cycle_lives.csv', delimiter=',')
cycle_lives_test2 = np.genfromtxt('./data/cycle_lives/test2_cycle_lives.csv', delimiter=',')

# Confirm 124 cells are loaded:

print(len(data_train) + len(data_test1) + len(data_test2))


print(len(cycle_lives_train) + len(cycle_lives_test1) + len(cycle_lives_test2))

# Define `y` values as log10 of cycle life:

y_train = np.log10(cycle_lives_train)
y_test1 = np.log10(cycle_lives_test1)
y_test2 = np.log10(cycle_lives_test2)

# Make a new version of `data_test1`, `cycle_lives_test1` and `y_test1` that exclude the outlier battery:

data_test1_mod = data_test1.copy()

cycle_lives_test1_mod = cycle_lives_test1.copy()
y_test1_mod = y_test1.copy()

data_test1_mod = np.delete(data_test1_mod, 21, axis=0)


cycle_lives_test1_mod = np.delete(cycle_lives_test1_mod, 21)
y_test1_mod = np.delete(y_test1_mod, 21)

data_test2_mod = data_test2.copy()
cycle_lives_test2_mod = cycle_lives_test2.copy()
y_test2_mod = y_test2.copy()

indices_to_delete = [6, 15, 16, 39] # List of indices to delete


data_test2_mod = np.delete(data_test2_mod, indices_to_delete, axis=0)
cycle_lives_test2_mod = np.delete(cycle_lives_test2_mod, indices_to_delete)
y_test2_mod = np.delete(y_test2_mod, indices_to_delete)

# Get median cycle lives for each dataset:

print(np.median(cycle_lives_train))
print(np.median(cycle_lives_test1_mod))
print(np.median(cycle_lives_test2_mod))

# Create helper function for reducing RMSE boilerplate:

def get_RMSE_for_all_datasets(y_train_pred, y_test1_pred, y_test2_pred):
    """
    Calculate RMSE for three datasets. Use units of cycles instead of log(cycles).
    """
    RMSE_train = mean_squared_error(np.power(10, y_train), np.power(10, y_train_pred), squared=False)
    RMSE_test1 = mean_squared_error(np.power(10, y_test1_mod), np.power(10, y_test1_pred), squared=False)
    RMSE_test2 = mean_squared_error(np.power(10, y_test2_mod), np.power(10, y_test2_pred), squared=False)
    return RMSE_train, RMSE_test1, RMSE_test2

np.sign(np.min(X_test2)) ==
np.sign(np.max(X_test2)))
else: # apply transform
transform = transforms[trans]
X_train, X_test1, X_test2 = transform(X_train), transform(X_test1), transform(X_test2)

# Scale via standardization


scaler = preprocessing.StandardScaler().fit(X_train.reshape(-1, 1))
X_train_scaled = scaler.transform(X_train.reshape(-1, 1))
X_test1_scaled = scaler.transform(X_test1.reshape(-1, 1))
X_test2_scaled = scaler.transform(X_test2.reshape(-1, 1))

# Define and fit linear regression via enet


enet = ElasticNetCV(l1_ratio=l1_ratios, cv=5, random_state=0)
enet.fit(X_train_scaled, y_train)

# Predict on test sets


y_train_pred = enet.predict(X_train_scaled)
y_test1_pred = enet.predict(X_test1_scaled)
y_test2_pred = enet.predict(X_test2_scaled)

# Evaluate error
RMSE_train[k, k2], RMSE_test1[k, k2], RMSE_test2[k, k2] = get_RMSE_for_all_datasets(y_train_pred, y_test1_pred, y_test2_pred)

# Print stats
skews[k, k2] = skew(X_train)
# print(f'{trans}: Skew(train) = {skews[k, k2]:0.3f}, RMSE(train) =
{RMSE_train[k, k2]:0.3f}')
# print(f'{trans}: Skew(test1) = {skews[k, k2]:0.3f}, RMSE(test1) =
{RMSE_test1[k, k2]:0.3f}')
# print(f'{trans}: Skew(test2) = {skews[k, k2]:0.3f}, RMSE(test2) =
{RMSE_test2[k, k2]:0.3f}')
print()

# make plot for last transformation

X_fit = np.linspace(np.min(X_train_scaled), np.max(X_train_scaled), 100)
y_fit = X_fit * enet.coef_[0] + enet.intercept_

plt.figure()
plt.plot(X_train_scaled, y_train, 'o', label='Data')
plt.plot(X_fit, y_fit, label='Fit')
plt.title(func)
plt.legend()

3. Lasso Regression:
trainDataSize = size(trainData,2);
cycleLife = zeros(trainDataSize,1);
DeltaQ_var = zeros(trainDataSize,1);
for i = 1:trainDataSize
cycleLife(i) = size(trainData(i).cycles,2) + 1;
DeltaQ_var(i) = var(trainData(i).cycles(100).Qdlin - trainData(i).cycles(10).Qdlin);
end
loglog(DeltaQ_var,cycleLife, 'o')
ylabel('Cycle life'); xlabel('Var(Q_{100} - Q_{10}) (Ah^2)');
[XTrain,yTrain] = helperGetFeatures(trainData);
head(XTrain)
rng("default")

alphaVec = 0.01:0.1:1;
lambdaVec = 0:0.01:1;
MCReps = 1;
cvFold = 4;

rmseList = zeros(length(alphaVec),1);
minLambdaMSE = zeros(length(alphaVec),1);
wModelList = cell(1,length(alphaVec));
betaVec = cell(1,length(alphaVec));

for i = 1:length(alphaVec)
    [coefficients,fitInfo] = lasso(XTrain.Variables, yTrain, 'Alpha', alphaVec(i), ...
        'CV', cvFold, 'MCReps', MCReps, 'Lambda', lambdaVec);
    wModelList{i} = coefficients(:,fitInfo.IndexMinMSE);
    betaVec{i} = fitInfo.Intercept(fitInfo.IndexMinMSE);
    indexMinMSE = fitInfo.IndexMinMSE;
    rmseList(i) = sqrt(fitInfo.MSE(indexMinMSE));
    minLambdaMSE(i) = fitInfo.LambdaMinMSE;
end
numVal = 4;
[out,idx] = sort(rmseList);
val = out(1:numVal);
index = idx(1:numVal);

alpha = alphaVec(index);
lambda = minLambdaMSE(index);
wModel = wModelList(index);
beta = betaVec(index);
[XVal,yVal] = helperGetFeatures(valData);
rmseValModel = zeros(numVal,1);
for valList = 1:numVal
yPredVal = (XVal.Variables*wModel{valList} + beta{valList});
rmseValModel(valList) = sqrt(mean((yPredVal-yVal).^2));
end
[rmseMinVal,idx] = min(rmseValModel);
wModelFinal = wModel{idx};
betaFinal = beta{idx};
[XTest,yTest] = helperGetFeatures(testData);
yPredTest = (XTest.Variables*wModelFinal + betaFinal);
[idx,scores] = fsrftest(XTest,yTest);
bar(scores(idx))
xticklabels(strrep(XTest.Properties.VariableNames(idx),'_','\_'))
xlabel("Predictor rank")
ylabel("Predictor importance score")
figure;
scatter(yTest,yPredTest)
hold on;

refline(1, 0);
title('Predicted vs Actual Cycle Life')
ylabel('Predicted cycle life');
xlabel('Actual cycle life');
errTest = (yPredTest-yTest);
rmseTestModel = sqrt(mean(errTest.^2))
n = numel(yTest);
nr = abs(yTest - yPredTest);
errVal = (1/n)*sum(nr./yTest)*100

4. Multivariate model:
import copy
from functools import reduce
import glob
import json
import os
from pathlib import Path
from sklearn.svm import SVR
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import cm
from matplotlib.ticker import AutoMinorLocator
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import pandas as pd
import torch
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from scipy.stats import skew, kurtosis
from sklearn import preprocessing
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LinearRegression, RidgeCV, ElasticNetCV
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.cross_decomposition import PLSRegression

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from image_annotated_heatmap import heatmap, annotate_heatmap
from sklearn.linear_model import LassoCV
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process.kernels import RationalQuadratic
from sklearn.ensemble import VotingRegressor
# ## Settings
#
# Set plotting settings:

fig_width = 3.25 # ECS spec is 3.25" width


fig_width_2col = 7
fig_height = (3 / 4) * fig_width # standard ratio

rcParams['figure.autolayout'] = True
rcParams['lines.markersize'] = 5
rcParams['lines.linewidth'] = 1.0
rcParams['font.size'] = 7
rcParams['legend.fontsize'] = 7
rcParams['legend.frameon'] = False
rcParams['xtick.minor.visible'] = True
rcParams['ytick.minor.visible'] = True
rcParams['font.sans-serif'] = 'Arial'
rcParams['mathtext.fontset'] = 'custom'
rcParams['mathtext.rm'] = 'Arial'
rcParams['pdf.fonttype'] = 42
rcParams['ps.fonttype'] = 42

# Set path for saving figures:

figpath = Path.cwd() / 'figs'

# Define linearly spaced voltage vector (consistent with Severson):

Vdlin = np.linspace(3.6, 2, 1000)

# Define `alphas` and `l1_ratios` to use for `RidgeCV` and `ElasticNetCV` hyperparameter optimization:

l1_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]


alphas = np.logspace(0.001, 100, 20)

# Define colors for train, test1, and test2 to roughly match Severson:
colors_list = ['Blues', 'Reds', 'Oranges']
# Set `numpy` seed for consistent `sklearn` results:

np.random.seed(0)

# ## Load data
# The data is stored in the `data` directory. The data was generated by the
`generate_voltage_array.m` MATLAB script, which creates three folders of "voltage
arrays" for the train, test1, and test2 splits in Severson et al.
#
# We will load each dataset as 3D arrays of `[cell idx, voltage position, cycle number]`.
#
# Note that because cycle 1 is unavailable for one batch, position 0 in axis 2 is cycle 2.
# Similarly, cycle 100 is position 98 in axis 2.
# In describing the cycle numbers in the text, I assume the first cycle is cycle 1 (i.e.
references to cycle number in the text are one-indexed, not zero-indexed).

def sortKeyFunc(s):
return int(os.path.basename(s)[4:-4])

def load_dataset(folder):
files = glob.glob(f'./data/{folder}/*.csv')
files.sort(key=sortKeyFunc) # glob returns list with arbitrary order

l = len(files)
dataset = np.zeros((l, 1000, 99))

models_cycavg['GPR with Squared Exponential Kernel'] = gpr_sqexp_model_cycavg
models['SVM with Cubic Kernel'] = svm_cubic_model
models_cycavg['SVM with Cubic Kernel'] = svm_cubic_model_cycavg
models['K-Nearest Neighbors'] = knn_model
models_cycavg['K-Nearest Neighbors'] = knn_model_cycavg
models['Lasso Regression'] = lasso_model
models_cycavg['Lasso Regression'] = lasso_model_cycavg
models['Bayesian Ridge Regression'] = bayesian_ridge_model
models_cycavg['Bayesian Ridge Regression'] = bayesian_ridge_model_cycavg
models['Bayesian Lasso Regression'] = bayesian_lasso_model
models_cycavg['Bayesian Lasso Regression'] = bayesian_lasso_model_cycavg
# Add Gradient Boosting Regressor to the dictionary of cycavg models
models_cycavg['Gradient Boosting Regression'] = gbr_cycavg
models['Gradient Boosting Regression'] = gbr
models['Ensemble (Lasso & Ridge)'] = ensemble
models_cycavg['Ensemble (Lasso & Ridge)'] = ensemble
model_names = list(models.keys())
for k, model_name in enumerate(models):
if model_name != "MLP":
# Get model objects
model = models[model_name]
model_cycavg = models_cycavg[model_name]
# Fit models
model.fit(X_train_scaled, y_train)
model_cycavg.fit(X_train_scaled_cycavg, y_train)
# Predict on train and test sets
y_train_pred = model.predict(X_train_scaled).squeeze()
y_test1_pred = model.predict(X_test1_scaled).squeeze()
y_test2_pred = model.predict(X_test2_scaled).squeeze()
y_train_pred_cycavg = model_cycavg.predict(X_train_scaled_cycavg).squeeze()
y_test1_pred_cycavg = model_cycavg.predict(X_test1_scaled_cycavg).squeeze()
y_test2_pred_cycavg = model_cycavg.predict(X_test2_scaled_cycavg).squeeze()
else: # MLP model
y_train_pred = np.log10(mlp_predictions["train_pred"])
y_test1_pred = np.log10(mlp_predictions["test1_pred"])

y_test2_pred = np.log10(mlp_predictions["test2_pred"])
y_train_pred_cycavg = np.log10(mlp_predictions_cycavg["train_pred"])
y_test1_pred_cycavg = np.log10(mlp_predictions_cycavg["test1_pred"])
y_test2_pred_cycavg = np.log10(mlp_predictions_cycavg["test2_pred"])
# Evaluate error
RMSE_train[k], RMSE_test1[k], RMSE_test2[k] = get_RMSE_for_all_datasets(y_train_pred, y_test1_pred, y_test2_pred)
RMSE_train_cycavg[k], RMSE_test1_cycavg[k], RMSE_test2_cycavg[k] = get_RMSE_for_all_datasets(y_train_pred_cycavg, y_test1_pred_cycavg, y_test2_pred_cycavg)
# Display errors
print(f'{model_name} RMSE: Train={RMSE_train[k]:.0f} cyc, Test1={RMSE_test1[k]:.0f} cyc, Test2={RMSE_test2[k]:.0f} cyc')
print(f'{model_name} RMSE, cycavg: Train={RMSE_train_cycavg[k]:.0f} cyc, Test1={RMSE_test1_cycavg[k]:.0f} cyc, Test2={RMSE_test2_cycavg[k]:.0f} cyc')

REFERENCES

[1]. Liu, J., Saxena, A., Goebel, K., Saha, B., & Wang, W. (2010). An adaptive recurrent neural network for remaining useful life prediction of lithium-ion batteries. Annual Conference of the Prognostics and Health Management Society, PHM 2010.
[2]. Cheng, Y., Lu, C., Li, T., & Tao, L. (2015). Residual lifetime prediction for lithium-ion battery based on functional principal component analysis and Bayesian approach. Energy, 90. https://doi.org/10.1016/j.energy.2015.07.022
[3]. Zhou, Y., & Huang, M. (2016). Lithium-ion batteries remaining useful life prediction based on a mixture of empirical mode decomposition and ARIMA model. Microelectronics Reliability, 65. https://doi.org/10.1016/j.microrel.2016.07.151
[4]. Mo, B., Yu, J., Tang, D., & Liu, H. (2016). A remaining useful life prediction approach for lithium-ion batteries using Kalman filter and an improved particle filter. 2016 IEEE International Conference on Prognostics and Health Management, ICPHM 2016. https://doi.org/10.1109/ICPHM.2016.7542847
[5]. Zhao, Q., Qin, X., Zhao, H., & Feng, W. (2018). A novel prediction method based on the support vector regression for the remaining useful life of lithium-ion batteries. Microelectronics Reliability, 85. https://doi.org/10.1016/j.microrel.2018.04.007
[6]. Sun, Y., Hao, X., Pecht, M., & Zhou, Y. (2018). Remaining useful life prediction for lithium-ion batteries based on an integrated health indicator. Microelectronics Reliability, 88–90. https://doi.org/10.1016/j.microrel.2018.07.047
[7]. Severson, K. A., Attia, P. M., Jin, N., Perkins, N., Jiang, B., Yang, Z., Chen, M. H., Aykol, M., Herring, P. K., Fraggedakis, D., Bazant, M. Z., Harris, S. J., Chueh, W. C., & Braatz, R. D. (2019). Data-driven prediction of battery cycle life before capacity degradation. Nature Energy, 4(5), 383–391. https://doi.org/10.1038/s41560-019-0356-8
[8]. Xiong, R., Zhang, Y., Wang, J., He, H., Peng, S., & Pecht, M. (2019). Lithium-ion battery health prognosis based on a real battery management system used in electric vehicles. IEEE Transactions on Vehicular Technology, 68(5), 4110–4121. https://doi.org/10.1109/TVT.2018.2864688
[9]. Ma, G., Zhang, Y., Cheng, C., Zhou, B., Hu, P., & Yuan, Y. (2019). Remaining useful life prediction of lithium-ion batteries based on false nearest neighbors and a hybrid neural network. Applied Energy, 253. https://doi.org/10.1016/j.apenergy.2019.113626
[10]. Yang, F., Wang, D., Xu, F., Huang, Z., & Tsui, K. L. (2020). Lifespan prediction of lithium-ion batteries based on various extracted features and gradient boosting regression tree model. Journal of Power Sources, 476. https://doi.org/10.1016/j.jpowsour.2020.228654
[11]. Liu, H., Naqvi, I. H., Li, F., Liu, C., Shafiei, N., Li, Y., & Pecht, M. (2020). An analytical model for the CC-CV charge of Li-ion batteries with application to degradation analysis. Journal of Energy Storage, 29. https://doi.org/10.1016/j.est.2020.101342
[12]. Hasib, S. A., Islam, S., Chakrabortty, R. K., Ryan, M. J., Saha, D. K., Ahamed, M. H., Moyeen, S. I., Das, S. K., Ali, M. F., Islam, M. R., Tasneem, Z., & Badal, F. R. (2021). A comprehensive review of available battery datasets, RUL prediction approaches, and advanced battery management. IEEE Access, 9, 86166–86193. https://doi.org/10.1109/ACCESS.2021.3089032
[13]. Afshari, S. S., Cui, S., Xu, X., & Liang, X. (2022). Remaining useful life early prediction of batteries based on the differential voltage and differential capacity curves. IEEE Transactions on Instrumentation and Measurement, 71. https://doi.org/10.1109/TIM.2021.3117631
[14]. Azyus, A. F., Wijaya, S. K., & Naved, M. (2023). Prediction of remaining useful life using the CNN-GRU network: A study on maintenance management. Software Impacts, 17. https://doi.org/10.1016/j.simpa.2023.100535
[15]. Xiong, W., Xu, G., Li, Y., Zhang, F., Ye, P., & Li, B. (2023). Early prediction of lithium-ion battery cycle life based on voltage-capacity discharge curves. Journal of Energy Storage, 62. https://doi.org/10.1016/j.est.2023.106790
[16]. Zhou, Z., & Howey, D. A. (2023). Bayesian hierarchical modelling for battery lifetime early prediction. IFAC-PapersOnLine, 56(2), 6117–6123. https://doi.org/10.1016/j.ifacol.2023.10.708
[17]. Lyu, G., Zhang, H., & Miao, Q. (2023). RUL prediction of lithium-ion battery in early-cycle stage based on similar sample fusion under Lebesgue sampling framework. IEEE Transactions on Instrumentation and Measurement, 72, 1–11. https://doi.org/10.1109/TIM.2023.3260277
[18]. Xu, Q., Wu, M., Khoo, E., Chen, Z., & Li, X. (2023). A hybrid ensemble deep learning approach for early prediction of battery remaining useful life. IEEE/CAA Journal of Automatica Sinica, 10(1), 177–187. https://doi.org/10.1109/JAS.2023.123024
[19]. Li, X., Yu, D., Søren Byg, V., & Daniel Ioan, S. (2023). The development of machine learning-based remaining useful life prediction for lithium-ion batteries. Journal of Energy Chemistry, 82. https://doi.org/10.1016/j.jechem.2023.03.026
[20]. Wang, S., Fan, Y., Jin, S., Takyi-Aninakwa, P., & Fernandez, C. (2023). Improved anti-noise adaptive long short-term memory neural network modeling for the robust remaining useful life prediction of lithium-ion batteries. Reliability Engineering and System Safety, 230. https://doi.org/10.1016/j.ress.2022.108920
[21]. Ezzouhri, A., Charouh, Z., Ghogho, M., & Guennoun, Z. (2023). A data-driven-based framework for battery remaining useful life prediction. IEEE Access, 11. https://doi.org/10.1109/ACCESS.2023.3286307
[22]. Gong, S., He, H., Zhao, X., Shou, Y., Huang, R., & Yue, H. (2023). Lithium-ion battery remaining useful life prediction method concerning temperature-influenced charging data. International Conference on Electrical, Computer, Communications and Mechatronics Engineering, ICECCME 2023. https://doi.org/10.1109/ICECCME57830.2023.10253076
[23]. Zheng, X., Wu, F., & Tao, L. (2024). Degradation trajectory prediction of lithium-ion batteries based on charging-discharging health features extraction and integrated data-driven models. Quality and Reliability Engineering International. https://doi.org/10.1002/qre.3496