
UNIVERSITY OF BIRMINGHAM

SCHOOL OF COMPUTER SCIENCE


Artificial Intelligence and Machine Learning MSc

Deep Learning for Remaining Useful Life Estimation


in Lithium Ion Batteries

Mohammed Eesa Asif


Student ID: 2458628

Computer Science Supervisors:


Dr. Qamar Natsheh
Dr. Mubashir Ali

Materials and Metallurgy Supervisor:


Dr. Alireza Rastegrapanah

Date
September 17, 2023
“There is a better way for everything. Find it.”

– Thomas Edison
Abstract
The rapid proliferation of electric vehicles (EVs) has resulted in a significant number of
EV batteries in circulation, sparking increased interest in recycling and reusing them due
to their considerable market share and dependence on rare earth materials like nickel
and cobalt. As the demand for battery recycling grows, accurate assessment of bat-
tery health and estimation of the remaining useful life (RUL) becomes a critical concern.
Consequently, this study aims to enhance the precision of RUL estimation for lithium-ion
batteries (LIB) by employing data-driven methods, specifically leveraging deep learning
techniques. Our approach involves in-depth analysis of battery parameters and historical
performance data, yielding substantial RUL prediction accuracy improvement. Notably,
the Convolutional, LSTM, Densely Connected (CLDNN) model and Transformer-LSTM
(Temporal-Transformer) model stand out as exceptional RUL predictors. Specifically, the
CLDNN model achieved an MAE of 54.012 and a MAPE of 25.676, while the Temporal-Transformer model achieved an MAE of 65.134 and a MAPE of 28.7932. These deep
learning models allow for detailed analysis of various battery parameters, historical per-
formance data, and other pertinent factors, improving the accuracy of remaining useful life
estimates. By refining RUL estimation accuracy, this research endeavours to contribute to the field of battery recycling by enabling the identification of batteries with remaining useful life, thereby
promoting their reuse and minimising waste.

Key-words: Deep Learning, Remaining Useful Life, Estimation, Lithium-Ion Batteries,


Battery Management Systems, Recycling and Reuse, Battery Degradation
Acknowledgement
I would like to express my profound gratitude to Dr. Natsheh and Dr. Mubashir of the Computer Sci-
ence department. They consistently provided weekly feedback on the optimal approach to research
and structuring of the thesis, ensuring that each section was impactful. Additionally, their insights and
wisdom led to significant improvements in the overall structure of the report, and their expertise in
modelling techniques was incredibly beneficial. Their guidance and unwavering support ensured that
my work met the department’s standards, enhancing the overall quality of my thesis.

My sincere thanks are extended to Dr. Alireza from the School of Metallurgy and Materials for his
invaluable contributions to this project. His initial proposal, guidance, and previous research have
greatly enriched my research experience and deepened my insights into the field.

I am also indebted to Professor Christopher Baber for his meticulous review of the project. His in-
sightful feedback and actionable suggestions during the demonstration meeting were invaluable.

I must also acknowledge the Ph.D. candidate Cesar Contreras of the Extreme Lab Robotics team.
His assistance and engaging discussions have been instrumental in elevating my understanding of
data preprocessing of the dataset.

Moreover, I wish to convey my appreciation to Dr. Ma’d El-Dalahmeh, a postdoc at Newcastle Univer-
sity. His expertise in data-driven techniques and his insights into the physical chemistry of Lithium-Ion
Batteries not only inspired but also shaped the structure of the research I conducted.

My heartfelt appreciation goes to all these individuals, whose collective efforts and unwavering sup-
port played an essential role in the successful completion of my thesis.

Contents
1 Introduction 1
1.1 Background and Motivation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim and Objectives: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.4 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Report Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Background 4
2.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Overfitting & Underfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Regularisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.3 Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.4 Gradient descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.5 Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.6 Backpropagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Convolution Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Dot product and convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Modeling Temporal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 Recurrent Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.2 Long-term Dependencies issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.3 Long Short-Term Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Transformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1 Positional encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2 Multi-head attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.3 Layer normalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.4 Feed-forward network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.1 Mean absolute error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.2 Root Mean square error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.3 Mean Absolute Percentage Error . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6 Li-ion Batteries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6.1 Understanding Health Estimation techniques . . . . . . . . . . . . . . . . . . . . 14
2.6.2 Li-ion battery ageing mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3 Literature Review 17
3.1 Physics-based Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 Electrochemical Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.2 Empirical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.3 Equivalent Circuit Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.4 Electrochemical impedance Models . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.5 Filter-Based Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.6 Limitation of Physics-Based models . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Data-driven approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2.1 Statistical Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.2 Deep Learning approaches: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.3 Knowledge Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4 Methodology 26
4.1 Dataset Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Stratified Random Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.1 Outlier removal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3.2 Smoothing and Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.3 Data preprocessing tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.4 Designing the networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.1 CLDNN Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.2 Temporal-Transformer Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4.3 Model Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4.4 Hyper-parameter Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.5 Hyperparameter Analysis for the hybrid models . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.1 Observations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 Results and Evaluation 40


5.1 Baseline Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.2 Comparing all the tested temporal models . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3 Best performing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3.1 CLDNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3.2 Temporal-Transformer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4 Comparison against SOTA approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6 Discussion 47
6.1 Achievements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.2 Deficiencies and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

7 Conclusion 49
7.1 Thesis Contributions: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

8 Appendices 50
8.1 Appendix A - Various Neural Network architectures explained . . . . . . . . . . . . . . . 50
8.2 Appendix B - Examples of the analytical equations used in physics based models . . . . 51
8.3 Appendix C - Filter Based Method Background . . . . . . . . . . . . . . . . . . . . . . . 51
8.4 Appendix D - LIB charge/discharge policies explained further . . . . . . . . . . . . . . . 54
8.5 Appendix E - Pseudo code for best performing models . . . . . . . . . . . . . . . . . . . 55
8.6 Appendix F - Introduction To Liquid Neural Networks . . . . . . . . . . . . . . . . . . . . 55
8.7 Appendix G - The training loss plotted for all the temporal models . . . . . . . . . . . . . 57
8.8 Appendix H - Source code explanation, Software and Hardware . . . . . . . . . . . . . . 57

References 59

List of Figures
1 Architecture of the Deep Learning based framework for battery lifetime prediction. . . . 2
2 Plot of the common activation function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Gradient descent: finding the global minimum . . . . . . . . . . . . . . . . . . . . . . . . 6
4 Neural network with 4 layers; input, 2 hidden, and output layer . . . . . . . . . . . . . . . 7
5 VGG-16 neural network architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
6 Recurrence relation, the update of the hidden state, and the output of an RNN . . . . . 9
7 Visualisation of an LSTM unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
8 Multi-head attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9 Factors affecting battery age during cycling and their associated degradation modes
(Birkl et al., 2017) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
10 Different approaches to battery RUL prediction . . . . . . . . . . . . . . . . . . . . . . . 17
11 Schematic diagram of multi-step charging and constant-current discharging (Xu et al.,
2023). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
12 Comparison for the Internal resistance first element in data. . . . . . . . . . . . . . . . . 29
13 Comparison for the Discharge time first element in data. . . . . . . . . . . . . . . . . . . 29
14 Comparison of Discharge Quantity for the first element in data. . . . . . . . . . . . . . . 30
15 Discharge capacity versus cycle number. . . . . . . . . . . . . . . . . . . . . . . . . . . 31
16 Convolutional Long Short-Term Memory Deep Neural Network (CLDNN). . . . . . . . . 34
17 The Temporal Transformer model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
18 MAE - Mean Absolute Error for each of the models. . . . . . . . . . . . . . . . . . . . . . 42
19 MAE- Mean Absolute Error for the best performing models. . . . . . . . . . . . . . . . . 42
20 MAPE- Mean Absolute Percentage Error for the best performing models. . . . . . . . . 43
21 MAPE- Mean Absolute Percentage Error for the best performing models. . . . . . . . . 43
22 MSE- Mean Squared Error for the best performing models. . . . . . . . . . . . . . . . . 44
23 MSE- Mean Squared Error for the best performing models. . . . . . . . . . . . . . . . . 44
24 Epoch count for convergence of each model vs process time in seconds . . . . . . . . . 45
25 Training loss (MSE) of each of the models . . . . . . . . . . . . . . . . . . . . . . . . . . 57
26 Training loss (MSE) of each of the models plotted . . . . . . . . . . . . . . . . . . . . . . 57

List of Tables
1 Comparison of Physics-Based Models (PBM) vs Data-Driven Models (DDM) . . . . . . 20
2 Summary of DL-based RUL estimates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Comparison of the results of different models explored. . . . . . . . . . . . . . . . . . . 40
4 Comparison of our model against other state-of-the-art approaches. . . . . . . . . . . . 46
5 Explaining the code base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Nomenclatures

ANN Artificial Neural Network


A2D Analog to Digital
AE Auto-Encoder
BMS Battery Management System
CLDNN CNN-LSTM-Deep-Neural-Networks
CNN Convolution Neural Network
DDA Data Driven Approaches
DL Deep Learning
EOL End Of Life
ECM Equivalent Circuit model
EM Electrochemical Model
EIS Electrochemical Impedance Spectroscopy
GPR Gaussian Process Regression
HPPC Hybrid Pulse Power Characterisation
IR Internal Resistance
KF Kalman Filter
LAM Loss of Active Material
LLI Loss of Lithium Inventory
LSTM Long Short Term Memory
LIB Lithium ion Batteries
MLP Multi-Layer Perceptron
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MBA Model Based Approaches
OCV Open Circuit Voltage
PBM Physics-Based Model
pe positional-encoding
PF Particle Filter
PDE Partial differential equation
P2D Pseudo-two-dimensional
pv positional vector
QD Quantity of Discharge
Qdlin Linearly interpolated discharge capacity
DDM Data-Driven Model
RUL Remaining Useful Life
RMSE Root Mean Square Error
RNN Recurrent Neural Network
SEI Solid Electrolyte Interface
SOC State of Charge
SOH State of Health
SNR Signal to Noise Ratio
SVM Support Vector Machine
Tdlin Linearly interpolated temperature

1 Introduction
Electric Vehicles (EVs) are increasingly becoming a preferred option in the automotive industry, moti-
vated by a global emphasis on sustainable transportation. Governments, such as the United Kingdom
with its Road to Zero targets, are incentivizing EV adoption through measures like tax subsidies (Of-
fice, 2018). While the high performance, efficiency, and environmental benefits of EVs contribute to
their growing popularity, there exist significant challenges, particularly concerning their battery tech-
nology.

1.1 Background and Motivation:


While Lithium-ion batteries (LIBs) have become the favored choice for Electric Vehicles (EVs) owing
to their notable energy density and prolonged life spans (Plett, 2015), their real-world application is
riddled with challenges. The prediction of the remaining useful life (RUL) of LIBs becomes partic-
ularly complex, given the constraints of physics-based models. Such models demand an intricate
understanding of battery electrochemical dynamics, which are not just labor-intensive but also com-
putationally taxing (Berckmans et al., 2017; Eisenstein, 2021; Chatzakis et al., 2003). Oversimplified
assumptions, parametrization issues stemming from production discrepancies, elevated computa-
tional challenges, and a vulnerability to small uncertainties further impede their practical usability.

Additionally, the literature indicates significant knowledge gaps in existing data-driven models, in their
robustness, generalizability, efficacy of deep learning approaches, and several other intricate con-
siderations. These challenges underscore the need for a resilient and robust Battery Management
System (BMS). Without it, the high costs associated with LIBs and the ramifications of poor operating
conditions that expedite degradation pose significant risks, emphasizing the need for accurate battery
behavior modelling.

1.2 Aim and Objectives:


This project is driven by the ambition to harness deep learning techniques for improved estimation
of battery health, an indispensable component for the safe and efficient functioning of Lithium-ion
batteries (LIBs). As accurate State-of-Health (SOH) estimation becomes paramount across diverse
sectors like electric vehicles, renewable energy systems, and portable electronics, the endeavour is
to leverage data-driven models (Pinegar and Smith, 2019). Such models are poised to overcome the
constraints of conventional modelling approaches, offering precise predictions pivotal for advanced
battery management systems. This aim is centred around the pressing question: ”How can data-
driven techniques, especially deep learning, be effectively integrated to refine battery management
systems?”

Addressing this central inquiry encompasses multiple phases:


1. Investigate methods to preprocess the raw signals with different sampling rates to reduce their
dimensionality while keeping essential features.
2. Examine the best strategies for designing and fine-tuning a deep neural network to predict
Remaining Useful Life of Li-ion batteries.
3. Evaluate the different hybrid models to enhance predictive capabilities.
4. Contrast the efficacy of the suggested models against leading-edge algorithms.

1.3 Limitations
In the existing body of literature concerning battery technology and Prognostic and Diagnostic health,
several limitations are prevalent, particularly in the context of modelling the complex nature of lithium-
ion batteries. Physics-Based Models (PBMs) are at the forefront of estimating the remaining useful life
(RUL) of these batteries. These models leverage electrochemistry and physics principles to simulate
battery behavior over time, considering factors like ion diffusion, chemical reactions, capacity degra-
dation, cycle effects, and thermal influences. While PBMs offer enhanced accuracy by incorporating
real-world data for proactive maintenance, they face challenges due to their demand for in-depth elec-
trochemical insights, limited real-world accuracy, high computational complexity, and the intricacies

of parameterization influenced by manufacturing variations. These challenges underscore the mo-
tivation for exploring Data-Driven Methods, particularly through the utilisation of statistical machine
learning techniques and deep learning approaches.

1.4 Thesis Overview


In this study, we present a comprehensive examination of data-driven methods for RUL prediction
of LIB. The dataset employed for our research features 124 commercial lithium-ion batteries that
underwent extensive cycling until failure within specified operational conditions. This dataset en-
compasses critical parameters, including discharge capacity, temperature, internal resistance, and
discharge time. To effectively address the inherent heterogeneity within the dataset, we devised a
novel hybrid approach that combines Convolutional Neural Networks (CNN), Long Short-Term Mem-
ory (LSTM), and Dense Neural Networks (DNN), collectively referred to as CLDNN. Additionally, we
introduced an innovative model known as the Temporal Transformer, combining the strengths of the
Transformer and LSTM architectures, although several other models were considered during the ex-
perimentation phase.

A pivotal aspect of this research was the meticulous optimisation of model hyperparameters using
Bayesian Optimization techniques, aimed at maximising the models’ predictive accuracy. The pri-
mary objective throughout this endeavour was to develop a robust predictive model for estimating the
Remaining Useful Life (RUL) of lithium-ion batteries, accounting for intricate temporal dynamics
and feature variations across different battery batches.

To ensure the dataset’s suitability for model training and evaluation, an array of preprocessing steps
were executed. These steps encompassed outlier detection and removal, data smoothing, and inter-
polation techniques, all of which collectively served to enhance data quality and improve the model’s
predictive accuracy.

Our research, illustrated in Figure 1, underscores the significance of harnessing the power of hy-
brid neural network architectures and optimising hyperparameters using Bayesian techniques. This
not only advances the state-of-the-art in RUL prediction but also demonstrates an effective approach
to handling diverse and heterogeneous datasets. The outcomes of this study hold significant promise
for improving the reliability and efficiency of lithium-ion batteries across various applications.

Figure 1: Architecture of the Deep Learning based framework for battery lifetime prediction.

1.5 Report Structure
Chapter 2 introduces Machine Learning, Artificial Neural Networks, and Lithium-ion Battery complex-
ities. Chapter 3 reviews existing models, categorising them as physics-based or data-driven and
identifying knowledge gaps. Chapter 4 outlines our methodology, detailing the dataset, preprocessing
steps, and neural network designs. In Chapter 5, we present experimental results, comparing our
solutions to baseline models and leading approaches. Chapter 6 reflects on our achievements, limita-
tions, and future enhancements. The final chapter, Chapter 7, summarises key insights and broader
implications.

2 Background
This chapter offers an extensive survey of the deep learning algorithms implemented in this project.
It aims to enrich the readers’ understanding of the forthcoming chapters by supplying a broader per-
spective. It also includes an introduction to Lithium-ion Batteries (LIB) and various ageing mecha-
nisms of LIBs.

2.1 Machine Learning


As described by Ng (2009), ”Machine learning is the science of getting computers to act without being
explicitly programmed”. The following three paradigms can define the algorithms in machine learning
(ML):
• Supervised-learning: we have a training dataset for input-output pairs; the output is the super-
vised information. We aim to build a model mapping an input to an output so that the predicted
output is similar to the actual output (Goodfellow et al., 2016).
• Unsupervised-Learning: We only have feature information. We aim to find a pattern among the
feature information or an internal representation of the input (Goodfellow et al., 2016).
• Reinforcement-Learning: Learning paradigm where we want to interact with the environment.
Our goal is to maximise some reward function (Goodfellow et al., 2016).

2.1.1 Overfitting & Underfitting


The different behaviour in training and testing leads to concepts of underfitting and overfitting. Over-
fitting means that the model performs well on the training data but does not generalise to testing data
(Raschka and Mirjalili, 2019). It happens when the model is too complex relative to the amount of
training data. Underfitting is the opposite of overfitting: it occurs when the model is too simple to learn
the underlying structure of the data.

Dropout

Dropout is a regularisation technique in machine learning, commonly used in neural networks, where
a fraction of neurons is randomly deactivated during each training iteration based on a preset dropout
rate (Raschka and Mirjalili, 2019). By preventing the network from relying too heavily on specific neu-
rons, dropout helps mitigate overfitting, enhancing the model’s generalisation performance on unseen
data, and is turned off or scaled down during inference for accurate predictions (Goodfellow et al.,
2016).
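
As a minimal illustrative sketch (added here, not taken from the thesis code base), inverted dropout can be written in a few lines of NumPy; the dropout rate of 0.3 is an arbitrary choice for the example:

    import numpy as np

    def dropout(a, rate=0.3, training=True):
        """Inverted dropout: randomly zero a fraction of activations and rescale the rest."""
        if not training:
            return a                                    # dropout is turned off at inference
        mask = np.random.default_rng().random(a.shape) >= rate   # keep each unit with prob. 1 - rate
        return a * mask / (1.0 - rate)                  # rescaling preserves the expected activation
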

2.1.2 Regularisation
In ML algorithms, we aim to find a model that exhibits good performance on the training examples
while maintaining a level of simplicity for effective generalisation (Raschka and Mirjalili, 2019). One
approach to achieve this balance is through regularisation, where the emphasis is placed on en-
couraging smaller weights (Goodfellow et al., 2016). By doing so, no single feature dominates the
prediction, promoting a more balanced and robust model behaviour.

Regularised Least Square Regression. Given dataset S = (x(1), y(1)), ..., (x(n), y(n)) and a reg-
ularisation parameter λ > 0, we find a model to minimise.

C(w) = (1/(2n)) \sum_{i=1}^{n} (y^{(i)} − w^T x^{(i)})^2 + (λ/2) ||w||_2^2    (1)
As can be seen, the objective function has two components: the data-fitting term and the regulariser.
We aim to get a model with good behaviour on training examples to minimise the data-fitting term.
To minimise the regulariser, we want to get a model with simplicity for generalisation to unseen data.
These two terms are generally contradictory: minimising the data-fitting term asks for a complex
model, and minimising the regulariser asks for a simple model. We can balance these two terms
by choosing an appropriate regularisation parameter. (Note: If λ is large, then we focus on simple
models. If λ is small, then we focus on complex models.)
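
For illustration only (this sketch is not part of the thesis), the regularised objective in Eq. (1) has a closed-form minimiser, which a short NumPy snippet makes concrete; the data here are synthetic:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))                        # feature matrix, rows are x^(i)
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    lam = 0.1                                          # regularisation parameter lambda
    # Setting the gradient of Eq. (1) to zero gives (X^T X / n + lambda I) w = X^T y / n.
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
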

2.1.3 Activation Function
There are several choices of activation functions. Activation functions are employed to increase non-
linearity in the neural network (so the Universal Approximation Theorem is valid for neural networks).
Their behaviour is illustrated in Figure 2.

Figure 2: Plot of the common activation function.

Sigmoid and Tanh are differentiable. However, these two use exponential functions and require more
computation (exponential function is more difficult to compute than a maximum operator) (Raschka
and Mirjalili, 2019). We can see that the behaviour of Sigmoid and Tanh is very similar. The difference is that the sigmoid function outputs values in the range [0, 1], while Tanh outputs values in the range [−1, 1] (Goodfellow et al., 2016).

Sigmoid Function
σ(x) = 1 / (1 + e^{−x}),   (−∞, ∞)    (2)

Hyperbolic tangent
tanh(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x}),   [−1, 1]    (3)

Rectified Linear Unit
ReLU(x) = max(0, x),   [0, ∞)    (4)
Rectified Linear Unit (ReLU) is simple to compute since it involves only the maximum operator. ReLU
is a continuous function but not differentiable. It can be checked from the definition that ReLU is not
differentiable at the zero point. ReLU is widely used in modern neural networks due to its simplicity
(Raschka and Mirjalili, 2019).
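
A small NumPy sketch (added for illustration, not from the thesis) implements the three activation functions of Eqs. (2)-(4):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))        # Eq. (2)

    def tanh(x):
        return np.tanh(x)                      # Eq. (3)

    def relu(x):
        return np.maximum(0.0, x)              # Eq. (4)

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z), tanh(z), relu(z))
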

2.1.4 Gradient descent


Gradient descent is one of the most straightforward but very general algorithms for minimising an
objective function C(w), first proposed by Cauchy in 1847 (Lemaréchal, 2012). It is an iterative
algorithm which allows us to update our learn-able parameters (weights and biases), starting from
w(0) (t = 0) and producing a new w(t+1) at each iteration as:

w(t+1) = w(t) − η∇C(w(t) ), t = 0 , 1 , ... (5)

where ηt > 0 is the learning rate or step size. Therefore, GD uses a negative gradient as a search
direction and moves along this direction with step size ηt . Here we see that the essential part of im-
plementing gradient descent is to compute the gradient (Goodfellow et al., 2016). Once we know how
to compute the gradient, updating the parameters along the negative gradient direction is trivial. The
gradient computation has a closed-form solution for linear regression. For other problems like neural
networks, while we do not have a closed-form solution for gradient, the gradient can be computed by
some algorithms (Backpropagation) (Goodfellow et al., 2016).
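
As a toy illustration of the update rule in Eq. (5) (added here, not from the thesis), the sketch below minimises a simple quadratic objective with a fixed learning rate:

    import numpy as np

    def C(w):
        return np.sum((w - 3.0) ** 2)          # simple objective with minimum at w = 3

    def grad_C(w):
        return 2.0 * (w - 3.0)                 # analytical gradient

    w = np.zeros(2)                            # w^(0)
    eta = 0.1                                  # learning rate / step size
    for t in range(100):
        w = w - eta * grad_C(w)                # w^(t+1) = w^(t) - eta * grad C(w^(t))
    print(w)                                   # converges towards [3, 3]
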

Figure 3: Gradient descent: finding the global minimum

2.1.5 Artificial Neural Network


Artificial neural networks (ANNs) first proposed by Rosenblatt (1958) are machine learning algo-
rithms inspired by the brain’s structure. They consist of interconnected nodes, called neurons, that
can approximate various functions. Evolutionary algorithms can also be used to determine model
parameters such as weights and biases if sufficient resources are available. ANNs offer a flexible
framework for capturing complex patterns and relationships in data through a process of iterative
training and adjustment of parameters. Gradient-based optimisation algorithms, such as stochastic
gradient descent, are commonly used due to their efficiency and theoretical foundations (Goodfellow
et al., 2016).

2.1.6 Backpropagation
The mathematical foundation of Backpropagation is the chain rule. Direct use of the chain rule is not
possible due to the large number of parameters. The neural network is structured in several layers.
Each neuron has nodes in the computation graph (LeCun et al., 2015). We then define several back-
propagated gradients as the gradient of the loss w.r.t. a node in the computation graph (Raschka
and Mirjalili, 2019). Due to the structure of the Neural Network, we can find a recursive relationship
between back-propagated gradients. The parameters of the network are the weights (w_{ij}^l) and the biases (b_i^l). We need to compute the gradient of the loss function w.r.t. these parameters for the
implementation of gradient descent. An example of 2 hidden layers can be seen in figure 4.
Before introducing the backpropagation algorithm in NNs, we first introduce some notation:
• L : number of layers.
• n : width of the network (varies between layers).
• w_{ij}^l : weight of the connection from the j-th unit in layer (l−1) to the i-th unit in layer l (the superscript l denotes the layer index).
• b_i^l : bias of the i-th unit in layer l.
• z_i^l = \sum_j w_{ij}^l a_j^{l−1} + b_i^l : weighted input to unit i in layer l.
• a_i^l = σ(z_i^l) : activation of unit i in layer l, where σ is an activation function.


To apply gradient descent to optimise the weight - w and bias - b in a neural network, we apply the
chain rule (Linearised example):

∂C/∂u = (∂C/∂v) · (∂v/∂u)    (6)

where ∂C/∂v is the back-propagated gradient and ∂v/∂u is the local gradient.

Figure 4: Neural network with 4 layers; input, 2 hidden, and output layer

We have the following relationships between z_i^l and w_{ij}^l:

z_i^l = \sum_{j=1}^{n} w_{ij}^l a_j^{l−1} + b_i^l    (7)

a_i^l = σ(z_i^l)    (8)

∂z_i^l / ∂w_{ij}^l = ∂(w_{ij}^l a_j^{l−1}) / ∂w_{ij}^l = a_j^{l−1}    (9)

Therefore, according to the chain rule, the update rule for each parameter is:

∂C/∂w_{ij}^l = (∂C/∂z_i^l) · (∂z_i^l/∂w_{ij}^l) = (∂C/∂z_i^l) · a_j^{l−1}    (10)

∂C/∂b_i^l = (∂C/∂z_i^l) · (∂z_i^l/∂b_i^l) = ∂C/∂z_i^l    (11)
In the preceding equations, ∂C/∂z_i^l is the back-propagated gradient w.r.t. a node, while ∂z_i^l/∂w_{ij}^l and ∂z_i^l/∂b_i^l are the local gradients. The back-propagated gradient is the same for both weights and biases, so it can be pre-computed:

δ_i^l := ∂C/∂z_i^l    (12)

Vectorised Back-propagation

∂C/∂w_{ij}^l = δ_i^l · (∂z_i^l/∂w_{ij}^l),   ∂C/∂b_i^l = δ_i^l · (∂z_i^l/∂b_i^l)    (13)

In matrix form:
\begin{pmatrix}
\partial C / \partial w_{11}^l & \cdots & \partial C / \partial w_{1n}^l \\
\vdots & \ddots & \vdots \\
\partial C / \partial w_{n1}^l & \cdots & \partial C / \partial w_{nn}^l
\end{pmatrix}
=
\begin{pmatrix}
\delta_1^l a_1^{l-1} & \cdots & \delta_1^l a_n^{l-1} \\
\vdots & \ddots & \vdots \\
\delta_n^l a_1^{l-1} & \cdots & \delta_n^l a_n^{l-1}
\end{pmatrix}
=
\begin{pmatrix} \delta_1^l \\ \vdots \\ \delta_n^l \end{pmatrix}
\begin{pmatrix} a_1^{l-1} & \cdots & a_n^{l-1} \end{pmatrix}    (14)

The gradient w.r.t. both the weight matrix and the bias, in terms of back-propagated gradients, is:

∂C/∂W^l = δ^l · (a^{l−1})^T,   ∂C/∂b^l = δ^l    (15)

Finally, we can update our parameters based on the gradient-descent update rule:

W^l_{(t+1)} = W^l_{(t)} − η ∂C/∂W^l,   b^l_{(t+1)} = b^l_{(t)} − η ∂C/∂b^l    (16)
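
To make Eqs. (7)-(16) concrete, the following NumPy sketch (an illustrative addition, not the thesis implementation) trains a one-hidden-layer network on synthetic data using exactly these back-propagated gradients:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))                              # toy inputs
    y = (X @ np.array([1.0, -2.0, 0.5])).reshape(-1, 1)        # toy regression targets

    W1 = rng.normal(scale=0.1, size=(3, 8)); b1 = np.zeros((1, 8))
    W2 = rng.normal(scale=0.1, size=(8, 1)); b2 = np.zeros((1, 1))
    eta, n = 0.05, len(X)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for epoch in range(500):
        # Forward pass (Eqs. 7-8)
        z1 = X @ W1 + b1
        a1 = sigmoid(z1)
        z2 = a1 @ W2 + b2                                      # linear output layer
        # Backward pass: back-propagated gradients delta = dC/dz (Eq. 12)
        delta2 = (z2 - y) / n                                  # for C = 1/(2n) sum (z2 - y)^2
        delta1 = (delta2 @ W2.T) * a1 * (1 - a1)               # chain rule through sigmoid
        # Parameter gradients (Eqs. 14-15) and update (Eq. 16)
        W2 -= eta * (a1.T @ delta2); b2 -= eta * delta2.sum(axis=0, keepdims=True)
        W1 -= eta * (X.T @ delta1);  b1 -= eta * delta1.sum(axis=0, keepdims=True)
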

2.2 Convolution Neural Network
A Convolution Neural Network (CNN) is an artificial neural network initially designed for image pro-
cessing and optical character recognition tasks introduced by LeCun et al. (1995). CNNs can learn
features from data invariant to translation, rotation, and transformations. CNNs comprise a series of
layers, each performing a different operation on the input matrix (originally was an image) (LeCun
et al., 2015). The first layer in a CNN is typically a convolutional layer, which applies a series of
filters to the input data, like an image. The filters are designed to extract specific features from the
image, such as edges, corners, and textures. The output of the convolutional layer is then passed to
a pooling layer, which reduces the size of the output (Goodfellow et al., 2016). This is done to reduce
the network’s computational complexity and help the network learn abstract features. The output of
the pooling layer is then passed to a series of fully connected layers, which are similar to the layers
found in traditional neural networks. The fully connected layers combine the features learned by the
convolutional layers and pool layers to make a final prediction. Next, we will cover the mathematical
underpinning of CNNs. Figure 5 is a popular CNN highlighting each layer’s different operations.

2.2.1 Dot product and convolution


Dot product of two vectors a = [ a1 , a2 , ... , an ] and b = [ b1 , b2 , ... , bn ]:
a · b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + ... + a_n b_n    (17)

So the convolution operation given data matrix x:


Conv[i, j] = (w ∗ x)[i, j] = \sum_{u=−k}^{k} \sum_{v=−k}^{k} w[u, v] x[i − u, j − v]    (18)

Where u and v are indices in the kernel grid and i, j are indices in the data matrix, and k denotes the
radius of the kernel.
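
A direct (and deliberately slow) NumPy sketch of the convolution in Eq. (18) is shown below for illustration; it is not taken from the thesis, and in practice a library routine such as scipy.signal.convolve2d would be used. The sketch computes a valid-mode output rather than the centred indexing of Eq. (18):

    import numpy as np

    def conv2d_valid(x, w):
        """Valid-mode 2D convolution: the kernel is flipped, as in Eq. (18)."""
        kh, kw = w.shape
        out_h = x.shape[0] - kh + 1
        out_w = x.shape[1] - kw + 1
        w_flipped = w[::-1, ::-1]              # flipping distinguishes convolution from correlation
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w_flipped)
        return out

    x = np.arange(25.0).reshape(5, 5)          # toy "image"
    w = np.array([[1.0, 0.0], [0.0, -1.0]])    # toy kernel
    print(conv2d_valid(x, w))
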

Figure 5: VGG-16 neural network architecture

2.2.2 Pooling
Two pooling methods commonly used in CNN are max pooling and average pooling. An example of
max pooling with stride two and 2x2 kernel:
 
\begin{pmatrix} 1 & 3 & 6 & 2 \\ 5 & 7 & 8 & 3 \\ 4 & 3 & 2 & 0 \\ 1 & 2 & 3 & 5 \end{pmatrix}
\xrightarrow{\text{Max-Pooling}}
\begin{pmatrix} 7 & 8 \\ 4 & 5 \end{pmatrix}    (19)
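
For illustration (not from the thesis), a short NumPy sketch reproducing the 2x2, stride-2 max pooling of Eq. (19):

    import numpy as np

    def max_pool(x, k=2, stride=2):
        out_h = (x.shape[0] - k) // stride + 1
        out_w = (x.shape[1] - k) // stride + 1
        out = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                window = x[i * stride:i * stride + k, j * stride:j * stride + k]
                out[i, j] = window.max()       # keep only the strongest activation per window
        return out

    x = np.array([[1, 3, 6, 2],
                  [5, 7, 8, 3],
                  [4, 3, 2, 0],
                  [1, 2, 3, 5]], dtype=float)
    print(max_pool(x))                          # [[7. 8.], [4. 5.]], matching Eq. (19)
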

2.3 Modeling Temporal Data


Temporal data refers to data that is ordered and dependent on time (Sutskever and Vinyals, 2014).
It includes any type of information that changes or evolves, such as stock prices, weather measure-
ments, sensor readings, or the sequence of words in a sentence. ”Sequential modelling” is a tech-
nique used to analyse and make predictions based on temporal data. It involves building models that
can capture and learn patterns, trends, and dependencies within the sequential nature of the data.
These models consider the order and relationships between data points in the sequence. A com-
mon approach for sequential modelling is recurrent neural networks (RNNs), long short-term memory
(LSTM) networks, and gated-recurrent units (GRU), which have internal memory that enables them
to maintain information about past data points as they process new ones.

2.3.1 Recurrent Neural Network


Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequen-
tial data by maintaining an internal memory. Unlike traditional feedforward neural networks, RNNs
have connections forming a directed cycle, allowing information to be passed from one sequence
step to the next. RNNs can consider the context of previous inputs when processing the current input,
making them effective in modelling and understanding sequences with varying lengths and complex
temporal relationships (Karpathy, 2015). Figure 6 shows a mathematical illustration of a recurrence
relation.
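
As a minimal sketch of the recurrence in Figure 6 (an illustrative addition; the weight shapes and tanh non-linearity are assumptions rather than the thesis code):

    import numpy as np

    # A single vanilla RNN step: h_t = tanh(x_t W_xh + h_{t-1} W_hh + b)
    def rnn_step(x_t, h_prev, W_xh, W_hh, b):
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(4, 8))   # input-to-hidden weights
    W_hh = rng.normal(scale=0.1, size=(8, 8))   # hidden-to-hidden (recurrent) weights
    b = np.zeros(8)

    h = np.zeros(8)
    sequence = rng.normal(size=(10, 4))          # 10 time steps, 4 features each
    for x_t in sequence:
        h = rnn_step(x_t, h, W_xh, W_hh, b)      # the hidden state carries past context forward
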

Figure 6: Recurrence relation, the update of the hidden state, and the output of an RNN

2.3.2 Long-term Dependencies issues
Recurrent Neural Networks (RNNs) have a shortcoming known as the ”long-term dependencies” issue
(Bengio et al., 1994). It refers to the difficulty RNNs face in capturing and retaining information from
earlier steps in a sequence when processing long sequences. This occurs because RNNs’ internal
memory tends to degrade or lose important information over time due to the vanishing or exploding
gradient problem during training (Hochreiter, 1998).

2.3.3 Long Short-Term Memory


To address the limitation of long-term dependencies, LSTM networks were introduced by Hochre-
iter and Schmidhuber (1997). LSTMs are a type of RNN that incorporate specialised memory cells
and gating mechanisms to handle long-term dependencies better. These memory cells have a more
sophisticated structure that allows them to retain or forget information over multiple time steps se-
lectively. LSTMs achieve this through three main components: the input gate, the forget gate, and
the output gate. These gates regulate the flow of information, determining what information to add to
the memory, what information to forget, and what information to output at each time step. By care-
fully controlling the flow of information, LSTMs can effectively capture and retain relevant long-term
dependencies, making them more capable of modelling and understanding complex sequential data
Raschka and Mirjalili (2019). Figure 7 shows a depiction of an LSTM unit. The LSTM cell has the

Figure 7: Visualisation of an LSTM unit

following gates:
• Forget gate (ft ): which is a neural network composed of the sigmoid function. The forget gate
decides which information is allowed to flow through and which to suppress. Subsequently, ft
is computed as follows:
ft = σ(Wxf xt + Whf ht−1 + bf ) (20)

• Input Gate (it ): which is another neural network composed of the sigmoid function. Responsible
for updating the cell state.
it = σ(Wxi xt + Whi ht−1 + bi ). (21)

• Candidate layer (Ct′ ): which is a neural network composed of the hyperbolic tangent. This layer
is responsible for updating the cell state. The computation is:

Ct ′ = tanh(Wxc xt + Whc ht−1 + bc ). (22)

To compute the cell state at the time [t], the following update rule is used:

Ct = (Ct−1 ⊙ ft ) + (it ⊙ Ct ′ ). (23)

• Output Gate (ot ): which once again is a neural network composed of the sigmoid function.
It decides how the values of the hidden computational units are updated. Computed as
follows:
ot = σ(Wxo xt + Who ht−1 + bo ). (24)

• Hidden State (ht ): which is a vector. The hidden units at the current time step are computed as
follows:
ht = ot ⊙ tanh(Ct ). (25)

• Memory State (Ct )


Inputs to the LSTM cells are represented by the x, also known as the current input. The vector h is
the previous hidden state, and the vector C which is the previous memory state (Raschka and Mirjalili,
2019). The outputs from the cells are h (current hidden state) and C(current memory state).
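
The gate equations (20)-(25) can be expressed as a single NumPy time step, shown below purely for illustration; the weight shapes are assumptions, and the cell-state update combines the forget and input branches by element-wise addition, as in the standard LSTM formulation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, p):
        """One LSTM time step following Eqs. (20)-(25); p is a dict of weights/biases."""
        f_t = sigmoid(x_t @ p["Wxf"] + h_prev @ p["Whf"] + p["bf"])      # forget gate, Eq. (20)
        i_t = sigmoid(x_t @ p["Wxi"] + h_prev @ p["Whi"] + p["bi"])      # input gate, Eq. (21)
        C_cand = np.tanh(x_t @ p["Wxc"] + h_prev @ p["Whc"] + p["bc"])   # candidate layer, Eq. (22)
        C_t = f_t * C_prev + i_t * C_cand                                # cell-state update, Eq. (23)
        o_t = sigmoid(x_t @ p["Wxo"] + h_prev @ p["Who"] + p["bo"])      # output gate, Eq. (24)
        h_t = o_t * np.tanh(C_t)                                         # hidden state, Eq. (25)
        return h_t, C_t
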

2.4 Transformers
To address the limitations of RNNs and LSTM net-
works in processing sequential data, the trans-
former model is introduced by Vaswani et al.
(2017). RNNs and LSTMs effectively capture
dependencies within a sequence but suffer from
sequential computation, making them difficult to
parallelise and train. Additionally, they struggle
with capturing long-range dependencies due to
the vanishing/exploding gradient problem (Bengio
et al., 1994). On the other hand, the Transformer
model introduced a self-attention mechanism that
allows it to capture global dependencies in the
input sequence. Instead of processing the se-
quence step-by-step like RNNs, the Transformer
can process all positions in the sequence simul-
taneously. This parallelisation significantly speeds
up training and inference (Fang et al., 2021). The
self-attention mechanism in Transformers enables
them to assign different weights to different posi-
tions in the input sequence, giving more impor-
tance to relevant parts of the sequence. This
helps Transformers capture long-range dependen-
cies more effectively than RNNs or LSTMs. An-
other advantage of Transformers is their ability
to handle variable-length input sequences with-
out padding or truncation. This contrasts RNNs
and LSTMs, which require fixed-length input se-
quences (Vaswani et al., 2017).
The encoder stack
The original encoder stack had six layers (Vaswani et al., 2017). The output of layer l is the input to
layer l + 1. The encoder stack’s first layer contains an input embedding sub-layer where input features
are mapped to a vector of dimension (dmodel = 512). For this to occur, the input features must be
tokenised. Once tokenised, the input must be embedded (x′ ).

2.4.1 Positional encoding


We add a positional encoding value to the input embedding instead of having additional vectors to
describe the token’s position in the data sequence. Rothman (2022) introduced sine and cosine
functions that can be used to generate different frequencies for the positional vectors. The following
computation is implemented:
 
pv_{pos, 2i} = sin( pos / 10000^{2i / d_model} )    (26)

pv_{pos, 2i+1} = cos( pos / 10000^{2i / d_model} )    (27)

The embedding vector has a fixed dimension of 512, with indices running from i = 0 to i = 511. This results in the sine function being applied to the even-numbered dimensions and the cosine to the odd-numbered dimensions.

Next the above position vectors pv are added to the embedded data (x′ ) to obtain the positional
encoding of the input data x:
pe(x) = (x′ ) + pv(x) (28)
The output of the positional encoding leads to the multi-head attention sub-layer.
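
A NumPy sketch of Eqs. (26)-(28) is given below for illustration (not from the thesis); it builds the full positional-vector matrix and simply adds it to the embedded inputs:

    import numpy as np

    def positional_vectors(seq_len, d_model=512):
        pos = np.arange(seq_len)[:, None]                      # positions 0 .. seq_len-1
        i = np.arange(d_model // 2)[None, :]                   # dimension-pair index
        angles = pos / np.power(10000.0, (2 * i) / d_model)
        pv = np.zeros((seq_len, d_model))
        pv[:, 0::2] = np.sin(angles)                           # sine on even dimensions, Eq. (26)
        pv[:, 1::2] = np.cos(angles)                           # cosine on odd dimensions, Eq. (27)
        return pv

    x_embedded = np.random.randn(100, 512)                     # x': embedded input sequence
    pe = x_embedded + positional_vectors(100)                  # Eq. (28): pe(x) = x' + pv(x)
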

2.4.2 Multi-head attention


The input of the multi-head attention sub-layer of the first layer of the encoder stack is a vector that contains the embedding and the positional encoding of each data point. The dimension of the vector for each data point x_n of the input is d_model = 512 (Vaswani et al., 2017). Each data point is mapped to all the other data points to see how it fits in the sequence. Instead of running the full computation over all d_model = 512 dimensions for each data point, which would be computationally expensive, the d_model = 512 dimensions of each data point are divided into 8 heads of d_n = 64 dimensions each (Rothman, 2022). This allows the 8 heads to be computed in parallel, significantly improving speed, as depicted in Figure 8.

Figure 8: Multi-head attention

The output of each multi-head attention layer is a matrix Z.

MultiHead(output) = Concat(Z_0, Z_1, Z_2, Z_3, Z_4, Z_5, Z_6, Z_7)    (29)

Within the attention mechanism, each head h_m has three matrix representations:
• Query matrix Q, which seeks all the key-value pairs of the other data points matrices, and
dimension dq = 64.
• Key matrix K, trained to provide attention values, and dimension dk = 64.
• Value matrix V, trained to provide attention values, and dimension dv = 64.

This leads us to the Scaled-dot product attention, which is computed as follows:

Attention(Q, K, V) = softmax( QK^T / \sqrt{d_k} ) V    (30)
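
For illustration (not the thesis implementation), a single-head scaled dot-product attention following Eq. (30) in NumPy; Q, K and V are assumed to already be projected to d_k = 64:

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)                # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                        # how well each query matches each key
        weights = softmax(scores, axis=-1)                     # attention weights sum to 1 over keys
        return weights @ V                                     # weighted sum of values, Eq. (30)

    Q = np.random.randn(10, 64); K = np.random.randn(10, 64); V = np.random.randn(10, 64)
    Z = scaled_dot_product_attention(Q, K, V)                  # one head's output Z_m
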

2.4.3 Layer normalisation


Each attention layer and each feedforward sub-layer of the transformer is followed by post-layer nor-
malisation (Post-LN). The normalisation layer contains the previous sub-layer and a residual con-
nection, which improves learning and allows gradient information flowing backwards to skip a layer (Rothman, 2022). It is described as follows:

LayerNorm(x + Sublayer(x))    (31)

The sub-layer can be the Attention Layer or the Feedforward Network layer. Next, we assign the input
of layer normalisation as r = x + Sublayer(x). The layer norm is then calculated as follows:
LayerNorm(r) = γ (r − µ)/σ + β    (32)

where γ is a scaling parameter and β is a bias vector, µ is the mean of r of dimension d:

µ = (1/d) \sum_{n=1}^{d} r_n    (33)

and σ is the standard deviation of r of dimension d:

σ^2 = (1/d) \sum_{n=1}^{d} (r_n − µ)^2    (34)
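
A compact NumPy sketch of Eqs. (32)-(34), added for illustration; a small epsilon is included for numerical stability, which the equations omit:

    import numpy as np

    def layer_norm(r, gamma=1.0, beta=0.0, eps=1e-6):
        mu = r.mean(axis=-1, keepdims=True)                    # Eq. (33)
        sigma = r.std(axis=-1, keepdims=True)                  # Eq. (34)
        return gamma * (r - mu) / (sigma + eps) + beta         # Eq. (32)
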

2.4.4 Feed-forward network


The feedforward network plays a critical role in the transformer architecture by performing non-linear
transformations on the input sequence. Comprising multiple layers of fully connected neural net-
works, each layer includes two linear transformations followed by a non-linear activation function like
ReLU (Rothman, 2022). The feedforward network operates on embeddings derived from the preced-
ing self-attention layer. Notably, it applies these transformations independently to each position in
the sequence, disregarding positional relationships. This parallel computation feature enhances the
efficiency of the transformer model.
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2    (35)
The output of the feedforward network is then fed into the next self-attention layer for further process-
ing in the transformer architecture or passed on to the decoder stack.
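
Eq. (35) amounts to two position-wise linear maps with a ReLU in between; a minimal NumPy sketch with arbitrary dimensions (an illustrative addition, not the thesis code) is:

    import numpy as np

    def feed_forward(x, W1, b1, W2, b2):
        return np.maximum(0.0, x @ W1 + b1) @ W2 + b2          # Eq. (35), applied to every position

    x = np.random.randn(100, 512)                              # sequence of 100 positions
    W1 = np.random.randn(512, 2048); b1 = np.zeros(2048)       # expansion layer
    W2 = np.random.randn(2048, 512); b2 = np.zeros(512)        # projection back to d_model
    out = feed_forward(x, W1, b1, W2, b2)
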

2.5 Evaluation Metrics


In the field of Remaining Useful Life (RUL) prediction for Lithium-ion Batteries (LIBs), various metrics
are employed to evaluate the performance and accuracy of predictive models (Yang et al., 2021).
While different metrics serve different purposes, three commonly used metrics for consistency and
comparison in RUL prediction are Mean Squared Error (MSE), Mean Absolute Percentage Error
(MAPE), and Mean Absolute Error (MAE).

2.5.1 Mean absolute error


The Mean Absolute Error (MAE) takes the true values (ytrue) and predicted values (ypred), converts them to tensors, computes the absolute difference between them, and then returns the mean of the differences (Raschka and Mirjalili, 2019). In our implementation, the result is multiplied by a scaling factor (ρ) to scale the output.
MAE(ŷ, y) = (1/n) \sum_{i=1}^{n} |ŷ_i − y_i|    (36)

2.5.2 Root Mean square error


The Root Mean Squared Error (RMSE) for the remaining-cycles prediction is the square root of the mean of the squared differences between the predicted and true values; by squaring the errors, it penalises large deviations more heavily than MAE (Raschka and Mirjalili, 2019).
RMSE(ŷ, y) = \sqrt{ (1/n) \sum_{i=1}^{n} (ŷ_i − y_i)^2 }    (37)

2.5.3 Mean Absolute Percentage Error
Mean Absolute Percentage Error (MAPE) is a commonly used cost function in forecasting and re-
gression tasks. It measures the average percentage difference between predicted and actual values
(Raschka and Mirjalili, 2019).
MAPE(ŷ, y) = (1/n) \sum_{i=1}^{n} | (y_i − ŷ_i) / y_i |    (38)

MAPE is expressed in percentage terms, making it easy to interpret. A lower MAPE indicates a
better predictive model or forecasting accuracy, representing a smaller average percentage difference
between the predicted and actual values.

Each metric captures different aspects of prediction performance, comprehensively evaluating mod-
els’ accuracy, precision, and relative error magnitudes. The consistency in metric selection aids in
bench-marking and advancing the state-of-the-art RUL prediction for LIBs, enabling researchers to
build upon existing knowledge and techniques.
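
The three metrics of Eqs. (36)-(38) are straightforward to compute; the NumPy sketch below is an illustrative addition rather than the thesis code:

    import numpy as np

    def mae(y_true, y_pred):
        return np.mean(np.abs(y_pred - y_true))                        # Eq. (36)

    def rmse(y_true, y_pred):
        return np.sqrt(np.mean((y_pred - y_true) ** 2))                # Eq. (37)

    def mape(y_true, y_pred):
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0     # Eq. (38), in percent
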

2.6 Li-ion Batteries


Lithium-ion batteries (LIB) have become instrumental in a multitude of energy storage applications,
including electric vehicles (EVs), hybrid electric vehicles (HEVs) and grid storage technology (Hu
et al., 2017). However, their performance inevitably diminishes over time due to the degradation
of electrochemical components, which contributes to capacity and power fade (Mikolajczak et al.,
2012). This phenomenon, known as battery ageing, stems from various coupled ageing mechanisms
and is influenced by factors such as battery chemistry, manufacturing, environmental and operating
conditions.

2.6.1 Understanding Health Estimation techniques


Prognostics and diagnostics are essential aspects of maintaining and ensuring the safe operation of
systems, including LIB. They help to predict and detect the state and health of batteries. Here’s what
each term means:

1. Diagnostics:
• Definition: It refers to the process of identifying and diagnosing present anomalies, faults,
or defects in a system. For lithium-ion batteries, diagnostics might include detecting short
circuits, overcharging, undercharging, temperature anomalies, or other immediate issues
(Wang et al., 2022).
• Objective: To determine the current state or condition of the battery and if it has any
immediate issues that need attention.
• Methods: Typically involves measuring and analysing present data like voltage, current,
temperature, and impedance. Fault detection algorithms or immediate state-of-charge
(SOC) and state-of-health (SOH) estimations can be parts of a diagnostic system, where SOH is defined in Eq. (39); here Ct is the current capacity and C0 is the nominal capacity.

SOH = (Ct / C0) · 100%    (39)

2. Prognostics:

• Definition: This refers to the process of predicting the future state or performance of a
system based on its current and past behavior. LIB prognostics encompass the task of
predicting the remaining usable lifespan (RUL) or the potential point of future failure. This
entails foreseeing the count of charge and discharge cycles the battery can undergo before
encountering issues (Wang et al., 2022).
• Objective: To predict the future performance and life expectancy of the battery so that
timely interventions or replacements can be planned to prevent unexpected failures.

• Methods: Involves modelling battery degradation, using historical and real-time data to
forecast future behavior. Techniques might include data-driven models, physics-based
models, or a combination of both. Machine learning or artificial intelligence can also play a
role in modern prognostics.
The significance of prognostics and diagnostics for lithium-ion batteries cannot be overstated, given the critical role these batteries play in various applications, from electric vehicles to aerospace (Scrosati et al., 2011).

The battery is said to have reached its end of life (EOL) when it can no longer satisfy its intended
application’s energy or power requirements. To guarantee the safety and reliability of batteries, de-
spite ageing, the application of health diagnostic and prognostic tools is indispensable (Li et al., 2019).
State of Health (SOH) estimation techniques have been devised to monitor the ongoing performance
of batteries in operation. The SOH is a measure of a battery’s current ability to store and supply
energy/power compared to its initial state, computed as the ratio of the actual cell capacity/resistance
to its initial value.

For applications where energy availability is crucial, such as in EVs, capacity is typically used to
characterise SOH (Farmann et al., 2015; Lu et al., 2013). Conversely, internal resistance is generally
adopted as a SOH metric in applications where power is paramount, such as in grid storage. Bat-
teries are typically considered at EOL (and consequently, deemed ready for replacement) when their
capacities decline below 80% of the initial values or when their internal resistances have doubled (Farmann
et al., 2015; Lu et al., 2013).

The field of prognostics is concerned with the future degradation of battery energy/power and seeks
to predict when battery performance will become unsatisfactory (Zhang and Lee, 2011). It requires
information on current and historical degradation signals, often sourced from the SOH estimator, to
forecast the future state of the system under certain operating conditions. The devised SOH esti-
mation and health prediction algorithms are then incorporated into the battery management system
(BMS) for real-time monitoring. This affords users critical information about battery health and lifes-
pan, allowing them to monitor cell performance and plan maintenance or replacements ahead of time.

2.6.2 Li-ion battery ageing mechanisms


The effectiveness of health estimation and predictive tools hinges on a comprehensive understanding
and mathematical representation of the ageing processes and their causative factors in lithium-ion
batteries (Wenzl et al., 2005). Many research efforts have been dedicated to establishing cause-
effect relationships regarding performance deterioration. This summary outlines the prevalent ageing
mechanisms and highlights the chief stress factors. A more detailed overview can be gained from
specialised literature (Palacín, 2018; Yu and Manthiram, 2018).

Degradation in Li-ion batteries is predominantly categorised into three modes: loss of lithium inven-
tory (LLI), loss of active material (LAM) in the electrodes, and the increase of cell internal resistance
(Li et al., 2019). LLI arises primarily from consuming Li-ions by irreversible side reactions such as
solid electrolyte interface (SEI) formation and electrolyte decomposition reactions (Barré et al., 2013).
Meanwhile, LAM usually results from structural deterioration of electrodes due to active materials’ vol-
ume changes during cycling, chemical decomposition, and the formation of parasitic phases (Barré
et al., 2013; Birkl et al., 2017).

Understanding both calendar and cyclic ageing—deterioration occurring when the battery is not in
use and under continuous charge/discharge cycling, respectively—is critical for the design and ap-
plication of SOH estimation and RUL prediction tools. Calendar ageing is driven primarily by high
storage SOC and high temperature (Grolleau et al., 2014), while cyclic ageing is influenced by addi-
tional factors such as over-charge/discharge, current rate, and cycling depths (Vetter et al., 2005).

Other stress factors include high temperatures, which accelerate side reactions; low temperatures,
which slow down Li-ion transport; over-charge/discharge that lead to irreversible structural changes,
high currents that result in heat waste; and mechanical stresses from manufacturing and operational
procedures (Finegan et al., 2015; Qian et al., 2016; Cannarella and Arnold, 2013).

The battery management system (BMS) in large-format battery systems is typically tasked with man-
aging these operating conditions to ensure longevity and safe operation. Comprehending the impact
of ageing factors is indispensable for the development of reliable health diagnostic and prognostic
tools (Plett, 2015). Nevertheless, testing batteries under all potential operating conditions is practi-
cally unfeasible. Therefore, it is essential to focus on the stress factors that significantly affect battery
ageing in a specific application (Wenzl et al., 2005).

Figure 9: Factors affecting battery age during cycling and their associated degradation modes (Birkl
et al., 2017)

2.7 Summary
This chapter delved into various topics, such as Artificial Neural Networks, Machine Learning method-
ologies like Overfitting, Underfitting, Regularisation, and Activation Functions. It also tackles the pro-
cess of Optimisation using Gradient Descent, the principle of Backpropagation, the structure and use
of Convolutional Neural Networks, and the handling of Temporal Data with a focus on Long-term De-
pendencies. The revolutionary architecture of Transformers and the various Metrics used to assess
model performance are also explored.

Furthermore, the chapter provided a detailed discussion of the effectiveness of Lithium-ion (Li-ion)
batteries. It thoroughly examines ageing mechanisms in Li-ion batteries, highlighting the crucial
role of State of Health (SOH) estimation and prognostics. It sheds light on the influence of stress
factors in dictating the performance of these batteries.

Moving forward, we will explore the Literature Review, ranging from conventional methodologies to
the most advanced and contemporary approaches used for diagnosing Lithium-ion Batteries (LIB).

3 Literature Review
A core part of RUL prediction is modelling the battery, which is vital for maintaining the safe and op-
timal operation of the battery pack. Current battery management systems (BMS) combine various estimation techniques to determine the SOC and power limits, equalise the electrochemical cells, and predict the battery's future state (RUL) (Plett, 2015). The most common battery model-based estimation techniques for Li-ion batteries can be separated into two meta-categories: physics-based models (PBM) and data-driven models (DDM).

Figure 10: Different approaches to battery RUL prediction

3.1 Physics-based Models


Physics-based Models (PBM) play a crucial role in estimating the remaining useful life of lithium-ion
batteries. These models leverage the fundamental principles of electrochemistry and battery physics
to simulate the behaviour of the battery over time (Yang et al., 2021). By considering factors such as
ion diffusion, electrode reactions, discharge capacity, cycles, capacity fade, and thermal effects, these
models can provide valuable insights into the degradation mechanisms and predict the remaining
useful life of the battery. By integrating experimental data and real-time measurements, physics-
based models enhance their accuracy and enable proactive maintenance strategies, ensuring optimal
battery utilisation while minimising the risk of unexpected failures.

3.1.1 Electrochemical Modelling


Some PBMs use partial differential equations (PDEs) to govern electrochemical models of LIBs in EVs, as demonstrated by Han et al. (2015), who highlighted the need to simplify these models for improved simulation and computational efficiency in EVs. Currently, the complex nature of the electrochemical model poses challenges for online simulation in EVs; it is therefore essential to develop simplified models that can be evaluated instantaneously under various operating conditions. The approach proposed by Han et al. (2015) achieves this using an isothermal pseudo-two-dimensional (P2D) model. This model considers the spatiotemporal dynamics of lithium-ion concentration, the electrode potential in each phase, and the Butler-Volmer kinetics, which describe the electrochemical reactions at the electrode-electrolyte interface (Dickinson and Wain, 2020). The P2D model is composed of several PDEs (see Appendix E 8.2) that describe solid-phase diffusion within the particles (Ramadesigan et al., 2010) and the electrolyte potential across the three regions of a LIB: the positive electrode, the negative electrode, and the separator (a permeable membrane with openings large enough to let Li+ pass through unimpeded, but small enough that the positive and negative electrodes do not touch, which would short-circuit the cell) (Yang et al., 2021).

The P2D model (Han et al., 2015), despite its accuracy, suffers from high computational costs, making
it less practical for EV applications. To address this issue, Lin and Tang (2016) propose employing
techniques that reduce the number of PDEs to be solved simultaneously, enabling faster computa-
tions with limited resources. However, it is essential to note that their experimentation was conducted
under isothermal conditions, which may limit the model’s applicability in real-world scenarios where
temperature fluctuations occur due to variations in internal resistance (Schweiger et al., 2010), side
reactions, short-circuiting, heat dissipation, high currents, and fast charging (Lyu et al., 2020).

3.1.2 Empirical Models


The Empirical Models (EM) are analytical models that utilise empirical data to establish the relation-
ship between terminal voltage and SOC through mathematical functions. These models capture the
nonlinear characteristics of a battery using reduced-order polynomials or mathematical expressions.
Some notable examples include the Shepherd model (Tremblay et al., 2007), the Nernst Model (Zhao
et al., 2014), and the recently introduced Modified Nernst Hysteresis Model (Meng et al., 2018),
which has demonstrated superior accuracy and particularly excels in scenarios involving continuous
discharging currents. However, similar to previous EM approaches, these methods suffer from draw-
backs such as high complexity, lengthy recursion times, and sub-optimal dynamic performance, since it is
challenging to establish accurate mathematical models due to the complex nature of electrochemical
systems (Song et al., 2023).

3.1.3 Equivalent Circuit Models


The equivalent circuit models are based on an electrical circuit analogy, where the battery is rep-
resented by idealised components such as resistors, capacitors, and voltage sources. Equivalent
circuit models aim to simplify the intricate electrochemical processes occurring within the battery by
approximating them with a set of lumped parameters (Plett, 2015). These models primarily comprise
a voltage source, an internal resistor, and resistance-capacitance (RC) elements, which establish a
relationship between the SOH and the terminal voltage (Tian et al., 2014; Ouyang et al., 2014). The resultant model often performs well on small datasets; however, it takes into account only a small number of features, which means its generalizability is limited.

3.1.4 Electrochemical Impedance Models


The electrochemical impedance model is a mathematical representation used to describe the be-
haviour of LIBs. It focuses on the battery’s impedance response, which measures its opposition to
the flow of alternating current (AC) signals (Plett, 2015). In this model, the battery is considered an
electrochemical system composed of various components, such as electrodes, electrolytes, and in-
terfaces. These components contribute to the overall impedance of the battery (Wang et al., 2021).
The impedance is typically represented as a complex function that incorporates both the resistance
(real component) and reactance (imaginary component) of the battery. The resistance represents
the ohmic losses in the system, while the reactance accounts for processes such as diffusion and
charge transfer. By analysing the impedance response at different frequencies, the electrochemi-
cal impedance model provides valuable insights into the internal dynamics and characteristics of the
battery. This information can be used to evaluate the battery’s state of health and performance and
diagnose any underlying issues. Existing approaches include the Randles model, the Impedance
model with Warburg component and constant phase environment (CPE) (Yahia et al., 2016).

Electrochemical models used to describe the behaviour of LIBs and other electrochemical systems
face several challenges. One of the main difficulties is the complexity inherent in these models (Tolouei
et al., 2020). They need to account for intricate electrochemical processes within the battery, includ-
ing chemical reactions, diffusion, and interface interaction. Additionally, the parametrisation of these
models is a complex task as they rely on numerous parameters that are often challenging to deter-
mine experimentally and subject to significant uncertainty. Furthermore, electrochemical processes
in batteries exhibit non-linear behaviour due to the complex interplay of factors such as electrode
kinetics, ion transport, and solid-state diffusion.

3.1.5 Filter-Based Models
The Kalman filter is the most commonly used filtering algorithm for State-of-Health (SoH) estimation
(Plett, 2006). It follows a process of “prediction-measurement and correction” to estimate the system
state. In the case of a lithium-ion battery (LIB), which is a nonlinear system, the extended Kalman
filter (EKF) is often employed (Hu et al., 2012). Another approach is the dual extended Kalman filter
(DEKF), which utilises parameters from two independent Kalman filters to predict the State-of-Charge
(SoC) (Dai et al., 2012). These algorithms offer reduced computational complexity and are favourable
for online SoC prediction. However, the degradation processes in lithium-ion batteries exhibit complex
nonlinear behaviours that cannot be accurately modelled using linear equations.

Consequently, the Kalman filter fails to capture the intricate degradation patterns and non-linearities
(Spagnol et al., 2011). Despite the EKF’s attempt to linearise the non-linear dynamics using a first-
order Taylor expansion, the process introduces approximation errors, and the filter’s performance
heavily relies on the accuracy of the linearisation. The EKF may provide suboptimal estimates when
the non-linearities are significant. To better understand the Kalman filter, view Appendix C 8.3.

The Physics-Informed Smooth Particle Filter (SPF) introduced by El-Dalahmeh et al. (2023) is a com-
putational approach for predicting the remaining useful life (RUL) of lithium-ion batteries, incorporating
physics-based modelling and estimating degradation parameters directly from voltage and capacity
data using the Single Particle Model (SPM) (Lambert, 2018). The proposed framework achieves a
best-case RUL prediction of 2402 cycles at the prediction starting point of 2000 cycles and demon-
strated significantly lower relative errors than traditional approaches.

However, it may be limited in handling complex non-linear relationships inherent in battery behaviour,
which data-driven methods like machine learning and deep learning can better capture, offering po-
tential improvements in RUL prediction when faced with intricate battery characteristics. To better
understand the Particle filter, view Appendix C 8.3.

3.1.6 Limitation of Physics-Based models


Physics-based models face several challenges in predicting the RUL of lithium-ion batteries. Performing well only in narrow domains, they require a detailed understanding of battery electrochemical processes, which is time-consuming and computationally intensive to obtain. Simplified assumptions limit their ability to capture
real-world complexities, resulting in lower RUL prediction accuracy. Parameterisation is challenging
due to manufacturing variations and difficulty in obtaining precise model parameters. The models
exhibit limited adaptability to dynamic environments and high computational complexity, hampering
real-time applicability. Calibration and validation procedures demand extensive data and resources.
Sensitivity to uncertainties affects accuracy, as minor errors propagate in measurements and as-
sumptions. Consequently, data-driven methods like machine learning have gained prominence for
their ability to learn complex nonlinear relationships from data, providing a promising alternative for
RUL prediction in lithium-ion batteries.

3.2 Data-driven approaches


Data-driven approaches have gained superiority over previous models for Remaining Useful Life
(RUL) prediction, primarily due to the recent surge in data availability and the popularisation of Elec-
tric Vehicle (EV) batteries (Li et al., 2019). Unlike traditional models that rely on analytical equations
or empirical assumptions, data-driven models leverage machine learning algorithms to learn patterns
and relationships directly from the available data(Yang et al., 2021).

One key advantage of data-driven models is their ability to capture complex and nonlinear relation-
ships that may exist in the data, which can be challenging for traditional models to capture accurately.
Data-driven models can automatically extract relevant features from the input data and adapt their
internal parameters to optimise the prediction performance (comparison of physics-based models
vs data-driven models can be seen in Table 1). Furthermore, data-driven models are more flexible
and adaptable compared to previous models. They can handle a wide range of input data formats,
including time-series data, sensor measurements, and operational parameters, allowing for a more
comprehensive analysis of battery behaviour. This flexibility enables the models to capture subtle
variations and interactions between factors affecting battery degradation and RUL.

Data-driven methods can be further categorised into two major modelling families: statistical machine learning methods (Section 3.2.1) and deep learning methods (Section 3.2.2).

Table 1: Comparison of Physics-Based Models (PBM) vs Data-Driven Models (DDM)

Physics-Based Models (Differential Analysis)

Strengths:
• Easy integration into BMS
• Abundance of techniques and literature available
• Low computational demand and complexity
• High interpretability

Weaknesses:
• Extremely difficult to model intricate dynamical nature
• Requires regulated charging/discharging process
• Initial condition variations affect accuracy
• High sensitivity to temperature changes
• Requires noise filtering

Data-Driven Models (Machine Learning)

Strengths:
• High estimation precision
• No need to comprehend the complex system
• Applicable under dynamic operating conditions
• Eliminates the need for physics-based models
• Removes the requirement for differential analysis

Weaknesses:
• High computational demand
• Limited interpretability
• Estimation accuracy strongly relies on data quality

3.2.1 Statistical Machine Learning


With the progress of data mining and pattern recognition techniques, there have been significant ad-
vancements in prognostics and diagnostics for RUL prediction. In the early stages, classical machine
learning algorithms were employed for RUL prediction, as demonstrated by Anton et al. (2013) who
utilised the Support Vector Machine (SVM) approach to estimate the SOC for high-capacity Lithium-
ion Batteries (LIB). Their approach achieved a maximum Absolute Error (AE) of 0.71%. These results
indicate a reasonable level of accuracy. Furthermore, the algorithm exhibited robustness and efficient
execution time, rendering it suitable for online training scenarios. Consequently, the field of prog-
nostics and diagnostics has witnessed significant improvements recently, mainly through data mining
techniques for recognising nonlinear relationships between RUL outputs and input parameters such
as voltage, current, and temperature.

3.2.1.1 Gaussian Process Regression

Additional statistical machine learning approaches used in diagnostics for SoH and cell-balancing performance management include Gaussian process regression (GPR) and auto-regressive models, introduced by Deng et al. (2020). A Gaussian process extends the multivariate Gaussian distribution to infinite dimensions and can be interpreted as a distribution over continuous functions; this is beneficial as it allows non-parametric modelling and probabilistic predictions. Inconsistencies between the cells of a battery pack make it difficult to build an explicit model that accurately captures the pack's dynamics (Li et al., 2019). To take advantage of non-parametric modelling and probabilistic prediction (Wu et al., 2018), a data-driven method based on GPR is proposed to estimate battery pack SOC accurately. Deng et al. (2020) note that the regular GPR model shows larger estimation errors under extreme conditions than the auto-regressive model. The proposed GPR-based method accurately estimates battery pack state of charge (SOC), achieving SOC estimation errors lower than 3.4% for different dynamic cycles and ageing states. Introducing an auto-regressive GPR model further
improves the accuracy, with SOC estimation errors lower than 3% and narrower confidence intervals.
The method shows resilience to extreme conditions and retains good performance, achieving RM-
SEs of SOC estimation as low as 3.21% and 3.87% for different cycling tests. However, there are
limitations to this approach. GPR models can be computationally expensive and may not scale well
for real-time or resource-constrained applications (Gijsberts and Metta, 2013). The accuracy of the
estimation heavily relies on the selection and extraction of input features that have high correlations
with SOC.

3.2.1.2 Bayesian Methods

A Bayesian approach introduced by Hu et al. (2016) focuses on machine-learning-based battery SoH estimation in EVs. It utilises the sample entropy of short voltage sequences as a signature of capacity loss and employs an advanced sparse Bayesian predictive modelling (SBPM) methodology to establish a SOH monitor, which is compared with a polynomial model. The proposed approach integrates temperature effects for a more comprehensive SOH estimator and combines SBPM with bootstrap sampling to forecast the remaining useful life. Experimental data from multiple lithium-ion battery cells at different temperatures are used for model construction, verification, and comparison, highlighting the significance of a multi-cell setting in battery health prognosis. Using sample entropy and SBPM demonstrates accuracy and robustness, with an average error of less than 1.2% at each temperature. The study has shortcomings, most notably that the evaluation neglects the integration of diverse operating conditions such as load profiles and cycling protocols.

3.2.1.3 Support Vector Machines and Support Vector Regression

Other notable applications of classical DDM include Patil et al. (2015), which used support vector
regression (SVR) to accurately predict the remaining useful life (RUL) by extracting features from
voltage and temperature profiles. This method combines SVM and SVR-based machine learning
techniques, employing classification and regression models to estimate RUL based on critical fea-
tures from the profiles. The approach enables simultaneous and accurate RUL prediction for multiple
batteries, making it suitable for real-time onboard RUL estimation in electric vehicle battery packs,
achieving an RMSE of 0.357%. However, a limitation of this approach is its dependency on suffi-
cient data availability in the testing region, requiring further investigation regarding its accuracy when
applied to batteries cycled with continuously changing load profiles. Moreover, the exploration of
multi-layer frameworks and active online learning algorithms could enhance the prediction accuracy
of the model further.

3.2.2 Deep Learning Approaches


Deep learning methods have gained traction for their capacity to learn intricate patterns from raw
battery data, accommodating diverse datasets. By incorporating multi-layer neural networks, these
techniques enhance the construction of robust models, improving battery health prediction and RUL
estimation.

BMS usually measure variables like voltage (V), current (I), and temperature (T). These variables are sampled at high rates during repeated charge and discharge cycles, resulting in very long time-series data. Hidden within this aggregated data are factors that contribute to the degradation of the battery. The resulting time series is often presented to the NN through a moving window, meaning the NN produces an estimate at time t based on the previous history. Due to the sequential nature of the data, a popular NN architecture is some variation of the Recurrent Neural Network (RNN), used to find the relation between RUL and the time series (Ho et al., 2002). However, vanilla RNNs struggle to capture long-term dependencies because of the vanishing gradient phenomenon; hence, more sophisticated architectures such as Long Short-Term Memory (LSTM) cells (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (GRUs) are used. Other NN architectures that have been explored include the convolutional neural network (CNN) (LeCun et al., 2015), which extracts salient features from sparse time-series data.

Temporal models:

Lipu et al. (2018) introduced a nonlinear autoregressive with exogenous input (NARX)-based neural network (NARXNN) algorithm for accurate and robust SOC estimation of lithium-ion batteries, an approach that is effective for controlling dynamic systems and predicting time series. The authors propose a lightning search algorithm (LSA) to find the best values of input delays, feedback delays, and hidden-layer neurons for the NARXNN model. They evaluate the performance of the proposed model under different discharge current profiles and temperatures and compare it to other neural network methods. The results show that the NARXNN-LSA model achieves higher accuracy with less computational time than other SOC algorithms, achieving an RMSE of 0.52%-1.26% and an MAE of 0.34%-0.76%. However, a few knowledge gaps remain. First, the paper does not consider the impact of ageing on the performance of the NARXNN-LSA model: as lithium-ion batteries age, their capacity and internal resistance gradually degrade, impacting the accuracy of SOC estimation methods.

To analyse the non-stationary signals, in the time-frequency domain, of popular Li-ion datasets such as NASA and CALCE used for LIB RUL, Bashir et al. (2022) introduced a comprehensive approach applying various decomposition techniques, including the Discrete Wavelet Transform (DWT), Empirical Mode Decomposition (EMD), and Variational Mode Decomposition (VMD). These decomposition techniques are used to analyse the capacity trend of LIBs, extract meaningful features, and identify underlying patterns or trends in the data. This analysis was combined with the NARX model, with the resulting VMD-NARX model predicting the future capacity trajectory with 2.385% RMSE and 1.6% MAE. However, incorporating additional relevant features derived from the battery's operational data, in addition to the signal decomposition techniques, could enhance the prediction accuracy. Such features could include temperature, charging/discharging rates, voltage profiles, and other battery-specific characteristics.

Chemali et al. (2018) introduce a novel approach for RUL estimation in Li-ion batteries by leveraging a recurrent neural network (RNN) architecture with long short-term memory (LSTM). Unlike tra-
ditional methods that rely on battery models, filters, or inference systems, the proposed LSTM-RNN
directly encodes dependencies over time to estimate RUL accurately. Furthermore, the machine-
learning capabilities of the LSTM-RNN enable generalisation to different conditions, as demonstrated
by training the model on datasets recorded at various ambient temperatures. The LSTM-RNN achieves
impressive estimation performance with low MAE values of 0.774%-2.088% and RMSE below 3%, demon-
strating its potential as a powerful tool for Li-ion battery RUL estimation. One of the limitations
(Chemali et al., 2018) of the LSTM-RNN approach is its reliance on sequential data and temporal
dependencies. This makes it susceptible to errors when dealing with specific battery characteristics
that may not exhibit strong temporal patterns. By incorporating a CNN into the architecture, spatial in-
formation from battery measurements, such as voltage and current, can be captured more effectively.
CNNs excel at extracting local spatial features, making them suitable for analysing multidimensional
data like battery measurements. This can enhance the model’s ability to capture spatial patterns rel-
evant to RUL estimation, potentially improving its accuracy.

Convolutional models:

The incorporation of CNNs was introduced by Shen et al. (2019), whose work focused on the online estimation of lithium-ion (Li-ion) battery capacity using a deep convolutional neural network (DCNN). The proposed DCNN method utilises voltage, current, and charge capacity measurements during a partial charge cycle to estimate cell-level capacity. The DCNN model leverages local connectivity and shared weights to capture the complex data-to-health relationship of battery cells effectively. The study validates the method using long-term cycling data from implantable Li-ion cells. Unlike traditional machine learning methods such as shallow neural networks, SVM, and SVR, the DCNN approach demonstrates higher accuracy and robustness in online capacity estimation, achieving an RMSE of 1.477% and a MaxE of 9.479%.

While the proposed Deep Convolutional Neural Network (DCNN) method (Shen et al., 2019) shows
considerable promise for estimating battery capacity, the study also acknowledges several limita-
tions. One major limitation is the fixed-size input matrix, coupled with a limited understanding of
the method’s inner workings, which restricts its applicability in capacity estimation and necessitates
further exploration. Additionally, the study emphasises the need to consider variable ambient temper-
ature conditions, recognising that temperature fluctuations can significantly influence battery capacity.
To address these shortcomings, additional enhancements may be pursued. Future research may use
the DCNN method with various charging protocols and dynamic cell operating schedules. Further-

more, integrating the DCNN model with a Recurrent Neural Network (RNN) could provide a way to
enhance scalability and identify temporal dependencies within the data (How et al., 2020). Such
prospective improvements would pave the way for a more thorough assessment of deep learning’s
effectiveness in the real-time capacity estimation of Li-ion batteries.

Zhou et al. (2020) proposed a temporal convolutional network (TCN) based SOH monitoring and
RUL prediction model for LIB. The commonly used SOH prediction and RUL estimation models are
deemed inefficient and incapable of capturing local capacity regeneration in batteries. The TCN model
incorporates causal convolution and dilated convolution techniques to improve its ability to capture lo-
cal regeneration, thereby enhancing the overall prediction accuracy. Residual connection and dropout
technologies expedite training speed and prevent overfitting in deep networks. Empirical mode de-
composition (EMD) is used to denoise offline data in RUL prediction to minimise errors caused by
local regeneration. However, some potential limitations/issues could include the specific applicability
of the TCN model to other battery chemistries or types, the generalizability of the findings beyond
the tested datasets, the computational and data requirements for implementing the model, and the
potential challenges of real-time implementation in practical applications.

The paper introduced by Ren et al. (2020) presents a data-driven approach, the Auto-CNN-LSTM
prediction model, for accurately estimating the remaining useful life (RUL) of lithium-ion batteries
(LIBs). The model leverages an improved convolutional neural network (CNN) and long short-term
memory (LSTM) to extract deep information from finite data. The approach involves a three-step
process: feature extraction, expansion, and RUL prediction. Firstly, an autoencoder is employed to
augment the original feature vectors, enhancing the CNN and LSTM training effectiveness. The ex-
tracted features are combined, incorporating time series and depth information. An output filter is
applied to smooth and stabilise the output of a multi-layer deep neural network (DNN) that predicts
the RUL. Experimental results on a real-world dataset demonstrate the effectiveness of the proposed
method, with a root mean square error (RMSE) of 4.8% and an accuracy rate of 94.97%.

However, several knowledge gaps and limitations exist in this Ren et al. (2020) study. One major
limitation is the scarcity of degradation data, which restricts the prediction accuracy of data-driven
methods. The authors acknowledge the need for a larger and more diverse dataset to enhance the
prediction model’s performance. Furthermore, the instability of model-driven methods caused by ex-
ternal factors, such as temperature, poses a challenge. Future research could focus on developing
robust techniques to handle the influence of these external factors. Additionally, although using an
auto-encoder reduces data noise, it cannot be eliminated. Exploring alternative methods to mitigate
the impact of data noise or employing noise-robust algorithms could improve the reliability of the pre-
dictions. Lastly, while the proposed model is compared with alternative data-driven models (DNN,
SVM), the paper does not consider other potential competing models, warranting further evaluation
and comparison.

Hybrid models:

In Fan et al. (2020), a hybrid network with GRU and CNN for SoH estimation was proposed. The model accepts raw data as input, including V, I, and T, and the GRU and CNN branches are concatenated in the final layer. The method achieved a maximum estimation error of 4.3% on the NASA dataset. Additional approaches on the same NASA dataset can be seen in Venugopal (2019), where the authors proposed the Independent RNN (IndRNN), which takes several additional features as input, including capacity at different samples, time elapsed, and the duration of current loads. This approach yields better results, but its real-world applicability is somewhat ambiguous, as it takes capacity as an input parameter; ideally, the capacity estimated in the previous state would be fed back within the RNN.

The paper by Fei et al. (2023) focuses on predicting the remaining useful life (RUL) of lithium-ion batteries
using a limited amount of data. Most existing data-driven studies in this field use an enormous scope
of cyclic data over the entire battery life. However, such data may not always be available in real-world
applications. The paper aims to study battery RUL prediction from a new angle by predicting RUL us-
ing data collected from a limited number of incomplete cycles, precisely ten cycles, at any ageing
stage of the battery. The authors propose a deep learning framework called the attention-assisted
temporal convolutional memory-augmented network (ATCMN) to achieve accurate and rapid battery
RUL prediction under this challenging problem. The ATCMN framework integrates battery parame-
ters such as time, capacity, and temperature dimensions obtained from the partial discharge process

into a three-dimensional tensor input structure. The framework includes an attention module, tem-
poral convolution module, memory-augmented module, and prediction module to process the input
data, learn latent spatial-temporal feature representations, enhance feature representation through
historical information, and derive nonlinear mappings to predict battery RUL. The method achieved a MAPE of 12.4%-32.8%. However, the limitation of the proposed approach is the restricted
availability of data, notably incomplete cycles. The authors acknowledge that real-world applications
may not always provide sufficient data for accurate predictions. Further research could explore meth-
ods to overcome this limitation, such as incorporating additional sources of information or developing
techniques for data augmentation.

3.2.3 Knowledge Gap


The literature on predicting the remaining useful life (RUL) of lithium-ion batteries (LIBs) reveals sev-
eral significant gaps.

• Robustness and Accuracy: For data-driven models to be implemented on the Battery Management Systems (BMS) of end-user devices, they must be robust, reliable, and accurate (Barillas et al., 2015; Hashemi et al., 2021).

• Generalizability: Many current implementations select particular lithium-ion battery (LIB) cells based on specific charging protocols. While beneficial for those specific cells, this approach limits the broader applicability of the models and disregards the potential of utilising the complete dataset. Additionally, the presence of varying sampling rates for different features further compounds the challenge, making it intricate to train models effectively, as seen in Table 2.

• Deep Learning Approaches: Methods like convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hybrid models show promise in capturing complex patterns and accommodating diverse datasets but still face challenges (Li et al., 2019). There is a need to explore and benchmark other temporal networks, such as the Neural Turing Machine and the Differentiable Neural Computer, and to investigate the Transformer model.
• Dataset and Input features: Existing deep learning studies often underutilize datasets due to
varying feature sampling rates, risking the loss of essential information.
• Additional Considerations: These include the impact of ageing on prediction accuracy, analy-
sis of non-stationary signals, incorporation of relevant features from operational data, limitations
in capturing spatial information, variable ambient temperature conditions, applicability across
different battery chemistries, and handling limited or incomplete data.

In the following section, we will attempt to address some of these limitations by creating end-to-end models that utilise the entire dataset, including features with disparate sampling rates. The goal is to improve performance metrics for RUL prediction in LIBs and move one step closer to having these models implemented in BMS for real-time use.

3.3 Summary
This literature review discusses two approaches for predicting the Remaining Useful Life (RUL) of
lithium-ion batteries: Physics-based Models (PBM) and Data-driven Models (DDM). PBMs, grounded
in electrochemistry and battery physics, offer insights into battery behaviour but struggle with high
complexity, detailed requirements, and uncertainty sensitivity. DDMs use machine learning algorithms
to learn from data, capturing complex, nonlinear relationships and are adaptable to various data
formats. Within DDMs, two categories exist: traditional statistical machine learning methods and
more recent deep learning methods, which learn complex patterns from raw data. Despite advances,
there remain gaps in achieving robust predictions, suggesting future research in areas like the impact
of ageing, signal analysis, spatial information capture, and applicability to different battery chemistries.

Table 2: Summary of DL-based RUL estimates.

• NARXNN-LSA (Lipu et al., 2018). Input: V, I, T. Output: SOC. Dataset: FUDS, US06. Performance: MAE 0.34%-0.76%, RMSE 0.55%-0.89%. Limitations: random or experience-based assignment of parameters; needs further experimental verification.

• VMD-NARX (Bashir et al., 2022). Input: historical SoH, sampling time (discharge). Output: SOC. Dataset: CALCE-4CC. Performance: RMSE 2.385%, MAE 1.6%. Limitations: would benefit from a wider array of features; not an end-to-end model.

• LSTM-RNN (Chemali et al., 2018). Input: historical SoH, V, I, T, sampling time (discharge). Output: RUL+SOC. Dataset: NASA-4CC. Performance: MAE 1.10% (25 °C), MAE 2.17% (−20 °C). Limitations: requires robust SOH estimation; insufficient dataset variability; longer warm-up start after 50% of battery life.

• DCNN (Shen et al., 2019). Input: V, I, T, sampling time (discharge). Output: RUL. Dataset: SNL, NASA. Performance: RMSE 0.375%, MaxE 6.549%. Limitations: large execution time; larger complexity.

• TCNN (Zhou et al., 2020). Input: historical SoH. Output: SOH+RUL. Dataset: NASA-3CC, CALCE-2CC. Performance: RMSE 0.048. Limitations: requires robust SOH estimation; long initial phase (starts at cycle 30).

• Autoencoder-DNN (Ren et al., 2018). Input: historical SoH, V, I, T, sampling time (charge+discharge). Output: SOH+RUL. Dataset: NASA-3CC. Performance: RMSE 13.2%. Limitations: requires robust SOH estimation; SOH is noise-prone; insufficient performance.

• Auto-CNN-LSTM (Ren et al., 2020). Input: V, I, T, sampling time (charge+discharge). Output: RUL (cycle count). Dataset: NASA-6CC. Performance: RMSE 5.03%. Limitations: insufficient dataset variability.

• ATCMN (Fei et al., 2023). Input: discharging time (t), V, capacity (Q). Output: RUL (cycle count). Dataset: CALCE-4CC, TRI. Performance: MAPE 32.8%, MAE 74. Limitations: limited data availability (incomplete cycles); generalizability; interpretability.

Figure 11: Schematic diagram of multi-step charging and constant-current discharging (Xu et al.,
2023).

4 Methodology
This chapter describes the dataset used in the project, which consists of 124 commercial lithium-
ion batteries cycled until failure under specific conditions. The dataset includes discharge capacity,
temperature, internal resistance, and discharge time. To handle the dataset’s heterogeneity, a hybrid
approach using Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Dense
Neural Network (DNN) was employed, referred to as CLDNN. Another model called Temporal Trans-
former was also used, combining Transformer and LSTM (Several other models were also consid-
ered). Hyper-parameter tuning was performed using Bayesian Optimization to optimise the models’
performance. The goal was to accurately predict the remaining useful life (RUL), in cycles, of lithium-ion
batteries, considering temporal dynamics and feature variations across battery batches. The pre-
processing steps involved outlier removal, smoothing, and interpolation to enhance the data quality
and improve the model’s accuracy. Using these hybrid architectures and hyperparameter optimisation
allows for better RUL predictions and efficient handling of heterogeneous data.

4.1 Dataset Description


This project utilised a dataset of 124 commercial lithium-ion batteries that were cycled until failure
under fast-charging conditions in a convection temperature chamber set to 30°C. These batteries
were of the lithium iron phosphate (LFP)/graphite type (The nominal capacity of these cells is 1.1 Ah,
and they have a nominal voltage of 3.3 V). The dataset was published by Toyota Research Institution
(TRI). As depicted in Figure 11, the cells were initially charged with a current of C1 until they reached
a state-of-charge (SOC) of S1. Following that, they were charged with a current of C2 until the SOC
reached S2, a value consistent at 80% across all cells. Afterwards, the cells were charged from an
80% to 100% SOC using a 1 C-rate constant current-constant voltage (CC-CV) charging policy up to
the cut-off voltage of 3.6 V (Xu et al., 2023). The cells were then discharged at a constant current of
4 C-rate down to 2.0 V. All related signals, such as voltage, capacity, and current, were consistently
measured and recorded throughout the cycling process, whether within a single cycle or the entire
life cycle. The battery’s lifetime is determined by the cycle number at which the capacity falls to 80%
of its original value—for the 124 cell samples tested, lifetimes varied between 1350 and 2300 cycles.
For further information on charging profiles in LIB, refer to Appendix E 8.8.

4.1.1 Stratified Random Sampling


For dataset segmentation, stratified random sampling was the method of choice (Aoyama, 1954).
This approach randomly and proportionally selected cells from different battery batches, creating a
representative sample. To achieve this, the data had to be categorised based on specific batches or
production iterations of the batteries. This was crucial because there were inherent variations in quality control protocols across different batches. Ensuring a proportional representation of each batch's distinct characteristics in both the training and test datasets was paramount (Raschka and Mirjalili,
2019). By adopting this stratified approach, the risk of models overfitting to unique attributes of a
particular batch, which might not apply to others, was reduced. By including samples from all batches,
the predictive models aimed at determining the remaining useful life of batteries became more robust.
They were better equipped to capture batch-specific anomalies or trends, ultimately enhancing their
reliability and applicability across a wide range of Lithium-ion battery batches. Eighty per cent of the
data set was used for training, while the remaining 20 per cent was used for validation.
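A hedged sketch of how such a batch-stratified 80/20 split could be implemented in Python is shown below; the cell identifiers and batch labels are illustrative assumptions rather than the actual TRI naming scheme.

# Sketch of a batch-stratified train/validation split (illustrative cell IDs).
from sklearn.model_selection import train_test_split

# Three batches of roughly 48 cells each (assumed labels, not the real IDs)
cell_ids = [f"b{batch}c{i}" for batch in (1, 2, 3) for i in range(48)]
batch_labels = [cid[1] for cid in cell_ids]  # stratification label = batch number

train_cells, val_cells = train_test_split(
    cell_ids,
    test_size=0.2,           # 80% training, 20% validation
    stratify=batch_labels,   # keep each batch proportionally represented
    random_state=42,
)
print(len(train_cells), len(val_cells))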

4.2 Feature Selection


The primary objective of our study is to optimise the prediction accuracy of RUL for LIBs. The dataset
was divided into three batches, each consisting of approximately 48 cells, to facilitate the analysis. It
is worth noting that each batch exhibited a few irregularities, which are removed in the preprocessing
stage (Severson et al., 2019).

Several vital features enhance the models' accuracy in the context of RUL prediction for LIBs. First, the Linearly Interpolated Discharge Capacity (Qdlin) input provides insights into the deviations of various battery parameters; a LIB exhibits a voltage curve during discharge in which the voltage gradually decreases as the battery capacity is consumed. Establishing a voltage-capacity relationship makes it possible to estimate the battery's remaining capacity at a given time.

Second, the input Linearly Interpolated temperature (Tdlin) estimates the battery’s temperature at
a specific point based on measured temperature values at known time intervals. This focuses on the
temporal aspect of battery operation, allowing the analysis of temporal deviations within the chosen
window. By considering trends, cyclic patterns, and temporal dependencies, the model can improve
RUL predictions by accounting for the evolving dynamics of the battery system.

(Note: The above two features use interpolation because the discharge curve and temperature are
typically not measured at every single point but at specific points).

Other salient features include internal resistance (IR), a crucial parameter affecting battery perfor-
mance and degradation. Monitoring changes in internal resistance within the select window provides
valuable information about degradation and capacity loss, which can impact the remaining cycles.
The discharge data reflects the battery's performance, degradation, and wear-out patterns; analysing discharge behaviour captures essential factors, such as capacity degradation and voltage decay, that influence the remaining cycles. The quantity deviation of discharge (QD) input provides further
insights into the deviations of various quantities or variables, enabling the detection of abnormal or
degraded behaviour in different battery parameters. Lastly, the remaining cycles serve as the target
variable and directly contribute to RUL prediction. Focusing on a select window of remaining cycles
allows the model to capture changing trends and patterns in the battery’s ageing process, enabling
accurate prediction of the remaining cycles. By incorporating these features, models can leverage
information about quantity deviations, temporal dynamics, internal resistance, discharge behaviour,
and remaining cycles to enhance the prediction of RUL for LIBs.

4.3 Data Preprocessing


Data preprocessing is a critical component of the data analysis pipeline as it converts raw data into a
suitable format for advanced analysis and modelling. Its significance lies in guaranteeing subsequent
data analysis tasks’ accuracy, reliability, and efficacy. In the following section, we outline the specific
steps and tools employed to achieve effective data preprocessing.

4.3.1 Outlier removal


Outliers in the data can significantly impact the accuracy and reliability of subsequent analysis or
modelling. Removing outliers helps to improve data quality by reducing the influence of extreme or
erroneous values. This ensures that the data used for further analysis is more representative and
reliable.

4.3.2 Smoothing and Interpolation
Smoothing techniques such as moving averages and spline interpolation can help reduce noise and
variability in the data. Applying these techniques gives the cleaned data a more precise representation
of the underlying patterns and trends. This can enhance the accuracy of subsequent analysis and
modelling by focusing on the meaningful aspects of the data.
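As a minimal illustration, the snippet below applies a moving-average filter to a synthetic noisy capacity curve in Python; the window of 15 matches the value quoted later for the discharge data, while the curve itself is fabricated for demonstration.

# Moving-average smoothing of a synthetic noisy discharge-capacity curve.
import numpy as np

cycles = np.arange(1, 501)
qd = 1.1 - 0.0003 * cycles + 0.01 * np.random.randn(cycles.size)  # synthetic noisy capacity

window = 15
kernel = np.ones(window) / window
qd_smooth = np.convolve(qd, kernel, mode="same")  # moving-average filter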

4.3.3 Data preprocessing tools


The spline method in MATLAB refers to the interpolation technique known as cubic spline interpola-
tion. Cubic spline interpolation constructs a piece-wise cubic polynomial that passes through a given
set of data points while ensuring the smoothness and continuity of the interpolated curve (Sarfraz,
2003).

Mathematically, a cubic spline is defined by a set of cubic polynomials, one on each interval between adjacent data points. Let the data points be denoted (x_i, y_i), where i = 1, 2, ..., n. The cubic spline function S(x) can then be represented as (Wang et al., 2021):

S(x) = S_i(x) = a_i + b_i(x − x_i) + c_i(x − x_i)^2 + d_i(x − x_i)^3,   for x_i ≤ x ≤ x_{i+1}   (40)

where S_i(x) is the cubic polynomial on the interval [x_i, x_{i+1}], and a_i, b_i, c_i, d_i are the coefficients of the polynomial. These coefficients are determined by solving a system of equations derived from the interpolation and smoothness conditions. The cubic spline interpolation thus provides a smooth curve passing through the data points.
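For illustration, SciPy's CubicSpline performs the same piece-wise cubic construction as Equation (40); the data points below are synthetic placeholders rather than measurements from the dataset.

# Cubic spline interpolation through a handful of synthetic points.
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # known sample points x_i
y = np.array([1.10, 1.08, 1.05, 1.01, 0.96])   # measured values y_i

spline = CubicSpline(x, y)            # fits one cubic S_i(x) per interval
x_dense = np.linspace(0.0, 4.0, 100)
y_dense = spline(x_dense)             # smooth curve through the data points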

Given a data vector or matrix X, the filloutliers function replaces outliers in X with interpolated
values using a specified method. Mathematically, the filloutliers method can be represented as:

output = filloutliers(X, method, threshold, direction) (41)


Where:
• output is the resulting data with outliers filled,
• X is the original data,

• method is the method used for outlier replacement (e.g., linear, spline, movmean),
• threshold is the threshold used to detect outliers,
• direction specifies the direction for outlier replacement (e.g., center, above, below).

The filloutliers method in MATLAB is a convenient way to handle outliers in data by replacing
them with reasonable values based on the specified method and threshold.

In the preprocessing stage, several preprocessing steps are applied to different features to enhance
the Remaining Useful Life (RUL) prediction for LIBs. Here are the steps for each feature:

• Internal Resistance
– Outlier removal: The filloutliers function is used with the spline method and a moving
average window of 100 to replace outliers in the data with interpolated values, which can
be seen in Figure 12.
• Discharge data

– Smoothing: The moving average smoothing function applies a moving average filter with
a window size of 15 to smooth the discharge data, which is illustrated in Figure 13.
• Discharge time
– Outlier removal: Similar to the internal resistance feature, outlier removal is performed
using the filloutliers function with the spline method and a moving average window of
100, as illustrated in Figure 14.

• Linearly Interpolated features (Tdlin, Qdlin)
– captures the temporal dynamics across the 124 cells. The number of cycles for each of
these cells differs. We ensure the sampling rate encompasses 1000 entries for every cycle
to maintain consistency. This comprehensive approach provides a robust representation of
the battery’s temporal discharge behaviour, essential for accurate RUL predictions. Given
the original signal’s complexity, a Principal Component Analysis (PCA) was applied post-
outlier removal to reduce the dimensionality of the signal without compromising its key
characteristics. The preprocessed Qdlin can be seen in Figure 15.

After applying the data preprocessing steps highlighted above, the cleaned data is stored in TFRecord
files; these files provide an efficient, compact, and integrated format tailored for TensorFlow, guaran-
teeing swift data access, consistency, and adaptability for extensive machine learning operations.
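A minimal sketch of serialising one preprocessed cell into a TFRecord file is given below; the feature names and values are assumptions for illustration, not the exact schema used in this project.

# Writing one example to a TFRecord file (illustrative feature names/values).
import numpy as np
import tensorflow as tf

def float_feature(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=np.ravel(values)))

qdlin = np.random.rand(1000).astype(np.float32)   # interpolated discharge capacity (one cycle)
ir = 0.016                                        # internal resistance for that cycle
remaining_cycles = 1850.0                         # target value

example = tf.train.Example(features=tf.train.Features(feature={
    "Qdlin": float_feature(qdlin),
    "IR": float_feature([ir]),
    "Remaining_cycles": float_feature([remaining_cycles]),
}))

with tf.io.TFRecordWriter("cell_example.tfrecord") as writer:
    writer.write(example.SerializeToString())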

Figure 12: Comparison for the Internal resistance first element in data.

Figure 13: Comparison for the Discharge time first element in data.

Figure 14: Comparison of Discharge Quantity for the first element in data.

Figure 15: Discharge capacity versus cycle number.

4.4 Designing the networks
In order to provide superior prediction outcomes, hybrid neural networks combine multiple architec-
tures, leveraging the strengths of each. In this context, the architectures employed include Convolu-
tional Neural Network (CNN), Long Short-Term Memory (LSTM), and Dense Neural Network (DNN),
collectively referred to as CLDNN, and Temporal Transformer, a combination of Transformer network
with LSTM units.

These hybrid models were mainly selected due to the nature of the dataset used, which comprises
features with varying sampling rates and sizes. This disparity necessitates a tailored approach to
each feature type. This is especially true when some display more significant spatial-temporal depen-
dencies, such as Linearly Interpolated Temperature and Linearly Interpolated Discharge Capacity.

Several temporal models were considered for evaluation. These include the Convolution-Neural-
Network Differential-Neural-Computer (CNN-DNC), CNN-LSTM-Neural-Turing-Machine (CNN-LSTM-
NTM), CNN-Transformers, Transformer-Autoencoder, CLDNN, and Temporal Transformer. However,
only the CLDNN and Temporal Transformer demonstrated outstanding efficiency. For a deeper un-
derstanding of the aforementioned temporal neural network architectures, please refer to Appendix A
8.1. Due to the lesser performance of other models, we’ve chosen not to delve into their detailed de-
signs. In the following sections, we’ll explore the design and efficiency of the top-performing models:
CLDNN and Temporal Transformer.

4.4.1 CLDNN Architecture


The CLDNN architecture, with a total of 1,518,665 trainable parameters, employs a combination of
CNN, LSTM, and DNN. The primary layers include CNN for feature downsizing, LSTM for temporal
dynamics, and DNN for output prediction. Other layers, such as dropout and dense layers, augment
the network’s performance. This model’s Hierarchical Data Format file can be seen in Figure 16 which
incorporates inputs such as Qdlin, Tdlin, IR, Discharge time, and QD. These data pass through sev-
eral layers, including convolution, max-pooling, LSTM, and Dense, to predict remaining useful cycles
for lithium-ion batteries.

Convolutional Layers

The CNN layers are specifically designed to capture the spatial patterns in the detailed feature sets,
Qdlin and Tdlin.

Feature Concatenation: The features Qdlin and Tdlin are concatenated along the third dimension,
creating a composite feature set.

Convolutional Sub-layers:
• A series of 1D convolutional layers, augmented by max-pooling operations, sequentially process
the data. This hierarchy aids in capturing multi-scale patterns present in the data.
• Parameters such as kernel size, strides, and the number of filters dictate the transformation
capability of these convolutional layers. These are meticulously chosen for optimal feature ex-
traction.
• Dropout layers are interspersed to prevent overfitting and to ensure generalisation.
Integration with Scalar Features: Post convolutional processing, the resultant feature set is combined
with scalar features (IR, Discharge time, and QD) for holistic data representation.

LSTM Layer

LSTM layers are adept at handling temporal sequences, capturing long-term dependencies which
might be essential for RUL prediction.

Temporal Sequence Processing:


• The combined feature set is passed through LSTM layers, ensuring the temporal relationships
are adequately captured.

• The dropout post LSTM processing acts as a regularising agent, ensuring the robustness of the
model.
Dense Layers

The output from LSTM layers undergoes further transformation via dense (fully connected) layers,
refining the model’s prediction.
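A simplified Keras sketch of a CLDNN-style network is given below to make the wiring concrete. The window length, layer sizes, filter counts, and learning rate are illustrative assumptions (the tuned values are listed in Section 4.5), so the sketch does not reproduce the exact 1,518,665-parameter model.

# Illustrative CLDNN-style model: per-cycle CNN feature extraction, scalar
# feature concatenation, an LSTM over the cycle window, and a dense output.
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW = 20   # assumed number of cycles per input window
STEPS = 1000  # interpolated samples per cycle (Qdlin/Tdlin)

qdlin = layers.Input(shape=(WINDOW, STEPS, 1), name="Qdlin")
tdlin = layers.Input(shape=(WINDOW, STEPS, 1), name="Tdlin")
detail = layers.Concatenate(axis=-1)([qdlin, tdlin])   # composite detailed feature set

# 1D convolutions with max-pooling, applied to each cycle in the window
x = layers.TimeDistributed(layers.Conv1D(32, 9, activation="relu"))(detail)
x = layers.TimeDistributed(layers.MaxPooling1D(4))(x)
x = layers.TimeDistributed(layers.Conv1D(64, 5, activation="relu"))(x)
x = layers.TimeDistributed(layers.MaxPooling1D(4))(x)
x = layers.TimeDistributed(layers.Flatten())(x)
x = layers.TimeDistributed(layers.Dropout(0.3))(x)

# Scalar per-cycle features: IR, discharge time, QD
scalars = layers.Input(shape=(WINDOW, 3), name="scalar_features")
merged = layers.Concatenate(axis=-1)([x, scalars])

h = layers.LSTM(128, activation="tanh")(merged)        # temporal dependencies
h = layers.Dropout(0.3)(h)
h = layers.Dense(64, activation="tanh")(h)
output = layers.Dense(1, activation="relu", name="remaining_cycles")(h)

model = models.Model(inputs=[qdlin, tdlin, scalars], outputs=output)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse",
              metrics=["mae", "mape"])
model.summary()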

4.4.2 Temporal-Transformer Architecture


The Temporal Transformer architecture, with 3,936,281 trainable parameters, combines Transformer
and LSTM. The Transformer component is primarily used for handling complex temporal dependen-
cies of the features, and LSTM captures the long-term relationships between the features. Similar
to the previous model, this architecture uses a dense layer for prediction. This model’s Hierarchical
Data Format file can be seen in Figure 17 where the Temporal Transformer model considers Qdlin,
Tdlin, IR, Discharge time, and QD as inputs, combining them in a concatenated detail layer. It applies
time-distributed layers to the data before feeding it into a transformer block. Post-dropout, the LSTM
and Dense layers process the transformed information to predict the remaining cycles for lithium-ion
batteries.

The following outlines each section of the transformer model and its training in detail:

Multi-Head Self Attention Mechanism:

The core of the transformer model lies in its ability to focus on different parts of the input sequence,
called attention mechanisms. In the multi-head variant, the model uses multiple sets of attention
weights to attend to different input parts simultaneously. This results in a richer context for each input
element.

Separate Heads: The embedding dimension is divided into multiple heads, allowing for parallel pro-
cessing of different attention mechanisms. This separation aids in enhancing the ability of the model
to capture varied nuances in the data.

Attention Computation: Attention scores are computed as the dot product between query and key
tensors. The scores are scaled and passed through a softmax function to generate attention weights.
These weights generate a weighted combination of the value tensor, resulting in output for the atten-
tion mechanism.
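In the standard scaled dot-product formulation assumed here, this computation can be written as

Attention(Q, K, V) = softmax(QK^T / √d_k) V,

where Q, K, and V denote the query, key, and value tensors and d_k is the projection dimension of each head.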

Transformer Block:

The Transformer Block is a composite layer incorporating the multi-head self-attention mechanism
and further processes its outputs.

Feed-forward Neural Network (FFN):

Post attention, a feed-forward neural network (comprising two dense layers) further refines the model
outputs. This aids in capturing non-linear relationships within the data.

Normalisation and Dropout:

Following the attention and FFN mechanisms, layer normalisation stabilises activations and improves
training efficiency. Dropout layers are added post attention and FFN outputs, providing a regularising
effect and aiding in model generalisation.

4.4.3 Model Compilation


Datasets for both training and validation are generated using a dedicated function. This function pro-
cesses the cleaned data stored as tfrecords (section 4.3), ensuring it is moulded into sequences that
fit seamlessly as model inputs. This preprocessing step is crucial as it guarantees that data fed into
the models are in the appropriate structure and format, optimising the learning process.

An assortment of metrics is integrated into the training process to establish a multifaceted evalua-
tion of the model’s forecasting acumen (can be seen in section 2.5). Specifically, the Mean Absolute
Error (MAE) offers insights into the average magnitude of errors, the Mean Absolute Percentage Error
(MAPE) gives a normalised measure of inaccuracies, and the MSE presents a squared quantification
of prediction deviations. Together, these metrics grant a comprehensive perspective on the model’s
performance, guiding further refinements if necessary.
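As a small worked example with dummy values (not results from the trained models), the three metrics can be computed as follows:

# Worked example of MAE, MAPE, and MSE on dummy remaining-cycle predictions.
import numpy as np

y_true = np.array([1500.0, 1200.0, 900.0])   # true remaining cycles (dummy)
y_pred = np.array([1450.0, 1290.0, 850.0])   # predicted remaining cycles (dummy)

mae = np.mean(np.abs(y_true - y_pred))                       # average error magnitude
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0   # normalised error (%)
mse = np.mean((y_true - y_pred) ** 2)                        # squared deviations
print(mae, mape, mse)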

Figure 16: Convolutional Long Short-Term Memory Deep Neural Network (CLDNN).

Figure 17: The Temporal Transformer model

In addition to these models, several other neural network architectures were developed, as shown
in Figure 1. However, the results obtained from these models were unsatisfactory, so their details were
omitted.

Several parameters and hyperparameters are used in Algorithm 1, which outlines Multi-Head Self
Attention and Transformer algorithm. The parameter embed dim represents the embedding dimen-
sion of the input, and it must be divisible by num heads, which is the number of attention heads in
the multi-head attention mechanism. The projection dim is computed as the quotient of embed dim
and num heads, defining the dimensionality of individual attention head projections. The dense layers
query dense, key dense, and value dense are responsible for transforming the input into queries,
keys, and values respectively, while combine heads merges the multiple head results. In the Trans-
formerBlock class, ff dim is the hidden layer size of the feed-forward network inside the transformer,
and rate is the dropout rate used for regularisation. Other elements like att, ffn, layernorm1, and
layernorm2 define the attention mechanism, feed-forward network, and layer normalisations. The

algorithm is eventually compiled with a specific loss function and metrics to measure the model’s
performance.

Algorithm 1 Multi-Head Self Attention and Transformer


Class MultiHeadSelfAttention:
Initialise embed dim, num heads
projection dim ← embed dim / num heads
Define query dense, key dense, value dense, combine heads
Function attention(query, key, value):
Compute score, scaled score, weights, output
return output, weights
Function separate heads(x, batch size):
return reshaped and transposed x
Function call(inputs):
Separate heads for query, key, value
Apply attention
return combined output
Class TransformerBlock:
Initialise embed dim, num heads, ff dim, rate
Define att, ffn, layernorm1, layernorm2, dropout1, dropout2
Function call(inputs, training):
Apply attention and dropout
return normalised output
Define dimensions embed dim, num heads, ff dim
Initialise transformer block
Define Additional LSTM model architecture
Compile model with loss and metrics
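A hedged Keras sketch of the attention and transformer-block classes outlined in Algorithm 1 is shown below; the dimensions and dropout rate are illustrative, and the code is one plausible realisation rather than the exact thesis implementation.

# One possible Keras realisation of Algorithm 1 (illustrative, not the exact thesis code).
import tensorflow as tf
from tensorflow.keras import layers

class MultiHeadSelfAttention(layers.Layer):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.projection_dim = embed_dim // num_heads
        self.query_dense = layers.Dense(embed_dim)
        self.key_dense = layers.Dense(embed_dim)
        self.value_dense = layers.Dense(embed_dim)
        self.combine_heads = layers.Dense(embed_dim)

    def attention(self, query, key, value):
        score = tf.matmul(query, key, transpose_b=True)
        scaled_score = score / tf.math.sqrt(tf.cast(self.projection_dim, tf.float32))
        weights = tf.nn.softmax(scaled_score, axis=-1)
        return tf.matmul(weights, value), weights

    def separate_heads(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.projection_dim))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs):
        batch_size = tf.shape(inputs)[0]
        query = self.separate_heads(self.query_dense(inputs), batch_size)
        key = self.separate_heads(self.key_dense(inputs), batch_size)
        value = self.separate_heads(self.value_dense(inputs), batch_size)
        output, _ = self.attention(query, key, value)
        output = tf.transpose(output, perm=[0, 2, 1, 3])
        output = tf.reshape(output, (batch_size, -1, self.num_heads * self.projection_dim))
        return self.combine_heads(output)

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super().__init__()
        self.att = MultiHeadSelfAttention(embed_dim, num_heads)
        self.ffn = tf.keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)])
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training=False):
        attn_output = self.dropout1(self.att(inputs), training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.dropout2(self.ffn(out1), training=training)
        return self.layernorm2(out1 + ffn_output)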

4.4.4 Hyper-parameter Optimisation


The hyper-parameter tuning for the hybrid model was achieved using Bayesian Optimisation. Hyper-
parameter tuning is the process of optimising the parameters of a machine learning model to enhance
its performance (see Algorithm 2). Here is a step-by-step breakdown of the implementation used:
1. HyperModel Definition: A class MyHyperModel is defined, inheriting from the HyperModel class
of Keras Tuner. This class defines the structure of the model (which includes Conv1D, LSTM,
and Dense layers, Transformers block) and the space of hyperparameters to search over, in-
cluding the learning rate, the number of filters and kernel size for the convolutional layers, the
number of units in the LSTM and Dense layers, activation functions, and dropout rates.
2. Custom Bayesian Optimisation Tuner: A custom Bayesian Optimisation tuner class
MyBayesianOptimizationTuner is defined. It extends the BayesianOptimization class from
Keras Tuner and overrides the on_error method to mark a trial as failed if an error is encountered
(Team).
3. HyperModel Instance and Tuner: An instance of MyHyperModel is created, and a Bayesian
Optimisation tuner is initialised with the hypermodel, an objective to minimise (the validation
mean absolute error of remaining cycles), and other parameters.
4. Hyper-parameter Search: The tuner performs the hyperparameter search over the specified
number of trials (10000 in this case) on the training dataset for a given number of epochs (10 in
this case) and evaluates the performance on the validation dataset.
5. Best Hyperparameters and Model: Once the search is complete, the best set of hyperparam-
eters is retrieved, and a new model is built using these hyperparameters. The best model found
during the search is also retrieved, and its weights are transferred to the new model.

Algorithm 2 Hyperparameter Optimization with Keras Tuner
1: Define the hypermodel class
2: Define the build method
3: Set up various hyperparameters (learning rate, convolution filters, etc.)
4: Define the architecture of the model (Input, Conv1D, MaxPooling1D, etc.)
5: Create the model
6: Compile the model with an optimiser and metrics
7: Return the model

The advantage of using Bayesian Optimization for hyperparameter tuning is that it uses past trial in-
formation to choose the next set of hyperparameters to evaluate, thus making it more efficient than
other methods like Grid Search or Random Search, which do not use past evaluation results.

Using Bayesian Optimization and Keras Tuner in this script allows for efficiently exploring a poten-
tially large hyperparameter space, improving the model’s performance without extensive manual trial
and error. The model’s hyperparameters, such as learning rate, number of convolution filters, and
LSTM units, significantly influence the model’s performance. By optimising these parameters, the
model can learn better patterns from the data, potentially leading to more accurate predictions.
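A minimal sketch of the tuner setup described above, assuming the keras-tuner package; the hypermodel body, search space, objective name, and trial count are simplified placeholders rather than the project's actual configuration:

import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(hp):
    # Hyperparameter search space: LSTM units, dropout, learning rate
    units = hp.Int("lstm_units", min_value=32, max_value=160, step=32)
    dropout = hp.Float("dropout", 0.1, 0.5, step=0.1)
    lr = hp.Choice("learning_rate", [1e-3, 1e-4])

    inputs = layers.Input(shape=(100, 8))          # e.g. a window of 100 cycles, 8 features (illustrative)
    x = layers.LSTM(units)(inputs)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(1, activation="relu")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse", metrics=["mae"])
    return model

tuner = kt.BayesianOptimization(
    build_model,
    objective="val_mae",          # minimise validation MAE, analogous to the validation MAE of remaining cycles
    max_trials=20,
    overwrite=True,
)
# tuner.search(x_train, y_train, epochs=10, validation_data=(x_val, y_val))
# best_hps = tuner.get_best_hyperparameters(1)[0]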

4.5 Hyperparameter Analysis for the hybrid models
The configurations of two sets of neural network models, namely the Transformer-LSTM and CLDNN,
are presented in this report. For each model, both the original and the optimized hyperparameters
are detailed.
Transformer-LSTM (original → optimised):

• Embedding Dimension (embed_dim): 64 → 32
• Number of Attention Heads (num_heads): 8 → 2
• Hidden Layer Size (ff_dim): 32 → 32
• DENSE_ACTIVATION: tanh → tanh
• DENSE_UNITS: 40 → 64
• LSTM_ACTIVATION: relu → tanh
• LSTM_UNITS: 132 → 108
• DROPOUT_RATE_LSTM: 0.4 → 0.3
• OUTPUT_ACTIVATION: relu_cut → relu
• LEARNING_RATE: 0.001 → 0.0001

CLDNN (original → optimised):

• CONVOLUTION_FILTERS: 56 → 44
• CONVOLUTION_KERNEL: 27 → 12
• DENSE_ACTIVATION: tanh → tanh
• DENSE_UNITS: 40 → 64
• LSTM_ACTIVATION: tanh → tanh
• LSTM_UNITS: 132 → 108
• DROPOUT_RATE_CNN: 0.45 → 0.3
• DROPOUT_RATE_LSTM: 0.4 → 0.3
• OUTPUT_ACTIVATION: relu_cut → relu_cut
• LEARNING_RATE: 0.001 → 0.0001

4.5.1 Observations:
Several key observations were drawn from the hyperparameter setups of the presented neural net-
work models:

• The Transformer-LSTM-optimized seems to have reduced the embedding dimension and num-
ber of attention heads compared to the original model.
• Both Transformer-LSTM and CLDNN models have ’optimized’ versions with distinguishable hy-
perparameter configurations. For instance, the dense layer in the optimized models contains an
increased number of units, with Transformer-LSTM-optimized having 64 units compared to its
original 40. Additionally, the optimized configurations have a reduced dropout rate and learning
rate.
• There’s a prominent difference in the learning rates between the Transformer-LSTM models.
Specifically, the original Transformer-LSTM model has a learning rate which is ten times higher
than its optimized counterpart.

All models used a custom activation function called relu_cut. This function is defined as:

from tensorflow.keras import backend as K

def relu_cut(x):
    # ReLU capped at 1.2 so the normalised output cannot grow unbounded
    return K.relu(x, max_value=1.2)
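For illustration only, a minimal model showing how such a capped activation could sit on the output head; the input dimension and layer sizes are arbitrary and are not the thesis architecture:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, backend as K

def relu_cut(x):
    return K.relu(x, max_value=1.2)

inputs = layers.Input(shape=(4,))
hidden = layers.Dense(8, activation="tanh")(inputs)
# Output head: predictions are clipped to the [0, 1.2] range of the normalised target
outputs = layers.Dense(1, activation=relu_cut)(hidden)
model = models.Model(inputs, outputs)

x = (np.random.rand(3, 4) * 100).astype("float32")
print(float(model.predict(x).max()) <= 1.2)  # True: outputs never exceed 1.2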

The custom activation function has several benefits:


• Regularization Effect: By setting an upper limit on the activation values at 1.2, the network
could potentially maintain robustness against overfitting or capturing noise.

• Stability in Training: The capping mechanism might mitigate the challenges posed by explod-
ing gradients, thereby ensuring smoother and more stable training dynamics.
These insights underscore the efforts to fine-tune the models for optimal performance. Further empirical evaluation is provided in the subsequent chapter.

Note: To learn more about the reproducibility of our results, refer to Appendix H (Section 8.8), which details the code as well as the hardware and software that were used.

4.6 Summary
This chapter explained the methodology employed for preprocessing raw data and the design of
hybrid neural networks. The data preprocessing stage involves outlier removal, smoothing and inter-
polation using tools such as MATLAB’s spline, filloutliers, and MovingAverage functions. Three
specific features - internal resistance, discharge data, and discharge time - were processed to en-
hance the Remaining Useful Life (RUL) prediction for Lithium-Ion Batteries (LIBs). We also detail
the hyperparameter setup for the experimentation. Two specific hybrid architectures were detailed:
Convolutional Long Short-Term Memory Deep Neural Network (CLDNN) and Temporal Transformer.
These models leverage the strengths of each of their component architectures (Convolutional Neural
Network (CNN), Long Short-Term Memory (LSTM), and Dense Neural Network (DNN) for CLDNN;
Transformer and LSTM for Temporal Transformer) to handle features with different sampling rates and
sizes, capture complex spatial-temporal dependencies, and provide superior prediction outcomes for
RUL.

Following this, we will delve into the collected findings of the fine-tuned models and compare these
results with the existing state-of-the-art techniques in the literature previously discussed in section 3.

5 Results and Evaluation
This section presents and compares our findings and results to other algorithms leveraging the vali-
dation data.

5.1 Baseline Models


In the initial phase of the study aimed at predicting the Remaining Useful Life (RUL) of Lithium-Ion
batteries, several baseline models were applied, which included a Long Short-Term Memory network
(LSTM), and a Transformer integrated with Global Max Pooling. The performance profiles of these
models are illustrated in Figures (25, 26). Despite these baseline models offering a necessary starting
point, they were eventually deemed sub-optimal for the objective at hand. Consequently, our attention
gravitated towards the examination and deployment of hybrid models.

During model development, it became evident that the most effective models necessitated two car-
dinal attributes. Firstly, they were required to be capable of managing sparse data by proficiently
extracting significant features. Secondly, they were expected to possess the capability to learn both
temporal and spatial relationships between the features, as elaborated in section 4.2, and the remain-
ing cycles of each Lithium-Ion battery.

After these findings, a comparative investigation was undertaken among an assortment of hybrid
models, which were developed based on deep learning and embraced these two critical characteris-
tics. The performance of these models is visually encapsulated in the following Figures (19, 21, 23),
with key performance indicators such as loss, Mean Absolute Error (MAE), Mean Absolute Percent-
age Error (MAPE), and Mean Squared Error (MSE) being conspicuously delineated.

5.2 Comparing all the tested temporal models


When placed in contrast with other models that embody the desired characteristics, such as the Convolutional-Neural-Network Differentiable-Neural-Computer (CNN-DCN), CNN-LSTM-Neural-Turing-Machine (CNN-LSTM-NTM), CNN-Transformers, and Transformer-Autoencoder, it becomes evident that these models exhibit higher error metrics, underscoring their relative inefficacy in accurately predicting the RUL for Lithium-Ion batteries, as can be seen in Figure 25 and Table 3. Refer to Appendix A (Section 8.1) to learn more about these architectures.

Table 3: Comparison of the results of the different models explored.

Model                      MSE      MAE       MAPE      RMSE
CLDNN*                     0.6754   54.012    25.676    0.8218
CNN-DCN                    1.402    95.6365   46.408    1.1841
CNN-LSTM-NTM               1.333    284.887   -         1.1546
Transformer-LSTM*          0.7136   65.134    28.7932   0.8444
CNN-Transformers           0.6783   92.127    36.981    0.8236
Transformer-Autoencoder    1.524    288.951   -         1.2345

5.3 Best performing Models


In the course of assessing these results, it transpired that two models, namely the Convolutional,
LSTM, Densely Connected (CLDNN) and the Transformer-LSTM (Temporal-Transformer), emerged
as the most proficient in predicting the RUL, as outlined in Table 3. The CLDNN model exhibited a
MAE of 54.012, MAPE of 25.676, and MSE of 0.6754. Conversely, the Temporal-Transformer model
recorded a MAE of 65.134, MAPE of 28.7932, and MSE of 0.7136.

The results underscore the proficiency of both the CLDNN and Temporal-Transformer models in fore-
casting the cycle count, especially when benchmarked against other approaches. The CLDNN model,

with its Mean Absolute Error (MAE) of 54.012 cycles, deviates from the actual cycle count by roughly 54 cycles on average. Meanwhile, the Temporal-Transformer model follows closely with a MAE of 65.134 cycles. In contrast, the other models exhibit a significantly larger range in MAE, spanning
from 92 to 288 cycles. These comparative MAE values clearly demonstrate that both the CLDNN and
Temporal-Transformer models significantly outperform other models in terms of prediction accuracy.

5.3.1 CLDNN
The Convolutional Long Short-Term Memory Deep Neural Network (CLDNN) stands as an exem-
plar for predicting batteries’ Remaining Useful Life (RUL) owing to its three core components. At its
forefront, the CNN layers, inclusive of three sets of Conv1D layers coupled with max-pooling, are
dedicated to feature extraction from inputs (Qdlin and Tdlin). This extraction is vital for understand-
ing battery health, a strength derived from the convolutional component’s adeptness at reducing the
size of selected features and pulling out salient ones (as detailed in section 2.2). Additionally, the
LSTM layer, a hallmark of recurrent neural networks renowned for deciphering long-term dependen-
cies (section 2.3.3), provides insights into the temporal dynamics existing between battery cycles.
The synthesis of spatial and temporal features into the final RUL prediction is executed by the dense
layers. Inputs like IR, discharge time, and QD amplify the depth of insights the model can achieve.
Through hyperparameter tuning using the Keras Tuner, the model’s characteristics are meticulously
refined. In summation, by amalgamating convolutional layers, LSTM components, dense layers, and
battery-specific inputs, this architecture is rendered apt for battery RUL prediction, mirroring the mul-
tifaceted nature of battery health for accurate forecasts.
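To make this description concrete, the following is a condensed, hypothetical Keras sketch of a CLDNN-style network; the window length, per-cycle curve resolution and layer sizes are illustrative only, and the actual tuned architecture is outlined in Algorithm 6 (Appendix E):

import tensorflow as tf
from tensorflow.keras import layers, models

window, steps = 100, 50          # 100-cycle window, 50 samples per within-cycle curve (illustrative)

# Detailed within-cycle curves (e.g. Qdlin and Tdlin stacked as two channels)
curves_in = layers.Input(shape=(window, steps, 2), name="qdlin_tdlin")
# Per-cycle scalar features (e.g. IR, discharge time, QD)
scalars_in = layers.Input(shape=(window, 3), name="ir_dt_qd")

# CNN front end, applied identically to every cycle in the window
x = layers.TimeDistributed(layers.Conv1D(16, 5, activation="relu"))(curves_in)
x = layers.TimeDistributed(layers.MaxPooling1D(2))(x)
x = layers.TimeDistributed(layers.Conv1D(32, 5, activation="relu"))(x)
x = layers.TimeDistributed(layers.MaxPooling1D(2))(x)
x = layers.TimeDistributed(layers.Flatten())(x)

# Merge extracted spatial features with the scalar inputs, then model the temporal dynamics
x = layers.Concatenate()([x, scalars_in])
x = layers.LSTM(64)(x)
x = layers.Dense(32, activation="tanh")(x)
rul = layers.Dense(1, activation="relu", name="remaining_cycles")(x)

model = models.Model([curves_in, scalars_in], rul)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()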

5.3.2 Temporal-Transformer
The Temporal Transformer stands out in its proficiency for predicting the Remaining Useful Life (RUL)
of machinery or components. One of its standout features is the transformer’s attention mechanism,
which, by identifying intricate patterns and dependencies in the time-series data, becomes invaluable
for precision in RUL predictions. This is augmented by its capability to process features with ease,
especially when dealing with intricate temporal dependencies owing to its Multi-head attention (as
detailed in section 2.4). This model deepens its understanding by processing multiple input features
through various layers, decoding complex sensor-parameter relationships. Beyond the transformer
block, the integration of an LSTM component ensures the data’s sequential essence remains cap-
tured, especially given the LSTM’s prowess in grasping temporal dynamics (refer to section 2.3.3).
The architecture’s allowance for tunable hyperparameters imparts adaptability across various scenar-
ios and potential for fine-tuning tailored to specific machinery or component types. Summarising, the
combination of transformer with LSTM layers, underpinned by a flexible structure, qualifies this model
as especially potent for RUL prediction, allowing insights into potential component maintenance or
failures. The complete architecture is depicted in Figure 17.
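Analogously, a compact, hypothetical sketch of the Temporal-Transformer idea (a self-attention stage followed by an LSTM head) is shown below; the shapes are illustrative, while the attention heads, LSTM width, dropout and learning rate are borrowed from the optimised configuration in section 4.5:

import tensorflow as tf
from tensorflow.keras import layers, models

window, embed_dim = 100, 32       # illustrative: 100-cycle window, 32-dimensional cycle embedding

seq_in = layers.Input(shape=(window, embed_dim), name="cycle_features")

# Transformer stage: self-attention with residual connections and layer normalisation
attn = layers.MultiHeadAttention(num_heads=2, key_dim=embed_dim // 2)(seq_in, seq_in)
x = layers.LayerNormalization(epsilon=1e-6)(layers.Add()([seq_in, layers.Dropout(0.3)(attn)]))
ffn = layers.Dense(32, activation="relu")(x)
ffn = layers.Dense(embed_dim)(ffn)
x = layers.LayerNormalization(epsilon=1e-6)(layers.Add()([x, layers.Dropout(0.3)(ffn)]))

# LSTM head preserves the sequential ordering of cycles before the final regression
x = layers.LSTM(108, activation="tanh")(x)
rul = layers.Dense(1, activation="relu", name="remaining_cycles")(x)

model = models.Model(seq_in, rul)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse", metrics=["mae"])
model.summary()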

Figure 18: MAE - Mean Absolute Error for each of the models.

Figure 19: MAE- Mean Absolute Error for the best performing models.

Figure 20: MAPE - Mean Absolute Percentage Error for each of the models.

Figure 21: MAPE- Mean Absolute Percentage Error for the best performing models.

Figure 22: MSE - Mean Squared Error for each of the models.

Figure 23: MSE- Mean Squared Error for the best performing models.

Figure 24: Epoch count for convergence of each model vs process time in seconds

5.4 Comparison against SOTA approaches


The models titled Auto encoder-DNN (Ren et al., 2018), Auto-CNN-LSTM (Ren et al., 2020), and
ATCMN (Fei et al., 2023) stand out in the context of our research due to two principal reasons (can
be seen in Table 2). First, akin to our models, these are hybrid deep learning approaches which
leverage the strengths of various network architectures to optimise the prediction process. The Auto
encoder-DNN combines the power of autoencoders and deep neural networks, the Auto-CNN-LSTM
blends convolutional neural networks and long short-term memory networks, and ATCMN integrates
attention mechanism with temporal convolutional memory network.

Second, these models distinguish themselves from the others listed in the table by predicting the
Remaining Useful Life (RUL) of the batteries in terms of the remaining cycle counts, which aligns with
our research objectives. These models provide an estimate of the battery’s lifespan in terms of its
usage cycles, offering a more specific and actionable metric for battery health management.

Contrarily, the other models included in the Table 2 predominantly undertake a classification task,
predicting if the battery still has a useful life remaining or if it has reached the end of its life. While
this information is valuable, it lacks the level of detail provided by a cycle count prediction, which
can provide more precise information for scheduling maintenance, replacement, and other decisions
related to battery management. Therefore, the Auto encoder-DNN, Auto-CNN-LSTM, and ATCMN
models bear the most relevance to our study due to their hybrid architecture and their approach to
RUL prediction.

CLDNN (Convolutional LSTM DNN) and Transformer-LSTM models have exhibited superior performance in predicting the Remaining Useful Life (RUL) of Lithium-Ion Batteries (LIB), particularly in comparison to other temporal models such as LSTM-RNN, DCNN, and TCNN, as can be seen in Table 4. The superiority can be attributed to the intricate synergy of convolutional layers for feature extraction, LSTM layers for temporal sequence modelling, and dense layers for final predictions in CLDNN.
Transformer-LSTM leverages self-attention mechanisms to capture long-range dependencies across
sequences. Among the listed models, ATCMN is most similar to our model, utilising discharging time,
voltage, and capacity for RUL estimation. However, our models yield superior results for the remaining
cycles of LIB, for a moving window of 100 cycles and using CC-CV charge policy (section 4.1).
It is imperative to acknowledge that the Auto-CNN-LSTM model proposed by Ren et al. (2020)
operates on a distinct dataset, characterised by less variation and little diversity in its input param-
eters. The authors themselves have candidly acknowledged the limitations stemming from dataset
insufficiency (Ren et al., 2020). In comparison to our approach, which encompasses a more exten-
sive range of input parameters, this disparity underscores the potential for discrepancies in outcomes

Table 4: Comparison of our models against other state-of-the-art approaches.

Model                               MAE      RMSE      MAPE
Auto-CNN-LSTM (Ren et al., 2020)    -        5.03%     -
ATCMN (Fei et al., 2023)            74       -         32.8%
CLDNN (ours)                        54       0.8218%   25.676%
Transformer-LSTM (ours)             65.134   0.8444%   28.7932%

between the two models. The availability of a more comprehensive and diverse dataset in our study
contributes to a more robust foundation for modelling and prediction, thereby enhancing the reliability
and applicability of our findings.

5.5 Summary
This chapter explores the Remaining Useful Life (RUL) prediction of Lithium-Ion batteries. Initial
models, such as LSTM and Transformer with Global Max Pooling, weren’t optimal. Effective mod-
els needed to handle sparse data, extract essential features, and understand feature relationships.
Hybrid models, particularly the Convolutional, LSTM, Densely Connected (CLDNN), and Transformer-
LSTM (Temporal-Transformer), were more effective. These models captured battery health complex-
ity, delivering robust RUL predictions. Through hyperparameter tuning, their performance improved.
Comparatively, they surpassed state-of-the-art models like Auto encoder-DNN in predicting precise
remaining battery cycles, essential for battery health management.

6 Discussion
This chapter highlights the accomplishments of our project, featuring the development of two unique
deep learning models, which, to our knowledge, are unparalleled on this dataset for enhanced lithium-
ion battery health estimation, and their rigorous benchmarking against contemporary techniques. It
emphasises the necessity for continued exploration, particularly involving real-world validation with a
broader and diverse battery dataset, to bolster our understanding and trust in these models.

6.1 Achievements
The overarching goal of this project was to harness deep learning techniques for enhancing battery
health estimation, mainly focusing on Lithium-ion batteries (LIBs). In pursuit of this ambition, the
research has achieved substantial breakthroughs, aligning with the original objectives and contribut-
ing significantly to the data-driven battery management systems field. Below, we summarise these
accomplishments:

• Successful implementation and investigation of various preprocessing techniques on raw signals with different sampling rates. This crucial first step ensured dimensionality reduction while preserving the essential features, enabling more manageable yet comprehensive data to feed into the deep learning models.
• An in-depth exploration of strategies for designing and fine-tuning deep neural networks resulted
in the creation of two robust hybrid models, namely Convolutional, LSTM, Densely Connected
(CLDNN) and the Transformer-LSTM (Temporal-Transformer). These models showed superior
performance in predicting the Remaining Useful Life (RUL) of LIBs, a critical step towards effi-
cient battery health management.
• The research introduced these two hybrid models and underscored their efficacy by rigorously
benchmarking them against an array of temporal network frameworks and state-of-the-art ap-
proaches. The comprehensive comparison further validated the potential of these novel models.
• Our models, derived through a customised neural network architecture suited to specific features, demonstrate significant improvements in root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE), outperforming the studies conducted by the China University of Geosciences (Fei et al., 2023) and Beihang University (Ren et al., 2020).
• The innovative amalgamation of LSTM and Transformer models in the Temporal-Transformer
configuration is a pioneering achievement of this study. This novel combination has demon-
strated a commendable ability to interpret LIB data and opened a new avenue for future explo-
ration.
• The documentation of the entire research methodology, including data preprocessing, dataset
attributes, and the design intricacies of our neural network models, adds to the transparency
and reproducibility of our work, contributing to the broader academic community.

It is essential to mention that while the project has made strides in its objectives, the path towards
complete and accurate battery health estimation still requires further exploration and investigation.
For example, the real-world implementation and validation of the models on a more extensive, di-
verse set of battery data can add another dimension to our understanding and confidence in these
approaches.

In conclusion, this study is essential to refining battery management systems using data-driven deep
learning techniques. The insights from the research serve as a solid foundation for further advance-
ments in this field.

6.2 Deficiencies and Future Work


Future work may also explore more complex augmentation methods and alternative ensemble solu-
tions to refine model performance further.

Intriguingly, liquid neural networks (LNNs), a new type of neural network that can adapt its struc-
ture and parameters in response to new data, could improve this research area. The dynamic and
adaptive nature of LNNs could make them superior to traditional static neural networks for tasks such
as predicting the life of lithium-ion batteries. These adaptive networks may bring about greater ef-
ficiency, robustness, and explainability in the modelling process, heralding a paradigm shift in how
machine learning is applied within battery management systems.

Liquid Neural Networks (LNNs), with their inherent capability to adjust structure and parameters dy-
namically based on new data, offer a compelling route for predicting lithium-ion batteries’ Remaining
Useful Life (RUL). Their compact design, reduced parameter count, and overall less computational
intensity position them perfectly for large-scale applications, especially in battery management sys-
tems of electric vehicles (To learn more about this nascent architecture, refer to Appendix F 8.6 ).
LNNs exhibit a lesser propensity to overfit, so they display enhanced generalisation capabilities when
introduced to new data sets. The adaptability, efficiency, scalability, and robustness of LNNs present
an exciting potential to overhaul RUL prediction techniques for lithium-ion batteries, thus significantly
refining battery management systems.

Moreover, a critical area for future research would be to delve into the reduction in parameter count,
which can lead to decreased memory usage, making models more lightweight and efficient. This
can be especially valuable in applications with limited computational resources, such as embedded
systems found in electric vehicles.

Additionally, it would be intriguing to measure and compare the processing time of data-driven tech-
niques with traditional approaches. Assessing this contrast in processing time during real-time infer-
ence can provide insights into the practical applicability of these techniques, especially in scenarios
demanding instant decision-making.

One intriguing avenue for further exploration, which arises more as a point of interest rather than a
direct recommendation, pertains to the potential connections between the dataset derived from hyper-
parameter optimization and established physics models. Specifically, an intriguing question arises: to
what extent do the parameters identified through differential equations commonly employed in physics
align with those determined through hyperparameter optimization? Investigating this alignment could
potentially yield insights into the compatibility and convergence of these distinct methodologies. This
avenue of inquiry holds the promise of enhancing our understanding of the intricate interplay between
data-driven techniques and physics-based modelling, thereby contributing to the broader advance-
ment of both fields. Further investigation into this nuanced relationship between hyperparameter
optimization outcomes and physics models could thus be a valuable trajectory for future research.

In wrapping up, our study signifies an advancement in predicting the remaining useful life of lithium-
ion batteries. Still, upcoming research should hone in on diversifying data sources, simplifying model
intricacies, and probing into the potential of emergent technologies, notably liquid neural networks (Appendix F, Section 8.6).

7 Conclusion
Our primary aim was to propose data-driven algorithms capable of predicting the remaining cycles
of lithium-ion batteries, based on a moving window size. The algorithms we developed have demon-
strated superior performance compared to prior approaches, even outperforming top-tier models pre-
viously utilised.

7.1 Thesis Contributions:


1. Effective Preprocessing: Implemented diverse techniques on raw signals to reduce dimen-
sionality while preserving essential features for deep learning.
2. Innovative Hybrid Models: Explored deep neural network design, resulting in CLDNN and Temporal-Transformer models that outperformed alternative approaches in predicting Lithium-ion battery RUL.

3. Comprehensive Benchmarking: Rigorously compared hybrid models against temporal networks and state-of-the-art approaches, affirming their effectiveness.
4. Enhanced Predictive Accuracy: Custom neural architectures improved key metrics, surpassing previous studies from the China University of Geosciences and Beihang University.

5. Innovative Temporal-Transformer: Novel LSTM and Transformer fusion in the Temporal-Transformer, advancing Lithium-ion battery data interpretation.
6. Transparent Methodology: Documented research methodology, enhancing transparency and
reproducibility.

However, due to the limitation in the quantity and diversity of data, along with the substantial computational requirements arising from the high parameter count of our models, these algorithms may not be immediately suitable for deployment in real-time, online systems. While we believe that our work
has contributed valuable insights and advanced the state of knowledge in this field, it is clear that
more work is needed before these algorithms can be fully integrated into battery management sys-
tems for real-time RUL predictions. The reduction in the model complexity and increase in the amount
of diverse data could allow for the development of more efficient models, which could potentially be
implemented online in battery management systems.

In summary, our work is not the final solution, but it does represent a significant step forward in
this field. It provides a solid foundation for future research, with the potential to greatly improve the
reliability and efficiency of battery management systems, ultimately enhancing the performance and
safety of devices that rely on lithium-ion batteries.

8 Appendices
8.1 Appendix A - Various Neural Network architectures explained
Different Neural Network and reasons for why they are used:

1. Artificial Neural Network (ANN): ANNs are a foundational type of neural network architecture
used for general-purpose tasks. They consist of interconnected nodes, or neurons, organised in lay-
ers. ANNs are versatile and can be used for various applications such as classification, regression,
and pattern recognition (Hsu et al., 1995).

2. Convolutional Neural Network (CNN): CNNs are designed to analyse data with a grid-like struc-
ture, such as images and video. CNNs leverage convolutional layers to automatically learn local
features and hierarchical representations from the input data. They excel at tasks like image classifi-
cation, object detection, and image segmentation (LeCun et al., 1995).

3. Recurrent Neural Network (RNN): RNNs are suitable for processing sequential data, where
the current input depends on previous inputs. RNNs maintain an internal state, or memory, to cap-
ture temporal dependencies. They are commonly used in natural language processing tasks, speech
recognition, and time series analysis (LeCun et al., 2015).

4. Long Short-Term Memory (LSTM): LSTMs are a type of RNN architecture designed to address
the vanishing gradient problem and capture long-term dependencies. LSTMs incorporate memory
cells that selectively retain and update information over time, making them more effective at handling
sequences with long-term dependencies (Hochreiter and Schmidhuber, 1997).

5. Bidirectional LSTM (Bi-LSTM): Bi-LSTMs combine the capabilities of LSTMs with the ability to
process data in both forward and backward directions. This allows the network to capture information
from both past and future contexts, enhancing its understanding of the input sequence. Bi-LSTMs are
commonly used in machine translation, sentiment analysis, and speech recognition tasks Jang et al.
(2020).

6. Deep Belief Network (DBN): DBNs are generative models of multiple layers of restricted Boltz-
mann machines (RBMs). They are used for unsupervised learning and feature representation. DBNs
can learn hierarchical representations and have been applied to feature learning, anomaly detection,
and generative modelling tasks (Hinton, 2009).

7. Neural Turing Machine (NTM): NTMs are neural network architecture incorporating an exter-
nal memory component. They are designed to mimic the working of a Turing machine, allowing them
to learn to read, write, and access a memory matrix. NTMs are useful for tasks that require memory
manipulation, such as algorithmic tasks, language modelling, and sequential processing (Collier and
Beel, 2018).

8. Transformer: Transformers are attention-based neural network architectures that have revolu-
tionised natural language processing tasks. They use self-attention mechanisms to capture relation-
ships between different positions within a sequence. Transformers have achieved state-of-the-art
results in machine translation, language understanding, and text generation tasks (Vaswani et al.,
2017).

9. Autoencoder: Autoencoders are unsupervised learning models for dimensionality reduction and
feature learning. They consist of an encoder network that maps the input to a latent space rep-
resentation and a decoder network that reconstructs the input from the latent representation. Au-
toencoders have applications in data compression, denoising, and anomaly detection (Raschka and
Mirjalili, 2019).

10. Boltzmann Machine: Boltzmann Machines are a generative stochastic neural network type.
They model the joint probability distribution of binary random variables and can capture complex de-
pendencies between variables. Boltzmann Machines have been used in various applications such as
feature learning, dimensionality reduction, and collaborative filtering (Kappen and Rodrı́guez, 1998).

11. Deep Neural Network (DNN): DNNs refer to neural networks with multiple hidden layers be-
tween the input and output layers. They can learn hierarchical representations and capture complex
relationships in the data. DNNs have been successful in various domains, including image recogni-
tion, natural language processing, and speech recognition Raschka and Mirjalili (2019).

Another example is the recurrent neural network (RNN) with external memory, such as the Neural
Turing Machine (NTM) or the Differentiable Neural Computer (DNC). These models incorporate an
external memory component that can read from and write to a memory matrix, allowing them to store
and retrieve information over long sequences or perform algorithmic tasks.

The Nonlinear Auto-Regressive with exogenous inputs (NARX) architecture is a neural network widely
used in modelling and forecasting time series data, notably excelling in handling nonlinear relation-
ships and intricate dynamics. This network has two principal components: an auto-regressive (AR)
component, which models the connection between current output and previous outputs and inputs,
thereby capturing temporal dependencies, and an exogenous (X) component that incorporates ex-
ternal variables or influences into the model to enrich prediction. Training of the NARX architecture
happens using historical data, and it employs a feedback mechanism to self-correct internal states
based on prediction errors, enhancing forecasting accuracy over time. NARX’s effectiveness is well-
demonstrated across various fields, such as finance, engineering, and environmental science.

8.2 Appendix B - Examples of the analytical equations used in physics-based models
The following is an example of the differential equations that are required to be solved:

P2D - Pseudo 2D Isothermal model (Subramanian et al., 2007):

Li-ion diffusion in the solid phase: $c_s$ is the Li-ion concentration in the solid spherical particles, following Fick's second law:

\[ \frac{\partial c_s}{\partial t} = \frac{D_s}{r^2} \frac{\partial}{\partial r}\left(r^2 \frac{\partial c_s}{\partial r}\right), \quad \text{for } r \in (0, R_s) \qquad (42) \]

Li-ion diffusion in the electrolyte phase: in general, the electrolyte concentration distribution $c_e$ satisfies

\[ \epsilon_e \frac{\partial c_e}{\partial t} = D_e^{\mathrm{eff}} \frac{\partial^2 c_e}{\partial x^2} + \frac{(1 - t_+^0)}{F} \, j^{\mathrm{Li}}, \quad \text{for } x \in (0, L) \qquad (43) \]

Potential equation in the solid phase: generally, the potential distribution in the solid phase $\phi_s$ follows Ohm's law:

\[ \sigma^{\mathrm{eff}} \frac{\partial^2 \phi_s}{\partial x^2} - j^{\mathrm{Li}} = 0, \quad \text{for } x \in (0, L) \qquad (44) \]

Potential equation in the electrolyte phase: the potential distribution in the electrolyte phase $\phi_e$ follows Ohm's law:

\[ k^{\mathrm{eff}} \frac{\partial^2 \phi_e}{\partial x^2} + k_d^{\mathrm{eff}} \frac{2RT}{F} \frac{\partial^2 \ln c_e}{\partial x^2} + j^{\mathrm{Li}} = 0, \quad \text{for } x \in (0, L) \qquad (45) \]

The Butler-Volmer (BV) kinetic formula is used to describe the rates of the Li-ion intercalation/deintercalation reactions for each electrode:

\[ j^{\mathrm{Li}} = a_s i_0 \left(c_s^{\max} - c_{se}\right)^{\alpha} \left(c_{se}\right)^{\alpha} \left[ \exp\!\left(\frac{\alpha_p F}{RT}\gamma\right) - \exp\!\left(\frac{-\alpha_n F}{RT}\gamma\right) \right], \quad \text{for } x \in (0, L) \qquad (46) \]

8.3 Appendix C - Filter Based Method Background


Kalman Filter

A ”Kalman filter” is an algorithm used to estimate the state of a dynamic system by incorporating
noisy measurements over time (Welch et al., 1995). It combines predictions from a mathematical
model of the system with real-time measurements to obtain an optimal estimate of the system’s true
state. The Kalman filter is widely used in various fields, including control systems, robotics, naviga-
tion, and signal processing (Welch et al., 1995).

The basic concept of a Kalman filter involves two main steps: the prediction step and the update
step. In the prediction step, the filter uses the system’s mathematical model and the previous state

estimate to predict the current state. The predicted state is then used as a prior estimate for the up-
date step. The filter incorporates sensor measurements in the update step to refine the state estimate.
It compares the predicted state with the measurements, considering the uncertainties associated with
the predictions and the measurements, to compute an optimal estimate of the current state.

The Kalman filter uses a recursive process, where each new measurement is incorporated into the
estimate and continuously refined as new measurements become available (see Algorithm 3). It maintains a belief
about the system’s state, represented by a probability distribution, and updates this belief based on
the incoming measurements and the system dynamics. The filter considers both the current mea-
surement and the past estimate’s accuracy to adjust the weight given to each source of information.

The strength of the Kalman filter lies in its ability to handle noisy measurements and model uncertain-
ties, providing an optimal estimate that minimises the mean squared error between the true state and
the estimated state. It balances incorporating measurements and relying on the system’s model pre-
dictions. However, the Kalman filter assumes linearity and Gaussian noise distributions, which may
limit its effectiveness in highly nonlinear systems or the presence of non-Gaussian noise. Extensions,
such as the extended Kalman filter or the unscented Kalman filter, address some of these limitations
and allow for modelling nonlinear systems.

Algorithm 3 Kalman Filter


1: Initialise the initial state estimate and covariance matrix.
2: for each time step do
3: Predict the new state estimate based on the system dynamics.
4: Predict the new covariance matrix based on the process noise.
5: Compute the Kalman-gain based on predicted covariance-matrix & measurement noise.
6: Update the state estimate based on the observed measurement.
7: Update the covariance matrix based on the Kalman gain and measurement noise.
8: end for
9: estimate the system state using the final state estimate.

The Kalman gain is a key component of the Kalman filter algorithm. It is a weighted factor used to determine the contribution of the measured data to the updated state estimate in the filter. The Kalman gain adjusts the balance between the predicted state
estimate and the measured data, considering the uncertainty in both (Welch et al., 1995).

Mathematically, the Kalman gain is calculated as the product of the predicted covariance matrix (rep-
resenting the uncertainty in the predicted state estimate) and the measurement matrix (relating the
measurements to the state space). The resulting product is then multiplied by the inverse sum of the
predicted covariance matrix and the measurement noise covariance matrix (Bishop et al., 2001).
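In standard notation (see, e.g., Welch et al., 1995), the relationship described above can be written as follows, where $P_k^-$ is the predicted (a priori) covariance, $H$ the measurement matrix, $R$ the measurement noise covariance, and $z_k$ the measurement:

\[ K_k = P_k^- H^{\top} \left( H P_k^- H^{\top} + R \right)^{-1} \]

\[ \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H \hat{x}_k^- \right), \qquad P_k = \left( I - K_k H \right) P_k^- \]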

The Kalman gain effectively determines the relative importance of the predicted state estimate and
the measured data in the updating step of the Kalman filter. If the predicted covariance is high com-
pared to the measurement noise covariance, the Kalman gain will be low, indicating that more weight
is given to the predicted state estimate. Conversely, if the measurement noise covariance is high
compared to the predicted covariance, the Kalman gain will be high, indicating that more weight is
given to the measured data (Bishop et al., 2001).

By adjusting the Kalman gain, the Kalman filter can balance the contributions of the predicted state
estimate and the measured data, resulting in an optimal estimate of the system state that minimises
the overall estimation error (Bishop et al., 2001).
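To make the recursion of Algorithm 3 concrete, a minimal scalar Kalman filter sketch is given below; the process and measurement noise settings and the sample readings are purely illustrative and unrelated to the battery data:

def kalman_1d(measurements, q=1e-3, r=0.5, x0=0.0, p0=1.0):
    """Scalar Kalman filter: constant-state model with process noise q and measurement noise r."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, uncertainty grows by the process noise
        p = p + q
        # Update: the Kalman gain weighs the prediction against the new measurement
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy readings around a true value of 3.7 (e.g. a cell voltage)
print(kalman_1d([3.9, 3.6, 3.8, 3.65, 3.72]))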

Particle Filter

A particle filter, also known as a Monte Carlo filter, is a probabilistic algorithm used for estimating
the state of a system in the presence of uncertainty (Djuric et al., 2003). It is advantageous when the
system being modelled exhibits nonlinear and non-Gaussian behaviour. The particle filter represents
the state estimate as a set of particles, where each particle represents a possible hypothesis of the
system’s state (Djuric et al., 2003).

The particle filter operates in a recursive manner, consisting of two main steps: the prediction step

and the update step (see Algorithm 4). In the prediction step, particles are propagated forward in time according to
a dynamic model that describes the system’s behaviour. The propagation process introduces uncer-
tainty, and particles are spread out to account for potential state variations (Djuric et al., 2003).

In the update step, the particles are re-weighted based on the likelihood of the observed measure-
ments given their respective states. The weights of the particles represent the probability of each
particle being the true state of the system. The update is performed by comparing the predicted mea-
surements generated by each particle with the actual measurements obtained from sensors. Parti-
cles that generate measurements closer to the observed measurements are assigned higher weights,
while particles that generate less accurate measurements receive lower weights (Djuric et al., 2003).

After the update step, re-sampling is typically performed to select a new set of particles for the next
iteration. The re-sampling process favours particles with higher weights, increasing their represen-
tation in the particle set. This helps concentrate the particles around the more probable states and
discard particles less likely to represent the actual state (Djuric et al., 2003).

Repeating the prediction, update, and re-sampling steps over time, the particle filter estimates the
system’s state that converges to the true state as more measurements are incorporated. The par-
ticle filter is particularly advantageous in situations where the system dynamics are nonlinear, and
the noise is non-Gaussian. It can handle multi-modal distributions and track multiple hypotheses si-
multaneously. However, the particle filter’s computational complexity increases with the number of
particles, and selecting an appropriate number of particles is crucial to balance estimation accuracy
and computational efficiency (Djuric et al., 2003). Additional variants of the Kalman filter:

Algorithm 4 Particle Filter


1: Initialise particles with random states.
2: for each time step do
3: Predict the new state of each particle based on the system dynamics.
4: Update the weight of each particle based on the likelihood of observed measurements.
5: Re-sample particles based on their weights to obtain a new set of particles.
6: end for
7: Estimate the system state using the weighted average of the particles.
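Analogously, a minimal, illustrative sketch of Algorithm 4 for a one-dimensional random-walk state is shown below; the particle count, noise levels and measurement values are arbitrary demonstration choices:

import numpy as np

def particle_filter(measurements, n_particles=500, process_std=0.05, meas_std=0.2):
    """Bootstrap particle filter for a scalar random-walk state."""
    rng = np.random.default_rng(0)
    particles = rng.normal(0.0, 1.0, n_particles)      # initial hypotheses of the state
    estimates = []
    for z in measurements:
        # Predict: propagate each particle through the (random-walk) dynamics
        particles += rng.normal(0.0, process_std, n_particles)
        # Update: weight particles by the likelihood of the observed measurement
        weights = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
        weights /= weights.sum()
        # Resample: favour particles with high weights
        particles = rng.choice(particles, size=n_particles, p=weights)
        estimates.append(particles.mean())
    return estimates

print(particle_filter([0.1, 0.15, 0.2, 0.22, 0.3]))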

The main differences between the Adaptive Kalman Filter (AKF), Extended Kalman Filter (EKF), and
Unscented Kalman Filter (UKF) lie in their approaches to handle non-linearity and adaptability in the
estimation process:

1. Adaptive Kalman Filter (AKF): The AKF is an extension of the standard Kalman Filter that in-
corporates adaptive mechanisms to update the filter’s parameters and covariance matrices based on
the observed data. It adjusts the model and measurement noise covariances to adapt to changing
system dynamics and uncertainties (Hu et al., 2003). The AKF allows the filter to dynamically ad-
just its estimation process better to handle variations in the system and measurement characteristics.
This adaptability can improve estimation accuracy when system dynamics or noise statistics change
over time (Hu et al., 2003).

2. Extended Kalman Filter (EKF): The EKF is a nonlinear extension of the Kalman Filter that approx-
imates the system dynamics and measurement functions using first-order Taylor series expansions
(linearisations). It linearises the nonlinear functions around the current estimated state to apply the
standard Kalman Filter equations (Einicke and White, 1999). The EKF is suitable for systems with
moderately nonlinear dynamics. However, the linearisation process introduces approximation errors,
and the filter’s performance heavily relies on the accuracy of the linearisation. If the non-linearities
are significant, the EKF may provide suboptimal estimates(Einicke and White, 1999)..

3. Unscented Kalman Filter (UKF): The UKF is another nonlinear extension of the Kalman Filter
that addresses the limitations of linearisation in the EKF. Instead of linearising the nonlinear functions,
the UKF employs an unscented transform to sample representative points (sigma points) around the
mean estimate (Wan and Van Der Merwe, 2001). These sigma points are then propagated through
the nonlinear functions to capture the statistical properties of the system’s state distribution. By avoid-
ing explicit linearisation, the UKF provides a more accurate approximation of the nonlinear functions

and performs better than the EKF for highly nonlinear systems (Wan and Van Der Merwe, 2001).

8.4 Appendix D - LIB charge/discharge policies explained further


The following appendix section is here to explain further charging modes of LIB:

1. C-rate: The C-rate measures the rate at which a battery is discharged relative to its maximum
capacity. A 1C rate means that the discharge current will discharge the entire battery in 1 hour.
For a battery with a capacity of 1000 mAh, a discharge rate of 1C would be 1000 mA, 2C would
be 2000 mA, and so on. Conversely, charging at a 1C rate would charge the entire battery in
one hour Plett (2015).

2. CC-CV (Constant Current-Constant Voltage): This is a two-stage charging process widely


used for lithium-ion batteries.
- Constant Current (CC) Phase: During the first phase, the charger applies a constant current
to the battery, gradually raising the voltage. The CC phase allows the battery to be charged
quickly as it can take more current without damaging the battery Plett (2015).
- Constant Voltage (CV) Phase: After reaching a certain voltage threshold, the charger switches
to a constant voltage phase. Here, the current gradually decreases as the battery becomes
more charged. This phase ensures that the battery is charged fully without exceeding the max-
imum voltage, which could cause damage Plett (2015).
3. CP-CV (Constant power-Constant Voltage): This is a modification of the CC-CV charging
strategy, often used for higher efficiency in some applications.
- Constant Power (CP) Phase: Unlike constant current, where the current remains fixed, the
constant power phase adjusts the current to maintain a constant power delivery as the voltage
increases. Depending on the specific battery chemistry and condition, this can provide a more
optimised charging curve Plett (2015).
- Constant Voltage (CV) Phase: Like the CC-CV mode, the constant voltage phase follows,
where the voltage is held constant, and the current decreases to prevent overcharging.

8.5 Appendix E - Pseudo code for best performing models

Algorithm 5 Hyperparameter Optimization Transformer-LSTM

1: Constructor(windowSize): set self.window_size = windowSize
2: function build(hp):
3:     Set hyperparameters using hp, including: embed_dim, num_heads, ff_dim
4:     Define Inputs: qdlin_in, tdlin_in, ir_in, dt_in, qd_in
5:     Concatenate qdlin_in and tdlin_in
6:     Apply TimeDistributed dense layer and dropout
7:     Flatten the results
8:     Concatenate with other inputs
9:     Define TransformerBlock and MultiHeadSelfAttention classes
10:    Create transformer block using embed_dim, num_heads, ff_dim
11:    Apply transformer to concatenated inputs
12:    Apply dropout and LSTM layer
13:    Add dense layers
14:    Compile the model with loss, optimiser, and metrics
15:    return model

Algorithm 6 Hyperparameter Optimization for CLDNN

Class MyHyperModel:
    Initialise hyperparameter options for learning rate, filters, kernel, stride, activations, dropout rates, and optimiser.
    Function build(hp):
        Choose hyperparameters using hp.Choice, hp.Int, and hp.Float
        Define Input layers for qdlin_in, tdlin_in, ir_in, dt_in, and qd_in
        Concatenate detailed features for qdlin_in and tdlin_in
        Construct CNN layers:
            Apply Conv1D with TimeDistributed on concatenated input
            Apply MaxPooling1D with TimeDistributed
            Repeat the above two steps twice more, increasing filters by a factor of 2 each time
            Flatten the output
            Apply Dropout
        Concatenate flattened CNN output with ir_in, dt_in, and qd_in
        Construct LSTM layers:
            Apply LSTM with chosen units and activation
            Apply Dropout
        Construct Dense layers:
            Apply Dense layer with chosen units and activation
            Apply final Dense layer for output with relu_cut activation
        Model compilation:
            Select optimiser based on hyperparameter choice
            Define metrics list: mae_remaining_cycles, mape_remaining_cycles, and mse_remaining_cycles
            Compile model with 'mse' loss, chosen optimiser, and metrics
        return compiled model

8.6 Appendix F - Introduction To Liquid Neutral Networks


A brief Overview of Liquid Neural Network

Introduced by the researchers at the pioneering Computer Science and Artificial Intelligence Lab
(CSAIL) at the Massachusetts Institute of Technology, a groundbreaking work has been achieved by
Hasani et al. (2021) in the field of neural networks. The work encompasses several significant ad-
vancements:

1.Time-Continuous Recurrent Neural Networks : Traditional RNNs operate on discrete time steps,

processing input sequences one step at a time (Karpathy, 2015). Here, the researchers are working
with models that operate in continuous time, which could allow for smoother transitions and potentially
more accurate modelling of real-world phenomena.

2.Linear First-Order Dynamical Systems with Nonlinear Gates : Instead of using nonlinearities
implicitly (which is more common in neural networks), the authors are building their models with linear
first-order dynamical systems. These systems describe how some variable changes over time, using
linear equations. The authors then modulate these linear systems with nonlinear gates. This com-
bination allows for complex dynamics while retaining some of the mathematical tractability of linear
systems (Hasani, 2020).

3.Liquid Time-Constants: Traditional dynamical systems have fixed time-constants, which deter-
mine how quickly a system responds to changes. In this model, the time-constants can vary, meaning
they are ”liquid” and can adapt to the hidden state of the network (Hasani et al., 2021). This flexibility
could make the system more adaptable to different kinds of data.

4.Numerical Differential Equation Solvers : To compute the output of these networks, the authors
are using numerical methods to solve the associated differential equations. This is in line with their
continuous-time formulation (Xiao et al., 2023).

5.Stable and Bounded Behaviour: This likely means that the network’s responses are controlled
and won’t lead to runaway feedback or instability.

6.Superior Expressivity in Neural Ordinary Differential Equations: The networks appear to pro-
vide better performance within the family of neural ODEs. Neural Ordinary Differential Equations
(ODEs) are a type of neural network where the dynamics of the network are defined by a continuous-
time differential equation (Xiao et al., 2023). The claim here is that this new model offers improved
capabilities in this domain.

7.Improved Performance on Time-Series Prediction: The authors claim that their Liquid Time-
Constant Networks (LTCs) perform better than both classical and modern RNNs on time-series pre-
diction tasks.

8. Theoretical Approach and Experimental Verification: Hasani et al. (2021) do not just make these claims; they take a two-step approach to verify them. They first analyse the networks
mathematically, finding bounds on their dynamics and computing their expressive power. Then, they
conduct a series of experiments on time-series prediction tasks to demonstrate the capabilities of
these networks.

To sum up, the authors are presenting an innovative form of continuous-time Recurrent Neural Net-
work (RNN) that incorporates linear first-order dynamical systems controlled by nonlinear gates.
These networks exhibit adjustable time-constants and a reduction in the number of parameters.
Importantly, they exhibit excellent performance in both theoretical evaluations and real-world trials.
These characteristics hold promise, particularly in the realm of Remaining Useful Life (RUL) predic-
tion. The integration of physics-based and data-driven models could potentially enhance accuracy in
such applications.

8.7 Appendix G - The training loss plotted for all the temporal models

Figure 25: Training loss (MSE) of each of the models

Figure 26: Training loss (MSE) of each of the models plotted

8.8 Appendix H - Source code explanation, Software and Hardware


The following appendix section describes the code associated with the project (see Table 5):

Explaining the folders in the repository: Included within these folders are numerous Jupyter notebooks which contain the investigated hybrid models. It is worth noting that, among these notebooks, the ones labelled CLDNN_OPTIMISED and TRANSFORMER_LSTM_OPTIMISED stood out as the most successful approaches in terms of performance.

Table 5: Explaining the code base

Folder                         Description
checkpoints                    Contains the checkpoints of the models as well as Hierarchical Data Format files.
data_preprocessing             Contains the MATLAB files with the code for cleaning up the data and each of its salient features.
hyperparameter_optimisation    Hyper-parameter optimisation with each of the trials for the models.
initial                        Downloading the raw data and structuring it for data preprocessing.
logs                           The logs for some of the models explored.
tfrecords                      The cleaned-up data, split into the train and test sets following the data preprocessing.
utils                          Helper files, including the constant parameters, custom cost function, formatting, and the loading of the data from the tfrecords folder.
wandb                          Saving the weights using the Weights and Biases platform for the models.

The Readme file in the GitLab repository contains the link to a Google Drive folder where the entire code base is stored (note: a Google account is needed to access the public Google Drive). This was created because the repository exceeds the 100 MB version-control limit for each push on GitLab, so that all the code can still be found in one place.

Git Lab Repository: https://git.cs.bham.ac.uk/projects-2022-23/mea228

Google Drive Link: https://drive.google.com/drive/folders/1iO0UhoPYrc13-dIOAM7oU-VO-pud0MQH

Software and hardware used: Python 3.10, TensorFlow 2.12.0, WandB, and MATLAB; the models were trained using an NVIDIA A100 GPU and a Tesla V100-SXM2-16GB GPU.

References
C. R. Birkl, M. R. Roberts, E. McTurk, P. G. Bruce and D. A. Howey, Journal of Power Sources, 2017,
341, 373–386.
L. Xu, Z. Deng, Y. Xie, X. Lin and X. Hu, IEEE Transactions on Transportation Electrification, 2023, 9,
2628–2644.
D. f. T. Office, Reducing emissions from road transport: Road to
Zero Strategy, 2018, https://www.gov.uk/government/publications/
reducing-emissions-from-road-transport-road-to-zero-strategy.

G. L. Plett, Battery management systems. 1 Battery modeling / Gregory L. Plett, Boston Artech House,
2015.
G. Berckmans, M. Messagie, J. Smekens, N. Omar, L. Vanhaverbeke and J. Van Mierlo, Energies,
2017, 10, 1314.
P. A. Eisenstein, EV Battery Fires: What Consumers Should Know, 2021, https://www.forbes.com/
wheels/news/battery-car-fires/#:~:text=Chevy%20is%20the%20latest%20manufacturer%
20to%20recall%20its.
J. Chatzakis, K. Kalaitzakis, N. C. Voulgaris and S. N. Manias, IEEE transactions on Industrial Elec-
tronics, 2003, 50, 990–999.

H. Pinegar and Y. R. Smith, Journal of Sustainable Metallurgy, 2019, 5, 402–416.


A. Ng, Andrew Ng - Courses, 2009, http://www.robotics.stanford.edu/~ang/courses.html#:~:
text=Machine%20learning%20is%20the%20science.
I. Goodfellow, Y. Bengio and A. Courville, Deep Learning (Adaptive Computation and Machine Learn-
ing series), The MIT Press, Illustrated edn., 2016.

S. Raschka and V. Mirjalili, Python Machine Learning: Machine Learning and Deep Learning with
Python, 2019.
C. Lemaréchal, Doc Math Extra, 2012, 251, 10.
F. Rosenblatt, Psychological review, 1958, 65, 386.

Y. LeCun, Y. Bengio and G. Hinton, nature, 2015, 521, 436–444.


Y. LeCun, Y. Bengio et al., The handbook of brain theory and neural networks, 1995, 3361, 1995.
I. Sutskever and O. Vinyals, Sequence to sequence learning with neural networks - arxiv.org, 2014,
https://arxiv.org/pdf/1409.3215.pdf.

A. Karpathy, The Unreasonable Effectiveness of Recurrent Neural Networks, 2015, http://


karpathy.github.io/2015/05/21/rnn-effectiveness/.
Y. Bengio, P. Simard and P. Frasconi, IEEE transactions on neural networks, 1994, 5, 157–166.
S. Hochreiter, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 1998,
6, 107–116.
S. Hochreiter and J. Schmidhuber, Neural Computation, 1997, 9, 1735–1780.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin,
Advances in neural information processing systems, 2017, 30.

J. Fang, Y. Yu, C. Zhao and J. Zhou, Proceedings of the 26th ACM SIGPLAN Symposium on Principles
and Practice of Parallel Programming, 2021, pp. 389–402.
D. Rothman, TRANSFORMERS FOR NATURAL LANGUAGE PROCESSING - : build, train, and fine-
tune deep neural network... architectures for nlp with python, pytorch, tensor., Packt Publishing
Limited, 2022.

B. Yang, J. Wang, P. Cao, T. Zhu, H. Shu, J. Chen, J. Zhang and J. Zhu, Journal of Energy Storage,
2021, 39, 102572.
X. Hu, C. Zou, C. Zhang and Y. Li, IEEE Power and Energy Magazine, 2017, 15, 20–31.
C. Mikolajczak, M. Kahn, K. White and R. T. Long, Lithium-ion batteries hazard and use assessment,
Springer Science & Business Media, 2012.
S. Wang, K. Liu, Y. Wang, D.-I. Stroe, C. Fernandez and J. M. Guerrero, Multidimensional Lithium-Ion
Battery Status Monitoring, CRC Press, 2022.
B. Scrosati, J. Hassoun and Y.-K. Sun, Energy & Environmental Science, 2011, 4, 3287–3295.
Y. Li, K. Liu, A. M. Foley, A. Zülke, M. Berecibar, E. Nanini-Maury, J. Van Mierlo and H. E. Hoster,
Renewable and Sustainable Energy Reviews, 2019, 113, 109254.
A. Farmann, W. Waag, A. Marongiu and D. U. Sauer, Journal of Power Sources, 2015, 281, 114–130.
L. Lu, X. Han, J. Li, J. Hua and M. Ouyang, Journal of Power Sources, 2013, 226, 272–288.
J. Zhang and J. Lee, Journal of Power Sources, 2011, 196, 6007–6014.
H. Wenzl, I. Baring-Gould, R. Kaiser, B. Y. Liaw, P. Lundsager, J. Manwell, A. Ruddell and V. Svoboda, Journal of Power Sources, 2005, 144, 373–384.
M. R. Palacín, Chemical Society Reviews, 2018, 47, 4924–4933.
X. Yu and A. Manthiram, Energy & Environmental Science, 2018, 11, 527–543.
A. Barré, B. Deguilhem, S. Grolleau, M. Gérard, F. Suard and D. Riu, Journal of Power Sources, 2013,
241, 680–689.
C. R. Birkl, M. R. Roberts, E. McTurk, P. G. Bruce and D. A. Howey, Journal of Power Sources, 2017,
341, 373–386.
S. Grolleau, A. Delaille, H. Gualous, P. Gyan, R. Revel, J. Bernard, E. Redondo-Iglesias, J. Peter and
S. Network, Journal of Power Sources, 2014, 255, 450–458.
J. Vetter, P. Novák, M. R. Wagner, C. Veit, K.-C. Möller, J. Besenhard, M. Winter, M. Wohlfahrt-
Mehrens, C. Vogler and A. Hammouche, Journal of Power Sources, 2005, 147, 269–281.
D. P. Finegan, M. Scheel, J. B. Robinson, B. Tjaden, I. Hunt, T. J. Mason, J. Millichamp, M. Di Michiel,
G. J. Offer, G. Hinds et al., Nature Communications, 2015, 6, 6924.
K. Qian, Y. Li, Y.-B. He, D. Liu, Y. Zheng, D. Luo, B. Li and F. Kang, RSC Advances, 2016, 6, 76897–
76904.
J. Cannarella and C. B. Arnold, Journal of Power Sources, 2013, 226, 149–155.
X. Han, M. Ouyang, L. Lu and J. Li, Journal of Power Sources, 2015, 278, 814–825.
E. J. Dickinson and A. J. Wain, Journal of Electroanalytical Chemistry, 2020, 872, 114145.
V. Ramadesigan, V. Boovaragavan, J. C. Pirkle and V. R. Subramanian, Journal of The Electrochemi-
cal Society, 2010, 157, A854.
C. Lin and A. Tang, Energy Procedia, 2016, 104, 68–73.
H.-G. Schweiger, O. Obeidi, O. Komesker, A. Raschke, M. Schiemann, C. Zehner, M. Gehnen,
M. Keller and P. Birke, Sensors, 2010, 10, 5604–5625.
P. Lyu, X. Liu, J. Qu, J. Zhao, Y. Huo, Z. Qu and Z. Rao, Energy Storage Materials, 2020, 31, 195–220.
O. Tremblay, L.-A. Dessaint and A. Dekkiche, 2007 IEEE Vehicle Power and Propulsion Conference,
2007, 284–289.
X. Zhao, Y. Wang, Z. Sahinoglu, T. Wada, S. Hara and R. Callafon, 2014, pp. 2779–2785.
J. Meng, G. Luo, M. Ricco, M. Swierczynski, D.-I. Stroe and R. Teodorescu, Applied Sciences, 2018, 8.
W. Song, D. Wu, W. Shen and B. Boulet, Procedia Computer Science, 2023, 217, 1830–1838.
Y. Tian, C. Chen, B. Xia, W. Sun, Z. Xu and W. Zheng, Energies, 2014, 7, 5995–6012.
M. Ouyang, G. Liu, L. Lu, J. Li and X. Han, Journal of Power Sources, 2014, 270, 221–237.
S. Wang, J. Zhang, O. Gharbi et al., Nat Rev Methods Primers, 2021, 1, 41.
M. S. B. Yahia, H. Allagui and A. Mami, 2016 7th International Renewable Energy Congress (IREC),
2016, 1–6.
N. E. Tolouei, S. Ghamari and M. Shavezipur, Journal of Electroanalytical Chemistry, 2020, 878,
114598.
G. Plett, Journal of Power Sources, 2006, 161, 1356–1368.
C. Hu, B. D. Youn and J. Chung, Applied Energy, 2012, 92, 694–704.
H. Dai, X. Wei, Z. Sun, J. Wang and W. Gu, Applied Energy, 2012, 95, 227–237.
P. Spagnol, S. Rossi and S. M. Savaresi, 2011 IEEE International Conference on Control Applications
(CCA), 2011, pp. 587–592.
M. El-Dalahmeh, M. Al-Greer, M. El-Dalahmeh and I. Bashir, Measurement, 2023, 214, 112838.
B. Lambert, A Student’s Guide to Bayesian Statistics, SAGE, 2018.
J. C. A. Anton, P. J. G. Nieto, C. B. Viejo and J. A. V. Vilán, IEEE Transactions on Power Electronics, 2013, 28, 5919–5926.
Z. Deng, X. Hu, X. Lin, Y. Che, L. Xu and W. Guo, Energy, 2020, 205, 118000.
Y. Li, K. Liu, A. M. Foley, A. Zülke, M. Berecibar, E. Nanini-Maury, J. Van Mierlo and H. E. Hoster, Renewable and Sustainable Energy Reviews, 2019, 113, 109254.
Z. Wu, Y. Xiong, S. X. Yu and D. Lin, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3733–3742.
A. Gijsberts and G. Metta, Neural Networks, 2013, 41, 59–69.
X. Hu, J. Jiang, D. Cao and B. Egardt, IEEE Transactions on Industrial Electronics, 2016, 63, 2645–
2656.
M. A. Patil, P. Tagade, K. S. Hariharan, S. M. Kolake, T. Song, T. Yeo and S. Doo, Applied Energy, 2015, 159, 285–297.
S.-L. Ho, M. Xie and T. N. Goh, Computers & Industrial Engineering, 2002, 42, 371–375.
S. Hochreiter and J. Schmidhuber, Neural Computation, 1997, 9, 1735–1780.
M. S. H. Lipu, M. A. Hannan, A. Hussain, M. H. M. Saad, A. Ayob and F. Blaabjerg, IEEE Access,
2018, 6, 28150–28161.
NASA, Li-ion Battery Aging Datasets, https://data.nasa.gov/dataset/Li-ion-Battery-Aging-Datasets/uj5r-zjdb.
CALCE, Center for Advanced Life Cycle Engineering battery data, https://calce.umd.edu/battery-data.
I. Bashir, M. Al-Greer et al., 2022 57th International Universities Power Engineering Conference
(UPEC), 2022, pp. 1–6.
E. Chemali, P. J. Kollmeyer, M. Preindl, R. Ahmed and A. Emadi, IEEE Transactions on Industrial
Electronics, 2018, 65, 6730–6739.
S. Shen, M. Sadoughi, X. Chen, M. Hong and C. Hu, Journal of Energy Storage, 2019.
D. N. How, M. A. Hannan, M. S. H. Lipu, K. S. Sahari, P. J. Ker and K. M. Muttaqi, IEEE Transactions
on Industry Applications, 2020, 56, 5565–5574.
D. Zhou, Z. Li, J. Zhu, H. Zhang and L. Hou, IEEE Access, 2020, 8, 53307–53320.
L. Ren, J. Dong, X. Wang, Z. Meng, L. Zhao and M. J. Deen, IEEE Transactions on Industrial Infor-
matics, 2020, 17, 3478–3487.
Y. Fan, F. Xiao, C. Li, G. Yang and X. Tang, Journal of Energy Storage, 2020, 32, 101741.
P. Venugopal, Energies, 2019, 12, 4338.
Z. Fei, Z. Zhang, F. Yang and K.-L. Tsui, Journal of Energy Storage, 2023, 62, 106903.
J. K. Barillas, J. Li, C. Günther and M. A. Danzer, Applied Energy, 2015, 155, 455–462.
S. R. Hashemi, A. Bahadoran Baghbadorani, R. Esmaeeli, A. Mahajan and S. Farhad, International Journal of Energy Research, 2021, 45, 5747–5765.
L. Ren, L. Zhao, S. Hong, S. Zhao, H. Wang and L. Zhang, IEEE Access, 2018, 6, 50587–50598.
TRI, Toyota Research Institute, https://data.matr.io/.
H. Aoyama, Annals of the Institute of Statistical Mathematics, 1954, 6, 1–36.
K. A. Severson, P. M. Attia, N. Jin, N. Perkins, B. Jiang, Z. Yang, M. H. Chen, M. Aykol, P. K. Herring,
D. Fraggedakis et al., Nature Energy, 2019, 4, 383–391.
M. Sarfraz, Computers & Graphics, 2003, 27, 107–121.
Y. Wang, D. Watanabe, E. Hirata and S. Toriumi, Journal of Marine Science and Engineering, 2021, 9.
MathWorks, Cubic Spline Interpolation, https://uk.mathworks.com/help/curvefit/cubic-spline-interpolation.html.
Keras Team, Keras documentation: BayesianOptimization Tuner, https://keras.io/api/keras_tuner/tuners/bayesian/.
K.-L. Hsu, H. V. Gupta and S. Sorooshian, Water Resources Research, 1995, 31, 2517–2530.
B. Jang, M. Kim, G. Harerimana, S.-u. Kang and J. W. Kim, Applied Sciences, 2020, 10, 5841.
G. E. Hinton, Scholarpedia, 2009, 4, 5947.
M. Collier and J. Beel, Artificial Neural Networks and Machine Learning–ICANN 2018: 27th Interna-
tional Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings,
Part III 27, 2018, pp. 94–104.
H. J. Kappen and F. Rodríguez, Neural Computation, 1998, 10, 1137–1156.
V. R. Subramanian, V. Boovaragavan and V. D. Diwakar, Electrochemical and Solid-State Letters,
2007, 10, A255.
G. Welch and G. Bishop, An Introduction to the Kalman Filter, University of North Carolina at Chapel Hill, 1995.
G. Bishop and G. Welch, Proc. of SIGGRAPH, Course 8, 2001, 41.
P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo and J. Miguez, IEEE Signal Processing Magazine, 2003, 20, 19–38.
C. Hu, W. Chen, Y. Chen, D. Liu et al., Journal of Global Positioning Systems, 2003, 2, 42–47.
G. A. Einicke and L. B. White, IEEE Transactions on Signal Processing, 1999, 47, 2596–2599.
E. A. Wan and R. van der Merwe, in Kalman Filtering and Neural Networks, 2001, pp. 221–280.
R. Hasani, M. Lechner, A. Amini, D. Rus and R. Grosu, Proceedings of the AAAI Conference on
Artificial Intelligence, 2021, pp. 7657–7666.
R. Hasani, Ph.D. thesis, TU Wien, 2020.
W. Xiao, T.-H. Wang, R. Hasani, M. Lechner, Y. Ban, C. Gan and D. Rus, International Conference on Machine Learning, 2023, pp. 38100–38124.