
Advancing Energy Demand Prediction: The Role of Location-Enhanced N-HiTS and TimesNet Models

A Project Report submitted in partial fulfilment of the requirements for the award of
the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by

Singireddy Laxmi Prasad Reddy - HU21CSEN0100592


Y. Haritha - HU21CSEN0101210
Movva Likitha - HU21CSEN0101383
Narasimha Shravan Kotra - HU21CSEN0101769

Under the esteemed guidance of

Dr. Sarat Chandra Nayak


Professor

Department of Computer Science and Engineering


GITAM School of Technology
GITAM (Deemed to be University)
HYDERABAD
March - 2025
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GITAM SCHOOL OF TECHNOLOGY
GITAM
(Deemed to be University)

DECLARATION

I/We, hereby declare that the project report entitled “Advancing Energy Demand
Prediction: The Role of Location-Enhanced N-HiTS and TimesNet Models” is an
original work done in the Department of Computer Science and Engineering, GITAM
School of Technology, GITAM (Deemed to be University) submitted in partial
fulfilment of the requirements for the award of the degree of B.Tech. in Computer
Science and Engineering. The work has not been submitted to any other college or
University for the award of any degree or diploma.

Date: 14-03-2025

Registration Numbers Names Signatures

HU21CSEN0100592 Singireddy Laxmi Prasad Reddy

HU21CSEN0101210 Y. Haritha

HU21CSEN0101383 Movva Likitha

HU21CSEN0101769 Narasimha Shravan Kotra

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GITAM SCHOOL OF TECHNOLOGY
GITAM
(Deemed to be University)

CERTIFICATE

This is to certify that the project report entitled “Advancing Energy Demand
Prediction: The Role of Location-Enhanced N-HiTS and TimesNet Models” is
a bonafide record of work carried out by S. Laxmi Prasad (HU21CSEN0100592),
Y. Haritha (HU21CSEN0101210), M. Likitha (HU21CSEN0101383), K.
Narasimha Shravan (HU21CSEN0101769), students, submitted in partial fulfilment
of the requirements for the award of the degree of Bachelor of Technology in
Computer Science and Engineering.

Project Guide: Dr. Sarat Chandra Nayak, Professor, Dept. of CSE

Project Coordinator: Dr. A B Pradeep Kumar, Assistant Professor, Dept. of CSE

Head of the Department: Mahaboob Shaik Basha, Professor and H.O.D, Dept. of CSE

ACKNOWLEDGEMENT

Our project report would not have been successful without the help of several people. We would
like to thank everyone who contributed to our project in numerous ways and gave us outstanding
support from its inception.

We are extremely thankful to our honourable Pro-Vice-Chancellor, Prof. D. Sambasiva Rao,


for providing the necessary infrastructure and resources for the accomplishment of our project.
We also express our sincere appreciation to Prof. N. Seetharamaiah, Associate Director, School
of Technology, for his support during the tenure of the project.

We are very grateful to Prof. Dr. Mahaboob Shaik Basha, Head of the Department of Computer
Science & Engineering, for providing the opportunity to undertake this project and for his
encouragement throughout its completion.

We hereby wish to express our deep sense of gratitude to Dr. A B Pradeep Kumar, Project
Coordinator, Department of Computer Science and Engineering, School of Technology and to
our guide, Dr. Sarat Chandra Nayak, Professor, Department of Computer Science and
Engineering, School of Technology for the esteemed guidance, moral support and invaluable
advice provided by them for the success of the project report.

We are also thankful to all the Computer Science and Engineering department staff members
who have cooperated in making our project a success. Finally, we acknowledge the unwavering
support of our parents and friends, whose encouragement played a vital role in this endeavour.

Sincerely,

S. Laxmi Prasad - HU21CSEN0100592

Y. Haritha - HU21CSEN0101210

M. Likitha - HU21CSEN0101383

K. Shravan - HU21CSEN0101769

ABSTRACT

Energy consumption forecasting plays a critical role in ensuring the efficient management and
distribution of energy resources. Accurate predictions help utility companies, grid operators, and
governments plan ahead, ensuring that energy supply meets demand without interruptions. This study
addresses the limitations of traditional energy forecasting models, such as CNN, LSTM, and hybrid
techniques, which struggle to capture complex temporal patterns and long-term dependencies in
energy consumption data. These models often fail to account for the richness and diversity of
contemporary time series data, making accurate predictions challenging. To overcome these
challenges, this work introduces N-HiTS and TimesNet, advanced deep learning models that are better
able to handle intricate time series data. The models are enhanced with an exogenous variable and a
multivariate feature, allowing them to better capture the impact of spatial variations in energy
consumption. This novel approach significantly improves forecasting accuracy, reduces the need for
extensive feature engineering, and offers a scalable solution for energy forecasting.

Table of Contents

CHAPTER 1: INTRODUCTION 1
1.1 ENERGY CONSUMPTION IN URBAN AND RURAL AREAS 1

1.1.1 Urban Energy Consumption Trends 1


1.1.2 Rural Energy Consumption Trends 2
1.2 ENERGY CONSUMPTION FORECASTING 2

1.3 MACHINE LEARNING AND DEEP LEARNING-BASED ENERGY FORECASTING 3

CHAPTER 2: LITERATURE REVIEW 5


CHAPTER 3: SYSTEM ANALYSIS AND DESIGN 17
3.1 SYSTEM ANALYSIS 17

3.1.1 Problem Definition 17


3.1.2 Project Objectives 18
3.1.3 Scope & Constraints 19
3.2 FUNCTIONAL REQUIREMENTS 20

3.2.1 Data Ingestion & Preprocessing 20


3.2.2 Model Training & Forecasting 20
3.2.3 Comparison & Evaluation 20
3.2.4 Visualization & Reporting 20
3.3 NON-FUNCTIONAL REQUIREMENTS 20

3.4 SYSTEM DESIGN 21

3.4.1 System Architecture 21


3.4.2 Component Design 21

CHAPTER 4: METHODOLOGY 22
4.1 DATASET DESCRIPTION 22

4.2 DATASET PREPROCESSING 22

4.2.1 AGGREGATION AND TRANSFORMATION 23


4.2.2 MULTIVARIATE ANALYSIS AND EXOGENOUS VARIABLE 24
4.3 ARCHITECTURE ANALYSIS 25

4.3.1 N-BeatsX 25
4.3.2 N-HiTS 26
4.4 PROPOSED METHODOLOGY 28

4.4.1 N-HiTSX 28
4.4.2 Hybrid N-HiTS Block 30
4.4.3 TimesNet 32

CHAPTER 5: EXPERIMENTAL RESULTS 35


5.1. PERFORMANCE METRICS 35

5.1.1 Root Mean Squared Error (RMSE) 35


5.1.2 Mean Absolute Error (MAE) 36
5.1.3 Mean Squared Error (MSE) 36
5.1.4 Mean Absolute Percentage Error (MAPE) 36
5.2 MODEL’S PERFORMANCE 37

CHAPTER 6: CONCLUSION 44
CHAPTER 7: FUTURE WORK 45
REFERENCES 46

CHAPTER 1: INTRODUCTION

1.1 ENERGY CONSUMPTION IN URBAN AND RURAL AREAS

Energy is the cornerstone of modern society, driving industrial growth, economic


development, and improving quality of life. As populations expand, especially in developing
regions like Telangana, India, the demand for energy rises exponentially. Energy consumption
is particularly critical when we consider the domestic sector, which includes residential
electricity usage. Urban areas, characterized by higher population density, modern
infrastructure, and industrial activity, tend to consume more energy per capita than rural
regions. However, rural areas, especially in emerging economies, are witnessing rapid growth
in energy demand due to increased electrification, modernization of households, and policy-
driven efforts to improve energy access.

1.1.1 Urban Energy Consumption Trends

Urban areas are characterized by higher population density, advanced infrastructure, and a
concentration of industrial and commercial activities, all of which contribute to their higher
per capita energy consumption. In cities, a greater reliance on modern appliances, electronic
gadgets, air conditioning systems, and public transportation networks leads to a significant
energy footprint. The expansion of smart cities, IT hubs, and high-rise residential complexes
further drives electricity demand. In Telangana, the urban regions contribute approximately
65% of the state's domestic energy consumption. This is largely driven by rapid urbanization,
industrial growth, and the expanding service sector, which includes IT companies, shopping
malls, and entertainment centers that require continuous power supply.

Furthermore, urban energy consumption is influenced by lifestyle changes, increased


electrification of transport (such as electric vehicles), and the adoption of high-power-
consuming technologies. While urban centers generally have well-established electricity
infrastructure and a more stable power supply, they also face challenges such as energy
wastage, peak load management, and environmental concerns stemming from high
consumption levels.

1.1.2 Rural Energy Consumption Trends

Despite traditionally lower per capita energy consumption in rural areas, the demand is
witnessing a substantial increase due to policy-driven efforts aimed at rural electrification.
Government initiatives such as the Deendayal Upadhyaya Gram Jyoti Yojana (DDUGJY) and
the Saubhagya Scheme have significantly improved rural electricity access, leading to a surge in
energy usage. The rural sector, which accounts for about 35% of Telangana's total domestic
energy consumption, is experiencing notable growth in demand due to the modernization of
households, increased use of electrical appliances, and the establishment of small-scale
industries and agricultural electrification.

In many rural regions, the integration of renewable energy sources, such as solar power, is
becoming an essential part of energy supply strategies. Microgrid projects and decentralized
solar solutions are helping to bridge the gap where traditional grid infrastructure may be lacking
or unreliable. Additionally, as rural economies diversify beyond agriculture into small businesses
and manufacturing, the need for stable and affordable electricity continues to rise.

1.2 ENERGY CONSUMPTION FORECASTING

Energy consumption forecasting is a critical task for power systems planning, environmental
management, and economic strategy. Accurate forecasting enables energy providers,
policymakers, and grid operators to make informed decisions about capacity expansion,
resource allocation, and distribution management. Reliable demand predictions help prevent
electricity shortages, optimize infrastructure investments, and promote sustainable energy
usage by integrating renewable energy sources effectively into the power grid.

Traditional energy forecasting methods, such as statistical regression models and time-series
analysis, have been widely used to predict energy consumption based on historical data.
However, these classical approaches have significant limitations in addressing the
complexities of modern electricity demand. The advent of more dynamic, non-linear
consumption patterns—especially in domestic energy use—has highlighted the need for more
advanced forecasting techniques.

Several factors contribute to the unpredictability of energy consumption, including evolving


consumer behaviour, climate variability, and the integration of renewable energy sources. The
increasing adoption of smart appliances, electric vehicles, and demand-response programs
means that consumption patterns are no longer static. Additionally, climate fluctuations
significantly impact electricity demand, particularly for heating and cooling applications,
making traditional forecasting models less effective.

Renewable energy integration presents another challenge. Unlike conventional power plants
that operate at steady outputs, renewable energy sources such as solar and wind are highly
dependent on geographic location, weather patterns, and time of day. The intermittent nature
of renewable energy production can lead to fluctuations in power supply, further complicating
demand forecasting. Traditional models often struggle to account for these variables,
necessitating the use of advanced machine learning (ML) and artificial intelligence (AI)-
based forecasting methods.

To address these challenges, energy planners and researchers are increasingly turning to AI-
driven techniques such as deep learning, neural networks, and hybrid forecasting models.
These approaches can analyze vast amounts of historical and real-time data, identify hidden
patterns, and provide highly accurate demand predictions. By leveraging techniques like
Long Short-Term Memory (LSTM) networks, Transformer models, and ensemble learning,
energy forecasting models can adapt to shifting consumption trends and optimize grid
operations.

1.3 MACHINE LEARNING AND DEEP LEARNING-BASED ENERGY FORECASTING

With the rise of machine learning (ML) and deep learning (DL) methods, energy consumption
forecasting has entered a new era of sophistication. Unlike conventional approaches, ML/DL
models can handle vast amounts of data, model complex non-linear relationships, and
uncover hidden patterns that are often missed by statistical techniques. This makes them
particularly well-suited for forecasting energy demand in domestic usage, where consumption
patterns can vary significantly between households and across different regions like urban
and rural Telangana.

Among the most widely used ML techniques for energy forecasting are Decision Trees,
Random Forests, and Gradient Boosting Machines (GBMs), such as XGBoost and
LightGBM. These models are effective in handling structured data and can incorporate
multiple influencing factors like weather conditions, seasonal variations, and economic
indicators. However, while tree-based models excel at short-term forecasting and feature
importance analysis, they often struggle with capturing long-term dependencies in time series
data.

To address this limitation, deep learning models like Recurrent Neural Networks (RNNs),
Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs) have
gained prominence. These models are designed to recognize sequential patterns in time series
data, making them ideal for energy forecasting. LSTM and GRU networks, in particular, are
capable of capturing long-term dependencies while mitigating the vanishing gradient
problem, which is a common issue in traditional RNNs.

Hybrid techniques, such as ARIMA-LSTM or kCNN-LSTM, have also emerged to enhance


predictive accuracy by combining the strengths of traditional and deep learning models. For
example, ARIMA handles linear components, while LSTM addresses non-linearities. These
hybrid models have shown promising results, especially in handling multivariate time series
forecasting, but they often require large datasets and extensive computational resources.
Additionally, deep learning models may struggle with interpretability, making them less
transparent compared to traditional statistical methods.

To overcome these challenges, advanced deep learning models like N-HiTS and TimesNet
have been developed. N-HiTS, a neural hierarchical interpolation for time series, offers
interpretable and scalable forecasting capabilities without the need for specialized knowledge
of the domain. TimesNet, on the other hand, models temporal variations in two dimensions,
enabling it to capture both short-term (intra-period) fluctuations and long-term (inter-period)
trends, making it highly effective in
dealing with complex, multivariate time series data. These models offer the added advantage
of requiring fewer data samples for training while still maintaining high accuracy, making
them ideal for regions like rural Telangana, where data may be sparse or inconsistent.

CHAPTER 2: LITERATURE REVIEW

Accurate forecasting of energy consumption has been a key focus of research for several decades,
driven by the increasing demand for efficient energy management and the need to balance supply
with consumption. Over time, various methodologies have been explored, ranging from
traditional statistical models to more sophisticated machine learning (ML) and deep learning
(DL) approaches. While earlier models like ARIMA and regression techniques were effective in
handling linear and univariate time series data, the growing complexity of modern energy
usage—especially with the integration of renewable energy sources and varying domestic
consumption patterns—has necessitated the development of more advanced techniques.

The table below provides a comprehensive summary of relevant literature in this domain, with
each study highlighting a unique approach to tackling energy consumption forecasting. These
papers investigate the performance of different forecasting models, including hybrid models that
combine traditional methods with deep learning, and more recent innovations like N-HiTS and
TimesNet. By reviewing these contributions, we gain insights into the evolution of energy
forecasting techniques, the advantages and limitations of various models, and the critical need
for models that can handle multi-level and multivariate time series data across different regions.

This literature survey not only establishes the foundations for the research but also emphasizes
the motivation for employing N-HiTS and TimesNet in the context of forecasting domestic
energy consumption in rural and urban areas of Telangana. The following table encapsulates the
key contributions of each study, setting the stage for the application of these advanced
forecasting techniques in the current work.

The survey provided valuable insights into how different models are employed to forecast energy
consumption in diverse settings and under varying conditions. Ramos et al. [1] and Qureshi et
al. [2] have explored the deep learning models Long Short-Term Memory (LSTM) and Gated
Recurrent Units (GRU) for residential and hospital energy consumption forecasting, highlighting
their effectiveness in capturing the temporal dependencies in the data and improving the
accuracy and scalability of deep neural networks in predicting the electricity needs, with Qureshi
et al. attaining an R-squared value of 95%. Li et al. [3] extended these efforts using Gated
Recurrent Units (GRU) and Convolutional Neural Networks (CNN) under varying demand,
focusing on grid management and forecasting energy consumption in dynamic conditions.

Table 1. Literature Table

[1] Title: Residential energy consumption forecasting using deep learning. Author: PVB Ramos et al.
    Models: RNN, LSTM, GRU, TST.
    Findings: The Transformer architecture works better with fewer samples, despite the occurrence of overfitting.
    Limitations: Overfitting concerns; feature selection.

[2] Title: Deep learning-based forecasting of electricity consumption. Author: Momina Qureshi et al.
    Models: LSTM.
    Findings: The LSTM model achieved a high forecasting accuracy with an R-squared value of 95%.
    Limitations: The findings rely on specific hyperparameter settings, and while optimizers were tested, further exploration could yield better performance across other datasets or scenarios.

[3] Title: Energy consumption forecasting with deep learning. Author: Yunfan Li.
    Models: GRU, CNN, TCN, ARIMA.
    Findings: Handled seasonal variations, nonlinear relationships, and multidimensional time series.
    Limitations: No cross-validation; the predictive capability without weather data is notably lower.

[4] Title: Forecasting of power demands using deep learning. Author: Taehyung Kang et al.
    Models: CNN, RNN.
    Findings: CNN generally performs better.
    Limitations: The CNN model is a short-term power demand forecasting model, as it cannot forecast more than one day ahead.

[5] Title: Intelligent deep learning techniques for energy consumption forecasting in smart buildings: a review. Author: R. Mathumitha et al.
    Models: SVR, LSTM, DRNN, MLP, M-BDLSTM.
    Findings: Significance of smart meter data for energy forecasting; a hybrid deep learning approach is proposed for improving forecasting accuracy in residential buildings from short- to long-term horizons.
    Limitations: Generalization across regions, overfitting, forecasting horizon challenges.

[6] Title: Electric energy consumption prediction by deep learning with state explainable autoencoder. Author: Jin-Young Kim et al.
    Models: LSTM, SEA.
    Findings: The model not only performs better than conventional models, with a mean squared error of 0.384, but also improves the capacity to explain prediction results by visualizing the state with the t-SNE algorithm.
    Limitations: The model does not yet handle predictions for multiple buildings; manual condition adjustment.

[7] Title: A deep learning framework for building energy consumption forecast. Author: Nivethitha Somu et al.
    Models: kCNN-LSTM.
    Findings: Applicable to various urban environments; reductions in energy consumption and costs.
    Limitations: Specific urban environment and data availability; city infrastructure might present challenges that are not fully addressed in the study.

[8] Title: Energy Demand Forecasting Using Deep Learning: Applications for the French Grid. Author: Alejandro J. del Real et al.
    Models: CNN-ANN.
    Findings: This approach outperforms the reference Réseau de Transport d'Electricité (RTE, French transmission system operator) subscription-based service.
    Limitations: Dependency on temperature data; seasonal variability and error distribution.

[9] Title: A building energy consumption prediction model based on rough set theory and deep learning algorithms. Author: Lei Lei et al.
    Models: DNN, BPN, Elman Neural Network, Fuzzy Neural Network.
    Findings: The implementation of rough set theory was able to eliminate redundant influencing factors of building energy consumption. DBN gave more accurate predictions of both short-term and long-term building energy consumption than shallow neural networks such as BP, Elman, and fuzzy neural networks.
    Limitations: The genetic algorithm used for attribute reduction may not consistently identify the most relevant factors across different datasets.

[10] Title: Household-Level Energy Forecasting in Smart Buildings Using a Novel Hybrid Deep Learning Model. Author: Dabeeruddin Syed et al.
    Models: Fully connected layers, unidirectional LSTM, bidirectional LSTM, CNN-LSTM, ConvLSTM, LSTM encoder-decoder model.
    Findings: The proposed hybrid deep learning model, which combines unidirectional LSTMs, bidirectional LSTMs, and stacked RNNs, demonstrated superior accuracy in forecasting energy consumption in residential smart buildings. Bidirectional LSTMs allowed the model to capture energy consumption patterns from both past and future contexts.
    Limitations: The exploration of parallelizing bidirectional LSTMs for distributed training remains a future goal, indicating that the current implementation may not be optimal for large-scale real-time applications.

[11] Title: Dynamic adaptive encoder-decoder deep learning networks for multivariate time series forecasting of building energy consumption. Author: Jing Guo et al.
    Models: Asymmetric hybrid encoder-decoder (AHED) deep learning algorithm; LSTM, GRU, CNN-LSTM, and CNN-GRU (all for comparison with AHED).
    Findings: In forecasting the behaviour of different types of buildings, the AHED model shows the highest generality compared with the other competitive models. Being a hybrid deep learning time-series model, AHED successfully overcomes the shortcoming of depending heavily on large datasets in traditional deep learning models.
    Limitations: Significant drop in accuracy when using shorter historical data.

[12] Title: Time Series Forecasting of Electrical Energy Consumption Using Deep Learning Algorithm. Author: E. O. Edoka et al.
    Models: LSTM.
    Findings: The approach produces exceptional levels of accuracy, with a MAPE of 0.010 and an RMSE of 19.759 for a 100 time-step horizon.
    Limitations: Single deep learning model; confined to short-term load forecasting; random behaviour of training epochs; potential external factors not addressed; no discussion of real-time applications.

[13] Title: Comparing forecasting accuracy of selected grey and time series models based on energy consumption in Brazil and India. Author: Atif Maqbool Khan et al.
    Models: Grey models, NGBM, NGBMPSO, ARIMA.
    Findings: Grey models, specifically GM(1,1) and ONGBM(1,1), achieved the lowest Mean Absolute Percentage Error (MAPE) when there was a strong relationship with lagged variables, outperforming ARIMA.
    Limitations: The emphasis on short-term forecasting may overlook long-term trends.

[14] Title: Enhanced N-BEATS for Mid-Term Electricity Demand Forecasting. Author: Kasprzyk et al.
    Models: Enhanced N-BEATS (N-BEATS*), ARIMA, ETS, original N-BEATS.
    Findings: N-BEATS* outperforms its predecessor and other state-of-the-art models. Introduces a hybrid loss function (pinball-MAPE + normalized MSE) and a destandardization component for better generalization.
    Limitations: No exogenous variables included; dependency on cross-learning and hyperparameter tuning; limited generalizability to other forecasting problems.

[15] Title: NHITS: Neural Hierarchical Interpolation for Time Series Forecasting. Author: Cristian Challu et al.
    Models: NHITS (Neural Hierarchical Interpolation for Time Series), N-BEATS, FEDformer, Autoformer, Informer, LogTrans, DilRNN, ARIMA.
    Findings: NHITS outperforms existing state-of-the-art models, improving accuracy by up to 20% over Transformer-based methods.
    Limitations: Only univariate forecasting; does not leverage multivariate dependencies. May require careful tuning of multi-rate sampling and interpolation parameters.

[16] Title: TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. Author: Hanlin Wu et al.
    Models: TimesNet, N-HiTS, N-BEATS, Informer, Autoformer, FEDformer, Transformer-based models.
    Findings: TimesNet outperforms state-of-the-art models in time-series analysis, capturing intra- and inter-period variations with 2D CNNs.
    Limitations: May struggle with datasets that lack clear periodicity; requires tuning of period selection for optimal performance.

Kang et al. [4] have explored deep neural networks such as Convolutional Neural Networks
(CNN) and Recurrent Neural Networks (RNN) for forecasting electricity consumption patterns
in scenarios requiring high precision for energy distribution in grid management, finding that
CNN generally performs better but is limited to short-term power demand forecasting.
Mathumitha et al. [5] have highlighted how effectively deep learning algorithms can handle
dynamic and large-scale data for forecasting energy consumption in smart buildings, analysing
models such as Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) and
pointing out how crucial accurate forecasts are to optimizing energy use and boosting the
sustainability of smart buildings. Using a state-explainable autoencoder, Kim and Cho [6]
suggested a technique that improves model interpretability by producing accurate
forecasts as well as insights into the fundamental causes of energy usage.

Hybrid models have also been explored, with Somu et al. [7] using a deep learning framework
that employed a blend of LSTM (Long Short-Term Memory) networks and other deep learning
techniques to capture complex patterns and temporal dependencies, thereby exhibiting the
potential of deep learning models to cut down energy waste and improve operational efficiency,
energy management, and sustainability in smart buildings beyond traditional approaches. del
Real et al. [8] showcased the capability of a CNN-ANN hybrid model to handle large-scale
datasets and deliver precise predictions under fluctuating demand conditions when forecasting
the energy requirements of the French grid. Lei et al. [9] developed a hybrid
model by combining rough set theory and deep learning methods, which uses rough sets to
reduce the data and increase prediction accuracy and consumption projections in the context of
building energy management systems. Syed et al. [10] introduced a hybrid approach for
forecasting household-level energy consumption by capturing both short-term and long-term
dependencies, thereby enhancing forecasting reliability in smart buildings. To forecast
multivariate time series in building energy consumption, Guo et al. [11] presented a dynamic
adaptive encoder-decoder deep learning model, which learns temporal dependencies in complex
energy consumption patterns and also adjusts to the dynamic nature of demand, thereby
improving energy optimization in buildings.

Forecasting energy demand for rapidly developing economies with varying consumption
patterns was demonstrated by Edoka et al. [12] and Khan and Osińska [13]. Edoka et al. used
deep learning techniques to predict electrical energy consumption in Nigeria, while Khan and
Osińska compared the forecasting accuracy of traditional time series methods and grey models
for predicting energy consumption in India and Brazil, concluding that grey models
surpass time series models in specific cases, thereby providing an alternative for forecasting
energy consumption in emerging countries with limited data availability.

Recent advancements in neural architectures have notably enhanced the efficiency and accuracy
of time-series forecasting. [14] presented an enhanced version of the Neural Basis Expansion
Analysis for Time-Series Forecasting (N-BEATS), referred to as N-BEATS*. This model
integrates a unique loss function that combines pinball loss based on Mean Absolute Percentage
Error (MAPE) with normalized Mean Squared Error (MSE) and a remodelled architecture with
a destandardization component. When used for mid-term electrical load forecasting across 35
European nations, these enhancements allow N-BEATS* to manage a variety of electricity
demand patterns with ease, outperforming both its predecessor and other cutting-edge models.
In a related endeavour, [15] proposed the N-HiTS architecture, which extends N-BEATS with
hierarchical interpolation and multi-rate data sampling. Because each model block in this design
is specialized in processing a different frequency band, forecasting accuracy is increased and
computing complexity is decreased, especially for long-horizon forecasts. N-HiTS achieves
a 16% improvement in forecasting accuracy and is significantly faster than Transformer-based
methods by addressing computational and volatility challenges. These advancements highlight
the potential of hierarchical and modular designs in time-series forecasting, thereby providing
scalable and accurate solutions for energy consumption forecasting.

[16] proposed a novel architecture for time-series analysis that makes use of temporal 2D-
variation modelling called TimesNet in order to handle the difficulties of multi-periodicity and
complicated temporal patterns. Using a modular design, TimesNet converts 1D time-series data
into 2D tensors, specifically incorporating intraperiod (short-term) and interperiod (long-term)
changes. TimesNet is also a flexible and efficient tool for analyzing time-series data as it
incorporates computer vision methods.

Fig.1. Count of models used in recently published research papers (approx. 2019 to 2024)

The distribution of forecasting models in Fig.1 reveals not only the popularity of different
techniques but also the evolution of the field. The significant presence of LSTM, GRU, and
BiLSTM suggests that recurrent neural networks (RNNs) remain dominant in handling
sequential energy consumption data. However, the presence of newer models such as Informer,
FEDFormer, and AutoFormer signals a shift toward transformer-based architectures, which
excel at capturing long-range dependencies and handling non-stationary time series data—an
inherent challenge in energy consumption forecasting.

Additionally, the distribution suggests that while deep learning models dominate, no single
model universally outperforms others. The growing diversity of models also points to dataset-
specific optimizations, where researchers tailor forecasting techniques based on regional
consumption patterns, grid infrastructure, and the availability of exogenous factors such as
weather, industrial activity, and policy interventions.

The lack of standardization in model evaluation is an unspoken reality. Different studies utilize
different evaluation metrics, data preprocessing techniques, and forecasting horizons, making
direct comparisons between models difficult. The literature shows that the choice of models is
often context-dependent, reinforcing the need for benchmarking frameworks to ensure fair
comparisons and reproducibility.

CHAPTER 3: SYSTEM ANALYSIS AND DESIGN

3.1 SYSTEM ANALYSIS

3.1.1 Problem Definition

1. Challenges in Traditional Forecasting Methods:

Traditional models like CNN, LSTM, ARIMA-LSTM, and hybrid approaches (e.g., kCNN-
LSTM) are popular in time series forecasting but have limitations in handling the complex and
non-linear temporal patterns common in modern energy consumption data. These models
typically perform well on stable or periodic data but struggle with data involving high variability,
abrupt shifts, and diverse seasonal patterns, especially in settings with irregular consumption
patterns (e.g., rural vs. urban).

Many recurrent architectures, such as LSTMs, require extensive data preprocessing, large
datasets, and complex feature engineering to achieve high accuracy. They are sensitive to issues
like vanishing and exploding gradients, which can reduce accuracy over longer timeframes. This
makes them less effective for practical applications where data collection may be limited or
costly.

2. Regional Variability and the Need for Adaptive Models:

Urban vs. Rural Energy Patterns: Consumption patterns differ substantially between rural and
urban settings due to differences in infrastructure, energy usage behaviours, and seasonal
variations. For instance, rural areas may experience more seasonal fluctuations in energy usage
tied to agricultural cycles, while urban areas may have more stable yet progressively increasing
demand due to industrial activities.

Dynamic Energy Needs Across Time: Energy consumption patterns have become more dynamic,
reflecting broader socioeconomic developments, such as increased access to electricity in rural
regions or changes in urban residential energy demands due to population growth. Traditional
models often fail to capture these nuances, necessitating more adaptive forecasting methods that
can independently adjust to changing patterns without extensive manual intervention.

3. Need for Interpretability in Forecasting Models:

Decision-Making for Energy Management: Accurate forecasting is crucial for stakeholders like
policymakers, energy planners, and utility companies who require interpretable results to guide
decision-making. They need to understand not only the forecasted values but also the underlying
trends and seasonal components that influence these forecasts. Limitations of Recurrent Models
in Interpretability: Models like LSTM are challenging to interpret because they rely on complex
hidden states that make it hard to understand the influence of specific factors on the predictions.
Deep learning models like N-Beats and TimesNet provide an advantage here. Their architectural
design allows for direct modeling of trend and seasonal components, which improves
interpretability and enables stakeholders to better understand the drivers of energy demand.

3.1.2 Project Objectives

Objective 1: Model Selection for Adaptability and Accuracy

• Select advanced deep learning models (N-Beats and TimesNet) that excel in capturing
complex temporal dependencies, which can improve forecast accuracy and adaptability
to rural and urban energy usage variations. These models’ structures are well-suited to
analyze trends, seasonality, and other temporal patterns, making them more adaptable to
dynamic and region-specific consumption data.
• N-Beats, with its trend and seasonal blocks, allows for the explicit separation of these
components, providing interpretability and reducing reliance on complex feature
engineering. TimesNet, on the other hand, leverages convolutional and decomposition-
based structures to analyze time series data, making it robust to variations in sample size
and adaptable to different data characteristics.

Objective 2: Minimal Feature Engineering for Scalable Application

• Minimize feature engineering to highlight the models’ capability to learn complex


patterns directly from raw data, reducing the need for manual intervention and
preprocessing. This approach ensures that the models can adapt to various datasets,
potentially improving scalability for future implementations in different regions or
sectors without extensive customization.

Objective 3: Regional Comparison and Multi-Level Forecasting

• Compare the performance of N-Beats and TimesNet on data from different time periods
and regions (North Telangana: 2019-2024, South Telangana: 2021-2024). By evaluating
model effectiveness on these subsets, this project will assess the models’ ability to handle
multi-level forecasts, providing insight into energy consumption patterns over time and
enabling more informed decisions for both regional energy planning and resource
allocation.

3.1.3 Scope & Constraints

Scope:

• Geographical Focus: The project is focused on home energy consumption forecasting in


Telangana, specifically for rural and urban areas in North and South Telangana, where
energy needs vary considerably. This segmentation is intended to capture location-
specific trends and inform targeted resource management strategies.
• Temporal Focus: The dataset covers the years 2019–2024 for North Telangana and 2021–
2024 for South Telangana, selected to represent recent consumption patterns while
allowing for a period-based performance comparison.
• Analysis Depth: Forecasts will cover both short-term and long-term periods, enabling a
layered analysis of trends and patterns that affect consumption variability over time. This
approach provides insights into immediate needs as well as longer-term energy
management planning.

Constraints:

• Minimal Feature Engineering: The project aims to rely on the models’ inherent ability to
learn directly from the data, limiting the extent of feature engineering. This constraint
emphasizes the flexibility and strength of the models in handling raw or minimally
processed data but may restrict opportunities for customized data enhancement.
• Data Limitations: The selected dataset’s timeframe and regional specificity (North and
South Telangana only) provide valuable insights but may limit generalizability to other
areas or broader temporal scopes.
• Focus on Residential Consumption: The project excludes other sectors like commercial
and industrial energy consumption to focus specifically on residential needs, which
means the model may not account for cross-sectoral influences on energy demand.

3.2 FUNCTIONAL REQUIREMENTS

3.2.1 Data Ingestion & Preprocessing

• Load historical energy data from local sources (CSV, databases).


• Perform data cleaning, imputation, normalization, and regional splitting.
• Use cross-validation to optimize model robustness.

3.2.2 Model Training & Forecasting

• N-HiTS: Uses trend and seasonal blocks for interpretability.


• TimesNet: Employs convolutional and decomposition-based methods for adaptability.
• Apply regularization techniques (dropout, early stopping) to prevent overfitting.

3.2.3 Comparison & Evaluation

• Assess accuracy using MAE, MSE, and MAPE.


• Compare model performance across regions and timeframes.
• Evaluate interpretability by analyzing seasonal and trend components.

3.2.4 Visualization & Reporting

• Generate time series plots, trend analysis, and decomposition charts.


• Provide performance reports summarizing model effectiveness and forecasting insights.

3.3 NON-FUNCTIONAL REQUIREMENTS

• Performance: Ensure fast forecasting speeds and scalable architecture.


• Accuracy: Maintain high precision across all regions and timeframes.
• Reliability: Implement fault tolerance and data validation for consistent availability.
• Usability: Ensure clear visualization and interpretability for stakeholders.
• Maintainability: Modular design to support future updates and integrations.

3.4 SYSTEM DESIGN

3.4.1 System Architecture

Follows a modular microservices structure: Data Ingestion, Preprocessing, Model Training,


Evaluation, and Visualization.

3.4.2 Component Design

• Data Ingestion: Extracts and validates data (Pandas).


• Preprocessing: Cleans, normalizes, and splits data (Scikit-learn, NumPy).
• Model Training: Implements deep learning models (PyTorch, TensorFlow).
• Evaluation: Measures accuracy with statistical libraries.
• Visualization & Reporting: Generates graphical reports (Matplotlib, Plotly, ReportLab).
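As an illustration of this component design, the sketch below shows how the pieces could be wired together in Python. It is a minimal, hedged example: the file path, the column names (Date, Circle, Units, Load), and the function boundaries are assumptions based on the dataset description in Chapter 4, not the project's actual code.

import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import MinMaxScaler

def ingest(path: str) -> pd.DataFrame:
    """Data Ingestion: extract and validate raw records (Pandas)."""
    df = pd.read_csv(path, parse_dates=["Date"])
    missing = {"Circle", "Units", "Load"} - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Preprocessing: impute gaps and normalize the numeric columns (Scikit-learn)."""
    df = df.copy()
    df[["Units", "Load"]] = df[["Units", "Load"]].fillna(df[["Units", "Load"]].median())
    df[["Units", "Load"]] = MinMaxScaler().fit_transform(df[["Units", "Load"]])
    return df

def evaluate(y_true, y_pred) -> dict:
    """Evaluation: the error metrics reported in Chapter 5."""
    return {"MAE": mean_absolute_error(y_true, y_pred),
            "MSE": mean_squared_error(y_true, y_pred)}

Model training and visualization would plug into this skeleton as further modules, which keeps each component independently replaceable, in line with the microservices-style structure above.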

CHAPTER 4: METHODOLOGY

This section outlines the dataset, preprocessing techniques, and forecasting models used in this
study. The research focuses on energy consumption forecasting in Telangana, utilizing deep
learning models—N-HiTS and TimesNet. The methodology is designed to leverage historical
energy consumption data alongside exogenous variables to enhance forecasting accuracy.

4.1 DATASET DESCRIPTION

The dataset used in this study is sourced from the Telangana State Government Open Data Portal
and focuses on domestic energy consumption in Telangana for the period 2019–2024. The data
is provided by the two major power distribution entities in the state: the Telangana Southern
Power Distribution Company Limited (TGSPDCL) and the Telangana Northern Power
Distribution Company Limited (TGNPDCL). This dataset offers near real-time insights into monthly energy
consumption patterns, connection details, and load distribution, making it highly relevant for
energy forecasting and trend analysis.

The columns Circle, Division, Subdivision, Section, and Area are geographic identifiers that
detail the location of energy consumption. The "Circle" column, equivalent to districts, plays a
significant role in analyzing region-specific energy demand. The Units column indicates the total
energy consumed (in units) and billed within the month, a critical indicator of demand. The Load
column specifies the load billed during the month, categorized by the connection type as
determined by the category code. Load is measured in either kilowatts (kW) or horsepower (hp).

This dataset provides a detailed and near real-time understanding of energy consumption trends,
categorized by geographic and sector-specific variables. The inclusion of granular information
such as billed connections and load distribution allows for precise modeling of energy demand
across different regions and connection types. Its comprehensive temporal coverage from 2019
to 2024 makes it a valuable resource for developing predictive energy forecasting models,
enabling better resource allocation and policy-making in the energy sector.

4.2 DATASET PREPROCESSING

The dataset was derived by combining multiple Excel files, resulting in an initial dataset of
approximately 990,000 rows. This large size was due to the detailed geographic segmentation,
with data broken down hierarchically into Circle (district), Division, Subdivision, Section, and
Area. Given the need for efficient analysis and modeling, preprocessing steps were undertaken
to aggregate, transform, and prepare the data for multivariate energy forecasting, including the
integration of exogenous variables.

Fig.2. Data Preprocessing

4.2.1 Aggregation and Transformation

To standardize the temporal aspect, the date column was converted into a consistent format using
Python. The first day of each month was assigned as the reference date, simplifying monthly
aggregation. Energy consumption (Units) and Load (Connected Load) were summed at the
Circle level for each month. This aggregation not only reduced the dataset to 1,931 rows but also
highlighted the scale of energy consumption across Telangana, as the cumulative sums were
large due to the statewide coverage.

Table 2. Dataset Description

STATISTIC    DATE          UNITS            LOAD
COUNT        1930          1930             1930
MEAN         2022-06-09    30144963.20      373969.284
MINIMUM      2019-01-01    4841291.0        43393.342
25%          2021-07-01    11839281.125     122258.56675
50%          2022-08-01    20994976.765     221961.109
75%          2023-10-01    39489493.0       515311.57
MAXIMUM      2024-11-01    139700482.0      1839651.42

The aggregated dataset revealed the following characteristics:

• Units (Monthly Energy Consumption): Mean consumption was approximately


30.14 million units, with a maximum of 139.70 million units recorded in a single
month.
• Load (Monthly Connected Load): The mean load was around 373,969 kW, with a
peak load of 1,839,651 kW in the highest-demand month.
• Temporal Range: The dataset spans from January 2019 to November 2024,
capturing variations in energy consumption over time.
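The aggregation step described above can be reproduced with a few lines of Pandas. The snippet below is a minimal sketch: the file name and the Date, Circle, Units, and Load column names are assumptions mirroring the dataset description, and the actual export layout may differ.

import pandas as pd

# Hypothetical combined export of the source Excel files.
df = pd.read_csv("telangana_domestic_energy.csv")

# Standardize the date and anchor every record to the first day of its month.
df["Date"] = pd.to_datetime(df["Date"]).dt.to_period("M").dt.to_timestamp()

# Sum energy consumption and connected load at the Circle level for each month.
monthly = (df.groupby(["Circle", "Date"], as_index=False)[["Units", "Load"]]
             .sum()
             .sort_values(["Circle", "Date"]))

# Summary statistics comparable to those reported in Table 2.
print(monthly[["Units", "Load"]].describe())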

4.2.2 Multivariate Analysis and Exogenous Variable

Our forecasting model is structured as a multivariate time series problem, where multiple
dependent and independent variables influence the prediction of future energy consumption
(Units). The primary dataset, aggregated at the Circle level, captures monthly energy
consumption patterns across geographical regions, ensuring both granularity and interpretability.

Multivariate Feature - Load: The Load variable represents the total connected electrical load,
a strong determinant of energy demand. Mathematically, energy consumption (Ut) is a function
of connected load (Lt), consumer behavior, and external factors:

Ut = f( Lt, Xt ) + et …(1)

where Xt represents other influencing factors (e.g., seasonal patterns, policy changes), and et is
the error term. Since connected load (Lt) dictates the potential consumption capacity, it exhibits
a leading correlation with Units and aids in capturing underlying demand shifts.

Exogenous Variable - Circle (Geographical Identifier): Circle variable is incorporated as an


exogenous categorical feature to account for regional variations in energy consumption. While
it does not directly contribute to the autoregressive structure of the time series, it enables the
model to differentiate consumption patterns across locations. This is crucial because:

Ut = f( Lt, Xt, C ) + et …(2)

where 𝐶 represents the categorical effect of geographical influence. In practice, different Circles
exhibit distinct consumption behaviors due to factors such as population density, industrial

activity, and climate variations. By encoding Circle as an exogenous feature, the model learns
region-specific patterns that improve generalization and forecasting accuracy.

This preprocessing pipeline ensured the datasets were both representative and efficient,
balancing computational requirements with the preservation of critical trends and patterns.
Together, these datasets formed the foundation for robust and context-aware energy forecasting
models tailored to Telangana's energy consumption landscape.
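As a sketch of how the multivariate Load feature and the exogenous Circle identifier of Eqs. (1) and (2) could be prepared for the models, the snippet below label-encodes Circle and assembles a numeric feature matrix. The tiny DataFrame and the circle_id column name are purely illustrative; in practice the data would come from the Circle-level aggregation above.

import pandas as pd

# Illustrative monthly records for two Circles (values are made up for the example).
monthly = pd.DataFrame({
    "Date":   pd.to_datetime(["2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01"]),
    "Circle": ["Circle_A", "Circle_B", "Circle_A", "Circle_B"],
    "Units":  [4_900_000, 6_100_000, 5_000_000, 6_300_000],
    "Load":   [43_000, 58_000, 44_800, 59_100],
})

# Encode the categorical Circle as an integer code (the exogenous C in Eq. (2)).
monthly["circle_id"] = monthly["Circle"].astype("category").cat.codes

# Assemble the multivariate inputs: Units (target), Load, and the encoded Circle.
features = monthly[["Units", "Load", "circle_id"]].to_numpy(dtype="float32")
print(features.shape)  # (4, 3)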

4.3 ARCHITECTURE ANALYSIS

This section presents a detailed analysis of the deep learning architectures employed for energy
consumption forecasting. The study utilizes N-HiTS and N-BeatsX, both of which are designed
for time series forecasting with distinct mechanisms for capturing temporal patterns.

4.3.1 N-BeatsX

N-Beats is a deep neural architecture based on backward and forward residual links and a very
deep stack of fully-connected layers. The architecture has a number of desirable properties, being
interpretable, applicable without modification to a wide array of target domains, and fast to train
[17]. N-Beats follows a pure deep learning approach without recurrence or self-attention
mechanisms [18]. Instead, it relies on a deep stack of fully connected (MLP) layers organized
into blocks. Each block is composed of:

• Forward Expansion (for trend/cycle extraction)


• Backward Expansion (to reconstruct input signals)
• Stacked Residual Learning (deep layers iteratively refine forecasts)

Fig.3. N-BeatsX Model flowchart


N-BeatsX is an advanced extension of the N-Beats (Neural Basis Expansion Analysis for Time
Series) architecture, designed to enhance time series forecasting by incorporating exogenous
variables—external factors that influence the target series, such as weather data, economic
indicators, or promotional events. Building on the success of N-Beats, which uses a deep,
interpretable stacked structure of fully connected layers to decompose time series into trend and
seasonal components, N-BeatsX integrates these external features to capture more complex, real-
world dependencies [19]. Each stack in the model processes both historical time series data and
exogenous inputs, generating a backcast (reconstructing past values) and a forecast (predicting
future values) while retaining the interpretability of its predecessor. This is done by modifying
the input function to include additional feature vectors:

Ht = MLP (Xt,et) …(3)

where:

• Xt is the historical time series input.


• et represents exogenous variables (such as location, climate, and economic conditions).
• Ht is the learned representation for forecasting.

Each block is now responsible for learning mappings that consider both endogenous and
exogenous effect. This allows N-BeatsX to excel in scenarios where external drivers
significantly impact outcomes, such as energy demand forecasting (weather-driven
consumption), retail sales (holiday promotions), or financial markets (macroeconomic trends).
By combining the flexibility of deep learning with domain-specific insights from exogenous
variables, N-BeatsX achieves state-of-the-art accuracy without sacrificing transparency [334].
However, its effectiveness hinges on the availability of high-quality external data, and its
computational complexity may require robust infrastructure. Ideal for practitioners seeking to
balance interpretability with performance in dynamic forecasting environments, N-BeatsX
bridges the gap between pure statistical models and black-box neural networks.

4.3.2 N-HiTS

N-HiTS (Neural Hierarchical Interpolation for Time Series) is a deep learning model designed
for efficient and accurate long-horizon time series forecasting. Building on the success of N-
Beats, N-HiTS introduces hierarchical interpolation and multi-rate sampling to address
computational challenges in long-term predictions. The model decomposes time series into
hierarchical components using stacked blocks, each operating at different temporal resolutions

to capture multi-scale patterns (e.g., short-term fluctuations and long-term trends). By leveraging
interpolation techniques, N-HiTS generates smooth forecasts while drastically reducing
computational costs compared to traditional sequence-to-sequence models [20]. Its architecture
combines multi-scale backcasting, where lower blocks refine coarse predictions from higher
blocks, and a hierarchical loss function to ensure coherence across time horizons. However, its
hierarchical structure may require careful tuning of hyperparameters like the number of blocks
or interpolation rates [21]. Libraries like Darts offer user-friendly implementations, making N-
HiTS a go-to choice for balancing efficiency, accuracy, and scalability in complex forecasting
tasks.

Fig.4. N-HiTS Model flowchart

Mathematical Formulation of N-HiTS

Given a historical sequence x_t, N-HiTS models the forecast ŷ_t as a sum of outputs from different
temporal resolutions:

ŷ_t = Σ_i f_i^(k)(x_t)    …(4)

where f_i^(k)(x_t) represents the forecast computed at scale k, defined as:

f_i^(k)(x_t) = W^(k) g(x_t)    …(5)

Here:

• W(k) is a learnable weight matrix for scale k.


• g(xt) is an interpolation function that aggregates past observations dynamically.

The interpolation function follows:

ȳ_t^(k) = Σ_j α_j^(k) x_{t−j}    …(6)

where 𝛼𝑗 (𝑘) are learned interpolation coefficients, allowing the model to focus more on relevant
past time steps, thereby improving long-term forecasting accuracy.
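For readers who want to experiment with an off-the-shelf N-HiTS implementation, the sketch below uses the Darts library mentioned above. It is illustrative only: parameter names follow recent Darts releases and may differ between versions, and the file name and the single-Circle, monthly-frequency data are assumptions.

import pandas as pd
from darts import TimeSeries
from darts.models import NHiTSModel

# Hypothetical CSV holding one Circle's monthly Units and Load values.
df = pd.read_csv("circle_monthly.csv", parse_dates=["Date"])

target = TimeSeries.from_dataframe(df, time_col="Date", value_cols="Units")
load_cov = TimeSeries.from_dataframe(df, time_col="Date", value_cols="Load")

# Look back 12 months, forecast 1 month ahead.
model = NHiTSModel(input_chunk_length=12, output_chunk_length=1, n_epochs=50)
model.fit(target, past_covariates=load_cov)

forecast = model.predict(n=1, past_covariates=load_cov)  # next month's Units
print(forecast.values())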

4.4 PROPOSED METHODOLOGY

4.4.1 N-HiTSX

Input Sequence Preparation

The model processes time series data with exogenous variables, where Circle is a key categorical
feature. The input sequence is constructed as follows:

• Target Variable: The primary time series X = {x1, x2, …, xT} (e.g., monthly energy Units).
• Exogenous Variables: Includes Circle (geographical region) and other features
(e.g., load).
• Input Sequence: For a time series X ∈ ℝ^(T×F) with T time steps and F features, the
input at time t is a window of length L = 12:

X_{t−L:t} = [ x_{t−L}   z_{t−L}^(1)   ⋯   z_{t−L}^(F−1)
                 ⋮           ⋮         ⋱        ⋮
              x_{t−1}   z_{t−1}^(1)   ⋯   z_{t−1}^(F−1) ]    …(7)
where 𝑥 is the target variable, and z represents exogenous features, including Circle.

Output Horizon: The model predicts H = 1 future step Yt = xt.


The time series dataset class generates these sequences and pairs them with their corresponding
targets. Each input tensor has dimensions (L, F), and the output tensor has dimensions(H). This
formulation allows the model to leverage both historical patterns and exogenous factors.
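A minimal sketch of such a sliding-window dataset class is given below. It illustrates the construction in Eq. (7) rather than the project's exact implementation; the assumption that the first feature column is the target (Units) and the remaining columns are the exogenous features (Load, encoded Circle) is ours.

import torch
from torch.utils.data import Dataset

class WindowDataset(Dataset):
    """Sliding-window dataset: inputs of shape (L, F), horizon-H target from column 0."""

    def __init__(self, data: torch.Tensor, lookback: int = 12, horizon: int = 1):
        self.data, self.L, self.H = data, lookback, horizon

    def __len__(self) -> int:
        return len(self.data) - self.L - self.H + 1

    def __getitem__(self, idx: int):
        x = self.data[idx: idx + self.L]                         # (L, F) input window
        y = self.data[idx + self.L: idx + self.L + self.H, 0]    # (H,) future target
        return x, y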

Model Architecture

The Hybrid N-HiTSX architecture is an advanced extension of the N-HiTS model that integrates
three core components: dilated convolution blocks, attention blocks, and hierarchical residual
stacking, forming a unified block structure trained end-to-end for time series forecasting. The
dilated convolution blocks leverage dilated convolutions to efficiently capture long-term
dependencies by expanding the receptive field exponentially without increasing the number of
parameters, thereby enabling multi-scale feature extraction. The attention blocks enhance model

interpretability and forecasting accuracy by dynamically weighting relevant time steps, allowing
the model to focus on significant temporal dependencies. Finally, hierarchical residual stacking
refines predictions through multiple stages of residual connections, ensuring stable training and
improved generalization. These three components work synergistically, allowing Hybrid N-
HiTSX to model both short-term and long-term patterns effectively, making it well-suited for
energy forecasting and other time series applications.

Dilated Convolution Block

Purpose: Capture multi-scale temporal patterns by expanding the receptive field exponentially.

Mathematical Formulation:

Dilated convolutions introduce gaps between kernel elements to increase the context window
without additional parameters. For an input X ∈ ℝLxF, the dilated convolution operation at
dilation rate d = 2 is defined as:

C^(l)[k] = Σ_{i=0}^{K−1} W[i] · X^(l−1)[k − d·i]    …(8)

where W is the kernel of size K = 3, and C(𝑙) is the output of layer l.

Implementation:

The DilatedConvBlock applies a 1D convolution with kernel_size=3, dilation=2, and padding=2 to


preserve sequence length. The output is flattened and passed through a fully connected (FC) layer to
project features to the horizon H:

𝑓𝐶𝑜𝑛𝑣 (𝑋) = 𝑊𝐹𝐶 ∙ 𝐹𝑙𝑎𝑡𝑡𝑒𝑛 (𝑅𝑒𝐿𝑈(𝐶 (𝑙) )) + 𝑏𝐹𝐶 …(9)
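A minimal PyTorch sketch of this pathway is shown below. The hidden channel width is an assumption, since the report does not specify it; kernel size 3, dilation 2, and padding 2 follow the description above so the sequence length is preserved.

import torch
import torch.nn as nn

class DilatedConvBlock(nn.Module):
    """Dilated convolution pathway: Conv1d (k=3, d=2, p=2) -> ReLU -> Flatten -> FC."""

    def __init__(self, n_features: int, lookback: int, horizon: int, hidden: int = 32):
        super().__init__()
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=3, dilation=2, padding=2)
        self.fc = nn.Linear(hidden * lookback, horizon)

    def forward(self, x):                 # x: (batch, L, F)
        x = x.permute(0, 2, 1)            # -> (batch, F, L) for Conv1d
        c = torch.relu(self.conv(x))      # -> (batch, hidden, L), length preserved
        return self.fc(c.flatten(1))      # -> (batch, H), Eq. (9)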

Attention Block

Purpose: Dynamically weigh significant time steps and exogenous features, including Circle.

Mathematical Formulation:

The attention mechanism computes context-aware scores for each time step. For input X,
learnable projections generate queries (Q), keys(K), and values(V):

Q = XWQ, K = XWK, V = XWV, …(10)


where WQ, WK, WV ∈ ℝFxD are weight matrices. The attention output is:

Attention(Q, K, V) = softmax(QKᵀ / √d) · V    …(11)

Implementation:

The AttentionBlock computes scaled dot product attention and passes the result through an FC
layer:
𝑓𝐴𝑡𝑡𝑒𝑛𝑡𝑖𝑜𝑛 (𝑋) = 𝑊𝐴𝑡𝑡𝑛 ∙ 𝐹𝑙𝑎𝑡𝑡𝑒𝑛(𝐴𝑡𝑡𝑒𝑛𝑡𝑖𝑜𝑛(𝑄, 𝐾, 𝑉)) + 𝑏𝐴𝑡𝑡𝑛 …(12)
The attention mechanism is particularly effective for handling categorical exogenous variables like
Circle, as it learns to assign importance weights to different regions dynamically.
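The following sketch illustrates this pathway in PyTorch. The projection dimension d_model is an assumption; the scaled dot-product attention and the final FC projection follow Eqs. (10)–(12).

import math
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Attention pathway: scaled dot-product attention over time steps, then FC to horizon."""

    def __init__(self, n_features: int, lookback: int, horizon: int, d_model: int = 32):
        super().__init__()
        self.q = nn.Linear(n_features, d_model)
        self.k = nn.Linear(n_features, d_model)
        self.v = nn.Linear(n_features, d_model)
        self.fc = nn.Linear(d_model * lookback, horizon)

    def forward(self, x):                                   # x: (batch, L, F)
        q, k, v = self.q(x), self.k(x), self.v(x)            # each (batch, L, d_model)
        scores = torch.softmax(q @ k.transpose(1, 2) / math.sqrt(q.size(-1)), dim=-1)
        context = scores @ v                                 # (batch, L, d_model)
        return self.fc(context.flatten(1))                   # (batch, H)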
4.4.2 Hybrid N-HiTS Block

Purpose: Fuse multi-scale temporal features (convolutions), global context (attention), and
nonlinear interactions (FC layers).

Structure:

Each NHiTS Block combines three parallel pathways:

1. FC Pathway: Flattens the input and applies two FC layers:

   $f_{FC}(X) = W_2 \cdot \mathrm{ReLU}(W_1 \cdot \mathrm{Flatten}(X) + b_1) + b_2$   …(13)

2. Attention Pathway: Computes attention scores as in the Attention Block described above.

3. Dilated Convolution Pathway: Applies dilated convolutions as in the Dilated Convolution Block described above.

The outputs of the three pathways are summed residually:

$\hat{Y}_b = f_{FC}(X) + f_{Attention}(X) + f_{Conv}(X)$   …(14)
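Putting the three pathways together, a hybrid block might look like the following sketch, which reuses the DilatedConvBlock and AttentionBlock sketches above; the hidden width of the FC pathway is an assumption.

import torch.nn as nn

class NHiTSBlock(nn.Module):
    # One hybrid block: FC, attention, and dilated-convolution pathways whose
    # horizon-sized outputs are summed as in Eq. (14).
    def __init__(self, n_features, window, horizon, hidden=64):
        super().__init__()
        self.fc_path = nn.Sequential(
            nn.Flatten(),                              # (batch, L*F)
            nn.Linear(window * n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, horizon),                # Eq. (13)
        )
        self.attn_path = AttentionBlock(n_features, window, horizon)
        self.conv_path = DilatedConvBlock(n_features, window, horizon)

    def forward(self, x):                              # x: (batch, L, F)
        return self.fc_path(x) + self.attn_path(x) + self.conv_path(x)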


Hierarchical Stacking

Purpose: Learn hierarchical representations at multiple temporal resolutions.

Structure:

• Multiple NHiTS Block layers are stacked, each operating on progressively downsampled
versions of the input.
• The final prediction is the sum of all block outputs:

$\hat{Y}_{total} = \sum_{b=1}^{B} \hat{Y}_b$   …(15)

where B = 3 is the number of blocks.
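A sketch of how the B = 3 blocks could be stacked over downsampled views of the window is given below, reusing the NHiTSBlock sketch above; the average-pooling factors (1, 2, 4) are illustrative assumptions for a 12-step window.

import torch.nn as nn

class HybridNHiTSX(nn.Module):
    # Stack of NHiTSBlocks; each block sees an average-pooled (downsampled) view
    # of the input window and the block forecasts are summed as in Eq. (15).
    def __init__(self, n_features, window=12, horizon=1, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AvgPool1d(p) for p in pool_sizes)
        self.blocks = nn.ModuleList(
            NHiTSBlock(n_features, window // p, horizon) for p in pool_sizes
        )

    def forward(self, x):                                  # x: (batch, L, F)
        y_hat = 0
        for pool, block in zip(self.pools, self.blocks):
            xp = pool(x.permute(0, 2, 1)).permute(0, 2, 1) # (batch, L/p, F)
            y_hat = y_hat + block(xp)
        return y_hat                                       # (batch, H)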

Fig. 5. N-HiTSX Model flowchart
Mathematical Innovation

The original N-HiTS block computes forecasts as:

$\hat{Y}_b = \sum_{k=1}^{K} W_k \cdot X_{t-L:t}$   …(16)

where K is the number of basis functions. In contrast, Hybrid N-HiTSX introduces:

$\hat{Y}_b = f_{FC} + f_{Attention} + f_{Conv}$   …(17)

explicitly unifying three complementary inductive biases.

4.4.3 TimesNet

This section provides a detailed explanation of the TimesNet architecture, focusing on its ability
to handle time series data with exogenous variables such as Circle (a categorical feature
representing geographical regions). The model leverages convolutional layers to capture
temporal patterns and integrates exogenous features effectively. Below, we describe the
architecture, mathematical foundations, and training methodology in detail.

The input sequences are prepared in the same way as in the N-HiTSX methodology described previously.

Model Architecture

The architecture for TimesNet consists of multiple TimesBlock layers, each designed to capture
temporal patterns with the help of convolutional operations. These layers learn multi-scale
representations through horizontal stacking.

TimesBlock

The purpose of a TimesBlock is essentially to extract temporal features using 1D convolutions.

Mathematical Formulation

Each TimesBlock applies 1D convolutional layers with ReLU activation:

$C^{(l)} = \mathrm{ReLU}(W_2 * \mathrm{ReLU}(W_1 * X^{(l-1)} + b_1) + b_2)$   …(18)

where $W_1, W_2$ denote convolutional kernels, * denotes the convolution operation, and $b_1$ and $b_2$ are bias terms.

Fig. 6. TimesNet Model flowchart

Implementation

The first convolutional layer maps the input to a hidden representation:

$C_1 = \mathrm{ReLU}(W_1 * X + b_1)$   …(19)

The second convolutional layer refines the features:

$C_2 = \mathrm{ReLU}(W_2 * C_1 + b_2)$   …(20)

Both layers use a kernel size of 3 and padding to preserve the sequence length.
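A minimal PyTorch sketch of a TimesBlock under this description follows; the hidden channel width is an illustrative assumption, and padding=1 realizes the length-preserving behaviour for a kernel of size 3.

import torch.nn as nn

class TimesBlock(nn.Module):
    # Two stacked Conv1d + ReLU layers (Eqs. 19-20); kernel size 3 with padding 1
    # preserves the sequence length.
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):      # x: (batch, channels, L), channels-first
        return self.net(x)     # (batch, hidden, L)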

TimesNet Block

The purpose of this is to stack multiple TimesBlock layers to learn hierarchical temporal
representations.

Structure

The input tensor $X \in \mathbb{R}^{B \times L \times F}$ (batch size B, sequence length L, features F) is permuted to $\mathbb{R}^{B \times F \times L}$ for the convolutional operations.

Each TimesBlock processes the input sequentially:

$X^{(l)} = \mathrm{TimesBlock}(X^{(l-1)})$   …(21)

The final output is flattened and passed through a fully connected (FC) layer to predict the horizon H:

$\hat{Y} = W_{FC} \cdot \mathrm{Flatten}(X^{(l)}) + b_{FC}$   …(22)

Mathematical Innovation:

The original N-HiTS block computes forecasts as:

$\hat{Y}_b = \sum_{k=1}^{K} W_k \cdot X_{t-L:t}$   …(23)

where K is the number of basis functions. In contrast, TimesNet introduces:

$\hat{Y} = W_{FC} \cdot \mathrm{Flatten}(X^{(L)})$   …(24)

where $X^{(L)}$ is the output of the final TimesBlock. This approach simplifies the architecture while maintaining strong performance.
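The sketch below assembles the blocks into a forecaster along the lines of Eqs. (21)-(22), reusing the TimesBlock sketch above; the depth of two blocks and the hidden width are illustrative assumptions.

import torch.nn as nn

class TimesNetForecaster(nn.Module):
    # Stacks TimesBlock layers, flattens the final representation, and projects
    # it to the horizon with a single FC layer.
    def __init__(self, n_features, window=12, horizon=1, hidden=64, n_blocks=2):
        super().__init__()
        channels = [n_features] + [hidden] * n_blocks
        self.blocks = nn.ModuleList(
            TimesBlock(channels[i], hidden) for i in range(n_blocks)
        )
        self.fc = nn.Linear(hidden * window, horizon)

    def forward(self, x):                  # x: (batch, L, F)
        h = x.permute(0, 2, 1)             # (batch, F, L) for Conv1d
        for block in self.blocks:
            h = block(h)                   # (batch, hidden, L)
        return self.fc(h.flatten(1))       # (batch, H)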

CHAPTER 5: EXPERIMENTAL RESULTS

The results presented in this study are derived from a comprehensive evaluation of multiple deep
learning and machine learning models for energy consumption forecasting in Telangana. The
models, including N-HiTS, TimesNet, LSTM, CNN-LSTM, ARIMA, XGBoost, and several
others, were trained on historical energy consumption data spanning from 2019 to 2024, with
"Load" as the target variable and "Circle" as a key exogenous variable. The dataset was
partitioned into an 80% training set and a 20% testing set to ensure robust evaluation. Each
model was fine-tuned using advanced hyperparameter optimization techniques to achieve
optimal performance. The results highlight the comparative effectiveness of these models in
capturing temporal patterns and exogenous influences, providing valuable insights into their
suitability for energy forecasting tasks in a real-world scenario.
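As a simple illustration of the chronological 80/20 partition described above, a sketch is given below; the file name and column names are assumptions, not the exact preprocessing pipeline used in this work.

import pandas as pd

# Chronological split: the first 80% of the ordered records form the training
# set and the remaining 20% the test set.
df = pd.read_csv("telangana_energy.csv", parse_dates=["Date"]).sort_values("Date")
split = int(len(df) * 0.8)
train_df, test_df = df.iloc[:split], df.iloc[split:]
print(f"train: {len(train_df)} rows, test: {len(test_df)} rows")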

5.1. PERFORMANCE METRICS

When evaluating energy forecasting models, it is crucial to assess their accuracy and reliability
using quantitative metrics. The following four metrics—Root Mean Squared Error (RMSE),
Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error
(MAPE)—are widely used because they provide different perspectives on model performance.

5.1.1 Root Mean Squared Error (RMSE)

RMSE measures the square root of the average squared differences between the actual and
predicted values. It penalizes large errors more than small errors due to squaring, making it
sensitive to outliers.

$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$   …(25)

RMSE provides a direct interpretation of error magnitude in the same unit as the target variable
(e.g., kWh). It penalizes larger errors heavily, making it useful when large deviations in
predictions are critical. It is beneficial when optimizing models, as reducing RMSE directly
improves overall prediction accuracy.

5.1.2 Mean Absolute Error (MAE)

MAE measures the average absolute difference between actual and predicted values. Unlike
RMSE, it treats all errors equally without squaring.

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$   …(26)

MAE is more interpretable and directly represents the average forecasting error in the same unit
as the target variable. It does not disproportionately penalize large errors, making it more robust
to outliers compared to RMSE. It is useful when the priority is to minimize overall average
prediction error, regardless of occasional extreme values.

5.1.3 Mean Squared Error (MSE)

MSE calculates the average squared difference between actual and predicted values. It is similar
to RMSE but without taking the square root.

$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$   …(27)

MSE is commonly used for model training because it is differentiable, allowing gradient-based
optimization. Since errors are squared, MSE emphasizes large errors more than small ones,
which can be useful when large forecasting mistakes must be minimized. However, it is not
directly interpretable in the same unit as the target variable (unlike RMSE and MAE).

5.1.4 Mean Absolute Percentage Error (MAPE)

MAPE expresses the error as a percentage of actual values, making it scale-independent and
useful for comparing forecasts across different datasets.

$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100$   …(28)

MAPE provides an easy-to-understand percentage-based measure of error, making it suitable for business and policy decisions. It allows for comparison across different datasets with varying scales.
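The four metrics can be computed directly from the predictions; a small NumPy sketch is shown below (the helper name is illustrative, and the MAPE term assumes no zero actual values).

import numpy as np

def forecast_metrics(y_true, y_pred):
    # Implements Eqs. (25)-(28) on arrays of actual and predicted values.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mape = np.mean(np.abs(err / y_true)) * 100  # assumes y_true != 0
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Example: forecast_metrics([100.0, 120.0], [98.0, 125.0])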

5.2 MODEL’S PERFORMANCE

The Table 3 presents a comparative analysis of different forecasting models used for energy
consumption prediction. It evaluates models based on four key performance metrics: Mean Absolute
Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute
Percentage Error (MAPE). These metrics provide insights into the accuracy and reliability of each
model, helping in identifying the most suitable approach for forecasting energy consumption.

Table 3. Model’s Performance Comparison

Model MAE MSE RMSE MAPE(%)


ARIMA 0.10453 0.01791 0.13383 103.29
RNN 0.22439 0.08563 0.29263 99.98
LSTM 0.18883 0.09854 0.31391 86.49
Bi-LSTM 0.08221 0.01712 0.13088 25.36
CNN-LSTM 0.10108 0.04681 0.21637 21.482
FNN 0.07871 0.02138 0.14623 18.06
XG Boost 0.04749 0.00591 0.07687 17.68
DNN 0.06943 0.01521 0.12335 14.98
GBR 0.03035 0.00228 0.04777 11.33
N-Hits 0.03982 0.00552 0.07432 7.60
N-Beats 0.01461 0.00053 0.02316 6.51
Times Net 0.01382 0.00047 0.02170 6.28
N-Beats X 0.01367 0.00047 0.02184 6.03
N-Hits X 0.01252 0.00038 0.01962 5.73
The models compared include traditional statistical models like ARIMA, deep learning-based
models such as RNN, LSTM, CNN-LSTM, Bi-LSTM, and FNN, as well as machine learning
approaches like XGBoost, GBR, and DNN. More advanced architectures, including N-HiTS, N-
Beats, TimesNet, and their enhanced versions (denoted with "X"), have also been evaluated.

The ARIMA (AutoRegressive Integrated Moving Average) model, a traditional time series
forecasting technique, exhibits a relatively high MAPE of 103.29%, indicating poor
performance. This suggests that ARIMA struggles to capture the complex patterns in energy
consumption data. Its MAE and RMSE values are also higher compared to most deep learning
models.

Fig. 7. MAE Comparison

Fig. 8. MSE Comparison

Fig. 9. RMSE Comparison

Fig. 10. MAPE Comparison

Advanced architectures, including N-HiTS, N-Beats, and TimesNet, demonstrate superior
performance over traditional deep learning and machine learning models.
• N-HiTS achieves an MAE of 0.03982, MSE of 0.00552, and RMSE of 0.07432, with a significantly lower MAPE of 7.60%. This highlights its effectiveness in handling temporal dependencies in energy forecasting.

• N-Beats, another specialized deep learning model for time series forecasting, performs
slightly better, with a MAPE of 6.51%, demonstrating its strength in long-term energy demand
prediction.
• TimesNet, a convolution-based model tailored for time series data, achieves a MAPE of 6.28%, further refining forecasting accuracy.
The "X" versions of N-Beats and N-HiTS denote the exogenous-variable variants, which incorporate the Circle feature alongside further optimizations such as hyperparameter tuning and modifications to the model architecture.

• N-BeatsX achieves an even lower MAPE of 6.03%, showing an improvement over standard
N-Beats.
• N-HiTSX, the best-performing model in the table, attains the lowest MAPE of 5.73%, MAE
of 0.01252, and RMSE of 0.01962, indicating that it provides the most accurate forecasts.
These results confirm that specialized deep learning architectures outperform traditional
machine learning and statistical models for energy forecasting.

From a practical perspective, N-HiTSX and N-BeatsX are the most promising models for energy consumption forecasting, as they yield the lowest error rates across all four metrics. These findings suggest that hierarchical interpolation and neural basis expansion techniques, extended with exogenous variables, are well-suited for handling the complexities of monthly energy forecasting in Telangana.

Fig.11. MAPE Across Models


The lowest MAPE values are observed in state-of-the-art models like N-HiTS, N-Beats, and
TimesNet. These models utilize hierarchical decomposition, temporal attention mechanisms, and
multi-scale feature learning, allowing them to better understand temporal trends and variations
in energy consumption. Their superior performance underscores the importance of adopting
modern deep learning architectures for energy forecasting.

Fig.12. MAE vs RMSE Relationship across Models

The plot in Fig. 12 is a crucial visualization for comparing the performance of different forecasting
models. Each point represents a model, with its position indicating its Mean Absolute Error
(MAE) on the x-axis and Root Mean Squared Error (RMSE) on the y-axis. MAE measures the
average absolute difference between predicted and actual values, while RMSE penalizes larger
errors more heavily due to squaring. This makes RMSE particularly sensitive to outliers, whereas
MAE provides a more straightforward average deviation.

The importance of this graph lies in its ability to differentiate between models based on their
error distributions. Models positioned in the upper right region of the plot exhibit high MAE and
RMSE values, indicating poor forecasting performance. Conversely, models clustered in the
lower left region have significantly lower errors, making them more reliable for prediction tasks.
Notably, advanced deep learning models like N-HiTS, N-Beats, and TimesNet achieve the
lowest error values, confirming their superior forecasting accuracy compared to traditional
methods such as ARIMA, RNN, and LSTM.

Fig.13. N-HiTSX Predictions chart

Fig. 13 presents a comparison between true values and predicted values in a time series
forecasting model. The x-axis represents time steps, indicating the sequential order of data points
over time, while the y-axis represents scaled values, showing the magnitude of both actual and
predicted values.

The red line represents the true values from the dataset, while the blue line represents the model's
predicted values. From the graph, we can observe that the model generally follows the trends of
the actual data, including peaks and troughs, suggesting that it has learned the overall structure
of the time series. However, there are minor discrepancies, particularly during extreme
fluctuations, where the predicted values may slightly lag or overshoot the actual values.

Fig.14. TimesNet Predictions chart

Fig. 14 represents the performance of the TimesNet model in time series forecasting,
compared to the N-HiTSX model shown in the previous graph. One key observation is that
TimesNet seems to generalize well, especially in stable regions of the dataset, maintaining a
strong correlation with actual values. However, if the model underestimates extreme peaks or
fails to react quickly to sudden shifts, it may indicate a need for further optimization. This could
involve hyperparameter tuning, integrating exogenous variables, or combining it with hybrid
approaches to enhance its adaptability to volatile patterns.

Fig.15. N-HiTSX vs N-BeatsX across different regions

Fig. 15 compares the Mean Absolute Percentage Error (MAPE) values of two deep learning
models, N-HiTSX (red bars) and N-BeatsX (blue bars), across various geographic regions or
administrative circles. The MAPE metric represents the forecasting error, with lower values
indicating better predictive performance. Each bar corresponds to a specific circle, showing the
relative performance of both models in predicting energy consumption.

From the graph, we observe that the N-BeatsX model (blue) generally exhibits slightly higher
MAPE values in many circles compared to N-HiTSX (red). However, this trend is not uniform,
as certain circles display the opposite pattern, with N-HiTSX performing worse. For instance, in
circles like Mahaboobnagar and Medak, N-BeatsX has significantly higher error values, whereas
in other locations, the difference is minimal. This suggests that the models' performance varies
depending on the region, possibly due to variations in energy consumption patterns, data quality,
or local factors influencing demand. While both models show competitive performance, N-HiTSX appears to have a slight edge in several circles by maintaining a lower error rate.

CHAPTER 6: CONCLUSION
This study aimed to enhance energy consumption forecasting in Telangana by leveraging state-
of-the-art deep learning models—N-HiTS and TimesNet. Through this experiment, it has been
observed that traditional forecasting approaches, such as ARIMA, RNN, and LSTM, struggle to capture long-term dependencies and multivariate complexities in energy data, while models incorporating CNNs are better suited to spatially structured data. Our comparative analysis demonstrated that deep learning models, particularly N-HiTSX (our advancement on the N-HiTS model) and TimesNet, significantly outperform conventional models by effectively capturing intricate temporal patterns and spatial variations through the inclusion of location as an exogenous variable.

The experimental results reveal that N-HiTSX achieved the lowest error rates across all
performance metrics (MAE, MSE, RMSE, and MAPE), making it the most accurate model for
energy consumption prediction. Additionally, TimesNet, with its unique multi-scale temporal
learning approach, showed remarkable improvement in handling long-term dependencies and
seasonal variations. These findings underscore the potential of deep learning architectures in
real-world energy forecasting applications, where precision and adaptability are critical for
efficient resource management.

Beyond model accuracy, this study highlights the importance of integrating spatial information
(e.g., location-based consumption trends) to improve forecast reliability. By incorporating
geographical identifiers as exogenous variables, we demonstrated that energy consumption
patterns vary significantly across regions, reinforcing the need for location-aware forecasting
methodologies.

However, the study also acknowledges certain limitations. First, while deep learning models
offer superior performance, they require extensive computational resources for training and
tuning. Second, the reliance on historical data means that abrupt changes due to unforeseen
events (such as policy shifts or natural disasters) may not be effectively captured. Addressing
these challenges will be crucial for future improvements in forecasting methodologies.

In summary, our research contributes to the growing field of deep learning-based energy
forecasting by validating the effectiveness of advanced neural architectures such as N-HiTS and
TimesNet. By improving prediction accuracy and demonstrating the benefits of incorporating
spatial variables, this study provides valuable insights for policymakers, energy providers, and
researchers working towards efficient energy management and planning.

CHAPTER 7: FUTURE WORK

This study has demonstrated two state-of-the-art models, TimesNet and N-HiTS, including an advancement of the N-HiTS model, N-HiTSX, that uses an exogenous variable, along with a comparative study for forecasting electric energy consumption in Telangana. Nevertheless, several research avenues remain open that can be explored in future work. First, including additional exogenous variables such as weather conditions, day-wise temperatures, geological differences, or economic activity could serve as opportunities for further research.

Second, combining the strengths of traditional models with N-HiTS or TimesNet could be explored to improve interpretability while maintaining the accuracy of the new models. Such a hybrid framework would not only increase interpretability and preserve accuracy but also contribute to building a more robust model capable of handling the various features of the dataset efficiently by leveraging the strengths of each constituent model.

Third, transfer learning techniques could be investigated to apply the trained models to other
geographical regions with limited historical data. Additionally, deploying these models in real-
time forecasting systems could help energy providers optimize grid management and demand-
side planning. Lastly, further research could explore ways to reduce the computational cost of
deep learning models, making them more accessible for large-scale industrial applications. By
addressing these areas, future research can build on our findings to develop more robust,
scalable, and efficient forecasting solutions.

REFERENCES
[1] Ramos, P.V.B., Villela, S.M., Silva, W.N. and Dias, B.H., 2023. Residential energy consumption
forecasting using deep learning models. Applied Energy, 350, p.121705.
[2] Qureshi, M., Arbab, M.A. and Rehman, S.U., 2024. Deep learning-based forecasting of electricity
consumption. Scientific Reports, 14(1), p.6489.
[3] Li, Y., 2024, February. Energy consumption forecasting with deep learning. In Journal of Physics:
Conference Series (Vol. 2711, No. 1, p. 012012). IOP Publishing.
[4] Kang, T., Lim, D.Y., Tayara, H. and Chong, K.T., 2020. Forecasting of power demands using deep
learning. Applied Sciences, 10(20), p.7241.
[5] Mathumitha, R., Rathika, P. and Manimala, K., 2024. Intelligent deep learning techniques for energy
consumption forecasting in smart buildings: a review. Artificial Intelligence Review, 57(2), p.35.
[6] Kim, J.Y. and Cho, S.B., 2019. Electric energy consumption prediction by deep learning with state
explainable autoencoder. Energies, 12(4), p.739.
[7] Somu, N., MR, G.R. and Ramamritham, K., 2021. A deep learning framework for building energy
consumption forecast. Renewable and Sustainable Energy Reviews, 137, p.110591.
[8] Real Torres, A.D., Dorado, F. and Durán, J., 2020. Energy Demand Forecasting Using Deep Learning:
Applications for the French Grid. Energies, 13 (9), Article number 2242.
[9] Lei, L., Chen, W., Wu, B., Chen, C. and Liu, W., 2021. A building energy consumption prediction
model based on rough set theory and deep learning algorithms. Energy and Buildings, 240, p.110886.
[10] Syed, D., Abu-Rub, H., Ghrayeb, A. and Refaat, S.S., 2021. Household-level energy forecasting in
smart buildings using a novel hybrid deep learning model. IEEE Access, 9, pp.33498-33511.
[11] Guo, J., Lin, P., Zhang, L., Pan, Y. and Xiao, Z., 2023. Dynamic adaptive encoder-decoder deep
learning networks for multivariate time series forecasting of building energy consumption. Applied
Energy, 350, p.121803.
[12] Edoka, E.O., Abanihi, V.K., Amhenrior, H.E., Evbogbai, E.M.J., Bello, L.O. and Oisamoje, V., 2023.
Time series forecasting of electrical energy consumption using deep learning algorithm. Nigerian Journal
of Technological Development, 20(3), pp.163-175.
[13] Khan, A.M. and Osińska, M., 2023. Comparing forecasting accuracy of selected grey and time series
models based on energy consumption in Brazil and India. Expert Systems with Applications, 212,
p.118840.
[14] Mateusz Kasprzyk, Pawel Pelka, Boris N. Oreshkin, Grzegorz Dudek, 2024. Enhanced N-BEATS for Mid-Term Electricity Demand Forecasting.
[15] Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler-Canseco,
Artur Dubrawski, 2023. N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting.
[16] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, Mingsheng Long, 2022. TimesNet:
Temporal 2D-Variation Modeling for General Time Series Analysis.
[17] Keshav G, 2020. N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting.
[18] Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio, 2020. N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting.
[19] Kin G. Olivares, Cristian Challu, Grzegorz Marcjasz, Rafał Weron, Artur Dubrawski, 2023. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx.

[20] Ahmad M. Aldabbagh, Andreas Economou, Christou Chariton, 2024. Forecasting Global Oil
Demand: Application of Machine Learning Techniques.
[21] Yu, F., 2015. Multi-scale context aggregation by dilated convolutions. arXiv preprint
arXiv:1511.07122.

