CC BY 4.0 license · Open Access · Published by De Gruyter, November 30, 2022

Data analysis-based time series forecast for managing household electricity consumption

  • Nour El-Houda Bezzar, Lakhdar Laimeche, Abdallah Meraoumia, and Lotfi Houam
From the journal Demonstratio Mathematica

Abstract

Recently, electricity consumption forecasting has attracted much research due to its importance in our daily life as well as in economic activities. This process is seen as one of the ways to manage future electricity needs, including anticipating the supply-demand balance, especially at peak times, and helping the customer make real-time decisions about their consumption. Therefore, based on statistical techniques (ST) and/or artificial intelligence (AI), many forecasting models have been developed in the literature, but unfortunately, in addition to the often poor choice of model, time series datasets have been used directly without serious analysis. In this article, we propose an efficient electricity consumption prediction model that takes into account the shortcomings mentioned earlier. Accordingly, the database was analyzed to address all anomalies, such as non-numeric, aberrant, and missing values. In addition, by analyzing the correlation between the data, the possible periods for forecasting electricity consumption were determined. The experimental results carried out on the Individual Household Electricity Power Consumption dataset showed a clear superiority of the proposed model over most of the ST- and/or AI-based models proposed in the literature.

MSC 2010: 62M10; 37M10; 62J02

1 Introduction

Energy production, in particular electrical energy, is a representative indicator of the development of countries’ economies because of its contribution to all industrial activities and to those related to the lives of citizens, from facilitating their living conditions to improving them, which contributes effectively to their modernity. Nowadays, electricity consumption keeps increasing year by year, and this rapid growth is mainly attributed to industrial development, population growth, urbanization of cities and countryside areas, and increasing household use, which requires an efficient network capable of meeting this growing demand. Under these conditions, the forecast of electricity consumption becomes a planning tool for meeting the demand for electricity and allocating electrical resources efficiently [1]. In the field of energy forecasting, it is possible to solve practical problems such as the development of demand/production forecasting systems. In fact, by using historical data and making correlations with other datasets, it is possible to create a model that can predict future electricity consumption. These historical data can be represented by a set of observations, each associated with a unique instant of time t, or they can be constructed by gathering a limited number of observations. In both cases, the representation is called a time series and can be expressed as a vector of n variables whose realizations are the actual observations. In practice, the first representation is the most used [2], where time series are represented by vectors of successive observations indexed by time.

Time series forecasting is a very active research topic in science and engineering. The main objective of analyzing these series is to develop a mathematical model capable of predicting future observations based on the available data. In the literature, there are two different approaches to time series modeling and prediction: linear and nonlinear forecasting models. Typical linear models for time series forecasting mainly include autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models [3,4,5,6,7]. Until recently, linear statistical forecasting methods were widely used, but unfortunately these models hardly achieve effective prediction results for nonlinear or chaotic time series because they assume the time series to be stationary and to exhibit linear dependencies over time, which requires statistical methods and prior knowledge.

Recently, much research on the use of artificial intelligence (AI) has been proposed to solve forecasting problems involving nonlinear time series, using machine learning (ML) techniques, genetic algorithms, fuzzy logic, and so on [8,9,10,11,12,13,14,15,16]. ML is one of the most interesting branches of AI that can be used for such prediction problems. ML methods can be classified into two main families: unsupervised learning and supervised learning. In unsupervised learning, without additional information, the algorithm can classify the input variables by developing an internal representation of the input data space based on an underlying statistical structure of the variables. In general, supervised ML consists of creating a classification or regression model from a database that includes the input examples along with the associated desired outputs. The model parameters are then adapted by repeatedly comparing the achieved outputs with the desired ones.

In AI-based time series forecasting, most studies have used linear or nonlinear predictive models [17], which have indeed achieved very acceptable prediction errors. Despite this success, these methods depend on the nature of the input data and the type of model used. Indeed, for the linear model, we find that the models developed are based on datasets collected at annual, seasonal, monthly, weekly, or daily granularities. Moreover, these models suffer from datasets containing non-numeric and missing values, which hinders the direct application of linear models to such data; this is why a data preparation task should be applied to fill in missing cells and replace non-numeric data. To overcome these issues, in this article, we propose a new energy consumption forecasting model based on XGBoost ensemble regression, which has a built-in capability to handle missing values and can therefore be used even with missing data. In addition, based on the grid search method, we will address the problem of over-fitting by tuning a set of XGBoost hyperparameters. We will test the proposed model using the Individual Household Electricity Power Consumption (IHEPC) dataset, which allows us to develop different models based on different granularities. This dataset contains several measurements, the most important of which is global active power. Therefore, due to the importance of this metric, we will focus on its prediction in this article.

The rest of this article is organized as follows: Section 2 presents a review of the literature in the field of electricity consumption forecasting. Section 3 introduces the XGBoost ML technique used in this work as well as the performance measures used for the evaluation of time series models. The concept of our proposed methodology is introduced in Section 4. In Section 5, we evaluate the proposed method using a publicly available database. Section 6 presents a comparative study between our proposed scheme and some recent methods in the literature. Finally, Section 7 provides the conclusions reached and the future scope.

2 Literature review

In this section, we highlight the latest work carried out on AI-based electricity consumption prediction using time series forecasting models. Due to the many models used, we divide this overview into two parts: the first concerns ML methods, while the second presents methods based on deep learning (DL).

In [18], four statistical models are developed, including multiple linear regression (MLR), support vector machine (SVM) with radial kernel, random forest (RF), and gradient boosting machines (GBMs). In this work, the authors first carried out a data analysis process, which included several steps, the most important of which are the aggregation of electrical energy consumption, the measurement of appliance energy consumption over the whole period, pair plots of the relationships between appliance energy consumption variables, a heat map of hourly appliance energy consumption, and variable importance and selection. Then, all regression models were trained using tenfold cross-validation to select the best one. The proposed models were tested on the passive house planning package dataset, where the experimental results showed that GBM performed better in terms of root mean square error (RMSE = 0.665), mean absolute percentage error (MAPE = 0.389), and mean absolute error (MAE = 0.352) compared to MLR, SVM, and RF. In another ML-based electricity consumption prediction work, Wu et al. in [19] proposed a boosting-based framework for multiple kernel learning regression to deal with short-term load forecasting. In this method, the authors first adopt boosting to learn a set of multiple kernel regressors and then extend this framework to the context of transfer learning, where both homogeneous and heterogeneous transfer learning are considered. Experimental results on residential datasets (OpenEI and UCI datasets) demonstrate that the forecasting error can be reduced to a large extent by knowledge gained from other houses.

The time series forecasting model for electricity consumption proposed in [20] is based on kernel principal component analysis (KPCA) and SVM. The main idea of this work is to use KPCA for feature selection and data dimensionality reduction and SVM for prediction. In the experimental step, several preprocessing operations, including data decomposition and normalization scaling, were first performed. Subsequently, KPCA was used to reduce the processed data, and then a 30 min-ahead prediction of electricity consumption was performed using the SVM model. The experiments yielded excellent results, with MAE and RMSE equal to 11.48 and 13.86, respectively.

Moon et al. in [21] proposed a hybrid electric load forecasting model for an educational building complex using an RF and a multilayer perceptron (MLP). In this method, 6 years of electrical load data from a college campus are first collected. Then, before building the model, a preprocessing task was performed, including temperature adjustment and estimation of the week-ahead and year-ahead consumption. Finally, a hybrid forecasting model was built based on the combination of an RF model and an MLP model. Various RF and MLP configurations were evaluated for their prediction performance using the cross-validation technique. The experimental results, performed on a daily power consumption dataset from a university in Korea, showed that these two models have excellent performance in forecasting electric load, obtaining 1.143 and 2.946 for MAPE and RMSE, respectively.

The emergence of DL and the promising results obtained in many fields have led many researchers in the field of energy consumption to use these techniques to model electricity consumption forecasting. Thus, Rajabi and Estebsari in [22] proposed a convolutional neural network (CNN)-based DL structure to predict individual loads. In this work, the time series data are first encoded into recurrence plot images, and then these images are fed into a deep CNN to obtain the predicted value of the time series data. In addition, the SVM and artificial neural network (ANN) ML techniques were applied to the time series dataset to compare their results with those of the CNN. To evaluate the performance of the proposed models, experiments were performed on the UCI dataset, where the results showed that the proposed CNN-based model performs better than the examined ML methods (SVM and ANN), with MAE and RMSE equal to 0.59 and 0.79, respectively.

Tang et al. in [23] proposed a multilayer bidirectional recurrent neural network (BRNN) model based on long short-term memory (LSTM) and gated recurrent units (GRUs) to forecast the short-term power load. In their data preprocessing step, normalization scaling was used to bring the feature values into the same magnitude range. Then, the BRNN model based on LSTM and GRU was adopted to improve the accuracy of short-term power load forecasts. Experiments were performed on the EUNITE and Weather datasets. The experimental results showed that the proposed method can improve the accuracy of the short-term load forecast and reduce the fluctuations of the forecast values during the forecasting process, reaching 1.90 and 16.044 for MAPE and RMSE, respectively. Also, Sajjad et al. in [24] proposed a novel hybrid approach based on CNN-GRU for short-term residential load forecasting. In this work, the standard min-max scaling method was first applied due to the nonlinearity of the input data, and the normalized data were then used for training. A hybrid model combining a CNN with a GRU was developed after investigating several ML and DL models: the spatial features are extracted via the CNN and then fed into the multilayer GRU to model the temporal features of the input time series data. The proposed method was evaluated on the appliances energy prediction (AEP) and IHEPC datasets, where the experimental results showed that the forecasting model achieves 0.31 MAE and 0.24 RMSE on the AEP dataset.

A framework based on DL and supervised ML techniques is proposed in [25] for electricity consumption prediction. In this work, feature selection techniques, RF and extreme gradient boosting, were first used to calculate feature importance. Then, two improved techniques, SVM with gray wolf optimization (SVM-GWO) and CNN-GRU with earthworm optimization (CNN-GRU-EWO), were proposed to predict the electric load, with GWO and EWO applied to tune the hyperparameters of SVM and CNN-GRU, respectively. The experimental results showed that the proposed techniques are 7 and 3% more efficient than the state of the art. In the same context and based on optimization techniques, Elattar et al. in [26] proposed a method based on locally weighted support vector regression (LWSVR) and a modified grasshopper optimization algorithm (MGOA). In this work, the authors first modified the conventional GOA algorithm using both chaotic initialization and a sigmoid decrement criterion. Then, the enhanced MGOA was used to obtain appropriate values of the LWSVR parameters. Finally, six datasets were used to evaluate the performance of the proposed method, namely, the New York, Victoria, EKPC area, CEFCom, ISO New England, and microgrid datasets. The experimental results indicate that the proposed method has high prediction performance on the ISO New England dataset, where a MAPE of 1.67 and an MAE of 0.89 were obtained.

Khan et al. in [27] developed a CNN with an LSTM autoencoder (LSTM-AE) model and a data preprocessing step to efficiently predict electricity consumption in residential and commercial buildings. In this work, a preprocessing step was first performed in which min-max scaling, standard transformation, max-abs scaling, and quantile and power techniques were used to process existing null, redundant, and outlier values. Then, during the learning step, the CNN extracts features from the refined input data. The output CNN features are fed into the LSTM encoder, which encodes input sequences of four time steps. These encoded sequences are input to another LSTM for decoding, and finally, a dense layer produces the output prediction for the input sequence. The experimental results proved the ability of the proposed hybrid model to obtain good results: 0.47, 0.76, and 0.31 for RMSE, MAPE, and MAE, respectively.

3 Theory framework

In this section, we will detail the concepts and theories that we will use in our methodology.

3.1 Time series components

In general, anything that is observed sequentially over time is a time series. A discrete time series $(x_t)_{1 \le t \le n}$ is a series of $n$ numerical data points in chronological order (a chronological series), indexed in time with a constant time interval between the indices, where time is represented in seconds, minutes, days, years, and so on. One of the main objectives of the study of time series is the prediction of future observations, often used in several fields, such as economics, finance, demography, biology, medicine, meteorology, and pollution. Of course, no model exactly matches reality, and it is impossible to perfectly predict the future of a time series, but wherever possible, the model will provide forecast intervals, so that we can give information on the accuracy of the prediction. To this end, there is a wide choice of models that can be used, such as regression models, exponential smoothing, and ARMA-type models [28].

A time series typically has three main components: trend, seasonality, and residual. The first component describes the long- or short-term movement of a time series. With a simple analysis of this component, all fluctuations in random observations due to seasonal and cyclical factors can be removed. The second component represents a phenomenon that repeats itself at regular time intervals (a periodic function), while the third component, the residual, represents irregular fluctuations of low intensity but random in nature. Some time series may not be affected by all of the component types mentioned earlier, but in any case, the time series should be analyzed to estimate and separate all components in order to highlight the relative effect of each on the overall behavior of the time series [29].

3.1.1 Time series models

Mathematically, the interaction of the three components of the time series (trend, seasonality, and residual) produces two types of models: the additive model and the multiplicative model [30].

Additive model: In this model, the time series can be mathematically represented by:

(1) $x_i = g_i + s_i + w_i$,

where $\{x_i\}_{i=1}^{n}$ is the observed series, $g_i$ is the trend, $s_i$ is the seasonal factor, and $w_i$ is the residual fluctuation component. If this phenomenon is observed at regular times, the $i$-th value is observed at time $i \times t_e$, where $t_e$ is the sampling period or the observation period. By convention, $s_i$ is required to have a zero mean, to unambiguously separate the trend and the seasonal factor:

(2) $\sum_{k=1}^{p} s_{i+k} = 0, \quad 1 \le i \le n - p$,

where $p$ is the period of the seasonal fluctuation. Similarly, for the mean of $x_i$ to be included in the trend $g_i$, we also assume that $w_i$ has zero mean. In this model, the amplitudes of $s_i$ and $w_i$ at a given instant do not depend on the value of $g_i$ at the same instant.

Multiplicative model: In this model, the time series can be mathematically represented by:

(3) $x_i = g_i (1 + s_i)(1 + w_i) = g_i + g_i s_i + g_i (1 + s_i) w_i = g_i + \hat{s}_i + \hat{w}_i$,

where $\{x_i\}_{i=1}^{n}$ is the observed series, $g_i$ is the trend, $1 + s_i$ is the seasonal factor, and $1 + w_i$ is the residual fluctuation component. The seasonal factor $\hat{s}_i = g_i s_i$ that is added to the trend is proportional to its value, and the residual fluctuation $\hat{w}_i = g_i (1 + s_i) w_i$ that is added to the sum of the two previous terms is itself proportional to this sum. Finally, it should be noted that the two models can be combined to obtain a hybrid or mixed model. Different hybrid models exist in the literature; equation (4) presents one example:

(4) $x_i = (g_i + s_i)(1 + w_i)$.

The additive model is useful when the seasonal variance is relatively constant over time, while the multiplicative model is useful when the seasonal variance increases over time. In other cases, a hybrid model can be used [31].

3.1.2 Time series decomposition

Time series decomposition (into trend, seasonality, and residual components) is a crucial step in time series analysis, in which it serves as an analytical tool that provides an abstract model useful for the estimation, forecasting, and generation processes. The importance of this step lies in determining whether the studied time series contains consistent or recurring components that can be described and directly modeled. Each of these components can also indicate whether data preparation, model selection, or feature engineering tasks are needed. Thus, the time series decomposition process involves, first, applying a convolution filter to the data to estimate the trend, then removing that trend from the time series, and finally, averaging the detrended series over each time period to obtain the seasonal component [32].

In general, when analyzing a time series, we first represent it graphically (visual inspection). In some cases, a trend component and a seasonal cycle can be observed. Unfortunately, these components are not always easy to observe, and the series may not contain any of them. In fact, the trend can sometimes be observed only after the data have been transformed by a function, for example, a logarithm. In the following, we assume that the time series $x_t$ can be represented in the decomposition form shown in equation (1).

  • Trend estimation: Trend estimation deals with the characterization of the fundamental (underlying) or long-term evolution of a time series. A variety of methods are available; two popular ones (MA smoothing and differencing) are as follows:

    1. Smoothing by MA: the MA smoothing method is a nonparametric method that reproduces the trend. The advantage of this method is that it requires no assumptions and can accommodate a non-constant trend within a period of seasonal variation [33]. The trend is estimated by a symmetric MA of length $p$:

      1. If the period is odd ($p = 2q + 1$), then

        (5) $(g_i)_e = \frac{1}{2q+1} \sum_{j=-q}^{q} x_{i+j} = \frac{1}{p} \sum_{j=-q}^{q} x_{i+j}, \quad i = q+1, \ldots, n-q$.

      2. If the period is even ($p = 2q$), then

        (6) $(g_i)_e = \frac{1}{p} \left( 0.5\, x_{i-q} + \sum_{j=-q+1}^{q-1} x_{i+j} + 0.5\, x_{i+q} \right), \quad i = q+1, \ldots, n-q$.

        The choice of $q$ always results from a compromise: too low a $q$ does not allow the trend to be extracted from the residual, while too high a $q$ does not follow changes in the trend (a code sketch of this smoothing follows the list).

    2. Differencing: To deal with the case of a time series with a seasonal factor of period $p$, the differencing method must be applied at order $p$. The difference operator $\nabla_p$ is given by the following formula:

      (7) $\nabla_p x_i = x_i - x_{i-p} = (1 - B^p) x_i$,

      where $B$ is the backward operator defined as follows:

      (8) $B^p x_i = x_{i-p}$.

      By applying the $\nabla_p$ operator to the data, we obtain:

      (9) $\nabla_p x_i = g_i - g_{i-p} + w_i - w_{i-p}$.

      Unlike the previous method, here the trend $g_i$ is not calculated explicitly; only the difference $(g_i - g_{i-p})$ is obtained.

  • Seasonal factor estimation: After estimating the trend $(g_i)_e$ and subtracting it from the data $x_i$ for the different values of $i$, it remains to estimate the seasonal factor $s_i$ as follows:

    (10) $(s_m)_e = \frac{1}{K} \sum_{k=1}^{K} \left( x_{kp+m} - (g_{kp+m})_e \right) - \frac{1}{p} \sum_{m=1}^{p} \frac{1}{K} \sum_{k=1}^{K} \left( x_{kp+m} - (g_{kp+m})_e \right), \quad 1 \le m \le p, \; K = \lfloor n/p \rfloor$,

    where $\lfloor \cdot \rfloor$ denotes the integer part operator.

  • Residual fluctuation estimation: Of course, after estimating the seasonal factor $(s_i)_e$ and subtracting it from the detrended data $x_i - (g_i)_e$ for the different values of $i$, we obtain an estimate of the residual variability $w_i$ at each moment $i$:

    (11) $w_i = x_i - (g_i)_e - (s_i)_e$.
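As a rough illustration of the trend estimation in equations (5) and (6), the following sketch computes the symmetric MA with pandas; the use of pandas and the function and series names are our own assumptions, not part of the method description above.

```python
# Sketch: symmetric moving-average trend estimation (equations (5)-(6)).
# `series` is assumed to be a pandas Series of observations.
import numpy as np
import pandas as pd

def estimate_trend(series: pd.Series, p: int) -> pd.Series:
    if p % 2 == 1:
        # Odd period (p = 2q + 1): plain centered moving average, equation (5).
        return series.rolling(window=p, center=True).mean()
    # Even period (p = 2q): p + 1 points with half weights on the two
    # end points, equation (6); the weights sum to 1.
    weights = np.r_[0.5, np.ones(p - 1), 0.5] / p
    return series.rolling(window=p + 1, center=True).apply(
        lambda x: float(np.dot(x, weights)))
```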

3.2 XGBoost description

XGBoost is an implementation of gradient-boosted decision trees in which the trees are created sequentially [34]. Weights play an important role in XGBoost: all independent variables are weighted and then fed into the decision tree that predicts the results. The weights of variables wrongly predicted by a tree are increased, and these variables are then fed to the next decision tree. Combining these individual learners yields a robust and more accurate model that can be used for regression, classification, ranking, and prediction problems. The algorithm of this model can be summarized in the following steps (a minimal code sketch follows the list):

  1. The first step in gradient boosting is to create an initial constant value $F_0$ based on the dataset observations $x$; mathematically, this first step can be written as follows:

    (12) $F_0(x) = \operatorname*{argmin}_{\hat{y}} \sum_{i=1}^{n} L(y_i, \hat{y})$,

    where $L$ denotes the loss function, $y_i$ is the observed value, and $\operatorname{argmin}$ means that we have to find the predicted value $\hat{y}$ for which the loss function is minimal. This step thus consists of finding the predicted value that minimizes the loss function given in equation (13), based on the differentiation of this loss function shown in equation (14).

    (13) $L(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$.

    To find the $\hat{y}$ that minimizes $\sum L$, we take the derivative of $\sum L$ with respect to $\hat{y}$:

    (14) $\frac{\partial}{\partial \hat{y}} \sum_{i=1}^{n} L(y_i, \hat{y}) = \frac{\partial}{\partial \hat{y}} \sum_{i=1}^{n} (y_i - \hat{y})^2 = -2 \sum_{i=1}^{n} (y_i - \hat{y}) = -2 \sum_{i=1}^{n} y_i + 2n\hat{y}$.

    We then find the $\hat{y}$ that makes $\partial \sum L / \partial \hat{y}$ equal to zero:

    (15) $-2 \sum_{i=1}^{n} y_i + 2n\hat{y} = 0 \;\Rightarrow\; \sum_{i=1}^{n} y_i = n\hat{y} \;\Rightarrow\; \hat{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$.

    It turns out that the value $\hat{y}$ that minimizes $\sum L$ is the mean of the $y_i$.

  2. The next step consists of calculating the pseudo-residual values, given by (the observed value − the predicted value). This step can be written as follows:

    (16) $R_{im} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)} \quad \text{for } i = 1, \ldots, n$.

    Here, $F(x_i)$ is the previous model and $m$ is the index of the decision tree (DT) being created. In this step, the derivative of the loss function is taken with respect to the predicted value:

    (17) $R_{im} = -\frac{\partial (y_i - F_{m-1})^2}{\partial F_{m-1}}$,

    (18) $R_{im} = 2(y_i - F_{m-1})$.

  3. This step attempts to find the output value for each leaf of the decision tree. There may be cases where a leaf receives more than one residual, so the final output of each leaf must be found; it can simply be taken as the average of all the numbers in the leaf. Mathematically, this step can be represented by:

    (19) $\gamma_m = \operatorname*{argmin}_{\gamma} \sum_{i=1}^{n} L\left( y_i, F_{m-1}(x_i) + \gamma h_m(x_i) \right)$,

    where $h_m(x_i)$ is the DT fitted on the residuals and $m$ is the index of the DT.

  4. The last step is to update the predictions of previous models. It can be updated as follows:

    (20) $F_m(x) = F_{m-1}(x) + \nu_m h_m(x)$,

    where $m$ is the number of decision trees built so far, $F_{m-1}(x)$ is the base model prediction (previous prediction), and $\nu_m$ denotes the learning rate.
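To make these four steps concrete, here is a minimal from-scratch sketch of gradient boosting with squared-error loss, using scikit-learn regression trees as the base learners $h_m$. It is an illustration under our own assumptions: plain gradient boosting, without the regularization and second-order statistics that XGBoost adds on top.

```python
# Minimal gradient-boosting sketch for squared-error loss (steps 1-4 above).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    f0 = y.mean()                        # step 1: F_0 is the mean of y, eq. (15)
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction       # step 2: pseudo-residuals, eqs. (16)-(18)
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)           # step 3: leaf outputs average the residuals
        prediction += learning_rate * tree.predict(X)  # step 4: update F_m, eq. (20)
        trees.append(tree)
    return f0, trees

def predict_gradient_boosting(f0, trees, X, learning_rate=0.1):
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```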

3.3 Performance measures

Statistical metrics, or cost functions [35], are the key performance indicators of different models; they allow us to evaluate the performance of a particular model. Among the metrics used to evaluate time series models, RMSE, MAPE, and MAE [36] are the most commonly used because they summarize the variability of the observations around the predictions. Although similar, they are not on the same scale, and therefore their values are not identical (a short code sketch of the three metrics follows the list).

  • RMSE is the standard deviation of the residuals (prediction errors), which gives us the distance between the data points and the regression line [37]. Mathematically, the RMSE formula can be written as follows:

    (21) $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$,

    where $\hat{y}_i$ are the predicted values, $y_i$ are the observations, and $n$ is the number of observations available for analysis. The use of RMSE is very common, and it is considered an excellent general-purpose error metric for predictions.

  • MAPE is the second most used metric to evaluate the time series model and is represented mathematically by the following equation:

    (22) $\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$.

    This metric measures the prediction error and divides it by the actual observed value; it is therefore scale independent, meaning it can be used to compare models on different datasets.

  • MAE measures how much the estimated or forecasted values differ from the actual values. It is mostly used in time series analysis, but it can be applied to any type of statistical estimation. The MAE is simply defined as follows:

    (23) $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$,

    where $y_i$ and $\hat{y}_i$ denote the actual and predicted values, respectively.
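A direct NumPy transcription of equations (21)-(23) might look as follows; the array names are illustrative.

```python
# Sketch of the three performance measures in equations (21)-(23).
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    # Scale independent: each error is divided by the actual observation.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))
```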

4 Methodology

Data analysis is a field belonging to the statistics domain, which aims to establish the relationships between different statistical data and to classify, describe, and analyze them concisely. The purpose of this analysis is to extract statistical information to identify the profile of the data more accurately. The results obtained make it possible to optimize the respective forecasts by adjusting certain points (missing data, measurement data, etc.). In our work, building on the data analysis domain, we propose a time series forecasting framework for household electricity consumption, shown in Figure 1. The proposed system consists of three main steps: data analysis, feature engineering, and time series modeling. In our system, data analysis includes three main processes: data cleaning, data decomposition, and data auto-correlation. In addition, feature generation and data scaling are used in the feature engineering task. Finally, XGBoost, an efficient implementation of the gradient-boosted trees algorithm, is used for the final modeling.

Figure 1: Proposed forecasting electricity consumption framework.

4.1 Dataset description

In this work, to evaluate our proposed method, we used the IHEPC dataset [38]. This dataset was collected from a household located in Sceaux (near Paris, France) and contains 2,075,259 observations over a period of almost 4 years, between December 2006 and November 2010 (47 months). The dataset contains various electrical quantities and some submetering values, as well as the date and time. The electrical quantities and submeterings in this dataset are global active power ($P_a$), global reactive power ($P_r$), voltage ($V$), global intensity ($I$), and three submeterings corresponding to the kitchen ($M_1$), the laundry room ($M_2$), and an electric water heater and an air conditioner ($M_3$).

4.2 Data analysis

Data analysis is an important step in the study of time series; it is used to analyze and summarize their key features using data visualization (visual inspection) and statistical methods to make the observations usable by ML algorithms. As mentioned earlier, the analysis process involves various tasks, among which are data cleaning, data decomposition, and data auto-correlation.

4.2.1 Data cleaning

The first step in the dataset analysis is to explore the dataset for nulls, duplicates, outliers, and missing values. These issues can be overcome by replacing “null” and missing values with other values (or omitting them) and ensuring that there are no duplicates. In our work, to explore the IHEPC dataset, we first visualized the percentage of missing values per feature, as shown in Figure 2. From this figure, we can clearly see missing values of about 1.25% for the $M_3$ feature. All calendar timestamps are present in the dataset, but for some timestamps the measured values are missing: such a missing value is represented by a “?” character in the measurement columns and by an NaN value for $M_3$. To avoid the effect of the missing $M_3$ values on the prediction model, we replaced these invalid measurements with the observations at the same time on the previous day.

Figure 2: Percentage of missing values in the dataset.
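As an illustration, this cleaning step could be implemented with pandas as sketched below. The file name, separator, and date format follow the UCI distribution of the IHEPC dataset, but they should be treated as assumptions rather than as part of the method.

```python
# Sketch: load the IHEPC data, turn "?" into NaN, and fill each missing
# value with the observation at the same time on the previous day
# (1,440 one-minute steps earlier).
import pandas as pd

raw = pd.read_csv("household_power_consumption.txt", sep=";",
                  na_values=["?"], low_memory=False)
raw.index = pd.to_datetime(raw["Date"] + " " + raw["Time"],
                           format="%d/%m/%Y %H:%M:%S")
data = raw.drop(columns=["Date", "Time"]).astype(float)
data = data.fillna(data.shift(1440))
```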

4.2.2 Data decomposition

This technique identifies the components of the time series and checks whether the series is stationary over the historical observation period. If the time series is not stationary, differencing can be applied to the records to make it stationary. Moreover, the decomposition serves as a guide for lag selection. Thus, we analyzed the time series used, and the obtained results are shown in Figure 3.

Figure 3: Household electricity consumption decomposition based on monthly granularity. (a) Trend, (b) seasonality, and (c) stationary (residuals).

In this step, to analyze the IHEPC dataset, we used a statistical method based on a convolution between the observed feature values and a filter defined by an MA model. We assumed that the model is additive and preset the slide and frequency values to 2 and 12, respectively. The trend graph of the IHEPC dataset, presented in Figure 3(a), gives an overview of electricity consumption, showing a strong increase in electricity consumption in the hot periods between June and August and a decrease in the cold periods between December and March. We can also clearly see in this figure an extreme value in 2008 of 97.76 kW, a consumption peak identified on July 31, 2008 at 4:21 p.m.
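A minimal sketch of such an additive decomposition, assuming statsmodels’ seasonal_decompose, the monthly-resampled global active power, and the `data` frame from the cleaning step (names are illustrative):

```python
# Sketch: additive decomposition of monthly mean consumption with a
# 12-month period, mirroring the components of Figure 3.
from statsmodels.tsa.seasonal import seasonal_decompose

monthly = data["Global_active_power"].resample("M").mean()
decomposition = seasonal_decompose(monthly, model="additive", period=12)
trend = decomposition.trend          # Figure 3(a)
seasonal = decomposition.seasonal    # Figure 3(b)
residual = decomposition.resid       # Figure 3(c)
```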

Statistically, the trend graph of the IHEPC dataset is useful for extracting statistical information that can be used in the electricity consumption modeling step. From Figure 3(a), it can be noted that the IHEPC trend is nonlinear over the long-term consumption period but linear over short-term consumption periods, in particular over the first and last 6 months. This is explained by the existence of a significant auto-correlation between observations over short-term consumption periods. It can be concluded that a forecasting model based on hourly or daily granularity is better than one based on monthly granularity. Furthermore, this figure shows the existence of outliers, for example, in April and October 2007 and in May 2008, which indicates the need for a data scaling process.

Figure 3(b) illustrates the seasonal component, which represents variations that occur at regular intervals. Here, we set the frequency to 12, which resamples the data monthly. In this figure, the periodicity of the seasonality of electricity consumption can be identified, with the spectral density being stronger around each beginning of the year. In this case, the long-term seasonality has a period of 1 year, as shown in Figure 4.

Figure 4: Seasonality of household electricity consumption.

Finally, the last component, shown in Figure 3(c), represents the time series residuals, from which we can check whether the data observations are predictable. On the basis of the statistics of the residual errors obtained from the residual component, we can check the effect of aberrations on the model prediction. This is done by checking whether the mean of the residual errors is close to zero (i.e., the distribution of residual errors follows a Gaussian distribution); if so, the seasonal component does not need seasonal adjustment to smooth out aberrations in the time series. For this, Table 1 shows the summary statistics of the residual errors, including the mean and the standard deviation of the distribution, as well as the percentiles and the minimum and maximum errors observed.

Table 1

Summary statistics of the residual errors

Mean Standard deviation Minimum error Quartile (Q1) Quartile (Q2) Quartile (Q3) Maximum error
0.0042 2.3193 −5.2181 −1.0298 0.0092 1.1090 7.0728

The statistics in this table indicate that the seasonal component does not need seasonal adjustment, which is also confirmed by Figure 5, showing that the distribution of residual errors follows a Gaussian distribution.

Figure 5: Household electricity consumption residual errors.

4.2.3 Time series auto-correlation

It is also important to consider the persistence of the relationship between the current observations of the time series and its past values, expressed through the lag. We checked this with the auto-correlation function (ACF) of the electricity consumption series. Mathematically, the observations $y_t$ and $y_{t-k}$ are separated by a lag of $k$, which can be in days, quarters, or years depending on the nature of the data. From the IHEPC dataset decomposition, we observed that, in contrast to the long-term consumption periods, there is a significant auto-correlation between the observations of the short-term consumption periods (hourly/daily). In our study, to calculate the ACF values based on hourly and daily granularities, we used the Pearson correlation coefficient, whose values lie between −1 and 1.
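Since pandas’ Series.autocorr computes exactly this Pearson correlation between a series and a lagged copy of itself, the ACF values can be sketched as follows (the series and function names are illustrative):

```python
# Sketch: Pearson auto-correlation at lags 1..max_lag of a resampled series.
import pandas as pd

def acf_values(series: pd.Series, max_lag: int) -> pd.Series:
    return pd.Series({k: series.autocorr(lag=k) for k in range(1, max_lag + 1)})

daily_mean = data["Global_active_power"].resample("D").mean()
acf_daily = acf_values(daily_mean, max_lag=30)     # lags of up to 30 days
hourly_mean = data["Global_active_power"].resample("H").mean()
acf_hourly = acf_values(hourly_mean, max_lag=60)   # lags of up to 60 hours
```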

To compare the ACF values of the time series dataset used, it is important to examine them at different lag sizes. Thus, Figure 6 shows the ACF coefficients for both the daily (Figure 6(a) and (b)) and hourly (Figure 6(c) and (d)) granularities, for lag sizes of 7 and 30 days and 60 and 12 h, respectively. From this figure, we can draw the following observations:

  • For the daily granularities, Figure 6(a) shows a strong correlation between the 1st and the 7th day (an auto-correlation value of approximately 0.6), as well as between the 2nd and the 6th day, with an acceptable correlation of 0.4. Figure 6(b) shows the auto-correlation values with a lag size of 30 days; here, we observe a strong correlation between the 4th and the 8th day.

  • For the hourly granularities, Figure 6(c) shows an acceptable correlation between the 13th and the 36th hour, with an auto-correlation value of around 0.2, whereas in Figure 6(d), the ACF values are very close to 0, indicating no correlation.

Figure 6: Auto-correlation of time series data resampled to: (a and b) daily mean values and (c and d) hourly mean values.

4.3 Feature engineering

Feature engineering plays a crucial role in many data modeling tasks; it is simply a process that defines particular features of data to make ML algorithms more accurate [39].

4.3.1 Feature generation

This is a fundamental step in the data analysis domain, which includes selecting, manipulating, and transforming raw data into features that can be used by ML techniques. In this work, instead of using the IHEPC dataset as a multivariate dataset, i.e., predicting the electricity consumption for each submetering feature, we used the univariate principle. The task is thus to create new features from our time series dataset. For this, we created an additional feature, called $M_4$ or submetering 4, using the following formula:

(24) $M_4 = (P_a \times 1000 / 60) - \sum_{i=1}^{3} M_i$,

where $M_4$ denotes the generated feature, $P_a$ is the global active power, and $M_i$ are the submetering features. The obtained feature $M_4$ represents the active power consumed every minute in the household by electrical equipment not measured in $M_i$, $i = 1, \ldots, 3$.
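In pandas, equation (24) reduces to a single expression; the column names below follow the UCI distribution of the dataset and are assumptions here.

```python
# Sketch: generate the submetering-4 feature of equation (24).
data["Sub_metering_4"] = (data["Global_active_power"] * 1000 / 60) - (
    data["Sub_metering_1"] + data["Sub_metering_2"] + data["Sub_metering_3"]
)
```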

4.3.2 Data scaling

As we know, most ML techniques make decisions based on the set of features applied to them, and the algorithms often compute distances between data points to draw inferences from the data. If the feature values are close to each other in scale, there is a good chance that the algorithm will train well and quickly; with feature values that differ greatly from each other, the algorithm takes longer to understand the data, and the accuracy of the resulting model is generally lower [40].

As indicated in Section 4.2, the IHEPC dataset contains aberrant values corresponding to extreme electricity consumption. To show the magnitude of the IHEPC features, we calculated the probability density function of each feature, as shown in Figure 7. Note that the voltage values ($V$) in the household normally range between 230 and 250 volts (a Gaussian function centered at 240 volts), so there is no need to graph them.

Figure 7: Probability density function of IHEPC features.

As shown in Figure 7, we observed significant intra-/inter-feature variation in the IHEPC dataset in terms of values and units, especially for the voltage feature. To handle the highly varying magnitudes, values, and units present in the IHEPC dataset, the two most common scaling techniques, normalization and standardization, are used in this work [41] (a code sketch follows the list).

  • Normalization: The min-max scale can be applied when the data vary across different scales. After this transformation, the features lie in a fixed interval $[0, 1]$. The purpose of such a restricted interval is to reduce the variation space of the feature values and therefore the effect of outliers. Min-max normalization is done using the following formula:

    (25) $\hat{X} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$,

    where $X_{\min}$ and $X_{\max}$ denote, respectively, the smallest and largest observed values of feature $X$, and $X$ is the value of the feature we are trying to normalize.

  • Standardization: Standardization can be applied when the input features follow normal (Gaussian) distributions with different means and standard deviations; the transformation makes all features satisfy the same normal distribution $X \sim N(0, 1)$. Standardization can also be applied when features have different units. It is the process of transforming a feature into another that satisfies the normal (Gaussian) distribution with $\mu = 0$ and $\sigma = 1$. The standardization formula is as follows:

    (26) $Z = \frac{X - \mu}{\sigma}$,

    where $X$ denotes the value we want to standardize (input variable), $\mu$ is the mean of the observations for this feature, and $\sigma$ is the standard deviation of the observations for this feature.
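Both transformations are available in scikit-learn; a minimal sketch, assuming the features are held in the `data` frame, might look like this:

```python
# Sketch: apply the two scaling methods of equations (25) and (26).
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_normalized = MinMaxScaler().fit_transform(data)       # equation (25)
X_standardized = StandardScaler().fit_transform(data)   # equation (26)
# In practice, fit the scaler on the training split only and reuse it on
# the test split to avoid information leakage.
```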

4.4 Time series modeling

In general, the proposed methodology is mainly based on the XGBoost model, which, as mentioned earlier, is powerful and flexible. To generate our prediction model, we first specified a target $y_i$ in the IHEPC dataset (the global active power in our work) and used the remaining features $x_i$ as predictors of the target. Then, a decision tree is created on the residuals using a similarity score: the similarity of the data in a leaf is calculated, as well as the similarity gain of each subsequent split, and the output value of each leaf is computed from the residuals. The output of the tree becomes the new residual for the dataset, which is used to build the next tree. This process is repeated until the residuals stop decreasing or for a specified number of epochs. Finally, to use this model for prediction, the output of each tree, multiplied by a learning rate, is added to the initial prediction to arrive at a final prediction. For clarity, Figure 8 shows the behavior of XGBoost during the training phase.

Figure 8: XGBoost-based modeling process.

5 Experimental results

In this section, a detailed performance evaluation of the proposed time series forecasting method is presented. In our experiments, the IHEPC dataset described earlier was divided into two galleries: the first, containing data from December 26, 2006 to December 31, 2009, was used as the training dataset, while the second, containing data from January 1, 2010 to November 26, 2010, was dedicated to testing the forecasting models and finding the most appropriate forecast period. The experiments conducted in this work are divided into three main parts. In the first part, the performance of the XGBoost-based model is evaluated using the default XGBoost hyperparameters and unscaled data. In the second part, we tune the XGBoost hyperparameters to achieve the best performance. Finally, in the third part, the best XGBoost configuration is re-evaluated using dataset feature scaling.

5.1 XGBoost-based time series forecasting model

In this first set of experiments, we resampled the time series dataset with hourly and daily granularities. This choice is explained by the strong auto-correlation that exists between present and past observations, as discussed in Section 4.2. Then, we used XGBoost regression with its default hyperparameters (number of estimators = 100, maximum depth = 6, learning rate = 0.1, booster type = gbtree) to build the electricity consumption forecasting model. Figure 9 illustrates the hourly and daily electricity consumption forecasts obtained using this XGBoost model.
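A sketch of this baseline, assuming XGBoost’s scikit-learn wrapper and the train/test split described above (the variable names are illustrative):

```python
# Sketch: baseline forecasting model with the default XGBoost hyperparameters.
import xgboost as xgb

model = xgb.XGBRegressor(n_estimators=100, max_depth=6,
                         learning_rate=0.1, booster="gbtree")
model.fit(X_train, y_train)    # y: global active power; X: remaining features
y_pred = model.predict(X_test)
```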

Figure 9: Household prediction vs original observations resampled to: (a) hourly granularity and (b) daily granularity.

By visually examining these figures, we can clearly see that the forecast curves track all fluctuations in the original electricity consumption well at both hourly and daily granularities, except for certain hours and days of peak usage, which are due to random variations for an individual home user. In fact, in the case of hourly resampling, our forecasting model produces an average prediction error of 0.78, and in the case of daily resampling, this error is 0.77 (Figure 10).

Figure 10: Household prediction errors of the XGBoost model: (a) hourly granularity and (b) daily granularity.

5.2 XGBoost hyperparameters tuning

The results obtained in the previous section are excellent, but to improve the performance of the system, especially at peak times, in this section, we select the best XGBoost hyperparameters using the grid search method. In our tests, the number of estimators, the maximum depth, and the learning rate were chosen from the sets {500, 1,000, 1,500}, {2, 4, 6, 8}, and {0.05, 0.1, 0.15, 0.20}, respectively. These values were tested with two booster types, gbtree and gblinear. From all these values, a grid of search parameters was built, which tested 468 models. After this series of experiments and an objective comparison of the different results, we found that the model works most efficiently, for both hourly and daily resampled data, with 1,500 estimators, a maximum depth of 2, and a learning rate of 0.15, using the gbtree booster.
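This search could be sketched with scikit-learn’s GridSearchCV (assumed here; the article does not name a specific implementation):

```python
# Sketch: grid search over the XGBoost hyperparameter values listed above.
from sklearn.model_selection import GridSearchCV
import xgboost as xgb

param_grid = {
    "n_estimators": [500, 1000, 1500],
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.05, 0.10, 0.15, 0.20],
    "booster": ["gbtree", "gblinear"],
}
search = GridSearchCV(xgb.XGBRegressor(), param_grid,
                      scoring="neg_root_mean_squared_error", cv=3)
search.fit(X_train, y_train)
best_model = search.best_estimator_  # best found: 1,500 trees, depth 2, lr 0.15
```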

Figure 11 illustrates the hourly and daily electricity consumption predictions obtained after tuning the hyperparameters of the XGBoost model. From this figure, we can see that the prediction curves are improved and track the fluctuations in the original electricity consumption more effectively than the XGBoost model before hyperparameter tuning. As shown in Figure 12, the optimal model achieves average prediction errors of 0.46 and 0.38 for the hourly and daily resampled datasets, respectively.

Figure 11: Household prediction vs original observations after XGBoost hyperparameter tuning, resampled to: (a) hourly granularity and (b) daily granularity.

Figure 12: Household prediction errors of the optimal XGBoost model (after hyperparameter tuning): (a) hourly granularity and (b) daily granularity.

Despite this excellent performance, in the last part of the experiments, we try to improve it further by using dataset scaling.

5.3 Dataset feature scaling-based model

As indicated in Section 4.2, the IHEPC dataset contains aberrant values corresponding to extreme electricity consumption. This difference in scale can decrease the performance of the XGBoost model. Therefore, to overcome this limitation and improve the performance of the proposed model, we apply in this section each of the scaling methods described earlier. The scaled datasets are then fitted, separately, to the XGBoost regression with the best obtained parameters.

Figures 13 and 14 show the hourly and daily electricity consumption forecasts and prediction errors obtained after the feature scaling process using the normalization technique. It is evident from Figure 13 that the prediction curves have been vastly improved compared to the earlier cases, so that they now more closely follow the fluctuations of the original electricity consumption. The average prediction errors for the hourly and daily resampled datasets are 0.28 and 0.22, respectively, as illustrated in Figure 14.

Figure 13: Household predictions vs original observations after feature normalization: (a) hourly granularity and (b) daily granularity.

Figure 14: Household prediction errors of the XGBoost model after feature normalization: (a) hourly granularity and (b) daily granularity.

For the final scenario, Figures 15 and 16 show the hourly and daily electricity consumption forecasts and prediction errors, respectively, after the feature scaling process using the standard scaler technique. Visually, Figure 15 appears similar to Figure 13 (based on the normalization method), and both remain superior to the plain XGBoost model and to XGBoost after hyperparameter tuning. Numerically, however, the prediction errors (Figure 16) show that this latest model is superior to the previous ones, reaching average prediction errors of 0.13 and 0.031 for the hourly and daily resampled datasets, respectively.

Figure 15: Household predictions vs original observations after feature standardization: (a) hourly granularity and (b) daily granularity.

Figure 16: Household prediction errors of the XGBoost model after feature standardization: (a) hourly granularity and (b) daily granularity.

Finally, based on the best XGBoost model, for which we obtained the lowest prediction error values, the performance of the proposed method was evaluated using the RMSE and MAPE performance measures, as shown in Table 2. From this table, it can be seen that the model based on the normalization method gives competitive results in terms of RMSE and MAPE for both hourly and daily granularities. In fact, the best result was obtained for the daily granularity using the standardization technique, with 0.129 and 0.020 for RMSE and MAPE, respectively.

Table 2

Comparative results of the impact of the two scaling methods

Scaling method     Hourly RMSE   Hourly MAPE   Daily RMSE   Daily MAPE
Normalization      0.346         0.317         0.390        0.233
Standardization    0.229         0.026         0.129        0.020

6 Comparison with state of the art

In this section, after evaluating the performance of the proposed model, we compare it to some works in the literature. For a fair comparison, we only choose works that use the same IHEPC dataset, and the comparison takes into account second-level (15 s resampled), hourly, and daily data. For hourly prediction, compared to the methods proposed in [20,22,24,27], our method showed remarkable superiority, achieving the smallest error rates among these models, 0.229 (RMSE) and 0.026 (MAPE) (Table 3). For daily prediction, the performance of the proposed model was compared to the works of [24,27]; it shows competitive performance in terms of RMSE (0.129), in particular with respect to [27], while giving a better result in terms of MAPE (0.020). Future work in this study will focus on the use of other ML techniques, such as ANN, SVM, and RF, as well as DL techniques such as CNN. In addition, some feature engineering techniques will be examined.

Table 3

Comparative analysis of our method with the state of the art (DL and ML techniques)

Methods         Papers   Model        15 s resampled     Hourly downsampled       Daily downsampled
                                      RMSE     MAE       RMSE    MAPE    MAE      RMSE    MAPE    MAE
Deep learning   [24]     CNN-GRU      -        -         0.47    -       0.33     -       -       -
                [27]     CNN+LSTM     0.31     0.19      0.47    -       0.11     0.08    -       0.69
ML              [20]     SVM          -        -         13.86   -       11.48    -       -       -
                [22]     CNN-2D-RP    -        -         0.79    -       0.59     -       -       -
                         SVM          -        -         1.25    -       1.12     -       -       -
                         ANN          -        -         1.15    -       1.08     -       -       -
Proposed                 XGBoost      -        -         0.229   0.026   -        0.129   0.020   -

7 Conclusion and further works

In this article, we have developed a new framework for household electricity consumption prediction. A public, freely available dataset, the IHEPC dataset, was used for performance evaluation. First, the original data were preprocessed using data analysis techniques to remove missing, redundant, and aberrant values. Then, to verify the stationarity of the time series and determine the prediction periods, time series decomposition and auto-correlation steps were performed. Finally, we applied different scaling techniques for a better representation of the input data, resulting in an efficient model. In this study, we used XGBoost ML to investigate future electricity consumption at hourly and daily granularities. To train and test the proposed model, three scenarios were considered: the XGBoost-based model, the hyperparameter tuning-based model, and the feature scaling-based model. The experimental results showed that the MAPE of the XGBoost algorithm reaches 0.026 and 0.020 for the hourly and daily consumption periods, respectively. These results clearly outperform state-of-the-art models for electricity consumption prediction in terms of the RMSE and MAPE performance metrics.

Acknowledgments

The authors would like to thank the referees for their suggestions that helped improve the original manuscript into its present form. We would also like to thank the Laboratory of Mathematics and Informatics Systems (LAMIS) for supporting this research.

  1. Author contributions: All authors have contributed equally to the writing of this article, and they have read and approved the final manuscript.

  2. Conflict of interest: The authors declare that there is no competing interest regarding the publication of this manuscript.

  3. Data availability statement: The data that support the findings of this study are freely available in the UCI Machine Learning Repository (https://archive.ics.uci.edu).

References

[1] S. M. Molaei and M. R. Keyvanpour, An analytical review for event prediction system on time series, International Conference on Pattern Recognition and Image Analysis, Rasht, Iran, 2015, pp. 1–6, https://doi.org/10.1109/PRIA.2015.7161635.

[2] K. P. Amber, M. W. Aslam, and S. K. Hussain, Electricity consumption forecasting models for administration buildings of the UK higher education sector, Energy Buildings 90 (2015), 127–136, https://doi.org/10.1016/j.enbuild.2015.01.008.

[3] V. G. Tran, V. Debusschere, and S. Bacha, One week hourly electricity load forecasting using neuro-fuzzy and seasonal ARIMA models, IFAC Proceedings 45 (2012), no. 21, 97–102, https://doi.org/10.3182/20120902-4-FR-2032.00019.

[4] S. E. Volkan and A. Sertaç, ARIMA forecasting of primary energy demand by fuel in Turkey, Energy Policy 35 (2007), no. 3, 1701–1708, https://doi.org/10.1016/j.enpol.2006.05.009.

[5] J. Contreras, R. Espinola, F. J. Nogales, and A. J. Conejo, ARIMA models to predict next-day electricity prices, IEEE Trans. Power Syst. 18 (2003), no. 3, 1014–1020, https://doi.org/10.1109/TPWRS.2002.804943.

[6] A. J. Conejo, M. A. Plazas, R. Espinola, and A. B. Molina, Day-ahead electricity price forecasting using the wavelet transform and ARIMA models, IEEE Trans. Power Syst. 20 (2005), no. 2, 1035–1042, https://doi.org/10.1109/TPWRS.2005.846054.

[7] R. Wang, S. Lu, and W. Feng, A novel improved model for building energy consumption prediction based on model integration, Appl. Energy 262 (2020), no. 114561, 1–14, https://doi.org/10.1016/j.apenergy.2020.114561.

[8] T. Y. Kim and S. B. Cho, Predicting residential energy consumption using CNN-LSTM neural networks, Energy J. 182 (2019), 72–81, https://doi.org/10.1016/j.energy.2019.05.230.

[9] A. Khan, H. Chiroma, M. Imran, A. Khan, J. I. Bangash, M. Asim, et al., Forecasting electricity consumption based on machine learning to improve performance: A case study for the organization of petroleum exporting countries (OPEC), Comput. Electr. Eng. 86 (2020), 106737, 1–14, https://doi.org/10.1016/j.compeleceng.2020.106737.

[10] Z. Qing, G. Yujing, and F. Genfu, Household energy consumption in China: forecasting with BVAR model up to 2015, Fifth International Joint Conference on Computational Sciences and Optimization, Harbin, China, 2012, pp. 654–659, https://doi.org/10.1109/CSO.2012.150.

[11] K. Yan, X. Wang, Y. Du, N. Jin, H. Huang, and H. Zhou, Multi-step short-term power consumption forecasting with a hybrid deep learning strategy, Energies 11 (2018), no. 11, 3089, 1–15, https://doi.org/10.3390/en11113089.

[12] H. Hu, L. Wang, and S. X. Lv, Forecasting energy consumption and wind power generation using deep echo state network, Renewable Energy 154 (2020), 598–613, https://doi.org/10.1016/j.renene.2020.03.042.

[13] T. Pinto, I. Praça, Z. Vale, and J. Silva, Ensemble learning for electricity consumption forecasting in office buildings, Neurocomputing 423 (2021), 747–755, https://doi.org/10.1016/j.neucom.2020.02.124.

[14] C. Liu, B. Sun, C. Zhang, and F. Li, A hybrid prediction model for residential electricity consumption using Holt-Winters and extreme learning machine, Appl. Energy 275 (2020), 115383, 1–15, https://doi.org/10.1016/j.apenergy.2020.115383.

[15] G. Zhang, C. Tian, C. Li, J. J. Zhang, and W. Zuo, Accurate forecasting of building energy consumption via a novel ensembled deep learning method considering, Energy 201 (2021), 117531, 1–15, https://doi.org/10.1016/j.energy.2020.117531.

[16] T. Liu, Z. Tan, C. Xu, H. Chen, and Z. Li, Study on deep reinforcement learning techniques for building energy consumption forecasting, Energy Buildings 208 (2020), 109675, https://doi.org/10.1016/j.enbuild.2019.109675.

[16] T. Liu, Z. Tan, C. Xu, H Chen, and Z. Li, Study on deep reinforcement learning techniques for building energy consumption forecasting, Energy Buildings 208 (2020), 109675, https://doi.org/10.1016/j.enbuild.2019.109675. Search in Google Scholar

[17] M. A., Dozdar, M. H. Masoud, and J. M. Ramadhan, A Review on Deep Sequential Models for Forecasting Time Series Data, Appl. Comput. Intell. Soft Comput. 2022 (2022), 1–19, https://doi.org/10.1155/2022/6596397. Search in Google Scholar

[18] L. M. Candanedo, V. Feldheim, and D. Deramaix, Data driven prediction models of energy use of appliances in a low-energy house, Energy Buildings 140 (2017), 81–97, https://doi.org/10.1016/j.enbuild.2017.01.083. Search in Google Scholar

[19] D. Wu, B. Wang, D. Precup, and B. J. Boulet, Multiple Kernel learning based transfer regression for electric load forecasting, IEEE Trans Smart Grid 11 (2019), no. 2, 1183–1192, https://doi.org/10.1109/TSG.2019.2933413. Search in Google Scholar

[20] V. Puspita and Ermatita, Time series forecasting for electricity consumption using kernel principal component analysis (KPCA) and support vector machine (SVM), J. Phys. Conf. Series 1196 (2019), no. 012073, 1–8, https://doi.org/10.1088/1742-6596/1196/1/012073. Search in Google Scholar

[21] J. Moon, Y. Kim, M. Son, E. Hwang, Hybrid short-term load forecasting scheme using random forest and multilayer perceptron, Energies 11 (2018), no. 12, 3283, 1–20, https://doi.org/10.3390/en11123283. Search in Google Scholar

[22] R Rajabi and A. Estebsari, Deep learning based forecasting of individual residential loads using recurrence plots, IEEE Milan Power Tech. 2019, pp. 1–5, https://doi.org/10.1109/PTC.2019.8810899. Search in Google Scholar

[23] X. Tang, Y. Dai, T. Wang, and Y. Chen, Short-term power load forecasting based on multilayer bidirectional recurrent neural network, IET Gener. Transm. Distrib. 13 (2019), no. 17, 3847–3854, https://doi.org/10.1049/iet-gtd.2018.6687. Search in Google Scholar

[24] M. Sajjad, Z. A. Khan, A. Ullah, T. Hussain, W. Ullah, M. Y. Lee, et al., A novel CNN-GRU-based hybrid approach for short-term residential load forecasting, IEEE Access, 8 (2020), 143759–143768, https://doi.org/10.1109/ACCESS.2020.3009537. Search in Google Scholar

[25] N. Ayub, M. Irfan, M. Awais, U. Ali, T. Ali, M. Hamdi, et al., Big data analytics for short and medium-term electricity load forecasting using an AI techniques ensembler, Energies 13 (2020), no. 19, 5193, 1–21, https://doi.org/10.3390/en13195193. Search in Google Scholar

[26] E. E. Elattar, N. A. Sabiha, M. Alsharef, M. K. Metwaly, A. M. Abd-Elhady, and I. B. M. Taha, Short term electric load forecasting using hybrid algorithm for smart cities, Appl. Intell. 50 (2020), 3379–3399, https://doi.org/10.1007/s10489-020-01728-x. Search in Google Scholar

[27] Z. A. Khan, T. Hussain, A. Ullah, S. Rho, M. Lee, and S. W. Baik, Towards efficient electricity forecasting in residential and commercial buildings: A novel hybrid CNN with a LSTM-AE based framework, Sensors, 20 (2020), no. 5, 1399, 1–16, https://doi.org/10.3390/s20051399. Search in Google Scholar PubMed PubMed Central

[28] G. Mahalakshmi, S. Sridevi and S. Rajaram, A survey on forecasting of time series data, International Conference on Computing Technologies and Intelligent Data Engineering (ICCTIDE’16), Kovilpatti, India, 2016, pp. 1–8, https://doi.org/10.1109/ICCTIDE.2016.7725358. Search in Google Scholar

[29] G. Omkar and S. V. Kumar, Time series decomposition model for traffic flow forecasting in urban midblock sections, International Conference On Smart Technologies For Smart Nation (SmartTechCon), Bengaluru, India, 2017, pp. 720–723, https://doi.org/10.1109/SmartTechCon.2017.8358465. Search in Google Scholar

[30] J. S. Armstrong, Principles of Forecasting: A Handbook for Researchers and Practitioners, Kluwer Academic Publishers, Boston, USA, 2001. 10.1007/978-0-306-47630-3Search in Google Scholar

[31] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, John Wiley and Sons, Hoboken, New Jersey, U.S., 2015. Search in Google Scholar

[32] E. B. Dagum and S. Bianconcini, Seasonal Adjustment Methods and Real Time Trend-cycle Estimation, Springer, Cham, 2016. 10.1007/978-3-319-31822-6Search in Google Scholar

[33] P. Montero-Manso, G. Athanasopoulos, R. J. Hyndman, and T. S. Talagala, FFORMA: Feature-based forecast model averaging, Int J Forecasting 36 (2020), no. 1, 86–92, https://doi.org/10.1016/j.ijforecast.2019.02.011. Search in Google Scholar

[34] R. Bekkerman, M. Bilenko, and J. Langford, Scaling Up Machine Learning: Parallel and Distributed Approaches, Cambridge University Press, New York, NY, USA. 2011. 10.1017/CBO9781139042918Search in Google Scholar

[35] A. Botchkarev, A new typology design of performance metrics to measure errors in machine learning regression algorithms, Interdiscip. J. Inf. Knowl. Manag. 14 (2019), 045–076, https://doi.org/10.28945/4184. Search in Google Scholar

[36] B. Mathieu, Q. Z. Xiao, N. Elyes, G. Xiaofeng, and C. Patrice, Modeling and forecasting building energy consumption: A review of data-driven techniques, Sustainable Cities Society 48 (2019), no. 101533, 1–27, https://doi.org/10.1016/j.scs.2019.101533. Search in Google Scholar

[37] C. Reyes, T. Hilaire, S. Paul and C. F. Mecklenbräuker, Evaluation of the root mean square error performance of the PAST-Consensus algorithm, International ITG Workshop on Smart Antennas (WSA), Bremen, Germany, 2010, pp. 156–160, https://doi.org/10.1109/WSA.2010.5456452. Search in Google Scholar

[38] UCI repository of machine learning database [Online], Available: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption. Search in Google Scholar

[39] M. Chaa, N. Boukezzoula, and A. Meraoumia, Features-level fusion of reflectance and illumination images in Finger-Knuckle-Print identification system, Int. J. Artif. Intell. Tools 27 (2018), no. 3, 1850007, 1–10, https://doi.org/10.1142/S0218213018500070. Search in Google Scholar

[40] S. Lee, S. Jung, and J. Lee, Prediction model based on an artificial neural network for user-based building energy consumption in South Korea, Energies 12 (2019), no. 4, 608, pp. 1–18, https://doi.org/10.3390/en12040608. Search in Google Scholar

[41] P. Trebuna, J. Halcinová, M. Filo, and J. Markovic, The importance of normalization and standardization in the process of clustering, IEEE 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herl’any, Slovakia, 2014, pp. 381–385, https://doi.org/10.1109/SAMI.2014.6822444. Search in Google Scholar

Received: 2022-08-03
Revised: 2022-09-26
Accepted: 2022-10-05
Published Online: 2022-11-30

© 2022 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
