Clustering-Based Improvement of Nonparametric Functional Time Series Forecasting: Application To Intra-Day Household-Level Load Curves


IEEE TRANSACTIONS ON SMART GRID, VOL. 5, NO. 1, JANUARY 2014, p. 411

Mohamed Chaouch

Abstract: Energy suppliers are facing ever-increasing competition, so factors such as the quality and continuity of the services offered must be properly taken into account. Furthermore, in the last few years, many countries have become interested in renewable energies such as solar and wind. Renewable energy resources are mainly used for environmental and economic reasons, such as reducing carbon emissions. They might also be used to reinforce the electric network, especially during high peak periods. However, the injection of such energy resources into the low-voltage (LV) network can lead to high voltage constraints. To overcome this issue, one can motivate customers to use thermal or electric storage devices during high-production periods of photovoltaic (PV) generation to foster the integration of renewable energy generation into the network. In this paper, we are interested in forecasting household-level electricity demand, which represents a key factor in ensuring the supply/demand balance in the LV network. A novel methodology able to improve short-term functional time series forecasts is introduced. An application to the Irish smart meter data set shows the performance of the proposed methodology in forecasting intra-day household-level load curves.
Index Terms: Curve discrimination, functional data, household-level forecasting, intra-day load curve, nonparametric statistics, smart grid, unsupervised classification.

I. INTRODUCTION AND MOTIVATIONS

IN RECENT years we have seen the arrival of new technologies such as electric vehicles (EV) and electric heating, as well as the increase of renewable energy generation such as wind and solar. Therefore, the power grid is going through change. In fact, the stochastic nature of renewable energy sources will turn the power grid into a highly stochastic system. Within this new context, two main problems arise: 1) because of the electrification of appliances and mobility applications, the peak demand will increase and the load curve shape will change. At the moment a great deal of attention is attracted by EVs, both hybrid and not, that will allow users to recharge their vehicles directly at home. It is therefore important to understand and anticipate the impact of this recharging activity on the power grid capacity (see for instance [1], [2]). 2) It is well known that one of the expected solutions to reduce the peak demand is to reinforce the power grid with renewable energy generation. One can use an energy storage system that charges during surplus energy periods, e.g., PV generation during the day, and discharges during peak load moments. Nevertheless, the integration of a large quantity of renewable energy might lead to serious problems in the power network. For instance, [3] studied the impact of solar PV on LV three-phase distribution networks in terms of voltage rise, voltage unbalance, reverse power flows and feeder losses. Since most feeders in the LV network have a decreasing cross section, only uni-directional electric charges might be received. However, with the injection of renewable energy sources the electric charge becomes multi-directional depending on consumption and renewable energy production. LV networks are designed to support such scenarios, but only for short time periods (a few hours).

Manuscript received February 14, 2013; revised May 28, 2013, July 30, 2013, July 31, 2013; accepted August 01, 2013. Date of publication September 09, 2013; date of current version December 24, 2013. Paper no. TSG-00122-2013.
The author is with the Centre for the Mathematics of Human Behavior, Department of Mathematics and Statistics, University of Reading, Reading RG6 6AX, U.K. (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2013.2277171
It is worth noting that in both cases 1) and 2) explained above, load forecasting is a crucial issue for operational planners. Reference [4] shows that short-term load forecasting (STLF) is a key step for the proper operation of a battery energy storage system. The authors used an artificial neural network forecaster for hourly forecasting of distributed power generation and load consumption. Recently, [5] used updated load forecasts for peak shaving and for prolonging battery lifetime. The network constraints are regularly evaluated on a specific area in order to prevent over-voltage problems on the network. Very localized consumption and PV/wind production forecasts are needed to detect constraints on current intensity and voltage at each node and each feeder in the LV network. On the other hand, in order to improve the management of energy demand, the distribution network operators (DNOs) always consider the customer as an important actor who might be involved in the regulation of the electricity network. To reduce the peak demand, DNOs may ask a selected number of influential customers to reduce their demand on some specific days of the year in conjunction with an incentive tariff. Another way to make the customer an actor in the management of energy in the power network is to turn the customer into a producer, of PV energy for instance. The development of smart meters and their massive deployment in Europe (80% of households will be equipped by 2020) and North America allow us to get individual electricity consumption measurements on a very fine time scale.
One-day-ahead forecasting of aggregated electricity demand has been widely studied in the statistical literature, and different approaches have been proposed. Time series analysis methods such as (S)ARIMA models or exponential smoothing can be found in [6]–[8], and methods based on state-space models in [9]. Machine learning approaches such as artificial neural networks and support vector machines have also been used in [10]. Among nonparametric and semiparametric methods, [11] used a kernel-based regression model and [12] applied a dimension reduction approach named Minimum Average Variance Estimation (MAVE, see [13] for more details) to forecast the French aggregated load curve. Generalized additive models for short-term electricity load curve forecasting were studied in [14], for instance. For an extensive review of electric load forecasting we refer to, e.g., [15].

1949-3053 © 2013 IEEE
The arrival of smart meters allows us to receive energy demand measurements at a finite number of equidistant time points, e.g., every half hour or every ten minutes. Thus, in order to forecast the load demand of the next day, one has to predict the load demand at 48 or 144 time points, respectively. From a statistical point of view, it is convenient to think of the daily load demand recorded at these 48 or 144 points as a segment and to perform load prediction for the whole segment of time points rather than forecasting the load demand at each one of these time points separately. This implies that we adopt the functional time series framework. The functional approach can also be seen as a way to overcome the problem of incorporating a high number of past values into the statistical model, as in the SARIMA model. The idea of forming a functional time series has been considered by several authors, including [16], [17].
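The segmentation idea described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the readings arrive as one flat list with a fixed number of points per day (48 for half-hourly data); the function name `to_daily_segments` and the toy readings are hypothetical and are not taken from the paper.

```python
def to_daily_segments(series, points_per_day=48):
    """Split a flat list of readings into consecutive daily segments.

    Trailing readings that do not fill a whole day are dropped.
    """
    if points_per_day <= 0:
        raise ValueError("points_per_day must be positive")
    n_days = len(series) // points_per_day
    return [
        list(series[d * points_per_day:(d + 1) * points_per_day])
        for d in range(n_days)
    ]


# Hypothetical half-hourly readings (kW): three full days plus 5 extra points.
readings = [0.1 * (i % 48) for i in range(3 * 48 + 5)]
segments = to_daily_segments(readings, points_per_day=48)
print(len(segments), len(segments[0]))
```

Each element of `segments` then plays the role of one daily curve, so a forecasting method can treat a whole day as a single observation.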
Within this framework of functional time series, several approaches have been proposed: e.g., [19] used a semi-functional partial linear model for one-day-ahead forecasting of electricity demand and price, [20] forecasted peak load demand by using a functional linear model, and [21] developed a functional linear regression model in which the response variable and the covariate are both functional. The authors in [22] proposed a nonparametric functional approach based on a functional kernel regression estimator. Their methodology supposes that all the available information for predicting a segment is essentially contained in the last observed segment. Moreover, an application to a sub-aggregated stationary load curve has shown the efficiency of this method with respect to the SARIMA model. Recently, [23] implemented the approach proposed by [22] by means of a weighted average of past daily load segments. In that case, the past load segments are identified by means of their closeness to some reference load segment which captures some expected qualitative and quantitative characteristics of the segment to be predicted.
In this paper, we are interested in short-term forecasting of the household-level intra-day electricity load curve. In contrast to aggregated load curves, which are characterized by their seasonality, regularity, and sensitivity to meteorological conditions, household load curves are very volatile: their shape depends mainly on customer behavior and they are less dependent on weather conditions. Customer behavior, which is difficult to quantify, is a determining factor of the shape of the individual load curve, and this makes household-level forecasting difficult. In this paper, an improved version of the approach proposed by [22], adapted to household-level forecasting, is introduced. The improvement is based on an unsupervised clustering step applied to the historical segments, which allows us to find segments describing a common consumption behavior. Then, we use a nonparametric curve discrimination approach to assign a cluster to the last segment. This step allows us to identify the segments which will be used to forecast the target segment.
The paper is organized as follows. In Section II, we introduce the functional time series methodology. Then, we summarize the functional wavelet-kernel approach proposed by [22] and describe the methodology proposed in this paper. Section III is devoted to an application of our method to intra-day household-level load curve forecasting. A comparison study and an extension to 2000 Irish customers are given in the same section. Some concluding remarks are given in Section IV.
II. FUNCTIONAL TIME SERIES FORECASTING
Let us consider the household electricity demand as a (real-valued) continuous-time stochastic process $X = (X(t),\ t \in \mathbb{R})$. We are interested in the evolution of this process in the future. We suppose that we observe the process $X$ over an interval $[0, T]$ and one would like to predict the behavior of $X$ on the entire interval $[T, T + \delta]$, where $\delta > 0$, rather than at specific time points. To this end we can divide the interval $[0, T]$ into $n$ subintervals $[(i-1)\delta, i\delta]$, $i = 1, \dots, n$, with $\delta = T/n$, and consider the (functional-valued) discrete-time stochastic process $Z = (Z_i,\ i \in \mathbb{N})$, where $\mathbb{N} = \{1, 2, \dots\}$, defined by

$$Z_i(t) = X\big(t + (i-1)\delta\big), \qquad i \in \mathbb{N}, \quad t \in [0, \delta). \tag{1}$$

In this paper, we are interested in one-day-ahead intra-day load curve forecasting, so the segmentation parameter $\delta$ corresponds to one day of electricity demand. In practice, the electricity demand is recorded at a finite number of equidistant time points within each day, say $t_1 < t_2 < \dots < t_P$, for instance, every half hour (in that case $P = 48$) or every 10 minutes (then $P = 144$). Let us denote by $Z_i(t_j)$ the observation at time point $t_j$, $j = 1, \dots, P$, within curve $Z_i$. We denote by $Z_i = (Z_i(t_1), \dots, Z_i(t_P))$ the segment formed by all the observations of the $i$-th curve. Therefore, given a sample $Z_1, \dots, Z_n$ of $n$ segments, our purpose is to predict the whole next segment $Z_{n+1}$. In other words we want to predict $(Z_{n+1}(t_1), \dots, Z_{n+1}(t_P))$.

This forecasting issue has been the subject of several publications in the statistical literature. The Functional Autoregressive (FAR) process has been introduced and studied theoretically by [24] and extensively used in both practical and theoretical studies since then, see [25] among numerous other contributions. Under the FAR model, the best predictor, $\widetilde{Z}_{n+1}$, of the curve $Z_{n+1}$, given the historical curves $Z_1, \dots, Z_n$, is the conditional mean of $Z_{n+1}$ given the last curve $Z_n$.


A. Functional Wavelet-Kernel Approach (FWK)


A nonparametric approach based on the kernel method has been developed by [22] to solve the same forecasting issue. In contrast to the FAR model, the authors in [22] supposed that the regression operator is unknown and estimated it nonparametrically. More precisely, the prediction of segment $Z_{n+1}$ was obtained by kernel smoothing, conditioning on the last observed segment $Z_n$, while the resulting predictor was expressed as a weighted average of the past segments, placing more weight on those segments whose preceding segment is similar to the present one. The notion of similarity between two segments (or curves) plays an important role in the computation of the weights and therefore in the prediction of segment $Z_{n+1}$. The authors in [18] defined some semi-metrics which allow one to measure the similarity between curves. Another approach, based on a distance on the discrete wavelet coefficients of a suitable wavelet decomposition of the available segments, has been proposed by [22]. This approach consists in applying the discrete wavelet transform to the historical segments in order to decompose the temporal information of those segments into discrete wavelet coefficients that are associated with both time and scale. Let us consider two segments $Z_m$ and $Z_l$, and let $\theta^{(m)}_{j,k}$ and $\theta^{(l)}_{j,k}$ be the discrete wavelet coefficients of $Z_m$ and $Z_l$, respectively, at scale $j$ and location $k$. Then, the measure of closeness of the two segments $Z_m$ and $Z_l$ can be summarized in the following two steps:
a) at each scale $j$, the closeness of the two segments $Z_m$ and $Z_l$ is measured by the Euclidean distance between their discrete wavelet coefficients,

$$d_j(Z_m, Z_l) = \Big( \sum_{k} \big( \theta^{(m)}_{j,k} - \theta^{(l)}_{j,k} \big)^2 \Big)^{1/2};$$

b) to quantify the similarity between any two segments $Z_m$ and $Z_l$ it suffices to combine all scales; the distance is then defined as follows:

$$D(Z_m, Z_l) = \sum_{j} 2^{-j/2}\, d_j(Z_m, Z_l).$$
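The two-step construction above (a Euclidean distance per scale, then a combination over scales) can be illustrated with a small pure-Python sketch. Several choices here are assumptions made only for illustration and are not stated in the paper: the Haar wavelet, a pyramid that stops as soon as the approximation has odd length, scales indexed from the coarsest one, and scale weights $2^{-j/2}$ in the spirit of [22]. The function names are hypothetical.

```python
import math


def haar_details(segment):
    """Return Haar detail coefficients per level, coarsest level first.

    The pyramid stops as soon as the current approximation has odd length,
    so a 48-point segment yields levels of sizes 3, 6, 12 and 24.
    """
    approx = [float(v) for v in segment]
    details = []
    while len(approx) >= 2 and len(approx) % 2 == 0:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    details.reverse()  # index 0 is now the coarsest scale
    return details


def wavelet_distance(seg_a, seg_b):
    """Combine scale-wise Euclidean distances with weights 2^(-j/2)."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have the same length")
    total = 0.0
    for j, (ca, cb) in enumerate(zip(haar_details(seg_a), haar_details(seg_b))):
        d_j = math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))
        total += 2.0 ** (-j / 2.0) * d_j
    return total


# Toy daily curves: a smooth profile, a vertically shifted copy, and a spike.
base = [math.sin(i / 5.0) for i in range(48)]
shifted = [v + 1.0 for v in base]
spike = [0.0] * 48
spike[10] = 1.0
print(wavelet_distance(base, shifted), wavelet_distance(base, spike))
```

Because this sketch uses detail coefficients only, two curves that differ by a constant level are at (numerically) zero distance: the distance compares shapes rather than levels. Whether that is desirable depends on the application.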

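Recall that the predictor $\widehat{Z}_{n+1}$ (of the segment $Z_{n+1}$) is a weighted average of all segments, so that we have

$$\widehat{Z}_{n+1} = \sum_{m=1}^{n-1} w_{n,m}\, Z_{m+1}, \tag{2}$$

where the weights $w_{n,m}$ satisfy $w_{n,m} \ge 0$ and $\sum_{m=1}^{n-1} w_{n,m} = 1$. In the nonparametric literature the weights $w_{n,m}$, $m = 1, \dots, n-1$, are known as Nadaraya-Watson weights and are defined as follows:

$$w_{n,m} = \frac{K_h\big(D(Z_n, Z_m)\big)}{\sum_{l=1}^{n-1} K_h\big(D(Z_n, Z_l)\big)}, \tag{3}$$

where $K_h(\cdot) = K(\cdot / h)$ for some symmetric function $K$ centered at zero (called the kernel) such that $K \ge 0$ and $\int K(u)\, du = 1$. The tuning parameter $h > 0$ (the so-called bandwidth) controls the effective number of segments for which $w_{n,m}$ is positive and therefore the smoothness of the predictor.
Remark: The approach proposed by [22] rests on an implicit assumption: all the available information for predicting segment $Z_{n+1}$ is mainly contained in the last observed segment $Z_n$.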
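The kernel-weighted predictor described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: the plain Euclidean distance stands in for the wavelet-based distance, the Gaussian kernel and the bandwidth value are assumptions, and the names `fwk_forecast`, `l2_distance` and `gaussian_kernel` are hypothetical.

```python
import math


def l2_distance(a, b):
    """Plain Euclidean distance; a stand-in for the wavelet-based distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)


def fwk_forecast(segments, bandwidth, distance=l2_distance, kernel=gaussian_kernel):
    """Kernel-weighted average of the successors of past segments.

    segments: list of equal-length lists, oldest first; the last one is Z_n.
    Returns (forecast, weights), where weights[m] is attached to segments[m + 1].
    """
    if len(segments) < 2:
        raise ValueError("need at least two segments")
    last = segments[-1]
    past = segments[:-1]  # Z_1, ..., Z_{n-1}
    raw = [kernel(distance(last, z) / bandwidth) for z in past]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all kernel weights vanished; increase the bandwidth")
    weights = [r / total for r in raw]
    forecast = [
        sum(w * segments[m + 1][j] for m, w in enumerate(weights))
        for j in range(len(last))
    ]
    return forecast, weights


# Toy history alternating between two daily shapes A and B; the last day is A.
A = [0.0, 0.0, 1.0, 1.0]
B = [1.0, 1.0, 0.0, 0.0]
forecast, weights = fwk_forecast([A, B, A, B, A], bandwidth=0.5)
print([round(v, 3) for v in forecast])
```

In the toy history every day shaped like A was followed by a day shaped like B, so with a small bandwidth the forecast is close to B. A larger bandwidth spreads the weights over more past days and pulls the forecast toward the overall average.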
B. Clustering-Based Improvement of the FWK Approach (CFWK)
The proposed prediction procedure consists of the following three main steps: a) Classify the sample of historical segments into $G$ (which may be fixed or not) clusters containing typical daily load curves. In contrast to aggregated load curves, for which it is easy to observe a common pattern for working days, week-ends, and holidays (see [26] for the use of a type-of-day classification in national-level load forecasting), household load curves do not exhibit this kind of similarity between days. For that reason, we use in this step an unsupervised classification method to identify days describing a common consumption behavior pattern. b) Assign to the last observed segment $Z_n$ the most appropriate cluster. The main purpose of this step is to find days that contain the same information as the last observed day. In other words, we look, among the historical segments, for those that describe a behavior similar to the one observed today. c) Apply the FWK method to forecast the segment $Z_{n+1}$ by using the segments that belong to the cluster obtained in step b). The following algorithm describes in more detail how we improve the FWK forecasts using clustering and curve discrimination approaches.
Step 1: Unsupervised Curve Classification: Suppose that we have $n - 1$ historical segments $Z_1, \dots, Z_{n-1}$. In this step we are interested in automatically splitting these $n - 1$ curves into $G$ clusters, say $C_1, \dots, C_G$. Because we do not have at hand any categorical response variable and the data set is clearly of a functional nature, this problem can be seen as an unsupervised curve classification. Since the number $G$ of clusters is unknown in our case, the unsupervised curve classification problem becomes harder to solve. In the statistical literature few authors have given a solution to this problem. These contributions are mainly restricted to the works by [27] and [28], in which $k$-means techniques for classification analysis are extended to curve data. In this paper we used the hierarchical algorithm proposed by [29]. The reader is referred to [29] for more details on this algorithm and the methodology behind it.
Step 2: Curve Discrimination: The curve-discrimination step can be stated as follows. Given the historical segments $Z_1, \dots, Z_{n-1}$, we know from Step 1 to which cluster each segment belongs. Let us denote by $Y_i \in \{1, \dots, G\}$ the cluster of the segment $Z_i$. Assume that each pair of variables $(Z_i, Y_i)$ has the same distribution as a pair of random variables $(\mathcal{Z}, Y)$. Given a new segment $Z_n$ (the last observed daily load curve), the purpose now is to identify its class membership. For that we estimate, for each $g \in \{1, \dots, G\}$, the following conditional probability:

$$p_g(z) = P(Y = g \mid \mathcal{Z} = z). \tag{4}$$

In other words, when the last observed segment is $Z_n = z$, what is the probability that it belongs to cluster $C_g$? Observe that the conditional probability given by (4) can be seen as a regression function, since $p_g(z) = E(\mathbf{1}_{\{Y = g\}} \mid \mathcal{Z} = z)$. A nonparametric estimator of these probabilities has been proposed in [30]. For all $g \in \{1, \dots, G\}$,

$$\widehat{p}_g(z) = \frac{\sum_{i=1}^{n-1} \mathbf{1}_{\{Y_i = g\}}\, K\big(h^{-1} d(z, Z_i)\big)}{\sum_{i=1}^{n-1} K\big(h^{-1} d(z, Z_i)\big)},$$

where $d$ is a semi-metric, $K$ a kernel, and $h$ a bandwidth. Let $g^*$ denote the cluster corresponding to the highest estimated probability, $g^* = \arg\max_{g} \widehat{p}_g(Z_n)$. We suppose that $Z_{i_1}, \dots, Z_{i_{n^*}}$ are the segments that belong to the cluster $C_{g^*}$, where $n^*$ is the total number of segments in $C_{g^*}$.
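The discrimination step can be sketched as a kernel-weighted vote among the already clustered curves. This is a minimal illustration under stated assumptions: the Euclidean distance stands in for the semi-metric, the quadratic kernel is taken as $1 - u^2$ on $[0, 1]$, the bandwidth is fixed by hand rather than selected by cross-validation, and all function names and toy data are hypothetical (this is not the npfda code).

```python
import math


def quadratic_kernel(u):
    """Quadratic kernel supported on [0, 1]."""
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def l2_semimetric(a, b):
    """Euclidean distance, used here as a simple stand-in semi-metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def posterior_probabilities(new_curve, curves, labels, bandwidth,
                            semimetric=l2_semimetric, kernel=quadratic_kernel):
    """Kernel estimate of P(cluster = g | curve) for every observed label g."""
    raw = [kernel(semimetric(new_curve, c) / bandwidth) for c in curves]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no curve within the bandwidth; increase it")
    probs = {g: 0.0 for g in set(labels)}
    for w, g in zip(raw, labels):
        probs[g] += w / total
    return probs


def assign_cluster(new_curve, curves, labels, bandwidth):
    """Return the most probable cluster and the estimated probabilities."""
    probs = posterior_probabilities(new_curve, curves, labels, bandwidth)
    return max(probs, key=probs.get), probs


# Toy clustered curves (two points each) and a new curve close to cluster 1.
curves = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]]
labels = [1, 1, 2, 2]
best, probs = assign_cluster([0.05, 0.0], curves, labels, bandwidth=2.0)
print(best, probs)
```

The estimated probabilities sum to one over the clusters, and the new curve is assigned to the cluster with the largest one.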
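Step 3: Forecasting: Using the results obtained in Step 2, we can now build the following sample of segments $\{(Z_{i_m}, Z_{i_m + 1}),\ m = 1, \dots, n^*\}$, where $Z_{i_m}$ is the segment (corresponding to day $i_m$) that belongs to the cluster $C_{g^*}$ and $Z_{i_m + 1}$ is the segment observed on day $i_m + 1$. Observe that $Z_{i_m + 1}$ does not necessarily belong to the cluster $C_{g^*}$. Recall that our target is to forecast the segment $Z_{n+1}$. Therefore we propose the following estimator:

$$\widehat{Z}_{n+1} = \sum_{m=1}^{n^*} w_{n, i_m}\, Z_{i_m + 1}, \tag{5}$$

where

$$w_{n, i_m} = \frac{K_h\big(D(Z_n, Z_{i_m})\big)}{\sum_{l=1}^{n^*} K_h\big(D(Z_n, Z_{i_l})\big)}$$

for all $m = 1, \dots, n^*$, and $K_h$ and $D$ are as defined in (3).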
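The forecasting step can be sketched as the same kernel-weighted average, restricted to the days that fall in the cluster assigned to the last observed segment. The sketch below assumes that cluster labels and the target cluster are already available from Steps 1 and 2; the Euclidean distance (in place of the wavelet-based distance), the Gaussian kernel, and the name `cfwk_forecast` are illustrative assumptions.

```python
import math


def l2_distance(a, b):
    """Plain Euclidean distance; a stand-in for the wavelet-based distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cfwk_forecast(history, last, labels, target_cluster, bandwidth,
                  distance=l2_distance):
    """Forecast the segment following `last` using one cluster only.

    history: Z_1, ..., Z_{n-1}, oldest first; labels[i] is the cluster of history[i].
    last: Z_n, whose cluster `target_cluster` was chosen in the discrimination step.
    """
    successors = history[1:] + [last]  # successors[i] is the day after history[i]
    pairs = [(z, nxt) for z, nxt, g in zip(history, successors, labels)
             if g == target_cluster]
    if not pairs:
        raise ValueError("target cluster is empty")
    raw = [math.exp(-0.5 * (distance(last, z) / bandwidth) ** 2) for z, _ in pairs]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all kernel weights vanished; increase the bandwidth")
    weights = [r / total for r in raw]
    return [sum(w * nxt[j] for w, (_, nxt) in zip(weights, pairs))
            for j in range(len(last))]


# Toy example: cluster 1 holds the days shaped like A, each followed by B.
A = [0.0, 0.0, 1.0, 1.0]
B = [1.0, 1.0, 0.0, 0.0]
forecast = cfwk_forecast([A, B, A, B], last=A, labels=[1, 2, 1, 2],
                         target_cluster=1, bandwidth=0.5)
print(forecast)
```

Note that the averaged curves are the successors of the selected days, and those successors need not belong to the target cluster themselves, in line with the remark made in Step 3.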
Remark: The hierarchical classification and the curve discrimination algorithms have been implemented in statistical software. The program is available online through the npfda package.¹
III. APPLICATION TO INTRA-DAY LOAD FORECASTING
A. Description of the Data Set
To evaluate the proposed approach on household-level load curves, we used the smart meter data from the Irish smart meter trial.² The data set we used consists of 2000 residential customers with half-hourly electricity demand from 14/07/2009 to 31/12/2010. Fig. 1 gives some examples of residential load curves. We can easily observe the high volatility of those curves. In this section, our target is to forecast, one day ahead, the daily half-hourly electricity demand (here $P = 48$) for the 2000 residential customers. We also compare the results obtained by the proposed CFWK method to those obtained by the FWK approach.
B. An Illustration of CFWK Approach to Customer 1016
In this section, we focus on the application of the CFWK method to one randomly chosen customer. We take as an example customer number 1016 in the Irish data. Later, we extend the results to the entire sample of 2000 customers. Fig. 2(a) shows the original time series, which represents the half-hourly electricity demand of this customer between 14/07/2009 and 31/12/2010. In Fig. 2(b), we split the original signal into daily load curves $Z_i$ in order to be able to apply the proposed functional approach. Thus, we obtain a sample, say $\mathcal{S} = \{Z_1, \dots, Z_{535}\}$, of 535 daily load curves (segments). One can easily observe, from Fig. 2(b), that the electricity demand for that customer is very low between 00:00 and 07:00. Then, the demand increases around 07:30, which corresponds to the morning activity in the household. During the day, consumption decreases on most days. Finally, we can observe the classical evening peak demand between 19:00 and 20:00.

¹http://www.math.univ-toulouse.fr/staph/npfda/
²http://www.ucd.ie/issda/data/commissionforenergyregulation/
To validate our method, we split this sample into two parts. The first, denoted by $\mathcal{L} = \{Z_1, \dots, Z_n\}$, is a learning sample containing the daily load curves from 14/07/2009 to 31/12/2009. This sample will be used to build clusters and find the optimal bandwidth $h$. The second part, denoted by $\mathcal{T} = \{Z_{n+1}, \dots, Z_{535}\}$, is the test sample, which will be used to compare our forecasts to the observed daily load curves for the period from 01/01/2010 to 31/12/2010 (365 days). Each segment in the test sample is forecasted independently. To forecast the segment $Z_{n+1}$ we use $Z_1, \dots, Z_{n-1}$ as historical segments, and the last observed segment is $Z_n$. Then, to forecast the segment $Z_{n+2}$, we consider the historical data $Z_1, \dots, Z_n$, and the last observed segment is now $Z_{n+1}$ (the true one and not its forecast). This procedure is repeated until we have forecast all segments that belong to the test sample $\mathcal{T}$.
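The validation protocol described above (each test day is forecast from the true segments observed before it, never from earlier forecasts) can be sketched as a simple loop. The forecaster is passed in as a function; the `persistence` forecaster below is only a hypothetical placeholder used to make the sketch runnable, not one of the methods compared in the paper.

```python
def rolling_one_day_ahead(segments, n_train, forecaster):
    """Forecast every test segment from the true segments observed before it.

    segments: all daily segments, oldest first.
    n_train: size of the learning sample; later segments form the test sample.
    forecaster: callable taking the list of past segments, returning a forecast.
    """
    forecasts = []
    for t in range(n_train, len(segments)):
        past = segments[:t]  # observed values only, never earlier forecasts
        forecasts.append(forecaster(past))
    return forecasts


def persistence(past):
    """Hypothetical placeholder forecaster: repeat the last observed day."""
    return list(past[-1])


# Six toy "days" of two readings each; the first three form the learning sample.
days = [[float(d), float(d)] for d in range(6)]
forecasts = rolling_one_day_ahead(days, n_train=3, forecaster=persistence)
print(forecasts)
```

Swapping `persistence` for a clustering-based or kernel-based forecaster leaves the loop unchanged, which keeps the comparison between methods on an equal footing.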
Based on the sample $\mathcal{L}$ of segments, the goal now is to forecast the segment $Z_{n+1}$ (which corresponds to 1 January 2010) using the CFWK approach. To this end, the following steps are taken:
1) How Many Clusters Do We Get?: Based on the segments $Z_1, \dots, Z_{n-1}$ and using the hierarchical algorithm proposed by [29], we find three clusters, which are represented in Fig. 3(a)–(c). The median daily profile is plotted for each cluster in Fig. 3(d). One can easily observe the peak around 07:30 for all clusters. In contrast to the other clusters, cluster 1 contains three other important peaks at 10:30, at 19:30, and at 21:00. We can also observe that in the first cluster 50% of segments have electricity demand between 0.5 kW and 2 kW during the day period from 10:00 to 15:00. On the other hand, in addition to the small peak observed at 10:30, clusters 2 and 3 are characterized by an important peak in the evening (more pronounced for cluster 3) and very small electricity demand during the day.
Table I summarizes the type of days within each cluster. We can observe that the first cluster contains mainly week-ends (Saturdays and Sundays), which explains why this customer consumes more electricity during the morning and the afternoon. Working days are in the majority within cluster 2, and for that reason we observe a small peak in the morning (around 07:30), almost no consumption during the day, and then another peak in the evening when people come back home. Cluster 3 contains mainly Fridays and Sundays (about 50% of the total number of days in that cluster) and some holidays such as 25/12/2009. In comparison to cluster 2, this may explain the high values of electricity demand during the day.

Fig. 1. First sample of residential customers' load curves.

2) Which Cluster to Be Assigned to the Last Observed Segment $Z_n$?: A nonparametric curve discrimination method introduced by [30] has been used to assign a cluster to each last observed segment $Z_n$ in the training sample. In this example the last segment $Z_n$ corresponds to the load curve observed on 31/12/2009. Our main task is to predict the corresponding class (which will be in our case cluster 1, 2 or 3) for this segment. To apply the discrimination method explained in Section II-B, several tuning parameters should be fixed. The kernel is chosen to be quadratic and the optimal bandwidth is chosen by the cross-validation method on the $k$-nearest neighbors (see [18], p. 115 for more details). Another important parameter that needs to be fixed is the semi-metric $d$. In this example, because of the roughness of the load curves, we used a semi-metric computed with functional principal components analysis (see [31]) with an optimal dimension equal to 2. The optimality here was measured with respect to the rate of misclassified curves obtained within the learning sample (17% in this case). Finally, the discrimination method assigned cluster 1 to the segment $Z_n$. This result appears consistent with the shape of the load curve of the segment $Z_n$ presented in Fig. 4. Since 31/12/2009 falls in the Christmas holiday period, the customer behavior on that day is expected to be the same as on a week-end. We can easily see, from Fig. 4, the absence of the small peak demand usually observed at 07:30 on working days. We also observe the presence of two important peaks during the day: the first one around 12:30, which corresponds to lunch time, and another more important one around 15:30.

Fig. 2. (a) Half-hour electricity demand of customer 1016 between 14/07/2009 and 31/12/2010. (b) A sample of 535 daily load curves (segments) of the same customer.
Fig. 3. Clusters obtained for customer 1016.
TABLE I
TYPE OF DAYS WITHIN EACH CLUSTER
Fig. 4. Last observed segment $Z_n$ in the training sample for customer 1016, which corresponds to 31/12/2009.
Fig. 5. Half-Hour Absolute Errors (HHAE) obtained by the CFWK method (case of customer 1016).
Fig. 6. Distribution (by month) of the daily median absolute errors (DMAE) obtained by CFWK and FWK (case of customer 1016).
3) Day-Ahead Forecasting and Validation Criteria: Recall that our main purpose in this example is to forecast the half-hourly load curve of 1 January 2010, which corresponds to the segment $Z_{n+1}$. Using the results obtained in steps 1 and 2, we can then consider the following sample $\{(Z_{i_m}, Z_{i_m+1}),\ m = 1, \dots, 51\}$, where 51 is the number of segments in cluster 1. Therefore, the forecast of $Z_{n+1}$ is obtained as follows:

$$\widehat{Z}_{n+1} = \sum_{m=1}^{51} w_{n, i_m}\, Z_{i_m + 1},$$

where $w_{n, i_m}$ are the Nadaraya-Watson weights obtained by measuring the similarity between the load curve observed on the last day (segment $Z_n$) and the load curves within cluster 1. Those weights are determined by (3). In this step several tuning parameters should be fixed: the kernel $K$ is chosen to be the Gaussian density function and the bandwidth $h$ is selected by the empirical risk of prediction methodology suggested by [32]. In order to extend our study to one year of day-ahead forecasting, we just need to repeat steps one, two, and three 365 times.

Fig. 7. Distribution (by month) of the daily median absolute errors (DMAE) obtained by CFWK and FWK (case of customer 39).
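The accuracy of each model (CFWK and FWK) will be measured using half-hourly and daily errors. For each fixed day $i$, with $i \in \{1, \dots, 365\}$, in the test sample, the Half-Hour Absolute Errors (HHAE) are defined by

$$\mathrm{HHAE}_{i,j} = \big| Z_i(t_j) - \widehat{Z}_i(t_j) \big|, \qquad j = 1, \dots, 48,$$

where $Z_i(t_j)$ and $\widehat{Z}_i(t_j)$ are the observed and the forecasted values at the $j$-th half hour of the $i$-th day of the year to be predicted. The Daily Median Absolute Errors (DMAE) are defined, for all $i \in \{1, \dots, 365\}$, by

$$\mathrm{DMAE}_i = \operatorname{median}\big( \mathrm{HHAE}_{i,1}, \dots, \mathrm{HHAE}_{i,48} \big).$$

The median is chosen instead of the mean because it is less sensitive to outliers (the highest values of HHAE).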
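The two error criteria can be computed directly with the Python standard library. The sketch below assumes that the observed and forecasted curves for one day are given as equal-length lists of kW values; the function names and the toy numbers are hypothetical.

```python
from statistics import mean, median


def half_hour_absolute_errors(observed, forecast):
    """Absolute error at each time point of one day (HHAE)."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    return [abs(o - f) for o, f in zip(observed, forecast)]


def daily_median_absolute_error(observed, forecast):
    """Median of the absolute errors over one day (DMAE)."""
    return median(half_hour_absolute_errors(observed, forecast))


# Toy day with four readings (kW); the third one is a badly missed peak.
observed = [0.2, 0.3, 1.5, 0.4]
forecast = [0.1, 0.3, 0.5, 0.6]
hhae = half_hour_absolute_errors(observed, forecast)
dmae = daily_median_absolute_error(observed, forecast)
print(hhae, dmae, mean(hhae))
```

In this toy day the single missed peak raises the mean absolute error well above the median, which illustrates why the median is the more robust daily summary.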
Fig. 5 displays the HHAE obtained for one year (from 01/01/2010 to 31/12/2010) of day-ahead forecasting using the proposed CFWK method. One can observe that the HHAE are between 0 kW and 3.5 kW and that larger errors are obtained during the evening, a period of the day corresponding to a high volatility of demand. A comparison between the proposed method and the FWK approach has been made. Fig. 6 displays, for each month of the year, the distribution of the DMAE obtained with each method. We can observe clearly that the CFWK method provides smaller errors than the FWK one: the median of the DMAE obtained by CFWK is always less than that obtained by FWK. Similar results are given in Figs. 7 and 8 for customers number 39 and 708.

Fig. 8. Distribution (by month) of the daily median absolute errors (DMAE) obtained by CFWK and FWK (case of customer 708).
C. Extension of the Study to 2000 Irish Customers
To measure the efficiency of the proposed approach, we extend the analysis to the sample of 2000 Irish customers. We applied the forecasting algorithm given by the CFWK approach to this panel of customers. The tuning parameters have been fixed to be the same for all customers. For each customer, the number of clusters in step 1 of the algorithm has been found automatically. To measure the performance of the proposed method over the panel of 2000 customers, we define the following validation procedure: for each customer $c \in \{1, \dots, 2000\}$, we calculate, as in the previous subsection, the 365 Daily Median Absolute Errors, say $\mathrm{DMAE}^{(c)}_1, \dots, \mathrm{DMAE}^{(c)}_{365}$. Then, for each day $i \in \{1, \dots, 365\}$, we determine the Sample Daily Median Absolute Error (SDMAE), which summarizes the values $\mathrm{DMAE}^{(c)}_i$ over the 2000 customers.
Fig. 9 displays the distribution, for each month, of the SDMAE provided by the CFWK and FWK approaches. Table II gives a numerical summary of the results shown in Fig. 9. For instance, if we take January 2010 as an example, one can observe that, with the CFWK (resp. FWK) approach, 50% (of the 2000 customers in the panel) have a daily median absolute error (DMAE) less than 0.206 kW (resp. 0.222 kW) and 75% of them have a DMAE between 0.195 kW and 0.215 kW (resp. 0.211 kW and 0.235 kW). The same analysis might be made for the other months. Table II shows that the CFWK approach is more efficient than the FWK one.

Fig. 9. Distribution (by month) of the Sample Daily Median Absolute Errors (SDMAE) obtained by CFWK and FWK.

TABLE II
DISTRIBUTION (BY MONTH) OF THE SAMPLE DAILY MEDIAN ABSOLUTE ERRORS (SDMAE) OBTAINED BY CFWK AND FWK

IV. CONCLUSION
In this paper, a new approach for forecasting functional time series has been proposed. An application to short-term intra-day household-level load curve forecasting has shown the performance of the proposed methodology. The idea behind the use of a classification step is mainly to get a reasonable assumption of stationarity for our time series. Moreover, because the intra-day individual load curve shape is mainly affected by the consumption behavior of the customer and there is no evident common pattern between days, we used an unsupervised classification method to find similar segments. The numerical results showed that the clustering-based approach works very satisfactorily and outperforms the functional wavelet-kernel time series predictor. We note that the proposed methodology might be improved by using some daily exogenous functional random variables, such as internal/external daily temperature and sunshine curves. Other discrete variables which might affect daily individual load demand, such as the surface of the property, the number of electric appliances, and the number of occupants, can also be taken into account.
ACKNOWLEDGMENT
The author would like to thank Scottish and Southern Power Distribution (SSEPD) for support and funding via the New Thames Valley Vision Project (SSET203 New Thames Valley Vision), funded through the Low Carbon Network Fund.³
REFERENCES

[1] K. Clement-Nyns, E. Haesen, and J. Driesen, "The impact of charging plug-in hybrid electric vehicles on a residential distribution grid," IEEE Trans. Power Syst., vol. 25, no. 1, pp. 371–380, 2010.
[2] S. D. Jenkins, J. R. Rossmaier, and M. Ferdowsi, "Utilization and effect of plug-in hybrid electric vehicles in the United States power grid," in Proc. IEEE Veh. Power Propulsion Conf. (VPPC), Harbin, China, Sep. 3–5, 2008, pp. 1–5.
[3] M. Alam, K. M. Muttaqi, and D. Sutanto, "A comprehensive assessment tool for solar PV impacts on low voltage three phase distribution networks," in Proc. IEEE 2nd Int. Conf. Develop. Renewable Energy Technol. (ICDRET), 2012, pp. 221–225.
[4] B. M. J. Vonk, P. H. Nguyen, M. O. W. Grond, J. G. Slootweg, and W. L. Kling, "Improving short-term load forecasting for a local energy storage system," in Proc. Univ. Power Eng. Conf. (UPEC), Sep. 4–7, 2012, pp. 1–6.
[5] B. Guannan, L. Chao, Y. Zhichang, and L. Zhigang, "Battery energy storage system load shifting control based on real time load forecasting and dynamic programming," in Proc. 8th IEEE Int. Conf. Autom. Sci. Eng., Seoul, Korea, Aug. 20–24, 2012.
[6] T. Hong, "Short term electric load forecasting," Ph.D. dissertation, Graduate Program of Operation Research and Dept. Electrical and Computer Engineering, North Carolina State Univ., Raleigh, NC, USA, 2010.
[7] S. Fan, K. Methaprayoon, and W. J. Lee, "Multiregion load forecasting for system with large geographical area," IEEE Trans. Ind. Appl., vol. 45, no. 4, pp. 1452–1459, Jul.–Aug. 2009.
[8] J. W. Taylor, "Triple seasonal methods for short-term electricity demand forecasting," Eur. J. Oper. Res., vol. 204, pp. 139–152, 2010.
[9] V. Dordonnat, S. J. Koopman, and M. Ooms, "Dynamic factors in periodic time-varying regressions with an application to hourly electricity load modelling," Comput. Stat. Data Anal., vol. 56, no. 11, pp. 3134–3152, 2012.
[10] H. S. Hippert, C. E. Pedreira, and R. C. Souza, "Neural networks for short-term load forecasting: A review and evaluation," IEEE Trans. Power Syst., vol. 16, no. 1, pp. 44–55, Feb. 2001.
[11] J. M. Poggi, "Prévision non paramétrique de la consommation électrique," Rev. de Stat. Appliquée, vol. 12, no. 4, pp. 83–98, 1994.
3 See http://www.ofgem.gov.uk/networks/elecdist/Icnf/pages/Icnf.aspx

CHAOUCH: CLUSTERING-BASED IMPROVEMENT OF NONPARAMETRIC FUNCTIONAL TIME SERIES FORECASTING

[12] V. Lefieu, Modles Semi-paramtriques appliqus la prvision des


sries temporelles. Cas de la consommation dlectricit, Ph.D. dissertation, Universit Rennes 2, Rennes, France, 2007.
[13] Y. Xia, H. Tong, and W. K. Li, An adaptive estimation of dimension
reduction space, J. Roy. Stat. Soc. B, vol. 64, no. 3, pp. 363410, 2002.
[14] S. Fan and R. J. Hyndman, Short-term load forecasting based on a
semi-parametric additive model, IEEE Trans. Power Syst., vol. 27,
pp. 134141, 2012.
[15] H. K. Alfares and M. Nazeeruddin, Electric load forecasting: Literature survey and classification of methods, Int. J. Syst. Sci., vol. 33, pp.
2334, 2002.
[16] G. Aneiros-Prez and P. Vieu, Nonparametric time series prediction:
A semi-functional partial linear modeling, J. Multivar. Analys., vol.
99, no. 5, pp. 834857, 2008.
[17] A. Antoniadis and T. Sapatinas, Wavelet methods for continuous-time
prediction using Hilbert-valued autoregressive processes, J. Multivar.
Anal., vol. 87, pp. 133158, 2003.
[18] F. Ferraty and Ph. Vieu, Nonparametric Functional Data Analysis.
Theory and Practice. New York: Springer-Verlag, 2006.
[19] J. M. Vilar, R. Cao, and G. Aneiros, Forecasting next-day electricity
demand and price using nonparametric functional methods, Int. J.
Electr. Power Energ. Syst., vol. 39, no. 1, pp. 4855, 2012.
[20] A. Goia, C. May, and G. Fused, Functional clustering and linear regression for peak load forecasting, Int. J. Forecast, vol. 26, no. 4, pp.
700711, 2010.
[21] J. Antoch, L. Prchal, M. R. De Rosa, and P. Sarda, Electricity consumption prediction with functional linear regression using spline estimators, J. Appl. Stat., vol. 37, no. 12, pp. 20272041, 2010.
[22] A. Antoniadis, E. Paparoditis, and T. Sapatinas, A functional waveletkernel approach for time series prediction, J. Roy. Stat. Soc. B, vol. 68,
pp. 837857, 2006.
[23] E. Paparoditis and T. Sapatinas, Short-term load forecasting: The similar shape functional time series predictor, Preprint, 2012.
[24] D. Bosq, Linear Processes in Functional Spaces: Theory and Applications. Berlin, Germany: Springer, 2000.

419

[25] J. Damon and S. Guillas, The inclusion of exogenous variables in


functional autoregressive ozone forecasting, Environmetrics, vol. 13,
pp. 759774, 2002.
[26] A. Antoniadis, X. Brossat, J. Cugliari, and J. M. Poggi, Prvision dun
processus valeurs fonctionnelles en prsence de non stationnarits.
Application la consommation dlectricit, J. Soc. Fr. Stat., vol. 153,
no. 2, pp. 5278, 2012.
[27] C. Abraham, P. Cornillon, E. Matzner-Lber, and N. Molinari, Unsupervised curves classification using -splines, Scand. J. Stat., vol.
30, pp. 581595, 2003.
[28] J. Cuesta-Albertos and R. Fraiman, Impartial trimmed -means and
classification rules for functional data, Comput. Stat. Data Anal., vol.
51, no. 10, pp. 48644877, 2007.
[29] S. Dabo-Niang, F. Ferraty, and P. Vieu, Mode estimation for functional random variable and its application for curves classification,
Far East J. Theor. Stat., vol. 18, no. 1, pp. 93119, 2006.
[30] F. Ferraty and P. Vieu, Curves discrimination: A nonparametric functional approach, Comput. Stat. Data Anal., vol. 44, pp. 161173, 2003.
[31] P. Hall and M. Hosseini-Nasab, On properties of functional principal
components analysis, J. Roy. Stat. Soc., vol. 68, no. 1, pp. 109126,
2006.
[32] A. Antoniadis, E. Paparoditis, and T. Sapatinas, Bandwidth selection
for functional time series prediction, Stat. Probab. Lett., vol. 79, pp.
733740, 2009.

Mohamed Chaouch received the M.Sc. degree in biostatistics from University
of Montpellier II, France, and the Ph.D. degree in mathematics from University
of Burgundy, France, in 2005 and 2008, respectively.
After working as a Research Engineer in Statistics at EDF R&D, he joined
the University of Reading, U.K., where he is currently a Research Associate.
His current research focuses on time series forecasting and classification with
an application to electricity demand and renewable energy generation within the
smart grid context.
