Clustering-Based Improvement of Nonparametric Functional Time Series Forecasting: Application To Intra-Day Household-Level Load Curves
1, JANUARY 2014
Abstract: Energy suppliers are facing ever-increasing competition, so factors such as the quality and continuity of the services they offer must be properly taken into account. Furthermore, in the last few years many countries have become interested in renewable energies such as solar and wind. Renewable energy resources are mainly used for environmental and economic reasons, such as reducing carbon emissions. They might also be used to reinforce the electric network, especially during high-peak periods. However, the injection of such energy resources into the low-voltage (LV) network can lead to high voltage constraints. To overcome this issue, customers can be motivated to use thermal or electric storage devices during high-production periods of photovoltaic (PV) generation, which fosters the integration of renewable energy generation into the network. In this paper, we are interested in forecasting household-level electricity demand, which is a key factor in ensuring the supply/demand balance in the LV network. We introduce a novel methodology that improves short-term functional time series forecasts. An application to the Irish smart meter data set demonstrates the performance of the proposed methodology for forecasting intra-day household-level load curves.
Index Terms: Curve discrimination, functional data, household-level forecasting, intra-day load curve, nonparametric statistics, smart grid, unsupervised classification.
In recent years we have seen the arrival of new technologies such as electric vehicles (EVs) and electric heating, as well as the growth of renewable energy generation such as wind and solar. As a result, the power grid is undergoing change. In fact, the stochastic nature of renewable energy sources will turn the power grid into a highly stochastic system. Within this new context, two main problems arise: 1) because of the electrification of appliances and mobility applications, peak demand will increase and the shape of the load curve will change. At the moment, a great deal of attention is focused on EVs, both hybrid and non-hybrid, which will allow users to recharge their vehicles directly at home. It is therefore important to understand and anticipate the impact of this recharging activity on power grid capacity (see, for instance, [1], [2]). 2) It is well known that one of the expected solutions to reduce peak demand is
Manuscript received February 14, 2013; revised May 28, 2013, July 30, 2013,
July 31, 2013; accepted August 01, 2013. Date of publication September 09,
2013; date of current version December 24, 2013. Paper no. TSG-00122-2013.
The author is with the Centre for the Mathematics of Human Behavior, Department of Mathematics and Statistics, University of Reading, Reading RG6
6AX, U.K. (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2013.2277171
This forecasting issue has been the subject of several publications in the statistical literature. The functional autoregressive (FAR) process was introduced and studied theoretically by [24] and has since been used extensively in both practical and theoretical studies; see [25] among numerous other contributions. Under the FAR model, the best predictor,
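$\hat{X}_{n+1}$, of the curve $X_{n+1}$, given the historical curves $X_1, \dots, X_n$, is the conditional mean of $X_{n+1}$ given the last curve $X_n$, that is, $\hat{X}_{n+1} = E(X_{n+1} \mid X_n)$. This conditional mean is estimated nonparametrically by a weighted average of the segments that followed each past segment:

$$\hat{X}_{n+1}(t) = \sum_{m=1}^{n-1} w_{n,m}\, X_{m+1}(t), \tag{2}$$

where the weights $w_{n,m}$ satisfy $w_{n,m} \ge 0$ and $\sum_{m=1}^{n-1} w_{n,m} = 1$. In the nonparametric literature the weights $w_{n,m}$, $m = 1, \dots, n-1$, are known as Nadaraya-Watson weights and are defined as follows:

$$w_{n,m} = \frac{K\big(d(X_n, X_m)/h_n\big)}{\sum_{m'=1}^{n-1} K\big(d(X_n, X_{m'})/h_n\big)}, \tag{3}$$

where $K$ is a kernel function and $d(\cdot,\cdot)$ is a distance between two segments. The tuning parameter $h_n > 0$ (the so-called bandwidth) controls the effective number of segments for which $w_{n,m}$ is positive and therefore the smoothness of the predictor.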
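A minimal Python sketch of the predictor (2) with the weights (3) follows, assuming that each segment is discretized on a common grid of p points and stored as one row of a NumPy array. The Euclidean distance and the Gaussian kernel are illustrative assumptions (the FWK approach of [22] measures similarity between segments through their wavelet decompositions), and the function names are hypothetical.

```python
import numpy as np


def nadaraya_watson_weights(last, history, h):
    """Weights w_{n,m} of (3) comparing the last segment X_n with past segments X_m.

    last:    array (p,), the last observed segment X_n
    history: array (M, p), the candidate segments X_m (one per row)
    h:       bandwidth h_n > 0
    Assumption: d is the Euclidean distance between discretized curves.
    """
    u = np.linalg.norm(history - last, axis=1) / h   # d(X_n, X_m) / h_n
    # Gaussian kernel; its normalizing constant cancels in the ratio.
    # Shifting by min(u)^2 avoids underflow without changing the weights.
    k = np.exp(-0.5 * (u ** 2 - np.min(u) ** 2))
    return k / k.sum()


def fwk_like_forecast(segments, h):
    """Predictor (2): X_hat_{n+1} = sum_{m=1}^{n-1} w_{n,m} X_{m+1}.

    segments: array (n, p) holding X_1, ..., X_n (one row per day).
    """
    last = segments[-1]          # X_n
    past = segments[:-1]         # X_1, ..., X_{n-1}
    successors = segments[1:]    # X_2, ..., X_n, paired row-by-row with `past`
    w = nadaraya_watson_weights(last, past, h)
    return w @ successors


# Usage on synthetic data: 30 days of 48 half-hourly values.
rng = np.random.default_rng(0)
days = rng.gamma(2.0, 0.3, size=(30, 48))
forecast = fwk_like_forecast(days, h=1.0)
print(forecast.shape)  # (48,)
```

Using `past` and `successors` as two shifted views of the same array keeps the pairing $(X_m, X_{m+1})$ explicit.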
Remark: The approach proposed by [22] relies on an implicit assumption: all the available information for predicting segment $X_{n+1}$ is mainly contained in the last observed segment $X_n$.
B. Clustering-Based Improvement of the FWK Approach (CFWK)
The proposed prediction procedure consists of the following three main steps. a) Classify the sample of historical segments into $G$ clusters ($G$ may or may not be fixed in advance) containing typical daily load curves. In contrast to aggregated load curves, for which common patterns are easy to observe for working days, weekends, and holidays (see [26] for the use of a type-of-day classification in national-level load forecasting), household load curves do not exhibit this kind of similarity between days. For that reason, in this step we use an unsupervised classification method to identify days that share a common consumption behavior pattern. b) Assign the last observed segment $X_n$ to the most appropriate cluster. The main purpose of this step is to find days that contain the same information as the last observed day; in other words, we look in the historical segments for those that describe a behavior similar to what we observe today. c) Apply the FWK method to forecast the segment $X_{n+1}$ using only the segments that belong to the cluster obtained in step b). The following algorithm describes in more detail how we improve the FWK forecasts using clustering and curve discrimination approaches.
Step 1: Unsupervised Curve Classification: Suppose that we have $n-1$ historical segments $X_1, \dots, X_{n-1}$. In this step we are interested in automatically splitting these $n-1$ curves into $G$ clusters, say $\mathcal{C}_1, \dots, \mathcal{C}_G$. Because we do not have any categorical response variable at hand and the data set is clearly functional in nature, this problem can be seen as unsupervised curve classification. Since the number of clusters is unknown in our case, the problem becomes harder to solve. In the statistical literature, few authors have addressed this problem. These contributions are mainly restricted to the works by [27] and [28], in which $k$-means techniques for classification analysis are extended to curve data. In this paper we use the hierarchical algorithm proposed by [29]; the reader is referred to [29] for more details about this algorithm and the underlying methodology.
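The hierarchical algorithm of [29], which also determines the number of clusters, is not reproduced here. As a simpler stand-in in the spirit of the $k$-means extensions of [27] and [28], the sketch below runs Lloyd's $k$-means on curves discretized on a common grid, with the number of clusters $G$ fixed in advance. The function names and the fixed $G$ are assumptions for illustration only.

```python
import numpy as np


def _sq_dists(curves, centres):
    # Squared L2 distance between every curve (row) and every centre.
    return ((curves[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)


def kmeans_curves(curves, n_clusters, n_iter=100, seed=0):
    """Lloyd's k-means on curves sampled on a common grid (one curve per row).

    Stand-in only: this is NOT the hierarchical algorithm of [29], and
    n_clusters (G) is fixed rather than estimated.
    Returns labels in {0, ..., G-1} and the cluster mean curves.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(curves), size=n_clusters, replace=False)
    centres = curves[start].astype(float)    # initial centres: G observed curves
    for _ in range(n_iter):
        labels = _sq_dists(curves, centres).argmin(axis=1)
        new_centres = centres.copy()
        for g in range(n_clusters):
            members = curves[labels == g]
            if len(members) > 0:              # keep the old centre if a cluster empties
                new_centres[g] = members.mean(axis=0)
        converged = np.allclose(new_centres, centres)
        centres = new_centres
        if converged:
            break
    labels = _sq_dists(curves, centres).argmin(axis=1)   # labels for the final centres
    return labels, centres


# Usage on synthetic daily curves (60 days x 48 half hours).
rng = np.random.default_rng(1)
days = rng.gamma(2.0, 0.3, size=(60, 48))
labels, centres = kmeans_curves(days, n_clusters=4)
```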
Step 2: Curve Discrimination: The curve-discrimination step can be stated as follows. Given the historical segments $X_1, \dots, X_{n-1}$, step 1 tells us to which cluster each segment belongs. Let us denote by $Y_i \in \{1, \dots, G\}$ the cluster of the segment $X_i$. Assume that each pair of variables $(X_i, Y_i)$ has the same distribution as a pair of random variables $(X, Y)$. Given a new segment $X_n$ (the last observed daily load curve), the purpose now is to identify its class membership. To that end, we estimate, for each $g = 1, \dots, G$, the following conditional probability:

$$p_g(X_n) = P(Y = g \mid X = X_n). \tag{4}$$

Let $\hat{g}$ denote the cluster corresponding to the highest estimated probability, that is, $\hat{g} = \arg\max_{g} \hat{p}_g(X_n)$. We suppose that $X_{i_1}, \dots, X_{i_{n_{\hat{g}}}}$ are the segments that belong to the cluster $\mathcal{C}_{\hat{g}}$, where $n_{\hat{g}}$ is the total number of segments in $\mathcal{C}_{\hat{g}}$.
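One common way to estimate (4) in the functional nonparametric literature is the kernel estimator $\hat{p}_g(x) = \sum_{i} \mathbf{1}\{Y_i = g\}\, K(d(x, X_i)/h) \big/ \sum_{i} K(d(x, X_i)/h)$. The sketch below implements this estimator together with the arg-max rule, assuming a Euclidean distance, a Gaussian kernel, and a user-supplied bandwidth; clusters are indexed from 0 in the code, the helper names are hypothetical, and this is not necessarily the exact estimator used in the paper.

```python
import numpy as np


def posterior_probabilities(x_new, curves, labels, n_clusters, h):
    """Kernel estimate of p_g(x) = P(Y = g | X = x) for g = 0, ..., G-1.

    curves: array (M, p) of classified segments; labels: array (M,) of clusters.
    Assumptions: Euclidean distance, Gaussian kernel, bandwidth h > 0.
    """
    u = np.linalg.norm(curves - x_new, axis=1) / h
    k = np.exp(-0.5 * (u ** 2 - np.min(u) ** 2))   # stabilized Gaussian kernel
    probs = np.array([k[labels == g].sum() for g in range(n_clusters)])
    return probs / k.sum()


def assign_cluster(x_new, curves, labels, n_clusters, h):
    """Return g_hat = argmax_g p_hat_g(x_new), the indices of its members, and all p_hat_g."""
    probs = posterior_probabilities(x_new, curves, labels, n_clusters, h)
    g_hat = int(np.argmax(probs))
    members = np.flatnonzero(labels == g_hat)       # i_1, ..., i_{n_g} (0-based)
    return g_hat, members, probs
```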
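Step 3: Forecasting: Using the results obtained in step 2, we can now build the sample of segments $\{(X_{i_j}, X_{i_j+1}),\ j = 1, \dots, n_{\hat{g}}\}$, where $X_{i_j}$ is the segment (corresponding to day $i_j$) that belongs to the cluster $\mathcal{C}_{\hat{g}}$ and $X_{i_j+1}$ is the segment observed on day $i_j + 1$. Observe that $X_{i_j+1}$ does not necessarily belong to the cluster $\mathcal{C}_{\hat{g}}$. Recall that our target is to forecast the segment $X_{n+1}$. Therefore we propose the following estimator:

$$\hat{X}^{\mathrm{CFWK}}_{n+1}(t) = \sum_{j=1}^{n_{\hat{g}}} \tilde{w}_{n,i_j}\, X_{i_j+1}(t), \tag{5}$$

where

$$\tilde{w}_{n,i_j} = \frac{K\big(d(X_n, X_{i_j})/h_n\big)}{\sum_{l=1}^{n_{\hat{g}}} K\big(d(X_n, X_{i_l})/h_n\big)}$$

for all $j = 1, \dots, n_{\hat{g}}$, and $K$, $d$, and $h_n$ are as defined in (3).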
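A minimal sketch of the estimator (5) is given below, assuming segments stored as rows of a NumPy array, a Euclidean distance, and a Gaussian kernel; the function name is hypothetical. Only the segments of the selected cluster enter the weights, while their successors $X_{i_j+1}$ may belong to any cluster.

```python
import numpy as np


def cfwk_forecast(segments, labels, g_hat, h):
    """Estimator (5): kernel-weighted average of the successors of cluster members.

    segments: array (n, p) holding X_1, ..., X_n; the last row is X_n.
    labels:   array (n-1,) with the cluster of X_1, ..., X_{n-1} (step 1).
    g_hat:    cluster assigned to X_n (step 2).
    Assumptions: Euclidean distance, Gaussian kernel, bandwidth h > 0.
    """
    last = segments[-1]                               # X_n
    idx = np.flatnonzero(labels == g_hat)             # i_1, ..., i_{n_g} (0-based)
    if idx.size == 0:
        raise ValueError("cluster g_hat contains no historical segment")
    u = np.linalg.norm(segments[idx] - last, axis=1) / h
    k = np.exp(-0.5 * (u ** 2 - np.min(u) ** 2))      # stabilized Gaussian kernel
    w = k / k.sum()                                   # renormalized weights w~_{n,i_j}
    return w @ segments[idx + 1]                      # sum_j w~_{n,i_j} X_{i_j + 1}
```

Because `labels` covers only $X_1, \dots, X_{n-1}$, every index `idx + 1` is a valid row, and the successor of the last classified day may be $X_n$ itself.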
Remark: The hierarchical classification and curve discrimination algorithms have been implemented in statistical software. The program is available online through the npfda package (see footnote 1).
III. APPLICATION TO INTRA-DAY LOAD FORECASTING
A. Description of the Data Set
To evaluate the proposed approach on household-level load curves, we used the smart meter data from the Irish smart meter trial (see footnote 2). The data set we used consists of 2000 residential customers with half-hourly electricity demand recorded between 14/07/2009 and 31/12/2010. Fig. 1 gives some examples of residential load curves; the high volatility of these curves is easy to observe. In this section, our target is to forecast, one day ahead, the daily half-hourly electricity demand (here, 48 values per day) for the 2000 residential customers. We also compare the results obtained by the proposed CFWK method with those obtained by the FWK approach.
B. An Illustration of the CFWK Approach for Customer 1016
In this section, we focus on the application of the CFWK method to one randomly chosen customer: customer number 1016 in the Irish data. Later, we extend the results to the entire sample of 2000
1 http://www.math.univ-toulouse.fr/staph/npfda/
2 http://www.ucd.ie/issda/data/commissionforenergyregulation/
Fig. 2. (a) Half-hourly electricity demand of customer 1016 between 14/07/2009 and 31/12/2010. (b) A sample of 535 daily load curves (segments) of the same customer.
Fig. 4. Last observed segment in the training sample for customer 1016, which corresponds to 31/12/2009.
Fig. 6. Distribution (by month) of the daily median absolute errors (DMAE)
obtained by CFWK and FWK (case of customer 1016).
important peaks during the day: the first one around 12:30, which corresponds to lunch time, and another, more important one around 15:30.
Fig. 7. Distribution (by month) of the daily median absolute errors (DMAE) obtained by CFWK and FWK (case of customer 39).

3) Day-Ahead Forecasting and Validation Criteria: Recall that our main purpose in this example is to forecast the half-hourly load curve of 1 January 2010, which corresponds to the segment $X_{n+1}$. Using the results obtained in steps 1 and 2, we can then consider the sample $\{(X_{i_j}, X_{i_j+1}),\ j = 1, \dots, 51\}$, where 51 is the number of segments in cluster 1. Therefore, the forecast of $X_{n+1}$ is obtained as follows:

$$\hat{X}_{n+1}(t) = \sum_{j=1}^{51} \tilde{w}_{n,i_j}\, X_{i_j+1}(t),$$

where $\tilde{w}_{n,i_j}$ are the Nadaraya-Watson weights obtained by measuring the similarity between the load curve observed the day before (segment $X_n$) and the load curves within cluster 1. These weights are determined by (3), with the sum restricted to the segments of cluster 1. In this step several tuning parameters must be fixed: the kernel $K$ is chosen to be the Gaussian density function, and the bandwidth is selected by the empirical risk of prediction methodology suggested by [32]. To extend the study to one year of day-ahead forecasts, we simply repeat steps 1, 2, and 3, 365 times.
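The bandwidth criterion of [32] is not reproduced here. As an illustration of the general idea of choosing $h$ by an empirical prediction risk, the sketch below performs a grid search that minimizes the mean squared one-day-ahead error over the last days of the training sample, each forecast using only earlier days. The helper names, the validation window, and the squared-error criterion are assumptions and may differ from the procedure of [32].

```python
import numpy as np


def kernel_forecast(history, h):
    """One-day-ahead kernel forecast (2)-(3) from the rows X_1, ..., X_n of `history`.

    Assumptions: Euclidean distance and Gaussian kernel.
    """
    last, past, succ = history[-1], history[:-1], history[1:]
    u = np.linalg.norm(past - last, axis=1) / h
    k = np.exp(-0.5 * (u ** 2 - np.min(u) ** 2))
    return (k / k.sum()) @ succ


def select_bandwidth(segments, grid, n_val):
    """Pick h in `grid` minimizing the mean squared one-day-ahead error
    over the last `n_val` days of `segments` (hypothetical criterion)."""
    n = len(segments)
    if n - n_val < 2:
        raise ValueError("need at least two segments before the validation days")
    risks = []
    for h in grid:
        errs = []
        for t in range(n - n_val, n):
            pred = kernel_forecast(segments[:t], h)   # uses days before t only
            errs.append(np.mean((segments[t] - pred) ** 2))
        risks.append(np.mean(errs))
    return grid[int(np.argmin(risks))], risks
```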
The accuracy of each model (CFWK and FWK) is measured using half-hourly and daily errors. For each fixed day $d$, with $d = 1, \dots, 365$, in the test sample, the Half-Hour Absolute Errors (HHAE) are defined by

$$\mathrm{HHAE}_{d,j} = \big| X_d(t_j) - \hat{X}_d(t_j) \big|, \qquad j = 1, \dots, 48,$$

where $X_d(t_j)$ and $\hat{X}_d(t_j)$ are the observed and the forecasted values at the $j$-th half hour of the $d$-th day of the year to be predicted. The Daily Median Absolute Errors (DMAE) are defined, for all $d = 1, \dots, 365$, by

$$\mathrm{DMAE}_d = \operatorname{median}_{1 \le j \le 48} \mathrm{HHAE}_{d,j}.$$

We use the median rather than the mean because it is less sensitive to outliers (the largest HHAE values).
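The two error measures translate directly into array operations. The sketch below assumes that observed and forecasted values are stored as arrays of shape (days, half hours), one row per forecast day; the function names are hypothetical.

```python
import numpy as np


def hhae(observed, forecast):
    """Half-Hour Absolute Errors |X_d(t_j) - X_hat_d(t_j)|, one row per day."""
    return np.abs(np.asarray(observed, dtype=float) - np.asarray(forecast, dtype=float))


def dmae(observed, forecast):
    """Daily Median Absolute Errors: median of the HHAE over the half hours of each day."""
    return np.median(hhae(observed, forecast), axis=1)


# Toy usage with 3 values per day instead of 48.
obs = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]])
fc = np.array([[1.5, 2.0, 0.0], [0.5, 1.5, 0.5]])
daily_errors = dmae(obs, fc)
```

In the first toy day, the single large error (3.0) does not affect the daily median, which illustrates the robustness argument above.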
Fig. 8. Distribution (by month) of the daily median absolute errors (DMAE) obtained by CFWK and FWK (case of customer 708).

Fig. 5 displays the HHAE obtained for one year (from 01/01/2010 to 31/12/2010) of day-ahead forecasting with the proposed CFWK method. The HHAE lie between 0 kW and 3.5 kW, and the largest errors occur during the evening, a period of the day characterized by highly volatile demand. The proposed method has also been compared with the FWK approach. Fig. 6 displays, for each month of the year, the distribution of the DMAE obtained with each method. CFWK clearly yields smaller errors than FWK: in every month, the median DMAE obtained by CFWK is lower than that obtained by FWK. Similar results are shown in Figs. 7 and 8 for customers 39 and 708.
C. Extension of the Study to 2000 Irish Customers
To measure the efficiency of the proposed approach, we extend the analysis to the sample of 2000 Irish customers. We applied the forecasting algorithm given by the CFWK approach to
IV. CONCLUSION
Fig. 9. Distribution (by month) of the Sample Daily Median Absolute Errors
(SDMAE) obtained by CFWK and FWK.
TABLE II
DISTRIBUTION (BY MONTH) OF THE SAMPLE DAILY MEDIAN ABSOLUTE
ERRORS (SDMAE) OBTAINED BY CFWK AND FWK
http://www.ofgem.gov.uk/networks/elecdist/Icnf/pages/Icnf.aspx