See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/316454996

WHITED - A Worldwide Household and Industry Transient Energy Data Set

Conference Paper · May 2016


All content following this page was uploaded by Matthias Kahl on 25 April 2017.

WHITED - A Worldwide Household and Industry
Transient Energy Data Set
Matthias Kahl, Anwar Ul Haq, Thomas Kriechbaumer and Hans-Arno Jacobsen
Technische Universität München
Email: [email protected]

Abstract: In this paper, we introduce a data set of appliance start-up measurements from several locations. The appliances were recorded with a low-cost custom sound card meter. The recording was mainly done in households and small industry settings in different regions around the world. Thus, it may be possible to extract region-specific grid characteristics from the voltage waveforms in the data. To cover all corresponding transients, we recorded the first 5 seconds of the appliance start-ups for 110 different appliances to date, amounting to 47 different appliance types. The aim of this data set is to provide a broad spectrum of different appliance types in regions around the world.

TABLE I
COMPARISON OF DATA SETS WITH HIGH FREQUENCY APPLIANCE TRACES

Data Set      Bit   Fs       Appliance Classes   Appliance Variety   Purpose
REDD [5]      24    15 kHz   ∼20                 10                  house demand
BLUED [2]     16    12 kHz   ∼30                 ∼1                  house demand
UK DALE [4]   20    16 kHz   ∼40                 ∼1-3                house demand
PLAID [8]     16    30 kHz   ∼12                 ∼20                 transients
HFED [7]      16    5 MHz    15                  1                   spectral traces
WHITED        16    44 kHz   46                  1-9                 transients

I. INTRODUCTION

A trend towards the incorporation of green energy technologies into the existing grid is on the rise. Much of the renewable energy is generated and utilized at the consumer end. For efficient integration of highly weather-dependent distributed resources, consumers need to play an effective role. They also need to be incentivized by providing them with information about their real-time energy consumption, preferably at the level of individual appliances. The importance of handling energy consumption is therefore growing and has even led to new fields of research like non-intrusive load monitoring (NILM) [1]. NILM relies on advances in computer science by realizing abstractions to help retrieve knowledge from energy consumption data. Applications are found in energy measurement scenarios including household energy demand, appliance transients, and reduction of electro-magnetic interferences (EMI) [2]–[10]. Once such abstractions are recorded, they constitute the data sets that provide useful information about consumer behavior by breaking down energy consumption for each appliance.

Although NILM has been around for almost three decades, it was not until recently that this field started to flourish. Apart from recent advances in machine learning techniques, one of the major contributions was the release of the open energy data set REDD [5]. Although many open data sets have been released since then, we believe that there is still a scarcity of publicly available energy data sets in the high frequency domain. As stated by K. C. Armel et al. in [11], a high sampling rate in electricity signal measurements enables us to accurately distinguish between more appliances. Unfortunately, most of the time, high frequency data acquisition hardware for transient analysis is expensive. In the UK-DALE data set, the authors have demonstrated the use of an on-board sound card to record the electricity signals at high frequency [4]. Sound cards have been around for years with high resolution analog-to-digital converters. Many lossless compression algorithms are available for audio data to reduce the data size. For our measurements, we used the stereo line input of an external USB sound card, since most modern laptops lack a dedicated stereo line-in port.

We have collected a general-purpose and freely accessible data set, using an affordable and portable sound card measurement system. This paper presents the sound card measurement system based on simple assembly instructions, and it demonstrates several characteristics of the collected data set. In future work, we plan to extend the data set with additional measurements by drawing on crowd sourcing mechanisms. One of our main insights is that even low-cost hardware performs well for NILM purposes.

II. RELATED WORK

Several public data sets covering appliance-level energy consumption already exist. The purpose of these data sets is to measure demand in private households through a non-intrusive single point measurement in either low or high frequency. Through constant observation of household energy demand, these data sets provide comprehensive long-term measurements to cover user behavior in the corresponding residence. These data sets are a good source for power disaggregation tasks as they indirectly provide transient start-up features at an appliance level. Real-world scenario data sets include REDD [5], UK DALE [4] and BLUED [2] among others.

When looking in more detail at appliance transients, it can be cumbersome to extract them from these single measurements. Since the ground truth is mostly based on 1 s to 6 s data without explicit voltage or current waveforms, it might be possible that two start-ups fall in the same time window, thus,
violating the assumption of the switch continuity principle (SCP) [12]. Therefore, it is helpful to take a closer look at transient-focused data sets such as PLAID [8] and HFED [7]. PLAID examines start-up transients at 30 kHz, whereas HFED observes short transient spectral traces of up to 5 MHz, which require high effort in terms of hardware and experimental setup to reproduce.

Fig. 1. Measurement equipment prototype

Fig. 2. Start-up of four different appliances (Light Bulb 100W, Multi Tool 135W, Stick Blender 500W, Toaster 750W), shown as normalized current over time in s. The different in-rush current characteristics are clearly visible.
Table I gives a comparison between the above-mentioned high frequency data sets in terms of resolution, purpose, number of appliance types (classes) and quantity of appliances for each class (variety). The information about the appliance types and quantity is inferred from the available data. We believe that a high intra-class variety leads to a more reliable result in terms of appliance classification.

With WHITED, a Worldwide Household and Industry Transient Energy Data set, we want to contribute to existing energy data sets in terms of higher sampling frequency and a higher number of appliance types and variety. In addition, we provide a region classification for each measurement to potentially enable the investigation of region-specific research questions.

III. ARCHITECTURE

In this section, we describe the hardware and software components of our measurement equipment, which is based on a sound card as an inexpensive analog-to-digital converter. The idea of a sound card-based measurement system is not new and was already used in [4] and [13]. Sound cards have a very good price vs. performance ratio when used as analog-to-digital converters. Our measurement prototype is based on a modified 3-port extension cord, a current clamp, an AC-AC transformer, a voltage divider, and an external USB sound card with a Cmedia CM6206 chipset.

A. Hardware Design

For measuring the current, we use a YHDC current clamp with a built-in burden resistor. This current clamp produces a 1 V signal at 30 A primary current. For the voltage measurements, we need to transform the grid voltage from 230 V to 11 V with the AC-AC transformer. To have a corresponding voltage signal that lies in the line-in range of the sound card, we reduce it with a voltage divider to 0.47 V. The voltage divider is located in the black isolation part that merges the current and voltage signal cables into one cable that goes into the sound card. See Fig. 1 for the complete configuration.

B. Measurements

The signals were recorded at 44.1 kHz temporal and 16 bit amplitude resolution. To be able to take multiple measurements in different places, it was necessary to build 3 identical measurement kits. Therefore, we also have to deal with three slightly different sets of calibration factors. The calibration itself is done with a VOLTCRAFT VC-330 multimeter. Since the multimeter provides current measurements with a current clamp, it was possible to measure both signals, voltage and current, and define a voltage and current calibration factor for each measurement kit. Some sample measurements can be seen in Fig. 2 for 4 different appliances.

To capture the start-up transients of the appliances, they have to be detected on demand. This is implemented with a Matlab routine that uses the internal DSP package to monitor the line-in signal of the sound card. The start-up is defined based on the current signal energy crossing a threshold. If the current signal energy exceeds the threshold, the routine starts recording and prepends the preceding 100 ms of the signal as a pre-start-up window. This window allows difference-based algorithms to work effectively. That means that not the absolute power consumption at the start-up but the difference between the power of the pre-start-up window and the start-up power can be observed. This approach introduces more flexibility for developing algorithms that allow the recognition of concurrently running appliances with different start-up transients.

We decided to measure 10 start-ups for each appliance. These start-ups were triggered manually by the user. Appliances that have no switch (e.g., an iron) were just plugged and unplugged 10 times as it would be the case under real usage. The appliances are measured for 5 seconds, which is the duration of each start-up we recorded.

TABLE II
APPLIANCE TYPES (CLASSES) THAT WERE MEASURED

AC                 1    Air Pump             1    Bench Grinder      1
CFL                2    Charger              7    Coffee Machine     1
Deep Fryer         1    Desktop PC           1    Desoldering tool   1
Drilling Machine   2    Fan                  6    Fan Heater         1
Flat Iron          2    Game Console         4    Guitar Amp         1
Hair Dryer         6    Halogen Fluter       1    Heater             1
HiFi Rack          1    Iron                 3    Jigsaw             1
JuiceMaker         1    Kettle               6    Laptop             1
Laserprinter       1    LED Light            9    Light bulb         6
Massage tool       3    Microwave            2    Mixer              4
Monitor            2    Mosquito Repellent   1    Multitool          1
Powersupply        4    Projector            1    Sewing Machine     1
Shoe warmer        2    Shredder             2    Soldering Iron     2
Toaster            4    Treadmill            1    TV                 1
Vacuum Cleaner     4    Washing Machine      1    Water Heater       4
Water Pump         1

C. Data Set

To date, our data set comprises 1100 different records for 110 different appliances which can be grouped into 47 different types (classes) in 6 different regions. For most appliances, we took a photo of the electrical specification label. These images are located in the sub-folder images and type-labels. Table II gives an overview of the measured appliances. The files containing the signals are saved as flac files, a common lossless audio file format. The file names contain meta information and are of the following format:

[Class]_[Name]_[Region]_[#Kit]_[TimeStamp].flac
GuitarAmp_Marshall8240_R3_MK2_20151115133402.flac
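The file name convention of Section III-C lends itself to automatic parsing. The following Python sketch is only an illustration and not part of the published tooling: the names RecordInfo and parse_record_name are hypothetical, and the code assumes that no field contains an underscore and that the time stamp is laid out as YYYYMMDDhhmmss, which is inferred from the single example file name above.

```python
import re
from datetime import datetime
from typing import NamedTuple


class RecordInfo(NamedTuple):
    """Metadata encoded in a WHITED-style file name (hypothetical helper type)."""
    appliance_class: str
    name: str
    region: str
    kit: str
    timestamp: datetime


# Five underscore-separated fields followed by the .flac extension.
# Assumption: no field contains an underscore and the time stamp has 14 digits.
_NAME_PATTERN = re.compile(
    r"^(?P<cls>[^_]+)_(?P<name>[^_]+)_(?P<region>[^_]+)"
    r"_(?P<kit>[^_]+)_(?P<ts>\d{14})\.flac$"
)


def parse_record_name(file_name: str) -> RecordInfo:
    """Split a file name into class, name, region, kit and time stamp."""
    match = _NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    return RecordInfo(
        appliance_class=match["cls"],
        name=match["name"],
        region=match["region"],
        kit=match["kit"],
        # Assumed layout: YYYYMMDDhhmmss.
        timestamp=datetime.strptime(match["ts"], "%Y%m%d%H%M%S"),
    )


info = parse_record_name("GuitarAmp_Marshall8240_R3_MK2_20151115133402.flac")
print(info)
```

Keeping the region and kit fields as plain strings avoids guessing at their internal structure; they can be used directly as labels, for example when grouping records by region.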

The data set is freely available on the following web page: http://bit.ly/WHITED-Set. For demand, load and appliance information retrieval, the most important signal is the current. To give the voltage signal a higher significance, we decided to measure the voltage in several regions that follow the European grid standards. To this end, the data set contains 4 regions in Germany, 1 in Austria, and 2 in Indonesia.

Since grid characteristics are mainly affected by utilities and the consumption characteristics of the surrounding area, a future research direction is to look for possibilities to determine the region from the voltage signal. This experiment is a classification task similar to the appliance recognition we have already implemented.

IV. EVALUATION

To ensure the quality of the data set, we applied several signal quality checks and conducted two classification experiments.

A. Data Quality

Since sound cards do not provide a high level of linearity in frequency response as compared with professional ADCs, we verified that there is no significant impact on the measurements taken.

The sound card manufacturer provides some information regarding the line-in linearity which can be seen in Fig. 4. It is visible that the strongest damping of around 0.25 dB has its maximum at 3320 Hz. The steepest flank has a bandwidth of around 3300 Hz and lies between 3320 Hz and 6622 Hz, which is acceptable for most considered purposes.

Fig. 4. The line-in frequency response from the CM6206 specification [14].

To obtain an approximation of the noise level during recording, the energy of a 10 second empty signal is compared to the energy of a maximum amplitude sine-wave signal. With this calculation, we estimate an effective SNR (signal to noise ratio). We measured an average noise RMS of 4.8 mA where 30 A corresponds to the RMS maximum.

\[ SNR = 20 \cdot \log_{10} \frac{RMS_{max}}{RMS_{noise}} \]

\[ SNR = 20 \cdot \log_{10} \frac{30\,\mathrm{A}}{0.0048\,\mathrm{A}} = 75.91\,\mathrm{dB} \]

The effective SNR of this measurement system is 75.91 dB. The maximum measurable peak to peak current $I_{p-p}$ is $30.0\,\mathrm{A_{RMS}} \cdot 2\sqrt{2} = 84.4\,\mathrm{A}$. Therefore, we calculate an effective current resolution with a step size of 13.5 mA.

\[ I_{step} = \frac{I_{p-p}}{I_{maxRMS}} \cdot I_{noiseRMS} \]

\[ I_{step} = \frac{84.4\,\mathrm{A}}{30.0\,\mathrm{A}} \cdot 0.0048\,\mathrm{A} = 0.0135\,\mathrm{A} \]

This current step size enables us to calculate the effective power step size $P_{step}$ corresponding to 230 V of grid voltage.

\[ P_{step} = 230\,\mathrm{V} \cdot 0.0135\,\mathrm{A} = 3.1\,\mathrm{W} \]

The resolution and noise of the sound card allow a voltage step of 0.313 V and a current step of 0.0135 A, which results in a measurable power step of around 3.1 W based on 230 V. To achieve reliable results, only appliances with a consumption of at least 20 W are considered in our data set. This covers most household and small industry appliances.

Fig. 3a and 3b show a spectrogram of a mixer and a multi-tool based on the first 5 seconds after the start-up. Both appliances have a fast spinning motor and look similar in the time domain. However, there are significant differences in the spectral domain that can be transformed into distinguishable features for appliance classification purposes.

Fig. 3. Comparison of 2 appliances that use motors with relatively high rotations per minute: (a) Mixer, (b) Multi-tool. The different spectral characteristics are clearly visible. The multi-tool has stronger uneven harmonics while the harmonics are more equal in the case of the mixer.

B. Experimental Results

Our appliance recognition experiment is based on a classification task to distinguish appliances by their characteristics in the current signal. The classifier has to distinguish between all 47 appliance types. The classification experiment is implemented in Matlab. All flac files are imported and the contained signal is scaled with the corresponding calibration factors to determine actual values. After this preprocessing step, a region of interest (ROI) needs to be extracted. Here, we decided to cut the signal from the start-up until 500 ms after the start-up. These 500 ms samples are given to the feature extraction stage, which is an implementation of 13 different characteristics including harmonics, phase shift and total harmonic distortion (THD).

The best results we achieved for the appliance classification were based on a feature set that consisted of a period-based power trend with 25 dimensions, and the THD and crest factor of the current spectrum with 1 dimension each. With these three features in 27 dimensions, we achieve an average classification accuracy across all appliances of around 95 % with a 10-fold cross-validation and a support vector machine (SVM) classifier. This confirms the observation that power difference and harmonics contain sufficient information to distinguish among basic electrical appliances [15].

For the region classification experiment, we use the same environment but employ the voltage instead of the current for the feature extraction. The labels are not the appliances but the region where the measurements were taken. We apply the voltage, grid frequency and a few spectral- and waveform-based features. We obtain an almost perfect classification accuracy of 99.13 % with an SVM classifier. Here, we must consider that the feature extraction is based on characteristics that vary over time and are not independently representative for the corresponding region.

V. CONCLUSIONS

In this work, we publish a data set comprised of a broad range of household and small industry appliance start-up transients. As discussed, we believe that there is still a need for this kind of measurement. The purpose of this paper is to show that even a low-budget, custom measurement system allows one to retrieve significantly discriminating features from appliance start-up transients to enable appliance classification.

We aim to continue expanding our data set by involving the community, and we hope that more individuals world-wide will contribute measurements based on the measurement system specification. Location recognition based on the voltage signal needs further observations of grid characteristics in each region to be able to distinguish between regions. The reason is that grid characteristics like frequency and voltage are constantly changing. Short duration measurements do not provide sufficient information about the stability of such grid characteristics.

REFERENCES

[1] G. W. Hart, “Nonintrusive appliance load monitoring,” Proceedings of
the IEEE, vol. 80, no. 12, pp. 1870–1891, 1992.
[2] K. Anderson, A. Ocneanu, D. Benitez, D. Carlson, A. Rowe, and
M. Berges, “Blued: A fully labeled public dataset for event-based non-
intrusive load monitoring research,” in Proceedings of the 2nd KDD
workshop on data mining applications in sustainability (SustKDD),
2012, pp. 1–5.
[3] C. Beckel, W. Kleiminger, R. Cicchetti, T. Staake, and S. Santini, “The
eco data set and the performance of non-intrusive load monitoring
algorithms,” in Proceedings of the 1st ACM Conference on Embedded
Systems for Energy-Efficient Buildings, 2014, pp. 80–89.
[4] J. Kelly and W. Knottenbelt, “The uk-dale dataset, domestic appliance-
level electricity demand and whole-house demand from five uk homes,”
2015. [Online]. Available: http://www.doc.ic.ac.uk/~dk3810/data/
[5] J. Z. Kolter and M. J. Johnson, “Redd: A public data set for energy
disaggregation research,” in Workshop on Data Mining Applications in
Sustainability (SIGKDD), San Diego, CA, vol. 25. Citeseer, 2011, pp.
59–62.
[6] A. Monacchi, D. Egarter, W. Elmenreich, S. D’Alessandro, and A. M.
Tonello, “Greend: An energy consumption dataset of households in Italy
and Austria,” CoRR, vol. abs/1405.3100, 2014.
[7] M. Gulati, S. Sundar Ram, and A. Singh, “An in depth study into
using EMI signatures for appliance identification,” in Proceedings of the
First ACM International Conference on Embedded Systems For Energy-
Efficient Buildings. ACM, 2014.
[8] J. Gao, S. Giri, E. C. Kara, and M. Bergés, “Plaid: a public dataset of
high-resoultion electrical appliance measurements for load identification
research: demo abstract,” in Proceedings of the 1st ACM Conference on
Embedded Systems for Energy-Efficient Buildings. ACM, 2014, pp.
198–199.
[9] A. Veit, C. Goebel, R. Tidke, C. Doblander, and H.-A. Jacobsen,
“Household electricity demand forecasting: Benchmarking state-of-the-
art methods,” in 5th International Conference on Future Energy Systems,
ser. e-Energy ’14. New York, NY, USA: ACM, 2014, pp. 233–234.
[10] H. Ziekow, C. Doblander, C. Goebel, and H.-A. Jacobsen, “Forecasting
household electricity demand with complex event processing: Insights
from a prototypical solution,” in 13th ACM/IFIP/USENIX International
Middleware Conference, ser. Middleware Industry’13. New York, NY,
USA: ACM, 2013, pp. 2:1–2:6.
[11] K. C. Armel, A. Gupta, G. Shrimali, and A. Albert, “Is disaggregation
the holy grail of energy efficiency? the case of electricity,” Energy Policy,
vol. 52, pp. 213–234, 2013.
[12] S. Makonin, “Investigating the switch continuity principle assumed in
non-intrusive load monitoring (nilm),” 2016.
[13] F. Englert, T. Schmitt, S. Kößler, A. Reinhardt, and R. Steinmetz,
“How to auto-configure your smart home?: high-resolution power mea-
surements to the rescue,” in Proceedings of the fourth international
conference on Future energy systems. ACM, 2013, pp. 215–224.
[14] CM6206 High Integrated USB Audio I/O Controller, Rev. 2.1 ed.,
C-Media Electronics Inc. [Online]. Available:
http://www.bramcam.nl/NA/8663-XS/CM6206.pdf
[15] K. N. Trung, O. Zammit, E. Dekneuvel, B. Nicolle, C. N. Van, and
G. Jacquemod, “An innovative non-intrusive load monitoring system
for commercial and industrial application,” in Advanced Technologies
for Communications (ATC), 2012 International Conference on. IEEE,
2012, pp. 23–27.
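The start-up trigger described in Section III-B (an energy threshold on the current signal plus a 100 ms pre-start-up window) can be sketched in a few lines of Python. This is a simplified offline illustration under stated assumptions, not the Matlab routine used for the recordings: the function names, the 20 ms frame length, the threshold value and the choice to count the 5 s from the detected onset are all hypothetical.

```python
import math
from typing import List, Optional, Sequence


def find_start_up(current: Sequence[float], sample_rate: int,
                  threshold: float, frame_ms: float = 20.0) -> Optional[int]:
    """Return the start index of the first frame whose mean-square
    energy exceeds the threshold, or None if no frame does."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    for start in range(0, len(current) - frame_len + 1, frame_len):
        frame = current[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            return start
    return None


def cut_record(current: Sequence[float], sample_rate: int, threshold: float,
               pre_ms: float = 100.0, duration_s: float = 5.0) -> List[float]:
    """Cut one start-up out of a longer signal, including the pre-start-up
    window. Assumption: duration_s is counted from the detected onset."""
    onset = find_start_up(current, sample_rate, threshold)
    if onset is None:
        return []
    begin = max(0, onset - int(sample_rate * pre_ms / 1000))
    end = min(len(current), onset + int(sample_rate * duration_s))
    return list(current[begin:end])


# Synthetic example: 300 ms of silence followed by a 50 Hz sine wave.
fs = 1000
signal = [0.0] * 300 + [math.sin(2 * math.pi * 50 * n / fs) for n in range(700)]
onset = find_start_up(signal, fs, threshold=0.01)
record = cut_record(signal, fs, threshold=0.01, duration_s=0.5)
print(onset, len(record))
```

Because the record keeps the quiet samples before the onset, a later processing stage can subtract the pre-start-up power from the start-up power, which is the difference-based view the text describes.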

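The data quality figures of Section IV-A can be re-derived with a short Python sketch. The constant and function names are illustrative only. The current and power steps below use the peak-to-peak value of 84.4 A as printed in the text; evaluating 30 A · 2√2 directly gives roughly 84.85 A, which would move the current step to about 13.6 mA while the power step stays near 3.1 W.

```python
import math

RMS_MAX_A = 30.0          # RMS maximum of the current clamp (from the text)
RMS_NOISE_A = 0.0048      # measured average noise RMS (from the text)
GRID_VOLTAGE_V = 230.0    # nominal grid voltage (from the text)
I_PP_PRINTED_A = 84.4     # peak-to-peak current as printed in the text


def snr_db(rms_max: float, rms_noise: float) -> float:
    """Effective signal-to-noise ratio in dB."""
    return 20.0 * math.log10(rms_max / rms_noise)


def current_step(i_pp: float, i_max_rms: float, i_noise_rms: float) -> float:
    """Effective current resolution: noise RMS scaled to peak-to-peak range."""
    return i_pp / i_max_rms * i_noise_rms


def power_step(voltage: float, i_step: float) -> float:
    """Effective power resolution at a given grid voltage."""
    return voltage * i_step


snr = snr_db(RMS_MAX_A, RMS_NOISE_A)
i_step = current_step(I_PP_PRINTED_A, RMS_MAX_A, RMS_NOISE_A)
p_step = power_step(GRID_VOLTAGE_V, i_step)

# Direct evaluation of the peak-to-peak formula, for comparison.
i_pp_direct = RMS_MAX_A * 2.0 * math.sqrt(2.0)

print(f"SNR    = {snr:.2f} dB")
print(f"I_step = {i_step * 1000:.1f} mA")
print(f"P_step = {p_step:.1f} W")
print(f"I_p-p (direct evaluation) = {i_pp_direct:.2f} A")
```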

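Two of the features named in Section IV-B, the total harmonic distortion (THD) and the crest factor, can be computed on a 500 ms region of interest as sketched below in Python. The paper does not spell out its exact definitions, so this sketch assumes the common textbook ones: THD as the ratio of the root sum of squared harmonic amplitudes (2nd to 10th) to the fundamental amplitude, and crest factor as peak over RMS of the waveform. All function names and the limit of ten harmonics are hypothetical.

```python
import cmath
import math
from typing import Sequence


def region_of_interest(signal: Sequence[float], sample_rate: int,
                       onset: int, length_ms: float = 500.0) -> Sequence[float]:
    """Return the samples from the start-up until length_ms after it."""
    return signal[onset:onset + int(sample_rate * length_ms / 1000)]


def harmonic_amplitude(signal: Sequence[float], sample_rate: int,
                       freq: float) -> float:
    """Amplitude of one frequency component via a single-bin DFT.
    Accurate when the signal spans an integer number of periods."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / sample_rate)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


def thd(signal: Sequence[float], sample_rate: int, fundamental: float,
        max_harmonic: int = 10) -> float:
    """Total harmonic distortion relative to the fundamental."""
    a1 = harmonic_amplitude(signal, sample_rate, fundamental)
    if a1 == 0.0:
        raise ValueError("no fundamental component found")
    higher = [harmonic_amplitude(signal, sample_rate, h * fundamental)
              for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(a * a for a in higher)) / a1


def crest_factor(signal: Sequence[float]) -> float:
    """Peak value divided by RMS value of the waveform."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    if rms == 0.0:
        raise ValueError("signal is silent")
    return max(abs(x) for x in signal) / rms


# Synthetic current: 50 Hz fundamental plus a 10 % third harmonic.
fs = 5000
wave = [math.sin(2 * math.pi * 50 * n / fs)
        + 0.1 * math.sin(2 * math.pi * 150 * n / fs) for n in range(fs)]
roi = region_of_interest(wave, fs, onset=0)
print(len(roi), thd(roi, fs, 50.0), crest_factor(roi))
```

The single-bin DFT keeps the example free of third-party dependencies; on real recordings at 44.1 kHz an FFT library would be the more practical choice.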