Generative Adversarial Network For Synthetic Time Series Data Generation in Smart Grids
Abstract—The availability of fine grained time series data is a pre-requisite for research in smart grids. While data for transmission systems is relatively easily obtainable, issues related to data collection, security and privacy hinder the widespread public availability of such datasets at the distribution system level. This has prevented the larger research community from effectively applying sophisticated machine learning algorithms to significantly improve the distribution-level accuracy of predictions and increase the efficiency of grid operations. Synthetic dataset generation has proven to be a promising solution for addressing data availability issues in various domains such as computer vision, natural language processing and medicine. However, its exploration in the smart grid context remains unsatisfactory. Previous works have tried to generate synthetic datasets by modeling the underlying system dynamics: an approach which is difficult, time consuming, error prone and oftentimes infeasible in many problems. In this work, we propose a novel data-driven approach to synthetic dataset generation by utilizing deep generative adversarial networks (GAN) to learn the conditional probability distribution of essential features in the real dataset and generate samples based on the learned distribution. To evaluate our synthetically generated dataset, we measure the maximum mean discrepancy (MMD) between real and synthetic datasets as probability distributions, and show that their sampling distance converges. To further validate our synthetic dataset, we perform common smart grid tasks such as k-means clustering and short-term prediction on both datasets. Experimental results show the efficacy of our synthetic dataset approach: the real and synthetic datasets are indistinguishable by solely examining the output of these tasks.

I. MOTIVATION

The lack of fine grained distribution system data is a significant bottleneck preventing the community from developing novel data science and machine learning solutions for smart grid applications such as load forecasting [1], dynamic demand response [2], behind-the-meter disaggregation [3] and so on. Although efforts exist to make public datasets available [4][5], they are limited due to the following reasons:
• Availability of Data: ISOs and RTOs regularly publish data regarding transmission level grid operations online for public use [6][7]. However, no such framework exists for distribution systems. Hence, the only readily available distribution system level datasets are through the efforts of various researchers [8], which are limited in scope.
• Scale of Data: Several machine learning based models require vast amounts of data for training. The limited scope and size of the available public datasets prevents the application of sophisticated models to obtain highly accurate results.
• Privacy of Data: Distribution system data obtained from AMI meters contains Personally Identifiable Information (PII), and sophisticated algorithms are required to anonymize the data as per the regulations [9]. This further prevents the large scale availability of such datasets.

An intuitive way to approach these problems is to generate synthetic datasets that enable researchers to develop novel data-driven models, while maintaining real dataset privacy. Classic approaches involve modeling the underlying causes of the observed dataset and generating model-based synthetic data. In [10], the authors propose a multi-segment Markov chain model of the solar states and generate synthetic states using this model. In [11], the author proposes to train autoregressive models and use theta-join for generating smart meter data. However, their approach requires hand-crafted features such as fluctuation flattening and time series deseasonalizing. Accurate modeling of the underlying causes is a daunting task. It requires us to make several assumptions (for example, the Markovian property) which are not necessarily true, thus affecting the reliability of the synthetically generated data.

Potential applications in smart grid that can benefit from large scale synthetic data include behind-the-meter solar disaggregation [3], real-time smart grid system simulation [12], etc. In the behind-the-meter solar disaggregation problem, large scale datasets are required to train and validate machine learning models, yet many of them are not available due to privacy issues. Real-time smart grid system simulation also requires large scale datasets that reflect certain system behavior, which is often limited due to the lack of fine-grained meters.

In this work, we develop a novel data-driven approach for generating synthetic smart grid data by directly ‘learning’ the probability distribution of the real time-series data using a deep Generative Adversarial Network (GAN) model. While GANs have been used to effectively synthesize cutting-edge “fake” images and audio [13], they have not hitherto been used for smart grid data due to various underlying challenges in distinguishing data patterns (seasonality, short/long term, customer behavior, prosumer etc.). Our work is based on the following insight: we observe that smart grid time series data can be separated into two distinct statistical components, Level and Pattern, where Level determines high-level
statistical attributes such as mean, scale and variance while Pattern determines the real trend. By normalizing the Level of different users, the Pattern of long-term periodic time series data can be modeled as a conditional probability distribution conditioned on the actual date. We show that this probability distribution can be easily “learned” using GAN. The main contributions of this paper are as follows:
• We first develop a probabilistic model to abstract significant characteristics inherent in smart grid time series datasets.
• We then develop a conditional GAN to learn the probability distribution of the real dataset in order to generate synthetic datasets which are indistinguishable under statistical tests. To the best of our knowledge, this is the first effort that uses deep GANs in smart grid.
• We evaluate the effectiveness of the generated synthetic datasets by performing both statistical tests and classic machine learning tasks, including timeseries clustering and load prediction, and showing that the results are indistinguishable from those on the real dataset.

II. BACKGROUND

A. Target of Smart Grid Dataset Definition

The immense heterogeneity in smart grid data makes it highly unlikely that a single model could be used for synthesizing the datasets. Hence, any discussion on synthetic dataset generation is incomplete without a precise definition of the targeted datasets. The datasets that we target in this work can be broadly defined as “timeseries data conditioned on smart grid”. More precisely, we focus on the datasets which can be modeled as a timeseries. Moreover, the underlying processes generating the dataset should be defined or affected by the smart grid under consideration. This implies that data generated using natural processes such as temperature, solar irradiance etc. are not our focus. Moreover, event based data [14] such as on/off times of appliances, plug-in/plug-out times of EVs etc. are also not a focus.

The above definition allows us to model (trivially) a wide range of datasets such as uncontrolled load: affected by customer behavior patterns which are conditional on economic features etc.; PV generation: affected by solar irradiance assuming no forced curtailment [15] and conditional on PV module number, size and efficiency; aggregate generation in the grid: affected by smart grid demand and thus conditional on the smart grid in consideration; electricity prices: affected by market conditions and thus conditional on the smart grid in consideration; etc. We can also model controlled load/generation (e.g. due to Demand Response [16], load/solar curtailment etc. [15]) by adding additional conditional variables to denote the control. For example, if a building has two different consumption profiles, one under normal conditions and one under DR, then an additional binary conditional variable can be used to denote whether the building is in DR or not.

B. Generative Adversarial Network

GAN [13] is a deep generative model that can implicitly capture any differentiable probability distribution and provide a way to draw samples from it. Assuming some prior distribution z ∼ p_z(z), we would like to learn a generator function G such that G(z) ∼ p_data(x). To achieve this goal, we introduce a discriminator function D and let D and G play the following two-player minimax game with value function V(G, D):

min_G max_D V(G, D) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]   (1)

An intuitive explanation of the objective function is that the generator is trying to produce fake samples while the discriminator is trying to detect the counterfeits. The competition in this game drives both the generator and the discriminator to improve until the fake samples are indistinguishable from the real data.

In practice, the functions G and D are often approximated using deep neural networks. It is proven in [13] that, given a fixed discriminator D, minimizing the value function in Eq. 1 with respect to the generator parameters is equivalent to minimizing the Jensen-Shannon divergence between p_data(x) and the distribution of G(z). In other words, as training progresses, the implicit distribution that G(z) captures converges to p_data(x). In this work, we use a variant GAN architecture known as Conditional GAN [17] to learn the conditional probability distribution of time series data.

C. Evaluating Synthetic Datasets

As it is not possible to mathematically prove that the real samples and the synthetic samples come from the identical distribution, we perform statistical tests and use classic machine learning algorithms to empirically show:
1) Real time series and synthetic time series share key statistical properties.
2) Real time series and synthetic time series cannot be distinguished by the outcomes of these machine learning algorithms.
Ultimately, the purpose of the synthetic data is to serve as supplemental training data for machine learning solutions or as substitute data to preserve the privacy of the original dataset.

1) Statistical Tests: Maximum Mean Discrepancy (MMD) [18] measures the distance between two probability distributions by drawing samples. Given samples {x_i}_{i=1}^N ∼ p(x) and {y_j}_{j=1}^M ∼ q(y), an estimate of MMD is:

MMD = [ (1/N²) Σ_{i=1}^{N} Σ_{j=1}^{N} K(x_i, x_j) − (2/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} K(x_i, y_j) + (1/M²) Σ_{i=1}^{M} Σ_{j=1}^{M} K(y_i, y_j) ]^{1/2}   (2)

where K(x, y) = exp(−‖x − y‖² / (2σ²)) is known as the radial basis function (RBF) kernel.
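The MMD estimate in Eq. 2 can be sketched in a few lines of NumPy. The kernel bandwidth σ and the toy Gaussian samples below are illustrative assumptions, not values used in the paper:

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd(x, y, sigma=1.0):
    # Sample estimate of Maximum Mean Discrepancy, matching Eq. 2:
    # sqrt( (1/N^2) sum K(x,x) - (2/MN) sum K(x,y) + (1/M^2) sum K(y,y) )
    n, m = len(x), len(y)
    k_xx = rbf_kernel(x, x, sigma).sum() / n ** 2
    k_xy = rbf_kernel(x, y, sigma).sum() * 2 / (m * n)
    k_yy = rbf_kernel(y, y, sigma).sum() / m ** 2
    return np.sqrt(max(k_xx - k_xy + k_yy, 0.0))

rng = np.random.default_rng(0)
same = mmd(rng.normal(0, 1, (500, 2)), rng.normal(0, 1, (500, 2)))
diff = mmd(rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2)))
print(same < diff)  # samples from the same distribution give a smaller MMD
```

As the synthetic distribution converges to the real one during training, this estimate drops toward zero, which is the behavior tracked in the experiments.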
[Figure: the proposed pipeline — Preprocessing: Real Time Series → Level Normalization; Learning: learn the conditional probability distribution with a Generative Adversarial Network; Postprocessing: Sample → Level Recovery → Synthetic Time Series.]
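The Level normalization and recovery steps of the pipeline can be sketched as follows. Treating Level as a per-user mean and scale is a minimal illustrative choice consistent with the statistics named in the text (mean, scale, variance), not the paper's exact procedure:

```python
import numpy as np

def normalize_level(series):
    # Preprocessing: remove the per-user Level (mean and scale) so the
    # GAN only has to learn the Pattern; keep the statistics for recovery.
    mu, sigma = series.mean(), series.std()
    return (series - mu) / sigma, (mu, sigma)

def recover_level(pattern, level):
    # Postprocessing: re-apply the stored Level to a sampled Pattern.
    mu, sigma = level
    return pattern * sigma + mu

load = np.array([2.0, 4.0, 6.0, 4.0])      # toy consumption series
pattern, level = normalize_level(load)
restored = recover_level(pattern, level)
print(np.allclose(restored, load))  # True: recovery inverts normalization
```

The normalized Pattern is what the GAN models; the stored Level statistics restore each synthetic series to a realistic magnitude.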
the Pattern information. Then, the conditional probability distribution is denoted as

x′_{u,t} ∼ p(x′_{u,t} | day of week, month, user)   (4)

Each user has different behavior. Thus the time series produced by different users are sampled from different distributions. Given training data from N different users {x_{u,t}}_{t=1}^T, where u = 1, 2, · · · , N, the goal is to train a generator function G that can produce samples subject to distribution p without explicitly modeling or calculating p.

The generator draws samples from the prior distribution, feeds them into a 3-layer 1D transpose convolutional network and produces the synthetic output. The discriminator is a 3-layer 1D convolutional network that takes real or synthetic data, the day vector and the month vector as input, and produces a label of whether the input is real or synthetic.

C. Machine Learning Algorithms for Evaluation

For each user u = 1, 2, · · · , N in the real dataset {x_{u,t}}_{t=1}^T, we generate synthetic data {x̂_{u,t}}_{t=1}^T of the same time length (4 years at 15 minute intervals in the Pecan Street Dataset).
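The day and month conditioning of Eq. 4 can be encoded as one-hot vectors concatenated with the noise input before it enters the generator. The vector sizes, the noise dimension and the concatenation scheme here are assumptions for illustration (per-user conditioning is omitted):

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def gan_input(z, day_of_week, month):
    # Conditional GAN input: prior noise z concatenated with a one-hot
    # day-of-week vector (7) and a one-hot month vector (12), mirroring
    # the conditioning on (day of week, month) in Eq. 4.
    day_vec = one_hot(day_of_week, 7)       # 0..6
    month_vec = one_hot(month - 1, 12)      # months 1..12
    return np.concatenate([z, day_vec, month_vec])

z = np.random.default_rng(0).normal(size=100)   # assumed noise dimension
x = gan_input(z, day_of_week=2, month=7)
print(x.shape)  # (119,)
```

The discriminator receives the same day and month vectors alongside the real or synthetic series, so both players see the condition.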
(a) User 93 Real Consumption and Generation

Fig. 4: Statistics during Training. (a) Maximum Mean Discrepancy. (b) Training Loss.

• d: the order of differencing, i.e. the number of times the series is differenced to make it stationary.
• q: the moving average order that denotes the number of lagged forecast errors included in the model.
V. EXPERIMENTAL RESULTS

A. Statistical Property Analysis
(b) Synthetic consumption during day time and night time
Fig. 5: Four years average day time and night time consumption of a user. Day time is 6am to 6pm. Night time is 12am to 6am plus 6pm to 12am.

TABLE I: K-means clustering prediction results

Train       Test        F1 Score
Real        Synthetic   1.00
Synthetic   Real        1.00
Mixed       Mixed       0.96

(b) K-means centroids fit on synthetic data
(c) K-means centroids fit on mixed data
Fig. 6: K-means centroids of various settings. We only show a one-week curve for demonstration.

We plot the four years average day time and night time consumption of a user in Figure 5. As shown in Figure 5, the synthetic data captures the general pattern shown in the real data. However, the synthetic data is noisier than the real data.
• Solar Generation Noise: The solar generation is proportional to the solar radiation when it is sunny. The noise (glitches) on the solar generation curve is caused by clouds or rain during the sampling period. We notice that the synthetic data automatically captures this feature, as shown in Figure 2.
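The day-time and night-time averages plotted in Figure 5 can be computed from a 15-minute-interval series as follows. This is a sketch of the aggregation only; the window boundaries follow the 6am–6pm definition in the caption:

```python
import numpy as np

def day_night_averages(daily_load):
    # daily_load: one day of consumption at 15-minute resolution (96 samples).
    # Day time is 6am-6pm (samples 24..71); night time is the remainder
    # (12am-6am plus 6pm-12am), matching the Figure 5 definition.
    samples = np.asarray(daily_load)
    assert samples.shape == (96,)
    day = samples[24:72].mean()
    night = np.concatenate([samples[:24], samples[72:]]).mean()
    return day, night

load = np.ones(96)
load[24:72] = 3.0            # toy profile: higher consumption in daylight
day, night = day_night_averages(load)
print(day, night)  # 3.0 1.0
```

Averaging these per-day values over the four-year horizon yields the curves compared between real and synthetic users.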