1. Introduction
To date, massive use of fossil energy has caused serious harm for society, environment, and economy. At the same time, renewable energy has received widespread attention and become an important part of modern energy due to its good cleanliness and low carbon property. However, traditional research mainly focuses on the planning, design, and operation of individual energy systems, while ignoring the coupling between different types of energy source, resulting in greatly reduced flexibility in system scheduling.
Consequently, there is a need for energy systems to evolve from individual systems with little or no interdependencies into a complex set of integrated systems on large scales. On the basis, integrated energy system (IES) is proposed, which involved electricity, natural gas, and heat. The production, transmission, and consumption of the system can be optimized by IES as a whole, which is the key of the third industrial revolution [
1]. At present, researches and works on IES can be divided into two characteristics based on application scenarios as [
2,
3] wide-area IES and industrial-park IES. Wide-area IES mainly includes the electricity transmission network and gas transmission network. Industrial-park IES involve the coordination of multiple regional energy systems on distribution network and can be seen as upgraded microgrids. The interaction between wide-area IES and industrial-park IES is shown in
Figure 1.
Numerous functions in industrial-park IES will be considerably different compared with those in a conventional power grid, especially the load forecasting function. In recent past decades, the increasing quality of electrical load forecasts in short-term is significantly important for the accuracy and economics of marketers, which guides a lot of decisions of EMS [
4]. With the wake of industry, energy forecasting will be much more significant than ever. As the interaction among a variety of energy sources become more relevant, the planning and optimization of IES will be confronted with many difficulties if there is no accurate energy forecast. Thus, efforts to allow load forecasts for diverse energy types to replace predictions of solely electrical load prediction will make an important impact on IES development.
There have been many researches involving load forecasting on electricity, including distribution network, microgrid, and even a nation. In addition, mature data mining machine learning is applied in prediction on power ranging from long-term to short-term horizon. Hong and Fan [
5] offer a considerable review on load forecasting, from the perspective of feature use, methodologies and evaluation methods. A nonlinear combined estimator is applied to forecast short-term load in [
6], which includes the whole chain of forecasting system, such as, preprocessing, forecasting model and the evaluation style. Ref. [
7] suggests an innovative load forecasting model under cyberattack situation, which conduct operators to make suitable and appropriate dispatching decision. Ref. [
8] proposes an improved multitask method for Bayesian Gaussian process by considering a wide variety of community’s energy consumption. Ref. [
9] proposes the load prediction serving for day-ahead market, which is on account of hybrid artificial neural network, using an improved differential evolution algorithm as optimization approach. Apart from electric load forecasting, forecasting for other types of loads has also been addressed. Considering the property of energy source largely differs, the forecasting models are also built from diversified perspective. Ref. [
10] presents predictions on volume of natural gas for day ahead by considering the gas change inertia, which is also faced with problems of gas dispatching issue. Ref. [
11] deploys a linear regression to predict thermal loads 24 h ahead and compares the accuracy of each model. It can be concluded that forecasting the demand of gas and thermal in advance is also a vital preparation, which is also faced with dispatching, operation, and planning issue to cope with.
Most previously developed load forecasting methods suffer from two misunderstandings:
First, majority mature learning algorithms are of the shallow structure type. For example, in most of the neural network (NN) method, only a single hidden layer is included because of the lack of proper training algorithm for normal NN with various layers.
Therefore, complicated and error-prone hand-engineered features cannot be avoided in shallow approaches, and knowledge of specific domains is required for feature analysis. For example, maximum load, minimum load, daily load variation rate, and normalized mean square are the load characteristics extracted from the historical load data in [
10].
Furthermore, different sources of energy demands are usually taken into account separately. The influence mechanisms of different load types exhibit complex relationships because of their different levels of relevance and independence, e.g., electricity and heat can be produced by a combined heat and power (CHP) system by consuming gas, electricity can be converted into heat through aground source heat pump, gas can be converted into heat through a gas-fired boiler, even further power-to-gas (P2G) functionalities can be widely used, and so on, in addition to centralized approaches to power, heat, and gas supply. The strong couplings among diversified energy loads in industrial-park IES have not yet been considered for load forecasting applications. On the other hand, the property of other energy source is dramatically different, especially for industrial use. As a result, how to make proper predictions becomes an important research topic.
Deep learning as a frontier of machine learning technology can enable the regression of complex nonlinear functions. The framework for deep learning is similar to the hierarchical structure of human brain, the core idea of deep learning algorithm is to, first, use an unsupervised learning method to extract data representation at a higher, more abstract level and, then, to use a supervised learning model for classification or regression. The differences between the widely used shallow learning algorithms and deep learning algorithms are as follows: (1) deep learning emphasizes the depth structure of model, which usually has 5 layers, 6 layers, 10 layers, or even more hidden layers and (2) deep learning also clearly highlights the importance of feature learning in building a multilevel, which means mapping relations from external low-level external signals to intricate high-level features during training layer by layer. Thus, the ability of multilayer to capture highly abstracted features from training data makes classification and regression easier. A structure comparison of shallow learning and deep learning is shown in
Figure 2.
Several corporations, including Google, Tencent, IBM, and Microsoft, have devoted big data resource to researching this novel technology, and the development of deep learning have brought about great breakthrough in speech recognition, image processing, natural language, online advertising, and other fields in recent years. In 2011, deep learning technology was approved on speech recognition and the error rate was found to decrease by 20–30% [
12]. In 2012, deep learning technology has also achieved significant success in the ImageNet competition [
13] showing that the errors rate of this algorithm was 9% lower than that of classical strategies. In 2016, the DeepMind team from Google combined Monte Carlo random tree and reinforcement learning with deep learning to develop a new artificial intelligence in the Game of Go program named “AlphaGo,” which prevailed the human professional human ninth-dan player Lee Sedol with a total score of 4:1 [
14]. Considering the effect of practical application, deep learning may be the most successful technology in the domain of artificial intelligence in recent decades. There have been a few publications concerning research on the employment of this promising technology in the field of energy systems. Refs. [
15,
16] report the application of a sophisticated deep learning technique for wind power projection. Ref. [
17] proposes that deep learning can be applied to load prediction for short term. In addition, the simulation data is from electrical power consumption from city smart gird. Thus far, the applications of this promising technology in energy research have been very limited, without fully considering the advantage of autolearning feature from complex data space.
It is hoped that the remaining shortcomings of load forecasting methods for the IES context will be resolved. The framework of the article will be shown as addressing three respects of the problem.
The advancements in computer and sensor have generated a massive amount of data with high dimensionality in terms of features. This paper fully considers economic data, historical data, weather data, and calendar to enable them to be used to the maximum possible extent.
Deep learning approaches have been effectively applied in energy demand forecasting research by highlighting the importance of feature learning in building model process, which means mapping relations from external low-level external signals to intricate high-level features during training layer by layer. These promising approaches have a feature abstraction capability that are appropriate for addressing the features without requiring careful engineering or extensive domain expertise.
Different energy demands in IES always have complex interactions because of energy conversion and consumption fashion. Simultaneous learning for multiple relevant tasks can allow more precise energy demand forecasting results to be obtained. A deep learning architecture that also incorporates multitask learning is, therefore, proposed in this paper. It is fully demonstrated that multitask learning, for the simultaneous forecasting of electrical, gas, and heat loads, outperforms individually established prediction models. We investigate the effect compared with multiple instances of single-task to test the feasibility of multitask.
The framework of the article can be presented as follows.
Section 2 introduces a basic theory on deep learning and multitask learning.
Section 3 presents the proposed two-stage load forecasting model as well as the method used to supplement missing data, the principle applied for setting the inputs and the calculation of the accuracy index for evaluation.
Section 4 describes the selection of the architecture parameters, the experimental results obtained using our approach, and the comparison with the results of other algorithms. In the end,
Section 5 shows the conclusion and further work.
3. Model
In deregulation environment, it is necessary to implement load forecasting function for energy marketers, load aggregators, and independent system operator (ISO). In addition, many of these agents may lack computing experience and access to high-performance servers. The investigation of such scenarios will require the development of a universal model that is suitable for application by a wider variety of entities. Based on the relevant application demands and algorithm requirements, we propose a two-stage short-term hourly load projection for multiple energy types. The method takes the idea of a deep multitask learning framework composed of modules for off-line training and on-line prediction. whose structure is shown in
Figure 5.
Off-line training plays an important part in guaranteeing load forecasting accuracy, which is essential for parameter optimization. The off-line segment collects load data through wide cyber-physical system to achieve optimal parameters of algorithm. According to the computational burden and data timeliness, each training dataset used in off-line training needs to span the same appropriate fixed time range. Since a deep learning algorithm requires considerable calculations and storage resources because of its special structures, which contain enormous numbers of nodes and layers compared with a conventional shallow algorithm, services related to this portion of the framework could be provided by industrial-park IES service providers or the operators of the energy distribution systems in practical application.
Meanwhile, the on-line prediction module can be configured for execution on normal computers or embedded systems. Once the parameters of the algorithm are found through off-line training, the trained program can be downloaded to a normal PC to implement the real-time portion of the algorithm. Real-time data can be metered by smart terminal, which means true previous historical load is used as input of on-line prediction.
Load is a nonstationary process due to utility growth, variations in weather or seasonal changes. Hence, a periodic updating mechanism that is sensitive to recent load trends can assist in producing better results. When the error is larger than a certain error threshold over a certain period of time or when the same parameter has been applied over a fixed time threshold, an update strategy will be triggered that will cause a new data sample to be used to reobtain the model parameters, thus serving as a rolling updating mechanism.
3.1. Preprocessing for loads Data and Normalization
Data preprocessing or data cleansing always plays a significant role in load forecasting. Influenced by factors such as measurement equipment characteristics, grid faults and energy supply constraints, the historical data collected by data acquisition devices can be expected to be subject to problems such as missing data and abnormal fluctuations. Therefore, these data need to be supplemented or revised before being used. Load data of the same type and from the same date are selected for supplementation or correction using the weighted average method:
where
xi represents the load value at the ith sampling point,
λ1 and
λ2 are the weights used for the calculation (both could be 0.5), and
xi−24 and
xi+24 represent the load values at the same time on the previous day and the next day, respectively. Finally, the historical load data series should be normalized to the interval [0, 1] as follows:
3.2. Input Properties Setting
The input variables can make maximum use of the information provided by cyber-physical systems and numerical weather forecasts. The input features include historical load data, weather data, calendar rule, and economic data. The historical load data comprise historical datasets of heat load, gas load, and electrical load. The weather data consist of temperature, humidity, and wind speed. Weekdays and holidays (weekends and public holidays) are classified by calendar rule in accordance with the public holidays observed in China. In addition, for economic data, depending on the size of the region, one can use regional GDP data, the stock prices of corporations, or other financial data. Considering that the industrial park belongs to listed company, we also add some economical index into dataset, such as stock price. The input variables can make maximum use of the information provided by cyber-physical systems and numerical weather forecasts. The setting of input properties is shown as
Figure 6.
3.3. Evaluating Indicator
Assume
r is the total quantity of days. Let
a(
k) represents the
kth actual value, and let
c(
k) represents corresponding forecasted value. Then, the MAPE and the corresponding mean accuracy (
MA) are defined as follows:
Given that various sorts of outputs are predicted by multitask model, we use the weighted mean accuracy (WMA) to evaluate the whole model performance, which is calculated using the different load forecasting errors as weights. α
k represents the weight corresponding to
MAk, which is the mean accuracy of the
kth sample.
Here, it is implied that the type of energy load on which we are most focused is that with the highest precision. Because of the dominant importance of electricity among the various energy types in the IES context, the electrical load forecasts must be given a higher weight than the other tasks in this paper. We set weights of electricity, gas, and heat as 0.6, 0.2, and 0.2, respectively.
4. Experiments and Results
The experimental dataset is collected from the industrial-park IES demonstration project of Goldwind Technology Co., Ltd., in Daxing District, Beijing, China, consisting by electricity, natural gas, and heat. The interactions among the various energy carriers in this industrial-park IES demonstration project are shown in
Figure 7. CHP, electric boiler, and gas-fired boiler are established as energy-conversion facilities to meet the demands for transformation among various energy sources. The industrial-park IES is configured by storage station to develop economic benefit as a whole. The PV and wind generator are also configured in IES to utilize the renewable energy resource. It is noteworthy that although industrial-park IES includes the industrial production module, the community also hosts office part and residential zones. Energy loads in different period of year are shown in
Figure 8.
Taking into account of seasonal fluctuation of energy load in different periods of year, two patterns of data that represent energy consumption of summer and winter are used in our experiment. The sampling interval time is 1 h. The task is to forecast the electricity, heat, and gas load of next hour.
Pattern 1 (winter): The dataset of the period of January 2014 to December 2015 is applied for training, and the dataset in January 2016 is used as test data.
Pattern 2 (summer): The dataset of the period of July 2014 to June 2016 is used as training data, and the dataset in July 2016 is used as test data.
4.1. Parameters Selection of Deep Multitask Learning Architecture
To ensure the optimal structure of the deep multitask learning algorithm to obtain results with the minimum prediction error, it is particularly important to properly choose the exact number of layers, nodes, the training iterations, and other relevant parameters. The improper framework of model leads to negative effect for model performance. In our case, data in Pattern 1 is used as experimental data for the off-line training module. These parameters are selected using the longitudinal comparison method. When testing the effect of a given parameter on the prediction results, the other parameters are held fixed. The weights number and the training time are deemed representative of the spatiotemporal complexity of the problem.
The influence of the number of layers on the WMA is presented in
Table 1. The number of nodes in one layer can be held fixed (256 nodes). As number of layers is increased from 2 to 4, the WMA is gradually improved. When the number of layers reached 4, the WMA of the test dataset is 0.9563, and the optimal number is achieved. When it is beyond 4, the WMA is decreased because the model becomes too complicated, leading to “over-fitting” and causing the spatial and temporal complexity to simultaneously increase.
The influence of the quantity of nodes in one layer on the WMA is presented in
Table 2. The number of layers was held fixed (4 layers). As the quantity of nodes is increased from 64 to 256, the WMA increases proportionally. This suggests that if we continue to reduce nodes in each layer, the model may be unable to learn representative features. When the number of nodes per layer was 256, the WMA reached its optimal value of 0.9563; as the number of nodes rise to more than 256, the additional improvement in precision become negligible, whereas the calculation time greatly increased. Thus, including more nodes will impose an unnecessary burden for model training.
When the model is trained using fewer than 600 iterations, the WMA rises for both datasets with the increase in the training iterations. When it exceeds 600, WMA for training dataset continues to increase. However, the WMA for the test dataset begins to show a continuous decline, which is also characteristic of the over-fitting phenomenon. Therefore, the best architecture parameters are found to be 4 layers, 256 nodes in each layer, and 600 training iterations.
4.2. Various Types of Load Forecasts Results
Taking into account of seasonal fluctuation of energy load in different period of year, the real load samples and forecasting data for 1 week in January or July 2016 are chosen to analyze the forecasting results and errors distribution. The time ranged from the 72nd h to the 120th h corresponds to holidays, and the remaining time corresponds to working days. The prediction results for each type of load and the hourly MAPE variations for the different types of loads are depicted in
Figure 9 and
Figure 10.
The curve of energy load appears to reflect seasonal fluctuation in different seasons. In summer, due to high temperature, a large number of air conditioners and electric fan device are put into operation. Electrical load appears to be obviously increased, which makes peak electricity consumption occur frequently. Meanwhile, the demand for heat is limited except some necessary requirements on industrial production. On the contrary, in winter, the demand for gas or heat load is greatly increased because of people’s living and industrial use needs.
It can be concluded from
Figure 9d and
Figure 10d that the accuracy is high in the rising and falling regions of the load curves. The largest errors typically appear closely the maximum and minimum points of daily loads. Moreover, the general trends for the different types of loads largely coincide regarding on-peak and off-peak periods.
From
Figure 9 and
Figure 10, we can see that the fluctuations in electric load are more severe than those in the other loads during all periods. This is because the inertia of the heat and gas loads is greater than that of the electrical load, and the diversified nature of the storage of gas and heat ensures sufficient reserves for operation. Thus, the gas and heat load forecasts have less error. The MAPEs of different pattern for the electrical load, gas load, and heat load are shown in
Table 3.
Although holidays and weekdays are clearly classified by calendar rules in the input settings, the forecasting error for holidays still appears to be larger than that for weekdays. This error mainly originates from the uncertainty in people’s activities on holidays (relaxation, entertainment, etc.). Moreover, enthusiasm for work or production may either increase or decrease on the days surrounding holidays, which lack the regularity of normal weekdays. The average load MAPEs for holidays, the days surrounding holidays, and weekdays are 6.89%, 4.64%, and 2.79%, respectively.
Because of its limited capacity, forecasting for industrial-park IES is more complicated to implement than forecasting for a large power grid because of the higher randomness in the historical load curves. The fuzziness of the loads is largely reflected in the load forecasting result, and the random errors cannot offset each other for such a small-scale system. Therefore, this situation will certainly lead to larger fluctuations.
4.3. Comparison of Multiple Forecasting Methods
In this section, several commonly used data mining methods are applied as benchmark to compare with deep learning, which cope with same multitask load forecasting regression problems, e.g., an ARIMA, support vector regression (SVR), gradient boosted trees (GDT), and BP-NN. The dataset differed slightly because of the different conditions for each method. The results for different methods are compared in
Table 4.
The conclusion can be drawn that deep learning is superior in terms of generalization ability and learning capacity, because the process of unsupervised learning allows intrinsic relationships among a variety of factors to be found. The ARIMA model is limited by its algorithm structure such that it can only use historical load datasets and cannot effectively account for the impact of other factors. The forecasting results of the BP-NN and SVR models also show a gap with respect to the deep learning results because their restricted learning capabilities impose limits on their ability to handle complex data models.
In addition, when the shallow algorithms (such as, BP-NN and SVR) are used to make prediction of the electricity, heat, and gas, the forecasting error of thermal and gas is obviously lower compared with electrical power. The reason is that the usage of gas and heat profile are more regular, whose change is mainly from production arrangement of industry, and electrical load always has serious volatility due to nonstorable characteristic. When deep learning method is deployed to forecast the different source of load, the forecasting error gap between electrical load and other tasks is reduced, meanwhile, the prediction accuracy for all tasks has been improved, which indicates that the unsupervised learning part can learn the better features provided by other tasks by weight sharing of multitask learning. The task of electrical load forecasting is able to learn knowledge from the gas or heat profile to aid improve the forecasting performance.
The following comparative analysis focuses on the details of the underlying advantages of the deep learning algorithm compared with the shallow learning of BP-NN, the main difference between deep learning and BP-NN is whether including unsupervised learning or not as pretraining method in modeling process. Thus, we design the experiment concerning about the loads forecasting model with or without the unsupervised training to test network error for both methods.
The conclusions of experiment can be drawn as follows from
Figure 11. The forecasting error between traditional BP-NN and DBN is basically the same concerning the structure with one single hidden layer. The effect of BP-NN is very unsatisfactory as the number of layer increases, and network error becomes large. The reason is that gradient explosion existence when the error signal is transformed from top layer to bottom layer in BP-NN, resulting in inefficient training of the parameters at the bottom of the network, which is corresponding to bad generalization performance. On the other hand, BP-NN parameters are initialized randomly, and weights and bias of BP-NN are adjusted in iteration process until convergence. Hence, the starting point of iteration is far away from the optimal region and the loss function always contains multiple local minima. However, unsupervised training is implemented before supervised learning process in DBN, their weights are initialized by DBN instead of random generation. Then, DBN is expanded into a BP neural network, whose network is optimized by BP, to overcome the shortcomings of easily falling into local optimum and long training time.
4.4. Comparison of Different Training Models
In this section, the effectiveness of multitask learning is validated. The deep architecture network with multitask aggregated at different level is trained, and we compare the results with those of training for each task separately. The comparisons of single-task and multitask learning are all processed using the same deep learning architecture. The same training data are also used to support the different learning approaches to ensure the fairness of the experiment.
Figure 12 compares the multitask results with single-task results. The prediction accuracy of three kinds of loads is significantly improved by jointly training tasks. We can see that the MA for forecasting electricity-gas together is less than those of other multitask aggregated style for the reason that electrical flow and gas flow may have weak coupling with each other. On the contrary, complicated and diversified transformed fashion between gas flow- and heat flow-enabled multitask for gas-heat obtains relatively better forecasting results. In addition, compared with single-task, the accuracy improvement of each task for training all loads reached approximately 2.7%, which has the best performance in all multitask models. Instead of forecasting each load separately, the multitask method analyzes the complex coupling relationship among several types of input information, meaning that the training task for each type of load considers information related to the other load types. The multitask for all loads leads to a largest reduction in error, which effectively improves the accuracy of the results.
Moreover, multitask exhibits great performance in terms of the computational cost, which is always a significant criterion for evaluating a model. In
Table 5, it is clearly seen that although the single-task training for each individual energy source requires a relatively short amount of time compared with the multitask training, the training process for multitask learning requires less time than the sum of all training times for the single-task training processes. Therefore, simultaneous training for multiple related tasks is more convenient because it only needs one training run rather than requiring the training process to be executed separately for each task.