
Decision Tree For Building Energy Demand Method

Energy and Buildings 42 (2010) 1637–1646

Contents lists available at ScienceDirect

Energy and Buildings


journal homepage: www.elsevier.com/locate/enbuild

This is the preprint version. See Elsevier for the final official version.

A decision tree method for building energy demand modeling
Zhun Yu a , Fariborz Haghighat a,∗ , Benjamin C.M. Fung b , Hiroshi Yoshino c
a Department of Building, Civil and Environmental Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8
b Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8
c Department of Architecture and Building Science, Tohoku University, Japan

ARTICLE INFO

Article history:
Received 18 March 2010
Accepted 19 April 2010

Keywords:
Building energy consumption
Modeling
Decision tree
Classification analysis

ABSTRACT

This paper reports the development of a building energy demand predictive model based on the decision tree method. This method is able to classify and predict categorical variables: its competitive advantage over other widely used modeling techniques, such as the regression and ANN methods, lies in the ability to generate accurate predictive models with interpretable flowchart-like tree structures that enable users to quickly extract useful information. To demonstrate its applicability, the method is applied to estimate residential building energy performance indexes by modeling building energy use intensity (EUI) levels. The results demonstrate that the decision tree method can classify and predict building energy demand levels accurately (93% for training data and 92% for test data), and can identify and rank significant factors of building EUI automatically. The method can provide the combination of significant factors as well as the threshold values that will lead to high building energy performance. Moreover, the average EUI value of data records in each classified data subset can be used for reference when performing prediction. One crucial benefit is improving building energy performance and reducing energy consumption. Another advantage of this methodology is that it can be utilized by users without requiring much computation knowledge.
© 2010 Elsevier B.V. All rights reserved.

1. Introduction

There has been growing concern about total building energy consumption, which accounts for a substantial share of energy use worldwide. Further, with rising living standards, building energy consumption throughout the world has increased significantly over the past few decades. For example, from 1994 to 2004, building energy consumption in Europe and North America increased at a rate of 1.5% and 1.9% per annum, respectively [1]. Chinese building energy consumption has increased at more than 10% per annum for the past 20 years [2]. The high level of building energy consumption and the steady increase in building energy demand necessitate designing energy efficient buildings and improving their energy performance.

In the practice of energy efficient building design, architects and building designers often need to identify which parameters will influence future building energy demand significantly. Furthermore, based on different combinations of these parameters as well as their values, architects and building designers usually expect to find a simple and reliable method to estimate building energy performance rapidly so that they can optimize their building design plans. Building energy simulation tools have been utilized to forecast and analyze building energy consumption and describe building energy use patterns, in order to benefit the design and operation of energy efficient buildings. In recent years, there have been many studies on building energy demand modeling and several methods were employed, such as traditional regression methods [3,4], artificial neural network (ANN) methods [5–7], and building simulation methods [8,9]. Through statistical methods and regression equations, regression models correlate building energy demand with relevant climatic variables and/or building physical variables in order to predict energy demand. The main advantage of regression models is that they are comparatively simple and efficient. The ANN model is also able to predict the thermal performance of buildings; it is based on mimicking the structure and properties of biological neural networks. The greatest strength of ANN models in comparison with other models lies in their ability to model complex relationships between inputs and outputs. These two methods have been successfully applied to predict building energy demand. However, because regression models are normally complicated equations and ANN models operate like a “black box”, the models developed using these methods are not understandable and interpretable, especially for common users without advanced mathematical knowledge. This makes it difficult for them to serve as a common predictive tool. Moreover, in these studies, the focus has been mainly on the energy use

∗ Corresponding author at: Department of Building, Civil and Environmental Engineering, Concordia University, 1455 De Maisonneuve Blvd., Montreal, Quebec, Canada H3G 1M8.
E-mail address: [email protected] (F. Haghighat).

0378-7788/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.enbuild.2010.04.006
1638 Z. Yu et al. / Energy and Buildings 42 (2010) 1637–1646

Fig. 1. Schematic illustration of a simple hypothetical decision tree.
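
As a reading aid for Fig. 1, the hypothetical tree that Section 2.1 describes (a root test on outdoor air temperature at 26 °C, followed by a test on room occupancy) can be written as two nested conditions. This is only a sketch of that illustrative tree; the function name and the string labels "on" and "off" are assumptions introduced here.

```python
def predict_rac_state(outdoor_temp_c: float, room_occupied: bool) -> str:
    """Classify the RAC operating state with the hypothetical tree of Fig. 1."""
    # Root node: binary split test on outdoor air temperature at 26 degrees C.
    if outdoor_temp_c <= 26.0:
        return "off"  # leftmost leaf, reported as (60/5) in the text
    # Internal node: binary split test on room occupancy.
    if not room_occupied:
        return "off"
    return "on"


# A warm, occupied room is the only path that ends in the "on" leaf.
state = predict_rac_state(outdoor_temp_c=28.0, room_occupied=True)
```

Each `return` corresponds to a leaf node, and each `if` to a root or internal node holding a binary split test.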

prediction of existing buildings (e.g. predict hourly heating/cooling load for a certain type of building), whereas the energy use prediction of newly designed buildings, which is also very important for architects and building designers to make rational decisions at the early stage of design and operation, is seldom carried out.

Building simulation allows the prediction of building energy performance under various conditions. However, this method does not perform well in predicting the energy use of occupied buildings as compared to non-occupied buildings, due to the lack of sufficient knowledge about occupants’ behavior. Additionally, the application of building simulation programs is normally complicated and the learning process of these programs tends to be time-consuming.

In the past two decades, the decision tree method, a novel computational modeling technique that uses a flowchart-like tree structure, has been widely used for classification and prediction in many scientific and medical fields [10–12]. The popularity of the decision tree method is mainly attributable to its ease of use and its ability to generate accurate predictive models with understandable and interpretable structures, which, accordingly, provide clear and useful information on corresponding domains. Moreover, the decision tree method is able to process both numerical and categorical variables, and to perform classification and prediction tasks rapidly without requiring much computation effort. However, it should be mentioned that the decision tree method is more appropriate for predicting categorical variables than for predicting numerical variables. The application of the decision tree method in building related studies is still very sparse. Tso and Yau [13] compared the accuracy of the regression method, the ANN method, and the decision tree method in predicting average weekly electricity consumption for both summer and winter in Hong Kong. It was found that the decision tree model and the ANN model have a slightly higher accuracy than other models. Therefore, it is highly desirable to utilize the decision tree method to process measured data, which already include the influences of occupant activities, for building energy demand modeling.

The paper reports the development of a procedure to accurately estimate building energy performance indexes. The procedure is based on the decision tree method. The applicability of the procedure is then demonstrated for the residential building sector.

2. Methodology

2.1. Overview of decision tree

The decision tree methodology is one of the most commonly used data mining methods [14,15]. It uses a flowchart-like tree structure to segregate a set of data into various predefined classes, thereby providing the description, categorization, and generalization of given datasets. As a logical model, a decision tree shows how the value of a target variable can be predicted by using the values of a set of predictor variables. Fig. 1 gives a decision tree indicating whether residents turn room air conditioners (RAC) on or off in their rooms in the cooling season. Assume 100 data records are used to build this decision tree and each record has three attributes: outdoor air temperature, room occupancy, and the operating state of RAC.

The target variable for the above decision tree is the RAC operating state, with potential states being classified as either turned on or turned off. The predictor variables are outdoor air temperature (≤26 °C or >26 °C) and room occupancy (empty or not). As shown in Fig. 1, the decision tree consists of three kinds of nodes: root node, internal node, and leaf node. The root node and internal nodes denote a binary split test on an attribute, while a leaf node represents an outcome of the classification and thus holds a categorical target label. Moreover, the numbers in the parentheses at the end of each leaf node depict the number of data records in this leaf. If some leaves are impure (i.e. some records are misclassified into this node), the number of misclassified records will be given after a slash. For example, (60/5) in the leftmost leaf in Fig. 1 means that, among the 60 records with outdoor temperature lower than or equal to 26 °C that have been classified as turned off, 5 of them actually have the value turned on. By using this decision tree, whether RAC operating states should be classified as being ‘turned on’ or ‘turned off’ can be predicted. For example, if the outdoor air temperature is higher than 26 °C and the room is not empty, occupants will turn RAC on; otherwise they will turn it off.

2.2. Decision tree generation

Decision tree generation is in general a two-step process, namely learning and classification, as shown in Fig. 2. In the learning process, the collected data are split into two subsets, a training set and a testing set. Creation of the training set and testing set is an important part of evaluating data mining models. Usually, most of the data records in the database are arbitrarily selected for training and the remaining data records are used for testing. Note that the training set and testing set should come from the same population but should be disjoint. Then, a decision tree generation algorithm takes the training data as input and outputs a decision tree. Commonly used decision tree generation algorithms include ID3 [14], classification and regression trees (CART) [16], and C4.5 [17]. In this study, we employ C4.5, along with the open-source data mining software WEKA, to build the decision tree due to its flexibility and wide applicability to different types of data. In the classification process, the accuracy of the obtained decision tree is first evaluated by making predictions against the test data. The accuracy of a decision tree is measured by comparing the predicted target values and the

Fig. 2. Procedure of decision tree generation.
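
The two steps of Fig. 2 can be sketched in a few lines: an arbitrary, disjoint split of the records for the learning step, and an accuracy measure for the classification step. This is a minimal illustration rather than the WEKA workflow used in the study; the function names, the fixed random seed, and the integer record identifiers are assumptions, while the 67/55/12 record counts are the ones reported later in the paper.

```python
import random


def split_records(records, n_train, seed=0):
    """Arbitrarily split records into disjoint training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is fixed only for repeatability
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(predicted, actual):
    """Share of records whose predicted target value equals the true value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# 67 complete records, 55 used for training and the remaining 12 for testing.
record_ids = list(range(67))
train_ids, test_ids = split_records(record_ids, n_train=55)
```

Because both subsets are slices of one shuffled list, they come from the same population and cannot overlap, which is the property the procedure requires.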

true target values of the testing data. If the accuracy is considered acceptable, the decision tree can be applied to new datasets for classification and prediction; otherwise, the reason should be identified and corresponding solutions should be adopted to tackle the problems.

The procedure of generating a decision tree from the training data is explained as follows. Initially, all records in the training data are grouped together into a single partition. At each iteration, the algorithm chooses a predictor attribute that can “best” separate the target class values in the partition. The ability of a predictor attribute to separate the target class values is measured based on an attribute selection criterion, which will be discussed in Section 2.3. After a predictor attribute is chosen, the algorithm splits the partition into child partitions such that each child partition contains the same value of the chosen attribute. The decision tree algorithm iteratively splits a partition and stops when any one of the following terminating conditions is met:

1. All records in a partition share the same target class value. Thus, the class label of the leaf node is the target class value;
2. There are no remaining predictor attributes that can be used to further split a partition. In this case, the majority target class value becomes the label of the leaf node; and
3. There are no more records for a particular value of a predictor variable. In this case, a leaf node is created with the majority class value in the parent partition.

2.3. Attribute selection criterion

The decision tree generation algorithm is a greedy algorithm. It iteratively splits a partition by choosing a split attribute that can best separate the target class values. The choice of split attribute determines the quality of the decision tree model and, therefore, the classification accuracy on future data. The concept of entropy [16] in information theory is a widely used criterion for characterizing the purity of a partition in decision tree nodes. Given a decision tree containing only binary target variables such as HIGH EUI and LOW EUI, the entropy of the data subset, $D_i$, of the $i$th tree node is defined as

\[
\mathrm{Entropy}(D_i) = -\left[ \frac{n_{HIGH}}{TN} \log_2 \frac{n_{HIGH}}{TN} + \frac{n_{LOW}}{TN} \log_2 \frac{n_{LOW}}{TN} \right] \tag{1}
\]

where $n_{HIGH}$: the number of HIGH EUI records in $D_i$; $n_{LOW}$: the number of LOW EUI records in $D_i$; $TN$: the total number of records in $D_i$, with $TN = n_{HIGH} + n_{LOW}$.

The entropy varies between 0 and 1. Notice that the entropy equals 0 if $D_i$ is pure and equals 1 when $n_{HIGH}$ equals $n_{LOW}$. At each node of a decision tree, candidate splitting tests are used to evaluate all available attributes and select the most suitable attribute to split the data. Suppose the $j$th attribute has been selected as node attribute. A candidate split test, $ST$, at the $i$th tree node is defined as

\[
ST: \mathrm{Val}_j(r) \le Th \quad \text{(if the $j$th attribute is numerical)} \tag{2}
\]

\[
ST: \mathrm{Val}_j(r) \in \{v_1, v_2\} \quad \text{(if the $j$th attribute is categorical and has two values)} \tag{3}
\]

where $\mathrm{Val}_j(r)$: the value of the $j$th attribute of record $r$; $Th$: threshold value; $v_1, v_2$: the two values of the $j$th attribute.

Next, the algorithm applies $ST$ to $D_i$ and partitions $D_i$ into two subsets, $DS_1$ and $DS_2$. Let $r$ be a record in $D_i$. If the $j$th attribute is a numerical attribute, then

\[
DS_1 = \{ r \in D_i \mid \mathrm{Val}_j(r) \le Th \} \quad \text{and} \quad DS_2 = \{ r \in D_i \mid \mathrm{Val}_j(r) > Th \}. \tag{4}
\]

If the $j$th attribute is a categorical attribute, then

\[
DS_1 = \{ r \in D_i \mid \mathrm{Val}_j(r) = v_1 \} \quad \text{and} \quad DS_2 = \{ r \in D_i \mid \mathrm{Val}_j(r) = v_2 \}. \tag{5}
\]

Let $m$ and $n$ be the number of records in $DS_1$ and $DS_2$, respectively. The entropy after the split test can then be calculated as

the weighted sum of the entropies for the individual subsets

\[
\mathrm{Entropy}(DS_1 \text{ and } DS_2) = \frac{m}{m+n}\,\mathrm{Entropy}(DS_1) + \frac{n}{m+n}\,\mathrm{Entropy}(DS_2) \tag{6}
\]

where $\mathrm{Entropy}(DS_1)$ and $\mathrm{Entropy}(DS_2)$ can be calculated by using formula (1).

The selection of node attribute used to split data is very important and a rational selection can improve the purity of tree nodes. A widely used attribute selection measure is information gain [18], which is defined as the entropy reduction before and after a candidate splitting test. Therefore, information gain can be calculated as

\[
\mathrm{InfoGain} = \mathrm{Entropy}(D_i) - \mathrm{Entropy}(DS_1 \text{ and } DS_2) \tag{7}
\]

Fig. 3. Boxplot for monthly average outdoor air temperature in the six regions in 2003.

For each tree node, the attribute with the maximum information gain will be chosen as the splitting attribute at this node. The information gain measure, however, has a bias to attributes with larger number of domain values. One way to avoid such bias is to normalize the information gain by a split information value defined analogously with information gain. C4.5 employs this improved measure, gain ratio [15]:

\[
\mathrm{GainRatio} = \frac{\mathrm{InfoGain}}{\mathrm{SplitInfo}} \tag{8}
\]

where

\[
\mathrm{SplitInfo} = -\left[ \frac{m}{m+n} \log_2 \frac{m}{m+n} + \frac{n}{m+n} \log_2 \frac{n}{m+n} \right] \tag{9}
\]

The attribute with the highest gain ratio is selected as the splitting attribute.
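
To make Eqs. (1) and (6)–(9) concrete, the following sketch evaluates them for a binary split of HIGH/LOW records. It is a hand-rolled illustration, not the C4.5 code in WEKA; the function names and the representation of each subset as a pair of counts `(n_high, n_low)` are assumptions, and the convention that a zero count contributes nothing to the entropy is the usual one for 0·log₂0.

```python
from math import log2


def entropy(n_high: int, n_low: int) -> float:
    """Eq. (1): entropy of a node holding n_high HIGH and n_low LOW records."""
    total = n_high + n_low
    if total == 0:
        return 0.0
    result = 0.0
    for count in (n_high, n_low):
        if count > 0:  # a zero count contributes nothing (0 * log2(0) := 0)
            p = count / total
            result -= p * log2(p)
    return result


def split_entropy(ds1, ds2) -> float:
    """Eq. (6): weighted entropy of subsets given as (n_high, n_low) pairs."""
    m, n = sum(ds1), sum(ds2)
    return m / (m + n) * entropy(*ds1) + n / (m + n) * entropy(*ds2)


def info_gain(ds1, ds2) -> float:
    """Eq. (7): entropy of the parent node minus the entropy after the split."""
    parent = (ds1[0] + ds2[0], ds1[1] + ds2[1])
    return entropy(*parent) - split_entropy(ds1, ds2)


def gain_ratio(ds1, ds2) -> float:
    """Eqs. (8)-(9): information gain normalized by the split information."""
    m, n = sum(ds1), sum(ds2)
    if m == 0 or n == 0:
        raise ValueError("a split must place records in both subsets")
    split_info = entropy(m, n)  # Eq. (9) has the form of Eq. (1) with m and n
    return info_gain(ds1, ds2) / split_info


# A split that separates 4 HIGH records from 4 LOW records perfectly.
perfect = gain_ratio((4, 0), (0, 4))
# A weaker split: each subset still mixes both classes.
weak = info_gain((3, 1), (1, 3))
```

A perfectly separating split removes all of the parent entropy, so its information gain equals the parent entropy, whereas the mixed split recovers only a fraction of it.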
Additionally, in order to detect whether a node should be a leaf, a minimum threshold value of entropy ($EN_{min}$) is predefined and compared with the node classification entropy, $\mathrm{Entropy}(D_i)$. If $\mathrm{Entropy}(D_i)$ is lower than $EN_{min}$, then this node is a leaf and will be labeled LEAF. Otherwise a further splitting test should be performed. However, if no significant effects can be observed on information gain or gain ratio in further candidate splitting tests, the test will also be stopped and the node will be labeled STOP.

3. Data source and basic analysis

3.1. Data collection and pre-processing

To evaluate and improve residential building energy performance in Japan, a project was performed by the Research Committee on Investigation on Energy Consumption of Residential Buildings (2001–2003) and the Committee on Energy Consumption of Residential and Countermeasures for Global Warming (2004–2005) of the Architectural Institute of Japan. This analysis used the database of the CD-ROM titled “Energy Consumption for residential buildings in Japan” [19]. In this project, field surveys on energy related data and other relevant information were carried out in 80 residential buildings located in six different districts in Japan.

• Energy end use of all kinds of fuel used by the building at different time intervals;
• Indoor environment parameters every 15 min;
• Household characteristics; and
• Other issues such as occupant behaviors and energy saving measures.

Fig. 3 shows the boxplot for monthly average outdoor air temperature in each district in 2003 using Japanese meteorological data. The mean value of monthly average temperature, i.e. annual average temperature, is also given. Clearly the monthly average temperature has a more or less symmetric distribution. The annual average temperature is higher than 8 °C in all six districts and the temperature in Hokkaido and Tohoku is comparatively lower than in the other districts.

Scrutinizing the data from the 80 buildings, it was found that only 67 sets were complete while the other 13 had missing values of energy consumption data. Fig. 4 shows the percentage breakdown of available residential buildings in each district. It can be seen that the distribution is roughly uniform.

Fig. 4. Percentage breakdown.

Data reduction and aggregation were also performed as a pre-processing step of preparing the data for a database. For example, the primary energy sources in the investigated families include electricity, natural gas, and kerosene. All these energy sources are converted into an equivalent energy value based on the conversion coefficients in Table 1.

Table 1
Conversion coefficients of different fuels.

Fuel                            Conversion coefficient   Unit
Electricity                     3.6                      MJ/kWh
City gas (4A–7C)                20.4                     MJ/Nm³
City gas (12A–13C)              45.9                     MJ/Nm³
Liquefied petroleum gas (LPG)   50.2                     MJ/Nm³
Kerosene                        36.7                     MJ/L

Moreover, energy end use is classified into eight categories, of which the three major categories are space heating/cooling, hot water supply, and kitchen. Each end use data series, recorded at an interval of 5 min, was aggregated so as to compute hourly, daily, monthly, and annual total amounts. Thus total energy use can also be calculated as the sum of the energy content of all the fuel used

Table 2
Summary of model inputs.

Number  Variable  Type         Value                  Variable label (unit)
1       TEMP      Categorical  High/low               Annual average air temperature
2       HOUS      Categorical  Detached/apartment     House type
3       CONS      Categorical  Wood/non-wood          Construction type
4       AREA      Numerical    [70, 240]              Floor area (m²)
5       HLC^a     Numerical    [1.01, 4.35]           Heat loss coefficient (W/m² K)
6       ELA^b     Numerical    [0.35, 13.30]          Equivalent leakage area (cm²/m²)
7       NUM       Numerical    [2, 6]                 Number of occupants
8       HEAT      Categorical  Electric/non-electric  Space heating
9       HWS       Categorical  Electric/non-electric  Hot water supply
10      KITC      Categorical  Electric/gas           Kitchen

^a Calculated based on building design plans.
^b Measured by the fan pressurization method.
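
Table 2 lists the predictors; the quantity they are used to predict is the EUI level. As a rough sketch of how that target could be derived, the snippet below converts annual fuel use to MJ with the Table 1 coefficients, divides by floor area, and applies the two-level scale of Section 3.2 (midpoint of the 176–707 MJ/m² range). The dictionary keys, function names, and sample consumption figures are hypothetical, and assigning the boundary value 441.5 to HIGH is an assumption, since the paper lists it in both intervals.

```python
# Conversion coefficients from Table 1, in MJ per unit of fuel.
# The key names are hypothetical labels chosen for this sketch.
MJ_PER_UNIT = {
    "electricity_kwh": 3.6,
    "city_gas_4a_7c_nm3": 20.4,
    "city_gas_12a_13c_nm3": 45.9,
    "lpg_nm3": 50.2,
    "kerosene_l": 36.7,
}

EUI_MIN, EUI_MAX = 176.0, 707.0          # range reported for the database
EUI_THRESHOLD = (EUI_MIN + EUI_MAX) / 2  # midpoint of the range, 441.5 MJ/m2


def annual_energy_mj(fuel_use: dict) -> float:
    """Sum the energy content of all fuels used in a year."""
    return sum(MJ_PER_UNIT[fuel] * amount for fuel, amount in fuel_use.items())


def eui(fuel_use: dict, floor_area_m2: float) -> float:
    """Energy use intensity: annual total energy use per unit floor area."""
    return annual_energy_mj(fuel_use) / floor_area_m2


def eui_level(eui_value: float) -> str:
    """Two-level concept hierarchy; the boundary is assigned to HIGH here."""
    return "HIGH" if eui_value >= EUI_THRESHOLD else "LOW"


# Made-up household: 5000 kWh of electricity and 500 L of kerosene, 100 m2.
sample_use = {"electricity_kwh": 5000, "kerosene_l": 500}
sample_eui = eui(sample_use, floor_area_m2=100)
sample_level = eui_level(sample_eui)
```

Adding finer levels, as the paper notes is possible with a larger database, would only require replacing the single threshold with a sorted list of cut points.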

by the building in a year. Based on the above work, a database was created.

3.2. Model target variable

In order to demonstrate building energy performance, the model target variable is expressed as energy use intensity (EUI), defined as the ratio of annual total energy use to total floor area (the annual total energy use is calculated as the sum of the energy content of all fuel used by the building in 2003). As mentioned previously, the decision tree method is more appropriate for predicting categorical variables. Therefore, a concept hierarchy for building EUI is formed before classification and prediction are carried out. Due to the small database size, a two-grade descending scale, i.e. high level and low level, corresponding to low energy performance and high energy performance, is considered applicable and understandable. Building EUI ranges from 176 MJ/m² to 707 MJ/m² in the database; thus data ranging from the average of the maximum and minimum to the maximum value, i.e. [441.5, 707], are considered ‘HIGH’, and data from the minimum value to the average of the maximum and minimum, i.e. [176, 441.5], are considered ‘LOW’.

It should be mentioned that a decision tree can also be used to classify and predict multiple EUI levels rather than just two. For example, instead of ‘HIGH’ and ‘LOW’, a concept hierarchy of EUI may map real EUI values into four conceptual levels such as EXCELLENT, GOOD, FAIR, and COMMON, thereby resulting in a smaller data range for each level and providing a more detailed description. However, more conceptual levels require a larger database and may be prone to a higher misclassification rate of data records, and thus reduce the accuracy of decision tree models.

3.3. Model input variables

Ten parameters (or attributes) are selected from the database to be model input variables and the summary of these parameters is given in Table 2.

These ten parameters are grouped into four categories that are important determinants of household energy demand.

(1) Climatic conditions (TEMP). The range of annual average outdoor air temperature in the six districts is discretized into two intervals based on the same concept hierarchy as the EUI mentioned earlier: the low interval (8.8 °C, 13.1 °C), and the high interval (14.3 °C, 17.4 °C). According to this discretization criterion, the low temperature districts include Hokkaido and Tohoku while the other four districts belong to high temperature districts;
(2) Building characteristics (HOUS, CONS, AREA, HLC, ELA). For building construction type, the non-wood type includes steel reinforced concrete (SRC), reinforced concrete (RC), and steel structure (S);
(3) Household characteristics (NUM); and
(4) Household appliance energy sources (HEAT, HWS, KITC). Energy sources are divided into energy generated from electricity consumption and energy generated from other fuels such as kerosene and natural gas.

Fig. 5 shows the distribution of all the categorical parameters. It can be observed that all the percentages range from 30% to 70%, indicating a fairly uniform distribution.

Fig. 5. Categorical distribution of the six categorical parameters.

4. Results and discussion

The C4.5 algorithm was applied, by using WEKA, to a training data set (55 records arbitrarily selected from the database) and a test data set (i.e. the remaining 12 records, which are independent of the training set) to build a decision tree for predicting whether the EUI of residential buildings should be classified as being ‘HIGH’ or ‘LOW’.

4.1. Generation of decision tree

Fig. 6 shows the decision tree for the classification of building EUI levels. This decision tree is built on the basis of the training data set of 55 data records with the ten attributes listed in Table 2. It can be seen that this tree includes a total of 21 nodes among which 11 are leaf nodes, including 8 LEAFs and 3 STOPs: this represents 11 classes (either EUI = HIGH or EUI = LOW). The explanatory note of the three kinds of nodes, namely root node, internal node, and leaf node in this decision tree, is shown in Fig. 7. Note that entropy is also calculated and given in each node to characterize the purity of the sub dataset in that node. Moreover, the average EUI value of data records in each class is given and used for reference when performing prediction. Specifically, this reference value can be viewed as the predictive numerical EUI value of the new data records that fall into that class.

Fig. 6. Decision tree for the prediction of building EUI level.

Fig. 7. Explanatory note of decision tree nodes.

The WEKA analysis report also provides information on the classification accuracy of the decision tree. The report indicates that 51 records, which account for 93% of all the training records, are correctly classified: this indicates a good accuracy. Also, the confusion matrix reports how many data records are correctly classified and misclassified in the classes HIGH EUI and LOW EUI separately, as below:



Table 3
Decision rules derived from the obtained decision tree.

No.  Node  Decision rule
1    5     If TEMP is high and HLC > 3.89 then EUI is HIGH
2    6     If TEMP is low and HEAT is electric then EUI is HIGH
3    9     If TEMP is high and HLC ≤ 3.89 and ELA > 4.41 then EUI is LOW
4    10    If TEMP is low and HEAT is non-electric and NUM ≤ 2 then EUI is LOW
5    12    If TEMP is high and HLC ≤ 3.89 and ELA ≤ 4.41 and HWS is electric then EUI is LOW
6    15    If TEMP is low and HEAT is non-electric and NUM > 2 and HOUS is apartment then EUI is HIGH
7    16    If TEMP is high and HLC ≤ 3.89 and ELA ≤ 4.41 and HWS is non-electric and KITC is electric then EUI is LOW
8    18    If TEMP is low and HEAT is non-electric and NUM > 2 and HOUS is detached and HLC ≤ 1.70 then EUI is LOW
9    19    If TEMP is low and HEAT is non-electric and NUM > 2 and HOUS is detached and HLC > 1.70 then EUI is HIGH
10   20    If TEMP is high and HLC ≤ 3.89 and ELA ≤ 4.41 and HWS is non-electric and KITC is non-electric and HLC ≤ 2.93 then EUI is LOW
11   21    If TEMP is high and HLC ≤ 3.89 and ELA ≤ 4.41 and HWS is non-electric and KITC is non-electric and HLC > 2.93 then EUI is HIGH
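
Because the rules in Table 3 are mutually exclusive paths through one tree, they can be transcribed directly as nested conditions. The sketch below does this and returns the leaf node number alongside the predicted level; the function name, the dictionary encoding of a building, and the lowercase category strings are assumptions made for illustration. The example building uses the parameter values of Table 5, with gas treated as non-electric.

```python
def predict_eui_level(building: dict) -> tuple:
    """Return (EUI level, leaf node number) by applying the rules of Table 3."""
    if building["TEMP"] == "high":
        if building["HLC"] > 3.89:
            return "HIGH", 5
        if building["ELA"] > 4.41:
            return "LOW", 9
        if building["HWS"] == "electric":
            return "LOW", 12
        if building["KITC"] == "electric":
            return "LOW", 16
        # HWS and KITC are both non-electric: a second test on HLC decides.
        return ("LOW", 20) if building["HLC"] <= 2.93 else ("HIGH", 21)
    # Low temperature districts.
    if building["HEAT"] == "electric":
        return "HIGH", 6
    if building["NUM"] <= 2:
        return "LOW", 10
    if building["HOUS"] == "apartment":
        return "HIGH", 15
    return ("LOW", 18) if building["HLC"] <= 1.70 else ("HIGH", 19)


# Building parameters from Table 5 (encoding of categories is assumed).
example = {
    "TEMP": "high", "HOUS": "detached", "CONS": "wood", "NUM": 4,
    "AREA": 100, "HLC": 2.0, "ELA": 3.0,
    "HEAT": "electric", "HWS": "non-electric", "KITC": "gas",
}
level, leaf = predict_eui_level(example)
```

CONS and AREA appear in the input record but are never tested, which mirrors the fact that neither attribute occurs in any rule of Table 3.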

In this matrix, the number of correctly classified records is given in the main diagonal, i.e. upper-left to lower-right diagonal; the others are incorrectly classified. Clearly, class “LOW EUI” was misclassified as “HIGH EUI” only one time and class “HIGH EUI” was misclassified as “LOW EUI” three times. Such information indicates that high EUI is more prone to be misclassified than low EUI. This may have occurred due to the fact that most of the data records are in LOW EUI, so the tree is made more sensitive to this class. An even distribution between the HIGH EUI class and the LOW EUI class in the database would possibly help obtain sufficient accuracy and sensitivity in the desired classes.

The major strength of the decision tree lies in its interpretability and ease of use, particularly when decision rules are created. Based on a decision tree, decision rules can be easily generated by traversing a path from the root node to a leaf node. For example, a decision rule can be generated from node 1 to node 12 in the above decision tree as follows: If TEMP is high and HLC ≤ 3.89 and ELA ≤ 4.41 and HWS is electric then EUI is LOW. Since each leaf node produces a decision rule, the complete set of decision rules, which is equivalent to the decision tree, can be derived after all the leaf nodes have been included. Accordingly, the above decision tree is converted to a set of decision rules, as shown in Table 3.

4.2. Evaluation of the decision tree

As mentioned previously, the decision tree accuracy should be evaluated to estimate how accurately it can predict building EUI levels before applying it to new residential buildings. Accordingly, the obtained decision tree was applied to the test dataset and the results are given in Table 4.

Table 4 shows that among the 12 data records included in the testing set, eleven records, accounting for 92%, are correctly classified. Given that the size of the testing set is relatively small and only one record is misclassified, this accuracy is basically acceptable. At the same time, the WEKA analysis report also provides a confidence level for the classification of each data record. The confidence level indicates how likely the test data record is to fall into that class; it is equal to the ratio of the number of correctly classified data records to the total record number in that class in the training set. It can be seen from Table 4 that generally the confidence level for the classification is higher than 80%, indicating that most of the predictions are reliable. Further, by using a pre-specified threshold, e.g. 80%, the confidence level could improve the estimated accuracy of classification. In particular, if the confidence level of a data record classification exceeds the threshold, this classification will be accepted; otherwise it will be refused. For example, if the threshold in this evaluation is set to be 80%, then all the records, except record 2, which is misclassified, will be accepted. Similarly, the threshold is very useful when applying decision rules to the prediction of new data sets. In addition, the error rate between the actual EUI value and the reference EUI value is also given in this table for the reliability test of the reference value. It can be seen that, among the 11 correctly classified data records, 5 have an error rate lower than 5% while the other 6 have an error rate between 20% and 35%, which indicates that a higher concept hierarchy for building EUI needs to be formed to improve the prediction performance of the reference value. However, this is limited by the size of the database in this study.

4.3. Utilization of decision tree

4.3.1. Using decision tree for prediction

Based on predictor variables, the decision tree and decision rules can be utilized to predict target variables. Assume the EUI level of a new residential building in Japan must be predicted by using the decision tree in Fig. 6. The threshold of confidence level is set to be 85%. The typical building parameters are shown in Table 5.

Specifically, the building EUI level is predicted as follows:

Step 1: The root node, i.e. node 1 in this decision tree, is the starting point of prediction. From node 1, it can be seen the value of TEMP

Table 4
Results of decision tree accuracy evaluation.

No.  Actual level  Predicted level  Correct or incorrect  Confidence level  Actual EUI (MJ/m²)  Reference EUI (MJ/m²)  Error
1    HIGH          HIGH             Correct               100%              449                 450                    0.2%
2    LOW           HIGH             Incorrect             75%               258                 624                    141.9%
3    HIGH          HIGH             Correct               100%              581                 584                    0.5%
4    LOW           LOW              Correct               100%              327                 322                    1.5%
5    HIGH          HIGH             Correct               100%              707                 552                    22.0%
6    LOW           LOW              Correct               81.80%            303                 316                    4.3%
7    LOW           LOW              Correct               81.80%            238                 316                    32.8%
8    LOW           LOW              Correct               88.90%            258                 315                    22.1%
9    HIGH          HIGH             Correct               100%              507                 488                    3.7%
10   HIGH          HIGH             Correct               100%              495                 601                    21.4%
11   LOW           LOW              Correct               81.80%            427                 316                    26.0%
12   HIGH          HIGH             Correct               100%              458                 601                    31.2%
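
The figures in Table 4 can be checked with a few lines of Python. The sketch below is illustrative only: the tuple layout and the function names are hypothetical, and the error definition, |reference EUI - actual EUI| / actual EUI, is an assumption inferred from the tabulated values rather than a formula stated in the text. Under that assumption it reproduces the listed errors to within rounding.

```python
# Rows of Table 4: (actual level, predicted level, actual EUI, reference EUI).
# EUI values are in MJ/m2, copied from the table.
TABLE_4 = [
    ("HIGH", "HIGH", 449, 450),
    ("LOW", "HIGH", 258, 624),
    ("HIGH", "HIGH", 581, 584),
    ("LOW", "LOW", 327, 322),
    ("HIGH", "HIGH", 707, 552),
    ("LOW", "LOW", 303, 316),
    ("LOW", "LOW", 238, 316),
    ("LOW", "LOW", 258, 315),
    ("HIGH", "HIGH", 507, 488),
    ("HIGH", "HIGH", 495, 601),
    ("LOW", "LOW", 427, 316),
    ("HIGH", "HIGH", 458, 601),
]


def classification_accuracy(rows):
    """Fraction of records whose predicted level equals the actual level."""
    correct = sum(1 for actual, predicted, _, _ in rows if actual == predicted)
    return correct / len(rows)


def relative_error(actual_eui, reference_eui):
    """Assumed error definition: deviation of the reference value from the
    actual EUI, relative to the actual EUI."""
    return abs(reference_eui - actual_eui) / actual_eui


accuracy = classification_accuracy(TABLE_4)
errors = [relative_error(actual, ref) for _, _, actual, ref in TABLE_4]

print(f"accuracy: {accuracy:.1%}")
for number, error in enumerate(errors, start=1):
    print(f"record {number}: error {error:.1%}")
```

The misclassified record (No. 2) also carries by far the largest error, since its reference value is taken from a HIGH leaf while its actual EUI is LOW.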

Table 5
Building parameters for the prediction of building EUI levels.

Number  Variable  Attribute value  Unit
1       TEMP      High
2       HOUS      Detached house
3       CONS      Wood
4       NUM       4
5       AREA      100              m²
6       HLC       2                W/m² K
7       ELA       3                cm²/m²
8       HEAT      Electricity
9       HWS       Non-electricity
10      KITC      Gas

Table 6
Summary of significant factors.

Potential factors        High temperature districts    Low temperature districts
                         Significant factor  Rank      Significant factor  Rank
House type                                             √                   3
Number of occupants                                    √                   2
Floor area
Heat loss coefficient    √                   1         √                   4
Equivalent leakage area  √                   2
Construction type
Space heating mode                                     √                   1
Hot water supply mode    √                   3
Kitchen energy mode      √                   4
should be examined first. Since TEMP is high, the node 1 test "TEMP is high" is satisfied, so go to node 2;
Step 2: examine the value of HLC. Since HLC = 2, the node 2 test "HLC ≤ 3.89" is satisfied, so go to node 4;
Step 3: examine the value of ELA. Since ELA = 3, the node 4 test "ELA ≤ 4.41" is satisfied, so go to node 8;
Step 4: examine the value of HWS. Since HWS is non-electric, the node 8 test "HWS is electric" is not satisfied, so go to node 13;
Step 5: examine the value of KITC. Since KITC is gas, the node 13 test "KITC is electric" is not satisfied, so go to node 17;
Step 6: examine the value of HLC. Since HLC = 2, the node 17 test "HLC ≤ 2.93" is satisfied, so go to node 20;
Step 7: node 20 is a leaf node. As a result, the decision tree in Fig. 6 predicts that the EUI level of the residential building is LOW. In this node, the correctly classified data records account for 89% of the records, so the confidence level of the prediction is 89%, which is larger than the predetermined threshold (85%). Therefore, the prediction is accepted. Furthermore, the EUI values of the correctly classified records in this node range from 242 MJ/m² to 389 MJ/m², and the average value is 315 MJ/m². These values can be used as reference values for the prediction, as mentioned previously.

4.3.2. Model interpretation and useful information extraction
Useful information can be extracted from the decision tree based model to help understand energy consumption patterns and optimize a building design plan. For example, various parameters are automatically selected as predictor variables by the decision tree algorithm for the classification of EUI levels. These parameters are used to split the nodes of the decision tree, and their closeness to the root node indicates the strength of their influence and the number of records affected. Therefore, by examining the decision tree nodes, the significant factors that determine the building energy demand profiles, as well as their ranks, can be identified.
In particular, the variable importance of this decision tree model can be analyzed as follows: first, the root node, i.e. TEMP, indicates that outside air temperature is the most important determinant of energy demand among all these factors. Then, for clarity, the significant factors for the high temperature districts (i.e. Hokuriku, Kanto, Kansai and Kyusyu) and low temperature districts (i.e. Hokkaido and Tohoku) are identified separately and summarized in Table 6.
Clearly, four significant factors are identified for each group of districts, and the only parameter found to be significant for both groups is heat loss coefficient. This implies that the significance of these factors, except building heat loss coefficient, is dependent on outside air temperature. Moreover, among the three household appliance energy source parameters, space heating plays a role in low temperature districts, while hot water supply and kitchen are significant in high temperature districts. Note that floor area and construction type do not appear in the decision tree. This is reasonable since the target variable, i.e. EUI level, is a measure of annual total energy normalized for floor area, and building heat loss coefficient embodies the effect of construction type. At the same time, these significant factors are ranked by their closeness to the root node. It can be found that heat loss coefficient and space heating mode rank first in the high and low temperature districts respectively, and thus deserve extra attention when designing energy efficient buildings.
The decision tree can provide the combination of significant factors, as well as the threshold values, that will lead to high building energy performance. Based on such combinations and threshold values, some hidden yet useful information can also be extracted to help understand building energy consumption patterns. For example, it can be seen that, in high temperature districts, a building heat loss coefficient higher than 3.89 W/m² K will normally cause a high EUI. Meanwhile, for a residential building with a heat loss coefficient lower than 3.89 W/m² K, a high equivalent leakage area (>4.41 cm²/m²) will benefit energy conservation. This may seem unreasonable, and one possible explanation is that the high temperature districts are located in a moderate climate and have a moderate outside air temperature range. Accordingly, in summer, infiltration can serve as a cooling source to remove the excess heat generated indoors, thereby reducing overall energy consumption. This indicates that a rational combination of heat loss coefficient and equivalent leakage area of residential buildings in high temperature districts is important to improve building energy performance. Also, a further study on the range selection of equivalent leakage area may provide deeper insights into its impact on building energy demand. Additionally, from nodes 8 and 13 in Fig. 6, it can be observed that changing the energy source of hot water supply and kitchen will bring about a substantial increase or decrease in EUI. Clearly, electrical water heaters, instead of non-electric water heaters such as natural gas heaters, should be used to save energy. Moreover, electrical water heaters can take full advantage of cheap nighttime electricity and thus help users save money spent on energy.
The EUI values in node 8 are plotted in Fig. 8 in order to compare buildings with electric HWS and buildings with non-electric HWS. The two significant factors with higher ranks than HWS, i.e. HLC and ELA, are also taken into consideration (HLC on the abscissa, ELA on the ordinate). The abscissa-ordinate plane is divided into grids so that EUI values can be compared at similar HLC and ELA values, thereby removing the impact of these two factors. It is apparent from Fig. 8 that, in the same grid or adjacent grids, red points, which denote EUI values with non-electric HWS, are generally higher than blue points, which denote EUI values with electric HWS. This is in accordance with the above conclusion drawn from the decision tree.
With regard to kitchen energy source, however, electrical appliances tend to consume more energy than appliances using natural gas. This may be because the power of many electrical kitchen appliances, such as rice cookers, is comparatively

high and the use of these appliances is routine. Further, compared to the hot water supply energy source, the kitchen energy source makes a smaller contribution to building energy demand, and even when non-electric appliances are adopted in the kitchen, an extra requirement on heat loss coefficient (≤2.93 W/m² K) still needs to be met in order to achieve low EUI levels.

Fig. 8. Comparison of EUI between electric HWS and non-electric HWS.

In low temperature districts, from an energy saving point of view, building owners and designers should give priority to the space heating energy source, which plays a significant role in influencing EUI. Node 3 in Fig. 6 shows that non-electric fuel, particularly kerosene and natural gas, should be used as the primary source of residential space heating, since the use of electric space heating tends to bring about a high EUI. This may be partly ascribed to the high efficiency of non-electric space heating devices such as kerosene space heaters. Moreover, non-electric heating devices are more applicable in real life than electric space heaters, such as air conditioners, due to the high electricity rate in Japan. Similar to Fig. 8, EUI values in node 3, together with EUI values in low temperature districts in the test dataset, are plotted in Fig. 9. HLC and NUM are used as abscissa and ordinate. The red and blue points represent EUI values with electric and non-electric space heating respectively. It can be observed that red points are generally higher than blue points, which is in accordance with the above conclusion.

Fig. 9. Comparison of EUI between electric HEAT and non-electric HEAT.

Family size, i.e. the number of occupants, is another important determinant of EUI in low temperature districts. As can be seen, families with more than two occupants will have significantly higher EUI than those with two occupants. This may be because a larger family size causes more complicated occupant behavior patterns, thereby resulting in an increase in EUI. With regard to house type, it can be seen that detached houses with low heat loss coefficients (≤1.70 W/m² K) tend to have better energy performance than apartments, which can occur for at least two reasons. First, a small HLC contributes greatly to reducing energy consumption for space heating and cooling; second, detached houses normally have larger floor areas than apartments while both have approximately the same family size, which also lowers EUI values.
Such information can help building designers and owners make intelligent decisions to improve building energy performance and reduce building energy consumption. For example, based on the above information, architects and building designers can identify the parameter that deserves more attention, as well as its value range, at the early design stage. Also, they can perform a fast performance estimation of newly constructed residential buildings. Moreover, building owners can easily determine which energy source should be used for space heating, hot water supply, and kitchen to save energy. It should be mentioned that heat loss coefficient and equivalent leakage area cannot be determined directly by architects and building designers. However, their values can be adjusted through indirect measures such as improving construction material and building air tightness.

5. Conclusions

In this paper, a decision tree method is proposed for building energy demand modeling. This method is applied to Japanese residential buildings for predicting and classifying building EUI levels, and its basic steps, such as the generation of the decision tree based on training data and the evaluation of the decision tree based on test data, are presented. The results have demonstrated that the decision tree method can classify and predict building energy demand levels accurately (93% for training data and 92% for test data), identify and rank significant factors of building EUI levels automatically, and provide the combination of significant factors as well as the threshold values that will lead to high building energy performance. Such a method, along with the derived information, could benefit building owners and designers greatly, and one crucial benefit is improving building energy performance and reducing energy consumption and the money spent on energy. The decision tree method is mainly employed to predict categorical variables (the number of predetermined target intervals depends on the size of the database, while too many intervals may result in errors in classification) and a reference value (i.e. the average value of EUI in each class in this study) instead of the precise value of target variables. Nevertheless, as a modeling technique, the decision tree method is very simple to use, and its results can be interpreted more easily than those of other widely used modeling techniques, such as the regression method and the ANN method.
The application of the decision tree method to Japanese residential buildings in this paper has clearly demonstrated that this method is feasible and has many advantages over other modeling techniques. However, further study still needs to be carried out to provide deeper insights into the use of this method for modeling building energy demand. The main focus of future research should be placed on selecting an appropriate interval number and reference value of target variables without reducing estimation accuracy, since these measures will provide more precise and valuable information to users. In addition, more case studies in different sectors, such as commercial buildings and office buildings, should be conducted to further benefit energy conservation and policy formulation.
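
As a compact illustration of the prediction procedure in Section 4.3.1, the sketch below walks the Table 5 building through the node tests quoted in Steps 1 to 7. It is a minimal sketch under stated assumptions: only the nodes visited in those steps are encoded, every other branch of Fig. 6 is left as unknown (None), and the dictionary layout and the function name predict are hypothetical choices made for this example, not part of the original method description.

```python
# Partial encoding of the Fig. 6 decision tree. Only the nodes visited in
# Steps 1-7 are included; branches not described in the text are None.
TREE = {
    1: {"test": lambda b: b["TEMP"] == "High", "yes": 2, "no": None},
    2: {"test": lambda b: b["HLC"] <= 3.89, "yes": 4, "no": None},
    4: {"test": lambda b: b["ELA"] <= 4.41, "yes": 8, "no": None},
    8: {"test": lambda b: b["HWS"] == "Electricity", "yes": None, "no": 13},
    13: {"test": lambda b: b["KITC"] == "Electricity", "yes": None, "no": 17},
    17: {"test": lambda b: b["HLC"] <= 2.93, "yes": 20, "no": None},
    # Leaf values taken from Step 7: level, confidence, mean EUI in MJ/m2.
    20: {"leaf": "LOW", "confidence": 0.89, "reference_eui": 315},
}

# Building parameters from Table 5.
BUILDING = {
    "TEMP": "High", "HOUS": "Detached house", "CONS": "Wood", "NUM": 4,
    "AREA": 100, "HLC": 2, "ELA": 3, "HEAT": "Electricity",
    "HWS": "Non-electricity", "KITC": "Gas",
}


def predict(building, tree, root=1, threshold=0.85):
    """Follow node tests from the root to a leaf.

    Returns (level, confidence, accepted, path), where `accepted` is True
    when the leaf confidence is larger than the threshold.
    """
    node_id, path = root, []
    while True:
        path.append(node_id)
        node = tree[node_id]
        if "leaf" in node:
            accepted = node["confidence"] > threshold
            return node["leaf"], node["confidence"], accepted, path
        branch = "yes" if node["test"](building) else "no"
        node_id = node[branch]
        if node_id is None:
            raise ValueError(f"branch after node {path[-1]} is not encoded")


level, confidence, accepted, path = predict(BUILDING, TREE)
print(level, confidence, accepted, path)
```

Keeping the visited path alongside the prediction mirrors the interpretability argument made above: each entry in the path corresponds to one readable rule condition.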

Acknowledgements

The authors would like to express their gratitude to the Public Works and Government Services Canada, and Concordia University for the financial support.
