Ref Paper Part
pattern. Architects must integrate their experience, the design task, the specific environment of the building site, socio-economic constraints, and various architectural standards into their design considerations. Through sketching, modeling, and other methods, architects can explore various design possibilities. This exploration supports informed design decisions and ultimately results in a refined building plan. In the traditional, experience-oriented architectural design process, architects evaluate building performance mainly on the basis of their own experience. That experience may come from widely circulated regional design techniques, design rules summarized from repeated design practice, design strategies that have been proposed and experimentally verified, etc. However, the design experience of architects is limited and varies greatly among individuals. As a result, experience-based performance optimization cannot quantify how much a design solution improves performance, and it struggles to account for conflicting performance objectives. With the development of computer technology, design decisions are supported by more explicit performance data calculated by building simulation. Building performance results are generated from a range of input parameters and boundary conditions, covering the building environment, geometry, material construction, and equipment operation, among other simulation settings.

To improve building performance prediction, the training data for the machine learning models must be sampled carefully from the building design space so that the space is covered comprehensively. Indeed, the efficacy of the prediction depends largely on the quality of these sampled points. To this end, this study deploys the Latin Hypercube Sampling (LHS) technique, which is well known for efficient design-space sampling. Compared with conventional random sampling, LHS stratifies the probability distribution of each input. This ensures that the whole design space is represented and lays a robust foundation for the machine learning model predictions.

Building on this, the study adopts a parametric modeling methodology to establish a building model with thermal zoning, in which the building design parameters are variable inputs. The model adjusts automatically whenever the design parameters change. In essence, this parametric framework captures a wide array of potential building solutions, making it suitable for thorough evaluation. Once the architectural design parameters have been obtained via Latin Hypercube Sampling, they are fed into the architectural parametric model. After the required simulation parameters and boundary conditions are configured, the study simulates the anticipated building performance. This yields training data that pair the architectural design parameters with the corresponding simulated performance values.

3.2. Establishment of the machine learning model

While optimization algorithms can address complex problems, they also have a significant limitation: when there are many candidate solutions, hundreds or even thousands of simulation evaluations may be needed to identify the optimum. Integrated building performance optimization typically involves substantial data, which makes simulation-based optimization labor-intensive and challenging in practice. Many researchers therefore employ metamodeling, building approximate models from the stochastic simulation data to capture the overall characteristics of the simulation data.

For a methodical assessment, this research partitions the combined set of sampled architectural design parameters and building performance data into two distinct categories: a training dataset and a validation dataset. The former is used to train the machine learning model, and the latter to assess the prediction model's accuracy. Traditional machine learning modeling requires detailed manual calibration of various parameters, a process that demands considerable expertise to obtain the best predictive performance from the resulting model. Because architects may find this level of manual tuning difficult, this study instead uses an optimization algorithm that automatically identifies suitable modeling and training parameters. The algorithm minimizes the mean square error (MSE) between the simulated and predicted results of the training dataset. It also maximizes the correlation coefficient between the simulated and predicted results of the validation dataset. In this way, the machine learning model is tuned to achieve high prediction accuracy.
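The Latin Hypercube Sampling step described above can be sketched as follows. This is a minimal, standard-library illustration, not the study's implementation. It treats the four parameters below as continuous over the ranges in Table 5, whereas the study uses the discrete levels listed there. The function name `latin_hypercube` and the choice of parameters are hypothetical. Each range is split into `n_samples` equally probable strata, one value is drawn inside each stratum, and the strata are shuffled independently per parameter so that every parameter covers its whole range exactly once per stratum.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin Hypercube Sampling over continuous parameter ranges.

    bounds: list of (low, high) pairs, one per design parameter.
    Returns n_samples points, each a list with one value per parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # one uniform draw inside each of the n equal-probability strata
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # shuffle per parameter so strata are paired at random across dimensions
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # transpose: one row per sample point
    return [list(point) for point in zip(*columns)]


# Hypothetical continuous ranges for In-1, In-4, In-9 and In-10 (Table 5)
BOUNDS = [(20.0, 35.0), (3.0, 4.2), (0.15, 0.45), (0.25, 0.99)]

design = latin_hypercube(20, BOUNDS, seed=42)
```

In practice, a library routine such as SciPy's `scipy.stats.qmc.LatinHypercube` produces the same kind of unit-cube design, which can then be scaled to the parameter ranges.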
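The tuning objective in Section 3.2 pairs a training-set MSE with a validation-set correlation coefficient. The text does not state how the two goals are combined or which optimization algorithm searches the hyperparameters. The sketch below therefore makes three assumptions: it takes the correlation coefficient to be Pearson's r, folds both goals into one score with a hypothetical `weight`, and replaces the optimizer with an exhaustive loop over a candidate list. `train_and_predict` is a hypothetical callback that fits the surrogate model and returns simulated and predicted values for both datasets.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Vector = Sequence[float]


def mse(actual: Vector, predicted: Vector) -> float:
    """Mean square error between simulated and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x: Vector, y: Vector) -> float:
    """Pearson correlation coefficient (assumed meaning of 'correlation coefficient')."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fitness(train_sim: Vector, train_pred: Vector,
            valid_sim: Vector, valid_pred: Vector, weight: float = 1.0) -> float:
    """Lower is better: training MSE minus weighted validation correlation."""
    return mse(train_sim, train_pred) - weight * pearson_r(valid_sim, valid_pred)


def select_hyperparameters(
    candidates: Iterable[dict],
    train_and_predict: Callable[[dict], Tuple[Vector, Vector, Vector, Vector]],
):
    """Return (best_params, best_score) over the candidate settings.

    train_and_predict (hypothetical) fits the model with one candidate setting
    and returns (train_sim, train_pred, valid_sim, valid_pred).
    """
    best_params, best_score = None, math.inf
    for params in candidates:
        score = fitness(*train_and_predict(params))
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Because MSE carries the squared unit of the output while r is dimensionless, `weight` (or a normalized error) would have to be chosen per output in any real use of this sketch.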
Z. Li et al. Journal of Cleaner Production 425 (2023) 138922
$$\min_{j,\,t}\left[\min_{c_1}\sum_{i:\,x_i\in R_1(j,t)}\left(y_i-c_1\right)^2+\min_{c_2}\sum_{i:\,x_i\in R_2(j,t)}\left(y_i-c_2\right)^2\right]$$

where $R_1(j,t)$ and $R_2(j,t)$ denote the data partitions (regions) resulting from the split; $c_1$ and $c_2$ are the means of the dependent variable for the data points within $R_1$ and $R_2$, respectively; $j$ is a feature and $t$ is the chosen threshold for the split.
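The split criterion above can be evaluated by exhaustive search, as in CART regression trees. The sketch below is a plain-Python illustration that rests on two assumptions not stated in the text. First, $R_1(j,t)$ holds the points whose feature $j$ is at most $t$ and $R_2(j,t)$ the rest. Second, the candidate thresholds are the observed feature values. Because the inner minimizations over $c_1$ and $c_2$ are solved by the region means, each candidate split only needs the sum of squared deviations from the mean. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (the optimal constant c)."""
    if not values:
        return 0.0
    c = sum(values) / len(values)
    return sum((v - c) ** 2 for v in values)


def best_split(X: List[List[float]], y: List[float]) -> Tuple[Optional[int], Optional[float], float]:
    """Search feature j and threshold t minimising SSE(R1) + SSE(R2).

    Assumed convention: R1(j, t) = {i : X[i][j] <= t}, R2(j, t) = {i : X[i][j] > t}.
    Returns (j, t, cost); j and t are None when no split is possible.
    """
    best_j, best_t, best_cost = None, None, float("inf")
    for j in range(len(X[0])):
        # candidate thresholds: distinct observed values, excluding the maximum
        # so that R2 is never empty
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best_j, best_t, best_cost = j, t, cost
    return best_j, best_t, best_cost
```

A full regression tree applies this search recursively to each resulting region until a stopping rule (for example, a maximum depth) is met. Library implementations typically place thresholds between sorted values rather than on them.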
Table 4
Comparison between different building layouts.

| Type | Feature | Advantage | Disadvantage |
|---|---|---|---|
| Side-by-side style | Panelized building units are laid out side by side in a certain orientation. | The layout is relatively regular and convenient for construction; the arrangement meets the daylight spacing requirement; land can be economized. | The spatial design is monotonous; staggered layouts, varied spacing, etc. are generally used to break the regularity; building spacing affects building ventilation. |
| Enclosure type | The buildings are laid out along the street or in the form of a courtyard, forming a closed or semi-closed inner courtyard space. | The enclosed space is quiet and safe and is conducive to outdoor activities; it can increase the building density. | Some rooms are poorly oriented and the buildings block each other; if the buildings are too high, daylighting and natural ventilation are hindered. |
| Point-group type | Mostly a layout of single courtyard houses, multi-story point-type houses, and high-rise tower houses, surrounded by public buildings, central green areas, etc. | The flexible building layout makes the terrain easy to use; daylighting conditions are better, and wind passes more easily between high buildings. | The plot ratio is usually high, and high-rise buildings tend to make residents feel depressed; building orientation has a great impact on ventilation. |
| Mixed type | Combination or deformation of the 3 basic forms. | Flexible architectural design can easily meet most people's needs. | There is a high requirement for designers. |
Table 5
Types and definitions of design parameters.

| Parameter | Description | Definition / range (levels) | Unit |
|---|---|---|---|
| In-1 | Front façade length of building | 20–35 (20, 25, 30, 35) | m |
| In-2 | Building width | 300 / front façade length of building | m |
| In-3 | Total building height | Height of building standard floor × 4 | m |
| In-4 | Height of building standard floor | 3–4.2 (3, 3.4, 3.8, 4.2) | m |
| In-5 | Height-width ratio of building street | 1–2 (1, 1.5, 2) | – |
| In-6 | Distance between trees | 4–10 (4, 6, 8, 10) | m |
| In-7 | Crown size of tree | 2–3.6 (2, 2.4, 3, 3.6) | m |
| In-8 | Street material | 0 = concrete; 1 = asphalt; 2 = granite; 3 = moist muddy ground | – |
| In-9 | Window-wall ratio of building | 0.15–0.45 (0.15, 0.25, 0.35, 0.45) | – |
| In-10 | Solar heat gain rate of window | 0.25–0.99 (0.25, 0.5, 0.75, 0.99) | – |
| In-11 | Total floor area | 1200 | m² |
| In-12 | Shape coefficient | (S_total surface − S_ground floor surface) / V_building | – |
| In-13 | Building energy | Obtained by OpenStudio simulation | kWh/m² |
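Three entries in Table 5 are derived rather than sampled. In-2 and In-3 follow from In-1 and In-4, and In-12 follows from the resulting geometry. The helper below is a sketch that assumes a rectangular four-story block with a flat roof and equal floor plates. Under that assumption the 1200 m² total floor area (In-11) gives a 300 m² footprint, which is where the 300 in the In-2 definition comes from. The function name is hypothetical.

```python
def derived_parameters(facade_length, floor_height, floors=4, total_floor_area=1200.0):
    """Derive In-2, In-3 and In-12 (Table 5) from In-1 and In-4.

    Assumes a rectangular block with a flat roof and equal floor plates.
    """
    footprint = total_floor_area / floors        # 1200 / 4 = 300 m2 per floor
    width = footprint / facade_length            # In-2 = 300 / In-1
    height = floor_height * floors               # In-3 = In-4 * 4
    volume = footprint * height
    walls = 2 * (facade_length + width) * height
    roof = footprint
    # In-12: (total exterior surface - ground-floor surface) / volume;
    # removing the ground slab leaves walls + roof
    shape_coefficient = (walls + roof) / volume
    return {
        "width": width,
        "height": height,
        "volume": volume,
        "shape_coefficient": shape_coefficient,
    }


params = derived_parameters(facade_length=25, floor_height=3.4)
```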
climate comfort among cities.

5.1. Sensitivity analysis

Over the past twenty years, numerous computational methods have been introduced and documented with the objective of forecasting the outcomes of complex phenomena. Because these phenomena are highly non-linear, traditional deterministic techniques are unsuitable. Ensemble-based machine learning models hold a prominent position among these methods. Ensemble learning techniques enhance the performance of weaker models: an ensemble model built from two or more individual soft computing models typically outperforms a single model. Taking these points as a reference, this study implements three widely used ensemble-based machine learning models, namely RFR, DTR, and GBR, for microenvironment simulation of residential buildings (Yin, L et al., 2022a,b; Yin, L et al., 2021). The performance of the developed models is presented in this subsection. The dataset for each city was obtained from the stochastic simulation. The research and analysis covered the thermal environment of low- and medium-rise old communities in five typical Chinese cities, namely Chengdu, Hangzhou, Nanning, Shenzhen, and Wuhan. On this basis, a total of 3659 datasets were acquired; the number of records for each city is given in Table 8. For better understanding, Pearson correlation matrices are presented in Figs. 12–16. For instance, Fig. 12 shows that the correlation coefficients between In-1 and GHE, UTCI, and DH are 0.47, 0.36, and 0.06, respectively. Those between In-2 and GHE, UTCI, and DH are −0.48, −0.36, and −0.06, respectively, and so on. The correlations between the inputs and GHE, UTCI, and DH for residential buildings in the other cities can be read from Figs. 12–16 in the same way. The comparison shows that, except for Wuhan, In-3 to In-6 and In-8 were all negatively correlated with UTCI. In-5 had the largest negative effect on UTCI under the Hangzhou and Chengdu climatic conditions (−0.53 and −0.64, respectively). These values are stronger than the negative effects of In-5 on UTCI under the Shenzhen and Nanning climatic conditions (−0.36 and −0.40, respectively). Moreover, in all typical cities except Chengdu, In-9 and In-10 are positively correlated with GHE and DH, and more strongly than the other parameters. For example, in the Hangzhou climate, the correlations of In-9 with GHE and DH are 0.64 and 0.29, respectively, whereas those of In-10 with GHE and DH are 0.55 and 0.86, respectively. By contrast, the effect of In-10 on GHE was weaker
relative to In-1 to In-4 in the Chengdu climate, only 0.39, while the effects of In-1, In-3 and In-4 on GHE were all greater than 0.46, and In-2 was negatively correlated with GHE (−0.48). It is interesting to note that the effects of In-1 to In-4 on DH are small in all cities. For example, in hot-summer and cold-winter climates, such as Hangzhou, Chengdu and Wuhan, the effects of In-1 to In-4 on DH are in the range of −0.1 to 0.24. In hot-summer and warm-winter climates, such as Shenzhen and Nanning, they are in the range of −0.1 to 0.1. In addition, in Shenzhen and Nanning, the effects of In-1 to In-4 on UTCI are also very weak, lying between −0.13 and 0.05. This means that the morphological parameters of the building do not affect the indoor and outdoor thermal environments as strongly as expected.

Table 6
Parameters of the community model of Pingfeng New Village.

| Type | Sub-category | Parameter category | Unit | Typical model |
|---|---|---|---|---|
| Geographical location | Climate | Meteorological parameters | – | Liuxia Street |
| Building form parameters | Building type | Number of floors | – | 4 |
| | | Front façade length of building (S/N direction) | m | 20.8 |
| | | Total width of building | m | 14.4 |
| | | Standard floor height | m | 4 |
| | | Total building height | m | 16 |
| | | Width-length ratio of building | – | 0.69 |
| | | Window-wall ratio (WWR) | – | 0.26 |
| | | Orientation | deg | 0 |
| Geometry parameters | | Volume | m³ | 4792.3 |
| | | Total exterior area (ground area included) | m² | 1725.92 |
| | | Area of whole floor | m² | 1200 |
| | | Shape coefficient | – | 0.298 |
| Envelope design parameters of building | Envelope structure | Heat transfer coefficient of exterior walls (average value) | W/(m²K) | 0.181 |
| | | Heat transfer coefficient of ground (average value) | W/(m²K) | 0.187 |
| | | Heat transfer coefficient of roof (average value) | W/(m²K) | 0.191 |
| | | Heat transfer coefficient of window (average value) | W/(m²K) | 1.6 |
| | | Solar heat gain coefficient (shading coefficient) | – | 0.7 |
| Community form | Community form data | Community plot ratio | – | 52.9 |
| | | Community greenery rate | % | 8.8 |
| | | Height-width ratio of street | – | 16/11 = 1.46 |
| | | Space between trees | m | None |
| | | Tree canopy size | m | None |
| | | Street material | – | Asphalt |

Table 8
City-wise details of total, training and testing datasets.

| Typical city | Total dataset | Training (TR – 80%) | Testing (TS – 20%) |
|---|---|---|---|
| Hangzhou | 803 | 642 | 161 |
| Chengdu | 856 | 685 | 171 |
| Wuhan | 514 | 411 | 103 |
| Shenzhen | 885 | 708 | 177 |
| Nanning | 601 | 481 | 120 |

After data collection and segregation by city, each dataset was partitioned into training (TR) and testing (TS) datasets prior to computational modeling, as detailed in Table 8. For each city, 80% of the records were randomly selected for the training dataset, and the remaining 20% formed the testing dataset. The training dataset was used to build the models, whereas the testing dataset was used to validate their ability to predict unseen data. A flowchart outlining the computational modeling steps is presented in Fig. 2.

The performance of the developed models in predicting GHE, UTCI, and DH is presented in Tables 9–11 for both the training and testing datasets. Three performance criteria were determined and assessed: mean absolute error (MAE), mean square error (MSE), and coefficient of determination (R²). The results show that all the developed models estimate the desired outputs with high precision. In the training phase, R² was between 99% and 100% for the RFR model, between 97% and 100% for the GBR model, and 100% for the DTR model. This accuracy decreased slightly in the testing phase (Raja et al., 2023; Salami et al., 2022).

In the training phase, the R² values of the developed RFR model ranged from 0.9947 to 0.9990 for Hangzhou, 0.9946 to 1 for Chengdu, 0.9970 to 1 for Wuhan, 0.9903 to 1 for Shenzhen, and 0.9965 to 0.9999 for Nanning. In the testing phase, they ranged from 0.9608 to 0.9999 for Hangzhou, 0.9821 to 0.9999 for Chengdu, 0.9841 to 0.9999 for Wuhan, 0.9508 to 1 for Shenzhen, and 0.9559 to 0.9999 for Nanning. The performance of the GBR and DTR models is given in Tables 10 and 11, respectively.

Table 7
Performance indicators of the baseline model under different typical urban climatic conditions.

| Objective function | Hangzhou | Chengdu | Wuhan | Shenzhen | Nanning |
|---|---|---|---|---|---|
| UTCI [%]: comfort hours percentage in outdoor thermal environment | 0 | 0 | 0 | 0 | 0 |
| DH [%]: discomfort hours percentage in indoor thermal environment | 62.5 | 58.33 | 49.37 | 58.33 | 56.25 |
| GHE [t]: building greenhouse gas emission | 508.49 | 354.98 | 575.86 | 766.28 | 624.77 |

Scatterplots for the best-performing models (as detailed in Table 12) are presented in Fig. 17, which shows city-wise scatterplots of actual versus estimated values for the testing dataset. A scatterplot displays the relationship between two numerical variables, here the actual and estimated values. The scatter between the actual and estimated values is very low, as most points fall within the ±10% deviation lines (black dotted lines). The top-performing model for each city was selected based on the MSE value.

For better illustration, Fig. 18 presents bar plots of the MSE values for the testing datasets; the larger the bar, the higher the error between the actual and estimated values. Moreover, Figs. 19 and 20 present, respectively, violin plots of the absolute error between the actual and estimated values, and error histograms with their distributions, again for the testing datasets. A violin plot is a hybrid of a box plot and a kernel density plot that shows the peaks in the data; it is used to visualize the distribution of numeric data. Unlike a box plot, which shows only summary statistics, a violin plot depicts summary statistics as well as the density of each variable. The histogram and the density plot are used to check whether it is reasonable to assess the errors inherent in the database. According to Fig. 19, the following models show a lower accuracy level:

- RFR-UTCI, GBR-UTCI, and DTR-UTCI for Chengdu
- RFR-GHE, DTR-GHE, and DTR-UTCI for Hangzhou
- RFR-UTCI, GBR-UTCI, DTR-UTCI, and DTR-DH for Nanning
- DTR-GHE and DTR-UTCI for Shenzhen
- DTR-GHE and DTR-DH for Wuhan

In contrast, Fig. 20 shows that the error range is lowest for Nanning, followed by Wuhan, Hangzhou, Chengdu and Shenzhen. The mu and sigma values shown inside the figures can also be used to assess the error distribution of the estimated datasets for the developed models: a model with mu and sigma values closer to zero indicates a higher precision level.
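The mu and sigma values and the ±10% deviation lines used above can be computed from a testing dataset with a few lines of standard-library Python. This sketch makes three assumptions: the error is predicted minus actual, sigma is the sample standard deviation, and points whose actual value is zero count as outside the ±10% band because their relative deviation is undefined. The function name `error_summary` is hypothetical.

```python
import statistics


def error_summary(actual, predicted, band=0.10):
    """Return (mu, sigma, share_within_band) for a testing dataset.

    mu, sigma: mean and sample standard deviation of errors (predicted - actual).
    share_within_band: fraction of points with |predicted - actual| <= band * |actual|,
    i.e. inside the ±10 % deviation lines of a scatterplot when band = 0.10.
    Points with actual == 0 have no defined relative deviation and are counted
    as outside the band.
    """
    errors = [p - a for a, p in zip(actual, predicted)]
    mu = statistics.mean(errors)
    sigma = statistics.stdev(errors)  # needs at least two points
    within = sum(
        1 for a, p in zip(actual, predicted) if a != 0 and abs(p - a) <= band * abs(a)
    )
    return mu, sigma, within / len(actual)
```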
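Table 9
Fitting indexes of the prediction model (RFR) for each city.

| City | Dataset | Output | MAE | MSE | R² |
|---|---|---|---|---|---|
| Hangzhou | Training | GHE | 0.56 | 30.12 | 1.00 |
| | | UTCI | 0.46 | 0.66 | 0.99 |
| | | DH | 0.22 | 0.40 | 0.99 |
| | Testing | GHE | 1.09 | 4.33 | 1.00 |
| | | UTCI | 1.29 | 4.76 | 0.96 |
| | | DH | 0.66 | 1.22 | 0.98 |
| Chengdu | Training | GHE | 0.16 | 0.16 | 1.00 |
| | | UTCI | 0.54 | 1.16 | 0.99 |
| | | DH | 0.11 | 0.06 | 1.00 |
| | Testing | GHE | 0.37 | 0.95 | 1.00 |
| | | UTCI | 1.64 | 10.83 | 0.95 |
| | | DH | 0.40 | 0.64 | 0.98 |
| Wuhan | Training | GHE | 0.62 | 1.20 | 1.00 |
| | | UTCI | – | – | – |
| | | DH | 0.34 | 0.30 | 1.00 |
| | Testing | GHE | 1.46 | 5.19 | 1.00 |
| | | UTCI | – | – | – |
| | | DH | 0.83 | 1.48 | 0.98 |
| Shenzhen | Training | GHE | 0.47 | 1.20 | 1.00 |
| | | UTCI | 0.30 | 0.61 | 0.99 |
| | | DH | 0.35 | 0.28 | 1.00 |
| | Testing | GHE | 1.15 | 3.21 | 1.00 |
| | | UTCI | 0.86 | 3.51 | 0.95 |
| | | DH | 1.00 | 2.40 | 0.99 |
| Nanning | Training | GHE | 0.37 | 1.33 | 1.00 |
| | | UTCI | 0.44 | 0.83 | 0.99 |
| | | DH | 0.32 | 0.25 | 1.00 |
| | Testing | GHE | 0.75 | 1.62 | 1.00 |
| | | UTCI | 1.25 | 5.27 | 0.96 |
| | | DH | 0.98 | 2.47 | 0.97 |

Table 10
Fitting indexes of the prediction model (GBR) for each city.

| City | Dataset | Output | MAE | MSE | R² |
|---|---|---|---|---|---|
| Hangzhou | Training | GHE | 0.70 | 1.01 | 1.00 |
| | | UTCI | 0.82 | 1.24 | 0.99 |
| | | DH | 0.58 | 0.57 | 0.99 |
| | Testing | GHE | 1.29 | 2.98 | 1.00 |
| | | UTCI | 1.04 | 1.91 | 0.98 |
| | | DH | 0.76 | 1.09 | 0.99 |
| Chengdu | Training | GHE | 0.30 | 0.21 | 1.00 |
| | | UTCI | 1.08 | 2.81 | 0.99 |
| | | DH | 0.35 | 0.26 | 0.99 |
| | Testing | GHE | 0.49 | 0.83 | 1.00 |
| | | UTCI | 1.67 | 8.96 | 0.96 |
| | | DH | 0.49 | 0.52 | 0.99 |
| Wuhan | Training | GHE | 0.70 | 0.99 | 1.00 |
| | | UTCI | – | – | – |
| | | DH | 0.62 | 0.68 | 0.99 |
| | Testing | GHE | 1.56 | 4.61 | 1.00 |
| | | UTCI | – | – | – |
| | | DH | 0.91 | 1.42 | 0.98 |
| Shenzhen | Training | GHE | 1.13 | 2.59 | 1.00 |
| | | UTCI | 0.75 | 1.70 | 0.97 |
| | | DH | 0.74 | 0.93 | 1.00 |
| | Testing | GHE | 1.66 | 5.40 | 1.00 |
| | | UTCI | 1.10 | 3.71 | 0.95 |
| | | DH | 1.08 | 2.17 | 0.99 |
| Nanning | Training | GHE | 0.46 | 0.43 | 1.00 |
| | | UTCI | 0.95 | 1.82 | 0.98 |
| | | DH | 0.68 | 0.83 | 0.99 |
| | Testing | GHE | 0.95 | 2.32 | 1.00 |
| | | UTCI | 1.35 | 4.97 | 0.95 |
| | | DH | 1.20 | 2.97 | 0.96 |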
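The MAE, MSE and R² columns in Tables 9 and 10 can be reproduced with the standard definitions, sketched below. R² is computed here as the coefficient of determination, 1 − SS_res/SS_tot; that formula is an assumption, since the paper names the index without giving it. MSE carries the squared unit of the output (for example, t² for GHE), so its values are comparable only within one output. The function name is hypothetical.

```python
def fitting_indexes(actual, predicted):
    """Return (MAE, MSE, R2) for one output of one dataset."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return mae, mse, r2
```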
Fig. 17. Scatterplots for the best-obtained models (based on MSE value in the testing phase) for (a) Hangzhou, (b) Chengdu, (c) Wuhan, (d) Shenzhen and (e)
Nanning cities.
Fig. 18. Illustrations of MSE values for the testing dataset (a) for RFR, (b) for GBR, and (c) for DTR models.
Fig. 20. Illustration of error histogram with distribution (testing phase) for (a) Hangzhou, (b) Chengdu, (c) Wuhan, (d) Shenzhen and (e) Nanning cities.
indoor discomfort is reduced by 10.59%, annual greenhouse gas emissions decrease by 34.1 t, and outdoor thermal comfort time increases by 55.61%. This improves the environmental performance of the community in all three respects, although the DH optimal solution does not reduce carbon emissions as much as the GHE optimal solution. In contrast to the GHE and DH optimal solutions, the UTCI solution boosts outdoor comfort time by 75.32% but barely reduces indoor discomfort (by just 0.03%), and it increases annual greenhouse gas emissions by 27 t. Therefore, from a comprehensive perspective, the environmental performance of the UTCI optimal solution is inferior to that of the other two optimal solutions.
(3) Under Wuhan's climate conditions, evaluating outdoor thermal comfort for each solution is unnecessary, because the outdoor temperature during the hottest month consistently exceeds 32 °C across all solutions. The GHE optimal solution decreases community greenhouse gas emissions by 166.05 t at the cost of increasing indoor discomfort time by 7.36%. That is, the reduction in community greenhouse gas emissions is achieved at the expense of indoor thermal comfort. In contrast, the DH optimal solution reduces community greenhouse gas emissions by 129.05 t while also decreasing indoor discomfort time by 1.28%. Therefore, on the whole, the DH optimal solution is superior to the other two optimal solutions.
(4) Under Shenzhen's climate conditions, all three optimal solutions outperform the baseline model, yielding results distinct from those of the other cities. Specifically, the GHE optimal solution reduces greenhouse gas emissions by 371.62 t while increasing outdoor thermal comfort time by 11.31% and decreasing indoor discomfort time by 30.64%. The DH optimal solution increases outdoor comfort time by 3.28%, reduces indoor discomfort by 31.31%, and cuts the community's greenhouse gas emissions by 369.89 t. The UTCI solution increases outdoor comfort by 53.31% while cutting greenhouse gas emissions by 225.86 t and reducing indoor discomfort by 13.12%. Therefore, all three optimal solutions are applicable to improving the local community environment.
(5) Under Nanning's climate, the GHE solution cuts annual greenhouse gas emissions by 308.7 t compared with the baseline model and marginally improves outdoor comfort by 0.38%, but it increases indoor discomfort by 2.01%. Therefore, the greenhouse gas emission reduction is achieved at the cost of a deteriorated indoor thermal environment. In contrast, both the DH optimal solution and the UTCI optimal solution improve the environmental performance of the baseline model in all three respects. Here, the DH optimal solution decreases the indoor discomfort time by 6.74%
[…] time by 2.04%, and greenhouse gas emission by 265.19 t. […] Building on this, the outdoor comfort time surges by 52.30%. […] the UTCI optimal solution is slightly lower than that under the DH […] DH optimal solution.

Reflecting on the results, it's evident that every city's optimal solution […]
Table 13
Optimal design parameters for each typical city.

| N | Design variables | Hangzhou GHE(*) | Hangzhou DH(*) | Hangzhou UTCI(*) | Chengdu GHE(*) | Chengdu DH(*) | Chengdu UTCI(*) | Wuhan GHE(*) | Wuhan DH(*) | Wuhan UTCI(*) | Shenzhen GHE(*) | Shenzhen DH(*) | Shenzhen UTCI(*) | Nanning GHE(*) | Nanning DH(*) | Nanning UTCI(*) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

UTCI(*): optimal value of the comfort hours percentage in the outdoor thermal environment; DH(*): optimal value of the discomfort hours percentage in the indoor thermal environment; GHE(*): optimal value of building greenhouse gas emission.