A R T I C L E  I N F O

Keywords: Condensation heat transfer coefficients; Machine learning; ANN

A B S T R A C T

The study extensively explored the use of machine learning for predicting condensation heat transfer coefficients in tubes, comparing it with traditional empirical models. A diverse database encompassing various experimental conditions and tube shapes and surface structures was established. Features were categorized into primary, thermophysical, and dimensionless-number features, forming three different datasets named Group1, Group2, and Group3. Five machine learning models, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), Artificial Neural Networks (ANN), Support Vector Regression (SVR), and Classification and Regression Trees (CART), were employed to predict heat transfer coefficients in the database using these combined datasets.
Furthermore, the research demonstrated that machine learning models outperformed traditional empirical
models in specific conditions, particularly when utilizing complex input features. The ANN model achieved the
highest predictive performance with an R2 of 0.999, MRD of 0.70%, and MARD of 2.00%. In contrast, traditional
models performed relatively poorly, especially in diverse operating conditions. These findings underscore the
substantial potential of machine learning in addressing complex nonlinear two-phase flow heat transfer issues.
* Correspondence to: Professor and ASME Fellow in Department of Energy Engineering, Zhejiang University, Hangzhou, 310027, China.
E-mail address: [email protected] (W. Li).
https://doi.org/10.1016/j.ijheatmasstransfer.2024.125330
Received 21 October 2023; Received in revised form 1 February 2024; Accepted 13 February 2024
Available online 21 February 2024
0017-9310/© 2024 Elsevier Ltd. All rights reserved.
Tang et al. [8] showed that machine learning predictions outperformed existing Nusselt number correlations for bubble condensation. Nie et al. [9] established a database containing 28 different pure fluids and 6064 experimental data points. Using a machine learning approach based on this combined database, they studied the heat transfer coefficients in horizontal condensation. Five different machine learning models were chosen for training and performance evaluation, and the results were compared with existing empirical correlations. It was observed that machine learning had superior predictive capabilities. Finally, based on the analysis, a new general condensation heat transfer correlation was developed and validated, yielding an MRD of 1.77% and a MARD of 19.21%, ensuring that 78.21% of data points fell within a ±30% error band. Microchannels have a high surface area-to-volume ratio, resulting in enhanced heat transfer efficiency compared to conventional channels under similar flow conditions. Zhou et al. [10], in order to predict the condensation heat transfer coefficients in microchannels, also established a comprehensive flow condensation database (hydraulic diameters ranging from 0.424 mm to 6.52 mm). They utilized four machine learning models for prediction, with the Artificial Neural Network (ANN) model and Extreme Gradient Boosting (XGBoost) model achieving mean absolute errors (MAE) of 6.8% and 9.1%, respectively, demonstrating good predictive accuracy for test data. In addition to predicting the Nusselt number or heat transfer coefficients for fluids, machine learning models can also be applied to predict two-phase flow pressure drop, which is a crucial parameter for many industrial processes, making it a significant application area. Hughes et al. [11] used three different machine learning models to predict the frictional pressure drop during condensation inside pipes under various operating conditions. Among these models, the Random Forest model, with a MARE of only 3.4%, demonstrated superior predictive capabilities compared to the others. Santamaria et al. [12] employed a decision tree model to predict the frictional pressure drop within rectangular channels, achieving prediction errors within 10% of the measured values. Tarabkhah et al. [13] conducted a study using four machine learning models to simultaneously investigate the heat transfer coefficients and frictional pressure drop of refrigerants during condensation inside pipes. By comparison, XGBoost demonstrated higher accuracy and better generalization when predicting pressure drop, with values of R² and MARE of 0.999 and 3.87%, respectively, allowing it to predict 95% of data points within a ±20% error range. Additionally, scholars have applied machine learning methods in the field of flow pattern recognition. Traditional flow pattern classification often relies on direct visual observation. Seal et al. [14] achieved accurate classification of ten flow pattern maps in the condensation process of refrigerant R134a within inclined smooth tubes, with an accuracy exceeding 98%. This was accomplished using Artificial Neural Networks (ANN) and visualization data. Furthermore, Nie et al. [15] utilized a convolutional neural network (CNN)-based algorithm to accurately identify two-phase flow patterns. The database used to drive the algorithm consisted of 105,642 images of methane and tetrafluoromethane condensation flow patterns inside horizontal circular pipes from 696 test cases. The trained CNN algorithm achieved a prediction accuracy of 97.56%, providing a novel method for flow pattern recognition in two-phase flows.
In previous studies, researchers primarily collected data from condensation experiments on single smooth tubes to create their own databases for machine learning. However, this research addresses a gap by including various enhanced tube types. While these enhanced tubes offer superior heat transfer performance, their unique structures can significantly affect fluid flow and heat transfer properties [16]. To streamline the model and improve efficiency, only the tube diameter was considered as an input parameter for the machine learning model, and other structural parameters were omitted during data collection. This study compiled recent condensation data from the research group and collected data from other researchers to create a new two-phase flow condensation database covering various operating conditions. The database includes data from both smooth tubes and enhanced tubes, with a rough 1:1.1 ratio. Fig. 1 illustrates the external shapes of some of the enhanced tubes featured in earlier studies. Five machine learning models (RF, XGBoost, ANN, SVR, and CART) were developed using Python 3.10 and relevant libraries such as scikit-learn (v1.2.2), Pandas (v2.0.1), and NumPy (v1.24.3). After data preprocessing and grouping, these models were trained and evaluated using various metrics to predict condensation heat transfer coefficients. To assess their performance, the predictions were compared to traditional empirical correlations.

Fig. 1. Photographs of several enhanced tubes from previous studies (sourced from references (a) [17] Li et al., (b) and (c) [18] Li et al., (d) [19] Li et al., (e) [20] Sun et al., and (f) [21] Sun et al.).

2. Methods

2.1. Machine learning model

Two-phase flow heat transfer in pipes is a complex process influenced by variables like temperature, flow rate, and vapor quality. To assess and optimize heat transfer performance, this study employs five common machine learning (ML) models, namely Random Forest (RF), Extreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), Support Vector Regression (SVR), and Classification and Regression Trees (CART), tailored to the non-linear nature of the problem. Fig. 2 shows the schematics of the five ML models.

Random Forest (RF): RF is a popular ensemble learning algorithm, as illustrated in Fig. 2(a). It consists of multiple decision trees. The fundamental idea behind Random Forest is to create several decision trees, each capable of producing a prediction. Random Forest then combines the predictions from these individual decision trees by voting or averaging, selecting the mode or mean of the predicted results as the output of the model [22]. Random Forest is suitable for high-dimensional data modeling, capable of handling thousands of input variables while identifying the most important ones. During the model-building process, Random Forest also reduces sensitivity to feature selection by randomly choosing features. By integrating the output results from multiple decision trees, it enhances prediction accuracy, making its predictions more stable in high-dimensional modeling. Random Forest initially samples the dataset by randomly selecting samples with replacement from the original condensation dataset, creating a new training set. In this process, the new condensation dataset will contain some duplicate samples and may miss some data. Subsequently, feature selection is done randomly: when making predictions, each decision tree does not use all features simultaneously. Instead, it randomly selects m features from the total of M features (feature dimension reduction) and uses these features to split nodes. For the samples and features selected through these steps, a decision tree is constructed using the decision tree algorithm. This process does not require pruning, and each tree grows as much as possible until it can no longer split. By repeating these steps, multiple decision trees are generated, forming the RF model. When condensation data samples from the test set or new input samples are fed into the RF model, they are predicted individually by all decision trees. In this study, the condensation heat transfer coefficient (HTC) is the output, and predicting the HTC is a regression problem. Therefore, the RF model takes the average of all tree predictions as the final prediction for the condensation heat transfer coefficient within tubes.
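To make the procedure concrete, the sketch below fits a scikit-learn RandomForestRegressor (the library the models were built with, v1.2.2). It is a minimal illustration, not the paper's exact configuration: the synthetic arrays X and y stand in for the condensation feature matrix and HTC targets, and the hyperparameter values are sample points from the search ranges later listed in Table 4.

```python
# Minimal Random Forest sketch with scikit-learn; X and y are synthetic
# stand-ins for the condensation feature matrix and HTC targets.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((1947, 15))               # hypothetical: 1947 samples, 15 features
y = 600.0 + 18000.0 * rng.random(1947)   # hypothetical HTC values, W/(m^2*K)

rf = RandomForestRegressor(
    n_estimators=300,   # number of trees in the ensemble
    max_depth=20,       # illustrative pick from the Table 4 search range
    max_features=15,    # the "m out of M" features considered at each split
    bootstrap=True,     # resample the training set with replacement per tree
    random_state=0,
)
rf.fit(X, y)
htc_pred = rf.predict(X[:5])  # regression output: mean over all tree predictions
```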
Fig. 2. Schematic representations of the five models, respectively: (a) RF, (b) XGBoost, (c) ANN, (d) SVR, and (e) CART.
Extreme Gradient Boosting (XGBoost): XGBoost is a machine learning algorithm used for regression, classification, and ranking. It is an efficient gradient boosting framework with strong generalization capabilities, performing exceptionally well on large-scale datasets. As depicted in Fig. 2(b), XGBoost is an ensemble model that employs boosting algorithms primarily based on decision trees. In boosting algorithms, serial trees are incrementally fitted using a series of different training sets that are iteratively updated considering the residuals. During each iteration, the model optimizes the negative gradient values of the loss function to approximate the residuals for fitting regression trees [23]. This ensemble model is highly scalable; XGBoost can effortlessly scale to large datasets, supports high-dimensional features and interactions, and can handle more complex models. It also allows for custom loss and objective functions, adapting flexibly to various application scenarios. XGBoost is efficient in processing large-scale datasets and can be applied to various regression and classification tasks. It offers several interpretability tools, such as feature importance assessment, feature interaction detection, and model explanations, aiding users in better understanding the model and data while improving model trust and interpretability. XGBoost also exhibits efficient computational performance, allowing for faster training and prediction.

Artificial Neural Networks (ANN): An ANN consists of many interconnected units called neurons, forming a complex network structure. As shown in Fig. 2(c), an ANN model is typically composed of an input layer, several hidden layers, and an output layer. The fundamental principle of an ANN model involves using trainable linear parameters (such as weights and biases) and nonlinear activation functions (like ReLU, tanh, and Sigmoid) at the nodes of the hidden layers to process input data. In the interconnected hidden layers, the raw input features are passed sequentially. Ultimately, complex nonlinear mappings can be established between the input parameters and the final output results, aiming to achieve the goal of predicting target values [10].

Support Vector Regression (SVR): SVR is the application of Support Vector Machines (SVM) to regression problems. It is a non-probabilistic algorithm that uses a kernel function to map data to a high-dimensional space and finds the optimal hyperplane with maximum margin between the training data points in that space to obtain a regression model [8]. SVR excels in handling complex nonlinear regression problems by mapping input features to high-dimensional spaces using kernel functions. It offers robustness against outliers, good stability, and improved predictive performance through control of model complexity and generalization. Additionally, SVR can effectively address high-dimensional data, generate nonlinear regression curves, and fine-tune model performance by optimizing key parameters, such as the kernel function type, the error margin, and the regularization parameter [24], using techniques like grid search, random search, and Bayesian optimization in practical applications. SVR can generate nonlinear regression curves and predict error data points near the regression curve.
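For concreteness, the three regressors just described can be instantiated as in the sketch below. The hyperparameter values are illustrative picks from the search ranges later listed in Table 4, not the tuned optima, and X_train/y_train are assumed to be the standardized training arrays produced in Section 2.2.

```python
# Illustrative instantiation of the XGBoost, ANN, and SVR regressors.
from xgboost import XGBRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

xgb = XGBRegressor(n_estimators=300, max_depth=7, learning_rate=0.1,
                   colsample_bytree=0.75,   # subsample ratio of columns
                   gamma=0.25,              # minimum loss reduction for splitting
                   reg_alpha=0.1,           # L1 regularization term on weights
                   reg_lambda=0.1)          # L2 regularization term on weights

ann = MLPRegressor(hidden_layer_sizes=(160, 80, 30), activation="relu",
                   solver="adam", learning_rate_init=1e-4, alpha=0.01,
                   max_iter=500, random_state=0)

svr = SVR(kernel="rbf", C=5.0, epsilon=0.3)  # penalty coefficient C, error margin

# for model in (xgb, ann, svr):
#     model.fit(X_train, y_train)            # arrays produced in Section 2.2
```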
Classification and Regression Trees (CART): CART is a type of decision tree algorithm capable of handling both classification and regression problems. It is a predictive model used to estimate the value of a target variable based on input variables, aiming to generate predicted values (targets) based on known values of predictive factors [25]. The CART algorithm consists of three parts: feature selection, tree generation, and pruning. Its core concept is binary splitting based on a single variable. In each decision-making step, the CART algorithm performs a binary split on the observed variables using a binary recursive partitioning technique, dividing the current sample set into two sub-sample sets. This ensures that every non-leaf node of the generated decision tree has only two branches [26]. CART uses binary tree structures, employing binary splitting for feature attributes. It automatically selects features, simplifying complex feature selection tasks. This makes it ideal for challenging, high-dimensional classification and regression problems. CART's local modeling approach accommodates nonlinear data relationships, making it a good fit for predicting heat transfer coefficients in complex two-phase flow heat transfer processes.

Fig. 3 illustrates the process of training and evaluating the dataset using the five ML models. The initial step involves data preprocessing, where the data is segmented into three groups according to predefined rules and then fed into the models. Subsequently, feature standardization is performed to ensure that samples maintain sufficient separation, preventing them from being too close, thus reducing the training complexity. Following standardization, the dataset is split, with 80% of the data allocated for model training and the remaining 20% reserved for testing and evaluating the predictive performance of the model. After instantiating the ML models, a grid search for hyperparameter tuning and model training is executed. Once the model's evaluation results align with expectations, the final heat transfer coefficient predictions are generated.

2.2. Dataset creation

We compiled a new dataset by collecting 1947 data points from 14 published papers. Fig. 4 illustrates the dataset. It consists of 633 data points from a prior study's database, including 441 for enhanced tube condensation and 192 for smooth tube condensation. Additionally, we gathered 1314 data points from other research sources, comprising 582 for enhanced tube condensation and 732 for smooth tube condensation. These data points form the dataset for our machine learning models. Fig. 4(b) shows the distribution of the various working fluids in the dataset. R410A is the most common working fluid, representing 34.93% with 680 data points, followed by R32 at 18.80% with 366 data points. To avoid overfitting, we added data with different working fluids for a broader dataset. Fig. 4(c) reveals the mass flux distribution, with the majority (about 91.23%) below 600 kg·m⁻²·s⁻¹. Fig. 4(d) displays the heat transfer coefficient distribution, with approximately 46.94% falling in the range of 2000–4000 W·m⁻²·K⁻¹, commonly seen in heat exchanger tubes. A broader data distribution enhances model generalization and mitigates overfitting [27]. This diverse dataset benefits the subsequent machine learning models. Table 1 presents detailed information about the sources and operating conditions of the 1947 data points in the dataset. In summary:

- Geometry: ST, EHT
- Working fluids: methane/propane, R1234yf/R32, R1234ze(E), R125, R134a, R22, R32, R407C, R410A, R450A, R515B
- Diameter (D): 1.088–11.5 mm
- Mass flux (G): 47.2–1580.3 kg·m⁻²·s⁻¹
- Heat transfer coefficient (h): 606.1–18,819.2 W·m⁻²·K⁻¹
- Saturation temperature (Tsat): −104.4–55 °C
- Quality (x): 0.01–0.97
- Uncertainty: ±5.0%–±13.84%

Machine learning models typically split a newly created condensation dataset into two parts: an 80% training set and a 20% test set. The training set is essential for the model to learn from the data, while the test set assesses the model's performance on unseen data. This separation ensures the model fits the training data well and can make accurate predictions on new data, enhancing reliability and performance in real-world applications [6].

2.3. Feature selection

In machine learning, features in a dataset refer to the attributes or characteristics that describe an instance. They are also known as independent variables or input variables. Features hold a fundamental role in machine learning, as they form the foundation for training and evaluating machine learning models. The field of condensation heat transfer in two-phase flow within pipes inherently exhibits complex flow characteristics. Traditional predictive models typically require a multitude of parameters to assess the heat transfer coefficient (HTC) [47], which can be a cumbersome process. Furthermore, the applicability of correlation-based models is often limited and may lead to decreased predictive performance when applied to conditions beyond a certain range of experimental settings. Balancing accuracy and generalizability becomes a critical challenge. Compared to traditional models, although ML models also require numerous parameters as input features, they eliminate the need for intricate calculations and function as black-box models [39]. Machine learning models can maximize precision and generalizability to a certain extent. Features, being a critical component of machine learning models, can have either positive or negative impacts when there are too many or too few. Too few input features may lead to underfitting, resulting in low predictive accuracy that falls short of the expected performance. Conversely, an excess of input features can result in overfitting, diminishing the model's ability to generalize effectively [40].
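The preprocessing chain sketched in Fig. 3 (grouping aside) reduces to a few lines with scikit-learn. In the sketch below, the file and column names are hypothetical; fitting the scaler on the training portion only, rather than on the full dataset, is a small refinement adopted here to avoid leaking test-set statistics.

```python
# Sketch of the standardization and 80/20 split described above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("condensation_dataset.csv")  # hypothetical file name
X = df.drop(columns=["htc"])                  # input features ("htc" is hypothetical)
y = df["htc"]                                 # heat transfer coefficient target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)     # 80% training / 20% test

scaler = StandardScaler().fit(X_train)        # zero mean, unit variance per feature
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)             # reuse the training statistics
```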
Fig. 4. Visualization of the condensation dataset by (a) Data source, (b) Working fluid, (c) Mass flux, and (d) Heat transfer coefficient.
Table 1. Data points collected from literature and condensation operating conditions.

Sources | Geometry | Working fluids | D (mm) | G (kg·m⁻²·s⁻¹) | Tsat (°C) | x | Uncertainty | Data points
Li et al. [28] | ST, EHT | R32, R410A | 11.5 | 47.2–202.1 | 35, 45 | 0.20–0.90 | ±10.27% | 165
Ma et al. [2] | ST, EHT | R32, R410A | 8.3 | 50.2–462.6 | 38, 45 | 0.20–0.82 | ±11.32% | 132
Li et al. [29] | ST, EHT | R410A | 8.3, 11.5 | 24.1–462.6 | 35, 40, 45 | 0.19–0.81 | ±13.84% | 165
Li [17] | ST, EHT | R410A | 11.3 | 250, 350 | 45 | 0.11–0.81 | ±10.22% | 42
Sun et al. [30] | ST, EHT | R410A | 8.1, 11.3 | 62.7–452.4 | 35, 40, 45 | 0.20–0.80 | ±12.46% | 129
Cavallini et al. [30] | ST | R134a, R125, R32, R410A | 8 | 65.0–750.0 | 40 | 0.15–0.88 | ±5.0% | 208
Zhang et al. [31] | ST, EHT | R410A | 4.62, 4.8 | 384.5–1580.3 | 36, 43, 50 | 0.40–0.90 | ±18.1% | 44
Wang et al. [32] | ST | R1234yf/R32 mixture | 2, 4 | 102.0–405.0 | 40 | 0.11–0.94 | ±15.2% | 90
Zhang et al. [33] | ST | R22, R410A, R407C | 1.088, 1.289 | 300.0, 350.0, 450.0 | 40 | 0.01–0.88 | ±18.7% | 101
Jacob et al. [34] | ST | R450A, R134a | 4.7 | 100.0–600.0 | 45, 55 | 0.01–0.97 | ±12.2% | 172
Diani et al. [35] | EHT | R450A, R515B, R1234ze(E) | 6.75 | 50.0–400.0 | 30, 40 | 0.11–0.96 | ±9.3% | 313
Longo et al. [36] | ST, EHT | R32, R410A | 4.2 | 100.0–600.0 | 30, 40 | 0.15–0.96 | ±9.0% | 279
Li et al. [37] | EHT | R134a | 5.89 | 600.0, 1000.0 | 35 | 0.10–0.90 | ±7.73% | 36
Yu et al. [38] | EHT | methane/propane mixture | 1.5 | 250.0 | −104.4 | 0.15–0.90 | — | 71
Total: 1947
The dimensionless-number features (#3) characterize the fluid's flow characteristics and are computed based on #1 and #2. To assess the impact of different numbers of input parameters on predictive accuracy, this study combined features from #1, #2, and #3 to create new input feature sets, labeled as Group1, Group2, and Group3, which were then used for training and testing the machine learning models. Table 3 details the specific parameters included in the three combinations: Group1, Group2, and Group3. Five different machine learning models were employed to train and test the dataset using these input features, ultimately yielding predictions for the heat transfer coefficient of tube condensation. These predictions were then compared and evaluated.

2.4. Hyperparameter selection and optimization

Proper selection of hyperparameter combinations can significantly improve model performance. In this regard, common search algorithms include grid search, random search, and Bayesian optimization. The advantage of grid search lies in its simplicity and ease of implementation. This method exhaustively traverses every combination in the hyperparameter space, thus ensuring the finding of the global optimal solution [42]. In order to optimize the hyperparameters of the various algorithms, this study chose to use a grid search method. Table 4 presents the search range used when applying grid search to find the optimal hyperparameter combination for the ML models. In addition, we also used cross-validation to evaluate the performance of different hyperparameter combinations, ensuring that the ultimately selected hyperparameters perform robustly on various data subsets.

Table 4. Search for the optimal hyperparameter combination of ML models using grid search.

Model | Hyperparameter | Range
RF | Number of estimators | 100, 200, 300, 400, 500
RF | Split criterion | Mean squared error
RF | Bootstrap | True
RF | Max depth of trees | 10, 15, 20, 25
RF | Max features | 15, 29, 39
XGBoost | Booster | None
XGBoost | Number of estimators | 100, 200, 300
XGBoost | Max depth of trees | 3, 5, 7, 9
XGBoost | Learning rate | 0.1, 0.01, 0.001
XGBoost | Subsample ratio of columns | 0.5, 0.75
XGBoost | Minimum loss reduction for splitting | 0, 0.25, 0.5, 0.75, 1
XGBoost | L1 regularization term on weights | 0, 0.1, 1
XGBoost | L2 regularization term on weights | 0, 0.1, 1
ANN | Activation function | ReLU
ANN | Optimization algorithm | Adam
ANN | Initial learning rate | 0.00001, 0.0001, 0.001
ANN | Regularization term of weight matrix | 0.001, 0.01
ANN | Hidden layers | (160, 80, 30), (200, 160, 80, 30, 10)
ANN | Maximum number of iterations | 500
SVR | Penalty coefficient | 0.1, 0.2, 0.3, …, 9.8, 9.9, 10
SVR | Kernel | linear, poly, rbf, sigmoid
SVR | Polynomial degree | 2, 3, 4, 5
SVR | Epsilon | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6
CART | Criterion | Squared error
CART | Max depth of trees | 5, 6, 7, …, 18, 19, 20
CART | Minimum samples_split | 2, 3, 4, …, 13, 14, 15
CART | Minimum samples_leaf | 1, 2, 3, …, 13, 14, 15
CART | Splitter | best, random
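As a sketch of how the Table 4 search could be wired up, the example below runs a cross-validated grid search for the CART model. The 5-fold setting and the scoring metric are assumptions; the paper states that grid search and cross-validation were used but not their exact settings.

```python
# Cross-validated grid search over the CART ranges from Table 4.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

param_grid = {
    "criterion": ["squared_error"],
    "max_depth": list(range(5, 21)),           # 5, 6, ..., 20
    "min_samples_split": list(range(2, 16)),   # 2, 3, ..., 15
    "min_samples_leaf": list(range(1, 16)),    # 1, 2, ..., 15
    "splitter": ["best", "random"],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid,
                      cv=5,                              # assumed fold count
                      scoring="neg_mean_absolute_error", # assumed metric
                      n_jobs=-1)
# search.fit(X_train, y_train)                 # arrays from Section 2.2
# cart_best = search.best_estimator_
```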
3. Results and discussion Splitter best, random
Table 5. Specific expressions and the range of applicable operating conditions for various general correlations.

Shah et al. [43] (1979):
$h_{tp} = h_L \left[ (1-x)^{0.8} + \dfrac{3.8\,x^{0.76}(1-x)^{0.04}}{p_r^{0.38}} \right]$, with
$h_L = 0.023\,Re_l^{0.8}\,Pr_l^{0.4}\,\dfrac{k_l}{D}$, $Re_l = \dfrac{G D}{\mu_l}$, $Pr_l = \dfrac{c_{pl}\,\mu_l}{k_l}$, $p_r = \dfrac{P}{P_c}$.
Experimental conditions: fluids: water, R11, R22, alcohol, etc.; G = 10.8–210.6 kg·m⁻²·s⁻¹; Tsat = 21–31 °C; d_o = 7–40 mm.

Cavallini and Zecchin [44] (1974):
$h_{tp} = 0.05\,Re_{eq}^{0.8}\,Pr_l^{0.33}\,\dfrac{k_l}{d_i}$, with
$Re_{eq} = Re_v\,\dfrac{\mu_v}{\mu_l}\left(\dfrac{\rho_l}{\rho_v}\right)^{0.5} + Re_l$, $Re_l = \dfrac{G(1-x)\,d_i}{\mu_l}$.
Experimental conditions: fluids: R22, etc.; Re ≥ 1.2 × 10³; 0.1 ≤ x ≤ 0.9.

Cavallini et al. [45] (2006):
$J_G^{T} = \left[ \left( \dfrac{7.5}{4.3\,X_{tt}^{1.111} + 1} \right)^{-3} + C_T^{-3} \right]^{-1/3}$, with $C_T = 1.6$ for hydrocarbons and $C_T = 2.6$ for other refrigerants, and $J_G = \dfrac{x\,G}{\left[ g\,d_h\,\rho_v(\rho_l - \rho_v) \right]^{0.5}}$.
For $J_G > J_G^{T}$: $h_A = h_{LO}\left[ 1 + 1.128\,x^{0.8170} \left(\dfrac{\rho_l}{\rho_v}\right)^{0.3685} \left(\dfrac{\mu_l}{\mu_v}\right)^{0.2363} \left(1 - \dfrac{\mu_v}{\mu_l}\right)^{2.144} Pr_l^{-0.1} \right]$.
For $J_G \le J_G^{T}$: $h_D = \left[ h_A \left(\dfrac{J_G^{T}}{J_G}\right)^{0.8} - h_{STRAT} \right] \dfrac{J_G}{J_G^{T}} + h_{STRAT}$, with
$h_{LO} = 0.023\,Re_{LO}^{0.8}\,Pr_l^{0.4}\,\dfrac{\lambda_l}{D}$ and
$h_{STRAT} = 0.725\left[ 1 + 0.741\left(\dfrac{1-x}{x}\right)^{0.3321} \right]^{-1} \left[ \dfrac{\lambda_l^{3}\,\rho_l(\rho_l - \rho_v)\,g\,h_{lg}}{\mu_l\,D\,\Delta T} \right]^{0.25} + (1 - x^{0.087})\,h_{LO}$.
Experimental conditions: fluids: R410A, R32, R22, etc.; G = 18–2240 kg·m⁻²·s⁻¹; Tsat − Twall = 0.6–28.7 °C; Tsat = −15–302 °C; d_o = 3.1–17 mm.

Bohdal et al. [46] (2011):
$h = 25.084\,Re_l^{0.258}\,Pr_l^{-0.495}\,p_r^{-0.288}\left(\dfrac{x}{1-x}\right)^{0.266}\dfrac{k_l}{D}$, with $p_r = \dfrac{P_{sat}}{P_{cr}}$.
Experimental conditions: fluids: R134a, R404A; G = 64–902 kg·m⁻²·s⁻¹; d_o = 0.31–3.30 mm; Tsat = 20–40 °C.

Vu et al. [47] (2015):
$h = 0.023\,Re_{l0}^{0.8}\,Pr_l^{0.4}\,\dfrac{\lambda_l}{d_i}\left[ (1-x)^{0.8} + \dfrac{3.8}{p_r^{0.52}}\left(\dfrac{x}{1-x}\right)^{0.201} \right]$, with $p_r = \dfrac{P_{sat}}{P_{cr}}$.
Experimental conditions: fluid: R410A; G = 200–320 kg·m⁻²·s⁻¹; d_o = 6.61–9.2 mm; Tsat = 48 °C.
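As a worked example of applying Table 5, the Shah [43] correlation can be coded directly from the expression and the Dittus-Boelter liquid-phase coefficient given above. Inputs are in SI units, and the example values in the trailing comment are illustrative only.

```python
# Shah (1979) condensation HTC, as written in Table 5 (SI units).
def shah_1979_htc(G, x, D, k_l, mu_l, cp_l, P, P_crit):
    """h_tp = h_L * [(1-x)^0.8 + 3.8 x^0.76 (1-x)^0.04 / p_r^0.38]."""
    Re_l = G * D / mu_l              # liquid-only Reynolds number
    Pr_l = cp_l * mu_l / k_l         # liquid Prandtl number
    p_r = P / P_crit                 # reduced pressure
    h_L = 0.023 * Re_l**0.8 * Pr_l**0.4 * k_l / D   # Dittus-Boelter
    return h_L * ((1 - x)**0.8 + 3.8 * x**0.76 * (1 - x)**0.04 / p_r**0.38)

# Illustrative R134a-like inputs (not data from the paper):
# shah_1979_htc(G=300, x=0.5, D=0.008, k_l=0.08, mu_l=1.9e-4,
#               cp_l=1450.0, P=1.0e6, P_crit=4.06e6)
```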
The Bohdal et al. [46] correlation predicts 51.05% of the data within a ±50% error range, with an MRD and a MARD of 0.78% and 26.06%, respectively. Vu et al. [47] predict 54.85% of the data points within a ±50% error range, with an MRD and a MARD of 29.75% and 58.97%. In a comprehensive analysis and comparison of these two models, Bohdal et al.'s model exhibits slightly better predictive accuracy than Vu et al.'s model. This may be attributed to the former's model being derived from a broader range of experimental conditions, providing better generalization and suitability for predicting condensation data across a wide range of operating conditions.

Fig. 5. Prediction of HTC on the established ML dataset using traditional predictive models ((a) Shah et al. [43], (b) Cavallini and Zecchin [44], (c) Bohdal et al. [46], (d) Vu et al. [47]).

To sum up, the traditional models analyzed exhibit some predictive capability for the established condensation heat transfer dataset, performing well under specific experimental conditions but encountering limitations beyond that range. The dataset covers various tube types and condensation experimental conditions, adding complexity due to the diversity of tube types and the wide range of condensation conditions. Fig. 1 illustrates the diversity in tube types, and the extensive condensation experimental conditions lead to a diverse range of scenarios, including enhanced and smooth tubes. This diversity makes it challenging to rely on a single universal correlation for all data points, revealing the limitations of traditional correlation models. Thus, this study emphasizes the potential of machine learning models for dealing with complex and diverse condensation heat transfer data, as they can capture intricate patterns and structures, enhancing predictive accuracy.

3.3. ML prediction

The models' training and testing were conducted on a personal computer (CPU: Intel i5-8250U @ 1.60 GHz; GPU: MX150) rather than cloud servers, to assess the offline applicability and feasibility for engineering applications. Fig. 6 displays the results of predicting the tubes' HTC using the five ML models. Each chart plots the experimental values (target values) on the x-axis and the models' predicted values on the y-axis. First, Fig. 6(a) showcases the performance of the RF model. The RF model aligns closely with the experimental values: around 93.56% of the data points fall within a ±10% error range, while approximately 99.45% are within a ±30% error range, signifying the model's high accuracy. To analyze the impact of different input parameter combinations, the data was divided into three groups: Group1, Group2, and Group3. Within the 0–5000 W·m⁻²·K⁻¹ range, some Group1 data points deviate slightly from the ±30% error band. Comparatively, the RF model with input features from Group2 and Group3 demonstrates superior fitting performance, providing more accurate predictions of the condensation heat transfer coefficient inside the tube than the Group1-based model.

Fig. 6(b) illustrates the predictive results of the XGBoost model for the HTC inside the tubes. The XGBoost model demonstrates accurate prediction within a ±10% error range for 90.19% of the data. Moreover, it predicts 99.32% of the data within a ±30% error range. Similar to the RF model, XGBoost exhibits a relatively high level of fitting accuracy. It is worth noting that, like the RF model, some data points within Group1 show significant fluctuations and slightly deviate from the ±30% error range compared to those in Group2 and Group3. In the high-HTC range, the XGBoost model trained on Group1 exhibits deviations exceeding 10%. This may be attributed to the simplicity of Group1's features, which fail to capture the complex nature of condensation heat transfer within this range. Additionally, the relatively low number of samples within this range results in inadequate model training, leading to larger deviations in this specific range.

Fig. 6(c) displays the training results of the ANN model on the dataset. The ANN model can predict 87.28% of the data within a 10% error range. While the predictive performance of the ANN model is slightly lower compared to the RF and XGBoost models, it still outperforms traditional models. Similar to the previous two models, data points within Group1 appear somewhat scattered, whereas those within Group2 and Group3 are more concentrated, with small fluctuations
around the experimental values of the HTC. This indicates that the model's predictions are very close to the experimental values, demonstrating good predictive performance.

Fig. 6(d) presents the performance of the Support Vector Regression (SVR) model in predicting the heat transfer coefficient within the tube by comparing its predicted values with the experimental values (ground truth). From the visualization, it is evident that the SVR model exhibits a relatively larger range of prediction fluctuations, and its fit is not as strong as the RF, XGBoost, and ANN models. The SVR model can accurately predict 85.67% of the data points within a ±30% error range but only about half of the data (50.08%) within a ±10% error range. While the SVR model still demonstrates strong predictive capabilities compared to traditional models, it falls slightly short in performance compared to the three ML models mentioned earlier. It is worth emphasizing that both Random Forest (RF) and XGBoost belong to ensemble learning models, while the ANN is a complex model that mimics the functioning of human brain neurons. In contrast, SVR is a relatively simple standalone model. This standalone model is less robust in terms of prediction accuracy, model stability, and handling outliers compared to the other three models, which explains why the SVR model may be relatively less effective in dealing with complex nonlinear two-phase flow heat transfer problems.

Fig. 6(e) presents the predictions and their comparison with experimental values for the CART model on the dataset. The CART model can accurately predict 88.68% of data points within a ±10% error range and even 98.90% of data points within a ±30% error range. Compared to the SVR model, the CART model shows a significant improvement in prediction accuracy. Although CART is also a standalone model, it constructs decision trees to establish tighter connections between input features and output values. This decision tree structure enables the CART model to better capture complex patterns and structures within the data, resulting in more accurate predictions for unknown data. In contrast, the SVR model, based on the Radial Basis Function (RBF) kernel, while providing good regression results in some cases, typically struggles to capture complex patterns within the data.

Fig. 6. Prediction of condensation heat transfer coefficients in the dataset using five ML models ((a) RF, (b) XGBoost, (c) ANN, (d) SVR, and (e) CART).

Table 6 summarizes the model evaluation results on the test set for the three input parameter combinations. By analyzing the data in Fig. 6 and Table 6, it is evident that all five ML models demonstrate remarkable predictive capabilities. They not only significantly enhance prediction accuracy but also outperform traditional correlation-based models, exhibiting lower MRD and MARD. Notably, the ANN model trained with data from Group2 achieves the highest determination coefficient R² of 0.999, indicating an exceptional fit. This model has an MRD of 0.70% and a MARD of 2.00%, showcasing outstanding performance. Additionally, the RF and XGBoost models also perform exceptionally well, with R² values exceeding 0.9, MRD values below 3%, and MARD values below 10%, indicating high predictive accuracy.

From the perspective of the three feature parameter combinations, models trained with Group2 and Group3 data exhibit slightly lower MRD and MARD values compared to the models using Group1. This indicates that the predictive performance of the first two feature combinations is superior. The difference arises from the relatively simple and single nature of Group1 features, which include only 5 original feature parameters and 10 thermophysical properties. They cannot
adequately reflect the intrinsic characteristics of the condensation heat transfer process. Typically, within a specified range, the predictive accuracy of a machine learning model improves as the number of input features increases. This is attributed to the ability of additional features to introduce more information about the data, enabling the model to capture the intricacies and diversity inherent in complex two-phase flow condensation data within tubes more comprehensively. With an increase in the number of features, the model's complexity rises, affording it a greater array of choices to depict relationships among data points characterized by complex nonlinear dependencies. In contrast to Group1, Group2 and Group3 incorporate 24 dimensionless numbers, offering a more comprehensive representation of the characteristics of two-phase flow heat transfer inside the tube. These dimensionless numbers are also applied in some traditional empirical and semi-empirical prediction models due to their ability to effectively describe flow characteristics and heat transfer in two-phase flow inside the tube.
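To illustrate how such dimensionless features derive from the primary (#1) and thermophysical (#2) features, the sketch below computes four common examples. It is not the paper's full set of 24 numbers, and the particular selection shown is an assumption.

```python
# A few dimensionless numbers computable from primary/thermophysical features.
def dimensionless_features(G, x, D, mu_l, mu_v, rho_l, rho_v, k_l, cp_l):
    Re_l = G * (1 - x) * D / mu_l        # liquid-phase Reynolds number
    Re_v = G * x * D / mu_v              # vapor-phase Reynolds number
    Pr_l = cp_l * mu_l / k_l             # liquid Prandtl number
    X_tt = (((1 - x) / x) ** 0.9
            * (rho_v / rho_l) ** 0.5
            * (mu_l / mu_v) ** 0.1)      # Lockhart-Martinelli parameter
    return {"Re_l": Re_l, "Re_v": Re_v, "Pr_l": Pr_l, "X_tt": X_tt}
```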
Table 6. Model evaluation of the three different combinations of input parameters on the test set. For each model, the table reports MRD (%), MARD (%), and R² under Group1, Group2, and Group3.
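The statistics reported in Table 6 and quoted throughout this section can be computed as in the sketch below, assuming the usual definitions of MRD and MARD (signed and absolute mean relative deviation) used in this literature.

```python
# Evaluation metrics: MRD, MARD, R^2, and fraction within an error band.
import numpy as np
from sklearn.metrics import r2_score

def evaluate(y_true, y_pred, band=0.30):
    """y_true, y_pred: NumPy arrays of measured and predicted HTC values."""
    rel = (y_pred - y_true) / y_true                # relative deviation per point
    mrd = 100.0 * rel.mean()                        # mean relative deviation, %
    mard = 100.0 * np.abs(rel).mean()               # mean absolute relative deviation, %
    within = 100.0 * (np.abs(rel) <= band).mean()   # % of points inside +/- band
    return mrd, mard, r2_score(y_true, y_pred), within
```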
4. Conclusions

(1) A database of condensation heat transfer coefficients inside tubes was established, encompassing various condensation experimental conditions and multiple surface structures and dimensions of enhanced and smooth tubes. The database comprises a total of 1947 data points, with 633 derived from a condensation heat transfer database previously established by the researchers, and an additional 1314 data points collected from publicly available literature by other researchers. This newly established database was used for training and testing machine learning models.

CRediT authorship contribution statement

Wei Li: Conceptualization, Supervision, Validation, Resources. Gangan Zhang: Writing – original draft, Methodology, Formal analysis. Desong Yang: Writing – review & editing, Conceptualization.
Declaration of competing interest

The authors do not have any actual or potential conflict of interest with works and organizations described in the paper.

Data availability

Data will be made available on request.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (52076187 and 52320105001).

References

[1] M.M. Rahman, K. Kariya, A. Miyara, An experimental study and development of new correlation for condensation heat transfer coefficient of refrigerant inside a multiport minichannel with and without fins, Int. J. Heat Mass Transf. 116 (2018) 50–60.
[2] L. Ma, X. Liu, Y. Gao, W. Li, Z. Wu, X. Luo, Z. Tao, S. Kabelac, R410A and R32 condensation heat transfer and flow patterns inside horizontal micro-fin and 3-D enhanced tubes, Int. Commun. Heat Mass Transf. 142 (2023) 106638.
[3] Y. Wan, A. Soh, Y. Shao, X. Cui, Y. Tan, K.J. Chua, Numerical study and correlations for heat and mass transfer coefficients in indirect evaporative coolers with condensation based on orthogonal test and CFD approach, Int. J. Heat Mass Transf. 153 (2020) 119580.
[4] S.A. Sharif, M.K. Ming, V. Timchenko, G.H. Yeoh, Simulation of vapour bubble condensation using a 3D method, Nucl. Eng. Des. 403 (2023) 112128.
[5] J. Yu, R. Huo, H. Shen, X. Li, Z. Zhu, A simulation study on the condensation flow and thermal control characteristics of mixed refrigerant in a dimpled tube, Appl. Therm. Eng. 231 (2023) 120889.
[6] M.T. Hughes, G. Kini, S. Garimella, Status, challenges, and potential for machine learning in understanding and applying heat transfer phenomena, J. Heat Transf. 142 (2021) 120802.
[7] M.M. Rashidi, M.A. Nazari, C. Harley, E. Momoniat, I. Mahariq, N. Ali, Applications of machine learning methods for boiling modeling and prediction: a comprehensive review, Chem. Thermodyn. Thermal Anal. 8 (2022) 100081.
[8] J. Tang, S. Yu, C. Meng, H. Liu, Z. Mo, Prediction of heat transfer of bubble condensation in subcooled liquid using machine learning methods, Chem. Eng. Sci. 271 (2023) 118578.
[9] F. Nie, H. Wang, Y. Zhao, Q. Song, S. Yan, M. Gong, A universal correlation for flow condensation heat transfer in horizontal tubes based on machine learning, Int. J. Thermal Sci. 184 (2023) 107994.
[10] L. Zhou, D. Garg, Y. Qiu, S. Kim, I. Mudawar, C.R. Kharangate, Machine learning algorithms to predict flow condensation heat transfer coefficient in mini/micro-channel utilizing universal data, Int. J. Heat Mass Transf. 162 (2020) 120351.
[11] M.T. Hughes, B.M. Fronk, S. Garimella, Universal condensation heat transfer and pressure drop model and the role of machine learning techniques to improve predictive capabilities, Int. J. Heat Mass Transf. 179 (2021) 121712.
[12] A.D. Santamaria, M. Mortazavi, V. Chauhan, Machine learning applications of two-phase flow data in polymer electrolyte fuel cell reactant channels, J. Electrochem. Soc. 168 (5) (2021) 054505.
[13] S. Tarabkhah, B. Sajadi, M.A.A. Behabadi, Prediction of heat transfer coefficient and pressure drop of R1234yf and R134a flow condensation in horizontal and inclined tubes using machine learning techniques, Int. J. Refrig. 152 (2023) 256–268.
[14] M.K. Seal, S.M.A. Noori Rahim Abadi, M. Mehrabi, J.P. Meyer, Machine learning classification of in-tube condensation flow patterns using visualization, Int. J. Multiph. Flow 14 (2021) 103755.
[15] F. Nie, H. Wang, Q. Song, Y. Zhao, J. Shen, M. Gong, Image identification for two-phase flow patterns based on CNN algorithms, Int. J. Multiph. Flow 152 (2022) 104067.
[16] L. Huang, C. Tang, J. Jiang, L. Tao, J. Chen, X. Li, Z. Zheng, H. Tao, Experimental research on refrigerant condensation heat transfer and pressure drop characteristics in the horizontal microfin tubes, Int. Commun. Heat Mass Transf. 135 (2022) 106130.
[17] W. Li, Two-phase heat transfer correlations in three-dimensional hierarchical tube, Int. J. Heat Mass Transf. 191 (2022) 122827.
[18] W. Li, H. Chen, J. Wang, Y. Gao, X. Wang, Y. He, W. Tang, D.J. Kukulka, Condensation heat transfer in annuli outside horizontal stainless steel enhanced tubes, Int. J. Thermal Sci. 177 (2022) 107479.
[19] W. Li, Y. Gao, J. Wang, Z. Sun, X. Ma, W. Tang, Z. Ayub, Y. He, Y. Cao, Evaporation flow patterns and heat transfer in copper and stainless steel three-dimensional dimpled tubes, Int. J. Heat Mass Transf. 193 (2022) 122954.
[20] Z. Sun, W. Li, X. Ma, Y. He, Two-phase heat transfer in horizontal dimpled/protruded surface tubes with petal-shaped background patterns, Int. J. Heat Mass Transf. 140 (2019) 837–851.
[21] Z. Sun, W. Li, X. Ma, Z. Ayub, Y. He, Flow boiling in horizontal annuli outside horizontal smooth, herringbone and three-dimensional enhanced tubes, Int. J. Heat Mass Transf. 143 (2019) 118554.
[22] J. Ali, R. Khan, N. Ahmad, I. Maqsood, Random Forests and decision trees, Int. J. Comput. Sci. Issues 9 (5) (2012) 272–278.
[23] R.E. Schapire, The boosting approach to machine learning: an overview, in: D.D. Denison, M.H. Hansen, C.C. Holmes, B. Mallick, B. Yu (Eds.), Nonlinear Estimation and Classification, Springer New York, New York, NY, 2003, pp. 149–171.
[24] H. Drucker, C.J. Burges, L. Kaufman, A. Smola, V. Vapnik, Support vector regression machines, Adv. Neural Inform. Process. Syst. 9 (1996) 155–161.
[25] G.S. Kori, M.S. Kakkasageri, Classification and regression tree (CART) based resource allocation scheme for wireless sensor networks, Comput. Commun. 197 (2023) 242–254.
[26] I. Ahmad, H. Manabu, M. Kano, S. Hasebe, Y. Inoue, H. Uegaki, Data-based fault diagnosis of power cable system: comparative study of k-NN, ANN, Random Forest, and CART, IFAC Proc. Vol. 44 (1) (2011) 12880–12885.
[27] S.J. Raudys, A.K. Jain, Small sample size effects in statistical pattern recognition: recommendations for practitioners, IEEE Trans. Pattern Anal. Mach. Intell. 13 (3) (1991) 252–264.
[28] W. Li, W. Feng, X. Liu, J. Li, B. Cao, B. Dou, J. Zhang, D.J. Kukulka, Condensation heat transfer and pressure drop characteristics inside smooth and enhanced tubes with R410A and R32, Int. J. Heat Mass Transf. 214 (2023) 124419.
[29] W. Li, J. Wang, Y. Guo, Q. Shi, Y. He, D.J. Kukulka, X. Luo, S. Kabelac, R410A flow condensation inside two dimensional micro-fin tubes and three dimensional dimple tubes, Int. J. Heat Mass Transf. 182 (2022) 121910.
[30] A. Cavallini, G. Censi, D.D. Col, L. Doretti, G.A. Longo, L. Rossetto, Experimental investigation on condensation heat transfer and pressure drop of new HFC refrigerants (R134a, R125, R32, R410A, R236ea) in a horizontal smooth tube, Int. J. Refrig. 24 (2001) 73–87.
[31] J. Zhang, N. Zhou, W. Li, Y. Luo, S. Li, An experimental study of R410A condensation heat transfer and pressure drops characteristics in microfin and smooth tubes with 5 mm OD, Int. J. Heat Mass Transf. 125 (2018) 1284–1295.
[32] L. Wang, C. Dang, E. Hihara, B. Dai, Condensation heat and mass transfer characteristics of zeotropic refrigerant mixture R1234yf/R32 inside small-scale tube: flow patterns observation and non-equilibrium film model calculation, Int. J. Thermal Sci. 191 (2023) 108347.
[33] H. Zhang, J. Li, N. Li, B. Wang, Experimental investigation of condensation heat transfer and pressure drop of R22, R410A and R407C in mini-tubes, Int. J. Heat Mass Transf. 55 (2012) 3522–3532.
[34] T.A. Jacob, E.P. Matty, B.M. Fronk, Experimental investigation of in-tube condensation of low GWP refrigerant R450A using a fiber optic distributed temperature sensor, Int. J. Refrig. 103 (2019) 274–286.
[35] A. Diani, Y. Liu, J. Wen, L. Rossetto, Experimental investigation on the flow condensation of R450A, R515B, and R1234ze(E) in a 7.0 mm OD micro-fin tube, Int. J. Heat Mass Transf. 196 (2022) 123260.
[36] G.A. Longo, S. Mancin, G. Righetti, C. Zilio, Comparative analysis of microfin vs smooth tubes in R32 and R410A condensation, Int. J. Refrig. 128 (2021) 218–231.
[37] Q. Li, G. Chen, Q. Wang, L. Tao, Y. Xuan, Experimental study of condensation heat transfer of R134a inside the micro-fin tubes at high mass flux, Int. J. Heat Mass Transf. 187 (2022) 122524.
[38] J. Yu, X. Li, Z. Zhu, Study on condensation flow and heat transfer characteristics of multicomponent mixture in dimple tube, Case Stud. Thermal Eng. 35 (2022) 102132.
[39] Y. Li, R. Lei, H. Hu, K. Zhao, A black box based model for phase change heat exchanger in refrigeration system simulations using Kriging interpolation method, Int. J. Refrig. 153 (2023) 231–239.
[40] A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artif. Intell. 97 (1997) 245–271.
[41] E.W. Lemmon, M.L. Huber, M.O. McLinden, NIST Standard Reference Database 23: Reference Fluid Thermodynamic and Transport Properties (REFPROP), Version 9.0, NIST NSRDS, 2010.
[42] S. Xie, H. Lin, Y. Chen, H. Duan, H. Liu, B. Liu, Prediction of shear strength of rock fractures using support vector regression and grid search optimization, Mater. Today Commun. 36 (2023) 106780.
[43] M.M. Shah, A general correlation for heat transfer during film condensation inside pipes, Int. J. Heat Mass Transf. 22 (4) (1979) 547–556.
[44] A. Cavallini, R. Zecchin, A dimensionless correlation for heat transfer in forced convection condensation, in: Proceedings of the 5th International Heat Transfer Conference, Tokyo, vol. 3, 1974, pp. 309–313.
[45] A. Cavallini, D.D. Col, L. Doretti, M. Matkovic, L. Rossetto, C. Zilio, G. Censi, Condensation in horizontal smooth tubes: a new heat transfer model for heat exchanger design, Heat Transf. Eng. 27 (8) (2006) 31–38.
[46] T. Bohdal, H. Charun, M. Sikora, Comparative investigations of the condensation of R134a and R404A refrigerants in pipe minichannels, Int. J. Heat Mass Transf. 54 (2011) 1963–1974.
[47] P.Q. Vu, C. Kwang-Il, O. Jong-Taek, C. Honggi, K. Taehum, K. Jungho, C. Jaeyoung, An experimental investigation of condensation heat transfer coefficient using R-410A in horizontal circular tubes, Energy Proc. 75 (2015) 3113–3118.