www.nature.com/scientificreports

Prediction and reliability analysis of shear strength of RC deep beams
Khaled Megahed

This study explores machine learning (ML) capabilities for predicting the shear strength of reinforced
concrete deep beams (RCDBs). For this purpose, eight typical machine-learning models, i.e.,
symbolic regression (SR), XGBoost (XGB), CatBoost (CATB), random forest (RF), LightGBM, support
vector regression (SVR), artificial neural networks (ANN), and Gaussian process regression (GPR)
models, are selected and compared based on a database of 840 samples with 14 input features. The
hyperparameter tuning of the introduced ML models is performed using the Bayesian optimization
(BO) technique. The comparison results show that the CatBoost model is the most reliable and
accurate ML model (R² = 0.997 and 0.947 in the training and testing sets, respectively). In addition,
simple and practical design expressions for RCDBs have been proposed based on the SR model with a
physical meaning and acceptable accuracy (an average prediction-to-test ratio of 0.935 and a standard
deviation of 0.198). The shear strength predicted by the ML models was also compared with
classical mechanics-driven shear models, including two prominent design codes (i.e., ACI 318 and EC2)
and two previous mechanical models; the comparison indicates that the ML approach is more reliable and
accurate than conventional methods. In addition, a reliability-based design was conducted on two ML
models, and their reliability results were compared with those of two code standards. The findings
revealed that the ML models demonstrate higher reliability compared to code standards.

Keywords Deep beams, Symbolic regression (SR), Support vector regression (SVR), XGBoost (XGB),
CatBoost (CATB), Random forest (RF), Gaussian process regression (GPR), Artificial neural networks (ANN),
Bayesian optimization (BO) technique, Reliability-based design

Reinforced concrete deep beams (RCDBs), characterised by a small span-to-height ratio (typically below 2.5)1–3,
are commonly employed in various structures such as lower floors, transfer girders, and pile caps due to higher
shear strength compared to slender beams. Despite their widespread application, the design of RCDBs poses
challenges due to the nonlinear impact of different parameters on their shear behaviour. The primary failure
mode of RCDBs is shear failure, which often results in sudden and catastrophic collapse and introduces significant safety
risks. Various shear strength models for RCDBs have been investigated, including those employing machine
learning methods2–11, the strut-and-tie model12–14, the compression field method15, and finite element analysis16.
However, traditional design methods, such as the strut-and-tie model (STM) or mechanism analysis, often fail to
adequately capture the complex relationship between the parameters affecting shear strength, leading to imprecise
and conservative results compared to test results. Furthermore, the available design provisions, e.g., ACI 318 (ref. 17)
and EC2 (ref. 18), and different models13,14 provide simple procedures for calculating the shear capacity of RCDBs, but
they are conservative and deviate from test results, so they do not offer a comprehensive model that
can approximate the shear capacity of RCDBs accurately.
In recent developments, new models have been proposed to enhance the prediction of shear capacity in deep
beams. Chen et al.19 introduced the cracking STM model, which integrates the STM approach with considerations
of diagonal crack patterns and strain distributions in horizontal reinforcement. Meanwhile, Chetchotisak
et al.20 presented a modified interactive STM for RCDBs, relying on two distinct load-bearing mechanisms: the
inclined strut and the truss. This model refines the strut mechanism from the interactive STM and incorporates
empirical constants into the Mohr–Coulomb failure criterion to define a new concrete failure mode. Fan et al.21
proposed an STM for unsymmetrically loaded RC deep beams, where the geometry of the compression nodal
zones is determined using Mohr’s Circle and the minimum strain energy criteria. Despite aligning well with
experimental data, these models require tedious calculations.
Machine learning (ML) has become a promising tool in many engineering aspects, providing an alternative
procedure for addressing engineering challenges. ML algorithms, including support vector machines, artificial
neural networks, genetic algorithms, and ensemble learning methods, have been extensively used in predicting

Department of Structural Engineering, Mansoura University, PO Box 35516, Mansoura, Egypt. email: k.megahed@mans.edu.eg

Scientific Reports | (2024) 14:14590 | https://doi.org/10.1038/s41598-024-64386-w


Content courtesy of Springer Nature, terms of use apply. Rights reserved

the shear strength of RCDBs2–11. For example, Ma et al.2 implemented six ML models to predict the shear
strength of RCDBs and compared their performance with five previous closed-form models. Recently, Nguyen
et al.3 implemented seven machine learning models for predicting the shear strength of RCDBs and found that
Gaussian process regression (GPR) is the most reliable and accurate ML model. Feng et al.6 studied four typical
ensemble learning models, including random forests, gradient boosting regression tree, adaptive boosting and
extreme gradient boosting (XGBoost), to predict the shear capacity of RCDBs using a dataset of 271 samples
and the grid search method for hyperparameter tuning. The comparison results of these models showed that
the XGBoost model is the best model concerning prediction accuracy (R² = 0.992 and 0.917 in the training and
testing sets, respectively). However, the error metrics in the testing set are nearly 3–8 times those in the training
set, indicating signs of overfitting. Recently, Tiwari et al.7 used eight ML models for the shear capacity of RCDBs
and found that the XGBoost model exhibited the highest accuracy. Ashour et al.4 used genetic expression
programming to develop an empirical expression for the shear strength of RCDBs using 141 test data. Shahnewaz
et al.9 and Wakjira10 used a genetic algorithm to predict the shear strength of RCDBs. Liang et al.22 developed a
symbolic regression (SR) model based on the Modified Compression Field Theory to analyze the punching shear
resistance of fiber-reinforced polymer (FRP) reinforced concrete slabs.
A review of the literature shows that few researchers23,24 have examined the safety of RC deep beams
designed according to the code design practice. Aguilar et al.23 evaluated the reliability of deep beams designed
using the strut-and-tie method according to ACI 318. They found that ACI 318 design practice increases the
likelihood of nonductile failure and suggested reliability-based strength reduction factors of 0.65 for struts and
0.90 for ties. Muendacha et al.24 conducted a safety-based evaluation of shear design methods for RC deep beams
using strut-and-tie models (STMs) in accordance with international concrete codes, considering variability in
load actions and member resistances as random variables. Their findings indicated that deep beams made from
normal-strength concrete and designed using these STMs provided a satisfactory safety level, and they suggested
probability-based reduction factors to achieve a target reliability index greater than 3.5. Regarding the integration
of reliability analysis with machine learning, Shen et al.25 combined reliability analysis with machine learning by
using Monte Carlo simulation alongside a machine learning-based surrogate model to calibrate the reliability of
slab-column joints for punching shear resistance.
It can be concluded from these studies that ML can be used successfully to predict the shear strength of
RCDBs accurately. However, most models depend on primitive search algorithms, such as grid search techniques
for tuning ML parameters, lacking sophistication in refining the ML models. Moreover, most recent studies lack
a real-world practical application and fail to highlight the gap between the theory and practical implementation.
While many ML models exhibit superior results, deriving an explicit design formula from these models is challenging.
The black-box and difficult-to-interpret nature of these models hinders their practical implementation
in engineering design. Moreover, previously introduced ML studies primarily focus on prediction outcomes and
accuracy without engaging in reliability-based design to bridge the gap between ML and practical engineering
applications. Furthermore, many studies develop separate models for specific beam cases, such as those with
or without web reinforcements2,10. This approach not only lacks generalisation but also introduces fluctuations
in the results. In addition, most expressions introduced through ML techniques, i.e., genetic expression
programming (GEP) and genetic algorithm (GA), lack clear interpretation, lack physical meaning, and are overly
complex4,9,10. Table 1 provides an overview of ML models and previous formulas employed in previous studies,
as well as their associated results.
The present study introduces novel contributions in several key aspects. Firstly, it develops unified ML-based
models for RCDB shear strength that cover both beam cases, i.e., with and without web reinforcement, in a
single predictive model, whereas many previous studies predicted each type independently2,10. Furthermore,
the ML results are compared with those of mechanics-driven models, including two prominent design
codes (American code (ACI318)17 and European code (EC2)18) and two previous mechanics-based models13,14,
to validate the performance of the developed ML models. Secondly, the Bayesian optimization (BO) technique
is adopted for selecting the optimal hyperparameters for the introduced ML models. This approach differs from
the conventional and less advanced search techniques commonly found in the literature, such as the grid search
technique. Thirdly, simple and practical design expressions for RCDBs are proposed based on the symbolic
regression (SR) model. These expressions are easy to interpret and demonstrate remarkable
accuracy compared to previous closed-form models. Finally, a reliability-based design assessment is conducted
on two different ML models and two code standards to evaluate the reliability of utilising ML models in practical
design applications.

Experimental database of RC deep beams


The schematic diagram of the shear mechanism of RCDBs is shown in Fig. 1. To construct robust ML models and
investigate their influencing parameters, a dataset comprising 840 RCDB tests was collected from the existing literature
and from a database collected by Chetchotisak et al.20, including 322 specimens without web reinforcement
(WOR) and 518 specimens with web reinforcement (WWR). The details of the collected database are provided
in Supplementary data. Based on the results of various experimental and theoretical studies12–14,26, the shear
capacity of RCDBs is influenced by different shear components, which typically encompass the strength of the
concrete material, longitudinal rebars and web reinforcement. Therefore, 14 different design features were set as the
input variables, grouped into five categories26: (1) geometric dimensions: beam height (h), effective height (d),
width (bw), shear span (a) and shear span-to-depth ratio (a/d); (2) concrete property, i.e., concrete strength (fc′);
(3) bottom longitudinal reinforcement properties: reinforcement ratio (ρl), and strength (fyl); (4) web reinforcement
properties: vertical web reinforcement (VWR) ratio (ρv) and strength (fyv), horizontal web reinforcement
(HWR) ratio (ρh) and strength (fyh); (5) top plate width (wtp) and bottom plate width (wbp). The corresponding


Table 1. Summary of previous ML models in predicting RCDBs shear strength. *WOR and WWR stand for without web reinforcement and with web reinforcement cases.

Reference | Category (number)* | Models: Statistical criteria
Wakjira10 | WOR (371) | GP: $V_{th} = 0.0456\, f_c'^{0.619}\, \rho_l^{0.411}\, d\, b_w\, (a/d)^{-0.874}$; μ = 0.82, CoV = 0.305
Ashour4 | All (141) | GP: $V = b_w h f_c' \left[ (-4.56 + 1.68\, a/d)\, \rho_l^2 + \left(2.45 + 0.1 (a/d)^2 - 1.16\, a/d + 3.12 \rho_t\right) \rho_l + 0.3 \rho_{hw} + 0.4 \rho_{vw} \right]$; μ = 1.11, Std = 0.21
Shahnewaz9 | All (381) | GA: [expression in terms of a/d, ρl, ρhw and ρvw, illegible in the extracted source]; μ = 0.99, CoV = 0.232
Shahnewaz9 | All (381) | GA: $V_u = b_w h f_c' \left[ 1.74 - 2 (a/d)^{0.044} + 0.5 \rho^{0.14} \right]$; μ = 1.01, CoV = 0.257
Cheng5 | All (106) | EMARS, BPNN, RBFNN, SVM. EMARS is the best model. Grid search with cross-validation. EMARS: training MAPE = 5.67, R² = 0.989; testing MAPE = 5.887, R² = 0.973
Feng6 | All (271) | DT, SVM, ANN, RF, AdaBoost, GBRT, XGBoost. XGBoost is the best model. Grid search with cross-validation. XGBoost: training R² = 0.999, MAPE = 0.74; testing R² = 0.928, MAPE = 10.44% (overfitting)
Hameed11 | All (271) | LWR, RF, MLR, ELM. LWR is the best model. Grid search with cross-validation. LWR: training RMSE = 22.563, MAE = 13.249, a20-index = 98.89; testing RMSE = 57.776, MAE = 33.933, a20-index = 85.87
Liu8 | All (267) | LR, SVR, ANN, RF, XGBoost, NGBoost using Bayesian optimisation technique. NGBoost is the best model. NGBoost: R² = 0.9045, RMSE = 38.7976 kN
Tiwari7 | All (271) | DT, SVR, RF, GB, Adaptive boosting, XGBoost, voting regression. XGBoost is the best model. Grid search with cross-validation. XGBoost: training R² = 0.999, MAPE = 0.78, RMSE = 1.45 kN; testing R² = 0.928, MAPE = 9.79, RMSE = 47.76 (overfitting). μ = 1.00, CoV = 6.38%
Nguyen3 | All (518) | LR, ANN, SVR, DT, GPR, XGBoost using Bayesian optimisation technique. GPR is the best model. GPR: training R² = 0.99, MAE = 12.77, RMSE = 18.84 kN; validation R² = 0.89, MAE = 41.72, RMSE = 71.06 kN; testing R² = 0.94, MAE = 38.44, RMSE = 63.38
Ma2 | All (457) | kNN, DTM, RFM, GBDT, CatBoost, XGBoost. XGBoost is the best model. Grid search with cross-validation. XGBoost: training R² = 0.992, MAE = 0.148, RMSE = 0.26; testing R² = 0.917, MAE = 0.531, RMSE = 0.777. WOR*: μ = 1.03, Std = 0.128; WVR*: μ = 1.005, Std = 0.073; WHR*: μ = 1.003, Std = 0.077; WVHR*: μ = 1.01, Std = 0.084
This study | All (840) | CatBoost: training μ = 1.005, CoV = 0.062, a20-index = 0.9894, MAPE = 4.41, RMSE = 36.8 kN; testing μ = 1.026, CoV = 0.141, a20-index = 0.899, MAPE = 9.32, RMSE = 160.9 kN. SR (WOR)*: μ = 1.003, CoV = 0.207, a20-index = 0.68, MAPE = 16.80, RMSE = 115.9 kN. SR (WWR)*: μ = 1.004, CoV = 0.192, a20-index = 0.78, MAPE = 13.70, RMSE = 196.7 kN

Figure 1.  The dimensions of RC deep beam.


output is the shear strength index of the RCDBs (Vu/bwh fc′), denoted by vn, where Vu is the web shear capacity.
Table 2 summarises statistical information for the output and 14 input features within the established database.
The Pearson correlation coefficient (r) is used in this study to assess the strength of the linear correlation
between any two features27. It ranges from −1.0 to 1.0: a value of −1.0 indicates a perfect negative linear relationship,
1.0 a perfect positive linear relationship, and 0 no linear correlation. As illustrated in Fig. 2, the Pearson
correlation matrix shows that the relationship between most input features is insignificant. However, a relatively
high degree of correlation is observed between the VWR/HWR ratios and VWR/HWR strengths and between

Table 2. Statistical features of the experimental dataset.

Variable | Symbol (unit) | Type | Min | Max | Mean | Std | Skewness | Kurtosis
Beam height | h (mm) | Input | 152 | 2100 | 564 | 291 | 2.076 | 5.407
Beam effective height | d (mm) | Input | 137 | 2000 | 499 | 271 | 2.16 | 5.962
Beam width | bw (mm) | Input | 51 | 914 | 196 | 120 | 2.601 | 9.865
Shear span | a (mm) | Input | 80 | 4375 | 637 | 465 | 2.629 | 9.976
Shear span-to-depth ratio | a/d | Input | 0.27 | 2.502 | 1.304 | 0.541 | 0.349 | −0.499
Concrete strength | fc′ (MPa) | Input | 11.3 | 120.1 | 40.2 | 21.6 | 1.122 | 0.698
Bottom reinforcement ratio | ρl | Input | 0.003 | 0.113 | 0.02 | 0.011 | 1.704 | 7.3
Bottom reinforcement strength | fyl (MPa) | Input | 267 | 1330 | 470 | 131 | 2.828 | 13.993
Vertical web reinforcement ratio | ρv | Input | 0 | 0.029 | 0.003 | 0.004 | 2.493 | 9.767
Vertical web reinforcement strength | fyv (MPa) | Input | 0 | 1051 | 273 | 227 | 0.103 | −0.627
Horizontal web reinforcement ratio | ρh | Input | 0 | 0.032 | 0.002 | 0.003 | 3.553 | 17.956
Horizontal web reinforcement strength | fyh (MPa) | Input | 0 | 855 | 206 | 230 | 0.426 | −1.329
Top plate width | wtp (mm) | Input | 10 | 914 | 146 | 107 | 3.094 | 12.72
Bottom plate width | wbp (mm) | Input | 10 | 610 | 136 | 82 | 2.534 | 8.657
Shear strength index | vn = Vu/(bw h fc′) | Output | 0.011 | 0.293 | 0.134 | 0.053 | 0.384 | −0.287

Figure 2.  Correlation matrix for the RC deep beams database.


the widths of the upper and lower bearing plates. The former correlation is attributed to the presence of 322
specimens without web reinforcement (ρv = fyv = ρh = fyh = 0) out of the total 840, leading to a pseudo-correlation effect,
while the latter correlation between the widths of the upper and lower bearing plates arises from the fact that
a significant portion of the tests were conducted with identical plate widths. Among all the input variables, the
ratio a/d, concrete strength fc′, HWR ratio ρh, and VWR ratio ρv appear to have the most significant impact
on the shear strength index (Vu/bwh fc′), with correlation values of −0.91, −0.39, 0.24, and 0.22, respectively.
These findings imply that increasing the ratio a/d will significantly reduce the shear strength index. Similarly,
increasing concrete strength triggers the brittle failure of the beam, leading to a reduction in the strength index,
while increasing VWR/HWR ratios enhances the ductility of the RCDBs. These observations align well with the
mechanical behaviour and experimental results of RCDBs12–14,26.
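The pseudo-correlation described above can be reproduced with synthetic numbers. The sketch below is only an illustration: the value ranges, the random seed and the helper name `pearson_r` are assumptions and are not taken from the collected database. It draws a web reinforcement ratio and strength independently for 518 WWR specimens, pads both columns with zeros for 322 WOR specimens, and compares the Pearson coefficient with and without the zero rows.

```python
import numpy as np

rng = np.random.default_rng(0)

n_total, n_wor = 840, 322          # database size and WOR count quoted in the text
n_wwr = n_total - n_wor            # 518 specimens with web reinforcement

# Hypothetical, statistically independent ratio and strength for WWR specimens
rho_v = rng.uniform(0.001, 0.010, n_wwr)
f_yv = rng.uniform(250.0, 600.0, n_wwr)

# WOR specimens carry zeros in both columns (rho_v = f_yv = 0)
rho_all = np.concatenate([np.zeros(n_wor), rho_v])
f_all = np.concatenate([np.zeros(n_wor), f_yv])


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_wwr = pearson_r(rho_v, f_yv)     # WWR rows only
r_all = pearson_r(rho_all, f_all)  # all rows, zeros included

print(f"r (WWR only) = {r_wwr:.3f}")
print(f"r (all rows) = {r_all:.3f}")
```

With the zero rows included, the coefficient rises sharply even though the two quantities were generated independently, which is the effect the correlation matrix picks up.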

Research significance
This study presents novel contributions in multiple domains: Firstly, it introduces unified machine learning
models for predicting shear strength in Reinforced Concrete Deep Beams (RCDBs). Additionally, the study
employs Bayesian Optimization for hyperparameter tuning. Simple and practical design expressions based on
symbolic regression are proposed, demonstrating remarkable accuracy compared to previous mechanism models.
In addition, a reliability-based design assessment evaluates the reliability of using machine learning models in
practical design applications.

ML algorithms
In this study, eight typical ML models are selected to predict the shear strength of RCDBs, including symbolic
regression (SR)28,29, Gaussian process (GPR)30, artificial neural network (ANN), light gradient-boosting machine
(LightGBM)31, random forests (RF)32, categorical boosting (CatBoost)33, extreme gradient boosting (XGBoost)34,
and Support vector regression (SVR)35. The predictive performances of these models are then evaluated and
compared. In general, ensemble learning tends to exhibit higher accuracy and stability compared to individual
models2,6–8.
Random forests, proposed by Breiman32, fall under the category of ensemble learning based on bagging,
which uses bootstrap sampling to create a subset for training each weak learner (such as a decision tree) and makes
decisions on regression or classification tasks through averaging or voting. Several crucial parameters, including
the number of trees, the maximum number of features, and the maximum depth of trees, significantly impact the
training results. On the other hand, CatBoost, LightGBM, and XGBoost are all part of ensemble learning based
on boosting, which combines weak learners into a strong one through an iterative process36. CatBoost excels
in handling categorical features, eliminating the need for preprocessing non-numerical features33. It solves the
problem of gradient bias and enhances the generalization ability by employing unbiased boosting techniques
with categorical features. LightGBM31 uses a histogram-based approach for splitting, while XGBoost27 utilises a
level-wise (depth-wise) tree-growth approach, which results in faster training times and better handling of large databases with
LightGBM compared to XGBoost. The following sections introduce two of these models in more detail:
CatBoost and symbolic regression.

CatBoost model
CatBoost is a gradient boosting algorithm33,37, which differs from other gradient boosting algorithms in its use
of ordered boosting, an efficient modification of gradient boosting algorithms. This modification can handle
the problem of target leakage and can reduce prediction shift during training33. It is beneficial for small datasets,
and it can handle categorical features. Specifically, the original variable is replaced with a new binary feature
for each category. Another advantage of CatBoost is its use of random permutations in estimating leaf values
during the selection of the tree structure33. This strategy helps overcome overfitting issues commonly associated
with traditional gradient-boosting algorithms. Furthermore, CatBoost utilises binary decision trees as the
foundational predictor.
As described by Dorogush et al.33, CatBoost can be outlined as follows: Let Ti represent the model built after
constructing the first i trees, and let gi(Xk, Yk) denote the gradient value on the k-th training sample after constructing i trees.
To ensure an unbiased gradient with respect to the model Ti, it is essential to train Ti without the observation Xk.
Because unbiased gradients are required for all training examples, this would seem to leave no observations
available for standard training. The following trick is used to handle this problem: for each example Xk, a separate
model Mk is trained and never updated using a gradient estimate for that specific example. With Mk, the
gradient on Xk is estimated and used to score the resulting tree. The flowchart in Fig. 3a
explains how this trick can be performed. Let Loss(y, a) be the loss function being optimised, where y is the label
value and a is the formula value.
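The idea that the gradient on Xk must come from a model that never saw Xk can be illustrated with a deliberately simplified sketch. The assumptions here are substantial: the "model" Mk is replaced by the running mean of the labels that precede sample k in a random permutation, the loss is taken as squared error (so the negative gradient is the residual), and the function name `ordered_residuals` is hypothetical. This is not the CatBoost implementation, only the ordering principle.

```python
import numpy as np


def ordered_residuals(y, permutation, prior=0.0):
    """Residual of each sample, predicted only from samples that precede it.

    The stand-in model M_k is the mean label of the samples that appear
    before sample k in `permutation`; `prior` is used for the first sample.
    """
    y = np.asarray(y, dtype=float)
    residuals = np.empty_like(y)
    running_sum, count = 0.0, 0
    for k in permutation:
        prediction = running_sum / count if count else prior
        residuals[k] = y[k] - prediction   # negative gradient of squared error
        running_sum += y[k]                # sample k becomes visible only now
        count += 1
    return residuals


# Hypothetical shear strength indices for five specimens
y_demo = np.array([0.10, 0.15, 0.08, 0.20, 0.12])
perm_demo = np.random.default_rng(0).permutation(len(y_demo))
print(ordered_residuals(y_demo, perm_demo))
```

Because each prediction is formed before the sample's own label is added to the running sum, changing the label of sample k leaves the prediction for sample k unchanged, which is the leakage-free property the text describes.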

Symbolic regression and proposed equations


Symbolic regression (SR)28,29 is a genetic programming technique38 that searches for simple and interpretable
analytic formulas providing the best fit to a given dataset by exploring a predefined space of mathematical
expressions and functions. SR is treated as a multi-objective optimisation problem, finding a balance between
the model’s predictive accuracy and complexity. Genetic programming techniques are often utilised in SR
by applying natural selection and evolution principles to iteratively refine candidate mathematical expressions
until satisfactory models are obtained. This paper uses a Python library named PySR32 to search for simple, interpretable
expressions for the shear capacity of the RCDBs.
The SR algorithm initiates by constructing an initial population with a random combination of operational
symbols or functions (e.g., +, −, /, *, ^, etc.) and terminals, including input variables and constants. This process


Figure 3.  Flow charts of the introduced ML models. (a) CatBoost, (b) Symbolic regression.

generates a tree-like expression for each individual in the population. Individuals are probabilistically selected,
giving preference to the best-performing ones. The selected individuals undergo mutation (Fig. 4a,b) or crossover
(Fig. 4c) to produce a new generation of populations. The fitness function used to identify the best individuals in
each population generation is defined as39

$$ l(E) = l_{\mathrm{pred}}(E) \cdot \exp\left(\mathrm{frecency}[C(E)]\right) \tag{1} $$

where lpred(E) is the prediction loss (selected as the mean absolute error), C(E) is the complexity of the expression
E (the total number of nodes in the expression), and frecency[C(E)] measures the frequency and recency of the
expression occurring at complexity C(E) in the population. This measure is employed to prevent excessive growth
and redundancies in expressions generated, balancing error minimisation and simplicity. Table 3 outlines the SR
parameters used in expression generation. The core steps of SR are presented in Fig. 3b.
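As a rough illustration of Eq. (1), the sketch below scores one candidate expression. The nested-tuple tree representation, the helper names and the `frecency` lookup table are assumptions made for the example; the SR library maintains its own internal statistics, which are not reproduced here.

```python
import math


def complexity(expr):
    """C(E): total number of nodes in a nested-tuple expression tree."""
    if isinstance(expr, tuple):            # (operator, left, right)
        return 1 + sum(complexity(child) for child in expr[1:])
    return 1                               # variable name or constant leaf


def evaluate(expr, variables):
    """Evaluate an expression tree for one sample (dict of variable values)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = evaluate(left, variables), evaluate(right, variables)
        if op == "+":
            return a + b
        if op == "*":
            return a * b
        if op == "/":
            return a / b
        if op == "^":
            return a ** b
        raise ValueError(f"unsupported operator: {op}")
    if isinstance(expr, str):
        return variables[expr]
    return float(expr)


def fitness(expr, samples, targets, frecency):
    """l(E) = l_pred(E) * exp(frecency[C(E)]) with MAE as prediction loss."""
    errors = [abs(evaluate(expr, s) - t) for s, t in zip(samples, targets)]
    l_pred = sum(errors) / len(errors)
    return l_pred * math.exp(frecency.get(complexity(expr), 0.0))


# Hypothetical candidate E = 2 * x, two samples, and a made-up frecency table
candidate = ("*", 2.0, "x")
samples = [{"x": 1.0}, {"x": 2.0}]
targets = [2.0, 5.0]
frecency_table = {3: 0.1}                  # complexity 3 seen often and recently

print(complexity(candidate), fitness(candidate, samples, targets, frecency_table))
```

A larger frecency value at a given complexity inflates the loss of every expression of that size, which is how over-represented expression sizes are penalised.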
Selecting the optimal expression requires numerous iterations and a thorough investigation for each itera-
tion. These iterations encompass trying various custom functions, a diverse set of operators, and extensive
combinations of input features, which could potentially affect the shear strength of RCDBs40. The parameter
selection process includes the most significant features identified from the Pearson correlation matrix, such as
span-to-depth ratio (a/d), concrete strength (fc′), and reinforcement ratios (ρl, ρv, ρh). Additionally, parameters
from previous equations are considered, as outlined in Table 6, such as web reinforcement contribution (ρvfyv,
ρhfyh) and the angle between the strut and the longitudinal axis (θ). The author also introduced some unitless
parameters, including vertical and horizontal web reinforcement contribution factors (ρvfyv/fc′, ρhfyh/fc′) and the
shear strength index (Vu/bwh fc′). The SR algorithm generates different expressions for each iteration using various
combinations of these parameters. Each resulting equation extracted with each iteration undergoes exhaustive
evaluation and refinement. The selection criteria carefully weigh multiple factors, including equation complexity,
accuracy, and interpretability. For RCDBs without web shear reinforcement, the following equation is derived:

Figure 4.  Mutation and crossover operations in SR model. (a) A mutation operation on expression tree,
(b) a mutation operation on input variable, (c) a crossover operation between two trees.


Table 3. The parameters of the SR model used in generating expressions.

Parameter | Value
Number of generations | 200
Total number of populations | 50
Population size | 20
Maximum length of expressions (total number of nodes) | 30 (WWR), 22 (WOR)
Parsimony (factor controlling the expression complexity) | 0.02
Allowed binary operators | +, *, ^, /
Loss function | Mean absolute error
Constraints | {'^': (−1, 10)} (a)
Nested constraints | '^': {'^': 0, '/': 1}, '/': {'/': 0, '^': 1} (b)
Model_selection | Accuracy

(a) The constraint '^': (−1, 10) says that power laws can have any complexity in the left argument, but only 10 complexity (nodes) in the right argument.
(b) The nested constraints specify how many times a combination of operators can be nested. The constraint '/': {'/': 0, '^': 1} indicates that '/' may never appear within '/', but '^' can be nested once in '/'.

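$$ \frac{V_n}{b_w h} = 1.5\, f_c'^{\left(0.85 - 0.22\frac{a}{d}\right)}\, \rho_l^{\left(0.29 - \frac{a}{d}\rho_l\right)}, \quad \frac{a}{d} \le 2.5, \quad \rho_l \le 0.1 \tag{2} $$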
For RCDBs with web shear reinforcement, the following equation is extracted:
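$$ \frac{V_n}{b_w h} = \frac{29\rho_l + 3.8\,\rho_l^{0.3}\sqrt{f_c'}}{(0.3)^{\Psi_{vh}}\,\frac{a}{d} + 0.47}, \quad \Psi_{vh} = \rho_h \frac{f_{yh}}{f_c'} + \rho_v \frac{f_{yv}}{f_c'} \tag{3} $$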

where Ψvh represents the web shear reinforcement contribution factor. The proposed equations establish a
comprehensive and simple framework for predicting the shear strength of RCDBs with meaningful physical
interpretations. For RCDBs without web shear reinforcement, Eq. (2) shows that increasing
the longitudinal reinforcement ratio ρl or the concrete compressive strength enhances shear capacity, while
increasing the shear span-to-depth ratio weakens the beam shear strength. Notably, these findings align well
with the conclusions of Ashour et al.4, who identified the a/d ratio and ρl as the most
significant parameters influencing shear behavior. For RCDBs with web shear reinforcement, Eq. (3) shows that
the shear strength increases with increasing concrete strength, longitudinal reinforcement ratio ρl and web
shear reinforcement contribution Ψvh, and with decreasing a/d ratio. These observations align well with the mechanical
behaviour and experimental results of RCDBs12–14,26,41. Furthermore, the developed expressions are simple,
robust, and physically meaningful compared with the GEP and GA models introduced in the previous studies
in Table 1.
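The trends stated for Eq. (2) can be checked numerically. The sketch below reads Eq. (2) as Vn/(bw h) = 1.5 fc′^(0.85 − 0.22 a/d) ρl^(0.29 − (a/d) ρl) and assumes fc′ in MPa and bw, h in mm, so that the capacity comes out in newtons; the function names, the unit convention and the sample beam are assumptions for illustration only.

```python
def shear_stress_wor(fc, rho_l, a_over_d):
    """Nominal shear stress V_n / (b_w h) from Eq. (2), beams without web reinforcement.

    fc       : concrete strength fc' (assumed MPa)
    rho_l    : bottom longitudinal reinforcement ratio (fraction, not percent)
    a_over_d : shear span-to-depth ratio
    """
    if a_over_d > 2.5 or rho_l > 0.1:
        raise ValueError("Eq. (2) is stated for a/d <= 2.5 and rho_l <= 0.1")
    return 1.5 * fc ** (0.85 - 0.22 * a_over_d) * rho_l ** (0.29 - a_over_d * rho_l)


def shear_capacity_wor_kn(fc, rho_l, a_over_d, bw, h):
    """V_n in kN, assuming bw and h in mm and fc in MPa."""
    return shear_stress_wor(fc, rho_l, a_over_d) * bw * h / 1000.0


# Hypothetical beam close to the database means in Table 2
v_ref = shear_stress_wor(fc=40.0, rho_l=0.02, a_over_d=1.3)
print(f"V_n/(b_w h) = {v_ref:.2f} MPa")
print(f"V_n = {shear_capacity_wor_kn(40.0, 0.02, 1.3, bw=200.0, h=560.0):.0f} kN")

# Trends described in the text
print(shear_stress_wor(40.0, 0.03, 1.3) > v_ref)   # more longitudinal steel
print(shear_stress_wor(60.0, 0.02, 1.3) > v_ref)   # stronger concrete
print(shear_stress_wor(40.0, 0.02, 2.0) < v_ref)   # larger a/d
```

Under this reading, the three comparisons follow the directions described in the paragraph above.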

Data preprocessing and hyperparameter Bayesian optimisation technique


In this study, the min–max scaling technique is utilised for data normalisation to mitigate the adverse effects
of multidimensionality. Following normalisation, the datasets are partitioned into two subsets for training and
testing. Eighty percent of the original dataset is randomly allocated for training, while the remaining 20% is
reserved for testing.
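A minimal sketch of this preprocessing, following the order given in the text (scaling first, then a random 80/20 split), is shown below. It uses plain NumPy; the synthetic feature matrix, the seeds and the helper names are assumptions, and the study's actual implementation may differ.

```python
import numpy as np


def min_max_scale(X):
    """Scale every column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span


def split_80_20(X, y, seed=0):
    """Randomly allocate 80% of the rows to training and 20% to testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(0.8 * len(X)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]


# Synthetic stand-in for three of the 14 features: h (mm), a/d, fc' (MPa)
rng = np.random.default_rng(1)
X_raw = rng.uniform([152.0, 0.27, 11.3], [2100.0, 2.5, 120.1], size=(840, 3))
y_raw = rng.uniform(0.011, 0.293, size=840)

X_scaled = min_max_scale(X_raw)
X_train, X_test, y_train, y_test = split_80_20(X_scaled, y_raw)
print(X_train.shape, X_test.shape)
```

A common alternative is to compute the scaling bounds on the training subset only and reuse them for the test subset, so that no information from the test rows enters the preprocessing.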
The performance of most ML algorithms relies heavily on their hyperparameters, which are set before
model training. Properly tuning these hyperparameters is essential to ensure optimal prediction performance.
Finding the best hyperparameters requires trying various sets of hyperparameters and selecting the
combination that yields the best performance on the validation data. Traditional techniques such as grid search
(GS) and random search (RS) can be exhaustive and time-consuming, especially for models with many
hyperparameters and a large search space. In contrast, Bayesian optimization (BO) models utilise surrogate functions,
e.g., Gaussian processes and tree-structured Parzen estimators (TPE)34, which guide the next selection of the
hyperparameter combination depending on the performance of the previously tested hyperparameter
values. This strategy minimises redundant evaluations, enabling BO to reach the optimal hyperparameter
combination in fewer iterations compared to GS and RS methods42. This study adopted the TPE model34 to optimise
the introduced ML models due to its superior robustness compared to other surrogate functions42. The mean absolute
percentage error (MAPE) on the validation dataset is chosen as the objective function. The expected improvement
(EI) of TPE, defined in Eq. (4), builds a probability model of the objective function and uses it to select the most
promising hyperparameters to evaluate in the true objective function43:
$$ EI_{s^*}(z) = \frac{\text{constant w.r.t. } z}{\gamma + (1 - \gamma)\,\dfrac{g(z)}{l(z)}} \tag{4} $$

where z is the hyperparameter combination chosen from the search space and s* is a threshold chosen to be
some quantile γ of the observed s values, so that p(s < s∗ ) = γ . Additionally, l(z) and g(z) correspond to two
distinct distributions: one where the objective function values are below the threshold, l(z), and another where
the values exceed the threshold, g(z). To maximize EI, TPE focuses on drawing samples of hyperparameters with
the maximum l(z)/g(z) ratios from Eq. (4). Finally, cross-validation was applied to assess the introduced models’
effectiveness, avoid overfitting, and obtain accurate predictions for the testing data. Table 4 presents the optimal
hyperparameters for the introduced ML models.
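The selection rule behind Eq. (4) can be illustrated with a one-dimensional toy version of TPE. This is a sketch under stated assumptions, not the tuning code used in the study: the objective `toy_validation_mape`, the fixed kernel bandwidth, and all parameter values are hypothetical, and production TPE implementations additionally handle mixed search spaces, priors, and adaptive bandwidths.

```python
import math
import random

def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(z):
        return norm * sum(math.exp(-0.5 * ((z - p) / bandwidth) ** 2) for p in points)
    return density

def tpe_suggest(history, gamma=0.25, n_candidates=50, bandwidth=0.1, rng=random):
    """Propose the next z in [0, 1] from (z, score) pairs; lower score is better."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    n_good = min(n_good, len(ranked) - 1)      # keep at least one "bad" point
    good = [z for z, _ in ranked[:n_good]]     # scores below the threshold s*
    bad = [z for z, _ in ranked[n_good:]]      # scores above the threshold s*
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Draw candidates around the good points, then keep the largest l(z)/g(z).
    candidates = [min(1.0, max(0.0, rng.gauss(rng.choice(good), bandwidth)))
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda z: l(z) / (g(z) + 1e-12))

def toy_validation_mape(z):
    """Hypothetical objective standing in for a validation MAPE; minimum at z = 0.3."""
    return 5.0 + 40.0 * (z - 0.3) ** 2

rng = random.Random(0)
history = [(z, toy_validation_mape(z)) for z in (rng.random() for _ in range(10))]
for _ in range(40):
    z_next = tpe_suggest(history, rng=rng)
    history.append((z_next, toy_validation_mape(z_next)))

best_z, best_score = min(history, key=lambda pair: pair[1])
print(round(best_z, 3), round(best_score, 3))
```

Maximising l(z)/g(z) is equivalent to maximising the EI in Eq. (4), because the denominator γ + (1 − γ) g(z)/l(z) shrinks as the ratio grows.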

Scientific Reports | (2024) 14:14590 | https://doi.org/10.1038/s41598-024-64386-w


Content courtesy of Springer Nature, terms of use apply. Rights reserved

ML model        Optimal hyperparameters
CatBoost        iterations = 1696, learning_rate = 0.0906, depth = 4, subsample = 0.389, colsample_bylevel = 0.784, min_data_in_leaf = 10
GPR             Kernel: Constant*RBF + Constant*Matern + Constant*WhiteKernel + Constant*RationalQuadratic, gpr.alpha = 0.002
LightGBM        n_estimators = 1972, learning_rate = 0.0329, max_depth = 50, num_leaves = 10, boosting_type = Gradient Boosting Decision Tree
XGBoost         n_estimators = 1968, max_depth = 41, learning_rate = 0.0611, booster = 'dart', gamma = 0.01
RandomForest    random_state = 1000, n_estimators = 1134, max_depth = 22, min_samples_leaf = 2, max_features = 'log2', bootstrap = False
ANN             number of hidden layers = 1, number of neurons in the hidden layer = 12
SVR             log(C) = 0.9988, log(epsilon) = −74.88644548, log(gamma) = −1.72300541

Table 4.  The optimal hyperparameters for ML models.

Performance and results of ML models


In this section, the performance of the developed ML models is compared. The details of the established
ML models, including hyperparameter tuning and results, are provided in the Supplementary data. In Fig. 5, the
scatter plots depict the relationship between experimental and predicted results across different ML models.
The data points cluster closely around the diagonal line for most of the developed ML models, indicating
a strong agreement between model predictions and test results. This agreement underlines the reliability
and prediction accuracy achieved by the developed models. Table 5 lists the evaluation metrics used to assess
the performance of the implemented models, i.e., the coefficient of determination (R2), the mean (μ), the
coefficient of variation (CoV), the mean absolute percentage error (MAPE), the root mean squared error (RMSE),
and the a20-index, defined as follows:
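$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \quad \mu = \frac{1}{n}\sum_{i=1}^{n}\frac{\hat{y}_i}{y_i}, \quad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i}{y_i} - 1\right|, \quad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (5)$$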
where ŷi and yi are the predicted and actual output values of the i-th specimen, respectively, ȳ is the mean value
of the actual observations, and n is the number of samples in the database. The a20-index [44] is the fraction of
specimens whose ŷi/yi ratio lies within the interval 0.80–1.20.
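The metrics in Eq. (5) and the a20-index can be computed with a short helper. The sketch below is illustrative: the function name `evaluate` and the four-specimen example are hypothetical, and the CoV is assumed to be the population standard deviation of the prediction-to-test ratio divided by its mean, since Eq. (5) does not define it explicitly.

```python
import math
from statistics import mean, pstdev

def evaluate(y_true, y_pred):
    """Return the evaluation metrics of Eq. (5) plus CoV and the a20-index."""
    n = len(y_true)
    y_bar = mean(y_true)
    ratios = [p / t for t, p in zip(y_true, y_pred)]   # prediction-to-test ratios
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mu = mean(ratios)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "mean": mu,
        "CoV": pstdev(ratios) / mu,                    # assumed definition
        "MAPE": 100.0 / n * sum(abs(r - 1.0) for r in ratios),
        "RMSE": math.sqrt(ss_res / n),
        "a20": sum(0.8 <= r <= 1.2 for r in ratios) / n,
    }

# Hypothetical shear strengths in kN (test values, then predictions).
metrics = evaluate([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 300.0, 500.0])
print({name: round(value, 3) for name, value in metrics.items()})
```

For this example the ratios are 1.10, 0.95, 1.00, and 1.25, so three of the four specimens fall inside the 0.80–1.20 band and the a20-index is 0.75.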

Figure 5.  Comparison between proposed equations and ML models for training and testing datasets.

            Training data           Testing data            All data
Metrics     CatB    GPR     LGBM    CatB    GPR     LGBM    CatB    GPR     LGBM    Prop.Eqn (WOR)  Prop.Eqn (WWR)
Mean µ      1.005   1.006   1.008   1.026   1.023   1.026   1.01    1.01    1.012   1.003           1.004
CoV         0.062   0.062   0.076   0.141   0.151   0.154   0.085   0.088   0.098   0.207           0.192
R2          0.997   0.997   0.991   0.933   0.947   0.875   0.986   0.988   0.971   0.917           0.937
MAPE (%)    4.41    4.44    5.32    9.32    10.27   9.61    5.39    5.60    6.18    16.80           13.70
RMSE (kN)   36.8    39.6    63.6    160.9   143.3   219.9   79.1    73.2    113.6   115.9           196.7
a20-index   0.994   0.993   0.981   0.899   0.857   0.887   0.975   0.965   0.962   0.68            0.78

Table 5.  Comparison of the developed ML models.


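Code standard / model      Formulas

ACI318 [17]                $V_{ACI} = 0.85\,\beta_s f'_c\, b_w w_s \sin\theta$, $w_s = \left[1.8\,w_t\cos\theta + (w_{tp} + w_{bp})\sin\theta\right]/2$

EC2 [18]                   $V_{u,EU} = 0.85\,\beta_s f'_c\, b_w w_s \sin\theta$, $w_s = \left[1.85\,w_t\cos\theta + (w_{tp} + w_{bp})\sin\theta\right]/2$

Matamoros and Wong [13]    $V_{M-W} = C_c f'_c b_w w_s + C_{wv}\rho_v b_w \frac{a}{3} f_{yv} + C_{wh}\rho_h b_w \frac{d}{3} f_{yh}$, $C_c = 0.3/(a/d)$, $C_{wv} = 1$, $C_{wh} = 3(1 - a/d)$

Russo et al. [14]          $V_{Russo} = 0.76\left[k\chi f'_c \sin\theta + 0.35\,\frac{a}{d}\,\rho_v f_{yv} + 0.25\tan\theta\,\rho_h f_{yh}\right] b_w d$, $\chi = 0.74r^3 - 1.28r^2 + 0.22r + 0.87$, $r = f'_c/105$
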
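Current study              Without web reinf.: $\dfrac{V_n}{b_w h} = 1.5\, {f'_c}^{\,(0.85 - 0.22\,a/d)}\, \rho_l^{\,(0.29 - (a/d)\rho_l)}$, $\dfrac{a}{d} \le 2.5$, $\rho_l \le 0.1$

                           With web reinf.: $\dfrac{V_n}{b_w h} = \dfrac{\left(29\rho_l + 3.8\rho_l^{0.3}\right)\sqrt{f'_c}}{(0.3)^{\omega_{vh}}\,(a/d) + 0.47}$, $\omega_{vh} = \rho_h \dfrac{f_{yh}}{f'_c} + \rho_v \dfrac{f_{yv}}{f'_c}$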

Table 6.  Summary of previous mechanical models for predicting the shear strength of RCDBs. MW stands for
Matamoros and Wong's formula. Here, βs is the strut coefficient, θ is the angle between the strut and the
longitudinal axis, ws and wt are the widths of the strut and tie, εs is the tie's tensile strain, ρv is the reinforcement
ratio for VWR, and ρh is the reinforcement ratio for HWR; the χ function is obtained for 10 ≤ fc′ ≤ 105 MPa.

As shown in Table 5, all introduced ML models display mean μ, R2, and a20-index values close to 1.0 and
small values for CoV, MAPE, and RMSE. The MAPE values for the CATB model are 4.41 and 9.32 in the training
and testing sets, respectively, the lowest among the compared models. Similarly, those of the
GPR model are 4.94 and 10.27, and those of the LGBM model are 5.32 and 9.61, indicating the high accuracy
of the developed models. The CoV and MAPE for all ML models are roughly twice as high for the testing data
as for the training data, a moderate gap that is taken to indicate consistent training with limited overfitting.
Furthermore, the μ values of the CATB model are 1.005 and 1.026, the R2 values are 0.997 and 0.933, and the
a20-index values are 0.994 and 0.899 in the training and testing sets, respectively, which are all close to 1.00.
Such evaluation metrics reveal that the CATB model offers the best prediction accuracy and predictive balance
between the training and testing sets.
While CATB, GPR, and LGBM models exhibit superior results, deriving an explicit design formula from these
models is challenging. The black-box and difficult-to-interpret nature of these models hinders their practical
implementation in engineering design. Therefore, this study tackles this challenge by introducing straightforward
and practical explicit design formulas through the SR technique. As shown in Table 5, the proposed equations (Table 6)
yield μ values of 1.003 and 1.004, R2 values of 0.917 and 0.937, and CoV values of 0.207 and 0.192 for the RCDBs
without web reinforcement (WOR) and with web reinforcement (WWR) cases, respectively. Despite their slightly
lower accuracy compared to the introduced ML models, these SR-derived formulas are more accessible and easier
to interpret, encouraging their practical utility in engineering applications.

Comparisons with closed‑form models


In this section, the proposed equations are compared with four existing closed-form models (listed in Table 6)
for performance evaluation: two standard codes, i.e., ACI 318-19 [17] and EC2 [18], and the equations proposed by
Matamoros and Wong (MW) [13] and Russo et al. [14]. Table 7 summarises the statistical information about
the predictive capability of these models compared to the proposed equations for two
reinforcement configurations, i.e., the case without web reinforcement (WOR) and the case with web
reinforcement (WWR). The values of (μ, CoV) obtained by the proposed equations are (1.003, 0.207) and (1.004, 0.192)
for the WOR and WWR cases, respectively, which shows that these expressions perform well in terms of predictive
stability and robustness compared to the existing closed-form models. Additionally, Fig. 6 presents scatter
plots of the experimental versus predicted results for the entire database
obtained by the proposed expressions and the four closed-form models. In Fig. 6, the ACI 318-19, EC2, and MW
expressions exhibit similar performance, with the data points skewed above the diagonal, indicating that these
models tend toward conservative predictions. On the other hand, the proposed equations yield
prediction-to-test ratios concentrated around unity, with (μ, CoV) values of (1.003, 0.198), the best results among
these closed-form models. Furthermore, the CATBoost model displays superior performance, with (μ, CoV) values of
(1.01, 0.085), highlighting the efficacy of ML techniques for shear strength prediction of RCDBs.

Without web reinforcement (WOR)
Metrics     MW [13]     Russo [14]  EC2 [18]    ACI [17]    Prop.
Mean µ      0.782       1.083       0.589       0.669       1.003
CoV         0.308       0.251       0.299       0.283       0.207
R2          0.824       0.866       0.487       0.671       0.917
MAPE (%)    43.08       17.93       82.88       62.32       16.80
RMSE (kN)   168.6       147.3       288         230.6       115.9
a20-index   0.301       0.621       0.087       0.177       0.68

With web reinforcement (WWR)
Metrics     MW [13]     Russo [14]  EC2 [18]    ACI [17]    Prop.
Mean µ      0.822       1.075       0.732       0.718       1.004
CoV         0.432       0.233       0.35        0.409       0.192
R2          0.634       0.942       0.722       0.559       0.937
MAPE (%)    70.09       13.57       56.70       65.93       13.70
RMSE (kN)   474.2       189.3       413.2       520.6       196.7
a20-index   0.407       0.763       0.27        0.241       0.78

Overall
Metrics     MW [13]     Russo [14]  EC2 [18]    ACI [17]    CatB        Prop.
Mean µ      0.807       1.078       0.677       0.699       1.01        1.003
CoV         0.392       0.24        0.353       0.372       0.085       0.198
R2          0.664       0.932       0.692       0.579       0.986       0.935
MAPE (%)    59.74       15.28       66.73       64.544      5.389       14.89
RMSE (kN)   386.7       174.4       370.2       433         79.1        170.3
a20-index   0.367       0.708       0.2         0.217       0.975       0.742

Table 7.  Comparison of the proposed equations and the CatBoost model with the closed-form models.


Figure 6.  Comparison between proposed equations and previous models.

Figure 7 illustrates the prediction errors of both design standards and the developed ML models. In Fig. 7a, the
CATB, GPR, and LGBM models demonstrate high precision, with over 81% of test samples falling within the 10%
error range. In contrast, the MW and Russo formulas place 21% and 39% of test samples within the same error
range, respectively. The ACI318 and EC2 provisions perform less effectively, capturing only 11% and 16%
of test samples within the 10% error range, respectively. These results highlight the superior accuracy of most ML
models, particularly CATB, GPR, and LGBM, in predicting the shear strength of RCDBs compared to traditional
design standards. In Fig. 7b, the performance of the proposed equation and the Russo formula for the WOR case is
comparable, with a slight advantage for the proposed equation. The better performance of the proposed equation
in the WOR case is evident in Table 7, where it exhibits a smaller CoV (0.207) than
the Russo formula (0.251). In the WWR case, as shown in Fig. 7b, the proposed equation gives slightly less
scattered predictions (CoV of 0.192) than the Russo formula (CoV of 0.233). It also outperforms the
ACI 318-19, EC2, and MW results, capturing almost twice as many test samples as the MW formula and four
times as many as ACI318 and EC2 within the same error ranges. Although the results of the proposed
equations and the Russo formula are comparable, the proposed equations are more straightforward. Furthermore,
most performance metrics for the proposed equations, outlined in Table 7, surpass those of the previously
introduced mechanical models; the exceptions are R2, MAPE, and RMSE in the WWR case, where the Russo formula
is marginally better.

Feature importance analysis


Evaluating the influence of input parameters on the shear strength of RCDBs is a critical aspect of designing
RCDBs. This study employs the Shapley Additive Explanation (SHAP) method to analyze the impact of input
parameters on the shear strength parameter, Vu/bwh [45]. Figure 8a and b display the SHAP feature importance
of each input feature for the WOR and WWR databases, respectively. A feature importance value greater than
zero indicates a positive correlation between the variable and the strength index. In contrast, a value less than
zero signifies a negative impact on the strength index. The span-to-depth ratio (a/d), concrete strength (fc′), and
longitudinal reinforcement ratio (ρl) stand out as the most influential design parameters within the dataset for
both WOR and WWR RCDBs. In addition, feature importance analysis shows that vertical and horizontal web

Figure 7.  Prediction errors of design standards and established ML models. (a) The proposed equations, ML
models and previous models, (b) The proposed equations and previous models for WOR and WWR cases.


Figure 8.  Features importance for inputs influencing shear strength of RC deep beams. (a) Database without
web reinforcement, (b) database with web reinforcement.

reinforcement ratios (ρv, ρh) are the fourth and fifth most important features for the WWR database. The remaining
features are ranked in descending order of importance.
Additionally, it can be observed that, except for the a/d ratio and beam height (h), all other input variables
have a positive or mixed impact on the strength index. Increasing the concrete strength, the reinforcement ratios
(ρl, ρv, ρh), and their yield strengths enhances the shear strength of RCDBs, while the a/d ratio and beam height (h)
negatively influence shear strength. The negative impact of the a/d ratio aligns with the experimental results
reported by Kani [41], which showed that beams exhibit higher shear resistance at lower a/d values. Furthermore,
increasing the beam height (h) reduces the shear resistance, as a deeper beam leads to deterioration of the shear
transfer strength by aggregate interlock of the critical shear crack and relatively high energy release, thereby
aggravating the reduction in shear resistance [46].
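The idea behind SHAP attributions can be shown with an exact Shapley computation on a toy model. This is a sketch, not the SHAP workflow used in the study: `toy_strength` is a hypothetical strength index, a single baseline point replaces the background dataset used by SHAP explainers, and the brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest stay at baseline.
        return model([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_strength(features):
    """Hypothetical strength index: rises with fc and rho_l, falls with a/d."""
    a_over_d, fc, rho_l = features
    return 0.1 * fc * (1.0 + 20.0 * rho_l) / (0.5 + a_over_d)

x = [2.0, 40.0, 0.02]          # specimen to explain: a/d, fc (MPa), rho_l
baseline = [1.0, 30.0, 0.01]   # reference specimen
phi = shapley_values(toy_strength, x, baseline)
print([round(p, 4) for p in phi])
```

The attributions sum to the difference between the prediction for `x` and the prediction for `baseline`, and their signs mirror the trends discussed above: a negative contribution from a larger a/d and positive contributions from higher concrete strength and reinforcement ratio.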

Reliability analysis
This section presents the reliability indices for the shear strength of RCDBs obtained with the CATB model
and the two proposed equations. In addition, it assesses the existing design factors outlined in two existing
code standards, ACI318-19 [17] and EC2 [18]. The limit state function g of the shear strength of RCDBs [47] can be
defined as:
$$g = R - Q = \frac{1}{\theta_R} V_{uc} - (D + L) \qquad (6)$$
where R is the random shear resistance of RCDBs, defined as the predicted shear capacity (Vuc) divided
by the prediction-to-test ratio θR, and Q is the random load effect, comprising the dead load (D) and live
load (L). The value Vuc is calculated for each model from Table 6 with the partial resistance factors taken as unity,
using the random values of the design variables given in Table 8. Using the distribution fitting tool in MATLAB,
the θR ratio was found to be best fitted by a lognormal distribution with the mean and variance corresponding to
each code standard, as indicated in Table 9. The nominal values Dn and Ln can be computed from the design
resistance Vd for a given live-to-dead load ratio (Ln/Dn) as follows:
 
fck fy
(7)
 
Vd , or φVn fck , fy = Sd (i.e. γD Dn + γL Ln )
γc γs

Rd or φRn
Dn = , Ln = Dn · k (8)
γD + γL · k
where k is the live-to-dead load ratio Ln/Dn. The design resistance (Vd) is obtained either by dividing
the characteristic strengths of the concrete and steel materials (fck and fy) by the material partial factors
(γc and γs) [18] or by multiplying the nominal resistance (Vn) by a strength reduction factor (ϕ) [17]. Vd is then
balanced against the factored design load effect (Sd) to ensure a suitable safety margin. Sd is obtained by
multiplying the nominal dead and live loads (Dn and Ln) by their respective partial load factors (γD and γL) and
combining them linearly. These partial factors are summarised in Table 9 for each code standard.
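As a worked instance of Eq. (8), the nominal loads follow directly from the design resistance and the load ratio. The function name `nominal_loads` and the 1000 kN design resistance are hypothetical; the default load factors are the ACI318 values listed in Table 9.

```python
def nominal_loads(design_resistance, k, gamma_d=1.2, gamma_l=1.6):
    """Return (Dn, Ln) such that gamma_d*Dn + gamma_l*Ln equals the design resistance."""
    d_n = design_resistance / (gamma_d + gamma_l * k)   # Eq. (8)
    return d_n, d_n * k

# Hypothetical design resistance of 1000 kN with a live-to-dead load ratio k = 0.5.
d_n, l_n = nominal_loads(design_resistance=1000.0, k=0.5)
print(round(d_n, 3), round(l_n, 3))  # 500.0 250.0
```

Substituting back gives 1.2 × 500 + 1.6 × 250 = 1000 kN, so the factored load effect in Eq. (7) matches the design resistance.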
The safety level of structures can be measured by the reliability index β, which is related to the failure
probability Pf as follows [57]:

$$\beta = -\Phi^{-1}\left(P_f\right) \qquad (9)$$



Properties     Variables         Mean         CoV (%)  Std.         Distribution    Space                   Refs.
Geometry       bw (mm)           bw + 2.286   –        4.826 (mm)   Normal          200                     47,48
Geometry       h (mm)            h            2.0      –            Normal          {1000, 2000, 3000}      49,50
Geometry       d (mm)            d − 4.826    –        12.7 (mm)    Normal          0.97 (h)                47,48
Geometry       a/d               a/d          –        –            Deterministic   {0.5, 1.25, 2.0}        47
Material       fc′ (MPa)         χfc′*        10.1     –            Normal          {20, 40, 60}            51,52
Material       fy (MPa)          1.2fy        8.3      –            Lognormal       {235, 355, 420}         50,53,54
Material       fyh (MPa)         1.2fy        8.3      –            Lognormal       235                     50,53,54
Material       fyv (MPa)         1.2fy        8.3      –            Lognormal       235                     50,53,54
Reinforcement  ρl                ρl           1.25     –            Normal          {0.004, 0.012, 0.02}    50,54,55
Reinforcement  ρhw               ρhw          1.25     –            Normal          {0.002, 0.06, 0.01}     50,54,55
Reinforcement  ρvw               ρvw          1.25     –            Normal          {0.002, 0.06, 0.01}     50,54,55
Load           k (load ratio)    k            –        –            Deterministic   {0.5, 1.25, 2.0}        –
Load           D (dead load)     1.05D        0.1      –            Normal          –                       24,56
Load           L (live load)     L            0.18     –            Gumbel          –                       47,56

Table 8.  Statistical properties of random variables. *$\chi = 3.0469 - 0.13543 f'_c + 0.31743\,(0.1 f'_c)^2 - 0.02413\,(0.1 f'_c)^3$ [51].


                                          ACI318 [17]  EC2 [18]   CATB       Prop.Eqn (WOR)  Prop.Eqn (WWR)
Load factor γ
  Dead load γD                            1.2          1.35       1.2        1.2**           1.2**
  Live load γL                            1.6          1.5        1.6        1.6             1.6
Best-fit distribution θR (Table *)        Lognormal    Lognormal  Lognormal  Lognormal       Lognormal
  Mean (Table *)                          0.699        0.677      1.01       1.003           1.004
  CoV (Table *)                           0.372        0.353      0.085      0.207           0.192
Target reliability β                      3.559        3.818      5.29*      3.5**           3.5**
Evaluated strength reduction factor ϕ
for the target reliability index          0.58         0.78       1.0        0.76            0.78

Table 9.  Load and resistance factors, the prediction-to-test ratio θR distributions, and the recommended strength
reduction factor ϕ. *Target reliability β = 5.29 evaluated for strength reduction factor ϕ = 1.0. **Load factors
and target reliability β are assumed to be identical to those of ACI318.

where Φ is the standard normal cumulative distribution function. Monte Carlo simulation (MCS) is employed to
determine the reliability index due to its simplicity, insensitivity to problem dimensions, and satisfactory
accuracy [56]. In MCS, the failure probability can be calculated as

$$P_f = \frac{N_{fail}}{N} \qquad (10)$$

where N and Nfail are the total number of simulations and the number of failed simulations (when the limit state
function is violated, i.e., g ≤ 0), respectively. To accurately predict the reliability index of the design codes, the
uncertainty or randomness of all input variables, including material, geometry, and loads, should be considered [47].
Thirteen random variables are considered in this study, and their statistical properties are summarised in Table 8.
The random inputs are generated stochastically from their respective distribution functions (Table 8) and cover
a wide range of geometric and material parameters of RCDB configurations. They include three values of concrete
compressive strength fc′ = {20, 40, 60} MPa, three values of beam height h = {1000, 2000, 3000} mm, three ratios
each for the longitudinal, VWR, and HWR reinforcement, three values of longitudinal steel yield stress
fyl = {235, 355, 420} MPa, four ratios of a/d = {0.5, 1.0, 1.5, 2.0}, and four ratios of Ln/Dn = {0.5, 1.0, 1.5, 2.0}.
In total, 3 × 3 × 3 × 3 × 3 × 3 × 4 × 4 = 11,664 beam configurations are considered for each model. The target
safety levels stipulated by the ACI318 [17] and EC2 [18] provisions for the shear strength of RCDBs are 3.5 and 3.8,
respectively. The accuracy of MCS depends on the number of samples N. The number of samples used in this study
to achieve a reliability index β of 3.8 with acceptable accuracy (CoV of 5%) is 5,528,430 [58].
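The Monte Carlo procedure built on Eqs. (6), (8), (9), and (10) can be sketched in a few lines. The sketch makes simplifying assumptions that are not part of the study: the predicted capacity Vuc is held at a normalised deterministic value of 1.0 instead of being resampled from the Table 8 variables, only θR, D, and L are random, and the sample-size estimate N = (1 − Pf)/(Pf · CoV²) is a commonly used rule that happens to reproduce a value close to the one quoted above. Function names and default arguments are illustrative.

```python
import math
import random
from statistics import NormalDist

def lognormal_params(mean, cov):
    """Convert a mean and CoV into the (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + cov ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)

def gumbel_sample(rng, mean, cov):
    """Draw from a Gumbel (maximum) distribution with the given mean and CoV."""
    scale = mean * cov * math.sqrt(6.0) / math.pi
    loc = mean - 0.5772156649 * scale
    u = max(rng.random(), 1e-300)              # guard against log(0)
    return loc - scale * math.log(-math.log(u))

def reliability_index(phi, theta_mean, theta_cov, k=1.0,
                      gamma_d=1.2, gamma_l=1.6, n=100_000, seed=0):
    """Estimate (Pf, beta) for a strength reduction factor phi."""
    rng = random.Random(seed)
    v_uc = 1.0                                  # assumed normalised, deterministic capacity
    d_n = phi * v_uc / (gamma_d + gamma_l * k)  # nominal dead load, Eq. (8)
    l_n = k * d_n                               # nominal live load, Eq. (8)
    mu, sigma = lognormal_params(theta_mean, theta_cov)
    fails = 0
    for _ in range(n):
        theta = rng.lognormvariate(mu, sigma)            # prediction-to-test ratio
        dead = rng.gauss(1.05 * d_n, 0.10 * 1.05 * d_n)  # Table 8: bias 1.05, CoV 0.10
        live = gumbel_sample(rng, l_n, 0.18)             # Table 8: Gumbel, CoV 0.18
        if v_uc / theta - (dead + live) <= 0.0:          # limit state, Eq. (6)
            fails += 1
    p_f = fails / n                                      # Eq. (10)
    if p_f == 0.0:
        return p_f, float("inf")
    if p_f == 1.0:
        return p_f, float("-inf")
    return p_f, -NormalDist().inv_cdf(p_f)               # Eq. (9)

def required_samples(beta, cov=0.05):
    """Samples needed so that the Pf estimate has the requested CoV (assumed rule)."""
    p_f = NormalDist().cdf(-beta)
    return math.ceil((1.0 - p_f) / (p_f * cov ** 2))

# Table 9 values for the proposed WOR equation: mean 1.003, CoV 0.207, phi = 0.76.
p_f, beta = reliability_index(phi=0.76, theta_mean=1.003, theta_cov=0.207)
print(p_f, round(beta, 2))
print(required_samples(3.8))
```

With only 100,000 samples the estimate of β near 3.5 is noisy, which is why the study uses several million samples per configuration; the simplified resistance model also means the printed β should not be expected to match Table 9.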
As illustrated in Fig. 9, the strength reduction factors (ϕ) for the proposed equations are 0.76 and 0.78 for
cases without web reinforcement (WOR) and with web reinforcement (WWR), respectively, at a reliability
index value of 3.5. The higher strength reduction factor in the WWR case is attributed to the ductile behaviour
exhibited by beams with web reinforcement compared to those without web reinforcement. Furthermore, the
strength reduction factors corresponding to the target reliability for the shear strength design of RCDBs accord-
ing to ACI318 and EC2 are 0.58 and 0.78, respectively. While the strength reduction factor for the proposed


Figure 9.  Variation of reliability index β in terms of strength reduction factor ϕ for the proposed equations,
EC2 and ACI318.

equation without web reinforcement is comparable to that of EC2 (ϕ = 0.78), a notable distinction lies in their
mean values of θR. As detailed in Table 9, the proposed equation yields a mean value of θR close to 1.0, whereas
the EC2 code standard yields a smaller mean value of 0.677. As per Eqs. (6) and (10), smaller mean values of
θR correspond to lower failure probabilities and relatively high strength reduction factors. Therefore, the
reliability associated with the proposed equations surpasses the reliability results obtained by applying code
standards. Moreover, Fig. 9 reveals that the CATB model, when used with ϕ = 1.0, can achieve a high reliability
index of 5.29. This high reliability index is attributed to the low CoV of the CATB model compared to the other
models, as outlined in Table 9, indicating the reliability of using ML models in enhancing the predictive accuracy
for the shear strength of RCDBs.

Conclusions
In conclusion, this study compiled a comprehensive database of 840 experimental tests on the
shear strength of RCDBs from various research papers. It employed eight machine learning models optimised
using the Bayesian Optimization (BO) technique. In addition, explicit expressions are proposed for designing
RCDBs. From the evaluation results, the following conclusions can be drawn:

• The CATBoost, GPR, and LGBM models exhibited outstanding accuracy and stability, surpassing traditional
design standards. The CATBoost model demonstrated the best prediction accuracy and generalisation ability,
outperforming other ML models.
• The introduced explicit design formulas, derived through symbolic regression, are simpler than previous
approaches while remaining robust.
• Comparison with closed-form models and design standards, such as ACI 318-19 and EC2, highlighted the
efficiency of the proposed equations, which displayed superior predictive stability and robustness.
• SHAP analysis revealed that increasing concrete strength, reinforcement ratios (ρl, ρv, ρh) and their yield
strength will enhance the performance of RCDBs, while increasing the a/d ratio and beam height (h) will
negatively impact the shear strength parameter, Vu/bwh.
• The reliability analysis indicated that the CATBoost model and proposed equations surpassed code standards
regarding reliability and accuracy.

In summary, the ML-based approach is promising for accurately predicting the
shear strength of RCDBs, providing valuable insights for engineering applications.

Data availability
All data generated or analysed during this study are included in this published article and available in a public
repository: https://github.com/kmegahed/Deep-beam-ML-models.

Received: 12 March 2024; Accepted: 7 June 2024

References
1. MacGregor, J. G., Wight, J. K., Teng, S. & Irawan, P. Reinforced Concrete: Mechanics and Design Vol. 3 (Prentice Hall, 1997).
2. Ma, C. et al. Prediction of shear strength of RC deep beams based on interpretable machine learning. Constr. Build. Mater. 387, 131640. https://doi.org/10.1016/j.conbuildmat.2023.131640 (2023).


3. Le Nguyen, K., Thi Trinh, H., Nguyen, T. T. & Nguyen, H. D. Comparative study on the performance of different machine learning techniques to predict the shear strength of RC deep beams: Model selection and industry implications. Expert Syst. Appl. 230, 120649. https://doi.org/10.1016/j.eswa.2023.120649 (2023).
4. Ashour, A. F., Alvarez, L. F. & Toropov, V. V. Empirical modelling of shear strength of RC deep beams by genetic programming. Comput. Struct. 81(5), 331–338. https://doi.org/10.1016/S0045-7949(02)00437-6 (2003).
5. Cheng, M. Y. & Cao, M. T. Evolutionary multivariate adaptive regression splines for estimating shear strength in reinforced-concrete deep beams. Eng. Appl. Artif. Intell. 28, 86–96. https://doi.org/10.1016/j.engappai.2013.11.001 (2014).
6. Feng, D.-C., Wang, W.-J., Mangalathu, S., Hu, G. & Wu, T. Implementing ensemble learning methods to predict the shear strength of RC deep beams with/without web reinforcements. Eng. Struct. 235, 111979. https://doi.org/10.1016/j.engstruct.2021.111979 (2021).
7. Tiwari, A., Gupta, A. K. & Gupta, T. A robust approach to shear strength prediction of reinforced concrete deep beams using ensemble learning with SHAP interpretability. Soft Comput. https://doi.org/10.1007/s00500-023-09495-w (2023).
8. Liu, M. Y., Li, Z. & Zhang, H. Probabilistic shear strength prediction for deep beams based on Bayesian-optimized data-driven approach. Buildings 13(10), 1–16. https://doi.org/10.3390/buildings13102471 (2023).
9. Shahnewaz, M., Rteil, A. & Alam, M. S. Shear strength of reinforced concrete deep beams – A review with improved model by genetic algorithm and reliability analysis. Structures 23, 494–508. https://doi.org/10.1016/j.istruc.2019.09.006 (2020).
10. Wakjira, T., Ibrahim, M., Sajjad, B. & Ebead, U. Shear capacity of reinforced concrete deep beams using genetic algorithm. IOP Conf. Ser. Mater. Sci. Eng. 910(1), 012002. https://doi.org/10.1088/1757-899X/910/1/012002 (2020).
11. Hameed, M. M., Khaleel, F., AlOmar, M. K., Mohd Razali, S. F. & Alsaadi, M. A. Optimising the selection of input variables to increase the predicting accuracy of shear strength for deep beams. Complexity https://doi.org/10.1155/2022/6532763 (2022).
12. Park, J. & Kuchma, D. Strut-and-tie model analysis for strength prediction of deep beams. ACI Struct. J. 104, 657–666 (2007).
13. Matamoros, A. B. & Wong, K. H. Design of simply supported deep beams using strut-and-tie models. ACI Struct. J. 100(6), 704–712 (2003).
14. Russo, G., Pauletta, M. & Venir, R. Reinforced concrete deep beams-shear strength model and design formula. ACI Struct. J. 102(3), 429. https://doi.org/10.14359/14414 (2005).
15. Vecchio, F. J. & Collins, M. P. The modified compression-field theory for reinforced concrete elements subjected to shear. ACI J. 19(16), 219–231 (1986).
16. Tang, C. Y. & Tan, K.-H. Interactive mechanical model for shear strength of deep beams. J. Struct. Eng. ASCE 130, 1534–1544. https://doi.org/10.1061/(ASCE)0733-9445(2004)130:10(1534) (2004).
17. ACI Committee 318. Building Code Requirements for Structural Concrete: (ACI 318-19); and Commentary (ACI 318R-19) (American Concrete Institute, 2019).
18. Hendy, C. R. & Smith, D. A. Designers' Guide to EN 1992-2: Eurocode 2: Design of Concrete Structures: Part 2: Concrete Bridges Vol. 17 (Thomas Telford, 2007).
19. Chen, H., Yi, W. J. & Hwang, H. J. Cracking strut-and-tie model for shear strength evaluation of reinforced concrete deep beams. Eng. Struct. 163, 396–408. https://doi.org/10.1016/j.engstruct.2018.02.077 (2018).
20. Chetchotisak, P., Teerawong, J. & Yindeesuk, S. Modified interactive strut-and-tie modeling of reinforced concrete deep beams and corbels. Structures 45, 284–298. https://doi.org/10.1016/j.istruc.2022.08.116 (2022).
21. Fan, S., Zhang, Y., Ma, Y.-X. & Tan, K. H. Strut-and-tie and finite element modelling of unsymmetrically-loaded deep beams. Structures 36, 805–821. https://doi.org/10.1016/j.istruc.2021.12.037 (2022).
22. Liang, S., Shen, Y., Gao, X., Cai, Y. & Fei, Z. Symbolic machine learning improved MCFT model for punching shear resistance of FRP-reinforced concrete slabs. J. Build. Eng. 69, 106257. https://doi.org/10.1016/j.jobe.2023.106257 (2023).
23. Aguilar, V., Barnes, R. W. & Nowak, A. Strength reduction factors for ACI 318 strut-and-tie method for deep beams. ACI Struct. J. 119(2), 103–112 (2022).
24. Muendacha, D., Teerawong, J. & Chetchotisak, P. A safety-based evaluation of strut-and-tie methods for shear design of RC deep beams in accordance with international concrete codes. Eng. Appl. Sci. Res. 47(2), 137–144. https://doi.org/10.14456/easr.2020.14 (2020).
25. Shen, L., Shen, Y. & Liang, S. Reliability analysis of RC slab-column joints under punching shear load using a machine learning-based surrogate model. Buildings 12(10), 1750. https://doi.org/10.3390/buildings12101750 (2022).
26. Ismail, K. S. Shear Behaviour of Reinforced Concrete Deep Beams (University of Sheffield, 2016).
27. Schober, P., Boer, C. & Schwarte, L. A. Correlation coefficients: Appropriate use and interpretation. Anesth. Analg. 126(5), 1763–1768 (2018).
28. Koza, J. R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 4(2), 87–112. https://doi.org/10.1007/BF00175355 (1994).
29. Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Sci. Adv. 6(16), eaay2631 (2020).
30. Rasmussen, C. E. et al. Gaussian Processes for Machine Learning Vol. 1 (Springer, 2006).
31. Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, vol. 30. Available: https://proceedings.neurips.cc/paper_files/paper/2017/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf (2017).
32. Breiman, L. Random forests. Mach. Learn. 45(1), 5–32. https://doi.org/10.1023/A:1010933404324 (2001).
33. Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: gradient boosting with categorical features support. CoRR, Available: http://arxiv.org/abs/1810.11363 (2018).
34. Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785 (2016).
35. Suykens, J. A. K. & Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300. https://doi.org/10.1023/A:1018628609742 (1999).
36. Schapire, R. E. The strength of weak learnability. Mach. Learn. 5(2), 197–227. https://doi.org/10.1007/BF00116037 (1990).
37. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: unbiased boosting with categorical features. In: Advances in Neural Information Processing Systems, vol. 31 (2018).
38. Goldberg, D. E. & Holland, J. H. Genetic algorithms and machine learning. Mach. Learn. 3(2), 95–99. https://doi.org/10.1023/A:1022602019183 (1988).
39. Cranmer, M. Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. Available: http://arxiv.org/abs/2305.01582 (2023).
40. Megahed, K., Mahmoud, N. S. & Abd-Rabou, S. E. M. Prediction of the axial compression capacity of stub CFST columns using machine learning techniques. Sci. Rep. 14(1), 2885. https://doi.org/10.1038/s41598-024-53352-1 (2024).
41. Kani, G. How safe are our large reinforced concrete beams? J. Proc. 64(3), 128–141 (1967).
42. Yang, L. & Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 415, 295–316. https://doi.org/10.1016/j.neucom.2020.07.061 (2020).
43. Bergstra, J., Bardenet, R., Bengio, Y. & Kégl, B. Algorithms for hyper-parameter optimization. In: Advances in Neural Information Processing Systems, vol. 24. Available: https://proceedings.neurips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf (2011).
44. Asteris, P. G. & Mokos, V. G. Concrete compressive strength using artificial neural networks. Neural Comput. Appl. 32(15), 11807–11826. https://doi.org/10.1007/s00521-019-04663-2 (2020).

45. Wang, J., Lu, R. & Cheng, M. Application of ensemble model in capacity prediction of the CCFST columns under axial and eccentric loading. Sci. Rep. 13(1), 9488. https://doi.org/10.1038/s41598-023-36576-5 (2023).
46. Chen, H., Yi, W.-J. & Ma, Z. J. Shear size effect in simply supported RC deep beams. Eng. Struct. 182, 268–278. https://doi.org/10.1016/j.engstruct.2018.12.062 (2019).
47. Nasrollahzadeh, K. & Aghamohammadi, R. Reliability analysis of shear strength provisions for FRP-reinforced concrete beams. Eng. Struct. 176, 785–800. https://doi.org/10.1016/j.engstruct.2018.09.016 (2018).
48. Mirza, S. A. & MacGregor, J. G. Probabilistic study of strength of reinforced concrete members. Can. J. Civ. Eng. 9(3), 431–448. https://doi.org/10.1139/l82-053 (1982).
49. Sýkora, M., Holický, M. & Marková, J. Verification of existing reinforced concrete bridges using the semi-probabilistic approach. Eng. Struct. 56, 1419–1426. https://doi.org/10.1016/j.engstruct.2013.07.015 (2013).
50. Yang, I. H., Joh, C. & Kim, B.-S. Structural behavior of ultra high performance concrete beams subjected to bending. Eng. Struct. 32(11), 3478–3487. https://doi.org/10.1016/j.engstruct.2010.07.017 (2010).
51. Nowak, A. S. & Szerszen, M. M. Calibration of design code for buildings (ACI 318): Part 1—Statistical models for resistance. ACI
Struct. J. 100(3), 377–382 (2003).
52. Eamon, C. & Jensen, E. Reliability analysis of RC beams exposed to fire. J. Struct. Eng. 139, 212–220. https://doi.org/10.1061/(ASCE)ST.1943-541X.0000614 (2013).
53. Hess, P. E., Bruchman, D., Assakkaf, I. A. & Ayyub, B. M. Uncertainties in material and geometric strength and load variables. Nav. Eng. J. 114(2), 139–166. https://doi.org/10.1111/j.1559-3584.2002.tb00128.x (2002).
54. Abbas, Y. M. Shear behavior of ultra-high-performance reinforced concrete beams—Finite element and uncertainty quantification
study. Structures 47, 2365–2380. https://doi.org/10.1016/j.istruc.2022.12.060 (2023).
55. Al-Harthy, A. S. & Frangopol, D. M. Reliability assessment of prestressed concrete beams. J. Struct. Eng. 120(1), 180–199 (1994).
56. Nowak, A. S. & Collins, K. R. Reliability of Structures (CRC Press, 2012).
57. Rackwitz, R. & Fiessler, B. Structural reliability under combined random load sequences. Comput. Struct. 9(5), 489–494. https://doi.org/10.1016/0045-7949(78)90046-9 (1978).
58. Soong, T. T. & Grigoriu, M. Random vibration of mechanical and structural systems. NASA STI/Recon Tech. Rep. A 93, 14690
(1993).
59. Nowak, A. S. Calibration of LRFD bridge code. J. Struct. Eng. 121(8), 1245–1251 (1995).

Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions
K.M. is responsible for material preparation, data collection, analysis and preparing the figures.

Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Competing interests
The author declares no competing interests.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-024-64386-w.
Correspondence and requests for materials should be addressed to K.M.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2024
