NorSand4AI - A Comprehensive Triaxial Test Simulation Database For NS Model
Abstract. In soil sciences, parametric models known as constitutive models (e.g., the Modified Cam Clay and the NorSand) are used to represent the behavior of natural and artificial materials. In contexts where liquefaction may occur, the NorSand constitutive model has been extensively applied by both industry and academia due to its relatively simple critical state formulation and low number of input parameters. Despite its suitability as a modeling framework to assess static liquefaction, the NorSand model is still based on premises which may not perfectly represent the behavior of all soil types. In this context, the creation of data-driven and physically informed metamodels emerges. The literature suggests that data-driven models should initially be developed using synthetic datasets to establish a general framework, which can later be applied to experimental datasets to enhance the model's robustness and aid in discovering potential mechanisms of soil behavior. Therefore, creating large and reliable synthetic datasets is a crucial step in constructing data-driven constitutive models. Here the NorSand model comes in handy: by using NorSand simulations as the training dataset, data-driven constitutive metamodels can then be fine-tuned using real test results. The models created that way will combine the power of NorSand with the flexibility provided by data-driven approaches, enhancing the modeling capabilities for liquefaction. Therefore, for a material following the NorSand model, this paper presents a first-of-its-kind database that addresses the size and complexity issues of creating synthetic datasets for nonlinear constitutive modeling of soils by simulating both drained and undrained triaxial tests. Two datasets are provided: the first one considers a nested Latin hypercube sampling of input parameters encompassing 2000 soil types, each subjected to 40 initial test configurations, resulting in a total of 160 000 triaxial test results. The second one considers nested quasi-Monte Carlo sampling techniques (Sobol and Halton) of input parameters encompassing 2048 soil types, each subjected to 42 initial test configurations, resulting in a total of 172 032 triaxial test results. By using the quasi-Monte Carlo dataset and 49 of its subsamples, it is shown that the dataset of 2000 soil types and 40 initial test configurations is sufficient to represent the general behavior of the NorSand model. In this process, four machine learning algorithms (Ridge Regressor, KNeighbors Regressor and two variants of the Ridge Regressor which incorporate nonlinear Nystroem kernel mappings of the input and output values) were trained to predict the constitutive and test parameters based solely on the triaxial test results. These algorithms achieved 13.91 % and 16.18 % mean absolute percentage errors among all 14 predicted parameters for undrained and drained triaxial test inputs, respectively. As a secondary outcome, this work introduces a Python script that links the established Visual Basic implementation of NorSand to the Python environment. This enables researchers to leverage the comprehensive capabilities of Python packages in their analyses related to this constitutive model.

1 Introduction

In situations where liquefaction is a potential concern, geotechnical engineers and soil scientists seek suitable modeling frameworks to accurately evaluate and mitigate associated risks. One specific scenario highlighting this need is
the case of filtered tailing piles. These piles pose significant geotechnical risks related to liquefaction, requiring thorough assessment through appropriate constitutive modeling. Factors such as the height and speed of stacking play crucial roles in creating vulnerable regions within the pile susceptible to liquefaction. The existence of a liquefaction trigger, particularly in undrained loading conditions, has the potential to result in the structural collapse of the pile.

In this scenario, the NorSand constitutive model emerges as a suitable alternative for liquefaction modeling due to its relatively simple critical state formulation and low number of input parameters. This model is a generalized critical state model based on the state parameter ψ, as defined by Jefferies (1993):

$$\psi = e - e_c \qquad (1)$$

where e is the current void ratio and e_c is the void ratio at the critical state. The NorSand model emulates natural soil behavior by incorporating associated plasticity and limited hardening, which enables dilation similar to that observed in real soils. This limited hardening causes yielding during unloading conditions and provides second-order detail in replicating observed soil behavior (Silva et al., 2022; Jefferies and Been, 2015).

Despite its suitability as a modeling framework to assess static liquefaction (Sternik, 2015), the NorSand model is still based on premises which may not perfectly represent the behavior of all soil types. Also, only recently has the NorSand model been implemented in commercial finite element software (Rocscience, 2022; Itasca Consulting Group, 2023; Bentley, 2022). Besides, regarding open-source distributions, only the Visual Basic (VBA) implementation presented by Jefferies and Been (2015) is available. It is precisely in this context that the creation of data-driven and physically informed metamodels emerges. These metamodels, when based on artificial intelligence techniques, especially machine learning (ML) and deep learning (DL), may be able to provide accurate and computationally cheap models, allowing them to link complex computational models with real-time data collection and monitoring. Such methods need to be trained on large-scale datasets, and this is where the NorSand model comes in handy: by using NorSand simulations as the training dataset, data-driven constitutive metamodels can then be fine-tuned using real test results. These models will combine the power of NorSand with the flexibility provided by data-driven approaches, enhancing the modeling capabilities for liquefaction.

Thus, the current paper aims to address three main issues: the quantity and the complexity of synthetic datasets for nonlinear constitutive modeling of soils, and the availability of open-source implementations of the NorSand constitutive model. The first two aspects are addressed by simulating both drained and undrained triaxial tests. Two datasets are provided: the first one will be used to study how large a given dataset must be in order to accurately capture the behavior of a NorSand material, while the second one, completely different from the first dataset, will be an out-of-sample testing dataset used to perform the sample size validations mentioned. A byproduct of such sample size validation will be the training of different machine learning algorithms to perform the following learning task: obtain the input parameters of the NorSand model solely from the results of triaxial tests. Different sampling techniques will be used to produce the datasets mentioned, such as nested Latin hypercube and quasi-Monte Carlo sampling of input parameters. Then, the third aspect is considered by presenting an implementation which connects the well-known VBA implementation to the Python environment. We will use the VBA code as the “processing kernel” of our Python implementation, taking advantage of the years of tests and validation of the algorithm provided by Jefferies and Been (2015). This new Python code allows other researchers to use the full power of Python packages during their analyses involving NorSand.

The paper is structured as follows: Sect. 2 presents the general concepts of data-driven metamodels, with special emphasis given to soil constitutive modeling. Then, Sect. 3 introduces the NorSand model. Section 4 presents the methods considered in this study. Section 5 describes the associated data records, while Sect. 6 presents technical validation of the results. Section 7 presents some usage notes and codes considered in the paper. Finally, Sect. 8 presents the conclusions.

2 Data-driven metamodels

Montáns et al. (2019) emphasize that human learning involves observing and experiencing the world, collecting data and identifying patterns through repeated experiments. Scientific discovery involves formalizing these patterns and relationships into laws and equations, transforming data into properties and variables, and converting observations into events. Although laws and equations aid learning, the classical learning process in science is often slow and expensive, requiring extensive observation and experimentation to understand the main variables and their impact on the phenomenon. Data-driven procedures, on the other hand, seek, if possible, an implicitly unbiased approach to our learning experience based on raw data from actual or synthetic observations. These procedures have the added advantage of testing correlations between different variables and observations, learning unanticipated patterns in nature and allowing us to discover new scientific laws or even make predictions without the availability of such laws.

The recent rapid increase in the availability of measurement data from physical systems as well as from massive numerical simulations has stimulated the development of many data-driven methods for modeling and predicting dynamics. At the forefront of data-driven methods are deep neural networks (DNNs). DNNs not only achieve superior performance for tasks such as image classification, but have also proven effective for future-state prediction of dynamical systems (Haghighat et al., 2021). A key limitation of DNNs and similar data-based methods is the lack of interpretability of the resulting model: they are focused on prediction and do not provide governing equations or clearly interpretable models in terms of the original set of variables. An alternative data-based approach uses symbolic regression to directly identify the structure of a nonlinear dynamical system from data (Schmidt and Lipson, 2009). This works remarkably well for discovering interpretable physical models, but symbolic regression is computationally expensive and can be difficult to scale to large problems (Montáns et al., 2019).

2.1 Data-driven constitutive modeling

Creating metamodels from neural networks (NNs) generally requires a priori calibration of the algorithms from data considered to be representative of material behavior (He et al., 2021). For example, NNs have been applied to model a variety of materials, including concrete materials (Ghaboussi et al., 1991), hyperelastic materials (Shen et al., 2005), viscoplastic steel material (Furukawa and Yagawa, 1998) and homogenized properties of mixed structures (Lefik and Schrefler, 2003). Once calibrated, NN-based constitutive models have been integrated into finite element codes to predict path- or rate-dependent material behaviors (Lefik and Schrefler, 2003; Hashash et al., 2004; Jung and Ghaboussi, 2006; Stoffel et al., 2019).

Recently, DNNs with special mechanistic architectures, such as recurrent neural networks (RNNs), have been applied to path-dependent materials (Wang and Sun, 2018; Mozaffar et al., 2019; Heider et al., 2020). This type of approach has found significant application in a wide range of engineering fields, as reinforced by He et al. (2021) when they argue that data-driven computation with physical constraints is an emerging computational paradigm that allows the simulation of complex materials directly based on the materials database and disregards the classical constitutive model construction.

To develop a data-driven constitutive model, a substantial and reliable dataset is necessary. However, obtaining a sufficiently large dataset for soil science can be challenging since experimental data are often limited and inadequate for training ML and DL algorithms. Generating synthetic data using a theoretical function can be a useful alternative, as it allows for the creation of an unlimited supply of data (Zhang et al., 2021a).

The literature suggests that data-driven models should initially be developed using synthetic datasets to establish a general framework, which can later be applied to experimental datasets to enhance the model's robustness and aid in discovering potential mechanisms of soil behavior (Zhang et al., 2021a). By calibrating constitutive models on synthetic datasets, the impact of experimental and measurement errors on the mapping ability of machine learning algorithms can be eliminated (Zhang et al., 2020). Therefore, creating large and reliable synthetic datasets is a crucial step in constructing data-driven constitutive models.

2.2 Data-driven soil constitutive models

Currently, there is a lack of robust and high-volume datasets in the literature for soil modeling tasks. One effective method to generate synthetic datasets is through numerical simulations performed on digital soil models. Typically, these simulations involve selecting a parametric constitutive model, sampling some parameters and running simulations that mimic real-world test setups. In soil modeling, triaxial tests are commonly simulated using conventional physics-driven constitutive models, such as the simple monotonic Kondner's expression (Basheer, 2000), or more advanced models like the Modified Cam Clay (MCC) (Fu et al., 2007; Zhang et al., 2023).

In particular, a simple sand shear constitutive model was used to generate synthetic datasets in the work of Zhang et al. (2021b). A total of 14 curves were generated to develop the ML-based constitutive model (9 curves for training and 5 curves for testing).

On the other hand, the MCC constitutive model was utilized to produce a benchmark stress–strain dataset of a virtual soil in the work of Zhang et al. (2023). In that study, a total of 250 soil types were considered, with 125 being part of the training dataset and the remaining 125 in the testing dataset. Considering all the initial states in the paper by Zhang et al. (2023), 1125 sets of stress–strain samples were employed as the training dataset, while 1250 sets of stress–strain samples constituted the testing dataset.

The MCC model has been a fundamental element in numerous complex models developed in recent times (Yao et al., 2008). However, this model and its variations are not well suited for depicting the behavior of actual sands due to their insufficient representation of key features such as yielding and dilation. This is because these models assume that soils denser than the critical state line are overconsolidated, resulting in unrealistically high stiffness and exaggerated strength (Woudstra, 2021). As indicated in the Introduction, the NorSand constitutive model presents clear advantages over the MCC model and is therefore described in detail in the next section.

3 NorSand

The NorSand constitutive model is a comprehensive critical state model that effectively accounts for the impact of void ratio on soil behavior, providing a robust framework for modeling static liquefaction in engineering applications. A distinctive characteristic of soils is that their void ratios or relative densities influence their mechanical properties. In this
regard, NorSand, as a constitutive model, aptly elucidates changes in soil behavior resulting from variations in void ratio (Jefferies and Been, 2015).

Within the Critical State Soil Mechanics (CSSM) framework, NorSand aligns with widely used models like the Original Cam Clay (OCC; Schofield and Wroth, 1968) and the MCC (Roscoe and Burland, 1968). In fact, the NorSand and OCC yield surfaces have the same shapes and the same flow rules. CSSM is founded on two principles: (1) the presence of a unique failure locus known as the critical state locus (CSL) and (2) the assertion that shear strain guides soil toward the CSL.

The primary limitation of MCC, especially when applied to sands, lies in its inability to capture the dilation behavior observed in dense sands. Moreover, it proves inadequate in predicting the behavior of loose sands and is unsuitable for addressing liquefaction-related issues. NorSand's key advantage lies in its incorporation of a state parameter, representing the difference between the current void ratio of the soil and its critical state. This approach uniquely relates soil dilation or compaction to the state parameter (Rocscience, 2022).

NorSand stands out for its ease of use, particularly for practical geotechnical engineers. It relies on a minimal set of material properties, conveniently measurable through standard laboratory tests. The model effectively captures a wide range of soil behaviors influenced by varying density and confining stress. The key additional parameter, beyond what is necessary for defining an MCC model, is the state parameter. In situations where precision in representing volume change is crucial, the added effort required for parameter determination is more than justified.

Developed initially for sands based on observations in large-scale hydraulic fills such as tailing dams, NorSand's applicability extends to any soil where particle-to-particle interactions are controlled by contact forces and slips, rather than cohesive bonds. Present applications of NorSand span a range from well-graded tills to sands and clayey silts (Jefferies and Been, 2015).

The input parameters of the NorSand model are presented in Table 1, where the meaning of each parameter is also given in the column “Description”. The sampling ranges presented will be discussed in the next section, as they are not intrinsic to the NorSand model.

Table 1. Input values for NorSand model also used as inputs for the NorSandTXL VBA routine (Jefferies and Been, 2015).

Soil properties
Parameter class   Parameter       Sampling range     Unit         Description
CSL parameters    Γ|p′=1 kPa      [0.9, 1.4]         –            CSL void ratio at p′ = 1 kPa
                  λ               [0.01, 0.07]       (ln kPa)−1   Slope of CSL defined on base e
                  Mtc             [1.2, 1.5]         –            Critical friction ratio, with triaxial compression as a reference condition
Plasticity        N               [0.2, 0.5]         –            Volumetric coupling parameter
                  χtc             [2, 5]             –            Relates minimum dilatancy to corresponding ψ, with triaxial as a reference condition
                  H0              [75, 500]          –            H is the loading plastic hardening modulus, such that H = H0 + Hψ ψ
                  Hψ              [200, 500]         –            (see H0)
Elasticity        Gmax|p′0        [30, 100]          MPa          Shear modulus at p′ = p′0
                  Gexp            [0.1, 0.6]         –            Exponent of nonlinear shear modulus change with stress, Gmax = Gmax|p′0 (p′/p′0)^Gexp
                  ν               [0.1, 0.3]         –            Poisson's ratio

Initial soil state
Parameter class   Parameter       Sampling range     Unit         Description
Stress and        ψ0              [−0.2, ψmax/5]     –            Initial critical state parameter, where ψmax = Mtc/(χtc(1 + N))
deformability     p′0             [50, 1000]         kPa          Initial mean effective stress
                  K0              [0.8, 1.2]         –            Geostatic stress ratio
                  OCR (“R”)       [0.5, 3]           –            Overconsolidation ratio
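As a rough illustration of how the soil-property ranges of Table 1 could be sampled, the sketch below draws a plain Latin hypercube with SciPy and scales it to those ranges. It is only a sketch under stated assumptions: the dictionary keys, the function names and the seed are hypothetical, and the nesting with the test conditions used to build the published datasets is not reproduced here. It also evaluates the soil-dependent upper limit ψmax/5 of the initial state parameter.

```python
import numpy as np
from scipy.stats import qmc

# Sampling ranges of the ten soil properties, copied from Table 1.
# The dictionary keys are illustrative names chosen for this sketch.
RANGES = {
    "Gamma": (0.9, 1.4),
    "lambda": (0.01, 0.07),
    "Mtc": (1.2, 1.5),
    "N": (0.2, 0.5),
    "chi_tc": (2.0, 5.0),
    "H0": (75.0, 500.0),
    "H_psi": (200.0, 500.0),
    "Gmax_p0": (30.0, 100.0),  # MPa
    "G_exp": (0.1, 0.6),
    "nu": (0.1, 0.3),
}


def sample_soils(n_soils, seed=0):
    """Latin hypercube sample of the soil properties, scaled to the Table 1 ranges."""
    names = list(RANGES)
    lower = [RANGES[name][0] for name in names]
    upper = [RANGES[name][1] for name in names]
    unit = qmc.LatinHypercube(d=len(names), seed=seed).random(n_soils)
    scaled = qmc.scale(unit, lower, upper)
    return {name: scaled[:, j] for j, name in enumerate(names)}


def psi0_upper_bound(soils):
    """Upper limit of the initial state parameter: psi_max / 5 (Table 1)."""
    psi_max = soils["Mtc"] / (soils["chi_tc"] * (1.0 + soils["N"]))
    return psi_max / 5.0


soils = sample_soils(2000)
psi0_hi = psi0_upper_bound(soils)
print(psi0_hi.min(), psi0_hi.max())
```

Because the upper limit of ψ0 depends on Mtc, χtc and N, the initial state cannot be sampled independently of the soil properties; in this sketch one would draw ψ0 for each soil between −0.2 and the corresponding entry of `psi0_hi`.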
Figure 1. Scatter plot illustrating how each space-filling technique works for particular pairs of constitutive and test-related parameters.
σ′a (kPa) and confining stress σ′r (kPa), a total of 10 entities are reported from the tests, which are ε1 (axial strain), εv (volumetric strain), p′ = (σ′a + 2σ′r)/3 (mean effective stress in kPa), q = σ′a − σ′r (deviatoric stress in kPa), e (void ratio), p_i/p′ (stress ratio), (p_i/p′)max (maximum stress ratio), ψ (state parameter), Dp (dilation) and η = q/p′. Thus, the dataset is a 4000 × 10 array, as presented in Table 2.

After the simulation is run, the results are saved in .h5 format files for postprocessing. The file extension .h5 is associated with the Hierarchical Data Format (HDF5) (The HDF Group, 1997-2023), which is a type of high-performance distributed file system. It is specifically designed to manage large and complex datasets efficiently and flexibly. Additionally, it enables a self-describing file format that is portable and supports parallel I/O for data compression (Lee et al., 2022), and has shown superior performance with high-dimensional and highly structured data (Nti-Addae et al., 2019). The literature indicates that HDF5 has been popular in scientific communities since the late 1990s (Lee et al., 2022), which is evident from the large number of open-source and commercial software packages for data visualization and analysis that can read and write HDF5 (The HDF Group, 2023). As a result, this is the data format chosen for the present paper.

4.2 Sample size validation

The samples generated using the methods in the previous subsection need to be sufficiently large in order to represent the general behavior of the NorSand model. The best way to show that the sample size is sufficient is to study how a model calibrated (or trained) on a given dataset performs. So, we chose the most direct (and actually most important) learning task one could face while working with the datasets generated: back-calculation of the constitutive parameters of the model based solely on the triaxial test results. In short, from the triaxial tests we will learn the values of the parameters which govern the behavior of the material.

Recall that a total of 14 parameters (10 constitutive and 4 related to test conditions) are used to generate the triaxial test results (4000 × 10 array where 4000 denotes the number of time steps of the loading process and 10 is the number of quantities monitored during the test), as presented in Table 2. Following the notation of the previous subsection, let In_i (shape 1 × 14) be the ith row of the In matrix, which contains the constitutive and test parameters, and let ttu_i and ttd_i be the results of the triaxial test under undrained and drained conditions, respectively (4000 × 10 arrays, each), obtained by using these parameters on the NorSandTXL routine.

We will consider the following learning problem: from a sample of input parameters In = In_{n,m}, which considers n different types of soil and m different test configurations (therefore with nm rows), we will use the ttu_i (or ttd_i), for i = 1, ..., nm, to learn the vectors of parameters In_i, for i = 1, ..., nm. We wish to investigate which values of n and m suffice to produce an accurate representation of the model. In order to do so, following standard learning tasks in a machine learning context, we need training, validation and testing data. It is worth noting that our methodology needs to be robust, so we do need the validation dataset because hyperparameter tuning will be performed.

The first dataset obtained by following the methods in Sect. 4.1 was generated by a Latin hypercube sampling (LHS) algorithm, which is known to provide low-discrepancy sequences of values (i.e., the samples are spread in the domain of the sampled variables). Despite being a powerful technique, LHS lacks one relevant property: sequences obtained by LHS are not extensible. To put it simply, being extensible means that a sample of size j contains the values of the sample of size k, j > k. Therefore, it would not be possible to subsample from our original sample In in order to build smaller datasets without losing the space-filling capability of the dataset, and we needed to consider another sampling scheme to perform our investigation.

We chose to combine two quasi-Monte Carlo low-discrepancy sequence generation techniques, i.e., Sobol (Sobol, 1967) and Halton (Halton, 1960), which are also extensible, to perform our tests. In that case, we generated a dataset with n = 2048 and m = 42 using Sobol sampling for the constitutive parameters (10 parameters) and Halton sampling for the experimental test condition variables (four variables) using the SciPy Python package (Virtanen et al., 2020). Both sequences have been scrambled (Owen and Rudolf, 2021) to improve their robustness for space filling. By using these parameters, we ran the NorSandTXL routine in the same manner as described in Sect. 4.1 and obtained the corresponding triaxial test results for both drained and undrained cases. Let us call this new dataset qIn_{2048,42}.

By using the extensibility property of the sequences considered, 49 subsamples were taken: qIn_{n,m} for n in [32, 64, 128, 256, 512, 1024, 2048] and m in [6, 12, 18, 24, 30, 36, 42]. One may see that powers of 2 were used as sample sizes for the Sobol sampling scheme, which is standard and derives from its implementation in scipy.stats. It is worth noting that, in general, none of the entries of In_{n,m} will be in qIn_{n,m}, which indicates that using qIn_{n,m} for training and validation, and In_{n,m} for testing, does not allow for any data “leakage”. Besides, there is a clear benefit in using In_{n,m} as a test set: all the models will be tested on the same dataset.

For the learning task considered, we used the scikit-learn Python package (Pedregosa et al., 2011) and chose four algorithms: Ridge Regressor, KNeighbors Regressor and two variants of the Ridge Regressor which incorporate nonlinear mappings of the input and output values. The first two algorithms mentioned belong to two different classes: linear and neighbors-based regressors. They were chosen to illustrate how different types of algorithms learn our chosen task. The variants of the Ridge Regressor were chosen to account for nonlinearities by using the kernel trick. Considering the high dimensionality of the input datasets, using traditional kernels is not computationally feasible, so we used Nystroem kernels (Yang et al., 2012), which approximate a kernel map using a subset of the training data. By combining Nystroem kernels and Ridge Regressors, we can map the inputs to a nonlinear feature space and then consider a linear regression on these features. This is a similar approach to the one considered to build support vector machine regressors, but with a slightly different regularization for the decision boundary.

We also considered mapping the output values (14 parameters, in our case) to the [0,1] range by combining the scikit-learn implementations of TransformedTargetRegressor and QuantileTransformer, which transforms the target values (outputs of the pipeline) to follow a uniform distribution. Therefore, for a given component, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers (Pedregosa et al., 2011). For
all the algorithms considered, we also used a QuantileTransformer to preprocess the input values.

Figure 2 presents the methodology proposed and applied to assess the quality of the sample size. In the present paper, the LHS-generated dataset with n_soils = 2000 and n_conditions = 40, whose input parameter matrix is In_{2000,40}, will have its sufficiency assessed.

Figure 2. Methodology used to assess the sufficiency of the dataset containing 2000 soil types and 40 test conditions to represent the general behavior of the NorSand model.

The workflow in Fig. 2 can be described, for n in [32, 64, 128, 256, 512, 1024, 2048] and m in [6, 12, 18, 24, 30, 36, 42], as follows:

– For each simulated triaxial test corresponding to the parameter matrix qIn_{n,m}, select only the columns corresponding to ε1, p′, q and e (axial strain, mean effective stress, deviatoric stress and void ratio, respectively), which are the variables commonly measured and reported. The other six columns are manipulations of these (Dp or η, for example) and could be used as alternative regression variables, but such selection is not the focus of the present paper. This reduced simulation dataset is of shape 4000 × 4.

– Each triaxial test simulation may have different start/end values for ε1, so it is important to “align” all the tests considered. By alignment we mean that all the tests will have measurements for the same values of ε1. This will enable us to use this variable as an index and, therefore, decrease the dimensionality of each triaxial test simulation from 4000 × 4 to 4000 × 3. (Each line will correspond to a single value of ε1.) We must select the smallest maximum value of ε1 across all simulations (which was found to be around 15.74 % for the datasets considered and is represented as the vertical line in Figs. 3 and 4).

– Down-sample the 4000 time steps to 40 by using evenly spaced values on a logarithmic scale (function logspace from Python package NumPy): more values in the beginning of the time steps, where more changes are observed. This process is illustrated in Figs. 3 and 4, where the downsampling is performed for 40 points logarithmically spaced between ε1 = 10^−3 % and 15.78 %. This reduces each simulated triaxial test corresponding to the parameter matrix qIn_{n,m} from 4000 × 10 to 40 × 3. The concatenation of all triaxial test results corresponding to the parameter matrix qIn_{n,m} shall be named qInN_{n,m} and is of size (nm, 40, 3).

– Perform a GroupKFold cross-validation scheme to find the best hyperparameters of an algorithm A using qInN_{n,m} as inputs and qIn_{n,m} as outputs. The loss function considered during the GroupKFold cross-validation is the mean absolute percentage error across all folds.

– Retrain the algorithm A using all qInN_{n,m} and qIn_{n,m} after fixing the hyperparameters as the optimal ones obtained during the cross-validation scheme.

– Test the trained algorithm A_t on In_{n_h,m_h}, where n_h and m_h are the hypothesized sufficient number of materials and test conditions, respectively.

– Obtain the mean absolute percentage error in the predictions of all the 14 input parameters corresponding to In_{n_h,m_h}.

– Get the overall mean error corresponding to all the input parameters.
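The nested parameter matrices qIn_{n,m} that drive the workflow above rely on the extensibility of the Sobol and Halton sequences. The following is a minimal sketch of that idea using `scipy.stats.qmc`, working on the unit hypercube only. The seeds, the variable names and, in particular, the choice of reusing the same Halton points for every soil are assumptions made for illustration; the text does not state how the two sequences were nested, and the scaling to the physical ranges is omitted.

```python
import numpy as np
from scipy.stats import qmc

N_SIZES = [32, 64, 128, 256, 512, 1024, 2048]   # numbers of soil types
M_SIZES = [6, 12, 18, 24, 30, 36, 42]           # numbers of test configurations

# Scrambled Sobol points for the 10 constitutive parameters (2**11 = 2048 soils)
# and scrambled Halton points for the 4 test-condition variables.
soil_unit = qmc.Sobol(d=10, scramble=True, seed=0).random_base2(m=11)
test_unit = qmc.Halton(d=4, scramble=True, seed=0).random(max(M_SIZES))


def subsample(n, m):
    """First n soils combined with the first m test conditions, in [0, 1)^14.

    Assumption of this sketch: every soil reuses the same m Halton points.
    Returns the (n*m, 14) matrix and the soil index of each row (the group).
    """
    soil_idx = np.repeat(np.arange(n), m)
    test_idx = np.tile(np.arange(m), n)
    unit_params = np.hstack([soil_unit[soil_idx], test_unit[test_idx]])
    return unit_params, soil_idx


grid = [(n, m) for n in N_SIZES for m in M_SIZES]   # the 49 subsample sizes
small, small_groups = subsample(32, 6)
full, full_groups = subsample(2048, 42)
print(len(grid), small.shape, full.shape)
```

Taking prefixes of the two sequences is what makes the smaller samples subsets of the larger ones; a Latin hypercube of size 2048 would not keep its space-filling property under the same truncation. The soil index returned with each matrix can later serve as the group label in a grouped cross-validation.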
Figure 3. Downsampling process from 4000 to 40 points in the logarithmic scale for drained tests.
Figure 4. Downsampling process from 4000 to 40 points in the logarithmic scale for undrained tests.
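The alignment and downsampling illustrated in Figs. 3 and 4 can be sketched with NumPy as below. This is a hedged illustration rather than the authors' code: the function name is hypothetical, the curves are synthetic stand-ins for simulated tests, and linear interpolation onto the common strain grid is an assumption, since the text only states that `logspace` is used to pick 40 points. The upper strain limit is left as an argument.

```python
import numpy as np


def downsample_test(eps1, responses, eps_min=1e-3, eps_max=15.74, n_points=40):
    """Resample one triaxial test onto a shared log-spaced axial strain grid.

    eps1      : (T,) increasing axial strain values, in %
    responses : (T, 3) columns p', q and e recorded at those strains
    Returns the (n_points,) strain grid and the (n_points, 3) reduced test.
    """
    grid = np.logspace(np.log10(eps_min), np.log10(eps_max), n_points)
    # Assumption of this sketch: values between recorded steps are obtained
    # by linear interpolation.
    columns = [np.interp(grid, eps1, responses[:, j])
               for j in range(responses.shape[1])]
    return grid, np.column_stack(columns)


# Synthetic curves standing in for a simulated test with 4000 time steps.
eps1 = np.linspace(0.0, 18.0, 4000)
curves = np.column_stack([
    100.0 + 5.0 * eps1,      # stand-in for p'
    50.0 * np.tanh(eps1),    # stand-in for q
    0.8 - 0.01 * eps1,       # stand-in for e
])
grid, reduced = downsample_test(eps1, curves)
print(reduced.shape)
```

Since every test is resampled at the same strain values, the strain column can be dropped and used as an index, which is the 4000 × 4 to 40 × 3 reduction described in the workflow. The logarithmic spacing concentrates points at small strains, where the response changes fastest.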
As described, for training and validation, we considered a GroupKFold cross-validation technique, which is a K-fold iterator variant with non-overlapping groups (Pedregosa et al., 2011). This approach makes sure no material (group) is present in both the training and validation sets, which would lead to data “leakage”.

A Bayesian optimization was performed to look for the best hyperparameters using the cross-validation folds generated. This process was carried out using the HyperOpt Python package (Bergstra et al., 2015), which considers tree-structured Parzen estimators. The search spaces for the Ridge and KNeighbors Regressors are the ones considered in the HyperOpt-Sklearn Python package (Komer et al., 2014). For the Nystroem kernel, a custom search space was defined and consisted of the following: “gamma” parameter uniformly on [0,1], “n_components” parameter as a random equi-probable choice among [600, 1200, 1800], “kernel” parameter as a random equi-probable choice among [“additive_chi2”, “chi2”, “cosine”, “linear”, “poly”, “polynomial”, “rbf”, “laplacian”, “sigmoid”], “degree” parameter as the integer value truncation of a uniform random variable on [1, 10] and “coef0” parameter uniformly on [0,1].

Finally, after the best hyperparameters are found, they are fixed and the algorithm A is retrained with the full dataset qInN_{n,m}. This calibrated version is then used to test the quality of the model on the triaxial test results corresponding to the dataset In_{n_h,m_h}. Then, the errors obtained for each model are plotted and analyzed. The reader can find the complete codes used to implement the steps above in Ozelim et al. (2023b).
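To make the grouped validation concrete, the sketch below assembles a Ridge-KT-like model (QuantileTransformer on the inputs, Nystroem features, Ridge regression and a QuantileTransformer on the targets) with scikit-learn and scores it with GroupKFold and the mean absolute percentage error. It is a simplified illustration under assumptions: the data are random stand-ins for the flattened 40 × 3 curves and the 14 parameters, the hyperparameters are fixed by hand instead of being tuned with HyperOpt, and `n_components` is far smaller than the values in the search space above so that the toy example runs quickly.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in data: 32 soils x 6 test conditions.
rng = np.random.default_rng(0)
n_soils, n_tests = 32, 6
X = rng.random((n_soils * n_tests, 40 * 3))       # flattened (40, 3) curves
y = 0.5 + rng.random((n_soils * n_tests, 14))     # 14 parameters, away from zero
groups = np.repeat(np.arange(n_soils), n_tests)   # one group per soil type


def make_ridge_kt(gamma=0.5, n_components=60, alpha=1.0):
    """Nystroem + Ridge with quantile transforms on inputs and targets."""
    regressor = make_pipeline(
        QuantileTransformer(n_quantiles=100, output_distribution="uniform"),
        Nystroem(kernel="rbf", gamma=gamma, n_components=n_components,
                 random_state=0),
        Ridge(alpha=alpha),
    )
    return TransformedTargetRegressor(
        regressor=regressor,
        transformer=QuantileTransformer(n_quantiles=100,
                                        output_distribution="uniform"),
    )


def grouped_cv_mape(model_factory, X, y, groups, n_splits=4):
    """Mean MAPE over GroupKFold folds; no soil appears on both sides of a split."""
    scores = []
    for train, valid in GroupKFold(n_splits=n_splits).split(X, y, groups):
        assert not set(groups[train]) & set(groups[valid])
        model = model_factory().fit(X[train], y[train])
        scores.append(
            mean_absolute_percentage_error(y[valid], model.predict(X[valid])))
    return float(np.mean(scores))


cv_mape = grouped_cv_mape(make_ridge_kt, X, y, groups)
print(cv_mape)
```

In a hyperparameter search, `grouped_cv_mape` would play the role of the objective evaluated for each candidate configuration, after which the selected model is refit on the full training data.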
Table 3. Attributes of the NorSandTXL dataset present in each Par_X_Y.h5 file.
Attribute      Parameter/value
“Gamma”        Γ|p′=1 kPa
“lambda”       λ
“Mtc”          Mtc
“N”            N
“Xtc”          χtc
“H0”           H0
“Hy”           Hψ
“Gmax_p0”      Gmax|p′0
“G_exp”        Gexp
“n”            ν
“Psi_0”        ψ0
“p0”           p′0
“K0”           K0
“OCR”          OCR (“R”)
“Type”         Drained or Undrained
5 Data records
In the present paper, it is shown that the LHS-generated dataset with nsoils = 2000 and nconditions = 40 is a sufficient dataset. Thus, the folder containing such a dataset can be found in Ozelim et al. (2023a) and has the following structure:
NorSandTXL_H5\Simus\TT\Par_X_Y.h5
where TT stands for the test type (Drained or Undrained), X is the material index (from 0 to 1999) and Y is the sequential index for the input parameters (from 0 to 79999).
Each Par_X_Y.h5 file contains a dataset named NorSandTXL which includes the simulation results as presented in Table 2. It is worth noting that the values stored are of the type float32, which is sufficient for the applications envisioned for the dataset. In addition to the simulation results, the dataset also contains the attributes shown in Table 3. The correspondence between the attributes, whose data type is either float32 or <U7 (fixed-length character string of seven Unicode characters), and NorSandTXL input parameters is also presented in Table 3. It is easy to see that the dataset attributes in each file allow for a complete reproduction of the results, if desired. The units of the parameters are consistent with NorSandTXL, as presented in Table 1.
In order to prove the sufficiency of In2000,40, we generated the dataset qIn2048,42 following the methods previously presented. This latter dataset is also available in Ozelim et al. (2023a) with a similar folder structure. In that case, the upper-level folder is named NorSand_2048_42. It is worth noting that, due to upload difficulties, NorSand_2048_42 was split as NorSand_2048_42_Drained and NorSand_2048_42_Undrained, where each file contains the simulations for drained and undrained scenarios, respectively.
6 Technical validation
Considering that the engine running the triaxial test simulations is the Excel spreadsheet presented in the book by Jefferies and Been (2015) and that such a spreadsheet has been extensively validated by both academia and industry, there is no need to discuss the technical quality of the dataset. On the other hand, it is necessary to show that In2000,40 suffices to cover the general behavior of the NorSand models.
By following the methods previously described and plotting the mean absolute percentage error (MAPE) result of the 49 models (each trained and validated with samples of different sizes subsampled from qIn2048,42), Figs. 5 and 6 were obtained for drained and undrained conditions, respectively. The four algorithms considered were Ridge, KNeighbors, Ridge-K (with nonlinear kernel on inputs) and Ridge-KT (with nonlinear kernel on inputs and also QuantileTransformer on the outputs). It is clear in the figures that, for contours of 0.5 % gains in MAPE, the sample size of 2000×40 is actually more than enough for the learning task considered. This can be stated by noticing that the contours with lower error encompass samples with an exponential range of sizes. (The x axis is in log scale.) This indicates a really small gradient on the error in the n × m space, implying a good sample size. This happens for all four algorithms, indicating that not only linear and neighbors-based regressors have reached their maximum ability to learn, but also the nonlinear variants considered. It can be seen that the two nonlinear transformations applied (to inputs and to both inputs and outputs) present similar behavior, although with considerably smaller MAPEs.
Analysis of Figs. 5 and 6 indicates that for the learning task hereby considered, undrained tests generally presented a better performance when compared with drained tests. A possible cause for such behavior is that during undrained tests the void ratio is kept constant. Thus, for the learning task considered, the algorithm does not need to perform any nonlinear operations on one-third of the input dataset (which consists of e, p and q for 40 values of ε1). So, with the same number of training samples and analytical structure of the learning algorithm, it is expected that fewer nonlinearities in the inputs would result in a better performance (smaller errors) of the predicted outputs.
Due to the space-filling qualities of both In2000,40 and qIn2048,42, qIn2048,42 can also be considered a sufficient dataset to represent the NorSand model.
Figure 5. Mean absolute percentage error for all 14 parameters after being back-calculated solely from drained triaxial test results.
Figure 6. Mean absolute percentage error for all 14 parameters after being back-calculated solely from undrained triaxial test results.
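As a reminder of what the error metric in Figs. 5 and 6 measures, the sketch below computes a mean absolute percentage error for a handful of paired values. It is a generic, standard-library illustration of the MAPE definition and not the code used in the paper; the function name mape and the numerical values are illustrative assumptions, and how the errors of the 14 parameters were aggregated in the published notebooks is not reproduced here.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error (in percent) over paired values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for true_value, predicted in zip(y_true, y_pred):
        if true_value == 0:
            raise ValueError("MAPE is undefined when a true value is zero")
        total += abs((true_value - predicted) / true_value)
    return 100.0 * total / len(y_true)


# Illustrative values only: each prediction is off by 10 % of the true value.
true_vals = [0.05, 1.30, 300.0]
pred_vals = [0.045, 1.43, 270.0]
print(round(mape(true_vals, pred_vals), 2))
```

Because the error is normalized by the true value, parameters of very different magnitudes contribute on a comparable scale, which is convenient when averaging over several constitutive and test parameters.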
Figure 8. Scatter plots of true and predicted values for ν obtained by the best performing algorithm (Ridge-KT) with the 2048 × 42 training
dataset for both drained and undrained tests.
Figure 9. Scatter plots of true and predicted values for χtc obtained by the best performing algorithm (Ridge-KT) with the 2048×42 training
dataset for both drained and undrained tests.
Figure 10. Scatter plots of true and predicted values for Hψ obtained by the best performing algorithm (Ridge-KT) with the 2048×42 training
dataset for both drained and undrained tests.
Figure 11. Scatter plots of true and predicted values for Γ obtained by the best performing algorithm (Ridge-KT) with the 2048 × 42 training
dataset for both drained and undrained tests.
Figure 12. Scatter plots of true and predicted values for OCR obtained by the best performing algorithm (Ridge-KT) with the 2048 ×
42 training dataset for both drained and undrained tests.
Figure 13. Scatter plots of true and predicted values for Gexp obtained by the best performing algorithm (Ridge-KT) with the 2048 ×
42 training dataset for both drained and undrained tests.
Figure 14. Scatter plots of true and predicted values for Gmax,p′0 obtained by the best performing algorithm (Ridge-KT) with the 2048 ×
42 training dataset for both drained and undrained tests.
Figure 15. Scatter plots of true and predicted values for λ obtained by the best performing algorithm (Ridge-KT) with the 2048 × 42 training
dataset for both drained and undrained tests.
Figure 16. Scatter plots of true and predicted values for H0 obtained by the best performing algorithm (Ridge-KT) with the 2048×42 training
dataset for both drained and undrained tests.
Figure 17. Drained mean absolute percentage errors obtained for each parameter by the best performing algorithm (Ridge-KT) with training datasets of different size.
Figure 18. Undrained mean absolute percentage errors obtained for each parameter by the best performing algorithm (Ridge-KT) with training datasets of different size.
7.1 Simply run NorSand in Python
If one seeks to simply run NorSand in Python, the function run_NorSand presented in Listing 3 can be used. Its inputs are
– final_comp: input parameters as a NumPy array of shape (1,14). The parameters need to be inserted in the same order as dictpos.keys(), i.e., [“Gamma”, “lambda”, “Mtc”, “N”, “Xtc”, “H0”, “Hy”, “Gmax_p0”, “G_exp”, “nu”, “Psi_0”, “p0”, “K0”, “OCR”];
– dictpos: dictionary to locate the parameters inside the spreadsheet;
– path_root: path of the spreadsheet “NorTxl.xlsm”, obtained at http://www.crcpress.com/product/isbn/9781482213683 (last access: 8 February 2024);
– type_v: type of the simulation (either “Drained” or “Undrained”).
This function outputs two entities: a dictionary containing the parameters inserted to run the simulation and a 4000 × 10 pandas dataframe with simulation results (which are located within the “Txl SimResults” tab of the xlsm file). The columns are the ones presented in Table 3.
7.2 Generate and save files
To generate the LHS inputs for the NorSandTXL spreadsheet, considering n_samples soil types and n_samples_2 initial test conditions, the function gen_NorSand_par_2, presented in Listing 4, was considered.
The quasi-Monte Carlo sampling schemes (Sobol and Halton) can be used to generate the input samples by means of the gen_NorSand_par_LD function, presented in Listing 5.
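For a single simulation, the (1,14) layout expected for final_comp in Sect. 7.1 can be assembled from a dictionary as sketched below. This is a minimal sketch: the helper build_final_comp is hypothetical (it is not part of the published listings) and the numerical values are illustrative placeholders rather than calibrated soil parameters.

```python
import numpy as np

# Parameter order stated in Sect. 7.1 (same order as dictpos.keys()).
PARAM_ORDER = ["Gamma", "lambda", "Mtc", "N", "Xtc", "H0", "Hy",
               "Gmax_p0", "G_exp", "nu", "Psi_0", "p0", "K0", "OCR"]


def build_final_comp(params):
    """Return a (1, 14) array with the values arranged in PARAM_ORDER.

    Hypothetical helper, written only to illustrate the expected layout.
    """
    missing = [key for key in PARAM_ORDER if key not in params]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return np.array([[float(params[key]) for key in PARAM_ORDER]])


# Illustrative placeholder values only (not taken from the paper).
example_params = {
    "Gamma": 0.9, "lambda": 0.05, "Mtc": 1.3, "N": 0.3, "Xtc": 4.0,
    "H0": 100.0, "Hy": 0.0, "Gmax_p0": 50.0, "G_exp": 0.5, "nu": 0.2,
    "Psi_0": -0.05, "p0": 200.0, "K0": 0.7, "OCR": 1.0,
}
final_comp = build_final_comp(example_params)
print(final_comp.shape)
```

Building the array from named entries avoids silently swapping two parameters, since the spreadsheet receives the values purely by position.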
Furthermore, to run the NorSandTXL Excel spreadsheet located in path_xlsm for all the input parameters previously obtained as final_comp = gen_NorSand_par_2(dict_ranges_material, dict_ranges_test, n_samples, n_samples_2) (or final_comp = gen_NorSand_par_LD(dict_ranges_material, dict_ranges_test, n_samples, n_samples_2) for the quasi-Monte Carlo sampling of inputs), the function run_NorSand_simus_P can be run. This function is presented in Listing 6.
The function run_NorSand_simus_P runs the simulation and also saves the results as .h5 files in the same folder as the Excel spreadsheet. In this case, the new files are saved following the naming convention and folder structure discussed in the paper.
It is worth noting that for the LHS sampling with 2000 soil types and 40 test conditions, two values of sampled ψ0 needed to be reduced due to instabilities in the VBA code calculations. These values were
– final_comp[19572][10]=0.085 and
– final_comp[10929][10]=0.082.
Furthermore, for the quasi-Monte Carlo sampling with 2048 soil types and 42 test conditions, five values of sampled ψ0 needed to be reduced for the same reason. These values were
– final_comp[56382][10]=0.0849,
– final_comp[57476][10]=0.0766,
– final_comp[85371][10]=0.0955,
– final_comp[34971][10]=0.08 and
– final_comp[41245][10]=0.072.
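The manual ψ0 adjustments listed above can be applied programmatically before running the simulations. The sketch below is illustrative only: the helper apply_psi0_overrides is hypothetical, the stand-in final_comp is filled with zeros instead of sampled inputs, and it assumes (consistently with the row indices quoted above) that final_comp holds one row per soil type and test condition pair, with ψ0 stored in column 10.

```python
PSI0_COL = 10  # position of "Psi_0" in the input ordering of Sect. 7.1

# Row index -> reduced psi_0 value, copied from the lists above.
LHS_OVERRIDES = {19572: 0.085, 10929: 0.082}
QMC_OVERRIDES = {56382: 0.0849, 57476: 0.0766, 85371: 0.0955,
                 34971: 0.08, 41245: 0.072}


def apply_psi0_overrides(final_comp, overrides):
    """Overwrite psi_0 in place for the listed rows (hypothetical helper)."""
    for row, value in overrides.items():
        if not 0 <= row < len(final_comp):
            raise IndexError(f"row {row} is outside the sampled inputs")
        final_comp[row][PSI0_COL] = value
    return final_comp


# Stand-in for the LHS inputs: 2000 soil types x 40 test conditions, 14 columns.
n_rows = 2000 * 40
final_comp = [[0.0] * 14 for _ in range(n_rows)]
apply_psi0_overrides(final_comp, LHS_OVERRIDES)
print(final_comp[19572][PSI0_COL], final_comp[10929][PSI0_COL])
```

Indexing with final_comp[row][10] works the same way for a list of lists and for a two-dimensional NumPy array, so the helper can be reused on the sampled arrays.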
All the codes previously presented are available as the Jupyter notebook Sample_and_Run.ipynb in Ozelim et al. (2023b).
7.3 Analyzing errors during learning tasks
As described in the Methods section, we perform a sample size validation. Considering that the codes for such validation are lengthy, they are presented in Ozelim et al. (2023b). The Jupyter notebook Sample_size_validation.ipynb is fully commented to illustrate its usage.
8 Conclusions
Obtaining massive datasets for modeling the behavior of soils is of great interest, not only because new artificial intelligence algorithms can be built, but also to assess the adequacy of newly proposed physically informed models. In the context of critical state approaches, the NorSand model has been shown to provide a good balance between complexity and accuracy. Also, this model is used to assess the liquefaction potential of soils, a major cause of recent large-scale disasters such as tailings dam failures.
In this study, two major issues were addressed. Firstly, the paper tackled the challenges associated with the quantity and complexity of synthetic datasets required for nonlinear constitutive modeling of soils. This was achieved by simulating both drained and undrained triaxial tests, resulting in two datasets. The first dataset involved a nested Latin hypercube sampling of input parameters, covering 2000 soil types with 40 initial test configurations for each, yielding a total of 160 000 triaxial test results. The second dataset employed a nested quasi-Monte Carlo sampling (Sobol and Halton) of input parameters, encompassing 2048 soil types with 42 initial test configurations for each, resulting in a total of 172 032 triaxial test results. Each simulation result was represented as a matrix of dimensions 4000 × 10. The study demonstrated that the dataset of 2000 soil types and 40 initial test configurations adequately captured the general behavior of the NorSand model.
Secondly, the paper addressed the issue of the availability of open-source implementations of the NorSand constitutive model. This was achieved by presenting an implementation that connects the well-established VBA implementation to the Python environment. The VBA code served as the “processing kernel” for the new Python implementation, leveraging the extensive testing and validation conducted by Jefferies and Been (2015). This integration allows researchers to harness the full capabilities of Python packages in their analyses involving the NorSand model.
A comprehensive database like the one provided is crucial for developing ML and artificial intelligence models of geotechnical materials. In particular, all geotechnical critical state models involve specific simplifications, with the most apparent being their reliance on “remolded” or disturbed soil properties. Understanding the consequences of such structural alterations, especially in terms of their impact on the apparent OCR, poses notable challenges. The effect on the stress ratio (ψ) remains unclear. Through the utilization of physics-informed machine learning and artificial intelligence algorithms, these uncertainties can be thoroughly investigated, uncovering patterns and hidden features within experimental data. We are confident that the results of the present paper are valuable assets in this quest for both academic and industrial communities. Furthermore, researchers interested in modeling sequential data, such as time series, could use this dataset for benchmarking purposes, as the highly nonlinear nature of the constitutive model poses a significant challenge to ML and DL techniques.
Code and data availability. All data associated with the current submission are available at https://doi.org/10.5281/zenodo.8170536 (Ozelim et al., 2023a). Any updates will also be published on Zenodo. The Python code used to generate the NorSand4AI dataset is described in the present paper and available at https://doi.org/10.5281/zenodo.10157831 (Ozelim et al., 2023b). The codes used for the learning task considered are also available at https://doi.org/10.5281/zenodo.10157831 (Ozelim et al., 2023b).
Author contributions. Conceptualization, methodology, software, validation and investigation: LCdSMO; formal analysis: LCdSMO, MDTC and ALBC; writing – original draft preparation: LCdSMO; writing – review and editing: MDTC and ALBC; supervision: MDTC; funding acquisition: LCdSMO. All authors read and approved the final version of the paper.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
Acknowledgements. The authors acknowledge support from the University of Brasilia.
Financial support. This research has been supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (grant no. 102414/2022-0) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (grant no. 001).
Review statement. This paper was edited by Le Yu and reviewed by Michael Jefferies and two anonymous referees.
References
Basheer, I. A.: Selection of Methodology for Neural Network Modeling of Constitutive Hystereses Behavior of Soils, Comput.-Aided Civ. Inf., 15, 445–463, https://doi.org/10.1111/0885-9507.00206, 2000.
Bentley: NorSand – PLAXIS UDSM – GeoStudio | PLAXIS Wiki – GeoStudio | PLAXIS – Bentley Communities – communities.bentley.com, https://communities.bentley.com/products/geotech-analysis/w/wiki/52850/norsand---plaxis-udsm (last access: 15 November 2023), 2022.
Bergstra, J., Komer, B., Eliasmith, C., Yamins, D., and Cox, D. D.: Hyperopt: a Python library for model selection and hyperparameter optimization, Computational Science & Discovery, 8, 014008, https://doi.org/10.1088/1749-4699/8/1/014008, 2015.
Feinberg, J. and Langtangen, H. P.: Chaospy: An open source tool for designing methods of uncertainty quantification, J. Comput. Sci., 11, 46–57, https://doi.org/10.1016/j.jocs.2015.08.008, 2015.
Fu, Q., Hashash, Y. M., Jung, S., and Ghaboussi, J.: Integration of laboratory testing and constitutive modeling of soils, Comput. Geotech., 34, 330–345, https://doi.org/10.1016/j.compgeo.2007.05.008, 2007.
Furukawa, T. and Yagawa, G.: Implicit constitutive modelling for viscoplasticity using neural networks, Int. J. Numer. Meth. Eng., 43, 195–219, https://doi.org/10.1002/(sici)1097-0207(19980930)43:2<195::aid-nme418>3.0.co;2-6, 1998.
Ghaboussi, J., Garrett, J. H., and Wu, X.: Knowledge-Based Modeling of Material Behavior with Neural Networks, J. Eng. Mech., 117, 132–153, https://doi.org/10.1061/(asce)0733-9399(1991)117:1(132), 1991.
Haghighat, E., Raissi, M., Moure, A., Gomez, H., and Juanes, R.: A physics-informed deep learning framework for inversion and surrogate modeling in solid mechanics, Comput. Method. Appl. M., 379, 113741, https://doi.org/10.1016/j.cma.2021.113741, 2021.
Halton, J. H.: On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals, Numerische Mathematik, 2, 84–90, https://doi.org/10.1007/bf01386213, 1960.
Hashash, Y. M. A., Jung, S., and Ghaboussi, J.: Numerical implementation of a neural network based material model in finite element analysis, Int. J. Numer. Meth. Eng., 59, 989–1005, https://doi.org/10.1002/nme.905, 2004.
He, X., He, Q., and Chen, J.-S.: Deep autoencoders for physics-constrained data-driven nonlinear materials modeling, Comput. Method. Appl. M., 385, 114034, https://doi.org/10.1016/j.cma.2021.114034, 2021.
Heider, Y., Wang, K., and Sun, W.: SO(3)-invariance of informed-graph-based deep neural network for anisotropic elastoplastic materials, Comput. Method. Appl. M., 363, 112875, https://doi.org/10.1016/j.cma.2020.112875, 2020.
Itasca Consulting Group, I.: NorSand Model; FLAC3D 7.0 documentation – docs.itascacg.com, https://docs.itascacg.com/flac3d700/common/models/norsand/doc/modelnorsand.html (last access: 15 November 2023), 2023.
Jefferies, M. and Been, K.: Soil Liquefaction: A Critical State Approach, Second Edition, CRC Press, 2nd edn., https://doi.org/10.1201/b19114, 2015.
Jefferies, M. G.: Nor-Sand: a simple critical state model for sand, Géotechnique, 43, 91–103, https://doi.org/10.1680/geot.1993.43.1.91, 1993.
Jefferies, M. G. and Shuttle, D. A.: Dilatancy in general Cambridge-type models, Géotechnique, 52, 625–638, https://doi.org/10.1680/geot.2002.52.9.625, 2002.
Jung, S. and Ghaboussi, J.: Neural network constitutive model for rate-dependent materials, Comput. Struct., 84, 955–963, https://doi.org/10.1016/j.compstruc.2006.02.015, 2006.
Komer, B., Bergstra, J., and Eliasmith, C.: Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn, in: Proc. of the 13th Python in Science Conf. (SCIPY 2014), 32–37, https://doi.org/10.25080/Majora-14bd3278-006, 2014.
Lee, S., Yuan Hou, K., Wang, K., Sehrish, S., Paterno, M., Kowalkowski, J., Koziol, Q., Ross, R. B., Agrawal, A., Choudhary, A., and Keng Liao, W.: A case study on parallel HDF5 dataset concatenation for high energy physics data analysis, Parallel Comput., 110, 102877, https://doi.org/10.1016/j.parco.2021.102877, 2022.
Lefik, M. and Schrefler, B.: Artificial neural network as an incremental non-linear constitutive model for a finite element code, Comput. Method. Appl. M., 192, 3265–3283, https://doi.org/10.1016/s0045-7825(03)00350-5, 2003.
Montáns, F. J., Chinesta, F., Gómez-Bombarelli, R., and Kutz, J. N.: Data-driven modeling and learning in science and engineering, Comptes Rendus Mécanique, 347, 845–855, https://doi.org/10.1016/j.crme.2019.11.009, 2019.
Mozaffar, M., Bostanabad, R., Chen, W., Ehmann, K., Cao, J., and Bessa, M. A.: Deep learning predicts path-dependent plasticity, P. Natl. Acad. Sci. USA, 116, 26414–26420, https://doi.org/10.1073/pnas.1911815116, 2019.
Nti-Addae, Y., Matthews, D., Ulat, V. J., Syed, R., Sempéré, G., Pétel, A., Renner, J., Larmande, P., Guignon, V., Jones, E., and Robbins, K.: Benchmarking database systems for Genomic Selection implementation, Database, 2019, baz096, https://doi.org/10.1093/database/baz096, 2019.
Owen, A. B. and Rudolf, D.: A Strong Law of Large Numbers for Scrambled Net Integration, SIAM Review, 63, 360–372, https://doi.org/10.1137/20M1320535, 2021.
Ozelim, L. C. d. S. M., Casagrande, M. D. T., and Cavalcante, A. L. B.: Database for NorSand4AI: A Comprehensive Triaxial Test Simulation Database for NorSand Constitutive Model Materials, Zenodo [data set], https://doi.org/10.5281/zenodo.8170536, 2023a.
Ozelim, L. C. d. S. M., Casagrande, M. D. T., and Cavalcante, A. L. B.: Codes for NorSand4AI: A Comprehensive Triaxial Test Simulation Database for NorSand Constitutive Model Materials, Zenodo [code], https://doi.org/10.5281/zenodo.10157831, 2023b.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011.
Rocscience: NorSand | RS2 | Advanced Constitutive Material Model – rocscience.com, https://www.rocscience.com/learning/norsand-in-rs2-an-advanced-constitutive-material-model (last access: 30 October 2023), 2022.
Roscoe, K. H. and Burland, J. B.: On the generalized stress-strain behaviour of “wet” clay, in: Engineering plasticity, edited by: Heyman, J. and Leckie, F., 535–609, Cambridge University Press, Cambridge, 1968.
Schmidt, M. and Lipson, H.: Distilling Free-Form Natural Laws from Experimental Data, Science, 324, 81–85, https://doi.org/10.1126/science.1165893, 2009.
Schofield, A. N. and Wroth, P.: Critical State Soil Mechanics, European civil engineering series, McGraw-Hill, ISBN 9780641940484, 1968.
Shen, Y., Chandrashekhara, K., Breig, W., and Oliver, L.: Finite element analysis of V-ribbed belts using neural network based hyperelastic material model, Int. J. Nonlin. Mech., 40, 875–890, https://doi.org/10.1016/j.ijnonlinmec.2004.10.005, 2005.
Silva, J. P., Cacciari, P., Torres, V., Ribeiro, L. F., and Assis, A.: Behavioural analysis of iron ore tailings through critical state soil mechanics, Soils Rocks, 45, 1–13, https://doi.org/10.28927/sr.2022.071921, 2022.
Sobol, I.: On the distribution of points in a cube and the approximate evaluation of integrals, USSR Comput. Math. Math. Phys., 7, 86–112, https://doi.org/10.1016/0041-5553(67)90144-9, 1967.
Sternik, K.: Technical Note: Prediction of Static Liquefaction by Nor Sand Constitutive Model, Studia Geotechnica et Mechanica, 36, 75–83, https://doi.org/10.2478/sgem-2014-0029, 2015.
Stoffel, M., Bamer, F., and Markert, B.: Neural network based constitutive modeling of nonlinear viscoplastic structural response, Mech. Res. Commun., 95, 85–88, https://doi.org/10.1016/j.mechrescom.2019.01.004, 2019.
The HDF Group: Hierarchical Data Format, version 5, https://www.hdfgroup.org/HDF5/ (last access: 24 April 2023), 1997–2023.
The HDF Group: Software Using HDF5, https://docs.hdfgroup.org/archive/support/HDF5/tools5desc.html, last access: 24 April 2023.
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., and SciPy 1.0 Contributors: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nature Methods, 17, 261–272, https://doi.org/10.1038/s41592-019-0686-2, 2020.
Wang, K. and Sun, W.: A multiscale multi-permeability poroplasticity model linked by recursive homogenizations and deep learning, Comput. Method. Appl. M., 334, 337–380, https://doi.org/10.1016/j.cma.2018.01.036, 2018.
Woudstra, L.-J.: Verification, Validation and Application of the NorSand Constitutive Model in PLAXIS: Single-stress point analyses of experimental lab test data and finite element analyses of a submerged landslide, Master’s thesis, TU Delft Civil Engineering & Geosciences, 2021.
Yang, T., Li, Y.-F., Mahdavi, M., Jin, R., and Zhou, Z.-H.: Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison, in: Advances in Neural Information Processing Systems, edited by: Pereira, F., Burges, C., Bottou, L., and Weinberger, K., vol. 25, Curran Associates, Inc., https://proceedings.neurips.cc/paper_files/paper/2012/file/621bf66ddb7c962aa0d22ac97d69b793-Paper.pdf (last access: 20 November 2023), 2012.
Yao, Y., Sun, D., and Matsuoka, H.: A unified constitutive model for both clay and sand with hardening parameter independent on stress path, Comput. Geotech., 35, 210–222, https://doi.org/10.1016/j.compgeo.2007.04.003, 2008.
Zhang, N., Zhou, A., Jin, Y.-F., Yin, Z.-Y., and Shen, S.-L.: An enhanced deep learning method for accurate and robust modelling of soil stress–strain response, Acta Geotech., https://doi.org/10.1007/s11440-023-01813-8, 2023.
Zhang, P., Yin, Z.-Y., Jin, Y.-F., and Ye, G.-L.: An AI-based model for describing cyclic characteristics of granular materials, Int. J. Numer. Anal. Met., 44, 1315–1335, https://doi.org/10.1002/nag.3063, 2020.
Zhang, P., Yin, Z.-Y., and Jin, Y.-F.: State-of-the-Art Review of Machine Learning Applications in Constitutive Modeling of Soils, Arch. Comput. Method. E., 28, 3661–3686, https://doi.org/10.1007/s11831-020-09524-z, 2021a.
Zhang, P., Yin, Z.-Y., Jin, Y.-F., and Liu, X.-F.: Modelling the mechanical behaviour of soils using machine learning algorithms with explicit formulations, Acta Geotech., 17, 1403–1422, https://doi.org/10.1007/s11440-021-01170-4, 2021b.