Microscopic Modeling and Optimal Operation of Thermal Atomic
Layer Deposition
Yangyao Dingᵃ, Yichi Zhangᵃ, Keegan Kimᵃ, Anh Tranᵃ, Zhe Wuᵃ,
Panagiotis D. Christofides∗,ᵃ,ᵇ
ᵃ Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA 90095-1592, USA.
ᵇ Department of Electrical and Computer Engineering, University of California, Los Angeles, CA 90095-1592, USA.
Abstract
This work develops a comprehensive framework for first-principles-based microscopic modeling,
data-driven input/output modeling and optimal operation of thermal atomic layer deposition (ALD)
of SiO2 thin-films using bis(tertiary-butylamino)silane (BTBAS) and ozone as precursors. Specifi-
cally, we initially utilize Density Functional Theory (DFT)-based calculations for the computation
of the key thermodynamic and kinetic parameters, which are then used in the microscopic mod-
eling of the ALD process. Subsequently, a detailed microscopic model is constructed, accounting
for the microscopic lattice structure and atomic interactions, as well as multiple microscopic film
growth processes including physisorption, abstraction and competing chemical reaction pathways.
Kinetic Monte-Carlo (kMC) algorithms are utilized to obtain computationally efficient microscopic
model solutions while preserving model fidelity. The obtained kMC simulation results are used
to train Artificial Neural Network (ANN)-based data-driven models that capture the relationship
between operating process parameters and time to ALD cycle completion. Specifically, a dense two-
hidden-layer feed-forward ANN is constructed to find a feasible range of ALD operating conditions
accounting for industrial considerations, and a Bayesian Regularized ANN is constructed to imple-
ment the cycle-to-cycle optimization of ALD cycle time. Extensive simulation results demonstrate
the effectiveness of the proposed approaches. The kMC model successfully achieves a growth per
cycle (GPC) of 1.8 Å per cycle, which is within the range of reported experimental values. The ANN
models accurately predict the deposition time to steady state from the given operating conditions,
and the cycle-time optimization using the Bayesian Regularized ANN (BRANN) model reduces the
conventional BTBAS cycle time by 60%.
Key words: atomic layer deposition; microscopic modeling; neural networks; optimal operation;
ALD cycle time optimization
∗ Corresponding author: Tel: +1 (310) 794-1015; Fax: +1 (310) 206-4107; E-mail: [email protected].
Preprint submitted to Elsevier February 28, 2019
1. Introduction
Thin-film deposition is one of the most important building blocks in the semiconductor industry.
Various deposition techniques, such as epitaxy, chemical vapor deposition (CVD), and physical
vapor deposition (PVD), have been developed to deposit high quality thin-films of various materials,
e.g., Al2O3, HfO2, RuO2, SiO2, etc. (Nalwa, 2002). However, the requirements for the production
of advanced memory devices have become increasingly demanding. For example, the dimensions
of new high-k gate dielectrics are transitioning to the sub-10-nm scale and the associated film
thickness is required to be under 30 Å (George et al., 1996; Schuegraf et al., 2013). Also, new
transistor designs often involve complex three-dimensional structures rather than two-dimensional
planar surfaces, along with the demand for conformal films with stringent criteria on uniformity
and defects. Thus, the atomic layer deposition (ALD) process has been widely adopted by industry
to meet the requirements of major design breakthroughs (Kääriäinen et al., 2013). ALD is a
thin-film deposition method originally derived from CVD. In an ALD process, a substrate surface
is exposed to alternating gas-phase precursor streams such that only one type of reactant is in
contact with the substrate surface at each half-cycle. Once in contact, the precursor undergoes
self-limiting surface reactions that allow a nearly complete and conformal surface coverage given
sufficient exposure time and appropriate reactor conditions. In between the alternating precursor
cycles, the reactor is purged with an inert gas, ensuring that all previously introduced precursors are removed
from the chamber prior to the exposure of the film to the next precursor, avoiding undesirable
reactions and a decrease in film purity (George, 2009). The ALD method enables a layer-by-
layer film growth with film uniformity at atomic level, which is more precise and controllable than
the traditional CVD approach (Tanner et al., 2007; Foong et al., 2010; Shirazi and Elliott, 2014;
Ishikawa et al., 2017). Therefore, in the field of microelectronics 3D integration, where ultra-thin
and highly-conformal films are needed, ALD has gained significant popularity.
Currently, there is a wealth of ALD research on both laboratory and industrial scales (Kääriäinen
et al., 2013). This significant research activity on ALD has led to the discovery of novel precursors
and mechanisms that make high-throughput film processing possible while allowing various substrate
layouts (Dasgupta et al., 2016). However, experimental and industrial work on ALD remains
expensive and time-consuming due to the cost of precursors and ALD-specific equipment, as well as
due to the limited throughput (Shirazi and Elliott, 2014). Additionally, real-time in-situ monitoring
of film growth is not possible because the molecular structure can only be resolved through
methods like scanning electron microscopy (SEM) and scanning tunneling microscopy (STM), which
are accurate but destructive to the deposited film (Schwille et al., 2017a). Thus, a model for ALD
that provides insight into the real-time film profile and the overall growth rate can be
beneficial to both industrial and research work.
Various deposition models have been developed over the past two decades in different fields of
microscopic simulation, for example, crystal growth, CVD, and plasma-enhanced CVD (PECVD)
(Nayhouse et al., 2013; Ikegawa and Kobayashi, 1989; Crose et al., 2018). In particular, Molecular
Dynamics (MD), and more recently, kinetic Monte-Carlo (kMC) are among the most popular simu-
lation methods (Elliott and Greer, 2004; Rasoulian and Ricardez-Sandoval, 2014; Crose et al., 2018).
An ab initio MD model keeps track of all the particle movements and requires an overwhelming
amount of computational resources, making it infeasible to simulate an industrial-scale process
(Battaile and Srolovitz, 2002). By contrast, the kMC method has a crucial advantage in
computational efficiency as it tracks a single event at a time in a predefined lattice space. Despite
this simplification, the kMC method has been used in deposition models to successfully reproduce
realistic profiles (Rey et al., 1991; Dkhissi et al., 2008; Crose et al., 2018). Recently, Crose et al.
(2018) proposed a novel multiscale computational fluid dynamics (CFD) simulation that used a
surface microscopic n-fold hybrid kMC model and demonstrated its validity with a PECVD system.
Moreover, many groups have shown the validity of using kMC in ALD simulation. For instance,
Knoops et al. (2010) used raw probabilities of reaction and recombination to construct a kMC
model for general plasma-enhanced ALD. Shirazi and Elliott (2014) modeled small-scale Al2O3
ALD using kMC based on first-principles analysis.
However, previous works have not investigated the simulation of thermal ALD of SiO2 thin-film,
which is an important material for gate oxides in MOSFET and MEMS devices, sacrificial layers, and
conformal dielectric films in the front-end-of-line (FEOL) semiconductor wafer processing (Murray
et al., 2014). These ultrathin films contain only around ten atomic layers (around 20 Å) and can
only be prepared viably with the ALD method (Schuegraf et al., 2013). Current research has yet to
propose an ALD model that is scalable to industrial size, although such models have proven useful in traditional
thin-film deposition systems (Crose et al., 2018). Moreover, computationally efficient data-driven
approaches, such as system identification, have not yet been applied to characterize the proposed
ALD system (Djurabekova et al., 2007; Nicolas and Lorenzo, 2010; Rasoulian and Ricardez-Sandoval,
2015b; Mhaskar et al., 2018; Kimaev and Ricardez-Sandoval, 2017; Singh Sidhu et al., 2018). Such
data-driven models may make online optimization, run-to-run (R2R) control and real-time control
possible for ALD processes (Wang et al., 2009; Kwon et al., 2015a,b; Rasoulian and Ricardez-
Sandoval, 2015a, 2016; Oh and Lee, 2016; Chaffart and Ricardez-Sandoval, 2017, 2018).
In this paper, we propose a kMC model to simulate the atomic layer deposition of SiO2 thin-film
on a 3D lattice. Bis(tertiary-butylamino)silane (BTBAS) and ozone are chosen as the precursors,
and density functional theory (DFT) is used to obtain thermodynamic and kinetic parameters of
the precursors that were not previously reported. Those DFT-calculated parameters such as the
intermediate complex activation energies and pre-exponential coefficients crucially affect the micro-
scopic kMC model event selection to reproduce realistic growth rates and structure. This model
also extends the 2D-lattice kMC models proposed previously, for example, in Crose et al.
(2018). Although there are many advantages in using a 2D simplification, including easy setup
and computational efficiency, a 3D lattice is required to simulate ALD due to the importance of
spatial influence between species. Moreover, the modeling of deposition onto high-aspect ratio
(AR) features requires the analysis of edges and corners which is not possible with a 2D geome-
try (Schwille et al., 2017a). Therefore, adopting a 3D microscopic lattice structure also enables
simulating ALD with high AR design in the future. To further improve model performance, we
develop a 3D triangular lattice approximation of the real crystal lattice while maintaining important
structural characteristics. After building the kMC simulation and validating its performance with
experimental results, we derive a data-driven model via machine learning techniques to predict the
steady-state film growth behavior for cycle-to-cycle optimization. Although kMC simulation can
provide information about film growth in real time, it is too computationally expensive to be
implemented in a control scheme for a large-scale system such as an entire wafer. Therefore, it is useful
to derive a data-driven model that can provide a closed-form solution and can capture key film
growth characteristics. Due to the stochastic nature of kMC and the non-linearity involved in the
reaction mechanisms, non-linear regression models are applied to capture the input-output relationship.
Traditional algebraic input-output models such as the least-squares method are prone to
prediction and over-fitting errors. Therefore, Artificial Neural Networks (ANNs), which offer a more
robust and systematic way of determining parameters for non-linear problems, can be readily
tailored to perform such tasks (Goodfellow et al., 2016). Specifically, a dense two-hidden-layer
feed-forward ANN and a Bayesian Regularized ANN are implemented to find the feasible range of ALD
operating conditions and to optimize ALD throughput cycle-to-cycle, respectively. The proposed
kMC model achieves a growth per cycle (GPC) of 1.8 Å, which lies in the experimentally reported
range of 1.4 Å ∼ 2.1 Å per cycle. Extensive simulation results demonstrate the validity of
the proposed ANN approach in calculating optimal deposition times with respect to the operating
parameters. The resulting model is demonstrated to reduce the conventional industrial cycle time
by 60%. Furthermore, the modeling approach developed in this work can serve as a general guideline
and be extended to the ALD of other thin-film materials using different precursors and operating
conditions.
2. ALD Process Description and Modeling
This work focuses on developing a microscopic model that describes the deposition of SiO2
thin-film via thermal ALD, which captures the structural details, the reaction mechanisms and
the growth rate of SiO2 thin-films. In this section, the approximation of the 3D SiO2 lattice is
introduced and validated. Then, precursor selection is discussed based on experimental results and
thermodynamic data. Subsequently, the reaction mechanism and associated kinetics are discussed
in detail, including the DFT calculations of kinetic rate parameters and activation energies by
Gaussian09/Gaussview software package (Frisch et al., 2000). Next, a hybrid n-fold model-specific
kMC algorithm is developed to simulate the ALD process. Finally, we present the machine learning
algorithms involved with the data-driven analysis on the relationship between operating conditions
and cycle completion time.
2.1. Structural Assumptions of Deposited SiO2
Our microscopic model aims to simulate the deposition of α-Quartz SiO2, which crystallizes
in the trigonal crystal system with space group P3₁21 and has a local tetrahedral SiO4 structure.
Although it may be tempting to use a true α-Quartz lattice structure, such a lattice
structure would not be suitable for kMC implementation. The chemical nature of SiO2 ALD process
requires the consideration of multiple reaction pathways, structural geometry and defect generations.
A true α-Quartz lattice kMC model would be conceptually complex and computationally challenging
(Shirazi and Elliott, 2014). Thus, instead of a fully realistic 3D lattice, a 2D triangular model (i.e.,
each monolayer is offset from the monolayer below it), adopted in our previous work by Crose
et al. (2018) is extended to 3D as an approximation of the actual α-Quartz crystal structure. In
our model, a bond angle of 90◦ is assumed for the connectivity between Si and O atom instead of
109.5◦ . As shown in Figure 1 and Figure 2, the top view of the simulated lattice closely resembles
that of the real lattice, with some angle distortion. This assumption leads to a lattice repetition
every four cycles instead of five cycles as appears in the α-Quartz SiO2 . For any silicon atom in
the approximated lattice, another silicon atom appears directly above it every four cycles, whereas
in a realistic lattice, such pattern repeats every five cycles. However, this simplification does not
influence the connectivity of the individual lattice cell or the validity of the model. We will also
later demonstrate that our simulation accurately captures the growth rate and defect generation
pattern of α-Quartz SiO2 deposition reported by experimental results. Thus, this 3D triangular
model is a valid simplification of the true structure. A lattice size of 1200×1200 sites per layer
is used, which is large enough for the simulation to be size-independent but still computationally
efficient as demonstrated by Huang et al. (2010a,b), and the height depends on the number of cycles
simulated.
2.2. Precursor Selection
Surface reactions in the ALD process govern the growth rate and the structural pattern of SiO2
films. Therefore, the selection of oxygen and silicon precursors is an important topic. In the past,
many silicon precursors have been selected and studied to improve the uniformity and the growth
rate of SiO2 deposition. In recent years, aminosilane-based precursors have gained significant pop-
ularity because of the low activation energy of the sequential dissociative chemisorption mechanism
caused by the H-N hydrogen bonds formed during the adsorption stage. These characteristics lead to
high reaction rates and greatly improve the efficiency of SiO2 deposition (O'Neill et al., 2011). Among
those aminosilane precursors, the most popular ones are: bis(tertiary-butylamino)silane (BTBAS),
bis(diethylamino)silane (BDEAS), bis(dimethylamino)silane (BDMAS), tris(dimethylamino)silane
(TDMAS) and di(sec-butylamino)silane (DSBAS) (Kamiyama et al., 2006; Kinoshita et al., 2007;
Baek et al., 2012; Dingemans et al., 2012; Huang et al., 2013; Putkonen et al., 2014). In order
to pick the most favorable precursor for our simulation, we account for the following factors: the
existence of experimental data (e.g., growth per cycle (GPC) and precursor exposure time), the
existence of theoretical data (e.g., reaction mechanism and associated kinetic parameters), and the
availability of additional information such as film quality, sticking coefficient and steric hindrance
studies. Based on the above considerations, BTBAS is chosen as the Si precursor due to its fast
reported growth rate (1.4 Å ∼ 2.1 Å per cycle), adequate experimental and theoretical data, and
a detailed mechanism available to model the process (O'Neill et al., 2011; Han et al., 2011). With
respect to the oxygen precursor, ozone (O3 ) is chosen among the common candidates for thermal
ALD of oxide films, because ozone is chemically reactive and does not introduce hydrogen-containing
side products in the thermal ALD process. Furthermore, ozone is extensively used in the indus-
try and is widely studied in experiments, which makes its major chemical properties and reaction
mechanisms readily accessible (Nishiguchi et al., 2002; Prechtl et al., 2003; Han et al., 2011).
2.3. Reaction Mechanism
A full deposition cycle in the ALD process consists of two half-cycles, each using a specific
precursor species to introduce the desired element onto the film. As mentioned above, we choose
BTBAS and ozone as the precursors for SiO2 deposition simulation. The reaction mechanism using
these two precursors was reported by Han et al. (2011) and is explained in detail below.
The first half-cycle is referred to as the Si-Cycle, which contains physisorption, abstraction and a
two-step dissociative chemisorption. In our model, we choose a fully hydroxylated SiO2 (001) surface
as our starting point, shown in Figure 1. The silicon precursor, BTBAS, is first physisorbed onto
the substrate surface under specific temperature and pressure. According to Han et al. (2011), the
two oxygen atoms in a SiO2 cell have different electronegativities. The more electronegative oxygen
atom, denoted as O1, is more reactive and is therefore more likely to be electrophilically attacked
by precursor particles than the less electronegative oxygen atom, denoted as O2. Therefore, as
shown in Figure 3 (a), the precursor particle is first physisorbed onto the O1-type hydroxyl group
through a strong H-bond to form the reactant. Then, the physisorbed precursor goes through the
first dissociative chemisorption step, forming a monoamine intermediate and releasing one of the
two tert-butylamino groups. Next, the remaining tert-butylamino group electrophilically attacks an
adjacent O2-type hydroxyl group, which can be either from the neighbouring Si atom, i.e., the
neighbour-binding route, or from the same substrate Si atom, i.e., the self-binding route, as shown
in Figure 3 (b). The former reaction pathway retains the original surface orientation, resulting in a
thermodynamically favorable structure, whereas the latter, which is more kinetically favorable as
shown in Table 1, causes a deviation from the (001) surface orientation and leads to defect formation.
After the electrophilic attack, the second tert-butylamino group is released from the surface structure
and another O-Si bond is formed. The remaining two H atoms on the Si atom then become the new substrate
surface. The competition of kinetic and thermodynamic favorability is crucial in explaining the
structural non-uniformity of SiO2 . Therefore, both reaction pathways and their reverse reactions
are incorporated in our kMC model, and the reaction kinetics are explained in more detail in
the next section.

The second half-cycle is referred to as the O-Cycle, which contains the ozone physisorption,
abstraction and surface oxidation. The oxidation steps of self-binding and neighbour-binding H-Si
groups are shown in Figure 3 (c). Once the surface is partially/fully chemisorbed, both terminating
H atoms are oxidized by ozone to hydroxyl groups (-OH), which serve as reactive sites in the next Si-Cycle.
2.4. Relative Rate Determination
In order to apply the kMC algorithm, we need to compute the kinetic rates of reactions discussed
in the previous section. The physisorption of precursor particles onto the substrate surface is a gas-
surface reaction. For such athermal or barrierless processes, the Collision Theory, as expressed in
the equation below, is generally used to determine the rate constant:
$$ r_{phs} = s_{c,i}\, N_a\, \sigma\, \frac{p_i}{RT} \sqrt{\frac{8 k_b T}{\pi m_i}} \qquad (1) $$
where $r_{phs}$ is the physisorption reaction rate, $p_i$ is the partial pressure of species $i$, $R$ is the
ideal gas constant, $T$ is the temperature, $k_b$ is the Boltzmann constant, $m_i$ is the molecular mass
of species $i$, $s_{c,i}$ is the sticking coefficient of species $i$ at the given temperature, $N_a$ is the
Avogadro number, and $\sigma$ is the average area per surface site. Although the sticking coefficient of
BTBAS has not been reported in previous works, we obtain its value by analogy with the sticking
coefficient of BDEAS because of their structural and electronic similarity (Schwille et al., 2017b).
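As an illustration of Eq. (1), the following Python sketch evaluates the per-site physisorption rate. It assumes SI units throughout and interprets $m_i$ as the mass of a single molecule, obtained here from the molar mass divided by the Avogadro number; the function name and all numerical inputs in the usage note are hypothetical placeholders, not the parameters used in this work.

```python
import math

# Physical constants (SI units)
K_B = 1.380649e-23       # Boltzmann constant, J/K
N_A = 6.02214076e23      # Avogadro number, 1/mol
R_GAS = 8.314462618      # ideal gas constant, J/(mol K)


def physisorption_rate(p_i, T, molar_mass, sticking_coeff, site_area):
    """Per-site physisorption rate following Eq. (1), in 1/s.

    p_i            -- partial pressure of species i, Pa
    T              -- temperature, K
    molar_mass     -- molar mass of species i, kg/mol (converted to a per-molecule mass)
    sticking_coeff -- sticking coefficient s_c,i (dimensionless)
    site_area      -- average area per surface site sigma, m^2
    """
    m_i = molar_mass / N_A                                    # mass of one molecule, kg
    number_density = p_i / (R_GAS * T) * N_A                  # molecules per m^3
    mean_speed = math.sqrt(8.0 * K_B * T / (math.pi * m_i))   # mean thermal speed, m/s
    return sticking_coeff * number_density * mean_speed * site_area
```

For example, `physisorption_rate(100.0, 500.0, 0.1744, 1e-3, 2.5e-19)` evaluates the rate for a species with the molar mass of BTBAS (about 0.174 kg/mol) at an assumed partial pressure of 100 Pa, 500 K, an assumed sticking coefficient of 10⁻³ and an assumed site area of 2.5 × 10⁻¹⁹ m². Because $p_i/(RT)$ scales as $1/T$ and the mean speed as $\sqrt{T}$, the rate scales linearly with $p_i$ and as $T^{-1/2}$ at a fixed sticking coefficient.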

By contrast, chemisorption, abstraction and oxidation are thermally activated
kinetic reactions, which are generally described by Transition State Theory (TST) (Cortright
and Dumesic, 2001). Assuming quasi-equilibrium is achieved between the transition state complex
and the reactant, the reaction rate can be estimated using the thermodynamic properties of the
transition state complexes, which are computed using DFT. Thus, the reaction rate equation can be
formulated with a standard Arrhenius-type rate law as follows:
$$ r_{rxn,i} = A_i \exp\left(\frac{-E_{a,i}}{k_b T}\right) \qquad (2) $$
where $r_{rxn,i}$ is the reaction rate of the $i$th thermally activated reaction, $E_{a,i}$ is its activation
energy for the transition state complex, and $A_i$ is its pre-exponential factor, which is determined as
follows:
$$ A_i = f_i^{TST} \frac{k_b T}{h} \qquad (3) $$
where $k_b$ is the Boltzmann constant, $T$ is the temperature, $h$ is the Planck constant, and $f_i^{TST}$ is
the ratio of the vibrational partition function of the transition state complex to that of the reactant,
calculated with DFT.
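A minimal Python sketch of Eqs. (2) and (3) is given below. It assumes the standard transition state theory prefactor $k_b T/h$ and expresses activation energies in eV; the two barriers used in the comparison are illustrative values only and are not taken from Table 1.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
K_B_EV = 8.617333262e-5     # Boltzmann constant, eV/K
H_PLANCK = 6.62607015e-34   # Planck constant, J*s


def tst_prefactor(f_tst, T):
    """Eq. (3): A_i = f_i^TST * k_b * T / h, in 1/s."""
    return f_tst * K_B * T / H_PLANCK


def activated_rate(e_a_ev, f_tst, T):
    """Eq. (2): r_rxn,i = A_i * exp(-E_a,i / (k_b * T)), with E_a,i given in eV."""
    return tst_prefactor(f_tst, T) * math.exp(-e_a_ev / (K_B_EV * T))


# Illustrative comparison of two competing pathways at 500 K (hypothetical barriers).
fast = activated_rate(0.5, 1.0, 500.0)
slow = activated_rate(0.8, 1.0, 500.0)
print(f"rate ratio: {fast / slow:.3e}")
```

Because the exponential term dominates, a 0.3 eV difference in barrier height at 500 K changes the rate by a factor of roughly 10³, which is why competing pathways with different barriers must both be retained in the event set.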
In this work, all DFT calculations are performed using the Gaussian09 software, as described in
more detail in Section 2.6. The resulting parameters are summarized in Table 1 and the
associated nomenclature is explained in Section 2.5. Since the desorption reactions lead to gas-phase
products, the DFT-calculated vibrational partition function ratios of those reactions are
smaller than one, which is consistent with results reported in the literature. By contrast, the other
reactions are entirely surface reactions, and their vibrational partition function ratios are therefore
all set equal to 1 (Cortright and Dumesic, 2001).
2.5. Kinetic Monte-Carlo Algorithm
As mentioned in the Introduction, a first-principles Molecular Dynamics simulation is too
computationally demanding to be feasible for the scale of the system discussed in this work (Battaile and
Srolovitz, 2002; Rey et al., 1991; Dkhissi et al., 2008). Thus, we adopt an n-fold hybrid kMC algo-
rithm in the framework proposed by earlier works (Lou and Christofides, 2004; Christofides et al.,
2008; Crose et al., 2018). kMC is a stochastic algorithm that uses the kinetic rate information and
uniformly distributed random numbers to determine event execution and system time evolution.
Specifically, we define an event set as a collection of all events that have comparable rates. A total
rate, $r_{total}$, is defined as:
$$ r_{total} = \sum_{i=1}^{N} r_i \qquad (4) $$
where $r_i$ represents the rate of each event within an event set, which consists of $N$ events in
total. Then, each rate is normalized with respect to the associated total rate to derive its relative
probability. The normalized indicator of the $i$th event, $l_i \in (0, 1]$, can be interpreted as the sum of
the normalized probabilities of the first $i$ events:
$$ l_i = \frac{\sum_{j=1}^{i} r_j}{r_{total}}, \quad i = 1, \ldots, N \qquad (5) $$
This indicator is then used for event selection via a uniformly distributed random number
$\gamma_1 \in (0, 1]$. If $l_{i-1} < \gamma_1 \le l_i$ (with $l_0 = 0$), the $i$th event is selected
for execution.
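The selection rule of Eqs. (4) and (5) can be sketched in Python as follows. The sketch builds the cumulative indicators $l_1, \ldots, l_N$ and uses a binary search to find the first indicator that is greater than or equal to $\gamma_1$, which is exactly the interval condition $l_{i-1} < \gamma_1 \le l_i$; the function name and the example rates are hypothetical.

```python
import bisect
import itertools
import random


def select_event(rates, gamma1):
    """Return the 0-based index of the event whose interval (l_{i-1}, l_i] contains gamma1.

    rates  -- rates r_1..r_N of the events in one event set
    gamma1 -- uniformly distributed random number in (0, 1]
    """
    r_total = sum(rates)                                              # Eq. (4)
    indicators = [c / r_total for c in itertools.accumulate(rates)]  # Eq. (5)
    indicators[-1] = 1.0  # guard against round-off so that gamma1 = 1 always maps to event N
    return bisect.bisect_left(indicators, gamma1)


rng = random.Random(42)
gamma1 = 1.0 - rng.random()  # random() lies in [0, 1), so gamma1 lies in (0, 1]
chosen = select_event([1.0, 2.0, 1.0], gamma1)
```

With rates [1, 2, 1] the indicators are 0.25, 0.75 and 1.0, so the middle event is selected whenever $0.25 < \gamma_1 \le 0.75$, i.e., with probability 1/2. The binary search makes each lookup $O(\log N)$, while rebuilding the indicators after every event, as this sketch does, costs $O(N)$.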
The transient behavior of the model is characterized by the time evolution scheme proposed by
the kMC algorithm, where the amount of time for each event is governed by using another random
number, γ2 ∈ (0, 1]. Starting from a given time, the simulation time clock is advanced by ∆t for
the chosen event, where ∆t is given by the following equation:
$$ \Delta t = \frac{-\ln \gamma_2}{r_{total}} \qquad (6) $$
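The clock update of Eq. (6) draws an exponentially distributed waiting time with mean $1/r_{total}$. The short Python sketch below, with hypothetical function names, implements the update and checks the mean empirically; $\gamma_2$ is drawn from (0, 1] so that the logarithm is always defined.

```python
import math
import random


def time_increment(r_total, gamma2):
    """Eq. (6): kMC clock advance for one executed event; gamma2 must lie in (0, 1]."""
    return -math.log(gamma2) / r_total


def mean_waiting_time(r_total, n_samples, seed=0):
    """Average of n_samples clock increments; should approach 1 / r_total."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        gamma2 = 1.0 - rng.random()  # random() lies in [0, 1), so gamma2 lies in (0, 1]
        total += time_increment(r_total, gamma2)
    return total / n_samples
```

For instance, `mean_waiting_time(50.0, 200000)` should return a value close to 1/50 = 0.02 s, reflecting that faster event sets advance the simulation clock in smaller steps.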
For the O-Cycle, the total rate is computed as follows:
$$ r_{o,total} = r_{o,phs} + r_{o,des} + r_{o_a,f} + r_{o_b,f} \qquad (7) $$
where $r_{o,phs}$ is the rate of ozone physisorption, $r_{o,des}$ is the rate of ozone desorption, and $r_{o_a,f}$
together with $r_{o_b,f}$ are the oxidation rates of the chemisorbed species attached to a neighbour-binding
silicon. The oxidation rate of the chemisorbed species attached to a self-binding silicon is
orders of magnitude higher than that of a neighbour-binding silicon. Therefore, it is considered
instantaneous and deterministic, and thus omitted from the O-Cycle kMC selection. Similarly, the
total rate for the Si-Cycle is:
$$ r_{si,total} = r_{btbas,phs} + r_{btbas,des} + r_{neigh,f} + r_{neigh,r} + r_{self,f} + r_{self,r} \qquad (8) $$
where $r_{btbas,phs}$ and $r_{btbas,des}$ are the physisorption and desorption rates of the silicon precursor,
BTBAS, respectively; $r_{neigh,f}$ and $r_{neigh,r}$ are the forward and reverse rates of the neighbour-binding
dissociative chemisorption, respectively; and $r_{self,f}$ and $r_{self,r}$ are the forward and reverse rates of
the self-binding dissociative chemisorption, respectively. The reaction rate of the first chemisorption
step, $r_{si,chem}$, is orders of magnitude higher than those of the other events. Therefore, it is considered
instantaneous and deterministic, and thus omitted from the Si-Cycle kMC selection.
For the O-Cycle, the rates of all considered reactions are comparable and can be modeled
with the standard n-fold kMC algorithm. However, for the Si-Cycle, in order to simulate the
realistic behavior of reaction kinetics, we need to consider surface reaction events separately from
physisorption events for the following two reasons: First, surface reaction events are formulated
and compared differently from physisorption events since surface species concentrations need to be
considered to correctly describe the competition between the thermodynamic and kinetic favorability
of competing pathways. Second, physisorption rates are an order of magnitude lower than surface
reaction rates according to the DFT calculation, which means that the model will be saturated by
surface reaction events if the events are not allocated properly. Thus, a decoupled kMC kinetic
scheme is proposed to partition all Si-Cycle events into two event sets: adsorption events
containing only physisorption events, and surface reaction events containing the remaining events.
The partitioned total rates, $r_{si,ads}$ and $r_{si,rxn}$, are then defined as follows:
$$ r_{si,rxn} = r_{neigh,f} + r_{neigh,r} + r_{self,f} + r_{self,r} + r_{btbas,des} \qquad (9) $$
$$ r_{si,ads} = r_{btbas,phs} \qquad (10) $$
Additionally, in order to apply the decoupling scheme, we first compute $J_{si,ads}$, the ratio of the
adsorption rate to the total Si-Cycle rate:
$$ J_{si,ads} = \frac{r_{si,ads}}{r_{si,total}} = 1 - J_{si,rxn} \qquad (11) $$
where $J_{si,rxn} = r_{si,rxn}/r_{si,total}$, since $r_{si,ads} + r_{si,rxn} = r_{si,total}$.
Therefore, for a total assigned duration, $t_{total}$, adsorption events are pre-allocated a duration
of $t_{total} \cdot J_{si,ads}$, and the remaining time is assigned to surface reaction events. Next, during the
allocated time period for surface reactions, the normalized event indicator under the competition
of reaction pathways and directions is calculated from the concentration-weighted reaction rates as
follows:
$$ l_{si,i} = \frac{\sum_{j=1}^{i} r_{rxn,j} R_j}{\sum_{k=1}^{N} r_{rxn,k} R_k}, \quad i = 1, \ldots, N \qquad (12) $$
where $l_{si,i} \in (0, 1]$ represents the normalized indicator of the $i$th event in the surface reaction event
set, $r_{rxn,j}$ is the unweighted chemical reaction rate for the $j$th event calculated from Eq. (2), $R_j$
is the number of reactants available for the $j$th surface reaction, and $N$ is the total number of events
in the Si-Cycle surface reaction event set. The normalized indicators are then used to execute the event
selection following the same approach as in the standard kMC algorithm. In Section 3, it
is demonstrated that this decoupling scheme achieves the desired accuracy.
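The decoupled Si-Cycle scheme of Eqs. (9)-(12) can be sketched in Python as below: the first function splits an assigned duration between the two event sets using $J_{si,ads}$, and the second performs the concentration-weighted selection of Eq. (12). This is a minimal sketch; the function names, the event ordering and the example numbers are hypothetical, and it assumes at least one surface reaction has a nonzero weighted rate.

```python
import bisect
import itertools


def allocate_si_cycle_time(t_total, r_si_ads, r_si_rxn):
    """Split an Si-Cycle duration between adsorption and surface-reaction events (Eqs. 9-11).

    Returns (time for adsorption events, time for surface-reaction events).
    """
    r_si_total = r_si_ads + r_si_rxn      # together the two sets give r_si,total of Eq. (8)
    j_si_ads = r_si_ads / r_si_total      # Eq. (11)
    return t_total * j_si_ads, t_total * (1.0 - j_si_ads)


def select_surface_reaction(rxn_rates, reactant_counts, gamma):
    """Eq. (12): pick a surface reaction using the weighted rates r_rxn,j * R_j.

    rxn_rates       -- unweighted rates from Eq. (2), one per surface reaction
    reactant_counts -- R_j, the number of reactants available for each reaction
    gamma           -- uniformly distributed random number in (0, 1]
    """
    weighted = [r * n for r, n in zip(rxn_rates, reactant_counts)]
    total = sum(weighted)
    indicators = [c / total for c in itertools.accumulate(weighted)]
    indicators[-1] = 1.0  # guard against floating-point round-off
    return bisect.bisect_left(indicators, gamma)


# Hypothetical example, event order: (neigh_f, neigh_r, self_f, self_r, btbas_des).
rates = [2.0e3, 1.0e1, 8.0e3, 5.0e1, 4.0e2]
counts = [120, 30, 120, 10, 200]
t_ads, t_rxn = allocate_si_cycle_time(1.0, 50.0, sum(rates))
event = select_surface_reaction(rates, counts, 0.5)
```

Events with $R_j = 0$ receive zero weight, so the weighting automatically favors pathways with more available reactants, while the time split keeps the slower physisorption events from being crowded out by the faster surface reactions.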
2.6. DFT and Thermodynamic Calculations
Although the reaction activation energies and mechanisms have already been analyzed for BT-
BAS by Han et al. (2011) as discussed in Section 2.3, many fundamental thermodynamic and kinetic
properties of BTBAS have yet to be investigated, including its entropy, enthalpy and vibrational
partition function. Since the above properties are difficult to measure experimentally yet essential to
the accurate microscopic simulation of ALD behavior, in this work, we utilize Density Functional
Theory (DFT) with Gaussian09 package to compute them (Frisch et al., 2000).
In the Si-Cycle, to calculate properties of BTBAS and its reaction kinetics with SiO2 lattice,
we first need to investigate the configuration of the hydroxylated surface lattice and the structure
of the physisorbed BTBAS transition state complex. Specifically, to construct an optimal surface
lattice, a generic bulk α-quartz SiO2 unit cell is modified to generate a desired hydroxylated surface
layer. The bulk unit cell is first imported into VESTA 3, which is a 3D visualization program
widely adopted to construct crystalline structures (Momma and Izumi, 2011). The uppermost layer
of Si atoms is removed, leaving two singly bonded oxygen atoms per unit cell. Each oxygen atom is
terminated with one hydrogen atom, and the new O-H bond is assumed to have a typical bond
angle and a typical bond length of 0.98 Å (Li et al., 2009). Then, the hydroxylated unit cell is imported into
the Gaussview molecule builder tool. A 3 × 3 × 1 SiO2 lattice is constructed using the hydroxylated
SiO2 unit cell with Gaussian Periodic Boundary Condition (PBC) cell symmetry replication (Frisch
et al., 2000; Mankad and Jhu, 2016). A series of optimization steps is carried out, with all atoms
other than the surface hydrogen and oxygen atoms fixed during structure optimization (Han et al.,
2011). The lattice structure is first optimized using Hartree-Fock (HF) method with basis set 3-21G
to obtain an initial guess of the structure. Next, the B3LYP method, a hybrid functional combining
Becke's three-parameter exchange functional (B3) with the Lee-Yang-Parr gradient-corrected correlation
functional (LYP), with triple valence plus polarization, is applied to optimize the structure to an
acceptable energy minimum at the 6-31G+dp basis set accuracy level (Lee et al., 1988; Becke, 1993).
Subsequently, an initial guess of the TS complex is obtained by combining an optimized BTBAS molecule
and a 3 × 3 × 1 SiO2 surface lattice. Then, the TS2 method in Gaussian09 is used to calculate
the optimized TS complex structure that is most energetically favorable. The calculation is carried
out at the same basis set accuracy level as the surface lattice structure optimization, with modified
coordinate definitions and force constant calculations at every atomic position. The resulting TS
structure has exactly one negative (imaginary) vibrational frequency, as expected for a transition
state (Frisch et al., 2000). Similarly, for the O-Cycle reaction, the thermodynamic and kinetic properties
of the gas-phase ozone molecule, the H-Si surface lattice and physisorbed ozone are investigated,
following the same calculation procedure as for the Si-Cycle.
Finally, to obtain precise vibrational frequencies, Gaussian-4 (G4) theory is adopted. G4 theory is a
composite method for the accurate calculation of molecular energies based on ab initio molecular-orbital
theory. It provides thermodynamic results for compounds containing second-row (Li-F) and third-row
(Na-Cl) elements, which covers our reacting molecules. Parallel computations using Linda workers in the
Gaussian09 package are carried out to obtain all the vibrational frequencies needed to calculate the ratio
f T ST (Frisch et al., 2000). The final vibrational partition functions, along with other important
thermodynamic properties, are summarized in Table 1.
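As a rough illustration of how computed vibrational frequencies enter a partition-function ratio, the sketch below evaluates harmonic-oscillator vibrational partition functions from a list of wavenumbers (in cm^-1, the unit Gaussian reports) and forms a transition-state-to-reactant ratio. The harmonic treatment, the zero-point energy reference, the exclusion of the imaginary reaction-coordinate mode, and the helper names `vib_partition_function` and `vib_ratio` are our assumptions; the text does not specify the exact expression behind Table 1.

```python
import math

# Physical constants (exact SI values)
H_PLANCK = 6.62607015e-34    # J s
K_BOLTZMANN = 1.380649e-23   # J/K
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s


def vib_partition_function(wavenumbers_cm, temperature_k):
    """Harmonic-oscillator vibrational partition function with energies measured
    from the zero-point level: q = prod_k 1 / (1 - exp(-h c nu_k / (k_B T)))."""
    q = 1.0
    for nu in wavenumbers_cm:
        if nu <= 0.0:
            # Imaginary modes (listed as negative), such as the TS reaction
            # coordinate, are left out of the product (assumption).
            continue
        x = H_PLANCK * C_CM_PER_S * nu / (K_BOLTZMANN * temperature_k)
        q *= 1.0 / (1.0 - math.exp(-x))
    return q


def vib_ratio(ts_wavenumbers_cm, reactant_wavenumbers_cm, temperature_k):
    """Ratio of TS to reactant vibrational partition functions (hypothetical helper)."""
    return (vib_partition_function(ts_wavenumbers_cm, temperature_k)
            / vib_partition_function(reactant_wavenumbers_cm, temperature_k))
```

Stiff modes contribute a factor close to 1, so the ratio is dominated by low-frequency (soft) modes that differ between the TS complex and the reactants.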
2.7. Artificial Neural Network Model and Non-linear Regression
The half-cycle time plays an important role in both industrial production and experimental
studies of ALD, yet the actual time needed for each half-cycle for various experimental conditions of
temperature and pressure remains unknown. Specifically, according to kMC simulation results and
experimental analysis in SiO2 thin-film ALD, Si-Cycles require longer time than O-Cycles at high
temperature around 600 K, which impacts film coverage and quality (Han et al., 2011). Therefore,
it is important to develop a model that can estimate the required Si-Cycle time, given operating
temperature and pressure. Although the kMC model can be used to simulate the transient behavior of ALD
and provide a reference cycle time, it is too computationally demanding to be applied in real time to
multiple-cycle film production. Moreover, the kMC model is not a closed-form model and thus cannot be
directly utilized for optimization and control. Therefore, instead of using the kMC model to perform
real-time optimization, we take a data-driven modeling approach and build Artificial Neural Network (ANN)
models that correlate the Si-Cycle completion time with the operating inlet temperature and pressure, using
kMC-generated databases. Compared with traditional input-output models such as least-squares regression,
the ANN approach is chosen for its lower prediction error and its robustness against over-fitting. Since
generating a database from kMC model solutions takes a long time, especially at low temperature and
pressure, two ANN-based models are developed to serve different levels of precision. The first database
covers a wider but sparser range of operating conditions; it aims to predict the boundary of suitable
operating conditions and to provide a general reference for cycle completion time with acceptable
accuracy. However, higher accuracy is necessary for real-time control and cycle-time optimization.
Therefore, a second database is developed, which focuses on a smaller range with higher resolution.
Because of the difference in data ranges, we adopt two levels of regularization to train our neural
networks accurately: (1) a standard feed-forward neural network without Bayesian regularization for the
feasible range, and (2) a Bayesian Regularized Artificial Neural Network (BRANN) for the optimal range.
For the first database, we develop a feed-forward neural network trained with standard back-propagation
for the nonlinear regression. Specifically, the input layer consists of two neurons, representing the inlet
temperature and pressure, respectively. Two hidden layers are constructed, where the first and second
layers contain 35 and 30 neurons, respectively. The output layer contains a single neuron, representing the
half-cycle completion time required to reach steady-state and, if possible, full coverage. For each hidden
layer, the Rectified Linear Unit (ReLU) function is used as the non-linear activation function for better
gradient propagation and efficient calculation:

$$\mathrm{ReLU}(x) = x^{+} = \max(0, x) \tag{13}$$
Additionally, all layers are densely connected, and the structure of the ANN is optimized via a grid search
(Svozil et al., 1997). The general structure of a feed-forward two-input, single-output neural network with
two hidden layers is given in Figure 4, where $N_{ij}$ are the input neurons, $H_{ij}$ are the hidden layer
neurons, and $N_o$ is the output neuron. The above ANN structure is constructed with TensorFlow's Keras
module, a widely used high-level application programming interface (API) for building and training
deep-learning models.
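For readers without TensorFlow at hand, the following dependency-free sketch shows the dense 2-35-30-1 layout described above: ReLU (Eq. 13) on both hidden layers and a linear output neuron. It only illustrates the forward pass with randomly initialized weights. The He-style initialization, the assumption that T and P are scaled before entering the network, and all function names are hypothetical, and the batch normalization and dropout layers of the actual Keras model are omitted.

```python
import random

# Inputs (scaled T, P) -> 35 ReLU -> 30 ReLU -> 1 linear output (completion time)
LAYER_SIZES = [2, 35, 30, 1]


def relu(x):
    """Eq. (13): max(0, x)."""
    return x if x > 0.0 else 0.0


def init_dense_network(layer_sizes, seed=0):
    """Random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # He-style scaling, commonly paired with ReLU
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Dense forward pass: ReLU on hidden layers, identity on the output layer."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_output_layer = index == len(layers) - 1
        activations = z if is_output_layer else [relu(v) for v in z]
    return activations


def count_parameters(layer_sizes):
    """Number of weights plus biases in a dense network with these layer sizes."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
```

For this layout, `count_parameters(LAYER_SIZES)` gives 1216 trainable weights and biases, which is small compared with the number of kMC samples in each database.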
The ANN is then trained using the above structure, and the mean squared error (MSE) function, $S(w)$,
which is typical for regression applications, is chosen as the cost function:

$$S(w) = \frac{1}{N_D}\sum_{i=1}^{N_D}\left[y_i - f(x_i, w)\right]^2 \tag{14}$$

where $N_D$ is the number of data samples in the training dataset, $y_i$ is the desired output value, $w$ is
the weight vector for all hidden layers, and $f(x_i, w)$ is the predicted value, which depends on the input
$x_i$ and the weights $w$. The weight vector is obtained by solving an optimization problem that minimizes
the cost function $S(w)$ using standard back-propagation. Batch normalization is applied after each hidden
layer to avoid saturation and high-variance activations, thereby accelerating convergence and allowing
higher learning rates (Ioffe and Szegedy, 2015). Dropout layers with a rate of 0.5, a typical value for
hidden layers, are used; dropout approximates model averaging in the spirit of bagging, which improves the
generalization of the network and reduces over-fitting. The RMSProp optimizer is adopted for model
training; it normalizes the gradient by keeping a moving average of the squared gradient for each weight,
using the following equations:
$$E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2 \tag{15}$$

$$w_{t+1} = w_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t \tag{16}$$

where $w_t$ and $g_t$ are the weight parameter and its gradient at iteration $t$, respectively, $E[g^2]_t$ and
$E[g^2]_{t-1}$ denote the running averages of $g^2$ at iterations $t$ and $t-1$, and $\epsilon$ is a smoothing
term that avoids division by zero. A learning rate $\eta = 0.01$ and a decay factor $\gamma = 0.9$ are used;
$\gamma = 0.9$ is the value recommended for the RMSProp method (Ruder, 2016).
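The update rules in Eqs. (15) and (16) can be sketched in a few lines. The example below applies RMSProp to the MSE cost of Eq. (14), but for a hypothetical two-parameter straight-line model rather than the paper's network, so that the gradient can be written analytically. The values η = 0.01 and γ = 0.9 follow the text; ε = 1e-8, the toy data and the helper names are assumptions.

```python
import math


def mse(y_true, y_pred):
    """Eq. (14): S(w) = (1 / N_D) * sum_i (y_i - f(x_i, w))^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def linear_model(x, w):
    """Hypothetical stand-in for f(x, w): the straight line w0 + w1 * x."""
    return w[0] + w[1] * x


def mse_gradient(xs, ys, w):
    """Analytic gradient of the MSE of the linear model with respect to (w0, w1)."""
    n = len(xs)
    residuals = [y - linear_model(x, w) for x, y in zip(xs, ys)]
    g0 = -2.0 * sum(residuals) / n
    g1 = -2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    return [g0, g1]


def rmsprop_fit(xs, ys, w_init, eta=0.01, gamma=0.9, eps=1e-8, steps=3000):
    """RMSProp following Eqs. (15)-(16), applied independently to each weight."""
    w = list(w_init)
    avg_sq = [0.0] * len(w)  # E[g^2]_t, one running average per weight
    for _ in range(steps):
        g = mse_gradient(xs, ys, w)
        for j in range(len(w)):
            avg_sq[j] = gamma * avg_sq[j] + (1.0 - gamma) * g[j] ** 2   # Eq. (15)
            w[j] -= eta / math.sqrt(avg_sq[j] + eps) * g[j]             # Eq. (16)
    return w
```

With data generated from y = 2x + 1 at x in {-2, -1, 0, 1, 2}, calling `rmsprop_fit(xs, ys, [0.0, 0.0])` drives the weights close to (1, 2). Because the normalized step stays on the order of η, the iterates keep jittering slightly around the minimum rather than settling exactly on it.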
For the second database, our objective is to identify an ANN model that captures the precise input-output
relationship over a smaller operating range for real-time cycle-to-cycle optimization. However, training
with standard RMSProp faces a trade-off between model accuracy and over-fitting. Thus, BRANN is
introduced as an alternative, which adds Bayesian regularization to the standard ANN. BRANN is more
robust than standard neural networks for precise regression because the Bayesian regularization algorithm
converts a complex non-linear regression into a ridge-type regression, which is a well-posed statistical
problem. By effectively turning off weights that are not relevant in the training process and incorporating
Occam's razor principle, BRANN avoids over-fitting and over-training by optimally penalizing excessive
model complexity (Burden and Dave, 2008). In our work, the BRANN for the second database is constructed,
trained, and implemented using MATLAB's machine learning package. Specifically, the input and output
layers of the BRANN are constructed in the same way as those of the standard ANN above. For the inner
structure of the BRANN, one hidden layer with 25 neurons is constructed, and the hyperbolic tangent
sigmoid function (tansig) is used as the activation function:

$$\mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1 \tag{17}$$
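Eq. (17) is algebraically identical to tanh(x). A direct transcription overflows for large negative x, so the sketch below (a hypothetical helper, not MATLAB's implementation) switches to an equivalent form in that regime:

```python
import math


def tansig(x):
    """Eq. (17): 2 / (1 + exp(-2x)) - 1, evaluated without overflow for large |x|."""
    if x >= 0.0:
        return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0
    e = math.exp(2.0 * x)  # lies in (0, 1) for x < 0, so it cannot overflow
    return 2.0 * e / (1.0 + e) - 1.0
```

In practice `math.tanh` could be used directly; the explicit form is kept only to mirror Eq. (17).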
Additionally, the hyperparameters $\alpha$ and $\beta$ (initially chosen from a uniform random distribution) are
added to the standard cost function as follows:

$$S(w) = \beta\sum_{i=1}^{N_D}\left[y_i - f(x_i, w)\right]^2 + \alpha\sum_{j=1}^{N_w} w_j^2 \tag{18}$$

where $N_D$ is the number of data samples in the training dataset, $w$ is the weight vector for all hidden
layers, which is assumed to follow a Gaussian distribution, $N_w$ is the number of weight parameters, $w_j$ is
the $j$th entry of the weight vector, $y_i$ is the desired output value, and $f(x_i, w)$ is the predicted value,
which depends on $w$ and the input $x_i$. To compute the optimal weight vector $w$ and the continuously
updated hyperparameters $\alpha$ and $\beta$, a sequence of optimization problems is solved using the
Levenberg-Marquardt algorithm (Moré, 1978). The Bayesian inference calculation and the construction of
the optimization problems are discussed in detail by MacKay (1992) and Burden and Dave (2008).
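The text defers the Bayesian details to the cited works. As a sketch under our own assumptions, the cost in Eq. (18) and the standard evidence-framework re-estimation of the hyperparameters can be written as below, with $E_D$ and $E_W$ denoting the two sums in Eq. (18): $\alpha = \gamma/(2E_W)$ and $\beta = (N_D - \gamma)/(2E_D)$, where $\gamma = N_w - 2\alpha\,\mathrm{tr}(H^{-1})$ is the effective number of parameters and $H$ is the Hessian of $S(w)$. Computing $\gamma$ requires that Hessian, so here it is passed in directly; the function names are hypothetical.

```python
def sum_squared_errors(y_true, y_pred):
    """E_D: sum of squared prediction errors (first sum in Eq. 18)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def sum_squared_weights(weights):
    """E_W: sum of squared network weights (second sum in Eq. 18)."""
    return sum(w * w for w in weights)


def regularized_cost(y_true, y_pred, weights, alpha, beta):
    """Eq. (18): S(w) = beta * E_D + alpha * E_W."""
    return beta * sum_squared_errors(y_true, y_pred) + alpha * sum_squared_weights(weights)


def update_hyperparameters(y_true, y_pred, weights, gamma):
    """Re-estimate alpha and beta at the current weights, given the effective
    number of parameters gamma (0 <= gamma <= N_w), assumed computed elsewhere."""
    e_d = sum_squared_errors(y_true, y_pred)
    e_w = sum_squared_weights(weights)
    alpha = gamma / (2.0 * e_w)
    beta = (len(y_true) - gamma) / (2.0 * e_d)
    return alpha, beta
```

Within a Levenberg-Marquardt training loop, such updates would alternate with weight updates until $\alpha$, $\beta$ and $w$ stop changing appreciably.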
Finally, both databases are generated from the kMC model using the UCLA Hoffman2 Distributed Cluster.
The two ranges of operating conditions are: (1) T = 550 K to 700 K (∆T = 5 K) with P = 80 Pa to 160 Pa
(∆P = 2 Pa), and (2) T = 590 K to 610 K (∆T = 0.5 K) with P = 120 Pa to 150 Pa (∆P = 1 Pa). The first
set of simulation results is divided into training, validation and test data in the ratio 8:1:1, and the second
set in the ratio 7:1.5:1.5. The training dataset is used to determine the model parameters. The validation
dataset is used during training to monitor and improve the training. The test dataset is randomly selected
from the entire dataset in advance to evaluate the final result and is not used in the training process.
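The two operating grids and the random splits can be sketched as follows, assuming each grid is a full factorial design that includes both endpoints (31 × 41 = 1271 points for each database) and that the test set is drawn first, as described above. The text does not state whether every grid point was simulated, and the seed and function names are hypothetical.

```python
import random


def operating_grid(t_start, t_step, n_t, p_start, p_step, n_p):
    """Full factorial (T, P) grid; integer step counts avoid floating-point drift."""
    return [(t_start + i * t_step, p_start + j * p_step)
            for i in range(n_t) for j in range(n_p)]


# Database 1: T = 550-700 K (step 5 K), P = 80-160 Pa (step 2 Pa)
GRID_1 = operating_grid(550.0, 5.0, 31, 80.0, 2.0, 41)
# Database 2: T = 590-610 K (step 0.5 K), P = 120-150 Pa (step 1 Pa)
GRID_2 = operating_grid(590.0, 0.5, 41, 120.0, 1.0, 31)


def split_dataset(samples, val_frac, test_frac, seed=0):
    """Shuffle once, hold out the test set first, then the validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train_1, val_1, test_1 = split_dataset(GRID_1, 0.10, 0.10)  # 8:1:1
train_2, val_2, test_2 = split_dataset(GRID_2, 0.15, 0.15)  # 7:1.5:1.5
```

Rounding down the held-out counts leaves any remainder in the training set, so every grid point is used exactly once.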
3. Simulation Results
The results section is divided into three subsections. First, the kMC model is validated by comparing the
simulation results with film growth behavior reported in the literature. Then, the neural network models
are shown to capture the relationship between the cycle steady-state time and the operating conditions for
the two databases covering the feasible and optimal ranges. Specifically, the first database covers a wider
range of temperature and pressure conditions (the feasible operating range), thereby providing a general
guideline for suitable conditions under which to carry out the thermal SiO2 ALD process, whereas the
second database focuses, at higher resolution, on the range around T = 600 K and P = 133 Pa typically
employed in industry, and is used for cycle time optimization as discussed in Section 2.7. Finally, the
simulation of multi-layer SiO2 deposition demonstrates that the average ALD deposition time can be
reduced using the BRANN results, potentially allowing higher industrial throughput.
3.1. Validation of Microscopic kMC Model with Experimental Data
The kMC model is validated by observing its behavior under varying temperature over a total of ten ALD
cycles. Specifically, the precursor partial pressure is kept constant at 133 Pa, and the operating
temperature is varied from 555 K to 625 K with an increment of 20 K for each simulation run. Each
half-cycle is assigned 2 seconds, which is observed to be sufficient for the cycle to reach steady-state. The
same cycle time is reported in the experimental work by O'Neill et al. (2011), although the exact value may
differ because that work also accounts for the time of gas-phase development. The precursor partial
pressure at the substrate surface is important in determining the physisorption rate. Therefore, the coupled
gas-phase mass and momentum transfer in the reactor contributes significantly to the cycle time and film
quality, which we will address in future work on multi-scale computational fluid dynamics modeling of
thermal ALD. Additionally, as mentioned in Section 2.7, since the O-Cycle is much faster than the
Si-Cycle, we focus on the Si-Cycle results only.
During the Si-Cycle, competition between the neighbour-binding and self-binding pathways is observed. As
discussed in Section 2.3, self-binding events are kinetically favorable, while neighbour-binding events are
thermodynamically favorable. The kinetics implemented in our model capture this behavior, as shown in
Figure 5. At the beginning of each cycle, self-binding events dominate. As the simulation proceeds,
self-binding silicons undergo the reverse reaction and decrease, as shown by the solid lines, and
neighbour-binding silicons start to form, as shown by the dashed lines. If a cycle is given enough time to
develop, the fraction of neighbour-binding silicons approaches unity, while the fraction of self-binding
silicons approaches zero. Since neighbour-binding dominance is thermodynamically driven, as temperature
decreases, more self-binding silicons appear in the initial deposition stages and a longer time is required
for the lattice to develop into the desirable neighbour-binding-dominant profile, as indicated by the
direction of the dashed arrow in Figure 5. This observation is consistent with the calculations and
experimental analysis reported in Han et al. (2011).
The ALD growth rate, characterized by the average growth of film thickness per cycle (GPC), is also
successfully simulated by the kMC model. Although the kMC model does not report the GPC directly as
experimental approaches do, we use the final coverage information to compute the GPC based on literature
values of the SiO2 unit cell lattice constants: a = 5.5407 Å, b = c = 4.918 Å (El-Kareh, 2012). An
atom-to-atom measurement is performed in GaussView to find the distance between Si and O atoms within
a SiO2 cell, and the ideal layer thickness is calculated using the lattice constants and the atomic radii of
oxygen and silicon, which are 0.65 Å and 1.18 Å, respectively (El-Kareh, 2012). Then, the GPC is inferred
from the simulated surface coverage and the ideal layer thickness. The simulated GPC at the standard
industrial operating condition of 600 K and 133 Pa is shown in Figure 6. The GPC varies little with
increasing cycle number but shows a slightly decreasing pattern, as reported in experimental works
(George et al., 1996). The average growth rate over 10 cycles is 1.8 Å per cycle, which lies within the
range of SiO2 GPC of 1.4-2.1 Å per cycle reported by O'Neill et al. (2011).
Although the temperature does not impact GPC when sufficient time is given for each cycle to
reach steady-state and achieve full coverage, it is a crucial factor for the transient deposition rate
within each cycle. Figure 7 demonstrates that the transient deposition rate increases as temperature
increases, and the rate increment is approximately proportional to the temperature increment.
This effect is observed in experimental results from the work by Putkonen et al. (2014), where the
deposition rate is demonstrated to increase with temperature for unsaturated surfaces. Moreover,
it is noteworthy that not all temperatures allow the surface to reach full coverage at steady-state.
The selection of an appropriate temperature region will be introduced in the next section.
3.2. ANN Results for Si-Cycle

3.2.1. Feasible Operating Regime
In this subsection, kMC simulations are carried out over a wide range of operating conditions, with
temperature and pressure held fixed throughout each simulation. Each simulation is terminated either when
steady-state is achieved under the given conditions or when the simulation time exceeds 5 seconds, which is
too long to be industrially relevant (Acton, 2012). Due to the stochastic nature of the kMC algorithm, the
lattice surface configuration keeps changing at steady-state. However, the overall coverage at steady-state,
which is one of the most crucial attributes of ALD processes, remains at a certain value, with fluctuations
under 0.5%. The ANN model is then developed to capture the relationship between the time to achieve 98%
of the final coverage and the operating temperature and pressure. In the ANN model, the time for the
system to reach 98% of the final coverage is used instead of the time to reach the final coverage, to reduce
the noise and inaccuracy introduced by the steady-state fluctuations.
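Extracting the 98% completion time from a noisy kMC coverage trajectory might look like the sketch below. Estimating the final coverage as the mean of the last few samples is our assumption (the text does not say how the final value is obtained), as are the function name and the default window length.

```python
def time_to_fraction_of_final(times, coverage, fraction=0.98, tail=10):
    """Return the first time at which coverage reaches `fraction` of its final value.
    The final (steady-state) coverage is estimated as the mean of the last `tail`
    samples, which smooths out the steady-state fluctuations of the kMC run."""
    if not times or len(times) != len(coverage):
        raise ValueError("times and coverage must be non-empty and of equal length")
    tail = min(tail, len(coverage))
    final = sum(coverage[-tail:]) / tail
    target = fraction * final
    for t, c in zip(times, coverage):
        if c >= target:
            return t
    # Defensive only: for non-negative coverage, target <= max of the tail samples.
    return None
```

Averaging over a window before applying the 98% threshold keeps a single high or low fluctuation at the end of the run from shifting the extracted completion time.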
Two hidden layers are used for the ANN model, and the numbers of neurons are determined to be 35 and 30
for the first and second hidden layers, respectively, based on a grid search. A single-hidden-layer
feed-forward ANN does not yield a good solution, and its performance cannot be improved by simply adding
more neurons, since over-fitting is observed. Therefore, a two-hidden-layer structure, which is
conventionally adopted, is chosen to capture the exponential-like behaviour in our model. The mean
absolute error on the test dataset is $8.00 \times 10^{-3}$ s. As shown in Figure 8 (a,b), the ANN model
achieves the desired performance on the test dataset. In Figure 9 (a), the error distribution histogram
shows a nearly normal distribution of the error between predicted and actual completion times, with a
mean close to zero. Additionally, in Figure 9 (b), the R-squared value between the predicted and simulated
steady-state times is 0.979, demonstrating that the neural network closely reproduces the simulation
results.
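The two accuracy measures quoted here can be computed as in the following sketch. We assume R-squared is the coefficient of determination, 1 - SS_res/SS_tot; the text does not specify whether a squared Pearson correlation was used instead, and the two can differ for a model that is not an ordinary least-squares fit.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between simulated and predicted completion times."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0.0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot
```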
Moreover, in Figure 10, a region of unsuitable operating conditions is identified where either steady-state
is not reached within 4 seconds or full coverage is not achieved at steady-state. For example, at T = 655 K
and P = 86 Pa, although steady-state is reached at 0.33 seconds, only 95% of the surface is covered.
Additionally, at T = 545 K and P = 112 Pa, the surface does not reach steady-state within 4 seconds. The
maximum allowable time to steady-state is set to 3.5 seconds for industrial usefulness. Typically, a SiO2
ALD half-cycle takes up to 5 seconds including the precursor flushing stage, which allows the substrate
surface to be fully saturated with precursor. Although the duration of precursor flushing may vary with
reactor geometry, 1 to 1.5 seconds is the usual lower limit in a typical ALD reactor design, to prevent
damage to the reactor caused by high precursor flow rate and pressure (Acton, 2012). This result
demonstrates that, to achieve good coverage, the contributions of both temperature and pressure should be
accounted for. Since the optimal operating conditions might not be easily achieved or needed, the ANN
model above can be used as an initial guideline to determine feasible pressure and temperature conditions
for operating the ALD process in experiments and industrial production.
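The screening rule implied by Figure 10 can be expressed as a simple predicate. The 3.5 s limit comes from the text; the 0.99 coverage threshold standing in for "full coverage" and the function name are hypothetical, chosen only so that the 95% case above is rejected.

```python
MAX_TIME_TO_STEADY_STATE_S = 3.5  # industrial limit stated in the text
FULL_COVERAGE_THRESHOLD = 0.99    # hypothetical cutoff for "full coverage"


def is_feasible(time_to_steady_state_s, steady_state_coverage,
                max_time=MAX_TIME_TO_STEADY_STATE_S,
                min_coverage=FULL_COVERAGE_THRESHOLD):
    """An operating point is feasible if steady state is reached within max_time
    and the steady-state coverage is at least min_coverage. A time of None means
    steady state was never reached during the simulation."""
    if time_to_steady_state_s is None:
        return False
    return time_to_steady_state_s <= max_time and steady_state_coverage >= min_coverage
```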
3.2.2. Optimal Operating Conditions
In this subsection, we focus on a narrower range of operating conditions with higher data resolution. For
this dataset, one hidden layer with 25 neurons is chosen for the BRANN, with the number of neurons
determined via a grid search. A single-layer BRANN is able to accurately capture the complex
non-linearity thanks to its regularization algorithm. The trained neural network achieves the desired
performance, with a mean absolute error on the test data of $5.01 \times 10^{-3}$ s. In Figure 9 (d), the
statistical analysis shows that the neural network accurately predicts the simulation data: the R-squared
value between the predicted and simulated steady-state times is 0.99, reflected in an almost linear
correlation plot. Moreover, as shown in Figure 9 (c), although limited by the number of data points in the
test data, the error distribution is close to a normal distribution, which further supports the model fit.
According to the results in Section 3.2.1, the operating conditions chosen in this section guar-
antee full coverage of the newly deposited surface at steady-state within 3.5 seconds. Under these
operating conditions, the time necessary to achieve steady-state ranges from 0.31 seconds to 1.33
seconds, and a higher temperature and pressure will reduce the time to reach steady-state. The
trained BRANN accurately predicts the time to reach steady-state and full coverage from given
pressure and temperature inputs, which will be used for cycle time optimization in the next section.
3.3. Cycle Time Optimization for Multi-Cycle SiO2 ALD
Although the neural network trained on the dataset for the small range of operating conditions in Section
3.2.2 is based on single-cycle simulations, it is shown to be applicable for predicting the cycle completion
time in multi-cycle simulations when all layers are close to full coverage. In a typical industrial setting,
the cycle time for the entire deposition process is fixed at given operating conditions. However, the cycle
time can be reduced using the cycle completion times predicted by the BRANN for given temperature and
pressure inputs. Such optimization is demonstrated by performing two sets of five-cycle simulations.
Specifically, five different pairs of temperature and pressure are chosen, one for each cycle, as shown in
Table 2. In the first set, each cycle is given a fixed duration of 3.5 seconds, as discussed in Section 3.2.2,
to be industrially practical. In contrast, in the second set, each cycle is given the steady-state time
predicted by the ANN for its operating conditions. Within each cycle, temperature and pressure are kept
constant. Both sets result in similar five-layer SiO2 thin films with almost full coverage (> 99.9%).
Therefore, by optimizing the operating time from cycle to cycle using the ANN model, we can reduce the
surface deposition time by 60%, ignoring the gas-phase development time, which will be analyzed in our
future work on multi-scale CFD modeling. In addition, the cycle-time calculation from the ANN is almost
instantaneous, much faster than the kMC model, which needs approximately one hour to simulate a cycle.
Moreover, as the film grows thicker, the kMC model becomes even slower. This reduction in calculation
time is essential for cycle-to-cycle optimization and real-time control.
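The percentage saving reported above is a ratio of total surface-deposition times. A minimal sketch, assuming the per-cycle times come from the BRANN predictions and ignoring gas-phase time as the text does (the function name is hypothetical):

```python
def deposition_time_reduction(predicted_cycle_times_s, fixed_cycle_time_s=3.5):
    """Fractional reduction in total surface-deposition time when each cycle uses
    its predicted completion time instead of the fixed duration (gas-phase time
    is ignored)."""
    fixed_total = fixed_cycle_time_s * len(predicted_cycle_times_s)
    optimized_total = sum(predicted_cycle_times_s)
    return 1.0 - optimized_total / fixed_total
```

Because the fixed schedule uses the same duration for every cycle, the saving depends only on the average predicted cycle time relative to 3.5 s.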
4. Conclusions
In this work, we developed an integrated framework for first-principles-based microscopic model-
ing, data-driven modeling and optimal operation of thermal atomic layer deposition (ALD) of SiO2
thin-films using bis(tertiary-butylamino)silane (BTBAS) and ozone as precursors. The performance
of the 3D kMC model for the SiO2 ALD process using fundamental chemical properties from DFT
calculations was corroborated by experimentally reported data. The thermodynamic-kinetic competing
reaction mechanism was reproduced, and a GPC of 1.8 Å per cycle under the corresponding operating
conditions lay within the range of experimental growth rates. Then, two ANN-based data-driven models
for different ranges of operating conditions were constructed. The derived ANN models enabled us
to predict the time to achieve steady-state film growth at given temperatures and precursor
partial pressures. Specifically, the ANN model based on the dataset of the large operating range
predicted a feasible boundary of operating conditions, and the ANN model based on the dataset of the
small range can be utilized in cycle time optimization to reduce the Si-Cycle time by 60%, neglecting
the gas-phase development time. As a result, the overall approach holds promise
for developing an accurate general model for SiO2 ALD with only precursor thermodynamic and
kinetic properties, seeking proper operating conditions while increasing industrial throughput, and
significantly saving time and resources that would otherwise have been spent on the testing and
manufacturing of physical reaction chambers. Moreover, the generality of the modeling approach
in this work makes it possible to extend the current model to the thermal ALD of other thin-film
materials with different precursors.
5. Acknowledgments
Financial support from the National Science Foundation is gratefully acknowledged.
Literature Cited
Acton, Q.A., 2012. Chemical Processes-Advances in Research and Application: 2012 Edition:
ScholarlyBrief. ScholarlyEditions.
Baek, S.B., Kim, D.H., Kim, Y.C., 2012. Adsorption and surface reaction of bis-diethylaminosilane
as a Si precursor on an OH-terminated Si(001) surface. Applied Surface Science 258, 6341–6344.
Battaile, C.C., Srolovitz, D.J., 2002. Kinetic Monte Carlo simulation of chemical vapor deposition.
Annual Review of Materials Research 32, 297–319.
Becke, A.D., 1993. Density-functional thermochemistry. III. The role of exact exchange. The Journal
of Chemical Physics 98, 5648–5652.
Burden, F., Dave, W., 2008. Bayesian regularization of neural networks, in: Artificial neural
networks: Methods and Applications. Springer, pp. 23–42.
Chaffart, D., Ricardez-Sandoval, L.A., 2017. Robust dynamic optimization in heterogeneous
multiscale catalytic flow reactors using polynomial chaos expansion. Journal of Process Control 60,
128–140.
Chaffart, D., Ricardez-Sandoval, L.A., 2018. Optimization and control of a thin film growth
process: A hybrid first principles/artificial neural network based multiscale modelling approach.
Computers & Chemical Engineering 119, 465–479.
Christofides, P.D., Armaou, A., Lou, Y., Varshney, A., 2008. Control and optimization of multiscale
process systems. Springer Science & Business Media.
Cortright, R.D., Dumesic, J.A., 2001. Kinetics of heterogeneous catalytic reactions: Analysis of
reaction schemes. Advances in Catalysis 46.
Crose, M., Zhang, W., Tran, A., Christofides, P.D., 2018. Multiscale three-dimensional CFD
modeling for PECVD of amorphous silicon thin films. Computers & Chemical Engineering 113, 184–195.
Dasgupta, N.P., Li, L., Sun, X., 2016. Atomic layer deposition for energy and environmental
applications. Advanced Materials Interfaces 3.
Dingemans, G., Helvoirt, C.A.A.V., Pierreux, D., Keuning, W., Kessels, W.M.M., 2012. Plasma-assisted
ALD for the conformal deposition of SiO2: Process, material and electronic properties.
Journal of the Electrochemical Society 159, H277–H285.
Djurabekova, F.G., Domingos, R., Cerchiara, G., Castin, N., Vincent, E., Malerba, L., 2007. Artificial
intelligence applied to atomistic kinetic Monte Carlo simulations in Fe–Cu alloys. Nuclear
Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and
Atoms 255, 8–12.
Dkhissi, A., Estève, A., Mastail, C., Olivier, S., Mazaleyrat, G., Jeloaica, L., Djafari-Rouhani, M.,
2008. Multiscale modeling of the atomic layer deposition of HfO2 thin film grown on silicon: How
to deal with a kinetic Monte Carlo procedure. Journal of Chemical Theory and Computation 4,
1915–1927.
El-Kareh, B., 2012. Fundamentals of semiconductor processing technology. Springer Science &
Business Media.
Elliott, S.D., Greer, J.C., 2004. Simulating the atomic layer deposition of alumina from first prin-
ciples. Journal of Materials Chemistry 14, 3246–3250.
Foong, T.R.B., Shen, Y., Hu, X., Sellinger, A., 2010. Template-directed liquid ALD growth of
TiO2 nanotube arrays: Properties and potential in photovoltaic devices. Advanced Functional
Materials 20, 1390–1396.
Frisch, A., Nielsen, A.B., Holder, A.J., 2000. GaussView User Manual. Gaussian Inc., Pittsburgh,
PA 556.
George, S.M., 2009. Atomic layer deposition: An overview. Chemical Reviews 110, 111–131.
George, S.M., Ott, A.W., Klaus, J.W., 1996. Surface chemistry for atomic layer growth. The
Journal of Physical Chemistry 100, 13121–13131.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Han, B., Zhang, Q., Wu, J., Han, B., Karwacki, E.J., Derecskei, A., Xiao, M., Lei, X., O'Neill, M.L.,
Cheng, H., 2011. On the mechanisms of SiO2 thin-film growth by the full atomic layer deposition
process using bis(t-butylamino)silane on the hydroxylated SiO2 (001) surface. The Journal of
Physical Chemistry C 116, 947–952.
Huang, J., Hu, G., Orkoulas, G., Christofides, P.D., 2010a. Dependence of film surface roughness and
slope on surface migration and lattice size in thin film deposition processes. Chemical Engineering
Science 65, 6101–6111.
Huang, J., Hu, G., Orkoulas, G., Christofides, P.D., 2010b. Dynamics and lattice-size dependence
of surface mean slope in thin-film deposition. Industrial & Engineering Chemistry Research 50,
1219–1230.
Huang, L., Han, B., Han, B., Derecskei-Kovacs, A., Xiao, M., Lei, X., O'Neill, M.L., Pearlstein,
R.M., Chandra, H., Cheng, H., 2013. First-principles study of a full cycle of atomic layer
deposition of SiO2 thin films with di(sec-butylamino)silane and ozone. The Journal of Physical
Chemistry C 117, 19454–19463.
Ikegawa, M., Kobayashi, J., 1989. Deposition profile simulation using the direct simulation Monte
Carlo method. Journal of the Electrochemical Society 136, 2982–2986.
Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing
internal covariate shift, in: Proceedings of the 32nd International Conference on Machine Learning.
Ishikawa, K., Karahashi, K., Ichiki, T., Chang, J.P., George, S.M., Kessels, W., Lee, H.J., Tinck,
S., Um, J.H., Kinoshita, K., 2017. Progress and prospects in nanoscale dry processes: How can
we control atomic layer reactions? Japanese Journal of Applied Physics 56, 06HA02.
Kääriäinen, T., Cameron, D., Kääriäinen, M.L., Sherman, A., 2013. Atomic layer deposition:
principles, characteristics, and nanotechnology applications. John Wiley & Sons.
Kamiyama, S., Mira, T., Nara, Y., 2006. Comparison between SiO2 films deposited by atomic layer
deposition with SiH2[N(CH3)2]2 and SiH[N(CH3)2]3 precursors. Thin Solid Films 515, 1517–1521.
Kimaev, G., Ricardez-Sandoval, L.A., 2017. A comparison of efficient uncertainty quantification
techniques for stochastic multiscale systems. AIChE Journal 63, 3361–3373.
Kinoshita, Y., Hirose, F., Miya, H., Hirahara, K., Kimura, Y., Niwano, M., 2007. Infrared study of
tris(dimethylamino)silane adsorption and ozone irradiation on Si(100) surfaces for ALD of SiO2.
Electrochemical and Solid-State Letters 10, G80–G83.
Knoops, H.C.M., Langereis, E., Van De Sanden, M.C.M., Kessels, W.M.M., 2010. Conformality of
plasma-assisted ALD: physical processes and modeling. Journal of The Electrochemical Society
157, G241–G249.
Kwon, J.S.I., Nayhouse, M., Orkoulas, G., Ni, D., Christofides, P.D., 2015a. A method for handling
batch-to-batch parametric drift using moving horizon estimation: Application to run-to-run MPC
of batch crystallization. Chemical Engineering Science 127, 210–219.
Kwon, J.S.I., Nayhouse, M., Orkoulas, G., Ni, D., Christofides, P.D., 2015b. Run-to-run-based
model predictive control of protein crystal shape in batch crystallization. Industrial & Engineering
Chemistry Research 54, 4293–4302.
Lee, C., Yang, W., Parr, R.G., 1988. Development of the Colle-Salvetti correlation-energy formula
into a functional of the electron density. Physical Review B 37, 785.
Li, J., We, J., Zhou, C., Han, B., Karwacki, E.J., Xiao, M., Lei, X., Cheng, H., 2009. On the
dissociative chemisorption of tris(dimethyl-amino)silane on hydroxylated SiO2 (001) surface. Journal
of Physical Chemistry C 113, 9731–9736.
Lou, Y., Christofides, P.D., 2004. Feedback control of surface roughness of GaAs(001) thin films
using kinetic Monte Carlo models. Computers & Chemical Engineering 29, 225–241.
MacKay, D.J.C., 1992. Bayesian interpolation. Neural Computation 4, 415–447.
Mankad, V., Jhu, P.K., 2016. First-principles study of water adsorption on α-SiO2 (110) surface.
AIP Advances 6, 085001.
Mhaskar, P., Garg, A., Corbett, B., 2018. Modeling and Control of Batch Processes: Theory and
Application. Springer Science & Business Media.
Momma, K., Izumi, F., 2011. VESTA 3 for three-dimensional visualization of crystal, volumetric and
morphology data. Journal of Applied Crystallography 44, 1272–1276.
Moré, J.J., 1978. The Levenberg-Marquardt algorithm: implementation and theory, in: Numerical
Analysis. Springer, pp. 105–116.
Murray, C.A., Elliott, S.D., Hausmann, D., Henri, J., LaVoie, A., 2014. Effect of reaction mechanism
on precursor exposure time in atomic layer deposition of silicon oxide and silicon nitride. ACS
Applied Materials & Interfaces 6, 10534–10541.
Nalwa, H. (Ed.), 2002. Handbook of Thin Films. volume 1. Academic Press, Burlington.
Nayhouse, M., Kwon, J.S.I., Christofides, P.D., Orkoulas, G., 2013. Crystal shape modeling and
control in protein crystal growth. Chemical Engineering Science 87, 216–223.
Nicolas, C., Lorenzo, M., 2010. Calculation of proper energy barriers for atomistic kinetic Monte
Carlo simulations on rigid lattice with chemical and strain field long-range effects using artificial
neural networks. The Journal of Chemical Physics 132, 074507.
Nishiguchi, T., Nonaka, H., Ichimura, S., Morikawa, Y., Kekura, M., Miyamoto, M., 2002. High-quality
SiO2 film formation by highly concentrated ozone gas at below 600 °C. Applied Physics
Letters 81, 2190–2192.
Oh, S.K., Lee, J.M., 2016. Iterative learning model predictive control for constrained multivariable
control of batch processes. Computers & Chemical Engineering 93, 284–292.
O'Neill, M.L., Bowen, H.R., Derecskei-Kovacs, A., Cuthill, K.S., Han, B., Xiao, M., 2011. Impact of
aminosilane precursor structure on silicon oxides by atomic layer deposition. The Electrochemical
Society Interface 20, 33–37.
Prechtl, G., Kersch, A., Icking-Konert, G.S., Jacobs, W., Hecht, T., Boubekeur, H., Schröder, U.,
2003. A model for Al2O3 ALD conformity and deposition rate from oxygen precursor reactivity,
in: Proceedings of the International Electron Devices Meeting, Washington, DC, USA.
Putkonen, M., Bosund, M., Ylivaara, O.M., Puurunen, R.L., Kilpi, L., Ronkainen, H., Sintonen,
S., Ali, S., Lipsanen, H., Liu, X., 2014. Thermal and plasma enhanced atomic layer deposition
of SiO2 using commercial silicon precursors. Thin Solid Films 558, 93–98.
Rasoulian, S., Ricardez-Sandoval, L.A., 2014. Uncertainty analysis and robust optimization of
multiscale process systems with application to epitaxial thin film growth. Chemical Engineering
Science 116, 590–600.
Rasoulian, S., Ricardez-Sandoval, L.A., 2015a. Robust multivariable estimation and control in an
epitaxial thin film growth process under uncertainty. Journal of Process Control 34, 70–81.
Rasoulian, S., Ricardez-Sandoval, L.A., 2015b. A robust nonlinear model predictive controller for
a multiscale thin film deposition process. Chemical Engineering Science 136, 38–49.
Rasoulian, S., Ricardez-Sandoval, L.A., 2016. Stochastic nonlinear model predictive control applied
to a thin film deposition process under uncertainty. Chemical Engineering Science 140, 90–103.
Rey, J.C., Cheng, L., McVittie, J.P., Saraswat, K.C., 1991. Monte Carlo low pressure deposition
profile simulations. Journal of Vacuum Science & Technology A 9, 1083–1087.
Ruder, S., 2016. An overview of gradient descent optimization algorithms. CoRR abs/1609.04747.
Schuegraf, K., Abraham, M.C., Brand, A., Naik, M., Thakur, R., 2013. Semiconductor logic
technology innovation to achieve sub-10 nm manufacturing. IEEE Journal of the Electron Devices
Society 1, 66–75.
Schwille, M.C., Schössler, T., Barth, J., Knaut, M., Schön, F., Höchst, A., Oettel, M., Bartha, J.,
2017a. Experimental and simulation approach for process optimization of atomic layer deposited
thin films in high aspect ratio 3D structures. Journal of Vacuum Science & Technology A:
Vacuum, Surfaces, and Films 35, 01B118.
Schwille, M.C., Schössler, T., Schön, F., Oettel, M., Bartha, J.W., 2017b. Temperature dependence
of the sticking coefficients of bis-diethyl aminosilane and trimethylaluminum in atomic layer
deposition. Journal of Vacuum Science & Technology A: Vacuum, Surfaces, and Films 35, 01B119.
Shirazi, M., Elliott, S.D., 2014. Atomistic kinetic Monte Carlo study of atomic layer deposition
derived from density functional theory. Journal of Computational Chemistry 35, 244–259.
Singh Sidhu, H., Siddhamshetty, P., Kwon, J., 2018. Approximate dynamic programming
based control of proppant concentration in hydraulic fracturing. Mathematics 6, 132.
doi:10.3390/math6080132.
Svozil, D., Kvasnicka, V., Pospichal, J., 1997. Introduction to multi-layer feed-forward neural
networks. Chemometrics and Intelligent Laboratory Systems 39, 43–62.
Tanner, C.M., Perng, Y.C., Frewin, C., Saddow, S.E., Chang, J.P., 2007. Electrical performance
of Al2O3 gate dielectric films deposited by atomic layer deposition on 4H-SiC. Applied Physics
Letters 91, 203510.
Wang, Y., Gao, F., Doyle, F.J., 2009. Survey on iterative learning control, repetitive control, and
run-to-run control. Journal of Process Control 19, 1589–1600.
List of Figures
1 (a) Top view of the hydroxylated SiO2 (001) surface. (b) Side view of the hydroxylated
SiO2 (001) surface, where O1 is the more electronegative oxygen. The double bonds
are an artifact of the Gaussian display format and do not affect the validity of the
structure.
2 Top view of a 5-layer 12×12-site miniature demonstration of the full kinetic Monte-
Carlo simulation lattice. The five layers and the species on the lattice are shown
using different colors and symbols, respectively. The first (bottom) layer, labeled
red, contains the base Si atoms. The second layer, labeled black, contains oxygen
atoms or hydrogenated oxygens. The third layer, labeled yellow, contains the species
from the first silicon half-cycle: Si is the neighbour-binding silicon, Si! is the self-
binding silicon, and PsP and CsP are the physisorbed and chemisorbed precursors,
respectively. The fourth layer, labeled green, contains the species from the first
oxygen half-cycle: O and OH are the oxygen atoms and hydrogenated oxygens. The
fifth (top) layer, labeled blue, contains physisorbed ozones (PO1 and PO2), which
remain to be oxidized.
3 (a) First dissociative chemisorption step of BTBAS. (b) Second dissociative chemisorp-
tion step of BTBAS under self-binding and neighbour-binding mechanisms. (c) Ox-
idation of self-binding and neighbour-binding SiH2 with ozone.
4 Feed-forward Artificial Neural Network with two inputs, two hidden layers, and one
output.
5 Competition between self-binding and neighbour-binding silicons as a function of the
normalized cycle time. The dashed lines represent the fraction of neighbour-binding
silicons among all deposited silicons, and the solid lines represent the fraction of
self-binding silicons among all deposited silicons. The dashed arrows indicate the
direction of decreasing temperature.
6 Steady-state GPC for the first ten cycles at 600 K and 133 Pa, where the dashed
lines represent the upper and lower GPC limits reported in the literature (Putkonen
et al., 2014).
7 Comparison of initial transient deposition profiles for different temperatures at 133 Pa.
8 Dependence of the time to reach steady-state on the operating conditions, where black
markers (dots) represent the training data and the surface represents the fitting result:
(a) Large-range operating condition fitting. (b) Small-range operating condition fitting.
9 Performance of the Artificial Neural Network models: (a) Prediction error distribu-
tion histogram for the large-range operating conditions. (b) Correlation accuracy
of the predicted time and the actual time for the large-range operating conditions,
where the x-axis is the predicted time to reach steady-state from the neural network
and the y-axis is the actual time to reach steady-state from the kMC model. (c)
Prediction error distribution histogram for the small-range operating conditions. (d)
Correlation accuracy of the predicted time and the actual time for the small-range
operating conditions, where the x-axis is the predicted time to reach steady-state
from the neural network and the y-axis is the actual time to reach steady-state from
the kMC model.
10 Range of operating conditions where the deposited films are not able to reach full
coverage at steady-state.
Figure 1: (a) Top view of the hydroxylated SiO2 (001) surface. (b) Side view of the hydroxylated SiO2 (001) surface,
where O1 is the more electronegative oxygen. The double bonds are an artifact of the Gaussian display format and do
not affect the validity of the structure.
Figure 2: Top view of a 5-layer 12×12-site miniature demonstration of the full kinetic Monte-Carlo simulation lattice.
The five layers and the species on the lattice are shown using different colors and symbols, respectively. The first
(bottom) layer, labeled red, contains the base Si atoms. The second layer, labeled black, contains oxygen atoms or
hydrogenated oxygens. The third layer, labeled yellow, contains the species from the first silicon half-cycle: Si is
the neighbour-binding silicon, Si! is the self-binding silicon, and PsP and CsP are the physisorbed and chemisorbed
precursors, respectively. The fourth layer, labeled green, contains the species from the first oxygen half-cycle: O
and OH are the oxygen atoms and hydrogenated oxygens. The fifth (top) layer, labeled blue, contains physisorbed
ozones (PO1 and PO2), which remain to be oxidized.
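The layered lattice in Figure 2 can be pictured as a 3D integer array indexed by (layer, row, column), holding one species code per site. The sketch below is a minimal illustration under stated assumptions: the integer species codes and the helper names `SPECIES`, `make_lattice` and `silicon_binding_fractions` are hypothetical and are not taken from the authors' implementation. It also shows how the neighbour-binding and self-binding fractions plotted in Figure 5 could be counted from the third (silicon half-cycle) layer.

```python
import numpy as np

# Hypothetical species codes; the paper's internal encoding is not given here.
SPECIES = {
    "EMPTY": 0,
    "SI_BASE": 1,   # layer 1: base Si atoms
    "O": 2,         # oxygen atom
    "OH": 3,        # hydrogenated oxygen
    "SI_NEIGH": 4,  # neighbour-binding silicon ("Si" in Figure 2)
    "SI_SELF": 5,   # self-binding silicon ("Si!" in Figure 2)
    "PSP": 6,       # physisorbed precursor
    "CSP": 7,       # chemisorbed precursor
    "PO1": 8,       # physisorbed ozone, first type
    "PO2": 9,       # physisorbed ozone, second type
}

N_LAYERS, N_SIDE = 5, 12  # the 5-layer, 12x12-site miniature lattice of Figure 2


def make_lattice(n_layers=N_LAYERS, n_side=N_SIDE):
    """Return a lattice indexed as lattice[layer, row, col], with layer 0 at the bottom."""
    lattice = np.zeros((n_layers, n_side, n_side), dtype=np.int8)
    lattice[0, :, :] = SPECIES["SI_BASE"]  # first (bottom) layer holds the base Si atoms
    return lattice


def silicon_binding_fractions(lattice, si_layer=2):
    """Fractions of neighbour- and self-binding Si among all Si sites in one layer."""
    layer = lattice[si_layer]
    n_neigh = int(np.count_nonzero(layer == SPECIES["SI_NEIGH"]))
    n_self = int(np.count_nonzero(layer == SPECIES["SI_SELF"]))
    total = n_neigh + n_self
    if total == 0:
        return 0.0, 0.0
    return n_neigh / total, n_self / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lat = make_lattice()
    # Random filling for illustration only; a kMC run would set these sites event by event.
    lat[1] = rng.choice([SPECIES["O"], SPECIES["OH"]], size=(N_SIDE, N_SIDE))
    lat[2] = rng.choice([SPECIES["SI_NEIGH"], SPECIES["SI_SELF"], SPECIES["EMPTY"]],
                        size=(N_SIDE, N_SIDE))
    print(silicon_binding_fractions(lat))
```

A dense array keeps neighbour lookups (row ± 1, column ± 1, layer ± 1) cheap, which is what a lattice kMC step needs when checking whether a neighbour-binding or self-binding pathway is available at a site.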
Figure 3: (a) First dissociative chemisorption step of BTBAS. (b) Second dissociative chemisorption step of BTBAS
under self-binding and neighbour-binding mechanisms. (c) Oxidation of self-binding and neighbour-binding SiH2
with ozone.
Figure 4: Feed-forward Artificial Neural Network with two inputs, two hidden layers, and one output.
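The architecture in Figure 4 (two inputs, two hidden layers, one output) can be sketched as a plain NumPy forward pass. This is a minimal sketch under assumptions: the two inputs are taken to be scaled operating conditions such as temperature and pressure (consistent with Table 2), ReLU hidden activations and a linear output are assumed, and the hidden widths and the helper names `init_params` and `forward` are hypothetical. Training (for example with Levenberg-Marquardt or Bayesian regularization, as cited in the references) is omitted.

```python
import numpy as np


def relu(x):
    """Rectified linear unit max(0, x), applied elementwise."""
    return np.maximum(0.0, x)


def init_params(n_in=2, n_h1=10, n_h2=10, n_out=1, seed=0):
    """Random weights and zero biases for a dense two-hidden-layer network.

    The hidden widths (10 and 10) are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_in, n_h1, n_h2, n_out]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialisation, a common choice for ReLU layers
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params


def forward(params, X):
    """Map scaled inputs X, shape (n_samples, 2), to predictions, shape (n_samples, 1)."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)  # hidden layers 1 and 2
    W_out, b_out = params[-1]
    return h @ W_out + b_out  # linear output unit for regression of the time


if __name__ == "__main__":
    params = init_params()
    # Placeholder inputs: (temperature, pressure) pairs already scaled to [0, 1]
    X = np.array([[0.5, 0.5], [0.2, 0.8]])
    print(forward(params, X).shape)  # (2, 1)
```

Scaling the inputs matters here because raw temperatures (hundreds of K) and pressures (hundreds of Pa) would otherwise dominate the initial pre-activations.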
Figure 5: Competition between self-binding and neighbour-binding silicons as a function of the normalized cycle
time. The dashed lines represent the fraction of neighbour-binding silicons among all deposited silicons, and the
solid lines represent the fraction of self-binding silicons among all deposited silicons. The dashed arrows indicate
the direction of decreasing temperature.
[Figure 6 plot: GPC (Å·cycle⁻¹) versus cycle number (cycles 1–10).]
Figure 6: Steady-state GPC for the first ten cycles at 600 K and 133 Pa, where the dashed lines represent the upper
and lower GPC limits reported in the literature (Putkonen et al., 2014).
[Figure 7 plot: coverage fraction versus time (s, 0–0.25) at 555 K, 575 K, 605 K and 625 K.]
Figure 7: Comparison of initial transient deposition profiles for different temperatures at 133 Pa.
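The transient profiles in Figure 7 are what the "time to reach steady-state" used in Figures 8 and 9 is read from. One simple way to extract such a time from a coverage trajectory is to take the first time after which coverage stays at or above a threshold. This is a hedged sketch: the threshold of 0.99, the helper name `time_to_coverage` and the toy exponential curve are assumptions for illustration, not the paper's criterion or kMC output.

```python
import numpy as np


def time_to_coverage(times, coverage, threshold=0.99):
    """First time after which coverage stays at or above `threshold`.

    `threshold` is a hypothetical completion criterion. Returns None if the
    trajectory is still below the threshold at its final point.
    """
    times = np.asarray(times, dtype=float)
    coverage = np.asarray(coverage, dtype=float)
    below = np.nonzero(coverage < threshold)[0]
    if below.size == 0:
        return float(times[0])  # already complete at the first sample
    last_below = below[-1]
    if last_below == coverage.size - 1:
        return None  # never settles above the threshold
    return float(times[last_below + 1])


if __name__ == "__main__":
    # Toy saturating curve for illustration only (not kMC output)
    t = np.linspace(0.0, 0.25, 251)
    cov = 1.0 - np.exp(-t / 0.03)
    print(time_to_coverage(t, cov))
```

Using the last sample below the threshold, rather than the first sample above it, avoids reporting completion too early when stochastic kMC coverage fluctuates around the threshold.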
Figure 8: Dependence of the time to reach steady-state on the operating conditions, where black markers (dots)
represent the training data and the surface represents the fitting result: (a) Large-range operating condition fitting.
(b) Small-range operating condition fitting.
Figure 9: Performance of the Artificial Neural Network models: (a) Prediction error distribution histogram for the
large-range operating conditions. (b) Correlation accuracy of the predicted time and the actual time for the large-
range operating conditions, where the x-axis is the predicted time to reach steady-state from the neural network and
the y-axis is the actual time to reach steady-state from the kMC model. (c) Prediction error distribution histogram
for the small-range operating conditions. (d) Correlation accuracy of the predicted time and the actual time for
the small-range operating conditions, where the x-axis is the predicted time to reach steady-state from the neural
network and the y-axis is the actual time to reach steady-state from the kMC model.
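The two kinds of panels in Figure 9 (an error histogram and a predicted-versus-actual correlation) can be computed from paired predictions as sketched below. The sign convention of the error (kMC minus ANN), the bin count and the helper name `prediction_diagnostics` are assumptions; the sketch reports both the Pearson correlation r and the coefficient of determination R², because a model can be perfectly correlated with the kMC times while still being biased.

```python
import numpy as np


def prediction_diagnostics(t_pred, t_kmc, n_bins=20):
    """Error histogram and correlation between ANN-predicted and kMC times.

    Errors are defined here as t_kmc - t_pred (an assumed sign convention).
    """
    t_pred = np.asarray(t_pred, dtype=float)
    t_kmc = np.asarray(t_kmc, dtype=float)
    errors = t_kmc - t_pred
    counts, bin_edges = np.histogram(errors, bins=n_bins)  # panels (a) and (c)
    r = np.corrcoef(t_pred, t_kmc)[0, 1]  # Pearson correlation, panels (b) and (d)
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((t_kmc - t_kmc.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination of the predictions
    return {"errors": errors, "counts": counts, "bin_edges": bin_edges,
            "r": r, "r2": r2}
```

For example, predictions that are exactly half the kMC times give r = 1 but a negative R², which a correlation plot alone would not reveal.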
Figure 10: Range of operating conditions where the deposited films are not able to reach full coverage at steady-state.
List of Tables
1 Activation energies and partition function ratios of reactions.
2 Temperature, pressure and predicted time for multi-cycle simulation.
Reaction        Activation Energy (kcal/mol)    Vibrational Partition Function Ratio
rsi,chem        8.9                             1
rsi,neigh,f     20.1                            1
rsi,neigh,r     33.6                            1
rsi,self,f      16.1                            1
rsi,self,r      14.4                            1
rsi,des         17.5                            9.56e-8
roa,f           17.7                            1
rob,f           15.4                            1
ro,des          9.224                           1e-4
Table 1: Activation energies and partition function ratios of reactions.
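To see how the Table 1 parameters translate into rate constants, the sketch below evaluates a standard transition-state-theory (Eyring-type) expression, k = (k_B T / h) · (Q‡/Q) · exp(−Ea / (R T)), with Q‡/Q taken as the tabulated vibrational partition function ratio. This functional form is an assumption made for illustration; the paper's exact rate law (including any pressure or concentration factors for adsorption steps) is not restated in this section, and the helper names `tst_rate` and `rates_at` are hypothetical.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol K)

# (activation energy in kcal/mol, vibrational partition function ratio), from Table 1
TABLE_1 = {
    "rsi,chem": (8.9, 1.0),
    "rsi,neigh,f": (20.1, 1.0),
    "rsi,neigh,r": (33.6, 1.0),
    "rsi,self,f": (16.1, 1.0),
    "rsi,self,r": (14.4, 1.0),
    "rsi,des": (17.5, 9.56e-8),
    "roa,f": (17.7, 1.0),
    "rob,f": (15.4, 1.0),
    "ro,des": (9.224, 1e-4),
}


def tst_rate(ea_kcal, q_ratio, temperature):
    """Eyring-type rate constant in s^-1 (assumed functional form)."""
    prefactor = K_B * temperature / H_PLANCK  # attempt frequency k_B T / h
    return prefactor * q_ratio * math.exp(-ea_kcal / (R_KCAL * temperature))


def rates_at(temperature, table=TABLE_1):
    """Rate constant of every Table 1 reaction at one temperature (K)."""
    return {name: tst_rate(ea, q, temperature) for name, (ea, q) in table.items()}


if __name__ == "__main__":
    for name, k in rates_at(600.0).items():
        print(f"{name:12s} {k:.3e} s^-1")
```

Under this assumed form, the small partition function ratios of rsi,des (9.56e-8) and ro,des (1e-4) suppress desorption far more than their activation energies alone would suggest.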
Temperature (K)   Pressure (Pa)   Predicted Time (s)   Original Time (s)
590               120             1.385                3.5
595               125             0.908                3.5
600               130             0.652                3.5
605               135             0.485                3.5
610               140             0.370                3.5
Table 2: Temperature, pressure and predicted time for multi-cycle simulation.
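The savings implied by Table 2 can be checked with simple arithmetic. Assuming "Original Time" is the baseline cycle time before optimization, the relative reduction is 1 − t_predicted / t_original, which ranges from about 60% at 590 K and 120 Pa to about 89% at 610 K and 140 Pa. The helper name `relative_reduction` is hypothetical.

```python
# Rows of Table 2: (temperature in K, pressure in Pa, predicted time in s, original time in s)
TABLE_2 = [
    (590, 120, 1.385, 3.5),
    (595, 125, 0.908, 3.5),
    (600, 130, 0.652, 3.5),
    (605, 135, 0.485, 3.5),
    (610, 140, 0.370, 3.5),
]


def relative_reduction(predicted, original):
    """Fractional reduction of the cycle time relative to the original time."""
    return 1.0 - predicted / original


for T, P, t_pred, t_orig in TABLE_2:
    print(f"{T} K, {P} Pa: {100 * relative_reduction(t_pred, t_orig):.1f}% shorter")
```

Running it prints reductions of 60.4%, 74.1%, 81.4%, 86.1% and 89.4% for the five rows, in order.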

Preprint submitted to Elsevier February 28, 2019
