PCOMCOT User Manual

(version 2.1)

by

Yifan Zhu ([email protected])

Chao An ([email protected])

School of Ocean and Civil Engineering, Shanghai Jiao Tong University


Shanghai 200240, China

October 1, 2024
Abstract

PCOMCOT is a parallel computer program for simulating nonlinear dispersive tsunami
waves. It is developed based on the shallow water model COMCOT (Cornell Multi-grid Coupled
Tsunami) and a depth-integrated non-hydrostatic model, with the motivation to enhance the
computational efficiency and accuracy of large-scale tsunami simulations. The main features of
PCOMCOT include: (1) accounting for wave dispersion by correcting the shallow water equations
with non-hydrostatic pressure terms; (2) a moving boundary technique for inundation; (3)
an eddy-viscosity scheme for wave breaking; (4) nested grids for cross-scale tsunami modeling; (5)
parallel implementation on both CPU and GPU. This manual provides a detailed description of
the governing equations and numerical schemes, as well as instructions on how to use the program.
In addition, various numerical examples are presented for model validation and performance
analysis.

Source code can be downloaded from https://github.com/yifanzhu-fluid/PCOMCOT

Contents
1 Introduction

2 Governing Equations
2.1 Depth-integrated Non-hydrostatic Model
2.2 Governing Equations in Cartesian Coordinates
2.3 Governing Equations in Earth Spherical Coordinates

3 Computational Method
3.1 Solution of Shallow Water Equations
3.2 Solution of Non-hydrostatic Model
3.3 Moving Boundary Technique
3.4 Wave Breaking
3.5 Nested Grid Configuration
3.6 Parallel Implementation
3.6.1 CPU Parallelization
3.6.2 GPU Parallelization
3.7 Boundary Conditions
3.8 Dealing with Numerical Instability

4 Configuration, Input and Output
4.1 Programming Flow
4.2 Compiling Source Files
4.2.1 Compilation of CPU Version
4.2.2 Compilation of GPU Version
4.3 Input
4.3.1 Parameters in pcomcot.ctl
4.3.2 Parameters in layers.ctl
4.3.3 Format of Bathymetry Files
4.3.4 Format of Initial Elevation and Flux Files
4.3.5 Parameters in FaultParameters.ctl
4.3.6 Parameters in Stations.ctl
4.4 Output

5 Examples
5.1 Solitary Wave Propagation on Flat Bottom
5.2 Fluid Oscillation in a Paraboloidal Basin
5.3 Solitary Wave Run-up on a Circular Island
5.4 2011 Tohoku Tsunami
5.5 Performance Analysis

6 Citation of PCOMCOT

Appendix A MatLab Scripts to Read PCOMCOT Output
A.1 A sample Matlab script to read snapshots
A.2 A sample Matlab script to read station data

References
1 Introduction
In the past two decades, various tsunami models have been developed for tsunami prediction and
warning, including the 2D depth-integrated models and the 3D models directly solving the Navier-
Stokes equations. The 2D models are adopted much more widely than 3D models, as they have
sufficient accuracy and much lower computational cost. There are basically two types of 2D models,
i.e., the shallow water models which treat tsunamis as ideal long waves, and the Boussinesq-type
models considering wave dispersion. Most operational tsunami codes are based on the shallow water
equations (SWEs), such as COMCOT (Liu et al., 1998; Wang and Liu, 2006), GEOCLAW (LeVeque
et al., 2011), MOST (Titov and González, 1997; Titov and Synolakis, 1998), TUNAMI (Imamura,
1996), etc. These codes are commonly applied to tsunamis generated by large earthquakes, and
have been validated in many real tsunami events (e.g., Wang and Liu, 2006; Harig et al., 2008;
Arcos and LeVeque, 2015; Heidarzadeh et al., 2016; Oishi et al., 2015). For landslide tsunamis
and trans-oceanic earthquake tsunamis, where the wavelength is relatively short or the propagation
distance is as long as thousands of kilometers, the dispersive effects may be significant (Glimsdal
et al., 2013). Under such circumstances, the Boussinesq-type models can provide more accurate
results. Different Boussinesq-type models such as FUNWAVE (Shi et al., 2012), JAGURS (Baba
et al., 2015, 2017), and COULWAVE (Lynett and Liu, 2004; Lynett et al., 2002) are now available,
and are mainly used by the scientific community in a research context.
Although the Boussinesq-type models satisfactorily describe the dynamics of tsunami waves,
they are quite difficult to solve. As the higher-order derivatives in the dispersive terms can easily
cause numerical instability, it is generally necessary to either simplify the governing equations or
implement complicated numerical treatment. Baba et al. (2015) rewrite the dispersive terms for
constant water depth with conserved variables (i.e., volume flux), and solve the equations with
the finite-difference scheme. For non-uniform water depth, the higher-order derivatives of non-
conserved variables are unavoidable, and more advanced schemes are needed to improve stability.
For example, Shi et al. (2012) utilize the MUSCL-TVD finite-volume scheme and shock-capturing
technique to solve the fully nonlinear Boussinesq equations. These complex methods are much
more time-consuming than solving the SWEs. As both accuracy and efficiency are important in
tsunami modeling, we aim to seek a dispersive model which can be stably solved with relatively
simple schemes.

The non-hydrostatic model, which decomposes the water pressure into hydrostatic and non-hydrostatic
components, is commonly used in 3D modeling of free surface flows (e.g., Casulli,
1999; Koçyigit et al., 2002; Ma et al., 2012; Stelling and Zijlema, 2003; Zijlema et al., 2011). Such a
model effectively describes wave dispersion, and just 2 to 3 vertical grid cells can yield good results
for highly dispersive waves (Stelling and Zijlema, 2003). When using a single vertical layer, the 2D
version (i.e., depth-integrated non-hydrostatic model) shows accuracy comparable to the classical
Boussinesq equations, but is much simpler without higher-order derivatives (Stelling and Zijlema,
2003; Yamazaki et al., 2009; Zijlema and Stelling, 2008). The depth-integrated non-hydrostatic
model accounts for wave dispersion by correcting the SWEs with non-hydrostatic pressure terms,
and can be solved efficiently with a semi-implicit scheme (e.g., NEOWAVE, Yamazaki et al., 2009,
2011). This scheme is directly applicable to existing shallow water models, that is, adding the
implicitly solved correction terms to the explicit solution of SWEs.
Besides improving the governing equations and numerical schemes, parallel computing is
another way to increase the efficiency of tsunami modeling. Most of the aforementioned tsunami models
have been parallelized to run on multiple CPU cores of HPC clusters (e.g., An et al., 2014;
Baba et al., 2015; Shi et al., 2012), which enables large-scale computation at reasonable time cost.
However, HPC clusters are expensive, and satisfactory acceleration is not always achieved
because of the time spent on inter-core communication. Graphics processing
units (GPUs) now play an increasingly important role in numerical computation. Unlike
CPUs, GPUs are highly parallel architectures with thousands of cores, and thus can provide
much higher throughput. Some dispersive tsunami models have recently been ported to GPUs.
For example, Yuan et al. (2020) developed a GPU version of FUNWAVE, and reported a speedup
ratio of > 4 compared with the CPU version running on a 36-core HPC node. Considering the
variety of computational environments, both CPU- and GPU-parallelization are needed to achieve
satisfactory performance on different platforms.
In this study, we develop a depth-integrated non-hydrostatic model and the corresponding
tsunami simulation package named PCOMCOT (Parallelized COMCOT). We theoretically derive
the depth-integrated non-hydrostatic model based on the Boussinesq equations. Our governing
equations are shown to have slightly better accuracy for wave dispersion than those in previous
studies (Stelling and Zijlema, 2003; Yamazaki et al., 2009; Zijlema and Stelling, 2008). This non-hydrostatic
model is added to the widely used tsunami package COMCOT (Cornell Multi-grid
Coupled Tsunami, An et al., 2014; Wang and Liu, 2006; Liu et al., 1998). First, the SWEs
are explicitly solved with a method similar to that of COMCOT. Then, the non-hydrostatic pressure
is calculated implicitly and used to correct the shallow water solution, which gives the dispersive
result. For near-shore processes, a moving-boundary technique is used to track wave run-up and
run-down, and an eddy-viscosity scheme is employed to handle wave breaking. A nested grid system
is adopted for tsunami modeling across different scales. For parallel implementation of PCOMCOT,
both a CPU version using the MPI library, and a CUDA-based GPU version are provided.
In summary, PCOMCOT is capable of simulating the whole life span of a tsunami: generation,
propagation and inundation. It calculates nonlinear, dispersive, and breaking tsunami waves
stably and efficiently. Some important features of PCOMCOT are listed below.

• Non-hydrostatic pressure added to shallow water equations for wave dispersion.

• Moving boundary technique for run-up and run-down.

• Eddy-viscosity scheme for wave breaking.

• One- and two-way nesting grids.

• Parallel implementation on both CPU and GPU.

2 Governing Equations

2.1 Depth-integrated Non-hydrostatic Model

In this section, we describe the governing equations of depth-integrated non-hydrostatic free surface
flows. These equations are the ones that PCOMCOT solves to simulate tsunamis. Here, for the first
time, the depth-integrated non-hydrostatic model is theoretically derived based on the Boussinesq
equations. By adopting a more realistic approximation of non-hydrostatic pressure, the accuracy in
dispersion is slightly improved compared with the previous models (Yamazaki et al., 2009; Stelling
and Zijlema, 2003; Zijlema and Stelling, 2008), without changing the simple form.
In the non-hydrostatic model, the water pressure p is decomposed into a hydrostatic part $p_{sta}$
and a non-hydrostatic part $p_{dyn}$, as follows.

$$p = \rho\left(p_{sta} + p_{dyn}\right) = \rho\left[g(\eta - z) + p_{dyn}\right], \tag{2-1}$$

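in which η represents the surface elevation, g is the gravitational acceleration, and the pressure is
normalized by water density. Thus, neglecting viscosity, the 3D continuity equation and incompressible
Navier-Stokes equations are expressed as

$$\begin{aligned}
&\nabla\cdot\mathbf{u} + \frac{\partial w}{\partial z} = 0, &&\text{(2-2a)}\\
&\frac{\partial\mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} + w\frac{\partial\mathbf{u}}{\partial z} + g\nabla\eta + \nabla p_{dyn} = 0, &&\text{(2-2b)}\\
&\frac{\partial w}{\partial t} + \mathbf{u}\cdot\nabla w + w\frac{\partial w}{\partial z} + \frac{\partial p_{dyn}}{\partial z} = 0, &&\text{(2-2c)}
\end{aligned}$$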
in which u and w denote the horizontal and vertical velocity components, respectively. Note that
the gradient operator ∇ works only in the horizontal direction. The kinematic and dynamic boundary
conditions at the free surface and the seabed are

$$\begin{aligned}
&\frac{\partial\eta}{\partial t} + \mathbf{u}\cdot\nabla\eta - w = 0, \quad z = \eta, &&\text{(2-3a)}\\
&p_{dyn} = 0, \quad z = \eta, &&\text{(2-3b)}\\
&\frac{\partial h}{\partial t} + \mathbf{u}\cdot\nabla h + w = 0, \quad z = -h = -(h_1 + h_b), &&\text{(2-3c)}
\end{aligned}$$
where the water depth h is divided into $h_1$ and $h_b$, representing the initial water depth, which does
not change with time, and the depth change caused by seafloor motion, respectively.
For the sake of clarity, we will manipulate these equations in their dimensionless form. The
wave amplitude A, the characteristic water depth $h_0$, the wavenumber k, and the characteristic
phase speed of linear long waves $\sqrt{gh_0}$ are used for the non-dimensionalization. Following Chiang
et al. (2005), the dimensionless variables are

$$\begin{aligned}
&(x', y') = k(x, y), \quad z' = \frac{z}{h_0}, \quad t' = k\sqrt{gh_0}\,t, \quad h' = \frac{h}{h_0}, \quad h_1' = \frac{h_1}{h_0},\\
&\eta' = \frac{\eta}{A}, \quad h_b' = \frac{h_b}{A}, \quad \mathbf{u}' = \frac{\mathbf{u}}{\varepsilon\sqrt{gh_0}}, \quad w' = \frac{\mu}{\varepsilon\sqrt{gh_0}}\,w, \quad p_{dyn}' = \frac{p_{dyn}}{gA}.
\end{aligned} \tag{2-4}$$

Here we have introduced two important small parameters $\varepsilon = A/h_0$ and $\mu^2 = (kh_0)^2$, which
represent wave nonlinearity and frequency dispersion, respectively. They are assumed to be of the
same order of magnitude for weakly nonlinear and moderately dispersive waves, that is,

$$O(\varepsilon) \approx O(\mu^2) \ll 1. \tag{2-5}$$
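As a rough illustration of the two scaling parameters, the sketch below evaluates ε and µ² for a hypothetical open-ocean wave. The function name and the sample amplitude, depth and wavelength are assumptions chosen for illustration; they are not taken from PCOMCOT.

```python
import math

def scaling_parameters(amplitude, depth, wavelength):
    """Return (epsilon, mu_squared) for wave amplitude A, depth h0 and wavelength.

    epsilon = A / h0 measures nonlinearity; mu^2 = (k * h0)^2 measures dispersion,
    with wavenumber k = 2 * pi / wavelength. All inputs are in metres.
    """
    k = 2.0 * math.pi / wavelength
    epsilon = amplitude / depth
    mu_squared = (k * depth) ** 2
    return epsilon, mu_squared

# Hypothetical sample values: A = 1 m, h0 = 4000 m, wavelength = 100 km.
eps, mu2 = scaling_parameters(1.0, 4000.0, 100.0e3)
print(f"epsilon = {eps:.2e}, mu^2 = {mu2:.4f}")
```

For these sample values both parameters are much smaller than 1, although ε is far smaller than µ², so such a wave is closer to a linear dispersive regime than to the balanced regime of (2-5).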

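The governing equations (2-2a) to (2-2c) and boundary conditions (2-3a) to (2-3c) are nondimensionalized
to be
$$\begin{aligned}
&\mu^2\nabla\cdot\mathbf{u} + \frac{\partial w}{\partial z} = 0, &&\text{(2-6a)}\\
&\frac{\partial\mathbf{u}}{\partial t} + \varepsilon\left(\mathbf{u}\cdot\nabla\mathbf{u} + \frac{1}{\mu^2}w\frac{\partial\mathbf{u}}{\partial z}\right) + \nabla\eta + \nabla p_{dyn} = 0, &&\text{(2-6b)}\\
&\frac{\partial w}{\partial t} + \varepsilon\left(\mathbf{u}\cdot\nabla w + \frac{1}{\mu^2}w\frac{\partial w}{\partial z}\right) + \frac{\partial p_{dyn}}{\partial z} = 0, &&\text{(2-6c)}
\end{aligned}$$

$$\begin{aligned}
&\mu^2\left(\frac{\partial\eta}{\partial t} + \varepsilon\mathbf{u}\cdot\nabla\eta\right) - w = 0, \quad z = \varepsilon\eta, &&\text{(2-7a)}\\
&p_{dyn} = 0, \quad z = \varepsilon\eta, &&\text{(2-7b)}\\
&\mu^2\left(\frac{\partial h_b}{\partial t} + \mathbf{u}\cdot\nabla h\right) + w = 0, \quad z = -h = -(h_1 + \varepsilon h_b). &&\text{(2-7c)}
\end{aligned}$$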

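Note that the primes of dimensionless variables are omitted for convenience, and $\partial h/\partial t = \varepsilon\,\partial h_b/\partial t$ is used.

By integrating the continuity equation (2-6a) along the z direction from −h to εη and applying the
kinematic boundary conditions (2-7a, 2-7c), we obtain the 2D continuity equation as

$$\frac{\partial}{\partial t}\left(\eta + h_b\right) + \nabla\cdot\left[\left(h + \varepsilon\eta\right)\bar{\mathbf{u}}\right] = 0, \tag{2-8}$$
where $\bar{\mathbf{u}}$ is the depth-averaged horizontal velocity defined as $\bar{\mathbf{u}} = \frac{1}{h+\varepsilon\eta}\int_{-h}^{\varepsilon\eta}\mathbf{u}\,dz$. Note that the
Leibniz rule is used in the integration, that is,
$$\nabla\cdot\int_{-h}^{\varepsilon\eta}\mathbf{u}\,dz = \int_{-h}^{\varepsilon\eta}\nabla\cdot\mathbf{u}\,dz + \mathbf{u}\big|_{z=\varepsilon\eta}\cdot\varepsilon\nabla\eta + \mathbf{u}\big|_{z=-h}\cdot\nabla h. \tag{2-9}$$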

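Since we are seeking equations with similar accuracy to the Boussinesq equations, we will
directly use the solutions of the Boussinesq equations to simplify the governing equations. For the
classical Boussinesq equations with terms up to $O(\varepsilon, \mu^2)$ (e.g., Chiang et al., 2005; Peregrine, 1967),
the horizontal and vertical velocities u and w are written as

$$\begin{aligned}
&\mathbf{u} = \bar{\mathbf{u}} + \mu^2\mathbf{u}^*, &&\text{(2-10a)}\\
&w = -\mu^2\left[\left(z + h\right)\nabla\cdot\bar{\mathbf{u}} + \bar{\mathbf{u}}\cdot\nabla h\right]. &&\text{(2-10b)}
\end{aligned}$$
Here, $\mathbf{u}^*(x, y, z, t)$ is the dispersive component of horizontal velocity, and $\int_{-h}^{\varepsilon\eta}\mathbf{u}^*\,dz = 0$. By substituting
(2-10a, 2-10b) into equations (2-6b, 2-6c), and ignoring the higher-order terms, the N-S
equations are approximated as

$$\begin{aligned}
&\frac{\partial}{\partial t}\left(\bar{\mathbf{u}} + \mu^2\mathbf{u}^*\right) + \varepsilon\bar{\mathbf{u}}\cdot\nabla\bar{\mathbf{u}} + \nabla\eta + \nabla p_{dyn} = 0, &&\text{(2-11a)}\\
&\frac{\partial w}{\partial t} + \frac{\partial p_{dyn}}{\partial z} = 0. &&\text{(2-11b)}
\end{aligned}$$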
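Integrating equations (2-11a, 2-11b) over the total water depth, and neglecting $O(\varepsilon\mu^2)$ terms, we
obtain the following depth-integrated momentum equations after some manipulations.
$$\begin{aligned}
&\frac{\partial\mathbf{F}}{\partial t} + \varepsilon\nabla\cdot\left(\frac{\mathbf{F}\mathbf{F}}{D}\right) + D\nabla\eta + \nabla\left(\int_{-h}^{\varepsilon\eta}p_{dyn}\,dz\right) - q\nabla h = 0, &&\text{(2-12a)}\\
&\frac{\partial W}{\partial t} - \frac{q}{D} = 0. &&\text{(2-12b)}
\end{aligned}$$

Here F is the horizontal volume flux, FF is a dyadic tensor (i.e., the outer product of F and
itself), D is the total water depth, q is the non-hydrostatic pressure at the bottom, and W is the
depth-averaged vertical velocity.
$$\mathbf{F} = \left(h + \varepsilon\eta\right)\bar{\mathbf{u}}, \quad D = h + \varepsilon\eta, \quad q = p_{dyn}\big|_{z=-h}, \quad W = \frac{1}{h + \varepsilon\eta}\int_{-h}^{\varepsilon\eta}w\,dz. \tag{2-13}$$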

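Because the vertical velocity varies linearly in the z direction as indicated by equation (2-10b), W is
simply the average of w at the free surface and the bottom. Again, please note that the Leibniz
rule is applied when integrating $\mathbf{u}_t$ and $\nabla p_{dyn}$ along the z direction, which is
$$\begin{aligned}
&\frac{\partial}{\partial t}\int_{-h}^{\varepsilon\eta}\mathbf{u}\,dz = \int_{-h}^{\varepsilon\eta}\frac{\partial\mathbf{u}}{\partial t}\,dz + \varepsilon\frac{\partial\eta}{\partial t}\,\mathbf{u}\big|_{z=\varepsilon\eta} + \varepsilon\frac{\partial h_b}{\partial t}\,\mathbf{u}\big|_{z=-h}, &&\text{(2-14a)}\\
&\nabla\int_{-h}^{\varepsilon\eta}p_{dyn}\,dz = \int_{-h}^{\varepsilon\eta}\nabla p_{dyn}\,dz + p_{dyn}\big|_{z=\varepsilon\eta}\,\varepsilon\nabla\eta + p_{dyn}\big|_{z=-h}\,\nabla h. &&\text{(2-14b)}
\end{aligned}$$

And the relation below is used to derive the conservative form in equation (2-12a).

$$\left(h + \varepsilon\eta\right)\bar{\mathbf{u}}\cdot\nabla\bar{\mathbf{u}} = \nabla\cdot\left[\left(h + \varepsilon\eta\right)\bar{\mathbf{u}}\bar{\mathbf{u}}\right] - \nabla\cdot\left[\left(h + \varepsilon\eta\right)\bar{\mathbf{u}}\right]\bar{\mathbf{u}}. \tag{2-15}$$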

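The above depth-integrated continuity and momentum equations (2-8, 2-12) are expressed in dimensional
form as

$$\begin{aligned}
&\frac{\partial}{\partial t}\left(\eta + h\right) + \nabla\cdot\mathbf{F} = 0, &&\text{(2-16a)}\\
&\frac{\partial\mathbf{F}}{\partial t} + \nabla\cdot\left(\frac{\mathbf{F}\mathbf{F}}{D}\right) + gD\nabla\eta + \nabla\left(\int_{-h}^{\eta}p_{dyn}\,dz\right) - q\nabla h = 0, &&\text{(2-16b)}\\
&\frac{\partial W}{\partial t} = \frac{q}{D}, &&\text{(2-16c)}
\end{aligned}$$

where $\mathbf{F} = \int_{-h}^{\eta}\mathbf{u}\,dz = \left(h + \eta\right)\bar{\mathbf{u}}$, $D = h + \eta$, and W is simply the average value of w at the free surface
and the bottom given by (2-3a, 2-3c). These equations have the same accuracy in nonlinearity and
dispersion as the classical Boussinesq equations, but cannot be solved yet, because the integral
of $p_{dyn}$ is not given. A reasonable approximation of $p_{dyn}$ is needed to evaluate this integral.
The expression of $p_{dyn}$ corresponding to the classical Boussinesq equations is given by Peregrine
(1967) as
$$p_{dyn} = z\frac{\partial}{\partial t}\nabla\cdot\left(h\bar{\mathbf{u}}\right) + \frac{z^2}{2}\frac{\partial}{\partial t}\nabla\cdot\bar{\mathbf{u}}, \tag{2-17}$$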
in which $p_{dyn}$ varies quadratically in the vertical direction. This expression cannot be directly
adopted in the depth-integrated non-hydrostatic model, because depth-integration of (2-17) would
lead to complex coefficients of q in the horizontal momentum equation. These coefficients contain
higher-order derivatives, making it highly difficult to solve for q stably. To simplify the depth-integration
of $p_{dyn}$, we rewrite (2-17) as
$$p_{dyn} = \left\{\frac{\left(z + h\right)^2 - D^2}{2} + \tau D\left(z + h - D\right)\right\}\nabla\cdot\bar{\mathbf{u}}_t. \tag{2-18}$$

Here, we assume that h ≈ D and neglect the terms containing ∂h/∂t, which only causes a difference
of order $O(\varepsilon\mu^2)$. The dimensionless parameter τ in (2-18) is expressed as

$$\tau = \frac{\bar{\mathbf{u}}_t\cdot\nabla h}{h\nabla\cdot\bar{\mathbf{u}}_t} \sim \frac{|\nabla h|}{kh} = \frac{1}{2\pi}\frac{\lambda}{h/|\nabla h|}. \tag{2-19}$$
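To give a feel for the size of τ, the following sketch evaluates the order-of-magnitude estimate in (2-19) in both of its forms. The function names and the sample slope, depth and wavelength are hypothetical values chosen for illustration only.

```python
import math

def tau_from_wavenumber(bottom_slope, depth, wavelength):
    """Estimate tau as |grad h| / (k h), with k = 2 * pi / wavelength."""
    k = 2.0 * math.pi / wavelength
    return abs(bottom_slope) / (k * depth)

def tau_from_length_scales(bottom_slope, depth, wavelength):
    """Same estimate written as (1 / 2 pi) * wavelength / (h / |grad h|)."""
    depth_variation_length = depth / abs(bottom_slope)
    return wavelength / depth_variation_length / (2.0 * math.pi)

# Hypothetical sample: 1% bottom slope, 4000 m depth, 50 km wavelength.
tau = tau_from_wavenumber(0.01, 4000.0, 50.0e3)
print(f"tau is roughly {tau:.4f}")
```

With these sample numbers τ is about 0.02, so the depth-variation term is small compared with the quadratic term for this kind of gently sloping deep-ocean bathymetry.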

Though τ is generally a variable with respect to time and space, its order can be roughly estimated
to be |∇h|/(kh), which represents the contribution of water depth variation to wave dispersion. Since
$p_{dyn}$ mainly exists for short waves in the deep ocean with smooth bottom, |∇h| is much less than kh
in most cases where dispersive effects are significant. Thus, it is reasonable to assume that τ ≈ 0.
Correspondingly, the vertical distribution of $p_{dyn}$ is approximated as a pure quadratic function of
z + h (i.e., vertical distance from the seabed), and the integral of $p_{dyn}$ becomes

$$\int_{-h}^{\eta}p_{dyn}\,dz \approx \int_{-h}^{\eta}\frac{\left(z + h\right)^2 - D^2}{2}\nabla\cdot\bar{\mathbf{u}}_t\,dz = -\frac{D^3}{3}\nabla\cdot\bar{\mathbf{u}}_t = \frac{2}{3}qD. \tag{2-20}$$
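The chain of equalities in (2-20) can be checked numerically. The sketch below integrates the pure quadratic profile with a midpoint rule and compares the result with −D³/3 times the divergence term and with (2/3)qD, where q is the profile evaluated at the bottom. The helper names and the sample depth, elevation and divergence value are assumptions for illustration.

```python
def integrate_midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    dz = (b - a) / n
    return sum(f(a + (m + 0.5) * dz) for m in range(n)) * dz

def pdyn_quadratic(z, h, D, div_ut):
    """Pure quadratic profile: ((z + h)^2 - D^2) / 2 times div(u_t), i.e. tau = 0."""
    return 0.5 * ((z + h) ** 2 - D ** 2) * div_ut

# Hypothetical sample values (SI units).
h, eta, div_ut = 100.0, 0.5, 1.0e-4
D = h + eta

integral = integrate_midpoint(lambda z: pdyn_quadratic(z, h, D, div_ut), -h, eta)
q = pdyn_quadratic(-h, h, D, div_ut)   # non-hydrostatic pressure at the bottom
closed_form = -D ** 3 / 3.0 * div_ut

print(integral, closed_form, 2.0 / 3.0 * q * D)
```

All three printed numbers agree to within the quadrature error, which is where the factor 2/3 comes from.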

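Substituting (2-20) into (2-16b), we finally obtain the governing equations of the depth-integrated
non-hydrostatic model, that is

$$\begin{aligned}
&\frac{\partial}{\partial t}\left(\eta + h\right) + \nabla\cdot\mathbf{F} = 0, &&\text{(2-21a)}\\
&\frac{\partial\mathbf{F}}{\partial t} + \nabla\cdot\left(\frac{\mathbf{F}\mathbf{F}}{D}\right) + gD\nabla\eta + \alpha\left[D\nabla q + q\nabla\left(\eta - \beta h\right)\right] = 0, &&\text{(2-21b)}\\
&\frac{\partial W}{\partial t} = \frac{q}{D}, &&\text{(2-21c)}
\end{aligned}$$

in which the values of coefficients α and β are α = 2/3, β = 0.5.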
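The values of α and β follow from expanding the pressure terms of (2-16b) with the product rule. The sketch below verifies this in one horizontal dimension with exact rational arithmetic. The profile factor c (defined here by writing the depth integral of the non-hydrostatic pressure as c·q·D), the function names, and the sample values are all introduced for illustration and do not appear in the manual.

```python
from fractions import Fraction

def integrated_pressure_term(c, q, dq, eta, deta, h, dh):
    """d/dx(c*q*D) - q*dh/dx with D = eta + h, expanded by the product rule."""
    D = eta + h
    return c * (dq * D + q * (deta + dh)) - q * dh

def alpha_beta_term(alpha, beta, q, dq, eta, deta, h, dh):
    """alpha * [D*dq/dx + q*d(eta - beta*h)/dx], the form used in (2-21b)."""
    D = eta + h
    return alpha * (D * dq + q * (deta - beta * dh))

def alpha_beta_from_profile(c):
    """Map the profile factor c to the pair (alpha, beta) = (c, (1 - c) / c)."""
    return c, (1 - c) / c

# Arbitrary exact sample values for q, eta, h and their x-derivatives.
sample = dict(q=Fraction(3, 7), dq=Fraction(-2, 5), eta=Fraction(1, 2),
              deta=Fraction(1, 9), h=Fraction(40), dh=Fraction(-3, 11))

quadratic = alpha_beta_from_profile(Fraction(2, 3))   # pure quadratic profile
lhs = integrated_pressure_term(Fraction(2, 3), **sample)
rhs = alpha_beta_term(*quadratic, **sample)
print(quadratic, lhs == rhs)
```

Under the same bookkeeping, a factor c = 1/2 (a linear profile) maps to α = 1/2 and β = 1.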


The governing equations (2-21) are the same as the conservative shallow water equations,
except for the addition of a vertical momentum equation and the non-hydrostatic terms in the horizontal
momentum equation. Thus, the whole system is closed, with the three unknowns {η, F, q} solved
from three equations. In the above derivation, we approximate the complex vertical distribution of
pdyn as a simple quadratic function. As a result, the horizontal momentum equation is significantly
simplified to (2-21b). By avoiding higher-order derivatives, we can obtain dispersive results similar to
those of the classical Boussinesq equations with much simpler schemes, which is the main advantage of
the depth-integrated non-hydrostatic model.
In previous studies about the depth-integrated non-hydrostatic model (e.g., Yamazaki et al.,
2009; Stelling and Zijlema, 2003), a linear vertical distribution of pdyn is simply assumed, which
leads to a horizontal momentum equation with α = 0.5, β = 1.0. In fact, the non-hydrostatic pressure
of moderately dispersive waves varies quadratically instead of linearly along the z direction, as is shown
by the Boussinesq equations. In this study, we propose a more realistic approximation of pdyn
(i.e., a pure quadratic function of z + h), and derive a different horizontal momentum equation
with α = 2/3, β = 0.5. Such modification slightly improves the accuracy in wave dispersion, and
brings it closer to the classical Boussinesq equations. In numerical tests, our model performs better
for smooth bathymetry than the previous one, though the difference in the leading and the first
trailing waves is not significant.
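The effect of α on dispersion can be illustrated with a linear, flat-bottom, one-dimensional check. The closed-form relation used below, c²/(gh) = 1/(1 + α(kh)²/2), is not stated in the manual: it is a sketch obtained here by inserting a plane wave into the linearized form of (2-21) over constant depth. The reference curve is the linear (Airy) relation c²/(gh) = tanh(kh)/(kh). Function names are hypothetical.

```python
import math

def celerity_ratio_model(kh, alpha):
    """c^2 / (g h) of the linearized flat-bottom model for a given alpha (assumed form)."""
    return 1.0 / (1.0 + 0.5 * alpha * kh ** 2)

def celerity_ratio_airy(kh):
    """c^2 / (g h) from linear wave theory: tanh(kh) / kh."""
    return math.tanh(kh) / kh

for kh in (0.5, 1.0, 1.5):
    exact = celerity_ratio_airy(kh)
    err_linear = abs(celerity_ratio_model(kh, 0.5) - exact)           # alpha = 1/2
    err_quadratic = abs(celerity_ratio_model(kh, 2.0 / 3.0) - exact)  # alpha = 2/3
    print(f"kh = {kh}: error(alpha=1/2) = {err_linear:.4f}, "
          f"error(alpha=2/3) = {err_quadratic:.4f}")
```

With α = 2/3 the assumed relation reduces to 1/(1 + (kh)²/3), the dispersion relation of the classical Boussinesq equations. In this simplified check the α = 2/3 error is smaller for kh up to about 1.5; for larger kh (for example kh = 2) the ordering reverses, so the comparison should be read as applying to moderately dispersive waves only.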

2.2 Governing Equations in Cartesian Coordinates

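The governing equations (2-21) together with bottom friction effects in Cartesian coordinates are
written as

$$\begin{aligned}
&\frac{\partial}{\partial t}\left(\eta + h\right) + \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} = 0, &&\text{(2-22a)}\\
&\frac{\partial M}{\partial t} + \frac{\partial}{\partial x}\left(\frac{M^2}{D}\right) + \frac{\partial}{\partial y}\left(\frac{MN}{D}\right) + gD\frac{\partial\eta}{\partial x} = -\alpha\left[D\frac{\partial q}{\partial x} + q\frac{\partial(\eta - \beta h)}{\partial x}\right] - f_x, &&\text{(2-22b)}\\
&\frac{\partial N}{\partial t} + \frac{\partial}{\partial x}\left(\frac{MN}{D}\right) + \frac{\partial}{\partial y}\left(\frac{N^2}{D}\right) + gD\frac{\partial\eta}{\partial y} = -\alpha\left[D\frac{\partial q}{\partial y} + q\frac{\partial(\eta - \beta h)}{\partial y}\right] - f_y, &&\text{(2-22c)}\\
&\frac{\partial W}{\partial t} = \frac{q}{D}, &&\text{(2-22d)}
\end{aligned}$$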
in which η is the free surface elevation, (M, N) denote the volume fluxes in the x and y directions,
respectively, D = η + h is the total water depth, q is defined as the non-hydrostatic pressure at
the bottom, W is the depth-averaged vertical velocity equal to the average of w at the surface and
the seabed given by (2-3a, 2-3c), and $(f_x, f_y)$ are the bottom friction terms in the x and y directions. The
bottom friction is evaluated via Manning's formula, that is

$$\begin{aligned}
f_x &= \frac{gn^2}{D^{7/3}}M\left(M^2 + N^2\right)^{1/2},\\
f_y &= \frac{gn^2}{D^{7/3}}N\left(M^2 + N^2\right)^{1/2},
\end{aligned} \tag{2-23}$$
where n is Manning's roughness coefficient, which is usually set to approximately 0.03.
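Manning's formula (2-23) translates directly into a small function. The sketch below is an illustration in Python rather than PCOMCOT's own source; the function name, the default values, and the `d_min` guard for nearly dry cells are assumptions added here to keep the example safe to run.

```python
import math

def manning_friction(M, N, D, n=0.03, g=9.81, d_min=1.0e-3):
    """Return (fx, fy) from Manning's formula for fluxes M, N and total depth D.

    n is Manning's roughness coefficient. d_min is a hypothetical threshold
    below which friction is set to zero to avoid dividing by a vanishing depth.
    """
    if D <= d_min:
        return 0.0, 0.0
    coefficient = g * n * n / D ** (7.0 / 3.0) * math.hypot(M, N)
    return coefficient * M, coefficient * N

# Sample call: fluxes of 3 and 4 m^2/s in 1 m of water.
fx, fy = manning_friction(3.0, 4.0, 1.0)
print(fx, fy)
```

Because D enters with the power 7/3, friction grows rapidly as the water becomes shallow, which is why some guard against very small D is usually needed near the shoreline.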

2.3 Governing Equations in Earth Spherical Coordinates

On the surface of Earth, all the differential operators should be expressed in the spherical coordinates
defined by (R, x, y), where R is the constant Earth radius, x and y are the longitude and latitude,
respectively. By rewriting all the terms in spherical coordinates and considering the influence of
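Earth rotation, the governing equations on Earth surface are
$$\begin{aligned}
&\frac{\partial}{\partial t}\left(\eta + h\right) + \frac{1}{R\cos y}\left[\frac{\partial M}{\partial x} + \frac{\partial}{\partial y}\left(N\cos y\right)\right] = 0, &&\text{(2-24a)}\\
&\frac{\partial M}{\partial t} + \frac{1}{R\cos y}\frac{\partial}{\partial x}\left(\frac{M^2}{D}\right) + \frac{1}{R}\frac{\partial}{\partial y}\left(\frac{MN}{D}\right) - \frac{\sin y}{R\cos y}\frac{2MN}{D} + \frac{gD}{R\cos y}\frac{\partial\eta}{\partial x}\\
&\qquad = -\alpha\left[\frac{D}{R\cos y}\frac{\partial q}{\partial x} + \frac{q}{R\cos y}\frac{\partial(\eta - \beta h)}{\partial x}\right] - f_x + 2N\Omega\sin y, &&\text{(2-24b)}\\
&\frac{\partial N}{\partial t} + \frac{1}{R\cos y}\frac{\partial}{\partial x}\left(\frac{MN}{D}\right) + \frac{1}{R}\frac{\partial}{\partial y}\left(\frac{N^2}{D}\right) + \frac{\sin y}{R\cos y}\frac{M^2 - N^2}{D} + \frac{gD}{R}\frac{\partial\eta}{\partial y}\\
&\qquad = -\alpha\left[\frac{D}{R}\frac{\partial q}{\partial y} + \frac{q}{R}\frac{\partial(\eta - \beta h)}{\partial y}\right] - f_y - 2M\Omega\sin y, &&\text{(2-24c)}\\
&\frac{\partial W}{\partial t} = \frac{q}{D}, &&\text{(2-24d)}
\end{aligned}$$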
where Ω is the angular velocity of Earth rotation, and the corresponding terms represent Coriolis
forces.
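The Coriolis terms on the right-hand sides of (2-24b) and (2-24c) can be sketched as follows. The function name is hypothetical, and the value of Ω is the standard rotation rate of the Earth rather than a number quoted from the manual.

```python
import math

OMEGA = 7.2921e-5  # Earth's angular velocity in rad/s (standard value, assumed here)

def coriolis_terms(M, N, latitude_deg):
    """Return (2*N*Omega*sin(y), -2*M*Omega*sin(y)) for latitude y given in degrees."""
    f = 2.0 * OMEGA * math.sin(math.radians(latitude_deg))
    return f * N, -f * M

# Sample call: purely eastward flux of 10 m^2/s at 30 degrees north.
print(coriolis_terms(10.0, 0.0, 30.0))
```

At the equator both terms vanish, and in the northern hemisphere an eastward flux produces a southward (negative y) tendency, consistent with the signs in (2-24b) and (2-24c).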

3 Computational Method
The governing equations of our depth-integrated non-hydrostatic model are solved with a semi-implicit
staggered finite-difference scheme. In this method, the shallow water equations without the
non-hydrostatic pressure are first solved explicitly to give an intermediate result. Then, the non-hydrostatic
pressure is solved implicitly from a Poisson-type equation. Finally, the shallow water
solution is corrected by the non-hydrostatic pressure, and the dispersive result is obtained. In addition,
a moving boundary technique is adopted to compute inundation, and an empirical eddy-viscosity
scheme is used for wave breaking. Nested grids are employed to simulate tsunamis at different scales,
and both one-way and two-way nesting algorithms are provided. All the numerical schemes are
parallelized for a CPU cluster and a GPU. For the CPU version, we assign the computational task
to multiple CPU cores using spatial domain decomposition, with inter-core communication handled
by the MPI library. For the GPU version, we adopt the CUDA FORTRAN programming model
and map the CUDA threads to the entire simulation domain.
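The three stages described above (explicit shallow water step, implicit solve for the non-hydrostatic pressure, correction) can be sketched on a toy problem. The example below is not PCOMCOT's discretization: it assumes a one-dimensional, linear, flat-bottom, periodic domain, a simple forward-backward time stepping, and a Gauss-Seidel iteration for the Poisson-type system. All names and parameter values are hypothetical.

```python
import math

def nonhydrostatic_step(eta, M, W, h, dx, dt, g=9.81, alpha=2.0 / 3.0, sweeps=200):
    """One time step of a 1D linear, flat-bottom, periodic toy model.

    eta[i], W[i] and q[i] sit at cell centres; M[i] is the flux at interface i+1/2.
    """
    n = len(eta)
    # Stage 1: explicit hydrostatic predictor (linear shallow water momentum).
    M_star = [M[i] - g * h * dt / dx * (eta[(i + 1) % n] - eta[i]) for i in range(n)]

    # Stage 2: Poisson-type equation for q. It follows from requiring
    #   W_new = W + dt * q / h                       (vertical momentum)
    #   W_new = -(M_new[i] - M_new[i-1]) / (2 * dx)  (flat bottom: half the surface velocity)
    # with M_new[i] = M_star[i] - alpha * h * dt / dx * (q[i+1] - q[i]).
    a = alpha * h * dt / (2.0 * dx * dx)
    diag = 2.0 * a + dt / h
    rhs = [-(M_star[i] - M_star[i - 1]) / (2.0 * dx) - W[i] for i in range(n)]
    q = [0.0] * n
    for _ in range(sweeps):  # Gauss-Seidel sweeps; the matrix is diagonally dominant
        for i in range(n):
            q[i] = (rhs[i] + a * (q[i - 1] + q[(i + 1) % n])) / diag

    # Stage 3: correct the flux, then update W and eta (continuity).
    M_new = [M_star[i] - alpha * h * dt / dx * (q[(i + 1) % n] - q[i]) for i in range(n)]
    W_new = [W[i] + dt * q[i] / h for i in range(n)]
    eta_new = [eta[i] - dt / dx * (M_new[i] - M_new[i - 1]) for i in range(n)]
    return eta_new, M_new, W_new, q

# Hypothetical setup: a small Gaussian hump in 100 m of water.
n, dx, dt, h = 64, 50.0, 0.5, 100.0
eta = [0.1 * math.exp(-(((i - n // 2) * dx) / 300.0) ** 2) for i in range(n)]
M, W = [0.0] * n, [0.0] * n
for _ in range(20):
    eta, M, W, q = nonhydrostatic_step(eta, M, W, h, dx, dt)
print(max(eta), max(abs(v) for v in q))
```

The fixed number of Gauss-Seidel sweeps is enough for this small problem because the system is strongly diagonally dominant here; a production code would use a convergence test and a faster solver.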

[Figure 3.1: schematic of a staggered grid cell (i, j) with axes x (indices i − 1, i, i + 1) and y (indices j − 1, j, j + 1); the labels η(i, j), q(i, j), ws(i, j), wb(i, j), M(i, j) and N(i, j) mark where each variable is stored.]

Figure 3.1 Staggered grid setup in PCOMCOT. The surface elevation η, the non-hydrostatic pressure
q, the vertical velocity at the free surface ws and that at the bottom wb are defined at the center
of each grid cell, while the volume flux components M and N are defined at the cell interfaces.

3.1 Solution of Shallow Water Equations

The shallow water equations in the Cartesian or Earth coordinates comprise the continuity equation
(2-22a) or (2-24a), and the horizontal momentum equations (2-22b, 2-22c) or (2-24b, 2-24c), with
the non-hydrostatic terms neglected. By further ignoring the nonlinear terms in the momentum
equations, the linear shallow water equations are obtained. Both the linear and nonlinear shallow
water equations are solved explicitly in the staggered grids shown in Figure 3.1, where the water
depth hi,j and surface elevation ηi,j are defined at the center of grid cell (i, j), while the volume
fluxes Mi,j and Ni,j are evaluated at the cell interfaces (i + 1/2, j) and (i, j + 1/2), respectively.
Most tsunami models, including COMCOT (Wang and Liu, 2006; Liu et al., 1998), solve the
linear shallow water equations with a forward-time centered-space (FTCS) scheme. The FTCS
scheme is simple and efficient, but it suffers from stability problems at low friction, which usually
cause spurious oscillations near coastlines (de Almeida et al., 2012; Bates et al., 2010). de Almeida
et al. (2012) demonstrated that these instabilities originate from the lack of diffusive terms in the
modified equations of the FTCS scheme, and proposed a “q-centered” scheme (referred to as
“flux-centered” in this manual) to address the problem. The flux-centered scheme introduces
artificial diffusive terms without changing the equations to be solved, and is therefore adopted in
PCOMCOT to improve stability when solving the shallow water equations (SWEs). This scheme is
the same as the FTCS method, except that each flux variable at the previous time step is replaced
by the weighted average of itself and its neighbors.
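In Cartesian coordinates, the linear shallow water equations are discretized as

\begin{equation}
\left\{
\begin{aligned}
\eta_{i,j}^{n+1} &= \eta_{i,j}^{n} - \left(h_{i,j}^{n+1} - h_{i,j}^{n}\right)
  - \frac{\Delta t}{\Delta x}\left(M_{i,j}^{n} - M_{i-1,j}^{n}\right)
  - \frac{\Delta t}{\Delta y}\left(N_{i,j}^{n} - N_{i,j-1}^{n}\right), \\
M_{i,j}^{n+1} &= \theta M_{i,j}^{n} + \frac{1-\theta}{2}\left(M_{i-1,j}^{n} + M_{i+1,j}^{n}\right)
  - g D_{i+1/2,j}^{n+1}\frac{\Delta t}{\Delta x}\left(\eta_{i+1,j}^{n+1} - \eta_{i,j}^{n+1}\right)
  - f_x \Delta t, \\
N_{i,j}^{n+1} &= \theta N_{i,j}^{n} + \frac{1-\theta}{2}\left(N_{i,j-1}^{n} + N_{i,j+1}^{n}\right)
  - g D_{i,j+1/2}^{n+1}\frac{\Delta t}{\Delta y}\left(\eta_{i,j+1}^{n+1} - \eta_{i,j}^{n+1}\right)
  - f_y \Delta t.
\end{aligned}
\right.
\tag{3-1}
\end{equation}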

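In Earth coordinates, the discretization is given as

\begin{equation}
\left\{
\begin{aligned}
\eta_{i,j}^{n+1} &= \eta_{i,j}^{n} - \left(h_{i,j}^{n+1} - h_{i,j}^{n}\right)
  - \frac{\Delta t}{R\Delta x\cos y_j}\left(M_{i,j}^{n} - M_{i-1,j}^{n}\right) \\
 &\quad - \frac{\Delta t}{R\Delta y\cos y_j}
  \left(N_{i,j}^{n}\cos y_{j+1/2} - N_{i,j-1}^{n}\cos y_{j-1/2}\right), \\
M_{i,j}^{n+1} &= \theta M_{i,j}^{n} + \frac{1-\theta}{2}\left(M_{i-1,j}^{n} + M_{i+1,j}^{n}\right)
  - g D_{i+1/2,j}^{n+1}\frac{\Delta t}{R\Delta x\cos y_j}
  \left(\eta_{i+1,j}^{n+1} - \eta_{i,j}^{n+1}\right) \\
 &\quad + 2N_{i+1/2,j-1/2}^{n}\,\Delta t\,\Omega\sin y_j - f_x \Delta t, \\
N_{i,j}^{n+1} &= \theta N_{i,j}^{n} + \frac{1-\theta}{2}\left(N_{i,j-1}^{n} + N_{i,j+1}^{n}\right)
  - g D_{i,j+1/2}^{n+1}\frac{\Delta t}{R\Delta y}
  \left(\eta_{i,j+1}^{n+1} - \eta_{i,j}^{n+1}\right) \\
 &\quad - 2M_{i-1/2,j+1/2}^{n}\,\Delta t\,\Omega\sin y_{j+1/2} - f_y \Delta t.
\end{aligned}
\right.
\tag{3-2}
\end{equation}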

In the above finite difference formulation, ∆t denotes the time step size, and ∆x and ∆y denote
the grid sizes in the x and y directions, respectively. The parameter θ is the weighting factor of
the flux-centered scheme, and can be adjusted between 0 and 1. With θ = 0, the Lax-Wendroff
diffusive scheme (Lax and Wendroff, 1960) is obtained, while θ = 1 restores the FTCS formulation
without numerical diffusion. In PCOMCOT, the value of θ is calculated by an adaptive formulation
(Sridharan et al., 2020), which varies in time and space depending on the local velocity and water
depth. We find that the flux-centered scheme, together with the adaptive weighting factor,
effectively eliminates unphysical oscillations without changing the tsunami waveforms.
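As an illustration of the role of θ, the x-momentum update of (3-1) can be sketched for a single grid row. This is only a minimal sketch under stated assumptions: the function name `flux_centered_update`, the list-based storage and the treatment of the two end fluxes (left unchanged) are hypothetical and are not taken from the PCOMCOT source; θ is passed in as a constant rather than computed adaptively.

```python
def flux_centered_update(M, eta_new, D_face, theta, dt, dx, g=9.81, fx=None):
    """One flux-centered update of the x-direction flux on a 1D staggered row.

    M[i] and D_face[i] sit on the interface between cells i and i+1, and
    eta_new[i] is the already-updated surface elevation at cell i.
    The two end fluxes are left unchanged (assumption of this sketch).
    """
    n = len(M)
    fx = fx if fx is not None else [0.0] * n
    M_new = list(M)
    for i in range(1, n - 1):
        # Weighted average of the flux itself and its two neighbors.
        averaged = theta * M[i] + 0.5 * (1.0 - theta) * (M[i - 1] + M[i + 1])
        # Hydrostatic pressure gradient term of the linear momentum equation.
        gradient = g * D_face[i] * dt / dx * (eta_new[i + 1] - eta_new[i])
        M_new[i] = averaged - gradient - fx[i] * dt
    return M_new
```

With a flat surface and no friction, θ = 1 returns the fluxes unchanged (FTCS), whereas θ = 0 replaces each interior flux by the mean of its two neighbors, which is where the numerical diffusion comes from.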
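The flow depth at cell interfaces is evaluated with the upwind scheme of Kowalik et al. (2005),
which is

\begin{equation}
D_{i+1/2,j}^{n+1} =
\begin{cases}
\eta_{i,j}^{n+1} + \dfrac{h_{i,j}^{n+1} + h_{i+1,j}^{n+1}}{2}, & M_{i,j}^{n} \ge 0, \\[2ex]
\eta_{i+1,j}^{n+1} + \dfrac{h_{i,j}^{n+1} + h_{i+1,j}^{n+1}}{2}, & M_{i,j}^{n} < 0,
\end{cases}
\tag{3-3a}
\end{equation}

\begin{equation}
D_{i,j+1/2}^{n+1} =
\begin{cases}
\eta_{i,j}^{n+1} + \dfrac{h_{i,j}^{n+1} + h_{i,j+1}^{n+1}}{2}, & N_{i,j}^{n} \ge 0, \\[2ex]
\eta_{i,j+1}^{n+1} + \dfrac{h_{i,j}^{n+1} + h_{i,j+1}^{n+1}}{2}, & N_{i,j}^{n} < 0.
\end{cases}
\tag{3-3b}
\end{equation}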
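The upwind choice in (3-3) can be illustrated with a short sketch for one interface. The function name `interface_depth` and its argument names are hypothetical; only the selection rule itself is taken from (3-3a) and (3-3b).

```python
def interface_depth(eta_left, eta_right, h_left, h_right, flux):
    """Upwind flow depth at the interface between a left and a right cell.

    The still-water depth is averaged across the interface, while the surface
    elevation is taken from the cell the flux comes from.
    """
    h_face = 0.5 * (h_left + h_right)
    eta_upwind = eta_left if flux >= 0.0 else eta_right
    return eta_upwind + h_face
```

For example, with η = 1 on the left, η = 0 on the right and depths of 10 and 12, a positive flux gives a flow depth of 12 and a negative flux gives 11.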

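The bottom friction terms are calculated as

\begin{equation}
\left\{
\begin{aligned}
f_x &= \frac{g n^2}{\left(D_{i+1/2,j}^{n+1}\right)^{7/3}}\, M_{i,j}^{n}
  \left[\left(M_{i,j}^{n}\right)^2 + \left(N_{i+1/2,j-1/2}^{n}\right)^2\right]^{1/2}, \\
f_y &= \frac{g n^2}{\left(D_{i,j+1/2}^{n+1}\right)^{7/3}}\, N_{i,j}^{n}
  \left[\left(M_{i-1/2,j+1/2}^{n}\right)^2 + \left(N_{i,j}^{n}\right)^2\right]^{1/2},
\end{aligned}
\right.
\tag{3-4}
\end{equation}

and the fluxes not originally defined in the staggered grids are estimated with the average of the
existing ones around them:

\begin{equation}
\left\{
\begin{aligned}
M_{i-1/2,j+1/2} &= \frac{1}{4}\left(M_{i-1,j} + M_{i,j} + M_{i-1,j+1} + M_{i,j+1}\right), \\
N_{i+1/2,j-1/2} &= \frac{1}{4}\left(N_{i,j-1} + N_{i+1,j-1} + N_{i,j} + N_{i+1,j}\right).
\end{aligned}
\right.
\tag{3-5}
\end{equation}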
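The friction term f_x of (3-4) and the four-point average of (3-5) can be sketched as follows. The function names, the nested-list storage `N[i][j]` and the zero-friction guard on a dry interface are assumptions made for this illustration only; they are not taken from the PCOMCOT source.

```python
def n_at_m_point(N, i, j):
    """Four-point average N_{i+1/2,j-1/2} of (3-5); N is indexed as N[i][j]."""
    return 0.25 * (N[i][j - 1] + N[i + 1][j - 1] + N[i][j] + N[i + 1][j])


def manning_friction_x(M_ij, N_avg, D_face, n_manning, g=9.81):
    """Friction term f_x of (3-4) at the interface where M_ij is defined.

    N_avg is the y-direction flux averaged to that interface and D_face the
    flow depth there. A dry interface returns zero (assumption of this sketch).
    """
    if D_face <= 0.0:
        return 0.0
    flux_magnitude = (M_ij ** 2 + N_avg ** 2) ** 0.5
    return g * n_manning ** 2 / D_face ** (7.0 / 3.0) * M_ij * flux_magnitude
```

The friction term carries the sign of the flux it acts on, so subtracting f_x ∆t in the momentum equation always opposes the flow.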
At each time step, the surface elevation η is firstly calculated from the continuity equation, and
then the fluxes M and N are updated by the momentum equation.
For nonlinear shallow water equations, the nonlinear convection terms are discretized with an
upwind scheme and then added to the solution of linear momentum equations.
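In Cartesian coordinates, the nonlinear momentum equations are discretized as

\begin{equation}
\left\{
\begin{aligned}
M_{i,j}^{n+1} &= \text{linear terms}
  - \frac{\Delta t}{\Delta x}\left(MU_2 - MU_1\right)
  - \frac{\Delta t}{\Delta y}\left(NU_2 - NU_1\right), \\
N_{i,j}^{n+1} &= \text{linear terms}
  - \frac{\Delta t}{\Delta x}\left(MV_2 - MV_1\right)
  - \frac{\Delta t}{\Delta y}\left(NV_2 - NV_1\right).
\end{aligned}
\right.
\tag{3-6}
\end{equation}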
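In Earth coordinates, the discretization is expressed as

\begin{equation}
\left\{
\begin{aligned}
M_{i,j}^{n+1} &= \text{linear terms}
  - \frac{\Delta t}{R\Delta x\cos y_j}\left(MU_2 - MU_1\right)
  - \frac{\Delta t}{R\Delta y}\left(NU_2 - NU_1\right) \\
 &\quad + \frac{\Delta t\sin y_j}{R\cos y_j}\,
  \frac{2M_{i,j}^{n} N_{i+1/2,j-1/2}^{n}}{D_{i+1/2,j}^{n+1}}, \\
N_{i,j}^{n+1} &= \text{linear terms}
  - \frac{\Delta t}{R\Delta x\cos y_{j+1/2}}\left(MV_2 - MV_1\right)
  - \frac{\Delta t}{R\Delta y}\left(NV_2 - NV_1\right) \\
 &\quad - \frac{\Delta t\sin y_{j+1/2}}{R\cos y_{j+1/2}}\,
  \frac{\left(M_{i-1/2,j+1/2}^{n}\right)^2 - \left(N_{i,j}^{n}\right)^2}{D_{i,j+1/2}^{n+1}}.
\end{aligned}
\right.
\tag{3-7}
\end{equation}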

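Here “linear terms” represents the terms on the right-hand side of (3-1) or (3-2), and the nonlinear
terms in parentheses are evaluated as follows.

\begin{equation}
MU_2 - MU_1 =
\begin{cases}
\dfrac{\left(M_{i,j}^{n}\right)^2}{D_{i+1/2,j}^{n+1}}
  - \dfrac{\left(M_{i-1,j}^{n}\right)^2}{D_{i-1/2,j}^{n+1}}, & M_{i,j}^{n} \ge 0, \\[3ex]
\dfrac{\left(M_{i+1,j}^{n}\right)^2}{D_{i+3/2,j}^{n+1}}
  - \dfrac{\left(M_{i,j}^{n}\right)^2}{D_{i+1/2,j}^{n+1}}, & M_{i,j}^{n} < 0,
\end{cases}
\tag{3-8a}
\end{equation}

\begin{equation}
NU_2 - NU_1 =
\begin{cases}
\dfrac{M_{i,j}^{n} N_{i+1/2,j-1/2}^{n}}{D_{i+1/2,j}^{n+1}}
  - \dfrac{M_{i,j-1}^{n} N_{i+1/2,j-3/2}^{n}}{D_{i+1/2,j-1}^{n+1}}, & N_{i+1/2,j-1/2}^{n} \ge 0, \\[3ex]
\dfrac{M_{i,j+1}^{n} N_{i+1/2,j+1/2}^{n}}{D_{i+1/2,j+1}^{n+1}}
  - \dfrac{M_{i,j}^{n} N_{i+1/2,j-1/2}^{n}}{D_{i+1/2,j}^{n+1}}, & N_{i+1/2,j-1/2}^{n} < 0,
\end{cases}
\tag{3-8b}
\end{equation}

\begin{equation}
MV_2 - MV_1 =
\begin{cases}
\dfrac{M_{i-1/2,j+1/2}^{n} N_{i,j}^{n}}{D_{i,j+1/2}^{n+1}}
  - \dfrac{M_{i-3/2,j+1/2}^{n} N_{i-1,j}^{n}}{D_{i-1,j+1/2}^{n+1}}, & M_{i-1/2,j+1/2}^{n} \ge 0, \\[3ex]
\dfrac{M_{i+1/2,j+1/2}^{n} N_{i+1,j}^{n}}{D_{i+1,j+1/2}^{n+1}}
  - \dfrac{M_{i-1/2,j+1/2}^{n} N_{i,j}^{n}}{D_{i,j+1/2}^{n+1}}, & M_{i-1/2,j+1/2}^{n} < 0,
\end{cases}
\tag{3-8c}
\end{equation}

\begin{equation}
NV_2 - NV_1 =
\begin{cases}
\dfrac{\left(N_{i,j}^{n}\right)^2}{D_{i,j+1/2}^{n+1}}
  - \dfrac{\left(N_{i,j-1}^{n}\right)^2}{D_{i,j-1/2}^{n+1}}, & N_{i,j}^{n} \ge 0, \\[3ex]
\dfrac{\left(N_{i,j+1}^{n}\right)^2}{D_{i,j+3/2}^{n+1}}
  - \dfrac{\left(N_{i,j}^{n}\right)^2}{D_{i,j+1/2}^{n+1}}, & N_{i,j}^{n} < 0.
\end{cases}
\tag{3-8d}
\end{equation}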
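The upwind selection in (3-8a) can be sketched for one grid row as follows. The function name `mu2_minus_mu1` and the list-based storage are hypothetical; the sketch assumes that `M[i]` and `D_face[i]` are both stored at interface i+1/2 and that all flow depths involved are positive.

```python
def mu2_minus_mu1(M, D_face, i):
    """Upwind difference MU_2 - MU_1 of (3-8a) along one grid row.

    The stencil is shifted towards the side the flow comes from: backward
    for a non-negative flux, forward for a negative one.
    """
    if M[i] >= 0.0:
        return M[i] ** 2 / D_face[i] - M[i - 1] ** 2 / D_face[i - 1]
    return M[i + 1] ** 2 / D_face[i + 1] - M[i] ** 2 / D_face[i]
```

The other three differences in (3-8b) to (3-8d) follow the same pattern, with the sign test applied to the flux that advects the quantity.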

3.2 Solution of Non-hydrostatic Model

The non-hydrostatic pressure can be calculated implicitly from the solution of the shallow water
equations, and then used to correct the hydrostatic solution. To distinguish the intermediate
hydrostatic result from the final non-hydrostatic solution, we use $\tilde{\mathbf{F}}(\tilde{M}, \tilde{N})$
and $\tilde{\mathbf{u}}(\tilde{u}, \tilde{v})$ to represent the horizontal volume flux and the
depth-averaged horizontal velocity from the shallow water equations, and $\mathbf{F}(M, N)$ and
$\mathbf{u}(u, v)$ for the final non-hydrostatic counterparts. To construct the implicit equation
of non-hydrostatic pressure, we express the horizontal and vertical velocities as functions of q, and
substitute these functions into a rewritten form of the continuity equation (2-21a). The resulting
Poisson-type equation is solved efficiently with the ILU-preconditioned Bi-CGSTAB method.
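First, we rewrite the continuity equation (2-21a) in terms of the horizontal and vertical velocities.
Again, this is done in dimensionless form, to clearly show that the $O(\mu^2)$ accuracy in wave
dispersion is retained. According to (2-10a), by neglecting $O(\mu^4)$ terms, the dimensionless
kinematic boundary conditions (2-7a, 2-7c) can be written as

\begin{equation}
\left\{
\begin{aligned}
w_s &= \mu^2\left(\frac{\partial \eta}{\partial t} + \varepsilon\,\mathbf{u}\cdot\nabla\eta\right), \\
w_b &= -\mu^2\left(\frac{\partial h_b}{\partial t} + \mathbf{u}\cdot\nabla h\right),
\end{aligned}
\right.
\tag{3-9}
\end{equation}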

in which ws and wb are the vertical velocities at the free surface and the bottom, respectively.
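Substituting (3-9) into (2-8), the following relation is obtained.

\begin{equation}
\frac{\partial}{\partial t}\left(\eta + h_b\right)
  + \nabla\cdot\left[\left(h + \varepsilon\eta\right)\mathbf{u}\right]
  = \left(h + \varepsilon\eta\right)\nabla\cdot\mathbf{u} + \frac{w_s - w_b}{\mu^2} = 0.
\tag{3-10}
\end{equation}

Thus, the 2D continuity equation can be rewritten as

\begin{equation}
\mu^2\,\nabla\cdot\mathbf{u} + \frac{w_s - w_b}{h + \varepsilon\eta} = 0,
\tag{3-11}
\end{equation}

and its dimensional form is

\begin{equation}
\nabla\cdot\mathbf{u} + \frac{w_s - w_b}{D} = 0.
\tag{3-12}
\end{equation}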
Note that in the above analysis, we have not introduced any extra equation other than the governing
equations (2-21), because the kinematic boundary conditions have been used for deriving the original
continuity equation (2-21a). The rewritten form of continuity equation in (3-12) is valid to O(µ2 ),
while the original one is an exact relation.
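Then, we express the horizontal velocities in terms of q, and substitute them into (3-12). According
to equation (2-21b), at the (n + 1)-th time step, the dispersive volume flux can be expressed as

\begin{equation}
\mathbf{F}^{n+1} = \tilde{\mathbf{F}}^{n+1}
  - \alpha\Delta t\left[D^{n+1}\nabla q + q\,\nabla\left(\eta - \beta h\right)^{n+1}\right],
\tag{3-13}
\end{equation}

where the superscript (n + 1) of q is dropped for simplicity. Dividing both sides of (3-13) by the
total water depth, the dispersive depth-averaged horizontal velocity is

\begin{equation}
\mathbf{u}^{n+1} = \tilde{\mathbf{u}}^{n+1}
  - \alpha\Delta t\left[\nabla q + q\,\frac{\nabla\left(\eta - \beta h\right)^{n+1}}{D^{n+1}}\right].
\tag{3-14}
\end{equation}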

Since w varies linearly in the z direction for moderately dispersive flows, the vertical velocity at
the free surface can be determined from the vertical momentum equation (2-21c) as

\begin{equation}
w_s^{n+1} = w_s^{n} + w_b^{n} - w_b^{n+1} + \frac{2\Delta t}{D^{n+1}}\,q,
\tag{3-15}
\end{equation}

where the bottom vertical velocity wb is estimated from the kinematic boundary condition (2-3c).
By substituting (3-14) and (3-15) into (3-12), we obtain the Poisson-type equation of non-hydrostatic
pressure, which is

\begin{equation}
-\alpha\Delta t\left\{\nabla^2 q
  + \nabla\cdot\left[q\,\frac{\nabla\left(\eta - \beta h\right)^{n+1}}{D^{n+1}}\right]\right\}
  + \frac{2\Delta t}{\left(D^{n+1}\right)^2}\,q
  = -\nabla\cdot\tilde{\mathbf{u}}^{n+1} - \frac{w_s^{n} + w_b^{n} - 2w_b^{n+1}}{D^{n+1}}.
\tag{3-16}
\end{equation}

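Discretization of (3-16) in the staggered grids shown in Figure 3.1 yields the linear algebraic
equation system of q, that is

\begin{equation}
a1_{i,j}\,q_{i-1,j} + a2_{i,j}\,q_{i+1,j} + a3_{i,j}\,q_{i,j-1} + a4_{i,j}\,q_{i,j+1}
  + a5_{i,j}\,q_{i,j} = b_{i,j}.
\tag{3-17}
\end{equation}

In Cartesian coordinates, the coefficients and forcing terms in (3-17) are expressed as

\begin{equation}
\left\{
\begin{aligned}
a1_{i,j} &= \frac{\alpha\Delta t}{(\Delta x)^2}\left(-1 + \frac{1}{2}\phi_{i-1,j}\right), \qquad
a2_{i,j} = \frac{\alpha\Delta t}{(\Delta x)^2}\left(-1 - \frac{1}{2}\phi_{i,j}\right), \\
a3_{i,j} &= \frac{\alpha\Delta t}{(\Delta y)^2}\left(-1 + \frac{1}{2}\psi_{i,j-1}\right), \qquad
a4_{i,j} = \frac{\alpha\Delta t}{(\Delta y)^2}\left(-1 - \frac{1}{2}\psi_{i,j}\right), \\
a5_{i,j} &= \frac{\alpha\Delta t}{(\Delta x)^2}
  \left[\left(1 + \frac{1}{2}\phi_{i-1,j}\right) + \left(1 - \frac{1}{2}\phi_{i,j}\right)\right] \\
 &\quad + \frac{\alpha\Delta t}{(\Delta y)^2}
  \left[\left(1 + \frac{1}{2}\psi_{i,j-1}\right) + \left(1 - \frac{1}{2}\psi_{i,j}\right)\right]
  + \frac{2\Delta t}{\left(D_{i,j}^{n+1}\right)^2}, \\
b_{i,j} &= -\frac{\tilde{u}_{i,j}^{n+1} - \tilde{u}_{i-1,j}^{n+1}}{\Delta x}
  - \frac{\tilde{v}_{i,j}^{n+1} - \tilde{v}_{i,j-1}^{n+1}}{\Delta y}
  - \frac{w_{s,i,j}^{n} + w_{b,i,j}^{n} - 2w_{b,i,j}^{n+1}}{D_{i,j}^{n+1}},
\end{aligned}
\right.
\tag{3-18}
\end{equation}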

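in which wb, the vertical velocity at the bottom, is calculated with an upwind scheme as

\begin{equation}
\begin{aligned}
w_{b,i,j}^{n+1} = &-\frac{h_{i,j}^{n+1} - h_{i,j}^{n}}{\Delta t}
  - \frac{u_{i-1/2,j}^{n}}{\Delta x} \times
  \begin{cases}
  h_{i,j}^{n+1} - h_{i-1,j}^{n+1}, & u_{i-1/2,j}^{n} \ge 0, \\
  h_{i+1,j}^{n+1} - h_{i,j}^{n+1}, & u_{i-1/2,j}^{n} < 0,
  \end{cases} \\
 &- \frac{v_{i,j-1/2}^{n}}{\Delta y} \times
  \begin{cases}
  h_{i,j}^{n+1} - h_{i,j-1}^{n+1}, & v_{i,j-1/2}^{n} \ge 0, \\
  h_{i,j+1}^{n+1} - h_{i,j}^{n+1}, & v_{i,j-1/2}^{n} < 0.
  \end{cases}
\end{aligned}
\tag{3-19}
\end{equation}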

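In Earth coordinates, these terms are

\begin{equation}
\left\{
\begin{aligned}
a1_{i,j} &= \frac{\alpha\Delta t}{(R\Delta x\cos y_j)^2}\left(-1 + \frac{1}{2}\phi_{i-1,j}\right), \qquad
a2_{i,j} = \frac{\alpha\Delta t}{(R\Delta x\cos y_j)^2}\left(-1 - \frac{1}{2}\phi_{i,j}\right), \\
a3_{i,j} &= \frac{\alpha\Delta t\cos y_{j-1/2}}{(R\Delta y)^2\cos y_j}\left(-1 + \frac{1}{2}\psi_{i,j-1}\right), \qquad
a4_{i,j} = \frac{\alpha\Delta t\cos y_{j+1/2}}{(R\Delta y)^2\cos y_j}\left(-1 - \frac{1}{2}\psi_{i,j}\right), \\
a5_{i,j} &= \frac{2\Delta t}{\left(D_{i,j}^{n+1}\right)^2}
  + \frac{\alpha\Delta t}{(R\Delta x\cos y_j)^2}
  \left[\left(1 + \frac{1}{2}\phi_{i-1,j}\right) + \left(1 - \frac{1}{2}\phi_{i,j}\right)\right] \\
 &\quad + \frac{\alpha\Delta t}{(R\Delta y)^2\cos y_j}
  \left[\left(1 + \frac{1}{2}\psi_{i,j-1}\right)\cos y_{j-1/2}
  + \left(1 - \frac{1}{2}\psi_{i,j}\right)\cos y_{j+1/2}\right], \\
b_{i,j} &= -\frac{\tilde{u}_{i,j}^{n+1} - \tilde{u}_{i-1,j}^{n+1}}{R\Delta x\cos y_j}
  - \frac{\tilde{v}_{i,j}^{n+1}\cos y_{j+1/2} - \tilde{v}_{i,j-1}^{n+1}\cos y_{j-1/2}}{R\Delta y\cos y_j}
  - \frac{w_{s,i,j}^{n} + w_{b,i,j}^{n} - 2w_{b,i,j}^{n+1}}{D_{i,j}^{n+1}},
\end{aligned}
\right.
\tag{3-20}
\end{equation}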

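where wb is

\begin{equation}
\begin{aligned}
w_{b,i,j}^{n+1} = &-\frac{h_{i,j}^{n+1} - h_{i,j}^{n}}{\Delta t}
  - \frac{u_{i-1/2,j}^{n}}{R\Delta x\cos y_j} \times
  \begin{cases}
  h_{i,j}^{n+1} - h_{i-1,j}^{n+1}, & u_{i-1/2,j}^{n} \ge 0, \\
  h_{i+1,j}^{n+1} - h_{i,j}^{n+1}, & u_{i-1/2,j}^{n} < 0,
  \end{cases} \\
 &- \frac{v_{i,j-1/2}^{n}}{R\Delta y} \times
  \begin{cases}
  h_{i,j}^{n+1} - h_{i,j-1}^{n+1}, & v_{i,j-1/2}^{n} \ge 0, \\
  h_{i,j+1}^{n+1} - h_{i,j}^{n+1}, & v_{i,j-1/2}^{n} < 0.
  \end{cases}
\end{aligned}
\tag{3-21}
\end{equation}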

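In both Cartesian and Earth coordinates, the variables φ and ψ are defined as

\begin{equation}
\left\{
\begin{aligned}
\phi_{i,j} &= \frac{\left(\eta - \beta h\right)_{i+1,j}^{n+1} - \left(\eta - \beta h\right)_{i,j}^{n+1}}
  {D_{i+1/2,j}^{n+1}}, \\
\psi_{i,j} &= \frac{\left(\eta - \beta h\right)_{i,j+1}^{n+1} - \left(\eta - \beta h\right)_{i,j}^{n+1}}
  {D_{i,j+1/2}^{n+1}}.
\end{aligned}
\right.
\tag{3-22}
\end{equation}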

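The depth-averaged horizontal velocity in the above equations is simply the horizontal volume flux
divided by the total water depth, that is

\begin{equation}
\left\{
\begin{aligned}
\tilde{u}_{i,j}^{n+1} &= \frac{\tilde{M}_{i,j}^{n+1}}{D_{i+1/2,j}^{n+1}}, \qquad
\tilde{v}_{i,j}^{n+1} = \frac{\tilde{N}_{i,j}^{n+1}}{D_{i,j+1/2}^{n+1}}, \\
u_{i-1/2,j}^{n} &= \frac{M_{i-1,j}^{n} + M_{i,j}^{n}}{2D_{i,j}^{n}}, \qquad
v_{i,j-1/2}^{n} = \frac{N_{i,j-1}^{n} + N_{i,j}^{n}}{2D_{i,j}^{n}}.
\end{aligned}
\right.
\tag{3-23}
\end{equation}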
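To make the structure of (3-18) and (3-22) concrete, the sketch below assembles the x-direction part of one matrix row for a single grid row, that is, a 1D reduction in which the y-direction coefficients a3 and a4 are dropped. The function name `poisson_row_1d`, the array names and the 1D reduction itself are assumptions made for illustration; `zeta[i]` stands for (η − βh) at cell i.

```python
def poisson_row_1d(i, zeta, D_face, D_cell, alpha, dt, dx):
    """Coefficients (a1, a2, a5) of the 1D reduction of (3-18) at cell i.

    zeta[i]   : (eta - beta*h) at the center of cell i
    D_face[i] : flow depth at interface i+1/2
    D_cell[i] : total water depth at the center of cell i
    """
    phi_west = (zeta[i] - zeta[i - 1]) / D_face[i - 1]  # phi_{i-1} of (3-22)
    phi_east = (zeta[i + 1] - zeta[i]) / D_face[i]      # phi_{i} of (3-22)
    c = alpha * dt / dx ** 2
    a1 = c * (-1.0 + 0.5 * phi_west)
    a2 = c * (-1.0 - 0.5 * phi_east)
    a5 = c * ((1.0 + 0.5 * phi_west) + (1.0 - 0.5 * phi_east)) \
        + 2.0 * dt / D_cell[i] ** 2
    return a1, a2, a5
```

When the surface and bottom are flat (φ = 0), the row reduces to the standard second-difference stencil plus the positive term 2∆t/D², so the diagonal coefficient exceeds the sum of the magnitudes of the off-diagonal ones.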

The sparse linear system of non-hydrostatic pressure (3-17) is solved with the bi-conjugate
gradient stabilized (Bi-CGSTAB) method (van der Vorst, 1992), and the incomplete LU
(ILU) preconditioner (Barrett et al., 1994) is used to speed up the convergence. The non-hydrostatic
pressure at the outermost cells of the computational domain is set to zero as the boundary
condition. After solving for the non-hydrostatic pressure q, the vertical velocity at the free surface
ws is calculated from (3-15), and the final dispersive volume flux (M, N ) is obtained by substituting
q into (3-13).
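In Cartesian coordinates, the discretized final dispersive volume fluxes are

\begin{equation}
\left\{
\begin{aligned}
M_{i,j}^{n+1} &= \tilde{M}_{i,j}^{n+1}
  - \frac{\alpha\Delta t}{\Delta x}\,D_{i+1/2,j}^{n+1}\left(q_{i+1,j} - q_{i,j}\right) \\
 &\quad - \frac{\alpha\Delta t}{2\Delta x}\left(q_{i,j} + q_{i+1,j}\right)
  \left[\left(\eta - \beta h\right)_{i+1,j}^{n+1} - \left(\eta - \beta h\right)_{i,j}^{n+1}\right], \\
N_{i,j}^{n+1} &= \tilde{N}_{i,j}^{n+1}
  - \frac{\alpha\Delta t}{\Delta y}\,D_{i,j+1/2}^{n+1}\left(q_{i,j+1} - q_{i,j}\right) \\
 &\quad - \frac{\alpha\Delta t}{2\Delta y}\left(q_{i,j} + q_{i,j+1}\right)
  \left[\left(\eta - \beta h\right)_{i,j+1}^{n+1} - \left(\eta - \beta h\right)_{i,j}^{n+1}\right].
\end{aligned}
\right.
\tag{3-24}
\end{equation}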
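In Earth coordinates, they are given as

\begin{equation}
\left\{
\begin{aligned}
M_{i,j}^{n+1} &= \tilde{M}_{i,j}^{n+1}
  - \frac{\alpha\Delta t}{R\Delta x\cos y_j}\,D_{i+1/2,j}^{n+1}\left(q_{i+1,j} - q_{i,j}\right) \\
 &\quad - \frac{\alpha\Delta t}{2R\Delta x\cos y_j}\left(q_{i,j} + q_{i+1,j}\right)
  \left[\left(\eta - \beta h\right)_{i+1,j}^{n+1} - \left(\eta - \beta h\right)_{i,j}^{n+1}\right], \\
N_{i,j}^{n+1} &= \tilde{N}_{i,j}^{n+1}
  - \frac{\alpha\Delta t}{R\Delta y}\,D_{i,j+1/2}^{n+1}\left(q_{i,j+1} - q_{i,j}\right) \\
 &\quad - \frac{\alpha\Delta t}{2R\Delta y}\left(q_{i,j} + q_{i,j+1}\right)
  \left[\left(\eta - \beta h\right)_{i,j+1}^{n+1} - \left(\eta - \beta h\right)_{i,j}^{n+1}\right].
\end{aligned}
\right.
\tag{3-25}
\end{equation}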
It should be noted that the depth-integrated non-hydrostatic model is only applicable to
moderate bottom slopes, as is the case for the Boussinesq equations. It has been demonstrated
that localized steep bottom gradients generally lead to instability of Boussinesq-type models
(Løvholt and Pedersen, 2009). Similarly, the solution to the linear system (3-17) becomes unstable
near steep bathymetric features. For simulation of dispersive tsunamis, large bottom slopes should
be avoided.

3.3 Moving Boundary Technique

In numerical computation of tsunami inundation, the real topography is represented by a series
of small steps located at the finite difference grids. At a submerged wet cell, the surface elevation η
is above the bottom and the total water depth D = η + h is positive. At a dry cell, the surface
elevation is not above the bottom, and the value of η depends on the location of the cell. If the dry
cell is in the sea (h ≥ 0), η is set to −h so that the total water depth D is zero. If it is on
land (h < 0), η is set to zero and D is negative. In PCOMCOT, we adopt the algorithm of
Cho and Kim (2009) to constrain the flow direction on the wet-dry boundary according to the local
topography and the surface elevation. By doing so, the moving shoreline is captured naturally from
the continuity equation. This moving boundary algorithm can stably compute tsunami run-up and
run-down on topographic discontinuities without special treatment of the governing equations. For
simplicity, only the cases in the x direction involved in the wet-dry problem are presented in Figure
3.2; the cases along the y direction are handled in the same way. The cases in Figure 3.2 are
described below in terms of the allowed flow direction.

(a) If the higher cell is wet (Di+1 > 0) and the water surface at the lower cell is above the bottom
of the higher one (ηi + hi+1 > 0), then the water can flow in both directions and the flux Mi
is directly calculated from the momentum equation.

(b) If the higher cell is dry (Di+1 ≤ 0) and the water surface at the lower cell is above the bottom
of the higher one (ηi + hi+1 > 0), then the water can only flow from the lower cell to the
higher one. The value of Mi from the momentum equation is retained if its sign is consistent
with the allowed direction, and is set to be zero if not. −hi+1 is used as the value of ηi+1
when calculating Mi to avoid false surface gradient due to the setting of η value on dry cells.

(c) If the higher cell is wet (Di+1 > 0) and the water surface at the lower cell is below the bottom
of the higher one (ηi + hi+1 ≤ 0), then the water can only flow from the higher cell to the
lower one. The value of Mi is set to be zero if its sign from momentum equation is opposite
to the allowed direction. −hi+1 is used as the value of ηi when calculating Mi , because the
water level at the lower cell would not influence the flow from above.

(d) If the higher cell is dry (Di+1 ≤ 0) and the water surface at the lower cell is below the bottom
of the higher one (ηi + hi+1 ≤ 0), then the water cannot flow between these two cells and Mi
becomes zero.
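The four cases above can be condensed into a small decision function. This is a sketch under stated assumptions rather than the PCOMCOT implementation: the name `constrain_flux` is hypothetical, cell i is taken to be the lower cell and cell i+1 the higher one, a positive flux is taken to point from cell i to cell i+1, and the substitution of −h(i+1) for η in cases (b) and (c) is assumed to have been done already when the input flux was computed.

```python
def constrain_flux(M, D_high, eta_low, h_high):
    """Apply cases (a)-(d) to the flux M between a lower cell i and a higher cell i+1.

    M > 0 means flow from the lower cell towards the higher cell.
    """
    high_is_wet = D_high > 0.0
    low_reaches_high = eta_low + h_high > 0.0
    if high_is_wet and low_reaches_high:
        return M                        # case (a): both directions allowed
    if low_reaches_high:
        return M if M > 0.0 else 0.0    # case (b): only lower -> higher
    if high_is_wet:
        return M if M < 0.0 else 0.0    # case (c): only higher -> lower
    return 0.0                          # case (d): no exchange
```

For the opposite arrangement (higher cell on the left), the same logic applies with the roles of the two cells and the sign convention swapped.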


Figure 3.2 1D wet-dry cases in x direction. MWL represents the mean water level, and the arrows
below the flux M denote the allowed directions of water flow.

The momentum equations together with the above constraints on flow direction determine the
volume flux on the wet-dry boundary. Once the continuity equation is solved at the next time step
based on the value of volume flux, the conversion between wet and dry is handled as follows.
* If $(\eta + h)^{n} \le 0$ and $(\eta + h)^{n+1} > (\eta + h)^{n}$, then water flows into the dry
cell. Because there is no water in the cell at the previous time step, the total water depth
$D^{n+1}$ is actually $(\eta + h)^{n+1} - (\eta + h)^{n}$. Thus, the value of $\eta^{n+1}$ is
modified to be $\eta^{n+1} - (\eta + h)^{n}$ to give the correct total water depth.
* If $(\eta + h)^{n} > 0$ and $(\eta + h)^{n+1} < (\eta + h)^{n}$, then water flows out of the wet
cell. If $(\eta + h)^{n+1}$ is less than a threshold, this cell is drained, and $\eta^{n+1}$ is set
to 0 when $h^{n+1} < 0$ or to $-h^{n+1}$ when $h^{n+1} \ge 0$. Otherwise, the cell remains wet and
$\eta^{n+1}$ from the continuity equation is retained. Note that we turn a wet cell into dry when
the total water depth is less than a positive threshold rather than zero, to avoid the instability
caused by a very thin water film when the tsunami recedes.
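The two conversion rules can be sketched for a single cell as follows. The function name `wet_dry_update` and the default threshold `min_depth` are hypothetical; the manual does not state the threshold value in this section, so the number used here is only a placeholder.

```python
def wet_dry_update(eta_old, eta_new, h_old, h_new, min_depth=0.01):
    """Return the corrected eta^{n+1} of one cell after the continuity step.

    eta_old, h_old : surface elevation and still-water depth at step n
    eta_new, h_new : values at step n+1 (eta_new straight from continuity)
    min_depth      : placeholder drying threshold (assumption of this sketch)
    """
    D_old = eta_old + h_old
    D_new = eta_new + h_new
    if D_old <= 0.0 and D_new > D_old:
        # Water enters a dry cell: remove the non-positive offset (eta + h)^n.
        return eta_new - D_old
    if D_old > 0.0 and D_new < D_old and D_new < min_depth:
        # The cell is drained: restore the dry-cell convention for eta.
        return 0.0 if h_new < 0.0 else -h_new
    return eta_new
```

For a dry land cell lying 2 m above the mean water level (h = −2, η = 0), an inflow that raises η to 0.3 in the continuity step yields a corrected η of 2.3, i.e. a total water depth of 0.3.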

3.4 Wave Breaking

Since the momentum equations of the depth-integrated non-hydrostatic model are only valid for
non-breaking waves, additional treatment is needed to handle wave breaking. In PCOMCOT, we use the
eddy-viscosity scheme of Kennedy et al. (2000) to model the energy dissipation caused by breaking.
This scheme adds artificial eddy viscosity terms to the right-hand side of the horizontal momentum
equations (2-22b, 2-22c) or (2-24b, 2-24c). These terms are expressed as

\begin{equation}
\left\{
\begin{aligned}
R_{bx} &= \nabla\cdot\left(\nu\nabla M\right), \\
R_{by} &= \nabla\cdot\left(\nu\nabla N\right),
\end{aligned}
\right.
\tag{3-26}
\end{equation}

where Rbx and Rby are the eddy viscosity terms for the x and y directions, respectively. Compared
with the original form of Kennedy et al. (2000), the cross-derivatives are removed in (3-26) to
improve numerical stability (Choi et al., 2018). ν is the artificial eddy viscosity varying with
time and space, which is defined as

\begin{equation}
\nu = B\,\delta_b^2\left(h + \eta\right)\eta_t,
\tag{3-27}
\end{equation}

where the mixing length coefficient δb is 1.0 in PCOMCOT. The coefficient B varies smoothly from
0 to 1 to avoid a sudden start of breaking and the related instability. Details about how the value
of B is evaluated can be found in the studies of Kennedy et al. (2000) and Chen et al. (2000). In
brief, B is a piecewise linear function of the ratio between $\eta_t$ and a parameter $\eta_t^{*}$
which determines the onset and cessation of breaking. A breaking event starts when $\eta_t$ exceeds
an initial threshold $\eta_t^{(I)}$ and terminates once $\eta_t$ becomes less than $\eta_t^{*}$. As
breaking develops, the parameter $\eta_t^{*}$ decreases gradually with the breaking event age from
$\eta_t^{(I)}$ to $\eta_t^{(F)}$. To estimate the age of each breaking event in the computational
region, the breaking history at a certain grid cell is tracked against the local wave celerity.
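A possible reading of this description is sketched below. The manual does not spell out the breakpoints of B or the relaxation law of η_t*, so the specific choices here (B rising linearly from 0 to 1 as the ratio goes from 1 to 2, and a linear relaxation of η_t* over a transition time `T_star`) are assumptions based on the form usually quoted from Kennedy et al. (2000); all function and parameter names are hypothetical.

```python
def breaking_threshold(age, eta_t_I, eta_t_F, T_star):
    """eta_t^* relaxing linearly from eta_t^(I) to eta_t^(F) over the time T_star.

    age is the time elapsed since the breaking event started (assumed >= 0).
    """
    if age >= T_star:
        return eta_t_F
    return eta_t_I + age / T_star * (eta_t_F - eta_t_I)


def smoothing_factor(eta_t, eta_t_star):
    """Piecewise-linear B in [0, 1] as a function of eta_t / eta_t_star."""
    ratio = eta_t / eta_t_star
    if ratio <= 1.0:
        return 0.0
    if ratio >= 2.0:
        return 1.0
    return ratio - 1.0


def eddy_viscosity(eta_t, eta_t_star, h, eta, delta_b=1.0):
    """Artificial eddy viscosity of (3-27)."""
    return smoothing_factor(eta_t, eta_t_star) * delta_b ** 2 * (h + eta) * eta_t
```

With this form the viscosity vanishes wherever η_t stays below the current threshold, so non-breaking parts of the wave field are left untouched.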

3.5 Nested Grid Configuration

A system of nested rectangular grids is employed in PCOMCOT to use various spatial resolutions
for different study regions. The largest grid layer covering the entire computational region is called
the 1st-level layer (top layer), the layers directly nested in the 1st-level layer are 2nd-level
layers, and so on. For any two directly nested layers, the outer layer is called the parent layer and
the inner one the child layer. Any grid size ratio between the parent and child layers is allowed.
Both one-way and two-way nesting algorithms can be used for exchanging information between
two nested layers. In one-way nesting, interpolation of the solution over the parent layer provides
the input boundary conditions for the child layer. Two-way nesting involves an extra
procedure of feeding back the surface elevation from the child to the parent layer.
The feed-forward from the parent to child layer is illustrated in Figure 3.3. For the time step
size ∆t given to the top layer, the time step size of a certain layer is set to ∆t/k, where k is
the smallest integer which gives a CFL (Courant-Friedrichs-Lewy) number no larger than that
of the top layer; k1 and k2 below denote this integer for the parent and child layers, respectively.
Beginning at time t, the parent layer is first computed with the time step size ∆t/k1 until
t + ∆t. The surface elevation η, volume flux (M, N ), non-hydrostatic pressure q and
artificial eddy viscosity ν at t + ∆t over the parent layer are interpolated to the grid cells on the
boundary of the child layer. Then, the child layer is calculated with the time step size ∆t/k2 , and
at each step, the solution on the child layer boundary is predefined by assuming a linear change of
these variables from t to t + ∆t. Note that for calculation of nonlinear terms, two rows/columns of
variables on the child layer boundary need to be interpolated from the parent layer. In Figure 3.3,
only one row/column is plotted for simplicity.
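The choice of the sub-step ratio k and the linear change of the boundary values in time can be sketched as follows. The CFL definition used here (long-wave celerity based on the maximum depth of a layer) and all function names are assumptions made for illustration; the manual does not state in this section how the CFL number of a layer is evaluated.

```python
import math


def cfl_number(dt, dx, h_max, g=9.81):
    """CFL number of a layer, assuming the celerity sqrt(g * h_max)."""
    return math.sqrt(g * h_max) * dt / dx


def substep_ratio(dt_top, dx_top, h_top, dx_layer, h_layer):
    """Smallest integer k such that dt_top / k keeps the layer's CFL number
    no larger than that of the top layer."""
    cfl_top = cfl_number(dt_top, dx_top, h_top)
    k = 1
    while cfl_number(dt_top / k, dx_layer, h_layer) > cfl_top:
        k += 1
    return k


def boundary_value(v_start, v_end, step, k):
    """Boundary variable at sub-step `step` (1..k), changing linearly
    from its value at t to its value at t + dt."""
    return v_start + (v_end - v_start) * step / k
```

For instance, a child layer with a quarter of the parent's grid size but much shallower water may need only k = 2 rather than k = 4, because the lower wave celerity partly offsets the finer grid.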


Figure 3.3 Feed-forward from the parent to the child layer. The red box in the parent layer denotes
the boundary of the child layer. The red dots and arrows in the child layer represent η, q, ν and
(M, N ) interpolated from the parent layer, while the black dots and arrows indicate those obtained
by solving the governing equations.

The feedback from the child to parent layer is sketched in Figure 3.4. If two-way nesting
is used, the surface elevation over the parent layer is updated with the average value from the
overlapping child layer grids when the computation in all layers reaches t + ∆t. After the feedback
of surface elevation, the momentum equations are solved again in the parent layer to keep (M, N ),
q, ν coupled with η.
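The averaging step of the feedback can be sketched as below. This illustration assumes an integer grid size ratio and child cells aligned with the parent cells, which is a simplification: PCOMCOT allows any ratio. The function name `feedback_average` and the nested-list storage are hypothetical.

```python
def feedback_average(eta_child, ratio):
    """Average ratio x ratio blocks of child cells onto the overlapping parent cells.

    eta_child is a list of rows; its dimensions are assumed to be multiples
    of `ratio` (simplifying assumption of this sketch).
    """
    ny = len(eta_child) // ratio
    nx = len(eta_child[0]) // ratio
    eta_parent = [[0.0] * nx for _ in range(ny)]
    for J in range(ny):
        for I in range(nx):
            block = [eta_child[J * ratio + b][I * ratio + a]
                     for b in range(ratio) for a in range(ratio)]
            eta_parent[J][I] = sum(block) / len(block)
    return eta_parent
```

After such an update of η, the parent-layer momentum equations would be solved again, as described above, so that the fluxes stay consistent with the new surface elevation.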


Figure 3.4 Feedback from the child to the parent layer. The red and black dots in the child layer
denote the surface elevation interpolated from the parent layer and calculated from the continuity
equation, respectively. In the parent layer, the magenta dots indicate the grid points where η is
updated with the average value from the child layer. After feedback of η, the related q, ν and
(M, N ) which are colored blue are updated by solving the momentum equations again.

3.6 Parallel Implementation

3.6.1 CPU Parallelization

We use the domain decomposition technique for the parallel implementation of PCOMCOT on multiple
CPU cores. As shown in Figure 3.5, the domain of each grid layer is partitioned into multiple
rectangular subdomains of almost the same dimensions for load balance. These
subdomains are assigned to different computing nodes, and data exchanges between adjacent nodes
are realized with the MPI point-to-point communication routines. For a nested grid system, both
intra- and inter-layer communication are needed. The intra-layer communication is performed between
subdomains of a single layer, while the inter-layer communication transfers the data between two
layers which are directly nested. To solve the Poisson-type equation of non-hydrostatic pressure,
both the Bi-CGSTAB algorithm and the ILU-preconditioning are parallelized based on the same
strategies of domain decomposition and data exchange.


Figure 3.5 Domain decomposition for parallel computation. The subdomains of each layer are
assigned to different nodes. The numbers after # are the MPI ranks of these nodes, starting from
0. The total numbers of subdomains along the x and y directions are npartx and nparty, respectively.

The intra-layer communication is illustrated in Figure 3.6. To calculate the variables on the
boundary of a subdomain (i.e., η, q, ν, M and N ), two extra rows/columns of required data are
transferred from each of the four neighboring subdomains. For clarity, only one
row/column is plotted here. It should be noted that, to correctly synchronize the variables around
the four corners of each subdomain, data transfer needs to be performed in order. That is, each
computing node must first receive data from its left and right neighbors, and then from the top and
bottom ones, or vice versa.
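The rank layout implied by Figures 3.5 and 3.6 (ranks increasing along x first, so that the neighbors of rank #i are #i−1, #i+1, #i−npartx and #i+npartx) can be sketched as follows. The function names and the even-split rule for the cell ranges are assumptions made for this illustration and are not taken from the PCOMCOT source.

```python
def split_range(n_cells, n_parts, p):
    """Half-open index range [start, stop) of part p when n_cells are split
    as evenly as possible among n_parts subdomains."""
    base, extra = divmod(n_cells, n_parts)
    start = p * base + min(p, extra)
    return start, start + base + (1 if p < extra else 0)


def neighbor_ranks(rank, npartx, nparty):
    """Left, right, bottom and top neighbors of a rank (None at the domain edge)."""
    ix, iy = rank % npartx, rank // npartx
    return {
        "left": rank - 1 if ix > 0 else None,
        "right": rank + 1 if ix < npartx - 1 else None,
        "bottom": rank - npartx if iy > 0 else None,
        "top": rank + npartx if iy < nparty - 1 else None,
    }
```

Exchanging with the left and right neighbors first, and only then with the bottom and top ones (including the freshly received halo columns), is what carries the corner values across diagonal neighbors without a separate diagonal message.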


Figure 3.6 Intra-layer communication. Boundaries of subdomains are denoted by thick black lines.
The dots and arrows represent the variables η, q, ν, M and N sent to node #i at each time step.
The blue rounded boxes together with the large hollow arrows indicate from which nodes these
variables are sent, with the circled numbers showing the order of data transfer.

Figure 3.7 displays how the inter-layer communication is done. Suppose that a grid cell on
the child layer boundary is assigned to node #0, but its location in the parent layer belongs to the
subdomain assigned to node #5. Then, in the feed-forward from the parent to child layer, values of
η, q, ν, M and N at this location are first interpolated on node #5 using the parent layer data,
and these interpolated values are then sent to node #0 and stored at the child layer grid. A similar
procedure is followed in the feedback from the child to parent layer. Suppose that a grid cell of the
parent layer is assigned to node #(4 + npartx), but its location in the child layer is covered by node
#0. Then, η at this location is calculated on node #0 by averaging the surrounding child layer
data, and is then sent to node #(4 + npartx).


Figure 3.7 Inter-layer communication. Thick black and thick blue lines represent the boundaries
of parent layer and child layer subdomains, respectively. In the child layer, the solid blue dots
indicate the boundary cells where variable values are interpolated from the parent layer, while the
hollow ones are the interior cells where the surface elevation is averaged to update η in the parent
layer.

To parallelize the ILU-preconditioned Bi-CGSTAB algorithm (Barrett et al., 1994; van der
Vorst, 1992), all the involved matrices and vectors are distributed into the same subdomains. In
every iteration, matrix and vector elements on the boundaries of subdomains are synchronized via
intra-layer communication, and the scalars are updated through MPI collective communication. To
enable parallel solving of the triangular preconditioning matrices, any coefficient in (3-17) relating
different nodes is omitted when constructing these matrices. For example, if qi,j and qi−1,j are
stored on different nodes, the coefficient a1i,j is assumed to be zero. We test our implicit solver
against the Matlab function bicgstab, which solves sparse linear systems with the unpreconditioned
Bi-CGSTAB (Copyright 1984-2022 The MathWorks, Inc.). As shown in Figure 3.8, with the tolerance
set to $10^{-6}$, the difference between the non-hydrostatic pressure given by the Matlab bicgstab
and our parallel preconditioned version is less than $5 \times 10^{-7}$. The number of iterations in
this example is 6 for PCOMCOT running on 40 CPU cores, and 17 for the Matlab bicgstab.
[Figure 3.8 image: three panels titled “PCOMCOT”, “Matlab bicgstab” and “Difference” (scale ×10^-7);
axes x (meter) and y (meter).]

Figure 3.8 Comparison of non-hydrostatic pressure given by PCOMCOT and the Matlab function
bicgstab.

3.6.2 GPU Parallelization

We employ the CUDA Fortran programming model to accelerate PCOMCOT with an NVIDIA
GPU. All the computational tasks are offloaded to the GPU through CUDA kernels, i.e., subprograms
which run on the device (GPU) but are called from the host (CPU). When a kernel is invoked,
it launches a grid of thread blocks, with all the threads executing the same code on different data.
This data parallelism is fine-grained, and is most efficient when adjacent
threads operate on adjacent data (Ruetsch and Fatica, 2024). Thus, we map the entire simulation
domain to a 2D grid of thread blocks, with each thread corresponding to a single cell or cell
interface. As shown in Figure 3.9, the built-in variables BlockIdx, BlockDim, and ThreadIdx are used
to identify different threads and relate them to the array indices. This mapping strategy heavily
oversubscribes GPU cores and thus effectively hides memory latency.

[Figure 3.9 schematic: thread blocks are indexed by BlockIdx%X (k, k+1, ...) and BlockIdx%Y (n, n+1, ...); each block contains BlockDim%X × BlockDim%Y threads, indexed by (ThreadIdx%X, ThreadIdx%Y) starting from (1,1).]

Figure 3.9 Mapping of the simulation domain to a 2D grid of thread blocks.
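
As an illustration of this mapping, the following Python sketch emulates the usual CUDA Fortran index arithmetic, i = (blockIdx%x - 1) * blockDim%x + threadIdx%x, with 1-based indices. The grid size and block dimensions are hypothetical values chosen for the example, not PCOMCOT defaults.

```python
def global_index(block_idx, block_dim, thread_idx):
    """1-based global array index from 1-based block and thread indices."""
    return (block_idx - 1) * block_dim + thread_idx


def blocks_needed(n_cells, block_dim):
    """Number of thread blocks required to cover n_cells along one axis."""
    return (n_cells + block_dim - 1) // block_dim


nx, ny = 100, 70   # hypothetical number of cells in x and y
bx, by = 32, 8     # hypothetical thread-block dimensions

grid = (blocks_needed(nx, bx), blocks_needed(ny, by))
print(grid)  # (4, 9)

# Threads whose global index falls outside the domain must do nothing.
active_x = [global_index(b, bx, t)
            for b in range(1, grid[0] + 1)
            for t in range(1, bx + 1)
            if global_index(b, bx, t) <= nx]
print(len(active_x))  # 100
```

Because the block count is rounded up, the last block along each axis generally contains idle threads, which is why a bounds check on the global index is needed inside each kernel.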

To efficiently solve the non-hydrostatic pressure on GPU, we utilize NVIDIA’s linear algebra
libraries, cuBLAS and cuSPARSE, to perform the Bi-CGSTAB method. However, due to
the Poisson-type nature of the equation, the ILU preconditioning is essentially sequential and not
suitable for fine-grained parallelism. As the linear system (3-17) is usually diagonally dominant,
we retain only the diagonal coefficients (i.e., $a5_{i,j}$) for preconditioning. Compared with the
ILU-preconditioned Bi-CGSTAB, this “D-preconditioned” version requires ∼3 times as many iterations for
convergence, but is still much faster than the unpreconditioned version. For the case of the 2011
Tohoku tsunami, the numbers of iterations required by the ILU-preconditioned, D-preconditioned,
and unpreconditioned solvers are around 10, 30 and >400, respectively.
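
The idea of diagonal preconditioning can be sketched in a few lines of pure Python. This is a textbook preconditioned Bi-CGSTAB on a small dense test matrix, given only as an illustration: it does not use cuBLAS or cuSPARSE, and the matrix, right-hand side, and function names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab_diag(A, b, tol=1e-10, max_iter=200):
    """Bi-CGSTAB with the diagonal preconditioner M = diag(A).

    Returns the solution and the number of iterations used.
    Assumes b is nonzero and A has a nonzero diagonal.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = b[:]            # residual for the zero initial guess
    r_hat = r[:]        # fixed shadow residual
    rho_old = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        rho = dot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = [d * pi for d, pi in zip(inv_diag, p)]   # y = M^{-1} p
        v = matvec(A, y)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            x = [xi + alpha * yi for xi, yi in zip(x, y)]
            return x, it
        z = [d * si for d, si in zip(inv_diag, s)]   # z = M^{-1} s
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        rho_old = rho
    raise RuntimeError("Bi-CGSTAB did not converge")


# Small diagonally dominant tridiagonal system (hypothetical test case).
diag = [4.0, 5.0, 6.0, 7.0, 8.0]
A = [[diag[i] if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(5)] for i in range(5)]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
x, iterations = bicgstab_diag(A, b)
print(x, iterations)
```

Applying the preconditioner is an element-wise multiplication, so every entry can be processed by an independent thread. That is the property which makes this choice attractive on a GPU despite the larger iteration count.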

3.7 Boundary Conditions

Two types of boundary conditions, wall boundary and absorbing boundary, are provided in PCOMCOT.
For the absorbing boundary condition, we use a combination of the L-D type sponge layer (Larsen
and Dancy, 1983) and the friction-type sponge layer. The L-D type sponge layer directly attenuates
the variables near the domain boundary at every time step, i.e.,

$$(\eta, M, N) = (\eta, M, N)/C_s. \tag{3-28}$$

The damping coefficient $C_s$ is defined as

$$C_s = A^{\gamma^{\,i-1}}, \quad i = 1, 2, \ldots, I_{width}, \tag{3-29}$$

in which $A$ and $\gamma$ are free parameters, suggested by Chen et al. (1999) to be 2.0 and
0.88∼0.92, respectively; $i$ is the grid index counted in the direction away from the boundary, and
$I_{width}$ is the layer width in number of grid points. Because the L-D type sponge is efficient,
a narrow layer (i.e., width ≈ the typical wavelength) is sufficient. However, Shi et al. (2016) pointed out that
this type of sponge layer generates sawtooth noise, which may grow to be significant in long-term
simulations. To minimize the noise and make the damping more effective, the friction-type sponge
is also implemented in the same area as the L-D type. The friction-type sponge uses the friction
terms in the momentum equations, and thus attenuates the waves smoothly. In the sponge layer, an
extra Manning’s roughness coefficient $n_s$, which increases gradually toward the boundary, is added
to the original value. The form given by Shi et al. (2016) is adopted as the expression of $n_s$, that is

$$n_s = n_{max}\left[1 - \tanh\frac{10(i-1)}{I_{width}-1}\right], \tag{3-30}$$

where $n_{max}$ is the maximum value of Manning’s coefficient in the sponge layer. The combination of
L-D and friction-type sponge layers gives satisfactory absorption in our test, without introducing
oscillations to the interior of computation domains.
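
To make the behaviour of (3-29) and (3-30) concrete, the short Python sketch below tabulates both coefficients across a sponge layer. The layer width of 20 points and the value of `n_max` are hypothetical and chosen only for illustration; `A` and `gamma` follow the range quoted above.

```python
import math


def ld_damping(width, A=2.0, gamma=0.9):
    """Damping coefficient C_s of (3-29) for i = 1..width (i = 1 at the boundary)."""
    return [A ** (gamma ** (i - 1)) for i in range(1, width + 1)]


def friction_sponge(width, n_max):
    """Extra Manning coefficient n_s of (3-30) for i = 1..width (width > 1)."""
    return [n_max * (1.0 - math.tanh(10.0 * (i - 1) / (width - 1)))
            for i in range(1, width + 1)]


width = 20     # hypothetical layer width in grid points
n_max = 0.2    # hypothetical maximum Manning coefficient

cs = ld_damping(width)
ns = friction_sponge(width, n_max)
for i in (1, 5, 10, 20):
    print(i, round(cs[i - 1], 4), round(ns[i - 1], 6))
```

Both coefficients are strongest at the boundary (i = 1, where $C_s = A$ and $n_s = n_{max}$) and relax monotonically toward their neutral values ($C_s \to 1$, $n_s \to 0$) at the inner edge of the layer.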

3.8 Dealing with Numerical Instability

The numerical model of PCOMCOT shows good stability in large-scale and long-term simulations.
However, numerical instability cannot be avoided completely by any scheme. Here we give some
suggestions for dealing with numerical instabilities.

1. Use the flux-centered scheme.


As mentioned in Section 3.1, the FTCS scheme many models use to solve LSWEs may cause
unphysical oscillations near shorelines, due to absence of diffusive terms. The flux-centered
scheme reduces these oscillations by introducing slight numerical diffusion. Combined with the
adaptive diffusive coefficient, it generally provides satisfactory stability without overdamping
the wavefield. Based on our experience, the flux-centered scheme is always recommended, as
long as no overdamping is found.

2. Avoid steep bathymetry.


The non-hydrostatic model becomes unstable on rapidly varying bathymetry. The instability
related to steep bathymetry usually appears as spurious wave sources near the coast. Large
gradients of water depth should be avoided when computing dispersive waves. Based on our
experience, a bottom slope less than 0.5 (i.e., $|\nabla h| < 0.5$) generally gives stable results. The
grdfilter module of the GMT (Generic Mapping Tools) software (Wessel et al., 2013) can be used
to remove localized steep features. By setting the -F option to -Fgwidth, we can apply
a Gaussian filter to the bathymetry grids, where the parameter width is the width of the
Gaussian function (i.e., 6 times the standard deviation) in km. According to the stability
criterion of Horrillo et al. (2006), the parameter width should be ∼4.5 times the average
water depth.

3. Limit huge velocity with a Froude number cap.


In tsunami simulations, unrealistic velocity may occur in extremely shallow water. A Froude
number cap is used in PCOMCOT to address this problem. In our tests, the Froude number
cap is set to be 10.0, and no instability is found. A smaller value can be used to ensure
model stability. Note that when calculating inundation, a Froude number cap less than 1.5 is
undesirable, because it caps the velocity on the flooding water fronts (Shi et al., 2016).

4. Adjust Water depth Limit for wet −> dry.


The parameter Water depth Limit for wet −> dry sets the threshold water depth
below which a grid cell is treated as dry. Suggested values are 0.01 m for
real tsunamis and 0.001 m for small-scale lab scenarios. As a thin water film may cause large
velocity during the run-down process, users can set a larger value for this parameter when
instability occurs.
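
The Froude number cap in suggestion 3 can be sketched as follows. This is a minimal illustration, not the PCOMCOT routine: it assumes the cap is applied to the magnitude of the depth-averaged velocity, with Fr = |u| / sqrt(gH), and the function name and arguments are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def cap_flux(M, N, H, fr_cap=10.0):
    """Rescale the volume flux (M, N) so that the Froude number does not
    exceed fr_cap. H is the total water depth; dry cells carry no flux."""
    if H <= 0.0:
        return 0.0, 0.0
    speed = math.hypot(M, N) / H          # depth-averaged speed
    limit = fr_cap * math.sqrt(G * H)     # largest speed allowed by the cap
    if speed <= limit:
        return M, N
    scale = limit / speed
    return M * scale, N * scale


# A 1 cm water film moving at 10 m/s has Fr of about 32 and is capped to Fr = 10.
M, N = cap_flux(0.1, 0.0, 0.01)
print(M, N)
```

In this form the flow direction is preserved and only the magnitude is reduced, so the cap acts only on cells with extremely thin, fast-moving water.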

Although no numerical filtering is needed in our tests, we still provide a 2D 9-point numerical
filter in case there are other instabilities. The first-order filter of Shapiro (1970) can be applied to
$(\eta, M, N)$ at a given time interval. Note that this numerical filtering is not suggested in
general, because it significantly damps the wave field after being used just a few times.
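
For reference, a 9-point smoother of this kind can be sketched in Python as below. The stencil is an assumption: it is the tensor product of the one-dimensional first-order Shapiro operator (1, 2, 1)/4, and the treatment of boundary cells (left unchanged here) may differ from what PCOMCOT does.

```python
def shapiro_9pt(f):
    """Apply one pass of the 9-point stencil [1 2 1; 2 4 2; 1 2 1] / 16
    to the interior of the 2D list f. Boundary values are copied unchanged."""
    ny, nx = len(f), len(f[0])
    out = [row[:] for row in f]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            edges = f[j][i - 1] + f[j][i + 1] + f[j - 1][i] + f[j + 1][i]
            corners = (f[j - 1][i - 1] + f[j - 1][i + 1]
                       + f[j + 1][i - 1] + f[j + 1][i + 1])
            out[j][i] = (4.0 * f[j][i] + 2.0 * edges + corners) / 16.0
    return out


# Grid-scale (checkerboard) noise is removed in a single pass.
noise = [[(-1.0) ** (i + j) for i in range(5)] for j in range(5)]
print(shapiro_9pt(noise)[2])
```

The stencil removes two-grid-interval oscillations completely and leaves a uniform field untouched, but it also attenuates every intermediate wavelength a little on each pass, which is the cumulative damping the note above warns about.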

4 Configuration, Input and Output

4.1 Programming Flow

The CPU version of PCOMCOT (PCOMCOT-CPU) is written using Fortran 90 and the MPI
library. When running on multiple CPU cores, the computing node with MPI rank 0 is used as the
master node, while the others are slave nodes. The input data are read on the master node and
then broadcast to slave nodes. Numerical simulation is performed simultaneously on all nodes. For
data output, simulation results on slave nodes are sent to the master node, where the output files
are generated. Figure 4.1 displays the detailed programming flow of PCOMCOT-CPU.

[Figure 4.1 flowchart: MPI Initialize; preprocessing (read input files; determine nesting relationship of grid layers; divide domains and calculate communication tables; broadcast general parameters and bathymetry data to slave nodes; calculate parameters for simulation); loops over coarse and fine time steps and over layers (get layer boundary η, q, ν, M, N from parent layer; calculate η; calculate non-dispersive, non-breaking (M, N); construct implicit equations of q and update q by Bi-CGSTAB iteration until the convergence criterion is met; calculate dispersive (M, N) with q; calculate ν; calculate breaking (M, N) with ν; each variable is exchanged at subdomain edges after it is updated); reverse loop over layers (update η by averaging child layer result; calculate (M, N) again); output η, q, M, N when needed; MPI Finalize.]

Figure 4.1 Programming flow of PCOMCOT-CPU. The red boxes indicate the steps that are done
only on the master node. The magenta and green boxes denote the procedures exclusive to child
and parent grid layers, respectively.

The GPU version (PCOMCOT-GPU) is written with CUDA FORTRAN. CUDA FORTRAN
is a hybrid programming model, where both the CPU and GPU are utilized. The host CPU is
responsible for workflow control and pre-/post-processing, while all the computations are carried
out by the GPU device. Variables are stored in both the host memory and the device global
memory, and data synchronization is realized via host-device memory transfer. The programming
flow of PCOMCOT-GPU is plotted in Figure 4.2.

[Figure 4.2 flowchart. CPU (host) boxes: allocate host variables; read input files; determine nesting relationship of grid layers; calculate parameters for simulation; output η, q, M, N when needed. Transfers: allocate device memory and host-to-device data transfer before the simulation; device-to-host data transfer for output; release device memory at the end. GPU (device) boxes, inside loops over coarse and fine time steps and over layers: get layer boundary η, q, ν, M, N from parent layer; calculate η; calculate non-dispersive, non-breaking (M, N); construct implicit equations of q and update q by Bi-CGSTAB iteration until converged; calculate dispersive (M, N) with q; calculate ν; calculate breaking (M, N) with ν; then, in a reverse loop over layers, update η by averaging child layer result and calculate (M, N) again.]

Figure 4.2 Programming flow of PCOMCOT-GPU. The magenta and green boxes denote the
procedures exclusive to child and parent grid layers, respectively.

4.2 Compiling Source Files

The CPU and GPU versions of PCOMCOT have different platform requirements and compilation
options. Table 4.1 provides an overview of these requirements and options.

Table 4.1 Overview of platform requirements and compilation options for PCOMCOT

Version                    CPU                                                     GPU
System                     Linux, MacOS                                            Linux
Hardware Requirements      Intel® CPU                                              NVIDIA® GPU (Compute Capability ⩾ 6.0)
Software Requirements      Required: gfortran, OpenMPI; Optional: NetCDF-Fortran   NVIDIA® HPC SDK
NetCDF Support             Yes                                                     No
Floating-point Precision   Double                                                  Single or Double
Compilation Options        With NetCDF: make; Without NetCDF: make nocdf           Single Precision: make; Double Precision: make double

4.2.1 Compilation of CPU Version

For the CPU version of PCOMCOT, all the source files and their descriptions are listed in Table
4.2. To compile these files, gfortran (the GNU Fortran compiler) and OpenMPI (a widely
used message passing interface library) must be installed. In addition, netcdf-fortran, the netCDF
programming interface for Fortran, is optional; it enables PCOMCOT to read
input files in netCDF format.
We have compiled and tested PCOMCOT-CPU on MacOS and Linux systems. On MacOS, it
is recommended to install the libraries and software mentioned above using a package management
tool such as Homebrew. On Debian-based Linux distributions (e.g., Ubuntu), they can be installed
with the built-in package manager APT, by executing the following commands in the terminal.

$ sudo apt install gcc g++ gfortran
$ sudo apt install openmpi-bin openmpi-doc libopenmpi-dev
$ sudo apt install libnetcdf-dev libnetcdff-dev

Table 4.2 Source Files for PCOMCOT-CPU

File Name Description

pcomcot.f90 main program of PCOMCOT

pcomcotLIB.f90 all subroutines except solving equations

MPICommunicationLIB.f90 subroutines for MPI communication

pcomcotNetCDFlib.f90 subroutines reading netCDF files

pcomcotNetCDFlibEmpty.f90 empty subs for compiling without netCDF library

solveSWEs.f90 subroutines solving shallow water equations

dispersion.f90 subroutines calculating dispersive terms

breaker.f90 subroutines calculating wave breaking

BiCGStabLIB.f90 preconditioned Bi-CGSTAB for solving non-hydrostatic pressure

VariableDefination.f90 definition of structures used in PCOMCOT

okada.f90 Okada’s formula for calculating seabed displacement

After getting everything ready, you can compile the source files by running a “make” or “make
nocdf” command, depending on whether or not netcdf-fortran is available. To use the “make”
command, the path of netcdf-fortran needs to be specified according to the user’s environment.
In the makefile, the -I flag specifies the location of the module file “netcdf.mod”, and the -L flag
indicates the path of the shared library. After netcdf-fortran has been installed successfully, these
paths can be obtained with the “nf-config” command (e.g., “nf-config --fflags” and “nf-config --flibs”).

4.2.2 Compilation of GPU Version

All the source files for the GPU version of PCOMCOT are listed in Table 4.3. These files should
be compiled with the CUDA Fortran compiler, which is part of NVIDIA HPC SDK (Software
Development Kit), a suite of HPC compilers and libraries for CPU and GPU programming. How
to download and install HPC SDK is detailed in the official guide at
https://docs.nvidia.com/hpc-sdk/hpc-sdk-install-guide/index.html. Note that due to a conflict between the CUDA
Fortran compiler and the GNU Fortran compiler, NetCDF is not supported by the GPU version
of PCOMCOT. In addition, NVIDIA HPC SDK is currently available only for Linux systems.

Table 4.3 Source Files for PCOMCOT-GPU

File Name Description

pcomcot.cuf main program of PCOMCOT (controlled by CPU)


pcomcotLIB.cuf all CPU subroutines
pcomcotLIB_CUDA.cuf all GPU subroutines except solving equations
pcomcotNetCDFlibEmpty.cuf empty CPU subs for compiling without netCDF library
solveSWEs.cuf GPU subroutines solving shallow water equations
dispersion.cuf GPU subroutines calculating wave dispersion
breaker.cuf GPU subroutines calculating wave breaking
VariableDefination.cuf definition of host and device structures
okada.cuf GPU subroutines for Okada’s formula

After the installation, some environment settings are required to compile PCOMCOT. First,
specify the path of the compiler by running
$ export NVDIR=/usr/local/nvidia/hpc_sdk
$ export NVARCH=`uname -s`_`uname -m`
$ export SDKVER=24.3
$ export SDKDIR=$NVDIR/$NVARCH/$SDKVER
$ export PATH=$SDKDIR/compilers/bin:$PATH
Here NVDIR is the installation location of NVIDIA HPC SDK, NVARCH is the system’s architecture
type (e.g., Linux_x86_64), and SDKVER is the SDK version number. Users should adjust NVDIR
and SDKVER according to their own installation. Then, because the cuSPARSE library has depended on
the nvJitLink library since CUDA 12.0, we must set the environment variable LD_LIBRARY_PATH
to include the path where the file “libnvJitLink.so” is located. Users can locate it by searching
for the file name in the directory SDKDIR. Usually, the path of “libnvJitLink.so” can be added to
LD_LIBRARY_PATH by running
$ export CUDAVER=12.3
$ export LD_LIBRARY_PATH=$SDKDIR/cuda/$CUDAVER/targets/$NVARCH/lib:$LD_LIBRARY_PATH
where CUDAVER is the version of the CUDA toolkit bundled with the SDK and should be adjusted by users.

Finally, you can finish the compilation with a “make” or “make double” command, for single-
and double-precision computing, respectively. We provide these two options because NVIDIA’s
consumer and workstation GPUs (e.g., GeForce and Quadro series) have only a small number of
double-precision arithmetic units. These GPUs are therefore recommended only for single-precision
simulations. Double-precision computing is well supported by NVIDIA’s data center GPUs (Tesla series),
but their double-precision performance is generally about half of their single-precision performance.

4.3 Input

There are eight input files in total for PCOMCOT, some required and some
optional. All the input files are listed in Table 4.4, along with their descriptions. These input
files can be classified into three types according to their usage: control files, bathymetry files,
and initial condition files. More detailed instructions on how to prepare these files are given in the
following subsections.

Table 4.4 Input Files for PCOMCOT

Input File Name Description

pcomcot.ctl basic information for simulation; required

layers.ctl layer-specific parameters; optional

layerXX.xyz/nf bathymetry data files; required

InitialElevation.xyz/nf initial surface elevation; required when Initial Condition=0

InitialFluxM.xyz initial volume flux in x direction; optional when Initial Condition=0

InitialFluxN.xyz initial volume flux in y direction; optional when Initial Condition=0

FaultParameters.ctl finite fault parameters; required when Initial Condition=1 or Purpose of Calculation=2
Stations.ctl coordinates of output stations; required

Control files of PCOMCOT include “pcomcot.ctl”, “layers.ctl” and “Stations.ctl”, which are
used to set the parameters for governing equations, numerical schemes, and the input/output data.
“pcomcot.ctl” is the most basic control file, providing the necessary information for simulation.
“layers.ctl” prescribes the simulation parameters specific to each grid layer; its parameters are a
subset of those in “pcomcot.ctl”. If “layers.ctl” is not given by the user, the values in “pcomcot.ctl”
will be used as the global values for all layers. “Stations.ctl” is required and gives the locations of
stations where time histories of variables are output.
The bathymetry files “layerXX.xyz/nf” store the gridded data of water depth, each of which
corresponds to a single layer in the nesting system. Users can provide bathymetry files either in
netCDF format with the extension “.nf” or in three-column ASCII format with the extension
“.xyz”.
The initial condition files “InitialElevation.xyz/nf”, “InitialFluxM.xyz”, “InitialFluxN.xyz”,
and “FaultParameters.ctl” provide the initial condition of computation. For simulating the propaga-
tion of water surface disturbance or general water waves, “InitialElevation.xyz/nf”, “InitialFluxM.xyz”
and “InitialFluxN.xyz” give the initial values of surface elevation and horizontal volume fluxes. For
simulating earthquake tsunamis, “FaultParameters.ctl” provides the parameters of finite faults as
the tsunami source.
Note that input files in netCDF format cannot be used with the GPU version of PCOMCOT.

4.3.1 Parameters in pcomcot.ctl

The control file “pcomcot.ctl” is required and provides the necessary information for the PCOMCOT
model. The parameters in “pcomcot.ctl” are described below.

Basic Control Parameters

Purpose of Calculation: 1 – forward simulation of tsunami waves; 2 – calculate Green’s functions
of finite faults; 3 – calculate Green’s functions of initial surface elevation.

Initial Condition: 0 – initial water surface elevation (required) and volume flux (optional) are given
as initial condition, to simulate transient wave propagation; 1 – finite fault parameters are
given as initial condition, to simulate earthquake tsunamis.

Coordinate System: 0 – spherical coordinates on Earth surface; 1 – Cartesian coordinates.

Total run time: total simulation time in seconds.

Time step: time step size in seconds for the top grid layer. Time step sizes for the other layers are
determined automatically.

Time interval to Save Snapshots: time interval in seconds to save snapshots of the wavefield during
simulation.

Save Flux: 1 – output snapshots and time histories of both surface elevation η and volume flux
(M, N ); 0 – only η is output.

Save Non-hydrostatic Pressure: 1 – output snapshots and time histories of non-hydrostatic pressure
at the bottom; 0 – non-hydrostatic pressure is not output. Note that the non-hydrostatic
pressure can be output only when wave dispersion is computed.

Minimum grids on each computing node: minimum number of grid cells assigned to each comput-
ing node when running on multiple CPU cores; has no effect in the GPU version.

Feedback to parent layer: options for different grid nesting algorithms. 0 – one-way nesting from
parent to child layers; 1 – two-way nesting including feedback from child to parent layers.
Note that we have not tested the two-way nesting yet.

Parameters for Surface Deformation

Consider Horizontal Motion: 1 – when calculating sea surface deformation caused by earthquakes,
consider the contribution of horizontal motion to seafloor uplift using the equation of Tanioka
and Satake (1996); 0 – do not consider horizontal motion.

Apply Kajiura filter: 1 – apply Kajiura filter to the initial surface deformation, which accounts for
the low-pass filtering effect of sea water; 0 – do not apply Kajiura filter.

Use average depth for Kajiura: 1 – use the average water depth of the source region when applying
Kajiura filter; 0 – use the varying local depth. We suggest setting this parameter to be 1,
because using varying local depth may lead to oversmoothing near the trench.

Water depth Limit for Kajiura: the minimum water depth in meters for applying Kajiura filter.
The low-pass filtering effect of sea water should be ignored in shallow water.

Parameters for Wave Physics and Numerics

Nonlinearity: parameter for wave nonlinearity. 0 – no wave nonlinearity; 1 – inclusion of nonlinear
convection terms.

Dispersion: parameter for wave dispersion. 0 – no wave dispersion; 1 – inclusion of dispersive terms.

Depth change for Dispersion: parameter for the steepness of bathymetry. 0 – smooth bottom; 1
– relatively steep bottom. For a smooth bottom, the new governing equations (2-21) with
α = 2/3, β = 0.5 are used. For a steep bottom, the previous model with α = 0.5, β = 1.0
(Stelling and Zijlema, 2003; Yamazaki et al., 2009; Zijlema and Stelling, 2008) is adopted.
For more accurate modeling of dispersive effects, we recommend setting this parameter to
0 for tsunami propagation in the deep ocean, and to 1 for near-shore transformations. In
practice, the choice of this parameter does not cause significant differences in the leading
and first trailing waves.

Water depth Limit for Dispersion: minimum water depth in meters for calculating wave dispersion.
Dispersive effects are ignored in very shallow water for both efficiency and stability.

Breaking: parameter for wave breaking; 0 – no wave breaking; 1 – inclusion of wave breaking using
eddy-viscosity scheme.

Scheme for LSWEs: options for the numerical scheme to solve the shallow water equations. 0 –
traditional forward time-centered space (FTCS) scheme without numerical diffusion; 1 – flux-
centered scheme with extra numerical diffusion. The flux-centered scheme is recommended
for better stability.

Froude number Cap to Limit velocity: the maximum value of Froude number to avoid unrealistic
velocity. A Froude number cap = 10.0 is used in our tests. Smaller value can be used, but
for inundation calculation, a Froude number cap less than 1.5 is not suggested.

Time interval to apply Filter: time interval in seconds to apply a 2D 9-point filter to remove 2-
grid-wavelength components. It is not suggested to use this filter in general. Turn off the
filter by setting this parameter to be longer than the total simulation time.

Parameters for Boundary Condition

Boundary Condition: parameter for boundary condition type; 1 – wall; 2 – sponge.

Width of Sponge (West-East): width of sponge layer (m) on the west and east boundaries.

Width of Sponge (South-North): width of sponge layer (m) on the south and north boundaries.

Maximum Manning coefficient in Sponge: maximum Manning’s roughness coefficient in sponge layer.

Damping coefficient A: the free parameter A in equation (3-29) for L-D type sponge. Its value is
∼2.0.

Damping coefficient R: the free parameter γ in equation (3-29) for L-D type sponge. Its value is
0.88∼0.92.

Parameters for Inundation

Permanent dry limit: maximum elevation (m) above which the grid cell is treated as permanently
dry and excluded from calculation.

Water depth Limit for wet −> dry: the minimum water depth (m) for the wetting and drying
scheme. 0.01 is suggested for real tsunamis, and larger value can be used to ensure stability.

Water depth limit for Bottom friction: minimum depth (m) for calculating bottom friction. 0.05 is
suggested for real tsunamis.

Manning coefficient for Bottom friction: Manning roughness coefficient in Manning’s formula. Its
value is ∼0.03 in most cases.

Parameters for Computing Green’s Functions (only when Purpose of Calculation = 3)

Source Area Starting Longitude: west longitude (degree) of tsunami source region.

Source Area Ending Longitude: east longitude (degree) of tsunami source region.

Source Area Discretizatoin Grid Size X: size (m) of sub-source region in west-east direction.

Source Area Starting Latitude: south latitude (degree) of tsunami source region.

Source Area Ending Latitude: north latitude (degree) of tsunami source region.

Source Area Discretizatoin Grid Size Y: size (m) of sub-source region in south-north direction.

Basis Function Type: type of point source function. 1 – Gaussian function; 2 – Sigmoid function.

Ratio of Gaussian Raidus / Grid Size: ratio of Gaussian function radius to the sub-source region
size. This parameter determines the width of the point source.

Sigmoid Coefficient: coefficient defining the steepness of Sigmoid function.

4.3.2 Parameters in layers.ctl

The file “layers.ctl” is optional. With this input file, PCOMCOT offers more flexibility in
balancing accuracy and efficiency. In “layers.ctl”, the name of each layer is a two-digit
integer consistent with the corresponding bathymetry file. Figure 4.3 shows the 6 layer-specific
parameters in “layers.ctl”. By setting these parameters, we can adopt different wave models for
different grid layers. For example, we can simulate linear dispersive waves in the large outer layers,
where wave dispersion may be significant in long-distance propagation, and switch to the nonlinear
shallow water model in the small inner layers covering coastal areas. In addition, by setting the
value of Computing Start Time, we can prescribe that computation does not start in a certain layer
until tsunamis approach its boundary from outside. Note that not all layers must appear in “layers.ctl”.
If a layer is absent from “layers.ctl”, its parameters are taken to be the same as those in “pcomcot.ctl”.

Figure 4.3 Parameters in layers.ctl

4.3.3 Format of Bathymetry Files

The input bathymetry files must be named “layerXX.nf” or “layerXX.xyz”, where “XX” is
a two-digit number from 01 to 99; the numbers do not have to be consecutive. Each bathymetry file
corresponds to a single grid layer in the nesting system. All files must be in NetCDF format
(binary, ending with “.nf”) or “xyz” format (ASCII, ending with “.xyz”). “nf” files can be obtained
from commonly used bathymetry data sets provided by different organizations (e.g., GEBCO2020) with
GMT commands (e.g., “grdcut” and “grdsample”). An “xyz” file contains three columns: the
first gives the x coordinates, the second the y coordinates, and the last the water depth in
meters. x and y coordinates should be given in meters for Cartesian coordinates and in degrees for
Earth coordinates. Data in “xyz” files must be ordered with x varying fastest, i.e., x ascending
within each row of constant y, and rows in ascending order of y, e.g.:

(Line 1:) 138.000000000 33.000000000 4033.7974

(Line 2:) 138.016666667 33.000000000 4016.6523

(Line 3:) 138.033333333 33.000000000 4005.2083

(Lines..) ...

(Line 842:) 138.000000000 33.016666667 4054.6309

(Lines..) ...

We can simply convert “nf” files into “xyz” files using the GMT command

$ gmt grd2xyz layerXX.nf | awk '{print $1,$2,-$3}' | sort -n -s -k 2 > layerXX.xyz

Note that “xyz” files should have positive water depth, while “nf” files should have negative water
depth. For the setting of nested grids, there are two general rules. First, only one top layer is
allowed. Second, one layer can contain others, but cannot partially overlap another layer without
containing it. There are no other constraints on the location and grid size of any layer, but child
layers with larger grid sizes than their parent layers have not been tested.
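
The row ordering described above can be illustrated with a minimal Python sketch. It is not part of PCOMCOT; the function name `xyz_lines` and the tiny grid are hypothetical, and the number of decimals simply mimics the example lines shown earlier.

```python
def xyz_lines(x, y, depth):
    """Format a depth grid as three-column ASCII lines, x varying fastest.

    depth[j][i] is the water depth (m, positive in water) at (x[i], y[j]).
    """
    lines = []
    for j, yj in enumerate(y):        # rows in ascending y
        for i, xi in enumerate(x):    # x ascending within each row
            lines.append(f"{xi:.9f} {yj:.9f} {depth[j][i]:.4f}")
    return lines


# Hypothetical 3 x 2 grid with 1-arc-minute spacing, for illustration only.
x = [138.0 + i / 60.0 for i in range(3)]
y = [33.0 + j / 60.0 for j in range(2)]
depth = [[4033.7974, 4016.6523, 4005.2083],
         [4054.6309, 4040.0000, 4020.0000]]

for line in xyz_lines(x, y, depth):
    print(line)
```

Writing the returned lines to “layerXX.xyz” (one per line) gives a file in which all points of the southernmost row come first, followed by the next row to the north.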

4.3.4 Format of Initial Elevation and Flux Files

If the parameter Initial Condition is set to be 0 in “pcomcot.ctl”, an initial elevation file named
“InitialElevation.nf” or “InitialElevation.xyz” must be provided. “InitialElevation.nf/xyz” must have
the same format as described above, with exactly the same x and y coordinates as the bathymetry
file of the top layer. Two extra input files “InitialFluxM.xyz” and “InitialFluxN.xyz” can also
be used to provide the initial volume fluxes in x and y directions, respectively. The formats of
“InitialFluxM.xyz” and “InitialFluxN.xyz” are the same as “InitialElevation.xyz”, but the x and y
coordinates are different. Note that, because PCOMCOT uses staggered grids, η, M and
N are defined at different locations. Suppose that the numbers of grid cells in a certain layer
are NX and NY along x and y directions, respectively. Then the dimensions of η, M and N are
NX × NY, (NX − 1) × NY and NX × (NY − 1), respectively. Therefore, “InitialFluxM.xyz” must
have the same y coordinates as the bathymetry file of the top layer, but have only (NX − 1) x
coordinates, which are larger than the corresponding ones in the bathymetry file by ∆x/2. Similarly,
“InitialFluxN.xyz” must have the same x coordinates as the bathymetry file of the top layer, but
have only (NY − 1) y coordinates, which are larger than the corresponding ones in the bathymetry
file by ∆y/2.
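
As an illustration of the staggered layout, the following Python sketch derives the coordinates expected in “InitialFluxM.xyz” and “InitialFluxN.xyz” from the η (bathymetry) coordinates. It assumes uniform grid spacing; the helper name `flux_coordinates` is hypothetical and not part of PCOMCOT.

```python
def flux_coordinates(x, y):
    """Return (x_m, y_m, x_n, y_n) for uniformly spaced eta coordinates x, y."""
    dx = x[1] - x[0]
    dy = y[1] - y[0]
    x_m = [xi + 0.5 * dx for xi in x[:-1]]  # NX-1 points, shifted by dx/2
    y_m = list(y)                           # same y coordinates as eta
    x_n = list(x)                           # same x coordinates as eta
    y_n = [yj + 0.5 * dy for yj in y[:-1]]  # NY-1 points, shifted by dy/2
    return x_m, y_m, x_n, y_n


x = [0.0, 1.0, 2.0, 3.0]  # NX = 4
y = [0.0, 1.0, 2.0]       # NY = 3
x_m, y_m, x_n, y_n = flux_coordinates(x, y)
print(len(x_m), len(y_m))  # 3 3 -> M is (NX-1) x NY
print(len(x_n), len(y_n))  # 4 2 -> N is NX x (NY-1)
print(x_m)                 # [0.5, 1.5, 2.5]
```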

4.3.5 Parameters in FaultParameters.ctl

If the parameter Initial Condition is set to be 1 or Purpose of Calculation is set to be 2 in
“pcomcot.ctl”, then the fault slip of an earthquake is used as the tsunami source, and finite fault parameters
must be given in “FaultParameters.ctl”. PCOMCOT takes the parameters of multiple rectangular
finite faults as input and calculates the static seafloor deformation with Okada’s half space model
(Okada, 1985). Then, the sea surface elevation caused by bottom displacement is used as the initial
condition for tsunami simulation. When the bathymetry of the source region is smooth and the
water depth is much less than the dimension of bottom deformation, the surface elevation is almost
identical to the bottom vertical displacement. However, if the earthquake occurs near a steep slope
(e.g., a trench), the horizontal motion may be a non-negligible factor of tsunami generation, and
this contribution is considered with the formula of Tanioka and Satake (1996). If the earthquake
occurs in deep ocean, the surface deformation is smoother than the bottom deformation, because of
the low-pass filtering effect of seawater. In PCOMCOT, the series representation of Kajiura filter
in space domain is used to account for such effect (Kajiura, 1963; Glimsdal et al., 2013). This form
of Kajiura filter is equivalent to its original form (i.e., the 1/cosh(kh) filter) in wavenumber domain.
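
To give a feeling for the low-pass effect mentioned above, the short Python sketch below evaluates the wavenumber-domain response 1/cosh(kh) for a few wavelengths. It only illustrates the filter response itself, not the space-domain series used in PCOMCOT; the depth of 4000 m and the wavelengths are arbitrary example values.

```python
import math


def kajiura_response(wavelength_m, depth_m):
    """Amplitude ratio (surface / bottom) of the 1/cosh(kh) filter."""
    k = 2.0 * math.pi / wavelength_m
    return 1.0 / math.cosh(k * depth_m)


depth = 4000.0  # example water depth in meters
for wavelength in (200e3, 50e3, 20e3, 10e3):
    ratio = kajiura_response(wavelength, depth)
    print(f"wavelength {wavelength / 1e3:6.0f} km -> surface/bottom = {ratio:.3f}")
```

Deformation components much longer than the water depth pass almost unchanged, while components with wavelengths comparable to the depth are strongly attenuated.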
For forward simulation (Purpose of Calculation = 1), all the faults with their slip given by the
user are adopted in computation. For calculation of Green’s functions (Purpose of Calculation =
2), the slip of all faults is set to be 1.0 m, and the resulting tsunami of each fault is calculated one
by one. An example of “FaultParameters.ctl” is shown in Figure 4.4. Following is the explanation
of each parameter.

Figure 4.4 Finite fault parameters in FaultParameters.ctl

Fault Rupture Starting Time: rupture starting time of fault in seconds. Kinematic rupture
process can be considered by assigning different rupture starting time to different subfaults.

Focal Depth: depth (m) of center of the rectangular fault.

Length of source area: length (m) of the rectangular fault (i.e., size along strike direction).

Width of source area: width (m) of the rectangular fault (i.e., size along dip direction).

Dislocation of fault plate: amount of slip (m) on the fault.

Rake: rake angle (degree) of the fault slip.

Strike: strike angle (degree) of the fault.

Dip: dip angle (degree) of the fault.

Epicenter’s Latitude: y coordinate of the horizontal projection of the fault center. Its value is
in degrees for Earth coordinates and in meters for Cartesian coordinates.

Epicenter’s Longitude: x coordinate of the horizontal projection of the fault center. Its value is
in degrees for Earth coordinates and in meters for Cartesian coordinates.

4.3.6 Parameters in Stations.ctl

“Stations.ctl” is a three-column ASCII file, with the first two columns indicating the x and y
coordinates, and the third column giving the names of the stations, respectively. After simulation
is finished, the time series of surface elevation and horizontal volume flux at the location of each
station can be output. An example of “Stations.ctl” is presented in Figure 4.5.

Figure 4.5 Parameters in Stations.ctl

4.4 Output

All the output files of PCOMCOT are listed in Table 4.5. PCOMCOT creates a directory
“PCOMCOToutput” at the start of simulation, and puts the output files in this directory. Three types of
data are output by PCOMCOT: coordinate data, wavefield data and station data. The coordinate
data are stored in files “_xcoordintesXX.dat” and “_ycoordintesXX.dat”, which contain the
coordinates of grid cells along x and y directions for No.XX layer. The wavefield data are the values of
variables related to the wavefield, including bathymetry (_bathymetryXX.dat), snapshots of water
surface elevation (z_XX_xxxxxx.dat), volume flux (M_XX_xxxxxx.dat, N_XX_xxxxxx.dat) and
non-hydrostatic pressure at the bottom (Q_XX_xxxxxx.dat), and the maximum/minimum water
surface elevation (zmax_XX.dat, zmin_XX.dat). The station data are time series of water surface
elevation (Stationyyyy.dat), volume flux (Stationyyyy_M.dat, Stationyyyy_N.dat), and the
non-hydrostatic pressure (Stationyyyy_Q.dat) at the locations of stations. Snapshots and time series of
non-hydrostatic pressure are output only when wave dispersion is included. The coordinate data
are output in ASCII format, while the wavefield data and station data are stored in binary format.

Table 4.5 Output Files of PCOMCOT

File Name              Format    Description

_xcoordintesXX.dat     ASCII     x coordinates of No.XX layer; row vector
_ycoordintesXX.dat     ASCII     y coordinates of No.XX layer; row vector
_bathymetryXX.dat      Binary    bathymetry of No.XX layer
zmax_XX.dat            Binary    maximum water elevation of No.XX layer
zmin_XX.dat            Binary    minimum water elevation of No.XX layer
z_XX_xxxxxx.dat        Binary    snapshot of water elevation of No.XX layer at time step xxxxxx
M_XX_xxxxxx.dat        Binary    snapshot of flux component M
N_XX_xxxxxx.dat        Binary    snapshot of flux component N
Q_XX_xxxxxx.dat*       Binary    snapshot of bottom non-hydrostatic pressure Q
Stationyyyy.dat        Binary    time series of water elevation at station yyyy
Stationyyyy_M.dat      Binary    time series of flux component M
Stationyyyy_N.dat      Binary    time series of flux component N
Stationyyyy_Q.dat*     Binary    time series of bottom non-hydrostatic pressure Q

* The non-hydrostatic pressure Q is normalized by water density.

Note that the two-digit integers “XX” in the output file names are different from those in
the input bathymetry file names. For output, “XX” is a consecutive integer starting from 01,
which indicates the order of each grid layer based on their bathymetry file names. For example,
if “layer02.nf/xyz”, “layer05.nf/xyz” and “layer12.nf/xyz” are provided by the user, then their
corresponding codes in the output files are “01”, “02” and “03” respectively, regardless of their
nesting relationship. The six-digit integer “xxxxxx” in the snapshot file names represents the
time step number, rather than the actual time. “yyyy” is a consecutive four-digit integer starting
from 0001, which corresponds to the order of stations appearing in “Stations.ctl”.
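
The renumbering rule can be sketched in a few lines of Python. This is only an illustration of the mapping described above (ordering by the number in the bathymetry file name); the function name `output_layer_codes` is hypothetical.

```python
import re


def output_layer_codes(bathymetry_files):
    """Map input layer names to the consecutive codes used in output file names."""
    numbers = sorted(
        int(re.fullmatch(r"layer(\d{2})\.(?:nf|xyz)", name).group(1))
        for name in bathymetry_files
    )
    return {f"layer{n:02d}": f"{i:02d}" for i, n in enumerate(numbers, start=1)}


print(output_layer_codes(["layer12.xyz", "layer02.xyz", "layer05.xyz"]))
# {'layer02': '01', 'layer05': '02', 'layer12': '03'}
```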
The structures of all the wavefield data files, i.e., “_bathymetryXX.dat”, “z_XX_xxxxxx.dat”,
“zmax_XX.dat”, “zmin_XX.dat”, “Q_XX_xxxxxx.dat”, “M_XX_xxxxxx.dat” and “N_XX_xxxxxx.dat”
are the same. Because of the staggered grids we use, the dimensions of arrays M and N are different
from the others. Content of each line in a wavefield data file is:

(Line 1:) real data type (integer*4)

(Line 2:) nColumn, nRow (2×integer*4)

(Line 3:) the first row of data (nColumn×real data type)

(Line 4:) the second row of data (nColumn×real data type)

(Lines..) ...

(Line nRow+2:) the last row of data (nColumn×real data type)

The format of the station data files “Stationyyyy.dat”, “Stationyyyy_M.dat”, “Stationyyyy_N.dat” and
“Stationyyyy_Q.dat” is:

(Line 1:) real data type (integer*4)

(Line 2:) calculationPurpose, nFaults, nDataLength (3×integer*4)

(Line 3:) time (nDataLength×real data type)

(Line 4:) time series from the first fault (nDataLength×real data type)

(Line 5:) time series from the second fault (nDataLength×real data type)

(Lines..) ...

(Line nFaults+3:) time series from the last fault (nDataLength×real data type)

For forward simulation, there is only one time series in each station data file. For calculation
of Green’s functions, there are multiple time series, each of which is the Green’s function
of a single subfault or subregion with initial surface displacement. Two Matlab scripts,
“COMCOT_readBinaryDataSnapShot.m” and “COMCOT_readBinaryDataStation.m”, are provided in
Appendix A to read the binary wavefield data files and station data files, respectively.
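
For readers without Matlab, the wavefield layout listed above could be parsed along the following lines in Python. This is only a sketch under explicit assumptions that the manual does not confirm: the file is a flat little-endian byte stream without Fortran record markers, and the leading integer stores the number of bytes per real value (4 or 8). The Matlab scripts in Appendix A remain the authoritative reference; `read_wavefield` is a hypothetical name.

```python
import struct


def read_wavefield(raw):
    """Parse bytes laid out as: int32 real type, int32 nColumn, int32 nRow, data rows.

    ASSUMPTIONS (not confirmed by the manual): little-endian stream, no record
    markers, and the first integer is the byte size of a real (4 or 8).
    """
    real_size, n_col, n_row = struct.unpack_from("<3i", raw, 0)
    code = {4: "f", 8: "d"}[real_size]
    values = struct.unpack_from(f"<{n_col * n_row}{code}", raw, 12)
    # Split the flat sequence into nRow rows of nColumn values each.
    return [list(values[r * n_col:(r + 1) * n_col]) for r in range(n_row)]


# Round trip on synthetic bytes (not a real PCOMCOT output file).
demo = struct.pack("<3i", 4, 3, 2) + struct.pack("<6f", 0.0, 0.5, 1.0, 1.5, 2.0, 2.5)
print(read_wavefield(demo))  # [[0.0, 0.5, 1.0], [1.5, 2.0, 2.5]]
```

For an actual file, the bytes would be obtained with `open(path, "rb").read()`; if the values look wrong, the assumptions above are the first thing to check against the Appendix A scripts.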

5 Examples
We conduct a series of numerical experiments to validate the PCOMCOT model for wave propagation,
transformation, and inundation. For small-scale tests in Cartesian coordinates, we compute solitary
wave propagation on a flat bottom, water surface oscillation in a paraboloidal basin, and run-up of
a solitary wave on a circular island. These simulation results are compared with analytical solutions
and experimental data, and satisfactory agreement is obtained. For large-scale modeling of real
events, the 2011 Tohoku earthquake tsunami is simulated in Earth coordinates using a nested grid
system. The computed tsunami waveforms agree well with the recordings at tide gauges
and DART stations. Compared with the shallow water model, the non-hydrostatic model with
dispersion provides much better predictions at the DART stations in the deep ocean. We have carried
out all the simulations using both the CPU and GPU versions of PCOMCOT, and analyzed their
performance in different situations.

5.1 Solitary Wave Propagation on Flat Bottom

As a solution of the classical Boussinesq equations, solitary wave propagation on a flat bottom
is a standard test for dispersive wave models. With the balance between wave nonlinearity and
dispersion, a solitary wave maintains its shape throughout the propagation. We set up a 3000 m-
long, 100 m-wide channel with a constant water depth of 10 m. The x and y directions are along the
length and width of the channel, respectively. The grid size is set to be ∆x = ∆y = 1 m, and 100 m-
wide sponge zones are added to both ends of the channel. A 2 m-high solitary wave at the location
x = 300 m is given as the initial condition, corresponding to wave nonlinearity ε = A/h = 0.2.
Both the initial elevation and initial flux M should be provided in the input files. The initial flux is
calculated by multiplying the analytical velocity with the total water depth. The simulation time
is 200 s, and the time step is set to be 0.05 s. Definitions of necessary parameters in “pcomcot.ctl”
are

Basic Control Parameters: Purpose of Calculation = 1, Initial condition = 0, Coordinate system
= 1, Total run time = 200.0, Time step = 0.05;

Parameters for Surface Deformation: Apply Kajiura filter = 0;

Parameters for Wave Physics and Numerics: Nonlinearity = 1, Dispersion = 1, Depth change
for Dispersion = 0, Water depth Limit for Dispersion = 0.1, Breaking = 0, Scheme for LSWEs
= 0, Froude number Cap to Limit velocity = 10.0, Time interval to apply Filter = 10000.0;

Parameters for Boundary Condition: Boundary Condition = 2, Width of Sponge (West-East)
= 100.0, Width of Sponge (South-North) = 0.0, Maximum Manning coefficient in Sponge =
100.0, Damping coefficient A = 2.0, Damping coefficient R = 0.9;

Parameters for Inundation: Permanent dry limit = 50.0, Water depth Limit for wet −> dry =
0.01, Water depth limit for Bottom friction = 0.05, Manning coefficient for Bottom friction
= 0.0.
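
The initial elevation and flux M for this test could be generated along the following lines. The sketch assumes the classical first-order solitary wave, η = A sech²[k(x − x0)] with k = (3A/4h³)^(1/2) and celerity c = [g(h + A)]^(1/2), and the depth-averaged velocity u = cη/(h + η) of a steadily translating wave; the manual does not state which analytical form the authors used, so these formulas, the value of g and the function name `solitary_wave` are assumptions.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2), assumed value


def solitary_wave(x, amplitude=2.0, depth=10.0, x0=300.0):
    """Return (eta, M) of a first-order solitary wave centered at x0."""
    k = math.sqrt(3.0 * amplitude / (4.0 * depth ** 3))  # effective wavenumber
    c = math.sqrt(G * (depth + amplitude))               # wave celerity
    eta = amplitude / math.cosh(k * (x - x0)) ** 2       # surface elevation
    u = c * eta / (depth + eta)                          # depth-averaged velocity
    flux_m = u * (depth + eta)                           # velocity x total depth
    return eta, flux_m


for x in (250.0, 300.0, 350.0):
    eta, flux_m = solitary_wave(x)
    print(f"x = {x:6.1f} m  eta = {eta:.4f} m  M = {flux_m:.4f} m^2/s")
```

When writing “InitialFluxM.xyz”, the flux would be evaluated at the M points, i.e., at x coordinates shifted by ∆x/2 relative to the η points, as described in Section 4.3.4.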

[Figure 5.1: four panels of surface elevation (meter) versus x (meter), from 0 to 3000 m, at
t = 5 s, 60 s, 120 s and 180 s. Legend: analytical, PCOMCOT, Yamazaki et al., 2009.]

Figure 5.1 Comparison of numerical results and the analytical solution for solitary wave propaga-
tion on a flat bottom. The two models make different approximations of the vertical distribution of
non-hydrostatic pressure. The governing equations of both models are the same as (2-21), except
for the different values of α and β.

In the above list, the parameter Depth change for Dispersion is set to be 0 for the flat bottom,
corresponding to our new model with α = 2/3, β = 0.5 in the governing equations (2-21). To show
how the accuracy of wave dispersion is improved by our model, we also carry out the computation
using the previous governing equations with α = 0.5, β = 1.0 (Stelling and Zijlema, 2003; Yamazaki
et al., 2009), by setting the parameter Depth change for Dispersion to 1. The difference between
our new model and the previous one lies in the assumed distribution of non-hydrostatic
pressure. Based on the Boussinesq equations, we approximate the vertical distribution of p_dyn
using a simple quadratic function. In comparison, the previous model assumes a linear vertical
distribution, which has proved to be less reasonable.
The simulation results given by the two models are presented in Figure 5.1. In the first few
seconds, the wave profile is disturbed, because the analytical solution is used as the initial condition.
After about 5 s, the wave profiles of both models propagate stably along the channel, with a crest
height of ≈ 2.06 m for our new model and 2.22 m for the previous one. The result of
our new model is very close to the analytical solution. In the previous model, by contrast, the wave
profile travels faster, and a small trailing wave follows the crest. Thus, by adopting a
more realistic approximation of non-hydrostatic pressure, slightly better results can be obtained
on a smooth bottom. In real tsunami events where the water depth is non-uniform, the difference
between these two models would be smaller.

5.2 Fluid Oscillation in a Paraboloidal Basin

The oscillation of water surface in a paraboloidal basin is used to verify the moving boundary
technique for tsunami run-up. Analytical solutions for this type of motions are derived by Thacker
(1981) and are highly valuable for calibrating numerical inundation models (Cho and Kim, 2009).
One of the exact solutions for curved water surface oscillating in a circular paraboloidal basin is
presented here and compared with our numerical results. The still water depth is given as

$$h = h_0\left(1 - \frac{r^2}{a^2}\right), \qquad \text{(5-1)}$$

where r is the distance from the rotational axis, a is the radius of the paraboloid, and h0 is the
water depth at the center. In this example, h0 is 1 m and a is 2500 m. The expression of water
surface elevation varying with time and space is

$$\eta = h_0\left\{\frac{(1-A^2)^{1/2}}{1 - A\cos\omega t} - 1 - \frac{r^2}{a^2}\left[\frac{1-A^2}{(1 - A\cos\omega t)^2} - 1\right]\right\}, \qquad \text{(5-2)}$$

in which the angular frequency ω and the non-dimensional parameter A are given as

$$\omega = \frac{1}{a}\left(8 g h_0\right)^{1/2}, \qquad A = \frac{a^4 - r_0^4}{a^4 + r_0^4}. \qquad \text{(5-3)}$$

Here, r0 = 2000 m, ω = 3.54 × 10^−3 rad/s, A = 0.419, and the period of free oscillation is ∼1770 s.
In numerical computation, the grid sizes ∆x and ∆y are set to be 10 m. The total simulation
time is 7000 s, covering nearly 4 periods, and the time step size is 1 s to satisfy the CFL condition.
The initial surface elevation is defined with the expression (5-2) at t = 0. For comparison with
the analytical solution, no bottom friction is considered. Wave dispersion is not included because
the fluid motions are governed by the nonlinear shallow water equations. Definitions of necessary
parameters in “pcomcot.ctl” are

Basic Control Parameters: Purpose of Calculation = 1, Initial condition = 0, Coordinate system
= 1, Total run time = 7000.0, Time step = 1.0;

Parameters for Surface Deformation: Apply Kajiura filter = 0;

Parameters for Wave Physics and Numerics: Nonlinearity = 1, Dispersion = 0, Breaking =
0, Scheme for LSWEs = 1, Froude number Cap to Limit velocity = 10.0, Time interval to
apply Filter = 50000.0;

Parameters for Boundary Condition: Boundary Condition = 1;

Parameters for Inundation: Permanent dry limit = 50.0, Water depth Limit for wet −> dry =
0.01, Water depth limit for Bottom friction = 0.05, Manning coefficient for Bottom friction
= 0.0.
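
The quantities used in this test can be checked with a short Python sketch that evaluates equations (5-1) to (5-3) for h0 = 1 m, a = 2500 m and r0 = 2000 m. The value g = 9.81 m/s² and all names in the code are assumptions made for illustration; the Courant number is a rough estimate based on the total depth at the basin center at t = 0.

```python
import math

G = 9.81                # gravitational acceleration (m/s^2), assumed value
H0 = 1.0                # still water depth at the center (m)
BASIN_RADIUS = 2500.0   # a in equation (5-1) (m)
R0 = 2000.0             # r0 in equation (5-3) (m)

OMEGA = math.sqrt(8.0 * G * H0) / BASIN_RADIUS
A = (BASIN_RADIUS ** 4 - R0 ** 4) / (BASIN_RADIUS ** 4 + R0 ** 4)
PERIOD = 2.0 * math.pi / OMEGA


def still_depth(r):
    """Still water depth, equation (5-1)."""
    return H0 * (1.0 - r * r / BASIN_RADIUS ** 2)


def eta(r, t):
    """Surface elevation, equation (5-2)."""
    d = 1.0 - A * math.cos(OMEGA * t)
    ratio = r * r / BASIN_RADIUS ** 2
    return H0 * (math.sqrt(1.0 - A * A) / d - 1.0 - ratio * ((1.0 - A * A) / d ** 2 - 1.0))


print(f"omega = {OMEGA:.3e} rad/s, A = {A:.3f}, period = {PERIOD:.0f} s")
print(f"eta at center, t = 0: {eta(0.0, 0.0):.4f} m")
print(f"total depth at r0, t = 0: {eta(R0, 0.0) + still_depth(R0):.2e} m")

# Rough Courant number for dx = 10 m, dt = 1 s, using the depth at the center.
courant = math.sqrt(G * (H0 + eta(0.0, 0.0))) * 1.0 / 10.0
print(f"Courant number ~ {courant:.2f}")
```

At t = 0 the total water depth vanishes at r = r0, so r0 is the radius of the initial shoreline.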

At the beginning of computation, the surface is convex. Driven by gravity, the surface drops
gradually with the shoreline expanding on the basin. When the run-up height reaches its maximum
and the surface becomes concave, the water begins flowing back towards the center. The surface
returns to its initial shape at the end of a period. Figure 5.2 displays the water surface profiles
at different moments on the cross section through the rotation axis. In the first two periods,
the numerical model gives almost the same surface profiles as the analytical solutions. As the
computation goes on, the numerical diffusion of the upwind scheme makes water elevation lower
than the exact solutions when the surface rises.

[Figure 5.2: nine panels of water surface elevation (meter) versus x (meter), from -4000 to 4000 m,
at t = 0 s, 800 s, 1600 s, 2400 s, 3200 s, 4000 s, 4800 s, 5600 s and 6400 s. Legend: computation,
analytical.]

Figure 5.2 Comparison of water surface profiles on a paraboloidal basin between the analytical
solution and numerical result.

5.3 Solitary Wave Run-up on a Circular Island

The 1992 Flores Island tsunami inflicted unexpectedly severe damage on the lee side of Babi
Island, Indonesia (Yeh et al., 1993), which sparked great interest among researchers in tsunami
run-up on small islands (e.g., Briggs et al., 1995; Liu et al., 1995; Kânoğlu and Synolakis,
1998). Large-scale laboratory experiments on the interaction between solitary waves and a circular
island were conducted by Briggs et al. (1995) at the Coastal Engineering Research Center, Vicksburg,
Mississippi. These experiments reveal that tsunami run-up on the sheltered back side of
an island can be comparable to or even higher than that on the front side in certain situations. The
collected data have been used for validation of various models, including shallow water equations
(Liu et al., 1995), Boussinesq equations (Chen et al., 2000) and non-hydrostatic models (Yamazaki
et al., 2009).
In the laboratory, a 62.5 cm-high, 7.2 m toe-diameter, and 2.2 m crest-diameter circular island
with a 1:4 slope was located in a large basin. The initial solitary wave-like profiles were generated
by a directional spectral wave generator. Experiments were performed at two different water depths
(h = 32 cm and 42 cm), and solitary waves with three different height-to-depth ratios (ε = A/h =
0.045, 0.096 and 0.181) were tested. Here we simulate the cases with a water depth of 32 cm, and
compare the results with the experimental data provided by the NOAA Center for Tsunami Research. In
our numerical model, a truncated cone of the same size is set up in an 80 m-long and 24 m-wide
domain. 4 m-wide sponge zones are applied to both ends in the x direction. As shown in Figure 5.3,
four gauges around the island are used to record the time histories of water surface elevation. Gauge
6 is in front of the island at the toe. Gauges 9, 16 and 22 are near the still shoreline of the island,
at the 0◦, 90◦ and 180◦ radial lines, respectively. Coordinates of these wave gauges are given in
Table 5.1. Note that the location of the island center (x = 40 m, y = 0 m) in our simulations is
different from that in the lab experiments. Thus, the coordinates of the gauges are adjusted to keep
their locations relative to the island unchanged. The computational domain is discretized with a grid
size of ∆x = ∆y = 0.05 m. A solitary wave profile with the crest at x = 15 m is given as the
initial condition. The Manning’s roughness coefficient is set to be 0.013, corresponding to a smooth
concrete surface. Since Liu et al. (1995) reported that the wave broke in the laboratory realization,
the eddy-viscosity scheme is used in this example. These simulations last for 35 s, with the time
step size set to be 0.01 s.
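
The island geometry described above can be turned into a bathymetry function as in the following Python sketch. It assumes that depth is positive in water and negative on land above the still water level (consistent with the positive-depth convention of “xyz” files, but the treatment of land is an assumption); the function names are hypothetical.

```python
import math


def island_depth(x, y, still_depth=0.32, center=(40.0, 0.0),
                 toe_radius=3.6, crest_radius=1.1, slope=0.25):
    """Water depth (m) over a truncated cone with a 1:4 side slope."""
    r = math.hypot(x - center[0], y - center[1])
    max_height = (toe_radius - crest_radius) * slope       # 0.625 m island height
    height = min(max((toe_radius - r) * slope, 0.0), max_height)
    return still_depth - height


def shoreline_radius(still_depth=0.32, toe_radius=3.6, slope=0.25):
    """Radius at which the cone surface meets the still water level."""
    return toe_radius - still_depth / slope


print(f"depth at gauge 6 (toe):  {island_depth(36.4, 0.0):.3f} m")
print(f"depth at gauge 9:        {island_depth(37.4, 0.0):.3f} m")
print(f"depth at island center:  {island_depth(40.0, 0.0):.3f} m")
print(f"still shoreline radius:  {shoreline_radius():.2f} m")
```

Evaluating `island_depth` on the 0.05 m grid and writing the values in the “xyz” ordering of Section 4.3.3 would give a bathymetry file for this test.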

[Figure 5.3: (a) top view of the computational domain, x (m) from 0 to 80 and y (m) from -10 to 10,
showing the island, the labeled radial lines, and gauges 6, 9, 16 and 22; (b) side view of the island
for x (m) from 30 to 50, with labeled dimensions 3.6 m, 1.1 m, 0.625 m and 0.32 m.]

Figure 5.3 Schematic of numerical test of solitary wave run-up on a conical island. (a) and (b)
are the top view and side view, respectively. In panel (a), the base and the still shoreline of the
island are denoted with solid and dashed lines, respectively. Locations of wave gauges are indicated
by hollow dots.

Table 5.1 Coordinates of gauges in numerical test of solitary wave run-up.

gauge number    x (m)    y (m)
6               36.4     0.0
9               37.4     0.0
16              40.0     -2.58
22              42.6     0.0
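To relate the gauge coordinates to the island geometry, the following sketch computes the radial distance of each gauge from the island center and the still water depth there. It assumes that the island is centered at (40 m, 0 m), that 3.6 m and 1.1 m are the base and crest radii, that 0.625 m is the island height, and that 0.32 m is the still water depth, as suggested by Figure 5.3. All names are hypothetical.

```python
import math

BASE_RADIUS = 3.6      # m, assumed base radius of the island
CREST_RADIUS = 1.1     # m, assumed radius of the flat top
ISLAND_HEIGHT = 0.625  # m, assumed island height
WATER_DEPTH = 0.32     # m, assumed still water depth
CENTER = (40.0, 0.0)   # m, assumed island center

GAUGES = {6: (36.4, 0.0), 9: (37.4, 0.0), 16: (40.0, -2.58), 22: (42.6, 0.0)}

# Side slope of the cone and radius of the still shoreline.
slope = ISLAND_HEIGHT / (BASE_RADIUS - CREST_RADIUS)
shoreline_radius = BASE_RADIUS - WATER_DEPTH / slope

def radial_distance(point, center=CENTER):
    """Horizontal distance from the island center."""
    return math.hypot(point[0] - center[0], point[1] - center[1])

def still_water_depth(r):
    """Still water depth at radius r (negative above the waterline)."""
    r = min(max(r, CREST_RADIUS), BASE_RADIUS)
    return (r - shoreline_radius) * slope

print(f"side slope = {slope:.3f}, still shoreline radius = {shoreline_radius:.2f} m")
for number, xy in GAUGES.items():
    r = radial_distance(xy)
    print(f"gauge {number}: r = {r:.2f} m, still water depth = {still_water_depth(r):.3f} m")
```

Under these assumptions, gauge 6 sits at the toe of the island, while gauges 9, 16, and 22 lie in shallow water close to the still shoreline.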

Definitions of necessary parameters in “pcomcot.ctl” are

Basic Control Parameters: Purpose of Calculation = 1, Initial condition = 0, Coordinate system = 1,
Total run time = 35.0, Time step = 0.01;

Parameters for Surface Deformation: Apply Kajiura filter = 0;

Parameters for Wave Physics and Numerics: Nonlinearity = 1, Dispersion = 1, Depth change
for Dispersion = 0, Water depth Limit for Dispersion = 0.05, Breaking = 1, Scheme for LSWEs
= 0, Froude number Cap to Limit velocity = 10.0, Time interval to apply Filter = 10000.0;

Parameters for Boundary Condition: Boundary Condition = 2, Width of Sponge (West-East)
= 4.0, Width of Sponge (South-North) = 0.0, Maximum Manning coefficient in Sponge =
100.0, Damping coefficient A = 2.0, Damping coefficient R = 0.9;

Parameters for Inundation: Permanent dry limit = 50.0, Water depth Limit for wet −> dry =
0.005, Water depth limit for Bottom friction = 0.005, Manning coefficient for Bottom friction
= 0.013.
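As a quick sanity check on the settings above, the Courant number based on the long-wave speed √(gh) and the number of time steps can be estimated as follows. This is a sketch only: the offshore depth of 0.32 m is assumed from Figure 5.3, and the one-dimensional definition c·∆t/∆x used here may differ from the stability criterion applied inside PCOMCOT.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def courant_number(depth, dt, dx):
    """Courant number based on the long-wave speed sqrt(g * h)."""
    return math.sqrt(G * depth) * dt / dx

DEPTH = 0.32       # offshore still water depth (m); assumed from Figure 5.3
DT = 0.01          # time step (s), from "Time step"
DX = 0.05          # grid size (m), from the text
TOTAL_TIME = 35.0  # run time (s), from "Total run time"

n_steps = round(TOTAL_TIME / DT)
print(f"Courant number = {courant_number(DEPTH, DT, DX):.3f}, time steps = {n_steps}")
```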

The non-hydrostatic model, together with the eddy-viscosity scheme, provides numerical solutions
for solitary wave propagation and transformation around the island. Figure 5.4 and Figure 5.5
display the water surfaces at different moments in the three cases. Figure 5.4 shows the moments when
the incident wave reaches the front face of the island, and 2 s later when the trapped waves start
propagating toward the back side. Figure 5.5 shows the trapped waves wrapping around
the island, colliding at the lee side, and then passing through each other. For A/h = 0.045 and
0.096, the water surface is relatively smooth around the island, and no significant high-frequency
dispersive waves can be seen. For A/h = 0.181, the water surface becomes steep and rough at the
back side, and evident short dispersive waves are generated when the trapped waves collide. After
that, high-frequency energy leaks continuously from the trapped waves wrapping toward the
front side, leading to the mesh-like wave pattern behind the island. In addition, the steep wave profile
in the case of A/h = 0.181 causes breaking all around the island, which reduces the run-up height
at the lee side.
Figure 5.6 compares the computed waveforms at four gauges with the experimental data. For
each case, to align the timing of the waveforms, the measured data at all gauges are shifted by a uniform
offset, so that the arrival time of the wave peak at gauge 6 is consistent with the numerical result. The
numerical model reproduces the primary waves relatively well, except that the
wave amplitude at gauge 22 is overestimated in the A/h = 0.181 case. The run-up and inundation
around the island are shown in Figure 5.7. Good agreement is obtained in all directions for all the
cases. In particular, the inundation for A/h = 0.181, which is poorly reproduced by non-dispersive
and non-breaking models (Liu et al., 1995), is well predicted here. We notice that the computed
run-up height at the front side is slightly lower than the measured data for A/h = 0.181, and the
depressions behind the leading crests at all gauges tend to be underestimated. This may be due
to the loss of energy caused by the upwind scheme and the moving boundary method. Nevertheless,
the overall good agreement with laboratory data demonstrates the capability of PCOMCOT to
simulate the complex processes of dispersive breaking waves.
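The time alignment described above (one uniform offset for all measured records, chosen so that the peak at gauge 6 matches the computed peak) can be sketched as follows. The data structures, function names, and the synthetic triangular pulses are hypothetical and serve only to illustrate the procedure.

```python
def peak_time(times, values):
    """Time at which the series reaches its maximum."""
    peak_index = max(range(len(values)), key=lambda i: values[i])
    return times[peak_index]

def uniform_offset(measured, computed, reference_gauge=6):
    """Offset (s) to add to every measured time axis so that the measured
    peak at the reference gauge coincides with the computed peak."""
    t_meas, eta_meas = measured[reference_gauge]
    t_comp, eta_comp = computed[reference_gauge]
    return peak_time(t_comp, eta_comp) - peak_time(t_meas, eta_meas)

def shift_records(measured, offset):
    """Apply the same time offset to the records of all gauges."""
    return {gauge: ([t + offset for t in times], values)
            for gauge, (times, values) in measured.items()}

def pulse(times, t0):
    """Synthetic triangular pulse peaking at t0 (illustration only)."""
    return [max(0.0, 1.0 - abs(t - t0)) for t in times]

times = [8.0 + 0.5 * i for i in range(29)]  # 8 s to 22 s
measured = {6: (times, pulse(times, 11.0)), 9: (times, pulse(times, 12.0))}
computed = {6: (times, pulse(times, 12.0))}

offset = uniform_offset(measured, computed)
shifted = shift_records(measured, offset)
print(f"offset = {offset} s, shifted peak at gauge 6 = {peak_time(*shifted[6])} s")
```

Because the same offset is applied to every gauge, the relative arrival times between gauges in the measured data are preserved.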

Figure 5.4 Transformation of solitary waves on the front face of a conical island.

Figure 5.5 Propagation and interaction of trapped waves at the lee side of a conical island.

[Figure 5.6 image. Computed and measured surface elevation (m) versus time (s), from 8 s to 22 s, at gauges 6, 9, 16, and 22 (rows) for A/h = 0.045, 0.096, and 0.181 (columns).]

Figure 5.6 Comparison of measured and computed waveforms at four gauges.

[Figure 5.7 image. Top row: plan views of the computed and measured inundation around the island, with directions from 0° to 330° marked, for A/h = 0.045, 0.096, and 0.181. Bottom row: normalized run-up R/A versus direction, from 0° to 180°, for the same three cases.]

Figure 5.7 Comparison of inundation and run-up around the conical island between measurements
and the numerical model. The run-up heights are normalized by the incident wave height. The 0°
and 180° directions correspond to the exposed front side and the sheltered lee side, respectively.

5.4 2011 Tohoku Tsunami

The tsunami triggered by the 2011 Tohoku earthquake is used to verify our dispersive model and
the nesting algorithm in a realistic scenario. This megathrust earthquake occurred near the trench
to the northeast of Japan, where the Pacific plate is subducted beneath the North American plate.
Extensive studies have revealed large coseismic slip extending to the updip edge of the fault (e.g.,
Fujii et al., 2011; Fujiwara et al., 2011; Satake et al., 2013), which caused the unexpectedly large
tsunami. The very shallow slip near the trench could generate short-wavelength tsunami waves, so the
dispersive effects in the deep ocean may be more significant than in most earthquake tsunami events.
Previous studies have suggested the importance of dispersive models for predicting the tsunami
waveforms at some DART stations (Baba et al., 2015; Glimsdal et al., 2013; Saito et al., 2011).
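The link between short wavelengths and dispersion can be illustrated with the linear dispersion relation ω² = gk tanh(kh), which gives the ratio of the phase speed to the shallow water speed as c/√(gh) = √(tanh(kh)/(kh)). The sketch below evaluates this ratio for a few wavelengths. The depth of 6000 m and the wavelengths are hypothetical values chosen for illustration, not quantities taken from the Tohoku source model.

```python
import math

def phase_speed_ratio(wavelength, depth):
    """c / sqrt(g h) from the linear dispersion relation omega^2 = g k tanh(k h)."""
    kh = 2.0 * math.pi * depth / wavelength
    return math.sqrt(math.tanh(kh) / kh)

DEPTH = 6000.0  # hypothetical deep-ocean depth (m)

for wavelength_km in (20.0, 50.0, 200.0):
    ratio = phase_speed_ratio(wavelength_km * 1000.0, DEPTH)
    print(f"wavelength = {wavelength_km:5.0f} km: c / sqrt(gh) = {ratio:.3f}")
```

Long components travel at nearly the shallow water speed, whereas components only a few times longer than the water depth travel noticeably slower and fall behind the leading wave.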

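[Figure 5.8 image. Map of the region off northeastern Japan (axis labels 140°E to 150°E, 35°N and 40°N, 100 km scale bar) showing the slip distribution (m, color scale 10 to 40), the water depth (km), GPS tide gauges 202, 613, 801, 803, and 804, and DART station 21418.]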

Figure 5.8 The earthquake source model and computational domains for the 2011 Tohoku tsunami.
The finite-fault model with heterogeneous slip is obtained by inverting the tsunami data. Magenta
triangle and black inverted triangles denote the DART station and GPS tide gauges, respectively.
Rectangular boxes outline the domains of nested grid layers.

To simulate the 2011 Tohoku tsunami, we estimate a finite-fault model for the earthquake

source using an inverse algorithm. The fault geometry is obtained from the USGS moment tensor
solution, and is divided into 8 and 6 subfault patches in the along-strike and down-dip directions,
respectively. The slip on each subfault is then estimated by inverting the tsunami data based
on the linear shallow water equations. The seafloor deformation is calculated with Okada’s half-
space elastic model (Okada, 1985), and the contribution of horizontal motion to seafloor uplift
is considered with the formula of Tanioka and Satake (1996). The initial surface elevation is
determined by applying the Kajiura filter (Kajiura, 1963; Glimsdal et al., 2013) to the seafloor
displacement.
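The effect of the Kajiura filter can be illustrated by its classical form for a single Fourier component: a seafloor displacement with wavenumber k in water of depth h reaches the surface reduced by the factor 1/cosh(kh). The sketch below evaluates this factor. The depth and wavelengths are hypothetical, and the options PCOMCOT offers for the filter (for example the use of an average depth) are not reproduced here.

```python
import math

def kajiura_factor(wavelength, depth):
    """Surface-to-seafloor amplitude ratio 1 / cosh(k h) for one Fourier component."""
    k = 2.0 * math.pi / wavelength
    return 1.0 / math.cosh(k * depth)

DEPTH = 7000.0  # hypothetical water depth near the trench (m)

for wavelength_km in (20.0, 50.0, 200.0):
    factor = kajiura_factor(wavelength_km * 1000.0, DEPTH)
    print(f"wavelength = {wavelength_km:5.0f} km: surface / seafloor amplitude = {factor:.3f}")
```

Short-wavelength features of the seafloor displacement are strongly smoothed in deep water, while long-wavelength features pass to the surface almost unchanged.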
We compute the tsunami waves in two nested grid layers. The outer layer extends from 138°E to
152°E in longitude and from 33°N to 44°N in latitude, with a resolution of 1 arcmin. The inner
layer covers the northeastern coast of Japan (140.5°E ∼ 142.5°E, 37.5°N ∼ 41°N) at a resolution of
15 arcsec. A 200 km-wide sponge zone surrounding the top layer is used to avoid wave reflection
at the boundary. The bathymetry data of both layers are extracted from GEBCO2020. Six tsunami
stations, namely five GPS tide gauges (202, 613, 801, 803, and 804) and DART station 21418, are used
to compare the simulation results with the tsunami observations. The earthquake source
model, computational domains, and locations of the tsunami stations are displayed in Figure
5.8. Because frequency dispersion may be necessary for the transformation of tsunami waves
propagating from the deep ocean to the shallow region, dispersive models are used for both layers. Wave
nonlinearity is only considered in the inner layer. The total simulation time is 6000 s, with the time
step set to 2 s for the outer layer. The time step for the inner layer is automatically set
to 1.0 s, so that the C.F.L. numbers of the two layers are almost the same. Definitions of necessary
parameters in “pcomcot.ctl” are

Basic Control Parameters: Purpose of Calculation = 1, Initial condition = 1, Coordinate system = 0,
Total run time = 6000.0, Time step = 2.0, Feedback to parent layer = 0;

Parameters for Surface Deformation: Consider Horizontal Motion = 1, Apply Kajiura filter
= 1, Use average depth for Kajiura = 1, Water depth Limit for Kajiura = 100.0;

Parameters for Wave Physics and Numerics: Nonlinearity = 1, Dispersion = 1, Depth change
for Dispersion = 0, Water depth Limit for Dispersion = 100.0, Breaking = 0, Scheme for
LSWEs = 1, Froude number Cap to Limit velocity = 10.0, Time interval to apply Filter =
50000.0;

Parameters for Boundary Condition: Boundary Condition = 2, Width of Sponge (West-East)
= 100000.0, Width of Sponge (South-North) = 100000.0, Maximum Manning coefficient in
Sponge = 100.0, Damping coefficient A = 2.0, Damping coefficient R = 0.9;

Parameters for Inundation: Permanent dry limit = 100.0, Water depth Limit for wet −> dry
= 0.01, Water depth limit for Bottom friction = 0.05, Manning coefficient for Bottom friction
= 0.03.

The layer-specific parameters in “layers.ctl” are

Layer Name : 01
Nonlinearity = 0, Dispersion = 1, Depth change for Dispersion = 0, Breaking = 0, Scheme
for LSWEs = 1, Computing Start Time = 0.0, Time interval to apply Filter = 50000.0;

Layer Name : 02
Nonlinearity = 1, Dispersion = 1, Depth change for Dispersion = 0, Breaking = 0, Scheme
for LSWEs = 1, Computing Start Time = 0.0, Time interval to apply Filter = 50000.0.
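The time steps of the two layers given above (2 s and 1.0 s) can be related to their grid resolutions (1 arcmin and 15 arcsec) with a short calculation. Assuming the Courant number is defined as √(gh_max)·∆t/∆x, which is an assumption about the criterion rather than a statement of how PCOMCOT selects the inner time step, equal Courant numbers imply a specific ratio of maximum depths between the layers. The dictionary layout and function name are hypothetical.

```python
OUTER = {"dx_arcsec": 60.0, "dt": 2.0}  # outer layer: 1 arcmin grid, 2 s time step
INNER = {"dx_arcsec": 15.0, "dt": 1.0}  # inner layer: 15 arcsec grid, 1.0 s time step

def required_depth_ratio(outer, inner):
    """Ratio h_outer / h_inner of maximum depths for which both layers have
    the same Courant number c * dt / dx, with c = sqrt(g * h)."""
    dx_ratio = outer["dx_arcsec"] / inner["dx_arcsec"]
    dt_ratio = outer["dt"] / inner["dt"]
    speed_ratio = dx_ratio / dt_ratio  # c_outer / c_inner
    return speed_ratio ** 2

print(f"grid ratio = {OUTER['dx_arcsec'] / INNER['dx_arcsec']:.0f}, "
      f"time step ratio = {OUTER['dt'] / INNER['dt']:.0f}, "
      f"implied maximum depth ratio = {required_depth_ratio(OUTER, INNER):.0f}")
```

With a grid ratio of 4 and a time step ratio of 2, the two Courant numbers match when the maximum depth of the outer layer is about four times that of the inner layer.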

Figure 5.9 displays the water surfaces in two grid layers at different moments. It can be
seen that the tsunami waves propagate seamlessly across the boundary of nesting grids, and no
unphysical oscillations are introduced. Comparison of water surface profiles between the dispersive
and non-dispersive models in Figure 5.10 shows evident dispersive short waves caused by the large
shallow slip. Figure 5.9 vividly depicts how frequency dispersion influences tsunami waves in the
deep ocean and the shallow region, respectively. In the deep ocean, a series of trailing waves are
generated, and both the amplitude and steepness of the leading wave are reduced. In contrast, when the
wave front approaches the coast and becomes sufficiently steep, dispersion begins to split
the wave. As a result, the leading wave is amplified, and multiple short-period waves are formed
near the crest.
The simulated and observed tsunami waveforms at six stations are plotted in Figure 5.11. The
dispersive and non-dispersive models give almost identical results at the coastal gauges, which agree well with
the observation data. On the other hand, at DART station 21418, the dispersive model gives
excellent predictions of the leading wave and the small trailing waves, which cannot be reproduced
by the shallow water model. These results suggest that dispersive effects are important in the deep
ocean, while the shallow water equations are generally adequate for simulating local tsunamis in
coastal areas.

In summary, the non-hydrostatic model for wave dispersion and the nesting algorithm perform
satisfactorily in terms of accuracy and stability. For the 2011 Tohoku case, the time cost of the
dispersive model is only ∼2.5 times that of the shallow water model. The good balance
between accuracy and efficiency achieved by PCOMCOT supports its application to accurate large-
scale tsunami modeling.

[Figure 5.9 image. Water surface elevation (m, color scale -8 to 8) at t = 800 s, 1600 s, 2400 s, and 3200 s. Left panels: parent layer, 138°E to 150°E and 34°N to 42°N, with the tsunami stations marked. Right panels: child layer, 141°E to 142°E and 38°N to 41°N.]

Figure 5.9 Snapshots of water surface elevation given by the dispersive model in nested grids. The
left panels show the surface profiles in the parent layer, with black boxes indicating the domain of
the child layer. The right panels are snapshots of the child layer at the same moments.

[Figure 5.10 image. Water surface elevation (m, color scale -5 to 5) over 138°E to 150°E and 34°N to 42°N at t = 800 s, 1600 s, 2400 s, and 3200 s. Left panels: dispersive model. Right panels: non-dispersive model.]

Figure 5.10 Comparison of water surface profiles between the dispersive and non-dispersive models.

[Figure 5.11 image. Observed, non-dispersive, and dispersive wave height (m) versus time (min), from 0 to 90 min, at stations 202, 613, 801, 803, 804, and 21418.]

Figure 5.11 Comparison of tsunami waveforms recorded at wave gauges and computed with dif-
ferent models.

5.5 Performance Analysis

All the numerical tests have been conducted with both the CPU and GPU versions of PCOMCOT.
Before analyzing the model performance on CPU and GPU, it is necessary to state that these two
versions give practically the same results. Taking the 2011 Tohoku tsunami as an example, the
dispersive waveforms computed by the CPU and GPU versions are plotted in Figure 5.12. Double-
and single-precision computing are performed on CPU and GPU, respectively. The
tsunami waves given by the two versions are indistinguishable in the plots, and the difference in
floating-point precision has no noticeable influence on the accuracy of the result.

[Figure 5.12 image. Observed wave height (m) versus time (min), from 0 to 90 min, at stations 202, 613, 801, 803, 804, and 21418, compared with PCOMCOT-CPU (double precision) and PCOMCOT-GPU (single precision).]

Figure 5.12 Comparison of dispersive tsunami waves computed by CPU and GPU versions of
PCOMCOT.

We run the CPU version of PCOMCOT on 32 cores of an Intel Xeon Platinum 8358 processor
with a base frequency of 2.6 GHz. For the GPU version, we use three different NVIDIA
Tesla GPUs (P100, V100, and A100), which represent outdated, mainstream, and the most
advanced architectures, respectively. On each of these platforms, we adopt four types of equations
(i.e., linear/nonlinear, dispersive/non-dispersive) for simulation, and record the corresponding time
cost. Table 5.2 displays the time cost for the 2011 Tohoku case on different hardware. The
PCOMCOT model is quite efficient on the 32-core CPU, where it takes only 10 min to finish the
nonlinear dispersive simulation of 1.6 hours of tsunami propagation. The model performance on the
P100 GPU is comparable to that on the Intel Xeon 8358 CPU. A significant decrease in time cost can
be seen on the V100 and A100 GPUs.

Table 5.2 Time cost of PCOMCOT on different hardware for the 2011 Tohoku case

                            Intel® Xeon® 8358 CPU   NVIDIA® P100 GPU   V100 GPU   A100 GPU
linear, non-dispersive      3.5 min                 4 min              50 s       30 s
nonlinear, non-dispersive   4 min                   5 min              1 min      40 s
linear, dispersive          9.5 min                 10 min             3.3 min    1.7 min
nonlinear, dispersive       10 min                  10.5 min           3.5 min    1.8 min

[Figure 5.13 image. Bar chart of performance relative to the CPU. Hardware and prices: 32-core Intel Xeon 8358 CPU ($4.6k), P100-16G GPU ($0.3k), V100-16G GPU ($1k), A100-40G GPU ($8k).

                            CPU    P100     V100    A100
linear, non-dispersive      1x     0.9x     4.2x    7x
nonlinear, non-dispersive   1x     0.8x     4x      6x
linear, dispersive          1x     0.95x    2.9x    5.6x
nonlinear, dispersive       1x     0.95x    2.9x    5.6x]

Figure 5.13 Speedup of PCOMCOT-GPU over PCOMCOT-CPU for the 2011 Tohoku case.

Figure 5.13 shows the speedup of PCOMCOT-GPU over PCOMCOT-CPU, together with the
current market prices of the different hardware. Using a V100 GPU, about 4 and 3 times speedup
can be obtained for non-dispersive and dispersive simulations, respectively. On an A100 GPU,
a speedup ratio of ∼6 is achieved for all types of simulations. The decrease in speedup ratio for
dispersive simulations is due to the less effective preconditioning method used by the GPU version to
solve the Poisson-type equation. In brief, considering both the performance and hardware price,
the mainstream V100 GPU can provide significant speedup over a CPU that is ∼5 times as
expensive.
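The speedup ratios quoted above follow directly from the time costs in Table 5.2, as the following sketch shows. Times are converted to minutes, prices are those listed with Figure 5.13, and the variable names are hypothetical.

```python
# Time cost in minutes, from Table 5.2 (seconds converted to minutes).
TIME_COST_MIN = {
    "linear, non-dispersive":    {"CPU": 3.5,  "P100": 4.0,  "V100": 50 / 60, "A100": 30 / 60},
    "nonlinear, non-dispersive": {"CPU": 4.0,  "P100": 5.0,  "V100": 1.0,     "A100": 40 / 60},
    "linear, dispersive":        {"CPU": 9.5,  "P100": 10.0, "V100": 3.3,     "A100": 1.7},
    "nonlinear, dispersive":     {"CPU": 10.0, "P100": 10.5, "V100": 3.5,     "A100": 1.8},
}
# Hardware prices in thousands of US dollars, as listed with Figure 5.13.
PRICE_KUSD = {"CPU": 4.6, "P100": 0.3, "V100": 1.0, "A100": 8.0}

def speedups(row):
    """Speedup of each platform relative to the CPU for one type of simulation."""
    return {hardware: row["CPU"] / minutes for hardware, minutes in row.items()}

for case, row in TIME_COST_MIN.items():
    ratios = speedups(row)
    summary = ", ".join(f"{hw}: {ratios[hw]:.1f}x" for hw in ("P100", "V100", "A100"))
    print(f"{case:27s} {summary}")

print(f"CPU price / V100 price = {PRICE_KUSD['CPU'] / PRICE_KUSD['V100']:.1f}")
```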

6 Citation of PCOMCOT
If you publish papers that use or mention PCOMCOT, please cite our articles
on the algorithms and applications of PCOMCOT. These articles include:

• Zhu, Y., C. An, H. Yu, W. Zhang, and X. Chen (2024), High-resolution tsunami hazard assessment
for the Guangdong-Hong Kong-Macao Greater Bay Area based on a non-hydrostatic tsunami model,
Science China Earth Sciences, 67 (7), 2326–2351, doi: 10.1007/s11430-023-1300-9

• An, C., H. Liu, Z. Ren, and Y. Yuan (2018), Prediction of tsunami waves by uniform slip models,
Journal of Geophysical Research: Oceans, 123 (11), 8366–8382, doi: 10.1029/2018JC014363

• Wang, X., and P. L.-F. Liu (2006), An analysis of 2004 Sumatra earthquake fault plane mecha-
nisms and Indian Ocean tsunami, Journal of Hydraulic Research, 44 (2), 147–154, doi: 10.1080/
00221686.2006.9521671

Appendix A MATLAB Scripts to Read PCOMCOT Output

A.1 A sample MATLAB script to read snapshots

function [x, y, dat] = COMCOT_readBinaryDataSnapshot(fn)
% [x, y, dat] = COMCOT_readBinaryDataSnapshot(fn)
% Read a snapshot file and the coordinate files of the same layer.

% Split fn into directory and file name
lbslash = strfind(fn, '/');
if (isempty(lbslash))
    path = './';
    fn0 = fn;
else
    lbslashlast = max(lbslash);
    path = fn(1:lbslashlast);
    fn0 = fn(lbslashlast+1:end);
end

% Layer number: the two digits after the first underscore or, for file
% names starting with an underscore, the two digits before '.dat'
ii = strfind(fn0, '_');
ilayer = str2num(fn0(ii(1)+1:ii(1)+2));
if (fn0(1) == '_')
    ii = strfind(fn0, '.dat');
    ilayer = str2num(fn0(ii(1)-2:ii(1)-1));
end
if (isempty(ilayer) == 1)
    ilayer = 1;
end

x = load([path, '_xcoordinate', sprintf('%02d', ilayer), '.dat']);
y = load([path, '_ycoordinate', sprintf('%02d', ilayer), '.dat']);

% Files starting with M (or N): shift x (or y) by half a grid cell
% and drop the last point
if ((fn0(1) == 'M') || (fn0(1) == 'm'))
    x = x(1:end-1) + 0.5*(x(2)-x(1));
elseif ((fn0(1) == 'N') || (fn0(1) == 'n'))
    y = y(1:end-1) + 0.5*(y(2)-y(1));
end

fid = fopen(fn, 'rb');
% Every record is enclosed by a 4-byte start tag and a 4-byte end tag
StartTag = fread(fid, 1, 'integer*4');
fp = fread(fid, 1, 'integer*4'); % floating-point precision: 4/8
EndTag = fread(fid, 1, 'integer*4');
realType = sprintf('real*%d', fp);

StartTag = fread(fid, 1, 'integer*4');
ncol = fread(fid, 1, 'integer*4'); % NX = ncol
nrow = fread(fid, 1, 'integer*4'); % NY = nrow
EndTag = fread(fid, 1, 'integer*4');
dat = zeros(nrow, ncol);
% One record per row of the grid
for i = 1:nrow
    StartTag = fread(fid, 1, 'integer*4');
    dattmp = fread(fid, ncol, realType);
    dat(i, :) = dattmp(:);
    EndTag = fread(fid, 1, 'integer*4');
end
fclose(fid);
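For users who prefer Python, the record layout read by the MATLAB script above can be mirrored with the standard struct module. This is a sketch, not part of PCOMCOT: it assumes little-endian byte order, ignores the values of the record tags just as the MATLAB script does, and includes a hypothetical write_snapshot helper whose only purpose is to create a small demonstration file with the same layout.

```python
import os
import struct
import tempfile

REAL_CODES = {4: "f", 8: "d"}  # floating-point precision (bytes) -> struct code

def _read_record(f, fmt):
    """Read one record: 4-byte start tag, payload described by fmt, 4-byte end tag."""
    f.read(4)  # start tag (value ignored, as in the MATLAB script)
    values = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
    f.read(4)  # end tag (value ignored)
    return values

def read_snapshot(path):
    """Return the snapshot as a list of nrow rows, each holding ncol values."""
    with open(path, "rb") as f:
        (fp,) = _read_record(f, "<i")        # floating-point precision: 4 or 8
        ncol, nrow = _read_record(f, "<2i")  # NX = ncol, NY = nrow
        real = REAL_CODES[fp]
        return [list(_read_record(f, f"<{ncol}{real}")) for _ in range(nrow)]

def write_snapshot(path, rows, fp=8):
    """Hypothetical helper: write rows in the same layout, for demonstration only."""
    real = REAL_CODES[fp]

    def record(fmt, *values):
        payload = struct.pack(fmt, *values)
        tag = struct.pack("<i", len(payload))  # tag value assumed to be the byte count
        return tag + payload + tag

    with open(path, "wb") as f:
        f.write(record("<i", fp))
        f.write(record("<2i", len(rows[0]), len(rows)))
        for row in rows:
            f.write(record(f"<{len(row)}{real}", *row))

# Round trip on a small demonstration file (the file name is hypothetical).
demo_path = os.path.join(tempfile.mkdtemp(), "z_01_demo.dat")
write_snapshot(demo_path, [[0.0, 0.5, 1.0], [1.5, 2.0, 2.5]])
print(read_snapshot(demo_path))
```

The coordinate files and the half-cell shift for files starting with M or N are not handled here; they can be treated as in the MATLAB script.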

A.2 A sample MATLAB script to read station data

function dat = COMCOT_readBinaryDataStation(fn)
% read data from binary file to memory
% dat = COMCOT_readBinaryDataStation(fn)
% Column 1 of dat is time; the remaining columns are the recorded series.

fid = fopen(fn, 'rb');
% Every record is enclosed by a 4-byte start tag and a 4-byte end tag
StartTag = fread(fid, 1, 'integer*4');
fp = fread(fid, 1, 'integer*4'); % floating-point precision: 4/8
EndTag = fread(fid, 1, 'integer*4');
realType = sprintf('real*%d', fp);

% Header record
StartTag = fread(fid, 1, 'integer*4');
ComputeGreen = fread(fid, 1, 'integer*4');
NFaults = fread(fid, 1, 'integer*4');
NDataLength = fread(fid, 1, 'integer*4');
EndTag = fread(fid, 1, 'integer*4');

if (ComputeGreen ~= 1)
    nDatCol = NFaults+1;
else
    nDatCol = 2;
end
dat = zeros(NDataLength, nDatCol);

% Time axis
StartTag = fread(fid, 1, 'integer*4');
t = fread(fid, NDataLength, realType);
dat(:, 1) = t(:);
EndTag = fread(fid, 1, 'integer*4');

% One record per data column
for iCol = 1:nDatCol-1
    StartTag = fread(fid, 1, 'integer*4');
    h = fread(fid, NDataLength, realType);
    dat(:, 1+iCol) = h(:);
    EndTag = fread(fid, 1, 'integer*4');
end
fclose(fid);

References
An, C., I. Sepúlveda, and P. L.-F. Liu (2014), Tsunami source and its validation of the 2014
Iquique, Chile, earthquake, Geophysical Research Letters, 41 (11), 3988–3994, doi: 10.1002/
2014GL060567.

An, C., H. Liu, Z. Ren, and Y. Yuan (2018), Prediction of tsunami waves by uniform slip models,
Journal of Geophysical Research: Oceans, 123 (11), 8366–8382, doi: 10.1029/2018JC014363.

Arcos, M. E. M., and R. J. LeVeque (2015), Validating velocities in the GeoClaw tsunami model
using observations near Hawaii from the 2011 Tohoku tsunami, Pure and Applied Geophysics,
172 (3), 849–867, doi: 10.1007/s00024-014-0980-y.

Baba, T., N. Takahashi, Y. Kaneda, K. Ando, D. Matsuoka, and T. Kato (2015), Parallel imple-
mentation of dispersive tsunami wave modeling with a nesting algorithm for the 2011 Tohoku
tsunami, Pure and Applied Geophysics, 172 (12), 3455–3472, doi: 10.1007/s00024-015-1049-2.

Baba, T., S. Allgeyer, J. Hossen, P. R. Cummins, H. Tsushima, K. Imai, K. Yamashita, and
T. Kato (2017), Accurate numerical simulation of the far-field tsunami caused by the 2011 Tohoku
earthquake, including the effects of Boussinesq dispersion, seawater density stratification, elastic
loading, and gravitational potential change, Ocean Modelling, 111, 46–54, doi: 10.1016/j.ocemod.
2017.01.002.

Barrett, R., M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo,
C. Romine, and H. van der Vorst (1994), Templates for the Solution of Linear Systems: Building
Blocks for Iterative Methods, chap. 3, Society for Industrial and Applied Mathematics, doi: 10.
1137/1.9781611971538.

Bates, P. D., M. S. Horritt, and T. J. Fewtrell (2010), A simple inertial formulation of the shallow
water equations for efficient two-dimensional flood inundation modelling, Journal of Hydrology,
387 (1), 33–45, doi: 10.1016/j.jhydrol.2010.03.027.

Briggs, M. J., C. E. Synolakis, G. S. Harkins, and D. R. Green (1995), Laboratory experiments of
tsunami runup on a circular island, Pure and Applied Geophysics, 144 (3), 569–593.

Casulli, V. (1999), A semi-implicit finite difference method for non-hydrostatic, free-surface flows,
International Journal for Numerical Methods in Fluids, 30 (4), 425–440, doi: 10.1002/(SICI)
1097-0363(19990630)30:4⟨425::AID-FLD847⟩3.0.CO;2-D.

Chen, Q., R. A. Dalrymple, J. T. Kirby, A. B. Kennedy, and M. C. Haller (1999), Boussinesq
modeling of a rip current system, Journal of Geophysical Research: Oceans, 104 (C9), 20,617–
20,637, doi: 10.1029/1999JC900154.

Chen, Q., J. T. Kirby, R. A. Dalrymple, A. B. Kennedy, and A. Chawla (2000), Boussinesq modeling
of wave transformation, breaking, and runup. II: 2D, Journal of Waterway, Port, Coastal, and
Ocean Engineering, 126 (1), 48–56, doi: 10.1061/(ASCE)0733-950X(2000)126:1(48).

Chiang, C. M., M. Stiassnie, and D. K.-P. Yue (2005), Theory and Applications of Ocean Surface
Waves, chap. 12, World Scientific, doi: 10.1142/5566.

Cho, Y.-S., and J.-M. Kim (2009), Moving boundary treatment in run-up process of tsunami,
Journal of Coastal Research, pp. 482–486.

Choi, Y.-K., F. Shi, M. Malej, and J. M. Smith (2018), Performance of various shock-capturing-type
reconstruction schemes in the Boussinesq wave model, FUNWAVE-TVD, Ocean Modelling, 131,
86–100, doi: 10.1016/j.ocemod.2018.09.004.

de Almeida, G. A. M., P. Bates, J. E. Freer, and M. Souvignet (2012), Improving the stability
of a simple formulation of the shallow water equations for 2-d flood modeling, Water Resources
Research, 48 (5), doi: 10.1029/2011WR011570.

Fujii, Y., K. Satake, S. Sakai, M. Shinohara, and T. Kanazawa (2011), Tsunami source of the 2011
off the Pacific coast of Tohoku earthquake, Earth, Planets and Space, 63 (7), 815–820.

Fujiwara, T., S. Kodaira, T. No, Y. Kaiho, N. Takahashi, and Y. Kaneda (2011), The 2011 Tohoku-
Oki earthquake: displacement reaching the trench axis, Science, 334 (6060), 1240–1240, doi:
10.1126/science.1211554.

Glimsdal, S., G. K. Pedersen, C. B. Harbitz, and F. Løvholt (2013), Dispersion of tsunamis: does
it really matter?, Natural Hazards and Earth System Sciences, 13 (6), 1507–1526, doi: 10.5194/
nhess-13-1507-2013.

Harig, S., Chaeroni, W. S. Pranowo, and J. Behrens (2008), Tsunami simulations on several scales,
Ocean Dynamics, 58 (5), 429–440, doi: 10.1007/s10236-008-0162-5.

Heidarzadeh, M., S. Murotani, K. Satake, T. Ishibe, and A. R. Gusman (2016), Source model of
the 16 September 2015 Illapel, Chile, Mw 8.4 earthquake based on teleseismic and tsunami data,
Geophysical Research Letters, 43 (2), 643–650, doi: 10.1002/2015GL067297.

Horrillo, J., Z. Kowalik, and Y. Shigihara (2006), Wave dispersion study in the Indian Ocean-tsunami
of December 26, 2004, Marine Geodesy, 29 (3), 149–166, doi: 10.1080/01490410600939140.

Imamura, F. (1996), Simulation of wave-packet propagation along sloping beach by TUNAMI-code,
in Long-Wave Runup Models, edited by H. Yeh, P. Liu, and C. Synolakis, pp. 231–241, World
Scientific.

Kajiura, K. (1963), The leading wave of a tsunami, Bulletin of the Earthquake Research Institute,
University of Tokyo, 41 (3), 535–571.

Kânoğlu, U., and C. E. Synolakis (1998), Long wave runup on piecewise linear topographies,
Journal of Fluid Mechanics, 374, 1–28, doi: 10.1017/S0022112098002468.

Kennedy, A. B., Q. Chen, J. T. Kirby, and R. A. Dalrymple (2000), Boussinesq modeling of wave
transformation, breaking, and runup. I: 1D, Journal of Waterway, Port, Coastal, and Ocean
Engineering, 126 (1), 39–47, doi: 10.1061/(ASCE)0733-950X(2000)126:1(39).

Koçyigit, M. B., R. A. Falconer, and B. Lin (2002), Three-dimensional numerical modelling of
free surface flows with non-hydrostatic pressure, International Journal for Numerical Methods in
Fluids, 40 (9), 1145–1162, doi: 10.1002/fld.376.

Kowalik, Z., W. Knight, T. Logan, and P. Whitmore (2005), Numerical modeling of the global
tsunami: Indonesian tsunami of 26 December 2004, Science of Tsunami Hazards, 23 (1), 40–56.

Larsen, J., and H. Dancy (1983), Open boundaries in short wave simulations — a new approach,
Coastal Engineering, 7 (3), 285–297, doi: 10.1016/0378-3839(83)90022-4.

Lax, P., and B. Wendroff (1960), Systems of conservation laws, Communications on Pure and
Applied Mathematics, 13 (2), 217–237, doi: 10.1002/cpa.3160130205.

LeVeque, R. J., D. L. George, and M. J. Berger (2011), Tsunami modelling with adaptively refined
finite volume methods, Acta Numerica, 20, 211–289, doi: 10.1017/S0962492911000043.

Liu, P., S. Woo, and Y. Cho (1998), Computer programs for tsunami propagation and inundation,
Technical report, Cornell University, Ithaca, N.Y.

Liu, P. L. F., Y.-S. Cho, M. J. Briggs, U. Kanoglu, and C. E. Synolakis (1995), Runup of
solitary waves on a circular island, Journal of Fluid Mechanics, 302, 259–285, doi: 10.1017/
S0022112095004095.

Løvholt, F., and G. Pedersen (2009), Instabilities of Boussinesq models in non-uniform depth,
International Journal for Numerical Methods in Fluids, 61 (6), 606–637, doi: 10.1002/fld.1968.

Lynett, P., and P. L.-F. Liu (2004), A two-layer approach to wave modelling, Proceedings of the
Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 460 (2049),
2637–2669, doi: 10.1098/rspa.2004.1305.

Lynett, P. J., T.-R. Wu, and P. L.-F. Liu (2002), Modeling wave runup with depth-integrated
equations, Coastal Engineering, 46 (2), 89–107, doi: 10.1016/S0378-3839(02)00043-1.

Ma, G., F. Shi, and J. T. Kirby (2012), Shock-capturing non-hydrostatic model for fully dispersive
surface wave processes, Ocean Modelling, 43-44, 22–35, doi: 10.1016/j.ocemod.2011.12.002.

Oishi, Y., F. Imamura, and D. Sugawara (2015), Near-field tsunami inundation forecast using the
parallel TUNAMI-N2 model: Application to the 2011 Tohoku-Oki earthquake combined with
source inversions, Geophysical Research Letters, 42 (4), 1083–1091, doi: 10.1002/2014GL062577.

Okada, Y. (1985), Surface deformation due to shear and tensile faults in a half-space, Bulletin of
the Seismological Society of America, 75 (4), 1135–1154, doi: 10.1785/BSSA0750041135.

Peregrine, D. H. (1967), Long waves on a beach, Journal of Fluid Mechanics, 27 (4), 815–827, doi:
10.1017/S0022112067002605.

Ruetsch, G., and M. Fatica (2024), CUDA Fortran for scientists and engineers: best practices for
efficient CUDA Fortran programming, Elsevier.

Saito, T., Y. Ito, D. Inazu, and R. Hino (2011), Tsunami source of the 2011 Tohoku-Oki earthquake,
Japan: Inversion analysis based on dispersive tsunami simulations, Geophysical Research Letters,
38 (7), doi: 10.1029/2011GL049089.

Satake, K., Y. Fujii, T. Harada, and Y. Namegaya (2013), Time and space distribution of coseismic
slip of the 2011 Tohoku earthquake as inferred from tsunami waveform data, Bulletin of the
Seismological Society of America, 103 (2B), 1473–1492, doi: 10.1785/0120120122.

Shapiro, R. (1970), Smoothing, filtering, and boundary effects, Reviews of Geophysics, 8 (2), 359–
387, doi: 10.1029/RG008i002p00359.

Shi, F., J. T. Kirby, J. C. Harris, J. D. Geiman, and S. T. Grilli (2012), A high-order adaptive
time-stepping TVD solver for Boussinesq modeling of breaking waves and coastal inundation,
Ocean Modelling, 43-44, 36–51, doi: 10.1016/j.ocemod.2011.12.004.

Shi, F., J. T. Kirby, B. Tehranirad, J. C. Harris, Y.-K. Choi, and M. Malej (2016), FUNWAVE-
TVD fully nonlinear Boussinesq wave model with TVD solver - documentation and user’s manual
(version 3.0), Center Appl. Coastal Res., Univ. Delaware (2016).

Sridharan, B., D. Gurivindapalli, S. N. Kuiry, V. K. Mali, N. Nithila Devi, P. D. Bates, and D. Sen
(2020), Explicit expression of weighting factor for improved estimation of numerical flux in local
inertial models, Water Resources Research, 56 (7), e2020WR027357, doi: 10.1029/2020WR027357.

Stelling, G., and M. Zijlema (2003), An accurate and efficient finite-difference algorithm for non-
hydrostatic free-surface flow with application to wave propagation, International Journal for
Numerical Methods in Fluids, 43 (1), 1–23, doi: 10.1002/fld.595.

Tanioka, Y., and K. Satake (1996), Tsunami generation by horizontal displacement of ocean bottom,
Geophysical Research Letters, 23 (8), 861–864, doi: 10.1029/96GL00736.

Thacker, W. C. (1981), Some exact solutions to the nonlinear shallow-water wave equations, Journal
of Fluid Mechanics, 107, 499–508, doi: 10.1017/S0022112081001882.

Titov, V. V., and F. I. González (1997), Implementation and testing of the method of splitting
tsunami (MOST) model, in NOAA Technical Memorandum ERL PMEL-112.

Titov, V. V., and C. E. Synolakis (1998), Numerical modeling of tidal wave runup, Journal
of Waterway, Port, Coastal, and Ocean Engineering, 124 (4), 157–171,
doi: 10.1061/(ASCE)0733-950X(1998)124:4(157).

van der Vorst, H. A. (1992), Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the
solution of nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing,
13 (2), 631–644, doi: 10.1137/0913035.

Wang, X., and P. L.-F. Liu (2006), An analysis of 2004 Sumatra earthquake fault plane
mechanisms and Indian Ocean tsunami, Journal of Hydraulic Research, 44 (2), 147–154, doi:
10.1080/00221686.2006.9521671.

Wessel, P., W. H. F. Smith, R. Scharroo, J. Luis, and F. Wobbe (2013), Generic Mapping Tools:
Improved version released, Eos, Transactions American Geophysical Union, 94 (45), 409–410, doi:
10.1002/2013EO450001.

Yamazaki, Y., Z. Kowalik, and K. F. Cheung (2009), Depth-integrated, non-hydrostatic model
for wave breaking and run-up, International Journal for Numerical Methods in Fluids, 61 (5),
473–497, doi: 10.1002/fld.1952.

Yamazaki, Y., K. F. Cheung, and Z. Kowalik (2011), Depth-integrated, non-hydrostatic model
with grid nesting for tsunami generation, propagation, and run-up, International Journal for
Numerical Methods in Fluids, 67 (12), 2081–2107, doi: 10.1002/fld.2485.

Yeh, H., F. Imamura, C. Synolakis, Y. Tsuji, P. Liu, and S. Shi (1993), The Flores Island tsunamis,
Eos, Transactions American Geophysical Union, 74 (33), 369–373, doi: 10.1029/93EO00381.

Yuan, Y., F. Shi, J. T. Kirby, and F. Yu (2020), FUNWAVE-GPU: Multiple-GPU acceleration
of a Boussinesq-type wave model, Journal of Advances in Modeling Earth Systems, 12 (5),
e2019MS001957, doi: 10.1029/2019MS001957.

Zhu, Y., C. An, H. Yu, W. Zhang, and X. Chen (2024), High-resolution tsunami hazard assessment
for the Guangdong-Hong Kong-Macao Greater Bay Area based on a non-hydrostatic tsunami model,
Science China Earth Sciences, 67 (7), 2326–2351, doi: 10.1007/s11430-023-1300-9.

Zijlema, M., and G. Stelling (2008), Efficient computation of surf zone waves using the nonlinear
shallow water equations with non-hydrostatic pressure, Coastal Engineering, 55 (10), 780–790,
doi: 10.1016/j.coastaleng.2008.02.020.

Zijlema, M., G. Stelling, and P. Smit (2011), SWASH: An operational public domain code for
simulating wave fields and rapidly varied flows in coastal waters, Coastal Engineering, 58 (10),
992–1012, doi: 10.1016/j.coastaleng.2011.05.015.
