2D Acoustic Wave Equation - Parallel Hybrid Implementation
implicit finite difference scheme. First, we transform the differential equation into
an implicit finite-difference equation and then, using the ADI method, we split the
equation into two sub-equations. Using the cyclic reduction algorithm, we calculate
an approximate solution. Finally, we adapt this algorithm for parallel execution on GPU,
GPU+OpenMP, and hybrid (GPU+OpenMP+MPI) computing platforms.
Special focus is placed on improving the performance of the parallel algorithms and on
measuring the speedup based on the execution time. We show that the hybrid approach
gives the expected results by comparing them with those obtained by running the same
simulation on a single classical processor core and with the CUDA and CUDA+OpenMP
implementations.
1. Introduction
The reduction of computational time for long-term simulation of physical processes
is a challenge and an important issue in the field of modern scientific computing.
Supercomputers, CPU clusters, and hybrid clusters with a large number of GPUs are
very expensive and consume a lot of energy, which makes them inaccessible and
impractical for small laboratories and individuals.
Nowadays, new-generation computers have multi-core, hybrid architectures, and their
computational power is quite high. For example, the Intel Xeon E5-2697 v2
(2S-E5) processor has a theoretical computing power of about 19.56 GFLOPS, while the
NVIDIA TITAN Xp video card reaches about 379.7 GFLOPS. If we use the computing
power of the CPU and the GPU together, we can obtain good results.
The goal of this work is to develop a parallel hybrid implementation of the finite-
difference method for solving the two-dimensional wave equation using CUDA, CUDA +
OpenMP, and CUDA + OpenMP + MPI technologies and to study the parallelization
efficiency by comparing the times needed to solve this problem with the above approaches.
GPUs have been used for several years to accelerate highly parallelizable computations,
but only with the advent of a new generation of GPUs with many-core architectures did
this direction begin to give tangible results.
For multidimensional problems, the efficiency of an implicit compact difference
scheme depends on the computational efficiency of the corresponding matrix solvers.
From this point of view, the ADI method [1] is promising because it decomposes
a multidimensional problem into a series of one-dimensional problems. It has been
shown that the resulting schemes are unconditionally stable. To represent large
modeling domains properly, two- or three-dimensional computational grids with
a sufficient number of points are used. Calculations on such grids require more
CPU time and more computer memory. To accelerate the computation,
GPU, OpenMP, and MPI technologies are used in this paper, which allows the program
to operate on larger grids. With the GPU becoming a viable alternative to the CPU for
parallel computing, the aforementioned parallel tridiagonal solvers and other hybrid
methods have been implemented on GPUs [4]–[11]. In this paper, we propose three
different parallel programming approaches using hybrid CUDA, OpenMP, and MPI
programming for personal computers. There are many examples in the literature of
hybrid approaches being used successfully for different simulations [12]–[17].
Here we study issues arising in the numerical simulation of acoustic wave propagation
problems on high performance computing systems.
3.1. CUDA approach. The graphics processing unit (GPU) is a highly parallel,
multi-threaded, multi-core processor with enormous processing power. Its low
cost, high floating-point throughput, and high memory bandwidth are
attracting more and more high performance computing researchers [32]. In addition,
compared to cluster systems consisting of several processors, computing on a GPU
is inexpensive and requires low power consumption for equivalent performance. In
many disciplines of science and technology, users have been able to increase productivity
by several orders of magnitude using graphics processors [2], [3]. In 2007,
with the appearance of the CUDA programming language, programming GPUs on
NVIDIA graphics cards became significantly simpler, since its syntax is similar to
C [26].
It is designed so that its constructs allow a natural expression of concurrency at
the data level. A CUDA program consists of two parts: a sequential part running
on the CPU and a parallel part running on the GPU [3], [31]. The parallel part is
called the kernel. A CUDA program automatically exploits more parallelism on GPUs
that have more processor cores.
A C program using the CUDA extensions distributes a large number of copies of the
kernel across the available multiprocessors, to be executed simultaneously.
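To illustrate this division between the sequential host part and the parallel kernel, the following minimal, self-contained sketch (not part of our solver; the names and sizes are chosen only for illustration) adds two vectors on the GPU:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// each thread adds one pair of elements
__global__ void vec_add(const double *a, const double *b, double *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(double);
    double *h_a = (double *)malloc(bytes);
    double *h_b = (double *)malloc(bytes);
    double *h_c = (double *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0; h_b[i] = 2.0; }

    double *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    // sequential host part: copy the input data to the GPU
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // parallel part: launch one copy of the kernel per block of threads
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // copy the result back to the host
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}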
The CUDA code consists of three computational steps: transferring the data to the
global GPU memory, launching the CUDA kernel, and transferring the results from the
GPU back to the CPU memory. We have designed a CUDA program based on the cyclic
reduction method; the full code of the CR function is available at [29]. The algorithm
for solving the problem (2.1)–(2.5) is shown in Algorithm 1.
Here, u, U0, Ux, Uy denote $u^{k-1/2}_{i,j}$, $u^{k}_{i,j}$, $u^{k+1/2}_{i,j}$, $u^{k+1}_{i,j}$,
respectively. The CR() function includes three device functions, namely CRM_forward(),
cr_div(), and CRM_backward(), and one host function calc_dim(). First, we have to calculate
the block size, according to the size of the matrix, and the number of equations processed
at each forward and backward sub-step. For this, we use the following loop:
for (i = 0; i < log2(n + 1) - 1; i++) {
    stepNum = (n - pow(2.0, i + 1)) / pow(2.0, i + 1) + 1;
    calc_dim(stepNum, &dimBlock, &dimGrid);
    CRM_forward<<<dimGrid, dimBlock>>>(d_a, d_b, d_c, d_f, n, stepNum, i);
}
Here log2(n + 1) − 1 is the number of forward steps, and stepNum is the number of
equations processed at step i; it determines the block size. For this we need the function
calc_dim(), which determines the block and grid sizes. The function CRM_forward()
therefore runs log2(n + 1) − 1 times.
Consequently, the system is reduced to a single equation. We then synchronize the
device and call the function cr_div(), which calculates two unknowns. Then
we use the following loop:
for (i = log2(n + 1) - 2; i >= 0; i--) {
    stepNum = (n - pow(2.0, i + 1)) / pow(2.0, i + 1) + 1;
    calc_dim(stepNum, &dimBlock, &dimGrid);
    CRM_backward<<<dimGrid, dimBlock>>>(d_a, d_b, d_c, d_f, d_x, n, stepNum, i);
}
Here the backward-substitution loop starts at i = log2(n + 1) − 2 because the first backward-
substitution sub-step has already been computed by the function cr_div(). Thus, we obtain
the array d_x. After that, we copy the computed data d_x from the device to the host using
cudaMemcpy(y, d_x, sizeof(double) * n, cudaMemcpyDeviceToHost).
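The actual implementations of calc_dim() and of the CR kernels are available in the repository [29]. The sketches below are illustrative only and are not copied from that code: a simple block-size helper (assuming a limit of 512 threads per block) and a generic forward-elimination step of cyclic reduction for a system of n = 2^m − 1 equations, of the kind CRM_forward() performs; the real kernel may differ in indexing and boundary handling.

// illustrative sketch of a block-size helper (the real calc_dim() is in [29])
void calc_dim(int stepNum, dim3 *dimBlock, dim3 *dimGrid) {
    int threads = (stepNum < 512) ? stepNum : 512;   /* assumed block-size limit */
    int blocks  = (stepNum + threads - 1) / threads;
    *dimBlock = dim3(threads, 1, 1);
    *dimGrid  = dim3(blocks, 1, 1);
}

// illustrative sketch of one forward-elimination step of cyclic reduction
__global__ void CRM_forward(double *a, double *b, double *c, double *f,
                            int n, int stepNum, int step) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= stepNum) return;
    int stride = 1 << (step + 1);        // distance between equations at this level
    int i  = stride * (tid + 1) - 1;     // equation being reduced
    int im = i - (stride >> 1);          // upper neighbour
    int ip = i + (stride >> 1);          // lower neighbour (may fall outside the system)
    double k1 = a[i] / b[im];
    double k2 = (ip < n) ? c[i] / b[ip] : 0.0;
    // eliminate the neighbours from equation i
    b[i] -= c[im] * k1 + ((ip < n) ? a[ip] * k2 : 0.0);
    f[i] -= f[im] * k1 + ((ip < n) ? f[ip] * k2 : 0.0);
    a[i]  = -a[im] * k1;
    c[i]  = (ip < n) ? -c[ip] * k2 : 0.0;
}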
3.3. Hybrid approach. The message passing interface (MPI) is a standardized and
portable programming interface for exchanging messages between multiple processors
executing a parallel program in distributed memory.
MPI works well on a wide variety of distributed-memory architectures and is well suited
to individual computers and clusters. However, MPI relies on explicit communication
between parallel processes, which requires the mesh to be decomposed in advance for
data distribution. Therefore, MPI can cause load-balancing problems and consume extra time.
Since MPICH2 is freely available, we use it in our implementations. In [8] the
authors used a compact implementation of the MPI standard for message passing in
distributed-memory applications. MPICH is free software and is available for
most UNIX and Microsoft Windows systems. MPI is standardized on many
levels, which provides many advantages for the user. One of them is the guarantee that,
because the syntax has been standardized, MPI code can be executed with any MPI
implementation running on your architecture. Since the functional behavior
of the MPI calls is also standardized, the code should behave in the same way regardless of the
implementation, which ensures the portability of parallel programs.
We use MPI technology to calculate the elements of the tridiagonal matrix system,
i.e., $a_i$, $b_i$, $c_i$, $f_i$, because these values can be calculated independently, so MPI can be
applied here successfully.
Listing 1 shows the corresponding code.
/* each rank computes its own slice [i1, i2) of the coefficients */
i1 = (n*rank) / size;
i2 = (n*(rank + 1)) / size;
kk = 0;                         /* local index into the per-rank arrays */
for (i = i1; i < i2; i++)
{
    a_m[kk] = tau*tau;
    c_m[kk] = tau*tau;
    b_m[kk] = 2 * tau*tau + h*h;
    f_m[kk] = h*h*Unn[i] - 2 * h*h*uu0[i];
    kk++;
}
/* the local slices are assembled into the full arrays a, b, c, f on rank 0
   (gather step not shown) and then broadcast; MPI_Bcast is a collective
   call, so every rank must execute it */
MPI_Bcast(a, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(b, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(c, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Bcast(f, n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
These parallel technologies, CUDA, OpenMP, and MPI, can be combined into a
multi-layered hybrid structure, provided that the system has several CPU cores
and at least one graphics processor. Under this hybrid structure (Figure 1), we can
better exploit the advantages of each programming model; a sketch of such a combination
is given below.
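As an illustration of this layering, the following sketch (illustrative only; the variable names and sizes are not taken from our solver) shows how each MPI process can use OpenMP threads to assemble its local data on the CPU cores and then hand the work to the GPU through CUDA:

#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int provided, rank, size;
    /* FUNNELED: only the main thread of each process makes MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 1024;                                  /* assumed local problem size */
    double *f = (double *)malloc(n * sizeof(double));

    /* OpenMP layer: fill the local data in parallel on the CPU cores */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        f[i] = 0.0;                                /* placeholder for the real coefficients */

    /* CUDA layer: copy the assembled data to the GPU, run the solver kernels
       there, and copy the result back (kernel launches omitted in this sketch) */
    double *d_f;
    cudaMalloc((void **)&d_f, n * sizeof(double));
    cudaMemcpy(d_f, f, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(f, d_f, n * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(d_f);
    free(f);
    MPI_Finalize();
    return 0;
}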
4. Experimental Results
In this section we show the results obtained on a desktop computer with an NVIDIA
GeForce RTX 2080 Ti GPU (4352 CUDA cores) and an Intel Core(TM) i7-9800X CPU
at 3.80 GHz with 64 GB of RAM. The simulation parameters are configured
as follows. The mesh is uniform in both directions with ∆x = ∆y = 0.01, the coefficient
is c = 1, the numerical time step is ∆t = 0.02, and the simulation time is T = 1.0; therefore
the total number of time steps is 50.
Using the implicit sub-scheme (2.11), the cyclic reduction [30] method is applied
in the x direction, which yields the grid function $u^{k+1/2}_{i,j}$. In the second
fractional time step, using the sub-scheme (2.12), the cyclic reduction method is
applied in the y direction, which yields the grid function $u^{k+1}_{i,j}$.
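Since the sub-schemes (2.11) and (2.12) are not reproduced in this excerpt, we only sketch the general form of such an ADI splitting for $u_{tt} = c^2(u_{xx} + u_{yy})$; here $\delta_x^2$ and $\delta_y^2$ denote the standard second-order central difference operators and $\tau$ is the time step, while the exact sub-schemes used in the computations are those of (2.11)–(2.12):
\[
\frac{u^{k+1/2}_{i,j} - 2u^{k}_{i,j} + u^{k-1/2}_{i,j}}{\tau^2}
= c^2\left(\delta_x^2 u^{k+1/2}_{i,j} + \delta_y^2 u^{k}_{i,j}\right),
\]
\[
\frac{u^{k+1}_{i,j} - 2u^{k+1/2}_{i,j} + u^{k}_{i,j}}{\tau^2}
= c^2\left(\delta_x^2 u^{k+1/2}_{i,j} + \delta_y^2 u^{k+1}_{i,j}\right).
\]
The first half-step is implicit only in the x direction and the second only in the y direction, so each half-step leads to a family of independent tridiagonal systems, which is exactly the structure handled by the cyclic reduction solver.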
To present more realistic data, we test four cases with large domain sizes of 1024 ×
1024, 2048 × 2048, 4096 × 4096, and 8192 × 8192. In Table 1 we report the execution
times in seconds for the serial (CPU time), CUDA (GPU time), GPU+OpenMP,
and CUDA+OpenMP+MPI implementations of the cyclic reduction method applied to the
discrete problem (2.8)–(2.10).
References
[1] D. W. Peaceman, H. H. Rachford. The Numerical Solution of Parabolic and Elliptic Differential
Equations. Journal of the Society for Industrial and Applied Mathematics, 3(1), 1955. ISSN:
0368-4245. URL: http://www.jstor.org/stable/2098834
[2] N. Bell, M. Garland. Efficient sparse matrix-vector multiplication on CUDA, NVIDIA Technical
Report, 2008.
[3] E. Elsen, P. LeGresley, E. Darve. Large calculation of the flow over a hypersonic vehicle using
a GPU. J. Comput. Phys., 227:10148–10161, 2008.
[4] Y. Zhang, J. Cohen, J. Owens. Fast tridiagonal solvers on the GPU. ACM Sigplan Notices,
45(5):127–136, 2010.
[5] Y. Zhang, J. Cohen, A. Davidson, J. Owens. A hybrid method for solving tridiagonal systems
on the GPU. GPU Computing Gems Jade Edition, 117, 2011.
[6] A. Davidson, J. Owens. Register packing for cyclic reduction: a case study. Proceedings of the
Fourth Workshop on General Purpose Processing on Graphics Processing Units, ACM, 4, 2011.
[7] A. Davidson, Y. Zhang, J. Owens. An auto-tuned method for solving large tridiagonal systems
on the GPU. Parallel and Distributed Processing Symposium (IPDPS), IEEE International,
956–965, 2011.
[8] D. Goddeke, R. Strzodka. Cyclic reduction tridiagonal solvers on GPUs applied to mixed-
precision multigrid. IEEE Transactions on Parallel and Distributed Systems, 22(1):22–32, 2011.
[9] H. Kim, S. Wu, L. Chang, W. Hwu. A scalable tridiagonal solver for GPUs. 2011 International
Conference on Parallel Processing (ICPP), IEEE, 444–453, 2011.
[10] N. Sakharnykh. Tridiagonal solvers on the GPU and applications to fluid simulation. GPU
Technology Conference, 2009.
[11] Z. Wei, B. Jang, Y. Zhang, Y. Jia. Parallelizing Alternating Direction Implicit Solver on GPUs.
International Conference on Computational Science (ICCS), Procedia Computer Science 18,
389–398, 2013.
[12] F. Bodin, S. Bihan. Heterogeneous multicore parallel programming for graphics processing
units. J. Sci. Programming, 17(4):325–336, 2009. doi:10.3233/SPR-2009-0292.
[13] C. T. Yang, C. L. Huang, C. F. Lin. Hybrid CUDA, OpenMP, and MPI parallel programming
on multicore GPU clusters. Computer Physics Communications, 182:266–269, 2011.
[14] Y. Liu, R. Xiong. A MPI + OpenMP + CUDA Hybrid Parallel Scheme for MT Occam
Inversion. International Journal of Grid and Distributed Computing, 9(9):67–82, 2016.
http://dx.doi.org/10.14257/ijgdc.2016.9.9.07
[15] A. L. D, J. E. Roman. MPI-CUDA parallel linear solvers for block-tridiagonal matrices in the
context of SLEPc's eigensolvers. Parallel Computing, 118, 2017.
[16] D. Mu, P. Chen, L. Wang. Accelerating the discontinuous Galerkin method for seismic
wave propagation simulations using multiple GPUs with CUDA and MPI. Earthq Sci,
26(6):377–393, 2013. DOI 10.1007/s11589-013-0047-7
[17] P. Alonso, R. Cortina, F. J. Martínez-Zaldívar, J. Ranilla. Neville elimination on multi- and many-
core systems: OpenMP, MPI and CUDA. J. Supercomputing, in press, doi:10.1007/s11227-009-
0360-z, SpringerLink Online Date: Nov. 18, 2009.
[18] C. Garetto and M. Ruzhansky. Hyperbolic Second Order Equations with Non-Regular Time
Dependent Coefficients. Arch. Rational Mech. Anal., 217(1):113–154, 2015.
[19] M. Ruzhansky and N. Tokmagambetov. Wave equation for operators with discrete spectrum
and irregular propagation speed. Arch. Ration. Mech. Anal., 226(3):1161–1207, 2017.
[20] M. Ruzhansky, N. Tokmagambetov. Very weak solutions of wave equation for Landau Hamil-
tonian with irregular electromagnetic field. Lett. Math. Phys., 107:591–618, 2017.
[21] M. Ruzhansky and N. Tokmagambetov. On a very weak solution of the wave equation for a
Hamiltonian in a singular electromagnetic field. Math. Notes, 103(5–6):856–858, 2018.
[22] J. C. Muñoz, M. Ruzhansky and N. Tokmagambetov. Wave propagation with irregular dissipa-
tion and applications to acoustic problems and shallow waters. J. Math. Pures Appl., 123:127–
147, 2019.
[23] J. C. Muñoz, M. Ruzhansky and N. Tokmagambetov. Acoustic and Shallow Water Wave Prop-
agation with Irregular Dissipation. Funct. Anal. Appl., 53(2):153–156, 2019.
[24] M. Ruzhansky and N. Tokmagambetov. Wave Equation for 2D Landau Hamiltonian. Appl.
Comput. Math., 18(1):69-78, 2019.
[25] A.A. Samarskii. The Theory of Difference Schemes. CRC Press, 2001.
[26] NVIDIA. http://www.nvidia.com/. Accessed 2019.
[27] G. Karniadakis, R. M. Kirby. Parallel Scientific Computing in C++ and MPI: A Seamless
Approach to Parallel Algorithms and Their Implementation. Cambridge University Press,
PAP/CDR edition, 17–30, 2003.
[28] D. Goddeke, R. Strzodka, J. Mohd-Yusof, P. McCormick, S. Buijssen, M. Grajewski, S. Turek.
Exploring weak scalability for FEM calculations on a GPU-enhanced cluster. Parallel Comput.,
33:685–699, 2007.
[29] 2D wave GPU implementation, https://github.com/Arshynbek/2Dwave-GPU-implementation
[30] R. W. Hockney. A fast direct solution of Poisson's equation using Fourier analysis. Journal of
the ACM, 12(1):95–113, 1965.
[31] J. Nickolls, I. Buck, M. Garland, K. Skadron. Scalable parallel programming with CUDA.
Queue, 6(2):40–53, 2008. doi:http://www.doi.acm.org/10.1145/1365490.1365500.
[32] A. Klöckner, T. Warburton, J. Bridge, J. S. Hesthaven. Nodal discontinuous Galerkin methods
on graphics processors. J. Comput. Phys., 228(21):7863–7882, 2009.
Arshyn Altybay:
Al-Farabi Kazakh National University
71 Al-Farabi avenue
050040 Almaty
Kazakhstan
and
Department of Mathematics: Analysis, Logic and Discrete Mathematics
Ghent University, Belgium
and
Institute of Mathematics and Mathematical Modeling
125 Pushkin str., Almaty, 050010
Kazakhstan,
E-mail address [email protected]
Michael Ruzhansky:
Department of Mathematics: Analysis, Logic and Discrete Mathematics
Ghent University, Belgium
and
School of Mathematical Sciences
Queen Mary University of London
United Kingdom
E-mail address [email protected]
Niyaz Tokmagambetov:
Department of Mathematics: Analysis, Logic and Discrete Mathematics
Ghent University, Belgium
and
al–Farabi Kazakh National University
71 al–Farabi ave., Almaty, 050040
Kazakhstan,
E-mail address [email protected]