
Using GPU Technologies to Drastically Accelerate FDTD Simulations

Chad Pendley, Remcom, Inc.

Introduction
Wireless technologies play a significant part in the world we live in and have
quickly developed from obscure and mysterious to openly accepted and often
demanded. Cell phones, once a status symbol, are now common worldwide.
GPS devices, communicating with satellites over 10,000 miles away and traveling
at thousands of miles per hour, are found in innumerable devices, giving users
precise, real-time location. Wi-Fi stations communicate with a host of devices,
providing untold conveniences. Doctors gain precise detail about the inner
workings of patients for diagnosis and treatment. Countless devices operating
simultaneously and in close proximity necessitate precision and intelligence to
ensure correct and safe functionality. However, time to market for high-tech
devices directly affects competitiveness and profitability. Is it possible to be
accurate and still get to market quickly? While the answer may be complex,
many may benefit from using the GPU-accelerated Finite Difference Time Domain
method described in this paper.

Overview of Finite Difference Time Domain Method
The Finite Difference Time Domain (FDTD) method has been utilized over the
last several decades and has become increasingly prevalent in scientific
research and technical industry. The origin of the FDTD method is generally
attributed to Kane Yee who, in a paper published in 1966 [1], described a method
for computing Maxwell’s Equations discretely in a time-stepping manner. This
technique was later expanded and named by Allen Taflove [2]-[4]. The FDTD
method is virtually unique among EM simulation methods because it directly
implements Maxwell's curl equations, which model electromagnetic fields at the
most elementary level. Since the method is fundamentally sound, FDTD is often
used to verify results originating from faster, assumption-based techniques. The
method has been applied to problems ranging from kilohertz frequencies to visible
light. While accurate, the FDTD method has inherent obstacles that have kept it
from being universally used. For example, the entire computation space must be
evaluated at each time step, and the grid cells must be sufficiently small to
accurately model the signal propagation. For large project spaces, FDTD-based
codes may become memory intensive and relatively slow.
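
To make the time-stepping idea concrete, the following is a minimal one-dimensional
Yee-style update loop written in plain C. It is only an illustrative sketch with
normalized units and a hypothetical grid size, not Remcom's implementation; it shows
the leapfrog structure in which the magnetic and electric fields are each updated
over the entire grid at every time step.

/* Minimal 1-D FDTD sketch (illustrative only, normalized units). */
#include <math.h>
#include <stdio.h>

#define NX 400    /* number of spatial cells (hypothetical) */
#define NT 1000   /* number of time steps (hypothetical)    */

int main(void)
{
    static double ez[NX], hy[NX];   /* staggered E and H field arrays */

    for (int n = 0; n < NT; n++) {
        /* Half step: update H everywhere from the spatial difference of E. */
        for (int i = 0; i < NX - 1; i++)
            hy[i] += 0.5 * (ez[i + 1] - ez[i]);

        /* Half step: update E everywhere from the spatial difference of H. */
        for (int i = 1; i < NX; i++)
            ez[i] += 0.5 * (hy[i] - hy[i - 1]);

        /* Soft source: inject a Gaussian pulse at the center cell. */
        ez[NX / 2] += exp(-((n - 40.0) * (n - 40.0)) / 200.0);
    }

    printf("Ez at the center cell after %d steps: %g\n", NT, ez[NX / 2]);
    return 0;
}

Note that every cell is touched at every time step; this per-step cost is exactly the
workload (and the parallelism) that the GPU approach described below exploits.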

Figure 1: A CAD representation of an F/A-22 Raptor rendered in XFdtd version 7.0.3. CAD
images may be imported or created in geometry space and easily converted into an FDTD grid.

Remcom’s XFdtd® is an EM solver based on the FDTD method and has been
used extensively to model structures that require a high level of fidelity. The
FDTD results in this paper were generated with Remcom’s XFdtd version 7.0.3.



Overview of GPU Technology
Graphics Processing Unit (GPU) technology has exploded over the last decade.
Why? Because video gaming connoisseurs have demanded it and been willing to
pay for it. High-end GPUs perform enormous quantities of computations in order
to render high-resolution images and action sequences in a seemingly seamless
manner. GPUs take advantage of parallel processing, or threading, in order to
perform calculations simultaneously. In fact, a GPU may have hundreds of threads
executing calculations at any given time.

Years ago, the concept of general-purpose computing on a GPU (sometimes
referred to as GPGPU) began. Initially, engineers had to "trick" the GPU into
performing the desired computations: calculations had to be masked into a
graphics format even though graphics were not the desired result. While there
were a number of successes in these attempts, the difficulty of development was
prohibitive to many. Those who did succeed were able to see significant speed
improvements that kept development interest high. Towards the end of 2006,
NVIDIA launched its CUDA (an acronym for Compute Unified Device Architecture)
technology, which was designed to make GPU computing truly general-purpose.
Today, GPU technology is used increasingly across many industries as a way of
speeding up time-intensive calculations. Generally, methods that involve
inherently parallel computations, such as FDTD, exhibit a significant amount of
speedup on the GPU.

Figure 2: GPU performance has grown at a much faster rate than the modern CPU.
Data provided by NVIDIA.
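
The reason FDTD benefits so strongly from this architecture is that each cell's update
depends only on neighboring field values from the previous half step. The sketch below
is a hedged CUDA illustration (not Remcom's actual kernel) of the same one-dimensional
E-field update shown earlier, with one GPU thread assigned to each cell so that many
thousands of cells are updated concurrently.

// Illustrative CUDA sketch: one thread per cell (not XFdtd's kernel).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void update_ez_1d(double *ez, const double *hy, int nx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < nx)                      // skip the fixed boundary cell
        ez[i] += 0.5 * (hy[i] - hy[i - 1]);   // same update as the serial loop
}

int main()
{
    const int nx = 1 << 20;                   // one million cells (hypothetical)
    double *d_ez, *d_hy;
    cudaMalloc(&d_ez, nx * sizeof(double));
    cudaMalloc(&d_hy, nx * sizeof(double));
    cudaMemset(d_ez, 0, nx * sizeof(double));
    cudaMemset(d_hy, 0, nx * sizeof(double));

    // One launch per time step: 256 threads per block, ~4096 blocks in flight.
    update_ez_1d<<<(nx + 255) / 256, 256>>>(d_ez, d_hy, nx);
    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_ez);
    cudaFree(d_hy);
    return 0;
}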



Quantifying Speedups using FDTD on the GPU
Several variables determine speedup using GPU accelerated simulations. One
factor is the specific hardware used in simulation. Figure 2 describes GPU
performance over the last seven years; there’s a significant performance
increase with each device release. One would expect to see approximately 2x
performance benefit when comparing the Tesla T10 with the Tesla G80.

To the left is NVIDIA’s Tesla C1060 computing board, their latest
computation-specific GPU, with a 4 GB memory capacity.

To the right is the Tesla S1070 computing system, with four times the capacity
of a single C1060.

Another consideration for comparing CPU to GPU timing is identifying which
calculations are performed on the GPU(s). XFdtd, like many scientific tools, saves
data at designated intervals depending on the types of results requested by the
user. When saving data, field results are pulled from the GPU and saved by the
CPU rather than the graphics card(s). Typical saves may not radically alter the
overall simulation time, but they can when significant amounts of data are
requested, such as large-volume SAR calculations or multiple steady-state
frequency extractions. Bus speeds and system RAM also contribute to the overall
performance of GPU simulations; Remcom generally recommends twice as much
system RAM as is available on the GPUs. To most accurately compare
technology performance, data saves should be minimized. However, when
justifying a change in technology, data saves should be considered
to the extent that they are used in real application simulations.
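
To see why data saves matter, consider that a save forces field data to cross the bus
from GPU memory back to host memory before the CPU can write it out. The following is a
hedged sketch of that pattern; the function, array names, and save interval are
hypothetical and are not XFdtd's API.

// Illustrative sketch of saving field data at intervals (names are hypothetical).
#include <cuda_runtime.h>
#include <cstddef>

void run_timesteps(double *d_ez, double *h_ez, size_t ncells,
                   int nsteps, int save_interval)
{
    for (int n = 0; n < nsteps; n++) {
        // ... launch the E- and H-field update kernels for step n ...

        if (save_interval > 0 && n % save_interval == 0) {
            // Device-to-host copy: this transfer is what grows expensive when
            // large volumes of data (e.g., SAR results) are requested often.
            cudaMemcpy(h_ez, d_ez, ncells * sizeof(double),
                       cudaMemcpyDeviceToHost);
            // ... the CPU post-processes and writes h_ez to disk here ...
        }
    }
}

Keeping the fields resident on the GPU and copying them back only when a save is due
is what allows the timing comparisons below to approach the raw kernel speedup.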

Now, we consider the actual timing comparisons for some examples. Together, these
examples begin to showcase the benefits of coupling GPU technology with the
accuracy of FDTD simulations.



8x8 Array of Patch Antennas
For the first example, we consider a patch antenna array built into an 8x8
configuration. This specific array is detailed more fully by S. Bellofiore, et al. [5].
The overall memory requirement to simulate this project was about 233 MB.
Steady-state far-field data was requested during simulation time and a single
frequency source was used.

The benchmark for this simulation was an HP xw9400 with dual-quad Opteron 2216
processors running 64-bit Red Hat Linux. By contrasting those simulation times with
GPU-accelerated runs using NVIDIA’s Tesla C870, Quadro FX 5600, and Tesla C1060,
the resulting speedups were on the order of 14x, 18x, and 47x, respectively. At a
47x speedup, a simulation that would typically take an hour to complete would finish
in just over one minute and 16 seconds.

Figure 3: Amplified portion of the 8x8 patch antenna array with far-field pattern
representation.

Rotman Lens
The Rotman Lens can be costly to simulate since the device is electrically large
and contains a relatively complex geometry along one plane. The lens shown in
Figure 4 is resolved in a geometry that requires about 1 GB of RAM. A broadband
source was used and only S-parameters were requested during simulation time.

GPU simulations were run using one and two NVIDIA Tesla C1060 cards, producing
performance increases of about 49x and 75x, respectively. Due to its nature, the
Rotman Lens may run for multiple days to resolve a single device. With a 75x
speedup, a simulation requiring three days to complete would finish in just under
one hour.

Figure 4: Rotman lens as generated by Remcom’s RLD software and imported into XFdtd.



Vivaldi Quad Flared Horn Antenna Array
The array for this next analysis came from a 1994 paper written by E. Thiele and
A. Taflove [6]. The paper goes through a number of examples using a Vivaldi flared
horn, but for this example, only the final quad horn antenna array is simulated.
The project space for this example is a grid region of 873 x 559 x 174 cells and
requires about 2.5 GB of RAM. A broadband source was used and a steady-state
far-field pattern was requested at 10 GHz.

When running this on the Tesla cards, performance speedups of 43x and 54x
were achieved for one and two cards, respectively. This was a significantly large
project and a noticeable amount of data was requested, but we were still able to
realize more than a 50x speedup. This could be the difference between two
weeks and six and one quarter hours.

Figure 5: Array of Vivaldi Quad Flared Horn Antennas with 3D antenna pattern displayed.

Cell Phone
The simulation of a cell phone represents an interesting challenge for EM
simulation tools. The modeling of internal conductors and dielectric components
for most handheld devices requires a high degree of fidelity. A typical simulation
may require the calculation of SAR information, which carries a significant
amount of data transfers.

For this case, a project was used which carried a memory footprint of about
750 MB of RAM. By contrasting simulation times with those achieved using the
NVIDIA Quadro FX 5600, speedups of 29x and 49x were realized with one and two
GPUs, respectively. When changing to the NVIDIA Tesla C1060, the speedup values
increased to 54x and 88x. To put that in perspective, a simulation that requires
24 hours to run on a single CPU would be reduced to only 16 minutes and 22
seconds at an 88x speedup.

Figure 6: Image of cell phone with SAM head.



Summary
The marriage of the FDTD method and GPU technology provides a strong
combination of accuracy and speed. The overall time saved using this
combination should benefit users of FDTD. By taking advantage of GPU
speeds, weeks, if not months, could be saved in getting research or a product
to market.

GPU Speedup over CPU

                      Quadro FX 5600   Tesla C870   Tesla C1060   2x Tesla C1060
8x8 Patch Array            17.81          13.74        46.79           45.83
Rotman Lens                  -              -          48.94           74.45
Vivaldi Quad Array           -              -          43.11           54.19
Cell Phone                 29.19          13.17        54.26           87.99



References
[1] K. Yee, “Numerical Solution of Initial Boundary Value Problems Involving
Maxwell’s Equations in Isotropic Media,” IEEE Trans. Antennas Prop., AP-14,
1966, pp. 302-307.

[2] A. Taflove and M. E. Brodwin, “Numerical Solution of Steady-State
Electromagnetic Scattering Problems Using the Time-Dependent Maxwell’s
Equations,” IEEE Trans. Microwave Theory Tech., 1975, pp. 623-630.

[3] A. Taflove and M. E. Brodwin, “Computation of the Electromagnetic Fields
and Induced Temperatures within a Model of the Microwave-Irradiated Human
Eye,” IEEE Trans. Microwave Theory Tech., 1975, pp. 888-896.

[4] A. Taflove, “Application of the Finite-Difference Time-Domain Method to
Sinusoidal Steady-State Electromagnetic Penetration Problems,” IEEE Trans.
Electromagnetic Compatibility, 1980, pp. 191-202.

[5] S. Bellofiore, J. Foutz, R. Govindarajula, I. Bahçeci, C. Balanis, A. Spanias,
J. Capone, and T. Duman, “Smart Antenna System Analysis, Integration and
Performance for Mobile Ad-Hoc Networks (MANETs),” IEEE Trans. Antennas
Prop., AP-50, 2002, pp. 571-581.

[6] E. Thiele and A. Taflove, “FD-TD Analysis of Vivaldi Flared Horn Antennas
and Arrays,” IEEE Trans. Antennas Prop., AP-42, 1994, pp. 633-641.

© 2009 Remcom Inc.
