
Using GPU Technologies to Drastically Accelerate FDTD Simulations

Chad Pendley, Remcom, Inc.

Introduction
Wireless technologies play a significant part in the world we live in and have
quickly developed from obscure and mysterious to openly accepted and often
demanded. Cell phones, once a status symbol, are now common worldwide.
GPS devices, communicating with satellites over 10,000 miles away and traveling
at thousands of miles per hour, are found in innumerable devices, giving users
precise, real-time location. Wi-Fi stations communicate with a host of devices,
providing untold conveniences. Doctors gain precise detail about the inner
workings of patients for diagnosis and treatment. Countless devices operating
simultaneously and in close proximity necessitate precision and intelligence to
ensure correct and safe functionality. However, time to market for high-tech
devices directly affects competitiveness and profitability. Is it possible to be
accurate and still get to market quickly? While the answer may be complex,
many may benefit from using the GPU-accelerated Finite Difference Time Domain
method described in this paper.

Overview of Finite Difference Time Domain Method
The Finite Difference Time Domain (FDTD) method has been utilized over the
last several decades and has become increasingly prevalent in scientific
research and technical industry. The origin of the FDTD method is generally
attributed to Kane Yee who, in a paper published in 1966 [1], described a method
for computing Maxwell’s Equations discretely in a time-stepping manner. This
technique was later expanded and named by Allen Taflove [2]-[4]. The FDTD
method is virtually unique among EM simulation methods because it directly
implements Maxwell's curl equations, which model electromagnetic fields at the
most elementary level. Since the method is fundamentally sound, FDTD is often
used to verify results originating from faster, assumption-based techniques. The
method has been applied to problems ranging from kilohertz frequencies to visible
light. While accurate, the FDTD method has inherent obstacles that have kept it
from being universally used. For example, the entire computation space must be
evaluated at each time step, and the grid cells must be sufficiently small to
accurately model the signal propagation. For large project spaces, FDTD-based
codes may become memory intensive and relatively slow.
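
To make the time-stepping idea concrete, the following is a minimal one-dimensional
Yee-style update loop written in plain C. It is only an illustrative sketch with
normalized units and a hypothetical grid size, not Remcom's implementation; it shows
the leapfrog structure in which the magnetic and electric fields are each updated
over the entire grid at every time step.

/* Minimal 1-D FDTD sketch (illustrative only, normalized units). */
#include <math.h>
#include <stdio.h>

#define NX 400    /* number of spatial cells (hypothetical) */
#define NT 1000   /* number of time steps (hypothetical)    */

int main(void)
{
    static double ez[NX], hy[NX];   /* staggered E and H field arrays */

    for (int n = 0; n < NT; n++) {
        /* Half step: update H everywhere from the spatial difference of E. */
        for (int i = 0; i < NX - 1; i++)
            hy[i] += 0.5 * (ez[i + 1] - ez[i]);

        /* Half step: update E everywhere from the spatial difference of H. */
        for (int i = 1; i < NX; i++)
            ez[i] += 0.5 * (hy[i] - hy[i - 1]);

        /* Soft source: inject a Gaussian pulse at the center cell. */
        ez[NX / 2] += exp(-((n - 40.0) * (n - 40.0)) / 200.0);
    }

    printf("Ez at the center cell after %d steps: %g\n", NT, ez[NX / 2]);
    return 0;
}

Note that every cell is touched at every time step; this per-step cost is exactly the
workload (and the parallelism) that the GPU approach described below exploits.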

Figure 1: A CAD representation of an F/A-22 Raptor rendered in XFdtd version 7.0.3. CAD
images may be imported or created in geometry space and easily converted into an FDTD grid.

Remcom’s XFdtd® is an EM solver based on the FDTD method and has been
used extensively to model structures that require a high level of fidelity. The
FDTD results in this paper were generated with Remcom’s XFdtd version 7.0.3.



Overview of GPU Technology
Graphics Processing Unit (GPU) technology has exploded over the last decade.
Why? Because video gaming connoisseurs have demanded it and been willing to
pay for it. High-end GPUs perform enormous quantities of computations in order
to render high-resolution images and action sequences in a seemingly seamless
manner. GPUs take advantage of parallel processing, or threading, in order to
perform calculations simultaneously. In fact, a GPU may have hundreds of threads
executing calculations at any given time.

Years ago, the concept of general-purpose computing on a GPU (sometimes
referred to as GPGPU) began. Initially, engineers had to "trick" the GPU into
performing the desired computations: calculations had to be masked into a
graphics format even though graphics were not the desired result. While there
were a number of successes in these attempts, the difficulty of development was
prohibitive to many. Those who did succeed were able to see significant speed
improvements that kept development interest high. Towards the end of 2006,
NVIDIA launched its CUDA (an acronym for Compute Unified Device Architecture)
technology, which was designed to make GPU computing truly general-purpose.
Today, GPU technology is used increasingly across many industries as a way of
speeding up time-intensive calculations. Generally, methods that involve
inherently parallel computations, such as FDTD, exhibit a significant amount of
speedup on the GPU.

Figure 2: GPU performance has grown at a much faster rate than the modern CPU.
Data provided by NVIDIA.
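
The reason FDTD benefits so strongly from this architecture is that each cell's update
depends only on neighboring field values from the previous half step. The sketch below
is a hedged CUDA illustration (not Remcom's actual kernel) of the same one-dimensional
E-field update shown earlier, with one GPU thread assigned to each cell so that many
thousands of cells are updated concurrently.

// Illustrative CUDA sketch: one thread per cell (not XFdtd's kernel).
#include <cuda_runtime.h>
#include <cstdio>

__global__ void update_ez_1d(double *ez, const double *hy, int nx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < nx)                      // skip the fixed boundary cell
        ez[i] += 0.5 * (hy[i] - hy[i - 1]);   // same update as the serial loop
}

int main()
{
    const int nx = 1 << 20;                   // one million cells (hypothetical)
    double *d_ez, *d_hy;
    cudaMalloc(&d_ez, nx * sizeof(double));
    cudaMalloc(&d_hy, nx * sizeof(double));
    cudaMemset(d_ez, 0, nx * sizeof(double));
    cudaMemset(d_hy, 0, nx * sizeof(double));

    // One launch per time step: 256 threads per block, ~4096 blocks in flight.
    update_ez_1d<<<(nx + 255) / 256, 256>>>(d_ez, d_hy, nx);
    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_ez);
    cudaFree(d_hy);
    return 0;
}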



Quantifying Speedups using FDTD on the GPU
Several variables determine speedup using GPU accelerated simulations. One
factor is the specific hardware used in simulation. Figure 2 describes GPU
performance over the last seven years; there’s a significant performance
increase with each device release. One would expect to see approximately 2x
performance benefit when comparing the Tesla T10 with the Tesla G80.

To the left is NVIDIA’s Tesla C1060 computing board, their latest
computation-specific GPU, with a 4 GB memory capacity.

To the right is the Tesla S1070 computing system, with four times the capacity
of a single C1060.

Another consideration for comparing CPU to GPU timing is identifying which
calculations are performed on the GPU(s). XFdtd, like many scientific tools, saves
data at designated intervals depending on the types of results requested by the
user. When saving data, field results are pulled from the GPU and saved by the
CPU rather than the graphics card(s). Typical saves may not radically alter the
overall simulation time, but they can when significant amounts of data are
requested, such as large-volume SAR calculations or multiple steady-state
frequency extractions. Bus speeds and system RAM also contribute to the overall
performance of GPU simulations; Remcom generally recommends twice as much
system RAM as is available on the GPUs. To most accurately compare
technology performance, data saves should be minimized. However, when
justifying a change in technology, data saves should be considered
to the extent that they are used in real application simulations.
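
To see why data saves matter, consider that a save forces field data to cross the bus
from GPU memory back to host memory before the CPU can write it out. The following is a
hedged sketch of that pattern; the function, array names, and save interval are
hypothetical and are not XFdtd's API.

// Illustrative sketch of saving field data at intervals (names are hypothetical).
#include <cuda_runtime.h>
#include <cstddef>

void run_timesteps(double *d_ez, double *h_ez, size_t ncells,
                   int nsteps, int save_interval)
{
    for (int n = 0; n < nsteps; n++) {
        // ... launch the E- and H-field update kernels for step n ...

        if (save_interval > 0 && n % save_interval == 0) {
            // Device-to-host copy: this transfer is what grows expensive when
            // large volumes of data (e.g., SAR results) are requested often.
            cudaMemcpy(h_ez, d_ez, ncells * sizeof(double),
                       cudaMemcpyDeviceToHost);
            // ... the CPU post-processes and writes h_ez to disk here ...
        }
    }
}

Keeping the fields resident on the GPU and copying them back only when a save is due
is what allows the timing comparisons below to approach the raw kernel speedup.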

Now, we consider the actual timing comparisons for some examples. Together, these
examples begin to showcase the benefits of coupling GPU technology with the
accuracy of FDTD simulations.



8x8 Array of Patch Antennas
For the first example, we consider a patch antenna array built into an 8x8
configuration. This specific array is detailed more fully by S. Bellofiore, et al. [5].
The overall memory requirement to simulate this project was about 233 MB.
Steady-state far-field data was requested during simulation time and a single
frequency source was used.

The benchmark for this simulation was an HP xw9400 with dual-quad Opteron 2216
processors running 64-bit Red Hat Linux. By contrasting those simulation times with
GPU-accelerated runs using NVIDIA’s Tesla C870, Quadro FX 5600, and Tesla C1060,
the resulting speedups were on the order of 14x, 18x, and 47x, respectively. At a
47x speedup, a simulation that would typically take an hour to complete would finish
in just over one minute and 16 seconds.

Figure 3: Amplified portion of the 8x8 patch antenna array with far-field pattern
representation.

Rotman Lens
The Rotman Lens can be costly to simulate since the device is electrically large
and contains a relatively complex geometry along one plane. The lens shown in
Figure 4 is resolved in a geometry that requires about 1 GB of RAM. A broadband
source was used and only S-parameters were requested during simulation time.

GPU simulations were run using one and two NVIDIA Tesla C1060 cards, producing
performance increases of about 49x and 75x, respectively. Due to its nature, the
Rotman Lens may run for multiple days to resolve a single device. With a 75x
speedup, a simulation requiring three days to complete would finish in just under
one hour.

Figure 4: Rotman lens as generated by Remcom’s RLD software and imported into XFdtd.



Vivaldi Quad Flared Horn Antenna Array
The array for this next analysis came from a 1994 paper written by E. Thiele and
A. Taflove [6]. The paper goes through a number of examples using a Vivaldi flared
horn, but for this example, only the final quad horn antenna array is simulated.
The project space for this example is a grid region of 873 x 559 x 174 cells and
requires about 2.5 GB of RAM. A broadband source was used and a steady-state
far-field pattern was requested at 10 GHz.

When running this on the Tesla cards, performance speedups of 43x and 54x
were achieved for one and two cards, respectively. This was a significantly large
project and a noticeable amount of data was requested, but we were still able to
realize more than a 50x speedup. This could be the difference between two
weeks and six and one quarter hours.

Figure 5: Array of Vivaldi Quad Flared Horn Antennas with 3D antenna pattern displayed.

Cell Phone
The simulation of a cell phone represents an interesting challenge for EM
simulation tools. The modeling of internal conductors and dielectric components
for most handheld devices requires a high degree of fidelity. A typical simulation
may require the calculation of SAR information, which carries a significant
amount of data transfers.

For this case, a project was used which carried a memory footprint of about
750 MB of RAM. By contrasting simulation times with those achieved using the
NVIDIA Quadro FX 5600, speedups of 29x and 49x were realized with one and two
GPUs, respectively. When changing to the NVIDIA Tesla C1060, the speedup values
increased to 54x and 88x. To put that in perspective, a simulation that requires
24 hours to run on a single CPU would be reduced to only 16 minutes and 22
seconds at an 88x speedup.

Figure 6: Image of cell phone with SAM head.



Summary
The marriage of the FDTD method and GPU technology provides a strong
combination of accuracy and speed. The overall time saved using this
combination should benefit users of FDTD. By taking advantage of GPU
speeds, weeks, if not months, could be saved in getting research or a product
to market.

GPU Speedup over CPU

                      Quadro FX 5600   Tesla C870   Tesla C1060   2x Tesla C1060
8x8 Patch Array            17.81          13.74        46.79           45.83
Rotman Lens                  -              -          48.94           74.45
Vivaldi Quad Array           -              -          43.11           54.19
Cell Phone                 29.19          13.17        54.26           87.99



References
[1] K. Yee, “Numerical Solution of Initial Boundary Value Problems Involving
Maxwell’s Equations in Isotropic Media,” IEEE Trans. Antennas Prop., AP-14,
1966, pp. 302-307.

[2] A. Taflove and M. E. Brodwin, “Numerical Solution of Steady-State
Electromagnetic Scattering Problems Using the Time-Dependent Maxwell’s
Equations,” IEEE Trans. Microwave Theory Tech., 1975, pp. 623-630.

[3] A. Taflove and M. E. Brodwin, “Computation of the Electromagnetic Fields
and Induced Temperatures within a Model of the Microwave-Irradiated Human
Eye,” IEEE Trans. Microwave Theory Tech., 1975, pp. 888-896.

[4] A. Taflove, “Application of the Finite-Difference Time-Domain Method to
Sinusoidal Steady-State Electromagnetic Penetration Problems,” IEEE Trans.
Electromagnetic Compatibility, 1980, pp. 191-202.

[5] S. Bellofiore, J. Foutz, R. Govindarajula, I. Bahçeci, C. Balanis, A. Spanias,
J. Capone, and T. Duman, “Smart Antenna System Analysis, Integration and
Performance for Mobile Ad-Hoc Networks (MANETs),” IEEE Trans. Antennas
Prop., AP-50, 2002, pp. 571-581.

[6] E. Thiele and A. Taflove, “FD-TD Analysis of Vivaldi Flared Horn Antennas
and Arrays,” IEEE Trans. Antennas Prop., AP-42, 1994, pp. 633-641.

© 2009 Remcom Inc.
