Advanced Computational Electromagnetic Methods and Applications



For a complete listing of titles in the
Artech House Antennas and Electromagnetics Analysis Library
turn to the back of this book.



Advanced Computational Electromagnetic
Methods and Applications

Wenhua Yu
Wenxing Li
Atef Elsherbeni
Yahya Rahmat-Samii

Editors



Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the U.S. Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Cover design by John Gomes

ISBN 13: 978-1-60807-896-7

© 2015 ARTECH HOUSE
685 Canton Street
Norwood, MA 02062

All rights reserved. Printed and bound in the United States of America. No part of this book
may be reproduced or utilized in any form or by any means, electronic or mechanical, including
photocopying, recording, or by any information storage and retrieval system, without permission
in writing from the publisher.
All terms mentioned in this book that are known to be trademarks or service marks have been
appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of
a term in this book should not be regarded as affecting the validity of any trademark or service
mark.

10 9 8 7 6 5 4 3 2 1



Contents
Preface ....................................................................................... xv

1. Novelties of Spectral Domain Analysis in Antenna
Characterizations: Concept, Formulation,
and Applications ......................................................................... 1
Joshua M. Kovitz and Yahya Rahmat-Samii

1.1 INTRODUCTION .................................................................................. 1
1.2 ANTENNA RADIATION ANALYSIS IN THE SPECTRAL
DOMAIN ................................................................................................ 5
1.2.1 From Maxwell’s Equations to the Plane Wave Spectrum ......... 6
1.2.2 The Plane Wave Spectrum and the Fourier Transform ........... 10
1.2.3 Radiated Far Fields as a Spectrum of Plane Waves ................ 12
1.3 OBTAINING THE PLANE WAVE SPECTRUM FROM FAR-
FIELD PATTERNS AND RADIATED POWER ................................ 22
1.3.1 Finding the True Far-Field Magnitudes................................... 22
1.3.2 Plane Wave Spectrum Retrieval from Far-Field Patterns ........ 26
1.4 PLANE WAVE SPECTRUM COMPUTATION VIA FAST
FOURIER TRANSFORM .................................................................... 27
1.4.1 Discretizing the Plane Wave Spectrum and the Electric
Field Distribution .................................................................... 28
1.4.2 Proper Normalization of the Fast Fourier Transform .............. 30
1.4.3 The Sampling Theorem and Spectral Analysis ....................... 34
1.4.4 Far-Field Sampling Rates ........................................................ 37
1.4.5 Interpolating the Far Fields ..................................................... 40
1.4.6 Subtle Issues When Implementing the FFT and iFFT Using
Pre-Built Packages and Libraries ............................................ 44
1.5 COORDINATE TRANSFORMATIONS FOR GENERALIZED
SIMULATION AND MEASUREMENT SYSTEMS .......................... 45
1.6 THEORETICAL VALIDATION OF NEAR-FIELD PREDICTION .. 52
1.6.1 Rectangular Aperture Distribution .......................................... 53
1.6.2 Circular Aperture Distribution ................................................ 57
1.6.3 Axial Field Prediction of the Uniform Circular Aperture........ 60
1.7 SOME PRACTICAL EXAMPLES ...................................................... 64
1.7.1 A Symmetric Reflector Antenna ............................................. 64
1.7.2 A Symmetric Reflector Antenna with an Elliptical
Projected Aperture................................................................... 70
1.7.3 Near-Field Prediction with Only Two Pattern Cuts ................ 75
REFERENCES .............................................................................................. 80

2. High-Order FDTD Methods .................................................... 83
Mohammed F. Hadi and Atef Z. Elsherbeni

2.1 FOURTH ORDER DIFFERENCES IN FDTD
DISCRETE SPACE .............................................................................. 84
2.2 SEAMLESS HYBRID S24/FDTD SIMULATIONS ........................... 90
2.3 ABSORBING BOUNDARY CONDITIONS ...................................... 94
2.4 POINT CURRENT AND FIELD SOURCES ...................................... 99
2.5 PLANE WAVE SOURCES ................................................................ 101
2.6 PEC MODELING ............................................................................... 104
2.6.1 Planar PEC Boundaries ........................................................ 104
2.6.2 Noncritical Curved PEC Models .......................................... 104
2.6.3 Critical Curved PEC Models ................................................ 104
2.7 ADVANCED FORMS OF HIGH-ORDER FDTD
ALGORITHMS .................................................................................. 106
2.7.1 The Finite Volumes-Based FV24 Algorithm ........................ 106
2.7.2 High-Order Algorithms for Compact-FDTD Grids ............... 109
REFERENCES ............................................................................................ 112

3. GPU Acceleration of FDTD Method for Simulation of
Microwave Circuits ................................................................ 115
Veysel Demir

3.1 INTRODUCTION .............................................................................. 115
3.2 FDTD CODE FOR MICROWAVE CIRCUIT SIMULATION ......... 116
3.2.1 Features of the FDTD Code .................................................. 116
3.2.2 Input Parameters File............................................................. 118
3.2.3 Main Program Layout ........................................................... 119
3.2.4 Field Updates......................................................................... 121
3.2.5 Outputs of the Program ......................................................... 124
3.3 FDTD CODE USING CUDA ............................................................. 127
3.3.1 Performance Optimization .................................................... 127
3.3.2 Memory Accesses ................................................................. 128
3.3.3 Preparation of the GPU Device ............................................. 129
3.3.4 Thread to Cell Mapping ........................................................ 133
3.3.5 The Time-Marching Loop ..................................................... 135
3.3.6 Field Updates......................................................................... 136
3.3.7 Source Updates and Output Calculations .............................. 139
3.4 NUMERICAL RESULTS .................................................................. 142
REFERENCES ............................................................................................ 143

4. Recent FDTD Advances for Electromagnetic Wave
Propagation in the Ionosphere .............................................. 147
Alireza Samimi, Bach T. Nguyen, and Jamesina J. Simpson

4.1 INTRODUCTION .............................................................................. 147
4.2 CURRENT STATE OF THE ART..................................................... 149
4.3 FDTD EARTH-IONOSPHERE MODEL OVERVIEW .................... 151
4.3.1 FDTD Space Lattice .............................................................. 151
4.3.2 Example Updating Algorithm for TM Grid Cells ................. 153
4.4 NEW MAGNETIZED IONOSPHERIC PLASMA
ALGORITHM .................................................................................... 155
4.4.1 Collisional Plasma Algorithm ............................................... 156
4.4.2 Two Example Validations ..................................................... 158
4.4.3 Summary of Performance ...................................................... 167
4.5 STOCHASTIC FDTD (S-FDTD) ....................................................... 167
4.5.1 Overview ............................................................................... 167
4.5.2 Mean Field Equations ............................................................ 169
4.5.3 Variance Field Equations ...................................................... 170
4.6 INPUT TO FDTD/S-FDTD EARTH-PLASMA IONOSPHERE
MODELS ............................................................................................ 171
4.7 CONCLUSIONS ................................................................................ 172
REFERENCES ............................................................................................ 172

5. Phi Coprocessor Acceleration Techniques in
Computational Electromagnetic Methods............................ 175
Wenhua Yu, Xiaoling Yang, and Lei Zhao

5.1 INTRODUCTION .............................................................................. 176
5.2 ENVIRONMENT REQUIREMENTS AND SETTINGS .................. 178
5.2.1 Hardware Configuration ........................................................ 178
5.2.2 Software Configuration ......................................................... 180
5.2.3 Compilation Environment ..................................................... 188
5.2.4 Example Code for CPU and Xeon Phi Coprocessor ............. 190
5.3 CODE DEVELOPMENT ................................................................... 199
5.3.1 Performance Optimization ................................................... 199
5.3.2 Memory Alignment .............................................................. 204
5.3.3 Parallel FDTD Implementation ............................................ 204
5.3.4 Job Scheduling Strategy ....................................................... 208
5.3.5 FDTD Code Development .................................................... 211
5.3.6 Matrix Multiplication ........................................................... 215
5.4 NUMERICAL RESULTS .................................................................. 219
REFERENCES ............................................................................................ 225

6. Domain Decomposition Methods for Finite Element
Analysis of Large-Scale Electromagnetic Problems ............ 227
Ming-Feng Xue and Jian-Ming Jin

6.1 FETI METHODS WITH ONE AND TWO LAGRANGE
MULTIPLIERS .................................................................................. 229
6.1.1 FETI Method with One Lagrange Multiplier ........................ 229
6.1.2 FETI Method with Two Lagrange Multipliers ...................... 232
6.1.3 Symbolic Formulation ........................................................... 234
6.2 FETI-DP METHODS WITH ONE AND TWO LAGRANGE
MULTIPLIERS .................................................................................. 235
6.2.1 FETI-DP Method with One Lagrange Multiplier .................. 236
6.2.2 FETI-DP Method with Two Lagrange Multipliers ................ 239
6.2.3 Comparison Between FETI-DP Methods with One
and Two Lagrange Multipliers .............................................. 242
6.3 LM-BASED NONCONFORMAL FETI-DP METHOD .................... 243
6.3.1 Nonconformal Interface and Conformal Corner Meshes ...... 243
6.3.2 Extension to Nonconformal Interface and Corner Meshes .... 245
6.4 CE-BASED NONCONFORMAL FETI-DP METHOD..................... 247
6.4.1 Nonconformal Interface and Conformal Corner Meshes ...... 247
6.4.2 Extension to Nonconformal Interface and Corner Meshes .... 251
6.4.3 Comparison Between the LM- and CE-Based FETI-DP
Methods ................................................................................. 251
6.5 FETI-DP METHOD ENHANCED BY THE SECOND-ORDER
TRANSMISSION CONDITION ........................................................ 252
6.6 HYBRID NONCONFORMAL FETI/CONFORMAL FETI-DP
METHOD ........................................................................................... 254
6.7 NUMERICAL EXAMPLES ............................................................... 256
6.7.1 Wave Propagation in Free Space ........................................... 257
6.7.2 Wave Propagation in PML Medium...................................... 259
6.7.3 Vivaldi Antenna Array .......................................................... 263
6.7.4 Vivaldi Antenna Array with a Large Scan Angle .................. 266
6.7.5 NRL Vivaldi Antenna Array with Radome ........................... 269
6.7.6 Medium-Scale Two-Dimensional Microring Resonator ....... 271
6.7.7 Full-Scale Three-Dimensional Double-Microring
Resonator ............................................................................... 275
6.8 SUMMARY ........................................................................................ 278
REFERENCES ............................................................................................ 279

7. High-Accuracy Computations for Electromagnetic
Integral Equations .................................................................. 283
Andrew F. Peterson and Malcolm M. Bibby

7.1 NORMALIZED RESIDUAL ERROR ............................................... 284
7.2 HIGH-ORDER TREATMENT OF SMOOTH TARGETS ................ 285
7.3 THE DIPOLE ANTENNA ................................................................. 287
7.4 HIGH-ORDER TREATMENT OF WEDGE SINGULARITIES ...... 289
7.5 HIGH-ORDER TREATMENT OF JUNCTIONS .............................. 292
7.6 ALTERNATIVE ERROR ESTIMATORS ........................................ 292
7.7 PROSPECTS FOR CONTROLLED ACCURACY
COMPUTATIONS IN THREE-DIMENSIONAL PROBLEMS ....... 293
7.8 SUMMARY ........................................................................................ 295
REFERENCES ............................................................................................ 295

8. Fast Electromagnetic Solver Based on Randomized
Pseudo-Skeleton Approximation ........................................... 299
Xianyang Zhu

8.1 INTRODUCTION .............................................................................. 299
8.2 LOW RANK PROPERTY OF SUBMATRICES OF
PARTITIONED IMPEDANCE MATRIX ......................................... 301
8.3 PARTITIONING OF THE COMPUTATIONAL DOMAIN ............. 304
8.4 LOW RANK MATRIX DECOMPOSITION ..................................... 307
8.4.1 Singular Value Decomposition ............................................. 307
8.4.2 Randomized Projection Approach ......................................... 309
8.4.3 Adaptive Cross Approximation (ACA) ................................. 310
8.4.4 Randomized Pseudo-Skeleton Approximation ...................... 312
8.5 LOW RANK DECOMPOSITION OF MULTIPLE RIGHT SIDES .. 316
8.6 DIRECT SOLVER BASED ON BLOCK LU DECOMPOSITION... 317
8.7 PARALLELIZATION VIA OPENMP AND BLAS LIBRARY ....... 319
8.8 NUMERICAL EXAMPLES ............................................................... 320
8.8.1 Selection of the Sample Numbers ......................................... 320
8.8.2 Accuracy of the Randomized Pseudo-Skeleton
Approximation ...................................................................... 321
8.8.3 Comparison with ACA .......................................................... 322
8.8.4 RCS of a PEC Sphere ............................................................ 323
8.8.5 Multiple Monostatic Scattering Analysis of an Airplane
Model .................................................................................... 324
8.8.6 Speed-Up of the Parallel Implementation ............................. 326
8.9 SUMMARY ........................................................................................ 327
REFERENCES ............................................................................................ 328

9. Computational Electromagnetics for the Evaluation of
EMC Issues in Multicomponent Energy Systems ................ 331
Osama A. Mohammed and Mohammadreza R. Barzegaran

9.1 INTRODUCTION .............................................................................. 331
9.2 PHYSICS-BASED MODELING FOR THE ANALYSIS OF THE
MACHINE DRIVE............................................................................. 333
9.2.1 Multiscale Problems .............................................................. 333
9.2.2 Numerical Virtual Prototyping .............................................. 335
9.3 EQUIVALENT SOURCE MODELING ............................................ 338
9.3.1 Induction Motor ................................................................ 340
9.3.2 DC Motor .............................................................................. 356
9.3.3 Synchronous Generator ......................................................... 364
9.3.4 Cable Sets .............................................................................. 367
9.3.5 Coupling of Machines ........................................................... 375
9.3.6 Whole System Setup ............................................................. 377
9.3.7 Generalization of the Equivalent Source Model .................... 381
9.4 POWER CONVERTERS ................................................................... 390
9.4.1 Modeling Approach............................................................... 390
9.4.2 Simulation and Experiment ................................................... 393
9.4.3 Applications of the Frequency Response Analysis of the
Stray Field ............................................................................. 399
9.5 HIGH-FREQUENCY EQUIVALENT SOURCE MODELING ........ 401
9.6 OPTIMIZATION OF POWER ELECTRONIC CONVERTERS
USING PHYSICS-BASED MODELS ............................................... 405
9.7 SUMMARY ........................................................................................ 407
REFERENCES ............................................................................................ 408

10. Manipulation of Electromagnetic Waves Based on New
Unique Metamaterials: Theory and Applications ............... 411
Qun Wu, Jiahui Fu, Fanyi Meng, Kuang Zhang, and Guohui Yang

10.1 INTRODUCTION .............................................................................. 411
10.2 THEORY OF TRANSFORM OPTICS AND APPLICATIONS ....... 412
10.2.1 Theory of Transform Optics .................................................. 412
10.2.2 Invisibility Cloak Based on Transform Optics ...................... 414
10.2.3 Electromagnetic Concentrator Based on the Transform
Optics .................................................................................... 417
10.2.4 Reflectionless Waveguide Connector Based on Transform
Optics .................................................................................... 420
10.2.5 Multibeam Antenna Based on Transform Optics .................. 423
10.3 A DETACHED ZERO INDEX METAMATERIAL LENS FOR
ANTENNA GAIN ENHANCEMENT ............................................... 427
10.3.1 Design and Analysis of Detached ZIML ............................... 429
10.3.2 Fabrication, Simulation, and Test of ZIML ........................... 431
10.4 AUTOMATIC DESIGN OF BROADBAND GRADIENT INDEX
METAMATERIAL LENS FOR GAIN ENHANCEMENT OF
CIRCULARLY POLARIZED ANTENNAS ..................................... 435
10.4.1 Automatic Design Method of GRIN Metamaterial Lens ...... 436
10.4.2 Numerical Simulations .......................................................... 441
10.4.3 Fabrication and Measurement ............................................... 445
10.5 CONCLUSIONS ................................................................................ 449
REFERENCES ............................................................................................ 450

11. Time-Domain Integral Equation Method for Transient
Problems .................................................................................. 455
Mingyao Xia

11.1 INTRODUCTION .............................................................................. 455
11.2 DERIVATIONS OF TIME-DOMAIN INTEGRAL EQUATIONS... 457
11.2.1 Integral Equations for the 3-D PEC Object ........................... 457
11.2.2 Integral Equations for 1-D and 2-D PEC Structures ............. 459
11.2.3 Integral Equations for the 3-D Dielectric Body ..................... 461
11.3 DISCRETIZATION OF GOVERNING EQUATIONS ..................... 463
11.3.1 Discretization for the Wire Problem ...................................... 464
11.3.2 Discretization for the 2-D Problem ....................................... 469
11.3.3 Discretization for the 3-D Conducting Body ......................... 471
11.3.4 Discretization for the 3-D Dielectric Body............................ 477
11.4 EVALUATION OF MATRIX ELEMENTS ...................................... 479
11.4.1 Matrix Setup for the Wire Problem ....................................... 479
11.4.2 Matrix Setup for the 3-D Problem ......................................... 484
11.4.3 Matrix Setup for the 2-D Problems ....................................... 488
11.5 EXTENSION TO MOVING OBJECTS............................................. 493
11.5.1 Transforms of Space Time and Fields ................................... 494
11.5.2 Simulation Process ................................................................ 499
11.6 NUMERICAL IMPLEMENTATIONS .............................................. 501
11.6.1 Numerical Examples for Wire Problems ............................... 503
11.6.2 Numerical Examples for the 2-D Structures.......................... 506
11.6.3 Numerical Examples for the 3-D Geometries ....................... 508
11.6.4 Numerical Examples for Moving Objects ............................. 512
11.7 SUMMARY ........................................................................................ 515
REFERENCES ............................................................................................ 515

12. Statistical Methods and Computational Electromagnetics
Applied to Human Exposure Assessment ............................. 519
Joe Wiart

12.1 INTRODUCTION .............................................................................. 519
12.2 EXPOSURE ASSESSMENT USING FDTD AND THE
CHALLENGE OF VARIABILITY .................................................... 520
12.2.1 Present Exposure Assessment Using FDTD ......................... 520
12.2.2 Uncertainty and Variability Management ............................. 524
12.3 METAMODEL MODEL FOR UNCERTAINTY
PROPAGATION ................................................................................ 526
12.4 DESIGN OF EXPERIMENTS ........................................................... 527
12.5 SURROGATE MODEL VALIDATION............................................ 530
12.6 MODEL CONSTRUCTION AND REGRESSION ........................... 532
12.7 POLYNOMIAL CHAOS EXPANSIONS .......................................... 534
12.7.1 Introduction to Polynomial Chaos Expansions ..................... 534
12.7.2 Calculation of the GPCE Coefficients ................................... 538
12.7.3 Construction of a Surrogate Model Using a Polynomial
Chaos ..................................................................................... 540
12.7.4 Example of the Use of the GPCE Model ............................... 543
12.7.5 Sensibility Analysis ............................................................... 546
12.8 KRIGING ........................................................................................... 550
12.8.1 Introduction to Kriging .......................................................... 550
12.8.2 Covariance and Variogram .................................................... 551
12.8.3 Ordinary and Simple Kriging ................................................ 552
12.9 CONCLUSION ................................................................................... 555
REFERENCES ............................................................................................ 555

About the Authors .................................................................. 559
Index ........................................................................................ 569

Preface

As an important branch of electromagnetic fields and microwave techniques,
computational electromagnetics (CEM) has found a variety of applications in
scientific research and engineering. Commonly used methods in CEM include the
finite element method (FEM), the finite-difference time-domain (FDTD) method, and
the method of moments (MoM). However, simulation time, accuracy, and memory
usage often become limiting when these methods are applied to large or otherwise
demanding problems. This book presents some important
extensions and enhancements to these methods.
Chapter 1 details the utilization of spectral domain analysis to retrieve the
absolute electric field magnitude and phase values in the near-field region of an
antenna, an important problem in the characterization of antenna performance for
interference and safety evaluation. By employing the plane wave spectrum (PWS)
representation, the authors outline the process in detail for reconstructing the
absolute values of the near fields from the far-field patterns and the knowledge of
either the input or radiated power, which can be used on both simulated and
measured data. Our goal is to provide a complete, self-contained reference from
which readers of all levels can fully implement the procedure discussed. To make
the material accessible for a broad audience, the chapter starts from the
fundamentals of Maxwell's equations and introduces all essential parameters and
reconstruction procedures to the readers. We provide an overview of the Fourier
transform relationship between the plane wave spectrum and the aperture near
fields, which enables the application of the computationally efficient fast Fourier
transform (FFT) in the context of sampled data. Other critical aspects, including
data interpolation, sampling, normalization, and coordinate transformations, are
also discussed. To complete the chapter, we present several theoretical and real-life
examples and compare our results to previously known results in the literature.
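As a rough illustration of the FFT step mentioned for Chapter 1, the following
Python sketch approximates the one-dimensional spectrum of a sampled aperture
field and checks it against a closed form. It is not taken from the chapter: the
Gaussian test field, the sign convention exp(-j kx x), the grid parameters, and the
function name spectrum_1d are all assumptions made for this example. What it
shows is the normalization: the raw FFT output has to be scaled by the sample
spacing dx before it approximates the continuous Fourier integral.

```python
import numpy as np


def spectrum_1d(field, dx):
    """Approximate F(kx) = integral of E(x) * exp(-j*kx*x) dx.

    Assumes a uniform grid whose sample at index n // 2 sits at x = 0.
    The sign convention is an assumption, not necessarily the book's.
    """
    n = field.size
    # ifftshift moves the x = 0 sample to index 0, where the FFT expects it.
    raw = np.fft.fft(np.fft.ifftshift(field))
    # Scaling by dx turns the discrete sum into a Riemann sum of the integral.
    spec = dx * np.fft.fftshift(raw)
    # Angular spatial frequencies, sorted in ascending order.
    kx = 2.0 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dx))
    return kx, spec


# Hypothetical test case: a Gaussian aperture field with a known transform.
n, dx, sigma = 256, 0.05, 0.5
x = (np.arange(n) - n // 2) * dx
field = np.exp(-x**2 / (2.0 * sigma**2))

kx, spec = spectrum_1d(field, dx)
exact = sigma * np.sqrt(2.0 * np.pi) * np.exp(-((kx * sigma) ** 2) / 2.0)
max_err = np.max(np.abs(spec - exact))
```

Leaving out the dx factor or the two shift calls gives a spectrum that is
wrong in amplitude or in phase, which is the kind of issue the chapter's
sections on normalization and pre-built FFT libraries are concerned with.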
Chapter 2 will detail the theoretical basis and analysis of a high-order FDTD
method that has received continuous development over the years and benefited
from a fully designed suite of high-order ancillary modeling tools that matches its
phase accuracy performance. These modeling tools will in turn be fully explained
along with supporting MATLAB code segments, paying closer attention to the
more critical tools: point and planar wave initiations, absorbing boundary
conditions, and planar and curved PEC modeling. The chapter will conclude with a
brief introduction to advanced forms of this high-order method that offer
substantial performance gains at the expense of higher complexity of
implementation.
Chapter 3 presents a general-purpose computing technique on a graphics
processing unit (GPGPU) to achieve a higher performance of the FDTD method
than that on a central processing unit (CPU). The chapter presents an
implementation of a three-dimensional (3-D) FDTD code using the Compute
Unified Device Architecture (CUDA) development environment from NVIDIA.
The demonstration includes the main components of an FDTD program such as the
implementation of electric and magnetic field updating equations, excitation of
ports, calculation of voltages and currents at the ports, and calculation of scattering
parameters. The presented program, therefore, can be used to simulate basic
microwave circuits on a GPU platform.
In Chapter 4, a full 3-D magnetized ionospheric plasma FDTD algorithm has
been developed that calculates all important ionospheric effects on signals,
including absorption, refraction, phase and group delay, frequency shift,
polarization, and Faraday rotation. This chapter starts with an overview of the
current state of the art for trans-ionospheric EM wave propagation, and then
describes in detail an efficient, 3-D FDTD magnetized ionospheric plasma model
that may be used to greatly advance the current state of the art. Next, a new
stochastic FDTD (S-FDTD) magnetized ionospheric plasma model is described,
which yields both the mean and the variance of the electric and magnetic fields that
result from variability and uncertainty in the ionosphere composition. The chapter concludes
with a few example applications of these models.
Chapter 5 introduces the architecture of the Phi coprocessor, programming
techniques, and acceleration techniques in computational electromagnetic methods.
We also show how to modify a serial code so that it runs efficiently in parallel
on the Phi coprocessor platform. The representative examples accelerate the
parallel FDTD method and the matrix multiplications that occur in
the method of moments (MoM) and the finite element method (FEM). The
numerical examples demonstrate the excellent performance of the Phi coprocessor
for computational electromagnetic methods. A comparison between the popular
CPUs and Phi coprocessors is provided for typical examples in antennas and
microwave circuits.
Chapter 6 is focused on the development of domain decomposition methods
(DDMs) for the finite element analysis of large-scale electromagnetic problems.
It first describes several numerical algorithms based on the dual-primal finite
element tearing and interconnecting (FETI-DP) method for the full-wave analysis
of electromagnetic problems. Then it formulates two FETI-DP methods to deal
with nonconformal meshes at the subdomain interfaces using Robin-type
transmission conditions. This is followed by the implementation of higher-order
transmission conditions for a faster convergence of the iterative solution of
the global interface system. Finally, it presents a hybrid method to handle multi-
region electromagnetic problems, where the finite element tearing and
interconnecting (FETI) method is employed to deal with mesh-nonconformal
and/or geometry-nonconformal interfaces between regions and the FETI-DP
method is used for mesh-conformal and geometry-conformal interfaces inside each
region. Many numerical examples are presented to demonstrate the application,
accuracy, efficiency, and capability of these FETI-DP algorithms.
Chapter 7 will review the current state of the art in high-accuracy
computations of the type arising from the MoM discretizations of electromagnetic
integral equations. By high accuracy we imply something approaching the goal of
dialable accuracy on the part of a user. The ingredients needed to facilitate high
accuracy computations include robust formulations, curved patch models, high-
order representation of currents or fields, treatment of field and current
singularities at edges, corners, and tips, accurate techniques for Green’s function
integrals, an understanding of convergence rates, techniques for error estimation,
and an overall control strategy that incorporates adaptive refinement procedures.
The state of progress in each of these areas will be reviewed and illustrated by
examples, and the areas where additional work is needed will be identified.
In Chapter 8, an efficient and simple approach is proposed for the analysis of
electromagnetic scattering. The algorithm starts with a multilevel partitioning of
the computational domain, which is very similar to the technique employed in the
multilevel fast multipole algorithm (MLFMA). Any of the impedance sub-matrices
associated with the well-separated partitioning clusters (far interaction terms) are
rank deficient and can be represented by the product of two much smaller matrices.
Therefore, the memory requirement will be relieved and the total CPU time will be
reduced significantly as well. Compared to various low-rank decomposition
methods including the popular adaptive cross approximation (ACA), the approach
based on the randomized pseudo-skeleton approximation (RPSA) is much more
efficient and easy to implement. Numerical examples are provided to show the
validity of the new algorithm.
In Chapter 9, we will show modeling details and procedures to quantify
signatures and EMI of actual physical components in several practical examples.
Detailed physics-based computational electromagnetic field models of
multicomponent energy systems enable the evaluation of realistic waveforms of
voltages and currents for low and high frequency operation. These models also
enable inclusion of practical effects such as parasitic elements, leakage saturation,
and switching patterns during the system operation. This is essential for studying
signatures from individual components and connected systems, which is necessary
during the design stage. These models also enable the evaluation of conducted and
radiated electromagnetic fields in machinery, cables, and power converters used in
multicomponent energy systems. The models enhance our ability to determine
their signatures and EMI interactions as well as to evaluate the effectiveness of
connecting controllers and/or other components.
Chapter 10 introduces metamaterials, which are formed by arranging a set of unit
cells in a regular array throughout a region of space to obtain some desirable
macroscopic electromagnetic behavior. The desired property is often one that is
not normally found naturally (negative refractive index, near-zero index, and so
forth). Over the past few years, the flexibility of metamaterials in choosing the
numerical value of the effective permittivity or permeability has led to many novel
theoretical and practical possibilities for different applications, ranging from
the microwave to the optical regime. In this chapter, we discuss the theoretical basis by
which metamaterials can manipulate the electromagnetic waves, and further
discuss their applicability to various devices or components, including: (1) novel
devices based on optical transformation, such as invisibility cloaks, energy
concentrators, waveguide connectors, and multibeam antennas; (2) metamaterial
absorbers; and (3) gain enhancement metamaterial lenses.
In Chapter 11, the time domain integral equation (TDIE) method for
simulations of transient phenomena is presented. Following a brief introduction to
the approach, various integral equations are derived based on the equivalence
principle, the retarded potential theory, and the boundary conditions. Then
discretization schemes are described, including geometric meshing and selections of
both temporal and spatial basis functions. An emphasis is placed on precise
evaluations of matrix elements, which are crucial for stability and accuracy. The
method is extended to transient scattering by an arbitrarily moving body, which
travels at hypervelocity and rotates or maneuvers simultaneously about an apparent
barycenter. Many numerical results are provided for both algorithmic verifications
and real-world applications.
Chapter 12 discusses stochastic modeling and presents case studies that show
the ability of this method to assess the human exposure induced by RF sources. It
presents case studies in the near field and at larger distances using the equivalence
principle and spherical mode expansion of RF sources. This chapter discusses the
use of surrogate models to characterize the statistical variations of the output
induced by the variation of the inputs. It presents case studies that indicate the
potential of statistical methods, such as polynomial chaos expansion, that can be
used to build these surrogate models with a parsimonious number of FDTD
simulations.
This book provides insightful understanding of modern topics for senior
students and graduate students in electrical engineering and college professors in
the areas related to electromagnetic computing techniques. This book could also
become a great reference for engineers who are eager to learn the advanced CEM
methods and problem-solving techniques.
This book was partially supported by Project 111 of Harbin Engineering
University and National Science Foundation of China under Grant No. 61372057.

Wenhua Yu
Wenxing Li
Atef Z. Elsherbeni
Yahya Rahmat-Samii

March 2015
Chapter 1
Novelties of Spectral Domain Analysis in
Antenna Characterizations: Concept,
Formulation, and Applications
Joshua M. Kovitz and Yahya Rahmat-Samii

1.1 INTRODUCTION

Characterizing and understanding electromagnetic radiation has been a focal point
of research worldwide for more than a century and still remains an important
problem today. Radiation can be generated by a wide variety of sources, both
natural and synthetic. Antennas are one important example of man-made sources
(or receivers) of electromagnetic waves whose primary purpose is to convert
guided waves into radiating waves and vice versa. The conversion enables the
manipulation of the guided waves with integrated electronics to provide certain
functionalities such as communications or sensing. The services provided by
antenna systems have led to revolutionary developments in wireless
communication, bringing about new economic markets such as cellular and
satellite telephony, direct broadcast television, global positioning systems (GPSs),
and more. Thus, the importance of fully understanding and characterizing the
antenna cannot be overemphasized.
When analyzing the radiation from an antenna, scientists and engineers are
most often concerned with the antenna’s so-called far-field properties. The far-
fields are the electromagnetic fields (EMFs) occurring at distances where the
antenna approximately appears as a point, that is, the distance to the antenna is
much larger than the overall extent of the antenna. At these distances, the radiated
electromagnetic fields behave similarly to plane waves, and analyzing the radiation
becomes simplified due to several key approximations. The region of space
corresponding to distances much larger than the antenna size is often termed the
far-field region, and the distances associated with this region can be extremely far
from the antenna depending on its size. While the far-field antenna properties tend
to be the primary concern for connectivity or sensing matters, there has been a
recent interest in finding the EMFs at any location in the vicinity of the antenna.
The fields that occur in regions other than the far-field region have often been
termed the near fields. In these regions, the antenna no longer appears as a point,
leading to complex field behavior that is difficult to analyze numerically and
analytically. The near-field and far-field regions are illustrated in Figure 1.1 with a
large reflector dish antenna ground station, where an observer in the near-field
region does not perceive the antenna as a point and experiences complex wave
behavior. However, the depicted satellite orbits the Earth at a large distance away
from the surface, where the satellite experiences far-field radiation as if the dish
antenna were a point source that concentrated its power in one direction.

Figure 1.1 Qualitative illustration of the near-field and far-field regions. The observer in the figure is
standing within the near-field region of the large ground dish antenna, whereas the satellite
is located in the far-field region of the dish antenna ground station. Note that in the far-
field region the dish antenna appears nearly as a point source to the satellite, whereas the
antenna does not appear as a point source to the observer in the near-field region.

The approaches that can be used to acquire the near-fields can be divided into
three general categories, which are all depicted in Figure 1.2. The first category
encompasses direct measurements of the EMFs near the antenna. Since the electric
field is the primary quantity of interest, a simple and intuitive technique under this
category would be to measure the electric fields using a simple power meter and
antenna positioner. If the electric field phase is desired as well, then the power
meter can be replaced with a vector network analyzer. While this directly measures
and obtains the near fields, the approach can be cumbersome, time-consuming,
expensive, and in some cases impractical. First, the approach requires having
robust mechanical equipment that can provide motion on three different axes,
which is certainly not straightforward over large volumes. Furthermore, if
measurement is chosen as the tool to determine the near-field values, there is no
guarantee that the design satisfies the antenna near-field requirements.
Consequently, multiple design iterations may incur additional costs to
rebuild the antenna in order to satisfy the desired specifications.

Figure 1.2 Depiction of the possible techniques to find the near-field radiation from a given antenna.
Of the three techniques, this chapter specifically focuses on spectral analysis, which often
requires less time or computational effort in comparison to full-wave simulation or direct
measurements. A handy feature of this technique is that the only data required are the far-
field patterns and the radiated power, which are often known in most practical
circumstances.

The next category of approaches comprises the standard methodologies used
to solve radiation problems. Some such techniques include the finite-element
method (FEM), finite difference time domain (FDTD), and the method of moments
(MoM), often known as full-wave simulation techniques. A tremendous amount of
research has been directed towards developing, enhancing, and utilizing each of
these algorithms towards solving difficult electromagnetic problems, and many of
the subsequent chapters in this book are devoted to the modification and utilization
of these algorithms. However, for each of these algorithms, the computational
burden increases dramatically as the antenna’s size increases (in terms of
wavelengths). While modifications can be made to the algorithms, this is still the
inherent limiting feature of these algorithms. For the FEM and FDTD algorithms,
obtaining reasonable accuracy in the EMF values also requires a fine mesh, which
further hinders the application of these algorithms towards this problem. Among
the algorithms, MoM can be reasonably applied to large antennas, such as reflector
antennas or slot arrays, to find the current distribution leading to the radiation.
However, obtaining the near fields typically involves a very tedious integration
over the problem domain for every single observation point in the near-field
region. This can require a vast amount of time and effort, which could impede the
use of MoM towards this problem. In the context of reflector antennas, the
physical optics (PO) approximation remains a popular strategy. The currents on the
reflector can be quickly approximated by the incident fields from the feed. With
the currents known, the near-field radiation can be computed. However, computing
the near fields still requires a tedious integration over the currents for every
observation point in the near-field region.
The last category of techniques can be classified as spectral analysis. In many
of these techniques, the fields are analyzed by decomposing the fields into an
ensemble of propagating and evanescent waves traveling in different directions. A
simple and intuitive approach is to decompose the fields into plane waves [1, 2].
This enables the rapid calculation of the fields in any region through the use of the
fast Fourier transform (FFT), which is well known in the computational
community for its inherent computational efficiency. Rather than using currents to
predict the near fields, we can directly utilize the far field to evaluate the near-
fields. This is rather convenient since the far-fields are usually known in most
practical cases, where the far fields can be found via simulation or measurements.
With the knowledge of the far field radiation and radiated power, one can
accurately predict the magnitude of the near fields. Often, the antenna is placed in
a complex environment where it can be difficult to characterize the radiation from
interactions between the antenna and other nearby objects. When applied to the
measured far fields, the spectral domain approach conveniently provides the near
fields radiated from all parts of the antenna and any interactions with the antenna’s
environment. This is important in accurately characterizing the near-fields, and can
also be challenging to achieve via the standard computation techniques.
The search for an efficient and accurate near field computational technique is
motivated by personal safety as well as interference concerns. Ensuring safety for
anyone in the antenna vicinity is critical in any antenna installation, and providing
a quick means of characterizing the near fields is an important problem in the field
of antenna engineering and electromagnetics. Often standards are placed by
government organizations such as the U.S. Federal Communications Commission
(FCC) in order to provide safety for individuals and minimize possible interference
with other devices. The knowledge of the antenna near fields also can be used in
the design of compact electronic systems such as CubeSats and other spacecraft,
where the induced fields may cause undesirable interference or breakdown
in electronics placed near the antenna. Once the near fields are known, then either
the electronics can be placed appropriately on the satellite to avoid such problems
or the antenna can be optimized such that the near fields in a particular location are
minimized.
In this chapter, we detail the steps needed to evaluate the near fields based on
the far-field data and the radiated power. Starting from Maxwell’s equations, it will
be revealed how EMFs can be decomposed into a spectrum of plane waves, which
has been popularized as plane wave expansion (PWE). The result is the Fourier
transform relationship between the near fields and far fields, which has seen use in
many applications including theoretical and computational electromagnetics [1–7],
antenna measurements [8–12], and even optics [13]. In our derivations, we provide
general results that can be used for several popular orientations of the coordinate
system describing the antenna. The discretization of the near-field and far-field
data is also discussed in detail, leading to the application of the FFT. The use of the
FFT requires proper normalization to account for sampling. In the past, the data
from the FFT was simply normalized to the maximum, but doing so will not
provide the field values attained in real life. The normalization effectively scales
the results to the desired units and is accomplished with only the knowledge of the
directivity and the power radiated. Without the normalization, the resulting data
only provides relative field strengths, which is not helpful in finding the realized
values of the fields. Interpolation is another important aspect when using the FFT,
since a rectangular sampling grid in the spectral domain must be used. In general,
the far-field values are complex numbers, and care must be taken when
interpolating the values. Some simple and effective choices for interpolation
schemes are highlighted and discussed.
As usual, some mathematical notations and assumptions must be pointed out
to the reader. In the following derivations, the italic notation f represents a complex
scalar, while the bold notation B represents a complex vector in 3-D space. With
the exception of the discussion on FFT, these quantities are given in the phasor
domain, where the engineering $e^{j\omega t}$ time convention is assumed. This will be the
convention used throughout the chapter, unless otherwise noted.
The material derived and discussed in the ensuing sections effectively covers
all necessary aspects to recover the near-field data from the far-field data. The
chapter provides the complete story in the development and use of this technique
to aid any reader in replicating the results and applying the technique to their
antennas in general. To conclude the chapter, the concepts developed herein are
applied towards several instructive examples of well-known aperture distributions,
where the fields are known analytically for comparison. A real-life reflector
antenna example is provided, where we obtain the near fields using the simulated
far fields. Quite commonly the far fields are only known for two principal planes,
and we extend the spectral analysis technique to these cases as well. We compare
the scenario where only two principal planes are known versus the case where the
far-field patterns for all angles are known for a reflector antenna.

1.2 ANTENNA RADIATION ANALYSIS IN THE SPECTRAL DOMAIN

In order to obtain the near fields, the theoretical framework behind radiation in the
near field and far field must be established. Radiation from antennas is
characterized by its radiated electric field and magnetic fields, denoted as E and H,
respectively. Both of these physical quantities exhibit complex behavior that is
challenging to model either analytically or numerically. However, spectral analysis
provides an intuitive link between the near fields and the far fields that enables an
efficient and systematic procedure to compute the near fields based upon the
knowledge of the far fields. This is depicted in Figure 1.3, where a new quantity
known as the plane wave spectrum (PWS) has been introduced to facilitate a simple relationship between
the fields. As shown, the PWS has a Fourier transform relation to the near-fields in
a plane z = C, where C is some arbitrary constant. Once the PWS is known, then
the near fields in any region are known and can be computed via Fourier
transform, and the far fields can be computed via an asymptotic relation to the
PWS. In this section, the behavior of electromagnetic waves and in particular plane
waves is reviewed and described in detail. These fundamental concepts lay the
foundation to introduce the PWS formally. The relationships between the near-
fields, far-fields, and the PWS are also derived and explained. Lastly, the analytical
procedure to obtain the near fields based upon the far-field distribution and the
radiated power is outlined.

Figure 1.3 Depiction of the relationship between the near fields and far fields provided by spectral
analysis. The technique utilizes the so-called PWS to relate the fields in the near-field and
far-field regions, resulting in a Fourier transform relation to the near-field electric fields in
the planes z = C, where C is an arbitrary constant. The PWS also has a useful asymptotic
relationship to the far fields.

1.2.1 From Maxwell’s Equations to the Plane Wave Spectrum

For antennas and EMFs in general, the electric and magnetic fields can be
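mathematically described by Maxwell’s equations, shown below.

$$\nabla\times\mathbf{E} = -j\omega\mathbf{B} \tag{1.1a}$$

$$\nabla\times\mathbf{H} = \mathbf{J} + j\omega\mathbf{D} \tag{1.1b}$$

$$\nabla\cdot\mathbf{D} = \rho \tag{1.1c}$$

$$\nabla\cdot\mathbf{B} = 0 \tag{1.1d}$$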
In the equations above, B represents the magnetic flux density, D represents the
electric flux density, J represents the electric current density, and $\rho$ represents the
electric charge density. Note also that $\omega$ is the angular frequency in rad/s. These
equations are known individually as Faraday’s law, Ampere’s law, Gauss’ law, and
the magnetic Gauss’ law, respectively. Maxwell’s equations are often paired with
the constitutive relations $\mathbf{B} = \mu\mathbf{H}$ and $\mathbf{D} = \varepsilon\mathbf{E}$, assuming homogeneous, linear, and
isotropic materials are present.
While Maxwell’s equations provide insights into the relationship between the
electric field, magnetic fields, and the electric sources, the solutions to these
equations are not immediately obvious. A few mathematical manipulations of
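these equations can reveal some remarkable insights. Taking the curl of Faraday’s
law

$$\nabla\times\nabla\times\mathbf{E} = -j\omega\mu\,\nabla\times\mathbf{H} = -j\omega\mu\mathbf{J} + \omega^2\mu\varepsilon\mathbf{E} \tag{1.2}$$

and using the vector identity $\nabla\times\nabla\times\mathbf{F} = \nabla(\nabla\cdot\mathbf{F}) - \nabla^2\mathbf{F}$ along with the constitutive
relations, we have

$$\nabla^2\mathbf{E} + k^2\mathbf{E} = j\omega\mu\mathbf{J} + \frac{\nabla\rho}{\varepsilon} \tag{1.3}$$

where $k = \omega\sqrt{\mu\varepsilon}$ is known as the wavenumber. This equation is an inhomogeneous
partial differential equation of second order, and in unbounded space can be solved
using standard techniques (e.g., vector potentials, assuming that J and $\rho$ are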
known). Unfortunately, the knowledge of the sources usually comes at a great
computational cost, as discussed in the previous section.
The spectral analysis technique avoids this problem by analyzing the fields in
the source-free regions, where simplifications to the differential equations can be
made. No currents or charges exist in these regions, that is, J = 0 and $\rho$ = 0, which
leads to the Helmholtz equation

$$\nabla^2\mathbf{E} + k^2\mathbf{E} = 0 \tag{1.4}$$

A similar equation can also be derived for the magnetic field H. The solutions of
this equation have very interesting implications as discussed in [14–16]. While
many solutions of this equation can be derived for any coordinate system, the most
important and possibly the simplest to understand are the solutions in rectangular
coordinates. In rectangular coordinates, the solutions of this equation are

$$\mathbf{E}(x, y, z) = \mathbf{E}_0\, e^{-j\mathbf{k}\cdot\mathbf{r}} \tag{1.5}$$

resulting in a phenomenon known as plane waves as illustrated in Figure 1.4. In
(1.5), the vector r describes the position in space as $\mathbf{r} = x\hat{x} + y\hat{y} + z\hat{z}$. The vector
k embeds both the wavenumber k and the direction of propagation, and it
can be described in the rectangular coordinate system by

$$\mathbf{k} = k_x\hat{x} + k_y\hat{y} + k_z\hat{z} \tag{1.6}$$

Altogether, the variables kx, ky, and kz are known as the propagation constants, as
they describe the amount of propagation along each of the three axes. The propagation
constants also must follow the dispersion relation, which states

$$k_x^2 + k_y^2 + k_z^2 = k^2 \tag{1.7}$$

This forces the speed of the plane wave to be equal to the speed of light in that
medium (i.e., $v_p = \omega/k = 1/\sqrt{\mu\varepsilon}$). The dispersion relation represents one of
many important properties of plane waves. One consequence is that there are only
two independent components, which means that only two components must be
known to have full knowledge of the propagation constant vector k. In many cases,
only kx and ky are given, but kz can always be found for plane waves using

$$k_z = \pm\sqrt{k^2 - k_x^2 - k_y^2} \tag{1.8}$$

Care must be taken in choosing either positive or negative values of kz, but usually
there is enough information in the problem being solved to determine the sign. We
will highlight those cases in the subsequent sections. Also, if $k_x^2 + k_y^2 > k^2$, then
imaginary values of kz can ensue, resulting in evanescent waves decaying in
magnitude as the observation points move in the +z direction.
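
The sign choice in (1.8) can be made concrete with a short numerical sketch. The function name `kz_from_kx_ky` and the sample values below are illustrative assumptions, not part of the chapter. The sketch assumes waves traveling toward +z under the $e^{j\omega t}$ convention, so that the spatial factor is $e^{-jk_z z}$ and an evanescent wave requires a negative imaginary kz in order to decay as z increases.

```python
import cmath
import math

def kz_from_kx_ky(k, kx, ky):
    """Return kz from (1.8) for a wave traveling toward +z.

    Assumption: fields vary as exp(-j*k.r), so an evanescent wave needs
    kz = -j*alpha with alpha > 0 in order to decay as z increases.
    """
    kz_squared = k * k - kx * kx - ky * ky
    if kz_squared >= 0.0:
        return complex(math.sqrt(kz_squared), 0.0)    # propagating wave
    return complex(0.0, -math.sqrt(-kz_squared))      # evanescent wave

wavelength = 1.0                      # illustrative value, in meters
k = 2.0 * math.pi / wavelength

kz_prop = kz_from_kx_ky(k, 0.5 * k, 0.0)   # kx^2 + ky^2 < k^2
kz_evan = kz_from_kx_ky(k, 1.5 * k, 0.0)   # kx^2 + ky^2 > k^2

# Magnitude of exp(-j*kz*z) one wavelength away from the z = 0 plane
mag_prop = abs(cmath.exp(-1j * kz_prop * wavelength))
mag_evan = abs(cmath.exp(-1j * kz_evan * wavelength))
```

With these values the propagating wave keeps unit magnitude, while the evanescent wave is attenuated strongly within a single wavelength.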
Another important feature about the plane wave solution is the electric field
polarization vector E0, which points in the direction of oscillation as time
progresses. There are several important features about this vector. The first is that
the electric field vector E0 in free space (or in isotropic media) will be
orthogonal to the direction of propagation. This can be shown by considering the
source-free Gauss’ law

$$\nabla\cdot\mathbf{E} = \nabla\cdot\left(\mathbf{E}_0 e^{-j\mathbf{k}\cdot\mathbf{r}}\right) = \mathbf{E}_0\cdot\nabla e^{-j\mathbf{k}\cdot\mathbf{r}} = -j\,\mathbf{E}_0\cdot\mathbf{k}\, e^{-j\mathbf{k}\cdot\mathbf{r}} = 0 \tag{1.9}$$

where the second equality is made through the vector identity
$\nabla\cdot(\psi\mathbf{F}) = \psi\,\nabla\cdot\mathbf{F} + \mathbf{F}\cdot\nabla\psi$. The implication is that E0 · k = 0, meaning that the
electric field is orthogonal to the propagation direction. This is another important
characteristic of the electric field because it signifies that only two components of
the electric field are independent. For example, if only Ex and Ey are known, then
Ez can be found by

$$E_z = -\frac{E_x k_x + E_y k_y}{k_z} \tag{1.10}$$

The magnetic field of plane waves is also an important consideration. For plane
waves, the magnetic field can be found quickly with the knowledge of the k and E
vectors. Using (1.5) and the constitutive relations, one can rewrite Faraday’s law as

$$\mathbf{H} = \frac{\mathbf{k}\times\mathbf{E}}{\omega\mu} \tag{1.11}$$

which is the plane wave relationship between the magnetic and electric field. From
this it can be shown that the magnitude of the magnetic field is $|\mathbf{H}| = |\mathbf{E}|/\eta$, where $\eta$
is the intrinsic impedance of the medium defined as $\eta = \sqrt{\mu/\varepsilon}$. Equation (1.11)
also implies that the magnetic field must be orthogonal to both k and E since A ·
(A × B) = 0, as depicted in Figure 1.4. Thus, in order to have full knowledge of a
plane wave, only two vector components of the propagation constant vector k and
two vector components of the electric field E need to be known. This is an
important property that is used often in spectral analysis.
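
As a hedged illustration of this property, the sketch below rebuilds a full plane wave from only kx, ky, Ex, and Ey using (1.8), (1.10), and (1.11). The helper names (`complete_plane_wave`, `cross`, `dot`, `norm`), the 300 MHz frequency, and the sample amplitudes are assumptions chosen for the example; only propagating waves traveling toward +z in free space are handled.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Unconjugated dot product, as used in the condition E0 . k = 0."""
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    """Euclidean magnitude of a (possibly complex) 3-vector."""
    return math.sqrt(sum(abs(p) ** 2 for p in a))

def complete_plane_wave(kx, ky, ex, ey, omega, mu, eps):
    """Recover k, E0, and H0 from two components of k and two of E0."""
    k = omega * math.sqrt(mu * eps)
    kz_squared = k * k - kx * kx - ky * ky
    if kz_squared <= 0.0:
        raise ValueError("this sketch only handles propagating waves")
    kz = math.sqrt(kz_squared)                 # (1.8), + sign for travel toward +z
    ez = -(ex * kx + ey * ky) / kz             # (1.10)
    k_vec = (kx, ky, kz)
    e_vec = (ex, ey, ez)
    h_vec = tuple(c / (omega * mu) for c in cross(k_vec, e_vec))   # (1.11)
    return k_vec, e_vec, h_vec

MU0 = 4.0e-7 * math.pi            # free-space permeability (H/m)
EPS0 = 8.8541878128e-12           # free-space permittivity (F/m)
omega = 2.0 * math.pi * 300.0e6   # illustrative frequency of 300 MHz
k0 = omega * math.sqrt(MU0 * EPS0)
eta0 = math.sqrt(MU0 / EPS0)

k_vec, e_vec, h_vec = complete_plane_wave(0.3 * k0, 0.4 * k0, 1.0, 0.5j,
                                          omega, MU0, EPS0)
```

The returned vectors can be checked against the properties stated above: `dot(k_vec, e_vec)`, `dot(k_vec, h_vec)`, and `dot(e_vec, h_vec)` should all vanish to rounding error, and `norm(h_vec)` should equal `norm(e_vec) / eta0`.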

Figure 1.4 Illustration of a plane wave whose direction of propagation is towards the k direction.
Note that the surfaces of constant phase are planes, hence the term plane waves. The k
vector is orthogonal to these planes, implying that propagation occurs orthogonal to these
surfaces. An example of the electric and magnetic fields of this plane wave is also
shown, where E0 and H0 are orthogonal to k.

At this point we have discussed one interesting solution to Maxwell’s
equations in unbounded space, which is known as the plane wave. Spectral
analysis uses the plane wave as the elementary building block to analyze the
electromagnetic fields emitted from an antenna. A paramount feature of Maxwell’s
equations is their inherent linearity, which implies that a superposition of plane
waves such as

$$\mathbf{E} = \sum_n \mathbf{E}_{0n}\, e^{-j\mathbf{k}_n\cdot\mathbf{r}} \tag{1.12}$$

is also a solution to the equations. This equation represents the presence of
multiple plane waves, all traveling in different directions given by $\mathbf{k}_n$. Each of
these plane waves has an associated electric field vector $\mathbf{E}_{0n}$ defining the
magnitude and field direction for each plane wave.
Suppose now that we have an ensemble of plane waves whose directions are
uniformly spread throughout all possible directions. Specifically, the values of kx
and ky can be written as

$$k_x = m\,\Delta k_x, \quad m = \ldots, -2, -1, 0, 1, 2, \ldots \tag{1.13}$$

$$k_y = n\,\Delta k_y, \quad n = \ldots, -2, -1, 0, 1, 2, \ldots \tag{1.14}$$

With just kx and ky known, we can have full knowledge of the wave directions and
the wave vector k using (1.8). We can also introduce the quantity A, which
represents the spectral density, that is, the field density packed into the spectral
band centered at kx and ky with widths of $\Delta k_x$ and $\Delta k_y$. This is analogous to the
power spectral density often seen in communications. The spectral density can be
written as

$$\mathbf{A}(m, n) = \frac{\mathbf{E}_0(m, n)}{C\,\Delta k_x\,\Delta k_y} \tag{1.15}$$

where C is a unitless arbitrary constant, resulting in the total electric field given as

$$\mathbf{E} = C\sum_m\sum_n \mathbf{A}(m, n)\, e^{-j\mathbf{k}_{mn}\cdot\mathbf{r}}\,\Delta k_x\,\Delta k_y \tag{1.16}$$

Taking this equation and shrinking the factors $\Delta k_x$ and $\Delta k_y$ to zero produces

$$\mathbf{E} = C\int_{K_y}\int_{K_x} \mathbf{A}(k_x, k_y)\, e^{-j\mathbf{k}\cdot\mathbf{r}}\, dk_x\, dk_y \tag{1.17}$$

which is a very important result [13]. This is typically referred to as the
continuous spectrum in contrast to the discrete spectrum. Notice that this form of
electric field is still a solution of Maxwell’s equations. It turns out that this
particular solution is quite convenient in that we can use this to represent any field
distribution in general with the correct choice of A. The interpretation is that, for a
given kx and ky, the spectral density A(kx, ky) is the intensity and direction provided
to the fields that propagate in the $\mathbf{k} = k_x\hat{x} + k_y\hat{y} + k_z\hat{z}$ direction. Note again that
kz can be computed from (1.8). When defined in this manner, the quantity A(kx, ky)
is often referred to as the plane wave spectrum. However, (1.17) does not describe how to
find A given the electric field E. Thus, some simplifications are necessary for
further interpretation.
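
The passage from the discrete sum (1.16) to the integral (1.17) can be illustrated with a one-dimensional sketch. Everything below is an assumption made for the example: the field is taken in the plane z = 0 with no y dependence, the scalar spectrum is a Gaussian chosen because its field is known in closed form, and the arbitrary constant is set to C = 1/(2π), the one-dimensional analog of the value adopted in the next section. The function names are hypothetical.

```python
import cmath
import math

W = 1.0   # width of the Gaussian field profile (illustrative)

def spectrum(kx):
    """Scalar spectrum whose exact field at z = 0 is exp(-(x/W)**2)."""
    return math.sqrt(math.pi) * W * math.exp(-(kx * W) ** 2 / 4.0)

def field_from_discrete_spectrum(x, dk, k_max=20.0):
    """One-dimensional analog of (1.16): a sum of plane waves spaced dk apart."""
    c = 1.0 / (2.0 * math.pi)          # assumed value of the constant C in 1-D
    m_max = int(round(k_max / dk))
    total = 0j
    for m in range(-m_max, m_max + 1):
        kx = m * dk                    # (1.13)
        total += spectrum(kx) * cmath.exp(-1j * kx * x) * dk
    return c * total

def max_error(dk):
    """Largest deviation from the exact field over -2 <= x <= 2."""
    xs = [i * 0.25 for i in range(-8, 9)]
    return max(abs(field_from_discrete_spectrum(x, dk) - math.exp(-(x / W) ** 2))
               for x in xs)

error_coarse = max_error(4.0)   # widely spaced plane waves
error_fine = max_error(0.1)     # closely spaced plane waves
```

In this sketch the coarse spacing leaves a clearly visible error, while the fine spacing reproduces the exact field to near machine precision, which is the behavior expected as the spacings shrink toward the continuous spectrum.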

1.2.2 The Plane Wave Spectrum and the Fourier Transform

The PWS represents a vector quantity that provides a means to relate the near
fields and far-fields in a simple, intuitive, and compact manner. Equation (1.17) from the
previous section demonstrated the intuition behind the PWS as a spectrum of plane
waves propagating in many different directions, all with the same frequency $\omega$.
However, many more properties can be extracted from (1.17) through some
important assumptions.
A special but important case occurs when the observation point lies in the z =
0 plane. In this plane, (1.17) reduces to

$$\mathbf{E}_t(x, y, 0) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{A}(k_x, k_y)\, e^{-jk_x x - jk_y y}\, dk_x\, dk_y \tag{1.18}$$

where C has been set to $C = 1/4\pi^2$ and the ranges of kx and ky have been extended
to cover $-\infty < k_x, k_y < \infty$. The above equation can be recognized as a 2-D Fourier
transform with respect to the parameters kx and ky. The propagation constants kx
and ky are analogous to the angular frequency $\omega$ in the more common Fourier transform
relationship

$$s(t) = \int_{-\infty}^{\infty} S(\omega)\, e^{j\omega t}\, d\omega \tag{1.19}$$

between frequency and time dependence of signals. One key difference in (1.18) is
that the kx and ky represent spatial frequencies rather than frequencies in time.
These equations also make it clear that A represents the frequency-domain
components (analogous to $S(\omega)$) and E represents the physical quantity of interest in
space (analogous to s(t)). Another distinction is that a minus sign appears in the
exponential factor of (1.18), while the typical Fourier transform usually has a
positive exponential factor when going back to the time domain.
The fact that the PWS has the 2-D Fourier transform relationship suggests that
the PWS can be obtained via the inverse Fourier transform as

$$\mathbf{A}(k_x, k_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{E}_t(x, y, 0)\, e^{jk_x x + jk_y y}\, dx\, dy \tag{1.20}$$

due to the Fourier inversion theorem. Notice that this equation has a positive sign
in the exponential factor. Since the relationship shown in (1.20) shares similarities
with the typical Fourier transform, we denote the Fourier transform by the script
letter $\mathcal{F}$, where we can rewrite (1.18) and (1.20) more compactly by

$$\mathbf{E}(x, y, 0) = \mathcal{F}^{-1}\{\mathbf{A}(k_x, k_y)\} \tag{1.21a}$$

$$\mathbf{A}(k_x, k_y) = \mathcal{F}\{\mathbf{E}(x, y, 0)\} \tag{1.21b}$$

Note that the Fourier transform operation in (1.21b) has a positive exponent as
denoted in (1.20). Now, it is interesting to note that all information about the PWS
can be obtained if the electric field is known in one plane. This scenario frequently
occurs within the antenna discipline in theory and measurements. Once the full
PWS has been obtained, then all radiation information relevant to the antenna can
be computed.
As an example, let us assume an electric field distribution with the form

$$\mathbf{E}(x, y, 0) = \hat{x}E_0\,\delta(y)\,\mathrm{rect}(x/\ell) \tag{1.22}$$

where rect(x/ℓ) is equal to 1 inside the region $-\ell/2 \le x \le \ell/2$ and zero outside of
the region and $\delta(\cdot)$ is the delta function. It can be shown that the resulting PWS
from an electric field distribution of this shape would be

$$\mathbf{A}(k_x, k_y) = \hat{x}E_0\,\ell\,\mathrm{sinc}\!\left(\frac{k_x \ell}{2}\right) \tag{1.23}$$

where the sinc() function is defined as sinc(x) ≡ sin(x)/x. Note that with the PWS
fully known, we can go back and retrieve the electric field at z = 0 using the
inverse Fourier transform in (1.21a).
Another interesting case to consider with (1.17) is an observation plane at a
nonzero z value. If we consider the observation points on a plane defined by z = z0,
then we obtain

\[
\mathbf{E}(x, y, z_0) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\mathbf{A}(k_x, k_y)\, e^{-jk_z z_0}\, e^{-jk_x x - jk_y y}\, dk_x\, dk_y \tag{1.24}
\]

In this equation, it is important to note that both the A(kx, ky) and the
$e^{-jk_z z_0}$ terms are functions of kx and ky. If we rewrite the equation by setting

\[
\tilde{\mathbf{A}}(k_x, k_y) = \mathbf{A}(k_x, k_y)\, e^{-jk_z z_0} \tag{1.25}
\]

then this results in the equation

\[
\mathbf{E}(x, y, z_0) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\tilde{\mathbf{A}}(k_x, k_y)\, e^{-jk_x x - jk_y y}\, dk_x\, dk_y \tag{1.26}
\]

which means that the electric field in another plane can be computed through a
Fourier transform of the modified PWS written as $\tilde{\mathbf{A}}(k_x, k_y)$. Both
sides of the equation are also vectors, which means that the x, y, and z components
of the electric field can be obtained from the x, y, and z components of the Fourier
transform of $\tilde{\mathbf{A}}$.
The resulting equation clearly shows that the electric field at any point in
space can be found assuming that the PWS is already known. While there are a few
methods to obtain this quantity, we will show how to retrieve the PWS from the
far-field patterns in a later section of this chapter. Thus, with the PWS already
known from the far-field patterns, the near-field electric field at any point in space
near the antenna can be found. This is an important consequence of the PWS.
Another important point to consider is that only planes of constant z were
discussed; however, this treatment can be extended to planes of constant x or y as
well as planes tilted at some arbitrary angle.
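
The plane-to-plane relation in (1.24) to (1.26) can be illustrated numerically. The
following is a minimal sketch, assuming that one Cartesian component of the field on
z = 0 is sampled on a uniform, periodic N × M grid so that the continuous transforms
can be replaced by NumPy FFTs. The function name `propagate_plane` and the sampling
parameters `dx` and `dy` are illustrative and do not appear in the text.

```python
import numpy as np

def propagate_plane(E0, dx, dy, k, z0):
    """Propagate one field component from z = 0 to z = z0 (z0 >= 0).

    E0[n, m] holds samples at x = n*dx, y = m*dy. The sign convention follows
    (1.20) and (1.24): E ~ A exp(-j(kx x + ky y + kz z)).
    """
    N, M = E0.shape
    # Sampled spatial frequencies; fftfreq maps the upper half to negative values.
    kx = 2 * np.pi * np.fft.fftfreq(N, d=dx)[:, None]
    ky = 2 * np.pi * np.fft.fftfreq(M, d=dy)[None, :]
    # kz is real in the visible region and -j*alpha (decaying) outside it.
    kz = np.conj(np.sqrt((k**2 - kx**2 - ky**2).astype(complex)))
    # Sampled PWS: numpy's ifft2 uses the +j kernel of (1.20) with a 1/(N*M) factor.
    A = np.fft.ifft2(E0) * N * M * dx * dy
    # Modified PWS of (1.25).
    A_mod = A * np.exp(-1j * kz * z0)
    # Back to space as in (1.26): fft2 uses the -j kernel, and
    # dkx*dky/(4*pi^2) equals 1/(N*M*dx*dy) on this grid.
    return np.fft.fft2(A_mod) / (N * M * dx * dy)
```

In this sketch a single sampled plane wave with a visible kx only acquires the phase
exp(−j kz z0), whereas a component with kx > k is attenuated, which is the behavior
described by (1.25).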

1.2.3 Radiated Far Fields as a Spectrum of Plane Waves

Previously, it was shown that the electric field in a plane could be related to the
PWS by the Fourier transform and vice versa. The treatment was generalized such
that any point in space could be obtained assuming that the point was located on
the plane z = z0. Thus, it stands to reason that one could obtain the far field from
the PWS as well. In fact, the PWS represents the field strength devoted to a plane
wave in the given k direction. It then becomes intuitive that in the far field (i.e.,
r → ∞) the only radiation that will be received in the (θ, φ) direction is the plane
wave component traveling in the $\hat{r} = \mathbf{k}/k$ direction as shown in
Figure 1.5. In particular, the plane wave traveling in the r̂ direction is associated
with the kx and ky propagation constants, where

\[
k_x = k \sin\theta \cos\phi \tag{1.27a}
\]

\[
k_y = k \sin\theta \sin\phi \tag{1.27b}
\]

\[
k_z = k \cos\theta \tag{1.27c}
\]

and we assume that kx² + ky² ≤ k². In the case that kx² + ky² > k², the waves will
have an imaginary kz, leading to evanescent waves that decay to zero in the far
field. Thus, only the components satisfying the criterion kx² + ky² ≤ k² contribute to
the far field as discussed in further detail in the next subsection.

Figure 1.5 When computing the PWS from the near fields at z = 0, we are decomposing the fields
into the plane wave components propagating in the direction specified by kx and ky. In the
far field, the only radiation that will reach the point defined by (r, θ, φ) is the plane wave
component of the PWS traveling in the r̂ direction, where $\hat{r} = \mathbf{k}/k$, or
kx = k sin θ cos φ and ky = k sin θ sin φ.

The far field can be related to the PWS by finding the asymptotic form of the
integral shown in (1.17). Using coordinate transformations, we can rewrite this
equation as

\[
\mathbf{E}(r, \theta, \phi) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\mathbf{A}(k_x, k_y)\,
e^{-jr\left(k_x \sin\theta\cos\phi + k_y \sin\theta\sin\phi + k_z \cos\theta\right)}\,
dk_x\, dk_y \tag{1.28}
\]

and evaluate the resulting integral as r → ∞. In [17], the asymptotic form
was derived using the method of stationary phase, which results in

\[
\mathbf{E}(r, \theta, \phi) \approx \frac{jk\, e^{-jkr}}{2\pi r} \cos\theta\,
\mathbf{A}(k_x, k_y)\Big|_{\substack{k_x = k\sin\theta\cos\phi \\ k_y = k\sin\theta\sin\phi}} \tag{1.29}
\]

This equation reveals many interesting properties about the relationship between
the far-field radiation and the PWS. First, the only spectral component that
contributes to the far-field radiation in the direction (θ, φ) is the
component corresponding to the direction of propagation k fully defined by kx and
ky. An important aspect of the equation is the scaling factor jk/2π, which must be
included if the proper magnitudes are to be obtained in the near field. Interestingly,
this factor has a factor of 2 embedded in comparison to the scaling factors of
vector potentials [5]. This is analogous to utilizing a perfect electric conductor
(PEC) sheet and doubling the magnetic current sources in order to work with the
electric field [14]. If both electric and magnetic fields are taken into account, then
the familiar 1/4π would be observed in the equation.
Another interesting artifact of this equation is that only the θ and φ
components will exist in the far field. This is a well-known result and has been
proven through vector potential analysis [14]. It can also be shown by first
remembering that the far field is a source-free region and writing

\[
\nabla \cdot \mathbf{E} = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\nabla \cdot \left[\mathbf{A}(k_x, k_y)\, e^{-j\mathbf{k}\cdot\mathbf{r}}\right] dk_x\, dk_y = 0 \tag{1.30}
\]

Using ∇·(ψF) = ψ∇·F + F·∇ψ and ∇·A(kx, ky) = 0, we can find that

\[
\nabla \cdot \left[\mathbf{A}(k_x, k_y)\, e^{-j\mathbf{k}\cdot\mathbf{r}}\right]
= -j\, \mathbf{A}(k_x, k_y) \cdot \mathbf{k}\, e^{-j\mathbf{k}\cdot\mathbf{r}} \tag{1.31}
\]

where one can argue that k·A(kx, ky) = 0 to satisfy the source-free condition for
all positions r in the source-free region. In the far field, the propagation constant
vector is k = k r̂, which means that r̂·E = 0 according to the above equations.
This is in agreement with previous results and is intuitive since the far fields are
considered local plane waves with no electric field component in the direction of
propagation.
We can also use this property for the PWS to provide a direct formula for the
Eθ and Eφ components in terms of the PWS components. Since it is common to
have knowledge of only two components of the PWS, such as Ax and Ay, one
should first write the PWS in terms of all its components, by

\[
\mathbf{A} = \hat{x} A_x + \hat{y} A_y - \hat{z}\, \frac{k_x A_x + k_y A_y}{k_z} \tag{1.32}
\]

which provides the full PWS given two components. When using the spectral
analysis for constant z-planes, typically Ax and Ay are the components that are
known. Note however that the theory is not limited to only this case, and other
coordinate system configurations can be considered. Next, one can write the Eθ
and Eφ components in terms of the components of A as

\[
\mathbf{E}(r, \theta, \phi) \approx \frac{jk\, e^{-jkr}}{2\pi r}
\Big\{ A_x(k_x, k_y)\left[\hat{\theta}\cos\phi - \hat{\phi}\cos\theta\sin\phi\right]
+ A_y(k_x, k_y)\left[\hat{\theta}\sin\phi + \hat{\phi}\cos\theta\cos\phi\right] \Big\}
\Big|_{\substack{k_x = k\sin\theta\cos\phi \\ k_y = k\sin\theta\sin\phi}} \tag{1.33}
\]

which provides a direct link between the far-field components Eθ and Eφ and the
PWS. This is quite useful and we will utilize these relationships to take data from
the far field to find the PWS in the following sections.
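
As an illustration of (1.33), the following sketch evaluates the two far-field
components from the transverse PWS components. It assumes that Ax and Ay have
already been evaluated at kx = k sin θ cos φ and ky = k sin θ sin φ; the function
name `far_field_from_pws` is hypothetical and chosen only for this example.

```python
import numpy as np

def far_field_from_pws(Ax, Ay, theta, phi, r, k):
    """Return (E_theta, E_phi) from (1.33).

    Ax, Ay are the PWS components evaluated at the stationary-phase point
    kx = k sin(theta) cos(phi), ky = k sin(theta) sin(phi).
    """
    scale = 1j * k * np.exp(-1j * k * r) / (2 * np.pi * r)
    E_theta = scale * (Ax * np.cos(phi) + Ay * np.sin(phi))
    E_phi = scale * np.cos(theta) * (-Ax * np.sin(phi) + Ay * np.cos(phi))
    return E_theta, E_phi
```

For example, an x-directed PWS sample with Ax = ab E0 at broadside (θ = 0, φ = 0)
gives |Eθ| = k ab E0/(2πr) and Eφ = 0.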

1.2.3.1 Visible and Invisible Regions in the Spectral Domain

A keen eye should note that only certain spectral components actually contribute to
the far field. If kx, ky, and kz satisfy the conditions in (1.27a)–(1.27c), then it
becomes impossible to achieve propagation constant values outside the region
kx² + ky² ≤ k² unless we use complex values for (θ, φ). This region has often been
designated by the electromagnetics community as the visible region of the PWS.
The other region satisfying the criterion kx² + ky² > k² has been referred to as the
invisible region. Both regions are depicted in Figure 1.6 in the spectral domain,
that is, in terms of kx and ky.
The idea behind this terminology is that only the components within the
visible region are observable to an object or receiver in the far field of the antenna.
The components outside of this region decay rapidly as the distance from the
antenna increases due to the imaginary value of kz. Thus, any far-field data will
only contain the contributions from the visible region and the evanescent waves are
invisible to the observer in the far field. Because of this, it becomes difficult to
gain knowledge of the evanescent waves from the far fields. In response, the
components of the PWS in the invisible region are often approximated as zero.
While this implies that one can only gain partial knowledge of the PWS, the
contributions from the evanescent waves are negligible for many practical cases
for antenna engineers. Specifically, electrically large antennas such as reflectors,
arrays, and large horn antennas produce only weak evanescent waves since the
spectral content is packed more densely into the visible region. This is analogous
to the inverse relationship of the bandwidth and time extent of signals. As the
antenna size becomes larger, the spectral bandwidth becomes smaller. The
time-frequency analogy is that when any signal is stretched to a longer length of
time, the bandwidth decreases, based on the scaling property of Fourier transforms.

Figure 1.6 Illustration of the invisible and visible spectral components of the PWS in the spectral
domain. Only the spectral components within the visible region contribute to the far-field
region, and are observable to a user in the far-field region. Components (or energy) in the
invisible region represent evanescent waves that decay to zero in the far field.

It is important to realize that the invisible region corresponds to
higher-frequency spectral content, which means that rapid variations in the near field
produce evanescent waves. While the overall extent of the antenna is important as
previously discussed, another significant characteristic of the near fields is how
quickly the fields vary in space. A sharp transition in the near fields results in
stronger evanescent waves due to the higher spectral content. This is exactly the
same principle in signals as well, where a sharp discontinuity in the signal can
create strong sidelobes in the frequency domain. Using smooth transitions reduces
these sidelobes, and most large antennas make use of this feature in order to reduce
their sidelobes in the far field.
To demonstrate the importance of both the antenna size and its transitions
observed in the near fields, we will consider several interesting cases of near-field
distributions of E and discuss the properties. The first case is the uniform
rectangular aperture, whose electric field distribution has the form

\[
\mathbf{E}(x, y, 0) = \hat{x} E_0\, \mathrm{rect}\!\left(\frac{x}{a}\right) \mathrm{rect}\!\left(\frac{y}{b}\right) \tag{1.34}
\]

Note that the electric fields go immediately to zero outside the region where
−a/2 ≤ x ≤ a/2 and −b/2 ≤ y ≤ b/2, resulting in a sharp transition and
ultimately higher spectral content. Others often describe the electric fields in a
plane above the antenna as the aperture distribution, and one can consider a and b
as the dimensions of the physical antenna. It is quite difficult to obtain an aperture
distribution of this form, and usually there will be some transition to zero near the
edge of the antenna. Using the Fourier transform relationship, we can find that the
PWS has the form

\[
\mathbf{A}(k_x, k_y) = \hat{x}\, ab E_0\, \mathrm{sinc}\!\left(\frac{k_x a}{2}\right) \mathrm{sinc}\!\left(\frac{k_y b}{2}\right) \tag{1.35}
\]

which is expected since the Fourier transform of a rectangular pulse is a sinc()
function.
To illustrate the points being made, we plot the results from this analysis for
two cases: a = b = 5λ and a = b = 10λ. The results are shown in Figure 1.7. The
magnitudes of the electric fields and their corresponding PWS are both plotted for
all cases. In the PWS plots, the black circles indicate the boundary between visible
and invisible regions. The components corresponding to the visible region are
located inside the circle, while all other components are invisible components.
It can be observed that the smaller size antenna shown in Figure 1.7(a) has the
features as discussed previously. The sharp discontinuity gives rise to the high-
frequency spectral components seen in the PWS, occurring both in the visible and
invisible regions. The larger antenna (corresponding to a larger nonzero
distribution of Ex) also has many high-frequency spectral components in its PWS,
but the majority of the high-frequency components fall within the visible region.
This exemplifies the inverse relationship between antenna size and spectral
bandwidth, where the larger antenna has a narrower bandwidth in the spectral
domain in comparison to the smaller antenna. Notice also that the larger antenna
has a narrower main beam (centered at kx = ky = 0) in terms of kx and ky when
compared to the smaller antenna. In the context of far-field patterns, this implies
that the beamwidth of the larger antenna will be narrower compared to the beam
from the smaller antenna, as expected from antenna theory.
Another interesting case to investigate is a tapered electric field distribution.
In particular, consider the triangular distribution defined by

\[
\mathbf{E}(x, y, 0) = \hat{x} E_0 \left(1 - \frac{2|x|}{a}\right) \left(1 - \frac{2|y|}{b}\right) \tag{1.36}
\]

where the electric fields are zero outside the region −a/2 ≤ x ≤ a/2 and
−b/2 ≤ y ≤ b/2. This tapering provides continuity at the edges of the aperture
and ensures that there are no sharp discontinuities. The result is the PWS given by

\[
\mathbf{A}(k_x, k_y) = \hat{x} E_0\, \frac{ab}{4}\, \mathrm{sinc}^2\!\left(\frac{k_x a}{4}\right) \mathrm{sinc}^2\!\left(\frac{k_y b}{4}\right) \tag{1.37}
\]

Figure 1.7 (a) Normalized electric field distribution at z = 0 for a rectangular pulse distribution with a
size of 5λ × 5λ. (b) Normalized magnitude of the PWS of the electric field in (a). (c)
Normalized electric field distribution at z = 0 for a rectangular pulse distribution with a
size of 10λ × 10λ. (d) Normalized magnitude of the PWS of the electric field in (c). The
black circles in the PWS plots illustrate the boundary of the visible and invisible regions.

The electric field distribution and corresponding PWS for an antenna of size 5λ ×
5λ and 10λ × 10λ are shown in Figure 1.8. Comparing the plots in this figure
reveals that the use of the triangular distribution can significantly reduce the
higher frequency components in the spectral domain. Again, the black circles
denote the boundary between the visible and invisible regions. For both the small
and large distributions, the most significant spectral content is found in the visible
regions since there are no sharp discontinuities. The formulas provided for the
PWS of the square versus triangular pulses also agree with these observations. The
PWS envelope of the square pulse decays as 1/(kx ky) whereas the triangular pulse
PWS decays as 1/(kx ky)², resulting in a significant decrease of high-frequency
spectral components.

Figure 1.8 (a) Normalized electric field distribution at z = 0 for a triangular pulse distribution with a
size of 5λ × 5λ. (b) Normalized magnitude of the PWS of the electric field in (a). (c)
Normalized electric field distribution at z = 0 for a triangular pulse distribution with a size
of 10λ × 10λ. (d) Normalized magnitude of the PWS of the electric field in (c).

1.2.3.2 Radiated Power and Parseval’s Theorem

Another important feature of the invisible versus visible regions is the power
associated with the evanescent waves and the radiated power. This is of interest in
the near-field problem in order to ensure that the correct electric field values are
being obtained and make sense physically. Parseval’s theorem can be utilized to
relate the power in the electric field distribution at z = 0 to the power in the PWS by

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left|\mathbf{E}(x, y, 0)\right|^2 dx\, dy
= \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left|\mathbf{A}(k_x, k_y)\right|^2 dk_x\, dk_y \tag{1.38}
\]

which relates the power in the spatial domain to the power in the spectral domain.
For large antennas, the left side of the equation has been widely used to
approximate the radiated power from an antenna, since once the distribution is
known it is then usually straightforward to integrate. The right side integral in the
spectral domain can be split into two integrals over the visible and invisible
regions by

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left|\mathbf{A}(k_x, k_y)\right|^2 dk_x\, dk_y
= I_{visible} + I_{invisible}
= \iint_{k_x^2 + k_y^2 \le k^2} \left|\mathbf{A}(k_x, k_y)\right|^2 dk_x\, dk_y
+ \iint_{k_x^2 + k_y^2 > k^2} \left|\mathbf{A}(k_x, k_y)\right|^2 dk_x\, dk_y \tag{1.39}
\]

where the first term corresponds to the visible region while the second term
corresponds to the invisible region. This is one place where evanescent waves
can make a notable difference. If a significant portion
of power gets transferred into the evanescent waves, that is, Iinvisible is on the same
order as Ivisible, then the aperture plane wave approximation may not be an accurate
one. In that case, the radiated power might be better computed directly through the
radiated far-field patterns.
However, when the antennas are electrically large (with respect to λ), then one
can make the approximation that

\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left|\mathbf{A}(k_x, k_y)\right|^2 dk_x\, dk_y
\approx \iint_{k_x^2 + k_y^2 \le k^2} \left|\mathbf{A}(k_x, k_y)\right|^2 dk_x\, dk_y \tag{1.40}
\]

since most of the power is within the visible region as illustrated in Figures 1.7 and
1.8. This ultimately implies that a good approximation of the radiated power is

\[
P_{rad} \approx \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{|E_x|^2 + |E_y|^2}{2\eta}\, dx\, dy
\approx \frac{1}{4\pi^2} \iint_{k_x^2 + k_y^2 \le k^2} \frac{|A_x|^2 + |A_y|^2}{2\eta}\, dk_x\, dk_y \tag{1.41}
\]

where we assume that Ez = 0 based on the plane wave approximation. This is
important in verifying that the correct amount of power is being observed in both
near fields and far fields.
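
The split in (1.39) and the approximation in (1.40) can be checked numerically for
the uniform aperture of (1.34) and (1.35). The sketch below integrates |A|² over the
visible disk on a grid and uses Parseval’s theorem (1.38) for the total, assuming
E0 = 1. The function name `visible_power_fraction` and the grid size `n` are
illustrative choices, not quantities defined in the text.

```python
import numpy as np

def visible_power_fraction(a, b, lam, n=801):
    """Estimate I_visible / (I_visible + I_invisible) for the uniform aperture."""
    k = 2 * np.pi / lam
    kx = np.linspace(-k, k, n)
    ky = np.linspace(-k, k, n)
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    # np.sinc(u) is sin(pi u)/(pi u), so sinc(x) = sin(x)/x is np.sinc(x / pi).
    A = a * b * np.sinc(KX * a / (2 * np.pi)) * np.sinc(KY * b / (2 * np.pi))
    visible = KX**2 + KY**2 <= k**2
    dk = (kx[1] - kx[0]) * (ky[1] - ky[0])
    I_visible = np.sum(np.abs(A[visible]) ** 2) * dk
    # Parseval (1.38): total spectral power = 4 pi^2 * (a * b) for E0 = 1.
    I_total = 4 * np.pi**2 * a * b
    return I_visible / I_total

f_small = visible_power_fraction(5.0, 5.0, 1.0)    # a = b = 5 wavelengths
f_large = visible_power_fraction(10.0, 10.0, 1.0)  # a = b = 10 wavelengths
```

Under these assumptions the larger aperture places a larger share of its spectral
power inside the visible region, in line with the discussion of Figure 1.7.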

1.2.3.3 Space as a Filter of Evanescent Waves

The last important point to realize is that the (x, y, z) space operates as a bandpass
filter for evanescent waves (i.e., higher-frequency components of the PWS). The
end result is that capturing data in the far field ultimately removes the ability to
sense the evanescent waves due to the ideal bandpass filter effect of space. This
can best be understood by revisiting the formula to find the electric field at any
point in space as

\[
\mathbf{E}(x, y, z_0) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
\mathbf{A}(k_x, k_y)\, e^{-jk_z z_0}\, e^{-jk_x x - jk_y y}\, dk_x\, dk_y \tag{1.42}
\]

where A is the PWS that can be computed using electric field data from another
plane or may be provided by other means. The important factor in this equation is
the exponential term $e^{-jk_z z_0}$. This term must be included, and its role well
understood, when computing the Fourier transform to obtain the electric field E.
This term can be written in terms of kx and ky as

\[
e^{-jk_z z_0} = e^{-jz_0 \sqrt{k^2 - k_x^2 - k_y^2}} \tag{1.43}
\]

A closer examination of this term reveals that its magnitude
remains at unity within the visible region. In the invisible region kz is imaginary,
so the magnitude becomes $e^{-z_0 \sqrt{k_x^2 + k_y^2 - k^2}}$, and the term will decay
rapidly for large positive values of z0 if we assume a coordinate system
with the majority of the waves traveling in the +z-direction.

Figure 1.9 Illustration of the factor $e^{-jk_z z_0}$ and its role as a filter to remove the higher frequency
components in the spectral domain. As z0 increases, the factor becomes more like an ideal
bandpass filter, removing the evanescent waves. Note that this plot provides values
assuming that ky = 0.

If we plot the magnitude of the $e^{-jk_z z_0}$ factor against kx with ky = 0, we
have the result shown in Figure 1.9. The results are shown for several values of z0.
Effectively, the rolloff becomes faster as z0 increases. Once in the far field (i.e., z0
becomes infinite), this factor will act as an ideal bandpass filter, resulting in only
the visible spectrum being observed by the far-field observer. This is important in
recognizing why the invisible components cannot be recovered from far-field data
mathematically.
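
The filtering behavior of the factor in (1.43) can be reproduced with a few lines of
code. The sketch below is only illustrative: the function name
`propagation_filter_magnitude` is hypothetical, and the branch of the square root
is chosen so that the evanescent components decay for z0 > 0.

```python
import numpy as np

def propagation_filter_magnitude(kx, ky, k, z0):
    """Magnitude of exp(-j kz z0) with kz = sqrt(k^2 - kx^2 - ky^2)."""
    kz2 = np.asarray(k**2 - kx**2 - ky**2, dtype=complex)
    # sqrt gives +j*alpha for negative arguments; the conjugate selects
    # kz = -j*alpha so that the factor decays rather than grows.
    kz = np.conj(np.sqrt(kz2))
    return np.abs(np.exp(-1j * kz * z0))

k = 2 * np.pi                       # wavelength of 1 (arbitrary units)
kx = np.array([0.0, 0.5 * k, 2.0 * k])
near = propagation_filter_magnitude(kx, 0.0, k, 0.1)
far = propagation_filter_magnitude(kx, 0.0, k, 1.0)
```

The first two samples lie in the visible region and keep unit magnitude for any z0,
while the third sample (kx = 2k) is attenuated more strongly as z0 grows.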

1.3 OBTAINING THE PLANE WAVE SPECTRUM FROM FAR-FIELD PATTERNS AND RADIATED POWER

Previously the relationship between the PWS and the electric fields in different
planes has been derived. The PWS can be also used to compute the far-field
distribution using asymptotic expansion. From this relationship, it stands to reason
that one could approximate the PWS through the use of the far-field data. This is
quite useful since most antenna engineers have some information about the far
field of the antenna. Thus, our next goal is to use the far-field data to
approximate the PWS distribution. The correct scaling must be used to reflect the
radiated power for the antenna system, which is a parameter that is assumed to be
known through the input power, impedance matching, and antenna radiation
efficiency. Once the proper scaling has been realized, the asymptotic relationship
between the far fields and the PWS can be utilized.

1.3.1 Finding the True Far-Field Magnitudes

Typically, antenna engineers have knowledge of the far-field pattern distribution,
but this pattern is usually either normalized to its maximum (making the maximum
unity) or scaled to its maximum directivity. However, this information cannot be
directly used to predict the electric fields at an observation point in the
far field for a given amount of radiated power. In order to predict the far fields, the
proper scaling must be computed to ensure that the proper radiated power is
observed in the far field. The first parameter to discuss is the radiated power. This
quantity is often not directly available, but rather can be approximated using the
input power and other important antenna performance parameters. It is well known
that the radiated power Prad can be related to the input power through the
impedance matching and the antenna efficiency [17]. By definition, the radiation
efficiency is

\[
\eta_r = \frac{P_{rad}}{P_{acc}} \tag{1.44}
\]
where Pacc is the accepted power that enters the antenna. The radiation efficiency
accounts for the ohmic and dielectric losses in the antenna when written in this
manner. The accepted power can be related to the impedance matching
performance and the input power as

\[
P_{acc} = P_{in} \left(1 - |\Gamma|^2\right) \tag{1.45}
\]

where Pin is the input power and Γ is the reflection coefficient. The reflection
coefficient is the ratio of the reflected to the incident voltage wave at the antenna
port, and |Γ|² is the fraction of the incident power that is reflected. The reflection
coefficient can be found through

\[
\Gamma = \frac{Z_{in} - Z_0}{Z_{in} + Z_0} \tag{1.46}
\]

where Zin is the input impedance of the antenna and Z0 is the characteristic
impedance of the transmission line feeding the antenna [18]. With this in mind, the
radiated power Prad can be computed from the input power Pin through

\[
P_{rad} = P_{in}\, \eta_r \left(1 - |\Gamma|^2\right) \tag{1.47}
\]

which shows that Prad can never exceed Pin assuming no amplification is
implemented at the antenna level. Usually the radiation efficiency and reflection
coefficient are known to the antenna engineer. If not, a reasonable approximation
is that the antenna is 100% efficient and minimal reflection occurs (i.e., ηr ≈ 1 and
Γ ≈ 0).
Now that the power radiated from the system level can be found, we move to
relate the Prad to the far fields for proper scaling. In order to remain general in the
derivation, we assume that the antenna radiates two orthogonal polarizations
defined by the â1 and â2 directions. These unit vectors can either be the right-hand
circular polarization (RHCP), left-hand circular polarization (LHCP), spherical, or
Ludwig’s polarization vectors, and it is important to remember that the polarization
unit vectors are dependent on the angle (θ, φ). These polarization vectors are
depicted in Figure 1.10. Furthermore, the antenna has the radiation patterns
associated with each polarization defined as f1(θ, φ) and f2(θ, φ), corresponding to
polarizations 1 and 2. For the sake of generality, we will assume that these patterns
have no normalization. The only assumption we make is that these patterns were
found with the same radiated power. This can be done by controlling the input
power either in simulation or in measurement. This is important; otherwise the
relationship between f1(θ, φ) and f2(θ, φ) has no meaning.
The electric and magnetic fields can be found by using the point source
approximation in the far field. It is assumed that the angular distribution remains
fixed as r changes, but the electric field’s magnitude and phase change as if the
antenna were behaving like an isotropic source as

\[
\mathbf{E}(r, \theta, \phi) = \frac{E_0 e^{-jkr}}{r}
\left[\hat{a}_1 f_1(\theta, \phi) + \hat{a}_2 f_2(\theta, \phi)\right] \tag{1.48}
\]

Figure 1.10 Coordinate system of the antenna under test (AUT). Note that we assume that the field
can be decomposed into two polarizations â1 and â2 that have arbitrary orientation for a
given direction (θ, φ). This generalization allows the use of θ̂, φ̂ or even Ludwig’s
definitions of copolar/cross-polar fields.

The magnetic field can be found using the local plane wave relationship
shown in (1.11) by

\[
\mathbf{H}(r, \theta, \phi) = \frac{\hat{r} \times \mathbf{E}}{\eta} = \frac{E_0 e^{-jkr}}{\eta r}
\left[\hat{a}_2 f_1(\theta, \phi) - \hat{a}_1 f_2(\theta, \phi)\right] \tag{1.49}
\]

It is this scaling factor E0 that remains to be found to compute the field magnitudes
in the far field.
The Poynting vector P describes the power density propagating in a particular
direction and can be computed by

\[
\mathbf{P}(r, \theta, \phi) = \frac{1}{2} \mathrm{Re}\left\{\mathbf{E}(r, \theta, \phi) \times \mathbf{H}^*(r, \theta, \phi)\right\} \tag{1.50}
\]

We can substitute (1.48) and (1.49) into (1.50) to find that

\[
\mathbf{P}(r, \theta, \phi) = \frac{\hat{r}}{2} \mathrm{Re}\left\{E_1(r, \theta, \phi) H_2^*(r, \theta, \phi) - E_2(r, \theta, \phi) H_1^*(r, \theta, \phi)\right\} \tag{1.51}
\]

Since both polarizations contribute to the r̂ component of the radiated power, it is
useful to define the contributions from each polarization to the Poynting vector as

\[
P_{r,1}(r, \theta, \phi) = \frac{1}{2} \mathrm{Re}\left\{E_1(r, \theta, \phi) H_2^*(r, \theta, \phi)\right\} = \frac{|E_1|^2}{2\eta} \tag{1.52}
\]

\[
P_{r,2}(r, \theta, \phi) = -\frac{1}{2} \mathrm{Re}\left\{E_2(r, \theta, \phi) H_1^*(r, \theta, \phi)\right\} = \frac{|E_2|^2}{2\eta} \tag{1.53}
\]

\[
\mathbf{P}(r, \theta, \phi) = \hat{r} \left(P_{r,1} + P_{r,2}\right) \tag{1.54}
\]


We can now compute the total radiated power by integrating the contributions of
the Poynting vector over all angles (θ, φ) by

\[
P_{rad} = \int_0^{2\pi}\int_0^{\pi} \hat{r} \cdot \mathbf{P}(r, \theta, \phi)\, r^2 \sin\theta\, d\theta\, d\phi \tag{1.55}
\]

This is often computed when obtaining the directivity D of the antenna, which
describes how much radiation is concentrated in a particular (θ, φ) direction
compared to an antenna with equal radiation in all directions. For practicing
antenna engineers, the directivity is associated with a given polarization, and it can
be computed for the ith polarization by

\[
D_i(\theta, \phi) = \frac{4\pi U_i(\theta, \phi)}{P_{rad}} \tag{1.56}
\]

where Ui(θ, φ) = r²Pr,i(r, θ, φ) is the radiation intensity associated with the ith
polarization. Note that this equation can be computed for any angle (θ, φ) since this
is a ratio of the radiation intensity to the radiated power. The importance of the
ratio is brought out by rewriting (1.56) in terms of the patterns rather than the
radiated power as

\[
D_i(\theta, \phi) = \frac{4\pi \left|f_i(\theta, \phi)\right|^2}
{\int_0^{2\pi}\int_0^{\pi} \left(|f_1|^2 + |f_2|^2\right) \sin\theta\, d\theta\, d\phi} \tag{1.57}
\]

This shows that one can compute the directivity at any angle with only the
knowledge of the far-field patterns. Since directivity can be computed for any
angle with this information, it remains instructive to rewrite (1.56) further in terms
of the electric fields as

\[
D_i(\theta, \phi) = \frac{2\pi r^2 \left|E_i(r, \theta, \phi)\right|^2}{\eta P_{rad}}
= \frac{2\pi |E_0|^2}{\eta P_{rad}} \left|f_i(\theta, \phi)\right|^2 \tag{1.58}
\]

where the second equality utilizes the definition of the far-field electric field in
(1.48). This equation shows that the magnitude E0 can be found through
the knowledge of the patterns and radiated power. We can rewrite this equation to
find

\[
E_0 = \sqrt{\frac{\eta P_{rad} D_i(\theta, \phi)}{2\pi \left|f_i(\theta, \phi)\right|^2}} \tag{1.59}
\]

Most often antenna engineers have the maximum directivity D0i for the dominant
polarization on hand, which corresponds to the angles (θ, φ) of maximum
radiation. The patterns are also often normalized to the dominant polarization
components, so that the maximum of |fi(θ, φ)| over i, θ, and φ is 1. With this in
mind, we can arrive at the simplified scaling as

\[
E_0 = \sqrt{\frac{\eta P_{rad} D_{0i}}{2\pi}} \tag{1.60}
\]

which contains parameters already assumed to be approximately known to the
system designer. Thus, the far fields can be written in their scaled form as

\[
\mathbf{E}(r, \theta, \phi) = \frac{e^{-jkr}}{r} \sqrt{\frac{\eta P_{rad} D_{0i}}{2\pi}}
\left[\hat{a}_1 f_{1,n}(\theta, \phi) + \hat{a}_2 f_{2,n}(\theta, \phi)\right] \tag{1.61}
\]

where f1,n(θ, φ) and f2,n(θ, φ) are the normalized far-field radiation patterns. These
scaled far-field patterns will be the patterns used to compute the near-field electric
field magnitudes.
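
The scaling procedure of this section can be sketched as follows. The code evaluates
the maximum of (1.57) over angle and polarization by a midpoint-rule integration
and then applies (1.60). The function names, the uniform angular grids, and the
free-space value used for η are assumptions of this example rather than
prescriptions from the text.

```python
import numpy as np

ETA0 = 376.730313668  # wave impedance of free space in ohms (assumed medium)

def max_directivity(f1, f2, theta, phi):
    """Peak of (1.57) over angle and polarization.

    f1, f2 are sampled as [i, j] on uniform grids theta[i] and phi[j].
    """
    dtheta = theta[1] - theta[0]
    dphi = phi[1] - phi[0]
    power = np.abs(f1) ** 2 + np.abs(f2) ** 2
    integral = np.sum(power * np.sin(theta)[:, None]) * dtheta * dphi
    peak = max(np.max(np.abs(f1) ** 2), np.max(np.abs(f2) ** 2))
    return 4 * np.pi * peak / integral

def field_scale(p_rad, d0, eta=ETA0):
    """Scaling constant E0 of (1.60) for patterns normalized to unit peak."""
    return np.sqrt(eta * p_rad * d0 / (2 * np.pi))

# Example pattern: f1 = sin(theta), f2 = 0 (short-dipole-like), midpoint grid.
n_theta, n_phi = 180, 360
theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
phi = np.arange(n_phi) * 2 * np.pi / n_phi
f1 = np.sin(theta)[:, None] * np.ones((1, n_phi))
f2 = np.zeros_like(f1)
d0 = max_directivity(f1, f2, theta, phi)
e0 = field_scale(1.0, d0)
```

For this example pattern the computed maximum directivity is close to the
well-known value of 1.5 for a short dipole.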

1.3.2 Plane Wave Spectrum Retrieval from Far-Field Patterns

Using the results from the previous section, one can find the PWS based on the
properly scaled far-field patterns. For this section we assume that the antenna’s
main beam is pointing in the hemisphere containing 0° ≤ θ ≤ 90° and that the planes
of interest are constant z planes. If other observation planes are desired then the
appropriate PWS corresponding to those planes should be computed using
coordinate transformations.
With the â1 and â2 components known, the first step in the process of
computing the PWS is to find the electric field in rectangular coordinates. This can
be accomplished through the use of a vector transformation matrix Tca, which
converts the components in the âi directions into the Cartesian components by

\[
\mathbf{E}_c = \mathbf{T}_{ca} \mathbf{E}_a \tag{1.62}
\]

If the vector Ea is known in the spherical vector components then this manifests as

\[
\begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} =
\begin{bmatrix}
\cos\theta\cos\phi & -\sin\phi \\
\cos\theta\sin\phi & \cos\phi \\
-\sin\theta & 0
\end{bmatrix}
\begin{bmatrix} E_\theta \\ E_\phi \end{bmatrix} \tag{1.63}
\]

Once we have the Cartesian components of E, then we can compute the PWS via
the asymptotic relationship shown in (1.29). Rearranging the equation brings us to
the final relationship as

\[
\mathbf{A}(k_x, k_y)\Big|_{\substack{k_x = k\sin\theta\cos\phi \\ k_y = k\sin\theta\sin\phi}}
= \frac{-j 2\pi r\, e^{jkr}}{k\cos\theta}
\left[\hat{x} E_x(r, \theta, \phi) + \hat{y} E_y(r, \theta, \phi)\right] \tag{1.64}
\]

Thus, we arrive at the PWS from the far fields and can retrieve the near field using
the Fourier transform. Note that only the data from the range of 0 ≤ θ ≤ π/2 is
used to compute the PWS. It is interesting to point out that the factor 1/cos θ will
have a singularity at θ = 90°, and in practice the radiation patterns may have finite
values at these angles, leading to infinite values in the PWS. A simple solution to
overcome this is to smooth the patterns with a windowing function. The
window function forces the patterns to zero at θ = 90°. Using a reasonable
windowing function does not significantly change the final results.
An important observation is that the resulting PWS from the defined
procedure only obtains the visible components due to the inability to observe the
invisible components in the far field, as discussed in Section 1.2. Thus we
approximate the PWS as zero in the invisible region, which is a reasonable
approximation for large antennas as discussed previously.
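
The retrieval steps in (1.63) and (1.64) can be sketched as a single function. The
name `pws_from_far_field` is hypothetical, the inputs are assumed to be the
properly scaled Eθ and Eφ at range r, and no windowing is applied, so θ must stay
below 90°.

```python
import numpy as np

def pws_from_far_field(E_theta, E_phi, theta, phi, r, k):
    """Return (Ax, Ay) at kx = k sin(theta) cos(phi), ky = k sin(theta) sin(phi)."""
    # Spherical to Cartesian components, first two rows of (1.63).
    Ex = np.cos(theta) * np.cos(phi) * E_theta - np.sin(phi) * E_phi
    Ey = np.cos(theta) * np.sin(phi) * E_theta + np.cos(phi) * E_phi
    # Inversion of the asymptotic far-field relation, (1.64).
    scale = -1j * 2 * np.pi * r * np.exp(1j * k * r) / (k * np.cos(theta))
    return scale * Ex, scale * Ey
```

In practice the samples close to θ = 90° would first be tapered with the windowing
function described above, and the PWS outside the visible region would be set to
zero.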

1.4 PLANE WAVE SPECTRUM COMPUTATION VIA FAST FOURIER TRANSFORM

The electric field in the near fields and the PWS share a Fourier transform
relationship, providing a remarkably insightful and intuitive link between the
far-field and near-field radiation. In practice, however, the far-field data is found at
sampled angles (θ, φ) in the far field. In order to make full use of the Fourier
transform relation for practical applications, one must modify these relationships
slightly when using sampled data. Thus, the DFT must be used to compute the
PWS from the far-field data, and it is evaluated with the FFT for
high-efficiency computation. In this section we assume that a sampled version of the
far-field patterns is available to the user. With this data, we provide all the
necessary steps to obtain the near fields from the far-field data.

1.4.1 Discretizing the Plane Wave Spectrum and the Electric Field Distribution

In many cases, only sampled data of the electric field distribution are available to
the user. We know that we can find the PWS by

$$
\mathbf{A}(k_x, k_y) = e^{jk_z z_0} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{E}(x, y, z_0)\, e^{j(k_x x + k_y y)}\, dx\, dy \tag{1.65}
$$

However, if we only have knowledge of samples of the electric field, then we
approximate the integral as

$$
\mathbf{A}(k_x, k_y) \approx e^{jk_z z_0} \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} \mathbf{E}(x_n, y_m, z_0)\, e^{j(k_x x_n + k_y y_m)}\, \Delta x\, \Delta y \tag{1.66}
$$

where the samples xn and ym are defined as

$$
x_n = n\,\Delta x \tag{1.67}
$$

$$
y_m = m\,\Delta y \tag{1.68}
$$

In its current form, (1.66) resembles a discrete-time Fourier transform (DTFT),
which has a discrete spatial domain and a continuous spectral domain. If the PWS
is sampled at

$$
k_x(p) = p\,\Delta k_x = p\,\frac{2\pi}{N\,\Delta x} \tag{1.69}
$$

$$
k_y(q) = q\,\Delta k_y = q\,\frac{2\pi}{M\,\Delta y} \tag{1.70}
$$

where p = 0, 1, ..., N−1 and q = 0, 1, ..., M−1, then we can rearrange (1.66) as

$$
\mathbf{A}_{pq} = e^{jk_{z,pq} z_0}\, \Delta x\, \Delta y \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} \mathbf{E}(x_n, y_m, z_0)\, e^{j2\pi\left(\frac{pn}{N} + \frac{qm}{M}\right)} \tag{1.71}
$$

which can be recognized as a 2-D DFT that can be computed quickly via the FFT.
Note that the extra exponential term exp(jkz,pq z0) is included for
completeness. When the observation points (x, y, z) are at z = 0, then this term
disappears from the equation. Another point to note is that the values of kx and ky
range from 0 to 2π/Δx and from 0 to 2π/Δy, respectively. Thus, the parameters kxΔx
and kyΔy are the electromagnetic analogues of a signal’s angular frequency ω.
Just as the Fourier transform is an invertible operation, the FFT operation is
also invertible but with one slight modification. Starting with the inverse
relationship

$$
\mathbf{E}(x, y, z_0) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathbf{A}(k_x, k_y)\, e^{-jk_z z_0}\, e^{-jk_x x - jk_y y}\, dk_x\, dk_y \tag{1.72}
$$

and using the same discretization leads to

$$
\mathbf{E}(x_n, y_m, z_0) \approx \frac{\Delta k_x\, \Delta k_y}{4\pi^2} \sum_{p=0}^{N-1}\sum_{q=0}^{M-1} \left[\mathbf{A}_{pq}\, e^{-jk_{z,pq} z_0}\right] e^{-j2\pi\left(\frac{np}{N} + \frac{mq}{M}\right)} \tag{1.73}
$$

Interestingly, this equation does not reflect the traditional form of the inverse FFT,
and we can rearrange the equation by

$$
\mathbf{E}(x_n, y_m, z_0) = \frac{1}{NM\,\Delta x\,\Delta y} \sum_{p=0}^{N-1}\sum_{q=0}^{M-1} \left[\mathbf{A}_{pq}\, e^{-jk_{z,pq} z_0}\right] e^{-j2\pi\left(\frac{np}{N} + \frac{mq}{M}\right)} \tag{1.74}
$$

which implies that the link via the FFT exists between the following entities

$$
\mathbf{E}(x_n, y_m, z_0)\,\Delta x\,\Delta y \;\underset{\text{iFFT}}{\overset{\text{FFT}}{\rightleftarrows}}\; \mathbf{A}_{pq}\, e^{-jk_{z,pq} z_0} \tag{1.75}
$$

since the 1-D forward and inverse FFT can be defined as

$$
F_m = \sum_{n=0}^{N-1} f_n\, e^{j2\pi\frac{mn}{N}} \tag{1.76}
$$

$$
f_n = \frac{1}{N} \sum_{m=0}^{N-1} F_m\, e^{-j2\pi\frac{mn}{N}} \tag{1.77}
$$

where it is again noted that the minus sign appears in the exponent for the inverse
FFT. Thus, one must utilize the provided scaling constants in order to remain in
agreement with the physical reality. This is an important feature to discuss, and
most researchers discard any constants and simply normalize to the maximum
absolute value of the resulting data matrix. The inclusion of the sampling distances
Δx and Δy in (1.75) is to preserve the physical values of the near fields and will be
discussed later.
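The scaled transform pair can be sketched in one dimension with a direct (slow) DFT written in the sign convention of (1.76) and (1.77). This is an illustrative sketch only: the function names, the sample values, and the spacing are assumptions, and z0 = 0 is taken so that the exp(jkz z0) factor drops out. Note that many numerical libraries, NumPy and MATLAB among them, place the minus sign in the forward transform instead, so the exponent signs should be checked before substituting a library FFT call.

```python
import cmath

def pws_from_field(field, dx):
    """1-D analogue of (1.71) at z0 = 0: A_p = dx * sum_n E_n exp(+j 2 pi p n / N)."""
    n_pts = len(field)
    return [
        dx * sum(field[n] * cmath.exp(2j * cmath.pi * p * n / n_pts) for n in range(n_pts))
        for p in range(n_pts)
    ]

def field_from_pws(pws, dx):
    """1-D analogue of (1.74) at z0 = 0: E_n = 1/(N dx) * sum_p A_p exp(-j 2 pi n p / N)."""
    n_pts = len(pws)
    return [
        sum(pws[p] * cmath.exp(-2j * cmath.pi * n * p / n_pts) for p in range(n_pts))
        / (n_pts * dx)
        for n in range(n_pts)
    ]

# Hypothetical field samples in V/m on a grid with dx = 0.5 m.
field = [0.0, 1.0 + 2.0j, 3.0, -1.0j]
dx = 0.5
pws = pws_from_field(field, dx)
recovered = field_from_pws(pws, dx)  # matches `field` to rounding error
```

Keeping the factor dx inside both functions means that the round trip returns the field in its original units instead of a rescaled copy.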
Overall, these equations define the relationships that will apply to practical
datasets, that is, sampled data in the near field, PWS, and far fields. An underlying
point that should be highlighted in this discussion is that the sampled far fields lead
directly to the sampled values of the PWS via (1.64). A sample of the far-field at
the angle (θ, φ) literally provides a sample of the PWS at the points (kx, ky) =
(k sin θ cos φ, k sin θ sin φ), which only provides data for the visible region. The FFT
must be then used to compute the near-field electric fields from the PWS.

1.4.2 Proper Normalization of the Fast Fourier Transform

The FFT enables the power of the Fourier transform for practical applications
where only samples are known of a given distribution. Assuming that adequate
sampling has been implemented, no information loss should occur. Yet it is not
sufficient to merely apply the FFT operation on a data set; the resulting numbers
only provide relative information. The more interesting data in the application at
hand is the absolute values of the electric field in V/m, which takes some careful
manipulation and interpretation of the formulas. Most researchers apply the FFT
and normalize to make observations on the relative field values. This comes in
many forms, but the most common results to plot are the field values relative to the
maximum. However, in this section we will attempt to uncover the proper scaling
factors (i.e., normalization) that ensure that the units and values are sensible and
accurate when the data is computed from the FFT. Thus, the resulting data predict
the exact near field electric field values in V/m given the radiation pattern and
radiated power.
Finding the proper normalization starts by comparing the equations for the
inverse continuous Fourier transform and the inverse DFT. For simplicity, we list
out the formulas for one dimension, since the extension of the results to two
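dimensions is straightforward. These can be written in order as

$$
g(x) = \mathcal{F}^{-1}\{G(k_x)\} = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(k_x)\, e^{-jk_x x}\, dk_x \tag{1.78}
$$

$$
h_n = \mathcal{F}_n^{-1}\{G_m\} = \frac{1}{N} \sum_{m} G_m\, e^{-j2\pi\frac{mn}{N}} \tag{1.79}
$$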

where again the negative exponential corresponds to the inverse transform,
consistent with (1.77). Here Gm represents the PWS, while g(x) and hn represent the
continuous and sampled electric fields in the near field, respectively. Note that
these equations are written in the form that would be utilized for the specific
problem at hand. Given a sampled version of the PWS, we want to find the correct
near-field values using the FFT. We denote the sampled version as hn rather than gn
to make it clear that merely applying the FFT on the data is not enough, and it will
be shown later that the values resulting from this operation are quite unreasonable.
The first observation from these equations is the lack of the differential length Δx
in (1.79), and this is by definition of the DFT. The lack of the differential length is
the first hint of how the resulting output data should be normalized given the
radiation patterns and radiated power.
The second hint that can be used is Parseval’s theorem, which states that the
total power observed in the spectral domain is equal to that in the spatial domain,
that is,

$$
P = \int_{-\infty}^{\infty} |g(x)|^2\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} |G(k_x)|^2\, dk_x \tag{1.80}
$$

If each side of the equation is discretized, then one can arrive at the conclusion that

$$
\sum_{n} |g_n|^2\, \Delta x = \frac{1}{2\pi} \sum_{n} |G_n|^2\, \Delta k_x \tag{1.81}
$$

This formula represents the physical intuition behind the conservation of power.
The left side represents the electric fields in the aperture, while the right side
represents the far-field radiation. Since power is not lost, the total sum of all power
observed in the two domains must be equal. Note the use of the variable gn for the
sampled version of the correct field values. This variable represents the exact
electric fields sampled in the near field. Interestingly, differences exist between
this result and Parseval’s theorem for the DFT, which states

$$
\sum_{n=1}^{N} |h_n|^2 = \frac{1}{N} \sum_{n=1}^{N} |G_n|^2 \tag{1.82}
$$

Both equations are correct and can be proven using the equations presented in this
chapter. Yet, a mysterious 1/N factor appears in (1.82) that does not appear in the
original. Equation (1.82) can be manipulated further by multiplying both sides by
Δkx = 2π/(NΔx) to reveal

$$
\sum_{n} |h_n|^2\, \Delta k_x = \sum_{n} \frac{|h_n|^2}{\Delta x}\, \frac{2\pi}{N} = \frac{1}{N} \sum_{n} |G_n|^2\, \Delta k_x \tag{1.83}
$$

which with further manipulation produces

$$
\sum_{n} \left|\frac{h_n}{\Delta x}\right|^2 \Delta x = \frac{1}{2\pi} \sum_{n} |G_n|^2\, \Delta k_x \tag{1.84}
$$

Comparing (1.84) with (1.81) immediately suggests that the resulting inverse FFT
(iFFT) output should be scaled such that gn = hn/Δx in order to produce the correct
values of the electric fields. This agrees with the resulting relationship shown in
(1.75). By applying the iFFT onto the samples Gn, we can find the values for
hn = gn Δx. We then must scale by 1/Δx in order to find the true magnitudes.
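The power balance in (1.84) can be checked numerically. The sketch below uses a direct inverse DFT in the convention of (1.79); the spectrum samples, the spacing, and the function names are hypothetical values chosen only to exercise the identity.

```python
import cmath
import math

def inverse_dft(spectrum):
    """Unscaled inverse DFT of (1.79): h_n = (1/N) * sum_m G_m exp(-j 2 pi m n / N)."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[m] * cmath.exp(-2j * math.pi * m * n / n_pts) for m in range(n_pts))
        / n_pts
        for n in range(n_pts)
    ]

def spatial_power(spectrum, dx):
    """Left side of (1.84): sum_n |h_n / dx|^2 * dx."""
    return sum(abs(h_n / dx) ** 2 for h_n in inverse_dft(spectrum)) * dx

def spectral_power(spectrum, dx):
    """Right side of (1.84): (1 / 2 pi) * sum_n |G_n|^2 * dk, with dk = 2 pi / (N dx)."""
    dk = 2.0 * math.pi / (len(spectrum) * dx)
    return sum(abs(g_n) ** 2 for g_n in spectrum) * dk / (2.0 * math.pi)

spectrum = [4.0, 1.0 - 2.0j, 0.5j, -3.0, 2.0 + 1.0j]  # hypothetical PWS samples
dx = 0.01                                              # hypothetical spacing in m
```

The two power estimates agree only when the iFFT output is divided by dx, which is the scaling argued for above.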
To illustrate these points, a 1-D example of the PWS will be shown. Assume
that the PWS of a given antenna is known as

$$
\mathbf{A}(k_x, k_y) = \hat{x}\, 2\pi A(k_x)\, \delta(k_y) = \hat{x}\, 2\pi A_0\, \frac{\sin(k_x W/2)}{k_x W/2}\, \delta(k_y) \tag{1.85}
$$

which reduces (1.18) down to a scalar 1-D Fourier transform relationship in terms
of A(kx) as

$$
E_x(x, y, 0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} A(k_x)\, e^{-jk_x x}\, dk_x \tag{1.86}
$$

The near-fields at z = 0 can be found as

$$
E_x(x, y, 0) = \frac{A_0}{W}\, \mathrm{rect}\!\left(\frac{x}{W}\right) \tag{1.87}
$$

which is simply the 1-D inverse Fourier transform of the sinc() function. For
numerical purposes, we choose A0 = 10 V, W = 20λ, and f = 8.4 GHz and plot the
results in Figure 1.11. Clearly, the magnitudes are as expected, where the peaks of
the plots in Figures 1.11(a) and 1.11(b) are 10 and 14, respectively. Thus, the
interpretation is that the PWS has either measured or simulated peak values of
20π V·m, while the electric field is equal to 14 V/m at z = 0.

Figure 1.11 Plots of the 1-D PWS example for A0 = 10 V, W = 20λ, and f = 8.4 GHz. These plots
represent the physical reality, and the values shown here are the true values of PWS and
electric fields. (a) PWS function A(kx). (b) Resulting electric field distribution at z = 0.

These values should also be reflected in the sampled versions resulting from
the FFT. For the case with A0 = 10 V, W = 20λ, and f = 8.4 GHz, we have two
examples with two different sampling periods of the PWS. The first case is where
N = 200 and Δx = λ/4, leading to a spectral sampling period of Δkx = 0.04π/λ. This
case is plotted in Figures 1.12(a) and 1.12(b). The other case tested uses N = 300
and Δx = λ/8, leading to a spectral sampling period of Δkx = 0.0533π/λ. This case
is plotted in Figures 1.12(c) and 1.12(d). Note that the electric fields are plotted
when directly implementing the iFFT without any normalization. The resulting
electric field plots (Figures 1.12(b) and 1.12(d)) highlight several important
artifacts of the iFFT operation without normalization. The first observation is that
the magnitude of the electric field is not close to the magnitude of the electric field
given in Figure 1.11(b), which finds an electric field of 14 V/m. Furthermore, the
values are different when using different sampling rates, as seen when comparing
Figures 1.12(b) and 1.12(d). These effects are all due to the lack of normalization
when computing the iFFT (or FFT) and can only be removed with the proper
normalization.

Figure 1.12 Plots of the 1-D PWS example for A0 = 10 V, W = 20λ, and f = 8.4 GHz with different
sampling rates. (a) PWS function A(kx) for N = 200, Δx = λ/4. (b) Resulting electric field
distribution without normalization at z = 0 from iFFT operation. (c) PWS function A(kx)
for N = 300, Δx = λ/8. (d) Resulting electric field distribution without normalization at z =
0 from iFFT operation.

For the same case with A0 = 10 V, W = 20λ, and f = 8.4 GHz, we computed the
electric field using the iFFT and 1/Δx normalization and plotted the results in
Figure 1.13. Notice that both plots predict roughly the same magnitude for the
electric field. More importantly, the electric field values agree well with the exact
values based on the continuous Fourier transform. This demonstrates both the
subtlety and the significance of including the 1/Δx normalization in the iFFT/FFT
operations. Another notable characteristic is the ringing effects near x = ±10λ,
which can be observed for the sampled electric field distributions. This is due to
the finite truncation of the sinc() function, commonly known as the Gibbs
phenomenon, and the best way to minimize these effects is by increasing N. These
effects appear when truncating the higher-frequency spectral components.
However, the prediction of the electric field is still quite accurate in spite of this
artifact. The ringing effects observed here are usually not as pronounced when
applying the FFT to practical problems, since the antennas usually have smooth
transitions and are thus bandlimited in spectral components, as discussed in
Section 1.2.3.1.
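The sinc/rect example can be reproduced with a short script. The sketch below is one possible implementation under stated assumptions: it uses a direct inverse DFT in the convention of (1.79) in place of a library FFT, it maps the upper half of the spectral samples to negative kx by periodicity, and all names are hypothetical. It is not the code used to generate the figures.

```python
import cmath
import math

C0 = 299_792_458.0   # speed of light in m/s
LAM = C0 / 8.4e9     # wavelength at f = 8.4 GHz
A0 = 10.0            # PWS peak in V
W = 20.0 * LAM       # aperture width

def sinc_pws(kx):
    """A(kx) = A0 * sin(kx W / 2) / (kx W / 2), the 1-D factor in (1.85)."""
    arg = kx * W / 2.0
    return A0 if arg == 0.0 else A0 * math.sin(arg) / arg

def ifft_field(n_pts, dx):
    """Return (h, g): the raw inverse-DFT output and the field scaled by 1/dx."""
    dk = 2.0 * math.pi / (n_pts * dx)
    # Samples p*dk for p = 0..N-1; by periodicity the upper half stands for negative kx.
    spectrum = [sinc_pws((p if p <= n_pts // 2 else p - n_pts) * dk) for p in range(n_pts)]
    h = [
        sum(spectrum[p] * cmath.exp(-2j * math.pi * p * n / n_pts) for p in range(n_pts))
        / n_pts
        for n in range(n_pts)
    ]
    g = [h_n / dx for h_n in h]
    return h, g

h_a, g_a = ifft_field(200, LAM / 4.0)
h_b, g_b = ifft_field(300, LAM / 8.0)
print(abs(h_a[0]), abs(h_b[0]))  # raw outputs differ between the two samplings
print(abs(g_a[0]), abs(g_b[0]))  # scaled outputs are both close to A0 / W
```

Index 0 corresponds to x = 0, the center of the aperture, where the scaled field should sit near A0/W ≈ 14 V/m for both sampling choices.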
The beauty of this normalization is manifested in its simplicity and its
mathematical rigor. By utilizing the definitions of the FFT, iFFT, Fourier
transform, and inverse Fourier transform as its foundation, the proper scaling
factor has been derived to ensure that the true magnitudes of the electric field
distribution are recovered. Other normalization approaches can be undertaken. A
useful trick could have been to normalize the fields by the overall power observed
in the resulting near fields, followed by a scaling of Prad. However, this was not
mathematically satisfying. The 1/Δx normalization does not require any a priori
knowledge of the radiated power, making the overall procedure both fast and
mathematically sound.

Figure 1.13 Plots of the normalized 1-D electric field example for A0 = 10 V, W = 20λ, and f = 8.4
GHz with different sampling rates. (a) Resulting electric field distribution at z = 0 from
iFFT operation for N = 200, Δx = λ/4. (b) Resulting electric field distribution at z = 0
from iFFT operation for N = 300, Δx = λ/8.

Extending the normalization to two dimensions is relatively straightforward.
When applying the 2-D FFT and iFFT, the resulting electric field should be scaled
by 1/(ΔxΔy), similarly to the 1-D case. This is illustrated later with practical
examples.

1.4.3 The Sampling Theorem and Spectral Analysis

An interesting question that one might ask is how finely the electric field
distribution should be sampled in order to recover the electric field exactly. The
answer relies on the well-known Nyquist-Shannon sampling theorem, which for
two dimensions states that one can reconstruct a bandlimited function f(x, y) with
maximum spectral components Kx = max(|kx|) and Ky = max(|ky|) by using the sinc()
interpolation of regularly spaced samples as [19]

$$
f(x, y) = \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} f\!\left(\frac{n\pi}{K_x}, \frac{m\pi}{K_y}\right) \frac{\sin(K_x x - n\pi)}{K_x x - n\pi}\, \frac{\sin(K_y y - m\pi)}{K_y y - m\pi} \tag{1.88}
$$

In a qualitative sense, the Nyquist-Shannon theorem makes it clear that one must
take at least two samples per period of the fastest oscillation in the function, or
otherwise the resulting data can suffer from distortion. Thus, one must sample on
the order of the highest wavenumber as Δx = Δy = π/K = π/k = λ/2, where
K = max(Kx, Ky) = k and k is the wavenumber, assuming that the same sampling
rate is used in both dimensions.
In the context of the visible and invisible regions, any spectral components in
the invisible region can create very rapid oscillations in the near field. These rapid
variations in the near-field cannot be captured if only sampling the visible region.
This is the case when using the far-field patterns to predict the near field, since
only the visible region components can be observed in the far field. A high
presence of evanescent waves radiated by the antenna can limit the accuracy at
distances very close to the antenna, thus limiting the space at which the technique
remains applicable. For most large antennas of interest, the evanescent waves are
very minimal, and there is not a significant loss of accuracy in the observation
planes of interest.
It should be emphasized that if the complete PWS is known in the visible and
invisible regions, then the FFT and sinc() interpolation can be accomplished
without any significant loss of accuracy. Essentially, the presence of evanescent
waves manifests a physical limitation of the far-field to near-field technique, which
limits its application to electrically large antennas, which radiate negligible
evanescent components. To make these points clear, take an example PWS

$$
\mathbf{A}(k_x, k_y) = \hat{x}\left[\delta(k_x - k_{x0}) + \delta(k_x - k_{x1}) + \delta(k_x - k_{x2})\right]\delta(k_y) \tag{1.89}
$$

which has three spectral components located at (kx, ky) = (kx0, 0), (kx1, 0), and (kx2,
0). To make things interesting, we can place kx0 and kx1 in the visible region and kx2
in the invisible region. With this we have the corresponding electric field
distribution

$$
\mathbf{E}(x, y, z_0) = \hat{x}\left[\sum_{i=0}^{1} e^{-j\left(x k_{xi} + z_0\sqrt{k^2 - k_{xi}^2}\right)} + e^{-z_0\sqrt{k_{x2}^2 - k^2}}\, e^{-jk_{x2} x}\right] \tag{1.90}
$$

We choose values of kx0 = 0, kx1 = 0.707k, and kx2 = 1.5k to illustrate the points
being made and plot the electric field distribution for several planes of interest in
Figure 1.14. Clearly, the distribution changes dramatically from
one observation plane to the next, where rapid variations in Ex are observed in the
z0 = 0 plane compared to the other planes. This is due to the evanescent component
that would be observed near the source antenna. However, the rapid variations in
the field distribution attenuate as z0 increases, and the only components that
effectively contribute to the field distribution in Figure 1.14(c) are the two
components in the visible region. This is a direct result of the passband filtering
properties of space, as discussed in Section 1.2.3.3.
We can also test the case where only the visible components are known. In this
case, the electric field distribution takes the form

$$
\mathbf{E}(x, y, z_0) = \hat{x}\left[e^{-j\left(x k_{x0} + z_0\sqrt{k^2 - k_{x0}^2}\right)} + e^{-j\left(x k_{x1} + z_0\sqrt{k^2 - k_{x1}^2}\right)}\right] \tag{1.91}
$$

The distribution is plotted in Figure 1.15, where clear differences can be observed
when compared to the results shown in Figure 1.14. Without the invisible spectral
components, the distribution in Figure 1.15(a) at z0 = 0 is not representative of the
true physical reality of the electric field distribution. However, as z0 increases, the
invisible components become vastly attenuated, and the fields begin to appear very
similar. Even at z0 = λ/4, the distributions share most of the major features, and at
z0 = λ the two distributions in Figures 1.14(c) and 1.15(c) are almost identical. It is
interesting to note that it only takes one wavelength for the evanescent component
to decay to a negligible existence. This is quite a small distance, and from a
practical perspective it highlights the fact that the spectral technique can be readily
applied as long as the observation plane is not too close to the antenna.
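The decay of the evanescent term in (1.90) is easy to quantify. The sketch below evaluates the amplitude factor exp(−z0 √(kx² − k²)) with distances normalized to the wavelength; the function name and the normalization are illustrative choices.

```python
import math

def evanescent_attenuation(kx_over_k, z0_over_lambda):
    """Amplitude factor exp(-z0 * sqrt(kx^2 - k^2)) of an evanescent component.

    Both arguments are normalized: kx by the wavenumber k, z0 by the wavelength.
    """
    if kx_over_k <= 1.0:
        raise ValueError("component lies in the visible region and does not decay")
    alpha_over_k = math.sqrt(kx_over_k ** 2 - 1.0)
    return math.exp(-2.0 * math.pi * z0_over_lambda * alpha_over_k)

for z0 in (0.0, 0.25, 1.0):
    print(z0, evanescent_attenuation(1.5, z0))
```

For kx2 = 1.5k the factor is 1 at z0 = 0, about 0.17 at z0 = λ/4, and below 10⁻³ at z0 = λ, which is consistent with the near agreement between Figures 1.14(c) and 1.15(c).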

Figure 1.14 Magnitude of Ex in (1.90) at several different observation planes with the complete
knowledge of the PWS in the visible and invisible regions. In this example, the values
kx0 = 0, kx1 = 0.707k, and kx2 = 1.5k are chosen. (a) z0 = 0. (b) z0 = λ/4. (c) z0 = λ.

Figure 1.15 Predicted magnitude of Ex in (1.90) at several different observation planes with only the
knowledge of the PWS in the visible regions. In this example, the values kx0 = 0, kx1 =
0.707k, and kx2 = 1.5k are chosen. (a) z0 = 0. (b) z0 = λ/4. (c) z0 = λ.

The last point to highlight is the minimum sampling rate and the minimum
recoverable feature size in the electric field distribution. Since observations are
made in the far-field region, we are limited to only detecting the visible region of
the PWS, which implies that the maximum detectable wavenumber is

$$
K = \max\left(|k_x|, |k_y|\right) = k = \frac{2\pi}{\lambda} \tag{1.92}
$$

This means that the sampling period must be no larger than

$$
\Delta x, \Delta y \le \frac{\pi}{K} = \frac{\lambda}{2} \tag{1.93}
$$

in order to ensure that all details are obtained and the electric field can be
completely recovered. Note that this minimum sampling rate only works if there
are only visible components present. If the observation plane is at distances near
the antenna, then the evanescent waves can create rapid oscillations that cannot be
captured with this sampling rate. The minimum recoverable feature size is closely
related to the minimum sampling period. Only features that change over a
distance of λ/2 or longer will be captured with the visible region. Even if a finer
resolution in x and y is used, the fastest observable changes in the electric field
distribution will occur over a length of λ/2 when only the visible components are
used to predict the near fields. If the observation plane is near the antenna where
strong evanescent fields are present, then rapid oscillations in the electric field
distributions can occur that cannot be observed since the changes are
subwavelength (< λ/2). This is an important limitation that should be understood
when using the spectral domain analysis.

1.4.4 Far-Field Sampling Rates

In the previous section it was shown that the sampling periods must satisfy Δx and
Δy < λ/2, and this can be achieved through a well-directed sampling scheme in the
far field. Based on (1.69) and (1.70), there is a relationship between the sampling
rate in the near-field spatial domain and the sampling rate in the PWS, which
depends directly on the far field. Specifically, the spectral sampling period should
satisfy

$$
\Delta x = \frac{2\pi}{N\,\Delta k_x} \le \frac{\lambda}{2} \tag{1.94}
$$

$$
\Delta y = \frac{2\pi}{M\,\Delta k_y} \le \frac{\lambda}{2} \tag{1.95}
$$

leading to

$$
N\,\Delta k_x \ge \frac{4\pi}{\lambda} = 2k \tag{1.96}
$$

$$
M\,\Delta k_y \ge \frac{4\pi}{\lambda} = 2k \tag{1.97}
$$

which restates the sampling theorem from another perspective.
Another important criterion is the overall size of the antenna under
consideration. In most scenarios, it is generally desired to obtain the near-field
distribution over an observation plane whose area is larger than the physical size of
the antenna. This implies that the factor NΔx ≥ D, where D is the largest
dimension of the antenna, leading (for Δx = λ/2) to a constraint on N as

$$
N \ge \frac{2D}{\lambda} \tag{1.98}
$$

and similarly for the y component. Note that when increasing N, we will increase
the factor NΔx, resulting in a smaller spectral sampling period. If we assume that N
> 2D/λ and Δx = λ/2, then the corresponding spectral sampling period is

$$
\Delta k_x \le \frac{2\pi}{D} \tag{1.99}
$$

This is an interesting result that is expected from antenna theory [17]. As the
largest antenna dimension D increases, the PWS (and the far field) will have
increasingly faster variations. Specifically, the beamwidth becomes narrower and
the number of observed side lobes increases. This implies that one must properly
sample the far field in order to observe all of its features in the near field. This is
depicted in Figure 1.16, where the far-field patterns of differently sized antennas
are shown. The antenna with D = 30λ requires many more sampling points
compared to the one with D = 5λ. The resulting PWS for the D = 30λ antenna will
have very rapid oscillations that call for a smaller sampling period to capture all of
the PWS features.
Another physical interpretation of (1.99) can be obtained by expanding the
terms in the inequality. We can manipulate the inequality to shed some insight on
the sample spacing needed when sampling the far fields. Since the PWS will be
sampled in the visible region, we can write the spectral sampling period as

$$
\Delta k_x = k_{x1} - k_{x0} = k\sin\theta_1\cos\phi_1 - k\sin\theta_0\cos\phi_0 \tag{1.100}
$$

where the angles (θ1, φ1) and (θ0, φ0) represent far-field angles corresponding to
samples of the far fields. Next, we assume that the observation angle lies in the
principal x-z plane for simplicity, that is, φ = 0, and also assume that
θ1 = θ + Δθ/2 and θ0 = θ − Δθ/2. Then it can be shown that

$$
\Delta k_x = 2k\cos\theta\,\sin\!\left(\frac{\Delta\theta}{2}\right) \le \frac{2\pi}{D} \tag{1.101}
$$

For Δθ → 0 we can approximate this inequality using sin(x) ≈ x

$$
\Delta\theta \le \frac{\lambda}{D\cos\theta} \tag{1.102}
$$

which is very similar to the diffraction limit seen in optics [20]. When θ = 0, the
inequality in (1.102) leads to an angular far-field spacing of Δθ < λ/D, which is a
good starting point for sampling the far-field data. In practice, researchers typically
use smaller sampling periods in order to ensure that the far-field patterns have
sufficient sampling.
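The intermediate step behind the angular sampling bound can be written out explicitly. The following is a brief sketch of the algebra, assuming φ = 0 and samples placed symmetrically about θ as in the text, and using only the sine difference identity and k = 2π/λ:

```latex
\begin{align}
\Delta k_x
  &= k\left[\sin\!\left(\theta + \tfrac{\Delta\theta}{2}\right)
          - \sin\!\left(\theta - \tfrac{\Delta\theta}{2}\right)\right]
   = 2k\cos\theta\,\sin\!\left(\tfrac{\Delta\theta}{2}\right) \\
  &\approx k\,\Delta\theta\,\cos\theta \le \frac{2\pi}{D}
  \quad\Longrightarrow\quad
  \Delta\theta \le \frac{2\pi}{k D\cos\theta} = \frac{\lambda}{D\cos\theta}
\end{align}
```

The cos θ in the denominator shows that a uniform spacing in kx corresponds to a coarser allowable angular spacing away from broadside.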

Figure 1.16 Illustration of how larger antennas can have faster variations in the PWS and far field.
(a) Far-field magnitude pattern of an antenna with dimensions on the order of D = 30λ.
(b) Far-field magnitude pattern of antenna with dimensions on the order of D = 5λ. Note
that for both plots.

Notice that the conditions derived in this section and the previous section
merely show how to approach a decision when it comes to the sampling periods in
both space and spectrum. It is recommended to use even higher sampling rates in
order to ensure more accurate results. A typical recommendation for sampling the
far field is to ensure that each sidelobe has a few points sampling it. A reasonable
choice is to sample the far-field patterns with 10 points within one
sidelobe or within the half-power beamwidth, leading to the final recommended
sampling period given as

$$
\Delta\theta \approx \frac{6}{D/\lambda} \tag{1.103}
$$

where Δθ is in degrees. This ensures that all features of the far fields get properly
incorporated into the PWS in order to compute the near fields.
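The sampling rules of this section can be collected into a small helper. This is a sketch under the assumptions stated in the text (Δx = λ/2, observation in the principal plane); the function name and the returned dictionary keys are hypothetical.

```python
import math

def sampling_budget(d_over_lambda, theta_deg=0.0):
    """Collect the sampling limits for an antenna of size D (given in wavelengths)."""
    cos_theta = math.cos(math.radians(theta_deg))
    return {
        # (1.98): minimum number of spatial samples when dx = lambda / 2
        "n_min": math.ceil(2.0 * d_over_lambda),
        # (1.99): maximum spectral step 2*pi/D, normalized by k = 2*pi/lambda
        "dkx_max_over_k": 1.0 / d_over_lambda,
        # (1.102): maximum angular step lambda / (D cos(theta)), in degrees
        "dtheta_max_deg": math.degrees(1.0 / (d_over_lambda * cos_theta)),
        # (1.103): recommended angular step of 6 / (D / lambda) degrees
        "dtheta_recommended_deg": 6.0 / d_over_lambda,
    }

large = sampling_budget(30.0)  # D = 30 wavelengths
small = sampling_budget(5.0)   # D = 5 wavelengths
```

For D = 30λ the upper limit from (1.102) is a little under 2° at broadside, while the recommendation in (1.103) is 0.2°, roughly a tenfold margin.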

1.4.5 Interpolating the Far Fields

Knowledge of the far-field patterns is obtained either via measurement
or simulation. In each of those cases, it is natural to obtain the far-field patterns
with angles (α, β) having a uniform angular separation (Δα, Δβ) throughout the
pattern, where each observation angle is defined by

$$
\alpha_n = n\,\Delta\alpha = \frac{n}{N}\,\alpha_{\max}, \quad n = 0, \ldots, N-1 \tag{1.104}
$$

$$
\beta_m = m\,\Delta\beta = \frac{m}{M}\,\beta_{\max}, \quad m = 0, \ldots, M-1 \tag{1.105}
$$

One popular example is using spherical angles (α, β) = (θ, φ), but others such as
(AZ, EL) are also used frequently. Given the observation angles, such as (θ, φ),
one can compute their corresponding (kx, ky) points in the spectral domain using
(1.27). An example of this is shown in Figure 1.17(a), where the sample locations
are indicated by an X. We will denote this type of grid as the uniform spherical
grid.
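The uniform spherical grid and its image in the spectral domain can be generated with a few lines. The sketch below assumes the mapping (kx, ky) = (k sin θ cos φ, k sin θ sin φ) quoted in Section 1.4.1 and the angle definitions of (1.104) and (1.105); the function name and default arguments are illustrative.

```python
import math

def spherical_grid_to_spectral(n_theta, n_phi, theta_max=math.pi / 2.0,
                               phi_max=2.0 * math.pi, k=1.0):
    """Map a uniform (theta, phi) grid to (kx, ky) sample locations."""
    points = []
    for n in range(n_theta):
        theta = n * theta_max / n_theta      # (1.104) with alpha = theta
        for m in range(n_phi):
            phi = m * phi_max / n_phi        # (1.105) with beta = phi
            kx = k * math.sin(theta) * math.cos(phi)
            ky = k * math.sin(theta) * math.sin(phi)
            points.append((kx, ky))
    return points

points = spherical_grid_to_spectral(4, 8)
```

The resulting points lie on concentric circles inside the visible region kx² + ky² ≤ k² and do not fall on a rectangular lattice.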
As shown in Section 1.4.1, the FFT requires a rectangular sample grid in terms
of kx and ky, which is depicted in Figure 1.17(b). Nonuniform grid FFTs are also
available; however, in this chapter we will focus on the uniform-grid FFT, which is
more common and more readily available. Unfortunately the uniform spherical
grid that is typical for far-field data does not immediately provide the data at the
rectangular grid points. The remedy for this problem is to utilize interpolation
techniques to find the PWS values at the desired grid points. Since the far-field
data is complex, it becomes important to discuss the interpolation approach. The
data can be separated into either its magnitude and phase components or into its
real and imaginary components. As a rule of thumb, the best choice often lies in
the smoothness of the function. With complex data, the magnitude, real, and
imaginary parts are usually reasonably smooth for most data sets. However, phase
is often wrapped into the [−π, π] range by most programs, and thus discontinuities
are a typical feature of a phase distribution. These can produce inaccurate results.
Therefore, it is generally recommended to interpolate the real and imaginary
components independently to obtain more accurate results. Throughout the rest of
this section, when we refer to interpolating the far-field data, it is assumed that we
are interpolating the real and imaginary parts separately.
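The effect of a wrapped phase on interpolation can be seen with two samples that straddle the ±180° branch cut. The sketch below compares linear interpolation of the real and imaginary parts with linear interpolation of magnitude and wrapped phase; the sample values and function names are hypothetical.

```python
import cmath
import math

def interp_real_imag(z0, z1, t):
    """Linear interpolation of the real and imaginary parts separately."""
    return complex(z0.real + t * (z1.real - z0.real),
                   z0.imag + t * (z1.imag - z0.imag))

def interp_mag_wrapped_phase(z0, z1, t):
    """Linear interpolation of magnitude and of phase wrapped into (-pi, pi]."""
    mag = abs(z0) + t * (abs(z1) - abs(z0))
    phase = cmath.phase(z0) + t * (cmath.phase(z1) - cmath.phase(z0))
    return cmath.rect(mag, phase)

# Two unit-magnitude samples whose phases are +179 and -179 degrees.
z_a = cmath.rect(1.0, math.radians(179.0))
z_b = cmath.rect(1.0, math.radians(-179.0))
mid_ri = interp_real_imag(z_a, z_b, 0.5)              # close to -1
mid_phase = interp_mag_wrapped_phase(z_a, z_b, 0.5)   # lands at +1, opposite sign
```

The two samples are only 2° apart on the unit circle, yet averaging their wrapped phases gives 0° instead of 180°, which flips the sign of the interpolated field.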

Figure 1.17 (a) Location of the spherical grid points (θ, φ) on the spectral kx and ky domain. The X
markers indicate the location of a PWS sample given a uniformly spaced spherical grid.
(b) Rectangular grid of (kx, ky) points in the spectral domain that is needed for the FFT.

Interpolating the far-field data can become quite time-consuming, especially
when working with large data sets. When deciding on an interpolation scheme,
there are a few things to consider. Given a point of interest and the known data,
one must choose how many neighboring points to use in the approximation.
Related to that issue is the computational requirement of the algorithm as well as
the continuity of the resulting interpolant. Some methodologies provide continuity
in the first and second derivatives as well as the interpolant itself, while others
provide discontinuous interpolants. The domain in which the interpolation is
performed is also another choice to consider. A natural choice is to interpolate in
the (kx, ky) domain, but interpolation could also effectively be accomplished in the
(θ, φ) domain just as well.
Many of the choices at hand depend on the sampling grid in which the far-
field data is obtained. Most of the interpolation methods will execute the
interpolation by using the information of known data points in the neighborhood of
the point of interest. Thus, searching for the neighboring points must be
accomplished in nearly every interpolation algorithm, which can be one of the
most time-consuming processes. With more intelligent grids, the search can be
carried out faster. The most ideal is a regular grid of data in the domain of interest,
like the grid shown in Figure 1.17(b). Searching for points within a regular grid is
simplified to a small computation of the indices, which can be predicted by the
sampling periods. A curvilinear grid, like the one shown in Figure 1.17(a), is
somewhat more difficult, but can also be predicted through inverse mapping
functions. However, if the inverse functions are mathematically intractable or
challenging to compute (or if the data is on a random grid), then one may proceed
to use algorithms targeting scattered grids. However, it is highly encouraged to
spend the effort if possible in utilizing intelligent grids; a dramatic acceleration in
the interpolation process can be observed compared to schemes using scattered
grids.
One of the simplest interpolation techniques is the nearest neighbor
approximation, where the value of the point of interest is assigned the value of the
nearest neighboring point. For a given grid, this technique will likely produce the
fastest results with the least amount of memory requirements. However, a finely
meshed grid with a small maximum spacing between samples must be available, or
this technique suffers in accuracy. It will also produce a discontinuous interpolant,
which can be undesirable. This technique is useful when working under severe
memory and hardware speed constraints, but for many applications this technique
is not used.
A popular technique that provides a continuous interpolant is the bilinear
interpolation technique, which has also been referred to as the four-point bivariate
Lagrangian method [9]. One critical assumption in this technique is that the known
sampled data is on a regular (or even rectilinear) grid. First, the four points
neighboring the point of interest are identified at x11 = (x1, y1), x12 = (x1, y2), x21 =
(x2, y1), and x22 = (x2, y2), and the value at (x, y) is computed by

$$
\begin{aligned}
f(x, y) \approx{} & \frac{(x - x_2)(y - y_2)}{(x_1 - x_2)(y_1 - y_2)}\, f_{11} + \frac{(x - x_2)(y - y_1)}{(x_1 - x_2)(y_2 - y_1)}\, f_{12} \\
& + \frac{(x - x_1)(y - y_2)}{(x_2 - x_1)(y_1 - y_2)}\, f_{21} + \frac{(x - x_1)(y - y_1)}{(x_2 - x_1)(y_2 - y_1)}\, f_{22}
\end{aligned} \tag{1.106}
$$

where fij = f(xi, yj). It is interesting to note that contrary to the name, the resulting
formula is actually not a linear function in x and y, since a resulting xy term can be
found when expanding this formula. The equation is quite fast to compute for each
point of interest, and rapid results can be achieved with this technique while still
maintaining good accuracy. The speed is achieved in both the search phase and the
computation phase. The only constraint is that the grid must be rectilinear, which
limits its use to only certain sets of far-field data. This can usually be accomplished
by interpolating in the (θ, φ) domain rather than the (kx, ky) domain, since it is
popular to discretize the angles in a regular grid. Other more computationally
complex algorithms exist for those interested in obtaining more accurate results.
The cubic spline algorithms can be extended to two dimensions, resulting in
solutions with continuous interpolants and first derivatives. These techniques also
require regular grids.
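A direct transcription of (1.106), together with the index computation that makes the neighbor search on a regular grid trivial, might look as follows. This is a sketch only: the grid layout (grid[i][j] holding the value at x0 + i·dx, y0 + j·dy), the clamping at the upper edge, and the function names are assumptions.

```python
def bilinear(x, y, x1, x2, y1, y2, f11, f12, f21, f22):
    """Four-point bilinear interpolation as written in (1.106)."""
    return (
        (x - x2) * (y - y2) / ((x1 - x2) * (y1 - y2)) * f11
        + (x - x2) * (y - y1) / ((x1 - x2) * (y2 - y1)) * f12
        + (x - x1) * (y - y2) / ((x2 - x1) * (y1 - y2)) * f21
        + (x - x1) * (y - y1) / ((x2 - x1) * (y2 - y1)) * f22
    )

def interp_regular_grid(grid, x0, y0, dx, dy, x, y):
    """Interpolate on a regular grid where grid[i][j] = f(x0 + i*dx, y0 + j*dy)."""
    # On a regular grid the enclosing cell follows from the sampling periods alone.
    i = min(max(int((x - x0) / dx), 0), len(grid) - 2)
    j = min(max(int((y - y0) / dy), 0), len(grid[0]) - 2)
    x1, x2 = x0 + i * dx, x0 + (i + 1) * dx
    y1, y2 = y0 + j * dy, y0 + (j + 1) * dy
    return bilinear(x, y, x1, x2, y1, y2,
                    grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])

def sample_function(x, y):
    """Hypothetical test function containing the xy cross term."""
    return 1.0 + 2.0 * x + 3.0 * y + 4.0 * x * y

grid = [[sample_function(i * 1.0, j * 2.0) for j in range(3)] for i in range(3)]
value = interp_regular_grid(grid, 0.0, 0.0, 1.0, 2.0, 0.5, 3.0)
```

Because the interpolant contains exactly the terms 1, x, y, and xy, it reproduces the test function without error, which illustrates the remark above that the formula is not linear in x and y.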
If the grid cannot be easily exploited to find the nearest points, one can
perform an exhaustive search of all points to find its N nearest neighbors. With this
information, one can interpolate using an inverse distance weighting of its N
nearest neighbors [21]. Another interesting approach is the linear-triangular
approach, where three points are chosen based on a certain triangulation scheme.
Delaunay triangulation provides a convenient and efficient method to triangulate
the sampled data with many nice properties. Once the triangulation is found, then
for a given point with its surrounding triangle we can compute the approximate
value by

\[ f(x, y) \approx \lambda_1 f(x_1, y_1) + \lambda_2 f(x_2, y_2) + \lambda_3 f(x_3, y_3) \tag{1.107} \]

where λi are the barycentric coordinates of the triangle defined for the spatial
coordinate of interest x by

\[ \mathbf{x} = \lambda_1 \mathbf{x}_1 + \lambda_2 \mathbf{x}_2 + \lambda_3 \mathbf{x}_3 \tag{1.108} \]

\[ \lambda_1 + \lambda_2 + \lambda_3 = 1 \tag{1.109} \]

and xi are the three vertices of the triangle [22]. While the interpolants are fast to
compute, the triangulation can be extremely time-consuming for large data sets.
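The linear-triangular step of (1.107) through (1.109) can be sketched as follows, assuming the enclosing triangle has already been found by some triangulation. The function names and the test triangle are hypothetical; the weights are obtained by solving (1.108) and (1.109) in closed form.

```python
def barycentric_weights(p, p1, p2, p3):
    """Solve (1.108)-(1.109) for the three barycentric coordinates of point p."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, p1, p2, p3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0.0:
        raise ValueError("degenerate triangle")
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def interpolate_in_triangle(p, vertices, values):
    """Evaluate (1.107) from the samples at the three triangle vertices."""
    l1, l2, l3 = barycentric_weights(p, *vertices)
    return l1 * values[0] + l2 * values[1] + l3 * values[2]


# Hypothetical samples of the linear function f = 2x + 3y + 1 at the vertices.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
samples = [1.0, 3.0, 4.0]
value = interpolate_in_triangle((0.2, 0.3), triangle, samples)
```

A point lies inside the triangle when all three weights are nonnegative, which gives a simple membership test during the search phase.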
These interpolation algorithms have been implemented in a number of
different packages available online, including the functions built into MATLAB.
The bilinear and the cubic splines methods can be executed through the interp2
function. For scattered data points, the griddata function implements the
Delaunay triangulation followed by the linear-triangular interpolation. To highlight
the importance of intelligent grid approaches versus scattered data approaches, we
have tested the runtime when using an example of each approach. We tested this
on a sample size of 3,721 × 3,601 points, and our goal was to interpolate the data
to a set of 2,001 × 2,001 points. We applied the bilinear approach in the (θ, φ)
domain using the interp2 function in MATLAB and compared it to the linear-
triangular approach in MATLAB using the griddata function. The bilinear approach
ran 336 times faster than the linear-triangular approach because of the lengthy
triangulation that the latter requires. Both
procedures produced almost identical data. Clearly, the choice of interpolation is
extremely critical in reducing the overall computational time, and one must make
an informed decision on this matter.

1.4.6 Subtle Issues When Implementing the FFT and iFFT Using Pre-Built
Packages and Libraries

One of the greatest advantages in using the FFT is the plethora of packages and
research purely devoted to making its computation faster and more efficient. Since
the advent of personal computing and the internet, many packages and libraries
have been and are being developed to perform even the most challenging of tasks,
including the FFT. Some examples are the Fastest Fourier Transform in the West
(FFTW) subroutine library for C and C++ [23] and the FFTPACK Fortran
packages [24]. MATLAB currently implements the FFTW library with their built-
in functions fft2 and ifft2 [25].
While utilizing these packages can avoid spending large amounts of time
writing code, it is important to recognize some subtle differences in the common
implementation of the FFT and the FFT defined in this chapter. The most
common definition of the FFT and iFFT uses a negative and a positive sign in the
exponent, respectively, which we will denote by

\[ G_m = \widetilde{\mathcal{F}}_m\{g(n)\} = \sum_{n=0}^{N-1} g(n)\, e^{-j 2\pi mn/N} \tag{1.110} \]

\[ g(n) = \widetilde{\mathcal{F}}_n^{-1}\{G_m\} = \frac{1}{N} \sum_{m=0}^{N-1} G_m\, e^{j 2\pi mn/N} \tag{1.111} \]

While each algorithm implements the transform differently, the bottom line is that
precoded packages have the opposite sign in the exponential compared to this
chapter's definitions in (1.76) and (1.77), written here without the tilde. In order
to circumvent this issue, one can rearrange the terms and use some mathematical
manipulations to ensure a simple and clear implementation. It can be easily shown
that the following identities hold:

\[ \mathcal{F}_m\{g(n)\} = \left( \widetilde{\mathcal{F}}_m\{g^*(n)\} \right)^* \tag{1.112} \]

\[ \mathcal{F}_m^{-1}\{g(n)\} = \left( \widetilde{\mathcal{F}}_m^{-1}\{g^*(n)\} \right)^* \tag{1.113} \]

where the asterisk denotes complex conjugation. For example, in order to implement
the FFT operation defined by (1.76) and (1.77) in MATLAB, we could use the
following code
G = conj(fft2(conj(g)));
or for the iFFT operation
g = conj(ifft2(conj(G)));
where the conj function performs complex conjugation on the input matrix. This
can also easily be implemented in other languages using the built-in functions such
as CONJG in FORTRAN or the std::conj function for std::complex in C++.
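The conjugation identity can be checked numerically with a small sketch. The naive O(N²) `dft` helper below is only a stand-in for a packaged FFT (an assumption made to keep the example dependency-free); `library_fft` plays the role of a routine with the negative exponent of (1.110), and the sample vector is arbitrary.

```python
import cmath


def dft(g, sign):
    """Naive unscaled DFT: sum over n of g[n] * exp(sign * j*2*pi*m*n/N)."""
    n_pts = len(g)
    return [
        sum(g[n] * cmath.exp(sign * 2j * cmath.pi * m * n / n_pts) for n in range(n_pts))
        for m in range(n_pts)
    ]


def library_fft(g):
    """Stand-in for a packaged FFT with the negative exponent of (1.110)."""
    return dft(g, -1)


# Arbitrary complex test vector (hypothetical data).
g = [1 + 2j, -0.5j, 3 + 0j, 2 - 1j]

# Positive-exponent transform obtained by conjugating input and output.
via_conj = [v.conjugate() for v in library_fft([s.conjugate() for s in g])]

# Direct evaluation of the positive-exponent sum for comparison.
direct = dft(g, +1)
max_err = max(abs(a - b) for a, b in zip(via_conj, direct))
```

The same wrapper works unchanged around a two-dimensional routine, since conjugation acts elementwise on both the input and the output.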
The difference in the equations can be partially attributed to the choice of time
convention. If the physics time convention were chosen, that is, exp(−iωt), then
the opposite sign would appear in each of the exponentials, matching with the
conventional FFT/iFFT definitions. However, the engineering notation,
exp(jωt), is frequently used within the antenna engineering community, and
thus it was chosen for convenience.
The last detail to consider when implementing the FFT is the arrangement of
the spectral frequencies. In the context of time-frequency signals, each number in
the resulting output vector of the FFT corresponds to some spectral component ωn.
Since the FFT is periodic with 2π, the range of frequencies can be either (0, 2π) or
(−π, π), and the choice is somewhat arbitrary. With many of the packages
available, the output of the FFT typically corresponds to the (0, 2π) spectral
frequencies. Unfortunately, this representation is not convenient for the near-field
applications, and some adjustments to the data must be made in order to
make full use of existing algorithms. Since A(kx + 2π/Δx, ky) = A(kx, ky + 2π/Δy) =
A(kx + 2π/Δx, ky + 2π/Δy) = A(kx, ky), a circular element shift in the data array can
be used to reorient the data in a (−π, π) range. In MATLAB, this can be
accomplished using the fftshift function, which circularly shifts the elements
so that the zero-frequency component moves to the center of the array. Conversely,
many existing iFFT function implementations must
have the data provided as an argument in the range of (0, 2π). Again, a circular
shift can be used to accommodate this requirement, and MATLAB provides the
function ifftshift to accomplish the circular shift [25].
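The reordering can be illustrated in one dimension with plain Python lists. The two helpers below are hand-written sketches that mimic the circular shifts described above; they are not the MATLAB functions themselves, and the bin labels are illustrative.

```python
def fftshift(a):
    """Circularly shift a list so that element 0 moves to index len(a) // 2."""
    s = len(a) // 2
    return a[len(a) - s:] + a[:len(a) - s]


def ifftshift(a):
    """Undo fftshift, returning the data to the (0, 2*pi) ordering."""
    s = len(a) // 2
    return a[s:] + a[:s]


n_pts = 8
# Signed bin index of each FFT output sample: bins at or above N/2 alias to
# negative frequencies because the spectrum is periodic.
signed_bins = [m if m < n_pts / 2 else m - n_pts for m in range(n_pts)]
centered = fftshift(signed_bins)   # ordering over the (-pi, pi) range
restored = ifftshift(centered)     # back to the ordering an iFFT expects
```

For a two-dimensional spectrum the same shift is applied along each axis in turn. With an odd number of samples the two shifts differ by one element, which is why a separate inverse function is needed.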

1.5 COORDINATE TRANSFORMATIONS FOR GENERALIZED
SIMULATION AND MEASUREMENT SYSTEMS

When conducting antenna simulations and measurements it may be convenient to
acquire the far-field patterns in a coordinate system other than the coordinate
system of interest. Many antenna simulation and measurement software packages
assume a certain coordinate system and generate the far-field vector
components defined by three orthogonal unit vectors (û, v̂, ŵ). In nearly all cases,
the vector representation changes for different coordinate systems, that is,
(û1, v̂1, ŵ1) ≠ (û2, v̂2, ŵ2). In the near-field analysis problem, the available far-field
data is often provided in a coordinate system different from the coordinate
system that fits the needs of the requirement or specification. Therefore, coordinate
transformations become necessary to convert the fields from the measurement
coordinate system (MCS) (xm, ym, zm) to the coordinate system of interest, which
we denote as the desired back-projection coordinate system (BCS) (xb, yb, zb).
Figure 1.18 exemplifies the general case, where the far-field pattern coordinate
system is both translated and rotated in comparison to the measurement coordinate
system. Note that the measurement coordinate system does not have to exclusively
refer to measurement data and coordinates. This can also represent far-field data
found from simulation that was only available for one specific coordinate system.
Since our application will perform the transformation in the far field, some
approximations and assumptions can be made in order to simplify the final
relations. First, the far-field electric field can be written as

\[ \mathbf{E}_{m,u}(r_m, v_m, w_m) = E_0 \frac{e^{-jkr_m}}{r_m}\, \mathbf{f}(v_m, w_m) \tag{1.114} \]

which is a general result that is evident in (1.29) and can also be proven through
vector potentials [14]. The factor E0 should be scaled according to the
normalization scheme discussed in Section 1.3. Notice that we assume that the first
measurement coordinate is um = rm in order to make the analysis more amenable to
the concept of the far-field. This will be assumed throughout the rest of the section.
Starting with the available complex vector far-field electric fields in the MCS
(Erm, Evm, Ewm), we attempt to find the complex vector electric field components in
the desired BCS (Exb, Eyb, Ezb) in order to compute the near-field prediction in the
desired planar area of interest. A systematic and intuitive approach uses the
following step-by-step procedure [26, 27]:
1. Convert the given field data (Erm, Evm, Ewm) defined by (r̂m, v̂m, ŵm) into their
rectangular components (Exm, Eym, Ezm).
2. Transform the rectangular field components (Exm, Eym, Ezm) into desired
coordinate system rectangular components (Exb, Eyb, Ezb) using Eulerian
angles.
3. For each given MCS location rm, vm, wm (or xm, ym, zm) associated with the
field data, compute the location with respect to the BCS.
In step 1, converting the given field data (Erm, Evm, Ewm) into the rectangular
components can be accomplished using transformation matrices. An important
assumption is that the 3-tuple (r̂m, v̂m, ŵm) forms a set of orthonormal vectors
throughout all three-dimensional space, that is, r̂m · v̂m = v̂m · ŵm = r̂m · ŵm = 0 for
all (xm, ym, zm). With this assumption, the transformation matrix can be written as
a matrix whose elements are the projection of the vector components (Erm, Evm,
Ewm) onto the rectangular directions, as

\[
\overline{T}_{m,ru} =
\begin{bmatrix}
\hat{r}_m \cdot \hat{x}_m & \hat{v}_m \cdot \hat{x}_m & \hat{w}_m \cdot \hat{x}_m \\
\hat{r}_m \cdot \hat{y}_m & \hat{v}_m \cdot \hat{y}_m & \hat{w}_m \cdot \hat{y}_m \\
\hat{r}_m \cdot \hat{z}_m & \hat{v}_m \cdot \hat{z}_m & \hat{w}_m \cdot \hat{z}_m
\end{bmatrix}
\tag{1.115}
\]

where the subscript “ru” denotes the conversion from the uvw components to the
rectangular components. With the transformation matrix at hand, the rectangular
components can be computed from

\[ \mathbf{E}_{m,r} = \overline{T}_{m,ru}\, \mathbf{E}_{m,u} \tag{1.116} \]

For example, in the conversion of spherical vector components to rectangular
components, we have (r̂m, v̂m, ŵm) = (r̂m, θ̂m, φ̂m), which results in

\[
\begin{bmatrix} E_{m,x} \\ E_{m,y} \\ E_{m,z} \end{bmatrix}
=
\begin{bmatrix}
\sin\theta\cos\phi & \cos\theta\cos\phi & -\sin\phi \\
\sin\theta\sin\phi & \cos\theta\sin\phi & \cos\phi \\
\cos\theta & -\sin\theta & 0
\end{bmatrix}
\begin{bmatrix} E_{m,r} \\ E_{m,\theta} \\ E_{m,\phi} \end{bmatrix}
\tag{1.117}
\]


as expected. In step 2, the rectangular components in the measurement coordinate
system must be transformed into rectangular components with respect to the far-
field pattern coordinate system. This can also be visualized as computing the
projection of each rectangular component in the MCS onto the axes of the BCS,
which are shown in Figure 1.18. The two coordinate systems are displaced by rbm,
but this does not affect the projections since the rectangular unit vectors are
position invariant. The transformation matrix for this operation would appear as

\[
\overline{T}_{bm} =
\begin{bmatrix}
\hat{x}_b \cdot \hat{x}_m & \hat{x}_b \cdot \hat{y}_m & \hat{x}_b \cdot \hat{z}_m \\
\hat{y}_b \cdot \hat{x}_m & \hat{y}_b \cdot \hat{y}_m & \hat{y}_b \cdot \hat{z}_m \\
\hat{z}_b \cdot \hat{x}_m & \hat{z}_b \cdot \hat{y}_m & \hat{z}_b \cdot \hat{z}_m
\end{bmatrix}
\tag{1.118}
\]

Figure 1.18 Coordinate system transformations are critical in converting the fields from one
coordinate system to another. Many times the far-field patterns are measured using an
MCS predefined by the measurement system denoted with the subscript m. The data has
to be converted using coordinate system transformations to obtain the data represented
in the far-field BCS, denoted by the subscript b. 

A simple and elegant expression of this transformation matrix decomposes this
matrix into three-dimensional rotation matrices. Any general coordinate system
orientation can be written as three successive coordinate system rotations, each
about a single axis. The three angles through which the coordinate system must
rotate to align with another coordinate
system are known as the Eulerian angles. The choice of axis rotations is somewhat
arbitrary, but a popular one used in the antenna community is to first rotate the
coordinate system about the z′m-axis by α. This results in a coordinate system
whose x-axis is aligned with x″, as shown in Figure 1.19. Note that the primed
coordinate system (x′m, y′m, z′m) has the same orientation as the MCS with its origin
displaced to the origin of the BCS. Subsequently, the coordinate system is rotated
about x″ (the line of nodes) by β, followed by a rotation of γ about the zb-axis. We
apply each of these rotations in the order given by

\[ \overline{T}_{bm} = \overline{R}_{z_b}(\gamma)\, \overline{R}_{x''}(\beta)\, \overline{R}_{z'_m}(\alpha) \tag{1.119} \]

where R_z′m(α), R_x″(β), and R_zb(γ) are the rotation matrices about the
corresponding z-, x-, and z-axes, defined as

\[
\overline{R}_{z'_m}(\alpha) =
\begin{bmatrix}
\cos\alpha & \sin\alpha & 0 \\
-\sin\alpha & \cos\alpha & 0 \\
0 & 0 & 1
\end{bmatrix}
\tag{1.120}
\]

\[
\overline{R}_{x''}(\beta) =
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\beta & \sin\beta \\
0 & -\sin\beta & \cos\beta
\end{bmatrix}
\tag{1.121}
\]

\[
\overline{R}_{z_b}(\gamma) =
\begin{bmatrix}
\cos\gamma & \sin\gamma & 0 \\
-\sin\gamma & \cos\gamma & 0 \\
0 & 0 & 1
\end{bmatrix}
\tag{1.122}
\]

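With the transformation matrix now defined, the rectangular vector components in
the BCS can be computed as

\[ \mathbf{E}_{b,r} = \overline{T}_{bm}\, \mathbf{E}_{m,r} \tag{1.123} \]

leading to the final resulting equation to obtain the far-field vector components
with respect to the BCS

\[ \mathbf{E}_{b,r} = \overline{R}_{z_b}(\gamma)\, \overline{R}_{x''}(\beta)\, \overline{R}_{z'_m}(\alpha)\, \overline{T}_{m,ru}\, \mathbf{E}_{m,u} \tag{1.124} \]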
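Steps 1 and 2 can be sketched together in a few lines of Python. The helper names, the Eulerian angles, and the field sample below are hypothetical; the matrices follow the forms of (1.117) and (1.120) through (1.122), and the example assumes that the MCS data is given in spherical components.

```python
import math


def rot_z(angle):
    """Rotation about a z-axis, in the form of (1.120) and (1.122)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    """Rotation about an x-axis, in the form of (1.121)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def t_bm(alpha, beta, gamma):
    """Compose the three rotations in the order of (1.119)."""
    return matmul(rot_z(gamma), matmul(rot_x(beta), rot_z(alpha)))


def t_spherical(theta, phi):
    """Spherical-to-rectangular matrix of (1.117)."""
    st, ct, sp, cp = math.sin(theta), math.cos(theta), math.sin(phi), math.cos(phi)
    return [[st * cp, ct * cp, -sp], [st * sp, ct * sp, cp], [ct, -st, 0.0]]


def to_bcs(e_sph, theta, phi, alpha, beta, gamma):
    """Apply (1.124): spherical MCS components to rectangular BCS components."""
    return matvec(t_bm(alpha, beta, gamma), matvec(t_spherical(theta, phi), e_sph))


# Hypothetical far-field sample (E_r, E_theta, E_phi) and Eulerian angles.
e_b = to_bcs([0.0, 1.0, 0.5j], math.radians(30), math.radians(40),
             math.radians(10), math.radians(20), math.radians(30))
power = sum(abs(c) ** 2 for c in e_b)  # unchanged by the orthogonal transformations
```

Since every matrix in the chain is orthogonal, the squared magnitude of the field vector is preserved, which is a convenient sanity check on an implementation.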

Figure 1.19 Eulerian angles used to transform vector components in the MCS to the back-projection
coordinate system. Note that the primed measurement coordinate system (x′m, y′m, z′m)
has the same orientation as the MCS but the origin has been displaced to the origin of
the BCS.

The third and last step is to determine each point’s location in the BCS given its
coordinates in the MCS. Assuming that every point has a known (um, vm, wm)
coordinate, we can write its position vector as

\[ \mathbf{r}_m = r_{xm}(u_m, v_m, w_m)\, \hat{x}_m + r_{ym}(u_m, v_m, w_m)\, \hat{y}_m + r_{zm}(u_m, v_m, w_m)\, \hat{z}_m \tag{1.125} \]

where rxm, rym, and rzm are the transformation functions relating the (um, vm, wm)
coordinates to the (xm, ym, zm) position. With the position vector, (1.123), and the
origin displacement vector rbm, it can be shown that the observation point location
can be found in rectangular coordinates with respect to the BCS by the relationship

\[ \mathbf{r}_b = \overline{T}_{bm} \left( \mathbf{r}_m - \mathbf{r}_{bm} \right) \tag{1.126} \]

which is depicted in Figure 1.18. This is useful as it provides a direct link between
the coordinates (um, vm, wm) of an observation point in the MCS to the coordinates
(xb, yb, zb) with respect to the BCS.
In the far field, the distance between the coordinate systems is negligible
compared to the distances to the observation point, that is, |rm| ≫ |rbm| and |rb| ≫
|rbm|. This leads to two different approximations that are often seen in antenna
theory. The first is a zeroth-order approximation, which states that

\[ \mathbf{r}_b \approx \overline{T}_{bm}\, \mathbf{r}_m \tag{1.127} \]

leading to the approximation that

\[ r_b = \sqrt{\mathbf{r}_b^T \mathbf{r}_b} \approx \sqrt{\left( \overline{T}_{bm} \mathbf{r}_m \right)^T \overline{T}_{bm} \mathbf{r}_m} = r_m \tag{1.128} \]

using the definition of rotation matrices as orthogonal matrices, for which
$\overline{T}_{bm}^{-1} = \overline{T}_{bm}^{T}$, where $\overline{T}_{bm}^{T}$
represents the matrix transpose. The first-order approximation can be
used by approximating rb by

\[
\begin{aligned}
r_b = \sqrt{\mathbf{r}_b^T \mathbf{r}_b}
&= \sqrt{\left[ \overline{T}_{bm} (\mathbf{r}_m - \mathbf{r}_{bm}) \right]^T \overline{T}_{bm} (\mathbf{r}_m - \mathbf{r}_{bm})} \\
&= \sqrt{\mathbf{r}_m^T \mathbf{r}_m - 2 \mathbf{r}_m^T \mathbf{r}_{bm} + \mathbf{r}_{bm}^T \mathbf{r}_{bm}}
\approx \sqrt{\mathbf{r}_m^T \mathbf{r}_m - 2 \mathbf{r}_m^T \mathbf{r}_{bm}}
\end{aligned}
\tag{1.129}
\]

using |rm| ≫ |rbm|. Next, we can use the binomial approximation √(1 + x) ≈ 1 + x/2 to
find that

\[ r_b \approx r_m - \frac{\mathbf{r}_m^T \mathbf{r}_{bm}}{r_m} = r_m - \hat{r}_m \cdot \mathbf{r}_{bm} \tag{1.130} \]

where it should be noted that this becomes exact in the true far field at r → ∞. We
use these two approximations to write the far fields as

\[ \mathbf{E}_{b,r}(r_b, \theta_b, \phi_b) = E_0\, \overline{T}_{bm}\, \overline{T}_{m,ru}\, \frac{e^{-jk \left( r_b + \hat{r}_b \cdot \mathbf{r}_{mb} \right)}}{r_b}\, \mathbf{f}(\theta_b, \phi_b) \tag{1.131} \]

where we assume that r̂b ≈ Tbm r̂m and rmb = Tbm rbm. The assumptions also lead
to the relationship between the spherical angles in the BCS and MCS as

\[
\begin{bmatrix} \sin\theta_b \cos\phi_b \\ \sin\theta_b \sin\phi_b \\ \cos\theta_b \end{bmatrix}
= \frac{1}{r_m}\, \overline{T}_{bm}
\begin{bmatrix} r_{xm}(u_m, v_m, w_m) \\ r_{ym}(u_m, v_m, w_m) \\ r_{zm}(u_m, v_m, w_m) \end{bmatrix}
\tag{1.132}
\]

These are the final results that can be used to transform the electric fields from one
coordinate system to another as well as convert coordinates (um, vm, wm) into the
spherical angles (θb, φb). Equation (1.131) has two critical factors that alter the
original data: the phase factor and transformation matrices. The phase factor
accounts for the origin displacement between the two coordinate systems. The
transformation matrices convert the vectors to another coordinate system
orientation, as discussed previously. Remember that the equation defines the
rectangular components of Eb. The PWS is most often written in rectangular form,
and thus no further steps are required beyond this equation.
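The quality of the zeroth- and first-order range approximations can be compared numerically with a short sketch. The geometry below (an observation point 1 km away and an origin offset of less than 1 m) is a hypothetical example, and the function name is illustrative.

```python
import math


def range_approximations(r_m_vec, r_bm_vec):
    """Return (exact, zeroth-order, first-order) values of r_b.

    The exact value uses |T(r_m - r_bm)| = |r_m - r_bm|, which holds because
    the rotation matrix is orthogonal; see (1.126), (1.128), and (1.130).
    """
    exact = math.sqrt(sum((a - b) ** 2 for a, b in zip(r_m_vec, r_bm_vec)))
    r_m = math.sqrt(sum(a * a for a in r_m_vec))
    projection = sum(a * b for a, b in zip(r_m_vec, r_bm_vec)) / r_m
    return exact, r_m, r_m - projection


# Hypothetical geometry in meters.
theta, phi = math.radians(30), math.radians(40)
r_m_vec = [1000.0 * math.sin(theta) * math.cos(phi),
           1000.0 * math.sin(theta) * math.sin(phi),
           1000.0 * math.cos(theta)]
exact, zeroth, first = range_approximations(r_m_vec, [0.3, -0.2, 0.5])
```

In this example the zeroth-order value is off by roughly half a meter, which is negligible for the 1/r amplitude factor but spans many wavelengths at microwave frequencies, while the first-order value is off by well under a millimeter. This is consistent with using the zeroth-order form in the amplitude and the first-order form in the phase of (1.131).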
As an example, we will derive these relations for an elevation-azimuth (EL-
AZ) coordinate system commonly used in measurement systems. As with any
antenna pattern measurement, the AUT must be placed on a positioner that can
provide motion in at least two axes. A common positioner configuration is EL over
AZ, denoted as EL/AZ, where each axis of rotation directly changes the angles AZ
and EL as depicted in Figure 1.20. The coordinate system shown in Figure 1.20(b)
describes both the MCS and the desired BCS. Notice that the origins are not
displaced for this example, but the orientation of the coordinate systems is
different. In this example, we assume that we know the electric field distribution in
terms of EEL and EAZ for a given (EL, AZ) coordinate in the far field. Our task is to
convert these fields into the BCS in order to find the PWS and compute the near
fields using the FFT.

(a) (b)
Figure 1.20 (a) EL/AZ antenna positioner used in antenna pattern measurements. (b) Coordinate
system configuration for the MCS and BCS of the EL/AZ example defining the AZ and
EL angles.

Using (1.115), we can find the rectangular components of the electric field as

\[
\begin{bmatrix} E_{m,x} \\ E_{m,y} \\ E_{m,z} \end{bmatrix}
=
\begin{bmatrix}
\cos EL \cos AZ & -\sin EL \cos AZ & -\sin AZ \\
\cos EL \sin AZ & -\sin EL \sin AZ & \cos AZ \\
\sin EL & \cos EL & 0
\end{bmatrix}
\begin{bmatrix} 0 \\ E_{m,EL} \\ E_{m,AZ} \end{bmatrix}
\tag{1.133}
\]

where we assume that the radial electric field vanishes, that is, Em,r = 0, based on
the far-field assumption. For the Eulerian angles, it can be determined that we have
α = π/2, β = π/2, γ = π/2, which leads to a rotation matrix given by

\[
\overline{T}_{bm} =
\begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\tag{1.134}
\]

based on (1.119)–(1.122). This agrees with the coordinate system depiction in
Figure 1.20. The xm and zb axes are aligned along with the zm and xb axes. The ym
and yb axes are antiparallel, which agrees with the −1 center element. The
rectangular electric field components in the BCS can now be found through
(1.123) as

\[
\begin{bmatrix} E_{b,x} \\ E_{b,y} \\ E_{b,z} \end{bmatrix}
=
\begin{bmatrix}
\sin EL & \cos EL & 0 \\
-\cos EL \sin AZ & \sin EL \sin AZ & -\cos AZ \\
\cos EL \cos AZ & -\sin EL \cos AZ & -\sin AZ
\end{bmatrix}
\begin{bmatrix} 0 \\ E_{m,EL} \\ E_{m,AZ} \end{bmatrix}
\tag{1.135}
\]

Thus, we have the conversion of the electric field distribution into the BCS well
defined for all points. The last step is to relate the coordinates between the two
coordinate systems. We can relate the (EL, AZ) angles to (θb, φb) angles using the
relationship $\hat{r}_b = \overline{T}_{bm}\, \overline{T}_{m,ru}\, \hat{r}_{m,u}$:

\[
\begin{bmatrix} \sin\theta_b \cos\phi_b \\ \sin\theta_b \sin\phi_b \\ \cos\theta_b \end{bmatrix}
=
\begin{bmatrix} \sin EL \\ -\cos EL \sin AZ \\ \cos EL \cos AZ \end{bmatrix}
\tag{1.136}
\]

With the electric field and its corresponding location computed, one can then
proceed to compute the PWS. As a final note, we can now write the final far-field
electric field in the BCS as

\[
\begin{aligned}
\mathbf{E}_b(r_b, \theta_b, \phi_b) = \frac{E_0 e^{-jkr_b}}{r_b} \Big[
& \hat{x}_b \left( E_{m,EL} \cos EL \right) \\
& + \hat{y}_b \left( E_{m,EL} \sin EL \sin AZ - E_{m,AZ} \cos AZ \right) \\
& - \hat{z}_b \left( E_{m,EL} \sin EL \cos AZ + E_{m,AZ} \sin AZ \right) \Big]
\end{aligned}
\tag{1.137}
\]

where the (EL, AZ) angles are related to (θb, φb) through the relationship shown in
(1.136), resulting in the equations θb = cos⁻¹(cos(EL)cos(AZ)) and φb = −tan⁻¹
(cot(EL)sin(AZ)). Note also that the phase term did not alter the electric field
phase since the coordinate systems’ origins were collocated (i.e., rmb = 0).
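The EL/AZ example can be reproduced numerically with the sketch below, which applies the matrices of (1.133) and (1.134) to one far-field sample. The function names and the sample values (EL = 20°, AZ = −10°, and the two pattern components) are hypothetical.

```python
import math

# T_bm as given by the final matrix in (1.134).
T_BM = [[0.0, 0.0, 1.0], [0.0, -1.0, 0.0], [1.0, 0.0, 0.0]]


def t_elaz(el, az):
    """Matrix of (1.133): columns are the radial, EL, and AZ unit vectors."""
    ce, se, ca, sa = math.cos(el), math.sin(el), math.cos(az), math.sin(az)
    return [[ce * ca, -se * ca, -sa], [ce * sa, -se * sa, ca], [se, ce, 0.0]]


def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def elaz_to_bcs(e_el, e_az, el, az):
    """Return BCS rectangular field components and the angles (theta_b, phi_b)."""
    e_b = matvec(T_BM, matvec(t_elaz(el, az), [0.0, e_el, e_az]))
    r_hat_b = matvec(T_BM, matvec(t_elaz(el, az), [1.0, 0.0, 0.0]))
    theta_b = math.acos(max(-1.0, min(1.0, r_hat_b[2])))
    phi_b = math.atan2(r_hat_b[1], r_hat_b[0])
    return e_b, theta_b, phi_b


# Hypothetical sample with arbitrarily chosen pattern values.
el, az = math.radians(20.0), math.radians(-10.0)
e_b, theta_b, phi_b = elaz_to_bcs(1.0, 0.5, el, az)
```

The sketch uses `atan2` so that the quadrant of φb is resolved automatically; for elevation angles between 0° and 90° it coincides with the closed-form arctangent expression given above.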

1.6 THEORETICAL VALIDATION OF NEAR-FIELD PREDICTION

To demonstrate the use of spectral analysis in computing the near fields, the FFT
approach was applied on several examples. Two well-known aperture distributions
with analytical radiation patterns were selected. The resulting near-field
distribution and field values based on the far fields agree quite well with the
theoretical aperture distribution. Both of these problems have theoretical and
practical significance for reflector, array, horn, slot, and other antennas.

1.6.1 Rectangular Aperture Distribution

Suppose that the electric field in a rectangular aperture, depicted in Figure 1.21, of
width a and length b has the field distribution as

\[
\mathbf{E}(x, y, 0) =
\begin{cases}
\hat{x} E_{xa}, & -a/2 \le x \le a/2,\; -b/2 \le y \le b/2 \\
0, & \text{elsewhere}
\end{cases}
\tag{1.138}
\]

where Exa is the electric field at the rectangular aperture. If the aperture is large,
one can predict the radiated power provided by this aperture by

\[ P_{rad} = \frac{E_{xa}^2}{2\eta}\, ab \tag{1.139} \]

since the fields mimic plane waves traveling through the aperture.

Figure 1.21 Rectangular aperture electric field distribution used for testing the spectral analysis-FFT
technique.

Using (1.20), we can find the PWS as

\[ \mathbf{A}(k_x, k_y) = \hat{x} E_{xa}\, ab\, \operatorname{sinc}\!\left( \frac{k_x a}{2} \right) \operatorname{sinc}\!\left( \frac{k_y b}{2} \right) \tag{1.140} \]

which leads to a far-field distribution of

\[ \mathbf{E}(r, \theta, \phi) = \frac{E_0 e^{-jkr}}{r}\, \mathbf{f}(\theta, \phi) \tag{1.141} \]

\[ \mathbf{f}(\theta, \phi) = \operatorname{sinc}\!\left( \frac{ka}{2} \sin\theta \cos\phi \right) \operatorname{sinc}\!\left( \frac{kb}{2} \sin\theta \sin\phi \right) \left( \hat{\theta} \cos\phi - \cos\theta\, \hat{\phi} \sin\phi \right) \tag{1.142} \]

where E0 contains the scaling factors that would be unknown to the observer. The
function f(θ, φ) represents the pattern that would be known to the user. The
directivity may already be known to the user, or one could predict the directivity
by integrating the pattern. For this particular aperture distribution, we can predict
the directivity by

\[ D_0 = 4\pi \frac{ab}{\lambda^2} \tag{1.143} \]

Using the spectral analysis procedure outlined in Section 1.1 and detailed in the
previous sections, the near-field electric field distribution has been computed for
several planes. As a numerical example, an aperture size of a = b = 33.5λ was
chosen along with a radiating power of Prad = 87 W at 13.4 GHz. Using (1.139), the
electric field magnitude in the aperture becomes Exa = 341.5 V/m for this particular
radiated power. For these particular values, we have plotted the radiation pattern in
Figure 1.22(a). Notice that the beamwidth is fairly small with a half-power
beamwidth of 1.5°. Thus, a sufficiently small spectral sampling period must be
used in order to capture the information from the pattern.
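The numerical values quoted above can be reproduced with a short calculation based on (1.139) and (1.143). The constants below (speed of light and free-space wave impedance) are assumptions, since the chapter does not state which values were used, so agreement to within a fraction of a V/m is the most that should be expected.

```python
import math

C0 = 299_792_458.0      # speed of light in m/s (assumed value)
ETA0 = 376.730313668    # free-space wave impedance in ohms (assumed value)


def aperture_field(p_rad, area):
    """Invert (1.139): uniform aperture field magnitude for a radiated power."""
    return math.sqrt(2.0 * ETA0 * p_rad / area)


def aperture_directivity(area, wavelength):
    """Directivity of a uniformly illuminated aperture, as in (1.143)."""
    return 4.0 * math.pi * area / wavelength ** 2


wavelength = C0 / 13.4e9
a = b = 33.5 * wavelength
e_xa = aperture_field(87.0, a * b)            # V/m, close to the quoted 341.5 V/m
d0_db = 10.0 * math.log10(aperture_directivity(a * b, wavelength))
```

The directivity depends only on the aperture area in square wavelengths, so it is independent of the assumed constants.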
With the radiation pattern readily available, the FFT spectral analysis program
was applied to these fields to predict the aperture field distribution. For the results
shown in Figure 1.22(b), 2,000 points were used for both kx and ky sampling in the
visible region. Note that any values for A(kx, ky) outside the visible region were set
to zero in order to maintain the rectangular grid. The spacing Δx = Δy = λ/4 was
chosen based on the recommendations in Section 1.4, leading to a spectral
sampling period of Δkx = Δky = 0.002k. With this spacing, the angular spacing is
roughly Δθ = 0.11° or larger (since the angular spacing is not uniform in θ).
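The sampling figures above follow from the standard DFT relation Δk = 2π/(NΔx), which is assumed here rather than quoted from the chapter. The sketch below evaluates the normalized spectral step and the corresponding angular step near broadside; the function names are illustrative.

```python
import math


def spectral_step_over_k(n_samples, dx_over_lambda):
    """Normalized spectral step: dk/k = (2*pi/(N*dx)) / (2*pi/lambda)."""
    return 1.0 / (n_samples * dx_over_lambda)


def broadside_angular_step_deg(n_samples, dx_over_lambda):
    """Angle of the first spectral sample off broadside, from k_x = k*sin(theta)."""
    return math.degrees(math.asin(spectral_step_over_k(n_samples, dx_over_lambda)))


dk_over_k = spectral_step_over_k(2000, 0.25)          # N = 2,000 and dx = lambda/4
dtheta_deg = broadside_angular_step_deg(2000, 0.25)   # roughly 0.11 degrees
```

Because kx = k sin θ, a uniform step in kx maps to the smallest angular step at broadside and to progressively larger steps toward θ = 90°.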

Figure 1.22 (a) Normalized radiation patterns for the rectangular aperture with a = b = 33.5λ for φ =
0°, 90°. (b) Predicted electric field aperture distribution |Ex| via FFT given the far-field
patterns, directivity, and radiated power. The FFT utilized the sampled far field with N =
M = 2,000 points and spatial sampling period of Δx = Δy = λ/4. The spectral sampling
period was Δkx/k = Δky/k = 0.002.

The resulting aperture fields at z = 0 from the FFT computation are shown in
Figure 1.22(b). We only plot the magnitude of the x-component since the
y-component is negligible. The first and most evident characteristic from the plot is
that the FFT spectral analysis predicts a square-shaped aperture distribution with a
sharp roll-off in the electric field. The length and width of the aperture predicted
by the resulting aperture distribution are roughly 33.5λ, which agrees with the
theoretical development. Some ringing effects can be observed at the outer
periphery of the aperture, but this is due to the fact that only the visible portion of
the spectrum is considered in the approach. This artifact comes from the physical
limitation of the spectral analysis framework. Even more important is the
magnitude of the fields in V/m. It was found that the magnitude of the electric field
component Ex in the aperture had a mean value of 341.1 V/m, agreeing well with
the theoretical value of 341.5 V/m.
It is interesting to observe the effect of larger spectral sampling periods as well
as the spatial sampling on the ability of the spectral analysis to predict the near field.
This is compared by examining the changes in the near-field values for different
numbers of samples N = M (for both x- and y-directions) for the same rectangular
aperture as in Figure 1.22. The spatial sampling spacings Δx and Δy remain constant at
λ/4, which means that changing N changes the spectral sampling and
ultimately the angular sampling. For the case when N = 250, the smallest angular
sampling is roughly Δθ ≈ 0.916°. The results of the comparison are shown in
Figure 1.23(a), where the electric field is plotted versus x/λ for y = 0. It is
interesting to note that a larger angular spacing will still provide a satisfactory
prediction, as seen in Figure 1.23(a). In each of the sampling schemes, the general
features are observed and no significant difference can be observed. From a
numerical perspective, the smaller values of N lead to faster computation times,
which can be useful in computationally intensive applications.
A comparison of the results for different spatial sampling periods is shown in
Figure 1.23(b) for the same rectangular aperture. The magnitude of the electric field
component Ex is plotted versus x with different spatial sampling spacings Δx = Δy.
This was done while keeping the number of samples constant at N = M = 1,000.
As expected, the curves converge as the spacing between samples becomes
smaller. Ringing is still present in all cases, since only the visible region spectral
components are present. However, it is interesting to note that the λ/2 case seems
to have no ringing. This is due to the spacing of the samples, and a closer
investigation shows that the λ/2 case provides nearly identical values to the smaller
spacing cases. As expected, the smaller sample spacing does not necessarily
provide a dramatic improvement in the values recovered in this procedure, but
rather provides more data points if desired. Note also that the slope of the rolloff
does not become steeper with smaller Δx. The lack of higher spectral components
in the invisible region causes the finite slope, and a steeper slope similar to the
theoretical distribution can only be attained by incorporating higher spectral
content.

Figure 1.23 (a) Near-field electric field distribution in Ex for the rectangular aperture with a = b =
33.5λ at y = 0. Different numbers of samples N were used in the FFT to compare their
effect, and Δx = Δy = λ/4. (b) Near-field electric field distribution in Ex for the
rectangular aperture with a = b = 33.5λ at y = 0. Different spatial sampling was used in
the FFT to compare its effect.

The resulting near-field distributions for the same rectangular aperture are
shown in Figure 1.24 for several different planes. These plots are generated by the
same spectral analysis (FFT) program with different values for z using the far
fields and the radiated power. The fields at z = 50λ (Figure 1.24(a)) demonstrate
similar features to the original aperture distribution, while those farther away
(Figure 1.24(d) with z = 400λ) resemble the far-field radiation patterns, as
expected. The contour plots provide insight into the hotspot locations at various
planes of interest. The plots illustrate an increased field intensity at the corners of
the aperture which eventually shift towards the center (x = y = 0). The maximum
field intensity also does not decrease monotonically versus z. In fact, the largest
field intensity between these plots can be observed at z = 400λ. Similar
observations have been made with previous findings on near-field distributions,
which showed that the fields tend to oscillate rapidly in the near field and gradually
begin to attenuate by 1/r around the far-field region. This axial variation will be
discussed later on.

Figure 1.24 Magnitude of the Ex component of the rectangular aperture radiating 87 W with a = b =
33.5λ with an FFT sampling of Δx = Δy = λ/4 and N = M = 2,000 for (a) z = 50λ, (b) z =
100λ, (c) z = 200λ, and (d) z = 400λ.

1.6.2 Circular Aperture Distribution

Suppose that there exists an electric field over a circular aperture, shown in Figure
1.25, of radius a with the distribution given by

\[
\mathbf{E}(x, y, 0) =
\begin{cases}
\hat{x} E_{xa}, & x^2 + y^2 \le a^2 \\
0, & \text{elsewhere}
\end{cases}
\tag{1.144}
\]

where Exa is the electric field magnitude in the aperture and does not depend on
space. Similar to the rectangular aperture, the electric field in the aperture can be
related to the power radiated by

\[ P_{rad} = \frac{E_{xa}^2}{2\eta}\, \pi a^2 \tag{1.145} \]

and the directivity can be predicted by

\[ D_0 = 4\pi \frac{\pi a^2}{\lambda^2} \tag{1.146} \]

Figure 1.25 Circular aperture electric field distribution used for testing the spectral analysis-FFT
technique.

Figure 1.26 (a) Normalized radiation patterns for the circular aperture with a = 16.75λ for φ = 0°,
90°. (b) Predicted electric field aperture distribution |Ex| via FFT given the far-field
patterns and radiated power. The FFT utilized the sampled far field with N = M = 2,000
points and spatial sampling period of Δx = Δy = λ/4. The spectral sampling period was
Δkx/k = Δky/k = 0.002.

For the circular aperture, the far-field patterns can be found by computing the
PWS and employing its asymptotic relation to the far fields. The PWS can be
found by taking the 2-D Fourier transform of the circular disc as

\[ \mathbf{A}(k_x, k_y) = \hat{x} E_{xa} \iint_{S_c} e^{j k_x x + j k_y y}\, dx\, dy \tag{1.147} \]

where Sc is the circular area centered about the origin with radius a. The integral
can be rewritten in the aperture cylindrical coordinates as

\[ \mathbf{A}(k_x, k_y) = \hat{x} E_{xa} \int_0^a \int_0^{2\pi} e^{j \rho' \left( k_x \cos\phi' + k_y \sin\phi' \right)}\, \rho'\, d\phi'\, d\rho' \tag{1.148} \]

The exponent in the equation above can be modified to have the form

\[ \mathbf{A}(k_x, k_y) = \hat{x} E_{xa} \int_0^a \int_0^{2\pi} e^{j \rho' \sqrt{k_x^2 + k_y^2}\, \sin\left( \phi' + \psi \right)}\, \rho'\, d\phi'\, d\rho' \tag{1.149} \]

where ψ = tan⁻¹(kx/ky). The integrand is a periodic function in φ′, and the integral
over φ′ can thus be recognized as the Bessel function of the first kind. Integrating
in φ′ leads to

\[ \mathbf{A}(k_x, k_y) = \hat{x}\, 2\pi E_{xa} \int_0^a J_0\!\left( \rho' \sqrt{k_x^2 + k_y^2} \right) \rho'\, d\rho' \tag{1.150} \]

where Jm is the Bessel function of the first kind of mth order. Setting
t = ρ′√(kx² + ky²) and dt = dρ′√(kx² + ky²) and using the property that

\[ \int_0^t J_0(x)\, x\, dx = t J_1(t), \]

it can be shown that

\[ \mathbf{A}(k_x, k_y) = \hat{x}\, 2\pi E_{xa} a^2\, \frac{J_1\!\left( a \sqrt{k_x^2 + k_y^2} \right)}{a \sqrt{k_x^2 + k_y^2}} \tag{1.151} \]

which is often referred to as the Airy disc function. Using (1.29), we can find the
far-field pattern of the circular aperture using its PWS as

\[ \mathbf{E}(r, \theta, \phi) = \frac{E_0 e^{-jkr}}{r}\, \mathbf{f}(\theta, \phi) \tag{1.152} \]

\[ \mathbf{f}(\theta, \phi) = \frac{2 J_1(ka \sin\theta)}{ka \sin\theta} \left( \hat{\theta} \cos\phi - \cos\theta\, \hat{\phi} \sin\phi \right) \tag{1.153} \]

where E0 is another arbitrary scaling factor unknown to the observer or user. The
factor of 2 in f(θ, φ) is included in order to normalize the pattern.
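The step from (1.150) to (1.151) can be verified numerically. The sketch below is only a check under stated assumptions: the Bessel functions are evaluated from Bessel's integral with a midpoint rule (a choice made here to avoid external libraries), and the radius and radial wavenumber are arbitrary test values.

```python
import math


def bessel_j(order, x, steps=200):
    """J_n(x) from Bessel's integral (1/pi) * int_0^pi cos(n*t - x*sin(t)) dt."""
    h = math.pi / steps
    total = sum(math.cos(order * (i + 0.5) * h - x * math.sin((i + 0.5) * h))
                for i in range(steps))
    return total * h / math.pi


def disc_integral(radius, k_rho, steps=400):
    """Midpoint-rule value of int_0^a J0(rho * k_rho) * rho d(rho), as in (1.150)."""
    h = radius / steps
    return sum(bessel_j(0, (i + 0.5) * h * k_rho) * (i + 0.5) * h
               for i in range(steps)) * h


# Arbitrary test values: radius a = 1 and radial wavenumber sqrt(kx^2 + ky^2) = 5.
a, k_rho = 1.0, 5.0
numeric = disc_integral(a, k_rho)
closed = a ** 2 * bessel_j(1, a * k_rho) / (a * k_rho)   # radial factor in (1.151)

# The normalized pattern factor 2*J1(x)/x approaches 1 at broadside.
broadside = 2.0 * bessel_j(1, 1e-3) / 1e-3
```

The two values agree to within the quadrature error, and the broadside limit shows why the factor of 2 normalizes the pattern to unity.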
With the radiation pattern readily available, the FFT spectral analysis program
was applied to these fields to predict the aperture field distribution. As a numerical
example, an aperture size of a = 16.75λ was chosen along with a radiated power
of Prad = 87 W at 13.4 GHz. With the radiated power and the area known, the
electric field in the aperture can be computed as Exa = 385.4 V/m using (1.145).
For these particular values and aperture sizes, we have plotted the radiation pattern
in Figure 1.26(a). Overall, the patterns in the principal planes are similar to those
of the rectangular aperture with the exception of the lower sidelobes. Since a
similarly sized aperture was utilized, the beamwidth is also comparable to that of
the rectangular aperture at 1.6°. Similar patterns between φ = 0° and 90° are
realized since the aperture is circular.
For the results shown in Figure 1.26(b), 2,000 points were used for both kx and
ky sampling in the visible region, again placing zeros for any A(kx, ky) falling
outside the visible region. The sample spacing was set to Δx = Δy = λ/4, leading to
a spectral sampling period of Δkx = Δky = 0.002k. With this sample spacing and
spectral period, the smallest angular spacing is Δθ = 0.11°, which provides an
ample number of points to sample the radiation pattern. For the φ = 0° cut, the
spectral sampling period provides roughly 16 points per sidelobe, thus ensuring
that the oscillations in the far-field radiation pattern are well sampled. The
resulting aperture distribution shown in Figure 1.26(b) reflects the original circular
shape with a radius of approximately 16.75λ. The results were generated based on
the knowledge of only the far-field patterns and the radiated power. Besides this,
no other a priori knowledge was utilized to generate the aperture fields.
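The sampling figures quoted above follow from the usual FFT relation Δk = 2π/(NΔx). A minimal sketch of that arithmetic is given below; the function name `fft_sampling` is hypothetical, and the angular step assumes sin θ = kx/k near broadside.

```python
import math

def fft_sampling(n_samples, dx_over_lambda):
    """Return (dk/k, broadside angular step in degrees) for an N-point FFT."""
    # dk = 2*pi / (N * dx) and k = 2*pi / lambda, so dk/k = lambda / (N * dx).
    dk_over_k = 1.0 / (n_samples * dx_over_lambda)
    # Near broadside sin(theta) = kx / k, so the first sample lies at asin(dk/k).
    dtheta_deg = math.degrees(math.asin(dk_over_k))
    return dk_over_k, dtheta_deg

dk_over_k, dtheta_deg = fft_sampling(2000, 0.25)
print(f"dk/k = {dk_over_k:.4f}, dtheta = {dtheta_deg:.2f} deg")
# dk/k = 0.0020, dtheta = 0.11 deg
```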
Near-field electric field distributions were generated for different planes
farther from the aperture plane to show how the near-field distribution and the
maximum electric field value can change along the distance z. In Figure 1.27, the
electric field distribution can be observed for z = 50λ, 100λ, 200λ, and 400λ. The
contour plots reveal that the distribution evolves from a uniform circular aperture
field spread over a large area into a focused beam. Even at 50λ away from the aperture,
the fields oscillate around 400 V/m in roughly the same area as the original aperture.
At 100λ (Figure 1.27(b)), the oscillations in the fields become more pronounced,
and a sharp beam starts to take shape with large sidelobes. The electric field
distributions at 200λ and 400λ have the appearance of a concentrated beam. It is
also interesting to keep track of the maximum Ex field intensity for each of the
planes. After the FFT computation, the resulting data matrix for the Ex component
was searched to find the location and value of the maximum. The maximum fields
found from the FFT computation were 561 V/m, 723 V/m, 622 V/m, and
686 V/m for z = 50λ, 100λ, 200λ, and 400λ, respectively. The search also showed
that the maximum values were located at x = y = 0 in every case, as expected from
the plots. To summarize, the near-field predictions made by this tool provide both
insight into the near-field distributions and a direct means for engineers to
evaluate systems against the requirements within the vicinity of the
antenna. By knowing how the fields are distributed in V/m, one can directly assess
the interference upon nearby electronic systems as well as the radiation levels
received by individuals in the vicinity of the antenna.

1.6.3 Axial Field Prediction of the Uniform Circular Aperture

While the rectangular aperture provided some interesting insights into the
evolution of a near field to a far-field distribution, a unique feature of the circular
aperture is that the near fields can be analytically solved along the z-axis. This is a
well-known feature that has proven insightful in understanding the near-field
behaviors of large antennas. For our purpose, it can serve as a benchmark problem
to ensure the validity of the technique at nonzero distances from the aperture.
Therefore, we will first derive the near fields of a circular aperture and compare
with the results generated from a spectral analysis FFT program.

(a) (b)

(c) (d)
Figure 1.27 Magnitude of the Ex component of the circular aperture radiating 87 W with a = 16.75λ
with an FFT sampling of Δx = Δy = λ/4 and N = M = 2,000 for (a) z = 50λ, (b) z = 100λ, (c)
z = 200λ, and (d) z = 400λ. The maximum Ex fields observed in these planes were 561
V/m, 723 V/m, 622 V/m, and 686 V/m, respectively.

With the same electric field distribution given in (1.144) and illustrated in
Figure 1.25, we can begin to compute the near-field electric field distribution along
the z-axis (i.e., x = y = 0). It has been shown using vector potentials and the surface
equivalence theorem that the radiated electric field from such an aperture can be
found using the integral [14]

$$\mathbf{E}(\mathbf{r}) = 2 \iint_{S_c} \left[ \hat{z} \times \mathbf{E}(x', y', 0) \right] \times \nabla' \left( \frac{e^{-jk|\mathbf{r} - \mathbf{r}'|}}{4\pi |\mathbf{r} - \mathbf{r}'|} \right) dx' \, dy' \tag{1.154}$$

where Sc is the circular surface of radius a for the integration, r is the observation
point, and r' is the source location along with any other primed coordinates.
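Substituting the field distribution in (1.144) and taking the first cross-product leads
to

$$\mathbf{E} = 2 E_{xa} \iint_{S_c} \hat{y} \times \nabla' \left( \frac{e^{-jk|\mathbf{r} - \mathbf{r}'|}}{4\pi |\mathbf{r} - \mathbf{r}'|} \right) dx' \, dy' \tag{1.155}$$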

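The next step is to find the gradient for the factors inside the parentheses as

$$\mathbf{E} = 2 E_{xa} \iint_{S_c} \hat{y} \times \left[ \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|} \, \frac{1 + jk|\mathbf{r} - \mathbf{r}'|}{|\mathbf{r} - \mathbf{r}'|} \, \frac{e^{-jk|\mathbf{r} - \mathbf{r}'|}}{4\pi |\mathbf{r} - \mathbf{r}'|} \right] dx' \, dy' \tag{1.156}$$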

Since the observation points are along the z-axis and the source points are only
located in the x-y plane (i.e., x = y = 0 and z' = 0), we can expand this into

$$\mathbf{E} = \frac{E_{xa}}{2\pi} \iint_{S_c} \hat{y} \times \left[ \frac{-x'\hat{x} - y'\hat{y} + z\hat{z}}{\sqrt{x'^2 + y'^2 + z^2}} \, \frac{1 + jk\sqrt{x'^2 + y'^2 + z^2}}{x'^2 + y'^2 + z^2} \, e^{-jk\sqrt{x'^2 + y'^2 + z^2}} \right] dx' \, dy' \tag{1.157}$$
Taking the cross-product further simplifies this to

$$\mathbf{E} = \frac{E_{xa}}{2\pi} \iint_{S_c} \frac{x'\hat{z} + z\hat{x}}{\sqrt{x'^2 + y'^2 + z^2}} \, \frac{1 + jk\sqrt{x'^2 + y'^2 + z^2}}{x'^2 + y'^2 + z^2} \, e^{-jk\sqrt{x'^2 + y'^2 + z^2}} \, dx' \, dy' \tag{1.158}$$

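Converting to cylindrical coordinates, the integral appears as

$$\mathbf{E} = \frac{E_{xa}}{2\pi} \int_0^a \int_0^{2\pi} \frac{\rho'\cos\phi'\,\hat{z} + z\hat{x}}{\sqrt{\rho'^2 + z^2}} \, \frac{1 + jk\sqrt{\rho'^2 + z^2}}{\rho'^2 + z^2} \, e^{-jk\sqrt{\rho'^2 + z^2}} \, \rho' \, d\phi' \, d\rho' \tag{1.159}$$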

where one can immediately see that the z component integrates to zero. Integrating in
φ′ provides

$$\mathbf{E} = \hat{x}\, z E_{xa} \int_0^a \frac{1 + jk\sqrt{\rho'^2 + z^2}}{\left(\rho'^2 + z^2\right)^{3/2}} \, e^{-jk\sqrt{\rho'^2 + z^2}} \, \rho' \, d\rho' \tag{1.160}$$

which can be solved using substitution and integration by parts to find the electric
field as

$$\mathbf{E} = \hat{x} E_{xa} \left[ e^{-jkz} - \frac{e^{-jk\sqrt{a^2 + z^2}}}{\sqrt{1 + (a/z)^2}} \right] \tag{1.161}$$
which has been shown in other works as well [28].
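As a sanity check on the derivation, the closed form (1.161) can be compared with a direct numerical integration of (1.160). The sketch below is illustrative only: lengths are expressed in wavelengths (so k = 2π), the integration uses a plain midpoint rule, and the function names are hypothetical.

```python
import cmath
import math

K = 2.0 * math.pi  # wavenumber when all lengths are in wavelengths

def axial_field_closed_form(z, a, k=K, exa=1.0):
    """Ex on the z-axis from (1.161); note z/sqrt(a^2+z^2) = 1/sqrt(1+(a/z)^2)."""
    r_edge = math.hypot(a, z)
    return exa * (cmath.exp(-1j * k * z) - (z / r_edge) * cmath.exp(-1j * k * r_edge))

def axial_field_quadrature(z, a, k=K, exa=1.0, steps=4000):
    """Ex on the z-axis from a midpoint-rule integration of (1.160)."""
    h = a / steps
    total = 0j
    for i in range(steps):
        rho = (i + 0.5) * h
        dist = math.hypot(rho, z)
        total += (1 + 1j * k * dist) / dist**3 * cmath.exp(-1j * k * dist) * rho
    return exa * z * total * h

# Small aperture (a = 5 wavelengths) observed at z = 10 wavelengths.
print(abs(axial_field_closed_form(10.0, 5.0)), abs(axial_field_quadrature(10.0, 5.0)))

# On-axis magnitudes in V/m for the a = 16.75 wavelength example, Exa = 385.4 V/m.
for z in (50.0, 100.0, 200.0, 400.0):
    print(z, 385.4 * abs(axial_field_closed_form(z, 16.75)))
```

With Exa = 385.4 V/m and a = 16.75λ, the closed form yields on-axis magnitudes within a few V/m of the maxima quoted for Figure 1.27, which is consistent with those maxima lying at x = y = 0.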

(a) (b)

(c) (d)
Figure 1.28 Comparison of the exact axial field distribution for some representative uniform circular
apertures. A sample size of N = M = 500 was used in conjunction with a sample spacing
of Δx = Δy = λ/4. The aperture radii were (a) a = 16.75λ, (b) a = 5λ, (c) a = 10λ, and (d)
a = 50λ.

Using this formulation, a comparison can be made with the results from the
FFT program. Using a similar configuration to the previous cases, an aperture size
of a = 16.75λ was utilized. In order to speed up the computation, the sample size
was chosen to be N = M = 500 and Δx = Δy = λ/4. The results shown in
Figure 1.28(a) agree very well even at very close distances. In order to
further demonstrate the ability of the FFT spectral analysis, other aperture sizes
were considered, and the results are shown in Figures 1.28(b–d). For these
other apertures the same sample size N and sample spacing Δx were also used.
Remarkably good agreement can be observed in these plots as well.
Clearly, the power of the FFT spectral analysis approach is exemplified by
these plots. Computing the near fields presents a major challenge. Many
researchers have worked towards approximating the fields within the near field as
well as the Fresnel-zone regions through a variety of techniques [4, 28–30].
However, most of those techniques were only able to approximate the fields
adequately to a distance of a few diameters away from the aperture (e.g., z = 3D),
whereas this technique is able to accurately predict the near fields only a
few wavelengths away from the aperture. This is due to the fact that the evanescent
waves that make up the invisible part of the spectrum quickly die out and no longer
contribute to the fields after a few wavelengths.
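The decay of the invisible part of the spectrum can be illustrated with a few numbers. The sketch below assumes the standard plane-wave propagator, in which a component with transverse wavenumber kt > k is attenuated as exp(−z·sqrt(kt² − k²)); the function name and sample values are hypothetical.

```python
import math

def evanescent_attenuation(kt_over_k, z_over_lambda):
    """Amplitude factor for a plane-wave component after propagating a distance z."""
    if kt_over_k <= 1.0:
        return 1.0  # propagating component: magnitude does not change with z
    alpha_over_k = math.sqrt(kt_over_k**2 - 1.0)
    return math.exp(-2.0 * math.pi * alpha_over_k * z_over_lambda)

# A component only 10% outside the visible region, at a few distances.
for z in (0.5, 1.0, 3.0):
    print(f"z = {z} wavelengths: factor = {evanescent_attenuation(1.1, z):.2e}")
```

Even a component just outside the visible region is reduced by several orders of magnitude within three wavelengths, which is why zeroing the invisible region has little effect there.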

1.7 SOME PRACTICAL EXAMPLES

1.7.1 A Symmetric Reflector Antenna

For high power and high gain applications, antenna engineers often prefer the
reflector antenna due to its widely proven use, efficiency, and power handling
capabilities. The cost and complexity in scaling the reflector antenna to provide
higher gain are also reasonable compared to other options such as arrays.
Therefore, it would be instructive to consider a practical example of a reflector
antenna in the context of computing the near-fields. The traditional reflector
antenna systems are made up of two components: the feed and the reflector system.
In general, the feed can be a horn antenna or even an array for added antenna
capabilities. The feed antenna illuminates the reflector(s), where the scattered
radiation becomes focused or directed due to the properties of the reflectors. The
reflector system can be configured to provide many unique functionalities in the
radiation patterns. The reflector(s) can be curved, flat, or corner for different
purposes. There also can be multiple reflectors or a single reflector. A common
reflector design is the single symmetric parabolic dish fed with a feed at the focus,
as seen in Figure 1.29, where the dish is symmetric about φ. When the feed is
placed at the focus of the parabola, the scattered fields become collimated, that is,
the scattered fields appear as plane waves traveling in the +z-direction.
Consequently, the fields over the aperture have a uniform phase, leading to a high
directivity.
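In the example to be explored, the feed antenna is modeled as a cos^q(θ) feed
[31], where the radiated fields are assumed to have the form

$$\mathbf{E}_f(r, \theta, \phi) = \frac{e^{-jkr}}{r} \left[ \cos^{q_x}\theta \, \cos\phi \left( E_x\hat{\theta} + E_y\hat{\phi} \right) + \cos^{q_y}\theta \, \sin\phi \left( E_y\hat{\theta} - E_x\hat{\phi} \right) \right] \tag{1.162}$$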

where the choice of Ex and Ey determines the polarization of the feed. Note that
Ef = 0 for θ > 90°. Setting (Ex, Ey) = (1, 0) would provide an x-polarized feed
antenna, while setting (Ex, Ey) = (0, 1) would provide a y-polarized feed antenna.
Circular polarization can also be achieved by setting equal magnitudes for Ex and Ey
along with quadrature phase between the two components. The most important point to
note from this pattern is that the q-factors qx and qy control the pattern beamwidth
in the x-z and y-z planes, respectively. In the design being discussed, these q-factors
were chosen in order to provide a certain taper in the aperture fields, widely known as
the edge taper (ET). In the particular example at hand, the desired edge taper was
ET = −10 dB, which is optimal for single parabolic reflector directivity, providing the
maximum aperture efficiency of ε_ap = 81% [31]. The tapering should be reflected
in the aperture distribution, where the fields should be roughly −10 dB at the
reflector edge compared to the center.

(a) (b)
Figure 1.29 Symmetric parabolic dish antenna fed with a feed antenna at the parabolic focal point.
(a) Side view. (b) Top view.

The radiation from the feed antenna excites surface currents on the reflector.
The currents in turn radiate the fields observed in the far field along with any
additional radiation from the feed. A good approximation of the current
distribution on the reflector is the physical optics approximation, where the surface
currents can be computed by
$$\mathbf{J}_{PO} = 2\hat{n} \times \mathbf{H}_f \tag{1.163}$$

where JPO is the physical optics (PO) surface current, n̂ is the unit normal vector
to the reflector surface, and Hf is the radiated magnetic field from the feed antenna,
which can be computed from (1.162) and the local plane wave relationship in
(1.11). Notice that the only required knowledge is the geometry of the parabolic
reflector (to provide the unit normal vector) and the incident magnetic field. In
reality, the current distribution deviates from the PO prediction due to the
interactions of the feed and reflector in addition to strong edge currents.
Nevertheless, the PO approximation is still quite accurate and useful in predicting
the pattern features in the main beam and its first few sidelobes.
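A minimal sketch of (1.163) is given below. It assumes a paraboloid z = (x² + y²)/(4f) − f with its focus at the origin, which is consistent with the aperture-plane locations quoted in this section but is still an assumption here, and the helper names are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def paraboloid_normal(x, y, f):
    """Unit normal of z = (x^2 + y^2)/(4f) - f on the side facing the focus."""
    nx, ny, nz = -x / (2.0 * f), -y / (2.0 * f), 1.0
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)

def po_current(normal, h_field):
    """Physical optics surface current J_PO = 2 n x H_f, as in (1.163)."""
    return tuple(2.0 * c for c in cross(normal, h_field))

# At the vertex the normal is z-hat, so a y-directed H field gives J = -2 x-hat.
j_vertex = po_current(paraboloid_normal(0.0, 0.0, 19.03), (0.0, 1.0, 0.0))
print(j_vertex)
```

The same functions accept complex field components, so a phasor Hf obtained from (1.162) could be passed in directly.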
With the currents known on the reflector surface, the radiated fields can be
ascertained through an integration of each infinitesimal current’s contribution. A
generalized treatment of the radiated electric fields applicable in both far-field and
near-field regions has been formulated from vector potentials [32], with the
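resulting integration of the current being

$$\mathbf{E}_{PO}(\mathbf{r}) = -jk\eta_0 \iint_{\Sigma} \left[ g_1 \mathbf{J}_{PO}(\mathbf{r}') - g_2 \left( \mathbf{J}_{PO}(\mathbf{r}') \cdot \hat{R} \right) \hat{R} \right] \frac{e^{-jkR}}{4\pi R} \, d\Sigma' \tag{1.164}$$

where $\mathbf{R} = \mathbf{r} - \mathbf{r}'$, $R = |\mathbf{R}|$, and $\hat{R} = \mathbf{R}/R$, and g1 and g2 are defined as

$$g_1 = 1 - \frac{1}{(kR)^2} - j\frac{1}{kR} \tag{1.165}$$

$$g_2 = 1 - \frac{3}{(kR)^2} - j\frac{3}{kR} \tag{1.166}$$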

These are the exact PO integrals that provide the electric field in the near-field and
far-field regions without any approximations. The evaluation of this integral has
been discussed in detail in [32] and other works, but it is not the main focus for this
chapter. Rather, this exact formulation is compared against the results from the
FFT procedure discussed.
For the symmetric reflector example, the chosen diameter was D = 33.5λ and
the ratio f/D = 0.568, leading to a focal length of 19.03λ. The frequency of 10 GHz
was chosen arbitrarily. Using Figure 1.29, it can be shown that the subtended angle is
θs = 47.5° with this configuration. In order to obtain the edge taper of
ET = −10 dB, the q-factors were chosen as qx = qy = 2.483. The feed was also
x-polarized and the power radiated was 100 W. With a diameter of this size, the
half-power beamwidth can be predicted to be roughly 1.8°, and thus a fine sampling
rate must be applied in order to make an effective prediction using the spectral
analysis-FFT program. The far-field patterns in Figure 1.30 confirm the rapid
variations, where Eθ and Eφ are plotted for the x-z and y-z planes. The far fields
were generated by evaluating (1.164) in the far-field region, taking an overall
computational time of 2.4 hours. The computation was accelerated through parallelization
across four separate cores on a computer equipped with two quad-core Intel Xeon
E5420 processors alongside 32 GB of RAM in order to gain a fourfold decrease in
time.
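The subtended angle and q-factor quoted above can be reproduced with a short calculation. This is a sketch under stated assumptions rather than the authors' design procedure: it assumes the edge taper is the sum of the cos^q feed taper at the rim and the spherical spreading (path) loss of a paraboloid, and the function names are hypothetical.

```python
import math

def subtended_angle(f_over_d):
    """Angle between the reflector axis and the rim, seen from the focus (radians)."""
    return 2.0 * math.atan(1.0 / (4.0 * f_over_d))

def q_for_edge_taper(edge_taper_db, f_over_d):
    """q such that feed taper plus path loss equals the requested edge taper."""
    theta_s = subtended_angle(f_over_d)
    # Focus-to-rim distance is f / cos^2(theta_s / 2), so the field drops by cos^2.
    path_loss_db = 20.0 * math.log10(math.cos(theta_s / 2.0) ** 2)
    feed_taper_db = edge_taper_db - path_loss_db
    # Feed amplitude at the rim relative to boresight is cos(theta_s)**q.
    return feed_taper_db / (20.0 * math.log10(math.cos(theta_s)))

theta_s_deg = math.degrees(subtended_angle(0.568))
q = q_for_edge_taper(-10.0, 0.568)
print(f"theta_s = {theta_s_deg:.1f} deg, q = {q:.2f}")
# theta_s = 47.5 deg, q = 2.48
```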
The directivity of the reflector configuration was computed at roughly 39.56
dB, which shows that this reflector has an aperture efficiency of ε_ap = 81.5%, as
expected. With this information, a sampling period of Δx = Δy = λ/4 and N = M =
2,000 was chosen, leading to a spectral sampling period of Δkx = Δky = 0.002k.
Like the other examples, this provides roughly Δθ = 0.11° sampling in the far field.
Note that with symmetric reflectors, the cross-polarization is minuscule in these
two planes due to the symmetry of the reflector and therefore was not plotted.
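The quoted aperture efficiency can be cross-checked against the directivity. The sketch below assumes the usual definition, namely the ratio of the directivity to 4πA/λ² for the projected aperture area A; the function name is hypothetical.

```python
import math

def aperture_efficiency(directivity_db, area_in_sq_wavelengths):
    """Directivity divided by the uniform-aperture limit 4*pi*A/lambda^2."""
    d_max = 4.0 * math.pi * area_in_sq_wavelengths
    return 10.0 ** (directivity_db / 10.0) / d_max

# Circular projected aperture with D = 33.5 wavelengths.
area = math.pi * (33.5 / 2.0) ** 2
efficiency = aperture_efficiency(39.56, area)
print(f"aperture efficiency = {100.0 * efficiency:.1f} %")
```

The result lands at roughly 81.5%, in line with the value stated above.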

(a) (b)
Figure 1.30 Far-field patterns in the x-z (φ = 0°) and y-z (φ = 90°) planes for a symmetric parabolic
reflector antenna with D = 33.5λ and f = 19.03λ. The q-factors for the feed were qx = qy =
2.483, and the patterns were generated by integrating the PO currents from (1.164).

(a) (b)
Figure 1.31 (a) Near-field aperture distribution of Ex computed via PO integration. (b) Near-field
aperture distribution of Ex computed via FFT. For both cases the geometry was set to D
= 33.5λ and f = 19.03λ and the radiated power was 100 W. The observation plane is z =
h − f = −15.34λ.

The resulting distributions in the aperture of the reflector antenna are shown in
Figures 1.31 and 1.32. In Figure 1.31(a), the aperture fields result from the
computation of the near-field integrals of (1.164), whereas the results from the FFT
computation are shown in Figure 1.31(b). The plots depict the magnitude |Ex|
over the plane at z = h − f, as illustrated in Figure 1.29. The aperture distribution
was computed by applying the iFFT to the PWS, which was obtained via the
far-field patterns. The near-field distribution shown was computed using the pattern
data in the range of θ = [0, 45°] and φ = [0, 360°], and overall the prediction of the
near fields is quite accurate even with the limited data available. The smallest
sidelobe levels included were 60 dB below the peak, which clearly was enough to
recover the near fields accurately. The data available from the program that
computes the far-field patterns provided the patterns over a rectangular θ-φ grid.
Therefore, interpolation was also used in order to convert the data to a rectangular
kx-ky grid. As discussed in Section 1.4.5, the interpolation was performed in the
θ-φ domain in order to exploit the rectangular grid available. The FFT along with
its sampling parameters (e.g., N, Δx, and so forth) predetermines the spectral
coordinates (kx, ky) at which the far-field components must be known. The
interpolation was performed by first converting the desired (kx, ky) coordinates into
(θ, φ) locations via (1.27). Bilinear interpolation was subsequently employed to
compute the electric fields at the desired (θ, φ) locations.
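The two interpolation steps can be sketched as follows. This is an illustration, not the authors' code: it assumes that (1.27) is the usual mapping kx = k sin θ cos φ, ky = k sin θ sin φ, that the pattern is stored as `grid[i][j]` at θ = i·Δθ and φ = j·Δφ with φ covering [0°, 360°), and the function names are hypothetical.

```python
import math

def spectral_to_angles(kx_over_k, ky_over_k):
    """Map a spectral point to (theta, phi) in degrees; None outside the visible region."""
    s = math.hypot(kx_over_k, ky_over_k)  # equals sin(theta)
    if s > 1.0:
        return None
    theta = math.degrees(math.asin(s))
    phi = math.degrees(math.atan2(ky_over_k, kx_over_k)) % 360.0
    return theta, phi

def bilinear(grid, dtheta, dphi, theta, phi):
    """Bilinear interpolation of grid[i][j] sampled at (i*dtheta, j*dphi) in degrees."""
    n_theta, n_phi = len(grid), len(grid[0])
    i = min(int(theta / dtheta), n_theta - 2)
    j = int(phi / dphi) % n_phi
    j1 = (j + 1) % n_phi  # phi wraps around at 360 degrees
    u = theta / dtheta - i
    v = phi / dphi - int(phi / dphi)
    return ((1 - u) * (1 - v) * grid[i][j] + u * (1 - v) * grid[i + 1][j]
            + (1 - u) * v * grid[i][j1] + u * v * grid[i + 1][j1])

# Toy pattern that varies linearly in theta: value = 2 * theta (theta in degrees).
dtheta, dphi = 0.5, 45.0
grid = [[2.0 * i * dtheta for _ in range(8)] for i in range(5)]
angles = spectral_to_angles(0.0, -0.02)
print(angles, bilinear(grid, dtheta, dphi, angles[0], angles[1]))
```

Because the grid entries may be complex, the same routine can interpolate the phasor field components directly.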

(a) (b)
Figure 1.32 (a) Magnitude of Ex along the x-axis (y = 0) for z = h − f = −15.34λ compared between
the FFT and PO integration procedures. (b) Magnitude of Ex along the y-axis (x = 0) for
z = h − f = −15.34λ compared between the FFT and PO integration procedures. For both
cases the geometry was set to D = 33.5λ and f = 19.03λ and the radiated power was
100 W.

Some interesting differences can be observed between the two plots and
Figure 1.32 highlights some of those features. In Figure 1.32(a), the Ex magnitude
is plotted along the x-axis, i.e. y = 0, in which both numerical procedures yield
excellent agreement. Some slight differences between the FFT approach and the
PO integration (exact) can be observed, such as the ripple and the rolloff of the
fields outside the aperture. In spite of this, the agreement between the PO
integration and the FFT overall is notable. Both approaches take totally different
paths in generating the near-fields and yet arrive at almost identical solutions. The
results in the z = 10λ and 100λ cases also demonstrate noteworthy agreement.

(a) (b)
Figure 1.33 (a) Near-field distribution of Ex computed via PO integration. (b) Near-field distribution
of Ex computed via FFT. For both cases the geometry was set to D = 33.5λ and f =
19.03λ and the radiated power was 100 W. The observation plane is z = 10λ.

(a) (b)

Figure 1.34 (a) Magnitude of Ex along the x-axis (y = 0) for z = 10λ compared between the FFT and
PO integration procedures. (b) Magnitude of Ex along the y-axis (x = 0) for z = 10λ
compared between the FFT and PO integration procedures. For both cases the geometry
was set to D = 33.5λ and f = 19.03λ and the radiated power was 100 W.

It is also worth pointing out that the FFT approach provided the resulting
aperture distribution significantly faster than the PO integration approach. Using
the same computer with the same core allocation, the spectral analysis-FFT
procedure finished in roughly 9.1 seconds, including the time for interpolation. As
for the PO integration approach, the final computation time was roughly 2.91
hours, making it about 1,000 times slower than the FFT approach. The
only assumption is that the far-field patterns are already available to the FFT
procedure in order to calculate the near-field distribution. The PO integration is
performed by splitting the reflector into many small subdomains and computing their
contribution to the integral via Gauss-Legendre quadrature. The number of sections and
integration points were chosen to balance the time and accuracy of the simulation. The
computer on which these computations took place held two quad-core Intel Xeon E5420
processors with 32 GB of RAM installed. In this particular study, each algorithm was
assigned 1 core, although parallelization is a definite possibility for future
computational endeavors.

(a) (b)
Figure 1.35 (a) Near-field distribution of Ex computed via PO integration. (b) Near-field distribution
of Ex computed via FFT. For both cases the geometry was set to D = 33.5λ and f =
19.03λ and the radiated power was 100 W. The observation plane is z = 100λ.

(a) (b)
Figure 1.36 (a) Magnitude of Ex along the x-axis (y = 0) for z = 100λ compared between the FFT
and PO integration procedures. (b) Magnitude of Ex along the y-axis (x = 0) for z = 100λ
compared between the FFT and PO integration procedures. For both cases the geometry
was set to D = 33.5λ and f = 19.03λ and the radiated power was 100 W.

1.7.2 A Symmetric Reflector Antenna with an Elliptical Projected Aperture

Symmetric reflectors are not necessarily limited to having circular projected
apertures. If different beamwidths along the x-z and y-z planes are desired, then
one can elongate or shorten the aperture in one dimension, making the projected
aperture elliptical in shape, as shown in Figure 1.37. The projected aperture is
characterized by its major and minor axes a and b. By increasing one of the axes
and properly adjusting the feed, the beamwidth along the dimension corresponding
to the axis can be narrowed. As for the reflector geometry, the aperture rim no longer
lies in a plane since a ≠ b, and thus the maximum parabola heights are not equal
(i.e., hx ≠ hy). In this case, the aperture plane can be considered as
z = max(hx, hy) − f.
The ensuing simulations assumed that the feed’s far-field radiation patterns
appeared as cos^q(θ) patterns similar to the circular symmetric reflector. Thus, the
design procedure for the elliptical symmetric reflector antenna is nearly identical to
that of the circular symmetric reflector, the only differences being the choice of a
and b as well as the feed's q-values qx and qy. As an example, the geometry was
chosen as a = 16.75λ, b = 25λ, and f = 19.03λ, leading to a narrower beamwidth in
the y-z plane compared to the beamwidth in the x-z plane. The feed parameters qx
and qy were chosen in order to provide a −10-dB edge taper as closely as possible.
This was accomplished by computing the subtended angles θsx and θsy and setting
the q-values to obtain the proper feed taper, which also takes the path loss into
account. Since the aperture is longer along the y dimension in this particular
example, one can expect the qy to be smaller than the qx value. The resulting
directivity from this design was 41.209 dB, producing roughly 80% aperture
efficiency. The radiated power was set to Prad = 100 W.
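The location of the aperture plane for this geometry follows directly from the parabola heights. The sketch below assumes that a and b act as semi-axes of the projected aperture and that the rim height of a paraboloid at radius ρ is ρ²/(4f); the function name `rim_height` is hypothetical.

```python
def rim_height(semi_axis, f):
    """Height of the paraboloid rim above the vertex at radius semi_axis."""
    return semi_axis ** 2 / (4.0 * f)

a, b, f = 16.75, 25.0, 19.03  # all lengths in wavelengths
hx, hy = rim_height(a, f), rim_height(b, f)
z_aperture = max(hx, hy) - f
print(f"hx = {hx:.2f}, hy = {hy:.2f}, z = {z_aperture:.2f}")
# hx = 3.69, hy = 8.21, z = -10.82
```

The same relation with a radius of 16.75λ gives h − f = −15.34λ for the circular reflector of Section 1.7.1.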

(a) (b)
Figure 1.37 Symmetric parabolic dish antenna with an elliptical projected aperture having major and
minor axes of length a and b. (a) 3-D view, and (b) Top view.

The radiated far-field patterns were generated over the range of θ = (0, 45°)
and φ = (0, 360°), and the normalized patterns for the two principal planes (the x-z
and y-z planes) are shown in Figure 1.38. The results were found by directly
integrating the PO currents as shown in (1.164) using composite Gauss-Legendre
quadrature on small subdomains of the reflector. The far-field patterns were
computed on a regular (θ, φ) grid where Δθ = 0.1° and Δφ = 0.1°, a reasonable
choice given the beamwidths in the x-z and y-z planes of 1.8° and 1.4°,
respectively. These beamwidths can be observed in Figure 1.38. The beamwidths
are not drastically different in comparison to the previous symmetric reflector
antenna, and thus a similar far-field sampling scheme was applied. The sampling
parameters were N = M = 2,000 and Δx = Δy = λ/4, leading to a spectral sampling
period of Δkx = Δky = 0.002k. The data was interpolated to achieve the electric field
over a regular (kx, ky) grid in the same manner as the symmetric reflector with a
circular aperture.

(a) (b)
Figure 1.38 Normalized far-field patterns in the x-z (φ = 0°) and y-z (φ = 90°) planes for an elliptical
symmetric reflector antenna with a = 16.75λ, b = 25λ, and f = 19.03λ. The q-factors for
the feed were qx = 2.483 and qy = 1.24, and the patterns were generated by integrating
the PO currents using (1.164).

(a) (b)
Figure 1.39 (a) Near-field aperture distribution of |Ex| computed via PO integration. (b) Near-field
aperture distribution of |Ex| computed via FFT. For both cases the geometry was set to a
= 16.75λ, b = 25λ, and f = 19.03λ and the radiated power was 100 W. The observation
plane is at the aperture plane, located at z = hy − f = −10.82λ.

(a) (b)
Figure 1.40 (a) Magnitude of Ex along the x-axis (y = 0) for z = hy − f = −10.82λ compared between
the FFT and PO integration procedures. (b) Magnitude of Ex along the y-axis (x = 0) for
z = hy − f = −10.82λ compared between the FFT and PO integration procedures. For
both cases the geometry was set to a = 16.75λ, b = 25λ, and f = 19.03λ and the radiated
power was 100 W.


(a) (b)
Figure 1.41 (a) Near-field distribution of Ex computed via PO integration. (b) Near-field distribution
of Ex computed via FFT. For both cases the geometry was set to a = 16.75λ, b = 25λ,
and f = 19.03λ with a radiated power of 100 W. The observation plane is z = 10λ.

Figures 1.39–1.44 provide the resulting near-field distributions at several
different planes and compare the PO integration approach with the FFT approach
discussed herein. Again, excellent agreement is observed in all planes shown. For
the aperture plane distribution shown in Figures 1.39–1.40, there are some small
differences in the rolloff outside of the aperture; however, the difference is not
altogether significant. For the cases where z = 10λ and 100λ, even better
agreement is observed, where the solutions provided by both algorithms are nearly
identical. It is interesting to examine the evolution of the fields towards the
far-field distribution. For the two planes shown, the beamwidth in the x-z plane is
smaller compared to the y-z plane. However, this will change as the distance z
approaches the far-field region where the y-z plane beamwidth becomes the
smaller beamwidth as expected from antenna theory. Lastly, it should be noted that
the distribution for Ey could also be investigated, but the values are negligible
compared to those for Ex.

(a) (b)
Figure 1.42 (a) Magnitude of Ex along the x-axis (y = 0) for z = 10λ compared between the FFT and
PO integration procedures. (b) Magnitude of Ex along the y-axis (x = 0) for z = 10λ
compared between the FFT and PO integration procedures. For both cases the geometry
was set to a = 16.75λ, b = 25λ, and f = 19.03λ and the radiated power was 100 W.


(a) (b)
Figure 1.43 (a) Near-field distribution of Ex computed via PO integration. (b) Near-field distribution
of Ex computed via FFT. For both cases the geometry was set to a = 16.75λ, b = 25λ,
and f = 19.03λ and the radiated power was 100 W. The observation plane is z = 100λ.

(a) (b)
Figure 1.44 (a) Magnitude of Ex along the x-axis (y = 0) for z = 100λ compared between the FFT
and PO integration procedures. (b) Magnitude of Ex along the y-axis (x = 0) for z = 100λ
compared between the FFT and PO integration procedures. For both cases the geometry
was set to a = 16.75λ, b = 25λ, and f = 19.03λ and the radiated power was 100 W.

1.7.3 Near-Field Prediction with Only Two Pattern Cuts

In many cases, antenna designers only have knowledge of the far-field radiation
patterns in two principal planes (e.g., φ = 0°, 90°). Clearly, this does not provide
the complete set of data needed to recover the PWS in the visible region. However,
one can attempt to interpolate the patterns in φ to make a good initial prediction of
the near fields. Denoting the radiation patterns as fθ(θ), fφ(θ) for the φ = 0° cut and
gθ(θ), gφ(θ) for the φ = 90° cut, we can write a simple interpolation in the far field as

$$\mathbf{E}(r, \theta, \phi) = \frac{E_0 e^{-jkr}}{r} \left\{ \left[ f_\theta(\theta)\cos\phi + g_\theta(\theta)\sin\phi \right] \hat{\theta} + \left[ f_\phi(\theta)\cos\phi - g_\phi(\theta)\sin\phi \right] \hat{\phi} \right\} \tag{1.167}$$

where it is also assumed that the pattern functions f(θ) and g(θ) only contain the
magnitudes of the fields. If the phase information is also available, then the minus
sign becomes a plus sign in (1.167). This formulation is fairly general with respect
to the polarization of the far fields, being able to handle either x-polarized,
y-polarized, or CP with the proper insertion of phase into each component.
The reader should note that this interpolation is quite simplistic and does not
work well if the aperture distribution is not symmetric. Another major assumption
is that the main beam is centered about θ = 0. Different interpolation schemes in φ
must be utilized for more complex patterns such as scanned beams, contour beams,
and asymmetric patterns. A straightforward example of an aperture distribution
whose far fields can be interpolated using the sin(φ)/cos(φ) approach is any aperture
distribution that can be written as E(ρ′, φ′, 0) = E0 f(ρ′), which has no dependency on
φ′ and where E0 is an arbitrary vector constant.



As an example of this interpolation scheme in φ, the far-field patterns from the
circular symmetric reflector antenna are revisited, with the assumption that only
the patterns in the φ = 0°, 90° planes are known. The patterns in these planes can be
found in Figure 1.30. The reflector antenna was linearly polarized in the
x-direction, and its cross-polarization levels were extremely small in the two
principal planes (more than 100 dB below the copolar component). Therefore, both
gθ(θ) and fφ(θ) were set to zero, and the interpolation was completed by using
fθ(θ) = Eθ(θ, φ = 0°) and gφ(θ) = Eφ(θ, φ = 90°). Since the phase of the electric fields was
embedded in the pattern functions, the final equation for interpolation employed

$$\mathbf{E}(r, \theta, \phi) = \frac{E_0 e^{-jkr}}{r} \left[ \hat{\theta} f_\theta(\theta)\cos\phi + \hat{\phi}\, g_\phi(\theta)\sin\phi \right] \tag{1.168}$$

in the implementation.
Once the far fields were fully interpolated over the φ = (0, 2π) span, the next
steps to find the near fields followed the usual procedure, where the PWS was
computed using (1.64) and the iFFT applied with the proper normalization to
achieve the final near-field values. The resulting near-field distributions found
from only two planes (or cuts) are shown in Figure 1.45. Nearly identical results
were found from this procedure, thus demonstrating the power of having
knowledge of only the two principal plane far-field distributions.
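The interpolation in (1.168) amounts to weighting the two principal-plane cuts by cos φ and sin φ. The sketch below illustrates this for a hypothetical x-polarized pattern with a circularly symmetric envelope; the envelope function, the cut functions, and the name `two_cut_interpolation` are all illustrative assumptions rather than data from the reflector example.

```python
import math

def two_cut_interpolation(f_theta, g_phi, theta, phi):
    """(E_theta, E_phi) at (theta, phi) from phase-bearing principal cuts, per (1.168)."""
    return f_theta(theta) * math.cos(phi), g_phi(theta) * math.sin(phi)

def envelope(theta):
    """Hypothetical circularly symmetric pattern envelope."""
    return math.cos(theta) ** 2

def f_theta(theta):
    """E_theta in the phi = 0 cut of an x-polarized aperture pattern."""
    return envelope(theta)

def g_phi(theta):
    """E_phi in the phi = 90 degree cut (sign, i.e. phase, included)."""
    return -math.cos(theta) * envelope(theta)

theta, phi = math.radians(20.0), math.radians(45.0)
e_theta, e_phi = two_cut_interpolation(f_theta, g_phi, theta, phi)
# Exact pattern of the same form as (1.153) with the envelope substituted.
exact = (envelope(theta) * math.cos(phi),
         -math.cos(theta) * envelope(theta) * math.sin(phi))
print(e_theta, e_phi, exact)
```

For a pattern of this separable form the interpolation is exact at every φ, which is the idealized situation the circular symmetric reflector approximates.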
It is important to note that the cos/sin interpolation in φ only works well for
circularly symmetric aperture distributions, which result in roughly circularly
symmetric far-field patterns. Good performance cannot be guaranteed for all
aperture distributions in general with this interpolation. The cos/sin interpolation
was also tested on the elliptical symmetric reflector and the rectangular aperture
distribution, where the near-fields at the aperture were computed using only the
two principal planes. The results shown in Figures 1.46 and 1.47 show that decent
agreement can be obtained for the elliptical case (although there are more
noticeable discrepancies in other areas) while poor results are obtained with the
rectangular aperture distribution. Thus caution must be exercised when applying
this interpolation scheme. Since the elliptical symmetric reflector has similar
patterns throughout φ, interpolating the pattern with simple cos/sin functions works
decently; however this is not the case for the rectangular aperture distribution.
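The cos/sin interpolation discussed above can be illustrated with a short Python sketch. The function names and the Gaussian-shaped test cuts are hypothetical, chosen only for illustration, and the sign convention (E_θ proportional to cos φ, E_φ proportional to sin φ) is an assumption that makes the interpolated field reproduce the two principal-plane cuts exactly.

```python
import math

def interpolate_cut(f_theta, g_phi, theta, phi):
    """Estimate (E_theta, E_phi) at (theta, phi) from two principal-plane cuts.

    f_theta(theta): E_theta known in the phi = 0 plane (hypothetical callable)
    g_phi(theta):   E_phi known in the phi = 90 degree plane (hypothetical callable)
    The cross-polarized cuts are assumed to be zero, as for the
    x-polarized reflector described in the text.
    """
    e_theta = f_theta(theta) * math.cos(phi)
    e_phi = g_phi(theta) * math.sin(phi)
    return e_theta, e_phi

# Hypothetical principal-plane patterns (not taken from the chapter).
def f_cut(theta):
    return math.exp(-(theta / 0.10) ** 2)

def g_cut(theta):
    return math.exp(-(theta / 0.12) ** 2)

# Evaluate the interpolated pattern in the phi = 45 degree plane, which is
# where the text reports the largest deviation from the exact pattern.
for theta in (0.0, 0.05, 0.10):
    e_t, e_p = interpolate_cut(f_cut, g_cut, theta, math.pi / 4)
    print(f"theta = {theta:.2f} rad: |E| = {math.hypot(e_t, e_p):.4f}")
```

When the two cuts are identical, the interpolated magnitude is independent of φ, which is why the scheme suits circularly symmetric patterns and degrades as the two principal cuts diverge from the intermediate cuts.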
These observations can be confirmed by examining the far-field patterns for
E_θ and E_φ for the φ = 45° cut, as shown in Figure 1.48. In this figure, the far-field
patterns for the circular symmetric reflector, elliptical symmetric reflector, and the
rectangular aperture are compared between their original (exact) patterns and the
interpolated patterns using sin/cos interpolation. The φ = 45° cut is typically the
plane at which the largest discrepancies can be observed between the exact patterns
and the sin/cos interpolation. In the φ = 0°, 90° cuts, the interpolated patterns are
identical to the exact patterns due to the zeros of the sin/cos functions. Therefore,
the most informative cut to examine is the φ = 45° cut.

Figure 1.45 Magnitude of Ex compared between the FFT approach with (two cuts) and without
interpolation in φ (all cuts) for several planes for the circular symmetric reflector
antenna. For all cases the geometry was set to D = 33.5λ and f = 19.03λ and the radiated
power was 100 W. (a) Plot along the x-axis (y = 0) for z = h  f = 15.34. (b) Plot
along the y-axis (x = 0) for z = h  f = 15.34. (c) Plot along the x-axis (y = 0) for z =
h  f = 10. (d) Plot along the y-axis (x = 0) for z = h  f = 10. (e) Plot along the x-axis
(y = 0) for z = h  f = 100. (f) Plot along the y-axis (x = 0) for z = h  f = 100.

Nearly identical far-field patterns can be observed for the circular symmetric
reflector due to its aperture distribution. However, the elliptical symmetric
reflector and the rectangular aperture show some deviations from the exact
patterns. Both the main beam beamwidth and the sidelobe levels are noticeably
different in both cases. Of the two, the elliptical reflector shows better agreement with
the exact patterns in terms of both the main beam and the sidelobes. This is because
the elliptical symmetric reflector still has fairly similar sidelobe levels in the φ =
45° cut compared to the φ = 0°, 90° patterns, as shown in Figure 1.38. The exact
pattern for the rectangular aperture has significantly lower sidelobes in the φ = 45° cut
compared to the φ = 0°, 90° patterns shown in Figure 1.22, which leads to a poor
prediction by the sin/cos interpolation.

Figure 1.46 Magnitude of Ex compared between the FFT approach with (two cuts) and without
interpolation in φ (all cuts) for several planes for the elliptical symmetric reflector. For
all cases the geometry was set to a = 16.75λ, b = 25λ, and f = 19.03λ with the radiated
power as 100 W at 10 GHz. (a) Plot along the x-axis (y = 0) for z = 10.82λ. (b) Plot
along the y-axis (x = 0) for z = 10.82λ.

Figure 1.47 Magnitude of Ex compared between the FFT approach with (two cuts) and without
interpolation in φ (all cuts) for several planes from the rectangular aperture case. For all
cases the geometry was set to a = b = 33.5λ and the radiated power was 87 W at 13.4
GHz. (a) Plot along the x-axis (y = 0) for z = 0. (b) Plot along the y-axis (x = 0) for z = 0.

Figure 1.48 Normalized far-field patterns of E_θ and E_φ compared between the exact values
(computed from simulation or the exact pattern function) versus the cos/sin interpolation
in φ (two cuts) for the φ = 45° plane. (a) E_θ for the circular symmetric reflector antenna.
(b) E_φ for the circular symmetric reflector antenna. (c) E_θ for the elliptical symmetric
reflector antenna. (d) E_φ for the elliptical symmetric reflector antenna. (e) E_θ for the
rectangular aperture. (f) E_φ for the rectangular aperture. The dimensions for the circular
symmetric reflector antenna, elliptical symmetric reflector antenna, and the rectangular
aperture are the same as those listed in Figures 1.45–1.47, respectively.

Extensions can be made to asymmetrical patterns in a more general manner,
and further research is being conducted to determine their use in predicting the
near fields. One possibility is the use of spherical mode expansion in order to
interpolate the pattern about φ, but further work must be conducted in order to
validate its accuracy for the near-field application at hand.

REFERENCES

[1] P. Clemmow, The plane wave spectrum representation of electromagnetic fields, New York, NY:
Pergamon Press, Inc., 1966.
[2] R. Rudduck, D. Wu, and M. Intihar, “Near-Field Analysis by the Plane-wave Spectrum Approach,”
IEEE Transactions on Antennas and Propagation, Vol. 21, No. 2, pp. 231–234, 1973.
[3] H. Booker, and P. Clemmow, “The concept of an angular spectrum of plane waves, and its
relation to that of polar diagram and aperture distribution,” Proceedings of the IEE, Vol. 97,
No. 45, pp. 1117, 1950.
[4] G. Evans, S. Dvorak, and S. Fast, “Efficient computation of Fresnel zone fields associated with
circular apertures,” Radio Science, Vol. 29, No. 4, pp. 705–715, 1994.
[5] E. Jull, “Radiation from Apertures,” in Antenna Handbook, Vol. 2, Y. Lo and S. Lee (eds.), New
York, NY: Van Nostrand Reinhold, 1993.
[6] R. Rudduck and C. Chen, “New plane Wave Spectrum Formulations for the Near-Fields of
Circular and Strip Apertures,” IEEE Transactions on Antennas and Propagation, Vol. 24, pp.
438449, 1976.
[7] O. Iupikov, et al., “Fast and Accurate Analysis of Reflector Antennas With Phased Array Feeds
Including Multiple Reflections Between Feed and Reflector,” IEEE Transactions on Antennas and
Propagation, Vol. 62, No. 7, pp. 3450–3462, 2014.
[8] P. Beeckman, “Prediction of the Fresnel region field of a compact antenna test range with serrated
edges,” IEE Proceedings on Microwaves, Antennas and Propagation, Vol. 133, No. 2, pp.
108114, 1986.
[9] M. Gatti and Y. Rahmat-Samii, “FFT applications to plane-polar near-field antenna
measurements,” IEEE Transactions on Antennas and Propagation, Vol. 36, No. 6, pp. 781–791,
1988.
[10] J. McKay and Y. Rahmat-Samii, “Compact Range Reflector Analysis Using the Plane Wave
Spectrum Approach with an Adjustable Sampling Rate,” IEEE Transactions on Antennas and
Propagation, Vol. 39, No. 6, pp. 746–753, 1991.
[11] Y. Rahmat-Samii, “Surface Diagnosis of Large Reflector Antennas Using Microwave
Holographic Metrology: An Iterative Approach,” Radio Science, Vol. 19, No. 5, pp. 1205–1217,
1984.
[12] J. Wang, “An Examination of the Theory and Practices of Planar Near-Field Measurement,” IEEE
Transactions on Antennas and Propagation, Vol. 36, No. 6, pp. 746–753, 1988.
[13] J. Goodman, Introduction to Fourier Optics, 3rd ed., Greenwood Village, CO: Robert &
Company Publishers, 2005.
[14] C. Balanis, Advanced Engineering Electromagnetics, New York, NY: John Wiley & Sons, 2012.
[15] F. Ulaby, Fundamentals of Applied Electromagnetics, Upper Saddle River, NJ: Pearson, 2004.
[16] L. Shen and J. Kong, Applied Electromagnetism, Boston, MA: PWS Publishing, 1995.
[17] C. Balanis, Antenna Theory: Analysis and Design, New York, NY: John Wiley & Sons, 2005.
[18] D. Pozar, Microwave Engineering, New York, NY: John Wiley & Sons, 2011.
[19] A. Jerri, “The Shannon sampling theorem—Its Various Extensions and Applications: A Tutorial
Review,” Proceedings of the IEEE, Vol. 65, No. 11, pp. 1565–1596, 1977.
[20] M. Born and E. Wolf, Principles of Optics, Cambridge, UK: Cambridge University Press, 1997.
[21] D. Shepard, “A Two-Dimensional Interpolation Function for Irregularly-Spaced Data,”
Proceedings of the 1968 ACM National Conference, pp. 517–524, 1968.
[22] D. Watson and G. Philip, “Triangle Based Interpolation,” Journal of the International Association
for Mathematical Geology, Vol. 16, No. 8, pp. 779–795, 1984.
[23] “Fastest Fourier Transform in the West.” Online at http://www.fftw.org/.
[24] P. Swarztrauber, “FFTPACK.” Online at http://www.netlib.org/fftpack/.
[25] MATLAB and Statistics Toolbox Release 2012b, The MathWorks, Inc., Natick, MA.
[26] Y. Rahmat-Samii, “Useful Coordinate Transformations for Antenna Applications,” IEEE
Transactions on Antennas and Propagation, Vol. 27, No. 4, pp. 571–574, 1979.
[27] D. Duan and Y. Rahmat-Samii, “Novel Coordinate System and Rotation Transformations for
Antenna Applications,” Electromagnetics, Vol. 15, No. 1, pp. 17–40, 1995.
[28] V. Galindo-Israel and Y. Rahmat-Samii, “A New Look at Fresnel Field Computation Using the
Jacobi-Bessel Series,” IEEE Transactions on Antennas and Propagation, Vol. 29, No. 6, pp.
885898, 1981.
[29] M. Hu, “Fresnel Region Fields of Circular Aperture Antennas,” Journal of Research of the
National Bureau of Standards, Section D, Vol. 65, pp. 137–147, 1961.
[30] R. Bickmore and R. Hansen, “Antenna Power Densities in the Fresnel Region,” Proceedings of
the IRE, Vol. 47, pp. 2119–2120, 1959.
[31] Y. Rahmat-Samii, “Reflector Antennas,” in Antenna Handbook, Y. Lo and S. Lee (eds.),
Vol. 2, ch. 15, New York, NY: Van Nostrand Reinhold, 1993.
[32] D. Duan and Y. Rahmat-Samii, “A generalized diffraction synthesis technique for high
performance reflector antennas,” IEEE Transactions on Antennas and Propagation, Vol. 43,
No. 1, pp. 2740, 1995.
Chapter 2
High-Order FDTD Methods
Mohammed F. Hadi and Atef Z. Elsherbeni

The field of computational electromagnetics is concerned with numerical
simulations of wave-related components and systems that range in size from the
nanotube scale, to large machinery (aircraft and ships), to entire natural or
industrial ecosystems (weather system detection or urban wireless coverage
design). This large variation of scale and complexity often requires employing
multiple numerical techniques to work simultaneously, which is an extremely
challenging undertaking that requires in-depth knowledge of the theoretical
limitations of each involved numerical technique. The finite-difference time-domain
(FDTD) method has become, during the past two decades, the most widely used
approach for modeling electromagnetic waves in complex environments, with a
diversity of applications ranging from determining the optimal location of wireless
transmitters for personal communications, to modeling wave interaction with human
tissues and corresponding medical imaging and diagnostics, to predicting the
electromagnetic coupling in integrated circuits, to large-scale challenges such as
modeling ionospheric wave propagation, which is the subject of another chapter in
this book.
Although it is the most robust of the computational electromagnetics methods
and thus the most capable amongst them (in theory) to universally model large and
complex problems, FDTD has required and benefited from several key advances
that chipped away at the theoretical and practical limitations that affected its
accuracy and applicability to several important engineering challenges. High-order
FDTD methods represent one of those key advances, and they dramatically expand its
applicability to large-scale and multiscale problems. For small-scale problems, the
second-order differencing nature of FDTD presents no hindrance to achieving
reliable solutions in terms of field-amplitude accuracy and, to a slightly lesser
degree, phase accuracy. Small phase errors incurred in this class of problems are
due mainly to the Cartesian form of FDTD’s digital grid, which causes simulated
waves to exhibit phase velocities that vary with propagation direction with respect
to the hosting digital space. As the problem size increases (with respect to the
smallest wavelength of interest), these phase velocity errors in particular quickly
get out of hand, which forces the user to switch to ever denser and more
computationally expensive FDTD grids. This double-edged sword of addressing
larger physical models with denser digital spaces severely limits FDTD
simulations to dozens of wavelengths at best, even when using the latest and
greatest of today’s supercomputers and other hardware acceleration techniques.
A group of higher-order FDTD methods has been designed to achieve
minimal and nearly isotropic numerical phase velocity errors. Employing any of
these methods should, in principle, facilitate obtaining extremely accurate
simulated results when modeling problem scales in the thousands of wavelengths
while using relatively coarse grids relative to the largest wavelength used in the
simulation. This fantastic promise, however, never translated to wide acceptance
by the FDTD community. This is due to an unfortunate combination of (1)
inexperienced use of these methods while not fully understanding their theoretical
underpinnings, and (2) marrying them with ancillary modeling tools and practices
that were designed for and thus limited by the same anisotropic and large phase
errors as standard FDTD.
This chapter will detail the theoretical basis and analysis of a high-order
FDTD method that has received continuous development over the years and
benefited from a fully designed suite of high-order ancillary modeling tools that
matches its phase accuracy performance. These modeling tools will in turn be fully
explained and verified, paying closer attention to the more critical ones: point and
planar wave initiations, absorbing boundary conditions, and planar and curved
PEC modeling. The chapter will conclude with a brief introduction to advanced
forms of this high-order method which offer substantial performance gains at the
expense of higher complexity of implementation.

2.1 FOURTH ORDER DIFFERENCES IN FDTD DISCRETE SPACE

As a representative of high-order FDTD algorithms, the extended-stencil
second-order-in-time, fourth-order-in-space algorithm, which was first introduced in [1], is
selected for this chapter. This is the simplest of the class of extended-stencil FDTD algorithms
(where the difference operator spans more than a single FDTD cell) while still
posing the common challenge of developing ancillary modeling tools for this class.
Henceforth, this selected algorithm will be referred to as S24, while the FDTD
algorithm will strictly refer to standard, second-order differencing in both time and
space and can be referred to as S22.
Working with the Yee staggered electromagnetic discrete space [2] (see
Figure 2.1), S24 could be derived by applying fourth order differences in space and
second order differences in time to transform Maxwell's curl equations

$$\epsilon \frac{\partial \mathbf{E}}{\partial t} = \nabla \times \mathbf{H} \qquad (2.1a)$$

$$\mu \frac{\partial \mathbf{H}}{\partial t} = -\nabla \times \mathbf{E} \qquad (2.1b)$$

Figure 2.1 The building block Yee cell for most FDTD algorithm variants.

into the following discrete system of explicit equations:


1 1
n n
Ex 2  Ex 2
K a   K  

i, j , k i, j , k
 Hz
n
 Hz
n  b H n
 Hz
n 
t y  1
i, j  , k
1
i, j  , k  3y  z 3
i, j  , k
3
i, j  , k 
 2 2   2 2 
Ka  n n  Kb  n n 
  H y |i , j , k  1  H y |i , j , k  1    H y |i , j , k  3  H y |i , j , k  3  (2.2a)
z  2 2  3z  2 2 

1 1
n n
Ey 2  Ey 2
K a   K  

i, j , k i, j , k
 Hx
n
 Hx
n  b H n
 Hx
n 
t z  i, j , k 
1
i, j , k 
1  3z  x i, j, k 
3
i, j , k 
3 
 2 2   2 2 

Ka  n n  Kb  n n 
  H z |i  1 , j , k  H z |i  1 , j , k    H z |i  3 , j , k  H z |i  3 , j , k  (2.2b)
x  2 2  3x  2 2 
1 1
n n
Ez 2
i, j , k
 Ez 2
i, j , k Ka  n n  Kb  n n 
 Hy  Hy   
t

x 
1
i  , j,k
1
 3x  H y i  3 , j , k  H y i  3 , j , k 
i  , j,k
2 2   2 2 
Ka  n  K  
  H x |i , j  1 , k  H x |n 1   b  H x |n 3  H x |n 3  (2.2c)
y  2
i , j  ,k
2  3y  i , j 
2
, k i , j  ,
2 
k
86 Advanced Computational Electromagnetic Methods and Applications

1 1
n n
H x |i , j ,2k  H x |i , j ,2k Ka   Kb  
  n n
 E y |i , j , k  1  E y |i , j , k  1  
n n
 E y |i , j , k  3  E y |i , j , k  3 
t z  2 2  3z  2 2 

Ka  n n  Kb  n n 
  Ez |i , j  1 , k  Ez |i , j  1 , k    Ez |i , j  3 ,k  Ez |i , j  3 ,k  (2.2d)
y  2 2  3y  2 2 

1 1
n n
H y |i , j ,2k  H y |i , j ,2k Ka   Kb  
  n n
 Ez |i  1 , j , k  Ez |i  1 , j , k  
n n
 Ez |i  3 , j , k  Ez |i  3 , j , k 
t x  2 2  3x  2 2 

Ka  n n  Kb  n n 
  Ex |i , j , k  1  Ex |i , j , k  1    Ex |i , j , k  3  Ex |i , j , k  3  (2.2e)
z  2 2  3z  2 2 

1 1
n n
H z |i , j ,2k  H z |i , j ,2k Ka   Kb  
  n n
 Ex |i , j  1 , k  Ex |i , j  1 , k  
n n
 Ex |i , j  3 , k  Ex |i , j  3 , k 
t y  2 2  3y  2 2 

Ka  n n  Kb  n n 
  E y |i  1 , j , k  E y |i  1 , j , k    E y |i  3 , j , k  E y |i  3 , j , k  (2.2f)
x  2 2  3x  2 2 
where Δt is the temporal step, Δx, Δy, Δz are the spatial steps, and n and (i, j, k) are
the temporal and spatial indices in the 3-D FDTD grid. $K_a$ and $K_b$ will be carried
through as variables to generalize the entire treatment in this chapter to high-order
FDTD variants that use different coefficient values. For S24 in particular, their
Taylor-series-derived values are $K_a = 9/8$ and $K_b = -1/8$. It might seem
from the above equations that the E and H field values are updated at the same
time step, n. This is only a notational license taken to simplify the mathematical
derivations of the dispersion and stability analyses, and it agrees with most of the cited
literature for this chapter. In practice, the E and H field values are
updated in a leap-frog manner, as in standard FDTD.
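To make the role of the two coefficients concrete, the following Python sketch compares the standard staggered difference (S22, with K_a = 1 and K_b = 0) against the extended-stencil difference with K_a = 9/8 and K_b = -1/8 on a smooth test function. The function names and the choice of sin(x) as the test field are illustrative assumptions and are not taken from the chapter.

```python
import math

KA, KB = 9.0 / 8.0, -1.0 / 8.0   # S24 Taylor-series coefficients quoted in the text

def diff_s22(f, x, h):
    """Standard second-order staggered difference over one cell."""
    return (f(x + h / 2) - f(x - h / 2)) / h

def diff_s24(f, x, h):
    """Extended-stencil difference: Ka over one cell plus Kb over three cells."""
    return (KA * (f(x + h / 2) - f(x - h / 2)) / h
            + KB * (f(x + 3 * h / 2) - f(x - 3 * h / 2)) / (3 * h))

def errors(h, x=0.3):
    """Absolute derivative errors of both stencils for f = sin at point x."""
    exact = math.cos(x)
    return (abs(diff_s22(math.sin, x, h) - exact),
            abs(diff_s24(math.sin, x, h) - exact))

for h in (0.2, 0.1, 0.05):
    e22, e24 = errors(h)
    print(f"h = {h:5.2f}   S22 error = {e22:.3e}   S24 error = {e24:.3e}")
```

Halving h should reduce the first error by roughly a factor of 4 and the second by roughly a factor of 16, which is the behavior expected of second-order and fourth-order spatial differencing, respectively.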
The first order of business when deriving or developing a new FDTD
algorithm is to ascertain its stability limit and its dispersion relation. The latter will
govern the algorithm's numerical dispersion error bounds in the discrete space and
is vital to understand and utilize correctly when developing the various modeling
tools. One approach to derive both together is to inject the above difference
equations with the trial plane wave solution
$A \exp\left[j\left(\omega n \Delta t - \tilde{\beta}_x n_x \Delta x - \tilde{\beta}_y n_y \Delta y - \tilde{\beta}_z n_z \Delta z\right)\right]$
(where $\tilde{\beta}$ is the numerically rendered wave number by the FDTD grid)
and construct a discrete-operator system of equations [3]. Each set of difference
operators will correspond to what could be called a discrete operator. For example,
(2.2a) above will morph into

$$\frac{\epsilon E_x}{\Delta t}\left(e^{j\omega\Delta t/2} - e^{-j\omega\Delta t/2}\right) = \frac{K_a H_z}{\Delta y}\left(e^{-j\tilde{\beta}_y\Delta y/2} - e^{j\tilde{\beta}_y\Delta y/2}\right) + \frac{K_b H_z}{3\Delta y}\left(e^{-j3\tilde{\beta}_y\Delta y/2} - e^{j3\tilde{\beta}_y\Delta y/2}\right) - \frac{K_a H_y}{\Delta z}\left(e^{-j\tilde{\beta}_z\Delta z/2} - e^{j\tilde{\beta}_z\Delta z/2}\right) - \frac{K_b H_y}{3\Delta z}\left(e^{-j3\tilde{\beta}_z\Delta z/2} - e^{j3\tilde{\beta}_z\Delta z/2}\right) \qquad (2.3a)$$

$$j\epsilon E_x \frac{\sin(\omega\Delta t/2)}{\Delta t/2} = -jK_a H_z \frac{\sin(\tilde{\beta}_y\Delta y/2)}{\Delta y/2} - jK_b H_z \frac{\sin(3\tilde{\beta}_y\Delta y/2)}{3\Delta y/2} + jK_a H_y \frac{\sin(\tilde{\beta}_z\Delta z/2)}{\Delta z/2} + jK_b H_y \frac{\sin(3\tilde{\beta}_z\Delta z/2)}{3\Delta z/2} \qquad (2.3b)$$

$$\epsilon D_t E_x = -D_y H_z + D_z H_y \qquad (2.3c)$$

The other five update equations will morph into similarly succinct discrete-
operator equations, which could be grouped in matrix form

$$\begin{bmatrix} \epsilon D_t & 0 & 0 & 0 & -D_z & D_y \\ 0 & \epsilon D_t & 0 & D_z & 0 & -D_x \\ 0 & 0 & \epsilon D_t & -D_y & D_x & 0 \\ 0 & D_z & -D_y & \mu D_t & 0 & 0 \\ -D_z & 0 & D_x & 0 & \mu D_t & 0 \\ D_y & -D_x & 0 & 0 & 0 & \mu D_t \end{bmatrix} \begin{bmatrix} E_x \\ E_y \\ E_z \\ H_x \\ H_y \\ H_z \end{bmatrix} = 0 \qquad (2.4)$$

with all the discrete operators given by

$$D_t = j\,\frac{\sin(\omega\Delta t/2)}{\Delta t/2} \qquad (2.5a)$$

$$D_x = jK_a\,\frac{\sin(\tilde{\beta}_x\Delta x/2)}{\Delta x/2} + jK_b\,\frac{\sin(3\tilde{\beta}_x\Delta x/2)}{3\Delta x/2} \qquad (2.5b)$$

$$D_y = jK_a\,\frac{\sin(\tilde{\beta}_y\Delta y/2)}{\Delta y/2} + jK_b\,\frac{\sin(3\tilde{\beta}_y\Delta y/2)}{3\Delta y/2} \qquad (2.5c)$$

$$D_z = jK_a\,\frac{\sin(\tilde{\beta}_z\Delta z/2)}{\Delta z/2} + jK_b\,\frac{\sin(3\tilde{\beta}_z\Delta z/2)}{3\Delta z/2} \qquad (2.5d)$$

Setting the determinant of the above system of equations to zero will result in
the algorithm's dispersion relation

$$\mu\epsilon D_t^2 = D_x^2 + D_y^2 + D_z^2 \qquad (2.6)$$

Furthermore, setting $\omega\Delta t = \tilde{\beta}_x\Delta x = \tilde{\beta}_y\Delta y = \tilde{\beta}_z\Delta z = \pi$ would ensure that all of the
system's spatial eigenmodes are wholly contained within the smallest temporal
eigenmodes, thus ensuring algorithm stability as per eigenvalue theory [4].
Introducing these values into the dispersion relation will result in the maximum
allowable time step, beyond which the algorithm will become unstable:

$$\Delta t_{\max} = \frac{\sqrt{\mu\epsilon}}{\left(K_a - K_b/3\right)\sqrt{\dfrac{1}{\Delta x^2} + \dfrac{1}{\Delta y^2} + \dfrac{1}{\Delta z^2}}} \qquad (2.7)$$

From this point forward, and to simplify the presentation of subsequent
analyses, the spatial discrete steps will be assumed uniform, $\Delta x = \Delta y = \Delta z = h$,
which would reduce the above time step restriction to the more familiar

$$\Delta t_{\max} = \frac{h\sqrt{\mu\epsilon}}{\sqrt{3}} \cdot \frac{1}{K_a - K_b/3} \qquad (2.8)$$
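As a quick numerical check of the stability limit in (2.7), the Python sketch below evaluates the maximum time step for S24 and for standard FDTD (K_a = 1, K_b = 0) in free space. The function name `dt_max`, the 1 cm cell size, and the use of the free-space speed of light are assumptions made only for this illustration.

```python
import math

C0 = 299_792_458.0  # free-space speed of light in m/s, equal to 1/sqrt(mu0 * eps0)

def dt_max(dx, dy, dz, ka=9.0 / 8.0, kb=-1.0 / 8.0, c=C0):
    """Maximum stable time step from (2.7) for a medium with wave speed c."""
    spatial = math.sqrt(1.0 / dx**2 + 1.0 / dy**2 + 1.0 / dz**2)
    return 1.0 / (c * abs(ka - kb / 3.0) * spatial)

h = 0.01                                   # hypothetical uniform cell size in meters
dt_s24 = dt_max(h, h, h)                   # S24 coefficients
dt_s22 = dt_max(h, h, h, ka=1.0, kb=0.0)   # standard FDTD limit, h / (c * sqrt(3))

print(f"S22 limit: {dt_s22:.4e} s")
print(f"S24 limit: {dt_s24:.4e} s")
print(f"ratio S24/S22: {dt_s24 / dt_s22:.6f}")
```

With the Taylor-series coefficients, K_a - K_b/3 = 7/6, so the S24 limit is 6/7 of the standard FDTD limit on the same grid.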

Contrary to FDTD, the S24 maximum time step does not coincide with
optimum phase accuracy. This behavior, which the casual user often neither expects
nor looks for, is caused by the imbalance of differencing order between the spatial
and temporal domains. The optimum time step that would minimize numerical
dispersion error could be found through detailed analysis of the dispersion relation
solutions. The following empirical formula can be used to predict this optimum
value [5]
$$\Delta t_{\text{optimum}} = \frac{\Delta t_{\max}}{0.335R + 0.40} \qquad (2.9)$$

where $R = \lambda/h$ is the grid density in FDTD cells per wavelength of interest. This
formula is independent of absolute frequency as dependence on frequency is
embedded within $\Delta t_{\max}$.
The inherent phase error in S24 (and FDTD) can be observed by comparing
the numerical wave number, $\tilde{\beta}$, derived from the dispersion relation with its exact
continuous-space value, $\beta$. This error changes with propagation direction within
the discrete space. A global measure of this error that accounts for all propagation
directions can be constructed as

$$\Phi = \frac{1}{4\pi}\int_0^{2\pi}\!\!\int_0^{\pi}\left[\frac{\beta - \tilde{\beta}(\theta,\phi)}{\beta}\right]^2 \sin\theta\, d\theta\, d\phi \qquad (2.10)$$

Applying this formula to FDTD returns $\Phi = 7 \times 10^{-7}$. In comparison, S24
returns $\Phi = 4 \times 10^{-11}$ when the optimum time step is used and $\Phi = 3 \times 10^{-7}$ when the
maximum time step is used, with all values computed at a grid resolution of 20
cells per wavelength. Because $\Phi$ measures the squared relative error, these numbers
translate into an ability of S24 to propagate waves over distances that are, on
average, 132 times longer than FDTD can before accumulating the same
phase error levels at this resolution. This unique attribute is the main reason for
using high-order FDTD algorithms. Using S24 carelessly, as in selecting the
maximum time step, will defeat this purpose as it would do no better in this regard
than FDTD. The following is a complete MATLAB code for computing the global
error $\Phi$:

% Computing the global phase error from (2.10) for the S24 algorithm.
% It can be used for standard FDTD by setting Ka = 1 and Kb = 0.

% dt_max/dt; vary until GlobalErr is minimized (1 selects the maximum time step)
cour = 1;
% Grid resolution in cells per minimum wavelength of interest
R = 20;
Ka = 9/8;
Kb = -1/8;
% Assumed maximum frequency of interest
f = 9e8;
w = 2 * pi * f;
% Free-space constants; multiply epso by the largest relative
% permittivity of interest
epso = 8.854e-12;
muo = 4 * pi * 1e-7;
co = 1/sqrt(muo * epso);
lambda = co/f;
ko = 2 * pi/lambda;
h = lambda/R;
% Time step for a 3-D grid, from (2.8)
% (for a 2-D grid, replace sqrt(3) with sqrt(2))
dt = h/(co * cour * sqrt(3)) * 1/abs(Ka - Kb/3);
% Temporal side of the dispersion relation (2.6), scaled by (h/2)^2
D = (h/(co * dt))^2 * (sin(w * dt/2))^2;
sumtheta = 0;
% Integrating over a periodic 1/8th of the (phi, theta) domain:
for i = 0 : 50
    theta = i/50 * pi/2;
    sumphi = 0;
    for j = 0 : 50
        phi = j/50 * pi/2;
        A = h * sin(theta) * cos(phi)/2;
        B = h * sin(theta) * sin(phi)/2;
        C = h * cos(theta)/2;
        % Computing the numerical wavenumber from the dispersion
        % relation by Newton iteration, starting from the exact value:
        k = ko;
        oldk = 0;
        while (abs(k - oldk) >= 1e-12)
            oldk = k;
            fun = (Ka * sin(k * A) + Kb/3 * sin(3 * k * A))^2 ...
                + (Ka * sin(k * B) + Kb/3 * sin(3 * k * B))^2 ...
                + (Ka * sin(k * C) + Kb/3 * sin(3 * k * C))^2 - D;
            dfun = 2 * (Ka * sin(k * A) + Kb/3 * sin(3 * k * A)) ...
                * (Ka * A * cos(k * A) + Kb * A * cos(3 * k * A)) ...
                + 2 * (Ka * sin(k * B) + Kb/3 * sin(3 * k * B)) ...
                * (Ka * B * cos(k * B) + Kb * B * cos(3 * k * B)) ...
                + 2 * (Ka * sin(k * C) + Kb/3 * sin(3 * k * C)) ...
                * (Ka * C * cos(k * C) + Kb * C * cos(3 * k * C));
            k = k - fun/dfun;
        end
        sumphi = sumphi + ((ko - k)/ko)^2;
    end
    sumtheta = sumtheta + sumphi * sin(theta);
end
% Average of the weighted squared error over the 51 x 51 samples
GlobalErr = sumtheta/(51 * 51)
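The Newton iteration at the heart of the listing above can also be exercised for a single propagation direction. The Python sketch below does so in normalized units (cell size h = 1 and wave speed c = 1, so one wavelength spans R cells). The function name and the normalization are assumptions made for illustration; the dispersion relation and its derivative follow the MATLAB listing term by term.

```python
import math

def relative_phase_error(R, theta, phi, ka=9.0 / 8.0, kb=-1.0 / 8.0, dt_ratio=1.0):
    """Return (k0 - k)/k0 for one propagation direction (theta, phi).

    Normalized units are assumed: h = 1 and c = 1. dt_ratio is dt_max/dt,
    so dt_ratio = 1 selects the maximum stable time step of (2.8).
    """
    k0 = 2.0 * math.pi / R                        # exact wavenumber (omega = k0 when c = 1)
    dt = 1.0 / (dt_ratio * math.sqrt(3.0) * abs(ka - kb / 3.0))
    D = (math.sin(k0 * dt / 2.0) / dt) ** 2       # temporal side of (2.6), times (h/2)^2
    # Half-cell projections of the propagation direction on x, y, and z
    a = (math.sin(theta) * math.cos(phi) / 2.0,
         math.sin(theta) * math.sin(phi) / 2.0,
         math.cos(theta) / 2.0)
    k = k0
    for _ in range(50):                           # Newton iteration with an iteration cap
        s = [ka * math.sin(k * ai) + kb / 3.0 * math.sin(3.0 * k * ai) for ai in a]
        ds = [ai * (ka * math.cos(k * ai) + kb * math.cos(3.0 * k * ai)) for ai in a]
        fun = sum(si * si for si in s) - D
        dfun = 2.0 * sum(si * dsi for si, dsi in zip(s, ds))
        step = fun / dfun
        k -= step
        if abs(step) < 1e-14:
            break
    return (k0 - k) / k0

# Propagation along a grid axis at 20 cells per wavelength.
axis = (math.pi / 2.0, 0.0)
print("S22, maximum time step:", relative_phase_error(20, *axis, ka=1.0, kb=0.0))
print("S24, dt_max/dt = 7.28: ", relative_phase_error(20, *axis, dt_ratio=7.28))
```

The value 7.28 is the time-step ratio used in the chapter's second listing. For standard FDTD at its maximum time step, the error vanishes along the grid diagonal, which provides a convenient sanity check of the solver.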

2.2 SEAMLESS HYBRID S24/FDTD SIMULATIONS

There are many situations when a user desires to implement S24 in a hybrid
simulation with FDTD. An example of such a situation would be modeling the
vicinity of perfect electric conductor (PEC) boundaries or absorbing boundary
layers with regular FDTD in an otherwise global high-order implementation. A
wave that traverses a virtual boundary in an FDTD grid between two regions, one
updated with S24 and another updated with FDTD, would encounter a numerical
impedance mismatch. As in the theory of planar interfaces in continuous media, this
mismatch would cause total wave reflections or surface waves if the wave angle of
incidence upon the virtual boundary is steep enough. The reflection coefficient of
such an interface could be accurately predicted by the following formula [6]

$$\Gamma = \frac{1 - \dfrac{\cos\phi_2^P \cos(\tilde{\beta}_{2x} h/2)}{\cos\phi_1^P \cos(\tilde{\beta}_{1x} h/2)}}{1 + \dfrac{\cos\phi_2^P \cos(\tilde{\beta}_{2x} h/2)}{\cos\phi_1^P \cos(\tilde{\beta}_{1x} h/2)}} \qquad (2.11)$$

where

$$\phi^P = \tan^{-1}\frac{D_y}{D_x} \qquad (2.12)$$

assuming the plane of incidence coincides with the x-y plane and the virtual
interface is along the y-axis. Applying this formula starts with specifying a value
for the incidence angle from medium 1 (S24) into medium 2 (S22). The x- and y-
components of $\tilde{\beta}_1$ are then computed using the dispersion relation in medium 1.
$\tilde{\beta}_{2x}$ is computed next after enforcing $\tilde{\beta}_{2y} = \tilde{\beta}_{1y}$ at the interface, using the
dispersion relation of medium 2. Both $\phi^P$ values are then computed to eventually
yield the reflection coefficient, $\Gamma$. The necessary dispersion relation is based on
(2.6) for S24. The same relation could be used for FDTD after setting $K_a = 1$ and
$K_b = 0$. At a typical 20 cells per wavelength resolution, the reflection coefficient
affecting a wave transiting from an S24 medium to an FDTD medium maintains
levels below −60 dB for all incidence angles from normal incidence to 45°. As the
incidence angle grows steeper, however, the reflection coefficient will rise rapidly
to total reflection territory. This can introduce serious simulation errors in wave
resonance applications or where the virtual interface spans multiple wavelengths.
The following is a complete MATLAB code that computes the reflection
coefficient, $\Gamma$, across an S24/S22 interface:

% Computing the reflection coefficient across an S24/S22
% interface within an FDTD grid. Wave assumed impinging from
% the left S24 space onto the right S22 space.

% Grid resolution in cells per minimum wavelength of interest
R = 20;
% dt_max/dt_opt from (2.9)
cour = 7.28;
Ka = 9/8;
Kb = -1/8;
% Assumed maximum frequency of interest
f = 1e9;
w = 2 * pi * f;
epso = 8.854e-12;
muo = 4 * pi * 1e-7;
co = 1/sqrt(muo * epso);
lambda = co/f;
ko = 2 * pi/lambda;
h = lambda/R;
dt = h/(co * cour * sqrt(3)) * 1/abs(Ka - Kb/3);
% Incidence propagation angle vector (in degrees)
tvec = [];
% Reflection coefficient vector in dB
S24vec = [];
% Transmitted propagation angle
phit = [];
% Incidence polarization angle
phiPvec = [];
% Transmitted polarization angle
phitPvec = [];
% Incidence propagation angle
phi = 0 : pi/2/200 : pi/2;
phi = phi';
for j = 1 : 201
    A = h * cos(phi(j))/2;
    B = h * sin(phi(j))/2;
    C = 0;
    % Computing the numerical wavenumber from the S24
    % dispersion relation (Newton iteration):
    D = (h/(co * dt))^2 * (sin(w * dt/2))^2;
    k = ko;
    oldk = 0;
    while (abs(k - oldk) >= 1e-12)
        oldk = k;
        fun = (Ka * sin(k * A) + Kb/3 * sin(3 * k * A))^2 ...
            + (Ka * sin(k * B) + Kb/3 * sin(3 * k * B))^2 - D;
        dfun = 2 * (Ka * sin(k * A) + Kb/3 * sin(3 * k * A)) ...
            * (Ka * A * cos(k * A) + Kb * A * cos(3 * k * A)) ...
            + 2 * (Ka * sin(k * B) + Kb/3 * sin(3 * k * B)) ...
            * (Ka * B * cos(k * B) + Kb * B * cos(3 * k * B));
        k = k - fun/dfun;
    end
    Dx1 = Ka * sin(k * A)/(h/2) + Kb * sin(3 * k * A)/(3 * h/2);
    Dy1 = Ka * sin(k * B)/(h/2) + Kb * sin(3 * k * B)/(3 * h/2);
    % Tangential wavenumber, assumed the same across the interface
    % as per boundary conditions
    ky = k * sin(phi(j));
    % Computing the numerical wavenumber from the S22
    % dispersion relation:
    kx = 2/h * asin(sqrt(D - (sin(ky * h/2))^2));
    Dx2 = sin(kx * h/2)/(h/2);
    Dy2 = sin(ky * h/2)/(h/2);
    phit = [phit; atan(ky/kx)];
    phiP = atan(Dy1/Dx1);
    phitP = atan(Dy2/Dx2);
    phiPvec = [phiPvec; phiP];
    phitPvec = [phitPvec; phitP];
    kappa = cos(phitP) * cos(h/2 * kx) ...
        /(cos(phiP) * cos(h/2 * (k * cos(phi(j)))));
    Gamma = (1 - kappa)/(1 + kappa);
    tvec = [tvec; phi(j) * 180/pi];
    S24vec = [S24vec; 20 * log10(abs(Gamma))];
end
plot(tvec, S24vec), ...
xlabel('Incidence Angle (in degrees)'), ...
ylabel('Numerical Reflection Coefficient (in dB)'), ...
axis([0 90 -130 0])

Figure 2.2 Collapsing S24 into FDTD (S22) normally at planar boundaries while maintaining the
high phase accuracy of S24 differencing along the transverse plane. (Diagram labels: PEC
boundary; x and y axes.)

Figure 2.3 Numerical reflection coefficient off an S24/S22 planar interface. (Plot of reflection
coefficient in dB versus incidence angle in degrees, with and without phase-matching.)

One way to mitigate this issue is to adjust the interfacing algorithms such that
their tangential numerical wave numbers are identical. In the example above, this
could be accomplished by modifying medium 2 with S24 differencing along the
y-oriented interface, while maintaining FDTD differencing along the normal x-axis
to facilitate dealing with planar PEC boundaries (see Figure 2.2). Implementing
this seamless hybrid approach will ensure that normal incidence reflection errors
will be the upper error bounds for all wave incidence angles upon the
cross-algorithm interface. Figure 2.3 demonstrates the effect of this phase-matching on
cross-algorithm spurious reflections as explained here.

2.3 ABSORBING BOUNDARY CONDITIONS

Applying any of the perfectly matched layer (PML) absorbing boundary conditions
for S24 is basically the same as with FDTD, since both are using the same
temporal differencing order. The established empirical formulas in the literature
for the split-field or uni-axial PML forms equally apply and provide accurate
values for optimum PML parameters. For simulations where there is enough
separation between scatterers and PML regions to ensure little or no steeply
impinging waves on the PML boundaries, regular PML will perform wonderfully,
and there will be no added benefit from using the convolutional PML (CPML).
There are situations, however, when PML regions need to stay in close proximity
to large scatterers due to lack of computing resources and hence, steep wave
incidence and even wave evanescence cannot be avoided. CPML is mandatory for
such situations to effectively absorb all outgoing energies. Furthermore, extreme
care is required when selecting the optimum values for all of CPML’s six
parameters. The above-mentioned empirical formulas are of little use in such
situations, so the user should consider exhaustive-search optimization to find these
optimum parameters. This means running the full-size model dozens and
often hundreds of times and comparing the results with a much larger reference
simulation, a brute force and extremely time-consuming task even for relatively
small simulations.
This approach is impractical for electrically large problems that usually call
for the use of high-order FDTD algorithms such as S24. For such situations, a
direct optimization approach that avoids multiple simulation runs is required. Hadi
recently presented such an approach for FDTD and high-order FDTD algorithms
[7]. The mathematical manipulations in that reference cannot be summarized here
without losing clarity, and the reader is referred to Section III there with the sole
change of redefining Equation (21) in reference [7] to become $K_y = K_b/3$. While
implementing the procedure there is an involved process, it will guarantee
optimum CPML parameters for various situations in the span of a few minutes.
The following two MATLAB program listings work together, using functions from
MathWorks' Global Optimization Toolbox, to compute optimum values of the
CPML parameters: $\sigma_{\max}$, $n_\sigma$, $\kappa_{\max}$, $n_\kappa$, $\alpha_{\max}$, and $n_\alpha$. Computations account for large
scatterers located in very close proximity to the CPML boundary through
introducing an evanescence variable, $\cosh(\chi)$, as a function of the largest scatterer
dimension ($w_{\max}$) and minimum simulation frequency ($\beta_{\min}$) [8]:

$$\cosh(\chi) = \sqrt{1 + \left(\frac{1}{\beta_{\min} w_{\max}}\right)^2} \qquad (2.13)$$

% Optimization calling program for predicting CPML parameters
% around a metal plate scatterer
% format long

v = [10 3 10 0.3];
% [sigmax, nsig = nkap, kapmax, amax], na = 1 (initial guess)
A = [-1 0 0 0; 0 -1 0 0; 0 0 -1 0; 0 0 0 -1];
b = [0; 1; 1; 0];
upper = [100 10 100 100];
lower = [0 1 1 0];
opts = optimset('Algorithm', 'active-set', 'TolX', 0.0001, ...
    'TolFun', 0.0001, 'MaxIter', 2000, 'MaxFunEvals', 2000);
gs = GlobalSearch('Display', 'iter');
problem = createOptimProblem('fmincon', 'x0', v, ...
    'objective', @S24PlateThPars, 'Aineq', A, 'bineq', b, ...
    'ub', upper, 'lb', lower, 'options', opts);
[xming, fming, flagg, outptg, manyminsg] = run(gs, problem);
disp('Optimum [sigmax nsig = nkap kapmax amax na] values:')
Opt_CPML = [xming 1]
disp('Max Refl. Coeff. (in dB) across desired CPML incidence angles:')
Gamma_Max = fming

% Program called by the optimization routine
function GError = S24PlateThPars(v)
sigxmax = v(1);
nsig = v(2);
kapxmax = v(3);
nkap = v(2);
axmax = v(4);
na = 1;
% Optimization is performed for the entire range (0 –
% max_incidence)
% Max incidence angle on CPML layer, <=88 degrees
max_incidence = 85;
j = sqrt(-1);
Ka = 9/8;
Kb = -1/8;
KKy = Kb/3;
% Design frequency or where most scattered energy is expected
f = 1e9;
w = 2 * pi * f;
epso = 8.854e-12;
muo = 4 * pi * 1e-7;
co = 1/sqrt(muo * epso);
ko = w/co;
lambda = co/f;
% Grid Resolution in cells per minimum wavelength of interest
R = 20;
% Number of CPML layers
N = 10;
h = lambda/R;
% Max dt is used
dt = h/(co * sqrt(3)) * 1/abs(Ka - Kb/3);
thvec = 0 : 1 : 90;
thvec = thvec(1 : length(thvec) - 1);
Gammavec = [];
% Assuming a scatterer with 10 cm largest dimension
% in close proximity to CPML boundary
scat = 0.1;
% Assuming minimum freq. of interest is 1/2 design freq.
fmin = f/2;
kmin = 2 * pi * fmin/co;
% Set chi=0 if no evanescence is expected
chi = acosh(1 + (1/(kmin * scat))^2);
% Sweeping across all incidence angles, 0 – 90 degrees:

for i = 1 : length(thvec)
th = thvec(i) * pi/180;

% Computing the wave number outside the CPML region,


% assuming an impinging evanescent wave:
C = cosh(chi) * cos(th) - j*sinh(chi) * sin(th);
S = cosh(chi) * sin(th) + j*sinh(chi) * cos(th);
k1 = ko;
oldk = 0;
Dtx1 = sin(w * dt/2)/(dt/2);
Dty = Dtx1;

while (abs(k1 - oldk) >= 1e-12)
    oldk = k1;
    Dx1 = Ka * sin(k1 * C * h/2)/(h/2) + Kb * sin(3 * k1 * ...
        C * h/2)/(3 * h/2);
    Dy = Ka * sin(k1 * S * h/2)/(h/2) + Kb * sin(3 * k1 * S ...
        * h/2)/(3 * h/2);
    dDx1 = Ka * C * cos(k1 * C * h/2) + Kb * C * cos(3 * k1 ...
        * C * h/2);
    dDy = Ka * S * cos(k1 * S * h/2) + Kb * S * cos(3 * k1 ...
        * S * h/2);
    fun = (Dx1/Dtx1)^2 + (Dy/Dty)^2 - muo * epso;
    dfun = 2 * Dx1 * dDx1/Dtx1^2 + 2 * Dy * dDy/Dty^2;
    k1 = k1 - fun/dfun;
end

% Setting up the equation parameters that govern inter-reflections
% outside and within CPML layers [12]

kx1 = k1 * C;
C = co * Dx1/Dtx1;
alpha = co * dt/h/C;
D = zeros(2 * N + 2, 1);
for n = 4 : 2 * N + 2
sigx = sigxmax * ((n - 3)/(2 * N))^nsig;
kapx = 1 + (kapxmax - 1) * ((n - 3)/(2 * N))^nkap;
alphx = axmax * ((2 * N - (n - 3))/(2 * N))^na;
Ax = 1;
px = exp(-(sigx/kapx + alphx) * dt/epso);
if sigx == 0,
qx = 0;
else
qx = sigx * (px-1)/(kapx * (sigx + alphx * kapx));
end
Bx = 1/kapx + qx/(1 - px * exp(-j * w * dt));
Omx = (exp(j * w * dt/2) - Ax * exp(-j * w * dt/2))/(j * 2 * Bx);
D(n) = 1/(2 * j * Omx);
end

D(1) = 1/(2 * j * sin(w * dt/2));
D(2) = D(1);
D(3) = D(1);
U1 = 1 + alpha * D(1) * (Ka * exp(-j * kx1 * h/2) + KKy * ...
    exp(-j * 3 * kx1 * h/2));
V1 = -1 + alpha * D(1) * (Ka * exp(j * kx1 * h/2) + KKy ...
    * exp(j * 3 * kx1 * h/2));
U2 = -alpha * D(2) * (Ka - KKy * exp(-j * kx1 * h));
V2 = alpha * D(2) * (Ka + KKy * exp(j * kx1 * h));
U3 = alpha * D(3) * KKy * exp(-j * kx1 * h/2);
V3 = alpha * D(3) * KKy * exp(j * kx1 * h/2);
U4 = -alpha * D(4) * KKy;
V4 = alpha * D(4) * KKy;
M = zeros(2 * N + 2, 2 * N + 2);
B = zeros(2 * N + 2, 1);
M(1, 1) = U1;
M(1, 2) = alpha * D(1) * Ka;
M(1, 4) = alpha * D(1) * KKy;
B(1) = V1;
M(2, 1) = U2;
M(2, 2) = 1;
M(2, 3) = alpha * D(2) * Ka;
M(2, 5) = alpha * D(2) * KKy;
B(2) = V2;
M(3, 1) = U3;
M(3, 2) = -alpha * D(3) * Ka;
M(3, 3) = 1;
M(3, 4) = alpha * D(3) * Ka;
M(3, 6) = alpha * D(3) * KKy;
B(3) = V3;
M(4, 1) = U4;
M(4, 3) = -alpha * D(4) * Ka;
M(4, 4) = 1;
M(4, 5) = alpha * D(4) * Ka;
M(4, 7) = alpha * D(4) * KKy;
B(4) = V4;
M(2 * N + 2, 2 * N + 1) = -alpha * D(2 * N + 2);
M(2 * N + 2, 2 * N + 2) = 1;
M(2 * N + 1, 2 * N) = -alpha * D(2 * N + 1);
M(2 * N + 1, 2 * N + 1) = 1;
M(2 * N + 1, 2 * N + 2) = alpha * D(2 * N + 1);
M(2 * N, 2 * N - 3) = -alpha * D(2 * N) * KKy;
M(2 * N, 2 * N - 1) = -alpha * D(2 * N) * Ka;
M(2 * N, 2 * N) = 1;
M(2 * N, 2 * N + 1) = alpha * D(2 * N) * Ka;
for n = 5 : 2 * N - 1
M(n, n - 3) = -alpha * D(n) * KKy;
M(n, n - 1) = -alpha * D(n) * Ka;
M(n, n) = 1;
M(n, n + 1) = -M(n, n - 1);
M(n, n + 3) = -M(n, n - 3);
end

% Solving the system of equations for the multilayer
% reflection coefficient. Only the first value of the Gamma vector
% is of interest: the reflection coefficient at the front
% face of the CPML region
Gamma = M\B;
Gammavec = [Gammavec; 20 * log10(abs(Gamma(1)))];
end

% Optimization over the CPML incidence angle range, up to
% max_incidence:
GError = max(Gammavec(1 : max_incidence));
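
The Newton-Raphson step buried inside the listing above can be easier to follow in isolation. The following is a minimal sketch, not part of the original program, that solves the same 2-D S24 dispersion relation for the numerical wavenumber at a single propagation angle. It assumes a lossless medium with no evanescence (chi = 0); the 30-degree angle, the iteration cap, and the relative stopping tolerance are illustrative choices rather than values from the text.

```matlab
% Sketch: Newton-Raphson solve for the S24 numerical wavenumber at one angle.
Ka = 9/8;  Kb = -1/8;                  % S24 coefficients, as in the listing
f  = 1e9;  w = 2*pi*f;                 % design frequency
epso = 8.854e-12;  muo = 4*pi*1e-7;
co = 1/sqrt(muo*epso);  ko = w/co;     % exact free-space wavenumber
R  = 20;   h = (co/f)/R;               % 20 cells per wavelength
dt = h/(co*sqrt(3))/abs(Ka - Kb/3);    % maximum time step, as in the listing
th = 30*pi/180;                        % assumed propagation angle
C  = cos(th);  S = sin(th);
Dt = sin(w*dt/2)/(dt/2);               % temporal difference operator
k1 = ko;                               % initial guess
for iter = 1:50                        % iteration cap is an added safeguard
    Dx  = Ka*sin(k1*C*h/2)/(h/2) + Kb*sin(3*k1*C*h/2)/(3*h/2);
    Dy  = Ka*sin(k1*S*h/2)/(h/2) + Kb*sin(3*k1*S*h/2)/(3*h/2);
    dDx = Ka*C*cos(k1*C*h/2) + Kb*C*cos(3*k1*C*h/2);
    dDy = Ka*S*cos(k1*S*h/2) + Kb*S*cos(3*k1*S*h/2);
    fun  = (Dx/Dt)^2 + (Dy/Dt)^2 - muo*epso;   % dispersion residual
    dfun = (2*Dx*dDx + 2*Dy*dDy)/Dt^2;         % derivative with respect to k1
    step = fun/dfun;
    k1 = k1 - step;
    if abs(step) < 1e-12*ko, break; end
end
phase_err = k1/ko - 1;                 % relative numerical phase error
fprintf('k_num/k_exact - 1 = %.3e\n', phase_err);
```

With the maximum time step used here, the temporal error dominates, so the numerical wavenumber comes out slightly below the exact one.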

2.4 POINT CURRENT AND FIELD SOURCES

Injecting point sources in high-order FDTD follows the same guidelines as for
FDTD. Hard sources such as current sources representing antennas and input
probes are injected by simply replacing the update equation at the source location
with the time-varying source function. Injecting soft (field) sources involves
adding the source function to the existing update equation at the source location.
The propagated waveform due to a soft source differs slightly from the intended
source function. In this regard, the FDTD discrete system of equations acts as a
pseudo-circuit with its own impulse response, which reshapes the injected
soft source function and propagates a slightly modified waveform. This effect is a
function of the Yee grid parameters as well as of the implemented FDTD
algorithm parameters. To counteract this grid/algorithm effect on the desired field
injection, the grid/algorithm impulse response, $h[n]$, needs to be measured using a
matching grid/algorithm simulation that is unbounded and populated
homogeneously with the same medium hosting the source location [9]. The
impulse response is then stored and reused in the actual simulation run by
convolving it with the field source function of choice, $f[n]$:

$$E_z|^{n}_{i_s, j_s, k_s} = \text{update equation} + f|^{n} - \sum_{l=0}^{n-1} h[n-l]\, f|^{l+1} \qquad (2.14)$$

Alternatively, instead of storing banks of impulse response measurements that
change with every grid parameter and algorithm variation, these measurements
could be used to construct fifth-order infinite impulse response (IIR) filters that are
much easier to archive and disseminate among collaborating research teams [10].
More critically, obtaining impulse response measurements that are long enough to
encompass the entire simulation run can quickly become prohibitive due to
memory and run time limitations imposed on the unbounded reference simulation.
However, those IIR filters could be reliably constructed using impulse response
measurements that are only a few hundred time steps long.
This process starts by constructing the IIR filter, $H(z)$, from the collected time
measurements using, for example, MATLAB's prony function:

$$H(z) = \frac{\sum_{k=1}^{5} b_k z^{-k}}{1 + \sum_{k=1}^{5} a_k z^{-k}} \qquad (2.15)$$

This filter can then be used in the actual simulation to generate a synthesized
impulse response on the fly

$$h_{IIR}[n] = \sum_{k=1}^{5} \left( b_k\, x[n-k] - a_k\, h_{IIR}[n-k] \right) \qquad (2.16)$$

where

$$x[n] = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases} \qquad (2.17)$$

The above choice of fifth-order filters will ensure that source injection error
levels remain below -90 dB. Lower error levels could be obtained by constructing
higher-order IIR filters.
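
As a rough illustration of the recursion in (2.16), the sketch below synthesizes an impulse response sample by sample and cross-checks it against MATLAB's built-in filter function. The coefficient values are hypothetical placeholders chosen only to give a stable filter; they do not come from any FDTD grid. The sign convention assumed is the one MATLAB uses for its b/a filter pairs (denominator 1 + sum of a_k z^-k).

```matlab
% Sketch: synthesize h_IIR[n] with the recursion (2.16) and compare with filter().
b = [0.5 -0.2 0.1 0 0];        % hypothetical b_1..b_5 (not measured data)
a = [-0.6 0.08 0 0 0];         % hypothetical a_1..a_5 (poles at 0.4 and 0.2)
N = 50;                        % number of samples to generate
x = [1; zeros(N-1, 1)];        % unit impulse of (2.17), samples n = 0..N-1
hIIR = zeros(N, 1);            % MATLAB index n+1 stores h_IIR[n]
for n = 0 : N-1
    acc = 0;
    for k = 1 : 5
        if n - k >= 0          % samples before n = 0 are zero
            acc = acc + b(k)*x(n-k+1) - a(k)*hIIR(n-k+1);
        end
    end
    hIIR(n+1) = acc;
end
% Same filter expressed in MATLAB's convention (b_0 = 0, a_0 = 1)
href = filter([0 b], [1 a], x);
max_diff = max(abs(hIIR - href));
fprintf('max |recursion - filter| = %.2e\n', max_diff);
```

Because the numerator sum starts at k = 1, the synthesized response has a zero first sample, which matches the way the recorded impulse response array is initialized in the listing that follows.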
The following is part of a MATLAB program that records the impulse
response of the FDTD grid/algorithm and computes its corresponding IIR filter.
The number of time steps needed is generally 100 to 200 to get an accurate IIR
filter.

(Grid/algorithm initializations)
% Impulse response, N is number of time steps
IR = zeros(N+1, 1);
% Initial value at source location
Ez(Is, Js, Ks) = 1;
% Time loop begins
for n = 1 : N,
(update H fields)
IR(n+1)=(Ez update equation at source location)
(update E fields)
% simulating a discrete impulse function hard source
Ez(Is, Js, Ks)=0;
end % Time loop ends
% Computes a fifth order IIR filter
[b, a] = prony(IR, 5, 5);

The following is part of a MATLAB program that utilizes the computed IIR
filter above. Grid/algorithm parameters must be the same. Number of time steps
could be smaller or much larger than the one used to derive the IIR filter.

(Grid/algorithm initializations)
(Input or read the IIR filter parameters a and b)
% Computes the filter’s impulse response
H = impz(b, a, N+1);

(Define G for n=1:N, whatever chosen source function for the simulation)
for n = 1 : N, % Time loop begins
(update H fields)
(update E fields)
% Starts the source convolution with the IIR filter's
% impulse response
conv = 0;
for m = 1 : n,
conv = conv + H(n-m+1) * G(m);
end
% A properly transparent soft (field) source for the simulation
Ez(Is, Js, Ks) = Ez(Is, Js, Ks) + G(n) - conv;
end % Time loop ends
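
To make the two-pass procedure concrete, here is a self-contained toy version on a 1-D grid. It is only a sketch under simplifying assumptions: a normalized 1-D Yee scheme with a Courant number of 0.5, PEC end walls, a directly recorded impulse response instead of a prony-fitted IIR filter, and a Gaussian source. The grid size, source node, and pulse parameters are all hypothetical. In a homogeneous grid, the compensated soft source should leave exactly the source function at the source node, which is what the last line measures.

```matlab
% Sketch: transparent soft source on a toy 1-D grid (normalized units).
Nx = 200;  isrc = 100;  N = 150;  S = 0.5;   % hypothetical grid and Courant number
% Pass 1: record the grid impulse response at the source node
Ez = zeros(Nx, 1);  Hy = zeros(Nx-1, 1);
IR = zeros(N+1, 1);
Ez(isrc) = 1;                                 % discrete impulse at the source
for n = 1 : N
    Hy = Hy + S*(Ez(2:Nx) - Ez(1:Nx-1));
    Ez(2:Nx-1) = Ez(2:Nx-1) + S*(Hy(2:Nx-1) - Hy(1:Nx-2));
    IR(n+1) = Ez(isrc);                       % value produced by the update equation
    Ez(isrc) = 0;                             % hard-source the impulse back to zero
end
% Pass 2: inject a soft source with impulse response compensation
G = exp(-((1:N)' - 40).^2/100);               % assumed Gaussian source function
Ez = zeros(Nx, 1);  Hy = zeros(Nx-1, 1);
src = zeros(N, 1);
for n = 1 : N
    Hy = Hy + S*(Ez(2:Nx) - Ez(1:Nx-1));
    Ez(2:Nx-1) = Ez(2:Nx-1) + S*(Hy(2:Nx-1) - Hy(1:Nx-2));
    csum = 0;                                 % running convolution sum
    for m = 1 : n
        csum = csum + IR(n-m+1)*G(m);
    end
    Ez(isrc) = Ez(isrc) + G(n) - csum;        % transparent soft source
    src(n) = Ez(isrc);
end
max_dev = max(abs(src - G));                  % deviation from the intended source
fprintf('max |Ez(source) - G| = %.2e\n', max_dev);
```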

2.5 PLANE WAVE SOURCES

Introducing plane wave sources into an FDTD grid for scattering-type problems is
best performed using a total-field/scattered-field (TFSF) approach [2]. This
approach has recently been perfected to produce machine-precision
accuracy [11], with the introduction of a 1-D propagator that coincides perfectly
with the main FDTD grid in terms of precise source field mapping, finite-difference
matching, and identical numerical dispersion characteristics. Precise
source mapping is accomplished through limiting the plane wave incidence angles
to rational ratios of number of FDTD cells along the y- and x-directions (assuming
the plane wave is injected within the x-y plane of the FDTD grid). For example,
instead of selecting an incidence angle of $\phi = 20^\circ$, one would choose instead
$\phi = \tan^{-1}(m_y/m_x) = \tan^{-1}(7/19) \approx 20.2^\circ$. When this choice is coupled with a cell size
of

$$\Delta r = \frac{h \cos\phi}{m_x} \qquad (2.18)$$

along the 1-D propagator, mapping source values from the propagator to the main
grid would simplify to direct substitutions that avoid error-causing interpolations
as shown in Figure 2.4. The 1-D propagator is then populated with a colocated
$(H_x^s, H_y^s)$ pair and a colocated $(E_{zx}^s, E_{zy}^s)$ pair that are staggered by a $\Delta r/2$
distance.
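
The angle selection and the propagator step in (2.18) can be sketched as a small search. The code below is only an illustration: it scans odd integer pairs, since this section requires odd values for the finite-difference matching, and keeps the pair whose angle is closest to the requested one. The search cap of 21 and the cell size are hypothetical choices; with that cap, the search reproduces the 7/19 example from the text.

```matlab
% Sketch: pick odd integers (mx, my) approximating a target incidence angle.
phi_target = 20;               % desired incidence angle in degrees
mmax = 21;                     % hypothetical cap on the integer search
best_err = inf;
for mx = 1 : 2 : mmax          % odd values only
    for my = 1 : 2 : mmax
        err = abs(atan(my/mx)*180/pi - phi_target);
        if err < best_err
            best_err = err;  mx_best = mx;  my_best = my;
        end
    end
end
phi = atan(my_best/mx_best);   % realizable incidence angle in radians
h  = 0.015;                    % hypothetical main-grid cell size in meters
dr = h*cos(phi)/mx_best;       % 1-D propagator step from (2.18)
fprintf('mx = %d, my = %d, phi = %.2f deg, dr = %.4e m\n', ...
    mx_best, my_best, phi*180/pi, dr);
```

The same step follows from the y-projection, h*sin(phi)/my, which is why no interpolation is needed when mapping propagator nodes onto the main grid.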
The finite-difference matching is accomplished through modifying the
difference operator such that every half-step in the main grid is matched with
$m_x/2$ steps for x-differencing and $m_y/2$ steps for y-differencing. The $m_x$ and
$m_y$ values need to be odd integers for proper matching. The corresponding update
equations along the 1-D propagator that will result in an identical dispersion
relation to the S24 algorithm of the main grid would then be [12]

$$\frac{E_{zx}^{s}|_{m}^{n+\frac12} - E_{zx}^{s}|_{m}^{n-\frac12}}{\Delta t} = \frac{K_a}{\epsilon h}\left( H_y^{s}|_{m+\frac{m_x}{2}}^{n} - H_y^{s}|_{m-\frac{m_x}{2}}^{n} \right) + \frac{K_b}{3\epsilon h}\left( H_y^{s}|_{m+\frac{3m_x}{2}}^{n} - H_y^{s}|_{m-\frac{3m_x}{2}}^{n} \right) \qquad (2.19a)$$

$$\frac{E_{zy}^{s}|_{m}^{n+\frac12} - E_{zy}^{s}|_{m}^{n-\frac12}}{\Delta t} = -\frac{K_a}{\epsilon h}\left( H_x^{s}|_{m+\frac{m_y}{2}}^{n} - H_x^{s}|_{m-\frac{m_y}{2}}^{n} \right) - \frac{K_b}{3\epsilon h}\left( H_x^{s}|_{m+\frac{3m_y}{2}}^{n} - H_x^{s}|_{m-\frac{3m_y}{2}}^{n} \right) \qquad (2.19b)$$

$$\frac{H_x^{s}|_{m}^{n+\frac12} - H_x^{s}|_{m}^{n-\frac12}}{\Delta t} = -\frac{K_a}{\mu h}\left( E_z^{s}|_{m+\frac{m_y}{2}}^{n} - E_z^{s}|_{m-\frac{m_y}{2}}^{n} \right) - \frac{K_b}{3\mu h}\left( E_z^{s}|_{m+\frac{3m_y}{2}}^{n} - E_z^{s}|_{m-\frac{3m_y}{2}}^{n} \right) \qquad (2.19c)$$

$$\frac{H_y^{s}|_{m}^{n+\frac12} - H_y^{s}|_{m}^{n-\frac12}}{\Delta t} = \frac{K_a}{\mu h}\left( E_z^{s}|_{m+\frac{m_x}{2}}^{n} - E_z^{s}|_{m-\frac{m_x}{2}}^{n} \right) + \frac{K_b}{3\mu h}\left( E_z^{s}|_{m+\frac{3m_x}{2}}^{n} - E_z^{s}|_{m-\frac{3m_x}{2}}^{n} \right) \qquad (2.19d)$$

where $E_z^s = E_{zx}^s + E_{zy}^s$ and $m$ is the spatial index counter along the 1-D propagator.
A few of the leading field nodes within the 1-D propagator need to be hard-sourced.
Readers are referred to [12] for one possible way of accomplishing this as well as
finer implementation details of this TFSF approach.

Figure 2.4 Mapping of source nodes from the 1-D propagator to a generalized nonuniform main grid.
No interpolation is required.

This perfect TFSF plane wave injection has also been developed for general
directions within S24 implementation upon 3-D FDTD grids [13]. This
generalization is accomplished by additionally selecting the 1-D propagator angle
off the z-axis as

$$\theta = \tan^{-1}\frac{\sqrt{m_x^2 + m_y^2}}{m_z} \qquad (2.20)$$

Furthermore, the 1-D propagator would be populated by all six field
components, with all three E field nodes colocated. The same goes for all
three H field nodes, which are staggered from the E nodes by $\Delta r/2$. The spatial
step along the 1-D propagator would be

$$\Delta r = \frac{h \cos\phi \sin\theta}{m_x} \qquad (2.21)$$
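
A short numeric check can confirm that the 3-D angle and step definitions are mutually consistent. In the sketch below, the integer triple and the cell size are hypothetical, and the y- and z-projection formulas are derived here from the geometry rather than quoted from the text. All three projections should give the same propagator step.

```matlab
% Sketch: 3-D propagator angles and step from integer cell counts.
mx = 19;  my = 7;  mz = 11;            % hypothetical integer cell counts
h  = 0.015;                            % hypothetical main-grid cell size in meters
phi   = atan(my/mx);                   % azimuth angle within the x-y plane
theta = atan(sqrt(mx^2 + my^2)/mz);    % angle off the z-axis, as in (2.20)
dr   = h*cos(phi)*sin(theta)/mx;       % propagator step, as in (2.21)
dr_y = h*sin(phi)*sin(theta)/my;       % same step from the y-projection
dr_z = h*cos(theta)/mz;                % same step from the z-projection
fprintf('theta = %.2f deg, dr = %.4e m\n', theta*180/pi, dr);
```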

2.6 PEC MODELING

There have been several approaches to modeling irregularly shaped PEC
boundaries for regular FDTD. Most of them could be adapted to work natively in
S24 simulations. Depending on the structure of the PEC object and the level of
modeling precision required, the programmer has a choice to make, trading off
modeling precision against implementation simplicity.

2.6.1 Planar PEC Boundaries

For structures that involve planar boundaries coinciding with the FDTD grid axes,
there is no need for subcell conformal modeling once the FDTD grid is
designed properly. Examples of such structures are arrays of microstrip antennas
and equipment emissions/susceptibility modeling for electromagnetic compatibility
purposes. In such cases, the approach discussed in Section 2.2 of surrounding the
PEC boundaries with a one-cell-thick layer that has regular FDTD differencing
normal to the PEC boundary and S24 differencing in the transverse directions
works well. Phase accuracy is maintained, and only a
small penalty in cross-algorithm spurious reflections is observed. This small
error is negligible compared to the inherent spurious
errors of even the most elaborate of today's PEC conformal techniques.

2.6.2 Noncritical Curved PEC Models

In some cases, modeling a PEC object is only a minor consideration with respect to
the main objective of the simulation. Examples would be PEC objects embedded in
highly lossy dielectrics, or subwavelength PEC objects, or PEC backbones of PML
absorbing boundary conditions. In such cases, it would be safe to collapse the S24
algorithm to FDTD within subregions that contain such PEC objects. Within those
subregions, conformal PEC modeling would be accomplished natively within
FDTD.

2.6.3 Critical Curved PEC Models

In many applications, however, S24 advantages need to be maintained while
modeling curved PEC boundaries. In such cases, conformal PEC modeling needs
to be implemented natively within S24. The following approach is one of
several such recently developed implementations. It is based on the Simplified
Conformal (SC) technique [14], which was later extended to high-order FDTD
[15].
As in almost all conformal PEC techniques, only the magnetic field update
equations need to be altered:

$$
\begin{aligned}
H_z|^{n+\frac12}_{i,j,k} = H_z|^{n-\frac12}_{i,j,k}
&+ \frac{K_a \Delta t}{\mu h s_a}\Big( l_{xa}|_{i,j+\frac12,k} E_x|^{n}_{i,j+\frac12,k} - l_{xa}|_{i,j-\frac12,k} E_x|^{n}_{i,j-\frac12,k} \\
&\qquad - l_{ya}|_{i+\frac12,j,k} E_y|^{n}_{i+\frac12,j,k} + l_{ya}|_{i-\frac12,j,k} E_y|^{n}_{i-\frac12,j,k} \Big) \\
&+ \frac{K_b \Delta t}{3\mu h s_b}\Big( l_{xb}|_{i,j+\frac32,k} E_x|^{n}_{i,j+\frac32,k} - l_{xb}|_{i,j-\frac32,k} E_x|^{n}_{i,j-\frac32,k} \\
&\qquad - l_{yb}|_{i+\frac32,j,k} E_y|^{n}_{i+\frac32,j,k} + l_{yb}|_{i-\frac32,j,k} E_y|^{n}_{i-\frac32,j,k} \Big)
\end{aligned}
\qquad (2.22)
$$

where $l_a$, $l_b$ are the PEC-free edge lengths normalized to $h$ and $3h$, respectively, and
$s_a$, $s_b$ are the PEC-free face areas normalized to $h^2$ and $(3h)^2$, respectively (see Figure
2.5).

Figure 2.5 Identifying PEC-free edge lengths for the SC mapping technique. The PEC-free surface
area sa is a subset of sb .

This and the matching update equations for the other magnetic field
components work well for most S24 cells encroached upon by PEC boundaries.
Numerical stability considerations dictate, however, that there is a limit to how
small the PEC-free areas could be before the onset of numerical instability. The SC
technique amends the update equations for these problematic cells by modifying
(reducing) the normalized edge lengths where needed to maintain stability:

$$l_a^{\mathrm{mod}} = \min\left(2\min(s_a),\; l_a\right) \qquad (2.23)$$

where $\min(s_a)$ refers to the smallest of the four normalized PEC-free surfaces
sharing the edge $l_a$. The same modification is applied to the outer loop edges'
lengths

$$l_b^{\mathrm{mod}} = \min\left(2\min(s_b),\; l_b\right) \qquad (2.24)$$

As an exception to the above modifiers, if the inner loop is wholly embedded in the
PEC region ($s_a = 0$), then all a and b edge lengths are set to zero to produce a
zero value for the magnetic field there ($s_a$ should be reset to unity to avoid
division by zero).
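
The edge-length modifiers in (2.23) and (2.24), together with the embedded-cell exception, can be sketched for a single edge as follows. All numerical values and variable names are hypothetical and serve only to show one case where the modifier shortens an edge, one where it leaves the edge unchanged, and the exception branch.

```matlab
% Sketch: SC edge-length modification for one inner-loop and one outer-loop edge.
la     = 0.9;                     % normalized PEC-free length of an inner-loop edge
sa_adj = [0.8 0.3 1.0 0.6];       % normalized PEC-free areas of the four faces sharing it
la_mod = min(2*min(sa_adj), la);  % (2.23): shortened to 2*0.3 = 0.6
lb     = 0.95;                    % normalized PEC-free length of an outer-loop edge
sb_adj = [0.9 0.7 1.0 0.85];      % areas of the four outer-loop faces sharing it
lb_mod = min(2*min(sb_adj), lb);  % (2.24): unchanged, since 2*0.7 > 0.95
% Exception: inner loop wholly embedded in the PEC region
sa = 0;                           % normalized PEC-free area of that inner loop
la_cell = [0.2 0.1 0 0];          % placeholder edge lengths for that cell
lb_cell = [0.5 0.4 0.3 0.2];
if sa == 0
    la_cell(:) = 0;               % zero all inner-loop edge lengths
    lb_cell(:) = 0;               % zero all outer-loop edge lengths
    sa = 1;                       % reset to unity to avoid division by zero
end
fprintf('la_mod = %.2f, lb_mod = %.2f\n', la_mod, lb_mod);
```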

2.7 ADVANCED FORMS OF HIGH-ORDER FDTD ALGORITHMS

As mentioned earlier in this chapter, S24 is one of the simpler forms of the
extended-stencil class of high-order FDTD algorithms. It is well suited
to wide-band applications, has a fairly low count of floating-point operations per
update equation, which suits it well for fine-grained graphics processor computing,
and is the best understood and most widely used high-order form in the literature. Its
main disadvantage is its need to use an optimum time step that is roughly
one-tenth of the maximum value allowed by its stability criterion. Obviously, this
cuts deep into its efficiency advantages over FDTD. Two additional variants of this
class of high-order FDTD algorithms are briefly discussed here; they
remedy this disadvantage and increase computational efficiency by orders of
magnitude at the expense of more modeling complexity.

2.7.1 The Finite Volumes-Based FV24 Algorithm

FDTD can be equally derived through applying finite differences to the differential
form of Maxwell's equations, as well as through applying finite sums to the
integral form of those equations. Most electromagnetics experts would think of
Ampere's and Faraday's laws when integral Maxwell's equations are mentioned.
However, a different and little used form of Maxwell's equations is available
which will be more useful in deriving an extremely phase-coherent high-order
FDTD algorithm [16]

$$\iiint_V \epsilon \frac{\partial \mathbf{E}}{\partial t}\, dv = \oiint_S d\mathbf{s} \times \mathbf{H} \qquad (2.25)$$

$$-\iiint_V \mu \frac{\partial \mathbf{H}}{\partial t}\, dv = \oiint_S d\mathbf{s} \times \mathbf{E} \qquad (2.26)$$

FV24 [5] applies finite-sums over two concentric surfaces surrounding the
field node of interest, with the critical advantage of including all the tangential
field nodes on the outer surface as demonstrated in Figure 2.6.

$$
\begin{aligned}
\frac{E_x|^{n+\frac12}_{i,j,k} - E_x|^{n-\frac12}_{i,j,k}}{\Delta t}
&= \frac{K_a}{\epsilon h}\Big( H_z|^{n}_{i,j+\frac12,k} - H_z|^{n}_{i,j-\frac12,k} - H_y|^{n}_{i,j,k+\frac12} + H_y|^{n}_{i,j,k-\frac12} \Big) \\
&+ \frac{K_b}{3\epsilon h}\Big( H_z|^{n}_{i,j+\frac32,k} - H_z|^{n}_{i,j-\frac32,k} - H_y|^{n}_{i,j,k+\frac32} + H_y|^{n}_{i,j,k-\frac32} \Big) \\
&+ \frac{K_c}{12\epsilon h}\Big( H_z|^{n}_{i+1,j+\frac32,k} + H_z|^{n}_{i-1,j+\frac32,k} + H_z|^{n}_{i,j+\frac32,k+1} + H_z|^{n}_{i,j+\frac32,k-1} \\
&\qquad - H_z|^{n}_{i+1,j-\frac32,k} - H_z|^{n}_{i-1,j-\frac32,k} - H_z|^{n}_{i,j-\frac32,k+1} - H_z|^{n}_{i,j-\frac32,k-1} \\
&\qquad - H_y|^{n}_{i,j+1,k+\frac32} - H_y|^{n}_{i,j-1,k+\frac32} - H_y|^{n}_{i+1,j,k+\frac32} - H_y|^{n}_{i-1,j,k+\frac32} \\
&\qquad + H_y|^{n}_{i,j+1,k-\frac32} + H_y|^{n}_{i,j-1,k-\frac32} + H_y|^{n}_{i+1,j,k-\frac32} + H_y|^{n}_{i-1,j,k-\frac32} \Big) \\
&+ \frac{K_d}{12\epsilon h}\Big( H_z|^{n}_{i+1,j+\frac32,k+1} + H_z|^{n}_{i-1,j+\frac32,k+1} + H_z|^{n}_{i+1,j+\frac32,k-1} + H_z|^{n}_{i-1,j+\frac32,k-1} \\
&\qquad - H_z|^{n}_{i+1,j-\frac32,k+1} - H_z|^{n}_{i-1,j-\frac32,k+1} - H_z|^{n}_{i+1,j-\frac32,k-1} - H_z|^{n}_{i-1,j-\frac32,k-1} \\
&\qquad - H_y|^{n}_{i+1,j+1,k+\frac32} - H_y|^{n}_{i-1,j+1,k+\frac32} - H_y|^{n}_{i+1,j-1,k+\frac32} - H_y|^{n}_{i-1,j-1,k+\frac32} \\
&\qquad + H_y|^{n}_{i+1,j+1,k-\frac32} + H_y|^{n}_{i-1,j+1,k-\frac32} + H_y|^{n}_{i+1,j-1,k-\frac32} + H_y|^{n}_{i-1,j-1,k-\frac32} \Big)
\end{aligned}
\qquad (2.27)
$$

The field nodes are grouped according to their spatial displacement from the
central field node to be updated, in this case $E_x|_{i,j,k}$. Each group is then multiplied
by its own coefficient, and they in turn are optimized through the corresponding
numerical dispersion, to yield the least global phase error as defined in equation
(2.10). The same dispersion relation in equation (2.6) applies to FV24, with the
following discrete operators, which could be derived as illustrated in Section 2.1:

$$D_x = -jK_a\frac{\sin(\tilde\beta_x h/2)}{h/2} - j\frac{\sin(3\tilde\beta_x h/2)}{3h/2}\left[ K_b + \frac{K_c}{2}\left(\cos\tilde\beta_y h + \cos\tilde\beta_z h\right) + K_d \cos\tilde\beta_y h\, \cos\tilde\beta_z h \right] \qquad (2.28a)$$

$$D_y = -jK_a\frac{\sin(\tilde\beta_y h/2)}{h/2} - j\frac{\sin(3\tilde\beta_y h/2)}{3h/2}\left[ K_b + \frac{K_c}{2}\left(\cos\tilde\beta_x h + \cos\tilde\beta_z h\right) + K_d \cos\tilde\beta_x h\, \cos\tilde\beta_z h \right] \qquad (2.28b)$$

$$D_z = -jK_a\frac{\sin(\tilde\beta_z h/2)}{h/2} - j\frac{\sin(3\tilde\beta_z h/2)}{3h/2}\left[ K_b + \frac{K_c}{2}\left(\cos\tilde\beta_x h + \cos\tilde\beta_y h\right) + K_d \cos\tilde\beta_x h\, \cos\tilde\beta_y h \right] \qquad (2.28c)$$

Figure 2.6 The extended-stencil set of field nodes used for FV24 update equations. Shaded areas are
the constant-field portions of the discrete integrals.

The maximum time step for a stable FV24 can be found, using the approach in
Section 2.1 again, to be

$$\Delta t_{\max} = \frac{h\sqrt{\mu\epsilon}}{\sqrt{3}} \cdot \frac{1}{\left| K_a - (K_b - K_c + K_d)/3 \right|} \qquad (2.29)$$

The entire FV24 formulation can be collapsed to S24 and FDTD with the
proper selection of the K-tuning parameters; setting $K_a = 9/8$, $K_b = -1/8$, $K_c = K_d = 0$
would yield S24 while setting $K_a = 1$, $K_b = K_c = K_d = 0$ would yield FDTD.
The main performance advantage of FV24 over S24 becomes apparent for
single-frequency or narrow-band applications. At an $R = 20$ cells per wavelength
grid resolution, a properly tuned FV24 algorithm is capable of incurring a global
phase error from (2.10) that is seven orders of magnitude lower than S24 at the
design frequency. To put this in perspective, the
level of grid resolution refinement required by FDTD for matching the phase
coherence of S24 and FV24 can be stated, respectively, as [5]

$$R_{FDTD} \approx R_{S24}^{2} \qquad (2.30)$$

$$R_{FDTD} \approx R_{FV24}^{3} \qquad (2.31)$$
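
A quick calculation shows how the stability limit collapses to the familiar FDTD and S24 values and what (2.30) and (2.31) imply at R = 20. This is only a sketch: the bracket with the K_c and K_d signs is written here as it follows from evaluating the operators (2.28) at the grid Nyquist limit, which is an assumption of this example, and only the K_c = K_d = 0 cases are actually evaluated.

```matlab
% Sketch: stability-limit factor of (2.29), in units of h*sqrt(mu*eps).
dt_factor = @(Ka, Kb, Kc, Kd) 1/(sqrt(3)*abs(Ka - (Kb - Kc + Kd)/3));
f_fdtd = dt_factor(1, 0, 0, 0);          % regular FDTD: 1/sqrt(3)
f_s24  = dt_factor(9/8, -1/8, 0, 0);     % S24: (6/7)/sqrt(3)
ratio  = f_s24/f_fdtd;                   % S24 limit relative to FDTD
% Equivalent FDTD resolution implied by (2.30) and (2.31) at R = 20
R = 20;
R_fdtd_for_s24  = R^2;                   % cells per wavelength to match S24
R_fdtd_for_fv24 = R^3;                   % cells per wavelength to match FV24
fprintf('S24/FDTD time step ratio = %.4f\n', ratio);
fprintf('FDTD needs R = %d (S24) or R = %d (FV24)\n', ...
    R_fdtd_for_s24, R_fdtd_for_fv24);
```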
2.7.2 High-Order Algorithms for Compact-FDTD Grids

Modeling wave propagation through electrically large waveguiding structures such
as road tunnels can be very challenging computationally if the attempt is made
with a 3-D FDTD grid, even when using high-order FDTD. For one thing, the
waveguide's cross-section can be huge by itself, and adding the third longitudinal
dimension will only exacerbate the memory capacity problem. Moreover,
articulating the waveguide's modes at near unbounded operating frequency would
require extensive run times. Fortunately, waveguide theory could be used to allow
modeling such structures using 2-D compact FDTD grids, where staggered field
nodes along the longitudinal dimension of the FDTD grid are compacted (or
colocated) within the same transverse FDTD plane as shown in Figure 2.7.
Further memory savings are realized by noting that, unlike in 3-D grids
where spatial steps are required to be related to the unbounded wavelength as in
$h = \lambda/R$, this quantity need only be related to the much larger transverse
wavelength $\lambda_T$ [17]

$$\lambda_T = \frac{\lambda}{\sqrt{1 - \left(\beta_z/\beta\right)^2}} \qquad (2.32)$$

In practical terms, regardless of the electrical size of the waveguide's cross-
section, a total grid size in the order of 20 × 20 FDTD cells is all that is required to
model it accurately, unless fine details articulation is needed. On the flip side, this
accurate modeling would require a substantial reduction in temporal steps since the
time step must still relate to the unbounded wave period. A well-designed high-
order algorithm using the compact FDTD grid could run over a hundred times
faster than regular FDTD. This time, however, high-order differencing needs to
extend to the time derivatives in Maxwell's equations.
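Starting with Maxwell's equations and replacing the spatial derivative along
the waveguide's longitudinal dimension (assumed here to be along the z-axis) with
the $-j\beta_z$ term, two decoupled sets of equations can be produced. One of these, the
more suitable to use with the grid design in Figure 2.7, is

$$\epsilon\frac{\partial E_x}{\partial t} = \frac{\partial H_z}{\partial y} - \beta_z H_y \qquad (2.33a)$$

$$\epsilon\frac{\partial E_y}{\partial t} = -\frac{\partial H_z}{\partial x} + \beta_z H_x \qquad (2.33b)$$

$$\epsilon\frac{\partial E_z}{\partial t} = \frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} \qquad (2.33c)$$

$$\mu\frac{\partial H_x}{\partial t} = -\frac{\partial E_z}{\partial y} - \beta_z E_y \qquad (2.33d)$$

$$\mu\frac{\partial H_y}{\partial t} = \frac{\partial E_z}{\partial x} + \beta_z E_x \qquad (2.33e)$$

$$\mu\frac{\partial H_z}{\partial t} = \frac{\partial E_x}{\partial y} - \frac{\partial E_y}{\partial x} \qquad (2.33f)$$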

Figure 2.7 Compact-FDTD grid for modeling wave propagation through longitudinally invariant
structures.

These equations are then discretized using fourth-order finite-differences
applied to all derivatives:

$$\epsilon\frac{\partial E_x}{\partial t} = \frac{K_a}{h}\left( H_z|^{n}_{i,j+\frac12} - H_z|^{n}_{i,j-\frac12} \right) + \frac{K_b}{3h}\left( H_z|^{n}_{i,j+\frac32} - H_z|^{n}_{i,j-\frac32} \right) - \beta_z H_y|^{n}_{i,j} \qquad (2.34a)$$

$$\epsilon\frac{\partial E_y}{\partial t} = -\frac{K_a}{h}\left( H_z|^{n}_{i+\frac12,j} - H_z|^{n}_{i-\frac12,j} \right) - \frac{K_b}{3h}\left( H_z|^{n}_{i+\frac32,j} - H_z|^{n}_{i-\frac32,j} \right) + \beta_z H_x|^{n}_{i,j} \qquad (2.34b)$$

$$\epsilon\frac{\partial E_z}{\partial t} = \frac{K_a}{h}\left( H_y|^{n}_{i+\frac12,j} - H_y|^{n}_{i-\frac12,j} - H_x|^{n}_{i,j+\frac12} + H_x|^{n}_{i,j-\frac12} \right) + \frac{K_b}{3h}\left( H_y|^{n}_{i+\frac32,j} - H_y|^{n}_{i-\frac32,j} - H_x|^{n}_{i,j+\frac32} + H_x|^{n}_{i,j-\frac32} \right) \qquad (2.34c)$$

$$\mu\frac{\partial H_x}{\partial t} = -\frac{K_a}{h}\left( E_z|^{n}_{i,j+\frac12} - E_z|^{n}_{i,j-\frac12} \right) - \frac{K_b}{3h}\left( E_z|^{n}_{i,j+\frac32} - E_z|^{n}_{i,j-\frac32} \right) - \beta_z E_y|^{n}_{i,j} \qquad (2.34d)$$

$$\mu\frac{\partial H_y}{\partial t} = \frac{K_a}{h}\left( E_z|^{n}_{i+\frac12,j} - E_z|^{n}_{i-\frac12,j} \right) + \frac{K_b}{3h}\left( E_z|^{n}_{i+\frac32,j} - E_z|^{n}_{i-\frac32,j} \right) + \beta_z E_x|^{n}_{i,j} \qquad (2.34e)$$

$$\mu\frac{\partial H_z}{\partial t} = \frac{K_a}{h}\left( E_x|^{n}_{i,j+\frac12} - E_x|^{n}_{i,j-\frac12} - E_y|^{n}_{i+\frac12,j} + E_y|^{n}_{i-\frac12,j} \right) + \frac{K_b}{3h}\left( E_x|^{n}_{i,j+\frac32} - E_x|^{n}_{i,j-\frac32} - E_y|^{n}_{i+\frac32,j} + E_y|^{n}_{i-\frac32,j} \right) \qquad (2.34f)$$

The fourth-order time finite-difference needs to be of the backward-difference
type to maintain numerical stability [18]:

$$\frac{\partial E_x}{\partial t} \approx \frac{1}{24\Delta t}\left( 22E_x|^{n+\frac12}_{i,j} - 17E_x|^{n-\frac12}_{i,j} - 9E_x|^{n-\frac32}_{i,j} + 5E_x|^{n-\frac52}_{i,j} - E_x|^{n-\frac72}_{i,j} \right) \qquad (2.35)$$

The same matrix equation (2.4) is used to determine the dispersion relation for
this algorithm, with the following changes to the discrete operators there:

$$D_t = \frac{1}{24\Delta t}\left( 22e^{j\omega\Delta t/2} - 17e^{-j\omega\Delta t/2} - 9e^{-3j\omega\Delta t/2} + 5e^{-5j\omega\Delta t/2} - e^{-7j\omega\Delta t/2} \right) \qquad (2.36a)$$

$$D_z = \beta_z \quad \text{(first three rows in (2.4))} \qquad (2.36b)$$

$$D_z = -\beta_z \quad \text{(last three rows in (2.4))} \qquad (2.36c)$$
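The corresponding dispersion relation and stability limit would then be

$$\mu\epsilon D_t^2 = D_x^2 + D_y^2 - \beta_z^2 \qquad (2.37)$$

$$\Delta t_{\max} = \frac{1}{2}\,\frac{h\sqrt{\mu\epsilon}}{\sqrt{2\left(K_a - K_b/3\right)^2 + \left(\beta_z h/2\right)^2}} \qquad (2.38)$$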
When using compact-FDTD and its high-order variants, the longitudinal
wavenumber $\beta_z$ is an input that needs to be provided. It is chosen such that the
waveguide first mode of operation coincides with the operating frequency. For
electrically large structures, this stipulation translates to $\beta_z$ being nearly identical
to the unbounded wavenumber $\beta$. For example, propagating a 1-GHz signal
through a 6 × 3 m tunnel would require setting $\beta_z/\beta = 0.9982$ [3]. Such values
provide a serious challenge to even high-order modeling algorithms, if not
meticulously optimized. For this present algorithm, the $K_a$, $K_b$ parameters as well
as the deviation from maximum time step need to be optimized to provide
acceptable levels of the 2-D version of the global phase error in (2.10). With
properly optimized parameters, the high-order compact-FDTD presented here was
capable of simulating the 6 × 3 m tunnel 133 times faster than the optimized regular
compact-FDTD.
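
To give a feel for the memory argument behind (2.32), the sketch below evaluates the transverse wavelength for the 1-GHz tunnel example and compares the number of cross-section cells needed when the cell size is tied to the unbounded wavelength versus the transverse wavelength. The resolution of 20 cells per wavelength and the simple ceiling-based cell count are assumptions of this example, not values from the text.

```matlab
% Sketch: transverse wavelength and cross-section grid size for the tunnel example.
f  = 1e9;  co = 299792458;              % operating frequency and speed of light
lambda   = co/f;                        % unbounded wavelength in meters
bz_ratio = 0.9982;                      % beta_z/beta quoted for the 6 x 3 m tunnel
lambdaT  = lambda/sqrt(1 - bz_ratio^2); % transverse wavelength from (2.32)
R = 20;                                 % assumed cells per wavelength
h_3d      = lambda/R;                   % cell size tied to the unbounded wavelength
h_compact = lambdaT/R;                  % cell size tied to the transverse wavelength
cells_3d      = ceil(6/h_3d)*ceil(3/h_3d);
cells_compact = ceil(6/h_compact)*ceil(3/h_compact);
fprintf('lambda_T/lambda = %.2f\n', lambdaT/lambda);
fprintf('cross-section cells: %d (h = lambda/R) vs %d (h = lambda_T/R)\n', ...
    cells_3d, cells_compact);
```

The transverse wavelength comes out at roughly 5 m, so a few hundred cells cover the whole cross-section, in line with the grid sizes on the order of 20 × 20 cells mentioned earlier in this section.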

REFERENCES

[1] J. Fang, Time Domain Finite Difference Computation for Maxwell's Equations, PhD Dissertation,
University of California at Berkeley, Berkeley, California, 1989.
[2] A. Elsherbeni and V. Demir, The Finite-Difference Time-Domain Method for Electromagnetics
with MATLAB Simulations, Raleigh, NC: SciTech Publishing, Inc., 2009.
[3] M. Hadi and S. Mahmoud, “A High-Order Compact-FDTD Algorithm for Electrically Large
Waveguide Analysis,” IEEE Transactions on Antennas and Propagation, Vol. 56, No. 8, pp.
2589-2598, 2008.
[4] A. Taflove and M. Brodwin, “Numerical Solution of Steady-State Electromagnetic Scattering
Problems Using the Time-Dependent Maxwell’s Equations,” IEEE Trans. Microwave Theory
Techniques, Vol. 23, No. 8, pp. 623-630, 1975.
[5] M. Hadi, “A Finite Volumes-Based 3-D Low Dispersion FDTD Algorithm,” IEEE Transactions
on Antennas and Propagation, Vol. 55, No. 8, pp. 2287-2293, 2007.
[6] M. Hadi and R. Dib, “IEEE Transactions on Antennas and Propagation Low-Dispersion FDTD
Algorithms,” Appl. Computat. Electromag. Soc. J., Vol. 22, No. 3, pp. 306-314, 2007.
[7] M. Hadi, “Near-Field PML Optimization for Low and High Order FDTD Algorithms Using
Closed-Form Predictive Equations,” IEEE Transactions on Antennas and Propagation, Vol. 59,
No. 8, pp. 2933-2942, 2011.
[8] J. Berenger, “Evanescent Waves in PML's: Origin of the Numerical Reflection in Wave-
Structure Interaction Problems,” IEEE Trans. Antennas Propagation, Vol. 47, No. 10, pp.
1497-1503, 1999.
[9] J. Schneider and C. Wagner, “Implementation of Transparent Sources in FDTD Simulations,”
IEEE Transactions on Antennas and Propagation, Vol. 46, No. 8, pp. 1159-1168, 1998.
[10] M. Hadi and N. Almutairi, “Discrete Finite-Difference Time Domain Impulse Response Filters
for Transparent Field Source Implementations,” IET Microw. Antennas Propag., Vol. 4, No. 3,
pp. 381-389, 2010.
[11] T. Tan and M. Potter, “1-D Multipoint Auxiliary Source Propagator for the Total-
Field/Scattered-Field FDTD Formulations,” IEEE Antennas and Wireless Propagation Letters,
Vol. 6, pp. 144-148, 2007.
[12] M. Hadi, “A Versatile Split-Field 1-D Propagator for Perfect FDTD Plane Wave Injection,”
IEEE Transactions on Antennas and Propagation, Vol. 57, No. 9, pp. 2691-2697, 2009.
[13] W. Hui, H. Zhi, W. Xian and W. Lei, “Perfect Plane Wave Injection into 3D FDTD (2,4)
Scheme,” 2011 Cross Strait Quad-Regional Radio Science and Wireless Technology Conference,
Harbin, China, 2011.
[14] I. Zagorodnov, R. Schuhmann, and T. Weiland, “Conformal FDTD-Methods to Avoid Time Step
Reduction With and Without Cell Enlargement,” Journal of Computational Physics, Vol. 225,
No. 2, pp. 1493-1507, 2007.
[15] B. Al-Zohouri and M. Hadi, “Conformal Modelling of Perfect Conductors in the High-Order
M24 Finite-Difference Time-Domain Algorithm,” IET Microw. Antennas Propag., Vol. 5, No. 5,
pp. 583-587, 2011.
[16] N. Madsen and R. Ziolkowski, “A Three-Dimensional Modified Finite Volume Technique for
Maxwell's Equations,” Electromagnetics, Vol. 10, No. 1/2, pp. 147-161, 1990.
[17] M. Hadi and S. Mahmoud, “Optimizing the Compact-FDTD Algorithm for Electrically Large
Waveguiding Structures,” Progress in Electromagnetics Research, Vol. 75, pp. 253-269, 2007.
[18] K. Hwang and J. Ihm, “A Stable Fourth-Order FDTD Method for Modeling Electrically Long
Dielectric Waveguides,” Journal of Lightwave Technology, Vol. 24, No. 2, pp. 1048-1056, 2006.
Chapter 3
GPU Acceleration of FDTD Method for
Simulation of Microwave Circuits
Veysel Demir

This chapter presents an implementation of the finite-difference time-domain
(FDTD) method [1, 2] using the compute unified device architecture (CUDA)
development environment from NVIDIA (www.nvidia.com) to run on graphics
processing unit (GPU) devices and utilize their immense computational power to
speed up electromagnetic simulations.

3.1 INTRODUCTION

Parallel computation methods are gaining importance as the need emerges to solve
larger problems in the computational analysis of physical phenomena. General-
purpose computing on graphics processing units (GPGPU) is one of the recent
approaches to parallel computation, alongside other parallelization methods
[3, 4]. For instance, the GeForce GTX TITAN Z is the latest GPU-based
computation card released by NVIDIA at the time of writing. The TITAN Z has
5,760 cores and 12 GB of memory, and each core has a 705-MHz base clock. GPU
cores run much slower than the cores of high-end central processing unit (CPU)
devices; however, the large number of cores gives the GPU device its immense
computational power. If a computational algorithm is data-parallel, then it can
be programmed to run on a GPU platform, where it can achieve computations orders
of magnitude faster than on a CPU.
The FDTD method is a data-parallel algorithm and it has been implemented
using various programming platforms to run on GPU devices. For instance, the
FDTD implementations in [5–7] used OpenGL, those in [8–14] used Brook [15], an
extension of the C language, and [16] used High Level Shader Language (HLSL).
These programming platforms were not easy to use for developing parallel codes
that run on GPU devices. The CUDA development environment was introduced by
NVIDIA to facilitate code development on GPU devices, and so far it has been
the most popular and best-supported programming platform for developing
GPU codes. CUDA implementations of the FDTD method are used in commercial
computational electromagnetics software. Furthermore, CUDA has been reported
as the programming environment for FDTD implementations in several academic
research articles, of which [17–22] are some of the earlier ones.
OpenCL [23] is another, more recently introduced programming platform for
developing codes on parallel devices, and it has also been used to develop FDTD
implementations [24–26].
In this chapter we present an implementation of a three-dimensional FDTD
code using CUDA. The presented code includes an implementation of FDTD using
the C programming language to run on CPU as well as the implementation in
CUDA to run on GPU. Some considerations that a developer needs to keep in
mind to develop a code with better performance are also discussed.
The files of the code presented in this chapter are available on the publisher’s
website. We strongly recommend that the reader download and study the code
while reading the following sections, as these sections discuss the concepts in
parallel with the code and serve as a tutorial. Also, a basic knowledge of CUDA
programming is required. For beginners, we recommend the “NVIDIA CUDA
Getting Started Guide” and “CUDA C Programming Guide” available at
NVIDIA’s web site to start learning CUDA.
The next section presents the implementation of the code and discusses the
core functions programmed in C language to run the program on CPU. The
subsequent section presents the CUDA implementation and discusses the issues
one needs to pay attention to while programming FDTD using CUDA.

3.2 FDTD CODE FOR MICROWAVE CIRCUIT SIMULATION

The FDTD method is one of the most extensively researched methods in
computational electromagnetics, and many techniques have been developed to model
various conditions in FDTD, such as dispersive media, nonlinear media, and
absorbing boundaries. A code that demonstrates even a subset of these extensions
to the basic FDTD method could be covered only in a book. Therefore, in this
chapter, we keep the implemented code limited to the basics of FDTD, yet
sufficient to present a GPU implementation that is useful for solving basic
microwave circuits. The code implements the electric and magnetic field updating
equations, the excitation of ports, the calculation of voltages and currents at
the ports, and finally the calculation of scattering parameters.

3.2.1 Features of the FDTD Code

The presented FDTD code is developed following the assumptions listed below:
• The problem space is a closed PEC box; therefore, the PEC boundary condition
is used at the boundaries: the tangential electric field components are set to
zero.
• The problem space is composed of a layer of dielectric substrate and a layer
of air stacked in the z-direction. PEC traces in between form a microstrip
circuit.
• The traces are zero-thickness PEC, and the dielectric substrate is
nonmagnetic and lossless.
• Ports extend from the bottom cover of the box to the top of the dielectric
layer in the z-direction.

The program reads an input file in which the FDTD problem to be solved is
described. For instance, Figure 3.1 illustrates a lowpass filter [4]. The
simulation parameters of this filter are defined in a text file named
lowpass_filter.txt. The program can be executed from the command line on a
Microsoft Windows operating system as

mwfdtd lowpass_filter.txt

where “mwfdtd.exe” is the name of the executable file generated by compiling
the code presented in this chapter. The program generates an output file with the
same file name but with the “.m” extension, which is the extension for MATLAB
script files. For instance, running “lowpass_filter.txt” generates
“lowpass_filter.m” as the output. The output file contains both the transient
and the frequency-domain results of the simulation, and it can be run as a script
in MATLAB, where it displays the simulation results in MATLAB figures.

Figure 3.1 Configuration of the lowpass microstrip filter.



3.2.2 Input Parameters File

The contents of the input file “lowpass_filter.txt” are shown in Listing 3.1. Here,
run_on_gpu is a parameter that determines whether the simulation is to be run on
the CPU (if 0) or the GPU (if 1). If there is more than one GPU device on the
system, one can choose which device is used by assigning its device ID to the
gpu_device_id parameter. The parameter number_of_time_steps sets the
number of time steps to run in the FDTD time-marching loop. The parameters
cell_size_x, cell_size_y, and cell_size_z set the dimensions of a unit cell
in the x-, y-, and z-directions, respectively. It should be noted that all
lengths in the input file are in meters. The parameters
substrate_relative_permittivity and substrate_thickness define the
dielectric constant and the thickness of the substrate. The parameters box_size_x,
box_size_y, and box_size_z set the dimensions of the problem space in the x-,
y-, and z-directions, respectively. One corner of the problem space coincides with
the origin of the Cartesian coordinate system and the problem space box extends in
the x-, y-, and z-directions as illustrated in Figure 3.1.
A rectangular PEC patch can be defined by its start and end coordinates in the
input text file. The parameters microstrip_min_x, microstrip_min_y,
microstrip_max_x, and microstrip_max_y define the start and the end
coordinates. A number of rectangular patches can be combined to create complex
shapes. For instance, the lowpass filter shown in Figure 3.1 is created using three
rectangular patches.
Ports are also defined by their start and end coordinates as shown in Listing
3.1, where two ports are defined. The active source port that is used to excite the
circuit during the FDTD simulation is indicated by the parameter
active_port_index. All ports are 50-ohm ports in the presented code; therefore, the
active port is simulated as a voltage source with 50-ohm internal impedance,
whereas the inactive ports are simulated as 50-ohm resistors. Transient voltage and
current are captured on each port during a simulation. The captured voltages
and currents are then used to calculate the scattering parameters of the circuit.
Finally, the parameters frequency_start, frequency_end, and
number_of_frequencies define the frequencies of interest for the scattering
parameter calculations.

Listing 3.1 An example input file: lowpass_filter.txt


run_on_gpu 1
gpu_device_id 0
number_of_time_steps 5000
cell_size_x 0.0004064
cell_size_y 0.0004233
cell_size_z 0.000265
substrate_thickness 0.000795
substrate_relative_permittivity 2.2
box_size_x 0.028448
box_size_y 0.027938
box_size_z 0.003445
microstrip_index 1
microstrip_min_x 0.0097536
microstrip_max_x 0.012192
microstrip_min_y 0.004233
microstrip_max_y 0.012699
microstrip_index 2
microstrip_min_x 0.016256
microstrip_max_x 0.018694
microstrip_min_y 0.015239
microstrip_max_y 0.023705
microstrip_index 3
microstrip_min_x 0.004064
microstrip_max_x 0.024384
microstrip_min_y 0.0127
microstrip_max_y 0.01524
port_index 1
port_min_x 0.0097536
port_max_x 0.012162
port_min_y 0.004233
port_max_y 0.004233
port_index 2
port_min_x 0.016256
port_max_x 0.018694
port_min_y 0.023705
port_max_y 0.023705
active_port_index 1
frequency_start 1e9
frequency_end 20e9
number_of_frequencies 91
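
The "name value" layout of Listing 3.1 lends itself to a very small parser. The following is a minimal sketch of how one line of such a file might be read; it is not the chapter's readInputFile(), and the struct sim_params and the function parse_param_line() are hypothetical names introduced only for this illustration. Only four of the scalar parameters are handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for a few of the scalar parameters in Listing 3.1. */
typedef struct {
    int    run_on_gpu;
    int    number_of_time_steps;
    double cell_size_x;
    double substrate_relative_permittivity;
} sim_params;

/* Parse one "name value" line. Returns 1 if a parameter handled by this
   sketch was set, and 0 if the line is blank, malformed, or names a
   parameter that this sketch does not handle. */
int parse_param_line(const char *line, sim_params *p)
{
    char name[64];
    double value;

    assert(line != NULL && p != NULL);
    if (sscanf(line, "%63s %lf", name, &value) != 2)
        return 0;

    if (strcmp(name, "run_on_gpu") == 0)
        p->run_on_gpu = (int)value;
    else if (strcmp(name, "number_of_time_steps") == 0)
        p->number_of_time_steps = (int)value;
    else if (strcmp(name, "cell_size_x") == 0)
        p->cell_size_x = value;
    else if (strcmp(name, "substrate_relative_permittivity") == 0)
        p->substrate_relative_permittivity = value;
    else
        return 0;
    return 1;
}
```

A complete reader would call such a function for every line returned by fgets() and would also need to group the repeated microstrip_index and port_index entries into arrays.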

3.2.3 Main Program Layout

Listing 3.2 shows the main() function of the C code of the FDTD program. The
program first reads the contents of the input file and assigns the parameter
values to the associated data elements and arrays. Then the function
setupProblemSpace() is called to create and initialize arrays for updating
coefficients, electric and magnetic fields, and other auxiliary data. The
function setupPorts() is called to set up the arrays used in port calculations,
which include voltage source excitation, sampled voltage, and current calculations.
The functions startTimer() and stopTimer() are used to capture the total
time spent on a simulation. After the time-marching loop, port scattering
parameters are calculated in the function calculateScatteringParameters() using
the sampled voltages and currents captured on the ports. Finally, the function
saveResults() stores the results of the simulation in the output MATLAB script file.

Listing 3.2 The function main()


int main(int argc, char **argv)
{
    readInputFile(argc, argv);
    setupProblemSpace();
    setupPorts();
    startTimer();
    if (run_on_gpu)
    {
        setupGPU();
        runTimeMarchingLoopOnGPU();
        copyDataBackAndClearGPU();
    }
    else
    {
        runTimeMarchingLoopOnCPU();
    }
    calculateScatteringParameters();
    stopTimer();
    saveResults(argv);
}

Listing 3.2 uses an if statement to run the main FDTD time-marching loop either
on the CPU or on the GPU. Listing 3.3 shows the function
runTimeMarchingLoopOnCPU(), which runs the time-marching loop on the CPU. At
every time step, the magnetic fields are updated first and then the electric
fields. These updates are followed by the special update of the fields due to
the voltage source. Then the voltages and currents are captured at the ports.

Listing 3.3 The function runTimeMarchingLoopOnCPU()


void runTimeMarchingLoopOnCPU()
{
    int time_step;
    for (time_step=0;time_step<number_of_time_steps;time_step++)
    {
        updateMagneticFields();
        updateElectricFields();
        updateVoltageSource(time_step);
        captureVoltageAndCurrent(time_step);
    }
}

3.2.4 Field Updates

Listing 3.4 shows the function updateMagneticFields(), which executes the
magnetic field updates. For instance, the updating equation for the x-component of
the magnetic field is given as

$$
\begin{aligned}
H_x^{n+0.5}(i,j,k) = {} & H_x^{n-0.5}(i,j,k)
 + C_{hxey}\left[E_y^{n}(i,j,k+1) - E_y^{n}(i,j,k)\right] \\
 & + C_{hxez}\left[E_z^{n}(i,j+1,k) - E_z^{n}(i,j,k)\right]
\end{aligned}
\tag{3.1}
$$

where $C_{hxey} = \Delta t/(\mu_0 \Delta z)$ and $C_{hxez} = -\Delta t/(\mu_0 \Delta y)$.
Here, $\Delta t$ is the duration of a time step, $\mu_0$ is the free-space
permeability, and $\Delta y$ and $\Delta z$ are the sizes of a unit cell in the y-
and z-directions, respectively.
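
As a small worked illustration of (3.1), the sketch below computes the two coefficients and applies the update to a single Hx component. It is only a scalar restatement of the formula under stated assumptions: the function names coeff_hxey(), coeff_hxez(), and update_hx() are hypothetical, the value of MU_0 is a rounded constant, and the chapter does not state how the time step is chosen, so dt is simply passed in.

```c
#include <assert.h>

#define MU_0 1.25663706e-6   /* free-space permeability in H/m (rounded) */

/* Coefficients of (3.1): Chxey = dt/(mu0*dz) and Chxez = -dt/(mu0*dy). */
double coeff_hxey(double dt, double dz)
{
    assert(dz > 0.0);
    return dt / (MU_0 * dz);
}

double coeff_hxez(double dt, double dy)
{
    assert(dy > 0.0);
    return -dt / (MU_0 * dy);
}

/* One application of (3.1) to a single Hx component.
   ey_kp1 and ey_k are Ey at (i,j,k+1) and (i,j,k);
   ez_jp1 and ez_j are Ez at (i,j+1,k) and (i,j,k). */
double update_hx(double hx, double ey_kp1, double ey_k,
                 double ez_jp1, double ez_j,
                 double c_hxey, double c_hxez)
{
    return hx + c_hxey * (ey_kp1 - ey_k) + c_hxez * (ez_jp1 - ez_j);
}
```

Because the substrate is nonmagnetic, these coefficients are the same everywhere in the problem space, which is why Listing 3.4 uses scalar coefficients rather than coefficient arrays.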
In the presented implementation, if there are Nx × Ny × Nz cells in a problem
space, the number of field components for each field type is Nfields = (Nx+1) ×
(Ny+1) × (Nz+1). For instance, Figure 3.2 illustrates the distribution of the Ex
field components on an x-y plane cut. The numbers of field components in the x-
and y-directions are Nx+1 and Ny+1, respectively. The actual number of Ex field
components that lie in the problem space in the x-direction is Nx; the Ex
components in the rightmost column in Figure 3.2 are outside the problem space
boundaries. These extra field components are included in the 3-D field
arrangements so that all 3-D arrays (i.e., Ex, Ey, Ez, Hx, Hy, and Hz) have the
same size, Nfields. When the field arrays have the same dimensions, the offsets
between the field components required in an update, discussed below, are the
same for all types of field components, which is convenient for programming.

Listing 3.4 The function updateMagneticFields()


void updateMagneticFields()
{
    for (int i=0;i<n_fields-z_offset;i++)
    {
        Hx[i] = Hx[i] + Chxey*(Ey[i+z_offset]-Ey[i])
                      + Chxez*(Ez[i+y_offset]-Ez[i]);
        Hy[i] = Hy[i] + Chyez*(Ez[i+x_offset]-Ez[i])
                      + Chyex*(Ex[i+z_offset]-Ex[i]);
        Hz[i] = Hz[i] + Chzex*(Ex[i+y_offset]-Ex[i])
                      + Chzey*(Ey[i+x_offset]-Ey[i]);
    }
}

Figure 3.2 Distribution of the x-component of electric fields on an x-y plane-cut.

Similarly, the electric field updates are performed in the function
updateElectricFields(), shown in Listing 3.5.

Listing 3.5 The function updateElectricFields()


void updateElectricFields()
{
    for (int i=z_offset;i<n_fields;i++)
    {
        Ex[i] = Ex[i] + Cexhz[i]*(Hz[i]-Hz[i-y_offset])
                      + Cexhy[i]*(Hy[i]-Hy[i-z_offset]);
        Ey[i] = Ey[i] + Ceyhx[i]*(Hx[i]-Hx[i-z_offset])
                      + Ceyhz[i]*(Hz[i]-Hz[i-x_offset]);
    }
    for (int i=y_offset;i<n_fields-z_offset;i++)
    {
        Ez[i] = Ceze[i]*Ez[i]
              + Cezhy[i]*(Hy[i]-Hy[i-x_offset])
              + Cezhx[i]*(Hx[i]-Hx[i-y_offset]);
    }
}

It should be noted that if a 3-D array is allocated as a contiguous memory
block, it is actually stored as a 1-D linear array in the computer memory.
Therefore, a 1-D array of size Nfields can be used instead of a 3-D array of size
(Nx+1, Ny+1, Nz+1) in an implementation. One just needs to use the correct index
mapping between the 3-D array and its equivalent 1-D array. For instance, in the
presented implementation, index (i, j, k), with i, j, and k counted from 1, maps
to (i–1) + (j–1) × yoffset + (k–1) × zoffset. Here, yoffset = (Nx+1) is the
distance between two neighboring fields in the y-direction in the 3-D array,
that is, (i, j, k) and (i, j+1, k). Similarly, zoffset = (Nx+1) × (Ny+1) is the
distance between two neighboring fields in the z-direction, that is, (i, j, k)
and (i, j, k+1). The distance between two neighboring fields in the x-direction,
(i, j, k) and (i+1, j, k), is referred to as xoffset = 1. Therefore, for
instance, the line of code in Listing 3.4

    Hx[i] = Hx[i] + Chxey*(Ey[i+z_offset]-Ey[i])
                  + Chxez*(Ez[i+y_offset]-Ez[i]);

is equivalent to and would be coded as

    Hx[i][j][k] = Hx[i][j][k] + Chxey*(Ey[i][j][k+1]-Ey[i][j][k])
                              + Chxez*(Ez[i][j+1][k]-Ez[i][j][k]);

if 3-D arrays were used.
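
The index mapping described above can be written as a short helper. This is a minimal sketch; the function name linear_index() is hypothetical and does not appear in the presented code, which works directly with the precomputed offsets.

```c
#include <assert.h>

/* Linear index of the 1-based grid position (i, j, k), following the
   mapping (i-1) + (j-1)*y_offset + (k-1)*z_offset given in the text.
   nx and ny are the numbers of cells in the x- and y-directions. */
int linear_index(int i, int j, int k, int nx, int ny)
{
    int y_offset = nx + 1;               /* fields per row along x  */
    int z_offset = (nx + 1) * (ny + 1);  /* fields per x-y plane    */

    assert(i >= 1 && i <= nx + 1);
    assert(j >= 1 && j <= ny + 1);
    assert(k >= 1);
    return (i - 1) + (j - 1) * y_offset + (k - 1) * z_offset;
}
```

For example, with 70 × 66 cells in the x- and y-directions, a step of one in j moves the linear index by 71 and a step of one in k moves it by 71 × 67 = 4757, which are the values that y_offset and z_offset would take for that grid.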


Listing 3.6 shows the function updateVoltageSource(), which is used to
update the electric field components at the active port. Here, ports is an array of
struct port_type. The field ports[ind].sv_indices is an array that
stores the indices of the Ez field components that are associated with a port.
These field components are used to calculate the sampled voltage on the port. The
same field components require a special update due to the voltage source if the
port is the active port. The array voltage_source_waveform stores the source
voltage waveform for every time step. The parameter Cezvs is the updating
coefficient used to impose the voltage source. Thus, the code in Listing 3.6
implements the last term in the following voltage source updating equation; the
other terms are already applied by the regular Ez update in Listing 3.5:

$$
\begin{aligned}
E_z^{n+1}(i,j,k) = {} & C_{eze}(i,j,k)\,E_z^{n}(i,j,k) \\
 & + C_{ezhy}(i,j,k)\left[H_y^{n+0.5}(i,j,k) - H_y^{n+0.5}(i-1,j,k)\right] \\
 & + C_{ezhx}(i,j,k)\left[H_x^{n+0.5}(i,j,k) - H_x^{n+0.5}(i,j-1,k)\right] \\
 & + C_{ezvs}(i,j,k)\,V_s^{n+0.5}.
\end{aligned}
\tag{3.2}
$$
Listing 3.6 The function updateVoltageSource()


void updateVoltageSource(int time_step)
{
    int ind = active_port_index-1;
    int field_index;
    float current_value =
        Cezvs * voltage_source_waveform[time_step];
    for (int i=0;i<ports[ind].number_of_sv_fields;i++)
    {
        field_index = ports[ind].sv_indices[i];
        Ez[field_index] = Ez[field_index] + current_value;
    }
}

3.2.5 Outputs of the Program

Listing 3.7 shows the function captureVoltageAndCurrent(), which is used
to capture voltages and currents at the ports. The sampled voltage is stored in the
array ports[ind].sampled_voltage and the sampled current is stored in the
array ports[ind].sampled_current for a port indexed as ind. These voltages
and currents are used to calculate scattering parameters in the function
calculateScatteringParameters(), shown in Listing 3.8, after the time-
marching loop is completed. In Listing 3.8, discrete Fourier transforms of voltages
and currents are performed by calling the function dft() shown in Listing 3.9.

Listing 3.7 The function captureVoltageAndCurrent()


void captureVoltageAndCurrent(int time_step)
{
    for (int ind=0;ind<number_of_ports;ind++)
    {
        int field_index;
        float sum_ez = 0.0;
        int n_sv_fields = ports[ind].number_of_sv_fields;
        for (int i=0;i<n_sv_fields;i++)
        {
            field_index = ports[ind].sv_indices[i];
            sum_ez = sum_ez + Ez[field_index];
        }
        ports[ind].sampled_voltage[time_step] =
            -dz*sum_ez*nz_substrate/n_sv_fields;

        float sum_hxm = 0.0;
        float sum_hxp = 0.0;
        float sum_hym = 0.0;
        float sum_hyp = 0.0;
        for (int i=0;i<ports[ind].number_of_hx_fields;i++)
        {
            sum_hxp = sum_hxp + Hx[ports[ind].hxp_indices[i]];
            sum_hxm = sum_hxm + Hx[ports[ind].hxm_indices[i]];
        }

        for (int i=0;i<ports[ind].number_of_hy_fields;i++)
        {
            sum_hyp = sum_hyp + Hy[ports[ind].hyp_indices[i]];
            sum_hym = sum_hym + Hy[ports[ind].hym_indices[i]];
        }
        ports[ind].sampled_current[time_step] =
            (sum_hxm*dx+sum_hyp*dy-sum_hxp*dx-sum_hym*dy)
            /nz_substrate;
    }
}
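
The voltage expression in Listing 3.7 can be read as the average Ez over the port multiplied by the substrate height nz_substrate × dz, with a minus sign. The sketch below isolates that one step; it is an interpretation of the listing rather than part of the presented code, and the function name port_voltage() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Port voltage from the n sampled Ez components of a port: the average
   Ez multiplied by the substrate height nz_substrate*dz. The sign
   convention follows Listing 3.7. */
double port_voltage(const double *ez, int n, double dz, int nz_substrate)
{
    double sum_ez = 0.0;

    assert(ez != NULL && n > 0);
    for (int i = 0; i < n; i++)
        sum_ez += ez[i];
    return -dz * sum_ez * nz_substrate / n;
}
```

For a uniform field of -100 V/m sampled on six components of a port that spans three cells of 1 mm in height, the function returns approximately 0.3 V, the field magnitude times the 3-mm height.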

Listing 3.8 The function calculateScatteringParameters()


void calculateScatteringParameters()
{
    int nf = number_of_frequencies;
    int np = number_of_ports;

    float df = (frequency_end - frequency_start)/nf;

    frequencies = (float*) malloc(nf*sizeof(float));

    for (int i=0;i<nf;i++)
        frequencies[i] = frequency_start + i * df;

    int nt = number_of_time_steps;
    float* time_waveform;
    complex sv, sc;
    float Z = 50.0;
    float sZ = 2.0*sqrt(Z);

    for (int ind=0;ind<np;ind++)
    {
        ports[ind].a_wave
            = (complex*) malloc(nf*sizeof(complex));
        ports[ind].b_wave
            = (complex*) malloc(nf*sizeof(complex));
        for (int i=0;i<nf;i++)
        {
            time_waveform = ports[ind].sampled_voltage;
            sv = dft(time_array, 0.5*dt, dt, nt,
                     time_waveform, frequencies[i]);
            time_waveform = ports[ind].sampled_current;
            sc = dft(time_array, 0.0, dt, nt,
                     time_waveform, frequencies[i]);
            ports[ind].a_wave[i].re = (sv.re + sc.re * Z)/sZ;
            ports[ind].a_wave[i].im = (sv.im + sc.im * Z)/sZ;
            ports[ind].b_wave[i].re = (sv.re - sc.re * Z)/sZ;
            ports[ind].b_wave[i].im = (sv.im - sc.im * Z)/sZ;
        }
    }

    int apind = active_port_index-1;
    ports[apind].S = (complex**) malloc(np*sizeof(complex*));
    for (int i=0;i<np;i++)
        ports[apind].S[i] = (complex*) malloc(nf*sizeof(complex));

    complex a, b, s;
    for (int ind=0;ind<np;ind++)
    {
        for (int i=0;i<nf;i++)
        {
            a = ports[apind].a_wave[i];
            b = ports[ind].b_wave[i];

            s.re = (b.re*a.re+b.im*a.im)
                   /(a.re*a.re+a.im*a.im);
            s.im = (b.im*a.re-b.re*a.im)
                   /(a.re*a.re+a.im*a.im);

            ports[apind].S[ind][i] = s;
        }
    }
}
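
The arithmetic at the heart of Listing 3.8 can be checked on simple loads. The sketch below restates it for a single frequency under stated assumptions: the type cplx and the functions wave_a(), wave_b(), and cplx_div() are hypothetical stand-ins for the chapter's complex type and inline expressions, and the common factor 2√Z is left out because it cancels in the ratio b/a.

```c
#include <assert.h>

typedef struct { double re, im; } cplx;  /* stand-in for the chapter's complex */

/* Unnormalized wave quantities: a is proportional to V + Z*I and
   b to V - Z*I, as in Listing 3.8 without the 1/(2*sqrt(Z)) factor. */
cplx wave_a(cplx v, cplx i, double z)
{
    cplx a = { v.re + z * i.re, v.im + z * i.im };
    return a;
}

cplx wave_b(cplx v, cplx i, double z)
{
    cplx b = { v.re - z * i.re, v.im - z * i.im };
    return b;
}

/* Complex division b/a, written out as in Listing 3.8. */
cplx cplx_div(cplx b, cplx a)
{
    double den = a.re * a.re + a.im * a.im;
    cplx s;

    assert(den > 0.0);  /* the wave a of the active port must be nonzero */
    s.re = (b.re * a.re + b.im * a.im) / den;
    s.im = (b.im * a.re - b.re * a.im) / den;
    return s;
}
```

With Z = 50, a port where V = 50 and I = 1 gives b = 0 and hence a ratio of 0, while a port with I = 0 gives a ratio of 1. For a transmission term, a would be taken from the active port and b from the observed port, as Listing 3.8 does.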

Listing 3.9 The function dft()


complex dft(float* time_array, float time_shift, float dt,
            int n, float* time_waveform, float frequency)
{
    complex dft_val;
    dft_val.re = 0.0;
    dft_val.im = 0.0;

    float pi = atan(1.0)*4.0;
    float w = 2*pi*frequency;
    float wt, tw;

    for (int i=0;i<n;i++)
    {
        wt = -w*(time_array[i]+time_shift);
        tw = dt*time_waveform[i];
        dft_val.re = dft_val.re + tw*cos(wt);
        dft_val.im = dft_val.im + tw*sin(wt);
    }
    return dft_val;
}

3.3 FDTD CODE USING CUDA

In this section, we present the CUDA implementation of FDTD to run on a GPU
device. We will also discuss performance improvement strategies in CUDA and
their implementation in the code.

3.3.1 Performance Optimization

Some recommendations for optimization of CUDA programs and a list of best
practices for programming with CUDA are provided in the “CUDA C Best Practices
Guide” available at NVIDIA’s web site
(http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/). Among these
recommendations, the following are directly applicable to FDTD and are used to
optimize the presented FDTD implementation:
R1. Structure the algorithm in a way that exposes as much data parallelism as
possible. Once the parallelism of the algorithm has been exposed, it needs
to be mapped to the hardware as efficiently as possible.
R2. Ensure that global memory accesses are coalesced whenever possible.
R3. Minimize the use of global memory. Prefer shared memory access where
possible.

The FDTD updates in a cell can be performed separately from the updates in
other cells. Therefore, the FDTD algorithm inherently satisfies the first
recommendation R1 listed above. We will refer to the other recommendations as
well in the subsequent sections as we discuss the implementation of the code.
3.3.2 Memory Accesses

The main memory space on the GPU device is referred to as global memory and
the global memory is accessed via 32-, 64-, or 128-byte memory transactions such
as reading or writing an array by a block of threads. These memory transactions
must be naturally aligned for the best performance. For instance, Figure 3.3
illustrates misaligned and nonsequential access of memory locations in the global
memory by threads. Aligned and sequential access of memory locations, as
illustrated in Figure 3.4, is referred to as coalesced memory access.

Figure 3.3 Misaligned and nonsequential accesses of memory locations by threads.
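
The cost of a misaligned access can be estimated with a small host-side calculation. The sketch below is a simplified model and not a description of any particular GPU: it counts how many aligned memory segments a group of consecutive threads touches when each thread reads one element. The function name segments_touched() and its parameters are introduced here only for illustration; actual behavior also depends on the GPU generation and its caches.

```c
#include <assert.h>

/* Number of aligned segments of segment_bytes touched when n_threads
   consecutive threads each read one element of element_bytes, with the
   first thread reading the element at index first_element. */
int segments_touched(long first_element, int n_threads,
                     int element_bytes, int segment_bytes)
{
    long first_byte, last_byte;

    assert(first_element >= 0 && n_threads > 0);
    assert(element_bytes > 0 && segment_bytes > 0);
    first_byte = first_element * element_bytes;
    last_byte  = (first_element + n_threads) * element_bytes - 1;
    return (int)(last_byte / segment_bytes - first_byte / segment_bytes) + 1;
}
```

In this model, 32 threads reading 4-byte floats starting at element 0 fall within one 128-byte segment, whereas the same read starting at element 1 straddles two segments and would need a second transaction.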

To achieve coalesced memory access, a 3-D problem space can be expanded
in the x-direction by padding extra cells such that the number of field components
in that direction is an integer multiple of 16. Figure 3.5 illustrates an original
problem space in an x-y plane-cut and the new problem space after expansion. In
the new problem space, the number of fields in the x-direction is an integer
multiple of 16, so each row of field components begins at a multiple of 16
elements. Although the padded cells increase the total number of cells in the
problem space, they ensure coalesced memory accesses and thus make the program
considerably more efficient. Hence, recommendation R2 listed above is satisfied.
Listing 3.10 shows the section of the function setupProblemSpace() that
calculates and adjusts the number of cells in the problem space. Notice that the
number of fields is one more than the number of cells in a direction.

Figure 3.4 Aligned and sequential access of memory locations by threads.


Figure 3.5 A problem space expanded in the x-direction using padded cells.

Listing 3.10 A section of the function setupProblemSpace()


nx_cpu = round(box_size_x/dx);
ny = round(box_size_y/dy);
nz = round(box_size_z/dz);
nz_substrate = round(substrate_thickness/dz);
nx_gpu = (16*(nx_cpu/16)+15);

if (run_on_gpu)
    nx = nx_gpu;
else
    nx = nx_cpu;
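
The padding formula in Listing 3.10 can be checked in isolation. The sketch below wraps the same expression in a function; the name pad_cells_x() is hypothetical and is used only for this illustration.

```c
#include <assert.h>

/* Smallest cell count >= nx_cells for which the number of field
   components, nx + 1, is a multiple of 16 (the expression used for
   nx_gpu in Listing 3.10). Relies on integer division. */
int pad_cells_x(int nx_cells)
{
    assert(nx_cells >= 0);
    return 16 * (nx_cells / 16) + 15;
}
```

For the filter in Listing 3.1, box_size_x/cell_size_x gives 70 cells, so the padded count is 79 cells and each row holds 80 field components, which is 5 × 16.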

3.3.3 Preparation of the GPU Device

Remember that the GPU device is a separate computational device that has its own
processors as well as memory spaces. In CUDA terminology, the GPU device is
referred to as the device, while the CPU side of the computer is referred to as
the host. Thus, if a computation is to be performed on the GPU device, the
relevant data need to be transferred to the device memory before the main
computation begins. The main computation in this case is the time-marching loop,
which requires the updating coefficients as input. The task of transferring the
input data to the device memory is performed within the function setupGPU() in
Listing 3.2. During the time-marching loop, the electric and magnetic field
distributions are intermediate outputs, and the sampled voltages and currents are
the main outputs. The function runTimeMarchingLoopOnGPU() executes the
time-marching loop. Once the loop is completed, the sampled voltages and currents
need to be transferred back to the host memory. This last task is performed in the
function copyDataBackAndClearGPU(). Then the output data are processed to
calculate the scattering parameters.
Listing 3.11 shows the function setupGPU(). In this function, first, the
function setGPUdevice(gpu_device_id) is called to set the GPU device
indicated in the input file as the active device. Then, the function
copyArraysToGpuMemory() is called to allocate arrays on the device and copy
the relevant data to the device global memory. Listing 3.12 shows a section of
the function copyArraysToGpuMemory(), which uses the function cudaMalloc() to
allocate the field and updating coefficient arrays in the device memory and the
function cudaMemcpy() to copy data from the host memory to the device memory.
Here, for instance, Cexhy is a pointer to the array in the host memory that holds
the data of the updating coefficient, and dvCexhy is a pointer to the
corresponding array in the device memory. It should be noted that all pointers to
arrays in the device memory are denoted with the prefix dv in the presented code.

Listing 3.11 The function setupGPU()

void setupGPU()
{
    setGPUdevice(gpu_device_id);
    if (copyArraysToGpuMemory()!=0)
    {
        printf("Error while copying arrays to GPU memory!\n");
        exit (EXIT_FAILURE);
    }
    setThreadBlocks();
}

Listing 3.12 A section of the function copyArraysToGpuMemory()


int copyArraysToGpuMemory()
{
    int size_int = sizeof(int);
    int size_float = sizeof(float);
    int array_size =
        (n_fields + maximum_threads_per_block) * size_float;

    cudaError_t et;

    et = cudaMalloc((void**)&dvEx, array_size);
    et = cudaMalloc((void**)&dvEy, array_size);
    et = cudaMalloc((void**)&dvEz, array_size);
    et = cudaMalloc((void**)&dvHx, array_size);
    et = cudaMalloc((void**)&dvHy, array_size);
    et = cudaMalloc((void**)&dvHz, array_size);

    et = cudaMalloc((void**)&dvCexhy, array_size);
    et = cudaMalloc((void**)&dvCexhz, array_size);
    et = cudaMalloc((void**)&dvCeyhz, array_size);
    et = cudaMalloc((void**)&dvCeyhx, array_size);
    et = cudaMalloc((void**)&dvCeze, array_size);
    et = cudaMalloc((void**)&dvCezhy, array_size);
    et = cudaMalloc((void**)&dvCezhx, array_size);

    array_size = n_fields * size_float;

    cudaMemcpy(dvEx, Ex, array_size, cudaMemcpyHostToDevice);
    cudaMemcpy(dvEy, Ey, array_size, cudaMemcpyHostToDevice);
    cudaMemcpy(dvEz, Ez, array_size, cudaMemcpyHostToDevice);
    cudaMemcpy(dvHx, Hx, array_size, cudaMemcpyHostToDevice);
    cudaMemcpy(dvHy, Hy, array_size, cudaMemcpyHostToDevice);
    cudaMemcpy(dvHz, Hz, array_size, cudaMemcpyHostToDevice);

    cudaMemcpy(dvCexhy, Cexhy, array_size,
               cudaMemcpyHostToDevice);
    cudaMemcpy(dvCexhz, Cexhz, array_size,
               cudaMemcpyHostToDevice);
    cudaMemcpy(dvCeyhz, Ceyhz, array_size,
               cudaMemcpyHostToDevice);
    cudaMemcpy(dvCeyhx, Ceyhx, array_size,
               cudaMemcpyHostToDevice);
    cudaMemcpy(dvCeze, Ceze, array_size,
               cudaMemcpyHostToDevice);
    cudaMemcpy(dvCezhy, Cezhy, array_size,
               cudaMemcpyHostToDevice);
    cudaMemcpy(dvCezhx, Cezhx, array_size,
               cudaMemcpyHostToDevice);

    for (int ind=0;ind<number_of_ports;ind++)
    {
        array_size = size_int*ports[ind].number_of_sv_fields;
        et = cudaMalloc((void**)&(ports[ind].dvsv_indices),
                        array_size);
        cudaMemcpy(ports[ind].dvsv_indices,
                   ports[ind].sv_indices, array_size,
                   cudaMemcpyHostToDevice);

        array_size = size_float*number_of_time_steps;
        et = cudaMalloc((void**)&(ports[ind].dvsampled_voltage),
                        array_size);
        cudaMemcpy(ports[ind].dvsampled_voltage,
                   ports[ind].sampled_voltage, array_size,
                   cudaMemcpyHostToDevice);

        array_size = size_int*ports[ind].number_of_hx_fields;
        et = cudaMalloc((void**)&(ports[ind].dvhxm_indices),
                        array_size);
        cudaMemcpy(ports[ind].dvhxm_indices,
                   ports[ind].hxm_indices, array_size,
                   cudaMemcpyHostToDevice);

        array_size = size_int*ports[ind].number_of_hx_fields;
        et = cudaMalloc((void**)&(ports[ind].dvhxp_indices),
                        array_size);
        cudaMemcpy(ports[ind].dvhxp_indices,
                   ports[ind].hxp_indices, array_size,
                   cudaMemcpyHostToDevice);

        array_size = size_int*ports[ind].number_of_hy_fields;
        et = cudaMalloc((void**)&(ports[ind].dvhym_indices),
                        array_size);
        cudaMemcpy(ports[ind].dvhym_indices,
                   ports[ind].hym_indices, array_size,
                   cudaMemcpyHostToDevice);

        array_size = size_int*ports[ind].number_of_hy_fields;
        et = cudaMalloc((void**)&(ports[ind].dvhyp_indices),
                        array_size);
        cudaMemcpy(ports[ind].dvhyp_indices,
                   ports[ind].hyp_indices, array_size,
                   cudaMemcpyHostToDevice);

        array_size = size_float*number_of_time_steps;
        et = cudaMalloc((void**)&(ports[ind].dvsampled_current),
                        array_size);
        cudaMemcpy(ports[ind].dvsampled_current,
                   ports[ind].sampled_current, array_size,
                   cudaMemcpyHostToDevice);
    }
    return 0;
}

3.3.4 Thread to Cell Mapping

In the FDTD time-marching loop, the fields in a cell can be updated independently
of the field updates in other cells; hence, the updates in different cells can be
performed in parallel. Therefore, each cell can be assigned to a thread that
performs the field updates within the cell. In CUDA, a number of threads form a
block and a number of blocks form a grid. We can map each thread to a cell and
create a sufficient number of blocks such that the grid spans all cells in the
problem space. Various thread-to-cell mapping schemes are possible. In this
chapter, the thread blocks are constructed as shown in Listing 3.13, which shows
the function setThreadBlocks(). Here, the grid is constructed as a
one-dimensional array of thread blocks and each thread block is constructed as a
one-dimensional array of threads. The number of threads per thread block is
denoted as maximum_threads_per_block in the presented code, which is the maximum
number of threads per block available on the device. Then, in the field updating
kernels, these blocks and threads are mapped to cells. Figure 3.6 illustrates the
thread-to-cell mapping on an x-y plane-cut. Here, the cells in the rightmost
column are considered extension cells that contain the extension fields in the
rightmost column in Figure 3.2.

Listing 3.13 The function setThreadBlocks()


void setThreadBlocks()
{
int n_blocks
= ((n_fields-z_offset)/maximum_threads_per_block)
+ ((n_fields-z_offset)%maximum_threads_per_block
== 0 ? 0 : 1);

block_eh = dim3(maximum_threads_per_block, 1, 1);


grid_eh = dim3(n_blocks, 1, 1);
shared_memory_size = 2 * (maximum_threads_per_block+16)
*sizeof(float);

int ind = active_port_index-1;


int nsv = ports[ind].number_of_sv_fields;
134 Advanced Computational Electromagnetic Methods and Applications

n_blocks = (nsv/maximum_threads_per_block) +
(nsv%maximum_threads_per_block == 0 ? 0 : 1);

block_vs = dim3(maximum_threads_per_block, 1, 1);


grid_vs = dim3(n_blocks, 1, 1);

shared_memory_size_sv =
maximum_threads_per_block*sizeof(float);

int nh = ports[ind].number_of_hx_fields;

n_blocks = (nh/maximum_threads_per_block) +
(nh%maximum_threads_per_block == 0 ? 0 : 1);
block_sc_hx = dim3(maximum_threads_per_block, 1, 1);
grid_sc_hx = dim3(n_blocks, 1, 1);

shared_memory_size_sc_hx =
2*maximum_threads_per_block*sizeof(float);

nh = ports[ind].number_of_hy_fields;
n_blocks = (nh/maximum_threads_per_block) +
(nh%maximum_threads_per_block == 0 ? 0 : 1);

block_sc_hy = dim3(maximum_threads_per_block, 1, 1);


grid_sc_hy = dim3(n_blocks, 1, 1);

shared_memory_size_sc_hy =
2*maximum_threads_per_block*sizeof(float);
}
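
The block-count arithmetic used throughout Listing 3.13 is a ceiling division. As a minimal host-side sketch in plain C (not part of the presented code), it can be written with the common round-up idiom. The helper names blocks_needed() and field_shared_bytes() are hypothetical; the 16-element padding is taken from the listing.

```c
#include <assert.h>
#include <stddef.h>

/* Number of thread blocks needed to give every item its own thread.
   (n + t - 1) / t rounds up and matches the quotient-plus-remainder
   form used in Listing 3.13. Hypothetical helper name. */
int blocks_needed(int n_items, int threads_per_block)
{
    assert(n_items >= 0 && threads_per_block > 0);
    return (n_items + threads_per_block - 1) / threads_per_block;
}

/* Dynamic shared memory per block for the field-update kernels:
   two float arrays, each holding one block of values plus a
   16-element halo. Hypothetical helper name. */
size_t field_shared_bytes(int threads_per_block)
{
    assert(threads_per_block > 0);
    return 2u * (size_t)(threads_per_block + 16) * sizeof(float);
}
```

For example, 1,025 items with 512 threads per block need three blocks, the last of which is mostly idle; this is why the kernels test their index against the item count before doing any work.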

[Figure 3.6: cells on an x-y plane-cut, 12 cells along x and 6 along y. Each
cell is labeled with its block ID (blockIdx.x) and, as a subscript, its thread
ID (threadIdx.x); the illustration uses 8 threads per block, and the IDs
increase along x and then row by row along y. The rightmost column contains
the extension cells.]

Figure 3.6 Thread to cell mapping on an x-y plane-cut.
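
The labels in Figure 3.6 can be reproduced with a few lines of host-side C. This is only a sketch: it assumes that cells are numbered linearly with x varying fastest, and it reads the illustration as 12 cells per row (including the extension column) and 8 threads per block. The type ThreadId and both function names are hypothetical.

```c
#include <assert.h>

/* Block and thread IDs of the thread that handles one cell. */
typedef struct { int block; int thread; } ThreadId;

/* Linear index of cell (i, j) on one x-y plane, x varying fastest.
   nx_padded is the row length including the extension cell. */
int linear_index(int i, int j, int nx_padded)
{
    assert(i >= 0 && i < nx_padded && j >= 0);
    return j * nx_padded + i;
}

/* Inverse of fi = blockIdx.x * blockDim.x + threadIdx.x. */
ThreadId thread_for_index(int fi, int threads_per_block)
{
    assert(fi >= 0 && threads_per_block > 0);
    ThreadId id;
    id.block = fi / threads_per_block;
    id.thread = fi % threads_per_block;
    return id;
}
```

With these assumptions, the first cell of the second row (i = 0, j = 1) has linear index 12 and is handled by thread 4 of block 1, which agrees with the label in the figure. Note that thread blocks do not align with rows, so neighbors in y generally belong to different blocks.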



3.3.5 The Time-Marching Loop

Once the required arrays are allocated on the device memory and relevant data are
copied to the device memory, the program is ready to execute the time-marching
loop. The function runTimeMarchingLoopOnGPU(), shown in Listing 3.14,
executes the time-marching loop on the GPU device. One can compare Listing
3.14 with Listing 3.3 and verify that they contain the same steps: magnetic field
updates, electric field updates, voltage source updates, and voltage and current
calculations.

Listing 3.14 The function runTimeMarchingLoopOnGPU()

void runTimeMarchingLoopOnGPU()
{
    for (int time_step=0;time_step<number_of_time_steps;
        time_step++)
    {
        updateMagneticFieldsOnGPU<<<grid_eh, block_eh,
            shared_memory_size>>>
            (Chxey, Chxez, Chyez, Chyex, Chzex, Chzey,
            dvEx, dvEy, dvEz, dvHx, dvHy, dvHz,
            y_offset, z_offset, n_fields);

        updateElectricFieldsOnGPU<<<grid_eh, block_eh,
            shared_memory_size>>>
            (dvCexhy, dvCexhz, dvCeyhz, dvCeyhx, dvCezhx,
            dvCezhy, dvCeze, dvEx, dvEy, dvEz, dvHx, dvHy,
            dvHz, y_offset, z_offset, n_fields);

        int ind = active_port_index-1;

        float current_value = Cezvs *
            voltage_source_waveform[time_step];

        updateVoltageSourceOnGPU<<<grid_vs, block_vs>>>
            (dvEz, ports[ind].dvsv_indices,
            ports[ind].number_of_sv_fields,
            time_step, current_value);

        for (int ind=0;ind<number_of_ports;ind++)
        {
            captureVoltageOnGPU<<<grid_vs, block_vs,
                shared_memory_size_sv>>>
                (dvEz, ports[ind].dvsv_indices,
                ports[ind].number_of_sv_fields,
                ports[ind].dvsampled_voltage,
                time_step, dz, nz_substrate);
        }

        for (int ind=0;ind<number_of_ports;ind++)
        {
            captureCurrentOnGPU<<<grid_sc_hx, block_sc_hx,
                shared_memory_size_sc_hx>>>
                (dvHx, ports[ind].dvhxm_indices,
                ports[ind].dvhxp_indices,
                ports[ind].number_of_hx_fields,
                ports[ind].dvsampled_current,
                time_step, dx, nz_substrate);

            captureCurrentOnGPU<<<grid_sc_hy, block_sc_hy,
                shared_memory_size_sc_hy>>>
                (dvHy, ports[ind].dvhyp_indices,
                ports[ind].dvhym_indices,
                ports[ind].number_of_hy_fields,
                ports[ind].dvsampled_current,
                time_step, dy, nz_substrate);
        }
    }
}
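
The ordering of the steps in Listing 3.14 can be seen in isolation in a much smaller setting. The following host-side C sketch is not the chapter's code: it assumes a normalized one-dimensional grid with a Courant number of 0.5, fixed end cells, and a hypothetical function name step_1d(). It performs the same four steps per time step: magnetic field update, electric field update, source update, and sampling.

```c
#include <assert.h>

#define N_CELLS 200

/* One leapfrog time step on a normalized 1-D grid (illustrative only).
   Returns the electric field sampled at probe_cell. */
float step_1d(float *ez, float *hy, int source_cell, int probe_cell,
              float source_value)
{
    assert(source_cell >= 0 && source_cell < N_CELLS);
    assert(probe_cell >= 0 && probe_cell < N_CELLS);
    const float c = 0.5f;  /* assumed Courant number */

    /* 1. Magnetic field update from the spatial difference of E. */
    for (int i = 0; i < N_CELLS - 1; i++)
        hy[i] += c * (ez[i + 1] - ez[i]);

    /* 2. Electric field update from the spatial difference of H. */
    for (int i = 1; i < N_CELLS; i++)
        ez[i] += c * (hy[i] - hy[i - 1]);

    /* 3. Source update: add the excitation to the source cell. */
    ez[source_cell] += source_value;

    /* 4. Output calculation: sample the field at the probe. */
    return ez[probe_cell];
}
```

Calling step_1d() once per time step with the current waveform value plays the role of the loop body in Listing 3.14; each inner loop corresponds to one kernel launch, with one thread per loop iteration.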

3.3.6 Field Updates

Listing 3.15 shows the function updateMagneticFieldsOnGPU(). This function
utilizes the shared memory, a cached memory space that is accessible in read
and write mode to all of the threads in the same thread block, in a manner
that follows the recommendations R2 and R3. For instance, one of the magnetic
field updating equations is


$$H_y^{n+1/2}(i,j,k) = H_y^{n-1/2}(i,j,k) + C_{hyez}\left[E_z^{n}(i+1,j,k) - E_z^{n}(i,j,k)\right] + C_{hyex}\left[E_x^{n}(i,j,k+1) - E_x^{n}(i,j,k)\right] \tag{3.3}$$

Equation (3.3) shows that, to calculate the fields in a cell, the fields in
neighboring cells are also needed: for instance, one needs Ez(i+1, j, k) and
Ex(i, j, k+1) to calculate Hy(i, j, k). A thread processing a cell (i, j, k)
can efficiently read a memory space pertaining to (i, j, k+1) since this
access will be coalesced. However, access to a memory address pertaining to
(i+1, j, k) is uncoalesced and expensive in terms of computation time. In this
case, the shared memory can be utilized to retain efficiency. Because it is
on-chip, shared memory is much faster to access than local or global memory.
Each thread can access the memory space associated with it and load the
relevant data into the shared memory. Once the data are in the shared memory,
they are ready for use by the neighboring cells’ threads.
With the thread-to-cell mapping described in Figure 3.6, a problem still
exists: if a thread on the boundary of a thread block needs to access a field
in a cell mapped to another thread block, that data will not be directly
available in the shared memory. To overcome this problem, some threads are
scheduled to load the field data for those cells in the neighboring block in a
separate command, as illustrated in Listing 3.15. The statement
sEy[ti] = Ey[fi] loads a block of data into shared memory, and the statement
sEy[blockDim.x+ti] = Ey[blockDim.x+fi], executed only by the first 16 threads
of the block, loads the first 16 values of the neighboring block into shared
memory, as also illustrated in Figure 3.7. In Listing 3.15, the statement
__syncthreads() is a synchronization barrier that ensures that all required
data are loaded into the shared memory by the threads. Once the data are
available in the shared memory, they can be used to update the fields.

Figure 3.7 Data copy from global memory to shared memory.

Listing 3.15 The function updateMagneticFieldsOnGPU()

__global__ void
updateMagneticFieldsOnGPU(float Chxey, float Chxez,
    float Chyez, float Chyex, float Chzex, float Chzey,
    float* Ex, float* Ey, float* Ez, float* Hx, float* Hy, float*
    Hz, int y_offset, int z_offset, int n_fields)
{
    // Dynamic shared memory: two arrays of blockDim.x+16 floats each.
    extern __shared__ float sE[];
    float *sEy = (float*) sE;
    float *sEz = (float*) &sEy[blockDim.x+16];

    int ti = threadIdx.x;                           // index within the block
    int fi = blockIdx.x * blockDim.x + threadIdx.x; // linear field index

    // Each thread stages the fields of its own cell.
    sEy[ti] = Ey[fi];
    sEz[ti] = Ez[fi];
    // The first 16 threads also stage the start of the next block.
    if (ti<16)
    {
        sEy[blockDim.x+ti] = Ey[blockDim.x+fi];
        sEz[blockDim.x+ti] = Ez[blockDim.x+fi];
    }
    __syncthreads();

    if (fi>=n_fields-z_offset) return;

    Hx[fi] = Hx[fi]
        + Chxey*(Ey[fi+z_offset]-sEy[ti])
        + Chxez*(Ez[fi+y_offset]-sEz[ti]);
    Hy[fi] = Hy[fi]
        + Chyez*(sEz[ti+1]-sEz[ti])
        + Chyex*(Ex[fi+z_offset]-Ex[fi]);
    Hz[fi] = Hz[fi]
        + Chzex*(Ex[fi+y_offset]-Ex[fi])
        + Chzey*(sEy[ti+1]-sEy[ti]);
}
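
The staging pattern of Listing 3.15 can be checked on the host with a small emulation. The sketch below is plain C rather than CUDA and is only illustrative: the function names stage_block() and x_difference() are hypothetical, the loop over ti stands in for the threads of one block, and the caller is assumed to supply a source array that extends at least 16 elements past the end of the block.

```c
#include <assert.h>

#define HALO 16  /* number of extra values staged from the next block */

/* Emulates the shared-memory loads of one thread block.
   sE must hold block_dim + HALO floats; E must hold at least
   (block + 1) * block_dim + HALO floats. */
void stage_block(const float *E, int block, int block_dim, float *sE)
{
    assert(block >= 0 && block_dim > 0);
    for (int ti = 0; ti < block_dim; ti++) {
        int fi = block * block_dim + ti;
        sE[ti] = E[fi];                          /* own cell */
        if (ti < HALO)
            sE[block_dim + ti] = E[block_dim + fi]; /* next block */
    }
}

/* Difference E(i+1) - E(i) for thread ti, read from the staged copy. */
float x_difference(const float *sE, int ti)
{
    return sE[ti + 1] - sE[ti];
}
```

The point of the halo is visible for the last thread of a block: x_difference(sE, block_dim - 1) reads sE[block_dim], which holds the first value of the next block and would otherwise be missing from the staged copy.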

Electric field updates are performed similarly to the magnetic field updates,
utilizing the shared memory, in the function updateElectricFieldsOnGPU()
shown in Listing 3.16.

Listing 3.16 The function updateElectricFieldsOnGPU()

__global__ void
updateElectricFieldsOnGPU(float* Cexhy, float* Cexhz, float*
    Ceyhz, float* Ceyhx, float* Cezhx, float* Cezhy, float* Ceze,
    float* Ex, float* Ey, float* Ez, float* Hx, float* Hy, float*
    Hz, int y_offset, int z_offset, int n_fields)
{
    extern __shared__ float sH[];
    float *sHy = (float*) sH;
    float *sHz = (float*) &sHy[blockDim.x+16];

    int ti = threadIdx.x;
    int fi = blockIdx.x * blockDim.x + threadIdx.x;

    sHy[ti+16] = Hy[fi];
    sHz[ti+16] = Hz[fi];
    if (ti<16)
    {
        sHy[ti] = Hy[fi-16];
        sHz[ti] = Hz[fi-16];
    }
    __syncthreads();

    if (fi>=n_fields-z_offset) return;

    if (fi<y_offset) return;

    Ez[fi] = Ceze[fi]*Ez[fi]
        + Cezhy[fi]*(sHy[ti+16]-sHy[ti+15])
        + Cezhx[fi]*(Hx[fi]-Hx[fi-y_offset]);

    if (fi<z_offset) return;

    Ex[fi] = Ex[fi]
        + Cexhz[fi]*(sHz[ti+16]-Hz[fi-y_offset])
        + Cexhy[fi]*(sHy[ti+16]-Hy[fi-z_offset]);

    Ey[fi] = Ey[fi]
        + Ceyhx[fi]*(Hx[fi]-Hx[fi-z_offset])
        + Ceyhz[fi]*(sHz[ti+16]-sHz[ti+15]);
}

3.3.7 Source Updates and Output Calculations

Listing 3.17 shows the function updateVoltageSourceOnGPU() that updates the
electric fields due to the voltage source. The indices of the fields that
overlap with the voltage source, thus needing to be updated accordingly, are
stored in the array sv_indices. The electric field components at these indices
are accessed and an excitation value is added to them. Global memory access to
the data at these indices is not coalesced; however, since the number of field
components in consideration is very small compared with the total number of
fields in the entire problem space, the penalty to the speed of the program
due to this noncoalesced access is not critical.

Listing 3.17 The function updateVoltageSourceOnGPU()

__global__ void
updateVoltageSourceOnGPU(float* Ez, int* sv_indices,
    int number_of_sv_fields, int time_step,
    float current_value)
{
    int ti = blockIdx.x * blockDim.x + threadIdx.x;
    if (ti >= number_of_sv_fields) return;
    // Use the global thread index so that the kernel stays correct
    // when the index list spans more than one block.
    int field_index = sv_indices[ti];
    Ez[field_index] = Ez[field_index] + current_value;
}

Listing 3.18 shows the function captureVoltageOnGPU() that calculates the
sampled voltage on a port. The indices of the fields that overlap with the
port are stored in the array sv_indices. The electric field components at
these indices are first copied into the shared memory. Then, the values are
summed and averaged to calculate the sampled voltage at the port. The
calculated voltage is stored in the array sampled_voltage at the index of the
current time step. The sampled current at a port is captured similarly to the
voltage in the function captureCurrentOnGPU(), shown in Listing 3.19, and
stored in an array called sampled_current.

Listing 3.18 The function captureVoltageOnGPU()

__global__ void
captureVoltageOnGPU(float* Ez, int* sv_indices,
    int number_of_sv_fields, float* sampled_voltage,
    int time_step, float dz, int nz_substrate)
{
    extern __shared__ float sv[];

    int ti = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard the load instead of returning early, so that every thread
    // of the block reaches __syncthreads().
    if (ti < number_of_sv_fields)
    {
        // Index the list with the global thread index so that the
        // kernel stays correct when more than one block is launched.
        int field_index = sv_indices[ti];
        sv[threadIdx.x] = Ez[field_index];
    }

    __syncthreads();

    if (threadIdx.x > 0) return;

    float sv_sum = 0;
    float scaling = - dz*nz_substrate/number_of_sv_fields;
    int n_fields = number_of_sv_fields
        - blockIdx.x * blockDim.x;
    for (int i=0; i<n_fields && i<blockDim.x; i++)
        sv_sum = sv_sum + sv[i];

    // Thread 0 of each block adds its partial sum. atomicAdd() on float
    // (compute capability 2.0 and later) avoids a race between blocks.
    atomicAdd(&sampled_voltage[time_step], sv_sum*scaling);
}

Listing 3.19 The function captureCurrentOnGPU()

__global__ void
captureCurrentOnGPU(float* H, int* h_indices_plus, int*
    h_indices_minus, int number_of_h_fields, float*
    sampled_current, int time_step, float d, int nz_substrate)
{
    extern __shared__ float sH[];
    float *sHp = (float*) sH;
    float *sHm = (float*) &sHp[blockDim.x];

    int ti = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard the loads instead of returning early, so that every thread
    // of the block reaches __syncthreads().
    if (ti < number_of_h_fields)
    {
        // Index the lists with the global thread index so that the
        // kernel stays correct when more than one block is launched.
        int field_index_p = h_indices_plus[ti];
        int field_index_m = h_indices_minus[ti];
        sHp[threadIdx.x] = H[field_index_p];
        sHm[threadIdx.x] = H[field_index_m];
    }

    __syncthreads();

    if (threadIdx.x > 0) return;

    float h_sum = 0;
    float scaling = d/nz_substrate;
    int n_fields = number_of_h_fields - blockIdx.x * blockDim.x;
    for (int i=0; i<n_fields && i<blockDim.x; i++)
    {
        h_sum = h_sum + sHp[i] - sHm[i];
    }

    // Thread 0 of each block adds its partial sum. atomicAdd() on float
    // (compute capability 2.0 and later) avoids a race between blocks.
    atomicAdd(&sampled_current[time_step], h_sum*scaling);
}
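
The summation performed by thread 0 of each block in Listings 3.18 and 3.19 can be mirrored on the host to check the index arithmetic. The following plain C sketch is illustrative only; the function names block_partial_sum() and sampled_value() are hypothetical, and the outer loop over blocks stands in for the independent thread blocks of the kernel.

```c
#include <assert.h>

/* Partial sum formed by one block: it covers at most block_dim values,
   and fewer in the last block when n_values is not a multiple. */
float block_partial_sum(const float *values, int n_values,
                        int block, int block_dim)
{
    assert(block >= 0 && block_dim > 0);
    int remaining = n_values - block * block_dim;
    float sum = 0.0f;
    for (int i = 0; i < remaining && i < block_dim; i++)
        sum += values[block * block_dim + i];
    return sum;
}

/* Accumulates the scaled partial sums of all blocks into one sample. */
float sampled_value(const float *values, int n_values,
                    int block_dim, float scaling)
{
    assert(n_values >= 0 && block_dim > 0);
    int n_blocks = (n_values + block_dim - 1) / block_dim;
    float total = 0.0f;
    for (int b = 0; b < n_blocks; b++)
        total += block_partial_sum(values, n_values, b, block_dim) * scaling;
    return total;
}
```

On the device, the accumulation over blocks happens concurrently, so the order of the additions is not fixed and the last bits of the floating-point result can differ from this sequential version.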

Sampled voltages and currents on the ports are the main outputs of the section
of the program running on the GPU device. These output data are copied back to
the host memory in the function copyDataBackAndClearGPU(), a section of
which is shown in Listing 3.20. Here the CUDA function cudaMemcpy() is used
for data copy. The program proceeds with scattering parameters calculation, as
shown in Listing 3.2, once the output data are available on the host memory.

Listing 3.20 A section of the function copyDataBackAndClearGPU()

int array_size = sizeof(float)*number_of_time_steps;
for (int ind=0;ind<number_of_ports;ind++)
{
    cudaMemcpy(ports[ind].sampled_voltage,
        ports[ind].dvsampled_voltage, array_size,
        cudaMemcpyDeviceToHost);

    cudaMemcpy(ports[ind].sampled_current,
        ports[ind].dvsampled_current, array_size,
        cudaMemcpyDeviceToHost);
}

3.4 NUMERICAL RESULTS

The performance of the presented code is examined using the low-pass filter
presented in Figure 3.1 as an example. Figure 3.8 shows the scattering parameters
obtained for this filter up to 20 GHz.
Table 3.1 shows the simulation parameters used for performance evaluation.
The low-pass filter simulation is performed first using a coarse grid and then using
a fine grid that has half the cell size of the coarse grid. Each of the cases is
repeated on both the CPU and GPU platforms. The simulation times are recorded
and the computation performances are calculated as number of million cells
processed per second using the formula
$$\mathrm{MCPS} = \frac{N_{steps} \times N_x \times N_y \times N_z}{t_s} \times 10^{-6} \tag{3.4}$$

where Nsteps is the number of time steps the program has been run; Nx, Ny, and
Nz are the numbers of cells in the x, y, and z directions; and ts is the total
simulation time in seconds. In terms of MCPS, the results show a speed-up
factor of about 14 for the coarse grid simulations when computation is
performed on a GPU versus a CPU. The speed-up factor is about 50 for the fine
grid simulations. The results also indicate that the performance of
computations on a GPU is higher when the problem size is larger.
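
As a worked check of (3.4), the formula can be evaluated with the entries of Table 3.1. The helper below is a minimal C sketch with a hypothetical name, mcps(); it is not part of the presented program.

```c
#include <assert.h>

/* Million cells processed per second, following (3.4). */
double mcps(long n_steps, long nx, long ny, long nz, double seconds)
{
    assert(seconds > 0.0);
    return (double)n_steps * (double)nx * (double)ny * (double)nz
           / seconds * 1e-6;
}
```

For the coarse grid on the CPU, mcps(5000, 70, 66, 13, 8.96) evaluates to about 33.5, which rounds to the 34 MCPS listed in the table; for the fine grid on the GPU, mcps(10000, 143, 132, 26, 4.97) gives about 987, listed as 990.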

Figure 3.8 Scattering parameters of the lowpass filter.



Table 3.1
Simulation Parameters and Performance of Computations

                            Coarse Grid            Fine Grid
Platform                    CPU        GPU         CPU        GPU
Cell size     dx (mm)       4.064      4.064       2.032      2.032
              dy (mm)       4.233      4.233       2.117      2.117
              dz (mm)       2.65       2.65        1.33       1.33
Number        Nx            70         79          140        143
of cells      Ny            66         66          132        132
              Nz            13         13          26         26
Total cells                 60,060     67,782      480,480    490,776
Time steps                  5,000      5,000       10,000     10,000
Simulation time (s)         8.96       0.71        242        4.97
Performance (MCPS)          34         480         20         990

It should be noted that these simulations are performed on a computer with an
Intel Core 2 Quad Q9550 CPU at 2.83 GHz and an NVIDIA GTX 480 GPU. CPU
simulations are performed on a single core; thus, the multicore power of the
CPU is not fully utilized. The speed-up factors would be different if the CPU
code were parallelized to run on multiple cores.

REFERENCES

[1] K. Yee, “Numerical Solution of Initial Boundary Value Problems Involving Maxwell’s
Equations in Isotropic Media,” IEEE Transactions on Antennas and Propagation, Vol. 14, No. 3,
pp. 302–307, 1966.
[2] A. Taflove and S. Hagness, Computational Electrodynamics: The Finite-Difference Time-
Domain Method, 3rd edition. Norwood, MA: Artech House, 2005.
[3] A. Elsherbeni and V. Demir, The Finite Difference Time Domain Method for Electromagnetics:
With MATLAB Simulations, New York: SciTech Publishing, 2009.
[4] NVIDIA CUDA Parallel Programming and Computing Platform,
http://www.nvidia.com/object/cuda_home_new.html.
[5] S. Krakiwsky, L. Turner, and M. Okoniewski, “Graphics Processor Unit (GPU) Acceleration of
Finite-Difference Time-Domain (FDTD) Algorithm,” Proc. 2004 International Symposium on
Circuits and Systems, Vol. 5, pp. 265–268, 2004.
[6] S. Krakiwsky, L. Turner, and M. Okoniewski, “Acceleration of Finite-Difference Time-Domain
(FDTD) Using Graphics Processor Units (GPU),” 2004 IEEE MTT-S International Microwave
Symposium Digest, Vol. 2, pp. 1033–1036, 2004.

[7] S. Adams, J. Payne, and R. Boppana, “Finite Difference Time Domain (FDTD) Simulations
Using Graphics Processors,” Proceedings of the 2007 DoD High Performance Computing
Modernization Program Users Group (HPCMP) Conference, pp. 334–338, 2007.
[8] M. Inman, A. Elsherbeni, and C. Smith, “GPU Programming for FDTD Calculations,” The
Applied Computational Electromagnetics Society (ACES) Conference, Honolulu, HI, 2005.
[9] M. Inman and A. Elsherbeni, “Programming Video Cards for Computational Electromagnetics
Applications,” IEEE Antennas and Propagation Magazine, Vol. 47, No. 6, pp. 71–78, 2005.
[10] M. Inman and A. Elsherbeni, “Acceleration of Field Computations Using Graphical Processing
Units,” The Twelfth Biennial IEEE Conference on Electromagnetic Field Computation CEFC
2006, Miami, FL, 2006.
[11] M. Inman, A. Elsherbeni, J. Maloney, and B. Baker, “Practical Implementation of a CPML
Absorbing Boundary for GPU Accelerated FDTD Technique,” The 23rd Annual Review of
Progress in Applied Computational Electromagnetics Society, ACES'07, Verona, Italy, pp. 19–23,
2007.
[12] M. Inman, A. Elsherbeni, J. Maloney, and B. Baker, “Practical Implementation of a CPML
Absorbing Boundary for GPU Accelerated FDTD Technique,” Applied Computational
Electromagnetics Society Journal, Vol. 23, No. 1, pp. 16–22, 2008.
[13] M. J. Inman and A. Z. Elsherbeni, “Optimization and Parameter Exploration Using GPU Based
FDTD Solvers,” IEEE MTT-S International Microwave Symposium Digest, pp. 149–152, June
2008.
[14] M. Inman, A. Elsherbeni, and V. Demir, “Graphics Processing Unit Acceleration of Finite
Difference Time Domain,” Chapter 12, in The Finite Difference Time Domain Method for
Electromagnetics (with MATLAB Simulations), New York: SciTech Publishing, 2009.
[15] Ian Buck, Brook Spec v0.2, Stanford, CA: Stanford University Press, 2003.
[16] N. Takada, N. Masuda, T. Tanaka, Y. Abe, and T. Ito, “A GPU Implementation of the 2-D
Finite-Difference Time-Domain Code Using High Level Shader Language,” Applied
Computational Electromagnetics Society Journal, Vol. 23, No. 4, pp. 309–316, 2008.
[17] A. Valcarce, G. de la Roche, and J. Zhang, “A GPU Approach to FDTD for Radio Coverage
Prediction,” Proceedings of the 11th IEEE Singapore International Conference on
Communication Systems (ICCS), pp. 1585–1590, GuangZhou, China, 2008.
[18] P. Sypek and M. Michal, “Optimization of a FDTD Code for Graphical Processing Units,” 17th
International Conference on Microwaves, Radar and Wireless Communications, MIKON 2008,
Wroclaw, Poland, pp. 1–3, 2008.
[19] P. Sypek, A. Dziekonski, and M. Mrozowski, “How to Render FDTD Computations More
Effective Using a Graphics Accelerator,” IEEE Transactions on Magnetics, Vol. 45, No. 3, pp.
1324–1327, 2009.
[20] N. Takada, T. Shimobaba, N. Masuda, and T. Ito, “High-Speed FDTD Simulation Algorithm for
GPU with Compute Unified Device Architecture,” IEEE International Symposium on Antennas
& Propagation & USNC/URSI National Radio Science Meeting, North Charleston, SC, p. 4,
2009.
[21] A. Valcarce, G. De La Roche, A. Jüttner, D. López-Pérez, and J. Zhang, “Applying FDTD to the
Coverage Prediction of WiMAX Femtocells,” EURASIP Journal on Wireless Communications
and Networking, 2009.

[22] V. Demir, and A. Elsherbeni, “Compute Unified Device Architecture (CUDA) Based Finite-
Difference Time-Domain (FDTD) implementation,” Journal of the Applied Computational
Electromagnetics Society (ACES), Vol. 25, No. 4, pp. 303–314, 2010.
[23] The OpenCL Specification, ver. 1.0, Khronos OpenCL Working Group,
http://www.khronos.org/registry/cl/specs/opencl-1.0.48.pdf, 2009.
[24] T. Stefanski, S. Benkler, N. Chavannes, and N. Kuster. “Parallel Implementation of the Finite-
Difference Time-Domain Method in Open Computing Language,” 2010 International
Conference on Electromagnetics in Advanced Applications (ICEAA), pp. 557–560, 2010.
[25] T. P. Stefanski, P. Tomasz, N. Chavannes, and N. Kuster. “Performance Evaluation of the Multi-
Device OpenCL FDTD Solver,” Proceedings of the 5th European Conference on Antennas and
Propagation (EUCAP), pp. 3995–3998, 2011.
[26] D. Sheen, S. Ali, M. Abouzahra, and J. Kong, “Application of the Three-Dimensional Finite-
Difference Time-Domain Method to the Analysis of Planar Microstrip Circuits,” IEEE
Transactions on Microwave Theory and Techniques, Vol. 38, No. 7, pp. 849–857, 1990.
Chapter 4
Recent FDTD Advances for Electromagnetic
Wave Propagation in the Ionosphere
Alireza Samimi, Bach T. Nguyen, and Jamesina J. Simpson

This chapter presents an overview of two recent FDTD [1, 2] modeling advances
for calculating electromagnetic wave propagation in the ionosphere [3]. Section 4.1
provides an introduction of the topic and highlights the advantages of using FDTD
over ray tracing techniques for this application. Section 4.2 summarizes the current
state of the art for calculating transionospheric electromagnetic wave propagation.
Section 4.3 provides an overview of the global FDTD Earth-ionosphere models.
Section 4.4 then describes a new and efficient 3-D FDTD magnetized ionospheric
plasma model [4] that may be used to greatly advance the current state of the art
for ionospheric wave propagation. Next, Section 4.5 describes a new capability:
stochastic FDTD (S-FDTD) [5, 6] magnetized ionospheric plasma modeling [7],
which can yield both average as well as variance electric and magnetic fields
resulting from variances and uncertainties in the ionosphere composition. This
chapter then concludes with a discussion of input parameters and a list of possible
applications of these models.

4.1 INTRODUCTION

Communications, radar, and remote-sensing applications rely heavily on accurate
knowledge of both the state of the ionosphere and its influence on electromagnetic
signal propagation. Satellite communications, the global positioning system (GPS),
over-the-horizon radar, target direction finding, ionospheric remote sensing, and
space weather effects are some example applications. The success of these
applications may be greatly improved with the availability of accurate modeling
capabilities. Three major challenges, however, must be overcome in order to
perform realistic calculations of electromagnetic wave propagation through the
ionosphere:


• For most applications, the electromagnetic wave frequency is high enough
  such that complex magnetized plasma physics must be accommodated.
• The ionosphere exhibits high variability and uncertainty in both time and
  space.
• The ionosphere is comprised of both large- and small-scale structures that
  often need to be accommodated.

Several approximate methods involving ray tracing have been proposed to
calculate transionospheric electromagnetic wave propagation [8–11]; however,
these methods are incapable of taking into account the full ionospheric variability
and/or terrain between the transmitters and receivers. Further, as the frequency of
the electromagnetic wave is reduced, their calculated results diverge from the true
solution as the physical reality departs from the short-wavelength asymptotic
assumptions underlying geometrical optics and ray tracing. Finally, for techniques
such as phase screen or Rytov approximations, the calculated results are only valid
for weak fluctuations of the ionosphere.
The FDTD method is a robust computational electromagnetic technique that
has been applied to problems across the electromagnetic spectrum, from low-
frequency geophysical problems below 1 Hz and up into the optical frequency
range [2]. The advantages of FDTD for Earth-ionosphere wave propagation
problems include [12–14]:

• As a grid-based method, the 3-D spatial material variations of the
  ionosphere composition, topography/bathymetry, lithosphere composition,
  geomagnetic field, targets, and antennas may be accommodated. Figure 4.1,
  for example, shows FDTD-calculated global electromagnetic propagation in
  the Earth-ionosphere waveguide below 1 kHz that includes details of the
  Earth’s topography, bathymetry, oceans, and a (simplified) layered
  ionosphere, which is often sufficient for studying electromagnetic wave
  propagation below 1 kHz.
• The complex shielding, scattering, and diffraction of electromagnetic waves
  may be calculated in a straightforward manner.
• Any number of simultaneous sources may be accommodated (antennas,
  plane waves, lightning, ionospheric currents).
• Any number of observation points may be accommodated, and movies may
  be created of the time-marching propagating waves.
• As a time-domain method, FDTD can model arbitrary time-varying source
  waveforms, movement of objects, and time variations in the ionosphere.
• Results may be obtained over a large spectral bandwidth via a discrete
  Fourier transform.
• A full 3-D magnetized ionospheric plasma FDTD algorithm may be used to
  calculate all important ionospheric effects on signals, including absorption,
  refraction, phase and group delay, frequency shift, polarization, and Faraday
  rotation.

The challenge of accommodating all of the above details and physics is that
the FDTD model may quickly become very memory and time intensive and thus
require significant supercomputing resources. This makes real-time
calculations difficult or sometimes even impossible to obtain. Further, if the
electromagnetic frequency is high enough (and thus the required grid cell size
small enough), the required grid size may become computationally infeasible,
especially for long propagation paths.

Figure 4.1 Snapshot visualizations of round-the-world electromagnetic propagation below 1 kHz as
calculated by a 3-D FDTD model including details of the Earth’s topography, oceans,
and a (simplified) layered ionosphere. ©2007 IEEE [13].

Although supercomputing capabilities continue to improve, efficient FDTD
algorithms are still needed to make electromagnetic wave propagation modeling
in the ionosphere feasible and manageable. With efficient algorithms, FDTD and
its advantages as listed above may provide a means of greatly advancing the
current state of the art relative to the approximate calculations that have
been used for decades. Sections 4.4 and 4.5 describe two such efficient FDTD
algorithms that have been introduced recently.

4.2 CURRENT STATE OF THE ART

In 1837, W. R. Hamilton introduced a system of differential equations describing
ray paths through general anisotropic media [15]. In 1954, J. Haselgrove proposed
that Hamilton’s equations were suitable for numerical integration on electronic
computers and could provide a means of calculating ray paths in the ionosphere
[16]. In 1960, Haselgrove and Haselgrove implemented such a ray-tracing program
to calculate twisted ray paths through a model ionosphere using Cartesian
coordinates [8, 17].
In 1975, M. Jones and J. J. Stephenson generated “an accurate, versatile
FORTRAN computer program for tracing rays through an anisotropic medium
whose index of refraction varies continuously in three dimensions” [18]. This
model and variations of it are still in use today, and have been applied to such
applications as over-the-horizon radar [8]. Additionally, many other related
techniques have now been generated especially for higher frequency scintillation
studies, including the phase screen [10] or Rytov approximation, parabolic
equation method [9], and even hybrid methods, such as combining the complex
phase method and the technique of a random screen [11].
These techniques, however, are only valid under certain conditions. The
complex phase method, for example, is only valid for electromagnetic wave
propagation above 1 GHz. The phase screen or Rytov approximation is only valid
for weak fluctuations of the ionosphere. For all of these methods involving ray
tracing, as the frequency of the electromagnetic wave is reduced and its
wavelength increases, the calculated results diverge from the true solution as the
physical reality departs from the short-wavelength asymptotic assumptions
underlying geometrical optics and ray tracing [18].
Ray tracing has traditionally been employed for ionospheric propagation
because it is computationally inexpensive. However, it has several
limitations: it is incapable of taking into account all of the variable
terrain and structural material properties between the transmitters and
receivers; particular methodologies of implementing the ray tracing are
limited to certain frequency ranges; its accuracy depends on the plasma
properties; and it provides solutions at only individual frequencies
(steady-state solutions may be obtained; pulses cannot be studied). An
alternative to ray tracing is full-vector Maxwell’s equations FDTD modeling,
which is not limited by the above issues.
FDTD plasma models have been developed by a number of groups [19–22].
However, each of these models either requires large amounts of computer
memory, requires very small time steps linked to the plasma parameters rather
than the Courant limit, or produces nonphysical spurious electrostatic waves
(of numerical origin) due to the spatially noncollocated status of electric
fields and current densities, resulting in late-time instabilities [19, 20].
Section 4.4 describes an FDTD plasma method [4] that does not suffer from
these drawbacks.
As an initial study of the performance of an FDTD ionospheric plasma
algorithm, FDTD plasma model results have previously been compared to ray-
tracing results for the application of reducing the radar cross-section of targets [23].
Although [23] was limited to unmagnetized, collisional cold plasmas, the authors
conclude that FDTD is more accurate and less restrictive than ray tracing, at the
cost of being more computationally demanding. For example, they determine that
ray tracing only yields accurate results in their study when both the density scale
length is long compared to the free-space wavelength of the incident wave and
when the conduction current is small as compared to the displacement current in
the medium. Additionally, ray tracing provides solutions at only individual
frequencies (i.e., for sinusoidal steady-state signals, not for pulses).

4.3 FDTD EARTH-IONOSPHERE MODEL OVERVIEW

In this section, the general modeling approach to the global 3-D FDTD Earth-
ionosphere models is described. Only the solution to Maxwell’s equations, as
applied to the lithosphere and atmosphere regions, will be presented in this section.
These Maxwell’s equations solutions may also be used in the ionosphere region if
a simplified isotropic ionosphere is utilized. Section 4.4 will describe the modeling
approach that may be used in the ionosphere region when the magnetized
ionospheric plasma must be accommodated.
Although global FDTD models are described here, these models may be easily
adapted to simulate only local regions at higher resolutions. This is useful for
higher-frequency applications in which the electromagnetic waves would only
propagate vertically (radially) or only over short distances laterally around the
world.
Two generations of global FDTD models have been generated: latitude-
longitude models [12, 24] and geodesic models [25]. Only a latitude-longitude
model will be described here because of the ease of implementing a magnetized
ionospheric plasma algorithm on its East-West and North-South components rather
than on the variable field component orientations in the geodesic hexagonal and
triangular grid cells.
Note that the model described here from [12] is more efficient than other
global FDTD models [24] because it includes a means of mitigating the grid-cell
eccentricity by merging cells in the East-West direction as either pole is
approached. This mitigation technique permits the use of a larger time step of
nearly the Courant limit permitted by the Equatorial cells.

4.3.1 FDTD Space Lattice

The present model maps the complete Earth-ionosphere cavity onto a 3-D
spherical-coordinate FDTD space lattice that extends ±100 km radially from sea
level. Figure 4.2 illustrates the general layout of the lattice as seen from the
transverse magnetic (TM) plane at a constant radial coordinate. The lattice is
presented in this section in a logically Cartesian 2M × M × K-cell arrangement,
where M is a power of 2, in order to distinguish between the spherical positions of
each cell (r, θ, ϕ) and their corresponding grid cell indices (i, j, k). The lattice-cell
position index in the West-East direction is 1 ≤ i ≤ 2M, the lattice-cell position
index in the South-North direction is 1 ≤ j ≤ M, and the lattice-cell position index
in the radial direction is 1 ≤ k ≤ K. We see that the grid cells follow along lines of
constant latitude, θ = constant, where θ is the usual spherical angle measured from
the North Pole, and along lines of constant longitude, ϕ = constant, where ϕ is the
usual spherical azimuthal angle measured from a specified prime meridian. In this
manner, each TM plane of the grid shown in Figure 4.2 is composed of isosceles
trapezoidal cells away from the North and South Poles, and isosceles triangular cells
at the poles. Similarly, each transverse electric (TE) plane at a constant radial
coordinate is composed of isosceles trapezoidal cells away from the North and
South Poles, and a polygon at the poles [12].

Figure 4.2 General layout of the 3-D FDTD Earth-ionosphere model as seen from a TM plane at a
constant radial coordinate. ©2014 IEEE [12].

The same angular increment in latitude, Δθ = π/M, is chosen for each cell in the
grid. Thus, the South-North span of each trapezoidal or triangular grid cell is Δs-n =
πR/M, where R is the radial distance from the center of the Earth. To maintain
square or nearly square grid cells near the equator, we select the baseline value of
the angular increment in longitude, Δϕ, to equal Δθ. However, this causes the
West-East span of each cell, Δw-e = R Δϕ sin θ, to be a function of θ. This could be
troublesome for cells near the North and South Poles, where θ → 0 and θ → π,
respectively. There, the geometrical eccentricity of each cell, Δs-n / Δw-e = Δθ / (Δϕ
sin θ), would become quite large, and the numerical stability and efficiency of the
FDTD algorithm would be degraded. To address this issue, adjacent grid cells in
the West-East direction are systematically combined as either pole is approached
in order to keep all the grid cells at nearly the same size [12]. This is illustrated
(not to scale) in Figure 4.2. Details on the implementation of this combining
process are provided in [12].
The wrap-around or joining of the lattice is along a specific line of constant
longitude, or meridian. This joining is, in effect, a periodic boundary condition
applied at each j-row of lattice cells, whether trapezoids or triangles [12].
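
As a rough illustration of why the merging is needed, the following Python sketch evaluates the eccentricity Δs-n / Δw-e from the expressions above and applies a simple power-of-two merging rule. The rule in `merge_factor`, the function names, and the value of M are assumptions made only for illustration; the actual combining procedure is the one detailed in [12].

```python
import math


def eccentricity(theta, merge=1):
    """Delta_s-n / Delta_w-e for a row at colatitude theta (radians).

    With the baseline choice dphi = dtheta, the ratio reduces to
    1 / (merge * sin(theta)), where merge is the number of adjacent
    West-East cells combined into one.
    """
    return 1.0 / (merge * math.sin(theta))


def merge_factor(theta):
    """Hypothetical rule: largest power of two not exceeding 1 / sin(theta).

    This keeps the merged-cell eccentricity in the interval [1, 2).
    It is an illustrative choice, not the rule used in [12].
    """
    return 2 ** int(math.floor(math.log2(1.0 / math.sin(theta))))


if __name__ == "__main__":
    M = 512  # assumed number of South-North rows (a power of 2)
    for rows_from_pole in (1, 4, 64, M // 2):
        theta = math.pi * rows_from_pole / M
        m = merge_factor(theta)
        print(rows_from_pole, m,
              round(eccentricity(theta), 2),
              round(eccentricity(theta, m), 2))
```

Restricting the merge count to powers of two is convenient here because the row length 2M is itself a power of two, so every merged row still holds an integer number of cells.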

4.3.2 Example Updating Algorithm for TM Grid Cells

Given the above assumptions, Ampere’s Law in integral form [2] can be applied to
develop an FDTD time-stepping relation for the electric field Ez at the center of the
(i, j, k)’th trapezoidal grid cell [12]. For example, we have
Ezn 1  i, j , k   Ezn  i, j , k 
 H xn  0.5  i, j  0.5, k   we  j  0.5, k   (4.1)
t 
 
 H x  i, j  0.5, k   we  j  0.5, k 
n  0.5
 
 0 S  j, k   
  H y  i  0.5, j , k   H y  i  0.5, j , k    s  n 
 n  0.5 n  0.5

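where Δt is the time step size and

$$
\Delta_{w\text{-}e}(j+0.5,k) = R\,\Delta\phi\,\sin\!\left[\frac{\pi (M-j)}{M}\right] \tag{4.2a}
$$

$$
\Delta_{w\text{-}e}(j-0.5,k) = R\,\Delta\phi\,\sin\!\left[\frac{\pi (M-j+1)}{M}\right] \tag{4.2b}
$$

$$
S(j,k) = \frac{\Delta_{s\text{-}n}}{2}\Big[\Delta_{w\text{-}e}(j-0.5,k) + \Delta_{w\text{-}e}(j+0.5,k)\Big] \tag{4.2c}
$$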

Similarly, the update for Ez at the center of the i’th triangular grid cell at the
North Pole (j = M) is given by

$$
E_z^{n+1}(i,M,k) = E_z^{n}(i,M,k) + \frac{\Delta t}{\varepsilon_0\, S(M,k)}
\Big[ H_x^{n+0.5}(i,M-0.5,k)\,\Delta_{w\text{-}e}(M-0.5,k)
+ \big( H_y^{n+0.5}(i+0.5,M,k) - H_y^{n+0.5}(i-0.5,M,k) \big)\,\Delta_{s\text{-}n} \Big]
\tag{4.3}
$$

where Δw-e(M − 0.5, k) is given by (4.2) for the case j = M, and

$$
S(M,k) = \frac{\Delta_{w\text{-}e}(M-0.5,k)\,\Delta_{s\text{-}n}}{2}\,
\sin\!\left\{ \cos^{-1}\!\left[ \frac{\Delta_{w\text{-}e}(M-0.5,k)}{2\,\Delta_{s\text{-}n}} \right] \right\}
\tag{4.4}
$$

Expressions analogous to (4.3) and (4.4) can be derived for the i’th triangular
grid cell at the South Pole (j = 1).
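
The cell geometry and the Ez update can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [12]: rows are assumed to be numbered j = 1..M from south to north, so that the north edge of row j lies at colatitude π(M − j)/M; cell merging is ignored; and all function and argument names are hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def edge_span(R, M, j, side):
    """West-East length of the south (side=-1) or north (side=+1) edge of row j.

    Assumes rows j = 1..M counted from south to north, so the north edge of
    row j lies at colatitude pi*(M - j)/M and the south edge at pi*(M - j + 1)/M.
    """
    dphi = math.pi / M  # baseline choice dphi = dtheta
    n = M - j if side > 0 else M - j + 1
    return R * dphi * math.sin(math.pi * n / M)


def trapezoid_area(R, M, j):
    """Area of a trapezoidal cell, following (4.2c)."""
    ds_n = math.pi * R / M
    return 0.5 * ds_n * (edge_span(R, M, j, -1) + edge_span(R, M, j, +1))


def polar_triangle_area(R, M):
    """Area of the triangular North Pole cell (j = M), following (4.4)."""
    ds_n = math.pi * R / M
    base = edge_span(R, M, M, -1)
    return 0.5 * base * ds_n * math.sin(math.acos(base / (2.0 * ds_n)))


def update_ez(ez, hx_south, hx_north, hy_east, hy_west,
              w_south, w_north, ds_n, area, dt):
    """One E_z update, following (4.1): circulation of H divided by eps0 * S."""
    circulation = (hx_south * w_south - hx_north * w_north
                   + (hy_east - hy_west) * ds_n)
    return ez + dt / (EPS0 * area) * circulation
```

Expression (4.4) is the usual area of an isosceles triangle with base Δw-e(M − 0.5, k) and legs Δs-n, which gives a simple check on `polar_triangle_area`.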
The basic TM-plane FDTD time-stepping algorithm is completed by
specifying the updates for the Hx and Hy fields [12]. For example, for the
trapezoidal grid cell we have

$$
H_x^{n+1.5}(i,j+0.5,k) = H_x^{n+0.5}(i,j+0.5,k)
+ \frac{\Delta t}{\mu_0\,\Delta r}\Big[E_y^{n+1}(i,j+0.5,k+0.5) - E_y^{n+1}(i,j+0.5,k-0.5)\Big]
- \frac{\Delta t}{\mu_0\,\Delta_{s\text{-}n}}\Big[E_z^{n+1}(i,j+1,k) - E_z^{n+1}(i,j,k)\Big]
\tag{4.5}
$$

$$
H_y^{n+1.5}(i+0.5,j,k) = H_y^{n+0.5}(i+0.5,j,k)
- \frac{\Delta t}{\mu_0\,\Delta r}\Big[E_x^{n+1}(i+0.5,j,k+0.5) - E_x^{n+1}(i+0.5,j,k-0.5)\Big]
+ \frac{\Delta t}{\mu_0\,\Delta_{w\text{-}e}(j,k)}\Big[E_z^{n+1}(i+1,j,k) - E_z^{n+1}(i,j,k)\Big]
\tag{4.6}
$$

where Δr is the cell span in the radial direction.


For a triangular grid cell at the North Pole (j = M), we similarly have

$$
H_x^{n+1.5}(i,M-0.5,k) = H_x^{n+0.5}(i,M-0.5,k)
+ \frac{\Delta t}{\mu_0\,\Delta r}\Big[E_y^{n+1}(i,M-0.5,k+0.5) - E_y^{n+1}(i,M-0.5,k-0.5)\Big]
+ \frac{\Delta t}{\mu_0\,\Delta_{s\text{-}n}}\Big[E_z^{n+1}(i,M-1,k) - E_z^{n+1}(i,M,k)\Big]
\tag{4.7}
$$

$$
H_y^{n+1.5}(i+0.5,M,k) = H_y^{n+0.5}(i+0.5,M,k)
- \frac{\Delta t}{\mu_0\,\Delta r}\Big[E_x^{n+1}(i+0.5,M,k+0.5) - E_x^{n+1}(i+0.5,M,k-0.5)\Big]
+ \frac{\Delta t}{\mu_0\,\Delta_{w\text{-}e}(M,k)}\Big[E_z^{n+1}(i+1,M,k) - E_z^{n+1}(i,M,k)\Big]
\tag{4.8}
$$

Expressions analogous to (4.7) and (4.8) can be derived for a triangular grid
cell at the South Pole (j = 1).
For additional details of the global FDTD updating equations, including those
for merging of cells, TE field components, and so forth, please refer to [12].

4.4 NEW MAGNETIZED IONOSPHERIC PLASMA ALGORITHM

Original global FDTD models utilized a simplified conductivity profile ionosphere
(isotropic ionosphere) and thus solved only Maxwell’s equations in the ionosphere.
Only one global FDTD model has been published that accounts for a full 3-D
magnetized ionosphere plasma [26]. This section presents a recently developed
magnetized plasma model applied to a global FDTD grid [4] that is much simpler
to implement and is more efficient than the plasma model of [26].
The new FDTD plasma algorithm is analogous for each of the electrons,
positive ions, and negative ions in the ionosphere, so for simplicity only electrons
will be considered here. In the new formulation [4], the coupled Maxwell’s
equations and Lorentz equation of motion plasma model may be solved using an
algorithm originally used by Boris [27] to calculate the velocity of particles in
Particle-In-Cell (PIC) plasma models [28]. PIC codes track trajectories of particles
or groups of particles (“super-particles”) and solve for electrodynamic fields. By
using the Boris approach, the resulting FDTD plasma model is stable while also
reducing the memory requirements and the execution time compared to all
previous FDTD plasma formulations [4].
The presumption of the method is that the density of the particle species in the
ionosphere is known. It also assumes that the temporal variation of the density in
comparison to the temporal variation of the particle velocity is negligible; that is,
the plasma is cold and no thermal pressure is considered. In this case, the
momentum equation for each species is simplified to:

$$
\frac{\partial \mathbf{J}_j}{\partial t} + \nu_j \mathbf{J}_j
= \varepsilon_0\,\omega_{pj}^2\,\mathbf{E} + \boldsymbol{\omega}_{cj} \times \mathbf{J}_j
\tag{4.9}
$$

where $\mathbf{J}_j$ is the electric current density due to species j, the subscript j denotes the
electron or ion species (e for electrons, p for positive ions, or n for negative ions),
$\nu_j$ is the collision frequency, $\varepsilon_0$ is the electric permittivity, $\omega_{pj}$ is the plasma
frequency of species j, and $\boldsymbol{\omega}_{cj}$ is the gyro-frequency of species j. Note that for electrons
and negative ions the gyro-frequency ($\boldsymbol{\omega}_{cj}$) is negative.
Equation (4.9) is incorporated into Maxwell’s equations as:
$$
\nabla \times \mathbf{H} = \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t} + \sum_j \mathbf{J}_j + \mathbf{J}_s
\tag{4.10}
$$

$$
\nabla \times \mathbf{E} = -\mu_0 \frac{\partial \mathbf{H}}{\partial t}
\tag{4.11}
$$

where $\mathbf{J}_s$ is the external source current density. The discretization technique that is
used is based on the Yee algorithm where the transverse magnetic (TM) and the
transverse electric (TE) planes are stacked in the z-direction. The H-field
components are calculated at each half time step, that is, (n + 1/2), and the E-
fields at every integer time step, that is, (n). The FDTD form of Maxwell’s
equations for the Earth-ionosphere model is described in [12] and in Section 4.3.
Thus, the focus of this section is on the efficient computational solution of (4.9)
and its incorporation into (4.10). Equation (4.11) is not modified relative to
traditional FDTD.

4.4.1 Collisional Plasma Algorithm

The plasma may be considered collisionless or collisional. This section will
consider the more general collisional case. Under the cold plasma condition and by
assuming a known electron density, the momentum equation can be simplified as
follows:

$$
\frac{\partial \mathbf{J}_e}{\partial t} + \nu_e \mathbf{J}_e
= \varepsilon_0\,\omega_{pe}^2\,\mathbf{E} + \boldsymbol{\omega}_{ce} \times \mathbf{J}_e
\tag{4.12}
$$

The difficulty in solving (4.12) in the collisional regime is that the current
density vector is needed at time step (n+1/2), which is not yet known. To
resolve this issue, a two-step method known as the predictor-corrector method is
employed. In the first (predictor) step, the current vector at (n−1/2) is used to
predict the current density at (n+1/2). In the second (corrector) step, the predicted
current density vector from the first step is used, and all the equations are solved
again. A new current density vector is found at (n+1/2), known as the corrector
current density vector. The average of the predicted current density vector and the
corrector current density vector at (n+1/2) is used as the current density vector at
(n+1/2). The predictor-corrector method is also known as the MacCormack method;
it is second-order accurate [29, 30].
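Equation (4.12) in discrete form in the predictor step is as follows:

$$
\frac{\mathbf{J}_{e,p}^{\,n+\frac12} - \mathbf{J}_e^{\,n-\frac12}}{\Delta t}
+ \nu_e\,\mathbf{J}_e^{\,n-\frac12}
= \varepsilon_0\,\omega_{pe}^2\,\mathbf{E}^n
+ \boldsymbol{\omega}_{ce} \times \left( \frac{\mathbf{J}_{e,p}^{\,n+\frac12} + \mathbf{J}_e^{\,n-\frac12}}{2} \right)
\tag{4.13}
$$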

In (4.13) it appears that the current density components should be collocated
with the electric field component. However, in the time domain, the current
densities are solved out of sync with the electric fields, which are solved at each
integer time step, that is, n. Instead, the current densities are solved at the same
time step as the H-fields (at each half time step), that is, (n+1/2). In order to
simplify (4.13), the E-field should be incorporated into the current vector term.
We define two auxiliary current density components as follows:

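$$
\mathbf{J}^{+} = \mathbf{J}_{e,p}^{\,n+\frac12}
- \frac{\Delta t\,\varepsilon_0\,\omega_{pe}^2\,\mathbf{E}^n}{2}
+ \frac{\Delta t\,\nu_e\,\mathbf{J}_e^{\,n-\frac12}}{2}
\tag{4.14}
$$

$$
\mathbf{J}^{-} = \mathbf{J}_e^{\,n-\frac12}
+ \frac{\Delta t\,\varepsilon_0\,\omega_{pe}^2\,\mathbf{E}^n}{2}
- \frac{\Delta t\,\nu_e\,\mathbf{J}_e^{\,n-\frac12}}{2}
\tag{4.15}
$$

The cross product does not change the energy; therefore, $|\mathbf{J}^{-}| = |\mathbf{J}^{+}|$.
However, the direction of the vector is changed. Figure 4.3 demonstrates the
rotation of the current density vector around $\boldsymbol{\omega}_{ce}$, which for simplicity (only for the
figure) is assumed to be perpendicular to the current density components. The
direction of $\boldsymbol{\omega}_{ce}$ is out of the paper and opposite to the B-field.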

Figure 4.3 Rotation of the current vector around $\boldsymbol{\omega}_{ce}$.

From Figure 4.3, the angle of rotation is:

$$
\frac{\theta}{2}
= \tan^{-1}\!\left( \frac{|\mathbf{J}^{+} - \mathbf{J}^{-}|}{|\mathbf{J}^{+} + \mathbf{J}^{-}|} \right)
= \tan^{-1}\!\left( \frac{|\omega_{ce}|\,\Delta t}{2} \right)
\tag{4.16}
$$

The sampling frequency should be at least twice the electron gyro-frequency to accurately
model it. Therefore, in order to obtain results without aliasing, Δ𝑡 < 𝜋⁄𝜔𝑐𝑒 , which
means 𝜃 ≤ 115°. For smaller angles, the results are more precise. The $\mathbf{J}^{+}$ can be
found in four steps as follows [24, 25]:

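$$
\mathbf{J}_0 = \mathbf{J}^{-} \times \mathbf{t} \tag{4.17a}
$$

$$
\mathbf{J}_1 = \mathbf{J}^{-} + \mathbf{J}_0 \tag{4.17b}
$$

$$
\mathbf{J}_2 = \mathbf{J}_1 \times \mathbf{s} \tag{4.17c}
$$

$$
\mathbf{J}^{+} = \mathbf{J}^{-} + \mathbf{J}_2 \tag{4.17d}
$$

where $\mathbf{t} = -\dfrac{\boldsymbol{\omega}_{ce}}{|\omega_{ce}|}\tan\!\left(\dfrac{\theta}{2}\right)$ and
$\mathbf{s} = -\dfrac{\boldsymbol{\omega}_{ce}}{|\omega_{ce}|}\sin\theta$, so that
$\mathbf{s} = 2\mathbf{t}/(1 + |\mathbf{t}|^2)$. Note that $\boldsymbol{\omega}_{ce}$ is negative.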
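As part of the predictor-corrector method, (4.12) is discretized in the corrector
step as follows:

$$
\frac{\mathbf{J}_{e,c}^{\,n+\frac12} - \mathbf{J}_e^{\,n-\frac12}}{\Delta t}
+ \nu_e\,\mathbf{J}_{e,p}^{\,n+\frac12}
= \varepsilon_0\,\omega_{pe}^2\,\mathbf{E}^n
+ \boldsymbol{\omega}_{ce} \times \left( \frac{\mathbf{J}_{e,c}^{\,n+\frac12} + \mathbf{J}_e^{\,n-\frac12}}{2} \right)
\tag{4.18}
$$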

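The auxiliary current density vectors are then defined as:

$$
\mathbf{J}^{+} = \mathbf{J}_{e,c}^{\,n+\frac12}
- \frac{\Delta t\,\varepsilon_0\,\omega_{pe}^2\,\mathbf{E}^n}{2}
+ \frac{\Delta t\,\nu_e\,\mathbf{J}_{e,p}^{\,n+\frac12}}{2}
\tag{4.19}
$$

$$
\mathbf{J}^{-} = \mathbf{J}_e^{\,n-\frac12}
+ \frac{\Delta t\,\varepsilon_0\,\omega_{pe}^2\,\mathbf{E}^n}{2}
- \frac{\Delta t\,\nu_e\,\mathbf{J}_{e,p}^{\,n+\frac12}}{2}
\tag{4.20}
$$

The final current vector is:

$$
\mathbf{J}_e^{\,n+\frac12} = \frac{\mathbf{J}_{e,p}^{\,n+\frac12} + \mathbf{J}_{e,c}^{\,n+\frac12}}{2}
\tag{4.21}
$$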

The maximum allowed time step that may be used depends on the electron
gyro frequency; that is, Δ𝑡 < 𝜋⁄𝜔𝑐𝑒 .
Several validation tests are performed in [4] to demonstrate the accuracy and
capability of the newly developed FDTD plasma model. Section 4.4.2 below
summarizes two example validation scenarios. Section 4.4.3 then summarizes the
performance of the new FDTD plasma model.
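
The predictor-corrector update of Section 4.4.1 can be sketched for a single grid cell in Python with NumPy. This is a minimal illustration under stated assumptions rather than the code of [4]: the function names are hypothetical, E^n is held fixed during the update, the gyro term enters as +ω_ce × J, and the rotation vectors are computed in the standard Boris form t = −ω_ce Δt/2 and s = 2t/(1 + |t|²). Recovering the current from J⁺ by inverting (4.14) or (4.19) is an inference from those definitions.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)


def boris_rotate(j_minus, omega_ce, dt):
    """Rotate j_minus about omega_ce using the four steps (4.17a)-(4.17d)."""
    t = -0.5 * dt * omega_ce            # |t| = tan(theta / 2)
    s = 2.0 * t / (1.0 + np.dot(t, t))  # |s| = sin(theta)
    j0 = np.cross(j_minus, t)           # (4.17a)
    j1 = j_minus + j0                   # (4.17b)
    j2 = np.cross(j1, s)                # (4.17c)
    return j_minus + j2                 # (4.17d)


def half_step(j_old, j_coll, e_field, omega_pe, omega_ce, nu, dt):
    """Solve (4.13) or (4.18); j_coll is the current used in the collision term."""
    kick = 0.5 * dt * (EPS0 * omega_pe ** 2 * e_field - nu * j_coll)
    j_minus = j_old + kick              # (4.15) or (4.20)
    j_plus = boris_rotate(j_minus, omega_ce, dt)
    return j_plus + kick                # invert (4.14) or (4.19)


def update_current(j_old, e_field, omega_pe, omega_ce, nu, dt):
    """Advance J_e from time step n - 1/2 to n + 1/2."""
    j_pred = half_step(j_old, j_old, e_field, omega_pe, omega_ce, nu, dt)
    j_corr = half_step(j_old, j_pred, e_field, omega_pe, omega_ce, nu, dt)
    return 0.5 * (j_pred + j_corr)      # (4.21)
```

With no electric field and no collisions, the update reduces to a pure rotation that preserves |J|, which is the property the text attributes to the cross product.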

4.4.2 Two Example Validations

4.4.2.1 High-Resolution Tests

As example validations, the propagation of an electromagnetic wave inside a small
plasma spherical waveguide is investigated. For this section, we will thus be
utilizing fully spherical coordinates to describe the field components (as opposed
to the logically Cartesian grid description of Section 4.3). These tests serve as a
high-resolution validation of the global FDTD plasma model of [4, 9, 26].
The spherical waveguide has an internal radius of 2.673 m and an external
radius of 3.6978 m. A magnetic field is considered in the south-north direction and
its strength is 𝐵0 = 0.06 T. The electron density is 𝑛𝑒 = 10^18 /m^3. The source of the
electromagnetic wave is located at 30° S and propagation toward the equator is
examined. First, propagation of a single-frequency sinusoidal wave with frequency
f = 10.34 GHz (ω = 6.5 × 10^10 rad/s) is simulated. The source creates a linearly
polarized electromagnetic plane wave polarized in the radial r-direction.
According to plasma theory, only circular polarization can propagate along the
magnetic field. The electromagnetic wave with linear polarization can be
decomposed into a left-hand and a right-hand circular polarization wave. The right-
hand circular polarization wave is known as R-wave and the left-hand circular
polarization wave is called L-wave. The velocity of the wave with left-hand
circular polarization is different from the right-hand circular polarization wave.
Because of this, the direction of polarization of the initially linearly polarized wave
rotates as the wave moves along the magnetic field. This rotation is known as
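Faraday rotation [31]. It can be shown that the rotation angle per unit distance 𝜃𝐹
may be obtained from the following expression [19]:

$$
\theta_F = \frac{\beta_{LH} - \beta_{RH}}{2}
\tag{4.22}
$$

where $\beta_{LH}$ and $\beta_{RH}$ are the wave numbers of the L-wave and R-wave, respectively,
and can be calculated as follows [31]:

$$
\beta_{LH} = \omega \sqrt{\mu_0 \varepsilon_0}\,
\sqrt{1 - \frac{\omega_{pe}^2/\omega^2}{1 + |\omega_{ce}|/\omega}}
\tag{4.23}
$$

$$
\beta_{RH} = \omega \sqrt{\mu_0 \varepsilon_0}\,
\sqrt{1 - \frac{\omega_{pe}^2/\omega^2}{1 - |\omega_{ce}|/\omega}}
\tag{4.24}
$$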

The electric field at radius 3.1858 m is recorded at increments of 10 cells from
the source towards the equator. The resolution of the grid cells at the equator and
on the internal surface of the sphere is 1 mm × 1 mm × 1 mm in the radial
direction (r-direction), North-South direction (θ-direction), and East-West
direction (ϕ-direction), respectively, and on the external surface is 1 mm ×
1.3 mm × 1.3 mm. Figure 4.4 shows the polarization of the electromagnetic wave
at each observation point. The numerical Faraday rotation can be obtained from

$$
\theta_{FN} = \frac{\tan^{-1}\!\left( E_\phi / E_r \right)}{d}
\tag{4.25}
$$

The error is calculated as follows:

$$
\mathrm{error}_F = \frac{\theta_F - \theta_{FN}}{\theta_F}
\tag{4.26}
$$

The error of the Faraday rotation angle is less than 1.7%.
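
The theoretical rotation rate used as the reference in (4.26) can be evaluated directly from the stated test parameters. The Python sketch below assumes the standard cold-plasma expressions for the L-wave and R-wave wave numbers with the magnitude of the electron gyro-frequency; the function names are hypothetical and the physical constants are standard values, not taken from the text.

```python
import math

C0 = 299792458.0            # speed of light in vacuum (m/s)
Q_E = 1.602176634e-19       # elementary charge (C)
M_E = 9.1093837015e-31      # electron mass (kg)
EPS0 = 8.8541878128e-12     # vacuum permittivity (F/m)


def plasma_frequency(n_e):
    """Electron plasma frequency (rad/s) for electron density n_e (1/m^3)."""
    return math.sqrt(n_e * Q_E ** 2 / (EPS0 * M_E))


def gyro_frequency(b0):
    """Magnitude of the electron gyro-frequency (rad/s) for field b0 (T)."""
    return Q_E * b0 / M_E


def wave_numbers(omega, omega_pe, omega_ce):
    """L-wave and R-wave wave numbers (rad/m) for propagation along B."""
    x = (omega_pe / omega) ** 2
    y = omega_ce / omega
    beta_lh = omega / C0 * math.sqrt(1.0 - x / (1.0 + y))
    beta_rh = omega / C0 * math.sqrt(1.0 - x / (1.0 - y))
    return beta_lh, beta_rh


def faraday_rotation(omega, n_e, b0):
    """Rotation angle per unit distance (rad/m), as in (4.22)."""
    beta_lh, beta_rh = wave_numbers(omega, plasma_frequency(n_e),
                                    gyro_frequency(b0))
    return 0.5 * (beta_lh - beta_rh)


def relative_error(theta_f, theta_fn):
    """Relative error between theory and simulation, as in (4.26)."""
    return abs(theta_f - theta_fn) / theta_f


if __name__ == "__main__":
    omega = 2.0 * math.pi * 10.34e9
    print(faraday_rotation(omega, 1.0e18, 0.06))
```

When comparing against (4.25), note that a plain arctangent wraps once the accumulated rotation exceeds 90°, so the angle may need unwrapping at the more distant observation points.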

Figure 4.4 Faraday rotation of 10.34-GHz electromagnetic wave propagation along the magnetic
field from 30° South toward the equator inside a small spherical waveguide with an
internal radius of 2.673 m and an external radius of 3.6978 m. Note that the electric field
is recorded at a radius of 3.18 m and between 10 and 80 mm from the source in increments
of 10 mm. ©2014 IEEE [4].

Next, using the same model, a Gaussian pulse is used for the source of the
electromagnetic wave. The source electric field is described by the following
expression:

$$
E = \exp\!\left[ -\frac{(t - 50\,\Delta t)^2}{2\,(7\,\Delta t)^2} \right]
\tag{4.27}
$$

This pulse is expected to excite the R-wave and L-wave as well as the low-frequency
whistler mode. The whistler mode is part of the R-wave dispersion relation that can
propagate at frequencies less than the electron gyro-frequency.
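
The source waveform of (4.27) is simple to generate. The short Python sketch below uses hypothetical function and parameter names; the delay of 50 time steps and the width of 7 time steps are the values appearing in (4.27).

```python
import math


def gaussian_source(t, dt, delay_steps=50, width_steps=7):
    """Gaussian pulse of (4.27), centered at delay_steps * dt."""
    return math.exp(-((t - delay_steps * dt) ** 2)
                    / (2.0 * (width_steps * dt) ** 2))


if __name__ == "__main__":
    dt = 1.5e-12  # time step used in this validation test (s)
    samples = [gaussian_source(n * dt, dt) for n in range(101)]
    print(max(samples))
```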

Figure 4.5 Time domain electric field waveform in the r-direction recorded ~40 mm from the
source along the magnetic field. ©2014 IEEE [4].

Figure 4.6 Power spectrum of the electric field in the r-direction recorded ~40 mm from the source
along the magnetic field line. ©2014 IEEE [4].

Figure 4.5 shows the time domain electric field in the r-direction; that is,
𝐸𝑟 (𝑡) , 40 cells (approximately 40 mm) from the source. The low-frequency
whistler mode arrives at the observation point at around 1.2 ps. Figure 4.6 shows
the power spectrum of the electric field corresponding to the time waveform of
Figure 4.5. The L-wave cutoff frequency, 𝜔𝐿 , the R-wave cutoff frequency, 𝜔𝑅 ,
and the whistler mode with frequency band less than the electron cyclotron
frequency (< 𝜔𝑐𝑒 ) are apparent in the figure. These results are also in very good
agreement with plasma theory and the simulation results of the previous
anisotropic model [19].
Note that in this validation test, the time step value for solving Maxwell’s
equations is chosen according to the Courant stability condition and is Δ𝑡 = 1.5 ps.
This time step value corresponds to a rotation angle 𝜃 = 0.9∘ that yields a
numerical electron gyro-frequency error of less than 0.5%. Therefore, there is no
need to use a different time step for solving the current equation compared to
Maxwell’s equations.
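
The rotation angle quoted above follows from (4.16). The Python sketch below reproduces it for the parameters of this test; the function names are hypothetical, and the error measure (the numerical rotation per step compared with the exact value ω_ce Δt) is one plausible reading of the "numerical gyro-frequency error", assumed here for illustration.

```python
import math

Q_E = 1.602176634e-19   # elementary charge (C)
M_E = 9.1093837015e-31  # electron mass (kg)


def rotation_angle(omega_ce, dt):
    """Rotation per time step (rad), from (4.16)."""
    return 2.0 * math.atan(0.5 * abs(omega_ce) * dt)


def gyro_frequency_error(omega_ce, dt):
    """Relative gap between the numerical and exact rotation per step (assumed metric)."""
    exact = abs(omega_ce) * dt
    return abs(exact - rotation_angle(omega_ce, dt)) / exact


if __name__ == "__main__":
    omega_ce = Q_E * 0.06 / M_E  # B0 = 0.06 T
    print(math.degrees(rotation_angle(omega_ce, 1.5e-12)))
    # At the aliasing limit dt = pi / omega_ce the angle approaches 115 degrees.
    print(math.degrees(rotation_angle(omega_ce, math.pi / omega_ce)))
```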
Next, propagation of a sinusoidal electromagnetic wave perpendicular to the
magnetic field is investigated. Depending upon the direction of the electric field
component with respect to the magnetic field, two types of electromagnetic waves
can propagate: (1) the ordinary mode (O-mode), for which the electric field
component of the electromagnetic wave is parallel to the background magnetic
field; and (2) the extraordinary mode (X-mode), for which the electric field
component of the electromagnetic wave is perpendicular to the magnetic field. The
O-mode wave has a cutoff frequency determined by the plasma frequency; that is,
it propagates only when 𝜔0 > 𝜔𝑝𝑒 .
To model a magnetic field perpendicular to the direction of propagation, the
direction of the background magnetic field is changed to the radial direction and a
monochromatic plane wave E-field source polarized in the radial direction is used.
Over a distance of 64 cells from the source, the medium is vacuum; thereafter, the
magnetized plasma region starts. The left panels of Figure 4.7 show the frequency
power spectrum of the electric field 40 cells (approximately 40 mm) before
reaching the magnetized plasma region and the right panels depict the frequency
power spectrum 40 cells from the boundary into the plasma for two frequencies:
the top panels having 𝜔0 = 2𝜋(10.35 × 10^9) > 𝜔𝑝𝑒 = 2𝜋(8.97 × 10^9) and the
bottom panels having 𝜔0 = 2𝜋(7.5 × 10^9) < 𝜔𝑝𝑒 .
Figure 4.7 clearly demonstrates that when the wave frequency is less than the
plasma frequency (over-dense plasma), the electromagnetic wave cannot penetrate
into the plasma and reflects back. Note that the peak of the frequency power
spectrum of Figure 4.7(d) is at the plasma frequency and the amplitude of the
mode corresponding to the injected plane wave is approximately 1/600 of the
amplitude of the wave before reaching the plasma boundary.
The above test is repeated for different wave and plasma frequencies and it is
found that in addition to the electron cyclotron frequency, the plasma frequency
can restrict the maximum allowed time step for some modeling scenarios.
Empirically it is determined that the maximum allowed time step should meet the
condition $\sqrt{1 - (\omega_{pe}\,dt/2)^2} > 0.9$, implying $dt < 0.87/\omega_{pe}$. Unfortunately this restriction
on the time step value is universal and should be applied to Maxwell’s equations as
well as the current equation solver. In other words, the maximum allowed time
step value is always either the Courant condition or $dt < 0.87/\omega_{pe}$, whichever is
smaller.
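
The time-step selection described above can be sketched in Python as follows. The function names are hypothetical, and the Courant limit is evaluated with the standard 3-D Cartesian expression for a cell of size dx × dy × dz, which is an assumption made for illustration rather than the exact limit of the spherical grid.

```python
import math

C0 = 299792458.0  # speed of light in vacuum (m/s)


def courant_limit(dx, dy, dz):
    """Standard 3-D Cartesian Courant limit (s) for the given cell size (m)."""
    return 1.0 / (C0 * math.sqrt(dx ** -2 + dy ** -2 + dz ** -2))


def plasma_limit(omega_pe):
    """Empirical limit dt < 0.87 / omega_pe."""
    return 0.87 / omega_pe


def max_time_step(dx, dy, dz, omega_pe):
    """Smaller of the Courant limit and the plasma-frequency limit."""
    return min(courant_limit(dx, dy, dz), plasma_limit(omega_pe))


if __name__ == "__main__":
    omega_pe = 2.0 * math.pi * 8.97e9  # plasma frequency of the O-mode test
    print(courant_limit(1e-3, 1e-3, 1e-3), plasma_limit(omega_pe))
```

For the 1-mm cells and the plasma frequency of this test, the Courant limit is the smaller of the two under these assumptions; the plasma-frequency limit takes over only in denser plasma.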

Figure 4.7 Frequency power spectrum of the 𝐸𝑟 component of an O-mode plane wave propagating
from 30∘ S toward the equator, (a) under-dense plasma (ω0 > ωpe ) 40 mm after the
plasma boundary; (b) under-dense plasma ( ω0 > ωpe ) 40 mm before the plasma
boundary; (c) over-dense plasma (ω0 < ωpe) 40 mm after the plasma boundary; and (d)
over-dense plasma (ω0 < ωpe ) 40 mm before the plasma boundary.

4.4.2.2 Global Earth-Ionosphere Propagation Test

As a second type of validation test, extremely low-frequency (ELF) propagation
attenuation in the Earth-ionosphere system is investigated. This permits the use of
a lower-resolution global plasma model so that propagation characteristics over
larger distances may be studied. This test was also performed using the global
isotropic and the previous anisotropic ionosphere models, both of which compared
very well with previous analytical results and measurements [12, 26].
The Nyquist sampling theorem requires Δ𝑡𝑐 < 3.57 × 10^−7 s for modeling the
electron gyro-frequency without aliasing. Gyro-frequency error tests show that for
at least Δ𝑡𝑐 ≤ 1.2 × 10^−7 s, the simulation results are in good agreement with theory
[4]. The time step value for solving Maxwell’s equations according to the Courant
stability condition is Δ𝑡 = 3 × 10^−6 s, which also satisfies the condition 𝑑𝑡 <
0.87⁄𝜔𝑝𝑒 for the bottom of the ionosphere; however, this time step value is larger
than the time step value required for the current equation solver. Thus, in this test,
the time step values for the current equation and Maxwell’s equations are different.
For the current equation solver, two time-step values, Δ𝑡𝑐1 = 1.2 × 10^−7 s
and Δ𝑡𝑐2 = 6 × 10^−8 s, are separately tested and the results are compared. In order
to resolve the inconsistency of the time step values, for each cycle that Maxwell’s
equations are solved the current density vector is updated 25 (25Δ𝑡𝑐1 = Δ𝑡) and 50
(50Δ𝑡𝑐2 = Δ𝑡) times, respectively. Note that for the case of Δ𝑡𝑐1 = 1.2 × 10^−7 s,
two simulations are conducted. In the first, collisions are considered in the
ionosphere; in the second, collisions are neglected. Figure 4.8 shows the collision
frequency versus altitude that is used in the collisional simulation case. The
ionosphere is assumed to start from 80 km.
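
The sub-cycling arithmetic above can be checked with a few lines of Python. The helper names are hypothetical; the small tolerance in `substeps` only guards against floating-point round-off when the ratio of the two time steps is an exact integer.

```python
import math


def nyquist_limit(omega_ce):
    """Largest current-solver time step without aliasing: dt_c < pi / |omega_ce|."""
    return math.pi / abs(omega_ce)


def substeps(dt_maxwell, dt_current):
    """Number of current-equation updates per Maxwell time step."""
    return math.ceil(dt_maxwell / dt_current - 1e-9)


if __name__ == "__main__":
    dt = 3.0e-6  # Courant-limited Maxwell time step (s)
    for dt_c in (1.2e-7, 6.0e-8):
        print(dt_c, substeps(dt, dt_c))
```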

Figure 4.8 Profile of the collision frequency in the ionosphere. ©2014 IEEE [4].

Topographic and bathymetric data are obtained from National Oceanic and
Atmospheric Administration (NOAA) Global Relief CD-ROM. The magnetic field
is mapped onto the FDTD mesh according to the global geomagnetic field values
at 100 km as obtained from the international geomagnetic reference field (IGRF)
model. For the lithosphere, the same conductivity values of [12] are assigned
depending upon whether the space lattice point is located directly below an ocean
or within a continent. For the low-altitude atmosphere, the same exponential
conductivity profile of [22] is assumed according to [32].
Position and time-dependent density profiles of electrons and ions can be
obtained, for example, from the international reference ionosphere (IRI)
(http://iri.gsfc.nasa.gov/). However, for the general validation study performed
here, exponential profiles for the particle densities and the collision frequencies as
proposed in [33] are utilized. Note that all of these lithosphere, topography,
geomagnetic field, and ionosphere values match those used in the previous (less-
efficient) anisotropic plasma study of [26].
The current source is a 5 km-long Gaussian pulse with a 1/𝑒 full-width of
480Δ𝑡, similar to the source current waveform used in previous studies [12, 26].
The temporal center of the pulse is at 960Δ𝑡. The source current is above the
Earth’s surface at 47° W on the equator.

Figure 4.9 ELF wave attenuation for westward propagation from ¼ and ½ of the distance between
the source and the antipode. The dotted waveform is from the previous isotropic model
of [12]. ©2014 IEEE [4].

The results of the new algorithm are compared with the validated isotropic
ionosphere FDTD model [12]. Figure 4.9 shows the attenuation of the ELF wave
travelling westward from 1/4 to 1/2 of the distance to the antipode for three cases:
(1) isotropic ionosphere case; (2) collisionless anisotropic case with Δ𝑡𝑐2 = 6 ×
10^−8 s for the current equation solver; and (3) collisional anisotropic ionosphere
case with Δ𝑡𝑐1 = 1.2 × 10^−7 s for the current equation solver.
The ELF wave attenuations for all three cases are very similar. Simpson and
Taflove [12] showed that the wave attenuation obtained from the isotropic model
agrees well with analytical predictions and measurements.
Finally, the execution time of the new, anisotropic ionosphere plasma
algorithm is compared to the previous anisotropic model of [26]. The global Earth-
ionosphere system is modeled. Both simulations are run on the same machine for only
100 time steps. The execution time of the new algorithm using Δ𝑡𝑐2 = 6 × 10^−8 s
(which requires 50 iterations of the current equation solver per time step of
Maxwell’s equations) was 128 seconds (1.28 seconds per time step) in comparison
to 286 seconds (2.86 seconds per time step) for the previous anisotropic algorithm.
Therefore, the new algorithm reduces the execution time by about 55% relative to
the previous one for this numerical test.

4.4.3 Summary of Performance

In summary, the advantages of the new FDTD plasma model [4] over the previous,
stable 3-D magnetized ionosphere plasma formulation of [19, 26] are as follows:

• It permits the use of two different time steps for solving the current equation
versus Maxwell’s equations. The previous anisotropic model did not include
this capability, and so for some cases the time-step requirements of the
current density solutions could drastically slow down the solutions to
Maxwell’s equations. As such, obtaining solutions for cases involving high
collision frequencies was nearly impossible due to the long computational
time required.
• It is faster than the previous model. Depending upon the
size of the time step needed to solve the current equation, the new algorithm
is more than 50 percent faster than the previous version.
• Implementation of the algorithm is much simpler and no matrix equation
must be solved.
• The memory requirement is drastically less than for the previous formulation
(3 additional real numbers are stored per cell relative to traditional FDTD
compared to 9 additional real numbers stored per cell for the previous
plasma formulation; also it does not require storage or recalculation of
coefficient matrices of size at least 6 × 6 at every grid cell).

The only disadvantage of the new algorithm is that for simulating wave
propagation in dense plasma, the maximum stable time step can be smaller than the
Courant limit. The plasma frequency puts an additional restriction on the
maximum allowable time step value. Therefore, either the Courant condition or
𝑑𝑡 < 0.87⁄𝜔𝑝𝑒 , whichever is smaller, should be chosen for the time step for
Maxwell’s equations.

4.5 STOCHASTIC FDTD (S-FDTD)

4.5.1 Overview

A second recent advancement for time-domain modeling of ionospheric
electromagnetic wave propagation is the development of a new stochastic FDTD
plasma model that solves for the mean as well as the variance of the electromagnetic
fields due to uncertainties or variances in the ionosphere composition [7]. The variability of
the ionosphere renders many propagation problems too complex to be solved using
a deterministic formulation. The structure of the ionosphere can depend not only
on the altitude, time of day, and season, but also on the latitude, longitude, sunspot
cycle, and occurrence of space weather events. A useful approach to such a highly
complex problem is to consider it as a random medium problem. Analyzing how
these uncertainties affect the electromagnetic fields (EMFs) requires methods for
uncertainty quantification.
Numerical electromagnetic techniques, however, typically use only average
(mean) values of the constitutive parameters of the materials and then solve for
expected (mean) electric and magnetic fields. The Monte Carlo method is a well-
established and widely-used brute force technique for evaluating random medium
problems via multiple realizations [5]. Depending on the nature of the statistical
correlation, a random medium problem may require tens or hundreds of thousands
of realizations. This yields an extremely inefficient brute force approach,
particularly for 2-D and 3-D problems, and therefore is rarely used in
electromagnetic modeling.
Several techniques have been proposed recently to solve uncertainty
quantification problems for Maxwell’s equations in FDTD models. Stochastic
FDTD (S-FDTD) is an efficient formulation that runs the ensemble averages in a
single realization scheme [5, 6]. Reference [5] provides a direct estimate of both
the mean and variance of the EMFs within a variable ionosphere at every point in
space and time. The advantage of this method is that it requires only about twice as
much computer simulation time and memory as a traditional FDTD simulation
regardless of the number of random variables. However, its limitation is that it can
only bound the field variances according to a best-estimate approximation for the
cross correlation coefficients.
The approach in [6] proposes a single-realization scheme to obtain the
quantities of interest that are the ensemble average of scattered fields, which makes
use of an iterative technique to reformulate multiplicative noise into an additive
noise. However, the restriction of this algorithm is that it must meet the condition
of a weak scattering random medium, where deviation from the mean values of
electrical properties is small.
Another approach [34, 35] makes use of the generalized polynomial chaos
(gPC) method, which is an extension of the homogeneous chaos introduced by
Wiener in 1938. The gPC expands the time-domain electric and magnetic fields in
terms of orthogonal polynomial basis functions of the uncertain variables. The
infinite sum of polynomial chaos expansion is truncated to a finite number of terms
P of orthogonal basis functions. The number of terms P is given by:

$$
P + 1 = \frac{(n + d)!}{n!\,d!}
\tag{4.28}
$$

where d is the highest polynomial order in the expansion and n is the number of
random variables. It follows that P grows very quickly with the dimension and the
order of the decomposition. In general, the gPC method increases memory
consumption by a factor P + 1 and the simulation time is proportional to (P + 1)^2.
The gPC results typically converge significantly faster than the Monte Carlo
method in a number of applications. However, the method has an inherent
limitation: it can handle only a limited number of uncertain inputs. For large
numbers of random variables, polynomial chaos becomes very computationally
expensive and Monte Carlo methods are typically more feasible.
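
The growth described by (4.28) is easy to tabulate. The Python sketch below uses the standard total-degree count for a polynomial chaos basis; the function names are hypothetical, and the cost factors simply restate the P + 1 memory and (P + 1)^2 run-time scalings quoted above.

```python
import math


def gpc_terms(n, d):
    """Number of basis functions P + 1 for n random variables and order d."""
    return math.comb(n + d, d)


def gpc_cost_factors(n, d):
    """Memory and run-time scaling relative to one deterministic FDTD run."""
    terms = gpc_terms(n, d)
    return terms, terms ** 2


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, [gpc_terms(n, d) for d in (1, 2, 3, 4)])
```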
In summary, each of the above approaches has its own strengths and
limitations. Given the fact that the ionosphere content can vary even up to 100% or
more, the S-FDTD method proposed in [5] and the gPC method are good
candidates for electromagnetic wave propagation modeling in ionosphere plasma.
S-FDTD was recently extended to electromagnetic wave propagation in a
magnetized ionospheric plasma by extending the stochastic variables to both
Maxwell’s equations and the Lorentz equation of motion [7]. The electric fields,
magnetic fields, current densities, electron/ion densities and collision frequencies
are all treated as random variables with their own statistical variation. The
resulting mean and variance calculations of the EMFs and current densities provide
new capabilities, such as the ability to determine the confidence level that a
communications/remote sensing/radar system will operate as expected under
abnormal ionospheric conditions. It may also be useful in a wide variety of
geophysical studies.
In [7] an S-FDTD method is developed for the previous (less efficient)
magnetized plasma algorithm of [19]. In this algorithm, the governing stochastic
equations take the form of a large, complex matrix. As a result, the complexity of
the physical model presents a computational challenge. Aside from the S-FDTD
method, if the gPC method is applied in order to avoid the approximation for the
cross correlation coefficients, the derivation of the explicit equations for the gPC
coefficients can be very difficult, or even impossible. Recently, however, a more
efficient magnetized plasma model was developed [4] as presented in Section 4.4.
In this new algorithm, since no matrix equation must be solved and all equations
are explicit, the gPC method may potentially be applied to electromagnetic wave
propagation in the ionosphere. Therefore, the gPC simulation should be derived as
part of future work and its results compared to the S-FDTD modeling and Monte
Carlo results for validation of the algorithm. It is possible that a hybrid method will
be needed to achieve optimal and efficient results. The ultimate objective is to
develop a stochastic optimization FDTD-based algorithm that is well suited for
large uncertainty quantification of the ionosphere, so that the variability of the
electromagnetic wave propagation is well under control and understood.
In the remainder of this section, general guidelines are provided for extending
the S-FDTD approach to the more efficient magnetized plasma model of Section
4.4 and [4]. The general approach is analogous to that of [7].

4.5.2 Mean Field Equations

Using the delta method [36], the average (or expected) EMFs and current density
values may be found by solving Maxwell’s equations and the current equation
while using mean (average) values of the variables [7]. For the S-FDTD
magnetized cold plasma model, the equations for the mean values of the EMFs and
current densities are of the same form as for those of the regular 3-D FDTD
magnetized cold plasma model. Thus, the mean EMF and current density values
are found by using the mean plasma frequency of ωPe, or equivalently, the mean of
electron density ne.
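
As a simple illustration of the delta method, the sketch below propagates a mean and a standard deviation of the electron density through the standard cold-plasma relation ωpe = sqrt(ne e^2/(ε0 me)). The density values (a mean of 10^11 m^-3 with a 10% standard deviation) and the function names are hypothetical and chosen only for illustration; this is not the S-FDTD update itself.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
EPS0 = 8.8541878128e-12      # vacuum permittivity (F/m)
M_E = 9.1093837015e-31       # electron mass (kg)


def plasma_frequency(n_e: float) -> float:
    """Electron plasma angular frequency (rad/s) for density n_e (m^-3)."""
    return math.sqrt(n_e * E_CHARGE**2 / (EPS0 * M_E))


def delta_method_stats(mean_ne: float, std_ne: float) -> tuple:
    """First-order delta method: evaluate at the mean and scale the spread.

    mean[w] ~ w(mean_ne), std[w] ~ |dw/dn_e| * std_ne, with dw/dn_e = w / (2 n_e).
    """
    mean_w = plasma_frequency(mean_ne)
    std_w = abs(mean_w / (2.0 * mean_ne)) * std_ne
    return mean_w, std_w


# Hypothetical ionospheric values, for illustration only.
mean_w, std_w = delta_method_stats(mean_ne=1.0e11, std_ne=1.0e10)
print(f"mean plasma frequency: {mean_w / (2 * math.pi) / 1e6:.3f} MHz")
print(f"relative standard deviation: {std_w / mean_w:.3f}")
```

Because the plasma frequency varies as the square root of the density, a 10% spread in density maps to roughly a 5% spread in plasma frequency at first order.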

4.5.3 Variance Field Equations

The variance fields may be derived by using the delta method and the statistical
values. When solving only Maxwell’s equations, the variance field equations may
be solved separately from the mean field equations no matter the dimensionality of
the problem [5]. However, in the 3-D magnetized cold plasma model, the
momentum equation (4.9) is coupled to Maxwell’s equations (4.10) and (4.11), which leads
to a complicated but linear system. As a result, the electric field and current density
variances must be computed simultaneously. When variance equations are derived,
covariances are needed for the E and H fields and current density Je in both time
and space. Equation (4.9) also relates the current density to the collision frequency
and the electric field to the plasma frequency of the ionosphere, resulting in
additional covariance terms between the current density and collision frequency,
and the electric field and plasma frequency. For the S-FDTD method, a critical
step is to approximate these correlation coefficients, which control the accuracy of
the algorithm.
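
The role of the correlation coefficient can be illustrated with the generic variance of a weighted sum of two correlated quantities. The sketch below is not the S-FDTD variance update of [5] or [7]; it only shows the identity on which such approximations rest, and the function name and the numerical values are hypothetical.

```python
import math


def variance_of_sum(a: float, sigma_x: float,
                    b: float, sigma_z: float, rho: float) -> float:
    """Variance of y = a*x + b*z for correlated x and z.

    Var[y] = a^2 sx^2 + b^2 sz^2 + 2 a b rho sx sz,
    where rho is the correlation coefficient between x and z.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return ((a * sigma_x) ** 2 + (b * sigma_z) ** 2
            + 2.0 * a * b * rho * sigma_x * sigma_z)


# The same two standard deviations give different results as rho changes.
for rho in (0.0, 0.5, 1.0):
    std_y = math.sqrt(variance_of_sum(1.0, 0.2, 1.0, 0.1, rho))
    print(f"rho = {rho:.1f}: standard deviation of y = {std_y:.4f}")
```

In this toy case the standard deviation of the sum ranges from about 0.22 for uncorrelated terms to 0.30 for fully correlated terms, which indicates why the choice of correlation coefficients affects the accuracy of the computed variances.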
Figure 4.10 shows a diagram of the iteration process for each time step of the
S-FDTD method. What is changed from regular FDTD updating is the addition of
the calculation of the variances after the mean values are obtained. Therefore, the
running time as well as the memory required for S-FDTD is roughly double that
needed for traditional FDTD (and double that for the regular FDTD plasma model).
However, we note that even with this doubling, S-FDTD is in general drastically
faster than the brute-force Monte Carlo approach.
Since both the mean fields and their variances behave like waves, both require
boundary conditions. Thus, an absorbing boundary condition is needed for the E, H,
and Je mean values as well as for their variances. Mur’s boundary conditions as
used in [7] are an appropriate option because the boundary condition provides
good absorption regardless of the magnetic field direction [37]. The perfectly
matched layer is more complex to implement and also is only effective if the
magnetic field is homogeneous in the vicinity of the boundary region (see for
example, [38]).

Figure 4.10 S-FDTD flowchart. (Steps in each time step: source; mean H; standard
deviation σ(H); mean E, J; standard deviation σ(E) and σ(J); absorbing boundary
conditions for E, J, σ(E), and σ(J); new time step? If no, stop.)

4.6 INPUT TO FDTD/S-FDTD EARTH-PLASMA IONOSPHERE MODELS

Since FDTD models may account for highly detailed geometries and materials, it
is useful to populate the FDTD grid with realistic data. The Earth’s topography and
bathymetry data may be obtained, for example, from the National Oceanic and
Atmospheric Administration (NOAA) National Geophysical Data Center (NGDC).
The Earth’s magnetic field data and its direction and amplitude variation with
position may be obtained from the International Geomagnetic Reference Field
(IGRF).
For an isotropic conductivity profile ionosphere to be used in lower-frequency
electromagnetic propagation models, relatively simple profiles based on
measurements and analytical calculations may be used, such as an exponential
conductivity profile [32] or a knee profile [39]. To model an anisotropic
magnetized plasma ionosphere to be used in higher-frequency electromagnetic
propagation models, electron and ion densities and collision frequencies and their
variation with time and position may be obtained from the International Reference
Ionosphere (IRI) and other sources. IRI has recently been expanded to include
stochastic information about the ionosphere composition [40].

4.7 CONCLUSIONS

This chapter provided an overview of two recent FDTD modeling advances for
electromagnetic wave propagation in the ionosphere: (1) a new, efficient 3-D
magnetized ionospheric plasma model; and (2) a stochastic FDTD model of
ionospheric plasma. The combination of these models provides the capability to
model high frequency electromagnetic wave propagation over longer distances
than previously possible, while also solving for not only mean but also variance
electric and magnetic fields due to uncertainties or variances in the ionosphere.
Applications of these models range from propagation studies [12, 24], remote
sensing of ionospheric anomalies [41] and underground oil fields [25, 42], to
modeling of Schumann resonances [39], hypothetical ELF earthquake precursors
[43], space weather [44], and communications.

REFERENCES

[1] K. Yee, “Numerical Solution of Initial Boundary Value Problems Involving Maxwell’s
Equations in Isotropic Media,” IEEE Transactions on Antennas and Propagation, Vol. 14, No. 3,
pp. 302–307, 1966.
[2] A. Taflove and S. Hagness, Computational Electrodynamics: The Finite-Difference Time-
Domain (FDTD) Method, 3rd ed., Norwood, MA: Artech House, 2005.
[3] B. Nguyen, A. Samimi, and J. Simpson, “Recent Advances in FDTD Modeling of
Electromagnetic Wave Propagation in the Ionosphere,” Applied Computational Electromagnetics
Society Journal, Vol. 29, No. 12, pp. 1003-1012, 2014.
[4] A. Samimi and J. Simpson, “An Efficient 3-D FDTD Model of Electromagnetic Wave
Propagation in Magnetized Plasma,” IEEE Transactions on Antennas and Propagation, Vol. 63,
No. 1, pp. 269–279, 2015.
[5] S. Smith and C. Furse, “Stochastic FDTD for Analysis of Statistical Variation in Electromagnetic
Fields,” IEEE Transactions on Antennas and Propagation, Vol. 60, No. 7, pp. 3343–3350, 2012.
[6] T. Tan, A. Taflove, and V. Backman, “Single Realization Stochastic FDTD for Weak Scattering
Waves in Biological Random Media,” IEEE Transactions on Antennas and Propagation, Vol. 61,
pp. 818–828, 2013.
[7] B. Nguyen, C. Furse, and J. Simpson, “A 3-D Stochastic FDTD Model of Electromagnetic Wave
Propagation in Magnetized Ionosphere Plasma,” IEEE Transactions on Antennas and
Propagation, Vol. 63, No. 1, pp. 304–313, 2015.
[8] S. Aune, “Comparison of Ray Tracing through Ionospheric Models,” Master’s thesis,
Department of the Air Force, Air Force Institute of Technology, Wright-Patterson Air Force Base,
Ohio, March 2006.
[9] K. Yeh and C. Liu, “Radio Wave Scintillations in the Ionosphere,” Proceedings of the IEEE, Vol.
70, No. 4, pp. 324–360, 1982.
[10] C. Rino, “A Power Law Phase Screen Model for Ionospheric Scintillation: 1. Weak Scatter,”
Radio Science, Vol. 14, No. 6, pp. 1135–1145, 1979.
[11] V. Gherm, N. Zernov, and H. Strangeways, “Propagation Model for Transionospheric
Fluctuating Paths of Propagation: Simulator of the Transionospheric Channel,” Radio Science,
Vol. 40, pp. RS1003:1–9, 2005.
[12] J. Simpson and A. Taflove, “Three-Dimensional FDTD Modeling of Impulsive ELF Antipodal
Propagation and Schumann Resonance of the Earth-Sphere,” IEEE Transactions on Antennas
and Propagation, Vol. 52, No. 2, pp. 443–451, 2004.
[13] J. Simpson and A. Taflove, “A Review of Progress in FDTD Maxwell’s Equations Modeling of
Impulsive Sub-Ionospheric Propagation Below 300 kHz,” IEEE Transactions on Antennas and
Propagation, Vol. 55, No. 6, pp. 1582–1590, 2007.
[14] J. Simpson, “Current and Future Applications of 3-D global Earth-Ionosphere Models Based on
the Full-Vector Maxwell’s Equations FDTD Method,” Surveys in Geophysics, Vol. 30, No. 2, pp.
105–130, 2009.
[15] W. Hamilton, “Third Supplement to Treatise on Geometrical Optics,” in The Mathematical
Papers of W. R. Hamilton, edited by A. Conway, 4 Vols., Cambridge 1931–2000.
[16] J. Haselgrove, “Ray Theory and a New Method for Ray Tracing,” Report of Conference on the
Physics of the Ionosphere, London Physical Society, pp. 355–364, 1954.
[17] C. Haselgrove and J. Haselgrove, “Twisted Ray Paths in the Ionosphere,” Proceedings of the
Physical Society, Vol. 75, pp. 357–361, 1960.
[18] R. Jones and J. Stephenson, A Versatile Three-Dimensional Ray Tracing Computer Program for
Radio Waves in the Ionosphere, Institute for Telecommunication Sciences, Office of
Telecommunications, U.S. Dept. of Commerce. Springfield: National Technical Information
Service, 1975.
[19] Y. Yu and J. Simpson, “An E-J Collocated 3-D FDTD Model of Electromagnetic Wave
Propagation in Magnetized Cold Plasma,” IEEE Transactions on Antennas and Propagation, Vol.
58, No. 2, pp. 469–478, 2010.
[20] W. Hu and S. Cummer, “An FDTD Model for Low and High Altitude Lightning-Generated EM
Fields,” IEEE Transactions on Antennas and Propagation, Vol. 54, pp. 1513–1522, 2006.
[21] J. Lee and D. Kalluri, “Three Dimensional FDTD Simulation of Electromagnetic Wave
Transformation in a Dynamic Inhomogeneous Magnetized Plasma,” IEEE Transactions on
Antennas and Propagation, Vol. 47, pp. 1148–1151, 1999.
[22] M. Thèvenot, J. Bérenger, T. Monedière, and F. Jecko, “An FDTD Scheme for the Computation
of VLF-LF Propagation in the Anisotropic Earth-Ionosphere Waveguide,” Annales des
Télécommunications, Vol. 54, pp. 297–310, 1999.
[23] B. Chaudhury and S. Chaturvedi, “Comparison of Wave Propagation Studies in Plasmas using
Three-Dimensional Finite-Difference Time-Domain and Ray Tracing Methods,” Physics of
Plasmas, Vol. 13, p. 123302, 2006.
[24] M. Hayakawa and T. Otsuyama, “FDTD Analysis of ELF Wave Propagation in Homogeneous
Subionospheric Waveguide Models,” Applied Computational Electromagnetics Society Journal,
Vol. 17, No. 3, pp. 239–244, 2004.
[25] J. Simpson, R. Heikes, and A. Taflove, “FDTD Modeling of a Novel ELF Radar for Major Oil
Deposits using a Three-Dimensional Geodesic Grid of the Earth-Ionosphere Waveguide,” IEEE
Transactions on Antennas and Propagation, Vol. 54, No. 6, pp. 1734–1741, 2006.
[26] Y. Yu, J. Niu, and J. Simpson, “A 3-D Global Earth-Ionosphere FDTD Model Including an
Anisotropic Magnetized Plasma Ionosphere,” IEEE Transactions on Antennas and Propagation,
Vol. 60, No. 7,
pp. 3246–3256, 2012.
[27] J. Boris, The Acceleration Calculation from a Scalar Potential, Plasma Physics Laboratory,
Princeton University, MATT-152, March 1970.
[28] C. Birdsall and A. Langdon, Plasma Physics Via Computer Simulation, Institute of Physics, New
York, 1991.
[29] G. Sod, “A Survey of Several Finite Difference Methods for Systems of Nonlinear Hyperbolic
Conservation Laws,” Journal of Computational Physics, Vol. 27, No. 1, pp. 1–31, 1978.
[30] R. Garcia and R. Kahawita, “Numerical Solution of the St. Venant Equations with the
MacCormack Finite-Difference Scheme,” International Journal for Numerical Methods in Fluids,
Vol. 6, pp. 259–274, 1986.
[31] F. Chen, Introduction to Plasma Physics and Controlled Fusion, Plasma Physics, 2nd ed.,
Springer, 1984.
[32] P. Bannister, “ELF Propagation Update,” IEEE J. Ocean. Eng., Vol. OE-9, No. 3, pp. 179–188,
1984.
[33] J. Wait and K. Spies, Characteristics of the Earth-Ionosphere Waveguide for VLF Radio Waves.
Boulder, CO: National Bureau of Standards, 1964.
[34] R. Edwards, A. Marvin, and S. Porter, “Uncertainty Analyses in the Finite-Difference Time-
Domain Method,” IEEE Transactions on Electromagnetic Compatibility, Vol. 52, No. 1, pp. 155–
163, 2010.
[35] A. Austin and C. Sarris, “Efficient Analysis of Geometrical Uncertainty in the FDTD Method
Using Polynomial Chaos With Application to Microwave Circuits,” IEEE Transactions on
Microwave Theory and Techniques, Vol. 61, No. 12, pp. 4293–4301, 2013.
[36] G. Casella and R. L. Berger, Statistical Inference, 2nd ed., Singapore: Thomson Learning, 2002.
[37] Y. Yu and J. Simpson, “A Magnetic Field-Independent Absorbing Boundary Condition for the
FDTD E-J Collocated Magnetized Cold Plasma Algorithm,” IEEE Antennas and Wireless
Propagation Letters, Vol. 10, pp. 294–297, 2011.
[38] W. Hu and S. Cummer, “The Nearly Perfectly Matched Layer is a Perfectly Matched Layer,”
IEEE Antennas and Wireless Propagation Letters, Vol. 3, pp. 137–140, 2004.
[39] H. Yang and V. Pasko, “Three-Dimensional Finite-Difference Time-Domain Modeling of the
Earth-Ionosphere Cavity Resonances,” Geophysical Research Letters, Vol. 32, No. L03114,
2005.
[40] O. Oladipo, J. Adeniyi, S. Radicella, and I. Adimula, “Variability of the Ionospheric Electron
Density at Fixed Heights and Validation of IRI-2007 Profiles Prediction at Ilorin,” Advances in
Space Research, Vol. 47, No. 3, pp. 496–505, 2011.
[41] J. Simpson and A. Taflove, “ELF Radar System Proposed for Localized D-Region Ionospheric
Anomalies,” IEEE Geoscience and Remote Sensing Letters, Vol. 3, No. 4, pp. 500–503, 2006.
[42] J. Simpson and A. Taflove, “A Novel ELF Radar for Major Oil Deposits,” IEEE Geoscience and
Remote Sensing Letters, Vol. 3, No. 1, pp. 36–39, 2006.
[43] J. Simpson and A. Taflove, “Electrokinetic Effect of the Loma Prieta Earthquake Calculated by
an Entire-Earth FDTD Solution of Maxwell's Equations,” Geophysical Research Letters, Vol. 32,
No. L09302, 2005.
[44] J. Simpson, “On the Possibility of High-Level Transient Coronal Mass Ejection-Induced
Ionospheric Current Coupling to Electric Power Grids,” Journal of Geophysical Research:
Space Physics, Vol. 116, No. A11308, 2011.
Chapter 5
Phi Coprocessor Acceleration Techniques in
Computational Electromagnetic Methods
Wenhua Yu, Xiaoling Yang, and Lei Zhao

Computational electromagnetics has benefited greatly from the rapid development
of computer science and technology over the past decades. Today, a regular
computer can easily solve large problems that were considered impossible ten
years ago. In addition, many efforts have been made to take advantage of GPUs
to further accelerate electromagnetic simulations in both academic research and
engineering applications. The GPU acceleration technique was discussed in
Chapter 3. Compared to regular CPUs, GPUs offer obvious advantages in
simulation time if a problem is suitable for GPUs. In fact, the CPU
manufacturers Intel and AMD have embedded vector units in their CPUs for
well over a decade to enhance CPU performance in multimedia applications. One
may partially use this feature, even unintentionally, through particular
optimization options in C++/FORTRAN code compilation. To achieve the best
performance from the vector units embedded in regular CPUs, it is necessary to
understand the streaming SIMD (single instruction, multiple data) extensions
(SSE) instruction set and the advanced vector extensions (AVX) instruction set [1–14].
Both GPUs and vector units have been employed to solve many practical problems
efficiently with their unique features.
Intel released the Intel Xeon Phi coprocessor in 2012 as a new numerical
computation tool to enhance existing Intel CPUs [15–19]. The Xeon Phi
coprocessors offer additional power-efficient scaling, vector support, and fast
local memory bandwidth, while maintaining the programmability and support
associated with the Intel Xeon processors. The Xeon Phi coprocessor provides
performance comparable to that of the NVIDIA Tesla GPU (www.nvidia.com) in both
memory bandwidth and peak performance. However, programming on the Xeon Phi
coprocessor platforms is much easier than on the GPU platforms. In addition, the
Xeon Phi coprocessors share the same source code as Intel Xeon E3 and E5 CPUs
with one extra compilation option, -mmic.
In this chapter, we introduce the architecture of the Phi coprocessor,
programming techniques, software and hardware environment, and acceleration
techniques in computational electromagnetic methods. We also present code
optimization techniques to improve numerical simulations based on the Phi
coprocessor platform by investigating and improving memory bottlenecks. The
benchmarks and representative examples will be employed to verify the parallel
FDTD code and matrix multiplications that often occur in the MoM and the FEM.

5.1 INTRODUCTION

The Xeon Phi coprocessor is a general numerical acceleration platform, as shown
in Figure 5.1(a), which can be used to accelerate various computational
electromagnetic methods such as FDTD [20–22], FEM [23], MoM [24], and so on.
One Xeon Phi coprocessor includes up to 61 compute cores, four hardware threads
per core, two pipelines, 512-bit SIMD instructions, 32 512-bit wide vector registers
that can each hold 16 single-precision or 8 double-precision floating-point
numbers, up to 16 GB of 16-channel high-performance GDDR5 RAM, and up to 352 GB/s
of memory bandwidth. The types of Xeon Phi coprocessor and their specifications are
summarized in Table 5.1. One computer can hold a maximum of four Phi
coprocessors through its PCI Express ×16 slots, as shown in Figure 5.1(b).

Table 5.1
Xeon Phi Coprocessor Types and Technical Specifications

                              Phi 3120A/P        Phi 5120D            Phi 5110P   Phi 7120X (without
                              (active/passive)   (without heatsink)               heatsink)/P
GDDR5 RAM (GB)                6                  8                    8           16
Clock speed (GHz)             1.1                1.053                1.053       1.238
Cache (MB)                    28.5               30                   30          30.5
Number of cores               57                 60                   60          61
Number of hardware threads    4                  4                    4           4
RAM bandwidth (GB/s)          240                352                  320         352
Memory channels               12                 16                   16          16
Max TDP (W)                   300                245                  225         300

Figure 5.1 Xeon Phi coprocessors (www.intel.com). (a) Appearance of active Phi coprocessors. (b)
More than one Phi coprocessor (four Phi cards) installed on one computer.

One Xeon Phi coprocessor card can be handled as one independent node in a
Linux cluster. Each card has an on-board flash device that loads the coprocessor
OS on boot, and it can be monitored by the optional cluster monitoring software
Ganglia (http://ganglia.info/) [25]. Xeon Phi coprocessor programming uses the
Many Integrated Core (MIC) instructions. The same source code can be compiled
for the host CPU and the Xeon Phi coprocessor with one different compilation
option, which is essentially different from a GPU, as shown in Figure 5.2.

Figure 5.2 Possible relationship between CPU and Xeon Phi coprocessor. (Four models,
Model 1 to Model 4, are shown, each producing results; Model 1 uses the CPU only, and
the other models involve the Xeon Phi coprocessor.)

5.2 ENVIRONMENT REQUIREMENTS AND SETTINGS

The Xeon Phi coprocessor can be used as an independent processor and has its
own cores, cache, and memory. It is mounted on a workstation through a PCI
Express ×16 slot and can either run an application code independently or cooperate
with the host CPU on an application. In this section, we introduce
the environment and settings for the Xeon Phi coprocessor.

5.2.1 Hardware Configuration

Intel Xeon E3 and E5 series CPUs are preferred on the Xeon Phi platform since
they can share the same source code as the Xeon Phi coprocessor. Not all
motherboards support the Xeon Phi coprocessor: it requires a motherboard whose
BIOS provides the large Base Address Registers (BAR) option (MMIO addressing
greater than 4 GB). Unless otherwise noted, expressions in a Courier font in
this chapter represent Linux commands, special terminology, or special phrases.
Check in the motherboard manual whether the motherboard includes the
PCIe/PCI/PnP Configuration option. The BIOS should offer an option Above 4G
Decoding (available if the system supports 64-bit PCI decoding); its Enabled
setting allows a 64-bit capable PCI device to be decoded in the address space
above 4G.
Any motherboard with at least one PCI Express ×16 slot can be used, provided
that the following BIOS option is active. When turning on the workstation, press
the Del key to enter the BIOS setting window. In the BIOS setting window, select
the Advanced option, and then select the PCIE/PCI/PnP Configuration option in
the advanced window. The Above 4G Decoding option must be active; change it from
Disabled to Enabled. If there is no Above 4G Decoding option, or it is inactive
in the PCIE/PCI/PnP Configuration window, this motherboard does not support the
Xeon Phi coprocessor.
The terminology may differ between motherboards. For example, the hot key to
enter the BIOS settings screen may be the Del key or one of the F keys such as
F2. The PCIE/PCI/PnP Configuration option may be called Advanced settings on
other motherboards. The Memory Mapped I/O above 4GB [Enable] option may appear
on other motherboards as PCI 64bit Resource Handling Above 4G Decoding [Enabled].
One might also see something referring to MMIO above 4G or even large BAR
support. A typical example (say, a Supermicro product with Intel Xeon CPUs)
shows the hardware components involved in a Xeon Phi workstation.

1. Motherboard
• Supermicro X9DRG-QF GPU-ready Server Board
• Dual Socket R (LGA 2011): dual Intel Xeon E5-2600 or E5-2600v2 CPUs
• 16 DIMM: up to 1 TB ECC DDR3 memory, up to 1,866 MHz
• 4 PCI-E 3.0 ×16 (double-width): four Intel Xeon Phi coprocessor cards
2. CPUs
• Intel Xeon E5-2640 v2 Processor
• Ivy Bridge-EP, eight cores, sixteen threads, 2 GHz, 20 MB cache, 7.2 GT/s, 95 W
• Memory Types: DDR3-800/1066/1333/1600
3. RAM
• 16 GB PC3-12800 DDR3 1,600 MHz Registered ECC Dual-Rank 1.35 V × 4
4. Xeon Phi Coprocessor
• Intel Xeon Phi Coprocessor 3120A, 1.1 GHz, 28.5 MB Cache, 300 W
• Number of Cores: 57
• Memory: 6 GB GDDR5
5. Chassis
• Supermicro SC747TQ-R1620B Tower/4U Chassis
• Power Supply: 1,620 W Redundant
6. Hard disk
• 1 TB Seagate ST1000DM003 SATA 6.0 Gb/s 7,200 rpm 64 MB 3.5 inches
7. DVD Writer for operating system installation
• Samsung SH-224DB/BEBE SATA 24x DVD Rewriter

5.2.2 Software Configuration

No matter what operating system is installed on the host computer, the Xeon Phi
coprocessor itself runs only a micro Linux system in order to obtain better code
performance. In this section, we introduce how to install the Linux operating
system on a workstation with Xeon Phi coprocessors, using CentOS 6.5 (compatible
with Red Hat Enterprise 6.5, free download from http://www.centos.org) as an
example.

1. Prepare Linux DVD
• Download the Linux installation image in the iso file format (CentOS 6.5 in
this example, which is compatible with Red Hat Enterprise 6.5).
• Check whether the DVD burner software can burn an iso image. If not, you can
download a free burner software such as Free ISO Burner
(http://www.freeisoburner.com/), CDburnerXP (http://cdburnerxp.se/en/home),
All Free ISO Burner (http://www.allfreevideoconverter.com/freeisoburner/index.html),
or Active@ISO Burner (http://www.ntfs.com/iso_burner_free.htm).
• Burn a Linux operating system DVD from the iso file.
2. Install Linux operating system
• Insert the Linux DVD (CentOS 6.5) into the DVD drive.
• Select the Install or upgrade an existing system option
to start the Linux installation.
• Use the space key in the Disc found window to select and press the
Skip button.
• In the CentOS 6 installation interface, click the Next button to continue.
• Select the language English option, click the Next button, and select
the keyboard (US English) option.
• Select the Basic Storage Devices option and click the Next
button to continue.
• In the Storage Device Warning window, select the Yes,
discard any data option.
• Hostname: keep the default option or specify any legal name.
• Select the time zone.
• Root password: set the administrator password (username: root), and click
the Next button.
• Select the Use all space option, and check the Review and
modify partition layout box. Click the Next button.
• Double-click the /home option in the LVM Volume Group-
VolGroup list, and modify the size to 1 to release space.

o Double-click the lv-swap in the LVM Volume Group-
VolGroup list, and modify its size to 8192.
o Double-click the /root option in the LVM Volume Group-
VolGroup list, and modify its size to 40000.
o Click the Create button, select the LVM Logical Volume
(VolGroup) option, click the Create button, select the /tmp
option in the Mount Point combo list, and change its size to
40000.
o Click the Create button, select the LVM Logical Volume
(VolGroup) option, click the Create button, select the /opt
option in the Mount Point combo list, and change its size to
40000.
o Click the Create button, select the LVM Logical Volume
(VolGroup) option, click the Create button, select the /var
option in the Mount Point combo list, and change its size to
40000.
o Double-click the /home option in the LVM Volume Group-
VolGroup list, and modify the size to xxx (all available space).
o Click the Next button.
o Click the Format, writing Storage Configuration to
disk button, and select the Write changes to disk option
(the disk is formatted, and the Formatting window appears).
o Use the default option (install the boot loader), and click the Next
button.
o Select the Minimal option, click the Customize Now
button, and click the Next button.
o Applications: Internet Browser
o Base System: Base and Infiniband Support
o Desktops: Desktop, Fonts, General Purpose Desktop,
Graphical Administration Tools, Legacy X Window
System Compatibility, and X Window System
o Development: Development tools
o Web Services: leave everything unselected; otherwise, a conflict will occur.
o Click the Next button.

• Take out the DVD, and then press the reboot button.
• Use the default in the Welcome page window.
• License information: click the agree button.
• Create User: skip
• Date and Time: Forward
• Kdump: Uncheck the Enable kdump box.
• Finish

3. Install Intel Manycore Platform Software Stack (MPSS)

• Requirements for MPSS

The Intel Manycore Platform Software Stack (MPSS) software is necessary to
run the Xeon Phi coprocessor. It depends on Linux kernel 2.6.34 or a later
version, and it has been tested to work with the following 64-bit operating
systems:

o Red Hat Enterprise 6.0, 6.1, 6.2, 6.3, 6.4, and 6.5
o SuSE Linux Enterprise Server (SLES) 11 SP1 and SP2 (MPSS 2.1
release) and SuSE Linux Enterprise Server (SLES) 11 SP2 and SP3
(MPSS 3.x release)
o Microsoft Windows 7 Enterprise SP1, Windows 8 Enterprise,
Windows Server 2008 R2 SP1, and Windows Server 2012

• Software download: MPSS 3.1 release for Linux
(http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss)

• MPSS Installation
(i) Requirements

o Administrator privileges are required to install the Intel MPSS
3.1.2 release.
o Valid SSH keys are required for the users (including the root
user) that need SSH access to each Xeon Phi coprocessor
(refer to SSH Access and Configuration for the
Intel Xeon Phi Coprocessor).
o A supported hardware platform with at least one Intel Xeon Phi
coprocessor installed.
o Enable the Large Base Address Registers (BAR)
Support option in the platform BIOS. For example, for the
Supermicro motherboard, enable Above 4G Decoding under
PCIE/PCI/PnP Configuration in the Advanced option
of the BIOS setting window.

(ii) Check whether the Xeon Phi coprocessor is recognized by the Linux kernel
using the following command:

user_prompt> lspci | grep Co-processor

If the Xeon Phi coprocessor is found in the list, it means the Xeon Phi
coprocessor is recognized. Otherwise, check the hardware installation
and BIOS settings.
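
The same check can be scripted. The following minimal Python sketch filters lspci output for the Co-processor device class used in the command above; the function names and the sample text are hypothetical, and the sample line is only an illustrative stand-in for real lspci output.

```python
import shutil
import subprocess


def find_coprocessors(lspci_text: str) -> list:
    """Return the lspci lines whose device class is reported as Co-processor."""
    return [line.strip() for line in lspci_text.splitlines()
            if "Co-processor" in line]


def query_lspci() -> str:
    """Run lspci if it is installed; return an empty string otherwise."""
    if shutil.which("lspci") is None:
        return ""
    result = subprocess.run(["lspci"], capture_output=True,
                            text=True, check=False)
    return result.stdout


# Illustrative stand-in for lspci output (not captured from a real system).
SAMPLE = """\
00:1f.2 SATA controller: Example vendor SATA controller
05:00.0 Co-processor: Example Xeon Phi coprocessor entry
"""

devices = find_coprocessors(SAMPLE)
print(f"{len(devices)} coprocessor(s) found in the sample text")
```

On a real workstation, find_coprocessors(query_lspci()) would be called instead of using the sample text; an empty list suggests checking the hardware installation and the BIOS settings as described above.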

(iii) Steps to uninstall Intel MPSS

o To uninstall 3.x-based builds, follow the steps below:

• Unload the MPSS driver using the command below:

user_prompt> sudo service mpss unload

• Uninstall the previous version using the following commands:

user_prompt> cd mpss-3.1.2
user_prompt> ./uninstall.sh

o To uninstall pre-3.x builds, follow the steps below:

• Unload the MPSS driver using the command below:

user_prompt> sudo service mpss unload

• Uninstall the previous version (Red Hat Enterprise Linux)
using the following command:

user_prompt> sudo yum remove intel-mic\*

(iv) Disable SELinux by modifying the /etc/selinux/config file before
installing the Intel MPSS software, to prevent SELinux from overriding
the standard Linux permission settings.

• Select the config file.
• Click the right mouse button.
• Select the Open with gedit option.
• Change the SELINUX line to SELINUX=disabled.
• Save and exit the config file.
• Execute the command setenforce 0 or reboot.
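
The edit to the config file can also be expressed as a small text transformation. The sketch below is a hypothetical helper that operates on the file contents as a string; it does not write to /etc/selinux/config, and the sample contents are illustrative rather than copied from a particular system.

```python
import re


def disable_selinux(config_text: str) -> str:
    """Return config text in which the SELINUX setting is 'disabled'.

    Only an uncommented line starting with 'SELINUX=' is rewritten, so
    comment lines and the SELINUXTYPE setting are left unchanged.
    """
    pattern = re.compile(r"^SELINUX=.*$", flags=re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub("SELINUX=disabled", config_text)
    # No setting present: append one, keeping the file newline-terminated.
    separator = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + separator + "SELINUX=disabled\n"


# Illustrative file contents.
SAMPLE_CONFIG = """\
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""

print(disable_selinux(SAMPLE_CONFIG))
```

Writing the result back to the file requires administrator privileges, and the change takes effect after a reboot or after running setenforce 0, as noted in the steps above.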

(v) Disable firewall

• Open the System->Administration->Services window in the Linux window.
• Stop and disable the iptables and ip6tables options.

(vi) Identify the tar file for the host OS, then untar and install the Intel
MPSS package. There is a new folder mpss-3.1.2 in the current
folder after the MPSS package is untarred.

user_prompt> tar xvf mpss-3.1.2-rhel-6.0.tar
user_prompt> cd mpss-3.1.2

(vii) Regenerate the Xeon Phi coprocessor driver for the current OS

o Ensure the prerequisites are installed through the following command:

user_prompt> sudo yum install kernel-headers kernel-devel

Note: use the sudo command if the user is not the administrator.

o Regenerate the MPSS driver module package through the following
commands:

user_prompt> cd <folder where extracted tar file expanded>/src/

user_prompt> rpmbuild --rebuild mpss-modules-*.el6.src.rpm

o The mpss-modules binary rpm is located at $HOME/rpmbuild/RPMS/x86_64.
Use the following command to check if the new driver is generated:

user_prompt> ls $HOME/rpmbuild/RPMS/x86_64

o Replace the original mpss-modules-XXX.rpm and mpss-modules-dev-XXX.rpm
in the mpss-3.1.2 folder with the newly created rpm files and proceed
to install.

(viii) Install the MPSS package using the following command:

user_prompt> sudo yum install *.rpm

(ix) Initialize the MPSS default settings using the following command:

user_prompt> sudo micctrl --initdefaults

➢ Configure MPSS via .conf files (optional).

o The document MPSS_Users_Guide.pdf explains in detail how to modify
the Intel MPSS configuration files.

o To make additional configuration changes, edit the .conf files in the
/etc/mpss/ directory. After modifying the configuration files, it is
necessary to execute the following command before the changes take
effect:

user_prompt> sudo micctrl --resetconfig

➢ Start the Intel MPSS by using the Linux service command:

user_prompt> sudo service mpss start

o To start the Intel MPSS service automatically each time the host
operating system boots, execute the following command:

user_prompt> sudo chkconfig mpss on


o To prevent the Intel MPSS service from starting automatically when the
host operating system boots, execute the following command:

user_prompt> sudo chkconfig mpss off

(x) Check the Xeon Phi coprocessor connection status using the
following command:

user_prompt> micinfo

You should find one line in the list like:

Coprocessor: Intel Corporation Xeon Phi Coprocessor 3100/5100/7100 series

4. SSH access and configuration for the Xeon Phi coprocessor

Communication with the coprocessor Linux operating system on the Xeon Phi
coprocessor is provided by a standard network interface. The interface uses a
virtual network driver over the PCIe bus. Standard networking tools such as SSH
are supported.
The Xeon Phi coprocessor Linux operating system supports network access
for all users using the SSH keys. The configuration phase of the Intel MPSS
creates users for each coprocessor based on the current user IDs in the host
/etc/passwd file.
For each user in the /etc/passwd file (including root), if SSH key
files are found in the user's .ssh directory, those keys are also copied to the
Xeon Phi coprocessor's file system. Users who do not have valid keys will
not have network access to the Xeon Phi coprocessor.
To generate the SSH key, each user must execute the following command:

user_prompt> ssh-keygen

The following commands must be executed in order for the MPSS to pick up
any new keys:

user_prompt> sudo service mpss stop


user_prompt> sudo micctrl --resetconfig
user_prompt> sudo service mpss start

5. Check the status of the Xeon Phi coprocessor using the command:

user_prompt> micinfo

➢ System Information
Host OS : Linux
OS Version : 2.6.32-431.el6.x86
Driver Version : 3.1.2-1
MPSS Version : 3.1.2
Host Physical Memory : 32848 MB

➢ Device No. 0, Device Name : mic0

o Version
Flash Version : 2.1.02.0386
SMC Firmware Version : 1.14.4616
SMC Boot Loader Version : 1.8.4326
OS Version : 2.6.38.8+mpss3.1.2

o Board
PCIe Width : x16
PCIe Speed : 5 GT/s
Board SKU : C0 QS-3120 P/A
ECC Mode : Enabled
SMC HW Revision : Product 300W Active CS

o Cores
Total No of Active Cores : 60
Frequency : 1100000 kHz

o GDDR
GDDR Density : 2048 Mb
GDDR Size : 5952 MB
GDDR Technology : GDDR5
GDDR Speed : 5.000000 GT/s

Use the following command to check the performance of the Xeon Phi
coprocessor:

user_prompt> micsmc

The resulting window, shown in Figure 5.3, reports the memory usage,
temperature, power, and average core utilization of the Xeon Phi
coprocessor.

Figure 5.3 Performance monitoring window on the platform with Xeon Phi coprocessor:
temperature, memory usage, power, and core average utilization for system and user.

5.2.3 Compilation Environment

In this subsection, we describe the compilation environment for the Xeon Phi
coprocessor and its host that can be used for different applications.
1. Install the C++ compiler
We describe how to install the C++ compiler on a system with a Xeon
Phi coprocessor. You need to download three files from the Intel web site:

▪ COM_L_CPP_C94M-W38PX7PH.Lic
▪ l_ccompxe_2013_sp1.1.106.tgz
▪ License.txt

Copy the three files above to a local folder and untar l_ccompxe_XXX.tgz.
The l_ccompxe_2013XXX subfolder will be generated in the current folder.
Enter this folder and double-click the install_GUI.sh file to start
installing the C++ compiler.
Since we need to use the Xeon Phi coprocessor to run application files,
make sure that the selected installation components include MIC support.
Enter the proper license key; by default, the compiler is installed under
the /opt/intel folder. After the compiler is successfully installed, you
need to modify the environment variables.
(a) Create an intel.sh file via the steps below:
▪ Enter /etc/profile.d and create a file intel.sh
▪ Add PATH=$PATH:/opt/intel/bin to the file
▪ Add export PATH to the file
▪ Save the file

(b) Create an intel-x86_64.conf file
▪ Enter /etc/ld.so.conf.d and create a file intel-x86_64.conf
▪ Add /opt/intel/lib/intel64 to the file
▪ Save the file
(c) Update the shared library cache using the following command:
user_prompt> ldconfig
2. Modify the Xeon Phi environment
(a) Create a subfolder lib64 under the /var/mpss/common folder
(b) Copy the file libiomp5.so from /opt/intel/lib/mic/ to
/var/mpss/common/lib64 through the following command:
user_prompt> cp /opt/intel/lib/mic/libiomp5.so /var/mpss/common/lib64
(c) Add the line /lib64/libiomp5.so lib64/libiomp5.so 755 0 0 to
the file common.filelist, and then apply the change through the
following commands:

user_prompt> micctrl --updateramfs
user_prompt> service mpss restart

In the MPSS folder (/var/mpss/), there are two folders and three
files, as shown in Figure 5.4. Each Xeon Phi card corresponds to one
folder, such as mic0, mic1, mic2, or mic3. The parameters for each Xeon
Phi card are stored in its own folder micX and its own file
micX.filelist, while the common files and parameters are located in the
common folder and the common.filelist file.

Figure 5.4 Basic files and folders in the MPSS folder.



5.2.4 Example Code for CPU and Xeon Phi Coprocessor

We use a simple example to demonstrate how to use the Intel Xeon CPU and Xeon
Phi coprocessor to calculate the result of the following formula:

\[
\sum_{i=0}^{4096000} a_i b_i + c_i , \tag{5.1}
\]

where a, b, and c are floating-point numbers. The C++ code (main.cpp) is
shown in Listing 5.1, which uses a new feature of AVX and MIC, fused
multiply-add (FMA), such that one multiplication and one addition can be
completed by a single instruction.
Listing 5.1 Demonstration code for (5.1) on the Xeon Phi coprocessor
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <time.h>
#include <omp.h>
#include <immintrin.h>

#ifdef _WIN32
#include <Windows.h>
#elif defined(__linux__)
#include <sys/time.h>
#endif

#define ALIGNED_SIZE 64 // 64-byte/512-bit alignment


#ifdef __MIC__
#define SIMD_SIZE 16 // width of SIMD
#else
#define SIMD_SIZE 4 // width of SIMD
#endif
#define DIM 4096000 // problem size
#define ROUND 200000

// Allocate n elements of type T aligned to ALIGNED_SIZE bytes. The address
// returned by malloc is stored just below the aligned block so that
// aligned_free1D can release it later.
template<class T>
void aligned_malloc1D(T *&p, size_t n)
{
    p = NULL;

    size_t na = ALIGNED_SIZE;
    if (na < sizeof(size_t)) na = sizeof(size_t);

    size_t nn = n * sizeof(T) + sizeof(size_t) + na;

    char *po = (char *)malloc(nn * sizeof(char));
    if (po == NULL) return;

    size_t nshift = na - ((size_t)po % na);
    if (nshift < sizeof(size_t)) nshift += na;

    p = (T *)(po + nshift);
    *(size_t *)((char *)p - sizeof(size_t)) = (size_t)po;
}

// Release a block obtained from aligned_malloc1D.
template<class T>
void aligned_free1D(T *&p)
{
    if (p == NULL) return;

    size_t addr = *(size_t *)((char *)p - sizeof(size_t));
    char *po = (char *)addr;

    free(po);
    p = NULL;
}

// Allocate an n-by-m array as one aligned block plus a table of row pointers.
template<class T>
void aligned_malloc2D(T **&pp, size_t n, size_t m)
{
    pp = NULL;

    size_t nm = n * m;
    T *p = NULL;
    aligned_malloc1D(p, nm);
    if (p == NULL) return;

    pp = (T **)malloc(n * sizeof(T *));
    if (pp == NULL)
    {
        aligned_free1D(p);
        return;
    }

    size_t i;   // same type as n, which avoids a signed/unsigned comparison
    for (i = 0; i < n; i ++, p += m)
    {
        pp[i] = p;
    }
}

// Release an array obtained from aligned_malloc2D.
template<class T>
void aligned_free2D(T **&pp)
{
    if (pp == NULL) return;

    T *p = pp[0];
    free(pp);
    aligned_free1D(p);
    pp = NULL;
}

struct TTIME
{
    int hour, minute, second, millisecond;
};

// Difference tm1 - tm2; x carries the borrow from one field to the next.
TTIME operator-(const TTIME &tm1, const TTIME &tm2)
{
    TTIME tm;
    int x = 0;
    tm.millisecond = tm1.millisecond - tm2.millisecond;
    if (tm.millisecond < 0) {
        tm.millisecond += 1000;
        x = 1;
    }
    tm.second = tm1.second - tm2.second - x;
    if (tm.second < 0)
    {
        tm.second += 60;
        x = 1;
    } else {
        x = 0;
    }
    tm.minute = tm1.minute - tm2.minute - x;
    if (tm.minute < 0)
    {
        tm.minute += 60;
        x = 1;
    } else {
        x = 0;
    }
    tm.hour = tm1.hour - tm2.hour - x;

    return tm;
}

// Read the current wall-clock time of day with millisecond resolution.
void gettime(TTIME &tm)
{
#ifdef _WIN32
    SYSTEMTIME current;

    GetSystemTime(&current);
    tm.hour = current.wHour;
    tm.minute = current.wMinute;
    tm.second = current.wSecond;
    tm.millisecond = current.wMilliseconds;

#elif defined(__linux__)
    timeval current;

    if (gettimeofday(&current, NULL)) return;

    tm.millisecond = current.tv_usec / 1000;
    tm.second = current.tv_sec % 60;
    tm.minute = (current.tv_sec / 60) % 60;
    tm.hour = (current.tv_sec / 3600) % 24;
#endif
}

int main()
{
    int n;
    n = DIM;
    // round the problem size up to a multiple of the SIMD width
    if (n % SIMD_SIZE) n = (n / SIMD_SIZE + 1) * SIMD_SIZE;

    float *A, *B, *C, *D;

    aligned_malloc1D(A, n);
    aligned_malloc1D(B, n);
    aligned_malloc1D(C, n);
    aligned_malloc1D(D, n);

    // initialize the operands so that no uninitialized memory is read
    // (the values are arbitrary)
    for (int i = 0; i < n; i ++)
    {
        A[i] = 1.0f;
        B[i] = 0.5f;
        C[i] = 0.0f;
    }

    TTIME tm_start, tm_end;

    gettime(tm_start);

#ifdef __MIC__
    __m512 *vA, *vB, *vC;
    vA = (__m512 *)A;
    vB = (__m512 *)B;
    vC = (__m512 *)C;
#else
    __m128 *vA, *vB, *vC;
    vA = (__m128 *)A;
    vB = (__m128 *)B;
    vC = (__m128 *)C;
#endif

    int m = n / SIMD_SIZE;
    int nThreads;

    #pragma omp parallel
    {
        int iThread, mb;
        int i, r, i1, i2;
#ifdef __MIC__
        __m512 v;
#else
        __m128 v;
#endif
        #pragma omp single
        nThreads = omp_get_num_threads();
        // the implicit barrier after "single" makes nThreads visible to all threads

        mb = m / nThreads;
        if (m % nThreads) mb ++;

        iThread = omp_get_thread_num();

        i1 = iThread * mb;
        i2 = (iThread + 1) * mb;
        if (i1 > m) i1 = m;
        if (i2 > m) i2 = m;

        for (r = 0; r < ROUND; r ++)
        {
            for (i = i1; i < i2; i ++)
            {
#ifdef __MIC__
                vC[i] = _mm512_fmadd_ps(vA[i], vB[i], vC[i]);
#else
                v = _mm_mul_ps(vA[i], vB[i]);
                vC[i] = _mm_add_ps(vC[i], v);
#endif
            }
        }
    } // end of the parallel region

    printf("nThreads = %d\n", nThreads);

    gettime(tm_end);
    TTIME tm_duration = tm_end - tm_start;

    aligned_free1D(A);
    aligned_free1D(B);
    aligned_free1D(C);
    aligned_free1D(D);

    // one multiplication and one addition per element and round
    double gigaflop = 2.0 * (double)ROUND * (double)n / 1.0e9;

    double t = tm_duration.hour * 3600.0
             + tm_duration.minute * 60 + tm_duration.second
             + (double)tm_duration.millisecond / 1000.0;

    printf("GFLOP : %g\n", gigaflop);
    printf("Duration : %02d:%02d:%02d.%03d\n",
           tm_duration.hour, tm_duration.minute,
           tm_duration.second, tm_duration.millisecond);
    printf("GFLOPS : %g\n", gigaflop / t);

    return 0;
}
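
As a cross-check for the vectorized kernel in Listing 5.1, the same update of c_i by a_i b_i + c_i can be written in scalar form. The sketch below is an illustration only and is not part of the original listing: the function name fma_update is hypothetical, and std::fma from the C++ standard library stands in for the hardware FMA instruction.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: applies c[i] = a[i] * b[i] + c[i] 'rounds' times,
// which is the scalar counterpart of the inner loops of Listing 5.1.
void fma_update(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c, int rounds)
{
    assert(a.size() == b.size() && b.size() == c.size());
    for (int r = 0; r < rounds; ++r)
        for (std::size_t i = 0; i < c.size(); ++i)
            c[i] = std::fma(a[i], b[i], c[i]);   // one fused multiply-add
}
```

Running this version on a small array and comparing it with the SIMD result is a simple way to validate a port before timing it.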

The GNUmakefile for the CPU platform is shown in Listing 5.2.



Listing 5.2 GNUmakefile for CPU platform


objects = main.o
CXX = icpc
CXXFLAGS = -O2 -c -openmp
LINK = icpc
LDFLAGS = -openmp

matrixcalc: $(objects)
	$(LINK) $(LDFLAGS) $^ -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) $< -o $@

.PHONY: clean
clean:
	-rm -f $(objects) matrixcalc

The GNUmakefile for the Phi platform is shown in Listing 5.3.

Listing 5.3 GNUmakefile for the Phi platform


objects = main.o
CXX = icpc
CXXFLAGS = -mmic -O2 -c -openmp
LINK = icpc
LDFLAGS = -mmic -openmp

matrixcalc: $(objects)
	$(LINK) $(LDFLAGS) $^ -o $@

%.o: %.cpp
	$(CXX) $(CXXFLAGS) $< -o $@

.PHONY: clean
clean:
	-rm -f $(objects) matrixcalc

There are two files in the application folder, namely, GNUmakefile and main.cpp.

1. Run the code on the CPU platform


It is worth mentioning that one uses sh to run a script file and ./ to
run a binary file. The Intel Xeon E5 CPU and the Xeon Phi coprocessor
can share the same source code, but the executable files are different.
We therefore need to compile the source code for the Intel Xeon E5 and
the Xeon Phi coprocessor separately.
Go to the application folder and type the following commands (make
sure to use the correct makefile, that is, the one without the -mmic
option):
user_prompt> make clean
user_prompt> make

The second command will generate two files in the application folder,
namely, main.o and matrixcalc, as shown in Figure 5.5.

Figure 5.5 Two files main.o and matrixcalc in the application folder.

Type the following command to run the code on the CPU platform:
user_prompt>./matrixcalc
The screen message can be read as:
nThreads = 16
GFLOP : 1638.4 (total operations)
Duration : 00:00:17.709 (simulation time)
GFLOPS : 92.5179 (performance)

2. Run the code on the Phi platform (standalone mode, as an independent processor)
Once again, the Intel Xeon E5 CPU and the Xeon Phi coprocessor can share
the same source code; the Phi build only needs the additional
compilation option -mmic.
Go to the application folder and type the following commands (make
sure to use the correct makefile, that is, the one with the -mmic
option):

user_prompt> make clean
user_prompt> make

The second command will generate two files in the application folder,
namely, main.o and matrixcalc. Type the following command to copy
the executable to the Phi coprocessor mic0:

user_prompt> scp matrixcalc mic0:.

The screen message is:

matrixcalc 100% 17KB 16.7KB/s 00:00

Type the following command to log in to the Xeon Phi coprocessor:

user_prompt> ssh mic0

Type the following command to run the code (you are on the Phi platform
already):
user_prompt> ./matrixcalc

The screen message reads:

nThreads = 228 (total number of threads)
GFLOP : 1638.4 (total operations)
Duration : 00:00:02.791 (simulation time)
GFLOPS : 587.03 (performance)

The acceleration factor is 17.709/2.791 = 6.345.

Exit from the Phi coprocessor by using the following command:

user_prompt> exit

The screen message reads:

logout
Connection to mic0 closed.

This is a benchmark from Intel with similar operations on the CPU
platform:

nThreads = 16 (total number of threads)


GFLOP : 409.6 (total operations)
Duration : 00:00:13.297 (simulation time)
GFLOPS : 30.8039 (performance)

The same benchmark reaches a better performance on the Phi
coprocessor:

nThreads = 228 (total number of threads)


GFLOP : 5836.8 (total operations)
Duration : 00:00:03.640 (simulation time)
GFLOPS : 1603.52 (performance)
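
The figures printed by Listing 5.1 can be reproduced by hand: each element update counts as two operations, so the total work is 2 × ROUND × n. The sketch below only repeats that arithmetic; the function names are hypothetical, and the timings are the ones quoted above.

```cpp
#include <cassert>

// Total work of Listing 5.1 in GFLOP: one multiply and one add per element,
// repeated 'rounds' times over 'n' elements.
double total_gflop(double rounds, double n) { return 2.0 * rounds * n / 1.0e9; }

// Sustained rate in GFLOPS for a run that took 'seconds'.
double gflops(double gflop, double seconds) { return gflop / seconds; }

// Acceleration factor of the coprocessor run over the CPU run.
double speedup(double t_cpu, double t_phi) { return t_cpu / t_phi; }
```

With ROUND = 200000 and n = 4096000, total_gflop gives 1638.4; dividing by 17.709 s and 2.791 s gives the 92.5 and 587.0 GFLOPS reported for the CPU and the Phi coprocessor, respectively.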

3. Run the code in the offload style

Remove the compilation option -mmic from the makefile. Change the
environment using the following commands:

export MIC_ENV_PREFIX=MIC
export MIC_LD_LIBRARY_PATH=/opt/intel/composerxe/lib/mic:/opt/intel/mic/coi/host-linux-release/lib

The offload mode allows one to send jobs to the Phi coprocessor
automatically. The performance obtained in this way is lower than in the
standalone mode. In the offload mode, we need to allocate the memory on
both the host and the Phi coprocessor, for example,

#ifdef _WIN32
__declspec(align(64)) float fa[FLOPS_ARRAY_SIZE];
__declspec(align(64)) float fb[FLOPS_ARRAY_SIZE];
#else
__declspec(target(mic)) float fa[FLOPS_ARRAY_SIZE]
__attribute__((align(64)));
__declspec(target(mic)) float fb[FLOPS_ARRAY_SIZE]
__attribute__((align(64)));
#endif

Use the code segment in Listing 5.4 to tell the system that the following code
will be sent to the Phi coprocessor.

Listing 5.4 Code segment for the Xeon Phi coprocessor


#pragma offload target (mic)
#pragma omp parallel for private(j, k)
for (i = 0; i < nThreads; i ++)
{
    int offset = i * LOOP_COUNT;
    for (j = 0; j < MAXFLOPS_ITERS; j ++)
    {
        #pragma vector aligned
        for (k = 0; k < LOOP_COUNT; k ++)
        {
            fa[k+offset] = a * fa[k+offset] + fb[k+offset];
        }
    }
}

For the same benchmark from Intel, the performance on the Phi coprocessor is:

nThreads = 224 (total number of threads)


GFLOP : 5734.4 (total operations)
Duration : 00:00:03.997 (simulation time)
GFLOPS : 1434.68 (performance)

The total number of available threads is 224, not 228, in the offload mode
since one core is reserved for communication.
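
The thread counts and the GFLOP totals of the Intel benchmark are consistent with a simple count. The sketch below is a plausibility check under stated assumptions, none of which are given in the text: 57 cores with four hardware threads each (inferred from 228 = 57 × 4), and LOOP_COUNT = 128 with MAXFLOPS_ITERS = 10^8, chosen because they reproduce the printed totals. The function names are hypothetical.

```cpp
#include <cassert>

// Hardware threads left for an offloaded region when 'reserved' cores are
// set aside (assumption: four hardware threads per core).
int usable_threads(int cores, int reserved) { return (cores - reserved) * 4; }

// Total work of the kernel in Listing 5.4 in GFLOP: two operations
// (multiply and add) per inner-loop iteration and thread.
double benchmark_gflop(int threads, double loop_count, double iters)
{
    return 2.0 * threads * loop_count * iters / 1.0e9;
}
```

Under these assumptions, usable_threads(57, 1) returns 224, and benchmark_gflop returns 5734.4 for 224 threads, 5836.8 for 228 threads, and 409.6 for 16 threads, matching the three outputs quoted above.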

5.3 CODE DEVELOPMENT

In this section, we will introduce performance optimization and code
development techniques based on the MIC instruction set. The parallel
FDTD method and matrix multiplication are employed to demonstrate the
code development and optimization techniques.

5.3.1 Performance Optimization

It is extremely important to understand the computer architecture in
order to write highly efficient simulation code. Next, we introduce
several concepts from computer architecture that help achieve better
code performance. The major bottleneck in electromagnetic simulations
today is memory bandwidth rather than CPU performance, as shown in
Figure 5.6. The gap keeps widening as computer technology develops.

[Figure 5.6: performance versus time on a logarithmic scale. CPU performance grows by about 60% per year (Moore's law effect), memory performance grows by about 7% per year, and the gap between them grows by about 50% per year.]

Figure 5.6 Effect of CPU and memory on the simulation performance.

There are several ways to improve the memory performance:

1. Distributed computing uses domain decomposition to spread the task
over distributed nodes, so that the total memory bandwidth increases
linearly with the number of nodes. However, the network performance
among the distributed nodes significantly affects the system
performance. The parallel efficiency is defined as follows:

\[
\text{Parallel Efficiency} = \frac{\text{Calculation Cost}}{\text{Calculation Cost} + \text{Communication Cost}} \tag{5.2}
\]

2. A large cache improves the simulation performance by providing a
fast buffer between the memory and the processing unit, as shown in
Figure 5.7. If the data is likely to be used multiple times, we can keep
it in this much faster buffer so that the read and write times are
significantly reduced. However, the cache size is always limited because
cache is much more expensive than memory. We need to keep the most
frequently used data in cache as long as we can to improve the
performance.

[Figure 5.7: the processing unit exchanges data with the cache, which in turn exchanges data with the memory.]

Figure 5.7 Cache is used as a buffer between the processing unit and memory.

3. The cache hit ratio, that is, the fraction of memory accesses served
from cache, measures how well the data is organized in cache and memory.
The performance comparison among L1, L2, L3, and memory is listed in
Table 5.2.

Table 5.2
Performance Comparison Between Cache and Memory

4. A page is the minimum memory management unit in current operating
systems, as shown in Figure 5.8. The memory space used in a code is not
the real physical one. The memory management system divides the
requested memory space into several pages (continuous memory blocks)
and maps each page to a given physical space.

[Figure 5.8: matrix A is mapped page by page to separate blocks of physical memory; the address translation uses a page table in memory and the translation lookaside buffer (TLB).]

Figure 5.8 Page strategy in the matrix operation.

The benefit of data access within a page is that it supports hardware
prefetching with no need to carry out address translation. Random
in-page access can be two to three times faster than out-of-page
access.

5. A cache line is the minimum exchange unit between cache and memory, as
shown in Figure 5.9. The size of a cache line is 64 bytes on x86 systems,
implying that the cache fetches 64 adjacent bytes of data, or 16
single-precision floating-point numbers, in one fetch operation.
Consecutive instructions should therefore access nearby data to improve
the performance.

[Figure 5.9: on a cache miss, a whole cache line is loaded from memory into cache.]

Figure 5.9 Cache line between cache and memory.



6. Cache associativity is the policy that determines where loaded data
may be placed in cache. Figure 5.10 shows a two-way cache, in which M1
can be loaded to C1 or C2. If C1 and C2 are both in use, for instance,
M3 is in C1 and M5 is in C2, a cache replacement takes place: either C1
or C2 is written back to memory before the slot is reused.

[Figure 5.10: memory blocks M1 to M5 and a cache with the slots C1, C2 and C3, C4.]

Figure 5.10 Two-way associativity.
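
The effect of the cache line size in item 5 can be made concrete by counting how many distinct lines a loop touches. The sketch below is an illustration with a hypothetical function name; it assumes 4-byte floats, a 64-byte line, and a line-aligned array.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

constexpr std::size_t kCacheLineBytes = 64;   // x86 cache line size from the text

// Counts the distinct cache lines touched when reading 'count' floats that
// are 'stride' elements apart, starting at a line-aligned base address.
std::size_t lines_touched(std::size_t count, std::size_t stride)
{
    std::set<std::size_t> lines;
    for (std::size_t i = 0; i < count; ++i)
        lines.insert(i * stride * sizeof(float) / kCacheLineBytes);
    return lines.size();
}
```

Reading 1,024 contiguous floats touches 64 lines, whereas reading 1,024 floats with a stride of 16 elements touches 1,024 lines, so the strided loop moves 16 times more data for the same number of useful values.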

Each core has its own cache for better compute performance, as shown in
Figure 5.11. If core 1 and core 2 access adjacent memory, cache 1 and
cache 2 will hold copies of the same memory block. To keep the data
coherent, any change in cache 1 must be pushed to cache 2 immediately,
and vice versa. Such an operation incurs an additional cost. Even though
this cost is much lower on the Phi coprocessor than on Intel and AMD
CPUs, we still need to avoid it as much as we can.
To solve this problem, let each core process a different data block,
with each data block at least one cache line in size. This yields a
significant performance improvement in some particular problems.

[Figure 5.11: core 1 and core 2 each access the memory through their own cache (cache 1 and cache 2).]

Figure 5.11 Each core has its own cache in a modern processor unit.
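
One common way to give each thread its own cache line is to pad the per-thread data to 64 bytes. The sketch below illustrates this with a blocked sum; the names PaddedSum and blocked_sum are hypothetical, and the OpenMP pragma is optional (without OpenMP support the loop simply runs serially and gives the same result).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulator per block, padded to a full 64-byte cache line so that
// two threads never write to the same line.
struct alignas(64) PaddedSum { double value = 0.0; };

// Sums 'data' in 'nblocks' contiguous blocks; each block is meant to be
// processed by one thread and writes only to its own padded accumulator.
double blocked_sum(const std::vector<double>& data, int nblocks)
{
    assert(nblocks > 0);
    std::vector<PaddedSum> partial(nblocks);
    const std::size_t n = data.size();
    const std::size_t bsize = (n + nblocks - 1) / nblocks;   // ceiling division

    #pragma omp parallel for
    for (int b = 0; b < nblocks; ++b) {
        std::size_t i1 = b * bsize;
        std::size_t i2 = i1 + bsize;
        if (i1 > n) i1 = n;
        if (i2 > n) i2 = n;
        for (std::size_t i = i1; i < i2; ++i) partial[b].value += data[i];
    }

    double total = 0.0;
    for (const PaddedSum& p : partial) total += p.value;
    return total;
}
```

Without the padding, neighboring accumulators would share a cache line, and every update by one thread would invalidate the copy held by the others.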

If the data is accessed with the critical stride, that is, the address
spacing at which successive accesses fall into the same cache set, cache
replacement will happen frequently since all of these data are assigned
to the same cache set, whose capacity is limited. For instance, the L2
cache in the Phi coprocessor is 8-way associative, so each cache set
holds only 8 cache lines, as shown in Figure 5.12.

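[Figure 5.12: the memory blocks holding A(10,0), A(10,1), the blocks holding A(11,0), A(11,1), and the blocks holding A(12,0), A(12,1) all map to the same cache set, while the other sets remain unused.]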

Figure 5.12 Cache connection inside a core.

Suppose that we have an array A with 13×8 elements whose data is
continuous in memory along the column, as shown in Figure 5.12. For
example, we calculate A(12, i) from A(10, i) and A(11, i) for i = 0 to 7:

for (i = 0; i < 8; i ++)
{
    A(12, i) = A(10, i) + A(11, i);
}

To calculate A(12, 0), we need A(10, 0) and A(11, 0), which are loaded
into the cache. In fact, A(10, 1) to A(10, 7) and A(11, 1) to A(11, 7)
are loaded simultaneously as parts of the same cache lines. Since the
cache set is already occupied by the line holding A(10, 0), this line is
replaced by the line holding A(12, 0). Next, when we calculate A(12, 1)
for i = 1, we need to reload A(10, 1), which is then replaced by
A(12, 1) again. The cache hit ratio is very low in this way.
Enlarging the column size of array A by one cache line avoids the
critical stride above, as shown in Figure 5.13.

[Figure 5.13: after padding, the memory blocks holding A(10,0), A(11,0), and A(12,0) map to different cache sets.]

Figure 5.13 Increase the column size of array A by one cache line size to avoid the critical stride.
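
The critical stride can be checked by computing the cache set index of each row start. The sketch below assumes an L2 geometry of 512 KB, 8 ways, and 64-byte lines; only the 8-way associativity is stated in the text, so the other two values and all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed L2 geometry: 512 KB, 8-way set associative, 64-byte lines.
constexpr std::size_t kLine = 64;
constexpr std::size_t kWays = 8;
constexpr std::size_t kCacheBytes = 512 * 1024;
constexpr std::size_t kSets = kCacheBytes / (kLine * kWays);   // 1024 sets
constexpr std::size_t kCriticalStride = kSets * kLine;         // 64 KB

// Index of the cache set that a byte address maps to.
constexpr std::size_t cache_set(std::size_t addr) { return (addr / kLine) % kSets; }

// Number of rows whose first element lands in the same set as row 0 when
// consecutive rows are 'row_bytes' apart in memory.
std::size_t rows_in_set_of_row0(std::size_t rows, std::size_t row_bytes)
{
    std::size_t hits = 0;
    for (std::size_t r = 0; r < rows; ++r)
        if (cache_set(r * row_bytes) == cache_set(0)) ++hits;
    return hits;
}
```

With 13 rows spaced exactly one critical stride apart, all 13 row starts compete for the 8 ways of one set; adding one cache line of padding to each row spreads them over 13 different sets.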

5.3.2 Memory Alignment

We now demonstrate how to align memory to meet these requirements.
Alignment means that the beginning address of a memory block is a
multiple of a particular integer. For instance, AVX-512 instructions
need data to be aligned to 64 bytes, cache access is aligned to 64
bytes, and page access is aligned to 4,096 bytes. In order to use the
vector unit of the Phi coprocessor, one should use at least 64-byte
alignment for memory allocation. This is quite simple with the Intel
compiler, which provides the following functions:

void* _mm_malloc (size_t size, size_t align);
void _mm_free (void *p);

For example, the statement

float *A = (float *)_mm_malloc(4096 * sizeof(float), 64);

allocates an array A that can hold 4,096 floating-point numbers with
64-byte address alignment, as shown in Figure 5.14.

[Figure 5.14: variable A occupies the addresses 0x207B340 (34059072) to 0x207C33F (34063167).]

Figure 5.14 A variable A is aligned in memory.
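
Where _mm_malloc is not available, the same effect can be obtained with standard C++ by over-allocating and aligning inside the buffer, much as aligned_malloc1D does in Listing 5.1. The sketch below uses std::align for this; AlignedFloats and is_aligned are hypothetical names, and the class is an illustration rather than a replacement for the Intel functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical helper: owns an over-allocated buffer and exposes a pointer
// inside it that is aligned to 'alignment' bytes (a power of two).
struct AlignedFloats {
    std::vector<unsigned char> storage;   // backing store with slack for alignment
    float* data = nullptr;                // aligned view into 'storage'

    AlignedFloats(std::size_t count, std::size_t alignment)
        : storage(count * sizeof(float) + alignment)
    {
        void* p = storage.data();
        std::size_t space = storage.size();
        data = static_cast<float*>(
            std::align(alignment, count * sizeof(float), p, space));
        assert(data != nullptr);
    }

    // Copying would leave 'data' pointing into the source object.
    AlignedFloats(const AlignedFloats&) = delete;
    AlignedFloats& operator=(const AlignedFloats&) = delete;
};

// True if the address p is a multiple of 'alignment'.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

For example, AlignedFloats a(4096, 64) provides a.data with the same 64-byte alignment as the _mm_malloc call above, which can be confirmed with is_aligned(a.data, 64).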

5.3.3 Parallel FDTD Implementation

We now describe how to develop a parallel FDTD code on the Xeon Phi
coprocessor platform. A typical Yee cell with the positions of the
electric and magnetic fields is illustrated in Figure 5.15. The complete
update equations for all six components are used to demonstrate the code
development techniques based on the Xeon Phi coprocessor platform. In
order to show a realistic case, we consider electrically and
magnetically inhomogeneous media in (5.3), namely, the material
parameters ε and μ are functions of the 3-D spatial coordinates. We can
construct a material list and a reference array instead of using the
real material arrays to reduce the memory usage and improve the cache
hit ratio as well, as shown in Figure 5.16. A comparison of the memory
usage between regular array allocation and the material list with the
reference array is shown in Table 5.3.

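[Figure 5.15: Yee cell with the coordinate axes x, y, z; the electric field components Ex, Ey, Ez are located on the cell edges and the magnetic field components Hx, Hy, Hz on the cell faces.]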

Figure 5.15 A Yee cell and positions of the electric and magnetic fields in the FDTD method.

\[
H_x^{n+1/2}(i, j+1/2, k+1/2) = \frac{\mu_x - 0.5\Delta t\,\sigma_{Mx}}{\mu_x + 0.5\Delta t\,\sigma_{Mx}}\, H_x^{n-1/2}(i, j+1/2, k+1/2)
+ \frac{\Delta t}{\mu_x + 0.5\Delta t\,\sigma_{Mx}} \left[ \frac{E_y^n(i, j+1/2, k+1) - E_y^n(i, j+1/2, k)}{\Delta z} - \frac{E_z^n(i, j+1, k+1/2) - E_z^n(i, j, k+1/2)}{\Delta y} \right] \tag{5.3a}
\]

\[
H_y^{n+1/2}(i+1/2, j, k+1/2) = \frac{\mu_y - 0.5\Delta t\,\sigma_{My}}{\mu_y + 0.5\Delta t\,\sigma_{My}}\, H_y^{n-1/2}(i+1/2, j, k+1/2)
+ \frac{\Delta t}{\mu_y + 0.5\Delta t\,\sigma_{My}} \left[ \frac{E_z^n(i+1, j, k+1/2) - E_z^n(i, j, k+1/2)}{\Delta x} - \frac{E_x^n(i+1/2, j, k+1) - E_x^n(i+1/2, j, k)}{\Delta z} \right] \tag{5.3b}
\]

\[
H_z^{n+1/2}(i+1/2, j+1/2, k) = \frac{\mu_z - 0.5\Delta t\,\sigma_{Mz}}{\mu_z + 0.5\Delta t\,\sigma_{Mz}}\, H_z^{n-1/2}(i+1/2, j+1/2, k)
+ \frac{\Delta t}{\mu_z + 0.5\Delta t\,\sigma_{Mz}} \left[ \frac{E_x^n(i+1/2, j+1, k) - E_x^n(i+1/2, j, k)}{\Delta y} - \frac{E_y^n(i+1, j+1/2, k) - E_y^n(i, j+1/2, k)}{\Delta x} \right] \tag{5.3c}
\]

\[
E_x^{n+1}(i+1/2, j, k) = \frac{\varepsilon_x - 0.5\Delta t\,\sigma_x}{\varepsilon_x + 0.5\Delta t\,\sigma_x}\, E_x^{n}(i+1/2, j, k)
+ \frac{\Delta t}{\varepsilon_x + 0.5\Delta t\,\sigma_x} \left[ \frac{H_z^{n+1/2}(i+1/2, j+1/2, k) - H_z^{n+1/2}(i+1/2, j-1/2, k)}{\Delta y} - \frac{H_y^{n+1/2}(i+1/2, j, k+1/2) - H_y^{n+1/2}(i+1/2, j, k-1/2)}{\Delta z} \right] \tag{5.3d}
\]

\[
E_y^{n+1}(i, j+1/2, k) = \frac{\varepsilon_y - 0.5\Delta t\,\sigma_y}{\varepsilon_y + 0.5\Delta t\,\sigma_y}\, E_y^{n}(i, j+1/2, k)
+ \frac{\Delta t}{\varepsilon_y + 0.5\Delta t\,\sigma_y} \left[ \frac{H_x^{n+1/2}(i, j+1/2, k+1/2) - H_x^{n+1/2}(i, j+1/2, k-1/2)}{\Delta z} - \frac{H_z^{n+1/2}(i+1/2, j+1/2, k) - H_z^{n+1/2}(i-1/2, j+1/2, k)}{\Delta x} \right] \tag{5.3e}
\]

\[
E_z^{n+1}(i, j, k+1/2) = \frac{\varepsilon_z - 0.5\Delta t\,\sigma_z}{\varepsilon_z + 0.5\Delta t\,\sigma_z}\, E_z^{n}(i, j, k+1/2)
+ \frac{\Delta t}{\varepsilon_z + 0.5\Delta t\,\sigma_z} \left[ \frac{H_y^{n+1/2}(i+1/2, j, k+1/2) - H_y^{n+1/2}(i-1/2, j, k+1/2)}{\Delta x} - \frac{H_x^{n+1/2}(i, j+1/2, k+1/2) - H_x^{n+1/2}(i, j-1/2, k+1/2)}{\Delta y} \right] \tag{5.3f}
\]

[Figure 5.16: a 3-D reference array idx points into a material list whose entries hold the material parameters ε, σ, μ, and σ_m.]

Figure 5.16 Material list in 3-D inhomogeneous environment.
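
To show how an update equation of the form (5.3a) maps to code, the sketch below updates Hx on a small grid. It is a simplified illustration, not the chapter's implementation: the medium is assumed homogeneous (constant μ and magnetic loss), the half-integer positions are mapped to integer array indices, and the Field type and update_hx are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal 3-D field with z as the fastest-varying (contiguous) index.
struct Field {
    std::size_t nx, ny, nz;
    std::vector<float> v;
    Field(std::size_t nx_, std::size_t ny_, std::size_t nz_, float init = 0.0f)
        : nx(nx_), ny(ny_), nz(nz_), v(nx_ * ny_ * nz_, init) {}
    float& operator()(std::size_t i, std::size_t j, std::size_t k)
    { return v[(i * ny + j) * nz + k]; }
    float operator()(std::size_t i, std::size_t j, std::size_t k) const
    { return v[(i * ny + j) * nz + k]; }
};

// One time step of the Hx update for a homogeneous medium:
// hx = c1 * hx + c2 * (dEy/dz - dEz/dy), with c1 and c2 as in (5.3a).
void update_hx(Field& hx, const Field& ey, const Field& ez,
               float mu, float sigma_m, float dt, float dy, float dz)
{
    const float c1 = (mu - 0.5f * dt * sigma_m) / (mu + 0.5f * dt * sigma_m);
    const float c2 = dt / (mu + 0.5f * dt * sigma_m);
    for (std::size_t i = 0; i < hx.nx; ++i)
        for (std::size_t j = 0; j + 1 < hx.ny; ++j)
            for (std::size_t k = 0; k + 1 < hx.nz; ++k)   // contiguous inner loop
                hx(i, j, k) = c1 * hx(i, j, k)
                    + c2 * ((ey(i, j, k + 1) - ey(i, j, k)) / dz
                          - (ez(i, j + 1, k) - ez(i, j, k)) / dy);
}
```

The innermost loop runs along z, the contiguous direction, so that it is the natural candidate for the vector unit; the other five components follow the same pattern with the differences taken from (5.3b) to (5.3f).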



Table 5.3
Memory Comparison Between Regular Array Allocation and Material List

                   Real Material Array       Material List + Reference Array
Storage            12 3-D floating arrays    One 3-D pointer array + one 2-D floating array
Floats per cell    12                        2 (one 64-bit pointer equals two float numbers)
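
The material list with a reference array can be sketched as follows. This is an illustration with hypothetical names; note that it swaps the 64-bit pointer of Table 5.3 for a 32-bit index into the list, which is an assumption of this sketch (it requires fewer than 2^32 distinct media) and halves the reference array once more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical record: the 12 material parameters of one medium
// (four parameters for each of the x, y, and z components).
struct Material { float eps[3], sigma[3], mu[3], sigma_m[3]; };

// Material list plus reference array: each cell stores only an index.
struct MaterialGrid {
    std::size_t nx, ny, nz;
    std::vector<Material> list;        // one entry per distinct medium
    std::vector<std::uint32_t> idx;    // one index per cell, z contiguous

    MaterialGrid(std::size_t nx_, std::size_t ny_, std::size_t nz_,
                 std::vector<Material> media)
        : nx(nx_), ny(ny_), nz(nz_), list(std::move(media)),
          idx(nx_ * ny_ * nz_, 0) {}

    // Index of the medium assigned to cell (i, j, k).
    std::uint32_t& cell(std::size_t i, std::size_t j, std::size_t k)
    { return idx[(i * ny + j) * nz + k]; }

    // Material parameters seen by cell (i, j, k).
    const Material& at(std::size_t i, std::size_t j, std::size_t k) const
    { return list[idx[(i * ny + j) * nz + k]]; }
};
```

Because a typical model contains only a few distinct media, the list stays small enough to remain in cache, while the per-cell storage drops from 12 floats to a single index.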

Now we describe the information exchange strategy between two adjacent
subdomains in the FDTD simulation. An overlapped cell is employed to
improve the code performance and robustness, as shown in Figure 5.17.
The right magnetic field H is an inner point of domain B and can be
updated using the regular update equations. However, it is an outer
point of domain A and hence cannot be updated in domain A. Therefore,
the values of these H fields calculated in domain B are sent from
domain B to domain A through the MPI functions [26–29].

[Figure 5.17: domain A and domain B share an overlapped cell; the overlap region contains a column of E fields with a column of H fields on either side.]

Figure 5.17 Overlapped cell between two adjacent domains.
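
The exchange can be illustrated in one dimension without MPI. In the sketch below, two subdomains share the point E[s]; each updates its inner H points, the outer H points are then copied from the neighbor (a plain assignment that stands in for the MPI send and receive), and both update E. The 1-D scheme, the coefficient c, and all names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Reference: single-domain 1-D update; e has N+1 points, h has N points.
void step_single(Vec& e, Vec& h, double c)
{
    const std::size_t N = h.size();
    for (std::size_t i = 0; i < N; ++i) h[i] += c * (e[i + 1] - e[i]);
    for (std::size_t i = 1; i < N; ++i) e[i] += c * (h[i] - h[i - 1]);
}

// Domain A stores E[0..s] and H[0..s]     (H[s]   is its outer point).
// Domain B stores E[s..N] and H[s-1..N-1] (H[s-1] is its outer point).
struct Split { Vec ea, ha, eb, hb; };

// Requires 1 <= s <= N-1.
Split make_split(const Vec& e, const Vec& h, std::size_t s)
{
    Split d;
    d.ea.assign(e.begin(), e.begin() + s + 1);
    d.ha.assign(h.begin(), h.begin() + s + 1);
    d.eb.assign(e.begin() + s, e.end());
    d.hb.assign(h.begin() + s - 1, h.end());
    return d;
}

void step_split(Split& d, double c)
{
    const std::size_t s = d.ea.size() - 1;
    const std::size_t nb = d.hb.size();
    // Each domain updates its inner H points only.
    for (std::size_t i = 0; i < s; ++i) d.ha[i] += c * (d.ea[i + 1] - d.ea[i]);
    for (std::size_t m = 1; m < nb; ++m) d.hb[m] += c * (d.eb[m] - d.eb[m - 1]);
    // Exchange of the outer points (stands in for the MPI communication).
    d.ha[s] = d.hb[1];       // H[s]:   domain B -> domain A
    d.hb[0] = d.ha[s - 1];   // H[s-1]: domain A -> domain B
    // Both domains update E, including the shared point E[s].
    for (std::size_t i = 1; i <= s; ++i) d.ea[i] += c * (d.ha[i] - d.ha[i - 1]);
    for (std::size_t m = 0; m + 1 < nb; ++m) d.eb[m] += c * (d.hb[m + 1] - d.hb[m]);
}
```

After any number of steps, the two subdomains hold the same field values as the single-domain reference, which is the property the overlapped cell is meant to guarantee.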

The Xeon Phi coprocessor is thread-hungry. The 5110P model needs at least
120 threads to be fully loaded. Unlike on traditional CPUs, we therefore
perform the multithreading on the x-y plane instead of in the
x-direction only, as shown in Figure 5.18 and demonstrated in Listing 5.5.

Listing 5.5 Code segment for the thread division in the x-y plane
int n = (nx + 1) * (ny + 1);
int nthreads, nsize;
#pragma omp parallel
{
    #pragma omp single
    {
        nthreads = omp_get_num_threads();
        // round the block size up when n is not a multiple of nthreads
        if (n % nthreads) nsize = n / nthreads + 1;
        else nsize = n / nthreads;
    }
    int idthread = omp_get_thread_num();
    int n1, n2;
    n1 = idthread * nsize;
    n2 = (idthread + 1) * nsize;
    if (n1 > n) n1 = n;
    if (n2 > n) n2 = n;
    << Call field update from n1 to n2 >>
}

[Figure 5.18: the cells of the x-y plane are assigned to the cores in consecutive blocks, Core 1 through Core 5.]

Figure 5.18 Thread job assignments for the Xeon Phi coprocessor.
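
The block assignment of Listing 5.5 can be isolated in a small function and checked on its own. The sketch below uses a hypothetical name, thread_range, and returns the half-open range of x-y columns for one thread; the block size is rounded up so that all n columns are covered.

```cpp
#include <cassert>
#include <utility>

// Half-open range [first, second) of x-y columns handled by thread 'id'
// when n columns are split into blocks of ceil(n / nthreads) columns.
std::pair<int, int> thread_range(int n, int nthreads, int id)
{
    assert(nthreads > 0 && id >= 0 && id < nthreads);
    int nsize = (n + nthreads - 1) / nthreads;   // ceiling division
    int n1 = id * nsize;
    int n2 = n1 + nsize;
    if (n1 > n) n1 = n;   // threads beyond the data get an empty range
    if (n2 > n) n2 = n;
    return {n1, n2};
}
```

For n = 10 and four threads, the ranges are [0, 3), [3, 6), [6, 9), and [9, 10); no column is skipped or assigned twice.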

5.3.4 Job Scheduling Strategy

Each core of a Phi coprocessor supports up to four hardware threads, and
there is no penalty for switching among these threads. However, all
threads in one core share the same L1 and L2 caches. If the data used by
one thread overwrites the cached data of another during a thread switch,
the cache hit ratio will be low. We can therefore arrange the threads to
make memory access as local as possible to improve the code performance.
Job scheduling has the following four strategies, which are controlled
by the environment variable KMP_AFFINITY. For example, if we have 61
cores (from 0 to 60) and 80 threads, the scheduling strategies are
described as follows [26]:

1. KMP_AFFINITY = compact

▪ 80 threads are mapped four threads per core to cores 0, 1, 2,..., 19.
▪ Allows scaling studies by fully loaded core count. The cores 20,..., 60
are not used in this case, as shown in Figure 5.19.

2. KMP_AFFINITY = scatter

▪ 80 threads: the first 61 threads are mapped one thread per core to
cores 0,…, 60, and the last 19 threads are mapped one (more) thread per
core to cores 0,…, 18, as shown in Figure 5.20.
▪ Allows “one thread per core” studies for 1 to 61 threads.

[Figure 5.19: cores 0 to 19 carry four threads each; the remaining cores carry none.]

Figure 5.19 Compact job scheduling for 80 jobs and 61 cores (4 or 0 threads per core).

[Figure 5.20: cores 0 to 18 carry two threads each; the remaining cores carry one thread each.]

Figure 5.20 Scatter job scheduling for 80 jobs and 61 cores (1 or 2 threads per core).

3. KMP_AFFINITY = balanced

▪ Variant of the scatter model.
▪ 80 threads: the first 38 threads are mapped two threads per core to
cores 0,..., 18, and the remaining 42 threads are mapped one thread per
core to cores 19,..., 60.
▪ Better if adjacent OpenMP threads share data (since the hardware
contexts of a core share the same L1 and L2 caches).

4. KMP_AFFINITY = explicit

▪ Allows exact specification of the mapping using the proclist modifier:

export KMP_AFFINITY='explicit,proclist=[0,1,2,3,4]'

▪ But watch out for unexpected logical to physical processor mapping and
unexpected OpenMP thread to logical processor mapping.
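
The three automatic placements can be modeled with a few lines of integer arithmetic. The sketch below is a model of the rules as described above, not a query of the OpenMP runtime; the function names are hypothetical, and the actual placement on a given system should be verified with the runtime's own diagnostics.

```cpp
#include <cassert>

// Core index of OpenMP thread t under the compact placement:
// cores are filled one after another with 'threads_per_core' threads.
int core_compact(int t, int threads_per_core = 4) { return t / threads_per_core; }

// Scatter placement: threads are dealt round-robin over all cores.
int core_scatter(int t, int ncores) { return t % ncores; }

// Balanced placement: like scatter in the per-core counts, but
// consecutive threads stay on the same core.
int core_balanced(int t, int nthreads, int ncores)
{
    int base = nthreads / ncores;        // threads that every core receives
    int extra = nthreads % ncores;       // the first 'extra' cores get one more
    int boundary = extra * (base + 1);   // threads living on the fuller cores
    if (t < boundary) return t / (base + 1);
    return extra + (t - boundary) / base;
}
```

For 80 threads on 61 cores, thread 79 lands on core 19 (compact), core 18 (scatter), or core 60 (balanced), and threads 0 and 1 share core 0 only in the compact and balanced cases.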

We use the scatter mode as an example here (see Listing 5.6) and select
the number of threads to be twice the number of cores. Then threads i
and (i+n), where n here denotes the number of cores, sit in the same
core. Since the data is continuous along the z-direction, we can split
the z-direction into two parts and let thread i work on the first part
and thread (i+n) work on the second one. This localizes the memory
access inside a core.

Listing 5.6 Code segment for the scatter mode

int n = (nx + 1) * (ny + 1);
int nthreads, nsize;
#pragma omp parallel
{
    #pragma omp single
    {
        nthreads = omp_get_num_threads();
        nthreads /= 2;   // number of blocks: two threads share one block
        // round the block size up when n is not a multiple of nthreads
        if (n % nthreads) nsize = n / nthreads + 1;
        else nsize = n / nthreads;
    }
    int idthread = omp_get_thread_num();
    int flag = idthread / nthreads;   // 0: first half in z, 1: second half
    idthread = idthread % nthreads;
    int n1, n2;
    n1 = idthread * nsize;
    n2 = (idthread + 1) * nsize;
    if (n1 > n) n1 = n;
    if (n2 > n) n2 = n;
    << Call field update from n1 to n2 with flag >>
}

where the number of blocks equals the number of cores, the parameter flag = 0
for thread i, and the parameter flag = 1 for thread (i+n), with n the number of
cores.
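The index arithmetic of Listing 5.6 can be checked without OpenMP. The sketch below is a minimal stand-alone version under the same assumptions (twice as many threads as blocks, scatter placement); the struct and function names are hypothetical.

```c
#include <assert.h>

/* Column range [n1, n2) and z-half (flag) handled by one thread. */
typedef struct { int n1, n2, flag; } BlockRange;

BlockRange thread_block(int n, int nthreads_total, int idthread)
{
    assert(n > 0 && nthreads_total >= 2 && nthreads_total % 2 == 0);
    assert(idthread >= 0 && idthread < nthreads_total);
    int nblocks = nthreads_total / 2;          /* one block per core */
    int nsize = (n + nblocks - 1) / nblocks;   /* ceiling of n / nblocks */
    BlockRange r;
    r.flag = idthread / nblocks;               /* 0 or 1 */
    int block = idthread % nblocks;
    r.n1 = block * nsize;
    r.n2 = r.n1 + nsize;
    if (r.n2 > n) r.n2 = n;
    if (r.n1 > n) r.n1 = n;                    /* empty range for surplus blocks */
    return r;
}
```

For n = 10 columns and 8 threads, threads 1 and 5 both receive columns 3 to 5 and differ only in flag, which is the pairing the text describes for threads i and (i+n).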

5.3.5 FDTD Code Development

Parallel processing on the Xeon Phi coprocessor has four levels, namely,
card level, core level, thread level, and vector unit level [18, 19], as shown in
Figure 5.21. The code can be assigned to different cards, cores, threads, and
vector units. A pseudo code segment is shown in Listing 5.7. We use the FDTD
method to demonstrate how to develop the parallel code on the Xeon Phi
coprocessor platform. If the storage of a 3-D array in memory is contiguous along
the z-direction, we divide the data into 60 blocks in the x-y plane, which is equal
to the number of cores in the Xeon Phi coprocessor, as shown in Figure 5.22. In
each block, one column at a time is selected and assigned to two threads of one
core, so that all cores work concurrently and each thread works on one half-column
of the selected data. The vector unit works on 16 adjacent data elements at the
same time and generates 16 results in each cycle.

[Diagram: parallel code mapped to four levels: 1, cluster (MPI); 2, CPU (OpenMP); 3, compute core (thread-parallel processing); 4, vector unit (SSE/AVX/FMA/MIC).]

Figure 5.21 Four-level parallel processing strategy in the CEM.



Listing 5.7 Pseudo code segment for four parallel levels


#pragma omp target device(0)                       // Cards
#pragma omp teams num_teams(60) thread_limit(4)
{
#pragma omp distribute                             // Cores
    for (int i = 0; i < 2048; i++)
    {
#pragma omp parallel for                           // Threads
        for (int j = 0; j < 512; j++)
        {
#pragma omp simd                                   // Vector units
            for (int k = 0; k < 32; k++)
            {
                update(i, j, k);
            }
        }
    }
}

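[Diagram: one of the 60 blocks (1/60 of the x-y plane) is assigned to one core; thread 1 and thread 2 of that core each take half of the selected column.]

Figure 5.22 Job division and assignment for threads and cores inside the Xeon Phi coprocessor.
© ACES 2014 [19]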

The Xeon Phi coprocessor supports a 512-bit SIMD instruction set, referred to
here as AVX-512 or the MIC instruction set. It can process 16 single-precision
floating-point operations (32 with the FMA feature) in a single instruction, and
it requires the data to be aligned to 64 bytes. We apply the AVX-512 operations to
the data along the z-direction, so each column of data along the z-direction must
be aligned to 64 bytes. Zero padding in the z-direction may be required to achieve
this. For example, suppose that an array D represents one of the electric or
magnetic fields, D[*][*][0] is the first element of each column, and there are
12 grids along the z-direction. Each column then occupies only 48 bytes, so
consecutive columns do not start on 64-byte boundaries, as shown in Figure 5.23.
Although the array D itself is aligned when it is allocated, D[*][*][0] is not
guaranteed to be aligned. To align every column to 64 bytes, we need to pad zeroes
along the z-direction, in this case four elements (16 bytes), as shown in
Figure 5.24. We will use 32 instead of 16 for coding convenience since two threads
are employed in each core. The zero padding is realized by using the following
statement:

if (nz % 16) nz = (nz / 16 + 1) * 16;

where the integer nz is the number of grids along the z-direction.
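As a quick check of the padding rule, the sketch below rounds a column length up to a multiple of the vector width and computes where each column starts in memory. The helper names are hypothetical, and single-precision (4-byte) values are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of m (n itself if already a multiple). */
int round_up(int n, int m)
{
    assert(n >= 0 && m > 0);
    return (n + m - 1) / m * m;
}

/* Byte offset of column `col` from the start of the array when every
   column holds nz single-precision values. */
size_t column_offset_bytes(size_t col, int nz)
{
    return col * (size_t)nz * sizeof(float);
}
```

With nz = 12 the second column starts at byte 48, which is not a multiple of 64; after rounding nz up to 16 it starts at byte 64.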

[Diagram: D[0][0] starts at byte offset 0 and D[0][1] at byte offset 48.]

Figure 5.23 Without padding, the data at D[0][1][0] is not aligned to 64 bytes.

[Diagram: with zero padding, D[0][0], D[0][1], and D[0][2] start at byte offsets 0, 64, and 128.]

Figure 5.24 After zero padding, every column is aligned to 64 bytes.

During code development, we use two hardware threads in each core for
convenience and efficiency. If we use only one thread in each core, the core is
idle for half of the cycles; namely, the efficiency is only 50%. Our experience
shows no difference in code performance between four and two threads per core in
the FDTD simulation. If we use all four threads in each core, the number of grids
along the z-direction must be a multiple of 64, which requires more zero padding.
That is the main reason why we use two threads in each core, which is preferred
for the smaller problems.
The grids along the z-direction are grouped by 16. Two threads in one core
alternately work on these groups, as shown in Figure 5.25.

[Diagram: the z-direction is divided into groups of 16 grids that are assigned alternately to thread i and thread i+n.]

Figure 5.25 Data groups by 16 along the z-direction.

Ideally, the update using the MIC instructions will be 16 times faster than a
regular floating point unit. A typical pseudo code segment is given in Listing 5.8.

Listing 5.8 Pseudo code segment for the MIC demonstration


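for (n = n1; n < n2; n ++)          // columns assigned to this thread pair
{
    i = n / (ny + 1);               // x index of the column
    j = n % (ny + 1);               // y index of the column
    if (i == 0) continue;           // skip the boundary columns
    if (j == 0) continue;
    // vk indexes 16-element vectors and k the first cell of each vector;
    // the two threads of a core take alternate vectors (flag = 0 or 1).
    for (vk = flag, k = 16 * flag; vk < nvk; vk += 2, k += 32)
    {
        code update;
    }
}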

The update of the electric fields, Ex, Ey, and Ez, on the Xeon Phi coprocessor
platform is demonstrated in Listing 5.9.

Listing 5.9 Electric field update on the Xeon Phi coprocessor


// Ex component
ssereg4 = _mm512_sub_ps(vHz[vk], vHz01[vk]);
ssereg4 = _mm512_mul_ps(ssereg4, vrHdy);
ssereg3 = _mm512_sub_ps(vHy[vk], ssereg1);
ssereg3 = _mm512_fmsub_ps(ssereg3, vrHdz[vk], ssereg4);
ssereg3 = _mm512_mul_ps(ssereg3, vCexh);
vEx[vk] = _mm512_fmsub_ps(vEx[vk], vCexe, ssereg3);

// Ey component
ssereg4 = _mm512_sub_ps(vHx[vk], ssereg2);
ssereg4 = _mm512_mul_ps(ssereg4, vrHdz[vk]);
ssereg3 = _mm512_sub_ps(vHz[vk], vHz10[vk]);
ssereg3 = _mm512_fmsub_ps(ssereg3, vrHdx, ssereg4);


ssereg3 = _mm512_mul_ps(ssereg3, vCeyh);
vEy[vk] = _mm512_fmsub_ps(vEy[vk], vCeye, ssereg3);

// Ez component
ssereg4 = _mm512_sub_ps(vHy[vk], vHy10[vk]);
ssereg4 = _mm512_mul_ps(ssereg4, vrHdx);
ssereg3 = _mm512_sub_ps(vHx[vk], vHx01[vk]);
ssereg3 = _mm512_fmsub_ps(ssereg3, vrHdy, ssereg4);
ssereg3 = _mm512_mul_ps(ssereg3, vCezh);
vEz[vk] = _mm512_fmsub_ps(vEz[vk], vCeze, ssereg3);
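To make the operation order of the Ex update in Listing 5.9 easier to follow, here is a scalar sketch of the same sequence for a single cell. It rests on assumptions about data that the listing does not define: hz_jm1 stands for the value held in vHz01, hy_km1 for the value held in ssereg1, and both function names are hypothetical.

```c
#include <assert.h>

/* One-cell Ex update with the same sub/mul/fmsub order as the vector code. */
float ex_update_scalar(float ex, float cexe, float cexh,
                       float hz, float hz_jm1, float rdy,
                       float hy, float hy_km1, float rdz)
{
    float t4 = (hz - hz_jm1) * rdy;   /* sub, mul: (Hz difference) / dy */
    float t3 = hy - hy_km1;           /* sub */
    t3 = t3 * rdz - t4;               /* fmsub: a * b - c */
    t3 = t3 * cexh;                   /* mul */
    return ex * cexe - t3;            /* fmsub: a * b - c */
}

/* The same update written as a single expression. */
float ex_update_reference(float ex, float cexe, float cexh,
                          float hz, float hz_jm1, float rdy,
                          float hy, float hy_km1, float rdz)
{
    return cexe * ex
         + cexh * ((hz - hz_jm1) * rdy - (hy - hy_km1) * rdz);
}
```

The two subtracting fused operations cancel each other's sign, so the instruction sequence evaluates the usual curl-of-H update with a positive coefficient on the bracketed term.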

5.3.6 Matrix Multiplication

In both FEM and MoM, solving matrix equations is the most time consuming part.
Now, we investigate how to use the Phi coprocessor to calculate the multiplication
of two matrices. We begin with the following equation:

\[ C_{n \times l} = A_{n \times m} \times B_{m \times l} \tag{5.4} \]

where Cn×l, An×m, and Bm×l are matrices and the subscripts indicate the numbers of
rows and columns of the corresponding matrices. If we allocate the memory for each
matrix in (5.4) as a 1-D array and map it to a 2-D array, a matrix with 5 × 3
elements is mapped from a 1-D array with 15 elements, as shown in Figure 5.26. The
detailed mapping relationship from 1-D array to 2-D array was discussed in
Section 3.2.4. It is obvious from Figure 5.26 that the data of the matrix C is
contiguous along its column index l.

[Diagram: the 15 elements of a 1-D array mapped onto the 5 × 3 matrix elements C00 to C42, with index n along one axis and index l along the other; elements that differ only in l are adjacent in the 1-D array.]

Figure 5.26 Mapping relationship from 1-D to 2-D array.



If we allocate the matrices A and B in the same way, the elements in a column
of matrix B are not contiguous in memory; in turn, the multiplication of the
matrices A and B is very inefficient. It is well known that transposing B speeds
up the matrix multiplication. If we calculate the matrix multiplication on the Phi
coprocessor, we observe that the calculation for smaller A and B is much faster
than for larger A and B. This happens because the elements of smaller A and B can
be held in the cache, which increases the cache hit ratio. The cache hit ratio
becomes lower when matrices A and B become larger.
We use the following strategy to improve the compute performance. Each
element in matrix C can be calculated in the way described in Figure 5.27.

[Diagram: C = A × B; each element of C is formed from a row of A and a column of B.]

Figure 5.27 Multiplication of the matrices A and B.

Each core of the Xeon Phi coprocessor has a 512-KB, 8-way L2 cache. The
critical stride is 512/8 = 64 KB. If the column size is a multiple of 64 KB, or
16,384 single-precision floating-point numbers, we append a zero-padding area of
one cache line (16 numbers) to each column.

if (m % 16) m = (m / 16 + 1) * 16;
if (m % 16384 == 0) m += 16;
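The two statements above can be wrapped in a small helper to see how the numbers fit together. This is a sketch with hypothetical function names; it assumes 4-byte values and a SIMD width that is smaller than the critical stride.

```c
#include <assert.h>

/* Critical stride of a set-associative cache in bytes:
   cache size divided by the number of ways. */
long critical_stride_bytes(long cache_bytes, int ways)
{
    assert(cache_bytes > 0 && ways > 0);
    return cache_bytes / ways;
}

/* Padded column length in floats: rounded up to a multiple of simd_width,
   then shifted by one vector if it lands on a multiple of the stride. */
long pad_column(long m, long simd_width, long stride_floats)
{
    assert(m > 0 && simd_width > 0 && stride_floats > simd_width);
    if (m % simd_width) m = (m / simd_width + 1) * simd_width;
    if (m % stride_floats == 0) m += simd_width;
    return m;
}
```

For a 512-KB, 8-way cache the stride is 65,536 bytes, or 16,384 floats, so a column of 16,384 elements is stored with a length of 16,400 and successive columns no longer map to the same cache sets.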

Data continuity is a key factor in performance improvement. However, the
matrix multiplication needs to calculate the inner product of a row of matrix A and
a column of matrix B. That means that we cannot get contiguous data for both of
them, as shown in Figure 5.28. The situation is even worse with AVX-512
instructions, which operate only on vectors: we need additional load
instructions to feed the vector registers.
The solution is quite simple. Since the data of each matrix is contiguous along
its column index, we use the transpose of matrix B instead. The transpose is an
extra cost, but its computational complexity is only O(n^2) compared with O(n^3)
for the multiplication.
Each element of the matrices A and B is used l and n times, respectively.
The previously used data should be kept in the cache as long as possible, which
can be realized by making the data access local. We break the matrices into blocks
and perform the calculation on each block, as shown in Figure 5.29. After that,
we sum the results of the blocks to get the final result. This method ensures
that the previously used data in each block survives in the cache during the whole
block calculation.
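The two ideas in this passage, transposing B and working block by block, can be sketched in plain C without intrinsics. This is a minimal illustration under stated assumptions (row-major storage, hypothetical function names, an arbitrary block size bs), not the tuned code used for the timings in this section.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* bt = transpose of b, where b is m x l and bt is l x m (row-major). */
void transpose(size_t m, size_t l, const float *b, float *bt)
{
    for (size_t k = 0; k < m; k++)
        for (size_t j = 0; j < l; j++)
            bt[j * m + k] = b[k * l + j];
}

/* c (n x l) += a (n x m) * b, with b supplied as its transpose bt (l x m).
   bs is the block edge length. */
void matmul_blocked_bt(size_t n, size_t m, size_t l,
                       const float *a, const float *bt, float *c, size_t bs)
{
    assert(bs > 0);
    for (size_t i0 = 0; i0 < n; i0 += bs)
        for (size_t j0 = 0; j0 < l; j0 += bs)
            for (size_t k0 = 0; k0 < m; k0 += bs) {
                size_t i1 = min_sz(i0 + bs, n);
                size_t j1 = min_sz(j0 + bs, l);
                size_t k1 = min_sz(k0 + bs, m);
                for (size_t i = i0; i < i1; i++)
                    for (size_t j = j0; j < j1; j++) {
                        float sum = 0.0f;
                        /* both operands are read with unit stride */
                        for (size_t k = k0; k < k1; k++)
                            sum += a[i * m + k] * bt[j * m + k];
                        c[i * l + j] += sum;  /* add this block's partial product */
                    }
            }
}
```

The caller zeroes c first; the innermost loop then walks both a and bt contiguously, which is the access pattern that vector loads need.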

[Diagram: a row of A paired with a column of B, and the same row of A paired with a row of the transpose of B.]

Figure 5.28 Improve the matrix operation by introducing a transpose matrix.

[Diagram: C = A × B rewritten as a sum of products of blocks of A and blocks of B.]

Figure 5.29 Break the matrix into smaller blocks for the better code performance.

Now we use the following code to explain the programming techniques on the
Phi coprocessor. A 5110P Phi coprocessor includes 60 cores and 240 hardware
threads. We use OpenMP to split the task into smaller pieces based on the number
of threads (nThreads = 240 for the 5110P Xeon Phi coprocessor). We split the
matrix A into nn × mm blocks and the matrix B into mm × ll blocks so that the
matrix C has nn × ll blocks. The block size of the matrix C is 16 × 16; the block
size of the matrix A is 16 × 64; and the block size of the matrix B is 64 × 16.
This ensures that the data is contiguous in memory. The index k (the column of
matrix A and row of matrix B) is counted in units of 16 elements, the width of the
512-bit SIMD instruction, and the constant nBlockSizeBySIMD is 4 in this case
(64/16) for the better code performance. The code for a block matrix operation is
shown in Listing 5.10.

Listing 5.10 Code for the block matrix operation


#pragma omp parallel for
for (iThread = 0; iThread < nThreads; iThread ++)
{
    VECT v;
    int t, i, j, k, i0, j0, i1, j1;
    // Each thread takes every nThreads-th block of C.
    for (t = iThread; t < nn * ll; t += nThreads)
    {
        i0 = (t / ll) * nBlockSizeA;   // first row of the C block
        j0 = (t % ll) * nBlockSizeA;   // first column of the C block
        i1 = i0 + nBlockSizeA;
        j1 = j0 + nBlockSizeA;
        // k counts 16-element vectors; vB is indexed as the transpose of B.
        for (k = 0; k < mm * nBlockSizeBySIMD; k += nBlockSizeBySIMD)
        {
            for (i = i0; i < i1; i ++)
            {
                for (j = j0; j < j1; j ++)
                {
                    v = _mm512_set1_ps(0.0);
                    v = _mm512_fmadd_ps(vA[i][k],vB[j][k],v);
                    v = _mm512_fmadd_ps(vA[i][k+1],vB[j][k+1],v);
                    v = _mm512_fmadd_ps(vA[i][k+2],vB[j][k+2],v);
                    v = _mm512_fmadd_ps(vA[i][k+3],vB[j][k+3],v);
                    C[i][j] += _mm512_reduce_add_ps(v);   // sum of the 16 lanes
                }
            }
        }
    }
}

The code for a nonblock matrix operation is shown in Listing 5.11.

Listing 5.11 Code for the nonblock matrix operation


#pragma omp parallel for private(j, k)
for (i = 0; i < n; i ++)
{
for (j = 0; j < l; j ++)
{
#pragma ivdep
for (k = 0; k < m; k ++)
{
C[i][j] += A[i][k] * B[k][j];
}
}
}

The nonblock matrix code applies OpenMP along the row direction of C and
applies AVX-512 to the calculation of each element of C. The block matrix code
applies OpenMP to the set of blocks and applies AVX-512 to the calculation of each
block element.
For two matrices A and B with 2,048 × 2,048 elements, the running time of the
multiplication without the block (domain decomposition) technique on an Intel Xeon
E5-2640 v2 Ivy Bridge CPU is 0.63 second, and the running time with the block
technique on the same CPU is 0.4 second. The running time with the block technique
on the Xeon Phi coprocessor 5110P is 0.15 second, as shown in Figure 5.30.

[Bar chart: running time of 2,048 × 2,048 matrix multiplication: Intel Xeon E5-2640 v2 (scheme A), 0.63; Intel Xeon E5-2640 v2 (scheme B), 0.4; Intel Xeon Phi 5110P (scheme B), 0.15.]

Figure 5.30 Performance on the Phi coprocessor using different methods.

5.4 NUMERICAL RESULTS

In this section, we use the parallel FDTD code based on the Xeon Phi coprocessor
to demonstrate the performance of the Phi coprocessor [18, 19]. The host computer
includes two Intel Xeon E5-2640 v2 CPUs with 32 GB of DDR3 RAM, and one
5110P Xeon Phi coprocessor is mounted on the host through a PCI Express ×16 slot.
The Xeon Phi coprocessor is installed with 8 GB of GDDR5 RAM. We use a typical
example, an empty box truncated by the PEC boundary, to demonstrate the
performance of the Xeon Phi coprocessor. The first problem size we test is 1.29 GB,
for which a single Phi card achieves 1,200 MCPS (million cells per second), as
defined in (3.4). The performance increases to 1,350 MCPS when the problem size
increases to 7.2 GB, as shown in Figure 5.31. For the sake of comparison, we plot
the performance of the Xeon E5-2640 v2 (8-core 2.0 GHz CPU) in the same figure. We
also demonstrate the performance of the Phi coprocessor for smaller problems, such
as 200,000 cells, as shown in Figure 5.32.

[Plot: performance (Mcells/sec) versus problem size (Mcells, 27 to 150) for the Phi coprocessor and the Xeon E5-2640 v2; data points are annotated with memory sizes of 1.29 GB, 3.07 GB, and 7.2 GB.]

Figure 5.31 Performance of the 5110P Xeon Phi coprocessor for the parallel FDTD code for regular
size of problems. © ACES 2014 [19]

[Plot: performance (Mcells/sec) versus problem size (Mcells, 0.2 to 1.84); data points are annotated with memory sizes of 0.0096 GB, 0.0394 GB, and 0.0883 GB.]

Figure 5.32 Performance of the 5110P Xeon Phi coprocessor for the parallel FDTD code for smaller
size of problems. © ACES 2014 [19]

We next use the code based on the Phi coprocessor to simulate a time-domain
reflectometer (TDR) problem. A discontinuous structure is fed by a pair of parallel
plates filled with dielectric material. We investigate how to obtain an accurate
TDR using the FDTD method [30-35]. If the incident pulse has a narrow Gaussian
shape, the TDR is expressed as:

\[ \mathrm{TDR} = \frac{\int_0^{T_0} f(t)\,dt - \int_0^{t_0} g(t)\,dt}{\int_0^{t_0} g(t)\,dt} \tag{5.5} \]

where t0 is the width of the Gaussian pulse, T0 is the simulation time, and g(t)
and f(t) are the incident Gaussian pulse and reflected signal, respectively. The
corresponding time-domain impedance is defined as [30]:

\[ Z = Z_0 \frac{1 + \mathrm{TDR}}{1 - \mathrm{TDR}} \tag{5.6} \]

where Z0 is the characteristic impedance of the feed microstrip structure. In
practical applications, we simulate the problem one time and separate the incident
pulse and reflected signal from the total time-domain signal. A typical excitation
port with three excitations and one output is shown in Figure 5.33.
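The relation between the TDR value and the time-domain impedance can be sketched as follows, assuming that (5.6) has the standard reflection-coefficient form Z = Z0 (1 + TDR) / (1 − TDR). The function names are hypothetical.

```c
#include <assert.h>

/* Time-domain impedance from the TDR value, following the assumed form of (5.6). */
double tdr_impedance(double z0, double tdr)
{
    assert(tdr != 1.0);   /* TDR = 1 corresponds to an open circuit */
    return z0 * (1.0 + tdr) / (1.0 - tdr);
}

/* Inverse relation: TDR value of an impedance z seen from a line of impedance z0. */
double tdr_from_impedance(double z0, double z)
{
    assert(z + z0 != 0.0);
    return (z - z0) / (z + z0);
}
```

A TDR of zero returns the feed impedance Z0 itself, so for a uniform 50-ohm line any deviation of the computed impedance from 50 ohms is a direct measure of the error in the TDR calculation.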
The observation point is required to be placed as close to the excitation source
as possible to reduce the size of the computational domain in practical applications.
However, if the output point is too close to the excitation source, the higher modes
will be involved in the output voltage, which, in turn, deteriorates the simulation
result. In this section, we investigate how to improve the simulation result by
adding two more observation points across the microstrip feed line. It is
straightforward to take the average of three voltages measured at the different
locations, which is called the arithmetic mean:

\[ V = \frac{V_1 + V_2 + V_3}{3} \tag{5.7} \]

where V1, V2, and V3 are the time domain voltages measured at three different
locations, as shown in Figure 5.33(b). The TDR parameter can be calculated using
the arithmetic mean voltage. Now, we define a new parameter, namely, geometric
mean voltage [34]:

\[ V = \sqrt[3]{V_1 \cdot V_2 \cdot V_3} \tag{5.8} \]

The voltages V1, V2, and V3 in (5.7) and (5.8) measured at the center and two
edges might be different due to the finite width of the microstrip in the feed
port. Equations (5.7) and (5.8) are two ways to calculate the average of three
output variables measured across the microstrip. The arithmetic mean in (5.7)
stands for the average of three time-domain signals measured across the
microstrip, and the geometric mean in (5.8) indicates the central tendency of
three measured time-domain signals. When the observation points are located away
from the excitation source, the arithmetic and geometric means generate the same
results.
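The two averages can be compared with a few lines of C applied to one time sample. This is a sketch with hypothetical function names; the cube root is written out as a Newton iteration so that the block has no library dependency, and `cbrt()` from `<math.h>` would normally be used instead. It is intended for values of moderate magnitude.

```c
#include <assert.h>

/* Arithmetic mean of three samples, as in (5.7). */
double arithmetic_mean3(double v1, double v2, double v3)
{
    return (v1 + v2 + v3) / 3.0;
}

/* Real cube root by Newton iteration on y^3 = |x|, keeping the sign of x. */
static double cube_root(double x)
{
    if (x == 0.0) return 0.0;
    double a = x < 0.0 ? -x : x;
    double y = a > 1.0 ? a : 1.0;            /* start above the root */
    for (int it = 0; it < 200; it++)
        y = (2.0 * y + a / (y * y)) / 3.0;   /* Newton step */
    return x < 0.0 ? -y : y;
}

/* Geometric mean of three samples, as in (5.8). */
double geometric_mean3(double v1, double v2, double v3)
{
    return cube_root(v1 * v2 * v3);
}
```

When the three samples are equal, both functions return the same value; they differ once the samples spread apart, as they can when the observation points are close to the source.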

[Diagram (a): configuration of parallel microstrip lines on a substrate with a single output (observation voltage) and three excitations.]
[Diagram (b): outputs measured at the central and edge microstrip (observation voltages 1, 2, and 3) with three excitations.]

Figure 5.33 Port configuration with excitation and output voltage sampling: (a) one output sampled
at the central stripline; and (b) three outputs sampled at the center and two edges. ©2014
IEEE [30].

As a test example, a pair of uniform strips has a 50-Ω characteristic impedance,
as shown in the inset of Figure 5.34. The output points are 0.01 inch from the
excitation source and the 3-dB bandwidth of the incident Gaussian pulse is 20 GHz.
We use three excitation sources uniformly distributed at the end of the microstrip.
A six-layer CPML [20] is used to truncate the six sides of the computational domain
and there is no white space between the CPML layers and the simulated structure.
The time-domain impedances obtained from the regular method (one observation
point), arithmetic mean voltage (three points), and geometric mean voltage (three
points) are plotted in Figure 5.34. Since the characteristic impedance of the
uniform microstrip structure is 50 Ω, it is obvious from Figure 5.34 that the
geometric mean method is more accurate than both the regular method and the
arithmetic mean, which is also based on three observation points. The numerical
experiment in Figure 5.34 also shows that the arithmetic mean does not improve the
TDR calculation. The curve "Regular" in Figure 5.34 indicates the result calculated
by using a single-point output. Due to the numerical dispersion, the incident
signal measured in the simulation may not be the same as the ideal Gaussian pulse.
Therefore, the truncation of the incident pulse will affect the result slightly.
Here, we set the truncation criterion to 0.1% of the peak value of the incident
pulse.

Figure 5.34 Time domain impedance of the uniform microstrip structure using different algorithms.
©2014 IEEE [30].

The next example is a practical design that includes a discontinuous coaxial
structure excited by a pair of parallel microstrips, as shown in Figure 5.35. The
width of the feed microstrip is 14.22 mm and the thickness of the substrate is
0.508 mm. The top of the coaxial cable and the excitation port are terminated by a
six-layer CPML to ensure that there is no numerical truncation reflection from the
end of the coaxial cable and feed microstrips. Three voltage excitations are
located at the end of the feed microstrips, and three voltage outputs are located
0.254 mm from the excitation source. We calculate the time-domain impedance from
the central voltage output only, from the arithmetic mean (5.7) of three voltage
outputs, and from the geometric mean (5.8) of three voltage outputs, respectively.
The time-domain impedance is plotted in Figure 5.36. The variation trend of the
time-domain impedance is similar for all three cases.

Figure 5.35 A nonuniform coaxial microwave connector fed by using a pair of PEC plates.

Figure 5.36 Time-domain impedance of the discontinuous coaxial connector using the different
approaches. ©2014 IEEE [30].

REFERENCES

[1] http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions.
[2] http://en.wikipedia.org/wiki/AVX.
[3] W. Yu, X. Yang and W. Li, VALU, AVX, GPU Acceleration Techniques for Parallel Finite
Difference Time Domain Methods, Raleigh, NC: SciTech Publisher Inc., 2013.
[4] http://www.amd.com.
[5] http://www.intel.com.
[6] http://www.intel.com/content/www/us/en/io/quickpath-technology/quickpath-technology-general.html.
[7] http://sites.amd.com/us/documents/48101a_opteron%20_6000_qrg_rd2.pdf.
[8] http://softpixel.com/~cwright/programming/simd/sse.php.
[9] http://neilkemp.us/src/sse_tutorial/sse_tutorial.html.
[10] https://developer.apple.com/hardwaredrivers/ve/sse.html.
[11] http://en.wikipedia.org/wiki/Advanced_Vector_Extensions.
[12] http://software.intel.com/en-us/avx.
[13] http://devgurus.amd.com/thread/159669.
[14] http://lomont.org/Math/Papers/2011/Intro%20to%20Intel%20AVX-Final.pdf.
[15] https://software.intel.com/en-us/mic-developer#pid-11757-1231.
[16] Intel® Xeon Phi™ Coprocessor: System Software Developers Guide,
http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-coprocessor-system-software-developers-guide.html.
[17] http://en.wikipedia.org/wiki/Xeon_Phi.
[18] X. Yang and W. Yu, “Phi Coprocessor Acceleration Techniques for Finite Difference Time
Domain Methods,” IEEE International Symposium on Antennas and Propagation and USNC-
URSI Radio Science Meeting, Memphis TN, July 2014.
[19] X. Yang and W. Yu, “Phi Coprocessor Acceleration Techniques for Computational
Electromagnetic Methods,” Applied Computational Electromagnetics Society Journal, Vol. 29,
No. 12, pp. 1013-1017, 2014.
[20] W. Yu, et al., Parallel Finite Difference Time Domain Method, Norwood, MA: Artech House,
2006.
[21] A. Taflove and S. Hagness, Computational Electromagnetics: The Finite-Difference Time-
Domain Method, 3rd ed., Norwood, MA: Artech House, 2005.
[22] A. Elsherbeni and V. Demir, The Finite Difference Time Domain Method for Electromagnetics:
With MATLAB Simulations, Raleigh, NC: SciTech Publisher Inc., 2009.
[23] J. Jin, The Finite Element Method in Electromagnetics, (2nd ed.) New York: John Wiley & Sons,
2002.
[24] A. Peterson and R. Mittra, Computational Methods for Electromagnetics, New York: Wiley-
IEEE Press, 1997.
[25] Intel® Manycore Platform Software Stack,
http://registrationcenter.intel.com/irc_nas/3988/MPSS_Users_Guide.pdf.
[26] M. Snir, et al., MPI: The Complete Reference, Cambridge, MA: MIT Press, 1995.
[27] M. Snir, et al., MPI—The Complete Reference: Volume 1, The MPI Core, Cambridge, MA: MIT
Press, 1998.
[28] W. Gropp, et al., MPI—The Complete Reference: Volume 2, The MPI-2 Extensions, Cambridge,
MA: MIT Press, 1998.
[29] https://www.tacc.utexas.edu/c/document_library/get_file?uuid=ed331f32-49db-4c4b-9ea7-f7d9547c79d9&groupId=13601.

[30] W. Yu, X. Yang, and H. Fluhler, “Accurate Calculation Technique for Time Domain Impedance
In FDTD Method,” IEEE International Symposium on Antennas and Propagation and USNC-
URSI Radio Science Meeting, Memphis, TN, July 2014.
[31] TDR Impedance Measurements, A Foundation for Signal Integrity, Tektronix application note,
2008.
[32] ... and M. Diamond, “Feasibility of Reflectometry for Nondestructive
Evaluation of Prestressed Concrete Anchors,” IEEE Sensors Journal, Vol. 9, No. 11, pp.
1322-1329, 2009.
[33] P. Smith, C. Furse, and J. Gunther, “Analysis of Spread Spectrum Time Domain Reflectometry
for Wire Fault Location,” IEEE Sensors Journal, Vol. 5, No. 6, pp. 1469-1478, 2005.
[34] C. Furse, C. Smith, and M. Chet, “Feasibility of Spread Spectrum Sensors for Location of Arcs
on Live Wires,” IEEE Sensors Journal, Vol. 5, No. 6, pp. 1445-1450, 2005.
[35] J. Schneider, Understanding the Finite-Difference Time-Domain Method, Lecture notes, 2013,
Washington State University.
Chapter 6
Domain Decomposition Methods for Finite
Element Analysis of Large-Scale
Electromagnetic Problems
Ming-Feng Xue and Jian-Ming Jin

Full-wave electromagnetic simulators are widely used nowadays for engineering
design and analysis in the radio-frequency, microwave, millimeter-wave, terahertz,
and optical regimes. Because of its capability to model highly complex geometries
and materials, the finite element method (FEM) has become the most popular
numerical tool for simulating complex electromagnetic problems [1, 2]. However, a
finite element discretization of large-scale electromagnetic problems often results
in a large system of linear equations involving millions or even billions of
unknowns, whose solution is very challenging even with the most powerful computers
available today [2]. In this chapter, we discuss the development of domain
decomposition methods (DDMs) for the finite element analysis of such large-scale
electromagnetic problems. To be more specific, we consider several numerical
algorithms based on the dual-primal finite element tearing and interconnecting
(FETI-DP) method [3-11] for the full-wave analysis of electromagnetic problems.
When first introduced into the field of computational electromagnetics (CEM), the
FETI-DP method converted a volumetric problem to a surface problem by assuming an
unknown Neumann boundary condition on the subdomain interface with the aid of
one Lagrange multiplier [8]. It was a typical nonoverlapping iterative
substructuring DDM based on the local Schur complement [12, 13]. Later, a new
FETI-DP formulation was proposed by introducing an unknown Robin boundary
condition on the subdomain interface with the aid of two Lagrange multipliers to
improve the convergence of the global interface solution in the high-frequency
region [9]. Both FETI-DP methods construct a global corner system that relates the
fields at the cross points between the subdomains through a Dirichlet continuity
condition. This corner system provides a coarse grid correction to improve the
convergence of the iterative solution of the global interface system by propagating
residual errors to the entire computational domain in each iteration [8, 9].


In this chapter, we focus on the developments that expand the capability and
improve the performance of the existing FETI-DP methods by: (1) lifting the
requirement of conformal meshes on the subdomain interface; (2) speeding up the
convergence rate of the iterative solution of the global interface problem; and (3)
incorporating appropriate truncation boundaries for more accurate simulations.
First, we present the formulations of the finite element tearing and interconnecting
(FETI) and FETI-DP methods based on one and two Lagrange multipliers for
domain decomposition with conformal interface meshes. We then formulate two
nonconformal FETI-DP methods, both of which implement the Robin-type
transmission condition at the subdomain interfaces. One nonconformal method
extends the conformal FETI-DP algorithm that is based on two Lagrange
multipliers to deal with nonconformal interface and corner meshes, whereas the
other employs cement elements on the subdomain interface, combines the global
primal unknowns with the global dual unknowns, and extracts the corner
unknowns to formulate a global coarse problem [14, 15]. Next, we discuss the
implementation of higher-order transmission conditions in the FETI-DP method
with two Lagrange multipliers for a faster convergence of the iterative solution of
the global interface system [16]. These higher-order transmission conditions can
transmit both transverse-electric (TE) and transverse-magnetic (TM) evanescent
modes in addition to the propagating modes [17-25]. They are critical for
obtaining a converged result in the case when perfectly matched layers (PMLs) are
used for mesh truncation. After that, we describe a hybrid scheme to handle multi-
region electromagnetic problems, whose computational domain consists of several
regions that can be meshed independently. In this scheme, the FETI method is
employed to deal with mesh-nonconformal and/or geometry-nonconformal
interfaces between different regions and the FETI-DP method is used for mesh-
conformal and geometry-conformal interfaces inside each region. A unified global
system of equations is then formulated for the interface unknowns from both
conformal and non-conformal interfaces [26, 27]. Higher-order transmission
conditions and a generalized cross-point correction technique are applied to
improve the convergence and ensure a correct interconnection across subdomain
interfaces [27]. Finally, we present several numerical results for the simulation of
wave propagation, finite antenna arrays, radomes, and optical devices to
demonstrate the accuracy, efficiency, capability, and applications of these
algorithms. For simulating large finite antenna arrays, we present an oblique
absorbing boundary condition and apply it to the FETI-DP method [28]. This
boundary condition can be tuned to become reflectionless for all frequencies and
polarizations for the main beam of the radiated wave.

6.1 FETI METHODS WITH ONE AND TWO LAGRANGE MULTIPLIERS

We start our discussion with the review of the FETI method, for two different
versions that are equipped with one Lagrange multiplier (1LM) and two Lagrange
multipliers (2LM), respectively. The FETI method is very effective for some
special domain decompositions, for example, the one-way and onion-like domain
decompositions [29-31]. We illustrate how these two versions enforce the
continuity condition on the tangential electric and magnetic fields across the
subdomain interface.

6.1.1 FETI Method with One Lagrange Multiplier

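To implement a nonoverlapping domain decomposition method, the entire
computational domain $V$ is first divided into $N_s$ nonoverlapping subdomains
such that $\bigcup_{s=1}^{N_s} V_s = V$ and $\bigcap_{s=1}^{N_s} V_s = \varnothing$,
as shown in Figure 6.1. We consider the second-order curl-curl equation and the
Neumann boundary condition for the sth subdomain [8]

\[ \nabla \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}^s) - k_0^2 \epsilon_r \cdot \mathbf{E}^s = -j k_0 Z_0 \mathbf{J}_{\mathrm{imp}}^s \quad \text{in } V_s \tag{6.1} \]

\[ \hat{n}^s \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}^s) = \boldsymbol{\Lambda}_b \quad \text{on } \Gamma_s \tag{6.2} \]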

where $\Gamma_s$ denotes the interface between the sth subdomain and its neighboring
subdomains, $\boldsymbol{\Lambda}_b$ is an unknown variable defined on the subdomain
interface, and $\mathbf{J}_{\mathrm{imp}}^s$ is an impressed current. For the portion
of the subdomain boundary $S_s$ coinciding with the exterior surface of the entire
computational domain $S_0$, we can use an absorbing boundary condition (ABC), a PML,
or a boundary integral equation (BIE) to deal with the field there. In the
following, we will omit this boundary term in order to focus on the treatment of
the subdomain interfaces.

[Diagram: the domain V split into subdomains V1, V2, and V3 with unit normals n̂(1), n̂(2), and n̂(3), interfaces Γ12 and Γ23, and the interface variable Λb.]

Figure 6.1 A computational domain is divided into three nonoverlapping subdomains. An unknown
Neumann boundary condition is assumed on each subdomain interface with the aid of a
global Lagrange multiplier.

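After expanding the vector electric field using vector basis functions such that
$\mathbf{E}^s = \{\mathbf{N}^s\}^T \{E^s\}$ and applying Galerkin’s method, we obtain
the subdomain matrix equation partitioned as

\[ \begin{bmatrix} K_{ii}^s & K_{ib}^s \\ K_{bi}^s & K_{bb}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ \lambda_b^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ B_b^s \lambda_b \end{Bmatrix} \tag{6.3} \]

where

\[ [K_{uv}^s] = \int_{V_s} \left[ (\nabla \times \{\mathbf{N}_u^s\}) \cdot \mu_r^{-1} \cdot (\nabla \times \{\mathbf{N}_v^s\}^T) - k_0^2 \{\mathbf{N}_u^s\} \cdot \epsilon_r \cdot \{\mathbf{N}_v^s\}^T \right] dV \quad (u, v = i, b) \]

\[ \{f_u^s\} = -j k_0 Z_0 \int_{V_s} \{\mathbf{N}_u^s\} \cdot \mathbf{J}_{\mathrm{imp}}^s \, dV \quad (u = i, b) \]

\[ \{\lambda_b^s\} = \int_{\Gamma_s} \{\mathbf{N}_b^s\} \cdot \boldsymbol{\Lambda}_b \, dS \]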

In (6.3), $[K^s]$ is the FEM matrix, $\{E^s\}$ denotes the unknown discrete electric field,
and $\{f^s\}$ denotes the excitation vector contributed by the source. By using the
subscripts i and b, each vector is partitioned into two parts, which are associated
with the interior and interface of the subdomain, respectively. Also, $[B_b^s]$ is a
Boolean matrix introduced to extract the dual unknown $\{\lambda_b^s\}$ on the sth subdomain
interface from the global interface dual unknown vector $\{\lambda_b\}$, such that
$\{\lambda_b^s\} = [B_b^s]\{\lambda_b\}$. Because two neighboring subdomains share the same
set of Lagrange multipliers, this version is usually referred to as the FETI method with
one Lagrange multiplier. Note that $[B_b^s]$ is a signed Boolean matrix that contains
only 0, 1, and -1, because the tangential magnetic field continuity condition (the
Neumann boundary condition) contains opposite signs on the two sides of an
interface as

$$\hat{n}^s \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}_b^s) = -\hat{n}^q \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}_b^q) = \boldsymbol{\Lambda}_b \tag{6.4}$$
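To make the role of the signed Boolean matrix concrete, the following sketch builds $[B_b^s]$ for a chain of three subdomains such as the one in Figure 6.1. It is only an illustration under stated assumptions: the interfaces are conformal with `m` unknowns each, the lower-numbered subdomain of each pair is (arbitrarily) given the sign +1 and its neighbor -1, and the function names `signed_boolean_chain` and `interface_jump` are hypothetical, not taken from the text.

```python
def signed_boolean_chain(n_sub, m):
    """Signed Boolean matrices B[s] for a 1-D chain of n_sub subdomains.

    Interface k lies between subdomains k and k+1 and carries m multipliers.
    B[s] has one row per local interface unknown of subdomain s and one
    column per global multiplier, so that lambda_s = B[s] * lambda_global.
    """
    n_lam = (n_sub - 1) * m
    B = []
    for s in range(n_sub):
        touching = []                      # (interface index, sign) pairs
        if s > 0:
            touching.append((s - 1, -1))   # left interface: opposite normal
        if s < n_sub - 1:
            touching.append((s, +1))       # right interface: reference normal
        rows = []
        for k, sign in touching:
            for j in range(m):
                row = [0] * n_lam
                row[k * m + j] = sign
                rows.append(row)
        B.append(rows)
    return B


def interface_jump(B, E_b):
    """Assemble sum_s B[s]^T E_b[s]; it vanishes when the interface fields match."""
    n_lam = len(B[0][0])
    jump = [0.0] * n_lam
    for Bs, Es in zip(B, E_b):
        for row, e in zip(Bs, Es):
            for col, b in enumerate(row):
                jump[col] += b * e
    return jump
```

With matching interface values on both sides of $\Gamma_{12}$ and $\Gamma_{23}$, `interface_jump` returns a zero vector; any mismatch shows up as a nonzero entry in the corresponding multiplier slot.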

Equation (6.3) can be rewritten in a more compact form

$$[K^s]\{E^s\} = \{f^s\} - [R_b^s]^T [B_b^s]\{\lambda_b\} \tag{6.5}$$

where another Boolean matrix $[R_b^s]$ is introduced to extract the interface field
$\{E_b^s\}$ out of $\{E^s\}$ such that $\{E_b^s\} = [R_b^s]\{E^s\}$. Normally, we can eliminate
the interior unknown $\{E_i^s\}$ and express the boundary unknown $\{E_b^s\}$ in terms of the
dual unknown $\{\lambda_b\}$ in each subdomain independently. The resultant matrix
equation is written as

$$\{E_b^s\} = [R_b^s][K^s]^{-1}\left(\{f^s\} - [R_b^s]^T [B_b^s]\{\lambda_b\}\right) \tag{6.6}$$

To couple the fields across all the subdomains, we use the fact that at the
interface between two subdomains, the electric field satisfies the tangential
continuity condition

$$\hat{n}^s \times (\hat{n}^s \times \mathbf{E}_b^s) = \hat{n}^q \times (\hat{n}^q \times \mathbf{E}_b^q) \tag{6.7}$$

This Dirichlet continuity condition can be enforced by assembling (6.6) over all
the subdomains and setting the result to zero, which yields

$$\sum_{s=1}^{N_s} [B_b^s]^T \{E_b^s\} = \sum_{s=1}^{N_s} [B_b^s]^T [R_b^s][K^s]^{-1}\left(\{f^s\} - [R_b^s]^T [B_b^s]\{\lambda_b\}\right) = 0 \tag{6.8}$$

This enforcement is valid only when the interface meshes are conformal. It
requires the tangential electric fields on the two sides of an interface to be equal to
each other in an unknown-by-unknown manner. It should be noted that the +1 and
-1 entries in $[B_b^s]$ play an important role in making (6.8) concise. After the global
assembly over all subdomains, we have

$$[K_{bb}]\{\lambda_b\} = \{f_b\} \tag{6.9}$$

where

$$[K_{bb}] = \sum_{s=1}^{N_s} [B_b^s]^T [R_b^s][K^s]^{-1}[R_b^s]^T [B_b^s]$$

$$\{f_b\} = \sum_{s=1}^{N_s} [B_b^s]^T [R_b^s][K^s]^{-1}\{f^s\}$$

Once $\{\lambda_b\}$ is obtained by solving (6.9), the electric field in every subdomain can
be calculated using (6.6).
This formulation works well if each subdomain is either lossy, or lossless but
with a size small enough that the subdomain FEM matrix $[K^s]$ is never singular.
Otherwise, the iterative solution of (6.9) will not converge rapidly because of the
ill-conditioned $[K_{bb}]$. This problem can be alleviated with the next formulation.
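The steps (6.5) to (6.9) can be tried out on a scalar toy problem. The sketch below is not the vector FEM of the text: it assumes a 1-D Helmholtz equation discretized with linear elements, two subdomains sharing one interface node, a first-order absorbing condition at the two outer ends (which plays the role of loss and keeps each $[K^s]$ nonsingular), and unit nodal loads split evenly at the interface. All function names are hypothetical. With signed Booleans +1 and -1, (6.9) collapses to a single scalar equation.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def helmholtz_1d(n_el, h, k):
    """Linear-element matrix of -u'' - k^2 u on n_el elements (natural ends)."""
    n = n_el + 1
    K = [[0j] * n for _ in range(n)]
    for e in range(n_el):
        K[e][e] += 1 / h - k * k * h / 3
        K[e + 1][e + 1] += 1 / h - k * k * h / 3
        K[e][e + 1] += -1 / h - k * k * h / 6
        K[e + 1][e] += -1 / h - k * k * h / 6
    return K


def global_reference(n_el, h, k):
    """Undecomposed solution on 2*n_el elements with the same loads and ABCs."""
    K = helmholtz_1d(2 * n_el, h, k)
    K[0][0] += 1j * k
    K[-1][-1] += 1j * k
    return solve(K, [1.0] * (2 * n_el + 1))


def feti_1lm_two_subdomains(n_el=4, h=0.1, k=2.0):
    """Toy 1LM FETI: one shared multiplier on the single interface node."""
    n = n_el + 1
    K1 = helmholtz_1d(n_el, h, k); K1[0][0] += 1j * k      # ABC at left end
    K2 = helmholtz_1d(n_el, h, k); K2[-1][-1] += 1j * k    # ABC at right end
    f1 = [1.0] * n; f1[-1] = 0.5                           # interface load split
    f2 = [1.0] * n; f2[0] = 0.5
    b1, b2 = n - 1, 0                                      # interface node indices
    unit = lambda i: [1.0 if j == i else 0.0 for j in range(n)]
    F1, F2 = solve(K1, unit(b1))[b1], solve(K2, unit(b2))[b2]   # R K^-1 R^T
    d1, d2 = solve(K1, f1)[b1], solve(K2, f2)[b2]               # R K^-1 f
    # (6.9) with B^1 = +1 and B^2 = -1: (F1 + F2) lam = d1 - d2
    lam = (d1 - d2) / (F1 + F2)
    rhs1 = list(f1); rhs1[b1] -= lam                       # f - R^T B lam, B = +1
    rhs2 = list(f2); rhs2[b2] += lam                       # f - R^T B lam, B = -1
    return solve(K1, rhs1), solve(K2, rhs2)
```

Comparing the two subdomain solutions with `global_reference(4, 0.1, 2.0)` is a quick way to check the sign conventions: the interface values agree and each subdomain reproduces its part of the undecomposed solution.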

6.1.2 FETI Method with Two Lagrange Multipliers

To derive the formulation of the second version, we still consider the partial
differential equation in (6.1) but with the Robin boundary condition for the sth
subdomain [9]

$$\hat{n}^s \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}^s) + \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}^s) = \boldsymbol{\Lambda}_b^s \quad \text{on } \Gamma_s \tag{6.10}$$

where $\boldsymbol{\Lambda}_b^s$ is a local unknown variable defined on the interface of the sth
subdomain, as shown in Figure 6.2. Instead of sharing the same set of Lagrange
multipliers, two neighboring subdomains have their own independent Lagrange
multipliers defined on their side of the interface. As a result, this version is usually
referred to as the FETI method with two Lagrange multipliers.

Figure 6.2 A computational domain is divided into three nonoverlapping subdomains. An unknown
Robin boundary condition is assumed on each subdomain interface with the aid of local
Lagrange multipliers. [Figure labels: subdomains $V_1$, $V_2$, $V_3$ of $V$; interfaces
$\Gamma_{12}$ and $\Gamma_{23}$; unit normals $\hat{n}^{(1)}$, $\hat{n}^{(2)}$, $\hat{n}^{(3)}$;
multipliers $\boldsymbol{\Lambda}_b^{(1)}$, $\boldsymbol{\Lambda}_b^{(2)}$, $\boldsymbol{\Lambda}_b^{(3)}$.]

Similar to (6.3), we have the subdomain FEM matrix equation partitioned as

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s \\ K_{bi}^s & K_{bb}^s + M_{bb}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ \lambda_b^s \end{Bmatrix} \tag{6.11}$$

where

$$[M_{bb}^s] = \iint_{S_s \cap \Gamma_s} \alpha^s (\hat{n}^s \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^s \times \{\mathbf{N}_b^s\}^T) \, dS$$

$$\{\lambda_b^s\} = \iint_{S_s \cap \Gamma_s} \{\mathbf{N}_b^s\} \cdot \boldsymbol{\Lambda}_b^s \, dS$$

In (6.11), $[K^s]$, $\{E^s\}$, and $\{f^s\}$ are the same as those defined in Section 6.1.1,
and $[M_{bb}^s]$ is a surface mass matrix due to the Robin boundary condition.
Although $\{\lambda_b^s\}$ is local, it is necessary to construct a global interface problem to
relate all the local dual unknowns. Therefore, we introduce a Boolean matrix $[Q^s]$
to select the subdomain dual unknown $\{\lambda_b^s\}$ from the global dual-unknown vector
$\{\lambda_b\} = (\{\lambda_b^1\}, \ldots, \{\lambda_b^{N_s}\})^T$ such that
$\{\lambda_b^s\} = [Q^s]\{\lambda_b\}$. Different from $[B_b^s]$, $[Q^s]$
is unsigned. Similar to (6.6), the subdomain matrix equation for the boundary
unknown $\{E_b^s\}$ in terms of the dual unknown $\{\lambda_b^s\}$ can be written as

$$\{E_b^s\} = [R_b^s][K^s]^{-1}\left(\{f^s\} - [R_b^s]^T \{\lambda_b^s\}\right) \tag{6.12}$$

where $[R_b^s]$ is the same as that defined in Section 6.1.1 but $[K^s]$ now contains the
surface mass matrix $[M_{bb}^s]$.
At the interface $\Gamma_{sq}$, we add the two Robin boundary conditions on the two
sides of the interface such that

$$\boldsymbol{\Lambda}_b^s + \boldsymbol{\Lambda}_b^q = \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}_b^s) + \alpha^q \hat{n}^q \times (\hat{n}^q \times \mathbf{E}_b^q) \quad \text{on } \Gamma_{sq} \tag{6.13}$$

This equation holds true because
$\hat{n}^s \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}^s) = -\hat{n}^q \times (\mu_r^{-1} \cdot \nabla \times \mathbf{E}^q)$,
which is the continuity condition for the tangential magnetic field. Further, because of the
continuity condition for the tangential electric field,
$\hat{n}^s \times (\hat{n}^s \times \mathbf{E}_b^s) = \hat{n}^q \times (\hat{n}^q \times \mathbf{E}_b^q)$,
we obtain the following transmission conditions

$$\begin{cases} \boldsymbol{\Lambda}_b^s + \boldsymbol{\Lambda}_b^q = (\alpha^s + \alpha^q)\, \hat{n}^q \times (\hat{n}^q \times \mathbf{E}_b^q) \\ \boldsymbol{\Lambda}_b^q + \boldsymbol{\Lambda}_b^s = (\alpha^q + \alpha^s)\, \hat{n}^s \times (\hat{n}^s \times \mathbf{E}_b^s) \end{cases} \quad \text{on } \Gamma_{sq} \tag{6.14}$$

The choice of $\alpha$ has to satisfy the condition $\alpha^s + \alpha^q \neq 0$. Taking the sth
subdomain as reference, we can discretize the first equation of (6.14) to obtain

$$\{\lambda_b^s\}_q = -\{\lambda_b^q\}_s + [M_{bb}^{sq}]\{E_b^q\}_s \tag{6.15}$$

where

$$[M_{bb}^{sq}] = -\iint_{\Gamma_{sq}} (\alpha^s + \alpha^q)(\hat{n}^q \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^q \times \{\mathbf{N}_b^q\}^T) \, dS$$

Note that $[M_{bb}^{sq}]$ is a projection matrix mapping the interface electric field $\{E_b^q\}$
in the qth subdomain to the dual unknown $\{\lambda_b^s\}$ in the sth subdomain. Equation
(6.15) can further be rewritten as

$$\{\lambda_b^s\}_q = -[T_s^q]\{\lambda_b^q\} + [M_{bb}^{sq}][T_s^q]\{E_b^q\} \tag{6.16}$$

where $[T_q^s]$ is a Boolean matrix employed to extract the interface unknowns
defined on $\Gamma_{sq}$ from those defined on $\Gamma_s$ such that
$\{E_b^s\}_q = [T_q^s]\{E_b^s\}$ and $\{\lambda_b^s\}_q = [T_q^s]\{\lambda_b^s\}$. Equation (6.16)
can be reduced by eliminating $\{E_b^q\}$ using (6.12) and the result is

$$\{\lambda_b^s\}_q = -\left([T_s^q] + [M_{bb}^{sq}][T_s^q][F_{bb}^q]\right)\{\lambda_b^q\} + [M_{bb}^{sq}][T_s^q]\{d_b^q\} \tag{6.17}$$

where

$$[F_{bb}^q] = [R_b^q][K^q]^{-1}[R_b^q]^T$$

$$\{d_b^q\} = [R_b^q][K^q]^{-1}\{f^q\}$$

We can then reassemble (6.17) over all s and q to obtain an interface system
for all the subdomains as

$$[K_{bb}]\{\lambda_b\} = \{f_b\} \tag{6.18}$$

where

$$[K_{bb}] = [I] + \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T \left([T_s^q] + [M_{bb}^{sq}][T_s^q][F_{bb}^q]\right)[Q^q]$$

$$\{f_b\} = \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T [M_{bb}^{sq}][T_s^q]\{d_b^q\}$$

Once $\{\lambda_b\}$ is computed by solving (6.18), the electric field in every subdomain
can be calculated using (6.12).
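A scalar analogue helps to check how (6.12), (6.17), and (6.18) fit together. The sketch below is an assumption-laden toy, not the method of the text: it uses a 1-D Helmholtz problem with two subdomains and one interface node, takes the scalar Robin condition $-\partial u/\partial n - \alpha u = \Lambda^s$ as a stand-in for (6.10) (since $\hat{n} \times (\hat{n} \times \mathbf{E})$ equals minus the tangential field), so that the surface mass term reduces to $\alpha$ and the projection $[M_{bb}^{sq}]$ to $-(\alpha^s + \alpha^q)$, and applies a first-order ABC at the outer ends. All names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def helmholtz_1d(n_el, h, k):
    """Linear-element matrix of -u'' - k^2 u on n_el elements (natural ends)."""
    n = n_el + 1
    K = [[0j] * n for _ in range(n)]
    for e in range(n_el):
        K[e][e] += 1 / h - k * k * h / 3
        K[e + 1][e + 1] += 1 / h - k * k * h / 3
        K[e][e + 1] += -1 / h - k * k * h / 6
        K[e + 1][e] += -1 / h - k * k * h / 6
    return K


def global_reference(n_el, h, k):
    """Undecomposed solution on 2*n_el elements with the same loads and ABCs."""
    K = helmholtz_1d(2 * n_el, h, k)
    K[0][0] += 1j * k
    K[-1][-1] += 1j * k
    return solve(K, [1.0] * (2 * n_el + 1))


def feti_2lm_two_subdomains(n_el=4, h=0.1, k=2.0):
    """Toy 2LM FETI: each subdomain owns its own multiplier at the interface."""
    n = n_el + 1
    alpha = 1j * k                                   # Robin coefficient, alpha = j k0
    K1 = helmholtz_1d(n_el, h, k); K1[0][0] += 1j * k; K1[-1][-1] += alpha
    K2 = helmholtz_1d(n_el, h, k); K2[-1][-1] += 1j * k; K2[0][0] += alpha
    f1 = [1.0] * n; f1[-1] = 0.5                     # interface load split
    f2 = [1.0] * n; f2[0] = 0.5
    b1, b2 = n - 1, 0
    unit = lambda i: [1.0 if j == i else 0.0 for j in range(n)]
    F1, F2 = solve(K1, unit(b1))[b1], solve(K2, unit(b2))[b2]   # scalar F_bb
    d1, d2 = solve(K1, f1)[b1], solve(K2, f2)[b2]               # scalar d_b
    M = -(alpha + alpha)                             # scalar analogue of M_bb^sq
    # (6.17) for (s, q) = (1, 2) and (2, 1), written as the 2 x 2 system (6.18)
    A = [[1.0, 1.0 + M * F2], [1.0 + M * F1, 1.0]]
    lam1, lam2 = solve(A, [M * d2, M * d1])
    rhs1 = list(f1); rhs1[b1] -= lam1                # (6.12): f - R^T lambda
    rhs2 = list(f2); rhs2[b2] -= lam2
    return solve(K1, rhs1), solve(K2, rhs2)
```

In this toy the interface matrix has the form identity plus an off-diagonal coupling, which mirrors the structure of $[K_{bb}]$ in (6.18); the recovered subdomain fields can be compared with `global_reference` to confirm that both continuity conditions are met.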

6.1.3 Symbolic Formulation

From Sections 6.1.1 and 6.1.2, we can find many similarities between the FETI
method with 1LM and the FETI method with 2LM. First of all, both versions need
to eliminate the interior electric field unknown and express the boundary electric
field unknown in terms of the dual unknown, which can be written symbolically as

$$\{E_b^s\} = f(\{\lambda_b\}, \{f^s\}) \tag{6.19}$$

This step is usually performed by a direct solver. Second, both versions need to
solve a global interface problem after assembling over all the subdomains, which
can be written symbolically as

$$F(\{\lambda_b\}, \{f\}) = 0 \tag{6.20}$$

This final interface equation can be solved iteratively using an iterative solver, for
example, Krylov subspace methods such as the generalized minimum residual
(GMRES) method and the stabilized biconjugate gradient (BiCGStab) method [8].
As we can see, the FETI method hybridizes direct and iterative solvers in a two-
level manner during the solution procedure. The global interface problem of the
FETI method with 1LM is generally indefinite and a Dirichlet preconditioner is
required for fast convergence [3, 4]. In contrast, the global interface problem of
the FETI method with 2LM is positive-definite and all the eigenvalues are located
within the unit circle centered at (1, 0) on the complex plane.
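The eigenvalue statement has a simple consequence that can be sketched in code: if every eigenvalue of $[K_{bb}]$ lies inside the unit circle centered at (1, 0), then $[K_{bb}] - [I]$ has spectral radius below one, and even a plain stationary iteration converges. The sketch below uses that simpler iteration in place of the GMRES or BiCGStab solvers named in the text, purely for illustration; the function names and the small test matrix are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product for lists of (possibly complex) numbers."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def fixed_point_solve(K, f, tol=1e-12, max_iter=500):
    """Solve K x = f with x <- f - (K - I) x.

    The update equals x + (f - K x); it converges when all eigenvalues of
    K - I have modulus smaller than one.
    """
    x = list(f)
    for it in range(1, max_iter + 1):
        residual = [fi - ki for fi, ki in zip(f, matvec(K, x))]
        if max(abs(r) for r in residual) < tol:
            return x, it
        x = [xi + ri for xi, ri in zip(x, residual)]
    raise RuntimeError("fixed-point iteration did not converge")
```

For example, `K = [[1.0, 0.5j], [0.3, 1.0]]` has eigenvalues $1 \pm \sqrt{0.15j}$, both well inside the unit circle around (1, 0), and the iteration converges in a few dozen steps. A Krylov method would normally need far fewer iterations on such a spectrum.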

6.2 FETI-DP METHODS WITH ONE AND TWO LAGRANGE MULTIPLIERS

When we deal with more general cases, for example, the checkerboard domain
decompositions [3, 8], it is inevitable to encounter geometrical crosspoints, which
are the interfaces shared by more than two subdomains, as shown in Figure 6.3.
These geometrical crosspoints are also called corners.

Figure 6.3 A computational domain is divided into four nonoverlapping subdomains. A
geometrical crosspoint shared by more than two subdomains occurs in this
decomposition. [Figure labels: subdomains 1 to 4; interfaces $\Gamma_{12}$, $\Gamma_{13}$,
$\Gamma_{24}$, $\Gamma_{34}$; unit normals $\hat{n}^{(1)}$, $\hat{n}^{(2)}$, $\hat{n}^{(3)}$, $\hat{n}^{(4)}$.]

The FETI method, whether with 1LM or 2LM, encounters difficulties when
dealing with the continuity condition at a corner. First of all, there are four electric
field unknowns $\{E_c^{(1)}\}$, $\{E_c^{(2)}\}$, $\{E_c^{(3)}\}$, and $\{E_c^{(4)}\}$ defined at the
corner in Figure 6.3, but only three independent equations can be obtained from the
tangential electric field continuity condition $\{E_c^s\} = \{E_c^q\}$ ($s, q = 1, 2, 3, 4$ and
$s \neq q$). Second, because the tangential magnetic fields at the corner are interrelated,
$\{\lambda_c^{(1)}\}$, $\{\lambda_c^{(2)}\}$, $\{\lambda_c^{(3)}\}$, and $\{\lambda_c^{(4)}\}$ are not
independent at the corner. From these two points, we
can see the inherent redundancy associated with the FETI method in this general
case. To remove the redundancy and the resultant singularity, Farhat and his
colleagues [5-7] proposed a dual-primal (DP) strategy to extract $\{E_c^s\}$ and give
them a unique global index to construct a global corner system. This idea works
perfectly for the 1LM version because the $\{\lambda_c^s\}$ associated with corner edges are
cancelled out during the global assembly process due to the associated Neumann
boundary condition. However, for the 2LM version, this is not the case because of
the Robin boundary condition. As a remedy, we change the Robin boundary
condition into the Neumann boundary condition at the corners for the 2LM version.
The resultant two improved FETI methods are usually referred to as the
dual-primal finite element tearing and interconnecting (FETI-DP) method with
1LM and 2LM, respectively.
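The redundancy at a crosspoint can be checked with a few lines of exact arithmetic. The sketch below writes every pairwise continuity condition $E_c^s - E_c^q = 0$ for one scalar corner unknown per subdomain (a simplifying assumption) as a row of a constraint matrix and computes its rank; the helper `rank` is a hypothetical name for a plain Gauss-Jordan elimination.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a small matrix by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][c] != 0:
                m = rows[r][c] / rows[rk][c]
                rows[r] = [a - m * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def corner_constraints(n_sub=4):
    """One row per pair (s, q): +1 at s, -1 at q, i.e. E_c^s - E_c^q = 0."""
    rows = []
    for s, q in combinations(range(n_sub), 2):
        row = [0] * n_sub
        row[s], row[q] = 1, -1
        rows.append(row)
    return rows
```

For four subdomains there are six pairwise conditions, but `rank(corner_constraints(4))` is 3: only three of them are independent, which is the redundancy that the dual-primal treatment removes by assigning the corner unknowns a single global index.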

6.2.1 FETI-DP Method with One Lagrange Multiplier

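Again, we consider the boundary-value problem (BVP) defined by (6.1) and (6.2).
On the discretization level, the subdomain matrix equation is partitioned as

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s & K_{ic}^s \\ K_{bi}^s & K_{bb}^s & K_{bc}^s \\ K_{ci}^s & K_{cb}^s & K_{cc}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \\ E_c^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ f_c^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ \lambda_b^s \\ \lambda_c^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ f_c^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ B_b^s \lambda_b \\ \lambda_c^s \end{Bmatrix} \tag{6.21}$$

where

$$[K_{uv}^s] = \iiint_{V_s} \left[ (\nabla \times \{\mathbf{N}_u^s\}) \cdot \mu_r^{-1} \cdot (\nabla \times \{\mathbf{N}_v^s\}^T) - k_0^2 \{\mathbf{N}_u^s\} \cdot \epsilon_r \cdot \{\mathbf{N}_v^s\}^T \right] dV \quad (u, v = i, b, c)$$

$$\{f_u^s\} = -jk_0 Z_0 \iiint_{V_s} \{\mathbf{N}_u^s\} \cdot \mathbf{J}_{\text{imp}}^s \, dV \quad (u = i, b, c)$$

$$\{\lambda_b^s\} = \iint_{S_s \cap \Gamma_s} \{\mathbf{N}_b^s\} \cdot \boldsymbol{\Lambda}_b \, dS$$

$$\{\lambda_c^s\} = -jk_0 Z_0 \iint_{S_s \cap \Gamma_s} \{\mathbf{N}_c^s\} \cdot (\hat{n}^s \times \mathbf{H}) \, dS$$

In (6.21), $[K^s]$, $\{E^s\}$, $\{f^s\}$, and $[B_b^s]$ are the same as those defined in Section
6.1.1. By introducing the subscript c, together with the subscripts i and b, we
partition each vector into three parts, which are associated with the interior,
interface, and corner of the subdomain, respectively. The separation of the corner
unknowns is one of the most important features of the dual-primal idea. Equation
(6.21) can be rewritten in a more compact form

$$\begin{bmatrix} K_{rr}^s & K_{rc}^s \\ K_{cr}^s & K_{cc}^s \end{bmatrix} \begin{Bmatrix} E_r^s \\ E_c^s \end{Bmatrix} = \begin{Bmatrix} f_r^s \\ f_c^s \end{Bmatrix} - \begin{Bmatrix} [R_{br}^s]^T B_b^s \lambda_b \\ \lambda_c^s \end{Bmatrix} \tag{6.22}$$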


where $[R_{br}^s]$ is a Boolean matrix to extract the interface unknowns $\{E_b^s\}$ out of the
remaining unknowns $\{E_r^s\}$ such that $\{E_b^s\} = [R_{br}^s]\{E_r^s\}$. Other matrices and
vectors are defined as

$$[K_{rr}^s] = \begin{bmatrix} K_{ii}^s & K_{ib}^s \\ K_{bi}^s & K_{bb}^s \end{bmatrix}, \quad [K_{rc}^s] = \begin{bmatrix} K_{ic}^s \\ K_{bc}^s \end{bmatrix}, \quad [K_{cr}^s] = \begin{bmatrix} K_{ci}^s & K_{cb}^s \end{bmatrix}, \quad \{f_r^s\} = \begin{Bmatrix} f_i^s \\ f_b^s \end{Bmatrix} \tag{6.23}$$

As we separate the corner unknowns $\{E_c^s\}$ from the noncorner interface unknowns
$\{E_b^s\}$, we can obtain two equations for the sth subdomain after eliminating the
interior unknowns $\{E_i^s\}$ as

$$\{E_b^s\} = [R_{br}^s][K_{rr}^s]^{-1}\left(\{f_r^s\} - [R_{br}^s]^T\{\lambda_b^s\} - [K_{rc}^s][B_c^s]\{E_c\}\right) \tag{6.24}$$

$$\left([K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}[K_{rc}^s]\right)[B_c^s]\{E_c\} = \{f_c^s\} - \{\lambda_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\} + [K_{cr}^s][K_{rr}^s]^{-1}[R_{br}^s]^T[B_b^s]\{\lambda_b\} \tag{6.25}$$

where the Boolean matrix $[B_c^s]$ is introduced to extract the local corner unknowns
from the global corner unknowns, which can be expressed mathematically as
$[B_c^s]\{E_c\} = \{E_c^s\}$.
To couple the fields over all the subdomains, we apply the Dirichlet
continuity condition in (6.7) to all interface electric field unknowns.
Mathematically, we obtain the following matrix equation by assembling (6.24)
over all the subdomains and setting the result to zero:

$$\sum_{s=1}^{N_s} [B_b^s]^T\{E_b^s\} = \sum_{s=1}^{N_s} [B_b^s]^T [R_{br}^s][K_{rr}^s]^{-1}\left(\{f_r^s\} - [R_{br}^s]^T[B_b^s]\{\lambda_b\} - [K_{rc}^s][B_c^s]\{E_c\}\right) = 0 \tag{6.26}$$

This equation is very similar to (6.8) except that it contains the contribution from
the global corner unknowns $\{E_c\}$. To obtain the other system equation
relating $\{\lambda_b\}$ to $\{E_c\}$, we sum (6.25) over all the subdomains as

$$\sum_{s=1}^{N_s} [B_c^s]^T\left([K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}[K_{rc}^s]\right)[B_c^s]\{E_c\} = \sum_{s=1}^{N_s} [B_c^s]^T\left(\{f_c^s\} - \{\lambda_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\} + [K_{cr}^s][K_{rr}^s]^{-1}[R_{br}^s]^T[B_b^s]\{\lambda_b\}\right) \tag{6.27}$$

Because the tangential component of the magnetic field is continuous across the
interface between the subdomains (assume that no surface electric current exists on
the interface), based on the definition of $\{\lambda_c^s\}$, we have

$$\sum_{s=1}^{N_s} [B_c^s]^T\{\lambda_c^s\} = 0 \tag{6.28}$$

so that $\{\lambda_c^s\}$ disappears in (6.27) after assembly.

Now, we have two variables $\{\lambda_b\}$ and $\{E_c\}$ coupled through two equations
(6.26) and (6.27). In this complete system of equations, $\{\lambda_b\}$ is called the dual
variable and $\{E_c\}$ is called the primal variable. Because the dimension of $\{E_c\}$ is
usually orders of magnitude smaller than that of $\{\lambda_b\}$, an efficient solution
procedure is to eliminate $\{E_c\}$ first and to solve for $\{\lambda_b\}$ only. To be more specific,
we rewrite (6.26) and (6.27) as

$$[K_{bb}]\{\lambda_b\} + [K_{bc}]\{E_c\} = \{f_b\} \tag{6.29}$$

$$[K_{cc}]\{E_c\} = \{f_c\} + [K_{cb}]\{\lambda_b\} \tag{6.30}$$

where

$$[K_{bb}] = \sum_{s=1}^{N_s} [B_b^s]^T [R_{br}^s][K_{rr}^s]^{-1}[R_{br}^s]^T [B_b^s]$$

$$[K_{bc}] = \sum_{s=1}^{N_s} [B_b^s]^T [R_{br}^s][K_{rr}^s]^{-1}[K_{rc}^s][B_c^s]$$

$$\{f_b\} = \sum_{s=1}^{N_s} [B_b^s]^T [R_{br}^s][K_{rr}^s]^{-1}\{f_r^s\}$$

$$[K_{cc}] = \sum_{s=1}^{N_s} [B_c^s]^T \left([K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}[K_{rc}^s]\right)[B_c^s]$$

$$[K_{cb}] = \sum_{s=1}^{N_s} [B_c^s]^T [K_{cr}^s][K_{rr}^s]^{-1}[R_{br}^s]^T [B_b^s]$$

$$\{f_c\} = \sum_{s=1}^{N_s} [B_c^s]^T \left(\{f_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\}\right)$$

In these equations, $[K_{cr}^s] = [K_{rc}^s]^T$ (assume that $\mu_r$ and $\epsilon_r$ are symmetric) and
$[K_{cb}] = [K_{bc}]^T$ because $[K_{rr}^s]$ is symmetric. We can eliminate $\{E_c\}$ in (6.29) and
(6.30) to find

$$\left([K_{bb}] + [K_{bc}][K_{cc}]^{-1}[K_{cb}]\right)\{\lambda_b\} = \{f_b\} - [K_{bc}][K_{cc}]^{-1}\{f_c\} \tag{6.31}$$

After $\{\lambda_b\}$ is solved for, $\{E_c\}$ can be obtained from (6.30) and the electric field
inside each subdomain can be obtained by solving (6.24) [8].
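The elimination that leads from (6.29) and (6.30) to (6.31) can be sketched with small dense matrices. The code below is an illustration only: it assumes the sign convention $[K_{cc}]\{E_c\} = \{f_c\} + [K_{cb}]\{\lambda_b\}$ for (6.30), stores matrices as nested lists, and uses a plain Gaussian elimination in place of the sparse direct solver one would use in practice. The function name `condensed_dual_solve` is hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def condensed_dual_solve(Kbb, Kbc, Kcb, Kcc, fb, fc):
    """Eliminate the primal variable E_c and solve (6.31) for the dual variable."""
    nb, nc = len(fb), len(fc)
    # X[j] holds column j of Kcc^{-1} Kcb (one small coarse solve per column)
    X = [solve(Kcc, [Kcb[k][j] for k in range(nc)]) for j in range(nb)]
    # Condensed operator Kbb + Kbc Kcc^{-1} Kcb
    S = [[Kbb[i][j] + sum(Kbc[i][k] * X[j][k] for k in range(nc))
          for j in range(nb)] for i in range(nb)]
    y = solve(Kcc, fc)
    rhs = [fb[i] - sum(Kbc[i][k] * y[k] for k in range(nc)) for i in range(nb)]
    lam = solve(S, rhs)
    # Back-substitute into (6.30) to recover the corner unknowns
    Ec = solve(Kcc, [fc[k] + sum(Kcb[k][j] * lam[j] for j in range(nb))
                     for k in range(nc)])
    return lam, Ec
```

Because the coarse matrix $[K_{cc}]$ is small, it is factorized once in practice and reused for every application of the condensed operator inside the iterative solver; the explicit formation of the condensed matrix above is only practical for toy sizes.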

6.2.2 FETI-DP Method with Two Lagrange Multipliers

Now we consider the BVP defined by (6.1) and (6.10). Adopting the i, b, c
subscript notation, we can write the subdomain matrix equation as

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s & K_{ic}^s \\ K_{bi}^s & K_{bb}^s + M_{bb}^s & K_{bc}^s \\ K_{ci}^s & K_{cb}^s & K_{cc}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \\ E_c^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ f_c^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ \lambda_b^s + L_{bc}^s E_c^s \\ \lambda_c^s \end{Bmatrix} \tag{6.32}$$

where

$$\{\lambda_b^s\} = \iint_{S_s \cap \Gamma_s} \{\mathbf{N}_b^s\} \cdot \boldsymbol{\Lambda}_b^s \, dS$$

$$\{\lambda_c^s\} = -jk_0 Z_0 \iint_{S_s \cap \Gamma_s} \{\mathbf{N}_c^s\} \cdot (\hat{n}^s \times \mathbf{H}) \, dS$$

$$[L_{bc}^s] = \iint_{S_s \cap \Gamma_s} \alpha^s (\hat{n}^s \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^s \times \{\mathbf{N}_c^s\}^T) \, dS$$

In (6.32), $[K^s]$, $\{E^s\}$, and $\{f^s\}$ are the same as those defined in Section 6.2.1,
whereas $[M_{bb}^s]$ is the same as that defined in Section 6.1.2. Note that $[M_{bb}^s]$ does
not exist for the corner-related interface because the Robin boundary condition is
changed to the Neumann boundary condition at the corner. The additional term
$[L_{bc}^s]$, introduced by the Neumann boundary condition, denotes the interaction
between the dual unknown $\{\lambda_b^s\}$ and the corner electric field $\{E_c^s\}$ in the sth
subdomain. From (6.32), we can obtain two equations involving the interface and
dual unknowns on the subdomain interface as

$$\{E_b^s\} = [R_{br}^s][K_{rr}^s]^{-1}\left[\{f_r^s\} - [R_{br}^s]^T\{\lambda_b^s\} - \left([K_{rc}^s] + [R_{br}^s]^T[L_{bc}^s]\right)[B_c^s]\{E_c\}\right] \tag{6.33}$$

$$\left[[K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}\left([K_{rc}^s] + [R_{br}^s]^T[L_{bc}^s]\right)\right][B_c^s]\{E_c\} = \{f_c^s\} - \{\lambda_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\} + [K_{cr}^s][K_{rr}^s]^{-1}[R_{br}^s]^T\{\lambda_b^s\} \tag{6.34}$$

where $[R_{br}^s]$, $[B_c^s]$, $[K_{rc}^s]$, $[K_{cr}^s]$, and $\{f_r^s\}$ are the same as those defined in
Section 6.2.1 except for $[K_{rr}^s]$, which is now defined as

$$[K_{rr}^s] = \begin{bmatrix} K_{ii}^s & K_{ib}^s \\ K_{bi}^s & K_{bb}^s + M_{bb}^s \end{bmatrix} \tag{6.35}$$

Reassembling (6.34) through all the subdomains yields a global corner-related
finite element system

$$[K_{cc}]\{E_c\} = \{f_c\} + [K_{cb}]\{\lambda_b\} \tag{6.36}$$

where

$$[K_{cc}] = \sum_{s=1}^{N_s} [B_c^s]^T \left[[K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}\left([K_{rc}^s] + [R_{br}^s]^T[L_{bc}^s]\right)\right][B_c^s]$$

$$[K_{cb}] = \sum_{s=1}^{N_s} [B_c^s]^T \left([K_{cr}^s][K_{rr}^s]^{-1}[R_{br}^s]^T\right)[Q^s]$$

$$\{f_c\} = \sum_{s=1}^{N_s} [B_c^s]^T \left(\{f_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\}\right)$$

Here, $[Q^s]$ is the same as that defined in Section 6.1.2. Note that the $\{\lambda_c^s\}$ in all
subdomains are cancelled out after the Neumann continuity condition is enforced
at the corner.
At the interface $\Gamma_{sq}$, by enforcing the tangential continuity of the
magnetic field and the tangential continuity of the electric field step by step, we
can obtain transmission conditions similar to those given in (6.14). Again, taking the sth
subdomain as reference, we can obtain the following matrix equation on the
discretization level as

$$\{\lambda_b^s\}_q + [L_{bc}^s]_q\{E_c^s\}_q = -\{\lambda_b^q\}_s - [L_{bc}^{sq}]\{E_c^q\}_s + [M_{bb}^{sq}]\{E_b^q\}_s \tag{6.37}$$

where

$$[L_{bc}^s]_q = \iint_{\Gamma_{sq}} \alpha^s (\hat{n}^s \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^s \times \{\mathbf{N}_c^s\}^T) \, dS$$

$$[L_{bc}^{sq}] = \iint_{\Gamma_{sq}} \alpha^q (\hat{n}^q \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^q \times \{\mathbf{N}_c^q\}^T) \, dS$$

In (6.37), $[M_{bb}^{sq}]$ is the same as that defined in Section 6.1.2. Note that $[L_{bc}^{sq}]$ is a
projection matrix mapping the corner electric field $\{E_c^q\}$ in the qth subdomain to
the dual unknown $\{\lambda_b^s\}$ in the sth subdomain. Equation (6.37) can further be
rewritten as

$$\{\lambda_b^s\}_q + [L_{bc}^s]_q[S_q^s]\{E_c^s\} = -[T_s^q]\{\lambda_b^q\} - [L_{bc}^{sq}][S_s^q]\{E_c^q\} + [M_{bb}^{sq}][T_s^q]\{E_b^q\} \tag{6.38}$$

where $[T_q^s]$ is defined in Section 6.1.2. Another Boolean matrix $[S_q^s]$ is used to
extract the corner unknowns defined on $\Gamma_{sq}$ from those defined on $\Gamma_s$ such that
$\{E_c^s\}_q = [S_q^s]\{E_c^s\}$. We can further substitute $\{E_b^q\}$ on the left-hand side of (6.33)
into (6.38) to obtain

$$\{\lambda_b^s\}_q = -\left([T_s^q] + [M_{bb}^{sq}][T_s^q][F_{bb}^q]\right)\{\lambda_b^q\} - [L_{bc}^s]_q[S_q^s][B_c^s]\{E_c\} - \left([L_{bc}^{sq}][S_s^q][B_c^q] + [M_{bb}^{sq}][T_s^q][F_{bc}^q]\right)\{E_c\} + [M_{bb}^{sq}][T_s^q]\{d_b^q\} \tag{6.39}$$

where

$$[F_{bb}^q] = [R_{br}^q][K_{rr}^q]^{-1}[R_{br}^q]^T$$

$$[F_{bc}^q] = [R_{br}^q][K_{rr}^q]^{-1}\left([K_{rc}^q] + [R_{br}^q]^T[L_{bc}^q]\right)[B_c^q]$$

$$\{d_b^q\} = [R_{br}^q][K_{rr}^q]^{-1}\{f_r^q\}$$

Finally, we can reassemble (6.39) over all s and q to obtain an interface system for
all subdomains as

$$[K_{bb}]\{\lambda_b\} + [K_{bc}]\{E_c\} = \{f_b\} \tag{6.40}$$

where

$$[K_{bb}] = [I] + \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T \left([T_s^q] + [M_{bb}^{sq}][T_s^q][F_{bb}^q]\right)[Q^q]$$

$$[K_{bc}] = \sum_{s=1}^{N_s} [Q^s]^T \left(\sum_{q \in \text{neighbor}(s)} [T_q^s]^T [L_{bc}^s]_q [S_q^s]\right)[B_c^s] + \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T \left([L_{bc}^{sq}][S_s^q][B_c^q] + [M_{bb}^{sq}][T_s^q][F_{bc}^q]\right)$$

$$\{f_b\} = \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T [M_{bb}^{sq}][T_s^q]\{d_b^q\}$$

By combining (6.36) and (6.40) to eliminate the primal variable $\{E_c\}$, we can
derive the global interface equation for the dual variable $\{\lambda_b\}$ as

$$\left([K_{bb}] + [K_{bc}][K_{cc}]^{-1}[K_{cb}]\right)\{\lambda_b\} = \{f_b\} - [K_{bc}][K_{cc}]^{-1}\{f_c\} \tag{6.41}$$

After $\{\lambda_b\}$ is solved, $\{E_c\}$ can be obtained from (6.36) and the electric field inside
each subdomain can be obtained by solving (6.33) [9].

6.2.3 Comparison Between FETI-DP Methods with One and Two Lagrange
Multipliers

Similar to the FETI method, the formulation of the FETI-DP method can also be
written symbolically. For both versions, because we separate the corner unknowns
from the noncorner interface unknowns, after eliminating the interior unknowns
$\{E_i^s\}$, we can extract two equations on the subdomain level. One is for the discrete
fields on the subdomain interface

$$\{E_b^s\} = f(\{E_c\}, \{\lambda_b\}, \{f^s\}) \tag{6.42}$$

which is called the subdomain interface system. The other equation is for the fields
at the corners of the subdomain

$$\{E_c^s\} = g(\{\lambda_b\}, \{f^s\}) \tag{6.43}$$

which is called the subdomain corner system. With these, we have effectively
converted a subdomain volumetric problem into a subdomain surface problem.
Next, we have to couple all the subdomains. After the global assembly
through the noncorner interface and the corner interface, we obtain two global
interface systems, which are

$$F(\{E_c\}, \{\lambda_b\}, \{f\}) = 0 \tag{6.44}$$

$$\{E_c\} = G(\{\lambda_b\}, \{f\}) \tag{6.45}$$

An efficient solution strategy is to eliminate $\{E_c\}$ to obtain the final system that
relates the dual unknown $\{\lambda_b\}$ and the excitation vector $\{f\}$. The elimination of
$\{E_c\}$ provides an additional benefit for the FETI-DP method with 2LM, that is, the
resultant condensed linear system becomes positive-definite, which is similar to
that of (6.18). However, if we solve for $\{\lambda_b\}$ and $\{E_c\}$ together, the resultant linear
system in both the 1LM and 2LM versions is indefinite, which is not desirable for
an iterative solution [15].

The purpose of the coarse grid correction is twofold: (1) to avoid redundant
auxiliary variables at corner edges because no dual unknowns have to be defined
there; and (2) to introduce a mechanism to propagate the iterative residual error
globally and thus improve the convergence rate.
Up to now, one may find that the FETI-DP method with 1LM is more concise
and simpler to implement. However, it is not scalable with respect to the
subdomain size, that is, a much slower convergence is observed when the electrical
size of a subdomain is large enough to support resonant modes [8]. This is due to
the unknown Neumann boundary condition assumed on the subdomain interface.
In contrast, the FETI-DP method with 2LM is free of numerical resonance and thus
prevails in high-frequency applications. All modifications and improvements
presented in the following sections are made to the FETI-DP method with 2LM
because all the applications considered in this chapter are pertinent to high-
frequency problems.

6.3 LM-BASED NONCONFORMAL FETI-DP METHOD

On the two sides of a subdomain interface, the conformal FETI-DP method with
2LM does not expand the dual variables (Lagrange multipliers) explicitly and the
continuity conditions across the interface are enforced on an unknown-by-
unknown basis. Unfortunately, such an unknown-by-unknown correspondence
does not exist if the meshes for the two neighboring subdomains are not the same.
Such a case is called a nonconformal interface. Recently, an effort to extend the
FETI-DP algorithm to deal with nonconformal meshes was presented in [32] and
some preliminary results were obtained for the Laplace equation. In this section,
we extend the conformal FETI-DP method with 2LM [14, 15] to the case with
nonconformal interface meshes. The new DDM algorithm is referred to as the
Lagrange-multiplier (LM)-based FETI-DP method in what follows.

6.3.1 Nonconformal Interface and Conformal Corner Meshes

We focus on the BVP defined by (6.1) and (6.10), with nonconformal meshes on
the subdomain interface. The resultant subdomain matrix equation, which is very
similar to (6.32), can be written as

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s & K_{ic}^s \\ K_{bi}^s & K_{bb}^s + M_{bb}^s & K_{bc}^s \\ K_{ci}^s & K_{cb}^s & K_{cc}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \\ E_c^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ f_c^s \end{Bmatrix} - \begin{Bmatrix} 0 \\ B_{bb}^s \lambda_b^s + L_{bc}^s E_c^s \\ \lambda_c^s \end{Bmatrix} \tag{6.46}$$

where

$$[B_{bb}^s] = \iint_{S_s \cap \Gamma_s} \{\mathbf{N}_b^s\} \cdot \{\mathbf{N}_b^s\}^T \, dS$$

In (6.46), $[K^s]$, $\{E^s\}$, $\{f^s\}$, $[M_{bb}^s]$, $[L_{bc}^s]$, $\{\lambda_b^s\}$, and $\{\lambda_c^s\}$ are the same
as those defined in Section 6.2.2. The only extra term is $[B_{bb}^s]$, which represents the
interaction between the interface electric field and the interface dual unknown.
Different from the conformal FETI-DP method with 2LM, the dual unknown
$\boldsymbol{\Lambda}_b^s$ here is explicitly expanded in terms of a set of curl-conforming vector basis
functions defined on $\Gamma_s$ such that $\boldsymbol{\Lambda}_b^s = \{\mathbf{N}_b^s\}^T\{\lambda_b^s\}$ [15]. From (6.46),
we can obtain two equations involving the interface and dual unknowns on the subdomain
interface. One is

$$\{E_b^s\} = [R_{br}^s][K_{rr}^s]^{-1}\left[\{f_r^s\} - [R_{br}^s]^T[B_{bb}^s]\{\lambda_b^s\} - \left([K_{rc}^s] + [R_{br}^s]^T[L_{bc}^s]\right)[B_c^s]\{E_c\}\right] \tag{6.47}$$

and the other is the same as (6.34).

To couple the fields in all the subdomains, we can still obtain (6.36) by
assembling (6.34) globally. However, at the interface $\Gamma_{sq}$, by enforcing the
continuity conditions for the tangential magnetic and electric fields, we can obtain
a discretized transmission condition which is similar to (6.37) as

$$[N_{bb}^s]_q\{\lambda_b^s\}_q + [L_{bc}^s]_q\{E_c^s\}_q = -[N_{bb}^{sq}]\{\lambda_b^q\}_s - [L_{bc}^{sq}]\{E_c^q\}_s + [M_{bb}^{sq}]\{E_b^q\}_s \tag{6.48}$$

where

$$[N_{bb}^s]_q = \iint_{\Gamma_{sq}} \{\mathbf{N}_b^s\} \cdot \{\mathbf{N}_b^s\}^T \, dS$$

$$[N_{bb}^{sq}] = \iint_{\Gamma_{sq}} \{\mathbf{N}_b^s\} \cdot \{\mathbf{N}_b^q\}^T \, dS$$

and $[L_{bc}^s]_q$, $[L_{bc}^{sq}]$, and $[M_{bb}^{sq}]$ are the same as those defined in Section 6.2.2. Note
that $[N_{bb}^s]_q$ is always diagonally dominant. Therefore, we can invert
$[N_{bb}^s]_q$ to write the transmission condition (6.48) as

$$\{\lambda_b^s\}_q = -[N_{bb}^s]_q^{-1}[L_{bc}^s]_q[S_q^s]\{E_c^s\} - [N_{bb}^s]_q^{-1}[N_{bb}^{sq}][T_s^q]\{\lambda_b^q\} - [N_{bb}^s]_q^{-1}[L_{bc}^{sq}][S_s^q]\{E_c^q\} + [N_{bb}^s]_q^{-1}[M_{bb}^{sq}][T_s^q]\{E_b^q\} \tag{6.49}$$

where the Boolean matrices $[T_q^s]$ and $[S_q^s]$ are the same as those defined in
Section 6.2.2. Similarly, we can further eliminate $\{E_b^q\}$ with the aid of (6.47) and
then assemble over all the subdomains to obtain the global interface system as

$$[K_{bb}]\{\lambda_b\} + [K_{bc}]\{E_c\} = \{f_b\} \tag{6.50}$$

where

$$[K_{bb}] = [I] + \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T [N_{bb}^s]_q^{-1}\left([N_{bb}^{sq}][T_s^q] + [M_{bb}^{sq}][T_s^q][F_{bb}^q]\right)[Q^q]$$

$$[K_{bc}] = \sum_{s=1}^{N_s} [Q^s]^T \left(\sum_{q \in \text{neighbor}(s)} [T_q^s]^T [N_{bb}^s]_q^{-1}[L_{bc}^s]_q [S_q^s]\right)[B_c^s] + \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T [N_{bb}^s]_q^{-1}\left([L_{bc}^{sq}][S_s^q][B_c^q] + [M_{bb}^{sq}][T_s^q][F_{bc}^q]\right)$$

$$\{f_b\} = \sum_{s=1}^{N_s} [Q^s]^T \sum_{q \in \text{neighbor}(s)} [T_q^s]^T [N_{bb}^s]_q^{-1}[M_{bb}^{sq}][T_s^q]\{d_b^q\}$$

In these expressions, $[F_{bb}^q] = [R_{br}^q][K_{rr}^q]^{-1}[R_{br}^q]^T[B_{bb}^q]$, and $[F_{bc}^q]$ and $\{d_b^q\}$
are the same as those defined in Section 6.2.2. By combining (6.36) and (6.50), we can
solve for $\{\lambda_b\}$ and $\{E_c\}$. Afterwards, the electric field inside each subdomain can
be obtained by solving (6.47).

6.3.2 Extension to Nonconformal Interface and Corner Meshes

To further enhance the capability of the LM-based FETI-DP method to deal with
arbitrary meshes, we now focus on the extension to the nonconformal corner case.
We start from the continuity condition on one geometrical crosspoint ($\Gamma_c$) as
illustrated in Figure 6.4, where four subdomains share one global corner edge.
We denote the number of unknowns defined on each local corner edge as $N_c$,
then call the corner with the most unknowns the "master" corner and the others "slave"
corners so that $N_c^{\text{slave}} \leq N_c^{\text{master}}$. Note that subdomains with more than one
crosspoint could contain both master and slave corners. For the LM-based FETI-DP
method presented in Section 6.3.1, we impose the Dirichlet continuity
condition at the corner as

$$\mathbf{E}_t^{\text{master}} = \mathbf{E}_t^{\text{slave}} \tag{6.51}$$

in a weak sense, where the subscript t specifies the tangential electric field along
the corner edge.
The tangential electric field for the master and slave subdomains (taking one
slave subdomain for example) can be expanded by two independent sets of basis
functions $\{\mathbf{N}_c^{\text{master}}\}$ and $\{\mathbf{N}_c^{\text{slave}}\}$ as

$$\begin{cases} \mathbf{E}_t^{\text{slave}} = \sum_{n=1}^{N_c^{\text{slave}}} E_{c,n}^{\text{slave}} \mathbf{N}_{c,n}^{\text{slave}} = \{\mathbf{N}_c^{\text{slave}}\}^T\{E_c^{\text{slave}}\} \\ \mathbf{E}_t^{\text{master}} = \sum_{n=1}^{N_c^{\text{master}}} E_{c,n}^{\text{master}} \mathbf{N}_{c,n}^{\text{master}} = \{\mathbf{N}_c^{\text{master}}\}^T\{E_c^{\text{master}}\} \end{cases} \tag{6.52}$$

Figure 6.4 Master and slave corners associated with one shared crosspoint, and the number of
unknowns defined on the master corner is larger than or equal to those defined on the
slave corners ($N_c^{\text{slave}} \leq N_c^{\text{master}}$). ©2012 IEEE [15]. [Figure labels: one
master corner with $N_c^{\text{master}}$ unknowns and three slave corners with $N_c^{\text{slave}}$
unknowns each, meeting at $\Gamma_c$.]

By substituting (6.52) into (6.51) and testing both sides using $\{\mathbf{N}_c^{\text{slave}}\}$, we obtain

$$[G_{cc}^{\text{slv-slv}}]\{E_c^{\text{slave}}\} = [H_{cc}^{\text{slv-mst}}]\{E_c^{\text{master}}\} \tag{6.53}$$

where

$$[G_{cc}^{\text{slv-slv}}] = \int_{\Gamma_c} \{\mathbf{N}_c^{\text{slave}}\} \cdot \{\mathbf{N}_c^{\text{slave}}\}^T \, dl$$

$$[H_{cc}^{\text{slv-mst}}] = \int_{\Gamma_c} \{\mathbf{N}_c^{\text{slave}}\} \cdot \{\mathbf{N}_c^{\text{master}}\}^T \, dl$$

Note that the matrix dimensions of $[G_{cc}^{\text{slv-slv}}]$ and $[H_{cc}^{\text{slv-mst}}]$ are
$N_c^{\text{slave}} \times N_c^{\text{slave}}$ and $N_c^{\text{slave}} \times N_c^{\text{master}}$, respectively. Because
$[G_{cc}^{\text{slv-slv}}]$ is always diagonally dominant and thus invertible, we have

$$\{E_c^{\text{slave}}\} = [G_{cc}^{\text{slv-slv}}]^{-1}[H_{cc}^{\text{slv-mst}}]\{E_c^{\text{master}}\} \tag{6.54}$$

which means that the corner unknowns defined on the slave corners can be
represented by those on the master corners. Therefore, one can construct a global
coarse problem by using only the corner unknowns on all the master corners. After
incorporating the nonconformal corner scheme into the LM-based FETI-DP
method, we find that the matrices and vectors related to the global corner system
remain the same for the master subdomains but have to be modified for the slave
subdomains [15].
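The mapping (6.54) can be illustrated on a single corner edge. The sketch below makes a simplifying assumption that is not stated in the text: each corner basis function has a unit, constant tangential component on its own segment of the corner edge and vanishes on the other segments (as for lowest-order edge elements with a suitable normalization). Under that assumption $[G_{cc}^{\text{slv-slv}}]$ is diagonal with the slave segment lengths, and $[H_{cc}^{\text{slv-mst}}]$ holds the overlap lengths between slave and master segments. The function names and the node lists are hypothetical.

```python
def segments(nodes):
    """Consecutive (start, end) pairs of a sorted list of node positions."""
    return list(zip(nodes[:-1], nodes[1:]))


def overlap(a, b):
    """Length of the intersection of two 1-D segments a and b."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def slave_from_master(slave_nodes, master_nodes, E_master):
    """Evaluate E_slave = G^{-1} H E_master for piecewise-constant edge bases."""
    sl, ms = segments(slave_nodes), segments(master_nodes)
    G_diag = [s[1] - s[0] for s in sl]                   # diagonal of G (slave-slave)
    H = [[overlap(s, m) for m in ms] for s in sl]        # slave-master overlaps
    return [sum(h * e for h, e in zip(row, E_master)) / g
            for row, g in zip(H, G_diag)]
```

With four master segments carrying the values 1, 2, 3, 4 on a unit edge and two slave segments of length 0.5, the slave unknowns become the length-weighted averages of the master values that each slave segment overlaps.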

6.4 CE-BASED NONCONFORMAL FETI-DP METHOD

The FETI-like domain decomposition method [33-37] employs cement elements
defined on both sides of the subdomain interface to glue neighboring subdomains
with nonconformal meshes. To formulate the global interface problem, it uses both
the cement element and the interface electric field as the communicator, resulting
in a double-sized global interface system compared to that of the FETI-DP method
with 2LM. Other differences between these two methods lie in the formulation of
the interface transmission condition, the choice and expansion of the dual variables,
and the treatment of dual unknowns associated with the geometrical crosspoints.
Due to the lack of the global corner system as well as the difficult choice of basis
functions to expand the cement variables, the FETI-like cement element method
suffers from slow convergence when solving the global interface problem. In this
section, the FETI-like cement element method is formulated to combine interface
dual unknowns with global primal unknowns defined on the corners. The
capability of dealing with nonconformal meshes is preserved. The new DDM
algorithm is referred to as the cement-element (CE)-based FETI-DP method in
what follows.

6.4.1 Nonconformal Interface and Conformal Corner Meshes

As for the CE-based FETI-DP method, we regard all the subdomain interfaces
(except corners) as unknown Neumann boundaries with an auxiliary unknown
representing the surface current density

$$\mathbf{j}^s = \hat{n}^s \times (\mu_r^{-1} \nabla \times \mathbf{E}^s) \quad \text{on } \Gamma^s \tag{6.55}$$

Each subdomain communicates with its neighboring subdomains through the
following Robin transmission condition

$$\mathbf{j}^s + \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}^s) = -\mathbf{j}^q + \alpha^q \hat{n}^q \times (\hat{n}^q \times \mathbf{E}^q) \quad \text{on } \Gamma^{sq} \tag{6.56}$$

where $\alpha = jk_0$.
To solve the BVP defined in (6.1) and (6.55) using the FEM, each
subdomain is discretized separately into finite elements such as tetrahedra. Based
on the formulation we have derived, we can either choose the same set of vector
(e.g., curl-conforming) basis functions $\{\mathbf{N}_b^s\}$ to expand both the electric field and
the interface auxiliary variable $\mathbf{j}^s$, or choose the orthogonal sets $\{\mathbf{N}_b^s\}$ and
$\hat{n}^s \times \{\mathbf{N}_b^s\}$ (as was done in [34–36]) to expand them, respectively. Here, we use the
same set of vector basis functions to expand both $\mathbf{E}^s$ and $\mathbf{j}^s$ [15]. It should be
noted that the following derivation is also valid if one chooses $\hat{n}^s \times \{\mathbf{N}_b^s\}$ as the
basis function to expand $\mathbf{j}^s$. By applying Galerkin’s method, the FEM equation for
the sth subdomain can be written as

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s & K_{ic}^s & 0 \\ K_{bi}^s & K_{bb}^s & K_{bc}^s & B_{bj}^s \\ K_{ci}^s & K_{cb}^s & K_{cc}^s & B_{cj}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \\ E_c^s \\ j^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ f_c^s \end{Bmatrix} \tag{6.57}$$

where

$$[B_{bj}^s] = \iint_{S^s \in \Gamma^s} \{\mathbf{N}_b^s\} \cdot \{\mathbf{N}_b^s\}^T \, dS$$

$$[B_{cj}^s] = \iint_{S^s \in \Gamma^s} \{\mathbf{N}_c^s\} \cdot \{\mathbf{N}_b^s\}^T \, dS$$

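In (6.57), $[K^s]$, $\{E^s\}$, and $\{f^s\}$ are the same as those defined in Section 6.3.1. By
testing the Robin boundary condition and summing over all the interfaces between
the sth subdomain and its neighboring subdomains using the interface basis
functions of the sth subdomain, the transmission condition can be converted into a
matrix equation as

$$\begin{bmatrix} D_{jb}^s & D_{jc}^s & C_{jj}^s \end{bmatrix} \begin{Bmatrix} E_b^s \\ E_c^s \\ j^s \end{Bmatrix} = \sum_{q \in \text{neighbor}(s)} \begin{bmatrix} U_{jb}^{sq} & U_{jc}^{sq} & V_{jj}^{sq} \end{bmatrix} \begin{Bmatrix} E_b^q \\ E_c^q \\ j^q \end{Bmatrix} \tag{6.58}$$

where

$$[D_{jb}^s] = \iint_{S^s \in \Gamma^s} (\hat{n}^s \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^s \times \{\mathbf{N}_b^s\}^T) \, dS$$

$$[D_{jc}^s] = \iint_{S^s \in \Gamma^s} (\hat{n}^s \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^s \times \{\mathbf{N}_c^s\}^T) \, dS$$

$$[C_{jj}^s] = \frac{j}{k_0} \iint_{S^s \in \Gamma^s} \{\mathbf{N}_b^s\} \cdot \{\mathbf{N}_b^s\}^T \, dS$$

$$[U_{jb}^{sq}] = \iint_{\Gamma^{sq}} (\hat{n}^q \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^q \times \{\mathbf{N}_b^q\}^T) \, dS$$

$$[U_{jc}^{sq}] = \iint_{\Gamma^{sq}} (\hat{n}^q \times \{\mathbf{N}_b^s\}) \cdot (\hat{n}^q \times \{\mathbf{N}_c^q\}^T) \, dS$$

$$[V_{jj}^{sq}] = \frac{1}{jk_0} \iint_{\Gamma^{sq}} \{\mathbf{N}_b^s\} \cdot \{\mathbf{N}_b^q\}^T \, dS$$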

Equations (6.57) and (6.58) can be combined to form a complete system for the sth
subdomain as

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s & K_{ic}^s & 0 \\ K_{bi}^s & K_{bb}^s & K_{bc}^s & B_{bj}^s \\ K_{ci}^s & K_{cb}^s & K_{cc}^s & B_{cj}^s \\ 0 & D_{jb}^s & D_{jc}^s & C_{jj}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \\ E_c^s \\ j^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ f_c^s \\ g^s \end{Bmatrix} \tag{6.59}$$

where

$$\{g^s\} = \sum_{q \in \text{neighbor}(s)} \begin{bmatrix} U_{jb}^{sq} & U_{jc}^{sq} & V_{jj}^{sq} \end{bmatrix} \begin{Bmatrix} E_b^q \\ E_c^q \\ j^q \end{Bmatrix}$$

After reordering unknowns in each subdomain, we obtain

$$\begin{bmatrix} K_{ii}^s & K_{ib}^s & 0 & K_{ic}^s \\ K_{bi}^s & K_{bb}^s & B_{bj}^s & K_{bc}^s \\ 0 & D_{jb}^s & C_{jj}^s & D_{jc}^s \\ K_{ci}^s & K_{cb}^s & B_{cj}^s & K_{cc}^s \end{bmatrix} \begin{Bmatrix} E_i^s \\ E_b^s \\ j^s \\ E_c^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ g^s \\ f_c^s \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_b^s \\ 0 \\ f_c^s \end{Bmatrix} + \begin{Bmatrix} 0 \\ 0 \\ g^s \\ 0 \end{Bmatrix} \tag{6.60}$$

which can further be written in a compact form as

$$\begin{bmatrix} K_{rr}^s & K_{rc}^s \\ K_{cr}^s & K_{cc}^s \end{bmatrix} \begin{Bmatrix} u_r^s \\ E_c^s \end{Bmatrix} = \begin{Bmatrix} f_r^s \\ f_c^s \end{Bmatrix} + \begin{Bmatrix} f_g^s \\ 0 \end{Bmatrix} \tag{6.61}$$

where

$$[K_{rr}^s] = \begin{bmatrix} K_{ii}^s & K_{ib}^s & 0 \\ K_{bi}^s & K_{bb}^s & B_{bj}^s \\ 0 & D_{jb}^s & C_{jj}^s \end{bmatrix}, \quad [K_{rc}^s] = \begin{bmatrix} K_{ic}^s \\ K_{bc}^s \\ D_{jc}^s \end{bmatrix}, \quad [K_{cr}^s] = \begin{bmatrix} K_{ci}^s & K_{cb}^s & B_{cj}^s \end{bmatrix}$$

$$\{u_r^s\} = \begin{Bmatrix} E_i^s \\ E_b^s \\ j^s \end{Bmatrix}, \quad \{f_r^s\} = \begin{Bmatrix} f_i^s \\ f_b^s \\ 0 \end{Bmatrix}, \quad \text{and} \quad \{f_g^s\} = \begin{Bmatrix} 0 \\ 0 \\ g^s \end{Bmatrix}$$

Based on the convention adopted in [2], $\{f_g^s\}$ represents the contribution from all
neighbors of the sth subdomain and can be written as $\{f_g^s\} = [R_j^s]^T \{g^s\}$, where
$[R_j^s] = [0 \;\; 0 \;\; I_{bb}^s]$ and $[I_{bb}^s]$ is an identity matrix.

From (6.61), we can find that the subdomain system matrices for different
subdomains become decoupled, while the interaction with the neighboring
subdomains is included in the mixed boundary condition at the interfaces. By using
the first equation in (6.61) and a Boolean matrix

$$[R_{bb}^s] = \begin{bmatrix} 0 & I_{bb}^s & 0 \\ 0 & 0 & I_{bb}^s \end{bmatrix}$$

the electric field and auxiliary unknowns at the subdomain interfaces can be found
as

$$\{u_b^s\} = \{E_b^s, j^s\}^T = [R_{bb}^s]\{u_r^s\} = [R_{bb}^s][K_{rr}^s]^{-1} \left( \{f_r^s\} + [R_j^s]^T \{g^s\} - [K_{rc}^s][B_c^s]\{E_c\} \right) \tag{6.62}$$

where $[B_c^s]$ is defined in Section 6.2.1.


Further, with the aid of the global boundary unknown vector $\{u\}$ and the
Boolean matrix $[Q^s]$, we can obtain the global interface equation as

$$\{u\} = \sum_{s=1}^{N_s} [Q^s]^T \{u_b^s\} = \sum_{s=1}^{N_s} [Q^s]^T [R_{bb}^s][K_{rr}^s]^{-1} \{f_r^s\} + \sum_{s=1}^{N_s} [Q^s]^T [R_{bb}^s][K_{rr}^s]^{-1} [R_j^s]^T \{g^s\} - \sum_{s=1}^{N_s} [Q^s]^T [R_{bb}^s][K_{rr}^s]^{-1} [K_{rc}^s][B_c^s]\{E_c\} \tag{6.63}$$

Similar to the conformal FETI-DP method with 2LM, the subdomain-level system
for the corner unknowns can be derived from (6.61) by eliminating $\{u_r^s\}$ as

$$\left( [K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}[K_{rc}^s] \right) [B_c^s]\{E_c\} = \{f_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\} - [K_{cr}^s][K_{rr}^s]^{-1}[R_j^s]^T \{g^s\} \tag{6.64}$$

Finally, we obtain two equations to relate the global interface dual unknowns
and the global corner primal unknowns

$$[K_{rr}]\{u\} + [K_{rc}]\{E_c\} = \{f_r\} \tag{6.65}$$

where

$$[K_{rr}] = [I] - \sum_{s=1}^{N_s} [Q^s]^T [R_{bb}^s][K_{rr}^s]^{-1}[R_j^s]^T \sum_{q \in \text{neighbor}(s)} \begin{bmatrix} U_{jb}^{sq} & V_{jj}^{sq} \end{bmatrix} [Q^q]$$

$$[K_{rc}] = \sum_{s=1}^{N_s} [Q^s]^T [R_{bb}^s][K_{rr}^s]^{-1} \left( [K_{rc}^s][B_c^s] - [R_j^s]^T \sum_{q \in \text{neighbor}(s)} [U_{jc}^{sq}][B_c^q] \right)$$

$$\{f_r\} = \sum_{s=1}^{N_s} [Q^s]^T [R_{bb}^s][K_{rr}^s]^{-1}\{f_r^s\}$$

and

$$[K_{cc}]\{E_c\} = \{f_c\} - [K_{cr}]\{u\} \tag{6.66}$$

where

$$[K_{cc}] = \sum_{s=1}^{N_s} [B_c^s]^T \left( \left( [K_{cc}^s] - [K_{cr}^s][K_{rr}^s]^{-1}[K_{rc}^s] \right)[B_c^s] + [K_{cr}^s][K_{rr}^s]^{-1}[R_j^s]^T \sum_{q \in \text{neighbor}(s)} [U_{jc}^{sq}][B_c^q] \right)$$

$$[K_{cr}] = \sum_{s=1}^{N_s} [B_c^s]^T [K_{cr}^s][K_{rr}^s]^{-1}[R_j^s]^T \sum_{q \in \text{neighbor}(s)} \begin{bmatrix} U_{jb}^{sq} & V_{jj}^{sq} \end{bmatrix} [Q^q]$$

$$\{f_c\} = \sum_{s=1}^{N_s} [B_c^s]^T \left( \{f_c^s\} - [K_{cr}^s][K_{rr}^s]^{-1}\{f_r^s\} \right)$$

Similarly, {u} can be solved by using one of the Krylov subspace iterative solvers
after eliminating {Ec } based on (6.65) and (6.66). The electric field inside each
subdomain can finally be obtained by solving (6.62).
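
The elimination of $\{E_c\}$ from (6.65) and (6.66) can be sketched numerically. The following is a minimal dense NumPy illustration under stated assumptions: the matrices are small random stand-ins rather than FEM matrices, (6.66) is read as $[K_{cc}]\{E_c\} = \{f_c\} - [K_{cr}]\{u\}$, the function name `solve_dual_primal` is hypothetical, and a direct solve stands in for the Krylov iteration that would be used in practice.

```python
import numpy as np

def solve_dual_primal(K_rr, K_rc, K_cr, K_cc, f_r, f_c):
    """Eliminate E_c, solve the reduced interface system for u, then recover E_c."""
    # From [K_cc]{E_c} = {f_c} - [K_cr]{u}:  E_c = K_cc^{-1} (f_c - K_cr u)
    Kcc_inv_Kcr = np.linalg.solve(K_cc, K_cr)
    Kcc_inv_fc = np.linalg.solve(K_cc, f_c)
    # Substituting into [K_rr]{u} + [K_rc]{E_c} = {f_r} gives the reduced system
    A = K_rr - K_rc @ Kcc_inv_Kcr
    b = f_r - K_rc @ Kcc_inv_fc
    u = np.linalg.solve(A, b)  # stand-in for a Krylov solver such as BiCGStab
    E_c = Kcc_inv_fc - Kcc_inv_Kcr @ u
    return u, E_c

rng = np.random.default_rng(0)
n_u, n_c = 8, 3  # hypothetical numbers of interface and corner unknowns

def small_complex(rows, cols):
    """Small random complex block, used only to build a well-conditioned toy system."""
    return 0.05 * (rng.standard_normal((rows, cols))
                   + 1j * rng.standard_normal((rows, cols)))

K_rr = np.eye(n_u) + small_complex(n_u, n_u)
K_rc = small_complex(n_u, n_c)
K_cr = small_complex(n_c, n_u)
K_cc = 2.0 * np.eye(n_c) + small_complex(n_c, n_c)
f_r = small_complex(n_u, 1)[:, 0]
f_c = small_complex(n_c, 1)[:, 0]

u, E_c = solve_dual_primal(K_rr, K_rc, K_cr, K_cc, f_r, f_c)

# Both original equations should be satisfied by the recovered pair (u, E_c)
residual_r = np.linalg.norm(K_rr @ u + K_rc @ E_c - f_r)
residual_c = np.linalg.norm(K_cc @ E_c + K_cr @ u - f_c)
```

In a large-scale setting the reduced operator would not be formed explicitly; each Krylov iteration would apply it through subdomain solves and one coarse corner solve.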

6.4.2 Extension to Nonconformal Interface and Corner Meshes

To remove the requirement for the conformal corner mesh, one can employ the
scheme described in Section 6.3.2. After incorporating the nonconformal corner
scheme into the nonconformal FETI-DP method with cement elements, we find
that the matrices and vectors related to the global corner system in (6.65) and
(6.66) remain the same for master subdomains but have to be modified for the
slave subdomains [15].

6.4.3 Comparison Between the LM- and CE-Based FETI-DP Methods

Comparing the formulations of the LM- and CE-based FETI-DP methods, we first
find that the dimension of $[K_{rr}^s]$, which has to be factorized during the tearing
stage, is $N_i^s + N_b^s$ and $N_i^s + 2N_b^s$ for the LM- and CE-based FETI-DP methods,
respectively, where $N_i^s$ and $N_b^s$ denote the number of interior and boundary
unknowns. Even with one more matrix factorization for $[N_{bb}^s]_q$ in (6.49), the
computational cost of the LM-based FETI-DP method is still smaller than that of
the CE-based FETI-DP method. Furthermore, in the global system solution stage,
each iteration step requires solving all subdomain equations directly. Because of
the smaller subdomain matrices, the LM-based FETI-DP method has faster
forward and back substitutions. This advantage also holds for the LM-based FETI-
DP method during the subdomain solution recovering stage. Finally, considering
the global interface system, because the LM-based FETI-DP method includes only
the Lagrange multipliers, the dimension of its global system matrix is only half of
that of the CE-based FETI-DP method, which includes both the interface electric
field and the auxiliary variable. Therefore, if these two methods converge with the
same number of steps, as we have observed in most cases, the LM-based FETI-DP
method is generally more efficient than the CE-based FETI-DP method [15].
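
The size comparison above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `krr_dimension` and the sample unknown counts are hypothetical, while the global interface dimensions are the ones quoted in the captions of Figures 6.10 to 6.12.

```python
def krr_dimension(n_interior, n_boundary, method):
    """Dimension of the subdomain matrix [K_rr^s] factorized in the tearing stage."""
    if method == "LM":
        return n_interior + n_boundary        # interior + interface field unknowns
    if method == "CE":
        return n_interior + 2 * n_boundary    # interface field + auxiliary variable
    raise ValueError("method must be 'LM' or 'CE'")

# Hypothetical subdomain with 1000 interior and 200 boundary unknowns
lm_dim = krr_dimension(1000, 200, "LM")
ce_dim = krr_dimension(1000, 200, "CE")

# Global interface dimensions (LM-based, CE-based) from Figures 6.10, 6.11, and 6.12
reported = [(1216, 2432), (1144, 2288), (1240, 2480)]
ratios = [ce / lm for lm, ce in reported]
```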

6.5 FETI-DP METHOD ENHANCED BY THE SECOND-ORDER TRANSMISSION CONDITION

The Robin boundary condition (6.10) and the resultant transmission conditions
(6.13) in Section 6.1.2 are equivalent to the first-order transmission condition
(FOTC) defined by (6.55) and (6.56) in Section 6.4.1. The FOTC can only
guarantee the transmission of propagating modes through the subdomain interface
[34, 38]. A higher-order transmission condition can be designed to transmit both
propagating and evanescent modes and therefore can be employed to speed up the
convergence of the iterative solution of the global interface problem [23, 24, 38,
39]. The transverse-electric second-order transmission condition (SOTC-TE) can
be written as

$$\mathbf{j}^s + \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}^s) + \beta^s \nabla \times [\hat{n}^s (\nabla \times \mathbf{E}^s)_n] = -\mathbf{j}^q + \alpha^q \hat{n}^q \times (\hat{n}^q \times \mathbf{E}^q) + \beta^q \nabla \times [\hat{n}^q (\nabla \times \mathbf{E}^q)_n] \quad \text{on } \Gamma^{sq} \tag{6.67}$$

and the fully second-order transmission condition (SOTC-FULL) can be written as

$$\mathbf{j}^s + \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}^s) + \beta^s \nabla \times [\hat{n}^s (\nabla \times \mathbf{E}^s)_n] + \gamma^s \nabla_t \nabla_t \cdot \mathbf{j}^s = -\mathbf{j}^q + \alpha^q \hat{n}^q \times (\hat{n}^q \times \mathbf{E}^q) + \beta^q \nabla \times [\hat{n}^q (\nabla \times \mathbf{E}^q)_n] - \gamma^q \nabla_t \nabla_t \cdot \mathbf{j}^q \quad \text{on } \Gamma^{sq} \tag{6.68}$$

where $\mathbf{j}^s$ is defined in (6.55). Comparing (6.67) and (6.68) to (6.56), we can find
that two terms, which correspond to the tangential variation of the normal magnetic
flux density $\nabla \times [\hat{n}^s (\nabla \times \mathbf{E}^s)_n]$, where $(\nabla \times \mathbf{E}^s)_n = \hat{n}^s \cdot (\nabla \times \mathbf{E}^s)$, and the tangential
variation of the surface charge density $\nabla_t \nabla_t \cdot \mathbf{j}^s$, are added into the FOTC gradually
to construct the SOTC-TE and the SOTC-FULL. These two added terms transmit
the transverse-electric (TE) and transverse-magnetic (TM) evanescent modes
through subdomain interfaces and thus improve the convergence.
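Similar to the FOTC with Lagrange multipliers that we have used for the
FETI-DP method with 2LM, we can write the SOTC-TE with Lagrange multipliers
as

$$\hat{n}^s \times (\mu_r^{-1} \nabla \times \mathbf{E}^s) + \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}^s) + \beta^s \nabla \times [\hat{n}^s (\nabla \times \mathbf{E}^s)_n] = \mathbf{\Lambda}_b^s \quad \text{on } \Gamma^s \tag{6.69}$$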

where  s can be determined based on the smallest mesh size and the order of
basis functions on the subdomain interface to account for all the evanescent modes
supported by the interface mesh [23, 38]. More specifically,  s 
 j / (k0  k ) ,
2
with k 
 j (kmax  k02 )1/ 2 and kmax   / hmin , where hmin denotes the smallest
mesh size on the subdomain interface. The boundary condition in (6.69) is of
particular interest because it can be implemented without introducing any extra
auxiliary variables on subdomain interfaces. When incorporated into the dual-
primal framework, it does not change the sparsity pattern of the subdomain
matrices compared to that in the FOTC case. The subdomain matrix symmetry is
also preserved, which is highly desirable for the storage and factorization by a
direct sparse solver [16]. The only extra computation is to calculate a localized
surface mass matrix, which is very cheap.
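Adding the boundary conditions from two neighboring subdomains and
eliminating the tangential magnetic field, we have

$$\mathbf{\Lambda}_b^s + \mathbf{\Lambda}_b^q = \alpha^s \hat{n}^s \times (\hat{n}^s \times \mathbf{E}_b^s) + \alpha^q \hat{n}^q \times (\hat{n}^q \times \mathbf{E}_b^q) + \beta^s \nabla \times [\hat{n}^s (\nabla \times \mathbf{E}_b^s)_n] + \beta^q \nabla \times [\hat{n}^q (\nabla \times \mathbf{E}_b^q)_n] \quad \text{on } \Gamma^{sq} \tag{6.70}$$

Then we enforce the continuity of the tangential electric field and the tangential
variation of the normal magnetic flux $\nabla \times [\hat{n}^s (\nabla \times \mathbf{E}_b^s)_n] = \nabla \times [\hat{n}^q (\nabla \times \mathbf{E}_b^q)_n]$ to
obtain

$$\begin{cases} \mathbf{\Lambda}_b^s + \mathbf{\Lambda}_b^q = (\alpha^s + \alpha^q)\, \hat{n}^q \times (\hat{n}^q \times \mathbf{E}_b^q) + (\beta^s + \beta^q)\, \nabla \times [\hat{n}^q (\nabla \times \mathbf{E}_b^q)_n] \\ \mathbf{\Lambda}_b^q + \mathbf{\Lambda}_b^s = (\alpha^s + \alpha^q)\, \hat{n}^s \times (\hat{n}^s \times \mathbf{E}_b^s) + (\beta^s + \beta^q)\, \nabla \times [\hat{n}^s (\nabla \times \mathbf{E}_b^s)_n] \end{cases} \tag{6.71}$$

on $\Gamma^{sq}$. It can be seen that in addition to the Dirichlet and Neumann continuity
conditions, the SOTC-TE also enforces the continuity of $\nabla \times [\hat{n} (\nabla \times \mathbf{E})_n]$ [16]. Due
to the use of the SOTC-TE, the computation of some matrices in Sections 6.2.2 and
6.3.1 has to be modified as follows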

[ M bbs ]   [ s (nˆ s {Nbs })  ( nˆ s {Nbs }T )   s (  {Nbs }) n (  {Nbs }T ) n ]dS


Ss s

[ Lsbc ]   [ s (nˆ s {Nbs })  ( nˆ s {N cs }T )   s (  {N bs }) n (  {N cs }T ) n ]dS


Ss s

[ Lsbc ]q
  sq
[ s (nˆ s {Nbs })  ( nˆ s {N cs }T )   s (  {Nbs }) n (  {N cs }T ) n ]dS

[ Lsq
 bc ]  sq
[ q (nˆ q {N bs })  (nˆ q {N cq }T )   q (  {N bs }) n (  {N cq }T ) n ]dS

 [( s   q )(nˆ q  {Nbs })  (nˆ q  {Nbq }T )


sq
[ M
bb ]  sq

 (  s   q )(  {Nbs }) n (  {Nbq }T ) n ]dS .

The SOTC-TE can be applied to both the conformal and nonconformal FETI-
DP methods with 2LM.

6.6 HYBRID NONCONFORMAL FETI/CONFORMAL FETI-DP METHOD

For some real-life engineering problems, it is neither necessary nor desirable to
mesh the entire computational domain at once. For example, in the computer-aided design
(CAD) of electronic devices, it is often the case that only a portion of the entire
device has to be redesigned repeatedly to achieve an optimal performance.
This portion therefore has to be remeshed multiple times, whereas the mesh for
the remaining portion can be kept the same. Consequently, there is an engineering
need for a DDM that can allow the user to generate meshes for different regions
separately based on geometrical features and then decompose each mesh
independently using an automatic mesh decomposer. With such a process, the
entire computational domain may contain conformal interfaces (generated by a
mesh decomposer) and nonconformal interfaces between different regions
partitioned before mesh generation. For such an application, it is necessary to
develop an effective DDM to deal with mixed conformal/nonconformal
multiregion meshes [26, 27].
For the aforementioned multiregion domain decomposition, when a
subdomain interface resides within one region, it must be mesh-conformal and
geometry-conformal. In this case, $[B_{bb}^s]$ in Section 6.3.1 is reduced to a Boolean
matrix, $[N_{bb}^s]_q$ and $[N_{bb}^{sq}]$ become identity matrices, and one does not have to deal
with projections on the geometrical crosspoints as described in Section 6.3.2. In
other words, the formulation presented in Section 6.2.2 is more efficient than that
in Sections 6.3.1 and 6.3.2 when the interface meshes are conformal. Thus, it is
necessary to design an efficient hybrid algorithm that takes advantage of the partially
conformal meshes while still handling general geometry-nonconformal and
mesh-nonconformal interfaces.
A critical component in this hybrid algorithm is a general crosspoint
correction technique designed to ensure good accuracy, fast convergence, and a
nonsingular global interface matrix [27]. The basic implementation of this
correction technique includes the following guidelines:
1. A Lagrange multiplier needs to be split into two when it is defined on an
edge connecting an interregion interface and an interior interface within one region
(refer to the square symbol in the right region in Figure 6.5).

2. With an automatic domain decomposition, it is possible to have geometry
crosspoints sitting on an interregion interface (refer to the square symbol in the left
region in Figure 6.5). If this is the case, convert the original corner unknowns into
the noncorner interface unknowns, define Lagrange multipliers on these
crosspoints, and split each Lagrange multiplier into two, as shown in Figure 6.6.

3. In geometry-nonconformal cases, one Lagrange multiplier may be shared
by more than two neighboring subdomains (i.e., its supporting domain overlaps
with more than two neighboring subdomains), as shown in Figure 6.7. In this case,
split such a Lagrange multiplier according to the number of overlapped
neighboring subdomains and let each Lagrange multiplier after splitting take care
of the communication from the reference subdomain to each neighboring
subdomain.

In fact, Guidelines 1 and 2 are two special cases of Guideline 3. It
should be noted that splitting Lagrange multipliers introduces extra boundary
unknowns into the original global interface problem, which may lead to a singular
global interface matrix equation. For this, a technique using a corner penalty term
is employed to remove the singularity or near singularity due to the redundancy in
the interface system [24].

Legend: edges where Lagrange multipliers are split into two; crosspoint where only
corner unknowns are defined.

Figure 6.5 Two regions of an entire computational domain decomposed into six and four
subdomains. For a better view, the two regions are artificially detached. ©2014 IEEE [27].

Figure 6.6 Illustration of a split Lagrange multiplier (associated with the bold line) defined on the
interface between two regions. After splitting, two independent Lagrange multipliers (still
associated with the bold line) are defined on the shaded and solid triangles. ©2014 IEEE
[27].

Figure 6.7 Illustration of crosspoint correction for two regions glued through nonconformal
interface meshes with one region decomposed into three subdomains and the other
consisting of only one subdomain. (a) The interface mesh for the region consisting of
three subdomains. (b) The interface mesh for the other region consisting of only one
subdomain. (c) A Lagrange multiplier associated with the dotted edge split into three
independent Lagrange multipliers since its supporting basis overlaps with three
neighboring subdomains. ©2014 IEEE [27].

6.7 NUMERICAL EXAMPLES

The algorithms described in the previous sections have been implemented on
different serial and parallel computing platforms. Specifically, the message passing
interface (MPI) parallel programming scheme is employed for parallel
implementations. All the structures under simulation are modeled and meshed with
CUBIT [40]. The entire computational domain meshed into curvilinear tetrahedral
elements is automatically decomposed into smaller subdomains by METIS [41].
For antenna array simulations, the repetition of the array structure is fully exploited
in order to save time for generating the mesh and factorizing repeated subdomain
matrices and to save computer memory to store repeated factorized subdomain
matrices. We will present several numerical examples to demonstrate and compare
the accuracy and convergence performance of these algorithms.

6.7.1 Wave Propagation in Free Space

In the first example, we simulate wave propagation in free space and use the result
to compare the convergence performance for the solution of the global interface
problem in the conformal FETI-DP and LM-based and CE-based nonconformal
FETI-DP methods as described in Sections 6.2.2, 6.3, and 6.4, respectively. We
design three different subdomains as shown in Figure 6.8, and use them to form a
computational domain with 3 × 3 subdomains to test three cases: (1) mesh with
conformal interfaces and conformal corners; (2) mesh with nonconformal
interfaces but conformal corners; and (3) mesh with nonconformal interfaces and
nonconformal corners, as shown in Figure 6.9. It is well known that the
convergence rate of an iterative solution of a linear system is closely related to its
eigenvalue distribution. Therefore, we compare the eigenspectra of the global
interface equations for $\{\lambda_b\}$
(in the conformal and LM-based nonconformal FETI-DP methods) and $\{u\}$ (in the
CE-based nonconformal FETI-DP method) in Figures 6.10, 6.11, and 6.12. For
the case with conformal interface and corner meshes whose result is plotted in
Figure 6.10, the convergence performance of all three methods is expected to
be similar because their eigenspectra look nearly identical except that the CE-based
FETI-DP has a pure propagation mode corresponding to the (1, 0) point on
the complex plane. Similarly, for the case with nonconformal interfaces and either
conformal or nonconformal corner meshes, we can see from Figures 6.11 and 6.12
that the performance of the LM- and CE-based nonconformal FETI-DP methods is
again similar. Our prediction is further validated by comparing the
convergence history for all the cases in Figure 6.13, where the BiCGStab iterative
solver with a stopping criterion of $10^{-9}$ is employed to solve the global interface
equations.

Figure 6.8 Three different subdomains with different mesh patterns and mesh densities on the
interface. (a) Free style with a mesh size h = λ/20. (b) Mapped style with a mesh size
h = λ/20. (c) Free style with a mesh size h = λ/25. ©2014 IEEE [15].

Panel labels of Figure 6.9 (3 × 3 grids, rows listed top to bottom):
(a) 2 2 2 / 2 2 2 / 2 2 2
(b) 1 2 1 / 2 2 2 / 1 2 1
(c) 1 1 1 / 1 3 1 / 1 1 1

Figure 6.9 Three domain decomposition patterns. (a) Conformal interface and conformal corner
meshes. (b) Nonconformal interface and conformal corner meshes. (c) Nonconformal
interface and nonconformal corner meshes. ©2014 IEEE [15].

Figure 6.10 Eigenspectra of the global interface system matrix for the case of conformal interface and
conformal corner meshes. (a) Conformal FETI-DP (matrix dimension: 1,216 × 1,216). (b)
LM-based nonconformal FETI-DP (matrix dimension: 1,216 × 1,216). (c) CE-based
nonconformal FETI-DP (matrix dimension: 2,432 × 2,432). ©2014 IEEE [15].

Figure 6.11 Eigenspectra of the global interface system matrix for the case of nonconformal interface
and conformal corner meshes. (a) LM-based nonconformal FETI-DP (matrix dimension:
1,144 × 1,144). (b) CE-based nonconformal FETI-DP (matrix dimension: 2,288 × 2,288).
©2014 IEEE [15].

Figure 6.12 Eigenspectra of the global interface system matrix for the case of nonconformal
interface and nonconformal corner meshes. (a) LM-based nonconformal FETI-DP
(matrix dimension: 1,240 × 1,240). (b) CE-based non-conformal FETI-DP (matrix
dimension: 2,480 × 2,480). ©2014 IEEE [15].

6.7.2 Wave Propagation in PML Medium

For open-domain simulations, the PML is usually employed to truncate the
computational domain because it can absorb both propagating and evanescent
modes without being placed far away from the structure of interest. In this chapter,
we implement the PML as a diagonally anisotropic artificial medium, with
$\bar{\bar{\varepsilon}}_r = \varepsilon_r [D]$ and $\bar{\bar{\mu}}_r = \mu_r [D]$, where

$$[D] = \begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix} \tag{6.72}$$

The diagonal entries of $[D]$ can be further written as $a = s_y s_z / s_x$,
$b = s_z s_x / s_y$, and $c = s_x s_y / s_z$, where $s_x$, $s_y$, and $s_z$ are functions of spatial
variables x, y, and z, respectively. In the PML region, $s_x$, $s_y$, and $s_z$ can be
expressed as $s_a = s' - js''$, where a could be x, y, or z, and $s'$ and $s''$ are real
numbers with $s' \ge 1$ and $s'' \ge 0$, which are used to control the attenuation of the
evanescent and propagating waves in the PML [1].
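
The relations for the diagonal entries can be checked with a short Python sketch. It assumes a PML that absorbs along z only, with the hypothetical stretching values $s_x = s_y = 1$ and $s_z = 1 - j$; under that assumption the entries come out as $(1 - j,\ 1 - j,\ 0.5 + j0.5)$, which matches the tensor used for the test problem in (6.73). The function name `pml_tensor_diagonal` is illustrative only.

```python
def pml_tensor_diagonal(s_x, s_y, s_z):
    """Return (a, b, c) of the diagonal PML tensor [D] from the stretching factors."""
    a = s_y * s_z / s_x
    b = s_z * s_x / s_y
    c = s_x * s_y / s_z
    return a, b, c

# Assumed stretching for a z-directed PML: s' = 1, s'' = 1 along z, no stretching in x, y
a, b, c = pml_tensor_diagonal(1.0, 1.0, 1.0 - 1.0j)
```

Because only $s_z$ differs from unity, the transverse entries $a$ and $b$ equal $s_z$ and the longitudinal entry $c$ equals $1/s_z$.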
Rigorously speaking, the transmission condition parameters $\alpha$ and $\beta$ in
(6.67) should be tensors, like $\bar{\bar{\varepsilon}}_r$ and $\bar{\bar{\mu}}_r$, to account for the anisotropy of the PML
medium. However, the derivation of a general expression for optimal $\alpha$ and $\beta$
will be very difficult if the subdomain interface is not flat and aligned with
Cartesian axes. Therefore, simplifying the two tensor parameters to two scalars is a
reasonable and practical choice. To check whether the isotropic approximation for
transmission condition parameters works well, we investigate the convergence
performance of several FETI-DP methods and the cement element method by
simulating wave propagation in the PML medium. The computational domain is a
$3 \times 3 \times 1$ m³ rectangular box filled with the PML medium, which has $\varepsilon_r = \mu_r = 1$
and

$$[D] = \begin{bmatrix} 1 - j & 0 & 0 \\ 0 & 1 - j & 0 \\ 0 & 0 & 0.5 + j0.5 \end{bmatrix} \tag{6.73}$$

Figure 6.13 Convergence history for all the cases by using the BiCGStab iterative solver with a
stopping criterion of $10^{-9}$. (a) The case of conformal interface and conformal corner
meshes corresponding to the decomposition pattern in Figure 6.9(a). (b) The case of
nonconformal interface and conformal corner meshes corresponding to the
decomposition pattern in Figure 6.9(b). (c) The case of nonconformal interface and
nonconformal corner meshes corresponding to the decomposition pattern in Figure 6.9(c).
©2014 IEEE [15].

Figure 6.14 (a) A computational domain filled with a PML medium and decomposed into nine
subdomains. (b) Convergence history of the iterative solution of the global interface
problem on the mesh in Figure 6.14(a) using the FETI-DP method with the FOTC. (c)
Using the FETI method with the SOTC-TE. (d) Using the cement element method with
the SOTC-FULL. (e) Using the FETI-DP method with the SOTC-TE.

As we know, this medium is used to absorb waves propagating along the z-axis.
The entire computational domain is further divided into nine subdomains and
discretized into tetrahedral elements with a certain mesh density, as shown in
Figure 6.14(a). In the simulation, we fix the wavelength at λ = 5 m and decrease the
mesh size h gradually from 0.5 m to 0.0625 m. In all setups, the interface mesh is
required to be conformal. The scalability with respect to the mesh size is shown by
the convergence history of the iterative solution of the global interface problem
using the FETI-DP method with the FOTC, the FETI method with the SOTC-TE,
the cement element method with the SOTC-FULL, and the FETI-DP method with
the SOTC-TE in Figures 6.14(b), 6.14(c), 6.14(d), and 6.14(e), respectively. By
comparing Figure 6.14(e) with Figures 6.14(b) and 6.14(c), it is observed that the
SOTC-TE yields a faster convergence than does the FOTC and that the global
corner coarse grid correction also helps to improve the convergence. A comparison
between Figures 6.14(e) and 6.14(d) shows that the FETI-DP method with the
SOTC-TE can achieve a better convergence performance than the cement element
method with the SOTC-FULL does. It should be noted that the FETI-DP method is
even more efficient due to the reduced size of the global interface system and
symmetry of the subdomain matrices.

Figure 6.15 Eigenspectra of the global interface system on the mesh in Figure 6.14(a), with a mesh
size h = λ/20. (a) Using the FETI-DP method with the FOTC. (b) Using the FETI
method with the SOTC-TE. (c) Using the cement element method with the SOTC-FULL.
(d) Using the FETI-DP method with the SOTC-TE.

Figure 6.15 shows the eigenspectra of the global interface equations for the
mesh in Figure 6.14(a) with a mesh size of h = λ/20, using the four different DDM
solvers. It can be seen that the simplified TC parameters work well for all solvers
except for the FETI-DP method with the FOTC. Because its interface system
matrix is no longer positive-definite, the FOTC converges at the slowest rate when
solving the global interface problem. Comparing Figure 6.15(d) with Figure
6.15(b), one can see that the use of the global corner coarse grid correction helps to
make the eigenvalue distribution more compact, which results in better
convergence as shown in Figure 6.14(e). Note that the interface system of the
FETI-DP method has a dimension of 1,360, whereas those of the FETI method and
the cement element method are 1,472 and 3,160, respectively.

6.7.3 Vivaldi Antenna Array

The third example is designed to explore the capability of the LM-based
nonconformal FETI-DP method to analyze large-scale antenna arrays and compare
its performance to that of the conformal FETI-DP method with 2LM and the
CE-based nonconformal FETI-DP method. All three solvers are equipped with the
FOTC. The size of the simulated Vivaldi antenna array increases from 3 × 3 to
100 × 100. To truncate the computational domain, the first-order ABC is placed at
one extra unit cell away surrounding the array in the x-y plane. The distance
between two adjacent elements in both the x- and y-directions is set to be 36 mm.
Figure 6.16 shows the Vivaldi antenna element, where the height, width, and
thickness of the substrate are d = 33.3 mm, w = 34.0 mm, and h = 1.27 mm,
respectively. The lossless substrate has a relative permittivity of 6.0. The radius of
the hollow circle is chosen to be R = 2.5 mm. The half-width of the slot line varies
with z according to an exponential function given by w(z) = 0.25exp(0.123z) mm.
This function gives a half-width of 15 mm at the opening. The antenna is fed by a
coaxial line with an inner radius rin = 0.375 mm and an outer radius rout = 0.875
mm from under the ground. A 5-mm coaxial line is modeled and then terminated
with a waveguide port boundary condition with only the TEM mode assumed at
the end of the coax.
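
The stated taper can be checked against the quoted opening half-width with a short Python sketch. It assumes that z is measured in millimeters from the start of the taper and that the opening sits at z = d = 33.3 mm (the substrate height); neither assumption is stated explicitly in the text, and the function name is illustrative.

```python
import math

def slot_half_width_mm(z_mm):
    """Half-width of the slot line, w(z) = 0.25 exp(0.123 z), with z in mm."""
    return 0.25 * math.exp(0.123 * z_mm)

d_mm = 33.3                                    # substrate height from the text
w_open = slot_half_width_mm(d_mm)              # half-width at the assumed opening
z_for_15mm = math.log(15.0 / 0.25) / 0.123     # z at which w(z) reaches 15 mm
```

Under these assumptions the half-width at z = 33.3 mm is close to 15 mm, which is consistent with the value quoted for the opening.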

Figure 6.16 Structure of a Vivaldi antenna element.



The active reflection coefficients at certain ports of a 10 × 10 Vivaldi array on
an infinite ground plane are calculated and compared with the result of the FETI-DP
method with 2LM on a conformal mesh, as shown in Table 6.1. From the first
and second rows of Table 6.1, we find that, when using the identical conformal
meshes, the relative error in the solution is on the order of the stopping
residual ($10^{-5}$) chosen for the iterative solution of the global system equation. The
larger error in the nonconformal cases (third and fourth rows) is likely the result of
using a different mesh than that used to compute the reference solution.

Table 6.1
Comparison of the Active Reflection Coefficient in Terms of the Relative L2 Norm Error, Using the
Result of the FETI-DP Method with 2LM as Reference, for a 10 × 10 Vivaldi Antenna Array,
Simulated at 3 GHz, with the Main Beam Steered to θ = 0° and φ = 0°

Method                                          (5,5) Element     (10,10) Element   All Elements
LM-based FETI-DP (with conformal mesh)          1.48 × 10^-5      2.11 × 10^-4      8.80 × 10^-5
CE-based FETI-DP (with conformal mesh)          1.23 × 10^-5      5.36 × 10^-5      3.12 × 10^-5
LM-based FETI-DP (with nonconformal mesh)       8.95 × 10^-4      2.11 × 10^-3      1.20 × 10^-3
CE-based FETI-DP (with nonconformal mesh)       1.15 × 10^-3      1.95 × 10^-3      1.35 × 10^-3
Source: [15].

The convergence history of the iterative solution of the global interface
problem for the 100 × 100 array simulated by the conformal, LM-based, and CE-based
FETI-DP methods is plotted in Figure 6.17(a), and the computed radiation
patterns are compared in Figures 6.17(b) and 6.17(c). The BiCGStab iterative
solver is employed with a stopping criterion of $10^{-3}$. For this array, the LM- and
CE-based FETI-DP methods have a similar convergence behavior and yield nearly
identical results to that of the conformal FETI-DP method.
In Table 6.2, we list the computation resources used to simulate Vivaldi
antenna arrays of different sizes by the LM-based FETI-DP method. All examples
are run on an HP workstation, equipped with a 2.66-GHz Intel Xeon processor and
12 GB memory. To plot the scalability curve as shown in Figure 6.18, we record
the computation time for solving the global interface dual unknowns as well as the
total computation time. It is observed that the computation time increases linearly
with the total number of unknowns for this example.

Table 6.2
Computational Information of the Nonconformal FETI-DP Method for Simulating Various Vivaldi
Antenna Arrays

Array Size    Number of Unknowns    Interface Time (hh:mm:ss)    Number of Iterations    Total Time (hh:mm:ss)

3 × 3         209,792               00:00:21                     28                      00:02:20
10 × 10       1,908,552             00:04:54                     44                      00:13:43
31 × 31       17,410,080            00:49:27                     51                      02:01:21
100 × 100     178,235,832           07:19:58                     40                      19:32:20

Source: [15].
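
The linear-scaling observation can be checked directly from the entries of Table 6.2. The following is a minimal sketch, not part of the original study: it converts the tabulated total times to seconds and fits a straight line to log(time) versus log(unknowns). The helper names `to_seconds` and `loglog_slope` are illustrative assumptions; a fitted slope close to 1 corresponds to approximately linear growth.

```python
import math

# (array size, number of unknowns, total time hh:mm:ss), copied from Table 6.2
ROWS = [
    ("3 x 3", 209_792, "00:02:20"),
    ("10 x 10", 1_908_552, "00:13:43"),
    ("31 x 31", 17_410_080, "02:01:21"),
    ("100 x 100", 178_235_832, "19:32:20"),
]


def to_seconds(hms):
    """Convert an hh:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


unknowns = [row[1] for row in ROWS]
total_seconds = [to_seconds(row[2]) for row in ROWS]
slope = loglog_slope(unknowns, total_seconds)

for (size, n, _), t in zip(ROWS, total_seconds):
    # Time per unknown stays within the same order of magnitude across sizes.
    print(f"{size:>9}: {t:6d} s, {1e6 * t / n:7.1f} microseconds per unknown")
print(f"log-log slope of total time versus unknowns: {slope:.2f}")
```

The time per unknown varies by less than a factor of two while the problem size grows by almost three orders of magnitude, which is what the scalability curve in Figure 6.18 expresses graphically.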

Figure 6.17 Simulation of the 100 × 100 Vivaldi antenna array at 3 GHz. (a) Convergence history. (b)
Broadside scan E-plane relative pattern. (c) Broadside scan H-plane relative pattern.
Legend entries: FETI-DP with 2LM; LM-based FETI-DP (with nonconformal mesh); CE-based
FETI-DP (with nonconformal mesh). ©2014 IEEE [15].

Figure 6.18 Computation time as a function of the total number of unknowns for various Vivaldi
antenna arrays. ©2014 IEEE [15].

6.7.4 Vivaldi Antenna Array with a Large Scan Angle

An antenna array is a typical case in which the outgoing wave may propagate toward
the truncation boundary in an oblique direction. In that case, no matter how
far away a planar ABC is placed, its absorption is limited and the artificial
reflection may not be reduced to a desired level. To effectively reduce the artificial
reflection, we can employ an oblique ABC given by [28]

$$\hat{n} \times (\nabla \times \mathbf{E}) = -jk_0 \cos\theta_s \, \hat{n} \times (\hat{n} \times \mathbf{E}) + \frac{jk_0}{\cos\theta_s} \, \hat{t} \, (\hat{t} \cdot \mathbf{E}) \tag{6.74}$$

where $\hat{t} = (\hat{s} \times \hat{n}) \sin\theta_s \cos\phi_s + \hat{s} \sin\theta_s \sin\phi_s$ and $\hat{n}$ denotes the outward unit
normal vector of the planar truncation surface. The angle for perfect absorption of
this ABC can be tuned by the parameters $\theta_s$ and $\phi_s$. Obviously, (6.74) reduces to
the conventional ABC if $\theta_s = 0^\circ$. Its reflection coefficients for the perpendicular
(E) and parallel (H) polarizations can be derived as

$$R_\perp = \frac{\cos\theta - \cos\theta_s}{\cos\theta + \cos\theta_s}, \qquad R_\parallel = \frac{\cos\theta_s - \cos\theta}{\cos\theta_s + \cos\theta} \tag{6.75}$$

If the outgoing wave under simulation propagates towards the truncation boundary
at a certain specified angle, for example, the direction of the main beam of the
radiated wave is specified, we can always tune this ABC to minimize the reflection
error. Figure 6.19 compares the absorption performance of the conventional and
oblique ABCs over a range of incident angles.
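
As a rough illustration of the reflection coefficients in (6.75), the sketch below evaluates both polarizations for the conventional ABC (tuning angle of 0°) and for an oblique ABC tuned to 60°. The function names `abc_reflection` and `to_db` are illustrative assumptions, and the incident wave is assumed to lie in the plane of the tuning direction.

```python
import math


def abc_reflection(theta_deg, theta_s_deg=0.0):
    """Evaluate (6.75): return (R_perp, R_par) for incidence angle theta
    and ABC tuning angle theta_s, both in degrees."""
    c = math.cos(math.radians(theta_deg))
    cs = math.cos(math.radians(theta_s_deg))
    r_perp = (c - cs) / (c + cs)
    r_par = (cs - c) / (cs + c)
    return r_perp, r_par


def to_db(r):
    """Magnitude of a reflection coefficient in dB (-inf for zero)."""
    return 20.0 * math.log10(abs(r)) if r != 0.0 else float("-inf")


for theta in (0.0, 30.0, 45.0, 60.0, 75.0):
    conventional, _ = abc_reflection(theta, 0.0)
    oblique, _ = abc_reflection(theta, 60.0)
    print(f"theta = {theta:4.1f} deg: conventional {to_db(conventional):8.2f} dB, "
          f"oblique {to_db(oblique):8.2f} dB")
```

Because the two expressions in (6.75) differ only in sign, both polarizations have the same reflection magnitude. The oblique ABC trades absorption near normal incidence for a null at the tuning angle, which is the behavior the main-beam tuning relies on.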
To investigate the performance of the oblique ABC, a 20 × 20 Vivaldi antenna
array is considered. For the mesh truncation of the upper half space, we have two
setups. One is a hemispherical surface with a base radius of $7\lambda$, whereas the other
is a rectangular surface placed $1\lambda$ away from both the top and the side of the
antenna array. The size of the rectangular box is $8.8\lambda \times 9.2\lambda \times 1.33\lambda$. The
second setup is computationally more efficient than the first one because its
computational domain is much smaller. However, in the second setup, the radiated
field is incident on the top truncation surface at a much larger angle than in
the first one if the antenna array is set to radiate away from broadside. In this case,
the oblique ABC can provide good absorption while minimizing the
size of the computational domain. The 20 × 20 Vivaldi antenna array is simulated
at 3.0 GHz using: (1) the conventional ABC with the hemispherical truncation
surface; (2) the conventional ABC with the rectangular truncation surface; and (3)
the oblique ABC with the rectangular truncation surface, for the main beam $(\theta_s, \phi_s)$
steered to $(60^\circ, 0^\circ)$. The near-zone field distributions in the x-z plane are plotted in
Figure 6.20. We take the result of Case 1 shown in Figure 6.20(a) as the reference
solution and enlarge the portion close to the antenna array in Figure 6.20(b) for a
better comparison with the results of Cases 2 and 3, which are shown in Figures
6.20(c) and 6.20(d). For $(\theta_s, \phi_s) = (60^\circ, 0^\circ)$, Case 3 yields a visibly
better result than Case 2. The
far-field radiation patterns calculated in the three cases above are compared in
Figure 6.21, which shows that the result of Case 2 deviates from the reference
solution by 3 dB, whereas the result of Case 3 has a much smaller deviation. For
Cases 2 and 3, it takes 9.2 minutes to simulate one frequency point
on one computational node, which contains 16 Intel Xeon 2.70-GHz processors.
Both cases are computed using the conformal FETI-DP method with 2LM. The
result of the reference case (Case 1) is obtained using the hybrid nonconformal
FETI/conformal FETI-DP method described in Section 6.6, which takes 43.5 minutes for
one frequency point on the same node.

Figure 6.19 Comparison of the reflection coefficients of the conventional ABC and the oblique ABC
(tuned to $\theta_s = 60^\circ$ and $\phi_s = 0^\circ$) for the perpendicular (E) and parallel (H) polarizations.
©2014 Wiley [28].

Figure 6.20 $\mathrm{Re}(E)$ for the 20 × 20 Vivaldi antenna array in the x-z plane at 3.0 GHz with the steering
angle set at $(\theta_s, \phi_s) = (60^\circ, 0^\circ)$. (a) Computed using the conventional ABC with a
hemispherical truncation surface. (b) Same as (a), but plotted in a limited region for the
purpose of comparison. (c) Computed using the conventional ABC with a rectangular
truncation surface. (d) Computed using the oblique ABC with a rectangular truncation
surface. ©2014 Wiley [28].

Figure 6.21 Copolarized radiation patterns for the 20 × 20 Vivaldi antenna array in the x-z plane at
3.0 GHz when the main beam is steered to $(\theta_s, \phi_s) = (60^\circ, 0^\circ)$. ©2014 Wiley [28].

6.7.5 NRL Vivaldi Antenna Array with Radome

In this example, we consider the near-field interaction between a phased-array
antenna and its surrounding environment. The antenna array adopted here was
designed by the Naval Research Laboratory (NRL) [27, 42], and it is covered by a
hemispherical radome, which provides mechanical protection. The hybrid
nonconformal FETI/conformal FETI-DP method is employed to solve this
multiregion problem.
The dual-polarized 11 × 11 NRL Vivaldi antenna phased array is shown in
Figure 6.22(a). Each Vivaldi antenna element consists of three layers of metal and
is printed on a dielectric substrate with a height h = 246.253 mm, a width w =
35.56 mm, and a thickness d = 3.3274 mm. The relative permittivity of the
dielectric slab is $\epsilon_r = 2.2 - j0.0009$. The separation from the middle layer to the
top layer is the same as that to the bottom layer, and the metals on the top and bottom
layers have the same shape, which is shown in Figure 6.22(c). The top/bottom
layer and the middle layer are connected by vias with a radius r = 0.79 mm. Each
antenna element is fed by a coplanar transmission line as shown in Figure 6.22(b).
All antenna elements are connected to each other by solid metal posts and mounted
vertically on a finite ground plane whose size is 528 mm × 528 mm. For more
geometrical details, the reader is referred to [27, 42].

Figure 6.22 An 11 × 11 NRL antenna array. (a) Measurement setup in anechoic chamber. (b) Shape
of the metal on the middle layer. (c) Shape of the metal on the top (bottom) layer.
©2014 IEEE [27].

At 3.02 GHz, the hemispherical radome has a base radius of $5.5\lambda$. The
thickness and the relative permittivity of the radome are $0.1\lambda$ and $\epsilon_r = 2.0 - j1.0$,
respectively. The conventional first-order ABC is used on a hemispherical surface
placed $1\lambda$ away from the exterior boundary of the hemispherical radome. In this
case, the first-order ABC is a better choice because the truncation surface can be
made conformal to the radome to reduce the size of the computational domain. In
addition, it provides good absorption for waves radiating along any direction.
Figure 6.23 shows the radiation patterns of the array with and without the radome.
All radiation patterns are normalized by the value in the maximum radiation
direction of the array without the radome. It can be seen that, due to the loss in the
radome, the emitted power in the main beam direction is reduced by around 3 dB.
The result using conformal meshes on the interregion interfaces is also plotted for
comparison. Using nonconformal interregion interface meshes does not
sacrifice the accuracy of the solution, since the two sets of data lie on top of each
other. The field distribution is also plotted in Figure 6.24 for the cases with and
without the radome.

Figure 6.23 Comparison between the radiation patterns for the array with and without the radome at
3.02 GHz and steering angle $\theta_s = 60^\circ$ and $\phi_s = 0^\circ$. Legend entries: without radome,
nonconformal mesh; with radome, conformal mesh; with radome, nonconformal mesh.
©2014 IEEE [27].

Figure 6.24 $|E|$ in the $\phi = 0^\circ$ plane for H-pol excitation at 3.02 GHz and steering angle $\theta_s = 60^\circ$ and
$\phi_s = 0^\circ$. (a) The NRL array itself. (b) The NRL array with a radome.

For the simulation of these two examples, the region containing the 11 × 11
NRL array is decomposed into 256 subdomains, and the interregion interface is
placed just above the top of the antenna elements. The other region containing the
radome is meshed by CUBIT and further decomposed into 200 subdomains by
METIS. This array-radome example involves 11,403,519 unknowns, 1,482,864
dual unknowns, and 18,552 corner unknowns. Finally, the convergence history of
the iterative solution of the global interface problem for the array with the radome
is given in Figure 6.25. It should be noted that for large-scale problems, the
nonconformal meshes on the interfaces between different regions may introduce some
numerical resonance and yield slower convergence than a conformal mesh does.

Figure 6.25 Convergence history of the iterative solution of the global interface problem for the
NRL array with the radome.

6.7.6 Medium-Scale Two-Dimensional Microring Resonator

In the last two examples, we apply the FETI-DP method with the SOTC-TE to the
analysis of computationally complex optical devices and compare its numerical
performance to that of the FETI-DP method with the FOTC. In the first example,
we simulate the TMz mode in a microring resonator (MRR) shown in Figure 6.26
for the purpose of validation. This structure is invariant along the direction
perpendicular to the page, and thus it can be modeled as a 2-D problem to validate
the 3-D solution. In the 3-D simulation, the ring/bus structure is assumed to have a
finite thickness and a perfectly conducting plane is placed at the top and bottom. If
the thickness is smaller than one-half of a wavelength, a vertically invariant field
can be preserved in the 3-D configuration. The MRR acts as a bandstop filter. To
enhance reflection at one of the resonant frequencies, a first-order grating is
employed along the inner circle of the upper half of the MRR [16]. If the ring lies in the
x-y plane, the parametric function for the inner radius of the ring is given by

$$\begin{cases} x(\phi) = (r_1 + \delta \sin 2m\phi)\cos\phi, \quad y(\phi) = (r_1 + \delta \sin 2m\phi)\sin\phi & \text{for } 0 \le \phi \le \pi \\ x(\phi) = r_1 \cos\phi, \quad y(\phi) = r_1 \sin\phi & \text{for } \pi \le \phi \le 2\pi \end{cases} \tag{6.76}$$

where $r_1 = 8.267$ μm is the inner radius, $\delta = 8.3 \times 10^{-3}$ μm is the grating size, and
$m = 58$ is the azimuthal order. The outer radius parametric function is given by
$x(\phi) = r_2 \cos\phi$ and $y(\phi) = r_2 \sin\phi$ for $0 \le \phi \le 2\pi$ with $r_2 = 8.68$ μm. The coupling
gap between the ring and the bus waveguide is set to $g = 0.248$ μm. The relative
permittivities of the core and the cladding are 4.0 and 1.0, respectively. The bus
waveguide is placed parallel to the x-axis. Around the wavelength $\lambda$ = 1,550 nm, the
guided mode in the bus waveguide has the electric field profile in the core and in
the cladding given by

$$\mathbf{E}(x, y) = \begin{cases} \hat{z} \cos[k_y (y - y_0)] \exp(-j\beta x) & |y - y_0| \le \dfrac{d}{2} \\ \hat{z} \cos\left(\dfrac{k_y d}{2}\right) \exp\left[-\alpha\left(|y - y_0| - \dfrac{d}{2}\right)\right] \exp(-j\beta x) & |y - y_0| \ge \dfrac{d}{2} \end{cases} \tag{6.77}$$

where $\beta = 6.8355$ μm$^{-1}$ is the propagation constant, $k_y = 4.3593$ μm$^{-1}$ and
$\alpha = 5.5039$ μm$^{-1}$ describe the y-dependence, $d = 0.413$ μm is the waveguide
width, and $y_0 = 9.135$ μm denotes the center of the bus waveguide.
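
The mode constants quoted above can be cross-checked against the standard relations for a symmetric slab waveguide. The sketch below is an illustration under stated assumptions rather than part of the original analysis: it assumes a free-space wavelength of exactly 1,550 nm, uses the symbol names `beta`, `k_y`, and `alpha` for the three constants, and tests the separation relations in the core and cladding together with the matching condition $k_y \tan(k_y d/2) = \alpha$, which follows from requiring the y-derivative of the field in (6.77) to be continuous at the core edge.

```python
import math

# Values quoted in the text (lengths in micrometers, wavenumbers in 1/micrometer)
wavelength = 1.550          # assumed exact design wavelength
eps_core, eps_clad = 4.0, 1.0
beta, k_y, alpha = 6.8355, 4.3593, 5.5039
d = 0.413

k0 = 2.0 * math.pi / wavelength


def rel_err(value, reference):
    """Relative difference between a computed value and its reference."""
    return abs(value - reference) / abs(reference)


# Core: k_y^2 + beta^2 = eps_core * k0^2
core_residual = rel_err(k_y**2 + beta**2, eps_core * k0**2)
# Cladding: beta^2 - alpha^2 = eps_clad * k0^2
clad_residual = rel_err(beta**2 - alpha**2, eps_clad * k0**2)
# Even mode: continuity of dE_z/dy at |y - y0| = d/2
match_residual = rel_err(k_y * math.tan(k_y * d / 2.0), alpha)

print(f"core separation residual:     {core_residual:.2e}")
print(f"cladding separation residual: {clad_residual:.2e}")
print(f"matching condition residual:  {match_residual:.2e}")
print(f"effective index beta/k0:      {beta / k0:.4f}")
```

All three residuals come out well below 1%, so the quoted constants are mutually consistent with a guided even mode at this wavelength.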

Figure 6.26 (a) Top view of a ring/bus structure modeled with CUBIT. The ring has the first-order
grating along the upper half on the inner side. (b) Enlarged view of the rectangular
region in Figure 6.26(a) showing the grating. © Optical Society of America 2014 [16].

For the simulation, the entire ring/bus structure is enclosed by a box whose
dimensions are 24.8 μm, 24.8 μm, and 0.27 μm in the Cartesian coordinate system.
Except for the top and bottom boundaries, all four sides of the computational
domain are truncated by the PML, which has a thickness of 0.827 μm on each side.
The core and cladding regions are discretized with mesh sizes of 0.0744 μm and
0.149 μm, respectively. As a result, there are at least six layers of elements in the
PML for each side truncation. The entire structure is excited by the mode described
in (6.77) through a current sheet located on the bus. Two contradirectional waves
are excited from the sheet, and only the forward wave that bypasses the ring is of
interest. Accordingly, the reference planes for reflection and transmission
coefficient calculation are placed on the left and right sides of the current sheet, as
shown in Figure 6.26. After being meshed with CUBIT, the entire computational
domain is decomposed into $N_s = 512$ subdomains, involving 7,522,572 unknowns,
889,488 dual unknowns, and 9,880 corner unknowns when the second-order
hierarchical vector basis functions [43] are employed. The simulation is carried out
from $1.924 \times 10^{14}$ to $1.944 \times 10^{14}$ Hz, and the computed reflection and transmission
coefficients are plotted in Figure 6.27 as functions of frequency. To validate the
3-D simulation result, a 2-D simulation is carried out by COMSOL Multiphysics
[44] using the same geometry and material setup in the x-y plane. The two sets of
results are in excellent agreement, as shown in Figure 6.27. The field distribution
in the $z = 0$ μm plane is plotted in Figure 6.28 for $1.9246 \times 10^{14}$ and
$1.9355 \times 10^{14}$ Hz. The standing-wave field profile in the ring shown in Figure 6.28
indicates that the MRR has a stronger resonance at $1.9355 \times 10^{14}$ Hz. Most of the
energy guided on the bus is reflected by the grating in the ring instead of being
delivered to the receiving port. Usually, an iterative solver takes more iterations to
converge at resonant frequencies because the matrix is more ill-conditioned. This
expectation is confirmed by the convergence history of the iterative solution of
the global interface problem given in Figure 6.29, which also shows that the FETI-
DP method with the SOTC-TE effectively overcomes the convergence difficulty
encountered with the FOTC when a PML mesh truncation is employed.

Figure 6.27 Power reflection and transmission coefficients of the ring/bus structure shown in Figure
6.26 from $1.924 \times 10^{14}$ Hz to $1.944 \times 10^{14}$ Hz. © Optical Society of America 2014 [16].

Figure 6.28 $\mathrm{Re}(E_z)$ in the $z = 0$ μm plane. (a) At $1.9246 \times 10^{14}$ Hz. (b) At $1.9355 \times 10^{14}$ Hz. © Optical
Society of America 2014 [16].

Figure 6.29 Convergence history of the iterative solution of the global interface problem using the
FETI-DP method with the SOTC-TE and the FOTC. © Optical Society of America 2014
[16].

Next, we examine the parallel efficiency of the FETI-DP method with the
SOTC-TE by solving the bus/ring resonator problem using various numbers of
processors. Table 6.3 shows the time for preprocessing, the time for solving the
interface problem, and the total computation time. Based on the computation times
in Table 6.3, the speed-up, which is defined with respect to the wall-clock time
using four processors as

$$\text{Speed-up} = \frac{T_4}{T_{N_p}} \tag{6.78}$$

is plotted in Figure 6.30. Note that $T_{N_p}$ is the total wall-clock time using $N_p$
processors. As can be seen, an excellent speed-up has been achieved using up to
128 processors. With 128 processors employed, the peak memory usage is 55.7
GB.

Table 6.3
Computation Times for the Bus/Ring Resonator Problem with 7,522,572 Unknowns and 512
Subdomains

Number of Processors    Time for Preprocessing (s)    Time for Solving the Interface (s)    Total Time (s)

4                       492.8                         3148.2                                3652.4
8                       270.7                         1795.4                                2072.2
16                      137.7                         969.8                                 1110.4
32                      72.1                          553.9                                 627.6

Source: [16].
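
The speed-up definition in (6.78) can be applied directly to the rows of Table 6.3. The following sketch is illustrative only: the function name `speed_up` and the derived quantity "efficiency" (measured speed-up divided by the ideal value $N_p/4$) are assumptions introduced here, and only the processor counts listed in the table are used.

```python
# Total wall-clock times in seconds from Table 6.3, keyed by processor count
TOTAL_TIME = {4: 3652.4, 8: 2072.2, 16: 1110.4, 32: 627.6}
REFERENCE_PROCESSORS = 4


def speed_up(n_processors, times=TOTAL_TIME):
    """Speed-up as defined in (6.78): T_4 / T_Np."""
    return times[REFERENCE_PROCESSORS] / times[n_processors]


def efficiency(n_processors, times=TOTAL_TIME):
    """Measured speed-up relative to the ideal linear value Np / 4."""
    ideal = n_processors / REFERENCE_PROCESSORS
    return speed_up(n_processors, times) / ideal


for n_p in sorted(TOTAL_TIME):
    print(f"Np = {n_p:3d}: speed-up = {speed_up(n_p):5.2f}, "
          f"ideal = {n_p / REFERENCE_PROCESSORS:4.1f}, "
          f"efficiency = {100.0 * efficiency(n_p):5.1f}%")
```

The same two functions can be applied to the preprocessing and interface-solution columns by passing a different dictionary as `times`, which reproduces the three measured curves plotted against the ideal line.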

Figure 6.30 Parallel speed-up versus the number of processors with $N_s = 512$. The computation time
using four processors is taken as the reference. Legend entries: ideal linear speed-up;
preprocessing speed-up; interface speed-up; total speed-up. © Optical Society of America
2014 [16].

6.7.7 Full-Scale Three-Dimensional Double-Microring Resonator

In the last example, a fabricated 3-D Si3N4/SiO2 evanescently coupled double-
microring resonator (ECDMRR) is simulated. The device contains two rings,
whose top view is given in Figure 6.31(a). The outer ring is a plain ring with no
grating. A first-order grating is fabricated only on the outer boundary of the top
half of the inner ring, as shown in the enlarged view in Figure 6.31(b). The radii of
the four concentric circles from the innermost to the outermost are 28.948 μm,
29.548 μm, 30.148 μm, and 31.148 μm, respectively. The parametric equation has
the same format as that in (6.76) for the circle with grating, except that in this
example the grating size is $\delta = 0.1$ μm and the azimuthal order is $m = 200$. The
bus waveguide, which has a width $d = 1$ μm, is placed 345 nm away from the outer
ring at the closest point. The two rings and the bus waveguide have the same
thickness $t = 0.3756$ μm in the vertical direction. The refractive indices are
$n_{core} = 1.977$ for the cores of the two rings and the bus waveguide, which are
made of Si3N4, and $n_{cladding} = 1.437$ for the cladding, which is made of SiO2
[16].

Figure 6.31 (a) Top view of an ECDMRR/bus structure modeled with CUBIT. The inner ring has a
first-order grating along the half on its outer side. (b) Enlarged view of the rectangular
region in Figure 6.31(a). © Optical Society of America 2014 [16].

For the simulation, the entire double ring/bus structure is enclosed by a box-
shaped computational domain, whose dimensions are 66.66 μm, 67.895 μm, and
5.2 μm in the Cartesian coordinate system. All six exterior boundaries are
truncated by the PML, which has a thickness of 1.2 μm in each direction. The total
computational volume, which is about $1.9 \times 10^4 (\lambda / n_{avg})^3$ for $n_{avg} = 1.442$ and
$\lambda$ = 1,550 nm, represents a rather large computational domain for a full-wave
analysis. We calculate this total computational volume by first normalizing the
volumes of the core and cladding regions by their corresponding wavelengths in
the material, $(\lambda / n)^3$, and then summing these normalized volumes. The two rings
and the bus waveguide are first discretized with a mesh size $h = 0.14$ μm, and the
remaining cladding region is discretized with a mesh size $h = 0.3$ μm, where h is
the length of the tetrahedral elements. These mesh sizes are about $(\lambda / n_{core}) / 5.6$
and $(\lambda / n_{cladding}) / 3.6$ at the wavelength $\lambda$ = 1,550 nm for the core and cladding
regions, respectively. To ensure good accuracy, third-order hierarchical vector
basis functions are employed. A current sheet is placed on the bus waveguide to
excite the fundamental mode, which is obtained through a 2-D simulation using
COMSOL. The geometry is meshed with CUBIT, and the detailed geometrical
features of the small grating teeth can be captured accurately by using curvilinear
tetrahedral elements.
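
The quoted domain size and mesh densities can be reproduced with a few lines of arithmetic. The sketch below is a simplified check, not the authors' procedure: instead of normalizing the core and cladding volumes separately and summing them, it normalizes the whole box by a single average index `n_avg`, which is an approximation adopted here for brevity. All variable names are illustrative.

```python
# Values quoted in the text (lengths in micrometers)
wavelength = 1.550
n_core, n_cladding, n_avg = 1.977, 1.437, 1.442
box = (66.66, 67.895, 5.2)
h_core, h_cladding = 0.14, 0.3

# Box volume normalized by the cube of the average material wavelength
volume = box[0] * box[1] * box[2]
normalized_volume = volume / (wavelength / n_avg) ** 3

# Number of elements per material wavelength in each region
elements_per_wavelength_core = (wavelength / n_core) / h_core
elements_per_wavelength_cladding = (wavelength / n_cladding) / h_cladding

print(f"box volume:                 {volume:10.1f} cubic micrometers")
print(f"normalized volume:          {normalized_volume:10.3e} (lambda/n_avg)^3")
print(f"elements/wavelength (core): {elements_per_wavelength_core:6.2f}")
print(f"elements/wavelength (clad): {elements_per_wavelength_cladding:6.2f}")
```

The relatively coarse sampling of fewer than six elements per material wavelength is what makes the third-order basis functions mentioned in the text necessary for accuracy.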
Similar to the previous example, the simulation is carried out from
$\lambda$ = 1,547 nm to $\lambda$ = 1,551 nm, and the computed reflection and transmission
coefficients are compared to those obtained from the measurement in Figure
6.32(a). It is observed that the two sets of results have a similar bandwidth and similar
reflection and transmission coefficients at the resonant frequency. There is a shift
of 0.78 nm in the resonant wavelength between the simulated and measured results.
This shift is due to the imbalanced expansion of the electric field and its curl
(which is proportional to the magnetic field) in the finite element analysis [1]. Note
that the shift is only 0.05% of the wavelength at the resonant peak. If
this shift is adjusted, the simulated data agree very well with the measurement
result, as shown in Figure 6.32(b). The field distribution in the $z = 0$ μm plane is
plotted in Figure 6.33 for two wavelengths. A strong resonance occurs at
$\lambda$ = 1,549.14 nm, and an enlarged view is given in Figure 6.33(d). In this example,
the entire computational domain is decomposed into 930 subdomains, which
results in 103,916,607 unknowns, 9,717,426 dual unknowns, and 133,060 corner
unknowns. Because the problem is very large and highly resonant, the BiCGStab
iterative solver converges slowly. The peak memory usage is 1.25 TB. Compared
to the previous example, the memory usage is 23.0 times larger, whereas the
number of unknowns is 13.8 times larger. Thus, the memory scaling is 60% of
ideal, which is good considering that the coarse grid problem is larger, the
subdomains are larger, and the problem is divided over more processors. We
repeated the simulation with the second-order hierarchical vector basis functions to
investigate the tradeoff between simulation accuracy and computational time. The
number of unknowns is reduced to 28,680,983. The result is very similar to that of
the third-order basis functions except that the shift of the resonant peak is
increased from 0.05% to 0.063%. Thus, for this example, the second-order basis
functions are also adequate.
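
The ratios quoted in this paragraph follow from the problem sizes and memory figures of the two microring examples. The sketch below reproduces them under one stated assumption, namely that 1 TB is taken as 1,024 GB, which is the convention that matches the factor of 23.0 given in the text. The variable names are illustrative.

```python
# Figures quoted in Sections 6.7.6 and 6.7.7
unknowns_2d_ring = 7_522_572
unknowns_double_ring = 103_916_607
memory_2d_ring_gb = 55.7
memory_double_ring_gb = 1.25 * 1024      # assumes 1 TB = 1,024 GB

resonant_wavelength_nm = 1549.14
wavelength_shift_nm = 0.78

memory_ratio = memory_double_ring_gb / memory_2d_ring_gb
unknown_ratio = unknowns_double_ring / unknowns_2d_ring
# Ideal scaling would have memory grow in proportion to the unknowns.
memory_scaling = unknown_ratio / memory_ratio
shift_percent = 100.0 * wavelength_shift_nm / resonant_wavelength_nm

print(f"memory ratio:        {memory_ratio:.1f}")
print(f"unknown ratio:       {unknown_ratio:.1f}")
print(f"memory scaling:      {100.0 * memory_scaling:.0f}% of ideal")
print(f"resonance shift:     {shift_percent:.3f}% of the resonant wavelength")
```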

Figure 6.32 Power reflection and transmission coefficients of the full-scale ECDMRR. (a)
Comparison between the measured and simulated results. (b) Comparison with the
simulated result shifted by 0.78 nm. © Optical Society of America 2014 [16].

Figure 6.33 Snapshot of $|E|$ in the $z = 0$ μm plane. (a) With $\lambda$ = 1,550.93 nm. (b) Enlarged view of
the field near the gap between the bus and the ECDMRR at $\lambda$ = 1,550.93 nm. (c) With
$\lambda$ = 1,549.14 nm. (d) Enlarged view of the field near the gap between the bus and the
ECDMRR at $\lambda$ = 1,549.14 nm. © Optical Society of America 2014 [16].

6.8 SUMMARY

Numerical discretization of large-scale electromagnetic problems often results in a
large system of linear equations involving millions or even billions of unknowns,
whose solution is very challenging even with the most powerful computers
available today. In this chapter, we discussed the domain decomposition methods
for finite element analysis of such large-scale electromagnetic problems. To be
more specific, we considered several numerical algorithms based on the dual-
primal finite element tearing and interconnecting (FETI-DP) method for the full-wave
analysis of large-scale electromagnetic problems. When first introduced to CEM,
the FETI-DP method converted a volumetric problem to a surface problem by
assuming an unknown Neumann boundary condition on the subdomain interface
with the aid of one Lagrange multiplier. It was a typical nonoverlapping iterative
substructuring domain decomposition method combined with the local Schur
complement. Later, an unknown Robin boundary condition was introduced on the
subdomain interface with the aid of two Lagrange multipliers to improve the
convergence of the global interface solution in the high-frequency region. Both
versions construct a global corner system that relates the fields at the crosspoints
between the subdomains through a Dirichlet continuity condition. This corner
system provides a coarse grid correction to improve the convergence of an iterative
solution by propagating residual errors globally in each iteration.
The algorithms described in this chapter expand the capability and improve
the performance of the FETI-DP method by: (1) lifting the requirement of
conformal meshes on the subdomain interface; (2) speeding up the convergence
rate of the iterative solution of the global interface problem; and (3) incorporating
appropriate truncation boundaries for more accurate results. First, we formulated
two nonconformal FETI-DP methods, both of which implement the Robin-type
transmission condition at the subdomain interfaces. One nonconformal method
extends the conformal FETI-DP algorithm, which is based on two Lagrange
multipliers, to deal with nonconformal interface and corner meshes, whereas the
other one employs cement elements on the interface, combines the global primal
unknowns with the global dual unknowns, and extracts the corner unknowns to
formulate a global coarse problem. Second, we implemented higher-order
transmission conditions in the Lagrange multiplier-based FETI-DP method for a
faster convergence because higher-order transmission conditions can transmit both
transverse-electric and transverse-magnetic evanescent modes in addition to the
propagating modes. Furthermore, when perfectly matched layers (PMLs) are used
as truncation, higher-order transmission conditions become more critical for a
converged result. Third, for multiregion electromagnetic problems, we developed a
hybrid method that employs the finite element tearing and interconnecting (FETI)
method to deal with mesh-nonconformal and/or geometry-nonconformal interfaces
between regions and the FETI-DP method for mesh-conformal and geometry-
conformal interfaces inside each region. We formulated a unified global system of
equations for the interface unknowns from both nonconformal and conformal
interfaces. In the formulation, we applied higher-order transmission conditions and
a generalized crosspoint correction technique to improve the convergence and
ensure a correct interconnection across subdomain interfaces. Fourth, we described
an oblique absorbing boundary condition and applied it to the FETI-DP method for
simulating large finite antenna arrays. This boundary condition can be tuned to be
reflectionless for all frequencies and polarizations as long as the main beam of the
radiated wave is specified. Finally, we presented numerical results for the
simulation of wave propagation, finite antenna arrays, photonic crystal cavities,
and optical devices to demonstrate the application, accuracy, efficiency, and
capability of these algorithms.

REFERENCES

[1] J. Jin, The Finite Element Method in Electromagnetics, 3rd ed., New York: Wiley, 2014.
[2] J. Jin and D. Riley, Finite Element Analysis of Antennas and Arrays, New York: Wiley, 2008.
[3] C. Farhat and F. Roux, “A Method of Finite Element Tearing and Interconnecting and its
Parallel Solution Algorithm,” Int. J. Numer. Meth. Eng., Vol. 32, No. 6, pp. 1205–1227, 1991.
[4] C. Farhat, A. Macedo, M. Lesoinne, F. Roux, and F. Magoulès, “Two-Level Domain
Decomposition Methods with Lagrange Multipliers for the Fast Iterative Solution of Acoustic
Scattering Problems,” Comput. Methods Appl. Mech. Eng., Vol. 184, No. 2–4, pp. 213–239,
2000.
[5] C. Farhat, M. Lesoinne, P. LeTallec, K. Pierson, and D. Rixen, “FETI-DP: A Dual-Primal
Unified FETI Method—part I: A Faster Alternative to the Two-Level FETI Method,” Int. J.
Numer. Meth. Eng., Vol. 50, No. 7, pp. 1523–1544, 2001.
[6] C. Farhat, J. Li, and P. Avery, “A FETI-DP Method for the Parallel Iterative Solution of
Indefinite and Complex-Valued Solid and Shell Vibration Problems,” Int. J. Numer. Meth. Eng.,
Vol. 63, No. 3, pp. 398–427, 2005.
[7] C. Farhat, P. Avery, R. Tezaur, and J. Li, “FETI-DPH: A Dual-Primal Domain Decomposition
Method for Acoustic Scattering,” J. Comput. Acoust., Vol. 13, No. 3, pp. 499–524, 2005.
[8] Y. Li and J. Jin, “A Vector Dual-Primal Finite Element Tearing and Interconnecting Method for
Solving 3D Large-Scale Electromagnetic Problems,” IEEE Trans. Antennas Propag., Vol. 54,
No. 10, pp. 3000–3009, 2006.
[9] Y. Li and J. Jin, “A New Dual-Primal Domain Decomposition Approach for Finite Element
Simulation of 3D Large-Scale Electromagnetic Problems,” IEEE Trans. Antennas Propag., Vol.
55, No. 10, pp. 2803–2810, 2007.
[10] Y. Li and J. Jin, “Implementation of the Second-Order ABC in the FETI-DPEM Method for 3D
EM Problems,” IEEE Trans. Antennas Propag., Vol. 56, No. 8, pp. 2765–2769, 2008.
[11] Y. Li and J. Jin, “Parallel Implementation of the FETI-DPEM Algorithm for General 3D EM
Simulations,” J. Comput. Phys., Vol. 228, No. 9, pp. 3255–3267, 2009.
[12] A. Toselli and O. Widlund, Domain Decomposition Methods – Algorithms and Theory, Berlin:
Springer-Verlag, 2005.
[13] T. Mathew, Domain Decomposition Methods for the Numerical Solution of Partial Differential
Equations, Berlin: Springer-Verlag, 2008.
[14] M. Xue and J. Jin, “Application of a Nonconformal FETI-DP Method in Antenna Array
Simulations,” IEEE APS Int. Symp. Dig., pp. 1–2, 2012.
[15] M. Xue and J. Jin, “Nonconformal FETI-DP Methods for Large-Scale Electromagnetic
Simulation,” IEEE Trans. Antennas Propag., Vol. 60, No. 9, pp. 4291–4305, 2012.
[16] M. Xue, Y. Kang, A. Arbabi, S. McKeown, L. Goddard, and J. Jin, “Fast and Accurate Finite
Element Analysis of Large-Scale Three-Dimensional Photonic Devices with a Robust Domain
Decomposition Method,” Opt. Express, Vol. 22, No. 4, pp. 4437–4452, 2014.
[17] M. Gander, F. Magoulès, and F. Nataf, “Optimized Schwarz Methods without Overlap for the
Helmholtz Equation,” SIAM J. Sci. Comput., Vol. 24, No. 1, pp. 38–60, 2003.
[18] M. Gander, L. Halpern, and F. Magoulès, “An Optimized Schwarz Method with Two-Sided
Robin Transmission Conditions for the Helmholtz Equation,” Int. J. Numer. Meth. Fluids, Vol.
55, No. 2, pp. 163–175, 2007.
[19] M. Gander and F. Kwok, “Best Robin Parameters for Optimized Schwarz Methods at Cross
Points,” SIAM J. Sci. Comput., Vol. 34, No. 4, pp. 1849–1879, 2010.
[20] P. Collino, G. Delbue, P. Joly, and A. Piacentini, “A New Interface Condition in the Non-
Overlapping Domain Decomposition for the Maxwell Equations,” Comput. Methods Appl. Mech.
Eng., Vol. 148, No. 1–2, pp. 195–207, 1997.
[21] A. Alonso-Rodriguez and L. Gerardo-Giorda, “New Nonoverlapping Domain Decomposition
Methods for the Harmonic Maxwell System,” SIAM J. Sci. Comput., Vol. 28, No. 1, pp. 102–122,
2006.
[22] V. Dolean, M. Gander, and L. Gerardo-Giorda, “Optimized Schwarz Methods for Maxwell’s
Equations,” SIAM J. Sci. Comput., Vol. 31, No. 3, pp. 2193–2213, 2009.
[23] Z. Peng, V. Rawat, and J. Lee, “One Way Domain Decomposition Method with Second Order
Transmission Conditions for Solving Electromagnetic Wave Problems,” J. Comput. Phys., Vol.
229, No. 4, pp. 1181–1197, 2010.
[24] Z. Peng and J. Lee, “Non-Conformal Domain Decomposition Method with Second-Order
Transmission Conditions for Time-Harmonic Electromagnetics,” J. Comput. Phys., Vol. 229, No.
8, pp. 5615–5629, 2010.
[25] Z. Peng and J. Lee, “Non-Conformal Domain Decomposition Method with Mixed True Second
Order Transmission Condition for Solving Large Finite Antenna Arrays,” IEEE Trans. Antennas
Propag., Vol. 59, No. 5, pp. 1638–1651, 2011.
[26] M. Xue and J. Jin, “A Hybrid Nonconformal FETI/Conformal FETI-DP Method for Arbitrary
Nonoverlapping Domain Decomposition Modeling,” IEEE APS Int. Symp. Dig., pp. 1628–1629,
2013.
[27] M. Xue and J. Jin, “A Hybrid Conformal/Nonconformal Domain Decomposition Method for
Multi-Region Electromagnetic Modeling,” IEEE Trans. Antennas Propag., Vol. 62, No. 4, pp.
2009–2021, 2014.
[28] M. Xue and J. Jin, “Application of an Oblique Absorbing Boundary Condition in the Finite
Element Simulation of Phased-Array Antennas,” Microwave Opt. Technol. Lett., Vol. 56, No. 1,
pp. 178–184, 2014.
[29] B. Després, P. Joly, and J. Roberts, “A Domain Decomposition Method for the Harmonic
Maxwell Equations,” Iterative Methods in Linear Algebra, North-Holland, Amsterdam, pp.
475–484, 1992.
[30] B. Stupfel, “A Fast-Domain Decomposition Method for the Solution of Electromagnetic
Scattering by Large Objects,” IEEE Trans. Antennas Propag., Vol. 44, No. 10, pp. 1375–1385,
1996.
[31] B. Stupfel and M. Mognot, “A Domain Decomposition Method for the Vector Wave Equation,”
IEEE Trans. Antennas Propag., Vol. 48, No. 5, pp. 653–660, 2000.
[32] F. Roux, “A FETI-2LM Method for Non-Matching Grids,” Lecture Notes Comput. Sci. Eng., Vol.
70, pp. 121–128, 2009.
[33] Y. Achdou, C. Japhet, Y. Maday, and F. Nataf, “A New Cement to Glue Nonconforming Grids
with Robin Interface Conditions: The Finite Volume Case,” Numer. Math., Vol. 92, pp. 593–620,
2002.
[34] S. Lee, M. Vouvakis, and J. Lee, “A Non-Overlapping Domain Decomposition Method with
Non-Matching Grids for Modeling Large Finite Antenna Arrays,” J. Comput. Phys., Vol. 203,
No. 1, pp. 1–21, 2005.
[35] M. Vouvakis, Z. Cendes, and J. Lee, “A FEM Domain Decomposition Method for Photonic and
Electromagnetic Band Gap Structures,” IEEE Trans. Antennas Propag., Vol. 54, pp. 721–733,
2006.
[36] K. Zhao, V. Rawat, S. Lee, and J. Lee, “A Domain Decomposition Method with Nonconformal
Meshes for Finite Periodic and Semi-Periodic Structures,” IEEE Trans. Antennas Propag., Vol.
55, No. 9, pp. 2559–2570, 2007.
[37] Z. Lu, X. An, and W. Hong, “A Fast Domain Decomposition Method for Solving Three-
Dimensional Large-Scale Electromagnetic Problems,” IEEE Trans. Antennas Propag., Vol. 56,
No. 8, pp. 2200–2210, 2008.
[38] V. Rawat, “Finite Element Domain Decomposition with Second Order Transmission Condition
for Time Harmonic Electromagnetic Problem,” Ph.D. Thesis, The Ohio State University, 2009.
[39] J. Ma, J. Jin, and Z. Nie, “A Nonconformal FEM-DDM with Tree-Cotree Splitting and Improved
Transmission Condition for Modeling Subsurface Detection Problems,” IEEE Trans. Geosci.
Remote Sens., Vol. 52, No. 1, pp. 355–364, 2014.
[40] CUBIT, available online https://cubit.sandia.gov/.
[41] METIS, Serial graph partitioning and fill-reducing matrix ordering, available online
http://glaros.dtc.umn.edu/gkhome/metis/metis/overview.
[42] M. Xue, J. Jin, S. Wong, C. Macon, and M. Kragalott, “Experimental Validation of the FETI-
DPEM Algorithm for Simulating Phased-Array Antennas,” IEEE APS Int. Symp. Dig., pp.
2495–2498, 2011.
[43] J. Webb, “Hierarchical Vector Basis Functions of Arbitrary Order for Triangular and Tetrahedral
Elements,” IEEE Trans. Antennas Propag., Vol. 47, No. 8, pp. 1244–1253, 1999.
[44] COMSOL Multiphysics ver. 4.2, available online http://www.comsol.com/.
Chapter 7
High-Accuracy Computations for
Electromagnetic Integral Equations
Andrew F. Peterson and Malcolm M. Bibby

Historically, the numerical solution of integral equations has been dominated by
techniques of relatively low accuracy. By “low accuracy,” we mean that the results
contain one to two decimal places of accuracy, or an error in the range of 5% to
10%. Today, the ingredients are in place to enable solutions of much higher
accuracy, say, four to six digits or more, if and when high accuracy is required.
Certainly there are many practical problems that do not require high accuracy, and
many situations where electromagnetic modeling software is used with the goal of
obtaining a qualitative understanding of the phenomenology of the fields, rather
than a quantitative result. However, there are other applications, such as
determining the resonant frequency of a cavity or optimizing the precise
dimensions of a microwave device, where an accuracy of one part in $10^6$ or better
might be needed. A long-term goal of the present research is to enable techniques
that permit controlled accuracy or “dialable” accuracy, namely, specifying in
advance the number of digits of accuracy in the result. High accuracy, to some
extent, is a prerequisite for controlled accuracy computations. In other words, one
has to be able to obtain high accuracy to reliably produce moderate accuracy.
In this chapter, we review progress toward controlled accuracy computations
for electromagnetic integral equations, applied to perfectly conducting objects,
with a focus on the authors’ contributions of the past decade [1–17]. The
ingredients needed to facilitate high-accuracy computations include robust
formulations, curved patch models for curved objects, high-order representation of
currents or fields, treatment of field and current singularities at edges, corners, and
tips, accurate techniques for evaluating Green’s function integrals, an
understanding of convergence rates, techniques for error estimation, and an overall
control strategy that incorporates adaptive refinement procedures. The state of
progress in many of these areas will be reviewed and illustrated by examples, and
areas where additional work is needed will be identified.


7.1 NORMALIZED RESIDUAL ERROR

In the following sections, we will need to determine the accuracy of results in
situations where exact solutions are not available. In this work, as in [4], we use
the normalized residual error (NRE) as a measure of the accuracy of a particular
result. The NRE is obtained as a byproduct of solving an overdetermined system of
equations to obtain the least square solution. This approach was recommended by
Davies [18], who stated “…in contrast to the point-matching method, least squares
is a rigorously convergent procedure” and provided a proof in an appendix of [18].
The specific implementation we follow was proposed by Bunch and Grow under
the name boundary residual method [19–21]. Suppose we have an integral
equation

$$Lf = g \tag{7.1}$$

where L denotes the operator, f(s) denotes the unknown function to be determined,
and g(s) is the known excitation function. A numerical representation is sought
with the form

$$f(s) \cong \sum_{n} \alpha_n B_n(s) \tag{7.2}$$

The residual function corresponding to a specific approximate solution is defined as

$$R(s) = Lf - g \cong \sum_{n} \alpha_n L B_n - g \tag{7.3}$$

The minimum residual, in the least square sense, is produced by the set of
coefficients $\{\alpha_n\}$ that minimizes the expression

$$\int |R(s)|^2 \, ds \cong \sum_{i} w_i \, |R(s_i)|^2 \tag{7.4}$$

where $\{w_i, s_i\}$ are the weights and nodes of a quadrature rule.
In the context of the boundary residual method, (7.1) can be discretized into
an overdetermined system of equations by selecting more testing points than basis
functions. Each equation is weighted with the square root of the appropriate
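quadrature rule weight, to produce the M by N system [19–21]

$$
\begin{bmatrix}
\sqrt{w_1}\, LB_1\big|_{s_1} & \cdots & \sqrt{w_1}\, LB_N\big|_{s_1} \\
\vdots & & \vdots \\
\sqrt{w_M}\, LB_1\big|_{s_M} & \cdots & \sqrt{w_M}\, LB_N\big|_{s_M}
\end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{bmatrix}
=
\begin{bmatrix} \sqrt{w_1}\, g(s_1) \\ \vdots \\ \sqrt{w_M}\, g(s_M) \end{bmatrix}
\tag{7.5}
$$
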
Then a least-square solution to (7.5), using standard matrix library routines, will
minimize the residual in (7.4). As noted by Bunch and Grow, the more accurate
the quadrature rule, the better the residual will be minimized. For the results that
follow, we always employed Gauss-Legendre rules for weights and nodes, and
typically M/N = 2. We employ a discrete model of the target under consideration,
and apply the quadrature rule on a cell-by-cell basis when constructing (7.5).
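
The procedure can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a simple second-kind integral equation on [0, 1], f(s) plus the integral of s t f(t) over t equal to g(s), with exact solution f(s) = s (so that g(s) = 4s/3), a single cell, and a Legendre basis. The function names `gauss_legendre_01`, `basis`, `apply_operator`, and `solve_brm` are hypothetical. Each row is scaled by the square root of its quadrature weight, twice as many testing points as basis functions are used, and a standard least-square routine solves the system.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def gauss_legendre_01(n):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]."""
    x, w = leg.leggauss(n)
    return 0.5 * (x + 1.0), 0.5 * w


def basis(n, s):
    """Legendre polynomial of degree n on [0, 1] (assumed basis function B_n)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return leg.legval(2.0 * s - 1.0, c)


def apply_operator(n, s, tq, wq):
    """(L B_n)(s) = B_n(s) + s * integral_0^1 t B_n(t) dt for the toy kernel."""
    return basis(n, s) + s * np.sum(wq * tq * basis(n, tq))


def solve_brm(n_basis, ratio=2):
    """Weighted least-square solution of the M by N overdetermined system."""
    m_tests = ratio * n_basis
    s, w = gauss_legendre_01(m_tests)        # testing nodes and weights
    tq, wq = gauss_legendre_01(n_basis + 4)  # inner quadrature for the kernel
    a = np.column_stack([apply_operator(n, s, tq, wq) for n in range(n_basis)])
    g = 4.0 * s / 3.0                        # excitation for exact f(s) = s
    sw = np.sqrt(w)                          # row scaling, as in (7.5)
    alpha, *_ = np.linalg.lstsq(sw[:, None] * a, sw * g, rcond=None)
    residual = a @ alpha - g                 # R(s_i) at each testing node
    return alpha, residual, w, g


alpha, residual, w, g = solve_brm(n_basis=2)
print("coefficients:", alpha)
print("max |R(s_i)|:", np.max(np.abs(residual)))
```

Because the exact solution of this toy problem lies in the span of the two basis functions, the residual should fall to roundoff level; for realistic kernels the residual is nonzero and provides the error measure discussed next.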
As a byproduct of the least square solution, the residual error is easily
obtained at each quadrature node {si} and used to compute NRE

$$\mathrm{NRE} = \frac{\sqrt{\sum_i w_i \, |R(s_i)|^2}}{\sqrt{\sum_i w_i \, |g(s_i)|^2}} \tag{7.6}$$

The NRE exhibits excellent correlation with true error for problems where
exact solutions are available [4].
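
A short sketch of the NRE computation follows. It assumes that (7.6) is the ratio of the quadrature-weighted 2-norm of the residual to that of the excitation; the function name `normalized_residual_error` and the sample data are hypothetical.

```python
import numpy as np


def normalized_residual_error(weights, residual, excitation):
    """Ratio of weighted 2-norms of the residual and the excitation."""
    w = np.asarray(weights, dtype=float)
    r = np.asarray(residual)
    g = np.asarray(excitation)
    return np.sqrt(np.sum(w * np.abs(r) ** 2) / np.sum(w * np.abs(g) ** 2))


# Hypothetical data: a residual that is one thousandth of the excitation.
w = np.array([0.5, 0.5])
g = np.array([1.0 + 0.0j, 1.0j])
r = 1.0e-3 * g
nre = normalized_residual_error(w, r, g)
print(nre)
```

Taking absolute values before squaring keeps the measure real for complex-valued fields.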
In the following, numerical results will usually be computed for a basis of
fixed polynomial order q (for a polynomial of degree p, the order q will be defined
so that q = p + 1), as the cell sizes in use are systematically reduced. Under these
conditions, the slope of NRE versus the reciprocal of the cell size usually
approaches integer values. The slope of an error curve for fixed q, on a log-log plot,
is obtained as [4]

$$\text{slope}_q = \frac{\log_{10}(\mathrm{NRE}_2) - \log_{10}(\mathrm{NRE}_1)}{\log_{10}(h_1) - \log_{10}(h_2)} \tag{7.7}$$

where h denotes the average cell size in use.
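
As a small illustration of (7.7), the sketch below evaluates the slope between two refinement levels. The function name `slope_q` and the sample values are hypothetical, and the sign convention assumes the differences are taken exactly as written above.

```python
import math


def slope_q(h1, nre1, h2, nre2):
    """Slope of the NRE curve between two refinement levels, per (7.7)."""
    numerator = math.log10(nre2) - math.log10(nre1)
    denominator = math.log10(h1) - math.log10(h2)
    return numerator / denominator


# Hypothetical data: halving the cell size reduces the NRE by a factor of 8.
s = slope_q(0.1, 1.0e-3, 0.05, 1.25e-4)
print(s)
```

With this convention an error that decreases as the cells shrink gives a negative value, and its magnitude is the observed convergence rate.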


We note that NRE is a convenient measure of accuracy for research purposes,
although not necessarily as computationally efficient as might be desired in an
error estimator used for production adaptive refinement algorithms. For adaptive
refinement, only the rows and columns in (7.5) associated with cells selected for
refinement need to be modified, minimizing the computational effort.

7.2 HIGH-ORDER TREATMENT OF SMOOTH TARGETS

One of the simplest electromagnetic scattering problems is that of a circular, perfectly
conducting cylinder (a hypothetically infinite, 2-D structure) illuminated by a
uniform plane wave. The equivalent electric current density can be represented by
polynomial basis functions (specifically Legendre polynomials) of degree p or
order q = p + 1. Suppose that we seek a high-order solution where the degree of the
polynomial can range to 5 or more. The various 2-D integral equations can be
solved as overdetermined systems, and the NRE computed according to (7.6).
References [4, 5] discuss this approach and contain a detailed description of the
behavior of the error associated with the results; here we note that the NRE curves
for these results approximate straight lines on a log-log plot, with slopes that
approach integers as the cell sizes are reduced. Figure 7.1 shows plots of the NRE
curves for a circular cylinder of radius 1.0 wavelength, for polynomial orders
between q = 1 and q = 9, obtained from a numerical solution of the magnetic-field
integral equation (MFIE) for the TM polarization. Figure 7.2 shows plots of the
slopes of these curves for the four different equations considered in [4, 5], namely
the electric-field integral equation (EFIE) and the MFIE for the transverse
magnetic (TM) and transverse electric (TE) polarizations. It is observed that the
TM EFIE produces results that exhibit slopes approximately equal to q + 1 as the
cell sizes are reduced, while the TE EFIE results exhibit slopes that approach the
integer q – 1. The MFIE for either polarization produces NRE curves with slopes
approximating q.
Despite the fact that this is an extremely simple problem, it is of interest
because we will use the slopes of these NRE curves (for the appropriate equation)
as a baseline for future comparison. When results for more complex structures
exhibit the same slopes as those obtained in Figure 7.1, for the same degree of
representation, we will conclude that the results truly exhibit high-order behavior.
Note that the 3-D EFIE should mimic the behavior of the 2-D TE EFIE.
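
The baseline rates quoted above can be collected in a small lookup for later comparison. This is only a sketch: the function names and the tolerance `tol` are hypothetical, and the comparison uses the magnitude of the observed slope.

```python
def baseline_slope(equation, polarization, q):
    """Baseline slope magnitude for the circular cylinder, as quoted in the text."""
    equation = equation.upper()
    polarization = polarization.upper()
    if polarization not in ("TM", "TE"):
        raise ValueError("polarization must be 'TM' or 'TE'")
    if equation == "MFIE":
        return q                      # either polarization
    if equation == "EFIE":
        return q + 1 if polarization == "TM" else q - 1
    raise ValueError("equation must be 'EFIE' or 'MFIE'")


def matches_baseline(observed_slope, equation, polarization, q, tol=0.25):
    """True if the observed slope magnitude is within tol of the baseline."""
    expected = baseline_slope(equation, polarization, q)
    return abs(abs(observed_slope) - expected) <= tol


print(baseline_slope("EFIE", "TM", 3), baseline_slope("EFIE", "TE", 3))
print(matches_baseline(-2.9, "MFIE", "TM", 3))
```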
[Figure 7.1 plot: Log10(NRE) versus DOF, with curves for q = 1, 3, 5, 7, and 9; MFIE-TM, a = 1, m = 2.]

Figure 7.1 NRE values obtained for the circular perfectly conducting cylinder of unit radius,
obtained from the TM MFIE.

The behavior of NRE has been investigated for other smooth structures,
including a prolate spheroid represented by an EFIE [3] and several toroidal targets
represented by MFIE [7, 15]. For these structures, the slopes of the NRE curves
approximate the same integer values for small cell sizes as those identified above.
The primary difficulty associated with the high-order numerical solution of the
preceding examples is that of maintaining sufficient accuracy in the matrix entries
arising from the MoM procedure. Suffice it to say that the tried-and-true methods
in widespread use in conjunction with low-order solution procedures are not
adequate for high-order procedures. Specific details of these computations, and
procedures for accurate evaluation of the associated integrals, are discussed in [1, 6,
10, 13].
[Figure 7.2 plot: four panels (EFIE_TE, EFIE_TM, MFIE_TE, MFIE_TM) showing Slope-q of NRE versus Degrees of Freedom, with curves for q = 1 to q = 6.]

Figure 7.2 Slope-q values obtained for four sets of numerical results for the circular perfectly
conducting cylinder of unit radius. The slopes approach integer values as the cell sizes
are reduced.

Clearly, most practical targets of interest are not as smooth as circular
cylinders, spheroids, or toroids. Thus, we now turn our attention to targets with
edges.

7.3 THE DIPOLE ANTENNA

Reference [3] investigated the high-order treatment of the linear dipole antenna,
when excited by a magnetic frill feed model and described by an MFIE. (EFIE
formulations were also investigated [2, 13].) The frill-fed dipole is actually a
model for a monopole fed through a ground plane. Two types of end geometries
were considered in [3]: flat end caps and hemispherical end caps. The NRE was
evaluated on a cell-by-cell basis along the dipole for various order numerical
solutions. Polynomial expansions were used in all cells.
Figure 7.3 shows the behavior of the NRE as a function of location along the
surface, for a dipole with flat end caps, after [3]. It should be apparent that as the
representation order is increased, the NRE is reduced by orders of magnitude in
cells along the barrel of the dipole, as well as in cells toward the center of the end
caps, but is not substantially reduced in the vicinity of the corners where the barrel
meets the flat ends. In fact, the charge singularity at those corners is not properly
modeled by the polynomial representation, and the resulting error in the residual
indicates that the accuracy of the result is not ensured. In this, and other examples
involving edges, corners, or even junctions involving a discontinuity in the
curvature of the geometry, purely polynomial representations are not able to
systematically reduce the local NRE levels. In practice, it is desirable to reduce the
NRE to a comparable level across the entire computational domain to ensure a
reliably accurate result.

Figure 7.3 Plot of the local NRE values along the barrel and end caps of a linear dipole of length
0.5 wavelength, radius 0.0625 wavelengths, and excited with a frill feed having b/a =
1.2. Plots show that the NRE is not reduced in the vicinity of the corners where the
barrel meets the flat end caps. ©IEEE 2004 [3].

A second lesson learned from the investigation of several formulations for
dipole antennas (see [2–4, 13] for further details) is that it is helpful to adapt the cell
sizes or basis function polynomial orders in order to reduce the NRE to fairly
uniform levels over the entire computational domain. For instance, because of the
rapidly varying fields and current/charge densities in the vicinity of the dipole feed,
smaller cells used near the feed enable a relatively uniform residual error along the
entire barrel of the dipole, for a uniform polynomial order.
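
One way to act on this observation is to subdivide only those cells whose local residual exceeds a target level. The sketch below is an assumption-laden illustration rather than the procedure used in the cited work: it bisects flagged cells of a 1-D mesh, and the names `refine_cells`, `edges`, `local_nre`, and `target` are hypothetical.

```python
def refine_cells(edges, local_nre, target):
    """Bisect every cell whose local NRE exceeds the target level.

    edges     : sorted cell boundaries, length = number of cells + 1
    local_nre : one residual error value per cell
    target    : desired local error level
    """
    if len(edges) != len(local_nre) + 1:
        raise ValueError("need exactly one error value per cell")
    new_edges = [edges[0]]
    for left, right, err in zip(edges[:-1], edges[1:], local_nre):
        if err > target:
            new_edges.append(0.5 * (left + right))  # split the flagged cell
        new_edges.append(right)
    return new_edges


# Hypothetical two-cell mesh in which only the first cell needs refinement.
mesh = refine_cells([0.0, 1.0, 2.0], [1.0e-2, 1.0e-6], target=1.0e-4)
print(mesh)
```

Repeating the solve and the refinement until all local values fall below the target tends to produce smaller cells where the fields vary rapidly, such as near the feed.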
7.4 HIGH-ORDER TREATMENT OF WEDGE SINGULARITIES

References [8, 11–13, 16] proposed a technique for the accurate treatment of
singularities in 2-D problems, based on the analytic behavior of the current and
charge densities in the vicinity of an ideal wedge. Figure 7.4 shows a wedge with
interior angle $\alpha$. In the immediate neighborhood of the tip, the current density has the
asymptotic form, valid for small $\rho$,

$$J_z \cong \sum_{m=0}^{\infty} \sum_{n=1}^{\infty} c_{mn}\, \rho^{\,2m + \nu_n - 1} \tag{7.8}$$

for the component parallel to the edge and

$$J_\rho \cong \sum_{m=0}^{\infty} \sum_{n=0}^{\infty} d_{mn}\, \rho^{\,2m + \nu_n} \tag{7.9}$$

for the component perpendicular to the edge. In these expressions,

$$\nu_n = \frac{n\pi}{2\pi - \alpha} \tag{7.10}$$

Figure 7.4 Perfectly conducting wedge with interior angle $\alpha$.
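
As a worked check, assume that (7.10) has the form $\nu_n = n\pi/(2\pi - \alpha)$, where $\alpha$ is the interior wedge angle. Two special cases then give the following exponents:

```latex
\begin{align*}
\alpha = 0 \ \text{(knife edge)}: \quad
  & \nu_n = \frac{n\pi}{2\pi} = \frac{n}{2}, \\
  & \text{leading exponent of } J_z:\ \nu_1 - 1 = -\tfrac{1}{2}, \\
  & \text{exponents of } J_\rho \text{ for } m = 0:\ \nu_n = 0,\ \tfrac{1}{2},\ 1,\ \tfrac{3}{2},\ \ldots \\[4pt]
\alpha = \frac{\pi}{3} \ \text{(60-degree corner)}: \quad
  & \nu_n = \frac{n\pi}{2\pi - \pi/3} = \frac{3n}{5}, \\
  & \text{exponents of } J_\rho \text{ for } m = 0:\ \nu_n = 0,\ \tfrac{3}{5},\ \tfrac{6}{5},\ \tfrac{9}{5},\ \ldots
\end{align*}
```

The first case reproduces the familiar inverse-square-root behavior of the edge-parallel current at a knife edge.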

The proposed approach is to construct basis functions that incorporate
polynomial terms as well as terms that involve some of the fractional exponents in
(7.8) and (7.9). A parameter study [12] concluded that a reasonable starting point is
to create basis functions for the corner cells that include as many fractional
exponents as integer exponents, and to make those cells twice
the size of the nearby noncorner cells (thus using the same average unknown
density). A variety of results using this approach have been presented for triangular
cylinders, square cylinders, dipoles with open ends, and targets of other shapes
[11–13, 16].
As an illustrative example, Table 7.1 shows the specific degrees of freedom
for a 60° corner, with exponents associated with the current density for the TE
polarization. For such a geometry the current density is finite at the corners, but the
charge density exhibits an unbounded behavior. Numerical results were obtained
from the MFIE using these degrees of freedom to represent currents on a perfectly
conducting cylinder whose cross-section is an equilateral triangle, with side
dimension of four wavelengths, when illuminated by a uniform plane wave
incident upon one corner. Corner cells were twice as large as other cells. For these
computations [12], the NRE was reduced by more than nine orders of magnitude as
the order is varied from q = 1 to q = 8 and the cell sizes are reduced. Figure 7.5
shows the slopes of the NRE curves obtained from the results, which approach the
same integer values as the MFIE results for the circular cylinder example in Figure
7.1. At high orders, the accuracy is limited by the precision of the computations,
and the curves deviate from the integer limiting values.

Table 7.1
Exponents Used in Basis Functions of a Given Order, for Cells in the Vicinity of a 60° Corner, TE
Polarization. The corner cells involve twice as many terms and are recommended to be twice as large as
the neighboring cells.

Representation    Exponents Used in     Exponents Used in
Order q           Noncorner Cells       Corner Cells

1                 0                     0, 3/5
2                 0, 1                  0, 3/5, 1, 6/5
3                 0, 1, 2               0, 3/5, 1, 6/5, 2, 9/5
4                 0, 1, 2, 3            0, 3/5, 1, 6/5, 2, 9/5, 3, 12/5
5                 0, 1, 2, 3, 4         0, 3/5, 1, 6/5, 2, 9/5, 3, 12/5, 4, 13/5
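
The fractional entries of Table 7.1 can be reproduced with exact rational arithmetic. The sketch below rests on two assumptions that are inferred from the table rather than stated in the text: the wedge exponents are 2m + n/(2 - a), where a is the interior angle divided by pi, and the corner cell of order q uses the q smallest non-integer exponents. The function name `corner_exponents` and its loop bounds are hypothetical and heuristic.

```python
from fractions import Fraction


def corner_exponents(q, angle_over_pi):
    """Return the q smallest non-integer exponents 2m + nu_n (TE current).

    angle_over_pi : interior wedge angle divided by pi, as a Fraction
    """
    candidates = set()
    for m in range(q + 1):
        for n in range(4 * q + 1):
            value = 2 * m + Fraction(n) / (2 - angle_over_pi)
            if value.denominator != 1:  # skip integers, already in the polynomial set
                candidates.add(value)
    return sorted(candidates)[:q]


# 60-degree corner: interior angle pi/3.
for q in range(1, 6):
    integer_part = list(range(q))
    fractional_part = [str(v) for v in corner_exponents(q, Fraction(1, 3))]
    print(q, integer_part, fractional_part)
```

Note that the fifth fractional exponent is 13/5 (that is, 2 + 3/5) rather than 15/5, since the latter is an integer already covered by the polynomial terms.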

Another example that illustrates the treatment of current singularities, for
which an exact solution is available for comparison, is the flat perfectly conducting
strip. Numerical results obtained using the preceding approach and an MoM
solution of the EFIE were compared with an eigenfunction expansion in terms of
Mathieu functions, and demonstrated that the error in the current (as well as the
NRE) was reduced by 11 orders of magnitude as the order of the basis representation
was improved from q = 1 to q = 12. Figure 7.6 shows results for the error in
the current density for the TM polarization, after [17]. Similar results were obtained
with an LCN discretization as reported in [16]. These results show conclusively
that the high-order approach produces high accuracy. (Since an exact solution is
available for this problem, it also enables a comparison between the actual error in
current and the NRE, not shown, which supports the validity of the NRE as a good
measure of accuracy.)
Figure 7.5 Slopes of the NRE curves for results obtained from the TE MFIE, for a perfectly
conducting cylinder whose cross section is an equilateral triangle of side dimension four
wavelengths. Until the solution accuracy reaches the precision limit, these slopes
approximate the same integer values as those of the circular cylinder. ©ACES 2009 [12].

Figure 7.6 Error in the current density as obtained by a method-of-moments solution of the TM
EFIE for a flat strip of seven wavelengths, illuminated by a uniform plane wave. The
reference result is the current density obtained from the eigenfunction expansion in
terms of Mathieu functions.

In summary, the singular representation described above has been successfully
applied to a number of 2-D targets containing corners, using both the MoM and the
locally corrected Nyström procedures. The results from the approach exhibit slopes
that approximate those of the circular cylinder problem, suggesting that the
representations are truly producing high-order behavior.

7.5 HIGH-ORDER TREATMENT OF JUNCTIONS

The behavior of NRE has also been used to investigate the satisfaction of boundary
conditions for a problem involving the junction of three conducting strips [11, 16],
as depicted in Figure 7.7. For that target, expansions containing fractional
exponents were required at the strip ends and at the junction. When the appropriate
representations were used in both locations, local residual errors were uniform
across the individual strips and were systematically reduced as the order of the
representations was increased. When representations containing fractional
exponents were dropped from either the end cells or the cells adjacent to the
central junction, the high-order behavior was not observed. Details may be found
in [16].

Figure 7.7 A perfectly conducting target made by connecting three strips at a central junction.

7.6 ALTERNATIVE ERROR ESTIMATORS

The preceding results suggest that it is quite possible to obtain high-accuracy
solutions to 2-D problems, even in the presence of ends, corners, and other
irregular features of the geometry. Is that approach sufficient to perform controlled
accuracy computations, where a desired error level is specified in advance and
achieved by the algorithm without human intervention?
The principal missing ingredient at the present time appears to be a
computationally efficient error estimator that would enable an adaptive refinement
procedure to be used in conjunction with integral equation formulations. While
adaptive refinement procedures are widespread in connection with the solution of
partial differential equations, relatively few attempts have been reported in the
electromagnetics literature in connection with integral equations [4, 14, 22–26].
(There are a number of reports in the mechanical engineering literature, which we
do not attempt to review here.)
While the NRE error estimator described in Section 7.1 is robust, it is a
relatively expensive error estimator. It involves the solution of an overdetermined
system, which is a computational burden that grows as $O(N^3)$, where N is the
number of unknowns in the problem. Furthermore, on a conventional processor,
the least-square solution of a 2:1 overdetermined system requires approximately
five times as many operations as the solution of a square system by the LU
factorization. Alternative residual-based estimators have been investigated for 2-D
problems [14]. For example, if the tangential field boundary condition is enforced
as part of the formulation, the residual in the normal component of the field may be
used for error estimation. Alternatively, the residual in the magnetic field can be
used to estimate the error in an EFIE solution, and so on.
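
The factor of five quoted above is consistent with standard leading-order operation counts, assuming that the least-square problem is solved by Householder QR factorization (the text does not state which algorithm is meant):

```latex
\begin{align*}
\text{LU factorization of an } N \times N \text{ system:} \quad
  & \tfrac{2}{3} N^3 \\
\text{Householder QR of an } M \times N \text{ system:} \quad
  & 2MN^2 - \tfrac{2}{3} N^3 \\
M = 2N: \quad
  & 4N^3 - \tfrac{2}{3} N^3 = \tfrac{10}{3} N^3,
  \qquad \frac{\tfrac{10}{3} N^3}{\tfrac{2}{3} N^3} = 5
\end{align*}
```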
Residual computations usually involve a cost of $O(N^2)$, which
may be prohibitive. However, it should be noted that this type of computation
lends itself to parallel implementation and should be easily adapted to multicore
processors, Xeon Phi coprocessors, and GPUs. In this situation, the actual overhead
of residual-based estimators may be small in practice.
Residual errors are fundamentally tied to the problem boundary conditions and
are inherently robust [22]. Other less robust estimators can be developed with an
O(N) cost. For example, in situations where current continuity is imposed by the
representation, the discontinuity in the derivative of the current density may be
used to identify regions with larger error. In vector problems where one component
of the current density is made continuous, the discontinuity in the other component
may be used to drive an error estimate. However, these cheaper estimators are not
expected to be as robust as the NRE estimator described above. They may also
require some kind of calibration to enable their use as a global error predictor.
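
A derivative-discontinuity indicator of this kind can be sketched as follows. The sketch assumes, for illustration only, that the current in each cell is stored as Legendre coefficients in a local coordinate t on [-1, 1] and that the representation is continuous at cell boundaries; the name `derivative_jumps` and the sample data are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def derivative_jumps(coeffs, h):
    """Jump in the derivative of a piecewise Legendre expansion at interior nodes.

    coeffs : list of 1-D arrays, Legendre coefficients of each cell in t on [-1, 1]
    h      : list of cell lengths (d/dx = (2 / h) d/dt within each cell)
    """
    jumps = []
    for k in range(len(coeffs) - 1):
        # Derivative at the right end of cell k and the left end of cell k + 1.
        d_left = leg.legval(1.0, leg.legder(coeffs[k])) * 2.0 / h[k]
        d_right = leg.legval(-1.0, leg.legder(coeffs[k + 1])) * 2.0 / h[k + 1]
        jumps.append(abs(d_right - d_left))
    return np.array(jumps)


# Hypothetical example: f = x on [0, 1] and f = 1 + 3 (x - 1) on [1, 2].
cells = [np.array([0.5, 0.5]), np.array([2.5, 1.5])]
jumps = derivative_jumps(cells, [1.0, 1.0])
print(jumps)
```

The cost is one pair of derivative evaluations per interior node, which is O(N) overall, but the values indicate only where the error is likely to be larger and would need calibration before serving as a global error estimate.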
These error estimators are of the explicit type and may be applied to any
approximate solution. Implicit error estimators [27] are often used in connection
with partial differential equation formulations and may also be useful for integral
equation formulations. These techniques involve re-solving subsets of the original
problem with a finer mesh or higher-order representation, measuring how much the
local solution changes, and using that information to estimate the error in the
original result.

7.7 PROSPECTS FOR CONTROLLED ACCURACY COMPUTATIONS IN THREE-DIMENSIONAL PROBLEMS

The preceding sections primarily describe the state of high-accuracy computations
for 2-D problems. Work on 3-D problems has not yet evolved to the same extent.
Most 3-D structures are modeled with flat-faceted triangular or quadrilateral
patches, and there is little understanding of the fundamental accuracy limitations
associated with such models. Curved-cell models of 3-D structures are also in use,
and will likely be needed to eliminate fictitious edges where current and field
components incorrectly exhibit singularities. However, there remain challenges
associated with adequate smoothness [28]. For smooth geometries, a variety of
high-order basis functions exist [29–32], and baseline convergence rates have been
established [5, 33]. For problems with edges, the approach of Section 7.4 may be
extended to treat edge singularities, but 3-D problems also introduce geometric tips
that will require special treatment [34, 35]. High-accuracy integration of singular
Green’s functions (and singular basis functions) remains far more challenging in
3-D than in 2-D. Finally, little work to date has focused on error estimation in 3-D
problems.
A conceptually simple 3-D problem is that of a perfectly conducting plate.
The primary challenge associated with this structure is the proper modeling of
current density and charge density near the plate edges and near the plate corners.
Since edge currents should be amenable to treatment using the approach described
in Section 7.4, for an interior wedge angle of zero, here we consider the additional
singularity introduced at the plate corner. Currents near the corner of a plate should
behave in a similar manner to currents near the tip of an infinite plane angular
sector, a canonical problem that is amenable to solution [34]. The plane angular
sector produces a current density with the asymptotic form

1  A1 ( )r 1  A0 ( )r  A1 ( )r


oi ei oi 1

Jr   A ( )r  2  A ( )r 3  ...
sin  
 (7.11)
 2 3
ei oi


J sin  B1 ( )r  (7.12)


 oi 1  oi 1  oi  3
 B1 ( )r  B3 ( )r  ...

C1 ( )r  C0 ( )r 
 1 
1 ei oi

S  J  C ( )r 1  C ( )r


sin   oi  2
 (7.13)
 1 2
ei
 ...

where $\nu_{oi}$ and $\nu_{ei}$ are parameters that can be found by a solution of the appropriate
Lamé equations, and (r, φ) denote sphero-conal coordinates (see reference [34] for
details). As an example, Table 7.2 shows the first few values for $\nu_{oi}$ and $\nu_{ei}$ for a
90° corner.
The expansions in (7.11)–(7.13) for the current behavior near the plate tip are
somewhat analogous to the canonical wedge solution for current behavior near an
edge. Ongoing research is expected to produce suitable representations for the
current and charge densities near plate corners based upon these expressions.
Table 7.2
Exponents of the Radial Variable r Used in the Asymptotic Expansion of the Current Density in the
Vicinity of a 90° Plate Corner

Index i    ν_oi       ν_ei

1          0.81466    0.29658
2          1.59713    1.13125
3          1.95533    1.42651
4          2.52088    2.03957

7.8 SUMMARY

Controlled accuracy computations are nearing realization for fairly general 2-D
EMF problems based on integral equation formulations. The authors’ research on
high-order representations for currents at edges, combined with the existing
technologies on high-order bases for smooth surfaces, curved-cell models, and
accurate integration procedures, enables computations that have been demonstrated
to produce 7–10 or more digits of accuracy for a variety of problems involving
perfectly conducting structures. Additional research into accurate, efficient error
estimators is required, as is a broader experience base with penetrable and lossy
targets.
Much work is still needed for 3-D problems. To date, the authors are aware of
no 3-D target containing edges or tips for which high-accuracy solutions have been
found with integral equation formulations.

REFERENCES

[1] M. Bibby, and A. Peterson, “High Accuracy Calculation of the Magnetic Vector Potential on
Surfaces,” Applied Computational Electromagnetics Society (ACES) Journal, Vol. 18, pp. 12–22,
March 2003.
[2] A. Peterson, “Application of the Locally-Corrected Nyström Method to the EFIE for the Linear
Dipole,” IEEE Trans. Antennas Propagat., Vol. 52, pp. 603–605, 2004.
[3] A. Peterson, and M. Bibby, “High-Order Numerical Solutions of the MFIE for the Linear
Dipole,” IEEE Trans. Antennas Propagat., Vol. 52, pp. 2684–2691, 2004.
[4] M. Bibby, and A. Peterson, “On the Use of Over-Determined Systems in the Adaptive Numerical
Solution of Integral Equations,” IEEE Trans. Antennas Propagat., Vol. 53, pp. 2267–2273, 2005.
[5] A. Peterson, and M. Bibby, “Error Trends in Higher-Order Discretizations of the EFIE and
MFIE,” Digest of the 2005 IEEE Antennas and Propagation Society International Symposium,
Washington, D.C., Vol. 3A, pp. 52–55, 2005.
[6] M. Bibby, and A. Peterson, “High Accuracy Evaluation of the EFIE Matrix Entries on a Planar
Patch,” Applied Computational Electromagnetics Society (ACES) Journal, Vol. 20, pp. 198–206,
2005.
[7] M. Bibby, C. Coldwell, and A. Peterson, “Normally-Integrated Magnetic Field Integral
Equations for Electromagnetic Scattering,” IEEE Trans. Antennas Propagat., Vol. 55, pp.
2530–2536, 2007.
[8] M. Bibby, A. Peterson, and C. Coldwell, “High Order Representations for Singular Currents at
Corners,” IEEE Trans. Antennas Propagat., Vol. 56, pp. 2277–2287, 2008.
[9] M. Bibby, A. Peterson, and C. Coldwell, “Use of Extrapolation to Improve Accuracy and
Enhance Confidence in Numerical Results,” IEEE Antennas and Propagation Magazine, Vol. 50,
No. 4, pp. 150–155, August 2008.
[10] M. Bibby, and A. Peterson, “Highly Accurate Implementations of Singularity Cancellation and
Extraction Methods on a Planar Patch,” ACES Journal, Vol. 23, pp. 298–302, 2008.
[11] A. Peterson, M. Bibby, and C. Coldwell, “Satisfaction of End, Continuity, and Junction
Conditions by Implicit and Explicit Subsectional Legendre Expansions,” Proceedings of the 25th
Annual Review of Progress in Applied Computational Electromagnetics, Monterey, CA, pp.
771–774, 2009.
[12] M. Bibby, A. Peterson, and C. Coldwell, “Optimum Cell Size for High Order Singular Basis
Functions at Geometric Corners,” ACES Journal, Vol. 24, pp. 368–374, 2009.
[13] A. Peterson, and M. Bibby, An Introduction to the Locally-Corrected Nyström Method, San
Rafael: Morgan & Claypool Synthesis Lectures, 2010.
[14] U. Saeed, and A. Peterson, “Local Residual Error Estimators for the Method of Moments
Solution of Electromagnetic Integral Equations,” ACES Journal, Vol. 26, pp. 403–410, 2011.
[15] M. Bibby, C. Coldwell, and A. Peterson, “A High Order Numerical Investigation of
Electromagnetic Scattering from a Torus and a Circular Loop,” IEEE Trans. Antennas Propagat.,
Vol. 61, pp. 3656–3661, 2013.
[16] M. Bibby, and A. Peterson, “High-Order Treatment of Junctions and Edge Singularities with the
Locally-Corrected Nyström Method,” ACES Journal, Vol. 28, pp. 892–902, 2013.
[17] M. Bibby and A. Peterson, Accurate Computation of Mathieu Functions, San Rafael: Morgan &
Claypool Synthesis Lectures, 2014.
[18] J. Davies, “A Least Square Boundary Residual Method for the Numerical Solution of Scattering
Problems,” IEEE Trans. Microwave Theory Tech., Vol. MTT-21, pp. 90–103, 1973.
[19] K. Bunch, “Theoretical and Numerical Foundations of a Boundary Residual Method for Solving
Three-Dimensional Boundary-Value Problems in Electromagnetics,” PhD Dissertation,
University of Utah, March 1990.
[20] K. Bunch and R. Grow, “Numerical Aspects of the Boundary Residual Method,” Int. J. Num.
Modelling, Vol. 3, pp. 57–71, 1990.
[21] K. Bunch, and R. Grow, “On the Convergence of the Method of Moments, the Boundary-
Residual Method, and the Point-Matching Method with a Rigorously Convergent Formulation of
the Point Matching Method,” ACES Journal, Vol. 8, No. 2, pp. 188–202, 1993.
[22] G. Hsiao and R. Kleinman, “Mathematical Foundations for Error Estimation in Numerical
Solutions of Integral Equations in Electromagnetics,” IEEE Trans. Antennas Propagat., Vol. 45,
pp. 316–328, 1997.
[23] J. Wang and J. Webb, “Hierarchal Vector Boundary Elements and Adaption for 3-D
Electromagnetic Scattering,” IEEE Trans. Antennas Propagat., Vol. 45, pp. 1869–1879, 1997.
[24] A. Fourie, D. Nitch, and A. Clark, “Predicting MoM Error Currents by Inverse Application of
Residual E-Fields,” ACES Journal, Vol. 14, pp. 72–75, 1999.
[25] F. Bogdanov and R. Jobava, “Estimating Accuracy of MoM Solutions on Arbitrarily
Triangulated 3-D Geometries Based on Examination of Boundary Conditions Performance and
Accurate Derivation of Scattered Fields,” JEWA, Vol. 18, No. 7, pp. 879–897, 2004.
[26] X. Wang, M. Botha, and J. Jin, “An Error Estimator for the Moment Method in Electromagnetic
Scattering,” Microwave and Optical Technology Letters, Vol. 44, pp. 320–326, 2005.
[27] M. Ainsworth and J. Oden, A posteriori Error Estimation in Finite Element Analysis, New York:
Wiley, 2000.
[28] M. Ilic, S. Savic, A. Ilic, and B. Notaros, “Constant Speed Parametrization Mapping of Curved
Boundary Surfaces in Higher-Order Moment-Method Electromagnetic Modeling,” IEEE
Antennas and Wireless Propagation Letters, Vol. 10, pp. 1457–1460, 2011.
[29] B. Notaros, “Higher Order Frequency-Domain Computational Electromagnetics,” IEEE Trans.
Antennas Propagat., Vol. 56, pp. 2251–2276, August 2008.
[30] R. Graglia, D. Wilton, and A. Peterson, “Higher-Order Interpolatory Vector Bases for
Computational Electromagnetics,” IEEE Trans. Antennas Propagat., Vol. 45, pp. 329–342, 1997.
[31] R. Graglia, A. Peterson, and F. Andriulli, “Curl-Conforming Hierarchical Vector Bases for
Triangles and Tetrahedra,” IEEE Trans. Antennas Propagat., Vol. 59, pp. 950–959, 2011.
[32] R. Graglia and A. Peterson, “Hierarchical Divergence-Conforming Nedelec Elements for
Volumetric Cells,” IEEE Trans. Antennas Propagat., Vol. 60, pp. 5215–5227, 2012.
[33] K. Warnick and A. Peterson, “Higher-Order Basis Functions,” in Numerical Analysis for
Electromagnetic Integral Equations, by K. F. Warnick, Norwood MA: Artech House, pp.
161–185, 2008.
[34] R. Satterwhite and R. Kouyoumjian, Electromagnetic Diffraction by a Perfectly Conducting
Plane Annular Section, Technical Report 2183-2, AFCRL-69-0401, The Ohio State University,
1970.
[35] J. Boersma and J. Jansen, Electromagnetic Field Singularities at the Tip of an Elliptic Cone,
Eindhoven University of Technology Report 90-WSK-01, 1990.
Chapter 8
Fast Electromagnetic Solver Based on
Randomized Pseudo-Skeleton Approximation
Xianyang Zhu

Efficient modeling of electromagnetic scattering problems has long been an
active topic in the field of CEM. To reduce memory usage and CPU time in MoM
[1], an efficient method based on the randomized pseudo-skeleton approximation
(RPSA) [2] is presented in this chapter. The algorithm starts with a multilevel
partitioning of the computational domain, which is very similar to the technique
employed in the multilevel fast multipole algorithm (MLFMA) [3–8]. The
impedance submatrices associated with well-separated partitioning groups
(far-far interaction terms) are low rank, and each can be represented by the
product of two much smaller matrices. The memory requirement is therefore
relieved, and the total simulation time is reduced significantly as well. RPSA is
employed to carry out this decomposition. In addition, in applications with many
right-hand sides, such as monostatic RCS simulations at multiple observation
angles, the RPSA method can also be employed to reduce the number of simulations.
This chapter is organized as follows: first, the background of the fast
electromagnetics solvers is briefly introduced; and second, different approaches
including the singular value decomposition (SVD) method, the randomized
projection method, adaptive cross approximation, and the decomposition technique
of the low rank matrix based on RPSA are presented. Several numerical examples
are employed to validate the proposed algorithm.

8.1 INTRODUCTION

MoM has been a very popular approach for solving electromagnetic scattering
problems. However, it suffers from a high memory requirement for large dense
impedance matrices and a high computational complexity for large-scale problems.
Over the past decades, significant progress has been made on reducing the memory
usage and computational cost of MoM. For example, the MLFMA [3–6] combined with
iterative techniques [7, 8] can reduce the memory usage and computational
complexity to O(N log N). However, one of the major disadvantages of this
approach is that the algorithm is not independent of the integral equation
kernel. That is, for integral equations with different kernels, one has to make
appropriate modifications to implement the associated fast algorithms. Another
approach to the compression of operators is based on wavelets [9, 10]; it
exploits the smoothness of the matrix elements viewed as a function of their
indices and tends to fail for highly oscillatory operators.
It is well known that the entire impedance matrix derived from MoM is
usually neither singular nor rank deficient. However, if all the unknowns are
assembled into groups as in MLFMA, then the submatrix blocks representing
the interactions between two well-separated groups (the far interaction terms)
are rank deficient. Recently, several approaches based on low-rank
representations of impedance matrix blocks have been introduced into the field
of CEM. These approaches include, but are not limited to, IES³ (pronounced “ice
cube,” an integral equation solver) [11], integral equation rank revealing (IE-QR)
[12, 13], predetermined interaction list octree (PILOT) [14], and adaptive cross
approximation (ACA) [15–19]. In these approaches, the impedance matrix blocks
associated with the far interaction terms are represented by a product of two much
smaller matrices. For example, if the size of a matrix block is m × n and
its effective rank is r, then it can be decomposed as the product of two matrices
with sizes m × r and r × n. Generally, r is much smaller than m and n. Thus,
the memory requirement is reduced from m × n to r × (m + n), and the same
ratio of CPU time saving is obtained for the matrix-vector multiplication. The
beauty of these algorithms is their purely algebraic nature: the computational
speed-up is achieved by linear algebra manipulations of the impedance matrix, so
their implementations do not depend on complete knowledge of the integral
equation kernels. However, the computational complexity of the aforementioned
algorithms depends on the dimensions of the matrix under decomposition. For
example, the computational complexity of the ACA algorithm is O(r^3(m + n)),
which is not trivial when m or n is very large.
In this chapter, we introduce the RPSA method [2] into the CEM community to
perform the matrix decomposition. In contrast to the aforementioned algorithms,
its computational complexity is O(r^3), which is independent of m and n.
In Section 8.2, we show that most submatrices of the impedance matrix
are rank deficient if the unknowns are partitioned into groups appropriately.
This property can be exploited to reduce the memory requirement and simulation
time. Different partitioning approaches are then presented in Section 8.3, and
different methods for low rank matrix decomposition are reviewed in Section 8.4.
These include the well-known SVD method, randomized algorithms, the popular ACA
algorithm, and the RPSA method. For applications with multiple right-hand sides,
the low rank property is exploited again to reduce the number of simulations
without sacrificing accuracy. The implementation details are presented in
Section 8.5. When the number of right-hand sides is large enough, direct methods
should be employed to reduce the simulation time; the block lower-upper (LU)
decomposition is a good candidate for this purpose, and details can be found in
Section 8.6. Parallelization of the direct solver via OpenMP is illustrated in
Section 8.7. Several examples are presented in Section 8.8 to demonstrate the
validity of the new algorithm, and the conclusions are summarized in Section 8.9.

8.2 LOW RANK PROPERTY OF SUBMATRICES OF PARTITIONED IMPEDANCE MATRIX

Electromagnetic scattering problems in the frequency domain can be solved by
applying MoM to the EFIE, the MFIE, or the combined field integral equation (CFIE)
[20]. An arbitrarily shaped target is often modeled by triangles, and the well-known
Rao-Wilton-Glisson (RWG) basis function [21] is often used to discretize the
integral equations into a large system of linear equations. In symbolic
notation, we have

$$ZI = V \tag{8.1}$$

where Z, I, and V are the impedance matrix, the current coefficient vector, and the
voltage vector associated with the incident fields, respectively. Their dimensions
are N × N, N × 1, and N × 1, respectively, where N is the number of current
coefficients (unknowns defined on the edges shared by two neighboring triangles).
Each element of the impedance matrix Z represents the interaction of two points
(one source point and one field point) on the target surface. It is
straightforward to find the scattered fields once the current coefficients are
determined.
Clearly, the memory requirement is N^2, and the computational complexity is
O(N^3) for direct approaches or O(MN^2) for iterative approaches, where M is the
number of iterations needed to reach a converged solution. For electrically large
problems, it is challenging to solve the large system of linear equations
successfully.
However, we know that about 10 samples per wavelength are regularly
required to get accurate results. This is due to the singular property of the
Green’s functions: the interaction between two points changes very quickly as the
distance between them becomes smaller and smaller. Therefore, more samples are
required to accurately model the interactions between two points close to each
other. However, the interactions are much smoother if the distance between the
points is relatively large; that is, oversampling is not required in those cases.
For a specific source point, oversampling is only necessary for field points in
its near region and is not necessary for all the other field points in the far
region. Therefore, there must be a lot of redundant information inside the
impedance matrix. Nevertheless, the entire impedance matrix is neither singular
nor rank deficient except at the internal resonances, since each unknown is both a
source and a field point and we have to oversample everywhere. That is to say, the
entire matrix itself is a full rank matrix.
To exploit the redundancy of the impedance matrix, we can partition
all the unknowns into groups according to their spatial positions. Assume that all
the unknowns are partitioned into J groups and that the unknowns inside each group
are indexed consecutively. Then Equation (8.1) can be rewritten as follows:

$$
\begin{bmatrix}
Z_{11} & Z_{12} & \cdots & Z_{1J} \\
Z_{21} & Z_{22} & \cdots & Z_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
Z_{J1} & Z_{J2} & \cdots & Z_{JJ}
\end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ \vdots \\ I_J \end{bmatrix}
=
\begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_J \end{bmatrix}
\tag{8.2}
$$

Here we reuse the subscripts. Now they stand for the indices of groups.
Therefore, each entry is a submatrix of the original impedance matrix Z,
representing the interactions between two groups. If the shortest distance between
two groups is large enough, then the submatrix is highly correlated since the
unknowns are oversampled everywhere and Green’s functions in that region are
changing slowly. That means there is a lot of redundant information in the
submatrix and therefore the submatrix must be rank deficient.
Several different approaches, including IES³ [11], IE-QR [12, 13], PILOT [14],
and the ACA algorithm [15–19], have been developed to exploit the aforementioned
low rank property. In these methods, all the impedance matrix blocks associated
with the far interaction terms are represented by a product of two much smaller
matrices. Without loss of generality, a low rank matrix A can be approximated as
the product of two smaller matrices U and V, namely,

$$A_{m \times n} \approx U_{m \times r} V_{r \times n} \tag{8.3}$$

The dimensions of the three matrices in (8.3) are m × n, m × r, and r × n,
respectively. Here m and n are the dimensions of the original low rank matrix, and
r is the effective rank of A. Note that the symbol V is reused here.
Instead of storing m × n impedance entries as in conventional MoM,
the low rank compression technique above only requires storing r × (m + n)
impedance entries.
Iterative approaches are often employed to solve a large system of linear
equations. The product of the impedance matrix and a vector then has to be
evaluated once or several times in each iteration. The computational complexity is
m × n if the impedance matrix is applied directly to the vector. With the
low rank approximation, we can first apply the matrix V and then the matrix U to
the vector in sequence, and the associated computational complexity is
r × (m + n). Therefore, the percentage of savings in CPU time is the same as that
in memory.
For electromagnetic problems, r is generally much smaller than m and n,
namely, r ≪ min(m, n). Therefore, the CPU time is significantly reduced, and
the memory requirement is relieved dramatically as well. The low rank
approximation is shown schematically in Figure 8.1.
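
The two-step matrix-vector product described above can be sketched in MATLAB as follows. The block sizes, the rank, and the random test data are assumptions chosen for illustration; the dense block A is formed here only to compare the two results.

```matlab
% Sketch: low rank matrix-vector product y = U*(V*x) versus the dense product.
rng(1);
m = 1200;  n = 1000;  r = 8;          % assumed block size and rank
U = randn(m,r) + 1j*randn(m,r);       % m x r factor
V = randn(r,n) + 1j*randn(r,n);       % r x n factor
A = U*V;                              % dense block, for comparison only
x = randn(n,1) + 1j*randn(n,1);

yDense   = A*x;                       % about m*n multiply-adds
yLowRank = U*(V*x);                   % about r*(m+n) multiply-adds

relErr    = norm(yDense - yLowRank)/norm(yDense);
costRatio = r*(m + n)/(m*n);          % same ratio for storage and for CPU time
fprintf('relative difference %.2e, cost ratio %.4f\n', relErr, costRatio);
```

The parentheses in `U*(V*x)` matter: evaluating `(U*V)*x` would rebuild the dense block and lose the saving.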

Figure 8.1 Decomposition schematic diagram of a low rank matrix.

An example rank map of an impedance matrix associated with an 8 × 8
PEC rough surface is shown in Figure 8.2. In this example the unknowns are
partitioned into 16 groups, and the diagonal and nondiagonal blocks represent the
self-interaction of one group and the mutual interaction of two different groups,
respectively. As mentioned before, any submatrix associated with two groups that
are not neighboring to each other is rank deficient. All those submatrices are
shown with numbers in the figure, where the numbers are the associated ranks. All
the other submatrices without numbers are associated with neighboring groups or
self-interactions.
The size of each submatrix is around 1,200 × 1,200 in this example. For a
typical rank 8, the memory requirement associated with the low rank
decomposition will be only 1.33% compared to the case without decomposition,
and the computational complexity of the matrix-vector product will be 1.33% as
well. This means that larger targets can be simulated with less CPU time with the
same computer resources, if the low rank decomposition is fully exploited.
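
The 1.33% figure follows directly from the storage counts given in Section 8.2, with m = n = 1,200 and r = 8:

```latex
\frac{r\,(m+n)}{m\,n}
  = \frac{8 \times (1200 + 1200)}{1200 \times 1200}
  = \frac{19\,200}{1\,440\,000}
  \approx 0.0133 = 1.33\%
```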
It should be noted that the other nondiagonal submatrices, which represent the
interactions between neighboring groups, are also low rank. Their ranks are larger,
but they can be compressed in the same way as the other submatrices.
Figure 8.2 Typical rank map of an impedance matrix.

8.3 PARTITIONING OF THE COMPUTATIONAL DOMAIN

Several algorithms are available for partitioning the computational domain.
One of the well-known approaches is the octree partitioning technique [22, 23],
which has been widely used in 3-D graphics and 3-D game engines. An octree is a
tree structure in which each internal node has exactly eight children. It works by
recursively partitioning the entire domain into eight groups (children, or
octants), as shown in Figure 8.3. Each group can again be partitioned into eight
subgroups. This process is repeated until the partitioning satisfies one or more
requirements; for example, it stops when the number of unknowns in a subgroup is
less than a specified number.
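
One level of the octree split can be sketched in MATLAB as below. The point cloud and the octant labeling (one bit per Cartesian axis) are illustrative assumptions; a full octree would apply the same split recursively to every child box that still holds too many unknowns.

```matlab
% Sketch: assign each unknown to one of the eight children of the root box.
rng(2);
pts = rand(1000,3);                   % hypothetical positions of the unknowns
lo  = min(pts,[],1);
hi  = max(pts,[],1);
ctr = (lo + hi)/2;                    % center of the bounding box

% One bit per axis: true if the point lies above the center along that axis.
bits   = bsxfun(@gt, pts, ctr);       % 1000 x 3 logical array
octant = 1 + bits(:,1) + 2*bits(:,2) + 4*bits(:,3);   % labels 1..8

counts = accumarray(octant, 1, [8 1]);  % number of unknowns per child box
disp(counts.');
```

The printed counts are generally unequal, which is the load-imbalance issue that arises when the unknowns are not uniformly distributed in space.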
Figure 8.3 Partition configuration in the octree technique.

The octree technique has been widely employed in MLFMA to partition a 3-D
space. Its main disadvantage is that the number of unknowns inside each
subgroup can be quite different at the same level, which can result in an
unbalanced load in a parallel implementation.
Another partitioning method is based on the cobblestone distance sorting
technique [17]. The steps involved in this partitioning method are summarized as
follows:

• Create a box bounding all unsorted unknowns and find the diagonal vector of
the box.
• Project all the unknowns to the diagonal vector and find the unknown that is
the closest to the diagonal vector.
• Use that unknown as the first point of the group.
• Compute the distances between this point and all the other unsorted points,
and fill the group with the closest unsorted points.
• Terminate when the desired group size is obtained or if the next point is
farther than a specified threshold.
• Repeat the above steps for the next group until all unknowns are partitioned.

An example partitioning result is shown in Figure 8.4, where different colors
are used to distinguish neighboring groups. The target is a rectangular plate
situated in the x-y plane, and 22,801 unknowns are partitioned into 30 groups.
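
The cobblestone steps listed above can be sketched in MATLAB as follows. This is a simplified reading rather than the reference implementation of [17]: the seed of each group is assumed to be the unknown whose projection onto the box diagonal is smallest, the distance-threshold termination is omitted, and the point set and group size are hypothetical.

```matlab
% Sketch: cobblestone-style grouping with a fixed group size.
rng(3);
pts = [rand(500,2), zeros(500,1)];    % hypothetical unknowns on a plate (z = 0)
groupSize = 50;                       % desired unknowns per group (assumed)
N = size(pts,1);
groupId = zeros(N,1);                 % 0 marks an unsorted unknown
g = 0;
while any(groupId == 0)
    g   = g + 1;
    idx = find(groupId == 0);         % indices of the unsorted unknowns
    P   = pts(idx,:);
    lo  = min(P,[],1);
    hi  = max(P,[],1);
    d   = hi - lo;                    % diagonal vector of the bounding box
    % Assumption: seed = unknown with the smallest projection on the diagonal.
    t = bsxfun(@minus, P, lo) * d.';
    [~, s] = min(t);
    % Fill the group with the unsorted unknowns closest to the seed.
    dist = sqrt(sum(bsxfun(@minus, P, P(s,:)).^2, 2));
    [~, order] = sort(dist);
    take = order(1:min(groupSize, numel(order)));
    groupId(idx(take)) = g;
end
counts = accumarray(groupId, 1);      % unknowns per group
fprintf('%d groups, sizes %d to %d\n', g, min(counts), max(counts));
```

Because every group is filled to the requested size before the next one is started, the group sizes are equal except possibly for the last group.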
Figure 8.4 An example of unknown partitioning based on the cobblestone distance sorting
technique; 22,801 unknowns are partitioned into 30 groups.

The advantage of the cobblestone partitioning method is that the number of
unknowns in each group can be specified by the user, and the unknowns are
almost evenly partitioned. Its disadvantage is that it is only suitable for
single-level partitioning; it cannot be employed for multilevel algorithms.
Another useful and promising method is based on binary space partitioning
[24]. Binary space partitioning is a generic process of recursively dividing the
unknowns into two groups until the number of unknowns in the group is less than
the threshold number specified by the users. The associated algorithm can be
described by the following steps:

1. Find the lower and upper limits of all unknowns along the three Cartesian
axes (this is equivalent to finding a bounding box for all unknowns);
2. Choose the axis with the largest extent, and partition all the unknowns along
the axis into two groups equally or almost equally (the maximum difference
is 1);
3. Repeat the above steps until the number of unknowns inside each group
meets the threshold number specified by the users.
It is obvious that this approach has more cuts along the axis with the largest
extent. Therefore, it has the combined advantages of both the octree and
cobblestone techniques: (1) the number of unknowns is almost the same for all
groups at the same level (the maximum difference is 1); and (2) it can be easily
employed for multilevel algorithms.
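
The three steps of binary space partitioning can be sketched in MATLAB as below. The loop processes one level per pass instead of using recursion, and the point cloud and threshold are assumptions chosen for illustration.

```matlab
% Sketch: binary space partitioning of the unknowns, one level per pass.
rng(4);
pts = rand(1000,3);                   % hypothetical positions of the unknowns
maxSize = 130;                        % user-specified threshold (assumed)
groups = {(1:size(pts,1)).'};         % start with a single group
done = false;
while ~done
    done = true;
    next = {};
    for q = 1:numel(groups)
        idx = groups{q};
        if numel(idx) <= maxSize
            next{end+1} = idx;        %#ok<SAGROW> group is small enough
            continue;
        end
        done = false;
        P = pts(idx,:);
        % Step 1 and 2: bounding box, then the axis with the largest extent.
        [~, ax] = max(max(P,[],1) - min(P,[],1));
        [~, order] = sort(P(:,ax));
        half = ceil(numel(idx)/2);    % halves differ by at most one unknown
        next{end+1} = idx(order(1:half));      %#ok<SAGROW>
        next{end+1} = idx(order(half+1:end));  %#ok<SAGROW>
    end
    groups = next;                    % Step 3: repeat on the new level
end
sizes = cellfun(@numel, groups);
fprintf('%d groups, sizes %d to %d\n', numel(groups), min(sizes), max(sizes));
```

Keeping the intermediate `groups` of every pass, rather than overwriting them, would give the group hierarchy needed by a multilevel algorithm.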
An example partitioning result is shown in Figure 8.5, where the target is a
sphere with 39,390 unknowns, which are partitioned into a total of 64 groups.
Figure 8.5 An example of unknown partitioning based on the binary space partitioning technique;
39,390 unknowns are partitioned into a total of 64 groups.

8.4 LOW RANK MATRIX DECOMPOSITION

In this section we will focus on how to decompose a rank deficient matrix into a
product of two smaller matrices as shown in (8.3).
To this end, several different approaches are available. We first review the
SVD method and then introduce an interesting approach based on randomized
projection. The ACA algorithm, one of the most popular compression
techniques of the past decade, is then presented. Finally, the randomized
pseudo-skeleton approximation method is introduced.

8.4.1 Singular Value Decomposition

The SVD method [25] is a factorization of a real or complex matrix. It has found
many applications in signal processing.
The SVD formulation of a complex matrix A with dimensions of m × n is:

$$
A = USV' =
\begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1m} \\
u_{21} & u_{22} & \cdots & u_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
u_{m1} & u_{m2} & \cdots & u_{mm}
\end{bmatrix}
\begin{bmatrix}
s_1 & 0 & \cdots & 0 \\
0 & s_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & s_m
\end{bmatrix}
\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
v_{n1} & v_{n2} & \cdots & v_{nn}
\end{bmatrix}
\tag{8.4}
$$

where U is an m × m complex unitary matrix, S is an m × n rectangular diagonal
matrix with non-negative real numbers ordered from maximum to minimum on the
diagonal, and V' (the conjugate transpose of V) is an n × n complex unitary matrix.
Note that the symbols U and V are reused.
SVD can be viewed as a method for transforming correlated data into a set of
uncorrelated ones that better expose the various relationships among the original
data items. In the meantime, it is also a method for identifying the most important
dimensions (basis vectors) along which data points exhibit the most variation. The
importance of each basis vector is reflected by its singular value.
The rank of the matrix A is the number of the nonzero singular values or the
number of the singular values larger than a specified threshold. That means that the
best approximation of the original matrix can be constructed by using fewer
dimensions (basis vectors) by throwing away those associated with zero or very
small singular values if the original matrix is low rank.
Assuming that the rank of the original matrix is r, only the first r singular
values and their associated basis vectors need to be kept, and the original matrix
can be approximated as

$$
A \approx
\begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1r} \\
u_{21} & u_{22} & \cdots & u_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
u_{m1} & u_{m2} & \cdots & u_{mr}
\end{bmatrix}
\begin{bmatrix}
s_1 & 0 & \cdots & 0 \\
0 & s_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & s_r
\end{bmatrix}
\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
v_{r1} & v_{r2} & \cdots & v_{rn}
\end{bmatrix}
\tag{8.5}
$$

The dimensions of the three matrices on the right side are m × r, r × r, and
r × n, respectively. Once the three matrices are obtained, the original matrix can
be represented as the product of two matrices by combining either the first two or
the last two. For example, the matrices U and V in (8.3) can be defined as the
product of the first two matrices and the third matrix on the right side of (8.5),
respectively, or they can be defined as the first matrix and the product of the
second and third matrices on the right side of (8.5), respectively.
It has been proved theoretically that the SVD method finds the best
decomposition of a low rank matrix for a given rank. In other words, for a given
accuracy, SVD finds the associated lowest rank.
However, SVD is very expensive, especially when the matrix dimensions are
large, since the computational complexity of the best algorithm for the SVD
computation of an m × n matrix is O(4m^2 n + 22n^3). Another disadvantage of SVD is
that all the elements of the original matrix are used in the decomposition, so all
of them have to be calculated. These two factors exclude SVD from use in
fast solvers.
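
The truncation in (8.5) and the first way of forming U and V in (8.3) can be sketched in MATLAB as follows. The test matrix (an exact rank-8 product plus a small perturbation) and the relative threshold are assumptions made for illustration.

```matlab
% Sketch: truncated SVD of a nearly low rank matrix.
rng(5);
m = 300;  n = 200;  r = 8;            % assumed sizes and underlying rank
A = (randn(m,r) + 1j*randn(m,r)) * (randn(r,n) + 1j*randn(r,n));
A = A + 1e-6*(randn(m,n) + 1j*randn(m,n));   % small perturbation

[Us, S, Vs] = svd(A, 'econ');         % A = Us*S*Vs'
s    = diag(S);
tol  = 1e-3;                          % relative threshold (assumed)
rEff = sum(s > tol*s(1));             % number of singular values kept

U = Us(:,1:rEff) * S(1:rEff,1:rEff);  % first two factors of (8.5), m x r
V = Vs(:,1:rEff)';                    % third factor of (8.5), r x n

relErr = norm(A - U*V, 'fro')/norm(A, 'fro');
fprintf('effective rank %d, relative error %.2e\n', rEff, relErr);
```

Note that the whole matrix A has to be available before `svd` is called, which is the second drawback mentioned above.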
8.4.2 Randomized Projection Approach

The randomized projection approach [26, 27] avoids applying the SVD
decomposition to the original matrix directly. Instead, the original matrix is
projected onto a much smaller space first. Then the rank of the matrix and
associated orthonormal bases are found through the much smaller matrix. The last
step is to find the associated coefficient matrix. The associated MATLAB code is
given in Listing 8.1.

Listing 8.1 Randomized low rank decomposition MATLAB code


function [U, V, rank] = randomized_low_rank_decomposition(A, rank_estimated)
% Low rank matrix decomposition using randomized projection
% A (m x n) = U (m x rank) * V (rank x n)
% Inputs:
%   A: low rank matrix with dimensions of m by n
%   rank_estimated: estimated rank of the matrix
% Outputs:
%   U: m x rank matrix
%   V: rank x n matrix
%   rank: real rank of the original matrix A

% dimensions of the original matrix
[m, n] = size(A);

% generate a randomized matrix (10 extra columns for oversampling);
% normrnd requires the Statistics Toolbox
G = normrnd(0, 1, n, rank_estimated + 10);

% project the original matrix
R = A*G;

% find the associated orthonormal bases and rank
U = orth(R);
rank = size(U, 2);

% find the coefficient matrix V
V = U'*A;

% check the rank
if rank > rank_estimated
    error('Estimated rank is too small!');
end

The key technique of the randomized projection approach is to project the
original large matrix onto a much smaller matrix via random projection.
It is evident that the disadvantages of this method include: (1) all the elements
of the original matrix have to be known; and (2) an estimated rank is needed.
8.4.3 Adaptive Cross Approximation (ACA)

The main idea of the ACA algorithm is to use an iterative pivoting procedure
to find the two submatrices adaptively. It uses a series of approximation matrices
S_0, S_1, …, S_r to approximate the original matrix. Note that the symbol S is
reused here. Equation (8.3) can be rewritten as a sum of outer products:

$$
A_{m \times n} \approx U_{m \times r} V_{r \times n}
= U(:,1)V(1,:) + U(:,2)V(2,:) + \cdots + U(:,r)V(r,:)
\tag{8.6}
$$

At the very beginning, the approximation matrix is set to zero (S_0 = 0). In the
first step of ACA, the pivoting procedure is used to find one column of U and one
row of V to approximate the original matrix. The associated approximation matrix
is defined as

$$S_1 = U(:,1)V(1,:) = S_0 + U(:,1)V(1,:) \tag{8.7}$$

Then check whether the norm of the matrix associated with the newly added column
and row is small enough compared to the norm of the approximation matrix
so far. If so, stop; otherwise, add another column and row to U and V,
respectively. Repeat the above steps until the residual error is smaller than the
specified threshold. The approximation matrix after the kth iteration can be
written as follows:

$$S_k = S_{k-1} + U(:,k)V(k,:) \tag{8.8}$$

After the first iteration, U and V have one column and one row, respectively.
The newly added column and row are chosen in such a way that the same column
and row in the approximation matrix are exactly the same as the original matrix if
the computer’s round-off error is not considered. The other elements in the
approximation matrix are approximated by the outer product of the newly added
column and row. Similarly, after the kth iteration, there are k columns and k
rows in the approximation matrix that are the same as those of the original
matrix. At the same time, the difference at the other columns and rows decreases
monotonically. This means that the ACA algorithm is guaranteed to converge
within min(m, n) iterations; that is the worst case, associated with a full rank
matrix.
It should be noted that not all elements of the original low rank matrix are
required. This is one of the most important features of ACA, especially when the
dimensions of the matrix are large and the elements are expensive to calculate. The
ACA algorithm is summarized as follows, where e_i denotes the ith unit vector and
R_k = A − S_k is the residual matrix after k iterations:

• Initialize the approximation matrix S_0 = 0 and, without loss of generality,
choose 1 as the first row index: i_1 = 1.
• For k = 0, 1, 2, …
  • Find the row of the residuum:

    $$e_{i_{k+1}}^T R_k = e_{i_{k+1}}^T A - \sum_{l=1}^{k} \left(U(:,l)\right)_{i_{k+1}} V(l,:)$$

  • Find the column index using the pivoting method:

    $$j_{k+1}: \quad \left|R_k(i_{k+1}, j_{k+1})\right| = \max_j \left|R_k(i_{k+1}, j)\right|$$

  • Normalize and get the (k+1)th row of V:

    $$V(k+1,:) = e_{i_{k+1}}^T R_k \,/\, R_k(i_{k+1}, j_{k+1})$$

  • Calculate the (k+1)th column of U:

    $$U(:,k+1) = A e_{j_{k+1}} - \sum_{l=1}^{k} \left(V(l,:)\right)_{j_{k+1}} U(:,l)$$

  • Find the row index for the next iteration:

    $$i_{k+2}: \quad \left|\left(U(:,k+1)\right)_{i_{k+2}}\right| = \max_{i \neq i_{k+1}} \left|\left(U(:,k+1)\right)_i\right|$$

  • Terminate the iteration if the stopping criterion is met:

    $$\left\|U(:,k+1)\right\|_F \left\|V(k+1,:)\right\|_F \le \varepsilon \left\|A\right\|_F$$

The original matrix is not known completely, so its norm cannot be calculated
directly. The norm of the approximation S_k can be used instead.
It can be computed recursively in the following way:

$$
\left\|S_k\right\|_F^2 = \left\|S_{k-1}\right\|_F^2
+ 2\,\mathrm{Re}\left\{ \sum_{j=1}^{k-1}
\left[\mathrm{conj}\left(U(:,j)\right)^T U(:,k)\right]
\left[V(k,:)\,\mathrm{conj}\left(V(j,:)\right)^T\right] \right\}
+ \left\|U(:,k)\right\|_F^2 \left\|V(k,:)\right\|_F^2
\tag{8.9}
$$

The disadvantage of ACA is that the vectors inside U and V are not orthogonal.
This implies that there is still redundant information in U and V; that is, they
are still rank deficient themselves. To remove the redundancies, the QR
factorization (also called the QR decomposition) and SVD can be employed together.
First apply the QR factorization to U and V', respectively:

$$U = Q_u R_u \tag{8.10}$$

$$V' = Q_{vp} R_{vp} \tag{8.11}$$

Then the original matrix can be approximated as

$$A \approx UV = Q_u R_u R'_{vp} Q'_{vp} \tag{8.12}$$

Then apply SVD to the product of the two middle matrices to find its real rank r
based on the singular values:

$$R_u R'_{vp} = U_{tmp} S_{tmp} V_{tmp} \tag{8.13}$$

During this step, the effective rank r of the matrix A is determined by

$$r = \mathrm{sum}\left(\mathrm{diag}(S_{tmp}) > tol \times S_{tmp}(1,1)\right) \tag{8.14}$$

where tol is the relative tolerance. Generally, tol is chosen to be 10^-3. The ACA
results can thus be recompressed as follows:

$$A \approx U_{new} V_{new} \tag{8.15}$$

where

$$U_{new} = Q_u\, U_{tmp}(:,1:r)\, S_{tmp}(1:r,1:r) \tag{8.16}$$

$$V_{new} = V_{tmp}(1:r,:)\, Q'_{vp} \tag{8.17}$$

The above recompression procedure is very important if ACA is employed for
the low rank decomposition. Considerable memory and CPU time can be saved
in the later steps.
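
The ACA iteration with the recursive norm update (8.9), followed by the QR/SVD recompression (8.10) to (8.17), can be sketched in MATLAB as below. This is a minimal illustration under stated assumptions: the test matrix, the threshold `epsAca`, and all variable names are hypothetical, and the full matrix A is stored only for convenience, while the loop reads just one row and one column of it per iteration.

```matlab
% Sketch: ACA with partial pivoting, then QR/SVD recompression.
rng(6);
m = 300;  n = 250;  rTrue = 6;        % assumed sizes and exact rank
A = (randn(m,rTrue) + 1j*randn(m,rTrue)) * (randn(rTrue,n) + 1j*randn(rTrue,n));
epsAca = 1e-6;                        % stopping threshold (assumed)

U = zeros(m,0);  V = zeros(0,n);
usedRows = false(m,1);
iRow = 1;                             % first row index
normS2 = 0;                           % running value of ||S_k||_F^2
for k = 1:min(m,n)
    usedRows(iRow) = true;
    rowRes = A(iRow,:) - U(iRow,:)*V;         % row of the residual
    [pivAbs, jCol] = max(abs(rowRes));        % column pivot
    if pivAbs == 0, break; end
    v = rowRes / rowRes(jCol);                % new row of V
    u = A(:,jCol) - U*V(:,jCol);              % new column of U
    % recursive norm update, as in (8.9)
    normS2 = normS2 + 2*real(sum((U'*u) .* (v*V').')) + norm(u)^2*norm(v)^2;
    U = [U, u];  V = [V; v];                  %#ok<AGROW>
    if norm(u)*norm(v) <= epsAca*sqrt(normS2), break; end
    absU = abs(u);  absU(usedRows) = -1;      % exclude rows already used
    [~, iRow] = max(absU);                    % row pivot for the next step
end

% Recompression. MATLAB returns Ru*Rvp' = Ut*St*Vt', so Vt' plays the
% role of V_tmp in (8.13).
[Qu, Ru]   = qr(U, 0);
[Qvp, Rvp] = qr(V', 0);
[Ut, St, Vt] = svd(Ru*Rvp');
sv  = diag(St);
tol = 1e-3;
r   = sum(sv > tol*sv(1));
Unew = Qu*Ut(:,1:r)*St(1:r,1:r);
Vnew = Vt(:,1:r)'*Qvp';
relErr = norm(A - Unew*Vnew, 'fro')/norm(A, 'fro');
fprintf('ACA terms %d, recompressed rank %d, error %.2e\n', size(U,2), r, relErr);
```

ACA typically stops with one or more columns beyond the true rank, because the stopping test is only met after a negligible term has been added; the recompression step removes those extra terms.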

8.4.4 Randomized Pseudo-Skeleton Approximation

Before going to the details of the randomized pseudo-skeleton approximation, we


first review the skeleton approximation method.
Assume that the rank of the matrix A is r, then there exists a nonsingular r × r
submatrix 𝐴̂ in A. Denote the columns and rows of A containing the submatrix 𝐴̂
by C (with dimensions of m × r) and R (with dimensions of r × n), respectively.
That is, submatrix 𝐴̂ is the intersection of C and R. Then it is easy to verify
Fast Electromagnetic Solver Based on RPSA 313

Am  n   C m  r A
ˆ 1 r  r Rr  n  (8.18)

This decomposition is known as a skeleton approximation of A.


The problem with the skeleton approximation is that one must identify which
columns and rows should be chosen. Random selection may lead to a singular 𝐴̂,
which means that not enough bases are embedded in those columns and rows; the
inverse is then not available, and the decomposition fails. Actually, the original
matrix can be approximated by

$$A_{m \times n} \approx C_{m \times r} \, G_{r \times r} \, R_{r \times n} \tag{8.19}$$

where G is not necessarily equal to the inverse of 𝐴̂ and is not even necessarily
nonsingular. For example, G can be chosen as the pseudo-inverse of 𝐴̂. This kind
of decomposition is called the pseudo-skeleton approximation.
Once the matrices C, G, and R are obtained, one can easily obtain the matrices
U and V defined in (8.3):

$$U = C \tag{8.20}$$

$$V = G R \tag{8.21}$$

Or we can have:

$$U = C G \tag{8.22}$$

$$V = R \tag{8.23}$$

Notice that the most time-consuming part in the above decomposition is
computing the pseudo-inverse of 𝐴̂, whose computational complexity is $O(r^3)$.
Thus the decomposition is very fast, since r << m and r << n for most
applications in CEM. It is also faster than the popular ACA.
In practical applications, the exact rank r of the investigated matrix is usually
unknown. However, for the submatrices representing the interactions between
non-neighboring groups, the rank range at each level can be roughly estimated in
advance [28–31].
Now the question is how to select the r rows and columns. This is not an easy
task. Unlike ACA, which uses an iterative pivoting approach to find the most
informative columns of C and rows of R, here we propose a very simple idea to
circumvent the bottleneck: just randomly draw l columns and rows from
the original matrix A, where l is a number large enough that the r most important
basis vectors are embedded in the randomly selected column and row submatrices.
This is called the randomized pseudo-skeleton approximation.
Similar to Equation (8.19), the original low rank matrix can be approximated
as

$$A_{m \times n} \approx C_{m \times l} \, G_{l \times l} \, R_{l \times n} \tag{8.24}$$

Notice that the dimensions of C, G, and R are changed. G is the pseudo-inverse
of 𝐴̂, and 𝐴̂ is the intersection matrix of C and R.
The real rank of the original low rank matrix A can be revealed directly in the
step of evaluating G (the pseudo-inverse of 𝐴̂).
Assume that the SVD of 𝐴̂ can be written as

$$\hat{A}_{l \times l} = (U_{Acap})_{l \times l} \, (S_{Acap})_{l \times l} \, (V_{Acap})_{l \times l} \tag{8.25}$$

The effective rank r can be determined in this step according to (8.14). We
have

$$\hat{A}_{l \times l} \approx U_{Acap}(:, 1:r) \, S_{Acap}(1:r, 1:r) \, V_{Acap}(1:r, :) \tag{8.26}$$

Then the pseudo-inverse of 𝐴̂ (G) can be determined readily:

$$G_{l \times l} = (U_P)_{l \times r} \, (S_P)_{r \times r} \, (V_P)_{r \times l} \tag{8.27}$$

where $U_P$ and $V_P$ are the conjugate transposes of $V_{Acap}(1:r, :)$ and
$U_{Acap}(:, 1:r)$, respectively. $S_P$ is still a diagonal matrix, and its diagonal
elements are the inverses of their counterparts in $S_{Acap}(1:r, 1:r)$.
Similar to (8.20)–(8.23), the original low rank matrix A can be approximated by
the product of two smaller matrices U and V:

$$U = C U_P \tag{8.28}$$

$$V = S_P V_P R \tag{8.29}$$

or

$$U = C U_P S_P \tag{8.30}$$

$$V = V_P R \tag{8.31}$$

Note that only l rows and l columns of the original matrix are needed.
Numerical experiments show that l = 2r is good enough to obtain excellent results.
Compared to the ACA algorithm, the randomized pseudo-skeleton
approximation does not need the recompression step, since its effective rank is
revealed automatically when evaluating the pseudo-inverse of the intersection
matrix. The computational complexity is still $O(r^3)$, which is much smaller than
that of ACA.
The randomized pseudo-skeleton approximation algorithm can be summarized
by the following simple and straightforward steps:

1. Randomly select l rows and l columns of the original low rank matrix;
2. Find the effective rank r of the intersection matrix of the row and column
matrices;
3. Decompose the original matrix using (8.28) and (8.29) or (8.30) and (8.31).

The randomized pseudo-skeleton approximation is schematically shown in
Figure 8.6. In the example shown in Figure 8.6, columns 4, 11, 21, 27, and 31
and rows 2, 7, 18, 26, and 31 are chosen to form the column matrix C and the
row matrix R, respectively. The intersection matrix 𝐴̂ is just the intersection of C
and R. The computational complexity of the pseudo-inverse operation on the
intersection matrix is of the same order as that of the SVD. But one should note
that the dimensions of the intersection matrix 𝐴̂ are much smaller than those of
the original matrix A.

Figure 8.6 Schematic diagram of the randomized pseudo-skeleton approximation.
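The three steps above might be sketched in MATLAB as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic rank-10 test matrix, the seed, and the name r_true are made up for the example, and the full matrix A is formed here only so that the error can be checked (in practice only the sampled rows and columns would be computed). MATLAB's svd returns Ahat = Ua*Sa*Va', so Va(:,1:r) and Ua(:,1:r)' play the roles of U_P and V_P.

```matlab
% Sketch only: randomized pseudo-skeleton approximation of a low rank matrix.
rng(2);
m = 1200; n = 1200; r_true = 10;          % synthetic test case (assumption)
A = (randn(m, r_true) + 1i*randn(m, r_true)) * ...
    (randn(r_true, n) + 1i*randn(r_true, n));
l   = 30;                                 % number of sampled rows and columns
tol = 1e-3;                               % relative tolerance as in (8.14)

% Step 1: randomly draw l distinct rows and l distinct columns.
rows = randperm(m, l);
cols = randperm(n, l);
C    = A(:, cols);                        % m x l column matrix
R    = A(rows, :);                        % l x n row matrix
Ahat = A(rows, cols);                     % l x l intersection matrix

% Step 2: the SVD of the intersection matrix reveals the effective rank.
[Ua, Sa, Va] = svd(Ahat);
s = diag(Sa);
r = sum(s > tol * s(1));

% Step 3: truncated pseudo-inverse factors, then U and V as in (8.30), (8.31).
Up = Va(:, 1:r);                          % l x r
Sp = diag(1 ./ s(1:r));                   % r x r, inverted singular values
Vp = Ua(:, 1:r)';                         % r x l
U  = C * Up * Sp;                         % m x r
V  = Vp * R;                              % r x n

rel_err = norm(A - U*V, 'fro') / norm(A, 'fro');
fprintf('revealed rank %d, relative error %.2e\n', r, rel_err);
```

No separate recompression pass is needed here, because the truncation to r happens while the pseudo-inverse is formed.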

If the rank r is not known a priori, an adaptive scheme can be employed.
There is an implicit assumption in the randomized pseudo-skeleton approximation
algorithm that the rank of the intersection matrix should be equal (or at least
approximately equal) to that of the original low rank matrix. Therefore, we can
first set an initial size of the intersection matrix, and then double its size at each
subsequent iteration. Once the rank of the intersection matrix does not change
between two consecutive iterations, the iteration can exit, and the associated rank
is used as the rank of the original matrix. The cost of this scheme is dominated by
the last iteration. Therefore, its computational complexity is still of the order of
$O(r^3)$.
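One possible reading of this adaptive scheme is sketched below. The initial sample size, the test matrix, and the stopping guard for l reaching the matrix size are assumptions added for the example; for real impedance blocks with slowly decaying singular values, the revealed rank also depends on the chosen tolerance.

```matlab
% Sketch only: adaptive choice of the sample size when the rank is unknown.
rng(3);
m = 600; n = 500; r_true = 12;            % synthetic test case (assumption)
A   = randn(m, r_true) * randn(r_true, n);
tol = 1e-3;                               % relative tolerance as in (8.14)
l   = 4;                                  % hypothetical initial sample size
r_prev = -1;                              % no previous rank estimate yet

while true
    l = min(l, min(m, n));                % cannot sample more than exists
    rows = randperm(m, l);
    cols = randperm(n, l);
    s = svd(A(rows, cols));               % singular values of the intersection
    r = sum(s > tol * s(1));              % rank revealed at this sample size
    if r == r_prev || l == min(m, n)
        break;                            % rank unchanged: accept it
    end
    r_prev = r;
    l = 2 * l;                            % double the intersection size
end
fprintf('revealed rank %d with l = %d samples\n', r, l);
```

Because the sample size doubles each time, the last SVD dominates the total cost, in line with the complexity statement above.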

8.5 LOW RANK DECOMPOSITION OF MULTIPLE RIGHT SIDES

For many practical applications, one is often interested in the monostatic scattering
patterns of a target. In these cases, the aforementioned large system of linear
equations in (8.1) needs to be solved many times. For the same target, the terms
on the left side remain the same, while the voltage vector on the right side changes
with the observation angle. The simulation is very costly if a large number of
observation angles are considered.
There is another issue associated with the monostatic applications, namely,
how many samples (observation angles) are needed to guarantee that all the details
of the monostatic scattering patterns can be caught? In general, this question can
only be addressed case by case: more observation angles need to be considered if
the scattering pattern is complex with more fast-changing details, and vice versa.
The problem is that we do not know beforehand if the scattering pattern is complex
or not, especially for targets with complex geometries.
The randomized pseudo-skeleton approximation can be employed here to
address the above issues. The right sides can be decomposed as the product of two
much smaller matrices:

$$V \approx U_v V_v \tag{8.32}$$

where the dimensions of V, U_v, and V_v are Nedge × Nobservation, Nedge × k, and
k × Nobservation, respectively. Nedge is the number of unknowns (defined on the
edges shared by two triangles), Nobservation is the number of observation angles,
and k is the rank of V. It should be noted that V is a very big matrix since Nedge
and Nobservation are very large numbers. But there is no need to calculate all
elements of V. Only a few rows and columns are randomly selected and calculated.
The k columns of the matrix U_v can be viewed as the principal bases of the
original voltage matrix V, while each column of the matrix V_v represents the
weighting coefficients of all the k bases at the associated observation angle.
Equation (8.32) implies that only k independent voltage vectors need to be
considered for the original Nobservation observation angles. The steps to solve the
original problems are as follows:
• Find the low rank decomposition of the right sides with the employment of
the randomized pseudo-skeleton approximation.
• Use each column in U_v as the right side of (8.1), and solve the equations to
obtain the current coefficients on the target surface k times. These current
coefficients are referred to as principal current components.
• For each of the original Nobservation observation angles, the current coefficients
are just a linear combination of the k principal current components. The
weighting coefficients are given in the columns of the matrix V_v.

Solving the large system of equations is time consuming. But we only need to
do that k times rather than Nobservation times. Generally, k is much smaller than
Nobservation; thus, a lot of simulation time can be saved.
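The three steps can be illustrated on a small synthetic problem. Everything below is a sketch under assumptions: Z is a random, well-conditioned stand-in for the impedance matrix, Vrhs is a synthetic low rank right-side matrix, and the names Z, Vrhs, Uv, Vv, Ip, and k_true are chosen for the example. The full Vrhs is formed only to obtain a reference solution.

```matlab
% Sketch only: k solves instead of N_obs solves for many right sides.
rng(4);
N_edge = 300; N_obs = 400; k_true = 6;    % synthetic sizes (assumption)
Z = randn(N_edge) + 1i*randn(N_edge) + N_edge*eye(N_edge);  % stand-in matrix
Vrhs = (randn(N_edge, k_true) + 1i*randn(N_edge, k_true)) * ...
       (randn(k_true, N_obs) + 1i*randn(k_true, N_obs));

% Step 1: low rank decomposition of the right sides by RPSA, as in (8.32).
l = 20; tol = 1e-3;
rows = randperm(N_edge, l);
cols = randperm(N_obs, l);
[Ua, Sa, Va] = svd(Vrhs(rows, cols));     % intersection matrix
s = diag(Sa);
k = sum(s > tol * s(1));                  % revealed rank of the right sides
Uv = Vrhs(:, cols) * Va(:, 1:k) * diag(1 ./ s(1:k));   % N_edge x k
Vv = Ua(:, 1:k)' * Vrhs(rows, :);                      % k x N_obs

% Step 2: solve only k systems to get the principal current components.
Ip = Z \ Uv;                              % N_edge x k

% Step 3: currents at all angles are linear combinations of the k components.
I_all = Ip * Vv;                          % N_edge x N_obs

I_ref = Z \ Vrhs;                         % brute-force reference solution
rel_err = norm(I_all - I_ref, 'fro') / norm(I_ref, 'fro');
fprintf('k = %d solves instead of %d, relative error %.2e\n', k, N_obs, rel_err);
```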
Some numerical results will be shown in the next section to validate the
approach.

8.6 DIRECT SOLVER BASED ON BLOCK LU DECOMPOSITION

Generally, the number of unknowns is large. It is very difficult and expensive to
find the inverse of the impedance matrix directly. Iterative algorithms like the
Krylov subspace method [32] can be employed to solve the large system of linear
equations.
However, iterative algorithms are only good for radiation or bistatic problems,
where the number of right-side vectors is very small. For monostatic applications
with many observation angles, iterative algorithms are not smart choices since the
iterative procedure has to be restarted from scratch for each observation angle.
A direct solver based on the block LU decomposition provides a promising
alternative for those monostatic applications. Compared to the iterative methods, it
is advantageous for at least three important reasons: (1) it has none of the
convergence issues associated with the iterative methods; (2) the nondiagonal
blocks of L and U are low rank, so they can be decomposed as the product of two
smaller matrices like the impedance submatrices discussed before; and (3) the
algorithm is rich in matrix multiplication, which is beneficial for high performance
computing. Parallel computation can be easily implemented by exploiting the
subroutines at all three levels of the basic linear algebra subprograms (BLAS).
The block LU decomposition is a generalization of the scalar LU
decomposition. If all the unknowns are partitioned into J groups, the
impedance matrix in (8.2) can be rewritten as follows:

$$
\begin{bmatrix}
Z_{11} & Z_{12} & \cdots & Z_{1J} \\
Z_{21} & Z_{22} & \cdots & Z_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
Z_{J1} & Z_{J2} & \cdots & Z_{JJ}
\end{bmatrix}
=
\begin{bmatrix}
L_{11} & 0 & \cdots & 0 \\
L_{21} & L_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
L_{J1} & L_{J2} & \cdots & L_{JJ}
\end{bmatrix}
\begin{bmatrix}
U_{11} & U_{12} & \cdots & U_{1J} \\
0 & U_{22} & \cdots & U_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & U_{JJ}
\end{bmatrix}
\tag{8.33}
$$

MATLAB code for the block LU decomposition is shown in Listing 8.2.
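Listing 8.2 Block LU decomposition MATLAB code

function [L, U] = block_LU_decomposition(A)
% Block LU decomposition of a partitioned matrix A.
% Input:
%   A: n_group-by-n_group cell array of blocks (partitioned matrix)
% Outputs:
%   L: cell array holding the blocks of the block lower triangular factor
%   U: cell array holding the blocks of the block upper triangular factor
n_group = size(A, 1);          % number of groups
L = cell(n_group);             % blocks above the diagonal stay empty
U = cell(n_group);             % blocks below the diagonal stay empty
for k = 1 : n_group
    % Factorize the current diagonal block (L_tmp may be row-permuted).
    [L_tmp, U_tmp] = lu(A{k, k});
    L{k, k} = L_tmp;
    U{k, k} = U_tmp;
    % Blocks in column k of L and in row k of U.
    for i = k+1 : n_group
        L{i, k} = A{i, k} / U_tmp;
        U{k, i} = L_tmp \ A{k, i};
    end
    % Update the trailing blocks (Schur complement).
    for j = k+1 : n_group
        for i = k+1 : n_group
            A{i, j} = A{i, j} - L{i, k} * U{k, j};
        end
    end
end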

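In fact, it is quite expensive to obtain the block LU decomposition of the
impedance matrix. But that process is required only once. Once the block LU
decomposition is done, it is straightforward to find the current distributions by
using the forward and backward substitution in sequence as follows:

$$
\begin{bmatrix}
L_{11} & 0 & \cdots & 0 \\
L_{21} & L_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
L_{J1} & L_{J2} & \cdots & L_{JJ}
\end{bmatrix}
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_J \end{bmatrix}
=
\begin{bmatrix} V_1 \\ V_2 \\ \vdots \\ V_J \end{bmatrix}
\tag{8.34}
$$

$$
\begin{bmatrix}
U_{11} & U_{12} & \cdots & U_{1J} \\
0 & U_{22} & \cdots & U_{2J} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & U_{JJ}
\end{bmatrix}
\begin{bmatrix} I_1 \\ I_2 \\ \vdots \\ I_J \end{bmatrix}
=
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_J \end{bmatrix}
\tag{8.35}
$$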

It is observed that the most expensive part of this step is the inversion of the
diagonal blocks. They should be inverted immediately after the block LU
decomposition. Most operations are associated with matrix multiplication, and they
can be highly parallelized using the BLAS library. Hence, the solution time to find
the current distributions is negligible compared to the block LU decomposition of
the impedance matrix.
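The forward and backward substitutions in (8.34) and (8.35) might look as follows in MATLAB. This is a self-contained sketch: the small random test matrix Zfull, the block size nb, and the helper idx are assumptions made for the example, and the block LU loop simply repeats the logic of Listing 8.2 inline so that the script runs on its own. The diagonal blocks are applied with backslash here rather than pre-inverted.

```matlab
% Sketch only: block forward/backward substitution after a block LU.
rng(5);
J = 3; nb = 4;                            % 3 groups of 4 unknowns (assumption)
Zfull = randn(J*nb) + J*nb*eye(J*nb);     % well-conditioned stand-in matrix
idx = @(g) (g-1)*nb + (1:nb);             % indices of the unknowns in group g
A = cell(J);
for i = 1:J
    for j = 1:J
        A{i, j} = Zfull(idx(i), idx(j));  % partition into J x J blocks
    end
end

% Block LU decomposition (same logic as Listing 8.2).
L = cell(J); U = cell(J);
for k = 1:J
    [L_tmp, U_tmp] = lu(A{k, k});
    L{k, k} = L_tmp;  U{k, k} = U_tmp;
    for i = k+1:J
        L{i, k} = A{i, k} / U_tmp;
        U{k, i} = L_tmp \ A{k, i};
    end
    for j = k+1:J
        for i = k+1:J
            A{i, j} = A{i, j} - L{i, k} * U{k, j};
        end
    end
end

Vrhs = randn(J*nb, 1);                    % one right-side vector

% Forward substitution (8.34): solve L*Y = V block row by block row.
Y = cell(J, 1);
for i = 1:J
    rhs = Vrhs(idx(i));
    for j = 1:i-1
        rhs = rhs - L{i, j} * Y{j};
    end
    Y{i} = L{i, i} \ rhs;
end

% Backward substitution (8.35): solve U*I = Y from the last block upward.
Icur = cell(J, 1);
for i = J:-1:1
    rhs = Y{i};
    for j = i+1:J
        rhs = rhs - U{i, j} * Icur{j};
    end
    Icur{i} = U{i, i} \ rhs;
end

I_block = vertcat(Icur{:});
I_ref   = Zfull \ Vrhs;                   % dense reference solution
rel_err = norm(I_block - I_ref) / norm(I_ref);
fprintf('relative error of the block solve: %.2e\n', rel_err);
```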
It is worthwhile to note that all the nondiagonal submatrices of L and U are
low rank as well. Therefore, the RPSA algorithm can also be employed here to
compress all those submatrices.

8.7 PARALLELIZATION VIA OPENMP AND BLAS LIBRARY

The MoM code based on the randomized pseudo-skeleton approximation can be
parallelized efficiently using OpenMP [33].
OpenMP is an application programming interface that supports multiplatform
shared memory multiprocessing programming. It consists of a set of compiler
directives, library routines, and environment variables that influence run-time
behavior. Generally, we only need to modify a serial code slightly to implement
the parallelization. For example, we can add a preprocessor directive to a code
block to form multiple threads. The multiple threads can then run concurrently,
with the run-time environment allocating threads to different processors.
However, most operations are associated with matrix multiplications. We can
rely on the BLAS library [34] to get the optimized performance. The BLAS library
consists of a specified set of low-level subroutines that perform common linear
algebra operations such as copying, vector scaling, vector dot products, linear
combinations, and matrix multiplication. Several level-3 BLAS subroutines,
especially those associated with the matrix multiplication (SGEMM, DGEMM,
CGEMM, and ZGEMM), have been highly optimized for multiprocessors. We
should try to use those subroutines as much as possible.
The flowchart of the MoM code is shown in Figure 8.7. From the figure we
can see that the first two sections do not need to be parallelized. The third section
is about the compression of all nondiagonal impedance submatrices. This part can
be fully parallelized via OpenMP. The simulation time can be scaled perfectly,
since all submatrices are independent of each other. The fourth section is used to
compress the right sides. Generally, we need to consider two polarizations (at
most) for the incident fields. Therefore, we only need two processors to implement
this part; all other processors are idle. The block LU decomposition in the next
section cannot be parallelized perfectly. This is because the updated information of
the submatrices in previous rows is needed for a given row. However, all the
submatrices in a given row are still independent of each other. Therefore,
OpenMP can be exploited row by row. The sections for the calculations of
current distributions and RCS are totally independent of each other. Again, they
can be fully parallelized.
Figure 8.7 Flowchart of the parallelized MoM code. Legend: single core; multiple
cores (partial); multiple cores (full).

8.8 NUMERICAL EXAMPLES

In this section, several different numerical examples are presented to validate the
RPSA method.

8.8.1 Selection of the Sample Numbers

The first numerical example is used to determine how many random rows and
columns are needed to guarantee that RPSA works with the specified accuracy.
An electromagnetic-related impedance submatrix representing interaction
between two well separated groups is employed here for the study of the selection
of the sample numbers.
The size of the impedance submatrix is 280 × 280, and its effective rank is 8,
which is determined numerically according to (8.14).
The relative errors (the ratio of the Frobenius norms (also known as Hilbert-
Schmidt or Schur norm) of the difference matrix to the original matrix) as a
function of sample numbers are shown in Figure 8.8. For each sample number, we
run the code 10,000 times and select the worst case to calculate the relative error.
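Written out, the relative error used here is the following ratio. The symbols ε and x_ij are introduced only for this illustration, and U and V denote the two low rank factors returned by RPSA.

```latex
\varepsilon = \frac{\lVert A - U V \rVert_F}{\lVert A \rVert_F},
\qquad
\lVert X \rVert_F = \sqrt{\sum_{i}\sum_{j} \lvert x_{ij} \rvert^{2}}
```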
From the figure, we can see that the relative error is of the order of $10^{-4}$ when
the number of samples is three times the effective rank. The relative error is
of the order of $10^{-3}$ when the number of samples is twice the effective rank.
This should be sufficient for most applications.
It should be noted that the above results are based on real data. For
synthetic data in which all the smaller singular values are set to zero, the RPSA
algorithm will be successful if the number of samples is larger than the rank.

Figure 8.8 Relative errors as a function of the number of samples.

8.8.2 Accuracy of the Randomized Pseudo-Skeleton Approximation

To test the accuracy of the RPSA algorithm, a random complex low rank matrix is
generated. The size of the matrix is 1,200 × 1,200, and its rank is 10. The real parts
are shown in Figure 8.9. The imaginary parts are very similar.

Figure 8.9 Real parts of the original low rank matrix.


Thirty rows and columns chosen randomly are used to obtain its pseudo-
skeleton decomposition. The revealed rank is 10, exactly the same as the ground
truth. The difference between the reconstructed matrix and the original one is
shown in Figure 8.10. Notice that the difference is at the level of $10^{-5}$. It is
evident that the RPSA method performs excellently.

Figure 8.10 Differences between the original and reconstructed matrices.

8.8.3 Comparison with ACA

In this subsection, we will compare the performance between RPSA and ACA.
Four low rank random matrices are generated; their dimensions are 500 × 5,000,
1,000 × 1,000, 1,500 × 1,500, and 2,000 × 2,000, and their ranks are 50, 100, 150,
and 200, respectively.
For ACA, the revealed ranks are 55, 106, 158, and 217, respectively.
Therefore, further compression is necessary for ACA. The termination criterion for
the first three cases is $10^{-6}$. However, wrong decomposition results are
obtained for the last matrix with this criterion. The criterion has to be tightened to
$10^{-8}$ to obtain a good result in this case. The relative Frobenius norm errors in
the four cases are at the level of $10^{-13}$ (3.43e-14, 1.32e-13, 2.40e-13, and
4.30e-13).
For RPSA, the revealed ranks are exactly the same as the ground truths. This
is because the real rank can be found directly when calculating the pseudo-inverse
of the intersection matrix. The threshold for the pseudo-inverse is set to
0.001 for all cases. The relative Frobenius norm errors in the four cases are at the
level of $10^{-15}$ (4.12e-15, 6.25e-15, 5.97e-15, and 1.07e-14). They are at least
one order of magnitude smaller than those of ACA.
The CPU time for both algorithms is shown in Figure 8.11. Again we can see
that RPSA is at least one order of magnitude faster than ACA.

Figure 8.11 Comparison of CPU times for ACA and RPSA.

8.8.4 RCS of a PEC Sphere

The far field of a PEC sphere can be expressed in a closed form via Mie series [35].
Therefore, it is often used to test the accuracy of different electromagnetic solvers.
To this end, the RCS of a PEC sphere versus frequency is calculated.
The sphere is modeled by 72,982 triangles (109,473 unknowns), and its radius
is 5 meters. The variation of the normalized RCS with frequency is shown in
Figure 8.12, where the solid and dashed lines represent the Mie series solution and
numerical results based on RPSA, respectively. The differences between the Mie
series solution and the numerical results are shown in Figure 8.13.

Figure 8.12 Normalized RCS of a PEC sphere.



Figure 8.13 Differences between the Mie series solution and numerical results.

From the figures we can see that the numerical result based on RPSA agrees
well with the Mie series solution. The differences between the two results are
generally less than 0.01 dB.

8.8.5 Multiple Monostatic Scattering Analysis of an Airplane Model

In this section, we will show how RPSA is applied to reduce the number of
simulations for the multiple monostatic scattering analysis.
A generic fighter-size airplane model, the VFY218, is shown in Figure 8.14. It has
been widely used as a benchmark by the Electromagnetic Code Consortium
(EMCC) [36]. It is 15.5 m long from nose to tail, 4.1 m from top to bottom, and
8.9 m from one wing tip to the other.

Figure 8.14 Model of VFY218 airplane.



At 300 MHz, the VFY218 airplane is modeled by using 79,172 triangles with
an average edge length of 0.071 m. The model is closed; thus, the total number of
unknowns is 118,758 (1.5 times the number of triangles).
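The factor of 1.5 follows from a simple edge count on a closed triangular mesh: each triangle has three edges, and each edge is shared by exactly two triangles. The symbols N_tri and N_edge are used here only for illustration; the sphere figures are those quoted in Section 8.8.4.

```latex
N_{edge} = \frac{3\,N_{tri}}{2}

% VFY218 model at 300 MHz
N_{edge} = \frac{3 \times 79{,}172}{2} = 118{,}758

% PEC sphere of Section 8.8.4
N_{edge} = \frac{3 \times 72{,}982}{2} = 109{,}473
```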
To obtain the monostatic scattering pattern of this model, RPSA is employed
to compress the right sides. The step sizes for both azimuthal and elevation angles
are 1°. The revealed ranks are 1,511 and 1,456 for the v- and h-polarizations,
respectively. That means that only 1,511 and 1,456 independent simulations are
needed for the monostatic scattering analysis at all observation angles. The current
distributions at all observation angles (65,160 cases) can be reconstructed using the
1,511 or 1,456 eigen-current distributions for the v- or h-polarizations.
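The savings can be made concrete with a little arithmetic. The split of the 65,160 cases into 181 elevation angles and 360 azimuth angles is an assumption; it is simply the reading that is consistent with a 1° step in both angles.

```latex
% Number of observation angles (assumed: 181 elevation x 360 azimuth samples)
N_{observation} = 181 \times 360 = 65{,}160

% Reduction in the number of solves per polarization
\frac{65{,}160}{1{,}511} \approx 43.1 \quad (\text{v-polarization}),
\qquad
\frac{65{,}160}{1{,}456} \approx 44.8 \quad (\text{h-polarization})

% Both polarizations together
\frac{2 \times 65{,}160}{1{,}511 + 1{,}456} = \frac{130{,}320}{2{,}967} \approx 43.9
```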
The monostatic RCS patterns on a cut (θ = 90°, φ = 0°–360°) for different
polarizations are shown in Figure 8.15. The two curves associated with the cross-
polarizations are supposed to be the same theoretically due to the reciprocity, and
they do agree with each other very well.

(a) (b)
Figure 8.15 Monostatic RCS on a cut. (a) VV- and HH-polarizations; and (b) VH- and HV-
polarizations.

Figure 8.16 shows the current distribution on the target surface. The incident
elevation and azimuth angles are 0° and 45°, respectively. We can see that the
surface current is dominated by the physical optics in the lit region. In some local
areas, the current is disturbed due to the multiple reflections between different
parts.
We also calculate the RCSs on the same cut without applying the randomized
pseudo-skeleton approximation to the right-hand sides. The difference is shown in
Figure 8.17. Clearly we can see that the difference is negligible. Thus, the
application of RPSA to the right sides provides an efficient way to reduce the
simulation time of multiple monostatic scattering cases, while no accuracy is
sacrificed.

Figure 8.16 Current distribution on VFY218 surface.

Figure 8.17 Differences of the RCSs on the same cut.

8.8.6 Speed-Up of the Parallel Implementation

To test the performance of the parallelized code, we run the code with different
numbers of threads. Its speed-up and efficiency as a function of the number of
threads are illustrated in Figure 8.18.

It is interesting to notice that the performance is better than the ideal case.
This phenomenon is called super-linear speed-up, which is due to the fact that
more cache memory (faster than normal memory) is available.
We can also see that the efficiency appears to saturate around 50%. This is
due to the block LU decomposition, where some threads have to be idle while
waiting for the results from the other threads.
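For reference, the two quantities plotted in Figure 8.18 are commonly defined as follows. The symbols T(1), T(p), S(p), and E(p) are introduced here for illustration and are not taken from the original text.

```latex
% T(1): run time with one thread;  T(p): run time with p threads
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}

% Ideal (linear) speed-up:  S(p) = p,  E(p) = 1
% Super-linear speed-up:    S(p) > p,  E(p) > 1
% An efficiency of 50% corresponds to S(p) = p/2
```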

(a)

(b)
Figure 8.18 Performance of the parallelized code based on OpenMP: (a) speed-up and (b) efficiency.

8.9 SUMMARY

The RPSA algorithm is very simple and efficient compared with the other low rank
approximation methods. Similar to the popular ACA algorithm, it is purely
algebraic. Therefore, its implementation is independent of the integral equation
kernel. Its computational complexity does not depend on the dimensions of the
rank-deficient matrix, but on its effective rank. In addition to CEM, it could also
benefit other communities where low rank decompositions are employed.
The RPSA algorithm can also be employed to reduce the number of
simulations for multiple monostatic applications. For a direct solver based on the
block LU decomposition, it can be employed to compress all of the nondiagonal
submatrices.
Numerical results have shown that the RPSA algorithm is promising and is
faster than the popular ACA algorithm.

REFERENCES

[1] R. Harrington, Field Computation by Moment Methods, New York: IEEE Press, 1993.
[2] X. Zhu and W. Lin, “Randomised Pseudo-Skeleton Approximation and Its Application in
Electromagnetics,” Electronics Letters, Vol. 47, No. 10, pp. 590–592, 2011.
[3] V. Rokhlin, “Rapid Solution of Integral Equations of Scattering Theory in Two Dimensions,” J.
Comput. Phys., Vol. 86, No. 2, pp. 414–439, 1990.
[4] N. Engheta, W. Murphy, V. Rokhlin, and M. Vassiliou, “The Fast Multipole Method FMM for
Electromagnetic Scattering Problems,” IEEE Trans. Antennas Propag., Vol. 40, No. 6, pp.
634–641, 1992.
[5] W. Chew, J. Jin, C. Lu, E. Michielssen, and J. Song, “Fast Solution Methods in
Electromagnetics,” IEEE Trans. Antennas Propag., Vol. 45, No. 3, pp. 533–543, 1997.
[6] N. Geng, A. Sullivan, and L. Carin, “Fast Multipole Method for Scattering from 3-D PEC
Targets Situated in a Half-Space Environment,” Microw. Opt. Tech. Lett., Vol. 21, No. 6, pp.
399–405, 1999.
[7] J. Song, C. Lu, and W. Chew, “Multilevel Fast-Multipole Algorithm for Electromagnetic
Scattering by Large Complex Objects,” IEEE Trans. Antennas Propag., Vol. 45, No. 10, pp.
1488–1493, 1997.
[8] N. Geng, A. Sullivan, and L. Carin, “Multilevel Fast-Multipole Algorithm for Scattering from
Conducting Targets above or Embedded in a Lossy Half Space,” IEEE Trans. Geoscience and
Remote Sensing, Vol. 38, No. 4, pp. 1561–1573, 2000.
[9] G. Beylkin, R. Coifman, and V. Rokhlin, “Fast Wavelet Transforms and Numerical Algorithms
I,” Comm. Pure Appl. Math., Vol. 44, No. 2, pp. 141–183, 1991.
[10] B. Alpert, G. Beylkin, R. Coifman, and V. Rokhlin, “Wavelet-Like Bases for the Fast Solutions
of Second-Kind Integral Equations,” SIAM J. Sci. Comput., Vol. 14, No. 1, pp. 159–184, 1993.
[11] S. Kapur and D. Long, “IES3: Efficient Electrostatic and Electromagnetic Solution,” IEEE
Comput. Sci. Eng., Vol. 5, No. 4, pp. 60–67, 1998.
[12] S. Seo and J. Lee, “A Single-Level Low Rank IE-QR Algorithm for PEC Scattering Problems
Using EFIE Formulation,” IEEE Trans. Antennas Propag., Vol. 52, No. 8, pp. 2141–2146, 2004.
[13] R. Burkholder and J. Lee, “Fast Dual-MGS Block-Factorization Algorithm for Dense MoM
Matrices,” IEEE Trans. Antennas Propag., Vol. 52, No. 7, pp. 1693–1699, 2004.
[14] D. Gope and V. Jandhyala, “Efficient Solution of EFIE via Low-Rank Compression of
Multilevel Predetermined Interactions,” IEEE Trans. Antennas Propag., Vol. 53, No. 10, pp.
3324–3333, 2005.
[15] M. Bebendorf, “Approximation of Boundary Element Matrices,” Numer. Math., Vol. 86, No. 4,
pp. 565–589, 2000.
[16] K. Zhao, M. Vouvakis, and J. Lee, “The Adaptive Cross Approximation Algorithm for
Accelerated Method of Moments Computations of EMC Problems,” IEEE Trans. Electromagn.
Compat., Vol. 47, No. 4, pp. 763–773, 2005.
[17] J. Shaeffer, “Direct Solve of Electrically Large Integral Equations for Problem Sizes to 1M
Unknowns,” IEEE Trans. Antennas Propag., Vol. 56, No. 8, pp. 2306–2313, 2008.
[18] J. Tamayo, A. Heldring, and J. Rius, “Multilevel Adaptive Cross Approximation,” IEEE Trans.
Antennas Propag., Vol. 59, No. 12, pp. 4600–4608, 2011.
[19] A. Heldring, J. Tamayo, C. Simon, E. Ubeda, and J. Rius, “Sparsified Adaptive Cross
Approximation Algorithm for Accelerated Method of Moments Computations,” IEEE Trans.
Antennas Propag., Vol. 61, No. 1, pp. 240–246, 2013.
[20] A. Peterson, S. Ray, and R. Mittra, Computational Methods for Electromagnetics, New York:
IEEE Press, 1998.
[21] S. Rao, D. Wilton, and A. Glisson, “Electromagnetic Scattering by Surface of Arbitrary Shape,”
IEEE Trans. Antennas Propag., Vol. 30, No. 3, pp. 409–418, 1982.
[22] D. Meagher, “Octree Encoding: A New Technique for the Representation, Manipulation and
Display of Arbitrary 3-D Objects by Computer,” Rensselaer Polytechnic Institute Technical
Report IPL-TR-80-111.
[23] H. Eberhardt, V. Klumpp, and U. Hanebeck, “Density Trees for Efficient Nonlinear State
Estimation,” Proceedings of the 13th International Conference on Information Fusion,
Edinburgh, United Kingdom, July 2010.
[24] A. Heldring, J. Rius, J. Tamayo, J. Parron, and E. Ubeda, “Multiscale Compressed Block
Decomposition for Fast Direct Solution of Method of Moments Linear System,” IEEE Trans.
Antennas Propag., Vol. 59, No. 2, pp. 526–536, 2011.
[25] G. Golub and C. Van Loan, Matrix Computations, Baltimore, MD: The Johns Hopkins
University Press, 1996.
[26] E. Liberty, F. Woolfe, P. Martinsson, V. Rokhlin, and M. Tygert, “Randomized Algorithms for
the Low-Rank Approximation of Matrices,” PNAS, Vol. 104, No. 51, pp. 20167–20172, 2007.
[27] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, “A Fast Randomized Algorithm for the
Approximation of Matrices,” Dept. of Computer Science, Yale University, Technical Report
1386, 2007.
[28] E. Michielssen and A. Boag, “A Multilevel Matrix Decomposition Algorithm for Analyzing
Scattering from Large Structures,” IEEE Trans. Antennas Propag., Vol. 44, No. 8, pp.
1086–1093, 1996.
[29] R. Pierri and F. Soldovieri, “On the Information Content of the Radiated Fields in the Near Zone
over Bounded Domains,” Inverse Problems, Vol. 14, pp. 321–337, 1998.
[30] R. Piestun and D. Miller, “Electromagnetic degrees of freedom of an optical system,” J. Opt. Soc.
Am. A, Vol. 17, No. 5, pp. 892–902, 2000.
[31] A. Heldring, J. Tamayo, and J. Rius, “On the Degrees of Freedom in the Interaction Between
Sets of Elementary Scatterers,” 3rd European Conference on Antennas and Propagation, Berlin,
Germany, 2009.
[32] J. Dongarra and F. Sullivan. “Guest editors’ introduction to the top 10 algorithms,” Computing in
Science and Engineering, Vol. 2, No. 1, pp. 22–23, 2000.
[33] en.wikipedia.org/wiki/OpenMP.
[34] en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms.
[35] R. Mautz, “Mie Series Solution for a Sphere (Computer Program Descriptions),” IEEE
Transactions on Microwave Theory and Techniques, Vol. 26, No. 5, p. 375, 1978.
[36] J. Kirklabnd, “The Electromagnetic Code Consortium,” IEEE Antennas and Propagation Society
International Symposium, Chicago, IL, 1992.
Chapter 9
Computational Electromagnetics for the
Evaluation of EMC Issues in Multicomponent
Energy Systems
Osama A. Mohammed and Mohammadreza R. Barzegaran

This chapter reviews physics-based modeling of the electromagnetic stray
fields and interference in the electric power network. Low-frequency as well as
high-frequency equivalent source models of the power components are
implemented for the study of radiated and conducted electromagnetic
compatibility. The 3-D finite element analysis is applied with some modifications
to the solution method and to the meshing strategies for the simulation of
large-scale components. Moreover, the stray field of the components is utilized for
improving the control of the machine-drive system using the hardware-in-the-loop
method. Design optimization of components such as the power converter, based
on electromagnetic compatibility (EMC) compliance, is also applied. This is
achieved by coupling MATLAB with the 3-D finite element technique so that
numerical optimization techniques can be applied. The results are verified
experimentally.

9.1 INTRODUCTION

The compliance with the EMC standards is an increasingly important aspect in the
design of practical engineering systems. Consideration of EMC issues at the design
stage is necessary to ensure the functional safety and reliability of complex
modern products, which are increasingly reliant on electronic subsystems to
provide powering, communications, control, and monitoring functions that are
needed to provide enhanced levels of functionality of systems. Typical examples
include transportation vehicles (road, rail, sea, and air), manufacturing plants,
power generation and distribution, and communications. The opportunities for
using numerical simulation techniques to predict and analyze the system EMC
and related issues (e.g., human field exposure and installed antenna
performance) are therefore of considerable interest in many industries.

For efficient control and use of electric energy, electronics and power
electronics are increasingly used within electrical systems. Examples of such
technologies are solar and wind power conversion systems, electric vehicles,
variable speed drives, and energy-efficient lighting systems. These technologies
are also used in evolving smart grid applications. A basic performance aspect of
such modern electrical systems is EMC with respect to low-frequency
disturbances. For this reason, the importance of low-frequency EMC studies is
increasing considerably.
Power electronic technologies are also used in evolving machine-drive equipment
on platforms such as vessels and aircraft. The magnetic signature of such
equipment is observable at low frequencies in the local magnetic field, and
several military applications exploit it, including detection and classification
by sea mines followed by their detonation, and the detection and localization of
submarines. Because the sensitivity of EMF sensors and smart signal processing
keep improving, signature reduction is vital. Thus, the first goal is to
decrease the detection range by complying with strict signature requirements.
The other aspect of signature studies of the radiated fields at low frequencies
is condition monitoring of the components. Faults in the machine windings,
switch failures, and many other problems can be detected without dismantling the
system. This is especially beneficial in sensitive applications where it is
difficult to get near the components for online testing and where offline
testing of the component is costly.
The previous works of the investigation of the radiated fields in the power
system can be categorized into EMC studies in power systems, electromagnetic
computational modeling studies, electromagnetic signature studies, system
monitoring studies, and fault and failure diagnosis. The electromagnetic
computational modeling is the concern of this chapter.
The modeling process in the field of electromagnetic compatibility means to
establish a connection between the source of interference or any other cause and its
effect that can be the response of the component as the part of the system. This
relationship can be established in several ways, depending on the type of problem,
its complexity, and the degree of approximation with respect to an exact
formulation. The possible methods involve:

• Using circuit theory for designating the conducted disturbance, such as voltage
  dips, over-voltages, voltage stoppages, harmonics, and common ground
  coupling [1, 2];
• Using an equivalent model (usually circuit) with either distributed or lumped
  parameters, such as at low-frequency EMF coupling expressed in terms of
  mutual inductances and stray capacitances, field-to-line coupling using the
  transmission line approximation, and cable crosstalk [3, 4];
• Formulating the problem in terms of formal solutions to Maxwell’s equations
  and making analytical models based on that [5];
• Physics-based modeling using numerical methods such as FEM, FDTD, MoM,
  and so forth [6–8].

Generally, the methods used in EMC modeling, which can be either theoretical or
experimental, serve not only to visualize electromagnetic phenomena but also to
predict and suppress interference.
Here, the procedure of physics-based modeling for the EMC study is explained
first. Then equivalent source modeling is compared with full 3-D FE modeling.
Afterward, the EMC modeling of the converter for the purpose of optimizing the
performance of the power electronic component is described. Throughout these
sections, the techniques of physics-based modeling for special purposes are
discussed.

9.2 PHYSICS-BASED MODELING FOR THE ANALYSIS OF THE MACHINE DRIVE

9.2.1 Multiscale Problems

In applications such as radiated emissions or immunity of the system, the
cables, any ground loop current, and on-board antennas can be considered as a
complex multiport antenna that can be characterized by using electromagnetic
modeling techniques. For building an EMC model, however, it is necessary to
consider a range of modeling techniques as outlined in Table 9.1, operating at a
number of different levels. The clearest requirement for combining models of
different types is the integration of circuit behavior (type A3 models) with the
electromagnetic performance of the installation (a type A1 model). However,
difficulties in the prediction of electromagnetic interference (EMI) still
exist. As the number of components within a device increases, the complexity of
modeling the mutual coupling increases exponentially, resulting in a task that
is impractical even with today’s most powerful computers. Therefore, in some
cases a combination of the type A2 and A3 models may be useful for
computationally expensive calculations. Moreover, the complexity of parameter
extraction can be reduced by modeling only the major EMI-related components and
circuits. Ideally, such a simplification requires some specialist knowledge of
the electrical behavior and basic EMI characteristics of the device. For
example, a distinction between the power handling and logic circuits in a motor
drive justifies concentrating on the salient EMI-related circuitry and
components, thus considerably reducing the effort of parameter extraction.
Because the multiscale problem calls for multilevel numerical modeling, and
because the numerical test environment must be simple and quick to recreate,
each of the components and subcomponents needs to be modeled separately.
Therefore, it is beneficial to divide the problem into three levels, as shown in
Figure 9.1.

Table 9.1
Classification of the Numerical Level Necessary to Predict the Different Performance Measures
in the Virtual Test Environment

Model | Leveling | Model Character | Objectives
------|----------|-----------------|-----------
A1 | 3-D in the time or frequency domain | Electromagnetic (surface or volume meshing) | 3-D EMF distribution and related parameters
A2 | 2-D or 3-D quasi-static analysis in the time harmonic domain | Quasi-static, electromagnetic (surface, volume, or peripheral meshing) | Calculation of low- to high-frequency RLC elements of each of the components, frames, and frame to outer chamber
A3 | 0-D in the time or frequency domain | Lumped element circuit (discrete mathematical models) | Physics-based circuit model (circuit simulation environment)

[Figure 9.1 diagram, three columns:
Device level: electrical machines; speed controllers; IGBT modules; inverter and
converter; integrated circuits; internal cables; enclosures; power bus; AC and
DC choke; EMI filter.
Interface level: cable harness between components; capacitive current paths
between enclosures and between each enclosure and the floating ground;
decoupling capacitors.
Enclosure level: outer enclosure with air and each of the enclosures inside.]

Figure 9.1 Decomposition of the modeling problem for creation of numerical test environment.

The device level consists of the physical models of all components, calculated
from a 2-D or 3-D quasi-static electromagnetic FE analysis. At the device level,
the component models can be divided into several subsystems based on their power
range, their location inside the components, their degree of importance with
respect to EMC and EMI issues, their forced outage rate, and the related fault
diagnosis issues. The interface level consists of any resistive, capacitive, or
inductive paths between the enclosures of the components and the additional
decoupling capacitors that are used to reduce the area of the ground current
loop and also to cut the current path and prevent it from entering the control
units.
The environmental level consists of the physical model of the chamber filled
with air and the enclosure model of each of the components placed in it.

9.2.2 Numerical Virtual Prototyping

As previously explained, the modeling problem can be decomposed into three
different levels (i.e., the device level, the interface level, and the
environmental level). At each level, sets of numerical analyses are required
that enable the designer to predict the performance measures of the device under
development in either fault or no-fault situations. Figure 9.2 illustrates the
modeling procedure required at each level to provide the designer with a
performance measure. Figure 9.3 describes the main origins of EMI at low and
high frequencies.

[Figure 9.2 flowchart, three columns:
1. Device level:
   - Low-frequency physics-based modeling of each component, including power
     bus, machines, switches, DC choke, EMI filters, etc., including the
     enclosure, using a quasi-static FE solution
   - Creation of the high-frequency equivalent circuit of each element of the
     component
   - Connection of the elements and creation of each active or passive component
     separately
   - Connection of all of the components and creation of model 1
   - Simulation of the equivalent circuit of each part in Simulink
   - Creation of each part in FE-based software for simulation of
     electromagnetic field propagation
2. Interface level:
   - Low- and high-frequency physics-based modeling of the enclosures while they
     are placed in the chamber, using a quasi-static FE solution
   - Creation of the equivalent circuit of the enclosures, including self and
     mutual inductances between enclosures, mutual capacitance, and floating
     ground capacitance (model 2)
   - Combination of models 1 and 2
   - Simulation of the global equivalent circuit in Simulink and extraction of
     the current to ground
3. Environment level:
   - Creation of the chamber with a black box model of the enclosures in
     FE-based software for simulation of electromagnetic field propagation
   - Excitation of the black box model with simulated ground currents from
     Simulink
   - Signature study
   - Excitation of the finite element model with the current extracted from
     Simulink
   - Fault diagnosis and prognostic studies]

Figure 9.2 Functional model of the numerical test environment.

A schematic view of a complete motor drive system [8, 9] is shown in Figure
9.4. In this system, the motor component, the IGBT module component, the direct
current (DC) bus component, the cable component, and the logic components
constitute the main part of the system. Each of the components is enclosed in an
enclosure, and all of the enclosures are placed inside a chamber that is
electromagnetically isolated from the outer environment.

[Figure 9.3 diagram:
Radiated low-frequency (LF) interference (up to about 10 kHz): LF electric
fields, radiated from circuits with a high dv/dt; LF magnetic fields, radiated
from circuits with a high di/dt.
Conducted low-frequency (LF) interference (up to about 10 kHz): voltage
unbalance; DC in AC circuits and vice versa; voltage dips and power
interruptions; power frequency variation; voltage sags and swells; harmonics in
AC networks; coupled LF voltages and currents.
Conducted high-frequency (HF) interference (from 10 kHz to 1 GHz): transient
over-voltages due to lightning or switching; oscillating transients due to
resonance; coupled HF voltages and currents.
Radiated high-frequency (HF) interference (from 10 kHz to 1 GHz): HF electric
fields, radiated from circuits with a high dv/dt; HF magnetic fields, radiated
from circuits with a high di/dt.]

Figure 9.3 Main origins of EMI at low and high frequency.

Figure 9.4 A visualized view of the numerical test environment for machine-drive design.

For preparation of the numerical test environment for a motor-drive schema,
the following procedure is implemented to evaluate EMC and EMI in the
environment in which the motor drive is located. In brief, the step-by-step
implementation of the modeling procedure is summarized as:

• A coupled field-circuit 3-D FE electromagnetic and electrostatic analysis is
  performed to calculate the low- to high-frequency model of each of the
  components. For the motor, this is done for both the differential mode and
  the common mode. This stage uses the 3-D FE analysis for a given full and
  equivalent configuration, including layout and ground schemes. At this stage,
  the low- to high-frequency current paths inside the structure of each of the
  components are also estimated.
• The multiple current paths between the enclosures of the components, and
  between each of the enclosures and the outer chamber, including physically
  grounded paths and high-frequency capacitive paths, are estimated. Here the
  proximity and skin effects are ignored while the static capacitances are
  calculated, although the geometry and material effects are taken into account.
  In the cable component, the enclosure is assumed to be the grounded shield of
  the cable.
• The whole motor-drive circuit, including all the high-frequency parasitic
  parameters, is simulated. The ground current is extracted from all of the
  high-frequency ground paths and the physically grounded paths to the floating
  ground point.
• The estimated current path inside each of the components is implemented via
  line wires in 3-D FE software, and the field radiated to the surrounding
  environment is evaluated. The simulated line and line-to-ground currents from
  the circuit are modeled as current sources that inject the current into the
  corresponding line wires representing the current path model. At this stage,
  the FE analysis is limited to the component and its enclosure.
• The estimated current distribution paths are implemented via line wires inside
  the outer chamber together with the enclosure models in 3-D FE software, and
  the field radiated to the surrounding environment is evaluated. The simulated
  ground currents from the circuit model are treated as current sources that
  inject the proper current into the corresponding line wires representing the
  current path models.
• The uncertain and stochastic impact of noise currents induced by fields can be
  analyzed by placing a noise current source inside the model.
Through this type of modeling, the complexity of the physics-based modeling
is split into the three major subproblems shown.
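
As a rough illustration of the ground-current extraction step in the procedure above, the displacement current through one high-frequency capacitive path can be approximated by i = C dv/dt. The following Python sketch sums this contribution over several paths. The path names, capacitance values, and switching edge are hypothetical and are not taken from the chapter, where the extraction is carried out in the circuit simulation.

```python
def capacitive_ground_current(c_parasitic, dv, dt):
    """Displacement current i = C * dv/dt through one parasitic capacitance (A)."""
    return c_parasitic * dv / dt


def total_ground_current(paths, dv, dt):
    """Sum the currents of several capacitive paths driven by the same voltage edge."""
    return sum(capacitive_ground_current(c, dv, dt) for c in paths.values())


# Hypothetical parasitic capacitances to the floating ground point (F).
paths = {"motor_frame": 2e-9, "cable_shield": 500e-12, "heatsink": 100e-12}

# Hypothetical switching edge: 300 V in 200 ns.
i_peak = total_ground_current(paths, dv=300.0, dt=200e-9)
print(f"peak capacitive ground current: {i_peak:.2f} A")
```

In a full model the resulting current waveform, rather than a single peak value, would be injected into the line wires that represent the ground paths.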

9.3 EQUIVALENT SOURCE MODELING

In this section, the electromagnetic signature of electrical components is
estimated by evaluating the fields at a distance from their sources. A numerical
3-D model is developed and used for this purpose. Estimating these radiated
fields from electrical components requires significant computational time,
especially for cases involving multiple components such as generators and
motors, power converters, and cable runs (see Figure 9.5) [10]. Therefore,
equivalent source modeling is proposed.

(a) (b)
Figure 9.5 Prototype of the proposed machine (SCIM) in finite element analysis. (a) Actual model
and (b) an equivalent line-shape model for EMI and signature studies. ©IEEE 2011 [10].

Models based on optimization, such as the voltage-current rectangular prism
model [8], reproduce the magnetic and electric stray fields with good accuracy.
However, because these models are obtained from optimization processes, each
model is applicable only to one particular voltage or specific amount of power.
If the size of the component changes, the model must be optimized again to find
new branch parameters and new dimensions. Therefore, a new model is proposed,
based on the structure and operation of the components, that is applicable to
all parameter sets without needing to be modeled or simulated again.
Consequently, the principle of the model is explained in the next section. Then,
modeling strategies of each component with the simulation results are
demonstrated. Afterward, the whole setup is modeled and verified experimentally.
Finally, the procedure of generalization of the model is explained in detail along
with the simulation results.
The path and direction of the currents in the electric machine winding, as well
as the value of the current density, play a very important role in establishing
magnetic stray fields at far distances. The magnetic field waveform at a far
distance is influenced by the direction of the wires in the winding
arrangement. Therefore, a line-shape model is proposed instead of the actual
model, and a current derived from the current density of the actual machine is
applied to the lines. The actual machine (squirrel cage induction motor, SCIM)
and the proposed model are shown in Figure 9.5.
The equivalent source model of the system is designed and created based on
the current directions. The path of the winding arrangement for the machine and
other components, including the position of the voltage terminals, should be
identified. As shown in Figure 9.5(b), the equivalent source model consists of
numerous lines carrying specific currents, with voltages established at the
nodes of these wires. The currents of the equivalent source model are calculated
by equalizing the magnetic field densities. Using the Biot-Savart law, the
radiated magnetic field density of a line at a distance R from the line is as
follows:

$$\mathbf{B}_l = \frac{\mu_0}{4\pi} \int_l \frac{I_l \, d\mathbf{l} \times \hat{a}_R}{R^2} \tag{9.1}$$

where $l$ is the length of the line, $I_l$ is the current carried by the line,
$\mu_0$ is the permeability of free space, and $\hat{a}_R$ is the unit vector
pointing from $dl$ toward the observation point. Similarly, for a volume
current, the radiated magnetic field density at a distance R is as follows:

$$\mathbf{B}_v = \frac{\mu_0}{4\pi} \int_v \frac{\mathbf{J} \, dv \times \hat{a}_R}{R^2} \tag{9.2}$$

The idea of this model is to produce the same field although the model is a line
and has no cross-section. Hence, by equalizing (9.1) and (9.2) and considering
J, R, ds, and dl as known parameters, Il, the current amplitude of the line, can
be calculated. The voltages of the nodes are calculated similarly, by equalizing
the electric fields due to the charge distributions of the line and of the
volume. Each component has parameters specific to this modeling, which are
explained in the corresponding section. More details about the basics of the
model are given in [8].
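
The equalization of (9.1) and (9.2) can be checked numerically on a simple case. With dv = ds dl and a uniform current density, matching the integrands suggests Il = J times the cross-sectional area. The Python sketch below compares the far field of a single line carrying that current with the field of a straight bar split into filaments. The bar geometry, the current density, and the function names are hypothetical choices for illustration, not the chapter's machine model.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)


def b_filament(x0, y0, length, current, obs, n=400):
    """Flux density (T) at obs from a z-directed filament centred at (x0, y0, 0).

    Midpoint-rule evaluation of the Biot-Savart line integral (9.1),
    using a_R / R^2 = r / |r|^3.
    """
    dz = length / n
    bx = by = bz = 0.0
    for k in range(n):
        z = -length / 2 + (k + 0.5) * dz
        rx, ry, rz = obs[0] - x0, obs[1] - y0, obs[2] - z
        dist3 = (rx * rx + ry * ry + rz * rz) ** 1.5
        scale = MU0 * current / (4 * math.pi * dist3)
        # dl x r with dl = (0, 0, dz); the z-component is identically zero
        bx += scale * (-dz * ry)
        by += scale * (dz * rx)
    return (bx, by, bz)


def b_bar(width, length, j, obs, m=5):
    """Square bar (width x width) with uniform J along z, split into m*m filaments.

    Each filament carries J*ds, a discrete version of the volume integral (9.2).
    """
    step = width / m
    ds = step * step
    total = [0.0, 0.0, 0.0]
    for i in range(m):
        for q in range(m):
            x0 = -width / 2 + (i + 0.5) * step
            y0 = -width / 2 + (q + 0.5) * step
            b = b_filament(x0, y0, length, j * ds, obs)
            total = [t + c for t, c in zip(total, b)]
    return tuple(total)


# Hypothetical bar: 2 cm x 2 cm cross-section, 0.15 m long, J = 12.5 kA/m^2.
WIDTH, LENGTH, J = 0.02, 0.15, 12500.0
I_LINE = J * WIDTH ** 2          # equivalent line current, Il = J * S
OBS = (2.0, 0.0, 0.0)            # observation point 2 m away on the x-axis

b_line = b_filament(0.0, 0.0, LENGTH, I_LINE, OBS)
b_vol = b_bar(WIDTH, LENGTH, J, OBS)
print("line model By:", b_line[1], "T")
print("bar model  By:", b_vol[1], "T")
```

At this distance the two results differ only through terms of order (width/R)^2, which is why a line without cross-section can stand in for the conductor in far-field studies.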
A typical power setup consists of electrical generators, such as a synchronous
generator, electrical motors, such as induction and DC motors, connection
cables, and power converters. All of these components except the converter are
modeled using the equivalent source model, and some of them are verified
experimentally. The converter requires special consideration, and its modeling
for stray field analysis is explained in the next section (Section 9.4). In
addition to the study of each component, their coupling is also studied.
Finally, the whole setup is investigated.

9.3.1 Induction Motor

9.3.1.1 The Modeling Concept

Before investigating the complete equivalent source model, the combination of
the equivalent source model for the magnetic field and a cube model for the
electric field is investigated and introduced in this part. The procedure for
designing the equivalent source model is explained in Section 9.3.2. Since the
modeling is based on the electric current, it gives good results only for the
magnetic field and does not give meaningful results for the electric field. To
reproduce the electric field of the actual machine at a far distance, a cube
model is proposed, which is shown in Figure 9.6. The details of constructing
this cube are explained in [9].
As shown in Figure 9.6, eight voltages are applied to the nodes at the corners
of the cube. The values of these voltages are based on the maximum electric
potential difference and on the electric displacement field in the winding.

[Figure 9.6 drawing: cube with node voltages V1 to V8 at its corners and edge lengths A, B, and C.]

Figure 9.6 Prototype of the proposed cube model for replicating the electric field of the actual
machine. ©IEEE 2011 [10].

Consequently, to have both the magnetic and electric field of the model
simultaneously, these two models [Figure 9.5(b) and Figure 9.6] are combined
together. The combined model propagates similar electric and magnetic fields at
far distances. This model is shown in Figure 9.7.
For simulation purposes, a three-phase, 380 V, 5 A, 120 turn/phase induction
machine with a stack length of 0.15 m and an outer diameter of 0.175 m is
simulated in the 3-D electromagnetic FE domain for a specific time. The meshed
final model is shown in Figure 9.8. The number of degrees of freedom of the
source model is made as large as possible in order to obtain accurate results
for the propagation in the measured areas. In addition, an appropriate element
growth rate is applied to the model, and the tolerance of the analysis is set to
1e-6.

Figure 9.7 The equivalent cylinder-cube model for reproducing radiated electric and magnetic
fields of the actual machine. ©IEEE 2011 [10].

Using the optimization process, the cube lengths are calculated as A =
0.1009 m, B = 0.125 m, and C = 0.1282 m. Moreover, as mentioned earlier, the
size of the cylinder is based on the size of the actual machine. Although the
analysis is implemented for a typical machine, this equivalent model can be used
for similar types of machines (induction machines) with slight modification, for
example, of the size of the model. This modification in size can be based on the
ratio of the size of any actual machine to that of the basic machine model,
which can be the machine studied in this section. Other parameters such as the
voltage and current values can be considered as well. Although optimization is
used here to design the cube model so that it gives the correct electric field,
a new model that eliminates the cube and the optimization process is explained
in Section 9.3.3.
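
The size modification described above can be sketched as a simple scaling of the cube lengths. The Python example below assumes that all three lengths scale linearly with the ratio of outer diameters. That single-ratio rule and the function name are illustrative assumptions, since the text only states that the modification can be based on the size ratio.

```python
# Base machine studied in this section (values from the text).
BASE_OUTER_DIAMETER = 0.175                        # m
BASE_CUBE = {"A": 0.1009, "B": 0.125, "C": 0.1282}  # m


def scale_cube(target_outer_diameter, base_cube=BASE_CUBE,
               base_diameter=BASE_OUTER_DIAMETER):
    """Scale the cube edge lengths by the ratio of outer diameters.

    Assumption: one linear ratio is applied to all three edges.
    """
    if target_outer_diameter <= 0:
        raise ValueError("outer diameter must be positive")
    ratio = target_outer_diameter / base_diameter
    return {name: length * ratio for name, length in base_cube.items()}


# Hypothetical machine with twice the outer diameter of the base machine.
print(scale_cube(0.35))
```

The voltages at the cube nodes and the line currents would still have to be set from the ratings of the new machine.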

Figure 9.8 Mesh of the equivalent model. ©IEEE 2011 [10].



9.3.1.2 Simulation Results

To verify the accuracy of the model, the electric and magnetic fields propagated
from the proposed model and from the actual machine are calculated along three
lines in the x-, y-, and z-directions at a far distance and compared. The
positions of the reference lines along which the propagated electric and
magnetic fields are measured are shown in Figure 9.9(a). As shown in Figures
9.9(b) and 9.10, the model propagates electric and magnetic fields similar to
those of the actual model at far distances. Because the radiated electric fields
along the x and y lines nearly coincide, a magnified view depicting details is
shown in Figure 9.10.

[Figure 9.9 graphics: (a) reference lines in the x-, y-, and z-directions; (b) magnetic field density (T, scale 10^-6) versus coordinate (m) from -2 to 2, with actual and model curves for the x-, y-, and z-axes.]
Figure 9.9 (a) Reference lines from which propagated electric and magnetic fields are measured;
and (b) propagated magnetic field from the actual and proposed model in all three axes.
©IEEE 2011 [10].

[Figure 9.10 plot: electric field (V/m, scale 10^-4) versus coordinate (m) from -2 to 2, with actual and model curves for the x-, y-, and z-axes.]

Figure 9.10 Propagated electric field from the actual and proposed model in all three axes. ©IEEE
2011 [10].

As shown in Figure 9.9(b), the radiated magnetic flux densities along different
axes differ between planes and even within the same plane. For example, the
magnetic flux density measured along the z-axis has a lower amplitude than the
one along the y-axis. Because the shaft of the motor and the windings lie along
the z-axis, the current flows mostly along that axis and does not change
significantly along it. Consequently, the field does not change considerably
along the axis.
Following the investigation of the far fields, the fields at closer distances to
the model were checked. The propagated magnetic field along the x-axis, obtained
one meter closer to the model, is shown in Figure 9.11. The difference between
the propagated magnetic field densities in the x-axis direction shown in Figures
9.9 and 9.11 is due to the butterfly effect of the magnetic field density at
farther distances [11]. Furthermore, Figure 9.11 shows some ripples in the
fields. These ripples are related to the high-frequency response. Since the
result of the equivalent model follows the actual one in the rippled parts, it
can be inferred that the model can also be used in high-frequency analysis.
Figures 9.9(b) to 9.11 show the fields at one line in the planes, which are
shown in Figure 9.9(a). This analysis is essential but not sufficient for investigating
the accuracy, because one may think that the fields may resemble each other just in
the center line due to symmetry. For further investigation and to assure that the
fields propagated from the actual machine and equivalent models resemble each
other at all positions in the study area, the propagated magnetic and electric field
spectrums throughout the whole x-y plane are obtained and shown in Figure 9.12.
Comparing Figures 9.12(a) and 9.12(b), it can be seen that, not only do the
wave shapes of the magnetic field density of the two models match, but also their
amplitude is almost the same in all points of the plane. Also, the electric fields of
both models, which are shown in Figures 9.12(c) and 9.12(d), are the same at
almost all points. This is also valid for all other planes around the model. In
conclusion, it is verified that the equivalent model can replace the actual model for
the signature study analysis of one case machine.
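
One possible way to put a number on "almost the same" is a relative root-mean-square error between the field samples of the actual machine and those of the equivalent model along a reference line or over a plane. The Python sketch below is only an illustrative metric with made-up sample values. The chapter compares the fields graphically and does not state which error measure, if any, was used.

```python
import math


def relative_rms_error(actual, model):
    """Relative RMS error between two equally sampled field profiles."""
    if len(actual) != len(model) or not actual:
        raise ValueError("profiles must be non-empty and of equal length")
    energy = sum(a * a for a in actual)
    if energy == 0.0:
        raise ValueError("reference profile is identically zero")
    residual = sum((a - m) ** 2 for a, m in zip(actual, model))
    return math.sqrt(residual / energy)


# Hypothetical flux density samples (T) along one reference line.
b_actual = [0.20e-6, 0.45e-6, 1.10e-6, 0.47e-6, 0.21e-6]
b_model = [0.21e-6, 0.44e-6, 1.05e-6, 0.46e-6, 0.20e-6]
print(f"relative RMS error: {relative_rms_error(b_actual, b_model):.3%}")
```

Applying the same measure to every plane around the model would turn the visual comparison into a single figure per plane.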

[Figure 9.11 plot: magnetic field density (T, scale 10^-6) versus coordinate (m) from -2 to 2, with actual and model curves for the x-axis.]

Figure 9.11 Radiated magnetic field from the actual and proposed models in the x-axis at 1-m
distance to the models. ©IEEE 2011 [10].

Since the final goal of this research is to use this model in multicomponent
systems, the model is also studied for the two-motor case. This serves as a
further validation of the cylinder-cube model obtained from the one-motor case:
the model is inserted twice to represent a multimachine drive, while the
currents in the branches of the cylinder and the voltages at the nodes of the
cube remain the same as in the single-machine case. The centers of the
coordinates of the two cubes and cylinders are exactly the same as those of the
actual machine models. Figures 9.13 and 9.14 compare the magnetic and electric
fields propagated from the actual and proposed models for the two-motor case.
Note that the planes and lines used for measuring the fields are the same as in
the single-machine case [see Figure 9.9(a)]. As can be seen, the magnetic and
electric fields follow the same patterns with good accuracy, as in the
single-machine case.
The shift in the electric field signatures measured along the z-axis (see Figure
9.14) is caused by the size of the equivalent model. As discussed before, an
optimization method can be used for fitting the size of the model. If the
parameters of the optimization are varied, for example, if the mutation factor
is increased, this shift decreases. The same holds for the magnetic field.

(a) (b)

(c) (d)
Figure 9.12 Magnetic and electric field spectrums throughout the x-y plane propagated from the
actual machine and the proposed model. ©IEEE 2011 [10].

[Figure 9.13 plot: magnetic field density (T, scale 10^-6) versus coordinate (m) from -2 to 2, with actual and model curves for the x-, y-, and z-axes.]

Figure 9.13 Propagated magnetic field from the actual and proposed models for two motors in three
axes. ©IEEE 2011 [10].

[Figure 9.14 plot: electric field (V/m) versus coordinate (m) from -2 to 2, with actual and model curves for the x-, y-, and z-axes.]

Figure 9.14 Propagated electric field from the actual and proposed models for two motors in all
three axes. ©IEEE 2011 [10].

One of the achievements of this research is the reduction in simulation time.
According to Table 9.2, the equivalent model runs more than 200 times faster
than the full 3-D FE model (100 s versus 6 hours for one machine, and 160 s
versus 11 hours for two machines).

Table 9.2
Comparison of Computation Time for the Actual and Equivalent Models

Case of Studies  | One-Machine Case | Two-Machine Case
-----------------|------------------|-----------------
3-D FE model     | 6 hours          | 11 hours
Equivalent model | 100 s            | 160 s
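
For reference, the speedup implied by the entries of Table 9.2 can be worked out directly after converting hours to seconds:

```latex
\begin{aligned}
\text{One machine:}\quad & \frac{6\ \text{h}}{100\ \text{s}} = \frac{21\,600\ \text{s}}{100\ \text{s}} = 216, \\
\text{Two machines:}\quad & \frac{11\ \text{h}}{160\ \text{s}} = \frac{39\,600\ \text{s}}{160\ \text{s}} = 247.5 .
\end{aligned}
```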

In addition to the accuracy of the field spectrums, the direction of the fields
is significant, because the spatial field plots in Figures 9.12(a)–(d) do not
show the direction of the fields. Hence, the arrow plots of the magnetic field
density of the actual machine and of the equivalent model are compared in
Figure 9.15. As displayed in this figure, the magnetic field around the motor is
denser in the actual case than in the equivalent source model case. However, the
radiated magnetic field at farther distances is almost the same in the two
models. This is more important because the model is designed for far distances.
While the arrow plot in Figure 9.15 shows the magnetic field density, it can be
regarded as a discrete streamline of this field. A continuous streamline of the
H-field (magnetic field intensity) of the two models (actual and equivalent
source models) is shown in Figure 9.16. The H-field streamline also shows that
the equivalent source model gives results very similar to those of the actual
model. It also shows that dipoles are established around the equivalent source
model at near distances as well. It should be noted that the purpose of this
model is to obtain matching fields at far distances.

(a) (b)
Figure 9.15 Arrow plot of magnetic field density of (a) actual machine and (b) equivalent model in
the x-y plane.

(a) (b)
Figure 9.16 Stream-line of H-field of (a) actual machine model in the x-y plane (A/m) and (b)
equivalent source model in the x-y plane (A/m).

9.3.1.3 Frequency Response Analysis – Multiresolution Analysis


Frequency analysis helps extract waveform information that is not readily
available in the time domain. It also determines how well the field waveforms
from the real and equivalent models compare at all of the frequencies within a
certain bandwidth. Multiresolution analysis (MRA) offers a way to analyze
nonstationary waveforms bounded in both frequency and time duration.
This method breaks up the signal into hierarchical levels of different
resolutions, which are matched to different frequency bands as shown in Figure
9.17(b) [12, 13]. The MRA consists of two successive processes:
decomposition and reconstruction. In the orthogonal wavelet decomposition
procedure, the decomposition starts with the original signal and is repeated
on the consecutive approximations, producing coefficient vectors of progressively
lower resolution, as shown schematically in Figure 9.17(a). The mathematical
operation employed in the decomposition step is the discrete wavelet transform
(DWT). As shown in Figure 9.17(a), lowpass and highpass decomposition filters
(L and H) and downsampling of the partial discharge (PD) signals are applied at
each level of the decomposition.
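The decomposition and reconstruction steps described above can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses the Haar wavelet, which is the simplest orthogonal wavelet and is not necessarily the one used in this study, and the function names (haar_dwt, haar_idwt, haar_mra, haar_reconstruct) are hypothetical. Each level applies a lowpass and a highpass filter followed by downsampling by two, and the reconstruction inverts the levels in reverse order.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumed wavelet)


def haar_dwt(signal):
    """One decomposition level: lowpass/highpass filtering plus downsampling by 2."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [SCALE * (a + b) for a, b in pairs]  # lowpass branch (L)
    detail = [SCALE * (a - b) for a, b in pairs]  # highpass branch (H)
    return approx, detail


def haar_idwt(approx, detail):
    """One reconstruction level: inverse of haar_dwt."""
    out = []
    for a, d in zip(approx, detail):
        out.append(SCALE * (a + d))
        out.append(SCALE * (a - d))
    return out


def haar_mra(signal, levels):
    """Repeat the decomposition on the successive approximations."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Rebuild the signal from the final approximation and all detail vectors."""
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx
```

In this sketch, any modification of the coefficients (for example, zeroing small detail coefficients for denoising) would be applied between haar_mra and haar_reconstruct.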

(a)

(b)
Figure 9.17 (a) Decomposition and reconstruction by MRA at two levels (L and H represent the low
and high pass reconstruction filters, respectively) and (b) decomposition and
reconstruction by bandwidth of subsignals.

The wavelet coefficient vectors may be modified before the reconstruction
procedure is commenced. Various modifications of the wavelet coefficient
vectors are in use across many applications; denoising and compression
are the best known among them. All of the detail coefficient vectors
and only the final approximation coefficient vector at level G are used to
reconstruct the original signal, after modifications such as denoising and
compression, if required. The low and high pass reconstruction filters in Figure
9.17(a) are indicated by L and H, respectively. The reconstruction procedure
originates from the inverse discrete wavelet transform (IDWT) concept. The
coefficient vectors recovered with an equal number of
reconstruction and decomposition levels are termed the reconstructed details and
approximation [12, 13]. Here, the Nyquist frequency is defined as half of the
sampling frequency. The sampling frequency is calculated by dividing the
displacement by the unit speed of 1 m/s.
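The relation between the sampling frequency and the frequency bands of the detail levels can be made explicit with a short calculation. In a dyadic decomposition, the detail vector at level j covers the band from the Nyquist frequency divided by 2^j up to the Nyquist frequency divided by 2^(j-1). The sketch below assumes a sampling frequency of 18.75 Hz, a value inferred here from the highest band edge (9.375 Hz) in the panel titles of Figures 9.18 and 9.19 and not stated in the text; the function name is hypothetical.

```python
def dwt_detail_bands(sampling_frequency_hz, levels):
    """Return (level, lower_hz, upper_hz) for each detail level of a dyadic DWT."""
    nyquist = sampling_frequency_hz / 2.0  # Nyquist = half the sampling frequency
    bands = []
    for level in range(1, levels + 1):
        upper = nyquist / 2 ** (level - 1)
        lower = nyquist / 2 ** level
        bands.append((level, lower, upper))
    return bands


# Assumed sampling frequency (inferred from the figure panel titles).
for level, lower, upper in dwt_detail_bands(18.75, 8):
    print(f"level {level}: {lower:.4f} Hz to {upper:.4f} Hz")
```

With eight levels, the lowest detail band obtained this way runs from about 0.0366 Hz to about 0.0732 Hz, which agrees with the lowest band shown in the figures.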

[Figure 9.18 plots: eight frequency-band panels per column, from 0.0366 to 0.0732 Hz, 0.0732 to 0.1465 Hz, 0.1465 to 0.293 Hz, 0.293 to 0.5859 Hz, 0.5859 to 1.1719 Hz, 1.1719 to 2.3438 Hz, 2.3438 to 4.687 Hz, and 4.68 to 9.375 Hz; horizontal axis: Coordinate (m), from -2 to 2; vertical axes: (a) Normal Magnetic Flux Density (nT) and (b) Normal Electric Field (µV/m); legend: Equivalent, Real machine.]

(a) (b)
Figure 9.18 Comparisons of (a) normal magnetic and (b) electric fields at different frequency bands,
equivalent and real machine, one-machine case of study.

Figure 9.18 shows the reconstructed magnetic and electric fields in the y-
direction at different frequency bands for one machine. An acceptable match
between the equivalent and real machines can be observed over almost the entire
frequency range.

Figure 9.19 reproduces Figure 9.18 for the two-machine case study.
As can be observed, there is an acceptable agreement between the equivalent and
real models. Moreover, a comparison of Figures 9.18 and 9.19 indicates that a linear
relationship between the one-machine and two-machine case studies exists in all
of the frequency bands. However, this relationship depends on the specific
geometrical arrangement of the two machines.

[Figure 9.19 plots: eight frequency-band panels per column, from 0.0366 to 0.0732 Hz up to 4.68 to 9.375 Hz; horizontal axis: Coordinate (m), from -2 to 2; vertical axes: (a) Normal Magnetic Flux Density (nT) and (b) Normal Electric Field (µV/m); legend: Equivalent, Real machine.]

(a) (b)
Figure 9.19 Comparisons of (a) normal magnetic and (b) electric field at different frequency bands,
equivalent and real machine, two-machine case of study.

9.3.1.4 Time and Rotation Study

Time-Based Analysis

Since the actual induction machine carries alternating current (AC), a time-based
analysis is more useful. In the previous sections, the analysis was time-based;
however, the figures depict only one typical moment of time. In this
section, the radiated EMFs at different instants of time in one cycle are studied.
For brevity, four time instants are selected (0.0025 second, 0.005 second, 0.0075
second, and 0.0125 second). The voltage amplitude at the terminal of the model
during one time cycle is shown in Figure 9.20.
First, the radiated magnetic field at a near distance (0.5 m) from the machine
is studied. The magnetic field density at the four time instants is shown in
Figure 9.21. The result shows that the magnetic field rotates as time varies,
although the position of the maximum field point remains unchanged. It can
be inferred from this result that the model resembles the machine and can be used
in its place at all time instants, not just at the one time instant for which the model is
designed. Next, the radiated magnetic field at a far distance (~10 m) from the
machine is studied. At this distance, the field becomes similar to that of a
magnetic dipole, as shown in Figure 9.21
[14]. As shown in this figure, the dipoles are sensitive to time changes and
rotate as time changes. Consequently, the equivalent source model can be
used for the time-based analysis at near and far distances.
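The terminal excitation at the selected time instants can be evaluated with a short sketch. It assumes a balanced three-phase sinusoidal set with unit amplitude (per unit), a period of 0.02 second as read from the time axis of Figure 9.20, and phase A starting at zero; the phase sequence, the initial phase, and the function name are assumptions made for illustration only.

```python
import math


def three_phase_voltages(t, period=0.02, amplitude=1.0):
    """Return (phase A, phase B, phase C) voltages in p.u. at time t (seconds).

    Assumes a balanced set: phases B and C lag phase A by 120 and 240 degrees.
    """
    omega = 2.0 * math.pi / period
    return tuple(
        amplitude * math.sin(omega * t - k * 2.0 * math.pi / 3.0)
        for k in range(3)
    )


# The four time instants selected in the text.
for t in (0.0025, 0.005, 0.0075, 0.0125):
    va, vb, vc = three_phase_voltages(t)
    print(f"t = {t:.4f} s: A = {va:+.3f}, B = {vb:+.3f}, C = {vc:+.3f}")
```

For a balanced set the three voltages sum to zero at every instant, which gives a quick consistency check on the excitation applied to the model.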

[Figure 9.20 plot: terminal voltage (p.u.) of Phase A, Phase B, and Phase C versus time (s), from 0 to 0.02 s.]

Figure 9.20 Voltage amplitude of the terminal of the model during one time cycle.

(a) (b)

(c) (d)

Figure 9.21 Magnetic field density (B) of equivalent model in four different moments of time at near
distance (a) t = 0.0025 second; (b) t = 0.005 second; (c) t = 0.0075 second; and (d) t =
0.0125 second.

The Effect of Rotation

Another condition that should be studied for the induction machine is the
position of the machine. In many cases, the location of the motor with
respect to the measurement points will change, so the electromagnetic
signatures are expected to change as well. Hence, one specific change in the
position of the motor is studied here. The whole machine was rotated around an
axis, and the results are illustrated in Figure 9.22. The magnetic field in this
figure is plotted at a far distance.

(a) (b)

(c) (d)
Figure 9.22 Magnetic field density (B) of equivalent model at four different instants of time
at far distance.

[Figure 9.23 plot: magnetic field density (T, ×10^-6) versus coordinate in the x-axis (m), from -20 to 20, for rotation angles of 0, 20, 40, 60, and 80 degrees.]

Figure 9.23 Deviation of magnetic field density (B) of the equivalent model due to the rotation of
the whole machine around the z-axis.

As shown in Figure 9.23, when the induction machine is rotated
around the z-axis, the magnetic field moves along the perpendicular coordinates
(x, y) from right to left. When other angles ranging from 90° to 180° are plotted, the
results are exactly symmetrical with respect to the changes from 0° to 90°. The
magnetic field density for a 180° change is exactly the same as the one for a 0° change.
This study is useful in identifying the orientation of the source machine by looking at
the signatures at far distances. All of these studies can be imported into an
optimization program, such as a genetic algorithm or a neural network. Therefore, the
machine in any orientation can be recognized.
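The symmetry observed under rotation can be illustrated with an idealized source. The sketch below replaces the machine with a single point magnetic dipole whose moment lies in the x-y plane and is rotated around the z-axis; this is a simplification for illustration, not the multi-dipole equivalent model of this chapter, and the moment value, the line position (y = 10 m), and the function names are assumptions. It uses the standard point-dipole expression B = (µ0/4π)[3(m·r̂)r̂ − m]/r³.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)


def dipole_b_magnitude(point, angle_deg, moment=1.0):
    """|B| (tesla) at `point` for a dipole rotated by angle_deg about the z-axis."""
    theta = math.radians(angle_deg)
    m = (moment * math.cos(theta), moment * math.sin(theta), 0.0)
    r = math.sqrt(sum(c * c for c in point))
    r_hat = tuple(c / r for c in point)
    m_dot_r = sum(mi * ri for mi, ri in zip(m, r_hat))
    factor = MU0 / (4.0 * math.pi * r ** 3)
    b = tuple(factor * (3.0 * m_dot_r * ri - mi) for mi, ri in zip(m, r_hat))
    return math.sqrt(sum(c * c for c in b))


def profile(angle_deg, y=10.0, xs=range(-20, 21, 4)):
    """|B| sampled along a line parallel to the x-axis at distance y (meters)."""
    return [dipole_b_magnitude((float(x), y, 0.0), angle_deg) for x in xs]


# A 180-degree rotation only reverses the moment, so |B| is unchanged;
# an angle of (180 - a) gives the mirror image (x -> -x) of the profile at a.
print(profile(20)[:3], profile(200)[:3])
```

For this idealized source, the profile at 200° coincides with the profile at 20°, and the profile at 160° is the left-right mirror image of the profile at 20°, which is consistent with the symmetry described above.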

9.3.1.5 Experimental Verification

The procedure and standard for experimental measurement of low-frequency
stray fields are explained in [15]. The setup is shown in Figure 9.24, and the
characteristics of its components are as follows:

Table 9.3
The Characteristics of the Components

Components | Description
EMI receiver/spectrum analyzer | Coverage between 1 Hz and 3 GHz; absolute amplitude accuracy: ±0.5 dB to 3 GHz; displayed average noise level: –142 dBm/Hz at 26.5 GHz, –155 dBm/Hz at 2 GHz, and –150 dBm/Hz at 10 kHz.
Electric rod antenna | Active monopole antenna; coverage between 30 Hz and 50 MHz; impedance: 50 Ω.
Magnetic coil antenna | Coverage between 20 Hz and 500 kHz; 36 turns of 7-41 litz wire shielded with 10-ohm resistance and 340-µH inductance.
Induction motor (IM1) | 7.5-HP, 208-V, 1765-RPM, PF: 0.82, 60 Hz, EFF: 89.5%.

Figure 9.24 The studied experimental setup including the machine and measurement tools.

The coil antenna and the real-time spectrum analyzer used in the
measurement are designed specifically for high-precision low-frequency analysis. The
frequency range of the antenna is between 20 Hz and 500 kHz. Its winding is 36
turns of 7-41 litz wire shielded with 10-ohm resistance and 340-µH inductance.
The antenna and the setup are positioned according to the standards (MIL-STD-461 [16],
MIL-STD-462). The spectrum analyzer covers 1 Hz to 3 GHz with ±0.5-dB
absolute amplitude accuracy up to 3 GHz. The details of the components are
given in Table 9.3.

[Figure 9.25(a) plot: magnetic field intensity (dBµA/m), from -80 to -10, versus arc length (m), from -2.5 to 2.5; legend: measurement, 3DFE model, equivalent model.]

(a)

(b)

Figure 9.25 (a) The magnetic field intensity at 55 cm away from the setup in the y-axis while all
components except IM were off at 60 Hz (dBµA/m), and (b) the region of the model
(the model is in the center, and the measured line is shown in dark grey).

The nominal voltage is applied to the machine, and the stray magnetic field is
obtained at various distances. The results of the measurement, the full 3-D FE model,
and the equivalent source model at 60 Hz are shown in Figure 9.25.
The magnetic field intensity, the standard index in signature studies, is used
for the comparison, with dBµA/m as the unit.
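For reference, a field intensity in A/m is converted to dBµA/m by taking 20 times the base-10 logarithm of its ratio to 1 µA/m. The following is a minimal sketch of this conversion and its inverse; the function names are chosen here for illustration.

```python
import math

REFERENCE_A_PER_M = 1e-6  # 1 microampere per meter


def h_field_to_dbua_per_m(h_a_per_m):
    """Convert a magnetic field intensity (A/m, positive) to dBuA/m."""
    if h_a_per_m <= 0.0:
        raise ValueError("field intensity must be positive")
    return 20.0 * math.log10(h_a_per_m / REFERENCE_A_PER_M)


def dbua_per_m_to_h_field(level_db):
    """Convert a level in dBuA/m back to a field intensity in A/m."""
    return REFERENCE_A_PER_M * 10.0 ** (level_db / 20.0)


# A level of -40 dBuA/m corresponds to 0.01 uA/m.
print(dbua_per_m_to_h_field(-40.0))
```

On this scale, a change of 20 dB corresponds to a factor of 10 in field intensity.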
As illustrated in Figure 9.25, the signatures from the two simulation models
match the measurement. The measured curve in the figure does not show
distortions because it contains far fewer sample points than the
simulation results, especially those of the equivalent source model. Along the
line in the y-axis that is used as the measured line, the equivalent source model
is sampled at 260 points, while the measurement has only about 10 points.
The measured line is also shown in Figure 9.25.

9.3.2 DC Motor

9.3.2.1 The Modeling Approach

The induction motor discussed in Section 9.3.1 has armature and field
windings and, in terms of winding, is known as the simplest machine. In contrast,
the DC machine has four types of windings: armature, field,
compensation, and commutating windings. Therefore, modeling and merging them
into an equivalent model as in Section 9.3.1 is not straightforward. Each of these windings
has a specific design that produces a specific type of electromagnetic signature. Since
the radiated field of each winding has a different shape at far distances, each
winding is simulated and modeled individually, and finally all of them are combined
into one model.
The second part of the modeling is finding the appropriate size of the
model, which is very important in both far-field and near-field computation.
Basically, the dimensions of the model are based on the size of the machine, but
for more precise results, an optimization method is used. The proposed
optimization process is GA-based PSO, which was explained in Section 9.3.1.
In this method, the quantities being optimized are the dimensions of the model,
including both the number of dimensions and their lengths. The number of
dimensions can vary, giving shapes that range from a cone or a cube to a polyhedron.
A typical schematic of this aspect of modeling is shown in Figure 9.26.
Finally, by combining the previously mentioned methods and strategies, the
equivalent model is obtained. For better investigation and a more
accurate model, the equivalent model of each winding is obtained separately, as shown in
Figures 9.27(a–d).

Figure 9.26 Typical schematic of the equivalent model (optimization aspect).

(a) (b)

(c) (d)
Figure 9.27 Equivalent models of (a) armature winding; (b) commutation winding; (c) compensation
winding; and (d) field winding in equivalent DC machine.

The final equivalent model consists of hundreds of currents with different
amplitudes and directions and tens of different voltages, and is illustrated in Figure
9.28. As discussed before, to allow a comprehensive study, appropriate switches are
considered for each winding.

Figure 9.28 Final equivalent model of the propulsion DC machine.

9.3.2.2 Results and Discussion

For simulation purposes, an 800-HP, 750-V, 8-pole, 185-RPM propulsion DC
machine with a length of about 3 m and an outer diameter of 1.7 m is simulated in a
3-D electromagnetic FE domain for one time instant. The actual model and the
mesh structure of this machine in the FE domain are shown in Figures 9.29(a) and
9.29(b), respectively.

(a) (b)
Figure 9.29 Schematic of (a) the detailed model of the DC machine and (b) the mesh in FE domain.

The analysis of the actual machine model requires about 7 million
degrees of freedom in the FE analysis, which leads to a simulation time of about
43,000 seconds (~12 hours). In contrast, the equivalent model, with less than 1
million degrees of freedom, takes about 300 seconds (~5 minutes).
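The saving implied by these figures can be checked with a short calculation. The sketch below simply forms the ratio of the two reported simulation times and converts the longer one to hours; the function names are illustrative.

```python
def speedup(actual_seconds, equivalent_seconds):
    """Ratio of the actual-model simulation time to the equivalent-model time."""
    return actual_seconds / equivalent_seconds


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


# Reported values for the DC machine: about 43,000 s versus about 300 s.
print(f"speedup: about {speedup(43_000, 300):.0f} times")
print(f"actual model: about {to_hours(43_000):.1f} hours")
```

With the reported times, the equivalent model is faster by a factor of roughly 140.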
The solver used in this analysis is the generalized minimal
residual method (usually abbreviated GMRES) with successive over-relaxation
(SOR) pre- and post-smoothers, which was explained in [9].
After simulating both the actual and the equivalent models,
the propagated electric and magnetic fields are measured at different locations at a
distance from the source. Figure 9.30 shows the propagated magnetic fields from
both models along different lines.
Figure 9.30 shows that the magnetic field propagated from the actual machine
has different wave shapes along the various measured lines. It can therefore be
inferred that a single dipole cannot be used as an equivalent model, because a single dipole
shows similar results in all planes. In addition to the wave shapes of the fields,
their amplitudes also differ along the various measured lines, which is another
reason to use an embedded equivalent model. This point can also be seen in
the radiated electric field wave shapes (see Figure 9.31), although the difference between
the electric field wave shapes measured along various lines is slight and can hardly
be recognized. For example, comparing the electric fields in Figures 9.31 and 9.32, it
can be seen that the results of the actual machine and the equivalent
model differ near the peak.

[Figure 9.30(a) plot: magnetic field density (T, ×10^-4) versus coordinates (m), from -20 to 20; curves: Actual a, Equivalent a, Actual b, Equivalent b, Actual c, Equivalent c.]

(a)

(b) (c) (d)

Figure 9.30 (a) Radiated magnetic field density in the case (b); (b) x-z plane when x varies between
-20 and 20; (c) x-y plane when x varies between -20 and 20; and (d) y-z plane when z
varies between -20 and 20.

[Figure 9.31 plot: electric field (V/m), from 0 to 0.9, versus coordinates (m), from -20 to 20; curves: Actual b, Equivalent b, Actual c, Equivalent c.]

Figure 9.31 Radiated electric field in the x-y plane when x varies from -20
to 20 (case c in Figure 9.30) and in the y-z plane when z varies from -20 to 20 (case c in Figure 9.30).

[Figure 9.32 plot: electric field (V/m), from 0 to 0.9, versus coordinates (m), from -20 to 20; curves: Actual a, Equivalent a.]

Figure 9.32 Radiated electric field in the x-z plane when x varies
from -20 to 20 (case d in Figure 9.30).

In Figures 9.30, 9.31, and 9.32, the magnetic and electric fields of the two models
are compared along single lines and show a reasonable similarity. However, one
may argue that dissimilarities might appear if the fields are measured along other lines of a
plane. In other words, the measured lines are in the middle of the planes, where a
symmetric dipole pattern of the propagated field is more likely to occur in the equivalent
model, and other lines may not give this type of result. Hence, for further
investigation, the fields are measured over a whole plane, and the results are
depicted in Figures 9.33 and 9.34.
Comparing Figure 9.33(a) with Figure 9.33(b), and Figure 9.34(a) with
Figure 9.34(b), it can be seen that the fields propagated from the equivalent model
are very similar to those of the actual model, not just along one line but over a
whole slice.

(a) (b)
Figure 9.33 Magnetic field density of (a) the actual machine and (b) the equivalent model.

(a) (b)
Figure 9.34 Electric field of (a) the actual machine and (b) the equivalent model (mV/m).

As mentioned in Section 9.3.1, the main goal of this chapter is to study the
signature of a multimachine system. Therefore, to further validate the proposed
equivalent model, a two-machine system is designed. Two equivalent models
of the studied DC machine are located close to each other, and the
analysis is then applied. The branch currents and node voltages of the
equivalent model in the multimachine study are exactly the same as those in the
single-machine system.
Figures 9.35 and 9.36 compare the magnetic and electric
fields propagated from the actual and the proposed models along several lines for the
two-motor case. As can be seen, the magnetic and electric fields follow the same
patterns with excellent accuracy. For brevity, only some of the measured planes and
lines are considered; these are illustrated in Figure 9.35(b). All other
lines and planes show similar accuracy.

[Figure 9.35(a) plot: magnetic field density (T, ×10^-4), from 0 to 3.5, versus coordinates (m), from -20 to 20; curves: Actual a, Equivalent a, Actual b, Equivalent b, Actual c, Equivalent c.]

(a)

(b) (c) (d)

Figure 9.35 (a) Radiated magnetic field density of two-machine case (case b); (b) in the x-z plane
when x varies from -20 to 20; (c) in the x-y plane when x varies from -20 to 20; and (d)
in the y-z plane when z varies from -20 to 20.

For operation at a different power rating, the variation coefficient
of the voltages and currents of each actual machine can be applied to the respective
equivalent model.

[Figure 9.36 plot: electric field (V/m), from 0 to 1.4, versus coordinates (m), from -20 to 20; curves: Actual a, Equivalent a, Actual c, Equivalent c.]

Figure 9.36 Radiated electric field of two machine case (case b): in the x-y plane when x varies from
-20 to 20 (case c); in the y-z plane when z varies from -20 to 20.

Also, for other sizes of similar types of machines, an appropriate coefficient
can be applied. This coefficient can be obtained based on the size of the studied
machine: the coefficients for the studied machine are taken as
base values, and for other machines any deviation is taken proportional to the base
values. Similar factors, obtained from the study of the actual machine,
can be applied to the equivalent machine models.
Similar to the single-machine case, the EMF spectra of the two-machine case
are obtained and illustrated in Figures 9.37 and 9.38. The results show that the
actual machine model can be replaced by the equivalent model in the two-machine
(multimachine) case.

(a) (b)

Figure 9.37 Magnetic field density of (a) the actual machine and (b) the equivalent model.

(a) (b)
Figure 9.38 Electric field of (a) the actual machine; and (b) the equivalent model.

9.3.3 Synchronous Generator

9.3.3.1 The Modeling Approach

The electromagnetic signature of the synchronous generator, the main
generator in most types of power plants, can be estimated at a far distance based on
the time-varying Maxwell’s equations. However, as mentioned earlier, estimating the
radiated field from electrical machines at a far distance requires significant
simulation time, especially for multicomponent studies using physics-based
simulations. Therefore, a logical simplification used here is edge
modeling in the FE analysis. In addition, the synchronous machine has an
excitation part in the rotor, which is connected to power electronic components
that produce EMI.
The procedure for designing the equivalent source model to resemble the
stray magnetic field is similar to that of the two previous cases. The difference lies in the
design of the equivalent source model for resembling the propagated electric fields.
In this model, the node voltages are placed at the terminal ends of the
windings. The voltage values are based on the electric field displacement of the
actual machine. Consequently, the model consists of many loops with various
currents and node voltages, as shown in Figure 9.39.
Comparing Figure 9.39(a) with Figure 9.39(b), the actual machine is replaced by a
collection of lines located at the positions of the windings in the actual machine. Since
the lines of the model do not have a cross-section, it is not possible to apply the current
to the model in the same way. Hence, the current value of the lines is based on the current
density of the machine in each phase: the current density of each
phase of the winding is estimated and applied to the lines in the proposed model.
Although this method is helpful, the types of windings in the synchronous machine
are different: the field winding carries DC, and the
armature winding carries AC. Hence, individual models should be made for each of
these windings. The equivalent models of the armature and field windings are
shown in Figures 9.40(a) and 9.40(b). The effect of each winding on the total
signature is investigated next.
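One way to read the step from current density to line current is that each line carries the total current of the conductor region it replaces, that is, the estimated current density multiplied by the cross-sectional area of that region. The sketch below illustrates this reading only; the numerical values and the function name are hypothetical and are not taken from the studied machine.

```python
def line_current(current_density_a_per_m2, cross_section_m2):
    """Current (A) assigned to a line that replaces a conductor region.

    Assumes a uniform current density over the replaced cross-section.
    """
    return current_density_a_per_m2 * cross_section_m2


# Hypothetical instantaneous current densities of the three armature phases
# (A/m^2) and a hypothetical conductor cross-section (m^2).
phase_densities = {"A": 3.0e6, "B": -1.5e6, "C": -1.5e6}
conductor_area = 2.0e-4

for phase, density in phase_densities.items():
    print(phase, line_current(density, conductor_area), "A")
```

For the DC field winding, the same product would be evaluated once, since the current density does not vary in time.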

(a) (b)
Figure 9.39 Prototype of synchronous machine: (a) actual machine; and (b) equivalent model.

(a) (b)

Figure 9.40 Equivalent model of individual windings: (a) armature winding; and (b) field winding.

9.3.3.2 Simulation and Discussion

After defining the final equivalent model, the simulation is implemented in the FE
domain. The 3-D electromagnetic FE method is used as an acceptable method for
physics-based simulation. For implementation purposes, a three-phase, 600-kW,
600-V, 1,200-RPM synchronous generator is simulated in a 3-D electromagnetic
FE domain for one time instant. The analysis for the model of the actual machine
requires about 5.5 million degrees of freedom in the FE analysis. This causes the
simulation time to be about 38,000 seconds (~10.5 hours). However, the equivalent
model contains less than 1 million degrees of freedom and takes about 270 seconds
(4.5 minutes).

After solving the problem by the FE model, the magnetic field density
propagated from the machine with and without the armature winding in two
conditions is evaluated as shown in Figure 9.41.

[Figure 9.41 plots: magnetic field density (×10^-6) versus coordinate (m), from -4 to 4; curves: with armature winding, without armature winding.]

(a) (b)
Figure 9.41 Magnetic field density propagated with and without the armature winding along (a) x-
axis in the x-z plane; and (b) x-axis in the x-y plane.

As shown in Figure 9.41, the armature winding increases the fields by the
same ratio at all points. Thus, for conciseness, the
armature winding can be omitted, and its linear effect, expressed as a ratio,
can be included in the current values of the field winding in the equivalent
model. As a result, the simulation time decreases dramatically, while the
accuracy does not change. With this simplification in the simulation
of the equivalent model, the propagated electric and magnetic fields are measured
at different distances from the source. Figure 9.42 shows the propagated electric
and magnetic fields from both the actual and equivalent models along different
lines. As can be seen, both the radiated electric and magnetic fields from the
equivalent model [Figures 9.42(b) and 9.42(d)] accurately match the radiated fields
from the actual model [Figures 9.42(a) and 9.42(c)] in all three shown planes. The
electric fields in the two other planes are negligible compared with the x-y plane;
therefore, they are not shown in Figures 9.42(c) and 9.42(d).
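The ratio by which the armature winding raises the field can be estimated from sampled field values and then folded into the field winding current. The sketch below uses a least-squares fit of a single scale factor and reports the largest relative deviation from a constant ratio; the sample values, the current value, and the function name are hypothetical and serve only to illustrate the idea.

```python
def armature_ratio(with_armature, without_armature):
    """Least-squares scale factor k such that with_armature ~ k * without_armature.

    Also returns the largest relative deviation from a constant ratio.
    """
    num = sum(w * wo for w, wo in zip(with_armature, without_armature))
    den = sum(wo * wo for wo in without_armature)
    if den == 0.0:
        raise ValueError("reference field is zero everywhere")
    k = num / den
    deviations = [
        abs(w - k * wo) / abs(w)
        for w, wo in zip(with_armature, without_armature)
        if w != 0.0
    ]
    return k, max(deviations, default=0.0)


# Hypothetical field samples (T) along a measured line.
b_without = [0.4e-6, 0.8e-6, 1.2e-6, 0.8e-6, 0.4e-6]
b_with = [0.8e-6, 1.6e-6, 2.4e-6, 1.6e-6, 0.8e-6]

k, max_dev = armature_ratio(b_with, b_without)
field_current = 10.0  # hypothetical field winding current (A)
print(k, max_dev, k * field_current)
```

A small maximum deviation supports treating the armature contribution as a pure scale factor; a large one would indicate that the ratio is not the same at all points and that the simplification should not be applied.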
Considering the accuracy and speed of simulation of the proposed model, it
can also replace the actual model in the multimachine case. It should be kept in mind that
a typical powertrain contains several machines and other power components.
Consequently, simulating the original models of all of them together is almost
impossible, even with a very fast processor. The experimental verification of
this component is explained in Section 9.3.5, where the component is coupled with
the induction motor.

(a) (b)

(c) (d)

Figure 9.42 EMF comparisons in three planes: (a) magnetic field of actual machine; (b) magnetic
field of equivalent model; (c) electric field of actual machine (V/m); and (d) electric
field of equivalent model (V/m).

9.3.4 Cable Sets

9.3.4.1 Modeling Approach

The basics of the modeling of cables are similar to the previous cases. However,
since this component is not an electrical machine, some additional considerations
apply.
The actual physical modeling of cables for signature studies requires all the
details to be considered, even in a large region. Cross-linked polyethylene
(XLPE or PEX) cables, similar to all electromagnetic sources, propagate dipoles at a
far distance. However, the interaction of several components, such as electrical
machines and power converters, modifies the shape and the amplitude of the dipoles.
Therefore, each model should be designed and studied independently. Nevertheless,
there is a problem: modeling the relatively small layers of multicore
XLPE cables. The studied region could be about 20,000 times bigger than these layers. This causes
deformation of the cable model during meshing in numerical modeling
methods, such as FEM. The present study is performed on an XLPE-insulated,
armored, polyvinyl chloride (PVC) sheathed cable (0.6/1 kV).
Figure 9.43 shows the typical model, as well as the original and deformed
models of the studied cable in the FE analysis environment. In order to solve this
issue, a specific model consisting of multidipoles with several line currents and
node voltages is designed, which resembles the actual model of the cable for
signature studies.

(a) (b) (c)

Figure 9.43 Models of the proposed cable in FE element design: (a) typical model; (b) original FE
model; and (c) deformed FE model.

The multidipole model of the studied cable is shown in Figure 9.44. A typical
node voltage and line current are displayed in the figure.
[Figure 9.44 labels: voltage point, current line.]

Figure 9.44 Prototype of the multidipole models of the studied cable.
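The magnetic field of a model built from line currents can be obtained by superposing the Biot-Savart contributions of the individual straight segments. The sketch below illustrates this in free space with a simple midpoint-rule integration; it is not the FE procedure used in this chapter, and the segment geometry, current values, and function names are assumptions made for illustration.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space (H/m)


def segment_b_field(start, end, current, point, steps=2000):
    """B (tesla) at `point` from a straight line current, by midpoint integration."""
    dl = tuple((e - s) / steps for s, e in zip(start, end))
    b = [0.0, 0.0, 0.0]
    for k in range(steps):
        src = tuple(s + (k + 0.5) * d for s, d in zip(start, dl))
        r = tuple(p - q for p, q in zip(point, src))
        dist3 = math.sqrt(sum(c * c for c in r)) ** 3
        cross = (
            dl[1] * r[2] - dl[2] * r[1],
            dl[2] * r[0] - dl[0] * r[2],
            dl[0] * r[1] - dl[1] * r[0],
        )
        for i in range(3):
            b[i] += MU0 * current / (4.0 * math.pi) * cross[i] / dist3
    return tuple(b)


def total_b_field(segments, point):
    """Superpose the fields of several (start, end, current) line segments."""
    total = [0.0, 0.0, 0.0]
    for start, end, current in segments:
        for i, component in enumerate(segment_b_field(start, end, current, point)):
            total[i] += component
    return tuple(total)


# Two hypothetical perpendicular line currents observed from a far point.
segments = [((0, 0, -1), (0, 0, 1), 10.0), ((-1, 0, 0), (1, 0, 0), 10.0)]
print(total_b_field(segments, (0.0, 20.0, 0.0)))
```

Because the contributions add linearly, changing the direction or amplitude of one line current changes only its own term in the sum, which is what allows a multidipole model to be tuned segment by segment.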

9.3.4.2 Case Studies with Simulation Results

For simulation purposes, first a unit length of the actual XLPE cable and of the model
cable are simulated and compared using FEM. Afterward, various
directions of the cable are studied. The cable is then analyzed in multipermittivity
areas such as undersea. As mentioned earlier, the XLPE-insulated, armored,
PVC-sheathed cable (0.6/1 kV) is the cable under study.

Unit Length of the Cable

Initially, the radiated EMFs of the proposed model and the full model of the cable
are evaluated and compared. In order to avoid deformation, the actual model is
simulated with a large number of elements, which is only feasible in
simple situations, such as a unit length of cable. This case is studied by applying
two different voltages to the ends of the cable. The field spectra radiated from
the actual and the proposed models are shown in Figures 9.45(a) and 9.45(b),
respectively. Comparing the results in Figures 9.45(a) and 9.45(b) shows that the
proposed model closely reproduces the field of the actual model.

(a) (b)

Figure 9.45 Radiated magnetic field density of (a) the actual model; and (b) the equivalent model in
tesla. Note that the cable is very small compared to the region.

Since cables are symmetrical, the radiated fields are the same in all planes of
the region, similar to a simple dipole. Nevertheless, the radiated fields in two planes
are measured, and the result shows that the proposed multidipole model propagates
radiated fields similar to those of the actual model. A similar study is implemented for the
radiated electric field, and the result is shown in Figure 9.46. As shown in the
figure, the radiated electric field of the proposed model matches that of the actual one.
Therefore, both indices of the signature study of the proposed model give
accurate results, while the simulation time of this model is about 100 times shorter.

(a) (b)
Figure 9.46 Radiated electric field of unit length of the cable (a) for the actual model and (b) for the
equivalent model (mV/m).

Multidirectional Cables

The XLPE cables connecting two components may have many curves
or torsions; therefore, various magnetic dipoles are established, and
consequently the radiated fields become different. In multidirectional cable
analysis, the simulation time increases significantly, and in cases of coupling
with other components, the simulation may become impossible because of the increasing
number of tiny spaces between fragments of each component while the
region is huge. As an initial multidirectional case, perpendicular cables are
located in the same region, and the radiated EMFs are measured at a far distance,
as displayed in Figures 9.47 and 9.48. Similar to the single-cable case, the
proposed model closely matches the actual model. Additionally, the difference in simulation
time between the actual model and the equivalent model increases.

Figure 9.47 Radiated magnetic field density of perpendicular cables case (a) for the actual model
and (b) for the equivalent model in tesla.

A comparison of Figure 9.47 with Figure 9.45 shows that the maximum of the radiated
magnetic field density in the lateral plane has moved to the corner. This is because
of the interaction between the dipoles of the two perpendicular cables. Since the
source of the signature is no longer symmetrical, the radiated fields in the two
planes shown are different.

Figure 9.48 Radiated electric field of perpendicular cables case (a) for the actual model and (b) for
the equivalent model.

Moreover, to verify the model in all three dimensions, a more complex
multidirectional cable is analyzed. To do so, four discontinuous units of
the cable are placed at arbitrary angles (see Figure 9.49).

Figure 9.49 A sample of multidirectional discontinuous cables.

As in the previous cases, the magnetic and electric fields radiated from the cables
are obtained and shown in Figures 9.50 and 9.51. The result of the multicable case
is not similar at all to that of the single-cable case, because of the presence of
cables at various angles. The proposed model matches the actual model in this case
as well as in the previous cases. Note that the lines around the region serve to
increase the mesh density in the evaluated planes and so give more accurate
radiated fields.

Figure 9.50 Radiated magnetic field of multidirectional cables case (a) for the actual model and (b)
for the equivalent model.

Figure 9.51 Radiated electric field of multidirectional cables case (a) for the actual model and (b) for
the equivalent model (mV/m).

The Cables in Multipermittivity Area

As mentioned in the chapter introduction, one of the main applications of
low-frequency EMC study is the analysis of buried, underground, and undersea cables.
Since the region then contains two or more permittivities, such as soil and air or
water and air, the radiated electric field at a far distance differs from that in a
homogeneous region. For brevity, only the radiated electric field of the undersea
cable is studied. As shown in Figure 9.52, the water surface is included, so that
the region comprises water, air, and the cable with its insulation, each with its
own permittivity. As expected, the radiated fields differ between the areas with
different permittivities. The same region is applied to the equivalent model, and
the radiated field is the same as that of the actual model. Consequently, this model
is applicable under various environmental conditions.

Figure 9.52 Radiated electric field of the cables in multipermittivity area (a) for the actual model and
(b) for the equivalent model (V/m).

Coupling of the Cable with Synchronous Machine

For further verification and studying the application of this type of modeling, the
proposed model is analyzed in connection with a power component. A
synchronous generator is coupled with a multicore XLPE cable. The modeling of a
synchronous generator was explained in Section 9.3.3. The actual and equivalent
models of the cable connected to the machine are shown in Figure 9.53.

Figure 9.53 Schematic of the synchronous machine connected to the cable: (a) the detailed model;
and (b) the equivalent model.

The rated voltage is applied to the cable, which is connected to the machine,
and the radiated field is measured at a far distance from the sources. The
current and voltage values of the equivalent model are calculated based on the
individual actual models of the machine and cable. Figure 9.54 shows the
propagated field of both models along the x-axis in the x-y plane. The line along
which the field is evaluated is also shown in the figure. The difference in
amplitude between the two models is due to superposition. Since the cable and the
machine are so close together, there is a superposition effect in the magnetic
field: the magnetic field radiated from the cable is induced into the machine and
creates an induced current, which radiates an additional field from the machine.
This situation cannot be simulated perfectly in the proposed multidipole modeling,
which results in a difference between the curves. To clarify the contribution of
each component to the total radiated field in Figure 9.54, the radiated field of
each component is calculated and shown in Figure 9.55. As shown in the figure, the
contribution of the cable's radiated field is smaller than that of the machine.
This is because of the volume of the machine and its effect on the current density,
which builds the magnetic field.
[Plot: magnetic field density (T, x 10^-9) versus coordinates (m), -20 to 20; curves: actual model, equivalent model.]
Figure 9.54 Radiated magnetic field density along the x-axis in the x-y plane for the actual and
equivalent models. (a) Problem configuration. (b) Radiated magnetic field density along
the x-axis in the x-y plane.

[Plot: magnetic field density (T, x 10^-9) versus coordinates (m), -20 to 20; curves: both on, only machine on, just cable on.]

Figure 9.55 Radiated magnetic field density along the y-axis in the x-y plane for three cases.

9.3.5 Coupling of Machines

The synchronous generator studied in Section 9.3.3 is coupled to the induction
motor studied in Section 9.3.1. The system implemented in the FE domain is shown
in Figure 9.56. As mentioned, the components are switched on and off to evaluate
the radiated field of each component alone or of several components together.

Figure 9.56 Schematic power setup (a) for the full FE model and (b) for the equivalent model.

The synchronous generator and induction motor are switched on and off to see
their effects separately and to verify the equivalent source model. First, the
synchronous generator is turned on while the other components are switched off.
As shown in Figures 9.57 and 9.58, the electric and magnetic fields radiated from
the wire model of the synchronous generator match the fields radiated from the
actual machine. The electric fields in Figure 9.57 agree with Maxwell's radiation
theory, since the electric field is in the direction of the poles of the terminal
voltage. That is why the propagated field in Figure 9.57 is stronger in the frontal
plane than in the lateral planes. Conversely, the magnetic field is established
perpendicular to the direction of the currents; thus, the field in the lateral
planes is stronger than that in the frontal plane in Figure 9.58.

Figure 9.57 Radiated electric field of (a) the actual model and (b) the equivalent model while
the synchronous generator is turned on and other components are off.

Figure 9.58 Radiated magnetic field density of (a) the actual model and (b) the equivalent model in
tesla while the synchronous generator is turned on and other components are off.

After the fields of each machine are tested individually, the fields of the coupled
motor-generator set are evaluated, as shown in Figures 9.59 and 9.60.
A comparison of Figure 9.59 with Figure 9.57 shows that the amplitude of the
electric field decreases when the induction motor is connected to the generator.
This is because the terminal voltage of the motor and the generator voltage are out
of phase, so the electric fields do not add. The current of the motor, however, is
in the same direction as that of the generator. In both cases, the radiated field
of the equivalent model matches the radiated field of the actual model.

Figure 9.59 Radiated electric field of (a) the actual model and (b) the equivalent model while
the coupling of machines (generator-motor) is turned on and others are off.

Figure 9.60 Magnetic stray field density of (a) the actual model and (b) the equivalent model in tesla
while the coupling of machines (generator-motor) is turned on and others are off.

9.3.6 Whole System Setup

9.3.6.1 Model Design

Finally, all the components are assembled: the excitation is applied to the
generator and motor, and a pulse load and the connection cable are connected to
them. The details of the components are listed in Table 9.4. The model is
analyzed in full detail in the FE domain. In addition, the equivalent source model
is used to model all these components.
The models are shown in Figure 9.61. As shown in this figure, the equivalent
source model consists of numerous lines carrying specified currents, with
voltages established at the nodes of these wires.

Table 9.4
The Details of the Components in the Tested Setup

Component               Characteristics

Synchronous generator   13.8 kW, PF: 0.8, length: 25 cm, diameter: 28-30 cm, poles: 4, RPM: 1800,
                        nominal voltage: 230 V, current: 39.5 A, exc. voltage: 37 V, exc. current: 1.9 A

Induction machine       5.5 kW, PF: 0.85, length: 30 cm, diameter: 25 cm, poles: 4

Electric load           3 kW AC load

Connection cable        XLPE, diameter: 5 cm, insulated and armored PVC sheathed cable

Figure 9.61 Schematic of the power setup: (a) the full FE model and (b) the equivalent model.

The equivalent source models described so far have some limitations in a
multicomponent system. The radiated electric and magnetic fields are affected
not only by the voltage and current magnitudes but also by the permittivity and
permeability of the materials [17]. In addition, the radiated field of one component
induces fields in the other components, which produce induced voltages.
Consequently, additional radiated electric and magnetic fields are created. For
example, the field induced by the induction motor in the vicinity of the cable is
shown in Figure 9.62. This field is obtained while only the induction motor is on
and the other components are turned off. The equivalent source model, however, may
not reproduce this phenomenon, since the lines in the model have no volume.
Therefore, the induced voltage and, consequently, the additional radiated field are
not produced in the equivalent source model. To resolve this issue, a further
optimization is applied to the equivalent source model. The superposition issue
exists because the equivalent source model contains no component casings or
insulation. Hence, each component in the equivalent source model is enclosed in a
casing that separates it from the other components. The casing should have an
optimized value of permeability to avoid superposition between components. The
relative permeability must be less than 1 and close to 0; however, a very small
permeability increases the simulation time dramatically. Therefore, a trade-off
between these two considerations is needed. A lower permeability of the casing
reduces the radiated magnetic field entering the casing, while the magnetic field
leaving the casing is not seriously affected. Therefore, the superposition between
the components decreases, while the radiated field in the surrounding air does not
change considerably.

[Plot, panel (a): magnetic flux density (T, x 10^-5), about 2 to 4.5, versus length of the line (m), 0 to 2.]

Figure 9.62 (a) Radiated magnetic field of the induction machine on the cable, while only the
induction motor is turned on; and (b) the problem model.

The magnetic flux densities radiated from the actual and the equivalent source
models are obtained from the simulation at 7 m from the arrangement and are shown
in Figure 9.63. As illustrated in the figure, the magnetic flux density radiated
from the equivalent source model is similar to that of the actual model. The small
difference between the maximum values of the magnetic flux densities of the two
models is due to the superposition of the components.

Figure 9.63 Radiated magnetic flux density of (a) the actual model and (b) the equivalent model in
tesla.

The optimized result is shown in Figure 9.64. As can be seen, the magnitude
of the radiated magnetic field of the optimized equivalent source model is almost
the same as the actual model.

Figure 9.64 Radiated magnetic flux density of the optimized equivalent source model.

9.3.6.2 Simulation and Experimental Results

In addition to the study of superposition, another application of the model is the
use of on/off switches for each of the components. By implementing the switches,
the effect of each component can be identified and analyzed. The benefit of
studying each component separately is that its behavior can be monitored and
failures and fault conditions detected. This can be done using both simulation and
experimental methods. Various cases of the studied setup, shown in Figure 9.65, are
tested, and the simulation and measurement results are presented together in the
figures. Note that the connection cables to the loads are outside the view of
Figure 9.65. Moreover, the controller connected to the drive shown in Figure 9.65
is outside the system. The system is started up manually. Details of the
measurement elements were given in Section 9.3.1.

Figure 9.65 The studied setup including machines, measurement devices, and control drive (for
switching).

For the experimental test, all components including the synchronous generator,
the induction motor, and the electric load are turned on. The cables are carrying
currents, so they can also be considered on. The same test as in the previous case
is carried out here. All switches are turned on, and the H-field is measured
experimentally and also obtained from the simulation models. The machines are
tested at their nominal voltages. The magnetic field intensity (H-field) of the
measurement and simulation models is shown in Figure 9.66. As shown in this
figure, the full FE model and the equivalent source model produce a radiated
H-field similar to the measurement. The small differences in amplitude are due to
the bodies of other components around the system.
This study is applicable to system monitoring and fault diagnosis, which is
studied in [18].

[Plot: magnetic field intensity (dBµA/m), 0 to -60, versus arc length (m), -2.5 to 2.5; curves: measurement, 3DFE model, equivalent model.]

Figure 9.66 The measured magnetic field intensity at 55 cm from the setup in the y-axis while all
components are turned on at 60 Hz (dBµA/m).
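
Figure 9.66 reports the H-field in dBµA/m, that is, 20 log10 of the field magnitude relative to 1 µA/m. A minimal Python sketch of the conversion in both directions is given below; the function names are illustrative and are not taken from the source.

```python
import math

REFERENCE_A_PER_M = 1e-6  # 1 microampere per meter, the dBuA/m reference level


def h_to_dbua_per_m(h_a_per_m):
    """Convert a magnetic field intensity in A/m to dBuA/m."""
    return 20.0 * math.log10(h_a_per_m / REFERENCE_A_PER_M)


def dbua_per_m_to_h(level_db):
    """Convert a level in dBuA/m back to a field intensity in A/m."""
    return REFERENCE_A_PER_M * 10.0 ** (level_db / 20.0)


# A level of -20 dBuA/m corresponds to one tenth of the reference field.
h_example = dbua_per_m_to_h(-20.0)
print(f"-20 dBuA/m = {h_example:.1e} A/m")
```

Every 20 dB step on the vertical axis of Figure 9.66 therefore corresponds to a factor of 10 in field intensity.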

9.3.7 Generalization of the Equivalent Source Model

9.3.7.1 Study Approach

Each model designed so far is suitable for only one specific situation, defined by
the terminal voltage rating and the physical geometry of the machine. The model
proposed here is optimized so that it can be used for various machine sizes and
operating voltages.
The procedure involves measuring (numerically evaluating) the radiated fields of
an actual AC machine model with a basic size. If a model of the proposed size is
not available, the fields can be estimated using the related equations, based on
the fields measured for the basic size [19]. Optimization factors are then applied
as follows:
$$K_{BS} = B_{iSnew} / B_{base}, \qquad K_{ES} = E_{iSnew} / E_{base} \tag{9.3}$$
where Bbase and Ebase are the magnetic field density and electric field of the basic
case, and BiSnew and EiSnew are magnetic field density and electric field of any
machine size. These parameters could be measured at any random points around
the component (e.g., the maximum B in a plane at a distance from the component).
The KBS and KES factors in (9.3) are applied, respectively, to the currents and
voltages of the equivalent source model to optimize the model for a new machine
with a different size. These factors are applied due to the fact that the magnetic
field density, which is used in these equations, shows strong correlation with the
magnitude of the current of the lines in nonvolumetric models (Biot-Savart Law).
Similarly, the electric field has the same relation with the voltage at the nodes [19].
A similar procedure can be applied for variations of the terminal voltage. In this
case, the factors are as follows:
$$K_{BV} = B_{iVnew} / B_{base}, \qquad K_{EV} = E_{iVnew} / E_{base} \tag{9.4}$$
where BiVnew and EiVnew are the magnetic field density and electric field at any
proposed terminal voltage. The KBV and KEV factors in (9.4) are applied,
respectively, to the currents and voltages of the equivalent source model to
optimize it for the new case with a different terminal voltage. If a case involves
both voltage and size variations, the current and voltage values of the basic
equivalent model are multiplied by both factors.
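
The reason a single multiplicative factor can be applied to the line currents can be seen from the Biot-Savart law itself. The following is a sketch under stated assumptions: filamentary line currents $I_k$ on paths $C_k$, quasi-static conditions, and a homogeneous nonmagnetic surrounding medium (permeability $\mu_0$); the symbols $I_k$, $C_k$, and $K_B$ are introduced here for illustration.

```latex
\[
\mathbf{B}(\mathbf{r})
  = \frac{\mu_0}{4\pi} \sum_k I_k \int_{C_k}
    \frac{d\boldsymbol{\ell}' \times (\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^{3}}
\]
\[
I_k \;\to\; K_B\, I_k \ \text{ for all } k
\quad\Longrightarrow\quad
\mathbf{B}(\mathbf{r}) \;\to\; K_B\, \mathbf{B}(\mathbf{r}),
\qquad K_B = K_{BS}\, K_{BV}
\]
```

Because the field is linear in the line currents, scaling every current by the same factor scales B everywhere by that factor, which is why the factors can be defined from a field value at any single evaluation point.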
Since some material properties of machines, such as the permeability, are
nonlinear, it is not possible to use the currents and voltages instead of B and E
in (9.3) for all working conditions. However, it might be possible to replace the
magnetic field density (B) with the current or other parameters over a specific
range of currents. Hence, the most reliable parameters are the magnetic field
density and electric field, which are used in (9.3) and (9.4). However, to avoid
modeling the actual machine in each new case to obtain BiSnew, EiSnew, BiVnew,
and EiVnew in (9.3) and (9.4), curves of the four factors (KBS, KES, KBV,
KEV) are obtained for both AC machines. Randomly chosen example cases (size and
voltage variations) are evaluated and the related factors are obtained.
These are shown in Tables 9.5 and 9.6. The curves are established from the points
in Table 9.5 by a curve-fitting procedure.
As shown in Table 9.5, the factors due to the geometrical size changes are not
just based on size ratios, but many other parameters have an effect on the values of
these factors. A curve-fitting technique is used to find an equation to obtain these
factors based on a size ratio as a variable. For example, the equation for KBS of a
synchronous generator is as follows:
$$K_{BS} = -0.06805R^3 + 0.80653R^2 + 0.28031R + 0.0028016 \tag{9.5}$$
where R is the size ratio.
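
The curve-fitting step can be sketched with a cubic least-squares fit to the synchronous-generator KBS column of Table 9.5. This is an illustration only: the fitting method and the data points actually used by the authors are not stated, so the coefficients returned here need not coincide with those printed in (9.5). The variable names are hypothetical.

```python
import numpy as np

# Size ratios and K_BS values of the synchronous generator, copied from Table 9.5.
size_ratio = np.array([0.6, 0.8, 0.9, 1.0, 1.3, 1.7, 2.0])
k_bs_syn = np.array([0.460081, 0.704005, 0.872369, 1.0, 1.5689, 2.49898, 3.23557])

# Cubic least-squares fit (assumed method); coefficients are highest power first.
coeffs = np.polyfit(size_ratio, k_bs_syn, deg=3)
k_bs_fit = np.poly1d(coeffs)

# Compare the fitted curve with the tabulated points.
for r, k in zip(size_ratio, k_bs_syn):
    print(f"R = {r:.1f}  table = {k:.4f}  fit = {k_bs_fit(r):.4f}")

max_error = float(np.max(np.abs(k_bs_fit(size_ratio) - k_bs_syn)))
print(f"largest deviation from the table: {max_error:.4f}")
```

Once fitted, `k_bs_fit(R)` gives an estimate of the factor for any size ratio within the tabulated range, without simulating the full machine again.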


The factor due to terminal voltage changes, as illustrated in Table 9.6, is
essentially equal to the terminal voltage ratio. Hence, no curves are needed for
the terminal voltage variation case, and the ratios can be used directly as
factors. The curves for the size variation case, which are established based
on Table 9.5, are shown in Figures 9.67 and 9.68 [20].

Table 9.5
Some Patterns of Size Variation of Induction Motor and Synchronous (SYN) Generator

Ratio   KBS of SYN   KES of SYN   KBS of Induction   KES of Induction
        Machine      Machine      Machine            Machine

0.6     0.460081     0.558676     0.53685            0.560039
0.8     0.704005     0.773821     0.75928            0.782211
0.9     0.872369     0.88666      0.92806            0.928058
1.0     1.0          1.0          1.0                1.0
1.3     1.5689       1.34052      1.46037            1.36037
1.7     2.49898      1.81896      2.12283            1.92283
2.0     3.23557      2.16098      2.59692            2.29693

Table 9.6
Patterns of Terminal Voltage Variation of Induction Motor and Synchronous (SYN) Generator

Ratio   KBV of SYN   KEV of SYN   KBV of Induction   KEV of Induction
        Machine      Machine      Machine            Machine

0.6     0.5998       0.6003       0.60002            0.60001
0.8     0.7997       0.8004       0.800066           0.79998
0.9     0.8995       0.9002       0.90001            0.90001
1.0     1.0          1.0          1.0                1.0
1.3     1.2994       1.3004       1.30001            1.30002
1.7     1.6993       1.7001       1.7000             1.70001
2.0     1.9993       2.005        2.0000             2.00002



[Plot: KB (0 to 3.5) versus size ratio (Snew / Sbase), 0 to 2; curves: synchronous generator, induction motor.]

Figure 9.67 KB due to size variation of synchronous generator and induction motor.
©IEEE 2012 [20].

[Plot: KE (0 to 2.5) versus size ratio (Snew / Sbase), 0 to 2; curves: synchronous generator, induction motor.]

Figure 9.68 KE due to change of size of synchronous generator and induction motor.
©IEEE 2012 [20].

9.3.7.2 Case Studies

To verify the generalized model, both AC machines (induction and synchronous)
are analyzed in different cases.
In the first case, an induction machine with 1.2 times the basic size (R: 1.2) is
simulated in a large region. A comparison between the equivalent source and actual
detailed numerical models is shown in Figures 9.69(a-d). The radiated magnetic
field density and electric field spectra are used as indices. In the second case,
an induction machine with 0.6 times the original terminal voltage (R: 0.6) is
simulated, and the comparison between the new equivalent source model and the
actual numerical model is shown in Figures 9.70(a-d).
The third case considers variations of both geometrical size and terminal voltage.
The results are shown in Figures 9.71(a-d) for both the equivalent source model and
the detailed numerical model.

Figure 9.69 Field spectrum of induction motor while the geometric size increased 20%: (a) B of
actual model; (b) B of equivalent source model; (c) E of actual model; and (d) E of
equivalent source model. ©IEEE 2012 [20].

According to the amplitudes and spectra in these figures, the fields propagated
from the equivalent source models agree closely with those of the actual numerical
models. The region in this study is 8 m × 8 m, while the diameter of the induction
motor is 1.2 m.
The same procedure is carried out for the synchronous generator. To check the
propagated fields and validate them at far distances, the machine is placed in a
large region (50 m × 50 m). Only the case corresponding to the third induction
machine case is shown here. Hence, a geometrical size ratio of 1.7 and a terminal
voltage ratio of 0.8 are considered. According to Tables 9.5 and 9.6, the factors
are as follows: KBS = 2.49898, KES = 1.81896, KBV = 0.8, and KEV = 0.8. Therefore,
the two factors applied to the equivalent model are KB = KBS × KBV and
KE = KES × KEV. These two factors are applied to the branch currents and node
voltages of the equivalent source model, respectively. For verification, the actual
and optimized equivalent models are compared in Figures 9.72(a-d).
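
As a worked check of the arithmetic for this case, the combined factors can be evaluated directly from the four values quoted above. The snippet below is only an illustration; the variable names are not taken from the source.

```python
# Factors quoted in the text for the synchronous generator case
# (size ratio 1.7 from Table 9.5, voltage ratio 0.8 from Table 9.6, rounded).
k_bs, k_es = 2.49898, 1.81896
k_bv, k_ev = 0.8, 0.8

# Combined factors applied to the branch currents (K_B) and node voltages (K_E).
k_b = k_bs * k_bv
k_e = k_es * k_ev

print(f"K_B = {k_b:.6f}, K_E = {k_e:.6f}")  # K_B = 1.999184, K_E = 1.455168
```

So the branch currents of the basic equivalent model are roughly doubled, while the node voltages are raised by about 46%.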
A comparison of the amplitudes and spectra in Figure 9.72(a) with Figure 9.72(b),
and in Figure 9.72(c) with Figure 9.72(d), shows that the fields propagated from
the equivalent source model accurately match those of the actual model in both
planes around the machines at a distance of about 20-25 m. It should be noted that,
unlike the induction motor cases in the previous study, the generators are placed
horizontally. Placing an induction machine vertically is necessary in some
applications [21]. As shown in Figure 9.72, the actual model is much larger than
the equivalent source model, because it is tested with the new size, while the
equivalent source model keeps the same size and shape. The optimization factors
are applied, and the results match very accurately.

Figure 9.70 Field spectrum of induction motor while the terminal voltage decreased 40%: (a) B
distribution of actual model; (b) B distribution of equivalent model; (c) E distribution of
actual model; and (d) E distribution of equivalent model. ©IEEE 2012 [20].

In the final test case, both AC machine types are placed in the same region
(50 m × 50 m), and a different variation is applied to each. The size ratio of the
synchronous generator is chosen as 1.4, while that of the induction motor is chosen
as 0.85. In addition, the voltage ratios for the synchronous generator and the
induction motor are selected as 1.2 and 0.7, respectively. Using Tables 9.5 and 9.6
and the curves derived earlier, the desired factors are estimated.
A diagram for calculating these factors is shown in Figure 9.73. The two factors
for each machine are computed from the four factors. The field spectra of
this test are shown in Figures 9.74(a-d).
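
The factor estimation for this final case can be sketched as follows. Two assumptions are made here that are not in the source: the size factors are obtained by linear interpolation between the rows of Table 9.5 (the chapter uses fitted curves instead), and the voltage factors are set equal to the voltage ratio, as the text suggests for Table 9.6. The names `TABLE_9_5` and `combined_factors` are hypothetical.

```python
import numpy as np

# Size ratios and size factors copied from Table 9.5.
RATIO = np.array([0.6, 0.8, 0.9, 1.0, 1.3, 1.7, 2.0])
TABLE_9_5 = {
    "syn": {  # synchronous generator
        "K_BS": np.array([0.460081, 0.704005, 0.872369, 1.0, 1.5689, 2.49898, 3.23557]),
        "K_ES": np.array([0.558676, 0.773821, 0.88666, 1.0, 1.34052, 1.81896, 2.16098]),
    },
    "im": {  # induction motor
        "K_BS": np.array([0.53685, 0.75928, 0.92806, 1.0, 1.46037, 2.12283, 2.59692]),
        "K_ES": np.array([0.560039, 0.782211, 0.928058, 1.0, 1.36037, 1.92283, 2.29693]),
    },
}


def combined_factors(machine, size_ratio, voltage_ratio):
    """Return (K_B, K_E) for one machine, following the scheme of Figure 9.73."""
    columns = TABLE_9_5[machine]
    # Assumption: linear interpolation between tabulated size ratios.
    k_bs = float(np.interp(size_ratio, RATIO, columns["K_BS"]))
    k_es = float(np.interp(size_ratio, RATIO, columns["K_ES"]))
    # Assumption: voltage factors equal the voltage ratio (see Table 9.6).
    k_bv = k_ev = voltage_ratio
    return k_bs * k_bv, k_es * k_ev


# Ratios of the final test case described in the text.
for machine, size_ratio, voltage_ratio in [("syn", 1.4, 1.2), ("im", 0.85, 0.7)]:
    k_b, k_e = combined_factors(machine, size_ratio, voltage_ratio)
    print(f"{machine}: K_B = {k_b:.4f}, K_E = {k_e:.4f}")
```

The resulting K_B is applied to the branch currents and K_E to the node voltages of each machine's equivalent source model.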

Figure 9.71 Field spectrum of induction motor while the terminal voltage decreased 40% and
geometric size increased 20%: (a) B distribution of the actual model; (b) B distribution
of the equivalent model; (c) E distribution of the actual model; and (d) E distribution of
the equivalent model. ©IEEE 2012 [20].

Figure 9.72 Field spectrum of synchronous generator while the terminal voltage decreased 20% and
geometric size increased 70%: (a) B distribution of the actual model; (b) B distribution
of the equivalent model; (c) E distribution of the actual model; and (d) E distribution of
the equivalent model. ©IEEE 2014 [20].

[Diagram: for each machine (SYN G and IM), KBS × KBV gives KB and KES × KEV gives KE.]

Figure 9.73 Calculation diagram of optimization factors for the two AC machines. SYN G and IM
stand for synchronous generator and induction motor. ©IEEE 2012 [20].

This test case shows that even when all the conditions change simultaneously, the
equivalent model results match those of the actual one. In addition to verifying
the equivalent source model, the results reveal other aspects of EMC evaluation.
For example, a comparison of Figure 9.74 with Figures 9.72 and 9.70 shows that the
field spectrum in Figure 9.74 is similar to that in Figure 9.72. This means that
the coupled AC machine system radiates a signature similar to that of the
synchronous machine. This is because of the large difference between the nominal
powers of the synchronous generator and induction motor (857 kVA versus 33 kVA).
Therefore, the radiated fields of the induction motor only increase the amplitude
of the overall fields of the studied system.

Figure 9.74 Radiated field spectrums from both synchronous generator and induction motor: (a) B of
the actual model; (b) B of the equivalent source model; (c) E of the actual model; and (d)
E of the equivalent source model. ©IEEE 2012 [20].

To increase the accuracy of the equivalent source model, switches are included for
each machine so that it can be turned on and off to study the superposition
concept.
Finally, the simulation times of the two models, the actual model of the machines
implemented as the full FE model and the generalized equivalent source model, are
compared in Table 9.7.

Table 9.7
Simulation Characteristics Comparison

Type of Model                                 Number of Degrees of   Analysis
                                              Freedom (Million)      Time (s)

Induction motor (actual)                      3.5                    6,200
Induction motor (equivalent source)           0.9                    140
Synchronous generator (actual)                4.2                    6,800
Synchronous generator (equivalent source)     1.1                    170
Coupled machine model (actual)                6.0                    11,000
Coupled machine model (equivalent source)     1.5                    188
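
The savings implied by Table 9.7 can be made explicit by dividing the entries of each pair of rows. The short sketch below does only that; the dictionary layout and names are illustrative.

```python
# (degrees of freedom in millions, analysis time in seconds), copied from Table 9.7.
table_9_7 = {
    "induction motor": {"actual": (3.5, 6200), "equivalent": (0.9, 140)},
    "synchronous generator": {"actual": (4.2, 6800), "equivalent": (1.1, 170)},
    "coupled machines": {"actual": (6.0, 11000), "equivalent": (1.5, 188)},
}

speedups = {}
for name, rows in table_9_7.items():
    dof_actual, time_actual = rows["actual"]
    dof_equiv, time_equiv = rows["equivalent"]
    speedups[name] = time_actual / time_equiv
    print(f"{name}: {time_actual / time_equiv:.1f}x shorter analysis time, "
          f"{dof_actual / dof_equiv:.1f}x fewer degrees of freedom")
```

For these three models the equivalent source model is roughly 40 to 60 times faster, with about one quarter of the degrees of freedom.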

9.4 POWER CONVERTERS

9.4.1 Modeling Approach

The same procedure of modeling is applied for the power converter with the
exception that the power electronics converter has switches and the switching
activities should be considered.
The proposed full FE model is shown in Figure 9.75. This electronic drive
consists of an inverter, AC load, and the armored connection cable. The details of
the devices are identified in Table 9.8.

Figure 9.75 The prototype of the inverter, load, and connection cable.

The schematic circuit of the inverter is shown in Figure 9.76. In this
simulation, the insulated-gate bipolar transistor (IGBT) module is operated at a
relatively low switching frequency to illustrate the behavior of the circuit. In the
pulse width modulation (PWM) inverter of Figure 9.3, the duty cycle of the
input signal to the IGBT gate drivers is varied using the space vector PWM
technique to produce a 60-Hz sinusoidal variation of the resistor-inductor (RL)
load current [22].
The operation of the inverter is divided into six sections. During the first π/3
rad of 60-Hz inverter operation, IGBTs Sap, Sbn, and Scn are switched on, while the
others are in the off state. The switching states then change so as to track the
reference voltage [22]:
$$V_{ref} = \frac{3}{2}\left(V_a + \alpha V_b + \alpha^2 V_c\right), \qquad \alpha = e^{j\frac{2\pi}{3}} \tag{9.6}$$
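
The division of one 60-Hz period into six sections of π/3 rad can be illustrated by computing the angle of the reference vector in (9.6). This is a sketch under assumptions not stated in the source: balanced cosine phase voltages with phase a as the angle reference, and sections numbered 1 to 6 starting at zero angle. The scalar prefactor in (9.6) does not affect the angle, so it is left as a parameter; all function names are hypothetical.

```python
import cmath
import math

ALPHA = cmath.exp(2j * math.pi / 3)  # alpha = e^{j 2 pi / 3}


def space_vector(va, vb, vc, scale=1.0):
    """Complex reference vector scale * (Va + alpha Vb + alpha^2 Vc)."""
    return scale * (va + ALPHA * vb + ALPHA**2 * vc)


def section(va, vb, vc):
    """Return the section index 1..6; each section spans pi/3 rad."""
    angle = cmath.phase(space_vector(va, vb, vc)) % (2.0 * math.pi)
    return int(angle // (math.pi / 3.0)) + 1


def balanced_phases(theta, amplitude=1.0):
    """Assumed balanced three-phase cosine voltages at electrical angle theta."""
    return (
        amplitude * math.cos(theta),
        amplitude * math.cos(theta - 2.0 * math.pi / 3.0),
        amplitude * math.cos(theta + 2.0 * math.pi / 3.0),
    )


# Sample the middle of each pi/3 interval over one 60-Hz period.
for k in range(6):
    theta = (k + 0.5) * math.pi / 3.0
    print(f"theta = {math.degrees(theta):5.1f} deg -> section {section(*balanced_phases(theta))}")
```

With these assumptions the vector rotates at the fundamental frequency, so each section lasts one sixth of the 60-Hz period.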
Table 9.8
The Details of the Components in the Tested Setup

Component          Characteristics

Inverter           Three-phase, 5.5 kW, switching frequency: 5 kHz, switching algorithm: SVM,
                   length: 30 cm, width: 30 cm, height: 25 cm, nominal voltage: 320 V,
                   current: 20 A

Electric load      3-kW AC load

Connection cable   XLPE, cross-sectional area: 1000 mm2, thickness of insulation: 2.8 mm,
                   nominal thickness of PVC sheath: 2.4 mm, overall diameter: 51 mm,
                   insulated and armored PVC sheathed cable

[Schematic: upper switches Sap, Sbp, Scp; lower switches San, Sbn, Scn.]

Figure 9.76 Schematic of a six-switch inverter circuit.



The inverter operation during the other sections of the 60-Hz reference sine
wave is similar to the sequence described above, except that the opposite phase of
the bridge is switched on and off. The sinusoidal variations of the duty cycle
for each phase are specified by comparing triangular waveforms with the
sinusoidal reference signal. When the value of the reference sine
wave is larger than the value of the upper triangle wave, Sap is switched on;
otherwise, it must be off. The same procedure applies to the other IGBTs as well.
Figure 9.77 shows the simulated load current for the space vector PWM (SV-PWM)
operation.
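
The comparison of a sinusoidal reference with a triangular carrier can be sketched as below. This is plain sine-triangle PWM for one phase leg, not the authors' space vector implementation; the 60-Hz reference and 5-kHz carrier come from the text, while the modulation index of 0.8, the single carrier spanning -1 to 1, and all names are assumptions made for the illustration.

```python
import numpy as np


def triangle(t, f_carrier):
    """Unit triangular carrier in [-1, 1] with frequency f_carrier (Hz)."""
    phase = (t * f_carrier) % 1.0
    return 4.0 * np.abs(phase - 0.5) - 1.0


def gate_signal(t, f_ref=60.0, f_carrier=5000.0, modulation_index=0.8):
    """True where the upper switch is on: reference sine above the carrier."""
    reference = modulation_index * np.sin(2.0 * np.pi * f_ref * t)
    return reference > triangle(t, f_carrier)


# One period of the 60-Hz reference, sampled much faster than the carrier.
t = np.linspace(0.0, 1.0 / 60.0, 20000, endpoint=False)
s_ap = gate_signal(t)   # upper switch of phase a
s_an = ~s_ap            # lower switch is complementary (dead time ignored)

duty_full_period = float(s_ap.mean())
duty_positive_half = float(s_ap[: t.size // 2].mean())
print(f"average duty over one period: {duty_full_period:.3f}")
print(f"average duty over the positive half-cycle: {duty_positive_half:.3f}")
```

The duty cycle follows the reference: it averages about one half over a full period and is higher while the reference is positive, which is what produces the sinusoidal load current after filtering by the RL load.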
To model the IGBT switches of the inverter for signature studies, the switches
are treated as off at one time instant and on at the next. This alternation
follows the switching frequency of the converter. To do this in the FE simulation,
the plate between the load and the positive bus, shown in Figure 9.78, is treated
as a conductive plate for the switch-on case and as a nonconductive plate for the
switch-off case. This change in the conductivity of the plate occurs 5,000 times
per second, corresponding to the switching frequency (5 kHz).
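
The alternating plate property can be expressed as a time-dependent conductivity. The sketch below is hypothetical: the fixed 50% duty cycle, the on-state value (taken as that of copper, about 5.8e7 S/m), and the zero off-state value are assumptions, since the source specifies only the 5-kHz switching frequency. An FE solver may also need a small nonzero off-state conductivity for numerical reasons.

```python
def plate_conductivity(t, f_switch=5000.0, duty=0.5, sigma_on=5.8e7, sigma_off=0.0):
    """Conductivity (S/m) of the switch plate at time t (s).

    The plate is conductive during the first `duty` fraction of each
    switching period and nonconductive for the remainder.
    """
    phase = (t * f_switch) % 1.0  # position within the current switching period
    return sigma_on if phase < duty else sigma_off


# Sample one switching period (200 microseconds at 5 kHz) at four instants.
period = 1.0 / 5000.0
for fraction in (0.0, 0.25, 0.5, 0.75):
    t = fraction * period
    print(f"t = {t * 1e6:6.1f} us -> sigma = {plate_conductivity(t):.1e} S/m")
```

In a PWM-driven model the `duty` argument would vary from period to period, following the gate signal of the corresponding switch.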

Figure 9.77 Line current and voltage in the case of SVPWM.

Figure 9.78 Physical model of the inverter switches.



9.4.2 Simulation and Experiment

The proposed setup consists of an inverter, an induction machine, connecting
cables, and an AC load. The study is divided into two cases for further
investigation: the converter connected to an AC load as case 1 (Section 9.4.2.1)
and the converter connected to an induction motor as case 2 (Section 9.4.2.2). Each
case is discussed along with its application.

9.4.2.1 Case 1: Converter Connected to the Load

The converter shown schematically in Figure 9.75 is implemented based on the
above procedure and modification. The simulation takes 6 hours with
about 1 million elements, including face, line, and node meshes, and 6
million degrees of freedom. The large number of elements is necessary because of
the very small surfaces, edges, and lines in the critical parts of the inverter and
cable, as shown in Figure 9.79. The details of the FE modeling are given in [21-25].
The simulation is run on a computer with a 16-core Intel Xeon 3.47-GHz CPU
and 192 GB of RAM.
Because the study has two cases, two types of results are defined. In case 1, the
fields generated by the system on three different surfaces at a distance in space
are taken as the result; in case 2, the harmonics of the fields and the frequency
responses are investigated. Hence, in this case, the generated stray magnetic and
electric fields are obtained in 3-D at a given distance for both switching states.
Figures 9.80(a) and (b) show that turning the switches on and off affects only the
amplitude of the magnetic field density; the spatial distribution of the stray
magnetic field on the slices does not change significantly. This is due to the
presence of the AC load, which is discussed further below. The electric field
shown in Figure 9.81, however, illustrates that when the switches turn on, the
field in two lateral planes, the x-y and y-z planes, increases while the field in
the x-z plane decreases. The increase in these planes is due to the flow of current
in the switches and to the creation of a current loop. The reduction is caused by
superposition, which acts as suppression in this case. The suppression occurs
because the fields propagate into the other conductive parts of the devices in the
vicinity, which in turn induce a stray field. Because of Lenz’s law [26], this
induced field is directed opposite to the main field. Therefore, the induced stray
field is subtracted from the main stray field, and the total field decreases, as in
Figure 9.81(a).

Figure 9.79 Mesh pattern of the modeled inverter.

(a) (b)
Figure 9.80 Stray magnetic field density of the system: (a) IGBT switched on and (b) IGBT
switched off (µT).

(a) (b)
Figure 9.81 Stray electric field distribution: (a) IGBT switched on and (b) IGBT switched off (µV/m).

To identify which element of the setup has the greatest effect on the total field,
the stray magnetic field of each component in the setup is analyzed individually to
observe its spectrum and compare it with the overall field. The results are shown
in Figure 9.82(a–d). When only the load is switched on [Figure 9.82(c)], the stray
magnetic field is higher than in Figures 9.82(a) and 9.82(b), and the total field
is affected by it [compare Figure 9.82(c) with Figure 9.82(d)]. The reason is that
the AC load has larger conductive elements, including iron and copper materials,
than the other elements in the setup.

(a) (b)

(c) (d)

Figure 9.82 Stray magnetic field density of the system (µT): (a) only the cable is switched on; (b)
only the inverter is switched on; (c) only the load is switched on; and (d) the whole
system is switched on.

To investigate the effect of superposition, the field generated at an arbitrary
point (p), as shown in Figure 9.82(c), can be used. For instance, the values at the
point (0.0, 5.0 m, 0.0) in the three cases of Figure 9.9 and Figures 9.82(a–c) are
summed. The result is 1.31e3 (µT), while the overall maximum point is 1.10e3 (µT),
as shown in Figure 9.82(d). This can be due to the dissimilarity of the
permeabilities and conductivities of the elements of the model. If the resistance
of one element is lower than that of another element in the vicinity and there is
no shield between them, an EMF will be induced from the component with lower
conductivity into the one with higher conductivity [27]. As mentioned above, due
to Lenz’s law, the field radiated by the induced EMF opposes the main field.
Therefore, the overall field is less than the sum of the individual fields.
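As a rough check of the two values quoted above, the relative shortfall of the overall field with respect to the plain sum of the component fields can be worked out directly. The symbols B_sum and B_total are notation introduced here for illustration; only the ratio of the two quoted numbers is used, so the result does not depend on their common scale.

\[
\frac{B_{\mathrm{sum}} - B_{\mathrm{total}}}{B_{\mathrm{sum}}}
= \frac{1.31 - 1.10}{1.31} \approx 0.16
\]

Under this reading, the overall field at that point is about 16% lower than the sum of the individual contributions.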

9.4.2.2 Case 2: Converter Connected to the Motor

In this case, the inverter is connected to an induction motor. The aim is to
investigate the radiation of harmonic fields from the inverter as the distance
and the speed of the motor change. The parameters of the induction motor are: 5.5
kW, 3-phase, 208 V, PF: 0.85, length: 30 cm, diameter: 25 cm, number of poles: 4.
This case is simulated using FEM, as shown in Figure 9.83(a).

(a)

(b)

Figure 9.83 The scheme of the setup of case 2: (a) FEM simulation and (b) measurement setup.

The simulation is computed in 6 hours with 950,000 elements and 5.7 million
degrees of freedom. Since the case includes very small elements and also nonlinear
materials (e.g., the core of the machine), the simulation of the inverter connected to
the load or motor may take 8 hours or more for only one time instant.
Generally, linear or nonlinear solvers are used in FEM simulations. In this case,
since several materials have nonlinear characteristics, the linear solver cannot
be used. However, using nonlinear materials increases the simulation time
dramatically. Hence, the choice of the solver and the associated iterative
technique is modified. Instead of using a linear or curved commutation curve, the
slope of the curve in several zones was calculated (µr1, µr2, …) and used in place
of the commutation curve, as shown in Figure 9.84.
The benefit of this modification is that the magnetic flux density of a
component varies only within a small interval because of the steady-state
condition of the system. For example, the magnetic flux density of the stator core
of the induction motor is about 1.52 T in power-frequency analysis (50–60 Hz). For
higher frequencies, it drops below 1 T. Therefore, in this case, a specific zone of
the permeability can be chosen for this component. Similarly, the permeability of
the other components of the system can be chosen based on the working frequency.
The unused parts of the commutation curves of the elements are thereby avoided,
and the simulation time decreases. This algorithm can be defined in the material
properties part of the FEM simulation.
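The zone selection described above can be illustrated with a small lookup. This is only a sketch under stated assumptions: the zone boundaries, the relative permeability values, and the function name are hypothetical and are not taken from the simulation in the text.

```python
import numpy as np

# Hypothetical zone edges (T) and relative permeabilities; real values would
# come from the measured magnetization curve of each material.
B_BREAKS = np.array([0.0, 1.0, 1.5, 2.0])      # zone edges in tesla
MU_R = np.array([4000.0, 1500.0, 200.0])       # mu_r1, mu_r2, mu_r3

def zone_permeability(b_operating):
    """Return the constant mu_r of the zone that contains the operating flux density."""
    if not (B_BREAKS[0] <= b_operating <= B_BREAKS[-1]):
        raise ValueError("operating point outside the tabulated curve")
    # Index of the zone whose lower edge is the last edge not exceeding b_operating.
    idx = np.searchsorted(B_BREAKS, b_operating, side="right") - 1
    idx = min(idx, MU_R.size - 1)               # the top edge belongs to the last zone
    return float(MU_R[idx])

mu_power_frequency = zone_permeability(1.52)    # operating point near saturation
mu_high_frequency = zone_permeability(0.8)      # operating point below 1 T
```

Each component is then assigned a single constant permeability for the frequency of interest, which is what lets a solve proceed without iterating over the full nonlinear curve.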

[Figure: commutation curve divided into zones with slopes µr1, µr2, and µr3.]

Figure 9.84 Commutation curve of some materials used in the simulation.

In addition to the modification in defining the material properties, some
modifications to the solver are needed to obtain a flexible solution. Hence, the
flexible generalized minimal residual technique (FGMRES), with a Krylov method as
the preconditioner, was used as the iterative solver. FGMRES is a variant of the
GMRES method with flexible preconditioning that enables the use of a different
preconditioner at each step of the Arnoldi process. The Krylov subspace is a linear
subspace which enables multipreconditioning [28]. In particular, a few steps of
GMRES can be used as a preconditioner for FGMRES. The flexibility of this solution
method is beneficial for problems with nonlinear material characteristics, such as
the motor’s core. As a result, the simulation time decreases from over 8 hours to
20 minutes. More explanation is given in [29].
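The idea of changing the preconditioner at each Arnoldi step can be sketched with a small, unrestarted flexible GMRES routine. This is an illustrative sketch, not the solver used in the FE package: the test matrix is synthetic, the inner preconditioner is a varying number of Jacobi sweeps (chosen here for brevity instead of inner GMRES steps), and all names are hypothetical.

```python
import numpy as np

def jacobi_sweeps(A, v, sweeps):
    """Approximate A^{-1} v with a few Jacobi iterations (a cheap inner solver)."""
    d = np.diag(A)
    z = np.zeros_like(v)
    for _ in range(sweeps):
        z = z + (v - A @ z) / d
    return z

def fgmres(A, b, precond, max_iter=50, tol=1e-10):
    """Flexible GMRES without restarts, starting from x0 = 0.

    precond(v, j) may return a different approximation of A^{-1} v at every
    Arnoldi step j; the preconditioned vectors are kept in Z.
    """
    n = b.size
    beta = np.linalg.norm(b)
    V = np.zeros((n, max_iter + 1))     # orthonormal Arnoldi basis
    Z = np.zeros((n, max_iter))         # preconditioned search directions
    H = np.zeros((max_iter + 1, max_iter))
    V[:, 0] = b / beta
    for j in range(max_iter):
        Z[:, j] = precond(V[:, j], j)
        w = A @ Z[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        # Small least-squares problem: minimize || beta*e1 - H y ||.
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[: j + 2, : j + 1], rhs, rcond=None)
        residual = np.linalg.norm(rhs - H[: j + 2, : j + 1] @ y)
        if residual <= tol * beta or H[j + 1, j] < 1e-14:
            break
        V[:, j + 1] = w / H[j + 1, j]
    # The solution is built from the preconditioned vectors, not from V.
    return Z[:, : j + 1] @ y, j + 1

# Synthetic, diagonally dominant test system (assumed for illustration only).
rng = np.random.default_rng(0)
n = 40
A = 4.0 * np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

# The number of inner sweeps changes with the step index j.
x, iterations = fgmres(A, b, lambda v, j: jacobi_sweeps(A, v, 1 + j % 3))
```

Because the solution is assembled from the stored preconditioned vectors, the inner solver is free to change from step to step, which standard right-preconditioned GMRES does not allow.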
In addition to the simulation, the experimental setup is implemented in a
chamber that isolates the setup from the outside environment, as shown in Figure
9.83(b). The coil antenna is located 10 cm away from the inverter to capture the
stray magnetic field. The fields are transferred to an EMI receiver (a real-time
spectrum analyzer) through a cable with 50-Ω impedance.
The magnetic field intensity (H-field) generated by the setup in simulation
is shown in Figure 9.85. The H-field at 5 kHz is shown on a slice 10 cm away from
the setup, the same distance as in the experimental setup. As illustrated in this
figure, the amplitude of the stray field around the inverter box is higher than
elsewhere. The reason is that the switching frequency of the inverter is 5 kHz, the
same as the frequency depicted in the simulation figure. The simulation was run at
several other frequencies, but only the result at the switching frequency of the
inverter, 5 kHz, is shown here.

Figure 9.85 Stray magnetic field intensity of the setup case 2 at 5 kHz simulated in FEM (µA/m).

The setup is also implemented experimentally. The frequency response from
DC to 20 kHz is obtained and shown in Figure 9.86.

Figure 9.86 Measured frequency response of the stray magnetic field intensity of the setup case 2
from DC to 20 kHz (dBµA/m).

The unit of the simulation result is µA/m, while the unit of the experimental
results is dBµA/m. The two units are related by (9.7). Using this equation, the
experimental peak of the stray magnetic field at 5 kHz at the given distance is
−4.37 dBµA/m (0.61 µA/m), which is very close to the simulated value (see Figure
9.85).

$$\frac{\mu\mathrm{A}}{\mathrm{m}} = 10^{\frac{\mathrm{dB}\mu\mathrm{A/m}}{20}} \qquad (9.7)$$
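The conversion in (9.7) and its inverse can be written as two short helpers. The function names are hypothetical, and the example level of −4.37 dBµA/m is an assumption: it is the sign that makes the conversion consistent with the 0.61 µA/m quoted in the text.

```python
import math

def dbua_per_m_to_ua_per_m(level_db):
    """Convert a field level in dBuA/m to a linear value in uA/m, following (9.7)."""
    return 10.0 ** (level_db / 20.0)

def ua_per_m_to_dbua_per_m(h_ua_per_m):
    """Inverse conversion: linear uA/m to dBuA/m (requires a positive value)."""
    return 20.0 * math.log10(h_ua_per_m)

# Assumed example level for the measured 5-kHz peak.
peak_linear = dbua_per_m_to_ua_per_m(-4.37)
peak_db_again = ua_per_m_to_dbua_per_m(peak_linear)
```

With this input the linear value comes out at roughly 0.60 µA/m, in line with the figure given in the text, and a level of 0 dBµA/m corresponds to exactly 1 µA/m.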

9.4.3 Applications of the Frequency Response Analysis of the Stray Field

Following the experimental verification of the simulation results, related
applications such as monitoring of components for the diagnosis of failures and
shielding are investigated.
As shown in Figure 9.86, the first peak, located at a very low frequency, is
generated by the induction motor, since the motor operates at the power frequency
of 60 Hz. Because the working frequencies of the components in the system are
different, the behavior of each component can be investigated individually. This
is very useful for monitoring the condition of the motor and the inverter, as well
as for detecting their faults. For example, in case 2 (see Figure 9.86), if a
failure occurs in the motor, the peak at the power frequency and the related higher
harmonic orders will shift along the frequency band, or their amplitudes will
change. Similarly, failures in the inverter may cause the same type of changes at
the switching frequency and the related higher harmonic orders. Note that the peaks
at 10 kHz and 15 kHz in Figure 9.86 are due to the second and third harmonics of
the inverter. The frequency responses between the harmonics are noise and
subharmonics.
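The way individual components show up as separate spectral lines can be illustrated with a synthetic signal. This is a sketch under assumptions: the sampling rate, the record length, and the line amplitudes are invented for illustration, and the signal is a stand-in for the antenna measurement, not measured data.

```python
import numpy as np

fs = 50_000.0                      # assumed sampling rate (Hz)
n = 25_000                         # 0.5-s record, giving 2-Hz resolution
t = np.arange(n) / fs

# Synthetic stand-in: motor line at 60 Hz plus inverter lines at 5, 10, 15 kHz.
components = {60.0: 1.0, 5_000.0: 0.6, 10_000.0: 0.2, 15_000.0: 0.1}
signal = sum(a * np.sin(2.0 * np.pi * f * t) for f, a in components.items())

spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / n     # single-sided amplitude
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

def strongest_lines(freqs, spectrum, count):
    """Frequencies of the `count` largest spectral lines, in ascending order."""
    idx = np.argsort(spectrum)[-count:]
    return sorted(float(f) for f in freqs[idx])

def line_amplitude(freqs, spectrum, f_target):
    """Amplitude at the bin closest to f_target."""
    return float(spectrum[np.argmin(np.abs(freqs - f_target))])

peaks = strongest_lines(freqs, spectrum, 4)
inverter_level = line_amplitude(freqs, spectrum, 5_000.0)
```

A monitoring scheme along the lines described in the text would compare such line frequencies and amplitudes against a healthy baseline and flag shifts in either.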

[Plot: magnetic field (dBµA/m, 0 to −50) versus frequency (Hz, 0 to 7,500). Curves: without shield (measurement), without shield (simulation), with shield (measurement), with shield (simulation).]

Figure 9.87 Stray magnetic field intensity of the setup case 2 from DC to 7.5 kHz (dBµA/m) at 5 cm
away from the inverter with and without the shield by means of simulation and
measurement.

(a)

(b)
Figure 9.88 Stray magnetic field intensity of the setup at case 2 from DC to 20 kHz (dBµA/m) at 5
cm away from the inverter (a) without shield and (b) with shield.

As another application of this case, shielding in the vicinity of the switch,
at 5 cm, is tested. Figure 9.87 shows the frequency response of the stray H-field
with and without the shield between the switches and the antenna, obtained by
simulation and measurement. With a steel shield (Steel 1018 in this test), the
noise, that is, the subharmonics between the main harmonic orders, decreases
dramatically. The experimental results over a wider frequency band, DC to 20 kHz,
are shown in Figure 9.88 to illustrate the effect of shielding on the other
harmonic orders.
Consequently, considering this test, the main harmonics and the related
subharmonics can help in selecting a shield with proper characteristics, including
the permittivity and permeability. Comparing the curves of Figure 9.87, the
simulation result is similar to the experimental one. Hence, the proposed shield
can be studied and optimized using the physics-based simulation. The permittivity,
permeability, conductivity, and other physical characteristics of the shield can
be altered and optimized for the best electromagnetic compliance or any other
purpose using the simulation and experimental design.

9.5 HIGH-FREQUENCY EQUIVALENT SOURCE MODELING

For EMC modeling of power components, especially power electronic
components such as drives and converters, high-frequency models are needed. These
are implemented based on Figure 9.89. The detailed models of the power converters
are given in [32].

Figure 9.89 Inverter circuit of the AC motor drive, used in simulation, with inclusion of parasitic
components.

Figure 9.90 shows the connection of a three-phase 42-V inverter, an armored
power cable, and a three-phase permanent magnet synchronous motor (PMSM). The
inverter uses power IGBTs as the switching devices. In the inverter model, all the
semiconductor devices are substituted with their corresponding physics-based
models. To simulate such an inverter drive, the time-domain simulation approach is
used. To construct the simulation model for a motor-drive system, the three major
components of the system (i.e., inverter, cable, and PMSM) are replaced with their
corresponding physics-based models.

(a)

(b)
Figure 9.90 Schematic view of a motor-drive system: (a) schematic of motor-drive system used for
the CM measurement and (b) experimental setup.

The test setup used to measure the frequency spectrum at different points in
the drive system is shown in Figure 9.90(b). The illustrated test setup consists of a
DC power supply, line impedance stabilization network (LISN), inverter circuit, a
2-m long armored power cable, and a 250-watt PMSM. To measure the common
mode current, all these components are assembled on a metallic plate.
Subsequently, the conducted current can be measured between these plates. In
order to avoid a time-consuming computing process and to get a better evaluation,
the frequency-domain simulation approach is used.
Figure 9.91 shows the system structure in the FE model. This model was
solved to estimate the values of the parasitic elements in the circuit model. Figure
9.92 compares the conducted common mode EMI from the measurement data with that
from two modeling approaches in the frequency domain.
To study the effectiveness of our models, the equivalent models for the cable
and the PMSM are added to the inverter model, and the simulation results are
compared to the experimental results. To verify the accuracy of our numerical
results, the common mode current of the setup in Figure 9.91 is measured using a
current probe with 100-MHz bandwidth. The current of Figure 9.91 is measured at the
ground port of the input DC power supply.

(a) (b)
Figure 9.91 The FE meshes of (a) the BCP and (b) the converter numerical models.

[Plot: |Y(f)| (dBV) versus frequency (MHz, 0.5 to 5). Curves: measurement of actual system, modified model considering the switches HF model, conventional model.]

Figure 9.92 Frequency spectrum of the common mode current.

As an application of this modeling, three switching techniques (hysteresis,
space vector modulation, and sinusoidal PWM) with a carrier frequency of 5 kHz are
applied to the inverter. A three-phase 5-kW RL load is connected to the inverter,
which makes the inverter operate at nominal power. The calculated currents are
then injected into the terminals in the 3-D FE-based model. The 3-D FE solution is
obtained using harmonic propagation analysis. The time step for this simulation is
5 µs. To validate the numerical results, the experimental setup is implemented and
the inverter’s phase current spectrum is measured using a spectrum analyzer and a
100-MHz current probe. The measured and simulated frequency spectra of the
inverter’s phase current for SVM, as an example, are shown in Figure 9.93.
Subsequently, the magnetic field spectrum of the components designed using the
physics-based modeling is shown in Figure 9.94.

[Plot: |Y(f)| (dBV) versus frequency (Hz, 10^1 to 10^5). Curves: simulation, experimental.]

Figure 9.93 Phase comparison of the frequency spectrum of the inverter current between the
equivalent model and experiments.

(a) (b)
Figure 9.94 Magnetic flux density at different switching patterns: (a) before switch and (b) during
switching.

The figure indicates that, within this model, various parameters can be changed
and studied in order to identify an EMI mitigation strategy during the design stage
of these systems. In the proposed model, the EMI can be analyzed at any point or
plane within the simulation volume and can be solved for different switching
patterns. The time dependence of the radiated EMI can also be evaluated using the
model. We can see how the magnetic flux density behaves over time at specific
locations and for various switching patterns and frequencies. Furthermore, the
field image can be obtained for various scenarios specified by the designer,
providing the designer with information quickly. This allows efficient and
effective complete design work using CEM.

9.6 OPTIMIZATION OF POWER ELECTRONIC CONVERTERS USING PHYSICS-BASED MODELS

In this study, the optimization of a physics-based representation of a
frequency modulated switch mode converter is presented. The proposed physics-based
model can be used to evaluate the EMI in the structure of the converter. In this
method, the magnetic components of the power converter and their positions on the
circuit board are modeled numerically using FE analysis. Subsequently, the
placement of the components and the electrical and operating parameters of the
converter are optimized to limit the propagated EMF of the components. This is
essential in the design of power converters and in the evaluation of their EMI
interactions for EMC compliance.
Figure 9.95 depicts the flowchart of the parameter-optimizing procedure using
a genetic algorithm (GA). The parameters for optimization are the converter
operating parameters and the placement of the magnetic components with respect to
each other. The GA evolves the given population of individuals. The objective
function consists of the area of the circuit board and the energy of the output
voltage signal. To change the area that confines the two inductors, the filter
inductor position is taken as a reference and the resonant inductor placement is
varied around this reference point. Figure 9.96 illustrates the process of
changing the placement of the two inductors with respect to each other.
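The placement search described above can be sketched with a small genetic algorithm. Everything specific here is an assumption made for illustration: the cost function (bounding-box area plus a 1/d³ coupling term standing in for the EMI-related term), its weights, the search bounds, and the GA settings are hypothetical and do not reproduce the objective evaluated by FE analysis in the text.

```python
import numpy as np

def objective(pos, size=10.0, w_area=1.0, w_emi=5.0e6):
    """Hypothetical cost: bounding-box area (mm^2) plus a 1/d^3 coupling proxy."""
    dx, dy = np.abs(pos)                  # offset of the resonant inductor (mm)
    area = (dx + size) * (dy + size)
    distance = np.hypot(dx, dy) + 1e-9    # avoid division by zero at the origin
    return w_area * area + w_emi / distance**3

def genetic_minimize(cost, bounds, pop_size=30, generations=60, seed=0):
    """Minimal real-coded GA: truncation selection, blend crossover, elitism."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    pop = rng.uniform(low, high, size=(pop_size, 2))
    history = []
    for _ in range(generations):
        costs = np.array([cost(p) for p in pop])
        order = np.argsort(costs)
        pop, costs = pop[order], costs[order]
        history.append(costs[0])
        elite = pop[0].copy()                          # keep the best layout
        parents = pop[: pop_size // 2]                 # better half of the population
        a = parents[rng.integers(0, len(parents), pop_size)]
        b = parents[rng.integers(0, len(parents), pop_size)]
        mix = rng.uniform(0.0, 1.0, size=(pop_size, 1))
        children = mix * a + (1.0 - mix) * b           # blend crossover
        children += rng.normal(0.0, 0.02 * (high - low), children.shape)  # mutation
        pop = np.clip(children, low, high)
        pop[0] = elite
    costs = np.array([cost(p) for p in pop])
    history.append(costs.min())
    return pop[np.argmin(costs)], history

# The filter inductor sits at the origin; the resonant inductor is moved around it.
best_position, history = genetic_minimize(objective, bounds=(-60.0, 60.0))
```

Because the best individual is carried over unchanged, the best cost recorded in `history` never increases from one generation to the next.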

Figure 9.95 The optimization process diagram.



Figure 9.96 Iteration accomplished by GA to minimize the objective function.

Table 9.9 shows the results of the optimization process. The magnetic
component positions of this converter are shown in Figure 9.97. The power converter
clearly shows poor EMI performance at the initial design stage, as shown in Figure
9.97(a). The FE analysis is performed to observe the near-field effects for the
given layout. The best EMI performance versus geometry of the board and the
frequency is shown in Figure 9.97(b).
Figure 9.98 compares the input current spectrum, filter inductor current
spectrum, and output voltage spectrum of the converter in the ideal case and
physics-based mode (nonoptimized case), respectively. In the optimized case, the
peak of the frequency spectrum is lower than in the nonoptimized case. Figure 9.99
shows the circuit layout of the converter in the optimized case. In this case, the
magnetic components are placed so that the magnetic field generated by each one
interferes less with the other. More details are given in [33].

Table 9.9
Optimization Results

Parameters          Lr (µH)    f (kHz)    Area (mm²)
Initial design      120        50         3,360
Optimized design    45         90         11,000

(a) (b)
Figure 9.97 Layout of the system: (a) before optimization and (b) after optimization.

[Plot: |Y(f)| (dBV) versus frequency (Hz, 10^3 to 10^6). Curves: nonoptimized case, optimized case.]
Figure 9.98 Comparison of the FFT spectrum between optimized and nonoptimized quasi-resonant
converter.

Figure 9.99 Circuit of the zero current switching (ZCS) quasi-resonant buck converter in the
optimized layout.

9.7 SUMMARY

This chapter reviewed physics-based modeling analysis for the purpose of
EMC evaluation in a multicomponent power system. It introduced the algorithm of
physics-based modeling for both low- and high-frequency analysis. The
equivalent source modeling of the powertrain was implemented for EMC studies,
and the results showed that the equivalent model can produce the same result as the
full model with significantly less simulation time. The model has been used for
condition monitoring of the components based on the EM signatures. Moreover,
the optimization of the switching algorithm and the proper placement of the
magnetic components on the PCB were achieved based on the radiated EMFs.

REFERENCES

[1] W. Zhang, M. Zhang, F. Lee, J. Roudet, and E. Clavel, “Conducted EMI Analysis of a Boost
PFC Circuit,” IEEE Appl. Power Electron. Conf., pp. 223–229, 1997.
[2] B. Revol, et al., “EMI Study of a Three Phase Inverter-Fed Motor Drives,” IEEE Industry
Applications Conference, Vol. 4, 2004.
[3] Y. Zhong, et al., “HF Circuit Model of Conducted EMI of Ground Net Based on PEEC,”
Zhongguo Dianji Gongcheng Xuebao, Vol. 25, No. 17, 2005.
[4] H. Zhu, et al., “Analysis of Conducted EMI Emissions from PWM Inverter Based on Empirical
Models and Comparative Experiments,” 30th IEEE Annual Power Electronics Specialists
Conference, Vol. 2, 1999.
[5] X. Pei, et al., “Analytical Estimation of Common Mode Conducted EMI in PWM Inverter,” IEEE
Industry Applications Conference, Vol. 4, pp. 1–4, 2004.
[6] L. Sevgi, et al., “EMC and BEM Engineering Education: Physics-Based Modeling, Hands-on
Training, and Challenges,” IEEE Antennas and Propagation Magazine, Vol. 45, No. 2,
pp. 114–119, 2003.
[7] D. Dixon, M. Obara, and N. Schade, “Finite-Element Analysis (FEA) as an EMC Prediction
Tool,” IEEE Transactions on Electromagnetic Compatibility, Vol. 35, No. 2, pp. 241–248, 1993.
[8] A. Sarikhani, M. Barzegaran, and O. Mohammed, “Optimum Equivalent Models of Multi-Source
Systems for the Study of Electromagnetic Signatures and Radiated Emissions from Electric
Drives,” IEEE Transactions on Magnetics, Vol. 48, No. 2, pp. 1011–1014, 2012.
[9] M. Barzegaran, A. Sarikhani, and O. Mohammed, “An Optimized Equivalent Source Modeling
for the Evaluation of Time Harmonic Radiated Fields from Electrical Machines and Drives,”
Applied Computational Electromagnetics Society Journal, Vol. 28, No. 4, pp. 273–282, 2013.
[10] M. Barzegaran, A. Sarikhani, and O. Mohammed, “An Equivalent Source Model for the Study of
Radiated Electromagnetic Fields in Multi-Machine Electric Drive Systems,” 2011 IEEE
International Symposium on Electromagnetic Compatibility, Long Beach, CA, pp. 442–447,
2011.
[11] A. Rosales, A. Sarikhani, and O. Mohammed, “Evaluation of Radiated Electromagnetic Field
Interference due to Frequency Switching in PWM Motor Drives by 3D Finite Elements,” IEEE
Transactions on Magnetics, Vol. 47, No. 5, pp. 1474–1477, 2011.
[12] M. Vetterli and C. Herley, “Wavelets and Filter Banks: Theory and Design,” IEEE Trans. on
Signal Processing, Vol. 40, No. 9, pp. 2207–2232, 1992.
[13] R. Coifman, Y. Meyer, and M. Wickerhauser, “Wavelet Analysis and Signal Processing,” in
Wavelets and Their Applications, Boston, MA: Jones and Bartlett, pp. 153–178, 1992.
[14] T. Chow, Introduction to Electromagnetic Theory: A Modern Perspective, Boston, MA: Jones &
Bartlett, 2006.
[15] M. Barzegaran, “Physics-Based Modeling of Power System Components for the Evaluation of
Low-Frequency Radiated Electromagnetic Fields,” PhD Dissertation, Florida International
University, FIU Electronic Theses and Dissertations, Paper 1193, 2014.
[16] Department of Defense Interface Standard, Requirements for the Control of Electromagnetic
Interference Characteristics of Subsystems and Equipment, MIL-STD-461, 2007.

[17] M. Ubeid, M. Shabat, and M. Sid-Ahmed, “Effect of Negative Permittivity and Permeability on
the Transmission of Electromagnetic Waves through a Structure Containing Left-Handed
Material,” Natural Science Magazine, Vol. 3, No. 4, pp. 328–333, 2011.
[18] M. Barzegaran, A. Mazloomzadeh, and O. Mohammed, “Fault Diagnosis of the Asynchronous
Machines through Magnetic Signature Analysis Using Finite Element Method and Neural
Networks,” IEEE Transactions on Energy Conversion, Vol. 28, No. 4, pp. 1064–1071, 2013.
[19] F. Ulaby, Fundamentals of Applied Electromagnetics, 5th edition, Upper Saddle River, NJ:
Prentice Hall, pp. 321–324, 2006.
[20] M. Barzegaran and O. Mohammed, “A Generalized Equivalent Source Model of AC Electric
Machines for Numerical Electromagnetic Field Signature Studies,” IEEE Transactions on
Magnetics, Vol. 48, No. 11, pp. 4440–4403, 2012.
[21] F. Lattarelo, Electromagnetic Compatibility in Power Systems, 1st edition, New York: Elsevier,
2006.
[22] M. Barzegaran, A. Nejadpak, and O. Mohammed, “Evaluation of High Frequency
Electromagnetic Behavior of Planar Inductor Designs for Resonant Circuits in Switching Power
Converters,” Applied Computational Electromagnetics Society (ACES) Journal, Vol. 26, No. 9,
pp. 737–748, 2011.
[23] M. Barzegaran and O. Mohammed, “3-D FE Equivalent Source Modeling and Analysis of
Electromagnetic Signatures from Electric Power Drive Components and Systems,” IEEE
Transactions on Magnetics, Vol. 49, No. 5, pp. 1937–1940, 2013.
[24] G. Skibinski, R. Kerkman, and D. Schlegel, “EMI Emissions of Modern PWM AC Drives,”
IEEE Ind. Appl. Mag., Vol. 5, No. 6, pp. 47–80, 1999.
[25] O. Martins, S. Guedon, and Y. Marechal, “A New Methodology for Early Stage Magnetic
Modeling and Simulation of Complex Electronic Systems,” IEEE Trans. Magn., Vol. 48, No. 2,
pp. 319–322, 2012.
[26] D. Giancoli, Physics: Principles with Applications, Upper Saddle River, NJ: Pearson Education,
p. 624, 2005.
[27] C. Paul, Inductance: Loop and Partial, Hoboken, NJ: Wiley-IEEE Press, p. 195, 2011.
[28] W. Arnoldi, “The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue
Problem,” Quarterly of Applied Mathematics, Vol. 9, pp. 17–29, 1951.
[29] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, 3rd edition, New York: Springer,
2002.
[30] M. Barzegaran and O. Mohammed, “Multi-Dipole Modeling of XLPE Cable for Electromagnetic
Field Studies in Large Power Systems,” International Journal for Comp. and Math. in Electrical
Eng., Vol. 33, No. 1, 2014.
[31] M. Barzegaran and O. Mohammed, “Near Field Evaluation of Electromagnetic Signatures from
Wound Rotor Synchronous Generators Using Equivalent Source Modeling in Finite Element
Domain,” 28th ACES Conf., Columbus, OH, Apr. 2012.
[32] A. Nejadpak, “Development of Physics-Based Models and Design Optimization of Power
Electronic Conversion Systems,” FIU Electronic Theses and Dissertations, P. 824, 2013.
[33] A. Nejadpak and O. Mohammed, “Physics-Based Optimization of EMI Performance in
Frequency Modulated Switch Mode Power Converters,” Electromagnetic Field Problems and
Applications (ICEF), pp. 1–4, 2012.
Chapter 10
Manipulation of Electromagnetic Waves Based
on New Unique Metamaterials: Theory and
Applications
Qun Wu, Jiahui Fu, Fanyi Meng, Kuang Zhang, and Guohui Yang

Metamaterials are typically engineered by arranging a set of unit cells in a regular
array throughout a region of space, thus obtaining some desirable macroscopic
electromagnetic behavior. The desired property is often one that is not normally
found naturally (negative refractive index, near-zero index, and so forth). Over the
past few years, the flexibility of metamaterials in choosing the numerical
values of the effective permittivity or permeability has led to many novel
theoretical and practical possibilities for different applications, ranging from
microwave to optical frequencies. In this chapter, we review the theoretical basis
by which metamaterials can manipulate electromagnetic waves and further
discuss their applicability to various devices or components, including: (1) novel
devices based on optical transformation, such as invisibility cloaks, energy
concentrators, waveguide connectors, and multibeam antennas; and (2) gain
enhancement metamaterial lenses.

10.1 INTRODUCTION

Recently, metamaterials have been attracting growing attention. Responding to the
incident electric and/or magnetic fields, metamaterials can exhibit specific
effective permittivity ε and/or permeability μ, including double negative (DNG, in
which the real parts of both ε and μ are negative, first proposed theoretically by
Veselago in 1968 [1]), single negative (ENG [2] or MNG [3], in which the real part
of ε or μ is negative), and zero-index metamaterials [4] (ZIM, in which the real
part of ε and/or μ is near zero). Owing to these exotic electromagnetic
characteristics, metamaterials have shown great potential in applications such as
an invisibility cloak [5], a perfect lens [6], and many other novel applications in
the microwave [7–9], terahertz [10], and optical regimes [11].
Manipulation of electromagnetic waves as desired has been a hot topic in the
field of electromagnetism for a long time. The emergence of metamaterials provides
great opportunities to control the transmission and distribution of
electromagnetic waves and energy. In this chapter, the applications of
metamaterials in the manipulation of electromagnetic waves are discussed. In
Section 10.2, the theory of transform optics is introduced. Then, based on the form
invariance properties, an electromagnetic energy concentrator and a waveguide
connector are proposed and simulated. After the constitutive parameters are
simplified, simulations are carried out to verify the theoretical work. In Section
10.3, zero-index metamaterials with matched impedance are constructed and applied
to enhance the gain of a horn antenna. Measurements of the gain and far-field
pattern verify the theoretical design. In Section 10.4, metamaterials are applied
to build a novel broadband absorber. A brief conclusion is given in Section 10.5.

10.2 THEORY OF TRANSFORM OPTICS AND APPLICATIONS

10.2.1 Theory of Transform Optics

According to the Minkowski relationship, Maxwell’s equations keep their form
under a change of coordinates or space. The change of coordinate or space
system is reflected in the expression of the operator. This is the basic idea of
the theory of transform optics, as shown in Figure 10.1. The optical path remains
invariant under the transformation. Next, we focus on the derivation of transform
optics.
First, we can express Maxwell’s equations in Minkowski’s form as:

$$F_{\alpha\beta,\gamma} + F_{\beta\gamma,\alpha} + F_{\gamma\alpha,\beta} = 0 \qquad (10.1)$$

$$G^{\alpha\beta}{}_{,\alpha} = J^{\beta} \qquad (10.2)$$

where $F_{\alpha\beta}$ represents the matrix of the electric field E and the magnetic field B, $G^{\alpha\beta}$
represents the matrix of the electric displacement vector D and the magnetic field
strength H, and $J^{\beta}$ is the vector of the excitation source. These three components
can be expressed as:

$$F_{\alpha\beta} = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & -cB_3 & cB_2 \\ -E_2 & cB_3 & 0 & -cB_1 \\ -E_3 & -cB_2 & cB_1 & 0 \end{pmatrix} \qquad (10.3)$$

$$G^{\alpha\beta} = \begin{pmatrix} 0 & -cD_1 & -cD_2 & -cD_3 \\ cD_1 & 0 & -H_3 & H_2 \\ cD_2 & H_3 & 0 & -H_1 \\ cD_3 & -H_2 & H_1 & 0 \end{pmatrix} \qquad (10.4)$$

$$J^{\beta} = \begin{pmatrix} c\rho \\ J_1 \\ J_2 \\ J_3 \end{pmatrix} \qquad (10.5)$$
Based on the expressions above, the material parameters enter through the
constitutive relation:

$$G^{\alpha\beta} = \frac{1}{2} C^{\alpha\beta\mu\nu} F_{\mu\nu} \tag{10.6}$$

where $C^{\alpha\beta\mu\nu}$ is the constitutive tensor, which includes the permittivity,
permeability, and bianisotropy parameters. Below, we take Cartesian coordinates
as an example to show in detail how the material parameters are derived. As shown
in Figure 10.1, the original coordinate system is denoted OX1X2X3, the new
coordinate system is denoted OX1'X2'X3', and the transforming functions are:

$$x_1' = f_1(x_1, x_2, x_3) \tag{10.7}$$

$$x_2' = f_2(x_1, x_2, x_3) \tag{10.8}$$

$$x_3' = f_3(x_1, x_2, x_3) \tag{10.9}$$

Then the relationship between an arbitrary vector X' in the transformed space and
the vector X in the original space can be derived as:

$$X' = \Lambda X \tag{10.10}$$

where Λ is the Jacobian matrix, whose elements are:

$$\Lambda_{ij} = \frac{\partial x_i'}{\partial x_j}, \qquad i, j = 1, 2, 3 \tag{10.11}$$

where $x_i'$ and $x_j$ represent the coordinate components in the transformed space and
in the original space, respectively. Written out in full, (10.11) becomes:

$$\Lambda = \begin{bmatrix} \dfrac{\partial x_1'}{\partial x_1} & \dfrac{\partial x_1'}{\partial x_2} & \dfrac{\partial x_1'}{\partial x_3} \\ \dfrac{\partial x_2'}{\partial x_1} & \dfrac{\partial x_2'}{\partial x_2} & \dfrac{\partial x_2'}{\partial x_3} \\ \dfrac{\partial x_3'}{\partial x_1} & \dfrac{\partial x_3'}{\partial x_2} & \dfrac{\partial x_3'}{\partial x_3} \end{bmatrix} \tag{10.12}$$

Figure 10.1 Sketch of the optical transformation: (a) original coordinate system and
(b) transformed coordinate system.

Finally, the constitutive tensors of the material in the transformed space can be
derived as:

$$\varepsilon' = \frac{\Lambda \varepsilon \Lambda^{T}}{\det(\Lambda)}, \qquad \mu' = \frac{\Lambda \mu \Lambda^{T}}{\det(\Lambda)} \tag{10.13}$$

The constitutive parameters derived from transform optics are thus directly
determined by the transforming function between the original space and the
transformed space. In the following sections, we use several examples to show how
transform optics works in the design of novel microwave devices.
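
The relation between the Jacobian matrix and the transformed tensors can be checked
numerically. The following Python sketch is illustrative only and is not taken from
the chapter: the helper names numerical_jacobian and transformed_tensor and the test
mapping stretch_x are assumptions introduced here. It estimates the Jacobian of
(10.11) by central differences and then forms the product of the Jacobian, the
original tensor, and the transposed Jacobian, divided by the determinant.

```python
import numpy as np

def numerical_jacobian(mapping, point, step=1e-6):
    """Central-difference estimate of Lambda_ij = d x_i' / d x_j, cf. (10.11)."""
    point = np.asarray(point, dtype=float)
    jac = np.empty((3, 3))
    for j in range(3):
        delta = np.zeros(3)
        delta[j] = step
        forward = np.asarray(mapping(point + delta), dtype=float)
        backward = np.asarray(mapping(point - delta), dtype=float)
        jac[:, j] = (forward - backward) / (2.0 * step)
    return jac

def transformed_tensor(jacobian, tensor):
    """Return Lambda @ tensor @ Lambda^T / det(Lambda)."""
    lam = np.asarray(jacobian, dtype=float)
    return lam @ np.asarray(tensor, dtype=float) @ lam.T / np.linalg.det(lam)

def stretch_x(p):
    # Hypothetical test mapping: x' = 2x, y' = y, z' = z
    return np.array([2.0 * p[0], p[1], p[2]])

lam = numerical_jacobian(stretch_x, [0.3, 0.1, 0.0])
eps_prime = transformed_tensor(lam, np.eye(3))  # original space: free space
print(np.round(eps_prime, 6))
```

For the uniform stretch x' = 2x the Jacobian is diag(2, 1, 1), so the sketch should
return a tensor close to diag(2, 0.5, 0.5).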

10.2.2 Invisibility Cloak Based on Transform Optics

Here we consider an arbitrary N-sided polygonal object embedded in free space and
covered by a conformal cloak, as shown in Figure 10.2. The cross-section of the
cloak is an N-sided polygon. The transformation is the identity along the z-axis,
which is the cylindrical axis; hence, we present the transformation for the
polygonal cloak in the x-y plane.

Consider an arbitrary point H(x0, y0) in the original coordinate system, whose
corresponding point in the transformed system is G(x0', y0'), as shown in Figure
10.2. Then the 2-D transformation from the original space to the new space is
expressed as:

$$r' = k_0 r + R_1 \tag{10.14}$$

where r', r, and R1 denote the lengths of OG, OH, and OM, respectively, and M is
the intersection of OG with the inner surface. k0 equals (b − a)/b, where a and b
are the intercepts of the inner and outer polygons with the y-axis, respectively.
Here the cloak is conformal to the inner region, so corresponding sides have the
same slope.
Suppose that the vertices of the inner polygon are labeled (xn, yn) in clockwise
order and the corresponding vertices of the outer polygon are labeled (xn', yn').
Then the equation of the nth side, which passes through (xn, yn) and (xn+1, yn+1),
is:

$$\frac{y - y_n}{y_{n+1} - y_n} = \frac{x - x_n}{x_{n+1} - x_n} \tag{10.15}$$

Then the distance OM is fixed by:

$$\frac{R_1}{r} = \frac{k_1}{y - k_2 x} \tag{10.16}$$

where $k_1 = y_n - x_n \dfrac{y_{n+1} - y_n}{x_{n+1} - x_n}$ and $k_2 = \dfrac{y_{n+1} - y_n}{x_{n+1} - x_n}$. According to the
transformation invariance, the unit vectors in the original space and in the
transformed space must be equal; hence the coordinate transformation is:

$$\frac{x'}{x} = \frac{y'}{y} = \frac{b - a}{b} + \frac{k_1}{y - k_2 x} \tag{10.17a}$$

$$z' = z \tag{10.17b}$$
Then we can readily compute the Jacobian transformation matrix, whose elements
are the derivatives of the transformed coordinates with respect to the original
coordinates. Using the property that Maxwell's equations are form invariant in the
original and the transformed spaces, we obtain the permittivity and permeability
tensors of the medium in the transformed space:

$$\varepsilon_{xx} = \mu_{xx} = \frac{k_0^2 - 2 k_0 y R_1 (y - k_2 x)^{-1} + r^2 R_1^2 (y - k_2 x)^{-2}}{k_0^2 + k_0 R_1} \tag{10.18a}$$

$$\varepsilon_{xy} = \varepsilon_{yx} = \mu_{xy} = \mu_{yx} = \frac{k_0 (y - k_2 x)^{-1} R_1 (y k_2 - x) - r^2 k_2 R_1^2 (y - k_2 x)^{-2}}{k_0^2 + k_0 R_1} \tag{10.18b}$$

$$\varepsilon_{yy} = \mu_{yy} = \frac{k_0^2 + 2 k_0 k_2 x R_1 (y - k_2 x)^{-1} + k_2^2 r^2 R_1^2 (y - k_2 x)^{-2}}{k_0^2 + k_0 R_1} \tag{10.18c}$$

$$\varepsilon_{zz} = \mu_{zz} = \frac{1}{k_0^2 + k_0 R_1} \tag{10.18d}$$

where $R_1 = \dfrac{k_1}{y - k_2 x}$. Thus, (10.18) provides the full set of design parameters for
the permittivity and permeability tensors of 2-D arbitrary polygonal cloaks. Next,
we use these constitutive tensors in full-wave simulations based on the finite
element method (FEM) in order to validate the design.
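
The mapping in (10.14) to (10.17) can be sketched in a few lines of Python. The
block below is a minimal illustration under stated assumptions, not the authors'
implementation: the function names side_coefficients and cloak_map are hypothetical,
the inner side is chosen arbitrarily as the segment from (0, 1) to (1, 0.5), and the
outer polygon is taken as the inner one scaled by b/a = 2. It checks that a point on
the outer side is left in place, while a point close to the origin is pushed onto
the inner side.

```python
def side_coefficients(p_n, p_next):
    """k1 and k2 of the inner side through p_n and p_next, as defined after (10.16)."""
    (xn, yn), (xn1, yn1) = p_n, p_next
    k2 = (yn1 - yn) / (xn1 - xn)  # slope; a vertical side would need separate handling
    k1 = yn - xn * k2             # intercept of the side with the y-axis
    return k1, k2

def cloak_map(x, y, k0, k1, k2):
    """Apply (10.17a): x'/x = y'/y = k0 + k1 / (y - k2 x)."""
    scale = k0 + k1 / (y - k2 * x)
    return x * scale, y * scale

# Hypothetical geometry: inner side y = 1 - 0.5 x, outer side y = 2 - 0.5 x
a, b = 1.0, 2.0
k0 = (b - a) / b
k1, k2 = side_coefficients((0.0, 1.0), (1.0, 0.5))

outer_image = cloak_map(1.0, 1.5, k0, k1, k2)    # point on the outer side
inner_image = cloak_map(1e-9, 2e-9, k0, k1, k2)  # point very close to the origin
print(outer_image, inner_image)
```

The choice of which side applies to a given point (the angular sector containing
it) is left out of this sketch; a full implementation would select the side from
the polar angle of the point.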

Figure 10.2 Sketch of the arbitrarily polygonal cloak.

Here we consider a five-sided polygonal cloak with parameters calculated from
(10.18). A line source is used for excitation, and the electric-field
distributions in the computational domain for an ideally lossless cloaking
material are illustrated in Figure 10.3. The real part of the electric-field
phasor is displayed, which is equivalent to the time-domain field at the instant
when the source phase is zero, so that the individual phase fronts are clearly
visible. The cloaking effect is evident. The phase fronts of the incident wave are
parallel to one side of the cloak. Outside the cloaking shell, the wave of the
electric line source is almost unaltered, as if no scatterer were present. Inside
the cloaking material, the wave is smoothly bent around the cloaked area, and the
phase fronts are completely restored when the wave exits the cloak. The wave
impedances of the cloak medium and free space are exactly matched, and the device
is therefore reflectionless. All these results verify the theoretical design and
derivations.

Figure 10.3 Simulation results of E-field distributions of (a) five-sided convex polygonal
invisibility cloak and (b) five-sided concave polygonal invisibility cloak.

10.2.3 Electromagnetic Concentrator Based on the Transform Optics

In this section, a cylindrical electromagnetic concentrator is considered. The
optical transformation for the concentrator compresses the region r'∈[0, R2] into
the region r∈[0, R1] and stretches the region r'∈[R2, R3] into the region
r∈[R1, R3], as shown in Figure 10.4. Here r and r' represent the radii in the
physical space and the virtual space, respectively. According to the coordinate
transformation theory and the form invariance of Maxwell's equations, the
constitutive parameters of the concentrator can be expressed as:

$$\varepsilon_r = \frac{f(r)}{r f'(r)} \tag{10.19a}$$

$$\varepsilon_\theta = \frac{r f'(r)}{f(r)} \tag{10.19b}$$

$$\varepsilon_z = \frac{f'(r) f(r)}{r} \tag{10.19c}$$

where f(r) is the transformation function between the original space and the
transformed space. For the transformation between r'∈[0, R2] and r∈[0, R1], namely
the core region, the function is simply:

$$r' = f(r) = \frac{R_2}{R_1} r \tag{10.20}$$

Then the constitutive tensor of the core region can be derived as:

$$\varepsilon_{cr} = \varepsilon_{c\theta} = 1 \tag{10.21a}$$

$$\varepsilon_z = \left( \frac{R_2}{R_1} \right)^2 \tag{10.21b}$$
For the annular region r∈[R1, R3], it can be noticed from (10.19a) and (10.19b)
that εr and εθ are reciprocals of each other. If one of them is set to a constant,
the other is fixed as well. Suppose that:

$$\varepsilon_\theta = \frac{r f'(r)}{f(r)} = m_0 \tag{10.22}$$

By solving the ordinary differential equation above, the general solution can be
expressed as:

$$f(r) = m_1 r^{m_0} \tag{10.23}$$

where m0 and m1 are unknown coefficients, which are determined by the boundary
conditions. The transformation function f(r) between the original space and the
transformed space should fulfill the boundary conditions

$$f(R_3) = R_3 \tag{10.24a}$$

$$f(R_1) = R_2 \tag{10.24b}$$

Based on the boundary conditions, the unknown coefficients can be solved, and the
constitutive tensor for the annular region can be expressed as:

$$\varepsilon_\theta = \frac{1}{\varepsilon_r} = m_0 \tag{10.25a}$$

$$\varepsilon_z = m_0 \left( \frac{R_3}{r} \right)^{2(1 - m_0)} \tag{10.25b}$$

where $m_0 = \log_{R_3/R_1} (R_3/R_2)$. Hence we have obtained all the constitutive parameters
of the cylindrical electromagnetic concentrator. The relative permittivities εr
and εθ are constants, and only εz is a function of the radius, which can be
homogenized by a layered structure. Furthermore, the constitutive tensor is
nonsingular and positive, which improves the flexibility of 2-D EM concentrator
designs. Moreover, the impedance of the concentrator at the outer boundary can be
expressed as $Z|_{r=R_3} = \sqrt{\mu_\theta / \varepsilon_z} = 1$. The electromagnetic concentrator is
therefore always impedance matched to free space, which minimizes its scattered
field. Full-wave simulations based on the constitutive parameters above are
presented next.

Figure 10.4 Sketch of an electromagnetic concentrator.

Here lossless cases are studied by FEM simulation. The geometry parameters are
selected as R3 = 2R2 = 4R1 = 0.4 m. From these geometry parameters, the
constitutive parameters are calculated through the equations above. The frequency
is 2 GHz. Figure 10.5(a) shows the electric field distribution of the
concentrator. The electric field is smoothly concentrated into the inner core
region, and the field outside is barely disturbed. Furthermore, the power flow is
also calculated and shown in Figure 10.5(b). The power flow is clearly enhanced in
the inner core region. The enhancement ratio can be expressed as the ratio of R2
to R1, and the enhancement theoretically diverges to infinity as R1 tends to zero.
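
As a worked example of (10.21) and (10.25), the short Python sketch below evaluates
the concentrator parameters for the geometry quoted above. It is only an
illustration: the function name concentrator_parameters is an assumption introduced
here, and the impedance check assumes that the permeability tensor equals the
permittivity tensor, as is usual for a transformation medium.

```python
import math

def concentrator_parameters(r, R1, R2, R3):
    """Return (eps_r, eps_theta, eps_z) at radius r in the physical space."""
    if r <= R1:
        # Core region, (10.21)
        return 1.0, 1.0, (R2 / R1) ** 2
    # Annular region, (10.25), with m0 = log base (R3/R1) of (R3/R2)
    m0 = math.log(R3 / R2) / math.log(R3 / R1)
    eps_theta = m0
    eps_r = 1.0 / m0
    eps_z = m0 * (R3 / r) ** (2.0 * (1.0 - m0))
    return eps_r, eps_theta, eps_z

R3, R2, R1 = 0.4, 0.2, 0.1  # R3 = 2 R2 = 4 R1 = 0.4 m
core = concentrator_parameters(0.05, R1, R2, R3)
edge = concentrator_parameters(R3, R1, R2, R3)
impedance = math.sqrt(edge[1] / edge[2])  # sqrt(mu_theta / eps_z), assuming mu = eps
print(core, edge, impedance)
```

For this geometry m0 = ln 2 / ln 4 = 0.5, so εθ = 0.5 and εr = 2 throughout the
annular region, εz = 4 in the core, and the impedance at r = R3 evaluates to 1.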

Figure 10.5 Simulation results of the concentrator: (a) electric field distribution and (b) normalized
power flow.

10.2.4 Reflectionless Waveguide Connector Based on Transform Optics

In this section, we focus on an arbitrary waveguide connector that is feasible to
realize. A finite embedded optical transformation is applied to the design of the
waveguide connector. The sketch of the waveguide connector is shown in Figure
10.6(a). Considering a 2-D structure in the Cartesian coordinate system, as shown
in Figure 10.6(b), the optical transformation that compresses (or, for the inverse
problem, expands) the original space ACBD into the transformed space ACB'D' can be
defined as:

$$y' = \frac{y_2(x) - y_1(x)}{2a}\, y + \frac{y_2(x) + y_1(x)}{2} \tag{10.26a}$$

$$x' = x \tag{10.26b}$$

$$z' = z \tag{10.26c}$$

where the length of AC is assumed to be 2a, the curve that connects C and D′ is
defined as y1(x), and the curve that connects A and B' is defined as y2(x). The
two curves can be selected arbitrarily, as long as they pass through the points
C, D' and A, B', respectively. Hence, the
Jacobian transformation matrix can be obtained from (10.26):

$$A = \begin{bmatrix} 1 & 0 & 0 \\ \dfrac{y}{2a} \dfrac{d[y_2(x) - y_1(x)]}{dx} + \dfrac{1}{2} \dfrac{d[y_2(x) + y_1(x)]}{dx} & \dfrac{y_2(x) - y_1(x)}{2a} & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{10.27}$$

whose elements are the derivatives of the transformed coordinates with respect to
the original coordinates. Using the property that Maxwell's equations are form
invariant in the original and transformed spaces, the permittivity and
permeability tensors of the medium in the transformed space can be expressed as:

$$\varepsilon' = \frac{A \varepsilon A^{T}}{\det(A)} \tag{10.28a}$$

$$\mu' = \frac{A \mu A^{T}}{\det(A)} \tag{10.28b}$$

where ε and μ represent the constitutive tensors of the original space. Here the
original space is free space, so the constitutive tensors of the original space
can be expressed as:

$$\varepsilon = I \varepsilon_0 \tag{10.29a}$$

$$\mu = I \mu_0 \tag{10.29b}$$

Hence, the relative permittivity and permeability tensors in the transformed
region are:

$$\varepsilon_{xx} = \mu_{xx} = \varepsilon_{zz} = \mu_{zz} = \frac{2a}{y_2(x) - y_1(x)} \tag{10.30a}$$

$$\varepsilon_{xy} = \varepsilon_{yx} = \mu_{xy} = \mu_{yx} = \frac{y \dfrac{d[y_2(x) - y_1(x)]}{dx} + a \dfrac{d[y_2(x) + y_1(x)]}{dx}}{y_2(x) - y_1(x)} \tag{10.30b}$$

$$\varepsilon_{yy} = \mu_{yy} = \frac{\left( y \dfrac{d[y_2(x) - y_1(x)]}{dx} + a \dfrac{d[y_2(x) + y_1(x)]}{dx} \right)^2}{2a [y_2(x) - y_1(x)]} + \frac{y_2(x) - y_1(x)}{2a} \tag{10.30c}$$
Furthermore, the symmetric constitutive matrix can be diagonalized by rotating
the coordinates, which is more convenient for the construction of the
metamaterial. The diagonal elements are the eigenvalues of the symmetric
constitutive matrix:

$$\varepsilon_{11} = \mu_{11} = \frac{\varepsilon_{xx} + \varepsilon_{yy} - \sqrt{(\varepsilon_{xx} - \varepsilon_{yy})^2 + 4 \varepsilon_{xy}^2}}{2} \tag{10.31a}$$

$$\varepsilon_{22} = \mu_{22} = \frac{\varepsilon_{xx} + \varepsilon_{yy} + \sqrt{(\varepsilon_{xx} - \varepsilon_{yy})^2 + 4 \varepsilon_{xy}^2}}{2} \tag{10.31b}$$

$$\varepsilon_{33} = \mu_{33} = \varepsilon_{zz} \tag{10.31c}$$

Thus, (10.31) provides the design parameters for the permittivity and
permeability tensors of the metamaterials that fill the waveguide connector. Next,
these constitutive tensors are used in full-wave simulations of arbitrary
waveguide connectors.
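
The chain from the Jacobian matrix to the principal values can be checked
numerically. The Python sketch below is an illustration under stated assumptions,
not the authors' code: the function names are hypothetical, the boundary curves are
arbitrary linear tapers chosen here (y2(x) = a − 0.2x and y1(x) = −a + 0.2x with
a = 0.4), and the labeling of the two in-plane eigenvalues (smaller one first) is a
convention of this sketch. The helper to_original_y inverts (10.26a) so that the
tensor can be evaluated at a point given in the transformed space.

```python
import numpy as np

def connector_tensor(x, y, a, y1, y2, dy1, dy2):
    """Relative tensor A A^T / det(A) for free space, with A built from (10.26)."""
    diff = y2(x) - y1(x)
    a21 = y * (dy2(x) - dy1(x)) / (2.0 * a) + 0.5 * (dy2(x) + dy1(x))
    A = np.array([[1.0, 0.0, 0.0],
                  [a21, diff / (2.0 * a), 0.0],
                  [0.0, 0.0, 1.0]])
    return A @ A.T / np.linalg.det(A)

def principal_values(eps):
    """Eigenvalues of the symmetric in-plane block, plus the zz component."""
    exx, eyy, exy = eps[0, 0], eps[1, 1], eps[0, 1]
    root = np.sqrt((exx - eyy) ** 2 + 4.0 * exy ** 2)
    return 0.5 * (exx + eyy - root), 0.5 * (exx + eyy + root), eps[2, 2]

def to_original_y(x_t, y_t, a, y1, y2):
    """Invert (10.26a): recover y of the original space from (x', y')."""
    return (2.0 * a * y_t - a * (y2(x_t) + y1(x_t))) / (y2(x_t) - y1(x_t))

# Hypothetical linear tapers; at x = 0 they pass through y = +a and y = -a
a = 0.4
y2 = lambda x: a - 0.2 * x
y1 = lambda x: -a + 0.2 * x
dy2 = lambda x: -0.2
dy1 = lambda x: 0.2

x_t, y_t = 0.5, 0.15                     # sample point in the transformed space
y = to_original_y(x_t, y_t, a, y1, y2)
eps = connector_tensor(x_t, y, a, y1, y2, dy1, dy2)
e11, e22, e33 = principal_values(eps)
print(y, np.round(eps, 4), e11, e22, e33)
```

In a field solver, the same evaluation would be repeated at every mesh point of the
transformed region.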

Figure 10.6 Sketch of the waveguide connector: (a) sketch of the connector and (b) connector in the
Cartesian coordinate system.

In order to validate the constitutive tensors above, we use the FEM to simulate
arbitrary waveguide connectors. The waveguide dimensions are selected so that the
TE10 mode propagates at 2 GHz. The simulation domain is shown in Figure 10.6(a),
and port 1 is selected to launch the incident wave. Note that the simulations are
carried out in the transformed space, but the constitutive parameters in (10.31)
are expressed as functions of x and y, which are variables of the original space.
The variables of the original space should therefore be replaced by the variables
of the transformed space, which is done through (10.32):

$$x = x' \tag{10.32a}$$

$$y = \frac{2a y' - a [y_2(x') + y_1(x')]}{y_2(x') - y_1(x')} \tag{10.32b}$$

$$z = z' \tag{10.32c}$$
First, a simple connector with a symmetric structure is simulated to verify the
design formulas. The electric field distribution of the connector filled with
metamaterial having the designed constitutive parameters is shown in Figure
10.7(a), and the connector filled with air is also simulated for comparison, as
shown in Figure 10.7(b). Although the air-filled connector does transmit
electromagnetic waves from the large waveguide into the small one, reflections
occur and part of the energy is lost. For the connector filled with metamaterial
having the designed constitutive tensors, the electromagnetic waves are guided
from the large waveguide to the small one without disturbing the guided mode.
Several more general models are also simulated, including a sharper connector and
asymmetric connectors, and the results are shown in Figure 10.8. The electric
field distributions show that all the connectors work well: the electromagnetic
waves are transmitted properly from the large waveguide into the small one.

Figure 10.7 Simulation results of electric field distributions of: (a) waveguide connector based on
optical transformation and (b) traditional waveguide connector.

Figure 10.8 Simulation results of electric field distributions of: (a) sharper symmetrical waveguide
connector; (b) curve-symmetrical waveguide connector; (c) unsymmetrical waveguide
connector I; and (d) unsymmetrical waveguide connector II.

10.2.5 Multibeam Antenna Based on Transform Optics

In this section, we focus on the multibeam antenna based on transform optics. We
restrict our investigation to 2-D cases for simplicity, where the field is
invariant in the z-direction. The transformation is constructed in the Cartesian
coordinate system (in the x-y plane), as shown in Figure 10.9. Assume that an
ideal isotropic line source is located at the center point O. r represents the
radius of the inner circle and l represents the length of the N-sided regular
polygon. Divide the regular polygon domain into N isosceles triangles with a
vertex angle of θ. In the triangle OAB, we first make the following
transformation: the fan-shaped virtual space OA′B′ is mapped to the triangular
physical space OAB. This nonlinear transformation inevitably makes the material
parameters inhomogeneous. In order to eliminate the inhomogeneity of the
transformation space, a geometrical simplification is made in the fan-shaped
region OA′B′: for a small angle θ, the arc length of A′B′ is approximately equal
to the length of the segment A′B′. So eventually, the triangle OA′B′ (x, y, z) is
mapped to the triangle OAB (x′, y′, z′). The mapping function can be written as
follows:

$$x' = ax + by + c \tag{10.33a}$$

$$y' = dx + ey + f \tag{10.33b}$$

$$z' = z \tag{10.33c}$$
with

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} x_0 & y_0 & 1 \\ x_{A'} & y_{A'} & 1 \\ x_{B'} & y_{B'} & 1 \end{bmatrix}^{-1} \begin{bmatrix} x_0 \\ x_A \\ x_B \end{bmatrix} \tag{10.34}$$

The coefficients d, e, and f follow in the same way, with the y-coordinates of O,
A, and B on the right-hand side. Then the Jacobian matrix can be expressed as:

$$\Lambda = \begin{bmatrix} a & b & 0 \\ d & e & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{10.35}$$

By the form invariance of Maxwell's equations, we can obtain the constitutive
parameter tensors of the material in the transformation region:

$$\varepsilon' = \frac{\Lambda \varepsilon \Lambda^{T}}{\det(\Lambda)} \tag{10.36a}$$

$$\mu' = \frac{\Lambda \mu \Lambda^{T}}{\det(\Lambda)} \tag{10.36b}$$

with det(Λ) = ae – bd, A′(r cos(θ/2), –r sin(θ/2)), B′(r cos(θ/2), r sin(θ/2)),
A(l, –l tan(θ/2)), B(l, l tan(θ/2)), and θ = 2π/N. We assume that the original
space is free space as usual, ε = μ = diag(1, 1, 1). Hence, from (10.33) to
(10.36), the permittivity and permeability tensors of the material in the
transformed space are expressed as:

$$\varepsilon' = \mu' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & r^2 l^{-2} \cos^2(\theta/2) \end{bmatrix} \tag{10.37}$$

Figure 10.9 Geometry of the optical transformation for N-beam antenna.
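
The affine mapping between the two triangles can be checked with a short Python
sketch. The block below is illustrative and rests on assumptions stated here: the
function names are hypothetical, the vertices are taken as O(0, 0),
A′(r cos(θ/2), −r sin(θ/2)), B′(r cos(θ/2), r sin(θ/2)), A(l, −l tan(θ/2)), and
B(l, l tan(θ/2)) with θ = 2π/N, and the values r = 1 and l = 3 are arbitrary. It
solves the two 3 × 3 linear systems for the affine coefficients, builds the Jacobian
matrix, and evaluates the transformed tensor for free space.

```python
import numpy as np

def affine_jacobian(virtual_pts, physical_pts):
    """Solve x' = a x + b y + c and y' = d x + e y + f from three point pairs."""
    virtual = np.asarray(virtual_pts, dtype=float)
    physical = np.asarray(physical_pts, dtype=float)
    system = np.column_stack([virtual, np.ones(3)])      # rows: [x, y, 1]
    a, b, _c = np.linalg.solve(system, physical[:, 0])
    d, e, _f = np.linalg.solve(system, physical[:, 1])
    return np.array([[a, b, 0.0], [d, e, 0.0], [0.0, 0.0, 1.0]])

def sector_tensor(n_beams, r, l):
    """Transformed tensor Lambda Lambda^T / det(Lambda) for one triangular sector."""
    half = np.pi / n_beams                               # theta / 2
    virtual = [(0.0, 0.0),
               (r * np.cos(half), -r * np.sin(half)),    # A'
               (r * np.cos(half), r * np.sin(half))]     # B'
    physical = [(0.0, 0.0),
                (l, -l * np.tan(half)),                  # A
                (l, l * np.tan(half))]                   # B
    lam = affine_jacobian(virtual, physical)
    return lam @ lam.T / np.linalg.det(lam)

eps = sector_tensor(n_beams=4, r=1.0, l=3.0)
print(np.round(eps, 4))
```

With these assumed vertices the in-plane components come out as unity and the zz
component equals r² cos²(θ/2)/l², which is about 0.056 for N = 4 and r/l = 1/3.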

Note that all the material tensors (permittivity and permeability) are
position-independent and are functions of r, l, and θ only. For fixed points O, A,
A′, B, B′, C, and C′, the material parameters are constants. Therefore, we can
design an arbitrary N-beam antenna with homogeneous materials, which are much
easier to realize with metamaterials. It is worth mentioning that the radiation
direction varies with the positions of the points (A, A′, B, B′), so that beams
with arbitrary radiation directions can be obtained. Although the inhomogeneity of
the medium has been eliminated, the permeability is still not unity. For a
transverse electric (TE) incident wave with the electric field polarized along the
z-direction, only ε′zz, μ′xx, and μ′yy in (10.37) are required. According to [5],
the wave trajectory remains unchanged as long as the products εzzμxx and εzzμyy
are kept invariant. Therefore, we can choose the simplest set of material
parameters:

$$\mu'_{xx} = \mu'_{yy} = 1, \qquad \varepsilon'_{zz} = \frac{r^2 \cos^2(\theta/2)}{l^2} \tag{10.38}$$

Only one component of the material tensors then differs from unity, which makes
such metamaterials quite easy to fabricate in practical engineering work.
Full-wave simulations based on the FEM are then carried out to verify the
approach. For convenience, a TE incident wave with the electric field polarized
along the z-axis is adopted. The computational region is surrounded by perfectly
matched layers (PMLs) to emulate wave propagation in unbounded space. The
isotropic line source is located at the center O(0, 0). The working frequency is
5.8 GHz.
Three typical cases are considered, namely three-beam, four-beam, and eight-beam
antennas, with corresponding transformed-space parameters ε′zz = 0.027, 0.056, and
0.07, respectively; a line source embedded in free space is also simulated for
comparison. The distributions of the magnitude of the z-polarized electric field
for all four cases are presented in Figure 10.10. Cylindrical waves are excited by
the isotropic line source in free space, as shown in Figure 10.10(a). When the
line source is embedded in the transformed medium with the constitutive parameters
in (10.38), the propagation path of the cylindrical waves is reshaped as desired,
and three, four, and eight beams are obtained, as shown in Figures 10.10(b–d),
respectively.

Figure 10.10 Electric field distributions for: (a) line source in the free space; (b) three-beam antenna;
(c) four-beam antenna; and (d) eight-beam antenna.

Furthermore, we also compute the far-field radiation patterns of the above four
antennas, as shown in Figure 10.11. The far-field pattern of the line source in
free space, shown in Figure 10.11(a), is taken for comparison. It can be seen from
Figures 10.11(b–d) that the multibeam antennas constructed from metamaterials
based on transform optics provide high-directivity radiation beams in the desired
directions. The far-field patterns agree well with the electric field
distributions depicted in Figure 10.10. Both the near-field distributions and the
far-field patterns therefore verify the theoretical design.

Figure 10.11 Far-field patterns of: (a) line source in free space; (b) three-beam antenna; (c) four-beam
antenna; and (d) eight-beam antenna.

10.3 A DETACHED ZERO INDEX METAMATERIAL LENS FOR ANTENNA GAIN ENHANCEMENT

In recent years, metamaterials have attracted tremendous attention in physics,
electromagnetics, and materials science for their unique electromagnetic
properties [12] and have been widely used for the miniaturization and performance
enhancement of microwave devices [13–16] and antennas [17–19]. In 2002, Enoch
et al. pointed out that zero index metamaterials (ZIMs) can be used to achieve
directive emission in antenna systems [20], and later Ziolkowski studied the
propagation and scattering of electromagnetic waves in ZIMs [21]. Based on the
work of Enoch et al., many groups designed a variety of zero index metamaterial
lenses (ZIMLs) and lens antennas [22–39]. However, most existing ZIMLs are
implemented either by a single electric resonator [20, 22, 23, 25, 30] with
near-zero permittivity or by a single magnetic resonator [27] with near-zero
permeability. In these cases, the wave impedance of the ZIM cannot be matched to
that of air, which significantly lowers the radiation efficiency of the antenna.
As a result, these ZIMLs have to be embedded in the apertures of horn antennas
[22, 25], operate together with a reflector (such as the ground plane of a patch
antenna [24, 26, 30, 31]) in a manner similar to a Fabry-Pérot resonator [32–34],
or enclose the whole antenna structure [20, 27]. Such ZIMLs therefore rely heavily
on the practical application environment, which obstructs their further
development.
In 2009, Ma et al. showed theoretically that an anisotropic ZIML with
appropriately designed constitutive tensors has good impedance matching with air
[28], so that the anisotropic ZIML can efficiently enhance antenna gain and can be
detached from the antenna. Based on the results in [28], Cheng et al. realized an
anisotropic ZIML composed of split ring resonator (SRR) arrays and achieved a
remarkable directivity enhancement for a line source [27]. However, the volume of
the anisotropic ZIML in [27] is very large compared with the line source, and the
operating bandwidth is limited. A similar anisotropic ZIML design was presented in
[35] to enhance the gain of a Vivaldi antenna over a broad bandwidth, but it was
not demonstrated whether this ZIML can be detached from the antenna. In addition,
Turpin et al. designed and numerically demonstrated another kind of anisotropic
ZIML impedance-matched to free space [37] according to the theoretical model
proposed in [36], which is based on a simplified transformation optics (TO)
method. However, the ZIML in [37] is very difficult to fabricate because of the
3-D structure of its unit cell, and its detachability was not demonstrated.
Finally, it has been demonstrated that metamaterial-based gradient index lenses
can also be applied for antenna gain enhancement [38–40], but such gradient index
lenses are usually bulky and heavy.
In this section, a detached ZIML composed of both electric resonators with
near-zero permittivity and magnetic resonators with near-zero permeability is
realized and investigated. With appropriate design, the detached ZIML has both a
near-zero refractive index and a wave impedance matched to that of air, which is
what allows this ZIML design to enhance antenna gain efficiently. The detached
ZIML is demonstrated by both numerical simulations and experiments.

10.3.1 Design and Analysis of Detached ZIML

The proposed unit cell of the detached ZIML is illustrated in Figure 10.12. It
consists of a metal patch and a modified split ring resonator (MSRR). The unit
cells are aligned along the x- and y-axes. The patches are aligned continuously
along the y-axis to form a metal strip, which produces an electric resonance and
thereby zero permittivity, similar to [41]. The MSRR consists of two square loops,
each with two slots on opposite sides. One loop is obtained by rotating the other
by 90°, and the two loops are etched on opposite sides of the dielectric
substrate. The MSRR is used instead of the traditional SRR for its stronger
magnetic resonance, smaller electrical size, and broader resonance bandwidth [42].
Referring to Figure 10.12, the geometric parameters of the metal patch and MSRR
are: l2 = 5.4 mm, l3 = 6.6 mm, t1 = 2.9 mm, w = 0.8 mm, t2 = 0.8 mm, and εr = 2.2.
The overall dimensions of the unit cell along the x-, y-, and z-axes are
t = 8.2 mm, h = 6.6 mm, and l1 = 8 mm, respectively.

Figure 10.12 Geometry of the unit cell of the detached ZIML.

Figure 10.13 Transmission and reflection coefficients of the unit cell of the detached ZIML.

The S-parameters are calculated for a periodic array of ZIML unit cells with a
thickness of one unit cell. The magnitude of the S-parameters of the ZIML for a
plane wave incident along the z-direction is illustrated in Figure 10.13. The
magnitude of S21 is larger than −3 dB from 8.8 GHz to 10.9 GHz, with a peak value
of 0 dB at 9 GHz and 9.9 GHz, implying that the field can easily pass through the
ZIML within this frequency band.
The effective constitutive parameters μeff and εeff of the ZIML are extracted from
the corresponding transmission and reflection data [43] and shown in Figure 10.14.
The effective permeability μeff and the effective permittivity εeff approach zero
at 9.4 GHz and 9.7 GHz, respectively, which keeps the corresponding effective
refractive index n near zero over as broad a band as possible. In particular, the
effective permittivity and permeability are equal at 9.0 GHz and 9.9 GHz, with
values of 0.8 and 0.3, respectively, which gives the ZIML both a near-zero
refractive index and a wave impedance perfectly matched to that of air.
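
The link between the extracted parameters, the refractive index, and the wave
impedance can be illustrated with a short Python sketch. It is not the extraction
procedure of [43]: the function name index_and_impedance is hypothetical, and the
branch choice assumes an exp(jωt) time convention, for which a passive medium has
a refractive index with a nonpositive imaginary part. The input values are sample
numbers only.

```python
import cmath

def _passive_root(value):
    """Square root on the branch with nonpositive imaginary part (exp(jwt) assumed)."""
    root = cmath.sqrt(value)
    return -root if root.imag > 0 else root

def index_and_impedance(eps_eff, mu_eff):
    """Return (n, z): n = sqrt(eps) * sqrt(mu) and z = sqrt(mu) / sqrt(eps)."""
    root_eps = _passive_root(eps_eff)
    root_mu = _passive_root(mu_eff)
    return root_eps * root_mu, root_mu / root_eps

n_matched, z_matched = index_and_impedance(0.3, 0.3)    # equal eps_eff and mu_eff
n_opposite, z_opposite = index_and_impedance(-0.1, 0.1)  # opposite signs
print(n_matched, z_matched, n_opposite, z_opposite)
```

When the effective permittivity and permeability are equal, the normalized
impedance is 1 and the index equals their common value. When they have opposite
signs, the index becomes purely imaginary, so the wave is attenuated rather than
propagated.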

Figure 10.14 The effective constitutive parameters of the detached ZIML.

Figure 10.15 The effective refractive index of the detached ZIML.



The effective refractive index n is further calculated from the constitutive
parameters and depicted in Figure 10.15. From Figure 10.15, one can see that the
effective refractive index n is low over a fairly broad frequency band because
both the effective permeability and the effective permittivity are near zero.
According to [20], such a low refractive index can be used to improve the
directivity of antennas. It is worth mentioning that, in the frequency region from
9.4 GHz to 9.7 GHz, the effective refractive index has a very small real part but
a relatively large imaginary part, which represents the loss of the ZIML and is
caused by the opposite signs of the effective permeability and the effective
permittivity (see Figure 10.14). In this case, the gain enhancement ability of the
ZIML, which arises from the near-zero real part of the refractive index, is
weakened by loss. However, the destructive effect of the loss on the gain
enhancement is limited because the loss does not affect the refraction angle of
electromagnetic waves incident on the ZIML. Moreover, the loss can be small if a
thin ZIML is used. Assuming that the thickness of the ZIML is d, the transmission
coefficient for a normally incident plane wave can be expressed as [44]

$$S_{21} = \frac{4 \tilde{\eta} Z}{(\tilde{\eta} + 1)^2 - (\tilde{\eta} - 1)^2 Z^2} \tag{10.39}$$

where $\tilde{\eta}$ is the wave impedance of the slab normalized to that of free space and
Z is the transmission term

$$Z = \exp(-j \tilde{k} d) \tag{10.40}$$

The tildes above the parameters indicate that the parameters are complex. If the
thickness d of the slab is much smaller than the operating wavelength, the
magnitude of S21 is approximately 1 regardless of the loss of the ZIML. The
thickness of the detached ZIML is 8 mm, and hence it is much smaller than the
wavelengths corresponding to frequencies from 9.4 GHz to 9.7 GHz, which keeps the
transmission of fields through the detached ZIML at a high level.
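
The dependence of the transmission on the slab thickness can be explored with the
following Python sketch. It assumes the standard single-slab expression
S21 = 4ηZ / [(η + 1)² − (η − 1)²Z²] with Z = exp(−jkd) and k = k0·n, which is how
(10.39) and (10.40) are read here; the function name slab_s21 and the sample values
of the normalized impedance and refractive index are hypothetical and are not taken
from the measured ZIML.

```python
import cmath
import math

C0 = 299792458.0  # speed of light in vacuum, m/s

def slab_s21(eta, n, d, freq_hz):
    """Transmission coefficient of a slab of thickness d at normal incidence."""
    k = 2.0 * math.pi * freq_hz / C0 * n   # complex wavenumber inside the slab
    Z = cmath.exp(-1j * k * d)             # transmission term
    return 4.0 * eta * Z / ((eta + 1.0) ** 2 - (eta - 1.0) ** 2 * Z ** 2)

# Hypothetical lossy near-zero-index sample values (exp(jwt) convention, Im(n) < 0)
eta, n, freq = 1.0, 0.05 - 0.4j, 9.5e9
for d in (0.0, 1e-4, 1e-3, 8e-3):
    print(f"d = {d * 1e3:.1f} mm, |S21| = {abs(slab_s21(eta, n, d, freq)):.3f}")
```

For an impedance-matched slab the expression reduces to S21 = Z, so the magnitude
is governed only by the imaginary part of the wavenumber times the thickness and
tends to 1 as d tends to zero.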

10.3.2 Fabrication, Simulation, and Test of ZIML

The detached ZIML was fabricated and measured with an H-plane horn antenna to
validate the gain enhancement ability of the detached ZIML, as shown in Figures
10.16 and 10.17. The detached ZIML is constructed from metal strip slabs and
MSRR slabs, as shown in Figures 10.16(a) and 10.17(a). The metal strip slab and
the MSRR slab are realized by splicing 13 patches and MSRRs in Figure 10.12
together along the y-axis.
A metal strip slab and an MSRR slab are paired, and nineteen pairs of the
metal strip slab and MSRR slab are periodically inserted in slits on dielectric fixing
slabs to form the detached ZIML, as shown in Figure 10.17(b). The distance
between the metal strip slab and MSRR slab is t1, and the period of the pair is t.
The parameters t and t1 are the same as the ones in Figure 10.12. The ZIML is
placed in front of the H-plane horn antenna with a distance d = 40 mm, as shown in
432 Advanced Computational Electromagnetic Methods and Applications

Figures 10.16(b) and 10.17(b). The horn has an aperture of 139 mm (a2) × 12.70
mm (a3) with length along the z-axis b = 143 mm and is fed by a waveguide of
25.40 mm (a1) × 12.70 mm (a3). In addition, in Figure 10.17, fixing slab A is used
to fix metal strip slabs and MSRR slabs to make up ZIML, fixing slab B is used to
fasten the ZIML and antenna together, and the fixing rods made of PET
(polyethylene terephthalate) are used to reinforce the ZIML.

(a)

(b)
Figure 10.16 Schema of the detached ZIML: (a) the enlarged structure of the metal strip slab and
MSRR slab and (b) an H-plane horn with the detached ZIML.

(a) (b)
Figure 10.17 Prototype of the fabricated detached ZIML: (a) the metal strip slab and MSRR slab and
the overall view of the detached ZIML and (b) the H-plane horn with the detached
ZIML.

Return losses of the horn antenna with and without the detached ZIML were
measured with a vector network analyzer in a microwave anechoic chamber and are
depicted in Figure 10.18. The figure shows that the return loss of the horn is only
slightly affected by the ZIML, which transmits well owing to its impedance
matching and small thickness. The ZIML therefore barely reduces the total
efficiency of the antenna, which is an advantage over traditional ZIMLs or
gradient index lenses.

Figure 10.18 Measured return losses of the H-plane horn with and without the detached ZIML.

The patterns of the horn antenna with and without the detached ZIML are also
investigated by both simulation and measurement. At 9.9 GHz, the simulation
results are shown in Figures 10.19(a) and 10.19(b), and the measured results are
shown in Figures 10.19(c) and 10.19(d). Figures 10.19(a) and 10.19(c) show the
simulated and measured E-plane patterns of the horn antenna with and without the
ZIML, respectively. Placing the ZIML in front of the horn significantly reduces the
width of the main lobe in the E-plane from 91.40° to 14.80°. Moreover, the measured
E-plane results agree well with the simulated ones. Figures
10.19(b) and 10.19(d) show the simulated and measured H-plane patterns of the
horn antenna with and without the ZIML. In contrast to the E-plane pattern, both
the simulated results and measured results indicate that the ZIML slightly narrows
the main lobe of the H-plane pattern of the horn. The different effects of the ZIML
on the E-plane and H-plane patterns of the horn can be explained by the anisotropy
of the ZIML. In fact, the magnetic response of the MSRR can be excited only by an
incident magnetic field penetrating the MSRR plane. In this case, referring
to Figures 10.12 and 10.16(b), if the field is incident at an azimuth angle between
the wave vector and the z-axis, the constitutive parameters will be quite different
from those extracted in Figure 10.14, and the refractive index n is no longer near
zero. However, the MSRR response is independent of the direction of the electric
field if the electric field vector is in the y-z plane because the magnetic field can always
penetrate through the MSRR plane. As a result, the constitutive parameters vary
little if the wave is incident at a pitch angle and are almost the same as those
extracted in Figure 10.14, which means the effective refractive index is still near
zero and the improvement in the E-plane is significant.

(a) (b)

(c) (d)
Figure 10.19 Normalized radiation patterns of the H-plane horn with and without the detached ZIML:
(a) the simulated results of E-plane patterns; (b) the simulated results of H-plane
patterns; (c) the measured results of E-plane patterns; and (d) the measured results of H-
plane patterns.

The gain enhancement of the H-plane horn antenna with the detached
ZIML is also measured, as shown in Figure 10.20. A wideband gain
enhancement from 8.9 GHz to 10.8 GHz is observed. Particularly, the gain
enhancement at 9.9 GHz is 3.88 dB and the peak is 4.02 dB at 9.7 GHz. It is
worth noting that the distance between the ZIML and the antenna is not a
critical parameter for the gain enhancement of the detached ZIML, which
distinguishes it from lenses based on Fabry-Pérot resonance and from gradient
index lenses. To verify this, numerical simulations were carried out for the
H-plane horn antenna loaded with the detached ZIML at different values of the
distance d, and the resulting antenna gain variations are shown in Figure
10.21. The figure shows that the gain enhancement of the horn antenna loaded
with the proposed ZIML is only slightly influenced by the distance between the
antenna and the ZIML.

Figure 10.20 Measured gain variation of the horn with and without the detached ZIML.

Figure 10.21 Frequency variation of the simulated gains of the horn with the detached ZIML for
different distances between the antenna and ZIML.

10.4 AUTOMATIC DESIGN OF BROADBAND GRADIENT INDEX
METAMATERIAL LENS FOR GAIN ENHANCEMENT OF
CIRCULARLY POLARIZED ANTENNAS

The concept of graded index (GRIN) metamaterials was proposed by Smith et al.
[45] before the first GRIN metamaterial lens, which possessed a negative refractive
index, was realized for free-space microwave focusing by Driscoll et al. [46].
Recently, GRIN metamaterial lenses consisting of resonant metamaterials with a
positive index of refraction were designed to transform cylindrical or spherical
waves into planar waves, yielding antennas with an increased directivity [47–56].
In particular, a highly sophisticated broadband, dual-linear-polarized, and
high-directivity lens horn antenna using GRIN metamaterials composed of
multilayer microstrip square-ring arrays was presented in [55]. However, there are
still some open issues left in the recent studies of GRIN metamaterial lenses. All
GRIN metamaterial lenses presented so far are highly polarization sensitive. Their
characteristic electromagnetic response is only supported for predefined linear
polarization states. Polarization-insensitive GRIN metamaterial lenses would be
highly desirable for many applications, such as satellite communications, where it
is necessary to work with circularly polarized waves. For all these cases, the
implementation of GRIN metamaterial lenses requires polarization independence.
However, similar to metamaterial cloaks, ideal GRIN metamaterial lenses rely
on a continuous distribution of the effective refractive index that is achieved by a
proper grading of the unit cells in the underlying metamaterial structure. Hence,
any continuous index distribution has to be approximated by a discrete set of
various metamaterial sections where each of them contains unit cells either of
correspondingly altered shape or with a substantially different topology. In general,
the geometric parameters of the metamaterial unit cells are obtained through
extensive full-wave numerical simulation with no regard for the potential
applicability of any approximate analytical synthesis methodology. In this case the
design procedure of GRIN metamaterial lenses may become extremely arduous, let
alone the resulting high manufacturing costs.
In response to these issues, we introduce a simple and highly efficient
automatic design and fabrication method for broadband polarization-insensitive
GRIN metamaterial lenses. The GRIN metamaterial lens encompasses a
nonresonant metamaterial layer that is represented by an isotropic dielectric slab
accordingly perforated with drill holes of deep-subwavelength dimensions, where
the desired polarization insensitivity is already fostered by the nonresonant nature
of the underlying metamaterial. We also derive analytical formulas describing the
proper distribution rules of the drill holes mimicking the intended grading of the
GRIN lens. The fabricated lens structure is then both numerically and
experimentally validated by placing it on the aperture of a circularly polarized
conical horn antenna.

10.4.1 Automatic Design Method of GRIN Metamaterial Lens

Unlike classical microwave lenses composed of homogeneous dielectric materials
with a specific surface profile, GRIN metamaterial lenses usually encompass two
parallel flat boundaries with a radially varying effective refractive index in
between. As shown in Figure 10.22, an isotropic source is placed at the focus of a
GRIN metamaterial lens of thickness t. The choice of such an idealized
source is justified because we are aiming at a polarization-insensitive structure. In
order to transform the spherical wave radiated from the source into a plane wave
that is perpendicularly emitted from the top surface of the lens structure, every
optical path from the source point to the top surface should keep the same phase
delay. In this case, according to [55], the radial function n(r) describing the
effective refractive index of the GRIN metamaterial lens must satisfy the following
expression

$$n(r) = n_0 - \frac{\sqrt{L^2 + r^2} - L}{t} \qquad (10.41)$$

where n0 is the refractive index of dielectric material, L stands for the distance
from the phase center to the incidence plane of GRIN metamaterial lens, and r is
the in-plane radial variable corresponding to the radius of the displayed concentric
circle with its center at the origin O.
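
The equal-phase condition behind (10.41) can be checked numerically. The sketch below is a minimal illustration that uses the prototype values quoted later in Section 10.4.2 (L = 101 mm, t = 40 mm, relative permittivity 2.2) and assumes a nonmagnetic dielectric, so that n0 is the square root of the permittivity, and rays that cross the lens parallel to its axis.

```python
import math


def grin_index(r, n0, L, t):
    """Target refractive index n(r) of (10.41); r, L, t in the same length unit."""
    return n0 - (math.sqrt(L ** 2 + r ** 2) - L) / t


# Values quoted for the prototype in Section 10.4.2 (lengths in mm)
n0 = math.sqrt(2.2)   # index of the unperforated dielectric, eps_d = 2.2
L, t = 101.0, 40.0

for r in (0.0, 25.0, 50.0):
    n = grin_index(r, n0, L, t)
    # Optical path: free-space ray to radius r plus the path through the lens
    path = math.sqrt(L ** 2 + r ** 2) + n * t
    print(f"r = {r:5.1f} mm  n = {n:.4f}  optical path = {path:.3f} mm")
```

The printed optical path is the same for every radius, which is exactly the condition that every ray leaves the top surface with the same phase delay.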

Figure 10.22 Geometry of the GRIN metamaterial lens that transforms incoming spherical waves into
outgoing plane waves.

It can be easily reasoned that the key issue for realizing a polarization-
insensitive GRIN metamaterial lens is to choose: (1) an isotropic background
material substrate; (2) a kind of feasible polarization-insensitive (and nonresonant)
metamaterial unit cell; and (3) a corresponding planar distribution of those unit
cells, meaning that both the latter features have to cope with the circular symmetry.
Here, the GRIN metamaterial lens is realized by a dielectric slab containing a
circularly symmetric distribution of deep-subwavelength drill holes, as displayed
in Figure 10.23. The different unit cells are organized along concentric annuli all
centered at the origin O, maintaining a uniform distribution of drill holes. These
holes have the same diameter d, and are equally spaced with the same central angle
ζ in the same annulus with a specific thickness a, where for different concentric
annuli the central angle ζ may be different. Using a dielectric plate with drill
holes has introduced a favorable feature into the design of lenses and cloaks
[57–60]. However, designs based on nonuniform drill holes limit how small the
unit cell can be made, because a larger area is needed to vary the hole radius to
meet the required refractive index; it is therefore hard to approximate the
continuous gradient index distribution of the ideal case. At the same time,
effective medium theory applies to subwavelength structures, so the large-radius
holes in a nonuniform design would make the dielectric plate inhomogeneous.
Moreover, it is much easier in practice to control the density of holes than
their radii. Reference [61] provides an excellent example of applying uniform
drill holes to design a hemispherical lens with a constant permittivity.

Figure 10.23 Top view of the drill holes in the GRIN metamaterial lens (not drawn to scale).

To simplify the design process of the planar GRIN metamaterial lens,
analytical expressions that describe the effective refractive index as a function
of the distribution of drill holes are explored. We take all drill-hole radii to be
identical and much smaller than the operating wavelength, which allows us to use
a simple volume-based mixing rule to calculate the effective relative permittivity
of the metamaterial (i.e., of the respective unit cell) [62]

$$\varepsilon_{\mathrm{eff}} = \frac{\varepsilon_d V_d + \varepsilon_v V_v}{V_d + V_v} \qquad (10.42)$$

where εd(v) and Vd(v) are the relative permittivity and the occupied volume of the
two material phases, that is, the dielectric background and the air hole, respectively.
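
A minimal sketch of the mixing rule (10.42) for a through-hole in a square unit cell follows. The permittivity and hole diameter match the values used later in this section, while the 1.0 mm lattice period is a hypothetical value chosen only for illustration.

```python
import math


def eps_eff_mixing(eps_d, v_d, eps_v, v_v):
    """Volume-weighted effective permittivity, as in (10.42)."""
    return (eps_d * v_d + eps_v * v_v) / (v_d + v_v)


def eps_eff_square_cell(eps_d, hole_diameter, period, eps_v=1.0):
    """Through-hole in a square unit cell: the thickness cancels, so use areas."""
    a_hole = math.pi * (hole_diameter / 2.0) ** 2
    a_cell = period ** 2
    return eps_eff_mixing(eps_d, a_cell - a_hole, eps_v, a_hole)


# eps_d and the hole diameter follow the text; the 1.0 mm period is hypothetical
eps = eps_eff_square_cell(eps_d=2.2, hole_diameter=0.6, period=1.0)
print(f"eps_eff = {eps:.4f}, n_eff = {math.sqrt(eps):.4f}")
```

Because the holes run through the full slab thickness, the volume ratio equals the area ratio, which is why only in-plane dimensions enter the second function.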
In order to examine the accuracy of this approximation, we analyze
the effective refractive index of an infinite dielectric slab with a periodic
perforation of air holes (see Figure 10.24) and compare the results obtained from
the mixing rule with numerically simulated data (see Figure 10.25). The relative
permittivity is εd = 2.2, and the thickness of the dielectric slab is chosen to be
t = 5 mm. The diameter d of the air holes is fixed at 0.6 mm. The simulated
effective refractive index is obtained from effective medium theory and the
S-parameter retrieval method [32, 63, 64].

Figure 10.24 Top view of the equivalent infinite dielectric slab with periodically distributed holes.

It can be observed from Figure 10.25(a) that the calculated and the simulated
data are in good agreement, especially for small volume fractions of the air holes,
which follows from the validity range and the intrinsic symmetry of the
mixing formula. The relative error between the calculated and the simulated results
is defined as

$$\Delta = \frac{|n_c - n_s|}{n_c} \times 100\% \qquad (10.43)$$

where nc is the calculated value of the refractive index and ns is the simulated one.
It can be deduced from Figure 10.25(b) that the maximum Δ over the frequency
range of 1 GHz to 15 GHz is less than 2.7%, which means that the mixing formula
given by (10.42) works well in the long-wavelength limit, namely in the
validity range of the metamaterial approach, and is therefore sufficient to estimate
the effective permittivity and refractive index of the underlying metamaterial.
Based on the possibility of synthesizing the metamaterial effective refractive
index, we can now combine (10.41) and (10.42) to find a distribution relation for
the drill holes, which yields the graded refractive index profile for the intended
spherical-to-plane-wave transformation. On the annulus with inner radius
(k − 0.5)a and outer radius (k + 0.5)a, the effective refractive index of the
metamaterial has to match the value n(ka) according to

$$n(r) = n(ka) = n_0 - \frac{\sqrt{L^2 + (ka)^2} - L}{t}, \quad r \in \big((k-0.5)a,\ (k+0.5)a\big], \quad k = 1, 2, \ldots \qquad (10.44)$$


On a specific annulus, each drill hole occupies an arc-like unit cell
characterized by its radial extent a and the central angle ζ, as shown in Figure
10.23. Referring to the mixing rule (10.42), the slab thickness t cancels out,
leading to a 2-D version in which each volume of the metamaterial structure is
represented by its corresponding footprint. Hence, the area of the unit cell Aall
consists of the area Ad of the dielectric and the area Av of the drill hole, which
leads to

$$A_d = A_{\mathrm{all}} - A_v = \frac{2\pi k a^2 \zeta}{360} - \pi\left(\frac{d}{2}\right)^2, \quad k = 1, 2, \ldots \qquad (10.45)$$

Substituting (10.45) into (10.42), the effective permittivity of the metamaterial
on the annulus from (k − 0.5)a to (k + 0.5)a can be obtained as

$$\varepsilon_{\mathrm{eff}} = \varepsilon_d + \frac{45 d^2 (\varepsilon_v - \varepsilon_d)}{\zeta k a^2}, \quad k = 1, 2, \ldots \qquad (10.46)$$

(a) (b)

Figure 10.25 Infinite dielectric slab with a periodic perforation of air holes: (a) comparison between
simulated values and calculated values of the effective refractive index for the given
frequency range and (b) relative error between calculated and simulated results.
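
The step from (10.45) to (10.46) can be written out as follows. This is a short worked derivation under the assumptions already stated in the text: the central angle ζ is measured in degrees, the hole diameter is d, and the area-based form of the mixing rule (10.42) applies.

```latex
\begin{align}
A_{\mathrm{all}} &= \frac{\zeta}{360}\,\pi\left[(k+0.5)^2 - (k-0.5)^2\right]a^2
                  = \frac{2\pi k a^2 \zeta}{360}, \qquad
A_v = \pi\left(\frac{d}{2}\right)^2, \\
\varepsilon_{\mathrm{eff}}
  &= \frac{\varepsilon_d\,(A_{\mathrm{all}} - A_v) + \varepsilon_v A_v}{A_{\mathrm{all}}}
   = \varepsilon_d - (\varepsilon_d - \varepsilon_v)\,\frac{A_v}{A_{\mathrm{all}}}, \\
\frac{A_v}{A_{\mathrm{all}}}
  &= \frac{\pi d^2}{4}\cdot\frac{360}{2\pi k a^2 \zeta}
   = \frac{45\,d^2}{\zeta k a^2}
\;\;\Longrightarrow\;\;
\varepsilon_{\mathrm{eff}} = \varepsilon_d - \frac{45\,d^2\,(\varepsilon_d - \varepsilon_v)}{\zeta k a^2}.
\end{align}
```

The factor 45 is simply 360/8, and the factors of π cancel between the hole area and the annular sector area.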

As the effective permeability of the isotropic dielectric and of air is 1, the
effective refractive index can be expressed as $n_{\mathrm{eff}} = \sqrt{\varepsilon_{\mathrm{eff}}}$.
By combining (10.45), (10.46), and $n_{\mathrm{eff}} = n(r)$, the central angle ζ can be derived as

$$\zeta(r) = \frac{45 d^2 (\varepsilon_d - 1)}{k a^2 \left[\varepsilon_d - \left(n_0 - \dfrac{\sqrt{L^2 + (ka)^2} - L}{t}\right)^2\right]}, \quad r \in \big((k-0.5)a,\ (k+0.5)a\big], \quad k = 1, 2, \ldots \qquad (10.47)$$


This analytical formula fully describes the distribution rule of the drill holes
according to the annulus k and the associated central angle ζ, allowing the GRIN
metamaterial lens to be automatically designed.
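
The design rule can be sketched in a few lines. The code below is an illustrative implementation of (10.44), (10.46), and (10.47), not the authors' design tool; it uses the prototype dimensions given in Section 10.4.2, and the rounding of 360°/ζ down to a whole number of holes is an assumption made here for illustration.

```python
import math


def target_index(k, a, n0, L, t):
    """Target index n(ka) on annulus k, from (10.44)."""
    return n0 - (math.sqrt(L ** 2 + (k * a) ** 2) - L) / t


def central_angle_deg(k, a, d, eps_d, L, t):
    """Central angle zeta (degrees) of one unit cell on annulus k, from (10.47)."""
    n = target_index(k, a, math.sqrt(eps_d), L, t)
    return 45.0 * d ** 2 * (eps_d - 1.0) / (k * a ** 2 * (eps_d - n ** 2))


def realized_index(k, zeta, a, d, eps_d):
    """Effective index obtained from (10.46) with eps_v = 1 for a given zeta."""
    return math.sqrt(eps_d - 45.0 * d ** 2 * (eps_d - 1.0) / (zeta * k * a ** 2))


# Prototype values from Section 10.4.2, lengths in mm
a, d, eps_d, L, t = 0.65, 0.6, 2.2, 101.0, 40.0
for k in (20, 40, 77):
    zeta = central_angle_deg(k, a, d, eps_d, L, t)
    holes = int(360.0 // zeta)  # assumption: whole holes only, rounded down
    print(f"k = {k:2d}  zeta = {zeta:8.3f} deg  holes = {holes}")
```

Feeding the computed ζ back into `realized_index` returns the target index of that annulus, which is a convenient self-check. Near the lens center the required index is so close to n0 that ζ exceeds 360°, meaning that the innermost annuli need less than one hole each.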

10.4.2 Numerical Simulations

In order to validate the automatic design method of the GRIN metamaterial lens, a
prototype is designed based on (10.47) and the FDTD method is employed to
simulate this lens. As shown in Figure 10.26, the simulation model consists of the
GRIN metamaterial lens mounted on a conical horn antenna.

Figure 10.26 Sketch of the conical horn antenna (lower) with the GRIN metamaterial lens (upper).

Referring to Figure 10.26, the geometric dimensions of the horn antenna are
chosen to be L = 101 mm, dw = 24.9 mm, and R0 = 50 mm. The GRIN
metamaterial lens consists of a planar dielectric disk with the dimensions R = 60
mm and t = 40 mm, made of an isotropic dielectric material with a permittivity
εd = 2.2. The GRIN lens is slightly larger than the horn antenna aperture in order
to collect potential fringing fields. Referring to Figure 10.23, the diameter of the
drill holes is d = 0.6 mm and the radial extent of each annulus is chosen to be
a = 0.65 mm, leading to a total of 92 annuli, of which 77 are positioned inside the
horn antenna aperture (r ≤ 50 mm) and 15 outside the aperture (50 mm < r ≤ 60
mm). The conical horn antenna operates in the X-band from 8 GHz to 12 GHz.
Substituting the above values into (10.47), the central angle ζ(r) of the
corresponding annulus k = 1, 2, …, 77 is calculated according to

$$\zeta(r) = \frac{46.0118}{k\left[2.2 - \left(4.005 - \sqrt{6.3756 + 0.000264k^2}\right)^2\right]}, \quad r \in \big(0.65(k-0.5),\ 0.65(k+0.5)\big]\ \text{mm}, \quad k = 1, 2, \ldots, 77 \qquad (10.48)$$


As for the distribution rule of the drill holes outside the horn antenna aperture
(annuli k = 78, 79, …, 92), the design is the same as that of annulus 77.
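
The numerical constants of the prototype follow directly from the quoted dimensions. The sketch below evaluates them as a plausibility check; rounding the annulus counts to the nearest integer is an assumption about how the counts of 92 and 77 were obtained.

```python
# Prototype dimensions quoted in the text (all lengths in mm)
a, d, eps_d = 0.65, 0.6, 2.2
L, t = 101.0, 40.0
R_horn, R_lens = 50.0, 60.0

numerator = 45.0 * d ** 2 * (eps_d - 1.0) / a ** 2   # 45 d^2 (eps_d - 1) / a^2
l_over_t_sq = (L / t) ** 2                           # constant term under the root
a_over_t_sq = (a / t) ** 2                           # coefficient of k^2 under the root
offset = eps_d ** 0.5 + L / t                        # n0 + L/t

annuli_total = round(R_lens / a)    # annuli needed to reach the lens edge
annuli_inside = round(R_horn / a)   # annuli needed to reach the horn aperture edge

print(numerator, l_over_t_sq, a_over_t_sq, offset)
print(annuli_total, annuli_inside, annuli_total - annuli_inside)
```

For the stated a and t, the coefficient (a/t)² evaluates to about 2.64 × 10⁻⁴. The offset n0 + L/t evaluates to about 4.008 when n0 is taken as the square root of 2.2, and to 4.005 when n0 is rounded to 1.48.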
For the purpose of visualization and comparison, the resulting effective
refractive index profile of the GRIN metamaterial lens is calculated by using (10.46)
and $n_{\mathrm{eff}} = \sqrt{\varepsilon_{\mathrm{eff}}}$ based on the geometric parameters above. As depicted in Figure
10.27, the given theoretical index profile [see (10.41)] is accurately approximated
by the discrete effective refractive index values of the synthesized GRIN
metamaterial lens.

Figure 10.27 Comparison of the theoretical target profile and the realized effective refractive index
distribution of the GRIN metamaterial lens.

Figure 10.28 compares the simulated electric field distribution in the radiative
near-field region of the circularly polarized conical antenna with and without the
designed GRIN metamaterial lens at 10.5 GHz. In particular, Figure 10.28(a)
shows that the electric field distribution of the ordinary horn antenna gradually
diverges, and an associated decrease in the field amplitude is observed in the
radiation direction along the antenna axis. After placing the GRIN metamaterial
lens on the horn antenna, the electric field distribution is improved as expected,
namely, a transversally confined quasi-plane wave with virtually uniform
amplitude (along the antenna axis) appears in the radiation area, as shown in
Figure 10.28(b). All these effects are represented by a considerably reduced width
of the main radiation lobe, which is tantamount to enhanced radiation directivity,
and thus to an increased gain, while underpinning the intended transformation
performance of the GRIN metamaterial lens.

(a) (b)

Figure 10.28 Electric-field distributions in the x-z plane of the circularly polarized conical horn
antenna: (a) without GRIN metamaterial lens and (b) with GRIN metamaterial lens, both
at an operation frequency of 10.5 GHz.

The simulated normalized far-field radiation patterns of the horn antenna in
the y-z plane are displayed in Figure 10.29 with and without the GRIN
metamaterial lens at the operation frequencies 8.1 GHz, 10.0 GHz, and 11.9 GHz.
The main lobe beamwidth of the horn antenna is considerably reduced due to the
presence of the GRIN metamaterial lens. For example, at 10 GHz, the main lobe
width is reduced from 20° to 17°.
The spectral behavior of the maximum gain is given in Figure 10.30, where
the simulated gain spectra of the horn antenna with and without the GRIN
metamaterial lens clearly indicate that the GRIN metamaterial lens can efficiently
enhance the gain of the horn antenna over a very broad bandwidth ranging from 8
GHz to 12 GHz.

(a)

(b)

(c)
Figure 10.29 Normalized far-field gain in the y-z plane of the circularly polarized conical horn antenna
with (solid line) and without (dashed line) GRIN metamaterial lens at the operation
frequencies (a) 8.1 GHz; (b) 10.0 GHz; and (c) 11.9 GHz.

Figure 10.30 Simulated frequency response of the maximum gain of the circularly polarized conical
antenna with (solid line) and without (dashed line) GRIN metamaterial lens.

10.4.3 Fabrication and Measurement

In order to validate the electromagnetic performance of the designed GRIN
metamaterial lens, a prototype sample was fabricated using a pile of seven stacked
dielectric disks made of F4BMX with a permittivity εd = 2.2 and a loss tangent of
0.0007. Each disk was identically patterned by a PCB milling machine (LPKF
ProtoMat S62), and the stack was mounted on a conical horn antenna, as shown
in Figure 10.31. The return loss with and without the GRIN metamaterial lens is
given in Figure 10.32; it is more than 10 dB over the frequency range of 8 GHz
to 12 GHz [56].

Figure 10.31 Prototype of the designed GRIN metamaterial lens antenna.


Far-field radiation measurements showing the normalized gain patterns of the
designed GRIN metamaterial lens antenna are plotted in Figure 10.33 at the
operation frequencies 8.1 GHz, 10 GHz, and 11.9 GHz and compared to the
corresponding radiation characteristics of the unloaded horn antenna. Similar to the
simulated results, the directivity of the horn antenna is enhanced, and the main
lobe width is decreased by the GRIN metamaterial lens. However, the measured
performance of the lens antenna is noticeably worse than the simulated one, which
is probably due to slight process variations during the fabrication of the stacked
disks and to far-field measurement errors. The measured gain spectrum of the
designed lens antenna is compared to that of the regular horn antenna, as displayed
in Figure 10.34. The data reveal a gain enhancement for the lens antenna between
1.5 and 5.7 dB (mean value 4.3 dB) in the frequency range of 8 GHz to 12 GHz,
which agrees well with the simulation results in Figure 10.30.

Figure 10.32 Measured return loss of the conical horn antenna with (solid line) and without (dashed
line) GRIN metamaterial lens.

Regarding the designed GRIN metamaterial lens, we would expect the utilization
factor (i.e., the aperture efficiency) to increase owing to the curved
spreading of ray paths in the lens in conjunction with the cophasal wave emission
at the output interface of the GRIN lens. In order to comprehensively examine the
performance of the horn antenna together with the fabricated GRIN lens (see
Figure 10.31), the utilization factor of the aperture field of the horn antenna with
and without the GRIN metamaterial lens is calculated from the measurement data,
as depicted in Figure 10.35. The area of the horn aperture is used in calculating the
utilization factor of the horn itself, while the cross-sectional area of the GRIN lens
in the x-y plane is used in calculating the utilization factor of the horn antenna
with the GRIN lens.

(a)

(b)

(c)
Figure 10.33 Measured normalized far-field gain patterns in the y-z plane of the circularly polarized
conical horn antenna with (solid line) and without (dashed line) GRIN metamaterial lens
at the operation frequencies: (a) 8.1 GHz, (b) 10.0 GHz, and (c) 11.9 GHz.

Given an upper bound of 0.522 [65] for the utilization factor (i.e., aperture
efficiency) of the optimal circular horn antenna, one easily concludes that the horn
antenna in the experiment is far from optimal, but more importantly, that the
designed GRIN metamaterial lens is capable of increasing the utilization factor
significantly, not to mention the peak values well above the upper bound. It is
worth noting that, in the calculation of the aperture efficiency of the GRIN lens
antenna, we use the aperture dimension of the GRIN lens rather than that of the
horn antenna in order to obtain convincing results.
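
The effect of the chosen reference area can be illustrated with the textbook definition of aperture efficiency, namely the gain divided by the directivity 4πA/λ² of a uniformly illuminated aperture of area A. The sketch below assumes that this is the definition behind the utilization factor; the aperture radii are those of the horn and the lens, while the 20 dBi gain is a hypothetical value rather than measured data.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def aperture_efficiency(gain_dbi, freq_hz, aperture_radius_m):
    """Ratio of the gain to the directivity of a uniformly lit circular aperture."""
    wavelength = C0 / freq_hz
    area = math.pi * aperture_radius_m ** 2
    d_max = 4.0 * math.pi * area / wavelength ** 2   # uniform-aperture directivity
    return 10.0 ** (gain_dbi / 10.0) / d_max


# Radii from the text; the 20 dBi gain is a hypothetical value for illustration
for label, radius in (("horn aperture, 50 mm", 0.050), ("lens aperture, 60 mm", 0.060)):
    eff = aperture_efficiency(gain_dbi=20.0, freq_hz=10e9, aperture_radius_m=radius)
    print(f"{label}: efficiency = {eff:.3f}")
```

For the same gain, referring the efficiency to the larger lens aperture lowers the result by the area ratio (60/50)² = 1.44, so using the lens dimension is the more conservative choice.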
As intended by the chosen symmetry of both the air holes and hole
distribution, the designed GRIN metamaterial lens is expected to have little impact
on the polarization states of incident waves. To prove this, the axial ratio of the
circularly polarized horn antenna with the GRIN metamaterial lens is analyzed and
compared to the corresponding ratio of the bare feeding horn antenna. The
measured axial ratio within a range of operation frequency covering the entire X-
band is shown in Figure 10.36. The unloaded horn antenna emits circularly
polarized radiation with an axial ratio lower than 1.5 dB in the entire operation
bandwidth, whereas the inclusion of the GRIN metamaterial lens degrades the
axial ratio only in the subrange between 9.7 GHz and 11.3 GHz with a maximum
value below 1.6 dB. Another characteristic measure for the quality of circular
polarization is the polarization efficiency, defined as

$$\eta_p = \frac{P_{\mathrm{co}}}{P_{\mathrm{co}} + P_{\mathrm{cross}}} \qquad (10.49)$$

where Pco and Pcross are the power of copolarization and cross-polarization,
respectively. The resulting minimum value within the whole X-band amounts to
99.2% for the radiation field emitted after the GRIN metamaterial lens, proving a
high degree of purity of the output circular polarization state.
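
The axial ratio and the polarization efficiency of (10.49) are linked through the standard decomposition of an elliptically polarized wave into its two circular components, for which the cross-polarized to copolarized field ratio is (AR − 1)/(AR + 1) with AR expressed as a linear field ratio. The sketch below applies this relation; it is an illustrative calculation, not the procedure used to process the measured data.

```python
import math


def polarization_efficiency_from_axial_ratio(ar_db):
    """Polarization efficiency P_co / (P_co + P_cross) for an axial ratio in dB."""
    ar = 10.0 ** (ar_db / 20.0)                      # axial ratio as a field ratio (>= 1)
    cross_over_co = ((ar - 1.0) / (ar + 1.0)) ** 2   # P_cross / P_co
    return 1.0 / (1.0 + cross_over_co)


for ar_db in (0.0, 1.5, 1.6, 3.0):
    eta = polarization_efficiency_from_axial_ratio(ar_db)
    print(f"AR = {ar_db:.1f} dB -> polarization efficiency = {100.0 * eta:.2f}%")
```

An axial ratio of 1.6 dB corresponds to a polarization efficiency of roughly 99.2%, in line with the minimum value quoted above.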

Figure 10.34 Frequency variation of the measured maximum gain of the circularly polarized conical
antenna with (solid line) and without (dashed line) GRIN metamaterial lens.

Figure 10.35 Frequency variation of the measured utilization coefficient (i.e., aperture efficiency) for
the horn antenna with (solid line) and without (dashed line) GRIN metamaterial lens.

Figure 10.36 Frequency variation of the measured axial ratio of the circularly polarized horn antenna
with (solid line) and without (dashed line) GRIN metamaterial lens.

10.5 CONCLUSIONS

In this chapter, methods and applications of metamaterials have been reviewed. The
theory of transform optics is first summarized, and then the electromagnetic
concentrator and the waveguide connector are presented based on transform
optics. Both numerical experiments and measurements have validated the
metamaterial theory and design.

REFERENCES

[1] V. Veselago, “The Electrodynamics of Substances with Simultaneously Negative Values of ε and
µ,” Soviet Physics Uspekhi, Vol. 10, No. 4, pp. 509–514, 1968.
[2] J. Pendry, A. Holden, W. Stewart, and I. Youngs, “Extremely Low Frequency Plasmons in
Metallic Mesostructures,” Physical Review Letters, Vol. 76, pp. 4773–4776, 1996.
[3] J. Pendry, A. Holden, D. Robbins, and W. Stewart, “Magnetism from Conductors and Enhanced
Nonlinear Phenomena,” IEEE Transactions on Microwave Theory and Techniques, Vol. 47, pp.
2075–2084, 1999.
[4] S. Enoch, G. Tayeb, P. Sabouroux, N. Guérin, and P. Vincent, “A Metamaterial for Directive
Emission,” Physical Review Letters, Vol. 89, p. 213902, 2002.
[5] D. Schurig, J. Mock, B. Justice, S. Cummer, and J. Pendry, “Metamaterial Electromagnetic Cloak
at Microwave Frequencies,” Science, Vol. 314, p. 977, 2006.
[6] J. Pendry, “Negative Refraction Makes a Perfect Lens,” Physical Review Letters, Vol. 85, p.
3966, 2000.
[7] K. Zhang, F. Meng, Q. Wu, J. Fu, and L. Li, “Waveguide Connector Constructed by Normal
Layered Dielectric Materials Based on Embedded Optical Transformation,” EPL, Vol. 99, p.
47008, 2012.
[8] H. Ma and T. Cui, “Three-Dimensional Broadband and Broad-Angle Transformation-Optics
Lens,” Nature Communications, Vol. 1, p. 124, 2010.
[9] T. Driscoll, G. Lipworth, J. Hunt, N. Landy, N. Kundtz, D. Basov, and D. Smith, “Performance
of a Three Dimensional Transformation-Optical-Flattened Lüneburg Lens,” Optics Express, Vol.
20, pp. 13262–13273, 2012.
[10] L. Cong, W. Cao, Z. Tian, J. Gu, J. Han, and W. Zhang, “Manipulating Polarization States of
Terahertz Radiation Using Metamaterials,” New Journal of Physics, Vol. 14, p. 115013, 2012.
[11] K. Zhang, Q. Wu, J. Fu, and L. Li, “Cylindrical Electromagnetic Concentrator with Only Axial
Constitutive Parameter Spatially Variant,” Journal of the Optical Society of America B, Vol. 28,
pp. 1573–1577, 2011.
[12] J. B. Pendry, “A Chiral Route to Negative Refraction,” Science, Vol. 306, pp. 1353–1355, 2004.
[13] B. Andres-Garcia, L. Garcia-Munoz, V. Gonzalez-Posadas, F. Herraiz-Martinez, and D. Segovia-
Vargas, “Filtering Lens Structure Based on SRRs in the Low THz Band,” Progress in
Electromagnetics Research, Vol. 93, pp. 71–90, 2009.
[14] L. Huang and H. Chen, “Multi-Band and Polarization Insensitive Metamaterial Absorber,”
Progress in Electromagnetics Research, Vol. 113, pp. 103–110, 2011.
[15] J. Pendry, “Negative Refraction Makes a Perfect Lens,” Physical Review Letters, Vol. 85, No. 18,
pp. 3966–3969, 2000.
[16] H. Chen, B. Hou, S. Chen, X. Ao, W. Wen, and C. Chan, “Design and Experimental Realization
of a Broadband Transformation Media Field Rotator at Microwave Frequencies,” Physical
Review Letters, Vol. 102, No. 18, pp. 183903:1–4, 2009.
[17] C. Lim and T. Itoh, “A Reflecto-Directive System Using a Composite Right/Left-Handed (CRLH)
Leaky-Wave Antenna and Hetero-Dyne Mixing,” IEEE Microwave and Wireless Components
Letters, Vol. 14, No. 4, pp. 183–185, 2004.
[18] H. Attia, M. Bait-Suwailam, and O. Ramahi, “Enhanced Gain Planar Inverted-F Antenna with
Metamaterial Superstrate for UMTS Applications,” Progress in Electromagnetics Research
Symposium Proceedings, Cambridge, pp. 494–497, 2010.
[19] H. Bahrami, M. Hakkak, and A. Pirhadi, “Analysis and Design of Highly Compact Bandpass
Waveguide Filter Utilizing Complementary Split Ring Resonators (CSRR),” Progress in
Electromagnetics Research, Vol. 80, pp. 107–122, 2008.
[20] S. Enoch, G. Tayeb, P. Sabouroux, N. Guérin, and P. Vincent, “A Metamaterial for Directive
Emission,” Physical Review Letters, Vol. 89, No. 21, pp. 213902:1–4, 2002.
[21] R. Ziolkowski, “Propagation in and Scattering from a Matched Metamaterial Having a Zero
Index of Refraction,” Physical Review E – Statistical, Nonlinear, and Soft Matter Physics, Vol.
70, No. 42, pp. 046608:1–4, 2004.
[22] Q. Wu, P. Pan, F. Meng, L. Li, and J. Wu, “A Novel Flat Lens Horn Antenna Designed Based on
Zero Refraction Principle of Metamaterials,” Applied Physics A – Materials Science and
Processing, Vol. 87, No. 2, pp. 151–156, 2007.
[23] Z. Xiao and H. Xu, “Low Refractive Metamaterials for Gain Enhancement of Horn Antenna,”
Journal of Infrared and Millimeter Waves, Vol. 30, pp. 225–232, 2009.
[24] D. Kim and J. Choi, “Analysis of Antenna Gain Enhancement with a New Planar Metamaterial
Superstrate: An Effective Medium and a Fabry-Pérot Resonance Approach,” Journal of Infrared
Millimeter and Terahertz Waves, Vol. 31, No. 11, pp. 1289–1303, 2010.
[25] S. Hrabar, D. Bonefacic, and D. Muha, “ENZ-Based Shortened Horn Antenna - An Experimental
Study,” Antennas and Propagation Society International Symposium, San Diego, CA, pp. 1–4,
2008.
[26] J. Ju, D. Kim, W. Lee, and J. Choi, “Wideband High-Gain Antenna Using Metamaterial
Superstrate with the Zero Refractive Index,” Microwave and Optical Technology Letters, Vol. 51,
No. 8, pp. 1973–1976, 2009.
[27] Q. Cheng, W. Jiang, and T. Cui, “Radiation of Planar Electromagnetic Waves by a Line Source
in Anisotropic Metamaterials,” Journal of Physics D-Applied Physics, Vol. 43, No. 33, pp.
335446:1–6, 2010.
[28] Y. Ma, P. Wang, X. Chen, and C. Ong, “Near-Field Plane-Wave-Like Beam Emitting Antenna
Fabricated by Anisotropic Metamaterial,” Applied Physics Letters, Vol. 94, No. 4, pp.
044107:1–3, 2009.
[29] Z. Jiang and D. Werner, “Anisotropic Metamaterial Lens with a Monopole Feed for High-Gain
Multi-Beam Radiation,” IEEE International Symposium on Antennas and Propagation, pp.
1346–1349, Spokane, WA, 2011.
[30] Z. Weng, Y. Jiao, G. Zhao, and F. Zhang, “Design and Experiment of One Dimension and Two
Dimension Metamaterial Structures for Directive Emission,” Progress in Electromagnetics
Research, Vol. 70, pp. 199–209, 2007.
[31] Z. Weng, X. Wang, Y. Song, Y. Jiao, and F. Zhang, “A Directive Patch Antenna with Arbitrary
Ring Aperture Lattice Metamaterial Structure,” Journal of Electromagnetic Waves and
Applications, Vol. 22, No. 8–9, pp. 1283–1291, 2008.
[32] R. Sauleau, P. Coquet, T. Matsui, and J. Daniel, “A New Concept of Focusing Antennas Using
Plane-Parallel Fabry-Perot Cavities with Nonuniform Mirrors,” IEEE Transactions on Antennas
and Propagation, Vol. 51, No. 11, pp. 3171–3175, 2003.
[33] D. Smith, S. Schultz, S. McCall, and P. Platzmann, “Defect Studies in a 2-Dimensional Periodic
Photonic Lattice,” Journal of Modern Optics, Vol. 41, No. 2, pp. 395–404, 1994.
[34] D. Kaklamani, “Full-Wave Analysis of a Fabry-Perot Type Resonator,” Journal of
Electromagnetic Waves and Applications, Vol. 13, No. 12, pp. 1627–1634, 1999.
[35] B. Zhou, H. Li, X. Zou, and T. Cui, “Broadband and High-Gain Planar Vivaldi Antennas Based
on Inhomogeneous Anisotropic Zero-Index Metamaterials,” Progress in Electromagnetics
Research, Vol. 120, pp. 235–247, 2011.
[36] Q. Wu, J. Turpin, D. Werner, and E. Lier, “Thin Metamaterial Lens for Directive Radiation,”
IEEE International Symposium on Antennas and Propagation, Spokane, WA, pp. 2886–2889,
2011.
[37] J. Turpin, Q. Wu, D. Werner, E. Lier, B. Martin, and M. Bray, “Anisotropic Metamaterial
Realization of a Flat Gain-enhancing Lens for Antenna Applications,” IEEE International
Symposium on Antennas and Propagation, pp. 2882–2885, Spokane, WA, 2011.
[38] Z. Mei, J. Bai, T. Niu, and T. Cui, “A Half Maxwell Fish-Eye Lens Antenna Based on Gradient-
Index Metamaterials,” IEEE Transactions on Antennas and Propagation, Vol. 60, No. 1, pp.
398–401, 2012.
[39] Y. Zhang, R. Mittra, and W. Hong, “On the Synthesis of a Flat Lens Using a Wideband Low-
Refraction Gradient-Index Metamaterial,” Journal of Electromagnetic Waves and Applications,
Vol. 25, No. 16, pp. 2178–2187, 2011.
[40] J. Neu, B. Krolla, O. Paul, B. Reinhard, R. Beigang, and M. Rahm, “Metamaterial-Based
Gradient Index Lens with Strong Focusing in the THz Frequency Range,” Optics Express, Vol.
18, No. 26, pp. 27748–27757, 2010.
[41] J. Pendry, A. Holden, W. Stewart, and I. Youngs, “Extremely Low Frequency Plasmons in
Metallic Mesostructures,” Physical Review Letters, Vol. 76, No. 25, pp. 4773–4776, 1996.
[42] Q. Tang, F. Meng, Q. Wu, and J. Lee, “A Balanced Composite Backward and Forward Compact
Waveguide Based on Resonant Metamaterials,” Journal of Applied Physics, Vol. 109, No. 7, pp.
07A319:1–3, 2011.
[43] F. Meng, Q. Wu, D. Erni, and L. Li, “Controllable Metamaterial-Loaded Waveguides Supporting
Backward and Forward Waves,” IEEE Transactions on Antennas and Propagation, Vol. 59, No.
9, pp. 3400–3411, 2011.
[44] R. Ziolkowski, “Design, Fabrication, and Testing of Double Negative Metamaterials,” IEEE
Transactions on Antennas and Propagation, Vol. 51, No. 7, pp. 1516–1529, 2003.
[45] D. Smith, J. Mock, A. Starr, and D. Schurig, “Gradient Index Metamaterials,” Physical Review E,
Vol. 71, pp. 036609:1–5, 2005.
[46] T. Driscoll, D. Basov, A. Starr, P. Rye, S. Nemat-Nasser, and D. Schurig et al., “Free-Space
Microwave Focusing by a Negative-Index Gradient Lens,” Applied Physics Letters, Vol. 88, pp.
081101:1–3, 2006.
[47] M. Goldflam, T. Driscoll, B. Chapler, O. Khatib, N. Jokerst, and S. Palit et al., “Reconfigurable
Gradient Index Using VO2 Memory Metamaterials,” Applied Physics Letters, Vol. 99, pp.
044103:1–3, 2011.
[48] O. Paul, B. Reinhard, B. Krolla, R. Beigang, and M. Rahm, “Gradient Index Metamaterial Based on
Slot Elements,” Applied Physics Letters, Vol. 96, pp. 241110:1–3, 2010.
[49] L. Ruopeng, C. Qiang, J. Chin, J. Mock, T. Cui, and D. Smith, “Broadband Gradient Index
Microwave Quasioptical Elements Based on Non-Resonant Metamaterials,” Optics Express, Vol.
17, pp. 21030–21041, 2009.
[50] L. Ruopeng, Y. Mi, J. Gollub, J. Mock, T. Cui, and D. Smith, “Gradient Index Circuit by
Waveguided Metamaterials,” Applied Physics Letters, Vol. 94, pp. 073506:1–3, 2009.
[51] D. Smith, Y. Tsai, and S. Larouche, “Analysis of a Gradient Index Metamaterial Blazed
Diffraction Grating,” IEEE Antennas and Wireless Propagation Letters, Vol. 10, pp. 1605–1608,
2011.
[52] Y. Xin Mi, Z. Xiao Yang, C. Qiang, M. Feng, and T. Cui, “Diffuse Reflections by Randomly
Gradient Index Metamaterials,” Optics Letters, Vol. 35, pp. 808–810, 2010.
[53] L. Zhen, Q. Rui, and C. Zhen, “A Novel Broadband Fabry-Perot Resonator Antenna with
Gradient Index Metamaterial Superstrate,” IEEE International Symposium Antennas and
Propagation and CNC-USNC/URSI Radio Science Meeting, Toronto, pp. 1–4, 2010.
[54] M. Zhong and T. Cui, “Experimental Realization of a Broadband Bend Structure Using Gradient
Index Metamaterials,” Optics Express, Vol. 17, pp. 18354–18363, 2009.
[55] X. Chen, H. Ma, X. Zou, W. Jiang, and T. Cui, “Three-Dimensional Broadband and High-
Directivity Lens Antenna Made of Metamaterials,” Journal of Applied Physics, Vol. 110, pp.
044904:1–8, 2011.
[56] H. Ma, X. Chen, H. Xu, X. Yang, W. Jiang, and T. Cui, “Experiments on High-Performance
Beam-Scanning Antennas Made of Gradient-Index Metamaterials,” Applied Physics Letters, Vol.
95, pp. 094107:1–3, 2009.
[57] Z. Mei, J. Bai, and T. Cui, “Gradient Index Metamaterials Realized by Drilling Hole Arrays,”
Journal of Physics D-Applied Physics, Vol. 43, pp. 055404:1–6, 2010.
[58] H. Ma and T. Cui, “Three-Dimensional Broadband and Broad-Angle Transformation-Optics
Lens,” Nature Communications, Vol. 1, pp. 124:1–6, 2010.
[59] B. Zhou, Y. Yang, H. Li, and T. Cui, “Beam-Steering Vivaldi Antenna Based on Partial
Luneburg Lens Constructed with Composite Materials,” Journal of Applied Physics, Vol. 110, pp.
084908:1–6, 2011.
[60] H. Ma and T. Cui, “Three-Dimensional Broadband Ground-Plane Cloak Made of Metamaterials,”
Nature Communications, Vol. 1, pp. 21:1–6, 2010.
[61] L. Zhijia, S. Yang, and Z. Nie, “A Dielectric Lens Antenna Design by Using the Effective
Medium Theories,” International Symposium on Intelligent Signal Processing and
Communication Systems, Chengdu, China, pp. 1–4, 2010.
[62] A. Ittipiboon and S. Thirakoune, “Investigation on Arrays of Perforated Dielectric Fresnel
Lenses,” IEE Proceedings Microwaves, Antennas and Propagation, Vol. 153, pp. 270–276, 2006.
[63] F. Meng, Q. Wu, D. Erni, and L. Li, “Controllable Metamaterial-Loaded Waveguides Supporting
Backward and Forward Waves,” IEEE Transactions on Antennas and Propagation, Vol. 59, pp.
3400–3411, 2011.
[64] H. Ma and T. Cui, “Three-Dimensional Broadband and Broad-Angle Transformation-Optics
Lens,” Nature Communications, Vol. 1, pp. 124:1–6, 2010.
[65] T. Teshirogi and T. Yoneyama (eds.), Modern Millimeter-Wave Technologies, Burke, VA: IOS
Press, 2001.
Chapter 11
Time-Domain Integral Equation Method for
Transient Problems
Mingyao Xia

This chapter is concerned with the time-domain integral equation (TDIE) method
for solving transient problems. Following a brief introduction to the approach,
various integral equations are derived from the equivalence principle, retarded
potential theory, and boundary conditions. Discretization schemes are then described,
including geometric meshing and the mathematical steps that convert the continuous
operator equations into discrete linear systems. Emphasis is placed on the precise
evaluation of matrix elements, which is crucial for stability and accuracy. As an
extension, the method is applied to transient scattering by an arbitrarily moving
body, which may travel at hypervelocity and rotate simultaneously about a center.
Numerous numerical results are provided for both algorithmic validation and real-
world applications.

11.1 INTRODUCTION

Most of today's major computational electromagnetics methods, including MoM, FDTD, and
FEM, were established in the late 1960s as computers became available. The
TDIE method was also introduced during that period by Bennett and others [1–6].
A historical review of TDIE work prior to 1976 was presented in [7].
TDIE methods were studied extensively in the 1980s and 1990s.
Instability drew a great deal of attention during this period, because the
marching-on-in-time (MOT) solutions of the discretized TDIE equations were
prone to instability. Many attempts were made to identify the causes and provide
remedies [8–13]. However, the averaging scheme [14] seemed to be the sole
effective technique; it solved the time-stepping equation by replacing the
current time-step solution with its averaged value over several earlier time steps.
Early discretizing methods for TDIE equations using polygonal patches mainly
adopted the so-called explicit procedure [15], which did not need to solve a matrix
equation at each time step. This was achieved by choosing a small time step
and point-matching the equations in both the spatial and temporal domains, so
that the field at the matched space-time point had two contributions:
one from the source on the observer’s patch at the current time step, and one from the
sources on the other patches at earlier time steps. In other words, the
contributions from the sources on other patches at the current time step were
excluded because of the finite propagation speed. Unfortunately, this approach
eventually diverged unless an extra stabilizing measure was introduced.
Stable solutions could be achieved by choosing a larger time step, which
led to the so-called implicit procedure [16], in which a sparse matrix
equation had to be solved at each time step. However, the results might be inaccurate. Another way
to extract stable solutions was to use entire-domain matching [17] rather than
point-matching or the MOT scheme, at the cost of increased storage
and computing time. It might be said that TDIE methods were plagued
to a large extent by instability until the end of the last century, so that they missed
the key period in which FDTD was popularized. Fortunately,
significant progress was made in reducing the computational complexity for large-
scale problems, namely the development of the fast plane wave time domain (PWTD)
algorithm [18], which is the time-domain counterpart of the fast multipole
method (FMM) in the frequency domain.
Since the turn of this century, two important techniques have been introduced
to overcome instability and generate accurate TDIE solutions. The first technique
is the proper choice of temporal basis functions, which can postpone the
onset of instability. Such functions include the squared cosine function [19],
the approximate prolate spheroidal wave function [20], the higher-order Lagrange
interpolating function [21], and the quadratic B-spline function [22]. Another choice
is the Laguerre polynomials, which lead to the marching-on-in-degree (MOD)
scheme [23] rather than the classical MOT scheme. The second technique
is the precise evaluation of matrix elements [24], which has proved
very effective in improving stability and accuracy. Precise calculations of matrix
elements for wire structures [25, 26], 2-D configurations [27, 28], and general 3-D
scatterers [29, 30] have been reported. It seems clear that the earlier instability and
inaccuracy of the TDIE methods were caused by inaccurate evaluation of matrix
elements, in addition to improper choice or manipulation of temporal basis functions. It
is expected that proper choice of temporal basis functions and precise evaluation of
matrix elements, in conjunction with the fast PWTD algorithms, will make
TDIE solvers attractive tools for simulating various transient or ultrawideband
problems.
This chapter gives a basic description of the TDIE methods, from the derivation of the
governing equations to their discretization as time-stepping linear systems, with
emphasis placed on the precise evaluation of matrix elements. An extension to
the simulation of scattering by moving objects is presented. Numerical
examples are provided as benchmarks.

11.2 DERIVATIONS OF TIME-DOMAIN INTEGRAL EQUATIONS

For integral equation (IE) methods, the unknown functions to be solved for are the
sources, including real electric currents, equivalent electric currents, equivalent
magnetic currents, equivalent electric dipoles, or other equivalent sources. These
sources are distributed over limited regions, such as conducting wires, conducting
or dielectric surfaces, or finite volumes, and are accordingly called wire sources,
surface sources, or volume sources. The electromagnetic fields (EMFs) generated by the
sources are generally expressed in integral form through a Green's function, which is the
solution for a point source under the same boundary conditions as the problem.
Because we can assume equivalent sources, any physical
interface can be removed and replaced by an equivalent surface source. The same is
true for a volume filled with any medium: the region can be replaced by any other
material plus an equivalent volume source. Therefore, if we like, we can replace any
geometric structure with equivalent sources and treat the problem in
free space, so that the free-space solutions of Maxwell's equations apply.
By assuming equivalent sources, the EMFs are expressed in terms of the sources in
integral form. To determine the source distributions, the fields are required to
satisfy the boundary conditions. Doing so yields a set of integral
equations (IEs), which are exactly the governing equations that we have to solve.
Once the IEs are solved and the sources are obtained, the EMFs at any space-time
point can be found from the integral expressions. Postprocessing may then
follow, using the computed EMFs.
This outlines the complete process of an IE method, which the
TDIE method also follows. In this section, we concentrate on deriving the
various governing integral equations based on the equivalence principle described
above. Discretization and solution of these TDIEs are left to later sections.

11.2.1 Integral Equations for the 3-D PEC Object

We start by considering transient scattering from a closed 3-D perfect
electric conductor (PEC) residing in free space, as shown in Figure 11.1. The
induced current and charge on the surface are denoted by $\mathbf{J}_s$ and $\rho_s$, respectively,
which must satisfy the continuity equation

$$\nabla'_s \cdot \mathbf{J}_s(\mathbf{r}', t') + \frac{\partial}{\partial t'} \rho_s(\mathbf{r}', t') = 0 \tag{11.1}$$

where the primed variables are associated with the source space-time on the object
surface. The other quantities are associated with the observation space-time.

Figure 11.1 Configuration of transient scattering by a 3-D PEC object.

The retarded potentials due to the induced sources are

$$\mathbf{A}(\mathbf{r}, t) = \mu_0 \int_S \frac{\mathbf{J}_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS', \quad R = |\mathbf{r} - \mathbf{r}'| \tag{11.2}$$

$$\phi(\mathbf{r}, t) = \frac{1}{\varepsilon_0} \int_S \frac{\rho_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' \tag{11.3}$$

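in which $\varepsilon_0$ and $\mu_0$ are the permittivity and permeability of free space and
$c = 1/\sqrt{\varepsilon_0 \mu_0}$ is the speed of light in free space. The scattered fields are obtained from
the retarded potentials as

$$\mathbf{E}^s = -\frac{\partial \mathbf{A}}{\partial t} - \nabla\phi = -\eta_0 \mathcal{L}_0(\mathbf{J}_s) \tag{11.4}$$

$$\mathbf{H}^s = \frac{1}{\mu_0} \nabla \times \mathbf{A} = -\mathcal{K}_0(\mathbf{J}_s) \tag{11.5}$$

where $\eta_0 = \sqrt{\mu_0 / \varepsilon_0}$ is the intrinsic impedance of free space, and the two operators
are defined as

$$\mathcal{L}_0(\mathbf{J}_s) = \frac{1}{c}\frac{\partial}{\partial t} \int_S \frac{\mathbf{J}_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' + \nabla \int_S \frac{c\,\rho_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' \tag{11.6}$$

$$\mathcal{K}_0(\mathbf{J}_s) = \int_S \frac{\mathbf{R}}{R} \times \left( \frac{1}{R} + \frac{1}{c}\frac{\partial}{\partial t} \right) \frac{\mathbf{J}_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' \tag{11.7}$$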

The $\rho_s(\mathbf{r}', t - R/c)$ in (11.6) is implicitly related to $\mathbf{J}_s(\mathbf{r}', t')$ by virtue of
(11.1). Note that the operator $\mathcal{L}_0$ remains integrable as $\mathbf{r} \to \mathbf{r}'$; however,
the operator $\mathcal{K}_0$ contains a singularity that should be extracted explicitly as
$\mathbf{r} \to \mathbf{r}'$, and the result is

$$\mathcal{K}_0(\mathbf{J}_s) = \tilde{\mathcal{K}}_0(\mathbf{J}_s) \pm \frac{1}{2}\, \hat{\mathbf{n}} \times \mathbf{J}_s, \quad \mathbf{r} \in S^{\pm} \tag{11.8}$$

where $\tilde{\mathcal{K}}_0$ is the principal value part of $\mathcal{K}_0$, and $S^{\pm}$ denotes the outside/inside
surface of the scatterer.
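
As a rough numerical illustration of the retarded potential (11.2), the sketch below replaces the surface integral with a one-point rule per patch. The patch list layout, the `current` callback, and the Gaussian pulse are hypothetical choices made only for this example; they are not part of the formulation above.

```python
import math

EPS0 = 8.8541878128e-12              # permittivity of free space (F/m)
MU0 = 4.0e-7 * math.pi               # permeability of free space (H/m), classical value
C0 = 1.0 / math.sqrt(EPS0 * MU0)     # speed of light in free space (m/s)

def retarded_vector_potential(r, t, patches, current):
    """Approximate A(r, t) of (11.2) with one sample point per patch.

    patches : list of (centre, area), centre being a 3-tuple (hypothetical layout)
    current : function (patch_index, time) -> 3-tuple surface current density
    """
    a = [0.0, 0.0, 0.0]
    for k, (centre, area) in enumerate(patches):
        dist = math.dist(r, centre)                 # R = |r - r'|
        j = current(k, t - dist / C0)               # source sampled at the retarded time
        w = MU0 * area / (4.0 * math.pi * dist)
        for axis in range(3):
            a[axis] += w * j[axis]
    return a

def gaussian_pulse(time, centre=5e-9, width=1e-9):
    """Hypothetical time signature used only for this example."""
    return math.exp(-((time - centre) / width) ** 2)

# One z-directed patch at the origin, observed 0.3 m away at the instant
# when the peak of the pulse arrives.
patches = [((0.0, 0.0, 0.0), 1e-4)]
current = lambda k, time: (0.0, 0.0, gaussian_pulse(time))
obs = (0.3, 0.0, 0.0)
t_obs = 5e-9 + 0.3 / C0
A = retarded_vector_potential(obs, t_obs, patches, current)
```

The sketch is only valid for observation points away from the patches; when the observation point lies on the surface, the singular behavior discussed around (11.8) has to be handled analytically.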
The integral equations are established by enforcing the boundary conditions.
The electric and magnetic field boundary conditions for a PEC object read

$$\hat{\mathbf{n}} \times (\mathbf{E}^i + \mathbf{E}^s) = 0 \tag{11.9}$$

$$\hat{\mathbf{n}} \times (\mathbf{H}^i + \mathbf{H}^s) = \mathbf{J}_s \tag{11.10}$$

where $\mathbf{E}^i$ and $\mathbf{H}^i$ are the incident electric and magnetic fields. Substituting (11.4)
through (11.8) into the above, we have

$$\hat{\mathbf{n}} \times \left[ \frac{1}{c}\frac{\partial}{\partial t} \int_S \frac{\mathbf{J}_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' + \nabla \int_S \frac{c\,\rho_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' \right] = \frac{1}{\eta_0}\, \hat{\mathbf{n}} \times \mathbf{E}^i(\mathbf{r}, t) \tag{11.11}$$

$$\frac{1}{2}\mathbf{J}_s + \hat{\mathbf{n}} \times \left[ \mathrm{P.V.} \int_S \frac{\mathbf{R}}{R} \times \left( \frac{1}{R} + \frac{1}{c}\frac{\partial}{\partial t} \right) \frac{\mathbf{J}_s(\mathbf{r}', t - R/c)}{4\pi R}\, dS' \right] = \hat{\mathbf{n}} \times \mathbf{H}^i(\mathbf{r}, t) \tag{11.12}$$

where P.V. denotes the principal value integral. Either the electric field
integral equation (EFIE) (11.11) or the magnetic field integral equation (MFIE)
(11.12) can be used to solve for the induced currents on the surface. However, for a
closed body, either the EFIE or the MFIE alone may suffer from the interior resonance
problem, which can cause significant errors near the resonant frequencies; these errors
are clearly visible when the time-domain data are transformed into the
frequency domain. To overcome the resonance issue, the combined field integral
equation (CFIE) should be employed, which combines the EFIE and MFIE
in some way, for example, $\hat{\mathbf{n}} \times \alpha\,\mathrm{EFIE} + (1 - \alpha)\,\mathrm{MFIE}$, with $0 \le \alpha \le 1$ being a
combination parameter.

11.2.2 Integral Equations for 1-D and 2-D PEC Structures

The extension from the 3-D formulation above to the 1-D case is straightforward. However,
usually only the EFIE is used for a wire problem. The boundary condition (11.9) is
rewritten as $[\mathbf{E}^i + \mathbf{E}^s]_{\tan} = 0$, where the subscript “tan” denotes the tangential
component along the wire. The EFIE (11.11) is then rewritten as

$$\left[ \frac{1}{c}\frac{\partial}{\partial t} \int_L \frac{\mathbf{J}(\mathbf{r}', t - R/c)}{4\pi R}\, dl' + \nabla \int_L \frac{c\,\rho(\mathbf{r}', t - R/c)}{4\pi R}\, dl' \right]_{\tan} = \frac{1}{\eta_0} \left[ \mathbf{E}^i(\mathbf{r}, t) \right]_{\tan} \tag{11.13}$$

in which $\mathbf{J}$ and $\rho$ are the line densities of current and charge along the
wire. A wire problem usually implies that the thin-wire approximations apply,
which require the diameter of the wire to be much smaller than both its length and a
properly defined minimum wavelength of the incident wave. The scattering
geometry of a wire problem is shown in Figure 11.2. Under the thin-wire
approximations, the currents and charges are taken to lie on the centerline, and the
distance from a source point to an observation point is approximated by

$$R = \sqrt{a^2 + |\mathbf{r} - \mathbf{r}'|^2} \tag{11.14}$$

where $a$ is the radius of the wire, and $\mathbf{r}'$ lies exactly on the centerline.
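
A minimal sketch of the thin-wire distance (11.14) follows. It shows that the factor $1/(4\pi R)$ in (11.13) stays finite when the source and observation points coincide on the axis, because the radius $a$ bounds $R$ from below. The function names and the numerical values are illustrative assumptions.

```python
import math

def thin_wire_distance(r_obs, r_src, radius):
    """Distance of (11.14): sqrt(a^2 + |r - r'|^2), with r' on the wire axis."""
    return math.sqrt(radius ** 2 + math.dist(r_obs, r_src) ** 2)

def static_kernel(r_obs, r_src, radius):
    """The 1/(4 pi R) factor of (11.13), evaluated with the thin-wire distance."""
    return 1.0 / (4.0 * math.pi * thin_wire_distance(r_obs, r_src, radius))

a = 1.0e-3                                               # wire radius in metres (example value)
self_term = static_kernel((0, 0, 0), (0, 0, 0), a)       # finite even though r = r'
far_term = static_kernel((0, 0, 0.5), (0, 0, 0), a)      # nearly 1/(4 pi * 0.5)
```

For separations much larger than the radius, the correction from $a$ becomes negligible, which is consistent with the requirement that the wire be thin compared with its length and the minimum wavelength.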

Figure 11.2 Configuration of transient scattering by a conducting wire.

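The extension from the 3-D formulation to the 2-D case is also straightforward. The
scattering geometry is shown in Figure 11.3. The body is taken to be infinite in the
z-direction. The retarded potentials (11.2) and (11.3) are now modified as

$$\mathbf{A}(\mathbf{r}, t) = \mu_0 \int_C \int_R^{ct} \frac{\mathbf{J}_s(\mathbf{r}', t - t')}{2\pi \sqrt{(ct')^2 - R^2}}\, d(ct')\, dC' \tag{11.15}$$

$$\phi(\mathbf{r}, t) = \frac{1}{\varepsilon_0} \int_C \int_R^{ct} \frac{\rho_s(\mathbf{r}', t - t')}{2\pi \sqrt{(ct')^2 - R^2}}\, d(ct')\, dC' \tag{11.16}$$

Figure 11.3 Configuration of transient scattering by a 2-D cylinder.

As a result, the EFIE (11.11) and MFIE (11.12) become

$$\hat{\mathbf{n}} \times \frac{1}{2\pi} \left[ \frac{1}{c}\frac{\partial}{\partial t} \int_C \int_R^{ct} \frac{\mathbf{J}_s(\mathbf{r}', t - t')}{\sqrt{(ct')^2 - R^2}}\, d(ct')\, dC' + \nabla_s \int_C \int_R^{ct} \frac{c\,\rho_s(\mathbf{r}', t - t')}{\sqrt{(ct')^2 - R^2}}\, d(ct')\, dC' \right] = \frac{1}{\eta_0}\, \hat{\mathbf{n}} \times \mathbf{E}^i \tag{11.17}$$

$$\frac{1}{2}\mathbf{J}_s(\boldsymbol{\rho}, t) + \hat{\mathbf{n}} \times \frac{1}{2\pi}\, \mathrm{P.V.} \int_C \int_R^{ct} \frac{\mathbf{R}}{R} \times \left( \frac{1}{ct' + R} + \frac{1}{c}\frac{\partial}{\partial t} \right) \frac{\mathbf{J}_s(\boldsymbol{\rho}', t - t')}{\sqrt{(ct')^2 - R^2}}\, d(ct')\, dC' = \hat{\mathbf{n}} \times \mathbf{H}^i \tag{11.18}$$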

11.2.3 Integral Equations for the 3-D Dielectric Body

All the integral equations for PEC scatterers have been derived above. The extension
to a homogeneous lossless dielectric is not difficult. We use a 3-D dielectric
body as an example, as shown in Figure 11.4, and assume distributions of
equivalent electric current $\mathbf{J}_s$ and equivalent magnetic current $\mathbf{J}_{ms}$ on the object
surface. By virtue of the duality principle and the forms of (11.4) and (11.5), the
scattered fields outside the body can be written as

$$\mathbf{E}^s = -\eta_0 \mathcal{L}_0(\mathbf{J}_s) + \mathcal{K}_0(\mathbf{J}_{ms}) \tag{11.19}$$

$$\mathbf{H}^s = -\frac{1}{\eta_0} \mathcal{L}_0(\mathbf{J}_{ms}) - \mathcal{K}_0(\mathbf{J}_s) \tag{11.20}$$

The transmitted fields inside the body can be written similarly as

$$\mathbf{E}^t = -\eta_1 \mathcal{L}_1(\mathbf{J}_s) + \mathcal{K}_1(\mathbf{J}_{ms}) \tag{11.21}$$

$$\mathbf{H}^t = -\frac{1}{\eta_1} \mathcal{L}_1(\mathbf{J}_{ms}) - \mathcal{K}_1(\mathbf{J}_s) \tag{11.22}$$

where $\eta_1 = \sqrt{\mu_1 / \varepsilon_1}$ is the intrinsic impedance of the dielectric body, while the
operators $\mathcal{L}_1$ and $\mathcal{K}_1$ are the same as $\mathcal{L}_0$ and $\mathcal{K}_0$ but with the speed of light
in vacuum, $c$, replaced by that in the dielectric, $c_1$.
The integral equations are again established by using the boundary conditions.
The electric field boundary conditions are

$$\mathbf{J}_{ms} = (\mathbf{E}^i + \mathbf{E}^s) \times \hat{\mathbf{n}}, \quad \mathbf{r} \in S^{+} \tag{11.23}$$

$$\mathbf{J}_{ms} = \mathbf{E}^t \times (-\hat{\mathbf{n}}), \quad \mathbf{r} \in S^{-} \tag{11.24}$$

Figure 11.4 Configuration of transient scattering by a 3-D dielectric body.

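Substituting (11.19) and (11.21) into the equations above and extracting the
singularity as in (11.8), we obtain the EFIEs:

$$\hat{\mathbf{n}} \times \eta_0 \mathcal{L}_0(\mathbf{J}_s) - \frac{1}{2}\mathbf{J}_{ms} - \hat{\mathbf{n}} \times \tilde{\mathcal{K}}_0(\mathbf{J}_{ms}) = \hat{\mathbf{n}} \times \mathbf{E}^i \tag{11.25}$$

$$\hat{\mathbf{n}} \times \eta_1 \mathcal{L}_1(\mathbf{J}_s) + \frac{1}{2}\mathbf{J}_{ms} - \hat{\mathbf{n}} \times \tilde{\mathcal{K}}_1(\mathbf{J}_{ms}) = 0 \tag{11.26}$$

Similarly, by using the magnetic field boundary conditions, we obtain the
MFIEs, which are the dual forms of the EFIEs above:

$$\hat{\mathbf{n}} \times \frac{1}{\eta_0} \mathcal{L}_0(\mathbf{J}_{ms}) + \frac{1}{2}\mathbf{J}_s + \hat{\mathbf{n}} \times \tilde{\mathcal{K}}_0(\mathbf{J}_s) = \hat{\mathbf{n}} \times \mathbf{H}^i \tag{11.27}$$

$$\hat{\mathbf{n}} \times \frac{1}{\eta_1} \mathcal{L}_1(\mathbf{J}_{ms}) - \frac{1}{2}\mathbf{J}_s + \hat{\mathbf{n}} \times \tilde{\mathcal{K}}_1(\mathbf{J}_s) = 0 \tag{11.28}$$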

Using the EFIEs (11.25) and (11.26) or the MFIEs (11.27) and (11.28), the
equivalent currents $\mathbf{J}_s$ and $\mathbf{J}_{ms}$ can be solved for. However, using either the EFIEs or
the MFIEs alone, we may encounter the interior resonance problem, as pointed out in the PEC
case. The reason is that only the electric field or the magnetic
field boundary conditions are used, but not both. Thus, a simple remedy is to combine the
two sets of equations, so that both the electric field and magnetic field boundary
conditions are enforced. Directly adding (11.25) to (11.26) and (11.27) to
(11.28) leads to the Poggio-Miller-Chang-Harrington-Wu-Tsai (PMCHWT)
equations [31]:

$$\hat{\mathbf{n}} \times (\eta_0 \mathcal{L}_0 + \eta_1 \mathcal{L}_1)(\mathbf{J}_s) - \hat{\mathbf{n}} \times (\tilde{\mathcal{K}}_0 + \tilde{\mathcal{K}}_1)(\mathbf{J}_{ms}) = \hat{\mathbf{n}} \times \mathbf{E}^i \tag{11.29}$$

$$\hat{\mathbf{n}} \times \left( \frac{1}{\eta_0} \mathcal{L}_0 + \frac{1}{\eta_1} \mathcal{L}_1 \right)(\mathbf{J}_{ms}) + \hat{\mathbf{n}} \times (\tilde{\mathcal{K}}_0 + \tilde{\mathcal{K}}_1)(\mathbf{J}_s) = \hat{\mathbf{n}} \times \mathbf{H}^i \tag{11.30}$$

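Note that the dominant terms $\tfrac{1}{2}\mathbf{J}_s$ and $\tfrac{1}{2}\mathbf{J}_{ms}$ cancel, so that the
PMCHWT equations above resemble the EFIE in the PEC case, which is of the
first kind. In fact, a variety of combined equations may be constructed
from (11.25) through (11.28). In general, multiplying (11.26) by $p$ and adding it to
(11.25), and multiplying (11.28) by $q$ and adding it to (11.27), we obtain

$$\hat{\mathbf{n}} \times (\eta_0 \mathcal{L}_0 + p\,\eta_1 \mathcal{L}_1)(\mathbf{J}_s) - \frac{1 - p}{2}\mathbf{J}_{ms} - \hat{\mathbf{n}} \times (\tilde{\mathcal{K}}_0 + p\,\tilde{\mathcal{K}}_1)(\mathbf{J}_{ms}) = \hat{\mathbf{n}} \times \mathbf{E}^i \tag{11.31}$$

$$\hat{\mathbf{n}} \times \left( \frac{1}{\eta_0} \mathcal{L}_0 + \frac{q}{\eta_1} \mathcal{L}_1 \right)(\mathbf{J}_{ms}) + \frac{1 - q}{2}\mathbf{J}_s + \hat{\mathbf{n}} \times (\tilde{\mathcal{K}}_0 + q\,\tilde{\mathcal{K}}_1)(\mathbf{J}_s) = \hat{\mathbf{n}} \times \mathbf{H}^i \tag{11.32}$$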

The special choice $p = q = 1$ reduces this set of equations to the
PMCHWT equations (11.29) and (11.30). Another special choice is $p = -\varepsilon_1 / \varepsilon_0$
and $q = -\mu_1 / \mu_0$, which leads to the Müller equations [31]. For this choice, the
dominant terms $\tfrac{1}{2}\mathbf{J}_s$ and $\tfrac{1}{2}\mathbf{J}_{ms}$ are enhanced, so that the Müller equations resemble
the MFIE in the PEC case, which is of the second kind.
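
The effect of the two special choices on the identity terms can be checked with a few lines of arithmetic. The sketch below assumes that the identity terms in (11.31) and (11.32) carry the weights $(1 - p)/2$ and $(1 - q)/2$, and that the Müller choice is $p = -\varepsilon_1/\varepsilon_0$, $q = -\mu_1/\mu_0$; the function name and the example material constants are hypothetical.

```python
def combination_weights(eps_r, mu_r, formulation):
    """Return p, q and the weights (1 - p)/2, (1 - q)/2 of the identity terms.

    eps_r = eps_1/eps_0 and mu_r = mu_1/mu_0 are the relative constants of the body.
    The Muller values assume p = -eps_r and q = -mu_r.
    """
    if formulation == "PMCHWT":
        p, q = 1.0, 1.0
    elif formulation == "Muller":
        p, q = -eps_r, -mu_r
    else:
        raise ValueError("unknown formulation: " + formulation)
    return p, q, 0.5 * (1.0 - p), 0.5 * (1.0 - q)

pmchwt = combination_weights(4.0, 1.0, "PMCHWT")   # identity terms vanish (first kind)
muller = combination_weights(4.0, 1.0, "Muller")   # identity terms grow (second kind)
```

With a relative permittivity of 4, the PMCHWT weights are both zero, whereas the Müller weights are 2.5 and 1.0, which illustrates why the two formulations behave like first-kind and second-kind equations, respectively.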


So far, we have obtained all the governing equations that may be employed to
solve for the electric currents in the PEC case, or the equivalent electric and
magnetic currents in the dielectric case. Theoretically, all these equations give the
same results; however, after discretization, the properties of the resulting linear
systems can differ significantly, and consequently so can the stability and
accuracy.

11.3 DISCRETIZATION OF GOVERNING EQUATIONS

The governing equations derived above must be discretized before they can be
solved numerically. Geometric discretization, also called meshing,
divides the whole solution domain into many small units. Mathematical
discretization converts a continuous operator equation into a
discrete linear system.

11.3.1 Discretization for the Wire Problem

11.3.1.1 Geometric Meshing

As shown in Figure 11.5, a wire is first divided into $N$ segments. A point on the
wire may be expressed by

$$\mathbf{r}(\xi) = \sum_{n=0}^{N} \mathbf{r}_n \Lambda_n(\xi), \quad 0 \le \xi \le N \tag{11.33}$$

where $\mathbf{r}_n$ $(n = 0, 1, \ldots, N)$ are a set of control points on the wire, and $\Lambda_n(\xi)$ are
called shape functions. The classical linear interpolating shape function reads

$$\Lambda(\xi) = \begin{cases} \Lambda^{-}(\xi) = 1 + \xi, & -1 \le \xi \le 0 \\ \Lambda^{+}(\xi) = 1 - \xi, & 0 \le \xi \le 1 \end{cases} \tag{11.34}$$

and $\Lambda_n(\xi) = \Lambda(\xi - n)$. In (11.34), the statement $\Lambda(\xi) = 0$ for $|\xi| > 1$ has been omitted,
and we adopt this convention throughout the chapter. Obviously, the linear
shape function fits the wire with a chain of straight line segments. The segmented
wire is continuous but only piecewise smooth: its first-order derivative is discontinuous
at the control points. Higher-order derivatives are not available.
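
The interpolation (11.33) with the linear shape function (11.34) can be sketched in a few lines. The control points below are an arbitrary example, and the function names are hypothetical.

```python
def linear_shape(x):
    """Linear shape function of (11.34); zero outside [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

def wire_point(xi, control_points):
    """Evaluate (11.33): r(xi) = sum over n of r_n * shape(xi - n), 0 <= xi <= N."""
    point = [0.0, 0.0, 0.0]
    for n, r_n in enumerate(control_points):
        w = linear_shape(xi - n)
        for axis in range(3):
            point[axis] += w * r_n[axis]
    return point

# An L-shaped wire with N = 2 segments (example geometry).
control_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
at_node = wire_point(1.0, control_points)       # reproduces the control point r_1
mid_second = wire_point(1.5, control_points)    # midpoint of the second segment
```

At integer values of the parameter only one shape function is nonzero, so the curve passes through every control point; between nodes, exactly two shape functions contribute.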

Figure 11.5 Segmentation of a conducting wire.

To improve smoothness, higher-order spatial functions may be employed.
However, with higher-order spatial functions, the control points $\mathbf{r}_n$ $(n = 0, 1, \ldots, N)$
cannot all be located on the wire itself, which is inconvenient. Instead, we may
use a pair of shape functions for a set of given positions and their derivatives on
the wire:

$$\mathbf{r}(\xi) = \sum_{n=0}^{N} \mathbf{r}_n \Lambda_n(\xi) + \sum_{n=0}^{N} \mathbf{r}'_n \Psi_n(\xi), \quad 0 \le \xi \le N \tag{11.35}$$

where $\mathbf{r}'_n = \partial \mathbf{r} / \partial \xi \,|_{\xi = n}$, $\Lambda_n(\xi) = \Lambda(\xi - n)$, and $\Psi_n(\xi) = \Psi(\xi - n)$. As an example,
the following pair of basis functions may be adopted:

$$\Lambda(\xi) = \begin{cases} \Lambda^{-}(\xi) = (1 + \xi)(1 - \xi - 2\xi^2), & -1 \le \xi \le 0 \\ \Lambda^{+}(\xi) = (1 - \xi)(1 + \xi - 2\xi^2), & 0 \le \xi \le 1 \end{cases} \tag{11.36}$$

$$\Psi(\xi) = \begin{cases} \Psi^{-}(\xi) = \xi (1 + \xi)^2, & -1 \le \xi \le 0 \\ \Psi^{+}(\xi) = \xi (1 - \xi)^2, & 0 \le \xi \le 1 \end{cases} \tag{11.37}$$

It is easy to show that

$$\Lambda(0) = 1, \quad \Lambda(\pm 1) = 0, \quad \Lambda'(0) = \Lambda'(\pm 1) = 0 \tag{11.38}$$

$$\Psi'(0) = 1, \quad \Psi'(\pm 1) = 0, \quad \Psi(0) = \Psi(\pm 1) = 0 \tag{11.39}$$

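The tangential vector of the wire and the unit tangential vector can
be defined as

$$\mathbf{s}(\xi) = \frac{\partial \mathbf{r}(\xi)}{\partial \xi} = \mathbf{r}'(\xi), \quad \hat{\mathbf{s}}(\xi) = \frac{\mathbf{r}'(\xi)}{|\mathbf{r}'(\xi)|} \tag{11.40}$$

An elemental length is defined by

$$dl = |\mathbf{s}(\xi)|\, d\xi \tag{11.41}$$

Substituting (11.33) or (11.35) into (11.40), the direction vector can be
calculated. Specifically, for the linear shape function, we have

$$\mathbf{s}(\xi) = \sum_{n=1}^{N} \mathbf{d}_n(\xi) \tag{11.42}$$

$$\mathbf{d}_n(\xi) = \mathbf{r}_{n-1} \frac{\partial \Lambda_{n-1}^{+}(\xi)}{\partial \xi} + \mathbf{r}_n \frac{\partial \Lambda_n^{-}(\xi)}{\partial \xi} = \mathbf{r}_n - \mathbf{r}_{n-1}, \quad n - 1 \le \xi \le n \tag{11.43}$$

The lengths of a segment and of the whole wire are given by

$$l_n = \int_{n-1}^{n} |\mathbf{d}_n(\xi)|\, d\xi = |\mathbf{r}_n - \mathbf{r}_{n-1}|, \quad L = \int_0^N |\mathbf{s}(\xi)|\, d\xi = \sum_{n=1}^{N} l_n \tag{11.44}$$

If the higher-order shape functions are adopted, the direction vector and the
segment length can be calculated in the same way.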
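
For the higher-order case, a minimal sketch of (11.35) through (11.37) is given below. The segment length is approximated by summing short chords between sampled curve points, which is a simple stand-in for integrating $|\mathbf{s}(\xi)|$ as in (11.41) and (11.44). The function names, the sampling density, and the quarter-circle example are assumptions made for illustration.

```python
import math

def shape_value(x):
    """Position shape function of (11.36); zero outside [-1, 1]."""
    if -1.0 <= x <= 0.0:
        return (1.0 + x) * (1.0 - x - 2.0 * x * x)
    if 0.0 < x <= 1.0:
        return (1.0 - x) * (1.0 + x - 2.0 * x * x)
    return 0.0

def shape_slope(x):
    """Derivative shape function of (11.37); zero outside [-1, 1]."""
    if -1.0 <= x <= 0.0:
        return x * (1.0 + x) ** 2
    if 0.0 < x <= 1.0:
        return x * (1.0 - x) ** 2
    return 0.0

def curve_point(xi, points, tangents):
    """Evaluate (11.35) from nodal positions r_n and nodal derivatives r'_n."""
    out = [0.0, 0.0, 0.0]
    for n, (r_n, d_n) in enumerate(zip(points, tangents)):
        a, b = shape_value(xi - n), shape_slope(xi - n)
        for axis in range(3):
            out[axis] += a * r_n[axis] + b * d_n[axis]
    return out

def segment_length(n, points, tangents, samples=200):
    """Length of segment n (n - 1 <= xi <= n), approximated by a sum of short chords."""
    total = 0.0
    prev = curve_point(n - 1.0, points, tangents)
    for k in range(1, samples + 1):
        cur = curve_point(n - 1.0 + k / samples, points, tangents)
        total += math.dist(prev, cur)
        prev = cur
    return total

# Quarter circle of unit radius, r(xi) = (cos(pi xi / 2), sin(pi xi / 2), 0), one segment.
half_pi = 0.5 * math.pi
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tangents = [(0.0, half_pi, 0.0), (-half_pi, 0.0, 0.0)]   # dr/dxi at the two nodes
arc = segment_length(1, points, tangents)                # approximates the arc length pi/2
```

Because both positions and derivatives are prescribed at the nodes, all control data lie on the wire itself, and the tangent is continuous across segment boundaries.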

11.3.1.2 Spatial and Temporal Basis Functions

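After the geometry discretization, a spatial vector basis function associated with
the shape function may be defined as

$$\mathbf{f}_n(\mathbf{r}(\xi)) = \Lambda_n(\xi)\, \hat{\mathbf{s}}_n(\xi) = \begin{cases} \mathbf{f}_n^{-}(\xi) = \Lambda_n^{-}(\xi)\, \hat{\mathbf{s}}_n^{-}(\xi), & n - 1 \le \xi \le n \\ \mathbf{f}_n^{+}(\xi) = \Lambda_n^{+}(\xi)\, \hat{\mathbf{s}}_n^{+}(\xi), & n \le \xi \le n + 1 \end{cases} \tag{11.45}$$

where we have set $\mathbf{s}_n^{-}(\xi) = \mathbf{d}_n(\xi)$, $\mathbf{s}_n^{+}(\xi) = \mathbf{d}_{n+1}(\xi)$, $\hat{\mathbf{s}}_n^{-}(\xi) = \mathbf{s}_n^{-}(\xi) / |\mathbf{s}_n^{-}(\xi)|$, and
$\hat{\mathbf{s}}_n^{+}(\xi) = \mathbf{s}_n^{+}(\xi) / |\mathbf{s}_n^{+}(\xi)|$. The negative divergence of $\mathbf{f}_n(\xi)$ is given by

$$g_n(\mathbf{r}(\xi)) = -\nabla_l \cdot \mathbf{f}_n(\xi) = \begin{cases} g_n^{-}(\xi) = -\dfrac{1}{|\mathbf{s}_n^{-}(\xi)|} \dfrac{\partial \Lambda_n^{-}(\xi)}{\partial \xi}, & n - 1 \le \xi \le n \\ g_n^{+}(\xi) = -\dfrac{1}{|\mathbf{s}_n^{+}(\xi)|} \dfrac{\partial \Lambda_n^{+}(\xi)}{\partial \xi}, & n \le \xi \le n + 1 \end{cases} \tag{11.46}$$

For the linear shape function, we have

$$\mathbf{f}_n(\xi) = \begin{cases} \mathbf{f}_n^{-}(\xi) = (1 + \xi - n)\, \hat{\mathbf{s}}_n^{-}, & n - 1 \le \xi \le n \\ \mathbf{f}_n^{+}(\xi) = (1 - \xi + n)\, \hat{\mathbf{s}}_n^{+}, & n \le \xi \le n + 1 \end{cases} \tag{11.47}$$

$$g_n(\xi) = \begin{cases} g_n^{-}(\xi) = -\dfrac{1}{s_n^{-}}, & n - 1 \le \xi \le n \\ g_n^{+}(\xi) = \dfrac{1}{s_n^{+}}, & n \le \xi \le n + 1 \end{cases} \tag{11.48}$$

where $s_n^{-} = |\mathbf{r}_n - \mathbf{r}_{n-1}|$ and $s_n^{+} = |\mathbf{r}_{n+1} - \mathbf{r}_n|$.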
Now, we expand the current and charge distributions on the wire using the
spatial basis functions defined above as

$$\mathbf{J}(\mathbf{r}', t') = \sum_{j=1}^{\infty} \sum_{n=1}^{N-1} I_n(j)\, \mathbf{f}_n(\mathbf{r}')\, T(\bar{t}' - j) \tag{11.49}$$

$$\rho(\mathbf{r}', t') = \Delta t \sum_{j=1}^{\infty} \sum_{n=1}^{N-1} I_n(j)\, g_n(\mathbf{r}')\, S(\bar{t}' - j) \tag{11.50}$$

where we have defined $\bar{t}' = t' / \Delta t$, with $\Delta t$ being the time step size (the
temporal resolution), and $I_n(j)$ is the current at the $n$th node and $j$th time step. Only
$N - 1$ unknowns need to be solved for, because $I_0 = I_N = 0$. If the wire is a closed
loop, the number of unknowns is $N$. Because the current and charge must
satisfy the continuity equation, and we have already defined $g_n(\mathbf{r}') = -\nabla'_l \cdot \mathbf{f}_n(\mathbf{r}')$,
we must impose

$$T(\bar{t}') = \Delta t\, \frac{\partial}{\partial t'} S(\bar{t}') = \frac{\partial}{\partial \bar{t}'} S(\bar{t}') = S'(\bar{t}') \tag{11.51}$$

where $T(\bar{t}')$ and $S(\bar{t}')$ are the temporal basis functions for the currents and
charges, respectively. A simple choice of $T(\bar{t}')$ is the triangle function, or linear
interpolating function:

$$T(\bar{t}') = \begin{cases} 1 + \bar{t}', & -1 \le \bar{t}' \le 0 \\ 1 - \bar{t}', & 0 \le \bar{t}' \le 1 \end{cases} \tag{11.52}$$

The temporal basis function for the charge is then

$$S(\bar{t}') = \int_{-\infty}^{\bar{t}'} T(u)\, du = \begin{cases} \tfrac{1}{2}(1 + \bar{t}')^2, & -1 \le \bar{t}' \le 0 \\ 1 - \tfrac{1}{2}(1 - \bar{t}')^2, & 0 \le \bar{t}' \le 1 \\ 1, & \bar{t}' > 1 \end{cases} \tag{11.53}$$

Note that this $S(\bar{t}')$ is not compact: it has an infinite support
width, which results in a full matrix system. To avoid a noncompact
temporal basis function, we may require $S(\bar{t}')$ to have a finite
support interval, so that $T(\bar{t}')$ is also compact. In that case,
$S(\bar{t}')$ must be at least of second order and must span at least three support intervals.
A proper choice is the quadratic B-spline function [22], which is expressed as

$$S(\bar{t}) = \begin{cases} \tfrac{1}{2}(1 + \bar{t})^2, & 0 \le \bar{t} + 1 \le 1 \\ \tfrac{1}{2} + \bar{t} - \bar{t}^2, & 0 \le \bar{t} \le 1 \\ \tfrac{1}{2} - (\bar{t} - 1) + \tfrac{1}{2}(\bar{t} - 1)^2, & 0 \le \bar{t} - 1 \le 1 \end{cases} \tag{11.54}$$

$$T(\bar{t}) = S'(\bar{t}) = \begin{cases} 1 + \bar{t}, & 0 \le \bar{t} + 1 \le 1 \\ 1 - 2\bar{t}, & 0 \le \bar{t} \le 1 \\ -1 + (\bar{t} - 1), & 0 \le \bar{t} - 1 \le 1 \end{cases} \tag{11.55}$$

Now, both temporal basis functions, for the currents and for the charges, are
compact. We will use this set of temporal basis functions in the computations later.
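
The quadratic B-spline pair (11.54) and (11.55) can be checked numerically. The sketch below verifies at a few sample points that $T$ is the derivative of $S$, using a central difference, and that integer shifts of $S$ sum to one. The function names, the step `h`, and the sample points are choices made only for this example.

```python
def charge_basis(t):
    """Quadratic B-spline S of (11.54); nonzero only for -1 < t < 2."""
    if -1.0 <= t <= 0.0:
        return 0.5 * (1.0 + t) ** 2
    if 0.0 < t <= 1.0:
        return 0.5 + t - t * t
    if 1.0 < t <= 2.0:
        return 0.5 - (t - 1.0) + 0.5 * (t - 1.0) ** 2
    return 0.0

def current_basis(t):
    """T = S' of (11.55), the derivative of the quadratic B-spline."""
    if -1.0 <= t <= 0.0:
        return 1.0 + t
    if 0.0 < t <= 1.0:
        return 1.0 - 2.0 * t
    if 1.0 < t <= 2.0:
        return -1.0 + (t - 1.0)
    return 0.0

h = 1.0e-6
samples = [-0.7, -0.2, 0.3, 0.8, 1.4, 1.9]          # chosen away from the knots
max_slope_error = max(
    abs((charge_basis(t + h) - charge_basis(t - h)) / (2.0 * h) - current_basis(t))
    for t in samples
)
shift_sum = sum(charge_basis(0.3 - j) for j in range(-3, 4))   # sum over integer shifts
```

Since $S$ spans three intervals and $T$ is piecewise linear, both functions vanish outside $-1 \le \bar{t} \le 2$, which is what keeps the number of nonzero matrices in the time-stepping scheme finite.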

11.3.1.3 Marching-on-in-Time (MOT) Scheme

Substituting (11.49)) and (11.50) into (11.13),


(11. ), and then testing the equation by
and i 1,2, ,  , we obtain
fm (r) ( t  i) with m  1,2, , N 
 N 1
m  1, 2, , N
 Z m, n (i  j ) I n ( j ) 
Vm (i) ,  (11.56)
j 1 
n 1  i 1, 2, , 
By interchanging the index i  j and j (letting i  j  k and then resetting
k  j ), we can rewrite (11.56) as
i 1 N 1
m  1, 2, , N
 Z m, n ( j ) I n (i  j ) 
Vm (i ) ,  (11.57)
j   n  1  i 1, 2, , 
where the elements are
1
Vm
(i ) f m (r )  Ei (r, it )dl , dl  s( ) d (11.58)
0 m

 fm (r)  fn (r' ) S" ( j  R ) g (r) g n (r' ) 


Z m, n ( j )    
m n
4πR ct
 (ct ) m
4πR
S ( j  R )  dl' dl (11.59)

where
R R / (ct ) has been defined, and the following formula has been used

f
m
m (r) l dl    l  fm (r) dl 
m
g
m
m (r) dl (11.60)

in which the first equality holds because $\mathbf{f}_m(\mathbf{r})$ vanishes at the two end points. If the temporal basis functions are compact, that is, $S(j - \bar R) = 0$ if $j - \bar R < -1$ or $j - \bar R > p$, where $p = 1$ for the triangle function or $p = 2$ for the quadratic B-spline function, we should have $Z(j) = 0$ if $j < \bar R_{\min} - 1$ or $j > \bar R_{\max} + p$, where $\bar R_{\min} = 0$ is the minimum dimension of the wire and $\bar R_{\max}$ is the maximum dimension of the wire. Therefore, we should have $0 \le j \le \mathrm{int}(\bar R_{\max}) + p$. As a result, we can rewrite (11.57) as the marching-on-in-time (MOT) form:
\[
[Z(0)]\{I(i)\} = \{V(i)\} - \sum_{j=1}^{\min(i-1,\,L)} [Z(j)]\{I(i-j)\} \tag{11.61}
\]

where $L = \mathrm{int}(\bar R_{\max}) + p$, with $p + 1$ being the number of support intervals of the temporal basis function. It is clear that $L \to \infty$ if the temporal basis function is noncausal or has an infinite support width. It is seen that (11.61) takes a recursive form, that is, the right-hand side is known when the coefficient vector $\{I(i)\}$ is solved for at the $i$th time step.
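The recursion (11.61) can be sketched as a short time-stepping loop. The following is a minimal illustration only, not the chapter's implementation: the dense solver `solve`, the helper `matvec`, and the small matrices `Z_DEMO` and excitations `V_DEMO` are hypothetical stand-ins for the actual impedance matrices and incident-field vectors.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def matvec(A, x):
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def march_on_in_time(Z, V):
    """Z[j] holds [Z(j)] for j = 0..L; V[i-1] holds {V(i)} for i = 1..Nt.
    Returns a list I with I[i-1] = {I(i)}."""
    L = len(Z) - 1
    I = []
    for i in range(1, len(V) + 1):
        rhs = V[i - 1][:]
        # Subtract the contributions of already-known past currents.
        for j in range(1, min(i - 1, L) + 1):
            past = matvec(Z[j], I[i - j - 1])
            rhs = [r - p for r, p in zip(rhs, past)]
        I.append(solve(Z[0], rhs))
    return I


# Hypothetical two-unknown example with L = 2 retained matrices.
Z_DEMO = [
    [[4.0, 1.0], [1.0, 3.0]],
    [[0.5, 0.1], [0.2, 0.4]],
    [[0.1, 0.0], [0.0, 0.1]],
]
V_DEMO = [[1.0, 0.0], [0.5, 0.2], [0.1, 0.3], [0.0, 0.1]]

if __name__ == "__main__":
    for i, cur in enumerate(march_on_in_time(Z_DEMO, V_DEMO), start=1):
        print(i, cur)
```

In practice $[Z(0)]$ is the same at every step, so it would normally be factorized once and reused; the sketch re-solves it each step only for brevity.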

11.3.2 Discretization for the 2-D Problem

Geometric discretization for a 2-D geometry is the same as for the wire structure, because the contour of the cross-section can be treated as a wire for an open strip or as a closed loop for a cylinder, as shown in Figure 11.6. For a 2-D problem, we usually distinguish between the TM case and the TE case. The TM case means that the electric field of the incident wave has only a z-component. The TE case means that the magnetic field of the incident wave has only a z-component.

Figure 11.6 Discretization of a conducting cylinder.

For the TM case, $\mathbf{E}^i = \hat z E_z^i$ and $\mathbf{H}^i = (\hat k^i \times \hat z) E_z^i / \eta_0$, and the induced current on the surface has only a z-component, too, that is, $\mathbf{J}_s = J_z \hat z$. The charge on the surface vanishes because $J_z$ is invariant with $z$ such that $\nabla_s \cdot \mathbf{J}_s = \partial J_z / \partial z = 0 = -\partial \rho_s / \partial t$. As a result, the governing equation (11.17) is reduced to
ct
1  J z ρ' , t  t ' 1
 ct  d ct 'dC '  E zi ρ, t  (11.62)
2 ct '2
R 2 0
CR

The spatial basis function that is suitable for expanding the currents is the
same as (11.45) but replaces sˆ n ( ) with ẑ . Thus, the current is expanded as

 N
J z (ρ' , t' )   I n ( j )n (' )T ( t'  j ) (11.63)
j 1 n 1

where T ( t' ) is taken to be (11.55). Substituting (11.63) into (11.62) and testing it
with m ( ) , and after some recasting, we obtain the discrete version of (11.62) in
MOT form as
i 1
[ Z E,TM (0)]{I (i)}  {V E,TM (i)}  [ Z E,TM ( j )]{I (i  j )} (11.64)
j 1

with
1
VmE,TM (i )   m ( ) Ezi (ρ, it )dC , dC  s( ) d (11.65)
0 m

1 1
2π ct m n
Z mE,TM
,n ( j)  F ( j, R)m ( )n (' )dC' dC (11.66)

 j 1 ct
d(ct )
F  j, R    T ( j  t ) (11.67)
max  R ,  j  2  ct  (ct )2  R 2

If we use the MFIE of (11.18), its discrete version in MOT form is the same as
(11.64) just by replacing the elements with

1
 (nˆ  kˆ ) ( ) Ezi (ρ, it )dC (11.68)
i
VmH,TM (i)   m
0 m

1
Z mH,TM
,n ( j)  T  j   m ( )n ( )dC
2 m

1 1 ˆ ) ( ) (' )dC' dC
 P.V.   G( j, R)(nˆ  R m n
(11.69)
2π ct m n

( j 1) ct
 T ( j  t ) T ( j  t )  d(ct )
G( j, R)  (ct )  
max  R ,( j  2) ct  
ct'  R
 
ct  (ct )2  R 2
(11.70)

For the TE case, $\mathbf{H}^i = \hat z H_z^i$ and $\mathbf{E}^i = \eta_0 (\hat z \times \hat k^i) H_z^i$, the induced current on the surface flows around the contour, and the suitable spatial basis function is exactly (11.47) with $\hat s(\xi) = \hat z \times \hat n(\xi)$. The current is then expanded as

 N
J s (ρ' , t' )   I n ( j )n (' )sˆ n (' )T ( t'  j ) (11.71)
j 1 n 1

Substituting this into (11.17) and testing it by m ( )sˆ n ( ) ( t  i) , after some


recasting, we get its discrete version in the same MOT form as (11.64) just by
replacing the elements with

 (nˆ  kˆ ) ( ) H zi (ρ, it )dC (11.72)


i
VmE,TE (i)  m
m

1 1
2π ct m n 
Z mE,TE
,n ( j) 
 F ( j , R)fm (ρ)  fn (ρ)  (ct ) 2 W ( j, R) g m (ρ) g n (ρ)  dC'dC

(11.73)
 j 1 ct
d(ct )
W  j, R    S ( j  t ) (11.74)
max  R ,  j  2  ct  (ct )2  R 2

If the MFIE of (11.18) is adopted, the elements are replaced by

VmH,TE (i)    m ( ) H zi (ρ, it )dC (11.75)


m

1 1 1
Z mH,TE
,n ( j)  T ( j )  fm (ρ)  fn (ρ)dC  P.V.  
2 m
2π ct m n

 ˆ )[f (ρ)  f (ρ)] (nˆ  R


G( j, R) (nˆ  R m n
ˆ )  [f (ρ)  f (ρ)] dC' dC
m n  (11.76)

11.3.3 Discretization for the 3-D Conducting Body

For meshing of a 3-D structure, the same procedure as in the wire case applies. Given a set of control points and some shape functions, a general 3-D surface can be modeled closely. Commonly used shape functions include planar triangles, curved triangles, planar quadrangles, and curved quadrangles. Triangular discretization is the most popular and is adopted here.
We first use the planar triangular meshing. Three control points on the surface are taken to be the three vertex points $\mathbf{r}_i$ ($i = 1, 2, 3$) of a planar triangle, as shown in Figure 11.7. A point inside the triangle is expressed by

\[
\mathbf{r}(\xi_1, \xi_2) = \sum_{i=1}^{3} \varphi_i(\xi_1, \xi_2)\,\mathbf{r}_i = \xi_1 \mathbf{r}_1 + \xi_2 \mathbf{r}_2 + (1 - \xi_1 - \xi_2)\,\mathbf{r}_3 \tag{11.77}
\]

where $0 \le \xi_1, \xi_2 \le 1$; apparently, the three shape functions are

\[
\begin{cases}
\varphi_1(\xi_1, \xi_2) = \xi_1 \\
\varphi_2(\xi_1, \xi_2) = \xi_2 \\
\varphi_3(\xi_1, \xi_2) = 1 - \xi_1 - \xi_2
\end{cases} \tag{11.78}
\]

It is clear that (11.77) maps an arbitrary triangle into a right triangle. An elemental area is

\[
dS = J\, d\xi_1\, d\xi_2 \tag{11.79}
\]

where $J$ is the Jacobian

\[
J = \left| \frac{\partial \mathbf{r}}{\partial \xi_1} \times \frac{\partial \mathbf{r}}{\partial \xi_2} \right| = \left| (\mathbf{r}_1 - \mathbf{r}_3) \times (\mathbf{r}_2 - \mathbf{r}_3) \right| = 2A \tag{11.80}
\]

with $A$ being the area of the triangle.
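The planar mapping (11.77) and its Jacobian (11.80) can be illustrated with a short sketch. The function names and the sample vertices below are hypothetical; the area is cross-checked with Heron's formula, which is used here only as an independent reference.

```python
from math import sqrt


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return sqrt(dot(a, a))


def map_planar(xi1, xi2, r1, r2, r3):
    """Point r(xi1, xi2) of (11.77)."""
    xi3 = 1.0 - xi1 - xi2
    return tuple(xi1 * a + xi2 * b + xi3 * c for a, b, c in zip(r1, r2, r3))


def jacobian_planar(r1, r2, r3):
    """J = |(r1 - r3) x (r2 - r3)| as in (11.80)."""
    return norm(cross(sub(r1, r3), sub(r2, r3)))


def heron_area(r1, r2, r3):
    a, b, c = norm(sub(r2, r3)), norm(sub(r1, r3)), norm(sub(r1, r2))
    s = 0.5 * (a + b + c)
    return sqrt(s * (s - a) * (s - b) * (s - c))


if __name__ == "__main__":
    R1, R2, R3 = (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)
    print(jacobian_planar(R1, R2, R3), 2.0 * heron_area(R1, R2, R3))
    print(map_planar(1.0 / 3.0, 1.0 / 3.0, R1, R2, R3))  # centroid
```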

Figure 11.7 An arbitrary planar triangle is mapped into a right triangle: (a) an arbitrary planar triangle and (b) the mapped right triangle.

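If we use curved triangle meshing, as shown in Figure 11.8, by giving the three vertex points $\mathbf{r}_i$ ($i = 1, 2, 3$) and the three middle points $\mathbf{r}_i$ ($i = 4, 5, 6$), a point inside the triangle is written as

\[
\mathbf{r}(\xi_1, \xi_2) = \sum_{i=1}^{6} \varphi_i(\xi_1, \xi_2)\,\mathbf{r}_i \tag{11.81}
\]

where the six shape functions are

\[
\begin{cases}
\varphi_1(\xi_1, \xi_2) = \xi_1 (2\xi_1 - 1) \\
\varphi_2(\xi_1, \xi_2) = \xi_2 (2\xi_2 - 1) \\
\varphi_3(\xi_1, \xi_2) = \xi_3 (2\xi_3 - 1)
\end{cases}, \qquad
\begin{cases}
\varphi_4(\xi_1, \xi_2) = 4 \xi_1 \xi_2 \\
\varphi_5(\xi_1, \xi_2) = 4 \xi_2 \xi_3 \\
\varphi_6(\xi_1, \xi_2) = 4 \xi_3 \xi_1
\end{cases} \tag{11.82}
\]

with $\xi_1 + \xi_2 + \xi_3 = 1$ or $\xi_3 = 1 - \xi_1 - \xi_2$. Similarly, (11.81) maps an arbitrarily curved triangle into a right triangle. The elemental area is still (11.79) but the Jacobian is

\[
J = \left| \frac{\partial \mathbf{r}}{\partial \xi_1} \times \frac{\partial \mathbf{r}}{\partial \xi_2} \right| = \left| \sum_{i=1}^{6} \sum_{j=1}^{6} \varphi_{ij}(\xi_1, \xi_2)\,(\mathbf{r}_i \times \mathbf{r}_j) \right| \tag{11.83}
\]

with $\varphi_{ij}(\xi_1, \xi_2) = \dfrac{\partial \varphi_i}{\partial \xi_1} \dfrac{\partial \varphi_j}{\partial \xi_2}$.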

Figure 11.8 An arbitrarily curved triangle is mapped into a right triangle: (a) an arbitrarily curved triangle and (b) the mapped right triangle.
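The six quadratic shape functions of (11.82) can be sketched as follows. The node ordering assumed here (nodes 4, 5, and 6 lying between vertices 1-2, 2-3, and 3-1, respectively) is inferred from the products in (11.82), and the function names are hypothetical. The sketch checks that the functions sum to one and that each equals one at its own node.

```python
def shape6(xi1, xi2):
    """Quadratic shape functions of (11.82) at area coordinates (xi1, xi2)."""
    xi3 = 1.0 - xi1 - xi2
    return [
        xi1 * (2.0 * xi1 - 1.0),
        xi2 * (2.0 * xi2 - 1.0),
        xi3 * (2.0 * xi3 - 1.0),
        4.0 * xi1 * xi2,
        4.0 * xi2 * xi3,
        4.0 * xi3 * xi1,
    ]


def map_curved(xi1, xi2, nodes):
    """Point r(xi1, xi2) of (11.81); nodes is a list of six 3-D points."""
    phi = shape6(xi1, xi2)
    return tuple(sum(p * n[k] for p, n in zip(phi, nodes)) for k in range(3))


# Parameter-plane coordinates of the six nodes (assumed ordering).
NODE_XI = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0),
           (0.5, 0.5), (0.0, 0.5), (0.5, 0.0)]

if __name__ == "__main__":
    for k, (a, b) in enumerate(NODE_XI, start=1):
        print(k, shape6(a, b))
```

When the three middle points sit exactly at the edge midpoints of a flat triangle, `map_curved` reduces to the linear map (11.77).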

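The spatial basis function corresponding to the planar triangle discretization is known as the Rao-Wilton-Glisson (RWG) vector triangle function, which is defined over a pair of triangles [32], as shown in Figure 11.9. A point inside the two triangles is expressed as

\[
\mathbf{r}(\xi_1, \xi_2) = \begin{cases}
\xi_1 \mathbf{r}_1 + \xi_2 \mathbf{r}_2 + (1 - \xi_1 - \xi_2)\,\mathbf{r}_3, & \mathbf{r} \in \Delta_n^+ \\
\xi_1 \mathbf{r}_1 + \xi_2 \mathbf{r}_2 + (1 - \xi_1 - \xi_2)\,\mathbf{r}_4, & \mathbf{r} \in \Delta_n^-
\end{cases} \tag{11.84}
\]

Figure 11.9 Definition of a planar RWG basis function.

Define

\[
\mathbf{s}_1 = \begin{cases}
\mathbf{s}_1^+ = +\dfrac{\partial \mathbf{r}}{\partial \xi_1} = \mathbf{r}_1 - \mathbf{r}_3, & \mathbf{r} \in \Delta_n^+ \\
\mathbf{s}_1^- = -\dfrac{\partial \mathbf{r}}{\partial \xi_1} = \mathbf{r}_4 - \mathbf{r}_1, & \mathbf{r} \in \Delta_n^-
\end{cases} \tag{11.85}
\]

\[
\mathbf{s}_2 = \begin{cases}
\mathbf{s}_2^+ = +\dfrac{\partial \mathbf{r}}{\partial \xi_2} = \mathbf{r}_2 - \mathbf{r}_3, & \mathbf{r} \in \Delta_n^+ \\
\mathbf{s}_2^- = -\dfrac{\partial \mathbf{r}}{\partial \xi_2} = \mathbf{r}_4 - \mathbf{r}_2, & \mathbf{r} \in \Delta_n^-
\end{cases} \tag{11.86}
\]

The RWG basis function corresponding to the $n$th edge is defined as

\[
\mathbf{f}_n(\mathbf{r}) = \frac{l_n}{J_n} (\xi_1 \mathbf{s}_1 + \xi_2 \mathbf{s}_2) = \begin{cases}
\mathbf{f}_n^+(\mathbf{r}) = \dfrac{l_n}{2 A_n^+}\, \boldsymbol{\rho}_n^+, & \mathbf{r} \in \Delta_n^+ \\
\mathbf{f}_n^-(\mathbf{r}) = \dfrac{l_n}{2 A_n^-}\, \boldsymbol{\rho}_n^-, & \mathbf{r} \in \Delta_n^-
\end{cases} \tag{11.87}
\]

where $l_n$ is the length of the $n$th edge, $A_n^\pm$ is the area of $\Delta_n^\pm$, and

\[
\begin{cases}
\boldsymbol{\rho}_n^+ = \xi_1 \mathbf{s}_1^+ + \xi_2 \mathbf{s}_2^+ = \mathbf{r} - \mathbf{r}_3, & \mathbf{r} \in \Delta_n^+ \\
\boldsymbol{\rho}_n^- = \xi_1 \mathbf{s}_1^- + \xi_2 \mathbf{s}_2^- = \mathbf{r}_4 - \mathbf{r}, & \mathbf{r} \in \Delta_n^-
\end{cases} \tag{11.88}
\]

The minus divergence of (11.87) is

\[
g_n(\mathbf{r}) = -\nabla_s \cdot \mathbf{f}_n(\mathbf{r}) = \begin{cases}
g_n^+(\mathbf{r}) = -\dfrac{l_n}{A_n^+}, & \mathbf{r} \in \Delta_n^+ \\
g_n^-(\mathbf{r}) = +\dfrac{l_n}{A_n^-}, & \mathbf{r} \in \Delta_n^-
\end{cases} \tag{11.89}
\]

It is easy to demonstrate that the normal component of the basis function defined by (11.87) is continuous across the $n$th edge, while its normal components on the other four edges are zero. These properties ensure that no charge accumulates on the common edge, and that the total charge over the pair of triangles is conserved.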
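The properties stated for the planar RWG function can be checked numerically. The sketch below is illustrative only: the vertex coordinates are arbitrary, the helper names are hypothetical, and it assumes the common edge runs from $\mathbf{r}_1$ to $\mathbf{r}_2$ with free vertices $\mathbf{r}_3$ (plus triangle) and $\mathbf{r}_4$ (minus triangle), as in (11.84).

```python
from math import sqrt


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return sqrt(dot(a, a))


def area(p, q, r):
    return 0.5 * norm(cross(sub(q, p), sub(r, p)))


def rwg(r, r1, r2, r3, r4, plus):
    """f_n at a point r of the plus (r1, r2, r3) or minus (r1, r2, r4) triangle."""
    ln = norm(sub(r2, r1))
    if plus:
        scale = ln / (2.0 * area(r1, r2, r3))
        return tuple(scale * c for c in sub(r, r3))   # rho+ = r - r3
    scale = ln / (2.0 * area(r1, r2, r4))
    return tuple(scale * c for c in sub(r4, r))       # rho- = r4 - r


def rwg_minus_div(r1, r2, r3, r4, plus):
    """g_n = -div f_n, constant on each triangle."""
    ln = norm(sub(r2, r1))
    return -ln / area(r1, r2, r3) if plus else ln / area(r1, r2, r4)


def perp_unit(v, r1, r2):
    """Unit vector along the part of v perpendicular to the edge r1-r2."""
    e = sub(r2, r1)
    le = norm(e)
    e = tuple(c / le for c in e)
    p = sub(v, tuple(dot(v, e) * c for c in e))
    n = norm(p)
    return tuple(c / n for c in p)


R1, R2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
R3, R4 = (0.3, 0.8, 0.0), (0.6, -0.5, 0.4)   # the two triangles need not be coplanar

if __name__ == "__main__":
    r = (0.25, 0.0, 0.0)                      # a point on the common edge
    fp = rwg(r, R1, R2, R3, R4, True)
    fm = rwg(r, R1, R2, R3, R4, False)
    print(dot(fp, perp_unit(sub(r, R3), R1, R2)),
          dot(fm, perp_unit(sub(R4, r), R1, R2)))
```

Both printed normal components refer to the direction of flow from the plus triangle into the minus triangle, each measured in the plane of its own triangle.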
If we use the curved triangle meshing, a point inside the two triangles is
written as

 6
 i (1 ,  2 )ri , r   n
 

 i 1
r (1 ,  2 )   6 (11.90)
  ( ,  )r  , r   
  i 1 2 i
 i 1
n

where ri (i  1 ~ 6) represents the six control points of  n and ri (i  1 ~ 6)


represents the six control points of  n . Similar to (11.85) and (11.86), we define

  r  6


s1   i (1 , 2 )ri
1 1 i 1
s1   (11.91)
s    r   
6

 1
 
 i (1 ,  2 )ri
 1 1 i 1

  r  6
s 2      i (1 ,  2 )ri

 2 2 i 1
s2   (11.92)
s     r  6
 2
 2
 i (1 , 2 )ri
 2 i 1

Then the curved RWG basis function is defined by

  ln
f n (r )  (1s1   2s 2 ), r   n
 J n
f n (r )   (11.93)
f  (r )  ln
( s   s ), r  
  
n J n
1 1 2 2 n

where the Jacobi is

r  r  6 6
J n  
1  2
 
i 1 j 1
ij (1 ,  2 )(ri  rj ) (11.94)

The minus divergence of (11.93) is

  2ln 
 g n (r )    , r   n
 J n
g n (r )  s  f n (r )   (11.95)
2 l
 g  (r )  n , r   
 n J n
n

Using the spatial basis functions defined above, the currents and charges on the body surface are expanded as

\[
\mathbf{J}_s(\mathbf{r}, t) = \sum_{j=1}^{\infty} \sum_{n=1}^{N} I_n(j)\, \mathbf{f}_n(\mathbf{r})\, T(\bar t - j) \tag{11.96}
\]

\[
\rho_s(\mathbf{r}, t) = \Delta t \sum_{j=1}^{\infty} \sum_{n=1}^{N} I_n(j)\, g_n(\mathbf{r})\, S(\bar t - j) \tag{11.97}
\]

Substituting these into the EFIE of (11.11) and testing it with $[\hat n \times \mathbf{f}_m(\mathbf{r})]\,\delta(\bar t - i)$, after some recasting, we obtain its discretized version in MOT form as

\[
[Z^E(0)]\{I(i)\} = \{V^E(i)\} - \sum_{j=1}^{\min(i-1,\,L)} [Z^E(j)]\{I(i-j)\} \tag{11.98}
\]

where $L = \mathrm{int}[R_{\max}/(c\Delta t)] + 2$, and the elements are

1
VmE (i ) 
0 f
m
m (r)  Ei (r, it )dS (11.99)

1  T' ( j  R ) fm (r)  fn (r' )


Z mE, n ( j )    
4π m n  ct R
g m (r) g n (r' ) 
(ct ) S ( j  R )  dS' dS (11.100)
R 

where the following formula has been used

f
m
m (r) s dS    s  fm (r) dS   gm (r) dS
m m
(11.101)

in which the first equality is achieved by using the divergence theorem and noticing that the normal component of $\mathbf{f}_m(\mathbf{r})$ is continuous across the common edge and vanishes on the other four edges.
Similarly, substituting (11.96) into the MFIE of (11.12) and testing it by $\mathbf{f}_m(\mathbf{r})\,\delta(t - i\Delta t)$, after some recasting, we obtain its discretized version in MOT form:

\[
[Z^H(0)]\{I(i)\} = \{V^H(i)\} - \sum_{j=1}^{\min(i-1,\,L)} [Z^H(j)]\{I(i-j)\} \tag{11.102}
\]

with

VmH (i)  f
m
m (r)  nˆ  Hi (r, it ) dS (11.103)

1 1  T ( j  R ) T' ( j  R ) 
Z mH, n ( j )  T ( j )  fm (r )  fn (r )dS 
2 m


4π m n  R

ct 

 ˆ )[f (r)  f (r' )]  (nˆ  R


 (nˆ  R m n
ˆ )  [f (r)  f (r' )] dS'dS
m n  (11.104)

Solving (11.98) or (11.102) yields the expansion coefficients. To avoid spurious resonant solutions, the CFIE, which is a combination of the EFIE and MFIE, should be adopted for closed bodies:

\[
[Z^C(0)]\{I(i)\} = \{V^C(i)\} - \sum_{j=1}^{\min(i-1,\,L)} [Z^C(j)]\{I(i-j)\} \tag{11.105}
\]

with

\[
\begin{cases}
Z^C_{m,n}(j) = \alpha Z^E_{m,n}(j) + (1 - \alpha) Z^H_{m,n}(j) \\
V^C_m(i) = \alpha V^E_m(i) + (1 - \alpha) V^H_m(i)
\end{cases} \tag{11.106}
\]

and $0 \le \alpha \le 1$ is a combination parameter.

11.3.4 Discretization for the 3-D Dielectric Body

Geometric meshing of a 3-D dielectric geometry is the same as for a conducting structure, as described previously. Now we have to solve for both the equivalent electric currents and the equivalent magnetic currents, which are expanded as

 N
J s (r, t )   I n(e) ( j )fn (r)T ( t  j ) (11.107)
j 1 n 1

 N
 s (r, t )  t  I n(e) ( j ) g n (r) S ( t  j ) (11.108)
j 1 n 1

 N
J ms (r, t )  1  I n(m) ( j )f n (r)T ( t  j ) (11.109)
j 1 n 1

 N
 ms (r, t )  1t  I n(m) ( j ) g n (r)S ( t  j ) (11.110)
j 1 n 1

The factor $\eta_1$ in (11.109) makes the coefficients $I_n^{(e)}$ and $I_n^{(m)}$ of the same order of magnitude. Substituting these expansions into (11.31) and (11.64) and testing them by $[\hat n \times \mathbf{f}_m(\mathbf{r})]\,\delta(\bar t - i)$, we will obtain

i 1 i 1

[Z
j 0
EE
( j )]{I (e) (i  j )}  [ Z EH ( j )]{I (m) (i  j )}  {V E (i)}
j 0
(11.111)

i 1 i 1

[Z
j 0
HE
( j )]{I (e) (i  j )}  [ Z HH ( j )]{I (m) (i  j )}  {V H (i)}
j 0
(11.112)

The elements are

m, n ( j )  pr Lm, n ( j )
ZmEE, n ( j )  L(0) (1)
(11.113)

Z mEH, n ( j )   12 (1  p)rU m,n ( j ) r  Km(0),n ( j )  pKm(1),n ( j )  (11.114)

ZmHE, n ( j )  12 (1  q)U m,n ( j )  Km(0),n  qKm(1),n (11.115)

ZmHH, n ( j )  r L(0) (1)


m, n ( j )  qLm, n ( j ) (11.116)

1
VmE (i ) 
0 f
m
m (r )  Ei (r, it )dS (11.117)

VmH (i)  f
m
m (r)  Hi (r, it )dS (11.118)

U m, n ( j )    [nˆ  f
m n
m (r)]  fn (r) dS (11.119)

 S" ( j  Rs ) fm (r)  fn (r ') g (r) g n (r ') 


L(ms,)n ( j )    
m n
cs t 4π R
 (cs t ) S ( j  Rs ) m
4πR
 dS'dS

(11.120)

ˆ  f (r' )  S"( j  R ) S' ( j  R ) 


R
K m( s,)n ( j )  P.V.   fm (r) 
n

s
 s
 d S'dS (11.121)
m n
4πR  cs t R 

In the above, $c_0$ and $c_1$ are the light speeds in the vacuum and the dielectric, respectively. The set of equations (11.111) and (11.112) can be written in the same MOT form as (11.105) by replacing $[Z^C(j)]$, $\{I(i)\}$, and $\{V^C(i)\}$ with the following, respectively.

\[
[Z(j)] = \begin{bmatrix} [Z^{EE}(j)] & [Z^{EH}(j)] \\ [Z^{HE}(j)] & [Z^{HH}(j)] \end{bmatrix} \tag{11.122}
\]

\[
\{I(i)\} = \begin{Bmatrix} \{I^{(e)}(i)\} \\ \{I^{(m)}(i)\} \end{Bmatrix}, \qquad
\{V(i)\} = \begin{Bmatrix} \{V^E(i)\} \\ \{V^H(i)\} \end{Bmatrix} \tag{11.123}
\]

11.4 EVALUATION OF MATRIX ELEMENTS

Stability and accuracy of TDIE methods largely rely on the precise evaluation of matrix elements. Key integration techniques for 1-D, 2-D, and 3-D geometries are addressed in this section, for both singular and nonsingular terms.

11.4.1 Matrix Setup for the Wire Problem

Let us introduce a characteristic function

\[
\Pi(x) = \begin{cases} 1, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{11.124}
\]

By using this function, the quadratic B-spline temporal basis function given in (11.54) and its derivatives can be written as

\[
S(t) = \tfrac12 (t + 1)^2\, \Pi(t + 1) + \left( \tfrac12 + t - t^2 \right) \Pi(t) + \left[ \tfrac12 - (t - 1) + \tfrac12 (t - 1)^2 \right] \Pi(t - 1) \tag{11.125}
\]

\[
S'(t) = T(t) = (t + 1)\, \Pi(t + 1) + (1 - 2t)\, \Pi(t) + \left[ -1 + (t - 1) \right] \Pi(t - 1) \tag{11.126}
\]

\[
S''(t) = T'(t) = \Pi(t + 1) - 2\,\Pi(t) + \Pi(t - 1) \tag{11.127}
\]

Substituting (11.125) and (11.127) into (11.59), we would obtain

Z m, n ( j )   X m , n ( j  1)  2 X m , n ( j )  X m , n ( j  1) 
 12 Ym(2)  1 (0) (1) (2)

, n ( j  1)   2 Ym , n ( j )  Ym , n ( j )  Ym , n ( j )  (11.128)
  12 Y(0)
m, n ( j  1)  Y (1)
m, n
(2)
( j  1)  12 Y
m,n ( j  1) 

in which we have defined


1 1 fm (r)  f n (r' )
X m, n ( j )   
4π ct m n R
 ( j  R )dl' dl

1 1 2,2
fmp (r)  fnq (r' )

4π ct

p , q 1
 R
 ( j  R )dl' dl (11.129)
 p q
m n

ct g (r) g n (r' )


  0,1, 2
4π m n
Ym(, n) ( j )  ( j  R ) m  ( j  R )dl' dl
R
ct 2,2
g mp (r ) g nq (r' )


   ( j  R)
p , q 1  p q R
 ( j  R )dl' dl (11.130)
m n

We set $\Delta_m^1 = \Delta_m^+$ and $\Delta_m^2 = \Delta_m^-$, and so forth. It is seen from (11.128) that the matrix elements can be calculated recursively, that is, the quantities calculated at previous time steps can be reused to calculate the elements at a few later time steps. In this manner, at least half the CPU time can be saved in the matrix setup stage.
If not for the causality imposed by the factor $\Pi(j - \bar R)$, the integrations of (11.129) and (11.130) could be carried out analytically without difficulty. This factor complicates the integrals to a great extent, making analytical treatments [26] too tedious to be practical. As a result, we will give closed-form expressions only for the self-action term, while for interaction terms Gaussian numerical quadrature is employed.
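The regrouping that makes the recursion possible can be sketched numerically. The example below assumes the windowed moments $(j - \bar R)^{\alpha}\,\Pi(j - \bar R)$, $\alpha = 0, 1, 2$, with a unit pulse on $[0, 1]$ as in (11.124); the function names are hypothetical. It checks that the quadratic B-spline $S(j - \bar R)$ is a fixed combination of these moments taken at the steps $j + 1$, $j$, and $j - 1$, so a moment computed once can be reused at neighbouring time steps.

```python
def pulse(x):
    """Characteristic function of (11.124)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def S(t):
    """Quadratic B-spline of (11.54)."""
    if -1.0 <= t <= 0.0:
        return 0.5 * (1.0 + t) ** 2
    if 0.0 < t <= 1.0:
        return 0.5 + t - t * t
    if 1.0 < t <= 2.0:
        return 0.5 - (t - 1.0) + 0.5 * (t - 1.0) ** 2
    return 0.0


def moment(alpha, j, rbar):
    """Windowed moment (j - rbar)**alpha * pulse(j - rbar)."""
    return (j - rbar) ** alpha * pulse(j - rbar)


def S_from_moments(j, rbar):
    """Rebuild S(j - rbar) from moments at steps j + 1, j, and j - 1."""
    return (0.5 * moment(2, j + 1, rbar)
            + 0.5 * moment(0, j, rbar) + moment(1, j, rbar) - moment(2, j, rbar)
            + 0.5 * moment(0, j - 1, rbar) - moment(1, j - 1, rbar)
            + 0.5 * moment(2, j - 1, rbar))


if __name__ == "__main__":
    for j in range(0, 5):
        print(j, S(j - 1.7), S_from_moments(j, 1.7))
```

Non-integer values of $\bar R$ are used in the demonstration because the closed pulse of (11.124) counts the breakpoints twice; this has no effect on the integrals.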

11.4.1.1 Evaluation for Nonsingular Terms

Let

fmp (r)  fnq (r' )


X m , n; p , q ( j )   R
 ( j  R )dl' dl (11.131)
 p q
m n

g m (r) g n (r' )
Ym(, n); p , q ( j )    ( j  R)

 ( j  R )dl' dl (11.132)
mp qn
R

If $\Delta_m^p$ and $\Delta_n^q$ do not overlap, or $j \ne 1$, the two integrals above are not singular. They can be numerically integrated by using Gaussian quadrature as follows:
NG NG
X m, n; p ,q ( j )  (lmp lnq ) wi wk [fmp (ri )  fnq (rk )]Fm(0),n; p ,q ( j, Rik ) (11.133)
i 1 k 1

NG NG
Ym(, n); p ,q ( j )  (lmp lnq ) wi wk [ gmp (ri ) gnq (rk )]Fm(,n); p ,q ( j, Rik ) (11.134)
i 1 k 1

where the $w_i$ are the weighting factors, which are given in Table 11.1, and

 ( )  ( j  Ri , k )
 F ( j , Ri , k )  ( j  Ri , k )
 m , n; p , q Ri , k
 (11.135)
 2
 R  rmp (i )  rnq ( k )  a 2
 i ,k

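with $\mathbf{r}_m^1(\xi_i) = (1 - \xi_i)\,\mathbf{r}_{m-1} + \xi_i\, \mathbf{r}_m$ and $\mathbf{r}_m^2(\xi_i) = (1 - \xi_i)\,\mathbf{r}_m + \xi_i\, \mathbf{r}_{m+1}$, where the $\xi_i$ take the values of $x_i$ in Table 11.1.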

Table 11.1
Gauss-Legendre Quadrature Evaluation Points and Weighting Factors

NG = 1    x1 = 0.5         w1 = 1

NG = 2    x1 = 0.211325    w1 = 0.5
          x2 = 1 - x1      w2 = 0.5

NG = 3    x1 = 0.112702    w1 = 0.277778
          x2 = 0.5         w2 = 0.444444
          x3 = 1 - x1      w3 = w1

NG = 4    x1 = 0.069432    w1 = 0.173927
          x2 = 0.330009    w2 = 0.326073
          x3 = 1 - x2      w3 = w2
          x4 = 1 - x1      w4 = w1

NG = 5    x1 = 0.046910    w1 = 0.118463
          x2 = 0.230765    w2 = 0.239314
          x3 = 0.5         w3 = 0.284444
          x4 = 1 - x2      w4 = w2
          x5 = 1 - x1      w5 = w1

NG = 6    x1 = 0.033765    w1 = 0.085662
          x2 = 0.169395    w2 = 0.180381
          x3 = 0.380690    w3 = 0.233957
          x4 = 1 - x3      w4 = w3
          x5 = 1 - x2      w5 = w2
          x6 = 1 - x1      w6 = w1
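The entries of Table 11.1 can be exercised with a short sketch. It assumes the points and weights refer to the unit interval $[0, 1]$, so that an $N_G$-point rule should integrate polynomials up to degree $2N_G - 1$ exactly, up to the six-digit rounding of the table. The names `GAUSS_01` and `integrate01` are hypothetical.

```python
def _mirror(half, middle=None):
    """Build a symmetric rule on [0, 1] from its first half."""
    rule = list(half)
    if middle is not None:
        rule.append(middle)
    rule += [(1.0 - x, w) for x, w in reversed(half)]
    return rule


# (point, weight) pairs copied from Table 11.1.
GAUSS_01 = {
    1: [(0.5, 1.0)],
    2: _mirror([(0.211325, 0.5)]),
    3: _mirror([(0.112702, 0.277778)], (0.5, 0.444444)),
    4: _mirror([(0.069432, 0.173927), (0.330009, 0.326073)]),
    5: _mirror([(0.046910, 0.118463), (0.230765, 0.239314)], (0.5, 0.284444)),
    6: _mirror([(0.033765, 0.085662), (0.169395, 0.180381), (0.380690, 0.233957)]),
}


def integrate01(f, ng):
    """Approximate the integral of f over [0, 1] with the ng-point rule."""
    return sum(w * f(x) for x, w in GAUSS_01[ng])


if __name__ == "__main__":
    for ng in sorted(GAUSS_01):
        k = 2 * ng - 1
        print(ng, integrate01(lambda x: x ** k, ng), 1.0 / (k + 1))
```

For a straight segment of length $l$, a line integral is then approximated by $l \sum_i w_i f(\mathbf{r}(x_i))$, which is the pattern used in (11.133) and (11.134).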

11.4.1.2 Evaluation of Singular Terms

The integrals (11.131) and (11.132) are singular only if $j = 1$ and at the same time $\Delta_m^p$ and $\Delta_n^q$ overlap. This happens in three cases: (1) if $m = n - 1$, $\Delta_m^-$ overlaps with $\Delta_n^+$; (2) if $m = n$, $\Delta_m^+$ overlaps with $\Delta_n^+$, and $\Delta_m^-$ overlaps with $\Delta_n^-$; and (3) if $m = n + 1$, $\Delta_m^+$ overlaps with $\Delta_n^-$.

1. If $m = n - 1$, $\Delta_m^-$ and $\Delta_n^+$ overlap, as shown in Figure 11.10(a). The integrals (11.131) and (11.132) become
1 1
 (1  R )
X n 1, n;2,1 (1)  (ln ) 2   (1   )' d' d (11.136)
0 0
R
1 1
 (1  R )
Yn(1,) n;2,1 (1)     (1  R ) d' d (11.137)
0 0
R

where R  (ln ) (  ' )  a . Because of 0  1  R  1 , we have 0  ln   '


 2 2 2

 (ct )2  a 2 ; thus, the integral domain is the shadowing part in Figure 11.10(b).

Let R  b2 (  ' )2  a 2 and (ct )2  a 2 / b   . Define

1 1
I p(, q) (b)    R  p' q  (1  R )d' d
0 0

1 1 1   1  ' 

   d  d'   d  d'   d'  d  R  p' q (11.138)
 
0 0 min( ,1) 0 min( ,1) 0 

(a) (b)
Figure 11.10 (a) Illustration of singularity treatment and (b) the integral domain of (11.138).

Specifically,

1 1 1   1  ' 
 d' d
( 1)
I 0,0 (b)           
  b2 (  ' )2  a 2
 0 0 min( ,1) 0 min( ,1) 0 

2  a     2  a2 
   ln  (11.139)
b b a 
 

 
where   min ct , a 2  b 2 . More formulae that are needed include

1  a     2  a2 
( 1)
I1,0 ( 1)
(b)  I 0,1 (b)    ln  (11.140)
b b a 
 

2  2a 2  9b 2  (a   ) a      2  a2 
( 1)
I1,1 (b)    ln  (11.141)
3b  6b 2 b a 
 

1  a2   2 
(0)
I 0,0 (b)    2  2  a2  (11.142)
2b  b 

2a3  2 3  3b  2  a 2 a 2    2  a 2
(1)
I 0,0 (b)   ln (11.143)
3b2 b a
Making use of the above formulae, we obtain

X n 1, n;2,1 (1)  (ln )2  I 0,1


( 1) 
(ln ) 
( 1) 
(ln )  I1,1 (11.144)

Yn(0) ( 1) 
1, n;2,1 (1)   I 0,0 (ln ) (11.145)

1 (0) 
Yn(1)1, n;2,1 (1)   I 0,0
( 1) 
(ln )  I 0,0 (ln ) (11.146)
ct
2 (0)  1
Yn(2) ( 1) 
1, n;2,1 (1)   I 0,0 (ln )  I 0,0 (ln )  (1) 
I 0,0 (ln ) (11.147)
ct (ct )2

2. If $m = n$, $\Delta_m^1$ overlaps with $\Delta_n^1$ and $\Delta_m^2$ overlaps with $\Delta_n^2$. The singular integrals are
1 1
 (1  R )
X n, n;1,1 (1)  (ln )2   ' d' d
0 0
R

 (ln )2 I1,1
( 1) 
(ln ) (11.148)
1 1
 (1  R )
X n, n;2,2 (1)  (ln )2   (1   )(1  ' ) d' d
0 0
R

 (ln )2  I 0,0
( 1)  ( 1) 
(ln )  2I1,0 (ln ) 
( 1) 
(ln )  I1,1 (11.149)
1 1
 (1  R )
, n; p , p (1)    
Yn(0) d' d
0 0
R
( 1) p
  I 0,0 (ln ) , ln1  ln , ln2  ln (11.150)
1 1
 (1  R )
, n; p , p (1)     (1  R )
Yn(1) d' d
0 0
R

( 1) p 1 (0) p
  I 0,0 (ln )  I 0,0 (ln ) (11.151)
ct
1 1
 (1  R )
, n; p , p (1)     (1  R )
Yn(2) d' d
2

0 0
R

2 (0) p 1
( 1)
  I 0,0 (lnp )  I 0,0 (ln )  (1)
I 0,0 (lnp ) (11.152)
ct (ct )2

3. If $m = n + 1$, $\Delta_m^1$ overlaps with $\Delta_n^2$. The singular integrals are

X n 1, n;1,2 (1)  (ln )2  I 0,1


( 1) 
(ln ) 
( 1) 
(ln )  I1,1 (11.153)

Yn(0) ( 1) 
1, n;1,2 (1)   I 0,0 (ln ) (11.154)

1 (0) 
Yn(1)1, n;1,2 (1)   I 0,0
( 1) 
(ln )  I 0,0 (ln ) (11.155)
ct
2 (0)  1
Yn(2) ( 1) 
1, n;1,2 (1)   I 0,0 (ln )  I 0,0 (ln )  (1) 
I 0,0 (ln ) (11.156)
ct (ct )2

11.4.2 Matrix Setup for the 3-D Problem

For a 3-D PEC body, evaluations of the matrix elements of (11.100) are the same as (11.128) through (11.132), with the spatial basis functions for wire segmenting replaced by the RWG basis functions for surface meshing. That is, the matrix element $Z^E_{m,n}(j)$ is still written in the same form as (11.128), with (11.131) and (11.132) being rewritten as

 ( j  R)
X m , n; p , q ( j )
  R
fmp (r)  fnq (r' )dS'dS (11.157)
mp nq

 ( j  R)
Ym(, n);    ( j  R)

p,q ( j ) g mp (r) g nq (r' )dS'dS (11.158)
mp nq
R

where $\mathbf{f}_m^1$ corresponds to $\mathbf{f}_m^+$, and $\mathbf{f}_m^2$ corresponds to $\mathbf{f}_m^-$, and so forth.


If j  1 or mp  qn (do not overlap), the two integrals are not singular and
can be calculated by Gaussian quadrature as
NG , NG
 ( j  Ri , k )
X m, n; p , q ( j ) ( Amp Anq ) 
i , k 1
wi wk
Ri , k
fmp (ri )  fnq (rk ) (11.159)

NG , NG
 ( j  Ri , k )
Ym(, n); p , q ( j ) ( Amp Anq )
 
i , k 1
wi wk ( j  Ri ,k )
Ri , k
g mp (ri ) g nq (rk ) (11.160)

where the $w_i$ are the weighting factors, $\mathbf{r}_i = \xi_{1i}\mathbf{r}_1 + \xi_{2i}\mathbf{r}_2 + (1 - \xi_{1i} - \xi_{2i})\,\mathbf{r}_3$ are the evaluation points, and $A_m^p$ and $A_n^q$ are the areas of $\Delta_m^p$ and $\Delta_n^q$. The evaluation points and weighting factors are given in Table 11.2, with $x_i$ standing for $\xi_{1i}$, $y_i$ standing for $\xi_{2i}$, and $z_i$ standing for $1 - \xi_{1i} - \xi_{2i}$.

If j  1 and mp nq (the two triangles overlap), the integrals of (11.157)
and (11.158) will be singular. Let  ( j  R) 
1  ( j  R) , and then

 ( j  R)
X m, n
; p,q ( j) K m, n; p , q   R
fmp (r)  fnq (r' )dS'dS (11.161)
mp qn

 ( j  R)
Ym(, n); p , q ( j
) Pm(, n); p , q ( j )    ( j  R)

g mp (r) g nq (r' )dS'dS (11.162)
mp qn
R

where
1
  Rf
p
K m , n; p , q
 m (r)  fnq (r' )dS'dS (11.163)
mp nq

1 p
Pm(, n);    ( j  R)

p,q ( j) g m (r) g nq (r' )dS'dS (11.164)
mp nq
R

Table 11.2
Gauss-Legendre Quadrature Evaluation Points and Weighting Factors for a Triangle Domain

NG = 1    x1 = 1/3    y1 = 1/3    z1 = 1/3    w1 = 1

NG = 3    x1 = 1/2    y1 = 1/2    z1 = 0      w1 = 1/3
          x2 = 0      y2 = 1/2    z2 = 1/2    w2 = w1
          x3 = 1/2    y3 = 0      z3 = 1/2    w3 = w1

NG = 4    x1 = 1/3    y1 = 1/3    z1 = 1/3    w1 = -27/48
          x2 = 1/5    y2 = 1/5    z2 = 3/5    w2 = 25/48
          x3 = 3/5    y3 = 1/5    z3 = 1/5    w3 = w2
          x4 = 1/5    y4 = 3/5    z4 = 1/5    w4 = w2

NG = 7    x1 = 1/3    y1 = 1/3    z1 = 1/3    w1 = 9/40
          x2 = a      y2 = b      z2 = b      w2 = (155 + √15)/1200
          x3 = b      y3 = a      z3 = b      w3 = w2
          x4 = b      y4 = b      z4 = a      w4 = w2
          x5 = c      y5 = d      z5 = d      w5 = (155 - √15)/1200
          x6 = d      y6 = c      z6 = d      w6 = w5
          x7 = d      y7 = d      z7 = c      w7 = w5
          a = 0.05971587, b = 0.47014206
          c = 0.79742699, d = 0.10128651
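The triangle rules can be checked against exact monomial integrals. The sketch below assumes that the centroid weight of the four-point rule is negative ($-27/48$), that the seven-point weights are $(155 \pm \sqrt{15})/1200$ with the plus sign belonging to the $(a, b, b)$ group, and that the last seven-point node is $(d, d, c)$; these choices are what make every set of weights sum to one and every node satisfy $x + y + z = 1$. The names `RULES`, `quad_mean`, and `exact_mean` are hypothetical.

```python
from math import factorial, sqrt


def exact_mean(a, b):
    """Mean value of xi1**a * xi2**b over the unit right triangle."""
    return 2.0 * factorial(a) * factorial(b) / factorial(a + b + 2)


A7, B7 = 0.05971587, 0.47014206
C7, D7 = 0.79742699, 0.10128651
W_PLUS = (155.0 + sqrt(15.0)) / 1200.0
W_MINUS = (155.0 - sqrt(15.0)) / 1200.0

# Each entry is (x, y, w); z = 1 - x - y is implied.
RULES = {
    1: [(1 / 3, 1 / 3, 1.0)],
    3: [(0.5, 0.5, 1 / 3), (0.0, 0.5, 1 / 3), (0.5, 0.0, 1 / 3)],
    4: [(1 / 3, 1 / 3, -27 / 48), (0.2, 0.2, 25 / 48),
        (0.6, 0.2, 25 / 48), (0.2, 0.6, 25 / 48)],
    7: [(1 / 3, 1 / 3, 9 / 40),
        (A7, B7, W_PLUS), (B7, A7, W_PLUS), (B7, B7, W_PLUS),
        (C7, D7, W_MINUS), (D7, C7, W_MINUS), (D7, D7, W_MINUS)],
}

# Polynomial degree each rule is expected to integrate exactly.
DEGREE = {1: 1, 3: 2, 4: 3, 7: 5}


def quad_mean(ng, a, b):
    """Quadrature estimate of the mean of xi1**a * xi2**b."""
    return sum(w * x ** a * y ** b for x, y, w in RULES[ng])


if __name__ == "__main__":
    for ng in sorted(RULES):
        d = DEGREE[ng]
        print(ng, quad_mean(ng, d, 0), exact_mean(d, 0))
```

Multiplying the quadrature mean by the triangle area gives the surface integral, which is why (11.159) and (11.160) carry the factor $A_m^p A_n^q$.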

The second terms in (11.161) and (11.162) are nonsingular and are again evaluated by using the Gaussian quadrature. The integrals (11.163) and (11.164) are given below [33].
1. If $m = n$, then $p = q$ and we have

lm ln 1
Pm(0),n; p , q  (1) p  q
Amp Anq   R dS'dS
mp nq

lm ln  4 2   1  a  1  b  1  c  
 (1) p  q   A  ln 1    ln 1    ln 1   (11.165)
Amp Anq  3   a  s  b  s  c  s  

lm ln ( j  R) p  q lm ln
Pm(1), n; p , q  (1) p  q
Amp Anq  
mp nq R
dS'dS  jPm(0)
, n  ( 1)
ct
(11.166)

lm ln ( j  R )2
mp qn R dS'dS
pq
Pm(2)
, n; p , q  ( 1)
Amp Anq
lm ln l l 1
 j 2 Pm(0),n  2 j (1) p  q
ct
 (1) p  q mp n q
Am An (ct )2   R dS'dS (11.167)
mp nq

lm ln ρmp  ρqn lm ln
K m, n; p , q 
2 Amp 2 Anq p q R dS'dS  2 Amp 2 Anq I S (11.168)
 
m n

A2   a 2  b2 a2  c2   a 2  b2 b2  c2 
IS  10  3 2
3 a  5  3 2 b
30  c b2   c 2
a2 
 a2  c2 c2  b2   2 2 2 A2  2 a
5  3 2
 2 2  c   a  3b  3c  8 2 
ln(1  )
 b a   a a s
 A2 4 b  2 A2 4 c 
  a 2  2b 2  4c 2  6 2  ln(1  )   a  4b 2
 2 c 2
 6  ln(1  ) 
 b b s  c2 c s 
(11.169)

In the above, A  1
2
(s  a)(s  b)(s  c) is the area of  mp , s  12 (a  b  c) ,
a, b , and c are the lengths of the three edges; a  lm is the mth edge, and b and
c are the other two edges. The last term in (11.167) can be calculated by Gaussian
quadrature.
2. If $m \ne n$, $I_S$ is written as

A2   a 2  b2 a 2  c2   a 2  b2 b2  c2 
I S  (1) p  q  10   a  5  6 b
60  c2 b2   c 2
a2 
 a2  c2 c2  b2   2 2 2 A2  12 a
5  2
6 2  c   2a  b  c  4 2  ln(1  )
 b a   a a s
 A2 2 b  2 A2 2 c 
  9a 2  3b 2  c 2  4 2 2 2
 ln(1  )   9a  b  3c  4 2  ln(1  ) 
 b b s  c c s 
(11.170)

where $b$ and $c$ are the lengths of the $m$th and $n$th edges, and $a$ is the length of the third edge.

For the MFIE, it is not difficult to show

\[
\frac{S'(j - \bar R)}{R} + \frac{S''(j - \bar R)}{c\Delta t} = \frac{1}{R} \begin{cases}
j + 1, & 0 \le j + 1 - \bar R \le 1 \\
1 - 2j, & 0 \le j - \bar R \le 1 \\
j - 2, & 0 \le j - 1 - \bar R \le 1
\end{cases} \tag{11.171}
\]
Define

1 1, j  1
Dm, n ( j )   ( j )  fm (r)  fn (r) dS ,  ( j )   (11.172)
2 m 0, otherwise

1 ˆ )[f (r )  f (r' )]  
 ( j  R ) (nˆ  R 
P.V.  
m n
Vm, n ( j )    dS'dS (11.173)
4π R 2
ˆ
n n (nˆ  R)  [f m (r )  f n (r' )]
 
There is no singularity with the last integral (the integrand is taken to be zero as $R \to 0$). Thus, the matrix element (11.104) can be cast in a recursive way:

ZmH,n ( j )  [ Dm,n ( j  1)  Dm,n ( j )]


[( j  1)Vm,n ( j  1)  (1  2 j )Vm,n ( j )  ( j  2)Vm,n ( j 1)] (11.174)

The evaluations of matrix elements for a dielectric object are essentially the same as for the PEC body. Specifically, (11.120) is the same as (11.100), while (11.121) is only a little different from (11.104), which is easy to handle.
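The piecewise identity (11.171) that underlies the recursive MFIE form can be checked numerically. The sketch below assumes that the left-hand side reads $S'(j - \bar R)/R + S''(j - \bar R)/(c\Delta t)$ with $R = \bar R\, c\Delta t$, and that the right-hand side is $(j+1)/R$, $(1-2j)/R$, or $(j-2)/R$ on the three support intervals of the quadratic B-spline. The function names are hypothetical.

```python
def dS(t):
    """First derivative of the quadratic B-spline, i.e. T of (11.55)."""
    if -1.0 < t < 0.0:
        return 1.0 + t
    if 0.0 < t < 1.0:
        return 1.0 - 2.0 * t
    if 1.0 < t < 2.0:
        return t - 2.0
    return 0.0


def d2S(t):
    """Second derivative: piecewise constant 1, -2, 1 on the three intervals."""
    if -1.0 < t < 0.0 or 1.0 < t < 2.0:
        return 1.0
    if 0.0 < t < 1.0:
        return -2.0
    return 0.0


def lhs(j, rbar, cdt):
    r = rbar * cdt
    return dS(j - rbar) / r + d2S(j - rbar) / cdt


def rhs(j, rbar, cdt):
    r = rbar * cdt
    tau = j - rbar
    if -1.0 < tau < 0.0:
        return (j + 1) / r
    if 0.0 < tau < 1.0:
        return (1 - 2 * j) / r
    if 1.0 < tau < 2.0:
        return (j - 2) / r
    return 0.0


if __name__ == "__main__":
    for j in range(0, 5):
        print(j, lhs(j, 1.4, 0.37), rhs(j, 1.4, 0.37))
```

The point of the identity is that the distance dependence collapses to a single factor $1/R$ times an integer coefficient, so the same windowed integral can be reused with different coefficients at neighbouring time steps.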

11.4.3 Matrix Setup for the 2-D Problems

11.4.3.1 TM Case

Refer to Figure 11.6. For the TM case, if we use the EFIE, the matrix element is (11.66). If we use the MFIE, the matrix element is (11.69). Substituting (11.126) into (11.67) and (11.70), and making use of

\[
\int \frac{dx}{\sqrt{x^2 - R^2}} = \ln\left[ x + \sqrt{x^2 - R^2} \right] \tag{11.175}
\]

\[
\int \frac{1}{x - R} \frac{dx}{\sqrt{x^2 - R^2}} = -\frac{1}{R} \sqrt{\frac{x + R}{x - R}} \tag{11.176}
\]

we find that

F  j, R   L  j  1, R  u ( j  1  R )  3L  j , R  u ( j  R )
 3L  j  1, R  u ( j  1  R )  L  j  2, R  u ( j  2  R ) (11.177)
 ln( R)   ( j  1  R )  2  ( j  R )   ( j  1  R ) 

G  j , R   K  j  1, R  u ( j  1  R )  3K  j , R  u ( j  R )
(11.178)
 3K  j  1, R  u ( j  1  R )  K  j  2, R  u ( j  2  R )

In the above,  ( x) is defined in (11.124), and

(11.179)
1, x  0
u  x  
0, x  0

L( j, R)  ln ( jct )  ( jct )2  R 2  (11.180)


 
 ( jct )  ( jct )  R
K ( j , R)    1 (11.181)
 R  ( jct )  R
Now we define
smp snq

  L  j, R  u( j  R ) ( )nq (' )dC' dC


p,q p
X m, n ( j)  m (11.182)
0 0

smp snq

  ln( R)  ( j  R) ( )nq (' )dC' dC


p,q p
Y m, n ( j)  m (11.183)
0 0

smp

  mp   nq   dC , dC  s( ) d  sm d


p,q p
 m, n (11.184)
0

smp snq

U p,q ˆ ) p    q '  dC' dC


( j )  P.V.   K  j, R  u ( j  R )(nˆ  R (11.185)
m, n m n
0 0

where s1m  sm and sm2  sm are the lengths of the portions  m and  m ,
respectively, and so forth. Using these definitions, (11.66) and (11.69) are
calculated as
1 1 2,2

,n  j  
Z mE,TM
2π ct
   j
p , q 1
p,q
m, n (11.186)

1 2,2
1 1 2,2

,n  j  
Z mH,TM T  j    mp ,,qn     j p,q
m, n (11.187)
2 p , q 1 2π ct p , q 1

with
 mp ,,qn ( j )  X mp,,nq  j  1  3 X mp,,nq  j   3 X mp,,nq  j  1  X mp,,nq  j  2 
(11.188)
+Ymp,,nq  j  1  2Ymp,,nq  j   Ymp,,nq  j  1

mp,,qn ( j )  U mp,,nq  j  1  3U mp,,nq  j   3U mp,,nq  j  1  U mp,,nq  j  2 (11.189)

The integral (11.184) can be carried out analytically and the results are:
1  1,1 1 1  2,1 1
1,2
m , m 1  sm ,  m, m  sm ,  2,2
m, m  sm ,  m, m 1  sm (11.190)
6 3 3 6
p,q
The other  m , n ’s are zero. The integrals (11.182) and (11.185) are nonsingular and
can be evaluated by Gaussian quadrature, say, using the seven-point rule. The
integral (11.183) is also nonsingular if j  2 or if  mp and  qn do not overlap. The
singularity arises only if  mp and  qn overlap along with j  1 .
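As a quick sanity check of the two antiderivatives (11.175) and (11.176) used in this subsection, the sketch below differentiates them numerically and compares the result with the integrands. It assumes that (11.176) carries a leading minus sign, that is, $-(1/R)\sqrt{(x+R)/(x-R)}$; the function names are hypothetical.

```python
from math import log, sqrt


def F175(x, R):
    """Antiderivative of 1/sqrt(x^2 - R^2), valid for x > R > 0."""
    return log(x + sqrt(x * x - R * R))


def F176(x, R):
    """Antiderivative of 1/((x - R) sqrt(x^2 - R^2)), valid for x > R > 0."""
    return -sqrt((x + R) / (x - R)) / R


def f175(x, R):
    return 1.0 / sqrt(x * x - R * R)


def f176(x, R):
    return 1.0 / ((x - R) * sqrt(x * x - R * R))


def central_diff(F, x, R, h=1e-5):
    return (F(x + h, R) - F(x - h, R)) / (2.0 * h)


if __name__ == "__main__":
    for x, R in ((2.0, 1.0), (3.5, 0.4), (1.2, 1.0)):
        print(central_diff(F175, x, R), f175(x, R),
              central_diff(F176, x, R), f176(x, R))
```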
Define

 
 x y  m n
Pm , n      ln  x  y   1   x y dydx
0 0  ct 
min( ct , ) x  x
  x m dx  ln  x  y  y n dy   x m dx  ln  x  y  y n dy (11.191)
0 0 min( ct , ) x  c t
min( ct , ) y  y

  y n dy  ln  y  x  x m dx   y n dy  ln  y  x  x m dx
0 0 min( ct , ) y  ct

The integration domain is shown in Figure 11.11. The integrals for m, n  0,1
can be carried out analytically without difficulty and the results are

2
P0,0    (2ln   3)  2(   )(ln   1) (11.192)
2
3
P1,0    P0,1    (2 ln   3)  ( 2   2 )(ct ) ln(ct )  1
4
(11.193)
(ct ) 2
 (   )  2 ln(ct )  1
4

4 2
P1,1    (4 ln   7)  ( 3   3 )(ct ) ln(ct )  1
16 3 (11.194)
1 2
 (   )(ct ) 2  2 ln(ct )  1
2

4
in which   min(ct ,  ) , and only the first terms remain if ct   or    .

Figure 11.11 Illustration of the integration domain of (11.191).

By using these results, the singular integrals that will be used are

sm sm
 s  s'  s sm  s'
Ym1,2,m 1 (1)    ln( s  s' )  1  
ct  sm sm
ds ds
0 0
(11.195)
1 1
  P1,0 ( sm )   2 P1,1 ( sm )
sm ( sm )

sm sm
 s  s'  s s'
  ln( s  s' )  1  ds ds
1,1
Y m, m (1)  
ct  sm sm
0 0
(11.196)
1 
 P1,1 ( s )
m
( sm ) 2

sm sm
 s  s'  sm  s sm  s'
0 0  ds ds
2,2
Y m,m (1)  ln( s  s' )  1  
 ct  sm sm
(11.197)
2 1
 P0,0 ( sm )   P1,0 ( sm )   2 P1,1 ( sm )
sm ( sm )

sm sm
 s  s'  sm  s s'
0 0  ds ds
2,1
Y m , m 1 (1)  ln( s  s' )  1  
 ct  sm sm
(11.198)
1 1
  P0,1 ( sm )   2 P1,1 ( sm )
sm ( sm )

11.4.3.2 TE Case

For TE wave incidence, if we use EFIE, the matrix elements are (11.73), and
(11.74) becomes

 j 1 ct
d(ct )
W  j, R    S ( j  t )
max  R ,  j  2  ct  (ct )2  R 2
 Q( j  1, R)u ( j  1  R )  3Q( j, R)u ( j  R )
 3Q( j  1, R)u ( j  1  R )  Q( j  2, R)u ( j  2  R )
1 (11.199)
 ln( R) ( j  1) 2  ( j  1  R )  (2 j 2  2 j  1)  ( j  R )
2
( j  2) 2  ( j  1  R ) 

with

1  R2  3j R2 
Q( j, R)   2 j 2   L ( j , R )  ( jc t ) 2
 R 2
  (11.200)
4  (ct )2  ct (ct )2 

If we use MFIE, the elements are (11.69). Similar to (11.186) and (11.187),
(11.73) and (11.76) are calculated by

1 1 2,2
Z mE,TE
,n ( j) 
2π ct
 (sˆ
p , q 1
p
m  sˆ qn ) mp ,,qn ( j )  (ct )2 mp ,,qn ( j )  (11.201)

1 2,2

,n  j  
Z mH,TE T  j   (sˆ mp  sˆ nq ) mp ,,qn
2 p , q 1

1 1 2,2

2π ct
 (sˆ
p , q 1
p
m  sˆ nq ) mp ,,qn ( j )  (sˆ mp  sˆ nq )  Λ mp ,,qn ( j )  (11.202)

with

smp snq

  W ( j , R) g (ρ) g nq (ρ)dsds
p,q p
 m, n ( j)  m (11.203)
0 0

smp snq

Λ p,q ˆ ) p (ρ) q (ρ' )ds ds


( j )  P.V.   G  j, R  (nˆ  R (11.204)
m, n m n
0 0

The integral (11.203) has a logarithmic singularity only if the integrations are
performed over the same segments along with j  1 . Treatment of the singularity is
the same as (11.191) but involves only P0,0 ( ) . The integral (11.204) is
nonsingular and evaluated by Gaussian quadrature. A recursive way to calculate
mp ,,qn ( j ) and Λ mp ,,qn ( j ) , like  mp ,,qn ( j ) and  mp ,,qn ( j ) may be adopted.
So far, we have provided all the formulae for evaluating the matrix elements
for 1-D, 2-D, and 3-D geometries. Closed-form expressions must be used for the
singular integrals, which occur at the j = 1 time step when the source
segment/triangle overlaps the field segment/triangle. Higher-order Gaussian
quadrature is suggested for nonsingular integrals.

11.5 EXTENSION TO MOVING OBJECTS

Extending the conventional TDIE methods to scattering by arbitrarily moving
bodies is of growing interest, because superfast targets are emerging at an
accelerated pace. A moving target can translate as a whole at hypervelocity
while rotating simultaneously about three orthogonal axes. The translation is
known to cause Doppler effects, while the rotations, called micro-motions,
produce micro-Doppler characteristics.
Traditionally, analysis of scattering by a moving object is based on the
stop-go-stop model, which analyzes the scattering at a series of discrete
instants, with the object frozen at the corresponding positions and attitudes.
This is valid if the speed is not very high and acceleration effects are
negligible. In other words, during the interaction of the incident pulse with
the object, the displacement of the object should not exceed a prescribed
distance, say, a half-wavelength of the carrier wave; otherwise, the echo
waveform would be seriously distorted by the motion, which could blur
positioning, imaging, and recognition.
There are two rigorous approaches to studying the scattering problems by
moving objects. One is to study it directly in the laboratory reference system,
which needs to modify the boundary conditions and even the wave equations as
well. This is not easy to handle if the shape of the target is irregular. The other is to
study it in the moving or target reference system, and transform the results between
different reference systems. This latter approach is called the Frame Hopping
Method (FHM) by Einstein. In this section, we implement the FHM in a numerical
fashion.

11.5.1 Transforms of Space Time and Fields

Refer to Figure 11.12. An object is moving at high speed and acceleration, and in
the meantime, it may rotate about a center. To characterize the motions, we
introduce four reference systems, or frames, as illustrated in Figure 11.13. We
assume that the ground frame, or G-frame, is on the Earth with the x-axis
southward, y-axis eastward, and z-axis upward. The target frame, or T-frame, is
fixed on the target with its z-axis in heading direction, x-axis toward the left wing,
and y-axis upward from the back of the target. Between the G-frame and T-frame,
two intermediate frames, called the S-frame and C-frame, are introduced. The S-
frame characterizes the initial position and orientations of the target, while the C-
frame characterizes the motion of an apparent barycenter that can be superfast
and/or in acceleration.

[Figure 11.12 graphic: a moving object with rotation axes labeled axis 1, axis 2, and axis 3.]

Figure 11.12 Illustration of a moving object.

The space-time transform from the G-frame to the S-frame is expressed as

$$
\begin{bmatrix} t_S \\ \mathbf{r}_S \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{ini} \end{bmatrix} \cdot
\begin{bmatrix} t_G - t_0 \\ \mathbf{r}_G - \mathbf{r}_0 \end{bmatrix}
\tag{11.205}
$$

where $\mathbf{r}_0$ is the initial position, $t_0$ is the time that the wave travels from the origin
of the G-frame to the origin of the S-frame, and $\mathbf{R}_{ini}$ reflects the orientation of the
object,

$$
\mathbf{R}_{ini} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos(\tfrac{1}{2}\pi - \theta_v) & \sin(\tfrac{1}{2}\pi - \theta_v) \\
0 & -\sin(\tfrac{1}{2}\pi - \theta_v) & \cos(\tfrac{1}{2}\pi - \theta_v)
\end{bmatrix} \cdot
\begin{bmatrix}
\cos(\varphi_v + \tfrac{1}{2}\pi) & \sin(\varphi_v + \tfrac{1}{2}\pi) & 0 \\
-\sin(\varphi_v + \tfrac{1}{2}\pi) & \cos(\varphi_v + \tfrac{1}{2}\pi) & 0 \\
0 & 0 & 1
\end{bmatrix}
\tag{11.206}
$$

where $\varphi_v$ is the azimuthal angle measured from the $x_G$-axis to the projection of the
velocity vector onto the $x_G$-$y_G$ plane, and $\theta_v$ is the elevation angle measured from
the projection to the velocity vector, as shown in Figure 11.14.
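As a quick numerical check of the orientation matrix, the sketch below builds $\mathbf{R}_{ini}$ as a rotation about the z-axis by $\varphi_v + \pi/2$ followed by a rotation about the new x-axis by $\pi/2 - \theta_v$, and confirms that the velocity direction is mapped onto the z-axis (heading direction) of the S-frame. The signs inside the rotation arguments and the helper names (`rot_z`, `rot_x`, `r_ini`) are assumptions made for this illustration.

```python
import math

def rot_z(alpha):
    """Matrix [[c, s, 0], [-s, c, 0], [0, 0, 1]] (frame rotation about z)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(beta):
    """Matrix [[1, 0, 0], [0, c, s], [0, -s, c]] (frame rotation about x)."""
    c, s = math.cos(beta), math.sin(beta)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def r_ini(phi_v, theta_v):
    """Orientation matrix; the sign convention is assumed, not taken from a reference code."""
    return matmul(rot_x(0.5 * math.pi - theta_v), rot_z(phi_v + 0.5 * math.pi))

def velocity_direction(phi_v, theta_v):
    """Unit velocity vector in the G-frame from azimuth and elevation."""
    return [math.cos(theta_v) * math.cos(phi_v),
            math.cos(theta_v) * math.sin(phi_v),
            math.sin(theta_v)]

# Hypothetical angles chosen only for the check.
phi_v, theta_v = math.radians(30.0), math.radians(20.0)
heading_in_s = matvec(r_ini(phi_v, theta_v), velocity_direction(phi_v, theta_v))
```

With these conventions, `heading_in_s` comes out as the unit vector along z, and the first row of the matrix points to the left of the heading, which matches the description of the target axes.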

[Figure 11.13 blocks: displacement and orientations (position and attitudes); relativity transform (translation, Doppler); roll/pitch/yaw or spinning/nutation/coning (micro-rotations, micro-Doppler).]

Figure 11.13 Illustration of reference system transforms.

Figure 11.14 Illustration of transform between G-frame and S-frame.

The transforms between the S-frame and the C-frame are relativistic. If the
translation is uniform, they are expressed as

$$
\begin{cases}
ct_C = \gamma_0 (ct_S - \beta_0 z_S) \\
x_C = x_S \\
y_C = y_S \\
z_C = \gamma_0 (z_S - \beta_0 ct_S)
\end{cases},
\qquad
\begin{cases}
ct_S = \gamma_0 (ct_C + \beta_0 z_C) \\
x_S = x_C \\
y_S = y_C \\
z_S = \gamma_0 (z_C + \beta_0 ct_C)
\end{cases}
\tag{11.207}
$$

where $\beta_0 = v_0 / c$, $\gamma_0 = 1/\sqrt{1 - \beta_0^2}$, and $v_0$ is the target speed at $t_S = 0$. If the
acceleration effect is taken into consideration, the space-time transform becomes

   ctC   ctC  
ctS  0 ( zC   )   0 cosh 
   sinh       0  0
      

ct ct
 z  ( z   ) cosh  C    sinh  C     


 S 0 C     0    0
      (11.208)

where   c 2 / a with a being the acceleration. It is noticed that acceleration effect


is significant only when ctc is comparable with  , or atc is comparable with c ,
which means that a is very large or tC is very long. In the following, we will
assume atc c such that (11.208) is reduced to (11.207).
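The uniform-translation transform (11.207) can be sketched in a few lines. The example below applies the boost along z to one space-time event, inverts it, and checks that the interval $(ct)^2 - x^2 - y^2 - z^2$ is unchanged. The function names and the sample event are hypothetical; only the formulas come from (11.207).

```python
import math

def lorentz_factor(beta0):
    return 1.0 / math.sqrt(1.0 - beta0 ** 2)

def s_to_c(ct_s, x_s, y_s, z_s, beta0):
    """S-frame event -> C-frame event for uniform motion along z."""
    g = lorentz_factor(beta0)
    return (g * (ct_s - beta0 * z_s), x_s, y_s, g * (z_s - beta0 * ct_s))

def c_to_s(ct_c, x_c, y_c, z_c, beta0):
    """Inverse transform: same form with the sign of beta0 reversed."""
    g = lorentz_factor(beta0)
    return (g * (ct_c + beta0 * z_c), x_c, y_c, g * (z_c + beta0 * ct_c))

def interval(ct, x, y, z):
    """Space-time interval, invariant under the boost."""
    return ct ** 2 - x ** 2 - y ** 2 - z ** 2

beta0 = 3.0e7 / 299792458.0          # about 0.1, an exaggerated target speed
event_s = (10.0, 1.0, -2.0, 3.0)     # (ct, x, y, z) in meters, arbitrary sample
event_c = s_to_c(*event_s, beta0)
round_trip = c_to_s(*event_c, beta0)
```

Applying `s_to_c` and then `c_to_s` returns the original event to rounding error, which is a convenient sanity test before chaining the rotation matrices on either side.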
Finally, the transform from the C-frame to the T-frame is

$$
\begin{bmatrix} t_T \\ \mathbf{r}_T \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} t_C \\ \mathbf{r}_C \end{bmatrix}
\tag{11.209}
$$

where $\mathbf{R}_{mic}$ characterizes the rotations about three axes intersecting at the apparent
barycenter, referred to as micro-motions. For a plane-like object,
R mic R roll  R pitch  R yaw
 cos sin 0  1 0 0   cos  0 sin  
   (11.210)
   sin cos 0   0 cos  sin     0 1 0 
 0 0 1  0  sin  cos     sin  0 cos  

where  (t ),  (t ), and  (t ) are the angles of yaw, pitch, and roll maneuvers. For a
missile-like object,
R mic  Rspinning  R nutation  R coning
 cos  sin  0  1 0 0   cos  sin  0 
(11.211)
   sin  cos  0   0 cos  sin      sin  cos  0 
 0 0 1  0  sin  cos    0 0 1 

with
 (t) s t   0

)  p  m sin(n t   0 )
(t  (11.212)

(t ) c t   0

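where $\omega_s$, $\omega_n$, and $\omega_c$ are the angular frequencies of spinning, nutation, and
coning, and $\theta_p$ is the angle of precession. The transforms of EMFs from the
G-frame to the S-frame are

$$
\mathbf{E}_S = \mathbf{R}_{ini} \cdot \mathbf{E}_G, \qquad \mathbf{B}_S = \mathbf{R}_{ini} \cdot \mathbf{B}_G
\tag{11.213}
$$

The transforms from the S-frame to the C-frame involve relativity. If
acceleration effects are ignored, the results are

$$
\mathbf{E}_C = \mathbf{L} \cdot \mathbf{E}_S + \mathbf{K} \cdot c\mathbf{B}_S, \qquad
c\mathbf{B}_C = \mathbf{L} \cdot c\mathbf{B}_S - \mathbf{K} \cdot \mathbf{E}_S
\tag{11.214}
$$

with

$$
\mathbf{L} = \begin{bmatrix} \gamma_0 & 0 & 0 \\ 0 & \gamma_0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\mathbf{K} = \begin{bmatrix} 0 & -\gamma_0\beta_0 & 0 \\ \gamma_0\beta_0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\tag{11.215}
$$

If rotating effects are negligible, that is, $\omega_{max} D_{max} \ll c$ where $\omega_{max} = \max(\omega_s, \omega_n, \omega_c)$
and $D_{max}$ is the dimension of the target, the transforms from the
C-frame to the T-frame are

$$
\mathbf{E}_T = \mathbf{R}_{mic} \cdot \mathbf{E}_C, \qquad \mathbf{B}_T = \mathbf{R}_{mic} \cdot \mathbf{B}_C
\tag{11.216}
$$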
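A short numerical sketch of the S-frame to C-frame field transform may help in checking signs. It assumes a boost along z with $\mathbf{L} = \mathrm{diag}(\gamma_0, \gamma_0, 1)$ and a matrix $\mathbf{K}$ whose nonzero entries are $K_{xy} = -\gamma_0\beta_0$ and $K_{yx} = \gamma_0\beta_0$; this sign choice is an assumption consistent with the standard Lorentz transform of fields. The code verifies the two field invariants, $\mathbf{E} \cdot c\mathbf{B}$ and $|\mathbf{E}|^2 - |c\mathbf{B}|^2$, and the amplitude scaling of a plane wave travelling along z. All function names are hypothetical.

```python
import math

def boost_matrices(beta0):
    """Return (L, K) for a boost along z; the sign of K is an assumed convention."""
    gamma0 = 1.0 / math.sqrt(1.0 - beta0 ** 2)
    L = [[gamma0, 0.0, 0.0], [0.0, gamma0, 0.0], [0.0, 0.0, 1.0]]
    K = [[0.0, -gamma0 * beta0, 0.0], [gamma0 * beta0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    return L, K

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def fields_s_to_c(e_s, cb_s, beta0):
    """E_C = L.E_S + K.cB_S and cB_C = L.cB_S - K.E_S."""
    L, K = boost_matrices(beta0)
    e_c = [p + q for p, q in zip(matvec(L, e_s), matvec(K, cb_s))]
    cb_c = [p - q for p, q in zip(matvec(L, cb_s), matvec(K, e_s))]
    return e_c, cb_c

beta0 = 0.1
e_s, cb_s = [1.0, 0.5, -0.2], [0.3, -0.7, 0.4]      # arbitrary sample fields
e_c, cb_c = fields_s_to_c(e_s, cb_s, beta0)

# Plane wave along +z: E along x, cB along y, unit amplitude in the S-frame.
pw_e, pw_cb = fields_s_to_c([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], beta0)
doppler_factor = math.sqrt((1.0 - beta0) / (1.0 + beta0))
```

For the plane wave, both transformed amplitudes equal `doppler_factor`, so a wave chasing a receding target is weakened by the same factor that lowers its frequency.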

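Now, as two specific examples, we transform a monochromatic plane wave
and a modulated Gaussian impulse from the G-frame to the T-frame. The
monochromatic plane wave in the G-frame is written as

$$
\mathbf{E}_G(\mathbf{r}_G, t_G) = \hat{\mathbf{p}}_G E_0 \cos(\omega_G t_G - \mathbf{k}_G \cdot \mathbf{r}_G)
\tag{11.217}
$$

$$
c\mathbf{B}_G(\mathbf{r}_G, t_G) = \hat{\mathbf{q}}_G E_0 \cos(\omega_G t_G - \mathbf{k}_G \cdot \mathbf{r}_G)
\tag{11.218}
$$

where $\hat{\mathbf{p}}_G$ indicates the polarization, $\hat{\mathbf{q}}_G = \hat{\mathbf{k}}_G \times \hat{\mathbf{p}}_G$ with $\hat{\mathbf{k}}_G$ being the propagation
direction, and $\mathbf{k}_G = k_G \hat{\mathbf{k}}_G$ with $k_G = \omega_G / c$. Because we ignore acceleration
effects and all four frames are inertial systems, the field expressions in the four
frames take the same forms as (11.217) and (11.218). Thus, the fields in the
T-frame are

$$
\mathbf{E}_T(\mathbf{r}_T, t_T) = \hat{\mathbf{p}}_T E_0 \cos(\omega_T t_T - \mathbf{k}_T \cdot \mathbf{r}_T + \varphi_0)
\tag{11.219}
$$

$$
c\mathbf{B}_T(\mathbf{r}_T, t_T) = \hat{\mathbf{q}}_T E_0 \cos(\omega_T t_T - \mathbf{k}_T \cdot \mathbf{r}_T + \varphi_0)
\tag{11.220}
$$

with $\varphi_0 = \omega_G (t_0 - \hat{\mathbf{k}}_G \cdot \mathbf{r}_0 / c)$, and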

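$$
\begin{aligned}
\hat{\mathbf{p}}_T &= \mathbf{R}_{mic} \cdot \hat{\mathbf{p}}_C \\
&= \mathbf{R}_{mic} \cdot (\mathbf{L} \cdot \hat{\mathbf{p}}_S + \mathbf{K} \cdot \hat{\mathbf{q}}_S) \\
&= \mathbf{R}_{mic} \cdot \left[ \mathbf{L} \cdot (\mathbf{R}_{ini} \cdot \hat{\mathbf{p}}_G) + \mathbf{K} \cdot (\mathbf{R}_{ini} \cdot \hat{\mathbf{q}}_G) \right]
\end{aligned}
\tag{11.221}
$$

$$
\begin{aligned}
\hat{\mathbf{q}}_T &= \mathbf{R}_{mic} \cdot \hat{\mathbf{q}}_C \\
&= \mathbf{R}_{mic} \cdot (\mathbf{L} \cdot \hat{\mathbf{q}}_S - \mathbf{K} \cdot \hat{\mathbf{p}}_S) \\
&= \mathbf{R}_{mic} \cdot \left[ \mathbf{L} \cdot (\mathbf{R}_{ini} \cdot \hat{\mathbf{q}}_G) - \mathbf{K} \cdot (\mathbf{R}_{ini} \cdot \hat{\mathbf{p}}_G) \right]
\end{aligned}
\tag{11.222}
$$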

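It is not difficult to show that $\hat{\mathbf{q}}_T = \hat{\mathbf{k}}_T \times \hat{\mathbf{p}}_T$, and the transforms for $(\omega, c\mathbf{k})$ are
the same as those for the space-time $(ct, \mathbf{r})$, that is,

$$
\begin{aligned}
\begin{bmatrix} \omega_T \\ c\mathbf{k}_T \end{bmatrix}
&= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} \omega_C \\ c\mathbf{k}_C \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} \gamma_0 & 0 & 0 & -\gamma_0\beta_0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\gamma_0\beta_0 & 0 & 0 & \gamma_0 \end{bmatrix} \cdot
\begin{bmatrix} \omega_S \\ c\mathbf{k}_S \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} \gamma_0 & 0 & 0 & -\gamma_0\beta_0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\gamma_0\beta_0 & 0 & 0 & \gamma_0 \end{bmatrix} \cdot
\begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{ini} \end{bmatrix} \cdot
\begin{bmatrix} \omega_G \\ c\mathbf{k}_G \end{bmatrix}
\end{aligned}
\tag{11.223}
$$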
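A modulated Gaussian impulse in the G-frame and the T-frame is written in
the same form as

$$
\mathbf{E}_G(\mathbf{r}_G, t_G) = \hat{\mathbf{p}}_G E_0 \exp\left[ -\left( \frac{\tau_G - t_{dG}}{\sqrt{2}\,\sigma_G} \right)^2 \right] \cos(\omega_G \tau_G)
\tag{11.224}
$$

$$
\mathbf{E}_T(\mathbf{r}_T, t_T) = \hat{\mathbf{p}}_T E_0 \exp\left[ -\left( \frac{\tau_T - t_{dT}}{\sqrt{2}\,\sigma_T} \right)^2 \right] \cos(\omega_T \tau_T + \varphi_0)
\tag{11.225}
$$

where $\tau_G = t_G - \hat{\mathbf{k}}_G \cdot \mathbf{r}_G / c$ and $\tau_T = t_T - \hat{\mathbf{k}}_T \cdot \mathbf{r}_T / c$. The transforms from $(\omega_G, c\mathbf{k}_G)$
to $(\omega_T, c\mathbf{k}_T)$ are still (11.223), while the transforms from $(t_{dG}, \sigma_G)$ to $(t_{dT}, \sigma_T)$ are

$$
t_{dT} = \frac{\omega_G t_{dG} - \varphi_0}{\omega_T}, \qquad \sigma_T = \frac{\omega_G}{\omega_T} \sigma_G
\tag{11.226}
$$

It is seen that the nominal bandwidth, or equivalently the effective pulse duration, has
changed; the bandwidths may be defined as $f_{bw,G} = 6 / (2\pi\sigma_G)$ and $f_{bw,T} = 6 / (2\pi\sigma_T)$.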
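To make the frequency and pulse-parameter transforms concrete, the sketch below boosts $(\omega, c\mathbf{k})$ along z and then rescales the delay and width of the Gaussian envelope. It assumes identity rotation matrices (so that only the relativistic step acts), an assumed phase offset, and hypothetical function names; the relations used are $t_{dT} = (\omega_G t_{dG} - \varphi_0)/\omega_T$ and $\sigma_T = (\omega_G/\omega_T)\sigma_G$.

```python
import math

def boost_frequency(omega_s, ck_s, beta0):
    """Lorentz-transform (omega, c*k) for a boost along z, mirroring the (ct, r) transform."""
    gamma0 = 1.0 / math.sqrt(1.0 - beta0 ** 2)
    ckx, cky, ckz = ck_s
    omega_c = gamma0 * (omega_s - beta0 * ckz)
    ck_c = (ckx, cky, gamma0 * (ckz - beta0 * omega_s))
    return omega_c, ck_c

def transform_pulse(omega_g, omega_t, t_dg, sigma_g, phi0):
    """Delay and width of the Gaussian envelope as seen in the T-frame."""
    t_dt = (omega_g * t_dg - phi0) / omega_t
    sigma_t = omega_g * sigma_g / omega_t
    return t_dt, sigma_t

def nominal_bandwidth(sigma):
    return 6.0 / (2.0 * math.pi * sigma)

# Hypothetical numbers: wave along +z, target receding at about 0.1c.
beta0 = 0.1
omega_g = 2.0 * math.pi * 450e6
omega_t, ck_t = boost_frequency(omega_g, (0.0, 0.0, omega_g), beta0)

sigma_g = 6.0 / (2.0 * math.pi * 300e6)
t_dg = 6.0 * sigma_g
phi0 = 0.3                                   # assumed phase offset in radians
t_dt, sigma_t = transform_pulse(omega_g, omega_t, t_dg, sigma_g, phi0)

# Phase matching links the retarded times: omega_G*tau_G = omega_T*tau_T + phi0.
tau_t = 1.0e-8
tau_g = (omega_t * tau_t + phi0) / omega_g
```

Because the carrier phase is the same in both frames, the normalized envelope argument $(\tau - t_d)/\sigma$ is also the same, and the nominal bandwidth scales by $\omega_T/\omega_G$.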

11.5.2 Simulation Process

If we ignore all noninertial effects that may be caused by accelerating translation
or rotations, the numerical methods in the T-frame are the same as those in the
G-frame. We therefore concentrate on transforming the scattered fields from the
T-frame to the G-frame and postprocessing the data in the G-frame.
The first step is to transform the incident wave from the G-frame to the
T-frame, for example, the modulated Gaussian impulse from (11.224) to (11.225). The
discretization methods described in the previous sections remain valid but are now
applied in the T-frame. Once the currents are solved by the MOT equations, the
scattered fields in the T-frame can be found. For the PEC body, the scattered fields
are computed by using (11.4), which in the far zone is reduced to
N
EsT (rT , tT )   I n EsT, n (rT , tT ) (11.227)
n 1

0 1 ˆ s ˆ s
EsT, n (rT , tT )  k T  k T   f n (r' )T' ( T  kˆ sT  r' )dS' (11.228)
4πrT ct n

where k̂ sT is the scattering direction and  T  tT  rT is the retarded time. The


magnetic field in far zone is cBsT (rT , tT )  kˆ sT  EsT (rT , tT ) . Suppose that we want to
calculate the scattered fields in the G-frame at the space-time position (rG , tG ) . We
need to do the following steps:
1. Determine the space-time position (rT ,tT ) using the space-time transforms
given before, that is,

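$$
\begin{aligned}
\begin{bmatrix} ct_T \\ \mathbf{r}_T \end{bmatrix}
&= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} ct_C \\ \mathbf{r}_C \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} \gamma_0 & 0 & 0 & -\gamma_0\beta_0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\gamma_0\beta_0 & 0 & 0 & \gamma_0 \end{bmatrix} \cdot
\begin{bmatrix} ct_S \\ \mathbf{r}_S \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{mic} \end{bmatrix} \cdot
\begin{bmatrix} \gamma_0 & 0 & 0 & -\gamma_0\beta_0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\gamma_0\beta_0 & 0 & 0 & \gamma_0 \end{bmatrix} \cdot
\begin{bmatrix} 1 & 0 \\ 0 & \mathbf{R}_{ini} \end{bmatrix} \cdot
\begin{bmatrix} ct_G - ct_0 \\ \mathbf{r}_G - \mathbf{r}_0 \end{bmatrix}
\end{aligned}
\tag{11.229}
$$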

2. Calculate the scattered fields at (rT , tT ) in T-frame by using (11.227).



3. Transform EsT (rT , tT ) in the T-frame to EsG (rG , tG ) in the G-frame by the
inverse transforms for fields, that is,

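$$
\begin{aligned}
\mathbf{E}^s_G(\mathbf{r}_G, t_G)
&= \mathbf{R}_{ini}^T \cdot \mathbf{E}^s_S \\
&= \mathbf{R}_{ini}^T \cdot \left[ \mathbf{L} \cdot \mathbf{E}^s_C - \mathbf{K} \cdot c\mathbf{B}^s_C \right] \\
&= \mathbf{R}_{ini}^T \cdot \left[ \mathbf{L} \cdot (\mathbf{R}_{mic}^T \cdot \mathbf{E}^s_T) - \mathbf{K} \cdot (\mathbf{R}_{mic}^T \cdot c\mathbf{B}^s_T) \right]
\end{aligned}
\tag{11.230}
$$

where $\mathbf{R}_{ini}^T$ and $\mathbf{R}_{mic}^T$ are the transposes of $\mathbf{R}_{ini}$ and $\mathbf{R}_{mic}$, respectively,
while $\mathbf{L}$ and $\mathbf{K}$ are given in (11.215).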

Once the scattered waveform, that is, the time-domain response, is found, spectrum
analysis may follow by taking the Fourier transform of (11.230). If the
incident wave is a single pulse, the spectrum reflects the wideband frequency-domain
response, given as the RCS by

$$
\sigma(f_G) = \lim_{r_0 \to \infty} 4\pi r_0^2 \frac{\left| \mathbf{E}^s_G(f_G) \right|^2}{\left| \mathbf{E}^i_G(f_G) \right|^2}
\tag{11.231}
$$

where $r_0$ is the distance from the observer to the target at the instant $t_G = 0$, while
$\mathbf{E}^s_G(f_G)$ and $\mathbf{E}^i_G(f_G)$ are the Fourier transforms of $\mathbf{E}^s_G(t_G)$ and $\mathbf{E}^i_G(t_G)$,
respectively. If the incident wave consists of a group of pulses, the Doppler effects
caused by the superfast translation of the target as a whole may be estimated. The
Doppler spectrum at the carrier frequency may be computed by

$$
S_D(k \Delta f_D) = \left| \sum_{n=1}^{N} r_0 \mathbf{E}^s_G(n, f_c)\, e^{-j2\pi (k-1)(n-1)/N} \right|^2
\tag{11.232}
$$

where $N$ is the number of pulses, $\mathbf{E}^s_G(n, f_c)$ is the frequency-domain response for
the nth pulse, and $\Delta f_D = 1/(N T_{prf})$ is the Doppler resolution with $T_{prf}$ being the
repetition period. To ensure coherence, $T_{prf}$ must be an integer multiple of the
carrier period $T_c = 1/f_c$. One may estimate the
Doppler shift at any other frequency $f_a$ by replacing $f_c$ in (11.232) with $f_a$.
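The Doppler spectrum over a pulse train is a discrete Fourier transform across the pulse index. The sketch below evaluates it directly for a synthetic sequence whose phase advances by a known Doppler frequency from pulse to pulse, and locates the peak bin. The negative sign in the exponent, the synthetic data, and the names `doppler_spectrum` and `t_prf` are assumptions made for illustration.

```python
import cmath
import math

def doppler_spectrum(samples):
    """Squared magnitude of the DFT over the pulse index.

    Entry k (0-based) corresponds to the Doppler frequency k * delta_f_D,
    i.e. to (k - 1) in the 1-based indexing of the text.
    """
    n_pulses = len(samples)
    spectrum = []
    for k in range(n_pulses):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_pulses)
                  for n, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum

n_pulses = 32
t_prf = 1.0e-4                                  # hypothetical repetition period, s
delta_f_d = 1.0 / (n_pulses * t_prf)            # Doppler resolution
f_doppler = 3.0 * delta_f_d                     # place the tone exactly on bin 3

# Synthetic per-pulse responses: unit amplitude, phase advancing with the Doppler tone.
samples = [cmath.exp(2j * math.pi * f_doppler * n * t_prf) for n in range(n_pulses)]
spectrum = doppler_spectrum(samples)
peak_bin = max(range(n_pulses), key=spectrum.__getitem__)
```

For production use an FFT would replace the double loop; the direct sum is kept here so the correspondence with the summation is easy to see.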
In addition to the superfast translation, the target may rotate about an apparent
center; such micro-motions may produce observable micro-Doppler
effects. To estimate the micro-Doppler effects, we may use a total of $M$
narrowband pulses and use the RCS defined in (11.231) to calculate

$$
S_{mD}(k \Delta f_{mD}) = \left| \sum_{m=1}^{M} \sigma(m, f_G)\, e^{-j2\pi (k-1)(m-1)/M} \right|
\tag{11.233}
$$

where $\Delta f_{mD} = 1/(M T_{PRF})$ is the resolution of the micro-Doppler effects (in general,
$T_{PRF}$ differs from $T_{prf}$), and $\sigma(m, f_G)$ is the RCS found by using the mth pulse. Instead of
using the RCS, we may use the echo energy of each pulse to capture the
micro-Doppler frequencies. The mth normalized echo energy may be defined as

$$
E(m) = \sum_{k=1}^{K} \left| r_m \mathbf{E}^s_G(m, k \Delta t) \right|^2
\tag{11.234}
$$

where $K$ is the length of the recorded time sequence, and $r_m$ is the distance from the
observer to the target when the mth pulse is transmitted. Replacing $\sigma(m, f_G)$ in
(11.233) with $E(m)$ of (11.234), we can identify the micro-Doppler effects as well.

11.6 NUMERICAL IMPLEMENTATIONS

In this section, we provide numerical results and discussions
for 1-D, 2-D, and 3-D problems. The main purpose is to verify stability
and accuracy, which are the primary concerns for the TDIE methods.
In the following computations, unless otherwise indicated, the incident wave
is taken to be the modulated Gaussian impulse:

$$
\mathbf{E}^i(\mathbf{r}, t) = \hat{\mathbf{p}} E_0 \exp\left[ -\left( \frac{\tau - t_d}{\sqrt{2}\,\sigma} \right)^2 \right] \cos(2\pi f_c \tau), \qquad
\sigma = \frac{6}{2\pi f_{bw}}, \quad \tau = t - \hat{\mathbf{k}}^i \cdot \mathbf{r} / c
\tag{11.235}
$$

in which $\hat{\mathbf{p}}$ is the polarization direction, $\hat{\mathbf{k}}^i$ is the incidence direction, $f_c$ is the
carrier frequency or central frequency, $f_{bw}$ is a nominal bandwidth that controls
the effective duration of the impulse, and $t_d$ is a time delay that ensures the
incident wave has not reached the scatterer at the time $t = 0$. The Fourier transform,
or spectrum, of (11.235) is

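$$
\mathbf{E}^i(\mathbf{r}, f) = \hat{\mathbf{p}} E_0 \frac{1}{2} \left[ A(f - f_c) + A(f + f_c) \right] e^{-j\mathbf{k}^i \cdot \mathbf{r}}, \qquad
\mathbf{k}^i = \frac{2\pi f}{c} \hat{\mathbf{k}}^i
\tag{11.236}
$$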
with

A( f )  exp 2(π f )2  e j2πf td


 (11.237)

The impulse function (11.235) and the magnitude of its spectrum (11.236) are
shown in Figure 11.15 with $f_c = 450$ MHz and $f_{bw} = 300$ MHz.
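The incident pulse is easy to reproduce numerically. The following sketch samples the modulated Gaussian impulse at the origin (so that $\tau = t$) with $f_c = 450$ MHz and $f_{bw} = 300$ MHz, and evaluates the magnitude of its spectrum at three frequencies by a direct Fourier sum. The envelope form $\exp[-(\tau - t_d)^2 / (2\sigma^2)]$, the delay $t_d = 6\sigma$, the sampling step, and the function names are assumptions for illustration only.

```python
import cmath
import math

def modulated_gaussian(t, f_c, f_bw, t_d, e0=1.0):
    """Modulated Gaussian impulse observed at r = 0, where tau equals t."""
    sigma = 6.0 / (2.0 * math.pi * f_bw)
    envelope = math.exp(-((t - t_d) / (math.sqrt(2.0) * sigma)) ** 2)
    return e0 * envelope * math.cos(2.0 * math.pi * f_c * t)

def spectrum_magnitude(samples, dt, f):
    """Magnitude of the Fourier integral at frequency f, by a Riemann sum."""
    acc = sum(x * cmath.exp(-2j * math.pi * f * n * dt)
              for n, x in enumerate(samples))
    return abs(acc) * dt

f_c, f_bw = 450e6, 300e6
sigma = 6.0 / (2.0 * math.pi * f_bw)
t_d = 6.0 * sigma                     # assumed delay, long enough that E(0) is negligible
dt = 1.0 / (20.0 * f_c)               # 20 samples per carrier period
n_samples = int(2.0 * t_d / dt)

samples = [modulated_gaussian(n * dt, f_c, f_bw, t_d) for n in range(n_samples)]
mag_at_carrier = spectrum_magnitude(samples, dt, f_c)
mag_below = spectrum_magnitude(samples, dt, f_c - 0.5 * f_bw)
mag_above = spectrum_magnitude(samples, dt, f_c + 0.5 * f_bw)
```

Under this envelope the spectrum at the carrier is close to $\tfrac{1}{2}\sigma\sqrt{2\pi}$, and it drops by roughly $e^{-4.5}$ at $f_c \pm f_{bw}/2$, which gives a feel for why $f_{bw}$ is called a nominal bandwidth.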

(a) (b)
Figure 11.15 (a) A modulated Gaussian impulse and (b) its magnitude spectrum.

Once the MOT equations are solved and the expansion coefficients are
extracted, we can compute the fields at any space-time position. Usually, far-zone
field properties are most interesting, including the time-domain waveforms and
frequency responses or wideband radar cross-section (RCS). In the far zone, the
electric field is calculated by the transverse parts of the first term of (11.4) for the
PEC body, that is,
A 0 
Es (r , rˆ , )  rˆ  rˆ   rˆ  rˆ   J s (r' ,  rˆ  r' / c)dS' (11.238)
t 4πr t S

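where $\hat{\mathbf{r}} = \hat{\mathbf{k}}^s = \hat{\mathbf{x}} \sin\theta_s \cos\varphi_s + \hat{\mathbf{y}} \sin\theta_s \sin\varphi_s + \hat{\mathbf{z}} \cos\theta_s$ is the scattering direction,
and use has been made of $R \approx r - \hat{\mathbf{r}} \cdot \mathbf{r}'$ and $\tau = t - r/c$. By considering the
general expansion of currents by (11.96) and the basis function in (11.127), we can
write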
0 Nt N
Es (r , rˆ , )  rˆ  rˆ   I n ( j )U n ( j; rˆ , ) (11.239)
4πr j 1 n 1

where

Un ( j; rˆ , )  Fn ( j 1; rˆ , )  2Fn ( j; rˆ, )  Fn ( j  1; rˆ, ) (11.240)

Fn ( j; , rˆ )   fn (r' )  (  rˆ  r'  j )dS' (11.241)


n

The far-zone magnetic field is

1 1 0 Nt N
H s (r , rˆ , )  rˆ  Es (r , rˆ , )   rˆ   I n ( j )U n ( j; rˆ , ) (11.242)
0 0 4πr j 1 n 1

For a dielectric geometry, the far-zone electric and magnetic fields produced
by the equivalent electric currents, denoted by Ese (r , kˆ s , ) and Hse (r , kˆ s , ) , are in
the same forms as (11.239) and (11.242) but replacing I n ( j ) with I n(e) ( j ) (refer
to (11.107)). The far-zone magnetic and electric fields produced by the equivalent
magnetic currents, by the duality principle, are
0 Nt N
H ms (r , rˆ ,
) rˆ  rˆ   I n(m) ( j )U n ( j; rˆ , ) (11.243)
4πr j 1 
n 1

 N N t

Ems (r, kˆ s , )  r, rˆ , ) 0 0 rˆ   I n(m) ( j )U n ( j; rˆ , )


0rˆ  Hms ( (11.244)
4πr j 1 
n 1

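The total scattered far-zone fields are the sums, that is,

$$
\mathbf{E}^s = \mathbf{E}^s_e + \mathbf{E}^s_m, \qquad \mathbf{H}^s = \mathbf{H}^s_e + \mathbf{H}^s_m
\tag{11.245}
$$

The RCS is usually defined in the frequency domain by

$$
\sigma(f) = \lim_{r \to \infty} 4\pi r^2 \frac{\left| \mathbf{E}^s(f) \right|^2}{\left| \mathbf{E}^i(f) \right|^2}
\tag{11.246}
$$

where $\mathbf{E}^s(f)$ is the Fourier transform of $\mathbf{E}^s(r, \hat{\mathbf{r}}, \tau)$.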
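In practice the wideband RCS is obtained by dividing the spectra of the sampled scattered and incident waveforms. The sketch below does this at a single frequency with a direct Fourier sum and checks it on synthetic data: a scattered waveform that is a delayed copy of the incident pulse scaled by $a/r$, for which the RCS should equal $4\pi a^2$. The waveform parameters, the scale factor `a`, and the function names are hypothetical.

```python
import cmath
import math

def dft_at(samples, dt, f):
    """Fourier integral of a sampled waveform at frequency f (Riemann sum)."""
    return dt * sum(x * cmath.exp(-2j * math.pi * f * n * dt)
                    for n, x in enumerate(samples))

def rcs(e_scat, e_inc, dt, f, r):
    """RCS estimate 4*pi*r^2 * |Es(f)|^2 / |Ei(f)|^2 at a finite far-zone range r."""
    return (4.0 * math.pi * r ** 2
            * abs(dft_at(e_scat, dt, f)) ** 2 / abs(dft_at(e_inc, dt, f)) ** 2)

# Synthetic incident pulse (hypothetical parameters).
dt, n_samples = 5.0e-11, 2000
f_c, pulse_width, t_d = 450e6, 3.0e-9, 20.0e-9
e_inc = [math.exp(-((n * dt - t_d) / (math.sqrt(2.0) * pulse_width)) ** 2)
         * math.cos(2.0 * math.pi * f_c * n * dt) for n in range(n_samples)]

# Synthetic echo: incident pulse delayed by 400 samples and scaled by a / r.
r, a, delay = 1000.0, 0.5, 400
e_scat = [0.0] * delay + [a / r * x for x in e_inc[:n_samples - delay]]

rcs_at_carrier = rcs(e_scat, e_inc, dt, f_c, r)
```

The delay only changes the phase of the scattered spectrum, so `rcs_at_carrier` is independent of it; in a real computation the limit in the definition is approximated by choosing `r` well inside the far zone.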

11.6.1 Numerical Examples for Wire Problems

The first example is scattering by a dipole antenna (see the inset of
Figure 11.16). The length of the dipole antenna is 1 m, and its diameter is 1 cm.
The incident wave in this example is an unmodulated Gaussian impulse:
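$$
\mathbf{E}^i(\mathbf{r}, t) = \hat{\mathbf{p}} \frac{4}{T\sqrt{\pi}} \exp\left\{ -\left[ \frac{4}{T} \left( ct - ct_0 - \hat{\mathbf{k}}^i \cdot \mathbf{r} \right) \right]^2 \right\}
\tag{11.247}
$$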

with $\hat{\mathbf{p}} = \hat{\mathbf{z}}$, $\hat{\mathbf{k}}^i = \hat{\mathbf{x}}$, $T = 4$, and $ct_0 = 6$. The current at the feeding point under
the short-circuit condition is depicted in Figure 11.16(a). A comparison with the inverse
discrete Fourier transform (iDFT) solution is also provided in the same figure;
that solution solves the problem by the method of moments (MoM) at 128
frequency points from 4 MHz to 450 MHz and then transforms the data from the
frequency domain to the time domain. The magnitude of the current up to 200 LM
(LM = light meter: the time that light takes to travel 1 m in vacuum) is plotted in
Figure 11.16(b). It is seen from the figure that the current decays at an exponential
rate, showing that the present method is stable and accurate.

(a) (b)
Figure 11.16 Scattering of a dipole antenna by a Gaussian impulse: (a) the short circuit current at the
feeding point and (b) the magnitude of current at the feeding point.

(a) (b)

(c)
Figure 11.17 Radiation of a V-shape dipole antenna fed by a Gaussian voltage: (a) the input current;
(b) the magnitude of input current; and (c) the far-zone radiation field.
The second example is a radiation problem for a V-shape antenna (see the
inset of Figure 11.17). Each arm is 1 m long and 1 cm in diameter. The angle between
the two arms is 45°. A Gaussian source voltage
$$
V(t) = \frac{4}{T\sqrt{\pi}} \exp\left\{ -\left[ \frac{4}{T} \left( ct - ct_0 \right) \right]^2 \right\}
\tag{11.248}
$$
is applied at the vertex (feeding point), which enters (11.58) to yield
$V_m(i) = V(i \Delta t)$ if $m = (N + 1)/2$ and $V_m(i) = 0$ otherwise. The input current at the
feeding point for the time from 0 to 50 LM is displayed in Figure 11.17(a), and its
late-time behavior up to 300 LM (6,000 time steps) is shown in Figure 11.17(b). It
can be seen from the figure that the input current decays exponentially but at
a lower rate than for the straight dipole antenna in Figure 11.16(b). The
far-zone radiation field is displayed in Figure 11.17(c).

(a) (b)

(c)
Figure 11.18 Radiation of a helical monopolar antenna fed by a Gaussian voltage: (a) the input current;
(b) the magnitude of input current; and (c) the far-zone radiation field.

The third example is also a radiation problem, for a helical antenna (see the
inset of Figure 11.18), which has six turns and a pitch angle of 14°. The length
is 1.24 m, and the antenna is divided into 12 segments. The Gaussian source voltage in (11.248) is
applied at the vertex (feeding point). The current at the vertex is depicted in Figure
11.18(a), and its late-time behavior until 1,000 LM (4,000 time steps) is shown in
Figure 11.18(b). The far-zone field is displayed in Figure 11.18(c).
As demonstrated by the three examples above, the TDIE solutions decay at
exponential rates as long as the matrix elements are
evaluated precisely, where precise evaluation means closed-form expressions for
self-interacting terms and Gaussian quadrature with at least four points for
nonself-interacting terms. If the one-point rule is used, the MOT solution
eventually diverges.

11.6.2 Numerical Examples for the 2-D Structures

For 2-D problems, the first example is a PEC circular cylinder with a radius of 1 m
(see the inset of Figure 11.19). The incident wave is the modulated Gaussian
impulse (11.235) with $\hat{\mathbf{k}}^i = \hat{\mathbf{x}}$, $E_0 = 120\pi$, and $f_c = f_{bw} = 300$ MHz. For TM
wave incidence, $\hat{\mathbf{p}} = \hat{\mathbf{z}}$, and for TE wave incidence, $\hat{\mathbf{p}} = \hat{\mathbf{y}}$. The induced current
at the point (1, 0) is shown in Figure 11.19(a) for TM polarization. The
decay of the current magnitude is shown in Figure 11.19(b). Comparisons
using EFIE, MFIE, and CFIE are given, as well as the iDFT solution, which solves
the problem by using the MoM at 256 frequency points from 150 MHz to 450
MHz and then converts the data from the frequency domain to the time domain. We repeat
the procedure with the incident wave changed to TE polarization.
The decay of the current magnitude at the same point is displayed
in Figure 11.19(c). It is seen from the figures that CFIE gives much more accurate
results than EFIE and MFIE.
The second example is a double strip structure (see the inset of Figure 11.20).
The width of the strips is 1m, and they are separated by 0.5m. The incident wave
is the same as the previous example, except for the incident direction for the TE
case, which is changed to kˆ i  (xˆ  yˆ ) / 2 such that pˆ  (xˆ  yˆ ) / 2 . For the
TM case, the magnitudes of induced currents at the point (0, 0.25) is shown in
Figure 11.20(a), and the RCS at 450 MHz is shown in Figure 11.20(b). For TE
case, the magnitude of induced currents at the same point is shown in Figure
11.21(a), and the RCS at 150 MHz is shown in Figure 11.21(b). It can be seen
from the graphs that the magnitudes attenuate exponentially first and level off
eventually.
To verify the convergence property for the narrowband case, we repeat the
computations in Figure 11.22(a) with the structure enlarged 10 times and the
nominal bandwidth narrowed to 3 MHz (the effective band is from 298.5 MHz to
301.5 MHz). The result is illustrated in Figure 11.22(a); the current levels off at around
$10^{-8}$ A/m. A close-up of a portion of the waveform is plotted in Figure 11.22(b).
Good convergence and accuracy are achieved.

(a) (b)

(c)
Figure 11.19 Scattering of a modulated Gaussian impulse by a conducting cylinder: (a) the induced
current at point (1, 0) for TM case; (b) the magnitude of currents for TM case; and (c)
the magnitude of currents for TE case. ©IEEE 2014 [28].

(a) (b)
Figure 11.20 Scattering of a modulated Gaussian impulse by a double strip structure for TM
polarization: (a) the magnitude of currents at point (0, 0.25) and (b) the bistatic RCS at
450 MHz. ©IEEE 2014 [28].

(a) (b)
Figure 11.21 Scattering of a modulated Gaussian impulse by a double strip structure for TE
polarization: (a) the magnitude of currents at point (0, 0.25) and (b) the bistatic RCS at
150 MHz. ©IEEE 2014 [28].

(a) (b)
Figure 11.22 Scattering by a large-size structure to verify the convergent performance: (a) the time
domain waveform and (b) a zoom-in look at a part of the waveform.

The above results demonstrate that the present TDIE methods for 2-D
structures are stable and accurate.

11.6.3 Numerical Examples for the 3-D Geometries

For 3-D geometries, the first example is a square PEC plate with a side length of 1 m,
lying in the x-y plane. It is discretized by using 1,825 RWG basis functions. The
incident wave is the modulated Gaussian impulse with $f_c = 600$ MHz and
$f_{bw} = 1.2$ GHz. The induced current waveform at the plate center is shown in
Figure 11.23(a) from 0 to 10 LM (400 time steps). The decay of the
magnitude is shown in Figure 11.23(b) up to 4,000 time steps (100 LM); it
levels off below $10^{-15}$ A/m. The backscattered field is plotted in Figure 11.23(c).
Using the scattered field data, the monostatic RCS from 0 to 1.2 GHz can be found as
shown in Figure 11.23(d), where comparisons using different time step sizes
and different numbers of spatial unknowns, as well as the MoM solutions, are
illustrated. Good stability and accuracy are achieved.

(a) (b)

(c) (d)
Figure 11.23 Backscattering by square PEC plate with side size 1m: (a) the current waveform at the
plate center, (b) the magnitude of the currents at the center, (c) the backscattering field,
and (d) the backscattering wideband RCS.

The next example is a PEC cube with an edge length of 0.5 m. It is
discretized by using 2,178 RWG basis functions. The parameters of the incident
wave are the same as in the previous example. The induced current waveform at the
center of the top surface is shown in Figure 11.24(a) from 0 to 10 LM (400 time
steps). The decay of the magnitude is shown in Figure 11.24(b)
up to 4,000 time steps (100 LM); it levels off at about $10^{-16}$ A/m. The monostatic
RCS from 0 to 1.2 GHz is shown in Figure 11.24(c), and the bistatic
RCS at 900 MHz is shown in Figure 11.24(d), where a comparison with the MoM
solution and measured data [34] is provided. Again, it is observed from Figure
11.24 that the results are stable and accurate.

(a) (b)

(c) (d)
Figure 11.24 Backscattering by a PEC cube with an edge size of 0.5m: (a) the current waveform at the
center of the top side, (b) the magnitude of the currents at the center point, (c) the
monostatic RCS, and (d) the bistatic RCS at 900 MHz.

(a) (b)
Figure 11.25 Scattering by a NASA almond: (a) the current waveform at the point P and (b) the
bistatic RCS at 1.57 GHz.

(a) (b)

(c) (d)
Figure 11.26 Scattering by a dielectric sphere: (a) the equivalent electric current at the polar point; (b) the
equivalent magnetic current at the polar point; (c) the scattered far-zone field; and (d)
wideband backscattering RCS.

The third example is the NASA almond [35]. We use 2,031 curved RWG
basis functions to discretize the surface of the almond. The carrier frequency and
nominal bandwidth of the incident modulated Gaussian impulse are $f_c = 1.57$ GHz
and $f_{bw} = 3.14$ GHz, respectively. The induced currents at a point are shown in
Figure 11.25(a), and the bistatic RCS at 1.57 GHz is shown in Figure 11.25(b);
these are in good agreement with the measured data.
The last example is a dielectric sphere [see the inset of Figure 11.26(a)], for
which an analytical solution is available. The diameter of the sphere is 0.5 m, and
the relative permittivity is $\varepsilon_r = 4.0$. It is modeled by 6,672 curved RWG basis
functions. The carrier frequency and bandwidth are $f_c = 500$ MHz and $f_{bw} = 1.0$
GHz, respectively. The x-component of the equivalent electric currents and the
y-component of the equivalent magnetic currents at the point (0, 0, 0.25) are plotted
in Figures 11.26(a) and 11.26(b), respectively. The backscattered far-zone field
and RCS are displayed in Figures 11.26(c) and 11.26(d), respectively. For the sake
of comparison, the analytical Mie series solution is provided in the same figure,
and the two are observed to be in good agreement.

11.6.4 Numerical Examples for Moving Objects

The first example is a PEC sphere [see the inset of Figure 11.27(b)] with a radius
of a  1 m. It is located on the z-axis at z0  107 meters at the initial instant, and
moves along the z-axis at an exaggerated speed $v_0 = 3 \times 10^7$ m/s. The incident wave
is an elementary Gaussian pulse:

    t0  2  z  z0
EiG ( )  xˆ exp     ,  t c (11.249)
  2  

where $\sigma$ is taken to be $0.45 \times 10^{-8}$ seconds and $t_0 = 5\sigma$. The backscattered
time-domain waveform of the PEC sphere and its Fourier transform are shown in Figures
11.27(a) and 11.27(b), respectively. It is seen from the figure that the spectrum is
weakened and shifted to the lower band, which is reasonable as the sphere is
moving away from the observer.
[Figure 11.27 plots: (a) Ex (V/m) versus time (s, ×10⁻⁸) for v = 0 and v = 3e7; (b) spectrum of Ex (V·s/m) versus frequency (Hz, ×10⁸) for v = 0 and v = 3e7.]

(a) (b)
Figure 11.27 The time-domain and frequency-domain responses of a moving PEC sphere: (a) the echo
waveform and (b) the spectrum of the echo.

The second example is a missile model that consists of a cylinder and a half
sphere (see the inset of Figure 11.28). The height and diameter of the cylinder are
1.5 m and 1 m, respectively. It flies at $v = 1$ km/s and in the meantime seesaws
about the track axis at 1 kHz. A very narrowband modulated Gaussian impulse
with carrier frequency $f_c = 200$ MHz and nominal bandwidth $f_{bw} = 1$ MHz is
incident upon it at an elevation angle of 45°. The time-domain waveform of the
backscattered far-zone field looks somewhat like Figure 11.22. Taking the Fourier
transform of the waveform, we obtain its spectrum as shown in Figure
11.28. The main peak reflects the Doppler effect of the target as a whole
moving at 1 km/s (the theoretical value is $f_D = 2 (v/c) \cos 45^\circ f_0 = 0.941$ kHz). The
minor peaks, separated by 1 kHz, reflect the micro-Doppler effects due to the
seesawing motion.
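The theoretical Doppler shift quoted for this example can be reproduced with a one-line formula. The sketch below evaluates $f_D = 2 (v/c) \cos\theta \, f_0$ for $v = 1$ km/s, $\theta = 45°$, and $f_0 = 200$ MHz, and lists where the first micro-Doppler sidebands would fall for a 1 kHz seesawing rate. The function name and the use of the exact vacuum speed of light are choices made for this illustration.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def doppler_shift(speed, f0, angle_deg):
    """Two-way (monostatic) Doppler shift for a target moving at `speed`,
    with `angle_deg` between the velocity and the line of sight."""
    return 2.0 * speed / C0 * math.cos(math.radians(angle_deg)) * f0

f_d = doppler_shift(1.0e3, 200e6, 45.0)

# Micro-Doppler sidebands around the main Doppler peak for a 1 kHz seesawing motion.
f_micro = 1.0e3
sidebands = [f_d + n * f_micro for n in (-2, -1, 1, 2)]
```

The result is about 0.94 kHz, in line with the value quoted in the text; small differences in the last digit depend on the value of $c$ and on rounding.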

[Figure 11.28 plot: spectrum (V·s/m, ×10⁻⁹) versus f − f0 (kHz).]

Figure 11.28 The Doppler and micro-Doppler effects of a model missile that moves at 1 km/s and
vibrates at 1 kHz about its track axis.

The last example is a moving cone, or warhead model. The diameter of its
base is 42 cm, and its height is 145 cm. It flies horizontally at a speed of $v = 3.4$ km/s,
and in the meantime rotates in coning at $f_{coning} = 2$ Hz and in nutation at
$f_{nutation} = 3.3$ Hz. Both the precession angle and the maximum nutation angle are
set to $\theta_p = \theta_m = 10^\circ$ [refer to (11.212)]. A train of modulated Gaussian pulses
is transmitted to the target from a ground station (G-frame), as illustrated in Figure
11.29. Each incident pulse has the waveform shown in Figure 11.15, the pulses are
spaced by $T_{PRT} = 0.02$ second, and a total of 128 pulses are used. Each pulse is
first transformed from the G-frame to the T-frame, and then the scattered pulse is
calculated in the T-frame, which is finally transformed back to the G-frame and
recorded as a vector echo:
$$\mathbf{S}(n, t_G) = r_n \, \mathbf{E}_G^s(n, t_G + 2 r_n / c) \tag{11.250}$$

where r_n is the distance from the station to the target when the nth pulse is
transmitted. Three echoes (only HH-polarization components) are illustrated in
Figure 11.29. The normalized energy of each echo may be calculated by

$$E(n) = \sum_{k=1}^{K} \left| \mathbf{S}(n, k\Delta t) \right|^2 \tag{11.251}$$

where K is the number of samples of each recorded echo. This energy sequence is
a measure of the time-varying scattering capability of the target (amounting to the
scattering cross-section); it is plotted in Figure 11.30(a), and its Fourier
transform is shown in Figure 11.30(b). The peaks are located at
f_peak = m · f_coning + n · f_nutation, where m and n are a pair of integers. For example, the
first four peaks correspond to (m, n) = (2, −1), (m, n) = (−1, 1), (m, n) = (1, 0), and
(m, n) = (0, 1), respectively, that is, to 0.7 Hz, 1.3 Hz, 2 Hz, and 3.3 Hz. It might be
possible to identify the micro-motion characteristics by using a group of peak positions.
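
The peak-position rule can be illustrated with a short Python sketch that enumerates the integer combinations of the coning and nutation frequencies given in the text (2 Hz and 3.3 Hz). The function name, the cutoff |m| + |n| ≤ 3 on the combination order, and the 6 Hz upper limit (taken from the frequency axis of Figure 11.30(b)) are assumptions introduced here, not part of the original analysis.

```python
def micro_doppler_peaks(f_coning, f_nutation, max_order=3, f_max=6.0):
    """List (frequency, (m, n)) for positive lines f = m*f_coning + n*f_nutation."""
    peaks = []
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            if abs(m) + abs(n) > max_order:
                continue  # assumed cutoff: keep low-order combinations only
            f = round(m * f_coning + n * f_nutation, 6)
            if 0.0 < f <= f_max:
                peaks.append((f, (m, n)))
    return sorted(peaks)

# Coning at 2 Hz and nutation at 3.3 Hz, as in the cone example.
peaks = micro_doppler_peaks(2.0, 3.3)
first_four = [mn for _, mn in peaks[:4]]
for freq, (m, n) in peaks:
    print(f"{freq:.1f} Hz  (m, n) = ({m}, {n})")
```

Under these assumptions the four lowest lines are the (m, n) pairs listed in the text; a different order cutoff would add higher-order combinations between them.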

Figure 11.29 Illustration of pulse responses from a flying cone target that rotates in coning and
nutation at the same time.

[Figure 11.30 plots: (a) echo energy versus time (s) and (b) spectrum of the echo energy versus frequency (Hz).]
Figure 11.30 The echo energy and its spectrum for 128 pulses repeated every 0.02 second: (a) the
normalized echo energy and (b) the spectrum of echo energy. The target flies at 3 km/s
and rotates in coning at 2 Hz and in nutation at 3.3 Hz; both the precession angle and
maximum nutation angle are 10°.

11.7 SUMMARY

In this chapter, a pedestrian description of the TDIE-MOT solvers for analyses of
transient scattering and radiation problems was presented. Numerical experiments
show that the present approaches are stable and accurate for 1-D, 2-D, and 3-D
structures. The advantages of the TDIE method include the following. (1) It is a
time-domain method that is preferred for transient and wideband problems, and the
time-evolving physical process may be observed during the simulation. (2) It is an
integral equation approach that naturally meets the radiation boundary condition
and is preferred for open-region problems such as radiation and scattering; an
artificial ABC is not needed. (3) It employs surface meshing so that the number of
unknowns is relatively small; moreover, the system matrices are sparse if the
temporal basis functions are compact. (4) Fast algorithms may be incorporated by
referring to their frequency-domain counterparts, such as going from the multilevel fast
multipole algorithm (MLFMA) to the multilevel plane wave time domain (ML-PWTD)
algorithm; almost every fast algorithm is based on integral equations.
Because of limited space, this chapter was largely designed for scattering
problems involving PEC bodies or nondispersive dielectrics, for which the TDIE approach
could be superior to any other method. Handling dispersive, inhomogeneous,
and lossy scatterers with the TDIE schemes is not complex and has been
studied by many authors [36–38]. This chapter also concentrated on the
marching-on-in-time (MOT) scheme. Marching-on-in-degree (MOD) and temporal Galerkin
matching methods [39, 40] were not discussed as they are a little more expensive
in memory and CPU time than the MOT solver. Simulations of
microwave circuits, components, and devices using the TDIE methods have
been investigated by many authors as well [41, 42], but are not included in this
chapter. Also absent from this chapter, due to limited length, are fast algorithms
[43, 44], which could be very useful for analyzing transient scattering by
large structures.

REFERENCES

[1] C. Bennett, “A Technique for Computing Approximate Impulse Response for Conducting
Bodies,” Electrical Engineering, West Lafayette, IN: Purdue University, 1968.
[2] C. Bennett and W. Weeks, “Electromagnetic Pulse Response of Cylindrical Scatters,” IEEE
G-AP International Symposium, Northeastern University, pp. 176–183, 1968.
[3] C. Bennett, “Transient Scattering from Conducting Cylinders,” IEEE Trans. Antennas Propagat.,
Vol. 18, pp. 627–633, 1970.
[4] E. Sayre and R. Harrington, “Time Domain Radiation and Scattering by Thin Wires,” App. Sci.
Res., Vol. 26, pp. 413–444, 1972.
[5] T. Lui and K. Mei, “A Time Domain Integral Equation Solution for Linear Antenna and
Scatterers,” Radio Sci., Vol. 8, pp. 797–804, 1973.
[6] E. Miller, J. Poggio, and G. Burke, “An Integro-Differential Equation Technique for the Time
Domain Analysis of Thin-Wire Structures, I. The Numerical Method,” J. Comput. Phys., Vol. 12,
No. 1, pp. 24–48, 1973.
[7] R. Mittra, “Integral Equation Methods for Transient Scattering,” Transient Electromagnetic
Fields, edited by L. B. Felsen, New York: Springer-Verlag, pp. 83–138, 1976.
[8] A. Tijhuis, “Toward a Stable Marching-on-in-Time Method for Two Dimensional Electro-
Magnetic Scattering Problems,” Radio Sci., Vol. 19, pp. 1311–1317, 1984.
[9] B. Rynne, “Instability in Time Marching Methods for Scattering Problems,” Electromagnetics,
Vol. 6, pp. 129–144, 1986.
[10] P. Smith, “Instability in Time Marching Methods for Scattering: Cause and Rectification,”
Electromagnetics, Vol. 10, pp. 439–451, 1990.
[11] D. Vechinski and S. Rao, “A Stable Procedure to Calculate the Transient Scattering by
Conducting Surfaces of Arbitrary Shape,” IEEE Trans. Antennas Propagat., Vol. 40, pp.
661–665, 1992.
[12] A. Sadigh and E. Arvas, “Treating the Instabilities in Marching-on-in-Time Method from a
Different Perspective,” IEEE Trans. Antennas Propagat., Vol. 41, pp. 1695–1702, 1993.
[13] P. Davies, “A Stability Analysis of a Time Marching Scheme for the General Surface Electric
Field Integral Equation,” Applied Numerical Mathematics, Vol. 27, pp. 33–57, 1994.
[14] P. Davies and D. Duncan, “Averaging Techniques for Time-Marching Schemes for Retarded
Potential Integral Equations,” App. Numer. Math., Vol. 23, pp. 291–310, 1997.
[15] E. Miller, “A Selective Survey of Computational Electromagnetics,” IEEE Trans. Antennas
Propagat., Vol. 30, pp. 29, 1988.
[16] S. Rao, T. Sarkar, and M. Bluck, “Time-Domain Modeling of Two-Dimensional Conducting
Cylinders Utilizing an Implicit Scheme-TM Incidence,” Microwave Opt. Technol. Lett., Vol. 15,
pp. 342–347, 1997.
[17] Y. Shifman and Y. Leviatan, “On the Use of Spatiotemporal Multiresolution Analysis in Method
of Moments Solutions for the Time-Domain Integral Equation,” IEEE Trans. Antennas Propagat., Vol.
49, pp. 1123–1129, 2001.
[18] A. Ergin, B. Shanker, and E. Michielssen, “Fast Evaluation of Three Dimensional Transient
Wave Fields Using Diagonal Translation Operators,” J. Comput. Phys., Vol. 146, pp. 157–180,
1998.
[19] J. Hu and C. Chan, “Improved Temporal Basis Functions for Time Domain Electric Field
Integral Equation Method,” Electronics Letters, Vol. 35, No. 11, pp. 883–885, 1999.
[20] D. Weile, G. Pisharody, N. Chen, B. Shanker, and E. Michielssen, “A Novel Scheme for the
Solution of the Time-Domain Integral Equations of Electromagnetics,” IEEE Trans. Antennas
Propagat., Vol. 52, pp. 283–295, 2004.
[21] H. Bagci, A. Yilmaz, V. Lomakin, and E. Michielssen, “Fast Solution of Mixed-Potential Time-
Domain Integral Equations for Half-Space Environments,” IEEE Trans. Geosci. Remote Sensing,
Vol. 43, pp. 269–279, 2005.
[22] M. Xia, G. Zhang, G. Dai, and C. Chan, “Stable Solution of Time Domain Integral Equation
Methods Using Quadratic B-Spline Basis Functions,” Journal of Computational Mathematics,
Vol. 25, pp. 374–384, 2007.
[23] Y. Chung, T. Sarkar, B. Jung, M. Salazar-Palma, J. Zhong, J. Seongman, and K. Kyungjung,
“Solution of Time Domain Integral Equation Using the Laguerre Polynomials,” IEEE Trans.
Antennas Propagat., Vol. 52, pp. 2319–2328, 2004.
[24] M. Lu and E. Michielssen, “Closed Form Evaluation of Time Domain Fields due to Rao-Wilton-
Glisson Sources for Use in Marching-on-in-Time Based EFIE Solvers,” IEEE APS Int. Symp.
Dig., pp. 74–77, 2002.
[25] B. Zubik-Kowal and P. Davies, “Numerical Approximation of Time Domain Electromagnetic
Scattering from a Thin Wire,” Numerical Algorithms, Vol. 30, pp. 25–35, 2002.
[26] G. Zhang, M. Xia, and X. Jiang, “Transient Analysis of Wire Structures Using Time Domain
Integral Equation Method with Exact Elements,” Progress in Electromagnetics Research, Vol.
92, pp. 281–298, 2009.
[27] M. Lu, K. Yegin, B. Shanker, and E. Michielssen, “Fast Time Domain Integral Equation Solvers
for Analyzing Two-Dimensional Scattering Phenomena; Part I: Temporal Acceleration,”
Electromagnetics, Vol. 24, pp. 425–449, 2004.
[28] X. Guo, M. Xia, and C. Chan, “Stable TDIE-MOT Solver for Transient Scattering by Two-
Dimensional Conducting Structures,” IEEE Trans. Antennas Propagat., Vol. 62, pp. 2149–2157,
2014.
[29] B. Shanker, M. Lu, J. Yuan, and E. Michielssen, “Time Domain Integral Equation Analysis of
Scattering from Composite Bodies via Exact Evaluation of Radiation Fields,” IEEE Trans.
Antennas Propagat., Vol. 57, No. 5, pp. 1506–1520, 2009.
[30] Y. Shi, M. Xia, R. Chen, E. Michielssen, and M. Lu, “Stable Electric Field TDIE Solvers via
Quasi-Exact Evaluation of MOT Matrix Elements,” IEEE Trans. Antennas Propagat., Vol. 59,
pp. 574–584, 2011.
[31] B. Kolundzija and A. Djordjevic, Electromagnetic Modeling of Composite Metallic and
Dielectric Structures, Norwood, MA: Artech House, pp. 170–171, 2002.
[32] S. Rao, D. Wilton, and A. Glisson, “Electromagnetic Scattering by Surfaces of Arbitrary Shape,”
IEEE Trans. Antennas Propagat., Vol. 30, pp. 409–418, 1982.
[33] P. Arcioni, M. Bressan, and L. Perregrini, “On the Evaluation of the Double Surface Integrals
Arising in the Application of the Boundary Integral Method to 3-D Problems,” IEEE Trans.
Microwave Theory Tech., Vol. 45, pp. 436–438, 1997.
[34] M. Cote, M. Woodworth, and A. Yaghjian, “Scattering from the Perfectly Conducting Cube,”
IEEE Trans. Antennas Propagat., Vol. 36, pp. 1321–1329, 1988.
[35] J. Volakis, A. Woo, H. Wang, M. Schuh, and M. Sanders, “Benchmark Radar Targets for the
Validation of Computational Electromagnetics Problems,” IEEE Antennas and Propagation
Magazine, Vol. 35, pp. 84–89, 1993.
[36] G. Kobidze, J. Guo, B. Shanker, and E. Michielssen, “A Fast Time Domain Integral Equation
Based Scheme for Analyzing Scattering from Dispersive Objects,” IEEE Trans. Antennas
Propagat., Vol. 53, pp. 1215–1226, 2005.
[37] N. Gres, A. Ergin, B. Shanker, and E. Michielssen, “Volume Integral Equation Based Analysis of
Transient Electromagnetic Scattering from Three-Dimensional Inhomogeneous Dielectric
Objects,” Radio Sci., Vol. 36, No. 3, pp. 379–386, 2001.
[38] P. Jiang and E. Michielssen, “Temporal Acceleration of Time-Domain Integral Equation Solvers
for Electromagnetic Scattering from Objects Residing in Lossy Media,” Microwave and Optical
Technology Letters, Vol. 44, pp. 223–230, 2005.
[39] B. Jung, J. Zhong, T. Sarkar, and M. Salazar-Palma, “A Comparison of Marching-on in Time
Method with Marching-on in Degree Method for the TDIE Solver,” Progress In
Electromagnetics Research, Vol. 70, pp. 281–296, 2007.
[40] Y. Beghein, K. Cools, H. Bagci, and D. Zutter, “A Space-Time Mixed Galerkin Marching-on-in-
Time Scheme for the Time-Domain Combined Field Integral Equation,” IEEE Trans. Antennas
Propagat., Vol. 61, pp. 1228–1238, 2013.
[41] K. Aygün, B. Fischer, J. Meng, B. Shanker, and E. Michielssen, “A Fast Hybrid Field-Circuit
Simulator for Transient Analysis of Microwave Circuits,” IEEE Trans. Microwave Theory and
Techniques, Vol. 52, pp. 573–583, 2004.
[42] H. Bagci, F. Andriulli, F. Vipiana, G. Vecchi, and E. Michielssen, “A Well-Conditioned Integral-
Equation Formulation for Efficient Transient Analysis of Electrically Small Microelectronic
Devices,” IEEE Trans. Advanced Packaging, Vol. 33, pp. 468–480, 2010.
[43] B. Shanker, A. Ergin, M. Lu, and E. Michielssen, “Fast Analysis of Transient Electromagnetic
Scattering Phenomena Using the Multilevel Plane Wave Time Domain Algorithm,” IEEE Trans.
Antennas Propagat., Vol. 51, pp. 628–641, 2003.
[44] A. Yilmaz, D. Weile, E. Michielssen, and J. Jin, “A Hierarchical FFT Algorithm (HIL-FFT) for
the Fast Analysis of Transient Electromagnetic Scattering Phenomena,” IEEE Trans. Antennas
Propagat., Vol. 51, pp. 971–982, 2002.
Chapter 12
Statistical Methods and Computational
Electromagnetics Applied to Human Exposure
Assessment
Joe Wiart

12.1 INTRODUCTION

The previous chapters have presented advanced computational electromagnetic
methods and various applications of these methods. This chapter reviews recent
trends in numerical methods used to assess human radio frequency (RF) exposure.
The progress in computational electromagnetics and the
increasing use of wireless communication systems have led to frequent use of the
FDTD method to design wireless system antennas and analyze the local and whole-
body averaged specific absorption rate (SAR) induced by the EMFs emitted by
these wireless systems.
This chapter discusses exposure modeling methods and presents case studies
that show the ability of the FDTD method to assess the human exposure induced by
RF sources. It presents case studies in the near field and at larger
distances using the equivalence principle and spherical mode expansion of the RF
sources.
To respond to the increasing and versatile use of wireless systems, many
efforts have been made to build realistic human phantoms, including child
models, at millimeter resolution. This research has shown a large variability of
the exposure, which is influenced by
variable morphologies and postures, versatile source locations, and RF bands. The
characterization of such variability can require a large number of simulations, but
despite the progress toward high-performance computing, the FDTD method is not
compatible with the Monte Carlo method in terms of computation time.
This chapter discusses the use of surrogate models to characterize the
statistical variations of outputs induced by the variation of inputs. It presents case
studies that indicate the potential of statistical methods, such as the generalized
polynomial chaos expansion (GPCE), which can be used to build these surrogate models
with a parsimonious number of FDTD simulations.

12.2 EXPOSURE ASSESSMENT USING FDTD AND THE CHALLENGE OF VARIABILITY

12.2.1 Present Exposure Assessment Using FDTD

The human exposure to RF EMF is defined as the ratio of the electromagnetic
power absorbed by human tissues to the mass of these tissues. Locally, the
exposure is quantified by using the SAR:

$$\mathrm{SAR} = \frac{1}{2} \frac{\sigma |E|^2}{\rho} \tag{12.1}$$

In this formulation, σ, ρ, and E represent, respectively, the conductivity of the body
tissue (S/m), the mass density of the tissue (kg/m³), and the average electric field
strength in the tissue (V/m). Considerable effort has been made during the past two
decades to develop experimental methods and equipment for SAR
assessment.
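
To make the definition concrete, the following Python sketch evaluates (12.1) for a single point and then forms a mass-averaged SAR as total absorbed power divided by total mass, in line with the definition of exposure given above. The function names, the voxel tuple layout, and all numerical values are hypothetical and chosen only for illustration; they are not tissue data from the chapter.

```python
def local_sar(sigma, rho, e_field):
    """Local SAR in W/kg from (12.1): 0.5 * sigma * |E|^2 / rho."""
    return 0.5 * sigma * abs(e_field) ** 2 / rho

def mass_averaged_sar(voxels):
    """Total absorbed power divided by total mass.

    Each voxel is (sigma [S/m], rho [kg/m^3], |E| [V/m], volume [m^3]).
    """
    power = sum(0.5 * s * abs(e) ** 2 * v for s, _, e, v in voxels)
    mass = sum(r * v for _, r, _, v in voxels)
    return power / mass

# Illustrative values only (not taken from the text).
sar_point = local_sar(1.0, 1000.0, 10.0)
voxels = [(1.0, 1000.0, 10.0, 1e-9), (0.5, 1000.0, 20.0, 1e-9)]
sar_avg = mass_averaged_sar(voxels)
print(sar_point, sar_avg)
```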
The main advantage of measurement is the capability to assess the field,
and therefore the SAR, induced by an existing device. The main limitation is linked to
the invasiveness of the approach. To bypass this limitation, worst-case scenarios
involving homogeneous equivalent human tissues have been defined, in particular
for standardization and compliance tests. However, numerical methods have
clear advantages for assessing the exposure in specific tissues. The numerical
assessment of the exposure, known as numerical dosimetry, is a noninvasive and
efficient way to estimate tissue exposure.
The complexity of wireless systems and the strong heterogeneity of human
tissues lead to very large problems from the computational point of
view. For instance, assessing the exposure of a human phantom model such as
the visible human body model [1] with a millimeter resolution can require
a large amount of memory for a few billion unknowns.
As explained in Chapters 2 to 5, the FDTD method can manage such
problems since it does not require any matrix inversion. Therefore,
the FDTD method is often used to assess SAR and has proven its ability to address
bioelectromagnetic problems, such as the interaction between human tissues
and the antennas of communication systems, as shown in Figure 12.1.
Much effort has been made in the last ten years to take into account the
variability of human morphology in human RF exposure. In recent years,
several voxel phantoms have been developed on the basis of tomographic data of
real individuals. Examples include the visible human (VH), Norman [2], the
Japanese models (male and female) [3], the Korean model [4], the Zubal
model [5], and more recently the Virtual Family [6] and the Chinese [7]
models. The recently developed phantoms have a millimeter
resolution, while some of the earlier ones have a resolution of a few millimeters.
Using some of these phantoms, studies have been carried out [8] to assess the
human exposure induced by a frontal plane wave from 20 MHz to 2.4 GHz. As
shown in Figure 12.2, the frequency plays an important role in the human exposure.
Figures 12.2 and 12.3 also show the large influence of the morphology.

Figure 12.1 Influence of the presence of tissues on the antenna pattern of a mobile device: (a) far-field
pattern of a mobile device alone; (b) configuration in the FDTD simulation; and (c) far-
field pattern of a mobile device with a human head.

Figure 12.2 Whole-body SAR (W/kg) versus frequency (MHz) from 20 MHz to 2.4 GHz for an incident power
density of 1 W/m².

Figure 12.3 Variability analysis of SAR from 20 MHz to 2.4 GHz: deviation from the mean whole-body
SAR (%) versus frequency (MHz). © Phys. Med. Biol. 2008 [8].

It is evident from Figures 12.2 and 12.3 that the variation of
human exposure with frequency is significant. For instance, there is a large variability in
frequency bands close to 100 MHz, where the human body behaves like an antenna
that absorbs energy efficiently. The influence of morphology also appears
at frequencies close to 1,800 MHz; in this case, the influence on the SAR
value is due to the human cross-section, which varies between individuals.
Since head and body morphologies evolve with age, much effort has also been
made to develop child head models [9, 10] as well as fetus models [11] at different stages
[12, 13] to assess the SAR induced by a mobile phone in the tissues of young children.

Figure 12.4 Electric field strength coming from a closed Femto box calculated using the FDTD
method combined with Huygens’ surface and the spherical wave modes.

To avoid unnecessary meshing of free space, the equivalence principle is often used
to model the excitation source using the incident EMFs on the surface
surrounding the exposed object (a human in the current case). This method has
proven its efficiency when the coupling between the source and the exposed object
is negligible. It has been used for a long time in FDTD through the
well-known Huygens’ surface. However, only a plane wave is modeled most of the
time. With the recent use of small mobile devices that are quite close to the human
body, the plane wave model is questionable. An efficient way to overcome this limit
is to use the spherical wave expansion (SWE). The EMF emitted by the sources is
expressed as a combination of spherical waves (SW), which form an orthogonal
basis of the EMF space [14]:

$$\mathbf{E}(r,\theta,\varphi) = \frac{k}{\sqrt{\eta}} \sum_{s=1}^{2} \sum_{n=1}^{N} \sum_{m=-n}^{n} Q_{s,m,n}\, \mathbf{F}_{s,m,n}(r,\theta,\varphi) \tag{12.2}$$

$$\mathbf{H}(r,\theta,\varphi) = -ik\sqrt{\eta} \sum_{s=1}^{2} \sum_{n=1}^{N} \sum_{m=-n}^{n} Q_{s,m,n}\, \mathbf{F}_{3-s,m,n}(r,\theta,\varphi) \tag{12.3}$$

where E and H are the electric and magnetic fields expressed in spherical
coordinates (r is the distance from the source to the observation point, θ is the
elevation angle, and φ is the azimuth angle), k is the free-space propagation
constant, η is the admittance of the medium, N is the number of modes, Q is the
coefficient, and F is the spherical
wave function of index s (TM or TE fields), order m, and degree n. The fields are
fully characterized by this expansion. There can theoretically be an infinite number
of spherical modes, but in practice the number N of modes is chosen to be
sufficient to correctly describe the field emitted by the antenna [15]. Such
an approach can be used to calculate the SAR induced by small devices such as
“femto-cells” that can be close to the human body, which does not allow the use of
a plane wave model. Figure 12.4 shows the electric field distribution
computed with FDTD using Huygens’ surface and the spherical wave modes.
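
The index structure of the truncated expansion in (12.2) and (12.3) can be sketched in a few lines of Python. The snippet below only enumerates the (s, m, n) triples for a given truncation degree N and counts them; the function name is hypothetical, and the spherical wave functions themselves are not evaluated.

```python
def swe_mode_indices(n_max):
    """Enumerate (s, m, n) for s in {1, 2}, n = 1..n_max, m = -n..n."""
    return [(s, m, n)
            for s in (1, 2)
            for n in range(1, n_max + 1)
            for m in range(-n, n + 1)]

# Each degree n contributes 2n + 1 orders for each of the two values of s,
# so the number of coefficients Q grows as 2 * N * (N + 2).
for n_max in (1, 3, 10):
    modes = swe_mode_indices(n_max)
    print(n_max, len(modes), 2 * n_max * (n_max + 2))
```

The count shows why the truncation degree is kept as small as the field description allows: the number of coefficients grows quadratically with N.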

12.2.2 Uncertainty and Variability Management

Today, a new challenge for numerical RF exposure assessment is the variability of
usage and the assessment of the uncertainty of numerical results. Several studies [9,
16, 17] have investigated the SAR induced in brain tissues by a phone, and the
emitted power has been shown to vary with the selected technology [18,
19]. The tissue exposure depends on input parameters such as dielectric properties
[20], human morphology [9], phone design, and antenna location and frequency
[8].
The dielectric properties of human tissues can vary widely, for example because of
tissue inhomogeneity (say, skin) and the external environment. These
dielectric properties are also affected by age [18]. In turn, the SAR can be affected by
all these possible variations, so it is important to assess the uncertainty of the
calculation.
People use wireless communication systems extensively, not only for voice
calls but also in complex and variable configurations. Indeed,
today mobile phones are used not only close to the head for voice calls, as
in the 1990s, but also for uploading and downloading films, photos, and more
general files. They can be used in speaker mode or with a Bluetooth or wireless
hands-free kit. New communication systems, in particular body sensors, can be
located at various positions on the body. In such cases the morphology, the posture
of the user, and the position of the wireless system relative to the body can have a
significant impact on the human exposure and on the efficiency of the system.
When dealing with RF human exposure protection, such
variability has often been managed through worst-case scenarios. For instance, the
compliance tests recommended by the international standardization committees
[the Institute of Electrical and Electronics Engineers (IEEE), the International
Electrotechnical Commission (IEC), and the European Committee for
Electrotechnical Standardization (CENELEC)] are based on such an approach. To test
compliance with the International Commission on Non-Ionizing Radiation
Protection (ICNIRP) and IEEE limits on the human exposure induced by mobile
phones, phantoms, equivalent head liquids, and
positions representing the worst case have been defined. However, in real life, the mobile phone is
not always used in these standardized positions, and therefore the assessment of
“real” exposure needs to handle variable inputs.

In many physics or engineering problems where it is difficult or impossible to
use closed-form expressions to describe the phenomena (for example, numerical
integration or differentiation of complex functions), the characterization of the
distribution of a probabilistic
unknown quantity is often performed using Monte Carlo methods. This class
of algorithms uses random draws of a process to obtain the value of the function or
the distribution of a probabilistic unknown quantity. A well-known academic
example is the estimation of the area enclosed by a circle. The value of this
area can be approximated using Monte Carlo methods. Uniformly scattering
objects of uniform size within a square containing the largest disk delimited
by the circle and counting the number of objects inside the circle and the
total number of objects gives an approximation of the ratio of
the disk area to the square area, which equals π/4.
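
The circle example can be written as a minimal Python sketch. It draws points uniformly in the square [−1, 1] × [−1, 1], counts those falling inside the unit disk, and multiplies the hit ratio by 4 to estimate π. The function name, the fixed seed, and the sample sizes are assumptions made for reproducibility and illustration.

```python
import math
import random

def estimate_pi(n_points, seed=0):
    """Monte Carlo estimate of pi from the disk-to-square area ratio (pi/4)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_points

# The error decreases roughly as 1/sqrt(n_points).
for n in (100, 10_000, 200_000):
    estimate = estimate_pi(n)
    print(n, estimate, abs(estimate - math.pi))
```

The slow 1/√n convergence is the reason the method becomes impractical when each draw requires a simulation lasting hours.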
Consider a response quantity y = M(x) with mean value m_y and
standard deviation S_y. Given a sample of n input vectors {x^(1), x^(2), …, x^(n)},
the usual estimators of the mean and standard deviation are given by

$$m_y = \frac{1}{n} \sum_{i=1}^{n} M\left(x^{(i)}\right) \tag{12.4}$$

$$S_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( M\left(x^{(i)}\right) - m_y \right)^2} \tag{12.5}$$

Thanks to the central limit theorem, the estimator m_y is asymptotically
Gaussian. As a consequence, if n is sufficiently large, with q_{1−α/2} the (1 − α/2) quantile of
the centered and reduced Gaussian law 𝒩(0, 1), the uncertainty of the estimator is
given by ± q_{1−α/2} S_y / √n. For example, with a typical risk value of α = 5%,
q_{1−α/2} = 1.96 and the confidence interval of the mean m is given by

$$m_y - 1.96 \frac{S_y}{\sqrt{n}} \le m \le m_y + 1.96 \frac{S_y}{\sqrt{n}}$$

A similar formula exists for the confidence interval of the
standard deviation.
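
The estimators (12.4) and (12.5) and the 95% confidence interval of the mean translate directly into code. The sketch below is a plain-Python illustration; the function name and the sample values are hypothetical, and the samples stand in for model responses M(x^(i)).

```python
import math

def mean_std_ci(samples, q=1.96):
    """Sample mean (12.4), sample standard deviation (12.5), and the
    confidence interval m_y +/- q * S_y / sqrt(n) (q = 1.96 for alpha = 5%)."""
    n = len(samples)
    m_y = sum(samples) / n
    s_y = math.sqrt(sum((y - m_y) ** 2 for y in samples) / (n - 1))
    half_width = q * s_y / math.sqrt(n)
    return m_y, s_y, (m_y - half_width, m_y + half_width)

# Hypothetical responses, for illustration only.
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
m_y, s_y, (low, high) = mean_std_ci(responses)
print(m_y, s_y, low, high)
```

Because the half-width scales as 1/√n, halving the uncertainty of the mean requires four times as many model evaluations.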
The main advantage of the Monte Carlo method is its simplicity and
applicability to a large class of problems, but its main limitation is the very large
number of experiments required for problems involving a large number of inputs
or for the estimation of higher-order moments.
Other methods exist [21], but they also require a large number of
samples, which is not compatible with numerical methods that require long computation
times, such as FDTD. As described previously, the main advantage of the
FDTD method is that it proceeds without any matrix inversion, which can be cumbersome,
but its main constraint is the computation time, which can be very long (i.e., at least
a few hours if the calculation involves the whole body) for human RF exposure
assessment.
Much effort has been put toward high-performance computing using parallel
architectures and, more recently, graphics processing units. But even with this
push, the computation time is still not compatible with Monte Carlo methods, which
can require from a few hundred to a few thousand simulations depending on
the required precision and on the highest moment of the probability
distribution to be computed.

12.3 METAMODEL MODEL FOR UNCERTAINTY PROPAGATION

The problem described in the previous section is not specific to RF
electromagnetic exposure assessment; it occurs in many other
physics or engineering problems involving heavy use of computer simulations that
require significant computation time. Typical examples can be found in
mechanics with shape optimization. Indeed, for many
real-world problems, a single simulation, as in RF dosimetry or antenna design,
can take several minutes or hours. Similar problems occur when the objective is to
characterize the influence of input variations on the statistical distribution of the
outputs calculated by the simulations.
A way to overcome this limitation is to build simpler approximation
models, known as alternative models, surrogate models, response surfaces, or
metamodels, that mimic the complex response of the model (represented in
dosimetry by the FDTD simulations) as closely as possible while being cheap to evaluate
[22].
A model of a physical problem or system can be represented by a general
function M : x ↦ y = M(x); in this notation, x is a vector composed of the input
parameters of the model (x ∈ D ⊂ ℝ^M). The model response, y = M(x), is also a
vector, with a dimension possibly different from that of the input. Within this formalism,
described in Figure 12.5, the objective is to build an approximation of the model
response: ŷ = M̂(x).

[Figure 12.5 diagram: the input x (geometry design, frequency, shape, …) feeds both the physical model (e.g., FDTD), which gives the accurate output, and the surrogate model, which gives the approximate output.]

Figure 12.5 General scheme for surrogate model use.

The construction of the surrogate model, M̂, often considers the physical
model (in the current case, the FDTD simulator) as a black box. Only the
input-output relationship is considered important; the internal operations of the
simulation code and the physical phenomena do not need to be known or
understood. The main challenge is therefore to build the surrogate model M̂ using
the response of the simulator at a number of points, which have to be selected so
as to minimize their number relative to the objectives fixed for the
surrogate model.
Once the surrogate model is built and validated, computation time
is no longer a limit, and the propagation of the uncertainties or the variability of the
input data and the characterization of the outputs (shape of the distribution,
quantiles, and sensitivity) can be conducted using Monte Carlo methods.
Among the methods that exist to build these alternative models, we will look more
specifically at regression, the Kriging method, and the polynomial chaos
expansion (PCE), also known as polynomial chaos. These
methods have shown their great ability to manage the kind of problems that
dosimetry is facing. The construction of alternative models requires a sample of
the real phenomenon that the simplified model tries to approximate. The next
section discusses the method of selecting the experiments.
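
The black-box workflow described in this section (a few expensive runs, then Monte Carlo on a cheap approximation) can be illustrated with a deliberately simple Python sketch. It is not regression, Kriging, or PCE: it uses a polynomial interpolating the responses at three design points, chosen here only because it fits in a few lines. The function `expensive_model` is a hypothetical stand-in for a costly solver run, and the design points, seed, and sample size are assumptions.

```python
import random

def expensive_model(x):
    """Stand-in for a costly simulation (hypothetical; not an FDTD code)."""
    return 1.0 + 0.5 * x + 2.0 * x * x

def build_surrogate(model, design_points):
    """Polynomial interpolating the model responses at the design points."""
    responses = [model(x) for x in design_points]  # the only costly calls

    def surrogate(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(design_points, responses)):
            weight = 1.0
            for j, xj in enumerate(design_points):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total

    return surrogate

# Three model runs build the surrogate; Monte Carlo then uses only the surrogate.
surrogate = build_surrogate(expensive_model, [0.0, 0.5, 1.0])
rng = random.Random(1)
draws = [surrogate(rng.random()) for _ in range(20_000)]
mean_estimate = sum(draws) / len(draws)
print(mean_estimate)
```

In this toy case the model is itself a quadratic, so the surrogate reproduces it exactly; for a real simulator the surrogate would have to be validated against additional runs before being used for uncertainty propagation.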

12.4 DESIGN OF EXPERIMENTS

We will now focus on the design of experiments used to build the surrogate model
of the calculation code. The experiments are the configurations that will be
computed numerically (e.g., via FDTD). In the case of a person exposed to an
electromagnetic plane wave having variable angles of incidences ( 𝜃, 𝜑 ), the
experimental design will consist of selecting a set of incidence directions {(𝜃𝑖 , 𝜑𝑖 ),
. . , (𝜃𝑗 , 𝜑𝑗 ), . . . } (those are the experiences), for which the whole-body SAR, for
example, will be calculated using FDTD.
It is obvious that experiments must be optimally sampled to estimate model
parameters. The fundamental difference between the design of experiments
developed in the laboratory for physical experiences and the design of experiments
built for numerical calculations is that we assume the presence of random errors in
measurement in the laboratory but not in numerical simulations. The repetition of a
numerical experiment under the same conditions is irrelevant since it does not
provide any additional information that is not the case with physical experiences.
The choice of the design of numerical experiments and therefore the points for
which the simulations will be conducted must meet several constraints. The first
one is to distribute these points in space as uniformly as possible to capture
possible nonlinearities (relative to input variables) of the simulated phenomenon.
The second one is that the uniform distribution must persist if a dimensionality
reduction is performed. Indeed, when problems have a large dimension, that is to
say that they have many input parameters, it is common to observe that the
calculations depend heavily on only a few influential variables or on main
components consisting of linear combinations of these variables. It is therefore
important to keep the uniform filling property even in projection onto subspaces.
The last constraint, but not the least, is parsimony. The number of simulations
must be large enough to estimate all the coefficients of the approximation
model, but it must be limited to reduce the cost of simulation. In the case
of the SAR calculations, for which the computational time of one simulation can
be a few hours, this last constraint is fundamental.
An extensive literature exists on the design of experiments. An easy way to
address the problem of filling the space is to select points on a regular grid in the
experimental area but, as one can easily understand, such an approach can lead
not only to a large number of experiments but also to a wrong model, because the
regular spacing can cause some component of the phenomenon to be missed. An easy
way to avoid the problems due to the regular spacing is to select the points
randomly. But such an approach can create, as shown in Figure 12.6, a nonuniform
sampling of the space that overweights some parts of the space and can lead to a
biased surrogate model.

Figure 12.6 Random sampling of 10 points for two variables having uniform distribution (arbitrary
units for the x- and y-axes).

Among the existing methods dedicated to the planning of experiments, Latin
hypercube sampling, known as LHS, which is often used to construct computer
experiments, is an easily understandable method that offers a trade-off between
the opposing constraints described previously. LHS [23] is a statistical method for
generating a plausible sample of parameter values from a multidimensional
statistical distribution while ensuring a uniform filling of space. In two
dimensions, a square grid based on equally probable intervals and containing sample
positions is a Latin square sampling if there is only one sample in each row and
each column. The generalization of this concept to an arbitrary number of
dimensions is LHS. Implementing LHS is quite easy: to sample a function of N
variables, the range of each variable is divided into M equally probable intervals,
then M sample points (see Figure 12.7) are placed so that each interval of each
variable contains exactly one point.
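The construction just described can be sketched in a few lines. This is a minimal illustration, assuming inputs uniformly distributed on [0, 1) so that equally probable intervals are simply equal-width intervals; the function name `latin_hypercube` and the use of NumPy are choices made here, not part of the original text.

```python
import numpy as np

def latin_hypercube(m, n, rng):
    """Return an (m, n) array of LHS points in the unit hypercube [0, 1)^n."""
    # One random position inside each of the m equal-width intervals, per variable.
    u = (np.arange(m)[:, None] + rng.random((m, n))) / m
    # Shuffle the interval order independently for each variable.
    for j in range(n):
        u[:, j] = rng.permutation(u[:, j])
    return u

rng = np.random.default_rng(0)
points = latin_hypercube(10, 2, rng)   # M = 10 points, N = 2 variables
# Index of the interval occupied by each point, for each variable.
bins = np.floor(points * 10).astype(int)
```

For non-uniform inputs, the same points could be mapped through the inverse cumulative distribution function of each variable.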
The number of possible combinations of an LHS of M divisions and N
variables is given by

$$ \left( \prod_{p=0}^{M-1} (M - p) \right)^{N-1} = (M!)^{N-1} \tag{12.6} $$

Some of the possible combinations do not fill the space uniformly, as shown in
Figure 12.8: following the LHS rules does not prevent a bad space filling.
The combination inducing the best space filling can be identified using the
“maximin” criterion. For each possible experiment plan, the minimum Euclidean
distance $d_i$ between the points of the plan is calculated. The plan having the
largest minimum distance $d_i$ can be considered as the best plan from the
space-filling point of view.
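A sketch of this selection is given below. Since the number of possible plans grows very quickly, the sketch compares only a random subset of 200 candidate plans rather than all of them; this subsampling, the helper names `lhs` and `min_distance`, and the unit-square inputs are assumptions made for illustration.

```python
import numpy as np

def lhs(m, n, rng):
    # One point per equal-width interval and per variable, in random order.
    return np.column_stack([(rng.permutation(m) + rng.random(m)) / m for _ in range(n)])

def min_distance(points):
    # Smallest Euclidean distance between any two distinct points of a plan.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(points), k=1)].min()

rng = np.random.default_rng(1)
candidates = [lhs(10, 2, rng) for _ in range(200)]   # random subset of plans
scores = [min_distance(c) for c in candidates]
best = candidates[int(np.argmax(scores))]            # plan with the largest minimum distance
```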

Figure 12.7 LHS for M = 10 points and two variables having uniform distribution (arbitrary units for
the x- and y-axes).

In the context of iterative approaches, it is sometimes necessary to enlarge
existing sampling plans whose points have already been used as inputs of heavy
calculations. Because building a completely new plan of experiments is then not
always affordable, it is better to enrich the existing sampling plan. The LHS
method allows a point to be added thanks to the technique of nested Latin hypercube
sampling (NLHS) [24, 25]. The principle of this technique is to complement the
existing LHS plan while keeping, at least approximately, the LHS plan configuration.
With an initial LHS designed for N variables and M samples, adding a new point
leads to an LHS for N variables and M+1 samples. According to the LHS approach,
the range of each variable is divided into M+1 equally probable intervals and M+1
sample points should be placed in these intervals. The new intervals are by
definition smaller than the initial ones, so the existing points are, in most
cases, located in different intervals; NLHS keeps the existing points and only adds
a point so as to satisfy the Latin hypercube criterion (only one sample in each row
and each column).
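The enrichment step can be illustrated as follows. This is only a sketch of the idea, not the algorithm of [24, 25]: it assumes inputs on [0, 1), re-divides each variable into M+1 intervals, and places the new point in an interval left empty by the existing points. The function name `add_point` is hypothetical.

```python
import numpy as np

def add_point(points, rng):
    """Add one point to an (m, n) design in [0, 1)^n using m + 1 intervals per variable."""
    m, n = points.shape
    new = np.empty(n)
    for j in range(n):
        occupied = np.floor(points[:, j] * (m + 1)).astype(int)
        # m points cannot occupy all m + 1 intervals, so at least one is empty.
        empty = np.setdiff1d(np.arange(m + 1), occupied)
        k = rng.choice(empty)
        new[j] = (k + rng.random()) / (m + 1)
    return np.vstack([points, new])

rng = np.random.default_rng(2)
# Initial LHS with M = 5 points and N = 2 variables.
base = np.column_stack([(rng.permutation(5) + rng.random(5)) / 5 for _ in range(2)])
grown = add_point(base, rng)   # existing points are kept, one point is added
```

If two existing points happen to share one of the new intervals, the enlarged plan is only approximately a Latin hypercube, which is consistent with the "at least approximately" wording above.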

Figure 12.8 (a) The LHS design obtained with interval selection with N = 2 (dimensions) and M = 6
(intervals) and (b) a different LHS design obtained with the same constraint.

12.5 SURROGATE MODEL VALIDATION

The use of alternative models requires a method to validate these surrogate models.
Studies have been performed on the assessment of the accuracy of a model, and
the first intuitive approach is to analyze the errors of the prediction. In statistical
data analysis, the variability of the data set is measured through different sums of
squares, in particular the total sum of squares (known as SST or TSS) and the
residual sum of squares (known as RSS or SSR). SST and SSR are given by

$$ \mathrm{SST} = \sum_i (y_i - \bar{y})^2 \tag{12.7} $$

$$ \mathrm{SSR} = \sum_i (y_i - \hat{y}_i)^2 \tag{12.8} $$

where $y_i$ and $\hat{y}_i$ are, respectively, the observed and predicted values and $\bar{y}$ is the
mean value given by:

$$ \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{12.9} $$

A coefficient of determination $R^2$ is also often used when regression is
performed. It provides information about the goodness of fit of a model and is
given by:

$$ R^2 = \frac{\mathrm{SST} - \mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSR}}{\mathrm{SST}} \tag{12.10} $$
In fact, this coefficient of determination is well suited for data analysis
but not really for the quality analysis of a surrogate model. Having the coefficient
$R^2$ close to one does not provide information on the generalization ability of the
surrogate model since over-fitting or over-training can also make this coefficient
close to one.
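The following sketch illustrates this limitation. It computes the sums of squares of (12.7) and (12.8), using the usual form of the coefficient of determination, $R^2 = 1 - \mathrm{SSR}/\mathrm{SST}$, and fits a degree-7 polynomial through 8 noisy points. The data and the function name `r_squared` are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def r_squared(y, y_hat):
    sst = np.sum((y - np.mean(y)) ** 2)   # total sum of squares (12.7)
    ssr = np.sum((y - y_hat) ** 2)        # residual sum of squares (12.8)
    return 1.0 - ssr / sst

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.size)   # hypothetical noisy data

# A degree-7 polynomial passes through all 8 points, so R^2 equals 1 up to rounding,
# although nothing guarantees a good prediction between the points.
coeffs = P.polyfit(x, y, 7)
r2 = r_squared(y, P.polyval(x, coeffs))
```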
Among the existing methods more suitable for the assessment of the
generalization ability of a model, cross-validation is often used in the domain
of design of numerical experiments and surrogate model construction.
Cross-validation is a technique for assessing the accuracy of the predictive model
that will be used. In general, the creation of a predictive model (e.g., a surrogate
model) is based on a set of known data on which the learning is performed and
another set of known data, not used in learning, on which the model
is tested. In summary, the purpose of cross-validation is to define a method for
testing the quality of the model in its learning phase while avoiding over-fitting or
over-training, the ultimate goal being to provide an indicator of the quality of the
predictive model.
Consider the set of training data, $Y_1$, and the set of test data, $Y_2 =
\{y_{2,1}, y_{2,2}, \dots, y_{2,j}, \dots, y_{2,N}\}$. The surrogate model, $\hat{y} = \hat{M}(x)$, is built using the set
of learning data, $Y_1$, and is then tested using the set $Y_2$. A performance
indicator of the model can be, for example, the mean square error (MSE), which
measures the average of the squares of the “errors,” that is, the differences between
the estimator and what is estimated. With the present notation the MSE, $\varepsilon$, is defined as

$$ \varepsilon = \frac{1}{N} \sum_{p=1}^{N} (\hat{y}_{2,p} - y_{2,p})^2 \tag{12.11} $$
Simulations can be cumbersome, and it is often the case in dosimetry that the
production of two data sets (a training set and a test set) is too costly. For this
reason, methods have been developed that use the same computed data for both
learning and testing. In this section we will focus on two
methods that are often used in computer experiments: the leave-one-out and the
bootstrap.
The leave-one-out cross-validation (LOOCV) is a very intuitive method.
Assume you have N experiments $y_i = M(x_i)$; you can use N − 1 experiments to
build a model and one experiment to test it. If N is large enough, then the accuracy
of a model based on (N − 1) experiments is similar to that of a model based on N.
If we consider the N simulations, $Y = \{y_1, y_2, \dots, y_p, \dots, y_N\}$, that have been
performed and if one notes $\hat{M}_{-i}$ the model based on the (N − 1) simulations
$\{y_1, y_2, \dots, y_p, \dots, y_N\} \setminus \{y_i\}$, then we
can estimate the mean square error of the model using

$$ \mathrm{Err}_{\mathrm{LOO}} = \frac{1}{N} \sum_{i=1}^{N} (\hat{M}_{-i}(x_i) - M(x_i))^2 \tag{12.12} $$

To help the interpretation of this error, we can calculate the coefficient of
determination $Q^2$ from this error:

$$ Q^2 = 1 - \frac{\mathrm{Err}_{\mathrm{LOO}}}{\hat{\sigma}_y^2} \tag{12.13} $$

where $\hat{\sigma}_y^2$ is the variance of the outputs Y. Having $Q^2$ close to one indicates a good
generalization ability of the surrogate model.
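A minimal sketch of the leave-one-out procedure is given below, assuming a one-dimensional polynomial least-squares model as the surrogate. The test function stands in for an expensive solver and, like the name `loo_q2`, is hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def loo_q2(x, y, degree):
    """Leave-one-out error (12.12) and Q^2 (12.13) for a polynomial least-squares model."""
    n = len(x)
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # drop experiment i
        coeffs = P.polyfit(x[keep], y[keep], degree)   # model built on N - 1 points
        sq_err[i] = (P.polyval(x[i], coeffs) - y[i]) ** 2
    err_loo = sq_err.mean()
    return err_loo, 1.0 - err_loo / np.var(y)

x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x ** 2    # hypothetical stand-in for an expensive solver
err_loo, q2 = loo_q2(x, y, degree=2)
```

Because the stand-in response is itself a quadratic, the degree-2 surrogate predicts each left-out point almost exactly and $Q^2$ is close to one.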
The other quite simple and intuitive method is the bootstrap [26], which does not
require any information other than that available in the sample. This
approach is based on constructing a number of new samples (called bootstrap
samples or resamples), each obtained by random sampling with replacement from the
original sample and having a size equal to that of the observed data set.
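The resampling step can be sketched as follows. The observed outputs are synthetic, and the statistic chosen here (the mean) and the number of resamples are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=0.2, size=30)   # hypothetical observed outputs

n_boot = 1000
# Each resample draws len(y) values from y with replacement.
idx = rng.integers(0, len(y), size=(n_boot, len(y)))
boot_means = y[idx].mean(axis=1)

# Spread of the resampled statistic, here an estimate of the standard error of the mean.
std_error = boot_means.std(ddof=1)
```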

12.6 MODEL CONSTRUCTION AND REGRESSION

Polynomial interpolation is the simplest approach used in engineering problems.
Given a set of n + 1 data points $(x_i, y_i)$, linked to the response of a system governed
by an unknown function F, $y_i = F(x_i)$, if we assume that the approximation of F,
noted $\hat{F}$, can be done using a polynomial, then its expression is

$$ \hat{F}(x) = \sum_{i=0}^{n} y_i \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j} \tag{12.14} $$

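As an illustration of the Lagrange form of the interpolating polynomial, the short sketch below evaluates it from three samples of a known quadratic. The function name `lagrange` and the sample values are chosen here for the example.

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 7.0]      # samples of F(x) = x**2 + x + 1
value = lagrange(xs, ys, 1.5)   # F(1.5) = 4.75
```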
If the polynomial approximation cannot be done, then a linear regression can
be performed to model the relationship between a response variable and explanatory
variables. For instance, consider a set of data representing the observations of
a vector y having m components together with explanatory variables having n
components:

$$ \{y_1, x_{11}, x_{12}, \dots, x_{1n}\},\ \{y_2, x_{21}, \dots, x_{2n}\},\ \dots,\ \{y_p, x_{p1}, \dots, x_{pn}\},\ \dots,\ \{y_m, x_{m1}, \dots, x_{mn}\} \tag{12.15} $$

The linear regression will look for a model such as

$$ y_p = \sum_i \beta_i x_{pi} + \varepsilon_p = x_p^T \beta + \varepsilon_p \tag{12.16} $$

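where $(\,)^T$ stands for transpose, y is the response variable, x is the explanatory
variable, $\beta$ is the parameter vector, and $\varepsilon$ is the error term. This expression can be
stacked in matrix form:

$$ y = X\beta + \varepsilon \tag{12.17} $$

$$ X = \begin{pmatrix} x_1^T \\ \vdots \\ x_m^T \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & x_{ij} & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix} \tag{12.18} $$

$$ \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_m \end{pmatrix} \tag{12.19} $$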

Figure 12.9 Example of linear regression in one-dimension distribution (arbitrary units for the x-
and y- axes).

The challenge is to estimate the unknown parameter $\beta$. The most common
estimator used to assess $\beta$ is the ordinary least squares estimator, which is
conceptually simple and computationally straightforward. This method, which
consists of minimizing the sum of squared residuals, provides a closed-form
expression of $\hat{\beta}$, the estimated value of $\beta$ [such an estimation is unbiased if the
errors have finite variance and are uncorrelated with the explanatory variables
($E[x_i \varepsilon_i] = 0$)]:

$$ \hat{\beta} = (X^T X)^{-1} X^T y \tag{12.20} $$

An example of application of linear regression in 1-D is given in Figure 12.9.
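The ordinary least squares estimate can be computed in a few lines. The sketch below uses synthetic data with hypothetical true parameters and compares the closed-form normal-equation solution with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.random((m, n))                      # rows are the explanatory vectors x_p^T
beta_true = np.array([1.0, -2.0, 0.5])      # hypothetical parameters
y = X @ beta_true + 0.01 * rng.standard_normal(m)

# Closed form: solve the normal equations (X^T X) beta = X^T y without forming the inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)
# Equivalent and numerically more robust least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In practice the solver-based form is usually preferred, since explicitly inverting $X^T X$ amplifies rounding errors when the matrix is poorly conditioned.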



12.7 POLYNOMIAL CHAOS EXPANSIONS

12.7.1 Introduction to Polynomial Chaos Expansions

As explained previously, the Monte Carlo method has been and still is the most
commonly used method when statistics are evaluated on the outputs. The
disadvantage of this method is its low rate of convergence with the number of
simulations, m: the error decreases in proportion to the inverse of the square
root of m, that is, as $O(1/\sqrt{m})$. If higher-order moments such as the variance
are needed, this method is often prohibitive.
In electromagnetics, methods based on the stochastic finite element were
introduced [27–29] in the past decades. These methods have already been used in
other fields such as mechanics and fluid dynamics to incorporate random
fluctuations in the deterministic finite-element method. The key results on which
these approaches are based are due to Norbert Wiener [30], who used Hermite
polynomials to model stochastic processes with Gaussian random
variables. A few years later, Cameron and Martin [31] showed that such an
expansion converges (in the $\mathcal{L}^2$ sense) for any arbitrary stochastic process with
finite second moment. This constraint is easily satisfied since most physical
systems comply with it. Recently, studies [20, 32, 33] have contributed to the
development and use of stochastic methods in the engineering domain and have
provided a mathematical framework to manage the variability of the inputs in
numerical calculations.
The methods that use polynomial chaos (PC) expansion can be divided into two
broad categories: the intrusive methods, which require modifying the simulation code
of the solver, and the nonintrusive methods, which use solvers as black boxes.
The first category is strongly dependent on the simulation code and requires
manipulation of the governing equations that can be very complex and
analytically cumbersome. This complexity of intrusive approaches explains the
increasing attention given to the nonintrusive methods, which, using the complex
codes as black boxes, are more easily generalizable. The nonintrusive approaches
may themselves be divided into two categories: the stochastic collocation methods
and the stochastic spectral methods. The stochastic
collocation method, in which the polynomial approximation is constrained to fit
exactly the model response at a suitable point set, relies upon well-established
results on Lagrange polynomial interpolation [34]. The second category is the
spectral methods, in which the polynomial chaos coefficients are estimated using
spectral projections or least-square regressions. In this section we will pay
specific attention to these spectral methods.
The aim of this chapter is to consider the variations of the outputs of a
physical phenomenon or system induced by the variations of the inputs. Therefore,
of interest is a mathematical model $\mathcal{M}$ having M inputs, $y = \mathcal{M}(x)$, with the inputs
x affected by some possible random variations or uncertainties. Because of that, a
probabilistic framework needs to be defined.
Let us note the probability space (Ω, ℱ, 𝒫), where Ω is the event space
equipped with the σ-algebra ℱ and the probability measure 𝒫. In the following, the M
random variables are noted by uppercase letters, $X(\omega): \Omega \to \mathcal{D}_x \subset \mathbb{R}^M$; their
realizations are noted by the corresponding lowercase letters (e.g., x).
Let us also note $\mathcal{L}^2(\Omega, \mathcal{F}, \mathcal{P}, \mathbb{R})$ the space of square-integrable real-valued
functions equipped with the inner product:

$$ \langle X_1, X_2 \rangle_{\mathcal{L}^2} = E[X_1 X_2] = \int_{\Omega} X_1(\omega) X_2(\omega)\, d\mathcal{P}(\omega) = \int_{\mathcal{D}_x} x_1 x_2\, f_{X_1, X_2}(x_1, x_2)\, dx_1 dx_2 \tag{12.21} $$

where $f_{X_1,X_2}$ is the joint probability density function (PDF) of the vector $\{X_1, X_2\}$.
This inner product also provides a norm:

$$ \|X\| = \sqrt{E[X^2]} \tag{12.22} $$

Under this formalism, and assuming that the components of the input random
vector are independent, any scalar-valued model $\mathcal{M}: \mathbb{R}^M \to \mathbb{R}$ having a
random response $Y(\omega) = \mathcal{M}(X(\omega))$ with a finite second-order moment, $E(Y^2) <
+\infty$, can be described [35] using an infinite modal expansion such as:

$$ Y(\omega) = \sum_{\alpha \in \mathbb{N}^M} \beta_\alpha \psi_\alpha(X(\omega)) \tag{12.23} $$

where $\alpha$ is the multi-index, $\beta_\alpha$ are the coefficients of the polynomial expansion,
and $\psi_\alpha$ are the multidimensional orthogonal polynomials. In this equation, the notation
ω shows that the response Y and the input X are random variables. The series (12.23)
is usually referred to as the polynomial chaos expansion (PCE). The independence of
the input random variables allows the joint PDF to be written as

$$ f_X(x) = \prod_{i=1}^{M} f_{X_i}(x_i) \tag{12.24} $$

where $f_{X_i}(x_i)$ is the marginal PDF of $X_i$. For each input, assume a family of
polynomials $\{\pi_j^{(i)}(X_i)\}$, where $\pi_j^{(i)}$ has degree j, that is orthonormal with
respect to $f_{X_i}(x_i)$:

$$ \langle \pi_j^{(i)}(X_i), \pi_k^{(i)}(X_i) \rangle = E[\pi_j^{(i)}(X_i)\, \pi_k^{(i)}(X_i)] = \delta_{j,k} \tag{12.25} $$

where $\delta_{j,k}$ is the Kronecker symbol.


The tensorization of univariate polynomials provides a set of orthonormal
multivariate polynomials $\{\psi_k, k \in \mathbb{N}^M\}$, where k denotes all the M-tuples
$(k_1, k_2, \dots, k_M) \in \mathbb{N}^M$ and $\psi_k$ is defined by:

$$ \psi_k(x) = \prod_{i=1}^{M} \pi_{k_i}^{(i)}(x_i) \tag{12.26} $$

The PCE was originally formulated with standard Gaussian random variables
and Hermite polynomials. It was later extended to other classical random variables
together with their associated basis functions. The decomposition is then often
referred to as generalized PCE (GPCE). Table 12.1 lists some of the most common
continuous distributions and the associated families of polynomials.

Table 12.1
Example of Relationship Between Families of Orthogonal Polynomials
in GPCE and the Usual Input Distributions

Distribution   Density Function                                            Support                Polynomial
Gaussian       $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$                          $(-\infty, +\infty)$   Hermite: $He_k(x)$
Uniform        $\mathbf{1}_{[-1,1]}(x)/2$                                  $[-1, 1]$              Legendre: $P_k(x)$
Gamma          $x^a e^{-x} \mathbf{1}_{\mathbb{R}^+}(x)$                   $(0, +\infty)$         Laguerre: $L_k^a(x)$
Beta           $\mathbf{1}_{[-1,1]}(x) \frac{(1-x)^a (1+x)^b}{B(a)B(b)}$   $(-1, 1)$              Jacobi: $J_k^{a,b}(x)$

Inputs having uniform or Gaussian distributions are often used in engineering.
The corresponding polynomial families are, respectively, the Legendre
polynomials $\{P_k(x)\}$ and the Hermite polynomials $\{He_k(x)\}$, which
can be built iteratively using, respectively, the following formulae:

$$ P_{-1}(x) = P_0(x) = 1 \tag{12.27} $$

$$ (n+1)\, P_{n+1}(x) = (2n+1)\, x P_n(x) - n P_{n-1}(x), \quad n \in \mathbb{N} \tag{12.28} $$

$$ \langle P_k, P_l \rangle = \int P_k(x) P_l(x)\, \frac{\mathbf{1}_{[-1,1]}(x)}{2}\, dx = \frac{1}{2k+1}\, \delta_{k,l} \tag{12.29} $$

$$ P_1(x) = x, \quad P_2(x) = \frac{1}{2}(3x^2 - 1), \quad P_3(x) = \frac{1}{2}(5x^3 - 3x) \tag{12.30} $$

$$ He_{-1}(x) = He_0(x) = 1 \tag{12.31} $$

$$ He_{n+1}(x) = x\, He_n(x) - n\, He_{n-1}(x), \quad n \in \mathbb{N} \tag{12.32} $$

$$ \langle He_m, He_n \rangle = \int He_m(x) He_n(x)\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = n!\, \delta_{m,n} \tag{12.33} $$

$$ He_1(x) = x, \quad He_2(x) = x^2 - 1, \quad He_3(x) = x^3 - 3x \tag{12.34} $$
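The three-term recurrences for the Legendre and (probabilists') Hermite polynomials can be checked numerically. The sketch below builds the first polynomials by recurrence, starting from degree 0 and degree 1, and the result can be compared with NumPy's own implementations; the function names `legendre_rec` and `hermite_rec` are chosen here for illustration.

```python
import numpy as np
from numpy.polynomial import legendre, hermite_e

def legendre_rec(n_max, x):
    """P_0..P_n_max at x from (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}."""
    p = [np.ones_like(x), x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]

def hermite_rec(n_max, x):
    """He_0..He_n_max at x from He_{n+1} = x He_n - n He_{n-1}."""
    h = [np.ones_like(x), x]
    for n in range(1, n_max):
        h.append(x * h[n] - n * h[n - 1])
    return h[: n_max + 1]

x = np.linspace(-1.0, 1.0, 5)
P = legendre_rec(3, x)    # P[3] is (5 x^3 - 3 x) / 2
He = hermite_rec(3, x)    # He[3] is x^3 - 3 x
```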

If the statistical distributions of the input data used in a problem are not
among those associated with a well-defined family of polynomials, then
an iso-probabilistic transformation can be used. Let us denote P as the probability
governing a random variable X, and $F_X(x)$ as the cumulative distribution function
(CDF) of the random variable X, which is monotone and defined as:

$$ F_X(x) = P(X \le x) \tag{12.35} $$

Then the random variable defined as $Y = F_X(X)$ has a uniform distribution:

$$ F_Y(\gamma) = P(Y \le \gamma) = P(F_X(X) \le \gamma) = P(X \le F_X^{-1}(\gamma)) = F_X(F_X^{-1}(\gamma)) = \gamma \tag{12.36} $$
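This transformation can be illustrated numerically. The sketch below assumes, purely as an example, an exponentially distributed input with a hypothetical rate of 2; applying its cumulative distribution function yields a variable uniform on [0, 1], which is then rescaled to [−1, 1], the support associated with Legendre polynomials.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                   # rate of a hypothetical exponential input
x = rng.exponential(scale=1.0 / lam, size=20000)

u = 1.0 - np.exp(-lam * x)                  # Y = F_X(X), uniform on [0, 1]
xi = 2.0 * u - 1.0                          # rescaled to [-1, 1] for Legendre polynomials
```

A uniform variable on [0, 1] has mean 1/2 and variance 1/12, which gives a quick sanity check on the transformed sample.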

If X has correlated components, then advanced transformations such as the
Nataf [36] or Rosenblatt [37] transforms have to be used to recast the problem in
terms of noncorrelated random variables.
In all cases, for practical implementation, a finite-dimensional polynomial
chaos has to be built. Indeed, all the possible polynomials cannot be used in the
GPCE. Considering multivariate polynomials $\psi_k$ of M input variables and limiting
the maximum total degree $\sum_{i=1}^{M} k_i$ to N (i.e., $\forall k \equiv \{k_i\}$, $\sum_{i=1}^{M} k_i \le N$), the
size of this finite-dimensional basis is given by:

$$ P = \binom{M+N}{N} = C_{N+M}^{N} = \frac{(N+M)!}{N!\, M!} \tag{12.37} $$
Let us note $\hat{Y}$ as a truncation of the GPCE, with P the size of the basis given
by (12.37):

$$ \hat{Y} = \sum_{k=0}^{P-1} \beta_k \psi_k(X) \tag{12.38} $$

$\hat{Y}$ represents the surrogate model we are looking for and will be the substitute
for the complex and cumbersome FDTD calculations. The truncation and the
polynomials involved in this substitution model influence the accuracy of
such a model. The validation methods described previously will have to be used to
check the validity of such a model and adapt it if necessary through the number and
type of polynomials used in $\hat{Y}$. The next step is to assess the expansion coefficients.
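The size of the truncated basis, (N + M)!/(N! M!) for M inputs and a maximum total degree N, can be checked by direct enumeration of the multi-indices. The helper names below are chosen for illustration; the three cases evaluated correspond to input counts and orders quoted in this chapter.

```python
from itertools import product
from math import comb

def basis_size(m, n):
    """Number of multivariate polynomials of total degree <= n in m variables."""
    return comb(m + n, n)

def multi_indices(m, n):
    """All m-tuples (k_1, ..., k_m) of non-negative integers with k_1 + ... + k_m <= n."""
    return [k for k in product(range(n + 1), repeat=m) if sum(k) <= n]

# (inputs, maximum degree) pairs: 4 inputs at order 4, then 2 and 5 inputs at order 7.
sizes = {(m, n): basis_size(m, n) for (m, n) in [(4, 4), (2, 7), (5, 7)]}
```

The resulting sizes are 70, 36, and 792, respectively, which illustrates how quickly the basis grows with the number of inputs.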

12.7.2 Calculation of the GPCE Coefficients

The GPCE coefficients can be estimated using spectral projections or least-square
regressions.

12.7.2.1 Calculation of the GPCE Coefficients Using Spectral Projections:

These approaches are based on the orthogonality property of the chaos
polynomials. Assuming normalized polynomials (i.e., $\|\psi_k\| = 1$) and, as used
previously, $Y(\omega) = \mathcal{M}(X(\omega))$ for the response of a physical phenomenon or system
(e.g., FDTD simulations) to the random input $X(\omega)$, the coefficient $\beta_m$ can be
obtained using a projection:

$$ \beta_m = E[\mathcal{M}(X)\, \psi_m(X)] = \int \mathcal{M}(x)\, \psi_m(x)\, f_X(x)\, dx \tag{12.39} $$

$$ \beta_m = \int \left( \sum_{k=0}^{\infty} \beta_k \psi_k(x) \right) \psi_m(x)\, f_X(x)\, dx \tag{12.40} $$

$$ \beta_m = \sum_{k=0}^{\infty} \beta_k \langle \psi_k, \psi_m \rangle = \sum_{k=0}^{\infty} \beta_k \delta_{k,m} \tag{12.41} $$

In the case of numerical experiments, $\mathcal{M}(x)$ represents a complex system
requiring calculations performed with a solver such as the FDTD. The integral
described in (12.39), $\int \mathcal{M}(x)\, \psi_m(x) f_X(x)\, dx$, cannot be calculated in closed
form. It is necessary to use numerical integration techniques to estimate $\beta_m$. These
techniques are based on the choice of L integration nodes $x^{(i)}$ and of the
associated integration weights $w^{(i)}$, which lead to an approximation of the
coefficient:

$$ \beta_m \approx \hat{\beta}_m = \sum_{i=1}^{L} w^{(i)}\, \mathcal{M}(x^{(i)})\, \psi_m(x^{(i)}) \tag{12.42} $$

The accuracy depends on the number and choice of the sampling points. The
simplest method is to use the Monte Carlo method. In this case the
standard error, which decreases as $L^{-1/2}$, induces a low convergence rate, which is a
well-known drawback of Monte Carlo simulations. Other methods such as
Latin hypercube sampling and quasi-random or low-discrepancy sequences are
more efficient than the Monte Carlo method (MCM), but they still require a large
number of simulations, which grows with the number of inputs and of requested
coefficients. Such a large number of calculations is often not compatible with the
FDTD constraints.

Figure 12.10 Collocation points with sparse grids on the left, with a tensorial product on the right. (a)
Sparse grids. (b) Tensorial product.

An alternative approach for selecting the integration nodes and weights is the
use of quadrature techniques, but the main drawback of this approach is still the
curse of dimensionality. For multiple input variables, the basic method based on
the tensor product requires $L^N$ points, where N is the number of random input
variables and L is the number of points used by the 1-D quadrature rule.
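A one-dimensional sketch of the projection by quadrature is given below. It assumes a single input uniformly distributed on [−1, 1], orthonormal Legendre polynomials $\sqrt{2k+1}\,P_k$, a Gauss-Legendre rule with L = 8 nodes, and an analytic function standing in for the solver; all of these are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre

def model(x):
    return np.exp(x)          # hypothetical stand-in for the solver response M(x)

L = 8
nodes, weights = legendre.leggauss(L)     # Gauss-Legendre rule on [-1, 1]
weights = weights / 2.0                   # account for the uniform density f_X = 1/2

def psi(k, x):
    # Orthonormal Legendre polynomial: sqrt(2k + 1) P_k(x).
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(x, c)

# Weighted sums approximating the projection coefficients beta_0 .. beta_4.
beta = np.array([np.sum(weights * model(nodes) * psi(k, nodes)) for k in range(5)])
```

Here the first coefficient is the mean of the response, sinh(1). With N inputs, a tensor-product rule would need 8^N solver runs, which is the cost the sparse schemes below try to reduce.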

Table 12.2
Number of Simulations versus Order and Number of Uncertain Variables for Sparse Grid

          Number of uncertain variables
Order       1       2       3       4
1           3       5       7       9
2           5      13      25      41
3           9      29      69     137
4          17      65     177     401
5          33     145     441    1105

In order to bypass this issue, sparse quadrature schemes using the Smolyak
algorithm [38], which uses a multidimensional grid construction and sparse grids,
can be used to reduce the simulation effort. Advanced methods such as the
Clenshaw-Curtis formulation [39] can reduce the number of collocation points even
more (see Figure 12.10).
In spite of these efforts, as shown in Table 12.2, the number of simulations
required is still large, even for advanced methods involving the Clenshaw-Curtis
rule. In conclusion, the quadrature approach combined with sparse grids can be
used for problems involving a small number of variables. In practical problems the
number of inputs is often higher than 3 or 4, and in this case the projection
approach is not really compatible with FDTD.

12.7.2.2 Calculation of the GPCE Coefficients Using Spectral Regression

The calculation of the GPCE coefficients using regression aims at computing the
GPCE coefficients that minimize the mean-square error of approximation of the
model response. Consider a model $\hat{Y}$ that has been built using a truncated GPCE
with P terms:

$$ \hat{Y} = \sum_{k=0}^{P-1} \beta_k \psi_k(X) \tag{12.43} $$

If the model has M variables and we want to build a model of order p, then
according to (12.37) the number of coefficients is $\frac{(M+p)!}{p!\, M!}$ (e.g., 70 if the model has
four inputs and we start with a polynomial order of 4).
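The regression approach can be sketched on a small case with two uniform inputs on [−1, 1] and a maximum order of 3 (10 coefficients). The response function is a hypothetical polynomial standing in for the solver, the training design is plain random sampling for brevity (an LHS could be used instead), and the helper names are chosen here.

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre

def psi_1d(k, x):
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(x, c)   # orthonormal for U(-1, 1)

def design_matrix(x, indices):
    # The column for multi-index k is the product of univariate polynomials.
    return np.column_stack(
        [np.prod([psi_1d(ki, x[:, i]) for i, ki in enumerate(k)], axis=0) for k in indices]
    )

M, p = 2, 3
indices = [k for k in product(range(p + 1), repeat=M) if sum(k) <= p]   # 10 terms

def model(x):
    return 1.0 + x[:, 0] + 0.5 * x[:, 0] * x[:, 1] ** 2   # hypothetical solver response

rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, size=(40, M))   # plain random design for brevity
A = design_matrix(x_train, indices)
beta, *_ = np.linalg.lstsq(A, model(x_train), rcond=None)   # least-squares coefficients

x_test = rng.uniform(-1.0, 1.0, size=(200, M))
max_err = np.max(np.abs(design_matrix(x_test, indices) @ beta - model(x_test)))
```

Because the stand-in response is a polynomial of total degree 3, it lies in the span of the basis and the surrogate reproduces it to rounding error; a real solver response would leave a residual that the validation methods of Section 12.5 quantify.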

12.7.3 Construction of a Surrogate Model Using a Polynomial Chaos

12.7.3.1 Full GPCE

The construction of a surrogate model using the polynomial chaos can follow the
process described in Figure 12.11. The first step is to identify and characterize the
input variables, and the second step is to identify and build the polynomial
family (see the previous section) according to the characteristics of
the inputs (e.g., Legendre polynomials for uniform inputs). After that, the
computational budget has to be taken into account. The LHS (see the previous
section) has to take into account the number of inputs (given by the problem) and
the degree of the GPCE we want to start with. For example, with the previous example
(four inputs and a polynomial order of 4) we will need 70 coefficients, so the LHS
has to contain more than 70 points, and sufficiently more to avoid a bad
conditioning of the information matrix of the least-square formulation. The
coefficients are obtained through a regression. The quality of the surrogate model
can be tested using the leave-one-out method. If the accuracy is in line with the
target accuracy, then the surrogate model is ready; otherwise new points have to
complement the initial LHS (i.e., additional FDTD simulations have to be performed)
until the quality reaches the target accuracy.

Figure 12.11 Computational scheme of the surrogate built using full GPCE. Flowchart steps:
(1) characterize the (N) inputs and their statistical distributions; (2) identify the
polynomial family to use according to the input distributions and probabilistic
transformations performed; (3) decide the initial degree (M) of GPCE to build; (4) build
the initial LHS according to N and M, or enlarge the existing LHS; (5) calculate the GPCE
coefficients using regression; (6) test the accuracy (Q2) of the surrogate model using the
LOOCV method; (7) if the Q2 assessed with LOOCV is in line with expectation (Y), the
surrogate model is ready; otherwise (N), return to step 4.

Figure 12.12 Cardinal of the GPCE basis with 5th maximum order versus the number of variables
(x-axis: number of variables, 1 to 10; y-axis: cardinal of the GPCE, 0 to 3500).

With a GPCE that uses all the polynomials, the cardinal of the GPCE basis
is given by (12.37). With two input variables and a polynomial order up to 7, the
cardinal of the GPCE basis is 36, while it is 792 for the same order but with 5 input
variables. As shown in Figure 12.12, the number of polynomials can be huge, and
this constraint is known as the curse of dimensionality.

12.7.3.2 Sparse GPCE

In fact, as we can imagine, not all the polynomials have the same importance in
the GPCE truncation. Studies have been performed to build iteratively a sparse
polynomial chaos expansion for uncertainty propagation and sensitivity analysis
[40]. The objective, as shown in Figure 12.13, is to select the most important
polynomials under the constraint of a constant cardinal of the
polynomial basis.

Figure 12.13 Example of selection of polynomials in a 2-D case. The x- and y-axes (0 to 7) represent
the order of the univariate polynomial. (a) shows a full GPCE and (b) shows a sparse GPCE.
(a) shows, in the red boxes, the polynomials having order below 5. (b) shows, in the
green boxes, a possible selection of polynomials with a smaller number of selected
polynomials than in (a) but with higher-order polynomials. In both cases, the blue
crosses represent the possible polynomials having a pure order below 7.

Among the approaches that have been studied to build the sparse GPCE, there
are methods based on the sparsity-of-effects principle, which states that most
models are principally governed by the main effects and low-order interactions.
Within these approaches we have, for instance, the hyperbolic index sets [41], which
are quite easy to implement (select the polynomials having a global order below a
hyperbolic curve) and have been used in mechanical problems but seem less
relevant in electromagnetism.
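One common way to define such a hyperbolic set, assumed here for illustration, is to keep the multi-indices whose q-quasi-norm $(\sum_i k_i^q)^{1/q}$ does not exceed the maximum order p, with 0 < q ≤ 1; q = 1 recovers the full basis, and smaller values of q discard interaction terms first. The function name and the values of q below are choices made for this sketch.

```python
from itertools import product

def hyperbolic_indices(m, p, q):
    """Multi-indices k with (sum_i k_i**q)**(1/q) <= p; q = 1 gives the full basis."""
    tol = 1e-9   # guards against rounding when the quasi-norm equals p exactly
    return [
        k for k in product(range(p + 1), repeat=m)
        if sum(ki ** q for ki in k) ** (1.0 / q) <= p + tol
    ]

full = hyperbolic_indices(2, 4, 1.0)     # all 15 terms of total degree <= 4
sparse = hyperbolic_indices(2, 4, 0.5)   # univariate terms plus the (1, 1) interaction
```

With two inputs and p = 4, the set shrinks from 15 to 10 terms: all univariate polynomials up to order 4 are kept, while only the lowest-order interaction survives.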
Although sparse polynomial chaos methods based on the least angle regression method
(LAR) [42] and the least absolute shrinkage and selection operator method (LASSO)
[43] (also known as “LARS”) are not easy to implement, they are much more
efficient and are well adapted to engineering problems [44], including
electromagnetics and bio-electromagnetism problems.
Among a large set of polynomials forming a full truncation, the LARS
objective is to select iteratively those polynomials having the greatest impact from
the point of view of their correlation with the residual. As a consequence, the
algorithm chooses, one by one, the polynomials in descending order of influence.
Therefore, this method provides many possible truncations of increasing size.
Accordingly, the step “calculate the GPCE coefficients using regression” of the
computational scheme has to be replaced, as shown in Figure 12.14, by new steps
describing the selection of polynomials using LARS and the selection,
using LOOCV, of the best truncation.
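The idea of selecting basis terms one by one according to their correlation with the residual can be illustrated with a simplified greedy forward selection. This is not the LARS algorithm itself (which moves the coefficients along equiangular directions); it is a simpler stand-in that refits by least squares after each selection. The candidate matrix is random and hypothetical.

```python
import numpy as np

def forward_selection(A, y, n_terms):
    """Greedy selection of columns of A by correlation with the current residual."""
    A_norm = A / np.linalg.norm(A, axis=0)
    selected, residual = [], y.copy()
    for _ in range(n_terms):
        corr = np.abs(A_norm.T @ residual)
        if selected:
            corr[selected] = -np.inf             # never pick a column twice
        selected.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, selected], y, rcond=None)
        residual = y - A[:, selected] @ coef     # refit on the active set
    return selected, coef

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 20))                # hypothetical candidate basis evaluations
y = 3.0 * A[:, 4] - 2.0 * A[:, 11]               # only two columns matter
selected, coef = forward_selection(A, y, n_terms=2)
```

Each intermediate active set is one candidate truncation; in the scheme described above, the truncation retained would be the one with the best leave-one-out $Q^2$.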

Figure 12.14 Computational scheme of the surrogate built using sparse GPCE. Flowchart steps:
(1) select sets of polynomials according to LARS; (2) calculate the GPCE coefficients
(using regression) of all the sets of polynomials selected using LARS; (3) calculate Q2
using LOOCV for all the selected sets of polynomials; (4) if the best Q2 is in line with
expectation (Y), the set of polynomials selected is the one having the best Q2 and the
surrogate model is ready; otherwise (N), enlarge the existing LHS and repeat.

12.7.4 Example of the Use of the GPCE Model

To verify the compliance of mobile phones with safety limits when used close to
the head, technical standards have defined two test positions close to the head
(known as the cheek and tilt positions). These configurations are used in a procedure
that uses a homogeneous phantom (known as SAM) designed to overestimate
human exposure. This approach is useful to ensure compliance with exposure
limits but does not address the need, in epidemiological studies, to characterize
the distribution of the exposure associated with various uses of phones. To
characterize such real exposure, it is of interest to investigate the impact of
variable phone positions on head exposure. As explained in previous sections, usual
approaches that use the Monte Carlo method are not suitable for FDTD. To
overcome this limitation and in line with the previous section, a sparse GPCE has
been used [45] to analyze the influence of variable phone usage on the SAR10g
(maximum SAR over 10 grams in the head) response. Figure 12.15 shows the
configuration studied: a phone located close to the head of the Duke human
phantom (part of the virtual family [6]). The handset model is a generic one
composed of a printed circuit board (PCB), a screen, a battery, and a patch antenna
located on the top of the phone model.
Four parameters, $X_1, X_2, X_3, X_4$, govern the rotation and translation of the
phone model (see Figure 12.15) relative to the head. The supports of the uniform
distributions of these parameters are, respectively, [0°, 30°], [−15°, 15°], [5 mm, 30
mm], and [−10 mm, +10 mm]. The procedure described previously has been used
to build a SAR10g surrogate model using a sparse approach. The initial
experimental design is composed of N = 25 points $(x_1^{(1)}, x_2^{(1)}, x_3^{(1)}, x_4^{(1)}), \dots,
(x_1^{(i)}, x_2^{(i)}, x_3^{(i)}, x_4^{(i)}), \dots$, selected using the LHS method, and it has been enriched
iteratively using NLHS.

Figure 12.15 Generic phone model located close to the head of the Duke human phantom. (a) Face
side view. (b) Phone side view.

The input variables are uniformly distributed; the Legendre polynomials are
therefore the suitable orthogonal polynomials for the GPCE. Since the inputs of the
Legendre polynomials must lie in [−1, 1], an isoprobabilistic transform has to
be used to link 𝑋1 , 𝑋2 , 𝑋3 , 𝑋4 to the standardized inputs of the GPCE Legendre
polynomials. The sparse GPCE has been obtained using the hyperbolic index sets and
using LARS. For the hyperbolic approach, as illustrated in Table 12.3, the
sparse GPCE surrogate model produced by the iterative procedure reaches an
accuracy of 10^-2 (assessed using a LOOCV) with a GPCE degree of p = 8, and it
contains only 18 terms instead of the 495 terms required by the usual
full GPCE according to Equation (12.37).
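
The term counts quoted here can be checked with a short sketch. It assumes that Equation (12.37) is the usual binomial count of polynomials of total degree at most p in M inputs, and that the hyperbolic index set keeps the multi-indices whose q-norm does not exceed p; the function names and the value q = 0.5 are illustrative assumptions, not taken from the chapter.

```python
from itertools import product
from math import comb


def full_gpce_terms(n_inputs, degree):
    """Number of terms of a full GPCE: (M + p)! / (M! p!)."""
    return comb(n_inputs + degree, degree)


def to_standard(x, low, high):
    """Isoprobabilistic transform of a uniform input on [low, high] to [-1, 1]."""
    return 2.0 * (x - low) / (high - low) - 1.0


def hyperbolic_index_set(n_inputs, degree, q):
    """Multi-indices alpha whose q-norm (sum_i alpha_i**q)**(1/q) is <= degree."""
    kept = []
    for alpha in product(range(degree + 1), repeat=n_inputs):
        q_norm = sum(a ** q for a in alpha) ** (1.0 / q)
        if q_norm <= degree + 1e-9:  # small tolerance for floating-point rounding
            kept.append(alpha)
    return kept


print(full_gpce_terms(4, 8))                 # full GPCE, p = 8, four inputs
print(full_gpce_terms(4, 7))                 # full GPCE, p = 7, four inputs
print(len(hyperbolic_index_set(4, 8, 0.5)))  # candidate terms kept when q = 0.5
```

With q = 1 the hyperbolic set coincides with the full total-degree set. Smaller values of q discard high-order interaction terms first, which is why the candidate basis shrinks before any coefficient is estimated.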
As explained previously, LARS is much more efficient than other sparse
approaches. In the present example, a LOOCV accuracy (1 − 𝑄2) of 1% is
obtained with a seventh-order expansion and 71 simulations (15 significant polynomial
coefficients, compared with the 330 required by a full GPCE). An accuracy of
0.1% is obtained with 103 simulations (i.e., 78 simulations added iteratively to the
25 initial ones using the NLHS); in this case, 27 polynomials are involved in the
surrogate model. Figure 12.16 shows the probability distribution function (PDF)
of the 122 simulations that have been performed.

Table 12.3
Order of GPCE Polynomials, Number of Simulations, and 𝑄2 of the Sparse SAR10g Surrogate GPCE
Model Obtained with the Iterative Process and the “Hyperbolic” Index Set

GPCE Order   Number of Significant Polynomials   Number of Simulations   Q2
p = 2        7                                   30                      0.9
p = 5        9                                   43                      0.95
p = 8        18                                  88                      0.99
p = 12       29                                  122                     0.999


Figure 12.16 PDF of the 122 FDTD simulations that have been performed (horizontal axis:
SAR10g; vertical axis: occurrence).

Figure 12.17 PDF of the SAR10g based on different surrogate models (“full” GPCE, sparse
“hyperbolic” GPCE, and sparse LARS GPCE) and 10,000 positions of the phone model
relative to the head.

Figure 12.16 provides the PDF of the FDTD simulations that have been
performed. However, even though the experimental design was built with LHS, the
resulting PDF is not necessarily fully representative of the PDF of the SAR10g linked
to the variations of the position of the phone model relative to the head. To assess
this statistical distribution, one can apply the Monte Carlo approach to the SAR10g
surrogate model that has been built and generate a large number of outputs. Figure
12.17 shows the PDF of the SAR10g estimated using the surrogate models based
on the “full” GPCE (order 3), the sparse GPCE using the hyperbolic index set, and the
sparse GPCE built using LARS, each evaluated at 10,000 points
$(x_1^{(i)}, x_2^{(i)}, x_3^{(i)}, x_4^{(i)})$ selected using a usual Monte Carlo process.
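
The Monte Carlo evaluation of a surrogate can be sketched as follows. The coefficients in `SURROGATE` are hypothetical values chosen for illustration, not the SAR10g model of [45], and the inputs are assumed to be already standardized to [−1, 1].

```python
import math
import random


def legendre(n, x):
    """Legendre polynomial of degree n, orthonormal for a uniform input on [-1, 1]."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        # Three-term recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}.
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return math.sqrt(2 * n + 1) * p


# Hypothetical sparse surrogate: multi-index of degrees -> coefficient.
SURROGATE = {
    (0, 0, 0, 0): 1.00,
    (1, 0, 0, 0): 0.30,
    (0, 0, 1, 0): -0.10,
    (1, 0, 1, 0): 0.05,
    (2, 0, 0, 0): 0.04,
}


def evaluate(coeffs, xi):
    """Evaluate the polynomial chaos surrogate at one standardized input point."""
    total = 0.0
    for alpha, coefficient in coeffs.items():
        term = coefficient
        for degree, x in zip(alpha, xi):
            term *= legendre(degree, x)
        total += term
    return total


def monte_carlo_pdf(coeffs, n_samples=10_000, n_bins=20, seed=1):
    """Sample the surrogate and return the outputs with a histogram density."""
    rng = random.Random(seed)
    n_inputs = len(next(iter(coeffs)))
    outputs = [
        evaluate(coeffs, [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)])
        for _ in range(n_samples)
    ]
    low, high = min(outputs), max(outputs)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for y in outputs:
        counts[min(int((y - low) / width), n_bins - 1)] += 1
    density = [c / (n_samples * width) for c in counts]
    return outputs, low, width, density
```

Because each surrogate evaluation is only a polynomial evaluation, 10,000 samples cost a negligible time compared with a single FDTD run, which is what makes the Monte Carlo step affordable here.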

12.7.5 Sensitivity Analysis

Sensitivity analysis (SA) studies how the uncertainty in the output of
a mathematical model or system can be apportioned to the different sources of
uncertainty in its inputs. SA is often divided into local and global sensitivity
analysis. Local SA addresses the influence on the outputs of small variations of
the inputs in the vicinity of given values. Global SA quantifies the output
uncertainty due to changes of the inputs over their domains of variation.
Different methods exist [46] to perform SA; among them, the variance-based methods,
also known as ANOVA (analysis of variance) with the Sobol decomposition [47], are
often used. With this ANOVA approach, the response Y = M(x) of a system having finite
variance and independent inputs can be decomposed [48] into main effects and
interactions. The response variance D = Var[Y] can be decomposed into partial variances

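\[ D_i = \operatorname{Var}_{X_i}\left[ E\left( Y \mid X_i = x_i \right) \right] \tag{12.44} \]

\[ D_{i,j} = \operatorname{Var}_{X_i, X_j}\left[ E\left( Y \mid X_i = x_i, X_j = x_j \right) \right] - D_i - D_j \tag{12.45} \]

\[ D_{i,j,k} = \operatorname{Var}_{X_i, X_j, X_k}\left[ E\left( Y \mid X_i = x_i, X_j = x_j, X_k = x_k \right) \right] - D_{i,j} - D_{i,k} - D_{j,k} - D_i - D_j - D_k \tag{12.46} \]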

where $E(Y \mid X_i = x_i)$ is the mean model response when the i-th input parameter is
kept fixed at a given value 𝑥𝑖 ; the variance of this conditional mean is all the greater
as the conditional mean varies strongly as a function of 𝑥𝑖 . The partial variance 𝐷𝑖
measures the contribution of 𝑋𝑖 alone to the uncertainty (variance) in Y (averaged
over variations in the other variables). The Sobol indices, which are known to be good
descriptors of the sensitivity of the model response to its input parameters since
they do not assume any kind of linearity of the model, are defined as
\[ S_{i_1,\ldots,i_s} = \frac{D_{i_1,\ldots,i_s}}{D} \tag{12.47} \]
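
As a minimal worked illustration of these definitions, consider a hypothetical two-input model (not taken from the chapter), assuming that $X_1$ and $X_2$ are independent with zero mean and unit variance:

\[
\begin{aligned}
Y &= X_1 + 2X_2 + X_1 X_2, \\
E(Y \mid X_1 = x_1) &= x_1 \;\Rightarrow\; D_1 = 1, \\
E(Y \mid X_2 = x_2) &= 2x_2 \;\Rightarrow\; D_2 = 4, \\
D &= \operatorname{Var}(X_1) + 4\operatorname{Var}(X_2) + \operatorname{Var}(X_1 X_2) = 1 + 4 + 1 = 6, \\
D_{1,2} &= D - D_1 - D_2 = 1, \\
S_1 &= \tfrac{1}{6}, \quad S_2 = \tfrac{2}{3}, \quad S_{1,2} = \tfrac{1}{6}.
\end{aligned}
\]

The three terms of $Y$ are mutually uncorrelated under these assumptions, so their variances add and the indices sum to one. The interaction index $S_{1,2}$ captures the share of the variance that neither input explains alone.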
The Sobol indices are usually estimated using Monte Carlo
approaches, which are impractical when each model evaluation is computationally heavy.
The use of a surrogate model alleviates the procedure. In the case of GPCE, the
computational effort is greatly reduced, since the knowledge of the GPCE coefficients
gives the Sobol indices without further calculation. Indeed, thanks to the orthonormality of
the polynomials involved in the GPCE, the total and partial variances can be
obtained from the total or partial sums of the squared coefficients. With the same
notation as in (12.44) the total variance is given by:
N 1
D Var
 (Y ) 
k 0
2
k (12.48)

The partial variances are given by

Di1 ,...,is 

 2 (12.49)
i1 ,...,is

where
548 Advanced Computational Electromagnetic Methods and Applications


i1 ,is  :  k  0, k   i1 ,..., is  (12.50)

Equation (12.49) shows that the partial variances 𝐷 ̂𝑖 ,…𝑖 are obtained by
1 𝑠
summing up the squared coefficients of the relative polynomials that depend only
on 𝑥𝑖 .

Si1 ,...,is 
  i1 ,is
2
(12.51)

N 1
2
k 0 k

The total sensitivity indices $S_i^T$ have also been defined to quantify the total
effect of an input parameter on the output. They are defined as the sum of all
partial sensitivity indices $S_{i_1,\ldots,i_s}$ involving parameter i:

\[ S_i^T = 1 - \frac{\hat{D}_{-i}}{D} \tag{12.52} \]

where $\hat{D}_{-i}$ is the sum of all $\hat{D}_{i_1,\ldots,i_s}$ that do not include index i.
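
The post-processing of the coefficients can be sketched in a few lines. The sketch assumes an orthonormal basis indexed by multi-indices of degrees, with the all-zero multi-index carrying the mean; the function name and the coefficient values in `example` are hypothetical.

```python
def sobol_from_gpce(coeffs):
    """Return (partial indices, total indices) from orthonormal GPCE coefficients.

    coeffs maps a multi-index (tuple of polynomial degrees, one per input)
    to the corresponding coefficient. Inputs are numbered from 1.
    """
    # Total variance: sum of squared coefficients, excluding the mean term.
    total_variance = sum(a * a for alpha, a in coeffs.items() if any(alpha))

    # Group the squared coefficients by the set of inputs each polynomial uses.
    partial_variances = {}
    for alpha, a in coeffs.items():
        active = tuple(i + 1 for i, degree in enumerate(alpha) if degree > 0)
        if active:
            partial_variances[active] = partial_variances.get(active, 0.0) + a * a

    sobol = {subset: d / total_variance for subset, d in partial_variances.items()}

    # Total index of input i: sum of every partial index whose subset contains i.
    n_inputs = len(next(iter(coeffs)))
    total = {
        i: sum(s for subset, s in sobol.items() if i in subset)
        for i in range(1, n_inputs + 1)
    }
    return sobol, total


# Hypothetical two-input expansion: mean 1, main effects 2 and 1, interaction 1.
example = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.0, (1, 1): 1.0}
sobol, total = sobol_from_gpce(example)
```

For this example the variance is 4 + 1 + 1 = 6, so the first-order indices are 4/6 and 1/6, the interaction index is 1/6, and the total indices are 5/6 and 2/6. The total indices sum to more than one because the interaction term is counted once for each input it involves.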

For instance, the GPCE coefficients estimated in Section 12.7.4 can be used to
perform the sensitivity analysis of the head exposure with respect to the four
parameters, 𝑋1 , 𝑋2 , 𝑋3 , 𝑋4 , governing the rotation and translation of the phone model
relative to the head (see Figure 12.15). Figure 12.18 shows the Sobol indices
estimated using the GPCE coefficients. The four largest indices are
S1, S3, S12, and S13. The total sensitivity indices have also
been estimated. The largest are 𝑆1𝑇 and 𝑆3𝑇 , which are about 85% and 10%,
respectively. The smallest are 𝑆4𝑇 and 𝑆2𝑇 , which together contribute less than 5%.
This analysis shows that the most important parameter is the rotation in
the plane defined by the ears and mouth.
The signature of the GPCE is also of great interest to analyze the importance
of the polynomials involved in the GPCE. Such an analysis has been performed to
study the variation of the field scattered by building facades. The initial
analysis method is based on the Green’s functions of a semi-infinite medium [49].
This method is fast, but not fast enough to perform a large statistical study. GPCE has
therefore been used to perform the stochastic analysis of the field scattered by
building facades [50]; the number of required input samples was reduced by more than
one order of magnitude compared to a Monte Carlo approach for the same precision in
the output distribution. As shown in Figure 12.19, the problem has eight input
variables [49, 50] (height and width of the windows and façade, separation distances
between windows and between windows and façade edges).

Figure 12.18 Sobol indices estimated using the GPCE coefficients (vertical axis: Sobol indices).

Figure 12.19 Building façade model used in the stochastic analysis (the geometric parameters are
labeled W, H, and D1 to D4 in the drawing).

Figure 12.20 Coefficient values for all the polynomials involved in the full GPCE (horizontal axis:
standardized order of chaos polynomials, grouped as 1, 2, 1-1, 3, 2-1, and 1-1-1; vertical axis:
coefficient value associated with each polynomial, logarithmic scale from 10^-4 to 10^0).

Uniform distributions and NLHS have been used in this study [50]. Figure
12.20 shows the signature of the GPCE, which is composed of the coefficient values of
all the polynomials involved in the GPCE. The GPCE’s signature helps to identify the
most important polynomials and is therefore a valuable complement to the
sensitivity analysis.

12.8 KRIGING

12.8.1 Introduction to Kriging

The Kriging method, also known as Gaussian process (GP) regression, is another
method to build surrogate models. The method is named after the South
African engineer D. G. Krige, who initiated it [51]. The formalism and the
popularization of the method are based on the work of Georges Matheron [52].
This approach, and more generally the geostatistical methods, interpret the
sampled data as the result of a random process. This does not mean that the
phenomenon under study, such as the RF exposure, is itself random. Rather, this
interpretation provides a well-defined mathematical framework to manage
the spatial inference of quantities in unobserved areas and to quantify the
uncertainty associated with the estimator.
The Kriging method performs the spatial interpolation of a variable by
calculating the expected value of a random variable. The interpolated values are
modeled by a Gaussian process governed by prior covariances. This linear
estimation method takes into account not only the distances between the data points
and the estimation point, but also the distances between the data points themselves
and the correlation between these data.
Consider a sample of n input vectors $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ with the
responses of the system $y^{(i)} = M(x^{(i)})$. At a given point $x^{(0)}$, the prediction
is modeled as a linear combination of the responses at the simulation points:

\[ \hat{y}\left(x^{(0)}\right) = \hat{y}^{(0)} = \sum_{i} \lambda_i \, y^{(i)} \tag{12.53} \]

The Kriging method is close to regression analysis from the implementation
point of view. Under suitable assumptions on the priors, this method gives the best
linear unbiased prediction (BLUP) of the intermediate values. Depending on the
knowledge one has about the mean value, the Kriging method can be simple,
ordinary, or universal. Simple Kriging (SK) assumes a known stationary mean, whereas
ordinary Kriging (OK) assumes a stationary mean $E(Y(x, \omega)) = m$ that is unknown.
Universal Kriging (UK) assumes a general polynomial trend model,
$E(Y(x, \omega)) = \sum_{i=1}^{p} \beta_i f_i(x)$. The determination
of an adequate general polynomial is quite complex, and most of the time the mean
value is unknown; the most popular approach is therefore ordinary Kriging.

12.8.2 Covariance and Variogram

Variograms and covariances are often used in geostatistics, and in Kriging in particular.
These are functions describing the degree of spatial dependence of a spatial
random field or stochastic process. The covariance between the random variables
𝑋(𝑥, 𝜔) and 𝑌(𝑦, 𝜔) is denoted K(x, y) and is defined as

\[ K(x, y) = \operatorname{cov}\left( X(x,\omega), Y(y,\omega) \right) = E\left[ \left( X(x,\omega) - E[X(x,\omega)] \right) \left( Y(y,\omega) - E[Y(y,\omega)] \right) \right] \tag{12.54} \]

For random vectors of dimension m and n, respectively, the cross-covariance
matrix is given by:

\[ \operatorname{cov}\left( X(x,\omega), Y(y,\omega) \right) = E\left[ X(x,\omega)\, Y(y,\omega)^{T} \right] - E\left[ X(x,\omega) \right] E\left[ Y(y,\omega) \right]^{T} \tag{12.55} \]
   
Stationarity is a standard assumption in many applications, but not all
phenomena are stationary. A process is said to be stationary if the
covariance k(x + h, x) does not depend on x. In this case the notation k(x + h, x)
is often reduced to k(h). Classical geostatistical theory [53, 54] relies on a weaker
assumption: the intrinsic hypothesis. The random function Y is called
intrinsically stationary if the increment process Y(x) – Y(x + h) is
stationary; in this case, E[Y(x) – Y(x + h)] and E[(Y(x) – Y(x + h))^2] do not depend
on x. The variogram, often denoted 𝛾(𝑥, 𝑦), covers this case. It is defined as the
variance of the difference between the field values at two locations. Any stationary
process is intrinsically stationary, but the converse is not true.
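
An empirical estimate of the variogram can be sketched as follows for a regularly sampled one-dimensional profile. The sketch assumes zero-mean increments, so that the variance of the difference reduces to the mean squared increment; the function name and the sample profile are hypothetical. Many geostatistics texts work with the semivariogram, which is half of this quantity.

```python
def empirical_variogram(values, max_lag):
    """Mean squared increment of a regularly sampled 1-D profile, per integer lag."""
    result = {}
    for lag in range(1, max_lag + 1):
        increments = [values[i + lag] - values[i] for i in range(len(values) - lag)]
        result[lag] = sum(d * d for d in increments) / len(increments)
    return result


# Hypothetical profile sampled at unit spacing.
profile = [0.0, 0.8, 1.1, 0.4, -0.5, -1.0, -0.6, 0.2]
print(empirical_variogram(profile, 3))
```

The estimate depends only on the lag between sites, not on their absolute positions, which is exactly what the intrinsic hypothesis permits.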
The choice of the covariance is a key question for the Kriging method. The covariance
may be known, but often that is not the case; a covariance model has therefore to be
fitted to the data (i.e., the covariance model has to be chosen and the parameters
involved in the model assessed using the data).
Large efforts have been dedicated to the covariance models that can be used
in GP [55]. Among the possible functions (e.g., the 𝛾-exponential function or the Matérn
function based on the Bessel function), the squared exponential (SE) is quite
simple and often used. The SE covariance is given by
\[ K(r) = e^{-\frac{r^2}{2l^2}} \tag{12.56} \]

where the parameter l defines the characteristic length-scale. This parameter has to
be assessed using the existing data, through the entries
$K_{i,j} = e^{-\left(x^{(i)} - x^{(j)}\right)^2 / (2l^2)}$ of the sample covariance
matrix. One can use regression or another advanced method [21].
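
The role of the length-scale can be illustrated with a short sketch that builds the sample covariance matrix for scalar sites. The function names and the two length-scale values are assumptions chosen for the example, and a unit prior variance is assumed.

```python
import math


def se_covariance(r, length_scale):
    """Squared exponential covariance exp(-r^2 / (2 l^2)) with unit prior variance."""
    return math.exp(-r * r / (2.0 * length_scale ** 2))


def covariance_matrix(points, length_scale):
    """Sample covariance matrix K with entries K[i][j] for scalar sites."""
    return [[se_covariance(xi - xj, length_scale) for xj in points] for xi in points]


sites = [1.0, 2.0, 3.0, 5.0]
short = covariance_matrix(sites, 0.5)  # neighbors nearly uncorrelated
wide = covariance_matrix(sites, 2.0)   # neighbors strongly correlated
```

A short length-scale makes the predictor revert quickly to the mean away from the samples, whereas a long one produces smooth interpolation but an increasingly ill-conditioned matrix.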

12.8.3 Ordinary and Simple Kriging

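Consider $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ as the sampling sites. The sample
covariance K between them and the covariance K0 of the samples with the estimation
point 𝑥 (0) are expressed as:

\[ K = \begin{pmatrix} K_{1,1} & \cdots & K_{1,n} \\ \vdots & \ddots & \vdots \\ K_{n,1} & \cdots & K_{n,n} \end{pmatrix} = \left[ \operatorname{cov}\left( Y(x^{(i)}), Y(x^{(j)}) \right) \right]_{1 \le i,j \le n} \tag{12.57} \]

\[ K_0 = \begin{pmatrix} K_{1,0} \\ \vdots \\ K_{n,0} \end{pmatrix} = \left[ \operatorname{cov}\left( Y(x^{(i)}), Y(x^{(0)}) \right) \right]_{1 \le i \le n} \tag{12.58} \]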
Let us note

\[ \lambda = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_i \\ \vdots \\ \lambda_n \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} y^{(1)} \\ \vdots \\ y^{(i)} \\ \vdots \\ y^{(n)} \end{pmatrix} \tag{12.59} \]

If the mean of the process, denoted m, is known, then we can
consider without any loss of generality the case m = 0. The best linear unbiased
prediction (BLUP) of the value 𝑌(𝑥 (0) ) is given by the system

\[ \hat{Y}\left(x^{(0)}\right) = \lambda^{t} Y = \sum_{i} \lambda_i \, y^{(i)}, \qquad \lambda = K^{-1} K_0 \tag{12.60} \]
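
As a minimal worked illustration, consider the hypothetical case of a single sample $y^{(1)}$ located at a distance $r$ from the estimation point, assuming the SE covariance (12.56) with unit prior variance and a zero mean:

\[
\begin{aligned}
K &= \begin{pmatrix} 1 \end{pmatrix}, \qquad K_0 = \begin{pmatrix} e^{-\frac{r^2}{2l^2}} \end{pmatrix}, \\
\lambda_1 &= K^{-1} K_0 = e^{-\frac{r^2}{2l^2}}, \\
\hat{Y}\left(x^{(0)}\right) &= e^{-\frac{r^2}{2l^2}} \, y^{(1)}.
\end{aligned}
\]

When $r \to 0$ the weight tends to one and the predictor reproduces the sample; when $r \gg l$ the weight vanishes and the predictor returns to the known mean $m = 0$.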

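In the case of OK, the mean value is unknown, and the weights satisfy a new system

\[ \begin{cases} \sum_{j} \lambda_j K_{i,j} + \mu = K_{i,0} & \forall i \\ \sum_{j} \lambda_j = 1 \end{cases} \tag{12.61} \]

where 𝜇 is the Lagrange multiplier associated with the unbiasedness constraint.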

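In this case (12.60) is still valid, but 𝐾 𝑂𝐾 , 𝐾0 𝑂𝐾 and 𝜆𝑂𝐾 provided in (12.62)
are slightly different from (12.57), (12.58), and (12.59).

\[ K^{OK} = \begin{pmatrix} K_{1,1} & \cdots & K_{1,n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ K_{n,1} & \cdots & K_{n,n} & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix}, \quad K_0^{OK} = \begin{pmatrix} K_{1,0} \\ \vdots \\ K_{n,0} \\ 1 \end{pmatrix}, \quad \lambda^{OK} = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \mu \end{pmatrix} \tag{12.62} \]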

In the case of OK, the estimation variance is given by

\[ \sigma_0^2 = \operatorname{Var}\left( \hat{Y}(x_0) - Y(x_0) \right) = K_{0,0} - \sum_{i} \lambda_i K_{i,0} - \mu \tag{12.63} \]

Figure 12.21 shows the Kriging method applied to 𝑦 = 𝑥𝑠𝑖𝑛(𝑥) with
samples taken at x = 1, 2, 3, 5, 6, 8, and 10; the main advantage of the method
is that it provides the uncertainty of the estimation.
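
The ordinary Kriging prediction for this example can be sketched with the standard library only. The sketch assumes the SE covariance with unit prior variance and an arbitrary length-scale of 1 (neither is stated for the figure), and it uses the convention in which the Lagrange multiplier enters the system with a plus sign; the function names are illustrative.

```python
import math


def se_cov(a, b, length_scale):
    """Squared exponential covariance between two scalar sites."""
    return math.exp(-(a - b) ** 2 / (2.0 * length_scale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ordinary_kriging(sites, values, x0, length_scale=1.0):
    """Return (prediction, estimation variance, weights) at x0."""
    n = len(sites)
    # Augmented matrix: covariances bordered by ones, with a zero corner.
    k_ok = [[se_cov(xi, xj, length_scale) for xj in sites] + [1.0] for xi in sites]
    k_ok.append([1.0] * n + [0.0])
    k0 = [se_cov(xi, x0, length_scale) for xi in sites]
    solution = solve(k_ok, k0 + [1.0])
    weights, mu = solution[:n], solution[n]
    prediction = sum(w * y for w, y in zip(weights, values))
    variance = se_cov(x0, x0, length_scale) - sum(w * k for w, k in zip(weights, k0)) - mu
    return prediction, max(variance, 0.0), weights  # clip rounding noise at zero


SITES = [1.0, 2.0, 3.0, 5.0, 6.0, 8.0, 10.0]
VALUES = [x * math.sin(x) for x in SITES]
prediction, variance, weights = ordinary_kriging(SITES, VALUES, 4.0)
half_width = 1.96 * math.sqrt(variance)  # 95% interval under a Gaussian assumption
```

The predictor interpolates the samples exactly, and the estimation variance vanishes at the sampling sites and grows between them, which is the behavior of the confidence band described for the figure.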

Figure 12.21 Kriging applied to y = xsin(x) with the samples taken at x = 1, 2, 3, 5, 6, 8, 10. The
plot shows the function y = xsin(x), the BLUP (best linear unbiased predictor), and the 95%
confidence interval (horizontal axis: x, arbitrary unit; vertical axis: Y, arbitrary unit).

Figure 12.22 OK results obtained for different numbers of samples: (a) 200; (b) 15; (c) 30; and (d) 60.
Panel (a) shows the output obtained with a large number of simulations; panels (b), (c), and (d)
show the results obtained with 15, 30, and 60 simulations based on an LHS selection of the inputs.
Each panel plots the SAR induced in the fetal brain (W/kg) against the elevation and azimuth angles.

Because of the versatile use of mobile phones, tablets, and computers, efforts
have recently been dedicated to estimating fetal exposure [11, 56, 57]. In spite of the
progress in high-performance computing (e.g., GPU, parallel computing), the
computational time is still a limit to using a Monte Carlo-type approach to assess
the variation of the SAR in the brain of a fetus (in a pregnant woman) exposed to an
incident plane wave having a random angle of incidence. To overcome this limit, the
Kriging method has been used to build a surrogate model. FDTD simulations
have been performed to assess the fetal brain exposure. The pregnant woman
model used is the anatomically realistic whole-body pregnant-woman model
developed in Japan [58]. Two inputs (the angles defining the incidence) are
involved. Figure 12.22 shows the OK applied to the fetus and pregnant woman model,
as well as the influence of the number of samples on the OK results.

12.9 CONCLUSION

The recent significant advances in the area of numerical simulations,
in both hardware and software, allow us to envisage
simulations considered inaccessible 15 years ago. As a consequence, numerical
simulations are used more and more to design systems and estimate quantities such
as the SAR in the case of wireless communication systems. In spite of this
progress, the computational time is still a limit to performing sensitivity analyses
and to monitoring and quantifying the impact of possible variations of the inputs
on the outputs of simulations.
In this chapter we have shown that statistical tools, and in particular the
polynomial chaos expansion and the Kriging method, are mature methods for use
in electromagnetics, and in particular in numerical dosimetry, to estimate and
monitor the influence on exposure of uncertainties in the geometrical and physical
properties of a problem.
By adding a statistical dimension to a deterministic calculation and using the
techniques of uncertainty propagation in a complex scenario, the EMF
and SAR distributions (or other outputs of simulations) can be presented with a
confidence interval, which makes the result more useful for subsequent decision
making.

REFERENCES

[1] M. Ackerman, “Accessing the Visible Human Project,” D-Lib Magazine, 1995.
(www.nlm.nih.gov/cresearch/visible/visible_human.html)
[2] P. Dimbylow, “Development of the Female Voxel Phantom, NAOMI and Its Application to
Calculations of Induced Current Densities and Electric Fields from Applied Low Frequency
Magnetic and Electric fields,” Physics in Medicine and Biology, Vol. 50, No. 6, pp. 1047–1070,
2005.
[3] T. Nagaoka, et al., “Development of Realistic High-Resolution Whole-Body Voxel Models of
Japanese Adult Males and Females of Average Height and Weight, and Application of Models to
Radio-Frequency Electromagnetic-Field Dosimetry,” Physics in Medicine and Biology, Vol. 49,
No. 1, pp. 1–15, 2004.
[4] A. Lee, et al., “Development of Korean Male Body Model for Computational Dosimetry,” ETRI
J., Vol. 28, No. 1, pp. 107–110, 2006.
[5] I. Zubal, et al., “Computerized 3-Dimensional Segmented Human Anatomy,” Med. Phys., Vol. 21,
No. 2, pp. 299–302, 1994.
[6] A. Christ, et al., “The Virtual Family – Development of Surface-Based Anatomical Models of
Two Adults and Two Children for Dosimetric Simulations,” Physics in Medicine and Biology,
Vol. 55, No. 2, pp. 23–38, 2010.
[7] T. Wu, et al., “Chinese Adult Anatomical Models and the Application in Evaluation of RF
Exposures,” Phys. Med. Biol., Vol. 56, No. 7, pp. 2075–2089, 2011.
[8] E. Conil, et al., “Variability Analysis of SAR from 20 MHz to 2.4 GHz for Different Adult and
Child Models Using Finite-Difference Time-Domain,” Physics in Medicine and Biology, Vol. 53,
No. 13, pp. 1511–1525, 2008.
[9] J. Wiart, et al., “Analysis of RF Exposure in the Head Tissues of Children and Adults,” Physics
in Medicine and Biology, Vol. 53, No. 13, pp. 3681–3695, 2008.
[10] J. Wiart, et al., “Numerical Dosimetry Dedicated to Children RF Exposure,” Progress in
Biophysics and Molecular Biology, Vol. 107, No. 3, pp. 421–427, 2011.
[11] N. Varsier, et al., “Influence of Pregnancy Stage and Fetus Position on the Whole-Body and
Local Exposure of the Fetus to RF-EMF,” Physics in Medicine and Biology, Vol. 59, No. 17,
2014.
[12] S. Dahdouh, et al., “A Comprehensive Tool for Image-Based Generation of Fetus and Pregnant
Women Mesh Models for Numerical Dosimetry Studies,” Physics in Medicine and Biology, Vol.
59, No. 16, pp. 4583–4602, 2014.
[13] WHIST LAB, www.whist.institut-telecom.fr.
[14] J. Hansen, Spherical Near-Field Antenna Measurements, London: Peter Peregrinus, 1988.
[15] F. Jensen and A. Frandsen, “On the Number of Modes in Spherical Wave Expansions,” Proc.
26th AMTA, Vol. 2, pp. 489–494, 2004.
[16] P. Dimbylow and S. Mann, “SAR Calculations in an Anatomically Realistic Model of the Head
for Mobile Communication Transceivers at 900 MHz and 1.8 GHz,” Physics in Medicine and
Biology, Vol. 39, No. 10, pp. 1537–1553, 1994.
[17] M. Gosselin, et al., “Estimation of Head Tissue-Specific Exposure from Mobile Phones Based on
Measurements in the Homogeneous SAM Head,” Bioelectromagnetics, Vol. 32, No. 6, pp. 493–
505, 2011.
[18] J. Wiart, et al., “Analysis of the Influence of the Power Control and Discontinuous Transmission
on RF Exposure with GSM Mobile Phones,” IEEE Trans. Electromagn. Compat., Vol. 42, No. 2,
pp. 376–384, 2002.
[19] A. Gati, et al., “Exposure Induced by WCDMA Mobiles Phones in Operating Networks Wireless
Communications,” IEEE Transactions on Wireless Communication, Vol. 8, No. 12, pp. 5723–
729, 2009.
[20] C. Gabriel and A. Peyman, “Cole–Cole parameters for the Dielectric Properties of Porcine
Tissues as a Function of Age at Microwave Frequencies,” Physics in Medicine and Biology, Vol.
55, No. 15, pp. 413–419, 2010.
[21] B. Sudret, Uncertainty Propagation and Sensitivity Analysis in Mechanical Models. Contribution
to Structural Reliability and Stochastic Spectral Method, Habilitation à Diriger des Recherches,
Université Blaise Pascal, Clermont-Ferrand, France, 2007.
(http://www.ibk.ethz.ch/su/publications/Reports/HDRSudret.pdf)
[22] http://www.openturns.org
[23] M. McKay, W. Conover, and R. Beckman, “A Comparison of Three Methods for Selecting
Values of Input Variables in the Analysis of Output from a Computer Code,” Technometrics, Vol.
21, No. 2, pp. 239–245, 1979.
[24] G. Wang, “Adaptive Response Surface Method Using Inherited Latin Hypercube Design Points,”
J. Mech. Des., Vol. 125, No. 2, pp. 210–220, 2003.
[25] P. Qian, “Nested Latin Hypercube Designs,” Biometrika, Vol. 96, No. 4, pp. 957–970, 2009.
[26] B. Efron, “Bootstrap Methods: Another Look at the Jackknife,” Annals of Statistics, Vol. 7, No.
1, pp. 1–2, 1979.
[27] C. Chauvière, J. Hesthaven, and L. Lurati, “Computational Modeling of Uncertainty in Time-Domain
Electromagnetics,” SIAM J. Sci. Comput., Vol. 28, No. 2, pp. 751–775, 2006.
[28] D. Xiu and J. Hesthaven, “High-Order Collocation Methods for Differential Equations with
Random Input,” SIAM J. Sci. Comput., Vol. 27, No. 3, pp. 1118–1139, 2005.
[29] J. Silly-Carette, D. Lautru, M. Wong, A. Gati, J. Wiart, and V. Fouad Hanna, “Variability on the
Propagation of a Plane Wave Using Stochastic Collocation Methods in a Bio Electromagnetic
Application,” IEEE Microwave and Wireless Components Letters, Vol. 19, No. 4, pp. 185–187,
2009.
[30] N. Wiener, “The Homogeneous Chaos,” Amer. J. Math., Vol. 60, No. 4, pp. 897–936, 1938.
[31] R. Cameron and W. Martin, “The Orthogonal Development of Nonlinear Functionals in Series of
Fourier-Hermite Functionals,” Ann. of Math., Vol. 48, No. 2, pp. 385–392, 1947.
[32] R. Ghanem and P. Spanos, Stochastic Finite Elements: A Spectral Approach, New York:
Springer-Verlag, 1991.
[33] W. Schoutens, Stochastic Processes and Orthogonal Polynomials, New York: Springer-Verlag,
2000.
[34] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions with Formulas, Graphs,
and Mathematical Tables, 9th printing, New York: Dover Publications, 1972.
[35] Ch. Soize and R. Ghanem, “Physical Systems with Random Uncertainties: Chaos
Representations with Arbitrary Probability Measure,” SIAM Journal on Scientific Computing,
Vol. 26, No. 2, pp. 395–410, 2004.
[36] A. Nataf, “Détermination des Distributions Dont Les Marges Sont Données,” C.R. de
l’Académie des Sciences, Vol. 225, pp. 42–43, 1962.
[37] M. Rosenblatt, “Remarks on a Multivariate Transformation,” The Annals of Mathematical
Statistics, Vol. 23, pp. 470–472, 1952.
[38] S. Smolyak, “Quadrature and Interpolation Formulas for Tensor Products of Certain Classes of
Functions,” Soviet. Math. Dokl., Vol. 4, pp. 240–243, 1963.
[39] C. Clenshaw and A. Curtis, “A Method for Numerical Integration on an Automatic Computer,”
Num. Math., Vol. 2, pp. 197–205, 1960.
[40] G. Blatman, Adaptive Sparse Polynomial Chaos Expansions for Uncertainty Propagation and
Sensitivity Analysis, Ph.D. Thesis, Université Blaise Pascal, Clermont-Ferrand, 2009.
[41] G. Blatman and B. Sudret, “Anisotropic Parsimonious Polynomial Chaos Expansions Based on
the Sparsity-of-Effects Principle,” Int. Conf. on Structural Safety and Reliability, Osaka, Japan,
2009.
[42] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least Angle Regression,” Annals of
Statistics, Vol. 32, pp. 407–499, 2004.
[43] R. Tibshirani, “Regression Shrinkage and Selection via the Lasso,” J. Royal Stat. Soc., Series B,
Vol. 58, pp. 267–288, 1996.
[44] G. Blatman and B. Sudret, “Adaptive Sparse Polynomial Chaos Expansion Based on Least Angle
Regression,” Journal of Computational Physics, Vol. 230, No. 6, pp. 2345–2367, 2011.
[45] A. Ghanmi, Analyse de l’exposition aux Ondes électromagnétiques des Enfants Dans le Cadre
des Nouveaux Usages et Nouveaux Réseaux, Ph.D. Thesis, Université Marne La Vallée, 2013.
[46] A. Saltelli, K. Chan, and E. Scott, (eds.), Sensitivity Analysis, New York: John Wiley & Sons,
2000.
[47] I. Sobol, “Sensitivity Estimates for Nonlinear Mathematical Models,” Math Model & Comput
Exp., Vol. 1, pp. 407–414, 1993.
[48] B. Efron and C. Stein, “The Jackknife Estimate of Variance,” Annals of Statistics, Vol. 9, No. 3, pp.
586–596, 1981.
[49] S. Mostarshedi, et al., “Fast and Accurate Calculation of Scattered Electromagnetic Fields from
Building Faces Using Green's Functions of Semi-Infinite Medium,” IET Microwaves, Antennas
& Propagation, Vol. 4, No. 1, pp. 78–82, 2010.
[50] P. Kersaudy, et al., “Stochastic Analysis of Scattered Field by Building Facades Using
Polynomial Chaos,” IEEE Trans on Antenna & Propagation, Vol. 62, No. 12, pp. 6382–6393,
2014.
[51] D. Krige, “A Statistical Approach to Some Basic Mine Valuation Problems on the
Witwatersrand,” Journal of the Chemical, Metallurgical and Mining Society, Vol. 52, pp. 119–
139, 1951.
[52] G. Matheron, Traité de géostatistique appliquée, Tome I., In E. Technip (ed.), Mémoires du
Bureau de Recherches Géologiques et Minières, No. 14, Paris, 1962.
[53] G. Matheron, “The Intrinsic Random Functions and Their Applications,” Adv. Appl. Prob. Vol. 5,
pp. 439–468, 1973.
[54] J. Chiles and P. Delfiner, Geostatistics. Modeling Spatial Uncertainty, New York: John Wiley
and Sons, 2012.
[55] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, University Press
Group Limited, New Era Estate, 2006.
[56] M. Jala, et al., “Simplified Pregnant Woman Models for the Fetus Exposure Assessment,” C.R.
Physique, Vol. 14, No. 5, pp. 412–417, 2013.
[57] M. Jala, Plans D'expériences Adaptatifs Pour le Calcul de Quantiles et Application à la
Dosimétrie Numérique, Ph.D. Thesis, Telecom Paris-Tech, 2013.
[58] T. Nagaoka, et al., “An Anatomically Realistic Whole-Body Pregnant-Woman Model and Specific
Absorption Rates for Pregnant-Woman Exposure to Electromagnetic Plane Waves from 10
MHz to 2 GHz,” Phys. Med. Biol., Vol. 52, pp. 6731–6745, 2007.
About the Authors

Wenhua Yu is the Tepin professor at Jiangsu Normal University, the
president of 2COMU, Inc. and a visiting professor of Harbin
Engineering University. He is the director of Big-Data Analysis and
Processing Key Lab of Jiangsu Province. He was a visiting
professor/research associate at Pennsylvania State University from 1996
to 2010. He has worked on topics related to FDTD methods, software
development techniques, high-performance computing techniques, and
engineering applications for many years, and was the first to apply
vector units and Phi coprocessors to solve electromagnetic problems. He has published
more than 150 technical papers on the parallel FDTD methods and simulation techniques.
Prof. Yu also authored Conformal Finite Difference Time Domain Maxwell’s Equations
Solver Software and User’s Guide (Artech House, 2003), Parallel Finite Difference Time
Domain Methods (Artech House, 2006), Electromagnetic Simulation Techniques Based
FDTD Methods (John Wiley & Sons, 2009), Advanced FDTD method: Acceleration,
Parallelization, and Engineering applications (Artech House, 2011), and VALU
Acceleration Techniques for Parallel FDTD Methods (IET/SciTech Publisher, 2013),
Advanced Computational Electromagnetics Methods and Applications (Editor, Artech
House, 2014), and three books in Chinese (2005, 2010, 2012). He also translated one book
(from English to Chinese) Understanding the Finite Difference Time Domain Method (John
B. Schneider, Washington State University) (Tsinghua University Press, 2014). He is the
general cochair of several international conferences on computational electromagnetics,
antennas and microwave circuits. He is a lead guest editor of Journal of International
Antennas and Propagation in the special issue of “Small Antennas: Miniaturization
Techniques and Applications.” He is the member of a technical committee of several
international journals and a guest editor of the Harbin Engineering University Workshop
special issue of Journal of Applied Computational Electromagnetics Society. He is a TPC
cochair of the 2014 International Conference on Wireless Communications and Signal
Processing. He is a senior member of the IEEE and a primary developer of the GEMS
software package. He is also the founder of the Global Chinese Electromagnetic Network
(www.globalchineseEM.org).

Wenxing Li is a professor at Harbin Engineering University. He
received a B.S. and an M.S. in electrical engineering from Harbin
Engineering University in 1982 and 1987, respectively. He has published
three books and more than 60 technical papers. He received five national
awards and developed 5 products certified as “national key new
products.” His research interests include computational electromagnetic
methods, antenna theory and design, and electromagnetic compatibility.
He serves as the director of the Electromagnetic Engineering and Wireless Technology
Institute.

Atef Z. Elsherbeni received a Ph.D. in electrical engineering from
Manitoba University, Winnipeg, Manitoba, Canada, in 1987. He joined
the University of Mississippi in August 1987 as an assistant professor of
electrical engineering. He advanced to the associate professor rank in July
1991, and to the professor rank in July 1997. At the University of
Mississippi, he was also the director of the School of Engineering CAD
Lab from August 2002 to August 2013, the director of the Center for
Applied Electromagnetic Systems Research (CAESR) from July 2011 to
August 2013, and the associate dean of engineering for research and graduate programs
from 2009 to 2013. He was appointed as an adjunct professor at the Department of electrical
engineering and computer science of the L.C. Smith College of Engineering and Computer
Science at Syracuse University in January 2004. He spent a sabbatical term in 1996 in the
Electrical Engineering Department, University of California at Los Angeles (UCLA) and
was a visiting professor at Magdeburg University in Germany during the summer of 2005
and at Tampere University of Technology in Finland during the summer of 2007. He was a
Finland Distinguished Professor from 2009 to 2011. Dr. Elsherbeni became the Dobelman
Distinguished Chair and professor of electrical engineering at Colorado School of Mines in
August 2013. He is a Fellow of the Institute of Electrical and Electronics Engineers
(IEEE), a Fellow of the Applied Computational Electromagnetics Society (ACES), and the
editor-in-chief of the ACES Journal.

Yahya Rahmat-Samii is a distinguished professor, holder of the
Northrop-Grumman Chair in electromagnetics, member of the U.S.
National Academy of Engineering (NAE), winner of the 2011 IEEE
Electromagnetics Award, and the former chairman of the Electrical
Engineering Department at the University of California, Los Angeles
(UCLA). Before joining UCLA, he was a senior research scientist at
Caltech/NASA's Jet Propulsion Laboratory. Dr. Rahmat-Samii was the
1995 president of the IEEE Antennas and Propagation Society and the
2009-2011 president of the United States National Committee (USNC) of the International
Union of Radio Science (URSI). He has also served as an IEEE Distinguished Lecturer
presenting lectures internationally. Dr. Rahmat-Samii is a Fellow of the IEEE, AMTA, and
ACES. Dr. Rahmat-Samii has authored and coauthored over 950 technical journal articles
and conference papers and has written over 35 book chapters and four books. He has
received numerous awards, including the 1992 and 1995 Wheeler Best Application Prize
Paper Award for his papers published in the IEEE Transactions on Antennas and
Propagation, the 1999 University of Illinois ECE Distinguished Alumni Award, the IEEE
Third Millennium Medal, the 2000 AMTA Distinguished Achievement Award, a 2001
Honorary Doctorate (honoris causa) from the University of Santiago de Compostela, Spain,
2001 Foreign Membership of the Royal Flemish Academy of Belgium for Science and the
Arts, 2002 Technical Excellence Award from JPL, 2005 URSI Booker Gold Medal, 2007
Chen-To Tai Distinguished Educator Award of the IEEE AP-S, 2009 IEEE AP-S
Distinguished Achievement Award, 2010 UCLA School of Engineering Lockheed Martin
Excellence in Teaching Award, and 2011 UCLA Distinguished Teaching Award. His
research contributions cover diverse areas of modern electromagnetics and antennas
spanning from small medical antennas to large space deployable antennas. Dr. Rahmat-
Samii is the designer of the IEEE AP-S logo, which is displayed on all IEEE AP-S
publications.

Mohammadreza Barzegaran obtained B.S. and M.S. degrees in power
engineering from the University of Mazandaran, Iran, in 2007 and 2010,
respectively. He joined the Energy Systems Research Laboratory in the
Department of Electrical and Computer Engineering, Florida International
University, Miami, Florida, for his doctoral research. He recently
accepted an assistant professor position at Lamar University, Texas. His
research interests include electromagnetic compatibility (EMC) in power
system components, life assessment of electrical power components, and fault detection in
electrical machines. He has performed a number of computer-aided simulations using
numerical techniques and has conducted interference measurements of
interconnected power components. He has published a number of papers in journals in
addition to national and international conference records.

Malcolm M. Bibby received B.Eng. and Ph.D. degrees in electrical
engineering from the University of Liverpool, England, in 1962 and 1965,
respectively. He also holds an MBA from the University of Chicago. His
career includes both engineering and management. He was president of
LXE Inc., a manufacturer of wireless data communications products, from
1983 to 1994. Thereafter he was the president of NDI, a manufacturer of
hardened handheld computers, for five years. He is currently an adjunct
professor in ECE at Georgia Tech. His interests lie in the field of high-accuracy
computational electromagnetics.

Veysel Demir is an associate professor in the Department of Electrical
Engineering at Northern Illinois University. He received a bachelor of
science degree in electrical engineering from Middle East Technical
University, Ankara, Turkey, in 1997. He studied at Syracuse University,
New York, where he received both master of science and doctor of
philosophy degrees in electrical engineering in 2002 and 2004,
respectively. During his graduate studies, he worked as a research
assistant for Sonnet Software, Inc. He worked as a visiting research scholar in the
Department of Electrical Engineering at the University of Mississippi from 2004 to 2007.
He joined Northern Illinois University in August 2007, where he worked as an assistant
professor until August 2014.
Dr. Demir’s main field of research is electromagnetics and microwaves. He is especially
experienced in applied computational electromagnetics. He heavily participated in the
development of time-domain and frequency-domain numerical analysis tools for new
applications and contributed to research on improving the accuracy and speed of the
algorithms being developed. He is experienced in designing RF/microwave circuits and
antennas for related technologies and performing experimental characterizations of these
devices.
Dr. Demir is a member of the IEEE, ACES, and Sigma Xi and has coauthored more
than 40 technical journal and conference papers. He is a coauthor of the book The Finite
Difference Time Domain Method for Electromagnetics with MATLAB Simulations (Scitech,
2009).

Jiahui Fu received B.S. and M.S. degrees from the Harbin Institute of
Technology in 1995 and 1998, respectively, and a Ph.D. degree in
information and communication engineering from the Harbin Institute of
Technology, China, in 2005.
He is currently a professor in the School of Electronics and
Information Engineering, Harbin Institute of Technology. His research
interests include microwave and millimeter-wave circuits, antennas,
metamaterials, and electromagnetic compatibility.

Mohammed F. Hadi received M.Sc. and Ph.D. degrees from the
University of Colorado at Boulder, in 1992 and 1996, respectively. His
research is currently focused on FDTD development for modeling
electrically large structures. He has over ten years of experience in the
Kuwait government in the areas of engineering training, higher education
planning, and Kuwait’s labor profile studies. Between 2004 and 2012, he
was a board member of the Kuwait Fund for Arab Economic
Development’s prestigious National Engineering Training Program. He has been a sworn-in
consulting expert at Kuwait’s Court of Appeals since 2007. He also held membership and
chair positions in several high-level governmental inquiries at the Kuwait Ministries of
Defense, Energy, and Commerce. Professor Hadi served as technical program committee
member for several IEEE and ACES (Applied Computational Electromagnetics Society)
conferences in the United States and Europe. He was a visiting research scholar at Duke
University, Durham, NC, during the academic year 2007–2008. Until 2014, Professor Hadi
was a professor and an associate dean of the Electrical Engineering Department at Kuwait
University. He is currently an adjunct professor at the University of Colorado at Boulder.

Jian-Ming Jin received his Ph.D. degree in electrical engineering from
the University of Michigan, Ann Arbor, in 1989. He joined the University
of Illinois at Urbana-Champaign in 1993 and is currently the Y. T. Lo
Endowed Chair professor of electrical and computer engineering and the
director of the Electromagnetics Laboratory and Center for Computational
Electromagnetics. He has authored and coauthored over 240 papers in
refereed journals and 21 book chapters. He has also authored The Finite
Element Method in Electromagnetics (Wiley, first edition 1993, second
edition, 2002, third edition, 2014), Electromagnetic Analysis and Design in Magnetic
Resonance Imaging (CRC, 1998), Theory and Computation of Electromagnetic Fields
(Wiley, 2010), and coauthored Computation of Special Functions (Wiley, 1996), Fast and
Efficient Algorithms in Computational Electromagnetics (Artech, 2001), and Finite Element
Analysis of Antennas and Arrays (Wiley, 2008). His current research interests include
computational electromagnetics, scattering and antenna analysis, electromagnetic
compatibility, high-frequency circuit modeling and analysis, bioelectromagnetics, and
magnetic resonance imaging. He was elected by ISI as one of the world’s most cited authors
in 2002.
Dr. Jin is a Fellow of the IEEE and was a recipient of the 1994 National Science
Foundation Young Investigator Award, the 1995 Office of Naval Research Young
Investigator Award, the 1999 Applied Computational Electromagnetics Society (ACES)
Valued Service Award, and the 2014 ACES Technical Achievement Award. He also
received the 1997 Xerox Junior Research Award and the 2000 Xerox Senior Research
Award presented by the College of Engineering, University of Illinois at
Urbana-Champaign, and was appointed as the first Henry Magnuski Outstanding Young Scholar in
the Department of Electrical and Computer Engineering in 1998 and later as a Sony scholar
in 2005. He was appointed as a distinguished visiting professor in the Air Force Research
Laboratory in 1999 and was awarded adjunct, visiting, guest, or chair professorship by City
University of Hong Kong, University of Hong Kong, Anhui University, Beijing Institute of
Technology, Peking University, Southeast University, Nanjing University, Zhejiang
University, Shanghai Jiao Tong University, and Xidian University. His name appeared over
20 times in the University of Illinois at Urbana-Champaign’s List of Excellent Instructors.
His students have won the best paper awards in IEEE 16th Topical Meeting on Electrical
Performance of Electronic Packaging and 25th and 27th Annual Review of Progress in
Applied Computational Electromagnetics. He served as an associate editor and guest editor
for the IEEE Transactions on Antennas and Propagation, Radio Science, Electromagnetics,
Microwave and Optical Technology Letters, and Medical Physics. He was the Symposium
cochairman and technical program chairman of the Annual Review of Progress in Applied
Computational Electromagnetics in 1997 and 1998, respectively.

Joshua M. Kovitz received a B.S. in electrical engineering (summa cum
laude) from the University of Houston (UH) in 2010 and an M.S. in
electrical engineering from the University of California Los Angeles
(UCLA) in 2012. Currently, he is working towards his Ph.D. at UCLA
under the supervision of Professor Yahya Rahmat-Samii in the Antenna
Research, Analysis, and Measurement Laboratory.
While at UH, Joshua was involved with the Applied Electromagnetics
Laboratory and conducted research in computational electromagnetics for geophysical
applications. He also participated in the NSF REU program, where he worked on structural
health monitoring (SHM) with wireless sensors in the Wireless System Research Group
(WiSeR) under Professor Rong Zheng. Currently at UCLA, his primary research focuses on
practical antenna system design for cognitive radio applications. His research interests
include reconfigurable antennas, microstrip patch antennas, applied electromagnetics,
nature-inspired optimization techniques, vehicular antennas, cognitive radio, wireless
communications systems, and MIMO antenna systems.
He has received several awards and is actively involved with student and professional
groups. At UH, he was awarded the Outstanding Electrical Engineering Senior of the Year
of 2010 and the Outstanding Junior of the Year of 2009 for academic excellence and
student group involvement. He was also honored to participate as the Banner Bearer for the
Cullen College of Engineering during the 2010 graduation ceremony. During his time at
UCLA, he was awarded the UCLA Electrical Engineering Dean’s Fellowship as well as the
UCLA Graduate Division Fellowship. He was the highest ranked student in the 2012 UCLA
electrical engineering Ph.D. preliminary exam within the physical and wave electronics area
and was awarded a university fellowship. In 2012 he was the recipient of the Distinguished
Masters Thesis Award in Physical and Wave Electronics for his research in nature-inspired
optimization techniques applied to antenna designs. He was also awarded a prestigious
National Defense Science and Engineering Graduate (NDSEG) Fellowship. He was
awarded the Edward K. Rice Outstanding Master's Student for 2012. Joshua M. Kovitz is a
member of the IEEE, Tau Beta Pi, Eta Kappa Nu, and Phi Kappa Phi.

Fanyi Meng received B.S., M.S., and Ph.D. degrees in electromagnetics
from the Harbin Institute of Technology, Harbin, China, in 2002, 2004, and
2007, respectively. Since August 2007, he has been with the Department
of Microwave Engineering at the Harbin Institute of Technology where he
is currently a professor. He has coauthored three books, 40 international
refereed journal papers, over 20 regional refereed journal papers, and 20
international conference papers. His current research interests include
electromagnetic and optical metamaterials, plasmonics, and EMC.
Dr. Meng is a recipient of several awards including the 2010 Award of Science and
Technology from the Heilongjiang Province Government of China, the 2010 “Microsoft
Cup” IEEE China Student Paper Contest Award, two best paper awards from the National
Conference on Microwave and Millimeter Wave in China, in 2009 and 2007, respectively,
the 2008 University Excellent Teacher Award of National University of Singapore, the 2007
Excellent Graduate Award of Heilongjiang Province, and the Outstanding Doctor Degree
Dissertation Award of Harbin Institute of Technology.

Osama Mohammed is a professor in the electrical and computer
engineering department at Florida International University. He is a Fellow
of the IEEE and is the recipient of the IEEE PES 2010 Cyril Veinott
Electromechanical Energy Conversion Award. He has been the general
chair of several international conferences including ACES 2006, IEEE-CEFC 2006,
IEEE-IEMDC 2009, IEEE-ISAP 1996, and COMPUMAG 1993. He has also chaired technical
programs for other major international conferences including IEEE-CEFC 2010,
IEEE-CEFC 2000, and the 2004 IEEE Nanoscale Devices and System Integration.
Dr. Mohammed also organized and taught many short courses on power systems,
electromagnetics, and intelligent systems in the
United States and abroad. Professor Mohammed has served ACES in various capacities for
many years. He also serves the IEEE on various boards, committees, and working groups at
the national and international levels. He received the M.S. and Ph.D. degrees in electrical
engineering from Virginia Polytechnic Institute and State University. He has published
numerous journal articles over the past 30 years in areas relating to computational
electromagnetics and design optimization of electromagnetic devices, artificial intelligence
applications, and energy systems. He has authored or coauthored more than 300 technical
papers in the archival literature. He has conducted research work for government and
research laboratories in shipboard power conversion systems and integrated motor drives.
He is also interested in the application of communication and sensor networks for the
distributed control of smart power grids. He has been successful in obtaining a number of
research contracts and grants from industries and federal government agencies on projects
related to these areas.

Bach T. Nguyen was born in Hanoi, Vietnam. He received B.Eng. and
M.Eng. degrees in electrical engineering from the National Defense
Academy of Japan, Yokosuka, Kanagawa, Japan, in 2007 and 2009,
respectively. From 2010 to 2012, he was with the Vietnam Military
Academy of Science and Technology, Hanoi, Vietnam. Currently, he is
pursuing a Ph.D. degree in the Electrical and Computer Engineering
Department at the University of Utah, Salt Lake City, UT. His research
interests include computational electromagnetics, RF/microwave technology, RF-IC, and
liquid crystals. His current research focuses on stochastic FDTD simulation of
electromagnetic wave propagation in the ionosphere.

Andrew F. Peterson received B.S., M.S., and Ph.D. degrees in electrical
engineering from the University of Illinois, Urbana-Champaign in 1982,
1983, and 1986, respectively. Since 1989, he has been a member of the
faculty of the School of Electrical and Computer Engineering at the
Georgia Institute of Technology, where he is now a professor and an
associate chair for faculty development. He teaches electromagnetic field
theory and computational electromagnetics, and conducts research in the
development of computational techniques for electromagnetic scattering, microwave
devices, and electronic packaging applications.

Alireza Samimi received a B.S. in electrical engineering from Shiraz
University, Shiraz, Iran, in 2005, an M.S. in electrical engineering from
the University of Tabriz, Tabriz, Iran, in 2008, and a Ph.D. in electrical
engineering from Virginia Tech in 2013. His research interests include
the physics of the upper atmosphere, finite-difference time-domain
(FDTD) solution of Maxwell’s equations and its applications in
simulating wave propagation in the earth-ionosphere-magnetosphere system, active
modification of the ionosphere, and particle-in-cell computational modeling of plasma
instabilities. He participated in two experimental research campaigns at High Frequency
Active Auroral Research Program (HAARP) facilities to study the generation mechanism of the
narrowband stimulated electromagnetic emission spectral features excited during
ionospheric heating near the second electron gyro-harmonic. He is currently working as a
postdoctoral fellow in the Department of Electrical and Computer Engineering, University
of Utah, Salt Lake City. Dr. Samimi has been a member of the American Geophysical Union
(AGU) since 2010 and the IEEE since 2011.

Jamesina J. Simpson is an associate professor in the electrical and
computer engineering department at the University of Utah. She serves as
an associate editor of the IEEE Transactions on Antennas and
Propagation. She served as a steering committee member (Special
Sessions Chair & Publicity Chair) of the 2012 IEEE AP-S International
Symposium and USNC/URSI Radio Science Meeting. Her research lab
encompasses the application of FDTD to model electromagnetic
phenomena at frequencies over 15 orders of magnitude (~1 Hz versus ~600 THz). Professor
Simpson’s research activities have been funded by NASA, Sandia National Labs, Los
Alamos National Labs, Intel Corporation, the Department of Energy, the Air Force Office
of Scientific Research, and the National Science Foundation (NSF). She has received
research and teaching awards, including a 2010 NSF Faculty Early Career Development
(CAREER) Award (entitled “3-D Global Full-Maxwell's Equations Modeling of the Effects
of a Coronal Mass Ejection on the Earth”), and a 2011 Air Force Summer Faculty
Fellowship.

Joe Wiart received a Ph.D. from Telecom Paris Tech and University P
VI in 1995. He has been a telecommunication engineer from Telecom
Paris Tech since 1992, and the head of the research unit of Orange
(www.orange.com, formerly France Telecom) in charge of studies relative
to human exposure to electromagnetic fields since 1997. Since 1999
Dr. Wiart has served as the chairman of the working group of the
European Committee for Electrotechnical Standardization (CENELEC) in
charge of mobile and base station standards. He is one of the founders of the common
laboratory of the Institute Mines-Telecom and the Orange Labs
(http://whist.mines-telecom.fr/), which he has managed since its creation in 2009. Dr. Wiart is the present
chairman of the International Union of Radio Science (URSI) commission K. He has been
the chairman of the French chapter of URSI and a consultant to ICNIRP. He has been an emeritus
member of the Society of Environmental Engineers (SEE) since 2008 and a senior member
of the Institute of Electrical and Electronics Engineers (IEEE) since 2002. He has led more than
10 national projects dedicated to dosimetry (http://whist.mines-telecom.fr/) and was
involved in several EU projects (Interphone, Mobi-Kids, and Geronimo). Since the end of
2012, he has been the leader of the EU project LEXNET (http://www.lexnet-project.eu/). His
research interests are dosimetry, numerical methods and statistics applied to
electromagnetism, and stochastic dosimetry. His work has resulted in more than 90
publications and more than 120 communications (including numerous invited
communications).

Qun Wu received a B.Sc. in radio engineering, an M.Eng. in
electromagnetic fields and microwaves, and a Ph.D. in communication
and information systems, all from Harbin Institute of Technology (HIT),
Harbin, China, in 1977, 1988, and 1999, respectively. He was a
visiting professor at Seoul National University (SNU) in Korea from 1998
to 1999 and at Pohang University of Science and Technology from 1999 to
2000. He also spent two months as a visiting professor at the National University of
Singapore from 2003 to 2010 and at Nanyang Technological University in 2011.
Since 1990 he has been with the School of Electronics and Information
Engineering at HIT, China, where he is currently a professor and the director of the Center
for Microwaves and EMC.
He has published several books, including Electromagnetic Compatibility: Principle
and Techniques, Microwave Engineering and Techniques, Simulation, and Design for RF &
Microwave Circuits by Using Genesys, Theory and Applications of Metamaterials.
Professor Wu has published over 100 international and regional refereed journal papers. He
is a senior member of the IEEE. He is a technical reviewer for several international journals.
His recent research interests are mainly in the areas of electromagnetic compatibility,
metamaterials, RF microwave active and passive circuits, and millimeter-wave devices. He
is also a vice chair of the IEEE Harbin Section and the chair of the IEEE Harbin
EMC/AP/MTT Joint Society Chapter.

Mingyao Xia received master's and Ph.D. degrees in electrical engineering
from the Institute of Electronics, Chinese Academy of Sciences (IECAS),
Beijing, China, in 1988 and 1999, respectively.
From 1988 to 2002, he was with IECAS as an engineer and a senior
engineer. He was a visiting scholar at the University of Oxford, United
Kingdom, from October 1995 to October 1996. From June 1999 to
August 2000 and from January 2002 to June 2002, he was a senior research assistant and a
research fellow, respectively, with the City University of Hong Kong. He joined Peking
University (PKU), Beijing, China, as an associate professor in 2002, and was promoted to
full professor in 2004. He moved to the University of Electronic Science and Technology of
China, Chengdu, China, as a Chang-Jiang Professor nominated by the Ministry of Education
of China in 2010. He returned to PKU after finishing the appointment in 2013. His research
interests include computational electromagnetics, wave propagation and scattering,
microwave remote sensing, antennas, and microwave components. He has authored one
book and a few book chapters, and more than 80 peer-reviewed papers.
Prof. Xia was the recipient of the Young Scientist Award of the URSI in 1993. He was
awarded the first-class prize on Natural Science by the Chinese Academy of Sciences in
2001. He was the recipient of the Foundation for Outstanding Young Investigators
presented by the National Natural Science Foundation of China in 2008.

Ming-Feng Xue received a B.S. in electronic information engineering
from Anhui University, Hefei, China, in 2005, and an M.S. in
electromagnetic field and microwave technology from Shanghai Jiao
Tong University, Shanghai, China, in 2008. He is currently
working towards a Ph.D. in electrical and computer engineering at the
University of Illinois at Urbana-Champaign.
Since 2008, he has been a research assistant with the Center for
Computational Electromagnetics at the University of Illinois at Urbana-Champaign. His
research interests include finite element and boundary integral methods, domain
decomposition methods, and high-performance electromagnetic simulation. He received a
Best Student Paper Award at the 11th International Workshop on Finite Elements for
Microwave Engineering, Estes Park, Colorado, in 2012. He served as a reviewer for IEEE
Transactions on Antennas and Propagation and IEEE Antennas and Wireless Propagation
Letters.

Guohui Yang received a B.Eng. in communication engineering, an M.Eng.
in instrument science and technology, and a Ph.D. in microelectronics and
solid state electronics, all from Harbin Institute of Technology, Harbin,
China, in 2003, 2006, and 2009, respectively. After that, he joined Harbin
Institute of Technology as a postdoctoral fellow and served as an assistant
professor. His research interests include frequency selective surfaces, smart
antennas, conformal arrays, and cognitive radio.

Xiaoling Yang graduated from Tianjin University, China, with a B.S. in
applied mathematics and a B.E. in electrical engineering in 2001, and an M.S.
in applied mathematics in 2004. After that, he joined the Electromagnetic
Communication Lab of the Pennsylvania State University for several years
as a research associate. He has published over twenty conference and
journal papers and coauthored four books in the computational
electromagnetics field. He also served as a reviewer for multiple
conferences and journals. He was elevated to IEEE senior member in 2010. His
research interests include FDTD and FEM methods, parallel computing, hardware
acceleration (GPU and Phi), 3-D modeling, and visualization.

Kuang Zhang received a B.Eng. in communication engineering, an
M.Eng. in electronic engineering, and a Ph.D. in information and
communication engineering, all from Harbin Institute of Technology,
Harbin, China, in 2005, 2007, and 2011, respectively. Currently he is an
assistant professor in the School of Electronics and Information
Engineering, Harbin Institute of Technology. His research interests
include transformation optics and ultrathin metasurfaces.

Lei Zhao joined the Jiangsu Normal University in September 2009 as an
assistant professor and was promoted to an associate professor in August
2012. He is the director of the Center for Computational Science and
Engineering, and the associate dean of the School of Mathematics and
Statistics. He received a B.S. in mathematics from Jiangsu Normal
University, China, in 1997, an M.S. in computational mathematics, and a
Ph.D. in electromagnetic fields and microwave technology from
Southeast University, Nanjing, China, in 2004 and 2007, respectively.
From August 2007 to August 2009, he worked in the Department of
Electronics Engineering, The Chinese University of Hong Kong as a research associate.
From February 2011 to April 2011, he worked in the Department of Electronics and
Computer Engineering of National University of Singapore as a research fellow.
Dr. Zhao has published over 20 technical papers in Progress in Electromagnetics
Research, APL, IEEE Antennas and Propagation Magazine, and other international journals
and conferences. His current research interests include big data modeling, computational
electromagnetics, electromagnetic radiation to the human body, and numerical methods
for partial differential equations.

Xianyang Zhu was born in February 1967, in Anhui, China. He received
a B.S. and a Ph.D. in electrical engineering from Xi’an Jiaotong
University, Xi’an, China, in 1988 and 1994, respectively.
From 1994 to 1997, he was with the Electromagnetics Institute,
Southwest Jiaotong University, Sichuan, China, where, in 1995, he was
appointed an associate professor. From 1997 to 1999, he was a
postdoctoral research fellow with the Center for Computational
Electromagnetics, Department of Electrical and Computer Engineering,
University of Illinois at Urbana-Champaign. From 1999 to 2003, he was a research scientist
at the Center for Applied Remote Sensing, Department of Electrical and Computer
Engineering, Duke University, Durham, North Carolina. From 2003 to 2005, he was a
senior research scientist at Intelligent Automation Inc., Rockville, Maryland. From 2005 to
2013, he was a principal engineer at Signal Innovations Group, Durham, NC. Since May
2013, he has been the subject matter expert at Corvid Technologies, Huntsville, Alabama.
He is a senior member of the IEEE. His current research interests include
electromagnetic scattering analysis, antenna design, and signal processing. He served as
principal investigator of several projects awarded by the Army, Navy, and MDA.
Index

Absorbing boundary condition, 84
Adaptive cross approximation, 300
Advanced vector extensions, 175
Alternating current, 351
Ampere’s law, 6, 153
ANOVA approach, 547
Antenna
  feed, 65
  helical, 505
  high-directivity lens horn, 435
  horn, 446
  H-plane horn, 431
  lens, 428
  patch, 544
  phased-array, 269
  reflector, 5, 64, 67
  under test (AUT), 24
  Vivaldi antenna array, 264
Aperture distribution, 17
Arithmetic mean, 221
Asymptotic
  expansion, 22
  form, 14
  relation, 58

Back-projection coordinate system, 45
Basic linear algebra subprograms, 317
Bathymetry, 148
Beamwidth, 71
  half-power, 40, 66
  main beam, 78
Best linear unbiased prediction, 551
Binomial approximation, 50
BIOS, 178, 183
Biot-Savart law, 339, 382
BLAS library, 319
Boolean matrix, 230
Borris approach, 155
Boundary
  integral equation, 229
  residual method, 284

Cache
  associativity, 202
  hit ratio, 200, 208
CD burner XP, 180
CE-based FETI-DP method, 252
Clenshaw-Curtis
  formulation, 539
  rule, 540
Cloaking shell, 416
Cobblestone
  Cobblestone distance sorting technique, 305
  Cobblestone partitioning method, 306
Coefficient
  reflection, 23
  transmission coefficient, 277, 431
Compilation environment, 188
Computation performance, 142
Computational method
  CEM, 227
  FDTD, 3, 176
  FEM, 3, 176, 227
  MoM, 3, 176
  physical optics, 3
Computer-aided design, 254
Conductor
  perfect electric, 90, 457
  perfect magnetic, 14
Conformal modeling, 104
Contiguous memory block, 123
Courant
  Courant condition, 166
  Courant condition limit, 150
  stability condition, 162, 164
CPU, 143, 178, 219, 303, 323
CUDA, 127
  block, 133


  grid, 133
  thread, 133

Density
  electric charge, 6
  electric current, 6
  electric flux, 6
  magnetic flux, 6
  spectral, 10
Direct component, 335
Directivity, 25, 65
Dirichlet continuity condition, 231, 237, 245
Discretized transmission condition, 244
Domain decomposition, 229, 254
Domain impedance, 221
Doppler
  effects, 493, 500, 501
  frequencies, 501
  spectrum, 500
Double negative, 411

Earth’s topography, 148
Earth-ionosphere waveguide, 148
Eigenmode, 88
Eigenvalue theory, 88
Electromagnetic Codes Consortium, 324
Electromagnetic compatibility, 331
Electromagnetic energy concentrator, 412
Electromagnetic interference, 333
Electron gyro-frequency, 161
Elevation-azimuth (EL-AZ) coordinate system, 50
ELF wave, 163, 165
Equation
  combined field integral, 459
  electric-field integral, 286
  Helmholtz, 7
  Maxwell’s, 4, 6, 106, 109, 151, 415, 421, 424
  momentum, 156
  time domain integral, 455
Equivalent source model, 382
Eulerian angles, 48, 51
European Committee for Electrotechnical Standardization, 524

Fabry-Pérot resonance, 428
Faraday
  law, 6, 7
  rotation, 160
Far-field pattern, 5, 67
Fast multipole algorithm, 299
FCC, 4
FDTD
  compact, 109, 111
  extended-stencil, 84
  higher order, 84
  high-order, 84, 94
  plasma model, 150
  S-FDTD, 147, 167, 169
Fetch operation, 201
FFTPACK Fortran packages, 44
Field
  electric, 2
  electromagnetic, 1, 9
  far, 1, 20
  magnetic, 5
  near, 2, 20
First-order transmission condition, 252
Fourth-order differences, 84
Frame
  C, 495
  G, 499
  S, 495
  T, 499
Frame hopping method, 494
Free ISO burner, 180
Frobenius norm, 320
Full-wave simulation, 422
Fused multiply-add (FMA), 190
FV24 algorithm, 108

Gauss
  Gauss’ law, 6
  Gaussian quadrature, 487, 490, 493
  Gauss-Legendre quadrature, 69, 71
  Gauss-Legendre rules, 285
  magnetic Gauss’ law, 6
Generalized polynomial chaos, 167
Genetic algorithm (GA), 405
Geometric mean, 221 Jacobian matrix, 413
Geometrical optics, 150 Jacobian transformation matrix, 415
Gibbs phenomenon, 33 Japanese model, 520
Global Earth-ionosphere system, 166 Job scheduling, 208
Global FDTD Earth-ionosphere model,
147
Global FDTD model, 151 KMP_AFFINITY, 208
GPU, 130 Korean model, 520
Graded index, 435 Kriging method, 527, 550, 551, 552, 553,
Green’s function integral, 283 555
GRIN metamaterial lens, 446, 448 K-tuning parameters, 108

Hemispherical radome, 269 Lagrange multiplier, 230, 232, 253, 255
Hierarchical vector basis functions, 273 Lagrange polynomial interpolation, 534
Homogeneous phantom, 543 Language
Human C++ program, 190
morphology, 524 Latin hypercube sampling, 539
tissue, 524 Least-square solution, 285
Leave-one-out cross validation, 532
Legendre polynomials, 540, 544
IIR filter, 100 Lenz’s law, 393, 396
Impedance, 22 LHS, 528
characteristic, 23, 221 Linear-triangular approach, 43
intrinsic, 9 Linear-triangular interpolation, 43
Infinite impulse response, 99 Linux
Insulated-gate bipolar transistor, 391 CentOS, 180
Integral equation, 457 installation, 180
Integral equation rank revealing, 300 operating system, 180, 186
Intel manycore platform software stack, Red Hat Enterprise, 180, 182
182 SELinux, 184
Intel Xeon E3, 178 SSH Access, 186
Intel Xeon Phi coprocessor, 175 SSH key, 186
Intelligent grid, 43 SuSE, 182
International Commission on Non- Lithosphere, 165
Ionizing Radiation Protection, 524 LOOCV, 545
International Electrotechnical Lorentz equation, 168
Commission, 524 Low pass filter, 142
International Geomagnetic Reference Low rank decomposition, 303, 316
Field, 170 Lower upper (LU) decomposition, 301
International Reference Ionosphere, 171
International Standardization Committee,
524 Magnetic field boundary condition, 463
Inverse discrete Fourier transform, 503 Many integrated core (MIC), 177
Inverse mapping functions, 42 Mapping transformation function, 424
Ionosphere, 147 Marching-on-in-degree, 456
Isotropic medium, 8 Marching-on-in-time, 515
MathWorks Global Optimization
Toolbox, 94
MATLAB, 43, 117, 331 Near-zero index, 411
Mean square error, 531 Nested Latin hypercube sampling, 530
Measurement coordinate system, 45 Neumann boundary condition, 230, 236,
Memory 243
cached, 137 Normalized residual error, 284
computer, 123 Norman, 520
global, 128, 137 Numerical dispersion characteristics, 101
host, 130, 142 Numerical instability, 105
memory management, 201 Nyquist-Shannon sampling theorem, 35,
shared, 137 164
Metamaterial, 411
cloaks, 436
GRIN lenses, 436 Octree partitioning technique, 304
Metamaterial-based gradient index O-mode, 162
lenses, 428 One Lagrange multiplier, 229
Zero index, 428 Optical transformation, 411
Method Ordinary Kriging, 551
conformal FETI-DP, 243 Original space, 417
FETI-DP, 243 Over-dense plasma, 162
Galerkin’s, 230 Over-the-horizon radar, 147
generalized minimal residual, 358
generalized minimum residual
method, 235 Parallel
Krylov subspace, 235, 317 efficiency, 200
least angle regression, 542 FDTD method, 199, 204
MacCormack, 156 message passing interface (MPI), 256
of stationary phase, 14 OpenMP, 210, 217, 301, 319
stabilized biconjugate gradient, 235 processing, 211
SVD, 307 Parseval’s theorem, 30
temporal Galerkin matching, 515 Pattern, 26
MFIE, 459 far-field, 12, 25, 40
Mie series, 323 far-field electric field, 22
Minkowski relationship, 412 radiated far-field, 20
Mode radiation, 23, 54
common, 337 PCI express, 176, 178
differential, 337 PEC circular cylinder, 506
Monostatic RCS, 299, 508 Parseval’s theorem, 20
Monte Carlo method, 519, 525, 534, 538, Phasor domain, 5
546 Physical optics, 66
Multibeam antenna, 411 Plane wave expansion, 4
Multilevel fast multipole algorithm, 299 Plasma
Multiresolution analysis, 348 cold, 150
magnetized, 148
magnetized ionospheric, 151
NASA almond, 511 PML, 94, 259
National Geophysical Data Center, 170 CPML, 94, 222
National Oceanic and Atmospheric PO approximation, 66
Administration, 170 Poggio-Miller-Chang-Harrington-Wu-
Naval Research Laboratory, 269 Tai, 463
Polarization Fourier transform, 27
circular, 65 Relationship
cross, 69 asymptotic, 22
left-hand circular, 159 Fourier transform, 27
orthogonal, 23 Robin boundary condition, 232, 233, 236,
right-hand circular, 23, 159 248
Polynomial chaos expansion, 168, 527, Robin transmission condition, 247
534 Rytov approximation, 148, 150
Power
radiated, 19, 20, 22, 24, 56
system, 332 SAR10g, 546
Predictor-corrector method, 156, 158 Scalability, 262
Principal value integral, 459 Second-order differencing, 84
Propagation constant, 7, 11, 13, 272 Sensitivity analysis, 546
Pulse Sidelobe, 16, 78
Gaussian, 160, 165 SIMD instructions, 176
triangular, 19 Simplified conformal technique, 104
Single instruction multiple data, 175
Single negative, 411
Quadratic B-spline function, 467 Singular value decomposition, 299
Smolyak, 539
Sobol
Radar cross-section, 502 decomposition, 547
Radiation indices, 547
efficiency, 23 Source
intensity, 25 hard, 99
Randomized projection approach, 309 soft, 99
Randomized pseudo-skeleton Space-time transform, 494
approximation, 299, 312 Specific absorption rate, 519
Rao-Wilton-Glisson (RWG), 473, 474 Spectral bandwidth, 16, 17
basis function, 301, 475, 508 Spectrum
curved basis function, 511 continuous, 10
Ray-tracing, 147 discrete, 10
RCS, 323 plane wave, 10
Reflector Speed of light, 8
circular symmetric, 77 Split ring resonator, 428
elliptical symmetric, 76 Standard deviation, 525
parabolic, 65 Successive over-relaxation, 358
Region Surface current, 65
far-field, 1
invisible, 15, 20, 21
invisible region, 19 Temporal basis functions, 467
near-field, 2 Time domain
source-free, 7, 14 impedance, 221
visible, 19, 21, 35 reflectometer (TDR), 220
visible region, 15 Topography, 165
Relation Total-field/scattered-field (TFSF)
constitutive, 7, 8 approach, 101
dispersion, 8, 88
Transform Uninstall Intel MPSS, 183
2-D Fourier, 11
discrete Fourier, 27, 30
discrete-time Fourier, 28 Vector
fast Fourier, 4, 28 Cartesian, 26
Fourier, 11, 13, 17, 21, 34, 501 identity, 7
inverse discrete Fourier, 30 potential, 7, 14
inverse discrete wavelet, 349 potentials, 61
inverse FFT, 29 Poynting, 25
inverse Fourier, 12, 34 propagation constant, 8, 14
Nataf, 537 VFY218 airplane, 325
optics, 427
Rosenblatt, 537
Transformation matrices, 46, 50 Wave
Transformed space, 415, 417 evanescent, 8, 16, 20, 21
Transmission line, 23 L, 161
Transverse R, 161
electric, 253 Wavenumber, 7, 35
electric second-order transmission West (FFTW) subroutine library, 44
condition, 252 Whistler mode, 161
magnetic, 151, 155, 253
Two Lagrange multipliers, 229
Xeon Phi coprocessor, 175, 195
XLPE cable, 370
Uniform rectangular aperture, 16 X-mode, 162
Recent Titles in the Artech House
Antennas and Electromagnetics Analysis Library
Jennifer T. Bernhard, Series Editor

Adaptive Array Measurements in Communications, M. A. Halim
Advanced Computational Electromagnetic Methods and
Applications, Wenhua Yu, Wenxing Li, Atef Elsherbeni,
Yahya Rahmat-Samii, Editors
Advances in Computational Electrodynamics: The Finite-Difference
Time-Domain Method, Allen Taflove, editor
Advances in FDTD Computational Electrodynamics: Photonics and
Nanotechnology, Allen Taflove, editor; Ardavan Oskooi and
Steven G. Johnson, coeditors
Analysis Methods for Electromagnetic Wave Problems, Volume 2,
Eikichi Yamashita, editor
Antenna Design with Fiber Optics, A. Kumar
Antenna Engineering Using Physical Optics: Practical CAD
Techniques and Software, Leo Diaz and Thomas Milligan
Antennas and Propagation for Body-Centric Wireless
Communications, Second Edition, Peter S. Hall and Yang Hao,
editors
Antennas and Site Engineering for Mobile Radio Networks,
Bruno Delorme
Analysis of Radome-Enclosed Antennas, Second Edition,
Dennis J. Kozakoff
Applications of Neural Networks in Electromagnetics,
Christos Christodoulou and Michael Georgiopoulos
AWAS for Windows Version 2.0: Analysis of Wire Antennas and
Scatterers, Antonije R. Djordjević, et al.
Broadband Microstrip Antennas, Girish Kumar and K. P. Ray
Broadband Patch Antennas, Jean-François Zürcher and
Fred E. Gardiol
CAD of Microstrip Antennas for Wireless Applications,
Robert A. Sainati
The CG-FFT Method: Application of Signal Processing Techniques to
Electromagnetics, Manuel F. Cátedra, et al.
Computational Electrodynamics: The Finite-Difference Time-Domain
Method, Third Edition, Allen Taflove and Susan C. Hagness
Electromagnetic Modeling of Composite Metallic and Dielectric
Structures, Branko M. Kolundžija and Antonije R. Djordjević
Electromagnetic Waves in Chiral and Bi-Isotropic Media,
I. V. Lindell, et al.
Electromagnetics, Microwave Circuit and Antenna Design for
Communications Engineering, Peter Russer
Engineering Applications of the Modulated Scatterer Technique,
Jean-Charles Bolomey and Fred E. Gardiol
Fast and Efficient Algorithms in Computational Electromagnetics,
Weng Cho Chew, et al., editors
Frequency-Agile Antennas for Wireless Communications,
Aldo Petosa
Fresnel Zones in Wireless Links, Zone Plate Lenses and Antennas,
Hristo D. Hristov
Handbook of Antennas for EMC, Thereza MacNamara
Handbook of Reflector Antennas and Feed Systems, Volume I:
Theory and Design of Reflectors, Satish Sharma, Sudhakar Rao,
and Lotfollah Shafai, editors
Handbook of Reflector Antennas and Feed Systems, Volume II: Feed
Systems, Lotfollah Shafai, Satish Sharma, and Sudhakar Rao,
editors
Handbook of Reflector Antennas and Feed Systems, Volume III:
Applications of Reflectors, Sudhakar Rao, Lotfollah Shafai, and
Satish Sharma, editors
Introduction to Antenna Analysis Using EM Simulators,
Hiroaki Kogure, Yoshie Kogure, and James C. Rautio
Iterative and Self-Adaptive Finite-Elements in Electromagnetic
Modeling, Magdalena Salazar-Palma, et al.
LONRS: Low-Noise Receiving Systems Performance and Analysis
Toolkit, Charles T. Stelzried, Macgregor S. Reid, and
Arthur J. Freiley
Measurement of Mobile Antenna Systems, Second Edition,
Hiroyuki Arai
Microstrip Antenna Design Handbook, Ramesh Garg, et al.
Microwave and Millimeter-Wave Remote Sensing for Security
Applications, Jeffrey A. Nanzer
Mobile Antenna Systems Handbook, Third Edition,
Kyohei Fujimoto, editor
Multiband Integrated Antennas for 4G Terminals,
David A. Sánchez-Hernández, editor
Noise Temperature Theory and Applications for Deep Space
Communications Antenna Systems, Tom Y. Otoshi
Phased Array Antenna Handbook, Second Edition,
Robert J. Mailloux
Phased Array Antennas with Optimized Element Patterns,
Sergei P. Skobelev
Plasma Antennas, Theodore Anderson
Printed MIMO Antenna Engineering, Mohammad S. Sharawi
Quick Finite Elements for Electromagnetic Waves, Giuseppe Pelosi,
Roberto Coccioli, and Stefano Selleri
Radiowave Propagation and Antennas for Personal
Communications, Second Edition, Kazimierz Siwiak
Reflectarray Antennas: Analysis, Design, Fabrication and
Measurement, Jafar Shaker, Mohammad Reza Chaharmir, and
Jonathan Ethier
Solid Dielectric Horn Antennas, Carlos Salema, Carlos Fernandes,
and Rama Kant Jha
Switched Parasitic Antennas for Cellular Communications,
David V. Thiel and Stephanie Smith
Ultrawideband Antennas for Microwave Imaging Systems,
Tayeb A. Denidni and Gijo Augustin
Understanding Electromagnetic Scattering Using the Moment
Method: A Practical Approach, Randy Bancroft
Wavelet Applications in Engineering Electromagnetics, Tapan
Sarkar, Magdalena Salazar Palma, and Michael C. Wicks

For further information on these and other Artech House titles,
including previously considered out-of-print books now available through our
In-Print-Forever® (IPF®) program, contact:

Artech House
685 Canton Street
Norwood, MA 02062
Phone: 781-769-9750
Fax: 781-769-6334
e-mail: [email protected]

Artech House
16 Sussex Street
London SW1V HRW UK
Phone: +44 (0)20 7596-8750
Fax: +44 (0)20 7630 0166
e-mail: [email protected]

Find us on the World Wide Web at: www.artechhouse.com
