Fast MATLAB Assembly of FEM Matrices in 2D and 3D: Edge Elements




Article in Applied Mathematics and Computation · September 2014


DOI: 10.1016/j.amc.2015.03.105 · Source: arXiv


2 authors, including:

Jan Valdman
Institute of Information Theory and Automation, Czech Academy of Sciences
82 PUBLICATIONS 695 CITATIONS


All content following this page was uploaded by Jan Valdman on 17 September 2018.



Fast MATLAB assembly of FEM matrices in 2D and 3D: Edge elements

I. Anjam^{a,∗}, J. Valdman^{b,c}
a Department of Mathematical Information Technology, University of Jyväskylä, Finland
b Institute of Mathematics and Biomathematics, University of South Bohemia, České Budějovice, Czech Republic
c Institute of Information Theory and Automation of the ASCR, Prague, Czech Republic

Abstract
We propose an effective and flexible way to assemble finite element stiffness and mass matrices in MATLAB. We apply
this to problems discretized by edge finite elements. Typical edge finite elements are Raviart-Thomas elements used in
discretizations of H(div) spaces and Nédélec elements in discretizations of H(curl) spaces. We explain vectorization ideas
and comment on a freely available MATLAB code which is fast and scalable with respect to time.
Keywords: MATLAB code vectorization, Finite element method, Edge element, Raviart-Thomas element, Nédélec element

arXiv:1409.4618v2 [cs.MS] 11 May 2015

1. Introduction

Elliptic problems containing the full gradient operator ∇ of scalar or vector arguments are formulated in weak forms in
H^1 Sobolev spaces and discretized using nodal finite element functions. Efficient MATLAB vectorization of the assembly
routine of stiffness matrices for the linear nodal finite element was explained by T. Rahman and J. Valdman in [11].
The focus of this paper is generalizing the ideas of [11] to arbitrary finite elements, including higher order elements, and
vector problems operating with the divergence operator div and the rotation operator curl. Such problems appear in
electromagnetism and are also related to various mixed or dual problems in mechanics. Weak forms of these problems are
defined in H(div) and H(curl) Sobolev spaces. A finite element discretization is done in terms of edge elements, typically
Raviart-Thomas elements [12] for H(div) problems and Nédélec elements [9] for H(curl) problems. Edge element basis
functions are not defined on the nodes of 2D triangular or 3D tetrahedral meshes, but on edges and faces. Edge elements
provide only partial continuity over element boundaries: continuity of the normal vector component for H(div) problems and
continuity of the tangential vector component for H(curl) problems.
The method of finite elements applied to H(div) and H(curl) problems and its implementation has been well documented,
see for instance [16], which includes higher order polynomials defined through hierarchical bases. A user can find many
software codes (for instance NGSOLVE [14] or HERMES [15]) written in object oriented languages allowing for higher
order elements defined on elements with curved boundaries. These codes are very powerful, capable of high complexity
computations, and they provide certain flexibility via a user interface. However, such codes are not easy to understand and
modify unless one is quite familiar with the code. We believe that our MATLAB code is more convenient for students and
researchers who wish to become familiar with edge elements and prefer to have their own implementation. We consider
the lowest order linear edge elements defined on 2D triangles and 3D tetrahedra only. However, it is straightforward to
extend the code to use higher order elements, since the assembly routines remain almost the same regardless of the element
order.
There are plenty of papers [4, 5, 7, 8] dedicated to implementing vectorized FEM assembly routines for nodal elements
in MATLAB. In [8] the authors also discuss the Raviart-Thomas element in 3D, but do not provide the program code.
The iFEM package [4] has an efficient implementation of FEM assembly routines for various linear and higher order
elements. The paper [2] considers the implementation of Raviart-Thomas elements (in a non-vectorized way), providing a
good inspiration for the implementation of a multigrid based solver for H(div) majorant minimization [17] by the second
author.
Our implementation generalizes the approach of [11] to work with arbitrary affine finite elements. It is based again on
operations with long vectors and arrays in MATLAB and it is reasonably scalable for large size problems. On a typical
computer with a decent processor and enough system memory, the 2D/3D assemblies of FEM matrices are very fast. For
example, a 2D assembly of matrices with around 10 million rows takes less than a minute. Vectorization of calculations
typically requires more system memory, but the performance degrades only when the system memory becomes full. The
software described in this paper is available for download at MATLAB Central at
http://www.mathworks.com/matlabcentral/fileexchange/46635
It also includes an implementation of the linear nodal finite element in 2D and 3D with the generalized approach, so an
interested reader can compare its performance with that of the code described in [11]. The key idea in our generalized
approach is to vectorize the integration procedure of scalar and vector valued functions on affine meshes. This also allows
for fast evaluation of norms of functions.

∗ Corresponding author at: P.O. Box 35 (Agora), FI-40014 University of Jyväskylä, Finland
Postprint of the paper DOI: 10.1016/j.amc.2015.03.105 published in Applied Mathematics and Computation 2015
The paper is divided as follows: In Section 2 we briefly describe the implemented linear edge elements. In Section 3
we go through the particular constructions related to the implementation of the elements and vectorization details. We
also show the performance of the vectorized assembly routines with respect to time and scalability. Section 4 illustrates
two applications of edge elements: a functional majorant minimization in a posteriori error analysis and solving of an
electromagnetic problem.

2. Linear edge elements

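We denote by Ω an open, bounded, and connected Lipschitz domain in R^d, where d ∈ {2, 3} denotes the space
dimension. The divergence (2D and 3D) and rotation (3D) of a vector valued function w : Ω → R^d are defined as

\[
\operatorname{div} w := \sum_{i=1}^{d} \partial_i w_i
\quad\text{and}\quad
\operatorname{curl} w := \begin{pmatrix} \partial_2 w_3 - \partial_3 w_2 \\ \partial_3 w_1 - \partial_1 w_3 \\ \partial_1 w_2 - \partial_2 w_1 \end{pmatrix}.
\]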

We consider two types of rotation operators in 2D, the vector operator $\mathbf{curl}$ and the scalar operator $\operatorname{curl}$,

\[
\mathbf{curl}\, f := \begin{pmatrix} \partial_2 f \\ -\partial_1 f \end{pmatrix}
\quad\text{and}\quad
\operatorname{curl} w := \partial_1 w_2 - \partial_2 w_1,
\]

applied to a scalar function f : Ω → R and to a vector function w : Ω → R^2. The vector operator $\mathbf{curl}$ is frequently
called the "co-gradient" in the literature, and is often denoted by ∇⊥. The operators give rise to the standard Sobolev spaces:

\[
H(\operatorname{div}, \Omega) := \{ v \in L^2(\Omega, \mathbb{R}^d) \mid \operatorname{div} v \in L^2(\Omega) \},
\]
\[
H(\operatorname{curl}, \Omega) :=
\begin{cases}
\{ v \in L^2(\Omega, \mathbb{R}^3) \mid \operatorname{curl} v \in L^2(\Omega, \mathbb{R}^3) \} & \text{if } d = 3, \\
\{ v \in L^2(\Omega, \mathbb{R}^2) \mid \operatorname{curl} v \in L^2(\Omega) \} & \text{if } d = 2,
\end{cases}
\]

where L^2 denotes the space of square Lebesgue integrable functions. We will denote the L^2-norm of scalar and vector
valued functions by ‖·‖ := ‖·‖_{L^2(Ω)}. Assuming that Ω is discretized by a triangular (2D) or a tetrahedral (3D) mesh T,
Raviart-Thomas and Nédélec elements represent basis functions in H(div, T) and H(curl, T) spaces.
In the case of the lowest order (linear) Raviart-Thomas and Nédélec elements, there is one global degree of freedom
(dof), i.e., one global basis function, related to either each edge (2D and 3D), or each face (3D) of a mesh T. By
construction, the global Raviart-Thomas basis functions and the Nédélec basis functions in 2D are nonzero only in the
two elements that share the edge/face related to the basis function. In 3D the global Nédélec basis function is
nonzero in all the elements sharing the related edge, and the number of these elements is usually more than two.

[Figure 1 panels, left to right: 2D Raviart-Thomas (edges, #dof=3), 2D Nédélec (edges, #dof=3), 3D Raviart-Thomas
(faces, #dof=4), 3D Nédélec (edges, #dof=6). Axes: x̂1, x̂2 in 2D and x̂1, x̂2, x̂3 in 3D; each numbered edge ê_i or
face f̂_i is drawn with its unit tangent t̂_i or unit normal n̂_i.]

Figure 1: Degrees of freedom of linear edge elements in the reference configuration K̂.

We denote the global edge/face basis functions by $\eta^{RT}$ and $\eta^{Ned}$, and by $x = (x_1, x_2, x_3)^T$ the spatial variable in Ω.
The notation for reference basis functions and spatial variable is obtained by simply adding the hat, i.e., $\hat{x}$ denotes the
spatial variable in the reference element $\hat{K}$. We will use the unit triangle in 2D and the unit tetrahedron in 3D as the
reference elements. We denote by $\hat{e}_i$ the i'th edge of the reference triangle or tetrahedron, and by $\hat{f}_i$ the i'th face of the
reference tetrahedron. The numbering of the edges and faces, i.e., the numbering of the degrees of freedom in the reference
elements, can be seen in Figure 1. In the following, $F_K$ denotes the affine element mapping $F_K(\hat{x}) := B_K \hat{x} + b_K$ from the
reference element $\hat{K}$ to an element K in the mesh.
A finite element is defined by the triplet {K̂, R̂, Â}, where K̂ is the reference configuration, R̂ the finite space of
functions defined on the reference configuration, and  is the set of linearly independent degrees of freedom. The reference
configurations we have already chosen. We also need a mapping which takes functions from R̂ and maps them from K̂ to
an element K on the mesh T . These mappings are called Piola mappings.

2.1. Raviart-Thomas element


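The linear Raviart-Thomas element is based on the spaces (see, e.g., [9, 12])

\[
\text{2D: } \hat{R} = \left\langle \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \end{pmatrix} \right\rangle,
\qquad
\text{3D: } \hat{R} = \left\langle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \hat{x}_3 \end{pmatrix} \right\rangle,
\]

and the degrees of freedom for $\hat{u} \in \hat{R}$ read as

\[
\text{2D: } \hat{A} = \Big\{ \hat{\alpha}_i(\hat{u}) = \int_{\hat{e}_i} \hat{n}_i \cdot \hat{u} \, d\hat{s}, \ i \in \{1, 2, 3\} \Big\},
\qquad
\text{3D: } \hat{A} = \Big\{ \hat{\alpha}_i(\hat{u}) = \int_{\hat{f}_i} \hat{n}_i \cdot \hat{u} \, d\hat{s}, \ i \in \{1, 2, 3, 4\} \Big\}
\tag{1}
\]

for every edge $\hat{e}_i$ in 2D, or face $\hat{f}_i$ in 3D, in the corresponding reference elements $\hat{K}$. There are three dofs in 2D and four in
3D. Here $\hat{n}_i$ is the unit normal vector of the edge $\hat{e}_i$, or the face $\hat{f}_i$. One has to choose which of the two possible unit
normal vectors to use. The standard choice of outer unit normals is depicted in Figure 1. The requirement $\hat{\alpha}_i(\hat{\eta}_j^{RT}) = \delta_{ij}$
(where $\delta_{ij}$ is the Kronecker delta) gives us the reference basis functions of the Raviart-Thomas element:

\[
\text{2D: } \hat{\eta}_1^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \end{pmatrix}, \quad
\hat{\eta}_2^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 - 1 \\ \hat{x}_2 \end{pmatrix}, \quad
\hat{\eta}_3^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 - 1 \end{pmatrix},
\]
\[
\text{3D: } \hat{\eta}_1^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \hat{x}_3 - 1 \end{pmatrix}, \quad
\hat{\eta}_2^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 - 1 \\ \hat{x}_3 \end{pmatrix}, \quad
\hat{\eta}_3^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 - 1 \\ \hat{x}_2 \\ \hat{x}_3 \end{pmatrix}, \quad
\hat{\eta}_4^{RT}(\hat{x}) = \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \hat{x}_3 \end{pmatrix}.
\]

In order to preserve normal continuity of the reference basis functions, we need to use the so-called Piola mappings. The
values and the divergence values are mapped as follows (see, e.g., [3]):

\[
\eta^{RT}|_K(x) = \frac{1}{\det B_K} B_K \hat{\eta}^{RT}(F_K^{-1}(x))
\quad\text{and}\quad
\operatorname{div} \eta^{RT}|_K(x) = \frac{1}{\det B_K} \operatorname{div} \hat{\eta}^{RT}(F_K^{-1}(x)).
\tag{2}
\]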
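As a quick sanity check of the 2D formulas above, the following MATLAB sketch evaluates the degrees of freedom (1) on the three reference basis functions and then checks that the Piola mapping (2) leaves the normal flux through each edge unchanged. The edge ordering (edge i opposite to node i, traversed counter-clockwise), the example matrix B, and all variable names are assumptions made for illustration; they are not taken from the software package.

```matlab
% Reference triangle nodes (rows) and the 2D Raviart-Thomas reference basis functions.
P = [0 0; 1 0; 0 1];
eta = { @(x) [x(1);   x(2)  ], ...
        @(x) [x(1)-1; x(2)  ], ...
        @(x) [x(1);   x(2)-1] };
% Assumed edge ordering: edge i is opposite to node i, traversed counter-clockwise.
edges = [2 3; 3 1; 1 2];
% Example affine map matrix (arbitrary choice with positive determinant).
B = [2 1; 0.5 3];

A    = zeros(3,3);   % A(i,j)    = alpha_i(eta_j) on the reference element
flux = zeros(3,3);   % flux(i,j) = normal flux of the mapped eta_j through the mapped edge i
for i = 1:3
    a = P(edges(i,1),:)';  b = P(edges(i,2),:)';
    e  = b - a;                  % reference edge vector
    n  = [e(2); -e(1)];          % outer normal scaled by the edge length
    eK = B*e;                    % mapped edge vector
    nK = [eK(2); -eK(1)];        % mapped normal scaled by the mapped edge length
    mid = (a + b)/2;             % midpoint rule is exact here (integrand is linear)
    for j = 1:3
        A(i,j)    = n'  * eta{j}(mid);
        flux(i,j) = nK' * ((1/det(B)) * B * eta{j}(mid));   % Piola mapping (2)
    end
end
disp(A)      % identity matrix up to round-off
disp(flux)   % also the identity: the normal flux is preserved
```

Both matrices come out as the identity, which is the property that lets normal continuity be enforced edge by edge.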

2.2. Nédélec element


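The linear Nédélec element is based on the spaces (see, e.g., [9, 13])

\[
\text{2D: } \hat{R} = \left\langle \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} \hat{x}_2 \\ -\hat{x}_1 \end{pmatrix} \right\rangle,
\qquad
\text{3D: } \hat{R} = \left\langle \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 0 \\ -\hat{x}_3 \\ \hat{x}_2 \end{pmatrix}, \begin{pmatrix} \hat{x}_3 \\ 0 \\ -\hat{x}_1 \end{pmatrix}, \begin{pmatrix} -\hat{x}_2 \\ \hat{x}_1 \\ 0 \end{pmatrix} \right\rangle,
\]

and the degrees of freedom for $\hat{u} \in \hat{R}$ in both dimensions are related to the edges of the elements:

\[
\hat{A} = \Big\{ \hat{\alpha}_i(\hat{u}) = \int_{\hat{e}_i} \hat{t}_i \cdot \hat{u} \, d\hat{s}, \ i \in \{1, 2, \ldots\} \Big\}
\tag{3}
\]

for every edge $\hat{e}_i$ in the reference configuration $\hat{K}$. There are three dofs in 2D and six in 3D. Here $\hat{t}_i$ is the unit tangential
vector of the edge $\hat{e}_i$. Similarly to the Raviart-Thomas element, one has to choose which direction for the unit tangential
vectors to use. Our choice is depicted in Figure 1. The requirement $\hat{\alpha}_i(\hat{\eta}_j^{Ned}) = \delta_{ij}$ gives us the reference basis functions
of the Nédélec element:

\[
\text{2D: } \hat{\eta}_1^{Ned}(\hat{x}) = \begin{pmatrix} -\hat{x}_2 \\ \hat{x}_1 \end{pmatrix}, \quad
\hat{\eta}_2^{Ned}(\hat{x}) = \begin{pmatrix} -\hat{x}_2 \\ \hat{x}_1 - 1 \end{pmatrix}, \quad
\hat{\eta}_3^{Ned}(\hat{x}) = \begin{pmatrix} 1 - \hat{x}_2 \\ \hat{x}_1 \end{pmatrix},
\]
\[
\text{3D: } \hat{\eta}_1^{Ned}(\hat{x}) = \begin{pmatrix} 1 - \hat{x}_3 - \hat{x}_2 \\ \hat{x}_1 \\ \hat{x}_1 \end{pmatrix}, \quad
\hat{\eta}_2^{Ned}(\hat{x}) = \begin{pmatrix} \hat{x}_2 \\ 1 - \hat{x}_3 - \hat{x}_1 \\ \hat{x}_2 \end{pmatrix}, \quad
\hat{\eta}_3^{Ned}(\hat{x}) = \begin{pmatrix} \hat{x}_3 \\ \hat{x}_3 \\ 1 - \hat{x}_2 - \hat{x}_1 \end{pmatrix},
\]
\[
\hat{\eta}_4^{Ned}(\hat{x}) = \begin{pmatrix} -\hat{x}_2 \\ \hat{x}_1 \\ 0 \end{pmatrix}, \quad
\hat{\eta}_5^{Ned}(\hat{x}) = \begin{pmatrix} 0 \\ -\hat{x}_3 \\ \hat{x}_2 \end{pmatrix}, \quad
\hat{\eta}_6^{Ned}(\hat{x}) = \begin{pmatrix} \hat{x}_3 \\ 0 \\ -\hat{x}_1 \end{pmatrix}.
\]

Again, we need to use a Piola mapping in order to preserve the tangential continuity (see, e.g., [9, 13]). The values are
mapped as follows:

\[
\eta^{Ned}|_K(x) = B_K^{-T} \hat{\eta}^{Ned}(F_K^{-1}(x)).
\tag{4}
\]

The rotation is mapped differently depending on the dimension:

\[
\text{2D: } \operatorname{curl} \eta^{Ned}|_K(x) = \frac{1}{\det B_K} \operatorname{curl} \hat{\eta}^{Ned}(F_K^{-1}(x)),
\tag{5}
\]
\[
\text{3D: } \operatorname{curl} \eta^{Ned}|_K(x) = \frac{1}{\det B_K} B_K \operatorname{curl} \hat{\eta}^{Ned}(F_K^{-1}(x)).
\tag{6}
\]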
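The analogous check for the 2D Nédélec element can be sketched in MATLAB as follows: the tangential degrees of freedom (3) are evaluated on the reference basis functions, and the mapping (4) is shown to preserve the tangential integral along each mapped edge. As before, the counter-clockwise edge ordering (edge i opposite to node i), the matrix B, and the variable names are illustrative assumptions.

```matlab
% Reference triangle nodes (rows) and the 2D Nedelec reference basis functions.
P = [0 0; 1 0; 0 1];
eta = { @(x) [-x(2);  x(1)  ], ...
        @(x) [-x(2);  x(1)-1], ...
        @(x) [1-x(2); x(1)  ] };
% Assumed edge ordering: edge i is opposite to node i, traversed counter-clockwise.
edges = [2 3; 3 1; 1 2];
% Example affine map matrix (arbitrary, invertible).
B = [2 1; 0.5 3];

A    = zeros(3,3);   % A(i,j)    = alpha_i(eta_j) on the reference element
circ = zeros(3,3);   % circ(i,j) = tangential integral of the mapped eta_j along mapped edge i
for i = 1:3
    a = P(edges(i,1),:)';  b = P(edges(i,2),:)';
    e = b - a;                   % tangent scaled by the edge length
    mid = (a + b)/2;             % midpoint rule is exact here (integrand is linear)
    for j = 1:3
        A(i,j)    = e' * eta{j}(mid);
        % Mapped tangent is B*e, mapped value is B^{-T}*eta_hat, see (4).
        circ(i,j) = (B*e)' * (B' \ eta{j}(mid));
    end
end
disp(A)      % identity matrix up to round-off
disp(circ)   % also the identity: tangential integrals are preserved
```

The second result holds for any invertible B, because (B e)' (B^{-T} v) = e' v.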

2.3. Orientation of local degrees of freedom


In order to obtain the global basis functions $\eta^{RT}$ and $\eta^{Ned}$, the transformations described in the previous sections are
not enough. A global basis function is related to more than one element. No consideration has yet been made to make
sure that the local orientation of the degrees of freedom (1) and (3) in these different elements is the same. The orientation
must be the same in order for the Raviart-Thomas and Nédélec elements to produce functions whose normal component, or
tangential component (respectively), is continuous at element interfaces.
Take for example the Raviart-Thomas element in 2D. Let $K_n$ and $K_m$ be two elements in a mesh which share an edge
(see Figure 3), and let $\eta^{RT}$ be the global basis function related to this edge. We denote by $\hat{\eta}_k^{RT}$ and $\hat{\eta}_l^{RT}$ the reference
basis functions which we will transform from $\hat{K}$ to $K_n$ and $K_m$ respectively, in order to obtain the global basis function.
By taking a look at the dofs (1), we see that we are always using the outer unit normals to compute the local basis
functions. If we simply use the transformation (2), the normal components of the values at the edge might be the opposite
of each other. This depends on whether or not the element mappings $F_{K_n}$ and $F_{K_m}$ preserve orientation. If $\det B_{K_n} > 0$ and
$\det B_{K_m} < 0$, the element mapping $F_{K_n}$ preserves the counter-clockwise orientation of the reference element, and $F_{K_m}$ is
oriented clockwise. This means that on the common edge the orientation is in the same direction, and the transformation
(2) is enough for both elements. However, otherwise the orientation on the common edge will be the opposite, and one of
the transformations must be multiplied by −1. The global basis function is thus obtained by

\[
\eta^{RT}|_{K_n}(x) = [\mathrm{sign}_{k,K_n}] \frac{1}{\det B_{K_n}} B_{K_n} \hat{\eta}_k^{RT}(F_{K_n}^{-1}(x))
\quad\text{and}\quad
\eta^{RT}|_{K_m}(x) = [\mathrm{sign}_{l,K_m}] \frac{1}{\det B_{K_m}} B_{K_m} \hat{\eta}_l^{RT}(F_{K_m}^{-1}(x)),
\tag{7}
\]

where

\[
\begin{array}{lll}
[\mathrm{sign}_{k,K_n}] = +1, & [\mathrm{sign}_{l,K_m}] = +1 & \text{if } \det B_{K_n} > 0, \ \det B_{K_m} < 0, \\
[\mathrm{sign}_{k,K_n}] = +1, & [\mathrm{sign}_{l,K_m}] = -1 & \text{if } \det B_{K_n} > 0, \ \det B_{K_m} > 0, \\
[\mathrm{sign}_{k,K_n}] = -1, & [\mathrm{sign}_{l,K_m}] = +1 & \text{if } \det B_{K_n} < 0, \ \det B_{K_m} < 0, \\
[\mathrm{sign}_{k,K_n}] = -1, & [\mathrm{sign}_{l,K_m}] = -1 & \text{if } \det B_{K_n} < 0, \ \det B_{K_m} > 0.
\end{array}
\]

We call these values the sign data related to each of the two elements. Note that the above means that the global basis
function is obtained simply by

\[
\eta^{RT}|_{K_n}(x) = + \frac{1}{|\det B_{K_n}|} B_{K_n} \hat{\eta}_k^{RT}(F_{K_n}^{-1}(x))
\quad\text{and}\quad
\eta^{RT}|_{K_m}(x) = - \frac{1}{|\det B_{K_m}|} B_{K_m} \hat{\eta}_l^{RT}(F_{K_m}^{-1}(x)),
\]

but we will use (7) since it is more convenient to implement in program code.

The situation is the same for the 3D Raviart-Thomas element and the 2D Nédélec element. For the Nédélec element in
3D one needs to be more careful since a global basis function may be nonzero in a relatively large patch of elements, and
the relevant orientation is related to an edge. Note also that the same sign data must be used when transforming the
divergence or rotation of the basis functions.

2.4. Finite element matrices


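We are interested in the assembly of the mass matrices $M^{RT}$, $M^{Ned}$ and the stiffness matrices $K^{RT}$, $K^{Ned}$ defined by

\[
M^{RT}_{ij} = \int_\Omega \eta_i^{RT} \cdot \eta_j^{RT} \, dx, \qquad
M^{Ned}_{ij} = \int_\Omega \eta_i^{Ned} \cdot \eta_j^{Ned} \, dx,
\]
\[
K^{RT}_{ij} = \int_\Omega \operatorname{div} \eta_i^{RT} \, \operatorname{div} \eta_j^{RT} \, dx, \qquad
K^{Ned}_{ij} = \int_\Omega \operatorname{curl} \eta_i^{Ned} \cdot \operatorname{curl} \eta_j^{Ned} \, dx,
\]

where the indexes i and j are the global numbering of the degrees of freedom, i.e., they are related to the edges or faces
of a mesh. By using the Piola mappings (with correct orientations), we are able to assemble the local matrices using the
reference element.
By using (7), the local matrices related to the global matrices $M^{RT}$ and $K^{RT}$ can be calculated on each element K ∈ T
by

\[
M^{RT,K}_{kl} = \frac{1}{|\det B_K|} \int_{\hat{K}} [\mathrm{sign}_{k,K}] B_K \hat{\eta}_k^{RT}(\hat{x}) \cdot [\mathrm{sign}_{l,K}] B_K \hat{\eta}_l^{RT}(\hat{x}) \, d\hat{x},
\]
\[
K^{RT,K}_{kl} = \frac{1}{|\det B_K|} \int_{\hat{K}} [\mathrm{sign}_{k,K}] \operatorname{div} \hat{\eta}_k^{RT}(\hat{x}) \, [\mathrm{sign}_{l,K}] \operatorname{div} \hat{\eta}_l^{RT}(\hat{x}) \, d\hat{x},
\tag{8}
\]

where $\hat{K}$ is the reference element. The indexes k and l run through all the local basis functions in the element: k, l ∈ {1, 2, 3}
in 2D, and k, l ∈ {1, 2, 3, 4} in 3D.
Similarly, by using (4) (and considering the correct orientations, see Section 2.3) the local mass matrices related to the
global mass matrix $M^{Ned}$ can be calculated on each element K ∈ T by

\[
M^{Ned,K}_{kl} = |\det B_K| \int_{\hat{K}} [\mathrm{sign}_{k,K}] B_K^{-T} \hat{\eta}_k^{Ned}(\hat{x}) \cdot [\mathrm{sign}_{l,K}] B_K^{-T} \hat{\eta}_l^{Ned}(\hat{x}) \, d\hat{x},
\]

and by using (5)–(6) the local stiffness matrices related to the global stiffness matrix $K^{Ned}$ can be calculated by

\[
\text{2D: } K^{Ned,K}_{kl} = \frac{1}{|\det B_K|} \int_{\hat{K}} [\mathrm{sign}_{k,K}] \operatorname{curl} \hat{\eta}_k^{Ned}(\hat{x}) \, [\mathrm{sign}_{l,K}] \operatorname{curl} \hat{\eta}_l^{Ned}(\hat{x}) \, d\hat{x},
\]
\[
\text{3D: } K^{Ned,K}_{kl} = \frac{1}{|\det B_K|} \int_{\hat{K}} [\mathrm{sign}_{k,K}] B_K \operatorname{curl} \hat{\eta}_k^{Ned}(\hat{x}) \cdot [\mathrm{sign}_{l,K}] B_K \operatorname{curl} \hat{\eta}_l^{Ned}(\hat{x}) \, d\hat{x}.
\]

The indexes k and l run through all the local basis functions in the reference element: k, l ∈ {1, 2, 3} in 2D, and
k, l ∈ {1, 2, 3, 4, 5, 6} in 3D.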
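To make the local formulas concrete, here is a minimal MATLAB sketch that evaluates the local Raviart-Thomas matrices of (8) for a single triangle. The matrix B, the sign data, and the use of the edge-midpoint quadrature rule (exact for quadratic integrands) are assumptions chosen for illustration. Since every 2D reference basis function has divergence 2 and the reference triangle has area 1/2, the stiffness entries should equal 2 sign_k sign_l / |det B_K|.

```matlab
% Example element data (assumed): affine map matrix and edge signs of one triangle.
B = [2 1; 0.5 3];
detB = det(B);
signs = [1 -1 1];

% 2D Raviart-Thomas reference basis functions and their (constant) divergences.
eta = { @(x) [x(1);   x(2)  ], ...
        @(x) [x(1)-1; x(2)  ], ...
        @(x) [x(1);   x(2)-1] };
div_eta = [2 2 2];

% Edge-midpoint quadrature on the reference triangle: weights sum to the area 1/2.
ip = [0.5 0.5; 0 0.5; 0.5 0];
w  = [1 1 1] / 6;

M_loc = zeros(3);   % local mass matrix
K_loc = zeros(3);   % local stiffness matrix, formula (8)
for q = 1:3
    for k = 1:3
        for l = 1:3
            vk = signs(k) * B * eta{k}(ip(q,:));
            vl = signs(l) * B * eta{l}(ip(q,:));
            M_loc(k,l) = M_loc(k,l) + w(q) * (vk' * vl) / abs(detB);
            K_loc(k,l) = K_loc(k,l) + w(q) * ...
                (signs(k)*div_eta(k)) * (signs(l)*div_eta(l)) / abs(detB);
        end
    end
end

% Closed form of the stiffness matrix for comparison.
K_exact = 2 * (signs' * signs) / abs(detB);
disp(norm(K_loc - K_exact, 'fro'))   % close to zero
```

The loops here run over basis functions and quadrature points of one element only; the vectorized routines discussed in Section 3 perform the same arithmetic for all elements at once.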

3. Implementation of edge elements

We denote by #ω the number of elements in the set ω, and by N , E, F , and T the sets of nodes, edges, faces, and
elements, respectively. Note that faces F exist only in 3D. We need the following structures representing the mesh in
order to implement edge elements. The second column states the size of the structure, and the third column the meaning
of the structure.

nodes2coord #N × 2/3 nodes defined by their two/three coordinates in 2D/3D (in [11] coordinates)
edges2nodes #E × 2 edges defined by their two nodes in 2D/3D
faces2nodes #F × 3 faces defined by their three nodes in 3D

With these matrices available, we can then express every element by the list of its nodes, edges, or faces:

elems2nodes #T × 3/4 elements by their three/four nodes in 2D/3D (in [11] elements)
elems2edges #T × 3/6 elements by their three/six edges in 2D/3D
elems2faces #T × 4 elements by their four faces in 3D
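One possible way to build edges2nodes and elems2edges from elems2nodes in 2D is sketched below. This is not the package's get_edges() function; it is an illustrative alternative based on sort and unique, and the two-triangle mesh and the local edge convention (edge i opposite to node i) are assumptions.

```matlab
% Example mesh (assumed): the unit square split into two triangles.
nodes2coord = [0 0; 1 0; 1 1; 0 1];
elems2nodes = [1 2 3; 1 3 4];
nelems = size(elems2nodes, 1);

% Local edges under the assumed convention "edge i is opposite to node i":
% edge 1 = nodes (2,3), edge 2 = nodes (3,1), edge 3 = nodes (1,2).
loc = [elems2nodes(:,[2 3]); elems2nodes(:,[3 1]); elems2nodes(:,[1 2])];
loc = sort(loc, 2);                        % node pairs independent of direction

% Unique node pairs are the global edges; idx maps every local edge to its global index.
[edges2nodes, ~, idx] = unique(loc, 'rows');
elems2edges = reshape(idx, nelems, 3);     % column j = global index of local edge j

disp(edges2nodes)   % five edges: (1,2), (1,3), (1,4), (2,3), (3,4)
disp(elems2edges)   % rows [4 2 1] and [5 3 2]; edge 2 is shared by both triangles
```

The row indices of edges2nodes then serve as the global numbering of the edge degrees of freedom.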

[Figure 2 content: a triangle K_n with nodes å, b̊, c̊ and edges a, b, c. Row n of elems2nodes is (å b̊ c̊) and row n of
elems2edges is (a b c). Rows å, b̊, c̊ of nodes2coord hold the node coordinates (å1 å2), (b̊1 b̊2), (c̊1 c̊2), and rows
a, b, c of edges2nodes hold the edge nodes (a1 a2), (b1 b2), (c1 c2).]

Figure 2: Elements by their nodes and edges, i.e., global numbering of degrees of freedom for 2D linear finite elements.

In 2D both the linear Raviart-Thomas element and the linear Nédélec element have a degree of freedom related to
each of the three edges of the reference triangle, totalling three dofs. In 3D the linear Nédélec element has a dof related to
each of the six edges, and the Raviart-Thomas element will have a dof related to each of the four faces. Thus, the global
numbering of degrees of freedom is given by the row indices of edges2nodes or faces2nodes. For a particular element K
in the mesh, the global dofs related to it are then given by the structures elems2edges or elems2faces, respectively. For
nodal elements the global numbering of dofs is given by the row indices of nodes2coord, and the dofs related to a particular
element K are given by elems2nodes. In Figure 2 we have further illustrated the structure of the mesh data in 2D.
Since the degrees of freedom are integrals over edges or faces, we need to pay attention to orientation (see Sections 2.3
and 2.4). In practice we need to know how every edge/face of every element is oriented. Orientation is naturally given
either by +1 or −1. We need the following structures:

signs_e #T × 3/6 +1 or −1 for every edge of an element, corresponding to elems2edges


signs_f #T × 4 +1 or −1 for every face of an element in 3D, corresponding to elems2faces

In 2D obtaining the sign data for an element Ki can be conveniently done by examining elems2nodes(i,:). The first
edge of Ki (elems2edges(i,1)) is the edge from node 2 (elems2nodes(i,2)) to node 3 (elems2nodes(i,3)). We can then
simply agree that if the global node indices satisfy elems2nodes(i,2) > elems2nodes(i,3), we assign signs_e(i,1) = 1, and
signs_e(i,1) = −1 otherwise. This gives us the signs, or their opposites, as described in Section 2.3. This sign data can
be used for both Raviart-Thomas element and the Nédélec element in 2D. The data structures are illustrated in Figure 3.
The procedure of determining the signs for the 3D elements is straightforward as well, but we will not comment on
it here. In our software package the edges in 2D and 3D are calculated by the function get_edges() and the orientation
related to edges is calculated by the function signs_edges(). In 3D the faces are calculated by the function get_faces(),
and the orientation related to faces is calculated by signs_faces().
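The 2D sign rule described above can be sketched in vectorized MATLAB as follows. The text only states the rule for the first edge (from node 2 to node 3); applying the same comparison to edge 2 (node 3 to node 1) and edge 3 (node 1 to node 2) is an assumption made here by analogy, and the code is not the package's signs_edges() function.

```matlab
% Example mesh connectivity (assumed): two counter-clockwise triangles sharing edge (1,3).
elems2nodes = [1 2 3; 1 3 4];

% Start and end nodes of the three local edges of every element:
% edge 1: node 2 -> node 3, edge 2: node 3 -> node 1, edge 3: node 1 -> node 2.
start_node = elems2nodes(:, [2 3 1]);
end_node   = elems2nodes(:, [3 1 2]);

% Sign +1 if the global index of the start node is larger, -1 otherwise.
signs_e = ones(size(elems2nodes));
signs_e(start_node < end_node) = -1;

disp(signs_e)   % both rows are [-1 1 -1]
```

In this example the shared edge is local edge 2 of the first element and local edge 3 of the second one, and it receives opposite signs (+1 and −1), as expected when both elements are oriented counter-clockwise.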


[Figure 3 panels: 2D Nédélec (tangential direction) and 2D Raviart-Thomas (normal direction). Two elements K_m and K_n
share the edge b; row n of elems2edges is (a b c) and row n of signs_e is (a± +1 c±).]

Figure 3: Orientation of 2D edge elements sharing an edge, when both elements are oriented counter-clockwise, and c̊ > å. The thick line
denotes the "positive direction".

3.1. Vectorized integration procedure


As stated in the introduction, the key idea of this paper is to vectorize the integration procedure of an arbitrary function
on an arbitrary mesh. The main ingredient is how to efficiently use integration quadratures via the reference element. We
demonstrate our idea by explaining how to calculate the (squared) L^2-norm ‖f‖^2 of a function f ∈ L^2(Ω). Provided that
we have the structures nodes2coord and elems2nodes available, this is achieved (in the folder /example_majorant/) with the
following two lines:

[B_K,b_K,B_K_det] = affine_transformations(nodes2coord,elems2nodes);
f_L2norm = norm_L2( elems2nodes, B_K, b_K, B_K_det, f );

On the first line we obtain the affine transformations FK (x̂) = BK x̂ + bK and determinants of BK for all elements K:

B_K d × d × #T matrix parts BK


b_K #T × d vector parts bK
B_K_det #T × 1 the determinants det BK
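A minimal sketch of how such data could be computed for all triangles at once in 2D is given below. It is not the package's affine_transformations() function; the mesh and the assumption that F_K maps the reference nodes (0,0), (1,0), (0,1) to the first, second and third node of each element are illustrative.

```matlab
% Example mesh (assumed): the unit square split into two triangles.
nodes2coord = [0 0; 1 0; 1 1; 0 1];
elems2nodes = [1 2 3; 1 3 4];
nelems = size(elems2nodes, 1);

% Coordinates of the three vertices of every element, each of size #T x 2.
p1 = nodes2coord(elems2nodes(:,1), :);
p2 = nodes2coord(elems2nodes(:,2), :);
p3 = nodes2coord(elems2nodes(:,3), :);

% F_K maps (0,0)->p1, (1,0)->p2, (0,1)->p3, hence B_K = [p2-p1, p3-p1] and b_K = p1.
B_K = zeros(2, 2, nelems);
B_K(:,1,:) = reshape((p2 - p1)', 2, 1, nelems);
B_K(:,2,:) = reshape((p3 - p1)', 2, 1, nelems);
b_K = p1;

% Determinants of all 2x2 matrices without a loop over elements.
B_K_det = squeeze(B_K(1,1,:).*B_K(2,2,:) - B_K(1,2,:).*B_K(2,1,:));

disp(B_K_det')   % both determinants equal 1 (twice the triangle area)
```

The array shapes match the table above: B_K is d × d × #T, b_K is #T × d, and B_K_det is #T × 1.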

The calculation of this data is done in a vectorized manner. On the second line we calculate the norm. The code of the
function /example_majorant/norm_L2.m is
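function fnorm = norm_L2( elems2nodes, B_K, b_K, B_K_det, f )
1 B_K_detA = abs(B_K_det);                       % |det B_K| for all elements
2 dim = size(B_K,1);                             % space dimension d
3 [ip,w,nip] = intquad(6,dim);                   % quadrature of order 6 on the reference element
4 nelems = size(elems2nodes,1);
5 fnorm = zeros(nelems,1);                       % elementwise contributions
6 for i=1:nip                                    % loop over integration points, not elements
7 F_K_ip = squeeze(amsv(B_K, ip(i,:)))' + b_K;   % i'th integration point mapped to every element
8 fval = f(F_K_ip);
9 fnorm = fnorm + w(i) .* B_K_detA .* fval.^2;
10 end
11 fnorm = sum(fnorm);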

On line 2 we deduce the dimension of the mesh. On line 3 the function [ip,w,nip] = intquad(po,dim) returns an integration
quadrature of order po in the reference element. We use integration quadratures for triangles and tetrahedra from [6]
and [18], respectively. In this example we use quadrature order 6, so the calculation of the L^2-norm is exact (up to machine
precision) for polynomials of order 3 and less. The quadrature consists of the integration points ip and the weights w. The
variable nip is the number of integration points. On the lines 4 and 5 we deduce the number of elements in the mesh in
order to initialize the structure fnorm.
Note that the for-loop on line 6 is not over elements, but over integration points. This is what we mean by vectorization
of the for-loop over elements. Essentially we are replacing this loop with another, much smaller loop. Of course, since
all the affine mappings and other data have to be available for all elements at the same time, this method requires more
system memory.
On line 7 we transform the i'th integration point to the mesh for all elements K at the same time, and put this data
into the structure F_K_ip. We have used here some functionality from the folder /path/library_vectorization/, which was
also used in the vectorization of nodal elements in [11]. This folder contains functions which perform certain operations
between matrices and vectors in a vectorized manner. The function amsv.m from this folder takes in the
matrices B_K and does the necessary multiplication with the i'th integration point ip(i,:) for all entries simultaneously.
On line 8 we calculate the values of the function f on all of these points, and on line 9 we add the contributions of the
i'th integration point to fnorm.
After going through all the integration points, the structure fnorm contains the elementwise contributions of the norm,
i.e., fnorm(i) = ‖f‖^2_{K_i}, where K_i is the element described by its nodes in elems2nodes(i,:). On the last line the elementwise
contributions are summed together to obtain ‖f‖^2.
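The same idea can be tried without the package in a self-contained 2D sketch. The code below replaces intquad() and amsv() by an explicit edge-midpoint rule (exact for polynomials up to degree 2) and an explicit evaluation of F_K; the mesh, the integrand, and all variable names are assumptions made for illustration.

```matlab
% Example mesh (assumed): the unit square split into two triangles.
nodes2coord = [0 0; 1 0; 1 1; 0 1];
elems2nodes = [1 2 3; 1 3 4];
f = @(x) x(:,1);                 % example integrand f(x) = x_1; x holds one point per row

% Vertices of all elements and |det B_K|, computed for all elements at once.
p1 = nodes2coord(elems2nodes(:,1), :);
p2 = nodes2coord(elems2nodes(:,2), :);
p3 = nodes2coord(elems2nodes(:,3), :);
detA = abs((p2(:,1)-p1(:,1)).*(p3(:,2)-p1(:,2)) - (p3(:,1)-p1(:,1)).*(p2(:,2)-p1(:,2)));

% Edge-midpoint rule on the reference triangle; weights sum to the area 1/2.
ip = [0.5 0.5; 0 0.5; 0.5 0];
w  = [1 1 1] / 6;

fnorm = zeros(size(elems2nodes,1), 1);
for i = 1:size(ip,1)             % loop over integration points only
    % F_K(ip_i) = p1 + ip_i(1)*(p2 - p1) + ip_i(2)*(p3 - p1), for all elements at once
    x = p1 + ip(i,1)*(p2 - p1) + ip(i,2)*(p3 - p1);
    fnorm = fnorm + w(i) .* detA .* f(x).^2;
end
fnorm_sq = sum(fnorm);
disp(fnorm_sq)                   % integral of x_1^2 over the unit square, i.e. 1/3
```

For larger meshes only the first two lines change; the loop length stays equal to the number of integration points.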

3.2. Vectorized finite element assembly routine


The vectorized integration procedure of the previous section can be directly applied for finite element matrix assembly
routines. As an example, we go through the needed program code for calculating the stiffness matrix KRT with Raviart-
Thomas elements. We assume we have the mesh in the form of the structures nodes2coord and elems2nodes, i.e., we have
the node coordinates, and the representation of elements by their nodes.
[B_K,~,B_K_det] = affine_transformations(nodes2coord,elems2nodes);
[elems2faces,faces2nodes] = get_faces(elems2nodes);
signs_f = signs_faces(nodes2coord,elems2faces,faces2nodes,B_K);

On the first line we obtain the affine transformation matrices and the determinants. On the second line we obtain the
structure elems2faces, which is the representation of elements by their faces. Note that indeed the numbers in elems2faces
are indices to faces2nodes. More importantly, elems2faces is the global numbering of the degrees of freedom for all elements
K in the mesh. On the third line we calculate the orientations for faces in 3D. Then, we call

K_RT0 = stiffness_matrix_RT0(elems2faces,B_K_det,signs_f);
M_RT0 = mass_matrix_RT0(elems2faces,B_K,B_K_det,signs_f);

to assemble the stiffness and mass matrices. The main part of the function stiffness_matrix_RT0 is the vectorized assembly
routine:
function STIFF = stiffness_matrix_RT0( elems, B_K_det, signs )
1 dim = size(elems,2)-1;
2 nelems = size(elems,1);
3 B_K_detA = abs(B_K_det);
4 [ip,w,nip] = intquad(1,dim);
5 [~,dval,nbasis] = basis_RT0(ip);
6 STIFF = zeros(nbasis,nbasis,nelems);
7 for i=1:nip
8 for m=1:nbasis
9 for k=m:nbasis
10 STIFF(m,k,:) = squeeze(STIFF(m,k,:)) + ...
11 w(i) .* B_K_detA.^(-1) .* ...
12 ( signs(:,m) .* dval(i,:,m) ) .* ...
13 ( signs(:,k) .* dval(i,:,k) );
14 end
15 end
16 end
17 STIFF = copy_triu(STIFF);
18 ...

Note that this function does the assembly in both 2D and 3D, depending on the input variable elems.
On lines 1–4 we deduce the dimension of the problem, deduce the number of elements in the mesh, calculate the
absolute values of the determinants, and obtain the first order integration quadrature on the reference element. This
is enough since for the linear Raviart-Thomas element the basis function divergences are constants. The function
[val,dval,nbasis] = basis_RT0(ip) returns the values val and divergence values dval of the linear Raviart-Thomas
reference basis functions at the integration points. Since we are assembling the stiffness matrix, we need only the divergence
values. The variable nbasis is the number of basis functions per element. On line 6, the variable STIFF is initialized to be
of suitable size to contain all the local element matrices.
Note again that the outer for-loop on line 7 is not over elements, but over integration points. On lines 10–13 we
assemble the local matrix entry (m,k) (for the integration point i) for all elements at the same time. The assembly is
done according to (8). Note that since the matrix is symmetric, it is sufficient to assemble only the diagonal and upper
triangular entries, hence the indexing of the loop on line 9 begins from the previous loop index m, and not from 1. On line 17
the symmetric entries are copied to the lower triangular part of STIFF. After this, the global matrix is assembled from the
local matrices in STIFF, but we have excluded this part of the code here.
This assembly routine consists only of the normal matrix operations of MATLAB. However, in most of the assembly
routines we need to perform more complicated array operations. This functionality is provided by functions in
/path/library_vectorization/.
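The excluded last step, going from the local matrices to the global sparse matrix, could look roughly like the following MATLAB sketch. This is an assumption about one common way to do it with sparse(), not a reproduction of the package's code; the dof numbering in elems and the local matrices are made-up example data.

```matlab
% Example data (assumed): two elements with three dofs each, given by global dof indices.
elems = [4 2 1; 5 3 2];
nelems = size(elems, 1);
nbasis = size(elems, 2);

% Made-up symmetric local matrices, stored as nbasis x nbasis x nelems.
STIFF = zeros(nbasis, nbasis, nelems);
STIFF(:,:,1) = [2 -1 0; -1 2 -1; 0 -1 2];
STIFF(:,:,2) = ones(3);

% Index arrays of the same shape as STIFF:
% I(m,k,e) = elems(e,m) (global row), J(m,k,e) = elems(e,k) (global column).
I = repmat(reshape(elems', nbasis, 1, nelems), 1, nbasis, 1);
J = repmat(reshape(elems', 1, nbasis, nelems), nbasis, 1, 1);

% sparse() sums the values of repeated (row, column) pairs, which performs the assembly.
ndofs = max(elems(:));
K = sparse(I(:), J(:), STIFF(:), ndofs, ndofs);
disp(full(K))
```

In this example dof 2 is shared by both elements, so K(2,2) collects the contributions 2 and 1 from the two local matrices.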

3.3. Performance in 2D and 3D


To investigate the performance of our vectorized assembly routines, we chose an L-shaped domain in 2D, and the
unit cube in 3D. The computations were performed with MATLAB 7.13.0.564 (R2011b) on a computer with 64 Intel(R) Xeon(R)
CPU E7-8837 processors running at 2.67GHz, and 1 TB system memory. The computer is located at the University of
Jyväskylä. Results can be seen in Tables 1 and 2.
Uniform refinement results in 4 times more triangles in 2D, and 8 times more tetrahedra in 3D. Thus, in each refinement
step the optimal increase in time would be a factor of 4 in 2D and 8 in 3D. We see from Tables 1 and 2 that both 2D and 3D
assembly routines scale with satisfactory performance as the problem size is increased. In 2D, on level 14 the matrices
already had over 2.4 billion rows (one per edge), and the 1 TB system memory was still occupied by level 13 matrices. This
forced the computer to start using swap memory, which considerably slowed the calculation of the new ∼ 2.4 billion × 2.4 billion
matrices for level 14.
It is also notable that in 3D the calculation of the matrices $K^{Ned}$ and $M^{Ned}$ for the Nédélec element takes over twice the
time compared to the calculation of $K^{RT}$ and $M^{RT}$ even though there are more degrees of freedom for the Raviart-Thomas
matrices. The reason becomes evident when comparing the amount of algebraic operations that need to be calculated:
for example, the divergences of Raviart-Thomas basis functions in 3D are scalar valued, but the rotations of Nédélec basis
functions in 3D are vector valued.

       size of     assembly time in seconds (ratio to the previous level)
level  matrices    K^RT    M^RT    K^Ned    M^Ned
5 9 344 0.03 - 0.06 - 0.03 - 0.03 -
6 37 120 0.11 (3.6) 0.51 (8.5) 0.11 (3.6) 0.47 (15.6)
7 147 968 0.41 (3.7) 1.08 (2.1) 0.40 (3.6) 1.02 (2.1)
8 590 848 1.70 (4.1) 3.59 (3.3) 1.82 (4.5) 3.65 (3.5)
9 2 361 344 7.49 (4.4) 12.82 (3.5) 7.49 (4.1) 12.94 (3.5)
10 9 441 280 30.89 (4.1) 52.09 (4.0) 30.83 (4.1) 54.86 (4.2)
11 37 756 928 132.95 (4.3) 216.64 (4.1) 132.56 (4.2) 230.44 (4.2)
12 151 011 328 597.37 (4.4) 919.36 (4.2) 583.86 (4.4) 931.79 (4.0)
13 604 012 544 2620.11 (4.3) 3969.16 (4.3) 2840.51 (4.8) 4121.33 (4.4)
14 2 415 984 640 18333.25 (6.9) 33328.58 (8.3) 26781.41 (9.4) 37009.85 (8.9)

Table 1: 2D assembly times (in seconds) for an L-shaped domain Ω := (0, 1)^2 \ (1/2, 1)^2. Values in brackets are the increase in time compared
to the previous step (the optimal increase is 4).

       size of     assembly time in seconds      size of     assembly time in seconds
level  matrices    K^RT    M^RT                  matrices    K^Ned    M^Ned
1 2 808 0.02 - 0.09 - 1 854 0.05 - 0.09 -
2 21 600 0.14 (7.0) 0.39 (4.3) 13 428 0.30 (6.0) 0.79 (8.7)
3 169 344 0.82 (5.8) 2.18 (5.5) 102 024 1.92 (6.4) 4.53 (5.7)
4 1 340 928 7.15 (8.7) 15.35 (7.0) 795 024 15.44 (8.0) 33.43 (7.3)
5 10 672 128 59.37 (8.3) 125.71 (8.1) 6 276 384 129.91 (8.4) 282.14 (8.4)
6 85 155 840 503.89 (8.4) 1054.49 (8.3) 49 877 568 1125.08 (8.6) 2291.50 (8.1)
7 680 361 984 4437.84 (8.8) 8717.70 (8.2) 397 689 984 10232.01 (9.0) 20028.06 (8.7)

Table 2: 3D assembly times (in seconds) for the unit cube Ω := (0, 1)^3. Values in brackets are the increase in time compared to the previous
step (the optimal increase is 8).

4. Examples of vectorized FEM computations using edge elements

4.1. Minimization of functional majorant using Raviart-Thomas elements


Let us consider a scalar boundary value (Poisson’s) problem

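$$ -\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega, $$

for a function $u \in \mathring{H}^1(\Omega) := \{ v \in L^2(\Omega) \mid \nabla v \in L^2(\Omega, \mathbb{R}^d),\ v = 0 \text{ on } \partial\Omega \}$ and a given right hand side $f \in L^2(\Omega)$. The
exact solution $u$ is sought from the weak formulation

$$ \int_\Omega \nabla u \cdot \nabla w \, dx = \int_\Omega f w \, dx \qquad \forall w \in \mathring{H}^1(\Omega). \tag{9} $$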

Assume that $v \in \mathring{H}^1(\Omega)$ is an approximation of the exact solution $u$ of (9). Then, the functional type a posteriori error
estimate from [10] states that

$$ \|\nabla(u - v)\| \le \|\nabla v - y\| + C_F \|\operatorname{div} y + f\| =: M(\nabla v, f, C_F, y) \qquad \forall y \in H(\operatorname{div}, \Omega), \tag{10} $$

where $M$ is called a functional majorant. The global constant $C_F$ is the smallest possible constant in the
Friedrichs inequality $\|w\| \le C_F \|\nabla w\|$, which holds for all $w \in \mathring{H}^1(\Omega)$. Note that the estimate (10) is sharp: by choosing
$y = \nabla u$, the inequality becomes an equality. Hence, minimizing $M$ with respect to $y$ provides
a way to obtain approximations of the flux $\nabla u$. Since $M$ contains nondifferentiable norm terms, we apply Young's
inequality $(a + b)^2 \le (1 + \beta) a^2 + (1 + \frac{1}{\beta}) b^2$, valid for all $\beta > 0$, to obtain

$$ \left( \|\nabla v - y\| + C_F \|\operatorname{div} y + f\| \right)^2 \le \left( 1 + \frac{1}{\beta} \right) \|\nabla v - y\|^2 + (1 + \beta) C_F^2 \|\operatorname{div} y + f\|^2 =: \mathcal{M}(\nabla v, f, C_F, \beta, y). $$

The arguments $v$ and $f$ of the majorant $\mathcal{M}$ are known, and upper bounds of $C_F$ are also known. The parameter $\beta > 0$ and the
function $y \in H(\operatorname{div}, \Omega)$ are free parameters. For a fixed value of $\beta$, the majorant represents a quadratic functional in $y$.
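
The role of the parameter β in Young's inequality can be checked numerically. The sketch below uses arbitrary sample values for a and b (not taken from the paper); in the setting above, a stands for $C_F \|\operatorname{div} y + f\|$ and b for $\|\nabla v - y\|$. The inequality holds for every sampled β, and the right hand side is smallest near β = b/a, where it coincides with $(a+b)^2$.

```matlab
% Arbitrary nonnegative sample values (illustration only)
a = 0.7;                               % plays the role of C_F*||div y + f||
b = 0.2;                               % plays the role of ||grad v - y||

beta = logspace(-2, 2, 401);           % sample of positive beta values
lhs  = (a + b)^2;
rhs  = (1 + beta)*a^2 + (1 + 1./beta)*b^2;

% Young's inequality holds for every sampled beta
assert(all(rhs >= lhs - 1e-12));

% The sampled minimizer is close to b/a, where the bound is attained
[~, k]  = min(rhs);
betaOpt = b/a;
rhsOpt  = (1 + betaOpt)*a^2 + (1 + 1/betaOpt)*b^2;
fprintf('sampled minimizer: %.3f, b/a: %.3f, gap at b/a: %.2e\n', ...
        beta(k), betaOpt, rhsOpt - lhs);
```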

      #T = 512                #T = 131 072            #T = 2 097 152          #T = 33 554 432
Iter  β      M         Ieff   β      M         Ieff   β      M         Ieff   β      M         Ieff
1     1.000  0.026203  1.72   1.000  0.001648  1.73   1.000  0.000412  1.73   1.000  0.000103  1.73
2     3.208  0.023159  1.52   3.294  0.001453  1.52   3.294  0.000363  1.52   3.294  0.000091  1.52
3     3.268  0.023159  1.52   3.294  0.001453  1.52   3.294  0.000363  1.52   3.294  0.000091  1.52

Table 3: Majorant calculation for four meshes in 2D.

      #T = 10 368             #T = 82 944             #T = 663 552            #T = 5 308 416
Iter  β      M         Ieff   β      M         Ieff   β      M         Ieff   β      M         Ieff
1     1.000  0.011794  1.59   1.000  0.006176  1.59   1.000  0.003135  1.59   1.000  0.001574  1.59
2     3.128  0.010396  1.40   3.512  0.005379  1.39   3.655  0.002721  1.38   3.697  0.001365  1.38
3     3.420  0.010388  1.40   3.622  0.005379  1.39   3.687  0.002720  1.38   3.706  0.001365  1.38
4     3.432  0.010388  1.40

Table 4: Majorant calculation for four meshes in 3D.

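Global minimization of $\mathcal{M}$ with respect to $y$ results in the following problem for $y$:

$$ (1 + \beta) C_F^2 \int_\Omega \operatorname{div} y \, \operatorname{div} \varphi \, dx + \left( 1 + \frac{1}{\beta} \right) \int_\Omega y \cdot \varphi \, dx = -(1 + \beta) C_F^2 \int_\Omega f \operatorname{div} \varphi \, dx + \left( 1 + \frac{1}{\beta} \right) \int_\Omega \nabla v \cdot \varphi \, dx \qquad \forall \varphi \in H(\operatorname{div}, \Omega). \tag{11} $$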
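
For the reader's convenience, here is a sketch of where this variational problem comes from. The quadratic functional is differentiated in an arbitrary direction $\varphi \in H(\operatorname{div}, \Omega)$ and the derivative is set to zero. The auxiliary parameter $t$ is introduced only for this derivation.

```latex
\begin{align*}
0 &= \frac{d}{dt}\, \mathcal{M}(\nabla v, f, C_F, \beta, y + t\varphi) \Big|_{t=0} \\
  &= -2 \left( 1 + \frac{1}{\beta} \right) \int_\Omega (\nabla v - y) \cdot \varphi \, dx
     + 2 (1 + \beta) C_F^2 \int_\Omega (\operatorname{div} y + f) \operatorname{div} \varphi \, dx .
\end{align*}
```

Dividing by 2 and moving the terms containing the known data $\nabla v$ and $f$ to the right hand side gives the variational problem for $y$ stated above.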

On the other hand, for a fixed $y$,

$$ \beta = \frac{\|\nabla v - y\|}{C_F \|\operatorname{div} y + f\|} \tag{12} $$

minimizes $\mathcal{M}$ amongst all $\beta > 0$. This suggests the following solution algorithm:
Algorithm 1 (Majorant minimization algorithm). Let $\beta > 0$ be given (for example, set $\beta = 1$).
(a) Compute $y$ (using the current value of $\beta$) by solving the quadratic minimization problem $\mathcal{M}(\nabla v, f, C_F, \beta, y) \to \min$.
(b) Update $\beta$ (using $y$ calculated in step (a)) from (12). If convergence in $y$ has not been achieved, go to step (a).
We solved the quadratic minimization problem in (a) by discretizing the problem (11) with the linear Raviart-Thomas
elements. For this, both FEM matrices MRT and KRT were needed (see also [17]).
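
The following MATLAB script is a minimal sketch of Algorithm 1 on the discrete level, not the code from the accompanying package. All variable names are hypothetical, and the small diagonal matrices and vectors are toy stand-ins that only make the script self-contained; in an actual computation they would come from the assembly of KRT and MRT and from the integration of the data $f$ and $\nabla v$ against the Raviart-Thomas basis functions. The value used for $C_F$ is the Friedrichs constant of the unit square, taken here as an assumed upper bound. The stopping test uses the relative change of the majorant.

```matlab
% Toy stand-ins (hypothetical data, illustration only)
n    = 5;
K    = sparse(1:n, 1:n, 1:n);   % stand-in for KRT (div-div stiffness matrix)
M    = speye(n);                % stand-in for MRT (mass matrix)
bf   = -(1:n)';                 % stand-in for the integrals of f*div(phi_i)
bv   = ones(n, 1);              % stand-in for the integrals of grad(v).phi_i
nGv2 = 10;                      % stand-in for ||grad v||^2
nf2  = 20;                      % stand-in for ||f||^2
CF   = 1/(sqrt(2)*pi);          % assumed bound of C_F (unit square)

beta = 1;  tol = 1e-4;  Mold = Inf;
for iter = 1:100
    % Step (a): linear system obtained from (11) for the current beta
    A = (1 + beta)*CF^2*K + (1 + 1/beta)*M;
    b = -(1 + beta)*CF^2*bf + (1 + 1/beta)*bv;
    y = A \ b;

    % Squared norms expressed through the matrices and load vectors
    a2 = full(y'*M*y - 2*(bv'*y) + nGv2);   % ||grad v - y||^2
    d2 = full(y'*K*y + 2*(bf'*y) + nf2);    % ||div y + f||^2
    Mnew = (1 + 1/beta)*a2 + (1 + beta)*CF^2*d2;

    % Step (b): update beta from (12)
    beta = sqrt(a2) / (CF*sqrt(d2));

    % Stop when the relative change of the majorant is below tol
    if abs(Mold - Mnew) < tol*Mold, break; end
    Mold = Mnew;
end
fprintf('iterations: %d, beta = %.4f, majorant = %.6f\n', iter, beta, Mnew);
```

Only one sparse linear solve per iteration is needed, and the matrices KRT and MRT are assembled once and reused for every value of β.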
Example 1. In 2D we choose the unit square $\Omega := (0,1)^2$, and in 3D the unit cube $\Omega := (0,1)^3$. We choose the bubble
function

$$ u(x) := \prod_{i=1}^{d} x_i (x_i - 1) $$

as the exact solution in both 2D and 3D. It is clear that $u \in \mathring{H}^1(\Omega)$ in both dimensions.
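
As an illustration, the bubble function, its gradient and the corresponding right hand side $f = -\Delta u$ can be evaluated for many points at once in a vectorized way. The sketch below is not taken from the accompanying package; the array names are hypothetical, and the formulas for the gradient and for $f$ are derived here by differentiating the product.

```matlab
% Points stored row-wise in an n-by-d array (d = 2 here; d = 3 works as well)
X = [0.5 0.5; 0 0.3; rand(3, 2)];
d = size(X, 2);

B = X .* (X - 1);              % factors x_i*(x_i - 1), one column per coordinate
u = prod(B, 2);                % bubble function values

gradU = zeros(size(X));
f     = zeros(size(X, 1), 1);
for i = 1:d
    others      = prod(B(:, [1:i-1, i+1:d]), 2);   % product over j ~= i
    gradU(:, i) = (2*X(:, i) - 1) .* others;       % du/dx_i
    f           = f - 2*others;                    % accumulates -d^2u/dx_i^2
end

disp([u f]);
```

The loop runs over the space dimension only, so the cost per point is independent of the number of points.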
In Tables 3 and 4 we list the majorant values calculated with Algorithm 1 for four different meshes in 2D and 3D,
respectively. The program code can be found in the folder /example_majorant/. The approximation $v$ was calculated with
linear nodal finite elements. For measuring the quality of the chosen free parameters $\beta$ and $y$, we have also included the
values of the so-called efficiency index $I_{\mathrm{eff}} := \sqrt{\mathcal{M}} / \|\nabla(u - v)\| \ge 1$. The approximation $v$ and the flux approximation $y$ on
the smallest 2D mesh are depicted in Figure 4. The iterations of Algorithm 1 were stopped when the distance between the previous
and the new value of the majorant (normalized by the previous value) was less than $10^{-4}$.

Figure 4: Discrete solution v (left), and the first and second components of the flux approximation y (middle and right) on a mesh with 512 elements.

#T            #E            $\sqrt{\|E - v\|^2 + \|\operatorname{curl}(E - v)\|^2}$
    32 768        49 408    2.358185e-02
   131 072       197 120    1.179151e-02
   524 288       787 456    5.895834e-03
 2 097 152     3 147 776    2.947927e-03
 8 388 608    12 587 008    1.473965e-03
33 554 432    50 339 840    7.369826e-04

Table 5: Exact energy errors of approximations of the 2D eddy-current problem on uniformly refined meshes.
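
The observed convergence order can be read off from the errors in Table 5. The short sketch below assumes that each uniform refinement halves the mesh size h, which is consistent with the number of elements #T growing by a factor of 4 per row; the variable names are chosen here for illustration.

```matlab
% Energy errors from Table 5, from the coarsest to the finest mesh
err = [2.358185e-02 1.179151e-02 5.895834e-03 ...
       2.947927e-03 1.473965e-03 7.369826e-04];

% Observed order with respect to h, assuming h is halved in each refinement
rate = log2(err(1:end-1) ./ err(2:end));
disp(rate);
```

The computed rates are very close to 1, i.e., the error decreases linearly with the mesh size for this example.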

4.2. Solving the eddy-current problem using Nédélec elements


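We split the boundary into two parts: $\partial\Omega := \Gamma_D \cup \Gamma_N$ such that $\Gamma_D \cap \Gamma_N = \emptyset$. The 2D eddy-current problem reads as

$$ \begin{aligned} \operatorname{curl} \mu^{-1} \operatorname{curl} E + \kappa E &= F && \text{in } \Omega, \\ E \times n &= 0 && \text{on } \Gamma_D, \\ \mu^{-1} \operatorname{curl} E &= 0 && \text{on } \Gamma_N, \end{aligned} $$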

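for $E \in \mathring{H}_{\Gamma_D}(\operatorname{curl}, \Omega) := \{ v \in H(\operatorname{curl}, \Omega) \mid v \times n = 0 \text{ on } \Gamma_D \}$, where $n$ denotes the outward unit normal to the boundary
$\partial\Omega$. Here the right hand side $F \in L^2(\Omega, \mathbb{R}^2)$ and the positive material parameters $\mu, \kappa \in L^\infty(\Omega)$ are given. The exact
solution $E$ is sought from the weak formulation

$$ \int_\Omega \mu^{-1} \operatorname{curl} E \, \operatorname{curl} w \, dx + \int_\Omega \kappa E \cdot w \, dx = \int_\Omega F \cdot w \, dx \qquad \forall w \in \mathring{H}_{\Gamma_D}(\operatorname{curl}, \Omega). \tag{13} $$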

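Example 2 ([1]). We choose the unit square $\Omega := (0,1)^2$ with $\kappa = \mu = 1$. We split the domain in two parts across the
diagonal, $\Omega_1 := \{x \in \Omega \mid x_1 > x_2\}$ and $\Omega_2 := \Omega \setminus \Omega_1$, in order to define the following discontinuous exact solution:

$$ E|_{\Omega_1}(x) := \begin{pmatrix} \sin(2\pi x_1) + 2\pi \cos(2\pi x_1)(x_1 - x_2) \\ \sin\big((x_1 - x_2)^2 (x_1 - 1)^2 x_2\big) - \sin(2\pi x_1) \end{pmatrix}, \qquad E|_{\Omega_2}(x) := 0. $$

Since on $\Gamma_{/} := \{x \in \Omega \mid x_1 = x_2\}$ we have

$$ E|_{\Omega_1} \times n = \frac{1}{\sqrt{2}} \Big( 2\pi \cos(2\pi x_1)(x_1 - x_2) + \sin\big((x_1 - x_2)^2 (x_1 - 1)^2 x_2\big) \Big), \qquad E|_{\Omega_2} \times n = 0, $$

we see that $E|_{\Omega_1} \times n = 0$ on $\Gamma_{/}$. We conclude that the tangential component is continuous on $\Gamma_{/}$, so $E$ belongs to
$H(\operatorname{curl}, \Omega)$. Moreover,

$$ \operatorname{curl} E|_{\Omega_1} = 2 x_2 (x_1 - x_2)(x_1 - 1)(2 x_1 - x_2 - 1) \cos\big(x_2 (x_1 - x_2)^2 (x_1 - 1)^2\big), \qquad \operatorname{curl} E|_{\Omega_2} = 0, $$

and clearly $\operatorname{curl} E|_{\Omega_1} = 0$ on $\Gamma_{/}$, i.e., $\operatorname{curl} E$ is continuous on $\Gamma_{/}$. Also, it is easy to see that $\operatorname{curl} E$ vanishes on the whole
boundary, so it belongs to $\mathring{H}^1(\Omega)$. Thus, the exact solution satisfies the zero Neumann boundary condition on the whole
boundary, i.e., $\Gamma_D = \emptyset$ and $\Gamma_N = \partial\Omega$.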
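
The formulas of this example can be checked numerically. The sketch below is an independent check and not part of the accompanying package. It assumes the usual 2D convention $\operatorname{curl} E = \partial_1 E_2 - \partial_2 E_1$ and compares the stated curl in $\Omega_1$ with central finite differences; it also evaluates the sum $E_1 + E_2$ on the diagonal, which is proportional to the tangential trace there. All function handles are hypothetical names.

```matlab
% Exact solution in Omega_1 and its stated curl
g     = @(x1, x2) (x1 - x2).^2 .* (x1 - 1).^2 .* x2;
E1    = @(x1, x2) sin(2*pi*x1) + 2*pi*cos(2*pi*x1).*(x1 - x2);
E2    = @(x1, x2) sin(g(x1, x2)) - sin(2*pi*x1);
curlE = @(x1, x2) 2*x2.*(x1 - x2).*(x1 - 1).*(2*x1 - x2 - 1).*cos(g(x1, x2));

% Random sample points in Omega_1 = {x1 > x2}
x1 = rand(200, 1);
x2 = x1 .* rand(200, 1);

% Central differences for d(E2)/dx1 and d(E1)/dx2
h    = 1e-5;
d1E2 = (E2(x1 + h, x2) - E2(x1 - h, x2)) / (2*h);
d2E1 = (E1(x1, x2 + h) - E1(x1, x2 - h)) / (2*h);
errCurl = max(abs((d1E2 - d2E1) - curlE(x1, x2)));

% Tangential trace on the diagonal x1 = x2 (up to the factor 1/sqrt(2))
s = rand(200, 1);
errTrace = max(abs(E1(s, s) + E2(s, s)));

fprintf('curl mismatch: %.2e, trace on diagonal: %.2e\n', errCurl, errTrace);
```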
We denote by $v$ an approximation of the exact solution $E$ of (13). In the discretization of (13) we need both the
mass and stiffness matrices MNed and KNed. We see from Figure 5 that the 2D Nédélec element captures the discontinuity
of the normal component across the diagonal line $\Gamma_{/}$. In Table 5 we show how the error measured in the $H(\operatorname{curl}, \Omega)$-norm decreases as
the mesh is uniformly refined. The program code can be found in the folder /example_eddycurrent/.
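
As a sketch of the resulting discrete problem, assume that $\eta_i$ denote the Nédélec basis functions (a symbol introduced here only for illustration) and write $v = \sum_j x_j \eta_j$. With $\kappa = \mu = 1$ and $\Gamma_D = \emptyset$, the Galerkin discretization of (13) then takes the form

```latex
\left( K_{\mathrm{Ned}} + M_{\mathrm{Ned}} \right) x = b, \qquad
(K_{\mathrm{Ned}})_{ij} = \int_\Omega \operatorname{curl} \eta_i \, \operatorname{curl} \eta_j \, dx, \quad
(M_{\mathrm{Ned}})_{ij} = \int_\Omega \eta_i \cdot \eta_j \, dx, \quad
b_i = \int_\Omega F \cdot \eta_i \, dx .
```

Since the Neumann condition on $\Gamma_N = \partial\Omega$ is a natural boundary condition, no degrees of freedom need to be eliminated in this example.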

Acknowledgements The work of the first author was supported by the Väisälä Foundation of the Finnish Academy of
Science and Letters. The second author acknowledges the support of the project GA13-18652S (GA CR).

[1] I. Anjam and D. Pauly. Functional a posteriori error equalities for conforming mixed approximations of elliptic
problems. Preprint, 2014. URL: http://arxiv.org/abs/1403.2560 (accessed 9.1.2015).
[2] C. Bahriawati and C. Carstensen. Three MATLAB implementations of the lowest-order Raviart-Thomas MFEM
with a posteriori error control. CMAM, 5(4):333–361, 2005.

Figure 5: First and second components of the discrete solution v (left and middle), and curl v (right) on a mesh with 512 elements.

[3] F. Brezzi and M. Fortin. Mixed and hybrid finite element methods, volume 15 of Springer Series in Computational
Mathematics. Springer-Verlag, New York, 1991.
[4] L. Chen. iFEM: an integrated finite element method package in MATLAB. Technical report, University of California
at Irvine, 2009. URL: http://math.uci.edu/~chenlong/programming.html (accessed 9.1.2015).
[5] F. Cuvelier, C. Japhet, and G. Scarella. An efficient way to perform the assembly of finite element matrices in Matlab
and Octave. Preprint, 2013. URL: http://arxiv.org/abs/1305.3122 (accessed 9.1.2015).
[6] D. A. Dunavant. High degree efficient symmetrical gaussian quadrature rules for the triangle. Int. J. Num. Methods
Eng., 21:1129–1148, 1985.
[7] S. Funken, D. Praetorius, and P. Wissgott. Efficient implementation of adaptive P1-FEM in MATLAB. Comput.
Methods Appl. Math., 11:460–490, 2011.
[8] A. Hannukainen and M. Juntunen. Implementing the finite element assembly in interpreted languages. Preprint,
Aalto University, 2012.
[9] J. C. Nédélec. Mixed finite elements in $\mathbb{R}^3$. Numerische Mathematik, 35:315–341, 1980.
[10] P. Neittaanmäki and S. Repin. Reliable methods for computer simulation. Error control and a posteriori estimates,
volume 33 of Studies in Mathematics and its Applications. Elsevier, Amsterdam, 2004.
[11] T. Rahman and J. Valdman. Fast MATLAB assembly of FEM matrices in 2D and 3D: Nodal elements. Applied
Mathematics and Computation, 219:7151–7158, 2013.
[12] P. A. Raviart and J. M. Thomas. A mixed finite element for second order elliptic problems. In I. Galligani and
E. Magenes, editors, Mathematical Aspects of Finite Element Methods, pages 292–315. Springer-Verlag, New York,
1977.
[13] A. Schneebeli. An H(curl; Ω)-conforming FEM: Nédélec’s elements of first type. Technical report, 2003. URL:
http://www.dealii.org/reports/nedelec/nedelec.pdf (accessed 9.1.2015).
[14] J. Schöberl. C++11 implementation of finite elements in NGSolve. ASC Report 30/2014, Institute for Analysis and
Scientific Computing, Vienna University of Technology, 2014.
[15] P. Šolín, L. Korous, and P. Kus. Hermes2D, a C++ library for rapid development of adaptive hp-FEM and hp-DG
solvers. Journal of Computational and Applied Mathematics, 270:152–165, 2014.
[16] P. Šolín, K. Segeth, and I. Doležel. Higher-order finite element methods, volume 41 of Studies in Advanced Mathematics.
Chapman and Hall/CRC, Boca Raton, Florida, 2003.
[17] J. Valdman. Minimization of functional majorant in a posteriori error analysis based on H(div) multigrid-
preconditioned cg method. Advances in Numerical Analysis, vol. 2009, 2009. Article ID 164519.
[18] L. Zhang, T. Cui, and H. Liu. A set of symmetric quadrature rules on triangles and tetrahedra. J. Comp. Math.,
26(3):1–16, 2008.
