A Survey of GPU-based Acceleration Techniques in MRI Reconstructions
Correspondence to: Prof. Dong Liang. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Email: [email protected].
Abstract: Image reconstruction in clinical applications of magnetic resonance imaging (MRI) has become increasingly complicated, yet diagnosis and treatment require very fast computation. Modern graphics processing unit (GPU) platforms make high-performance parallel computing available, and attractive to common consumers, for massively parallel reconstruction problems at commodity prices. GPUs have also become more and more important for reconstruction computation, especially as deep learning begins to be applied to MRI reconstruction. The motivation of this survey is to review GPU computing schemes for MRI image reconstruction and to provide a summary reference for researchers in the MRI community.
Keywords: Graphics processing unit (GPU); magnetic resonance imaging (MRI); reconstruction
Submitted Nov 28, 2017. Accepted for publication Mar 05, 2018.
doi: 10.21037/qims.2018.03.07
View this article at: http://dx.doi.org/10.21037/qims.2018.03.07
Figure 1 Comparison of the speed of calculation (FLOPS) and the speed of data movement (bandwidth, GB/s) between GPUs and CPUs over the years 2006 to 2018. Figure taken with kind permission from Ref. (10).
languages such as Sh/RapidMind, Brook and Accelerator (5,6). However, these languages are hard for common programmers to apply without corresponding programming training. To provide real convenience to GPU programmers, three frameworks, NVIDIA's Compute Unified Device Architecture (CUDA), Microsoft's DirectCompute and Apple/Khronos Group's OpenCL, provided more feasible GPU programming models that allow programmers to skip the full and explicit conversion of their data into graphical forms while still taking advantage of the high-performance computing speed of GPUs (7). Actually, a group at SGI had already implemented GPU computing for image reconstruction processing on an early Onyx workstation using the RealityEngine2.5 in 1994 (8). Because of the graphics-hardware limitations at that time, the SGI graphics-hardware implementation was about 100 times slower than a single-core CPU processor of 2004 (8). However, the performance of recent single-core CPU processors has developed much more slowly than that of multi-core GPU processors. Today, GPUs are a standard hardware component of current computers for graphics processing and are further designed as relatively independent frameworks for processing data-parallel problems, in which individual data elements can be assigned to separate logical cores for complex processing [as seen in (9)]. Figure 1 presents the evolution of the bandwidth and computation abilities of GPUs and CPUs in GB/s and GFLOP/s (i.e., billions of bytes of data moved per second and billions of floating point operations per second, under single- and double-precision situations) (10). Compared on a chip-to-chip basis against CPUs, GPUs have much better capability on both key indexes, the speed of calculation (FLOPS) and the speed of data movement (GB/s) (10,11). Therefore, this development shift between GPUs and CPUs gives researchers a new motivation to reconsider parallelizing the computations of medical imaging applications on GPU frameworks. By directly offloading the data-parallel part of a computation onto GPUs, the number of physical computers required can be greatly reduced to a minimum. The benefits are not only a reduced computer cost, but also less maintenance, space, power and cooling for whole-system operation inside any institute, school or hospital.

GPU computing

The physical architectures and processing models of GPUs and CPUs are very different, which is the main reason the computing throughput of a GPU is much higher than that of a CPU. As seen in Figure 2 (12), a GPU provides many data-parallel, high-memory-bandwidth and deeply multi-threaded cores for large numbers of simple computation tasks, whereas a CPU provides only a limited number of cores for highly complex computation tasks. Architecturally, as also seen in Figure 2, the difference between CPUs and GPUs in GFLOP/s computation capability comes from GPUs being highly specialized for compute-intensive, highly parallel computation: over 80% of their transistors are devoted to data processing rather than to data caching and flow control. On the contrary, CPUs are designed with a few cores and large cache memories so that complex software threads are easy to handle at one time. For instance, a typical GPU can have 100+ processing cores which can handle thousands of software threads simultaneously. In theory, the GPU's performance when processing thousands of software threads can therefore be 100x higher than processing on the CPU alone.
Figure 2 Comparison of CPU and GPU architectures: the GPU devotes more transistors to data processing (12). ALU, arithmetic logical unit; GPU, graphics processing unit; DRAM, dynamic random access memory.
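To make this thread model concrete, the following minimal CUDA sketch (an illustration with assumed array names and sizes, not code from any of the surveyed papers) launches one lightweight thread per k-space sample to apply a density-compensation weight:

#include <cuda_runtime.h>

// One thread per k-space sample: multiply each complex sample by its
// density-compensation weight. float2 keeps the sketch self-contained.
__global__ void weightSamples(float2 *kspace, const float *weights, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global sample index
    if (i < n) {
        kspace[i].x *= weights[i];                   // real part
        kspace[i].y *= weights[i];                   // imaginary part
    }
}

int main(void)
{
    const int n = 1 << 20;                           // 1M samples (assumed)
    float2 *d_kspace;  float *d_weights;
    cudaMalloc(&d_kspace,  n * sizeof(float2));
    cudaMalloc(&d_weights, n * sizeof(float));
    // ... copy acquired samples and weights to the device here ...

    int threads = 256;                               // threads per block
    int blocks  = (n + threads - 1) / threads;       // enough blocks to cover n
    weightSamples<<<blocks, threads>>>(d_kspace, d_weights, n);
    cudaDeviceSynchronize();

    cudaFree(d_kspace);
    cudaFree(d_weights);
    return 0;
}

Each thread does only a few arithmetic operations, but many thousands of them run concurrently, which is exactly the workload shape that the GPU architecture favors.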
Because of these special GPU architectures, which differ from general CPU architectures, GPU code can easily run algorithms in parallel. However, most general CPU-based algorithms are not directly suitable for GPUs, because they were designed around general CPU architectures. They therefore need to be redesigned into new, more power- and cost-efficient parallel algorithms that match the GPUs' special features. Such parallel code certainly brings higher performance for the developed algorithms, but it simultaneously brings more difficult debugging problems than general code.

As we know, GPUs were originally developed to accelerate the image processing speed of graphics cards. When the graphics mode is turned on, programmers can use GPU APIs such as OpenGL or DirectX to implement shading programs that customize the graphics pipelines of GPUs at run time, using high-level shading languages such as NVIDIA C for Graphics (Cg), the OpenGL Shading Language (GLSL), the Microsoft High-Level Shading Language (HLSL), the Adobe Graphics Assembly Language (AGAL), the Sony PlayStation Shader Language (PSSL), etc., which were originally designed for real-time rendering (2). Although early GPU computing programs achieved impressive accelerations in medical image processing (13-16), they suffered from several drawbacks. Firstly, GPU computing code was very difficult for entry-level programmers to develop to a qualified standard, because problems had to be expressed in terms of graphics concepts such as vertices, texture coordinates and fragments; secondly, the acceleration achieved by GPU computing code was compromised by the lack of access to all the capabilities of GPUs, such as shared memory and scattered writes; thirdly, code portability was constrained by the specific hardware features of some graphics extensions (8).

To solve these drawbacks, four major commercial framework solutions, CUDA, OpenCL, Stream and DirectCompute, have been deployed to generate parallel, higher-performance code for GPUs. Among them, CUDA was developed by NVIDIA; OpenCL is an open standard library developed by the Khronos Group; Stream was developed by AMD (ATI chips); and DirectCompute was developed by Microsoft (2). Among these solutions, CUDA is the one most widely used by programmers in computer graphics, image processing, computer vision, computational fluid dynamics (CFD) and many other fields to rewrite algorithms to be GPU-enabled and efficient. The primary advantage of the CUDA framework is that it brings a C/C++-like development environment and the parallel capabilities of GPU acceleration to programmers, without requiring them to have detailed knowledge of GPU hardware architectures. Although these frameworks are very helpful for employing GPUs in applications, there are also more consumable software packages, built on the CUDA and OpenCL libraries, for programmers who are not familiar with GPU programming and have limited C/C++ parallel programming experience. Here, several popular libraries should be mentioned: Thrust, cuFFT, cuSOLVER, cuSPARSE and cuDNN, which are widely used in applications ranging across signal processing and image processing (17,18).
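To give a flavor of how compact such library-based GPU code can be, here is a minimal Thrust sketch (an illustration with assumed sizes, not code taken from the cited references) that computes element-wise magnitudes and a parallel sum directly on the device:

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <cmath>

// Functor applied element-wise on the GPU by thrust::transform.
struct Magnitude
{
    __host__ __device__ float operator()(float x) const { return fabsf(x); }
};

int main(void)
{
    thrust::device_vector<float> d_data(1024, -1.5f);   // data lives on the GPU
    // Apply the functor to every element in parallel, in place.
    thrust::transform(d_data.begin(), d_data.end(), d_data.begin(), Magnitude());
    // Parallel reduction (sum) on the device.
    float sum = thrust::reduce(d_data.begin(), d_data.end(), 0.0f);
    return sum > 0.0f ? 0 : 1;
}

No explicit kernel launch or thread indexing appears; the library chooses the launch configuration, which is exactly the convenience these packages are meant to provide.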
Thrust itself is a C++ template library for parallel GPU platforms modeled on the well-known CPU-based Standard Template Library (STL). It gives programmers a shortcut for prototyping high-performance CUDA applications with minimal programming effort through high-level interfaces that are fully interoperable with technologies such as C/C++ and CUDA (18). The cuFFT library provides a simple software interface, based on the well-known Cooley-Tukey and Bluestein algorithms, to obtain accurate Fourier transform (FT) results quickly, and its speed when computing fast Fourier transforms (FFTs) is up to about 10x faster than computing discrete Fourier transforms directly, for any complex or real-valued data sets. The cuSOLVER library provides a collection of dense and sparse direct solvers which can deliver significant accelerations for computer vision, CFD and linear optimization applications. The cuSPARSE library, which includes a sparse triangular solver, provides a collection of basic linear algebra subroutines for sparse matrices and can deliver up to about 8x faster performance than the well-known Intel Math Kernel Library (MKL); as a GPU-accelerated version of a complete standard library, it can deliver 6x to 17x faster performance than the Intel MKL. The cuDNN library is a GPU-accelerated library for deep neural networks (NNs), which provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization and activation layers (18). The cuDNN library allows researchers to focus on designing and training NNs and on developing software applications rather than spending much time on low-level GPU performance tuning. cuDNN has now been widely adopted by many deep learning frameworks, such as Caffe, TensorFlow, Theano, Torch and so on. These libraries are able to support most of the computational operations that arise in magnetic resonance image reconstruction. Therefore, backed by the competitive high-performance parallel computing that these libraries provide, GPU-based computing algorithms have been applied comprehensively in magnetic resonance image reconstruction, thanks to the GPUs' powerful parallel computing ability with multi-thread capabilities and multi-core architectures (19).

Magnetic resonance imaging (MRI) reconstruction

In clinical applications, MRI reconstruction calculations have become more and more complex for computers, while doctors and scientists urgently need to review patients' images without waiting too long for reconstruction processing. Currently, GPU computing is being increasingly investigated for clinical MRI reconstruction applications (Table 1). Many papers on GPU, MRI and reconstruction were published from 2005 to 2016, as seen in Figure 3; the plot illustrates the prevalence of GPU-based methods in the field of MRI reconstruction. It is clear that GPU-accelerated MRI reconstructions became much more widely applicable after the release of NVIDIA's CUDA in 2007 (47). The number of publications related to GPU and MRI keeps increasing quickly, but recently the publications on GPU, MRI and reconstruction have not grown at the same pace. The growth is slowing down because most GPU-accelerated algorithms for typical MRI reconstruction have already been well studied and implemented on GPUs. These GPU-accelerated methods can roughly be divided into four categories of MRI reconstruction with GPU computing, namely FT, parallel imaging (PI), compressed sensing (CS) and deep learning, which are introduced below; a summary of GPU-based MRI reconstruction methods is presented in Table 1. It is taken for granted here that GPUs are applied to accelerate deep learning applications, because GPUs are naturally suited to deep learning calculations. Furthermore, the CUDA library and NVIDIA hardware have come to dominate the field of GPU computation. It may seem that NVIDIA Tesla cards are no faster than NVIDIA GeForce cards, but this is an illusion: Tesla cards are more powerful than GeForce cards. In practice, the speed-up factors of GPU-based MRI reconstruction depend on the system platform, the GPU implementation and the reconstruction algorithm.

FT

Most MR imaging methods are designed around Fourier encoding, so MRI reconstruction methods include a basic FFT. The FFT implementation on CPUs is already quite efficient, but its GPU version can be accelerated further. Sumanaweera et al. implemented the Cartesian FFT as a multi-pass decimation-in-time butterfly algorithm on GPUs in the book chapter of Ref. (19). They presented several specific approaches for obtaining higher performance, such as using two pbuffers and balancing some of the computing load from the fragment processors onto the vertex processors and rasterizers.
Table 1 A summary of GPU-based MRI reconstruction methods

Ref. | Reconstruction method | Platform | Library | Speed-up
(21) | Non-uniform FFT (nonequispaced FFT) | NVIDIA GeForce GTX 8800 | NFFT library | 21–85×
(22) | Non-uniform FFT (conjugate gradient solver) | NVIDIA GeForce GTX 8800 | CUDA library | 10×
(25) | Gridding (reverse gridding of PROPELLER) | NVIDIA GeForce GTX 8800 | CUDA library | 8×
(26) | Gridding (reverse gridding optimization) | NVIDIA Tesla C2050 | CUDA library | 6–30×
(27,28) | Gridding (conjugate gradient linear solver) | NVIDIA Tesla M2070 | CUDA library | 26×
(29) | Parallel imaging (Cartesian SENSE, k-t SENSE) | NVIDIA GeForce GTX 8800 | CUDA library | 3–108×
(30) | Parallel imaging (radial SENSE) | NVIDIA GeForce GTX 280 | CUDA library | 10–12×
(31) | Parallel imaging (radial iterative SENSE) | NVIDIA GeForce GTX 280 | CUDA library | 2×
(32) | Parallel imaging (radial GRAPPA) | NVIDIA Tesla M2090 | CUDA library | –
(33) | Parallel imaging (GRAPPA operator gridding) | NVIDIA GeForce GTX 780 | CUDA library | 6–30×
(34) | Parallel imaging (radial ART) | NVIDIA GeForce GTX 580 | CUDA library | 15×
(35) | Compressed sensing (conjugate gradient solver) | NVIDIA GeForce GTX 280 | CUDA library | 200×
(36) | Compressed sensing (split Bregman regularization) | NVIDIA Tesla C2050 | CUDA library | 10×
(37) | Compressed sensing (3D radial cardiac MRI) | NVIDIA GeForce GTX 480 | CUDA library | 34–54×
(38) | Compressed sensing (ADMM algorithm) | NVIDIA GeForce GTX 650 | CUDA library | 30×
(39) | Compressed sensing (SENSE-type acquisition) | NVIDIA Tesla C2050 | CUDA library | 3×
(40) | Compressed sensing (L1-ESPIRiT algorithm) | NVIDIA Tesla K20m | CUDA library | 3–15×
(41) | Compressed sensing (cloud computing) | Amazon Elastic Compute Cloud | Gadgetron | 2–10×
(42) | Compressed sensing (field-compensated recon) | NVIDIA GeForce GTX 280 | CUDA library | 81–284×
(43) | Deep learning (convolutional neural network) | NVIDIA GeForce GTX TITAN | CUDA library | –
(44) | Deep learning (variational network) | NVIDIA Tesla M40 | CUDA library | –
(45) | Deep learning (residual regression, deep CNN) | NVIDIA GeForce GTX 1080 | CUDA library | –
(46) | Deep learning (manifold approximation, DNN) | NVIDIA Tesla P100 | CUDA library | –

MRI, magnetic resonance imaging; GPU, graphics processing unit; FFT, fast Fourier transform; CNN, convolutional neural network; DNN, deep neural network; HLSL, High Level Shading Language; CUDA, Compute Unified Device Architecture.
They also briefly illustrated high-performance experiments on MRI reconstruction and ultrasonic imaging. Then, Schiwietz et al. described an efficient GPU-based implementation of the non-Cartesian FFT (20), written directly on Microsoft's DirectX with C/C++ and HLSL. They implemented a look-up-table (LUT)-based Kaiser-Bessel window gridding algorithm and a Ram-Lak filtered back-projection method, and their preliminary results showed reconstruction times and image quality for the two GPU-based reconstruction algorithms that were comparable with CPU-based implementations for radial trajectories. In addition, Sørensen et al. presented a fast parallel GPU-accelerated algorithm to compute the nonequispaced FFT (21), in which the key step is the convolution (gridding) step of the transform, previously the most time-consuming part. The authors claimed that their GPU-accelerated convolution was up to 85 times faster than the open-source NUFFT library (48) when two MRI data sets sampled along radial and spiral trajectories were used to evaluate the algorithm's performance.
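To make the gridding (convolution interpolation) step concrete, the following CUDA sketch assigns one thread per acquired sample and scatters it onto its Cartesian neighborhood using a precomputed kernel look-up table. The kernel width, LUT layout and variable names are assumptions for illustration; this is not the implementation of Refs. (20,21).

#include <cuda_runtime.h>

#define WIDTH 4          // assumed convolution kernel width in grid cells

// One thread per acquired sample: scatter it onto the WIDTH x WIDTH
// neighborhood of Cartesian grid cells, weighting by a kernel value read
// from a precomputed look-up table (LUT). The output grid is assumed to
// be zero-initialized before the kernel is launched.
__global__ void gridSamples(const float2 *samples,   // complex k-space data
                            const float2 *coords,    // kx, ky in grid units
                            float2 *grid,            // output Cartesian grid
                            const float *lut,        // kernel LUT, lutSize bins
                            int nSamples, int gridSize, int lutSize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nSamples) return;

    float kx = coords[i].x, ky = coords[i].y;
    int x0 = (int)floorf(kx) - WIDTH / 2;
    int y0 = (int)floorf(ky) - WIDTH / 2;

    for (int dy = 0; dy < WIDTH; ++dy) {
        for (int dx = 0; dx < WIDTH; ++dx) {
            int gx = x0 + dx, gy = y0 + dy;
            if (gx < 0 || gy < 0 || gx >= gridSize || gy >= gridSize) continue;
            // Distance from the sample to this grid cell, mapped to a LUT bin.
            float dist = hypotf(kx - gx, ky - gy);
            int bin = min((int)(dist / (0.5f * WIDTH) * (lutSize - 1)), lutSize - 1);
            float w = lut[bin];                       // e.g., a Kaiser-Bessel value
            // atomicAdd resolves races between samples hitting the same cell.
            atomicAdd(&grid[gy * gridSize + gx].x, w * samples[i].x);
            atomicAdd(&grid[gy * gridSize + gx].y, w * samples[i].y);
        }
    }
}

The atomic additions handle the race between samples that land on the same grid cell, a point that several of the GPU gridding papers address explicitly in their designs.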
Figure 3 Cumulative number of articles published on GPU, MRI and reconstruction from 2005 to 2016. GPU, graphics processing unit; MRI, magnetic resonance imaging.

Actually, before the NVIDIA CUDA library appeared in June 2007, any such GPU implementation had to be expressed through graphics APIs and shading languages. Currently, NVIDIA has released its easy-to-use CUDA framework, in which it provides the cuFFT library (49), an optimized GPU-based implementation of the FFT. There are two separate libraries, cuFFT and cuFFTW. The cuFFT library is designed to provide easy-to-use, high-performance FFT computation on NVIDIA GPU cards only, while the cuFFTW library is a porting tool provided so that existing FFTW-based projects can be moved to the GPU with a minimum amount of effort. Both libraries provide the following features (18): an O(nlogn) algorithm for different input data sizes; single-precision (i.e., 32-bit floating point) and double-precision (i.e., 64-bit floating point) computation; complex and real-valued input and output; execution of multiple 1D, 2D and 3D transforms simultaneously; in-place and out-of-place FFTs; and arbitrary intra- and inter-dimension element strides. Figure 4 shows an example of using CUDA's cuFFT library to calculate a two-dimensional FFT, similar to Ref. (49); such calls can simply be embedded in general code, bringing GPU-accelerated computation to arbitrary projects.

Stone et al. presented an anatomically constrained MR reconstruction algorithm based on NVIDIA's CUDA library for non-Cartesian MR data (22). Their algorithm finds the solution of a quasi-Bayesian estimation problem that is typical in MRI reconstruction. Their results showed that the algorithm could reduce the reconstruction time of an advanced non-uniform reconstruction of in vivo data from 23 minutes on a quad-core CPU to about 1 minute on a Quadro Plex cluster, which makes it applicable to accelerating MR reconstruction in many clinical applications. Besides, Yang et al. presented optimized interpolators to approximate the non-uniform FT of a finitely supported function in the inversion of non-Cartesian data (23). According to their simulations, the interpolators lead to iterative non-Cartesian inversion algorithms with reduced memory demands on the memory-limited early GPU systems. Guo et al. improved a grid-driven interpolation algorithm for the PROPELLER trajectory in real-time non-Cartesian applications (24); their GPU-based method was about 9 times faster than their CPU implementation while achieving comparable motion-correction accuracy and image quality.

Figure 4 Computing a 2D FFT of size NX × NY using CUDA's cuFFT library (49). FFT, fast Fourier transform; NX, the number of points along the X axis; NY, the number of points along the Y axis.
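In the spirit of the listing summarized by the Figure 4 caption above, the sketch below shows a minimal 2D complex-to-complex transform with cuFFT (assumed sizes, no error checking; a generic illustration rather than the exact code of Ref. (49)):

#include <cufft.h>
#include <cuda_runtime.h>

int main(void)
{
    const int NX = 256, NY = 256;              // assumed matrix size

    cufftComplex *d_data;                      // k-space / image buffer on the GPU
    cudaMalloc(&d_data, sizeof(cufftComplex) * NX * NY);
    // ... copy NX x NY complex k-space samples into d_data here ...

    cufftHandle plan;
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);     // plan a 2D complex-to-complex FFT

    // Inverse transform: k-space -> image (use CUFFT_FORWARD for the other direction).
    cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);
    cudaDeviceSynchronize();                   // wait for the transform to finish
    // Note: cuFFT transforms are unnormalized; scale by 1/(NX*NY) if needed.

    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}

A plan is created once and can be reused for every frame or coil, which is how such calls are typically embedded into a reconstruction pipeline.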
Moreover, Yang et al. presented a CUDA-based algorithm to raise the reconstruction efficiency of the conventional PROPELLER trajectory (25). They developed a reverse gridding algorithm to reduce the computational complexity: unlike the conventional gridding algorithm, which generates a grid window for every trajectory point, their algorithm calculates a trajectory window for every grid point, and the contribution of each k-space point inside the convolution window is accumulated for that grid point. Their experiments showed a reconstruction speed 7.5 times faster than that of the conventional gridding algorithm. Besides, Obeid et al. proposed a modified GPU-based gridding method that achieves up to 29x acceleration for three-dimensional gridding (26); their solution was to allow bins to contain a variable number of sample points without sacrificing rapid access. Furthermore, a GPU-accelerated software toolkit for reconstructing data from arbitrary 3D trajectories has been released in Refs. (27,28). It is named the Illinois Massively Parallel Acquisition Toolkit for Image reconstruction with ENhanced Throughput in MRI (IMPATIENT MRI). In this toolkit, the authors removed computational bottlenecks by using a gridding approach to accelerate the computation of the data structures used by the previous routines. They further enhanced the toolkit with off-resonance correction and multi-sensor PI reconstruction, with a speed-up of about 200 times compared with their previous routines (27), and obtained efficient trajectories for high spatial and temporal resolution in their applications (28).

PI

PI (PMRI) techniques reconstruct under-sampled k-space data by exploiting the complementary image information obtained from multiple receive coils. A large number of PMRI reconstruction techniques have been proposed (50); currently, the most well-known are SMASH (51), SENSE (52) and GRAPPA (53). Because the acquired data are under-sampled in k-space, most of these techniques require additional coil sensitivity maps to remove the aliasing artifacts. Broadly, current PMRI methods can be roughly classified into two types (50): one reconstructs in image space and involves an unfolding or inversion procedure, for example SENSE (52); the other reconstructs in k-space and involves kernel calibration followed by recovery of the missing k-space data, for instance SMASH (51) and GRAPPA (53).

SENSE and SENSE-derived methods have been implemented on GPUs (29-33). For example, GPU-based implementations of Cartesian SENSE and k-t SENSE were presented by Hansen et al. in Ref. (29). They focused on the inversion problems of SENSE reconstruction and solved them for each set of aliased pixels in image space or x-f space, since these inversions are generally the most time-consuming steps in SENSE and SENSE-derived reconstructions. Sørensen et al. presented a GPU-based reconstruction algorithm that enables real-time reconstruction of sensitivity-encoded non-Cartesian radial imaging (i.e., radial SENSE) (30); they claimed their algorithm could be used in real-time reconstruction applications because a moving buffer scheme bridges the interval between data acquisition and image display. In addition, Sørensen et al. further described a real-time iterative SENSE GPU-based reconstruction that reduces the reconstruction time of isotropic whole-heart imaging, an important protocol for simplifying cardiac MRI (31); they showed that 3D datasets (256 slices) could be reconstructed in 5-6 minutes. As an important PMRI method, GRAPPA has also been implemented on GPUs. For example, Saybasili et al. presented an automatically distributed, hybrid (multi-node, multi-GPU), low-latency through-time radial GRAPPA reconstruction pipeline in Ref. (32). They proposed a combined CPU- and GPU-based computation framework that uses multi-threaded CPU and GPU programming on multiple nodes (32). In their implementation, the master node forwards raw data to each node for partial processing, because GRAPPA generally requires all coil data to reconstruct each coil separately; each node distributes its task to its local GPUs and sends its partial results back to the master node after reconstruction, after which all image results are combined and sent to the scanner for display. They reported that their implementation could achieve, for 32 coils, a 42-ms acquisition time and an 11.2-ms reconstruction time for under-sampled radial datasets, and that the method could be applied to more challenging reconstruction scenarios with larger numbers of acquisition coils, higher acceleration rates, or more GPUs.
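For reference, the image-space (SENSE-type) unfolding that the implementations above accelerate can be written per group of aliased pixels in the standard form of Ref. (52); the notation here is generic rather than taken from any single GPU paper. For an R-fold undersampled Cartesian acquisition with N_c coils, each vector a of aliased pixel values (one entry per coil) is unfolded by solving

\mathbf{a} = \mathbf{E}\,\boldsymbol{\rho}, \qquad E_{c,r} = S_c(\mathbf{x}_r), \qquad \hat{\boldsymbol{\rho}} = \left(\mathbf{E}^{H}\boldsymbol{\Psi}^{-1}\mathbf{E}\right)^{-1}\mathbf{E}^{H}\boldsymbol{\Psi}^{-1}\,\mathbf{a},

where S_c(x_r) is the sensitivity of coil c at the r-th of the R superimposed pixel locations and \Psi is the receiver noise covariance. Each of these small N_c x R systems is independent of the others, which is exactly why assigning one GPU thread or thread block per aliased pixel set parallelizes so well.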
Along the same GRAPPA line, Inam et al. proposed an acceleration method for self-calibrating GRAPPA operator gridding that uses the massively parallel architecture of GPUs (33). LUTs were used to pre-calculate all possible combinations of gridding weights and to avoid race conditions among the CUDA kernel threads: they first used LUT-based optimized CUDA kernels to pre-calculate all possible combinations of 2D gridding weight sets, and then applied the appropriate weight sets to shift the radial acquisitions onto the nearest Cartesian grid locations. They claimed that their GPU-based method typically achieves a 6x to 30x speed-up without compromising image quality.

The methods above mainly attempt to transfer classical CPU-based PMRI methods onto GPUs. However, PMRI algorithms for parallel GPUs should really be re-designed according to the GPUs' features. For example, Li et al. implemented a GPU-accelerated algebraic reconstruction technique (ART) in Ref. (34) to recover images from radial cardiac cine acquisitions. They mainly compared the cine images reconstructed by their radial ART method with filtered back-projection at multiple under-sampling levels. Their results showed that GPU-accelerated ART obtains results comparable with conjugate gradient SENSE in parallel radial MR imaging, while also reducing artifacts and maintaining image sharpness compared with general filtered back-projection methods. In fact, the classical PMRI methods are not costly iterative methods, so GPU-based versions of classical PMRI reconstruction bring no huge improvement unless the scenario is extreme. However, for nonlinear problems or complex iterative reconstructions in PMRI applications, GPU-based reconstruction methods can bring large improvements if they are designed according to the structural features of GPUs.

CS

Recently, CS has been studied and applied in MR imaging to solve a minimization problem in MRI reconstruction (54). Because the NVIDIA CUDA library has become better and better at supporting GPU computing, complex sparse reconstruction methods are now more easily implemented on GPUs without worrying about the hardware constraints of GPUs. There are a number of recent papers studying CS MR reconstruction methods on GPU architectures (35-42). For example, Zhuo et al. presented a GPU-accelerated regularized reconstruction method, incorporating a quadratic regularization term, with compensation for susceptibility-induced field inhomogeneity effects (35). In their experiments, they realized GPU-based spatial regularization with sparse matrices so that the entire procedure could be performed on GPUs, avoiding the memory bandwidth bottlenecks associated with frequent communication between GPUs and CPUs. With the popularity of CS, many studies have applied GPU-accelerated computing to fast CS MRI reconstruction, which seems ideally suited to GPU acceleration (54). For instance, Smith et al. presented a GPU-accelerated split Bregman solver to accelerate 2D CS reconstruction in Ref. (36). They demonstrated that the combination of the split Bregman method and GPU computing achieves the rapid convergence and massively parallel computation needed for real-time CS reconstruction of small-to-moderate size images; their GPU-accelerated iterative reconstruction could reconstruct two-dimensional 1,024^2 data matrices with a speed-up factor of up to 27, in about 0.3 seconds or less, even with limited available GPU VRAM. Nam et al. proposed a parallelized GPU-accelerated implementation of an iterative CS reconstruction algorithm for 3D radial data acquisitions, evaluated on both phantom and in vivo whole-heart coronary data sets (37). To reduce the time-consuming gridding and regridding operations, these operations were performed in a parallel manner for every measured radial point, which is well suited to a CUDA implementation. Compared with a general CPU implementation, their GPU-implemented CS reconstruction improved image quality in terms of vessel sharpness, suppressed noise-like artifacts, and reduced the running time of the CS reconstruction by a factor of 34.3-53.9 relative to a CPU-based C/C++ implementation. In addition, Chang et al. presented an efficient GPU-based method for CS-MRI reconstruction of 3D multichannel data (38). They built a highly parallelized framework that computes the CS-MRI reconstructions of multiple channels of a 3D-CS acquisition simultaneously. Results on simulated and in vivo data showed that the proposed method can shorten the reconstruction run time by a factor of 30; in some clinical applications, the 3D multi-slice CS reconstruction could even be performed in less than 1 second.

There are also other papers studying CS MR reconstruction methods on several parallel architectures of multi-core CPUs and multi-core GPUs (39-41), and they demonstrate the huge potential speed-up on these architectures.
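Before turning to these multi-architecture comparisons, it is worth recalling the optimization problem that all of the above CS implementations target. Following the formulation of Ref. (54), with F_u the undersampled Fourier encoding operator, y the acquired k-space data and \Psi a sparsifying transform, the reconstruction solves

\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert F_u x - y \rVert_2^2 + \lambda\,\lVert \Psi x \rVert_1 ,

and the repeated applications of F_u, its adjoint and \Psi inside the iterative solvers (conjugate gradient, split Bregman, ADMM, etc.) are the operations that map most naturally onto GPU hardware.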
Among these, Kim et al. investigated an inexact quasi-Newton CS reconstruction algorithm on several parallel processing architectures, including CPUs, GPUs and Intel's Many Integrated Core (MIC) architecture (39). Among their experiments (39), the reference implementation on a 4-core Core i7 was able to reconstruct a 256x160x80, 8-channel, 10x under-sampled data set of the neurovasculature in 56 seconds; the reconstruction time was reduced further to 32 seconds on a 6-core Core i7; the CUDA-based implementation reduced the reconstruction time to 16 seconds on an NVIDIA GTX480; and the time dropped to 12 seconds on Intel's Knights Ferry (KNF) of the MIC architecture. All these experiments showed that their CS algorithm can draw huge benefits from such throughput-oriented architectures. Apart from that, Sabbagh et al. studied how to accelerate the non-linear CS reconstruction problem in cardiac MRI, solved by iterative optimization algorithms, in order to facilitate the migration of CS reconstruction into clinical applications (40). Their experiments used 3D steady-state free precession MRI images from five patients and compared the speed and image quality of the reconstruction on different parallel platforms, namely CPU, CPU with OpenMP, and GPU. Their results showed that the mean reconstruction time was 13.1+/-3.8 minutes on the CPU platform, 11.6+/-3.6 minutes on the CPU platform with OpenMP, and 2.5+/-0.3 minutes on the CPU platform with OpenMP plus GPU (40), while the image quality, estimated by image subtraction, was very similar and comparable across the different parallel architectures. The modern cloud-computing concept has also been applied to time-consuming MR reconstructions; cloud computing generally needs to support most modern parallel architectures, and GPUs are one of them. For example, Xue et al. utilized the open-source Gadgetron framework to support distributed computing for image reconstruction and demonstrated a multi-node version of the Gadgetron that provides nonlinear image reconstruction with clinically acceptable latency (41). Their framework is a cloud-enabled version of the Gadgetron running on three different distributed computing platforms, ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud (41); they claimed it could provide nonlinear CS reconstructions of cardiac and neuro imaging applications with low reconstruction latency. Besides, Zhuo et al. proposed a GPU-implemented reconstruction algorithm with MR field inhomogeneity compensation, in which magnetic field maps and their gradients are calculated for iterative conjugate gradient (CG) reconstruction algorithms on NVIDIA CUDA-enabled GPUs (42). Compared with their CPU-based implementations, their GPU-based implementations hugely reduced the calculation time while still guaranteeing acceptable accuracy in compensating the MR field inhomogeneity.

Deep learning (DL)

Recent developments of DL in NNs have brought breakthrough improvements in many areas (55-58). Because of the time-consuming training of multi-layer NNs, GPUs are very well suited to the massive calculation problems of DL (55). Although there have been several attempts at creating fast NN-specific hardware, GPUs have provided a genuinely cheap way to implement DL in many applications; they can be employed not only for fast matrix and vector multiplications but also for NN training, speeding up DL by a factor of 50 or more (55). Currently, GPU-based DL has started to be applied in several MR applications (43-46), described below, to solve the problems of MRI reconstruction.

Wang et al. first proposed a DL method to accelerate MR reconstruction (43). They built a big dataset of existing high-quality images and trained, off-line, a 3-layer convolutional neural network (CNN) as the complex mapping between MR images reconstructed from zero-filled and from fully-sampled k-space data. The trained network then provides a prediction for the under-sampled data when an online constrained reconstruction problem is solved. Although the off-line training can take roughly 3 days, each online GPU-based reconstruction took less than 1 second. The in vivo results illustrated that the proposed method can restore fine details and has great potential for effective MR imaging.

Hammernik et al. presented an efficient approach to learn a variational network which can remove typical under-sampling artifacts and restore important image details, such as the natural appearance of anatomical structures (44). They considered their trained models to be highly efficient and, because of their structural simplicity, also well suited for parallel computation on GPUs. Their approach achieved superior results compared with many commonly used reconstruction methods.
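One common way to fold such an off-line trained network into the on-line reconstruction, in the spirit of the CNN prior of Wang et al. (43) (this is a generic, illustrative formulation; the exact objective used in that work may differ), is to add the network output as a regularization term:

\hat{x} = \arg\min_{x}\; \lVert F_u x - y \rVert_2^2 + \lambda\,\lVert x - f_{\mathrm{CNN}}(x_u;\theta) \rVert_2^2 ,

where x_u denotes the zero-filled reconstruction of the under-sampled data and f_CNN(.;theta) is the trained network; the data-consistency term keeps the solution faithful to the measured k-space while the learned prior fills in the missing detail.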
Lee et al. proposed a novel deep residual learning algorithm to recover images from highly under-sampled k-space data (45). They formulated the traditional CS MR problem as a residual regression problem and designed a deep CNN to learn the aliasing artifacts. They trained the NN on the magnitude of MR images with a stochastic gradient descent method with momentum, based on the MatConvNet toolbox (59) and NVIDIA GTX 1080 GPUs. They reported that, once the deep CNN has been trained, their algorithm takes only about 30 ms, with much better reconstruction performance compared with many existing GRAPPA and CS algorithms.

Zhu et al. proposed an automated, robust NN as a generalized reconstruction framework which exploits the universal function approximation of multi-layer perceptron regression and the manifold learning properties demonstrated by auto-encoders (46). They implemented a unified reconstruction framework with a deep neural network (DNN) feed-forward architecture composed of fully-connected layers followed by a sparse convolutional auto-encoder. The NN parameters were trained to minimize a squared loss and updated by stochastic gradient descent, computed with the TensorFlow toolbox (60) on 2 NVIDIA Tesla P100 GPUs. Their results generalize over a wide range of acquisition strategies and show excellent immunity to noise and artifacts.

With the fast development of GPUs and of DL in NNs, an exciting epoch of MRI reconstruction has started. Although it is still too early to say that DL reconstruction approaches will replace currently used clinical methods, their development has illustrated a huge potential to promote the technological development of MRI reconstruction and to change this community.

Conclusions

Besides the GPU-accelerated MR reconstructions above, there are also a few related applications of less classical reconstructions that have been attempted as GPU implementations. For example, Johnson et al. proposed a GPU-based iterative decomposition of water and fat with echo asymmetry and least-squares (IDEAL) reconstruction scheme (61). They estimated the fat-water parameters and compared Brent's method with a golden section search for optimizing the unknown MR field inhomogeneity parameter (psi) in the IDEAL equations. They claimed that their algorithm was made more robust to fat-water ambiguities using a modified planar extrapolation of psi method (61), and their experiments showed that the fat-water reconstruction time of their GPU implementation was quickly and robustly reduced by a factor of 11.6 on a GPU in comparison with a CPU-based reconstruction.

Nowadays, the GPU has become one of the standard tools in high-performance computing (2). More and more GPUs are being applied in more and more applications because of their parallel computing ability and low cost. Among them, GPU-based applications of MRI reconstruction have been gradually recognized and widely adopted. Although early GPU programming was constrained and unfriendly, the development of GPU programming has provided ever more easy-to-use libraries and frameworks for programmers, and GPUs have played increasingly important roles in medical imaging, image reconstruction and image analysis in clinical applications. Despite many successful applications in the GPU-based medical imaging reconstruction community, some long-standing problems remain unsolved.

Firstly, GPUs' parallel architectures require re-designing the pipeline of the reconstruction algorithms. Although there are many libraries that help people employ GPUs, algorithms whose parallel structure is optimized before GPU programming begins can still bring larger improvements than any easy-to-use library; it is better to consider parallel structures in any custom-designed algorithm intended for GPU computing. In addition, hybrid architectures that combine GPU computing with traditional x86 CPU-based high-performance computing clusters are more and more popular, and cloud computing has appeared in industry. While software and hardware trends are not the primary problems of medical image computing, the ability to efficiently employ more sophisticated algorithms as faster technology emerges is still an important driving force, largely precluding any kind of convergence in algorithms (47).

In the future, the computing efficiency of custom-designed optimized algorithms, especially for MRI reconstruction based on GPUs and DL, should be considered jointly across their sequential and parallel parts, and low-cost Internet computation and storage services should also be seriously considered.

Acknowledgements

Funding: This work was partially supported by the National Natural Science Foundation of China (No. 61471350, 81729003), the Basic Research Program of Shenzhen (JCYJ20150831154213680), and the Key Laboratory for Magnetic Resonance and Multimodality Imaging of Guangdong Province (2014B030301013).

Footnote

Conflicts of Interest: The authors have no conflicts of interest to declare.
30. Sørensen TS, Atkinson D, Schaeffter T, Hansen MS. Real-time reconstruction of sensitivity encoded radial magnetic resonance imaging using a graphics processing unit. IEEE Trans Med Imaging 2009;28:1974-85.
31. Sørensen TS, Prieto C, Atkinson D, Hansen MS, Schaeffter T. GPU accelerated iterative SENSE reconstruction of radial phase encoded whole-heart MRI. Proc. ISMRM, Stockholm, Sweden, 2010:2869.
32. Saybasili H, Herzka DA, Barkauskas K, Seiberlich N, Griswold MA. A generic, multi-node, multi-GPU reconstruction framework for online, real-time, low-latency MRI. Proc. 21st Meet Int Soc Magn Reson Med, Salt Lake City, Utah, USA, 2013:838.
33. Inam O, Qureshi M, Malik SA, Omer H. GPU-accelerated self-calibrating GRAPPA operator gridding for rapid reconstruction of non-Cartesian MRI data. Applied Magnetic Resonance 2017;48:1055-74.
34. Li S, Chan C, Stockmann JP, Tagare H, Adluru G, Tam LK, Galiana G, Constable RT, Kozerke S, Peters DC. Algebraic reconstruction technique for parallel imaging reconstruction of undersampled radial data: application to cardiac cine. Magn Reson Med 2015;73:1643-53.
35. Zhuo Y, Sutton B, Wu XL, Haldar J, Hwu WM, Liang ZP. Sparse regularization in MRI iterative reconstruction using GPUs. Proc. International Conference on Biomedical Engineering and Informatics (BMEI), 2010:578-82.
36. Smith D, Gore J, Yankeelov T, Welch E. Real-time compressive sensing MRI reconstruction using GPU computing and split Bregman methods. International Journal of Biomedical Imaging 2012:864827.
37. Nam S, Akçakaya M, Basha T, Stehning C, Manning WJ, Tarokh V, Nezafat R. Compressed sensing reconstruction for whole-heart imaging with 3D radial trajectories: a graphics processing unit implementation. Magn Reson Med 2013;69:91-102.
38. Chang CH, Yu X, Ji JX. Compressed sensing MRI reconstruction from 3D multichannel data using GPUs. Magn Reson Med 2017;78:2265-74.
39. Kim D, Trzasko JD, Smelyanskiy M, Haider CR, Manduca A, Dubey P. High-performance 3D compressive sensing MRI reconstruction. Conf Proc IEEE Eng Med Biol Soc 2010;2010:3321-4.
40. Sabbagh M, Uecker M, Powell A, Leeser M, Moghari M. Cardiac MRI compressed sensing image reconstruction with a graphics processing unit. Proc. International Symposium on Medical Information and Communication Technology (ISMICT), Worcester, MA, 2016.
41. Xue H, Inati S, Sørensen TS, Kellman P, Hansen MS. Distributed MRI reconstruction using Gadgetron-based cloud computing. Magn Reson Med 2015;73:1015-25.
42. Zhuo Y, Wu XL, Haldar JP, Hwu WM, Liang ZP, Sutton BP. Accelerating iterative field-compensated MR image reconstruction on GPUs. Proc. IEEE International Symposium on Biomedical Imaging (ISBI), 2010:820-3.
43. Wang S, Su Z, Ying L, Peng X, Zhu S, Liang F, Feng D, Liang D. Accelerating magnetic resonance imaging via deep learning. Proc. IEEE International Symposium on Biomedical Imaging (ISBI), 2016:514-7.
44. Hammernik K, Knoll F, Sodickson D, Pock T. Learning a variational model for compressed sensing MRI reconstruction. Proc. the International Society of Magnetic Resonance in Medicine (ISMRM), 2016.
45. Lee D, Yoo J, Ye JC. Deep residual learning for compressed sensing MRI. Proc. IEEE International Symposium on Biomedical Imaging (ISBI), 2017.
46. Zhu B, Liu JZ, Rosen BR, Rosen MS. Neural network MR image reconstruction with AUTOMAP: automated transform by manifold approximation. Proc. the International Society of Magnetic Resonance in Medicine (ISMRM), 2017.
47. Eklund A, Dufort P, Forsberg D, LaConte SM. Medical image processing on the GPU - past, present and future. Med Image Anal 2013;17:1073-94.
48. Fessler J, Sutton B. Nonuniform fast Fourier transforms using min-max interpolation. IEEE Trans Signal Process 2003;51:560-74.
49. cuFFT User Guide. Available online: http://docs.nvidia.com/cuda/cufft/index.html
50. Blaimer M, Breuer F, Mueller M, Heidemann RM, Griswold MA, Jakob PM. SMASH, SENSE, PILS, GRAPPA: how to choose the optimal method. Top Magn Reson Imaging 2004;15:223-36.
51. Sodickson DK, Manning WJ. Simultaneous acquisition of spatial harmonics (SMASH): fast imaging with radiofrequency coil arrays. Magn Reson Med 1997;38:591-603.
52. Pruessmann KP, Weiger M, Scheidegger MB, Boesiger P. SENSE: sensitivity encoding for fast MRI. Magn Reson Med 1999;42:952-62.
53. Griswold MA, Jakob PM, Heidemann RM, Nittka M, Jellus V, Wang J, Kiefer B, Haase A. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med 2002;47:1202-10.
54. Lustig M, Donoho D, Pauly JM. Sparse MRI: the application of compressed sensing for rapid MR imaging. Magn Reson Med 2007;58:1182-95.
55. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44.
56. Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw 2015;61:85-117.
57. Knoll F. Leveraging the potential of neural networks for image reconstruction. Proc. the International Society of Magnetic Resonance in Medicine (ISMRM), 2017.
58. Després P, Jia X. A review of GPU-based medical image reconstruction. Phys Med 2017;42:76-92.
59. Vedaldi A, Lenc K. MatConvNet: convolutional neural networks for MATLAB. Proc. of the 23rd ACM International Conference on Multimedia, 2015:689-92.
60. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mane D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viegas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
61. Johnson DH, Narayan S, Flask CA, Wilson DL. Improved fat-water reconstruction algorithm with graphics hardware acceleration. J Magn Reson Imaging 2010;31:457-65.