FPGA Co-Processing Architectures For Video Compression
Alex Soohoo
Altera Corporation
101 Innovation Drive
San Jose, CA 95054, USA
(408) 544-8063
Overview
The push to roll out high-definition-enabled video and imaging equipment
is creating numerous challenges for video system architects. The
increased image resolution brings with it higher performance
requirements for basic video data path processing and next-generation
compression standards, outstripping what stand-alone digital signal
processors (DSPs) can provide. In addition, system specifications
require designers to support a range of standard and custom video
interfaces and peripherals usually not supported by off-the-shelf DSPs.
While it is possible to go the route of application-specific integrated
circuits (ASICs) or application-specific standard products (ASSPs),
these can be difficult and expensive alternatives that might require a
compromised feature set. Furthermore, these choices can lead to a short
product life cycle and force yet another system redesign to meet varied
and quickly changing market requirements.

Field programmable gate arrays (FPGAs) are an option that can bridge the
flexibility gap in these types of designs. Additionally, with the
increasing number of embedded hard multipliers and high memory
bandwidth, the latest generation of FPGAs can enable customized designs
for video systems while offering a manifold performance improvement over
the fastest available stand-alone DSPs. With state-of-the-art FPGA
co-processor design flows, designers can now implement high-performance
DSP video and image processing applications. This new generation of
tools facilitates the design of a system architecture that is more
scalable and powerful than traditional DSP-only designs while at the
same time taking advantage of the price and performance benefits of
FPGAs.

Design Flow

The emergence of these new DSP design flows has made the combined DSP
processor and FPGA co-processor architecture an attractive option for
video and image processing systems. What has made this possible is the
co-processor flow that merges the traditional C-language based
development environments for programmable DSPs and hardware description
language (HDL) tools for FPGAs with powerful system integration
capabilities (see Figure 1). Through clever system partitioning,
designers can now leverage a legacy code base for DSPs and offload the
most computationally intensive blocks of an algorithm to an FPGA to
create systems optimized for both price/performance and time-to-market.
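
The partitioning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block names and load figures are invented for the example rather than measured on any real codec, and `choose_offload` is a hypothetical helper, not part of any vendor tool. It assumes a profiler has reported how much of the DSP's cycle budget each block needs, and it moves the heaviest blocks to the FPGA until the remaining load fits.

```python
def choose_offload(profile, dsp_budget):
    """Greedily pick blocks to move to the FPGA until the DSP load fits.

    profile:    dict mapping block name -> load in percent of DSP cycles
                (illustrative numbers, assumed to come from a profiler)
    dsp_budget: maximum total load, in percent, the DSP may keep
    """
    remaining = dict(profile)
    offloaded = []
    # Visit the heaviest blocks first; they give the largest relief per move.
    for name, _load in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if sum(remaining.values()) <= dsp_budget:
            break
        offloaded.append(name)
        del remaining[name]
    return offloaded, remaining


# Hypothetical profile of a video encoder, in percent of available DSP cycles.
profile = {
    "motion_estimation": 90,
    "transform_quant": 40,
    "entropy_coding": 30,
    "rate_control": 10,
}
offloaded, kept_on_dsp = choose_offload(profile, dsp_budget=80)
print("offload to FPGA:", offloaded)
print("keep on DSP:", sorted(kept_on_dsp))
```

A real partitioning decision would also weigh data transfer cost across the DSP/FPGA interface and the amount of control flow in each block, which this sketch deliberately ignores.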
CP-VIDEO-1.0
Figure 1: Combined DSP Design Flow
[Diagram: DSP Processor C-Language Development Environment; FPGA
Co-Processor Development Flow; System Integration Tool]

Figure 2: FPGA Co-Processor Flow
[Diagram: Design Entry (HDL Development Tools, Optimized IP Functions,
Integrated Model-Based Design); DSP Algorithm Simulation; System
Integration (System Integration Tools); RTL Generation; RTL Synthesis;
RTL Simulation; HW Programming; Debug & Verification]

Software development environments for DSPs are quite mature, having
been refined over many years to address the most common design
bottlenecks. On the other hand, there are many options for designing and
creating FPGA co-processors. The design of DSP systems with FPGAs can
utilize both high-level algorithm and hardware description language
(HDL) development tools, as seen in Figure 2. The most straightforward
approach is to create an entire design from scratch, writing custom DSP
functions in HDL and then using standard FPGA design software. While
this makes it possible to develop high-performance, optimized designs,
it can be a time-consuming and labor-intensive effort. FPGA suppliers
and third-party vendors now offer highly optimized, parameterizable,
off-the-shelf intellectual property (IP), typically covering the most
common video and image processing functions and key video compression
algorithm blocks. These IP cores, with well-defined high-speed interface
wrappers, can be quickly integrated into a system design, enabling
shorter design cycles and an accelerated time-to-market.

Model-based design environments such as The MathWorks Simulink allow
designers to develop, simulate and verify a DSP processing data path for
an FPGA co-processor. Models can be built using a mix of proprietary and
off-the-shelf DSP building blocks. FPGA design software can integrate
this environment, combining its capabilities with standard FPGA HDL
synthesis, simulation and customized development tools.

Finally, new system integration tools enable rapid development of custom
FPGA co-processor solutions and the ability to leverage existing
solutions to add new capabilities and improve system performance. By
automating the integration phase of system components and peripherals,
this design software can allow users to focus attention on system-level
requirements instead of the mundane, manual task of integrating
individual blocks with varying requirements. For example, the job of
creating and verifying the interface between an FPGA and a DSP can be
complex. The newest system integration tools allow the designer to drop
in a FIFO-based IP core and interface to an external processor without
having to manage or consider the specific pin-out details. This can be
critically important for a DSP software engineer with limited experience
in FPGA design and hardware implementation.

Figure 3 and Figure 4 illustrate example DSP/FPGA co-processing
architectures using the Texas Instruments external memory interface
(EMIF) and the industry standard Serial RapidIO (SRIO) interface. These
architectures can provide memory and peripheral expansion as well as the
capability for increased processing performance. The latest generation
of system integration tools can automatically generate a seamless bridge
between the DSP and the FPGA, making it easier to implement algorithms
defined at the block or component level without having to focus
on the detailed device interface mapping.

Figure 3: Co-Processing With DSP - EMIF
[Block diagram: DSP, EMIF, FPGA, EMIF Interface, Switch Fabric, Memory
Interface, Local Memory, Peripheral, Co-Processor, SDRAM, Flash]

Figure 4: Co-Processing With DSP - SRIO
[Block diagram: SRIO link (up to 12.5 Gbps), FPGA, Memory Interface,
Memory, IP Core, Co-Processor, Flash]

... image mixing/blending have little or no control flow component. For
that reason, the bulk of the video processing chain should be
implemented completely on an FPGA. Video compression algorithms, which
have a well-defined mix of control and processing operations, might be
implemented in a DSP processor or split between a DSP and FPGA depending
on the system requirements. The following examples highlight the
challenges and rationale for FPGA co-processor architectures.

Figure 5: Video Processing Chains
[Block diagram: Pre-processing (Input, CSC, Noise Reduction, Scaler);
Alpha Blending Mixer; OSD; Post-processing (CSC, Output, Video Encoder)]
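
To illustrate why functions such as image mixing/blending have little or no control flow, the following Python sketch models an alpha blend over 8-bit samples. It is a software model for explanation only, not FPGA code: the function name `alpha_blend`, the 0 to 255 alpha range, and the round-to-nearest scheme are assumptions made for this example. Every output sample is computed with the same multiply-add arithmetic and no data-dependent branch, which is the property that lets such a stage map onto a fixed hardware pipeline.

```python
def alpha_blend(fg, bg, alpha):
    """Blend two equal-length sequences of 8-bit samples.

    alpha is an integer in 0..255: 255 selects fg, 0 selects bg.
    The same arithmetic is applied to every sample, with no branch
    that depends on the pixel data.
    """
    if len(fg) != len(bg):
        raise ValueError("fg and bg must have the same length")
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be in 0..255")
    inv = 255 - alpha
    # Adding 127 before the integer division by 255 rounds to nearest.
    return [(alpha * f + inv * b + 127) // 255 for f, b in zip(fg, bg)]


foreground = [200, 255, 0, 64]
background = [100, 0, 255, 64]
print(alpha_blend(foreground, background, 255))  # foreground only
print(alpha_blend(foreground, background, 0))    # background only
print(alpha_blend(foreground, background, 128))  # roughly an even mix
```

In hardware, the per-sample expression would typically become a small pipeline built from embedded multipliers, with one result produced per clock cycle; the list comprehension here simply stands in for that streaming behavior.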