FPGA Co-Processing Architectures for Video Compression

Alex Soohoo
Altera Corporation
101 Innovation Drive
San Jose, CA 95054, USA
(408) 544-8063

Overview

The push to roll out high definition enabled video and imaging equipment is creating numerous challenges for video system architects. The increased image resolution brings with it higher performance requirements for basic video data path processing and next-generation compression standards, outstripping that which stand-alone digital signal processors (DSPs) can provide. In addition, the system specifications require designers to support a range of standard and custom video interfaces and peripherals usually not supported by off-the-shelf DSPs. While it is possible to go the route of application specific integrated circuits (ASICs) or use application specific standard products (ASSPs), these can be difficult and expensive alternatives that might require a compromised feature set. Furthermore, these choices can hasten a short product life cycle and force yet another system redesign to meet varied and quickly changing market requirements.

Field programmable gate arrays (FPGAs) are an option that can bridge the flexibility gap in these types of designs. Additionally, with the increasing number of embedded hard multipliers and high memory bandwidth, the latest generation of FPGAs can enable customized designs for video systems while offering a manifold performance improvement over the fastest available stand-alone DSPs. Designers now have the ability with state-of-the-art FPGA co-processor design flows to implement high-performance DSP video and image processing applications. This new generation of tools facilitates the design of a system architecture that is more scalable and powerful than traditional DSP-only designs while at the same time taking advantage of the price and performance benefits of FPGAs.

Design Flow

The emergence of these new DSP design flows has made the combined DSP processor and FPGA co-processor architecture an attractive option for video and image processing systems. What has made this possible is the co-processor flow that merges the traditional C-language based development environments for programmable DSPs and hardware description language (HDL) tools for FPGAs with powerful system integration capabilities (see Fig. 1). Through clever system partitioning, designers now have the ability to leverage a legacy code base for DSPs and offload the most computationally intensive blocks of an algorithm to an FPGA to create systems optimized for both price/performance and time-to-market.

CP-VIDEO-1.0
Figure 1: Combined DSP Design Flow
[Diagram. Recoverable block labels: DSP Processor C-Language Development Environment; Co-Processor Integrated Development Flow; Co-Processor System Integration Tool.]

Figure 2: FPGA Co-Processor Flow
[Diagram. Recoverable block labels: FPGA Design Entry (HDL Development Tools, Optimized IP Functions, Model-Based Design, DSP Algorithm Simulation); System Integration (System Integration Tools); RTL Generation; RTL Synthesis; RTL Simulation; HW Programming; Debug & Verification.]

Software development environments for DSPs are quite mature, having been refined over many years to address the most common design bottlenecks. On the other hand, there are many options for designing and creating FPGA co-processors. The design of DSP systems with FPGAs can utilize both high-level algorithm and hardware description language (HDL) development tools as seen in Figure 2. The most straightforward approach is to create an entire design from scratch, writing custom DSP functions in HDL and then using standard FPGA design software. While it is possible to develop high-performance, optimized designs this way, it can be a time-consuming and labor-intensive effort. FPGA suppliers and third-party vendors now offer highly optimized, parameterizable, off-the-shelf intellectual property (IP), typically the most common video and image processing functions and key video compression algorithm blocks. These IP cores with well-defined high-speed interface wrappers can be quickly integrated into a system design, enabling shorter design cycles and an accelerated time-to-market.

Model-based design environments such as The MathWorks Simulink allow designers to develop, simulate and verify a DSP processing data path for an FPGA co-processor. Models can be built using a mix of proprietary and off-the-shelf DSP building blocks. FPGA design software can integrate with this environment, combining its capabilities with standard FPGA HDL synthesis, simulation and customized development tools.

Finally, new system integration tools enable rapid development of custom FPGA co-processor solutions and the ability to leverage existing solutions to add new capabilities and improve system performance. By automating the integration phase of system components and peripherals, this design software can allow users to focus attention on system-level requirements instead of the mundane, manual task of integrating individual blocks with varying requirements. For example, the job of creating and verifying the interface between an FPGA and a DSP can be complex. The newest system integration tools allow the designer to drop in a FIFO-based IP core and interface to an external processor without having to manage or consider the specific pin-out details. This can be critically important for a DSP software engineer with limited experience in FPGA design and hardware implementation.

Figure 3 and Figure 4 illustrate example DSP/FPGA co-processing architectures using the Texas Instruments external memory interface (EMIF) and the industry standard Serial RapidIO (SRIO) interface. These architectures can provide memory and peripheral expansion as well as the capability for increased processing performance. The latest generation of system integration tools can automatically generate a seamless bridge between the DSP and the FPGA, making it easier to implement algorithms defined at the block or component level without having to focus on the detailed device interface mapping.

Figure 3: Co-Processing With DSP - EMIF
[Block diagram. Recoverable labels: DSP, EMIF, FPGA, EMIF Interface, Switch Fabric, Memory Interface, Peripheral, Co-Processor, Local Memory, SDRAM, Flash.]

Figure 4: Co-Processing With DSP - SRIO
[Block diagram. Recoverable labels: DSP, 1x/4x SRIO (up to 12.5 Gbps), FPGA, SRIO IP Core, Switch Fabric, Memory Interface, EMIF, Peripheral, Co-Processor, Local Memory, SDRAM, Flash.]

FPGA Co-Processing for High Performance Video and Image Processing

The main justification for the FPGA co-processor design flow approach is the benefit of enhanced system price/performance. Properly architected designs can offload a DSP processor and execute computationally intensive blocks of a DSP algorithm in a more efficient parallel implementation on an FPGA. This is especially attractive for emerging video and image processing applications where DSP performance requirements are growing at the fastest rates.

Consider the typical video compression (encoding/decoding) processing chains. By taking a closer look at the pre-processing and post-processing halves, it is possible to identify the types of algorithms that might be partitioned between DSP processors and FPGAs to implement a video data path. Multiply-accumulate (MAC) intensive algorithms such as color space conversion (CSC), noise reduction filtering, scaling and image mixing/blending have little or no control flow component. For that reason, the bulk of the video processing chain should be implemented completely on an FPGA. Video compression algorithms, which have a well-defined mix of control and processing operations, might be implemented in a DSP processor or split between a DSP and FPGA depending on the system requirements. The following examples highlight the challenges and rationale for FPGA co-processor architectures.

Figure 5: Video Processing Chains
[Block diagram. Pre-processing chain labels: Input CSC, Noise Reduction, Scaler, Alpha Blending Mixer, OSD, Output CSC, Video Encoder. Post-processing chain labels: Video Decoder, Input CSC, Noise Reduction, Scaler, Alpha Blending Mixer, OSD, Output CSC.]

A simple video noise reduction filtering example shown in Figure 6 demonstrates the potential of the FPGA co-processor approach. For video pre-processing in a high definition encoding system, a 7x7 two-dimensional filter kernel is applied to broadcast HDTV 1080p video at 1920x1080 resolution, 30 frames per second, 24 bits per pixel. This operation will require over 9 giga multiply-accumulates per second (GMACs), more performance than the fastest commercially available DSP can offer. The same function can be implemented on a low-cost FPGA with headroom to spare.

Figure 6: High Definition Encoding System
[Block diagram of a broadcast encoding system. Recoverable labels: Digital Video HD Input, Noise Reduction Pre-processing in FPGA, H.264 Encoder, Network.]
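
The figure of over 9 GMACs quoted for the 7x7 noise reduction example can be reproduced with a short calculation. The sketch below is not part of the original paper. It assumes that the 24 bits per pixel are three 8-bit color components, that each component is filtered independently, and that the kernel is applied directly (one multiply-accumulate per tap, with no separability or symmetry savings). The function name `filter_mac_rate` is hypothetical.

```python
def filter_mac_rate(width, height, fps, kernel_size, components):
    """Return multiply-accumulates per second for a direct 2-D FIR filter.

    Assumes one MAC per kernel tap, per color component, per output pixel
    (a non-separable kernel with no symmetry optimizations).
    """
    pixels_per_second = width * height * fps
    taps = kernel_size * kernel_size
    return pixels_per_second * taps * components


# 1080p at 30 frames/s; 24 bits per pixel is taken as three 8-bit components.
macs = filter_mac_rate(width=1920, height=1080, fps=30,
                       kernel_size=7, components=3)
print(f"{macs / 1e9:.2f} GMAC/s")  # prints "9.14 GMAC/s"
```

Under these assumptions the workload is 62,208,000 pixels per second times 49 taps times 3 components, which is consistent with the "over 9 GMACs" stated in the text. If the kernel happened to be separable, the tap count per pixel would fall from 49 to 14, so the quoted figure is best read as the cost of a general 7x7 kernel.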
For video compression systems, FPGA co-processing architectures can create especially cost-effective solutions compared to platforms based on multiple DSPs. High-definition broadcast quality encoding utilizing video codecs MPEG2, MPEG4 and H.264 can be implemented with a single FPGA and DSP.

Figure 7: H.264 Encoding Co-Processing Partition

Figure 7 shows an example FPGA co-processor partition of the H.264 encoding standard. The FPGA has absorbed the sections of the algorithm that require the most cycles on the DSP, including the motion estimation block, entropy coding and the deblocking filter. The DSP can execute the remaining parts of the algorithm that are more control flow oriented and better mapped to a C-code implementation. Newer entropy coding techniques such as CAVLC and CABAC do not map well to a typical DSP instruction set and are best realized as hardware accelerated blocks on the FPGA.

In the case of the latest video compression standards, the FPGA co-processor architecture provides a number of advantages. When a standard is relatively new or in flux, many system developers prefer that some degree of flexibility be built into the design. When the video compression community converges on the optimal algorithmic approach to the parts of the standard that have some room for enhancement, the hardware architecture can be preserved with only modifications to the programmable parts of the system. The motion estimation block, in particular, leaves room to incorporate a range of different techniques for motion vector search. From the equipment vendor’s point of view, this flexibility allows for customization and differentiation that is not possible when the only choice is a fixed ASSP.

Conclusion

Performance requirements for video and image processing end equipment are growing in direct correlation with the new compression standards and higher resolution formats that are being adopted. FPGA co-processor system architectures, complemented by leading-edge design software, allow designers to implement these high performance DSP algorithms in a cost-effective, efficient manner and realize significant benefits.

Copyright © 2005 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
http://www.altera.com
