Intel Media Developers Guide
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR
INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A
SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must
not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined."
Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or
incompatibilities arising from future changes to them. The information here is subject to change without
notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may
cause the product to deviate from published specifications. Current characterized errata are available on
request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing
your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel
literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.
MPEG is an international standard for video compression/decompression promoted by ISO. Implementations of
MPEG CODECs, or MPEG enabled platforms may require licenses from various entities, including Intel
Corporation.
Intel, the Intel logo, Intel Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries
in the United States and other countries.
*Other names and brands may be claimed as the property of others.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
Copyright © 2008-2014, Intel Corporation. All Rights reserved.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations
that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please
refer to the applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice.
Notice revision #20110804
6 Appendixes
6.1 Encoder Configuration for Blu-ray* and AVCHD*
6.1.1 Encoder Configuration
6.1.2 GOP Sequence
6.1.3 SPS and PPS
6.1.4 HRD Parameters
6.1.5 Preprocessing Information
6.1.6 Closed-Captioned SEI Messages
6.1.7 Unsupported Features
6.1.8 Additional configuration for H.264 Stereo High Profile
6.2 Encoder Configuration for DVD-Video*
6.2.1 Encoder Configuration
6.2.2 GOP Sequence
6.2.3 Setting NTSC/PAL video_format in the sequence_display_extension header
6.2.4 Preprocessing Information
6.2.5 Closed Captioning
Revision Number | Description                                                | Date
2.0             | Updates to accompany Intel® Media SDK 2012 release         | September, 2011
2.1             | Updates accompanying Intel® Media SDK 2012 R2 release      | April, 2012
3.0             | Updates to accompany the Intel® Media SDK 2013 Release     | December, 2012
3.1             | Updates to accompany the Intel® Media SDK 2013 R2 Release  | June, 2013
4.0             | Updates to accompany the Intel® Media SDK 2014 Release     | December, 2013
1 About this Document
1.1 Overview
This document provides development hints and tips to help developers create the next
generation of applications, enabling their end-users to have a great experience creating,
editing, and consuming media content on Intel® Processor Graphics. Software development
best practices are described using the latest revision of the Intel® Media SDK. The Intel Media
SDK is free to download and free to use.
Before continuing with this document, be sure to download the latest version of this document
and the Intel Media SDK from the product page: http://intel.com/software/mediasdk
This guide starts with how to understand the sample applications bundled with installation and
expands from there to cover working with the Intel Media SDK from a software developer’s
perspective. It is intended to accompany the reference manuals
mediasdk-man.pdf
mediasdkmvc-man.pdf
mediasdkusr-man.pdf
mediasdkjpeg-man.pdf
to support rapid development of high performance media applications.
Source code examples are displayed as follows:
// Initialize DECODE
decode->DecodeHeader(bitstream, InitParam_d);
decode->Init(InitParam_d);
1.5 Specifications
The following table lists the specifications for the Intel® Media SDK 2014.
Video Encoders | H.264 (AVC and MVC), MPEG-2, JPEG*/Motion JPEG, HEVC (SW)
Video Decoders | H.264 (AVC and MVC), MPEG-2, VC-1, JPEG*/Motion JPEG, HEVC (SW)
- HEVC SW Decode and Encode in the standalone Intel Media SDK HEVC Software Pack
- Encode Enhancements
  o Improved bit rate control for the LookAhead bit rate control algorithm
  o Additional bit rate control algorithms such as Intelligent Constant Quality (ICQ)
  o Region of Interest (ROI) encoding
- Video Processing Enhancements
  o Frame Composition API
- Audio Decode and Encode in the standalone Intel Media SDK 2014 Audio Library: AAC Decode & Encode, MP3 Decode
- Container Splitter and Muxer API (sample): MPEG-2 TS and MPEG-4
- 3rd party plug-in marketplace
The Intel Media SDK optimized media libraries are built on top of Microsoft* DirectX*, DirectX
Video Acceleration (DXVA) APIs, and platform graphics drivers. Intel Media SDK exposes the
hardware acceleration features of Intel® Quick Sync Video built into 2nd, 3rd, and 4th generation
Intel® Core™ processors. Read more at the product page: http://intel.com/software/mediasdk
[Figure: Intel® Media SDK – Optimized Media Library for CPU; Intel® Media SDK – Optimized Media Library for Intel® Processor Graphics; Graphics Drivers]
While Intel Media SDK is designed to be a flexible solution for many media workloads, it focuses
only on the media pipeline components which are commonly used and usually the most in need
of acceleration. These are:
- Decoding from video elementary stream formats (H.264, MPEG-2, VC-1, JPEG*/Motion JPEG, and new: HEVC) to uncompressed frames
- Selected video frame processing operations
- Encoding uncompressed frames to elementary stream formats (H.264, MPEG-2, new: HEVC)
- New: Audio encode/decode and container split/muxing
Figure 2 A generic transcode pipeline. Intel® Media SDK accelerates a subset of the most
computationally demanding video elementary stream tasks.
Writing fully functional media applications capable of working with most media players requires
that the application handle these operations. Media SDK 2014 now provides rudimentary
audio codec features and container split/muxing functionality. An alternative is to leverage
any of the open frameworks such as FFmpeg. A whitepaper is available showing this type of
integration here: http://software.intel.com/en-us/articles/integrating-intel-media-sdk-with-ffmpeg-for-muxdemuxing-and-audio-encodedecode-usages.
FFmpeg integration is also showcased as part of the Media SDK Tutorial: http://software.intel.com/en-us/articles/intel-media-sdk-tutorial
Intel Media SDK is designed to provide the fastest possible performance on Intel® Quick Sync
Video hardware. This can be realized in traditional applications as well as Microsoft
DirectShow* and Microsoft Media Foundation* Transform (MFT) plugins. While the
complexities of the hardware accelerated implementation are hidden as much as possible, the
API retains some characteristics of the architecture underneath which need to be accounted
for in the design of any program using Intel Media SDK. These include asynchronous operation,
NV12 color format, and Intel Media SDK’s memory management infrastructure.
The Intel Media SDK API is high-level to provide portability. This future-proofs development
efforts. Applications need not be redesigned to take advantage of new processor features as
they emerge. The software implementation is designed to fill in where hardware acceleration
is not available.
Figure 3 Intel® Media SDK architecture showing software and hardware pathways
Many platforms from Intel and other vendors do not have hardware acceleration capabilities.
Intel Media SDK provides a software implementation using the same interface so its wide
range of encoding, decoding, and pixel processing capabilities can be available for most
platforms capable of running the supported operating systems. Thus applications developed
on systems without Intel® Quick Sync Video can run with hardware acceleration when moved
to a system with Intel® HD Graphics, or Intel® Iris™ —without changing a line of code.
Conversely, applications developed for 2nd, 3rd, or 4th generation Intel® Core™ processor-based
machines will still work on other systems.
The optimized software version of the Intel Media SDK can be used as a standalone package.
However, the speed advantages of Intel Media SDK come from the graphics hardware, and a
fully functional Intel® Processor Graphics driver must be installed to utilize the underlying
hardware acceleration capabilities. (The default driver shipped with the system may not
contain all of the necessary pieces. Please start your installation with a graphics driver
upgrade.) The hardware and software libraries provide identical syntax for application
development, but results may be different. The software library runs entirely on the CPU,
while the platform-specific libraries execute using the CPU and GPU via the DXVA / DDI
interfaces. Which one provides the best performance and quality will depend on several
variables such as application implementation, content, hardware, mix of CPU vs. GPU
operations, etc.
The dispatcher, which selects the version of the library used by your program, is an important
part of the Intel Media SDK architecture. For more information on the dispatcher, please see
section 4.3.
In general, the code in the samples directory can be compiled and distributed freely for Gold
releases. Beta releases have not finished validation and are made available for learning
purposes only.
The Microsoft* DirectShow* and Microsoft Media Foundation* Transform (MFT) sample codec
filters with source code are in this category. However, several binary-only utilities such as
splitters and audio codecs are included in the installation for development convenience only
and cannot be distributed.
The header files, sample executables, binary splitters/muxers, and manuals are classified as
“developer tools”. Your application documentation can instruct users to install the Intel Media
SDK, but cannot include these items.
You may repackage the software DLL to enable a self-contained install. However, since the
hardware DLL is included with the graphics driver, this update process should be left alone to
ensure system stability. A better solution may be to check the API version of the hardware
DLL in your program and offer hints to install/upgrade as needed. In general, the most recent
graphics driver/hardware DLL is the best one to use.
- A machine with suitable video processing hardware and OS (see specifications table 3 in section 1.5).
- A graphics driver including the Intel Media SDK hardware acceleration DLLs.
http://www.intel.com/products/processor_number/about/core.htm
Only Microsoft* Windows* 7 and Microsoft Windows 8/8.1 operating systems are currently
supported.
Each component is separate. The Intel Media SDK can be installed before or after the driver,
and the driver can be updated without re-installing Intel Media SDK. However, hardware
acceleration is only available if both components are installed successfully.
Updating the graphics driver along with each update to the Intel Media SDK, if not more
frequently, is highly recommended.
The media acceleration DLLs are distributed with the graphics driver, not the Intel Media SDK.
In many cases the default graphics driver may not provide all of the files and registry entries
needed. For best results, please update to the latest driver available from
http://downloadcenter.intel.com/ before attempting a new install. While some systems may
require a vendor-approved driver, the Intel drivers are usually appropriate. These drivers can
be found by selecting “Graphics” as the product family at the Download Center site.
The graphics driver installer populates the Intel Media SDK directories under “c:\program
files\Intel\Media SDK”, or “c:\program files (x86)\Intel\Media SDK” on 64 bit OS installs. The
Intel Media SDK hardware acceleration DLLs and certified HW Media Foundation Transforms
can be found here.
http://www.intel.com/support/graphics/sb/CS-022130.htm?wapkw=(graphics+and+media+control+panel)
The “Hello World” program in section 4.3.5 can be used as a quick check to verify that the Intel
Media SDK and driver components are both installed and functional.
- The readme in each sample package (build info and details for each sample)
- Intel Media SDK Filters Specifications (DLL info, pin interfaces, etc.)
- Intel Media SDK Sample Guide
Please note:
Each Media SDK sample is distributed as separate downloadable packages on the Media SDK
portal web page.
- Pipelines: the output from one stage provides the input to the next. There can be
multiple pipelines (such as in the multi transcode sample), but, as with the underlying
Intel Media SDK implementation, there is an assumption of a linear series of “media
building block” operations.
- Utilities: the sample_common directory contains utility classes for file I/O,
surface/buffer allocation, etc. These are used by all of the samples. Additional
functionality added here will be available to all samples.
- Opaque memory: Opaque memory is enabled for transcode pipelines, simplifying
surface memory management.
Repeating the sample disclaimer is especially important here. These are samples only, not
production-ready components, even though they are accessed through the same Microsoft
DirectShow/MFT interface as other production-ready filters.
These additional filters are distributed as binary only, and are “as is” with limited support. Why?
These filters are part of the developer package to ensure that the GUI applications can create
a complete Microsoft DirectShow graph. They are limited in functionality and are typically the
source of developer frustration when using the Intel Media SDK to construct custom graphs.
The internet offers alternatives to these filters, and developers are encouraged to experiment
with other third-party filters if problems occur.
In addition to the filters and transforms listed in Table 5 (Microsoft* DirectShow* Filter
Samples), the DirectShow sample package also contains the Microsoft DirectShow filters shown
in Table 8. These are utility filters to help you get started. They are not redistributable.
Developers may find that the Intel Media SDK sample filters do not connect to some third-party
filters. The developer package contains the supported interfaces of the Microsoft DirectShow
filters listed in the “Intel Media SDK Filter Specifications” document. The most prevalent reason
for the sample filters to refuse connection is that the color space is not supported. The Intel
Media SDK sample filters require the input data to be in NV12 format. NV12 is the native
format of the GPU, and thus the filters need this format to pass the data to the hardware for
acceleration. Developers may find the need to insert a color conversion filter upstream of the
sample decoder filter in order to connect the graph successfully. The Intel Media SDK does not
provide a sample color conversion filter at this time. This is left to the developer to implement.
Section 1: Introduces the Intel Media SDK session concept via a very simple sample.
In addition to demonstrating how to implement the most common workloads, the tutorials also
explain how to achieve optimal performance via a step-by-step approach. All tutorial samples are
located in a self-contained Visual Studio* 2010 solution file. They can be downloaded here:
http://software.intel.com/en-us/articles/intel-media-sdk-tutorial
The Intel® Media SDK is designed to reduce the complexities of decoding and encoding video
elementary streams or containers (new for Media SDK 2014).
Many video players read only container formats and do not directly support the H.264 or
MPEG-2 elementary streams output by Intel Media SDK encode. For these players, an
additional muxing stage to a container format is required.
For players capable of working with elementary and/or raw formats, please see the
Recommended Tools section.
The first step is obtaining some content to work with. The example in this section uses the
free Big Buck Bunny trailer which can be obtained from www.bigbuckbunny.org.
In this case the MPEG-4 container file holds 1 AVC/H.264 video stream and 1 audio stream.
YAMB (Yet Another MP4Box Graphical Interface) is an easy to use tool for common tasks, but
there are many others.
After selecting “Click to extract streams from AVI/MP4/MOV/TS files”, enter the container file.
The streams seen in MediaInfo should be available. Select the AVC stream and “Extract to Raw
Format”.
Other notable tools for this task are mp4creator and ffmpeg. Since they can be run from the
command line they have the added advantage of easy automation in a script or build step. For
example, this will demux the trailer as well:
C:\> ffmpeg -i <input.mp4 file> -vcodec copy -bsf h264_mp4toannexb -an -f h264 <output file for elementary stream>.h264
Several sites also host YUV sequences for encoder testing. These are packaged in many ways.
The files at http://trace.eas.asu.edu/yuv/ are an example of the YUV format usable by the Intel
Media SDK encoder console sample.
Here is another example with YAMB. First, select “Click to create an MP4 file…”
The mp4creator and ffmpeg tools are also able to perform this muxing step. Since they are
command line tools they are convenient to script or add to a Microsoft* Visual Studio* build
step.
http://software.intel.com/en-us/articles/integrating-intel-media-sdk-with-ffmpeg-for-muxdemuxing-and-audio-encodedecode-usages
http://software.intel.com/en-us/articles/intel-media-sdk-tutorial-simple-6-transcode-opaque-async-ffmpeg
Media SDK encoder note: the encoder can control insertion of sequence headers via the
"IdrInterval" parameter. Make sure that "GopPicSize" and "GopRefDist" values have been
specified explicitly to ensure correct behavior. Also keep in mind that "IdrInterval" has slightly
different behavior for H.264 vs. MPEG-2 encoding. More details can be found in the Media SDK
manual.
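For illustration, a sketch of an explicit GOP configuration (the values shown are illustrative, not recommendations):

// Explicit GOP structure for an H.264 encode session
mfxVideoParam par;
memset(&par, 0, sizeof(par));
par.mfx.GopPicSize  = 30; // distance between I-frames
par.mfx.GopRefDist  = 3;  // distance between anchor (I or P) frames
par.mfx.IdrInterval = 0;  // for H.264, 0 makes every I-frame an IDR frame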
The list below outlines some basic architectural concepts for any Intel Media SDK project. The
program’s entire architecture does not need to fit these principles, but the section working
with Intel Media SDK should have these characteristics:
This is a general overview of what solutions will need to look like to get best performance.
More information on performance is provided in Chapter 5.
The Decode, VPP, Encode, and User function groups can be used as building blocks to provide
the foundation for many different usage scenarios, such as:
While the manual is based on the C functions in mfxvideo.h, the samples use the C++ interface
in mfxvideo++.h. In general the C++ functions are simply a wrapper around the corresponding
C functions (see the appendix). The same function groups can be seen in both sets of
functions.
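For instance, a minimal sketch contrasting the two styles (session and parameter setup omitted):

// C interface (mfxvideo.h)
mfxStatus sts = MFXVideoDECODE_Init(session, &par);
// C++ wrapper (mfxvideo++.h): the object stores the session and forwards calls
MFXVideoDECODE decode(session);
sts = decode.Init(&par);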
If the machine has compatible hardware and the dispatcher can find a way to access it
(implying an appropriate graphics driver has been installed), the API calls will use the hardware
implementation.
If software mode is requested, or Intel Media SDK is initialized in auto mode and no hardware
implementation can be found, the software implementation will start.
Use the following flags to specify the OS infrastructure that hardware acceleration
should be based on:
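For example, a minimal sketch requesting hardware acceleration through the DirectX 11 infrastructure (MFX_IMPL_VIA_D3D9 selects the DirectX 9 DXVA2 path instead):

mfxVersion ver = {0, 1}; // request API 1.0 or later
mfxSession session;
mfxStatus sts = MFXInit(MFX_IMPL_HARDWARE | MFX_IMPL_VIA_D3D11, &ver, &session);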
Please refer to the Media SDK reference manual and header files for details about additional
implementation targets, primarily used to support systems with multiple GPUs.
Figure 5 The software implementation is also embedded in the hardware library to enable
software fallback.
Results from the software and hardware implementations can be different, in more ways than
just speed. While the software implementation is a reference for the hardware accelerated
version, in some cases different approaches are used if speed or quality can be improved.
The Media SDK dispatcher is also available as source code in the “opensource/mfx_dispatch”
folder, which is part of the Media SDK installer package.
An application interfaces with the Intel® Media SDK through the Dispatcher library. The
Dispatcher loads the appropriate library at runtime and sets up an SDK context called a
“session” to the application. The session delivers the API’s functionality and infrastructure
(threads, schedule/sync capability, etc.); the dispatcher provides the interface.
[Figure: The application talks to the Dispatcher, which loads a session exposing DECODE, VPP, ENCODE, and User components on top of a Core providing threads and the scheduler]
The Dispatcher (libmfx.lib) must be statically linked to your application. It is responsible for
locating and loading the correct library. In addition, it sets up the DirectX* context necessary
for communicating with the GPU. This occurs even for software sessions using only system
memory.
When an application initializes an Intel Media SDK session, the dispatcher will load either the
software library (MFX_IMPL_SOFTWARE) or the hardware library appropriate for the platform
(MFX_IMPL_HARDWARE).
Hardware | “Common Files” locations for 32- and 64-bit DLLs specified in the registry by the graphics driver installer. The DLL name includes codes for the target platform. If a DLL with the appropriate name is not found in the registry location used by the dispatcher, hardware session initialization will fail.
Software | Located by standard DLL search rules (i.e. system path). The Intel Media SDK installer updates the path with the install target bin directory. If libmfxsw64.dll (libmfxsw32.dll for MFXInit run in 32-bit mode) cannot be found, software session initialization will fail.
Here are some fundamental ideas behind Intel® Media SDK sessions. Implementation details
are subject to change, but there are several essential architectural components to be aware
of:
- Thread pool: started with session initialization to avoid thread creation overhead. The scheduler manages thread assignment to tasks.
- Memory management core: intended to enable the highest possible performance via CPU and GPU allocation management, copy minimization, fast copy operations, atomic lock/unlock operations, opaque memory, etc.
[Figure: The scheduler receives task requests from DECODE and ENCODE and assigns threads from the thread pool to tasks in progress]
All of these components work together to maximize asynchronous throughput of media tasks.
Since Intel Media SDK automatically creates several threads per session, this may interfere
with other threading approaches, especially in software mode. Please see chapter 5 for more
details on application performance optimization.
Parameters of MFXInit:
- impl: the requested implementation type (MFX_IMPL_SOFTWARE, MFX_IMPL_HARDWARE, MFX_IMPL_AUTO, etc.)
- ver: the minimum API version required (may be NULL)
- session: the returned session handle
Additional modes are provided for switchable graphics and multiple monitors. See
Appendix D in the manual for more details. Note: there is an _ANY mode for the
hardware and auto implementation types which extends support to scenarios with (or
without) multiple graphics devices.
SDK version is used for the version parameter of MFXInit, and is also used in the Intel Media
SDK Reference Manual. Externally, Intel Media SDK is generally referenced by release (for
example, Intel Media SDK 2012).
Intel Media SDK allows session initialization with an unspecified (null) version. This can be
useful to determine the highest level of API supported by the loaded DLL (hardware or
software). However, mismatches are possible when sessions are initialized with a null version:
For most cases we recommend specifying a version instead of using the null version feature.
The minimum version providing all functions used by the program is best, since this will allow
the greatest portability. For example, even if API version 1.3 is available, it is probably not
required if a session is only doing a simple encode—this can be accomplished by specifying
version 1.0, which would enable portability even to machines with old drivers. However, null
version can be an effective method of querying the age of the software and hardware DLLs
for the purpose of informing users to update.
int main()
{
    mfxVersion SWversion = {0,1}, HWversion = {0,1};
    mfxSession SWsession, HWsession;
    mfxStatus sts;
    // Initialize one software and one hardware session with a null (1.0)
    // version request, then query the API level each library supports
    sts = MFXInit(MFX_IMPL_SOFTWARE, &SWversion, &SWsession);
    if (MFX_ERR_NONE == sts) MFXQueryVersion(SWsession, &SWversion);
    sts = MFXInit(MFX_IMPL_HARDWARE, &HWversion, &HWsession);
    if (MFX_ERR_NONE == sts) MFXQueryVersion(HWsession, &HWversion);
    MFXClose(SWsession);
    MFXClose(HWsession);
}
Comparing the software and hardware API level can help determine if the graphics driver is out
of sync and needs to be updated. Note: beta releases can offer a software “preview” of
hardware capabilities, so a higher software API is expected.
Also note that there is a “mediasdk_sys_analyzer” tool distributed with the SDK package. The
tool analyzes the developer platform and reports back Media SDK related capabilities.
An Intel Media SDK session holds the context of execution for a given task, and may contain
only a single instance of DECODE, VPP, and ENCODE. Intel Media SDK’s intrinsic parallelism, in
terms of thread pool and scheduler, is well optimized for individual pipelines. For workloads
with multiple simultaneous pipelines additional sessions are required. To avoid duplicating the
resources needed for each session they can be “joined”. This sharing also enables task
coordination to get the job done in the most efficient way. Intel Media SDK 2.0 and later
provide the following functions for multiple simultaneous sessions:
MFXJoinSession    | The application then tells the SDK to share the task scheduler (thread pool) between parent and child sessions. Sharing the scheduler increases performance by minimizing the number of threads, which in turn minimizes the number of context switches.
MFXDisjoinSession | After the batch conversion is complete, the application must disjoin all child sessions prior to closing the root session.
MFXClose          | The application must also close the child session during de-initialization.
The performance impact with joined sessions is most noticeable when using the Intel Media
SDK with software based sessions. This is because the CPU is prone to oversubscription.
Joining sessions reduces the impact of this oversubscription.
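A sketch of the join/disjoin sequence (error handling omitted):

mfxVersion ver = {0, 1};
mfxSession parent, child;
MFXInit(MFX_IMPL_HARDWARE, &ver, &parent);
MFXInit(MFX_IMPL_HARDWARE, &ver, &child);
MFXJoinSession(parent, child);  // child now shares the parent scheduler/thread pool
/* ... run one pipeline per session ... */
MFXDisjoinSession(child);       // disjoin all children before closing the parent
MFXClose(child);
MFXClose(parent);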
With shared sessions, child session threads are placed under the control of the parent
scheduler, and all child scheduler requests are forwarded. In addition to the performance
benefits described above, this scheme enables coherent subtask dependency checking across
the joined sessions.
[Figure: The parent scheduler manages tasks in progress for its own DECODE/ENCODE/VPP components and those of a joined child session, whose scheduler forwards all requests to the parent]
The relationship between parent and child sessions can be fine-tuned with priority settings.
The Intel® Media SDK hides most of the complexity of efficiently pushing data through the
session infrastructure, but awareness of the internal mechanisms can be helpful for developing
applications.
- Surfaces need to be set up where the work will be done (CPU or GPU) to avoid extra copies.
- The session needs to buffer frames internally, as well as reorder them, to maximize performance.
- The session will estimate how many surfaces it needs. This needs to happen before work starts.
- As work is happening, some frame surfaces will be locked. The program controlling the session will frequently need to find an unlocked frame surface as it interacts with the session.
- SYSTEM_MEMORY specifies that CPU memory will be used. This is best for the software implementation.
- VIDEO_MEMORY specifies that GPU memory will be used. This is best for the hardware implementation.
[Figure: During initialization, DECODE/VPP/ENCODE components are bound to system memory (SW) or video memory (HW), with allocator callbacks supplied by the application]
The lock mechanism works much like other lock implementations. Internally, the Intel Media
SDK session increases a lock count when a surface is used and decreases the count when
done. Applications should consider the frame off limits when locked.
The application asks each stage how many surfaces are necessary. The total number of
surfaces to allocate is the sum of all stages.
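Pseudo code, sketched with the QueryIOSurf calls (parameter structures per the reference manual; error handling omitted):

mfxFrameAllocRequest dec_req, vpp_req[2], enc_req;
MFXVideoDECODE_QueryIOSurf(session, &dec_par, &dec_req);
MFXVideoVPP_QueryIOSurf(session, &vpp_par, vpp_req);  // vpp_req[0]=input, [1]=output
MFXVideoENCODE_QueryIOSurf(session, &enc_par, &enc_req);
// Surfaces shared between adjacent stages must satisfy both at once
mfxU16 num_dec_vpp = dec_req.NumFrameSuggested + vpp_req[0].NumFrameSuggested;
mfxU16 num_vpp_enc = vpp_req[1].NumFrameSuggested + enc_req.NumFrameSuggested;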
The Intel® Media SDK natively supports I/O from both system memory and Microsoft* Direct3D*
surfaces. Generally speaking, the Intel Media SDK performs best when the memory type
selected matches the way the SDK was initialized. For example, if an application initializes a
session with MFX_IMPL_SOFTWARE, performance for that session is improved by using system
memory buffers. SDK sessions initialized to use the hardware acceleration libraries
(MFX_IMPL_HARDWARE) will benefit from the use of Microsoft Direct3D surfaces. Not using
the correct memory type can cause significant performance penalties, because each frame of
data must be copied to and from the specified surface type.
Applications can control the allocation of surfaces via the mfxBufferAllocator and
mfxFrameAllocator interfaces. When these interfaces are not implemented, the Intel Media
SDK allocates system memory by default. See the “Memory Allocation and External Allocators”
section in the Intel Media SDK Reference Manual.
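A minimal sketch of registering an external frame allocator (the myAlloc, myLock, etc. callbacks are hypothetical application implementations):

mfxFrameAllocator allocator;
memset(&allocator, 0, sizeof(allocator));
allocator.pthis  = &myAllocatorContext; // hypothetical application context
allocator.Alloc  = myAlloc;
allocator.Lock   = myLock;
allocator.Unlock = myUnlock;
allocator.GetHDL = myGetHDL;
allocator.Free   = myFree;
MFXVideoCORE_SetFrameAllocator(session, &allocator);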
Since the application manages the surface pool, it must also be aware of which surfaces are
currently in use by asynchronous operations. Locked surfaces cannot be used so a search
must be done for an unused surface when additional work is added to an Intel® Media SDK
stage.
// Find a surface in the pool that is not locked by an asynchronous operation
mfxU16 GetFreeSurfaceIndex(mfxFrameSurface1* pSurfaces, mfxU16 nPoolSize)
{
    for (mfxU16 i = 0; i < nPoolSize; i++)
        if (0 == pSurfaces[i].Data.Locked)
            return i;
    return MSDK_INVALID_SURF_IDX;
}
// . . .
nIndex = GetFreeSurfaceIndex(m_pmfxSurfaces, m_numSurfaces);
sts = m_pmfxDEC->DecodeFrameAsync(&m_mfxBS,
                                  &(m_pmfxSurfaces[nIndex]),
                                  &pmfxOutSurface,
                                  &syncp);
The following diagram illustrates the buffer search process. Individual members
of the frame/surface pool are managed externally. Those currently in use are
flagged as locked, so finding an unlocked surface requires the application to
search for one that is available.
[Figure: Bitstream feeding DEC; frame pool surfaces currently in use are marked L (locked)]
set_up_params(&InitParam_d);
set_up_allocator(InitParam_d);
// Initialize DECODE from the stream header
decode->DecodeHeader(bitstream, InitParam_d);
decode->Init(InitParam_d);
// Decoding loop: feed the bitstream until it is exhausted
while (bitstream has data)
    decode->DecodeFrameAsync(bitstream, work_surface, &out_surface, &syncp);
// Drain loop: a NULL bitstream flushes frames still buffered in the decoder
while (buffered frames remain)
    decode->DecodeFrameAsync(NULL, work_surface, &out_surface, &syncp);
// Close components
decode->Close();
delete_surfaces_and_allocator();
Here the decoder is initialized from the stream via DecodeHeader. Since operation is asynchronous,
two loops are required.
When the bitstream is done, frames may still be in the decoder, so an additional loop is required to
drain them.
set_up_params(&InitParam_e);
set_up_allocator(InitParam_e);
// Initialize ENCODE
encode->Init(InitParam_e);
// Encoding loop: submit uncompressed frames
while (frame_in)
    encode->EncodeFrameAsync(NULL, frame_in, bitstream_out, &syncp);
// Drain loop: NULL input flushes frames still buffered in the encoder
while (buffered data remains)
    encode->EncodeFrameAsync(NULL, NULL, bitstream_out, &syncp);
// Close components
encode->Close();
delete_surfaces_and_allocator();
Function     | Behavior
DecodeHeader | Parses the input bit stream and populates the mfxVideoParam structure. Parameters are NOT validated; the decoder may or may not be able to decode the stream.
Query        | Validates the input parameters in the mfxVideoParam structure completely. Returns the corrected parameters (if any) or MFX_ERR_UNSUPPORTED if parameters cannot be corrected. Parameters set to zero are not validated.
QueryIOSurf  | Does NOT validate the mfxVideoParam input parameters except those used in calculating the number of input surfaces.
The introduction of Intel® Iris™ Pro Graphics, Intel® Iris™ Graphics and Intel® HD Graphics (4200+
Series) has enabled the Intel Media SDK to evolve the use of this setting by expanding it to
include seven distinct target usages, each with its own particular balance of quality and
performance. The refined granularity of the TU settings enables the Media SDK library to
introduce specific algorithms for a particular setting. The Intel Media SDK 2013 R2 includes
the first of these TU specific optimizations: Look Ahead.
The Look Ahead enhancements do carry a cost. Besides increased latency, there is increased
memory consumption because the frames being analyzed must be stored. The amount of
memory used is directly related to the LookAheadDepth. LA may also increase the encode
execution time. The analysis work is done asynchronously to the actual encoding and the
encoding step is completed once the desired parameters are determined. The performance
impact is ~20% using TU7, while for TU1-4 the look ahead performance impact compared to
VBR mode is negligible. Of course, developers are encouraged to experiment and make a
determination on what is best for their application.
The number of frames the encoder analyzes is controlled by the developer. Set the look ahead
depth via the mfxExtCodingOption2::LookAheadDepth parameter. Valid values are from 10 to
100 inclusive. The default (and recommended) value is 40, which can be selected by setting this
parameter to 0.
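A configuration sketch (encPar is the encoder's mfxVideoParam; the extended buffer is attached in the usual way):

mfxExtCodingOption2 co2;
memset(&co2, 0, sizeof(co2));
co2.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
co2.Header.BufferSz = sizeof(mfxExtCodingOption2);
co2.LookAheadDepth  = 40; // number of frames analyzed ahead of encoding
mfxExtBuffer* eb = (mfxExtBuffer*)&co2;
encPar.mfx.RateControlMethod = MFX_RATECONTROL_LA; // select the LookAhead BRC
encPar.ExtParam    = &eb;
encPar.NumExtParam = 1;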
There are some limitations to LA that need to be noted for the Intel Media SDK 2014 release:
- Look Ahead is only available on Intel® Iris™ Pro Graphics, Intel® Iris™ Graphics and Intel®
HD Graphics (4200+ Series)
- Intel Media SDK must be initialized to use API version 1.7 or newer for
MFX_RATECONTROL_LA, and 1.8 or newer for MFX_RATECONTROL_LA_CRF.
- Only progressive content is supported at this time
- In certain circumstances HRD requirements may be violated
These core filters are triggered by any inconsistencies between the mfxVideoParam struct
parameters (as listed in Figure 9) for input and output surfaces. The input and output
parameters are compared and filters are switched on or off as required. If there is a difference
requiring a filter to change the frame, the filter will run. If there are no differences in the
affected parameters, the filter does not run. While Intel Media SDK can minimize the overhead
of additional pipeline stages and much of the work can be done in parallel, there may be a
significant performance cost associated with adding additional steps.
The following pseudo code illustrates the steps required for VPP processing. These are very
similar to encode and decode.
// Initialize
vpp->Init(InitParam_v);
// run VPP for all frames in input
while (frame_in) {
vpp->RunFrameVPPAsync(frame_in, frame_out, sync_v);
SyncOperation(sync_v);
}
//drain loop
while (buffered data remains) {
vpp->RunFrameVPPAsync(NULL, frame_out, sync_v);
SyncOperation(sync_v);
}
// Close components
vpp->Close();
Figure 10: VPP I/O and pseudo code.
When the input is interlaced and the output is progressive, the Intel® Media SDK enables
deinterlacing. If the application cannot detect the input picture structure, set PicStruct to
MFX_PICSTRUCT_UNKNOWN for the input format at VPP initialization, and provide the correct
input frame picture structure type on every frame.
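A short sketch of this per-frame tagging (names illustrative):

// At VPP initialization: picture structure not yet known
vppParams.vpp.In.PicStruct = MFX_PICSTRUCT_UNKNOWN;
// Per frame: tag each input surface with its detected structure
pSurfaceIn->Info.PicStruct = MFX_PICSTRUCT_FIELD_TFF; // e.g. top field first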
VPP is generally nested in the pipeline after DECODE. In this case, use the output parameters
of the decoder to initialize and process VPP frames.
The following pseudo code example demonstrates how to configure the SDK VPP with
parameters from decode:
if (MFX_ERR_NONE == sts) {
    memcpy(&mfxVppParams.vpp.In, &mfxDecParams.mfx.FrameInfo, sizeof(mfxFrameInfo));
    memcpy(&mfxVppParams.vpp.Out, &mfxVppParams.vpp.In, sizeof(mfxFrameInfo));
    mfxVppParams.vpp.Out.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
}
...
sts = m_pmfxDEC->DecodeFrameAsync(pmfxBS, pmfxSurfaceDecIn,
&pmfxSurfaceDecOut, &mfxSyncpDec);
if (MFX_ERR_NONE == sts) {
m_pmfxVPP->RunFrameVPPAsync(pmfxSurfaceDecOut, pmfxSurfaceVPPOut,
NULL, &mfxSyncpVPP);
}
To perform scaling, define the region of interest using the CropX, CropY, CropW and CropH
parameters to specify the input and output VPP parameters in the mfxVideoParam structure.
These parameters should be specified per frame.
[Figure: Scaling with crops — the region outside CropW/CropH in the output is filled with black]
To maintain the aspect ratio, letter and pillar boxing effects are used. The letter boxing
operation adds black bars above and below the image. Pillar boxing is the vertical variant of
letter boxing. As a result, the aspect ratio parameters of the image are maintained and non-
symmetrical changes in the image size are compensated for using the black bars.
Operation                                         | Input size | Input crops   | Output size | Output crops
Horizontal stretching                             | 720x480    | 0,0,720,480   | 640x480     | 0,0,640,480
16:9 → 4:3 with letter boxing at top and bottom   | 1920x1088  | 0,0,1920,1088 | 720x480     | 0,36,720,408
4:3 → 16:9 with pillar boxing at left and right   | 720x480    | 0,0,720,480   | 1920x1088   | 144,0,1632,1088
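Expressed as VPP initialization parameters, the letter boxing row above might look like this sketch:

// 16:9 1920x1088 input letter boxed into a 4:3 720x480 output
vppParams.vpp.In.Width  = 1920; vppParams.vpp.In.Height  = 1088;
vppParams.vpp.In.CropX  = 0;    vppParams.vpp.In.CropY   = 0;
vppParams.vpp.In.CropW  = 1920; vppParams.vpp.In.CropH   = 1088;
vppParams.vpp.Out.Width = 720;  vppParams.vpp.Out.Height = 480;
vppParams.vpp.Out.CropX = 0;    vppParams.vpp.Out.CropY  = 36;  // 36-pixel black
vppParams.vpp.Out.CropW = 720;  vppParams.vpp.Out.CropH  = 408; // bars top/bottom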
Intel recommends using the frame rate provided by DECODE for the VPP input parameters.
If VPP is initialized to perform deinterlacing with an input frame rate of 29.97 and an output
frame rate of 23.976, the Inverse Telecine algorithm is used.
Note that the frame rate conversion algorithm does not take into account input timestamps.
Therefore, the number of output frames does not depend on any inconsistencies in the input
timestamps. In the Intel Media SDK samples, the MFXVideoVPPEx class demonstrates frame
rate conversion based on timestamps.
See “<InstallFolder>\samples\sample_mfoundation_plugins\readme-mfoundation-plugins.rtf”
for details.
More complex frame rate conversion approaches can be implemented as user plugins.
Please note that the Intel Media SDK implementation of frame rate conversion assumes
constant frame rate. Do not use frame rate conversion for variable frame rate (VFR) inputs.
Many implementations of Intel® Iris™ Pro Graphics, Intel® Iris™ Graphics and Intel® HD Graphics
(4200+ Series) support the creation of interpolated frame content by using the
MFX_FRCALGM_FRAME_INTERPOLATION algorithm. If the ratio of input-to-output frame rate is
not supported, the VPP operation will report MFX_WRN_FILTER_SKIPPED. Commonly
supported conversion ratios are 1:2 and 2:5 (useful for creating 60Hz content from 30Hz and
24Hz, respectively).
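A sketch of requesting frame interpolation through the frame rate conversion extended buffer:

mfxExtVPPFrameRateConversion frc;
memset(&frc, 0, sizeof(frc));
frc.Header.BufferId = MFX_EXTBUFF_VPP_FRAME_RATE_CONVERSION;
frc.Header.BufferSz = sizeof(mfxExtVPPFrameRateConversion);
frc.Algorithm = MFX_FRCALGM_FRAME_INTERPOLATION;
mfxExtBuffer* eb = (mfxExtBuffer*)&frc;
vppParams.ExtParam    = &eb;
vppParams.NumExtParam = 1;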
*Some platforms have global settings for properties such as Denoise. If such a setting is
enabled, the Media SDK denoise operation may be ignored.
/* Declarations (types from mfxstructures.h) */
mfxU32 denoise = MFX_EXTBUFF_VPP_DENOISE;
mfxExtVPPDoNotUse dnu;
mfxVideoParam conf;
mfxExtBuffer* eb = (mfxExtBuffer*)&dnu;
/* Put the denoise filter on the DONOTUSE list */
memset(&dnu, 0, sizeof(dnu));
dnu.Header.BufferId = MFX_EXTBUFF_VPP_DONOTUSE;
dnu.Header.BufferSz = sizeof(mfxExtVPPDoNotUse);
dnu.NumAlg = 1;
dnu.AlgList = &denoise;
memset(&conf, 0, sizeof(conf));
conf.IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY |
                 MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
conf.NumExtParam = 1;
conf.ExtParam = &eb;
conf.vpp.In.FourCC = MFX_FOURCC_YV12;
conf.vpp.Out.FourCC = MFX_FOURCC_NV12;
conf.vpp.In.Width = conf.vpp.Out.Width = 1920;
conf.vpp.In.Height = conf.vpp.Out.Height = 1088;
/* preprocessing initialization */
MFXVideoVPP_Init(session, &conf);
Although mixing user code and the Intel Media SDK functions in the application was possible
before the 2.0 release, it required data synchronization between the user-defined functions
and the Intel Media SDK functions. A key benefit of the newer API is that it allows integration
of user-defined functions into an asynchronous Intel Media SDK pipeline.
Assume that the user-defined module performs Function(input, output) {algo}. Submit specifies
the input and output argument structure of Function, which the Intel Media SDK scheduler
treats as an abstract task. The SDK calls Submit to check the validity of the I/O parameters. If
successful, Submit returns a task identifier to be queued for execution after the SDK resolves
all input dependencies. Execute is the actual implementation of the Function algorithm. The
SDK calls Execute for task execution after resolving all input dependencies.
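A sketch of the callback table a user module registers (layout per mfxplugin.h; the My* functions are hypothetical application implementations):

mfxPlugin my_user_module;
memset(&my_user_module, 0, sizeof(my_user_module));
my_user_module.pthis          = &myModuleState; // hypothetical module state
my_user_module.PluginInit     = MyPluginInit;
my_user_module.PluginClose    = MyPluginClose;
my_user_module.GetPluginParam = MyGetPluginParam;
my_user_module.Submit         = MySubmit;  // validates I/O, returns a task
my_user_module.Execute        = MyExecute; // runs the algorithm for a task
my_user_module.FreeResources  = MyFreeResources;
MFXVideoUSER_Register(session, 0, &my_user_module);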
Note that user-defined functions are asynchronous. This means that the output is not ready
when ProcessFrameAsync returns. The application must call SyncOperation at the
corresponding sync_point to get the function output when it is available. Alternatively, the
output can be pipelined to other SDK functions without synchronization. (See the Intel Media
SDK Reference Manual for details on asynchronous operation.)
MFXVideoUSER_Register(session,0,&my_user_module);
MFXVideoDECODE_Init(session, decoding_configuration);
MFXVideoVPP_Init(session, preprocessing_configuration);
/* Initialize my user module */
MFXVideoENCODE_Init(session, encoding_configuration);
do {
/* load bitstream to bs_d */
MFXVideoDECODE_DecodeFrameAsync(session, bs_d, surface_w, &surface_d, &sync_d);
MFXVideoVPP_RunFrameVPPAsync(session, surface_d, surface_v, NULL, &sync_v);
MFXVideoUSER_ProcessFrameAsync(session, &surface_v, 1, &surface_u, 1, &sync_u);
MFXVideoENCODE_EncodeFrameAsync(session, NULL, surface_u, bs_e, &sync_e);
MFXVideoCORE_SyncOperation(session, sync_e, INFINITE);
/* write bs_e to file */
} while (!end_of_stream);
MFXVideoENCODE_Close(session);
/* Close my user module */
MFXVideoVPP_Close(session);
MFXVideoDECODE_Close(session);
MFXVideoUSER_Unregister(session);
A rotation module task is associated with a single frame and is bundled with input and output
frame surfaces. The 180-degree rotation algorithm requires two swapping operations. First,
pixels within each line are swapped with respect to the middle pixel. Then lines are swapped
with respect to the middle line of the image. The algorithm can be parallelized to process image
lines in chunks, since lines can be processed independently. Call Execute several times for a
particular task (frame) until the frame is fully processed. The code for this example is shown
below.
/* Fragment from the plugin's Submit() implementation: bind the task to its
   I/O surfaces and hand the task handle back to the scheduler */
m_pTasks[ind].pProcessor->Init(surface_in, surface_out);
*task = (mfxThreadTask)&m_pTasks[ind];
return MFX_ERR_NONE;
}
To display Stereo 3D content on platforms that support Microsoft* Direct3D 11.1, please see
Microsoft documentation and sample code found here:
http://go.microsoft.com/fwlink/p/?linkid=238402
For Microsoft* Windows* 7 platforms using 2nd Generation Intel® Core™ processors or later, a
static library (.lib) is provided as part of the Intel Media SDK that allows creation of a
GFXS3DControl object that can be used for detection and control of stereo 3D-capable
televisions and embedded DisplayPort* devices. These display devices (monitors) may offer a
wide variety of PC-to-display configuration options, and the options offered can vary.
Some stereo 3D televisions allow the user to manually select the method used to interpret
the data being received. Televisions that use HDMI* 1.4 allow this selection to be made
programmatically. Televisions supporting this standard are not required to support all possible
methods of receiving stereo images from an attached device. The PC and display device
must negotiate a common, compatible method of outputting display data, and this is a
necessary first step when displaying 3D content. The GFXS3DControl object provides the
interfaces for this negotiation.
Some stereoscopic displays require activation of “active glasses” or other manual tasks to
correctly display the 3D content they are receiving, and there is nothing the GFXS3DControl
control can do to automate this display-specific part of the process.
#include "igfx_s3dcontrol.h"
…
IGFX_S3DCAPS s3DCaps = {0}; // platform's S3D capabilities
IGFX_DISPLAY_MODE mode = {0}; // platform display mode
// Query capabilities and choose a supported mode (omitted), then set the mode.
// It may take some time for some monitors to actually start displaying the mode.
m_pS3DControl->SwitchTo3D(&mode);
Once a Stereo 3D mode is enabled, two DXVA2 video processors can be created, one for the
left channel and one for the right. This is accomplished by using a method available in the
DXVA2VideoService to select the view prior to creating each Video Processor object.
To render different content to left and right views, a DXVA2 VideoProcessBlt call is needed
for each video processor. Once both views have been rendered, a single Direct3D*
Present call will cause the display controller to continuously send both the left and right views
to the monitor. All content that is not part of the render output of these DXVA2 processors
(for example, the Windows* desktop) will be displayed to both the left and right views. Two
VideoProcessBlt operations should occur between each Present call, to update the left and
right views.
When all stereoscopic displaying is complete, an application can return the system to 2D display
mode by calling the SwitchTo2D method of the GFXS3DControl.
The following code example showcases how to insert the SEI message type
“user_data_registered_itu_t_t35”, commonly used to carry closed captioning information.
#define SEI_USER_DATA_REGISTERED_ITU_T_T35 4
typedef struct
{
    unsigned char countryCode;
    unsigned char countryCodeExtension;
    unsigned char payloadBytes[10]; // Containing arbitrary captions
} userdata_reg_t35;

userdata_reg_t35 m_userSEIData;          // filled with caption data
mfxPayload m_mySEIPayload = {0};
m_mySEIPayload.Type    = SEI_USER_DATA_REGISTERED_ITU_T_T35;
m_mySEIPayload.BufSize = sizeof(userdata_reg_t35);
m_mySEIPayload.NumBit  = m_mySEIPayload.BufSize * 8;
m_mySEIPayload.Data    = (mfxU8*)&m_userSEIData;
mfxPayload* m_payloads[1];
m_payloads[0] = &m_mySEIPayload;
// Attach via mfxEncodeCtrl (Payload = m_payloads, NumPayload = 1) when
// calling EncodeFrameAsync
1) Configure the encoder with a set of parameters closest to the desired SPS/PPS set
2) After initializing the encoder, call GetVideoParam() with the extended buffer
mfxExtCodingOptionSPSPPS to extract the SPS and PPS selected by the encoder (make
sure to allocate sufficient space for storage of the SPS and PPS buffers)
3) Make the desired modifications to the SPS and/or PPS buffer
4) Apply the changes to the encoder by calling Reset(), referencing the new mfxVideoParam
structure including the mfxExtCodingOptionSPSPPS extended buffer. Note that when
using this method the SPS/PPS parameters set via mfxVideoParams will be overwritten by
the custom SPS/PPS buffer
The simple example below shows how to manually control SPS “constraint_set” values. In this
case we are setting “constraint_set1” flag to 1 (indicates constrained baseline profile). The
same approach can be used to control any SPS or PPS parameters manually.
NOTE: The Media SDK API was extended as part of Media SDK 2012 R2 release. This API, 1.4,
now allows direct API control to set Constrained Baseline Profile
(MFX_PROFILE_AVC_CONSTRAINED_BASELINE). Manual SPS control is not required.
mfxExtCodingOptionSPSPPS m_extSPSPPS;
MSDK_ZERO_MEMORY(m_extSPSPPS);
m_extSPSPPS.Header.BufferId = MFX_EXTBUFF_CODING_OPTION_SPSPPS;
m_extSPSPPS.Header.BufferSz = sizeof(mfxExtCodingOptionSPSPPS);
m_extSPSPPS.SPSBuffer  = m_SPSBuffer;         // pre-allocated storage (assumed)
m_extSPSPPS.SPSBufSize = sizeof(m_SPSBuffer);
m_extSPSPPS.PPSBuffer  = m_PPSBuffer;
m_extSPSPPS.PPSBufSize = sizeof(m_PPSBuffer);
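A sketch of the surrounding steps 2-4 (encoder object and buffer storage illustrative):

mfxExtBuffer* eb = (mfxExtBuffer*)&m_extSPSPPS;
mfxVideoParam par;
memset(&par, 0, sizeof(par));
par.ExtParam    = &eb;
par.NumExtParam = 1;
sts = m_pmfxENC->GetVideoParam(&par); // step 2: extract the current SPS/PPS
/* step 3: modify the SPS buffer here, e.g. set the constraint_set1 flag */
sts = m_pmfxENC->Reset(&par);         // step 4: apply the modified headers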
An easy way to achieve concurrency is to host each full session in a separate thread as
illustrated in the example below (based on the Media SDK sample_decode sample).
// Thread procedure: each thread runs a complete decode pipeline
sts = Pipeline.Init(pParams);
MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, 1);
sts = Pipeline.RunDecoding();
MSDK_CHECK_RESULT(sts, MFX_ERR_NONE, 1);
// For simplicity, error handling (such as in original decode sample) ignored
Pipeline.Close();
DecrementThreadCount();
return 0;
}
WaitForSingleObject(ThreadsDoneEvent, INFINITE);
delete [] pDecodeThreads;
CloseHandle(ThreadsDoneEvent);
}
- If processing a large number of streams with high resolution on a 32-bit OS you may run
into memory limitations. The error message provided by Media SDK when reaching the
memory limit is not very descriptive (due to underlying vague return codes from the OS).
- If you encounter this limitation, please consider using a 64-bit OS (greater pool of
available graphics memory).
Note: The white paper describes how to configure the encoder for temporal scalability but not
how the decoder application must interpret the encoded stream.
For temporal streams a "nal_unit_header_svc_extension" is attached to each slice header, as
described in the standard (G.7.3.1.1). The header contains "temporal_id", which can be used to
extract a certain temporal layer. The logic is simple: skip all slices with temporal_id >
target_temporal_id and keep all slices with temporal_id <= target_temporal_id.
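In pseudo code, the extraction step might look like:

// Keep only the temporal layers up to target_temporal_id
for (each NAL unit in the stream) {
    if (NAL unit carries nal_unit_header_svc_extension) {
        if (temporal_id > target_temporal_id)
            drop the slice;  // belongs to a higher temporal layer
        else
            keep the slice;
    } else {
        keep the NAL unit;   // non-slice data passes through unchanged
    }
}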
Prior to the introduction of the Intel Media SDK 2013, supporting hardware acceleration in a
multi-GPU environment was bound by a fundamental constraint: the Intel Graphics adapter
needed to have a monitor associated with the device to be active. This constraint was due to
the capabilities of the Microsoft DirectX 9 infrastructure that the Intel Media SDK and
associated graphics driver were based upon. The introduction and corresponding support of
DirectX 11 in both the Intel Media SDK and graphics driver has simplified the process
of developing applications to utilize Intel's fixed function hardware acceleration, even when
the Intel graphics device is not connected to an external display device.
Applications wishing to leverage the Intel Media SDK’s hardware acceleration library
when a discrete card is the primary device, or on devices without a monitor attached –
such as “Session 0” modes, are required to initialize the Intel Media SDK to utilize the
DirectX11 infrastructure, as well as provide its own memory allocation routines that
manage DirectX 11 surfaces.
The following code illustrates the correct initialization parameters for initializing the
library. MFX_IMPL_AUTO_ANY will scan the system for a supporting adapter, while
MFX_IMPL_VIA_D3D11 indicates the need for DirectX 11 support.
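A minimal sketch of such an initialization:

mfxVersion ver = {0, 1};
mfxSession session;
mfxStatus sts = MFXInit(MFX_IMPL_AUTO_ANY | MFX_IMPL_VIA_D3D11, &ver, &session);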
In addition to initializing the library, the application also needs to create the correct
DirectX device context on the correct adapter. The following code is a simplistic
illustration of how to enumerate the available graphics drivers on the system, and
create the DirectX device on appropriate adapter. In this case the g_hAdapter handle is
actually pointing to the Intel adapter which is in the secondary position.
g_hAdapter = GetIntelDeviceAdapterHandle(session);
if (NULL == g_hAdapter)
    return MFX_ERR_DEVICE_FAILED;
// Feature levels to request (as in the Media SDK sample code)
static const D3D_FEATURE_LEVEL FeatureLevels[] = {
    D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0,
    D3D_FEATURE_LEVEL_10_1, D3D_FEATURE_LEVEL_10_0 };
D3D_FEATURE_LEVEL pFeatureLevelsOut;
HRESULT hres = D3D11CreateDevice(g_hAdapter,
                                 D3D_DRIVER_TYPE_UNKNOWN,
                                 NULL,
                                 0,
                                 FeatureLevels,
                                 (sizeof(FeatureLevels) / sizeof(FeatureLevels[0])),
                                 D3D11_SDK_VERSION,
                                 &g_pD3D11Device,
                                 &pFeatureLevelsOut,
                                 &g_pD3D11Ctx);
if (FAILED(hres))
    return MFX_ERR_DEVICE_FAILED;
// g_pDXGIDev and g_pDX11VideoDevice are CComQIPtr smart pointers (as in the
// sample code), so these assignments perform the required QueryInterface calls
g_pDXGIDev = g_pD3D11Device;
g_pDX11VideoDevice = g_pD3D11Device;
g_pVideoContext = g_pD3D11Ctx;
*deviceHandle = (mfxHDL)g_pD3D11Device;
return MFX_ERR_NONE;
}
IDXGIAdapter* GetIntelDeviceAdapterHandle(mfxSession session)
{
    mfxIMPL impl;
    MFXQueryIMPL(session, &impl);
    // MFX_IMPL_HARDWARE..MFX_IMPL_HARDWARE4 correspond to adapters 0..3
    mfxU32 adapterNum = 0;
    switch (MFX_IMPL_BASETYPE(impl)) {
    case MFX_IMPL_HARDWARE2: adapterNum = 1; break;
    case MFX_IMPL_HARDWARE3: adapterNum = 2; break;
    case MFX_IMPL_HARDWARE4: adapterNum = 3; break;
    }
    HRESULT hres =
        CreateDXGIFactory(__uuidof(IDXGIFactory2), (void**)(&g_pDXGIFactory));
    if (FAILED(hres)) return NULL;
    IDXGIAdapter* adapter;
    hres = g_pDXGIFactory->EnumAdapters(adapterNum, &adapter);
    if (FAILED(hres)) return NULL;
    return adapter;
}
Finally, applications also need to ensure they are using the appropriate DirectX 11
routines for its memory allocator. Examples for this are available as part of the Media
SDK sample code.
When a system supports display output to multiple monitors, each display is connected to the
output of a graphics adapter. One graphics adapter may support multiple displays. For
example, on many systems Intel’s processor graphics adapter may be connected to an analog
VGA* monitor and to an HDMI* monitor. The Windows* operating system will assign the
primary (or “main”) display as the default display device and use the associated adapter as the
acceleration device if the application does not provide specific information about which device
it wants to use. If the system contains one or more discrete graphics adapters in addition to
the integrated graphics adapter, the user can set a display connected to the discrete adapter
to be the primary display. In this case, applications that request the services of the default
display adapter will not be able to take advantage of Intel’s Quick Sync Video acceleration.
When an application creates a Media SDK session using the ‘default’ adapter, a call to
MFXQueryIMPL() may not return a MFX_IMPL_HARDWARE status if the default adapter does
not provide acceleration support for the Media SDK API.
Some platforms allow the user to dynamically select which graphics adapter is responsible for
rendering to a single display for specific applications. For example, a user can select a game to
execute on a discrete graphics adapter and a media application to execute on Intel’s integrated
graphics adapter. On these “switchable graphics” systems, the primary display adapter is
changed to match the desired adapter for each application (process). The resources and
capabilities of the inactive adapter are not available to applications, as the only adapter seen
by the application is the current, active adapter. From the application’s point of view,
Media SDK operations may report MFX_WRN_PARTIAL_ACCELERATION if a session is initialized
for hardware acceleration but the current configuration does not allow acceleration to be
used.
Useful tools for isolating an Intel® Media SDK API issue (encode, decode, processing):

DirectShow/MFT plugin              | Is the problem still there with a similar filter (i.e. a different MPEG-2 decoder)?
Intel® Media SDK-based application | Intel® GPA trace: which components is the app using? MediaSDK_tracer tool: which parameters are used at decode/VPP/encode setup? For each frame?
Sample application                 | Can the problem be localized to a specific stage of the sample app?
Software only                      | Does the problem also exist in hardware? In software versions of different releases?
Hardware only                      | Also in software? Do outputs change with different graphics driver versions? Different platforms?
1. Specify an output file. By default, logging for encode, decode, VPP, and sync is disabled.
To enable, check per-frame logging. Note: the tracer tool appends to the log file.
2. Press Start.
3. The tool can capture from a running process, but it is often helpful to start the application
after starting the trace tool.
The tool provides an easy-to-use suite of optimization tools for analyzing and optimizing games,
media, and other graphics intensive applications.
Please also refer to the following paper for details on how to analyze a Media SDK application
using the Intel® GPA tool:
http://software.intel.com/en-us/articles/using-intel-graphics-performance-analyzer-gpa-to-analyze-intel-media-software-development-kit-enabled-applications/
The data that identifies PAL or NTSC is stored in the video_format value of the
sequence_display_extension header. Since this data is purely informational to the Intel Media
SDK, it is not directly accessible through the mfxVideoParam structure. Instead, the application
can make the Encoder write this data into the stream by specifying a
mfxExtCodingOptionSPSPPS structure with the required headers. This method allows the
application to define custom SPS and PPS information in the stream.
Please refer to “Custom control of encoded AVC SPS/PPS data” chapter for details on manual
customizations to SPS/PPS data.