HDV Whitepaper
WHITE PAPER
HDV™ Devices
INTRODUCTION
Abstract
We are rapidly transitioning into a high definition world. Driven by this trend, the demand for HD
content creation is increasing. Although a plethora of high-end HD creation tools existed, no
cost-effective production tools were available until the advent of the HDV™ format.
The HDV format was established on September 30, 2003 by four companies: Canon Inc., Sharp
Corporation, Sony Corporation, and Victor Company of Japan, Limited. The concept of the HDV format
was to develop a high definition standard capable of inexpensively recording high quality HD video
using conventional DV recording media. Building on the mechanisms of a DV camcorder reduces
development costs and improves development efficiency.
Efficient bit-rate reduction while retaining the high quality of HD images is made possible by means
of the MPEG-2 compression scheme. Compressing the large quantity of HD image data with MPEG-2
in a portable recorder requires complex signal processing in a small silicon area. Advancements in
semiconductors and signal processing technology have now made it possible to use the HDV format
as a standard for low cost content creation.
To this end, Sony introduced a compact digital HD video camcorder for professional use,
the HVR-Z1 series, which was put on the market at the beginning of 2005. Since the launch of the
HDR-FX1 (consumer camcorder), the HVR-Z1 series and the HVR-M10 (compact deck) series, the HDV
format has become the most popular HD recording format with about 37,000 HDV 1080i units shipped
during the first 6 months of availability.
This paper will explain the HDV format and the technologies employed in the development of these
professional devices.
Trademark Notice:
HDV and HDV logo are trademarks of Sony Corporation and Victor Company of Japan, Limited.
HDV Format
Picture format notation
The picture format may be written in a form such as “1080/60i.” The notation is read as follows:
1080/60i
1080: number of effective scanning lines
60: frame / field frequency
i: “i” indicates interlace scanning; “p” indicates progressive scanning
Frame / field frequency basically indicates how many images are produced in one second. If this fre-
quency is 60, then that means 60 images are produced in one second.
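The notation can be split mechanically into its three parts. The following is a minimal sketch, assuming the notation always has the form lines/frequency followed by "i" or "p"; the names PictureFormat and parse_picture_format are illustrative and do not come from the HDV specification.

```python
import re
from typing import NamedTuple


class PictureFormat(NamedTuple):
    lines: int      # number of effective scanning lines
    frequency: int  # frame / field frequency (images per second)
    scanning: str   # "interlace" or "progressive"


def parse_picture_format(notation: str) -> PictureFormat:
    """Split a notation such as "1080/60i" into its three parts."""
    match = re.fullmatch(r"(\d+)/(\d+)([ip])", notation.strip())
    if match is None:
        raise ValueError(f"unrecognized picture format: {notation!r}")
    lines, frequency, scan = match.groups()
    scanning = "interlace" if scan == "i" else "progressive"
    return PictureFormat(int(lines), int(frequency), scanning)


print(parse_picture_format("1080/60i"))
```

Notations that do not follow this pattern raise a ValueError rather than being guessed at.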
Progressive scanning
The DTV broadcast transmission infrastructures give the broadcaster the option of using an
interlaced system or a non-interlaced system, the latter known as progressive scanning. The
progressive scanning system was adopted initially in computer displays, which do not require
considerable transmission bandwidth. In progressive scanning, each line is scanned in sequential
order from the top to the bottom of the picture. The entire 720 lines (or 1080 lines for 1080p) are
displayed in one scanning operation.
Recording HD images on DV tapes
The HDV tape transport mechanisms are based on the DV helical scan format. Therefore, videotapes
used for DV recording can also be used for HD recording, and the recording time is equivalent.
Because the DV tape specification does not spell out the exact chemical formulations or
manufacturing methods, but only the physical characteristics, there are many possible formulations
that arrive at the target specifications. Some tape formulations and manufacturing methods yield
better quality, robustness and stability than others, so the actual “in the field use” experience will
vary significantly from one tape brand or model to another. Although most people think of
videotapes in terms of the magnetic materials or lubricants used, few realize that the plastic
substrate employed contributes significantly to the tracking stability. High quality “premium” tapes
such as Sony’s “Digital Master™” are preferred not only for HDV but also for DV recordings because
their dimensional stability is maintained over a wide temperature range. The benefit is that tracking
will be maintained even when recording in extremely high temperatures and then editing in a cool
environment.
Helical scan
When videotape recording was being developed in the 1950s, engineers faced a serious problem.
Slow tape speed was desirable for longer recording time, but the venerable stationary head tape
recording scheme could not yield the high writing speeds required for high-quality video recording.
A means of achieving both high writing speeds and long recording times needed to be invented. The
breakthrough solution, “helical scanning,” became the basis for all analog and digital videotape
recording to date.
In the helical scan process, the tape is pulled at slow speed, and the high writing speed is achieved by
helically wrapping the tape around a rapidly rotating drum with four (in the case of DV and HDV)
small embedded record heads. This recording scheme produces recorded tracks that run diagonally
across the tape from one edge to the other. In other words, the recorded tracks are parallel to each
other but are at an angle to the edge of the tape. Analog formats stored one video field or frame
every drum revolution. Digital recording schemes continuously quantify and store the instantaneous
signal level as a numerical value, producing considerable amounts of data. To handle the greater
data-rates generated by digital recording formats, a segmented recording scheme is utilized where
multiple tracks are used to record a single video frame (for example, ten tracks for DV and HDV 720p).
For DV and HDV recording, the tracks are subdivided into sectors that carry specific types of payload
or operation data, such as ITI (Insert Track Information), which carries tracking information.
DV tape recording
HDV 1080i and HDV 720p devices are downward compatible with DV, from which both HDV tape
transport mechanisms were derived. The DV format records the digital signal following a segmented
recording scheme with ten tracks per frame for NTSC (480/60i) and twelve tracks per frame for PAL
(576/50i). The video, audio, and subcode payloads are recorded into individual sectors within each
track.
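The two track counts lead to the same number of tracks written per second, which a short calculation makes visible. This is a sketch only: the nominal frame rates of 30 and 25 frames per second are assumptions (the exact NTSC rate is slightly lower), and the names below are illustrative.

```python
# Tracks per frame as stated for the DV format.
TRACKS_PER_FRAME = {"480/60i": 10, "576/50i": 12}

# Assumption: nominal frame rates (two interlaced fields per frame).
NOMINAL_FRAMES_PER_SECOND = {"480/60i": 30, "576/50i": 25}


def tracks_per_second(system: str) -> int:
    """Return how many helical tracks are written each second."""
    return TRACKS_PER_FRAME[system] * NOMINAL_FRAMES_PER_SECOND[system]


for system in TRACKS_PER_FRAME:
    print(system, tracks_per_second(system))
```

Under these assumptions both systems write 300 tracks per second.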
HDV 720p
The HDV 720p specification reduced product design and manufacturing costs by adopting the same
track and sector structure as DV. The ITI (tracking signal), subcode and overwrite areas of the
sector are also used for HDV recording. HDV 720p devices store the entire HDV payload, which
encompasses MPEG video (18.3 Mbps), audio (384 Kbps), error correction and the visual search
signal, in the sector used exclusively for video by the DV format. Ten TS tracks combined contain
one error correction unit.
HDV 1080i
The HDV 1080i specification does not follow the DV tape footprint as HDV 720p does. In order to
accommodate a higher bit-rate, the HDV 1080i specification adopted a unique track and sector struc-
ture that maximizes the length of the track for MPEG-2 payload recording. The video payload bit-rate
is 25.4Mbps, which is 28% higher than what is possible by strictly adhering to the DV track and sector
structure. The remaining area contains audio at 384Kbps, error correction data and two visual search
signals.
• Consumer HD media
Long-form high definition program distribution on optical disc requires high-capacity media.
Blu-ray Disc™ optical media is specifically designed for this purpose. Blu-ray Disc recorders can be
connected directly to compatible HDV devices via their i.Link® ports for efficient and cost-effective
distribution of high quality prerecorded HD content. A single-layer Blu-ray Disc provides 115
minutes of native HDV 1080i recording. It is not necessary to resort to high compression schemes or
to use expensive, time-consuming multi-pass encoders for long-form program distribution.
• Film-out
HDV 1080i has a very high vertical resolution of 1080 lines. It is possible to deliver 1.85:1 or 2.35:1
film formats with excellent visual quality. The high spatial and temporal resolution of HDV 1080i
produces very detailed 35mm film out.
MPEG-2
Profile                Simple   Main      SNR-Scalable   Spatially-Scalable   High             Studio
Pictures               I, P     I, P, B   I, P, B        I, P, B              I, P, B          I, P, B
Chroma sampling        4:2:0    4:2:0     4:2:0          4:2:0                4:2:2 or 4:2:0   4:2:2 or 4:2:0

HIGH level
Samples per Line       -        1920      -              -                    1920             1920
Lines per Frame        -        1152      -              -                    1152             1080
Frames per Second      -        60        -              -                    60               60
Max. Bit-Rate (Mbps)   -        80        -              -                    100              300

MAIN level
Samples per Line       720      720       720            720                  720              720
Lines per Frame        576      576       576            576                  576              576
Frames per Second      30       30        30             30                   30               30
Max. Bit-Rate (Mbps)   15       15        15             15                   20               50

LOW level
Samples per Line       -        352       -              -                    -                -
Lines per Frame        -        288       -              -                    -                -
Frames per Second      -        30        -              -                    -                -
Max. Bit-Rate (Mbps)   -        4         -              -                    -                -
In the case of the HDV format, efficient bit-rate reduction while retaining the high quality of HD
images is possible by means of the MPEG-2 MP@HL-1440 (Main Profile at High-1440 Level)
compression scheme highlighted above.
The Moving Picture Experts Group (MPEG) created the MPEG-2 standard as a “compression toolkit”
that could accommodate a wide range of picture sizes, from standard definition to high definition, at
a higher picture quality for a given bit-rate. MPEG-2 was approved in 1994 as a standard intended for
delivery of high quality digital video. It is the compression scheme used for DVDs, direct digital
broadcast satellites, terrestrial and cable high-definition TV (HDTV), digital standard definition
broadcast (SDTV), and cable TV (CATV).
The Moving Picture Experts Group showed its wisdom by not rigidly specifying the compression
algorithms. Instead, it specified only the syntax for storing and transmitting the compressed data,
as well as the decoder. This approach freed the encoder manufacturers to continue to refine the
encoding algorithm. The only constraint is that an encoder must produce valid MPEG-2 streams that
can be decompressed by any MPEG-2 compliant decoder.
MPEG-2 realizes very high compression efficiencies while maintaining high video quality by taking
advantage of temporal redundancies within a sequence of images. The MPEG-2 codec works in two
stages. In the first stage, every video frame is divided into 8 x 8 pixel luminance blocks and 8 x 8
sample chrominance blocks, each chrominance block covering a 16 x 16 pixel area in 4:2:0. One
macroblock contains four luminance blocks and two chrominance blocks. The blocks are compressed
using DCT-based intra-frame compression techniques similar to those used by DV. Then, using the
first compressed image as a reference frame (called an I-frame), the second stage eliminates
redundant information, keeping only those parts of the following images (B- and P-frames) that
differ from the reference image. During playback, the decoder reconstructs all images based on the
reference image and the “difference data” contained in the B- and P-frames. This combination of I,
B and P frames is known as a Group of Pictures (GOP).
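To give a feel for the scale of the block structure, the number of macroblocks and blocks in one 1440 x 1080 HDV 1080i frame can be counted. This is a rough sketch, assuming the usual MPEG-2 4:2:0 layout in which one macroblock covers 16 x 16 luminance pixels and partial macroblocks at the picture edge are rounded up; the function name is illustrative.

```python
import math

MACROBLOCK_SIZE = 16              # assumption: 16 x 16 luminance pixels per macroblock
LUMA_BLOCKS_PER_MACROBLOCK = 4    # four luminance blocks per macroblock
CHROMA_BLOCKS_PER_MACROBLOCK = 2  # two chrominance blocks per macroblock


def block_counts(width: int, height: int) -> dict:
    """Count macroblocks and 8 x 8 blocks needed to cover one frame."""
    columns = math.ceil(width / MACROBLOCK_SIZE)
    rows = math.ceil(height / MACROBLOCK_SIZE)  # a partial row is rounded up
    macroblocks = columns * rows
    return {
        "macroblocks": macroblocks,
        "luminance_blocks": macroblocks * LUMA_BLOCKS_PER_MACROBLOCK,
        "chrominance_blocks": macroblocks * CHROMA_BLOCKS_PER_MACROBLOCK,
    }


print(block_counts(1440, 1080))
```

Under these assumptions a frame is covered by 90 x 68 = 6120 macroblocks.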
[Figure: a 15-frame Group of Pictures (frames 1 to 15, followed by the start of the next group), showing forward prediction and backward prediction]
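Because a B-frame is predicted from both an earlier and a later reference frame, an encoder normally sends the later reference before the B-frames that depend on it. The following is a minimal sketch of that reordering. The 15-frame pattern with two B-frames between references is an assumption made for illustration, as is the function name; trailing B-frames are simply appended here, whereas in a real stream they would wait for the next I-frame.

```python
def coded_order(display_types: str) -> list:
    """Return display indices in the order an encoder would send them."""
    order = []
    pending_b = []  # B-frames waiting for their later reference frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # An I- or P-frame is sent first, then the B-frames that needed it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # simplification for B-frames at the end of the pattern
    return order


gop = "IBBPBBPBBPBBPBB"  # assumed 15-frame pattern in display order
print(coded_order(gop))
```

For the shorter pattern "IBBPBBP" the result is [0, 3, 1, 2, 6, 4, 5]: each P-frame moves ahead of the two B-frames that precede it in display order.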
Video ES
The raw output of an MPEG-2 encoder is called a video elementary stream or video ES. The data rate
of the HDV 1080i video elementary stream is 25Mbps.
Audio ES
The HDV audio is also compressed using the MPEG-2 compatible MPEG-1 Audio Layer II audio codec.
The data rate of the compressed audio elementary stream is 384Kbps, the highest data rate permissi-
ble, providing good compression efficiencies while maintaining high audio quality.
Transport Stream (TS)
The next stage multiplexes the video and audio PES into a single stream for storage or transmission.
The MPEG standard defines two methods for multiplexing video and audio elementary stream data:
Program and Transport. The Transport Stream method was selected for HDV recording because it sim-
plifies detection of the start and end of frames as well as facilitating recovery from packet loss or
corruption.
Thus, the video PES and audio PES streams are multiplexed to form a single Transport Stream.
Transport stream packets have a fixed size of 188 bytes. The PES packet size is variable (for
example, 2048 bytes) and is typically much longer than a transport packet. Thus, each PES packet is
subdivided into multiple TS packets as described below.
As shown in the graphic above, the PES header is placed at the beginning of a transport packet pay-
load, following the transport packet header. The remaining PES packet content fills the payloads of
consecutive transport packets until all the PES packet data is used up. The final transport packet
may be padded with blank information (digital ones) to make it conform to the specified 188 Byte
packet size. The transport stream containing the video and audio packetized elementary streams is
transmitted via the i.Link interface to another compatible HDV device for dubbing or for storage and
editing on a PC hard drive.
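The subdivision of one PES packet into fixed-size transport packets can be sketched as follows. This is a simplified illustration, not a complete multiplexer: the PID value is an arbitrary placeholder, timing fields are omitted, and the fill bytes of all ones (0xFF) for a short final packet are carried in an adaptation field ahead of the payload, which is the usual MPEG-2 systems mechanism for stuffing.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
SYNC_BYTE = 0x47


def packetize_pes(pes: bytes, pid: int) -> list:
    """Split one PES packet into 188-byte transport packets (simplified)."""
    packets = []
    max_payload = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes
    for counter, offset in enumerate(range(0, len(pes), max_payload)):
        chunk = pes[offset:offset + max_payload]
        stuffing = max_payload - len(chunk)
        # adaptation_field_control: 1 = payload only, 3 = adaptation field + payload
        afc = 3 if stuffing else 1
        # payload_unit_start_indicator (0x40) marks the packet carrying the PES header
        start_flag = 0x40 if offset == 0 else 0x00
        header = bytes([
            SYNC_BYTE,
            start_flag | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            (afc << 4) | (counter & 0x0F),  # 4-bit continuity counter
        ])
        adaptation = b""
        if stuffing:
            length = stuffing - 1  # length byte does not count itself
            adaptation = bytes([length])
            if length:
                adaptation += b"\x00" + b"\xff" * (length - 1)  # flags, then fill
        packets.append(header + adaptation + chunk)
    return packets


pes_packet = bytes(2048)      # placeholder PES packet of 2048 bytes
ts_packets = packetize_pes(pes_packet, pid=0x100)  # hypothetical PID
print(len(ts_packets), "transport packets")
```

With 184 payload bytes per packet, a 2048-byte PES packet fills eleven transport packets completely and leaves 24 bytes for a twelfth, which is stuffed to the full 188 bytes.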
CHALLENGES
The native MPEG editing process described above is computationally intensive, and initially made
“native” long-GOP editing impractical. Thanks to the advent of powerful new microprocessors, cost-
effective memory and sophisticated software algorithms, efficient MPEG-2 long-GOP editing is now a
reality.
HD CODEC
HD Codec Engine
In order to produce a compact camera with a signal processing engine and MPEG-2 encoder suitable
for high quality HDV 1080i recording, new small-sized silicon devices capable of complex signal
processing were required. The most advanced semiconductor design and manufacturing technologies
were used to create the “HD Codec Engine” for Sony’s HDR-FX1 and HVR-Z1 series camcorders. The
“HD Codec Engine” consists of four main LSI groups: the Baseband Signal Processor, the HD-MPEG
Video Encoder, the HDV Streaming Processor, and the HD-MPEG Video Decoder. These
high-performance LSIs were developed by utilizing manufacturing innovations like the “submicron
process rule,” an advanced semiconductor technology that enables extreme miniaturization and low
power consumption. To put it in perspective, the logic required to implement the complex
algorithms used by the processing, encoding, streaming and decoding blocks exceeds 5.4 million
transistors.
Compression relies on redundancy within an image, and within groups of images in the case of MPEG,
to effectively reduce the bit-rate without degrading the image quality. Noise is random, and when
noise is mixed with the video image the encoder cannot discriminate between the noise and the
original signal, causing compression inefficiencies and loss of video quality. Many noise reduction
systems are available today, but traditional noise reduction algorithms reduce noise at the expense
of a visible loss of fine detail and/or subtle color and luminance shades. To perform noise
reduction without introducing visually destructive artifacts, it is necessary to use complex
algorithms capable of discriminating noise from low level signals. These noise reduction algorithms
are computationally intensive and not available “off the shelf.” The advanced noise reduction
and signal processing algorithms executed within the Advanced Signal Processor are proprietary and
were specifically designed for this application.
After all the complex video processing and noise reduction have been completed, the video signal is
down-sampled by the output stage to 8-bit 4:2:0 as required by the MPEG-2 encoder. The architecture
of the Baseband Signal Processor applies high bit-depth, high-bandwidth processing on the front
end, then scales the signal after the complex processing is finished, preserving subtle image details
while minimizing noise and other artifacts. Furthermore, the Baseband Signal Processor LSI provides
real time, uncompressed analog video output to the viewfinder and LCD panel, as well as 1920 x 1080
analog component video output directly from the camera head. This LSI also offers real time down-
conversion of the high-definition signal to a standard definition signal compatible with conventional
television displays.
HD-MPEG video encoder
The second building block of the HD Codec Engine is a miniature MPEG-2 encoder capable of
producing high-quality compressed video. The realization of this LSI was a major breakthrough, as
an HDV 1080i encoder suitable for portable devices needs to handle the large 1440 x 1080 frame at a
high sixty fields per second, with no sacrifice in quality. New technologies developed for high
capacity microprocessors were applied to the encoder LSI in order to achieve the complexity
necessary for this demanding application. Designed to the 130 nanometer process rule, the HD-MPEG
encoder is very compact, yet it contains one and a half million transistors and consumes a mere
200mW of power.
The HD-MPEG Video Encoder LSI is fed by the last stage of the Baseband Signal Processor, which
delivers 8-bit 1440 x 1080 4:2:0 video as required by the MPEG-2 MP@HL-1440 specification.
Therefore, the data rate of the raw signal fed to the encoder is (1440 x 1080) @ 4:2:0 = 560Mbps.
The compression ratio must be 22.4:1 to produce 25Mbps. A bit-rate reduction ratio of 22.4:1 with
no visual degradation requires highly complex and sophisticated algorithms. The MPEG-2
specification does not constrain the encoder algorithm; only the decoder is rigidly defined. This
approach enabled engineers to develop and refine proprietary coding algorithms that produce high
quality, efficient bit streams which are fully compatible with standard decoders.
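The 560Mbps and 22.4:1 figures can be reproduced with a short calculation. This is a sketch under stated assumptions: 8-bit 4:2:0 sampling is taken to average 12 bits per pixel (8 for luminance plus 4 for the shared chrominance samples), and the frame rate is taken as a nominal 30 frames per second for sixty fields per second. The function names are illustrative.

```python
WIDTH, HEIGHT = 1440, 1080
BITS_PER_PIXEL_420 = 12   # assumption: 8 bits luminance + 4 bits shared chrominance
FRAMES_PER_SECOND = 30    # assumption: nominal rate for sixty fields per second
ENCODED_MBPS = 25         # HDV 1080i video elementary stream rate


def raw_bit_rate_mbps() -> float:
    """Uncompressed data rate fed to the encoder, in megabits per second."""
    bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL_420
    return bits_per_frame * FRAMES_PER_SECOND / 1_000_000


def compression_ratio() -> float:
    """Ratio between the raw input rate and the encoded output rate."""
    return raw_bit_rate_mbps() / ENCODED_MBPS


print(f"raw rate: {raw_bit_rate_mbps():.1f} Mbps")
print(f"compression ratio: {compression_ratio():.1f}:1")
```

Under these assumptions the raw rate is about 559.9 Mbps, which rounds to the 560Mbps quoted above, and the ratio rounds to 22.4:1.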
After the HD-MPEG Video Encoder LSI compresses the 560Mbps input signal with MPEG-2
MP@HL-1440, the resultant 25Mbps MPEG-2 stream is routed to the HDV Streaming Processor, which
conforms it so that it may be recorded onto videotape or output through the i.Link interface for
editing or storage.
HD-MPEG video decoder
The decoder LSI converts the MPEG-2 signal from tape playback or from the i.Link interface into a
high-definition baseband video signal. Its decoding algorithm has been optimized for producing
stable output when operating with still images or frame-by-frame tracking, which are usually poorly
handled by MPEG-2. This LSI was designed to the 180 nanometer process rule, contains seven hundred
thousand transistors and consumes 320mW of power.
CONCLUSIONS
Conclusions
The HD Codec Engine enables the practical implementation of small, high performance camcorders
and VTRs based on the HDV 1080i specification for acquisition and post production applications.
These devices can inexpensively record high quality HD video using conventional DV tape as a record-
ing medium. By taking advantage of the tape mechanisms, interfaces, and media already developed
for DV, Sony has helped set the stage for affordable HDV equipment and low media costs, creating a
straightforward, cost-effective migration path from DV to HD.
©2006 Sony Electronics Inc. All rights reserved. Reproduction in whole or in part without written permission is prohibited. Features and specifications
subject to change without notice. Sony is a trademark of Sony. The New Way of Business is a service mark of Sony.