
Debono, C. J. and Assuncao, P. A. A. (2015). Xjenza Online, 3:183–188.

Xjenza Online - Journal of The Malta Chamber of Scientists


www.xjenza.org
DOI: 10.7423/XJENZA.2015.2.09

Review Article

3D Video Coding and Transmission

C. J. Debono*¹ and P. A. A. Assuncao²

¹ Department of Communications and Computer Engineering, University of Malta, Msida, Malta
² Instituto de Telecomunicacoes, IPLeiria, Portugal

Abstract. The capture, transmission, and display of 3D content has gained a lot of attention in the last few years. 3D multimedia content is no longer confined to cinema theatres but is being transmitted using stereoscopic video over satellite, shared on Blu-Ray™ disks, or sent over Internet technologies. Stereoscopic displays are needed at the receiving end and the viewer needs to wear special glasses to present the two versions of the video to the human vision system, which then generates the 3D illusion. To be more effective and improve the immersive experience, more views are acquired from a larger number of cameras and presented on different displays, such as autostereoscopic and light field displays. These multiple views, combined with depth data, also allow enhanced user experiences and new forms of interaction with the 3D content from virtual viewpoints. This type of audiovisual information is represented by a huge amount of data that needs to be compressed and transmitted over bandwidth-limited channels. Part of the COST Action IC1105 “3D Content Creation, Coding and Transmission over Future Media Networks” (3D-ConTourNet) focuses on this research challenge.

Keywords: 3D video transmission, multi-view video coding, quality of services

1 Introduction

Multimedia communication has been improving over the years, starting from the broadcasting of black and white television to today’s ultra high definition colour transmission and stereoscopic video. These improvements, together with the availability of more services and the use of different devices to view the content, including mobile equipment, require more and more data to be transmitted, increasingly demanding more bandwidth from the telecommunication networks. Recent surveys (CISCO, 2014) expect that video traffic will reach around 79% of all consumer-generated Internet traffic in 2018.

To date, most 3D multimedia experiences have been limited to cinema viewing and controlled environments. This is mainly attributed to the high investments needed to develop these environments and to bandwidth demands. However, technologies across the whole chain from capture to transmission to displays have been advancing at a high rate, and stereoscopic video has become available for home consumption with content transmitted over satellite, Blu-Ray™, and Internet technologies (Vetro, Tourapis, Müller & Chen, 2011). In general, viewing this type of video requires the use of special glasses to filter the content towards the correct eye of the viewer to obtain the 3D perception. However, the experience of the viewer can be further improved by transmitting more camera views of the same scene and using displays which do not need glasses. If depth data is added to the multi-view stream, virtual views can be generated using Depth-Image-Based Rendering (DIBR) at the display, allowing the user to determine a personal viewing angle, known as Free-viewpoint TeleVision (FTV) (Ho & Oh, 2007). All the data generated generally has to pass through a limited-bandwidth channel and therefore adequate coding must be performed.

Transmission of 3D and immersive multimedia services and applications takes place over heterogeneous networking technologies, including broadcasting channels, wideband backbone links and bandwidth-constrained wireless networks, among others (Lykourgiotis et al., 2014). At transport level, three main system layers have been considered in the recent past as the most adequate for 3D media delivery: MPEG-2 systems, Real-time Transport Protocol (RTP) and ISO base media file format (Schierl & Narasimhan, 2011). However, these legacy technologies are now facing new challenges as a result of fast

*Correspondence to: C. J. Debono ([email protected])


© 2015 Xjenza Online

evolution towards future media networks. For instance, 3D multimedia streaming requires flexible adaptation mechanisms capable of delivering subsets of 3D data according to network constraints or users’ preferences, and robust coding and transmission schemes are necessary to cope with error-prone channels and dynamic networking such as Peer-to-Peer (P2P) (Gurler & Tekalp, 2013). In this context, the challenge of achieving an acceptable level of Quality of Experience (QoE) has been evolving from a technological perspective (Cubero et al., 2012) by including an increasing number of human factors (Taewan, Sanghoon, Bovik & Jiwoo, 2014) and acceptance in different usage scenarios (Wei & Tjondronegoro, 2014).

The COST Action IC1105 “3D Content Creation, Coding and Transmission over Future Media Networks” (3D-ConTourNet) aims at bringing together researchers from across the whole 3D technology chain to discuss current trends and research problems. It also provides, through dissemination of findings, information for stakeholders on the state-of-the-art technology and services. This article deals with the 3D video coding and transmission part of this COST Action.

The paper is divided into five sections. The next section gives information related to the available 3D video formats. Section 3 deals with the coding of 3D videos while Section 4 focuses on the transmission of the 3D content. At the end, a conclusion is given.

2 3D Video Formats

2.1 Stereoscopic Representations

The most cost-effective way to transmit 3D videos is using stereoscopic representation. This only needs to transmit two views, one intended for the left human eye and the other one for the right eye. The transmission is done sequentially. These two views can be transmitted at a lower resolution in the same space dedicated for a high definition television frame and positioned either side-by-side or in top-and-bottom fashion. Zhang, Wang, Zhou, Wang and Gao (2012) propose the transmission of one single video plus the depth information. In this case the second view is generated at the display using DIBR techniques. In all cases, the video can either be viewed using a normal television by decoding one of the views or in 3D using any type of stereoscopic display.

2.2 Model-based Representation

This approach considers the video as a sequence of 2D projections of the scene. It uses closed meshes, such as triangle meshes (Theobalt, Ziegler, Magnor & Seidel, 2004), to represent generic models. Adaptation through scaling of the segments and deformation of surfaces is then applied to better represent the objects in the scene. The input streams are mapped into texture space, transforming the 3D model into 2D. The texture maps of each camera view are encoded at every time stamp using 4D-SPIHT (Theobalt et al., 2004; Ziegler, Lensch, Magnor & Seidel, 2004) or similar methods. Semantic coding can also be used for model-based representations, where detailed 3D models are assumed to be available (Kaneko, Koike & Hatori, 1991). The drawback of semantic coding schemes is that they can only be used for video containing known objects.

2.3 Point-sample Representation

2D video can be mapped to a 3D video polygon representation using point sample methods. Such a technique is applied in Würmlin, Lamboray and Gross (2004), where a differential update technique uses the spatio-temporal coherence of the scene captured by multiple cameras. Operators are applied on the 3D video fragments to compensate for the changes in the input and are transmitted with the video stream. The 3D video fragment is defined using a surface normal vector and a colour value. This method also needs the transmission of camera parameters and identifiers together with the coordinates of the 2D pixels.

2.4 Multi-view Video Representation

This representation considers the capturing of a scene from multiple cameras placed at different angles. This generates a huge amount of data proportional to the number of cameras capturing the scene. To reduce this amount of data and provide for better scalability, Multi-view Video Coding (MVC) can be used (Vetro, Tourapis et al., 2011). Furthermore, multi-view (MV) coding is available as an extension of the High Efficiency Video Coding (HEVC) standard. An overview of HEVC can be found in Sullivan, Ohm, Han and Wiegand (2012).

2.5 Multi-view Video Plus Depth Representation

The Multi-view Video plus Depth (MVD) format includes the transmission of depth maps with the texture video. The depth information adds geometrical information that helps in achieving better encoding and view reconstruction at the displays. This format supports the use of fewer views, as intermediate views can be constructed at the display, which is ideal for wide angle and auto-stereoscopic displays (Vetro, Yea & Smolic, 2008). This format will probably be the main format for transmission of 3D videos for HEVC coded content.

3 3D Video Coding

3.1 Stereoscopic 3D Video Coding

The current way of transmitting 3D video is using stereoscopic technology. This mainly involves the capture of the scene using two cameras, similar to the human vision system. These sequences are then separately presented to the left and right eye of the viewer. In this case, the video is either coded by means of simulcasting, where each view is compressed using H.264/AVC or HEVC, or by placing the two images, one from each stream, in a single high definition frame. In the latter, known as frame compatible format, the resolution is decreased, but it is an efficient way of coding since the bandwidth required is similar to that of single-view transmission.

3.2 Multi-view Video Coding

This coding scheme allows for a more efficient way to transmit multiple views compared to simulcasting each individual view. This is done by exploiting the redundancies available between camera views. Thus, H.264/MVC and MV-HEVC use spatial, temporal and inter-view predictions for compression. An overview of the MVC extension to H.264/AVC can be obtained from Vetro, Wiegand and Sullivan (2011). The multi-view video can be coded using different structures, the most commonly used in literature being the low latency structure and the hierarchical bi-prediction structure. The low latency structure, shown in Figure 1 for 3 views, uses only previously encoded blocks for its predictions in the time axis. Bi-prediction is still applied in between views, but this is done at the same time instant and therefore the decoding does not need to wait for future frames and needs a smaller buffer. On the other hand, the hierarchical bi-prediction structure uses future frames in the encoding, as shown in Figure 2. This implies that a larger buffer is needed and the decoding has to wait for the whole group of pictures to start decoding. The advantage of this structure is that it provides a better coding efficiency and therefore less data needs to be transmitted.

3.3 Video-plus-depth Coding

Even though current multi-view encoders can provide very high compression ratios, transmission of the multiple views still needs huge bandwidths. However, to satisfy the need for a high number of views to generate an immersive 3D experience, a lower number of views can be transmitted together with the depth data. The missing views can then be interpolated from the transmitted views and depth data. This can be done using a synthesis tool such as DIBR with the geometry data found in the depth maps. The texture and depth videos can be encoded using the 3D video coding extensions discussed above and then multiplexed on the same bit stream. Otherwise, they can be jointly encoded such that redundancies inherent in the texture and the depth videos can be exploited for further coding efficiencies. An example of such a coding method is found in Müller et al. (2013) and is now an extension of the HEVC standard. The HEVC extension for 3D (3D-HEVC) improves the coding efficiency by exploiting joint coding of texture images and the corresponding depth maps (Tech et al., 2015).

3.4 Research Trends in Video Coding

Although a lot of work has been done in 3D video coding, more research is still needed to provide faster, more efficient and cheaper encoders. This can be done by further reducing the redundancies in the videos, applying more parallel algorithms, simplifying processes, catering for scalability due to the different display resolutions, applying more prediction schemes, and other ideas. The 3D-ConTourNet COST Action members are discussing these issues and are working to address them in order to bring better 3D video transmission closer to the market.

4 3D Video Transmission

Three-dimensional video delivery is mainly accomplished over broadcasting networks and the Internet, where the IP protocol prevails in flexible platforms providing IPTV and multi-view video streaming. In broadcasting and IPTV services, 3D video streams are encapsulated in MPEG-2 Transport Streams (TS) and in IPTV TS that are further packetised into the Real-time Transport Protocol (RTP) and User Datagram Protocol (UDP), which provide the necessary support for packet-based transmission and improved QoS. Since TS and RTP provide similar functionalities at systems level, this type of packetisation introduces some unnecessary overhead, which is particularly relevant in multi-view video due to the increased amount of coded data that is generated. In the case of Internet delivery, HTTP adaptive streaming is becoming more relevant, since it allows low complexity servers by shifting adaptation functions to the clients, while also providing flexible support for different types of scalability under user control, either in rate, resolution and/or view selection, besides improved resilience to cope with network bandwidth fluctuations.

Since the term “3D video” does not always correspond to a unique format of visual information, the actual transport protocols and networking topologies might be different to better match the compressed streams. For instance, enabling multi-view 3D video services may require more bandwidth than available if all views of all video programs are simultaneously sent through existing networks. However, as mentioned above, if DIBR is used, a significant amount of bandwidth may be saved, because the same performance and quality might be kept by simply reconstructing some non-transmitted views at the receivers from their nearby left and right views. Such a possibility is enabled by the MVD format, which allows reconstruction of many virtual views from just a few that are actually transmitted through the communications network.
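As a rough illustration of how DIBR can reconstruct a non-transmitted view from one texture view and its depth map, the following Python sketch warps a single image row to a virtual camera. It is a minimal sketch under stated assumptions, not the synthesis method of any standard or of the works cited here: the cameras are assumed to be rectified and parallel, so a pixel at depth Z shifts horizontally by the disparity f·b/Z; the function names `synthesize_row` and `fill_holes`, the parameters `focal_px` and `baseline_m`, and the naive neighbour-copy hole filling are hypothetical choices made only for illustration.

```python
from typing import List, Optional


def synthesize_row(texture: List[int], depth: List[float],
                   focal_px: float, baseline_m: float) -> List[Optional[int]]:
    """Warp one image row to a virtual camera placed baseline_m to the right.

    Assumes rectified, parallel cameras (an illustrative simplification):
    a pixel at depth z moves left by disparity = focal_px * baseline_m / z.
    Positions that receive no pixel (disocclusions) are left as None.
    """
    width = len(texture)
    out: List[Optional[int]] = [None] * width
    zbuf = [float("inf")] * width  # nearest depth seen so far per target pixel
    for x in range(width):
        z = depth[x]
        disparity = focal_px * baseline_m / z
        xv = int(round(x - disparity))
        # Keep the sample closest to the camera when several pixels collide.
        if 0 <= xv < width and z < zbuf[xv]:
            out[xv] = texture[x]
            zbuf[xv] = z
    return out


def fill_holes(row: List[Optional[int]]) -> List[Optional[int]]:
    """Naive hole filling: copy the nearest valid pixel on the left,
    falling back to the nearest valid pixel on the right."""
    filled = list(row)
    last = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last
        else:
            last = value
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled
```

Calling `synthesize_row` on a row whose depth map contains a near object in front of a far background moves the near pixels further than the background ones, which is what opens the holes that `fill_holes` then patches. Practical view synthesis typically blends two neighbouring views and uses more careful inpainting, so this sketch only conveys the principle.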


Figure 1: The low latency MVC structure. I represents an Intra coded frame, P represents a predicted frame, and B represents a
bi-predicted frame.

Figure 2: The hierarchical bi-prediction MVC structure.
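
The buffering difference between the low latency and hierarchical bi-prediction structures of Figures 1 and 2 can be illustrated with a small Python sketch along the time axis of a single view. This is a simplified model under stated assumptions rather than the configuration of any reference encoder: the group of pictures (GOP) size is assumed to be a power of two with a dyadic hierarchy, frame 0 is assumed to be an already decoded anchor, only output reordering is counted (frames retained purely as prediction references are ignored), and the function names are hypothetical.

```python
from collections import deque
from typing import List


def low_latency_order(gop_size: int) -> List[int]:
    """Coding order equals display order: each frame uses only past frames."""
    return list(range(1, gop_size + 1))


def hierarchical_order(gop_size: int) -> List[int]:
    """Dyadic hierarchical-B coding order for frames 1..gop_size.

    The last frame of the GOP is coded first, then the midpoints of
    successively halved intervals, level by level.
    """
    order = [gop_size]
    queue = deque([(0, gop_size)])
    while queue:
        lo, hi = queue.popleft()
        mid = (lo + hi) // 2
        if mid in (lo, hi):
            continue  # interval cannot be split any further
        order.append(mid)
        queue.append((lo, mid))
        queue.append((mid, hi))
    return order


def max_reorder_buffer(coding_order: List[int]) -> int:
    """Peak number of decoded frames held while waiting for their display turn."""
    held = set()
    next_display = 1
    peak = 0
    for frame in coding_order:
        held.add(frame)
        peak = max(peak, len(held))
        # Output every frame that is now next in display order.
        while next_display in held:
            held.remove(next_display)
            next_display += 1
    return peak
```

Under these assumptions, comparing `max_reorder_buffer(low_latency_order(8))` with `max_reorder_buffer(hierarchical_order(8))` shows the hierarchical structure holding several frames before the first one can be displayed, which is the price paid for its better coding efficiency.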

Interactive streaming also poses specific transmission requirements in 3D multi-view video. In non-interactive services, multiple views can be sent through a single multicast session shared simultaneously by all clients, while interactivity requires each view to be encoded and transmitted separately. This allows users to freely switch between views by subscribing to different multicast channels. Multipath networks, such as P2P, can also provide the necessary support for interactive multi-view 3D video streaming by assigning the root of different dissemination trees to different views, which in turn can even be hosted in different servers (Chakareski, 2013). In the case of mobile environments, there are quite diverse networking technologies that might be used to provide immersive experiences to users through multi-view video, but the huge amount of visual data to be processed and the limited battery life of portable devices are pushing towards cloud-assisted streaming scenarios to enable deployment of large-scale systems where computational power might be provided at the expense of bandwidth (Guan & Melodia, 2014).

Figure 3 summarises the main protocol layers used in 3D video broadcasting and streaming services. On the left side, the traditional DVB, including satellite, terrestrial and cable, is shown. Basically, the Packetised Elementary Streams (PES) are encapsulated in TS before transmission over the DVB network. An extension of the classic 2D MPEG-2 Systems was defined to support multi-view video, where different views may be combined in different PES to provide multiple decoding options. The right side of Figure 3 shows a typical case of IP broadcasting and/or streaming of 3D multi-view video. Multi-Protocol Encapsulation (MPE) is used to increase error robustness in wireless transmission (e.g. DVB-SH), while Datagram Congestion Control Protocol (DCCP) may be used over the Internet. In this case, MPEG-2 TS encapsulation may not be necessary. In the case of multi-view streaming using RTP, either single-session or multisession may be used to enable a single or multiple RTP flows for transport of each view. The underlying communication infrastructure can be quite diverse (e.g. cable, DVB, LTE). As in classic 2D video transmission, fluctuations in dynamic network conditions, such as available bandwidth, transmission errors, congestion, jitter, delay and link failures, are the most significant factors affecting delivery of 3D video across networks and ultimately the QoE.

Figure 3: Generic protocol stack for 3D video services.

However, the increased amount of coded data and the high sensitivity of 3D video to transmission errors require robust coding techniques and efficient concealment methods at the decoding side, because the perceived QoE in 3D video is known to be more sensitive to a wider variety of quality factors than in classic 2D (Hewage, Worrall, Dogan, Villette & Kondoz, 2009). Two robust coding techniques suitable for such purposes are scalable 3D video coding and Multiple Description Coding (MDC). In both of them several streams are produced and transmission losses may only affect a subset of them. In the case of scalable 3D video coding, there is one main independent stream (base layer) that should be better protected against transmission errors and losses, while the other dependent streams, or layers, can be discarded at the cost of some graceful degradation in quality. In MDC, each stream is independently decodable and can be sent over different paths to avoid simultaneous loss. This is particularly efficient in multipath transmission over P2P streaming networks (Ramzan, Park & Izquierdo, 2012).

4.1 Research Trends in 3D Multimedia Transmission

Current research trends in 3D and multi-view transmission span several key interdisciplinary elements, which aim at the common goal of delivering an acceptable QoE to end-users. Heterogeneous networks comprising hybrid technologies with quite diverse characteristics and the increasingly dynamic nature of 3D multimedia consumption (e.g. mobile, stereo, multi-view, interactive) pose challenging research problems with regard to robust coding, network support for stream adaptation, scalability and immersive interactive services, packet loss and error concealment. Hybrid networks and multipath transmission in P2P are driving research on MDC of 3D multimedia combined with scalability and P2P protocols. While MDC is certainly better for coping with dynamic multipath networks, scalability might offer the most efficient solution for pure bandwidth constraints. Network adaptation by processing multiple streams in active peer nodes is also under research to ensure flexibility and acceptable QoE in heterogeneous networks with different dynamic constraints and clients requiring different subsets of 3D multimedia content. The problem of accurate monitoring of QoE along the delivery path has been an important focus of the research community, but no general solution has yet been devised, so much more research is expected in the near future in this field. Synchronisation of the video streams across the different network paths is another open problem, which can lead to frequent re-buffering and jittering artifacts. Overall, joint optimisation of coding and networking parameters is seen as the key to accomplishing high levels of QoE, validated through widely accepted models.

5 Conclusion

An overview of the most important elements of 3D video coding and transmission was presented, with emphasis on the technological elements that have open issues for further research and development. 3D video formats have evolved from simple stereo video to multi-view-plus-depth, which leads to a huge amount of coded data and multiple dependent streams. The need for robust transmission over future media networks using multiple links, providing in-network adaptation functions and coping with different client requirements was also highlighted as necessary for achieving an acceptable QoE. As an active multidisciplinary field of research, several promising directions to carry out further relevant investigations were also pointed out.

References

Chakareski, J. (2013). Adaptive multiview video streaming: Challenges and Opportunities. IEEE Communications Magazine, 51 (5), 94–100.
CISCO. (2014). Cisco visual networking index: forecast and methodology, 2013-2018.
Cubero, J. M., Gutierrez, J., Perez, P., Estalayo, E., Cabrera, J., Jaureguizar, F. & Garcia, N. (2012). Providing 3D video services: The challenge from 2D to 3DTV quality of experience. Bell Labs Technical Journal, 16 (4), 115–134.
Guan, Z. & Melodia, T. (2014). Cloud-Assisted Smart Camera Networks for Energy-Efficient 3D Video Streaming. Computer, 47 (5), 60–66.
Gurler, C. G. & Tekalp, M. (2013). Peer-to-peer system design for adaptive 3D video streaming. IEEE Communications Magazine, 51 (5), 108–114.
Hewage, C., Worrall, S., Dogan, S., Villette, S. & Kondoz, A. (2009). Quality Evaluation of Color Plus Depth Map-Based Stereoscopic Video. IEEE Journal of Selected Topics in Signal Processing, 3 (2), 304–318.
Ho, Y. S. & Oh, K. J. (2007). Overview of multi-view video coding.
Kaneko, M., Koike, A. & Hatori, Y. (1991). Coding of a facial image sequence based on a 3D model of the head and motion detection. Journal of Visual Communications and Image Representation, 2 (1), 39–54.
Lykourgiotis, A., Birkos, K., Dagiuklas, T., Ekmekcioglu, E., Dogan, S., Yildiz, Y., . . . Kotsopoulos, S. (2014). Hybrid broadcast and broadband networks convergence for immersive TV applications. IEEE Wireless Communications, 21 (3), 62–69.
Müller, K., Schwarz, H., Marpe, D., Bartnik, C., Bosse, S., Brust, H., . . . Wiegand, T. (2013). 3D High-Efficiency Video Coding for Multi-view Video and Depth. IEEE Transactions on Image Processing, 22 (9), 3366–3378.
Ramzan, N., Park, H. & Izquierdo, E. (2012). Video streaming over P2P networks: Challenges and opportunities. Signal Processing: Image Communication, 27 (5), 401–411.
Schierl, T. & Narasimhan, S. (2011). Transport and Storage Systems for 3-D Video Using MPEG-2 Systems, RTP, and ISO File Format. Proceedings of the IEEE, 99 (4), 671–683.


Sullivan, G. J., Ohm, J.-R., Han, W.-J. & Wiegand, T. (2012). Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology, 22 (12), 1649–1668.
Taewan, K., Sanghoon, L., Bovik, A. C. & Jiwoo, K. (2014). Multimodal Interactive Continuous Scoring of Subjective 3D Video Quality of Experience. IEEE Transactions on Multimedia, 16 (2), 387–402.
Tech, G., Chen, Y., Müller, K., Ohm, J.-R., Vetro, A. & Wang, Y.-K. (2015). Overview of the Multiview and 3D Extensions of High Efficiency Video Coding. Circuits and Systems for Video Technology, IEEE Transactions on, PP (99), 1.
Theobalt, C., Ziegler, G., Magnor, M. & Seidel, H. P. (2004). Model-based free-viewpoint video acquisition, rendering and encoding.
Vetro, A., Tourapis, A., Müller, K. & Chen, T. (2011). 3D-TV content storage and transmission. IEEE Transactions on Broadcasting, Special Issue on 3D-TV Horizon: Contents, Systems, and Visual Perception, 57 (2), 384–394.
Vetro, A., Wiegand, T. & Sullivan, G. J. (2011). Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard. Proceedings of the IEEE, 99 (4), 626–642.
Vetro, A., Yea, S. & Smolic, A. (2008). Towards a 3D video format for auto-stereoscopic displays.
Wei, S. & Tjondronegoro, D. W. (2014). Acceptability-Based QoE Models for Mobile Video. IEEE Transactions on Multimedia, 16 (3), 738–750.
Würmlin, S., Lamboray, E. & Gross, M. (2004). 3D video fragments: dynamic point samples for real-time freeviewpoint video. Computers & Graphics, 28 (1), 3–14.
Zhang, Z., Wang, R., Zhou, C., Wang, Y. & Gao, W. (2012). A compact stereoscopic video representation for 3D video generation and coding.
Ziegler, G., Lensch, H. P. A., Magnor, M. & Seidel, H. P. (2004). Multi-video compression in texture space using 4D SPIHT.
