3D Visual Communications
About this ebook
Provides coverage of the major theories and technologies involved in the lifecycle of 3D video content delivery
Presenting the technologies used in end-to-end 3D video communication systems, this reference covers 3D graphics and video coding, content creation and display, and communications and networking. It covers the full range of key areas from the fundamentals of 3D visual representation to the latest 3D video coding techniques, relevant communication infrastructure and networks to the 3D quality of experience.
The book is structured to logically lead readers through the topic, starting with generic and fundamental information, continuing with a detailed section of different visualisation techniques before concluding with an extensive view of 3D mobile communication systems and trends. The authors give most focus to four important areas: 3D video coding and communications; 3D graphics/gaming and mobile communications; end-to-end 3D ecosystem (including 3D display, 3D player, networking facility and 3D quality issues), and future communications and networks advances for emerging 3D experience.
- Presents the theory and key concepts behind the latest 3D visual coding framework, standards, and corresponding quality assessment
- Provides fundamental material which forms the basis for future research on enhancing the performance of 3D visual communications over current and future wireless networks
- Covers important topics including: 3D video coding and communications; 3D graphics/gaming and mobile communications; end-to-end 3D ecosystem; and future communications and networks advances for emerging 3D experience
Essential reading for engineers involved in the research, design and development of 3D visual coding and 3D visual transmission systems and technologies, as well as academic and industrial researchers.
Book preview
3D Visual Communications - Guan-Ming Su
Preface
Since the Avatar 3D movie experience swept the world in 2010, 3D visual content has become the most eye-catching feature of consumer electronics products. This 3D visual wave has spread to the 3DTV, Blu-ray, PC, mobile, and gaming industries, as a 3D visual system provides sufficient depth cues for end users to acquire a better understanding of the geometric structure of the captured scenes, and of nonverbal signals and cues in visual conversation. In addition, 3D visual systems enable observers to recognize the physical layout and location of each object with immersive viewing experiences and natural user interaction, which also makes 3D an important topic for both academic and industrial researchers.
Living in an era of widespread mobility and networking, where almost all consumer electronic devices are endpoints of the wireless/wired networks, the deployment of 3D visual representation will significantly challenge the network bandwidth as well as the computational capability of terminal points. In other words, the data volume received in an endpoint required to generate 3D views will be many times that of a single view in a 2D system, and hence the new view generation process sets a higher requirement for the endpoint's computational capability. Emerging 4G communication systems fit very well into the timing of 3D visual communications by significantly improving the bandwidth as well as introducing many new features designed specifically for high-volume data communications.
In this book, we aim to provide comprehensive coverage of major theories and practices involved in the lifecycle of a 3D visual content delivery system. The book presents technologies used in an end-to-end 3D visual communication system, including the fundamentals of 3D visual representation, the latest 3D video coding techniques, communication infrastructure and networks in 3D communications, and 3D quality of experience.
This book targets professionals involved in the research, design, and development of 3D visual coding and 3D visual transmission systems and technologies. It provides essential reading for students, engineers, and academic and industrial researchers. This book is a comprehensive reference for learning all aspects of 3D graphics and video coding, content creation and display, and communications and networking.
Organization of the book
This book is organized as three parts:
- principles of 3D visual systems: 3D graphics and rendering, 3D display, and 3D content creation are all covered
- visual communication: fundamental technologies used in 3D video coding and communication systems, and the quality of experience. Various 3D video coding formats and different communication systems are discussed to evaluate the advantages of each system
- advances and applications of 3D visual communication
Chapter 1 overviews the whole end-to-end 3D video ecosystem, in which we cover key components in the pipeline: the 3D source coding, pre-processing, communication system, post-processing, and system-level design. We highlight the challenges and opportunities for 3D visual communication systems to give readers a big picture of the 3D visual content deployment technology, and point out which specific chapters relate to the listed advanced application scenarios.
3D scene representations are the bridging technology for the entire 3D visual pipeline from creation to visualization. Different 3D scene representations exhibit different characteristics, and the choice should follow the requirements of the targeted applications. The various techniques can be categorized according to the amount of geometric information used in the 3D representation spectrum; at one extreme is the simplest form, rendering without referring to any geometry, and at the other end is rendering with an explicit geometrical description. Both extremes have their own advantages and disadvantages. Therefore, hybrid methods, rendering with implicit geometries, are proposed to combine the advantages of both ends of the technology spectrum to better support the needs of stereoscopic applications. In Chapter 2, a detailed discussion of these three main categories of 3D scene representations is given.
In Chapter 3, we introduce the display technologies that allow the end users to perceive 3D objects. 3D displays are the direct interfaces between the virtual world and human eyes and these play an important role in reconstructing 3D scenes. We first describe the fundamentals of the human visual system (HVS) and discuss depth cues. Having this background, we introduce the simplest scenario to support stereoscopic technologies (two-view only) with aided glasses. Then, the common stereoscopic technologies without aided glasses are presented. Display technologies to support multiple views simultaneously are addressed to cover the head-tracking-enabled multi-view display, occlusion-based and reflection-based multi-view system. At the end of this chapter, we will briefly discuss the holographic system.
In Chapter 4, we look at 3D content creation methods, from 3D modeling and representation, capturing, and 2D-to-3D conversion to 3D multi-view generation. We showcase three practical examples that are adopted in industrial 3D creation processes to provide a clear picture of how things work together in a real 3D creation system.
It has been observed that 3D content has significantly higher storage requirements than its 2D counterpart. Introducing compression technologies to reduce the required storage size and alleviate transmission bandwidth is very important for deploying 3D applications. In Chapter 5, we introduce 3D video coding and related standards. We first cover the fundamental concepts and methods used in conventional 2D video codecs, especially the state-of-the-art H.264 compression method and the recent development of next generation video codec standards. With this common coding knowledge, we then introduce two-view video coding methods which have been exploited in the past decade. Several methods, including individual two-view coding, simple inter-view prediction stereo video coding, and the latest efforts on frame-compatible stereo coding, are presented. Research on using depth information to reconstruct the 3D scene has brought some improvements, and 3D video coding can benefit from introducing depth information into the coded bit stream. We describe how to utilize and compress the depth information in the video-plus-depth coding system. Supporting multi-view video sequence compression is an important topic as multi-view systems provide a more immersive viewing experience. We introduce the H.264 multi-view video coding (MVC) extension for this particular application. More advanced technologies to further reduce the bit rate for multi-view systems, such as multi-view video plus depth coding and layered depth video coding, are introduced. At the end of this chapter, the efforts on 3D representation in MPEG-4, such as binary format for scenes (BIFS) and animation framework extension (AFX), are presented. The ultimate goal for 3D video systems, namely the free viewpoint system, is also briefly discussed.
In Chapter 6, we present a review of the most important topics in communication networks that are relevant to the subject matter of this book. We start by describing the main architecture of packet networks with a focus on those based on the Internet protocol (IP) networks. Here we describe the layered organization of network protocols. After this, we turn our focus to wireless communications, describing the main components of digital wireless communications systems followed by a presentation of modulation techniques, the characteristics of the wireless channels, and adaptive modulation and coding. These topics are then applied in the description of wireless networks and we conclude with a study of fourth generation (4G) cellular wireless standards and systems.
To make 3D viewing systems more competitive relative to 2D systems, the quality of experience (QoE) delivered by 3D systems should exceed that of 2D systems. Among different 3D systems, it is also important to have a systematic way to compare and summarize the advances and assess the disadvantages. In Chapter 7, we discuss the quality of experience in 3D systems. We first present the 3D artifacts which may be induced throughout the whole content life cycle: content capture, content creation, content compression, content delivery, and content display. In the second part, we address how to measure the quality of experience for 3D systems subjectively and objectively. With those requirements in mind, we discuss the important factors in designing a comfortable and high-quality 3D system.
Chapter 8 addresses the main issue encountered when transmitting 3D video over a channel: that of dealing with errors introduced during the communication process. The chapter starts by presenting the effects of transmission-induced errors, followed by a discussion of techniques to counter these errors, such as error resilience, error concealment, unequal error protection, and multiple description coding. The chapter concludes with a discussion of cross-layer approaches.
Developing 3D stereoscopic applications has become really popular in the software industry. 3D stereoscopic research and applications are advancing rapidly due to the commercial need and the popularity of 3D stereoscopic products. Therefore, Chapter 9 gives a short discussion of commercially available products and technologies for application development. The discussed topics include commercially available glass-less two-view systems, depth adaptation capturing and displaying systems, two-view gaming systems, mobile 3D systems and perception, and 3D augmented reality systems.
In the final chapter, we introduce the state-of-the-art technologies for delivering compressed 3D content over communication channels. Because bandwidth in the existing communication infrastructure is limited, the bit rate of the compressed video data needs to be controlled to fit in the allowed bandwidth. Consequently, the coding parameters in the video codec need to be adjusted to achieve the required bit rate. In this chapter, we first review different popular 2D video rate control methods, and then discuss how to extend the rate control methods to different 3D video streaming scenarios. For the multi-view system, changing the viewing angle from one point to another to observe a 3D scene (view switching) is a key feature to enable the immersive viewing experience. We address the challenges and the corresponding solutions for 3D view switching. In the third part of this chapter, we discuss peer-to-peer 3D video streaming services. As 3D visual communication services place a heavy bandwidth burden on centralized streaming systems, the peer-to-peer paradigm shows great potential for penetrating the 3D video streaming market. After this, we cover 3D video broadcasting and 3D video communication over 4G cellular networks.
Acknowledgements
We would like to thank a few of the great many people whose contributions were instrumental in taking this book from an initial suggestion to a final product. First, we would like to express our gratitude to Dr. Chi-Yuan Yao for his help in collecting and sketching the content in Sections 9.1 and 9.2 and in finishing Chapter 9 in time. We also thank him for his input on scene representation, drawn from his deep domain knowledge in the field of computer geometry. We would like to thank Dr. Peng Yin and Dr. Taoran Lu for their help in enriching the introduction of HEVC. We also thank Mr. Dobromir Todorov for help in rendering figures used in Chapters 2 and 9. Finally, the authors appreciate the many contributions and sacrifices that our families have made to this effort. Guan-Ming Su would like to thank his wife Jing-Wen for her unlimited support and understanding during the writing process, and would also like to dedicate this book to his parents. Yu-Chi Lai would like to thank his family for their support of his work. Andres Kwasinski would like to thank his wife Mariela and daughters Victoria and Emma for their support, without which this work would not have been possible. Andres would also like to thank all the members of the Department of Computer Engineering at the Rochester Institute of Technology. Haohong Wang would like to thank his wife Xin Lu, son Nicholas and daughter Isabelle for their kind support as always, especially for those weekends and nights that he had to be separated from them to work on this book at the office. The dedication of this book to our families is a sincere but inadequate recognition of all their contributions to our work.
About the Authors
Guan-Ming Su received the BSE degree in Electrical Engineering from National Taiwan University, Taipei, Taiwan, in 1996 and the MS and PhD degrees in Electrical Engineering from the University of Maryland, College Park, U.S.A., in 2001 and 2006, respectively. He is currently with Dolby Labs, Sunnyvale, CA. Prior to this he was with the R&D Department, Qualcomm, Inc., San Diego, CA; ESS Technology, Fremont, CA; and Marvell Semiconductor, Inc., Santa Clara, CA. His research interests are multimedia communications and multimedia signal processing. He is the inventor of 15 U.S. patents and pending applications. Dr Su is an associate editor of Journal of Communications; guest editor of the Journal of Communications special issue on Multimedia Communications, Networking, and Applications; and Director of review board and R-Letter in IEEE Multimedia Communications Technical Committee. He serves as the Publicity Co-Chair of IEEE GLOBECOM 2010, International Liaison Chair in IEEE ICME 2011, Technical Program Track Co-Chair in ICCCN 2011, and TPC Co-Chair in ICNC 2013. He is a Senior Member of the IEEE.
Yu-Chi Lai received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, R.O.C., in 1996. He received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Wisconsin–Madison in 2003 and 2009, respectively, and his M.S. and Ph.D. degrees in Computer Science from the same university in 2004 and 2010, respectively. He is currently an assistant professor at NTUST. His research focuses on computer graphics, computer vision, multimedia, and human-computer interaction. Owing to his personal interests, he is also involved in industrial projects, and he currently cooperates with IGS to develop useful and interesting computer game technologies and with NMA to develop animation technologies.
Andres Kwasinski received in 1992 his diploma in Electrical Engineering from the Buenos Aires Institute of Technology, Buenos Aires, Argentina, and, in 2000 and 2004 respectively, the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Maryland, College Park, Maryland. He is currently an Assistant Professor at the Department of Computer Engineering, Rochester Institute of Technology, Rochester, New York. Prior to this, he was with the Wireless Infrastructure group at Texas Instruments Inc., working on WiMAX and LTE technology, and with the University of Maryland, where he was a postdoctoral Research Associate. Dr. Kwasinski is a Senior Member of the IEEE, an Area Editor for the IEEE Signal Processing Magazine and Editor for the IEEE Transactions on Wireless Communications. He has been in the Organizing Committee for the 2010 IEEE GLOBECOM, 2011 and 2012 IEEE ICCCN, 2012 ICNC and 2013 IEEE ICME conferences. Between 2010 and 2012 he chaired the Interest Group on Distributed and Sensor Networks for Mobile Media Computing and Applications within the IEEE Multimedia Communications Technical Committee. His research interests are in the area of multimedia wireless communications and networking, cross layer designs, cognitive and cooperative networking, digital signal processing and speech, image and video processing for signal compression and communication, and signal processing for non-intrusive forensic analysis of speech communication systems.
Haohong Wang received the B.S. degree in computer science and the M.Eng. degree in computer applications, both from Nanjing University, China, the M.S. degree in computer science from the University of New Mexico, and the Ph.D. degree in electrical and computer engineering from Northwestern University, Evanston, USA. He is currently the General Manager of TCL Research America, TCL Corporation, at Santa Clara, California, in charge of the overall corporate research activities in North America with research teams located at four places. Prior to that he held various technical and management positions at AT&T, Catapult Communications, Qualcomm, Marvell, TTE and Cisco. Dr. Wang's research involves the areas of multimedia processing and communications, mobile sensing and data mining. He has published more than 50 articles in peer-reviewed journals and international conferences. He is the inventor of more than 40 U.S. patents and pending applications. He is the co-author of 4G Wireless Video Communications (John Wiley & Sons, 2009), and Computer Graphics (1997).
Dr. Wang is the Editor-in-Chief of the Journal of Communications, a member of the Steering Committee of IEEE Transactions on Multimedia, and an editor of IEEE Communications Surveys & Tutorials. He has been serving as an editor or guest editor for many IEEE and ACM journals and magazines. He chairs the IEEE Technical Committee on Human Perception in Vision, Graphics and Multimedia, and was the Chair of the IEEE Multimedia Communications Technical Committee. He is an elected member of the IEEE Visual Signal Processing and Communications Technical Committee, and IEEE Multimedia and Systems Applications Technical Committee. Dr. Wang has chaired more than a dozen international conferences, including IEEE GLOBECOM 2010 (Miami) as the Technical Program Chair, and IEEE ICME 2011 (Barcelona) and IEEE ICCCN 2011 (Maui) as the General Chair.
Chapter 1
Introduction
1.1 Why 3D Communications?
Thanks to the great advancement of hardware, software, and algorithms in the past decade, our daily life has become a major digital content producer. Nowadays, people can easily share their own pieces of artwork on the network with each other. Furthermore, with the latest development in 3D capturing, signal processing technologies, and display devices, as well as the emergence of 4G wireless networks with very high bandwidth, coverage, and capacity, and many advanced features such as quality of service (QoS), low latency, and high mobility, 3D communication has become an extremely popular topic. It seems that the current trend is closely aligned with the expected roadmap for reality video over wireless, estimated by Japanese wireless industry peers in 2005 (as shown in Figure 1.1), according to which the expected deployment timing of stereo/multi-view/hologram video is around the same time as the 4G wireless networks deployment. Among those 3D video representation formats, the stereoscopic and multi-view 3D videos are more mature and the coding approaches have been standardized in Moving Picture Experts Group (MPEG) as video-plus-depth (V+D) and the Joint Video Team (JVT) Multi-view Video Coding (MVC) standard, respectively. The coding efficiency study shows that coded V+D video requires only about 1.2 times the bit rate of monoscopic video (i.e., traditional 2D video). Clearly, higher reality requirements mean larger volumes of data to be delivered over the network, and more services and usage scenarios that challenge the wireless network infrastructures and protocols.
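To make the 1.2 times figure concrete, the following minimal sketch compares the bit rate of a V+D stream with that of a 2D stream. Only the 1.2 factor comes from the text; the 4000 kbit/s example stream, the function names, and the two-view simulcast baseline (each view coded independently at the 2D rate) are assumptions made purely for illustration.

```python
# Illustrative arithmetic only. The 1.2x factor is the figure quoted in the
# text; the simulcast baseline below is an assumption, not a measured result.

def video_plus_depth_kbps(mono_kbps: float, vd_factor: float = 1.2) -> float:
    """Estimated V+D bit rate given the bit rate of the 2D (monoscopic) video."""
    return mono_kbps * vd_factor


def simulcast_kbps(mono_kbps: float, num_views: int = 2) -> float:
    """Assumed baseline: every view is coded independently at the 2D rate."""
    return mono_kbps * num_views


if __name__ == "__main__":
    mono = 4000.0  # hypothetical 2D stream, in kbit/s
    print(f"2D video:                     {mono:.0f} kbit/s")
    print(f"Video-plus-depth:             {video_plus_depth_kbps(mono):.0f} kbit/s")
    print(f"Two-view simulcast (assumed): {simulcast_kbps(mono):.0f} kbit/s")
```

Under these assumptions, the depth channel adds roughly a fifth of the 2D bit rate, whereas coding a second full view independently would double it.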
Figure 1.1 Estimated reality video over wireless development roadmap.
From a 3D point of view, reconstructing a scene remotely and/or reproducibly as being presented face-to-face has always been a dream through human history. The desire for such technologies has been pictured in many movies, such as Star Trek's Holodeck, Star Wars' Jedi council meeting, The Matrix's matrix, and Avatar's Pandora. The key technologies to enable such a system involve many complex components, such as a capture system to describe and record the scene, a content distribution system to store/transmit the recorded scene, and a scene reproduction system to show the captured scenes to end users. Over the past several decades, we have witnessed the success of many applications, such as television broadcasting systems in analog (e.g., NTSC, PAL) and digital (e.g., ATSC, DVB) format, and home entertainment system in VHS, DVD, and Blu-ray format. Although those systems have served for many years and advanced in many respects to give better viewing experiences, end users still feel that the scene reconstruction has its major limitation: the scene presentation is on a 2D plane, which significantly differs from the familiar three-dimensional view of our daily life. In a real 3D world, humans can observe objects and scenes from different angles to acquire a better understanding of the geometry of the watched scenes, and nonverbal signals and cues in visual conversation. Besides, humans can perceive the depth of different objects in a 3D environment so as to recognize the physical layout and location for each object. Furthermore, 3D visual systems can provide immersive viewing experience and higher interaction. Unfortunately, the existing traditional 2D visual systems cannot provide those enriched viewing experiences.
The earliest attempt to construct a 3D image was via the anaglyph stereo approach, which was demonstrated by W. Rollmann in 1853 and J. C. D'Almeida in 1858 and patented in 1891 by Louis Ducos du Hauron. In 1922, the earliest confirmed 3D film premiered at the Ambassador Hotel Theater in Los Angeles and was also projected in the red/green anaglyph format. In 1936, Edwin H. Land invented the polarizing sheet and demonstrated 3D photography using polarizing sheets at the Waldorf-Astoria Hotel. The first 3D golden era was between 1952 and 1955, owing to the introduction of color stereoscopy. Several golden eras have been seen since then. However, there are many factors affecting the popularity and success of 3D visual systems, including the 3D visual and content distribution technologies, the viewing experience, the end-to-end ecosystem, and competition from improved 2D systems. Recently, 3D scene reconstruction algorithms have improved greatly, enabling us to reconstruct a 3D scene from a 2D image and from stereoscopic images. The corresponding hardware can support the heavy computation at a reasonable cost, and the underlying communication systems have advanced to provide sufficient bandwidth to distribute the 3D content. Therefore, 3D visual communication systems have again drawn considerable attention from both academia and industry.
In this book, we discuss the details of the major technologies involved in the entire end-to-end 3D video ecosystem. More specifically, we address the following important topics and the corresponding opportunities:
- the lifecycle of the 3D video content through the end-to-end 3D video communication framework,
- the 3D content creation process to construct a 3D visual experience,
- the different representations and compression formats for 3D scenes/data for content distribution. Each format has its own advantages and disadvantages. System designers can choose the appropriate solution given the system resources, such as computation complexity and communication system capacity. Also, understanding the unequal importance of different syntaxes, decoding dependencies, and content redundancies in 3D visual data representation and coding can help system designers to adopt corresponding error resilient methods, error concealment approaches, suitable unequal error protection, and customized dynamic resource allocation to improve the system performance,
- the advanced communication systems, such as 4G networks, to support transmission of 3D visual content. Being familiar with those network features can help the system designer to design schedulers and resource allocation schemes for 3D visual data transmission over 4G networks. Also, we can efficiently utilize the QoS mechanisms supported in 4G networks for 3D visual communications,
- the effective 3D visual data transmission and network architectures to deliver 3D video services and their related innovative features,
- the 3D visual experience for typical users, the factors that affect the user experience, and 3D quality of experience (QoE) metrics from source, network, and receiver points of view. Understanding the factors affecting 3D QoE is very important, as it helps the system designer to design a QoE-optimized 3D visual communications system that satisfies 3D visual immersive expectations,
- the opportunities of advanced 3D visual communication applications and services, for example, how to design the source/relay/receiver side of an end-to-end 3D visual communication system to take advantage of new concepts of computing, such as green computing, cloud computing, and distributed/collaborated computing, and how to apply scalability concepts to handle 3D visual communications given the heterogeneous 3D terminals in the networks.
1.2 End-to-End 3D Visual Ecosystem
As shown by past experience and the lessons learned from the development and innovation of visual systems, the key driving force is how to enrich the user experience, the so-called QoE. The 3D visual system faces the same issue. Although a 3D visual system provides a dramatically new user experience compared with traditional 2D systems, the QoE concept has to be considered at every stage of the communication system pipeline during system design and optimization work to ensure that moving from 2D to 3D is worthwhile. There are many factors affecting the QoE, such as errors in multidimensional signal processing, lack of information, packet loss, and optical errors in display. Improperly addressing QoE issues will result in visual artifacts (objectively and subjectively), visual discomfort, fatigue, and other effects that degrade the intended 3D viewing experience.
An end-to-end 3D visual communication pipeline consists of the content creation, 3D representation, data compression, transmission, decompression, post-processing, and 3D display stages, which also reflect the lifecycle of 3D video content in the system. We illustrate the whole pipeline and the corresponding major issues in Figure 1.2. In addition, we also show the possible feedback information from later stages to earlier stages for possible improvement of 3D scene reconstruction.
Figure 1.2 End-to-end 3D visual ecosystem.
The first stage of the whole pipeline is content creation. The goal of the content creation stage is to produce 3D content based on various data sources or data generation devices. There are three typical ways of data acquisition, which result in different types of data formats. The first is to use a traditional 2D video camera, which captures 2D images; a 3D data representation can be derived from these images in a later stage of the pipeline. The second type is to use a depth video camera to measure the depth of each pixel corresponding to its counterpart color image. The registration of depth and 2D color image may be needed if sensors are not aligned. Note that in some depth cameras, the spatial resolution is lower than that of a 2D color camera. The depth image can also be derived from a 2D image with 2D-to-3D conversion tools; often the obtained depth does not have a satisfactory precision and thus causes QoE issues. The third type is to use an N-view video camera, which consists of an array of 2D video cameras located at different positions around one scene, all synchronized to capture video simultaneously, to generate N-view video. Using graphical tools to model and create a 3D scene is another approach, which can be time consuming, but it is popular nowadays to combine both graphical and video capturing and processing methods in 3D content creation.
In the next stage, the collected video/depth data will be processed and transformed into 3D representation formats for different targeted applications. For example, the depth image source can be used in image plus depth rendering or processed for N-view application. Since the amount of acquired/processed/transformed 3D scene data is rather large compared to single-view video data, there is a strong need to compress the 3D scene data. On the other hand, applying traditional 2D video coding schemes separately to each view or each different data type is inefficient, as there exist certain representation/coding redundancies among neighboring views and different data types. Therefore, a dedicated compression format is needed at the compression stage to achieve better coding efficiency. In the content distribution stage, the packet loss during data delivery plays an important role in the final QoE, especially for streaming services. Although certain error concealment algorithms adopted in the existing 2D decoding and post-processing stages may alleviate this problem, directly applying the solutions developed for 2D video systems may not be sufficient. This is because 3D video coding introduces more coding dependencies, and thus error concealment is much more complex than in 2D systems. Besides, the inter-view alignment requirement in 3D video systems adds difficulties that do not exist in 2D scenarios. The occlusion issue is often handled at the post-processing stage, and packet loss will make the occlusion post-processing even more difficult. There are also some other application layer approaches to relieve the negative impact of packet loss, such as resilient coding and unequal error protection (UEP), and those technologies can be incorporated into the design of the 3D visual communication system to enrich the final QoE.
At the final stage of this 3D visual ecosystem, the decoded and processed 3D visual data will be displayed on its targeted 3D display. Each type of 3D display has its own characteristic artifacts and encounters different QoE issues.
1.2.1 3D Modeling and Representation
3D scene modeling and representation is the bridging technology between the content creation, transmission, and display stages of a 3D visual system. The 3D scene modeling and representation approaches can be classified into three main categories: geometry-based modeling, image based modeling, and hybrid modeling. Geometry-based representation typically uses polygon meshes (called surface-based modeling), 2D/3D points (called point-based modeling), or voxels (called volume-based modeling) to construct a 3D scene. The main advantage is that, once geometry information is available, the 3D scene can be rendered from any viewpoint and view direction without any limitation, which meets the requirement for a free-viewpoint 3D video system. The main disadvantage is in the computational cost of rendering and storing, which depends on the scene complexity, that is the total number of triangles used to describe the 3D world. In addition, geometry-based representation is generally an approximation to the 3D world. Although there are offline photorealistic rendering algorithms to generate views matching our perception of the real world, the existing algorithms using graphics pipeline still cannot produce realistic views on the fly.
The image based modeling goes to the other extreme, not using any 3D geometry, but using a set of images captured by a number of cameras with predesigned positions and settings. This approach tends to generate high quality virtual view synthesis without the effort of 3D scene reconstruction. The computation complexity via image based representation is proportional to the number of pixels in the reference and output images, but in general not to the geometric complexity such as triangle counts. However, the synthesis ability of image based representation has limitations on the range of view change and the quality depends on the scene depth variation, the resolution of each view, and the number of views. The challenge for this approach is that a tremendous amount of image data needs to be stored, transferred, and processed in order to achieve a good quality synthesized view, otherwise interpolation and occlusion artifacts will appear in the synthesized image due to lack of source data.
The hybrid approach can leverage these two representation methods to find a compromise between the two extremes according to given constraints. By adding geometric information into image based representation, the disocclusion and resolution problems can be relieved. Similarly, adding image information captured from the real world into geometry-based representation can reduce the rendering cost and storage. As an example, using multiple images and corresponding depth maps to represent a 3D scene is a popular method (called depth image based representation), in which the depth maps are the geometric modeling component; this hybrid representation can avoid the storage and processing of many extra images while achieving the same high-quality synthesized view as the image based approach. All these methods are demonstrated in detail in Chapters 2 and 4.
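To make the role of the depth map concrete, the following minimal sketch warps one image row from a reference view to a nearby virtual view. It is an illustration under stated assumptions rather than a method taken from this book: the cameras are assumed to be rectified and parallel, so that a pixel at depth Z shifts horizontally by focal_px * baseline / Z pixels (the standard pinhole relation), and the function name warp_scanline and its parameters are hypothetical.

```python
def warp_scanline(colors, depths, focal_px, baseline):
    """Forward-warp one image row to a horizontally shifted virtual camera.

    Assumption: rectified, parallel cameras, so the horizontal shift
    (disparity, in pixels) of a pixel at depth Z is focal_px * baseline / Z.
    Returns the warped row; None marks holes with no source pixel.
    """
    width = len(colors)
    out = [None] * width
    out_depth = [float("inf")] * width
    for x in range(width):
        disparity = focal_px * baseline / depths[x]
        target = x - int(round(disparity))
        # When two source pixels land on the same target, the nearer one wins.
        if 0 <= target < width and depths[x] < out_depth[target]:
            out[target] = colors[x]
            out_depth[target] = depths[x]
    return out


# A near object (depth 2) in front of a far background (depth 4).
row = warp_scanline(list("abcdef"), [4, 4, 4, 2, 2, 4], focal_px=4, baseline=1)
print(row)
```

The None entries in the result are positions that no source pixel maps to, either because the background there was hidden behind the near object in the reference view (a disocclusion) or because the content lies outside the reference image. Filling such holes is the kind of work left to additional views or to post-processing.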
1.2.2 3D Content Creation
Other than graphical modeling approaches, the 3D content can be captured by various processes with different types of cameras. The stereo camera or depth camera simultaneously captures video and associated per-pixel depth or disparity information; the multi-view camera captures multiple images simultaneously from various angles, then a multi-view matching (or correspondence) process is required to generate the disparity map for each pair of cameras, and then the 3D structure can be estimated from these disparity maps. The most challenging scenario is to capture 3D content from a normal 2D (or monoscopic) camera, which lacks disparity or depth information, and where a 2D-to-3D conversion algorithm has to be triggered to generate an estimated depth map and thus the left and right views. The depth map can be derived from various types of depth cues, such as the linear perspective property of a 3D scene, the relationship between object surface structure and the rendered image brightness according to specific shading models, occlusion of objects, and so on. For complicated scenes, the interactive 2D-to-3D conversion, or offline conversion, tends to be adopted, that is, human interaction is required at certain stages of the processing flow, which could be in object segmentation, object selection, object shape or depth adjustment, object occlusion order specification, and so on. In Chapter 4, a few 2D-to-3D conversion systems are showcased to give details of the whole process flow.
1.2.3 3D Video Compression
Owing to the huge amount of 3D video data, there is a strong need to develop efficient 3D video compression methods. 3D video compression technology has been under development for more than a decade and many formats have been proposed. Most 3D video compression formats are built on state-of-the-art video codecs, such as H.264. The compression technology is often a tradeoff between the acceptable level of computation complexity and the affordable budget in communication bandwidth. In order to reuse the existing broadcast infrastructure originally designed for 2D video coding and transmission, almost all current 3D broadcasting solutions are based on a frame-compatible format via a spatial subsampling approach, that is, the original left and right views are subsampled to half resolution and then embedded into a single video frame for compression and transmission over the infrastructure as with 2D video, and at the decoder side demultiplexing and interpolation are conducted to reconstruct the dual views. The subsampling and merging can be done by either (a) the side-by-side format, proposed by Sensio and RealD and adopted by Samsung, Panasonic, Sony, Toshiba, JVC, and DirecTV, (b) the over/under format, proposed by Comcast, or (c) the checkerboard format. A mixed-resolution approach has also been proposed, which is based on the binocular suppression theory showing that the same subjective perception quality can be achieved when one view has a reduced resolution. The mixed-resolution method first subsamples each view to a different resolution and then compresses each view independently.
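The side-by-side variant of the frame-compatible format can be sketched in a few lines. This is only an illustration under simplifying assumptions: frames are plain lists of rows with an even width, subsampling keeps every second column without the low-pass filtering a real system would apply first, and the decoder restores full width by pixel repetition, the simplest possible interpolation. The function names are hypothetical.

```python
def pack_side_by_side(left, right):
    """Pack two equal-size views into one frame of the same size.

    Each view is horizontally subsampled by 2 (even columns are kept),
    then the two half-width views are placed next to each other.
    """
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def unpack_side_by_side(frame):
    """Demultiplex a packed frame and upsample each half to full width.

    Pixel repetition stands in for a real interpolation filter.
    """
    half = len(frame[0]) // 2

    def upsample(row):
        return [pixel for pixel in row for _ in range(2)]

    left = [upsample(row[:half]) for row in frame]
    right = [upsample(row[half:]) for row in frame]
    return left, right


left_view = [[1, 2, 3, 4]]
right_view = [[5, 6, 7, 8]]
packed = pack_side_by_side(left_view, right_view)
print(packed)                       # one frame, same width as each input view
print(unpack_side_by_side(packed))  # two full-width views at half the detail
```

The packed frame has the dimensions of a single 2D frame, which is why it can pass through an unmodified 2D codec and broadcast chain; the cost is that half of the horizontal detail of each view is discarded before encoding.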
Undoubtedly, the frame-compatible format is very simple to implement without changing the existing video codec system and underlying communication infrastructure. However, the correlation between left and right views has not been fully exploited, and the approach is mainly oriented to the two-view scenario but not to the multi-view 3D scenario. During the past decade, researchers have also investigated 3D compression from the coding perspective, and 3D video can be represented in the following formats: two-view stereo video, video-plus-depth (V+D), multi-view video coding (MVC), multi-view video-plus-depth (MVD), and layered depth video (LDV). The depth map is often encoded via an existing 2D color video codec, which is designed to optimize the coding efficiency of natural images. However, a depth map shows different characteristics from a natural color image, and researchers have proposed several methods to improve depth-based 3D video compression. Nowadays, free-viewpoint 3D attracts a lot of attention; such a system allows end users to change the view position and angle to enrich their immersive experience. Hybrid approaches combining geometry-based and image based representation are typically used to render the 3D scene for free-viewpoint TV. In Chapter 5, we discuss V+D, MVC, MVD, and LDV.
1.2.4 3D Content Delivery
Transmitting compressed 3D video bit streams over networks poses more challenges than transmitting conventional 2D video. From the video compression system point of view, state-of-the-art 3D video codecs exploit inter-view and synthesis prediction to reduce the required bit rate, which introduces more decoding dependencies. Therefore, the existing mono-view video transmission scheme cannot be applied directly to these advanced 3D formats. From the communication system perspective, the 3D video bit stream needs more bandwidth to carry more views than the mono-view video. The evolution of cellular communications into 4G wireless networks results in significant improvements in bandwidth and reliability. The end mobile user can benefit from the improved network infrastructure and error control to enjoy a 3D video experience. On the other hand, wireless transmission often suffers frequent packet/bit errors, and the highly decoding-dependent 3D video stream is vulnerable to those errors. It is important to incorporate error correction and concealment techniques, as well as error resilient source coding algorithms, to increase the robustness of transmitting 3D video bit streams over wireless environments. Since most 3D video formats are built on existing 2D video codecs, many techniques developed for 2D video systems can be extended or adapted to consider the properties of 3D video. One technique that offers numerous opportunities for this approach is unequal error protection. Depending on the relative importance of the bit stream segments, different portions of the 3DTV bit stream are protected with different strengths of forward error control (FEC) codes.
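The principle of matching FEC strength to segment importance can be illustrated with a toy sketch. As an assumption for illustration only, a repetition code with majority-vote decoding stands in for the much more efficient FEC codes a real system would use, and the importance labels and repetition counts in REPEATS_BY_IMPORTANCE are hypothetical.

```python
def protect(bits, repeats):
    """Repetition-code FEC: transmit each bit `repeats` times (odd count)."""
    return [bit for bit in bits for _ in range(repeats)]


def recover(coded, repeats):
    """Majority-vote decoding of a repetition code."""
    return [
        1 if sum(coded[i:i + repeats]) * 2 > repeats else 0
        for i in range(0, len(coded), repeats)
    ]


# Hypothetical protection levels: more important segments get more redundancy.
REPEATS_BY_IMPORTANCE = {"high": 5, "medium": 3, "low": 1}


def protect_stream(segments):
    """segments: list of (importance, bits) pairs; returns coded pairs."""
    return [
        (importance, protect(bits, REPEATS_BY_IMPORTANCE[importance]))
        for importance, bits in segments
    ]


# Two bit errors hit the first source bit of each segment.
strong = protect([1, 0], REPEATS_BY_IMPORTANCE["high"])
strong[0] = strong[1] = 0
weak = protect([1, 0], REPEATS_BY_IMPORTANCE["medium"])
weak[0] = weak[1] = 0
print(recover(strong, 5))  # the heavily protected segment survives
print(recover(weak, 3))    # the lightly protected segment is corrupted
```

With the same error pattern, the segment protected by five repetitions decodes correctly while the one protected by three does not, which is the behaviour unequal error protection relies on: the redundancy budget is spent where a loss would hurt the decoded video most.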
Taking stereoscopic video streaming as an example, a UEP scheme can be used to divide the stream into three layers of different importance: intra-coded left-view frames (the most important ones), left-view predictive coded frames, and right-view frames encoded from both intra-coded and predictive left-view frames (the least valuable ones). For error concealment techniques, we can also draw on properties inherent to 3D video. Taking the video plus depth format as an example, we can utilize the correlation between video and depth information to do error concealment. Multiple-description coding (MDC) is also a promising technology for 3D video transmission. The MDC framework encodes the video in several independent descriptions. When only one description is received, it can be decoded to obtain a lower-quality representation. When more than one description is received, they can be combined to obtain a representation of the source with better quality. The final quality depends on the number of descriptions successfully received. A simple way to apply multiple-description coding technology to 3D stereoscopic video is to associate one description with the right view and one with the left view. Another way of implementing multiple-description coding for 3D stereoscopic video and multi-view video consists of independently encoding one view and encoding the second view predicted with respect to the independently encoded view. This latter approach can also be considered as a two-layer, base plus enhancement, encoding. This methodology can also be applied to V+D and MVD, where the enhancement layer is the depth information. Different strategies for advanced 3D video delivery over different content delivery paths will be discussed in Chapters 8 and 10. Several 3D applications will be dealt with in Chapter 9.
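The simple one-description-per-view arrangement can be sketched from the receiver's point of view as follows. This is a hedged illustration, not a scheme specified in this book: the function name reconstruct_stereo and the returned dictionary layout are hypothetical, and showing the single received view to both eyes (a 2D fallback) is our assumption about what the "lower-quality representation" could be in this case.

```python
def reconstruct_stereo(received):
    """Combine whichever view descriptions arrived.

    received: dict that may hold decoded 'left' and/or 'right' views.
    Returns None if nothing arrived; otherwise a dict with both eye
    images and a label describing the achieved quality.
    """
    left = received.get("left")
    right = received.get("right")
    if left is None and right is None:
        return None  # nothing to display; concealment must take over
    if left is not None and right is not None:
        return {"left": left, "right": right, "quality": "stereo"}
    # Only one description arrived: assumed fallback is to show the
    # same view to both eyes, losing depth but keeping the picture.
    only = left if left is not None else right
    return {"left": only, "right": only, "quality": "mono"}


print(reconstruct_stereo({"left": "L-frame", "right": "R-frame"}))
print(reconstruct_stereo({"right": "R-frame"}))
```

Because each description is decodable on its own, the loss of one of them degrades the result instead of breaking the decoder, which is the property that distinguishes this arrangement from the base plus enhancement variant, where the predicted view is useless without the independently coded one.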
1.2.5 3D Display
For the human visual system (HVS) to perceive a 3D scene, the display system is designed to present sufficient depth information for each object such that the HVS can reconstruct each object's 3D position. The HVS recognizes the depth of objects in the real 3D world through depth cues. Therefore, the success of a 3D display depends on how well the depth cues are provided, such that the HVS can observe a 3D scene. In general, depending on how many viewpoints are provided, the depth cues can be classified into monocular, binocular, and multi-ocular categories. The current 3DTV systems that consumers can buy in retail stores are all based on stereoscopic 3D technology with binocular depth cues. A stereoscopic display multiplexes two views at the display side, and the viewers need to wear special glasses to de-multiplex the signal to get the left and right views. Several multiplexing/de-multiplexing approaches have been proposed and implemented in 3D displays, including wavelength division (color) multiplexing, polarization multiplexing, and time multiplexing.
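Wavelength division (color) multiplexing is the easiest of these to show in code. The sketch below builds an anaglyph frame from two views; as assumptions for illustration, images are rows of (r, g, b) tuples of identical size, and the common red/cyan channel assignment is used (red from the left view, green and blue from the right view). The function name make_anaglyph is hypothetical.

```python
def make_anaglyph(left, right):
    """Multiplex two RGB views into one frame by color channel.

    left, right: rows of (r, g, b) tuples with identical dimensions.
    The red channel is taken from the left view, green and blue from
    the right view (red/cyan assignment, assumed here).
    """
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]


left_view = [[(10, 20, 30), (11, 21, 31)]]
right_view = [[(40, 50, 60), (41, 51, 61)]]
print(make_anaglyph(left_view, right_view))
```

The colored filters in the glasses perform the de-multiplexing: the red filter passes only the channel taken from the left view and the cyan filter passes only the channels taken from the right view, so each eye sees a different image at the cost of color fidelity.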
For 3D systems that do not require glasses, called auto-stereoscopic displays (AS-D), the display system uses optical elements such as parallax barriers (occlusion-based approach) or lenticular lenses (refraction-based approach) to guide the two-view images to the left and right eyes of the viewer in order to generate a realistic 3D sense. In other words, the multiplexing and de-multiplexing process is removed compared to the stereoscopic display. Mobile 3DTV is an example of an AS-D product that we have seen in the market. N-view AS-D 3DTVs and PC/laptop monitors have been demonstrated for many years by Philips, Sharp, Samsung, LG, Alioscopy, and so on; these displays provide stereopsis of 3D space for multiple viewers without the need for glasses. However, the visual quality of these solutions still has lots of room to improve. To fully enrich the immersive visual experience, end users would want to interactively control the viewpoint, which is called free-viewpoint 3DTV (FVT). In a typical FVT system, the viewer's head and gaze are tracked to generate the viewing position and directions and thus to calculate images directed to the viewer's eyes. To render free-viewpoint video, the 3D scene needs to be synthesized and rendered from the source data in order to support seamless view generation while the viewpoint changes.
To achieve full visual reality, a holographic 3D display reconstructs the optical wave field such that the reconstructed 3D light beam can be seen as the physical presentation of the original object. The difference between conventional photography and holography is that photography can only record amplitude information for an object, whereas holography attempts to record both the amplitude and phase information. Since current image recording systems can only record amplitude information, holography needs a way to transform the phase information such that it can be recorded in an amplitude-based recording system. For more details on 3D displays and the theory behind them, readers can refer to Chapter 3.
1.2.6 3D QoE
Although 3D video brings a brand new viewing experience, it does not necessarily increase the perceived quality if the 3D system is not carefully designed and evaluated. The 3D quality of experience refers to how humans perceive the 3D visual information, including the traditional 2D color/texture information and the additional perception of depth and visual comfort factors. As the evaluation criteria for measuring the QoE of 3D systems are still in their early stages, QoE-optimized 3D visual communications systems remain an open research area. At the current stage, the efforts to address 3D QoE consider the fidelity and comfort aspects. 3D fidelity evaluates the unique