Next-Generation Interactive Broadcast Services

Conference Paper · September 2004

Uwe Rauschenbach Klaus Illgner

Jörg Heuer


Next-Generation Interactive Broadcast Services
Uwe Rauschenbach, Siemens AG, CT IC 2, Munich, Germany, [email protected]
Jörg Heuer, Siemens AG, CT IC 2, Munich, Germany, [email protected]
Klaus Illgner, Siemens AG, CT IC 2, Munich, Germany, klaus.illgner

Abstract
This contribution discusses some recent trends in interactive broadcast systems, namely personalisable rich media TV, scalability of services, the co-operation of broadcast networks with their fixed or mobile IP-based counterparts and the convergence of communication and entertainment services. For each of these, an example will be presented.

Keywords
Interactive Broadcast, Mobile Broadcast, Rich Media Broadcast, Scalable Services, Convergence.

In recent years, digital TV broadcast services have seen rapid growth. Having started as a replacement for analogue television, these services are now being extended by developments that add value on top of the classic time-linear digital video services. This paper discusses some recent trends in the intersecting fields of television, multimedia, broadcast vs. IP network technologies, and new services. Examples from recent research and development projects are given for illustration.

The contribution is structured as follows: First, some trends of emerging and future interactive broadcast systems are described. After that, each trend is discussed in a separate section, and examples are presented. Finally, conclusions are drawn.

TRENDS
Digital television is a reality today. With the more efficient bandwidth use it provides, the user will be offered more TV programs than ever before. New ways of accessing this wealth of programs, based on personal preferences, must therefore be found. Additionally, the digital nature of the new television services makes it possible to use additional media types in a service, to deploy personal computing devices in addition to the TV set to receive these enriched services, and to use data and communications networks to transmit service components.

Personalisable and Scalable Rich Media TV Services enrich the linear program with additional related audio, video and rich text material. Thus, these services create options for interactively browsing related content (e.g., a background report for a News story or an additional camera view in a Sports programme) and cater for the needs of special user groups by providing, e.g., voice-overs in a foreign language supporting different nationalities or a sign language video supporting the hearing-impaired. The paper will present a rich media service concept in the field of News.

Personalization supports the (semi-)automatic selection of content a viewer is especially interested in. This functionality usually requires a receiver which is capable of recording the TV program. Additionally, metadata have to be provided along with the program which structure, describe and classify the content and thus allow selecting the interesting parts of the recording. The paper will describe a metadata model based on the TV-Anytime standard [4].

Broadcast-IP network cooperation. Co-operating networks offer new functionalities and cost savings for the provider. Sending content to multiple subscribers at the same time is the domain of broadcast networks. In contrast to that, the strength of broadband IP networks is to transfer personalized content efficiently and to provide a return channel for “deep interactions” with the service providers, e.g., for votes. We will describe how a rich media service, which consists of various media items, can be efficiently delivered over a combination of a DVB and a DSL network to minimize transmission costs. A smart routing algorithm selects the most appropriate transmission channel for each service component. The media items are combined at the receiver into a seamless presentation such that it is transparent via which channel they have been sent. If required by the service scenario, a media stream transmitted via DSL (e.g., additional camera view) can be synchronized with the main TV program sent via DVB.

Another recent trend is to combine broadcast and mobile telephone networks. It is envisioned that this combination offers new opportunities for television-like interactive multimedia services on-the-move.

Service convergence. Looking at the technical trends described above, new convergent services will become possible. For instance, communications (e.g. voice, SMS, MMS) and entertainment (i.e. watching TV) can be integrated with each other, a combination which will be further elaborated in the last section of the paper. Many more convergent services are possible, like voting, multi-user games and virtual communities, all integrated with TV programs.

A PERSONALISABLE, SCALABLE RICH MEDIA TV SERVICE
Rich media TV services enrich the classic linear program by adding supplementary or background information using various media types. Within the project SAVANT [9], a News service concept has been realized which illustrates the potential of this new type of TV service. Each News show is broken along the time line of the main TV program
into individual semantically coherent entities called Program Items. To each of the Program Items, additional related background information can be assigned: HTML pages from the Web presentation of the News show, video clips featuring earlier reports on similar topics, or audio clips featuring, e.g., reports from the radio related to a News story. This additional content is called asynchronous because it only has a loose coupling with the timeline of the program. Asynchronous content can also be assigned to a complete News show or to Topics, a concept which allows grouping multiple related News reports (e.g., reporting on the presidential elections) and providing a rich set of background information for them with moderate editorial effort.

The second class of additional content is of synchronous nature, i.e., it must be tightly synchronized with the main program’s timeline. An example is an additional video stream carrying a sign language interpreter for hearing-impaired people.

Personalization requires a set top box (STB) which offers Personal Digital Recording (PDR) functionality and opportunities to filter media data. A service may be personalized in two ways: first, additional content is shown in a live situation only if the personal profile of the viewer indicates this (e.g., show a signer only if the viewer is known to be hearing-impaired). Second, the complete program may be recorded, and the individual Program Items are then filtered according to personal preferences. This allows creating a personal News program featuring Program Items originating possibly from multiple different News shows.

The digital nature of the media data allows access to such a rich media service using a variety of portable devices. While a PDA or TabletPC may be used to consume the service in-house via WLAN access to the STB, access with mobile smartphones while on the move is possible, too.

Figure 1. Metadata model for personalized scalable rich media TV services

In contrast to classic TV which consists just of audio and video, new TV services contain a variety of media objects with diverse relationships and meanings. To support the various ways of presenting, accessing and personalizing such a service, a rich set of metadata called a service description is necessary which contains the required information. Figure 1 illustrates the metadata model. The segmentation of the main program along the time line into Program Items is shown. For personalization and random access via the STB user interface, each Program Item is annotated with e.g. title, abstract and copyright. Furthermore, categories and topics provide means to group Program Items. Based on these metadata, a personalization component can select Program Items based on a user profile expressing preference for several categories or keywords. Such a user profile can be specified by the user, or it can be collected by indexing the metadata of Program Items actively selected by a user for watching.

Figure 2 depicts the user interface of a scalable News service which allows access to the individual News items plus background information by exploiting the annotations and organization metadata.

Figure 2. User interface of personalized, scalable News

The second large group of metadata describes the additional content, supporting the rich media aspect of the service. For each additional content item (ACI), properties like title, synchronization with the main program or type have to be described. Each ACI references one or more Media Items which provide access to the actual media essences containing the additional content. The reference is realized by means of a media locator (URI), which may e.g. reference an HTML page via HTTP, a video clip via RTSP or via DVB MPE (multiprotocol encapsulation).

Service scalability means that a service is designed such that it can be deployed on various devices with different capabilities. A scalable service can benefit greatly from a scalable content format. Such a format would allow encoding each video stream once and then adapting it to different devices by just decoding a well-defined part of the data packets to reduce resolution or bitrate. However, scalable content formats for video like MPEG4 Fine Granularity Scalability [10] have not seen wide acceptance yet. The current MPEG-21 activity on Scalable Video Coding [8] may provide a solution in a few years. Meanwhile, a scalable service can also be realized by a combination of simulcasting and transcoding, at the cost of lower bandwidth efficiency. Simulcasting means transmitting a media item simultaneously in various formats. Transcoding means having a transcoder unit in the STB which converts a Media Item from one format/bitrate into another. The proposed metadata structure supports both approaches. By referencing multiple Media Items for one content item, simulcast is supported. For each Media Item, a description of the media properties is provided in the metadata. The most appropriate Media Item to present a content item on a specific device can be found by matching these media properties with the capabilities of the actual target device. To support transcoding, a Media Item can be marked as “virtual” by omitting the media locator but providing the media descriptions. This way, a transcoder engine can use a sibling Media Item from the same content item as the source for transcoding, turning the virtual Media Item into a real one and inserting a URI to the newly created media essence.

Figure 3. An extended TV Anytime representation of the metadata model

Several metadata standards for multimedia exist, namely MPEG-7 [6], MPEG-21 [7] and TV-Anytime [4]. As the last of these most closely resembles the structure of a TV service, it has been selected as the basis for the personalized, scalable SAVANT News service [11]. Several elements have been added in order to support all the required functionality (see Figure 3). Because Program Items are modelled as segments of the main broadcast, the SegmentInformationTable has been extended to include all the metadata necessary. Further, two tables have been added: one containing the information about the additional content items and another one containing the Topics as an alternative way of accessing Program Items and additional content.

Figure 4 shows a system for presenting such a service to the user. The content is delivered to a set top box by a combination of DVB and DSL. The STB acts as a home media server and stores the content along with the metadata on its hard disk. The personalized rich media TV programme can be displayed live at a connected TV set. Furthermore, the recorded content can be accessed using the TV set or mobile devices (a TabletPC and a PDA) connected to the STB via WLAN. The user interface to personalize and select the content (cf. Figure 2) is generated on the STB by an MHP application, taking the metadata into account. The mobile devices have access to scaled-down versions of the stored content.

Figure 4. Content Access System to consume the personalized, scalable News service

Having described the metadata for a scalable, personalized rich media TV service, which provide annotations of and relationships between the service components (see [11] for further details), we now discuss how the actual service components can be transmitted to the terminal, possibly even by using different transport networks for different Media Items.

BROADCAST-IP NETWORK CO-OPERATION

Rich Media Content Delivery over DVB Networks
Within DVB, all content is sent packaged into an MPEG-2 transport stream (TS). A detailed discussion of the many functionalities of a transport stream is outside the scope of this paper. From the point of view of rich media services, it is important to point out that, besides the main audiovisual content that is transmitted in the transport stream, additional media objects may be sent using three different mechanisms: the object carousel, private sections and multiprotocol encapsulation (MPE). The object carousel periodically transmits a hierarchical directory structure to the receiver and is thus well suited to sending content structured into multiple files (e.g., HTML pages or interactive MHP applications (Xlets)), but also metadata, to the receiver. Private sections provide another means to embed application-defined data packets into a TS. A client application has to be provided at the STB to extract them. This approach is well suited for the one-time transmission of big files, e.g., asynchronous additional video content, in contrast to using the object carousel, which may become overloaded as it repeatedly transmits its content.
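The overload argument for the object carousel can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a simplified carousel that sends every object exactly once per cycle at a fixed bitrate; the function name, the file sizes and the bitrate are hypothetical illustration values, not figures from the SAVANT system.

```python
def carousel_cycle_seconds(object_sizes_bytes, carousel_bitrate_bps):
    """Time for one full pass over a carousel that sends every object once per cycle."""
    total_bits = 8 * sum(object_sizes_bytes)
    return total_bits / carousel_bitrate_bps


# Hypothetical example values (assumptions, not taken from the paper).
pages = [50_000] * 20      # 20 HTML pages of 50 kB each
clip = 30_000_000          # one 30 MB asynchronous video clip
bitrate = 1_000_000        # 1 Mbit/s reserved for the carousel

without_clip = carousel_cycle_seconds(pages, bitrate)
with_clip = carousel_cycle_seconds(pages + [clip], bitrate)

print(f"cycle without clip: {without_clip:.0f} s")  # cycle without clip: 8 s
print(f"cycle with clip:    {with_clip:.0f} s")     # cycle with clip:    248 s
```

Under these assumptions a single large clip stretches the cycle time for every page in the carousel from 8 to 248 seconds, which illustrates why a one-time transmission outside the carousel is attractive for big files.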

Multiprotocol encapsulation allows carrying IP data packets in a DVB stream, making it possible to embed RTSP streaming video into a rich media service. This way, synchronized additional video streams can be sent along with the main programme (like a signer for hearing-impaired people or streams taken from additional camera angles in a Sports program).

Rich Media Content Delivery over Co-operating Broadcast and Broadband Networks
As described in the previous section, digital broadcast systems (DVB) are capable of carrying not only the main audiovisual content, but also additional media objects and even downstream IP traffic to a multitude of users at the same time. On the other hand, broadband IP networks (DSL) are becoming widely available, being ideally suited to carry personalized content on demand. We believe that a future personalized broadcast system will combine DVB and DSL networks for delivering a personalized service at optimized costs.

In order to make content transmission over co-operating networks a reality, two issues must be considered. First, the system should be able to select the channel via which a media item will be delivered. We call this feature Smart Routing. Second, the system must be able to synchronize content delivered through DSL with the main TV program delivered via DVB.

Smart Routing
The basic idea of smart routing is to save transmission costs by transmitting the additional content over the channel which offers the lowest transmission costs. For DVB, the costs are independent of the number of users. For DSL, the transmission costs grow proportionally with the number of users because for each user there will be an individual stream transmitted. Making a routing decision means selecting the N media items of a rich media program which are most likely to be consumed by the most users and transmitting them in the DVB channel, where N depends on the available capacity in the DVB channel and the bitrate of the individual media items. To estimate how likely it is that a media item will be in high demand, the following criteria can be used:
• Estimation by the program author or playout operator and insertion into the service description
• Prediction using usage statistics from previous similar programs and heuristics based on media properties
• Actual measurements during the current program
While the first two methods provide a fixed routing decision and are suitable for both asynchronous and synchronous additional content (i.e. clips and streams), the last method allows re-routing and can thus only be applied to asynchronous additional content. The reason is that for synchronous additional content, the routing decision cannot be changed while the content is playing, since a seamless change would require too much overhead. In contrast, the routing decision for asynchronous additional content may be revised at any time. Revising means that all users currently consuming this content via the “old” channel will keep doing so, while new users will receive the content through the “new” channel. As asynchronous clips are usually short, the “old” channel will be freed quite fast.

This way, we can distinguish the following three ways of smart routing:
1. Fixed routing of asynchronous content: asynchronous content is inserted into either the DVB or DSL channel depending on a pre-set field in the service description.
2. Re-routing of asynchronous content: if the number of users of an audiovisual Media Item accessed over DSL exceeds a threshold, the item will be inserted into the DVB stream, e.g. using private sections. If the usage figures drop again, the ACI is no longer made available via DVB but can still be pulled via DSL.
3. Fixed routing of synchronous content: synchronous content is inserted into either the DVB or DSL channel depending on a pre-set field in the service description.
A routing decision is executed by the system by triggering the playout system to insert the media item into the desired transmission channel. Furthermore, the system must change the media locator (URI) of a media item in the service description. This way, the Content Access System (cf. Figure 4) is instructed to extract the media from the correct channel. As a prerequisite for that, the service description must be updated regularly, and these updates must be signalled frequently to the Content Access System. As both DVB and DSL can carry IP traffic, the handling of the packets is the same after extracting them from the transmission channel.

Content Synchronization
Systems for rich media TV services must provide a new kind of synchronization: synchronizing additional content with the main TV program. While frame-accurate sync is required only in some very rare cases, synchronization with an accuracy of a few frames has many applications: additional camera angles in sports programs, quiz or talk shows; or a sign language interpreter to make TV programs accessible for hearing-impaired people. Synchronization has two facets: first, transmission delays must be compensated to ensure that the data packets carrying the additional content arrive at the right time in the decoder. Second, the presentation of the main and the additional content must be synchronized. Ideally, this should be possible using standard components at the receiver side.

When an additional video stream is streamed over RTSP and transmitted via DVB in MPE, transmission synchronization does not pose a major problem because no transmission delays between the main program and the additional content occur. When the additional stream is transmitted via DSL, however, both transmission delay compensation and presentation synchronization are necessary.
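As a rough illustration of delay compensation for a DSL-delivered stream, the sketch below buffers early packets until the main programme has reached the matching point in time. It is a minimal sketch, not the SAVANT implementation: it assumes that each packet carries the value of the main programme's clock at the moment it was sent, and that the receiver can read the current value of that clock from the main video. The names Packet, SyncBuffer and max_lateness are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    # Main-programme clock value (seconds) stamped when the packet was sent.
    sender_clock: float
    payload: bytes = field(compare=False, default=b"")


class SyncBuffer:
    """Holds packets of the additional stream until the main programme catches up."""

    def __init__(self, max_lateness=0.2):
        # Hypothetical tolerance (seconds) beyond which a late packet is dropped.
        self.max_lateness = max_lateness
        self._heap = []

    def push(self, packet):
        heapq.heappush(self._heap, packet)

    def pop_due(self, main_clock):
        """Return packets whose stamp has been reached by the main programme clock."""
        due = []
        while self._heap and self._heap[0].sender_clock <= main_clock:
            packet = heapq.heappop(self._heap)
            if main_clock - packet.sender_clock <= self.max_lateness:
                due.append(packet)
            # Packets that are later than the tolerance are silently dropped.
        return due


# Usage: packets may arrive out of order and earlier or later than needed.
buffer = SyncBuffer(max_lateness=0.2)
for p in (Packet(10.04, b"b"), Packet(9.0, b"old"), Packet(10.0, b"a")):
    buffer.push(p)

first = [p.payload for p in buffer.pop_due(10.0)]    # [b"a"]; b"old" is dropped
second = [p.payload for p in buffer.pop_due(10.05)]  # [b"b"]
```

The design choice here is to key everything on the main programme's clock rather than on wall-clock time, so that the additional stream follows the main video regardless of how long either network path takes.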

Figure 5: A signer synchronized with the TV program

In the SAVANT project [9], we have developed a combined approach to compensate transmission delays and ensure presentation synchronization:
• Additional video content is streamed as MPEG-4 via RTSP/RTP.
• A timing control component in the playout system ensures that an additional content stream is started at the right point in time, denoted in the service description and triggered by the clock driving the playout of the main video. If necessary (e.g. if the main content is sent over satellite), the start time may be delayed slightly.
• Each RTP packet is time-stamped with a reference to the clock of the main video (Normal Play Time, NPT).
• In the Content Access System (CAS), an RTSP Proxy intercepts the incoming data packets and buffers them.
• The transport stream demultiplexer in the CAS provides the RTSP Proxy with the NPT timing information extracted from the main video.
• Knowing the time stamp of an RTP packet (which contains the NPT value when the packet left the playout system) and the current NPT value of the main video in the CAS, the delay can be compensated by adjusting the time stamp in the RTP packet accordingly.
• The RTP packets with the adjusted time stamp are then passed to an unmodified MPEG4 player which presents the additional stream.

Figure 5 shows a News program enhanced with a signer which is synchronized with the main content using the method just described.

Convergent Mobile Broadcast Systems
Up to here, extensions to broadcast systems have been discussed which target the classic static TV set, with some in-house mobility added by using WLAN access. However, digital media technology can offer more: delivering interactive broadcast to mobile phones and ultra-portable devices. This issue is currently being addressed by the DVB forum [2]. A version of the DVB-T standard called DVB-H [3], especially designed for efficient battery use of mobile receivers, will be used as the downlink to broadcast multimedia content to mobile phones. Additionally, personalized information, interactions and content protection / billing functions will be provided using the GPRS or UMTS channels of the mobile telephone network, creating converged mobile services. In alternative configurations of converged broadcast-mobile systems, DVB-H may be replaced by a multimedia version of DAB [5] or the Multimedia Broadcast Multicast Service (MBMS) [1] in cellular networks.

Mobile phones and digital TV receivers have in common that both device categories are becoming highly widespread in daily life. However, a unique attribute of DVB set top boxes is the high display resolution compared to mobile devices. This especially allows increasing the ease and intuitiveness of use of such devices due to enriched capabilities of the graphical user interface (GUI). Both aspects make it attractive to integrate services other than broadcast reception into a digital STB.

A very first trend towards this goal was the integration of web browsing capabilities over DSL into STBs. However, due to the lower display resolution of TVs and different browser characteristics of STBs compared to the capabilities of PCs in general, web content has to be designed specifically for reception on TV. This has recently led ISPs to provide web content in a “Walled Garden” concept, which results in expensive information preparation. With a growing number of STBs providing web browsing capabilities, first business models of STB-specific web portals are being discussed. Since these portals are accessed automatically when the box is turned on, the content of the portal appears to the user as if it has been pushed to the STB, comparable to a broadcast service.

The convergence of such web and broadcast services is recognizable in that the web content delivered in this way allows interacting with the broadcast content by means of programming PDRs via Electronic Program Guides (EPGs) or interactive games accompanying broadcasts. Even richer interactive web-based applications are enabled by the synchronization mechanisms described in the previous section. A prerequisite to enable service convergence is that the service architecture has to converge. This is shown in Figure 6, which presents a high-level architecture view interfacing interactive TV servers with playout servers.

By adding DSL interfaces to the STB, it can act as a terminating point not only for broadcast but also for IP-based communication services. Because of their simplicity, SMS and MMS clients have been ported to first STB prototypes. The obvious advantage of these concepts is the further simplified use of the services. An example of an MMS Client on TV is shown in Figure 7. In this case the integration of MMS with broadcast allows new usage scenarios such as messaging of commented broadcast content. However, obstacles for this kind of service convergence are legal aspects of intellectual property rights regarding broadcast content and the lack of a purely IP-based standardisation of SMS and MMS services as shown in Figure 6.

Figure 6: System architecture supporting service convergence

Besides being a device terminating external services provided for instance via DSL, the STB can be seen as a component of the home network. Such an integration can be realized for instance via Universal Plug and Play (UPnP) connections to the STB, which allow coordinating tasks between, e.g., a phone and the video recorder capability of a device as shown in Figure 6. In this example, after negotiating the offered services via UPnP, the phone is able to send notifications of incoming calls to the STB, which renders caller information on the TV screen. On reception of the call, the phone triggers the STB to time-shift the current broadcast. After termination of the call, the user has the possibility to resume watching TV at the point at which the phone call started.

Figure 7: Gigaset Interactive TV as an example for convergent TV and communication services

Based on the services presented in this section, convergence of services can be divided into two categories:
a) Convergence which requires the coupling of service provisioning.
b) Convergence which results from coupling of services on the receiving device and leads to new usage scenarios of services.
In both cases, the service convergence examples seen so far aim at the enhancement of ease of use and user experience at the same time.

In this paper, we have listed some recent trends in enhanced digital broadcast systems and have sketched technical problems, possible solutions and usage scenarios related to these trends. It can be expected that in the next few years a number of new, interesting services will emerge based on the new opportunities of digital media technology and converging networks.

The authors express their thanks to the colleagues in the SAVANT project for many valuable discussions. Parts of the work presented have been funded by the European Union under contract number IST-2002-34814.

REFERENCES
[1] 3rd Generation Partnership Project; Multimedia Broadcast/Multicast Service; Stage 1, 3GPP TS 22.146 V6.5.0, June 2004.
[2] DVB Forum homepage,
[3] DVB forum, Draft DVB-H Standard, available from
[4] ETSI TS 102 822-3-1: Broadcast and On-line Services: Search, select and rightful use of content on personal storage systems ("TV-Anytime Phase 1"), Part 3 Metadata, Sub-part 1: Metadata Schemas. 2002.
[5] W. Hoeg and T. Lauterbach: Digital Audio Broadcasting: Principles and Applications of Digital Radio, 2nd edition, John Wiley & Sons, 2003.
[6] ISO MPEG-7, Part 5 - Multimedia Description Schemes, ISO/IEC JTC1/SC29/WG11/N4242, 2001.
[7] ISO MPEG-21, Part 7 - Digital Item Adaptation, ISO/IEC JTC1/SC29/WG11/N5231, 2002.
[8] ISO MPEG-21, Part 13 - Call for Proposals on Scalable Video Coding Technology, ISO/IEC JTC1/SC29/WG11/N6193, Waikoloa, December 2003.
[9] IST project SAVANT,
[10] W. Li: Overview of Fine Granularity Scalability in MPEG-4 Video Standard. IEEE Trans. Circuits and Systems for Video Technology, 11(3), March 2001.
[11] U. Rauschenbach, G. Stoll, W. Putz, R. Mies and P. Wolf: A Scalable Interactive TV Service Supporting Synchronized Delivery over Broadcast and Broadband Networks. IBC 2004 conference, Amsterdam.

