Accessible Interactive Television Using The MPEG-21 Standard

Evangelos Vlachogiannis¹, Damianos Gavalas², George E. Tsekouras² and
Christos N. Anagnostopoulos²

¹ Department of Product and Systems Design Engineering, University of the
Aegean, Hermoupolis, Syros, 84100, Greece
Phone: +302281097000
Fax: +302281097009
Email: [email protected]
Url: http://www.syros.aegean.gr

² Department of Cultural Informatics, University of the Aegean, Mytilene, Lesvos
Island, 81100, Greece
Phone: +302251036600
Email: [email protected], [email protected], [email protected]
Url: http://www.ct.aegean.gr/

Abstract

This paper discusses the accessibility of interactive television (iTV) as a primary
factor for its satisfactory adoption and commercial success. The work presented here was undertaken
in the context of a research project that focuses on delivering iTV services to disabled children.
This objective is pursued through the emerging MPEG-21 standard. Based on
that standard, iTV accessibility is investigated in terms of metadata and content adaptation. The
novelty of the contribution lies in a systematic methodology that deals with a wide range of
accessibility problems, in contrast to previous studies, which focus mostly on users with only one
specific disability.

Keywords: Accessible interactive TV, MPEG-21, content adaptation approach,
metadata, pervasive environments, collaborative filtering.

1 Introduction

"Winky Dink And You" is considered as "the first interactive TV show". It was a
children's television show that aired from 1953 to 1957 and allowed interaction
through the use of the “Official Winky Dink Kit”. It was the first time that the
role of television program consumers had been extended from passive viewer to
active participant. This can be compared with the very recent move from
passive web site consumption to the social, participative web known as Web
2.0. Wellens [30] stated that “Interactive television represents means of linking
individuals together by providing each with an electronically mediated
representation of the other’s voice and visual presence”.
The website of the RNIB Scientific Research Unit (Tiresias¹) draws the line between
interactive TV and Enhanced TV as follows: “Enhanced TV is probably a better
term to refer to one-way applications such as teletext, EPG access etc., and it
could be advantageous to restrict the term ‘interactive TV’ to two-way services
reliant on some form of return path.” Whatever form iTV has taken
(including webTV, internet TV, Video on Demand, cable, satellite, digital
terrestrial) during its long trial period, its adoption has fallen far short of
expectations [5]. According to Suzanne Stefanac of RespondTV, “the single
greatest stumbling block iTV faces is the lack of a clear standard” [36]. Choi et al
[5] developed a technology adoption model for iTV and argued that “iTV may
have different critical factors compared to conventional information systems
because it is mainly used in home environment and it has never been used
before”.
Currently, interactive television (iTV) is coming to the fore again, with more
advanced technologies and a more mature audience. The iTV field has adopted
techniques and technologies initially developed for the World Wide Web ([8], [9]).
This is most apparent in the case of IPTV but generally applies to all kinds of
iTV. Considering also that the number of TV sets worldwide is considerably larger than that of
PCs [33], it becomes evident that the interaction requirements, and
specifically the need for accessibility, are crucial. For instance, an iTV user now
faces a large number of services (the term used for TV channels) with amazing
possibilities. A similar “explosion” occurred in the past on the World Wide Web
and its search engines; later on, it was the portals (equipped with search engine
facilities) and the adaptation mechanisms that made the huge volume of information
manageable.
The MPEG-21 standard [11], recently released by ISO and aiming at defining an open
framework for multimedia applications, seems to find a natural fit in the world of
iTV [14].

¹ http://www.tiresias.org/research/guidelines/television/idtv.htm

This paper presents work undertaken in the context of a Greek national project
that aims to develop an MPEG-21 based framework for adapting iTV
content to the requirements of disabled children.
The authors propose an approach to iTV accessibility that focuses on the
interaction of the stakeholders through adaptation. Contrary to the majority of the
approaches found in the literature, this approach investigates iTV accessibility in a
wider manner, without focusing on a specific user group such as users with low
vision.
The paper starts by presenting the related research and setting out its contribution
roadmap. A discussion of requirements follows, and a higher-level approach is
proposed together with an accompanying architecture. Finally, the last section concludes our
work and outlines directions for future work.

2 iTV accessibility research & contribution roadmap

As early as 1997, RNIB provided recommendations for the accessibility of
iTV [7]. Carmichael et al [3] found similarities between the direction of
iTV and that of the Web and further noted that the experience gained from the
latter has to be transferred to the domain of iTV in order to avoid similar mistakes,
which have not been avoided so far. Piccolo et al [18] argued that the
convergence between the two media (i.e. Web and iTV) allows the
Web accessibility knowledge that has already been acquired to be reused with
some adjustments, and proposed recommendations for designing accessible interfaces.
To approach iTV accessibility, the authors have identified the following main
components (see Figure 1):
• Human actors: The consumers (end users), the authors and the providers,
who respectively consume, produce and provide the service.
• User/Controller Interface (or Direct UI): The interface with which the user
interacts directly.
• Controller / Content Interface (or Indirect UI): The interface that is often
provided through the set-top box and displayed on the TV monitor,
i.e. indirect user interaction.
• Content: The actual digital content (e.g. a movie) with its accompanying
metadata.

One way of achieving iTV for all is to satisfy the design requirements of
the identified components from an accessibility point of view.

Figure 1: iTV Main Components

Having identified the main iTV components, this section identifies the design
requirements that each component must satisfy for accessible iTV. These
are discussed briefly in the subsections below, where the contribution of this
work is also positioned.

2.1 Human actors’ accessibility role

The main stakeholders who act during the life cycle of an iTV broadcast
with respect to accessibility are:
• the content providers: the content / service providers should define an
accessibility policy and also provide inspection procedures that
guarantee fidelity to that policy.
• the authors: the content authors are the ones who need to create content
with the aforementioned accessibility policy in mind. Through an
appropriate authoring tool, they will be able to produce accessible content
by providing multimodal, metadata-enabled content.
• the consumers (end users): in order to consume an iTV programme
effectively, end users with some impairment need i) to make good use of their
assistive technology and ii) to provide appropriate feedback to the system
through the EPG interface (update profile, rate content etc.).

2.2 The Accessibility of the User / Controller Interface

Cesar et al [4] distinguished two essential pillars in interactive digital television
systems: user interaction and social communication. The first pillar, the most
interesting from the perspective of accessibility, concerns the design and
development of user interfaces, as old-fashioned passive remote controls do
not seem adequate or usable enough. They identified three subtopics of the “user
interaction” topic: extension of traditional remote controls, including voice and
gestures; augmentation of everyday objects, including natural ways to interact
with media content and nonintrusive methods; and repurposing of other devices,
including handheld devices as universal remote controls.
An alternative approach to the user interaction theme is the use of abstract
user interfaces. “People with different types of disabilities find it difficult or
impossible to directly use electronic devices and services because the
device’s/service’s user interface cannot accommodate the special needs of certain
user groups (such as users with visual, hearing, or mobility impairments)” [34].
That research suggests that such users have to rely on service and device
implementations that are specifically designed for them. In other words, a single
individualized, universal user interface should be able to deal with as many
interactive devices as possible. A standardization effort has taken place
within the URC consortium in order to come up with a versatile user interface
description for products, a "User Interface Socket" to which any URC can connect
to discover, access and control the remote product [35]. The URC approach has
recently been adopted by ISO (ISO/IEC 24752:2008) [12].
Abstract user interfaces seem even more challenging from the adaptation point of
view. The user interface design process is very important, since the user interface is the subsystem
that directly interacts with the users and their contexts. For several years, Human
Computer Interaction (HCI) research has been struggling to develop an abstract representation of
the user interface ([25], [21]). This would make it possible to adapt the UI
to the system's environment (including the user and the context of use, e.g. [28]).
Given such an abstraction, context-sensitive pipelines could be introduced so that the interface
is adapted according to the system's environment [29].

2.3 The Accessibility of the Controller / Content Interface (EPG/IPG)

It turns out that the most significant accessibility difficulties concerning iTV
are related to the use of the Electronic Programming Guide (EPG) by users with
visual, motor, or cognitive disabilities [18]. Thus, an important step towards
accessible iTV is providing a well-designed EPG. The EPG is a vital component
of interactive television, allowing viewers to navigate through available programs
and services. It is often a complex on-screen application, influenced by the design of WIMP
(Windows, Icons, Menus, Pointers) interfaces [2], that guides users
through a huge number of programs and services. This is far removed from
traditional analog TV menus, which had to handle no more than 5 to 10 passive
channels. The Vista project [2] aimed at developing a virtual assistant embodying a
speech-based interface between digital television viewers and the content and
functions of the EPG. In order to target preschool
children effectively, Joly et al [13] developed special requirements based on a
range of existing guidelines on interactive television applications, personalized
recommendation systems and interaction design for children, in the context of
theories of child development.
Rice and Alm [19] attempted to support older people who have difficulties in
using current interface models for Digital TV. Their research indicated that
“navigational techniques that mimic aspects of real-world artefacts in a manner
that individual’s can quickly relate to present possible new directions in DTV
design. However, the success of such systems depends on research strategies that
take the impact of both an appropriate input control and on-screen interaction
into account.”

2.4 The Accessibility of the Content

In the related literature there have been several attempts to incorporate
accessibility issues into MPEG-21. The majority of them focus on
visual disabilities (e.g. [20], [22], [24], [31]). Rice [20] presented the
difficulties that visually disabled users face while consuming iTV services. That
work emphasized parameters such as screen size, font size and color, icon
identification and screen layout. It concluded that the best
approach to the problem is personalization, due to the diverging
requirements. Choi et al [5] noted that the TV, compared with the PC, is a
home appliance and is therefore not personal but shared, which directly implies
that the opinions of family members are very influential. Thang et al [24]
proposed a systematic contrast-enhancement method to improve content
visibility for low-vision users through MPEG-21 content adaptation. Yang et al
[31] proposed a technique for making iTV accessible to people with visual
deficiencies, especially color blindness. This technique involves both the
use of MPEG-21 with related descriptive metadata and the design of an
adaptive system. Berglund & Johansson [1] studied the benefits of using
speech dialog in the domain of iTV and arrived at several design
considerations. Carmichael et al [1] [3] concluded that the accessibility
characteristics that have not yet been given the necessary emphasis are subtitles,
captions and audio description, characteristics that are emphasized in
the context of the web (WCAG 2.0, SMIL², SVG³).

3 Specific Issues of the application domain

Having set out a general approach to the accessibility of interactive television,
this section raises some more specific system design issues
that stem from the specific target user group. As already mentioned in the
introduction, this work focuses on the delivery of interactive television content to
disabled children. The requirements of disabled children are considered to be the set of
requirements that results from blending the requirements of disabled people
with those of children in relation to interactive television.
Clarkson et al [6] identify four types of disability, each accompanied by related
issues (in parentheses):
• Visual Impairment (Recognizing and locating buttons on the remote
control; Reading the on-screen display)
• Hearing Impairment (Subtitles, Volume, Literacy)
• Dexterity Impairment (Button sensitivity; Compact layout, Remote
Control Complexity)
• Cognitive Impairment (Time delays between cause and effect;
understanding the way in which elements of the on-screen display are
intended to correspond to the buttons on the remote control, literacy)
As for children's requirements, Hynd [10], while studying the responses
of young children to interactive television programs, focused on television’s
immediate effects on attention, comprehension, engagement and enjoyment. Hynd
examined the characteristics of television that have been found to influence these
outcomes for young children, as well as individual factors such as gender and age.

² Accessibility Features of SMIL: http://www.w3.org/TR/SMIL-access/
³ Accessibility Features of SVG: http://www.w3.org/TR/SVG-access

Combining the two aforementioned research results, we can come up with some
questions that can lead to i) the potential parameters that a disabled child’s profile
should incorporate and ii) the technical characteristics that interactive television
broadcasting should provide:
• What programme does the child want to consume?
• What interaction capabilities / possibilities does a
specific program provide so that it can gain the attention of
the child?
• How do such programs need to be communicated to the child, and using which
modalities?
• How simple should dialogs and texts be, and how long should they
persist in order to be comprehensible?
• How simple and attractive should both the remote controller and the EPG
be?
• How could the context help or distract the child?
This list is by no means exhaustive, and of course not all of these questions
relate to the proposed software architecture. Nevertheless, it illustrates the approach
developed in order to extract the IN PARAMS and OUT PARAMS discussed in the next
section (see Figure 2). It should be noted that several technical requirements
coming from the “disabled people” perspective often intersect with some coming
from the “children” perspective. For instance, someone with a cognitive impairment
and a child with limited literacy both require simple text and dialogs.

4 An approach towards accessible Interactive television focusing on the Content

This section presents the paper’s approach to enhancing the
accessibility of interactive television. The approach focuses on:
• the requirements that the content must meet to allow accessibility,
• the appropriate communication (e.g. subtitles, audio description, sign
language etc.) to the end user through adaptation mechanisms and
• the delivery of appropriate programs to the end user (program
recommendations) depending on the user’s program preferences and
capabilities (such as impairment and age).
MPEG-21 can provide the iTV designer with a framework that offers a big,
integrative picture of an iTV system. On that basis, an indicative scenario has
been devised, covering production, delivery and consumption of the digital
content, with the aim of identifying the primary entities and the way they are involved
in the overall design outcome (see Figure 2). According to that scenario:
• The content designer (CD) identifies the target groups.
• The CD, supported by MPEG-21 metadata, describes the target groups
using their characteristics (e.g. blindness) and associates interaction modes
(e.g. auditory description) using an appropriate authoring tool.
• The CD develops the required content components (digital items) based on
the interaction modes decided above. These are integrated into the
metadata by using the authoring tool.
• End user A, say a blind user, wants to consume the developed content. She/he has
already stored her/his profile. The context of use is completed with
attributes such as access device capabilities, audio configuration, time and
location of the end user.
• The context of use is delivered to the serving system together with the
user request.
• The system performs inference and maps the user’s context of use to an
appropriate composition of the content components. If the context of use
changes during consumption, the system needs to be
aware of it so that it can adapt to the new requirements.
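
The last two steps of the scenario can be illustrated with a minimal Python sketch. It is not the project's inference mechanism: the `ContextOfUse` fields, the component names and the selection rules are all hypothetical and chosen only to show how a context of use might be mapped to a composition of content components, and how the mapping is recomputed when the context changes.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ContextOfUse:
    """Hypothetical context of use; the field names are illustrative only."""
    visual_impairment: bool = False
    hearing_impairment: bool = False
    audio_output_available: bool = True


def select_components(context: ContextOfUse) -> List[str]:
    """Map a context of use to the content components to be composed."""
    components = ["video"]
    if context.visual_impairment:
        # Audio description is only useful when audio can be played.
        if context.audio_output_available:
            components.append("audio_description")
    elif context.hearing_impairment or not context.audio_output_available:
        components.append("captions")
    return components


# End user A is blind: the video is composed with an audio description.
user_a = ContextOfUse(visual_impairment=True)
composition_a = select_components(user_a)

# Another viewer mutes the audio while watching: the context of use has
# changed, so the composition is recomputed and captions are added.
viewer = ContextOfUse()
before = select_components(viewer)
after = select_components(replace(viewer, audio_output_available=False))
```

Keeping the context immutable and recomputing the composition on every change is one simple way to make the system "aware" of modifications; an event-driven design would serve equally well.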

Figure 2. MPEG-21 involvement in iTV: a possible scenario.

Even though MPEG-21 addresses adaptation, and specifically
accessibility, by including several related XML elements in its schema,
it cannot by itself ensure the accessibility of delivered content. Rather, it is a
fundamental precondition for the systems involved to produce accessible output.
In other words, it should provide the required infrastructure so that
digital content can obtain the requisite variety, both for the content
designer to be able to design accessible content and for the involved systems to
have the information required to deliver an accessible result. Figure 3 presents the
stakeholders related to accessibility. From this point of view, the content
provider, the author (also referred to as content designer), the authoring tools, the
systems of the content provider and of course the consumer with her/his
accompanying interaction profile [29] (preferences, device capabilities etc.)
are identified, and all play a major and cascading role in iTV accessibility.

Figure 3. Multimedia delivery stakeholders related to accessibility

Briefly, the role of MPEG-21 in the accessibility of iTV is revealed
through the following dimensions:
Alternative content: MPEG-21 offers metadata that allows content providers to
provide the content in one or more alternative ways. These ways often correspond to
different modalities and can thus include captions, audio descriptions, etc.
Digital Content Navigation: In iTV environments, navigation facilities within the
available content are provided by an Electronic Program Guide (EPG). This is
the interactive portion of the system, which offers the required functionality
to the user, including service (channel) selection / retrieval, program information
and scheduling, profiling / personalizing, rating and/or even acting upon the
content.
Description of context of use (IN PARAMS): The usage context refers to
all the information that needs to be taken into account to adapt digital content
to the user’s requirements.
Description of presentation parameters of digital content (OUT PARAMS): This
determines which technical characteristics need to be adapted. An important
implementation consideration was the transformation of MPEG-21 to SMIL as an
intermediate solution to ensure compatibility with media players. This involves a
mapping between the two formats, realized using XSLT.
Device accessibility: This refers to the accessibility of the hardware involved,
including remote controls and set-top boxes⁴.
Content provider accessibility policy: An important contribution of MPEG-21 to the
field of accessibility is probably the capability of applying, and claiming conformance to, an
accessibility policy. In other words, content providers need to be capable of
applying a kind of accessibility policy based on the target consumer group and
its requirements for quality assurance. For instance, such a policy could
require digital content to be accompanied by subtitles in two languages (e.g.
English, Greek) and every image to have an alternative text of between two and ten
words. Applying such policies requires a mechanism for validating digital
content against a policy description, which could for instance be implemented with
Schematron⁵ (an XML structure validation language for making assertions about
the presence or absence of patterns in trees).
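
To make the example policy concrete, the following Python sketch checks the two rules mentioned above (subtitles in English and Greek, alternative texts of two to ten words). It deliberately uses the standard `xml.etree.ElementTree` module rather than Schematron, and the `<content>`, `<subtitle>` and `<image>` elements are a hypothetical simplification, not part of the MPEG-21 schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified content description; the element and attribute
# names are illustrative and are not taken from the MPEG-21 schemas.
CONTENT = """
<content>
  <subtitle lang="en" ref="subs_en.txt"/>
  <subtitle lang="el" ref="subs_el.txt"/>
  <image ref="poster.png" alt="Film poster with title"/>
  <image ref="logo.png" alt="Logo"/>
</content>
"""

REQUIRED_SUBTITLE_LANGUAGES = {"en", "el"}  # English and Greek
MIN_ALT_WORDS, MAX_ALT_WORDS = 2, 10


def check_policy(xml_text):
    """Return a list of human-readable policy violations (empty if none)."""
    root = ET.fromstring(xml_text)
    violations = []

    # Rule 1: subtitles must exist for every required language.
    present = {s.get("lang") for s in root.iter("subtitle")}
    for lang in sorted(REQUIRED_SUBTITLE_LANGUAGES - present):
        violations.append(f"missing subtitles for language '{lang}'")

    # Rule 2: every image needs an alternative text of 2 to 10 words.
    for image in root.iter("image"):
        words = len((image.get("alt") or "").split())
        if not MIN_ALT_WORDS <= words <= MAX_ALT_WORDS:
            violations.append(
                f"image '{image.get('ref')}' has {words} alt-text word(s)"
            )
    return violations


violations = check_policy(CONTENT)
```

Here the one-word alternative text of `logo.png` is reported as a violation. A Schematron rule set would express the same assertions declaratively, which is preferable when policies are authored by content providers rather than programmers.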

⁴ http://www.tiresias.org/equipment/settop_boxes.htm
⁵ http://xml.ascc.net/schematron/schematron1-5.sch

5 The System Architecture

Figure 4 illustrates the system’s overall architecture, which resulted from the
aforementioned approach to accessibility. The overall system consists of:
• the accessibility-enabled authoring tool (developer21), which allows
content providers to easily author a variety of multimedia resources
supporting an MPEG-21 compliant metadata model;
• the user interface (itvSimu), which is the component through which the
end user experiences the services;
• the expert (content recommendation) system, which uses an algorithm
originally devised for clustering web documents [26] to classify digital
items and user profiles based on their attributes and to enable intelligent TV
program recommendations;
• the backend infrastructure, which consists of i) a persistence subsystem
based on a native XML database where the digital content descriptors are
stored and ii) a web services infrastructure for the communication between
distributed subsystems.

Figure 4. iTV adaptation architecture

5.1 Overview of the authoring tool (developer21)

Developer21 for MPEG-21 serves as a multimedia authoring tool that adds
MPEG-21 descriptors and metadata to, or extracts them from, various multimedia assets, as
shown in Figure 4. Once created, these descriptors (in XML schema files) are
stored locally in an XML metadata database. Users can create
new MPEG-21 Digital Items and edit, delete, convert or send the metadata document
to the database.
The tool is designed to support 6 different XML schemas, each one dedicated to
the respective MPEG-21 part. The MPEG-21 descriptors that are provided by
Developer21 are the following:
• Digital Item Declaration (DID)
• Digital Item Identification (DII)
• Intellectual Property Management and Protection (IPMP)
• Rights Expression Language (REL)
• Rights Data Dictionary (RDD)
• Digital Item Adaptation (DIA)
Basic information about the descriptors is presented graphically: the
type of descriptor (DID, DII, IPMP, RDD, REL, DIA), the type of program
information, and either general information or only audio and video attributes. In parallel
with the editing and browsing capabilities of the tool, metadata management is
also supported. Metadata and XML descriptors are bound to the actual
multimedia content in order to create the integrated Digital Item, which
contains the actual content and the descriptive information. Once a Digital Item has been
processed with Developer21 (see Figure 4), it is in the appropriate form to interact
with the expert system that is used for increasing interactivity in IPTV or iTV.
The expert system assigns a TV viewer to a specific social category and then
matches the appropriate audiovisual content according to its respective MPEG-21
descriptors. Hence, Developer21 supports content personalization through MPEG-21
metadata, describing the target groups by their characteristics (e.g., blindness)
and associating the appropriate interaction modes (e.g., auditory description) with
the multimedia content.
In general, personalization allows users to browse programs much more
efficiently according to their preferences. Specifically, using DIA the authoring
tool describes user characteristics such as:
• Usage preferences and history
• Content characteristics preferences such as Audio, Display Color and
Graphics presentation, Presentation Priority and Stereoscopic Video
Conversion
• Accessibility issues such as focus of attention, auditory impairment, visual
impairment, color vision deficiency
• Terminal technology such as codec capabilities, display capabilities, audio
output capabilities, user interaction inputs and device class
• Network capabilities and condition
• Location, time and environment

5.2 Overview of the user interface prototype (itvSimu)

Within our research project, the need to design and develop
a simulation platform, acting as an interaction interface between our iTV
architecture and the prospective viewer, was evident. In other words, a user
interface prototype has been implemented to enable users to effectively browse,
search, download and consume the provided audio-visual content. In the case of
disabled people, ‘effectively’ means that both the content and the value-added
services need to be accessible to the user. Our project has not focused on
the accessibility of the EPG, as this was outside its scope.
Nevertheless, the GUI has been developed using the Java Accessibility API / Java
Access Bridge⁶, which makes our prototype accessible at a satisfactory level.
In effect, the developed user interface comprises an EPG simulator. It should be
noted that the choice of implementation technologies has not been straightforward,
considering the plethora of available standards and technologies such as
MHP⁷, GEM-IPTV, TV-Anytime, DVB-IP, Java-TV and more. Given the
requirement for incorporating networking functionality into the EPG subsystem, a
web-based approach has been adopted instead of a standalone application. This
approach ensures that the EPG can be executed through a standard browser interface. The
design approach follows.
During the early phases of the design of the prototype system, the
stakeholders were identified:
• The end user: he/she interacts with the iTV interface, browsing and
consuming digital content. The end user is associated with an XML-based
user profile, which includes personal data, preferences regarding the
audiovisual content (e.g. sports, news, movies) and potential disabilities
(hearing problems, visual impairments, etc.).
• The Service Provider: the analogue of the traditional TV channels.
• The TV Guide Provider: a service that informs end users about the offered
services and their time schedule.
Occasionally, the Service Provider and the TV Guide Provider coincide; for
simplicity, we have made this assumption while designing our prototype.
Our focus has been on the interaction of the end user with the iTV interface, since
that affects the overall functionality of a personalized system, with particular
emphasis on disabled users.

⁶ http://java.sun.com/javase/technologies/accessibility/accessbridge/index.jsp
⁷ http://www.mhp.org/

Figure 5 illustrates the three elementary subsystems of the iTV user interface: the
player (left panel), the EPG (right panel) and the logger (bottom panel). The EPG
panel consists of three sub-panels: i) the “My ITV” panel, where the content
recommendations appear and the user can also trigger a reminder, ii) the “Program” panel,
where the user can select between services and view the program of the selected
service and iii) the “My Profile” panel, where the user can modify her/his profile.
The three elementary subsystems presented above are supported by auxiliary
services that enhance the functionality of the iTV simulator. Below we analyze
the functional and interactivity requirements of these subsystems
and discuss the solutions adopted in our prototype.

Figure 5. A screenshot of the iTV user interface: recommendations panel

5.2.1 itvSimu Subsystems

Logger subsystem
This is the simplest, yet a crucial, software module, as it provides feedback to the
user about “hidden” operations. It records and displays all (implicit or explicit)
user actions (e.g. profile modification, starting / pausing / resuming a TV
program, etc.). It has been implemented using the Observer pattern in Java: the
observed user actions activate the logger.
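
The Observer arrangement described above can be sketched as follows. The prototype itself is written in Java; this Python version only illustrates the pattern, and the class and method names (`Subject`, `Logger`, `Player`, `attach`, `notify`, `update`) are hypothetical rather than taken from itvSimu.

```python
class Subject:
    """Minimal observable: keeps a list of observers and notifies them."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def notify(self, action, explicit):
        for observer in self._observers:
            observer.update(action, explicit)


class Logger:
    """Observer that records every user action it is notified about."""

    def __init__(self):
        self.entries = []

    def update(self, action, explicit):
        kind = "explicit" if explicit else "implicit"
        self.entries.append(f"[{kind}] {action}")


class Player(Subject):
    """Observable player: each user action is reported to the observers."""

    def play(self, title):
        self.notify(f"started '{title}'", explicit=True)

    def pause(self, title):
        self.notify(f"paused '{title}'", explicit=True)


logger = Logger()
player = Player()
player.attach(logger)
player.play("Cartoon")
player.pause("Cartoon")
# logger.entries now holds one line per user action, in order.
```

Because the player knows nothing about the logger beyond the `update` call, further observers (for instance one that feeds the interaction history to the recommendation system) could be attached without changing the player.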

[Figure 6 diagram: an MPEG-21 digital item and the user profile are fed to the XSLT adaptation engine, which outputs a SMIL document.]

Figure 6. XSLT Transformation of MPEG-21 digital items to SMIL documents.

Player subsystem
This subsystem plays iTV programs (digital items) and records the user’s
interaction history. Its elementary module is the digital content player. Such a
player should support more than the basic functionality (play, pause, rewind, etc.),
for example subtitles, audio descriptions, etc. Given that no MPEG-21 player is
currently available, we have chosen to use SMIL as an intermediate technology,
mainly due to the numerous available SMIL players (e.g. X-Smiles⁸, QuickTime
player). In particular, the MPEG-21 digital item declarations are transformed into
SMIL format through an appropriate XSLT transformation, and subsequently the
SMIL markup is parsed by the SMIL player. This approach ensures the
interoperability of the iTV interface, since SMIL is now considered a mature web
technology. In our prototype, the SMIL player has been implemented using the
QuickTime for Java API⁹. As illustrated in Figure 6, the XSLT transformation of
MPEG-21 digital items to SMIL documents depends on the user profile, taking
into account potential user disabilities. An example of such a digital item
declaration and its SMIL representation is given in Figure 7.

⁸ X-Smiles SMIL player, http://www.xsmiles.org/xsmiles_smil.html
⁹ QuickTime for Java (QTJ) is a software library that allows software written in Java to provide multimedia functionality
by making calls into the native QuickTime library. QTJ offers SMIL support and can also handle a larger variety of
multimedia formats than the ‘traditional’ Java Media Framework (JMF) API.

Digital Item Declaration (left panel):

    <?xml version="1.0" encoding="UTF-8"?>
    <DIDL xmlns:xsi=http://www.w3.org/…="....\MPEG-21-DI\DIDL.xsd">
      <ITEM>
        <DESCRIPTOR>
          <STATEMENT TYPE="text/plain">
            Movie for normal, blind or deaf individuals
          </STATEMENT>
        </DESCRIPTOR>
        <ITEM>
          <DESCRIPTOR>
            <STATEMENT TYPE="text/plain">
              Movie for normal individuals
            </STATEMENT>
          </DESCRIPTOR>
          <COMPONENT>
            <RESOURCE REF="video.mov" TYPE="video/mov"/>
          </COMPONENT>
        </ITEM>
        <ITEM>
          <DESCRIPTOR>
            <STATEMENT TYPE="text/plain">
              Movie for blind individuals
            </STATEMENT>
          </DESCRIPTOR>
          <COMPONENT>
            <RESOURCE REF="video.mov" TYPE="video/mov"/>
            <RESOURCE REF="audiodescription.mov" TYPE="audio/mp3"/>
          </COMPONENT>
        </ITEM>
        <ITEM>
          <DESCRIPTOR>
            <STATEMENT TYPE="text/plain">
              Movie for deaf individuals
            </STATEMENT>
          </DESCRIPTOR>
          <COMPONENT>
            <RESOURCE REF="video.mov" TYPE="video/mov"/>
            <RESOURCE REF="captions.txt" TYPE="text/plain"/>
          </COMPONENT>
        </ITEM>
      </ITEM>
    </DIDL>

SMIL representation (right panel):

    <?xml version="1.0" encoding="UTF-8" ?>
    <smil xmlns:qt="http://www.apple.com/...." time-slider="true">
      <head>
        <layout>
          <root-layout width="320" height="350" background-color="black" />
          <region id="captions" backgroundColor="yellow"
                  top="250" height="100" left="1" width="310" />
          <region id="movie" left="0" top="0" width="620" height="740" />
        </layout>
      </head>
      <body>
        <par>
          <textstream src="captions.txt" region="captions"
                      systemCaptions="on" />
          <video src="video.mov" alt="Movie title" region="movie"
                 begin="00:00.0" dur="00:14:02.000" />
        </par>
      </body>
    </smil>

Figure 7. A Digital Item Declaration document (left) transformed to SMIL format (right) which
synchronizes a video with captions (appropriate for hearing impaired individuals).
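
The profile-dependent selection behind Figure 7 can be illustrated with a short sketch. The prototype performs this step with XSLT; the Python standard library has no XSLT processor, so the sketch below swaps in xml.etree.ElementTree and builds the SMIL tree directly. The function names (select_item, item_to_smil), the keyword matching on the STATEMENT text, and the omission of namespaces are assumptions made for illustration only; the region values are copied from Figure 7.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the Figure 7 declaration (namespace attributes omitted).
DIDL = """<DIDL><ITEM>
  <DESCRIPTOR><STATEMENT TYPE="text/plain">Movie for normal, blind or deaf individuals</STATEMENT></DESCRIPTOR>
  <ITEM>
    <DESCRIPTOR><STATEMENT TYPE="text/plain">Movie for normal individuals</STATEMENT></DESCRIPTOR>
    <COMPONENT><RESOURCE REF="video.mov" TYPE="video/mov"/></COMPONENT>
  </ITEM>
  <ITEM>
    <DESCRIPTOR><STATEMENT TYPE="text/plain">Movie for blind individuals</STATEMENT></DESCRIPTOR>
    <COMPONENT>
      <RESOURCE REF="video.mov" TYPE="video/mov"/>
      <RESOURCE REF="audiodescription.mov" TYPE="audio/mp3"/>
    </COMPONENT>
  </ITEM>
  <ITEM>
    <DESCRIPTOR><STATEMENT TYPE="text/plain">Movie for deaf individuals</STATEMENT></DESCRIPTOR>
    <COMPONENT>
      <RESOURCE REF="video.mov" TYPE="video/mov"/>
      <RESOURCE REF="captions.txt" TYPE="text/plain"/>
    </COMPONENT>
  </ITEM>
</ITEM></DIDL>"""


def select_item(didl_root, audience):
    """Return the nested ITEM whose statement mentions the audience keyword.

    Matching on the statement text is a hypothetical stand-in for the
    profile test that the XSLT stylesheet would perform.
    """
    for item in didl_root.findall("./ITEM/ITEM"):
        statement = item.findtext("./DESCRIPTOR/STATEMENT", default="")
        if audience in statement:
            return item
    raise LookupError(f"no item for audience {audience!r}")


def item_to_smil(item):
    """Build a SMIL tree that plays all resources of the item in parallel."""
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width="320", height="350")
    ET.SubElement(layout, "region", id="captions",
                  top="250", height="100", left="1", width="310")
    ET.SubElement(layout, "region", id="movie",
                  left="0", top="0", width="620", height="740")
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")
    for res in item.findall("./COMPONENT/RESOURCE"):
        ref, mime = res.get("REF"), res.get("TYPE")
        if mime.startswith("video/"):
            ET.SubElement(par, "video", src=ref, region="movie")
        elif mime.startswith("audio/"):
            ET.SubElement(par, "audio", src=ref)
        elif mime.startswith("text/"):
            ET.SubElement(par, "textstream", src=ref,
                          region="captions", systemCaptions="on")
    return smil


if __name__ == "__main__":
    smil = item_to_smil(select_item(ET.fromstring(DIDL), "deaf"))
    print(ET.tostring(smil, encoding="unicode"))
```

Selecting the "deaf" item yields a video element plus a textstream bound to the captions region, which mirrors the right-hand side of Figure 7; selecting "blind" yields the video plus the audio description instead.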

The second function of the Player subsystem is the provision of user interaction
information to the expert (recommendation) system. An XML-based description
of the user interaction is first stored in a native XML database located on the
iTV’s server and is then retrieved by the expert system to enable more effective
and reliable reasoning. In effect, the user interaction history is summarized by a
function f(x, y, ..., z), where x, y, ..., z are the values of the interaction parameters.
Such parameters are either explicitly provided by the user or implicitly inferred by
the player. An example of an implicit parameter is the ratio of the playing time of
a video to its total duration, while the rating of a TV program (on a 0-10 scale)
could be explicitly provided by the viewer. For two such parameters, the
interaction history function could be expressed as f(x, y) = a·x + b·y, where a and
b are weights that reflect the designer’s priorities and may be either static or
dynamically specified (through training).
As shown in Figure 8, the user’s interaction history and the TV program ratings
posted by users that belong to the same user cluster (the concept of user cluster
will be discussed later on in this paper) comprise the input of the expert system.
The latter recommends, among the available digital content, those programs that
suit the user’s profile and the user cluster XML descriptions.
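
As a minimal sketch of the weighted interaction function described above, the following Python fragment combines one implicit parameter (watched ratio) with one explicit parameter (viewer rating). The class and function names, the default weights of 0.5, and the normalization of the rating to the range 0-1 are assumptions introduced for illustration; the paper does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    played_seconds: float    # implicit: how long the viewer actually watched
    duration_seconds: float  # total length of the program
    rating: float            # explicit: viewer rating on the 0-10 scale


def interaction_score(event, a=0.5, b=0.5):
    """Compute f(x, y) = a*x + b*y.

    x is the watched ratio (capped at 1.0, an assumption for replays) and
    y is the rating rescaled to 0-1 so both terms share the same range.
    """
    x = min(event.played_seconds / event.duration_seconds, 1.0)
    y = event.rating / 10.0
    return a * x + b * y


if __name__ == "__main__":
    # Half of a 14:02 (842 s) video watched, rated 8 out of 10.
    print(interaction_score(Interaction(421, 842, 8)))
```

With static weights the designer fixes a and b in advance; in the dynamic case they would be fitted from training data, which is outside the scope of this sketch.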

[Figure 8 diagram. Labels: Digital content rating; MPEG-21 documents (digital
items); User Profile; User Interaction History; Recommendation System; iTV
Schedule recommendation.]

Figure 8. TV schedule recommendation.

EPG subsystem
This is the most ‘interactive’ subsystem, since the user employs it to browse,
navigate and download audiovisual content. In the context of our research project
we have identified several use cases, according to which the iTV end-user may use
the EPG in order to:
• navigate within iTV available services (zapping);
• personalize the audiovisual content based on her potential disabilities and
content preferences;
• schedule a reminder for a TV program.
An important consideration during the EPG’s development has been the
representation and retrieval of the TV schedule. To satisfy this design requirement
we have used TV-Anytime Programme metadata [27] along with the TV-Anytime
Java API developed by the BBC¹⁰. The overall functionality of the EPG loosely
follows the specifications of the Java TV API (JSR 927). The result of retrieving
the BBC TV schedule on the iTV interface is shown in Figure 5.

¹⁰ http://www.bbc.co.uk/opensource/projects/tv_anytime_api/

The most important part of content personalization has been the modelling of user
characteristics (e.g. disabilities) and preferences. To address this issue, we have
adopted the Interaction Profile of the DAWIS framework for the design of adaptive
web information systems [29]. The most abstract layer of the DAWIS Interaction
Profile consists of the Service Interaction Profile, the Delivery Context Interaction
Profile, the User Interaction Profile and the Platform Interaction Profile. Based on
that, an itvProfile schema has been developed and serialized in XML syntax,
including elements such as LanguageNative, Languages, ContentPreferences,
Disabilities, Subtitles, Captions, AudioDescription and SignLanguage. The
itvProfile instances are stored in a separate collection of the XML database
through XQuery¹¹.
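
To make the profile structure more concrete, the sketch below serializes a hypothetical itvProfile instance with Python’s xml.etree.ElementTree. Only the top-level element names are taken from the text above; the nesting, the child element names (Language, Preference, Disability), and the true/false encoding of the presentation options are assumptions, since the actual itvProfile schema is not reproduced in this paper.

```python
import xml.etree.ElementTree as ET


def build_itv_profile(native, languages, preferences, disabilities,
                      subtitles=False, captions=False,
                      audio_description=False, sign_language=False):
    """Build an itvProfile element tree (structure is illustrative only)."""
    profile = ET.Element("itvProfile")
    ET.SubElement(profile, "LanguageNative").text = native

    # List-valued elements get one hypothetical child per entry.
    for parent_tag, child_tag, values in (
        ("Languages", "Language", languages),
        ("ContentPreferences", "Preference", preferences),
        ("Disabilities", "Disability", disabilities),
    ):
        parent = ET.SubElement(profile, parent_tag)
        for value in values:
            ET.SubElement(parent, child_tag).text = value

    # Presentation options are encoded as "true"/"false" text (an assumption).
    for tag, flag in (
        ("Subtitles", subtitles),
        ("Captions", captions),
        ("AudioDescription", audio_description),
        ("SignLanguage", sign_language),
    ):
        ET.SubElement(profile, tag).text = "true" if flag else "false"
    return profile


if __name__ == "__main__":
    profile = build_itv_profile(
        native="el",
        languages=["el", "en"],
        preferences=["cartoons"],
        disabilities=["Total Deafness"],
        captions=True,
        sign_language=True,
    )
    print(ET.tostring(profile, encoding="unicode"))
```

A document of this shape could then be placed in the profile collection of the XML database and queried by element name, for example to read the Disabilities entries before adaptation.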

5.3 Overview of the Expert System

The expert system aims to increase the accessibility of the iTV platform through
i) content recommendation and ii) content adaptation, both based on the user
profile and the content metadata (with emphasis on age and accessibility).
Accordingly, the expert system consists of two subsystems: i) the content
recommendation unit and ii) the content adaptation unit. Figure 9 presents the
architecture of the system.

[Figure 9 diagram. Labels: Expert System; Content Recommendation; Content
Adaptation; Content Clustering; User Profile; XML Content Descriptions; XML
Descriptions of Content Clusters; Personalized - Adapted Multimedia Content.]

Figure 9. Basic structure of the expert system.

¹¹ XQuery 1.0: An XML Query Language: http://www.w3.org/TR/xquery/

5.3.1 The Content Recommendation Unit

The basic functionality of the content recommendation unit is to provide program
recommendations to the users based on their user profile and the multimedia
content metadata, following a collaborative filtering process ([15], [16], [17],
[32]). This unit follows a hierarchically structured algorithm, which is described
below.

Collaborative Filtering Algorithm


Step 1. We perform cluster analysis of the content XML documents in order to
        create document clusters, called content clusters. Specifically, based on
        the XML documents associated with the content Digital Items, we select
        the significant attributes and assign their values to the corresponding
        document. Since the pre-selected attributes are categorical in nature, the
        set of XML documents defines a categorical data set. Then, we apply an
        algorithm to partition this data set into a number of clusters, so that
        documents that belong to the same cluster are as similar as possible,
        while documents belonging to different clusters are as dissimilar as
        possible.
Step 2. Similarly, we select a number of significant attributes, including the
        usage history, from the user profile XML documents. These attributes
        define the feature space, where each user is represented by one point.
        Thus, we generate a set of categorical data points, each of which
        corresponds to a specific user. Then, we apply the clustering algorithm
        to partition the set of users into a number of clusters, called user
        clusters.
Step 3. The system prompts users to rate the programs they have consumed. We
        assign each user cluster to a specific content cluster. This assignment is
        carried out by taking into account the sum of the ratings given by the
        users that belong to the same user cluster. Then, we assign to that user
        cluster the content cluster that corresponds to the highest rating. It
        should be emphasized that each user cluster may, in general, be mapped
        to multiple content clusters. Herein, we chose a one-to-one mapping in
        order to reduce computational cost.
Step 4. The system classifies the current user into a user cluster and recommends
        to her the programs that belong to the content cluster assigned to that
        user cluster (see previous step).
Step 5. If needed, the recommendation list may be shortened by keeping only the
        most interesting programs. This may be accomplished by applying a
        threshold to the ratings of the individual programs that belong to the
        recommended content cluster.
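
Steps 3 to 5 above can be sketched in a few lines of Python. The sketch assumes that the clustering of Steps 1 and 2 has already been carried out and that each user cluster is summarized by a representative tuple of categorical attribute values (its mode). The function names, the simple matching dissimilarity used to classify a user, and the use of the mean rating for the Step 5 threshold are illustrative assumptions, not the implementation of the prototype.

```python
from collections import defaultdict


def mismatches(a, b):
    """Simple matching dissimilarity between two categorical tuples."""
    return sum(x != y for x, y in zip(a, b))


def map_user_clusters(ratings, user_cluster, content_cluster):
    """Step 3: map each user cluster to the content cluster whose programs
    received the highest sum of ratings from the users of that cluster."""
    totals = defaultdict(lambda: defaultdict(float))
    for (user, program), rating in ratings.items():
        totals[user_cluster[user]][content_cluster[program]] += rating
    return {uc: max(sums, key=sums.get) for uc, sums in totals.items()}


def classify_user(features, cluster_modes):
    """Step 4 (first half): pick the user cluster with the nearest mode."""
    return min(cluster_modes, key=lambda c: mismatches(features, cluster_modes[c]))


def recommend(features, cluster_modes, mapping, content_cluster, ratings,
              threshold=0.0):
    """Steps 4-5: programs of the mapped content cluster whose mean rating
    reaches the threshold (mean over all raters is an assumption)."""
    target = mapping[classify_user(features, cluster_modes)]
    per_program = defaultdict(list)
    for (_, program), rating in ratings.items():
        per_program[program].append(rating)
    selected = []
    for program, cluster in content_cluster.items():
        scores = per_program.get(program, [])
        mean = sum(scores) / len(scores) if scores else 0.0
        if cluster == target and mean >= threshold:
            selected.append(program)
    return sorted(selected)


# Toy data; all names and values are hypothetical.
USER_CLUSTER = {"ann": "U1", "bob": "U1", "eve": "U2"}
CONTENT_CLUSTER = {"cartoon_a": "C1", "cartoon_b": "C1", "news_a": "C2"}
CLUSTER_MODES = {"U1": ("child", "deaf"), "U2": ("adult", "none")}
RATINGS = {
    ("ann", "cartoon_a"): 9, ("bob", "cartoon_b"): 4, ("ann", "news_a"): 3,
    ("eve", "news_a"): 8, ("eve", "cartoon_a"): 2,
}

if __name__ == "__main__":
    mapping = map_user_clusters(RATINGS, USER_CLUSTER, CONTENT_CLUSTER)
    print(mapping)
    print(recommend(("child", "deaf"), CLUSTER_MODES, mapping,
                    CONTENT_CLUSTER, RATINGS, threshold=5))
```

Keeping a single content cluster per user cluster, as chosen in Step 3, is what lets the mapping be a plain dictionary; a one-to-many mapping would require ranking several content clusters per user cluster.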

5.3.2 The Content Adaptation Unit

Content adaptation is of major importance for the efficient presentation of
multimedia content [23]. Herein, content adaptation is performed by using a
number of inference rules. To design the set of rules, the Digital Items are stored
in three abstraction levels. The first level stores the original multimedia object and
its respective locators, which include all the information required to download the
object (e.g. the path). The second level includes the descriptions of the object,
which mainly concern the type of the content (i.e. audio file, video file, image file,
etc.). Finally, the third level includes all the sub-elements of the digital item. An
example of the three abstraction levels is depicted in Figure 10.

[Figure 10 diagram. A Digital Item passes through the metadata adaptation
process, yielding two versions. Labels in one version: Video, Audio, Audio
Description, Voice to Text, Sign Language Description, Freq. 500Hz Description.
Labels in the other version: Video, No Audio, with several sub-elements marked
Void.]

Figure 10. Example of two different versions of a DI as a result of the metadata adaptation process.

Based on the above figure the inference rules used to adapt the content are derived
as follows:

First, the symbols O1, O2 and O3 denote the objects in the first, second and
third abstraction level, respectively. Thus, the object in the first level can be
described in terms of the second level objects:

O1 = {O2(Video) / O2(Audio)}

Likewise, the second level objects are described in terms of the third level objects
as

O2(Video) = {O3(Video, Audio Description) / O3(Video, SLD)}
O2(Audio) = {O3(Audio, VtT) / O3(Audio, Freq500Hz)}

where SLD and VtT stand for Sign Language Description and Voice-to-Text,
respectively.
In the next step, we consider the “Accessibility” attribute of the user profile. An
example of the domain of values of this attribute is given next:

PR(Access) = {PR(Access, Total Blindness) /
              PR(Access, Partial Blindness) /
              PR(Access, Total Deafness) /
              PR(Access, Partial Deafness)}

where PR and Access stand for the Profile and the Accessibility attribute,
respectively. Based on the above analysis, the adaptation inference rules for the
above example are as follows:

O2(Video) ∧ PR(Access, Total Blindness) → O3(Video, Audio Description)
O2(Video) ∧ PR(Access, Partial Blindness) → O3(Video, Audio Description)
O2(Video) ∧ PR(Access, Total Deafness) → O3(Video, SLD)
O2(Video) ∧ PR(Access, Partial Deafness) → O3(Video, SLD)
O2(Audio) ∧ PR(Access, Total Deafness) → O3(Audio, VtT)
O2(Audio) ∧ PR(Access, Partial Deafness) → O3(Audio, Freq500Hz)

where ∧ is the conjunction operator. Finally, it should be noted that the
adaptation inference rules apply both to recommended programs and to
programs that the user chooses to view on her own initiative.
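
The six inference rules listed above amount to a lookup from a pair (second-level object type, accessibility value) to a third-level object. The following minimal Python sketch encodes them as a dictionary. The names RULES and adapt are hypothetical, and the pass-through behaviour for pairs that no rule covers (returning the object with no third-level variant) is an assumption, since the text does not state what happens in that case.

```python
# Keys: (second-level object type, accessibility value of the profile).
# Values: (object type, third-level variant); SLD = Sign Language
# Description, VtT = Voice-to-Text, as defined in the text.
RULES = {
    ("Video", "Total Blindness"): ("Video", "Audio Description"),
    ("Video", "Partial Blindness"): ("Video", "Audio Description"),
    ("Video", "Total Deafness"): ("Video", "SLD"),
    ("Video", "Partial Deafness"): ("Video", "SLD"),
    ("Audio", "Total Deafness"): ("Audio", "VtT"),
    ("Audio", "Partial Deafness"): ("Audio", "Freq500Hz"),
}


def adapt(o2_objects, accessibility):
    """Apply the rules to each second-level object of a digital item.

    Objects without a matching rule are returned with variant None
    (an assumed default, not part of the rule set in the paper).
    """
    return [RULES.get((o2, accessibility), (o2, None)) for o2 in o2_objects]


if __name__ == "__main__":
    print(adapt(["Video", "Audio"], "Partial Deafness"))
    print(adapt(["Video", "Audio"], "Total Blindness"))
```

Representing the rules as data rather than as nested conditionals makes it straightforward to add further accessibility values or third-level variants without touching the lookup logic.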

6 CONCLUSIONS AND FUTURE WORK

So far, the developed system is at a prototype stage, and its subsystems (i.e. the
expert system, the authoring tool and the iTV simulator) have not yet been
evaluated as a whole by end users, due to the project’s time limitations.
Nevertheless, itvSimu appears to offer an interesting and simplified architecture
that can realize a primitive IPTV platform and serve as benchmarking software
for further research in the field of content adaptation and accessibility. Currently,
the prototype supports only a subset of user groups, because evaluating the
adaptation behavior requires a considerable number of users with diverse profiles
and a comparable number of digital items. Such an evaluation is considered as
future work. In addition, it would be interesting to consider more runtime
parameters (implicit profile) and more effective models for combining them,
possibly through AI techniques and simulation. Finally, a separate version of
itvSimu optimized for users with hearing problems (e.g. incorporating auditory
menus functionality) will be implemented.
From the point of view of standardization efforts, the selection of standards
turned out to be a difficult task, as there are many of them, often overlapping
and/or contradicting each other. Consequently, even if a designer uses open
standards, the final overall design becomes a proprietary solution composed of
several open standards.
Finally, it should be mentioned that the proposed approach and architecture help
to bridge the digital divide by offering accessible services to different groups of
people. At the same time, given that the number of disabled and elderly people is
increasing and that alternative access devices (e.g. mobile phones) are
proliferating in everyday life, incorporating accessibility in iTV is an opportunity
for businesses to grow their market share.

ACKNOWLEDGMENTS

This work is supported by the General Secretariat of Research and Technology (Project “Software
Applications for Interactive Kids TV-MPEG-21”, project framework “Image, Sound, and
Language Processing”, project number: EHΓ-16). The participants are the University of the
Aegean, the Hellenic Public Radio and Television (ERT) and the Time Lapse Picture Hellas.

REFERENCES
[1] Berglund A, Johansson P (2004). Using Speech and Dialogue for Interactive TV
Navigation. In Universal Access in the Information Society 3(3-4):224–238.
[2] Carmichael A, Petrie H, Hamilton F, Freeman J (2003). The Vista Project: Broadening
Access To Digital TV Electronic Programme Guides. PsychNology Journal 1(3):229–241.
[3] Carmichael A, Rice M, Sloan D (2006). Inclusive Design and Interactive Digital
Television: Has an Opportunity been Missed? In 3rd Cambridge Workshop on Universal
Access and Assistive Technology. Fitzwilliam College , Cambridge, 10-12 April 2006.
[4] Cesar P, Chorianopoulos K, Jensen J F (2008). Social television and user interaction.
Comput. Entertain. 6, 1 (May. 2008), 1-10. Doi: 10.1145/1350843.1350847
[5] Choi H, Choi M, Kim J, Yu H (2003). An empirical study on the adoption of information
appliances with a focus on interactive TV. Telemat. Inf. 20, 2 (May. 2003), 161-183. Doi:
10.1016/S0736-5853(02)00024-2
[6] Clarkson J, Karger S, Sinclair K (2003). Digital television for all: report on usability and
accessible design. DTI, London
[7] Darby S (1997). Introduction to Enhancing the Accessibility of Digital Television. In
RNIB.
[8] Ferguson D A, Perse, E. M. (2000). The World Wide Web as a functional alternative to
television. Journal of Broadcasting & Electronic Media, 44, 155-174.
[9] Gil A, Pazos J, Lopez C, Lopez J, Rubio R, Ramos M, Diaz R (2002). Surfing the Web on
TV: the MHP approach. In Proceedings of the 2002 IEEE International Conference on
Multimedia and Expo (ICME '02), vol. 2, pp. 285-288.
[10] Hynd A (2006). Evaluating four and five-year old children’s responses to interactive
television programs. PhD thesis, Murdoch University, 2006.
https://wwwlib.murdoch.edu.au/adt/pubfiles/adt-MU20070824.134316/02Whole.pdf
[11] ISO MPEG-21, Part I: Information technology - Multimedia framework (MPEG-21) —
Vision, Technologies and Strategy, ISO/IEC TR 21000-1:2004.
[12] ISO/IEC 21000-7, Information technology - Multimedia framework (MPEG-21) — Part 7:
Digital Item Adaptation, First Edition, 2004.
[13] Joly A V, Pemberton L, Griffiths R (2008). Electronic Programme Guide Design for
Preschool Children. In Changing Television Environments, p. 263-267. Doi: 10.1007/978-
3-540-69478-6_35
[14] Lugmayr A, Samuli N, Seppo K (2004). Digital Interactive TV and Metadata: Future
Broadcast Multimedia. Berlin; New York: Springer, 2004
[15] Melville P, Mooney R J, Nagarajan R. (2002). Content-Boosted Collaborative Filtering for
Improved Recommendations. In Proceedings of the 18th National Conference on Artificial
Intelligence, 187-192.
[16] Niiranen S (2003). Broadcast Multimedia Personalization. Licentiate thesis, Tampere
University of Technology.

[17] Petrie H, Weber G (2004). Personalization of Interactive Systems. In Proceedings of the 9th
International Conference on Computers Helping People with Special Needs (ICCHP’2004),
117-120.
[18] Piccolo L, Melo A, Baranauskas M (2007). Accessibility and Interactive TV: Design
Recommendations for the Brazilian Scenario. In: Baranauskas, M. C. C. et al: INTERACT
2007: 11th IFIP TC 13 International Conference -- Proceedings, Part I. Rio de Janeiro.
Brazil. p. 361--374.
[19] Rice M, Alm N (2008). Designing new interfaces for digital interactive television usable by
older adults. Comput. Entertain. 6, 1 (May. 2008), 1-20. Doi: 10.1145/1350843.1350849.
[20] Rice M (2004). Personalisation of Interactive Television for Visually Impaired Viewers. In
2nd Cambridge Workshop on Universal Access and Assistive Technology. Fitzwilliam
College , Cambridge, 22-24 March 2004.
[21] Souchon N, Vanderdonckt J (2003). A review of xml compliant user interface description
languages. In Conference on Design, Specification, and Verification of Interactive Systems.
Volume 2844 of Lecture Notes in Computer Science, Springer (2003) 377–391.
[22] Springett M, Griffiths R (2007). Accessibility of Interactive Television for Users with Low
Vision: Learning from the Web. In Proceedings of the 5th European Conference on
Interactive TV (EuroITV’2007), LNCS 4471, pp. 76-85, May 2007.
[23] Sun H, Vetro A, Asai K (2003). Resource Adaptation Based on MPEG-21 Usage
Environment Descriptions. In Proceedings of the IEEE International Conference on
Circuits and Systems, 536-539.
[24] Thang T C, Yang S, Ro Y M, Wong E K (2007). Media Accessibility for Low-Vision
Users in the MPEG-21 Multimedia Framework. IEICE Transactions on Information and
Systems, E90-D(8), pp. 1271-1278.
[25] Trewin S, Zimmermann G, Vanderheiden G (2004). Abstract representations as a basis for
usable user interfaces. In Interacting with Computers 16. May 2004, pp.477-506.
[26] Tsekouras G, Anagnostopoulos C, Gavalas D, Economou D (2007). Classification of Web
Documents using Fuzzy Logic Categorical Data Clustering, Proceedings of the 4th IFIP
Conference on Artificial Intelligence Applications & Innovations (AIAI’2007), 93-100.
[27] TV-Anytime, ETSI TS 102 822: Broadcast and On-line Services: Search, Select and
Rightful Use of Content on Personal Storage Systems.
[28] Velasco A, Mohamad Y, Gilman S, Viorres N, Vlachogiannis E, Arnellos A, Darzentas S
(2004). Universal access to information services—the need for user information and its
relationship to device profiles. Univers. Access Inf. Soc. 3, 1 (Mar. 2004), 88-95. Doi:
10.1007/s10209-003-0075-5
[29] Vlachogiannis E (2008). Methodological framework, analysis, development and creation of
design support environments of adaptive systems. PhD Thesis, University of the Aegean,
Dept. of product and Systems design Engineering.
[30] Wellens A R (1979). An interactive television laboratory for the study of social interaction.
J. Nonverbal Behavior 4, 2,119–122.

[31] Yang A, Ro Y, Nam J, Hong J, Choi S, Lee J (2004). Improving Visual Accessibility for
Color Vision Deficiency Based on MPEG-21. In ETRI Journal, vol.26, no.3, June 2004,
pp.195-202.
[32] Zeng C, Xing C, Zhou L (2003). Similarity Measure and Instance Selection for
Collaborative Filtering. In Proceedings of the 12th International World Wide Web
Conference, 652 - 658.
[33] Zillmann D (2000). The coming of media entertainment. In: Zillmann D, Vorderer P (eds)
Media entertainment: the psychology of its appeal. Lawrence Erlbaum Associates,
Mahwah, pp 1–20
[34] Zimmermann G, Vanderheiden G, Gilman A (2003). Universal remote console -
prototyping for the alternate interface access standard. Universal Access. Theoretical
Perspectives, Practice, and Experience. 7th ERCIM International Workshop on User
Interfaces for all. Revised Papers, 24-25 Oct. 2002, 524-31.
[35] Zimmermann G, Vanderheiden G, Trewin S (2005). Interface Sockets, Remote Consoles,
and Natural Language Agents: A V2 URC Standards Whitepaper -
http://myurc.org/whitepaper.php.
[36] Pignetti L, Capria F (2001). Interactive television today. DV Magazine, June 2001, 1-9.
