Terena Paper
Abstract
1. Introduction
In recent times, videoconferencing solutions for communication over the Internet
Protocol have become increasingly available and mature. Establishing feasible
audio-visual sessions between Internet-connected desktop computers is no longer an
ambitious task, provided all partners have access to compatible tools and know each
other's location.
In adopting the Internet protocol standards as the underlying, commonly available
communication infrastructure, Videoconferencing over IP (VCoIP) can soon be
expected to be widely available. When heading towards VCoIP as a standard
Internet service, important steps have to be taken to ensure usability on a global
scale. Any requirement of specific hardware or dedicated networking infrastructure is
likely to hinder VCoIP roll-out and should therefore be avoided.
This work was supported in part by the EFRE Programme of the European Commission.
Up until now, the use of videoconference applications has been dominated by ISDN
systems. This traditional technology offers a person-to-person, or meeting-oriented,
private service, as does telephony in general. The communication paradigm consists
of a point-to-point connection between dedicated devices under specific user
management.
In contrast, VCoIP is embedded into general Internet-connected working devices
and is today oriented towards more or less public conference groups. As employment
of VCoIP grows more mature, the need for meeting-oriented, private sessions has to
be met urgently. Since it addresses people rather than devices, it should adapt to the
common internetworking communication paradigm of mobile users accessing
services, not equipment.
In the present paper we address the issue of global, decentralised VCoIP
communication infrastructure. We present a simple, ready-to-use approach to user
look-up without modification of the current Internet information infrastructure as well
as serverless, highly efficient VCoIP software, implementing our information strategy.
Our solution rigorously aims for ease and functionality at the price of loss of
generality.
This paper is organised as follows. In section 2 we discuss communication strategies,
introducing our basic ideas and examples of related work. Section 3 presents the
daViCo videoconference software and its core technologies. Finally, section 4 is
dedicated to conclusions and a look at practical experiences of the solution.
The H.323 architecture must be considered as local in the sense that all participants
need to agree on common MCU and Gatekeeper servers which, at least for the
MCU, suffer from severe scaling deficiencies. No global naming is defined except for
telephone numbers handled by ISDN gateways and the Q.931-compatible signalling
protocol H.225.0. H.323 concepts centre on the ideas of telephone-based wide area
connectivity and are made obsolete by the simple observation that the use of
videoconferencing via telephony is not growing. Consequently, attempts are made to
overcome local restrictions in addressing by interconnecting Gatekeepers via meta-
directory servers, as done in the Video Development Initiative [3].
H.323 terminals may be used independently of servers for bilateral conferences. MS
Netmeeting and others operate in this way. The serverless extension to multipoint
capabilities in the IP world is most efficiently done via multicast transport, where any
client in the conference simultaneously takes the role of multicast source and
destination. Multicasting is employed at the price of communicating more or less
in full public. Multicast features do not conform to H.323 and have been implemented,
for example, by the Mbone Tools [4], Vcon [5] and Ivisit [6].
User location services of the available conference tools remain rudimentary. Besides
direct addressing of manually discovered devices and static listings, some terminals
can connect to a directory server and dynamically update user locations. This can be
done, for example, with Netmeeting and the MS Internet Locator Server [7]. In this
way, a conference attendee may select partners from the people currently registered at
a previously selected directory server. The SDR Mbone tool, though it obtains its
listings through multicast advertisements, exhibits similar behaviour. However, the
communicative aspect of these services remains far from self-steering and is
comparable to chat groups.
The problem yet to be solved concerns strategies of locating appropriate services
and contacting a communication partner at will on a global scale. Thereby, in order
to ensure short-term success, no solution should involve changes to the present
Internet information structure.
A fairly general attempt has been made with the Session Initiation Protocol (SIP)
[8]. Besides user location, SIP covers the negotiation of user capabilities, user
availability, call set-up by SDP and the handling of the calls themselves. SIP introduces
its own infrastructure of servers which actively communicate by using SIP-URLs or
other network protocols such as ICMP. SIP is open to storing persistent information in
common databases such as LDAP directories, but adheres to its own server
communication layer.
SIP does not prescribe a specific addressing scheme but proposes addresses of the
form <user>@<SIP-server>, where the SIP-server contains a name mapping
directory learned from client registrations or proactively driven by unspecified server
inquiries. If addresses not of the SIP-server-type are used, the server will perform a
user address based routing throughout the distributed SIP databases (see fig. 2). SIP
does not provide mechanisms to ensure success in locating a user or a SIP-server
present on the network. It should be noted that SIP-server addresses cannot be
guessed from mail addresses as soon as virtual user table names that do not
reflect the underlying infrastructure are used.
The SIP concept proposes either a significant roll-out of SIP self-learning, interrelated
infrastructure or just the presence of single, isolated information servers. In the latter
case, strategies to locate these information servers remain vague. Both SIP and
H.323 have the drawback of exchanging addresses within the protocol payload and
are thereby severely hindered in NAT traversal as well as in migration to IPv6.
In the following section we will introduce a mechanism covering the user location part
of SIP that precisely specifies location strategies and operates without inventing new
addresses or protocols.
session-based services would call for a new DNS service record pointing at the USL
directory server for a given domain name. The extension of the DNS by SRV records
has been proposed in RFC 2782 [9] and is referred to in [8]. However, it requests a
change in Internet information structure at present stage and remains a proposal.
Similarly, but with less significant changes in Internet naming, the DNS TXT record
could be employed to store the location of a USL look-up server as proposed in
RFC 1464 [10].
Figure 2: SIP user address based routing
Because these two approaches, despite their straightforwardness, imply global
modifications on DNS content structure that cannot be easily achieved, we chose a
much simpler strategy. DNS data provided today are ready to cope with it: because
the mail exchange record indicates a physically present domain where any requested
user is identifiable along with a method of authentication, it is the appropriate location
for a USL server. Within this domain, the look-up server can be identified by the
common approach of a naming convention, i.e. usl.<mailexchanger-domain> [17].
Consequently, a global user look-up proceeds in two steps. Firstly, the MX record for
the target user is requested, and secondly, the directory server hostname formed
from the above naming convention is resolved (see fig. 3).
Figure 3: Distributed User Location Scheme
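The two look-up steps can be sketched as follows. This is only an illustration under stated assumptions: the MX query is passed in as a callable (resolve_mx is a hypothetical name, so that any DNS library can be plugged in), and the mail exchanger domain is assumed to be the MX host name without its leftmost label, which the naming convention above does not spell out.

```python
from typing import Callable, List, Tuple

# Hypothetical resolver type: maps a domain to (preference, exchanger host) pairs.
MxResolver = Callable[[str], List[Tuple[int, str]]]


def mail_domain(address: str) -> str:
    """Return the domain part of a mail address such as user@example.org."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a mail address: {address!r}")
    return domain.lower()


def usl_hostname(address: str, resolve_mx: MxResolver) -> str:
    """Step 1: request the MX record. Step 2: form usl.<mailexchanger-domain>."""
    records = resolve_mx(mail_domain(address))
    if not records:
        raise LookupError(f"no MX record found for {address!r}")
    # The exchanger with the lowest preference value is the preferred one.
    _, exchanger = min(records)
    exchanger = exchanger.rstrip(".").lower()
    # Assumption: the exchanger domain is the host name minus its first label.
    _, _, exchanger_domain = exchanger.partition(".")
    if not exchanger_domain:
        raise LookupError(f"cannot derive a domain from {exchanger!r}")
    return "usl." + exchanger_domain


def fake_mx(domain: str) -> List[Tuple[int, str]]:
    """Stand-in for a real DNS query, used only to keep the sketch offline."""
    table = {"example.org": [(20, "backup.example.net."), (10, "mx1.example.org.")]}
    return table.get(domain, [])


print(usl_hostname("alice@example.org", fake_mx))
```

The resulting host name would then be resolved with an ordinary address query, for instance via socket.getaddrinfo, before the directory is contacted.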
Though simple, this user session information architecture neither relies on
infrastructural changes nor requires dedicated user knowledge on the application
side. Note that in contrast to H.323 gatekeepers or SIP servers the USL server
consists of a passive session record store and can be realised by an unmodified
standard LDAP server such as OpenLDAP. It is easily integrated into existing local
infrastructure and may establish videoconferencing as a serious, regular Internet
communication service.
Integration into local LDAP directory services then can be easily achieved through a
server referral.
DN:
dn: [email protected],dc=application
Attributes:
objectclass ( < OID > NAME 'VCoIP' SUP top AUXILIARY
    DESC 'Video Conferencing over IP Session Information'
    MUST ( VCoIPipHostNumber $ VCoIPipServicePort $ VCoIPServiceProtocol
           $ VCoIPTimeStamp $ mail $ cn )
    MAY ( VCoIPMcastGroup $ VCoIPAppID $ VCoIPAppVer $ VCoIPAppProtocol
          $ VCoIPMimeType $ VCoIPPrivateipHostNumber $
          VCoIPPrivateipServicePort $ VCoIPStatusFlag )
    )
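To illustrate how a client might assemble a session record for this object class, the following minimal sketch checks a set of attributes against the MUST and MAY lists above before they are handed to an LDAP library. The attribute names are taken from the schema; the function name build_session_entry, the value formats (port as a number, protocol as "tcp", timestamp in seconds) and the sample values are assumptions made for illustration only.

```python
MUST = (
    "VCoIPipHostNumber", "VCoIPipServicePort", "VCoIPServiceProtocol",
    "VCoIPTimeStamp", "mail", "cn",
)
MAY = (
    "VCoIPMcastGroup", "VCoIPAppID", "VCoIPAppVer", "VCoIPAppProtocol",
    "VCoIPMimeType", "VCoIPPrivateipHostNumber",
    "VCoIPPrivateipServicePort", "VCoIPStatusFlag",
)


def build_session_entry(**attrs):
    """Validate attributes against the VCoIP class and return an entry dict."""
    missing = [name for name in MUST if name not in attrs]
    unknown = [name for name in attrs if name not in MUST and name not in MAY]
    if missing:
        raise ValueError("missing mandatory attributes: " + ", ".join(missing))
    if unknown:
        raise ValueError("attributes not in schema: " + ", ".join(unknown))
    # LDAP attribute values are lists of strings. VCoIP is an auxiliary class,
    # so a real entry additionally needs a structural class (omitted here).
    entry = {"objectClass": ["VCoIP"]}
    for name, value in attrs.items():
        entry[name] = [str(value)]
    return entry


# Sample values are invented for the example.
entry = build_session_entry(
    VCoIPipHostNumber="192.0.2.10",
    VCoIPipServicePort=5000,
    VCoIPServiceProtocol="tcp",
    VCoIPTimeStamp=1015804800,
    mail="alice@example.org",
    cn="Alice Example",
)
print(entry["VCoIPipServicePort"])
```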
separate control sessions. To correct the way that NAT breaks such applications, an
Application Layer Gateway, ALG, commonly needs to be implemented directly on the
NAT-GW. Even though major vendors offer H.323 ALGs, much of the sustainable
success of Internet applications is hindered if they cannot run on endpoints without
first requiring upgrades to infrastructure components. Even though NAT-GWs are
expected to disappear with the change to the IPv6 protocol, discussions on how to
overcome NAT-GWs are increasing throughout the Internet community [16].
To achieve our goal of extending VCoIP over the given, unmodified infrastructure, even
in the presence of NAT, we proceed as follows: when working behind a NAT-GW, the
USL needs to be installed outside the NAT range. Since our system signals and
receives media streams on a single network port, which can be TCP or UDP with similar
qualitative performance, we proceed through the NAT to contact the USL via TCP. We
then preserve this connection in order to keep the NAT-GW from dropping its state
information, extract address and port from the packet headers and publish them to
the USL directory. By following this procedure, the infrastructure remains completely
untouched, while any caller from the public Internet will obtain addressable
connection data to initiate a videoconference session. Note that this NAT workaround
could be achieved for UDP-based communication in a similar fashion.
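The core of this procedure, reading the externally visible address and port from an established TCP connection that is then kept open, can be sketched with plain sockets. The sketch assumes a hypothetical helper on the USL side that accepts the connection and reads the peer address; the text does not state which party performs the extraction. It runs on the loopback interface, where no translation takes place, so the observed address simply equals the client's local one.

```python
import socket
import threading


def observe_peer(listener, result):
    """USL-side helper (assumed): accept one connection, note its source."""
    conn, peer = listener.accept()
    result["peer"] = peer      # (address, port) as seen outside the NAT
    result["conn"] = conn      # kept open so the NAT mapping stays alive


def open_signalling_connection(usl_addr):
    """Client side: connect via TCP and ask the OS to send keepalive probes."""
    sock = socket.create_connection(usl_addr, timeout=5)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # any free port on loopback
    listener.listen(1)
    result = {}
    worker = threading.Thread(target=observe_peer, args=(listener, result))
    worker.start()
    client = open_signalling_connection(listener.getsockname())
    worker.join(timeout=5)
    local, observed = client.getsockname(), result["peer"]
    # In a real deployment the connection would stay open for the session.
    client.close()
    result["conn"].close()
    listener.close()
    return local, observed


local, observed = demo()
print("client socket:", local, "seen by listener:", observed)
```

Behind a real NAT-GW the two values would differ, and the observed pair is the one that would be published to the USL directory.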
3.1. Overview
The digital audio-visual conferencing system daViCo [11] is serverless multipoint
videoconferencing software (see fig. 4). It has been designed on a peer-to-peer
model as a lightweight Internet conferencing tool aimed at effortless use. Guided by
the latter principle, daViCo refrained from implementing H.323 client requirements.
The system is built instead upon a fast, highly efficient video codec, based on a
wavelet algorithm. Exploiting specific properties of the coding scheme, the software
permits scaling in bandwidths from 64 to 4000 kbit/s. Audio data is compressed using
an MP3 algorithm with latencies below 120 ms depending on buffer size. Audio and
video streams can be transmitted as unicast as well as multicast. An
application-sharing facility is included for collaboration and teleteaching.
Figure 4: The daViCo Conferencing Tool
Due to low bandwidth requirements, daViCo is well suited to long distance video-
conferences on a best effort basis. To strengthen its global usability, the user location
scheme described above has become part of the software.
which decorrelates the signal, a quantizer and a lossless entropy coder which
compacts the data produced by the quantizer (see fig. 5). The transformation we
use is wavelet-type, transforming the image as a whole. Thus, no blocking artefacts
occur. Filtering is done in a low-complexity implementation with a 5/3 tap
convolution, subsampling on three levels. As quantizer, we chose a simple uniform
scalar quantizer with an enlarged dead zone. The third module is a highly efficient,
fast entropy codec scheme consisting of a precoder (PC) and a set of Golomb-Rice
codecs. To reduce the temporal redundancies in a video sequence, we use DPCM
coding, i.e., only the difference from one frame to the next is coded.
Figure 5: Transform coding
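The interplay of the dead-zone quantizer and frame-to-frame DPCM can be illustrated by a small sketch. It is not the daViCo implementation: it assumes a zero bin of twice the step size, mid-point reconstruction, and differencing applied directly to lists of sample values, none of which is specified in the text. The function names are chosen for the example.

```python
import math


def deadzone_quantize(x, step):
    """Uniform scalar quantizer whose zero bin is 2 * step wide (assumed)."""
    return int(math.copysign(math.floor(abs(x) / step), x))


def deadzone_dequantize(q, step):
    """Reconstruct at the centre of the bin; the zero bin maps to 0."""
    if q == 0:
        return 0.0
    return math.copysign((abs(q) + 0.5) * step, q)


def dpcm_encode(frames, step):
    """Code each frame as the quantized difference from the previous
    reconstructed frame, so encoder and decoder stay in step."""
    reference = [0.0] * len(frames[0])
    residuals = []
    for frame in frames:
        residual = [deadzone_quantize(c - r, step)
                    for c, r in zip(frame, reference)]
        residuals.append(residual)
        reference = [r + deadzone_dequantize(q, step)
                     for r, q in zip(reference, residual)]
    return residuals


def dpcm_decode(residuals, step):
    """Rebuild the frames by accumulating the dequantized differences."""
    reference = [0.0] * len(residuals[0])
    frames = []
    for residual in residuals:
        reference = [r + deadzone_dequantize(q, step)
                     for r, q in zip(reference, residual)]
        frames.append(reference)
    return frames


frames = [[10.0, 20.0], [10.0, 23.0]]
residuals = dpcm_encode(frames, step=2.0)
decoded = dpcm_decode(residuals, step=2.0)
print(residuals, decoded)
```

In the second frame the unchanged sample falls into the dead zone and is coded as zero, which is what makes the residual cheap to entropy-code.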
For encoding the quantized wavelet coefficients, we follow the conceptual ideas
presented in [12]. For more details, the readers are referred to [12], [13].
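As a brief illustration of the Golomb-Rice stage only, the sketch below codes signed integers with a Rice code of parameter k. The precoder of [12] is not reproduced, and the mapping of signed values to non-negative integers (a zigzag mapping) as well as the bit-string representation are assumptions made for readability.

```python
def zigzag(n):
    """Map signed to non-negative integers: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(u, k):
    """Unary-coded quotient (ones closed by a zero), then k remainder bits."""
    quotient = u >> k
    remainder = u & ((1 << k) - 1)
    tail = format(remainder, "0{}b".format(k)) if k else ""
    return "1" * quotient + "0" + tail


def rice_decode(bits, k, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    quotient = 0
    while bits[pos] == "1":
        quotient += 1
        pos += 1
    pos += 1  # skip the terminating zero of the unary part
    remainder = int(bits[pos:pos + k], 2) if k else 0
    return (quotient << k) | remainder, pos + k


values = [0, -1, 3, -5, 2]
stream = "".join(rice_encode(zigzag(v), 2) for v in values)

decoded, pos = [], 0
while pos < len(stream):
    u, pos = rice_decode(stream, 2, pos)
    decoded.append(unzigzag(u))
print(stream, decoded)
```

Small magnitudes, which dominate after quantization, receive the shortest code words; a set of such coders with different k allows the code to be matched to the local statistics.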
Results
In native implementations, the video codec simultaneously encodes and decodes 25 CIF
frames (352 x 288 pixels) per second on a 500 MHz Pentium machine. Alternatively, 5
frames per second in PAL (720 x 576) resolution may be processed, where the frame
rate is expected to increase with forthcoming algorithmic improvements. The image
quality is comparable with or better than that of MPEG-4 / H.263 coders. At moderate
motion complexity, this frame rate produces a bit rate of approximately 200 kbit/s
while sustaining very good visual quality.
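To put the quoted bit rate into perspective, the following back-of-the-envelope calculation estimates the implied compression ratio. It assumes that the figure refers to CIF at 25 frames per second and that the uncompressed source uses 12 bits per pixel (8-bit samples with 4:2:0 chroma subsampling); neither assumption is stated in the text.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bit rate in bit/s (12 bit/pixel assumes 4:2:0)."""
    return width * height * bits_per_pixel * fps


def compression_ratio(width, height, fps, coded_bitrate):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return raw_bitrate(width, height, fps) / coded_bitrate


raw = raw_bitrate(352, 288, 25)                    # CIF at 25 frames/s
ratio = compression_ratio(352, 288, 25, 200_000)   # coded at 200 kbit/s
print(raw, round(ratio))
```

Under these assumptions the raw stream amounts to roughly 30.4 Mbit/s, so 200 kbit/s corresponds to a compression ratio of about 150:1.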
The codec has also been ported to Java as part of a Web streaming system [14].
The Java codec running in an applet still decodes or encodes 5 CIF frames per
second in real time or, more appropriately, 25 frames per second in QCIF format.
therein. The application is currently being ported to IPv6.
Acknowledgement:
We would like to thank Stefan Zech for his cheerful collaboration: His tricks pushed
some Windows into networking.
References:
[1] ITU-T Recommendation H.323: Infrastructure of audio-visual services – Systems and terminal
equipment for audio-visual services: Packet-based multimedia communications systems. Draft Version
4, 2000.
[2] E. Verharen: Development of a European Videoconferencing Service. Proceedings of TERENA
2001 Networking Conference, http://www.terena.nl/conf/tnc2001/proceedings, 2001.
[3] Video Development Initiative, homepage http://www.vide.net, 2002.
[4] Mbone tool download ftp://ftp.ee.lbl.gov/conferencing/, 1996.
[5] The VCON homepage: http://www.vcon.com, 2002.
[6] The IVISIT homepage: http://www.ivisit.com, 2002.
[7] NetMeeting Resource Kit Contents, Chapter 3, Finding People
http://www.microsoft.com/Windows/NetMeeting/Corp/ResKit/Chapter3/default.asp, 2002.
[8] M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg: SIP: Session Initiation Protocol. RFC2543,
March 1999.
[9] A. Gulbrandsen ; P. Vixie ; L. Esibov: A DNS RR for specifying the location of services (DNS SRV).
RFC2782, February 2000.
[10] R. Rosenbaum: Using the Domain Name System To Store Arbitrary String Attributes. RFC1464,
May 1993.
[11] The daViCo homepage: http://www.daViCo-gmbh.de, 2002.
[12] D. Marpe and H. L. Cycon: Efficient Pre-Coding Techniques for Wavelet-Based Image
Compression, 1997, Proc. PCS ’97, pp. 45–50.
[13] D. Marpe and H. L. Cycon: Very Low Bit-Rate Video Coding Using Wavelet-Based Techniques,
IEEE Trans. on Circ. and Sys. for Video Techn., 1999, 9 (1), pp. 85–94.
[14] B. Feustel, T.C. Schmidt: Media Objects in Time --- A Multimedia Streaming System. Proc. of the
TERENA Networking Conference ’01. Computer Networks, 37/6, November 2001, pp 727--735,
Amsterdam 2001.
[15] A. Sears: A Scalable Directory Schema in LDAP for Integrated Conferencing Services. Proc. of
Inet97. http://www.isoc.org/inet97/proceedings, 1997.
[16] M. Shore et al.: The Middlebox Communication (midcom) Group.
http://www.ietf.org/html.charters/midcom-charter.html, 11-Mar-2002.
[17] M. Hamilton, R. Wright: Use of DNS Aliases for Network Services, RFC2219, October 1997.