Because learning changes everything.
Chapter 11
Multimedia
Data Communications and
Networking with TCP/IP
Protocol Suite
Sixth Edition
Behrouz A. Forouzan
© 2022 McGraw Hill, LLC. All rights reserved. Authorized only for instructor use in the classroom.
No reproduction or further distribution permitted without the prior written consent of McGraw Hill, LLC.
Chapter 11: Outline
11.1 Compression
11.2 Multimedia Data
11.3 Multimedia in the Internet
11.4 Real-Time Interactive Protocols
11-1 COMPRESSION
In this section, we discuss compression, which plays
a crucial role in multimedia communication due to
the large volume of data exchanged. In compression,
we reduce the volume of data to be exchanged. We
can divide compression into two broad categories:
lossless and lossy compression.
11.1.1 Lossless Compression
In lossless compression, the integrity of the data is preserved
because the compression and decompression algorithms are exact
inverses of each other: no part of the data is lost in the process.
Lossless compression methods are normally used when we cannot
afford to lose any data. For example, we must not lose data when
we compress a text file or an application program. Lossless
compression is also applied as the last step in some lossy
compression procedures to further reduce the size of the data.
Run-Length Coding
Run-length coding, sometimes referred to as run-length encoding
(RLE), is the simplest method of removing redundancy. It can be
used to compress data made of any combination of symbols. The
method replaces a repeated sequence, run, of the same symbol with
two entities: a count and the symbol itself.
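The replacement of a run by a count and the symbol can be sketched in a few lines of Python. The function names below are our own, not from the text; the encoder produces (count, symbol) pairs and the decoder expands them back.

```python
def rle_encode(data):
    """Replace each run of a repeated symbol with a (count, symbol) pair."""
    pairs = []
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i]:
            run += 1
        pairs.append((run, data[i]))
        i += run
    return pairs

def rle_decode(pairs):
    """Expand each (count, symbol) pair back into a run."""
    return "".join(symbol * count for count, symbol in pairs)
```

For instance, rle_encode("AAABBC") yields [(3, 'A'), (2, 'B'), (1, 'C')]; decoding those pairs recovers the original string.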
Figure 11.1 A version of run-length coding to compress binary
patterns
Dictionary Coding
There is a group of compression methods based on creation of a
dictionary (array) of strings in the text. The idea is to encode
common sequences of characters instead of encoding each
character separately. The dictionary is created as the message is
scanned, and if a sequence of characters that is an entry in the
dictionary is found in the message, the code (index) of that entry is
sent instead of the sequence. The one we discuss here was invented
by Lempel and Ziv and refined by Welch. It is referred to as
Lempel-Ziv-Welch (LZW).
Table 11.1 LZW encoding
LZWEncoding (message)
{
Initialize (Dictionary)
char = Input (first character);
S = char // S is the encodable sequence
while (more characters in message)
{
char = Input (next character);
if ((S + char) is in Dictionary) // S is not the encodable sequence
{
S = S + char;
}
else // S is the encodable sequence
{
addToDictionary (S + char);
Output (index of S in Dictionary);
S = char;
}
}
Output (index of S in Dictionary);
}
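Table 11.1's pseudocode translates almost line for line into Python. The sketch below assumes, for simplicity, the two-symbol alphabet {A, B} used in the chapter's examples; a real implementation would initialize the dictionary with the full character set.

```python
def lzw_encode(message, alphabet="AB"):
    # Initialize the dictionary with every single-symbol string.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    s = ""                                  # S, the encodable sequence
    output = []
    for char in message:
        if s + char in dictionary:
            s = s + char                    # S + char is still in the dictionary
        else:
            output.append(dictionary[s])    # emit the index of S
            dictionary[s + char] = len(dictionary)  # add S + char as a new entry
            s = char
    if s:
        output.append(dictionary[s])        # emit the index of the final S
    return output
```

For example, lzw_encode("ABABABA") returns [0, 1, 2, 4]: the last code, 4, stands for the string "ABA" that was added to the dictionary while scanning.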
Least Cost Trees
If there are N routers in an internet, there are (N - 1) least-cost
paths from each router to any other router. This means we need N *
(N - 1) least-cost paths for the whole internet. If we have only 10
routers in an internet, we need 90 least-cost paths. A better way to
see all of these paths is to combine them in a least-cost tree. A
least-cost tree is a tree with the source router as the root that spans
the whole graph (visits all other nodes) and in which the path
between the root and any other node is the shortest. Figure 8.2
shows the seven least-cost trees for the internet in Figure 8.1.
© McGraw Hill, LLC 9
Figure 11.2 Example 11.1
Example 11.2
Let us show how the code in Example 11.1 can be decoded and the
original message recovered (Figure 11.3). The box called PreC
holds the codeword from the previous iteration, which is not
needed in the pseudocode, but needed here to better show the
process. Note that in this example there is only the special case in
which the codeword is not in the dictionary. The new entry for the
dictionary needs to be made from the string and the first character
in the string. The output is also the same as the new entry.
Figure 11.3 Example 11.2
Table 11.2 LZW decoding
LZWDecoding (code)
{
Initialize (Dictionary);
C = Input (first codeword);
Output (Dictionary [C]);
while (more codewords in code)
{
S = Dictionary[C];
C = Input (next codeword);
if (C is in Dictionary) // Normal case
{
addToDictionary (S + firstSymbolOf (Dictionary[C]));
Output (Dictionary [C]);
}
else // Special case
{
addToDictionary (S + firstSymbolOf (S));
Output (S + firstSymbolOf (S));
}
}
}
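A Python rendering of Table 11.2, with the same {A, B} initialization assumed for the encoder sketch; the else branch handles the special case in which a codeword arrives before its dictionary entry has been created.

```python
def lzw_decode(codes, alphabet="AB"):
    # Inverse dictionary: index -> string, seeded with the single symbols.
    dictionary = {i: ch for i, ch in enumerate(alphabet)}
    out = [dictionary[codes[0]]]
    c = codes[0]
    for nxt in codes[1:]:
        s = dictionary[c]
        if nxt in dictionary:                           # normal case
            dictionary[len(dictionary)] = s + dictionary[nxt][0]
            out.append(dictionary[nxt])
        else:                                           # special case: codeword not yet in dictionary
            entry = s + s[0]                            # string plus its own first character
            dictionary[len(dictionary)] = entry
            out.append(entry)
        c = nxt
    return "".join(out)
```

Decoding [0, 1, 2, 4] recovers "ABABABA"; code 4 exercises the special case, since entry 4 does not exist until the decoder builds it from the previous string.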
Huffman Coding
When we encode data as binary patterns, we normally use a fixed
number of bits for each symbol. To compress data, we can consider
the frequency of symbols and the probability of their occurrence in
the message. Huffman coding assigns shorter codes to symbols that
occur more frequently and longer codes to those that occur less
frequently.
Figure 11.4 Huffman tree
Table 11.3 Coding table
Symbol Code Symbol Code Symbol Code
A 00 C 011 E 11
B 010 D 10
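Given Table 11.3, encoding is a table lookup, and decoding exploits the prefix property of Huffman codes (no code is a prefix of another). A minimal sketch with the table hard-coded:

```python
# Table 11.3's code, written as a Python dict (the table itself, not a new design).
codes = {"A": "00", "B": "010", "C": "011", "D": "10", "E": "11"}

def huffman_encode(message):
    """Concatenate the code of each symbol."""
    return "".join(codes[s] for s in message)

def huffman_decode(bits):
    """Scan bit by bit; the prefix property guarantees the first match is a symbol."""
    inverse = {v: k for k, v in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

Encoding "BADE" gives "010001011"; scanning that string left to right splits it unambiguously back into 010, 00, 10, 11.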
Figure 11.5 Encoding and decoding in Huffman coding
Arithmetic Coding
In the previous compression methods, each symbol or sequence of
symbols is encoded separately. In arithmetic coding, introduced by
Rissanen and Langdon in 1981, the entire message is mapped to a
small interval inside [0,1). The small interval is then encoded as a
binary pattern. Arithmetic coding is based on the fact that we can
have an infinite number of small intervals inside the half-open
interval [0,1). Each of these small intervals can represent one of
the possible messages we can make using a finite set of symbols.
Figure 11.6 Arithmetic coding
Table 11.4 Arithmetic encoding
ArithmeticEncoding (message)
{
currentInterval = [0,1);
while (more symbols in the message)
{
s = Input (next symbol);
divide currentInterval into subintervals
subInt = subinterval related to s
currentInterval = subInt
}
Output (bits related to the currentInterval)
}
Example 11.3
For the sake of simplicity, let us assume that our set of symbols is S
= {A, B, ∗}, in which the asterisk is the terminating symbol. We
assign probability of occurrence for each symbol as
PA = 0.4, PB = 0.5, P∗ = 0.1
Figure 11.7 shows how we find the interval and the code related to
the short message "BBAB*".
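The interval computation of Table 11.4 can be sketched directly for this example. We assume the subintervals are ordered A, B, ∗ inside each current interval, as in Figure 11.7.

```python
# Fixed subintervals for S = {A, B, *}: P(A) = 0.4, P(B) = 0.5, P(*) = 0.1,
# laid out in the order A, B, * (our assumed ordering).
ranges = {"A": (0.0, 0.4), "B": (0.4, 0.9), "*": (0.9, 1.0)}

def arithmetic_interval(message):
    """Narrow [0, 1) to the subinterval representing the whole message."""
    low, high = 0.0, 1.0
    for s in message:
        width = high - low
        s_low, s_high = ranges[s]
        low, high = low + s_low * width, low + s_high * width
    return low, high
```

For the message "BBAB*" this narrows [0, 1) down to [0.685, 0.69); any number inside that interval, written as a binary fraction, can serve as the code.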
Figure 11.7 Example 11.3
Table 11.5 Arithmetic decoding
ArithmeticDecoding (code)
{
c = Input (code)
num = find real number related to code
currentInterval = [0,1);
while (true)
{
divide the currentInterval into subintervals;
subInt = subinterval related to num;
Output (symbol related to subInt);
if (symbol is the terminating symbol) return;
currentInterval = subInt;
}
}
Example 11.4
Figure 11.8 shows how we use the decoding process to decode the
message in Example 11.3. Note that the hand shows the position of
the number in the corresponding interval.
Figure 11.8 Example 11.4
11.1.2 Lossy Compression
Lossless compression has limits on the amount of compression.
However, in some situations, we can sacrifice some accuracy to
increase the compression rate. Although we cannot afford to lose
information in text compression, we can afford it when we are
compressing images, video, and audio. For example, human vision
cannot detect some small distortions that can result from lossy
compression of an image. In this section, we discuss a few ideas
behind lossy compression.
Predictive Coding
Predictive coding is used when we digitize an analog signal. We
discussed pulse code modulation (PCM) as a technique that
converts an analog signal to a digital signal, using sampling. After
sampling, each sample needs to be quantized to create binary
values. Compression can be achieved in the quantization step by
using predictive coding.
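Delta modulation, the simplest predictive coder (Figure 11.9), sends a single bit per sample: 1 if the sample lies above the reconstructed staircase, 0 otherwise. The sketch below uses a step size of 1.0, an arbitrary assumption of ours, not a value from the text.

```python
def dm_encode(samples, delta=1.0):
    """Emit 1 if the sample is above the staircase, 0 otherwise."""
    y, bits = 0.0, []
    for x in samples:
        bit = 1 if x > y else 0
        y += delta if bit else -delta   # the staircase chases the signal
        bits.append(bit)
    return bits

def dm_decode(bits, delta=1.0):
    """Rebuild the staircase approximation from the bit stream."""
    y, out = 0.0, []
    for bit in bits:
        y += delta if bit else -delta
        out.append(y)
    return out
```

Encoding [1.5, 2.5, 2.0, 1.0] yields [1, 1, 0, 0], and the decoder reconstructs the staircase [1.0, 2.0, 1.0, 0.0]: an approximation, which is exactly where the loss in this lossy method comes from (and where slope overload and granular noise arise).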
Figure 11.9 Encoding and decoding in delta modulation
Figure 11.10 Reconstruction of quantization of xn − xn−1 versus xn − yn−1
Figure 11.11 Slope overload and granular noise
Transform Coding
In transform coding, a mathematical transformation is applied to
the input signal to produce the output signal. The transformation
needs to be invertible, to allow the original signal to be recovered.
The transformation changes the signal representation from one
domain to another (time domain to frequency domain, for
example), which results in reducing the number of bits in encoding.
Figure 11.12 One-dimensional DCT
Figure 11.13 Formulas for one-dimensional forward and inverse
transformation
Example 11.5
Figure 11.14 shows the transformation matrix for N = 4. As the
figure shows, the first row has four equal values, but the other rows
have alternate positive and negative values. When each row is
multiplied by the source data matrix, we expect that the positive
and negative values result in values close to zero if the source data
items are close to each other. This is what we expect from the
transformation: to show that only some values in the source data
are important and most values are redundant.
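We can verify this claim numerically. The sketch below builds the N = 4 orthonormal DCT matrix (first row constant, the other rows alternating in sign) and applies it to source values that are close to one another; every coefficient except the first comes out near zero.

```python
import math

N = 4

def c(i):
    # Normalization: sqrt(1/N) for the first row, sqrt(2/N) for the rest.
    return math.sqrt(1.0 / N) if i == 0 else math.sqrt(2.0 / N)

# Orthonormal DCT-II matrix: T[i][j] = c(i) * cos((2j + 1) * i * pi / (2N)).
T = [[c(i) * math.cos((2 * j + 1) * i * math.pi / (2 * N)) for j in range(N)]
     for i in range(N)]

def dct(p):
    """One-dimensional forward transform: the matrix times the source vector."""
    return [sum(T[i][j] * p[j] for j in range(N)) for i in range(N)]

coeffs = dct([20, 21, 21, 20])   # neighboring values close to each other
```

Here coeffs is approximately [41, 0, −1, 0]: almost all the energy concentrates in the first (DC) coefficient, which is what makes the later quantization step so effective.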
Figure 11.14 Example 11.5
Figure 11.15 Two-dimensional DCT
Figure 11.16 Formulas for forward and inverse two-dimensional
DCT
11-2 MULTIMEDIA DATA
Today, multimedia data consists of text, images,
video, and audio, although the definition is changing
to include futuristic media types.
11.2.1 Text
The Internet stores a large amount of text that can be downloaded
and used. One often refers to plaintext, as a linear form, and
hypertext, as a nonlinear form, of textual data. Text stored in the
Internet uses a character set, such as Unicode, to represent
symbols in the underlying language. To store a large amount of
textual data, the text can be compressed using one of the lossless
compression methods we discussed earlier.
11.2.2 Image
In multimedia parlance, an image (or a still image as it is often
called) is the representation of a photograph, a fax page, or a
frame in a moving picture.
Digital Image
To use an image, it first must be digitized. Digitization in this case
means to represent an image as a two-dimensional array of dots,
called pixels. Each pixel then can be represented as a number of
bits, referred to as the bit depth. In a black-and-white image, the
bit depth = 1. In a gray picture, one normally uses a bit depth of 8
with 256 levels. In a color image, the image is normally divided
into three channels, with each channel representing one of the three
primary colors of red, green, or blue (RGB).
Example 11.6
The following shows the time required to transmit an image of
1280 × 720 pixels using a transmission rate of 100 kbps.
a. Using a black-and-white image with a bit depth of 1,
Transmission time = (1280 × 720 × 1) / 100,000 ≈ 9 seconds
b. Using a gray image with a bit depth of 8,
Transmission time = (1280 × 720 × 8) / 100,000 ≈ 74 seconds
c. Using a color image with a bit depth of 24,
Transmission time = (1280 × 720 × 24) / 100,000 ≈ 221 seconds
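The arithmetic behind these times can be checked with a few lines of Python: the exact quotients are 9.2, 73.7, and 221.2 seconds before rounding.

```python
pixels = 1280 * 720       # 921,600 pixels per image
rate = 100_000            # transmission rate: 100 kbps

# Seconds needed at each bit depth, before any compression.
times = {depth: pixels * depth / rate for depth in (1, 8, 24)}
```

The 24-bit case makes the point of the section: even a single uncompressed color image takes minutes at this rate, which is why compression is essential.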
Image Compression: JPEG
Although there are both lossless and lossy compression algorithms
for images, in this section we discuss the lossy compression method
called JPEG. The Joint Photographic Experts Group (JPEG)
standard provides lossy compression that is used in most
implementations. The JPEG standard can be used for both color
and gray images. However, for simplicity, we discuss only the
grayscale pictures; the method can be applied to each of the three
channels in a color image.
Figure 11.17 Compression in each channel of JPEG
Figure 11.18 Three different quantization matrices
Figure 11.19 Reading the table
Example 11.7
To show the idea of JPEG compression, we use a block of a gray
image in which the value of each pixel is 20 (a uniform gray scale).
We have used a Java program to transform, quantize, and reorder
the values in zigzag sequence; the encoding is shown in Figure 11.20.
Figure 11.20 Example 11.7: uniform gray scale
Example 11.8
As the second example, we have a block that changes gradually;
there is no sharp change between the values of neighboring pixels.
We still get a lot of zero values, as shown in Figure 11.21.
Figure 11.21 Example 11.8: Gradient gray scale
Image Compression: GIF
The JPEG standard uses images in which each pixel is represented
as 24 bits (8 bits for each primary color). This means that each
pixel can be one of 2^24 (16,777,216) colors. For example, a
magenta pixel, which is made of red and blue components (but
contains no green component), is represented as the integer
(FF00FF)16. The Graphics Interchange Format (GIF), in contrast,
reduces the number of colors to 256 by building a palette (an index
table) of the colors actually used in the image; each pixel is then
represented by an 8-bit index into the palette, and the indices are
compressed losslessly with LZW.
11.2.3 Video
Video is composed of multiple frames; each frame is one image.
This means that a video file requires a high transmission rate.
Digitizing Video
A video consists of a sequence of frames. If the frames are
displayed on the screen fast enough, we get an impression of
motion. The reason is that our eyes cannot distinguish the rapidly
flashing frames as individual ones. There is no single standard
number of frames per second; in North America, about 30 frames
per second is common. However, to avoid a condition known as
flickering, a frame needs to be refreshed. The TV industry repaints
each frame twice. This means 60 frames need to be sent each
second, or, if there is memory at the receiver site, 30 frames with
each frame repainted from memory.
Example 11.9
Let us show the transmission rate for some video standards:
a. Color broadcast television takes 720 * 480 pixels per frame,
30 frames per second, and 24 bits per color. The transmission
rate without compression is as shown below.
720 * 480 * 30 * 24 = 248,832,000 bps = 249 Mbps
b. High definition color broadcast television takes 1920 * 1080
pixels per frame, 30 frames per second, and 24 bits per color:
The transmission rate without compression is as shown below.
1920 * 1080 * 30 * 24 = 1,492,992,000 bps = 1.5 Gbps
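These rates follow directly from the product pixels × frames × bits; a quick check:

```python
def video_rate(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * fps * bits_per_pixel

sd = video_rate(720, 480, 30, 24)      # 248,832,000 bps, about 249 Mbps
hd = video_rate(1920, 1080, 30, 24)    # 1,492,992,000 bps, about 1.5 Gbps
```

Both figures dwarf typical access-link capacities, which is the motivation for MPEG compression in the next subsection.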
Video Compression: MPEG
Motion Picture Experts Group (MPEG) is a method to compress
video. In principle, a motion picture is a rapid flow of a set of
frames, where each frame is an image. In other words, a frame is a
spatial combination of pixels, and a video is a temporal
combination of frames that are sent one after another. Compressing
video, then, means spatially compressing each frame and
temporally compressing a set of frames.
Figure 11.22 MPEG frames
11.2.4 Audio
Audio (sound) signals are analog signals that need a medium to
travel; they cannot travel through a vacuum. The speed of sound in
air is about 330 m/s (740 mph). The audible frequency range for
normal human hearing is from about 20 Hz to 20 kHz, with
maximum audibility around 3300 Hz.
Digitizing Audio
To be able to provide compression, audio analog signals are
digitized using an analog-to-digital converter. The analog-to-
digital conversion consists of two processes: sampling and
quantizing. A digitizing process known as pulse code modulation
(PCM) was discussed before. This process involved sampling an
analog signal, quantizing the sample, and coding the quantized
values as streams of bits. Voice signal is sampled at the rate of
8,000 samples per second with 8 bits per sample; the result is a
digital signal of 8,000 * 8 = 64 kbps. Music is sampled at 44,100
samples per second with 16 bits per sample; the result is a digital
signal of 44,100 * 16 = 705.6 kbps for monaural and 1.411 Mbps
for stereo.
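The PCM rates quoted above are simple products of sampling rate, sample size, and channel count:

```python
def pcm_rate(samples_per_second, bits_per_sample, channels=1):
    """Digital bit rate produced by PCM, in bits per second."""
    return samples_per_second * bits_per_sample * channels

voice = pcm_rate(8_000, 8)          # 64,000 bps = 64 kbps
mono = pcm_rate(44_100, 16)         # 705,600 bps = 705.6 kbps
stereo = pcm_rate(44_100, 16, 2)    # 1,411,200 bps, about 1.411 Mbps
```

The stereo music rate is more than 20 times the voice rate, which is why audio compression matters so much more for music distribution than for telephony.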
Audio Compression
Both lossy and lossless compression algorithms are used in audio
compression. Lossless audio compression allows one to preserve
an exact copy of the audio files; it has a small compression ratio of
about 2 and is mostly used for archival and editing purposes. Lossy
algorithms provide far greater compression ratios (5 to 20) and are
used in mainstream consumer audio devices. Lossy algorithms
sacrifice a little bit of quality, but substantially reduce space and
bandwidth requirements. For example, on a CD, one can fit one
hour of high fidelity music, 2 hours of music using lossless
compression, or 8 hours of music compressed with a lossy
technique.
Figure 11.23 Threshold of audibility
11-3 MULTIMEDIA IN THE INTERNET
We can divide audio and video services into three
broad categories: streaming stored audio/video,
streaming live audio/video, and interactive
audio/video. Streaming means a user can listen (or
watch) the file after the downloading has started.
11.3.1 Streaming Stored Audio/Video
In the first category, streaming stored audio/video, the files are
compressed and stored on a server. A client downloads the files
through the Internet. This is sometimes referred to as on-demand
audio/video. Examples of stored audio files are songs, symphonies,
books on tape, and famous lectures. Examples of stored video files
are movies, TV shows, and music video clips. We can say that
streaming stored audio/video refers to on-demand requests for
compressed audio/video files.
First Approach: Using a Web Server
A compressed audio/video file can be downloaded as a text file.
The client (browser) can use the services of HTTP and send a GET
message to download the file. The Web server can send the
compressed file to the browser. The browser can then use a help
application, normally called a media player, to play the file. Figure
11.24 shows this approach.
Figure 11.24 Using a Web server
Second Approach: Using a Web Server with Meta File
In another approach, the media player is directly connected to the
Web server for downloading the audio/video file. The Web server
stores two files: the actual audio/video file and a metafile that
holds information about the audio/video file. Figure 11.25 shows
the steps in this approach.
Figure 11.25 Using a Web server with a metafile
Third Approach: Using a Media Server
The problem with the second approach is that the browser and the
media player both use the services of HTTP. HTTP is designed to
run over TCP. This is appropriate for retrieving the metafile, but
not for retrieving the audio/video file. The reason is that TCP
retransmits a lost or damaged segment, which is counter to the
philosophy of streaming. We need to dismiss TCP and its error
control; we need to use UDP. However, HTTP, which accesses the
Web server, and the Web server itself are designed for TCP; we
need another server, a media server. Figure 11.26 shows the concept.
Figure 11.26 Using a media server
Fourth Approach: Using a Web Server with RTSP
The Real-Time Streaming Protocol (RTSP) is a control protocol
designed to add more functionalities to the streaming process.
Using RTSP, we can control the playing of audio/video. RTSP is an
out-of-band control protocol that is similar to the second
connection in FTP. Figure 11.27 shows a media server and RTSP.
Figure 11.27 Using a media server and RTSP
Example: Video on Demand
Video On Demand (VOD) allows viewers to select a video from a
large number of available videos and watch it interactively: pause,
rewind, fast forward, etc. A viewer may watch the video in real time
or she may download the video into her computer, portable media
player, or to a device such as a digital video recorder (DVR) and
watch it later. Cable TV, satellite TV, and IPTV providers offer both
pay-per-view and free content VOD streaming. Many other
companies, such as Amazon video and video rental companies such
as Blockbuster video, also provide VOD. Internet television is an
increasingly popular form of video on demand.
11.3.2 Streaming Live Audio/Video
In the second category, streaming live audio/video, a user listens to
broadcast audio and video through the Internet. Good examples of
this type of application are Internet radio and Internet TV.
Example: Internet Radio
Internet radio or web radio is a webcast of audio broadcasting
service that offers news, sports, talk, and music via the Internet. It
involves a streaming medium that is accessible from anywhere in
the world. Web radio is offered via the Internet but is similar to
traditional broadcast media: it is noninteractive and cannot be
paused or replayed like on-demand services.
Example: Internet Television
Internet television or ITV allows viewers to choose the show they
want to watch from a library of shows. The primary models for
Internet television are streaming Internet TV or selectable video on
an Internet location.
Example: IPTV
Internet protocol television (IPTV) is the next-generation
technology for delivering real time and interactive television.
Instead of the TV signal being transmitted via satellite, cable, or
terrestrial routes, the IPTV signal is transmitted over the Internet.
Note that IPTV differs from ITV. Internet TV is created and
managed by service providers that cannot control the final
delivery; it is distributed via the existing infrastructure of the open
Internet. IPTV, on the other hand, is highly managed to provide
guaranteed quality of service over a complex and expensive
network. The network for IPTV is engineered to ensure efficient
delivery of large amounts of multicast video traffic and HDTV
content to subscribers.
11.3.3 Real-Time Audio/Video
In the third category, interactive audio/video, people use the
Internet to interactively communicate with one another. The
Internet phone or voice over IP is an example of this type of
application. Video conferencing is another example that allows
people to communicate visually and orally.
Characteristics
Before discussing the protocols used in this class of applications,
we discuss some characteristics of real-time audio/video
communication.
Figure 11.28 Time relationship
Figure 11.29 Jitter
Figure 11.30 Timestamp
Figure 11.31 Playback buffer
Figure 11.32 The time line of packets
Example of a Real Time Application: Skype
Skype (abbreviation of the original project Sky peer-to-peer) is a
peer-to-peer VoIP application software that was originally
developed by Ahti Heinla, Priit Kasesalu, and Jaan Tallinn, who
had also originally developed Kazaa (a P2P file-sharing
application software). The application allows registered users who
have audio input and output devices on their PCs to make free PC-
to-PC voice calls to other registered users over the Internet.
11-4 REAL-TIME INTERACTIVE PROTOCOLS
After discussing the three approaches to using
multimedia through the Internet, we now concentrate
on the last one, which is the most interesting: real-
time interactive multimedia. This application has
evoked a lot of attention in the Internet society and
several application-layer protocols have been
designed to handle it.
Figure 11.33 Schematic diagram of a real-time multimedia
system
11.4.1 Rationale for New Protocols
We discussed the protocol stack for general Internet applications in
Chapter 2. In this section, we want to show why we need some new
protocols to handle interactive real-time multimedia applications
such as audio and video conferencing.
Application Layer
It is clear that we need to develop some application-layer protocols
for interactive real-time multimedia because the nature of audio
conferencing and video conferencing is different from some
applications, such as file transfer and electronic mail, which we
discussed in Chapter 2. Several proprietary applications have been
developed by the private sector, and more and more applications
are appearing in the market every day. Some of these applications,
such as MPEG audio and MPEG video, use some standards
defined for audio and video data transfer. There is no specific
standard that is used by all applications, and there is no specific
application protocol that can be used by everyone.
Transport Layer
The lack of a single standard and the general features of
multimedia applications raise some questions about the transport-
layer protocol to be used for all multimedia applications. The two
common transport-layer protocols, UDP and TCP, were developed
at the time when no one even thought about the use of multimedia
in the Internet. Can we use UDP or TCP as a general transport-
layer protocol for real-time multimedia applications? To answer
this question, we first need to think about the requirements for this
type of multimedia application and then see if either UDP or TCP
can respond to these requirements.
Table 11.6 Capability of UDP or TCP to handle real-time data
Requirements UDP TCP
1. Sender-receiver negotiation for selecting the encoding type No No
2. Creation of packet stream No No
3. Source synchronization for mixing different sources No No
4. Error control No Yes
5. Congestion control No Yes
6. Jitter removal No No
7. Sender identification No No
11.4.2 RTP
Real-time Transport Protocol (RTP) is the protocol designed to
handle real-time traffic on the Internet. RTP does not have a
delivery mechanism (multicasting, port numbers, and so on); it
must be used with UDP. RTP stands between UDP and the
multimedia application. The literature and standards treat RTP as
the transport protocol (not a transport-layer protocol) that can be
thought of as located in the application layer
(see Figure 11.34).
Figure 11.34 RTP location in the TCP/IP protocol suite
RTP Packet Format
Before we discuss how RTP can help the multimedia applications,
let us discuss its packet format. We can then relate the functions of
the fields with the requirements we discussed in the previous
section. Figure 11.35 shows the format of the RTP packet header.
The format is very simple and general enough to cover all real-time
applications. An application that needs more information adds it to
the beginning of its payload.
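As a concrete illustration of the layout in Figure 11.35, the sketch below packs the fixed 12-byte RTP header (version 2, no padding or extension) followed by one 32-bit word per contributing source. The field widths follow RFC 3550; the sample field values are our own, not from the text.

```python
import struct

def rtp_header(payload_type, seq, timestamp, ssrc, csrcs=(), marker=0):
    """Pack an RTP header: V=2, P=0, X=0, CC, M, PT, then seq/timestamp/SSRC."""
    version, padding, extension = 2, 0, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | len(csrcs)
    byte1 = (marker << 7) | payload_type
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    for csrc in csrcs:                      # one 32-bit CSRC identifier each
        header += struct.pack("!I", csrc)
    return header

hdr = rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x1234)  # PCMμ audio
```

The first byte of the result is 0x80 (version 2, no padding, no extension, zero CSRCs), and the header grows by 4 bytes for every contributing source listed.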
Figure 11.35 RTP packet header format
Table 11.7 Payload types
Type Application Type Application Type Application
0 PCMμ audio 7 LPC audio 15 G728 audio
1 1016 audio 8 PCMA audio 26 Motion JPEG
2 G721 audio 9 G722 audio 31 H.261
3 GSM audio 10-11 L16 audio 32 MPEG1 video
5-6 DVI4 audio 14 MPEG audio 33 MPEG2 video
UDP Port
Although RTP is itself a transport-layer protocol, the RTP packet is
not encapsulated directly in an IP datagram. Instead, RTP is
treated like an application program and is encapsulated in a UDP
user datagram. However, unlike other application programs, no
well-known port is assigned to RTP. The port can be selected on
demand with only one restriction: The port number must be an
even number. The next number (an odd number) is used by the
companion of RTP, Real-time Transport Control Protocol (RTCP),
which we will discuss in the next section.
11.4.3 RTCP
RTP allows only one type of message, one that carries data from
the source to the destination. To really control the session, we need
more communication between the participants in a session. Control
communication in this case is assigned to a separate protocol
called Real-time Transport Control Protocol (RTCP).
RTCP Packet
After discussing the main functions and purpose of RTCP, let us
discuss its packets. Figure 11.36 shows five common packet types.
The number next to each box defines the numeric value of each
packet. We need to mention that more than one RTCP packet can
be packed as a single payload for UDP because the RTCP packets
are smaller than RTP packets.
Figure 11.36 RTCP packet types
UDP Port
RTCP, like RTP, does not use a well-known UDP port. It uses a
temporary port. The UDP port chosen must be the number
immediately following the UDP port selected for RTP, which makes
it an odd-numbered port.
Bandwidth Utilization
The RTCP packets are sent not only by the active senders but also
by the passive receivers, whose number is normally greater than
that of the active senders. This means that if RTCP traffic is not
controlled, it may get out of hand. To control the situation, RTCP
uses a mechanism to limit its traffic to a small portion (normally 5
percent) of the traffic used in the session (for both RTP and
RTCP). A larger fraction, x, of this small percentage is then
assigned to the RTCP packets generated by the passive receivers;
the remaining fraction, 1 − x, is assigned to the RTCP packets
generated by the active senders.
Example 11.10
Let us assume that the total bandwidth allocated for a session is 1
Mbps. RTCP traffic gets only 5 percent of this bandwidth, which is
50 kbps. If there are only 2 active senders and 8 passive receivers,
it is natural that each sender or receiver gets only 5 kbps. If the
average size of an RTCP packet is 5 kbits, then each sender or
receiver can send only 1 RTCP packet per second. Note that we
need to consider the packet size at the data-link layer.
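The numbers in this example can be reproduced with integer arithmetic:

```python
session_bw = 1_000_000                 # 1 Mbps allocated to the session
rtcp_bw = session_bw * 5 // 100        # 5 percent -> 50,000 bps for RTCP
participants = 2 + 8                   # active senders + passive receivers
per_participant = rtcp_bw // participants        # 5,000 bps each
packet_bits = 5_000                    # average RTCP packet size, in bits
packets_per_second = per_participant // packet_bits   # 1 packet per second
```

This illustrates why RTCP report intervals are computed dynamically: with more participants, each one must report less often to stay inside the 5 percent budget.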
Requirement Fulfillment
As we promised, let us see how the combination of RTP and RTCP
can respond to the requirements of an interactive real-time
multimedia application. A digital audio or video stream, a
sequence of bits, is divided into chunks. Each chunk has a
predefined boundary that distinguishes the chunk from the previous
chunk or the next one. A chunk is encapsulated in an RTP packet,
which defines a specific encoding (payload type), a sequence
number, a timestamp, a synchronization source (SSRC) identifier,
and one or more contributing source (CSRC) identifiers.
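The RTP fields listed above (payload type, sequence number, timestamp, SSRC) occupy the fixed 12-byte RTP header defined in RFC 3550. The sketch below packs such a header; the field values are hypothetical, and the CSRC list is omitted for brevity.

```python
import struct

def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the fixed 12-byte RTP header (RFC 3550), with no CSRC list.

    Byte 0: version=2, padding=0, extension=0, CSRC count=0 -> 0x80
    Byte 1: marker=0, payload type (7 bits)
    Then: 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    """
    first_byte = 2 << 6  # version 2 in the top two bits
    return struct.pack("!BBHII",
                       first_byte,
                       payload_type & 0x7F,
                       seq,
                       timestamp,
                       ssrc)

# Hypothetical chunk of PCM audio (payload type 0), 160 samples later
hdr = pack_rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x12345678)
print(len(hdr))  # 12 bytes
```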
11.4.4 Session Initiation Protocol (SIP)
We discussed how to use the Internet for audio-video conferencing.
Although RTP and RTCP can be used to provide these services, one
component is still missing: a signaling system required to call the
participants. The Session Initiation Protocol (SIP) is a protocol
devised by the IETF to be used in conjunction with RTP/RTCP.
Communicating Parties
One difference that we may notice between interactive real-time
multimedia applications and other applications is the nature of the
communicating parties. In an audio or video conference, the
communication is between humans, not devices. For example, in
HTTP or FTP, the client needs to find the IP address of the server
(using DNS) before communication; there is no need to find a
person before communicating. In SMTP, the sender of an e-mail
delivers the message to the receiver's mailbox on an SMTP server
without controlling when the message will be picked up. In an audio
or video conference, by contrast, the caller needs to find the callee.
Addresses
In a regular telephone communication, one telephone number
identifies the sender and another identifies the receiver. SIP is much
more flexible: an e-mail address, an IP address, a telephone number,
or another type of address can be used to identify the sender and
receiver. However, the address needs to be in SIP format (also
called a scheme). Figure 11.37 shows some common formats.
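A few SIP-style addresses in the general `sip:` scheme can be built as below. The exact entries of Figure 11.37 are not reproduced here; the users, hosts, and phone number are hypothetical examples of the address types the text lists.

```python
# Illustrative sketch: SIP addresses ("schemes") of the kinds the text
# mentions: e-mail-like, IP-address-based, and telephone-number-based.

def sip_uri(user: str, host: str) -> str:
    """Format a user and host as a sip: URI."""
    return f"sip:{user}@{host}"

print(sip_uri("bob", "example.com"))       # e-mail-style address
print(sip_uri("bob", "192.168.1.20"))      # IP-address form
print(sip_uri("13452345", "example.com"))  # telephone-number form
```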
Figure 11.37 SIP formats
Messages
SIP is a text-based protocol, like HTTP, and like HTTP it uses
messages. Messages in SIP are divided into two broad categories:
requests and responses. The format of both message categories is
shown below (note the similarity with HTTP messages).
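The HTTP-like text format can be illustrated with a minimal, hypothetical INVITE request: a start line, header lines, and a blank line before the (here empty) body. The header values below are invented for the example.

```python
# A minimal, hypothetical SIP INVITE showing the HTTP-like text layout:
# start line, headers, then a blank line terminating the header section.
invite = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP alice.example.com\r\n"
    "From: sip:alice@example.com\r\n"
    "To: sip:bob@example.com\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

start_line = invite.split("\r\n")[0]
print(start_line)  # INVITE sip:bob@example.com SIP/2.0
```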
First Scenario: Simple Session
In the first scenario, we assume that Alice needs to call Bob and
the communication uses the IP addresses of Alice and Bob as the
SIP addresses. We can divide the communication into three
modules: establishing, communicating, and terminating. Figure
11.38 shows a simple session using SIP.
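The three modules of the simple session can be summarized as a message timeline. The individual messages in Figure 11.38 are not reproduced here; the list below follows the usual INVITE/OK/ACK then BYE exchange, which is an assumption about the figure's content.

```python
# Sketch of the three modules of a simple SIP session between
# Alice (caller) and Bob (callee).
session = {
    "establishing": [
        "INVITE  (Alice -> Bob)",
        "OK      (Bob -> Alice)",
        "ACK     (Alice -> Bob)",
    ],
    "communicating": ["audio exchanged over RTP/RTCP"],
    "terminating": ["BYE     (Alice -> Bob)"],
}

for module, messages in session.items():
    for msg in messages:
        print(f"{module}: {msg}")
```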
Figure 11.38 SIP simple session
Second Scenario: Tracking the Callee
What happens if Bob is not sitting at his terminal? He may be away
from his system or at another terminal. He may not even have a
fixed IP address if DHCP is being used. SIP has a mechanism
(similar to the one in DNS) that finds the IP address of the terminal
at which Bob is sitting. To perform this tracking, SIP uses the
concept of registration. SIP defines some servers as registrars. At
any moment, a user is registered with at least one registrar server;
this server knows the IP address of the callee.
Figure 11.39 Tracking the callee
SIP Message Format and SDP Protocol
As we discussed before, SIP request and response messages are
divided into four sections: a start or status line, a header, a blank
line, and the body. Since a blank line needs no further explanation,
let us briefly describe the format of the other sections.
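Splitting a message into those four sections is straightforward given the HTTP-like layout. The response and the SDP body below are hypothetical, made up to show the mechanics.

```python
# Sketch: splitting a SIP message into its sections (status line,
# headers, blank line, body), assuming the HTTP-like layout the
# text describes. Message content is hypothetical.
raw = (
    "SIP/2.0 200 OK\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

head, _, body = raw.partition("\r\n\r\n")  # blank line separates head and body
lines = head.split("\r\n")
status_line, headers = lines[0], lines[1:]

print(status_line)           # SIP/2.0 200 OK
print(body.splitlines()[0])  # v=0  (first line of the SDP body)
```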
11.4.5 H.323
H.323 is a standard designed by ITU to allow telephones on the
public telephone network to talk to computers connected to the
Internet. Figure 11.40 shows the general architecture of H.323 for
audio, but it can also be used for video.
Figure 11.40 H.323 architecture
Protocols
H.323 uses a number of protocols to establish and maintain voice
(or video) communication. Figure 11.41 shows these protocols.
H.323 uses G.711 or G.723.1 for compression. It uses a protocol
named H.245, which allows the parties to negotiate the
compression method. Protocol Q.931 is used for establishing and
terminating connections. Another protocol, called H.225, or
Registration/Admission/Status (RAS), is used for registration
with the gatekeeper.
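The roles just listed can be collected into a quick lookup table. This only restates the paragraph above; it is not a reproduction of Figure 11.41.

```python
# Summary of the roles the text assigns to the H.323 family of protocols.
h323_protocols = {
    "G.711 / G.723.1": "audio compression",
    "H.245": "negotiating the compression method",
    "Q.931": "establishing and terminating connections",
    "H.225 (RAS)": "registration with the gatekeeper",
}

for proto, role in h323_protocols.items():
    print(f"{proto}: {role}")
```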
Figure 11.41 H.323 protocols
Operation
Let us use a simple example to show the operation of a telephone
communication using H.323. Figure 11.42 shows the steps used by
a terminal to communicate with a telephone.
Figure 11.42 H.323 example
Because learning changes everything. ®
www.mheducation.com