NV Multimedia Communications UNIT I
UNIT I
16CS7CEMMC
Introduction to Multimedia
• In networks, the data transferred can be of any of the following forms.
• Text
• Formatted text (electronic documents etc.)
• Unformatted text (email: plain text without any font specifications)
• Images
• Computer-generated images: shapes (lines, circles, etc.)
• Digitized images of documents
• Pictures
• Audio
• Low fidelity (speech, telephony)
• High fidelity (stereophonic music)
• Video
• Short sequences of moving images (video clips, advertisements)
• Complete movies (films)
• Data Networks
• Early modems worked at 300 bps, but they now operate at higher bit rates.
• 56 kbps is sufficient for text, images, speech and low-resolution video.
• Digital signal processing techniques have helped communication in many ways.
• With high-speed modems, two channels are used: one carries speech for telephony, and the other is a high bit rate channel that can carry high-resolution video and audio.
Dr. Nandhini Vineeth 10
DATA NETWORKS
• Designed for basic data communication services: email and file transfers.
• User equipment (UE): PC, computer or workstation.
• Two widely deployed networks: X.25 and the Internet.
• X.25: low bit rate, unsuitable for multimedia (MM).
• Internet: a collection of interconnected networks that operate using the same set of communication protocols.
• Communication protocol: a set of rules agreed by the communicating parties for the exchange of information; this includes the syntax of messages.
• Open Systems Interconnection: all systems in the Internet can communicate, irrespective of their type or manufacturer.
• Interpersonal communication
• Involves all four MM types
• May be in single form or combined form
• Speech only
• Telephones connected to a PBX or to PSTN/ISDN/cellular networks
• Computers can also be used to make calls
• Computer telephony integration (CTI) requires a telephone interface card and associated software.
• Advantage: a phone directory can be saved, and a number is dialled with a click.
• Telephony can be integrated with the network services provided by the PC
• Additional services: voice mail and teleconferencing
• Voice mail: in the absence of the called party, a message is left for them. It is stored on a central server and can be retrieved the next time the party contacts the server.
• Teleconferencing (conference call): requires an audio bridge to set up a conference call automatically.
Telephony
• The Internet also supports telephony.
• Initially, only PC-to-PC telephony was supported. Later, telephones could also be included in these networks.
• The voice signal is converted into packets, so the necessary hardware and software are required.
• Telephony over the Internet is called packet voice or Voice over IP (VoIP).
• When a PC is to call a telephone, it sends a request to a telephony gateway (TG), addressed by the gateway's IP address. The TG obtains the phone number of the called party (CP) from the source PC. This TG then establishes a session with the TG nearest to the CP, using the Internet address of that gateway. The second gateway initiates a call setup procedure to the receiver's phone.
• When the CP answers, the reverse communication takes place, back through the gateways to the source PC.
• A similar procedure is followed to close the call.
• In spite of the constant bit rate supported by most networks, the store-and-forward delay in each router/PSE makes the actual rate across the network variable.
• Mean packet transfer delay
• Summation of the mean store-and-forward delay that a packet experiences in each PSE/router along its route
• Mean packet error rate (PER)
• Probability of a received packet containing one or more bit errors.
• Related to the maximum packet size and the worst-case BER of the transmission links that interconnect the PSEs/routers that make up the network
• Jitter: worst-case variation in the delay
• Transmission delay is the same in both packet mode and circuit mode; it includes the codec delay in each of the communicating computers and the signal propagation delay.
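The three packet-network QoS parameters above can be sketched numerically. This is a minimal illustration and not a formula taken from the slides. It assumes that bit errors are independent, so that PER = 1 - (1 - BER)^N for a packet of N bits, and it treats jitter as the spread between the largest and smallest observed delay. The function names and sample values are hypothetical.

```python
def packet_error_rate(ber: float, packet_bits: int) -> float:
    """Probability that a packet has one or more bit errors.

    Assumption: bit errors are independent with probability `ber`.
    """
    return 1.0 - (1.0 - ber) ** packet_bits


def mean_transfer_delay(per_hop_mean_delays_ms):
    """Sum of the mean store-and-forward delays of each PSE/router on the route."""
    return sum(per_hop_mean_delays_ms)


def jitter(observed_delays_ms):
    """Worst-case variation in delay, taken here as max minus min."""
    return max(observed_delays_ms) - min(observed_delays_ms)


if __name__ == "__main__":
    # Hypothetical route with three routers and a worst-case BER of 1e-6.
    print(mean_transfer_delay([2.0, 3.5, 1.5]))   # total mean delay in ms
    print(packet_error_rate(1e-6, 8000))          # 1000-byte packet
    print(jitter([10, 12, 19]))                   # spread in ms
```

The PER grows with both the packet size and the BER, which is why the slides relate it to the maximum packet size and the worst-case BER.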
Problem – Network QoS
• For interactive applications, however, the startup delay is important: this is the delay between the application making a request and the destination (server) responding with an acceptance. The total time delay includes the connection establishment delay plus the delay in the source and destination.
• Formatted Text
• Rich text: documents are created that comprise strings of characters of different styles, sizes, colors, etc. Tables, graphics and images can be inserted.
• Hypertext
• An integrated set of documents that have defined linkages between them.
• The light signal associated with each frame varies for a moving image, to show motion, and stays the same for still images.
• The frame refresh rate must be high enough that the eye does not notice the refresh.
• A low refresh rate leads to flicker. A refresh rate of at least 50 times per second is required. The rate used follows the frequency of the mains electricity supply: 60 Hz in North America and parts of Asia, and 50 Hz in Europe.
• Analog TV: picture tubes operate in analog mode; the amplitude of each signal varies as each line is scanned.
• Digital TV: color signals are in digital form and comprise a string of pixels, with a fixed number of pixels per scan line.
• A stored image is displayed by reading the pixels from memory in time-synchronism with the scanning process and converting them into a continuously varying analog form by means of a digital-to-analog converter (DAC).
• As the computer memory must be scanned continuously for the display, a separate block of memory known as video RAM (VRAM) is used to store the pixel image. The graphics program writes into this VRAM when a new image is to be shown on the screen.
• Graphics program: creates the high-level version of the image interactively, using the keyboard and mouse.
• The display controller part of the program interprets sequences of display commands and converts them into displayed objects by writing the appropriate pixel values into the video RAM, also called the frame buffer or display refresh buffer.
• The video controller is a hardware subsystem that reads the pixel values stored in the VRAM in time-synchronism with the scanning process and, for each set of pixel values, converts them into the equivalent set of red, green and blue analog signals for output to the display.
• The CCD reads the charge a single row at a time and transfers it to a readout register. The charge at each photosite position is shifted out, amplified and digitized using an ADC. All rows are read out and digitized in this way.
• When this output is sent directly to a computer, the bitmap can be loaded into the frame buffer, ready for display.
• When images are stored in the camera, multiple images are stored and then transferred to a computer. They can be stored in integrated circuit memory, either on a removable card or fixed within the camera. A removable card is read through a card slot, and fixed memory is read over a cable link.
• A file format is used to store a set of images: TIFF/Electronic Photography (TIFF/EP).
AUDIO
• Audio: speech / music
• Generated by a microphone or a speech synthesizer.
• If generated by a synthesizer, it is a digital signal ready to be stored in a computer.
• If generated by a microphone, the analog signal needs to be converted to a digital signal using an audio signal encoder. If it is then sent to a speaker, which again demands an analog signal, an audio signal decoder is required for the conversion.
• The bandwidth (BW) of typical speech is 50 Hz to 10 kHz.
• Music: 15 Hz to 20 kHz
• The sampling rate used should be in excess of the Nyquist rate, which is 20 ksps for speech and 40 ksps for music.
• The number of bits per sample must be chosen so that the quantization noise generated by the sampling process is at an acceptable level relative to the minimum signal level: 12 bits per sample for speech and 16 bits for music.
• The sampling rate is often lowered in order to reduce the amount of memory required to store a particular passage of music.
• Within a computer, multiples of this rate are used in order to reduce the access delay.
• This bit rate is used with CD-ROMs, which are widely used for the distribution of multimedia titles (a multimedia title is a multimedia project shipped or sold to consumers).
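The sampling figures above translate directly into bit rates and memory requirements. The following sketch works through the arithmetic using the bandwidths and bits per sample quoted in the slides. The function names are hypothetical, and sampling exactly at the Nyquist rate is a simplifying assumption, since the slides ask for a rate in excess of it.

```python
def pcm_bit_rate(max_freq_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bit rate in bits per second when sampling at the Nyquist rate (2 x f_max)."""
    nyquist_rate = 2 * max_freq_hz          # samples per second
    return nyquist_rate * bits_per_sample * channels


def storage_bytes(bit_rate_bps: int, seconds: int) -> int:
    """Memory needed to store `seconds` of audio at the given bit rate."""
    return bit_rate_bps * seconds // 8


if __name__ == "__main__":
    speech = pcm_bit_rate(10_000, 12)                  # 20 ksps x 12 bits
    music = pcm_bit_rate(20_000, 16, channels=2)       # stereophonic music
    print(speech, "bps for speech")
    print(music, "bps for stereo music")
    print(storage_bytes(speech, 10), "bytes for 10 s of speech")
```

Halving the sampling rate halves both figures, which is the memory saving the slides refer to.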
• Scanning Sequence
• Although a minimum refresh rate of 50 times per second is needed to avoid flicker, a rate of 25 times per second is sufficient from the human eye's perspective.
• To reduce the transmission bandwidth, each frame is transmitted in two halves, each termed a field: the first contains only the odd scan lines and the second only the even scan lines.
• The two fields are received and integrated in the receiver.
• Interlaced scanning is used to integrate the two fields.
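The field splitting and reintegration described above can be illustrated with a short sketch. It models a frame as a list of scan lines numbered from 1, so the odd lines sit at list indices 0, 2, 4, and so on. The function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame into its odd-line field and its even-line field."""
    odd_field = frame[0::2]    # scan lines 1, 3, 5, ...
    even_field = frame[1::2]   # scan lines 2, 4, 6, ...
    return odd_field, even_field


def merge_fields(odd_field, even_field):
    """Interleave the two fields back into a full frame at the receiver."""
    frame = []
    for i in range(len(odd_field) + len(even_field)):
        source = odd_field if i % 2 == 0 else even_field
        frame.append(source[i // 2])
    return frame


if __name__ == "__main__":
    frame = ["line1", "line2", "line3", "line4", "line5"]
    odd, even = split_fields(frame)
    print(odd, even)
    print(merge_fields(odd, even) == frame)
```

Each field carries half the lines, so sending 50 fields per second needs the same bandwidth as 25 full frames per second.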
• Saturation (chrominance)
• Strength of the color
• A pastel color has a lower level of saturation than a color such as red.
• A saturated color such as red has no white in it.
• In the 525-line system, the total line sweep time is 63.56 microseconds, but during this time the beam is turned off (set to black level) for a retrace of 11.56 microseconds, giving an active sweep time of 52 microseconds.
• In the 625-line system, the total line sweep time is 64 microseconds with a blanking time of 12 microseconds, again giving an active sweep time of 52 microseconds. Hence, in both cases, a sampling rate of 13.5 MHz yields
• $52 \times 10^{-6} \times 13.5 \times 10^{6} = 702$ samples per line
• In practice, the number of samples per line is increased to 720 by taking a slightly longer active line time, which results in a small number of black samples at the beginning and end of each line for reference purposes.
• For the two chrominance signals, the number is set to half: 360 samples per line.
• This results in 4 Y samples for every 2 Cb and 2 Cr samples, giving the term 4:2:2.
• 4:4:4 indicates that the digitization is based on RGB signals.
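The samples-per-line arithmetic above can be checked with a few lines of code. This sketch uses only the figures quoted in the slides (52 microsecond active line time, 13.5 MHz luminance sampling, 720 and 360 samples per line). The function names are hypothetical.

```python
def samples_per_line(sampling_rate_hz: float, active_line_time_s: float) -> int:
    """Number of samples taken during the active part of one scan line."""
    return round(sampling_rate_hz * active_line_time_s)


def active_time_for(samples: int, sampling_rate_hz: float) -> float:
    """Active line time in seconds needed to fit `samples` at the given rate."""
    return samples / sampling_rate_hz


if __name__ == "__main__":
    y = samples_per_line(13.5e6, 52e-6)
    print(y, "luminance samples in 52 microseconds")
    # Extending to 720 samples needs a slightly longer active line time.
    print(active_time_for(720, 13.5e6) * 1e6, "microseconds for 720 samples")
    # Chrominance is sampled at half the luminance count: 720 / 2 = 360.
    print(720 // 2, "samples per line for each of Cb and Cr")
```

The extra 18 samples beyond 702 are the black reference samples at the two ends of each line.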
• Flicker is avoided by the receiver using the same chrominance values from the sampled lines for the missing lines.
• Flicker in large-screen TVs is reduced by the receiver storing the incoming digitized signals of each field in a memory buffer. A refresh rate of double the normal rate (100/120 Hz) is then used, with the stored set used for the second field.
• 1. Source Intermediate Format (SIF)
• Uses half the spatial resolution of the 4:2:0 format (subsampling) and half the refresh rate (temporal resolution)
• Digitization format: 4:1:1
• Refresh rate (half): 30 Hz (525-line), 25 Hz (625-line)
• Resolution, 525-line: Y = 360 x 240, Cb = Cr = 180 x 120
• Resolution, 625-line: Y = 360 x 288, Cb = Cr = 180 x 144
• Bit rate: $6.75 \times 10^{6} \times 8 + 2(1.6875 \times 10^{6} \times 8) = 81$ Mbps
• Scanning: progressive (non-interlaced)
• Use: picture quality as obtained with a video cassette recorder (VCR); intended for storage applications
• 2. Common Intermediate Format (CIF)
• Derived from SIF: a combination of the spatial resolution used for SIF in the 625-line system and the temporal resolution used in the 525-line system
• Digitization format: 4:1:1
• Refresh rate (half): 30 Hz (525-line), 25 Hz (625-line)
• Resolution: Y = 360 x 288, Cb = Cr = 180 x 144
• 4CIF: Y = 720 x 576, Cb = Cr = 360 x 288
• 16CIF: Y = 1440 x 1152, Cb = Cr = 720 x 576
• Bit rate: same as SIF
• Scanning: progressive (non-interlaced)
• Use: video conferencing applications. Linked desktop PCs use a single 64 kbps ISDN channel; linked video conferencing studios use multiple 64 kbps channels (4 or 16)
• 3. Quarter CIF (QCIF)
• Derived from CIF
• Digitization format: 4:1:1
• Refresh rate: 15 / 7.5 Hz
• Resolution: Y = 180 x 144, Cb = Cr = 90 x 72
• Bit rate, luminance term: $3.375 \times 10^{6} \times 8$
• Use: video telephony applications
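The bit rate and resolution entries for these formats follow a simple pattern, sketched below. The code assumes 8 bits per sample, as in the SIF bit rate expression, and it assumes that each chrominance signal has half the luminance resolution in both directions, which matches every resolution pair listed for SIF, CIF and QCIF. The function names are hypothetical.

```python
def raw_bit_rate(y_rate_hz: int, c_rate_hz: int, bits_per_sample: int = 8) -> int:
    """Bit rate of one luminance signal plus two chrominance signals."""
    return y_rate_hz * bits_per_sample + 2 * (c_rate_hz * bits_per_sample)


def chroma_resolution(y_width: int, y_height: int):
    """Cb/Cr resolution: half the Y resolution horizontally and vertically."""
    return y_width // 2, y_height // 2


if __name__ == "__main__":
    # SIF: 6.75 MHz luminance and 1.6875 MHz chrominance sampling rates.
    print(raw_bit_rate(6_750_000, 1_687_500) / 1e6, "Mbps for SIF")
    for name, (w, h) in {"CIF": (360, 288), "4CIF": (720, 576),
                         "QCIF": (180, 144)}.items():
        print(name, "Y =", (w, h), "Cb = Cr =", chroma_resolution(w, h))
```

Each step from QCIF to CIF to 4CIF doubles the resolution in both directions, so the number of samples per frame grows by a factor of four.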
PC VIDEO
• Multimedia applications involving video: video telephony, video conferencing, etc.
• To avoid distortion on a PC screen (for example, a display of N x N pixels), a horizontal resolution of 640 pixels per line is used with a 525-line monitor and 768 pixels per line with a 625-line monitor.
• For a PC monitor, where live video is mixed with other information, the line sampling rate is modified.
• For the 525-line system, the line sampling rate is reduced from 13.5 MHz to 12.2727 MHz, while for the 625-line system it is increased to 14.75 MHz.
• In desktop video telephony and video conferencing, the video signals from the camera are sampled at this rate prior to transmission and can hence be displayed directly on the screen.
• All ADC operations produce a quantization error, and hence a string of positive errors will have a cumulative effect on the accuracy of the value held in the register.
• As the errors could propagate, more sophisticated techniques have been developed for estimating (also known as predicting) a more accurate version of the previous signal. This is done by using a number of the immediately preceding estimated signals, not just one.
• Predictor coefficients determine the proportions of each that are used.
• The difference signal is computed by subtracting varying proportions of the last three predicted values from the current digitized value output by the ADC.
• Example: if C1 = 0.5 and C2 = C3 = 0.25, the contents of register R1 are shifted right by 1 bit (multiplying the contents by 0.5) and the contents of the other two by 2 bits. The sum of the three shifted values is subtracted from the current digitized value output by the ADC.
• The R2 value is then shifted to R3 and the R1 value to R2. The new predicted value is loaded into R1 for processing the next sample.
• The decoder operates by adding the same proportions of the last three computed PCM signals to the received DPCM signal.
• A performance equivalent to PCM is obtained by using only 6 bits for the difference signal, which produces a bit rate of 32 kbps.
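The third-order predictor in the example above can be sketched as follows, using right shifts for C1 = 0.5 and C2 = C3 = 0.25. This is a simplified illustration: it leaves out the quantizer that limits the difference signal to 6 bits, so the decoder reconstructs the input exactly. The function names and the sample values are hypothetical.

```python
def predict(r1: int, r2: int, r3: int) -> int:
    """Weighted sum of the last three values: 0.5*R1 + 0.25*R2 + 0.25*R3."""
    return (r1 >> 1) + (r2 >> 2) + (r3 >> 2)


def dpcm_encode(samples):
    """Return the difference signal for a list of digitized samples."""
    r1 = r2 = r3 = 0
    diffs = []
    for x in samples:
        diff = x - predict(r1, r2, r3)
        diffs.append(diff)
        # Shift the registers: R2 -> R3, R1 -> R2, new value -> R1.
        # Without a quantizer the new value equals the input sample.
        r3, r2, r1 = r2, r1, predict(r1, r2, r3) + diff
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by adding the same prediction to each difference."""
    r1 = r2 = r3 = 0
    samples = []
    for diff in diffs:
        x = predict(r1, r2, r3) + diff
        samples.append(x)
        r3, r2, r1 = r2, r1, x
    return samples


if __name__ == "__main__":
    pcm = [100, 104, 109, 113, 116, 118]
    encoded = dpcm_encode(pcm)
    print(encoded)
    print(dpcm_decode(encoded) == pcm)
```

Once the three registers have filled, the differences become much smaller than the samples themselves, which is why fewer bits are enough for the difference signal.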
• These are quantized and sent, and the destination uses them together with a sound synthesizer to regenerate a sound that is perceptually comparable with the source audio signal. This is the basis of the linear predictive coding (LPC) technique.
• Analysis is done here as in the other techniques, but only the features that are perceptible to the human ear are transmitted.
• The human ear is sensitive to signals in the range 15 Hz to 20 kHz, but the level of sensitivity to each signal is non-linear: the ear is more sensitive to some signals than to others.
• Frequency masking: in general audio, where multiple signals are present, a strong signal may reduce the level of sensitivity of the ear to other signals that are near to it in frequency.
• Temporal masking: when the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.