NV Multimedia Communications UNIT I

Multimedia Communications
UNIT I
16CS7CEMMC
Introduction to Multimedia
• In networks, the data transferred can be of any of the following forms.
• Text
• Formatted text (electronic documents etc.)
• Unformatted text ( email – plain text without any font specifications)
• Images
• Computer generated images –shapes (line/circle etc)
• Digitized images of documents
• Pictures
• Audio
• Low fidelity (Speech - telephony)
• High fidelity (Stereophonic music)
• Video
• Short sequence of moving images (video clips - advertisements)
• Complete movies ( films )
Dr. Nandhini Vineeth

Applications
• Person-to-person communication using terminal equipment
• Person-to-computer communication
• Person: a MM PC; computer: a server with files (holding a single MM type or integrated MM)
• Person: a set-top box connected to a TV can communicate with MM servers
Applications that initially supported only one MM type now support integrated MM, thanks to advanced H/W and S/W.
• Email initially supported only text; it can now be sent with any type of media attached
• Telephone services earlier supported only speech but now allow all MM types.

Multimedia Information Representation
• Text and Images
• represented using blocks of digital data
• Text – represented with codewords, each a fixed number of bits
• Images – made up of picture elements (pixels); every pixel is represented using a fixed number of bits
• Transaction duration – short
Audio and Video
• represented as analog signals that vary continuously with time
• Telephone conversations may take minutes and movie downloads may take hours
• When they are the only media type, they keep their basic analog form
• When integrated with other types, they need to be converted to digital form.

Multimedia Information Representation
• Speech signal – typical data rate is 64kbps
• Music and Video – higher bit rates are required
• Huge bit rates cannot be supported by all networks
• Compression is the technique applied to the digitized signals to
reduce the time delay for a request / response.
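
The 64 kbps figure for speech can be reproduced from telephone-quality PCM. The sketch below assumes 8000 samples per second and 8 bits per sample; these values, and the 44.1 kHz / 16-bit stereo case used for music, are illustrative assumptions that are not stated on the slide.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

speech_bps = pcm_bit_rate(8_000, 8)        # assumed telephony parameters
music_bps = pcm_bit_rate(44_100, 16, 2)    # assumed CD-quality stereo

print(speech_bps / 1000, "kbps")
print(music_bps / 1000, "kbps")
```

The much higher uncompressed rate for music shows why compression is needed before such signals can be carried over low bit rate networks.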

Multimedia Networks
• Five basic types of Communication Networks
• Telephone Networks
• Data Networks
• Broadcast Television Networks
• Integrated Services Digital Network
• Broadband Multiservice Networks

Telephone Networks
• POTS-Plain Old Telephone System
• Initially calls were done within a country
• Extended to International calls
• Explanation of the figure in next slide

• PBX – Private Branch Exchange
• LE - Local Exchange
• IGE - International Gateway Exchange
• GMSC – Gateway Mobile Switching Centre
• PSTN- Public Switched Telephone Networks

Telephone Networks
• A microphone converts speech to an analog signal

• Telephones originally worked in circuit mode – a separate call is set up and resources are reserved throughout the network for the duration of the call.
• Handsets were designed to carry two-way analog signals to the PBX.
• Digital mode is used within the PSTN.
• The modem was a significant device, since it lets digital data be carried over the analog access lines.

High speed Modems
• Early modems worked at 300 bps; they now operate at much higher bit rates.
• 56 kbps – sufficient for text and images as well as speech and low-resolution video
• Digital signal processing techniques have helped communication in many ways.
• Two channels are used with high-speed modems – one carries speech for telephony and the other is a high bit rate channel which can carry high-resolution video and audio
DATA NETWORKS
• Designed for basic data communication services – Email and file transfers.
• UE – PC/computer/workstation
• Two widely deployed networks – X.25 and the Internet
• X.25 – low bit rate, unsuitable for MM
• Internet – a collection of interconnected networks that operate using the same set of communication protocols
• Communication protocol – a set of rules agreed by the communicating parties for the exchange of information; this includes the syntax of messages.
• Open systems interconnection – irrespective of type or manufacturer, all systems in the Internet can communicate with each other

Data Networks
• Homes/small offices connect to the Internet via an Internet Service Provider (ISP), through a PSTN via a modem or through ISDN.
• Site/campus network – a single site, or multiple sites through an enterprise-wide private network (EWPN), connects to the Internet
• EWPN – e.g. a college/university campus
• When these networks use the same set of protocols for internal services as the Internet, they are said to be intranets.
• All the above types of network connect to the Internet backbone network via a gateway (router)
• Data networks operate in packet mode.
• Packet – a container for data with both a head and a body. The head contains control information such as the destination address
• MM PCs were introduced, which support a microphone and speakers, with a sound card and supporting software to digitize the speech.
• The introduction of a camera with its supporting H/W and S/W brought in video.
• The data networks hence initiated multimedia communication (MC) applications.

Broadcast Television N/W
• Designed to support the diffusion of analog television over wide geographical areas.
• For a city/town the broadcast medium is a cable distribution network; for larger areas, a satellite network or a terrestrial broadcast network.
• Digital services started with home shopping and game playing.
• In a cable network, the STB provides control of the television channels that are received (low bit rate), and the cable modem in the STB gives access to other services, where a high bit rate channel is used to connect the subscriber back to the cable head-end.
• These networks also provide “interactive television”, where an interaction channel lets the subscriber request what he/she is interested in.

Integrated Services Digital Network
• Integration of services with PSTN
• Conversion of Telephone Networks into all digital form.
• Two separate communication channels – supporting two telephone calls simultaneously, or one telephone call and one data call
• These circuits are termed digital subscriber lines (DSL)
• UE can be either an analog or a digital phone.
• Digital phone – all required conversion circuitry is in the handset
• Analog phone – all required conversion circuitry is in the network termination equipment.
• Basic rate access – two 64 kbps channels, used either independently or combined as one 128 kbps line.
• Using both channels requires two separate circuits to be set up to support the two different calls.
• Synchronizing the two channels into a single 128 kbps line requires an additional box to do the aggregation function.
• Primary rate access – a 1.5 or 2 Mbps data rate channel
• The service has since been extended to p × 64 kbps, where p = 1..30.
• Supports MM applications, at an increased cost compared to PSTN.
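
The ISDN rates listed above are all multiples of one 64 kbps channel. A minimal sketch of that arithmetic (the function name is only illustrative):

```python
CHANNEL_KBPS = 64  # one ISDN channel, as stated above

def isdn_rate_kbps(p):
    """Aggregate rate of p x 64 kbps, for p in 1..30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * CHANNEL_KBPS

print(isdn_rate_kbps(2))    # basic rate access, both channels combined
print(isdn_rate_kbps(30))   # largest p, just under 2 Mbps
```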
Broadband Multiservice Networks
• Broadband – bit rates in excess of the maximum of 2 Mbps (30 × 64 kbps) given by ISDN.
• These are enhanced ISDN and hence termed Broadband-ISDN (B-ISDN), with the simple ISDN termed Narrowband or N-ISDN.
• The initial type did not support video. Current ones do, following the introduction of compression techniques.
• As the other three types of network also improved with the introduction of compression techniques, broadband deployment slowed down.
• Multiservice – multiple services. Different rates were required for different services, hence flexibility was introduced. Every media type is first converted to digital form and then integrated together. The integrated stream is then divided into equal-sized cells.
• The uniform size helps in better switching.
• As different MM types require different rates, the rate of transfer of cells also varies; hence the term asynchronous transfer mode (ATM).
• ATM networks are also called cell-switching networks.
• ATM LANs – single site; ATM MANs – high-speed backbone network to interconnect a number of LANs
• These can also communicate with other types of LANs
URLs explaining the working in depth
• Television Broadcast - https://www.youtube.com/watch?v=bvSDQmo-Wbk
• Satellite TV - https://www.youtube.com/watch?v=OpkatIqkLO8

Multimedia Applications
• The applications fall under three categories:
• Interpersonal communication
• Interactive applications over the internet
• Entertainment applications
• Interpersonal communication
• Involves all four MM types
• May be in single form or combined form
• Speech only
• Telephones connected to a PBX or a PSTN/ISDN/cellular network
• Computers can also be used to make calls
• Computer telephony integration (CTI) – requires a telephone interface card and associated software.
• Advantage – a phone directory can be saved and a number dialled with a click
• Telephony can be integrated with network services provided by the PC
• Additional services: voice mail and teleconferencing
• Voice mail – in the absence of the called party, a message is left for them and stored in a central server; it can be played back the next time the party contacts the server.
• Teleconferencing (conference call) – requires an audio bridge to set up the conference call automatically
Telephony
• The Internet also supports telephony.
• Initially only PC-to-PC telephony was supported. Later, telephones could be included in these networks.
• Here the voice signal is converted to packets, so the necessary hardware and software are required
• Telephony over the Internet is called packet voice or Voice over IP (VoIP).
• When a PC is to call a telephone, a request is sent to a telephony gateway (TG) using the gateway’s Internet address. The TG obtains the phone number of the called party (CP) from the source PC. This TG then establishes a session (call) with the TG nearest to the CP, using the Internet address of that gateway. The far gateway initiates a call setup procedure to the receiver’s phone.
• When the CP answers, the signalling runs in the reverse direction back to the PC
• A similar procedure is followed for closing the call

Image only
• Exchange of electronic images of documents – facsimile (fax)
• To send images, a call is set up as for a telephone call
• The two fax machines communicate to establish operational parameters
• The sending machine starts to scan and digitize each page of the document in turn.
• An internal modem transmits the digitized image over the network as it is scanned; as it is received at the called site, a printed version of the image is produced.
• After the last page is received, the connection is cleared by the calling machine
• PC fax – an electronic version of a document stored in a PC can be sent. This requires a telephone interface card and associated software. The other side of the communication can be a fax machine or a PC.
• With a LAN interface card and associated software, digitized documents can be sent over other network types such as enterprise networks.
• This is mainly useful for sending paper-based documents such as invoices, marks cards and so on.

Text Only
• Email: home/enterprise N/W → ISP → receiver
• Email server, mailbox
• Users can create mails and deposit them into a mailbox, and read mails from their own mailbox.
• Email servers and Internet gateways work on the standard internet
communication protocols.
• Message format- Source and destination – name and address
• cc- carbon copy
• Can contain only text

Text and images
• An application showing this integration is computer-supported cooperative working (CSCW).
• A window on each PC is a shared workspace, known as a shared whiteboard.
• The associated software is a whiteboard program with a linked set of support programs.
• The shared whiteboard has two components – change notification and update control.
• Change notification informs the shared whiteboard program whenever the user modifies the whiteboard.
• The program relays the changes to the update control in each of the other PCs, which in turn updates its copy of the whiteboard.
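
The interplay of change notification and update control can be sketched in a few lines. This is a hypothetical in-memory model, not a real whiteboard product: the class and method names are invented for illustration, and a direct method call stands in for the network.

```python
class Whiteboard:
    """One PC's copy of the shared whiteboard (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.contents = []   # local copy of the drawing operations
        self.peers = []      # whiteboard programs on the other PCs

    def user_edit(self, change):
        """The local user modifies the whiteboard."""
        self.contents.append(change)
        self._change_notification(change)

    def _change_notification(self, change):
        # Change notification: relay the change to every other PC.
        for peer in self.peers:
            peer.update_control(change)

    def update_control(self, change):
        # Update control: apply a remote change to the local copy.
        self.contents.append(change)


def connect(boards):
    """Make every whiteboard aware of all the others."""
    for board in boards:
        board.peers = [b for b in boards if b is not board]


a, b, c = Whiteboard("A"), Whiteboard("B"), Whiteboard("C")
connect([a, b, c])
a.user_edit("line (0,0)-(10,10)")
b.user_edit("text 'hello' at (5,5)")
print(c.contents)
```

After both edits, all three copies hold the same two operations, which is the property the shared workspace relies on.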

Speech and Video
• Video telephony – a video camera is used in addition to the microphone.
• A dedicated terminal or a MM PC can be used for communication
• The entire display or a window on the PC is used.
• A two-way communication channel must be provided by the network, with sufficient bandwidth to support this integrated environment.
• Desktop video conferencing calls are used in large corporations
• More bandwidth is used
• A multipoint control unit (MCU), or videoconferencing server, is used to reduce the bandwidth
• The integrated speech and video from each participant reaches the MCU, which selects a single information stream to send to each participant.
• When it detects a participant speaking, it sends that participant’s stream to all other participants. Only a single two-way communication channel between each location and the MCU is required.
• The Internet supports multicasting – from one PC to a predefined group of PCs. MCUs are not used here, and the number of participants is limited
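
To see why an MCU reduces the bandwidth demand, compare the number of two-way channels needed. The sketch assumes that the alternative to an MCU is a full mesh with one channel between every pair of sites; that alternative is an assumption made here for comparison and is not spelled out on the slide.

```python
def channels_with_mcu(sites):
    """One two-way channel between each site and the MCU."""
    return sites

def channels_full_mesh(sites):
    """Assumed alternative: a two-way channel between every pair of sites."""
    return sites * (sites - 1) // 2

for n in (3, 5, 10):
    print(n, channels_with_mcu(n), channels_full_mesh(n))
```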

Speech and Video- Interpersonal
communication
• Environments: where a larger number of participants are involved at one or more locations
• One person may communicate with a group at another location
• Ex. live lecture
• The lecturer may share notes or a presentation
• Students may only talk, or may send video along with speech
• If the students are at the same location, it is like a video phone call
• Ex. IIT-B live lecture sessions
• When the students are at different locations, either a separate communication channel is required to each remote site or an MCU is used at the lecturer’s site
• Relatively high BW is required, hence ISDN or a broadband multiservice N/W is suitable

Speech and Video- Interpersonal
communication
• Groups of people at different locations, e.g. video conferencing
• Specially equipped rooms called video conferencing studios (VS) are used
• Studios may have one or more cameras, microphones (audio equipment) and large-screen displays
• When multiple locations are involved, an MCU is used to minimize the BW demands on the access circuits
• The MCU is a central facility within the network, hence only a single two-way communication channel is required per site. Example: a telecommunication provider’s conference service
• In private networks, the MCU is located at one of the sites. The communication requirements at that site are more demanding, as it must support multiple input channels and an output stream to broadcast to all sites

Multimedia
• Three different types of electronic mail other than text only
• Voice mail:
• A voice mail server is associated with each network.
• The user enters a voice message addressed to a recipient
• The local voice mail server relays this to the voice mail server of the intended recipient’s network.
• When the recipient next logs in to the mailbox, the message is played out
• Video mail works the same way, but with video and speech
• Multimedia mail
• Combination of all four media types
• MIME – Multipurpose Internet Mail Extensions
• For speech and video, annotations can either be sent directly to the recipient’s mailbox together with the original text message, and then stored and played out in the normal way,
• or be requested by the recipient while reading the text message, provided the recipient’s terminal supports audio/video
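
A multimedia mail of this kind can be sketched with the `email` package in the Python standard library, which builds MIME messages. The addresses, file names and byte strings below are placeholders invented for illustration; real speech or image data would take their place.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"        # placeholder addresses
msg["To"] = "recipient@example.com"
msg["Subject"] = "Multimedia mail"
msg.set_content("Text part of the message.")

# Placeholder bytes standing in for recorded speech and an image.
msg.add_attachment(b"\x00\x01", maintype="audio", subtype="wav",
                   filename="annotation.wav")
msg.add_attachment(b"\x00\x01", maintype="image", subtype="png",
                   filename="figure.png")

part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type())
print(part_types)
```

The text part stays readable on any terminal, while the audio and image parts are separate MIME parts that a terminal can play only if it supports them.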

Multimedia E-mail Structure

Interactive applications over Internet
• World Wide Web
• Linked set of multimedia servers that are geographically distributed
• Total information stored is equivalent to a vast library of documents.
• Pages are linked through Hyperlinks (References to other pages / same page)
• Options are available to jump to a specific point within a page.
• Anchors used
• HyperText
• HyperMedia
• Uniform Resource Locator (URL) – unique identifier of a location
• Home Page
• Browser
• HyperText Markup Language
• Free sites / Subscription sites
• Teleshopping, Telebanking- initiate additional transactions

Entertainment Applications
• Two types:
• Movie/ video – on demand
• Interactive television
• Movie/ video –on demand
• Video / audio applications need to be of much higher quality/resolution
since wide screen or stereophonic sound may be used.
• A minimum channel bit rate of 1.5 Mbps is used.
• This requires a PSTN with a high bit rate modem, or a cable network
• Digitized movies / videos are stored in servers.

Entertainment Applications
• Subscriber end
• Conventional television
• Television with selection device for interactive purpose.
• Movie-on-demand /video-on-demand
• Playback of the movie can be controlled as with a video cassette recorder (VCR)
• Any time – user’s choice
• This may result in concurrent access, requiring multiple copies in the server
• This adds to the cost
• An alternative is not to play the movie immediately after a request but to defer it until the next scheduled playout time. All pending requests are then satisfied simultaneously by the server outputting a single video stream. This mode is known as near movie-on-demand (N-MOD).
• In this mode the viewer is unable to control the playout of the movie
• Formats of the files also play a significant role

Interactive Television
• Broadcast television includes cable, satellite and terrestrial networks.
• Diffusion of analog and digital television programs
• The set-top box also has a modem within it
• Cable networks – the STB provides a low bit rate connection to the PSTN for requests, as well as a high bit rate connection to the Internet or for broadcasts
• An additional keyboard or telephone can be connected to the STB to gain access to services.
• Interactive television:
• Through the connection to the PSTN, users are able to respond actively to the information being broadcast.
• Return channels help in voting, participation in games, home shopping, etc.
• STBs in these networks require a high-speed modem.

Network QoS
• Communication Channel
• Parameters associated – Network QoS
• Suitability of a channel for an application can be decided using these
• Different for Circuit Switched networks and Packet Switched networks
• Circuit-Switched N/w
• Constant bit rate channels
• Parameters
• Bit rate
• Mean bit error rate
• Transmission delay

Network QoS - Circuit-Switched N/w
• Bit error rate
• Probability of the bit being corrupted during transmission
• A BER of $10^{-3} = 1/1000$ indicates that, on average, 1 bit in 1000 may be corrupted
• Bit errors occur randomly
• If the BER probability is $P$ and the number of bits in a block is $N$ then, assuming random errors, the probability of a block containing a bit error, $P_B$, is given by
$P_B = 1 - (1 - P)^N$
which approximates to $N \times P$ if $N \times P < 1$
Ex. If $P = 1/1000$ and $N = 100$ bits, $P_B \approx 100/1000 = 1/10$
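
The worked example can be checked numerically. A minimal sketch comparing the exact block error probability with the N × P approximation, using the same P and N as above:

```python
def block_error_prob(p, n):
    """Exact probability that an n-bit block has at least one bit error."""
    return 1 - (1 - p) ** n

p, n = 1 / 1000, 100
exact = block_error_prob(p, n)
approx = n * p
print(round(exact, 4), approx)
```

The exact value is slightly below the approximation, and the gap widens as N × P approaches 1.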

Network QoS - Circuit-Switched N/w
• Both CS and PS provide an unreliable service known as a best effort or best try service
• Erroneous packets are generally dropped either within the network or in the network interface
of the destination.
• If the application demands error-free data, the sender divides the source information into blocks of a defined maximum size and transmits them, and the destination detects when a block is missing.
• When a block is missing, the destination requests the source to send another copy of it. This is a reliable service.
• Each retransmission introduces a delay, so the retransmission procedure should be invoked relatively infrequently, which dictates a small block size.
• High overheads are also involved, since each block contains additional information associated with the retransmission procedure.
• The choice of block size is therefore a compromise between the increased delay resulting from a larger block size (and hence more retransmissions) and the loss of transmission bandwidth resulting from the high overheads when a small block size is used

Network QoS - Circuit-Switched N/W
• Transmission delay within a channel is determined not only by the bit rate but also by delays that occur in the terminal/computer network interfaces (codec delays), plus the propagation delay
• i.e. transmission delay depends on bit rate + terminal delay + interface delay + propagation delay
• Propagation delay is determined by the physical separation of the two communicating devices and the velocity of propagation of a signal across the transmission medium.
• Speed of light in free space is $3 \times 10^8$ m/s
• In physical media – about $2 \times 10^8$ m/s
• Propagation delay is independent of the bit rate of the communications channel; assuming that the codec delay remains constant, it is the same whether the bit rate is 1 kbps, 1 Mbps or 1 Gbps
From Forouzan
• Propagation speed – the speed at which a bit travels through the medium from source to destination.
• Transmission speed – the speed at which all the bits in a message arrive at the destination (difference in arrival time of first and last bit).
• Propagation delay = Distance / Propagation speed
• Transmission delay = Message size (bits) / Bandwidth (bps)
• Latency = Propagation delay + Transmission delay + Queueing time + Processing time
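
These delay formulas translate directly into code. The sketch below uses the 2 × 10^8 m/s figure for physical media as the default propagation speed; the 1000 km link, 1 Mbit message and 1 Mbps channel are hypothetical values chosen only to show the calculation.

```python
def propagation_delay(distance_m, speed_mps=2e8):
    """Time for a bit to travel the given distance, in seconds."""
    return distance_m / speed_mps

def transmission_delay(message_bits, bandwidth_bps):
    """Time to push all bits of the message onto the channel, in seconds."""
    return message_bits / bandwidth_bps

def latency(distance_m, message_bits, bandwidth_bps,
            queueing_s=0.0, processing_s=0.0, speed_mps=2e8):
    """Sum of the four delay components."""
    return (propagation_delay(distance_m, speed_mps)
            + transmission_delay(message_bits, bandwidth_bps)
            + queueing_s + processing_s)

# Hypothetical example: 1000 km link, 1 Mbit message, 1 Mbps channel.
prop = propagation_delay(1_000_000)
trans = transmission_delay(1_000_000, 1e6)
total = latency(1_000_000, 1_000_000, 1e6)
print(prop, trans, total)
```

Raising the bandwidth shortens only the transmission delay; the propagation delay stays fixed for a given distance.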
Network QoS -Packet Switched Networks
• QoS Parameters
• Max Packet Size
• Mean packet Transfer rate
• Mean packet error rate
• Mean packet Transfer delay
• Worst case jitter
• Transmission delay
• Although most networks support a constant bit rate on each link, the store-and-forward delay in each router/PSE makes the actual rate across the network variable.

Network QoS -Packet Switched Networks
• Mean packet transfer rate
• Average number of packets transferred across the network per second; coupled with the packet size being used, it determines the equivalent mean bit rate of the channel
• Mean packet transfer delay
• Summation of the mean store-and-forward delays that a packet experiences in each PSE/router along its route
• Mean packet error rate (PER)
• Probability of a received packet containing one or more bit errors.
• Same as the block error rate of a CS N/W
• Related to the max packet size and the worst-case BER of the transmission links that interconnect the PSEs/routers that make up the network
• Worst-case jitter – the worst-case variation in the delay
• Transmission delay is the same in both packet mode and circuit mode, and includes the codec delay in each of the communicating computers and the signal propagation delay.
Problem – Network QoS

Application QoS
• Application QoS parameters vary depending on the media
• Ex. Images – parameters may include a minimum image resolution and size
• Video applications – the digitization format and refresh rate may be defined
• Application QoS parameters that relate to network include:
• Required bit rate or Mean packet Transfer rate
• Max startup delay
• Max end to end delay
• Max delay variation/jitter
• Max round trip delay

Application QoS
• For applications demanding a constant bit rate stream, the important parameters are the bit rate/mean packet transfer rate, the end-to-end delay and the delay variation/jitter, since problems may be caused at the destination decoder if the rate of arrival of the bitstream is variable.
• For constant bit rate applications a circuit-switched network is appropriate: the call setup delay is not important, but the channel must provide a constant bit rate service of a known rate
• For interactive applications a connectionless packet-switched network is appropriate, as there is no call setup delay and any variation in the packet transfer delay is not important
• For interactive applications, however, the startup delay matters: the delay between the application making a request and the destination (server) responding with an acceptance. The total delay includes the connection establishment delay plus the delays in the source and the destination.

Application QoS
• Round-trip delay is important for human-computer interaction to be successful: the delay between the start of a request for some information and the start of the information being received/displayed should be as short as possible, and no more than a few seconds
• An application that suits a packet-switched N/W better than a CS N/W is a large file transfer from a server to a workstation.
• Devices in a home N/W can connect using a PSTN, an ISDN connection or a cable modem
• PSTN/ISDN – CS constant bit rate channel: 28.8 kbps (PSTN) and 64/128 kbps (ISDN)
Application QoS
• Cable modems operate in packet-switched mode.
• As concurrent users share the channel, a mean data rate of about 100 kbps can be assumed per user.
• What matters is the time taken to transfer the complete file: although the 27 Mbps channel is time-shared, the file is transferred at the full rate during the allotted slot.
• In summary, when a file of 100 Mbits is to be transferred, the minimum time taken is
• PSTN with 28.8 kbps modem: 57.8 min
• ISDN at 64 kbps: 26 min
• ISDN at 128 kbps: 13 min
• Cable modem at 27 Mbps: 3.7 s
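
The times in this summary are simply file size divided by channel bit rate. A short sketch that recomputes them, assuming 1 kbps = 1000 bps and 1 Mbit = 10^6 bits:

```python
FILE_BITS = 100e6  # 100 Mbits, as in the summary above

channels_bps = {
    "PSTN, 28.8 kbps modem": 28.8e3,
    "ISDN, 64 kbps": 64e3,
    "ISDN, 128 kbps": 128e3,
    "Cable modem, 27 Mbps": 27e6,
}

transfer_s = {name: FILE_BITS / rate for name, rate in channels_bps.items()}

for name, seconds in transfer_s.items():
    print(f"{name}: {seconds:.1f} s = {seconds / 60:.2f} min")
```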
Application QoS
• In many situations, depending on the parameters, constant bit rate stream applications can also be carried over packet-switched networks
• Buffering is the technique used to overcome the effects of jitter.
• A defined number of packets is kept in a memory buffer at the destination before playout.
• FIFO discipline is followed
• Packetization delay adds to the transmission delay of the channel
• Packet size is chosen appropriately to balance these effects
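
The buffering idea can be sketched as a small FIFO playout buffer. This is an illustrative model only: the class name, the prefill of 3 packets and the arrival pattern are all assumptions, and one loop iteration stands for one playout interval.

```python
from collections import deque

class PlayoutBuffer:
    """FIFO buffer that holds back playout until `prefill` packets arrive."""

    def __init__(self, prefill):
        self.prefill = prefill
        self.queue = deque()
        self.started = False

    def arrive(self, packet):
        self.queue.append(packet)
        if len(self.queue) >= self.prefill:
            self.started = True

    def play(self):
        """Called once per playout interval; returns a packet or None."""
        if self.started and self.queue:
            return self.queue.popleft()
        return None


buf = PlayoutBuffer(prefill=3)
played = []
# Irregular arrivals per playout interval (hypothetical jitter pattern).
arrivals = [["p0"], ["p1", "p2"], [], ["p3"], [], ["p4", "p5"], [], []]
for tick_packets in arrivals:
    for pkt in tick_packets:
        buf.arrive(pkt)
    played.append(buf.play())
print(played)
```

Once playout starts, packets leave at a steady one per interval even though they arrived in bursts; the price is the initial delay while the buffer fills.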

Application QoS
• To check the suitability of a network for the applications to be run by the end machines, service classes have been defined.
• A specific set of QoS parameters is defined for each service class.
• Internet – includes all classes of service.
• Packets in each class have a different priority and are treated differently
• Ex. packets relating to MM applications are sensitive to delay and jitter and are given higher priority than packets carrying text messages such as email
• During network congestion, video packets are transmitted first.
• Video packets are more sensitive to packet loss and hence are given a higher priority than audio.

MULTIMEDIA INFORMATION
REPRESENTATION

Text
• Three types of text:
• Unformatted text:
• Plaintext created from a limited character set
• Formatted Text
• Rich text – documents comprising strings of characters of different styles, sizes, colours etc.; tables, graphics and images can be inserted
• Hypertext
• Integrated set of documents – have defined linkages between them.

Unformatted Text
• ASCII Table
• Printable characters – alphabetic, numeric and punctuation characters
• Control characters
• Backspace, delete, Esc etc.
• Information separators: file separator (FS), record separator (RS)
• Transmission control characters:
• Start of Heading (SOH), Start of Text (STX), End of Text (ETX), Acknowledgement (ACK), Negative Acknowledgement (NAK), Synchronous Idle (SYN), Data Link Escape (DLE)
• ASCII values: A = 65 – read the column bits (7 to 5) first, then the row bits (4 3 2 1).
• So A is read as 100 0001, i.e. 1000001
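
The 7-bit codeword for a character can be checked with Python built-ins. A minimal sketch; the helper name is illustrative, and the split into bits 7 to 5 and bits 4 to 1 follows the table layout described above.

```python
def ascii_bits(ch):
    """7-bit ASCII codeword of a character, as a bit string (bit 7 first)."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not a 7-bit ASCII character")
    return format(code, "07b")

bits = ascii_bits("A")
print(ord("A"), bits)        # 65 1000001
print(bits[:3], bits[3:])    # 100 0001
```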

Example Videotex / Teletext characters

Unformatted Text
• Mosaic characters:
• Columns 010/011 and columns 110/111 are replaced with the set of mosaic characters
• These are used in combination with upper-case characters to create simple graphical images
• Example applications are Videotex and Teletext – general broadcast information services available through a standard TV set and used in a number of countries.
• The total page is made up of a matrix of symbols and characters which all have the same size; larger text and symbols are possible by using groups of basic symbols.

Formatted Text
• Produced by Word Processing packages
• Publishing sector- books, papers, magazine, journals etc.,
• Characters of various style, size, shapes
• Bold/Italic/Underline/Plain
• Chapters, sections, paragraphs each with specific tables, graphics and
pictures inserted at appropriate points
• Graphics Picture

Formatted Text
• To print formatted text, the microprocessor inside the printer must be programmed
• to detect and interpret the format of characters, and to convert tables, graphics or pictures into line-by-line format for printing
• Print preview was designed to be WYSIWYG (what you see is what you get)

HYPERTEXT
• Hypertext – a type of formatted text that enables a related set of documents (pages) to be created with defined linkage points (hyperlinks) between them
• Ex. Electronic version of a university brochure

Images
• Computer generated images - graphics
• Digitized images of documents as well as pictures
• Display/Printing- in the form of two dimensional matrix of individual
picture elements - known as pixels / pels
• Stored in a computer file
• Each type is created differently

Images - Graphics
• Different S/W packages and programs are available for the creation of
computer graphics.
• Easy-to-use tools to create graphics – lines, circles, arcs, ovals, diamonds etc., as well as free-form objects
• A paint brush or mouse can be used to create the shapes required
• Predrawn objects (either by the author or from a clip-art gallery) can be taken and modified
• Textual images, precreated tables, graphs, digitized pictures and
photographs can be included.
• Objects can be made to look in layers
• Shadows can be added to give a 3D effect
Images - Graphics
• A computer display screen is also made up of a two-dimensional matrix of individual picture elements (pixels), each of which has a range of colours associated with it
• Video Graphics Array (VGA) – a common type of display consisting of 640 × 480 pixels with 8 bits per pixel, allowing 256 colours
• All objects are made up of a series of lines that are connected to each other; a series of short lines may appear as a curved line.
• Adjacent pixels form a shape
• Attributes of each object – its shape, size (based on the border coordinates), colours and shadow
• Editing involves changing these attributes
• Moving an object involves changing its border coordinates and leaving the other properties intact
• Shape – can be open or closed.
• Open – the beginning and end pixels need not be the same.
• Closed – the beginning and end pixels must be the same
• Rendering – filling the objects with colour
• Basic low-level commands can be used to set the colours
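
The VGA figures above fix both the number of colours and the memory needed for one screen. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def frame_buffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one full screen of pixels, in bytes."""
    return width * height * bits_per_pixel // 8

vga_bytes = frame_buffer_bytes(640, 480, 8)
vga_colours = 2 ** 8
print(vga_bytes, vga_colours)    # 307200 256
```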

Images - Graphics
• Representation of a complex graphic
• Analogous to a computer program
• Program – main body + functions (with parameters) + built-in functions
• Graphics – basic commands to create objects, plus added functionality that is either built in or written by the user
• The main body is used to invoke the various functions in the order required
• Graphics – base layer, which calls various functions to create layers
• Two forms of representation of a computer graphic: a high-level version (similar to a high-level program) and the actual pixel image of the graphic (similar to the low-level byte-string equivalent), known as bitmap format

Images - Graphics
• Transfer over a network can be done in either form.
• The high-level format is more compact and requires less memory to store the image and less BW for its transmission. The destination must be able to interpret the various high-level commands
• Bitmap format is often used – there are many generalized formats such as Graphics Interchange Format (GIF) and Tagged Image File Format (TIFF)
• There are also software packages such as Simple Raster Graphics Package (SRGP) which convert the high-level format to pixel image form.

Images – Digitized documents
• Ex. a digitized document is that produced by the scanner associated with a facsimile machine.
• Each complete page is scanned from left to right to produce a sequence of scan lines that start at the top of the page and end at the bottom.
• The vertical resolution of the scanning procedure is either 3.85 or 7.7 lines per mm, which is equivalent to approx. 100 or 200 lines per inch.
• As each line is scanned, it is digitized to a resolution of approx. 8 picture elements (known as pels with fax machines) per millimetre
• Fax machines use a single binary digit to represent each pel: 0 for a white pel and 1 for a black pel. About two million bits are produced for the digital representation of one page.
• The receiver reproduces the original image by printing out the received bit stream at an equivalent resolution.
• Fax machines are used to transmit black-and-white images such as printed documents, mainly text
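
The "two million bits per page" figure can be reproduced from the stated resolutions. The sketch assumes an A4 page of 210 mm × 297 mm; the page size is an assumption, as it is not given on the slide.

```python
PAGE_WIDTH_MM, PAGE_HEIGHT_MM = 210, 297   # assumed A4 page
PELS_PER_MM = 8                            # horizontal resolution

def fax_page_bits(lines_per_mm):
    """Bits for one page at 1 bit per pel."""
    pels_per_line = PAGE_WIDTH_MM * PELS_PER_MM
    scan_lines = PAGE_HEIGHT_MM * lines_per_mm
    return pels_per_line * scan_lines

print(round(fax_page_bits(3.85)))   # roughly 1.9 million bits
print(round(fax_page_bits(7.7)))    # roughly 3.8 million bits
```

Under this assumption the lower vertical resolution gives a little under two million bits per page, and the higher resolution doubles it.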

Images - Digitized Pictures
• Consider scanners digitizing monochromatic images.
• 8 bits per pixel give 256 grey levels, ranging from white to black through varying shades of grey.
• This gives better picture quality than a facsimile image.
• Color images: it is necessary to know how colors are formed and how the picture tubes in monitors work.
• Color gamut: the range of colors produced by combining the three primary colors red, green and blue (RGB).
• This mixing technique is called additive color mixing.
• When all three RGB values are zero, black is obtained; when all three are at maximum, white is obtained.
• This technique is suited to producing a color image on a black background, i.e., display applications.
• TV sets and computer monitors hence use RGB.
• Subtractive color mixing is used with CMY (cyan, magenta, yellow).
• Here white is produced with all three values at zero, and black is produced when all three are at the
maximum.
• This is suitable for producing a color image on a white background, i.e., printing applications.
• Printers and plotters - CMY
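
The relation between the two mixing schemes can be sketched as below. The complement rule (C = max - R, and so on) is a common idealized model and is an assumption here; the notes only state the black and white end points, which the sketch reproduces.

```python
def rgb_to_cmy(r, g, b, max_value=255):
    """Idealized conversion: each CMY value is the complement of an RGB value."""
    return (max_value - r, max_value - g, max_value - b)


print(rgb_to_cmy(0, 0, 0))        # black: all three CMY values at maximum
print(rgb_to_cmy(255, 255, 255))  # white: all three CMY values at zero
```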

Raster Scan Principles
• Picture tubes in most TV sets operate using Raster-scan.
• A finely focused electron beam (the raster) is scanned over the complete
screen.
• The scan starts at the top left of the screen and proceeds in discrete horizontal lines,
with a horizontal retrace after each line, until it reaches the bottom right corner. This is
progressive scanning.
• Each complete set of horizontal scan lines is a frame, made up of N individual scan lines:
N = 525 in North and South America and most of Asia, N = 625 in Europe and a number of other countries.
• The inside of the display screen has a light-sensitive phosphor coating,
which emits light when energized by the electron beam.
• The power in the electron beam decides the brightness. The level of power changes as each
line is scanned.
• The beam is turned off during retrace.

• BW picture tubes: a single electron beam is used with a white-sensitive phosphor.
• Color tubes: three separate, closely located beams (R, G, B) are used with a 2D matrix of pixels made of color-sensitive phosphors.
• Each set of three phosphors is called a phosphor triad.
• Each pixel is in the shape of a spot which merges with its neighbors.
• The spot size is 0.025 inches (0.635 mm), and when viewed from a distance a continuous color image is seen.
• To allow moving images to be shown, the persistence of the color produced by the phosphor is designed to decay very quickly. Hence
refreshing the screen is necessary.
• The light signal associated with each frame varies to show motion in a moving image, and stays the same
for still images.
• The frame refresh rate is high enough that the eye does not notice the refresh.
• A low refresh rate leads to flicker. A refresh rate of at least 50 times per second is required. The rate used follows the
frequency of the mains electricity supply, which is 60 Hz in America and Asia and 50 Hz in Europe.

• Analog TV: picture tubes operate in analog mode, with the amplitude of each signal varying as each line is scanned.
• Digital TV: color signals are in digital form and comprise a string of pixels, with a fixed number of pixels per scan line.
• A stored image is displayed by reading the pixels from memory in time-synchronism with the scanning process and
converting them into continuously varying analog form by means of a digital-to-analog converter.
• As the computer memory has to be scanned continuously for the display, a separate block of memory known as video RAM (VRAM)
is used to store the pixel image. The graphics program writes into this VRAM when a new image is to be shown on the
screen.
• Graphics program: creates the high-level version of the image interactively from keyboard and mouse input.
• Display controller: the part of the program that interprets sequences of display commands and converts them into displayed
objects by writing the appropriate pixel values into the video RAM, also known as the frame buffer or display refresh buffer.
• Video controller: a hardware subsystem that reads the pixel values stored in the VRAM in time-synchronism with the
scanning process and, for each set of pixel values, converts these into the equivalent set of red, green and blue analog signals for
output to the display.

Pixel depth
• The number of bits per pixel is known as the pixel depth.
• Decides the range of colors that can be produced.
• Ex. 12 bits: 4 bits per primary color, giving 4096 different colors
• Ex. 24 bits: 8 bits per primary color, giving about 16 million (2^24) colors, more than the eye can
discriminate
• Color look-up table (CLUT): a subset of the colors above (those the eye can distinguish)
is selected and stored in a table, and each pixel value is used as an
address to a location within the table which contains the corresponding
three color values.
• Ex. If each pixel is 8 bits and the CLUT contains 24-bit entries, 256 colors from a
palette of 16 million are selected and filled in the CLUT. Hence the amount of
memory required to store an image can be reduced significantly.
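
The saving from a CLUT can be estimated as follows. This is a minimal sketch: the 640 x 480 image size is an illustrative assumption, and the function names are hypothetical.

```python
def image_bytes_direct(width, height, bits_per_pixel=24):
    """Memory when every pixel stores its full color value."""
    return width * height * bits_per_pixel // 8


def image_bytes_clut(width, height, index_bits=8, entry_bits=24):
    """Memory when every pixel stores an index, plus the table itself."""
    pixel_bytes = width * height * index_bits // 8
    table_bytes = (2 ** index_bits) * entry_bits // 8
    return pixel_bytes + table_bytes


direct = image_bytes_direct(640, 480)   # 24 bits per pixel
indexed = image_bytes_clut(640, 480)    # 8-bit index into 256 x 24-bit entries
print(direct, indexed, round(direct / indexed, 2))
```

The table adds only 768 bytes, so the image needs roughly one third of the memory.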

Aspect Ratio
• Aspect ratio: the ratio of screen width to screen height; it determines the number of pixels per
line and the number of lines per frame
• The aspect ratio of current TV tubes is 4/3 with older tubes (on which PC monitors are based)
• and 16/9 with widescreen TV tubes
• US color TV standard: National Television Standards Committee (NTSC)
• Europe has three color TV standards: PAL (UK), CCIR (Germany),
SECAM (France)
• 525 lines (US ...) and 625 lines (European ...). Not all lines are used for display, as
some carry control and other information

Aspect Ratio
• Vertical resolution: 480 pixels with NTSC, 576 with the other three standards
• Horizontal resolution: 640 (480 x 4/3) pixels with NTSC, 768 (576 x 4/3) with the others
• This produces a lattice structure that is said to have square pixels
• Some lines are used to carry control and other information
• The memory required to store a single digital image can be high, starting from
307.2 kbytes for an image displayed on a VGA screen (640 x 480 pixels) with
8 bits per pixel.
• SVGA (Super VGA) uses 24 bits per pixel, so its images need more memory.
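
The figures above can be reproduced with the sketch below. The VGA values follow the slide; the 1024 x 768 resolution used for SVGA is an assumption, since the notes give only its pixel depth.

```python
def square_pixel_resolution(visible_lines, aspect=(4, 3)):
    """Horizontal and vertical pixel counts that give square pixels."""
    width_part, height_part = aspect
    return (visible_lines * width_part // height_part, visible_lines)


def image_kbytes(width, height, bits_per_pixel):
    """Memory for one stored image, in kbytes (1 kbyte = 1000 bytes)."""
    return width * height * bits_per_pixel / 8 / 1000


print(square_pixel_resolution(480))   # NTSC
print(square_pixel_resolution(576))   # PAL / CCIR / SECAM
print(image_kbytes(640, 480, 8))      # VGA, 8 bits per pixel
print(image_kbytes(1024, 768, 24))    # SVGA, assuming 1024 x 768 pixels
```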

Digital camera and scanners
• The figure shows an image being captured with a digital camera or scanner and
transferred directly to a computer.
• Alternatively, the image is stored in the camera itself and downloaded later.
• The image is captured by a solid-state device called an image sensor.
• This is a silicon chip with a two-dimensional grid of light-sensitive cells called photosites.
• The charge-coupled device (CCD) is a widely used image sensor.
• When the shutter is activated, each photosite stores the level of intensity of the light
that falls on it and converts it into an equivalent electrical charge.
• The level of charge is read and converted to a digital value using an ADC.
• In scanners, the image sensor comprises just a single row of photosites.
• Each line is exposed in time sequence with the scanning operation, and each
row of values is digitized.

DC and Scanner
• For color images, the color associated with each photosite, and hence each pixel position, is obtained using
one of the three methods below.
• 1. The surface of each photosite is coated with either a red, green or blue filter, so that its charge is determined only by the
level of red, green or blue light that falls on it. The coatings are arranged in a 3 x 3 grid structure. The color associated with a
photosite is based on its own output and the 8 cells surrounding it: the levels of the other two colors in each pixel are estimated
by an interpolation procedure involving all nine values.
• 2. Three separate exposures of a single image sensor are used, the first through
a red filter, the second through a green filter and the third through a blue filter. The color is based on the charge obtained with each of
the three filters. This cannot be used for video cameras, as three separate exposures are
required. It is used with high resolution still image cameras in studios, mounted on a tripod.
• 3. Three separate image sensors are used: one with all the photosites coated with a red filter, the second
coated with a green filter and the third coated with a blue filter. A single exposure is used, with the
incoming light split into three beams, each of which exposes a separate image sensor. This is used only in
professional quality, high resolution still and moving image cameras, since it is more costly
owing to the use of three separate sensors and the associated signal processing circuits.

DC and Scanner
• Once an image/frame has been captured and stored on the image sensor,
the charge stored at each photosite location is read and digitized.
• A CCD reads out the charge a single row at a time and transfers it to a readout
register. The charge on each photosite position is then shifted out, amplified
and digitized using an ADC. This is repeated until all rows are read out and digitized.
• When this output is sent directly to a computer, the bitmap can be loaded into
the frame buffer, ready for display.
• When stored in the camera, multiple images are stored and later
transferred to a computer. They can be stored in integrated circuit
memory, either on a removable card or fixed within the camera. A card
slot is used for transfer in the first case and a cable link in the second.
• File formats are used to store a set of images, e.g. TIFF for Electronic Photography (TIFF/EP).

AUDIO
• Audio: speech / music
• Generated by a microphone or a speech synthesizer.
• If generated by a synthesizer, it is already a digital signal ready to be stored in a computer.
• If generated by a microphone, the analog signal needs to be converted to a digital signal using an audio
signal encoder. If it is then sent to a speaker, which requires an analog signal, an audio
signal decoder is needed for the reverse conversion.
• The bandwidth of typical speech is 50 Hz to 10 kHz.
• Music: 15 Hz to 20 kHz
• The sampling rate used should be in excess of the Nyquist rate, which is 20 ksps for speech and
40 ksps for music.
• The number of bits per sample must be chosen so that the quantization noise generated by the
sampling process is at an acceptable level relative to the minimum signal level: 12 bits per sample
for speech and 16 bits for music.
• The sampling rate is often lowered in order to reduce the amount of memory required to store
a particular passage of music.
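
The sampling rates and the resulting bit rates can be worked out as in this sketch. The function names are illustrative; the bandwidths and bits per sample are those quoted above.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate: twice the highest frequency component."""
    return 2 * max_frequency_hz


def pcm_bit_rate(sampling_rate, bits_per_sample, channels=1):
    """Uncompressed bit rate in bits per second."""
    return sampling_rate * bits_per_sample * channels


speech_rate = nyquist_rate(10_000)            # 20 ksps
music_rate = nyquist_rate(20_000)             # 40 ksps
speech_bps = pcm_bit_rate(speech_rate, 12)    # 240 kbps
music_bps = pcm_bit_rate(music_rate, 16)      # 640 kbps, one channel
print(speech_rate, music_rate, speech_bps, music_bps)
```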

PCM Speech
• Earlier, the PSTN was a purely analog system, and voice signals were carried through the switches in analog form.
• With the introduction of digital networks, newer digital equipment was introduced. The speech bandwidth is 200 Hz to 3.4 kHz.
• The poor quality of the bandlimiting filters demanded a sampling rate of 8 kHz, although the Nyquist sampling rate
was 6.8 kHz (twice 3.4 kHz).
• 7 bits per sample were used in American countries and 8 bits in European countries to minimize the resulting
bit rate, giving 56 kbps and 64 kbps respectively.
• Modern systems use 8 bits, which gives better performance than 7 bits. The digitization procedure is pulse
code modulation (PCM), and the international standard relating to this is defined in ITU-T Recommendation G.711
• The encoder uses a compressor and the decoder uses an expander
• Considering the quantization procedure, linear quantization intervals produce the same level
of quantization noise irrespective of the magnitude of the input signal.
• The ear, however, is more sensitive to noise on quiet signals than on loud signals
• To reduce the effect of quantization noise with 8 bits per sample, a PCM system uses non-linear intervals, with
narrower intervals used for smaller amplitude signals than for larger signals. This is done by the compressor
and expander circuits. The overall operation is known as companding.
• The compressor and expander characteristics are shown in the figure

PCM Speech
• The compressor circuitry compresses the amplitude of the input signal.
• As the amplitude increases, the level of compression, and hence the
effective size of the quantization interval, increases
• The resulting compressed signal is then passed to the ADC, which in turn performs a linear
quantization on the compressed signal.
• At the receiver, each linear codeword is first fed to a linear DAC.
• The analog output from the DAC is then passed to the expander circuit, which
performs the reverse operation of the compressor circuit. Modern systems
perform these operations digitally.
• Two different compression-expansion characteristics are in use: µ-law (America) and
A-law (Europe).
• Hence a conversion operation is needed when the two systems communicate.
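
The effect of companding on quiet signals can be illustrated with the continuous µ-law curve. This is a sketch only: it uses the textbook logarithmic formula with µ = 255 (an assumption not given in the notes) and a simple uniform quantizer, not the segmented codec that G.711 specifies.

```python
import math

MU = 255  # assumed value of the mu-law parameter


def compress(x, mu=MU):
    """Compressor: maps x in [-1, 1] onto [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Expander: the inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_linear(value, bits=8):
    """Uniform quantizer with 2**(bits - 1) steps on each side of zero."""
    levels = 2 ** (bits - 1)
    return round(value * levels) / levels


quiet = 0.01  # a low-amplitude sample
linear_error = abs(quantize_linear(quiet) - quiet)
companded_error = abs(expand(quantize_linear(compress(quiet))) - quiet)
print(linear_error, companded_error)
```

For the quiet sample, the error after compress, quantize and expand is well below the error from linear quantization alone, which is the purpose of the non-linear intervals.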

CD Quality Audio
• Compact disks are digital storage devices for music and more general multimedia information streams.
• A standard is associated with these said to be CD-Digital Audio (CD-DA)
• Music – audible BW of 15Hz to 20KHz and min sampling rate of 40ksps.
• The actual rate is higher than this to allow for imperfections in the bandlimiting filter used, and so that the resulting bit rate is compatible with one of the higher
transmission channel bit rates available in public networks.
• One of the sampling rates used is 44.1 ksps, which means that the signal is sampled at intervals of about 23 microseconds.
• As the bandwidth of the recording channel on a CD is large, a high number of bits per sample can be used.
• The standard defines 16 bits per sample, which is the minimum requirement with music to avoid the effect of quantization noise.
• Linear quantization can be used with this number of bits, which yields 65536 equal quantization intervals.
• For stereophonic music, two separate channels are required, and hence the total bit rate required is double that for mono.
• Bit rate per channel = sampling rate x bits per sample
• = 44.1 x 10^3 x 16 = 705.6 kbps
• Total bit rate = 2 x 705.6 kbps = 1.411 Mbps
• Within a computer, multiples of this rate are used in order to reduce the access delay
• This bit rate is also used with CD-ROMs, which are widely used for the distribution of multimedia titles (a multimedia project shipped or sold to
consumers).
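
The CD-DA arithmetic above can be reproduced in a few lines. The storage-per-minute figure is an extra derived value added for illustration and does not appear in the notes.

```python
SAMPLING_RATE = 44_100   # samples per second
BITS_PER_SAMPLE = 16
CHANNELS = 2             # stereo

bit_rate_per_channel = SAMPLING_RATE * BITS_PER_SAMPLE   # 705 600 bps
total_bit_rate = CHANNELS * bit_rate_per_channel         # 1 411 200 bps
sample_interval_us = 1e6 / SAMPLING_RATE                 # about 22.7 microseconds
bytes_per_minute = total_bit_rate * 60 // 8              # storage for one minute

print(bit_rate_per_channel, total_bit_rate)
print(round(sample_interval_us, 2), bytes_per_minute)
```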

Synthesized audio
• When digitized, audio of any form can be stored in a computer.
• The amount of memory required to store the digitized audio
waveform can be very large even for relatively short passages.
• It is for this reason that synthesized audio is used by multimedia
applications, as the size of this type of audio is 2 to 3 orders of
magnitude less than that required to store the equivalent digitized
waveform.
• It is easier to edit synthesized audio and to mix several passages
together.

Audio Synthesizer
• Three main components:
• Computer (with application programs), keyboard (based on that of a piano) and a set of sound
generators.
• The computer accepts input from the keyboard and outputs to the sound generators, which produce the
corresponding waveform via DACs to drive the speakers
• Each key, when pressed, produces a different codeword (message) which is read by a computer
program
• The pressure applied on the key is also of importance; the message indicates the complete
detail.
• The control panel has switches and sliders that give the computer program additional information, such as
the volume of the generated output and selected sound effects to be associated with each key.
• The secondary storage interface allows the entire piece of audio to be stored on secondary storage such as floppy disk or CD
• Programs also allow editing and mixing of several stored passages
• The sequencer program associated with the synthesizer then ensures that the resulting
integrated sequence of messages is synchronized and output to the sound generators

Audio Synthesizer
• The keyboard also has keys for different instruments (e.g. guitar)
• To distinguish between these, a standard set of codewords is used (for both
input and output)
• These are defined in a standard: Musical Instrument Digital Interface
(MIDI)
• In addition to the messages used by a synthesizer, the standard defines the types of
connectors, cables and electrical signals that are used to connect any
type of device to the synthesizer
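
As an illustration of what such a message looks like, the sketch below builds MIDI channel voice messages for pressing and releasing a key. The byte layout (status byte 0x90 or 0x80 plus channel, then note number and velocity) follows the MIDI 1.0 message format; the helper names and the example values are hypothetical and not taken from the notes.

```python
def note_on(channel, note, velocity):
    """Three-byte message sent when a key is pressed."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Three-byte message sent when a key is released."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def _check(channel, note, velocity):
    if not (0 <= channel < 16 and 0 <= note < 128 and 0 <= velocity < 128):
        raise ValueError("channel must be 0-15, note and velocity 0-127")


# Middle C (note 60) on channel 0, pressed fairly hard (velocity 100).
print(note_on(0, 60, 100).hex(), note_off(0, 60).hex())
```

The velocity byte carries the key pressure information mentioned above, and a few bytes per note is what makes synthesized audio so much smaller than a digitized waveform.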

Text and Image compression
• Requirement: reduction in the volume of information to be transmitted
• Compression is applied to text, image, speech, audio and video
either to reduce the volume of information or to reduce the bandwidth required to transmit it
• Compression principles:
• Source encoders and destination decoders
• At the source, compression is done before transmission by the source encoder; at the
destination, decompression is done by the destination decoder to
recover the information
• The time required for compression and decompression is not always critical for text and
images, so both are done in software
• For audio and video, the time required by software is often not
acceptable, and hence the two algorithms must be run on special processors.

Compression Principles- Lossless and Lossy
compression
• Lossless compression: when decompressed, there should be no loss of
data. It is said to be reversible. Example application: transfer of a text
file
• Lossy compression: the aim is not to reproduce an exact copy of
the source information after decompression but rather a version of it
that is perceived by the recipient as a true copy.
• The higher the level of compression, the coarser the approximation.
Applications: transfer of audio, image and video files
• The human eye is generally insensitive to such missing data

Compression Principles- Entropy encoding
• It is lossless and independent of the type of information being compressed
• Two examples: run-length encoding and statistical encoding
• In some applications these two are combined, and in others they are used separately.
• Run-length encoding:
• Typical applications are those where the source information comprises long substrings of the same character or binary
digit
• Instead of independent codewords/bits, the codeword for the character/bit and the number of repetitions
are transmitted. The destination, which knows the list of codewords, repeats it the required number of times.
• In applications where there is a limited number of substrings, each is given a separate codeword.
• The final bit string is then a combination of the appropriate codewords.
• Ex. The binary string produced by the scanner in a facsimile machine for a typed document generally contains
long substrings of either binary 0s or 1s. Ex. 000000011111111111110000000000. This can be
represented as 0,7,1,13,0,…
• If it is agreed that the string always starts with 0, then it is sufficient to transmit 7,13….
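
A minimal sketch of run-length encoding for a binary scan line is given below. The test string is built from explicit run lengths (7 zeros, 13 ones, 10 zeros) chosen to match the first runs quoted in the example; the function names are illustrative.

```python
from itertools import groupby


def run_lengths(bits):
    """Return (symbol, run length) pairs for a string of '0' and '1' characters."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]


def run_lengths_white_first(bits):
    """Return run lengths only, relying on the convention that the first run is 0s."""
    runs = run_lengths(bits)
    lengths = [length for _, length in runs]
    if runs and runs[0][0] != "0":
        lengths.insert(0, 0)  # zero-length run of 0s keeps the convention
    return lengths


scan_line = "0" * 7 + "1" * 13 + "0" * 10
print(run_lengths(scan_line))              # symbol sent with each run
print(run_lengths_white_first(scan_line))  # symbols implied by position
```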

Compression Principles
• Statistical encoding
• Generally, ASCII codewords are used for the transmission of strings.
• Not all characters have the same frequency of occurrence, i.e. they are not equally
probable. For example, the frequency of occurrence of A > that of P > that of Z
• Statistical encoding exploits this by using variable-length codewords,
with the shortest codewords used for the most frequently occurring symbols.
• Identifying codeword boundaries at the destination is a challenge;
if a boundary is missed, the bit stream is interpreted wrongly.
• To support this, the prefix property is used: no codeword is the prefix of another codeword.
• Ex. The Huffman encoding algorithm uses this.

Compression Principles
• Statistical encoding
• The theoretical minimum average number of bits required to transmit a
particular source stream is known as the entropy of the source, and is computed
using Shannon's formula:
• Entropy $H = -\sum_{i=1}^{n} P_i \log_2 P_i$
• n is the number of different symbols in the source stream and $P_i$ is the probability of
occurrence of symbol i.
• Hence the efficiency of an encoding scheme is the ratio of the entropy of the
source to the average number of bits per codeword that are required with
the scheme.
• Average number of bits per codeword $= \sum_{i=1}^{n} N_i P_i$, where $N_i$ is the number of bits in the codeword for symbol i
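
The two formulas can be evaluated as in this sketch. The four-symbol source and its codeword lengths are illustrative values chosen so that the probabilities are powers of 1/2; they are not taken from the slide.

```python
import math


def entropy(probabilities):
    """Shannon entropy H in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def average_codeword_length(probabilities, lengths):
    """Sum of N_i * P_i over all symbols."""
    return sum(n * p for n, p in zip(lengths, probabilities))


def efficiency(probabilities, lengths):
    """Ratio of source entropy to average codeword length."""
    return entropy(probabilities) / average_codeword_length(probabilities, lengths)


probs = [0.5, 0.25, 0.125, 0.125]            # hypothetical symbol probabilities
print(entropy(probs))                        # 1.75 bits per symbol
print(efficiency(probs, [1, 2, 3, 3]))       # variable-length codewords
print(efficiency(probs, [2, 2, 2, 2]))       # fixed 2-bit codewords
```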

Text Compression
• Three types of text: formatted, unformatted and hypertext
• The loss of a single character in text could modify the meaning, and hence text
compression must be lossless. Entropy encoding, and in practice statistical encoding, is
used.
• Two methods are used with statistical encoding: 1. using single characters as the basis for the codewords and 2.
using variable-length strings of characters
• Examples of type 1: Huffman and arithmetic coding algorithms
• Example of type 2: Lempel-Ziv (LZ) algorithm
• Two types of coding are used for text
• 1. For text with known characteristics in terms of the characters used and their relative frequencies of
occurrence. Here an optimum set of variable-length codewords is used.
• The shortest codewords are given to the most frequently occurring characters. The resulting set of codewords, agreed upon by
the communicating parties, is used for all transmissions; this is static coding
• 2. The second type is for more general applications, where the type of text may vary from one transmission
to another
• The optimum set of codewords varies for each transmission and is derived as the
transfer takes place. It is decided dynamically, but in such a way that the receiver is
able to arrive at the same set of codewords as the sender. This is dynamic or
adaptive coding

Text Compression – Static Huffman coding
• The character string to be transmitted is analyzed and the frequency of each character is noted.
• An unbalanced tree, with some branches shorter than others, is generated.
• The wider the spread of character frequencies, the more unbalanced the tree
• Huffman code tree
• Binary tree with a root node, branch nodes and leaf nodes
• Ex. String: AAAABBCD
• Total bits: 4 x 1 + 2 x 2 + 1 x 3 + 1 x 3 = 14 bits
• Prefix property
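
The 14-bit total for AAAABBCD can be reproduced with a small sketch that builds the Huffman tree using a priority queue and records only the codeword length of each character. This is one common way to implement the algorithm, not necessarily the procedure followed in the lecture.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(text):
    """Return {character: codeword length in bits} for a static Huffman code."""
    freqs = Counter(text)
    if len(freqs) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freqs)): 1}
    tiebreak = count()                       # keeps heap entries comparable
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, depths2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


message = "AAAABBCD"
lengths = huffman_code_lengths(message)
total_bits = sum(lengths[ch] for ch in message)
print(lengths, total_bits)
```

With 7-bit ASCII the same string would need 8 x 7 = 56 bits.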

Arithmetic Coding
• Huffman coding achieves the Shannon value only if the character/symbol probabilities are
all integer powers of ½.
• As this is rarely the case in practice, the set of codewords produced is rarely optimum
• Codewords produced by arithmetic coding always achieve the Shannon
value.
• Arithmetic coding is more complicated than Huffman coding, and hence only the basic static
coding mode is discussed.
• Ex. A message comprising a string of characters with probabilities of
• e = 0.3, n = 0.3, t = 0.2, w = 0.1, . = 0.1. A period is used as the terminating
character at the end of each character string so that the decoder can detect
the end of the string

Arithmetic Coding
• In Huffman coding, a separate codeword is used for each character
• In arithmetic coding, a single codeword is used for each encoded string of characters.
• The range 0 to 1 is divided into segments, where each segment represents a different character in the stream and the
size of each segment is determined by the probability of the related character.
• Figure explanation
• 0.809 is obtained as 0.8 + 0.3 x 0.03 (30% of the range 0.83 - 0.8)
• Consider that 0.8161 is transmitted as the codeword
• The number of decimal digits in the final codeword increases linearly with
the number of characters in the string to be encoded
• Generally a complete message is fragmented into small strings. Each string
is encoded separately and the codeword is transmitted
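
The interval narrowing can be traced with the sketch below. Two things are assumptions: the message is taken to be "went." and the segments are laid out in the order e, n, t, w, period. Both are consistent with the values 0.809 and 0.8161 quoted above, but neither is stated explicitly in the notes. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Probabilities from the slide, in the assumed segment order.
PROBS = [("e", "0.3"), ("n", "0.3"), ("t", "0.2"), ("w", "0.1"), (".", "0.1")]


def build_segments(probs):
    """Assign each symbol a sub-range [low, high) of the range 0 to 1."""
    segments, low = {}, Fraction(0)
    for symbol, p in probs:
        high = low + Fraction(p)
        segments[symbol] = (low, high)
        low = high
    return segments


def encode_interval(message, segments):
    """Narrow the range once per character; any value in the result is a codeword."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        seg_low, seg_high = segments[symbol]
        low, high = low + width * seg_low, low + width * seg_high
    return low, high


def decode(codeword, segments, terminator="."):
    """Recover the string by repeatedly locating the segment holding the value."""
    value, out = Fraction(codeword), []
    while not out or out[-1] != terminator:
        for symbol, (seg_low, seg_high) in segments.items():
            if seg_low <= value < seg_high:
                out.append(symbol)
                value = (value - seg_low) / (seg_high - seg_low)
                break
    return "".join(out)


segments = build_segments(PROBS)
low, high = encode_interval("went.", segments)
print(float(low), float(high))      # final range for the assumed message
print(decode("0.8161", segments))   # codeword from the slide
```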

Lempel Ziv Coding
• Codewords are calculated for strings of characters rather than for single characters
• For the compression of text, a single table containing all possible character strings (i.e. words) is held
by both sender and receiver
• Instead of the codewords for the text, the index of the word in the table is transmitted; the receiver uses the index to
look up the string in its own table and reconstruct the text
• The table is used as a dictionary, and the LZ algorithm is known as a dictionary-based compression
algorithm.
• If a word-processing dictionary holds, say, 25000 words, 15 bits are needed (32768 combinations possible).
• For the word "multimedia", only 15 bits are needed instead of 70 bits with 7-bit ASCII codewords,
resulting in a compression ratio of 4.7:1
• Shorter words have a lower compression ratio than longer words.
• A requirement of the LZ algorithm is that both sender and receiver have a copy of the dictionary
• It is inefficient if the text uses only a small subset of the words stored in the dictionary.
• Dynamically developing the dictionary is a solution to overcome this.

Lempel-Ziv-Welch coding
• The dictionary is created dynamically.
• Initially the table is filled with the 128 ASCII characters, and as
new words are found they are inserted into the table.
• 8-bit codewords are used initially, and the codeword length is extended as the table grows
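
A sketch of the dynamic dictionary idea is given below. It implements the common string-based form of LZW, in which every new character sequence (not only whole words) is added to the table. Codes are returned as integers; packing them into 8-bit and longer codewords is left out. Both points are simplifications relative to the description above.

```python
def lzw_encode(text):
    """Encode a 7-bit ASCII string as a list of dictionary indices."""
    table = {chr(i): i for i in range(128)}   # initial 128 ASCII entries
    current, codes = "", []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate               # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = len(table)     # new entry gets the next free index
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes):
    """Rebuild the text, growing the same dictionary as the encoder did."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(128)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                                 # code refers to the entry being built
            entry = previous + previous[0]
        out.append(entry)
        table[len(table)] = previous + entry[0]
        previous = entry
    return "".join(out)


codes = lzw_encode("ABABABA")
print(codes, lzw_decode(codes))
```

The receiver never needs a copy of the dictionary in advance, because it can derive each new entry from the codes it has already received.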

Image compression
• Images can be transmitted in one of two forms. The first is as a program written
in a programming language
• In this case the transmission is lossless, as it is text that is transmitted
• The other form is bit-map format, where the compression used may be lossy
• Two different schemes are used for bit-map images
• 1. Run-length and statistical encoding
• This is lossless and is used for digitized documents transmitted by facsimile
• 2. A combination of transform, differential and run-length encoding

Graphics Interchange format
• Used extensively on the Internet for the representation and compression of graphical images.
• The source images use 24 bits per pixel, i.e. 8 bits per primary color.
• From the 2^24 available colors, 256 are selected and a table is made.
• The 8-bit index into the table is sent for each pixel instead of the 24-bit color value
• Global color table: the table of colors relates to the whole image
• Local color table: the table of colors relates to a portion of the image
• The figure shows how LZW coding is applied in GIF
• GIF allows an interlaced mode for low bit rate channels
• The image data is organized so that the decompressed image is built up in a
progressive way
• The compressed data is divided into four groups: the first contains 1/8 of the total
compressed image, the second a further 1/8, the third 1/4 and the last the remaining 1/2.
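
The four groups can be illustrated by listing which image rows fall into each one. The row pattern used here (every 8th row from row 0, every 8th from row 4, every 4th from row 2, every 2nd from row 1) is the interlace scheme of the GIF89a specification and is not spelled out in the notes; the function name is illustrative.

```python
def gif_interlace_rows(height):
    """Return four lists of row indices, one per interlace pass."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]   # (first row, row step)
    return [list(range(start, height, step)) for start, step in passes]


groups = gif_interlace_rows(16)
for number, rows in enumerate(groups, start=1):
    print(number, rows, len(rows) / 16)
```

After the first pass the receiver already has a coarse version of the whole image, which later passes refine.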

Tagged image file format
• This supports up to 48 bits per pixel: 16 bits each for R, G and B
• Images are transferred over networks in a number of different formats.
• Each format is indicated by a code number
• Code 1: uncompressed format
• Code 5: LZW-compressed
• Codes 2, 3 and 4 are used with digitized documents
• The LZW compression algorithm is the same as in GIF
• The basic color table starts with 256 entries and extends up to 4096 entries

Digitized documents
• 1 bit per pixel cannot be considered with increased resolution
• ITU-T has given 4 standards: T2 (Group 1), T3 (Group 2), T4 (Group 3) and T6 (Group 4)
• T4 (Group 3): used over the analog PSTN; suits simple graphics
• Overscanning: all lines start with a minimum of one white pel
• The receiver therefore knows that the first run length always relates to white pels
• The termination-codes table and the make-up codes table were formed as a result of
extensive analysis of typical transmissions
• Modified Huffman codes
• EOL codes: used to check for corruption
• A negative compression ratio is seen when used for high resolution images.
• T6 (Group 4): modified-modified READ (MMR) coding

Two dimensional code table contents
Mode         Run length to be encoded   Abbreviation   Codeword
Pass         b1b2                       P              0001 + b1b2
Horizontal   a0a1, a1a2                 H              001 + a0a1 + a1a2
Vertical     a1b1 = 0                   V[0]           1
             a1b1 = -1                  VR[1]          011
             a1b1 = -2                  VR[2]          000011
             a1b1 = -3                  VR[3]          0000011
             a1b1 = +1                  VL[1]          010
             a1b1 = +2                  VL[2]          000010
             a1b1 = +3                  VL[3]          0000010
Extension                                              0000001000

Compression Principles - Source encoding
• Exploits a particular property of the source information to produce an alternative
form of representation that is either a compressed version of the original form or is more
amenable to the application of compression. Two examples are discussed here
• Differential encoding:
• Used extensively in applications where the amplitude of a value or signal covers a
large range but the difference in amplitude between successive values/signals is relatively small.
• Instead of a large set of codewords for the amplitude, a smaller set of codewords
can be used, each of which indicates only the difference in amplitude between the
current value/signal being encoded and the preceding value. Ex. The digitization of an analog
signal requires 12 bits to obtain the required dynamic range, but only 3 bits are required
to express the difference, leading to 75% of the bandwidth being saved.
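
Differential encoding and the 75% figure can be sketched as follows. The sample values are hypothetical 12-bit amplitudes chosen so that successive differences fit in 3 bits.

```python
def differential_encode(samples):
    """Send the first value relative to zero, then only successive differences."""
    previous, differences = 0, []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def differential_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    value, samples = 0, []
    for difference in differences:
        value += difference
        samples.append(value)
    return samples


def bandwidth_saving(bits_per_value, bits_per_difference):
    """Fraction of bandwidth saved by sending differences only."""
    return 1 - bits_per_difference / bits_per_value


samples = [2000, 2002, 2001, 2004]          # hypothetical 12-bit sample values
diffs = differential_encode(samples)
print(diffs, differential_decode(diffs), bandwidth_saving(12, 3))
```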

Compression Principles -Transform encoding
• Transform encoding involves transforming the source information from one form into another, the new form
lending itself more readily to the application of compression
• There is no loss of information associated with the transformation operation itself; it is used in
applications involving images and video.
• Ex. The digitization of a monochromatic image produces a 2D matrix of pixel values,
each of which represents the level of gray at a specific pixel position.
• The magnitude of each pixel value may vary.
• As the matrix of pixel values is scanned, the rate of change in magnitude varies:
• Zero rate of change: all pixel values are the
same
• Low rate of change: for example, one half of the matrix differs from the other half
• High rate of change: each pixel magnitude changes from one location to the next

• The rate of change in magnitude as the matrix is traversed gives rise to the term
spatial frequency
• Scanning the pixels of an image in the horizontal direction gives
rise to horizontal frequency components, and scanning in the vertical direction
gives rise to vertical frequency components
• The human eye is less sensitive to higher spatial frequency components
than to lower spatial frequency components
• Higher frequency components that cannot be detected by the eye can be
eliminated, thereby reducing the volume of information without
visibly degrading the quality of the original image.
• The transformation of a 2D matrix of pixel values into an equivalent
matrix of spatial frequency components can be carried out using a
mathematical technique known as the discrete cosine transform (DCT).
• The transformation itself is lossless apart from some rounding errors.
• Once the spatial frequency components, known as coefficients, have been computed,
those below a threshold can be dropped. It is at this point that
some loss is experienced.

Source encoding
• Three properties of a color source
• Brightness (termed luminance)
• Represents the amount of energy that stimulates the eye and varies on a
grayscale from black to white
• Independent of the color of the source.
• Hue (chrominance)
• Represents the actual color of the source, as each color has a different
frequency/wavelength that the eye uses to distinguish colors
• Saturation (chrominance)
• Strength of the color
• A pastel color has a lower level of saturation than a color such as red.
• A saturated color such as red has no white in it

Source encoding
• The proportion 0.299R + 0.587G + 0.114B produces the color white
on the display screen
• Luminance signal (Y): a measure of the amount of white light the source
contains
• Two other signals, blue chrominance (Cb) and red chrominance (Cr), are
used to represent the coloration (hue and saturation). These are
obtained from the two color difference signals.
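
The three signals can be computed as in the sketch below. The luminance weights are those given above. Taking the color difference signals as Cb = B - Y and Cr = R - Y, without the scaling factors that individual TV standards apply, is an assumption made to keep the example short.

```python
def rgb_to_ycbcr(r, g, b):
    """Return (Y, Cb, Cr) for color components in the range 0.0 to 1.0."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    cb = b - y                              # blue color difference (unscaled)
    cr = r - y                              # red color difference (unscaled)
    return y, cb, cr


print(rgb_to_ycbcr(1.0, 1.0, 1.0))   # white: full luminance, no coloration
print(rgb_to_ycbcr(1.0, 0.0, 0.0))   # saturated red
```

For white (and for any shade of gray) both chrominance signals are zero, so a monochrome receiver can use Y alone.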

Joint Photographic Experts Group
• JPEG is defined in the international std IS 10918
• A range of different compression modes is defined, and one is chosen according to the application
• The discussion here is on the lossy sequential mode (baseline mode), as it is used for both monochromatic and
color digitized images
• There are 5 stages, as in the figure
• Image/block preparation
• Input forms: monochrome, CLUT, RGB, YCbCr
• Because each DCT coefficient calculation involves every pixel value in the matrix being transformed, the image is first divided into 8 x 8 blocks
rather than being transformed as a whole.
• A formula is used for the conversion of the 2D input matrix P[x,y] into the transformed matrix F[i,j]
• x, y, i and j vary from 0 to 7

• All 64 values in the input matrix contribute to each entry in the transformed matrix
• When i = 0 and j = 0, the two cosine terms (horizontal and vertical frequency) both become 1, and
hence F[0,0] is simply a function of the summation of all values in the input matrix.
It is proportional to the mean of all 64 values and is known as the DC coefficient
• All other entries have a frequency coefficient associated with them, either horizontal or vertical; these are known as AC
coefficients
• For j = 0, only horizontal frequency coefficients are present
• For i = 0, only vertical frequency coefficients are present
• In all other locations of the transformed matrix, both horizontal and vertical frequency coefficients are present to varying
degrees
• In regions of a single color, all blocks have the same DC coefficient and only a few AC coefficients.
• Regions with color transitions have differing DC coefficients and a larger number of AC coefficients
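
The transformation can be sketched directly from its definition. The formula used here is the standard 8 x 8 forward DCT, F[i,j] = 1/4 C(i) C(j) ΣΣ P[x,y] cos((2x+1)iπ/16) cos((2y+1)jπ/16) with C(0) = 1/√2 and C(k) = 1 otherwise; it is supplied as an assumption because the notes refer to the formula without reproducing it. The direct four-loop form is slow and is meant only to show the properties listed above.

```python
import math


def dct_8x8(p):
    """Forward DCT of an 8 x 8 block p[x][y], returning f[i][j]."""
    n = 8

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    f = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (p[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / 16)
                              * math.cos((2 * y + 1) * j * math.pi / 16))
            f[i][j] = 0.25 * c(i) * c(j) * total
    return f


flat_block = [[100] * 8 for _ in range(8)]   # block of a single gray level
coefficients = dct_8x8(flat_block)
print(round(coefficients[0][0], 6))          # DC coefficient
```

For a block of uniform value, only the DC coefficient is non-zero; with this scaling it equals 8 times the mean pixel value.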
• Quantization:
• There is very little loss of information during the DCT phase; losses are only due to the use of fixed point arithmetic.
• The main source of loss occurs during the quantization and entropy encoding stages, where the compression takes place
• The human eye responds primarily to the DC coefficient and the lower spatial frequency coefficients.
• If the magnitude of a higher frequency coefficient is below a certain threshold, the eye will not detect it. Such coefficients are set to
zero in the quantization phase. They cannot be retrieved in the decoding phase
• Instead of comparing each coefficient with the threshold, a division by the threshold value is used. If the
quotient rounds to zero, the coefficient is dropped.
• If the divisor used is 16, 4 bits are saved per coefficient
• The threshold value varies for each of the 64 DCT coefficients. These values are held in the quantization
table.
• The choice of threshold values is important, as it is a compromise between the level of compression that is
required and the resulting amount of information loss.
• Two default tables, one for luminance and one for chrominance coefficients, can be used, or customized tables are allowed.
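
Quantization by division can be sketched as below. The coefficient and threshold values are hypothetical and a 2 x 2 matrix is used for brevity; a real JPEG block is 8 x 8 and uses the table values defined in the standard.

```python
def quantize(coefficients, table):
    """Divide each DCT coefficient by its threshold and round to an integer."""
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, table)]


def dequantize(quantized, table):
    """Decoder side: multiply back; values rounded away are not recovered."""
    return [[v * q for v, q in zip(v_row, q_row)]
            for v_row, q_row in zip(quantized, table)]


coefficients = [[120, 60], [70, 7]]    # hypothetical DCT coefficients
table = [[10, 10], [15, 16]]           # hypothetical threshold values
quantized = quantize(coefficients, table)
print(quantized, dequantize(quantized, table))
```

The coefficient 7, which is below its threshold of 16, becomes zero and cannot be restored by the decoder; 70 comes back as 75, showing the approximation introduced.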

• Entropy encoding
• Consists of four steps: vectoring, differential encoding, run-length encoding and Huffman
encoding
• Vectoring:
• Conversion of the 2D matrix into a single-dimensional vector, as all the encoding schemes operate on a one-dimensional array. This is
vectoring
• Zigzag scanning is used to order the coefficients
• Differential encoding
• Applied to the DC coefficients: only the difference between successive DC coefficients is transmitted
• If the DC coefficients are 12, 13, 11, 11, 10, ..., the transmitted values are 12, 1, -2, 0, -1, .... The first is encoded relative to zero.
• The difference values are encoded as (SSS, value), where SSS is the number of bits required to encode the
value and the value field holds the actual bits that represent the value.
• Positive value: unsigned binary form
• Negative value: complement form

• Run-length encoding
• The AC coefficients are encoded in the form of a string of pairs of values. Each
pair is made up of (skip, value), where skip is the number of zeros in the
run and value is the next non-zero coefficient
• Ex. (0,6)(0,7)(0,3)(0,3)….
• Huffman encoding
• The bits in the SSS field are sent in Huffman-encoded form
• Due to the use of variable-length codewords in the entropy encoding stage,
this is also known as the variable-length coding stage
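
The first three steps can be sketched as follows. Two details are assumptions: the direction of the zigzag path (first along the top row, then down the diagonal) and the use of (0,0) as an end-of-block marker after the last non-zero AC coefficient. The Huffman step is omitted, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n block in zigzag scan order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Cells are grouped by anti-diagonal; alternate diagonals run in opposite directions.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else -p[0]))


def vectorize(block):
    """Vectoring: flatten a square block into a 1D list using the zigzag scan."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


def dc_differences(dc_values):
    """Differential encoding of successive DC coefficients (first relative to 0)."""
    previous, out = 0, []
    for dc in dc_values:
        out.append(dc - previous)
        previous = dc
    return out


def sss(value):
    """Number of bits needed for the magnitude of a difference value."""
    return abs(value).bit_length()


def run_length_pairs(ac_coefficients):
    """Encode AC coefficients as (skip, value) pairs, ending with (0, 0)."""
    pairs, skip = [], 0
    for value in ac_coefficients:
        if value == 0:
            skip += 1
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append((0, 0))                      # assumed end-of-block marker
    return pairs


diffs = dc_differences([12, 13, 11, 11, 10])
print(zigzag_order()[:6])
print(diffs, [sss(d) for d in diffs])
print(run_length_pairs([6, 7, 3, 3, 0, 0, 2, 0, 0, 0]))
```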

• Frame building:
• The decoder needs a defined structure in order to decode the data
• Hence the definition of the structure of the total bit stream is known as a frame
• A frame consists of a number of scans
• The decoder works in the reverse order of the encoder

• Inverse DCT -

Video
• Video features in a range of multimedia applications
• Entertainment: broadcast TV and VCR/DVD recordings
• Interpersonal: video telephony and video conferencing
• Interactive: windows containing short video clips
• The video quality requirement varies with the application: a chat uses a small window,
video playback uses a big screen
• So a set of standards is available rather than a single one

Broadcast Television
• Picture tubes
• RGB
• NTSC-525, PAL/CCIR/SECAM – 625
• Refresh rate: 60 or 50 times per second
• Broadcast TV operates slightly differently from a computer monitor in terms of the scanning sequence used and
the choice of color signals, although both follow the same
principles.
• Scanning sequence
• Although a minimum refresh rate of 50 times per second is needed to avoid flicker, a rate of
25 frames per second is sufficient for the eye to perceive smooth motion.
• To reduce the transmission bandwidth, each frame is transmitted in two halves, each
termed a field: the first with only the odd scan lines and the second with only the even scan lines.
• The two fields are received and integrated in the receiver.
• This technique is known as interlaced scanning.

• In the 525-line system, each field comprises 262.5 lines, of which 240 are visible.
• In the 625-line system, each field comprises 312.5 lines, of which 288 are visible.
• The remaining lines are used for other purposes.
• The fields are refreshed alternately at 60/50 fields per second, i.e. 30/25 frames per second.
• The effect of a 60/50 Hz refresh rate is thus achieved with only half the transmission bandwidth.

VIDEO
• Color Signals:
• Color TVs must also be able to display monochrome transmissions.
• Black-and-white TVs must be able to receive a color broadcast and display it as a high-quality monochrome picture.
• Hence a set of color signals different from R, G and B was selected for color TV broadcasts.
• Three properties of a color source
• Brightness (termed luminance)
• Represents the amount of energy that stimulates the eye and varies on a grayscale from black to white
• Independent of the color of the source.

VIDEO
• Hue (chrominance)
• Represents the actual color of the source; each color has a different frequency/wavelength, which the eye uses to distinguish colors
• Saturation (chrominance)
• Strength of the color
• A pastel color has a lower level of saturation than a color such as red.
• A saturated color such as red has no white light in it.

VIDEO-Chrominance
• By varying the magnitudes of the three electrical signals that energize the R, G and B phosphors, different colors are produced.
• White is produced on the display screen when the three signals are in the proportion 0.299R + 0.587G + 0.114B.
• Since the luminance of a source is a function of the amount of white light it contains, the luminance of any color source can be determined by summing its three primary components in this proportion.
• Ys = 0.299Rs + 0.587Gs + 0.114Bs, where Ys is the amplitude of the luminance signal
• Rs, Gs, Bs: magnitudes of the three color component signals that make up the source
• The luminance signal is thus a measure of the amount of white light the source contains.
• Two other signals, the blue chrominance (Cb) and the red chrominance (Cr), represent the coloration (hue and saturation) of the source. They are obtained from two color difference signals:
• Cb = Bs - Ys and Cr = Rs - Ys
• As Ys is subtracted in both cases, they contain no brightness information.
• Gs can be readily computed from Ys, Cb and Cr.
• The combination of the three signals Y, Cb and Cr contains all the information needed to describe a color signal.
• This is compatible with monochrome televisions, which use the luminance signal only.
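
The relations above can be checked with a small Python sketch. It uses the unscaled color difference signals exactly as defined on this slide (no PAL/NTSC scaling), and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Luminance and the two unscaled color difference signals."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = b - y                            # blue chrominance
    cr = r - y                            # red chrominance
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Recover R and B directly, then G from the luminance equation."""
    r = cr + y
    b = cb + y
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


print(rgb_to_ycbcr(1.0, 1.0, 1.0))        # white: full luminance, no chrominance
print(ycbcr_to_rgb(*rgb_to_ycbcr(0.2, 0.5, 0.8)))
```

For white (R = G = B) both color difference signals are zero, which is why a monochrome receiver can use Y alone.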

VIDEO- Chrominance components
• The two systems (PAL and NTSC) differ slightly in the magnitudes used for the two chrominance signals.
• The bandwidth available for monochrome and color TV broadcasts is the same.
• To fit the Y, Cb and Cr signals in the same bandwidth, the three signals must be combined for transmission. The result is the composite video signal.
• If the two color difference signals are transmitted at their original magnitudes, the amplitude of the luminance signal can become greater than that of the equivalent monochrome signal. This degrades the quality of the monochrome picture and is therefore unacceptable.
• To overcome this, the magnitudes of the two color difference signals are scaled down. A different scaling factor is used for each, as they have different levels of luminance.
• The color difference signals are referred to by different symbols in each system: U and V in PAL, I and Q in NTSC.

In PAL, the scaling factors used for the three signals are:
Y = 0.299R + 0.587G + 0.114B
U = 0.493 (B - Y)
V = 0.877 (R - Y)

VIDEO – Signal Bandwidth
• The bandwidth of the transmission channel used for color broadcasts must be the same as that used for monochrome broadcasts.
• So the two chrominance signals must fit within the same bandwidth as the luminance signal.
• The baseband spectrum of a color TV signal in both systems is shown in the figure.
• Most of the energy of the luminance signal is at lower frequencies and hence occupies the lower part of the spectrum.
• To avoid interference, the chrominance signals are transmitted in the upper part of the luminance frequency spectrum using two separate subcarriers.
• To restrict the bandwidth used to the upper part of the spectrum, a smaller bandwidth is used for both chrominance signals.
• The two subcarriers have the same frequency but are 90 degrees out of phase with each other, and each is modulated independently by one of the chrominance signals. Hence they can share the same portion of the luminance frequency spectrum.

VIDEO – Signal Bandwidth
• In the NTSC system, the eye is more responsive to the I signal than to the Q signal. To maximize the use of the available bandwidth while minimizing the level of interference with the luminance signal, the I signal has a modulated bandwidth of about 2 MHz and the Q signal about 1 MHz.
• In the PAL system, the larger luminance bandwidth (about 5.5 MHz, relative to 4.2 MHz in NTSC) allows both the U and V chrominance signals to have the same modulated bandwidth of about 3 MHz.
• The audio/sound signal is transmitted using one or more separate subcarriers, all of which lie just outside the luminance signal bandwidth.
• The main audio subcarrier carries mono sound and the auxiliary subcarriers carry stereo sound. When these are added to the baseband video signal, the resulting composite signal is called the complex baseband signal.

Digital Video
• In multimedia applications, video needs to be in digital form so that it can be stored in computer memory, edited and integrated with other media types.
• In analog TV broadcasts the three component signals are combined for transmission, whereas digital TV digitizes the three component signals separately prior to transmission. The disadvantage of digitizing the R, G and B signals directly is that the same resolution, in terms of sampling rate and bits per sample, must be used for all three.
• The eye is less sensitive to color than to luminance, i.e. the two chrominance signals can tolerate a reduced resolution relative to that used for the luminance signal. Digitizing the luminance and chrominance signals therefore gives a significant saving in bit rate, and hence transmission bandwidth, compared with RGB.

Digital Video
• Television studios use the digital form of video signals, e.g. to convert from one video format into another.
• To standardize this process and make the international exchange of TV programs easier, the ITU Radiocommunication Sector (ITU-R), formerly the Consultative Committee for International Radiocommunications (CCIR), defined a standard for the digitization of video pictures known as Recommendation CCIR-601.
• Small variations of this standard have been defined for digital TV broadcasting, video telephony and video conferencing. These are known as digitization formats; in all of them the two chrominance signals have a reduced resolution relative to the luminance signal.
• 4:2:2 format (CCIR's recommendation for TV studios)
• The original digitization format used in Recommendation CCIR-601 for use in TV studios.
• The three component video signals from a source in a studio can have a bandwidth of up to 6 MHz for the luminance signal and less than half of this for the two chrominance signals.
• Bandlimiting filters of 6 MHz for the luminance signal and 3 MHz for the two chrominance signals are used, giving minimum sampling rates of 12 MHz (2 × bandwidth) and 6 MHz respectively.
• In the standard, a line sampling rate of 13.5 MHz for luminance and 6.75 MHz for the two chrominance signals was selected, independent of whether NTSC or PAL is used.

Digital Video-4:2:2
• 13.5 MHz was chosen since it is the nearest frequency to 12 MHz that results in a whole number of samples per line for both 525- and 625-line systems. The number of samples per line is 702, derived as follows.
• In the 525-line system, the total line sweep time is 63.56 microseconds, but for 11.56 microseconds of this the beam is turned off (set to black level) for retrace, giving an active line sweep time of 52 microseconds.
• In the 625-line system, the total line sweep time is 64 microseconds with a blanking time of 12 microseconds, again giving an active line sweep time of 52 microseconds. Hence in both cases a sampling rate of 13.5 MHz yields
• 52 × 10^-6 × 13.5 × 10^6 = 702 samples per line
• In practice, the number of samples per line is increased to 720 by taking a slightly longer active line time, which results in a small number of black samples at the beginning and end of each line for reference purposes.
• For the two chrominance signals the number is halved: 360 samples per line.
• This results in 4 Y samples for every 2 Cb and 2 Cr samples, giving the term 4:2:2.
• 4:4:4 indicates digitization based on the R, G and B signals.

Digital Video- 4:2:2 format
• The number of bits per sample is chosen to be 8, corresponding to 256 quantization levels.
• The vertical resolution of all three signals was chosen to be the same: 480 lines with a 525-line system and 576 lines with a 625-line system. These are the numbers of active lines in each system.
• Since 4:2:2 is intended for use in TV studios, non-interlaced scanning is used at a frame refresh rate of either 60 Hz (525 lines) or 50 Hz (625 lines).
• The samples are at fixed positions that repeat from frame to frame.
• The sampling is therefore said to be orthogonal and the sampling method is known as orthogonal sampling.
• The figure shows the sample positions.

DIGITAL VIDEO-4:2:0 FORMAT
• A derivative of the 4:2:2 format, used in digital video broadcast applications.
• Good picture quality is obtained by using the same set of chrominance samples for two consecutive lines.
• As it is intended for broadcast applications, interlaced scanning is used; the absence of chrominance samples in alternate lines is the origin of the term 4:2:0.
• The luminance resolution is the same as in 4:2:2 but the chrominance resolution is halved vertically:
• 525-line systems: Y = 720 × 480, Cb = Cr = 360 × 240
• 625-line systems: Y = 720 × 576, Cb = Cr = 360 × 288
• The bit rate in both systems with this format is
• 13.5 × 10^6 × 8 + 2 (3.375 × 10^6 × 8) = 162 Mbps
• The receiver avoids flicker by using the chrominance values from the sampled lines for the missing lines.
• In large-screen TVs, flicker is further reduced by the receiver storing the incoming digitized signals of each field in a memory buffer. A refresh rate of double the normal rate (100/120 Hz) is then used, with the stored set of signals used for the second field.
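
The arithmetic behind the samples-per-line and bit-rate figures can be reproduced with a short Python sketch. The function names are hypothetical; the 4:2:2 figure is not quoted in these notes and simply follows from the 13.5 MHz and 6.75 MHz sampling rates at 8 bits per sample.

```python
def samples_per_line(active_time_us, sampling_rate_mhz):
    """Samples taken during the active line sweep (microseconds × MHz)."""
    return active_time_us * sampling_rate_mhz


def worst_case_bit_rate_mbps(luma_rate_mhz, chroma_rate_mhz, bits_per_sample=8):
    """One luminance stream plus two chrominance streams, in Mbps."""
    return (luma_rate_mhz + 2 * chroma_rate_mhz) * bits_per_sample


print(samples_per_line(52, 13.5))               # active line of 52 microseconds
print(worst_case_bit_rate_mbps(13.5, 6.75))     # 4:2:2 sampling rates
print(worst_case_bit_rate_mbps(13.5, 3.375))    # 4:2:0 effective chrominance rate
```

The 4:2:0 result matches the 162 Mbps quoted above, and halving the chrominance rate relative to 4:2:2 accounts for the whole difference between the two formats.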

HDTV Formats
• High-definition TV (HDTV) is associated with a number of alternative digitization formats.
• The resolution of tubes with a 4/3 aspect ratio can be up to 1440 × 1152 pixels, and that of tubes with the newer 16/9 aspect ratio up to 1920 × 1152 pixels.
• The number of visible lines per frame is 1080. Both use either the 4:2:2 format (refresh rate 50/60 Hz) for studio applications or the 4:2:0 format (25/30 Hz) for broadcast applications.
• For 1440 × 1152, the worst-case bit rates are four times the values derived in the previous sections, and proportionally higher for the wide-screen format.

Other digitization formats (name, digitization format, refresh rate, luminance and chrominance resolution, worst-case bit rate, scanning, application):
• 1. Source Intermediate Format (SIF)
• Uses half the spatial resolution of the 4:2:0 format (subsampling) and half the refresh rate (temporal resolution)
• Digitization format: 4:1:1
• Refresh rate: 30 Hz (525-line system) or 25 Hz (625-line system)
• Resolution in 525-line system: Y = 360 × 240, Cb = Cr = 180 × 120
• Resolution in 625-line system: Y = 360 × 288, Cb = Cr = 180 × 144
• Worst-case bit rate: 6.75 × 10^6 × 8 + 2 (1.6875 × 10^6 × 8) = 81 Mbps
• Scanning: progressive (non-interlaced)
• Application: picture quality comparable to that obtained with a video cassette recorder (VCR); intended for storage applications
• 2. Common Intermediate Format (CIF)
• Derived from SIF: a combination of the spatial resolution used for SIF in the 625-line system and the temporal resolution used in the 525-line system
• Digitization format: 4:1:1
• Refresh rate: 30 Hz (the temporal resolution of SIF in the 525-line system)
• Resolution: Y = 360 × 288, Cb = Cr = 180 × 144
• 4CIF: Y = 720 × 576, Cb = Cr = 360 × 288
• 16CIF: Y = 1440 × 1152, Cb = Cr = 720 × 576
• Worst-case bit rate: same as SIF
• Scanning: progressive (non-interlaced)
• Application: video conferencing. Linked desktop PCs use a single 64 kbps ISDN channel; linked video conferencing studios use multiple 64 kbps channels (4 or 16).
• 3. Quarter CIF (QCIF)
• Derived from CIF
• Digitization format: 4:1:1
• Refresh rate: 15 or 7.5 Hz
• Resolution: Y = 180 × 144, Cb = Cr = 90 × 72
• Worst-case bit rate: 3.375 × 10^6 × 8 …
• Application: video telephony

PC VIDEO
• Multimedia applications that involve video, such as video telephony and video conferencing, often run on a PC.
• To avoid distortion on a PC screen (for example, so that a square of N × N pixels is displayed as a square), a horizontal resolution of 640 pixels per line is used with a 525-line system and 768 pixels per line with a 625-line system.
• Where live video is mixed with other information on a PC monitor, the line sampling rate is modified accordingly.
• For the 525-line system the line sampling rate is reduced from 13.5 MHz to 12.2727 MHz, while for the 625-line system it is increased to 14.75 MHz.
• In desktop video telephony and video conferencing, the video signals from the camera are sampled at this rate prior to transmission and hence can be displayed directly on the screen.
• In the case of digital TV broadcasts, a conversion is necessary before the video is displayed.
• PC monitors use progressive scanning rather than interlaced scanning.

Video Content
• In entertainment applications, the content is either a broadcast TV program or, with video-on-demand, a digitized movie downloaded from a server.
• In interpersonal applications (video conferencing/telephony), the video source is a video camera, and the digitized sequence of pixels relating to each frame is transmitted across the network. As the pixels are received at the destination, they are displayed directly on either a television screen or a computer monitor.
• In interactive applications, the short video clips associated with the application are obtained by plugging a video camera into a video capture board within the computer that prepares the contents. The clips are stored in a file that is linked to the other page contents.
• A video may also be generated by a computer program rather than a camera. This is known as computer animation or computer graphics.
• Many special programming languages are available for creating computer animation. An animation can be represented either in the form of an animation program or as a digital video.
• The digital video form requires more memory and bandwidth than the program form.
• The challenge with the program form is that the low-level animation primitives in the program, such as move and rotate, need to be executed very fast in order to produce smooth motion on the display. So an additional 3-D graphics accelerator processor is used: the main processor passes the sequence of low-level primitives to the accelerator at the appropriate rate.
• The accelerator executes each set of primitives to produce the corresponding pixel image in the video RAM at the desired refresh rate.

AUDIO COMPRESSION
• Pulse code modulation (PCM):
• Digitization process that involves sampling the analog audio signal/waveform at a minimum rate that is twice the maximum frequency component of the signal.
• Bandlimited signal:
• If the bandwidth of the communication channel is less than that of the signal, the sampling rate is determined by the bandwidth of the communication channel.
• Speech signal:
• Maximum frequency component is 10 kHz, so the minimum sampling rate is 20 ksps (12 bits per sample)
• Audio and music:
• Maximum frequency component is 20 kHz, so the minimum sampling rate is 40 ksps (16 bits per sample)
• Stereophonic music
• Two signals need to be digitized.
• Resulting bit rates: 240 kbps for a speech signal and 1.28 Mbps for stereophonic music
• When the communication channel has less bandwidth available, either the audio is sampled at a lower rate (possibly with fewer bits per sample) or a compression algorithm is used.
• With the first approach, the quality of the decoded signal is reduced owing to the loss of the higher frequency components of the original signal, and the use of fewer bits per sample introduces higher levels of quantization noise.
• Hence a compression algorithm is normally used, as it achieves a perceptual quality comparable to that obtained with a higher sampling rate but with a reduced bandwidth requirement.
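
The PCM bit rates quoted above follow directly from the sampling rule. A minimal Python sketch (the function name `pcm_bit_rate` is hypothetical):

```python
def pcm_bit_rate(max_freq_hz, bits_per_sample, channels=1):
    """Bit rate in bps when sampling at twice the maximum frequency."""
    sampling_rate = 2 * max_freq_hz       # minimum (Nyquist) sampling rate
    return sampling_rate * bits_per_sample * channels


print(pcm_bit_rate(10000, 12))            # speech: 10 kHz, 12 bits per sample
print(pcm_bit_rate(20000, 16, channels=2))  # stereophonic music: 20 kHz, 16 bits
```

The two calls give 240 000 bps and 1 280 000 bps, i.e. the 240 kbps and 1.28 Mbps figures on this slide.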

Differential PCM
• The range of the differences in amplitude between successive samples of a signal is much smaller than the range of the actual sample amplitudes. Fewer bits are therefore required to encode the differences than to encode a PCM signal.
• The figure shows the encoder and decoder.
• Register R: temporary storage that holds the previous digitized sample
• Subtractor: computes the difference signal between the current sample and the register contents
• Adder: updates the register by adding the computed difference to the previous value, giving the current actual amplitude
• Decoder: simply adds the received difference signal to the previously computed signal held in its register
• Typical savings with DPCM are limited to just 1 bit per sample, which for a PCM voice signal reduces the bit rate requirement from 64 kbps to 56 kbps.
• As the output of the ADC is used directly, the accuracy of each computed difference (residual signal) is determined by the accuracy of the previous signal/value held in the register.
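
The register, subtractor and adder described above can be sketched in Python as follows. This is an idealized illustration under two assumptions: the register starts at zero, and the difference signal is neither quantized nor limited to a fixed number of bits. The function names are hypothetical.

```python
def dpcm_encode(samples):
    """Send the difference between each sample and the register contents."""
    register = 0                          # assumed initial register value
    differences = []
    for sample in samples:
        diff = sample - register          # subtractor
        differences.append(diff)
        register = register + diff        # adder: register now holds the current sample
    return differences


def dpcm_decode(differences):
    """Add each received difference to the value held in the register."""
    register = 0
    samples = []
    for diff in differences:
        register = register + diff
        samples.append(register)
    return samples


pcm = [100, 104, 107, 105, 101]
print(dpcm_encode(pcm))
print(dpcm_decode(dpcm_encode(pcm)) == pcm)
```

After the first value, the differences in this example stay within a few units, which is why they need fewer bits than the samples themselves.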

Third Order Predictive DPCM Signal
• All ADC operations produce quantization errors, and hence a string of, say, positive errors has a cumulative effect on the accuracy of the value held in the register.
• As the errors can propagate, more sophisticated techniques have been developed for estimating (also known as predicting) a more accurate version of the previous signal. This is done by using not just one but a number of the immediately preceding estimated signals.
• Predictor coefficients determine the proportion of each preceding value that is used.
• The difference signal is computed by subtracting varying proportions of the last three predicted values from the current digitized value output by the ADC.
• Ex. If C1 = 0.5 and C2 = C3 = 0.25, the contents of register R1 are shifted right by 1 bit (multiplied by 0.5) and the contents of the other two registers by 2 bits (multiplied by 0.25). The sum of the three shifted values is subtracted from the current digitized value output by the ADC.
• The value in R1 is then shifted to R2 and that in R2 to R3. The new predicted value is loaded into R1 ready for processing the next sample.
• The decoder operates by adding the same proportions of the last three computed PCM signals to the received DPCM signal.
• A performance equivalent to PCM is obtained using only 6 bits for the difference signal, which produces a bit rate of 32 kbps.
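
The shift-based example (C1 = 0.5, C2 = C3 = 0.25) can be sketched in Python. The sketch assumes registers that start at zero and an unquantized difference signal, so the value loaded into R1 equals the reconstructed sample; the function names are hypothetical.

```python
def predict(r1, r2, r3):
    """C1 = 0.5 and C2 = C3 = 0.25 implemented as right shifts."""
    return (r1 >> 1) + (r2 >> 2) + (r3 >> 2)


def predictive_encode(samples):
    r1 = r2 = r3 = 0                      # assumed initial register contents
    differences = []
    for sample in samples:
        prediction = predict(r1, r2, r3)
        diff = sample - prediction
        differences.append(diff)
        # R2 -> R3, R1 -> R2, new value into R1
        r3, r2, r1 = r2, r1, prediction + diff
    return differences


def predictive_decode(differences):
    r1 = r2 = r3 = 0
    samples = []
    for diff in differences:
        value = predict(r1, r2, r3) + diff  # same proportions as the encoder
        samples.append(value)
        r3, r2, r1 = r2, r1, value
    return samples


pcm = [100, 104, 108, 110, 109]
print(predictive_encode(pcm))
print(predictive_decode(predictive_encode(pcm)) == pcm)
```

Because the encoder and decoder apply the same predictor to the same register contents, the decoder reproduces the input exactly in this unquantized setting.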

ADAPTIVE DIFFERENTIAL PCM
• The number of bits used for the difference signal can be varied according to the amplitude of the signal, i.e. fewer bits are used to encode small differences than large ones. This is adaptive differential PCM (ADPCM), defined in ITU-T Recommendation G.721.
• It differs from DPCM in that an eighth-order predictor is used and the number of bits used is varied.
• Either 6 bits are used, producing 32 kbps and a better quality output than third-order DPCM, or 5 bits, producing 16 kbps, if lower bandwidth is more important.
• ITU-T Recommendation G.722 provides better sound quality than G.721 at the expense of added complexity. The added technique is subband coding.
• The input speech bandwidth is extended to cover 50 Hz to 7 kHz, compared with 3.4 kHz for a standard PCM system.
• This is useful in conferencing applications, where it helps listeners to differentiate the voices of different members.

Adaptive Differential PCM
• Two filters at the input allow for the higher signal bandwidth prior to sampling the audio input signal: one passes 50 Hz to 3.5 kHz and the other 3.5 kHz to 7 kHz.
• The input speech signal is thus divided into two separate equal-bandwidth signals, the first known as the lower subband signal and the second as the upper subband signal. Each is sampled and encoded independently using ADPCM; the sampling rate of the upper subband signal is 16 ksps to allow for the higher frequency components.
• The use of two subbands has the advantage that a different bit rate can be used for each.
• The frequency components in the lower subband signal have a higher perceptual importance than those in the upper subband.
• The operating bit rate can be 64, 56 or 48 kbps (the upper subband uses 16 kbps). The receiver must be able to divide the bit stream into two separate streams for decoding.
• The third standard is ITU-T Recommendation G.726.
• It also uses subband coding but with a speech bandwidth of 3.4 kHz. The operating bit rate can be 40, 32, 24 or 16 kbps.

Adaptive Predictive coding
• Even higher levels of compression, at the cost of higher complexity, can be obtained by making the predictor coefficients adaptive. This is the principle of adaptive predictive coding (APC), in which the predictor coefficients change continuously.
• The optimum set of predictor coefficients varies continuously, since it is a function of the characteristics of the audio signal being digitized, e.g. the actual frequency components that make up the signal at a particular instant of time.
• The input speech signal is divided into fixed time segments and, for each segment, the currently prevailing characteristics are determined.
• The optimum set of coefficients is then computed and used to predict the previous signal more accurately. This type of compression can reduce the bandwidth requirement to 8 kbps while still obtaining an acceptable perceived quality.

Linear Predictive Coding
• The availability of inexpensive digital signal processing (DSP) circuits introduced an alternative approach, in which the source simply analyzes the audio waveform to determine a selection of the perceptual features it contains.
• These are quantized and sent, and the destination uses them, together with a sound synthesizer, to regenerate a sound that is perceptually comparable with the source audio signal. This is the basis of the linear predictive coding (LPC) technique.
• With the sound generated in this way, very high levels of compression are achieved.

Linear Predictive Coding
• The three features that determine the perception of a signal by the ear are its
• Pitch: related to the frequency of the signal; significant because the ear is more sensitive to frequencies in the range 2-5 kHz than to frequencies that are higher or lower
• Period: the duration of the signal
• Loudness: determined by the amount of energy in the signal
• Vocal tract excitation parameters: the origins of the sound. These are classified as:
• Voiced sounds: generated through the vocal cords; examples include the sounds relating to the letters m, v and l
• Unvoiced sounds: produced with the vocal cords open; examples include the sounds relating to the letters f and s
• Once these have been obtained from the source waveform, they can be used, together with a suitable model of the vocal tract, to generate a synthesized version of the original speech signal.

• The input speech waveform is first sampled and quantized at a defined rate. A block of digitized samples, known as a segment, is then analyzed to determine the various perceptual parameters of the speech that it contains.
• Decoder: the speech signal generated by the vocal tract model is a function of the present output of the speech synthesizer, as determined by the current set of model coefficients, plus a linear combination of the previous set of model coefficients.
• The vocal tract model used is therefore adaptive. The encoder determines and sends a new set of coefficients for each quantized segment.
• The output of the encoder is a string of frames, one for each segment.
• Each frame contains fields for pitch and loudness (the period is determined by the sampling rate), a notification of whether the signal is voiced or unvoiced, and a new set of computed model coefficients.
• Some LPC encoders use up to ten sets of previous model coefficients to predict the output sound and use bit rates as low as 2.4 kbps or even 1.2 kbps.
• The generated sound is very synthetic.
• Applications: military applications, where bandwidth is all-important.

Code Excited LPC
• The synthesizers used in LPC decoders are based on a basic model of the vocal tract.
• The code-excited linear prediction (CELP) model is an enhanced version, one example of a family of vocal tract models known as enhanced excitation LPC models.
• Applications: environments where limited bandwidth is available but the perceived quality of the speech must be of an acceptable standard for use in various multimedia applications.
• Instead of treating each digitized segment independently for encoding purposes, only a limited set of segments is used, each known as a waveform template.
• A precomputed set of templates, known as the template codebook, is held by both the encoder and the decoder. Each of the individual digitized samples that make up a particular template in the codebook is differentially encoded.
• Each codeword that is sent selects the particular template from the codebook whose difference values best match those quantized by the encoder. This gives continuity from one set of samples to another and, as a result, an improvement in sound quality.
• Four international standards, ITU-T Recommendations G.728, G.729, G.729(A) and G.723.1, give good perceived quality at low bit rates.
• All have a delay associated with them: the time for the encoder to analyze each block of digitized samples plus the time for the decoder to reconstruct the speech. The combined delay value is known as the coder's processing delay.
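
The codebook lookup described above can be illustrated with a toy Python sketch. It is not taken from any of the G-series standards: the tiny codebook, the function name `best_template` and the use of least squared error as the "best match" criterion are all assumptions made for illustration.

```python
def best_template(segment, codebook):
    """Return the index (codeword) of the template closest to the segment."""
    def squared_error(template):
        return sum((a - b) ** 2 for a, b in zip(segment, template))

    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Hypothetical codebook of difference-value templates, shared by both ends.
codebook = [
    [0, 0, 0, 0],
    [1, 2, 1, 0],
    [-1, -2, -1, 0],
]

codeword = best_template([1, 2, 2, 0], codebook)   # encoder side
print(codeword, codebook[codeword])                # decoder looks the template up
```

Only the index is transmitted; the decoder recovers the template from its own copy of the codebook, which is where the bit-rate saving comes from.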

Code Excited LPC
• A block of samples must be buffered before it can be processed; the resulting delay is known as the algorithmic delay.
• Lookahead: a technique in which samples from the next successive block are also included in the analysis.
• These delays are in addition to the end-to-end delay of the network.
• The combined delay value is important when checking the suitability of a coder for a specific application.
• Ex. In conventional telephony a low-delay coder is required, since a large delay can hinder the flow of a conversation.
• In an interactive application that involves stored speech, a delay of a couple of seconds before the start of the speech is acceptable and hence the coder's delay is less important.

Code Excited LPC
• Other parameters: the complexity of the coding algorithm and the perceived quality of the output speech
• A compromise has to be reached between a coder's speech quality and its delay/complexity.
• The delay of a basic PCM coder is very small, as it is equal to the time interval between two successive samples of the input waveform.
• At the basic PCM sampling rate of 8 ksps the delay is 1/8000 s = 0.125 ms. The same delay applies to ADPCM coders.
• With the CELP standards the delay is considerably larger, since a block of multiple samples is involved.

Perceptual Coding
• LPC and CELP are used for the compression of speech signals in telephony applications.
• Perceptual encoders are used for the compression of general audio, such as that associated with digital TV broadcasts.
• They also use a model, in this case a psychoacoustic model, whose role is to exploit a number of limitations of the human ear.
• The source signal is analyzed as in the other coders, but only those features that are perceptible to the human ear are transmitted.
• The human ear is sensitive to signals in the range 15 Hz to 20 kHz, but the level of sensitivity to each signal is non-linear: the ear is more sensitive to some signals than to others.
• Frequency masking: in general audio, where multiple signals are present, a strong signal may reduce the level of sensitivity of the ear to other signals that are near to it in frequency.
• Temporal masking: when the ear hears a loud sound, it takes a short but finite time before it can hear a quieter sound.
• A psychoacoustic model is used to identify those signals that are influenced by both these effects. These are eliminated from the transmitted signal, and this reduces the amount of information to be transmitted.

Perceptual Coding -Sensitivity of the ear
• The dynamic range of a signal is the ratio of the maximum amplitude of the signal to the minimum amplitude, measured in decibels (dB). The dynamic range of the human ear is about 96 dB.
• The sensitivity of the ear varies with the frequency of the signal. If only a single-frequency signal is present, the minimum level the ear can detect is a function of frequency.
• The ear is most sensitive to signals in the range 2-5 kHz, so signals in this band are the quietest the ear can detect.
• The vertical axis of the figure indicates the amplitude level, in dB relative to this level, that signals at all other frequencies must have in order to be heard.
• In the figure, although signals A and B have the same amplitude, A will be heard and B will not.
• When an audio signal consists of multiple frequency signals, the sensitivity of the ear changes and varies with the relative amplitude of the signals.
• The figure shows how the sensitivity changes in the vicinity of a loud signal: when the amplitude of signal B becomes larger than that of A, A can no longer be heard.
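
The dynamic range definition above can be made concrete with a few lines of Python. The sketch assumes the usual amplitude-ratio form of the decibel, 20 log10(max/min); the observation that 96 dB is roughly the ratio spanned by 16-bit samples is an added illustration, not a statement from these notes.

```python
import math


def dynamic_range_db(max_amplitude, min_amplitude):
    """Dynamic range of a signal as an amplitude ratio in dB."""
    return 20 * math.log10(max_amplitude / min_amplitude)


print(round(dynamic_range_db(2 ** 16, 1), 2))   # ratio of 65536 to 1
print(round(dynamic_range_db(10, 1), 2))        # a tenfold ratio
```

Each factor of ten in amplitude adds 20 dB, so a 96 dB range corresponds to an amplitude ratio of roughly 63 000 to 1.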

Perceptual Coding –Frequency Masking
• The masking effect varies with frequency.
• The graph shows the masking effect of a selection of signals at different frequencies (1, 4 and 8 kHz). The width of the masking curves, i.e. the range of frequencies that are affected, increases with increasing frequency.
• Critical bandwidth: the width of each curve at a particular signal level is known as the critical bandwidth for that frequency. Experiments have shown that for frequencies less than 500 Hz the critical bandwidth remains constant at about 100 Hz.
• For frequencies greater than 500 Hz, the critical bandwidth increases approximately linearly in multiples of 100 Hz.
• Ex. For 1 kHz (2 × 500 Hz) the critical bandwidth is about 200 Hz (2 × 100 Hz), while at 5 kHz (10 × 500 Hz) it is about 1000 Hz (10 × 100 Hz).
• If the magnitudes of the frequency components that make up an audio sound can be determined, the frequencies that will be masked can be identified and need not be transmitted.
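
The rule of thumb for the critical bandwidth can be written as a small Python function. This is only the approximation stated on this slide (constant 100 Hz up to 500 Hz, then linear), not a full psychoacoustic model, and the function name is hypothetical.

```python
def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth at a given frequency."""
    if freq_hz > 500:
        return 100.0 * freq_hz / 500.0    # linear increase in multiples of 100 Hz
    return 100.0                          # constant below 500 Hz


for f in (200, 1000, 5000):
    print(f, critical_bandwidth_hz(f))
```

The values for 1 kHz and 5 kHz reproduce the 200 Hz and 1000 Hz figures in the example above.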

Perceptual Coding –Temporal Masking
• After a loud sound ceases, it takes a short period of time for the signal amplitude to decay. During this time, signals whose amplitudes are less than the decay envelope will not be heard and hence need not be transmitted.
• To exploit this, the input audio waveform must be processed over a time period that is comparable with that associated with temporal masking.

MPEG Audio coders
• The coders associated with the audio compression part of the MPEG standards are known as MPEG audio coders. Many of them use perceptual coding.
• All signal processing operations are carried out digitally.
• The figure shows the encoder and decoder.
• Analysis filters (critical-band filters): the bandwidth available for transmission is divided into a number of frequency subbands by these filters.
• Each subband is of equal width. Each set of 32 PCM samples is mapped into 32 frequency bands (subbands), giving one subband sample per band.
• In the encoder, the time duration of each sampled segment is equal to 12 successive sets of 32 PCM samples, i.e. 384 (12 × 32) samples.
• The analysis filters also determine the maximum amplitude of the 12 subband samples in each subband. Each such maximum is known as the scaling factor for that subband.
• The scaling factors are passed both to the psychoacoustic model and to the quantizer block.
• Discrete Fourier transform (DFT): used to transform the PCM samples into frequency components.
• Using the hearing thresholds and masking properties of each subband, the model determines the various masking effects. The output of the model is a set of signal-to-mask ratios, which indicate the frequency components whose amplitude is below the related audible threshold.
• The quantization accuracy is determined using the set of scaling factors.
• The intention is to quantize the regions to which the ear is highly sensitive more accurately, i.e. with less quantization noise, than the regions to which it is less sensitive.
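
The segment structure and scaling factors described above can be sketched in Python. The analysis filter bank itself is not implemented; the input is assumed to be 12 sets of 32 subband samples that such a filter bank has already produced, and the sample values used here are synthetic.

```python
SETS_PER_SEGMENT = 12
SUBBANDS = 32


def scaling_factors(segment):
    """Maximum absolute amplitude of the 12 samples in each subband."""
    return [max(abs(sample_set[band]) for sample_set in segment)
            for band in range(len(segment[0]))]


# Synthetic segment: 12 sets of 32 subband samples (values are made up).
segment = [[(t - 5) * (band + 1) for band in range(SUBBANDS)]
           for t in range(SETS_PER_SEGMENT)]

factors = scaling_factors(segment)
print(SETS_PER_SEGMENT * SUBBANDS)        # samples per segment
print(len(factors), factors[:4])          # one scaling factor per subband
```

One scaling factor per subband is produced for every 384-sample segment, and it is this set of 32 values that is passed on to the psychoacoustic model and the quantizer.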

MPEG Audio coders
• Header of each frame: carries information such as the sampling frequency used
• SBS (subband sample) format: carries all the information required by the decoder
• Ancillary data: optional field that carries additional coded samples, e.g. those associated with the surround sound present with a video broadcast
• Synthesis filters in the decoder: take the magnitudes of each set of 32 subband samples as input and produce the corresponding PCM samples as output
• As the psychoacoustic model is not required in the decoder, the decoder is less complex, which suits broadcast applications.
• International standard: ISO Recommendation 11172-3, which defines three levels of processing known as layers 1, 2 and 3.
• Layer 1 is the basic mode; the other two have increasing levels of processing associated with them. Temporal masking is not present in layer 1 but is present in layers 2 and 3. Each successive layer gives an increased level of compression and perceptual quality.

Dolby Audio Coders
• The psychoacoustic models associated with MPEG coders control the quantization accuracy of each subband sample by computing and allocating the number of bits to be used to quantize each sample.
• As the allocation varies, the bit allocation information used to quantize the samples in each subband is sent with the actual quantized samples. The decoder uses it to dequantize the set of subband samples in the frame. This mode is known as the forward adaptive bit allocation mode.
• Advantage: as the psychoacoustic model is required only in the encoder, the complexity of the decoder is reduced.
• Disadvantage: a significant portion of each encoded frame contains bit allocation information, which leads to relatively inefficient use of the available bit rate.
• A variation is to use a fixed bit allocation strategy for each subband, which is then used by both the encoder and the decoder.
• Dolby AC-1 (acoustic coder) standard: the bit allocations for each subband are based on the sensitivity characteristics of the human ear, and the bit rate is used efficiently.
• Designed for use in satellites to relay FM radio programs and the sound associated with TV programs.
• Uses a low-complexity psychoacoustic model with 40 subbands at a sampling rate of 32 ksps.
• The typical compressed bit rate is 512 kbps for two-channel stereo.

Dolby Audio Coders
• In a second variation, the decoder also contains a psychoacoustic model, so that the overheads in the encoded bit stream can be reduced.
• The decoder's model needs a copy of the subband samples, so in place of the bit allocation information every frame carries the encoded frequency coefficients that are present in the sampled waveform segment. This is known as the encoded spectral envelope, and this mode is the backward adaptive bit allocation mode.
• It is used in the Dolby AC-2 standard, which is found in many applications, including the audio compression in a number of PC sound cards. In broadcast applications, the disadvantage is that the psychoacoustic model in the encoder cannot be changed without changing all the decoders.

Dolby Audio Coders
• The third variation, the hybrid backward/forward adaptive bit allocation mode, uses both backward and forward bit allocation principles.
• Issue: with the backward bit allocation method, the quantization accuracy of the subband samples is affected by the quantization noise introduced by the spectral encoder. Hence in the hybrid mode, although a backward adaptive scheme using a psychoacoustic model PMB is used as in AC-2, the encoder contains an additional psychoacoustic model, PMF. PMF computes the difference between the bit allocations computed by PMB and those it computes itself using the forward adaptive bit allocation scheme. This difference is used by PMB to improve the quantization accuracy of the set of subband samples. The modification information is also sent in the encoded frame and is used by the PMB in the decoder to improve the dequantization accuracy.
• Any required change in the operating parameters of PMB can also be sent with the computed difference information.
• PMF must compute two sets of quantization information for each set of subband samples and hence is relatively complex. As it is not required in the decoder, this is not an issue.

Dolby Audio Coders
• The hybrid approach is used in the Dolby AC-3 standard, which has a similar range of applications to the MPEG audio standards, including the audio associated with advanced TV using the HDTV format.
• Each encoded block contains 512 subband samples.
• To obtain continuity from one block to the next, the last 256 subband samples of the previous block are repeated as the first 256 samples of the next block, and hence each block contains only 256 new samples.
• The bit rate is 192 kbps.
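
The block overlap described above can be sketched in Python. The sketch only shows how 512-sample blocks share 256 samples with their predecessor; it does not perform any AC-3 coding, and the function name is hypothetical.

```python
def overlapped_blocks(samples, block_size=512, overlap=256):
    """Split a sample stream into blocks that repeat the tail of the previous block."""
    step = block_size - overlap           # number of new samples per block
    last_start = len(samples) - block_size
    return [samples[start:start + block_size]
            for start in range(0, last_start + 1, step)]


stream = list(range(1024))                # stand-in for a stream of subband samples
blocks = overlapped_blocks(stream)
print(len(blocks), [len(b) for b in blocks])
print(blocks[1][:256] == blocks[0][256:])  # second block starts with the tail of the first
```

With the default sizes each block advances by 256 samples, so every sample after the first 256 appears in two consecutive blocks.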
