
Audio-Video I

P Kadebu
 Introduction
 Digital audio
› Psychoacoustics
› Digital representation of sound
 Digital images
› JPEG
Introduction
 Multimedia often uses sound, images, and video
from natural sources
 First, the source has to be converted into a signal
› microphone, camera, video camera
 Next, analog signals are converted into digital form
› sampling, A/D conversion
 Often, the amount of information is also reduced
› compression
Introduction (cont.)
 Compression method can be lossless or lossy
 Compressed information is easier to store and
transfer
 Compressed information has to be decompressed
before use
 Digital-to-analog conversion also has to be done
 After this, the signal can be played (or shown) to
the user
Digital audio
 Technology that can be used to record, store, generate, manipulate,
and reproduce sound using audio signals encoded in digital form.
 A microphone converts sound to an analog electrical signal, then
an analog-to-digital converter (ADC), typically using pulse-code
modulation, converts the analog signal into a digital signal.
 A digital-to-analog converter performs the reverse process,
converting a digital signal back into an analog signal, which analog
circuits amplify and send to a loudspeaker.
 Digital audio systems may include compression, storage, processing
and transmission components. Conversion to a digital format allows
convenient manipulation, storage, transmission and retrieval of an
audio signal.
Digital audio application areas
 Computer generated sound
 Sound storage and processing
 Digital communications
 Answering service
 Speech synthesis
 Speech recognition
 Computerized call center
 Presentation of data as sound (Sonification)
Computer generated sound

 Sound that has been created using computing technology.
 Computer-generated music is music composed by, or
with the extensive aid of, a computer.
 Since the invention of the MIDI system in the early
1980s, for example, some people have worked on
programs which map MIDI notes to an algorithm
and then can either output sounds or music through
the computer's sound card or write an audio file for
other programs to play.
Sound storage and processing
 Sound can be stored as digital files and manipulated with digital
signal processing.
 A digital audio system starts with an ADC that converts an analog
signal to a digital signal.
 A digital audio signal may be stored or transmitted. Digital audio
can be stored on a CD, a digital audio player, a hard drive, a USB
flash drive, or any other digital data storage device.
 The digital signal may then be altered through digital signal
processing, where it may be filtered or have effects applied.
 Audio data compression techniques, such as MP3, Advanced
Audio Coding, Ogg Vorbis, or FLAC, are commonly employed to
reduce the file size.
 Digital audio can be streamed to other devices.
 For playback, digital audio must be converted back to an analog
signal with a DAC.
Digital communications

 Data transmission, digital transmission,
or digital communications is the physical
transfer of data (a digital bit stream or a digitized
analog signal) over a point-to-point or point-to-
multipoint communication channel.
Answering service

 The purpose of an answering service is to offer assistance or
record messages. It can be used to:
 Offer help services, e.g. technical assistance for configuring a
machine
 Respond on behalf of a phone user to record messages in
their absence or when they are unable to take a call.
 Answer services usually fit into one of the following
categories, although answering-service providers often offer
services from several categories:
› Automated answering service
› Live answering service
› Internet answering service
› Call center
Speech synthesis

 The artificial production of human speech.
 A computer system used for this purpose is called
a speech synthesizer, and can be implemented
in software or hardware products.
 A text-to-speech (TTS) system converts normal language
text into speech; other systems render symbolic linguistic
representations like phonetic transcriptions into speech.
 May be used to assist people with speech or hearing
impairments in communication
Speech recognition
  It is the ability of a machine or program to
identify words and phrases in spoken language
and convert them to a machine-readable format.
 Speech recognition applications include voice
user interfaces such as voice dialling, call
routing, domotic (Home automation) appliance
control, search, simple data entry, speech-to-text
processing (e.g., word processors or emails).
Presentation of data as sound (Sonification)

 The use of non-speech audio to convey information or
perceptualize data.
 Auditory perception has advantages in temporal,
amplitude, and frequency resolution that open
possibilities as an alternative or complement
to visualization techniques.
Psychoacoustics
 Frequency band
 Dynamic range
 Frequency properties
 Time effect
 Masking
 Phase
 Binaural hearing and localization
Frequency band
 Frequency bands are groupings of a specific
range of frequencies in the frequency spectrum.
 Humans can hear frequencies of 20 Hz - 20 kHz
(the hearing range).
 Older people have a much narrower range
 The threshold of hearing is approximately the
quietest sound a young human with undamaged
hearing can detect at 1,000 Hz.
 The ear's sensitivity is best at frequencies
between 1 kHz and 5 kHz.
Dynamic range

 At pain level, the amplitude can be 1,000,000 times the sound at
the threshold of audibility
 The measurement unit is the decibel (dB)
 The threshold is 0 dB and the pain level
100 - 120 dB
 On the decibel scale, the smallest audible sound (near total
silence) is 0 dB. A sound 10 times more powerful is 10 dB. A
sound 100 times more powerful than near total silence is 20
dB. A sound 1,000 times more powerful than near total
silence is 30 dB.
 Hearing is a sense which cannot be directly measured
› the pitch of a sound changes according to amplitude
› the loudness depends on frequency
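The decibel arithmetic above can be sketched in Python (a minimal illustration; power ratios use 10·log10, amplitude ratios 20·log10):

```python
import math

def power_db(ratio):
    """Decibels for a power (intensity) ratio relative to the threshold."""
    return 10 * math.log10(ratio)

def amplitude_db(ratio):
    """Decibels for an amplitude (pressure) ratio relative to the threshold."""
    return 20 * math.log10(ratio)

print(power_db(10))             # 10.0 -- 10 times more powerful
print(power_db(1000))           # 30.0 -- 1,000 times more powerful
print(amplitude_db(1_000_000))  # 120.0 -- pain-level amplitude, per the slide
```

Note the two formulas agree: an amplitude ratio of 1,000,000 is a power ratio of 10^12, and both give 120 dB.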
Frequency properties

 Natural sounds are sums of many frequencies
 The frequencies can be calculated with Fourier
analysis
 Natural sounds typically contain harmonic
multiples of the base frequency
 The ear is sensitive to valleys and hills of the spectrum
 Distinctive spots are called formants (a range of
frequencies of a complex sound in which there is
an absolute or relative maximum in the sound
spectrum)
Clarinet sound
Time effect
 Sounds of instruments have three parts:
› Attack, steady-state, and decay
 In simple synthesis, sound is generated with
frequency components and their loudness is
changed at different stages
 In reality, the frequency components of the
spectrum change constantly
 Hearing is especially sensitive in attack phase
Masking
 Auditory masking occurs when the perception
of one sound is affected by the presence of
another sound
 Sounds can mask each other partly or fully
 They can also change each other
 A sound at a certain frequency raises the threshold
of audibility over a wider frequency area
 Sounds have to be a critical distance apart to be
heard separately
 The critical distance grows as the frequency gets higher
Masking (cont.)
Phase
 It is the fraction of the wave cycle that has
elapsed relative to the origin.
 Same frequency sounds can have different phases
 A phase difference of 180 degrees cancels the sound
 There is evidence that humans can hear phase
differences
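A quick numeric sketch of the cancellation: two equal sine waves, one shifted by 180 degrees, summed sample by sample (the window length and tone are arbitrary test values):

```python
import math

SAMPLES = 100
CYCLES = 3  # assumed test tone: 3 cycles over the sampled window

mixed = []
for n in range(SAMPLES):
    t = n / SAMPLES
    a = math.sin(2 * math.pi * CYCLES * t)            # original tone
    b = math.sin(2 * math.pi * CYCLES * t + math.pi)  # 180 degrees out of phase
    mixed.append(a + b)

# The two waves cancel: the mixed signal is silence, up to rounding error.
print(max(abs(x) for x in mixed))
```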
Binaural hearing and
localization
 Binaural hearing refers to being able to integrate information
that the brain receives from the two ears.
 Binaural hearing is known to help us with the ability to listen
in noisy, complex auditory environments, and to localize
sound sources.

 Humans can determine the location of sound
› loudness, phase difference, frequency
 Skull, ear lobes, and hearing organs filter sound
 In addition, reflections have a strong effect
 Sound sources should be placed in the same location as the
visual information
Digital representation of sound

 Coding in time domain
 Transformations
 Linear prediction
 Parametric coding
 Digital transfer of audio
Coding in time domain
 Samples are taken at sample frequency
 The sample frequency has to be at least twice the
maximum signal frequency (the so-called Nyquist
rate)
 Common sample frequencies are 8, 44.1, and 48
kHz
 The value of amplitude at sampling moment is
coded as numeric value
Aliasing
In signal processing and related disciplines, aliasing is an effect that causes different
signals to become indistinguishable (or aliases of one another) when sampled
Pulse Code Modulation (PCM)
PCM is a method used to digitally represent sampled analog signals. It is the standard
form of digital audio in computers, Compact Discs, digital telephony and other digital
audio applications.
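Aliasing can be shown numerically. With an assumed 8 kHz sample rate (Nyquist limit 4 kHz), a 7 kHz tone produces exactly the same samples as a phase-inverted 1 kHz tone:

```python
import math

FS = 8000  # assumed sample rate in Hz; frequencies above FS/2 = 4000 Hz alias

def sample_tone(freq_hz, n_samples=16):
    """Sample a sine of the given frequency at rate FS."""
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(n_samples)]

high = sample_tone(7000)  # above the Nyquist limit
low = sample_tone(1000)   # its alias: FS - 7000 = 1000 Hz

# Sample by sample, the 7 kHz tone equals the 1 kHz tone with inverted
# sign -- after sampling, the two signals are indistinguishable.
print(all(abs(h + l) < 1e-9 for h, l in zip(high, low)))  # True
```

This is why an anti-alias filter must remove frequencies above FS/2 before sampling.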
Coding in time domain (cont.)
 Sampling causes quantization error
 Each bit improves the signal-to-noise ratio
› 20 log10(2) ≈ 6 dB
 Often 16 bits are used
› 16 * 6 dB = 96 dB
 The human dynamic hearing range is more
(about 120 dB)
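The 6 dB-per-bit figure follows from the amplitude doubling with each extra bit; a sketch of the arithmetic above:

```python
import math

def snr_db(bits):
    """Approximate dynamic range of linear PCM: 20 * log10(2 ** bits).

    This is the simple 6 dB/bit rule from the slide; the exact formula
    for a full-scale sine adds about 1.76 dB on top.
    """
    return 20 * math.log10(2 ** bits)

print(round(snr_db(1), 2))   # 6.02  -- about 6 dB per bit
print(round(snr_db(16), 1))  # 96.3  -- 16-bit audio, close to 16 * 6 = 96 dB
```

96 dB still falls short of the roughly 120 dB dynamic range of human hearing, which is why 20- and 24-bit formats exist.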
Coding in time domain (cont.)

 Transformations between analog and digital
signals are done with A/D and D/A converters
 In addition, filtering is required
› anti-alias and reconstruction filters
 In high-quality systems, these can also be an error
source
 This problem can be solved with oversampling
 The computer can also cause crosstalk (over-hearing
between signals)
Transformations

 Transformations can be used to present the
content in a different domain
 The goal is to make the signal transfer more
efficient and robust
Fourier transformation
 Fourier coefficients represent the signal
accurately in frequency domain
 Stationary signals can be represented exactly
with Fourier coefficients
 Discrete Fourier transformation has to be used
with dynamic signals
 Coefficients are usually calculated with the Fast
Fourier Transformation (FFT) algorithm
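A naive DFT illustrates how the coefficients expose the frequency content; the FFT computes the same values in O(N log N). The 8-sample cosine is an assumed toy input:

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform; the FFT gives the same result faster."""
    N = len(signal)
    return [sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

# A pure cosine with 2 cycles over 8 samples...
x = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
X = dft(x)

# ...has energy only at bin 2 and its mirror bin 8 - 2 = 6.
peaks = [k for k, coeff in enumerate(X) if abs(coeff) > 1e-9]
print(peaks)  # [2, 6]
```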
Frequency bands

 The masking effect can also be used in coding
 The signal is first divided into frequency bands,
which are then coded separately (subband coding)
 E.g. MiniDisc recorders (Sony), DCC cassettes
(Philips) and MP3
 The method has also been used in speech
coding and recognition
MPEG audio

 MPEG audio uses Subband coding


 Signal is divided into 32 bands (Layer 1)
 The division is done in groups of 384 samples
 An FFT is used to find bands with pure sine
signals and noise
 Only interesting channels are coded
 The bit allocation per channel varies
 Layer 1: over 128 kbps / channel
MPEG audio (cont.)
 Layer 2
› about 128 kbps / channel
› 1152 samples per group
› 3 scaling factors
› 36 frequency bands
 Layer 3 (MP3)
› about 64 kbps / channel
› filter bank
› Huffman coding
› bits used for coding can vary
JPEG
 Objectives
 Architectures
 DCT coding and quantization
 Statistical coding
 Lossless coding
 Efficiency
Objectives

 Compression rate / image quality can be selected


 Works with all kinds of images
 Both software and hardware implementation
 Four different modes
› sequential coding (original order)
› progressive coding (multiphase coding)
› lossless coding (perfect copy)
› hierarchical coding (many resolutions)
JPEG Architectures

 Lossy modes use DCT for 8 x 8 pixel blocks


 Sequential mode outputs the DCT coefficients
block by block
 Progressive mode outputs the DCT coefficients in
groups
 Hierarchical mode encodes several resolutions at
the same time
Sequential JPEG
Progressive JPEG
Hierarchical JPEG
Lossless JPEG
DCT and Quantization

 The DCT coefficients can be represented as a matrix
 The quantization is done according to a
quantization table
 The coefficients are put in Zig-Zag order
 This places the zero coefficients at the end of the
run
 Finally Run-Length coding eliminates the zeros
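The scan and run elimination can be sketched as follows. The toy 8x8 block with three nonzero coefficients is an assumed example, and "EOB" stands in for JPEG's end-of-block symbol:

```python
N = 8  # JPEG processes 8 x 8 blocks

def zigzag_indices(n=N):
    """Visit the block's coordinates diagonal by diagonal, alternating direction."""
    order = []
    for s in range(2 * n - 1):  # s = i + j indexes the diagonals
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

def zigzag(block):
    return [block[i][j] for i, j in zigzag_indices()]

# Toy quantized block: a few nonzero low-frequency coefficients, rest zero.
block = [[0] * N for _ in range(N)]
block[0][0], block[0][1], block[1][0] = 52, 3, -2

scan = zigzag(block)
# The zig-zag order gathers the trailing zeros into one run, which is
# then replaced by a single end-of-block marker.
last = max(i for i, v in enumerate(scan) if v != 0)
coded = scan[:last + 1] + ["EOB"]
print(coded)  # [52, 3, -2, 'EOB']
```

64 coefficients shrink to four symbols because the zeros cluster at the end of the scan.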
DCT coding
DCT Coefficients
Statistical Coding

 Uses either Huffman or arithmetic coding


 Huffman coding requires a separate table
 Arithmetic coding does not require a table, but
needs more computation
 In addition, the compression ratio of arithmetic
coding is 5 - 10 % better
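The Huffman step can be sketched generically (this is a plain Huffman coder built from symbol frequencies, not JPEG's predefined-table format; the input string is an arbitrary example):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code table: frequent symbols get shorter codes."""
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)  # tiebreaker so the heap never has to compare the dicts
    while len(heap) > 1:
        lo = heapq.heappop(heap)  # the two least frequent subtrees...
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tie, merged])  # ...are merged
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
print(codes)  # the most frequent symbol, 'a', gets the shortest code
```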
Lossless Encoding

 Lossless encoding utilizes prediction


 Seven different alternatives
› how many and which pixels are used
 Predictive encoding can reach a compression ratio
of 2:1
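A sketch of one predictor, predicting each pixel from its left neighbor (one of the seven alternatives), showing that decoding restores a perfect copy; the pixel row is an assumed example:

```python
def encode_row(pixels):
    """Store the first pixel as-is, then only the difference to the left neighbor."""
    residuals = [pixels[0]]
    for i in range(1, len(pixels)):
        residuals.append(pixels[i] - pixels[i - 1])
    return residuals

def decode_row(residuals):
    """Rebuild the pixels by accumulating the residuals."""
    pixels = [residuals[0]]
    for r in residuals[1:]:
        pixels.append(pixels[-1] + r)
    return pixels

row = [100, 101, 103, 103, 102, 104]
res = encode_row(row)
print(res)                     # [100, 1, 2, 0, -1, 2] -- small values compress well
print(decode_row(res) == row)  # True -- the copy is perfect (lossless)
```

The small residuals have a skewed distribution, which a statistical coder then exploits.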
Efficiency

 0.25 - 0.5 bpp: reasonable - good quality
 0.5 - 0.75 bpp: good - very good quality
 0.75 - 1.5 bpp: very good quality
 1.5 - 2.0 bpp: same as original
