
Sec 3 - Speech Recognition - Intro

The document explains sound as vibrations that carry information and distinguishes between voice and speech. It details the process of Automatic Speech Recognition (ASR), which converts acoustic signals into text, and outlines the characteristics of audio files, including formats like MP3 and WAV, as well as concepts like audio sampling, sample rate, and bit depth. These elements are crucial for understanding audio quality and resolution in digital recordings.

Uploaded by

amrt6958

Speech recognition

Sec 3
What is Sound?

• Sound is vibrations that travel through the air and are heard by a person's ear.
• Sound is a variation in air pressure.
• Sound is an information bearer.
What is voice?

• Voice → the sound produced by humans.
• Speech → what comes out of the mouth after that sound is modified by the throat muscles, palate, tongue, lips, teeth, etc.
Automatic Speech Recognition (ASR)

ASR is the process of converting an acoustic signal, captured by a microphone or a telephone, into a sequence of words.
ASR Processing Steps

1. Audio signal acquisition
2. Preprocessing
3. Feature extraction
4. Classification
5. Recognition (text)
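The five stages can be pictured as functions chained one after another. The following is only a toy sketch under loud assumptions: every function name, the synthesized 440 Hz tone, the frame length, and the energy threshold are hypothetical placeholders, and the "classifier" merely separates sound from silence. A real recognizer would use trained acoustic and language models at stages 3 to 5.

```python
import math

def acquire(sample_rate=8000, seconds=0.1):
    """Stage 1 stand-in: synthesize a 440 Hz tone instead of reading a microphone."""
    n = int(sample_rate * seconds)
    return [0.5 * math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n)]

def preprocess(signal):
    """Stage 2: scale the signal so its peak absolute amplitude is 1.0."""
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]

def extract_features(signal, frame_len=200):
    """Stage 3: mean energy of each non-overlapping frame."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len for i in starts]

def classify(features, threshold=0.1):
    """Stage 4: toy classifier that labels each frame as sound or silence."""
    return ["sound" if energy > threshold else "silence" for energy in features]

def recognize(labels):
    """Stage 5: join the frame labels into a text output."""
    return " ".join(labels)

text = recognize(classify(extract_features(preprocess(acquire()))))
print(text)
```

The point of the sketch is the data flow: each stage consumes the previous stage's output, so the final text depends on every step before it.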
Characteristics of Audio Files

MP3 (not hi-res):

• Popular, lossy compressed format.
• Ensures small file size, but is far from the best sound quality.
• Convenient for storing music on phones and iPods.

WAV (hi-res):
• Usually holds uncompressed PCM audio, the same encoding used on audio CDs.
• Great sound quality, but because it is uncompressed, file sizes are huge (especially for hi-res files).
• Has poor metadata support (that is, album artwork, artist and song title information).
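To get a feel for why uncompressed files are so large, the size of raw PCM audio can be estimated from the sample rate and bit depth (both described later in this section), the number of channels, and the duration. This is a rough sketch: the function name is hypothetical, the CD parameters (44.1 kHz, 16-bit, stereo) are the standard audio CD values, and the small WAV header is ignored.

```python
def pcm_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Approximate size of uncompressed PCM audio, ignoring the file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * seconds

# One minute of CD-quality stereo audio: 44,100 samples/s, 16 bits, 2 channels.
cd_minute = pcm_size_bytes(44_100, 16, 2, 60)
print(cd_minute)  # 10584000 bytes, roughly 10 MB per minute
```

A lossy format such as MP3 is much smaller because it discards part of the signal instead of storing every sample in full.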

Sound as a Waveform
Audio Sampling → conversion from analog sound waves to digital audio.

[Figure: sampled waveform, amplitude vs. time]

The audio signal in a file is a series of samples that capture the amplitude of the sound over time.
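As an illustration, a pure tone can be turned into such a series of samples by evaluating it at evenly spaced instants. This is a minimal sketch, assuming an ideal sine wave stands in for the analog signal; the function name and the tiny 8-samples-per-second rate are chosen only to keep the numbers readable.

```python
import math

def sample_sine(freq_hz, sample_rate, n_samples):
    """Return n_samples amplitude values of a sine wave taken every 1/sample_rate seconds."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]

# One full cycle of a 1 Hz wave captured with 8 samples per second.
samples = sample_sine(1.0, 8, 8)
for n, amplitude in enumerate(samples):
    print(f"t = {n / 8:.3f} s  amplitude = {amplitude:+.3f}")
```

Each printed row is one sample: a time position and the amplitude measured at that instant.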
Sample rate:
• The number of audio samples recorded each second, measured in hertz (Hz) or samples per second.
• Crucial for reproducing high frequencies: a recording can only represent frequencies below half its sample rate.
• The more samples you take per second, the more closely the final digital file will resemble the original.
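A small experiment hints at why high frequencies need a high sample rate. In this sketch (the function name and the chosen frequencies are illustrative assumptions), a 5000 Hz tone sampled at 8000 Hz produces exactly the same values as an inverted 3000 Hz tone, so after sampling the two cannot be told apart. This effect is known as aliasing.

```python
import math

def sample_tone(freq_hz, sample_rate, n_samples):
    """Sample a sine wave of freq_hz at the given sample rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]

rate = 8000                           # samples per second
low = sample_tone(3000, rate, 16)     # below half the sample rate (4000 Hz)
high = sample_tone(5000, rate, 16)    # above half the sample rate

# Every sample of the 5000 Hz tone equals the negated sample of the 3000 Hz tone.
aliased = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))
print(aliased)
```

Raising the sample rate above twice the highest frequency of interest removes the ambiguity.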
Bit depth:
• Every sample taken while making an audio recording is stored as a fixed number of bits.
• Bit depth sets the number of possible amplitude values we can record for each audio sample: n bits give 2^n values.
• The more bits you use to record each sample, the better the sound reproduction.
• The most common audio bit depths are 16-bit, 24-bit, and 32-bit.
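The number of amplitude values available at each common bit depth can be worked out directly. This is a small sketch with a hypothetical function name; it assumes integer (fixed-point) samples, whereas 32-bit audio is in practice often stored as floating point.

```python
def amplitude_levels(bit_depth):
    """Number of distinct amplitude values an integer sample of this bit depth can hold."""
    return 2 ** bit_depth

levels = {bits: amplitude_levels(bits) for bits in (16, 24, 32)}
for bits, count in levels.items():
    print(f"{bits}-bit: {count:,} levels")
```

Each extra bit doubles the number of levels, which is why going from 16-bit to 24-bit multiplies the amplitude resolution by 256.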

Increasing the audio bit depth, along with increasing the audio sample
rate, creates more total points to reconstruct the analog wave.
Sample rate vs bit depth

• Sample rate determines the number of snapshots taken to recreate the original sound wave.
• Bit depth determines how many amplitude values each of those snapshots can take.
• Together, bit depth and sample rate determine audio resolution.
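For example, a sound wave can be sampled at each sample point in time:

[Figure: waveform with marked sample points]

The sound recorded at each sample point is then converted to its nearest numeric equivalent:

[Figure: sampled values rounded to the nearest numeric level]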
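The rounding step can be sketched in a few lines. This is an illustrative assumption rather than the method used by any particular recorder: amplitudes are taken to lie between -1.0 and 1.0, the function name is hypothetical, and a 3-bit depth is used only so the integer codes stay small enough to check by hand.

```python
def quantize(sample, bit_depth):
    """Map an amplitude in [-1.0, 1.0] to the nearest signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1       # 3 for 3-bit, 32767 for 16-bit
    clipped = max(-1.0, min(1.0, sample))     # keep out-of-range input in bounds
    return round(clipped * max_code)

codes = [quantize(s, 3) for s in (0.0, 0.3, 0.8, -1.0)]
print(codes)  # [0, 1, 2, -3]
```

With only 3 bits, 0.3 and 0.8 land on codes 1 and 2, a visible rounding error; at 16 bits the same amplitudes map to 9830 and 26214, far closer to the original values.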
