Introduction To Speech Recognition

This article provides an overview of speech recognition technology, detailing its evolution, operation, and applications. It explains the distinction between speech recognition and voice recognition, the role of algorithms and machine learning in improving accuracy, and highlights advancements and ongoing challenges in the field. Key applications include virtual assistants, transcription services, and accessibility tools, emphasizing the efficiency and convenience offered by speech recognition technology.

Uploaded by

Kezia


Introduction

This article presents an overview of speech recognition technology, including its evolution,
operation, and applications. It looks at the underlying technologies, such as algorithms
and machine learning, that allow speech recognition systems to interpret and comprehend
human speech. It also discusses the advances made in the field, from its early days to the
present, as well as the obstacles that remain. The article is a summary of what I have learnt,
compiled into one piece to give readers a brief understanding of speech recognition
technology.

What is Speech Recognition?

Speech recognition is a capability that lets a program recognize human speech and convert it
into written language. It is also referred to as ASR (Automatic Speech Recognition), Computer
Speech Recognition, or Voice-to-Text. It is widely used in applications such as virtual
assistants, transcription services, voice-controlled gadgets, and accessibility tools for
individuals with disabilities. It allows hands-free control, voice commands, and fast
conversion of spoken words into written material, making day-to-day interactions more
efficient and accessible.

Many people confuse speech recognition with voice recognition. They are not the same! Speech
recognition tries to translate a person's spoken words into text, while voice recognition
tries to identify who is speaking. Keep that distinction in mind for the rest of this article:
both involve speaking, but each has its own purpose and its own system behind it.

Algorithm
So how is a speech recognition model able to recognize spoken words? It is because each speech
recognition model is guided by the algorithms implemented in it. An algorithm is a set of
instructions that is carried out to solve a problem or complete a task. In computer science,
algorithms instruct machines on what to do. In the case of speech recognition, the algorithm
guides a model by directing how to process and differentiate sounds, recognize patterns,
predict words based on context, and adjust to variations like accents or noise. It defines the
steps for accurate speech-to-text conversion. Algorithms are important because they specify
how the machine will process data in order to learn and make decisions. The better the
algorithm, the better and more efficiently the machine will learn.

Machine Learning
Machine learning is a subfield of artificial intelligence in which computers are trained to
learn from data without being explicitly programmed for every little task. This means that a
machine detects patterns in data, makes predictions using algorithms, and keeps learning from
experience as it processes more information.

For example, in traditional programming you would identify cats by writing a program that lists
every feature that distinguishes a cat, like its sharp retractable claws, pointy ears,
slit-shaped pupils, whiskers and so on. With machine learning, you instead give the machine
thousands of images of cats. Guided by the algorithm (the set of instructions), the machine
identifies and classifies the patterns in each image. By learning these patterns, it discovers
the characteristics that define something as a cat. This is how image recognition works.
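
The idea of learning from examples instead of hand-written rules can be sketched in a few lines. The example below is only an illustration: the two numeric features (ear pointiness and whisker length, scored from 0 to 1) and the tiny data set are hypothetical, and a nearest-centroid classifier is used because it is one of the simplest learning methods, not because real image recognition works this way.

```python
from statistics import mean

def train(examples):
    """Learn one average feature vector (centroid) per label from labelled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Hypothetical training data: [ear pointiness, whisker length], each scored 0..1.
examples = [
    ([0.9, 0.8], "cat"),
    ([0.8, 0.9], "cat"),
    ([0.2, 0.1], "not cat"),
    ([0.3, 0.2], "not cat"),
]

model = train(examples)
print(predict(model, [0.85, 0.7]))
```

No rule such as "pointy ears mean cat" is written anywhere in the program; the boundary between the two labels comes entirely from the examples, so adding more or better examples changes what the model predicts.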

The functionality behind speech recognition does not stray far from the principles of image
recognition. In a similar way, the machine is exposed to thousands of audio samples with
different voices, accents, and tones. It analyzes the sounds by breaking the waveform into
short segments and extracting key features that distinguish speech sounds (phonemes) such as
vowels and consonants. Algorithms help the machine identify how these sounds blend to make
words and sentences, so it can make predictions when exposed to new speech. Over time, the
system improves its performance as it ingests new data, adapts to differences like accents or
background noise, and refines its accuracy.
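
As a rough sketch of what "breaking the waveform into segments and extracting features" can look like, the example below splits a synthetic signal into overlapping frames and computes two simple features per frame. Everything here is an assumption for illustration: the 8000 Hz sample rate, the frame sizes, and the two test tones are made up, and production systems typically rely on richer spectral features than energy and zero-crossing rate.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def frames(samples, frame_size, hop):
    """Split a waveform into overlapping fixed-size frames."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]

def energy(frame):
    """Average squared amplitude: loud sounds score high."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign: high-pitched or hissy sounds score high."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def tone(freq_hz, amplitude, n):
    """Generate n samples of a sine wave, standing in for recorded audio."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n)]

# A loud low tone (loosely vowel-like) followed by a quiet high tone (loosely hiss-like).
signal = tone(200, 1.0, 800) + tone(3000, 0.2, 800)

features = [(energy(f), zero_crossing_rate(f)) for f in frames(signal, 200, 100)]
print(features[0], features[-1])
```

The first frames show high energy and few zero crossings, while the last frames show the opposite, so even these two crude numbers are enough to tell the two kinds of sound apart. A learning algorithm would be trained on feature vectors like these rather than on the raw waveform.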

Evolution of Speech Recognition

An algorithm's design has a significant impact on the machine learning process. A badly
constructed algorithm results in ineffective pattern recognition, slow learning, and wasteful
resource consumption. A well-designed algorithm, by contrast, makes learning more effective
because it recognizes patterns, adapts to new data, and uses resources efficiently. This leads
to faster and more trustworthy machine learning models. For a long time, engineers have been
developing more efficient algorithms for speech recognition models, constantly looking for
ways to increase their ability to recognize words accurately. Here is a brief history of how
speech recognition has progressed.

● The Start (1960-1999)

In 1962, IBM introduced “Shoebox”, which could recognize only 16 English words. Early
machines required users to pause after each word; continuous speech recognition was
developed in the late 1960s. From 1971 to 1976, the Defense Advanced Research Projects
Agency (DARPA) funded five years of speech recognition research, which produced a
machine named ‘Harpy’ capable of understanding 1,011 words. Subsequently, new approaches
such as hidden Markov models (HMMs) were implemented in speech recognition systems,
allowing machines to recognize speech more accurately by estimating the probability that
an unknown sound corresponds to a given word. By the mid-1980s, the Tangora system could
distinguish 20,000 words. This huge leap was made possible by hidden Markov models,
improved statistical models, and access to larger training datasets. In the 1990s, speech
recognition began to be integrated into commercial products, including Apple computers.
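
To make the hidden Markov model idea more concrete, here is a minimal sketch of Viterbi decoding, a standard way to find the most probable sequence of hidden states behind a sequence of observations. The model is a toy: the two hidden states ("s" and "iy", standing in for speech sounds), the observation labels ("hiss" and "tone"), and all probabilities are invented for illustration and are not taken from any of the historical systems mentioned above.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence and its probability."""
    # best[s] = (probability, path) of the best path so far that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the best previous state to arrive from.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda candidate: candidate[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    prob, path = max(best.values(), key=lambda candidate: candidate[0])
    return path, prob

# Hypothetical toy model: which speech sound produced each acoustic observation?
states = ["s", "iy"]
start_p = {"s": 0.6, "iy": 0.4}
trans_p = {"s": {"s": 0.7, "iy": 0.3}, "iy": {"s": 0.2, "iy": 0.8}}
emit_p = {"s": {"hiss": 0.9, "tone": 0.1}, "iy": {"hiss": 0.2, "tone": 0.8}}

path, prob = viterbi(["hiss", "hiss", "tone", "tone"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For this input the decoder chooses "s" for the two hissy observations and "iy" for the two tonal ones. The point of the sketch is that the machine never observes the sounds' identities directly; it infers them from probabilities, which is why more training data (better probability estimates) translated into larger vocabularies.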

● Year 2000 - The Future

By 2001, speech recognition technology had achieved nearly 80% accuracy. For most of
the decade there were only a few advancements, until Google launched Google Voice
Search. Because it was an app, millions of individuals gained access to speech
recognition technology. It was also significant because the processing could be
offloaded to Google's data centers. Not only that, but Google was collecting data from
billions of searches to help it predict what a person is saying. At the time, Google's
English Voice Search System contained 230 billion words from user searches. In 2011,
Apple launched Siri, and in that same decade other voice assistants were released, such
as Amazon’s Alexa and Google Home. It has been said that there has been more progress in
speech recognition technology in the last 30 months than in the first 30 years. While
speech recognition has made significant progress, it has not reached its limit. There is
still room for improvement, particularly in two areas:
1. Background Noise: Recognizing speech in a noisy environment remains a
challenge. Further work could be done on noise filtering and on adapting to
different acoustic environments.
2. Accent and Dialect Recognition: There are many different accents around
the world. This is a challenge for speech recognition models because some
accents have less training data. Example: English spoken with a non-native accent.
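
One common way to put a number on how much noise or an unfamiliar accent hurts a recognizer is the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into the correct transcript, divided by the length of the correct transcript. The sketch below is a minimal version that assumes simple whitespace tokenization and a non-empty reference; the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace tokenization and a non-empty reference transcript.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] = edits needed to match the reference words seen so far
    # against the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# Hypothetical example: what was said versus what a recognizer heard in a noisy room.
wer = word_error_rate("turn on the kitchen lights", "turn on the chicken light")
print(f"WER: {wer:.0%}, word accuracy: {1 - wer:.0%}")
```

Here two of the five words are wrong, so the WER is 40% and the word accuracy is 60%. Running the same recognizer on clean and noisy recordings, or on speakers with different accents, and comparing the resulting WER values is a simple way to see where the remaining weaknesses lie.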

As time goes by, speech recognition technologies are bound to get better. More and
more people will become comfortable talking with machines as these systems help them
accomplish the tasks at hand.

Applications of Speech Recognition

Speech recognition technology is used in many different areas. The most visible are virtual
assistants such as Siri, Alexa, and Google Assistant, which let users interact with their
gadgets and carry out simple tasks using only their voice. Another major application is
transcription, which converts spoken content into written text. This is especially helpful
for creating documentation, subtitles, and transcripts of meetings or lectures. In cars,
speech recognition means drivers do not have to take their hands off the wheel or their eyes
off the road to manage music, navigation, and other in-car features, which makes driving more
convenient and safer. It also has a role in education, for example in language learning
applications where a student can practice pronunciation and comprehension. In terms of
efficiency, speech recognition reduces the time and effort required to perform tasks. The
time saved is particularly valuable: users can often complete tasks faster by speaking than
by typing, leaving more time for other activities.

