Introduction To Speech Recognition

This article provides an overview of speech recognition technology, detailing its evolution, operation, and applications. It explains the distinction between speech recognition and voice recognition, the role of algorithms and machine learning in improving accuracy, and highlights advancements and ongoing challenges in the field. Key applications include virtual assistants, transcription services, and accessibility tools, emphasizing the efficiency and convenience offered by speech recognition technology.

Uploaded by

Kezia

Introduction

This article presents an overview of speech recognition technology, including its evolution,
operation, and applications. It looks at the underlying technologies, such as algorithms and
machine learning, that allow speech recognition systems to interpret human speech. It also
discusses the advances made in the field, from its early days to the present, as well as the
obstacles that remain. The article is a summary of what I've learnt, compiled into one piece to
give readers a brief understanding of speech recognition technology.

What is Speech Recognition?


Speech recognition is the ability of a program to recognize human speech and convert it into
written text. It is also referred to as ASR (Automatic Speech Recognition), Computer Speech
Recognition, or Voice-to-Text. It is widely used in virtual assistants, transcription services,
voice-controlled gadgets, and accessibility tools for individuals with disabilities. It allows
hands-free control, voice commands, and the quick conversion of spoken words into written
material, making day-to-day interactions more efficient and accessible. Many people confuse
speech recognition with voice recognition. They are not the same! Speech recognition tries to
turn a person's spoken words into text, while voice recognition tries to identify who is
speaking. Keep this in mind before going further: both involve speech, but each has its own
purpose and its own system behind it.

Algorithm
So how is a speech recognition model able to recognize our spoken words? It is because each
speech recognition model is guided by the algorithms implemented in it. An algorithm is a set of
instructions carried out to solve a problem or complete a task. In computer science, algorithms
tell machines what to do. In speech recognition, the algorithm directs how the model processes
and differentiates sounds, recognizes patterns, predicts words based on context, and adjusts to
variations like accents or noise. It defines the steps for accurate speech-to-text conversion.
Algorithms are important because they specify how the machine processes data in order to learn
and make decisions. The better the algorithm, the better and more efficiently the machine
learns.
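
As a rough illustration of "predicting words based on context", the sketch below counts which word tends to follow which in a few example sentences, then uses those counts to choose between words that sound alike. The sentences, the function names (`train_bigrams`, `pick_by_context`) and the candidate list are all made up for this example; real systems use far larger text collections and more capable language models.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def pick_by_context(follows, previous_word, candidates):
    """Choose the candidate seen most often right after previous_word."""
    return max(candidates, key=lambda word: follows[previous_word][word])

# Hypothetical training sentences (an assumption for illustration only).
corpus = [
    "i want to go home",
    "i want to listen to music",
    "she has two cats",
]
follows = train_bigrams(corpus)

# "to", "two" and "too" sound the same, so the previous word decides.
sound_alikes = ["two", "to", "too"]
print(pick_by_context(follows, "want", sound_alikes))
print(pick_by_context(follows, "has", sound_alikes))
```

The audio alone cannot separate these three words, so the choice falls to the surrounding text, which is the kind of step the algorithm has to spell out.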

Machine Learning
Machine learning is a subcategory of artificial intelligence in which computers are trained to
learn from data without being explicitly programmed for every little task. This means that a
machine detects patterns in data, makes predictions using algorithms, and keeps learning from
experience as it processes more information.

For example, with traditional programming you would identify cats by writing a program that
lists every feature that distinguishes a cat, like its sharp retractable claws, pointy ears,
slit-shaped pupils, whiskers and so on. With machine learning, you instead give the machine
thousands of images of cats. Guided by the algorithm (the set of instructions), the machine
identifies and classifies the patterns in each image. By learning these patterns, it discovers
the characteristics that define something as a cat. This is how image recognition works.
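
The idea of learning from labelled examples instead of hand-written rules can be sketched with a very small classifier. This is only a toy under stated assumptions: the two numeric "features" (ear pointiness and whisker length) and their values are invented, and real image recognition learns its features from pixels with much larger models. The method shown, nearest centroid, is swapped in for simplicity and is not what the article's sources describe.

```python
from math import dist

def train_centroids(examples):
    """Average the feature vectors of each label to get one centroid per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the new feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical measurements: (ear pointiness, whisker length), both scaled 0..1.
training_data = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.3, 0.2), "dog"),
    ((0.2, 0.3), "dog"),
]

centroids = train_centroids(training_data)
print(classify(centroids, (0.85, 0.7)))  # a new, unseen animal
```

No rule such as "cats have pointy ears" is written anywhere in the code; the boundary between the two labels comes entirely from the examples.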

The way speech recognition works doesn't stray far from the principles of image recognition.
The machine is exposed to thousands of audio samples with different voices, accents, and tones.
It analyzes the sounds by breaking the audio waveform into short segments and extracting key
features that distinguish speech sounds such as vowels and consonants (phonemes). Algorithms
help the machine identify how these sounds blend to make words and sentences, so it can make
predictions when exposed to new speech. Over time, the system improves its performance as it
ingests new data, adapts to differences like accents or background noise, and refines its
accuracy.
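
The first step described above, cutting the waveform into short segments and measuring each one, can be sketched as follows. The signal here is synthetic (a pure tone followed by silence) and the two features, energy and zero-crossing rate, are simple stand-ins chosen for this example; production systems typically use richer spectral features. All names and numbers are assumptions.

```python
import math

def frame_signal(samples, frame_size, hop_size):
    """Split a list of samples into overlapping fixed-length frames."""
    return [
        samples[start:start + frame_size]
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]

def energy(frame):
    """Mean squared amplitude: loud frames score higher than quiet ones."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

# Synthetic stand-in for recorded audio: 0.1 s of a 200 Hz tone, then 0.1 s of silence.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
silence = [0.0] * 800
signal = tone + silence

# 25 ms frames (200 samples) that advance by 12.5 ms (100 samples).
frames = frame_signal(signal, frame_size=200, hop_size=100)
features = [(energy(f), zero_crossing_rate(f)) for f in frames]

for index, (e, z) in enumerate(features):
    print(f"frame {index:2d}  energy={e:.3f}  zcr={z:.3f}")
```

Each frame is reduced to a couple of numbers, and it is sequences of such numbers, not the raw audio, that a recognizer learns patterns from.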

Evolution of Speech Recognition


An algorithm's design has a significant impact on the machine learning process. A badly
constructed algorithm results in ineffective pattern recognition, slow learning, and wasteful
resource consumption. A well-designed algorithm, by contrast, makes learning more effective
because it recognizes patterns, adapts to new data, and uses resources efficiently. This leads
to faster and more trustworthy machine learning models. For a long time, engineers have been
developing more efficient algorithms for speech recognition models, constantly looking for ways
to increase their ability to recognize words accurately. Here is a brief history of how speech
recognition has progressed.

● The Start (1960-1999)


In 1962 IBM introduced "Shoebox", which could recognize only 16 English words. In the late
1960s continuous speech recognition was developed; earlier machines required users to pause
after each word. From 1971 to 1976, the Defense Advanced Research Projects Agency (DARPA)
funded five years of speech recognition research, which produced "Harpy", a machine capable of
understanding 1,011 words. Subsequently, new approaches such as hidden Markov models (HMMs)
were implemented in speech recognition systems, allowing machines to recognize speech more
accurately by estimating the probability of unknown sounds. By the mid-1980s, the Tangora
system could recognize 20,000 words. This huge leap was made possible by hidden Markov models,
improved statistical models, and access to larger training datasets. In the 1990s, speech
recognition began to be integrated into commercial products, including Apple computers.
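
To make the idea behind hidden Markov models a little more concrete, here is a minimal sketch of the Viterbi algorithm, a standard way to find the most probable sequence of hidden states (here, speech sounds) given what was observed. The two states, the two observation labels and every probability below are invented for illustration and are not taken from Harpy, Tangora or any real system.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the best previous state to arrive from.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])

# Hypothetical toy model for the word "see": an "s" sound followed by an "ee" sound.
states = ["s", "ee"]
start_p = {"s": 0.8, "ee": 0.2}
trans_p = {"s": {"s": 0.4, "ee": 0.6}, "ee": {"s": 0.1, "ee": 0.9}}
emit_p = {"s": {"hiss": 0.9, "tone": 0.1}, "ee": {"hiss": 0.2, "tone": 0.8}}

heard = ["hiss", "hiss", "tone", "tone"]  # one coarse label per audio frame
probability, path = viterbi(heard, states, start_p, trans_p, emit_p)
print(path)  # ['s', 's', 'ee', 'ee']
```

The machine never observes the sounds "s" or "ee" directly; it only sees the noisy labels and picks the explanation with the highest probability.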

● Year 2000 - The Future


By 2001, speech recognition technology had achieved nearly 80% accuracy. For most of the
decade there were only a few advancements, until Google launched Google Voice Search. Because
it was an app, millions of people gained access to speech recognition technology. It was also
significant because the processing could be offloaded to Google's data centers. Not only that,
but Google was collecting data from billions of searches to help it predict what a person was
saying. At the time, Google's English Voice Search system drew on 230 billion words from user
searches. In 2011, Apple launched Siri, and other voice assistants followed in the same decade,
such as Amazon's Alexa and Google Home. By some accounts, there has been more progress in
speech recognition technology in the last 30 months than in the first 30 years. Even so,
speech recognition has not reached its limit. There is still room for improvement in areas
such as:
1. Background Noise: Recognizing speech in a noisy environment remains a challenge. Further
work could improve noise filtering and adaptation to different acoustic environments.
2. Accent and Dialect Recognition: There are many accents and dialects around the world.
This is a challenge for speech recognition models because some accents have less training
data available. Example: English spoken with a non-native accent.

As time goes by, speech recognition technologies are bound to get better. More and more
people will become comfortable talking to machines as these systems help them accomplish
the tasks at hand.
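
Accuracy figures like the "nearly 80%" quoted above are commonly derived from the word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the system's output into the correct transcript, divided by the number of words in the correct transcript. The sketch below is one simple way to compute it, assuming whitespace-separated words and a non-empty reference; the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical example: one of five words is misheard.
spoken = "turn on the kitchen lights"
recognized = "turn on the chicken lights"
wer = word_error_rate(spoken, recognized)
print(f"WER = {wer:.0%}, word accuracy = {1 - wer:.0%}")
```

One wrong word out of five gives a WER of 20%, which corresponds to 80% word accuracy.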

Applications of Speech Recognition


Speech recognition technology is used in many different areas. The best-known applications are
virtual assistants such as Siri, Alexa and Google Assistant, which let users operate their
gadgets and carry out simple tasks using only their voice. Another major application is
transcription, where spoken content is converted to written text. This is especially helpful
for creating documentation, subtitles, and transcripts of meetings or lectures. In cars,
speech recognition means drivers don't have to take their hands off the wheel or their eyes
off the road to manage music, navigation, and other in-car features, which is both more
convenient and safer. It is also useful in education, for example in language learning
applications where a student can practice pronunciation and comprehension. In terms of
efficiency, speech recognition reduces the time and effort required to perform tasks: users
can often complete a task faster by speaking than by typing, leaving more time for other
activities.

Sources:
https://www.ibm.com/topics/speech-recognition
https://en.wikipedia.org/wiki/Timeline_of_speech_and_voice_recognition#Overview
https://en.wikipedia.org/wiki/Speech_recognition
https://sonix.ai/history-of-speech-recognition#:~:text=1950s%20and%2060s,four%20vowels%20and%20nine%20consonants.
https://transkriptor.com/speech-recognition/
https://verbit.ai/captioning/what-is-voice-recognition-used-for-and-how-does-it-work/#:~:text=Voice%20recognition%20technology%20can%20interpret,claim%20to%20be%20when%20speaking.
