
Project Report

Uploaded by

akshitarawat045


MINI PROJECT REPORT

UTTARANCHAL INSTITUTE OF TECHNOLOGY / CSE-1

Submitted in partial fulfillment of the

Requirements for the award of
Degree of Bachelor of Technology in Computer Science & Engineering

SUBMITTED BY:

Arti (2101010053)
Anjali Farswan (2101010040)
Ayushi (2101010073)
Chhavi (2101010080)
Semester/Branch: 7th/CSE

SUBMITTED TO:

Prof. Pinky Uniyal

Prof. Amita Bisht

Department of Computer Science & Engineering

UIT, UTTARANCHAL UNIVERSITY
Dehradun (Uttarakhand), 248001.

CERTIFICATE

This is to certify that the work in this project, titled “Speech Recognition”, was entirely
written, successfully completed, and demonstrated by the following students themselves in partial
fulfillment of the requirements for the degree of Bachelor of Technology in Computer Science &
Engineering.
Name of students:
Arti
Anjali Farswan
Ayushi
Chhavi

Supervisor Name: Prof. Pinky Uniyal, Prof. Amita Bisht
Department Name: UIT

ACKNOWLEDGEMENT

We would like to express our deepest appreciation to all those who helped us throughout the
project and without whom it would have been a very difficult task. We would like to thank all of
them.

We are highly indebted to Prof. Amita Bisht for her guidance and constant supervision, for
providing the necessary information regarding this project, and for her support in completing it.
Our thanks and appreciation also go to our colleagues, who helped us with their abilities in
developing the project.

This is to certify that the above statement made by the candidates is true to the best of my
knowledge.

ABSTRACT

Speech recognition is a transformative technology that enables machines to interpret and respond
to human speech, making it an essential component in numerous applications such as virtual
assistants (e.g., Siri, Alexa), automated transcription services, voice-controlled interfaces, and
accessibility tools for individuals with disabilities. The field has seen remarkable advances over the
past few decades, particularly with the rise of machine learning and deep learning techniques,
which have significantly improved the accuracy and real-time performance of speech recognition
systems. Modern systems utilize deep neural networks (DNNs), recurrent neural networks (RNNs),
and transformer-based models, which learn to map raw audio signals to text through training on
large, labeled datasets.
This project investigates the key elements involved in building a speech recognition system,
including data acquisition, preprocessing, feature extraction, model selection, and decoding. We
focus on extracting features such as Mel-frequency cepstral coefficients (MFCCs) from audio data,
training deep learning models to recognize speech patterns, and evaluating system performance
using metrics such as Word Error Rate (WER) and Real-Time Factor (RTF). The project also
explores various challenges faced by speech recognition systems, including handling noisy
environments, differentiating between various accents and dialects, and ensuring real-time
processing without sacrificing accuracy.

Despite the substantial progress made, limitations such as robustness to noise, accent variability,
and computational efficiency remain key hurdles. This report concludes by providing insights into
the current state of speech recognition technology and suggesting directions for future research,
such as exploring end-to-end models, enhancing noise robustness, and leveraging transfer learning
techniques. Ultimately, speech recognition continues to evolve, with ongoing improvements
opening the door to more natural, intuitive, and inclusive human-computer interactions.
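
The two evaluation metrics named above can be made concrete with a short sketch. It is illustrative only and is not the project's own evaluation code: the function names and the example sentences are assumptions. WER is computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and RTF as processing time divided by audio duration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean the system runs faster than real time."""
    return processing_seconds / audio_seconds


# Hypothetical example: one deleted word ("on") and one substitution.
wer = word_error_rate("turn on the kitchen light", "turn the kitchen lights")
rtf = real_time_factor(2.0, 10.0)
print(f"WER = {wer:.2f}, RTF = {rtf:.2f}")
```

A lower WER is better. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.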

TABLE OF CONTENTS

1. Introduction
2. Problem Statement
3. Objectives
4. Literature Review
5. Methodology
   o Data Collection
   o Preprocessing
   o Feature Extraction
   o Model Training
6. Speech Recognition Algorithms
   o Acoustic Models
   o Language Models
   o Decoding
7. Tools and Technologies
8. Applications of Speech Recognition
9. Challenges and Limitations
10. Results and Analysis
11. Conclusion
12. References

INTRODUCTION

Speech recognition is a branch of artificial intelligence (AI) that focuses on enabling machines to
understand and interpret human speech. This technology allows spoken language to be converted
into text, facilitating natural, voice-based communication between humans and computers. Over
the years, speech recognition has become an integral part of numerous applications, including
virtual assistants like Siri, Alexa, and Google Assistant, automated transcription systems, and
accessibility tools for people with hearing impairments.

At its core, speech recognition involves analyzing audio signals and mapping them to words or
phonetic units. Early speech recognition systems, which were based on template matching and
rule-based algorithms, had limited accuracy and could only handle small vocabularies under
controlled conditions. However, with the advent of machine learning, and particularly deep
learning, speech recognition has seen significant improvements in accuracy, robustness, and
scalability. Modern systems can handle large vocabularies, noisy environments, and various
accents with high accuracy and efficiency.
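
The template-matching approach mentioned above can be illustrated with dynamic time warping (DTW), a classic way to compare an utterance against stored word templates when speaking rates differ. The report does not say which matching technique the early systems used, so DTW is an assumption here, and the templates below are hypothetical one-dimensional energy contours rather than real speech features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical per-frame energy contours standing in for real features.
TEMPLATES = {
    "yes": [0.1, 0.8, 0.9, 0.3],
    "no": [0.7, 0.6, 0.2, 0.1],
}
# A slower rendition of "yes": same shape, stretched in time.
print(recognize([0.1, 0.1, 0.8, 0.8, 0.9, 0.3, 0.3], TEMPLATES))
```

Because every stored word needs its own template, this style of matching scales poorly, which is consistent with the small vocabularies described above.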

HOW SPEECH RECOGNITION WORKS

1. Capture: Sound is captured by a microphone.

2. Preprocessing: The audio is cleaned, segmented, and normalized.

3. Feature Extraction: The audio is converted into features (like MFCCs or spectrograms) that
represent the relevant aspects of the speech.

4. Acoustic Modeling: A model is used to map these features to phonetic units or word
representations.

5. Language Modeling: The context of the speech is analyzed using a language model to predict
the next most probable word or phrase.

6. Decoding: A decoder uses the outputs from both models to generate the final transcription.

7. Post-processing: The output text is cleaned up for readability and accuracy.

8. Output: The transcription is made available for the user or application.
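
Steps 2 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's pipeline: it uses log energy and zero-crossing rate as simple stand-ins for MFCCs, a synthetic tone instead of recorded speech, and an assumed 16 kHz sampling rate with 25 ms frames (400 samples) taken every 10 ms (160 samples). All function names are hypothetical.

```python
import math


def normalize(samples):
    """Scale samples so the largest absolute value is 1.0."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak > 0 else list(samples)


def frame(samples, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frm):
    """Log of the mean squared amplitude; the small constant avoids log(0)."""
    return math.log(sum(s * s for s in frm) / len(frm) + 1e-10)


def zero_crossing_rate(frm):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for x, y in zip(frm, frm[1:]) if (x >= 0) != (y >= 0))
    return crossings / (len(frm) - 1)


def extract_features(samples, frame_len=400, hop=160):
    """Return one (log energy, zero-crossing rate) pair per frame."""
    frames = frame(normalize(samples), frame_len, hop)
    return [(log_energy(f), zero_crossing_rate(f)) for f in frames]


# Synthetic input: 0.1 s of a 440 Hz tone sampled at 16 kHz (assumed rate).
RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * t / RATE) for t in range(1600)]
features = extract_features(tone)
print(len(features), "frames")
```

A real system would replace the two per-frame values with MFCCs or spectrogram slices, but the normalize, frame, and per-frame feature structure stays the same.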

Speech recognition systems have advanced dramatically due to innovations in machine learning,
particularly deep learning. They are now capable of handling large vocabularies, noisy
environments, and varied accents with impressive accuracy. While challenges like noise
robustness, real-time performance, and accent variation remain, modern systems continue to
improve, enabling new possibilities in voice-activated technologies, transcription services, and
human-computer interaction.
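
How the language model and decoder (steps 5 and 6) interact can be shown with a toy example. The sketch below is an assumption-laden illustration, not the system described in this report: it trains an add-one smoothed bigram model on a three-sentence made-up corpus and uses it to rescore two candidate transcriptions whose acoustic scores are invented log-probabilities.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word pairs, with <s> marking the start of each sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def lm_log_prob(counts, sentence, vocab_size):
    """Add-one smoothed bigram log-probability of a sentence."""
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        seen = counts[prev]
        total += math.log((seen[word] + 1) / (sum(seen.values()) + vocab_size))
    return total


def decode(hypotheses, counts, vocab_size, lm_weight=1.0):
    """Pick the hypothesis with the best acoustic + weighted LM score."""
    def score(hyp):
        text, acoustic_log_prob = hyp
        return acoustic_log_prob + lm_weight * lm_log_prob(counts, text, vocab_size)
    return max(hypotheses, key=score)[0]


# Made-up training text and candidate transcriptions (illustrative only).
corpus = ["recognize speech", "recognize speech today", "a nice beach"]
counts = train_bigram(corpus)
vocab_size = len({w for s in corpus for w in s.split()})
hypotheses = [("wreck a nice beach", -4.0), ("recognize speech", -4.2)]
print(decode(hypotheses, counts, vocab_size))
```

With `lm_weight=0.0` the decoder falls back to the acoustic score alone and selects the other candidate, which shows what the language model contributes. Production decoders search over far larger hypothesis spaces, typically with beam search.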

TYPES OF SPEECH RECOGNITION

Speech recognition systems can be classified by several factors, including vocabulary size, the
style of speech they accept, and their adaptability to individual speakers:

• Isolated Word Recognition: These systems recognize individual words, but each word must
be separated by a clear pause (i.e., not continuous speech). They are commonly used for
command-style input such as voice-controlled devices.
• Continuous Speech Recognition: These systems recognize natural, continuous speech,
allowing users to speak at a normal pace. They are more complex and require more
advanced algorithms to handle variations in speech patterns and background noise.
• Speaker-Dependent vs. Speaker-Independent: Speaker-dependent systems require
training with a particular user's voice to improve accuracy, while speaker-independent
systems aim to recognize speech from any speaker without prior adaptation.
• Large Vocabulary Continuous Speech Recognition (LVCSR): LVCSR systems are
capable of recognizing thousands or even millions of words, making them suitable for
real-world applications like dictation, virtual assistants, and transcription services.

APPLICATIONS OF SPEECH RECOGNITION

Speech recognition is used in a wide range of applications. Some of the most notable are:

• Virtual Assistants: AI-powered voice assistants such as Apple’s Siri, Google Assistant,
and Amazon’s Alexa use speech recognition to understand user commands and perform
tasks like setting reminders, playing music, or answering questions.
• Transcription Services: Speech-to-text systems are widely used to transcribe meetings,
interviews, lectures, and podcasts. These systems allow for quicker documentation and
improved accessibility.
• Accessibility: For individuals with disabilities, speech recognition provides a powerful tool
for interacting with technology. It enables voice-controlled computing for people with
mobility or vision impairments, helping them use devices, write emails, and browse the
web.
• Customer Support: Speech recognition is widely used in customer service applications to
automate phone systems, enabling users to interact with systems using natural language
rather than needing to navigate through a series of options.
• Healthcare: Medical professionals use speech recognition to dictate patient notes and
transcribe medical records, which reduces administrative workload and improves
efficiency.
