OENG1167-EB-ET-project-proposal-voice Recognition
Project Proposal
Noise Reduction for Automotive Voice Recognition
Academic Supervisor:
PRIVATE AND CONFIDENTIAL
Table of Contents
Section 1 - Executive Summary
Section 2 - Statement of Problem
Section 3 - Literature Review
Section 3.1 - Microphone Array Beamforming
Section 3.2 - Microphone Array Hardware
Section 3.3 - Adaptive Beamforming Algorithms
Section 3.4 - Beamforming with NR Algorithms
Section 3.5 - Noise Reduction and VAD Algorithms
Section 4 - Design Questions
Section 5 - Methodology
Section 5.1 - Design Methodology
Section 5.2 - Resource Planning
Section 5.3 - Alternative Designs
Section 5.4 - Project Timeline
Section 6 - Risk Management and Ethical Considerations
Section 6.1 - Risk Assessment
Section 6.1.1 - SWOT Analysis
Section 6.1.2 - Risk Solution Chart
Section 6.2 - Ethical Considerations
Section 7 - References
Section 1 - Executive Summary
This proposal will outline the problem, along with the requirements of our industry
sponsors, Fiberdyne Systems, and their clients, SoftBank and Renesas. We also present our
initial research into some DSP solutions relevant to the project, and then provide a plan for
the design and development of the noise reduction system. The requirements will be
refined during a key project meeting with the clients in April.
By the completion of this project, we aim to have a working prototype of a standalone
embedded DSP hardware system that improves the signal-to-noise ratio (SNR) of the voice
signal being passed through the system.
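The SNR improvement the prototype aims for can be quantified against a clean reference recording. The sketch below is illustrative only: the tone stand-in, sample rate, and noise levels are assumed, not taken from the project.

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR of `noisy` measured against a known clean reference, in dB."""
    noise = noisy - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 1000 * t)             # stand-in for a voice signal
before = voice + 0.5 * rng.standard_normal(fs)   # raw microphone input
after = voice + 0.25 * rng.standard_normal(fs)   # after hypothetical NR

# Halving the noise amplitude should improve the SNR by about 6 dB.
print(round(snr_db(voice, before), 1), round(snr_db(voice, after), 1))
```

The same reference-based measurement could later be applied to real cabin recordings taken before and after the noise reduction stage.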
Section 2 - Statement of Problem

Given that voice recognition (VR) systems are being used in a wider range of environments,
the voice source is often further away from the microphone. This significantly degrades
speech intelligibility, as the signal is exposed to reverberation and background noise before
reaching the microphone input. For this reason, DSP algorithms could be implemented to clean up the
signal and make the voice recognition work in environments that were previously
considered too noisy.
SoftBank has been developing a VR system based around detecting emotion in the user's
voice within an automotive vehicle. Their initial testing achieved satisfactory accuracy under
ideal conditions (i.e. silence); however, they found that the VR system struggled to detect
emotion accurately when the vehicle was being driven at speed. This is because a number of
additional noise sources (such as wind, road, and engine noise) interfere with the voice
signal.
As a result, SoftBank has contracted our industry sponsors, Fiberdyne Systems, to develop a
noise reduction microphone system that will enhance and clean up the microphone input
feeding the existing VR system. SoftBank suggested a beamforming mic array and machine
learning as potential methods to help isolate the user's voice and minimise the ambient
noise picked up by the microphone input. Audio filtering was also explored; however,
because the system is based around emotion recognition, it requires significantly more
bandwidth than normal voice recognition (Lech et al. 2010, p. 2346). Due to this increased
bandwidth requirement, SoftBank has specified that we maintain the full spectral source
data for their algorithm to analyse. This means that additional DSP algorithms may be
required to cancel background noise, as a high-pass filter cannot be used to remove
low-frequency background noise. Additionally, our sponsor requires that we use MEMS
microphones; due to their unique design, some research must also be done in this area.
Our industry sponsors, Fiberdyne, have specified that we initially use the Analog Devices
SHARC SC589 family of processors for this project as a test platform before porting the
software to their client's System on Chip (SoC) down the line. This is because it is a capable
DSP platform on which we already have experience writing software, and we have
development boards containing processors from this family.
Section 3 - Literature Review
In Section 2, the requirement to use MEMS microphones was mentioned. As all MEMS mics
have an omnidirectional pickup pattern (InvenSense 2013, p. 1), some means of introducing
directionality is needed so that sound can be filtered by direction. One way of adding a
weighted polar pattern to MEMS mics is a microphone array, which uses anywhere from two
to six microphones and processes their outputs to create a weighted polar pattern. Two
fundamental array types, which use different signal processing techniques, are explored
here. Further work has also focused on combining these array types to further adjust the
polar pattern (InvenSense 2013, p. 10).
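To illustrate how combining omnidirectional capsules yields a weighted polar pattern, the sketch below computes the far-field magnitude response of two summed mics over angle. The spacing is an assumed value, and the test frequency is deliberately chosen at c/2d so that the on-axis null is visible.

```python
import numpy as np

c = 343.0            # speed of sound, m/s
d = 0.02             # mic spacing, m (assumed)
f = c / (2 * d)      # frequency whose half-wavelength equals the spacing

theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
# Far-field path difference between the two mics for a source at angle
# theta, measured from the array axis.
delay = d * np.cos(theta) / c
# Normalised magnitude of the summed pair: 1 where the mics are in phase
# (broadside), 0 where the inter-mic delay is half a period (on axis).
response = np.abs(1 + np.exp(-2j * np.pi * f * delay)) / 2

print(round(response.max(), 3), round(response.min(), 3))   # → 1.0 0.0
```

Sweeping f instead of theta would show the frequency dependence of the pattern that the broadside discussion below refers to.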
The simplest microphone array type is the broadside array, which places the microphones in
a line perpendicular to the preferred direction of sound propagation. It sums the
microphone inputs together so that, at certain frequencies, sound originating from the sides
of the array cancels out (InvenSense 2014, p. 3). Under testing, severe aliasing occurred at
specific frequencies (Brandstein 2010, p. 50). Mathematically, this aliasing occurs close to
the frequency at which the side nulls appear, and adding more microphones reduces the
frequency of the aliasing further. Another disadvantage is that a two-microphone array has
to be orthogonal to the direction of sound travel, as it creates a figure-8 pattern at the
target frequency.
A more advanced type of microphone array is the endfire array, which places the
microphones in line with the desired direction of sound propagation. Because the
microphone spacing is known, there is a known time delay between the two microphone
inputs. By compensating for this time delay and then summing the inputs, the mic array can
effectively cancel sound originating from behind the array, giving the array a cardioid
pickup pattern (InvenSense 2013, p. 5).
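The delay compensation described above can be sketched numerically. This is a toy model, not the project's implementation: the spacing and sample rate are assumed (chosen so the inter-mic delay is exactly one sample), and it uses the common differential delay-and-subtract realisation of the cardioid.

```python
import numpy as np

c, d, fs = 343.0, 0.0343, 10000   # chosen so d/c is exactly one sample period
tau = int(round(d / c * fs))      # inter-mic delay in samples (= 1 here)

rng = np.random.default_rng(1)
s = rng.standard_normal(2000)     # desired sound arriving from the front
r = rng.standard_normal(2000)     # interferer arriving from behind

# Front sound hits the front mic first; rear sound hits the rear mic first.
mic_front = s + np.roll(r, tau)
mic_rear = np.roll(s, tau) + r

# Delay the rear mic by tau and subtract: the rear-arriving interferer
# cancels (np.roll makes the delay circular), while the front signal
# passes through as s[n] - s[n - 2*tau].
out = mic_front - np.roll(mic_rear, tau)

residual = out - (s - np.roll(s, 2 * tau))
print(np.max(np.abs(residual)))   # ≈ 0 (a few float ulps at most)
```

The front signal survives (with a high-pass colouration from the 2*tau self-difference), which is one reason the full-bandwidth requirement in Section 2 matters when choosing an array geometry.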
Section 3.3 - Adaptive Beamforming Algorithms
The direction of arrival (DOA) of a sound source can be estimated from the mic-array inputs
(V Krishnaveni et al. 2013). This DOA can be used to adaptively steer the beamformer in the
direction of interest and reduce the effect of noise sources in other directions (Hendriks &
Gerkmann 2011; Zhao et al. 2015). This would be particularly useful in an automotive
environment, as we would be able to give the beamformer a narrower pickup pattern and
then self-adjust the pickup direction based on who is talking in the vehicle (i.e. the driver or
the passenger). Research by Zhao et al. (2015) suggests an approach that adjusts individual
microphone gains in real time based on the DOA estimate to ‘steer’ the pickup pattern
toward the location of the voice signal.
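A minimal sketch of one way to obtain such a DOA estimate: find the lag that maximises the cross-correlation between two mics, then convert it to an angle. The spacing, sample rate, and simulated delay are assumed for illustration; practical systems tend to use more robust estimators (e.g. GCC-PHAT) for the reverberant conditions discussed below.

```python
import numpy as np

c, d, fs = 343.0, 0.1, 48000    # speed of sound, mic spacing (m), sample rate
true_lag = 7                    # simulated propagation delay in samples

rng = np.random.default_rng(2)
s = rng.standard_normal(4096)
mic1 = s + 0.1 * rng.standard_normal(4096)
mic2 = np.roll(s, true_lag) + 0.1 * rng.standard_normal(4096)

# The lag maximising the cross-correlation is the time-difference of
# arrival (TDOA) between the two microphones, in samples.
corr = np.correlate(mic2, mic1, mode="full")
lag = int(np.argmax(corr)) - (len(mic1) - 1)

# Convert the TDOA to a DOA angle relative to the array axis.
tau = lag / fs
angle = np.degrees(np.arccos(np.clip(tau * c / d, -1.0, 1.0)))
print(lag, round(angle, 1))     # → 7 60.0
```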
Beamforming can be performed in either the time or frequency domain. Time-domain
beamforming involves introducing known delays to the input signals (since the microphones
are a fixed distance apart) and summing them together (V Krishnaveni et al. 2013, p. 5). The
downside of this approach is its sensitivity to phase mismatches, which can be overcome by
performing the beamforming processing in the frequency domain (Fuster 2004, p. 10).
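One reason the frequency domain is attractive: a time delay becomes a linear phase ramp across FFT bins, so fractional (sub-sample) delays and per-frequency corrections are straightforward to express. A minimal sketch, with all parameters assumed:

```python
import numpy as np

def fractional_delay(x, delay_samples):
    """Circularly delay `x` by a possibly fractional number of samples by
    applying a linear phase ramp to each FFT bin."""
    freqs = np.fft.fftfreq(len(x))                     # cycles per sample
    phase = np.exp(-2j * np.pi * freqs * delay_samples)
    return np.real(np.fft.ifft(np.fft.fft(x) * phase))

rng = np.random.default_rng(3)
x = rng.standard_normal(1024)

# An integer delay matches np.roll exactly; unlike an integer sample
# shift in the time domain, delay_samples could just as well be 2.5.
y = fractional_delay(x, 3)
print(np.allclose(y, np.roll(x, 3)))   # → True
```

Per-bin phase corrections of this kind are also how frequency-domain beamformers compensate for the microphone phase mismatches mentioned above.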
It is important that the direction of arrival algorithm used is robust. Care must be taken
that the algorithms can deal with various errors; if they do not, the speech may
inadvertently be cancelled out. Errors arise from a wide variety of factors - for example, the
impulse response of the environment changing (windows being rolled up or down) or the
microphone array not being calibrated properly (SA Vorobyov et al. 2018, p. 313).
Furthermore, research by Affes (1997) suggested that identification and matched filtering of
source-to-array impulse responses are necessary for a microphone array to further improve
the intelligibility of speech by countering the effects of reverberation. This is further
highlighted by Aarabi (2004), who proposed a model in which each signal received by a
beamforming array is the original signal convolved with the impulse response of the
environment, summed with a noise signal. It will therefore be important to consider both
noise reduction and dereverberation algorithms to achieve optimal performance in the
beamforming array.
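The received-signal model described by Aarabi (2004) can be sketched numerically. The impulse response below is a toy stand-in (a direct path plus two discrete reflections); a real cabin response would be measured, not assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 16000
s = rng.standard_normal(fs)       # stand-in for one second of dry speech

# Toy environment impulse response: direct path plus two reflections.
h = np.zeros(400)
h[0] = 1.0                        # direct path
h[120] = 0.4                      # early reflection (~7.5 ms at 16 kHz)
h[300] = 0.2                      # later reflection

n = 0.05 * rng.standard_normal(fs + len(h) - 1)

# Aarabi's model: the received signal is the source convolved with the
# environment's impulse response, plus additive noise.
x = np.convolve(s, h) + n
print(x.shape)                    # → (16399,)
```

Dereverberation aims to undo the effect of h, while noise reduction targets n; the model makes clear why both are needed together.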
Section 4 - Design Questions
● What microphone array setup works best for the noise reduction algorithms?
● Which Direction of Arrival (DOA) algorithm works best for wideband voice
recognition?
● Which noise reduction (NR) algorithm works best in an automotive environment?
● Can a combination of beamforming, direction-of-arrival, noise reduction, or
dereverberation be implemented together to improve speech intelligibility?
Section 5 - Methodology
Section 5.1 - Design Methodology

We have decided to implement a prototyping life-cycle for both the DSP and embedded
parts of the project. The prototyping model is an iterative system design model that involves
creating a series of prototypes that are shown to the project stakeholders to identify and
confirm that the correct features are being developed. Prototype life-cycle development
starts with a simple system that conveys a single feature or concept, which then evolves to
either refine a feature or develop a new section of the project until an acceptable solution is
reached (Radcliffe, 2015).
One of the key meetings for this project is yet to occur, on 5-6 April 2018, when
representatives from Renesas will fly to Melbourne to meet with us and our industry
sponsors, Fiberdyne Systems. This meeting will clearly outline the project specifications
and requirements, as well as provide an opportunity to discuss early concepts and potential
solutions with our client. Following this meeting, Fiberdyne will be responsible for handling
all communication with the clients, SoftBank and Renesas. We plan to meet on a weekly
basis with Fiberdyne and our academic supervisor, Dr PJ Radcliffe, to facilitate an ongoing
discussion on the development of the project.
Section 5.2 - Resource Planning
Test and development equipment for this project includes laptops, audio interfaces,
speakers, signal generators, and oscilloscopes; these have been provided by Fiberdyne. In
the case that this equipment is unavailable (for example, if it is being used by other
Fiberdyne engineers on a different project), the project is portable enough to be worked on
at RMIT with RMIT equipment. As this is an audio project dealing with frequencies in the
audible range, standard test equipment may be used.
Section 5.3 - Alternative Designs

Several alternative embedded hardware designs could be used. Texas Instruments DSPs are
popular and provide a similar environment and processing capabilities. However, capable
Analog Devices development boards are already in use, so porting the project to another
platform would take considerable time.
An alternative design would be to write the software directly on the SoC that it is planned
to be ported to, but this platform does not have JTAG line-by-line debugging, so
development would take much longer. If JTAG debugging on the SoC were made available, it
may be worthwhile to develop on it directly, as the software would then not need to be
ported from the Analog Devices processors to the SoC.
Section 6 - Risk Management and Ethical Considerations

Section 6.1 - Risk Assessment

Section 6.1.1 - SWOT Analysis
Strengths
● Previous experience in developing similar projects.
● Familiarity with the development environment.
● Previous work has been done on this platform (architectural knowledge).
● Multi-part solution with many independent modules to show progress.
● DSP algorithm based on proven techniques.
● Software development already underway.

Weaknesses
● Limited time to develop.
● Software development can take unpredictable lengths of time.
● Will need to work on a wideband input (cannot high-pass filter the input).
● Wideband noise reduction for emotive voice recognition is a relatively unexplored area.
● Creating a test setup may be difficult due to the number of microphones, array placement, and automotive environment.
Opportunities
● Current lack of solutions for wideband automotive noise reduction.
● This sort of system could be applied to other microphone inputs - for example, Bluetooth noise reduction.

Threats
● A possibility of the client losing interest or funding for the project.
● Development boards potentially not being available at certain times if another industry-sponsored group uses them or they are required by the sponsor.
● Another solution to noise reduction beating this one to market.
Section 6.1.2 - Risk Solution Chart

Risk: Playing music causes voice recognition to malfunction.
Mitigation: Create a series of tests to ensure that the noise reduction reduces the music volume sufficiently.

Risk: Emotive voice recognition fails because the noise control leaves the frequency response uneven.
Mitigation: Create tests that ensure the noise reduction solution does not alter the speech input too much.

Risk: Voice recognition failing results in a vehicle issue or crash.
Mitigation: Make it clear that this system is only to be used in non-critical systems, as voice recognition is not perfect.
Section 6.2 - Ethical Considerations
It is also important to ensure that the noise reduction system developed is not used to
dictate safety-critical decisions for the car, as voice recognition systems are naturally prone
to error.
The end user’s data must also be accounted for. Considering that this system feeds a voice
detection algorithm, user data (the user’s speech) must be protected. As this is an offline
algorithm, this is not an issue in its current state. If, after the prototype stage, the algorithm
is used by another party as part of a ‘cloud’ algorithm (in other words, the DSP is done on a
server to which the user’s device sends the audio data), it will be their responsibility to
ensure their system is secure against attackers.
Section 7 - References
Affes, S. and Grenier, Y. (1997). A signal subspace tracking algorithm for microphone array
processing of speech. IEEE Transactions on Speech and Audio Processing, 5(5), pp.425-437.
Aarabi, P. and Shi, G. (2004). Phase-Based Dual-Microphone Robust Speech Enhancement.
IEEE Transactions on Systems, Man and Cybernetics, Part B (Cybernetics), 34(4),
pp.1763-1773.
Evangelopoulos, G. and Maragos, P. (2006). Multiband Modulation Energy Tracking for Noisy
Speech Detection. IEEE Transactions on Audio, Speech and Language Processing, 14(6),
pp.2024-2038.
Faubel, F., Georges, M., Kumatani, K., Bruhn, A. and Klakow, D. (2011). Improving hands-free
speech recognition in a car through audio-visual voice activity detection. 2011 Joint
Workshop on Hands-free Speech Communication and Microphone Arrays.
Fuster, J. J. (2004). A Hardware Architecture for Real-Time Beamforming. Master's Thesis.
FL, USA: University of Florida.
Hirsch, H. and Ehrlicher, C. (1995). Noise estimation techniques for robust speech
recognition. 1995 International Conference on Acoustics, Speech, and Signal Processing.
Lech, M., He, L. and Allen, N. (2010). On the Importance of Glottal Flow Spectral Energy for
the Recognition of Emotions in Speech. INTERSPEECH 2010, Makuhari, Chiba, Japan, 26-30
September 2010, pp.2346-2349.
Ramírez, J., Segura, J., Benítez, C., de la Torre, Á. and Rubio, A. (2004). Efficient voice
activity detection algorithms using long-term speech information. Speech Communication,
42(3-4), pp.271-287.
Zhao, S., Xiao, X. et al. (2015). Robust Speech Recognition Using Beamforming with
Adaptive Microphone Gains and Multichannel Noise Reduction. 2015 IEEE Workshop on
Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA, pp.460-467.
Taghizadeh, M., Garner, P., Bourlard, H., Abutalebi, H. and Asaei, A. (2011). An integrated
framework for multi-channel multi-source localization and voice activity detection. 2011
Joint Workshop on Hands-free Speech Communication and Microphone Arrays.