The Effect of Vehicle Noise On Automatic Speech Recognition Systems
Joshua Wheeler
Ford Motor Company
CITATION: Wheeler, J., "The Effect of Vehicle Noise on Automatic Speech Recognition Systems," SAE Technical Paper 2017-01-1864, 2017, doi:10.4271/2017-01-1864.
Abstract
The performance of a vehicle's Automatic Speech Recognition (ASR) system is dependent on the signal to noise ratio (SNR) in the cabin at the time a user voices their command. HVAC noise and environmental noise in particular (like road and wind noise) provide high amplitudes of broadband frequency content that lower the SNR within the vehicle cabin, and work to mask the user's speech. Managing this noise is a vital key to building a vehicle that meets the customer's expectations for ASR performance. However, a speech recognition engineer is not likely to be the same person responsible for designing the tires, suspension, air ducts and vents, sound package and exterior body shape that define the amount of noise present in the cabin. If objective relationships are drawn between the vehicle level performance of the ASR system, and the vehicle or system level performance of the individual noise, vibration and harshness (NVH) attributes, a partnership between the groups is brokered. Compatible targets are set and hardware selected that works to meet both groups' goals. This paper examines the NVH attributes and performance metrics that relate to vehicle level ASR performance, and finds that strong relationships and statistical trends can be drawn between the Sentence Error Rate (SER%) and standard NVH metrics for that road surface or HVAC configuration. The paper also establishes that AI% should be the preferred metric to relate cabin noise to ASR performance in the presence of any other kind of steady state noise.

using the objective metrics of Word Error Rate (WER%) and Sentence Error Rate (SER%), which quantify the percentage of individual words or full commands from the user being successfully interpreted and executed by the ASR system. The voice recognition software/hardware, and elements of the vehicle design, need to be engineered in concert to ensure satisfactory performance of the core ASR system to meet the customer's expectations. It's not enough to design these disparate factors in a vacuum, independent of their influence on the ASR system and to one another.

Figure 1. The systems and subsystems which are responsible for the customer's satisfaction with their ASR and HFC system.
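
The paper does not spell out how WER% and SER% are scored, so the following is only a minimal sketch under the conventional definitions: WER% counts word-level substitutions, deletions and insertions against the number of reference words, and SER% counts commands that contain at least one error. The function names and the sample commands are hypothetical and are not taken from the test data.

```python
from typing import List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance counted in whole words."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER%, SER%) for (spoken command, recognized command) pairs."""
    total_words = 0
    word_errors = 0
    sentence_errors = 0
    for spoken, recognized in pairs:
        ref = spoken.lower().split()
        hyp = recognized.lower().split()
        total_words += len(ref)
        word_errors += word_edit_distance(ref, hyp)
        if ref != hyp:
            sentence_errors += 1
    wer = 100.0 * word_errors / total_words
    ser = 100.0 * sentence_errors / len(pairs)
    return wer, ser


# Hypothetical commands for illustration only (not from the paper).
commands = [
    ("call john smith", "call john smith"),
    ("set temperature to seventy", "set temperature to seventeen"),
    ("navigate to work", "navigate work"),
    ("play jazz radio", "play jazz radio"),
]
wer, ser = error_rates(commands)
print(f"WER = {wer:.1f}%, SER = {ser:.1f}%")
```

In this made-up set only 2 of 13 words are wrong, yet 2 of 4 commands fail, which illustrates why a sentence-level rate can be far higher than the word-level rate for the same recordings.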
content that does not significantly contribute in the frequency range where the current generation of automotive speech recognition technology is focused (about 250Hz to 8kHz). However, excessive tire tread sizzle, tire cavitation noise, and transient impact sounds will cause higher frequency issues for the ASR engine. High levels of wind noise caused by aspirations or aerodynamic properties of the exterior body design will cause broadband excitation that also influences the high frequency range where engineers are trying to preserve the clarity of speech. But HVAC noise provides the greatest level of masking noise in the car in the frequency range of concern for speech intelligibility, as high volumes of air are quickly pushed through resonant ducts and distributed through narrow panel and defroster openings, generating sound as the climate-controlled air is circulated. Figure 2 shows an example of the frequency content in a voiced command spoken to the ASR system, and the competing frequency content of a common road noise and HVAC noise masking level. It is evident how the vehicle cabin noise from these sources covers up the user's speech.

teams need to stop and align on expectations before compatible hardware is discussed. The second goal is that when a road NVH engineer notes that a proposed tire design improves their performance to target by a certain decibel level (and customer satisfaction measure), the ASR engineer can say that the NVH improvement also improves their SER% by a similar known amount, thus also improving the ASR customer satisfaction measure. The two teams now can work together to defend the content proposal on common grounds. The following sections work to draw these comparisons and propose formulas for this common language to be used in those discussions.

Test Methodology
In order to support this investigation, data was collected on 15 vehicles from the OEM in their test lab's anechoic chambers (for HVAC sources) and proving grounds ride roads (for road and wind sources). Standard surfaces, speeds, HVAC modes and blower settings were selected so that NVH and ASR metrics would be evaluated using common conditions. The test cases discussed in this paper are the following:
It's also important to note that for the comparisons made below, the noise level is measured at the driver's outboard ear (DOE). This is a common measurement location for NVH testing, and the point in the car at which NVH targets and performance are quantified. The ASR performance uses the noise measured at the vehicle microphone location, since where that microphone is positioned can change the SNR in the vehicle as it moves closer to, or further away from, the driver's mouth. This microphone is often placed in the overhead console of the vehicle, but occasionally closer to the driver above the sun visor. This inconsistency will somewhat negatively affect the statistical correlation, but is necessary because the NVH value represented for the cabin noise level needs to share the same strategy as the NVH metric that it is being compared to.

HVAC Defroster Noise
For HVAC defroster noise, a statistical relationship can be established between Loudness or AI% data from the 15 cars at various defroster blower speed levels, and the resultant ASR performance. These are both metrics that the NVH groups are familiar with, and can use to establish performance and target links.

Figure 4. Defroster Loudness (Sones) level and Articulation Index (AI%) vs. SER% performance.

As shown in Figure 4, an exponential trend line can be used to establish the relationship between the Loudness level and the SER% performance. It is worth pointing out again that since the Loudness is calculated at the DOE microphone, but the ASR performance is determined at the vehicle microphone, poor performing samples at the high end of the scale may be due to airflow buffeting on the vehicle microphone, which the DOE performance will not reflect.

HVAC Panel Vent Noise
For HVAC panel vent noise, a statistical relationship can be established between Loudness or AI% data from the 15 cars at various blower speed levels, and the resultant ASR performance. These are both metrics that the NVH groups are familiar with, and can use to establish performance and target links.

Figure 5. Panel vent Loudness (Sones) level and Articulation Index (AI%) vs. SER% performance.

As shown in Figure 5, an exponential trend line can be used to establish the relationship between the Loudness level and the SER% performance. As with the defroster noise data, the noise levels are recorded at the DOE microphone, so issues with HVAC buffeting at the hands-free microphone are not considered.

Brushed Concrete Road/Wind Noise
For highway-like brushed concrete speeds and noise, a statistical relationship can be established between the ASR performance and bandlimited SPL. The frequency range that correlates best from those evaluated is 630-3150Hz, which contains many of the frequencies responsible for voice intelligibility. However, AI% performance also displays a good trend between data from the 15 cars at various levels of brushed concrete road/wind noise, and the resultant ASR performance. The bandlimited SPL metric was slightly better from a statistics perspective, but the goal of creating a relationship between the two attributes is also accomplished with AI%.

Figure 6. Brushed concrete Articulation Index (AI%) and bandlimited SPL (630-3150Hz, dBA) vs. SER% performance.

As shown in Figure 6, an exponential or linear trend line can be used to establish the relationship between the Articulation Index (%) level and the SER% performance.

Coarse Road/Wind Noise
For city-like coarse road speeds and noise, a statistical relationship can be established between the Articulation Index (%) and the ASR performance. Bandlimited 20-1000Hz RMS SPL (dBA) also displays a good trend between data from the 15 cars at various levels of coarse road noise, and the resultant ASR performance. The AI% metric provided the best correlation from a statistics perspective, but the goal of creating a relationship between the two attributes is also accomplished with the measure of bandlimited SPL.
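
The exponential trend lines described in the sections above can be sketched as a least-squares fit of ln(SER%) against the NVH metric. This is only an illustration under stated assumptions: the model form SER% = a * exp(b * metric), the function names, and every data value below are hypothetical placeholders, not the paper's measurements or its fitting procedure.

```python
import math
from typing import Sequence, Tuple


def fit_exponential(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a * exp(b * x) by ordinary least squares on ln(y).

    Returns (a, b, r_squared), with r_squared computed in log space.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(v <= 0 for v in y):
        raise ValueError("an exponential fit requires every y value to be > 0")
    n = len(x)
    ln_y = [math.log(v) for v in y]
    mean_x = sum(x) / n
    mean_ln_y = sum(ln_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (li - mean_ln_y) for xi, li in zip(x, ln_y))
    b = sxy / sxx                   # slope of the log-linear fit
    ln_a = mean_ln_y - b * mean_x   # intercept of the log-linear fit
    ss_res = sum((li - (ln_a + b * xi)) ** 2 for xi, li in zip(x, ln_y))
    ss_tot = sum((li - mean_ln_y) ** 2 for li in ln_y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(ln_a), b, r_squared


def predict_ser(a: float, b: float, metric: float) -> float:
    """Predicted SER% at a given NVH metric value under the fitted trend."""
    return a * math.exp(b * metric)


# PLACEHOLDER values for illustration only; these are not data from the paper.
loudness_sones = [8.0, 12.0, 16.0, 20.0, 24.0, 28.0]
ser_percent = [2.1, 3.0, 4.6, 6.5, 10.2, 14.8]

a, b, r2 = fit_exponential(loudness_sones, ser_percent)
print(f"SER% ~ {a:.3f} * exp({b:.3f} * sones), R^2 (log space) = {r2:.3f}")
print(f"Predicted SER% at 18 sones: {predict_ser(a, b, 18.0):.1f}")
```

Under this model form, lowering the metric by some amount d scales the predicted SER% by exp(-b * d), which is one way an NVH improvement could be translated into an expected ASR improvement. For a metric such as AI%, where higher values indicate better intelligibility, the fitted b would be expected to come out negative.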
There are non-standard noise sources that also have an effect on ASR performance that may eventually need to be considered, like rain noise, cabin exterior noise, windows down noise, etc. Establishing a metric that relates all stationary noise to the vehicle mic ASR performance will be important to understand when these sources are considered. For this evaluation, all road and HVAC sound recordings are grouped together at the vehicle-specific microphone location, instead of the DOE microphone. The results from every source, speed and surface are graphed together at the same time to evaluate which of the previously considered metrics best correlate to ASR performance.

Figure 9. Articulation Index (%) vs. SER% performance for all stationary noise sources considered together.

References
1. Huber, J., Rangarajan, R., Ji, A., Charette, F. et al., "Validation
of In-Vehicle Speech Recognition Using Synthetic Mixing,"
SAE Int. J. Passeng. Cars - Electron. Electr. Syst. 10(1):2017,
doi:10.4271/2017-01-1693.
Contact Information
Josh Wheeler
[email protected]
Definitions/Abbreviations
ASR - Automatic speech recognition
DOE - Driver's outboard ear
HFC - Hands free calling
HVAC - Heating, ventilation, air conditioning
NVH - Noise, vibration, and harshness
SER - Sentence error rate
SNR - Signal to noise ratio
SPL - Sound pressure level
WER - Word error rate
The Engineering Meetings Board has approved this paper for publication. It has successfully completed SAE's peer review process under the supervision of the session organizer. The process requires a minimum of three (3) reviews by industry experts.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written permission of SAE International.
Positions and opinions advanced in this paper are those of the author(s) and not necessarily those of SAE International. The author is solely responsible for the content of the paper.
ISSN 0148-7191
http://papers.sae.org/2017-01-1864