
A two-step technique for MRI audio enhancement using dictionary learning and wavelet packet analysis

Colin Vaz, Vikram Ramanarayanan, and Shrikanth Narayanan
Ming Hsieh Department of Electrical Engineering
University of Southern California, Los Angeles, CA 90089
<cvaz,vramanar>@usc.edu, [email protected]

Abstract

We present a method for speech enhancement of data collected in extremely noisy environments, such as those found during magnetic resonance imaging (MRI) scans. We propose a two-step algorithm to perform this noise suppression. First, we use probabilistic latent component analysis to learn dictionaries of the noise and speech+noise portions of the data and use these to factor the noisy spectrum into estimated speech and noise components. Second, we apply a wavelet packet analysis in conjunction with a wavelet threshold that minimizes the KL divergence between the estimated speech and noise to achieve further noise suppression. Based on both objective and subjective assessments, we find that our algorithm significantly outperforms traditional techniques such as nLMS, while not requiring prior knowledge or periodicity of the noise waveforms that current state-of-the-art algorithms require.

Index Terms: rtMRI, noise suppression, wavelets, PLCA, dictionary learning

1. Introduction

Speech science researchers use a variety of methods to study articulation and the associated acoustic details of speech production. These include Electromagnetic Articulography [1] and x-ray microbeam [2] methods that track the movement of articulators while subjects speak into a microphone. Data from these methods offer excellent temporal details of speech production. Such methods, however, are invasive and do not offer a full view of the vocal tract. On the other hand, methods using real-time MRI (rtMRI) offer a non-invasive way of imaging the vocal tract, affording access to more structural details [3]. Unfortunately, MRI scanners produce high-energy broadband noise that corrupts the speech recording. This affects the ability to analyze the speech acoustics resulting from the articulation and requires additional schemes to improve the audio quality.

The Least Mean Squares (LMS) algorithm is a popular technique for signal denoising. The algorithm estimates the filter weights of an unknown system by minimizing the mean square error between the denoised signal and a reference signal. This approach removes noise from the noisy signal very well, but it severely degrades the quality of the recovered speech. Bresch et al. proposed a variant of the LMS algorithm in [4] to remove MRI noise from noisy recordings. This method uses knowledge of the MRI pulse sequence to design an artificial reference “noise” signal that can be used in place of a recorded noise reference. We found that this method outperforms LMS in denoising speech corrupted with noise from certain types of pulse sequences. Unfortunately, it performs rather poorly when the noise frequencies are spaced closely together in the frequency domain. Furthermore, the algorithm creates a reverberant artifact in the denoised signal, which makes speech analysis challenging. The LMS formulation assumes additive noise, so these algorithms may not perform well in the presence of convolutive noise in the signal, which we encounter during MRI scans.

Source separation techniques provide a way to separate the speech and noise. Duan et al. proposed a probabilistic latent component analysis (PLCA) algorithm in [5]. This algorithm learns the dictionaries and their associated time activation weights for the speech and noise, thus separating the speech from the noise. In recent decades, wavelets have been used for denoising speech and images [6]. Discrete wavelet transforms, wavelet packet analysis, and lifting have been developed to aid signal denoising. Both PLCA and wavelet analysis are useful for removing convolutive noise from signals because there is no underlying assumption of additive noise. We propose an algorithm that takes advantage of source separation and wavelet analysis to denoise speech recorded in an MRI scanner.

This paper is organized as follows. Section 2 discusses properties of MRI noise. In Section 3, we describe the method we used to perform denoising. Section 4 discusses the results of our method on data acquired from MRI scans and artificially-created noisy speech. Finally, we state our conclusions and future work in Section 5.

The authors would like to acknowledge the support of NIH Grant DC007124.
2. MRI Noise

A primary source of MRI noise arises from Lorentz forces acting on receiver coils in the body of an MRI scanner. These forces cause vibrations of the coils, which impact against their mountings. The result is a high-energy broadband noise that can reach as high as 115 dBA [7]. The noise corrupts the speech recording, making it hard to listen to the speaker, and can obscure important details in the speech.

MRI pulse sequences typically used in rtMRI produce periodic noise. The fundamental frequency of this noise, i.e., the closest spacing between two adjacent noise frequencies in the frequency spectrum, is given by:

    f0 = 1 / (repetition time × number of interleaves) Hz    (1)

The repetition time and number of interleaves are scanning parameters set by the MRI operator. The choice of these parameters informs the spatial and temporal resolution of the reconstructed image sequence, as well as the spectral characteristics of the generated noise. Importantly, the periodicity of the noise allows us to design effective denoising algorithms for time-synchronized audio collected during rtMRI scans. For instance, the algorithm proposed by Bresch et al. [4] relies on knowing f0 to create an artificial “noise” signal which can then be used as a reference signal by standard adaptive noise cancellation algorithms.
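To make Equation (1) concrete, consider a purely hypothetical protocol (the numbers here are illustrative and not taken from our scans): a spiral sequence with a repetition time of 6 ms and 13 interleaves would produce noise with a fundamental frequency of f0 = 1/(0.006 s × 13) ≈ 12.8 Hz, i.e., noise harmonics spaced roughly 12.8 Hz apart in the spectrum.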
However, a few pulse sequences do not exhibit this exact periodic structure. In addition, there are other useful sequences that are either periodic with an extremely large period, resulting in very closely-spaced noise frequencies in the spectrum (i.e., f0 is very small), or are periodic with discontinuities that can introduce artifacts in the spectrum. To handle these cases, it is essential that denoising algorithms do not rely on periodicity. One example of such sequences which we will consider in this article is the Golden Ratio (GR) sequence [8], which allows for retrospective and flexible selection of the temporal resolution of the reconstructed image sequences (typical rtMRI protocols do not allow this desirable property).

3. Denoising Algorithm

We propose a denoising algorithm that uses PLCA and wavelet packet analysis. A noisy recording is given to PLCA, which separates the signal into estimated speech and noise components. Then, the estimated speech is passed to a wavelet packet algorithm for further noise removal. The result of the wavelet packet algorithm is a denoised speech recording. Figure 1 shows the spectrograms of the signal at each stage of the algorithm. The following subsections describe PLCA and wavelet packet analysis in greater detail.

3.1. Step 1: PLCA

PLCA uses non-negative matrix factorization (NMF) to factor a spectrogram of the noisy speech into noise and speech dictionaries and their corresponding time activation weights. We first train the algorithm with the MRI noise to learn a noise dictionary and its time activation weights. Once learned, the noise dictionary stays fixed for the duration of the PLCA algorithm. We obtain the noise-only recording from the first second of the noisy speech recording, before the speaker speaks (it is usually the case that the speaker starts speaking at least 1 second after the start of the recording).

After training on the noise, we give PLCA the noisy speech spectrogram for source separation. The algorithm takes each frame of the spectrogram and computes the KL divergence between the spectrogram frame and the current estimate of the noise spectrum. If the KL divergence is low, it updates the time activation weights of the noise. If the KL divergence is high, it updates the speech dictionary and the time activation weights for both the speech and the noise. PLCA uses the EM algorithm to update the speech dictionary.

After the algorithm processes all the spectrogram frames, it returns an estimate of the speech and of the noise. The algorithm performs well at removing noise in silence regions and suppressing some of the noise in speech regions. To remove the residual noise in the speech estimate, we turn to wavelet packet analysis. Nonetheless, PLCA removes enough noise to make wavelet packet analysis a viable option for denoising; performing wavelet packet analysis on the original noisy speech recording does not work well because the energy of the MRI noise is too high compared to the energy of the speech.
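As a rough illustration of this step, the sketch below implements semi-supervised separation with KL-divergence NMF multiplicative updates: a noise dictionary is learned from the noise-only segment, held fixed, and a speech dictionary plus the activations of both sources are then learned from the noisy spectrogram. This is only an approximation of the online, EM-based PLCA of Duan et al. [5] that we actually use (it omits the frame-wise KL gating described above), and all function and parameter names (learn_noise_dict, separate, k_noise, k_speech) are ours, chosen for illustration.

    import numpy as np

    def learn_noise_dict(V_noise, k_noise=20, n_iter=200, eps=1e-12, seed=0):
        # Learn a noise dictionary from a noise-only magnitude spectrogram
        # (frequency bins x frames) with KL-divergence NMF multiplicative updates.
        rng = np.random.default_rng(seed)
        F, T = V_noise.shape
        W = rng.random((F, k_noise)) + eps
        H = rng.random((k_noise, T)) + eps
        for _ in range(n_iter):
            W *= ((V_noise / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
            H *= (W.T @ (V_noise / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        return W / (W.sum(axis=0, keepdims=True) + eps)   # normalize dictionary atoms

    def separate(V_mix, W_noise, k_speech=40, n_iter=200, eps=1e-12, seed=0):
        # Keep the noise dictionary fixed; learn a speech dictionary and the
        # activations of both sources, then mask the mixture spectrogram.
        rng = np.random.default_rng(seed)
        F, T = V_mix.shape
        W_s = rng.random((F, k_speech)) + eps
        H_s = rng.random((k_speech, T)) + eps
        H_n = rng.random((W_noise.shape[1], T)) + eps
        for _ in range(n_iter):
            R = V_mix / (W_s @ H_s + W_noise @ H_n + eps)
            W_s *= (R @ H_s.T) / (H_s.sum(axis=1)[None, :] + eps)
            R = V_mix / (W_s @ H_s + W_noise @ H_n + eps)
            H_s *= (W_s.T @ R) / (W_s.sum(axis=0)[:, None] + eps)
            H_n *= (W_noise.T @ R) / (W_noise.sum(axis=0)[:, None] + eps)
        V_hat = W_s @ H_s + W_noise @ H_n + eps
        speech_mask = (W_s @ H_s) / V_hat
        return speech_mask * V_mix, (1.0 - speech_mask) * V_mix   # speech, noise estimates

In practice the magnitude STFT of the recording would be separated in this way and the speech estimate resynthesized with the phase of the noisy mixture before the wavelet step described next.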
3.2. Step 2: Wavelet Packet Analysis

Wavelet packet analysis iteratively decomposes a signal into lowpass and highpass bands using a quadrature mirror filter (QMF) to produce different levels of frequency resolution. We pass the estimated speech from PLCA into a D-level wavelet packet, which yields wavelet coefficients in 2^D subbands.

We threshold the wavelet coefficients to remove the noise. Tabibian et al. proposed a threshold in [9] that minimizes the symmetric KL divergence between the noisy speech coefficients and the noise coefficients in the range of −λ to λ, where λ is the threshold value. With this formulation, they solved for the threshold to get:

    λ = sqrt( (σ̂²_{N_k} / (2 ξ_k)) (ξ_k + ξ_k²) ln(1 + 1/ξ_k) )    (2)

where

    ξ_k = σ̂²_{X_k} / σ̂²_{N_k}    (3)
Figure 1: Spectrograms of (a) noisy speech, (b) estimated speech, and (c) denoised speech for the TIMIT sentence “Don’t ask me to carry an oily rag like that” spoken by a male. PLCA processes the recording from the MRI scanner (a) to produce a speech estimate (b). Wavelet analysis subsequently removes residual noise in the estimated speech to produce the denoised speech (c).

Here, σ̂²_{N_k} is the estimated variance of the noise coefficients in subband k of level D and σ̂²_{X_k} is the estimated variance of the noisy signal coefficients in subband k of level D, k = 1, 2, ..., 2^D. To compute the threshold, we need an estimate of the noise. If the MRI noise is periodic, we can estimate the noise with

    v[n] = Σ_k α_k cos(2π f0 k n)    (4)

where f0 is calculated using Equation 1 and α_k is a scalar that shapes the spectrum of v[n] to match the spectral shape of the MRI noise. For non-periodic MRI noise, we can estimate the noise from the first second of the estimated noise calculated by PLCA. This gives us the flexibility to denoise speech corrupted by non-periodic MRI noise. Since the noise in our experiments is periodic, we use v[n] for the noise estimate because it performs marginally better than estimating the noise from PLCA's noise estimate. Once we calculate the threshold, we soft-threshold the wavelet coefficients in each subband and reconstruct the denoised signal from the thresholded coefficients.
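The sketch below shows one way this thresholding step could be carried out with the PyWavelets package: the PLCA speech estimate and a noise reference (e.g., v[n] from Equation 4, or PLCA's noise estimate) are both decomposed into a D-level wavelet packet, per-subband variances give ξ_k, and the coefficients are soft-thresholded with the threshold of Equations (2)-(3) as printed above. The function names and parameters are ours; a Daubechies wavelet is used because PyWavelets does not include the Beylkin wavelet from our experiments, and as noted below the choice of wavelet makes little practical difference.

    import numpy as np
    import pywt

    def kl_threshold(sigma2_X, sigma2_N, eps=1e-12):
        # Threshold of Eqs. (2)-(3); xi is the per-subband variance ratio.
        xi = max(sigma2_X / (sigma2_N + eps), eps)
        return np.sqrt(sigma2_N / (2.0 * xi) * (xi + xi ** 2) * np.log(1.0 + 1.0 / xi))

    def wavelet_packet_denoise(speech_est, noise_ref, wavelet='db8', level=5):
        # Decompose both the PLCA speech estimate and the noise reference into a
        # D-level wavelet packet, soft-threshold each subband, and reconstruct.
        wp_x = pywt.WaveletPacket(data=speech_est, wavelet=wavelet, mode='symmetric', maxlevel=level)
        wp_n = pywt.WaveletPacket(data=noise_ref, wavelet=wavelet, mode='symmetric', maxlevel=level)
        subbands_x = wp_x.get_level(level, order='natural')
        subbands_n = wp_n.get_level(level, order='natural')
        for node_x, node_n in zip(subbands_x, subbands_n):
            lam = kl_threshold(np.var(node_x.data), np.var(node_n.data))
            node_x.data = pywt.threshold(node_x.data, value=lam, mode='soft')
        return wp_x.reconstruct(update=True)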
Soon et al. reported very little difference in the SNR of the denoised signal when using different wavelets, even accounting for varying SNR of the noisy signal and male/female speakers [10]. They evaluated denoising performance using biorthogonal, Daubechies, Coiflet, and Symmlet wavelets with different wavelet orders. Our experiments corroborated their findings; we found very little difference in the quality of the denoised signal, both quantitatively and perceptually, when using different wavelets. Nevertheless, we empirically found the Beylkin wavelet to give the maximum noise suppression, and we used this wavelet for the wavelet analysis and synthesis.

4. Experimental Evaluation

We tested our algorithm on a set of 6 TIMIT utterances recorded in an MRI scanner, with two different scanner settings that produce two different periodic noises we will call seq1 and GR. The drawback of using these recordings for evaluation is the lack of a clean reference signal. Consequently, we supplemented our evaluation with clean speech recordings from the Aurora 5 digits database. We added the two MRI noises to the clean speech with an SNR of −6 dB, which is similar to the SNR in the TIMIT utterances.

We compared the performance of our proposed algorithm to the normalized LMS algorithm (denoted LMS-1) and the LMS variant proposed in [4] (denoted LMS-2). For LMS-1, we used a filter length of 3000 and a step size of 1. The LMS-2 algorithm did not need any parameter tuning; its parameters are set by the algorithm and vary based on the MRI pulse sequence used to acquire the recording. LMS-2 is known to perform well with seq1 noise and is currently used to remove seq1 noise from speech recordings. However, its performance degrades with GR noise, preventing speech researchers from collecting better MRI images using GR pulse sequences.

4.1. Quantitative Performance Metrics

To quantify the performance of our denoising algorithm, we calculated the noise suppression, which is given by:

    noise suppression = 10 log10( P_noise / P̂_noise )    (5)

where P_noise is the power of the noise in the noisy signal and P̂_noise is the power of the noise in the denoised signal. We use a voice activity detector (VAD) to find the noise-only regions in the denoised and noisy signals. We calculate the noise suppression measure instead of SNR because we do not have a clean reference signal for the TIMIT utterances.

Ramachandran et al. proposed the log-likelihood ratio (LLR) and distortion variance measures in [11] for evaluating denoising algorithms. The LLR calculates the mismatch between the spectral envelopes of the clean signal and the denoised signal. It is calculated using:

    LLR = log( (a_ŝ^T R_s a_ŝ) / (a_s^T R_s a_s) )    (6)

where a_s and a_ŝ are p-order LPC coefficients of the clean and denoised signals respectively, and R_s is a (p+1)×(p+1) autocorrelation matrix of the clean signal.
An LLR of 0 indicates no spectral distortion between the clean and denoised signals, while a high LLR indicates the presence of noise and/or distortion in the denoised signal. The distortion variance is given by:

    σ_d² = (1/L) ‖s[n] − ŝ[n]‖²    (7)

where s[n] and ŝ[n] are the clean and denoised signals respectively, and L is the length of the signal. A low distortion variance is more desirable than a high distortion variance.
4.2. Qualitative Performance Metrics

To supplement the quantitative results, we created a listening test to compare the denoised signals from our proposed algorithm, as well as LMS-1 and LMS-2. We created 12 sets of audio clips in 4 different environments: TIMIT utterances with seq1 noise, TIMIT utterances with GR noise, Aurora digits with seq1 noise, and Aurora digits with GR noise. Each environment contained 3 sets of audio clips. Each set contained a noisy signal and denoised versions of the signal from the proposed algorithm, LMS-1, and LMS-2. For the sets with Aurora digits, we also included the clean signal. Thus, each set with TIMIT utterances had 4 clips and each set with Aurora digits had 5 clips. The sets and the clips within each set were randomized and presented in an online survey. 25 volunteers ranked each clip within a set from 1 to 4 or 5, with 1 meaning best quality and intelligibility.

4.3. Results

Objective measures: Table 1 lists the noise suppression for the TIMIT utterances. Table 2 shows the noise suppression, LLR, and distortion variance results for the Aurora digits. For TIMIT utterances corrupted by seq1 and GR noises, our proposed algorithm suppresses noise better than LMS-1 and LMS-2. Our algorithm performs slightly worse than LMS-1 for Aurora digits corrupted by seq1 and GR noises. This is because the noise in the Aurora recordings is purely additive, while the noise in the direct MRI TIMIT recordings is more convolutive in nature. Our experiments confirmed that LMS-2 performs better on seq1 noise than GR noise, both for the TIMIT utterances and Aurora digits. Importantly, our proposed algorithm performs comparably to LMS-2 in seq1 noise. The LLR and distortion variance results show that our algorithm reconstructed the spectral characteristics of the clean signal more faithfully than LMS-1 and LMS-2. Preserving spectral characteristics of the signal is a key result when considering denoising speech for subsequent speech analysis and modeling.

Table 1: Noise suppression results for TIMIT sentences.

            Proposed    LMS-1    LMS-2
    seq1    19.27       18.01    18.79
    GR      24.1        18.37    9.17

Table 2: Noise suppression (NS), LLR, and distortion variance (DV) results for the Aurora 5 digits.

    Metric         Sequence    Proposed    LMS-1    LMS-2
    NS (dB)        seq1        30.23       32.55    26.53
                   GR          24.14       27.88    10.91
    LLR            seq1        0.17        0.4      0.42
                   GR          0.11        0.41     0.33
    DV (×10^-5)    seq1        7.52        34.8     21.4
                   GR          9.56        35.8     37.7
Subjective measures: Table 3 shows the median rankings obtained from the listening test for the audio clips in the 4 environments. A nonparametric Kruskal-Wallis test showed that the medians of the rankings obtained for each denoising algorithm were significantly different at the 99% level. We then used the post-hoc Wilcoxon rank-sum test to check for pairwise differences in the median ranks. The Wilcoxon test results show that the median ranks for each pair of clips are significantly different at the 99% level, except for the LMS-1/noisy pair in the TIMIT utterances with seq1 noise environment. Hence, we can say with some certainty that listeners ranked our algorithm as the best for removing GR noise and second best for removing seq1 noise.

Table 3: Median rankings of the audio clips for the four environments.

    Environment           Clean    Proposed    LMS-1    LMS-2    Noisy
    TIMIT, seq1 noise     -        2           3        1        4
    TIMIT, GR noise       -        1           2        3        4
    Aurora, seq1 noise    1        3           4        2        5
    Aurora, GR noise      1        2           3        4        5
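For readers who want to reproduce this style of analysis, the fragment below shows how the two tests map onto SciPy calls. The rank arrays are purely illustrative placeholders, not the responses from our survey, and the variable names are ours.

    import numpy as np
    from scipy import stats

    # Per-listener ranks for one environment; illustrative values only, not the survey data.
    rankings = {
        'proposed': np.array([1, 1, 2, 1, 1]),
        'lms1':     np.array([2, 3, 3, 2, 3]),
        'lms2':     np.array([3, 2, 4, 4, 2]),
        'noisy':    np.array([4, 4, 1, 3, 4]),
    }

    # Omnibus Kruskal-Wallis test: do the median ranks differ across algorithms?
    h_stat, p_omnibus = stats.kruskal(*rankings.values())

    # Post-hoc pairwise Wilcoxon rank-sum tests.
    names = list(rankings)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            _, p = stats.ranksums(rankings[names[i]], rankings[names[j]])
            print(f"{names[i]} vs {names[j]}: p = {p:.3f}")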
5. Conclusions

We have proposed a denoising algorithm to remove noise from speech recorded in an MRI scanner. The two-step algorithm uses PLCA to separate the noise and speech, and wavelet packet analysis to further remove noise left by the PLCA algorithm. Objective measures show that our proposed algorithm achieves better noise suppression and less spectral distortion than LMS methods. A listening test shows that our algorithm yields higher quality and more intelligible speech than LMS methods.

To further extend our work, we will compare our proposed algorithm to other denoising methods, such as signal subspace and model-based approaches. Additionally, we need to evaluate how well our algorithm aids speech analysis, such as formant extraction. Finally, we will evaluate the performance of our algorithm in other low-SNR speech enhancement scenarios, such as those involving Gaussian, Cauchy, babble, and traffic noises.
6. References

[1] W. F. Katz, S. V. Bharadwaj, and B. Carstens, “Electromagnetic Articulography Treatment for an Adult With Broca’s Aphasia and Apraxia of Speech,” J. Speech, Language, and Hearing Research, vol. 42, no. 6, pp. 1355–1366, Dec. 1999.
[2] M. Itoh, S. Sasanuma, H. Hirose, H. Yoshioka, and T. Ushijima, “Abnormal articulatory dynamics in a patient with apraxia of speech: X-ray microbeam observation,” Brain and Language, vol. 11, no. 1, pp. 66–75, Sep. 1980.
[3] D. Byrd, S. Tobin, E. Bresch, and S. Narayanan, “Timing effects of syllable structure and stress on nasals: A real-time MRI examination,” J. Phonetics, vol. 37, no. 1, pp. 97–110, Jan. 2009.
[4] E. Bresch, J. Nielsen, K. S. Nayak, and S. Narayanan, “Synchronized and Noise-Robust Audio Recordings During Realtime Magnetic Resonance Imaging Scans,” J. Acoustical Society of America, vol. 120, no. 4, pp. 1791–1794, Oct. 2006.
[5] Z. Duan, G. J. Mysore, and P. Smaragdis, “Online PLCA for Real-time Semi-supervised Source Separation,” in Proc. Int. Conf. Latent Variable Analysis/Independent Component Analysis, Tel-Aviv, Israel, 2012, pp. 34–41.
[6] Y. Ghanbari and M. R. Karami-Mollaei, “A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets,” Speech Commun., vol. 48, no. 8, pp. 927–940, Aug. 2006.
[7] M. McJury and F. G. Shellock, “Auditory Noise Associated with MR Procedures,” J. Magnetic Resonance Imaging, vol. 12, no. 1, pp. 37–45, Jul. 2001.
[8] Y. Kim, S. S. Narayanan, and K. S. Nayak, “Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order,” Magnetic Resonance in Medicine, vol. 65, no. 5, pp. 1365–1371, 2011.
[9] S. Tabibian, A. Akbari, and B. Nasersharif, “A New Wavelet Thresholding Method for Speech Enhancement Based on Symmetric Kullback-Leibler Divergence,” in 14th Int. Computer Society of Iran Computer Conf., Tehran, Iran, 2009, pp. 495–500.
[10] I. Y. Soon, S. N. Koh, and C. K. Yeo, “Wavelet for Speech Denoising,” in Proc. IEEE Region 10 Annu. Conf. Speech and Image Technologies Computing and Telecommunications, Brisbane, Australia, 1997, pp. 479–482.
[11] V. R. Ramachandran, I. M. S. Panahi, and A. A. Milani, “Objective and Subjective Evaluation of Adaptive Speech Enhancement Methods for Functional MRI,” J. Magnetic Resonance Imaging, vol. 31, no. 1, pp. 46–55, Dec. 2009.
