Voice Disorder Detection Using Long Short Term Memory (LSTM) Model
Vibhuti Gupta
Department of Computer Science
Texas Tech University, Lubbock, TX 79415
Email: [email protected]
Abstract— Automated detection of voice disorders with computational methods is a recent research area in the medical domain, since accurate diagnosis otherwise requires a rigorous endoscopy. Efficient screening methods are required for the diagnosis of voice disorders so as to provide timely medical care with minimal resources. Detecting voice disorders using computational methods is a challenging problem, since audio data is continuous, which makes extracting relevant features and applying machine learning hard and unreliable. This paper proposes a Long Short Term Memory (LSTM) model to detect pathological voice disorders and evaluates its performance on a real set of 400 testing samples without any labels. Different feature extraction methods are used to provide the best set of features before the LSTM model is applied for classification. The paper describes the approach and experiments, which show promising results with 22% sensitivity, 97% specificity and 56% unweighted average recall.
Keywords— Neoplasm; Phonotrauma; Vocal Paralysis; Long Short Term Memory; Mel frequency cepstral coefficient
I. INTRODUCTION
A voice disorder occurs due to a disturbance in the respiratory, laryngeal, or subglottal vocal tract, or a physiological imbalance among these systems, which causes abnormal voice quality, pitch and loudness as compared to the normal voice of a healthy person [21]. Major voice disorders include vocal nodules, polyps, and cysts (collectively referred to as Phonotrauma); glottis neoplasm; and unilateral vocal paralysis. Voice disorders may affect the social, professional and personal aspects of a person's communication, hindering growth in all these aspects [2].
Current approaches for voice disorder detection require rigorous endoscopy (i.e. laryngeal endoscopy), a multistep examination including mirror examination, rigid and flexible laryngoscopy, and videostroboscopy [1][22]. This rigorous examination requires a lot of expensive medical resources and delays the diagnosis of voice disorders; the resulting delay in treatment worsens the severity of the disease. Sometimes voice disorders remain unidentified, since most people consider them normal, owing to inefficient and slow screening methods. Accuracy in diagnosis is also important in order to cure the correct disorder with proper treatment.

Automated detection of voice disorders is crucial to mitigate these problems, since it makes the diagnosis process simpler, cheaper and less time consuming. Recent research on computerized detection of voice disorders has studied various machine learning techniques and a few deep learning techniques [3-13]. The majority of the previous work deals with machine learning techniques for voice disorder detection [3,4]. [3] used rule based analysis of various acoustic measures such as fundamental frequency, jitter, shimmer etc., and then applied the logistic model tree algorithm, instance based learning and SVM algorithms, while [4] used SVM and decision trees for detecting voice disorders. Muhammad et al. [5] used a Gaussian mixture model (GMM) to classify 6 different types of voice disorders.

Deep learning is widely used nowadays for image recognition, music genre classification and various other applications, and has recently been applied to voice disorder detection tasks [6-10]. Most recently, [6] applied deep neural networks (DNN) for voice disorder detection using a dataset of Far Eastern Memorial Hospital (FEMH) with 60 normal voice samples and 402 various voice disorder samples, and achieved the highest accuracy as compared to other machine learning approaches. The authors of [7] discussed the use of deep neural networks (DNN) in acoustic modeling; they applied DNNs to various speech recognition tasks and found that they perform well. Wu et al. [8] used a convolutional neural network (CNN) for vocal cord paralysis, which is a challenging medical classification problem. Alhussein et al. [9] applied deep learning in a mobile healthcare framework to detect voice disorders.

Despite the success of the above mentioned models, recurrent neural networks (RNN) have not been used for voice disorder tasks. Recurrent neural networks are widely used for speech recognition, music genre classification, natural language processing and sequence prediction problems [11-12]. Long short term memory (LSTM) is a special type of recurrent neural network which is widely used for long term dependencies. [11] used LSTM for voice activity detection, which separates incoming speech from noise. Convolutional neural networks are used along with LSTM in [12] to determine dysarthric speech disorder. To the best of our knowledge, none of these studies used LSTM for the voice disorder detection task.

Our major contributions in this paper are: (1) to propose an approach to detect pathological voice disorders using a Long Short Term Memory (LSTM) model; and (2) to evaluate LSTM performance in differentiating normal and pathological voice samples. The rest of the paper is organized as follows. Section II discusses the material and methods. Section III describes our experimental setup along with results, and Section IV concludes the paper.
II. METHOD
This section provides a brief overview of our proposed approach, with a general description of the Long Short Term Memory (LSTM) model used in our experiments and a description of the dataset and preprocessing.
A. Overview of proposed approach
Our proposed approach starts by loading the input voice samples provided by the Far Eastern Memorial Hospital (FEMH) voice disorder detection challenge [16], as shown in Figure 1. The training dataset includes 50 normal voice samples and 150 samples of common voice disorders, including vocal nodules, polyps, and cysts (collectively referred to as Phonotrauma); glottis neoplasm; and unilateral vocal paralysis.

Fig. 1 Overview of proposed approach (pipeline: loading the FEMH voice disorder detection dataset, feature extraction, trained LSTM model, classification)

The feature extraction process is performed after loading the data; it computes Mel-frequency cepstral coefficients (MFCC), spectral centroid, chroma and spectral contrast features, comprising 33 features for each audio sample. Details are provided in the following sections. The LSTM model is then trained on these features and used for classification.
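As an illustration, the following sketch shows one plausible way to extract such a 33-dimensional feature vector with the librosa library. The per-type breakdown (13 MFCC + 1 spectral centroid + 12 chroma + 7 spectral contrast = 33, averaged over time frames) is our assumption; the paper lists the feature types but not their individual counts.

import numpy as np
import librosa

def extract_features(wav_path, sr=22050, duration=4.0):
    """Extract a 33-dimensional feature vector from one voice sample.

    Assumed breakdown (13 MFCC + 1 centroid + 12 chroma + 7 contrast = 33);
    the paper does not specify the exact per-type counts.
    """
    y, sr = librosa.load(wav_path, sr=sr, duration=duration)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, T)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # shape (1, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # shape (12, T)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # shape (7, T)
    # Average each feature over time frames and stack into one vector.
    feats = np.concatenate([
        mfcc.mean(axis=1),
        centroid.mean(axis=1),
        chroma.mean(axis=1),
        contrast.mean(axis=1),
    ])
    return feats  # shape: (33,)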
B. Long Short Term Memory (LSTM) Model
Long Short Term Memory (LSTM) networks are a special type of recurrent neural network capable of learning long term dependencies [17]. A typical LSTM network has 4 layers, i.e. an input layer, 2 hidden layers and one output layer. It contains three gates: the forget gate, the input gate and the output gate.

Fig. 2 LSTM Network
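A minimal sketch of such a network in Keras is shown below. The input shaping (feeding the 33-dimensional feature vector as a length-33 sequence), the hidden layer sizes and the four-class output (normal, neoplasm, Phonotrauma, vocal palsy) are our assumptions, not hyperparameters specified in the paper.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 33   # per-sample feature vector length (Section II-A)
NUM_CLASSES = 4     # assumed: normal, neoplasm, Phonotrauma, vocal palsy

# An input layer, two hidden LSTM layers and a dense output layer,
# mirroring the architecture described above.
model = keras.Sequential([
    layers.Input(shape=(NUM_FEATURES, 1)),   # treat features as a sequence
    layers.LSTM(64, return_sequences=True),  # first hidden LSTM layer
    layers.LSTM(64),                         # second hidden LSTM layer
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# X: (n_samples, 33, 1) feature array, y: integer class labels
# model.fit(X, y, epochs=50, batch_size=16, validation_split=0.2)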
The forget gate layer decides what information is kept or thrown away from the cell state. It takes ht-1 and xt as input and outputs a number between 0 and 1 using ft, as in Eqn. (1); a value of 0 indicates "completely remove" and a value of 1 "completely keep".

ft = σ(Wf [ht-1, xt] + bf)   (1)

Next we need to decide what information is stored in the cell state. This has two parts: first, the input gate layer decides which values are to be updated, and then a tanh layer generates a vector of new candidate values to be added. it is the function used by the input gate layer and C̃t is the vector of new candidate values from the tanh layer, as shown in Eqns. (2) and (3).

it = σ(Wi [ht-1, xt] + bi)   (2)

C̃t = tanh(WC [ht-1, xt] + bC)   (3)

The old cell state Ct-1 is then updated to the new cell state Ct, as in Eqn. (4).

Ct = ft * Ct-1 + it * C̃t   (4)

Finally, we need to decide the output using the output gate. First we run the sigmoid layer giving ot, as shown in Eqn. (5), and then its output is multiplied by the tanh of the cell state to get the output, as shown in Eqn. (6).

ot = σ(Wo [ht-1, xt] + bo)   (5)

ht = ot * tanh(Ct)   (6)

outputclass = σ(ht * Woutparameter)   (7)

The output class of the LSTM network is determined by Eqn. (7). Wf, Wi, WC, Wo and Woutparameter are the weights; bf, bi, bC and bo are the biases; ht is the output at time t; xt is the input feature vector; and outputclass is the classification output.
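To make Eqns. (1)-(7) concrete, the following sketch steps a single LSTM cell through one time step in NumPy. The dimensions and random weights are illustrative assumptions only; a trained network would learn these parameters.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 33, 8                      # illustrative sizes, not from the paper
Wf, Wi, WC, Wo = (rng.normal(size=(n_hid, n_hid + n_in)) for _ in range(4))
bf, bi, bC, bo = (np.zeros(n_hid) for _ in range(4))
W_out = rng.normal(size=n_hid)           # output-layer weights (Eqn. 7)

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)           # Eqn. (1): forget gate
    i_t = sigmoid(Wi @ z + bi)           # Eqn. (2): input gate
    C_tilde = np.tanh(WC @ z + bC)       # Eqn. (3): candidate values
    C_t = f_t * C_prev + i_t * C_tilde   # Eqn. (4): new cell state
    o_t = sigmoid(Wo @ z + bo)           # Eqn. (5): output gate
    h_t = o_t * np.tanh(C_t)             # Eqn. (6): hidden state
    return h_t, C_t

h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(rng.normal(size=n_in), h, C)
out = sigmoid(h @ W_out)                 # Eqn. (7): classification output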
C. Dataset and Preprocessing
The dataset comprises 200 samples in the training set and 400 samples in the testing set. Of the 150 common voice disorder samples in the training set, 40 are glottis neoplasm, 60 are Phonotrauma and 50 are vocal palsy. The labels of the training dataset include gender, age, whether the speaker is healthy or not, and the corresponding voice disease.
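For instance, assuming the training labels are distributed as a CSV with file name, gender, age and disease columns (a hypothetical layout; the actual FEMH format may differ), the training matrix could be assembled as follows, reusing the extract_features sketch from Section II-A:

import numpy as np
import pandas as pd

# Hypothetical label file layout; the real FEMH format may differ.
labels = pd.read_csv("train_labels.csv")  # columns: file, gender, age, disease

CLASS_IDS = {"normal": 0, "neoplasm": 1, "phonotrauma": 2, "vocal_palsy": 3}

X = np.stack([extract_features(f"train/{name}") for name in labels["file"]])
y = labels["disease"].str.lower().map(CLASS_IDS).to_numpy()

print(X.shape, y.shape)  # expected: (200, 33) (200,)
# For the LSTM sketch above, add a channel axis: X = X[..., np.newaxis]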
Voice samples of a 3-second sustained vowel sound were recorded at a comfortable level of loudness, with a microphone-to-mouth distance of approximately 15–20 cm, using a high-quality microphone (Model: SM58, SHURE, IL) with a digital amplifier (Model: X2u, SHURE), under a background noise level between 40 and 45 dBA. The sampling rate was 44,100 Hz with 16-bit resolution, and data were saved in an uncompressed .wav format as used in [6]. Further dataset information is given in [6][16].
Voice samples are visualized using waveforms, as shown in Figures 3-6; the y-axis represents the amplitude of the voice sample and the x-axis the time duration. We plotted a 4-second duration of each type of voice sample at a sampling rate of 22050 Hz.

Fig. 5 Waveform of Phonotrauma Voice disorder

Fig. 6 Waveform of Vocal Palsy Voice disorder
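A sketch of how such waveform plots can be produced with librosa and matplotlib is given below; the file names are placeholders for the FEMH samples, which are not named in the paper.

import librosa
import librosa.display
import matplotlib.pyplot as plt

# Placeholder file names; the FEMH audio files are not named in the paper.
samples = {
    "Normal": "normal.wav",
    "Neoplasm": "neoplasm.wav",
    "Phonotrauma": "phonotrauma.wav",
    "Vocal Palsy": "vocal_palsy.wav",
}

fig, axes = plt.subplots(len(samples), 1, figsize=(8, 10), sharex=True)
for ax, (label, path) in zip(axes, samples.items()):
    # Load 4 seconds of audio resampled to 22050 Hz, as in the paper.
    y, sr = librosa.load(path, sr=22050, duration=4.0)
    librosa.display.waveshow(y, sr=sr, ax=ax)  # amplitude vs. time
    ax.set_title(f"Waveform of {label} voice sample")
    ax.set_ylabel("Amplitude")
axes[-1].set_xlabel("Time (s)")
plt.tight_layout()
plt.show()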