Gender Detection by Voice Using Deep Learning
Abstract:- The recognition of gender from voice is an important part of identifying particular voices. To distinguish gender from sound signals, sound-processing techniques define the gender-relevant (male or female) features of these sound signals. In this study, we used various models to improve accuracy, one of which is deep learning with a DNN-based voice gender method. Noise reduction uses feature extraction with the Mel Frequency Cepstral Coefficients (MFCC), and the sound classification then uses an SVM, with a separation ratio of 80% training data to 20% testing data. The results showed that using a DNN for voice recognition performed better, and pairing it with the SVM algorithm obtained an accuracy of 0.97.

Keywords:- Voice Recognition, Deep Neural Network, Deep Learning, MFCC, SVM.

I. INTRODUCTION
Voice recognition is an important research area that is currently used in a wide variety of applications, such as security systems, authentication, and so on. Voice recognition must have high performance; one way to improve speech recognition performance is to add a gender classification procedure. With this gender classification, the problem space in speech recognition can be limited to a predetermined gender[1].
Voice data are divided into training data and testing data, with gender classified into two categories, namely male and female. Male and female voices have distinct characteristics due to different resonances in the throat[2]. By processing the sound signals, these characteristics are obtained in a form that a computer can recognize, and with these characteristics the computer can identify gender through sound signals. Therefore, we need a learning algorithm that can help humans detect voices based on gender. A deep learning algorithm can support this detection because it can predict more accurately and quickly for speech recognition.
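As a concrete illustration of this division, the sketch below performs the 80/20 split stated in the abstract using scikit-learn; the feature matrix is a random placeholder standing in for extracted voice features, sized to the 350 speakers (190 men, 160 women) of the dataset described later.

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features: one row per recording; 13 columns stand in for
# e.g. averaged MFCC coefficients. Labels: 0 = male, 1 = female.
X = np.random.rand(350, 13)
y = np.array([0] * 190 + [1] * 160)

# 80% training data, 20% testing data, stratified to keep the male/female
# proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)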
Deep Learning provides excellent models for detection tasks such as image recognition, emotion recognition, and speech recognition. The deep learning model commonly used for speech recognition is the deep neural network (DNN); therefore, deep learning with the DNN model will be used here to perform speech recognition. The deep learning algorithm used for this study relies on several existing feature extractions. MFCC was chosen for feature extraction because it is a fairly good feature extraction method for noise reduction and requires fast, easy, and complete processing. Meanwhile, the voices are classified by gender using a support vector machine (SVM)[3].

II. RELATED WORK

Several studies discuss detecting voices based on gender. Previous research conducted by Martin and Joensuu developed speech recognition in which the detected GMM and FFT features gave the best results at the classification level[4][5].

S. L. Yuan[6] developed a voice recognition system that detected gender based on voice using deep learning with the Deep Neural Network (DNN) algorithm, resulting in a Word Error Rate (WER) in speech recognition that showed less than optimal results.

Also, Lee and Kwak[7] used a DNN and two classifiers to detect voices based on gender: an SVM and a decision tree (DT). In their research, MFCC feature extraction was used to identify gender from voices and resulted in fairly good accuracy.

III. DEEP LEARNING

Deep learning is a method that is often used in the field of machine learning, based on the Artificial Neural Network (ANN) model. Deep learning can solve problems with large datasets, such as image recognition, text detection, speech recognition, and audio processing, because it provides techniques for learning feature extraction from the training data, especially for speech recognition. Building on the Artificial Neural Network, a method that adds hidden layers, deep learning starts with the input layer (the voice recording), which is then processed as a signal passed between interconnected nodes and ultimately through the output, from which the accuracy is obtained.

One of the deep learning algorithms is called the Deep Neural Network (DNN). The DNN is one of the developments of the Artificial Neural Network. The DNN method is capable of performing voice recognition with good results because it can determine the feature extraction in each layer.
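Since the paper describes the DNN only at this high level, the sketch below shows one plausible setup: a small fully connected network in Keras with a two-class (male/female) output, alongside the scikit-learn SVM classifier that the abstract pairs it with. The layer sizes, the 13-dimensional input, and the RBF kernel are illustrative assumptions, not the authors' reported configuration.

from sklearn.svm import SVC
from tensorflow import keras

n_features = 13  # assumed input size, e.g. 13 averaged MFCC coefficients

# Input layer (voice features), hidden layers of interconnected nodes, and a
# sigmoid output giving the probability of one gender class.
dnn = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
dnn.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])

# The companion SVM classifier used for the gender decision.
svm = SVC(kernel="rbf")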
B. Dataset
This study uses a dataset[8] of male and female voices. The data consist of 350 speakers in total, 190 men and 160 women, who were involved in the voice recordings. The corresponding audio is saved as mono, 16-bit, 32 kHz WAV files.
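A minimal sketch of reading one such recording with librosa; the directory layout and file name are hypothetical, since the paper identifies the dataset only by its reference [8].

import librosa

# The recordings are mono, 16-bit, 32 kHz WAV files, so we keep the native
# 32 kHz sampling rate rather than librosa's 22.05 kHz default.
signal, sample_rate = librosa.load("dataset/male/speaker_001.wav",
                                   sr=32000, mono=True)
print(signal.shape, sample_rate)  # 1-D float array of samples, 32000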
C. Extraction Feature
The voice recordings processed previously are then put through several feature-extraction methods, including the spectral centroid, spectral bandwidth, spectral rolloff, MFCC (Mel Frequency Cepstral Coefficients), zero-crossing rate, and chroma features. Here, however, we focus on feature extraction for speech recognition using MFCC.
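As a sketch of this feature set, the snippet below computes all six listed features per frame with librosa; the file path is the hypothetical one used above, and averaging over time into one fixed-length vector per recording is a common convention rather than something the paper specifies.

import numpy as np
import librosa

signal, sr = librosa.load("dataset/male/speaker_001.wav", sr=32000)

features = {
    "spectral_centroid":  librosa.feature.spectral_centroid(y=signal, sr=sr),
    "spectral_bandwidth": librosa.feature.spectral_bandwidth(y=signal, sr=sr),
    "spectral_rolloff":   librosa.feature.spectral_rolloff(y=signal, sr=sr),
    "mfcc":               librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13),
    "zero_crossing_rate": librosa.feature.zero_crossing_rate(signal),
    "chroma":             librosa.feature.chroma_stft(y=signal, sr=sr),
}

# Each value is a (coefficients x frames) array; the mean over frames gives
# one fixed-length feature vector describing the whole recording.
vector = np.concatenate([f.mean(axis=1) for f in features.values()])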
1) Spectral Centroid
The spectral centroid characterizes the energy of the frequency spectrum by indicating where the center of mass of the sound is located (Fig. 2).
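As a sketch of this definition, the centroid of each frame can be computed directly as the magnitude-weighted mean frequency of the spectrum, which matches what librosa.feature.spectral_centroid computes; the file path is hypothetical.

import numpy as np
import librosa

signal, sr = librosa.load("dataset/male/speaker_001.wav", sr=32000)

S = np.abs(librosa.stft(signal))        # magnitude spectrogram (bins x frames)
freqs = librosa.fft_frequencies(sr=sr)  # center frequency of each FFT bin

# Center of mass of the spectrum in every frame: sum(f * |S|) / sum(|S|).
centroid = (freqs[:, None] * S).sum(axis=0) / S.sum(axis=0)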
3) Spectral Rolloff
The spectral rolloff represents the frequency at which the high-frequency content of the spectrum drops toward zero; it is defined here as the point in the power spectrum below which 85% of the power lies (Fig. 4).

Fig 4:- Rolloff Spectral Features
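In librosa this threshold appears as the roll_percent argument; the sketch below uses 0.85 to match the 85% figure above, with the same hypothetical file path.

import librosa

signal, sr = librosa.load("dataset/male/speaker_001.wav", sr=32000)

# Frequency per frame below which 85% of the spectral energy is contained.
rolloff = librosa.feature.spectral_rolloff(y=signal, sr=sr, roll_percent=0.85)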
4) Zero-Crossing Rate
The zero-crossing rate measures the smoothness of a signal by counting the number of zero crossings within a signal segment. A sound signal that oscillates slowly has a low rate; for example, a 100 Hz signal crosses zero about 200 times per second, twice per cycle (Fig. 5).
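A brief sketch of both views of this measure, counting the sign changes directly and computing the per-frame rate with librosa; the file path is hypothetical.

import numpy as np
import librosa

signal, sr = librosa.load("dataset/male/speaker_001.wav", sr=32000)

# A zero crossing is a sign change between adjacent samples.
total_crossings = np.sum(librosa.zero_crossings(signal))

# Fraction of sign changes within each short frame, as used in feature vectors.
zcr = librosa.feature.zero_crossing_rate(signal)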
Fig 6:- Chromatogram or Spectrogram Feature

6) MFCC
Mel Frequency Cepstral Coefficients (MFCC) is a method that is widely used in the field of speech technology, for both speech recognition and voice recognition. Each frame of the signal is first weighted with a Hamming window:
W[n] = 0.54 − 0.46 cos[2πn / (N − 1)]        (1)

The inputs to the mel filterbank of 2595 and 700 are fixed, predefined values that are widely used in the MFCC method[13][14]. The last process is the Discrete Cosine Transform (DCT), whose output is called the Mel Frequency Cepstral Coefficients (MFCC) (Fig. 8).
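A sketch tying this section together: Eq. (1) is the standard Hamming window (the same definition NumPy uses), the constants 2595 and 700 define the hertz-to-mel conversion behind the filterbank, and librosa chains the windowing, mel filterbank, logarithm, and DCT inside one MFCC call. The frame length N, the helper hz_to_mel, and the file path are illustrative assumptions.

import numpy as np
import librosa

# Eq. (1): Hamming window W[n] = 0.54 - 0.46 cos(2*pi*n / (N - 1)).
N = 400  # illustrative frame length in samples
n = np.arange(N)
window = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
assert np.allclose(window, np.hamming(N))  # NumPy's definition matches Eq. (1)

# Hertz-to-mel conversion built from the constants 2595 and 700 quoted above.
def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

signal, sr = librosa.load("dataset/male/speaker_001.wav", sr=32000)
# Windowing, mel filterbank, log, and the DCT are applied internally.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)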
VI. CONCLUSION