
Gender Recognition from Audio using MATLAB

Ansh Singhal
Electronics and Telecommunication
Sardar Patel Institute of Technology
[email protected]

Ankit Prajapati
Electronics and Telecommunication
Sardar Patel Institute of Technology
[email protected]

Vineet Pundpal
Electronics and Telecommunication
Sardar Patel Institute of Technology
[email protected]

Abstract—This paper presents the design and implementation of an algorithm to predict the gender of an individual from features extracted from pre-recorded audio data. The algorithm uses pitch and energy as the key features and applies predefined thresholds for classification. The algorithm was implemented in MATLAB and achieved efficiency rates of 81.1% for male voice prediction and 72.5% for female voice prediction. The paper provides a flow chart outlining the steps involved in the process, including audio recording, pre-processing and filtering, feature extraction, gender classification, and display of the predicted gender. It also suggests exploring more advanced signal processing techniques or machine learning models for better accuracy if the results are not satisfactory. Two external references are provided for further information on gender-based speaker recognition and GMM models.

Keywords— prediction, efficiency, filtering, audio.


I. INTRODUCTION
This paper focuses on the design and implementation of an algorithm for gender recognition from pre-recorded as well as live audio data. The objective is to predict the gender of an individual based on features extracted from the audio, specifically pitch and energy.
Next, pre-processing and filtering techniques are applied to remove noise and enhance the quality of the recorded audio data. Feature extraction is then performed, focusing on two key features: pitch, which represents the fundamental frequency of the voice, and energy, which represents the overall energy of the audio signal.
Gender classification is achieved by setting predefined thresholds for pitch and energy. If the extracted features meet the established thresholds, the algorithm classifies the gender accordingly. The predicted gender is then displayed based on the classification results.
If the results are not satisfactory, more advanced signal processing techniques or machine learning models can be explored. External sources on gender-based speaker recognition and Gaussian Mixture Models (GMM) are referenced for further information.

The paper emphasizes the significance of accurate gender recognition and proposes a solution based on pitch and energy features. It provides implementation details, result analysis, and suggestions for further improvement.
II. PROPOSED METHOD

The proposed methodology for "Gender Recognition from Audio" involves the following steps:

2.1 Audio Recording:

Utilize the audio recorder to capture audio input for a predefined duration (e.g., 5 seconds). Adjust the recording time based on the application requirements.
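A minimal MATLAB sketch of this recording step is shown below; the 44100 Hz sampling rate matches the recordings described in Section IV, while the variable names are illustrative rather than taken from the project code:

```matlab
% Record 5 seconds of mono audio at 44100 Hz with 16-bit samples
fs = 44100;                          % sampling frequency (Hz)
recObj = audiorecorder(fs, 16, 1);   % 16 bits, 1 channel (mono)
disp('Recording...');
recordblocking(recObj, 5);           % block until 5 seconds are captured
disp('Recording finished.');
audioData = getaudiodata(recObj);    % column vector of samples
```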


2.2 Preprocessing:

Apply any necessary preprocessing steps, such as filtering out noise from the recorded audio signal. This may involve techniques like noise reduction or filtering.
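The paper does not specify the exact filter used; as one possible realization, a band-pass filter restricted to a typical speech band could be applied with the Signal Processing Toolbox. The 80-3400 Hz pass band below is an assumption, not a value from the paper:

```matlab
% Band-pass filter the recording to an assumed speech band of 80-3400 Hz
[b, a] = butter(4, [80 3400] / (fs / 2), 'bandpass');  % 4th-order Butterworth
filteredAudio = filtfilt(b, a, audioData);             % zero-phase filtering
```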
2.3 Feature Extraction:

Extract relevant features from the pre-processed audio data. The code currently extracts pitch and energy as features. Consider exploring additional features that might enhance gender classification accuracy, such as formant frequencies, MFCCs (Mel-Frequency Cepstral Coefficients), or other spectral features.
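One way to compute the two features in MATLAB is sketched below; the autocorrelation-based pitch estimate and the mean-square energy are illustrative choices under stated assumptions, not necessarily the exact computation used in the project:

```matlab
% Energy: mean squared amplitude of the filtered signal
energy = mean(filteredAudio .^ 2);

% Pitch: autocorrelation-based estimate of the fundamental frequency,
% searching lags that correspond to 60-400 Hz (typical adult speech)
minLag = floor(fs / 400);
maxLag = ceil(fs / 60);
[r, lags] = xcorr(filteredAudio, maxLag, 'coeff');
keep     = lags >= minLag;           % positive lags inside the search range
[~, idx] = max(r(keep));             % lag with the strongest periodicity
lagsKept = lags(keep);
pitchHz  = fs / lagsKept(idx);       % convert lag (samples) to frequency (Hz)
```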

2.4 Threshold Setting:

Set appropriate thresholds for gender classification based on the extracted features (pitch and energy). Adjust these thresholds as needed by analyzing the characteristics of your dataset.
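The decision rule can then be a simple comparison against the chosen thresholds. The numeric values below are placeholders for illustration only; the paper does not state its exact thresholds, which would be tuned from data such as Table 1.1:

```matlab
% Rule-based gender decision from pitch and energy
pitchThreshold  = 250;   % placeholder pitch boundary (Hz), not from the paper
energyThreshold = 150;   % placeholder energy boundary (same scale as 'energy')
if pitchHz < pitchThreshold && energy < energyThreshold
    predictedGender = 'Male';
else
    predictedGender = 'Female';
end
```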
2.5 Audio File Uploading:

If you plan to use pre-recorded audio files instead of real-time recording, uncomment and modify the code for loading audio files.
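A minimal sketch of this file-loading alternative (the file name is a hypothetical placeholder):

```matlab
% Load a pre-recorded .wav file instead of capturing from the microphone
[audioData, fs] = audioread('sample_voice.wav');   % placeholder file name
audioData = audioData(:, 1);                       % keep a single (mono) channel
```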
2.6 Machine Learning Model:

[1] Consider integrating a machine learning model for gender classification. This could involve training a model on a labeled dataset using techniques like support vector machines (SVM), neural networks, or other classifiers. Evaluate the model's performance on a diverse dataset to ensure robustness.
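If this extension were pursued, a sketch using MATLAB's Statistics and Machine Learning Toolbox might look as follows; the feature matrix and label vector are assumed inputs built from a labeled recording set such as the one described in Section IV:

```matlab
% Train an SVM on [pitch, energy] feature pairs with known gender labels
% features: N-by-2 matrix of [pitch, energy]; labels: N-by-1 'Male'/'Female'
svmModel = fitcsvm(features, labels, 'KernelFunction', 'rbf', ...
                   'Standardize', true);
cvModel  = crossval(svmModel, 'KFold', 5);          % 5-fold cross-validation
accuracy = 1 - kfoldLoss(cvModel);                  % estimated accuracy
newLabel = predict(svmModel, [pitchHz, energy]);    % classify a new sample
```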
2.7 Visualization:

Use audio visualization tools to gain insights into the characteristics of the audio signal. The code includes a plot of the audio data, which can be helpful for debugging and analysis.
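A basic time-domain plot of the kind described here could be produced as follows (the figure title is illustrative):

```matlab
% Plot the recorded waveform against time for visual inspection
t = (0:length(audioData) - 1) / fs;   % time axis in seconds
plot(t, audioData);
xlabel('Time (s)');
ylabel('Amplitude');
title('Recorded audio signal');
grid on;
```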

2.8 Output Display:

Display the predicted gender based on the defined thresholds. Make sure the output aligns with the application's requirements and is easily interpretable.
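As a minimal illustration, continuing the variable names from the sketches above, the result can simply be printed to the MATLAB command window:

```matlab
% Report the classification result
fprintf('Predicted gender: %s\n', predictedGender);
```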
2.9 Testing and Evaluation:

Test the system on a variety of audio samples, considering different speakers, accents, and environmental conditions. Evaluate the accuracy, precision, recall, and other relevant metrics to assess the performance of the gender recognition system.

2.10 Adjustments and Optimization:

Continuously refine the methodology by adjusting parameters, thresholds, or incorporating more advanced techniques to improve accuracy. Consider user feedback and real-world testing to make the system more robust and reliable.
III. IMPLEMENTATION

The implementation of our proposed method involves the following steps:

Step 1: Define Feature Extraction:
Task: Extract relevant features from audio signals.
Method: Use signal processing techniques to capture key characteristics such as pitch, energy, formants, or other acoustic properties.
Tools: Libraries like librosa in Python can assist in feature extraction.

Step 2: Choose Discriminative Features:
Task: Identify features that are indicative of gender differences in audio.
Considerations: Pitch and energy are common features, but additional characteristics such as spectral features or formants may also be useful.

Step 3: Set Thresholds:
Task: Define thresholds for feature values that indicate male or female characteristics.
Considerations: Experiment with different threshold values based on the distribution of features in your dataset.

Step 4: Audio Preprocessing (Optional):
Task: Preprocess audio data to enhance relevant features and reduce noise.
Methods: Apply techniques such as filtering, noise reduction, or equalization as needed.

Step 5: Test on Diverse Dataset:
Task: Evaluate the system on a diverse set of audio samples.
Considerations: Include a variety of speakers, accents, and environmental conditions to ensure robustness.

Step 6: Adjust for Variability:
Task: Account for variations in pitch, accent, and other factors.
Considerations: Implement mechanisms to handle variability, such as dynamic thresholds or adaptive feature extraction.

Step 7: Evaluate and Refine:
Task: Assess the accuracy and reliability of the system.
Considerations: Analyze false positives/negatives and refine the system accordingly. Iterate on feature extraction and threshold setting.

Step 8: Document the Methodology:
Task: Document the steps, parameters, and decisions made during the implementation.
Considerations: Provide clear documentation for future reference and collaboration.

Step 9: Ethical Considerations:
Task: Consider the ethical implications of gender recognition.
Considerations: Ensure fairness, transparency, and privacy in the application of gender recognition technology.

Step 10: Accessibility and Bias:
Task: Address potential biases in the system.
Considerations: Test the system's performance across different demographic groups to avoid perpetuating gender biases.

Step 11: User Interface (Optional):
Task: If applicable, design a user-friendly interface for utilizing the gender recognition system.
Considerations: Prioritize clarity, accessibility, and user consent.

Step 12: Maintenance and Updates:
Task: Establish a plan for maintaining and updating the system.
Considerations: Stay informed about advancements in audio processing and adapt the system accordingly.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

4.1 Pitch:

[2] The pitch period is defined as the time interval between two consecutive voiced excitation cycles, i.e., the distance in time from one peak to the next; its reciprocal is the fundamental frequency of the excitation source. For example, a pitch period of 8 ms corresponds to a fundamental frequency of 1/0.008 s = 125 Hz. Hence an efficient pitch extractor and an accurate pitch estimator can be used in an algorithm for gender classification. Fundamental frequency (f0) estimation, also referred to as pitch detection, has been a popular research topic for many years and is still being investigated.

4.2 Speech Database:

Recordings of 300 different male and female speakers in the 20-22 age group were made. The sentence recorded was "Hello, how are you?". The recording was done in the recording software Sony Sound Forge. The speech signals were recorded at a sampling frequency of 44100 Hz in .wav format on a mono channel. The plots for 10 male and 10 female samples were tested and plotted (the prefix 'S' in Table 1.1 denotes a sample).

Sr No.  Ground Truth  Predicted Gender  Pitch Mean  Energy Mean  Correct Prediction
S1      Male          Male              123         88           Yes
S2      Male          Male              181         105          Yes
S3      Male          Male              176         67           Yes
S4      Male          Male              197         107          Yes
S5      Male          Male              255         275          Yes
S6      Male          Male              259         50           Yes
S7      Male          Female            278         499          No
S8      Male          Male              214         115          Yes
S9      Male          Male              231         56           Yes
S10     Male          Female            286         274          No
S11     Female        Female            318         559          Yes
S12     Female        Female            261         255          Yes
S13     Female        Male              234         34           No
S14     Female        Male              234         263          No
S15     Female        Female            266         261          Yes
S16     Female        Male              248         564          No
S17     Female        Female            275         150          Yes
S18     Female        Female            256         459          Yes
S19     Female        Female            260         713          Yes
S20     Female        Female            274         209          Yes

Table 1.1: Gender Prediction Samples
4.3 Efficiency:

Efficiency = (Correct Predictions / Total Predictions) × 100%

For the 20 samples listed in Table 1.1, 8 of the 10 male samples and 7 of the 10 female samples were predicted correctly (80% and 70%, respectively). The overall efficiency was computed after taking 50 male and 50 female samples:

Efficiency of male voice prediction: 81.1%
Efficiency of female voice prediction: 72.5%
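As a sketch of how this figure can be computed in MATLAB from ground-truth and predicted labels, the short cell arrays below are illustrative placeholders standing in for the full 50+50 sample set:

```matlab
% Efficiency = correct predictions / total predictions * 100%
% groundTruth and predicted are illustrative label arrays, not project data
groundTruth = {'Male', 'Male', 'Female', 'Female'};
predicted   = {'Male', 'Female', 'Female', 'Female'};
correct     = strcmp(groundTruth, predicted);          % element-wise comparison
efficiency  = sum(correct) / numel(correct) * 100;     % 75% for this toy example
fprintf('Efficiency: %.1f%%\n', efficiency);
```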
4.4 Analysis:

Figure 1: Male Audio
Figure 2: Female Audio

V. CONCLUSION

This paper has presented the design and implementation of an algorithm aimed at predicting the gender of individuals based on features extracted from pre-recorded audio data. The algorithm relies on pitch and energy as primary features and employs predefined thresholds for classification. Implemented using MATLAB, the system achieved efficiency rates of 81.1% for male voice prediction and 72.5% for female voice prediction.

The paper outlines a systematic process through a flow chart, encompassing key stages such as audio recording, pre-processing and filtering, feature extraction, gender classification, and display of the predicted gender. This structured approach allows for a clear understanding of the system's workflow.
While the implemented algorithm demonstrates reasonable efficiency, there remain clear avenues for improvement. Specifically, the exploration of more advanced signal processing techniques or machine learning models is recommended, particularly if the achieved results fall short of expectations. This reflects an acknowledgment of the evolving nature of audio-based gender prediction systems and the continuous pursuit of enhanced accuracy.

[3] Furthermore, the inclusion of external references on gender-based speaker recognition and Gaussian Mixture Models (GMM) demonstrates a commitment to leveraging existing knowledge and methodologies within the field.
ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my college professors, Dr. Kiran Talele and Dr. Reena Sonkusare, for their unwavering support and guidance throughout the development of this research project. Their expertise, encouragement, and patience have been instrumental in shaping my understanding of gender recognition from audio.

We are particularly grateful to Dr. Kiran Talele for their invaluable mentorship and insightful feedback, which helped us refine our research methodology and strengthen the overall quality of this work. Their dedication to teaching and passion for research have been a constant source of inspiration.

I also extend my sincere thanks to Dr. Reena Sonkusare for their encouragement and willingness to share their knowledge and expertise. Their guidance has been instrumental in helping me overcome obstacles and reach new levels of understanding of this work.

The insights and support of both professors have been invaluable throughout this research journey. I am truly honoured to have had the opportunity to learn from such dedicated and passionate educators.

REFERENCES

[1] B. Jena, A. Mohanty, and S. K. Mohanty, "Gender Recognition of Speech Signal using KNN and SVM," in International Conference on Smart Data Intelligence (ICSMDI 2021), 2021. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3852607

[2] P. Kumar, N. Jakhanwal, A. Bhowmick, and M. Chandra, "Gender Classification Using Pitch and Formants," in Proceedings of the IEEE International Conference on Pattern Recognition, December 2008, pp. 1-4. https://www.researchgate.net/publication/220846517_Gender_classification_using_pitch_and_formants

[3] M. Gupta, S. S. Bharti, and S. Agarwal, "Gender-based speaker recognition from speech signals using GMM model," Modern Physics Letters B, vol. 33, no. 35, p. 1950438, November 2019. DOI: 10.1142/S0217984919504384. https://www.researchgate.net/publication/337227020_Gender-based_speaker_recognition_from_speech_signals_using_GMM_model
