
Gender Recognition from Audio using MATLAB

Abstract—This paper discusses the design and implementation of an algorithm to predict the gender of an individual based on features extracted from pre-recorded audio data. The algorithm utilizes pitch and energy as the key features and sets predefined thresholds for classification. The project team has implemented the algorithm in MATLAB and achieved efficiency rates of 81.1% for male voice prediction and 72.5% for female voice prediction. The paper provides a flow chart outlining the steps involved in the process, including audio recording, pre-processing and filtering, feature extraction, gender classification, and display of the predicted gender. It also suggests exploring more advanced signal processing techniques or machine learning models for better accuracy if the results are not satisfactory. Two external references are provided for further information on gender-based speaker recognition and GMM models.

Keywords—Prediction, efficiency, filtering, audio.

I. INTRODUCTION

This paper focuses on the design and implementation of an algorithm for gender recognition from pre-recorded as well as live audio data. The objective is to predict the gender of an individual based on features extracted from the audio, specifically pitch and energy.

Pre-processing and filtering techniques are first applied to remove noise and enhance the quality of the recorded audio data. Feature extraction is then performed, focusing on two key features: pitch, which represents the fundamental frequency of the voice, and energy, which represents the overall energy of the audio signal.

Gender classification is achieved by setting predefined thresholds for pitch and energy. If the extracted features meet the established thresholds, the algorithm classifies the gender accordingly. The predicted gender is then displayed based on the classification results.

More advanced signal processing techniques or machine learning models can be explored if the results are not satisfactory. External sources are referenced for further information on gender-based speaker recognition and Gaussian Mixture Models (GMM).

The paper emphasizes the significance of accurate gender recognition and proposes a solution using pitch and energy features. It provides implementation details, result analysis, and suggestions for further improvement.

II. PROPOSED METHOD

The proposed methodology for "Gender Recognition from Audio" involves the following steps:

2.1 Audio Recording:

Use the audio recorder to capture audio input for a predefined duration (e.g., 5 seconds). Adjust the recording time based on the application requirements.

2.2 Preprocessing:

Apply any necessary preprocessing steps, such as filtering out noise from the recorded audio signal. This may involve techniques like noise reduction or filtering.

2.3 Feature Extraction:

Extract relevant features from the pre-processed audio data. The code currently extracts pitch and energy as features. Consider exploring additional features that might enhance gender classification accuracy, such as formant frequencies, MFCCs (Mel-Frequency Cepstral Coefficients), or other spectral features.

2.4 Threshold Setting:

Set appropriate thresholds for gender classification based on the extracted features (pitch and energy). Adjust these thresholds as needed by analyzing the characteristics of the dataset.

2.5 Audio File Uploading:

To use pre-recorded audio files instead of real-time recording, uncomment and modify the code for loading audio files.

2.6 Machine Learning Model:

Consider integrating a machine learning model for gender classification. This could involve training a model on a labeled dataset using techniques like support vector machines (SVM), neural networks, or other classifiers [1].
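As a rough illustration of Sections 2.1-2.4 (a sketch, not the authors' MATLAB code), the pitch-and-threshold rule can be written in plain Python. The autocorrelation estimator, the synthetic test tones, and the 180 Hz threshold are all illustrative assumptions, not values taken from the paper:

```python
import math

def autocorr_pitch(signal, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) by locating the
    autocorrelation peak within a plausible voice-pitch lag range."""
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(signal) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        acf = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if acf > best_val:
            best_val, best_lag = acf, lag
    return fs / best_lag

def energy(signal):
    """Mean short-time energy of the signal."""
    return sum(x * x for x in signal) / len(signal)

def classify_gender(pitch_hz, threshold_hz=180.0):
    """Threshold rule: male voices typically have lower pitch.
    180 Hz is an illustrative value, not the paper's tuned threshold."""
    return "Male" if pitch_hz < threshold_hz else "Female"

# Synthetic stand-ins for recorded audio (0.25 s at 8 kHz).
fs = 8000
male_tone = [math.sin(2 * math.pi * 120 * n / fs) for n in range(fs // 4)]
female_tone = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs // 4)]
```

Here `classify_gender(autocorr_pitch(male_tone, fs))` yields "Male" for the 120 Hz tone, since its estimated pitch falls below the assumed threshold.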
Evaluate the model's performance on a diverse dataset to ensure robustness.

2.7 Visualization:

Use audio visualization tools to gain insights into the characteristics of the audio signal. The code includes a plot of the audio data, which can be helpful for debugging and analysis.

2.8 Output Display:

Display the predicted gender based on the defined thresholds. Make sure the output aligns with the application's requirements and is easily interpretable.

2.9 Testing and Evaluation:

Test the system on a variety of audio samples, considering different speakers, accents, and environmental conditions. Evaluate the accuracy, precision, recall, and other relevant metrics to assess the performance of the gender recognition system.

2.10 Adjustments and Optimization:

Continuously refine the methodology by adjusting parameters, thresholds, or incorporating more advanced techniques to improve accuracy. Consider user feedback and real-world testing to make the system more robust and reliable.

III. IMPLEMENTATION

The implementation of our proposed method involves the following steps:

Step 1: Define Feature Extraction
Task: Extract relevant features from audio signals.
Method: Use signal processing techniques to capture key characteristics such as pitch, energy, formants, or other acoustic properties.
Tools: Libraries like librosa in Python can assist in feature extraction.

Step 2: Choose Discriminative Features
Task: Identify features that are indicative of gender differences in audio.
Considerations: Pitch and energy are common features, but additional characteristics such as spectral features or formants may also be useful.

Step 3: Set Thresholds
Task: Define thresholds for feature values that indicate male or female characteristics.
Considerations: Experiment with different threshold values based on the distribution of features in your dataset.

Step 4: Audio Preprocessing (Optional)
Task: Preprocess audio data to enhance relevant features and reduce noise.
Methods: Apply techniques such as filtering, noise reduction, or equalization as needed.

Step 5: Test on Diverse Dataset
Task: Evaluate the system on a diverse set of audio samples.
Considerations: Include a variety of speakers, accents, and environmental conditions to ensure robustness.

Step 6: Adjust for Variability
Task: Account for variations in pitch, accent, and other factors.
Considerations: Implement mechanisms to handle variability, such as dynamic thresholds or adaptive feature extraction.

Step 7: Evaluate and Refine
Task: Assess the accuracy and reliability of the system.
Considerations: Analyze false positives/negatives and refine the system accordingly. Iterate on feature extraction and threshold setting.

Step 8: Document the Methodology
Task: Document the steps, parameters, and decisions made during the implementation.
Considerations: Provide clear documentation for future reference and collaboration.

Step 9: Ethical Considerations
Task: Consider ethical implications of gender recognition.
Considerations: Ensure fairness, transparency, and privacy in the application of gender recognition technology.

Step 10: Accessibility and Bias
Task: Address potential biases in the system.
Considerations: Test the system's performance across different demographic groups to avoid perpetuating gender biases.

Step 11: User Interface (Optional)
Task: If applicable, design a user-friendly interface for utilizing the gender recognition system.
Considerations: Prioritize clarity, accessibility, and user consent.

Step 12: Maintenance and Updates
Task: Establish a plan for maintaining and updating the system.
Considerations: Stay informed about advancements in audio processing and adapt the system accordingly.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

4.1 Pitch

The pitch period is defined as the time interval between two consecutive voiced excitation cycles, i.e., the distance in time from one peak to the next peak [2]. Its reciprocal is the fundamental frequency of the excitation source. Hence an efficient pitch extractor and an accurate pitch estimate calculator can be used in an algorithm for gender classification. Fundamental frequency (f0) estimation, also referred to as pitch detection, has been a popular research topic for many years and is still being investigated.
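The peak-to-peak definition of the pitch period suggests a direct estimator: locate prominent peaks and average their spacing. A plain-Python sketch under that assumption (illustrative only; real speech would first need smoothing and voiced-frame selection):

```python
import math

def pitch_from_peaks(signal, fs, min_height=0.5):
    """Pitch period T0 = time between consecutive prominent peaks;
    fundamental frequency f0 = 1 / T0 = fs / (period in samples)."""
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > signal[i - 1]
             and signal[i] >= signal[i + 1]
             and signal[i] > min_height]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_period = sum(gaps) / len(gaps)   # average peak-to-peak distance
    return fs / mean_period

# A 120 Hz tone sampled at 8 kHz should yield f0 close to 120 Hz.
fs = 8000
tone = [math.sin(2 * math.pi * 120 * n / fs) for n in range(2000)]
```

On noisy recordings a single spurious peak distorts the average, which is one reason autocorrelation-based pitch detectors are generally preferred.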

4.2 Speech Database

Recordings of 300 different males and females in the age group 20-22 were made. The sentence recorded was "Hello, how are you?". The recording was done in the recording software Sony Sound Forge. The speech signals were recorded at a sampling frequency of 44100 Hz in .wav format on a mono channel. Ten male and ten female samples were tested and their plots examined. In Table 1.1, 'S' denotes a sample.

Sr No.  Ground Truth  Predicted Gender  Pitch Mean (Hz)  Energy Mean  Correct Prediction
S1      Male          Male              123              88           Yes
S2      Male          Male              181              105          Yes
S3      Male          Male              176              67           Yes
S4      Male          Male              197              107          Yes
S5      Male          Male              255              275          Yes
S6      Male          Male              259              50           Yes
S7      Male          Female            278              499          No
S8      Male          Male              214              115          Yes
S9      Male          Male              231              56           Yes
S10     Male          Female            286              274          No
S11     Female        Female            318              559          Yes
S12     Female        Female            261              255          Yes
S13     Female        Male              234              34           No
S14     Female        Male              234              263          No
S15     Female        Female            266              261          Yes
S16     Female        Male              248              564          No
S17     Female        Female            275              150          Yes
S18     Female        Female            256              459          Yes
S19     Female        Female            260              713          Yes
S20     Female        Female            274              209          Yes

Table 1.1: Gender Prediction Samples

4.3 Analysis

Figure 1: Male Audio
Figure 2: Female Audio
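As a consistency check on Table 1.1 (an illustrative script, not part of the paper's MATLAB code): 8 of 10 male and 7 of 10 female samples are predicted correctly in this 20-sample subset, broadly in line with the 81.1% and 72.5% reported over the full set of 50 male and 50 female samples.

```python
# (ground_truth, predicted) pairs transcribed from Table 1.1, samples S1-S20.
pairs = [
    ("Male", "Male"), ("Male", "Male"), ("Male", "Male"), ("Male", "Male"),
    ("Male", "Male"), ("Male", "Male"), ("Male", "Female"), ("Male", "Male"),
    ("Male", "Male"), ("Male", "Female"),
    ("Female", "Female"), ("Female", "Female"), ("Female", "Male"),
    ("Female", "Male"), ("Female", "Female"), ("Female", "Male"),
    ("Female", "Female"), ("Female", "Female"), ("Female", "Female"),
    ("Female", "Female"),
]

def class_efficiency(pairs, label):
    """Correct predictions / total predictions for one class, in percent."""
    cls = [(g, p) for g, p in pairs if g == label]
    return 100.0 * sum(g == p for g, p in cls) / len(cls)
```

For this subset, `class_efficiency(pairs, "Male")` gives 80.0 and `class_efficiency(pairs, "Female")` gives 70.0.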
4.4 Efficiency

Efficiency = (Correct Predictions / Total Predictions) × 100%

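The efficiency formula maps directly to a one-line helper (the function name is illustrative):

```python
def efficiency(correct_predictions, total_predictions):
    """Efficiency = (Correct Predictions / Total Predictions) x 100%."""
    return 100.0 * correct_predictions / total_predictions
```

For example, 15 correct predictions out of 20 gives an efficiency of 75.0%.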
Overall efficiency after taking 50 male and 50 female samples:

Efficiency of male voice prediction: 81.1%
Efficiency of female voice prediction: 72.5%

V. CONCLUSION

This paper has detailed the design and implementation of an algorithm aimed at predicting the gender of individuals based on features extracted from pre-recorded audio data. The algorithm relies on pitch and energy as primary features and employs predefined thresholds for classification. Implemented in MATLAB, the system achieved efficiency rates of 81.1% for male voice prediction and 72.5% for female voice prediction.

The paper outlines a systematic process through a flow chart, encompassing key stages such as audio recording, pre-processing and filtering, feature extraction, gender classification, and display of the predicted gender. This structured approach allows for a clear understanding of the system's workflow.

While the implemented algorithm demonstrates reasonable efficiency, there remain avenues for improvement. Specifically, the exploration of more advanced signal processing techniques or machine learning models is recommended, particularly if the achieved results fall short of expectations. This reflects the evolving nature of audio-based gender prediction systems and the continuous pursuit of enhanced accuracy.

Furthermore, the inclusion of external references on gender-based speaker recognition and Gaussian Mixture Model (GMM) approaches demonstrates a commitment to leveraging existing knowledge and methodologies within the field [3].

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my college professors, Dr. Kiran Talele and Dr. Reena Sonkusare, for their unwavering support and guidance throughout the development of this research project. Their expertise, encouragement, and patience have been instrumental in shaping my understanding of "Audio steganography using LBC, ECB, and Wavelet Transform".

We are particularly grateful to Dr. Kiran Talele for their invaluable mentorship and insightful feedback, which helped us refine our research methodology and strengthen the overall quality of our work. Their dedication to teaching and passion for research have been a constant source of inspiration.

I also extend my sincere thanks to Dr. Reena Sonkusare for their encouragement and willingness to share their knowledge and expertise. Their guidance has been instrumental in helping me overcome obstacles and reach new levels of understanding in "Audio steganography using LBC, ECB, and Wavelet Transform".

The insights and support of both professors have been invaluable throughout this research journey. I am truly honoured to have had the opportunity to learn from such dedicated and passionate educators.

REFERENCES

[1] B. Jena, A. Mohanty, and S. K. Mohanty, "Gender Recognition of Speech Signal using KNN and SVM," in International Conference on Smart Data Intelligence (ICSMDI 2021), 2021. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3852607

[2] P. Kumar, N. Jakhanwal, A. Bhowmick, and M. Chandra, "Gender Classification Using Pitch and Formants," in Proceedings of the IEEE International Conference on Pattern Recognition, December 2008, pp. 1-4. https://www.researchgate.net/publication/220846517_Gender_classification_using_pitch_and_formants

[3] M. Gupta, S. S. Bharti, and S. Agarwal, "Gender-based speaker recognition from speech signals using GMM model," Modern Physics Letters B, vol. 33, no. 35, p. 1950438, November 2019. DOI: 10.1142/S0217984919504384. https://www.researchgate.net/publication/337227020_Gender-based_speaker_recognition_from_speech_signals_using_GMM_model
