
Student Project Seed Grant

Title:

Driver Monitoring System using Audio in Car

ELECTRONICS & TELECOMMUNICATION

1. Title of the Project: Driver Monitoring System using Audio in Car

2. Broad Area of Research: Artificial Intelligence, Machine Learning, Signal Processing, and Automotive Safety Systems

3. Names of Students, Parent Department(s) and USN:


1. Samarth Shinde
Electronics & Telecommunication Engineering
USN: 1MS21ET044

2. Mudit Mohan
Electronics & Telecommunication Engineering
USN: 1MS21ET032

4. Name of supervisor(s):
Dr. Viswanath Talasila
Department of Electronics & Telecommunication Engineering

5. Expertise of supervisor in the domain of the proposed seed grant: (3 to 4 lines, with details of one recent publication in the domain)

6. Abstract of proposal – 150 words

The Driver Monitoring System aims to enhance vehicular safety by analyzing audio data to monitor
driver and passenger states. The system uses advanced deep learning models like CNN and YOLO
to classify in-car sounds into categories such as speech (male, female), engine noise, and traffic. In
subsequent stages, it detects speech amidst music and identifies speaker emotions, addressing
challenges like noise and overlapping sounds. Leveraging real-time data processing, the system can
predict driver distractions and emotional states, potentially reducing accident risks. Designed for
scalability, the system integrates with automotive environments to provide robust driver assistance
solutions.

7. Brief Introduction to the proposal – 200 words

The project focuses on developing a robust driver monitoring system leveraging in-car audio
signals. By analyzing sound patterns, the system classifies environmental and human-generated
sounds, such as engine status, traffic, and speech. This innovation addresses the increasing need for
real-time monitoring systems to ensure driver and passenger safety. The first stage involves
classifying audio into predefined categories. Subsequent stages involve detecting human speech in
noisy environments and analyzing emotional states using audio signals. The project employs
advanced machine learning techniques, including convolutional neural networks (CNNs) for feature
extraction and classification, ensuring high accuracy in real-time scenarios. The system's
applicability in commercial vehicles highlights its potential impact on reducing accidents caused by
driver distraction or fatigue.
Background (literature review) – 1000 words

Introduction
The integration of machine learning in automotive systems has led to significant advancements
in driver safety and monitoring solutions. With the ever-increasing need to monitor in-car
environments, sound-based classification has emerged as a promising technique for identifying and
understanding driver states and environmental factors. This literature review explores the
methodologies and tools used in the development of an audio-based driver monitoring system,
focusing on three distinct stages: sound classification, noise removal, and emotion detection.

Stage 1: Audio Classification Using CNN


In the first stage, the system focuses on classifying in-car sounds into predefined categories:
male voice, female voice, engine noise, traffic noise, and mock sounds. Convolutional Neural
Networks (CNNs) have proven to be highly effective for audio classification tasks. Their ability to
extract spatial and temporal features from spectrograms makes them suitable for distinguishing
between different sound types. Research on datasets like UrbanSound8K and ESC-50 demonstrates
the effectiveness of CNN-based architectures in sound classification, achieving accuracy levels
exceeding 85% in most cases.

The project utilizes publicly available datasets, such as UrbanSound8K, for training and
validation. These datasets provide a diverse range of labeled audio samples, enabling the model to
learn robust features for real-world applications. Key preprocessing steps include noise reduction,
data augmentation, and feature extraction using tools like Librosa. The extracted features, such as
Mel spectrograms and MFCCs, are fed into the CNN model for training.
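As a concrete illustration of this pipeline, the sketch below extracts log-Mel spectrograms with Librosa and feeds them to a small Keras CNN. It is a minimal example rather than the project's final architecture: the clip length, sample rate, layer sizes, and file name are illustrative assumptions, while the class list follows the categories named above.

```python
import librosa
import numpy as np
import tensorflow as tf

CLASSES = ["male_voice", "female_voice", "engine", "traffic", "mock"]
SR = 22050          # sample rate assumed for UrbanSound8K-style clips
DURATION = 4.0      # seconds; shorter clips are zero-padded

def extract_log_mel(path, n_mels=64):
    """Load a clip and convert it to a fixed-size log-Mel spectrogram."""
    y, _ = librosa.load(path, sr=SR, duration=DURATION)
    y = np.pad(y, (0, max(0, int(SR * DURATION) - len(y))))
    mel = librosa.feature.melspectrogram(y=y, sr=SR, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)[..., np.newaxis]

def build_cnn(input_shape, n_classes=len(CLASSES)):
    """Small CNN over spectrogram 'images', as described in Stage 1."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

features = extract_log_mel("example_clip.wav")   # hypothetical file
model = build_cnn(features.shape)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```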

Stage 2: Noise Removal Using Facebook Demucs


Noise interference is a critical challenge in automotive audio processing. The second stage
employs Facebook Demucs, a state-of-the-art deep learning model designed for noise removal.
Demucs is based on a waveform-to-waveform approach, leveraging encoder-decoder architectures to
separate audio signals into their constituent components. This approach outperforms traditional noise
removal techniques like spectral subtraction and Wiener filtering.

The Facebook Demucs model is pre-trained on large-scale datasets and can handle diverse
noise types, including engine hum, road noise, and overlapping speech. It is particularly suitable for
this project as it enhances the clarity of human speech amidst noisy environments, which is essential
for subsequent stages. By integrating Demucs, the system achieves real-time noise suppression,
enabling accurate speech detection in dynamic automotive scenarios.
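A minimal sketch of how Demucs could be invoked for this stage, assuming the open-source demucs (v4) Python package together with torchaudio; the recording name is hypothetical, and treating the pretrained model's "vocals" stem as the speech signal is one plausible realization of the speech-enhancement role described above, not necessarily the project's configuration.

```python
import torch
import torchaudio
from demucs.pretrained import get_model
from demucs.apply import apply_model

# Load a pretrained Hybrid Transformer Demucs model (its stems include 'vocals').
model = get_model("htdemucs")
model.eval()

wav, sr = torchaudio.load("noisy_cabin.wav")           # hypothetical recording
wav = torchaudio.functional.resample(wav, sr, model.samplerate)
if wav.shape[0] == 1:                                   # htdemucs expects stereo
    wav = wav.repeat(2, 1)

with torch.no_grad():
    # apply_model takes (batch, channels, time) and returns per-stem audio.
    stems = apply_model(model, wav[None], device="cpu")[0]

speech = stems[model.sources.index("vocals")]           # 'vocals' as speech proxy
torchaudio.save("speech_only.wav", speech.cpu(), model.samplerate)
```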

Stage 3: Emotion Detection Using Google Voice Lite and Asteroid


Emotion recognition through speech analysis has gained attention for its applications in driver
assistance and passenger comfort. In this stage, the project employs Google Voice Lite and
Asteroid (an audio source separation toolkit whose pretrained models are available on Hugging
Face) to analyze speech and detect emotions such as anger, sadness, happiness, and frustration.

- Google Voice Lite: This lightweight version of Google’s voice processing models provides
efficient speech-to-text conversion and basic emotion detection capabilities. It is designed for
low-latency applications, making it ideal for in-car systems where real-time processing is essential.
- Asteroid (Hugging Face): Asteroid is a deep learning-based audio source separation and
enhancement toolkit. It incorporates advanced algorithms for feature extraction and classification,
ensuring high accuracy in emotion recognition tasks. By leveraging pre-trained Asteroid models, the
system benefits from state-of-the-art performance in recognizing nuanced emotional states.

Emotion detection models typically rely on features such as pitch, tone, and intensity extracted
from speech signals. These features are processed using recurrent neural networks (RNNs) or
transformers, which capture temporal dependencies and contextual information. The use of Google
Voice Lite and Asteroid ensures that the system remains scalable and adaptable to various acoustic
conditions.
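Since the exact interfaces of Google Voice Lite and Asteroid are not specified in this proposal, the sketch below illustrates only the generic approach described in this paragraph: frame-level pitch and intensity features feeding a small recurrent classifier. The feature choices, layer sizes, and emotion labels are illustrative assumptions.

```python
import librosa
import numpy as np
import tensorflow as tf

EMOTIONS = ["anger", "sadness", "happiness", "frustration"]  # from the proposal

def prosodic_features(path, sr=16000, hop=512):
    """Frame-level pitch and intensity, the features named above."""
    y, _ = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=60.0, fmax=400.0,
                            sr=sr, hop_length=hop)         # fundamental frequency
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]       # intensity proxy
    f0 = np.nan_to_num(f0)                                  # unvoiced frames -> 0
    n = min(len(f0), len(rms))
    return np.stack([f0[:n], rms[:n]], axis=-1)             # (frames, 2)

def build_emotion_rnn(n_features=2, n_classes=len(EMOTIONS)):
    """An LSTM over the frame sequence captures temporal dependencies."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_features)),    # variable-length input
        tf.keras.layers.Masking(),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```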

Existing Research and Comparisons

1. Audio Classification: Previous studies have demonstrated the effectiveness of CNNs in
audio classification tasks. For instance, Salamon et al. (2014) explored feature extraction techniques
like MFCCs and Mel spectrograms for environmental sound classification, achieving state-of-the-art
results with deep learning models. This project builds upon such methodologies to classify in-car
sounds effectively.

2. Noise Removal: Facebook Demucs has set a benchmark in noise suppression. Compared to
traditional methods, Demucs offers superior performance in handling complex noise patterns.
Research by Défossez et al. (2020) highlights its ability to enhance speech quality in challenging
acoustic environments.

3. Emotion Detection: Emotion recognition using speech has been extensively studied. Recent
advancements, such as transformers and self-supervised learning, have improved accuracy in
recognizing subtle emotional cues. Studies on datasets like IEMOCAP and RAVDESS demonstrate
the potential of deep learning models in this domain. The integration of Google Voice Lite and
Asteroid ensures that this project leverages the latest advancements in speech processing.

Challenges in Audio Processing

1. Noise Variability: Automotive environments are subject to diverse noise sources, such as
traffic, engine vibrations, and passenger conversations. Handling such variability requires robust
preprocessing and noise suppression techniques.

2. Real-Time Processing: In-car systems demand low-latency solutions to ensure timely driver
assistance. This necessitates lightweight and efficient models that can operate on edge devices.

3. Emotion Detection in Noisy Environments: Identifying emotions from speech is challenging
when background noise or overlapping sounds are present. Advanced feature extraction and
separation techniques are crucial to address this issue.

Proposed Approach

The proposed driver monitoring system addresses these challenges through a multi-stage
pipeline:

1. Stage 1: CNN-based audio classification for robust sound categorization.

2. Stage 2: Integration of Facebook Demucs for noise removal, ensuring clear speech signals
for downstream tasks.

3. Stage 3: Emotion detection using Google Voice Lite and Asteroid, leveraging their strengths
in speech-to-text conversion and emotional analysis.

The system's modular design allows each stage to function independently, ensuring flexibility
and scalability. By combining pre-trained models with fine-tuning on domain-specific datasets, the
project achieves high accuracy and efficiency.
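One way to express this modular design is a thin orchestration function in which each stage is an independent callable. The stage functions below are hypothetical placeholders for the Stage 1-3 models described above.

```python
import numpy as np

def run_pipeline(audio: np.ndarray, sr: int,
                 classify_scene, denoise, detect_emotion) -> dict:
    """Chain the three stages; each stage is swappable, mirroring the
    modular design described above."""
    result = {"scene": classify_scene(audio, sr)}       # Stage 1: sound category
    if result["scene"] in ("male_voice", "female_voice"):
        clean = denoise(audio, sr)                      # Stage 2: noise removal
        result["emotion"] = detect_emotion(clean, sr)   # Stage 3: emotion
    return result
```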

Conclusion
The literature review highlights the potential of deep learning in audio-based driver monitoring
systems. By integrating state-of-the-art models like Facebook Demucs, Google Voice Lite, and
Asteroid, the proposed system overcomes challenges in noise removal and emotion detection. The
project builds upon existing research to create a scalable and real-time solution for enhancing driver
and passenger safety.

8. Problem Statement (this must be very specific) – 50 words

The increasing rate of vehicular accidents due to driver fatigue, distraction, and unmonitored
emotional states necessitates a reliable monitoring system. This project aims to classify in-car audio
signals to provide real-time insights into the driver's environment and emotional state, enhancing
road safety and reducing accidents.

9. Objectives

Objective 1: Classify in-car sounds into predefined categories (male, female, engine, traffic, mock sounds).
Objective 2: Detect human speech amidst music or environmental noise.
Objective 3: Analyze driver emotions through speech patterns for enhanced safety.

10. Methodology of the proposed solution

Data Collection: Use datasets like UrbanSound8K along with in-car audio recordings of the driver for training models.
Preprocessing: Audio data augmentation, noise reduction, and feature extraction using Librosa.
Model Development: Implement CNNs for sound classification, YOLO for object detection, and emotion detection models.
Deployment: Develop a real-time audio analysis pipeline using Python and TensorFlow (see the sketch below).
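A minimal sketch of such a real-time loop, assuming the third-party sounddevice package for microphone capture; the model and preprocess arguments stand in for the trained Keras classifier and feature extractor above, and the window length is an illustrative choice.

```python
import numpy as np
import sounddevice as sd

SR = 16000
WINDOW_S = 2.0   # analyze audio in 2-second windows (illustrative)

def analysis_loop(model, preprocess):
    """Capture microphone audio in fixed windows and run inference."""
    frames = int(SR * WINDOW_S)
    while True:
        audio = sd.rec(frames, samplerate=SR, channels=1, dtype="float32")
        sd.wait()                        # block until the window is recorded
        features = preprocess(audio[:, 0], SR)
        probs = model.predict(features[np.newaxis], verbose=0)[0]
        print("predicted class:", int(np.argmax(probs)))
```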
11. Preliminary Work Done (if any) and Project Execution Feasibility:

Preliminary Work Done:


• Implemented CNN-based classification of the UrbanSound8K dataset with 85% accuracy.
• Developed scripts for audio recording and preprocessing.
Execution Feasibility:
• The project utilizes open-source datasets and tools like Librosa, ensuring cost-effectiveness. It is
feasible with the current resources and computational power available.

12. Details of experimental set up:

Hardware: NVIDIA Jetson Nano for inference, an NVIDIA Tesla V100 GPU for training, and microphones for real-time audio capture.
Software: Python, TensorFlow/Keras, Librosa for audio preprocessing, and Flask for web-based
implementation.
Environment: Simulated car setup with pre-recorded and live audio inputs for testing.

13. Timeline of project execution:

Data collection and preprocessing: 1-6 months.
Model training and validation: 2-3 months.
Real-time system integration and testing: 1 month.
Final testing, optimization, and documentation: 1 month.

14. Ethical and Environmental considerations (if any)

Ethical: Ensures privacy of recorded audio data, adhering to data protection regulations.
Environmental: Uses energy-efficient hardware and software solutions to minimize the carbon footprint
during training and deployment.

15. Expenditure Planning (to be done by supervisor)

Sl. No.   Item                                     Amount (in Rupees)
1.        Non-Recurring: Equipment
2.        Recurring: Consumables and Components
3.        Contingency
          Grand Total

Justification for Equipment:

Justification for Consumables:

Justification for Contingency:

17. Expected Outcomes (publications, patents etc.):

Publications
• Research paper on multi-stage driver monitoring systems, highlighting innovations in audio
classification, noise removal, and emotion detection.
Patents
• Patent application for the complete system design and its unique architecture for real-time in-car
monitoring.
Products/Prototypes
• A functional prototype of a driver monitoring system integrating real-time audio processing and driver
state detection.
Collaborations
• Potential partnerships with automotive companies such as Stellantis for system deployment.

18. References

Supervisor(s) Head of Department(s)
