Title:
ELECTRONICS & TELECOMMUNICATION
2. Name of student: Mudit Mohan
Electronics & Telecommunication Engineering
USN: 1MS21ET032
4. Name of supervisor(s):
Dr. Viswanath Talasila
Department of Electronics & Telecommunication Engineering
5. Expertise of supervisor in the domain of the proposed seed grant: (3 to 4 lines, with details of
one recent publication in the domain)
The Driver Monitoring System aims to enhance vehicular safety by analyzing audio data to monitor
driver and passenger states. The system uses advanced deep learning models like CNN and YOLO
to classify in-car sounds into categories such as speech (male, female), engine noise, and traffic. In
subsequent stages, it detects speech amidst music and identifies speaker emotions, addressing
challenges like noise and overlapping sounds. Leveraging real-time data processing, the system can
predict driver distractions and emotional states, potentially reducing accident risks. Designed for
scalability, the system integrates with automotive environments to provide robust driver assistance
solutions.
The project focuses on developing a robust driver monitoring system leveraging in-car audio
signals. By analyzing sound patterns, the system classifies environmental and human-generated
sounds, such as engine status, traffic, and speech. This innovation addresses the increasing need for
real-time monitoring systems to ensure driver and passenger safety. The first stage involves
classifying audio into predefined categories. Subsequent stages involve detecting human speech in
noisy environments and analyzing emotional states using audio signals. The project employs
advanced machine learning techniques, including convolutional neural networks (CNNs) for feature
extraction and classification, ensuring high accuracy in real-time scenarios. The system's
applicability in commercial vehicles highlights its potential impact on reducing accidents caused by
driver distraction or fatigue.
Background (literature review) – 1000 words
Introduction
The integration of machine learning in automotive systems has led to significant advancements
in driver safety and monitoring solutions. With the ever-increasing need to monitor in-car
environments, sound-based classification has emerged as a promising technique for identifying and
understanding driver states and environmental factors. This literature review explores the
methodologies and tools used in the development of an audio-based driver monitoring system,
focusing on three distinct stages: sound classification, noise removal, and emotion detection.
The project utilizes publicly available datasets, such as UrbanSound8K, for training and
validation. These datasets provide a diverse range of labeled audio samples, enabling the model to
learn robust features for real-world applications. Key preprocessing steps include noise reduction,
data augmentation, and feature extraction using tools like Librosa. The extracted features, such as
Mel spectrograms and MFCCs, are fed into the CNN model for training.
The Facebook Demucs model is pre-trained on large-scale datasets and can handle diverse
noise types, including engine hum, road noise, and overlapping speech. It is particularly suitable for
this project as it enhances the clarity of human speech amidst noisy environments, which is essential
for subsequent stages. By integrating Demucs, the system achieves real-time noise suppression,
enabling accurate speech detection in dynamic automotive scenarios.
- Google Voice Lite: This lightweight version of Google's voice processing models provides
efficient speech-to-text conversion and basic emotion detection capabilities. It is designed for
low-latency applications, making it ideal for in-car systems where real-time processing is essential.
- Asteroid (Hugging Face): Asteroid is a deep learning-based toolkit for audio source separation
and enhancement. By leveraging pre-trained Asteroid models to isolate clean speech from
background audio, the system obtains higher-quality inputs for the emotion recognition stage,
supporting more reliable detection of nuanced emotional states.
Emotion detection models typically rely on features such as pitch, tone, and intensity extracted
from speech signals. These features are processed using recurrent neural networks (RNNs) or
transformers, which capture temporal dependencies and contextual information. The use of Google
Voice Lite and Asteroid ensures that the system remains scalable and adaptable to various acoustic
conditions.
Related Work
1. Sound Classification: CNN-based models trained on datasets such as UrbanSound8K have
become the standard approach for environmental sound classification, motivating the Stage 1
architecture adopted here.
2. Noise Removal: Facebook Demucs has set a benchmark in noise suppression. Compared to
traditional methods, Demucs offers superior performance in handling complex noise patterns.
Research by Défossez et al. (2020) highlights its ability to enhance speech quality in challenging
acoustic environments.
3. Emotion Detection: Emotion recognition using speech has been extensively studied. Recent
advancements, such as transformers and self-supervised learning, have improved accuracy in
recognizing subtle emotional cues. Studies on datasets like IEMOCAP and RAVDESS demonstrate
the potential of deep learning models in this domain. The integration of Google Voice Lite and
Asteroid ensures that this project leverages the latest advancements in speech processing.
Challenges
1. Noise Variability: Automotive environments are subject to diverse noise sources, such as
traffic, engine vibrations, and passenger conversations. Handling such variability requires robust
preprocessing and noise suppression techniques.
2. Real-Time Processing: In-car systems demand low-latency solutions to ensure timely driver
assistance. This necessitates lightweight and efficient models that can operate on edge devices.
Proposed Approach
The proposed driver monitoring system addresses these challenges through a multi-stage
pipeline:
1. Stage 1: Classification of in-car audio into predefined categories (speech, engine noise,
traffic) using a CNN trained on Mel-spectrogram features.
2. Stage 2: Integration of Facebook Demucs for noise removal, ensuring clear speech signals
for downstream tasks.
3. Stage 3: Emotion detection using Google Voice Lite and Asteroid, leveraging their strengths
in speech-to-text conversion and emotional analysis.
The system's modular design allows each stage to function independently, ensuring flexibility
and scalability. By combining pre-trained models with fine-tuning on domain-specific datasets, the
project achieves high accuracy and efficiency.
Conclusion
The literature review highlights the potential of deep learning in audio-based driver monitoring
systems. By integrating state-of-the-art models like Facebook Demucs, Google Voice Lite, and
Asteroid, the proposed system overcomes challenges in noise removal and emotion detection. The
project builds upon existing research to create a scalable and real-time solution for enhancing driver
and passenger safety.
The increasing rate of vehicular accidents due to driver fatigue, distraction, and unmonitored
emotional states necessitates a reliable monitoring system. This project aims to classify in-car audio
signals to provide real-time insights into the driver's environment and emotional state, enhancing
road safety and reducing accidents.
9. Objectives
Objective 1: Classify in-car sounds into predefined categories (male, female, engine, traffic, mock
sounds).
Objective 2: Detect human speech amidst music or environmental noise.
Objective 3: Analyze driver emotions through speech patterns for enhanced safety.
10. Methodology
Data Collection: Use datasets such as UrbanSound8K, along with in-car audio recorded from the
driver's vehicle, for training models.
Preprocessing: Audio data augmentation, noise reduction, and feature extraction using Librosa.
Model Development: Implement a CNN for sound classification (a minimal sketch follows this
list), YOLO for object detection, and emotion detection models.
Deployment: Develop a real-time audio analysis pipeline using Python and TensorFlow.
11. Preliminary Work Done (if any) and Project Execution Feasibility:
Hardware: NVIDIA Jetson Nano for on-device inference, a Tesla V100 GPU for model training, and
microphones for real-time audio capture.
Software: Python, TensorFlow/Keras, Librosa for audio preprocessing, and Flask for web-based
implementation.
Environment: Simulated car setup with pre-recorded and live audio inputs for testing.
Ethical: Ensures privacy of recorded audio data, adhering to data protection regulations.
Environmental: Uses energy-efficient hardware and software solutions to minimize the carbon footprint
during training and deployment.
15. Expenditure Planning (to be done by supervisor)

Sl. No. | Item                                  | Amount (In Rupees)
1.      | Non-Recurring: Equipment              |
2.      | Recurring: Consumables and Components |
3.      | Contingency                           |
        | Grand total                           |
Publications
• Research paper on multi-stage driver monitoring systems, highlighting innovations in audio
classification, noise removal, and emotion detection.
Patents
• Patent application for the complete system design and its unique architecture for real-time in-car
monitoring.
Products/Prototypes
• A functional prototype of a driver monitoring system integrating real-time audio processing and driver
state detection.
Collaborations
• Potential partnerships with automotive companies such as Stellantis for system deployment.
18. References
1. Défossez, A., Synnaeve, G., & Adi, Y. (2020). Real Time Speech Enhancement in the
Waveform Domain. Proc. Interspeech 2020.