Title:
ELECTRONICS & TELECOMMUNICATION
2. Name of student: Mudit Mohan
Electronics & Telecommunication Engineering
USN: 1MS21ET032
4. Name of supervisor(s):
Dr. Viswanath Talasila
Department of Electronics & Telecommunication Engineering
5. Expertise of supervisor in the domain of the proposed seed grant: (3 to 4 lines, with details of
one recent publication in the domain)
The Driver Monitoring System aims to enhance vehicular safety by analyzing audio data to monitor
driver and passenger states. The system uses advanced deep learning models like CNN and YOLO
to classify in-car sounds into categories such as speech (male, female), engine noise, and traffic. In
subsequent stages, it detects speech amidst music and identifies speaker emotions, addressing
challenges like noise and overlapping sounds. Leveraging real-time data processing, the system can
predict driver distractions and emotional states, potentially reducing accident risks. Designed for
scalability, the system integrates with automotive environments to provide robust driver assistance
solutions.
The project focuses on developing a robust driver monitoring system leveraging in-car audio
signals. By analyzing sound patterns, the system classifies environmental and human-generated
sounds, such as engine status, traffic, and speech. This innovation addresses the increasing need for
real-time monitoring systems to ensure driver and passenger safety. The first stage involves
classifying audio into predefined categories. Subsequent stages involve detecting human speech in
noisy environments and analyzing emotional states using audio signals. The project employs
advanced machine learning techniques, including convolutional neural networks (CNNs) for feature
extraction and classification, ensuring high accuracy in real-time scenarios. The system's
applicability in commercial vehicles highlights its potential impact on reducing accidents caused by
driver distraction or fatigue.
Background (literature review) – 1000 words
Introduction
The integration of machine learning in automotive systems has led to significant advancements
in driver safety and monitoring solutions. With the ever-increasing need to monitor in-car
environments, sound-based classification has emerged as a promising technique for identifying and
understanding driver states and environmental factors. This literature review explores the
methodologies and tools used in the development of an audio-based driver monitoring system,
focusing on three distinct stages: sound classification, noise removal, and emotion detection.
The project utilizes publicly available datasets, such as UrbanSound8K, for training and
validation. These datasets provide a diverse range of labeled audio samples, enabling the model to
learn robust features for real-world applications. Key preprocessing steps include noise reduction,
data augmentation, and feature extraction using tools like Librosa. The extracted features, such as
Mel spectrograms and MFCCs, are fed into the CNN model for training.
The Facebook Demucs model is pre-trained on large-scale datasets and can handle diverse
noise types, including engine hum, road noise, and overlapping speech. It is particularly suitable for
this project as it enhances the clarity of human speech amidst noisy environments, which is essential
for subsequent stages. By integrating Demucs, the system achieves real-time noise suppression,
enabling accurate speech detection in dynamic automotive scenarios.
- Google Voice Lite: This lightweight version of Google's voice processing models provides
efficient speech-to-text conversion and basic emotion detection capabilities. It is designed for
low-latency applications, making it ideal for in-car systems where real-time processing is essential.
- Asteroid (Hugging Face): Asteroid is a deep learning-based toolkit for audio source separation
and enhancement. By leveraging pre-trained Asteroid models to isolate clean speech from
background audio, the system obtains higher-quality inputs for the emotion recognition stage,
supporting more reliable detection of nuanced emotional states.
Emotion detection models typically rely on features such as pitch, tone, and intensity extracted
from speech signals. These features are processed using recurrent neural networks (RNNs) or
transformers, which capture temporal dependencies and contextual information. The use of Google
Voice Lite and Asteroid ensures that the system remains scalable and adaptable to various acoustic
conditions.
Related Work
1. Sound Classification: CNN-based models trained on datasets such as UrbanSound8K have
become the standard approach for environmental sound classification, motivating the Stage 1
architecture adopted here.
2. Noise Removal: Facebook Demucs has set a benchmark in noise suppression. Compared to
traditional methods, Demucs offers superior performance in handling complex noise patterns.
Research by Défossez et al. (2020) highlights its ability to enhance speech quality in challenging
acoustic environments.
3. Emotion Detection: Emotion recognition using speech has been extensively studied. Recent
advancements, such as transformers and self-supervised learning, have improved accuracy in
recognizing subtle emotional cues. Studies on datasets like IEMOCAP and RAVDESS demonstrate
the potential of deep learning models in this domain. The integration of Google Voice Lite and
Asteroid ensures that this project leverages the latest advancements in speech processing.
Challenges
1. Noise Variability: Automotive environments are subject to diverse noise sources, such as
traffic, engine vibrations, and passenger conversations. Handling such variability requires robust
preprocessing and noise suppression techniques.
2. Real-Time Processing: In-car systems demand low-latency solutions to ensure timely driver
assistance. This necessitates lightweight and efficient models that can operate on edge devices.
Proposed Approach
The proposed driver monitoring system addresses these challenges through a multi-stage
pipeline:
1. Stage 1: Classification of in-car audio into predefined categories (speech, engine noise,
traffic) using a CNN trained on Mel-spectrogram features.
2. Stage 2: Integration of Facebook Demucs for noise removal, ensuring clear speech signals
for downstream tasks.
3. Stage 3: Emotion detection using Google Voice Lite and Asteroid, leveraging their strengths
in speech-to-text conversion and emotional analysis.
The system's modular design allows each stage to function independently, ensuring flexibility
and scalability. By combining pre-trained models with fine-tuning on domain-specific datasets, the
project achieves high accuracy and efficiency.
Conclusion
The literature review highlights the potential of deep learning in audio-based driver monitoring
systems. By integrating state-of-the-art models like Facebook Demucs, Google Voice Lite, and
Asteroid, the proposed system overcomes challenges in noise removal and emotion detection. The
project builds upon existing research to create a scalable and real-time solution for enhancing driver
and passenger safety.
The increasing rate of vehicular accidents due to driver fatigue, distraction, and unmonitored
emotional states necessitates a reliable monitoring system. This project aims to classify in-car audio
signals to provide real-time insights into the driver's environment and emotional state, enhancing
road safety and reducing accidents.
9. Objectives
Objective 1: Classify in-car sounds into predefined categories (male, female, engine, traffic, mock
sounds).
Objective 2: Detect human speech amidst music or environmental noise.
Objective 3: Analyze driver emotions through speech patterns for enhanced safety.
10. Methodology
Data Collection: Use datasets such as UrbanSound8K, along with in-car audio recorded from the
driver's vehicle, for training models.
Preprocessing: Audio data augmentation, noise reduction, and feature extraction using Librosa.
Model Development: Implement a CNN for sound classification (a minimal sketch follows this
list), YOLO for object detection, and emotion detection models.
Deployment: Develop a real-time audio analysis pipeline using Python and TensorFlow.
11. Preliminary Work Done (if any) and Project Execution Feasibility:
Hardware: NVIDIA Jetson Nano for on-device inference, a Tesla V100 GPU for model training, and
microphones for real-time audio capture.
Software: Python, TensorFlow/Keras, Librosa for audio preprocessing, and Flask for web-based
implementation.
Environment: Simulated car setup with pre-recorded and live audio inputs for testing.
Ethical: Ensures privacy of recorded audio data, adhering to data protection regulations.
Environmental: Uses energy-efficient hardware and software solutions to minimize the carbon footprint
during training and deployment.
15. Expenditure Planning (to be done by supervisor)

Sl. No. | Item                                  | Amount (In Rupees)
1.      | Non-Recurring: Equipment              |
2.      | Recurring: Consumables and Components |
3.      | Contingency                           |
        | Grand total                           |
Publications
• Research paper on multi-stage driver monitoring systems, highlighting innovations in audio
classification, noise removal, and emotion detection.
Patents
• Patent application for the complete system design and its unique architecture for real-time in-car
monitoring.
Products/Prototypes
• A functional prototype of a driver monitoring system integrating real-time audio processing and driver
state detection.
Collaborations
• Potential partnerships with automotive companies such as Stellantis for system deployment.
18. References
1. Défossez, A., Synnaeve, G., & Adi, Y. (2020). Real Time Speech Enhancement in the
Waveform Domain. Proc. Interspeech 2020.