
Mini Project

The document is a mini project report on a baby monitor with automated response, designed by Dhinesh Kumar A and Monish Kumar H as part of their Bachelor of Technology in Information Technology. The proposed system utilizes speech recognition software to detect a baby's cries and automatically respond with soothing sounds, while also alerting parents via SMS. The report outlines the objectives, existing systems, and the proposed system's architecture and features, emphasizing enhanced safety and peace of mind for parents.


A BABY MONITOR WITH AUTOMATED

RESPONSE

A MINI PROJECT REPORT

Submitted by

DHINESH KUMAR A (411720205010)

MONISH KUMAR H (411720205020)

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY

IN

INFORMATION TECHNOLOGY

PRINCE SHRI VENKATESHWARA PADMAVATHY ENGINEERING

COLLEGE [AN AUTONOMOUS INSTITUTION], CHENNAI - 600127

ANNA UNIVERSITY: CHENNAI 600 025

MAY 2023
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this mini project report “BABY MONITOR WITH
AUTOMATED RESPONSE” is the bonafide work of “DHINESH KUMAR
A (411720205010), MONISH KUMAR H (411720205020)”
who carried out the mini project work under my supervision.

SIGNATURE
Dr. P. INDIRA PRIYA, M.E., Ph.D.,
HEAD OF THE DEPARTMENT
Department of Information Technology
Prince Shri Venkateshwara Padmavathy
Engineering College, Chennai – 600 127

SIGNATURE
Dr. P. INDIRA PRIYA, M.E., Ph.D.,
COORDINATOR
PROFESSOR
Department of Information Technology
Prince Shri Venkateshwara Padmavathy
Engineering College, Chennai – 600 127

Submitted for Viva-Voce on ………………

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

First and foremost, we would like to express our sincere gratitude to the
almighty for being our light and for his gracious showers of blessing throughout the
course of this project.

We would like to express our sincere thanks to our Founder and Chairman,
Dr. K. Vasudevan, M.A., B.Ed., Ph.D., for his endeavor in educating us in his
premier institution.

We are grateful to our Vice Chairman, Dr.V.Vishnu Karthik, M.D., for his
keen interest in our studies and the facilities offered in the premier institution.

We would like to express our appreciation and gratitude to our Dean
Academics, Dr. V. Mahalakshmi, M.E., Ph.D., for her great support,
encouragement, and moral support given to us during this course.

We are highly indebted to our Principal, Dr. G. Indira, M.E., Ph.D., for her
valuable support, encouragement, and guidance, which have promoted our efforts
throughout this course.

We also wish to convey our sincere thanks and regards to our beloved Head
of the Department cum Project coordinator, Dr. P. Indira Priya, M.E., Ph.D., for
her support and for providing us with ample time to complete our project.

We are also thankful to all faculty members and non-teaching staff of all
Departments for their support. Finally, we are grateful to our family and friends for
their help, encouragement, and moral support given to us during our project work.

ABSTRACT

This project designs a device for monitoring a baby. When the baby cries, the device recognizes
the sound, sends a short message to the parents, and responds automatically by playing a stored
song or a stored recording of the mother's voice. Soothing the baby in this way helps the crying
stop and gives the parents some additional time. The system uses speech recognition software, the
same kind of technique used in voice assistants such as Amazon's Alexa, and this software works
with the help of natural language processing (NLP). A microphone acts as the tool that collects and
transmits the sound from the environment, and the speech recognition software identifies the
recorded sound with the help of NLP, which is used to identify speech in any language in its
natural form. The system also sends an alert message to the parents' mobile phone through the
short message service (SMS).

TABLE OF CONTENTS

CHAPTER NO TITLE

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1. INTRODUCTION
1.1 Problem Statement
1.1.1 General
1.2 Objective
1.3 Existing System
1.4 Proposed System

2. LITERATURE SURVEY
2.1 Classification Algorithm
2.2 Detection Algorithm
2.3 Segmentation Algorithm

3. SYSTEM REQUIREMENTS
3.1 Hardware System Configuration
3.2 Software System Configuration

4. SYSTEM ANALYSIS
4.1 Existing System
4.1.1 Limitations of Existing System
4.2 Proposed System
4.2.1 System Architecture
4.2.2 Block Diagram
4.3 Module 1 Description
4.4 Module 2 Description
4.5 List of items in COCO Dataset
4.6 Neural Network
4.7 Comparisons on various detection algorithms
4.9 Performance Evaluation

5. SYSTEM DESIGN
5.1 System Framework

6. SYSTEM IMPLEMENTATION
6.1 Module
6.2 Tools
6.2.1 Microphone
6.2.2 ASR
6.2.3 NLP
6.2.4 TTS
6.2.5 Python
6.2.6 JavaScript
6.2.7 C++
6.2.8 Swift

7. CONCLUSION AND FUTURE WORK
7.1 Conclusion
7.2 Future Work

APPENDIX 1
A1.1 SAMPLE CODING
A1.2 OUTPUT SCREENSHOTS

REFERENCES

LIST OF TABLES

S.NO TABLE NAME

4.7 Comparison of Various Baby Monitoring Systems

LIST OF FIGURES

FIGURE NO FIGURE NAME

4.1 System Architecture for baby monitor
5.1 UML Diagram
5.2 UML use case diagram
5.3 Framework

LIST OF ABBREVIATIONS

S.NO NAME EXPANSION

1. ASR AUTOMATIC SPEECH RECOGNITION

2. NLP NATURAL LANGUAGE PROCESSING

3. TTS TEXT-TO-SPEECH

4. CNN CONVOLUTIONAL NEURAL NETWORK

5. DSP DIGITAL SIGNAL PROCESSING

6. GMM GAUSSIAN MIXTURE MODEL

7. ADC ANALOG TO DIGITAL CONVERTER

8. MFCC MEL-FREQUENCY CEPSTRAL COEFFICIENTS

CHAPTER 1
INTRODUCTION

In today's world of smart technology, almost everything is being built with technology, and as
technology advances day by day, the need for technical systems grows with it. This report
presents a user-friendly baby monitor, a device that allows parents or caregivers to keep an ear
and an eye on their baby from another room or from any other place. With advancements in
technology, baby monitors have become more sophisticated, with added features such as
automatic responses. Such monitors can detect sounds or movements, such as a baby crying or
stirring, and respond with pre-recorded messages or music to soothe the baby. They can also
alert parents if the baby stops moving or if the temperature becomes too cold or too hot.

This type of baby monitor gives parents peace of mind, allowing them to rest easy knowing that
their baby is monitored and cared for even when they're not in the same room. With the
automatic response feature, parents can ensure that their baby is comforted and soothed without
having to be physically present.

1.1 PROBLEM STATEMENT

The problem that automatic response baby monitors aim to solve is the need for
parents to monitor their baby's well-being while also attending to other responsibilities.
With busy schedules and a need to be productive while their baby sleeps, parents can find it
challenging to keep a constant eye and ear on their child.

Furthermore, parents may not always be able to immediately attend to their baby’s
needs, such as when they are in the shower, cooking, or doing any other household chores. This
can lead to stress and anxiety for parents who worry about their baby’s safety and well-being.

1.1.1 GENERAL

A baby monitor with automatic response is an advanced monitoring device designed to
enhance the safety and well-being of infants. The monitor provides real-time audio and/or video
monitoring of a baby's activities while they are sleeping or resting in another room,
and alerts the caregiver or parent if the baby shows any signs of distress, such as crying or
unusual movements.

The automatic response feature of this baby monitor is particularly beneficial for
parents who may not be able to constantly monitor their baby due to work, household chores,
or other responsibilities. The monitor is designed to automatically detect any signs of distress
and alert the caregiver via an audio alarm, flashing lights, or notifications on their smartphone
or other connected device.

Advanced baby monitors with automatic response may also provide information
about the baby's breathing, temperature, and other vital signs. For example, some baby monitors
use sensors that detect the baby's breathing patterns and alert the caregiver or parent if the baby
stops breathing for a period. Others may have a thermometer to monitor the baby's temperature
and notify the caregiver if it falls outside of a safe range.

The primary objective of a baby monitor with automatic response is to enhance
the safety and security of infants and give parents peace of mind. By providing real-time
monitoring and alerts, the monitor allows parents to respond promptly to any potential issues,
reducing the risk of injury or harm to the infant. Additionally, the automatic response feature
provides an extra layer of protection, particularly during times when the caregiver may not be
able to monitor the baby closely.

1.2 OBJECTIVE
A baby monitor with automatic response is a device that aims to provide parents or
caregivers with advanced monitoring capabilities to ensure the safety and well-being of
their infants. The objective of such a monitor is to provide real-time audio and/or video
monitoring of a baby's activities while they are sleeping or resting in another room.
The automatic response feature of this baby monitor is particularly beneficial for parents
who may not be able to constantly monitor their baby due to work, household chores, or other
responsibilities. The monitor is designed to alert the caregiver if the baby shows any signs
of distress, such as crying or unusual movements. The alert may be in the form of audio
alarms, flashing lights, or notifications to the caregiver's mobile phone or other connected
device.
In addition to audio and video monitoring, advanced baby monitors with automatic response
may also provide information about the baby's breathing, temperature, and other vital signs.
For example, some monitors have a sensor that detects the baby's breathing patterns and
alerts the caregiver if the baby stops breathing for an extended period. Using the speech
recognition software, the monitor also notifies the caregiver if the baby cries for a long
period of time.
Others may have a thermometer to monitor the baby's temperature and notify the caregiver if it
falls outside of a safe range.

The primary objective of a baby monitor with automatic response is to enhance the safety
and security of infants and give parents peace of mind. By providing real-time monitoring
and alerts, the monitor allows parents to respond promptly to any potential issues, reducing
the risk of injury or harm to the baby.
Additionally, the automatic response feature provides an extra layer of protection,
particularly during times when the caregiver may not be able to monitor the baby closely.

Overall, the objective of a baby monitor with automatic response is to provide parents with
advanced monitoring capabilities and peace of mind, knowing that their baby is safe and
secure even when they are not in the same room or in the same house.

1.3 EXISTING SYSTEM


There are several baby monitors with automatic response currently available in the market.
Here are some examples:

Nanit Plus Smart Baby Monitor: This monitor uses computer vision technology to track
and analyze your baby's sleep patterns, and it comes with a built-in white noise machine
that can automatically turn on if your baby wakes up.

Arlo Baby Monitor: This monitor can detect when your baby is crying or in distress, and it
can play lullabies or white noise to soothe them. It also has air sensors that can detect
changes in temperature, humidity, and air quality, and it can send alerts to your phone if
there are any issues.

Motorola Halo+ Video Baby Monitor: This monitor has a built-in projector that can display
a calming image on the ceiling to help your baby fall asleep. It also has a two-way audio
feature that allows you to talk to your baby from another room.

Owlet Smart Sock Baby Monitor: This monitor comes with a wearable sock that tracks your
baby's heart rate and oxygen levels. If there are any irregularities, it can sound an alarm or
send an alert to your phone.

iBaby Monitor M7: This monitor has a built-in moonlight soother that can project a calming
image on the ceiling to help your baby fall asleep. It also has a sound and motion detector that
can send alerts to your phone if there is any movement or noise in the room.

These are just a few examples of the many baby monitors with automatic response that are
available in the market today. It's important to do your research and choose a device that fits
your needs and budget.

1.4 PROPOSED SYSTEM


A baby monitor with automatic response using automatic speech recognition (ASR) can be a
helpful system for parents to ensure the safety and well-being of their infants. Here is a proposed
system that can achieve this:

Hardware Components: The system will require a microphone, speaker, and a microcontroller or
a computer to process the audio input and trigger the response. The system can be either wired or
wireless, with the latter using a Wi-Fi or Bluetooth module for connectivity.

Speech Recognition Software: The system should use robust ASR software that can accurately
detect and transcribe speech in real time. Various open-source and commercial ASR packages
are available, such as Google Speech-to-Text, Amazon Transcribe, or Mozilla DeepSpeech.

Response Options: The system can be programmed to respond to specific phrases or sounds made
by the baby. For example, if the baby is crying or making distressed sounds, the system can play
a pre-recorded lullaby or a soothing voice message to calm the baby. If the baby is calling out for
the parents, the system can alert them through a notification on their smartphone or via a pre-
recorded message.

Machine Learning: The ASR system can be improved over time by using machine learning
algorithms. The system can learn to recognize the baby's unique voice patterns and distinguish
them from other sounds in the environment, such as background noise or household appliances.

Privacy and Security: The system should be designed to ensure the privacy and security of the
data collected. The audio input can be processed locally on the device rather than being sent to a
cloud server. The system can also use encryption and authentication protocols to prevent
unauthorized access.

User Interface: The system should have a user-friendly interface that allows parents to customize
the response options and adjust the sensitivity of the ASR system. The system can also provide
real-time feedback on the baby's sounds and activities, such as sleep patterns or feeding times.

In conclusion, a baby monitor with automatic response using automatic speech recognition can
provide a valuable tool for parents to monitor and respond to their baby's needs. The system can
be customized to suit individual preferences and can be continuously improved using machine
learning algorithms. However, it is essential to ensure the privacy and security of the data
collected and to provide a user-friendly interface for ease of use.
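
The Response Options item above can be illustrated with a small sketch. It is only one possible arrangement, not the report's implementation: the labels ("crying", "calling"), the action names, and the `ResponsePolicy` and `dispatch` helpers are all hypothetical, and the handlers stand in for real audio playback and SMS sending.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def _default_actions() -> Dict[str, List[str]]:
    # Hypothetical labels and action names, chosen only for illustration.
    return {
        "crying": ["play_lullaby", "notify_parent"],
        "calling": ["notify_parent"],
    }


@dataclass
class ResponsePolicy:
    """Maps a detected sound label to the actions the monitor should take."""
    actions: Dict[str, List[str]] = field(default_factory=_default_actions)

    def actions_for(self, label: str) -> List[str]:
        # Unknown labels (for example background noise) trigger nothing.
        return list(self.actions.get(label, []))


def dispatch(label: str, policy: ResponsePolicy,
             handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the handler for each configured action; return the ones that ran."""
    performed = []
    for action in policy.actions_for(label):
        handler = handlers.get(action)
        if handler is not None:
            handler()
            performed.append(action)
    return performed


if __name__ == "__main__":
    log: List[str] = []
    handlers = {
        "play_lullaby": lambda: log.append("playing stored lullaby"),
        "notify_parent": lambda: log.append("sending SMS alert"),
    }
    print(dispatch("crying", ResponsePolicy(), handlers))
    print(log)
```

Keeping the label-to-action table separate from the handlers lets parents customize responses without touching the playback or messaging code.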

CHAPTER 2
LITERATURE SURVEY

[1] T. Fuhr, H. Reetz and C. Wegener, “Comparison of supervised-learning models for infant cry
classification/Vergleich von Klassifikationsmodellen zur Säuglingsschreianalyse,” International
Journal of Health Professions.
This paper compares supervised learning models for infant cry classification. It classifies the
cry of an infant and gives an alert to the person responsible for the infant.

[2] G. V. I. S. Silva and D. S. Wickremasinghe, “Infant cry detection system with automatic
soothing and video monitoring functions,” Journal of Engineering and Technology of the Open
University of Sri Lanka.
This paper studies the automatic detection of an infant's cry. The system reacts with an alert
SMS and also includes video monitoring features to monitor the infant.

[3] D. Ravichandran, P. Praveenkumar, S. Rajagopalan, J. B. B. Rayappan and R. Amirtharajan,
“ROI-based medical image watermarking for accurate tamper detection, localisation and
recovery,” Medical & Biological Engineering & Computing.
This paper studies accurate tamper detection in medical images and also covers the localisation
and recovery of tampered regions.

[4] S. M. Ludington-Hoe, X. Cong and F. Hashemi, “Infant crying: Nature, physiologic
consequences, and select interventions,” Neonatal Network.
This paper studies the nature of infant crying. It examines whether an infant's cry is natural
or has physiologic consequences, which can then be indicated to the parents.

[5] E. Rayachoti and S. R. Edara, “Robust medical image watermarking technique for accurate
detection of tampers inside region of interest and recovering original region of interest,” IET
Image Processing.
This paper studies a robust medical image watermarking technique for accurately detecting
tampering inside a region of interest and recovering the original region. Applied to this
project, the region of interest corresponds to the baby's sleeping or living area; if any abnormal
condition is found there, an alert is given to the parents.

2.1 CLASSIFICATION ALGORITHM


To create a classification algorithm for a baby monitor with automatic response using
automatic speech recognition, the following steps can be taken:

Collect and prepare the dataset: Gather a dataset of audio recordings of babies crying, cooing,
and making other sounds. Label each recording with the appropriate category, such as "crying,"
"happy," "hungry," etc. Clean the dataset by removing any irrelevant or noisy data.

Extract features: Use a feature extraction technique, such as Mel-frequency cepstral coefficients
(MFCCs), to extract meaningful features from the audio recordings. This will help to represent
the audio data in a way that can be used by the classification algorithm.

Train the classification model: Use a machine learning algorithm, such as a convolutional neural
network (CNN) or a support vector machine (SVM), to train the classification model. Use the
labeled dataset to train the model to recognize the different categories of baby sounds.

Integrate automatic speech recognition: Use an automatic speech recognition (ASR) system to
convert the audio input from the baby monitor into text. This can be done using a pre-trained
ASR model, such as Google Speech API or Amazon Transcribe.

Use the classification model to make automatic responses: Once the audio input has been
converted to text, use the classification model to determine the appropriate response based on
the category of sound. For example, if the baby is crying, the response could be to play a lullaby
or to send a notification to the parent's phone.

Test and refine the system: Test the system with new audio recordings to evaluate its
performance and refine the model as necessary to improve its accuracy.

Overall, this approach combines machine learning and ASR to create a baby monitor with
automatic response that can help parents respond to their baby's needs more quickly and
efficiently.
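
The feature extraction and training steps above can be sketched in a few lines. This is a deliberately simplified stand-in, not the method described: it uses RMS energy and zero-crossing rate in place of MFCCs, and a nearest-centroid rule in place of a CNN or SVM, so that it runs with the standard library alone. The function names, the synthetic `tone` helper, and the labels "crying" and "quiet" are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]


def extract_features(samples: Sequence[float]) -> Feature:
    """Return (RMS energy, zero-crossing rate) for one audio clip."""
    n = len(samples)
    if n == 0:
        return (0.0, 0.0)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return (rms, crossings / max(n - 1, 1))


def train_centroids(labelled: Dict[str, List[Sequence[float]]]) -> Dict[str, Feature]:
    """Average the feature vectors of each labelled category."""
    centroids = {}
    for label, clips in labelled.items():
        feats = [extract_features(clip) for clip in clips]
        centroids[label] = (sum(f[0] for f in feats) / len(feats),
                            sum(f[1] for f in feats) / len(feats))
    return centroids


def classify(samples: Sequence[float], centroids: Dict[str, Feature]) -> str:
    """Pick the category whose centroid is closest to the clip's features."""
    feat = extract_features(samples)
    return min(centroids, key=lambda label: math.dist(feat, centroids[label]))


def tone(freq_hz: float, amplitude: float,
         seconds: float = 0.1, rate: int = 8000) -> List[float]:
    """Synthetic sine wave used here in place of real recordings."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(int(seconds * rate))]


if __name__ == "__main__":
    model = train_centroids({
        "crying": [tone(450, 0.8), tone(500, 0.9)],   # loud, higher pitched
        "quiet": [tone(100, 0.05), tone(120, 0.04)],  # faint, low pitched
    })
    print(classify(tone(480, 0.85), model))
```

A real system would replace the two hand-picked features with MFCC vectors and would train on labelled recordings of babies rather than synthetic tones.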

2.2 DETECTION ALGORITHM


To develop a detection algorithm for a baby monitor with automatic response using automatic
speech recognition, you can follow the steps below:

Determine the types of sounds that need to be detected: The first step is to determine the types of
sounds that are relevant for a baby monitor. These may include crying, laughing, talking, and
other noises.

Collect a dataset of sounds: Once you have identified the types of sounds, you can collect a dataset
of these sounds to use for training your algorithm. You can record these sounds using a
microphone and label them according to their type.

Train a machine learning model: You can use machine learning algorithms to train a model that
can recognize the different sounds. One way to do this is to use a neural network with automatic
speech recognition capabilities. You can use the labeled dataset to train the network to recognize
the different sounds.

Implement the algorithm: Once the model is trained, you can implement the algorithm in the baby
monitor. The monitor can listen for sounds and pass them through the model to identify the type
of sound.

Take automatic response actions: After the sound is identified, you can program the baby monitor
to take automatic response actions. For example, if the monitor detects crying, it can play soothing
music or turn on a night light. If it detects talking or laughing, it can record and save the audio for
later playback.

Test and refine the algorithm: Finally, you should test and refine the algorithm to ensure that it
works accurately and reliably. You can do this by collecting more data and retraining the model,
or by adjusting the parameters of the algorithm to improve its performance.
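
One way to make the "implement" and "take automatic response actions" steps more reliable is to act only when the model reports crying for several consecutive frames, so that a single misclassified frame does not trigger a response. The sketch below assumes the model emits one label per audio frame; the class name `SustainedCryDetector` and the default of three frames are illustrative choices, not values from this report.

```python
from dataclasses import dataclass, field


@dataclass
class SustainedCryDetector:
    """Fires once when the target label has been seen min_frames times in a row."""
    min_frames: int = 3            # assumed value; tune on real recordings
    target_label: str = "crying"   # hypothetical label name
    _run: int = field(default=0, init=False, repr=False)

    def update(self, label: str) -> bool:
        # Extend the current run of target labels, or reset it.
        if label == self.target_label:
            self._run += 1
        else:
            self._run = 0
        # True only at the moment the run reaches the threshold, so one
        # crying episode produces one trigger rather than one per frame.
        return self._run == self.min_frames


if __name__ == "__main__":
    detector = SustainedCryDetector(min_frames=3)
    for label in ["crying", "noise", "crying", "crying", "crying"]:
        if detector.update(label):
            print("sustained crying detected: start response")
```

Raising `min_frames` reduces false alarms at the cost of a slower reaction, which is one of the parameters that the test-and-refine step can adjust.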

2.3 SEGMENTATION ALGORITHM


A possible segmentation algorithm for a baby monitor with automatic response using
automatic speech recognition (ASR) could be as follows:

Audio input: The algorithm receives the audio input from the baby monitor's microphone.

Pre-processing: The audio signal is pre-processed to remove noise and normalize the volume
level.

Speech detection: The algorithm detects whether there is speech in the audio signal or not. This
can be done using a speech detection algorithm such as VAD (voice activity detection) that can
detect the presence of speech in an audio signal.

Speech recognition: If speech is detected, the algorithm performs automatic speech recognition
(ASR) on the audio signal to convert it into text. This can be done using a speech recognition
engine such as Google Speech API, Amazon Transcribe, or any other suitable ASR engine.

Segmentation: The text output from the ASR engine is segmented into individual words or phrases
using a natural language processing (NLP) technique such as part-of-speech tagging or named
entity recognition. The segments can be further classified into categories such as "crying,"
"babbling," "talking," etc.

Response generation: Based on the detected segments, the algorithm generates an appropriate
response. For example, if the segment is classified as "crying," the algorithm may play a lullaby
or a soothing sound. If the segment is classified as "talking," the algorithm may play a pre-
recorded message or notify the parent.

Output: The response generated by the algorithm is sent to the parent's device or the baby
monitor's speaker.

This segmentation algorithm can be improved by incorporating machine learning techniques such
as deep learning or reinforcement learning to enhance the accuracy and responsiveness of
the system.
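
The pre-processing and speech detection steps can be sketched as follows. This is a minimal energy-based stand-in for a real voice activity detector, assuming audio samples are floats in the range -1 to 1; the function names, frame length, and threshold are illustrative assumptions. It normalizes the volume, computes the energy of fixed-length frames, and returns the frame ranges where sound is present, which are the pieces that would then be passed on for recognition.

```python
from typing import List, Sequence, Tuple


def normalize(samples: Sequence[float]) -> List[float]:
    """Scale the clip so that its loudest sample has magnitude 1."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def segment_active(energies: Sequence[float],
                   threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of runs at or above the threshold.

    The end index is exclusive, as in Python slices.
    """
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i                     # a new active run begins
        elif energy < threshold and start is not None:
            segments.append((start, i))   # the run ended on the previous frame
            start = None
    if start is not None:                 # clip ended while still active
        segments.append((start, len(energies)))
    return segments


if __name__ == "__main__":
    clip = [0.0] * 4 + [1.0] * 8 + [0.0] * 4 + [0.5] * 4
    energies = frame_energies(normalize(clip), frame_len=4)
    print(segment_active(energies, threshold=0.1))
```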

CHAPTER 3
SYSTEM REQUIREMENTS
3.1 HARDWARE SYSTEM CONFIGURATION
To build a baby monitor with automatic response using automatic speech recognition, you would
need the following hardware system configuration:

Microphone: A high-quality microphone is needed to capture the baby's sounds clearly. The
microphone should be sensitive enough to detect even soft sounds.

Speaker: A speaker is required to play the automated responses to the baby. The speaker should
be of good quality and loud enough to be heard clearly from a distance.

Processor: A powerful processor is required to run the automatic speech recognition software.
The processor should be able to process audio data quickly and accurately.

Memory: Sufficient memory is necessary to store the audio data that is captured by the
microphone and processed by the speech recognition software.

Wireless connectivity: A wireless connection, such as Wi-Fi or Bluetooth, is required to transmit
the audio data to a remote device, such as a smartphone or tablet.

Power source: The baby monitor will need a power source, such as a rechargeable battery or a
power adapter.

Automatic speech recognition software: High-quality speech recognition software is necessary
to recognize the baby's sounds and trigger the appropriate response.

Overall, the hardware system configuration should be designed with the aim of creating a user-
friendly and reliable baby monitor that provides real-time audio monitoring, automatic speech
recognition, and automatic response.
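
The memory requirement above can be estimated with simple arithmetic: uncompressed audio needs sample rate × bytes per sample × channels × seconds. The sketch below works this out; the figures of 16 kHz, 16-bit, mono audio are assumptions chosen for illustration and are not specified in this report.

```python
def pcm_buffer_bytes(sample_rate_hz: int, bits_per_sample: int,
                     channels: int, seconds: float) -> int:
    """Bytes needed to hold uncompressed PCM audio of the given length."""
    bytes_per_sample = bits_per_sample // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


if __name__ == "__main__":
    # Assumed format: 16 kHz, 16-bit, single microphone.
    ten_seconds = pcm_buffer_bytes(16000, 16, 1, 10)
    one_hour = pcm_buffer_bytes(16000, 16, 1, 3600)
    print(ten_seconds)   # 16000 * 2 * 1 * 10 = 320000 bytes
    print(one_hour)      # 16000 * 2 * 1 * 3600 = 115200000 bytes
```

Under these assumptions a ten-second analysis window needs about 320 KB, while storing a full hour of raw audio needs about 115 MB, which suggests keeping only a short rolling buffer on a small device.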

3.2 SOFTWARE SYSTEM CONFIGURATION


To configure a software system for a baby monitor with automatic response using automatic
speech recognition, you would need to consider several components and their interactions. Here
are some of the key components and their requirements:

Microphone: The microphone should be able to capture audio from the baby's room clearly and
with minimal noise. A directional microphone may be helpful to reduce background noise and
increase sensitivity to the baby's voice.

Speech recognition software: The speech recognition software should be able to accurately
recognize the baby's voice and distinguish it from other sounds in the room. The software should
also be able to process the audio input in real-time to provide a prompt response.

Response system: The response system should be able to generate appropriate responses based
on the baby's needs. For example, if the baby is crying, the system may respond with soothing
music or a recording of the parent's voice. If the baby is talking, the system may respond with a
pre-recorded message or a live audio feed from the parent.

Control system: The control system should be able to manage the interactions between the
microphone, speech recognition software, and response system. This may involve setting
thresholds for audio input, managing system resources, and monitoring system performance.

To integrate these components into a cohesive system, you may need to develop or adapt existing
software libraries or frameworks. You may also need to test the system extensively to ensure it is
accurate, reliable, and responsive. Additionally, you may want to consider implementing security
features to protect the system and its users' privacy.
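
The control system described above can be sketched as a small class that sits between the microphone, the recognizer, and the response system. It is a sketch under stated assumptions: `MonitorController`, its `min_level` threshold, and the two callables passed in are hypothetical names, and audio frames are assumed to be sequences of floats between -1 and 1.

```python
from typing import Callable, Optional, Sequence


class MonitorController:
    """Routes audio frames from the microphone to the recognizer and responder."""

    def __init__(self,
                 classify: Callable[[Sequence[float]], str],
                 respond: Callable[[str], None],
                 min_level: float = 0.05) -> None:
        self.classify = classify      # stands in for the recognition software
        self.respond = respond        # stands in for the response system
        self.min_level = min_level    # input threshold (assumed value)
        self.frames_seen = 0          # simple counters for monitoring
        self.frames_classified = 0    # how much work the recognizer does

    def process(self, frame: Sequence[float]) -> Optional[str]:
        """Handle one frame; return its label, or None if it was too quiet."""
        self.frames_seen += 1
        level = max((abs(s) for s in frame), default=0.0)
        if level < self.min_level:
            # Skip near-silent frames to save processing resources.
            return None
        self.frames_classified += 1
        label = self.classify(frame)
        self.respond(label)
        return label


if __name__ == "__main__":
    controller = MonitorController(
        classify=lambda frame: "crying",          # placeholder recognizer
        respond=lambda label: print("respond to", label),
        min_level=0.1,
    )
    controller.process([0.01, -0.02])   # below threshold, ignored
    controller.process([0.5, -0.4])     # loud enough, classified
    print(controller.frames_seen, controller.frames_classified)
```

Passing the recognizer and responder in as callables keeps the controller testable without real audio hardware.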

CHAPTER 4
SYSTEM ANALYSIS
4.1 EXISTING SYSTEM
There are several baby monitors with automatic response currently available in the market.
Here are some examples:

Nanit Plus Smart Baby Monitor: This monitor uses computer vision technology to track
and analyze your baby's sleep patterns, and it comes with a built-in white noise machine
that can automatically turn on if your baby wakes up.

Arlo Baby Monitor: This monitor can detect when your baby is crying or in distress, and it
can play lullabies or white noise to soothe them. It also has air sensors that can detect
changes in temperature, humidity, and air quality, and it can send alerts to your phone if
there are any issues.

Motorola Halo+ Video Baby Monitor: This monitor has a built-in projector that can display
a calming image on the ceiling to help your baby fall asleep. It also has a two-way audio
feature that allows you to talk to your baby from another room.

Owlet Smart Sock Baby Monitor: This monitor comes with a wearable sock that tracks your
baby's heart rate and oxygen levels. If there are any irregularities, it can sound an alarm or
send an alert to your phone.

iBaby Monitor M7: This monitor has a built-in moonlight soother that can project a calming
image on the ceiling to help your baby fall asleep. It also has a sound and motion detector
that can send alerts to your phone if there is any movement or noise in the room.

4.1.1 LIMITATIONS OF EXISTING SYSTEM
While existing systems of baby monitors with automatic response using automatic
speech recognition (ASR) can be helpful in providing parents with a sense of security and peace
of mind, they also have several limitations. Here are some of the most common limitations:

Limited Vocabulary: ASR systems used in baby monitors are often limited in the number of words
and phrases they can recognize accurately. This can result in false alarms or missed alerts.

Background Noise: The ASR system may struggle to differentiate between the baby's cries and
other background noises, such as household sounds or even other children playing in the room.

Speech Development: As babies develop, their cries and other vocalizations change. An ASR
system that was once effective may become less reliable over time.

False Positives: An ASR system may occasionally trigger an alert even if the baby is not actually
crying, causing unnecessary anxiety for parents.

Technical Issues: As with any technology, the ASR system may experience technical issues or
malfunctions, leading to missed alerts or false alarms.

Dependence on Internet Connection: Baby monitors with automatic response using ASR require
an internet connection to function properly. This means that if there is a disruption in the
connection, the system may not work effectively.

Privacy Concerns: There may be concerns about the security and privacy of the data collected by
the ASR system, as well as the potential for the system to be hacked or used for malicious
purposes.
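
The false alarms and missed alerts mentioned above can be quantified with two standard measures. Precision is the share of alerts that were real cries, TP / (TP + FP), and recall is the share of real cries that produced an alert, TP / (TP + FN). The sketch below computes both; the counts in the example are made up for illustration and are not measurements from any of the systems listed.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int,
                     false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) for a cry detector.

    true_pos:  alerts raised while the baby was really crying
    false_pos: alerts raised with no crying (false alarms)
    false_neg: crying episodes with no alert (missed alerts)
    """
    alerts = true_pos + false_pos
    cries = true_pos + false_neg
    precision = true_pos / alerts if alerts else 0.0
    recall = true_pos / cries if cries else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts: 9 correct alerts, 1 false alarm, 3 missed cries.
    precision, recall = precision_recall(9, 1, 3)
    print(precision, recall)   # 9/10 = 0.9 and 9/12 = 0.75
```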

4.2 PROPOSED SYSTEM
A baby monitor with automatic response using automatic speech recognition (ASR) can be a
helpful system for parents to ensure the safety and well-being of their infants. Here is a
proposed system that can achieve this:
Hardware Components: The system will require a microphone, speaker, and a microcontroller or
a computer to process the audio input and trigger the response. The system can be either wired or
wireless, with the latter using a Wi-Fi or Bluetooth module for connectivity.

Speech Recognition Software: The system should use robust ASR software that can accurately
detect and transcribe speech in real time. Various open-source and commercial ASR packages
are available, such as Google Speech-to-Text, Amazon Transcribe, or Mozilla DeepSpeech.

Response Options: The system can be programmed to respond to specific phrases or sounds made
by the baby. For example, if the baby is crying or making distressed sounds, the system can play
a pre-recorded lullaby or a soothing voice message to calm the baby. If the baby is calling out for
the parents, the system can alert them through a notification on their smartphone or via a pre-
recorded message.

Machine Learning: The ASR system can be improved over time by using machine learning
algorithms. The system can learn to recognize the baby's unique voice patterns and distinguish
them from other sounds in the environment, such as background noise or household appliances.

Privacy and Security: The system should be designed to ensure the privacy and security of the
data collected. The audio input can be processed locally on the device rather than being sent to a
cloud server. The system can also use encryption and authentication protocols to prevent
unauthorized access.

User Interface: The system should have a user-friendly interface that allows parents to customize
the response options and adjust the sensitivity of the ASR system. The system can also provide
real-time feedback on the baby's sounds and activities, such as sleep patterns or feeding times.

In conclusion, a baby monitor with automatic response using automatic speech recognition can
provide a valuable tool for parents to monitor and respond to their baby's needs. The system can
be customized to suit individual preferences and can be continuously improved using machine
learning algorithms. However, it is essential to ensure the privacy and security of the data
collected and to provide a user-friendly interface for ease of use.
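
As a rough illustration of the response options and the adjustable sensitivity described above, the following sketch maps a detected sound category to a response. The category names, the response names, the `Detection` type, and the `choose_response` function are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass

# Hypothetical sound categories and responses, chosen for illustration only.
RESPONSES = {
    "cry": "play_lullaby",
    "distress": "play_soothing_voice",
    "calling": "notify_parent",
}


@dataclass
class Detection:
    label: str         # category reported by the recognizer
    confidence: float  # recognizer confidence between 0.0 and 1.0


def choose_response(detection: Detection, sensitivity: float = 0.6) -> str:
    """Return the response for a detection, or 'ignore' if it is too uncertain."""
    if detection.confidence < sensitivity:
        return "ignore"
    return RESPONSES.get(detection.label, "ignore")


if __name__ == "__main__":
    print(choose_response(Detection("cry", 0.9)))
    print(choose_response(Detection("cry", 0.3)))
```

Lowering the `sensitivity` argument makes the monitor react to less certain detections, which is one way the parent-facing setting could be wired in.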

4.2.1 SYSTEM ARCHITECTURE


Designing a baby monitor with automatic response using automatic speech recognition
involves several key components and considerations. Here's an overview of the system
architecture for such a device:

Microphone: The device would require a high-quality microphone to capture the sounds from the
baby's room. The microphone should be sensitive enough to pick up the baby's cries and other
sounds, while also filtering out background noise.

Automatic Speech Recognition (ASR): The device would need an ASR system to process the
audio data from the microphone and convert it into text. This would involve training the ASR
model on a large dataset of baby sounds, including cries, coos, and other vocalizations.

Natural Language Processing (NLP): Any text output from the ASR system would need to be
processed using NLP techniques to identify the baby's needs and determine an appropriate
response. Because a cry contains no words, this stage would also have to analyze acoustic
features of the audio itself, such as tone and pitch, to estimate whether the baby is hungry, in
pain, or simply needs attention.

Response System: Once the NLP system has determined the baby's needs, the device would need
to provide an appropriate response. This could involve playing a soothing sound, such as white
noise or a lullaby, or activating a built-in night light. In some cases, the device might also need to
send an alert to the parent's smartphone or other device.

Connectivity: To enable remote monitoring and control, the device would need to be connected
to the internet or a local Wi-Fi network. This would allow the device to send alerts and receive
commands from the parent's smartphone or other device.

Power Management: The device would need a reliable power source, such as a rechargeable
battery or AC power supply. The device should also be designed to conserve power, particularly
when not in use, to maximize battery life and minimize energy consumption.

Overall, designing a baby monitor with automatic response using automatic speech recognition
requires a careful balance of hardware and software components, along with a deep understanding
of the needs of both babies and parents. The system architecture should be designed to provide a
reliable, easy-to-use solution that meets the needs of both parents and their infants.
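
One simple way to combine the microphone and power-management points above is to run the recognizer only when a frame of audio is loud enough. The sketch below is a minimal example of such an energy gate; the function names and the default threshold of 0.05 are assumptions for illustration, and a real device would need to tune the threshold to its microphone.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio frame (samples in -1.0..1.0)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_active(frame: Sequence[float], threshold: float = 0.05) -> bool:
    """True when the frame is loud enough to be worth passing to the recognizer."""
    return rms(frame) >= threshold


if __name__ == "__main__":
    quiet_room = [0.0] * 160
    loud_frame = [0.5, -0.5] * 80
    print(is_active(quiet_room), is_active(loud_frame))
```

Frames that fail the gate can be dropped before any recognition is attempted, which keeps the processor idle while the room is silent.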

4.2.2 BLOCK DIAGRAM
Here is a block diagram for a baby monitor with automatic response using automatic
speech recognition:

Figure 4.1 System Architecture

The baby monitor consists of a microphone that captures the audio signals from the baby and
sends them to the microcontroller. The microcontroller processes the signals and sends them to
the speech digital signal processor (DSP). The speech DSP is responsible for analyzing the audio
signals to recognize the baby's voice and commands.

The speech DSP sends control signals to the control logic based on the recognized commands.
The control logic then generates response signals that are sent to the microcontroller. The
microcontroller sends these response signals to the loudspeaker to produce the appropriate
responses.

For example, if the baby cries, the microphone captures the audio signal and sends it to the
microcontroller. The microcontroller sends the audio signal to the speech DSP, which recognizes
the baby's cry. The speech DSP sends a control signal to the control logic, which generates a
response signal to turn on the lullaby music. The microcontroller sends the response signal to the
loudspeaker, which plays the lullaby music to soothe the baby.
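
The signal flow in the block diagram can be sketched in software as a chain of small functions. This is only a simulation under stated assumptions: the audio is represented by a text label instead of real samples, and the names `speech_dsp`, `control_logic`, `Loudspeaker`, and `microcontroller` are hypothetical stand-ins for the hardware blocks.

```python
from typing import List


def speech_dsp(audio_label: str) -> str:
    """Stand-in for the speech DSP: turns a labelled audio frame into an event."""
    return "cry" if audio_label == "crying" else "none"


def control_logic(event: str) -> str:
    """Maps a recognized event to a response command."""
    return "play_lullaby" if event == "cry" else "no_action"


class Loudspeaker:
    """Records what would be played so the flow can be inspected."""

    def __init__(self) -> None:
        self.played: List[str] = []

    def handle(self, command: str) -> None:
        if command == "play_lullaby":
            self.played.append("lullaby")


def microcontroller(audio_label: str, speaker: Loudspeaker) -> str:
    """Routes microphone input through the DSP and control logic to the speaker."""
    event = speech_dsp(audio_label)
    command = control_logic(event)
    speaker.handle(command)
    return command


if __name__ == "__main__":
    speaker = Loudspeaker()
    print(microcontroller("crying", speaker), speaker.played)
```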

4.3 MODULE 1 DESCRIPTION
A baby monitor with automatic response and automatic speech recognition is a device
designed to monitor a baby's sounds and movements, and respond automatically to certain types
of events. The device is equipped with a microphone and a camera, and uses advanced technology
to detect and interpret the baby's sounds and movements.

When the baby makes a sound or moves, the microphone and camera detect it, and the audio and
video signals are transmitted to a processing unit. The processing unit uses automatic speech
recognition to analyze the sounds and determine if they require a response. For example, if the
baby is crying, the processing unit will detect this and trigger a pre-programmed response, such
as playing a soothing sound or activating a night light.

The device can also be programmed to respond to other events, such as if the baby stops moving
for a certain amount of time, indicating that they may have stopped breathing. In this case, the
device can be programmed to sound an alarm or call emergency services.

The automatic speech recognition technology used in the device is based on deep learning
algorithms, which are trained on large datasets of baby sounds and cries. This allows the system
to accurately recognize and interpret different types of cries, such as hunger cries, tired cries, or
pain cries.

The baby monitor with automatic response and automatic speech recognition is designed to
provide parents with peace of mind, knowing that their baby is being monitored and responded to
even when they are not in the room. It can also be used in situations where parents may not be
able to respond immediately to their baby's needs, such as when they are sleeping or away from
home.

Overall, this type of baby monitor represents a significant advance in the technology of infant
care, providing parents with a reliable and effective way to monitor and respond to their baby's
needs around the clock.
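
The rule "respond if the baby stops moving for a certain amount of time" can be expressed as a simple timer. The sketch below assumes a hypothetical `MovementWatchdog` class that is told about each detected movement; how movement is detected from the camera is outside its scope, and the timeout value is an arbitrary example.

```python
from typing import Optional


class MovementWatchdog:
    """Signals an alert when no movement has been seen for `timeout_s` seconds."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_movement: Optional[float] = None

    def record_movement(self, now: float) -> None:
        """Call whenever the camera pipeline reports movement."""
        self.last_movement = now

    def should_alert(self, now: float) -> bool:
        """True once the quiet period has reached the timeout."""
        if self.last_movement is None:
            return False  # nothing observed yet, so nothing to compare against
        return now - self.last_movement >= self.timeout_s


if __name__ == "__main__":
    watchdog = MovementWatchdog(timeout_s=20.0)
    watchdog.record_movement(now=100.0)
    print(watchdog.should_alert(now=110.0), watchdog.should_alert(now=125.0))
```

Timestamps are passed in explicitly so the logic can be exercised without waiting in real time; on a device they would come from a monotonic clock.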

4.4 MODULE 2 DESCRIPTION
A baby monitor with automatic response using automatic speech recognition is a device that
allows parents to monitor their baby's sounds and movements remotely, and respond to them
automatically using voice commands. This module consists of several components that work
together to achieve this functionality.

The first component is the microphone, which captures the sounds made by the baby and sends
them to the automatic speech recognition (ASR) module. The ASR module then processes the
audio signals, converts them into text, and interprets the meaning of the words spoken by the
baby.

The second component is the natural language processing (NLP) module, which analyzes the
meaning of the words and phrases spoken by the baby, and determines the appropriate response
based on a set of pre-defined rules. For example, if the baby says "I'm hungry", the NLP module
may respond with a pre-recorded message like "I'm preparing your bottle now".

The third component is the speaker, which outputs the pre-recorded response messages to the
baby. The speaker can be configured to play different messages based on the specific needs of the
baby, such as feeding, diaper changing, or soothing.

The fourth component is the user interface, which allows parents to configure the settings of the
baby monitor, such as the sensitivity of the microphone, the volume of the speaker, and the rules
for the NLP module. The user interface can be accessed through a mobile app or a web-based
portal, and can be customized to meet the specific needs of each family.

Overall, a baby monitor with automatic response using automatic speech recognition provides
parents with a convenient and efficient way to monitor their baby's needs and respond to them
quickly and effectively. By leveraging the latest advances in ASR and NLP technology, this
module can help parents provide better care for their babies and enjoy greater peace of mind.
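
The configurable settings mentioned above (microphone sensitivity, speaker volume, and the rules for the NLP module) could be held in one validated structure. The following is a minimal sketch; the `MonitorSettings` class, its field names, and its value ranges are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MonitorSettings:
    mic_sensitivity: float = 0.5  # 0.0 (least) to 1.0 (most sensitive)
    speaker_volume: int = 50      # percent, 0 to 100
    rules: Dict[str, str] = field(default_factory=dict)  # phrase -> message

    def __post_init__(self) -> None:
        # Reject values a mobile app or web portal should never send.
        if not 0.0 <= self.mic_sensitivity <= 1.0:
            raise ValueError("mic_sensitivity must be between 0.0 and 1.0")
        if not 0 <= self.speaker_volume <= 100:
            raise ValueError("speaker_volume must be between 0 and 100")

    def response_for(self, text: str) -> Optional[str]:
        """Return the message of the first rule whose phrase occurs in the text."""
        lowered = text.lower()
        for phrase, message in self.rules.items():
            if phrase.lower() in lowered:
                return message
        return None


if __name__ == "__main__":
    settings = MonitorSettings(rules={"hungry": "I'm preparing your bottle now"})
    print(settings.response_for("I'm hungry"))
```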

4.5 LIST OF ITEMS IN COCO DATASET

The COCO (Common Objects in Context) dataset is a large-scale object detection,
segmentation, and captioning dataset. While it contains many categories of objects, it does not
specifically have a category for baby monitors or automatic speech recognition.

That being said, you could potentially create a custom dataset by collecting images of baby
monitors in use and using automatic speech recognition to transcribe any spoken responses. Your
dataset might include the following items:

Images of baby monitors in use.
Transcriptions of spoken responses generated by automatic speech recognition.
Object annotations for the baby monitor (e.g. bounding box coordinates, segmentation masks).
Metadata about the images (e.g. lighting conditions, time of day, distance from camera to baby
monitor).
A training/validation/testing split for the dataset.
Pre-processing steps (e.g. resizing images, normalizing the transcriptions).
A model architecture for the automatic response generation task.
Evaluation metrics for the model (e.g. accuracy, F1 score, BLEU score).
Any additional annotations or metadata that might be useful for the task (e.g. audio recordings of
the spoken responses, user demographics).
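
The training/validation/testing split listed above can be produced with a seeded shuffle so that it is reproducible. This is a minimal sketch; the `split_dataset` function, the 70/15/15 proportions, and the file names in the usage example are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T], train_pct: int = 70, val_pct: int = 15, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items with a fixed seed and cut them into train/val/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


if __name__ == "__main__":
    files = [f"sample_{i:03d}.wav" for i in range(100)]  # hypothetical file names
    train, val, test = split_dataset(files)
    print(len(train), len(val), len(test))
```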

4.6 NEURAL NETWORK

Building a neural network for a baby monitor with automatic response using automatic speech
recognition can be a challenging task, but it is definitely possible. Here is a high-level overview
of the steps involved:

Data collection: To train the neural network, you need a dataset of audio recordings of babies
crying or making sounds, along with corresponding labels indicating whether the baby is crying
or not. You can either collect this data yourself by recording your own baby, or you can use
existing datasets that are publicly available.

Data pre-processing: Once you have your dataset, you need to pre-process the audio recordings
to extract relevant features. This could involve applying techniques like Fourier transforms, Mel
frequency cepstral coefficients (MFCCs), or other signal processing techniques to extract features
like frequency, amplitude, and spectral characteristics.

Training the neural network: Once you have pre-processed the data, you can train the neural
network using a supervised learning approach. You will need to split your dataset into training
and validation sets, and then use an algorithm like backpropagation to optimize the network's
weights and biases to minimize the loss function.

Automatic speech recognition: Once the neural network is trained to recognize when a baby is
crying, you can add a component for automatic speech recognition. This could involve using a
pre-trained speech recognition model, or training your own model using a dataset of recordings
of adult voices saying common baby-related phrases like "it's okay" or "time for a diaper change".

Automatic response: Once the system recognizes that the baby is crying and processes the speech,
it can automatically respond with a pre-recorded or synthesized message, such as "I'm coming"
or "everything is okay".

Deployment: Finally, you can deploy the system on a device like a Raspberry Pi, or integrate it
into an existing baby monitor system.

Of course, this is just a high-level overview of the process, and there are many details and
challenges involved in each step. Nonetheless, with the right data, tools, and expertise, it is
definitely possible to build a neural network for a baby monitor with automatic response using
automatic speech recognition.
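
To make the training step concrete, the sketch below trains the simplest possible network, a single sigmoid neuron (logistic regression), with gradient descent on a log-loss. It is not the project's model: the six samples are made-up, normalized pitch and energy values, and the learning rate and epoch count are arbitrary choices.

```python
import math
from typing import List, Tuple

# Made-up, normalized (pitch, energy, label) samples; label 1 means "crying".
SAMPLES: List[Tuple[float, float, int]] = [
    (0.9, 0.8, 1), (0.8, 0.9, 1), (0.7, 0.7, 1),
    (0.2, 0.1, 0), (0.1, 0.2, 0), (0.3, 0.3, 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(
    samples: List[Tuple[float, float, int]], lr: float = 0.5, epochs: int = 3000
) -> Tuple[float, float, float]:
    """Fit weights w1, w2 and bias b by batch gradient descent on the log-loss."""
    w1 = w2 = b = 0.0
    n = len(samples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for x1, x2, y in samples:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # gradient of loss w.r.t. the logit
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(model: Tuple[float, float, float], pitch: float, energy: float) -> float:
    """Estimated probability that the sound is a cry."""
    w1, w2, b = model
    return sigmoid(w1 * pitch + w2 * energy + b)


if __name__ == "__main__":
    model = train(SAMPLES)
    print(round(predict(model, 0.9, 0.8), 3), round(predict(model, 0.1, 0.2), 3))
```

A real system would replace the two hand-picked features with MFCCs or spectrogram frames and hold out a validation set, as described in the steps above.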

4.7 COMPARISON OF VARIOUS ALGORITHMS


There are several algorithms that can be used for a baby monitor with automatic response
using automatic speech recognition (ASR). Here is a comparison of some common algorithms:

Hidden Markov Model (HMM): HMM is a statistical model that is often used in speech
recognition. It assumes that the observed speech signal is generated by a sequence of hidden
states that cannot be observed directly. HMM can be trained to recognize specific words or
phrases in the baby's speech. However, HMM may not be very accurate in recognizing speech in
noisy environments.

Convolutional Neural Networks (CNN): CNN is a type of neural network that is often used in
image recognition. However, it can also be used in speech recognition, typically by treating a
spectrogram of the audio as an image. CNN can learn to recognize features in the speech signal
and can be trained to recognize specific words or phrases. CNN-based models are generally more
robust than HMM-based models when recognizing speech in noisy environments.

Long Short-Term Memory (LSTM): LSTM is a type of recurrent neural network that is often used
in speech recognition. It can learn to recognize patterns in the speech signal over time. LSTM can
be trained to recognize specific words or phrases and can be more accurate than both HMM and
CNN in recognizing speech in noisy environments.

Gaussian Mixture Model (GMM): GMM is a statistical model that is often used in speech
recognition, usually together with HMM. It assumes that the acoustic features of the speech
signal follow a weighted combination of multiple Gaussian distributions. GMM can be trained to
recognize specific words or phrases in the baby's speech. However, like HMM, GMM may not be
very accurate in recognizing speech in noisy environments.

When it comes to choosing the best algorithm for a baby monitor with automatic response using
ASR, there are several factors to consider, such as the level of accuracy required, the processing
power available, and the specific needs of the user. However, based on the comparison above,
LSTM and CNN are both good choices for recognizing speech in noisy environments and
achieving high accuracy.
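
As a small worked example of the HMM idea of hidden states, the sketch below runs the Viterbi algorithm on a toy two-state model in which the hidden state is "quiet" or "cry" and the observation is a "low" or "high" sound level. All probabilities are invented for illustration and are not measured from any dataset.

```python
from typing import Dict, List

STATES = ["quiet", "cry"]
START: Dict[str, float] = {"quiet": 0.8, "cry": 0.2}
TRANS: Dict[str, Dict[str, float]] = {
    "quiet": {"quiet": 0.9, "cry": 0.1},
    "cry": {"quiet": 0.3, "cry": 0.7},
}
EMIT: Dict[str, Dict[str, float]] = {
    "quiet": {"low": 0.9, "high": 0.1},
    "cry": {"low": 0.2, "high": 0.8},
}


def viterbi(observations: List[str]) -> List[str]:
    """Return the most likely hidden-state sequence for the observations."""
    if not observations:
        return []
    prob = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    back: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_prob: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in STATES:
            # Best predecessor for state s at this time step.
            best_prev = max(STATES, key=lambda p: prob[p] * TRANS[p][s])
            new_prob[s] = prob[best_prev] * TRANS[best_prev][s] * EMIT[s][obs]
            pointers[s] = best_prev
        prob = new_prob
        back.append(pointers)
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: prob[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi(["low", "high", "high", "high"]))
```

With these toy numbers, a single "low" frame followed by three "high" frames decodes to one quiet state followed by three cry states.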

4.8 PERFORMANCE EVALUATION
When evaluating the performance of a baby monitor with automatic response using automatic
speech recognition (ASR), there are several factors to consider.

Accuracy of ASR: The accuracy of the ASR system is crucial to the performance of the baby
monitor. The ASR system should be able to accurately transcribe the baby's speech, even in noisy
environments. The accuracy of the ASR system can be measured using metrics such as word error
rate (WER), sentence error rate (SER), and recognition rate.

Responsiveness: The responsiveness of the baby monitor is also important. The system should be
able to detect the baby's cries quickly and respond appropriately. The response time can be
measured by the time it takes for the system to detect the baby's cry and respond.

Effectiveness of response: The effectiveness of the response is another important factor. The
response should be appropriate to the baby's needs and should calm the baby down. The
effectiveness of the response can be measured by observing the baby's reaction to the response.

False alarms: False alarms can be a problem with any monitoring system. The system should be
able to distinguish between the baby's cries and other noises in the environment, such as the sound
of a pet or a passing car. The false alarm rate can be measured by the number of false alarms per
hour.

User experience: The user experience is also important. The system should be easy to set up and
use, and the response should be customizable to the user's preferences.

Overall, when evaluating the performance of a baby monitor with automatic response using ASR,
it is important to consider the accuracy of the ASR system, the responsiveness and effectiveness
of the response, the false alarm rate, and the user experience.
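
Two of the metrics named above can be computed directly. The sketch below calculates word error rate as the word-level edit distance divided by the number of reference words, and the false alarm rate as alarms per monitored hour. The function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference, per reference word."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def false_alarms_per_hour(false_alarms: int, monitored_seconds: float) -> float:
    """Number of false alarms normalized to one hour of monitoring."""
    if monitored_seconds <= 0:
        raise ValueError("monitored_seconds must be positive")
    return false_alarms / (monitored_seconds / 3600.0)


if __name__ == "__main__":
    print(word_error_rate("time for a diaper change", "time for diaper change"))
    print(false_alarms_per_hour(3, 7200))
```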

CHAPTER 5
SYSTEM DESIGN
5.1 SYSTEM FRAMEWORK
A system framework for a baby monitor with automatic response using automatic speech
recognition (ASR) could include the following components:

Figure 5.1 UML diagram

Figure 5.2 UML diagram

Figure 5.3 framework

CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 MODULE
A baby monitor with automatic response and automatic speech recognition is a device
designed to monitor a baby's sounds and movements, and respond automatically to certain types
of events. The device is equipped with a microphone and a camera, and uses advanced technology
to detect and interpret the baby's sounds and movements.

When the baby makes a sound or moves, the microphone and camera detect it, and the audio and
video signals are transmitted to a processing unit. The processing unit uses automatic speech
recognition to analyze the sounds and determine if they require a response. For example, if the
baby is crying, the processing unit will detect this and trigger a pre-programmed response, such
as playing a soothing sound or activating a night light.

The device can also be programmed to respond to other events, such as if the baby stops moving
for a certain amount of time, indicating that they may have stopped breathing. In this case, the
device can be programmed to sound an alarm or call emergency services.

The automatic speech recognition technology used in the device is based on deep learning
algorithms, which are trained on large datasets of baby sounds and cries. This allows the system
to accurately recognize and interpret different types of cries, such as hunger cries, tired cries, or
pain cries.

The baby monitor with automatic response and automatic speech recognition is designed to
provide parents with peace of mind, knowing that their baby is being monitored and responded to
even when they are not in the room. It can also be used in situations where parents may not be
able to respond immediately to their baby's needs, such as when they are sleeping or away from
home.

Overall, this type of baby monitor represents a significant advance in the technology of infant
care, providing parents with a reliable and effective way to monitor and respond to their baby's
needs around the clock.

6.2 TOOLS
6.2.1 MICROPHONE
A high-quality microphone is a crucial component of a baby monitor with automatic
response. It is used to capture the baby's sounds and transmit them to the speech recognition
software for further processing.
6.2.2 AUTOMATIC SPEECH RECOGNITION SOFTWARE
Automatic speech recognition software is used to process the baby's sounds and convert
them into text. Many ASR services are available, including Google Cloud Speech-to-Text,
Amazon Transcribe, and Microsoft Azure Speech Services.

6.2.3 NATURAL LANGUAGE PROCESSING

NLP software is used to analyze the text generated by the ASR software and determine
the appropriate response. This may involve identifying specific words or phrases, such as "I'm
hungry" or "I need a diaper change."

6.2.4 TEXT-TO-SPEECH (TTS) SOFTWARE

Once the appropriate response has been determined, TTS software is used to generate a
spoken response that can be played through the baby monitor's speaker. Many TTS services are
available, including Google Text-to-Speech, Amazon Polly, and Microsoft Text-to-Speech.

6.2.5 PYTHON
Python is a popular language for developing machine learning algorithms and has many
libraries for speech recognition and natural language processing.

6.2.6 JAVASCRIPT

JavaScript can be used for developing web-based applications and can be combined with
browser speech interfaces such as the Web Speech API.

6.2.7 C++

C++ is a popular language for developing real-time applications and can be used to create
low-level interfaces for controlling hardware devices like a baby monitor.

6.2.8 SWIFT

Swift is a programming language used for developing iOS applications and can be used
to create a mobile app for a baby monitor with automatic speech recognition.

CHAPTER 7
CONCLUSION AND FUTURE WORK
7.1 CONCLUSION
In conclusion, baby monitors with automatic response using automatic speech recognition
technology offer an innovative solution to help parents keep a watchful eye and ear on their
infants. With this technology, parents can be alerted when their baby needs attention, even when
they are not in the same room.

However, it is important to note that automatic speech recognition technology is not always
perfect, and it may not always accurately interpret a baby's cries or sounds. Parents should still be
aware of their baby's needs and use their own judgement when responding to alerts from the
monitor.

Overall, baby monitors with automatic response using automatic speech recognition technology
have the potential to be a useful tool for parents, but it is important to consider the limitations of
the technology and use it as a supplement to their own caregiving instincts.

7.2 FUTURE WORK
Future enhancements of the baby monitor with automatic response using automatic speech
recognition

A baby monitor with automatic response using automatic speech recognition could be a valuable
enhancement to the current technology. Here are a few potential features that could be included
in such a system:

Voice-activated response: The baby monitor could be designed to respond to specific voice
commands from the parents or caregivers. For example, saying "monitor on" could turn the device
on, and saying "monitor off" could turn it off.

Crying detection: The baby monitor could use automatic speech recognition to detect when the
baby is crying and respond accordingly. For example, it could play a lullaby or white noise to
soothe the baby, or it could send an alert to the parent's smartphone.

Customized responses: The baby monitor could be programmed with specific responses based on
the parents' preferences. For example, the parents could choose to have the monitor play a specific
song or read a bedtime story when the baby wakes up.

Two-way communication: The baby monitor could allow for two-way communication between
the parents and the baby. This could be especially useful for checking in on the baby or soothing
them without having to physically go into the room.

Remote control: The baby monitor could be controlled remotely using a smartphone app. This
would allow parents to adjust settings, turn the monitor on or off, or respond to the baby from
anywhere with an internet connection.

APPENDIX 1
An appendix for a baby monitor with automatic response using automatic speech recognition
could include the following:

Technical specifications: This section should outline the technical specifications of the baby
monitor, including the range of the device, the types of sensors used, the battery life, and the
compatibility with other devices.

Automatic speech recognition (ASR) software: This section should describe the ASR software
used in the baby monitor, including its accuracy, speed, and compatibility with different
languages.

Audio and video feed: This section should detail the audio and video feed provided by the baby
monitor, including the quality of the audio and video, and any additional features, such as night
vision.

Automatic response feature: This section should explain the automatic response feature of the
baby monitor, which allows the device to respond to certain cues, such as a baby crying, by
providing a predetermined response, such as playing soothing music or turning on a night light.

A1.1 SAMPLE CODING
Here is sample code for a baby monitor with automatic response using automatic speech
recognition. It is only a sample and may need to be modified depending on your specific
requirements and the hardware you are using:

import speech_recognition as sr
import pyttsx3

# Initialize the speech recognition and text-to-speech engines
r = sr.Recognizer()
engine = pyttsx3.init()

# Set the voice for the text-to-speech engine
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)


# Define the function to recognize speech and respond
def recognize_speech():
    # Capture one utterance from the default microphone
    with sr.Microphone() as source:
        print("Speak now...")
        audio = r.listen(source)
    try:
        # Send the audio to the Google Web Speech API for transcription
        text = r.recognize_google(audio)
        print("You said:", text)
        engine.say("I heard you say " + text)
        engine.runAndWait()
    except sr.UnknownValueError:
        print("Sorry, I could not understand what you said")
        engine.say("Sorry, I could not understand what you said")
        engine.runAndWait()
    except sr.RequestError as e:
        # Raised when the recognition service cannot be reached
        print("Could not reach the speech recognition service:", e)


# Continuously listen for speech and respond
while True:
    recognize_speech()

This code uses the speech_recognition library to recognize speech from the microphone and the
pyttsx3 library to convert text to speech. The recognize_speech function uses the microphone to
listen for speech, recognizes it using the Google Web Speech API (which requires an internet
connection), prints the recognized text, and responds using the text-to-speech engine.

You can customize this code to trigger certain actions when specific phrases are recognized, such
as turning on a night light, playing a lullaby, or sending an alert to your phone.
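
One possible way to attach such actions to recognized phrases is a small dispatcher that scans the transcribed text. In the sketch below the three action functions are placeholders that only return a status string; on a real device they would drive hardware or a notification service. All names here are hypothetical.

```python
from typing import Callable, Dict, List


def turn_on_night_light() -> str:
    return "night light on"  # placeholder for real hardware control


def play_lullaby() -> str:
    return "lullaby playing"  # placeholder for audio playback


def send_alert() -> str:
    return "alert sent"  # placeholder for a phone notification


def make_dispatcher(actions: Dict[str, Callable[[], str]]) -> Callable[[str], List[str]]:
    """Build a function that runs every action whose phrase occurs in the text."""

    def dispatch(text: str) -> List[str]:
        lowered = text.lower()
        return [action() for phrase, action in actions.items() if phrase in lowered]

    return dispatch


dispatch = make_dispatcher({
    "night light": turn_on_night_light,
    "lullaby": play_lullaby,
    "help": send_alert,
})

if __name__ == "__main__":
    print(dispatch("Please play a lullaby"))
```

In the sample program, `dispatch(text)` could be called right after the recognized text is printed.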

A1.2 OUTPUT SCREENSHOTS

Figure A1.2.1 eye motion sensor

Figure A1.2.2 Baby postures

Figure A1.2.3 Baby postures at night

REFERENCES

[1] T. Fuhr, H. Reetz and C. Wegener, “Comparison of supervised-learning models for infant cry
classification/Vergleich von Klassifikationsmodellen zur Säuglingsschreianalyse,” International
Journal of Health Professions, vol. 2, no. 1, pp. 4–15, 2015.

[2] G. V. I. S. Silva and D. S. Wickremasinghe, “Infant cry detection system with automatic
soothing and video monitoring functions,” Journal of Engineering and Technology of the Open
University of Sri Lanka, vol. 5, no. 1, pp. 36–53, 2017.

[3] D. Ravichandran, P. Praveenkumar, S. Rajagopalan, J. B. B. Rayappan and R. Amirtharajan,
“ROI-based medical image watermarking for accurate tamper detection, localisation and
recovery,” Medical & Biological Engineering & Computing, vol. 59, no. 6, pp. 1355–1372, 2021.

[4] Y. Skogsdal, M. Eriksson and J. Schollin, “Analgesia in newborns given oral glucose,” Acta
Paediatrica, vol. 86, no. 2, pp. 217–220, 1997.

[5] S. M. Luddington-Hoe, X. Cong and F. Hashemi, “Infant crying: Nature, physiologic
consequences, and select interventions,” Neonatal Network, vol. 21, no. 2, pp. 29–36, 2002.

[6] E. Rayachoti and S. R. Edara, “Robust medical image watermarking technique for accurate
detection of tampers inside region of interest and recovering original region of interest,” IET
Image Processing, vol. 9, no. 8, pp. 615–625, 2015.

