
Proceedings of the International Conference on Electronics and Sustainable Communication Systems (ICESC 2020)

IEEE Xplore Part Number: CFP20V66-ART; ISBN: 978-1-7281-4108-4

Infant Care Assistant using Machine Learning, Audio Processing, Image Processing and IoT Sensor Network
1Crispin Lobo, 2Ajinkeya Chitrey, 3Pradeepti Gupta, 4Sarfaraj, 5Archana Chaudhari
1,2,3,4U.G. Student, 5Assistant Professor, Department of Electronics and Telecommunication
Dwarkadas J. Sanghvi College of Engineering
Mumbai, India
Email: [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—With evolving times, working parents have become the norm in contemporary society. This has led to an increased demand for products that assist parents in taking care of their infants. This paper aims to showcase an Infant Care Assistant which employs an IoT sensor network and a Raspberry Pi to collect data on the current state of the infant and its surroundings, and automation techniques for soothing a troubled infant. The assistant also includes a cry detection unit based on a support vector classifier, a cry analysis unit based on a random forest classifier and an emotion recognition unit based on the mini-Xception convolutional neural network model. Furthermore, it stores data using phpMyAdmin and private cloud servers and includes a graphical user interface built using HTML5 and CSS. The results of the proposed experiment indicate that this assistant could ease some of the workload of parents and enable them to take utmost care of their infants.

Keywords—Image processing, Raspberry Pi, Random forest classifier, Convolutional neural network, IoT

I. INTRODUCTION

Many parents are unable to devote sufficient time to infants on account of office work or being short-handed. Additionally, there are also many first-time parents who lack experience in raising children. Infants, on the other hand, demand constant attention and care. Simple methods to immediately calm agitated infants need to be devised. Hence, there is a need to assist parents in taking care of their infants by providing them with a single product which would monitor their infants at all times, send notifications in case attention is required, raise alerts in case of emergency situations and provide real-time interaction between parents and infants. Furthermore, a market research report [1] suggests that the birth rate in a few countries has risen. Awareness regarding child safety has risen too. The global smart baby monitor market had a value of USD 972.6 million in 2018. These findings further bolster the demand for a pragmatic infant care assistant.

II. LITERATURE SURVEY

Automatic E-Baby Cradle Swing Based on Baby Cry (Misha Goyal and Dilip Kumar) [2] presents the design and implementation of a new indigenous low-cost E-Baby Cradle that swings automatically when the baby cries. The cradle speed can be controlled as required by the user. The system has an inbuilt alarm that indicates a wet mattress and a long duration of cry. In Baby Cry Detection in Domestic Environment Using Deep Learning (Yizhar Lavner, Rami Cohen, Dima Ruinskiy, Hans Ijzerman) [3], the authors propose two machine-learning algorithms for automatic detection of baby cry in audio recordings. The first algorithm is a logistic regression classifier. To train this classifier, features such as Mel-frequency cepstral coefficients, pitch and formants are extracted from the recordings. The second algorithm employs a convolutional neural network (CNN) operating on a log Mel-filter bank representation of the audio recordings. Performance evaluation of the algorithms is carried out using a database containing recordings of babies (0-6 months old) in domestic environments along with respective tags. The recordings contain sounds of baby cry, parents talking and doors opening and closing. The CNN classifier shows better performance when compared to the logistic regression classifier. Image Processing Techniques to Recognize Facial Emotions (A. Mercy Rani, R. Durgadevi) [4] includes face detection, non-skin region extraction and morphological processing to recognize emotion. First, frame-based detection is implemented. Then the image quality is analyzed. The face location is detected using the Viola-Jones algorithm. Extraction of the non-skin region and morphological operations are applied to the extracted image to extract the facial features for recognition of facial emotions.

III. PROPOSED SOLUTION

The proposed solution is an Infant Care Assistant which consists of the infant monitoring, data transfer, data analysis and user interface units. The infant monitoring unit collects data from various sensors and creates a comfortable environment for the infant by controlling the cradle. The data transfer unit is a medium for the transfer of data between the user, the infant monitoring unit and the data analysis unit. The data analysis unit comprises the cry detection, cry analysis and emotion recognition units which determine the emotional state of the infant. The user interface gives the user visual and operational control over the infant monitoring system.


Figure 1: Block diagram of the proposed system

IV. PROJECT DESCRIPTION

A. INFANT MONITORING UNIT

(a) Data Acquisition Unit
The data acquisition unit consists of the various sensors required to monitor the infant and its surroundings. The sensor results are correlated with other sensor results or fed directly into the decision-making algorithms.

1. Moisture sensor
A moisture sensor is used to detect whether the infant has wet the cradle bed or not. Infant hygiene is important and hence the extent to which the cradle bed has been made wet is monitored. If the moisture level crosses a threshold, alerts are sent. The data of the moisture sensor is correlated with the infant's cry for cry analysis and finding the cause of the cry.

2. MIC Condenser
The MIC condenser is a digital sensor used to detect the audio level of the surroundings. Its sensitivity is varied using the on-board potentiometer. When a loud sound is detected the MIC condenser gives a logic 'HIGH' output and subsequently the Raspberry Pi switches on the USB microphone for recording.

3. USB Microphone
A USB microphone is connected to the Raspberry Pi. When a loud sound is detected, it records the sound and cuts it continuously into audio segments of 10 s each. These audio segments are then processed for cry detection and cry analysis.
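As a rough illustration of this trigger-and-record loop, the Python sketch below (not from the paper) waits for the MIC condenser's digital output to go HIGH and then grabs a 10 s clip from the USB microphone; the GPIO pin, sample rate and output path are assumptions.

```python
# Hypothetical sketch: record 10 s audio clips whenever the MIC condenser
# board pulls its digital output HIGH. Pin number and paths are assumptions.
import time
import RPi.GPIO as GPIO          # GPIO access on the Raspberry Pi
import sounddevice as sd         # reads from the USB microphone
import soundfile as sf

SOUND_SENSOR_PIN = 17            # assumed BCM pin wired to the MIC condenser D0 output
SAMPLE_RATE = 44100              # Hz
CLIP_SECONDS = 10                # the paper cuts recordings into 10 s segments

GPIO.setmode(GPIO.BCM)
GPIO.setup(SOUND_SENSOR_PIN, GPIO.IN)

try:
    while True:
        # Block until the sound sensor reports a loud noise (logic HIGH).
        GPIO.wait_for_edge(SOUND_SENSOR_PIN, GPIO.RISING)
        clip = sd.rec(int(CLIP_SECONDS * SAMPLE_RATE),
                      samplerate=SAMPLE_RATE, channels=1, dtype="int16")
        sd.wait()                                    # block until the 10 s clip is done
        filename = f"/home/pi/clips/cry_{int(time.time())}.wav"
        sf.write(filename, clip, SAMPLE_RATE)        # hand the clip to the cry pipeline
finally:
    GPIO.cleanup()
```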
4. PIR Motion sensor
The PIR motion sensor is used to monitor the movement of the infant and inform parents of the same. If the PIR sensor detects motion, data from the camera module is processed to verify whether the infant is awake or not. The sleep pattern is analyzed for more accurate results. The PIR motion sensor is switched off when the servo motors swing the cradle in order to avoid inaccuracies.

5. Camera module
The Raspberry Pi Camera Module v2 is a high-quality 8-megapixel image sensor. The camera module is mounted to focus directly on the face of the infant. Facial features are extracted from the image and analyzed to determine emotions [5] [6] [7]. For the emotion sad, the results are correlated with the cry detection and cry analysis results. Furthermore, the camera module is used to check whether the infant is awake or has left the cradle [8]. If the infant leaves the cradle, alerts are sent and a loud alarm is sounded from the cradle speakers.
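A minimal sketch of how such a PIR-triggered capture could look on the Raspberry Pi is shown below, assuming the picamera library and an arbitrary GPIO pin; the saved frame would then be passed on to the wakefulness and emotion checks.

```python
# Hypothetical sketch: capture a still frame when the PIR sensor fires, so the
# emotion/awake checks have an image to work on. Pin and path are assumptions.
import time
import RPi.GPIO as GPIO
from picamera import PiCamera    # Raspberry Pi Camera Module v2 driver

PIR_PIN = 27                     # assumed BCM pin for the PIR output

GPIO.setmode(GPIO.BCM)
GPIO.setup(PIR_PIN, GPIO.IN)

camera = PiCamera()
camera.resolution = (640, 480)

try:
    while True:
        GPIO.wait_for_edge(PIR_PIN, GPIO.RISING)   # motion detected
        path = f"/home/pi/frames/infant_{int(time.time())}.jpg"
        camera.capture(path)                       # frame sent on for face/emotion analysis
        time.sleep(5)                              # simple debounce between captures
finally:
    camera.close()
    GPIO.cleanup()
```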
6. Room temperature and humidity sensor
The DHT-11 sensor is used to monitor the temperature and humidity of the infant's room. An optimal range for an infant, especially a newborn, is said to be 20-23 degrees Celsius. The Raspberry Pi switches on a fan mounted on the cradle when the room temperature exceeds a given threshold.

(b) Raspberry Pi 3B+ Control System
The data from the sensors is collected and processed by the Raspberry Pi 3B+. The Raspberry Pi is a small single-board computer with on-board chips and I/O connectors. The Raspberry Pi receives inputs from the data acquisition unit. The analog sensor signals are converted to digital signals using the MCP3208 12-bit Analog to Digital Converter (ADC), which is based on the successive approximation register (SAR) architecture. The data is analyzed in the Raspberry Pi and appropriate actions are executed based on the set thresholds and algorithms. The decision-making algorithms monitor the sensor data and raise alerts when a threshold is crossed or abnormalities are detected. These alerts are mailed to the user and also uploaded to the webpage. The Raspberry Pi controls the infant soothing unit which calms down the infant when an infant cry is detected. All sensor data and results of the Raspberry Pi control system are uploaded to the local database and private cloud interface for storage and also sent to the webpage for display. The Raspberry Pi also receives inputs from the user through the web page to control the infant soothing unit and the camera module.
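The ADC path can be sketched as follows: a single-ended MCP3208 read over SPI, applied here to an assumed moisture channel and alert threshold; the exact wiring and alert mechanism are not specified at this level of detail in the paper.

```python
# Hypothetical sketch: read an analog sensor (e.g. the moisture probe) through
# the MCP3208 ADC over SPI and raise an alert past a threshold. Channel and
# threshold values are assumptions, not values from the paper.
import spidev

spi = spidev.SpiDev()
spi.open(0, 0)                   # SPI bus 0, chip-select 0
spi.max_speed_hz = 1_000_000

def read_mcp3208(channel: int) -> int:
    """Return the 12-bit reading (0-4095) from one MCP3208 channel."""
    assert 0 <= channel <= 7
    cmd = [0x06 | (channel >> 2), (channel & 0x03) << 6, 0x00]
    reply = spi.xfer2(cmd)
    return ((reply[1] & 0x0F) << 8) | reply[2]

MOISTURE_CHANNEL = 0             # assumed wiring
MOISTURE_THRESHOLD = 2500        # assumed alert level on the 0-4095 scale

if read_mcp3208(MOISTURE_CHANNEL) > MOISTURE_THRESHOLD:
    # In the full system this would e-mail the parents and update the webpage.
    print("Wet-bed alert: moisture level above threshold")
```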
(c) Infant Soothing System

1. Servo Motor
Two 1.8 kg-cm torque servo motors are used to swing the cradle based on the PWM input given to them. The two servo motors are connected to a rod which is attached to the cradle. One servo motor moves from 0 to 180 degrees while the other moves from 180 to 0 degrees. The swinging mechanism can be used to put the infant to sleep. If an infant cry is detected, the servo motors are activated and the cradle is made to swing till the crying stops. The parents can also control the swinging of the cradle externally.
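One possible way to drive the two opposed servos with 50 Hz software PWM through RPi.GPIO is sketched below; the pins, sweep period and duty-cycle mapping are assumptions and would need calibration for the actual servos.

```python
# Hypothetical sketch: swing the cradle by driving the two servos in opposite
# directions with 50 Hz PWM. Pins, angles and timing are assumptions.
import time
import RPi.GPIO as GPIO

LEFT_PIN, RIGHT_PIN = 18, 19     # assumed PWM-capable BCM pins

def angle_to_duty(angle: float) -> float:
    """Map 0-180 degrees to roughly 2.5-12.5 % duty cycle at 50 Hz."""
    return 2.5 + (angle / 180.0) * 10.0

GPIO.setmode(GPIO.BCM)
GPIO.setup([LEFT_PIN, RIGHT_PIN], GPIO.OUT)
left = GPIO.PWM(LEFT_PIN, 50)
right = GPIO.PWM(RIGHT_PIN, 50)
left.start(angle_to_duty(0))
right.start(angle_to_duty(180))

def swing_once(period_s: float = 2.0) -> None:
    """One full rock: the servos sweep in opposite directions and back."""
    left.ChangeDutyCycle(angle_to_duty(180))
    right.ChangeDutyCycle(angle_to_duty(0))
    time.sleep(period_s / 2)
    left.ChangeDutyCycle(angle_to_duty(0))
    right.ChangeDutyCycle(angle_to_duty(180))
    time.sleep(period_s / 2)

# Keep calling swing_once() while the cry-detection flag stays set (flag source assumed).
```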
2. Speaker
The audio output of the Raspberry Pi is connected to a speaker. This is used to put the infant to sleep, silence the infant when it is crying or simply to entertain the infant.


3. Fan
When the temperature of the room rises above 23 degrees Celsius, the fan is switched on by the Raspberry Pi. The fan is directed towards the infant to increase comfort. The fan can also be switched on and off remotely.
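As an illustration of this threshold rule, the sketch below polls the DHT-11 and toggles a GPIO-driven fan relay around the 23 degree Celsius limit; pin numbers and the polling interval are assumptions.

```python
# Hypothetical sketch: poll the DHT-11 and switch the cradle fan through a GPIO
# pin (e.g. via a relay) when the room exceeds the 23 degC limit mentioned above.
import time
import Adafruit_DHT              # DHT-11 driver
import RPi.GPIO as GPIO

DHT_PIN = 4                      # assumed BCM pin for the DHT-11 data line
FAN_PIN = 23                     # assumed BCM pin driving the fan relay
TEMP_LIMIT_C = 23.0

GPIO.setmode(GPIO.BCM)
GPIO.setup(FAN_PIN, GPIO.OUT, initial=GPIO.LOW)

while True:
    humidity, temperature = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHT_PIN)
    if temperature is not None:
        # Fan ON above the limit, OFF otherwise; readings also go to the database.
        GPIO.output(FAN_PIN, GPIO.HIGH if temperature > TEMP_LIMIT_C else GPIO.LOW)
    time.sleep(60)               # DHT-11 is slow; once a minute is plenty
```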

Figure 2: Block diagram of the Infant monitoring system

B. DATA TRANSFER UNIT

The data transfer unit is responsible for transferring data from the Raspberry Pi to the data analysis unit and the user interface.

(a) Local Storage
In case the Raspberry Pi is unable to establish a connection with the cloud server, the data is stored in a backup database that uses the storage space of the micro SD card of the Raspberry Pi. This database is managed through phpMyAdmin and is accessible only to network administrators. When the Raspberry Pi reconnects to the private cloud server, the data is flushed to the cloud database.
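This buffer-and-flush behaviour could be implemented roughly as follows, assuming a simple readings table with a synced flag; the table schema, hosts and credentials are placeholders, not details taken from the paper.

```python
# Hypothetical sketch of the store-locally-then-flush behaviour: sensor rows go
# into the on-device MySQL database (the one administered via phpMyAdmin) and
# are pushed to the cloud database whenever it becomes reachable.
import mysql.connector

LOCAL_DB = dict(host="localhost", user="pi", password="***", database="infant")
CLOUD_DB = dict(host="cloud.example.com", user="pi", password="***", database="infant")

def store_reading(sensor: str, value: float) -> None:
    """Append one sensor reading to the local backup database."""
    local = mysql.connector.connect(**LOCAL_DB)
    try:
        cur = local.cursor()
        cur.execute(
            "INSERT INTO readings (sensor, value, synced) VALUES (%s, %s, 0)",
            (sensor, value))
        local.commit()
    finally:
        local.close()

def flush_to_cloud() -> None:
    """Copy any unsynced local rows to the cloud database, if it is reachable."""
    try:
        cloud = mysql.connector.connect(**CLOUD_DB)
    except mysql.connector.Error:
        return                               # still offline; keep buffering locally
    local = mysql.connector.connect(**LOCAL_DB)
    try:
        lcur, ccur = local.cursor(), cloud.cursor()
        lcur.execute("SELECT id, sensor, value FROM readings WHERE synced = 0")
        for row_id, sensor, value in lcur.fetchall():
            ccur.execute("INSERT INTO readings (sensor, value) VALUES (%s, %s)",
                         (sensor, value))
            lcur.execute("UPDATE readings SET synced = 1 WHERE id = %s", (row_id,))
        cloud.commit()
        local.commit()
    finally:
        cloud.close()
        local.close()
```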
(b) Cloud Server
The data from all the sensors interfaced with the Raspberry Pi is stored on a private cloud which is accessible only to authorized people [9]. The parents can access this data from anywhere using their web browser, username and password. The complex programs need a high-speed processor to generate results quickly. They cannot be run on the Raspberry Pi because it has less processing power than a normal PC. Therefore, data from the Raspberry Pi is transferred through the private cloud to the data analysis unit where the complex programs are stored and executed. The data, in the form of images or audio, is processed by the programs on the data analysis unit to analyze the emotions of the infant using image processing and audio processing techniques. The evaluated results are sent through the private cloud to the webpage.

(c) Webpage
All the data from the sensors is processed by the Raspberry Pi and uploaded to the webpage to be displayed in the user interface. The web page and the database are hosted by a local server which is set up on the Raspberry Pi using the Apache HTTP Server. The tools to be installed on the Raspbian OS of the Raspberry Pi to make these features effective are Python, MySQL and Apache. The web page also has an 'Alerts' section so that the user is notified in case of an emergency.

C. DATA ANALYSIS UNIT

The data analysis unit receives audio and image data from the data transfer unit. The cry detection model is used to detect whether the infant is crying or not, the cry analysis model is used to predict the cause of crying and the emotion recognition model is used to determine the emotion of the infant. The result of the cry detection model is sent to the Raspberry Pi, which switches on the infant soothing system if a cry is detected. The result of the emotion recognition model is sent to the user interface through the data transfer unit.

(a) Cry Detection

The aim of the cry detection model [10] [11] is to automatically recognize whether an infant is crying or not. Audio data is read and feature engineering is performed. The input for feature extraction is the original audio data, the sampling frequency of the recorded signal and the sampling bits of the recorded signal. The output is the values of the respective features: frequency, pitch frequency, Short-Term Energy (STE), maximum STE acceleration and Mel-Frequency Cepstral Coefficients (MFCC). These features are obtained using the Librosa library tools which compute the zero crossing rate, MFCC, spectral centroid, spectral roll-off and spectral bandwidth. The frequency spectrum is used to estimate the distribution of the voice frequency. For an infant's cry the maximum STE acceleration is high. If more than 5 frames of the recorded signal arrive at a pitch frequency over 200 Hz, then the audio sample has a higher possibility of being an infant cry. Using these features, the prediction step decides whether the recorded audio signal is a cry or not.

Figure 3: Cry detection
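A hedged sketch of this feature-engineering step using Librosa is shown below; the per-frame pitch estimator, the STE-acceleration proxy and the numeric settings are assumptions layered on the description above, not the authors' exact code.

```python
# Hypothetical sketch of the feature-engineering step: extract the Librosa
# features listed above from one 10 s clip and apply the pitch heuristic
# (more than 5 frames above 200 Hz suggests a cry).
import numpy as np
import librosa

def extract_cry_features(path: str) -> dict:
    y, sr = librosa.load(path, sr=None)                 # keep the original sampling rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    zcr = librosa.feature.zero_crossing_rate(y)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    ste = librosa.feature.rms(y=y) ** 2                 # short-term energy per frame
    max_ste_accel = np.max(np.abs(np.diff(ste, n=2)))   # crude "max STE acceleration"
    pitch = librosa.yin(y, fmin=80, fmax=600, sr=sr)    # per-frame pitch estimate
    return {
        "mfcc_mean": mfcc.mean(axis=1),
        "zcr_mean": float(zcr.mean()),
        "centroid_mean": float(centroid.mean()),
        "rolloff_mean": float(rolloff.mean()),
        "bandwidth_mean": float(bandwidth.mean()),
        "max_ste_accel": float(max_ste_accel),
        "high_pitch_frames": int(np.sum(pitch > 200.0)),
    }

feats = extract_cry_features("clip.wav")
likely_cry = feats["high_pitch_frames"] > 5             # the >5 frames over 200 Hz rule
```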


The support vector classifier (SVC) is used to build the training model. The model is trained using the ESC-50 dataset for environmental sound classification [12]. The signals recorded by the USB microphone are transferred to the private cloud and taken as input for the prediction step. The output of the cry prediction step is binary: logic 1 if crying is detected and logic 0 if crying is not detected. If an infant cry is detected, a digital signal is sent to the Raspberry Pi which switches on the infant soothing system.
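The classifier stage might look like the following scikit-learn sketch, where the binary cry/no-cry labels and the saved feature matrix are assumed to have been prepared beforehand (e.g. from ESC-50 plus cry recordings); it is an illustration, not the authors' implementation.

```python
# Hypothetical sketch of the cry-detection classifier: train an SVC on feature
# vectors (e.g. built from the extractor above) with binary cry / not-cry labels.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_clips, n_features) feature matrix, y: 1 = cry, 0 = no cry (assumed to exist)
X = np.load("features.npy")
y = np.load("labels.npy")

cry_detector = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
cry_detector.fit(X, y)

def predict_cry(feature_vector: np.ndarray) -> int:
    """Return logic 1 if the clip is classified as a cry, else logic 0."""
    return int(cry_detector.predict(feature_vector.reshape(1, -1))[0])
```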
(b) Cry Analysis

The donateacry corpus dataset [13] is used to train the cry analysis model. This dataset contains samples of an infant's cry for 5 different reasons, which form the 5 classes: belly pain, burping, discomfort, hunger and tiredness. The first step is feature extraction, where the Mel-Frequency Cepstral Coefficients (MFCC) are computed for all audio samples of the training dataset. Based on the MFCC values, the random forest classifier is used for classification. When an infant cry is detected, the Raspberry Pi records more audio samples and passes them along with the previously recorded audio samples to the private cloud, which passes the audio data to the cry analysis model. The reason for the cry is predicted and sent to the user interface for display.

Figure 4: MFCC spectrum

Figure 5: MFCC values stored in an array
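A minimal scikit-learn sketch of this MFCC-plus-random-forest pipeline is given below; the per-class folder layout of the donateacry corpus and the forest size are assumptions.

```python
# Hypothetical sketch of the cry-analysis step: average MFCC vectors per clip
# from the donateacry corpus and fit a random forest over the five cry causes.
# The directory layout (one sub-folder per class) is an assumption.
import os
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

CLASSES = ["belly_pain", "burping", "discomfort", "hungry", "tired"]

def mfcc_vector(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

X, y = [], []
for label, cls in enumerate(CLASSES):
    for name in os.listdir(os.path.join("donateacry", cls)):
        X.append(mfcc_vector(os.path.join("donateacry", cls, name)))
        y.append(label)

cry_analyzer = RandomForestClassifier(n_estimators=200, random_state=0)
cry_analyzer.fit(np.array(X), np.array(y))

def cry_cause(path: str) -> str:
    """Predict one of the five causes for a newly recorded cry clip."""
    return CLASSES[int(cry_analyzer.predict(mfcc_vector(path).reshape(1, -1))[0])]
```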

(c) Emotion Recognition

An experiment is performed on the FER-2013 dataset [15] available on Kaggle. The dataset consists of 35,887 grey-scale images of 48x48 resolution. Kaggle has broken it down into 28,709 training images, 3,589 public test images and 3,589 private test images. Each image contains a human face and is labeled with one of seven emotions: angry, disgust, fear, happy, sad, surprise and neutral. The proposed work has chosen a model derived from the Xception model for image classification. All convolution operations are performed in separable convolution layers which are followed by batch normalization. The proposed model uses a compact form of this architecture known as mini-Xception. The mini-Xception [14] architecture combines residual modules and depth-wise separable convolutions. Residual modules change the mapping between two adjacent layers, so that the learned features become the difference of the original feature map and the desired features. Depth-wise separable convolutions are constituted of point-wise convolutions and depth-wise convolutions. The use of these layers is to separate the spatial cross-correlations and the channel cross-correlations. This is done by applying a D x D filter on each of the M input channels and then applying N 1 x 1 x M convolution filters to combine the M input channels into N output channels. Applying 1 x 1 x M convolutions directly combines each value in the feature map without taking into account their spatial relation within the channel. Depth-wise separable convolutions reduce the computation with respect to standard convolutions by a factor of 1/N + 1/D^2. Our final architecture is a fully-convolutional neural network that comprises 4 residual depth-wise separable convolution blocks, where each convolution is followed by a batch normalization operation and a ReLU activation function. The final layer uses global average pooling to reduce the overall size of the representation and a soft-max activation function with seven units, one for each of the seven emotions, to make a prediction. This architecture uses the Adam optimizer. It has nearly 60,000 parameters.

Figure 6: Proposed architecture of the mini-Xception network
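One way to realize the described architecture in Keras is sketched below; the per-block filter counts and training settings are assumptions chosen to land near the stated parameter budget, not the authors' exact configuration.

```python
# Hypothetical Keras sketch of the mini-Xception-style classifier described
# above: stacked residual blocks of depth-wise separable convolutions, batch
# normalization and ReLU, closed by global average pooling and a 7-way softmax.
from tensorflow.keras import layers, Model

def residual_sep_block(x, filters: int):
    """One residual module of two separable convolutions plus a 1x1 shortcut."""
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same", use_bias=False)(x)
    shortcut = layers.BatchNormalization()(shortcut)
    y = layers.SeparableConv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.SeparableConv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.MaxPooling2D(3, strides=2, padding="same")(y)
    return layers.Add()([y, shortcut])

inputs = layers.Input(shape=(48, 48, 1))            # FER-2013 grey-scale faces
x = layers.Conv2D(8, 3, padding="same", use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
for filters in (16, 32, 64, 128):                   # 4 residual separable blocks
    x = residual_sep_block(x, filters)
x = layers.Conv2D(7, 3, padding="same")(x)          # one map per emotion class
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Activation("softmax")(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

With these assumed filter counts the network comes out at roughly the 60,000-parameter scale quoted above, small enough to train quickly on FER-2013.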
D. USER INTERFACE

A Graphical User Interface (GUI) has been developed to facilitate the user's interaction with the infant. It enables the user to check the temperature and humidity of the infant's surroundings and whether the bed is wet.


Other features include procuring an image of the infant, which helps the user decide if there is a need to play music or swing the cradle so as to soothe the crying infant. The web pages serve the purpose of the GUI completely and can be accessed from anywhere over the internet with the correct username and password. The programming of the web pages is done using HTML5. CSS (Cascading Style Sheets) is used to make the pages more interactive and to ease the flow of commands from the user to the cradle. PHP is used to develop the back-end functions of the webpages, which include retrieving sensor data from the database and the transmission and reception of Raspberry Pi signals regarding cradle, sensor, camera and speaker control.

Figure 7: Login Page

Figure 8: Home Page of GUI

V. RESULTS

A. Emotion Recognition

The emotion recognition model achieved an accuracy of 66% on the dataset.

Figure 9: Infant 92.90% happy

Figure 10: Infant 64.15% sad

B. Cry Detection

The cry detection model could accurately determine whether the given audio input was a cry or not.

Figure 11: Output for Infant Cry

Figure 12: Output for No Infant Cry

C. Cry Analysis

The cry analysis model was tested with the testing dataset and an accuracy of 56% was achieved.

Figure 13: Cause of cry determined

D. GUI

The GUI ran successfully, allowing the user to choose from the options of playing music, swinging the cradle, capturing an image and displaying sensor data.


Figure 14: Sensor readings

VI. CONCLUSION

The Infant Care Assistant is able to collect essential information about the infant and its surroundings through sensors, process this information to determine the state of the infant through audio and image processing, and soothe the crying infant by automatically rocking the cradle, playing music or regulating the room temperature. The Infant Care Assistant is thus able to assist busy parents in taking care of their infants. Furthermore, the system is able to store important data and raise alerts. Finally, it is made user friendly and interactive by the incorporation of a GUI.

VII. LIMITATIONS

In situations such as bed-wetting or the infant trying to escape the cradle, there are no immediate responses from the system other than raising alerts. An immediate response from the parents would be needed in such situations. The sensors used are basic, whereas a real-life model would need sophisticated sensors. The computational power of the Raspberry Pi limits the response time and overall efficiency of the system.

VIII. FUTURE SCOPE

Additional features such as an Android application can be integrated to receive a live feed of the infant, control the music, rock the cradle and also adjust the temperature and other parameters as required. By studying the behavioral patterns of the infant, a deep learning model can be trained to identify and implement better conditions for the infant.

REFERENCES
[1] Grand View Research, "Smart baby monitor market size, share & trends analysis report by product (audio & video, tracking devices), by distribution channel (offline, online), by region, and segment forecasts, 2019-2025", September 2019.
[2] Misha Goyal and Dilip Kumar, "Automatic E-Baby Cradle Swing based on Baby Cry", International Journal of Computer Applications, 71(21):39-43, June 2013.
[3] Y. Lavner, R. Cohen, D. Ruinskiy and H. Ijzerman, "Baby cry detection in domestic environment using deep learning", 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE), Eilat, 2016, pp. 1-5.
[4] A. Mercy Rani and R. Durgadevi, "Image Processing Techniques to Recognise Facial Emotions", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 6, Issue 6, August 2017.
[5] Punith Kumar M B and P. S. Puttaswamy, "Video to Frame Conversion of TV News Video by Using MATLAB", International Journal of Advance Research in Science and Engineering (IJARSE), vol. 3, issue 3, March 2014, ISSN 2319-8354(E).
[6] Kanchan Lata Kashyap and Sanjivani Shantaiya, "Noise Removal of Facial Expression Images Using Wiener Filter", National Conference on Emerging Trends in Computer Science and Information Technology (ETCSIT) 2011, proceedings published in International Journal of Computer Applications (IJCA).
[7] Samir K. Bandyopadhyay, "A Method for Face Segmentation, Facial Feature Extraction and Tracking", IJCSET, vol. 1, issue 3, pp. 137-139, April 2011.
[8] Gonzalez and Woods, "Digital Image Processing", Pearson Education, India, Third Edition.
[9] S. Emima Princy and K. Gerard Joe Nigel, "Implementation of cloud server for real time data storage using Raspberry Pi", 2015 Online International Conference on Green Engineering and Technologies (IC-GET).
[10] Lichuan Liu, Yang Li and Kevin Kuo, "Infant Cry Signal Detection, Pattern Extraction and Recognition", 2018 International Conference on Information and Computer Technologies.
[11] Troy-Wang, BabyCryDetector, 2018. Available: https://github.com/Troy-Wang/BabyCryDetector
[12] Karol J. Piczak, "ESC: Dataset for Environmental Sound Classification", Harvard Dataverse, V2, 2015. https://doi.org/10.7910/DVN/YDEPUT
[13] Gveres, Donateacry-corpus, 2015. Available: https://github.com/gveres/donateacry-corpus
[14] Lawrence S. Chen, Thomas S. Huang, Tsutomu Miyasato and Ryohei Nakatsu, "Multimodal human emotion/expression recognition", in Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 366-371, IEEE, 1998.
[15] Omar, Emotion-Recognition, 2018. Available: https://github.com/omar178/Emotion-recognition

