Abstract—With evolving times, working parents have become the norm in the emerging contemporary society. This has led to an increased demand for products that assist parents in taking care of their infants. This paper aims to showcase an Infant Care Assistant which employs an IoT sensor network and a Raspberry Pi to collect data on the current state of the infant and its surroundings, along with automation techniques for soothing a troubled infant. The assistant also includes a cry detection unit based on a support vector classifier, a cry analysis unit based on a random forest classifier and an emotion recognition unit based on the mini-Xception convolutional neural network model. Furthermore, it stores data using phpMyAdmin and private cloud servers and includes a graphical user interface built using HTML5 and CSS. The results of the proposed experiment indicate that this assistant could ease some of the workload of parents and enable them to take utmost care of their infants.

Keywords—Image processing, Raspberry pi, Random forest classifier, Convolution neural network, IoT

I. INTRODUCTION

Many parents are unable to devote sufficient time to infants on account of office work or being short-handed. Additionally, there are also many first-time parents who lack experience in raising children. Infants, on the other hand, demand constant attention and care. Simple methods to immediately calm agitated infants need to be devised. Hence, there is a need to assist parents in taking care of their infants by providing them with a single product which would monitor their infants at all times, send notifications in case attention is required, raise alerts in case of emergency situations and provide real-time interaction between parents and infants. Furthermore, a market research report [1] suggests that the birth rate in a few countries has risen. Awareness regarding child safety has risen too. The global smart baby monitor market had a value of USD 972.6 million in 2018. These findings further bolster the demand for a pragmatic infant care assistant.

II. LITERATURE SURVEY

Automatic E-Baby Cradle Swing Based On Baby Cry (Misha Goyal and Dilip Kumar) [2] presents the design and implementation of a new indigenous low-cost E-Baby Cradle that swings automatically when the baby cries. The cradle speed can be controlled as required by the user. The system has an inbuilt alarm that indicates a wet mattress and a long duration of cry. In Baby Cry Detection in Domestic Environment Using Deep Learning (Yizhar Lavner, Rami Cohen, Dima Ruinskiy, Hans Ijzerman) [3], the authors propose two machine-learning algorithms for automatic detection of baby cry in audio recordings. The first algorithm is a logistic regression classifier. To train this classifier, features such as Mel-frequency cepstral coefficients, pitch and formants are extracted from the recordings. The second algorithm employs a convolutional neural network (CNN) operating on a log Mel-filter bank representation of the audio recordings. Performance evaluation of the algorithms is carried out using a database containing recordings of babies (0-6 months old) in domestic environments along with respective tags. The recordings contain sounds of baby cry, parents talking and doors opening and closing. The CNN classifier shows better performance than the logistic regression classifier. Image Processing Techniques to Recognize Facial Emotions (A. Mercy Rani, R. Durgadevi) [4] includes face detection, non-skin region extraction and morphological processing to recognize emotion. First, frame-based detection is implemented. Then image quality is analyzed. Face location is detected using the Viola-Jones algorithm. Extraction of the non-skin region and morphological operations are applied to the extracted image to obtain the facial features for recognition of facial emotions.

III. PROPOSED SOLUTION

The proposed solution is an Infant Care Assistant which consists of the infant monitoring, data transfer, data analysis and user interface units. The infant monitoring unit collects data from various sensors and creates a comfortable environment for the infant by controlling the cradle. The data transfer unit is a medium for the transfer of data between the user, the infant monitoring unit and the data analysis unit. The data analysis unit comprises the cry detection, cry analysis and emotion recognition units which determine the emotional state of the infant. The user interface gives the user visual and operational control over the infant monitoring system.
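As a rough illustration of how these four units interact, the following Python sketch shows a hypothetical top-level loop on the Raspberry Pi; every function name, field and polling interval here is a placeholder for illustration, not the actual implementation described in this paper.

# Hypothetical top-level loop tying the four units together.
# All function names, fields and timings are illustrative placeholders.
import time

def read_sensors():
    """Infant monitoring unit: poll temperature, PIR, microphone, camera (stubbed)."""
    return {"temperature_c": 24.0, "motion": False, "audio_clip": None, "frame": None}

def upload(data):
    """Data transfer unit: push sensor data to the private cloud / local database (stubbed)."""
    pass

def fetch_analysis_results():
    """Data transfer unit: pull cry-detection and emotion results computed remotely (stubbed)."""
    return {"cry_detected": False, "emotion": "neutral"}

def soothe_infant():
    """Infant soothing system: swing the cradle, play music (stubbed)."""
    pass

def notify_user(results):
    """User interface: update the web page and raise alerts (stubbed)."""
    pass

if __name__ == "__main__":
    while True:
        data = read_sensors()
        upload(data)                         # Raspberry Pi -> data analysis unit
        results = fetch_analysis_results()   # data analysis unit -> Raspberry Pi
        if results["cry_detected"]:
            soothe_infant()                  # automatic soothing on a detected cry
        notify_user(results)                 # results shown on the GUI
        time.sleep(5)                        # assumed polling interval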
A USB microphone is connected to the raspberry pi. When a loud sound is detected, it records the sound and cuts it continuously into audio signals of 10 s each. These audio signals are then processed for cry detection and cry analysis.

4. PIR Motion sensor

The PIR motion sensor is used to monitor the movement of the infant and inform the parents of the same. If the PIR sensor detects motion, data from the camera module is processed to verify whether the infant is awake or not. The sleep pattern is analyzed for more accurate results. The PIR motion sensor is switched off when the servo motor swings the cradle in order to avoid inaccuracies.

5. Camera module

Raspberry Pi Camera Module v2 is a high quality 8 megapixel image sensor. The camera module is mounted to focus directly

(c) Infant Soothing System

1. Servo Motor

Two 1.8 kg-cm torque servo motors are used to swing the cradle based on the PWM input given to them. The two servo motors are connected to a rod which is attached to the cradle. One servo motor moves from 0 to 180 degrees while the other moves from 180 to 0 degrees. The swinging mechanism can be used to put the infant to sleep. If an infant cry is detected, the servo motors are activated and the cradle is made to swing till the crying stops. The parents can also control the swinging of the cradle externally.

2. Speaker

The audio output of the raspberry pi is connected to a speaker. This is used to put the infant to sleep, silence the infant when it is crying or simply to entertain the infant.
3. Fan

When the temperature of the room is above 23 degrees Celsius, the fan is switched on by the raspberry pi. The fan is directed towards the infant to increase comfort. The fans can be switched on and off remotely.
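A minimal Python sketch of the soothing hardware described above (two opposed servos driven by PWM and a temperature-gated fan) is given below. The BCM pin numbers, the 50 Hz PWM frequency, the duty-cycle mapping and the swing delay are assumptions for illustration; only the 23 degrees Celsius threshold comes from the text.

# Sketch of the infant soothing system: two servos swinging in opposition
# plus a fan switched on above 23 degrees Celsius. Pin numbers, the 50 Hz
# PWM frequency and the duty-cycle mapping are assumed values.
import time
import RPi.GPIO as GPIO

SERVO_A, SERVO_B, FAN_RELAY = 17, 27, 22   # assumed BCM pin numbers
TEMP_THRESHOLD_C = 23.0                    # fan threshold from the paper

GPIO.setmode(GPIO.BCM)
GPIO.setup([SERVO_A, SERVO_B, FAN_RELAY], GPIO.OUT)
pwm_a = GPIO.PWM(SERVO_A, 50)              # 50 Hz is typical for hobby servos
pwm_b = GPIO.PWM(SERVO_B, 50)
pwm_a.start(0)
pwm_b.start(0)

def angle_to_duty(angle):
    """Map 0-180 degrees to an approximate 2.5-12.5 % duty cycle."""
    return 2.5 + (angle / 180.0) * 10.0

def swing_once(delay=0.7):
    """One swing: servo A goes 0 -> 180 while servo B goes 180 -> 0, then back."""
    pwm_a.ChangeDutyCycle(angle_to_duty(180))
    pwm_b.ChangeDutyCycle(angle_to_duty(0))
    time.sleep(delay)
    pwm_a.ChangeDutyCycle(angle_to_duty(0))
    pwm_b.ChangeDutyCycle(angle_to_duty(180))
    time.sleep(delay)

def update_fan(temperature_c):
    """Switch the fan relay on when the room is warmer than the threshold."""
    GPIO.output(FAN_RELAY, GPIO.HIGH if temperature_c > TEMP_THRESHOLD_C else GPIO.LOW)

def soothe(cry_detected, temperature_c):
    """Swing the cradle while a cry is reported and keep the fan state updated."""
    update_fan(temperature_c)
    if cry_detected:
        swing_once()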
Figure 2: Block diagram of the Infant monitoring system

B. DATA TRANSFER UNIT

The data transfer unit is responsible for transferring data from the raspberry pi to the data analysis unit and the user interface.

(a) Local Storage

In case the raspberry pi is unable to establish a connection with the cloud server, the data is stored on a backup database that uses the storage space of the micro SD card of the raspberry pi. This database is managed through phpMyAdmin and is accessible only to network administrators. When the raspberry pi connects to the private cloud server, the data is flushed to the cloud database.

(b) Cloud Server

The data from all the sensors interfaced with the raspberry pi is stored on a private cloud which is accessible only to authorized people [9]. The parents can access this data from anywhere using their web browser, username and password. The complex programs need a high-speed processor to generate results quickly; they cannot be run on the raspberry pi because it has less processing power than a normal PC. Therefore, data from the raspberry pi is transferred through the private cloud to the data analysis unit where the complex programs are stored and executed. The data, in the form of images or audio, is processed by the programs on the data analysis unit to analyze the emotions of the infant using image processing and audio processing techniques. The evaluated results are sent through the private cloud to the webpage.

(c) Webpage

All the data from the sensors is processed by the raspberry pi and uploaded to the webpage to be displayed in the user interface. The web page and the database are hosted by a local server which is set up on the raspberry pi using the Apache HTTP Server. The tools to be installed on the Raspbian OS of the raspberry pi to make these features effective are Python, MySQL and Apache. The web page also has an 'Alerts' section so that the user is notified in case of an emergency.
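A minimal sketch of the store-locally-then-flush pattern described under Local Storage, assuming a MySQL table named sensor_readings administered through phpMyAdmin and accessed with the pymysql client; the hostnames, credentials and table schema are placeholders, not the paper's actual configuration.

# Sketch of the backup-and-flush storage logic: write to the local MySQL
# database on the Pi's SD card, and copy pending rows to the cloud database
# whenever a connection is available. Hostnames, credentials and the
# sensor_readings schema are illustrative placeholders.
import pymysql

LOCAL_DB = dict(host="localhost", user="pi", password="***", database="infantcare")
CLOUD_DB = dict(host="cloud.example.com", user="pi", password="***", database="infantcare")

def store_reading(sensor, value):
    """Always write to the local backup database first so no data is lost offline."""
    conn = pymysql.connect(**LOCAL_DB)
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO sensor_readings (sensor, value, synced) VALUES (%s, %s, 0)",
                (sensor, value),
            )
        conn.commit()
    finally:
        conn.close()

def flush_to_cloud():
    """Copy unsynced rows to the cloud database, then mark them as synced locally."""
    try:
        cloud = pymysql.connect(**CLOUD_DB, connect_timeout=5)
    except pymysql.MySQLError:
        return  # still offline; data stays in the local backup database
    local = pymysql.connect(**LOCAL_DB)
    try:
        with local.cursor() as lcur, cloud.cursor() as ccur:
            lcur.execute("SELECT id, sensor, value FROM sensor_readings WHERE synced = 0")
            for row_id, sensor, value in lcur.fetchall():
                ccur.execute(
                    "INSERT INTO sensor_readings (sensor, value) VALUES (%s, %s)",
                    (sensor, value),
                )
                lcur.execute("UPDATE sensor_readings SET synced = 1 WHERE id = %s", (row_id,))
        cloud.commit()
        local.commit()
    finally:
        local.close()
        cloud.close()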
C. DATA ANALYSIS UNIT

The data analysis unit receives audio and image data from the data transfer unit. The cry detection model is used to detect whether the infant is crying or not, the cry analysis model is used to predict the cause of crying and the emotion recognition model is used to determine the emotion of the infant. The result of the cry detection model is sent to the Raspberry pi, which switches on the infant soothing system if a cry is detected. The result of the emotion recognition model is sent to the user interface through the data transfer unit.

(a) Cry Detection

The aim of the Cry Detection model [10][11] is to automatically recognize whether an infant is crying or not. Audio data is read and feature engineering is performed. The input for feature extraction is the original audio data, the sampling frequency of the recorded signal and the sampling bits of the recorded signal. The output is the values of the respective features: frequency, pitch frequency, Short-term Energy (STE), Maximum STE acceleration and Mel-Frequency Cepstral Coefficients (MFCC). These features are obtained using the Librosa library tools, which compute the zero crossing rate, MFCC, spectral centroid, spectral roll-off and spectral bandwidth. The frequency spectrum is used to estimate the distribution of the voice frequency. For an infant's cry the maximum STE acceleration is high. If more than 5 frames of the recorded signal have a pitch frequency over 200 Hz, then the audio sample has a higher possibility of being an infant cry. Using these features, the prediction step decides whether the recorded audio signal is a cry or not.

Figure 3: Cry detection
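The feature extraction step described above can be sketched with the Librosa library as follows. The feature list mirrors the text, while the number of MFCCs, the pitch-search range, the use of RMS energy as the short-term energy measure and the example file path are assumptions rather than the paper's exact settings.

# Sketch of the cry-detection feature extraction using Librosa.
# Frame parameters, the pitch-search range and the file path are assumed.
import numpy as np
import librosa

def extract_features(path):
    """Return a feature vector and a simple pitch-based cry heuristic for one clip."""
    y, sr = librosa.load(path, sr=None)          # keep the original sampling rate

    zcr       = librosa.feature.zero_crossing_rate(y)
    mfcc      = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid  = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff   = librosa.feature.spectral_rolloff(y=y, sr=sr)
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    energy    = librosa.feature.rms(y=y)          # short-term energy per frame

    # Frame-level pitch estimate; infant cries tend to sit well above 200 Hz.
    f0 = librosa.yin(y, fmin=80, fmax=600, sr=sr)
    high_pitch_frames = int(np.sum(f0 > 200.0))
    likely_cry = high_pitch_frames > 5            # heuristic described in the text

    features = np.concatenate([
        zcr.mean(axis=1), mfcc.mean(axis=1), centroid.mean(axis=1),
        rolloff.mean(axis=1), bandwidth.mean(axis=1), energy.mean(axis=1),
    ])
    return features, likely_cry

if __name__ == "__main__":
    vec, likely_cry = extract_features("clip_0001.wav")  # placeholder 10 s recording
    print(vec.shape, likely_cry)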
The support vector classifier (SVC) is used to build the training model. The model is trained using the ESC-50: Dataset for Environmental Sound Classification [12]. The signals recorded by the USB microphone are transferred to the private cloud and taken as input for the prediction step. The output of the cry prediction step is binary: logic 1 if crying is detected and logic 0 if crying is not detected. If an infant cry is detected, a digital signal is sent to the Raspberry pi which switches on the infant soothing system.

(b) Cry Analysis

The donateacry-corpus dataset [13] is used to train the cry analysis model. This dataset contains samples of an infant's cry for 5 different reasons, which form the 5 groups of classification: belly pain, burping, discomfort, hunger and tiredness. The first step is feature extraction, where the Mel-Frequency Cepstral Coefficients (MFCC) are computed for all audio samples of the training dataset. Based on the MFCC values, the random forest classifier is used for classification. When an infant cry is detected, the raspberry pi records more audio samples and passes them along with the previously recorded audio samples to the private cloud, which passes the audio data to the cry analysis model. The reason for the cry is predicted and sent to the user interface for display.
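A minimal sketch of the cry analysis training step follows, assuming the donateacry-corpus clips have been organized into one folder per class; the folder names, MFCC averaging, train/test split and forest size are assumptions rather than the paper's exact settings.

# Sketch of the cry-analysis model: mean MFCC features per clip fed to a
# random forest over the five cry classes. The dataset layout (one folder
# per class) and the hyperparameters are assumed for illustration.
import glob
import os
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

CLASSES = ["belly_pain", "burping", "discomfort", "hungry", "tired"]  # assumed folder names

def mfcc_vector(path, n_mfcc=13):
    """Average the MFCC matrix over time to get one fixed-length vector per clip."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

X, labels = [], []
for label, name in enumerate(CLASSES):
    for path in glob.glob(os.path.join("donateacry", name, "*.wav")):
        X.append(mfcc_vector(path))
        labels.append(label)

X, labels = np.array(X), np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))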
(c) Emotion Recognition

The Xception architecture consists of depth-wise separable convolution layers which are followed by batch normalization. The proposed model uses a compact form of this architecture known as mini-Xception. The mini-Xception [14] architecture combines residual modules and depth-wise separable convolutions. Residual modules change the mapping between two adjacent layers, so that the learned features become the difference of the original feature map and the desired features. Depth-wise separable convolutions are composed of point-wise convolutions and depth-wise convolutions. These layers are used to separate the spatial cross-correlations from the channel cross-correlations. This is done by applying a D × D filter on each of the M input channels and then applying N 1 × 1 × M convolution filters to combine the M input channels into N output channels. Applying 1 × 1 × M convolutions directly combines each value in the feature map without taking into account their spatial relation within the channel. Depth-wise separable convolutions reduce the computation with respect to standard convolutions by a factor of 1/N + 1/D². Our final architecture is a fully-convolutional neural network that comprises 4 residual depth-wise separable convolutions, where each convolution is followed by a batch normalization operation and a ReLU activation function. The final layer uses global average pooling to reduce the overall size of the representation and a soft-max activation function with seven units, one for each of the seven emotions, to make a prediction. This architecture uses the Adam optimizer and has nearly 60,000 parameters.
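For a sense of the saving, with D = 3 and, say, N = 128 output channels the reduction factor 1/N + 1/D² ≈ 0.12, i.e. roughly an eight-fold cut in multiplications compared with a standard convolution. The residual depth-wise separable block described above can be sketched in Keras as follows (after [14], [15]); the 48×48 grayscale input size, filter counts and pooling choices are assumptions drawn from common mini-Xception implementations, not values stated in this paper.

# Sketch of a mini-Xception-style emotion recognition network: residual
# depth-wise separable convolution blocks, batch normalization, ReLU,
# global average pooling and a 7-way softmax, trained with Adam.
# Input size and filter counts are assumptions for illustration.
from tensorflow.keras import layers, models, optimizers

def residual_separable_block(x, filters):
    """One residual module wrapping two depth-wise separable convolutions."""
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same", use_bias=False)(x)
    shortcut = layers.BatchNormalization()(shortcut)

    y = layers.SeparableConv2D(filters, 3, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.SeparableConv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.MaxPooling2D(3, strides=2, padding="same")(y)
    return layers.Add()([y, shortcut])          # learned features are a residual of the map

inputs = layers.Input(shape=(48, 48, 1))        # assumed grayscale face crop
x = layers.Conv2D(8, 3, padding="same", use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)

for filters in (16, 32, 64, 128):               # four residual separable blocks
    x = residual_separable_block(x, filters)

x = layers.Conv2D(7, 3, padding="same")(x)      # seven emotion classes
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Activation("softmax")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer=optimizers.Adam(), loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()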
V. RESULTS

A. Emotion Recognition

C. Cry Analysis

The cry analysis model was tested with the testing dataset and an accuracy of 56% was achieved for the same.

D. GUI

Figure 8: Home Page of GUI
Figure 12: Sensor readings

VI. CONCLUSION

The Infant Care Assistant is able to collect essential information about the infant and its surroundings through sensors, process this information to determine the state of the infant through audio and image processing, and soothe the crying infant by automatically rocking the cradle, playing music or regulating the room temperature. The Infant Care Assistant is able to successfully assist busy parents in taking care of their infants. Furthermore, the system is able to store important data and raise alerts. Finally, it is made user friendly and interactive by the incorporation of a GUI.

VII. LIMITATIONS

In situations such as bed-wetting or the infant trying to escape the cradle, there are no immediate responses from the system other than raising alerts. Immediate response from the parents would be needed in such situations. The sensors used are basic, whereas for a real-life model, sophisticated sensors will be needed. The computational power of the raspberry pi limits the response time and overall efficiency of the system.

VIII. FUTURE SCOPE

Additional features such as an Android application can be integrated to receive a live feed of the infant, control the music, rock the cradle and also adjust the temperature and other parameters as required. By studying the behavioral patterns of

REFERENCES

[1] Grand View Research, "Smart baby monitor market size, share & trends analysis report by product (audio & video, tracking devices), by distribution channel (offline, online), by region, and segment forecasts, 2019-2025", September 2019.
[2] Misha Goyal and Dilip Kumar, "Automatic E-Baby Cradle Swing based on Baby Cry", International Journal of Computer Applications, 71(21):39-43, June 2013.
[3] Y. Lavner, R. Cohen, D. Ruinskiy and H. Ijzerman, "Baby cry detection in domestic environment using deep learning", 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE), Eilat, 2016, pp. 1-5.
[4] A. Mercy Rani and R. Durgadevi, "Image Processing Techniques to Recognise Facial Emotions", International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume-6, Issue-6, August 2017.
[5] Punith Kumar M B and P. S. Puttaswamy, "Video to Frame Conversion of TV News Video by Using MatLab", International Journal of Advance Research in Science and Engineering (IJARSE), Vol. 3, Issue 3, March 2014, ISSN 2319-8354(E).
[6] Kanchan Lata Kashyap and Sanjivani Shantaiya, "Noise Removal of Facial Expression Images Using Wiener Filter", National Conference on Emerging Trends in Computer Science and Information Technology (ETCSIT) 2011, proceedings published in International Journal of Computer Applications (IJCA).
[7] Samir K. Bandyopadhyay, "A Method for Face Segmentation, Facial Feature Extraction and Tracking", IJCSET, Vol. 1, Issue 3, pp. 137-139, April 2011.
[8] Gonzalez and Woods, "Digital Image Processing", Pearson Education, India, Third Edition.
[9] S. Emima Princy and K. Gerard Joe Nigel, "Implementation of cloud server for real time data storage using Raspberry Pi", 2015 Online International Conference on Green Engineering and Technologies (IC-GET).
[10] Lichuan Liu, Yang Li and Kevin Kuo, "Infant Cry Signal Detection, Pattern Extraction and Recognition", 2018 International Conference on Information and Computer Technologies.
[11] Troy-Wang, BabyCryDetector, 2018. Available: https://github.com/Troy-Wang/BabyCryDetector
[12] Karol J. Piczak, "ESC: Dataset for Environmental Sound Classification", Harvard Dataverse, V2, 2015. https://doi.org/10.7910/DVN/YDEPUT
[13] Gveres, Donateacry-corpus, 2015. Available: https://github.com/gveres/donateacry-corpus
[14] Lawrence S. Chen, Thomas S. Huang, Tsutomu Miyasato and Ryohei Nakatsu, "Multimodal human emotion/expression recognition", in Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 366-371, IEEE, 1998.
[15] Omar, Emotion-Recognition, 2018. Available: https://github.com/omar178/Emotion-recognition