Advances in Decision
Sciences, Image
Processing, Security
and Computer Vision
International Conference on Emerging
Trends in Engineering (ICETE), Vol. 1
Learning and Analytics in Intelligent Systems
Volume 3
Series Editors
George A. Tsihrintzis, University of Piraeus, Piraeus, Greece
Maria Virvou, University of Piraeus, Piraeus, Greece
Lakhmi C. Jain, Faculty of Engineering and Information Technology,
Centre for Artificial Intelligence, University of Technology Sydney, NSW,
Australia; University of Canberra, Canberra, ACT, Australia; KES International,
Shoreham-by-Sea, United Kingdom; Liverpool Hope University, Liverpool, UK
The main aim of the series is to make available a publication of books in hard copy
form and soft copy form on all aspects of learning, analytics and advanced
intelligent systems and related technologies. The mentioned disciplines are strongly
related and complement one another significantly. Thus, the series encourages
cross-fertilization highlighting research and knowledge of common interest. The
series allows a unified/integrated approach to themes and topics in these scientific
disciplines which will result in significant cross-fertilization and research dissem-
ination. To maximize dissemination of research results and knowledge in these
disciplines, the series publishes edited books, monographs, handbooks, textbooks
and conference proceedings.
Editors

Suresh Chandra Satapathy
School of Computer Engineering
Kalinga Institute of Industrial Technology (KIIT) Deemed to be University
Bhubaneswar, Odisha, India

K. Srujan Raju
Department of CSE
CMR Technical Campus
Hyderabad, Telangana, India

K. Shyamala
Department of CSE
University College of Engineering, Osmania University
Hyderabad, Telangana, India

D. Rama Krishna
Department of ECE
University College of Engineering, Osmania University
Hyderabad, Telangana, India

Margarita N. Favorskaya
Institute of Informatics and Telecommunications
Reshetnev Siberian State University of Science and Technology
Krasnoyarsk, Russia
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Dedicated to
Our Alma Mater & Eminent Professors who
taught us
for their inspiring vision, unwavering
conviction and tireless efforts that have
resulted in nurturing hundreds of eminent
global citizens and effective human beings.
“Once an Osmanian Always an Osmanian”
University College of Engineering,
Osmania University, Hyderabad, India
University College of Engineering (UCE) has the distinction of being the oldest and
the biggest among the engineering colleges of the State of Telangana, India. It was
established in the year 1929, eleven years after the formation of Osmania
University. The college was the sixth engineering college to be established in the
whole of British India. The college moved to its present permanent building in the
year 1947. Today, it is the biggest among the campus colleges of Osmania
University. The golden jubilee of the college was celebrated in 1979, the diamond
jubilee in 1989 and the platinum jubilee in 2004. The college was made autono-
mous in 1994. University Grants Commission of India conferred autonomy status to
the college for a period of 6 years (2016–2017 to 2021–2022). The college offers
four-year engineering degree courses leading to the award of Bachelor of
Engineering (B.E.) in biomedical engineering, civil engineering, computer science
and engineering, electrical and electronics engineering, electronics and communi-
cations engineering and mechanical engineering. The college also offers graduate
programs and Ph.D. in the various branches of engineering. As of today, there is a
yearly intake of 320 undergraduate students (full-time) and 290 postgraduate stu-
dents (full-time and part-time). There are 143 teaching staff members, including 40
professors.
The UG programs offered have been accredited by the National Board of
Accreditation, New Delhi. Osmania University is accredited by NAAC with “A+”
Grade. UCE, OU, is the first engineering college to get ISO 9001 Certification in
Telangana State. University College of Engineering was awarded the Best
Engineering College by Indian Society for Technical Education (Telangana) in the
year 2010. UCE, OU, was adjudged as the Best Engineering College in the country
for the academic year 2003–2004 by Indian Society for Technical Education,
New Delhi, and by Star News for the years 2010–2011 and 2011–2012.
The college has successfully completed the Technical Education Quality
Improvement Programme (TEQIP-I) under the World Bank financial assistance of
Rs. 15.48 crores during the period 2003–2008. The outcome of the project has
resulted in: (i) an increase in the pass percentage of UG/PG students, (ii) a threefold
increase in staff research publications, (iii) the introduction of six PG programs in
niche areas, (iv) introduction of credit-based system and (v) substantial increase in
internal revenue generation.
The college has successfully completed Phase II of TEQIP program with a
financial assistance of Rs. 12.5 crores and additional grant of 5 crores under the
best-performing institution category. Recently, the college has been approved as a
minor center under QIP for full-time Ph.D. programs. The college has been selected
for the TEQIP Phase III twinning program with financial assistance of Rs. 7 crores. The
college has been granted the “Visvesvaraya Ph.D. Scheme for Electronics and IT” for its
full-time Ph.D. program. The GIAN program of MHRD has sanctioned 7 programs
in specialized research areas to the college. The college was ranked 80 in the NIRF
Engineering College Ranking Survey by MHRD, New Delhi, India, for the
year 2017–2018.
Alumni Association University College
of Engineering, Osmania University,
Hyderabad, India
In the past four years, the Executive Body set out to execute the above objectives
by taking up many initiatives like conducting global alumni meets, alumni talks,
funding student innovation, patent and research, facilitating student internships,
industry interactions and career development programs, support for student clubs
and other activities, facilitating in setting up the technology business incubator, etc.
To further the objectives of the Association to support the faculty and research
scholars, the Association has organized the First International Conference on
Emerging Trends in Engineering under its aegis.
Foreword
Preface
Margarita N. Favorskaya
Suresh Chandra Satapathy
K. Shyamala
D. Rama Krishna
K. Srujan Raju
Acknowledgements
We thank all the authors for their contributions and timely response. We also thank
all the reviewers who read the papers and made valuable suggestions for
improvement.
We would like to thank Prof. S. Ramachandram, Vice-Chancellor, Osmania
University, and Prof. M. Kumar, Principal, University College of Engineering, for
having faith in us. We thank Dr. D. Rama Krishna and Prof. K. Shyamala of UCE, OU, for
leading from the front; the TPC team, for pulling off a brilliant job; Heads of all
departments and all learned faculty, for all the support. Last but not least, we convey our
thanks to all the research scholars without whose relentless efforts this conference and
publication would not have seen the light of day.
We thank our sponsors Power Grid Corporation of India Ltd., Defence Research
and Development Organization (DRDO), CCL Products (India) Ltd., The Singareni
Collieries Company Ltd., TEQIP-III and all other financial contributors.
We extend our thanks to all the Executive Body members of Alumni Association
for their support and Sri. R. V. Rammohan Rao for the support when needed.
Finally, we thank the Springer team comprising Prof. Suresh Chandra Satapathy,
Prof. K. Srujan Raju and Dr. M. Ramakrishna Murthy for guiding and helping us
throughout.
ICETE Organizing Committee
Chief Patron
S. Ramachandram (Vice-chancellor), Osmania University, Hyderabad, India
Patrons
Kumar Molugaram (Principal), University College of Engineering (A), Osmania University, Hyderabad, India
P. Laxminarayana (Dean), Faculty of Engineering, Osmania University, Hyderabad, India
D. C. Reddy (Former Vice-chancellor), Osmania University, Hyderabad, India
D. N. Reddy (Former Vice-chancellor), Jawaharlal Nehru Technological University, Hyderabad, India
R. V. Rammohan Rao (Past President), Alumni Association, University College of Engineering (A), Osmania University, Hyderabad, India
Chairpersons
P. Ram Reddy (President), Alumni Association, University College of Engineering (A), Osmania University, Hyderabad, India
P. V. N. Prasad, Department of Electrical Engineering, University College of Engineering (A), Osmania University, Hyderabad, India
Conveners
D. Vijay Kumar (General Secretary), Alumni Association, University College of Engineering (A), Osmania University, Hyderabad, India
D. Rama Krishna, Department of Electronics and Communication Engineering, University College of Engineering (A), Osmania University, Hyderabad, India
Publication Committee
Suresh Chandra Satapathy (Chair), School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Deemed to be University, Bhubaneswar, Odisha
Kumar Molugaram (Co-chair, Principal), University College of Engineering, Osmania University, Hyderabad, Telangana
K. Srujan Raju (Co-chair), Department of CSE, CMR Technical Campus, Hyderabad, Telangana
Sriram Venkatesh, Department of Mechanical Engineering, University College of Engineering, Osmania University, Hyderabad
K. Shyamala, Department of Computer Science and Engineering, University College of Engineering, Osmania University, Hyderabad
D. Vijay Kumar (General Secretary), Alumni Association, University College of Engineering (A), Osmania University, Hyderabad, India
D. Rama Krishna, Department of Electronics and Communication Engineering, University College of Engineering, Osmania University, Hyderabad
Finance Committee
Sriram Venkatesh ME, UCE, OU
A. Krishnaiah ME, UCE, OU
P. Ramesh Babu ME, UCE, OU
V. Bhikshma CE, UCE, OU
G. Mallesham EE, UCE, OU
M. Malini BME, UCE, OU
B. Rajendra Naik ECE, UCE, OU
V. Uma Maheshwar ME, UCE, OU
P. Naveen Kumar ECE, UCE, OU
D. N. Prasad (Advisor (Coal)) SCCL
M. Shyam Prasad Reddy (General Secretary) TREA
T. Venkatesam (Superintendent Engineer (Retd.)) AA UCE, OU
M. S. Venkatramayya Mining
Satish Naik AA UCE, OU
R. Thomas AA UCE, OU
Syed Basharath Ali AA UCE, OU
P. Narotham Reddy AA UCE, OU
Organizing Committee
E. Vidya Sagar (Vice-principal) UCE, OU
K. Shyamala CSE, UCE, OU
P. Chandra Sekhar ECE, OU
M. Gopal Naik CE, UCE, OU
P. Usha Sri ME, UCE, OU
M. Venkateswara Rao BME, UCE, OU
M. V. Ramana Rao EED, UCE, OU
G. Yesuratnam EED, UCE, OU
P. Raja Sekhar CE, UCE, OU
B. Mangu EED, UCE, OU
M. Chandrashekhar Reddy ME, UCE, OU
Narsimhulu Sanke ME, UCE, OU
M. A. Hameed CSE, UCE, OU
B. Sujatha CSE, UCE, OU
L. Nirmala Devi ECE, UCE, OU
N. Susheela EED, UCE, OU
S. Prasanna CE, UCE, OU
Technical Committee
K. Shyamala CSE, UCE, OU
P. V. Sudha CSE, UCE, OU
M. Manjula EED, UCE, OU
B. Mangu EED, UCE, OU
P. Satish Kumar EED, UCE, OU
J. Upendar EED, UCE, OU
M. Malini BME, UCE, OU
D. Suman BME, UCE, OU
K. L. Radhika CE, UCE, OU
K. Shashikanth CE, UCE, OU
L. Siva Rama Krishna ME, UCE, OU
E. Madhusudan Raju ME, UCE, OU
R. Hemalatha ECE, UCE, OU
M. Shyamsunder ECE, UCE, OU
About the Editors
journals, and also he was on the editorial board of CSI 2014 Springer AISC series;
337 and 338 volumes, IC3T 2014, IC3T 2015, IC3T 2016, ICCII 2016 and
ICCII 2017 conferences. In addition to this, he has served as reviewer for many
indexed national and international journals. He is also awarded with Significant
Contributor and Active Young Member Awards by Computer Society of India
(CSI). He also authored 4 textbooks and filed 7 patents so far.
A Review on Impact Application of Heart Rate
Variability (HRV)
Abstract. The heart is the principal organ of the body because it circulates
deoxygenated and oxygenated blood. Heart rate variation reflects many physiological
and pathological parameters that alter the normal-to-normal intervals of the heartbeat.
HRV is an essential tool in cardiology, used as a non-invasive measurement technique
to obtain pathological information about patients who suffer from, or are at risk of,
cardiac disease. Analysis of HRV can facilitate the understanding of the autonomic
nervous system (ANS) and can predict cardiac health. HRV shows the variation in the
time interval between heartbeats, and it is a reliable indicator both of existing disease
and of the likelihood that a person may develop a cardiac disorder. In this paper we
give a brief review of the clinical applications of HRV and of the measurement
techniques used for its analysis, namely time-domain, frequency-domain and
non-linear techniques.
1 Introduction
Fig. 1. RR interval
HRV is a non-invasive, pre-marker technique for assessing the health status of the
autonomic nervous system (ANS), which maintains the normal rhythm of the heartbeat.
The interval between consecutive heartbeats is known as the RR interval [2]. HRV
reflects the normal-to-normal intervals between heartbeats corresponding to changes in
heart rate (HR). The normal physiological fluctuation in HR is caused by the ANS,
which additionally influences the working of the inner body organs.
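As a concrete illustration of the RR interval described above, the following sketch detects R peaks in a single-lead ECG and converts the peak positions into an RR-interval series. It is a minimal example: the sampling rate, peak-height threshold and minimum peak distance are assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def rr_intervals_ms(ecg, fs=250.0):
    """Return RR intervals in milliseconds from a 1-D ECG array.

    fs is the sampling frequency in Hz (assumed here; use the real
    acquisition rate of your recorder).
    """
    ecg = np.asarray(ecg, dtype=float)
    # Simple R-peak detection: peaks must stand out from the signal and
    # be at least 0.3 s apart (i.e. heart rate below 200 bpm).
    height = ecg.mean() + 1.5 * ecg.std()
    peaks, _ = find_peaks(ecg, height=height, distance=int(0.3 * fs))
    # Successive differences between R-peak times give the RR series.
    return np.diff(peaks) / fs * 1000.0

# Example with a synthetic "ECG": an impulse train at roughly 70 bpm.
fs = 250.0
t = np.arange(0, 60, 1 / fs)
ecg = np.zeros_like(t)
ecg[(np.arange(0, 60, 0.857) * fs).astype(int)] = 1.0
print(rr_intervals_ms(ecg, fs)[:5])   # intervals close to 857 ms
```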
2 Literature Review
A brief review of heart rate variability (HRV) was given by Acharya et al. [8]. They stated
that HRV is a vital and powerful tool for detecting imbalance of the ANS, and described
the different physiological factors that can influence the regular heart rate (HR). Variation
in HR is an indicator of current cardiac disorders and a warning of future ones. The
authors also presented the different clinical applications of HRV.
Melillo et al. [9] developed a novel predictive model using a data mining algorithm
to provide information about the risk of hypertensive patients using HRV. The predictive
model is based on the random forest method, its accuracy is up to 87.8%, and the authors
concluded that HRV could be used to detect different cardiac events and hypertensive
patients. Lin et al. [10] derived features of HRV based on long-term monitoring. They
proposed two strategies to determine the physiological state of an individual from HRV,
namely hybrid learning and decision tree learning; together with the feature extraction
strategy, these techniques give a precision of up to 90.1%. Kim et al. [11] built an
emotion recognition system based on physiological signals such as body temperature,
skin response and ECG, as these parameters are influenced by the ANS; the features
were classified using a support vector machine (SVM) classifier.
Researchers of BARC (Bhabha Atomic Research Centre) and doctors of AIIMS (All
India Institute of Medical Sciences) contributed sections on innovations and new methods
for the investigation of physiological variability in the handbook Advanced Applications
of Physiological Variability (AAPV) by Jindal et al. [3]. The handbook gives a clear idea
of new measurement techniques, clinical applications, and different protocols for
long-term and short-term recording of physiological signals. Bravi et al. [12] presented
more than 70 variability techniques in their article and discussed the importance,
limitations and positive references related to the clinical application of HRV. The
authors gave an overview of the different feature extraction techniques and further
discussed the complexity of each technique so as to identify the most accurate way of
study. Verlinde et al. [13] presented a case study of athletes and compared the results
with healthy subjects. The HRV of aerobic athletes shows increased power in all
frequency bands, and the results were obtained by spectral analysis using the wavelet
transform. Wavelets can be an accurate tool for evaluating HRV because they can
resolve its oscillating components.
Mager et al. [14] developed an algorithm using the continuous wavelet transform for
power spectral analysis of HRV, which describes the correlation between autonomic and
cardiovascular function. Bračič and Stefanovska [15] examined the human bloodstream
in the time and frequency domains using a wavelet transform for different states of
cardiac arrhythmia and fibrillation. Panerai et al. [16] described the relationship between
HRV and blood pressure (BP) fluctuations in the frequency domain, demonstrating 10-s
fluctuations between the time interval and pressure variability. Nagy et al. [17] found
that baby boys have a lower HR variation than baby girls. HR variation in healthy
subjects aged 20 to 80 was examined, and it was noted that HRV declines with age in
females more than in males. Guzzetti et al. [18] studied the effect of a drug on HRV and
found that high-frequency (HF) fluctuation increases and low-frequency (LF) fluctuation
decreases with sympathetic activity; these fluctuations in sympathetic activity may lead
to heart attack. Lucini et al. [19] found that smokers have increased sympathetic activity
during HRV analysis; HRV is reduced during smoking, and the effect of smoking on
HRV harms the ANS. Malpas et al. [20] demonstrated that HRV is reduced with the
consumption of alcohol. Togo and Yamamoto [21] suggested that the conscious state of
the brain is reflected in HRV.
Jang et al. [22] expressed in their article that a disorder of the peripheral or the central
nervous system will affect HRV. The significance of HRV investigation in mental
illness arises because of an imbalance of sympathetic and parasympathetic action.
Wheeler and Watkins [23] confirmed that in the case of diabetic neuropathy there is a reduction in HRV.
3 Measurement Techniques of HRV
HRV is not a surgical method for examining the condition of the heart; it is used only
for the analysis of heart rhythm. Various procedures have been developed to evaluate
this beat-to-beat fluctuation and to provide indices of autonomic regulation in both
healthy and unhealthy conditions. There are two essential methodologies for the
investigation of HRV: linear and non-linear measurement techniques. The most often
utilized methods of HRV analysis are the linear ones, i.e., the time- and frequency-domain
methods, which give the spectral data about HRV [3]. In the spectrum of heart-rate
fluctuation the dominant ranges are very low frequency (VLF, 0.003 to 0.04 Hz), low
frequency (LF, 0.04 to 0.15 Hz) and high frequency (HF, 0.15 to 0.40 Hz); the LF and
HF bands reflect sympathetic and parasympathetic action respectively. Analysis of HRV
helps in mass screening, post-intervention analysis and disease characterization.
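To make the linear (time- and frequency-domain) analysis concrete, the sketch below computes two standard time-domain indices (SDNN and RMSSD) and the LF and HF band powers from an RR-interval series using Welch's method. The band edges follow the values quoted above; the 4 Hz resampling rate and the Welch segment length are assumed choices, not prescriptions from the paper.

```python
import numpy as np
from scipy.signal import welch

def hrv_linear_metrics(rr_ms, resample_hz=4.0):
    """Time- and frequency-domain HRV indices from RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                        # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))   # short-term variability

    # Resample the irregularly spaced RR series onto a uniform grid so a
    # power spectral density can be estimated (4 Hz is an assumed choice).
    t = np.cumsum(rr) / 1000.0
    t_uniform = np.arange(t[0], t[-1], 1.0 / resample_hz)
    rr_uniform = np.interp(t_uniform, t, rr)
    f, pxx = welch(rr_uniform - rr_uniform.mean(), fs=resample_hz, nperseg=256)

    df = f[1] - f[0]
    def band_power(lo, hi):
        mask = (f >= lo) & (f < hi)
        return pxx[mask].sum() * df

    lf = band_power(0.04, 0.15)    # low-frequency band
    hf = band_power(0.15, 0.40)    # high-frequency band
    return {"SDNN": sdnn, "RMSSD": rmssd, "LF": lf, "HF": hf, "LF/HF": lf / hf}

# Example: a slightly noisy 850 ms rhythm.
rng = np.random.default_rng(0)
rr = 850 + 30 * rng.standard_normal(300)
print(hrv_linear_metrics(rr))
```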
4 Clinical Application
5 Conclusion
In this paper, we have seen the research done on HRV by various researchers and
cardiologist, to improve the medical facilities for disease diagnosis and patient moni-
toring. Heart rate variability (HRV) becomes an essential noninvasive tool, and it is
easy to perform. HRV technique not only used in case of cardiac disease but also used
in other pathological conditions. The present review proposes that Heart rate variability
investigation utilizing ECG recording could be successful in the case of addiction,
stress, obesity, etc. It can be concluded that shortly we can get a more accurate result by
developing different Machine Learning algorithms for analysis of HRV.
Acknowledgments. The author is grateful to all of the researchers who contributed to
the research of Heart Rate Variability.
References
1. ChuDuc H, NguyenPhan K, Nguyen Viet D (2013) A review of heart rate variability and its
applications. APCBEE Procedia 7:80–85
2. Billman GE, Huikuri HV, Sacha J, Trimmel K (2015) An introduction to heart rate
variability: methodological considerations and clinical applications. Front Physiol 6:55
3. Jindal GD, Deepak KK, Jain RK (2010) A Handbook on Physiological Variability.
Advanced Application of Physiological Variability (AAPV)
4. Germán-Salló Z, Germán-Salló M (2016) Non-linear methods in HRV analysis. Procedia
Technol 22:645–651
5. Camm AJ, Malik M, Bigger JT, Breithardt G, Cerutti S, Cohen R, Coumel P,
Fallen E, Kennedy H, Kleiger RE, Lombardi F (1996) Heart rate variability: standards of
measurement, physiological interpretation, and clinical use. Task force of the European
Society of cardiology and the north American society of pacing and electrophysiology.
Circulation 93(5):1043–1065
6. Gang Y, Malik M (2003) Heart rate variability analysis in general medicine. Indian Pacing
Electrophysiol J 3(1):34
7. Camm AJ, Lüscher TF, Serruys PW (eds) (2009) The ESC textbook of cardiovascular
medicine. Oxford University Press
8. Acharya UR, Joseph KP, Kannathal N, Min LC, Suri JS (2007) Heart rate variability.
Advances in cardiac signal processing. Springer, Berlin, Heidelberg, pp 121–165
9. Melillo P, Izzo R, Orrico A, Scala P, Attanasio M, Mirra M, De Luca N, Pecchia L (2015)
Automatic prediction of cardiovascular and cerebrovascular events using heart rate
variability analysis. PloS one 10(3):e0118504
10. Lin CW, Wang JS, Chung PC (2010) Mining physiological conditions from heart rate
variability analysis. IEEE Comput Intell Mag 5(1):50–58
11. Kim KH, Bang SW, Kim SR (2004) Emotion recognition system using short-term
monitoring of physiological signals. Med Biol Eng Compu 42(3):419–427
12. Bravi A, Longtin A, Seely AJ (2011) Review and classification of variability analysis
techniques with clinical applications. Biomed Eng Online 10(1):90
13. Verlinde D, Beckers F, Ramaekers D, Aubert AE (2001) Wavelet decomposition analysis of
heart rate variability in aerobic athletes. Auton Neurosci 90(1–2):138–141
14. Mager DE, Merritt MM, Kasturi J, Witkin LR, Urdiqui-Macdonald M, Evans MK,
Zonderman AB, Abernethy DR, Thayer JF (2004) Kullback-Leibler clustering of continuous
wavelet transform measures of heart rate variability. Biomed Sci Instrum 40:337–342
15. Bračič M, Stefanovska A (1998) Wavelet-based analysis of human blood-flow dynamics.
Bull Math Biol 60(5):919–935
16. Panerai RB, Rennie JM, Kelsall AWR, Evans DH (1998) Frequency-domain analysis of
cerebral autoregulation from spontaneous fluctuations in arterial blood pressure. Med Biol
Eng Compu 36(3):315–322
17. Nagy E, Orvos H, Bárdos G, Molnár P (2000) Gender-related heart rate differences in human
neonates. Pediatr Res 47(6):778
18. Guzzetti S, Cogliati C, Turiel M, Crema C, Lombardi F, Malliani A (1995) Sympathetic
predominance followed by functional denervation in the progression of chronic heart failure.
Eur Heart J 16(8):1100–1107
19. Lucini D, Bertocchi F, Malliani A, Pagani M (1996) A controlled study of the autonomic
changes produced by habitual cigarette smoking in healthy subjects. Cardiovasc Res
31(4):633–639
20. Malpas SC, Whiteside EA, Maling TJ (1991) Heart rate variability and cardiac autonomic
function in men with chronic alcohol dependence. Heart 65(2):84–88
21. Togo F, Yamamoto Y (2001) Decreased fractal component of human heart rate variability
during non-REM sleep. Am J Physiol Heart Circ Physiol 280(1):H17–H21
22. Jang DG, Hahn M, Jang JK, Farooq U, Park SH (2012) A comparison of interpolation
techniques for RR interval fitting in AR spectrum estimation. In: IEEE biomedical circuits
and systems conference (BioCAS), pp. 352–355, November
23. Wheeler T, Watkins PJ (1973) Cardiac denervation in diabetes. Br Med J 4(5892):584–586
24. Folino AF, Russo G, Bauce B, Mazzotti E, Daliento L (2004) Autonomic profile and
arrhythmic risk stratification after surgical repair of tetralogy of Fallot. Am Heart J
148(6):985–989
25. Mahesh V, Kandaswamy A, Vimal C, Sathish B (2009) ECG arrhythmia classification based
on the logistic model tree. J Biomed Sci Eng 2(6):405
26. DeGiorgio CM, Miller P, Meymandi S, Chin A, Epps J, Gordon S, Gornbein J, Harper RM
(2010) RMSSD, a measure of vagus-mediated heart rate variability, is associated with risk
factors for SUDEP: the SUDEP-7 Inventory. Epilepsy Behav 19(1):78–81
27. Lehrer P, Karavidas MK, Lu SE, Coyle SM, Oikawa LO, Macor M, Calvano SE, Lowry SF
(2010) Voluntarily produced increases in heart rate variability modulate autonomic effects of
endotoxin-induced systemic inflammation: an exploratory study. Int J Ambient Energy
35(4):303–315
28. Li Z, Wang C, Mak AF, Chow DH (2005) Effects of acupuncture on heart rate variability in
normal subjects under fatigue and non-fatigue state. Eur J Appl Physiol 94(5–6):633–640
Assessment of ECG Signal Quality
1 Introduction
It is difficult for a researcher to purchase an ECG machine solely for experimentation and
analysis. To perform experiments, we require a database of ECG signals. Although many
databases of real and synthetic ECG signals are available on different websites, we aim to
generate a database of heart rate variability (HRV) signals. For the analysis of HRV we
require a real-time signal, which can be acquired from various acquisition machines such
as BIOPAC, but these are too costly. We have designed a small 3-lead ECG acquisition
module with a sampling frequency of 100 Hz that helps to acquire a signal from the
patient's body. While developing the module, we must obtain a good-quality signal so
that no significant information is lost. The acquired signal may suffer from disturbances
like motion artifacts, which deteriorate the quality and make it impossible to analyze and
diagnose. During the design of the module, proper filtering must be considered to reduce
artifacts. The acquisition system must have a high common-mode rejection ratio (CMRR)
and signal-to-noise ratio (SNR). The spectrum of the ECG signal is 0.05–100 Hz; this
spectrum is altered when noise is present in the ECG signal, which affects signal quality,
so robustness of the module is required.
We will discuss the development of the ECG acquisition module and different methods
for quality estimation of the ECG signal. Quality estimation of the ECG signal can be
done by acquiring the signal from a healthy subject (the subject signal) and comparing it
with a standard signal. For experimentation, the standard ECG signal was collected from
a 12-channel Tele ECG module for 10 min at a 100 Hz sampling frequency. The subject
signal was acquired from the designed module for 10 min at 100 Hz. The standard and
subject databases each contain 9000 samples. The module is small in size, not too costly,
and its signal quality is good. Several methods exist to measure the quality of a signal;
our proposed method is based on statistical and machine learning parameters [5].
For this study, 20 healthy male subjects aged 20–28 were enrolled. First, 10-min ECG
recordings of the 20 healthy subjects were taken using a Tele ECG module, and the same
procedure was later repeated using the designed ECG module. For simplicity, let us say
that the signal acquired from the Tele ECG module is the standard signal and the signal
acquired from the developed module is the subject signal. These two signals were
compared to estimate the signal quality of the developed module.
Figure 1 gives the detailed block diagram of a simple ECG module that can be quickly
assembled and tested in the laboratory. A 9 V rechargeable battery powers the entire
circuit. The purpose of using a battery as the power source is to eliminate 50 Hz
power-line interference. The output of the 9 V battery is connected to a low-voltage-drop
three-terminal voltage regulator 7805 to obtain a +5 V regulated power supply. The
advantage of using a low-voltage-drop regulator is that it supplies a continuous +5 V
output, even if the battery discharges down to 7 V, by maintaining the dropout voltage.
The regulated +5 V is given to a DC-to-DC converter (IN0505D), which supplies 5 V to
the amplifier circuit.
The amplifier circuit is made up of a low-power instrumentation amplifier (INA129).
The gain resistance is taken as 10 kΩ to obtain a gain of 5 from the amplifier. The gain of
5 is chosen given the electrode potential of the order of 450 mV. The output of the
INA129 is further amplified to a gain of 450 with the help of the quad op-amp LMC6044
[2]. The amplified output is connected to an integrator circuit (low-pass filter) with a time
constant of 3.4 s, which gives a lower 3 dB response of 0.05 Hz for baseline restoration,
to obtain clinical-quality ECG output. The differentiator circuit (high-pass filter) is used
to obtain an upper 3 dB cutoff frequency at 150 Hz. The cascaded HPF and LPF form a
band-pass filter, which removes artifacts. The output of the instrumentation amplifier is
DC translated to an appropriate level with the help of the LMC6044 [2].
The positive input from the left leg (LL) or left arm (LA) electrode is connected to the
inverting input, and the negative input from the right arm (RA) electrode is connected to
the non-inverting terminal of the INA129. The output of the quad op-amp LMC6044 can
be observed on a digital storage oscilloscope (DSO). The data/signal can be stored using
an NI DAQ assistant card, and further processing can be performed in LabVIEW or
MATLAB. The data can be stored in .dat, .mat or .csv format.
After acquiring the signal, we have to estimate its quality. Estimation of quality is
essential because the signal is to be used for clinical applications. We can define the
signal quality of the ECG with two notions, i.e., fundamental quality and vital quality.
Fundamental quality is related to heart rate (HR), arrhythmia, atrial fibrillation and HRV,
and it is usually defined by the P, QRS and T waves in conditions like myocardial
ischemia and coronary heart disease. More information can be extracted from the heart
rate (HR); hence more focus is given to the fundamental quality of the ECG signal [1].
The comparison between the standard and subject signals was performed with an
unpaired t-test; a P-value < 0.05 was accepted as the level of significance.
There are several signal quality index parameters which estimate the quality of signals,
such as RR interval variability, template matching (Pearson correlation), skewness and
kurtosis, statistical tests, comparison of error bars and a machine learning signal quality
classifier. We implement all of these signal quality indices on the standard and subject
signals.
The Pearson product-moment correlation coefficient between the two records is

r_{xy} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^{2}}}

where x_i and y_i are the data samples and \bar{x}, \bar{y} are their arithmetic means, while
the denominator is formed from the standard deviations of the two data samples. The
value of r_{xy} ranges between −1 and +1.
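A minimal sketch of the template-matching index: the acquired record is compared against the reference record with Pearson's r using scipy. The two records are assumed to be of equal length and already time-aligned; the synthetic signals are placeholders.

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_sqi(standard, subject):
    """Pearson correlation between a reference ECG and the acquired ECG.

    Both inputs are 1-D arrays of equal length that are assumed to be
    time-aligned (e.g. the 9000-sample records described above).
    """
    standard = np.asarray(standard, dtype=float)
    subject = np.asarray(subject, dtype=float)
    r, p_value = pearsonr(standard, subject)
    return r, p_value

# Example with a clean template and a noisy copy of it.
rng = np.random.default_rng(1)
template = np.sin(2 * np.pi * 1.2 * np.arange(0, 10, 0.01))
noisy = template + 0.2 * rng.standard_normal(template.size)
r, p = correlation_sqi(template, noisy)
print(f"r = {r:.3f}")   # close to +1 for a good-quality signal
```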
The skewness of the distribution tells us about its symmetry: it measures the asymmetry
of the sample distribution and can take a negative or positive value depending on whether
the left or the right tail of the distribution is heavier. The skewness is defined as

S = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^{3}}{\sigma^{3}}
On the other hand, kurtosis measures the sharpness of the peak of the distribution; it is
based on the fourth moment about the mean:

K = \frac{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^{4}}{\sigma^{4}}

Skewness also depends on outliers: a distribution containing outliers has high skewness,
which results in an asymmetric distribution. The presence of outliers in a data sample is
an indication of noise, which can affect the signal. The normal distribution has skewness
and kurtosis equal to 0 and 3 respectively; a kurtosis value of 2.8 to 3.5 is desirable. The
following observations are drawn from the standard signal and the subject signal.
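The two shape measures can be computed directly from the samples, as in the sketch below; fisher=False is used so that a normal distribution gives a kurtosis of 3, matching the convention in the text. The synthetic records are placeholders.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def shape_sqi(signal):
    """Skewness and (Pearson) kurtosis of an ECG sample distribution."""
    x = np.asarray(signal, dtype=float)
    s = skew(x)                       # 0 for a symmetric distribution
    k = kurtosis(x, fisher=False)     # 3 for a normal distribution
    return s, k

rng = np.random.default_rng(2)
clean = rng.normal(size=9000)                    # roughly Gaussian record
noisy = np.concatenate([clean, [15, -20, 18]])   # a few outlier samples
print(shape_sqi(clean))   # skewness near 0, kurtosis near 3
print(shape_sqi(noisy))   # outliers push the kurtosis well above 3.5
```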
Statistical Analysis
A statistical test is used to provide the probability of occurrence. The two-tailed unpaired
t-test is used for the analysis. A statistical test is conducted by considering a null
hypothesis (Ho) and an alternative hypothesis (Ha). As the values of skewness and
kurtosis are in the desirable range, we can assume that the data follow a normal
distribution. The objective of the statistical test is to find out whether there is any
significant difference between the standard signal and the subject signal [3].
Null Hypothesis (Ho)
(Ho) = There is no significant difference between two data sample of the signal.
Alternative Hypothesis (Ha)
(Ha) = There is a significant difference between two data sample of the signal.
The level of significance is selected as 5%, which gives a confidence level of 95%. If the
P-value < α, the null hypothesis is rejected; if the P-value > α, we fail to reject the null
hypothesis. A P-value of 0.72 > 0.05 is obtained using the unpaired t-test, which
indicates that there is no significant difference between the two data samples; the data
samples belong to the same population.
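A small sketch of the two-tailed unpaired t-test with scipy; equal_var=True corresponds to the classical Student's t-test, which is an assumption on our part since the paper does not state which variant was used.

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_signals(standard, subject, alpha=0.05):
    """Two-tailed unpaired t-test between reference and acquired samples."""
    t_stat, p_value = ttest_ind(standard, subject, equal_var=True)
    if p_value < alpha:
        verdict = "reject H0: the two samples differ significantly"
    else:
        verdict = "fail to reject H0: no significant difference"
    return t_stat, p_value, verdict

rng = np.random.default_rng(3)
standard = rng.normal(0.0, 1.0, 9000)
subject = rng.normal(0.0, 1.0, 9000)   # drawn from the same population
print(compare_signals(standard, subject))
```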
The performance of the classifier is evaluated in terms of accuracy, sensitivity and specificity:

Accuracy = \frac{TP + TN}{TP + FP + TN + FN}

Sensitivity = \frac{TP}{TP + FN}

Specificity = \frac{TN}{TN + FP}
The average accuracy, sensitivity, and specificity using k = 1 and 3 were 96.64%,
90.99%, and 97.56% respectively. The above result shows that k-NN can be effectively
used for signal quality classification [4].
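As an illustration of the classification step, the sketch below trains a k-nearest-neighbour classifier on signal-quality features and derives accuracy, sensitivity and specificity from its confusion matrix. The feature set and the synthetic labels are placeholders, not the authors' data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder features per record: e.g. [skewness, kurtosis, correlation].
rng = np.random.default_rng(4)
good = np.column_stack([rng.normal(0, 0.3, 100),
                        rng.normal(3, 0.3, 100),
                        rng.uniform(0.8, 1.0, 100)])
bad = np.column_stack([rng.normal(1.5, 0.5, 100),
                       rng.normal(6, 1.0, 100),
                       rng.uniform(0.0, 0.6, 100)])
X = np.vstack([good, bad])
y = np.array([1] * 100 + [0] * 100)   # 1 = acceptable quality, 0 = poor

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()

accuracy = (tp + tn) / (tp + fp + tn + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(accuracy, sensitivity, specificity)
```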
4 Conclusion
For the purpose of creating an ECG signal database, we have designed an ECG
acquisition module for HRV analysis. Various signal quality index parameters, such as
the mean, standard deviation, Pearson's correlation, normality test and a machine
learning signal quality classifier, were used to check the performance and quality of the
signal acquired from the designed module. The outcome of all the methods clearly shows
that there is no significant difference between the standard signal and the subject signal.
This means that the designed module produces an ECG signal of good quality, which can
also be useful for clinical analysis and experimentation.
References
1. Orphanidou C (2017) Signal quality assessment in physiological monitoring: state of the art
and practical considerations. Springer
2. Jindal GD, Deepak KK, Jain RK (2010) A handbook on physiological variability. In:
Advanced applications of physiological variability
3. Martinez-Tabares FJ, Espinosa-Oviedo J, Castellanos-Dominguez G (2012) Improvement of
ECG signal quality measurement using correlation and diversity-based approaches. In: Annual
international conference of the IEEE engineering in medicine and biology society. IEEE
4. Kulek J et al (2011) Data driven approach to ECG signal quality assessment using multistep
SVM classification. In: Computing in cardiology. IEEE
5. Del Rio BAS, Lopetegi T, Romero I (2011) Assessment of different methods to estimate
electrocardiogram signal quality. In: Computing in cardiology. IEEE
Identifying Obstructive, Central and Mixed
Apnea Syndrome Using Discrete
Wavelet Transform
1 Introduction
A person enters the REM stage after completing all three stages of NREM sleep. The
REM stage lasts longer during the night, and in the REM stage the eyes move rapidly
[11]. All these sleep stages are examined in sleep apnea patients through a
polysomnography test, which includes the electroencephalogram (EEG),
electrocardiogram (ECG), electrooculogram (EOG), electromyogram (EMG), oxygen
saturation (SpO2) and so on [5, 10].
Sleep apnea is commonly seen in both men and women owing to lifestyle changes;
nowadays it is as common as type 2 diabetes. If it is undiagnosed and untreated, it can
lead to serious consequences, including death [6, 9]. There are three types of sleep apnea:
Obstructive Sleep Syndrome (OSS), Central Sleep Syndrome (CSS) and Mixed Sleep
Syndrome (MSS). OSS is caused by blockage of the upper airway or by pauses in airflow
for a few seconds, and it is commonly seen in snoring subjects [7, 8]. The obstructed
airflow leads to reduced oxygen saturation. The difference between normal breathing,
partial obstruction and complete obstruction of breathing is shown in Fig. 1.
2 Methodology
Data for subjects having Obstructive Sleep Syndrome (OSS), Central Sleep Syndrome
(CSS) and Mixed Sleep Syndrome (MSS) are taken from https://physionet.org, and the
signals are decomposed into sub-bands to extract the detailed and approximate
coefficients. The decomposition and feature extraction are done using the discrete
wavelet transform with Daubechies order 2. In this paper, data from 25 subjects suffering
from OSS, CSS and MSS are used. Of these 25 subjects, 21 are male and 4 are female,
with age 50 ± 10 years. Table 1 shows a sample of 10 of the subjects considered.
Figure 2 shows the abnormal EEG for multiple subjects; the data are taken at the
standard terminal C4–A1. These EEG signals are said to be abnormal since they include
sleep apnea events. The wavelet transform is used to decompose the EEG signal into
sub-bands (alpha, beta, theta, delta and gamma). The sampling frequency is 250 Hz, and
eight decomposition levels are used to extract the detailed and approximate coefficients.
For decomposing the EEG signal, the Daubechies order-2 discrete wavelet transform is
used. From these coefficients the features are extracted to identify obstructive, central and
mixed sleep apnea.
The features considered for identifying OSS, CSS and MSS are energy, variance, mean,
median, maximum and minimum. In the next stage, the abdomen movements, nasal flow,
ribcage movements and snoring of the OSS, CSS and MSS subjects are identified.
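A compact sketch of the decomposition and feature-extraction step using PyWavelets: one EEG epoch is decomposed with the Daubechies order-2 wavelet into eight levels and simple statistics are computed for each set of coefficients. The 30 s epoch length and the synthetic input are assumptions used only for illustration.

```python
import numpy as np
import pywt

def dwt_features(eeg, wavelet="db2", level=8):
    """Per-level statistical features from a DWT of one EEG epoch."""
    coeffs = pywt.wavedec(np.asarray(eeg, dtype=float), wavelet, level=level)
    features = []
    for c in coeffs:  # [approximation, detail_level, ..., detail_1]
        features.append({
            "energy": float(np.sum(c ** 2)),
            "mean": float(np.mean(c)),
            "variance": float(np.var(c)),
            "median": float(np.median(c)),
            "max": float(np.max(c)),
            "min": float(np.min(c)),
        })
    return features

fs = 250                              # sampling rate used in the paper
t = np.arange(0, 30, 1 / fs)          # one 30 s epoch
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)  # placeholder
feats = dwt_features(eeg)
print(len(feats), feats[0]["energy"])
```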
In this study the obstructive, central and mixed sleep syndromes are identified using the
discrete wavelet transform with Daubechies order 2. Five recordings (EEG, abdomen
movements, nasal flow, ribcage movements and snoring) are extracted from each subject
to identify the severity of apnea, as shown in Figs. 4, 5 and 6. The EEG signal is
decomposed into five sub-bands, alpha (α), beta (β), theta (θ), delta (δ) and gamma (γ),
shown in Fig. 3.
Then a set of features (mean, absolute mean, standard deviation, median, variance,
maximum, minimum) is extracted, shown in Table 2. These features characterize the
apnea type (OSS, CSS or MSS) along with the four artifacts (abdomen movements, nasal
air flow, ribcage movements and snoring) taken from the subjects.
Figure 4 shows the occurrence of obstructive sleep apnea during REM sleep. It is
observed that the abdomen movement, nasal and ribcage signals are reduced to nearly
zero level in the 2000 to 4000 ms range when the OSS episode occurs.
Figure 5 shows the occurrence of central sleep apnea during REM sleep. It is observed
that the ribcage movement is reduced to zero level, and the airflow and abdomen signals
are reduced in the 4000 ms to 6000 ms range when the CSA event occurs. Figure 6
shows the occurrence of mixed sleep apnea. When the EEG signal is compared with the
abdomen and ribcage movements, these signals reach zero level. In the EEG signal a
spike is seen at 1800 ms when the episode occurs, and similarly in the ribcage movement
the event is seen at 0 to 2000 ms and again at 3000 ms to 4800 ms.
The subjects' abdomen, nasal and ribcage movements are analysed over a 30 s duration
for the sleep apnea patients. The snoring of each subject is also analysed for identifying
apnea events.
4 Conclusions
This paper identifies all three types of apnea events: Obstructive Sleep Syndrome (OSS),
Central Sleep Syndrome (CSS) and Mixed Sleep Syndrome (MSS). The EEG signal
characteristics are analyzed with the help of wavelet decomposition techniques. After
identification of OSS, CSS and MSS, the artifacts (abdomen movements, nasal airflow,
ribcage movements and snoring) of all 25 subjects are analyzed. The EEG signal is
decomposed into 8 sub-bands and the coefficients are used to extract features. The
features of the EEG signal (mean, standard deviation, median, variance, etc.) are
extracted using the Daubechies order-2 wavelet transform; the Daubechies wavelet gives
better efficiency than other wavelets. It is observed that the amplitudes of the EEG,
abdomen movement, nasal, ribcage movement and snoring signals go from high to low
when an event occurs.
Acknowledgments. The authors wish to thank the publicly available PhysioBank database,
https://physionet.org/physiobank/.
References
1. Al-Fahoum AS, Al-Fraihat AA (2014) Methods of EEG signal features extraction using
linear analysis in frequency and time-frequency domains. ISRN Neurosci 2014: 7, February
13
2. Cai C, Harrington PDB (1998) Different discrete wavelet transforms applied to denoising
analytical data. J Chem Inf Comput Sci 38(6):1161–1170
3. Sezgin N, Tagluk ME (2009) Energy based feature extraction for classification of sleep
apnea syndrome. Comput Biol Med 39(11):1043–1050
4. Acharya UR, Sree SV, Alvin APC, Suri JS (2012) Use of principal component analysis for
automatic classification of epileptic EEG activities in wavelet framework. Expert Syst Appl
39(10):9072–9078
5. Lee JM, Kim DJ, Kim IY, Park KS, Kim SI (2002) Detrended fluctuation analysis of EEG in
sleep apnea using MIT/BIH polysomnography data. Comput Biol Med 32(1):37–47
6. Almuhammadi WS, Aboalayon KA, Faezipour M (2015) Efficient obstructive sleep apnea
classification based on EEG signals. In: 2015 IEEE long Island systems, applications and
technology conference (LISAT). IEEE, pp. 1–6, May
7. Orhan U, Hekim M, Ozer M (2011) EEG signals classification using the K-means clustering
and a multilayer perceptron neural network model. Expert Syst Appl 38(10):13475–13481
8. Jahankhani P, Kodogiannis V, Revett K (2006) EEG signal classification using wavelet
feature extraction and neural networks. In: IEEE John Vincent Atanasoff 2006 international
symposium on modern computing, 2006. JVA’06. IEEE, pp. 120–124, October
9. Amin HU, Mumtaz W, Subhani AR, Saad MNM, Malik AS (2017) Classification of EEG
signals based on pattern recognition approach. Front Comput Neurosci 11:103
10. Almazaydeh L, Elleithy K, Faezipour M (2012) Detection of obstructive sleep apnea through
ECG signal features. In: 2012 IEEE international conference on electro/information
technology (EIT). IEEE, pp. 1–6, May
11. Kalaivani M, Kalaivani V, Anusuya Devi V. Analysis of EEG signal for the detection of
brain abnormalities. Int J Comput Appl (0975 – 8887)
Fractal Dimension of Fundoscopical Retinal
Images for Diagnosing of Diabetic Retinopathy
Abstract. The present work applied different image processing techniques, such as
green-component extraction, background estimation and image skeletonization, to the
subjects' fundus images. Statistical methods, namely fractal dimensions and the
neighbourhood concept, were used to distinguish between normal and abnormal
fundus images in the subjects (n = 45). The results show that in normal fundus
images the vein structures are clearly visible, while in the fundoscopic positive
images the vein structures are totally absent. In fundoscopic negative images the
visible vein structures are observed to be thick and coiled up. No significant changes
were found in the fractal dimension (FD) values among the subjects. Neighbourhood
pixel (NP) values were found to be 45 ± 0.74 (mean ± S.D.) for normal subjects,
34 ± 1.01 for fundoscopic positive subjects and 20.47 ± 0.49 for fundoscopic
negative subjects. The results of this work validate the skeletonized images and
support the strength of the diagnosis with accurate figures.
1 Introduction
eye [6–9]. The blood vessel structures are visible [10–13] in the back part of eye as
shown in Fig. 2. The disadvantage of the fundus camera is that, it lacks the in-depth
and accurate information regarding each disease.
The previously reported literature [1] shows that damage to the vessels of the eye
depends on the time of exposure to graphene oxide. Reference [14] summarises the
skeletonization of fundus images using various image processing techniques; however,
the results were not supported by mathematical justification. Reference [15] reports the
use of fractal dimensions in images of high ocular patients, but that paper had limited
scope as it used only one mathematical concept.
2 Methodology
Data Collection: Forty-five subjects (12 females) of mean age (55 ± 5) were included
in this study. The subjects were divided into Healthy (n = 15), fundoscopic positive
(n = 15), fundoscopic negative (n = 15). All the subjects were recruited into the study
from Apollo Institute of Medical Sciences and Research. All subjects gave informed
consent for participation in the study. The Fundus CANON CF-1 retinal camera was
used in this study to record the data.
Feature Extraction: Feature Extraction is divided into two: Fractal Dimensions,
Neighbourhood Concept.
Fractal dimensions: The fractal dimension is a concept from fractal geometry; it is a ratio
providing a statistical index of the complexity of a pattern, describing how that
complexity changes with the scale at which it is measured. It can also be characterized as
a measure of the space-filling capacity of a pattern, and it need not be a fraction. There
are two types of fractal dimensions: the self-similarity dimension and the box-counting
dimension.
Self-Similarity Dimension: This type of dimension can only be used when the input
image is self-similar. The equation for this type of dimension is

a = \frac{1}{s^{D}}    (1)

where a is the number of pieces, s is the reduction factor and D is the self-similarity
dimension.
Box-Counting Dimension: The image is covered with boxes of decreasing size and the
number of occupied boxes is counted for each size; after taking the x-axis and y-axis
values of the resulting log–log plot, the best-fit line is found and its slope gives the
dimension. In this work the box-counting dimension has been used to find the fractal
dimension of the input image, as it does not have the limiting requirement of
self-similarity.
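For illustration, here is a small box-counting sketch for a binary (e.g. skeletonized) image: the number of occupied boxes N(s) is counted for decreasing box sizes s, and the dimension is the slope of log N(s) against log(1/s). The particular choice of box sizes is an assumption.

```python
import numpy as np

def box_counting_dimension(binary_img):
    """Estimate the box-counting fractal dimension of a 2-D binary image."""
    img = np.asarray(binary_img, dtype=bool)
    # Box sizes: powers of two, from half the smallest image side down to 2.
    max_size = min(img.shape) // 2
    sizes = 2 ** np.arange(int(np.log2(max_size)), 0, -1)
    counts = []
    for s in sizes:
        # Count boxes of side s containing at least one foreground pixel.
        h = (img.shape[0] // s) * s
        w = (img.shape[1] // s) * s
        blocks = img[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(blocks.any(axis=(1, 3))))
    # Slope of the best-fit line of log N(s) against log(1/s).
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope

# Example: a one-pixel-wide diagonal line has dimension close to 1.
img = np.zeros((512, 512), dtype=bool)
np.fill_diagonal(img, True)
print(box_counting_dimension(img))
```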
Neighbourhood Concept: When a neighbourhood operation is applied to an input image,
the output is also an image. Normally the size of the neighbourhood is fixed, either a
square or a cubic matrix, and the matrix has a centre point "p". The main purpose of the
neighbourhood concept here is to calculate object properties, that is, to find the centre
pixel of the image which has the highest intensity. The reason for finding this pixel is to
locate the initial point where light enters the eye, since the point which allows light into
the eye has a high intensity. After capturing the high-intensity pixel, the mask (the size of
the neighbourhood) is used to find the standard deviation among the pixels, which is then
displayed on the image.
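A minimal sketch of the neighbourhood operation as described: locate the brightest pixel (taken as the point where light enters the eye) and compute the standard deviation of the pixels in a fixed square neighbourhood around it. The window size and the synthetic image are assumptions.

```python
import numpy as np

def neighbourhood_feature(gray_img, window=15):
    """Std. deviation of the square neighbourhood around the brightest pixel."""
    img = np.asarray(gray_img, dtype=float)
    r, c = np.unravel_index(np.argmax(img), img.shape)   # centre point "p"
    half = window // 2
    patch = img[max(r - half, 0): r + half + 1,
                max(c - half, 0): c + half + 1]
    return (int(r), int(c)), float(patch.std())

# Example on a synthetic image with one bright region (a stand-in for the
# optic disc, where light enters the eye).
rng = np.random.default_rng(5)
fundus = rng.uniform(0.0, 0.3, (256, 256))
fundus[100:105, 120:125] = 1.0
print(neighbourhood_feature(fundus))
```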
Figure 3 shows the overall block diagram of the image processing approach using the
neighbourhood concept and fractal dimension. The fundus image is first resized to
display it on the screen. The image is then converted to greyscale and binarized in order
to calculate the fractal dimension, the object properties based on the pixel values and the
custom pixel-value-based properties, and to plot the bar graph.
Fig. 3. Block diagram of unified image processing approach using neighbourhood concept and
fractal dimension
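A minimal scikit-image sketch of the processing chain in the block diagram (green component, background estimation by morphological opening, background subtraction, contrast adjustment, thresholding, noise removal and skeletonization). The structuring-element radius and the small-object threshold are assumed values, not parameters stated in the paper.

```python
import numpy as np
from skimage import exposure, morphology
from skimage.filters import threshold_otsu

def skeletonize_vessels(rgb_img, disk_radius=15, min_size=50):
    """Skeletonized vessel map from an RGB fundus image (float values in [0, 1])."""
    green = rgb_img[:, :, 1]                                   # green component
    background = morphology.opening(green, morphology.disk(disk_radius))
    vessels = green - background                               # background subtraction
    vessels = exposure.rescale_intensity(vessels, out_range=(0, 1))  # contrast
    binary = vessels > threshold_otsu(vessels)                 # thresholding
    binary = morphology.remove_small_objects(binary, min_size=min_size)  # denoise
    return morphology.skeletonize(binary)                      # skeleton image

# Usage (assuming `fundus.png` is a fundus photograph on disk):
# from skimage import io, img_as_float
# img = img_as_float(io.imread("fundus.png"))
# skeleton = skeletonize_vessels(img)
```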
3 Results
3.1 Results of Different Image Processing Tools Applied on Normal
Fundus Images
Figure 4 presents the results for a normal fundus image. Figure (4a) shows the original
fundus image of a normal patient; (4b) shows the green component of the image, which
brings all the pixel intensities of the image into a nearby range; (4c) shows the
background of the green-component image, estimated using morphological opening;
(4d) shows the result of subtracting the background image from the original image;
(4e) shows the image of step (4d) with increased contrast; (4f) shows the new binary
image created by thresholding the adjusted image; (4g) shows the removal of unwanted
noise; and (4h) shows the skeletonised image.
morphological opening. (5d) shows the result of subtraction of background image with
the original image. (5e) Increase the contrast of the image in step 4 (5d). (5f) Creating a
new binary image by thresholding the adjusted image. (5g) Removing the unwanted
noise. (5h) Skeletonising the image.
Figure 6 shows the Fundoscopic Negative fundus images result. Figure (6a) talks about
reading the original fundus image of a normal patient, (6b) shows the green component
of the image to put all the pixel value intensity of the image in a nearby intensity. (6c)
shows the background of the green component image which was estimated using
morphological opening. (6d) shows the result of subtraction of background image with
the original image. (6e) Increase the contrast of the image in step 4 (6d). (6f) Creating a
new binary image by thresholding the adjusted image. (6g) Removing the unwanted
noise. (6h) Skeletonising the image.
Figure 10c shows the calculation of object properties using the pixel values of the
grayscale image. The calculation of custom pixel-value-based properties is shown in
Fig. 10d. The same procedure was followed for the fundoscopic positive and negative
patients; all the images of these patients are shown in Fig. 11a–d and Fig. 12a–d
respectively (Table 1).
Table 1. The fractal dimension and neighbourhood concept values of the study population

Subjects               Fractal dimension (mean ± SD)      Neighbourhood concept (mean ± SD)
Healthy subjects       0.039953 ± 0.000175                44.94 ± 0.72
Fundoscopic positive   0.000872398 ± 1.04563 × 10^-8      34.12667 ± 0.980793
Fundoscopic negative   0.007087 ± 0.00015                 20.47333 ± 0.479537
4 Conclusion
The work done in this paper distinguishes, to a great extent, the normal, fundoscopic
positive and fundoscopic negative fundus images using the proposed fractal dimension
and neighbourhood concepts. The statistical data given for the fractal dimension and the
neighbourhood pixel concept clearly show the difference between the normal fundus
images and the abnormal fundus images (fundoscopic positive and fundoscopic
negative). If the proposed algorithm is incorporated into the fundus camera, it is expected
to improve the diagnostic usefulness of the fundus images.
References
1. An W, Zhang Y, Zhang X (2017) Ocular toxicity of reduced graphene oxide or graphene
oxide exposure in mouse eyes. Els Exp Eye Res 174:59–69
2. Galloway NR, Amoaku WMK (1999) Basic anatomy and physiology of the eye. In:
Common eye diseases and their management, Springer, London, pp 7–16
3. Kubota S, Kanomata K, Suzuki T, Ahmmad B, Hirose F (2015) Hybrid anti-reflection
structure with moth eye and multilayer coating for organic photovoltaics. J Coat Techno Res
12:37–47
4. Hayakawa S, Takaku Y, Hwang JS, Horiguchi T, Suga H, Gehring W, Ikeo K, Gojabori T
(2015) Function and evolutionary origin of unicellular camera-type eye structure. PLoS ONE
10(3):e0118415
5. Belmonte C, Acosta MC, Merayo-Lloves J, Gallar J (2015) What causes eye pain? Curr
Ophthalmoscope Rep 3:111–121
6. Greenspan H, Van Ginneken B, Summers RM (2016) Guest editorial deep learning in
medical imaging: overview and future promise of an exciting new technique. IEEE Transact
Medic Imaging 35:1153–1159
7. Wright AA, Hegedus EJ, Lenchik L (2015) Diagnostic accuracy of various imaging
modalities for suspected lower extremity stress fractures a systematic review with evidence-
based recommendations for clinical practice. American J Sports Med 44:255–263
8. Ogdie A, Taylor WJ, Weatherall M, Fransen J (2014) Imaging modalities for the
classification of gout: systematic literature review and meta-analysis. BMJ J 74:1868–1874
9. Prevedel R, Weisenburger S, Vaziri A (2017) Comparing two-photon excitation modalities
for fast, large-scale recording of neuronal activity in rodents. In: Optical molecular probes,
imaging and drug delivery (Paper JTu4A-7). Optical Society of America, 2–5 April 2017
10. Hussain N, Edraki M, Tahhan R, Sanalkumar N, Kenz S, Akasha NK, Mtemererwa B,
Mohammed N (2017) Telemedicine for diabetic retinopathy screening using an ultra-
widefield fundus camera. Clin Ophthalmol 11:1477–1482
11. Schindelin J, Rueden CT, Hiner MC, Eliceiri KW (2015) The ImageJ ecosystem: an open
platform for biomedical image analysis. Mol Reprod Dev 82(7–8):518–529
12. Thielicke W, Stamhuis EJ (2014) PIVlab – towards user-friendly, affordable and accurate
digital particle image velocimetry in MATLAB. J Open Res Softw 2(1):e30
13. Son J, Park SJ, Jung KH (2017) Retinal vessel segmentation in fundoscopic images with
generative adversarial networks. Corn. Univ. Lib. arXiv:1706.09318
14. Niall P, Tariq MA, Thomas M (2005) Retinal image analysis: concepts, applications and
potential. Prog Retin Eye Res 25:99–127
15. Doubal FN, Graham C, Wardlaw JM (2007) Fractal analysis of the retinal vascular network
in fundus images. In: Annual international conference of the IEEE engineering in medicine
and biology society EMBS
Photosweep: An Engineering Approach
to Develop Cost Effective Sterilization System
for Hospitals
1 Introduction
Surgical Site Infections (SSIs) comprise more than one-third of all hospital-acquired
infections. It is observed that around 300,000 patients per year in the USA acquire a
post-operative SSI [1]. Re-usable surgical instruments can also cause cross-contamination
with serious, life-threatening diseases. Hospitals are not the only places affected by
contaminated medical tools; research labs and private medical practices are also victims,
and their proficiency and the health of patients and workers are at stake. We already
know that autoclave machines are the master sterilizers used in these places, but
autoclave machines are expensive. In developing countries the use of autoclave machines
is limited because of affordability, and there are few or no local manufacturers, so the
machines need to be imported, which increases the cost even more. Photosweep, in
contrast, uses sustainable solar energy rather than electricity, while providing a working
function similar to that of autoclaves. See Fig. 1.
Table 1. Parts and their functions of water heating panel and pipeline
Name of the parts Function
1. Pipeline for water supply Supplies required water
2. Steam generating heating panel Turns water into steam [123 °C]
3. Pipeline for steam supply Supplies the steam to chamber I
4. Valves Control supply of water and steam
5. Temperature sensor Control the opening of valves
Fig. 2. Dimension of solar steam generating heating panel with parabolic trough
2.2 Process
• Clean water is supplied through the pipeline in a controlled way, using a one-way
valve, to the solar water heating panel. The solar steam-generating heating panel, as
shown above, collects and stores heat from solar energy, which is then absorbed by the
water running through the evacuated tube. A set of evacuated tubes is situated under
the parabolic trough concentrator, which can reach a temperature of 250 °C or even
more [2].
• The temperature sensor simultaneously controls two valves. When the temperature
reaches around 123 °C, the sensor opens the steam exit valve, which allows the steam
to pass only into the jacket of chamber (I) through the pipeline, and the water supply
valve permits water to enter the evacuated tube. The incoming water decreases the
tube temperature, and when the temperature falls to 121 °C the sensor closes the steam
exit valve.
• The chamber material and pipeline absorb some heat in reaching a temperature of
121 °C. Considering this heat loss in the chamber, steam at a higher temperature
[123 °C] than required [121 °C] is supplied (the calculation is given below). When the
evacuated tube is filled with water, the water supply valve is closed.
2.4 Process
• At first, the jacket is filled by the steam coming from the pipeline, which ensures
uniform distribution of steam inside the chamber. After the whole jacket is properly
filled, steam enters the chamber through an opening (controlled by valves) situated
exactly opposite the exit of the steam pipeline.
• Inside the chamber, the steam needed for sterilization should be at a temperature of
121 °C and a gauge pressure of 15 psi (about 1.05 bar). When the pressure gauge
reads 15 psi, the sensor closes all three valves, stopping the entry of water and steam
and the exhaustion of air. After 20–45 min the sterilization is complete and the sensor
opens the exhaust valve (a simplified sketch of this valve logic is given after this list).
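The valve behaviour described in the two process sections can be summarised as a simple threshold controller. The sketch below is an illustrative simplification (a real device would add hysteresis and safety interlocks), and the 30 min hold time is one assumed value within the 20–45 min window.

```python
STEAM_OPEN_TEMP_C = 123.0      # open the steam exit valve above this temperature
STEAM_CLOSE_TEMP_C = 121.0     # keep heating fresh water below this temperature
STERILIZE_PRESSURE_PSI = 15.0  # gauge pressure for sterilization at 121 C
STERILIZE_MINUTES = 30         # assumed value within the 20-45 min window

def control_step(temp_c, pressure_psi, minutes_at_pressure):
    """Return which valves should be open for the current sensor readings."""
    valves = {"water_in": False, "steam_exit": False, "exhaust": False}
    if pressure_psi >= STERILIZE_PRESSURE_PSI:
        # Hold phase: everything stays closed until the sterilization time elapses.
        valves["exhaust"] = minutes_at_pressure >= STERILIZE_MINUTES
    elif temp_c >= STEAM_OPEN_TEMP_C:
        valves["steam_exit"] = True    # send steam to the jacket of chamber (I)
        valves["water_in"] = True      # refill the evacuated tube
    elif temp_c <= STEAM_CLOSE_TEMP_C:
        valves["water_in"] = True      # keep heating fresh water
    return valves

print(control_step(124.0, 5.0, 0))     # steam being supplied to chamber (I)
print(control_step(121.0, 15.0, 35))   # sterilization done, exhaust valve opens
```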
2.6 Process
3 Related Calculations
Estimated dimensions of both chambers, required water supply, heat and temperature
of supplied steam for chamber I of this model are calculated below (Fig. 3).
Dimension of the outer chamber:
Taking pressure p = 1.05 bar = 0.103 MPa at 121 °C,
radius of the chamber r = 0.15 m,
working stress for a thin chamber σ = 5 MN/m^2 (< 30 MN/m^2) [4],
outer diameter = 0.30 m, so outer radius r_o = 0.15 m,
inner radius r_i = 0.15 m − 0.0031 m = 0.147 m,
height h = 0.60 m,
density of the stainless steel used for the chamber ρ = 7930 kg/m^3.
Wall thickness of the chamber: t = p·r/σ = 0.00309 m ≈ 0.0031 m
Volume of the chamber shell: V = π(r_o^2 − r_i^2)·h = π(0.15^2 − 0.147^2) × 0.60 = 1.679 × 10^-3 m^3
Mass of the chamber: m = ρV = 7930 × 1.679 × 10^-3 = 13.314 kg
Weight of the chamber: W = mg = 13.314 × 9.81 = 130.610 N

Dimension of the inner chamber:
Taking pressure p = 0.103 MPa at 121 °C, radius of the chamber r = 0.13 m,
working stress for a thin chamber σ = 5 MN/m^2 (< 30 MN/m^2) [4],
outer diameter = 0.26 m, so outer radius r_o = 0.13 m,
inner radius r_i = 0.13 m − 0.003 m = 0.127 m,
height h = 0.58 m, density of the stainless steel used for the chamber ρ = 7930 kg/m^3.
Wall thickness of the chamber: t = p·r/σ ≈ 0.003 m
Volume of the chamber shell: V = π(r_o^2 − r_i^2)·h = π(0.13^2 − 0.127^2) × 0.58 = 1.404 × 10^-3 m^3
Mass of the chamber: m = ρV = 7930 × 1.404 × 10^-3 = 11.134 kg
Weight of the chamber: W = mg = 11.134 × 9.81 = 109.225 N

Dimension of the jacket:
Taking pressure p = 1.05 bar = 0.103 MPa at 121 °C,
radius of the jacket r = 0.125 m,
working stress for a thin chamber σ = 5 MN/m^2 (< 30 MN/m^2) [4],
outer diameter = 0.25 m, so outer radius r_o = 0.125 m,
inner radius r_i = 0.125 m − 2.575 × 10^-3 m = 0.122 m,
height h = 0.55 m, density of polypropylene ρ = 946 kg/m^3.
Fig. 3. Dimension of outer chamber, inner chamber and jacket of chamber (I); Dimension of
chamber (II) will be the same excluding jacket.
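The numbers above follow the thin-walled pressure-vessel relations used in the paper: t = pr/σ, V = π(r_o² − r_i²)h, m = ρV and W = mg. As a quick check, the short Python script below re-computes the outer-chamber values; it is only a verification aid using the same inputs, not part of the design itself.

import math

p = 0.103e6       # operating pressure, Pa (1.05 bar at 121 °C)
sigma = 5.0e6     # working stress, Pa (lower bound of the quoted range [4])
r_outer = 0.15    # outer radius of the outer chamber, m
h = 0.60          # chamber height, m
rho = 7930.0      # density of the stainless steel, kg/m^3

t = p * r_outer / sigma                         # wall thickness, ~0.0031 m
r_inner = round(r_outer - t, 3)                 # ~0.147 m (rounded as in the text)
V = math.pi * (r_outer**2 - r_inner**2) * h     # shell material volume, ~1.68e-3 m^3
m = rho * V                                     # ~13.3 kg
W = m * 9.81                                    # ~130.6 N
print(t, r_inner, V, m, W)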
Again, when the sun's intensity is lower (during winter and on cloudy nights), molten salt, a mixture of 60% sodium nitrate (NaNO3) and 40% potassium nitrate (KNO3), can be used as a thermal energy storage medium because of its superior heat capacity [8].
4.2 Comparison
The present condition of sterilization in developing countries is poor owing to the technological gap and high expenses.
• Both metallic and plastic tools can be sterilized in a single device, unlike with an autoclave, using the heating and germicidal properties of sunlight respectively.
• "Photosweep" can be manufactured locally, as all the components are available in the market, which reduces the price, whereas other sterilizing devices are manufactured by only a few specific companies and need to be imported.
• Autoclave machines cost around $11,000 to $12,000, whereas "Photosweep" costs around $691 [parabolic trough $300, pipe and double metal chamber $125, pressure and temperature gauges & sensors $200, UV transmission filter $6, polarizer $45 and resonance tube $15].
• Conventional sterilization devices use electricity, so users have to bear the electricity bill; there is no such cost with this model.
5 Conclusion
The Photosweep sterilization system has been developed using solar energy from the Sun. It uses two properties of sunlight: the solar heating property to heat water and produce steam, and the germicidal property of UV rays. Even
Acknowledgments. Authors would like to acknowledge the support from the Biomedical
Engineering Department, Military Institute of Science and Technology (MIST), Bangladesh.
Conflict of interest. The Authors declare that they have no conflict of interest.
References
1. Loyola University Health System (2017, January 19) Surgical site infections are the most
common and costly of hospital infections: guidelines for preventing surgical site infections are
updated. ScienceDaily. https://www.sciencedaily.com/releases/2017/01/170119161551.htm.
Accessed 23 Jan 2019
2. Nahas M, Sabry M, Al-Lehyani S (2015) Feasibility study of solar energy steam generator for
rural electrification. Energy Power Eng 7:1–11
3. Climate and Earth’s Energy Budget. Published January 14, 2009. https://www.earthobservatory.nasa.gov/features/EnergyBalance/page4.php
4. Oyawale FA, Olaoye AE (2007) Design and construction of an autoclave. Pac J Sci Technol 8
(2):224–230
5. Katara G, Hemvani N, Chitnis S, Chitnis V, Chitnis DS (2008) Surface disinfection by
exposure to germicidal UV light. Indian J Med Microbiol 26(3):241–242
6. Xu Q, Ji X, Han J, Yang C, Li M (2018) Experimental study on a solar heat concentrating
steam generator, world academy of science, engineering and technology. Int J Energy Power
Eng 12(4):27–28
7. Shiroudi A, Deleuze MS, Mousavifar SM (2017) Efficiency analysis of a solar photovoltaic
array coupled with an electrolyser power unit: a case study. Int J Ambient Energy 38(3):
240–249
8. Ambrosson F, Selin M (2016) Solar concentrating steam generation in Alberta, Canada. An
investigation of the viability of producing industrial steam from concentrating solar
technology. Master of Science Thesis, KTH School of Industrial Engineering and
Management Energy Technology EGI-2016–052 EKV1151 Division of Heat & Power SE-
100 44 STOCKHOLM
Splice Junction Prediction in DNA Sequence
Using Multilayered RNN Model
1 Introduction
A faulty splicing operation can cause lethal side effects such as mutations in genes
BRCA1 and BRCA2, which increase a female’s risk of developing breast and ovarian
cancer [2]. Also, a change at an exon-intron junction results in β-Thalassemia [3]. The
proposed methodology thus explores means to reach optimal performance with near-
ideal accuracy of 99.95% with optimal precision-recall trade-off. The model in its entirety
can be found at: https://github.com/RahulSkr/junctionPredictionFromGeneSequence.
Optimal parameter settings for this architecture have been obtained through extensive fine tuning, which is discussed in detail in the sections below.
The rest of the paper is organized as follows: Sect. 2 discusses the various existing statistical models designed for the same or similar classification tasks; Sect. 3 explains in detail the sequence encoding stages followed by the model development; Sect. 4 shows the various performance statistics and curves obtained by the model on the given dataset; and finally Sect. 5 discusses the importance of the proposed methodology and touches upon its possible future scope.
2 Related Works
In this section, we briefly summarize the various algorithms involved in DNA sequence
encoding and splice junction classification. A comparison of the performance of the
proposed model with the existing relevant algorithms is provided in Sect. 4.2.
Salzberg [4] explained a machine learning system based on decision trees combining 21 coding measures, such as dicodon frequency and hexamer frequency, to produce classifiers on DNA sequences of lengths 54, 108 and 162 base pairs. Nguyen
et al. [5] in 2016, introduced an innovative method to classify DNA sequences using
CNN. A sequence encoding approach was devised similar to ours where one hot
encoding was performed on a group of 3 consecutive nucleotide bases and 2 such
groups were concatenated together to obtain a 2D matrix from the sequence. This
matrix was then processed by the CNN model.
Works based on splice junction prediction include: SVM power series kernels for
classification with 4-position independent K-mer frequency based methods for map-
ping DNA sequences into SVM feature spaces, as described by Damaševicius [6].
Cervantes et al. [7] explained a sequence classification performed by using SVM and
Bayesian classifiers. Kerdprasop et al. [8] described a splice site prediction algorithm
based on association analysis: the frequent DNA patterns were combined and priori-
tized with respect to their annotations and support values based on which several rules
were generated for the classification operation. A hybrid machine learning ensemble model employing an AdaBoost A1DE and a bagging random forest classifier was explained by Mandal [9]. A supervised feature reduction technique was developed using entropy-based fuzzy rough-set theory and was optimized using a greedy hill-climbing algorithm.
In 2015, Lee et al. [10] described an RNN architecture consisting of stacked RNN layers followed by FC layers. Zhang et al. [11] proposed a methodology
involving a deep CNN consisting of 2 layers. Encoding was performed for each sequence using one-hot encoding of the nucleotide symbols A, C, G, T and N (any of the bases).
A comparison of the performance of the proposed methodology with the afore-
mentioned relevant splice junction prediction algorithms is provided in Table 2.
3 Proposed Methodology
Our proposed recurrent neural network has a 3 layer deep architecture which is built
using multiple recurrent units at each layer. In this study, we showcase the performance
of the architecture with GRU, LSTM and basic RNN cells. The encoding process has
also been discussed along with related theories to ensure its reliability. A pictorial run-
down of the proposed methodology is shown in Fig. 2.
It is clear that for the dataset in question we obtain 3 sets of 3175 codon sequences.
The shifting is performed in order to obtain all the possible sequences of codon. The
ignored nucleotide bases in the shifted sequences can be considered to be a part of the
codon preceding or following the given sequence. Custom encoding is then performed
on these sets.
Each codon from the sequences is labelled with respect to its position in the DNA
codon table [13]. Thus, a numeric labelled sequence is obtained from the codon
sequences. This sequence is then converted to a sparse one-hot encoded matrix. The
codons are responsible for protein synthesis and hence for the coding of genetic characteristics, which theoretically justifies the effectiveness of our encoding algorithm. In the following sections we describe the model architecture in detail.
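As an illustration of this encoding, the sketch below labels each codon by its index in a 64-entry codon table and converts the labels into a sparse one-hot matrix, producing one matrix per shift-sequence. The lexicographic codon ordering and the toy sequence are assumptions made purely for illustration; the paper itself uses the standard DNA codon table [13].

from itertools import product
import numpy as np

# 64-entry codon index (lexicographic ordering assumed; the paper uses the DNA codon table [13]).
CODON_INDEX = {"".join(c): i for i, c in enumerate(product("ACGT", repeat=3))}

def shifted_codons(seq, shift):
    """Split a nucleotide string into codons after discarding `shift` leading bases."""
    seq = seq[shift:]
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def one_hot_encode(seq, shift):
    """Return a (num_codons, 64) sparse one-hot matrix for one shift of the sequence."""
    codons = shifted_codons(seq, shift)
    mat = np.zeros((len(codons), 64), dtype=np.float32)
    for row, codon in enumerate(codons):
        mat[row, CODON_INDEX[codon]] = 1.0
    return mat

dna = "ATGCGTACGTTAGC"                                   # toy sequence
encoded = [one_hot_encode(dna, s) for s in (0, 1, 2)]     # the three shift-sequences
print([m.shape for m in encoded])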
Data flow within the model. The model consists of 3 stacks, each consisting of 3 layers, and each layer is made up of 90 basic RNN, GRU or LSTM cells. Each stack i is trained on a particular shift-sequence xi, i.e., the 0-shift sequence is processed by the first stack, the 1-shift sequence by the second, and so on. At a given state, the first layer in a stack receives a one-hot encoded vector of 64 values representing the presence of a particular codon. The 90 hidden units in this layer process the data, and the output of this layer, bi, is forwarded to the next layer; the cell state (along with the cell output in the case of LSTM), ai, is forwarded to the next state. Finally, the output from the last RNN cell layer of each stack is multiplied by a weight vector and Softmax activation is applied in order to obtain the classification probabilities. The mean of the probability values, y, is then taken over the stacks and classification is performed. The performance of this model with LSTM, GRU and basic RNN cells is discussed in detail in the sections below.
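A minimal sketch of this three-stack architecture in Keras, using GRU cells (the study also evaluates LSTM and basic RNN cells), is given below. The layer sizes follow the description above (three layers of 90 units per stack, a 64-dimensional one-hot codon input, and softmax outputs averaged over the stacks); the sequence length, number of classes, optimiser and loss are assumptions for illustration rather than the authors' exact configuration.

import tensorflow as tf

SEQ_LEN, NUM_CLASSES = 20, 3   # codons per sample and number of junction classes (assumed)

def build_stack(name):
    """One stack: three recurrent layers of 90 units followed by a softmax head."""
    inp = tf.keras.Input(shape=(SEQ_LEN, 64), name=f"{name}_codons")
    x = inp
    for layer in range(3):
        # return full sequences for all but the last layer so every layer sees each step
        x = tf.keras.layers.GRU(90, return_sequences=(layer < 2))(x)
    out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return inp, out

inputs, outputs = zip(*(build_stack(f"shift{s}") for s in range(3)))
merged = tf.keras.layers.Average()(list(outputs))   # mean of the three stacks' softmax outputs
model = tf.keras.Model(inputs=list(inputs), outputs=merged)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])
model.summary()

Each input receives the one-hot codon matrix of one shift-sequence, and the averaged softmax output gives the final class probabilities.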
Verification of the model performance is done by monitoring the AUROC score along
with its accuracy and loss. In this section, we compare these performance metrics of the
3 variants of our model and show how our custom embedding is able to optimize the
classification ability of these variants.
Fig. 7. (a) Shows the training accuracy curve (b) Shows the training loss curve
The training accuracy and loss curves are shown in Fig. 7. It is clear that the basic
RNN variant of the network is the most efficient. However, this is subject to the
condition that the sequences are of limited lengths. With increase in length of the
sequence, the basic RNN variant will become prone to the problem of vanishing
gradient and hence, the LSTM/GRU variants are recommended instead.
Finally, a comparison of our proposed methodology with the existing statistical
models with respect to accuracy, for the classification task in question has been shown
in Table 2.
Table 2. Summary of some of the existing statistical models used for splice junction prediction

Author                 | Year | Employed methodology                                        | Accuracy
Cervantes et al. [7]   | 2009 | Sparse encoding with SVM and Bayesian classifier            | 98.2%
Kerdprasop et al. [8]  | 2010 | Association analysis over frequent DNA patterns             | 96.1%
Mandal [9]             | 2014 | Ensemble model using AdaBoost and random forest classifier  | 99.67%
Lee et al. [10]        | 2015 | RNN model with ReLU, LSTM and GRU                           | 94.3%
Zhang et al. [11]      | 2016 | Multilayered convolutional neural network                   | 96.1%
Proposed methodology   | –    | Multilayered RNN model                                      | 99.95%
In this study, we explored our versatile DNA sequence encoding algorithm along with a state-of-the-art model for the classification of splice junctions in DNA sequences. The encoding algorithm introduced here could be used to obtain consistent and better performance from existing DNA sequence analysis models for tasks other than the one performed in this study. The consistency of the performance of the variants of the proposed model justifies the validity of the proposed encoding procedure. The proposed network architecture shows ideal compatibility with the aforementioned encoding process and achieves ideal performance scores.
This study can be further extended to implement the aforementioned encoding
algorithm in order to accurately predict nucleosome occupancy, acetylation and
methylation regions in yeast genome sequence data [14], as these factors have a major
impact on nuclear processes involving DNA. This would immensely help to automate
the process of DNA sequence analysis using machine intelligence.
References
1. PREMIER Biosoft: gene splicing overview & techniques. www.premierbiosoft.com/tech_
notes/gene-splicing.html. Accessed 2019
2. Medical Xpress: predicting how splicing errors impact disease risk. https://medicalxpress.
com/news/2018-08-splicing-errors-impact-disease.html. Accessed 2019
3. Murray RK, Bender DA, Botham KM, Kennelly PJ, Rodwell VW, Weil PA. Harpers
illustrated biochemistry, pp 352–354. Accessed 2019
4. Salzberg S (1995) Locating protein coding regions in human DNA using a decision tree
algorithm. J Comput Biol: J Comput Mol Cell Biol 2:473–485
5. Ngoc Giang N, Anh Tran V, Luu Ngo D, Phan D, Lumbanraja F, Faisal MR, Abapihi B,
Kubo M, Satou K (2016) DNA sequence classification by convolutional neural network.
J Biomed Sci Eng 9:280–286
6. Damaševicius R (2008) Splice site recognition in DNA sequences using k-mer frequency
based mapping for support vector machine with power series kernel. In: International
conference on complex, intelligent and software intensive systems, pp 687–692, March 2008
7. Cervantes J, Li X, Yu W (2009) Splice site detection in DNA sequences using a fast
classification algorithm. In: SMC’09 Proceedings of the 2009 IEEE international conference
on systems, man and cybernetics, pp 2683–2688, October 2009
8. Kerdprasop N, Kerdprasop K (2010) A high recall DNA splice site prediction based on
association analysis. In: International conference on applied computer science proceedings
9. Mandal DI (2015) A novel approach for predicting DNA splice junctions using hybrid
machine learning algorithms. Soft Comput 19:3431–3444
10. Lee B, Lee T, Na B, Yoon S (2015) DNA-level splice junction prediction using deep
recurrent neural networks. CoRR abs/1512.05135
11. Zhang Y, Liu X, MacLeod JN, Liu J (2016) Deepsplice: deep classification of novel splice
junctions revealed by RNA-seq. In: 2016 IEEE international conference on bioinformatics
and biomedicine (BIBM), pp 330–333. IEEE, December 2016
12. NCBI: Genbank. ftp://ftp.ncbi.nlm.nih.gov/genbank. Accessed 2019
13. Revolvy: Dna codon table. https://www.revolvy.com/page/DNA-codon-table. Accessed
2019
14. Pham TH, Tran DH, Ho TB, Satou K, Valiente G. Qualitatively predicting acetylation &
methylation areas in DNA sequences. http://www.jaist.ac.jp/~tran/nucleosome/index.htm.
Accessed 2019
Development of an Inexpensive Proficient
Smart Walking Stick for Visually
Impaired Peoples
1 Introduction
According to the WHO (2018), around 1.3 billion people in the world today live with some form of vision impairment, of whom 188.5 million have mild visual impairment, 217 million have moderate to severe vision impairment, and 36 million are blind [1]. Blind or severely vision-impaired people are often treated as a burden to their family as well as to their country. They face problems mainly outside the home because of their lack of sight, so they always require someone to help them move about. Unfortunately, this situation is even more difficult in developing countries, where it binds them to a miserable life: the very simple canes they use cannot assist them properly, and consequently they face accidents frequently. Moreover, because of their financial condition they cannot afford highly technological and expensive sticks. This model of blind stick, with a very nifty design and built-in intelligence, is affordable for all. It is far handier and lower in cost than other developed sticks. It not only
detects obstacles but also directs the user along the path by measuring distances. Moreover, it senses water, which is a great advantage for the user in avoiding mud and water sources. Whenever the sensors detect difficulties, the stick directs the person onto the right path by producing the corresponding vibration and distinct beep sounds.
2 Related Works
There are some existing systems that can be used, but they have drawbacks. Several works on smart walking sticks have been carried out using different types of components and modules. Some of them are very clever but costly, and some have a complex structure. This model has a very simple design with intelligence, and most significantly it is affordable to all.
A microcontroller, three IR sensors for three sides, and two speakers were used in one model, which does not detect water, so there is a risk around water [2]. In another work, an ultrasonic sensor, water sensor, Arduino and a Bluetooth module with an Android application were used to send voice messages; the authors also used GPS for safe navigation of blind people to their destination. All these features made it smart, but overpriced [3]. Another model involved an IR sensor and an earphone to deliver speech, and the authors claimed it was cheap and user friendly, although the estimated cost of around $120 does not seem so cheap [4]. Another group designed a clever Android-based cane consisting of two parts: an Android application to obtain information from the phone about location, longitude, etc., and a hardware part including an Arduino and sensors (ultrasonic and IR). It is an eminent design, mostly applicable for navigation, but expensive [5]. An expensive but smart cane using a microcontroller, GPS-GSM module, RF module and sensors (water, ultrasonic) was also proposed [6]. Finally, a good cane model was proposed which provides communication through calls using a system called "e-SOS (electronic save our soul)"; it uses a Raspberry Pi 3, a Raspberry Pi camera and an ultrasonic sensor. The approach is appreciable, but those costly components make it expensive [7]. All these sticks are well structured and functionally capable, but they are not affordable to all because of the components and technology used. So our aim is to provide a smart stick at low cost.
3 Tools
We have used several basic components to develop the circuitry of this smart cane, such as a microcontroller, ultrasonic sensors, a push button, a 9 V battery, a voltage regulator, an ADC, resistors, capacitors, a buzzer, a vibration motor and connecting wire. The major components are described in Table 1 and the block diagram of the system is shown in Fig. 1.
[Rows of Table 1 for the microcontroller (Fig. 2), the ultrasonic sensor (Fig. 3) [7] and the battery (Fig. 4), with their functions and figures, appear here.]
Table 1. (continued)

Name of the part | Function                                                                                  | Figure
Water sensor     | Mainly used to detect the presence of water, to keep the person safe from mud and slippery ground | Fig. 5
4 Techniques
4.1 Obstacle Detection
Obstacles are detected using ultrasonic (HC-SR04) sensors. An ultrasonic sensor is a transducer that continuously sends and receives echoes while it is active. In this model three sensors are used on three sides (right, front and left) of the stick. These sensors produce ultrasound waves which are reflected back if there is an obstacle within range (2.5 m); that is, if there is an obstacle within 2.5 m of the stick, the sensor receives the reflected echo. The sensors then measure the distance of the obstacle from the time lapse between sending and receiving the echo, following these equations.
According to Laplace, the speed of a longitudinal wave is determined by

u = \sqrt{\frac{\gamma P}{\rho}}

and the distance travelled in time t is S = ut, so

S = \sqrt{\frac{\gamma P}{\rho}}\, t

where t is the time lapse, P the pressure, ρ the density and γ a dimensionless quantity [7, 8].
According to the law of motion, S = ut + \frac{1}{2}at^{2}; here the acceleration a = 0, so S = ut.
Considering that the measured time covers both sending and receiving of the echo, the distance of the obstacle from the stick is determined by

S = \frac{ut}{2} = \frac{t}{2}\sqrt{\frac{\gamma P}{\rho}}
The output of an ultrasonic sensor is analog, so this information is converted to digital using an ADC (Analog to Digital Converter) and then delivered to the microcontroller for further processing (Fig. 6).
4.2 Processing
The main processing of the data collected from the sensors is done by a microcontroller of the PIC family. When the information is conveyed to the microcontroller, it is processed following an algorithm to generate a decision. A micro C program was developed in the MikroC PRO IDE, version 7.1; the working procedure of this program follows the flow chart given in Fig. 7.
For instance, if there are obstacles on the right and front sides, the program makes the buzzer beep twice and the motor vibrate faster. The person then understands that there are obstacles on the right and front sides and should therefore move to the left. The whole process follows the flow chart to generate different decisions depending on the situation (Fig. 8).
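The flow-chart decisions can be illustrated with a short sketch. The actual firmware is a MikroC program running on the PIC microcontroller; the Python below merely re-expresses the decision rules in readable form, using the 2.5 m detection range and the distance relation S = ut/2 from the text, with the specific beep and vibration patterns assumed for illustration.

SOUND_SPEED = 343.0   # m/s, approximate speed of sound in air
RANGE_M = 2.5         # detection range used in the paper

def echo_to_distance(echo_time_s):
    """Obstacle distance from a round-trip echo time, S = u*t/2."""
    return SOUND_SPEED * echo_time_s / 2.0

def decide(left_m, front_m, right_m, water_detected):
    """Map sensor readings to user feedback (beep/vibration patterns are assumed examples)."""
    blocked = {side for side, d in (("left", left_m), ("front", front_m), ("right", right_m))
               if d is not None and d < RANGE_M}
    if water_detected:
        return "long beep + vibration: water ahead"
    if blocked == {"front", "right"}:
        return "two beeps + fast vibration: move left"
    if blocked == {"front", "left"}:
        return "two beeps + fast vibration: move right"
    if "front" in blocked:
        return "one beep + vibration: obstacle ahead"
    if not blocked:
        return "no feedback: path clear"
    return "short beep: obstacle to the " + " and ".join(sorted(blocked))

print(decide(echo_to_distance(0.004), echo_to_distance(0.003), None, False))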
Sensitivity is calculated as

S_{n} = \frac{TruePositive}{TruePositive + FalseNegative}
To check the accuracy of this model, it was tested in different situations using different obstacles. The obtained results and the corresponding bar diagram are given in Table 2 and Fig. 9 respectively.
This is a very practical design with a fairly fast response; the measured response times in different situations are given in Table 3.
Nothing has only advantages; there are always some drawbacks. In the case of this stick, the average error (19%) in detecting several obstacles was not very high, and it can be reduced further with more careful construction of the system. The stick always needs some time to detect an obstacle, but this is in the range of milliseconds, with some exceptions.
6 Conclusion
With the rapid growth of technology, people facilitate their daily lives with many kinds of smart electronic devices, and responsibility towards the impaired population of the world is also growing. In many cases, dependency on such technologies is praiseworthy and assists us in many ways. With this in mind, this low-cost smart walking stick has been developed for visually impaired people and tested with real impaired users. Though several designs of smart sticks for blind people are available, affordability for all is yet to be achieved; keeping that in mind, this stick was designed to be simply structured, inexpensive, feasible and user friendly, so that it can be affordable for the mass of people who are visually impaired. The stick detects objects quickly and effectively. Thus, this smart walking stick is practical, and such a stick will clearly improve the quality of daily life of visually impaired people worldwide in the near future.
Acknowledgments. The authors would like to acknowledge the support from the Biomedical
Engineering Department, Military Institute of Science and Technology (MIST), Bangladesh.
Conflict of interest. The Authors declare that they have no conflict of interest.
References
1. Bourne RRA, Flaxman SR, Braithwaite T, Cicinelli MV, Das A, Jonas JB et al (2017)
Magnitude, temporal trends, and projections of the global prevalence of blindness and
distance and near vision impairment: a systematic review and meta-analysis. Lancet Glob
Health 5(9):888–897 Vision Loss Expert Group
2. Al-Fahoum AS, Al-Hmoud HB, Al-Fraihat AA (2013) A smart infrared microcontroller-
based blind guidance system. Act Passiv Electron Compon 2013:7 Article ID 726480,
Hindawi Publishing Corporation
3. Jawale RV, Kadam MV, Gaikawad RS, Kondaka LS (2017) Ultrasonic navigation based
blind aid for visually impaired. In: IEEE international conference on power, control, signals
and instrumentation engineering (ICPCSI)
4. Nada AA, Fakhr MA, Seddik AF (2015) Assistive infrared sensor based smart stick for blind
people. In: Science and information conference (SAI)
5. Mashat MD, Albani AA (2017) Intelligent blind cane system. In: UKSim-AMSS 19th
international conference on computer modelling & simulation (UKSim)
6. Agrawal MP, Gupta AR (2018) Smart stick for the blind and visually impaired people. In:
Second international conference on inventive communication and computational technologies
(ICICCT)
7. Mohapatra S, ham Rout S, Tripathi V, Saxena T, Karuna Y (2018) Smart walking stick for
blind integrated with SOS navigation system. In: 2nd international conference on trends in
electronics and informatics (ICOEI)
8. Hollis DB (2011) An e-SOS for cyberspace. Harv Int LJ 52:373
9. Kanagaratnam K (2009) Smart mobility cane: design of obstacle detection. EE 4BI6 Electrical
Engineering Biomedical Capstones
A Continuum Model and Numerical
Simulation for Avascular Tumor Growth
1 Introduction
Initially, tumor growth does not have any direct vascular support. It takes necessary
oxygen and nutrients for sustainable unbounded growth from the surrounding micro-
environment through passive diffusion [1, 2]. The metabolic consumption of an
avascular tumor nodule also grows with its volume proportionally, but actually it
absorbs oxygen and nutrients proportionate to its surface area [3]. So, after a certain
time, due to the deprivation of oxygen and nutrients tumor growth will be stagnant
(1–2 mm in radius approx.).
Oxygen and nutrients deficiency gradually increase among the tumor cells with
distance from the outer layer of the tumor towards its center. At the center of the tumor
deficiency level will be the maximum. The deficiency of oxygen and nutrient within
tumor cells divides the tumor into three different layers; though these layers are not
clearly separated [4]. The outer layer mostly consists of proliferative cells and the inner
most contains only the dead cells (necrotic core). The layer in between them is called
the quiescent layer (collection of hypoxic cells) which are alive but do not divide. In
this study, we consider quiescent cells consume less oxygen and nutrients compared to
the proliferative cells.
Mathematicians have studied avascular tumor growth extensively since the 1950s and developed various models from different perspectives. Greenspan [5] developed the first tumor growth model with proliferative, quiescent, and necrotic cell zones. Later researchers adapted this framework and tried different modifications; for example, Ward and King [6] developed tumor growth models in terms of dead and live cell densities and considered that the cell population fills space with new cells through cell division. Later, they [7] extended their work and included cell motility in the tumor spheroid. Sherratt and Chaplain [8] developed a spatio-temporal model that considers different types of cells in the tumor spheroid (which are not sharply divided into layers), with tumor growth driven by cell movements under the influence of nutrient concentration.
In this study, we consider that the avascular tumor (only for epithelium tissue) is
in vivo and disk shaped. Oxygen and nutrients synthesize the structural support of the
tumor cells. Diffusion and convection processes in biological systems are very complex
as most of the transportations pass through cellular membranes which are nonhomo-
geneous in nature. From the studies of the past few decades, it has been shown that
entity concentrations passing through heterogeneous media are anomalous or non-
Fickian in nature with a long leading or trailing edges [9, 10]. Within the cellular
membrane, diffusion coefficient (constant) alone cannot describe the diffusion process.
It changes with the spatial coordinates as the structural complexity varies close to the
membrane surface [10].
The aim of this research work is to develop a mathematical model based on coupled fractional advection-diffusion equations (FADE) from a phenomenological point of view. Initially, we develop a two-dimensional (in the polar coordinate system) spatio-temporal model based on a simple advection-diffusion equation, assuming less sharp demarcations between the different cell layers. Afterwards we modify the basic model using the FADE. We include a memory-based diffusion process [10] to handle the non-Fickian nature of the process and also include a suitable parameter to express skewness in diffusion. Memory formalism in the FADE is not adequate to model a system as complex as the tumor microenvironment at the microscopic level, where several molecular activities are involved; at the macroscopic level, however, an FADE-based model offers a more realistic description of the overall system.
This paper is organized as follows: Sect. 2 describes the two dimensional model
based on simple advection-diffusion equation in polar coordinate system; modification
of the model with respect to anomalous diffusion is presented in Sect. 3, Sect. 4 is
concerned with parameter estimation and model evaluation, and Sect. 5 concludes the
paper.
We consider that the tumor is disk shaped. Considering the radial symmetry, we
assume that p(r, t), q(r, t), and n(r, t) denote proliferative, quiescent, and necrotic cell
concentrations respectively. Here, r denotes the spatial domain in polar coordinate
system, and t indicates time. The tumor grows due to diffusive and convective force.
The movement of extracellular matrix (ECM) surrounding the tumor is responsible for
convection. In this model ve denotes the velocity of the ECM. While the distinctions between these three layers are not sharp, the presence of one layer restricts the
movement of the other layers. We assume that necrotic cells cannot migrate, divide or
do not consume oxygen or nutrients. Hence, no cell flux is required for necrotic core as
they are collection of dead cells. In this study we also include parameters (ap, aq) to
handle the death rates due to apoptosis in the proliferative as well as quiescent cells.
We assume that the values of ap and aq are the same.
We further assume that oxygen (co(r, t)) and nutrient (cn(r, t)) concentrations are
different entities. The tumor cell divisions, proliferation, transformation into quiescent
or necrotic cells are controlled by concentration levels of oxygen and nutrients. Hence,
all the parameters: proliferation rate, proliferative to quiescent and quiescent to necrotic
transformation rates should be accompanied by co and cn or some function of co and cn.
Under these assumptions we develop the following system Eq. (1),
\frac{\partial p}{\partial t} = \frac{\partial}{\partial r}\left( D_p \frac{\partial p}{\partial r} - v_e p \right) + \frac{1}{r}\left( D_p \frac{\partial p}{\partial r} - v_e p \right) + a\,h_1(c_o, c_n)\,p - b\,h_2(c_o, c_n)\,p - a_p\,p

\frac{\partial q}{\partial t} = \frac{\partial}{\partial r}\left( D_q \frac{\partial q}{\partial r} - v_e q \right) + \frac{1}{r}\left( D_q \frac{\partial q}{\partial r} - v_e q \right) + b\,h_2(c_o, c_n)\,p - c\,h_3(c_o, c_n)\,q - a_q\,q

\frac{\partial n}{\partial t} = c\,h_3(c_o, c_n)\,q \qquad (1)

\frac{\partial c_o}{\partial t} = \frac{\partial}{\partial r}\left( D_o \frac{\partial c_o}{\partial r} - v_e c_o \right) + \frac{1}{r}\left( D_o \frac{\partial c_o}{\partial r} - v_e c_o \right) + \mu_o c_o - k_1 c_o - k_2 p c_o - k_3 q c_o

\frac{\partial c_n}{\partial t} = \frac{\partial}{\partial r}\left( D_n \frac{\partial c_n}{\partial r} - v_e c_n \right) + \frac{1}{r}\left( D_n \frac{\partial c_n}{\partial r} - v_e c_n \right) + \mu_n c_n - w_1 c_n - w_2 p c_n - w_3 q c_n
where Dp, Dq, Do, and Dn are the diffusion coefficients of the proliferative cells, quiescent cells, oxygen, and nutrient concentrations respectively; a, b, and c are the rates of proliferation, proliferative-to-quiescent, and quiescent-to-necrotic transformation; µo and µn are scalars controlling the levels of oxygen and nutrients at any point in the domain of interest; and k1, k2, k3 and w1, w2, w3 express the losses due to consumption by the proliferative and quiescent cells. The system of Eq. (1) contains three functions h1, h2, and h3. We consider,
Generally, biological transport passes through cell membranes, which are porous media and heterogeneous in nature. In this work, the concentration profiles of the diffusing tumor cells, oxygen, and nutrients inside an organ/tissue have been calculated on the basis of the FADE by introducing memory formalism (diffusion with memory). Diffusion with memory depends on the past behaviour of the function itself [10]. This approach generalizes the classical diffusion models to more complex systems, where the diffusion coefficients are functions of co and cn, as both vary over the spatial domain. Skewness in diffusion may also be considered through a suitable parameter (u). FADE with fixed order has shown certain advantages in modelling anomalous (non-Fickian) diffusion to some extent [11, 12]. In this phase, we modify model (1) and
include an FADE-based model using the Caputo definition of the fractional derivative. We use an unconditionally stable finite element method (FEM) [13] to solve the FADE.
As the medium of convection-diffusion in a biological system is porous, it is also assumed that the diffusion coefficient and convective velocity are related as:
Here, we have used h instead of h(co, cn), and hq instead of hq(co, cn) for the
purpose of clarity. ∂q/∂t, ∂n/∂t, ∂co/∂t, and ∂cn/∂t also look similar to ∂p/∂t.
We consider,
We assume that the minimum distance from the tumor center (r = 0) to its nearest blood vessel is d. Therefore, we consider a circular domain of radius d in which the tumor has grown. Here, r is the radial direction from the center (r = 0) towards the boundary (r = d) of the disk-shaped domain. We non-dimensionalize the system of Eq. (5) by rescaling distance with d and time with τ = d²/Do0. The proliferative cell, quiescent cell, necrotic cell, oxygen, and nutrient concentrations are rescaled with p0, q0, n0, c1, and c2 respectively (where p0, q0, n0, c1, and c2 are appropriate reference values). Therefore, p* = p/p0, q* = q/q0, n* = n/n0, c*o = co/c1, c*n = cn/c2, t* = t/τ. The new system of equations becomes (dropping the stars),
\frac{\partial p}{\partial t} = D_1 h^{q} \frac{\partial^{2} p}{\partial r^{2}} - v h \frac{\partial p}{\partial r} + \frac{1}{r}\left( D_1 h^{q} \frac{\partial p}{\partial r} - v h p \right) + D_1 \left( \frac{\partial h^{q}}{\partial c_o}\frac{\partial c_o}{\partial r} + \frac{\partial h^{q}}{\partial c_n}\frac{\partial c_n}{\partial r} \right)\frac{\partial p}{\partial r} - v \left( \frac{\partial h}{\partial c_o}\frac{\partial c_o}{\partial r} + \frac{\partial h}{\partial c_n}\frac{\partial c_n}{\partial r} \right) p + a\,h_1(c_o, c_n)\,p - b\,h_2(c_o, c_n)\,p - a_p\,p \qquad (7)
The rest of the equations (∂q/∂t, ∂n/∂t, ∂co/∂t, and ∂cn/∂t) also look similar to ∂p/∂t, where a* = a d²/Do0; b* = b d²/Do0; a*p = ap d²/Do0; η* = b d²/(Do0 q0); a*q = aq d²/Do0;
where φ is the fractional order and u (−1 ≤ u ≤ 1) is the skewness parameter. D^φ_L and D^φ_R are the left- and right-handed fractional derivatives respectively, and L (L = 1) and R (R = 2) are the corresponding lower and upper bounds of φ. The left- and right-hand derivatives are defined as

D_L^{\varphi} = \frac{\partial^{\varphi}}{\partial r^{\varphi}} \quad \text{and} \quad D_R^{\varphi} = \frac{\partial^{\varphi}}{\partial (-r)^{\varphi}} \qquad (9)

and

\frac{\partial^{\xi} p}{\partial r^{\xi}} = \frac{1+u}{2}\, D_L^{\xi}(p) - \frac{1-u}{2}\, D_R^{\xi}(p), \qquad 0 < \xi \le 1 \qquad (10)

where ξ (= φ − 1) is the fractional order, D^ξ_L and D^ξ_R are the left- and right-handed fractional derivatives respectively, and L (L = 0) and R (R = 1) are the corresponding lower and upper bounds of ξ. The left- and right-hand derivatives are defined as

D_L^{\xi} = \frac{\partial^{\xi}}{\partial r^{\xi}} \quad \text{and} \quad D_R^{\xi} = \frac{\partial^{\xi}}{\partial (-r)^{\xi}} \qquad (11)
In this study we have used referenced or previously estimated values from experimental data wherever possible. All the experiments are done on a 10⁻² mm scale. As mentioned before, at the avascular stage a tumor can grow to at most 1–2 mm in radius, so we take d = 2.50 mm (i.e., 250 units on the 10⁻² mm scale). As our study concentrates on a tumor in epithelium tissue, the diffusivity constants of the proliferative and quiescent cells are taken to be the same as those of epithelium cells; for this study we use Dp0 = Dq0 = 3.5 × 10⁻¹¹ cm² s⁻¹ [16]. We further take the diffusivity of nutrients (Dn0) as 1.1 × 10⁻⁶ cm² s⁻¹ [17] and that of oxygen (Do0) as 1.82 × 10⁻⁶ cm² s⁻¹ (estimated).
For simulation purposes we use proliferation rate a = 1.1 d⁻¹ [18], c = 3.8 × 10⁻⁶ s⁻¹ [19], and apoptosis rates ap = aq = 4 × 10⁻¹⁰ s⁻¹ [19]. We could not find any reference for b; hence its value is taken as 1.7 × 10⁻⁴ s⁻¹. We also use ve0 = 2.4 × 10⁻¹⁰ cm s⁻¹, s1 = 0.25, s2 = 0.30, µo = µn = 1.0 d⁻¹, and the consumption rates k1 = w1 = 1.0 d⁻¹, k2 = w2 = 0.8 d⁻¹, and k3 = w3 = 0.3 d⁻¹. The temporal and spatial step sizes for our simulation are Δt = 0.004 and Δr = 1 respectively. We simulate the model with p0 = 1, q0 = 2.25, n0 = 1.5, c1 = 1, c2 = 1, and assume h = 1 and u = 0.5 in this study.
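For orientation only, the sketch below shows an explicit finite-difference update for the classical (non-fractional) radial advection-diffusion operator of a single species, such as the proliferative cells. It is not the solver used in this work, which solves the full FADE system with an unconditionally stable finite element method [13]; the grid sizes match the step sizes quoted above, but every other parameter value here is a placeholder.

import numpy as np

Nr, dr, dt = 251, 1.0, 0.004            # grid points and step sizes (dimensionless, as above)
D, v = 1.0e-3, 1.0e-4                   # diffusion coefficient and convective speed (placeholders)
a, ap = 0.05, 1.0e-4                    # proliferation and apoptosis rates (placeholders)

r = np.arange(1, Nr + 1) * dr           # radial coordinate, kept > 0 to avoid division by zero
p = np.exp(-((r - 5.0) ** 2) / 10.0)    # assumed initial proliferative-cell profile

def step(p):
    flux = D * np.gradient(p, dr) - v * p        # J = D dp/dr - v p
    dpdt = np.gradient(flux, dr) + flux / r      # dJ/dr + J/r, the radial ADE operator
    dpdt += a * p - ap * p                       # simple growth and apoptosis source terms
    return p + dt * dpdt                         # explicit Euler update

for _ in range(1000):
    p = step(p)
print(p.max())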
Fig. 1. (a) Proliferative (p/p0), (b) quiescent (q/q0), and (c) necrotic (n/n0) cell concentration
waves at different time intervals with respect to the distance (d) from tumor center.
Fig. 2. (a) Represents proliferative, quiescent, and necrotic cell concentrations at 3500th day;
(b) oxygen (co/c1), and (c) nutrients (cn/c2) concentrations in different time intervals.
and quiescent (hypoxic) cells residing at that place are transformed into necrotic cells. With time the necrotic core increases rapidly and reaches approximately 1.4 mm in radius (Fig. 1(c)), whereas hypoxic cells extend to approximately 1.65 mm from the tumor centre (Fig. 1(b)). This means that the necrotic core occupies more than 70 percent of the area, whereas proliferative cells occupy about 17.5% and quiescent cells only about 12.5% of the area of an avascular tumor. The outer surface of the tumor always contains proliferative cells at higher concentration. The overlapping areas between the proliferative and quiescent, and quiescent and necrotic cells in Fig. 2(a) indicate
Fig. 3. (a) Proliferative (p/p0), (b) quiescent (q/q0) and (c) necrotic (n/n0) cell concentration waves for different-order FADE on the 3500th day (u = 0.5 and φ varying from 1.5 to 1.95).
that at any time the boundary between the two layers is not sharp. Tumor regression is not seen during the tumor's lifetime. The above simulation was performed with φ = 1.85.
The FADE model is also tested with different orders φ = 1.5, 1.65, 1.75, 1.85, and 1.95. It can be seen (Fig. 3(a)–(c)) that varying the order does not affect the proliferative, quiescent, and necrotic cell concentrations much in terms of intensity, except when φ = 1.5. It is also clear that the simple ADE always underestimates the radius of the tumor compared with the FADE-based model for the same set of parameters; that is, the tumor grows faster in the FADE-based model than in the simple ADE-based model. Not only the tumor radius but also the quiescent and necrotic cell concentrations increase, with the necrotic cells occupying almost 3/5 of the tumor spheroid. In the FADE we have also considered the porosity and dynamic behaviour of cell membranes, so the FADE-based model represents tumor growth more realistically than the simple ADE-based model.
5 Conclusion
References
1. Sutherland RM (1988) Cell and environment interactions in tumor microregions: the
multicell spheroid model. Science 240(4849):177–184
2. van Kempen LCL, Leenders WPJ (2006) Tumours can adapt to anti-angiogenic therapy
depending on the stromal context: lessons from endothelial cell biology. Eur J Cell Biol 85
(2):61–68
3. Orme ME, Chaplain MAJ (1996) A mathematical model of the first steps of tumour-related
angiogenesis: capillary sprout formation and secondary branching. Math Med Biol: J IMA
13(2):73–98
4. Hystad ME, Rofstad EK (1994) Oxygen consumption rate and mitochondrial density in
human melanoma monolayer cultures and multicellular spheroids. Int J Cancer 57(4):532–
537
5. Greenspan HP (1972) Models for the growth of a solid tumor by diffusion. Stud Appl Math
51(4):317–340
6. Ward JP, King JR (1997) Mathematical modelling of avascular-tumour growth. Math Med
Biol: J IMA 14(1):39–69
7. Ward JP, King JR (1999) Mathematical modelling of avascular-tumour growth II: modelling
growth saturation. Math Med Biol: J IMA 16(2):171–211
8. Sherratt JA, Chaplain MAJ (2001) A new mathematical model for avascular tumour growth.
J Math Biol 43(4):291–312
9. Gal N, Weihs D (2010) Experimental evidence of strong anomalous diffusion in living cells.
Phys Rev E81(2):020903
10. Caputo M, Cametti C (2008) Diffusion with memory in two cases of biological interest.
J Theor Biol 254(3):697–703
11. Morales-Casique E, Neuman SP, Guadagnini A (2006) Non-local and localized analyses of
non-reactive solute transport in bounded randomly heterogeneous porous media: theoretical
framework. Adv Water Resour 29(8):1238–1255
12. Cushman JH, Ginn TR (2000) Fractional advection-dispersion equation: a classical mass
balance with convolution-Fickian flux. Water Resour Res 36(12):3763–3766
13. Roop JP (2006) Computational aspects of FEM approximation of fractional advection
dispersion equations on bounded domains in R2. J Comput Appl Math 193(1):243–268
14. Chen W, Sun H, Zhang X, Korošak D (2010) Anomalous diffusion modeling by fractal and
fractional derivatives. Comput Math Appl 59(5):1754–1758
15. Meerschaert MM, Tadjeran C (2006) Finite difference approximations for two-sided space-
fractional partial differential equations. Appl Numer Math 56(1):80–90
16. Sherratt JA, Murray JD (1991) Mathematical analysis of a basic model for epidermal wound
healing. J Math Biol 29(5):389–404
17. Casciari JJ, Sotirchos SV, Sutherland RM (1988) Glucose diffusivity in multicellular tumor
spheroids. Can Res 48(14):3905–3909
18. Burton AC (1966) Rate of growth of solid tumours as a problem of diffusion. Growth 30
(2):157–176
19. Busini V, Arosio P, Masi M (2007) Mechanistic modelling of avascular tumor growth and
pharmacokinetics influence—Part I. Chem Eng Sci 62(7):1877–1886
20. Notes of oncologist. https://notesofoncologist.com/2018/02/26/how-fast-do-tumours-grow/ .
Accessed 28 Jan 2019
A Review on Methods of Treatment
for Diabetic Foot Ulcer
1 Introduction
Diabetes is one of the most widespread chronic diseases around the world. India is
termed as Diabetes capital of the world as it ranks second in diabetic cases globally.
This is due to the changing lifestyle, lack of physical work, unbalanced diet, i.e., intake
of food that is rich in energy (sugar and unsaturated fat) and poor in nutrients.
As of 2016, approximately 425 million people worldwide suffered from diabetes, of whom around 73 million were in India. According to the World Health Organization (WHO), this number is estimated to rise to 87 million by 2030. Almost 15% of patients suffering from diabetes develop a foot ulcer in their lifetime. Normal foot ulcers can heal, but a DFU often fails to heal due to poor circulation and damage to peripheral nerve endings. Patients suffering from diabetes experience metabolic disorders that disturb the normal wound healing process. As a result, diabetic foot ulcers may take a long time to heal (and only when proper wound care is taken) or sometimes lead to amputation of the damaged part. Various statistics show that 84% of DFU cases result in amputation of the patient's leg; hence the urgency of healing DFU. Most studies [1] suggest that DFUs are common in people with ischemic, neuropathic or combined neuro-ischemic abnormalities. Only 10% of DFUs are purely ischemic ulcers; 90% are caused by neuropathy, alone or together with ischemia. Diabetic patients with
peripheral sensorimotor and autonomic neuropathy are at high risk of developing foot
ulcers, as they experience high foot pressure, deformities in the foot, and gait
instability.
Yazdanpanah [1] and Alavi [2], in their studies, suggested that the healing rate can be increased by proper management of DFU, which helps significantly reduce or prevent complications such as infection, amputation, and sometimes even death of the patient.
Various studies [1–9] reported that healing of DFU can be improved by controlling blood sugar levels, proper wound care and dead tissue removal, pressure off-loading techniques, wound debridement, skin grafting, antibiotic therapy, HBOT, NPWT, and electrical stimulation, alone or along with local or global heating.
2.2 Off-Loading
The off-loading technique, also known as pressure modulation [1], is mostly used for the management of neuropathic ulcers in diabetic patients. Though many other techniques are currently in use, only a few studies describe the frequency and rate of wound healing associated with them [4]. The physical characteristics of the patient and the ability to comply with the treatment, along with the location and extent of the ulcer, determine the choice of method.
3 Discussion
In a randomized controlled study by Peters et al. [11], two groups of 20 patients with diabetic foot ulcers were considered. Subjects were excluded if they had a cardiac conduction disorder, malignancy, pregnancy, implants, or any soft-tissue or bone infection. One group received a dosage of 50 V with 80 twin-peak monophasic pulses per second for 10 min, followed by 10 min of 8 pulses per second with a pulse width of 100 µs, at a gap of 40 min. The other group underwent placebo treatment, in which the ES units resembled and operated like the active ES units but did not deliver any current. Both the placebo and treatment groups received traditional wound care, and the wound healing process was evaluated every week. No significant differences were observed between the treatment and placebo groups in the rate of wound healing or the average healing time. Over the 12-week study period, the total change in the cross-sectional area of the ulcer was 86.2% in the treatment group versus 71.4% in the placebo group.
In a randomised study on DFU by Petrofsky et al. [13], the aim was to compare the rate of wound healing when ES is applied along with local or global heat versus heat alone. Twenty subjects with a single chronic, non-healing DFU (grade 2) were considered. Subjects randomly received either local dry heat (37 °C) or local dry heat plus ES, 3 times a week for 4 weeks. The ES used was biphasic sine-wave stimulation with a pulse width of 250 µs at a frequency of 30 Hz and a current of 20 mA. A laser Doppler imager was used to measure skin blood flow in and around the wound. Wound volume and area decreased significantly, by 69.3 ± 27.1% and 68.4 ± 28.6% respectively, in the group receiving local heat + ES over a duration of 1 month. Only a 30.1 ± 22.6% decrease in wound area was observed in the group receiving local heat alone, and even after 2 months those wounds were not completely healed. Though this rate of healing was significant, it was lower than that of the ES + local heat group. Hence local heat applied along with ES has shown better results in healing DFUs than local heat or ES alone.
Electrical Modulation System (FREMS), and another group of 15 patients formed the control group, treated with conventional therapy. Wound surface area, the wound surroundings, and symptoms such as pain were evaluated during the study. A significant increase in healing rate and a decrease in pain were observed in the treatment group compared with the control group.
In the study performed by Lundeberg et al. [22], Electrical Nerve Stimulation (ENS) was used to treat 64 patients with diabetic leg ulcers. Patients with rheumatoid arthritis, osteomyelitis or an ankle pressure below 75 mm Hg were excluded. The 64 patients were divided into two groups of 32: one group was treated with ENS and the other with placebo ENS. Over the 12 weeks of the study, the wound healing rate was 61% in the ENS group and 41% with placebo ENS. The study also suggests that stimulation of 40–60 min for 5–7 days a week is sufficient to enhance healing. A summary of clinical studies using various types of ES to heal DFU is shown in Table 1.
Table 1. (continued)

S. No | Author                | Type of wounds            | Type of ES                          | No. of patients | Parameters of ES        | Duration of ES                          | Results
6     | Houghton et al. [16]  | Chronic leg ulcers        | HPVC                                | 27              | 150 V, 100 Hz, 100 µs   | 45 min, 3 times a week for 4 weeks      | Wound size decreased by approximately half of the initial size
7     | Petrofsky et al. [17] | Diabetic foot ulcers      | ES + local heat vs ES + global heat | 29              | 20 mA, 30 Hz            | 30 min, 3 times a week for 4 weeks      | Healing rate slightly higher in the case of global heat
8     | Kloth et al. [15]     | Stage IV decubitus ulcers | HPVC                                | 16              | 100 V, 105 Hz, 50 µs    | 45 min, 5 days a week                   | 44.8% healing rate was seen in a week
9     | Jancovic et al. [19]  | Chronic leg ulcers        | FREMS                               | 35              | 100 V, 10–40 µs, 1000 Hz | 40 min, 5 times a week for 3 weeks     | Decrease in pain and improved healing rate
10    | Lundeberg et al.      | Diabetic leg ulcers       | ENS vs placebo                      | 64              | Not mentioned           | 40–60 min, 5 times a week, for 12 weeks | Enhanced healing in the case of ENS
4 Conclusion
Conventional methods used for treating DFUs, such as blood sugar control and pressure off-loading, have shown limited results, as they depend on the severity of the wound and the individual characteristics of the patient. The HBOT technique can give potential results as it aims at improving the tissue oxygen level at the wound site, but risks such as injuries to the middle ear, myopia and seizures are associated with it. NPWT has been proven effective on normal wounds, but it does not show significant results when applied to infected wounds, and it is also expensive compared with other methods. ES accelerates wound healing by improving blood flow and promoting cell growth at the site, and on the whole it gives better results in healing DFU. ES applied along with local or global heat significantly improves the rate of healing of DFU compared with electrical stimulation or local heat alone.
References
1. Yazdanpanah L, Nasiri M, Adarvishi S (2015) Literature review on the management of
diabetic foot ulcer. World J Diabetes 6(1):37–53
2. Alavi A, Sibbald RG, Mayer D, Goodman L, Botros M, Armstrong DG, Woo K, Boeni T,
Ayello EA, Kirsner RS (2014) Diabetic foot ulcers: part II. Management. J Am Acad
Dermatol 70:21–e1 [PMID: 24355276]. https://doi.org/10.1016/j.jaad.2013.07.048
3. McMurry JF (1984) Wound healing with diabetes mellitus. Better glucose control for better
wound healing in diabetes. Surg Clin North Am 64:769–778
4. Armstrong DG, Lavery LA, Nixon BP, Boulton AJ (2004) It’s not what you put on, but what
you take off techniques for debriding and off-loading the diabetic foot wound. Clin Infect Dis
39(Suppl 2):S92–S99 [PMID: 15306986]
5. Lebrun E, Tomic-Canic M, Kirsner RS (2010) The role of surgical debridement in healing of
diabetic foot ulcers. Wound Repair Regen 18:433–438 [PMID: 20840517]. https://doi.org/
10.1111/j.1524-475X.2010.00619
6. Jain AC (2014) A new classification (grading system) of debridement in diabetic lower
limbs-an improvization and standardization in practice of diabetic lower limb salvage around
the world. Med Sci 3:991–1001
7. Santema TB, Poyck PP, Ubbink DT (2016) Skin grafting and tissue replacement for treating
foot ulcers in people with diabetes. Cochrane Database Syst Rev 2:CD011255
8. Barnes RC (2006) Point: hyperbaric oxygen is beneficial for diabetic foot wounds. Clin
Infect Dis 43:188–192 [PMID: 16779745]
9. Thackham JA, McElwain DL, Long RJ (2008) The use of hyperbaric oxygen therapy to treat
chronic wounds: a review. Wound Repair Regen 16:321–330 [PMID: 18471250]. https://
doi.org/10.1111/j.1524-475x.2008.00372
10. Vikatmaa P, Juutilainen V, Kuukasjärvi P, Malmivaara A (2008) Negative pressure wound
therapy: a systematic review on effectiveness and safety. Eur J Vasc Endovasc Surg
36:438–448 [PMID: 18675559]
11. Peters EJ, Lavery LA, Armstrong DG, Fleischli JG (2001) Electric stimulation as an adjunct
to heal diabetic foot ulcers: a randomized controlled trial. Arch Phys Med Rehabil
82:721–725
12. Peters EJ, Armstrong DG, Wunderlich RP, Bosma J, StacpooleShea S (1998) The benefit of
electrical stimulation to enhance perfusion in persons with diabetes mellitus. J Foot Ankle
Surg 37:396–400
13. Petrofsky JS, Lawson D, Berk L, Suh H (2010) Enhanced healing of diabetic foot ulcers using
local heat and electrical stimulation for 30 min three times per week. J Diabetes 2:41–46
14. Petrofksy J, Lawson D, Suh H et al (2007) The influence of local versus global heat on the
healing of chronic wounds in patients with diabetes. Diabetes Technol Ther 9:535–545
15. Lawson D, Petrofsky J (2007) A randomized control study on the effect of biphasic electrical
stimulation in a warm room on blood flow and healing rates in chronic wounds of patients
with and without diabetes. Med Sci Monit 13:258–263
16. Wirsing PG, Habrom AD, Zehnder TM, Friedli S, Blatti M (2013) Wireless micro current
stimulation—an innovative electrical stimulation method for the treatment of patients with
leg and diabetic foot ulcers. Int Wound J 12:693–698
17. Burdge JJ, Hartman JF, Wright ML (2009) A study of HVPC as an adjunctive therapy in
limb salvage for chronic diabetic wounds of the lower extremity. Ostomy Wound Manag
55:30–38
18. Baker LL, Chanbers R, Demuth SK, Villar F (1997) Effects of electrical stimulation on
wound healing in patients with diabetic ulcers. Diab Care 20:405–412
19. Houghton PE, Kincaid CB, Lovell M et al (2003) Effect of electrical stimulation on chronic
leg ulcer size and appearance. Phys Ther 83(1):17–28
20. Kloth L, Feedar J (1988) Acceleration of wound healing with high voltage, monophasic,
pulsed current. Phys Ther 68:503–508
21. Janković A, Binić I (2008) Frequency rhythmic electrical modulation system in the treatment
of chronic painful leg ulcers. Arch Dermatol Res 300:377–383
22. Lundeberg T, Eriksson S, Malm M (1992) Electrical nerve stimulation improves healing of
diabetic ulcers. Ann Plast Surg 29:328–331
Feature Extraction of Cardiotocography Signal
1 Introduction
During gestation, fetal heart rate is controlled by the autonomic nervous system and
chemoreceptors present in umbilical artery. The pH of the blood indicates the amount
of oxygen supplied to the fetus. The changes in the blood pH are detected by the
chemoreceptors and the fetal heart rate is regulated by the necessary sympathetic and
parasympathetic stimulation [1]. As the fetus matures its heart rate varies with respect
to the growth of parasympathetic nervous system. Also, fetal heart rate variability
becomes more pronounced. Sympathetic and parasympathetic nervous systems play an
important role in controlling acceleration and deceleration patterns of fetal heart rate
variability. The cerebral cortex, sympathetic ganglia, medulla oblongata and the vagus nerve are the centres responsible for regulating the fetal heart rate [2].
The fetal health assessment by auscultation of the fetal heart was initially described
more than 300 years ago [3]. The information that can be picked up from the maternal
abdominal wall is the electrical potential of the fetal heart activity and fetal heart
sounds. With a fetoscope or stethoscope, a listener can clearly hear and count the fetal heart rate. Electronic fetal monitoring is more sensitive than stethoscopic auscultation
in predicting fetal distress. Fetal distress is the problem frequently encountered during
the labour due to oxygen insufficiency of the fetus. The Fetal Heart Rate
(FHR) monitoring using Cardiotocography is used as a screening tool to identify
possible reasons for fetal distress during labour. In modern obstetrics, FHR variability
analysis is used to identify the risk factors, diagnose possible abnormalities and thereby
help in executing safe labour.
The equipment which simultaneously records the instantaneous fetal heart rate and the uterine activity is called a Cardiotocography (CTG) machine. Two types of measuring electrodes are used to record the fetal electrocardiogram: a maternal abdominal skin electrode, used non-invasively, and a fetal scalp electrode, which is an invasive approach to CTG recording. The cardiotocograph is used to assess the electrical activity of the fetal heart. The fetal heart rate (110–160 bpm) is much higher than the maternal heart rate (70–80 bpm). The amplitude of the fetal heart rate signal is very weak and is affected by various noises and interferences, including power-line interference, random electronic noise, maternal interference and baseline wander, among which the maternal ECG is the most prominent.
2 Methodology
In the invasive method, a scalp electrode passed through the mother's womb is placed on the
head of the fetus. Though it identifies morphological features with great success, this
technique is harmful to the mother, as it can injure the womb, and it also puts undue pressure
on the head of the fetus. Another major drawback is that this method is applicable only during
labour after rupture of the membranes, and hence there is a risk of infection to the mother.
Cardiotocography is a non-invasive fetal monitoring method performed using ultrasonic
transducers to record the fetal heart rate and pressure transducers to record the uterine
contractions. These transducers are strapped onto the mother's abdomen using elastic belts.
This non-invasive approach is easy to use during labour and is non-hazardous to the mother
and the fetus. The transducer directs ultrasonic waves towards the fetal heart, and the
reflected waves are processed to determine the fetal heart rate. Because the rate is derived
from averaged reflections rather than from individual electrical beats, short-term variability
in the FHR and its beat-to-beat traces cannot be assessed accurately by this method. Fetal
and maternal movements also result in artefacts, and hence a continuous and precise record
of the FHR is difficult to obtain. In order to perform automated analysis of the FHR, we used
the CTGOAS software, which is explained in the following section.
2.3 Preprocessing
Preprocessing is an important step in all biomedical signal applications, and feature
extraction values and classification performance are affected by this stage. The steps in
preprocessing are segment selection, artifact rejection, interpolation and detrending. These
steps help in removing and replacing all unexpected changes in the FHR, which are mainly
due to displacement of the transducer, movements of the mother and/or fetus and stress
during labour [4]. A portion of the data is removed as artifacts or missing values; an artifact
rejection scheme is employed to interpolate these values and fill in the missing beats [5].
This interpolation introduces a nonlinearity that is approximately the same for both normal
and abnormal FHR. After the preprocessing stage, the signals are ready to be analyzed.
FHR_{mean} = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x(i)

FHR_{std} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x(i)-\bar{x}\right)^{2}}

D = \frac{1}{M}\sum_{i=1}^{M}\left[\max(x(i)) - \min(x(i))\right]

STV = \frac{1}{24M}\sum_{i=1}^{24M}\left|sm(i+1) - sm(i)\right|

II = \frac{STV}{\mathrm{std}[sm(i)]}
FHR mean (μ) represents the mean value of the FHR, and FHR std (σ) is its standard
deviation. The variability for each minute is considered as delta (D), and M is the total
number of minutes. sm is a vector obtained by collecting one sample every 2.5 s from the
FHR signal. STV is the short-term variability, calculated on the basis of the creation interval
of the sm vector [6]. II is the interval index, which shows the gross change in the FHR.
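As an illustration, these linear indices can be computed directly from the definitions above. The sketch below is a minimal example, assuming an artifact-free, interpolated FHR trace sampled at 4 Hz (the sampling rate is an assumption, not stated in the text); the function and variable names are illustrative only.

import numpy as np

def linear_fhr_indices(fhr, fs=4.0):
    """Linear FHR indices (mean, std, delta, STV, II) from their definitions.
    fs is the assumed sampling rate in Hz."""
    fhr = np.asarray(fhr, dtype=float)
    fhr_mean = fhr.mean()                          # FHRmean
    fhr_std = fhr.std(ddof=1)                      # FHRstd (N - 1 denominator)

    samples_per_min = int(60 * fs)
    m = fhr.size // samples_per_min                # M = whole minutes available
    minutes = fhr[:m * samples_per_min].reshape(m, samples_per_min)
    delta = np.mean(minutes.max(axis=1) - minutes.min(axis=1))   # D

    step = int(2.5 * fs)                           # one sm sample every 2.5 s
    sm = fhr[::step][:24 * m]
    stv = np.mean(np.abs(np.diff(sm)))             # short-term variability
    ii = stv / np.std(sm, ddof=1)                  # interval index

    return fhr_mean, fhr_std, delta, stv, ii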
Non-linear diagnostic parameters such as Approximate Entropy (ApEn), Sample Entropy
(SampEn) and Lempel–Ziv Complexity (LZC) are also calculated by this software. These
indices are very important in FHR signal analysis. Here, entropy quantifies the randomness
of the signal. These non-linear parameters are computed with two values, m (the embedding
dimension) and r (the tolerance threshold).
ApEn(m, r, N) = \frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\ln\left(C_{r}^{m}(i)\right) - \frac{1}{N-m}\sum_{i=1}^{N-m}\ln\left(C_{r}^{m+1}(i)\right)

SampEn(m, r) = \lim_{N\to\infty}\left[-\ln\frac{C_{r}^{m+1}(i)}{C_{r}^{m}(i)}\right]
LZC associates a periodic signal, which repeats the same pattern over time, with low
complexity in the time series [7].
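To make the entropy indices concrete, the following is a minimal sample entropy sketch for the parameter choices quoted later in the text (m = 2 and r equal to 0.15 or 0.2 times the standard deviation); the simple quadratic-time counting loop and the function name are illustrative, not the software's actual implementation.

import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def match_count(dim):
        # count template pairs whose Chebyshev distance stays below r
        templates = np.array([x[i:i + dim] for i in range(n - dim)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist < r)
        return count

    b = match_count(m)        # matches of length m
    a = match_count(m + 1)    # matches of length m + 1
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")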
3 Results
This section discusses the results obtained by analyzing the CTG signals of 552 subjects.
A CTG signal from the CTU-UHB database plotted using the CTGOAS software is shown in
Fig. 1. The raw CTG signal is denoised and shown in Fig. 2. The signals shown in these
figures correspond to a 10-min window.
Eighteen features from four different domains can be extracted and analyzed individually or
collectively using the Feature Extraction menu. Morphological parameters (the baseline and
the number of acceleration and deceleration patterns) are colour coded and displayed. The
mean, variance and other linear indices of the correlation structure of the CTG signal are
calculated as time domain features. Non-linear indices such as ApEn(m,r), SampEn(m,r)
and LZC are calculated as frequency domain features [8]. These morphological, linear and
non-linear indices and the IBTF domain are displayed as shown in Fig. 3.
Fig. 1. CTG recording from the CTU-UHB database plotted using the CTGOAS software
FHR variability can be measured with these indices and is useful in the final stage
of the delivery to clinically interpret the fetal well-being. As the labour proceeds, it is
observed that there is a significant increase in linear domain indices and significant
decrease in the nonlinear indices. Non-linear indices (ApEn(m,r), SampEn(m,r) and
LZC) are derived by direct estimation of the CTG signal and quantify the signal
complexity [9]. Entropy indices are calculated with 0.15 STD and 0.2 STD as the
values set for r, while the value for m is set as 2. These indices are tested in
the antepartum (before labour) period in order to study growth-retarded fetuses in the
uterus and detect variability in the FHR patterns [10, 11]. Thus overall growth of
the fetus can be assessed by using these parameters throughout the gestation period.
The fetus in the womb is supplied with oxygen through the umbilical cord of the mother.
If there is any interruption in the oxygen supply, the fetal blood turns abnormally acidic.
This state of high acidity or low pH is defined as acidemia. This change in fetal blood pH
results in parasympathetic stimulation and increases the fetal heart rate. The relationship
between fetal blood pH and sympathetic and parasympathetic activation is discussed in
detail in “The Porto system for automated cardiotocographic signal analysis” [12]. In the
case of fetal distress, the fetus suffers from oxygen insufficiency, and this condition results
in abnormal heart rate patterns. Thus fetal blood pH values can be considered as a tool to
predict fetal oxygenation and fetal heart rate variability. Based on the values of arterial pH,
the CTU-UHB database categorizes delivery outcome into three types of acidemia: normal
(pH ≥ 7.20), mildly acidemic (MA) (7.10 < pH < 7.20) and moderate-to-severe acidemic
(MSA) (pH ≤ 7.10) [13]. Table 1 shows the perinatal (after labour) features of the different
categories of acidemia. The data in the table are the mean values for these sixty subjects.
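A minimal sketch of this categorization rule, using the pH thresholds quoted above (the function name is illustrative):

def acidemia_category(arterial_ph):
    """Delivery-outcome category from umbilical arterial pH,
    following the CTU-UHB thresholds quoted above."""
    if arterial_ph >= 7.20:
        return "normal"
    if arterial_ph > 7.10:                       # 7.10 < pH < 7.20
        return "mildly acidemic (MA)"
    return "moderate-to-severe acidemic (MSA)"   # pH <= 7.10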
The arterial pH data and CTG recordings of these sixty subjects are compared to study the
change in fetal heart rate variability. Table 2 shows the linear and non-linear parameters
calculated from feature extraction and correlated with the three categories of fetuses
mentioned above.
From the above analysis it is observed that, in the case of severe fetal acidemia, there is a
decrease in the linear indices and an increase in the non-linear indices. Moderate-to-severe
acidemia reflects fetal hypoxia and shows a continuous change in the entropy indices. In
mildly acidemic and normal fetuses, the linear indices are significantly increased and the
nonlinear indices show a significant decrease.
Table 2. Linear and nonlinear parameters calculated for three categories of acidemia.
Parameters MSA MA Normal
Linear parameters:
Mean 127 127.3 129.2
SD 17.9 17.3 17.1
Delta 47.7 49.6 47.5
STV 3.7 5.52 4.56
Interval index 0.25 0.32 0.36
Mean AD 14.04 12.7 10.4
Median AD 11.4 7.9 5.8
Non-linear parameters:
ApEn(2,0.15) 0.036 0.0256 0.209
SampEn(2,0.15) 2.37 2.18 2.14
ApEn(2,0.20) 0.058 0.057 0.048
SampEn(2,0.20) 2.63 2.48 2.39
4 Conclusion
In this work an attempt has been made to detect fetal distress prior to the labour period.
The arterial pH values of the fetuses given in the database are compared with the indices of
fetal heart rate variability calculated above. The changes in pH values are found to be
correlated with the variations in the FHR patterns. The present work mainly focuses on the
early assessment of the autonomic nervous system response to fetal oxygenation in
regulating the fetal heart rate. This work can be further improved by selecting a suitable
signal processing technique to assist clinicians in decision making, which can help prevent
fetal deaths due to misinterpretation of FHR variability.
References
1. Mendez-Bauer C, Arnt IC, Gulin L, Escarcena L, Caldeyro-Barcia R (1967) Relationship
between blood pH and heart rate in the human fetus during labor. Am J Obstet Gynecol 97
(4):530–545
2. Khandoker AH, Karmakar C, Kimura Y, Palaniswami M (2013) Development of fetal heart
rate dynamics before and after 30 and 35 weeks of gestation. Comput Cardiol 40:453–456.
ISSN 2325-8861
3. Khandpur RS (2003) Hand book of biomedical engineering, 2nd edn. Tata McGraw-Hill
Education, New York
4. Ayres-de-Campos D, Spong CY, Chandraharan E (2015) FIGO consensus guidelines on
intrapartum fetal monitoring: cardiotocography. Int J Gynecol Obstet 131(1):13–24
5. Cömert Z, Kocamaz AF (2017) A novel software for comprehensive analysis of
cardiotocography signals “CTG-OAS”. In: International artificial intelligence and data
processing symposium. https://doi.org/10.1109/idap.2017.8090210
6. Cömert Z, Kocamaz AF, Gungor S (2016) Cardiotocography signals with artificial neural
network and extreme learning machine. In: 24th signal processing and communication
application conference (SIU). https://doi.org/10.1109/siu.2016.7496034
7. Ayres-de-Campos D, Rei M, Nunes I, Sousa P, Bernardes J (2017) SisPorto 4.0 - computer
analysis following the 2015 FIGO guidelines for intrapartum fetal monitoring. J Matern Fetal
Neonatal Med 30(1):62–67. https://doi.org/10.3109/14767058.2016.1161750. Epub 2016
Mar 29
8. Cömert Z, Kocamaz AF (2016) Evaluation of fetal distress diagnosis during delivery stages
based on linear and nonlinear features of fetal heart rate for neural network community. Int J
Comput Appl 156(4):26–31
9. Spilka J et al (2012) Using nonlinear features for fetal heart rate classification. Biomed Signal
Process Control 7(4):350–357
10. Signorini MG, Magenes G, Cerutti S, Arduini D (2003) Linear and nonlinear parameters for
the analysis of fetal heart rate signal from cardiotocographic recordings. IEEE Trans Biomed
Eng 50(3):365–374
11. Behar J, Andreotti F, Zaunseder S, Oster J, Clifford GD (2016) A practical guide to non-
invasive foetal electrocardiogram extraction and analysis. Physiol Meas 37(5):1–35
12. Bernardes J, Moura C, de Sa JP, Leite LP (1991) The Porto system for automated
cardiotocographic signal analysis. J Perinat Med 19(1–2):61–65
13. Gonçalves H, Rocha AP, Ayres-de-Campos D, Bernardes J (2006) Linear and nonlinear fetal
heart rate analysis of normal and acidemic fetuses in the minutes preceding delivery. Med
Biol Eng Comput 44(10):847
IoT Aided Non-invasive NIR Blood Glucose
Monitoring Device
1 Introduction
Diabetes Mellitus, commonly known as diabetes, is one of the deadliest chronic diseases
known. It is a metabolic disease which causes high blood sugar levels in the body. Insulin is
a hormone that transports sugar from the bloodstream into cells, where it is stored or used
for energy generation. A person with diabetes cannot produce insulin or cannot effectively
use the insulin produced. According to the World Health Organisation (WHO), the number
of people suffering from diabetes is nearly 425 million. In 2012, 1.5 million people died of
diabetes. Currently, around 73 million Indians are affected by diabetes, which is almost
7.1% of the population of India.
Approximately 1 million deaths occur in India due to type 2 diabetes. If diabetes
remains untreated, it can damage nerves, eyes, kidneys and other organs. Chronic
diabetes may also result in diabetic ulcers and may lead to amputation of legs and
hands of the patients in the later stages. Therefore, there is a need to continuously track
the glucose at every stage as a preventive step against the disease getting worse. The
condition of the subject based on the blood glucose concentration is illustrated in
Table 1. The glucose concentration level in every human depends on different
parameters like physiology, age, sex and so on. The target glucose concentration levels
of different age groups for diabetic patients are given in Table 2.
Table 2. Target blood glucose levels for diabetic patients of different age groups
Age Glucose level in diabetic patients
(mg/dl)
Fasting Before meal After meal
Below 6 years 80–180 100–180 Around 180
6–12 years 80–180 90–180 180
12–19 years 70–150 90–130 150
Above 20 years <100 70–130 <180
2 Literature Survey
A paper on NIR Spectroscopy by Narkhede et al. [1] proposes that spectroscopy works
on the principle of light absorption and the interaction of electromagnetic radiation with
matter. A white light source with different wavelengths is focussed on a specimen.
Upon focussing light onto the specimen, the molecules in the specimen are excited. The
attenuation of the transmitted light is compared with that of the incident light, and an
absorption spectrum is acquired. Spectroscopic studies are mainly done in the NIR and
visible range, i.e., 400–1190 nm. NIR rays propagate further into the blood sample than
visible or other rays. NIR can penetrate to a depth of 1–10 mm into the tissue, and the
penetration depth decreases with increasing wavelength [11].
Pandey et al. [2] suggest that the Raman spectroscopy method can be used, as it allows
greater depth of penetration inside the tissue with great chemical stability. Using a laser,
the sample is irradiated, and a spectrum consisting of the varying molecular energies is
formed. The light scattered from the specimen at varying wavelengths and intensities is
detected.
Researchers have also proposed that fluorescence can be correlated with glucose levels.
Biological tissues in the body exhibit fluorescence at specific frequencies. When a glucose
solution is excited by laser light at 350 nm, fluorescence is detected. The intensity of the
fluorescence depends on the concentration of glucose in the solution.
Pai et al. [9] and Sim et al. [10] proposed photoacoustic spectroscopy, which works on the
principle of the photoacoustic effect: the periodic temperature variations occurring in the
sample when IR light falls on it are measured as pressure fluctuations using a microphone.
The processed pressure fluctuations/acoustic waves produce a spectrum similar to the
absorption spectrum. These variations in pressure occur when light falls on biological tissues.
Cote et al. [8] described polarimetry, in which polarized light falling on an optically active
sample undergoes a rotation of its plane of polarization. The concentration of the
sample/biological tissue is calculated from the value of the rotation angle.
3 Methodology
3.1 Principle
The device works on the principles of spectrophotometry and the Beer–Lambert law to
analyse the absorbance of light. The Beer–Lambert law states that the absorbance is directly
proportional to the concentration of the sample and to the path length of the light through
the sample. The principle of the Beer–Lambert law is expressed in Eq. 1 and Fig. 1.

A = \varepsilon c l \qquad (1)
The response of light incident on biological cells and fluids in the wavelength range
590–1180 nm helps in estimating the concentration of glucose molecules. The C–H, O–H and
C=O bonds in glucose (C6H12O6) molecules undergo transitions when illuminated with IR
light [6]. These transitions cause the transmitted light to be partially absorbed and scattered.
Less glucose leads to weak NIR absorption and more scattering, while more glucose leads to
higher NIR absorption and less scattering [5]. This attenuation of light due to absorption and
scattering is calculated using Eq. 2.

I = I_{0}\, e^{-\mu_{e} L} \qquad (2)

I is the reflected light intensity, I_0 is the incident light intensity, and L is the optical path
length in the biological tissue. The effective attenuation coefficient μ_e is defined in Eq. 3
using the absorption coefficient μ_a and the reduced scattering coefficient μ_s′ given in
Eqs. 4 and 5.
\mu_{e} = \left[\,3\mu_{a}\left(\mu_{a} + \mu_{s}'\right)\right]^{1/2} \qquad (3)

\mu_{a} = 2.303\,\varepsilon\, C \ \ \mathrm{cm}^{-1} \qquad (4)

\mu_{s}' = \mu_{s}\,(1 - g) \qquad (5)
The absorption coefficient (μa) is derived from the chromophore concentration (C) and the
molar extinction coefficient (ε), while the reduced scattering coefficient (μs′) is obtained from
the scattering coefficient (μs) and the anisotropy factor (g).
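A small numerical sketch of Eqs. 2–5 is given below; the tissue coefficients used in the example call are hypothetical placeholder values for illustration, not measurements from this work.

import numpy as np

def effective_attenuation(mu_a, mu_s, g):
    """Effective attenuation coefficient (per cm) from Eqs. 3 and 5."""
    mu_s_prime = mu_s * (1.0 - g)                        # Eq. 5: reduced scattering
    return np.sqrt(3.0 * mu_a * (mu_a + mu_s_prime))     # Eq. 3

def detected_intensity(i0, mu_e, path_length_cm):
    """Attenuated intensity after the optical path (Eq. 2)."""
    return i0 * np.exp(-mu_e * path_length_cm)

# Hypothetical tissue coefficients at 940 nm, for illustration only:
mu_e = effective_attenuation(mu_a=0.4, mu_s=10.0, g=0.9)
print(detected_intensity(i0=1.0, mu_e=mu_e, path_length_cm=0.5))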
Sensory Block
• DAQ (Data Acquisition) System: This block helps in acquiring the signal from the
subject.
• NIR Light Source: The light absorption of glucose is maximum at the wavelengths of
940 nm, 970 nm, 1408 nm, 2100 nm and 2306 nm. The attenuation of optical signals by
other blood components such as water, platelets and RBCs is minimum at 940 nm. This
helps in achieving the desired depth of penetration and in predicting the glucose
concentration. An NIR LED is chosen as it emits optical signals at 940 nm.
• Photodetector: A Phototransistor acts as a photodetector which detects light emitted
by the NIR LED and converts the optical signals into an electrical output signal, i.e.,
voltage.
Signal Processing Block
• Filter: The signal acquired from the photodetector contains noise due to artifacts. The
purpose of a filter in the circuit is to eliminate this noise from the signal. A low pass filter
with a cut-off frequency of 10 Hz is used to remove high-frequency components and power
line interference, while a high pass filter with a cut-off frequency of 0.5 Hz is used to
remove low-frequency components or baseline drift [5] (a filtering sketch is given after
this list).
• Amplifier: An amplifier is used to improve the quality of weaker signals. Instru-
mentation amplifier with a suitable gain is preferred in order to increase the output
signal strength.
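The filter chain of the signal processing block could be prototyped as below; the sampling rate and filter orders are assumptions for illustration, and the amplifier gain stage is not modelled.

import numpy as np
from scipy.signal import butter, filtfilt

def condition_signal(v, fs=100.0):
    """Band-limit the photodetector voltage: a 0.5 Hz high-pass removes
    baseline drift and a 10 Hz low-pass removes high-frequency noise and
    power line interference. fs (Hz) and the filter orders are assumptions."""
    b_hp, a_hp = butter(2, 0.5 / (fs / 2), btype="highpass")
    b_lp, a_lp = butter(4, 10.0 / (fs / 2), btype="lowpass")
    v = filtfilt(b_hp, a_hp, np.asarray(v, dtype=float))
    return filtfilt(b_lp, a_lp, v)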
3.3 Working
The project design includes hardware and software parts. The hardware part comprises a
transmitter and a detector, which are mounted on the finger. The analysis can also be done
on the ear lobe, as these locations have less bony prominence. The transmitter consists of a
near-infrared (NIR) LED which transmits light, and the reflected light is received by a
phototransistor acting as the detector. NIR waves at a wavelength of 940 nm are used, as
glucose absorbs strongly at this wavelength. The IR 333A LED [4] is chosen as it emits
signals at a 940 nm wavelength. A laser could also be chosen as the transmitter, but it is less
preferred because of its cost and its effects on some tissues within the body; hence the use
of an LED as the transmitter is one of the better options. The transmitter is connected to a
constant power supply. The detector circuit consists of a BPW34 phototransistor, which is
sensitive at 940 nm [5]. The detector circuit is connected to the signal processing unit, which
includes a filter and an amplifier. Further data analysis is performed with Embedded C
programming on a microcontroller. The blood glucose concentration level is displayed on a
smartphone using IoT or on the LCD.
4 Results
A working prototype of the project has been designed using an IR LED, phototransistor,
signal processing unit and a display system wherein the blood glucose concentration
levels of the subject were fed into the mobile phone or computer wirelessly using IoT.
As mentioned in Tables 1 and 2, factors like the age, sex and health condition of the subject
determine whether the subject is diabetic or not. Therefore, the predefined ranges for those
conditions are added through IoT, and the obtained values are compared with these preset
values [3]. Thus, the diabetic status of the subject can be estimated precisely.
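As a sketch of this comparison step, the fasting targets from Table 2 can be encoded as preset ranges and checked against a measured value; the dictionary keys, the function name and the inclusive handling of the boundaries are illustrative assumptions.

# Preset fasting targets (mg/dl) for diabetic patients, taken from Table 2.
FASTING_TARGETS = {
    "below 6 years": (80, 180),
    "6-12 years": (80, 180),
    "12-19 years": (70, 150),
    "above 20 years": (None, 100),   # the table gives only an upper bound (<100)
}

def within_fasting_target(age_group, glucose_mg_dl):
    """Compare a measured fasting value against the preset range for the
    subject's age group (boundaries treated as inclusive for simplicity)."""
    low, high = FASTING_TARGETS[age_group]
    if low is not None and glucose_mg_dl < low:
        return False
    return glucose_mg_dl <= high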
Data were acquired from 3 normal subjects and 4 diabetic patients of different age groups.
All the subjects were asked to fast overnight. The voltage values at the receiver end of the
phototransistor are correlated with the corresponding glucose concentration levels [5, 7], as
shown in Table 4.
5 Conclusion
A prototype has been developed, and the output of the system is correlated with the glucose
concentration levels. Thus, the glucose concentration levels are obtained non-invasively,
without any discomfort to the subject. The results are notified to the subject and the
physician using IoT. These advantages demonstrate the flexibility of this technique. The
project can be further extended to monitor blood haemoglobin along with blood glucose, as
it involves a similar principle.
References
1. Narkhede P, Dhalwar S, Karthikeyan B (2016) NIR based non-invasive blood glucose
measurement. Indian J Sci Technol 9(41):3–4. https://doi.org/10.17485/ijst/2016/v9i41/
98996
2. Pandey R, Paidi SK, Valdez TA, Zhang C, Spegazzini N, Dasari RR, Barman I (2017) Non-
invasive monitoring of blood glucose with Raman spectroscopy. Acc Chem Res 50(2):264–
272. https://doi.org/10.1021/acs.accounts.6b00472
3. Dorsaf G, Yacine M, Khaled N (2018) Non-invasive glucose monitoring: application and
technologies. Curr Trends Biomed Eng Biosci 14(1):555878
4. Gia TN, Ali M, Dhaou IB, Rahmani AM, Westerlund T, Liljeberg P, Tenhunen H (2017)
IoT-based continuous glucose monitoring system: A feasibility study. Procedia Comput Sci
109:327–334
5. Yadav J, Rani A, Singh V (2014) Near-infrared LED based non-invasive blood glucose
sensor. In: 2014 international conference on signal processing and integrated networks
(SPIN). IEEE, pp 591–594. 978-1-4799-2866-8/14
6. Sunny S, Kumar SS (2018) Optical based non-invasive glucometer with IOT. In: 2018
international conference on power, signals, control and computation. IEEE. 978-1-5386-
4208-5/18
7. Aizat Rahmat MA, Su ELM, Addi MM, Yeong CF (2017) GluQo: IoT-based non-invasive
blood glucose monitoring. J Telecommun Electron Comput Eng 9(3–9):71–75.
e-ISSN:2289-8131
8. Cote GL, Fox MD, Northrop RB (1990) Optical polarimetric sensor for blood glucose
measurement. In: Sixteenth annual northeast conference on bioengineering. IEEE. CH2834-
3/90/0000-0101
9. Pai PP, Sanki PK, Banerjee S (2015) Photoacoustics based continuous non-invasive blood
glucose monitoring system. In: 2015 IEEE international symposium on medical measure-
ments and applications (MeMeA) proceedings. IEEE. 978-1-4799-6477-2/15
10. Sim JY, Ahn CG, Jeong E, Kim BK (2016) Photoacoustic spectroscopy that uses a resonant
characteristic of a microphone for in vitro measurements of glucose concentration. In: 2016
38th annual international conference of the IEEE engineering in medicine and biology
society (EMBC). IEEE. 978-1-4577-0220-4/16
11. Lam SC, Chung JW, Fan KL, Wong TK (2010) Non-invasive blood glucose measurement
by near-infrared spectroscopy: machine drift, time drift and physiological effect. J Spectrosc
24(6):629–639. 0712-4813/10
Correlation Between Serum Levels of p53
During Radiotherapy in Cervical Cancer
Patients
Abstract. Cervical cancer ranks fourth among cancers in females in terms of both incidence
and mortality, largely due to the lack of proper diagnosis in the early stages. Therefore, the
identification of serum biomarkers could help in earlier diagnosis and improve the chances
of survival of cervical cancer patients. In this regard, we studied the levels of circulating p53
protein in the serum of cancer patients during radiation therapy, a widely used method to
treat cervical cancer. p53 is a tumor suppressor protein and is associated with genetic changes
of the cell; it has a vital role in the cell cycle and in apoptosis during cell mutations. We
examined the relationship between the levels of circulating p53 protein and radiotherapy.
Our hypothesis was that the p53 protein could serve as a biomarker of the clinical response
of cervical cancer patients to radiotherapy. Five cervical cancer patients were enrolled for the
study, and blood samples were collected from them before, during and after the radiation
treatment. Post-radiotherapy, the patients' clinical response to radiotherapy was assessed from
the expression of p53 protein using the Western blot method. For all the samples, no elevation
of p53 protein was observed in serum. As a preliminary study, our results revealed that there
was no correlation between serum levels of p53 protein and the clinical response to
radiotherapy, as no band was observed on the transfer membrane after blotting.
1 Introduction
Cancer is a leading cause of death and reduces life expectancy in every country of the world
in the 21st century. Cervical cancer is the fourth most common cancer in females in terms of
both incidence and mortality [1]. In cervical cancer, the cells of the cervix are mutated. One
of the causes of cervical cancer is the Human papillomavirus (HPV), a sexually transmitted
infection. There are over 100 types of HPV; among them, HPV16 and HPV18 are high-risk
oncogenic types [2]. The p53 protein, with a molecular weight of 53 kDa, was initially
described as an oncogenic protein but is now established as a tumor suppressor protein [3, 4].
The p53 gene is located on the short arm of chromosome 17 (17p13) (Fig. 1) [5].
The main function of the p53 protein is to maintain cellular genomic integrity and controlled
cell growth, so it is triggered during cell stress [6–8]. A genetic change triggers the p53 gene
to produce protein to repair the damage; if the damage is beyond repair, cell cycle progression
is inhibited, leading to apoptosis (Fig. 2) [9].
Radiotherapy is one of the most widely used treatments for cancer at present. It is
recommended for 40–50% of all cases of cervical cancer [10]. In radiotherapy, DNA damage
is induced in the cell, leading to cell death. As the cell is under stress, the release of p53
protein is triggered to repair the cell.
2 Methodology
2.1 Patients
Cervix patients’ with Squamous cell carcinoma are enrolled at Department of Radiation
Oncology, Basavatarakam Indo-American Cancer Hospital and Research Centre. All
patients provided oral and written informed consent for the collection of blood which
was approved by the Ethics Committee of Basavatarakam Indo-American Cancer
Hospital and Research Centre, Hyderabad (EC Ref. IECR/2018/124, 21st June 2018).
In total, five patients’ were enrolled for the study. The treatment was planned for 5
weeks for all the patients. Blood was collected from untreated tumor patients’ before,
during and after radiation treatment. Three samples were collected from each patient.
First sample was collected a day before the treatment started. Second sample was
collected in the third week and third was collected after the completion of 5 weeks of
the treatment. In total, fifteen blood samples were collected. Serum was extracted from
the blood sample and stored in −80 °C. Fifteen samples were blotted using western blot
method the detection of p53 protein.
Inclusion criteria: (i) Patients’ with histopathologically diagnosed cervical carci-
noma (ii) No prior treatment (iii) tumor size was medium (iv) Treatment sittings 25.
Exclusion criteria: Patients’ with HIV infection.
2.2 Method
Venous blood (5 ml) was collected in a sterile serum separation vacutainer and centrifuged
within 2 h of collection at 3000 rpm for 5 min at room temperature. The supernatant
containing serum was transferred into another polypropylene tube and stored at −80 °C until
further study. Serum was used for the experiment after a single thaw. The aim of the study
is to detect the expression levels of circulating p53 protein as an immune response to
radiotherapy in the serum of patients suffering from cervical cancer. Serum levels of p53
were assessed in patients before, during and after radiotherapy and compared. Circulating
p53 protein was detected using Western blotting.
Western blotting is a gold standard method to identify specific proteins in a sample. First,
SDS–polyacrylamide gel electrophoresis is performed to separate all the proteins in the
sample according to their molecular weight. The separated proteins are then transferred to a
transfer membrane, usually made of polyvinylidene difluoride (PVDF) or nitrocellulose.
The membrane with the total proteins is then labelled and incubated with antibodies specific
to the protein of interest. The unbound primary and secondary antibodies are washed off,
leaving only the protein-bound antibodies on the membrane. The membrane is then
developed, forming a band that indicates the amount of protein bound to the antibodies; the
thickness of the band corresponds to the amount of protein present.
After separating the serum, the expression of p53 protein was observed using the Western
blotting method. Serum samples were analysed for total protein concentration using the
Bradford method. Samples were heated for 5 min at 100 °C. The lysed samples were then
subjected to 12% SDS–polyacrylamide gel electrophoresis to separate the proteins according
to their molecular weight (Fig. 3).
3 Results
Western blot was performed for all the fifteen patient samples. The developed gels
(shown in Figs. 4 and 5) were observed for p53 protein. No p53 protein bands were
observed on the membrane for any of the fifteen samples.
Fig. 4. Western blot analysis of p53 with Mouse- anti p53 primary antibody and HRP
conjugated secondary antibody where M – Marker, P1a, P1b, P1c- Patient 1 before, during and
after radiation samples, P2a, P2b, P2c- Patient 2 before, during and after radiation samples, P3a,
P3b, P3c- Patient 3 before, during and after radiation samples
Fig. 5. Western blot analysis of p53 with Mouse- anti p53 primary antibody and HRP
conjugated secondary antibody where M – Marker, P4a, P4b, P4c- Patient 4 before, during and
after radiation samples, P5a, P5b, P5c- Patient 5 before, during and after radiation samples.
Table 1. Protein expression at the three time points i.e. before, during and after radiation
samples
Patient Before radiation During radiation After radiation
(Expression of protein)a (Expression of protein)b (Expression of protein)c
P1 No expression No expression No expression
P2 No expression No expression No expression
P3 No expression No expression No expression
P4 No expression No expression No expression
P5 No expression No expression No expression
The protein expression at the three time points, i.e. before, during and after radiation, is
compared in Table 1. Superscripts a, b and c refer to the before, during and after samples,
respectively. The results demonstrate that there is no detectable p53 protein as a response to
radiation therapy, and that there is no elevation of the protein during radiation.
4 Conclusion
The major finding of this study is that there was no close correlation between serum p53
expression and the radiotherapy process. There is a possibility that p53 protein synthesized
into serum is degraded very rapidly. This work was conducted as a preliminary study;
further work will increase the sample size to standardize the results. Comparing tumor p53
levels could also give more appropriate information for understanding the correlation with
radiation therapy. Additionally, the exact circulation time of the protein in serum needs to
be studied at varying time intervals, which would reveal the relationship between protein
levels before, during and after radiation therapy.
References
1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer
statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers
in 185 countries. CA Cancer J Clin 68(6):394–424. https://doi.org/10.3322/caac.21492
2. Burd EM (2003) Human papillomavirus and cervical cancer. Clin Microbiol Rev 16(1):
1–17. https://doi.org/10.1128/cmr.16.1.1-17.2003
3. Linzer DI, Levine AJ (1979) Characterization of a 54 K Dalton cellular SV40 tumor antigen
present in SV40-transformed cells and uninfected embryonal carcinoma cells. Cell 17(1):
43–52
4. Lane DP (1984) Cell immortalization and transformation by the p53 gene. Nature 312
(5995):596
5. McBride OW, Merry D, Givol D (1986) The gene for human p 53 cellular tumor antigen is
located on chromosome 17 short arm (17p13). Proc Nat Acad Sci 83(1):130–134.
PMID: 3001719
6. Levine AJ (1997) p53, the cellular gatekeeper for growth and division. Cell 88:323–331
7. Bargonetti J, Manfredi JJ (2002) Multiple roles of the tumor suppressor p53. Curr Opin
Oncol 14:86–91
8. Fridman JS, Lowe SW (2003) Control of apoptosis by p53. Oncogene 22:9030–9040
9. Vousden KH, Lu X (2002) Live or let die: the cell’s response to p53. Nat Rev Cancer 2:
594–604
10. Delaney G et al (2014) Correlation between Akt and P53 protein expression and
chemoradiotherapy response in cervical cancer patients. HAYATI J Biosci 21(4):173–179.
https://doi.org/10.4308/Hjb.21.4.173
A Comparison of Morphological Features
Between Normal and Abnormal Left Ventricle
in One Cardiac Cycle
1 Introduction
In recent years, medical image processing has become a necessity for extracting relevant
information, compression and image reconstruction. Many modern imaging techniques use
image processing, such as radiography, magnetic resonance imaging, ultrasound and
echocardiography; these are used by physicians to assess problems that occur in the body.
The human heart has four chambers and is divided into right and left sides, each with an
atrium and a ventricle. The atria act as reservoirs that receive the blood, with a little pumping
action to assist ventricular filling. The ventricles are the major pumping chambers for
delivering blood to the body. More importance is given to the left ventricle (LV) than to the
other chambers, as it performs more than 80% of the cardiac function.
Coronary artery disease, which causes the heart muscle to thicken, leads to disruption of
heart function or even myocardial infarction. Abnormal contraction mechanisms of the left
ventricle lead to a decrease in the stroke volume, which further decreases the cardiac output.
For these reasons, if the muscles of the heart are affected, the pumping function of the heart
deteriorates.
2 Methodology
The extracted frames are first filtered using a median filter, defined in Eq. 1. Median filtering
is more effective than other low pass filters when the main aim is to reduce speckle noise in
the image while simultaneously preserving the edges of the image.

F(x, y) = \mathrm{median}\{\, g(s, t)\,\}, \quad (s, t) \in S_{xy} \qquad (1)

where g(s,t) is the input image and S_{xy} is the subimage area used for filtering.
In order to enhance the pixel intensity, the histogram of the image was equalized, which is
needed for the segmentation procedure. Histogram equalization chooses the grayscale
transformation T to minimize

\left| c_{1}(T(k)) - c_{0}(k) \right| \qquad (2)

where c_0 is the cumulative histogram of the input image and c_1 is the cumulative sum of
the histogram for all intensities k.
A \oplus B = \{\, z \mid (B)_{z} \cap A \neq \emptyset \,\} \qquad (4)

where A is the input image, B is the structuring element and z is the pixel location in the
image.
After the morphological operations, there were some unwanted spots around the edges of
the image. They were removed based on the 8-connected area 'p' determined for each image;
pixels of the unwanted spots are considered connected if their corners or edges touch the
region of interest (ROI) [9, 10].
Successive deletion of the outer and inner layers gives the left ventricular contour. To obtain
a contour of one-pixel thickness, the morphological thinning operation is used. Thus, we
obtain a binary image that contains the contour of the left ventricle with one-pixel thickness.
The same process was used to extract the contour for all the frames.
The cardiac parameters such as area, perimeter and centroid of the left ventricle
were calculated automatically for each frame during one cardiac cycle.
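A compact sketch of this pipeline, from speckle filtering to the per-frame regional measurements, is given below. The fixed intensity threshold, the structuring-element sizes and the assumption that the largest remaining region is the LV are simplifications for illustration, and the explicit thinning step is replaced by a regionprops-based perimeter measurement.

import numpy as np
from scipy.ndimage import median_filter
from skimage import exposure, measure, morphology

def lv_features(frame):
    """Area, perimeter and centroid of the left ventricle in one frame."""
    f = median_filter(np.asarray(frame, dtype=float), size=5)  # Eq. 1: speckle reduction
    f = exposure.equalize_hist(f)                              # histogram equalization
    binary = f > 0.5                                           # intensity thresholding (assumed level)
    binary = morphology.binary_closing(binary, morphology.disk(3))
    binary = morphology.remove_small_objects(binary, min_size=64)  # drop unwanted spots
    labels = measure.label(binary, connectivity=2)             # 8-connected regions
    regions = measure.regionprops(labels)
    if not regions:
        return None
    lv = max(regions, key=lambda r: r.area)                    # assume the LV is the largest region
    return lv.area, lv.perimeter, lv.centroid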
3 Results
From the echocardiogram images, the contour of the left ventricle is extracted. Using this
left ventricular contour, parameters such as the area, perimeter and centroid of the left
ventricle are calculated. The processing is done for a normal heart and an abnormal heart.
The extracted frame contains noise, which was removed using a median filter. The image
after median filtering and contrast enhancement using histogram equalization is shown in
Fig. 4.
The image is segmented using intensity-based thresholding, where the ROI, i.e. the left
ventricle, is assigned a value of 1 and the rest a value of 0. The resultant segmented binary
image is shown in Fig. 5.
The contour of one-pixel thickness is obtained using the mathematical morphological
operations, and the output is shown in Fig. 6. The image contained noisy pixel branches
inside and outside the ROI. A program was developed to form a neat region for the
segmented left ventricle, which is used to find the regional properties, as shown in Fig. 7.
Hence the left ventricular contour was extracted, and the contour was extracted in all the
frames in a similar way. The change in the shape of the left ventricle in the normal heart and
the abnormal heart can be differentiated accordingly.
Table 1. Calculated values of area and perimeter for one cardiac cycle.
Frame Area normal Area abnormal Perimeter normal Perimeter abnormal
1 2896 1918 309.1787 239.0955
2 2746 1886 261.2792 238.6518
3 2496 1766 260.066 231.0955
4 1726 1584 193.196 226.8528
5 1574 1524 190.1249 225.1960
6 1194 1107 153.1543 196.1249
7 1093 952 140.8112 168.2254
8 1073 895 142.3259 160.1249
9 1262 913 176.9533 162.9533
10 1897 1006 197.0955 146.1249
11 2017 942 215.4386 138.1249
12 2255 1015 257.8234 144.129
13 2282 1232 222.0244 179.0538
14 2390 1403 263.4802 190.4680
15 2361 1554 254.1665 197.539
16 2496 2267 260.066 255.2376
A graph was drawn to show the variation in area over one cardiac cycle and is shown in
Fig. 8. It can be observed that the change in area during one cardiac cycle of a normal heart
varies significantly compared with the abnormal one. Figure 9 shows the graph of the
change in perimeter during one cardiac cycle.
The graphs drawn in Figs. 10, 11 and 12 show the changes in motion with respect to the
coordinates of the centroid. The graphs obtained clearly illustrate the motion of the LV,
i.e., the systolic and diastolic movement.
Fig. 11. Plot of the normal LV centroid X-coordinates w.r.t to the frames.
The location of the centroid is significantly different between the healthy and the diseased
heart. Figure 12 shows the movement of the x coordinate of the centroid during one cardiac
cycle; the locations at the beginning and the end are not close to each other.
The contraction and expansion of the LV can thus be estimated, and the amount of blood
and the pressure with which it is pumped can be analyzed by health professionals.
Fig. 12. Plot of the abnormal LV centroid X-coordinates w.r.t to the frames.
The area, perimeter and centroid of the left ventricle for normal and abnormal subjects were
calculated over one cardiac cycle in 16 different frames. The changes in the parameters from
frame to frame were observed to study the systolic and diastolic movement of the heart.
The area and perimeter of the normal and abnormal LV are compared. The perimeter of the
abnormal LV is comparatively greater than that of the normal LV because of the irregular
shape of the LV. The area reflects the change in contraction and expansion of the LV. The
centroid of the LV is used to study the irregular motion of the abnormal LV by comparing it
with the normal LV. Thus, these results show that the contraction of the abnormal LV is less
than that of the normal LV.
This work can be further extended to the calculation of the volume of the heart's left
ventricle for diagnostic purposes.
References
1. Braunwald E, Zipes DP, Libby P, Bonow R (2004) Braunwald’s heart disease: a textbook of
cardiovascular medicine. W. B. Saunders Co., Philadelphia, PA, pp l–150
2. Celebi AS, Yalcin H, Yalcin F (2010) Current cardiac imaging techniques for detection of
left ventricular mass. Cardiovasc Ultrasound 8:19
Localization of ECG QRS Waves Through Spectral Estimation of Heart Rate
1 Introduction
ECG is a recording of the heart's electrical activity, as shown in Fig. 1. It is a vital tool in
cardiology, as it helps in detecting many heart diseases. The basic finding in an ECG is the
heart rate, which is determined from the count of the QRS complexes in the 10-s ECG strip,
as shown in Fig. 2. The first step in ECG processing is always the detection of the R waves,
after which the detection of the other waves is carried out. The detection of the R waves may
be performed by processing either the time domain or the frequency domain representation
of the ECG. Many algorithms have been developed to process the ECG in the time domain
in order to detect the R waves; in this paper we introduce a method that finds the heart rate
by processing the spectrum only, after which the R wave locations are discovered
retrospectively, i.e., using the information deduced from the spectrum processing. Finding
the heart rate is a requirement in itself, but we go beyond that and localize the R waves
using the heart rate information. Most, if not all, time domain algorithms that aim to detect
the QRS complexes need to set a threshold (whether static or dynamic) above which a peak
point is considered a valid R wave and below which a peak point is neglected. A dynamic
threshold is far more practical and reliable than a static one, as it adapts to the ECG
parameters. In our proposed method we use neither a static threshold nor a dynamic one;
rather, we find the expected locations of the R waves using the heart rate information and
design a window of a certain length to cover the vicinity of every expected location. Finally,
the maximum amplitude point within the limits of the window is found and verified to be
the R wave.
Fig. 1. ECG signal waveform. Note the R wave peaks occurring periodically which represent
the depolarization of ventricles.
Fig. 2. ECG signal rhythm strip. Usually it is of 10 s duration in the ECG graph sheet.
Section 2 presents the algorithm in its two stages of processing, the frequency and time
domains; Sect. 3 details the results of the proposed method applied to 50 ECG records; and
finally Sect. 4 highlights some types of ECG that this method fails to process and concludes
with future improvements to the method.
2 Algorithm
The algorithm is based on the frequency component that represents the heart rate, which
typically lies in the range 0.5 Hz–6 Hz, corresponding to the heart rate range 30–360 beats
per minute. Detecting this frequency component reveals the key piece of information
because it represents the repetition rate of the QRS complexes, which makes the estimation
of the R wave locations possible. After this primary estimation is attained, the exact
locations are detected by processing the parts of the signal around the approximate
locations.
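A minimal sketch of this spectral stage is shown below: the heart-rate frequency is taken as the maximum-energy peak in the 0.5–6 Hz band and converted into an expected R–R spacing in samples. For brevity the sketch computes the spectrum of the mean-removed signal rather than of the fully pre-processed one; the function and variable names are illustrative.

import numpy as np

def estimate_rr_spacing(ecg, fs):
    """Heart rate from the maximum-energy spectral peak in 0.5-6 Hz and the
    corresponding expected R-R spacing N in samples."""
    x = np.asarray(ecg, dtype=float) - np.mean(ecg)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= 0.5) & (freqs <= 6.0)          # 30-360 beats per minute
    f_hr = freqs[band][np.argmax(spectrum[band])]   # dominant heart-rate component
    heart_rate_bpm = 60.0 * f_hr
    n = int(round(fs / f_hr))                       # expected beat-to-beat spacing
    return heart_rate_bpm, n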
2.1 Pre-processing
The preprocessing stage includes removing the DC voltage from the ECG signal by
subtracting the mean of the signal. This is followed by substituting the first and last 5% of
samples of the ECG signal with the mean, to eliminate any unwanted distortions due to the
signal acquisition or digitization processes. The signal is then multiplied by negative unity
if the QRS complex morphology is a deep S wave followed by a low amplitude R wave.
Finally, the signal is differentiated to enhance high-slope parts and squared to remove
negative amplitudes. There is no band pass filtering stage in this proposed method because
the algorithm works on the spectrum and hence initially excludes the unwanted regions
(regions of noise and interference) of the spectrum from being involved while extracting the
piece of information on which the whole algorithm depends, unlike other algorithms that
deal with time domain representations of the ECG signal, which contain all frequency
components.
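The preprocessing steps above translate directly into a few array operations; the sketch below assumes the mean has been subtracted first (so replacing the edge samples with the mean amounts to setting them to zero), and the function name is illustrative.

import numpy as np

def preprocess(ecg, invert=False):
    """DC removal, edge-sample substitution, optional inversion,
    differentiation and squaring, as described above."""
    x = np.asarray(ecg, dtype=float)
    x = x - x.mean()                     # remove the DC voltage
    k = int(0.05 * x.size)
    if k:
        x[:k] = 0.0                      # first 5% of samples set to the (zero) mean
        x[-k:] = 0.0                     # last 5% of samples set to the (zero) mean
    if invert:
        x = -x                           # deep S wave followed by a low-amplitude R wave
    x = np.diff(x)                       # enhance high-slope parts
    return x ** 2                        # squaring removes negative amplitudes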
Fig. 3. ECG signal spectrum zoomed in to focus on the first quadrant of the spectrum. The
horizontal axis represents the heart rate instead of frequency (1 Hz corresponds to a heart rate of
60 bpm) and the vertical axis indicates the energy. (a) all spectrum peaks. (b) maximum energy
peak in (a).
In ECGs where a deep S wave is followed by an R wave (the term negative R wave can be
used loosely), as in lead aVR in Fig. 1, the minimum point always corresponds to an R wave.
Having an R wave detected by the aforementioned mechanism (marked in blue in Fig. 4)
paves the way to finding all points that are N time units away from the first R wave in both
directions. At the end of this stage, a number of possible locations for the R waves are found
(marked in green in Fig. 4). These are not exactly correct, because the ECG signal is not
periodic in the literal meaning of the word: it is a physiological signal (technically, strict
periodicity is observed only in mathematical functions), so some patterns of the cyclic ECG
signal may come earlier or later than the fundamental period. To find the correct locations
of the R waves, a window of a certain size (sizes are discussed in the next section), centered
at each expected location, is applied to the signal to cover the vicinity of the expected
location. Subsequently, the highest amplitude point within the window limits is detected,
which determines the correct R wave peak (marked in black in Fig. 4). Finally, minor
location offsets in the R wave localizations that are due to the preprocessing are corrected.
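The windowed search described above can be sketched as follows; it assumes the pre-processed signal, one confirmed first R location and the beat spacing N from the spectral stage, with the window length defaulting to L = N (the other trials use 2N/3 and N/2). The names are illustrative.

import numpy as np

def locate_r_peaks(signal, n, first_r, window=None):
    """Expected R locations every n samples around one confirmed R wave,
    refined by taking the maximum inside a window centred on each location."""
    window = n if window is None else window        # trial (a): L = N
    half = window // 2
    peaks = []
    for expected in range(first_r % n, len(signal), n):
        lo = max(expected - half, 0)
        hi = min(expected + half + 1, len(signal))
        peaks.append(lo + int(np.argmax(signal[lo:hi])))
    return np.asarray(peaks)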
Fig. 4. ECG signal peak finding with (a) window size L = N, (b) window size L = (2/3)N and
(c) window size L = (1/2)N. Red delineation represents the parts processed by the successive
windows in the peak detection process.
3 Results
The most accurate trial’s particulars based on this criterion are written in italic in
Table 1. Other criteria may be the basis to quantify the most reliable trial.
Figure 5 shows an ECG signal contaminated partially with 50 Hz power signal
interference with R waves detected.
Table 1. (continued)
Rec. No. Total beats R wave locations detected (correct, wrong, missed)
ecg_14 13 88,59,44 10,4,3 7,7,6 4,10,9
ecg_1 4 92,63,47 4,29,0 3,31,1 2,32,2
ecg_10 28 91,63,47 27,1,1 19,11,9 16,14,12
ecg_36 12 94,62,48 12,26,0 11,27,1 10,29,2
ecg_3 33 96,64,48 33,2,0 25,11,8 20,14,13
ecg_28 9 75,50,41 8,0,1 6,2,3 5,4,4
Personal Library of Digitized ECGs
ecg6 11 79,57,44 10,11,1 8,3,3 7,4,4
ecg8 14 90,50,45 13,1,1 10,5,4 8,7,6
ecg13 12 87,57,44 12,1,0 7,6,5 5,8,7
ecg14 13 94,63,47 13,14,0 8,20,5 5,23,8
ecg25 13 88,58,47 13,1,0 8,6,5 6,9,7
ecg33 16 92,62,45 15,12,1 10,22,6 9,22,7
31 13 82,55,44 13,1,1 11,3,3 9,6,5
2 10 93,92,47 10,7,0 10,2,0 8,13,2
32 23 91,61,48 22,1,1 22,2,1 22,2,1
37 29 96,62,49 29,1,0 29,1,0 29,1,0
38 9 75,51,42 8,0,1 6,2,3 5,4,4
4 Concluding Remarks
The proposed method processes ECG signals in which the T wave is lower in amplitude than
the R wave (whether the R wave is positive or negative), as discussed in Sect. 2. However, in
certain pathological cases the T waves are higher in amplitude than the R waves, and the
algorithm must be developed further to deal with such cases as well. The method is applied
to the ECG signals in three trials at the stage of time domain processing, as discussed. In
most of the cases at least one trial achieves the detection of all R waves successfully. In
future we shall develop a method in which the heart rate is estimated more accurately, either
in the time or the frequency domain. Also, a mechanism that enables automatic selection of
the right trial among the three trials, i.e. the correct window size, must be developed. One of
the suggested methods for the automatic selection of the best window length among the three
trials is standard deviation analysis on the basis of beat-to-beat time duration differences and
peak-to-peak amplitude verification. Further, the false R waves (additional, non-true waves
detected) should be managed by the algorithm so that they are dropped from the final R wave
locations vector, in order to detect the correct heart rate and the exact R wave locations.
References
1. Goutam A, Lee Y-D, Chung W-Y (2008) ECG signal denoising with signal averaging and
filtering algorithm. In: Third 2008 international conference on convergence and hybrid
information technology
2. Shen TW, Tompkins WJ (2005) Biometric statistical study of one-lead ECG feature and body
mass index (BMI). In: Proceedings of the 2005 IEEE, engineering in medicine and biology
27th annual conference, Shanghai, China, 1–4 September 2005
3. Muthuvel K, Padma Suresh L, Jerry Alexander T (2014) Spectrum approach based
classification of ECG signal. In: 2014 international conference on circuit, power and
computing technologies [ICCPCT]
4. Canan S, Ozbay Y, Karlik B (1998) A method for removing low varying frequency trend
from ECG signal. In: 2nd international biometrical engineering days
5. MIT-BIH arrhythmia database
Early Stage Squamous Cell Lung
Cancer Detection
1 Introduction
Cancer has become a common and life-threatening disease. Among the various cancers,
lung cancer has a high prevalence. Lung cancer is divided into non-small cell lung cancer
and small cell lung cancer; non-small cell lung cancer is further categorized into
adenocarcinoma, large cell carcinoma and squamous cell carcinoma of the lung, or
squamous cell lung cancer (SqCLC). The cancer cells and their arrangements look different
under a microscope for each type of lung cancer. Non-small cell lung carcinoma accounts
for about 85% of cases and small cell lung cancer for about 15%. SqCLC is a common type
of lung cancer: around 25–30% of people with lung cancer have SqCLC. Major risk factors
such as smoking, exposure to radon, secondhand smoke and air pollution cause this type of
lung cancer. If such factors are kept under control, the risk of getting cancer can be lowered.
In the early stage of the cancer, no specific symptoms are observed; a long-lasting cough is
the main early symptom to be considered. Chest X-ray radiography is used to screen for
lung disorders. If the subject is suspected of having cancer, chest CT is preferred for
analysing lung cancer. The stage of lung cancer is determined by the size and position of the
tumor and its spread. The four significant stages and indications of lung cancer are shown in
Table 1.
In the primary stage, small nodules (<3 mm in size) are formed in either lung and
proliferate within a few months. It is essential to detect such nodules in the early stages of
cancer, when a five-year survival rate close to 53% can be expected. Usually, symptoms
appear clearly only when the disease is at an advanced stage (stage III or IV), and the
mortality rate is very high because the disease is diagnosed in these later stages. Hence, this
paper attempts to detect lung nodules automatically in order to diagnose lung cancer in its
early stages.
Generally, Computed Tomography (CT) is performed to diagnose lung cancer, as it provides
more information than plain radiography. However, CT requires additional techniques for
the interpretation and extraction of pathological information [1]. A number of Computer-Aided
Diagnosis (CAD) systems have been developed to help radiologists characterize the spread of disease in and around lung
developed to help radiologists to characterize the spread of disease in and around lung
regions. In CAD systems, separation of Lung regions from chest CT is the crucial
step. This is carried out by a segmentation technique. There are a number of seg-
mentation algorithms developed and tested on Chest CT images.
Mesanovic et al. [2] implemented a region growing algorithm to find the lung
boundaries. In [3], Hedlund et al. developed a 3-D region growing technique to seg-
ment the lung regions. Sluimer et al. [4], implemented a thresholding method to
segment the lungs from a CT image. In the papers [2] and [3], the seed point was
manually selected. Many region growing methods have been developed in past years,
including the Graph cut method [5], Fuzzy connectedness [6], Watershed transform [7],
Flood fill technique [8]. The methods reported in [2–8], diagnose the tumor in the
advanced stages where a subject has to undergo chemotherapy or radiation treatments.
2 Methodology
The proposed method has various steps like pre-processing of the input image, seg-
mentation of the region of interest and description of a variety of features to analyse the
image. The flow of the complete work is depicted in Fig. 1.
High-Resolution Computed Tomography (HRCT) chest image is used as a non-
invasive tool for diagnosing and analysing Lung cancer in early stages. It is very useful
than X-ray because of its excellent contrast resolution. In the present work, HRCT
images are used, to enhance the efficiency and accuracy of detecting the early stage
Early Stage Squamous Cell Lung Cancer Detection 117
Lung cancer. The HRCT dataset is a volume of DICOM images. The JPEG image
format is produced and used for further analysis in MATLAB 2014a.
The primary task is to improve the quality of the image so that every pixel
information pertaining to edges, borders, regions are considered for the processing. The
image is converted into grayscale and filtering is carried out to remove the noise present
in the image. Several filters were tested on different images and observed accuracy. The
median filter is used in the work, as it is producing a better outcome. It works by
moving through each image pixel and replacing each value with the median value of
neighbouring pixels. This filter also restores the sharpness of the image.
3 Segmentation
Segmentation is the process of separating the valuable information from the complete image.
There are a number of image segmentation techniques, mainly classified into region-based
and edge-based segmentation. Region-based segmentation groups similar adjacent pixels,
whereas edge-based methods detect discontinuities in the image and use them as the outline
of each segment. In this paper, the watershed transform in combination with morphology-based
region-of-interest segmentation is used to highlight the lung nodules in chest CT images.
Extracting the lung nodules and their metrics, such as size, area and correlation, assists in
diagnosing lung cancer at an early stage.
The watershed transform is a classic model of region-based segmentation. It provides a
simple framework for incorporating knowledge-based constraints. The basic idea is to
partition the image into different regions separated by boundary lines.
Dams are built to prevent the rising water of different catchment basins from merging.
Eventually, only the tops of the dams are visible above the water line. The dam boundaries
correspond to the divide lines, or watershed lines, of the region; the main aim of the
watershed technique is to find these watershed lines. A set of markers is used to segment the
image in marker-based watershed segmentation.
After removing the noise, the image is converted into a binary image with a cut-off threshold
of 128. This maps the image into a black and white image: pixel values greater than the
threshold become white and those below it become black. An erosion operation is performed
to eliminate stray white pixels. The watershed transform is then performed on the image to
extract the lung regions. The nodules, if present, are highlighted in the process, and the
nodules are segmented out using morphological operations. Shape and texture analysis of
the nodules is carried out to obtain the various metrics. The metrics considered here are area,
perimeter, correlation, entropy, eccentricity, contrast, smoothness, skewness, variance, mean,
standard deviation, homogeneity and energy. If a nodule is detected and its area is greater
than the set threshold, it is categorised as stage II cancer; if the area is less than the threshold,
it is stage I cancer.
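A highly simplified sketch of this detection-and-staging flow is given below. It replaces the marker-based watershed step with a plain threshold-and-morphology lung mask so that the area-based staging rule stays readable; the lung-mask construction, the nodule-candidate criteria and the area threshold constant are illustrative assumptions, not the exact steps used in this work.

import numpy as np
from scipy import ndimage
from skimage import measure, morphology

AREA_THRESHOLD = 300   # pixel-area cut-off quoted above for stage I vs stage II

def detect_and_stage(ct_slice):
    """Detect a nodule candidate inside the lung field of a grayscale CT slice
    and assign a stage from its area."""
    body = np.asarray(ct_slice) > 128                       # fixed binarisation threshold
    lungs = morphology.remove_small_objects(~body, min_size=1000)
    lungs = ndimage.binary_fill_holes(lungs)                # filled lung (and background) mask
    candidates = measure.label(body & lungs)                # bright blobs enclosed by the mask
    regions = [r for r in measure.regionprops(candidates) if r.area > 10]
    if not regions:
        return "non-cancerous (no nodule detected)"
    nodule = max(regions, key=lambda r: r.area)
    return "Stage II" if nodule.area > AREA_THRESHOLD else "Stage I"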
4 Results
First, two sets of images from the Lung Image Database Consortium–Image Database
Resource Initiative (LIDC-IDRI) database [9], an open-access dataset, are used to evaluate
the algorithm. Two different images from the database are acquired. Image 1 has a solid
nodule in the right lung. The image is filtered and the watershed transform is performed on
it. After morphological operations, the nodule is segmented and its metrics are calculated.
In both images, the tumor is diagnosed as stage I based on the parameters derived. The
results of the technique on LIDC database images 1 and 2 are illustrated in Figs. 2 and 3
respectively.
The algorithm is then applied to images acquired from a hospital. Four image
datasets were acquired on a Siemens CT machine from a private hospital in DICOM
format. They were converted into an image format (.jpeg) and processed to analyse the
stage of lung cancer. The original image, binary image, watershed transform output
image and the nodule position and texture are shown in Table 2 for all the acquired
input images.
CT Images 1 and 2 have a nodule in the left lung with a size greater than 3 mm
and an area above the set threshold of 300, and are classified as Stage II lung cancer. This
was verified against the radiologist's assessment, which gave the same result. CT Image 3
has a small nodule of size less than 3 mm and area less than the threshold; it is classified
as Stage I. CT Image 4 has no lung nodule and is a non-cancerous lung image. The metrics
area, perimeter, correlation, entropy, eccentricity, contrast, smoothness, skewness,
variance, mean, standard deviation, homogeneity and energy are calculated for each
image of the LIDC database and the CT images. Of these, area, correlation, entropy, mean
and standard deviation are tabulated in Table 3. The area is correlated with the size of the
nodule, and a threshold is set to determine the stage of lung cancer. The correlation,
homogeneity and entropy of the output are calculated to check the variation in the
images.
Table 2. The result of the technique on different CT images (columns: original input image, binary image, watershed output, segmented lung nodule region and stage, for each acquired CT image).
5 Conclusion
References
1. Hu S, Hoffman EA, Reinhardt JM (2001) Automatic lung segmentation for accurate
quantification of volumetric X-ray CT images. IEEE Trans Med Imaging 20:490–498
2. Mesanovic N, Huseinagic H, Males M, Grgic M, Skejic E, Smajlovic M (2011) Automatic CT
image segmentation of the lungs with region growing algorithm IWSSIP-2011, June 2011
3. Hedlund LW, Anderson RF, Goulding PL, Beck JW, Effmann EL, Putman CE (1982) Two
methods for isolating the lung area of a CT scan for density information. Radiology 144:353–357
4. Sluimer I, Prokop M, van Ginneken B (2005) Toward automated segmentation of the
pathological lung in CT. IEEE Trans Med Imaging 24(8):1025–1038
5. Boykov Y, Jolly MP (2000) Interactive organ segmentation using graph cuts. In: Medical
image computing and computer-assisted intervention: MICCAI 2000. Springer, Berlin,
Germany, pp 276–286
6. Udupa JK, Samarasekera S (1996) Fuzzy connectedness and object definition: theory, algorithms,
and applications in image segmentation. Graph Models Image Process 58(3):246–261
7. Mangan AP, Whitaker RT (1999) Partitioning 3D surface meshes using watershed
segmentation. IEEE Trans Vis Comput Graph 5(4):308–321
8. Chen H, Mukundan R, Butler APH (2011) Automatic lung segmentation in HRCT images.
IVCNZ
9. Lung Image Database Consortium - Image Database Resource Initiative (LIDC-IDRI)
database - The Cancer Imaging Archive (TCIA) public access. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
Conditions of Deadlock Detection
in Multithreaded Applications Using
Inheritable Systems
1 Introduction
Data and system advancements have quickly and broadly spread. Database has been
utilized as a part of different application areas. Database administration frameworks are
utilized as a part of these areas. It has a tendency to be troublesome that a solid database
administration framework fits all of database request territories. The catalog adminis-
tration framework that might be adjusted to every database request region will be
essential [1–3].
2 Preparation
Terminals are evaluated directly, whereas functions are evaluated after the evaluation
of their child nodes. Functions and terminals are together called symbols in this document.
An example of a program is shown in Fig. 1. The leaf nodes of the program tree are
terminals; the other nodes are functions.
Fig. 1. An example of HP
The root node carries the symbol +, which adds the values of its two sub-trees. The
left sub-tree is the variable X. The right sub-tree is rooted at the symbol *, which has two
child nodes: the left child holds the variable X and the right child holds the value 2. The
right sub-tree is evaluated first by multiplying its two children, and the root then adds
this product to X. Finally, we obtain the evaluation of x + (x * 2).
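The tree evaluation just described can be made concrete with a small sketch. The following Python fragment (an illustration only, not part of the HP system) builds the tree of Fig. 1 and evaluates terminals directly and functions after their children:

```python
class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = children

def evaluate(node, env):
    if not node.children:                       # terminal: a variable or a constant
        return env.get(node.symbol, node.symbol)
    args = [evaluate(c, env) for c in node.children]   # evaluate child nodes first
    if node.symbol == '+':
        return args[0] + args[1]
    if node.symbol == '*':
        return args[0] * args[1]
    raise ValueError('unknown function symbol: %r' % node.symbol)

# the program tree of Fig. 1: x + (x * 2)
tree = Node('+', (Node('x'), Node('*', (Node('x'), Node(2)))))
print(evaluate(tree, {'x': 5}))   # prints 15
```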
3 Concurrency Control
3.1 Outline
A concurrency control program controls transactions that are executed concurrently
while guaranteeing the consistency of the data. A concurrency control program suited
to a given application program is generated by using HP [5, 6]. The steps of this
framework are as follows:
Each computing machine has a processor for executing tasks. Each task has a number
of procedures, and each procedure can spawn a number of threads. The figure above
represents the concurrency control setting: each processor may communicate with
procedure 1 and procedure 2, and these procedures may communicate with threads.
The threads, which may share memory, have to run smoothly, i.e. without deadlock.
Step 1: Initiate random population generation.
Step 2: After the population is generated, evaluate the fitness of the population.
Step 3: Apply the selection criteria.
Step 4: If the criteria are not met, perform diversification.
Step 5: Compare with the existing fitness and generate a new population.
Step 6: If the termination conditions are satisfied, go to the output.
Step 7: If the conditions are not satisfied, repeat the process from Step 2.
(Flowchart of the generation process: create an initial random population, evaluate fitness, diversify, and iterate until the termination condition is satisfied.)
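Steps 1–7 above describe a standard generate-evaluate loop. The sketch below is only a generic outline of such a loop; the individual representation, fitness function and diversification operator are placeholders, not the ones used by the authors:

```python
import random

def evolve(random_individual, fitness, diversify, pop_size=50, max_iterations=100):
    population = [random_individual() for _ in range(pop_size)]      # Step 1
    best = max(population, key=fitness)
    for _ in range(max_iterations):                                  # Step 7: repeat
        scored = sorted(population, key=fitness, reverse=True)       # Step 2: evaluate fitness
        parents = scored[: pop_size // 2]                            # Step 3: selection
        offspring = [diversify(random.choice(parents))               # Step 4: diversification
                     for _ in range(pop_size - len(parents))]
        population = parents + offspring                             # Step 5: new population
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):                       # Step 5: compare fitness
            best = candidate
    return best                                                      # Step 6: output
```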
4 Proposed Method
4.1 Requirements
The requirements are as follows:
1. Deadlock can be detected properly.
2. The generation of invalid programs is prevented.
For communicating a transaction's wait state by means of shared control variables, we
use special shared control variables. Such a variable is defined between a transaction
and the data item the transaction waits for, and its id is set to Pause to express the
transaction's wait state. By using these special shared control variables, deadlock
detection can work properly.
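The generated programs themselves are not shown in this excerpt. As a rough, generic illustration of how a wait state recorded in such a shared control variable can be used, the sketch below checks a wait-for relation for a cycle (a cycle indicates deadlock); it is not the authors' HP-generated program:

```python
def has_deadlock(waits_for):
    """waits_for maps a transaction id to the transaction holding the data item
    it is waiting for, i.e. the 'Pause' state recorded in the shared control variable."""
    for start in waits_for:
        seen = set()
        t = start
        while t in waits_for:          # follow the chain of waits
            if t in seen:
                return True            # a cycle in the wait-for relation: deadlock
            seen.add(t)
            t = waits_for[t]
    return False

# T1 waits for T2 and T2 waits for T1 -> deadlock is detected
print(has_deadlock({'T1': 'T2', 'T2': 'T1'}))   # True
```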
5 Experiment
5.1 Purpose
To confirm that concurrency control programs can be generated by using the proposed
method, we attempt to generate such programs.
5.2 Procedure
Two sets of parameter settings on transactions are used; their values are shown in
Table 1. In each setting, conflicts tend to occur, and in setting 2 a larger number of
conflicts occur than in setting 1. The fitness value of the generated program is compared
with those of 2PL and TSO. The validity of the program is also evaluated.
6 Conclusion
This paper enhanced the concurrency control program generation framework in order
to generate concurrency control programs that can properly detect deadlock and can
execute any schedule. For detecting deadlock, a special shared control variable was
introduced. For generating valid programs, a mechanism checking the validity of the
generated program was introduced. It was experimentally demonstrated that a suitable
concurrency control program can successfully be generated by using the proposed
method. This paper mainly focused on the validity of concurrency control programs;
the outcome of the experiment is that such a program can indeed be generated. More
generic deadlock detection using shared control variables, and reflecting the system's
execution cost in the fitness evaluation function, are left for future work. Experiments
with more realistic parameter values are also left for future work.
References
1. Stonebraker M, Çetintemel U (2005) “One size fits all”: an idea whose time has come and gone.
In: Proceedings of the 21st international conference on data engineering, pp 2–11
2. Seltzer M (2008) Beyond relational databases. Commun ACM 51(7):52–58
3. Silberschatz A, Korth H, Sudarshan S (2002) Database system concepts. McGraw-Hill,
New York
Abstract. The world has become interactive and socially active nowadays because
of the increase in different types of content-sharing applications. These content-sharing
applications are social media platforms which provide various features so that users can
effectively interact and share their thoughts and ideology. One such platform is the
discussion forum, which promises anonymous posting of users' views and complaints.
Spammers target the forums as their popularity increases. Though these platforms act as
a medium of knowledge sharing, not all users use them for a positive cause; they are
also used to abuse or bully targeted people, taking advantage of the anonymity feature.
Spamming and cyber bullying have grown rapidly, to the point that social media is being
termed harmful, and readers are misled by spam and vulgar comments. The main aim is
to detect these bad comments, which are vulgar, inappropriate or not related to the
specific context. The research is not based on static content: the comments are live
streamed and the entire analysis is done on them. The research is based on NLP,
sentiment calculation and topic detection.
1 Introduction
Social media platforms allow us to interact and share our ideas. Comments are
allowed to be posted anonymously to obtain more genuine views. Though we can read
the comments and come to a decision, they may be spam and can have a bad impact on
the reader. YouTube has a feature of deleting comments based on the number of dislikes;
from this we can understand that the real motive is not to entertain any spam comments.
In this paper our approach is to deal not only with spam comments but also with bad,
vulgar and irrelevant comments which manipulate the reader's mind, are off topic and
are of no use. We built a mechanism to identify spam comments and to apply natural
language processing techniques and machine learning algorithms.
2 Proposed Method
The study of previous works shows that the current systems are not up to the mark.
The general practice is to use a static list each time profanity is checked. This does not
work if the vulgarity is in the form of misspelled words, different languages or other
variations. These drawbacks make the current profanity-detection systems obsolete;
some even depend on outsiders who are assigned the detection of spam comments for
the posts. This is suitable and doable up to a particular scale, but when the task becomes
huge it is not applicable. So all the comments are profanity checked based on vulgarity,
abusive words and irrelevant topic discussion.
List Based Approach: This is the most standard approach: in order to determine
whether a comment contains profanity in a particular forum, these systems simply
examine each word in the document, and if a match occurs the comment is marked as
profane. We introduced a system where, as soon as a comment is posted in the forum, it
is checked by a profanity module running in the background, and if it is found to be
profane, further pre-processing is stopped. The profanity list is from Google, where it is
updated periodically, and we make sure the list is kept updated in our profanity module,
which takes care of misspellings, partially censored words and other issues.
Pre-Processing Module: Preprocessing is an important stage in natural language
processing because the words, tokens and other sentences identified in this stage are
used in further processing to find n-grams and apply algorithms.
Stop Word Removal: Many words in a sentence are used as joining words, but by
themselves they do not convey any meaning unless combined and framed grammatically
to form a sentence. So we can say that their presence does not contribute to the content
or context of the document. Removal of these stop words is necessary because their high
frequency causes obstacles in understanding the actual content of the document.
Tokenization: Tokenization is the process where a sentence or a word is broken into
tokens. The main aim behind tokenization is to explore the meaning of the tokens
formed and how they are processed further to produce meaningful outcomes after
performing NLP. Though the text is readable, it is still left with many punctuation marks
and expressions which are of no use to us and should be removed. Tokenizing is based
on the delimiter, which in turn depends on the language, as different languages have
different delimiters; space is a delimiter in English.
Stemming: Each word token is reduced to its root word. The root form is not
necessarily a word by itself; it can be formed even by concatenating the right suffix.
Stemming is a type of normalization. The sequence is returned with its root words.
Lemmatization: The lemma of the word is found, so the suffixes are removed in
lemmatization, and the word which is returned is called the lemma. These two terms are
not the same: stemming just finds the root word, and most of the time it is not preferable,
whereas lemmatization performs a morphological analysis of the words (Figs. 2 and 3).
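The preprocessing steps above (stop-word removal, tokenization, stemming and lemmatization) can be sketched with NLTK. This is only an illustrative fragment, assuming the standard NLTK corpora have been downloaded; it is not the authors' PPM implementation:

```python
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
# one-time setup: nltk.download('punkt'), nltk.download('stopwords'), nltk.download('wordnet')

def preprocess(comment):
    tokens = word_tokenize(comment.lower())              # tokenization
    tokens = [t for t in tokens if t.isalpha()]          # drop punctuation and numbers
    stops = set(stopwords.words('english'))
    tokens = [t for t in tokens if t not in stops]       # stop-word removal
    stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
    stems = [stemmer.stem(t) for t in tokens]            # stemming: crude root form
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]   # lemmatization: morphological analysis
    return stems, lemmas

print(preprocess("The comments were being posted anonymously"))
```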
Algorithm: PPM
Algorithm: TEM
Sentiment Analysis Module: Sentiment analysis shows the sentiment of the people
on the topic being discussed and what their opinions are. This is a classification in which
the given phrase is judged to carry negative, positive or neutral sentiment.
In our research we used SentiWordNet. SENTIWORDNET is a resource containing
all the synsets of WORDNET along with their “positivity”, “negativity” and
“neutrality”. Each synset has three scores: a positive score, a negative score and an
objective score. These scores range from 0.0 up to 1.0, and the sum of the three scores
is 1.0. Each score for a synset term has a non-zero value.
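As an illustration of how such synset scores can be aggregated, the fragment below uses NLTK's SentiWordNet interface and, for simplicity, only the first sense of each word; it is a simplified sketch, not the authors' SAM algorithm:

```python
from nltk.corpus import sentiwordnet as swn
# one-time setup: nltk.download('wordnet'), nltk.download('sentiwordnet')

def phrase_sentiment(tokens):
    pos = neg = 0.0
    for tok in tokens:
        senses = list(swn.senti_synsets(tok))
        if senses:                          # take the first sense only, for simplicity
            pos += senses[0].pos_score()
            neg += senses[0].neg_score()
    if pos > neg:
        return 'positive'
    if neg > pos:
        return 'negative'
    return 'neutral'

print(phrase_sentiment(['helpful', 'discussion']))
```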
The result obtained, which shows the sentiment of the phrase, describes the opinion of
the people and their opinion on the topic being discussed. This helps a lot in the case of
our forum, where students discuss all their issues; it paves a way for the management and
the teachers to look at the issues which need to be taken care of and how they need to be
handled. We also provided a suggestions section so that students can reach the staff and
have their issues resolved. This system helps not only the faculty and institution but also
the students who want their issues to be solved (Fig. 6).
Algorithm: SAM
3 Results
The results are shown in the form of plots like word clouds and barplots.
Fig. 7. Barplot visualizing the sentiment.
Fig. 8. Wordcloud depicting the most discussed topics.
4 Conclusion
Our research has overcome the problem of spam comments and the disadvantages of the
existing systems. In the proposed system, spam comments are detected based on their
features, and the problem of topic-irrelevant comments, which lead to misconception, is
also dealt with. Future enhancements can be made to this research: since we stream the
comments rather than take static content, there is great scope not only to remove spam
comments but also to make this topic evaluation applicable in other areas of interest.
References
1. Sahami M, Dumais S, Heckerman D, Horvitz E (1998) A Bayesian approach to filtering junk
e-mail. In: AAAI-98 workshop on learning for text categorization, Madison, Wisconsin, July
1998, pp 98–105
2. Carreras X, Marquez L (2001) Boosting trees for anti-spam email filtering. In: Proceedings of
RANLP-01, 4th international conference on recent advances in natural language processing,
pp 58–64
3. Davison BD (2000) Recognizing nepotistic links on the web. In: AAAI 2000 workshop on
artificial intelligence for web search, pp 23–28
4. Drost I, Scheffer T (2005) Thwarting the nigritude ultramarine: learning to identify link spam.
In: ECML’05 Proceedings of the 16th European conference on machine learning, Berlin,
Germany, pp 96–107
5. Mishne G, Carmel D, Lempel R (2005) Blocking blog spam with language model
disagreement. In: Proceedings of the first international workshop on adversarial information
web (AIRWeb), Chiba, Japan, May 2005, pp 1–6
6. Bhattarai A, Dasgupta D (2011) A self-supervised approach to comment spam detection based
on content analysis. Int J Inf Secur Priv (IJISP) 5(1):14–32
7. Siersdorfer S, Chelaru S (2010) How useful are your comments? analyzing and predicting
YouTube comments and comment ratings. In: Proceedings of the 19th international
conference on World Wide Web, pp 891–900
8. Dong R, Schaal M, Smyth B Topic extraction from online reviews for classification and
recommendation
Mahaviracharya Encryption Algorithm
(MEA) with Modified Counter Mode
and Comparing with AES Algorithm
Abstract. Nowadays the usage of the internet is increasing day by day, and so is
the number of users. Online education, e-commerce and online banking are growing in
developing countries. Network security plays a vital role in providing security to data
on the internet. Even though there are a number of algorithms, developing new
algorithms for internet security is very important, because attacks have increased and
processing speed has also increased. In this paper, we discuss a new symmetric
encryption algorithm with a modified counter mode and compare it with the Advanced
Encryption Standard (AES) algorithm in ECB mode. For this comparison we take
encryption time and decryption time as parameters with various data sizes. The
comparison results are summarized, highlighting the characteristics of the new algorithm.
1 Introduction
Cryptographic algorithms play an important role in the network security area,
encrypting data on the sending side and decrypting it on the receiving side. There are a
number of symmetric and asymmetric encryption algorithms. The Mahaviracharya
algorithm is a symmetric encryption algorithm, which uses the same key for encryption
at the sender and decryption at the receiver.
Mahaviracharya was a 9th-century mathematician from Bharat (India). He was the
author of Ganitha Saara Sangraha, a book on algebra and geometry. In this book, he
discussed a rule, the “Rasilabdacheda misravibhaga sutram”, for separating the
unknown dividend, quotient and divisor from their combined sum [1].
According to the Rasilabdacheda misravibhaga sutram, any suitably chosen number
subtracted from the given combined sum happens to be the divisor. On dividing the
remainder (left after subtracting the optionally chosen number from the given combined
sum) by this divisor increased by one, the required quotient is arrived at. The very same
remainder, as diminished by this quotient, becomes the required dividend number [1].
Encryption:

b = a × c
x = a + b + c

where x is the ciphertext.

Decryption:

k = c(x + 1) / (c + 1)
a = x − k

where a is the plaintext.

We can modify the above decryption algorithm. The new decryption rule is:

a = (x − c) / (c + 1)
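As a quick numerical check of the rule above, here is a toy sketch of the core transform only (not the full counter-mode scheme described later):

```python
def mea_encrypt(a, c):
    """a: plaintext number, c: chosen number; the ciphertext is the combined sum."""
    b = a * c
    return a + b + c            # x = a + b + c

def mea_decrypt(x, c):
    return (x - c) // (c + 1)   # simplified rule: a = (x - c) / (c + 1)

x = mea_encrypt(7, 3)           # b = 21, so x = 31
print(x, mea_decrypt(x, 3))     # 31 7
```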
2 Background
The National Institute of Standards and Technology (NIST) suggested five block cipher
modes of operation: the Output Feedback (OFB), Electronic Code Book (ECB), Cipher
Feedback (CFB), Cipher Block Chaining (CBC) and Counter (CTR) modes [3]. In
principle, a mode of operation is a procedure for improving the effectiveness of
encryption and decryption algorithms. This section describes these five modes of
operation.
In ECB mode, the same block of plaintext always gives the same ciphertext. For long
plaintexts, this mode may not be safe: if the structure of the plaintext is regular, it may be
feasible for a cryptanalyst to exploit these patterns [3].
This continues until all message blocks are ciphered. In the decryption process, the same
structure is used, with the small change that the plaintext is obtained by XORing the
encryption function output with the received ciphertext [3].
3.1 Encryption
3.2 Decryption
1. Ciphertext: C = c1 c2 c3 c4 … cm
• C contains m digits and should be at least 128 bits.
2. Counter value: R = r1 r2 r3 r4 … rs
• R is a SecureRandom number, contains s digits and should be at least
128 bits.
3. Secret key: K = k1 k2 k3 k4 … kq
• K contains q digits and should be at least 128 bits.
4. B = R*K
5. X = R + B + K = x1 x2 x3 … xm xt xt+1 xt+2 … xt+n
• t = m + 1
• The length of X (number of digits, t + n) should be greater than the length of C
(number of digits).
• If the length of X is less than the length of C, then X should be expanded up to xt+n.
6. X′ = x1 x2 x3 … xm
7. Plain text: P ¼ X 0 C
This algorithm provides good security against brute-force, ciphertext-only and
known-plaintext attacks.
In this experiment the above algorithm is implemented in the Java language, and its
results are compared with those of the AES algorithm. AES with ECB mode is used for
the comparison because the variance among the modes is trivial for small files (less than
10 MB), and for large files ECB mode takes less time to encrypt and decrypt than the
other modes [5].
We have calculated the execution time of the encryption and decryption methods for
different sizes of text messages. For this experiment, we used an Intel(R) Core(TM) i5-
7500 CPU @ 3.40 GHz with 4 GB RAM, Windows 7 Professional Service Pack 1, 64-bit
operating system, NetBeans IDE 8.2 and jdk-10.0.1_windows-x64_bin.
5 Conclusion
References
1. Rangacarya M, Ganitha-Sara - Sangraha of Mahaviracarya. Cosmo Publication, New Delhi,
India
2. Nagaraju B, Ramkumar P (2016) A new method for symmetric key cryptography. Int J
Comput Appl 142(8):36–39. ISSN: 0975-8887
3. Stallings W (2014) Cryptography and network security: principles and practices. Pearson
Education India, Chennai
146 N. Bollepalli and R. Penu
4. https://docs.oracle.com/javase/7/docs/api/java/security/SecureRandom.html
5. Almuhammadi S, Al-Hejri I (2017) A comparative analysis of AES common modes of
operation. In: 2017 IEEE 30th Canadian conference on electrical and computer engineering
(CCECE)
Text Processing of Telugu–English Code Mixed
Languages
Abstract. In social media, code mixed data has increased, due to which there is
an enormous development in noisy and inadequate multilingual content.
Automation of noisy social media text is one of the existing research areas. This
work focuses on extracting sentiments for movie related code mixed Telugu–
English bilingual Roman script data. The raw data of size 11250 tweets were
extracted using Twitter API. Initially, the data was cleaned and the annotated
data was addressed for sentiment extraction through two approaches namely,
lexicon based and machine learning based. In lexicon based approach, the
language of each word was identified to back transliterate and extract senti-
ments. In machine learning based approach, sentiment classification was
accomplished with uni-gram, bi-gram and skip-gram features using support
vector machine classifier. Machine learning performed better with skip-grams, giving
an accuracy of 76.33%, as compared to the lexicon based approach with an accuracy
of 66.82%.
1 Introduction
A lot of people tend to use multiple languages in social networking. Although various
tasks have been carried out on code mixed texts, the task of sentiment extraction, in
particular, has not been explored for multilingual code mixed texts. This kind of text
differs from traditional English text and has to be processed differently; different forms
of text require different methods for sentiment extraction.
Code mixing is the use of one language within another, the mixing of two or more
languages or language varieties in a piece of content. It frequently happens when the use
of two languages or two cultures cannot be separated well and the components of the
two systems frequently overlap. Code mixing usually occurs in bilingual or multilingual
communities, where the boundaries of the languages cannot be clearly separated.
Code mixing refers to the placing of words, phrases and morphemes of one language
into the articulation of another language. The example below is a mix of two languages,
namely Telugu transliterations and English.
Ex: NTR nenu ee Role cheyalenu ani cheppina character oke okka character….
Ade Sr.NTR gari Role in @Mahanati.
2 Related Work
Language identification became difficult as social media text contains informal text,
bilingual or multilingual text of different scripts or within a script, so language iden-
tification is a major task in processing the text for future applications.
[5] addresses word-level language identification using the FIRE 2013 Bengali–
English, Hindi–English and Gujarati–English code mixed data along with 24 foreign
languages. They constructed a binary classifier using character n-gram features for
n = 1 to 5 and two standard classifiers, namely Naive Bayes and Logistic Regression.
Of all the pairs, Gujarati–English scored the highest accuracy of 94.1%.
[1] Exhibited initial work on identifying Hindi, Bengali and English languages of
code mixed data from facebook posts and comments utilizing two methodologies in a
particular dictionary based approach and machine learning based approach. In Lexicon
based approach they utilized British National corpus, LexNorm-list and SemEval 2013
twitter corpus for identifying English words. As there is no transliterated lexicon to
recognize Bengali and Hindi words so they created trained set of words as a look-up,
with this they accomplished 93.12% accuracy. Further in machine learning approach,
features are character n-grams, presence in dictionary, length of words and capital-
ization. Support Vector Machine (SVM) classifier trained with these features brought
about exactness of 94.75% and Conditional Random Field (CRF) brought about higher
precision of 95.76%.
[2] Presented 3 approaches for identifying languages at the word level in code
mixed Indian online networking content namely n-gram language profiling and
pruning, Dictionary based detection and SVM based detection. TextCat language
detection framework is utilized to generate n-gram profiles arranged by frequency, out-
of-place measure is determined utilizing these profiles which is utilized to predict the
language of the text. In lexical normalization lexicon is utilized to identify English
words, Samsad English–Bengali lexicon and Hindi word-net were transliterated to
Romanized text utilizing Modified Joint Source Channel Model to identify Hindi and
Bengali words, this approach gained 38% and 35.5% F-score for English–Hindi and
English–Bengali. Features such as n-gram with weights, lexicon based, minimum edit
distance (MED) based weights, word context information are fed into SVM classifier
for word level language identification which has brought about high performance of
76.03% F-score for English–Hindi and 74.35% F-score for English–Bengali.
[15] Introduced various strategies to analyze sentiment of text after normalizing the
text. They utilized FIRE 2013 and 2014 Hindi–English data sets. They partitioned their
work into two stages: In the principal stage they identified the language of the words
Text Processing of Telugu–English Code Mixed Languages 149
present in the code mixed sentences utilizing lexicons. In the second stage they
extracted the sentiments of these sentences utilizing SentiWordNet. They handled
finding abbreviations, Spelling rectifications, Slang words, word play, phonetic typing
and transliterations of Hindi words into Devanagari script. They accomplished a pre-
cision of 85%. In [14], they also handled Named entity recognition, ambiguous words
and got exactness of 80%.
[12] has proposed a framework to analyze the sentiment of English–Hindi code
mixed text which is extracted using facebook graph API. This framework incorporates
both dictionary based approach and machine learning based approach. In dictionary
based approach, they made semi-automatic lexicons of words annotated with a
semantic orientation polarity. They utilized the data structure to keep up these lexicons
with their polarity. They characterized text based on the count of positive and negative
words, they accomplished an exactness of 86%. In machine learning based approach,
they implemented SVM, Naive Bayes, Decision tree, Random tree, multi-layer per-
ceptron models with uni-gram words, list of negative words, list of positive words and
list of negation words as features on WEKA tool. They accomplished an accuracy of
72% which is not as much as dictionary based approach.
[10] had extracted sentiment from Hindi–English live twitter information utilizing
lexicon based approach. At first, they recognized the language of each word with the
assistance of n-grams and tagged parts of speech of English–Hindi mixed sentences.
They made two lexicons comprising of important words from the tweets and cate-
gorised the tweets as overall positive, negative and neutral. For looking through the
words in lexicons, a linear search algorithm and dictionary search algorithm were tried
among which dictionary based search has better execution. While classifying data they
joined Breen’s algorithm and Cholesky decomposition for deciding sentiment. They
achieved an accuracy of 92.68% for the positive case and 91.72% for negative case.
[13] presented their work as a part of the shared task at ICON 2017 challenge. They
executed machine learning techniques on Hindi–English and Bengali–English code
mixed social media text. The released datasets were labelled with three classes, namely
positive, negative and neutral. They built a model by training a multinomial Naive Bayes
classifier in WEKA with n-grams and SentiWordNet as features. Finally, they obtained
an F-score of 0.504 for Bengali–English and 0.562 for Hindi–English.
[6] worked on automatically extracting positive and negative opinions from the
English–Bengali and English–Hindi code mixed data of facebook posts using machine
learning approaches. The dataset was gathered from facebook, using adapted data from
ICON-2015. Preprocessing of the data was done by removing noisy data, expanding
abbreviations, and removing punctuation and multiple character repetitions from the
facebook posts. The machine learning algorithms were used to train classifiers with the
number of word matches with SentiWordNet, the opinion lexicon, English sentiment
words and Bengali sentiment words, the density of curse words, parts of speech, the
number of fully capitalized words, the density of exclamation marks, the density of
question marks, the frequency of code switches and the number of smiley matches as
features, using the WEKA software. The best results were produced by a Multilayer
Perceptron model with an accuracy of 68.5% using a combination of word-based and
semantic features.
3 Proposed Methodology
3.1 Data Set
The proposed work has been focused on a bilingual English–Telugu code mixed movie
related data. The data has been scraped from twitter using Twitter API. The scraped
data was then cleaned by removing punctuations, hashtags and further replaced short
forms and slang words. Extraction of sentiment from bilingual code-mixed text has
been done using both lexicon based and Machine learning based approaches.
Language Identification. Identifying language of each word is the main and primary
task for sentiment extraction of code mixed text. In this phase language is identified
through lexicon based approach. Firstly, noise such as hashtags, punctuations and
URLs are removed from the text. Slang words used by the users such as ‘hru’ for ‘how
are you?’, are identified and replaced with the original. Each word in the text is then
looked-up in language dictionaries and tagged with the corresponding language tag.
For English, ‘en’ was tagged using British National corpus [9] and for Telugu, ‘te’ was
tagged using ITRANS format of Leipzig corpora [7]. Further named entities in the text
were tagged as ‘ne’, word level code mixed words in the text were tagged as ‘cm’ using
lexicon which was created for movies and remaining words are tagged as ‘un’
(universal).
e.g.: “Aagnathavasi is really disappointed. Asalu movie lo emi ledhu chala worst
movie. Songs are ok”
After language identification:
In the above sentence, all ‘te’ tagged words are back transliterated to Telugu script.
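A minimal sketch of this dictionary look-up tagging is given below; the word sets stand in for the British National Corpus list, the Leipzig ITRANS Telugu list and the movie lexicon mentioned above, and the sketch is an illustration, not the authors' implementation:

```python
def tag_languages(tokens, english_words, telugu_words, named_entities, cm_words):
    tagged = []
    for tok in tokens:
        w = tok.lower()
        if w in english_words:
            tagged.append((tok, 'en'))
        elif w in telugu_words:         # Romanised (ITRANS) Telugu word
            tagged.append((tok, 'te'))
        elif w in named_entities:       # e.g. movie or actor names
            tagged.append((tok, 'ne'))
        elif w in cm_words:             # word-level code mixed word
            tagged.append((tok, 'cm'))
        else:
            tagged.append((tok, 'un'))  # universal / unknown
    return tagged

print(tag_languages("Asalu movie lo emi ledhu".split(),
                    {'movie'}, {'asalu', 'lo', 'emi', 'ledhu'}, set(), set()))
```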
Sentiment Extraction. Sentiment extraction here is to classify each English–Telugu
code mixed text either positive or negative or neutral through lexicon based approach.
Based on the language of text, the sentiment of code mixed text is determined. Each
word is looked up into sentiment lexicons to find the corresponding positive and
negative lexicons present in each sentence. We used two sentiment lexicons to extract
sentiment of the text:
– Opinion lexicon [8] which consists of 2007 English positive words and 4783
English negative words.
– Telugu sentiwordnet [3] which consists of 2136 positive words, 4076 negative
words, 359 neutral words and 1093 ambiguous words.
We determined the overall sentiment based on the count of positive and negative words
in the code mixed tweets. If the count of positive words is higher, the tweet is classified
as positive; if the count of negative words is higher, the tweet is classified as negative;
and if the counts of positive and negative words are equal, the tweet is classified as
neutral.
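This counting rule can be sketched as follows; the lexicons are assumed to be loaded as plain sets of words from the Opinion lexicon [8] and the Telugu SentiWordNet [3], and the fragment is a simplified illustration only:

```python
def classify_tweet(tagged_tokens, en_pos, en_neg, te_pos, te_neg):
    """tagged_tokens: list of (word, language_tag) pairs from the previous phase."""
    pos = neg = 0
    for word, lang in tagged_tokens:
        if lang == 'en':
            pos += word in en_pos
            neg += word in en_neg
        elif lang == 'te':                 # back-transliterated Telugu word
            pos += word in te_pos
            neg += word in te_neg
    if pos > neg:
        return 'positive'
    if neg > pos:
        return 'negative'
    return 'neutral'                       # equal counts
```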
After sentiment extraction:
4 Results
The results have been drawn on 11250 English–Telugu code mixed tweets and the
proposed approaches are evaluated using precision, recall, F-measure and accuracy. In
lexicon based approach the performance of language identification phase is shown in
Table 2. The accuracy of the language identification is 75.60%.
The accuracy of sentiment extraction for the lexicon based approach is 66.82%. The
results for each class are shown in Table 3. The performance of the machine learning
based approach with uni-grams, bi-grams and skip-grams as features is shown in Table 4.
The machine learning based approach performed far better with skip-grams than the
lexicon based approach. Hence the error analysis for the lexicon approach is discussed in
Sect. 5.
Error analysis shows the limitations of the lexicon based approach, so the results were
analyzed to understand the flaws of the approach.
The above statements should be tagged as positive but were tagged as negative due
to ambiguity, since the English words such as hit and craze are considered negative
opinions.
– Due to the presence of indirect sense, sarcasm and conflicting sentiment, there are
misclassifications.
The above example should be tagged as positive, but due to indirect sense, i.e. the
writer appreciating one person with reference to another person, it was misclassified.
The objectives for enhancing the work in future are as follows. The techniques can be
improved and refined to resolve ambiguous words related to the movie domain and to
handle negation. As the data is related to the movie domain, domain-specific sentiment
lexicons are to be created. The work can also be extended to identify sarcasm, conflicts
and indirect sense using standard approaches, and more machine learning approaches
and features can be added to enhance the performance of the sentiment extraction model.
References
1. Barman U, Das A, Wagner J, Foster J (2014) Code mixing: A challenge for language
identification in the language of social media. In: Proceedings of the first workshop on
computational approaches to code switching, pp 13–23
2. Das A, Gambäck B (2014) Identifying languages at the word level in code-mixed Indian
social media text. International Institute of Information Technology, Goa, India
3. Das A, Bandyopadhyay S (2010) Sentiwordnet for Indian languages. In: Proceedings of the
eighth workshop on Asian language resources, pp 56–63
4. Garcia I, Stevenson V (2009) Reviews-Google translator toolkit. Multiling Comput Technol
20:6–22
5. Gella S, Bali K, Choudhury M (2010) ye word kis lang ka hai bhai? testing the limits of
word level language identification. In: Proceedings of the eleventh international conference
on natural language processing, pp 130–139
6. Ghosh S, Ghosh S, Das D (2017) Sentiment identification in code-mixed social media text.
arXiv preprint arXiv:1707.01184
7. Goldhahn D, Eckart T, Quasthoff U (2010) Building large Monolingual Dictionaries at the
Leipzig Corpora Collection: from 100 to 200 languages. In: LREC, pp 31–43
8. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the tenth
ACM SIGKDD international conference on knowledge discovery and data mining, pp 168–
177. ACM
9. Burnard, L (2000) Reference guide for the British National Corpus, world edition. Oxford
University Computing Services, Oxford
10. Malgaonkar S, Khan A, Vichare A (2017) Mixed bilingual social media analytics: case
study: live Twitter data. In: 2017 international conference on advances in computing,
communications and informatics (ICACCI), pp 1407–1412. IEEE
11. Platt J (1998) Fast training of support vector machines using sequential minimal
optimization. In: Schoelkopf B, Burges C, Smola A (ed) Advances in Kernel methods -
support vector learning. MIT Press. http://research.microsoft.com/~jplatt/smo.html,
http://research.microsoft.com/~jplatt/smo-book.ps.gz,
http://research.microsoft.com/~jplatt/smo-book.pdf
12. Pravalika A, Oza V, Meghana NP, Kamath SS (2017) Domain-specific sentiment analysis
approaches for code-mixed social network data. In: 2017 8th international conference on
computing, communication and networking technologies (ICCCNT), pp 1–6. IEEE
13. Sarkar K (2018) JU KS@ SAIL CodeMixed-2017: sentiment analysis for Indian code mixed
social media texts. arXiv preprint arXiv:1802.05737
14. Sharma S, Srinivas PYKL, Balabantaray, RC (2015) Sentiment analysis of code-mix script.
In: 2015 international conference on computing and network communications (CoCoNet),
pp 530–534. IEEE
15. Sharma S, Srinivas PYKL, Balabantaray RC (2015) Text normalization of code mix and
sentiment analysis. In: 2015 international conference on advances in computing, commu-
nications and informatics (ICACCI), pp 1468–1473. IEEE
A Proficient and Smart Electricity Billing
Management System
Abstract. Electricity is a form of energy that plays a major role in human life. In day-
to-day life, every device, from machinery to a wrist watch, works on electricity. It is the
most basic requirement next to food, shelter and clothing. Over the past decades a lot of
changes have taken place in electricity departments, but even now they use a manual
billing system. This system has a wide range of disadvantages: malpractices during
billing, escape from punishment for late payments, manpower needed for billing and
collecting bills, and wastage of paper. Moreover, if a fire accident or a technical problem
arises, the whole lane (transformer) is disconnected from the power supply, which causes
inconvenience to the other consumers too. Here, we are concerned about the economic
loss that arises due to the manual billing system. In the manual billing system, a bill is
generated at the end of every month or every couple of months: an employee from the
electricity department comes to each and every house and bills the meter based on the
number of units the consumer has consumed. Hence, in this paper, we propose and
discuss a new adaptive mechanism which reduces all the above-mentioned losses.
1 Introduction
The purpose of this paper is to present a mechanism to prevent malpractices done
while generating electricity consumption bills, to prevent escape from punishment, and
to reduce the manpower needed for billing and collecting bills and the wastage of paper.
So far many mechanisms have been introduced and implemented, but this mechanism
requires two devices. The first is an integrated device that displays the price and the
power consumed and is connected to the server by means of a SIM card, with the
AADHAR card registered in the name of the house owner linked to the electricity meter
number; it also supports e-payments (online payments). This device is fixed to the meter
in such a way that it requires a technician to replace or repair it, so that some malpractices
can be prevented. Just by changing the price per unit in the server, all the electricity
meters get updated with the new values. The second device is the Cash Deposit Machine
(CDM), which is used for bill payments.
2 Related Work
Many systems have been proposed in order to reduce this manual billing. Nowadays
the electricity department uses a billing device that directly reads the meters and copies
the number of units consumed, and billing is done; however, in this case manpower is
again required to carry the device to all the meters. Many people and teams have
proposed techniques to reduce manpower in electricity billing and to make the work
simple and efficient. Most of those solutions are GSM based: a GSM module is connected
to the meter and a message is sent from a mobile to the particular SIM attached to that
meter, and bills are generated for the consumers once a month or once every couple of
months [2]. The figure below shows one such smart billing system published in July 2015
[1] (Fig. 1).
3 Methodology
(Block diagram of the metering unit: supply, SPDT relay, power calculator, power meter circuit, microcontroller, GSM module and LCD display.)
The retrofitting device is set so that it responds only to messages from the two
authorized numbers, so that no other person can get access to the meter. Every minute,
hour or day, the display is updated with the value of the power consumed and the charge
for that consumption. The price value can also be updated through a message. The SIM
card number is the same as the registered mobile number of the house owner; in case
they change their mobile number, they should update their details in the electricity
department by providing an acknowledgment letter [6]. This should also be linked with
the AADHAR number of the customer, which helps in bill payment. At every month
end, the GSM module generates a message to the house owner's mobile number with the
monthly consumption. If there is any late payment, the GSM module automatically adds
a fine to that bill. If the late payment extends beyond one month, the power supply from
the mains to the house is automatically terminated, so that until they pay the bill they
will not be able to use power. In case of a fire accident, if we make a call to the electricity
department and inform them, they will turn off the relay board using a message, so it will
be secure. The CDM can be used for bill payments using the registered mobile number
or by entering the AADHAR number along with the meter number. As these machines
are connected to the server, they directly update the house meter display [8].
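The billing logic the device follows can be summarised in a few lines. The tariff, fine amount and cut-off period below are placeholder values for illustration, not figures from the paper:

```python
PRICE_PER_UNIT = 5.0    # rupees per unit; updated centrally from the server (assumed value)
LATE_FINE = 50.0        # fine added automatically for a late payment (assumed value)

def monthly_bill(units_consumed, days_overdue=0):
    amount = units_consumed * PRICE_PER_UNIT
    if days_overdue > 0:
        amount += LATE_FINE
    disconnect = days_overdue > 30          # supply terminated if overdue beyond one month
    message = "Units: %d, Amount due: Rs %.2f" % (units_consumed, amount)
    return amount, disconnect, message      # message text sent via the GSM module

print(monthly_bill(142, days_overdue=0))
```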
This technique brings a lot of change: it avoids the manual billing system, reducing
the manpower needed for generating and collecting the bills. It helps the government in
the proper and strict collection of bills, and most malpractices can be avoided. It offers an
easy payment method for the consumers, with the CDM service available 24×7. In case of
fire accidents, easy handling of power is possible. The system helps to save paper, and all
the data is stored in the database so that a soft copy can be generated when required [5].
This is a smart way of using technology to reduce manpower and increase work
efficiency and accuracy without any malpractices. Here the work is the outcome of
smart devices.
• Low cost and efficient in billing.
• Reduces paper wastage.
• It can be implemented in rural, remote areas.
• Reduces most of the manpower.
• Prevents most of the malpractices done by consumers.
• Automatic (programmed) penalties and punishments will be implemented which
reduces revenue loss to DISCOMs.
• Power consumption and charges can be monitored through display on a meter.
• Direct implementation of new charges (per unit) from a server.
• The power supply can be stopped immediately from the server in case of fire accidents.
• Regular alerts for bill payments through recorded calls and messages.
• User-friendly.
In this world of technology, solving a problem in a smart way is most necessary. This
methodology brings a solution to the wastage of manpower, inefficient and inaccurate
billing, widespread malpractices and irregular payments in electricity billing
departments. Proper collection of bills and taxes will enhance the nation's economy. If
the government is interested in enabling online payments, a mobile application can be
built, or an updated version of the presently available mobile application can be
developed; this helps in making transactions easier. This mobile application should also
help to file complaints, apply for new meter connections, etc.
References
1. Mohale VP, Hingmire AG, Babar DG (2015) Ingenious energy monitoring, control, and
management of electrical supply. In: 2015 international conference on energy systems and
applications, Pune, pp 254–257
2. Kanthimathi K et al (2015) GSM based automatic electricity billing system. Int J Adv Res
Trends Eng Technol (IJARTET) 2(7):16–21
3. Mahajan K, Jaybhave D, Nagpure N, Shirsat B (2016) A novel method for automatic
electricity billing system using Ad-Hoc networks. In: 2016 international conference on
global trends in signal processing, information computing and communication (ICGT-
SPICC), Jalgaon, pp 539–542
4. Merza AM, Nasr MS (2015) Electrical energy billing system based on smart meter and
GSM. Int J Appl Eng Res 10:42003–42012
5. Rastogi S, Sharma M, Varshney P (2016) Internet of Things based smart electricity meters.
Int J Comput Appl (0975 – 8887) 133(8):13–16
6. Rathnayaka MRMSB, Jayasinghe IDS et al (2013) Mobile based electricity billing system
(MoBEBIS). Int J Sci Res Publ 3(4):1–5. ISSN: 2250-3153
7. Tamarkin TD (1992) Automatic meter reading. Public Power Mag 50(5):934–937
8. Jain A, Kumar D, Kedia J (2012) Smart and intelligent GSM based automatic meter reading
system. Int J Eng Res Technol (IJERT) 2(3):1–6. ISSN: 2278-0181
9. “Handbook for Electricity Metering” by The Edison Electric Institute, The Bible of electric
meters, continuously updated since electricity was discovered
10. Shoults RR, Chen MS, Domijan A (1987) The energy systems research center electric power
system simulation laboratory and energy management system control center. IEEE Power
Eng Rev PER-7(2):49–50
11. Rawat N, Rana S, Yadav B, Yadav N (2016) A review paper on the automatic energy meter
reading system. In: 2016 3rd international conference on computing for sustainable global
development (INDIACom), New Delhi, pp 3254–3257
Building a Character Recognition System
for Vehicle Applications
Abstract. Today the number plate of a vehicle is very important for verification of
its owner's ID and address, for vehicle identification and for security purposes. Number
plates are of different shapes, colours and sizes in different countries. In India, number
plates have a white background with black foreground colour. From the number plate we
can identify the registration number using image processing: an image of the vehicle is
captured and processed to identify the number. We can also check the location and detect
non-permit holders and stolen vehicles. The Optical Character Recognition (OCR)
technique is used to read the characters from the captured image of the vehicle. Character
recognition is one form of OCR, in which we read the characters from the vehicle
number plate and use them to identify the owner of the vehicle, along with details such
as the name of the owner, the place (state and district), the date of registration, the
registration number and the vehicle type, i.e., whether it is a four-wheeler or a
two-wheeler. We have proposed this methodology to check the details of a vehicle. The
vehicle number plate is also used at electronic tolls to collect pay-per-use charges on
highways, to measure journey times and to collect tickets. The camera used for this
process is an infrared camera, which captures the image in all weather conditions, day
or night.
1 Introduction
The licensed number plate is very useful these days because of the large increase in
vehicles. The information extracted from the vehicle's number plate [1] is used for
various purposes such as access control, traffic monitoring, toll roads, border control
areas, military areas and other restricted zones, mainly for security. The main concern of
this paper is to provide effective security and to help control criminal activities. For this,
we capture the image of the vehicle using HD cameras and then scan that image using
the OCR technique, as shown in Fig. 1. Using this technique, the number is compared
with the database to check whether the vehicle [2] belongs to its registered owner or not.
The recognition process is generally sub-divided into five categories:
(a) Capturing the image of the license plate, i.e., Image acquisition.
(b) Normalization, adjusting the brightness and contrast of the image.
(c) Localizing the license plate.
(d) Locating and identifying the individual symbol images on the plate, i.e., Character
Segmentation.
(e) Optical Character Recognition, i.e., OCR.
Fig. 1. OCR model
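A compact sketch of steps (b)–(e) above, assuming the plate region has already been localized, could use OpenCV for normalization and the Tesseract engine (which the conclusion of this work notes is used, via pytesseract); the exact parameters here are illustrative assumptions:

```python
import cv2
import pytesseract

def read_plate(plate_image_path):
    img = cv2.imread(plate_image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # normalization: grayscale
    gray = cv2.bilateralFilter(gray, 11, 17, 17)          # smooth noise, keep character edges
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # character recognition; psm 7 treats the plate as a single text line
    text = pytesseract.image_to_string(binary, config='--psm 7')
    return ''.join(ch for ch in text.upper() if ch.isalnum())

print(read_plate('plate.jpg'))   # hypothetical image; output like 'HR51AX8052'
```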
These operations are used to remove the noise from the images. To identify a
vehicle, a unique identification number is given to it as a number plate, which is applied
on the front as well as the back of the vehicle and serves as the vehicle's unique ID. For
example, in the car number HR51AX8052, the first two letters show the state code, the
next two digits show the district code, the next two letters are the series of the
registration, and the last four digits are the unique number provided to the vehicle.
Using this number, we can retrieve the details of the vehicle. The RTO, i.e. the
district-level Regional Transport Office, provides this number to each and every vehicle.
This numbering scheme has some advantages [11, 12]: it shows the state and district of
registration of the vehicle, and during a police investigation of a road accident or a
vehicle-related crime, if a witness reads the number of the vehicle it helps the
investigators reach the culprit easily.
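The decomposition just described can be captured with a simple pattern. The sketch below parses the example HR51AX8052 into its parts; real plates may use one to three series letters, so the pattern is only indicative:

```python
import re

PLATE = re.compile(r'^([A-Z]{2})(\d{2})([A-Z]{1,3})(\d{4})$')

def parse_plate(number):
    m = PLATE.match(number.replace(' ', '').upper())
    if not m:
        return None
    state, district, series, unique = m.groups()
    return {'state': state, 'district': district, 'series': series, 'number': unique}

print(parse_plate('HR51AX8052'))
# {'state': 'HR', 'district': '51', 'series': 'AX', 'number': '8052'}
```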
The remaining part of this work is organized as follows: Sect. 2 discusses the
methodology used in this work. Section 3 discusses our proposed method in detail.
Some applications of registered vehicle number plates are discussed in Sect. 4, and
several applications of OCR are discussed in Sect. 5. Finally, Sect. 6 concludes this
work briefly with some future work.
2 Methodology Used
In this work we operate on CCTV (closed-circuit television) footage [11], which
provides the input image. The CCTV footage must be clear: the contrast of the input
image must be good and the number plate must be in a standard format.
To detect a number plate, the following steps are followed (refer to Fig. 2):
F. Vehicle Tracking: Vehicle tracking means following the path of the vehicle as it
passes various traffic signals, by matching the images scanned by the different cameras
on different routes.
Hence, this section discussed the methodology used with Optical Character
Recognition (OCR). The next section will discuss our proposed method in detail.
3 Proposed Method
At this time there is no existing system which automatically scans a moving vehicle;
we have to check manually the CCTV cameras at the different traffic signals in every
area the vehicle passes through. In the proposed system, a sensor automatically triggers
the scan to detect the vehicle at different locations, and a digital camera is kept ready to
capture the image (e.g., refer to Fig. 3). After capturing the image, we compare it with
images captured at other locations and other places to detect the path along which the
vehicle is passing (for output, refer to Fig. 4). This technique works on the basis of a
60–70% match between the images, by which it provides the result from different
images of the same vehicle number plate (the complete process can be seen in Fig. 5).
The number plate is helpful for stolen vehicles, parking organization, toll plazas and
restricted zones. The reason for converting the image to text is to overcome problems
like the multiplicity of plate formats, dissimilar scales, rotations and non-uniform
illumination conditions caused during image acquisition.
Some Challenges: Problems that occur while capturing the image are poor resolution
of the capturing camera, blurry images because the vehicle is in motion, and poor
lighting in the area the vehicle is moving through, as well as low contrast, over-exposure,
reflection and shadows in the image.
Hence, this section discussed our proposed method in detail. The next section will
discuss several applications of the registered vehicle number plate.
4 Some Applications
(c) Road-Tolling: Here the use of a particular road or highway is concerned, where
we pay per use of the road during a journey. It helps to find the location of a
particular vehicle and is also useful to measure the journey time.
• Reducing travel time,
• Reducing congestion and improving roadway quality,
• Reducing fraud related to non-payment,
• Making charging effective,
• Reducing the manpower required to process exception events.
(d) Border Control: It is helpful in border areas, which may be under military or army
control, where the security need is very high, to reduce crime and for investigation
purposes.
• Countering terrorism,
• Unlawful cross-border traffic,
• Smuggling and other illegal activities.
(e) Journey Time Measurement: Journey time measurement is used while travelling
along various routes; the number of the vehicle is noted at every route point to
measure the time, and in an accident case the particular vehicle can easily be
traced during the investigation.
• Feeding back information to road users to improve traffic safety,
• Supporting efficient law enforcement,
• Optimizing traffic routes,
• Reducing costs and time, etc.
(f) Law Enforcement: Law enforcement is useful to find stolen vehicles, to detect
vehicles which break traffic rules, and to act against over-speeding vehicles.
• Red-light enforcement,
• Over-speed charging,
• Vehicle lane control.
For example, in an Intelligent Transport System (ITS), we can call this application ITS
because here we track the vehicle and, using its number plate, we can find the person's
complete details and the vehicle details as well. This makes the Regional Transport
Office (RTO) system more intelligent and beneficial to the public, and traffic monitoring
can be handled using it.
Hence, this section discussed several applications of the registered vehicle number in
detail. The next section will discuss applications of Optical Character Recognition.
OCR (Optical Character Recognition) is a widely used technology today. It is used to scan document text so that it can be recognized by computers. It is a document-management technology that provides a smart way to manage or use the text of a document image, for security purposes or to save records in the databases of companies or offices.
(a) Banking: In banking, OCR is used to process cheques with little human involvement. The cheque is inserted into a machine, the text on it is scanned automatically and the stated amount is deposited in the account. OCR is also used, with manual confirmation, for handwritten cheques.
(b) Industry: In the legal industry, digitization is frequently used to reduce the use of paper. To save paper and space, paper documents, files and records are scanned and saved in a computer database with strong security, so that the data and information can be used for a long time and managed or processed under the access control of a particular person or group of persons.
(c) Healthcare: Healthcare professionals also use OCR technology in hospitals to store patient records. They hold large volumes of patient files, such as insurance and personal information, and make use of electronic document scanning to keep these documents safely in digital form with the help of computers.
(d) Digital Signature: OCR is used in many fields, since it supports their applications well and has many benefits, for example in the education, finance and government sectors. The digital signature is one of its basic applications, as it is used in many tasks and for identification purposes. A digital signature is an electronic signature in which a particular person's signature is scanned by the computer to verify its authenticity.
Hence, this section has discussed Optical Character Recognition applications in brief. The next section concludes this work and briefly outlines future work.
We have tested and evaluated OCR technology on vehicle number plates to extract the text from the plates for security purposes and for identifying the vehicle owner. Some issues arise when OCR technology is applied to vehicles, such as the varying format of the number plate, noise in the image and camera pixel clarity, which affect the effectiveness of the OCR. The software is implemented in Java, MySQL is used for database storage and Tesseract is used as the OCR engine to extract the text from the image. Some further applications are:
• Hotel/lodge identity check-in
• Tax-free shopping
• Self-service meter reading
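The system described above is stated to be written in Java with MySQL and Tesseract; purely as an illustrative sketch of the same OCR step, the following Python fragment uses the pytesseract binding, with the file name and the plate-format pattern as placeholder assumptions.

import re
import cv2
import pytesseract

# Illustrative equivalent of the Tesseract step (the actual system is in Java).
image = cv2.imread("captured_plate.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(binary, config="--psm 7")   # single text line
plate = re.sub(r"[^A-Z0-9]", "", text.upper())

# Example format check for an Indian-style registration number (assumption).
if re.fullmatch(r"[A-Z]{2}\d{2}[A-Z]{1,2}\d{4}", plate):
    print("Recognized plate:", plate)
else:
    print("OCR output did not match the expected plate format:", plate)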
References
1. Broumandnia A, Fathy M (2005) Application of pattern recognition for Farsi license plate
recognition. ICGST Int J Graphics, Vis Image Process 5:25–31
2. Puranic A, Deepak KT (2016) Vehicle number plate recognition system: a literature review
and implementation using template matching. Int J Comput Appl 134(1):12–16 ISSN: 0975-
8887
3. Rahman CA, Radmanesh A (2003) A real time vehicle’s license plate recognition system. In:
Null, p 163
4. Anagnostopoulos C-NE (2014) License plate recognition: a brief tutorial. Intell Transp Syst
Mag (IEEE) 6(1):59–67
5. Madhu Babu D et al (2015) Vehicle tracking using number plate recognition system. Int J
Comput Sci Inf Technol (IJCSIT) 6(2):1473–1476
6. Bhardwaj D, Mahajan S (2015) Review paper on automated number plate recognition
techniques. Int J Emerg Res Manage Technol 4(5):319–324. ISSN: 2278-9359
7. Saghaei H (2016) Automatic license and number plate recognition system for vehicle
identification. In: 2016 1st international conference on new research achievements in
electrical and computer engineering
8. Kaur K, Banga VK, Number plate recognition using optical character recognition (OCR)
9. Qadri MT, Asif M (2009) Automatic number plate recognition system for vehicle
identification using optical character recognition. In: 2009 international conference on
education technology and computer, ICETC’09, pp 335–338
10. Sulaiman N (2013) Development of automatic vehicle plate detection system. In: IEEE
international conference on system engineering and technology (ICSET), pp 130–135
11. Shaikh S, Lahiri B (2013) A novel approach for automatic number plate recognition. In:
IEEE international conference on intelligent systems and signal processing (ISSP),
pp 275–380
12. Kaur S, Kaur S (2014) An efficient approach for automatic number plate recognition system
under image processing. Int J Adv Res Comput Sci 5(6):43–50
Machine Learning Technique for Smart City
Development-Focus on Smart Mobility
Abstract. This work summarizes the current state of understanding the smart
city concept and how machine learning can be applied for the development of
the Smart City. The main innovation coming from the Smart City concept is the rise of a user-centric approach that considers urban issues from the perspective of citizens' needs. The Smart City concept has been defined to get an understanding of how it can contribute towards urban development. In the approach to the
Smart Cities Mission, the objective is to promote cities that provide core
infrastructure and give a decent quality of life to its citizens, a clean and sus-
tainable environment and application of Smart Solutions. This paper presents a
theoretical perspective on the smart cities focused on data mining using machine
learning technique. In a smart city, a lot of data need to be automatically
processed and analyzed. A review has been done on the machine learning
algorithms applied to the smart city. A smart city aims to improve the quality and efficiency of urban services by using digital technologies or information and communication technologies, and data analytics plays an important role in smart cities. An insight is provided into machine learning integrated with data mining applied to smart mobility, with the future focus to be on smart energy.
1 Introduction
Cities have strong imbalances, and the negative effects surpass the positive ones if they are not properly managed. It is therefore necessary to understand how a Smart City can define ideas and achieve urban growth priorities, how Smart Cities learn to reduce these problems, and how citizens can be engaged to participate in Smart City management processes. Accordingly, public officials should facilitate contacts in the Smart City and provide services automatically in real time. The main dimensions include smart mobility, smart environment, smart people, smart energy, smart education and smart healthcare. A user-centric approach considers urban issues from the perspective of citizens' needs and involvement.
2 Related Work
Eiman Al Nuaimi et al. [3] worked on the applications of big data to smart cities, reducing costs and resource consumption in addition to engaging with citizens more effectively and efficiently. They assessed the applications of big data to support smart cities and attempted to identify the prerequisites that support the use of big data applications for smart city services. They also examined some of the main open issues that should be further investigated and addressed to achieve a more complete perspective of smart cities and to develop them in a holistic, well-thought-out model.
Mohammad Saied et al. [1] studied machine learning for Internet of Things data analysis. The smart city is one of the most important applications of IoT and provides diverse services in domains such as energy, mobility and urban planning. It was demonstrated that these services can be enhanced and improved by analyzing the smart data gathered from these areas. With the goal of extracting knowledge from the gathered data, numerous data-analytics algorithms were applied.
Jagannathan Venkatesh et al. [4] worked on modular and personalized smart health application design in a smart city environment. They applied a modular methodology for IoT applications, the context engine, to smart health problems, enabling the capacity to grow with available data, to use general-purpose machine learning, and to reduce compute redundancy and complexity. This exposes the intermediate context for reuse, resulting in new data-access frameworks being extended and upgraded.
Mehdi Mohammadi et al. (2017) worked on enabling cognitive smart cities using big data and machine learning: approaches and challenges. The development of smart cities and their fast-paced deployment is resulting in the generation of large amounts of data at unprecedented rates. They proposed a semi-supervised deep reinforcement learning framework to address the presented challenges and highlighted the position of the framework in various smart city application domains. Finally, they articulated several challenges and trending research directions for incorporating machine learning to realize new smart city services.
3 Objectives
4 Smart Mobility
Mobility is another critical part of the city. Through data mining, city authorities can enhance the quality of life in the city. This includes improving the productivity and administration of transport through the use of video surveillance and remote detection technologies to monitor traffic facilities and conduct related data analysis for managing traffic flow, pedestrian flow and cargo flow in real time and handling emergencies. It also promotes mixed-mode access, which incorporates different modes of transportation, including public transport, clean-fuel vehicles, cycling and walking. Smart mobility is the most promising approach to decrease congestion and to create fast, green and cheap traffic. A Smart City transport framework improves travel within a city, reducing energy use and carbon emissions. Most smart traffic management systems use data gathered from different sources about the existing infrastructure to improve traffic. Smart mobility means innovative transport and transportation infrastructure, which saves resources and creates new technologies for maximum productivity. Accessibility, affordability and safety of transport systems, and compact urban development, are basic factors in this context. New user-friendly facilities will make it easier for people to switch to integrated transport systems focused on environmentally friendly transport modes.
Machine learning enables systems to learn from data, without being explicitly programmed, and to act as humans do. The systems can improve their learning over time in an automated fashion, by feeding in data and information as observations and real-world interactions. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.
Two of the most widely adopted machine learning methods are supervised learning, which trains algorithms on example input and output data labelled by humans, and unsupervised learning, which provides the algorithm with no labelled data in order to allow it to find structure within its input data. In this paper, we review the machine learning algorithms applied to the smart city. A smart city aims to improve the quality and efficiency of urban services, and data analytics plays an important role in smart cities.
(Figure: the data mining process cycle, comprising problem definition, data exploration, data preparation, modeling, evaluation and deployment.)
A clustering algorithm is applied to the available data. In this way the main points where traffic congestion is most serious can be identified and communicated to the traveller. The k-means algorithm was run with 60 iterations (Fig. 2).
(Fig. 2: input (training) data set, data preprocessing and transformation, clustering using the k-means algorithm, class label assignment.)
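A minimal sketch of this clustering step is given below, assuming the traffic observations are available as simple numeric feature vectors; the illustrative columns, the number of clusters and the scaling step are assumptions, and only the 60-iteration limit is taken from the text.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical traffic records: [latitude, longitude, average speed (km/h)].
X = np.array([
    [17.385, 78.486, 12.0],
    [17.401, 78.470, 35.0],
    [17.392, 78.490, 10.5],
    [17.440, 78.350, 52.0],
])

X_scaled = StandardScaler().fit_transform(X)            # preprocessing/transformation
kmeans = KMeans(n_clusters=2, max_iter=60, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)                   # class label assignment

for point, label in zip(X, labels):
    print(point, "-> cluster", label)

Points falling in the cluster with the lowest average speed can then be reported to the traveller as congestion hot-spots.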
7 Conclusion
Here a methodology has been proposed for smart mobility using k-means clustering. Smart mobility can be achieved by using the inputs of smart traffic on a real-time basis to program the machine to take appropriate decisions for smart mobility. A similar methodology can be extended to fields like smart energy, thereby leading towards the smart city. Smart energy is one of the most important research areas, since it is essential to reduce overall power consumption while offering high-quality, affordable and environmentally friendly energy. Moreover, the smart energy infrastructure will become more complex in the future.
References
1. Ahmadinejad MS, Razvan MR, Barekatain MA (2017) Machine learning for internet of
things data analysis: a survey. J Digital Commun Netw
2. Khan Z, Anjum A, Soomro K, Tahir MA (2015) Towards cloud based big data analytics for
smart future cities. J Cloud Comput Adv Syst Appl 4:2. https://doi.org/10.1186/s13677-015-
0026-8
3. Al Nuaimi E, Al Neyadi H, Mohamed N, Al-Jaroodi J (2015) Applications of big data to
smart cities. J Internet Serv Appl. https://doi.org/10.1186/s13174-015-0041-5
4. Venkatesh J, Aksanli B, Member, modular and personalized smart health application design
in a smart city environment. IEEE Internet of Things J. https://doi.org/10.1109/jiot.2017.
2712558
5. Kolomvatsos K, Anagnostopoulos C (2017) Reinforcement learning for predictive analytics
in smart cities. Manuel Pedro Rodríguez Bolívar 3:16
6. Batista DM, Goldman A, Hirata R Jr (2016) Inter smart city: addressing future internet
research challenges for smart cities. IEEE international conference on network for the future
7. Sharma N, Singha N, Dutta T (2015) Smart bin implementation for smart cities. Int J Sci Eng
Res 6(9), September, ISSN 2229-5518
8. Novotny R, Kadlec J, Kuchta R (2014) Smart city concept, applications and services.
J Telecommun Syst Manage, ISSN: 2167-0919 JTSM, an open access journal. https://doi.
org/10.4172/2167-0919.1000117
9. Wu SM, Chen T, Wu YJ, ID, Lytras M (2018) Smart cities in Taiwan: a perspective on big
data applications, sustainability 10:106. https://doi.org/10.3390/su10010106
10. Hashem IAT, Anuar NB, Adewole KS (2016) The role of big data in smart city. Int J Inf
Manage, May https://doi.org/10.1016/j.ijinfomgt.2016.05.002
11. Battula BP, Dr. Prasad RS (2013) An overview of recent machine learning strategies in data
mining. Int J Adv Comput Sci Appl 4:3
12. Chan PK, Lippmann RP (2006) Machine learning for computer security. J Mach Learn Res
7:2669–2672
13. Heureux L et al (2017) Machine learning with big data: challenges and approaches. IEEE
Access 5:7776–7797
14. Das K, Behera RN (2013) A survey on machine learning: concept, algorithms and
applications. Int J Innovative Res Comput Commun Eng 5(2)
15. Mohammed S, Mohammed O, Fiaidhi J, Fong S, Kim TH (2013) Classifying unsolicited
bulk email (UBE) using python machine learning techniques. Int J Hybrid Inf Technol 6(1)
16. Sajana T, Sheela Rani CM, Narayana KV, A survey on clustering techniques for big data
mining. Indian J Sci Technol 9(3). https://doi.org/10.17485/ijst/2016/v9i3/75971
17. Jumutc V, Langone R, Suykens JA (2015) Regularized and sparse stochastic k-means for
distributed large-scale clustering, in: big data (Big Data), 2015 IEEE international
conference on, IEEE, pp 2535–2540
18. Coates A, Ng AY (2012) Learning feature representations with k-means. Springer Berlin,
Heidelberg, pp 561–580. https://doi.org/10.1007/978-3-642-35289-8_30
19. Ma X, Wu YJ, Wang Y, Chen F, Liu J (2013) mining smart card data for transit riders travel
patterns. Transp Res Part C: Emerg Technol 36:1–12
20. Tao X, ji C (2014) Clustering massive small data for IOT, in: 2nd international conference
on systems and informatics (ICSAI), 2014, IEEE, pp 974–978
Smart Posture Detection and Correction
System Using Skeletal Points Extraction
1 Introduction
Human posture has an impact on human health, both physically and mentally. Many methods have been proposed to detect the different postures of a human being. The authors of [1, 9, 7] developed fall detection algorithms based on posture analysis. Posture analysis also plays an important role in the field of medicine, to determine the sleeping posture of a patient [4, 6]. The major posture analysis approaches are the sensor-based approach and the image processing-based approach. Many models emphasize posture detection using the sensor-based approach, in which the person needs to wear special gadgets or sensors; this is mainly helpful for fall detection [3, 5, 8, 9, 11]. The image processing-based approach helps to analyse standing as well as sitting postures [2, 10, 12].
Recent studies have shown that sitting posture not only affects our body physically but also plays an important role in our ability to concentrate, which implies that sitting posture also affects our learning abilities. The physical effect of improper sitting posture is greater in people who work with laptops or computers for a significant part of the day. In the sensor-based approach the person has to wear sensors all the time, which makes the person uncomfortable. In the image processing-based approach a depth sensor is used to get a 3D image [12], from which the sitting posture is identified. In practice, however, laptops and desktops are not equipped with a depth sensor, so this approach fails to detect the sitting posture without one.
2 Proposed Model
In the proposed model, there is no use of a depth sensor, rather a normal web camera or
laptop camera is used to get a 2D image and analyse the sitting posture based on human
skeletal points, thus making the model available for all the persons who spend a lot of time
in front of their laptop screens. Thus the proposed model requires no additional hardware
to correct the sitting posture. Fig. 1 explains the steps involved in the working model.
There are several steps involved in the development of smart posture detection and
correction system. The entire design process can be divided into four major steps. They
are (1) skeletal points extraction and dataset creation (2) Training the model using
KNN (K-Nearest Neighbors) algorithm (3) Real time testing (4) Correction or
Recommendation.
Fig. 2. Image without skeletal points Fig. 3. Image with skeletal points
Now the model is trained using the KNN (K-Nearest Neighbors) algorithm. KNN classifies a posture based on the distances between the skeletal points and the value of "k" (the number of neighbours to consider).
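A minimal sketch of this training step is given below, assuming each dataset row is a flattened vector of (x, y) coordinates of the extracted skeletal points; the feature layout, labels and example values are illustrative, not the authors' dataset.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Each row: flattened (x, y) coordinates of a few skeletal points (illustrative).
X_train = np.array([
    [0.50, 0.20, 0.50, 0.45, 0.48, 0.70],   # upright sitting posture
    [0.50, 0.30, 0.55, 0.55, 0.60, 0.75],   # slouched posture
    [0.50, 0.22, 0.51, 0.46, 0.49, 0.71],
    [0.52, 0.32, 0.57, 0.57, 0.62, 0.77],
])
y_train = ["correct", "wrong", "correct", "wrong"]

knn = KNeighborsClassifier(n_neighbors=3)    # k = number of neighbours considered
knn.fit(X_train, y_train)

# Skeletal points extracted from a new frame are classified in real time.
new_pose = np.array([[0.51, 0.29, 0.56, 0.54, 0.59, 0.74]])
print(knn.predict(new_pose))                 # e.g. ['wrong'] -> trigger voice alert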
3 Conclusion
The proposed smart posture detection and correction system introduced a design flow
for sitting posture detection and correction. The designed system uses a web camera or
laptop camera to capture the image. The captured image is processed to extract skeletal
points using OpenCV, which are then passed through the trained model to determine
the sitting posture of the person. The system gives a voice message to adjust the posture
whenever a wrong posture is encountered. Thus the designed model makes sure that the
person does not sit in the wrong posture, which helps to reduce the adverse effects of
sitting in wrong posture such as back pain, soreness, poor circulation, cervical pains
and decrease in eyesight.
References
1. Cheng H, Luo H, Zhao F (2011) A fall detection algorithm based on pattern recognition and
human posture analysis, Proceedings of ICCTA
2. Stoble JB, Seeraj M (2015) Multi-posture human detection based on hybrid HOG-BO
feature, 2015 Fifth international conference on advances in computing and communication,
pp 37–40
3. Miljković N, Bijelić G, Garcia GA, Popović MB (2011) Independent component analysis of EMG for posture detection: sensitivity to variation of posture properties. 19th Telecommunications forum TELFOR 2011, pp 47–50
4. Lee HJ, Hwang SH, Lee SM, Lim YG, Park KS (2013) Estimation of body postures on bed
using unconstrained ECG measurements. IEEE J Biomed Health Inform 17(6):985–993
(2013)
5. Matsumoto M, Takano K (2016) A posture detection system using consumer wearable
sensors. 10th international conference on complex, intelligent, and software intensive
systems
6. Wahabi S, Pouryayevali S, Hatzinakos D (2015) Posture-invariant ECG recognition with
posture detection, ICASSP 2015, pp 1812–1816
7. Tan TD, Tinh NV (2014) Reliable fall detection system using an 3-DOF accelerometer and
cascade posture recognitions, APSIPA
8. Ni W, Gao Y, Lucev Z, Pun SH, Cifrek M, Vai MI, Du M (2016) Human posture detection
based on human body communication with multi-carrier modulation, MIPRO 2016, Opatija,
Croatia, pp 273–276
9. Yi WJ, Saniie J (2014) Design flow of a wearable system for body posture assessment and
fall detection with android smartphone, 2014 IEEE international technology management
conference
10. Terrillon JC, Pilpré A, Niwa Y, Yamamoto K, DRUIDE: A real-time system for robust
multiple face detection, tracking and hand posture recognition in color video sequences. 17th
international conference on pattern recognition (ICPR’04)
11. Chopra S, Kumar M, Sood S (2016) Wearable posture detection and alert system. 5th
international conference on system modeling & advancement in research trends, pp 130–134
12. Bei S, Xing Z, Taocheng L, Qin L (2017) Sitting posture detection using adaptively fused
3D features. 2017 IEEE 2nd Information Technology, Networking, Electronic and
Automation Control Conference (ITNEC), pp 1073–1077
Multicast Symmetric Secret Key Management
Scheme in Mobile Ad-hoc Networks
1 Introduction
Mobile Ad hoc Networks (MANETs) are a type of wireless network that operates over multi-hop radio transmission without any permanent infrastructure. Because of the unique characteristics of MANETs, such as dynamic topology, radio links, scarcity of resources and the absence of central coordination, they are far more susceptible to security attacks than wired and cellular wireless networks [1]. Secrecy is one of the most important issues in MANETs for thwarting attacks. Multicast communication plays an important role in MANETs in providing group-oriented communication for applications such as military operations, search-and-rescue and warfare situations. A secure group key in multicasting is required to handle the group communication issues in MANETs. Creating a shared and secure cluster key means that many users need to calculate a shared key to exchange information in a secure manner.
There are several group key management protocols for wired networks, infras-
tructure networks and as well as for MANETS. All these protocols are grouped into
three types: (i) Centralized Group Key Management Schemes (ii) Decentralized Group
Key Management Schemes (iii) Distributed Group Key Management Schemes. Dis-
tributed key Management protocols have no single point of failure, low message
overhead and less computational complexity in rekeying than Centralized and
Decentralized group key management protocols [2]. Rekeying means, when a user
enters or leaves the cluster, a new Shared Secret Group Key is to be produced. Dis-
tributed Group Key Agreement protocols for multicast communication are classified in
two categories: (1) symmetric GKA and (2) asymmetric GKA. We propose a symmetric GKA protocol, the "Multicast Symmetric Secret Key Management Scheme" (MC-SSKMS), for MANETs and discuss its performance using the Key Delivery Ratio (KDR), delay in key transmission and energy consumption metrics.
The remaining part of the paper is organized as follows: Sect. 2 describes related
work, Sect. 3 presents Methodology of proposed protocol, Sect. 4 presents Simulation
Environment and Parameters, Sect. 5 shows the Results and Sect. 6 shows the
conclusion.
2 Related Work
are the Decentralized schemes. The authors have assessed the Key delivery ratio, Delay
and Energy consumption, and Packet loss for the above mentioned protocols with
varying group size.
In the SEGK model [8], the authors developed a mechanism to guarantee forward and backward secrecy in which the shared secret group key is recomputed very often. In this model, tree links and periodic flooding of control messages are the two techniques used to find malicious nodes; the first is used when node mobility is not important and the latter when the topology changes frequently. B. Madhusudhanan et al. [9] developed a method called "Mobility Based Key Management (MBKM)" for multicast communication in MANETs. In this method the authors proposed that the group/cluster head periodically performs the rekeying process, by which the multicast group ensures forward and backward secrecy.
Fig. 1. Flow graph of Multicast Symmetric Secret Key Management System (MC-SSKMS)
4 Simulation Environment
We have done the experiments through NS2 Simulator. NS2 is an open source event
driven network simulator to model and analyze the wired and wireless network traffic.
We have chosen a Linux operating system, i.e. Ubuntu 12.10, as Linux provides numerous text-processing scripts that can be used to analyze packet transmission in NS2. We used Tcl (Tool Command Language) code, which is part of NS2, for implementing our work. We used CBR as the traffic type for packet transmission and a 1000 × 1000 simulation area. The Tcl code generates two files, namely the NAM (Network Animator) file and the trace file, with different parameters as input. NS2 can model different kinds of mobility models such as the Random Waypoint model, the Grid model, etc. The traffic of our proposed protocol is visualized through the NAM file. Awk programming is used to record the data values, taking the trace file (.tr) as input. We have taken these recorded values and generated the graphs for our proposed method.
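The post-processing of the trace file is done with awk in the paper; purely as an illustration of the same step, the sketch below computes the key delivery ratio in Python, assuming the old-style NS2 wireless trace layout in which the event type (s/r) is the first field and an assumed packet tag marks the key packets.

def key_delivery_ratio(trace_path, pkt_tag="key"):
    # The field positions and the "key" tag are assumptions for the sketch,
    # not the authors' exact trace format.
    sent = received = 0
    with open(trace_path) as trace:
        for line in trace:
            fields = line.split()
            if len(fields) < 7 or fields[6] != pkt_tag:
                continue
            if fields[0] == "s":        # key packet sent by the group head
                sent += 1
            elif fields[0] == "r":      # key packet received by a member
                received += 1
    return 100.0 * received / sent if sent else 0.0

if __name__ == "__main__":
    print("KDR = %.2f%%" % key_delivery_ratio("mc_sskms.tr"))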
(2) Delay in key transmission: the average delay of key transmission (D) from the sender to the receivers is the time taken to transmit the group key to all the group members. To guarantee effective synchronization between the encryption and decryption of data in group communication, this delay should be kept low.
(3) Energy consumption: the energy consumption (E) is defined as the number of energy units required for delivering the keys to the group members in multicast communication during the simulation.
5 Results
We have carried out a performance assessment of our proposed protocol against two existing protocols. First, we show the comparison of the key delivery ratio of MC-SSKMS with the "Distributed Multicast Group Security Architecture (DMGSA)" and "Mobility based Key Management (MBKM)" protocols. The values recorded through NS2 simulations are shown in Table 2 and the corresponding graphs are depicted in Fig. 2; they show graphically that our protocol has a better key delivery ratio than the other two protocols. We then compared the delay of our proposed protocol with the DMGSA and MBKM protocols. The values recorded through simulations are shown in Table 3 and the corresponding graphs are depicted in Fig. 3; they show graphically that our protocol has a lower delay than the other two protocols.
Table 2. Results obtained for key delivery ratio with varying number of nodes observed from
MC-SSKMS and other two contemporary methods
QoS Key delivery ratio
Nodes 20 40 60 80 100
DMGSA 80.9 89.87 84.9 83.34 83.9
MBKM 89.78 91.67 94.43 87.98 85.34
MC-SSKMS 92.85 94.61 96.74 92.7 89.91
Fig. 2. Graphical representation of key delivery ratio for DMGSA, MBKM and MC-SSKMS
Table 3. Results obtained for delay with varying number of nodes observed from MC-SSKMS
and other two contemporary methods
QoS Delay
Nodes 20 40 60 80 100
DMGSA 1.76545 3.34976 3.65748 4.56768 4.76896
MBKM 1.656479 3.106972 3.532987 4.324796 4.523796
MC-SSKMS 1.312789 2.245609 2.689076 3.107033 3.888776
Fig. 3. Graphical representation of delay (delay versus number of nodes) for DMGSA, MBKM and MC-SSKMS
Table 4. Results obtained for energy consumption with varying number of nodes observed from
MC-SSKMS and other two contemporary methods
QoS Energy consumption
Nodes 20 40 60 80 100
DMGSA 23.7654 21.3245 21.1233 21.1034 21.3456
MBKM 24.24113 22.43122 22.21732 22.17674 22.56745
MC-SSKMS 24.17114 22.16345 21.94356 22.06782 22.34789
Fig. 4. Graphical representation of energy consumption for DMGSA, MBKM and MC-SSKMS
6 Conclusions
References
1. Zhou L, Haas ZJ (1999) Securing Ad hoc networks. In: IEEE Network
2. Bouassida MS, Chrisment I, Festor O (2008) A group key management in MANETs. In:
Int J Netw Secur 6:67–79
3. Fenner W, Internet group management protocol, Xerox PARC, RFC 2236, Version 2
4. Jiang B, Hu X (2008) A survey of group key management. IEEE Int Conf Comput Sci Softw
Eng 3:994–1002
5. Chang BJ, Kuo SL (2009) Markov chain trust model for trust value analysis and key
management in distributed multicast MANETs. IEEE Trans Veh Technol 58(4):1846–1863
6. Huang D, Medhi D (2008) A secure group key management scheme for hierarchical mobile
Ad hoc networks 6(4):560–577
7. Bouassida MS, Bouali M (2007) On the performance of group key management protocols in
MANETs. In: Proceedings of the joint conference on security in network architectures and
information systems (SAR-SSI’07), pp 275–286. Annecy, France, June
8. Wu B, Wu J, Dong Y (2008) An efficient group key management scheme for mobile ad hoc
network. Int J Netw
9. Madhusudhanan B, Chitra S, Rajan C (2015) Mobility based key management technique for
multicast security in mobile Ad hoc networks. Sci World J, pp 1–11
10. Renuka A, Shet KC (2009) Hierarchical approach for key management in mobile Ad hoc
networks. In: Int J Comput Sci Inf Secur 5, pp 87–95
An Enhanced Virtual Private Network
Authenticated Ad Hoc On-Demand Distance
Vector Routing
Sara Ali
Abstract. One of the most frequently used protocols in Mobile Ad Hoc Networks (MANETs) is AODV, the Ad hoc On-Demand Distance Vector routing protocol. The protocol is open to various security threats. Through this paper we propose a novel Virtual Private Network Authenticated Ad hoc On-Demand
Distance Vector Routing (VPNAODV) protocol which employs techniques like
Virtual Private Network, Observer nodes and Digital signature to defend the
protocol from attacks like flooding, wormhole, black hole, and Sybil attacks.
Our proposed protocol enhances the basic AODV protocol while retaining the
underlying functionality of the algorithm. Network Simulator-2 was used to
simulate our results, we have compared these results of AODV with our pro-
posed algorithm and found our proposed algorithm to be superior.
1 Introduction
The Ad hoc On-Demand Distance Vector (AODV) routing protocol is one of the most widely used routing protocols in MANETs. The protocol is reactive in nature, which means that updates are exchanged between the nodes on demand rather than periodically [1, 2]. The functionality of MANETs allows every node that is part of the network to behave as a specialized router, which can retrieve routes as and when required. The routes provided by the protocol are loop-free. Bandwidth usage is considerably low since, in the case of disconnected nodes in the network, the protocol does not need any additional advertisements. Neighboring nodes have the exclusive ability to detect each other's broadcast messages. The principal objectives of our proposed algorithm are
may result in the route entry table being modified and all the packets getting diverted
through this fallacious node.
3 Literature Review
See Table 1.
4 Proposed Algorithm
See Figure 1.
• The next data transmission uses the initial vector value of the message digest, where the hash value is the result of the hash function 'h' applied on 'x'.
• Whenever a node initiates an RREQ, RREP or RERR, it needs to verify the validity of the message by using the initial vector value to recompute the message digest that was available with the target node initially; the hash value is used to verify that the received value is equal to the Message-Digest field of the received AODV message (a sketch of this verification is given after this list).
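As the excerpt does not fix the hash function or the exact field layout, the following is only a minimal sketch of such digest verification, using SHA-256 and a shared initial vector as assumptions.

import hashlib

def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def make_message(payload: bytes, seed: bytes) -> dict:
    # Sender attaches h(seed || payload) as the Message-Digest field (assumed layout).
    return {"payload": payload, "message_digest": digest(seed + payload)}

def verify_message(msg: dict, seed: bytes) -> bool:
    # Receiver recomputes the digest from the shared initial vector (seed)
    # and compares it with the received Message-Digest field.
    return digest(seed + msg["payload"]) == msg["message_digest"]

seed = b"shared-initial-vector"                        # assumed to be pre-distributed
rreq = make_message(b"RREQ|src=1|dst=7|hops=0", seed)
print(verify_message(rreq, seed))                      # True for an unmodified message
rreq["payload"] = b"RREQ|src=1|dst=7|hops=99"          # tampered by a malicious node
print(verify_message(rreq, seed))                      # False -> message rejected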
5 Simulation Results
The results are simulated in the presence of attacks such as wormhole, flooding, black hole and Sybil attacks. It can be observed that the average throughput, end-to-end delay, energy consumption and packet drop rate are superior in the case of our protocol, VPNAODV, even in the presence of the attacks mentioned above, which is represented by the red line in the plots.
6 Conclusion
The algorithm is the major crux of the research work. The main contribution lies in the two-phase monitoring of the network, which helps in monitoring the messages being passed between the nodes and also in encrypting them. When we compared our results with those of the traditional AODV in NS2, we found our algorithm to perform better on parameters such as average throughput, end-to-end delay, energy consumption and packet drop rate under various attacks such as wormhole, blackhole, flooding and Sybil attacks.
References
1. Perkins CE. Ad-hoc On-demand distance vector routing. Charles E. Perkins Sun Microsys-
tems Laboratories Advanced Development Group Menlo Park, CA 94025
2. Perkins CE. Ad-hoc on-demand distance vector routing. Charles E. Perkins Sun Microsystems
Laboratories Advanced Development Group Menlo Park, CA 94025
3. Panday MM, Shriwastava AK (2013) A review on security issues of AODV routing protocol
for MANETs, IOSR. J. Comput. Eng. (IOSR-JCE) 14(5):127–134, Sep.-Oct. ISSN 2278-
0661
4. Sharma P, Sinha HP, Bindal A (2014) Detection and prevention against wormhole attack in
AODV for mobile ad-hoc networks. Int. J. Comput. Appl. 95(13)
5. Goyal S, Rohil H (2013) Securing MANET against wormhole attack using neighbor node
analysis. Int. J. Comput. Appl. 81(18):44–48
6. Stallings W (2006) Cryptography and network security: principles and practices. Pearson
Education India
Safe Drive – Enabling Smart Do not Disturb
on Mobile and Tracking Driving Behavior
Abstract. One of the major causes of accidents is distraction. The risk of an accident increases when attending to calls, be it using Bluetooth devices or voice-assisted calling. Existing solutions provide several apps offering modes
like driving, home, office etc., where you can configure various do not disturb
settings on the phone. However, these solutions only have option to turn off
calling mode during driving. We present an innovative app and model using
mobile sensors, crowd-sourced data, web services and feed, for smartly handling
the calls. The proposed app will automatically put the phone in Do Not Disturb
or Calling mode by smartly detecting unfavorable/favorable circumstances
respectively. We present variance thresholding based approach on accelerometer
data to sense the driving behavior and classify a situation as safe or unsafe to
make or receive a call. Secondly, we provide a framework to connect to various
services or apps and collect data to track historical data of accidents in the
vicinity. Finally, we provide driver analytics and driving performance scores to
incentivize safe driving practices.
1 Introduction
Research has shown that people who talk on the phone while driving are four times as likely to meet with an accident as those who do not [8]. People who talked on the phone
committed more traffic violations, committed more attention lapses, changed lanes less
frequently but reacted quickly to events occurring directly in the line of sight [9]. It has
become common for everyone to attend calls and browse their phones while driving.
Many secondary activities particularly resulting from the use of handheld electronic
devices are detrimental to driver safety [7]. Talking on phone while driving is con-
sidered as multitasking as a part of brain is used for processing auditory sentences [5].
The total number of road accidents during 2016 was 48 lakhs in India [10]. Modern leading mobile platforms like iOS added a driving mode in version 11. It turns on "do not disturb" mode by automatically detecting that the user may be driving. The feature
can be disabled by the user [11]. While this is helpful, at times, people need to use
phones even while driving. A safer way to use the phones for small duration during
driving will be a welcome feature.
Researchers have worked upon determining safe driving behaviour using various
sensor data available on mobile device. Authors of [1] used Android based smart
phone, Nexus One, which contains Bosch BMA150 3-axis accelerometer. It configures
the vehicle conditions and recognises gear-shifts relying on the accelerometer. It detects
driving patterns of the user by accessing x and y-axis of the accelerometer. It detects
the conditions of the roads travelled by the driver and makes a detailed map informing
where road anomalies are present by using the x and z axes of the accelerometer. Authors of
[4] detect traffic honking, bumps and vehicle braking using Microphone, GPS,
accelerometer. It utilizes GPS for traffic localizations. Authors of [2] determine whether
a driver drives safely or not. It uses accelerometer and digital compass for measuring
acceleration, deceleration, braking distance, and 2D and 3D rotation matrices for device
orientations. S-Road Assist [6], an app available in Google play store, collects data of
the accelerometer, gyrometer and GPS. It detects the orientation of phone in the car and
uses that to detect anomalies in roads. It gives scores to driver based on the trip on
various levels from beginner to expert. Authors of [3] present a survey on mobile
phone sensing. The study includes various sensors on a mobile phone, viz.
Accelerometer, GPS, Gyro meter, Digital Compass, Microphone and Camera.
In this paper we present a smart driving mode on a phone that will adaptively turn on an "interactive do not disturb" mode whenever it senses unsafe driving conditions. It
also limits the talk time to 2 min even during safe driving conditions. The interactive
do not disturb mode responds to the caller by an appropriate message rather than
ignoring the call. While in safe zone, it also reminds the user with voice prompts of any
calls missed while in unsafe driving zones.
We use accelerometer data, as almost all phones have this sensor, and present an
algorithm to sense the driving behavior (twists, turns, braking, speeding, and bumps).
Secondly, we propose building and connecting to services or app-collected data to track historical accident data in the vicinity, based on GPS location, to act as an additional feature for detecting safe or unsafe zones. Finally, we propose a framework to collect, report and reward driving behaviours to promote safe driving styles.
This paper is organized as follows. Section 2 provides an overview of features we
propose in our app. Section 3 describes the methodology of detecting safe or unsafe
driving conditions. Section 4 describes the app design. Section 5 presents our results
and conclusion.
2 App Features
3 Methodology
3.1 Approach
Figure 1 gives an overview of the various sensors present on a mobile device that can help detect driving behavior. However, we only use the accelerometer and GPS sensor, as they are available on all mobile phones; a gyroscope is present mostly on premium phones. We collect 5 accelerometer readings per second and then compute a moving average and variance over the last 5, 10 and 20 s of data. We use the variations along the x, y and z axes to detect variations in speed, bumpiness of the road, and twists and turns respectively. We use experimentally determined thresholds and define conditional rules to determine a safe or unsafe condition. The process is in essence similar to a decision tree classifier.
Here μ is the moving mean, μ_old is the previous mean before adding the new point and x_new is the newly added point; n is the window size (for a 5 s window, n = 25). Based on the mean calculation we derive the moving-window variances θx, θy and θz as follows, where s is the sum of squares of the elements in the window.
θx = (sx − n·μx²) / n  (7)
θy = (sy − n·μy²) / n  (8)
θz = (sz − n·μz²) / n  (9)
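A minimal sketch of this computation and of the rule-based decision is given below; the numeric thresholds are placeholders, since the paper states only that the thresholds were determined experimentally.

from collections import deque

class AxisWindow:
    """Moving window over one accelerometer axis (5 samples/s, so n = 25 for 5 s)."""
    def __init__(self, n=25):
        self.samples = deque(maxlen=n)

    def add(self, value):
        self.samples.append(value)

    def variance(self):
        if not self.samples:
            return 0.0
        n = len(self.samples)
        mu = sum(self.samples) / n                 # moving mean
        s = sum(v * v for v in self.samples)       # sum of squares in the window
        return (s - n * mu * mu) / n               # theta = (s - n*mu^2)/n

def is_safe(theta_x, theta_y, theta_z,
            speed_thr=2.0, bump_thr=3.0, turn_thr=2.5):
    # Per the methodology text: x ~ speed changes, y ~ road bumpiness,
    # z ~ twists and turns. The threshold values are illustrative only.
    return theta_x < speed_thr and theta_y < bump_thr and theta_z < turn_thr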
4 App Design
We designed the app using Android Studio, requiring app development in Java and XML, an SQLite database, the MPAndroidChart library, web services and Google Maps. Figure 3 showcases the real-time use of the app, and Fig. 4 shows the workflow of the app.
As shown in Fig. 4, after logging on to the app, the sensor devices concerned will
be detected. Whenever a call is detected, the app will check whether it is safe to pick up
the call as discussed in Sect. 3. If it is not safe to pick up the call, the app will perform
actions as proposed in Sect. 2, else the app allows the user to pick up the call.
5 Results
Figure 5 shows some results of the five-second moving variance. The top-left plot shows the normal driving scenario on a smooth road without traffic; there is not much variation other than at the initial start and final stopping times. The top-right plot shows a scenario where one is driving slowly in traffic; there is slight variation. The bottom-left plot shows a bumpy road; notice how there is more variation in the z direction. Finally, the bottom-right plot shows acceleration from 0 to a speed of 60 km/h; there is more variation in y initially, which gradually drops as the speed smooths out.
References
1. Fazeen Md, Gozick Md, Dantu R, Bhukhiya M, Marta CG (2012) Safe driving using mobile
phones, 19 March 2012. IEEE Trans. Intell. Transp. Syst. 13(3, Sept.):1462–1468
2. Langle L, Dantu R (2009) Are you a safe driver? 2009 international conference on
computational science and engineering. https://doi.org/10.1109/cse.2009.331
3. Nicholas DL, Emiliano M, Lu H, Daniel P, Tanzeem Ch, Andrew TC (2010) Dartmouth
college, a survey of mobile phone sensing. IEEE Communications Magazine, pp 140–150,
September
4. Mohan P, Padmanabhan VN, Ramjee R (2008) Nericell: rich monitoring of road and traffic
conditions using mobile smartphones. Proceedings of the 6th ACM conference on embedded
network sensor systems – SenSys’08. https://doi.org/10.1145/1460412.1460444
5. Oviedo-Trespalacios O, Haque MM, King M, Washington S (2016) Understanding the
impacts of mobile phone distraction on driving performance: a systematic review. Trans.
Res. Part C: Emerg. Technol. 72:360–380. https://doi.org/10.1016/j.trc.2016.10.006
6. Fazeen M, Gozick B, Dantu R, Bhukhiya M, González MC (2012) Safe driving using mobile
phones. IEEE Trans. Intell. Transp. Syst. 13(3):1462–1468. https://doi.org/10.1109/tits.
2012.2187640
7. Thomas AD, Feng G, Lee S, Jonathan FA, Miguel P, Mindy B, Jonathan H (2016) PNAS
March 8, 2016, 113(10):2636–2641, February 22
8. Redelmeier DA, Tibshirani RJ (1997) Association between cellular telephone calls and
motor vehicle collisions. New Engl. J Med. 236:453–458
9. Saifuzzaman Md, Haque MdM, Zheng Z, Washington S (2015) Impact of mobile phone use
on car-following behaviour of young drivers. Accid. Anal. Prev. 82:10–19
10. Statistics data on road accidents from 2013 to 2016. http://www.data.gov.in/. Accessed 7 Jan
2019
11. iOS 11 driving mode: https://metro.co.uk/2017/09/20/how-to-use-ios-11-driving-mode-
6941730/. Accessed 7 Jan 2019
Viability of an Uncomplicated IoT SaaS
Development for Deployment of DIY
Applications Over HTTP with Zero Investment
1 Introduction
Current Internet of Things solutions are ordinarily provided in single domains [1], for instance establishing connections among components [2], building management, using third-party cloud services [3], etc. In such applications, domain-specific [4] or enterprise-specific requirements drive the design of all system components and determine the technological elements, ranging from sensors and smart devices to middleware components and application logic. The service delivery process is organized by IoT solution providers, who review target application scenarios, analyze application requirements, select hardware devices, integrate subsystems supplied by different vendors, develop applications, provide computing infrastructure and maintain services throughout the lifetime of the system. Despite the fact that this
2 Literature Review
Various studies show unparalleled attributes provisioned by IoT cloud service provi-
ders as seen in Table 1.
Table 1. Investigation into the current IoT software platform landscape; a feature comparison (device management, integration, protocols for data collection, support for visualizations). Source: "Comparing 11 IoT Development Platforms", an article by Miyuru Dayarathna, Feb. 04, 2016, IoT Zone, DZone. https://dzone.com/articles/iot-software-platform-comparison
• Appcelerator: no device management; REST API integration; protocols MQTT, HTTP; visualization support (Titanium UI Dashboard).
• Bosch IoT Suite (MDM IoT platform): device management; REST API integration; protocols MQTT, CoAP, AMQP, STOMP; visualization support (User Interface Integrator).
• Ericsson Device Connection Platform (DCP, MDM IoT platform): device management; REST API integration; protocol CoAP; no visualization support.
• EVRYTHNG (IoT smart products platform): no device management; REST API integration; protocols MQTT, CoAP, WebSockets; visualization support (EVRYTHNG IoT Dashboard).
• IBM IoT Foundation Device Cloud: device management; REST and real-time APIs for integration; protocols MQTT, HTTPS; visualization support (web portal).
• PLAT.ONE (end-to-end IoT and M2M application platform): device management; REST API integration; protocols MQTT, SNMP; visualization support (management console for application enablement, data management and device management).
• ThingWorx (MDM IoT platform): device management; REST API integration; protocols MQTT, AMQP, XMPP, CoAP, DDS, WebSockets; visualization support (ThingWorx SQUEAL).
• Xively (PaaS enterprise IoT platform): no device management; REST API integration; protocols HTTP, HTTPS, Sockets/WebSocket, MQTT; visualization support (management console).
As IoT continues to be adopted further and more organizations weave it into our everyday life through developments [8] like smart cities, the inherent limitations of
To address the limitations outlined above and enable efficient and versatile delivery of IoT customizations, we propose a layered IoT SaaS architecture, outlined in Fig. 1. The IoT foundation comprises networked tags, sensors, actuators, smart devices, etc. A wide variety of protocols are used for IoT, but the existing freely available cloud PaaS providers only work with HTTP to facilitate joining the IoT framework with DIY applications.
(Fig. 1, excerpt: Layer 1: DIY IoT end-device hardware with customized code to suit end-users' needs. Layer 2: survey of cost-effective cloud provisions for IoT services over HTTP. Layer 3: customize the IoT SaaS or build it from scratch over PaaS, based on what is provisioned. Layer 7: administration of the SaaS, monitoring utilization by IoT end devices and users.)
On the IoT SaaS, two sorts of data-related services are provided, to handle real-time events and persisted data respectively. Event handling processes and analyzes the real-time events produced by sensory devices. In the IoT SaaS model represented in Fig. 2, the resources include not just cloud resources, for example virtual machines and software instances in conventional cloud offerings, but also IoT resources and custom coding of services per individual. The service that provides control applications is to be itemized prior to customization.
The metered data of both IoT and cloud resources is collected to give a comprehensive view for administering utilization. In the long run, billing components with different business models at runtime can analyze the metered data according to the charging plans of various commercial IoT cloud providers, and lead to the use of freely available cloud services that can easily be custom coded to meet the given needs, with a little investment of time and effort in gaining programming knowledge while developing DIY IoT applications.
4 System Implementation
Each virtual vertical arrangement can opt to use the applications, which can be prototyped using Cisco Packet Tracer, reducing the cost of building by identifying the right equipment through simulation. The minimal pseudo codes listed below support the IoT assets and their management after they are created. The development can be done either by third-party developers or by PaaS providers.
Pseudo code for the IoT end-device sensor:

A: READ SENSOR VALUE
   IF SENSOR VALUE CHANGES
B:     CREATE HTTP REQUEST OBJECT WITH SERVER URL
       ESTABLISH CONNECTION OVER HTTP
       IF HTTP SESSION SUCCESS
           SEND SENSOR DATA OVER HTTP
       ELSE GOTO B
   ELSE GOTO A
Pseudo code for the server-side sensor handler over the cloud:

A: LISTEN TO HTTP REQUEST
   IF HTTP REQUEST DATA NOT NULL
       ESTABLISH DATABASE CONNECTIVITY
       RECORD DATA ON DATABASE
       SEND HTTP ACKNOWLEDGEMENT
   ELSE GOTO A
Pseudo code for the server-side actuator handler over the cloud:

A: LISTEN TO HTTP REQUEST
   IF HTTP REQUEST DATA NOT NULL
       ESTABLISH DATABASE CONNECTIVITY
       RETRIEVE STATUS DATA FROM DATABASE
       SEND HTTP REPLY BY BINDING STATUS DATA
   ELSE GOTO A
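As an illustration, the end-device pseudo code above can be rendered in Python roughly as follows; the server URL, the JSON field names and the read_sensor() stub are hypothetical placeholders.

import time
import requests

SERVER_URL = "https://example-free-paas.example.com/sensor"   # placeholder endpoint

def read_sensor():
    # Stub standing in for real hardware access on the DIY device.
    return 27.5

def run():
    last_value = None
    while True:
        value = read_sensor()
        if value != last_value:                   # send only when the value changes
            try:
                reply = requests.post(SERVER_URL,
                                      json={"device": "diy-node-1", "value": value},
                                      timeout=5)
                if reply.ok:                      # HTTP session success
                    last_value = value
            except requests.RequestException:
                pass                              # retry on the next loop pass
        time.sleep(1)

if __name__ == "__main__":
    run()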
The experimental workbench consisted of CISCO packet tracer for simulation along
with Apache JMeter (configured to match all the providers over a common scale) for
testing the cloud providers.
Table 2. Implementation findings of major cloud providers for IoT deployments over
HTTP. Highlighted rows specify zero investment PaaS providers suggested for DIY implemen-
tations.
• Google Cloud, IoT Core: free trial, pay per use on expiry; None; High; Low
• Microsoft Azure, IoT Core: free trial, pay per use on expiry; None; High; Low
• MediaTek Cloud Sandbox, IoT SaaS: free usage; Yes; Very Low; Low
• Hostinger, Web Hosting (PaaS): free usage / plan based; Yes; High; High
• AwardSpace, Web Hosting (PaaS): free usage / plan based; Yes; High; High
In the comparison (Table 3) between the freely available services and the commercial services, the latter do provide a clear improvement in performance. But when cost is weighed against performance in the development and deployment of DIY applications, further customization of the freely available cloud services can be used. Zero investment here involves more effort and time, as there are at present no out-of-the-box services that completely fulfil the users' requirements.
This paper proposed IoT SaaS, a novel cloud approach that underpins effective and adaptable customization of IoT services using freely available offerings. On the cloud, IoT solution providers can efficiently deliver new customizations by utilizing resources and services such as domain mediation, application context management and metering on the cloud. The domain mediator acts as an extensible framework for the IoT SaaS to connect with different domain-specific information models and to provide control applications that depend on physical devices; building management and two control applications are customizable to exhibit the required DIY components.
The proposed architecture and reference implementation are also being further developed into an industrial-grade IoT cloud over HTTP. In the meantime, future research on the IoT SaaS will be conducted in two directions. The first is to assess and model the resource utilization of IoT applications so as to effectively allocate computing resources on the multi-tenant IoT service platform; the application-oriented resource model will consider device behaviour, the physical context of applications, data-handling requirements and usage patterns. The second is to explore an individual cloud space for high availability and performance of IoT devices and cloud environments.
References
1. Polyviou A, Pouloudi N, Rizou S (2014) Which factors affect software-as-a-service selection
the most? a study from the customer’s and the vendor’s perspective. 2014 47th Hawaii
international conference on system sciences. Waikoloa, HI, pp 5059–5068
2. Haag S, Eckhardt A, Krönung J (2014) From the ground to the cloud – a structured literature
analysis of the cloud service landscape around the public and private sector. 2014 47th
Hawaii international conference on system sciences. Waikoloa, HI, pp 2127–2136
1 Introduction
Falls are a hazardous situation and quite common, especially among elderly people, and they lead to additional injuries, fractures and other health issues. The frequency of falls, as measured by the World Health Organization (WHO), is approximately 28–35% of people aged 65 each year, increasing to 32–42% for those over 70 years old [1]. Without timely rescue, falls may even endanger lives. It is therefore very important to notify someone immediately after a fall occurs so that proper care can be taken before the condition worsens. With the advancement of technology, an automatic system that can detect the fall of a body will be very helpful in preventing severe injuries.
One of the major concerns in designing an automated fall detection system is cost efficiency, so that it can be used not only in industrial deployments but also in single residential settings. In this paper, we present a new algorithm for fall detection that is computationally light and, as a result, economical to deploy.
2 Related Works
Recently, much research has been done on human fall detection. The most popular techniques can be divided into two categories: wearable
sensors and vision-based technology.
Wearable sensors are the most common types that are used in different hospitals.
Bagalà et al. presented “Evaluation of Accelerometer-Based Fall Detection Algorithms
on Real-World Falls”, where they introduced 13 algorithms for fall detection associated
with wearable sensors (accelerometer and gyroscope) [2]. They got an average of 83%
successful detections. Other significant research on human fall detection based on wearable sensors is reported in [3–9], which is not described here for lack of space. One
of the major disadvantages of wearable sensors is that the number of false positives is
quite high. Also, it requires a portable battery, and wearing it all the time may be
uncomfortable for the user.
A vision-based detection system requires one or more cameras for monitoring a person's activity, and it detects the fall from the frames of the video. Generally, the camera output either contains two-dimensional information or 3D information, where the output is depth in
addition to the 2D image. 3D cameras can be quite expensive depending on the
specifications. Kepski et al. researched fall detection using data from a ceiling-mounted 3D depth camera [10]. Rougier et al. also used a 3D camera for head tracking to
detect falls [11]. Miaou et al. designed a fall detection system using omni-camera
images and personal data of the user [12]. Without personal information, the accuracy
was about 70%, and with personal information, it was about 81% [12]. Recent years,
machine learning techniques are very popular for detecting falls from videos. Liu et al.
used k-nearest neighbor classifier [13] from 2D video information. Alhimale et al. used
neural network for fall detection [14]. Other significant works for detecting human fall
are [15–18]. Using artificial intelligence increases the successful detection rate for fall
of elderly people. However, one of the main disadvantages is that, they require heavy
computational power. Pedro et al. showed that the computation of nearest neighbor
search can be significantly improved by parallel calculation on the GPU [19]. However,
the additional GPU or computer with heavy computational power is not financially
feasible for massive deployment for residential and industrial sectors, because they are
quite expensive. Additionally, it is also worth researching how much video instances
they can work simultaneously.
In our research, we provide a very cost-efficient fall detection technique based on computer vision that requires very low computational power and is a financially feasible solution for massive residential and industrial deployment. We aimed to build a system that is lightweight and able to detect falls in real time. The method is very modest in its computational requirements, so it can easily be implemented on small devices such as the Raspberry Pi.
3 Approach
3.2 Description
To separate the foreground from the background, we first compare the first frame captured when the video starts with the current frame. The first frame may be empty or may contain objects. The absolute difference between the current frame and the first frame finds the objects or pixels that have changed relative to the first frame. In this way, we extract the moving body from the background. This approach has several constraints; for example, the clothes and the background should not have the same pixel intensity, otherwise the background may not be subtracted clearly.
For background subtraction, the MOG2 algorithm can also be used. However, it produced noise issues on the dataset we tested with. To obtain a finer foreground, we use the silhouette of the body and apply thresholding, dilation, and erosion, tweaking the parameters to get the desired output. In this way, we remove unnecessary noise and distortion around the moving object.
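As a rough illustration of this step (not the exact implementation used in this work), an OpenCV sketch in Python is shown below; the blur size, threshold value, kernel size and iteration counts are assumed values that would need tuning for a particular dataset.

import cv2

# Illustrative sketch of the background-subtraction step (assumed parameters).
cap = cv2.VideoCapture("fall_video.avi")              # hypothetical input video
_, first = cap.read()
first_gray = cv2.GaussianBlur(cv2.cvtColor(first, cv2.COLOR_BGR2GRAY), (21, 21), 0)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
    delta = cv2.absdiff(first_gray, gray)              # difference w.r.t. the first frame
    _, silhouette = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)
    silhouette = cv2.dilate(silhouette, kernel, iterations=2)   # fill small holes
    silhouette = cv2.erode(silhouette, kernel, iterations=1)    # trim residual noise
    # 'silhouette' is the cleaned foreground mask of the moving body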
A contour is an outline of a shape. With the contour-finding algorithm for binary images implemented in OpenCV, several contours appear in the picture [22]. To get rid of extra contours, we set a minimum contour area and extract only those contours whose area is larger than this threshold. As the method is applied to the background-subtracted moving body, the contour will likely be around the moving human, or around a large object if it is moving. After obtaining the contours, or Region of Interest, we bound the body with an ellipse, as an ellipse is much more useful for determining angles than a rectangle.
For example, a rectangle always returns an angle of either 0 or π/2, whereas an ellipse returns an angle that varies between 0 and π. The angle of the body with both the horizontal and the vertical should be considered to determine whether it is a fall or not. With some additional logic and parameters, a rectangular bounding box is also usable in this scenario; however, we found an ellipse to be more convenient.
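Continuing the sketch above, the contour-and-ellipse step could look as follows; the minimum contour area is an assumed threshold, and cv2.fitEllipse is used here simply because the paper relies on OpenCV contours.

import cv2

MIN_AREA = 1500        # assumed minimum contour area, in pixels

def fit_body_ellipse(silhouette):
    # Return the bounding ellipse (center, axes, angle) of the largest
    # sufficiently large contour, or None if nothing qualifies.
    contours, _ = cv2.findContours(silhouette, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contours = [c for c in contours if cv2.contourArea(c) > MIN_AREA]
    if not contours:
        return None
    body = max(contours, key=cv2.contourArea)
    if len(body) < 5:              # fitEllipse needs at least five points
        return None
    return cv2.fitEllipse(body)    # the angle varies continuously, unlike an upright box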
Let a and b be the semi-minor and semi-major axes of the ellipse, respectively. For an upright body, the major axis corresponds to the height of the ellipse and the minor axis to its width; so, approximately, the height of the ellipse is 2*b and the width is 2*a. However, when a person falls, the orientation changes, so the roles of the minor and major axes are not the same throughout the video. Table 1 summarizes the symbols used in the mathematical expressions of our research, and Fig. 1 depicts the angles that are used to estimate a human fall.
We observed that if the body leans beyond an angle of 2π/5 from the vertical (equivalently, to within π/10 of the horizontal) and satisfies the following condition, then it can be considered a fall.
When the values of a and b become close while the orientation of the ellipse changes, their values need to be exchanged for finer tuning. The details are provided in the pseudocode in Sect. 3.3 (Figs. 2 and 3).
start
    // Capture frame 1 and save that frame
    firstFrame = CaptureFirstFrame()
    while (true):
        // Calculate the frame difference between the two frames
        frameDelta = absdiff(firstFrame, current_frame)
        // Select the region of interest by contouring
        contourArea = ContourAreaOfCurrentFrame()
        foreach (number of pixels changed in contourArea):
            update αH, βV
            if (Abs(βH + π/2 + αV + 3π/2) = π):
                // Exchange the minor and major axis values if
                // they change while the body is falling
                Gamma = Beta
                Beta = Alpha
                Alpha = Gamma
                // Delay for a certain time to confirm the fall
                Delay(2s)
                Print("Fall Warning")
End
Fig. 2. First frame, which will be compared with all other frames; the areas that change will be represented as contours.
Fig. 3. Blurred frame to remove noise while retaining as much information as possible.
Fig. 4. Silhouette of the moving human body, obtained by thresholding. For finer tuning, dilation was added as an image processing step.
Fig. 5. Marking the area of interest with a bounding ellipse and determining the angle and size of the ellipse. Eq. (1) is then used to determine whether it is a fall or not.
3.5 Limitations
One way to address the limitations of the algorithm would be to compare frames not with reference to the first frame but relative to previous frames. However, there is a lot of noise when comparing with relative frames; if that noise could be reduced with additional techniques, the results would be much better. The algorithm does not work under occlusion, as it requires full sight of the human body, so an additional camera might need to be installed to cover occluded areas.
Another limitation is that every frame is compared with the first frame. Because of this, when the first frame is not empty, not all contours may be detected. The algorithm was not tested on scenes with multiple persons; our research assumes no warning is required in such a scenario.
4 Conclusion
References
1. World Health Organization (2007) WHO global report on falls prevention in older age.
Geneva, Switzerland
2. Bagalà F, Becker C, Cappello A, Chiari L, Aminian K, Hausdorff JM, Zijlstra W, Klenk J
(2012) Evaluation of accelerometer-based fall detection algorithms on real-world falls. PLoS ONE
3. Nari MI, Suprapto SS, Kusumah IH, Adiprawita W (2016) A simple design of wearable
device for fall detection with accelerometer and gyroscope. Int Symp Electron Smart
Devices (ISESD) 2016:88–91
4. Khojasteh SB, Villar JR, Chira C, González Suárez VM, de la Cal EA (2018) Improving fall
detection using an on-wrist wearable accelerometer. Sensors
5. Wang CC, Chiang CY, Lin PY, Chou YC, Kuo IT, Huang CN, Chan CT (2008)
Development of a fall detecting system for the elderly residents. 2008 2nd international
conference on bioinformatics and biomedical engineering: 1359–1362
6. Lindemann U, Hock A, Stuber M, Keck W, Becker C (2005) Evaluation of a fall detector
based on accelerometers: a pilot study. Med Biol Eng Comput 43:548–551
7. Bianchi F, Redmond SJ, Narayanan MKR, Cerutti S, Lovell NH (2010) Barometric pressure
and triaxial accelerometry-based falls event detection. IEEE Trans Neural Syst Rehabil Eng
18:619–627
8. Abbate S, Avvenuti M, Bonatesta F, Cola G, Corsini P, Vecchio A (2012) A smartphone-
based fall detection system. Pervasive and Mob Comput 8:883–899
9. Mao A, Ma X, He Y, Luo J (2017) Highly portable, sensor-based system for human fall
monitoring. Sensors
10. Kepski M, Kwolek B (2014) Fall detection using ceiling-mounted 3D depth camera.
2014 international conference on computer vision theory and applications (VISAPP) 2:640–
647
11. Rougier C, Meunier JF, St-Arnaud A, Rousseau J (2006) Monocular 3D head tracking to
detect falls of elderly people. 2006 international conference of the IEEE engineering in
medicine and biology society: 6384–6387
12. Miaou SG, Sung PH, Huang CY (2006) A customized human fall detection system using
omni-camera images and personal information. 1st transdisciplinary conference on
distributed diagnosis and home healthcare, 2006. D2H2:39–42
13. Liu CL, Lee CH, Lin PM (2010) A fall detection system using k-nearest neighbor classifier.
Expert Syst Appl 37:7174–7181. https://doi.org/10.1016/j.eswa.2010.04.014
14. Alhimale L, Zedan H, Al-Bayatti AH (2014) The implementation of an intelligent and video-
based fall detection system using a neural network. Appl Soft Comput 18:59–69
15. Qian H, Mao Y, Xiang W, Wang Z (2010) Recognition of human activities using SVM
multi-class classifier. Pattern Recogn Lett 31:100–111
16. Londei ST, Rousseau J, Ducharme FC, St-Arnaud A, Meunier J, Saint-Arnaud J, Giroux F
(2009) An intelligent videomonitoring system for fall detection at home: perceptions of
elderly people. J Telemedicine and Telecare 15(8):383–90
17. Miguel K de, Brunete A, Hernando M, Gambao E (2017) Home camera-based fall detection
system for the elderly. Sensors
18. Yoo SG, Oh D (2018) An artificial neural network–based fall detection. Int J Eng Bus
Manage 10:184797901878790. https://doi.org/10.1177/1847979018787905
19. Leite PJS, Teixeira JMXN, de Farias TSMC, Reis B, Teichrieb V, Kelner J (2011) Nearest
neighbor searches on the GPU. Int J Parallel Prog 40:313–330
20. Public dataset “Le2i” for fall detection, link. http://le2i.cnrs.fr/Fall-detection-Dataset?lang=fr
21. Video footage of testing result of our novel algorithm on the public dataset “Le2i” for
fall detection, available in https://www.youtube.com/watch?v=yAQU0QzgkVA&feature=
youtu.be
22. Suzuki S, Abe K (1985) Topological structural analysis of digitized binary images by border
following. Comput Vision, Graphics, and Image Process 30:32–46
Sensitive Information Security in Network
as a Service Model in Cloud-IPSec
Abstract. The cloud integrates resources delivered over the Internet using IT technology. It is one of the leading modern technologies for delivering services whose capacity changes according to demand. In day-to-day business infrastructures, a large amount of data must be transferred safely through the Internet. This may include a company’s confidential information about product designs, product expiration dates, patent owner information, human resources, job evaluations, etc. Currently, most organizations operate on the web, and this information must be protected even though attackers may try to capture it; consequently, customers increasingly use the cloud. From a hacker’s point of view, the sensitive information held within the virtual private cloud is an immediate target. Global security reports for 2018 put data mobility in the cloud at about 86.67%, and global analysts estimate that this figure can reach 100%. Therefore, this paper focuses on IP Security (IPSec), a standard set of rules for securing Internet Protocol (IP) communication by authenticating and encrypting the transferred information stream. The information is routed with the OSPF and EIGRP (Enhanced Interior Gateway Routing Protocol) protocols, and the effects of using the IPSec tunnel on Network as a Service are studied at the edge router. The AES encryption algorithm, the SHA1 hash algorithm, and pre-shared keys are used in the proposed structure. Analysis of the results also shows that the ESP protocol is slightly less efficient than the Authentication Header protocol, which is expected because ESP additionally supports data encryption. The cloud is implemented with GNS3 and tested in Wireshark to verify protection against attacks.
1 Introduction
NIST defines three service models: PaaS, SaaS, and IaaS. SaaS offers commercial software via the online channel. PaaS provides all the resources necessary to deliver applications and services entirely over the Internet without the need to download or install software; it supports designing, developing, testing, and distributing applications [1, 3]. While SaaS and PaaS provide applications to customers, IaaS does not: it simply offers hardware, so a business can build whatever it wants on it. Instead of purchasing servers, software, network trunks, and data center space, these resources are rented from service providers [5].
In 2012, Wolf suggested that one of the latest cloud services is NaaS, a new cloud computing service model that gives access to additional network resources in combination with virtual PCs, firewalls, routers, switches, and ASA devices [1, 5]. Tenants can use NaaS to make their own forwarding decisions based on application needs, such as load balancing, protecting sensitive data or packets, and custom multicast services. The main idea of NaaS is to reduce the cost of exchanging data and to improve network flexibility for cloud consumers. It includes bandwidth on demand and flexible, extensible VPNs on demand [2] (Fig. 1).
NaaS offers a genuine network to users. A user can have as many networks as necessary, sharing and implementing the required policies. With NaaS, a user can also run networks such as IPv4 and IPv6 segments side by side or separately.
2 Related Work
Cloud encryption uses algorithms to shield personal information. Encryption keeps data secret, though it adds overhead to measurement and implementation at deployment; on the other hand, integrity is controlled through the use of algorithms [13]. John et al. have examined the powerful generation of IP-VPN cloud computing, in which the encrypted text can be decrypted only through the right use of “blinding” [11]. In all organizations, Network as a Service security problems are very prominent in the market. The protection of a network largely involves the use of programs and procedures for protecting the various network devices from unauthorized access [12]. Secure tunnels assure the integrity of the transmitted data and the legitimacy of the communications [7]. IPSec is set up with the ISAKMP protocol at the top level, with the SKEME protocol and a subset of the Oakley key exchange at the lower level [6]. Many institutions protect their systems using the corresponding algorithms; on the other hand, algorithms are also used to create security systems. Reviews from 2013 on the protection of cloud cryptography list confidentiality, non-repudiation, and integrity as requirements for information recorded on an Internet server [15].
2. Set REO = Routing × O where, Iterative {for all d ∈ REO}: Iterative {rez ∈ RE, where z ∈ {1, 2, 3, …, z}}, set up and configure ro on roz, Iterative {Inter-configuration REO to S}
3. Start server on IPx port, Loop {for all VPCSj = VAP, where j = 1, 2, …, n}. Establish connection to Cloud at IPx port. Set up all RE × Routing {re1, re2, …, rn with OSPF and EIGRP}
4. Set up REOS with Router. Iterative {rez ∈ RE, where z ∈ {1, 2, 3, …, z} and inner dz ∈ REOS is {OSPF & EIGRP × O × S}}. Iteratively do simulation.
5. Stop
Proposed Technique Implementation in Network as a Service Model in Cloud
1. Open all PuTTY terminals
2. Enable the terminals
a. Configure the terminal
b. Set up the connections for router | cloud | switch | Ethernet | serial line
i. Assign a public or private IP address to each router or cloud
ii. Configure the route along which the sensitive data travels, using OSPF | EIGRP
iii. Configure the loopback and assign an address (private or public). no shutdown
iv. Create or set crypto isakmp key keystring address peer-address
v. Create or set crypto isakmp key keystring hostname h_name
vi. Create or set crypto keyring k_name
vii. Define the preshared-key address address key key
viii. Define the preshared-key hostname h_name key key. no shutdown
3. Context ctname
a. Crypto map m_name ipsec-isakmp
b. Assign or set the individual peer address (global | private)
c. Set or create the isakmp preshared-key isakmp_key
d. Set or create mode {aggressive | main}
e. Set pfs {group1 | group2 | group5}
f. Set or apply the IPsec transform-set transform_name
g. Match address a_name [preference]
h. Match crypto-group g_name {primary | secondary}. End / End / End
4. Algorithm for creating the IKE policy
a. Open all terminals using PuTTY
b. Enable all router | switch | VCPN | cloud
c. configure terminal
d. Create the crypto isakmp (Internet Security Association and Key Management Protocol) policy priority
e. Apply the encryption technique {des | 3des | aes | aes 192 | aes 256}
f. Apply the hash {sha | sha256 | sha384 | md5}
g. Apply the authentication {rsa-sig | rsa-encr | pre-share}
Fig. 5. Crypto session and verify
Fig. 6. Crypto IPsec security association
4 Analytical Analysis
4.1 Space Complexity
In IPsec, the Authentication Header has 12 bytes and ESP has 10 bytes of constant header fields, plus authentication data of variable size. This field contains the output of the authentication method. For authentication, IPsec uses keyed Hashing for Message Authentication (HMAC), which is now analyzed in combination with the hash algorithm.
4.2.2 Authentication
SHA1 pads the message with 1 to 512 padding bits. The SHA1 algorithm uses five intermediate 4-byte registers instead of four, so the final message digest is 160 bits. Each 64-byte block of the message is processed in four rounds of 20 steps each, every step combining the chaining registers, the current message word, and a round constant. Examining SHA1, it can be found that 10 to 13 operations per step are required, so the total number of operations per block is estimated as 1110 (= 900 + 210), with

n = N/512    (4)

where

N = Input + Pad + Size    (5)

Input is the input text, Pad is the padding field, Size is the size field, and N is the total message size. The HMAC-SHA1 algorithm is formulated as

SHA1(M0, SHA1(Mi, Tpt))    (6)

where

M0 = Key XOR opad    (7)
Mi = Key XOR ipad    (8)

M0 and Mi are two padded forms of the input Key, generated by exclusive-ORing the key with the inner pad ipad (512 bits) and the outer pad opad (512 bits). Key is an arbitrary-length secret key shared by sender and receiver. Tpt is the given input message subject to authentication.

nk = (N + K)/512    (9)
nk = 1 + N/512    (10)

where K is the size of the extra appended inner form of the key (512 bits). The total number of operations (T) needed for HMAC-SHA1 is of O(nk), where

T(nk) = 32 + (2 + nk) + 1110    (11)
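As a worked illustration of Eqs. (4)–(11), the small sketch below evaluates the operation-count estimate for one message size. It simply restates the formulas as given above; the 1000-bit example message and its 472 padding bits are assumed values chosen so that the padded message fills three 512-bit blocks.

# Worked evaluation of the HMAC-SHA1 operation-count estimate (Eqs. 4-11).
OPS_PER_BLOCK = 1110                      # per-block SHA1 operations (= 900 + 210)

def sha1_blocks(input_bits, pad_bits, size_bits=64):
    # n = N/512 with N = Input + Pad + Size  (Eqs. 4 and 5)
    N = input_bits + pad_bits + size_bits
    return N / 512

def hmac_sha1_ops(n_k):
    # T(nk) = 32 + (2 + nk) + 1110 as stated in Eq. (11)
    return 32 + (2 + n_k) + OPS_PER_BLOCK

n = sha1_blocks(input_bits=1000, pad_bits=472)   # 1000 + 472 + 64 = 1536 bits = 3 blocks
n_k = 1 + n                                      # Eq. (10): one extra block for the key
print(n, n_k, hmac_sha1_ops(n_k))                # -> 3.0 4.0 1148.0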
Delay: the total time taken for traffic to travel from R1 to R3 through the Cloud is called the delay. Based on Fig. 8, the delay and rate are calculated (Table 2; Fig. 9).
The reports in the above table are generated when random traffic is sent from R1 to the Cloud and from R3 to the Cloud, respectively.
Jitter: the variation in the delay of the received packets is called jitter.
From Fig. 10, the jitter from R1 to the Cloud and from R3 to the Cloud in Network as a Service is calculated using Wireshark with GNS3 (Table 3).
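As a small illustration of how delay and jitter can be derived from captured per-packet delays, a sketch is given below; it follows the usual definition of jitter as the average variation between consecutive packet delays, and the sample values are hypothetical rather than taken from the Wireshark captures reported here.

# Illustrative computation of average delay and jitter from per-packet delays.
def delay_and_jitter(delays_ms):
    # delays_ms: one-way delays (ms) of successive packets, e.g. R1 -> Cloud
    avg_delay = sum(delays_ms) / len(delays_ms)
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return avg_delay, jitter

print(delay_and_jitter([12.1, 12.9, 11.8, 13.4, 12.2]))   # hypothetical sample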
5 Conclusion
Security is the foremost aspect of every modern technology such as the cloud. New attacks appear in this field every minute, so an authoritative security mechanism is required to handle all classes of attacks. This paper therefore focuses on a strong security mechanism, IPsec, which provides a good way to secure sensitive data transferred through the Network as a Service of the Cloud by building a protective channel between sender and receiver. IPsec supplies confidentiality, integrity, authentication, and anti-replay protection to secure traffic over the Network as a Service in the Cloud. The information is routed using the OSPF and EIGRP protocols in the test model of the network in the cloud; whichever path is chosen in the model, the packets are fully protected, encrypted, and decrypted. Comparing the time delay and jitter from routers R1 and R3 to the Cloud using Wireshark analysis shows that the performance of sensitive-data transfer is not lost. Sensitive data is encrypted before sending and decrypted upon receiving, and a fully secured communication channel is used for sending and receiving the data. Observation in Wireshark shows that the Encapsulating Security Payload (ESP) performs authentication, confidentiality, and integrity for sensitive data transmitted over the protected communication medium. The time delay and jitter of the ESP packets transmitted from source to destination and from destination to source in the NaaS were calculated, and the Wireshark observation shows that packets were not dropped, or only very few were. Altogether, security with authentication, confidentiality, integrity, and anti-replay has been provided to Network as a Service, and the GNS3 architecture has been adopted, ensuring that network security, routing, encapsulation, and encryption are performed using the IPSec tunnel and transport modes.
References
1. Harikrishna B, Kiran S, Deep KM (2018) Network as a service model in cloud
authentication by HMAC algorithm. Int J Adv Netw Appl 9(6):3623–3631
2. Online Source. https://docs.gns3.com/
3. Harikrishna B, Kiran S, Murali G, Pradeep kumar Reddy R (2016) Security issues in service
model of cloud computing environment. Procedia Comput Sci 87:246–251
4. Free CCNA Tutorials (2017) Study CCNA for free! Study-ccna.com. N.p., 2017. Web.
21 March 2017
5. Harikrishna B, Kiran S, Pradeep Kumar Reddy R, Protection on sensitive information in
cloud — cryptography algorithms. IEEE digital library. https://doi.org/10.1109/CESYS.
2016.7889894
6. Neumann JC (2015) The book of GNS3 device nodes, live switches, and the internet
7. Check point FireWall-1 (1997) Version 3.0 White paper. June 1997
8. Wallace K (2015) CCNP routing and switching ROUTE 300-101 official cert guide, 1st edn.
Pearson Education, Indianapolis, IN
9. Internet Key Exchange Security Protocol Commands (2002) Cisco Systems, Inc. 66973.
http://www.cisco.com/en/US/products/sw/iosswrel/ps1828/products_command_summary_
chapter09186a00800eeaf5.html
10. Bellovin S (1996) Problem areas for the IP security protocols. In: Proceedings of the sixth
usenix unix security symposium, p 116. San Jose, CA, July 1996
11. Kent S, Atkinson R (1998) IP authentication header. RFC 2402, November 1998
12. Maughan D, Schertler M, Schneider M, Turner J (1998) Internet security association and key
management protocol (ISAKMP). RFC 2408, November 1998
13. Thayer R, Doraswamy N, Glenn R (1998) IP security document roadmap. RFC 2411,
November 1998
14. Madson C, Glenn R (1998) The use of HMAC-SHA-1- 96 within ESP and AH. RFC 2404,
November 1998
15. Mairs J (2002) VPNs: a beginner’s guide. McGraw-Hill/Osborne, p 209
Exploratory Data Analysis to Build
Applications for Android Developer
Abstract. In this paper, the authors use Exploratory Data Analysis (EDA) to uncover different patterns and find useful insights from Google play store application (app) data. The underlying objective is to analyze the features of the dataset in order to help developers understand the trends within the market and end-user needs towards applications, as well as the mechanism of App Store Optimization (ASO) that enhances the popularity of a developer’s app.
1 Introduction
There are over 7.7 billion [1] people in the world, of which 2.1 billion have Android devices [2]. Getting a new app chosen over the 2.5 million apps already available in the play store is a tough task for a developer. The purpose of this study is to assist developers in making their apps more successful in the Google play store. Android is Google’s mobile operating system and is freely licensed software. The simplicity of Android makes it possible to deliver a vast range of applications. By the end of 2018, it occupied 85% of the global market [3].
In John Tukey’s words, exploratory data analysis can never be the whole story, but nothing else can serve as the foundation stone. We applied EDA as part of machine learning to answer questions such as: What are the currently trending apps? What are the most used apps in the play store? How do ratings relate to reviews, and what is the category-wise usage? In this paper we also give the developer a sense of App Store Optimization (ASO) in order to make an app more successful.
The authors applied machine learning analysis on the data to help the developer get a better idea of the market. This work is intended to find the quintessential aspects of apps so that the developer grasps the utility of an app based on people’s needs. For a developer, probing what users want is a time-consuming process. Our results can help developers build features into apps that can improve their number of downloads.
2 Analysis
From a study of existing work, the two crucial things for a developer are App Store Research and App Store Optimization (ASO). First, the purpose of app store research is to measure and find the trends and vital factors of the present Android market. A developer may have a great product or service idea that has to be turned into a concrete and functional mobile app. Developing a great app is a good start, but launching a new app among the nearly 2.5 million [1] already in the Google play store makes it difficult to stand out and be discovered among the other apps.
Second, app store optimization increases the visibility and discoverability of an app in the play store by using ranked keywords. After analysis, it is understood that there is a significant relation between app store optimization and app store research, which helps increase the chance of an application becoming successful for a start-up developer. Data is collected from Kaggle, and exploratory data analysis is applied to unbox useful insights about the apps in the play store. EDA is one of the important aspects of machine learning: it is an approach to summarizing the statistics of a dataset to learn more about the data, often with visual methods. A statistical model is useful for making assumptions, but primarily EDA is for seeing what the data can tell us, yielding useful insights.
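A minimal sketch of the kind of EDA applied here is shown below, assuming the Kaggle Google play store dataset with columns such as Category, Rating, Reviews and Type; the exact file and column names depend on the dataset version and are only illustrative.

# Illustrative EDA over the play store dataset (assumed file and column names).
import pandas as pd

apps = pd.read_csv("googleplaystore.csv")
apps = apps.dropna(subset=["Rating"])                         # drop rows without a rating
apps["Reviews"] = pd.to_numeric(apps["Reviews"], errors="coerce")

print(apps["Category"].value_counts().head(10))               # category-wise usage
print(apps[["Rating", "Reviews"]].corr())                     # ratings vs reviews
print(apps["Type"].value_counts(normalize=True) * 100)        # free vs paid share (cf. Fig. 5)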
4 Evaluation
In this study, the authors attempt to answer two things: first, how to help the Android developer gain better knowledge about the Android market, and second, how to increase the developer’s app search ranking.
To understand the Android market, we used the EDA process to learn more about the data, often by using graphs. In this paper, we evaluate the following things, as shown in Fig. 3.
Fig. 5. Free apps are about 95.1% and paid apps are about 4.9%
5 Conclusion
Every day, many apps are deployed in the play store. This paper helps users improve their level of understanding of Android applications. For anyone who wants to develop an Android application, EDA helps to analyze the Android market and ASO helps to optimize the app’s visibility in the Google play store.
References
1. Android and Google Play Statistics (2019) Development resources and intelligence |
AppBrain. Appbrain.Com. https://www.appbrain.com/stats. Accessed 6 Jan 2019
2. World Population Clock: 7.7 Billion People (2019) - Worldometers (2019) Worldometers.
Info. http://www.worldometers.info/world-population/. Accessed 6 Jan 2019
3. 2018, Market and Statistics - Elearning Learning (2019). Elearninglearning.Com. https://
www.elearninglearning.com/2018/market/statistics/. Accessed 6 Jan 2019
4. 2019. https://www.quora.com/What-are-best-practices-for-app-store-optimization. Accessed
6 Jan 2019
5. Joorabchi ME, Mesbah A, Kruchten P (2013) Real challenges in mobile app development.
In: Empirical software engineering and measurement, 2013 ACM/IEEE international
symposium on. IEEE
6. Chang G, Huo H (2018) A method of fine grained short text sentiment analysis based on
machine learning. Neural Netw World 28(4):345–360
7. Hassan S, Bezemer C, Hassan A (2018) Studying bad updates of top free-to download apps
in the Google play store. IEEE Trans Softw Eng pp 1–1
8. McIlroy S et al (2015) Fresh apps: an empirical study of frequently-updated mobile apps in
the Google play store. Empir Software Eng 21(3):1346–1370. https://doi.org/10.1007/s10664-
015-9388-2
9. Mojica Ruiz I et al (2017) An examination of the current rating system used in mobile app
stores. IEEE Software, 1–1. Institute of Electrical and Electronics Engineers (IEEE). https://
doi.org/10.1109/ms.2017.265094809
10. Varshney K (2018) Sentiment analysis of application reviews on play store. Int J Res Appl
Sci Eng Technol (IJRASET) 6(3):2327–2329. https://doi.org/10.22214/ijraset.2018.3537
11. Hu H et al (2018) Studying the consistency of star ratings and reviews of popular free hybrid
android and ios apps. Empir Software Eng. https://doi.org/10.1007/s10664-018-9617-6
Time Series Data Mining in Cloud Model
Abstract. Over the past decade, many attempts have been made to mine spatial time-series data sets and to classify prediction rules from them. In this paper, a novel approach is proposed for mining time-series data on the cloud model, using the Walmart data set. The process is performed over datasets oriented towards numerical characteristics. It is based on the theory of the cloud model with its expectation, entropy, and hyper-entropy characteristics. Data is then obtained using the backward cloud model, implemented through Libvirt. Using a curve-fitting process, the numerical characteristics are predicted. The proposed model is quite feasible and applicable for forecasting over the cloud.
1 Introduction
Data mining technology is used to verify the cyclic fragments obtained over a predefined set of statistical data, generated by imparting alternate patterns based on the prescribed time series [3], in order to generate cyclic association rules.
We use the API called Libvirt to access the native-layer interface of KVM (the virtual machine used in the cloud model), which internally uses the OpenStack cloud platform to perform the initial management operations over the virtual machines that load and operate the cloud platform effectively. We implemented the Libvirt library using PHP 5.0, as it also supports the XAMPP server. We used the available toolkit for implementing Libvirt through its API [3]. The main package used is the Libvirt-php package.
The analysis of time-series mining is of great research interest, but it has many shortcomings because of the smoothness and distribution properties of the data. In this paper we present a prediction model supported by the cloud model, using the Libvirt-php package to describe time-series prediction along with the use of virtual systems in the cloud environment; the numerical characteristics of the collected information are standardized as cloud droplets. The process performs curve fitting to obtain prediction rules, even from a dead cloud model, for calculating range values.
The cloud model is inherently uncertain because of its qualitative and quantitative properties, which are always uncertain due to the fluctuating needs or dynamic requirements of a user or service receiver. For this reason we use quantitative conversion methods based on mathematical models that account for the randomness of the data and the fogginess of the prediction. Using these two properties, the major question to be solved is how to map the cloud data between its qualitative and quantitative aspects.
Definition: Let U1 be a universal set, and let C be the qualitative concept associated with U1. For every x belonging to U1 there is a membership degree of x in C, and the value prediction follows the normalization tendency over [−1, 0, 1]:

µ: U1 → [−1, 0, 1]    (1)

In Eqs. 1 and 2, the distribution of x on U1 defines the cloud; Fig. 1 illustrates the numerical characteristics of the cloud using Libvirt.
For performing this operation, the CPU usage, or probable CPU usage, at the cloud virtual servers can be defined as follows:
In Eq. 3, DS represents the data sets, either on a dead cloud or on an active cloud, sampled with a period of 100 ns, which is the default value of a virtual system used as the cloud server CPU measurement setting.
In the Libvirt cloud model, we use the expectation parameter Eexpectation, the entropy parameter Eentropy, and the hyper-entropy parameter Hentropy to represent the concept fully.
The Expectation (Eexpectation): the mathematical expectation of the virtual-server cloud drops, distributed within the universal constant value of 100 ns.
The Entropy (Eentropy): the probable uncertainty of the virtual servers in the cloud model, measured with the qualitative concept to determine the randomness and fuzziness of the Libvirt cloud model.
The Hyper-entropy (Hentropy): this property measures the uncertainty of the entropy itself, i.e. the second-order entropy, whether for a dead virtual server or for a live server, giving a measure of the randomness of the entropy.
The backward cloud generator handles this uncertainty: the conversion model realizes the conversion between numerical data and linguistic concepts, performing the mapping from the quantitative to the qualitative side by using the linguistic value (Eexpectation, Eentropy, Hentropy). Using these parameters, the cloud model establishes the backward and forward cloud generator models.
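A common textbook form of the backward cloud generator (the variant without certainty degrees) estimates the three numerical characteristics from a sample of cloud drops as sketched below; this is a standard formulation and not necessarily the exact variant implemented through Libvirt-php in this work.

# Sketch of a standard backward cloud generator: cloud drops -> (Ex, En, He).
import math

def backward_cloud(drops):
    n = len(drops)
    Ex = sum(drops) / n                                   # expectation
    mean_abs_dev = sum(abs(x - Ex) for x in drops) / n
    En = math.sqrt(math.pi / 2.0) * mean_abs_dev          # entropy
    var = sum((x - Ex) ** 2 for x in drops) / (n - 1)
    He = math.sqrt(abs(var - En ** 2))                    # hyper-entropy
    return Ex, En, He

# Example: CPU-usage samples from the virtual servers treated as cloud drops.
print(backward_cloud([42.0, 44.5, 41.2, 47.3, 43.8, 45.1]))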
Various virtual machines are used in a cloud model to run the monitoring solutions implemented in the monitoring module, which is a very crucial module of a virtual machine. Ceilometer, described in the Libvirt library and API documentation, is used to configure and use the module in PHP for monitoring all the components of a cloud model. For this we need to turn on the monitoring service on the host, which is disabled by default in the config file; without it, the data cannot be collected and placed into dimensions.
The Libvirt model also holds information such as the present state of the CPU and the overall disk space used in the storage model adopted in the cloud, which may be a star RAID model or a mesh RAID model. Nowadays a new model has also evolved, called the hybrid RAID model, which comprises most of the features of both. All these models are used to perform a write or re-write operation in a cloud model, either before mining the data or after generating the results, as per the request of the user in a real-time scenario.
In the cloud model, time-series data processing is very crucial, as the data in the cloud increases drastically and the free disk space decreases, due to which the cloud may fail. A time series comprises a huge amount of data over a period of time, which may relate to a shopping mall, an industry with time-based sales, or our own experimental world. Numerically, it exhibits several characteristics in the cloud model.
The main plan of the time-series data processing framework supported by the cloud model is as follows:
The process starts with the extraction of experimental data for a specific period of time from the time-series databases in order to obtain the cloud droplets from Libvirt.
The next step is the backward cloud generation process, which extracts numerical characteristics, using the property Eentropy, over the cloud drops through Libvirt.
In the next step, all the obtained cloud drops are compared with Eexpectation and Hentropy and with the data items generated from the virtual CPUs.
Lastly, the main rule obtained is fitted against all the generated items with the possible numerical characteristics in order to perform the prediction or forecasting.
set or data cube available in the cloud model. This process can be implemented using “pseudo-regression techniques”, proceeding from the root test to the leaf test in a set order.
Sample-Based Noise Test: after extracting the information from a data set or a data cube, we need to perform a noise test, using a sample data set over the target data set, which may be a shopping-mall data set or any other scientific data set such as ARIES, over the sequence of data to obtain the time-series predicates.
Identification of a Model: the model is identified based on the calculated likelihood ratio, which specifies whether or not to adopt a model estimate based on the parameters provided by the Libvirt API, in order to obtain the optimal solution for identifying a dead cloud and to perform backtracking on the cloud model.
Testing the Model: the model is completely verified for any missing data or missing virtual servers that are not part of the cloud model, by performing various types of noise tests in the Libvirt model. Only based on this input is further data extracted and modified according to the requirements.
Predicting with the Model: in a cloud model, the complete data is verified for consistency and completeness before performing forecasting or prediction based on the time series.
Complete Evaluation of the Cloud Model: to verify the time-series data over a cloud model, a few parameters such as the mean absolute error, the root mean square error, and the overall mean percentage error have to be defined [11].
4.4 Prediction
A time series is a chronological record of a series of observations, from which fitting curves are generated; these curves are sometimes regarded as predictive rules. Based on these prediction rules, we can perform time-series data prediction. Given the data of the function and the time parameters, we can obtain the relationship value of the fitted curve function.
As an example, consider the data points collected at various time intervals, called cloud drops, shown in Fig. 3; they comprise time-series data obtained in summer, produced by calculating the mean temperature values over a period of time.
Using the Libvirt API, we can easily identify the relationship obtained, in order to predict and manage the conditions imposed for acquiring the time-series data for decision making. The numerical or mathematical model is used to capture long-term trends, seasonal changes, and irregular temperature changes by considering the historical knowledge in the time-series data. We are thus able to effectively predict the long-term trends in seasonal changes.
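A minimal curve-fitting sketch in the spirit of this step is given below: a low-order polynomial is fitted to the expectation values obtained for successive time windows and then evaluated at future time points. The data values and polynomial degree are assumed for illustration only.

# Illustrative curve fitting over a time series of expectation values.
import numpy as np

t = np.arange(10)                                      # time index (e.g. weeks)
Ex = np.array([21.5, 22.1, 23.0, 24.2, 25.9, 27.4,     # hypothetical mean temperatures
               28.8, 29.5, 29.9, 30.1])

coeffs = np.polyfit(t, Ex, deg=2)                      # fit a quadratic trend curve
trend = np.poly1d(coeffs)

future = np.arange(10, 14)                             # the next four time points
print(trend(future))                                   # predicted values from the fitted curve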
To perform this test, we configured the cloud in the specific environment listed in Table 1.
The prediction results attained using the time series are illustrated in Table 2.
We used the Walmart data set to predict time-series sales for a period of ten years, and the results attained almost match the actual sales.
5 Conclusions
Most present-day spatial data has a time dimension and tends to change over time. The spatial domain includes this time dimension, which becomes associated with statistical rules through the time-series domain when performing association rule mining. In this paper we used the Libvirt API to perform time-series data mining based on the three numerical characteristics of the cloud model. This model represents the features of the time-series data, and the numerical features of a series of sample sets were obtained. Then, based on these feature points, curve fitting of the rules was carried out to obtain predictive models. Finally, we took a temperature data set and illustrated the cloud drops and the configuration of the execution environment used to obtain the results.
References
1. Li DR, Wang SL, Shi WZ et al (2001) On spatial data mining and knowledge discovery.
GeomatS Inf Sci Wuhan Univ 26(6):491–499
2. Shekhar S, Zhang P, Huang Y et al (2003) Trends in spatial data mining. In: Kargupta H,
Joshi A (eds) Data mining: next generation challenges and future directions. AAAI/MIT
Press, Menlo Park, pp 357–380
3. Ji X et al (2015) PRACTISE: robust prediction of data center time series. In: International
conference on network & service management
4. Box G, Jenkins GM (1976) Time series analysis: forecasting and control. Holden Day Inc.,
San Francisco
5. Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series
database. In: Proceedings of 1999 international conference on data engineering (ICDE’99),
pp 106–115. Sydney, Australia, April 1999
6. Ozden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: Proceedings of
the 15th international conference on data engineering, pp 412–421
7. Li Y, Ning P, Wang XS et al (2003) Discovering calendar-based temporal association rules.
Data & Knowl Eng 44:193–218
8. Agrawal R, Lin KI, Sawhney HS et al (1995) Fast similarity search in the presence of noise,
scaling, and translation in time-series databases. In: Proceedings of the 21th international
conference on very large data bases, pp 490–501
9. Pavlidis T, Horowitz SL (1974) Segmentation of plane curves. IEEE Trans Comput 23:860–
870
10. Park S, Kim SW, Chu WW (2001) Segment-based approach for subsequence searches in
sequence databases. In: Proceedings of the sixteenth ACM symposium on applied
computing, pp 248–252
11. Pratt KB, Fink E (2002) Search for patterns in compressed time series. Int J Image Graph 2
(1):89–106
12. Xiao H, Hu YF (2005) Data mining based on segmented time warping distance in time series
database. Comput Res Dev 42(1):72–78
13. Cui WH, Guan ZQ, Qin KA (2008) Multi-scale image segmentation algorithm based on the
cloud model. In: Proceedings of the 8th spatial accuracy assessment in natural resources,
World Academic Union
14. Tang XY, Chen KY, Liu YF (2009) Land use classification and evaluation of RS image
based on cloud model. In: Liu Y, Tang X (eds) Proceedings of the SPIE, vol 7492, 74920N-
74920N-8
Multi-node MIL-STD-1553 Avionics
Bus-Monitor Development Under Linux
Platform
1 Introduction
Complex avionics systems communicate with each other in the form of electrical signals, both analog and digital. Fast update of data and execution of the control and guidance algorithms in hard real time are vital requirements for the communication interface between subsystems. Distance and the complexity of the electrical interface cabling contribute to degradation and even total loss of signals. The introduction of MIL-STD-1553 in data communication provided a communication speed of 1 Mbps with simple cable interfaces. It is a more reliable and redundant protocol for avionics systems compared with other serial communication protocols.
(1) Bus Controller (BC): The bus controller is responsible for initiating messages on the MIL-STD-1553 data bus. The commands may be for transferring data or for the control and management of the bus.
(2) Remote Terminal (RT): A remote terminal either receives data from or transmits data on the bus. It has 32 sub-addresses, and each sub-address can receive up to 32 data words.
(3) Bus Monitor (BM): A bus monitor is a terminal that listens to the exchange of information on the MIL-STD-1553 data bus between the BC and the RTs. The bus monitor collects all the data from the bus for analysis purposes (Fig. 1).
Recent trends show that most serial communication channels in the avionics industry are being replaced by MIL-STD-1553 because of its advantages in speed and reliability. The 1553 interface is used for all data communication, i.e. from ground systems to the embedded computer for initialization and health checks of flight subsystems, and from the onboard embedded computer to all flight hardware subsystems.
The bus monitor plays the significant role of continuously monitoring the message traffic on the bus. It captures all the data that flows on the bus. The hex data is converted to engineering units as per the Interface Control Document (ICD). In this protocol, the information is sent as a command word followed by data words. The following example explains the details of a message. Command word 0x0C98 indicates that the message is from RT to BC on sub-address 4 with 24 data words [6]. With the INS subsystem working as RT-4, the information may consist of the positions of the vehicle (3 float variables) and the velocities of the vehicle (3 float variables) in the raw data [1]. This data is captured by the BM, converted to engineering units, and this well-structured data is used for the analysis.
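The command-word interpretation above follows the standard MIL-STD-1553 layout (5-bit RT address, transmit/receive bit, 5-bit sub-address, 5-bit word count). A small decode sketch, written in Python purely for illustration, is given below.

# Illustrative decode of a MIL-STD-1553 command word (standard bit layout).
def decode_command_word(word):
    rt_address = (word >> 11) & 0x1F      # bits 15..11
    transmit   = (word >> 10) & 0x01      # 1 = RT transmits to the BC
    subaddress = (word >> 5) & 0x1F       # bits 9..5
    word_count = word & 0x1F              # bits 4..0 (0 means 32 words)
    return rt_address, transmit, subaddress, word_count

# 0x0C98 decodes to a transmit command on sub-address 4 with a word count of 24,
# matching the example above.
print(decode_command_word(0x0C98))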
2 Problem Definition
[Figure: OBC acting as BC on the MIL-STD-1553 bus, with INS, Actuator and Telemetry as RTs and Bus Monitor-1 listening on the bus]
This development was initiated with the latest Intel i7-based computer architecture along with a Linux operating system [9] (Ubuntu with a real-time kernel) and a 4-node MIL-STD-1553 PCI-based interface card [8]. Subsequently, the software was developed. The architecture of the software has three parts: data capture, extraction, and online display [11].
[Figure: multi-node bus-monitor configuration: three MIL-STD-1553 buses (OBC, LC and Simulation Computer acting as BCs, with INS, Actuator and Telemetry as RTs), monitored by Bus Monitor-1, Monitor-2 and Monitor-3 on nodes N1, N2 and N3 of the multi-node bus monitor]
[Flowchart: main program: initialization of data (as per ICD); aceInitialize(DevNum) for the four nodes; loop: GetMTStkDecodedMsg(DevNum), DisplayDecodedMsg(); if DevNum > 4 then reset DevNum = 1; repeat until the run is over; STOP]
4 Software Development
The design of the application program under a real-time operating system is the core work of this project. The framework of the application software is designed as a program in the C language under real-time Linux, accessing the card via the driver program and displaying to the GUI using the Qt framework.
The Application Programming Interface (API) function aceMTGetStkMsgDecoded() is used for data capture in polling mode for all four nodes.
/* Poll each node; when a new message is available, count it and capture it */
if (nResult1 == 1) {
    ++nMsgNum1;
    CaptureDecodedMsg1(DevNum1, &sMsg);
}
if (nResult2 == 1) {
    ++nMsgNum2;
    CaptureDecodedMsg2(DevNum2, &sMsg);
}
if (nResult3 == 1) {
    ++nMsgNum3;
    CaptureDecodedMsg3(DevNum3, &sMsg);
}
/* ... repeated till N nodes */
In the HILS configuration, three nodes have to be configured, so the software was developed with the four-node 1553 card, keeping the fourth node idle. In the case of avionics systems with ‘n’ nodes, this bus monitor is capable of capturing the data of all nodes in polling mode.
The Time Tag Register (TTR) returns a value (0x0000–0xFFFF) at any stage after the trigger. By default the resolution of the TTR is 2 µs, so this highly precise and reliable 1553 hardware timer is used for time synchronization. The first command initiation is done on node-1, and the TTR value is used for computing the time-interval stamp. The saved data for each node was validated and analyzed as per the ICD, and it was observed that all the data on each node was captured as per the requirement. A number of runs were carried out before deployment of this software.
The fully integrated bus monitor, with the application program to capture the data of ‘n’ nodes, was tested in different configurations to assess the performance of the software and hardware (Figs. 6 and 7).
Fig. 7. Raw data output file (ii) after execution of the four-node bus monitor.
6 Conclusion
References
1. Karvande RS, Ramesh Kumar B (2013) Development of HILS test-bed to test newly
developed INS system. In: Proceedings of IEEE conference, ICMIRA. pp 536–539
2. Hoseinian MS, Bolorizadeh MA (2019) Design and simulation of a highly sensitive SPR
optical fiber sensor. Photonic Sens 9:33
3. Yang J, Konno A, Abiko S, Uchiyama M (2018) Hardware-in-the-loop simulation of
massive-payload manipulation on orbit. ROBOMECH J 5:19
1 Introduction
Artificial Intelligence is helping farmers across the world to improve yield and adopt modern agricultural practices. In Maharashtra’s Konkan region, the Alphonso mango is a significant commercial crop. The Alphonso is grown mainly in western India, notably in the Sindhudurg, Ratnagiri, and Raigad districts and in the remaining Konkan region of Maharashtra, India. India is a major exporter of Alphonso, but in recent years the world-famous Alphonso was banned in major consumer markets due to foreign pests found in exported consignments. The pests and fungi were the major cause and dealt a serious loss to cultivators. Various pests attack the mango fruit crop in its vegetative and reproductive phases. The different pests observed on the mango plant are the spiralling whitefly, leafhopper, Deanolis sublimbalis, and thrips. The major diseases harmful to mango are powdery mildew, blossom blight, and bacterial canker. Thrips cause serious damage to the fruit and are found to depress the yield of mango farms. Mango thrips have been widely observed in recent years in India. At the very beginning of the mango flowering stage, flower thrips feed on petals, anthers, pollen, and floral nectaries, resulting in the discoloration and malformation of panicles [1]. Weakening of the inflorescence, reduced fruit set, and bronzing of the fruit surface were also recorded, due to the presence of air in emptied cell cavities, which acts as an incubator for thrips. This effect is mostly recorded in mature and ripened fruits, and these fruits are deemed unsuitable for fresh marketing.
The flower thrips have a wide range of hosts, including weeds that act as a refuge between mango flowering seasons and during the application of pesticides to mango flowers. Studies of the thrips species infesting chilli and other plants have been completed in India. However, an investigation of the thrips species in the mango orchards of Konkan has not been completed, although thrips have increased and are referred to as a highly dangerous pest attacking mango inflorescences. Other dangerous consequences include the evolution of pesticide resistance in thrips populations, pest resurgence, and the outbreak of secondary pest infestations. Based on records, the consumption of synthetic pesticides in Southeast Asia increased from 0.74 kg/ha in 1990 to 1.5 kg/ha in 2000.
Mango production in India has suffered continuous significant losses in the last few years, and thrips are an increasing threat to mango production. This study is aimed at delivering, in advance, the probability of an outbreak of thrips. It is based on finding patterns in past data and analyzing current data for predictive analysis of the outbreak of thrips. As part of the study, an analysis of the local farm environment is required; a sensor network in the targeted farms furnishes the real-time data requirement. Prior to the actual implementation and use, the machine learning algorithm is trained on the past 20 years’ data, and the prediction is made by the algorithm. The main objective of this study is to minimize the impact of thrips on the mango crop, which reduces the use of pesticides and the cost of production.
2 Related Work
Several works have been done in the domain of plant disease prediction and detection using various computational approaches.
Thermal indices play an important role in the thrips population [1]. In one study, the correlation between thermal indices and thrips outbreaks was analyzed. The thrips count was recorded in the orchards at weekly intervals by gently tapping a shoot or panicle and holding a white paper in the palm. For the analysis, the mean count per panicle was recorded and weather records were collected using the agro-met observatory located in the experimental area. The peak in the thrips population was observed in the flowering phase. The stepwise regression analysis revealed that maximum temperature, minimum temperature, maximum relative humidity, minimum relative humidity, and sunshine hours are the factors on which the thrips population dynamics depends [1]. Several ranges of thermal indices were studied, and a positive and significant correlation with the thrips population was revealed. This correlation helped to predict thrips outbreaks in advance.
Convolutional neural network models with different layers are found to be very good at image processing and have been used to perform plant disease detection and diagnosis very precisely [2]. The models were trained and tested on a publicly available image dataset of healthy and diseased plants. Training was provided with an open
3 Methodology
3.1 Data Acquisition
3.1.1 Area Survey
A survey of five villages in Devgad Taluka was conducted to find the general trend of thrips outbreaks in those areas. The general findings of the survey were as follows (Fig. 1; Table 1):
a combined decision or prediction. The base models are decision trees, and ensembles of trees increase the accuracy of the prediction. For the prediction of the disease, the random forest algorithm was used because it gave a low mean absolute error. Temperature and humidity data were given to the algorithm as input for prediction. The algorithm predicted an attack value between 0.3 and 0.9 for each day. The average value over five days was calculated to predict the likelihood of a disease outbreak in the next five days. A short interval was chosen because the outbreak period of thrips is very short and the farmer has to act as quickly as possible; a second reason is that with a short-period average, each day’s attack value has a considerable impact on the average likelihood.
The results obtained by parameter tuning show that an n_estimator value of 20 gives the lowest error. Table 3 shows the parameter tuning for the best combination of parameters; in this study, n_estimator with a value of 20 shows the least error. As this study adopted a regression approach to the prediction problem, the algorithm with the least error on the testing data is the best-fit method, meaning that the difference between the predicted and actual values is minimal. The regression approach was used because the prediction of an attack cannot be given as a binary output due to the uncertainty of the weather. This regression approach helps farmers decide what preventive measures should be taken depending on the current outbreak status, the predicted outbreak, the available resources, etc.
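A minimal sketch of this modelling step is given below, assuming a daily weather table with temperature and humidity columns and a historical attack value; the file and column names are hypothetical, while the estimator count of 20 follows the tuning result reported above.

# Illustrative random-forest regression for the daily thrips-attack likelihood.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

data = pd.read_csv("thrips_weather_history.csv")                     # hypothetical dataset
X = data[["max_temp", "min_temp", "max_humidity", "min_humidity"]]   # assumed feature columns
y = data["attack"]                                                   # historical attack value

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=20, random_state=42)      # tuned estimator count
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

# Daily predictions averaged over a 5-day window give the outbreak likelihood.
daily_attack = model.predict(X_test[:5])
print("5-day outbreak likelihood:", daily_attack.mean())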
5 Conclusion
Disease prediction helped farmers minimize losses due to outbreaks of thrips. It helped them take preventive measures, which minimized loss of yield. A reduction in mango quality is also prevented, which will help farmers increase income and fruit quality. The random forest algorithm proved to be quite accurate in predicting the likelihood of a thrips attack. This technique has helped farmers take preventive measures, which has improved the productivity of the farms and reduced the use of chemical pesticides on the crop. Gradually it will prevent incidents of foreign flies occurring in exported fruit lots.
The system can be extended to roll out in mango orchards with more precise on-field sensors. The low implementation cost will help micro farming in India. Image processing using convolutional neural networks to assess the current health of plants can be integrated with this system, which should give very precise forecasting results. Already developed, accurate models of disease detection and classification can also be used hand in hand with this system to increase throughput.
References
1. Gundappa AT, Shukla PK (2016) Prediction of mango thrips using thermal indices. GERF
Bull Biosc 7(1):17–20
2. Sladojevic S, Arsenovic M, Anderla A, Culibrk D, Stefanovic D (2016) Deep neural
networks based recognition of plant diseases by leaf image classification. Comput Intell
Neurosci 2016:1–11
3. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis.
Comput Electron Agric 145:311–318
4. https://innovate.mygov.in/wpcontent/uploads/2018/09/mygov1536109003113172.pdf
5. Singh BK, Singh S, Yadav SM (2014) Current scenario of production, area and some
important post harvest disease of mango and their management in India: an overview.
Asian J Plant Sci 13:46–50
6. Kodali RK, Sahu A (2016) An IoT based weather information prototype using WeMos. In:
2016 2nd International conference on contemporary computing and informatics (IC3I).
Noida, pp 612–616
7. Flores KO, Butaslac IM, Gonzales JEM, Dumlao SMG, Reyes RS (2016) Precision
agriculture monitoring system using wireless sensor network and Raspberry Pi local server.
In: 2016 IEEE region 10 conference (TENCON), pp 3018–3021. IEEE, November 2016
8. http://ipindiaservices.gov.in/GirPublic/Application/Details/379
9. Jenkins GM, Alavi AS (1981) Some aspects of modelling and forecasting multivariate time
series. J Time Ser Anal 2(1):1–47
10. Mercadier M, Lardy JP (2019) Credit spread approximation and improvement using random
forest regression. Eur J Oper Res 277:351–365
11. Li Y, Zou C, Berecibar M, Nanini-Maury E, Chan JCW, van den Bossche P, Van Mierlo J,
Omar N (2018) Random forest regression for online capacity estimation of lithium-ion
batteries. Appl Energy 232:197–210
12. Fernández-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of
classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
A Methodology to Find Artifacts of the Hacker
in Man-in-the-Browser Attack
1 Introduction
Cybercrimes are evolving day-by-day. Hackers are finding new ways to carry out
attacks on the information systems. Session hijacking is one category of cyber-attacks
where the hacker takes control of a user’s legitimate session to perform malicious
actions. As these actions are originating from a legitimate session, it is difficult to
differentiate between the legitimate actions and malicious actions. Therefore, these
types of attacks provide a great advantage to the hackers. Session hijacking attack can
be carried out using various methods such as man-in-the-middle, man-in-the-browser,
and more recently man-in-the-mobile.
Man-in-the-browser attacks are a specialized version of man-in-the-middle attacks.
They mainly operate at the application layer by compromising the victim's computer with a
trojan. This trojan takes control of the browser to carry out the session hijacking attack.
Man-in-the-browser attacks do not need to break encrypted communications, because
encryption is applied when the data leaves the browser and the attack is done prior to that
stage. Now, as the hacker is using a legitimate session to perform malicious actions, it is
difficult to stop those actions. Moreover, the actual identity of the hacker is hidden, and all
the malicious actions originate from the legitimate user.
As normal mechanisms do not detect the identity of the hacker, scientific and
analytic techniques are required to resolve this crime. All these techniques are an integral
part of a domain called computer forensics. Computer forensics is a branch of forensic
science. Its intention is to find the digital evidence that could be used in court pro-
ceedings. Computer forensics does not detect or prevent the cybercrime, it is applied
once the crime has been committed.
2 Man-in-the-Browser Attack
The hacker uses these mule accounts as the intermediary nodes between his bank account and the
account from which the money was taken using the man-in-the-browser attack. Even the
mule account holders are unaware of this attack. The hacker can hide his identity by
making the mule account holders appear as the culprits in this scenario.
While performing all these functions, the man-in-the-browser malware stays connected
to the hacker. The malware can add the victim machine to a botnet controlled by a
command server, which in turn is controlled by the hacker. The hacker makes changes to
the configuration files and uploads them to the command server. The command server then
issues commands on behalf of the hacker to the malware running on the victim machine.
PHP scripts are used for communication between the victim machine and the command
server.
Man-in-the-browser malware compromises the browser application using techniques
such as browser helper objects, Document Object Model exploitation, API hooking, and
changing registry values in the Windows operating system. The browser application runs
with system-level privileges, so if the hacker can control the browser application, then the
processes invoked by the browser will ultimately have system-level privileges.
Browser helper objects are DLLs that help the browser access the Document Object
Model. They are add-ons or extensions that improve the functionality of the browser. They
add registry entries in the Windows operating system so that they load at the startup of the
browser application.
API hooking is another technique used to compromise the browser application.
There are several APIs or DLLs that help in connecting the browser application to the
internet. These APIs are intermediary nodes between browser application and internet.
The data flows through these APIs. Browsers use these APIs to connect to the internet
and get the desired data from the internet and display the HTML content on the screen.
Man-in-the-browser malware corrupts these APIs by injecting the malicious functions
into the API code. By corrupting the APIs, the received HTML code from the internet
is rewritten such that additional input fields can be added to the legitimate website only
on the victim machine. One example of API hooking is corrupting wininet.dll in the
Windows operating system. Wininet.dll has several functions such as httpsendrequest(),
navigateto(), httpopenrequest(), and internetreadfile(). These functions, as their names
suggest, are essential for the browser to send, receive, and display content in the browser.
If these functions are corrupted, everything that the user sees in the browser can be altered
without breaking the SSL communication.
Man-in-the-browser malware also makes changes to the Windows registry. These
changes help the malware in various ways, such as loading add-ons when the browser
starts, altering the browser security settings so that malicious HTML code can be injected
into websites without being blocked by the browser, and, above all, maintaining the
high-level privileges needed to carry out the attack.
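As an illustration of the kind of artifact such registry changes leave behind, the following sketch (Windows only) enumerates the CLSIDs registered under the standard Browser Helper Objects key using Python's winreg module; everything beyond that well-known registry path is an assumption for illustration and is not the method used in this paper.

```python
import winreg

# Standard registration key for Internet Explorer Browser Helper Objects.
BHO_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects"

def list_bhos():
    """Return the CLSIDs of all registered BHOs, or an empty list if none exist."""
    clsids = []
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, BHO_KEY) as key:
            index = 0
            while True:
                try:
                    clsids.append(winreg.EnumKey(key, index))
                    index += 1
                except OSError:          # raised when there are no more subkeys
                    break
    except FileNotFoundError:            # key absent on this machine
        pass
    return clsids

if __name__ == "__main__":
    for clsid in list_bhos():
        print("Registered BHO:", clsid)
```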
Being this sophisticated, man-in-the-browser attacks have wreaked havoc in the
banking industry of the USA and European countries. Although the banking industry has
employed many preventive measures, such as two-factor authentication, these were easily
circumvented by man-in-the-browser malware because once the user logs into the account,
the hacker can change these security mechanisms.
There are several man-in-the-browser malware families, but the most important and
most popular one is Zeus, which was the first man-in-the-browser malware. Later, when the
source code of Zeus was released, many variants were developed, increasing the complexity
of man-in-the-browser attacks. Other examples of man-in-the-browser malware are Torpig,
URLZone, Adrenaline, Sinowal, Silent Banker, Shylock, Spyeye, Carberp, and Sunspot
(Figs. 1, 2, 3).
[Figure: The source code of the man-in-the-browser malware is given as input to the Cuckoo sandbox tool, which is used for automated analysis of malware; behavioural patterns of the malware, such as process memory and files created in the background, can then be found.]
[Figure: Acquisition - the AccessData FTK Imager tool is used to acquire the corrupted hard disk image, Windows registry files and a memory dump of the Windows 7 virtual machine. Hard disk analysis - the AccessData FTK tool is used to analyse the acquired corrupted hard disk image; artifacts found here are xml and php files created in the background.]
4 Conclusion
References
1. RSA White Paper, Making sense of man-in-the-browser attacks: threat analysis and
mitigation for financial institutions. http://viewer.media.bitpipe.com/1039183786_34/
1295277188_16/MITB_WP_0510-RSA.pdf
2. Dougan T, Curran K (2012) Man in the browser attacks. Int J Ambient Comput Intell
4(1):29–39. https://doi.org/10.4018/jaci.2012010103
3. Analysis of man-in-the-browser attack by SANS. https://www.sans.org/readingroom/
whitepapers/forensics/paper/35687
4. OWASP article about man-in-the-browser attack. https://www.owasp.org/index.php/Man-in-
the-browser_attack
5. ISACA article about man-in-the-browser attack. https://www.isaca.org/Journal/archives/
2013/Volume-4/Pages/Man-in-the-Browser-A-Threat-to-Online-Banking.aspx
6. Grande CL, Guadrón RS (2016) Computer forensics. In: 2016 IEEE 36th central american
and panama convention (CONCAPAN XXXVI), pp 1–6. San Jose. https://doi.org/10.1109/
concapan.2016.7942361
7. Zeus malware source code. https://github.com/m0n0ph1/malware-1/tree/master/Zeus
8. Cuckoo sandbox documentation. https://cuckoo.sh/docs/
9. Carrier B (2005) File system forensic analysis. https://www.oreilly.com/library/view/file-
system-forensic/0321268172/
10. Carvey H (2011) Windows registry forensics: advanced digital forensic analysis of the
windows registry. Syngress Publishing. https://dl.acm.org/citation.cfm?id=1996274
11. Ligh M, Adair S, Hartstein B, Richard M (2010) Malware analyst’s cookbook and DVD:
tools and techniques for fighting malicious code. https://www.wiley.com/en-us/Malware
+Analyst%27s+Cookbook+and+DVD%3A+Tools+and+Techniques+for+Fighting
+Malicious+Code-p-9780470613030
12. Volatility documentation. https://github.com/volatilityfoundation/volatility/wiki
13. Ligh MH, Case A, Levy J, Walters A (2014) The art of memory forensics: detecting malware
and threats in windows, linux, and mac memory. https://www.wiley.com/en-us/The+Art+of
+Memory+Forensics%3A+Detecting+Malware+and+Threats+in+Windows%2C+Linux%
2C+and+Mac+Memory-p-9781118825099
14. Casey E (2011) Digital evidence and computer crime: forensic science, computers, and the
internet. https://dl.acm.org/citation.cfm?id=2021194
15. Casey E (2009) Handbook of digital forensics and investigation. https://dl.acm.org/citation.
cfm?id=1822831
Implementation Effects of E-ID Device
in Smart Campus Using IoT
Abstract. In the present world, teachers and students' parents are very busy in
their scheduled lives. Irrespective of this, their health and work can be recorded
and monitored with the help of an E-ID device on an IoT platform. We can
introduce an E-ID device for every university employee, including the students. By
connecting all of these together on an IoT platform, the university can be called a
"smart campus". The synchronized data of the student, teacher or university
authority is stored for each and every minute. From the top-level management of the
university down to the students' parents, everyone can obtain the student's details such as
attendance, health, fitness, extracurricular activities, security and campus facilities,
and access to labs and air conditioners can be granted automatically by the E-ID
devices. In this way, power consumption can be reduced and energy efficiency
increased. Finally, educational organizations will have transparent
information in hand within a few minutes by using the E-ID device on an IoT platform.
1 Introduction
In the competitive world, students join from various places and stay away
from their parents. Students also pursue dual degrees, studying on one university
campus while connecting to the faculty of another university from a distance, and in that case
the university authorities can monitor them only through results. So the
education system has become command-oriented and objective-oriented. In recent years,
many new technologies have emerged, mainly around electronic gadgets, such as E-learning,
E-commerce, E-health, E-fitness and E-security. Combining all of these onto a single
platform using IoT can help educational organizations educate and track the students and
monitor teachers, and also lets the students' parents monitor them, providing the best
education by using sensor technology, GPS trackers, radio frequency, WSN, cloud, digital
display boards and security. All of these can be introduced through one single E-ID device
on the smart campus.
2 Literature Review
2.1 Education System Using Internet of Things
Physical objects are connected and converted into networks through the Internet; this
radical change is the effect of IoT [13, 14]. IoT makes communication possible
between people and the environment, as well as between people and things [4]. To react to
the environment, cloud services, near-field communications, real-time localization
and sensors are embedded, thereby transforming normal objects into smart objects [2].
IoT also has the option to merge Internet information and services together [4]. In
education, the instructor's particular objective is to gather data and provide knowledge to
the students to improve their learning by using an IoT system [7]. In recent education,
"there are seven different types of technologies provided to the students so that they feel it
as a real-time experience", as mentioned by the author of [8]. Different types of electronic
gadgets such as cameras, microphones, video projectors, sensors and face recognition
algorithms, with the required software, make a classroom an intelligent classroom
environment [1]. Students' concentration, performance and achievements can be improved
with a smart classroom environment [9]. Industry and higher-education experts [10, 11]
mentioned that problems can either be solved or created using IoT in the areas of security,
privacy and data ownership. Using IoT, students will receive alerts from the administration
when they struggle with learning issues in their academics [11]. IoT systems are running
in many universities by connecting everything on campus to the cloud, such as security,
temperature management and access to electronic devices [11].
In 2009, the EU Commission identified the importance of IoT, organizing conferences
for scholars on revolutionary and innovative ideas and on ways to restructure IoT [15].
The author of [5] suggested that an IoT must be a source of Internet-connected sensors
with some database. The architecture of IoT is proposed with three segments [14]: the
hardware segment, the middleware segment and the presentation segment. In the hardware
segment, the collection of information is done through sensor devices or any embedded
communication device. In the middleware segment, data is stored, computed and analyzed
using the cloud environment. Finally, in the presentation segment, the data is presented
after analysis. An IoT system must provide a medium for the transfer of data, to track the
required thing, take data from the source and analyze the data for future purposes [13].
A key role in the hardware segment is played by the wireless sensor network for various IoT
applications such as home automation and energy saving [17]. The sensor device collects
the data from the sensor and sends it to the connectivity model, which is continuously
monitored. In wireless sensor networks, communication is wireless [6], and sensor
measurements are important to reduce cost; therefore, instead of using separate facilities
for energy saving, these built-in sensors act as energy-saving devices [12]. ZigBee is a
standard specified for wireless networks with a low communication rate, which is suitable
for applications in many areas [16].
The aim of the study is to develop an effective educational organization with a smart
campus using IoT. This is possible using the latest technologies on the IoT platform. In
older educational organizations, security and quality of education could be provided under
the guidance of a teacher throughout the learning period. But nowadays, as the population
has increased drastically, many students are under a single teacher's guidance. The aim is
to study and monitor every student with respect to health status, security and attendance,
along with their academic activities. We can also reduce wasted power consumption using
E-ID devices. An innovation of IoT in education is shown in Fig. 1.
The students' parents and the educational organization gain good options for rectifying
student problems such as rationalization and ragging by implementing the E-ID device on
the IoT platform.
Student Orientation E-ID Device. The student orientation E-ID devices are of two
types: one is the master E-ID device, which is kept with the student, and the other is the
duplicate student E-ID device, which is for the student's parent to observe all activities of
the student.
a. Student E-ID device (master E-ID device)
b. Student's parent E-ID device (duplicate E-ID device)
Student E-ID Device. The student E-ID device is designed and developed entirely from
electronic devices and connected to an application, which involves both hardware and
software. The hardware is an embedded system with different types of sensors. The
hardware involves the following devices to form the E-ID device: RFID, Bluetooth, a GPS
tracker, RFID readers, a temperature sensor and a display on the wrist band to view the
results, supported by a ZigBee network. The software involves the mobile application, a
cloud server, and pure pulse technology to track the heartbeat, cardio fitness and sleeping
time of the student.
Student's Parent E-ID Device. The student's parent E-ID device is similar to the
student E-ID device and acts as a duplicate. The hardware involves the same devices as the
student device plus RFID (synchronized with the master E-ID) to form the E-ID device. The
software involves the mobile application to display the student's data on the parent device.
presence is identified, and when there is no student or faculty member in the lab, the air
conditioner, fans and lights are switched off automatically using sensors (Fig. 3).
The device includes a health tracking system. Using this, a student's heartbeat can be
tracked, and cardio fitness and sleeping time can be known from the device display. It
involves pure pulse technology: the device emits a green light which, in contact with the
skin, enters the blood and observes the blood flow and heartbeat, i.e., blood flow to and
from the heart is captured per minute, and by using the heart's resting time, cardio fitness
and workouts can be tracked. By finding the resting time of the heart, the sleeping mode is
identified. Even alerts for meal and drink intake are provided. The data collected by the
E-ID device is transmitted to the cloud using IoT. Monitoring of student data is done by the
admin office staff. The data is also displayed on the mobile application, and some of the
data is also displayed on the wrist band display as notifications.
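The paper does not specify a particular transport for pushing E-ID readings to the cloud; the sketch below assumes a simple HTTP endpoint and illustrative field names, using only the Python standard library.

```python
import json
import time
import urllib.request

CLOUD_ENDPOINT = "http://campus-cloud.example.org/api/e-id/readings"   # hypothetical endpoint

def push_reading(student_id, heart_rate, rfid_seen):
    """Send one periodic reading from the wrist band controller to the campus cloud."""
    payload = json.dumps({
        "student_id": student_id,
        "heart_rate": heart_rate,      # from the optical (pure pulse) sensor
        "rfid_seen": rfid_seen,        # attendance: tag read at a classroom reader
        "timestamp": int(time.time()),
    }).encode()
    request = urllib.request.Request(
        CLOUD_ENDPOINT, data=payload,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(request)    # the admin server stores and displays the data

# Example: push one reading per minute from the device firmware's main loop.
# push_reading("1234", heart_rate=72, rfid_seen=True)
```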
Teaching Staff E-ID Device. The teacher's attendance is recorded using RFID at the
office room. The class to be taught is given in the lesson plan, with the location of the class
communicated to the students through the mobile application. After the class is completed,
the class recorded using external devices such as a microphone and camera will be uploaded
to a server monitored by the admin office staff. By this method, the students, students'
parents and university authorities are notified about the completion of the lesson and the
quality of teaching using IoT. Similarly, health tracking can also be done using pure pulse
technology (Fig. 5).
Non-Teaching Staff E-ID Device. The non-teaching staff have an E-ID device for
recording attendance and health tracking in the above-mentioned way. In general, the
evaluation of exam papers is done manually by removing the hall ticket number of the
student from the exam paper before correcting it. This may introduce mistakes while
removing the hall ticket. To overcome this, the answer script can be provided with a bar
code of the student's hall ticket, which can be called a smart exam paper. In this smart exam
paper, during evaluation only the bar code is seen by the faculty, so the student details
cannot be known without a scan. After the correction, the non-teaching staff decode the bar
code and the result is sent to the server and then to the mobile application of the student
and parent. The feedback of the student can be collected by the non-teaching staff: the
feedback form is sent to the student's mobile application to be filled in, and the results can
be analyzed using the cloud.
5.4 Limitations
1. The E-ID device is suitable only for wearing on the hand (wrist), in order to implement
the pure pulse technology.
2. The E-ID device has to be charged every 3 days.
6 Conclusion
If all educational organizations introduce and implement education with the latest
technologies like IoT, it will be useful in the right way. IoT plays a major role in making a
smart campus by using the E-ID device. The different types of staff and students on campus
are interlinked by IoT, and this system of education will guide every educational
organization from the top level to the bottom level of the education system. The students of
the present academic year can be tracked in their present situation and compared with past
and future academic performance on the smart campus of an organization. The
implementation of the E-ID device will help students overcome the situations they face,
under the supervision enabled by the teachers' E-ID devices. Students will get the right path
for their education every time, effectively.
References
1. Xie W, Shi Y, Xu G, Xie D (2001) Smart classroom an intelligent environment for tele-
education In: 2nd IEEE pacific rim conference on multimedia, China, p 662–668
2. Kortuem G, Kawsar F, Fitton D, Sundramoorthy V (2010) Smart objects as building blocks
for the internet of things. IEEE Internet Comput 14(1):44–51
3. Zhang P, Yan Z, Sun H (2013) A novel architecture based on cloud computing for wireless
sensor network In: Proceedings of the 2nd international conference on computer science and
electronics engineering (ICCSEE ’13), Hangzhou, China, pp 0472–0475
4. Vermesan O, Friess P, Guillemin P et al (2011) Internet of things strategic research
roadmap. IoT Cluster Strategic Research Agenda, chapter 2:10–52
5. Atzori L, Iera A, Morabito G (2010) The internet of things a survey. Comput Netw
54(15):2787–2805
6. Terada M (2009) Application of ZigBee sensor network to data acquisition and monitoring.
Meas Sci Rev 9(6):183–186
7. Wellings J, Levine MH (2009) The digital promise: transforming learning with innovative
uses of technology. In: Sesame workshop
8. Johnson L, Becker S, Estrada V, Freeman A (2015) The NMC horizon report: 2015 higher
education edition. The New Media Consortium, Austin, Texas
9. Mendell M, Heath G (2005) Do indoor pollutants and thermal conditions in school influence
student performance? a critical review of the literature. Indoor Air J 15:27–32
10. O’Brien J (2016) The Internet of things: unprecedented collaboration required. In:
EDUCAUSE review, June 2016. https://er.educause.edu/articles/2016/6/the-internet-
ofthings- unprecedented-collaboration-required
11. Asseo I, Johnson M, Nilsson B, Chalapathy N, Costello TJ (2016) Internet of things: riding
the wave in higher education In: EDUCAUSE review, July/August 2016
12. Wang H-I (2014) Constructing the green campus within the internet of things architecture.
Intern J Distrib Sens Netw 2014(804627):8
13. Mattern F, Floerkemeier C (2010) From the internet of computers to the internet of things.
In: Sachs K, Petrov I, Guerrero P (eds) From active data management to event-based systems
and more. Lecture Notes in Computer Science, vol 6462. Buchmann Festschrift, pp 242–259
14. Gubbi J, Buyya R, Marusic S, Palaniswami M. Internet of things (IoT): a vision, architectural
elements, and future directions. FGCS. http://www.cloudbus.org/papers/InternetofThings-
Vision-Future2012.pdf
15. European Commission (2009) Internet of things an action plan for Europe. In: COM, 278.
http://eur-lex.europa.eu/LexUriServ/site/en/com/2009/com20090278en01.pdf
16. ZigBee Alliance (2006) ZigBee specification. http://www.zigbee.org/Products/TechnicalDocumentsDownload/tabid/237/Default.aspx
17. ZigBee Alliance (2007) The choice for energy management and efficiency. In: ZigBee White Paper. http://www.zigbee.org/Products/TechnicalDocumentsDownload/tabid/237/Default.aspx
Malware Detection in Executable Files Using
Machine Learning
1 Introduction
Due to the rapid advance of technology and the greater use of computer systems, many
benefits have been observed and life has become easier. But with these advancements there
is also a negative force surrounding the world in the form of cyber-attacks [1]. Attackers
exploit personal or sensitive data by creating a kind of malicious software known as
malware. Malware carries out cyber espionage and many other unwanted activities on
computer systems. Thus, the detection of this harmful software concerns a great number of
developers, researchers and security analysts who care for and secure the cyber world.
Many organizations protect their data by using various security products, as suggested by
security best practices, but modern-day hackers easily bypass them, causing great
disruption and loss to the business of an organization. Though there are several methods
proposed by the antivirus industry for malware detection, each of these methods has its
own set of lapses [2]. To overcome these issues related to traditional antivirus software, the
concept of using machine learning techniques to detect malware has developed.
Generally, when users download software, they cannot tell whether it is malicious or
legitimate until they run it on their system, and if the downloaded file is malware, it may
damage the resources of the system. The proposed application framework checks whether
an executable file is malware or not using machine learning algorithms, namely decision
tree and random forest, with almost 99% accuracy.
2 Literature Review
Yanfang Ye et al. [3] proposed a malware detection system for executable files using
object-oriented associative classification techniques, which worked better compared to
traditional anti-virus systems; the proposed system was used in one of the tools of
KingSoft's anti-virus software. Munkhbayar Bat-Erdene et al. [4] proposed a framework
which uses Symbolic Aggregate Approximation (SAX) and supervised learning
classification methods. Mozammel Chowdhury, Azizur Rahaman and Rafiqul Islam [5]
developed a framework using machine learning and data mining techniques for malware
classification and prediction and obtained better results compared to similar works.
Michael Sgroi and Doug Jacobson [6] proposed a dynamic model that utilizes easily
accessible runtime attributes in a generalizable way such that it can be extended between
operating systems; these attributes are correlated in a statistically meaningful way by using
machine learning. The deep neural network based malware detection by Joshua Saxe and
Konstantin Berlin [7] used two-dimensional binary program features; they introduced an
approach that addresses the issues of low false positive rates and high scalability,
describing in reproducible detail their deep neural network-based malware detection
system. In [8], Gavriluţ et al. proposed a framework which can be used to differentiate
malicious and legitimate files using different machine learning techniques. Dimensionality
reduction and pre-processing are important basic steps which improve accuracy in any ML
work; the concepts behind these are easily understood from the work of M. Ramakrishna
Murty et al. [9]. A good review of machine learning and data mining techniques used in
intrusion detection was done by Anna L. Buczak and Erhan Guven [10].
3 Classification Algorithms
There are many machine learning models available for classification. In the current
application, decision tree and random forest models were used.
E(T) = -\sum_{i=1}^{c} P_i \log P_i \qquad (1)

Step-4: For every attribute/feature, calculate the entropy for all categorical values using

E(T, X) = \sum_{c \in X} P(c) \, E(c) \qquad (2)
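A small sketch of Eqs. (1) and (2) in Python is given below; the toy labels and feature values are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """E(T) = -sum_i P_i * log(P_i) over the class labels (Eq. 1), using log base 2."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def conditional_entropy(feature, labels):
    """E(T, X) = sum_{c in X} P(c) * E(c): class entropy weighted by attribute value (Eq. 2)."""
    total = len(labels)
    by_value = {}
    for value, label in zip(feature, labels):
        by_value.setdefault(value, []).append(label)
    return sum((len(group) / total) * entropy(group) for group in by_value.values())

# Information gain of an attribute = E(T) - E(T, X); used to pick decision-tree splits.
labels = ["malware", "benign", "malware", "malware", "benign"]
feature = ["packed", "unpacked", "packed", "packed", "unpacked"]
print(entropy(labels) - conditional_entropy(feature, labels))
```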
4 Experimental Work
The main aim of the proposed work is to provide a web application framework where a
user can upload an exe file; at the server side, a machine learning classification algorithm
tests whether the uploaded file is malicious or benign, and the result is shown to the user.
The major part of the proposed work is to classify a file as either benign or malicious.
Figure 1 shows the architecture diagram of the classification process.
[Fig. 1. Architecture of the classification process: training dataset → feature extraction → dimensionality reduction → machine learning algorithms → classification → detection.]
4.4 Classification
In this stage the model is built using the dataset and a classifier is generated, which
stores the required feature information in ".pkl" format for future detection. For
experimentation, two tree-based classification algorithms, namely decision tree and random
forest, were used as classifiers and their accuracies were compared.
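A hedged sketch of this training stage is shown below, assuming scikit-learn models, a joblib dump for the ".pkl" file, and illustrative file and column names; it is not the authors' exact pipeline.

```python
import pandas as pd
import joblib
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("pe_features_reduced.csv")        # hypothetical dataset after feature selection
X = data.drop(columns=["legitimate"])
y = data["legitimate"]                               # 1 = benign, 0 = malware (assumed labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
best_model, best_acc = None, 0.0
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = model.score(X_test, y_test)                # compare accuracies of the two classifiers
    print(name, "accuracy:", acc)
    if acc > best_acc:
        best_model, best_acc = model, acc

# Persist the better classifier and its feature order for the detection phase.
joblib.dump((best_model, list(X.columns)), "classifier.pkl")
```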
4.5 Detection
Finally, the detection phase. For this phase, the web application is created using the
Python Flask framework. Here, a user can upload any exe file; then, at the backend, the
machine learning algorithm runs and identifies whether the uploaded file is malware or not,
and the result is notified to the user.
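A minimal sketch of such a Flask upload endpoint follows; the route name, the pickled model file and the feature-extraction stub are assumptions for illustration.

```python
from flask import Flask, request
import joblib

app = Flask(__name__)
model, feature_names = joblib.load("classifier.pkl")   # saved during the training stage

def extract_features(path, names):
    # Placeholder: in the real system this would parse the PE header and return
    # the selected attributes in the order used during training.
    return [0] * len(names)

@app.route("/check", methods=["POST"])
def check_file():
    uploaded = request.files["exe_file"]                # file field from the upload form
    uploaded.save("/tmp/upload.exe")
    features = extract_features("/tmp/upload.exe", feature_names)
    label = model.predict([features])[0]
    return "legitimate" if label == 1 else "malware"

if __name__ == "__main__":
    app.run()
```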
5 Experimental Results
5.1 Phase-1 Learning
After applying a tree-based feature selection algorithm to the data set for dimensionality
reduction, 12 out of 54 attributes were identified as independent attributes for the
classification process. These attributes are shown in Fig. 2.
Then the classification algorithms were applied on the reduced dataset. The accuracies
of the algorithms are shown in the form of confusion matrices in Fig. 3: an accuracy of
98.9% was observed for the decision tree and 99.4% for the random forest on the dataset
taken.
Fig. 3. Confusion matrix for (a) decision tree and (b) random forest
Figure 5 shows the initial page of the web app where a user can upload a file. The
application was first checked by uploading a malicious file. Figure 6 shows the screen
shot of uploading a malware file called Metasploit.exe and Fig. 7 shows the result after
the file is checked at the back end.
Similarly, the application was checked by uploading the legitimate files. Figure 8
shows the screenshot of uploading benign file chromesetup.exe file. Figure 9 shows the
result after uploading the file.
6 Conclusion
This paper proposes a user-friendly web application which helps to test whether a .exe
file is malware or not by using machine learning algorithms. Our approach combines the
use of algorithms like decision tree and random forest to generate an anti-malware
detector. The false-detection rate of this app depends on the dataset considered. By using
the latest datasets, it is still possible to improve the accuracy rates.
References
1. Ye Y, Wang D, Li T, Ye D, Jiang Q (2008) An intelligent PE-malware detection system
based on association mining. J Comput Virol 4(4):323–334
2. Rad BB, Nejad MKH, Shahpasand M (2018) Malware classification and detection using
artificial neural network. J Eng Sci Technol. Special Issue on ICCSIT 2018, pp. 14–23
3. Ye Y, Wang D, Li T, Ye D (2007) IMDS: intelligent malware detection system. In:
Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery
and data mining, pp 1043–1047
4. Bat-Erdene M, Park H, Li H, Lee H, Choi MS (2017) Entropy analysis to classify unknown
packing algorithms for malware detection. Int J Inf Secur 16(3):227–248
5. Chowdhury M, Rahman A, Islam R. (2017) Malware analysis and detection using data
mining and machine learning classification. In: International conference on applications and
techniques in cyber security and intelligence, June 16, pp 266–274
6. Sgroi M, Jacobson D (2018) Dynamic and system agnostic malware detection via machine
learning. Creative Components 10
7. Saxe J, Berlin K (2015) Deep neural network based malware detection using two-
dimensional binary program features. In: 10th international conference on malicious and
unwanted software (MALWARE), October 20. IEEE, pp 11–20
8. Gavriluţ D, Cimpoeşu M, Anton D, Ciortuz L (2009) Malware detection using machine
learning. In: 2009 international multi conference on computer science and information
technology, October 12. IEEE, pp 735–741
9. Murty MR, Murthy JV, Reddy P, Satapathy SC (2011) A dimensionality reduced text data
clustering with prediction of optimal number of clusters. Int J Appl Res Inf Technol Comput
2(2):41–49
10. Buczak AL, Guven E (2016) A survey of data mining and machine learning methods for
cyber security intrusion detection. IEEE Commun Surv Tutorials 18(2):1153–1176
11. Mitchell TM (1997) Machine learning. McGraw-Hill, New York
1 Introduction
Home automation is the control of home devices from a central control point;
automation is today's reality, where more and more things are being refined each day [9].
Generally, the crucial tasks concern switching scheduled or selected devices on or off,
either remotely or from within proximity.
Creating energy-efficient and sustainable power source innovations is turning into a
must, and interest is escalating in a number of nations around the earth. An energy-efficient
home is one that is fully observed and automated using the Internet of Things (IoT). The
growth in the ubiquity of IoT has generally spread to basic in-home applications and
regular tasks.
The role of IoT in the home has the end goal of energy monitoring and saving while at
the same time achieving and keeping up an explicit degree of comfort. Home automation
frameworks utilizing IoT comprise three noteworthy parts. The first part is the sensing and
data acquisition component. This is achieved by placing sensors or devices, also called the
hardware, at a few areas throughout the dwelling to measure and accumulate the sought
data, for example heat, light intensity, and gas.
The second piece of the framework is the data processing. Sensors provide data in raw
form. These data are sent to the controller through a transmission scheme. The controller
at that point makes an interpretation of the data into comprehensible values. These values
are transmitted to a device to be controlled automatically and additionally to a UI.
The last part of IoT automation is the web. Most frameworks use a server to transfer
data after processing, so that it can be accessed by the client. The web also monitors data
and controls devices remotely. By automatically executing a few directives, automation
frameworks can save time, provide a higher degree of comfort within the home, and save
energy. In this work we use sensors to measure the conditions of the home environment.
The sensed values are transferred to the microcontroller. The microcontroller sends
directions to the devices to perform the required actions. The appliance status is shown on
the LCD display.
2 Literature Review
Muhammad Asadullah et al. [9] proposed a low-cost and easy-to-use remotely
controlled home automation framework demonstrated using an Arduino board, a Bluetooth
module, a smartphone, an ultrasonic sensor and a moisture sensor. A smartphone
application is used in the proposed framework, which enables the users to control up to 18
devices, including home appliances and sensors, using Bluetooth technology.
Kodali and Jain [10] focus on building a smart secured home security framework which
sends alerts to the administrator via the Internet if there is any intrusion, and raises an
alarm; the alert and the status sent by the Wi-Fi connected, microcontroller-managed
framework can be received by the user on his mobile phone from any location, independent
of whether his mobile phone is connected to the same network.
According to our overview, there exist numerous frameworks that can control home
appliances using Android-based phones/tablets. Every framework has its own special
highlights. Presently certain organizations are officially registered and are attempting to
provide better home automation framework features. The following models portray the
work performed by others. Sriskanthan [11] explained a model for home automation using
Bluetooth by means of a PC; unfortunately, however, that framework lacks support for
mobile technology.
In the proposed framework, temperature, LDR and gas sensors are utilized as input
devices. The microcontroller reads the sensor data, and according to the sensor data the
microcontroller operates output devices such as a DC fan, DC motor and bulb; the
architecture of the system is shown in Fig. 1. Additionally, we are utilizing a GPRS/GSM
module for accessing the web server, so the framework will refresh the status of each
appliance on the web server; likewise, we can control appliances from the web server.
The main use of this proposed method is to offer comfortable living conditions to the
user. We have discussed three different applications performed in the home. The first
application switches the fan on or off based on the room temperature. The second
application helps switch the light on or off according to the light intensity. The third
application finds usage in the kitchen, to detect leakage of gas.
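A simplified control loop for these three applications might look like the sketch below; the threshold values and the sensor/actuator driver functions are placeholders, since the paper does not specify them.

```python
import time

TEMP_THRESHOLD = 30      # degrees Celsius: above this, the fan is turned on (assumed value)
LDR_THRESHOLD = 300      # light level: below this, the bulb is turned on (assumed value)
GAS_THRESHOLD = 400      # gas sensor reading: above this, raise a leakage alert (assumed value)

def control_loop(read_temperature, read_ldr, read_gas, set_fan, set_bulb, alert):
    """Poll the sensors and drive the actuators; the callables wrap the real board drivers."""
    while True:
        set_fan(read_temperature() > TEMP_THRESHOLD)   # application 1: fan on room temperature
        set_bulb(read_ldr() < LDR_THRESHOLD)           # application 2: bulb on light intensity
        if read_gas() > GAS_THRESHOLD:                 # application 3: kitchen gas leakage
            alert("Gas leakage detected in kitchen")
        time.sleep(2)                                  # poll the sensors every two seconds
```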
3.1 Microcontroller
This section forms the control unit of the entire project. It essentially comprises a
microcontroller along with its associated hardware, such as a crystal with capacitors, reset
circuitry and pull-up resistors (if necessary). The microcontroller forms the core of the
project, since it controls the devices being interfaced and communicates with them as
specified by the program being executed.
ARM is the abbreviation of Advanced RISC Machines; it is the name of a family of
processors, and the name of a company as well. The RISC instruction set, and the related
decode module, are a lot less complex than those of Complex Instruction Set Computer
(CISC) designs.
3.5 DC Motor
A DC motor depends on the principle that like magnetic poles repel and unlike
magnetic poles attract one another. A coil of wire with a current passing through it
produces an electromagnetic field aligned with the centre of the coil. By switching the
current on or off, its magnetic field can be turned on or off, or by switching the direction of
the current within the coil, the direction of the produced magnetic field can be reversed by
180°.
4 Results
The experimental results are shown in Figs. 2, 3, 4 and 5.
Fig. 5. The LDR value is below the threshold value, so the bulb turns ON
In this work we implemented a smart home automation framework for sensing and
monitoring home appliances by utilizing IoT technology. The design of the smart home
automation is entirely flexible and can be easily extended and applied to bigger structures
by increasing the number of sensors, measured parameters, and controlled devices. Greater
functionality and speed may additionally be added to the framework, allowing the home
automation framework to develop, adapt, and advance on its own utilizing advanced IoT.
References
1. Lamine H, Abid H (2014) Remote control of domestic equipment from an Android
application based on Raspberry Pi card. In: IEEE transaction 15th international conference
on sciences and techniques of automatic control & computer engineering - STA’2014, 21–23
December 2014. Hammamet, Tunisia
2. Gunge VS, Yalagi PS (2016) Smart home automation: a literature review. National Seminar
on Recent Trends in Data Mining - RTDM 2016
3. Anusha S, Madhavi M, Hemalatha R (2009) IoT based home automation system. The
University of Texas, Austin
4. Shewale AN, Bari JP (2015) Renewable energy based home automation system using
ZigBee. IJCTEE 5(3):6–9
5. Al-Kuwari AM, Ortega-Sanchez C, Sharif A, Potdar V (2011) User-friendly smart home
infrastructure: BeeHouse. In: IEEE 5th international conference on digital ecosystems and
technologies, 31 May–3 June. Daejeon, Korea
6. Shkurti L, Bajrami X, Canhasi E, Limani B, Krrabaj S, Hulaj A (2017) Development of
ambient environmental monitoring system through wireless sensor network using
NodeMCU and WSN monitoring. In: 6th Mediterranean conference on embedded
computing (MECO)
7. Pandey PS, Ranjan P, Aghwariya MK (2017) The real-time hardware design and simulation
of thermoelectric refrigerator system based on Peltier effect
8. Rani G, Pandey PS, Aghwariya MK, Ranjan P (2016) LASER as a medium for data
transmission proceeding of international conference on. ICARE MIT-2016, 9–11 December
2016, Organized by Department of Mechanical Engineering, M.J.P. Rohilkhand University,
Bareilly
9. Asadullah M, Ullah K (2017) Smart home automation system using Bluetooth technology.
https://doi.org/10.1109/ICIEECT.2017.7916544
10. Kodali, RK, Jain, VM, Bose, S, Boppana, L (2016) IoT based smart security and home
automation system. In: 2016 International conference on computing, communication and
automation (ICCCA), pp 1286–1289
11. Sriskanthan N, Tan F, Karand A (2002) Bluetooth based home automation system.
J Microprocess Microsyst 26:281–289
12. Piyare R, Tazi M (2011) Bluetooth based home automation system using cell phone. In:
2011 IEEE 15th international symposium on consumer electronics
13. Ylmaz EN (2006) Education set design for smart home applications. Comput Appl Eng Educ
19(4):631–638
14. Mon Y-J, Lin C-M, Rudas IJ (2012) Wireless sensor network (WSN) control for indoor
temperature monitoring. Acta Polytech Hung 9(6):17–28
Load Balancing in Cloud Through Multi
Objective Optimization
1 Introduction
3 System Design
[Fig. 1. Stages of the optimization process: need for optimization, problem formulation, modelling of the problem, optimization, and implementation and testing of the solution.]
Fig. 2.
3.2 Algorithm
The VM allocation policy is designed by considering the CPU utilization of the physical
servers: a physical server is categorized as overloaded if its utilization is more than 70%,
and a virtual machine is chosen from that physical machine for migration to an under-loaded
physical machine (CPU utilization less than 30%). The algorithm chooses the VM to
migrate based on its bandwidth requirement, so that the migration time is reduced.
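A compact sketch of this selection logic is given below, with simplified Host/VM records standing in for the CloudSim entities; the data structures and thresholds are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OVERLOAD, UNDERLOAD = 0.70, 0.30     # thresholds from the allocation policy above

@dataclass
class VM:
    vm_id: int
    bandwidth: int                   # bandwidth requirement in the simulation

@dataclass
class Host:
    host_id: int
    cpu_utilization: float           # fraction of CPU in use
    vms: List[VM] = field(default_factory=list)

def plan_migrations(hosts: List[Host]) -> List[Tuple[int, int, int]]:
    """Return (vm_id, source_host, target_host) triples for overloaded hosts."""
    migrations = []
    targets = [h for h in hosts if h.cpu_utilization < UNDERLOAD]
    for host in hosts:
        if host.cpu_utilization > OVERLOAD and host.vms and targets:
            vm = min(host.vms, key=lambda v: v.bandwidth)          # cheapest VM to migrate
            target = min(targets, key=lambda h: h.cpu_utilization) # least-loaded destination
            migrations.append((vm.vm_id, host.host_id, target.host_id))
    return migrations
```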
4 Simulation Environment
The cloud environment is simulated using CloudSim 3.0, and the experiments were
conducted with 2 datacenters, 4 physical machines (PMs), 20 virtual machines and 200
independent tasks. Homogeneous PMs were considered with heterogeneous virtual
machines (VMs). The CPU utilization of each physical machine is observed before task
assignment. If the CPU utilization is more than its threshold (70%), then the migration
policy is applied. Sometimes a VM is also migrated from an under-loaded machine so that
the machine can be switched off. The migration cost is calculated based on the bandwidth
requirement, and a VM with less bandwidth utilization is chosen to migrate so that the
migration overhead is kept to a minimum.
[Fig. 3. VM allocation policy: the host machine's CPU utilization drives VM allocation optimization and live VM migration.]
Table 2. VM properties
VM Id MIPS Size (MB) Bandwidth
1–5 500 10000 1000
6–10 1000 10000 2000
11–15 1500 10000 3000
16–20 2000 10000 4000
The following results show that resource utilization is improved after migration for the
4 host machines with the threshold-based VM allocation policy (Fig. 4).
[Fig. 4. Percentage of utilization of the host machines.]
References
1. Beloglazov A, Buyya R (2012) Optimal online deterministic algorithms and adaptive
heuristics for energy and performance efficient dynamic consolidation of virtual machines in
cloud data centres, published online in Wiley online library (wileyonlielibrary.com). https://
doi.org/10.1002/cpe.1867
2. Zafari F, Li J (2017) A survey on modeling and optimizing multi-objective systems. IEEE
Commun Surv Tutorials 19:1867–1901
3. Ramezani F, Li J, Taheri J, Zomaya AY (2017) A multi objective load balancing system for
cloud environments. Br Comput Soc 60:1316–1337
4. Narantuya J, Zang H, Lim H (2018) Service aware cloud to cloud migration of multiple
virtual machines. https://doi.org/10.1109/access.2018.2882651
5. Sethi N, Singh S, Singh G (2018) Multiobjective artificial bee colony based job scheduling for
cloud computing environment. Int J Math Sci Comput 1:41–55
6. Volkova VN, Chemenkeya LV, Desyatirikova EN, Hajali M, Khoda A (2018) Load
Balancing in cloud computing. In: 2018 IEEE conference of Russian young researchers in
electrical and electronic engineering
Classifying Difficulty Levels of Programming
Questions on HackerRank
1 Introduction
Similar to text difficulty classification [6], we classify programming questions based on
characteristics suitable for programming.
In our work, we take attempt data for 47 problems attempted by 14571 users on
HackerRank, and based on how many people attempt a problem, the total number of
attempts, the time taken to reach a successful attempt and the number of successful
attempts, we classify the problems as easy, medium, or hard. This system can later be used
to adaptively suggest a sequence of problems for the student to solve in a manner that
boosts motivation and thus improves the learning outcome.
2 Methodology
Data collection is the first step in any work related to analyzing and predicting
outcomes from data. We used data from HackerRank contests created to train second- and
third-year students in C programming at Vishnu Institute of Technology, Bhimavaram, AP.
There were a total of 47 problems in 4 different contests with over 14 K attempts. Each
problem was identified by the problem setter as easy, medium or hard, with a variable
score depending upon the difficulty level of the problem. Our methodology from data
collection to difficulty level prediction is demonstrated in Fig. 1.
We first collected the HackerRank data using their REST API and converted it into CSV
file format. We then loaded the CSV into a Pandas dataframe, followed by data cleaning,
feature extraction and classification. From the raw fields we derived features such as the
number of attempts of each question (count), the predicted level of each question, and the
time taken to solve the program (time from start). Using the derived features we classify
the difficulty level and finally visualize the results to study the performance.
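A hedged sketch of this preparation step is shown below, assuming the submissions were exported to CSV and using illustrative column names modeled on the fields mentioned above.

```python
import pandas as pd

# Hypothetical export of the per-attempt data collected through the REST API.
df = pd.read_csv("hackerrank_submissions.csv")

# Number of attempts per question.
counts = df.groupby("challenge/name").size().rename("count")

# Average time from contest start to a successful submission, per question.
solved = df[df["score"] > 0]
time_to_solve = (solved.groupby("challenge/name")["time_from_start"]
                 .mean().rename("avg_time_from_start"))

features = pd.concat([counts, time_to_solve], axis=1).reset_index()
print(features.head())
```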
Table 1. Head (first five rows) of the data.
d_q = c \cdot T_q \cdot N_q
Based on the above criterion we predict the difficulty level of the question, categorize
the difficulty level, and use it to improve programming ability, interest, confidence levels,
and to achieve precise improvement in programming.
By this, the average growth rate of programmers will increase to a greater extent.
Taking these points into consideration, the above analysis is favorable, encourages
initiative and prompts the programmer to attempt further programming problems.
In order to calculate the programmer’s intellectuality, we consider the performance
based phenomena called success rate calculator.
The resultant value of success rate lies between 0 and 1 i.e., (0 <= success rate
<= 1).
If the value of success rate nearly equal to 1, then the performance of the pro-
grammer is considerable under more success rate i.e., success rate <= 1.
If the value of the success rate is nearly equal to 0, then the performance of the
programmer is considered to be at a medium success rate.
If the value of the success rate is exactly equal to 0, then the performance of the
programmer is considered to be at a low success rate, i.e., success rate = 0.
If the value of the success rate is exactly equal to 1, then the performance of the
programmer is considered to be at a perfect success rate, i.e., success rate = 1.
When the number of solved questions increases, the success rate increases, and on this
basis the programmer is encouraged not to make too many attempts per submission.
From the difficulty level formula, each problem can be categorized into one of the three
levels: Easy, Medium and Hard.
So, the interface will visualize both the instructor's assessment of the difficulty level of
a problem and the level obtained from our formula (the predictor).
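A small sketch of the two quantities discussed above is given below; the constant c and the level cut-offs are assumptions for illustration.

```python
def difficulty(avg_time_to_solve, num_attempts, c=1.0):
    """d_q = c * T_q * N_q, the difficulty score of a question."""
    return c * avg_time_to_solve * num_attempts

def level(d, easy_cutoff, hard_cutoff):
    """Map a difficulty score onto the Easy / Medium / Hard categories."""
    if d < easy_cutoff:
        return "Easy"
    return "Medium" if d < hard_cutoff else "Hard"

def success_rate(successful_attempts, total_attempts):
    """Lies between 0 and 1; 1 means every attempt was successful."""
    return 0.0 if total_attempts == 0 else successful_attempts / total_attempts

# Illustrative values only: 40 minutes average solve time, 120 attempts, 3 of 5 correct.
print(level(difficulty(40, 120), easy_cutoff=2000, hard_cutoff=6000))
print(success_rate(3, 5))
```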
3 Results
In our statistical analysis we have incorporated the following features in order to
predict the difficulty level of programming questions based on the programmers' data, i.e.,
from the programmer's point of view.
The features involved are:
Time taken to solve (time from start)
Score assigned to the programmer (score)
Instructor-specified level (challenge/slug)
The above features provoke the following questions:
How can performance be classified on the basis of measures of difficulty?
How does the time spent on a question indicate the difficulty of the question?
Based on the above features we can classify the questions into three levels (Easy,
Medium, Hard). How have we classified them?
Assessment of difficulty of questions:
As mentioned earlier, based on the time taken by the programmer to solve a question
correctly, we have classified the difficulty of questions. Answering a tougher question takes
more time for a programmer, as it takes time to work out the logic, while answering an
easier problem takes less time. So, time acts as a parameter to determine the difficulty of
the question.
On the HackerRank platform, tougher questions are given more marks compared to
easier ones, and on tougher problems there may be partial execution, which results in a
deduction of marks for the programmer.
The level specified by the instructor also affects the difficulty of the problem, which in
turn affects the marks scored by the programmer.
The approaches and techniques used in this work are:
Clustering using k-means: in this algorithm we grouped the data points into easy,
medium and hard clusters (Fig. 3).
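A minimal sketch of this k-means grouping follows, assuming a per-question feature table with the attempt count and average solve time; k = 3 so the clusters can be read as easy, medium and hard. The file and column names are illustrative.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = pd.read_csv("question_features.csv")          # hypothetical per-question table
X = StandardScaler().fit_transform(features[["count", "avg_time_from_start"]])

# Three clusters, later interpreted as easy, medium and hard.
features["cluster"] = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)
print(features.head())
```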
Bar plots: We plotted the data points on a bar plot in which the x-coordinate is the
number of attempts of each question (count) and the y-coordinate is the name of the
question (challenge/name) (Fig. 4).
Fig. 4. Bar plot between the question name and the attempt count.
The higher the bar, the tougher the question. From the above data, "display 1 to n
without using loops" is the easiest question and "senior citizen or not" is the toughest one.
The difficulty rating matters for all the features of the corresponding question; the score
mentioned on the y-axis rates the difficulty.
Scatter plot: We plotted a graph comparing the level of each question given by the
instructor with the level of the question we predicted based on the programmers' view
(Fig. 5).
Fig. 5. The level of each question given by the instructor and the level of the question
that we predicted.
Consider the question "printing pattern 6" from the above plot. According to the
instructor it is a level-2 question, but in our prediction we conclude it is level 3 based on the
students' point of view. Considering the "how many doors are open" question, it is a level-1
question with respect to the instructor, whereas in our prediction we conclude it to be a
level-3 question. The "ordered array" question, however, is treated with the same level of
difficulty by both the instructor and the programmer. So, here we can say that the difficulty
rating of a question specified by the instructor may or may not coincide with the student's
or programmer's point of view.
The above plots show the difference between the difficulty levels of programs specified
by the instructor and the difficulty the students actually experience. This may result in a
mismatch between the instructor and the programmer. To rectify this, we have predicted
the difficulty level of the questions on the basis of the programmers' performance.
Using this ratio for all the programmers, we can also derive the best performer of the
course. If a person has attempted a considerable number of questions with the above-
calculated ratio close to 1, then we can conclude that he/she is one of the best programmers
of that course.
4.2 Conclusion
In this paper we aim to give the programmer a better indication of the difficulty level
he or she may feel about a question. This will boost the confidence of the programmer, and
he/she will take more interest in coding. It will also improve their coding agility and
provoke a feeling of competitive programming among users.
References
1. Joshi S (2004) Tertiary sector-driven growth in India: impact on employment
2. Chowdhury T, Rafiq Ullah A, Maqsud Ul Anwar MD (2017) A non-classical approach to
recommender system for competitive programmers. Doctoral dissertation, BRAC University
3. Teodorescu RE, Seaton DT, Cardamone CN, Rayyan S, Abbott JE, Barrantes A, Pawl A,
Pritchard DE (2012, February) When students can choose easy, medium, or hard homework
problems. In: AIP conference proceedings, vol 1413, no 1, pp 81–84
4. Klein C (2018) What do predictive coders want? Synthese 195(6):2541–2557
5. Lee FL, Heyworth R (2000) Problem complexity: a measure of problem difficulty in algebra
by using computer. Educ J-Hong Kong-Chin Univ Hong Kong 28(1):85–108
6. Ramakrishna Murty M, Murthy JVR, Prasad Reddy PVGD (2011) Text document
classification based on a least square support vector machines with singular value
decomposition. Int J Comput Appl (IJCA) 27(7):21–26. https://doi.org/10.5120/3312-4540,
[impact factor 0.821, 2012]
Self-Adaptive Communication of Wireless
Sensor Networks
1 Introduction
Internet of Things (IoT) assists in forming an intelligent network that can be sensed,
monitored and controlled with IoT enabled devices that use embedded technology to
communicate with each other or the Internet. IoT-based Wireless Sensor Networks
(WSNs) are prevalent in many fields because of their ability to deploy small, low-power,
battery-operated and low-cost sensors for monitoring applications. A wireless sensor
network is a self-configuring wireless network comprising spatially dispersed devices that
use sensors to monitor physical or environmental conditions at different locations. Wireless
sensor networks function unattended: the sensor nodes are deployed randomly and are
expected to self-organize themselves to form multi-hop networks [5].
The foremost challenge in wireless sensor networks is frequent node failure due to the
harsh environment, energy depletion and interference from the environment. Nodes might
crash or be moved physically, resulting in changes in the network topology and thus
disturbing the network functionality. Similarly, the dynamic nature of such a network
allows new nodes to enter the network, which also often leads to topology changes. So
nodes in a wireless sensor network must act autonomously to recover from environmental
disturbances by adapting and organizing themselves in the network without requiring
human intervention, while also providing well-organized information exchange methods,
especially in multi-hop scenarios. In this paper, we mainly discuss how nodes self-heal and
reorganize the network topology in such a dynamic and decentralized network.
The rest of this paper is organized as follows. Section 2 gives a brief introduction to the 6LoWPAN protocol. Section 3 presents an overview of the RPL routing protocol. In Sect. 4, we discuss the experimental implementation and testing results of the RPL protocol. Finally, we draw conclusions in Sect. 5.
2 6LoWPAN
The most important feature of IoT is the communication between devices, which is provided by communication protocols. Some of the communication protocols for IoT are Zigbee, Z-Wave, Bluetooth, LoRaWAN, 6LoWPAN and Sigfox. Out of the available communication protocols, IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN) is used. It is an open standard low power wireless network protocol facilitating IPv6 networking on devices constrained in memory, power and processing that run IEEE 802.15.4. It bridges the gap between low power devices and the IP world [10] by using an adaptation layer between the MAC and network layers so as to provide interoperability between IEEE 802.15.4 and IPv6 [11]. The main tasks of the adaptation layer are compressing the IPv6 header, fragmenting the IPv6 payload and compressing the UDP header. Routing is particularly difficult for 6LoWPAN networks given the low-power and lossy radio links, battery supplied nodes, multi-hop mesh topologies and frequent topology changes due to mobility, which require routing to be self-manageable without human intervention. Therefore, a routing protocol for Low-power and Lossy Networks called RPL was proposed by the IETF ROLL Working Group, addressing the routing requirements of numerous applications such as industrial automation [14], home automation [15] and building automation [17].
3 Related Work
Research works have mainly concentrated on simulation-based evaluation of the RPL protocol with various simulators. The authors in [18] performed a study and evaluation of RPL repair mechanisms on metrics such as convergence time, power consumption and packet loss. In [13], RPL performance with various network settings is evaluated using the Cooja simulator for several metrics such as signaling overhead, latency and energy consumption. The authors in [19] proposed an RPL objective function based on fuzzy logic. RPL with two objective functions, ETX and hop count, is evaluated in the Cooja simulator for parent selection in [20]. A study on RPL performance is done in [21] using the OMNeT++ simulator with the ETX metric in forming the DODAG topology.
(Figure: example DODAG topology with nodes A–F, showing (1) DIO multicast and (2) DAO unicast with next-hop, along with the preferred and alternate paths.)
Upward Routing
The 6LoWPAN Border Router, or root node, begins creating the DODAG topology by multicasting a DIO control message to downward nodes as an announcement of its presence in the network. The nodes in the wireless radio range of the root receive the DIO and
process the DIO message to compute their rank and select a parent based on the metrics defined by an objective function. In addition, a node that receives a DIO processes it to decide whether to join the DODAG. When a node chooses to join the graph, a route is established towards the root node by updating the DIO message and multicasting the updated DIO message to its neighbors. This procedure persists until the entire network is connected by forming paths using DIO messages, which is called upward routing. Meanwhile, if a new node needs to join the graph, or when a node is booted for the first time, the node sends a DIS message to solicit DODAG formation information from neighboring nodes.
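As a toy illustration of this DIO-driven parent selection (not the Contiki/RPL implementation itself), the sketch below lets a node pick, among the neighbors it has heard DIOs from, the one that minimizes its own rank under a simple hop-count style objective function. The rank step constant and data structures are assumptions for illustration.

# Toy sketch of DIO processing: a node picks the advertising neighbor that
# minimizes its own rank (simple hop-count style objective function).
MIN_HOP_RANK_INCREASE = 256  # rank step assumed for illustration

def process_dio(node, dio_messages):
    """dio_messages: list of (neighbor_id, neighbor_rank) pairs heard by this node."""
    best_parent, best_rank = None, float("inf")
    for neighbor_id, neighbor_rank in dio_messages:
        candidate_rank = neighbor_rank + MIN_HOP_RANK_INCREASE
        if candidate_rank < best_rank:
            best_parent, best_rank = neighbor_id, candidate_rank
    node["preferred_parent"] = best_parent
    node["rank"] = best_rank
    return node

# Example: a booted node that heard DIOs from the root (rank 256) and node B (rank 512)
print(process_dio({}, [("root", 256), ("B", 512)]))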
Downward Routing
RPL creates and maintains routes in the downward direction with DAO messages, thus propagating destination information upward along the DODAG; this is called downward routing, and it supports two modes: storing mode and non-storing mode. Every node in storing mode keeps a downward routing table for its sub-DODAG and consults this table to choose the next node for forwarding a data packet, whereas nodes in non-storing mode do not store downward routing tables and each node propagates its list of parents to the root node, which in turn computes the paths to the destinations. In this paper, the downward paths are created using storing mode.
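The sketch below illustrates storing-mode behaviour in the same toy setting: as a DAO message travels towards the root, every node on the path records a downward entry for the advertised destination. Names and structures are illustrative assumptions, not the protocol implementation.

# Toy sketch of storing-mode DAO propagation: each node on the path to the
# root records which child leads downward to the advertised destination.
def propagate_dao(destination, path_to_root, routing_tables):
    """path_to_root: [destination, parent, grandparent, ..., root] (node ids).
    routing_tables: dict mapping node id -> {destination: next_hop_down}."""
    for i in range(1, len(path_to_root)):
        node = path_to_root[i]
        next_hop_down = path_to_root[i - 1]   # the child the DAO arrived from
        routing_tables.setdefault(node, {})[destination] = next_hop_down
    return routing_tables

# Example: node F sends a DAO along F -> D -> B -> root
tables = propagate_dao("F", ["F", "D", "B", "root"], {})
# tables["root"]["F"] == "B", tables["B"]["F"] == "D", tables["D"]["F"] == "F"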
A local repair is initiated by node A, as shown in Fig. 6, where it repairs itself and rejoins the network topology by establishing a route back to the LBR in the DODAG and starts multicasting DIO control messages. Meanwhile, node B, which has no neighboring nodes in its radio range, multicasts DIS control messages and, on receiving a DIO from the repaired node A, rejoins the network topology, as shown in Fig. 7.
A dynamic routing path is created where the node self-heals in the absence of an already existing node and joins the DODAG. It is clearly observed from the above results that RPL supports multi-hop mesh topology routing in low-power networks and self-adapts to network topology changes caused by environmental disturbances and node failures.
6 Conclusions
References
1. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol
Biol 147:195–197
2. Winter T, Thubert P, Brandt A, Hui J, Kelsey R, Pister K, Levis P, Struik R, Vasseur JP,
Alexander R (2012) RPL: IPv6 routing protocol for low power and lossy networks. IETF
RFC: 6550
3. Contiki: The open source OS for the internet of things. https://www.contiki-os.org
4. Vasseur JP (2014) Terms used in routing for low power and lossy networks. IETF RFC:
7102
5. Vasseur JP, Agarwal N, Hui J, Shelby Z, Bertrand P, Chauvenet C (2011) RPL: The IP
routing protocol designed for low power and lossy networks. Internet Protocol for Smart
Objects (IPSO) Alliance
6. Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a
survey. J Comput Networks 38(4):393–422
7. Gaddour O, Koubaa A (2012) RPL in a nutshell: a survey. Comput Networks 56:3163–3178
8. Dunkels A, Gronvall B, Voigt T Contiki – a lightweight and flexible operating system for
tiny networked sensors
9. Texas Instruments: CC1310 data sheet. www.ti.com
10. Ko JG, TerZis A, Dawson-Haggerty S, Culler DE, Hui JW, Lewis P (2011) Connecting low-
power and lossy networks to the internet. Commun Mag IEEE 49(4):96–101
11. Kushalnagar N, Montenegro G, Schumacher C et al (2007) IPv6 over low-power wireless
personal area networks (6LoWPANs): overview, assumptions, problem statement and goals.
RFC 4919
12. Montenegro G, Kushalnagar N, Hui J, Culler DE (2007) Transmission of IPv6 packets over
IEEE 802.15.4 networks. RFC 4944
13. Zhang T, Li X (2014) Evaluating and analyzing the performance of RPL in Contiki
14. Thubert P, Pister K, Dwars S, Phinney T (2009) Industrial routing requirements in low
power and lossy networks. IETF RFC 5673
15. Brandt A, Buron J, Porcu G (2010) Home automation routing requirements in low power
and lossy networks. IETF RFC 5826
16. Dohler M, Barthel D, Watteyne T, Winter T (2009) Routing requirements for Urban low-
power and lossy networks. IETF RFC 5548
17. Martocci J, De Mil P, Riou N, Vermeylen W (2010) Building automation routing
requirements in low power and lossy networks. IETF RFC 5867
Self-Adaptive Communication of Wireless Sensor Networks 317
18. Khelifi N, Kammoun W, Youssef H (2014) Efficiency of the RPL repair mechanisms for low
power and lossy networks: IEEE
19. Gaddour O et al (2014) OF-FL: QoS-aware fuzzy logic objective function for RPL routing
protocol, pp 365–372
20. Banh M et al (2015) Performance evaluation of multiple RPL routing tree instances for
internet of things applications. IEEE
21. Tripathu J, Oliveira J, Vasseur J (2010) A performance evaluation study of RPL: routing
protocol for low power and lossy networks, pp 1–6
Evaluation of Performance Metrics
in GeoRediSpark Framework for GeoSpatial
Query Processing
Abstract. Nowadays we are moving towards digitization, and all our devices are producing big data. This big data has a variety of forms and has paved the way for the emergence of NoSQL databases such as Cassandra, MongoDB and Redis. Big data such as geospatial data requires geospatial analytics in applications such as tourism, marketing and rural development. The Spark framework provides operators for storing and processing distributed data. Our earlier work proposed "GeoRediSpark" to integrate Redis with Spark. Redis is a key-value store that uses in-memory storage; hence integrating Redis with Spark can extend the real-time processing of geospatial data. The paper investigates storage and retrieval using Redis built-in geospatial queries and adds two new geospatial operators, GeoWithin and GeoIntersect, to enhance the capabilities of Redis. Hashed indexing is used to improve processing performance. A comparison of Redis metrics on three benchmark datasets is made in this paper. A hashset is used to display geographic data. The output of geospatial queries is visualized with respect to type of place and nature of query using Tableau.
1 Introduction
Companies that use big data for business challenges can gain an advantage by integrating Redis with Spark. The Spark framework provides support for analytics, where process execution is fast because of in-memory optimization. Out of the various NoSQL databases, Redis provides key-value pairs and in-memory storage, and suits applications that require fast results. As such, when integrated, Redis and Spark together can index data efficiently and help in analytics for a variety of data driven applications. Geospatial data helps in identifying the geographic location of an object, its features and its boundaries on earth. Such data can be analyzed to serve various purposes such as tourism, health care, geo marketing and intelligent transportation systems.
Even though Redis has no declarative query language support, data can be indexed like in relational databases and structured as JSON fragments. Cassandra monitors nodes, handles redundancy and can avoid lazy nodes, whereas Redis can monitor these activities at a higher level of granularity. Even though some works have been reported for labelling and retrieving Redis data, they are not efficient either at indexing or at retrieval. This paper aims at adding the functionality of spatial querying to the Redis database by integrating it with Spark.
Hashed sharding computes a hash value for each shard key and, based on this, each chunk is assigned to a Redis instance. Ranged sharding divides data into ranges based on shard key values and then assigns each chunk to a Redis instance. Let the Redis instances be numbered 0, 1, ..., n − 1, where n is the total number of instances in the cluster. Let the range be R = 16384 and the hash function be CRC16 of the key modulo 16384; the hash function then maps each key-value pair to a Redis instance in this range, as shown in Fig. 1.
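A minimal sketch of this slot computation is given below. The CRC16 variant (XMODEM, as used by Redis Cluster) and the contiguous slot-to-instance assignment are assumptions made for illustration rather than the paper's exact mapping.

def crc16_xmodem(data: bytes) -> int:
    """CRC16 with the XMODEM polynomial 0x1021, the variant assumed for Redis hash slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def instance_for_key(key: str, n_instances: int, total_slots: int = 16384) -> int:
    """Map a key to one of n instances: slot = CRC16(key) mod 16384,
    then assign contiguous slot ranges to instances."""
    slot = crc16_xmodem(key.encode()) % total_slots
    slots_per_instance = total_slots // n_instances
    return min(slot // slots_per_instance, n_instances - 1)

# Example: place a key on a 3-instance cluster
print(instance_for_key("Hyderabad", 3))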
The present study proposed "GeoRediSpark" to integrate Redis with Spark for efficient query processing on the existing operations GeoRadius and GeoDist, to enhance the functionality of Redis by adding GeoWithin and GeoIntersect, and finally to visualize the query output. This paper is organized as follows: Sect. 2 presents a literature survey on existing works for geospatial query processing. The proposed system is briefed in Sect. 3. Results are discussed in Sect. 4.
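For context, the sketch below exercises Redis' built-in geospatial commands (GEOADD, GEODIST, GEORADIUS) through redis-py. The key names and coordinates are made up, and the GeoWithin/GeoIntersect operators described in the paper are custom additions on the Spark side, so they are only indicated in a comment here.

import redis

r = redis.Redis(host="localhost", port=6379)

# Store two places (longitude, latitude, member) under one geo key
r.execute_command("GEOADD", "places:india", 78.4867, 17.3850, "Hyderabad")
r.execute_command("GEOADD", "places:india", 80.2707, 13.0827, "Chennai")

# Built-in operators referenced by the paper: distance and radius search
dist_km = r.execute_command("GEODIST", "places:india", "Hyderabad", "Chennai", "km")
nearby = r.execute_command("GEORADIUS", "places:india", 78.5, 17.4, 50, "km")
print(dist_km, nearby)

# GeoWithin / GeoIntersect are custom operators added in the GeoRediSpark layer,
# not Redis built-ins; a client would invoke them through that layer.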
2 Literature Survey
scheduler for spatial query processing and optimization. It generates query execution plans using spatial indexing techniques. Although the query execution performance is improved and communication cost is reduced, their architecture is costly to implement and the use of filters can increase the processing time of a query. SpatialHadoop and GeoSpark are discussed in [6]; the authors proved that GeoSpark is faster than SpatialHadoop for geospatial big data analytics. A spatial data processing system that schedules and executes range search is described in [7]; k-NN, spatio-textual, spatial-join and kNN-join queries are described, and Bloom filters are used to reduce network communication. Hashed indexing can improve the query processing time by reducing the memory used for storing global and local indexes. LocationSpark caches frequently accessed data in memory while storing less frequently used data on disk, but the usage of filters on the spatial data increases the implementation cost of this architecture. The authors of [8] described the Panda architecture for spatial predictive queries such as predictive range, k-NN and aggregate queries. The advantage of the Panda system is that it can display the answer on the right side of the interface along with a set of statistics showing the system behaviour, and pre-computed areas are marked and illustrated. But the usage of grid and list data structures to store the data may cause memory wastage, and processing the grid data structures requires more time. The identification of object movement is very important in the system, and variations may lead to major differences. Distributed Profitable-Area Query (DISPAQ) is described in [9]. It identifies profitable areas from raw taxi trip data using a PQ-index. A Z-skyline algorithm prunes multiple blocks during query processing, but the usage of a predictive function increases the computation overhead, though it reduces the processing time for executing frequent queries. Performance can be improved by dividing the places into zones. Data mining is used to increase the marketing of an educational organization in [10]. The authors used student residence distance, calculated by the Haversine formula (orthodromic distance), and k-means to cluster the locations. The student residential address is located using latitude and longitude, and the minimum, maximum and average distances are visualized. Their visualization can be used by organizations with 1000 to 2000 students to improve the admission rate; for huge data, visualization is difficult. A surrounding join query is described in [11].
The present work mainly focuses on understanding the existing architecture of Redis and integrating it with Spark for processing geospatial queries, and on incorporating location-based functionality such as GeoWithin and GeoIntersect in addition to the GeoRadius and GeoDist built-in commands of Redis. To visualize the results, Redis Hashset visualization and GeoJSON are used with Tableau, as it handles different types of data and supports business analytics operations for generating automatic reports.
3 Proposed System
This section presents the detailed functionality of the proposed system to perform geospatial querying in the GeoRediSpark architecture. Here, Redis is integrated with the Spark framework, as shown in Fig. 2, so that the query response time for spatial data analysis can be optimized. Details of the architecture and methodology can be found in our previous work [12]. A cluster of 3 nodes was used, where each node has an Intel Core i5
Table 2 presents Redis metrics [16] derived using the Redis-Stat monitoring tool. Such monitoring helps in identifying problems and in enhancing the experimental setup. The monitored metrics are: for performance, latency (response time) L in milliseconds; for memory, used memory M; for basic activity, keyspace (resource utilization) KS; for persistence (to check the volatility of the dataset), rdb_changes; and for errors, rejected connections RC.
It can be observed from Table 2 that latency is approximately 2/s for each of the datasets, meaning that we cannot get a response before that time because of the execution of commands such as intersection and sorting of the result. Memory used is 869 kB, which is not greater than the total available memory, indicating that the Redis instance is not at risk and does not require swapping. Consistent hashing at storage and retrieval (HSET and HGET commands) helped to avoid the swapping process. Memory based storage and query processing require greater I/O speed; this drawback is overcome by in-memory geospatial data storage and processing [17].
Figures 3, 4, 5 and 6 present the Redis metrics for the three datasets.
As Redis is an in-memory store, if the keyspace is larger, then more physical memory is required. The keyspace is approximately 599.3, and keyspace_misses and rejected connections are zero, indicating optimal performance of geospatial query processing with the Redis-Spark integration. Rdb_changes is approximately 25420 m; the value increases with the time of execution.
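The same counters can also be pulled programmatically from Redis' INFO output. The sketch below, using redis-py, reads the fields that correspond to the metrics discussed above; the field names are standard INFO keys, but the exact selection is an illustrative assumption.

import redis

r = redis.Redis(host="localhost", port=6379)
info = r.info()  # full INFO output as a dictionary

metrics = {
    "used_memory": info.get("used_memory"),                                  # memory metric M
    "keyspace_misses": info.get("keyspace_misses"),                          # activity metric
    "rejected_connections": info.get("rejected_connections"),                # error metric RC
    "rdb_changes_since_last_save": info.get("rdb_changes_since_last_save"),  # persistence metric
}
print(metrics)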
References
1. Sukumar P (2017) How Spark and Redis help derive geographical insights about customers.
https://build.hoteltonight.com/how-spark-and-redis-help-derive-geographical-insights-about-
customers-be7e32c1f479
2. SriHarsha R (2017) Magellan: Geospatial processing made easy. http://magellan.ghost.io/
magellan-geospatial-processing-made-easy/
3. Nativ S (2017) Building a large scale recommendation engine with Spark and Redis-ML.
https://databricks.com/session/building-a-large-scale-recommendation-engine-with-spark-
and-redis-ml
4. Cihan B (2016) Machine learning on steroids with the new Redis-ML module. https://
redislabs.com/blog/machine-learning-steroids-new-redis-ml-module/
5. Hagedorn S, Götze P, Sattler K-U (2017) The Stark framework for spatial temporal data
analytics on Spark. In: Proceedings of 20th international conference on extending database
technology (EDBT), pp 123–142
6. Tang M, Yu Y, Aref WG, Mahmood AR, Malluhi QM, Ouzzani M (2016) In-memory
distributed spatial query processing and optimization, pp 1–15. http://merlintang.github.io/
paper/memory-distributed-spatial.pdf
7. Tang M, Yu Y, Malluhi QM, Ouzzani M, Aref WG (2016) Location Spark: a distributed in
memory data management system for big spatial data. Proc VLDB Endowment 9(13):1565–
1568
8. Hendawi AM, Ali M, Mokbel MF (2017) Panda: a generic and scalable framework for
predictive spatio-temporal queries. GeoInformatica 21(2):175–208
9. Putri FK, Song G, Kwon J, Rao P (2017) DISPAQ: distributed profitable-area query from
big taxi trip data. Sensors 17(10):2201, 1–42
10. Hegde V, Aswathi TS, Sidharth R (2016) Student residential distance calculation using
Haversine formulation and visualization through Googlemap for admission analysis. In:
Proceedings of IEEE international conference on computational intelligence and computing
research (ICCIC), pp 1–5
11. Li L, Taniar D, Indrawan-Santiago M, Shao Z (2017) Surrounding join query processing in
spatial databases. Proceedings of ADC 2017, pp 17–28. Springer International Publishing
12. Vasavi S, Priyanka GVN, Anu Gokhale A (2019) Framework for visualization of geospatial
query processing by integrating Redis with Spark, pp 1–19, IJMSTR, vol 6, issue 1 (in press)
13. Places in India (2018) http://www.latlong.net/country/india-102.html. Accessed 1 June 2018
14. Geospatial Analytics in Magellan (2018) https://raw.githubusercontent.com/dima42/uber-
gps-analysis/master/gpsdata/all.tsv. Accessed 1 June 2018
15. Spatial dataset (2018) http://www.cs.utah.edu/~lifeifei/research/tpq/cal.cnode. Accessed 1
June 2018
16. Branagan C, Crosby P (2013) Understanding the top 5 Redis performance metrics. Datadog
Inc, pp 1–22
17. Wang Y (2018) Vecstra: an efficient and scalable geo-spatial in-memory cache. In:
Proceedings of the VLDB 2018
CNN Based Medical Assistive System
for Visually Challenged to Identify
Prescribed Medicines
1 Introduction
As per World Health Organization (WHO) statistics given in [1], 253 million people live with vision impairment, out of which 36 million are blind and 217 million have moderate to severe vision impairment [1]. It is also observed that 81% of these people are aged 50 years and above, 55% of visually impaired people are women, and 89% of visually impaired people live in low and middle income countries. Many articles report that in recent years there has been a dramatic increase in prescription drug misuse leading to accidental overdoses. Pill identification can be done either with a computer vision based approach or with a non computer vision based approach (online platforms). Various medication problems faced by visually challenged people are reported in [2], such as being unable to read prescription labels or the expiry date of the medication. Hence, identification of the proper prescribed oral medicine based on the extraction of several features is proposed in this paper to assure personal safety and to facilitate more effective assistive patient care. This medical assistive system for the visually challenged mainly focuses on providing a medical aid for blind people to live independently, without depending on others for their day to day activities. There are many existing systems that serve the need for medical pill recognition, but they have drawbacks in pill recognition, the number of shapes estimated, background modeling estimation and accuracy. A better system overcoming these drawbacks is presented. A few pharmaceutical companies are including the medication name in Braille on the drug package. This paper is organized as follows: Sect. 2 presents a literature survey on pill recognition systems. The proposed system methodology is presented in Sect. 3. Section 4 presents results and discussion.
2 Literature Survey
Work reported in [3] identifies a pill from a single image using a Convolutional Neural Network (CNN). The pill region is identified first and then data augmentation techniques are applied. This work addressed the challenges of minimal labeled data and domain adaptation. It gave a good Mean Average Precision (MAP) score of 0.328 on images with noise, different backgrounds, poor lighting conditions, various resolutions and points of view. But the Region of Interest (ROI) detection did not include segmentation of the pill to completely ignore the background. Pharmaceutical pill recognition using computer vision techniques is described in [4]. The authors initially converted the images to grayscale and then performed background subtraction, followed by an affine transformation to rescale the images. A deep convolutional neural network is used to distinguish pills from other categories of objects, but it would not work well for distinguishing specific pills amongst other pills. The neural network created was 3-layered with 200 hidden nodes and 9 output nodes, and was trained with 10 iterations. Their method gives higher accuracy for shapes such as circle (75%) and oblong (70%) and lower accuracy for shapes like triangle (10%) and square (15%). The authors of [5] presented computer-vision based pharmaceutical pill recognition on a mobile phone. The pill image is captured using the mobile phone and shape and color features are extracted; shapes such as circular, oblong and oval are considered. A neural network based assistive system for text detection with voice output is given in [6]. A webcam is interfaced with a Raspberry Pi that accepts a page of printed text and converts it into a digital article. After performing image enhancement and segmentation, features are extracted and finally audio is generated. There is no clear explanation of the dataset used and the accuracy. An adaptable ring for vision-based measurements and shape analysis is described in [7]. The authors described how to accurately detect the shape of a pill based on an outer and inner ring. The advantage of their method is that new features can be added to detect new shapes, but the disadvantage is that any change in the pill angle reduces the accuracy; image noise also decreases the accuracy of the system. Text detection from natural scene images is discussed in [8]. Three extraction methods for large characters (30 pixels in height) are given, based on Sobel edge detection, Otsu binarization, rule based connected component selection/extraction and RGB color information; edge based text detection gave higher accuracy. Their method could not perform well for large and small characters. Automatic number recognition for bus route information as an aid for the visually impaired is given in [9]. The authors described extraction of bus route number information from natural scenes, which is then converted to audio. Such a system helps visually impaired people to know the bus information without someone's help, but it takes a considerable amount of time to execute on smart phones. Text recognition face
3 Proposed System
Figure 1 presents the architecture of the proposed system for recognizing the pill and informing the visually impaired person about the medicine he or she has picked. It works in two ways:
Case 1: Medicine Pill: The pills are recognized based on their features. The image undergoes preprocessing (noise removal and morphological operations) and edges are detected using the Sobel edge detector. Then the binary mask of the image is extracted. Shape features (the seven Hu moments), color features (six) and texture features (24 values obtained from the Gray Level Co-occurrence Matrix (GLCM)) are obtained. The extracted features are then used to train a layered neural network with 37 neurons at the input layer, one hidden layer with 100 neurons and an output layer with 1000 neurons.
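A minimal sketch of this 37-dimensional feature vector (7 Hu moments + 6 color values + 24 GLCM values) and of a hidden layer of 100 neurons is given below, using OpenCV, scikit-image (recent versions, where the function is named graycomatrix) and scikit-learn. The choice of per-channel mean and standard deviation as the six color features, and the MLPClassifier stand-in for the paper's network, are illustrative assumptions.

import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neural_network import MLPClassifier

def pill_features(bgr_img, binary_mask):
    """Build a 37-dimensional feature vector: 7 Hu moments + 6 color + 24 GLCM values."""
    # 7 invariant Hu moments from the binary mask (uint8, single channel)
    hu = cv2.HuMoments(cv2.moments(binary_mask)).flatten()                    # 7 values
    # 6 color features: per-channel mean and standard deviation (assumed split)
    means, stds = cv2.meanStdDev(bgr_img)
    color = np.concatenate([means.flatten(), stds.flatten()])                 # 6 values
    # 24 texture values: 6 GLCM properties at 4 angles
    gray = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, distances=[1],
                        angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],
                        levels=256, symmetric=True, normed=True)
    props = ['contrast', 'dissimilarity', 'homogeneity', 'energy', 'correlation', 'ASM']
    texture = np.concatenate([graycoprops(glcm, p).flatten() for p in props]) # 24 values
    return np.concatenate([hu, color, texture])                               # 37 values

# One hidden layer of 100 neurons, echoing the 37-100-1000 architecture described;
# the output size follows the number of pill classes in the training labels.
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500)
# clf.fit(train_features, train_labels)   # train_features: N x 37 matrix (assumed prepared)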
Case 2: Medicine Box: When the person picks a pill from the box, the label (text) present on the pill cover is extracted using Optical Character Recognition and compared with the templates available in the database. A text-to-speech engine communicates to the person whether the medicine he or she has picked is the right pill for the right time.
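A minimal sketch of this label-reading step is shown below, using pytesseract for OCR and pyttsx3 for the spoken feedback. The database lookup is reduced to a simple template list, and all file and medicine names are illustrative assumptions rather than the paper's implementation.

import pytesseract
import pyttsx3
from PIL import Image

def check_box_label(image_path, expected_medicines):
    """OCR the box label and speak whether it matches a prescribed medicine."""
    label_text = pytesseract.image_to_string(Image.open(image_path)).lower()
    match = next((m for m in expected_medicines if m.lower() in label_text), None)

    engine = pyttsx3.init()
    if match:
        engine.say(f"This is {match}. It is the right medicine.")
    else:
        engine.say("This medicine is not in your prescription.")
    engine.runAndWait()
    return match

# Example with a hypothetical prescription template:
# check_box_label("box_photo.jpg", ["Metformin", "Cetzine", "Sinarest"])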
(i) Preprocessing: In pre-processing, noise removal is performed by deblurring using a Gaussian filter, and morphological operations such as erosion followed by dilation are applied. Then the image is converted from RGB to gray. Among the three methods Average
(Bar chart: number of pills per shape category, namely wavy triangle, rectangle, pentagon, trapezium, semicircle, capsule, triangle, diamond, square, circle, oblong, oval, donut, shield and softgel, on a 0–500 scale.)
Fig. 2. Statistics of medicines w.r.t shape of the pill and with/without imprint
The dataset is created for the common medications of diseases such as Diabetes, Hypertension, Acute Respiratory Diseases, Arthritis and Polycystic Ovarian Disease. Figures 3, 4, 5, 6 and 7 present the training and testing accuracy for identifying the medicines for each of the diseases with respect to the number of epochs.
Table 1 presents the accuracy of the neural network after 98 and 100 epochs.
Table 1. (continued)
Sno | Disease | Medicines considered | Total samples | Training samples | Testing samples | Accuracy after 100 epochs | Accuracy after 98 epochs
3 | Acute Respiratory Diseases | Azythromycin, Cetzine, Coldact, Deripyhllin, Sinarest | 180 | 144 | 36 | 69.44 | 77.8
4 | Arthritis | Acceloine, Dicoliv, Dipane | 108 | 86 | 22 | 81.81 | 95.45
5 | Polycystic Ovarian Disease | Dytor, Metformin, Spirono | 108 | 86 | 22 | 86.36 | 90.91
For Diabetes and Acute Respiratory Diseases, five medicines are considered; each medicine has 36 images, so a total of 180 images is considered for these medicines. For Arthritis and Polycystic Ovarian Disease, 3 medicines are considered, each having 36 images, for a total of 108 images. For Hypertension, 6 medicines each with 36 images are considered, for a total of 216 images. The overall accuracy achieved for Diabetes is 86.11%, for Hypertension 95.45%, for Acute Respiratory Diseases 69.44%, for Arthritis 81.81% and for Polycystic Ovarian Disease 86.36%. Higher accuracy was achieved by varying the number of epochs. Figure 8 presents a screenshot of identifying the correct medicine at the correct time.
It is high time that the problems faced by visually challenged people in identifying their medicines are addressed, and an attempt is made in this paper to help the people in need. During execution we found that pill identification is affected by various external factors such as illumination conditions, pill manufacturing, pill cover, shape and imprint. The proposed system has been implemented both with the presence and the absence of imprints. Experimentation was carried out to identify prominent features for classifying the pill. The first three modules, namely pre-processing, edge detection and feature extraction, gave good results with the reference images of the dataset. A training dataset consisting of 2000 reference images of 1000 pills was considered. For the implementation of the neural network we considered 800 sample images from our dataset for training; 200 out of them were given for testing. The proposed system classifies the pill given to it, and the accuracy achieved was 91%. For case 2, the diseases Diabetes, Hypertension, Acute Respiratory Diseases and Arthritis are considered. For Diabetes and Acute Respiratory Diseases five medicines are considered, each with 36 images, for a total of 180 images. This template can be extended to other medicines as well. Higher accuracy was achieved by varying the number of epochs. Recognition of the expiry date became difficult for some of the medicines because of the different notation followed by each pharma company. Future work is to adopt normalization of data formats during the text recognition process.
References
1. Bourne RRA, Flaxman SR, Braithwaite T, Cicinelli MV, Das A, Jonas JB, Keeffe J (2017)
Magnitude, temporal trends, and projections of the global prevalence of blindness and
distance and near vision impairment: a systematic review and meta-analysis. Lancet Glob
Health 5(9):888–897
2. Zhi-Han Lg, Hui-Yin Y, Makmor-Bakry M (2017) Medication-handling challenges among
visually impaired population. Arch Pharm Pract 8(1):8–14
3. Wang Y, Ribera J, Liu C, Yarlagadda S, Zhu F (2017) Pill recognition using minimal labeled
data. In: Proceedings of IEEE third international conference on multimedia big data, pp 346–
353
4. Tay C, Birla M (2016) Pharmaceutical pill recognition using computer vision techniques.
School of Informatics and Computing, Indiana University Bloomington, pp 1–12
5. Hartl A, Arth C (2010) Computer-vision based pharmaceutical pill recognition on mobile
phones. In: Proceedings of CESCG 2010: The 14th Central European seminar on computer
graphics, pp 1–8
6. Aravind R, Jagadesh RP, Sankari M, Praveen N, Arokia Magdaline S (2017) Neural network
based assistive system for text detection with voice output. IRJET 4(4):44–47
7. Maddala KT, Moss RH, Stoecker WV, Hagerty JR, Cole JG, Mishra NK, Stanley RJ (2017)
Adaptable ring for vision-based measurements and shape analysis. IEEE Trans Instrum Meas
66(4):746–756
8. Ezaki, N, Bulacu M, Schomaker L (2004) Text detection from natural scene images: towards
a system for visually impaired persons. In: Proceedings of the 17th international conference
on pattern recognition, 2004. ICPR 2004, vol. 2, pp 683–686
9. Lee D, Yoon H, Park C, Kim J, Park CH (2013) Automatic number recognition for bus route
information aid for the visually-impaired. In: 10th international conference on ubiquitous
robots and ambient intelligence, pp 280–284
10. Manwatkar PM, Singh KR (2015) A technical review on text recognition from images. In:
IEEE 9th international conference on intelligent systems and control (ISCO), pp 721–725
11. Vasavi S, Swaroop PRS, Srinivas R (2018) Medical assistive system for automatic
identification of prescribed medicines by visually challenged using invariant feature
extraction. IJMSTR 6(1):1–17
Experimental Analysis of Machine Learning
Algorithms in Classification Task of Mobile
Network Providers in Virudhunagar District
Abstract. Data mining has many classification algorithms with desirable features. Machine learning algorithms such as K Nearest Neighbor, Support Vector Machines and Neural Networks are some of the most popular algorithms used for the classification task. Since classification is gaining importance due to the enormous amount of big data in real world datasets, the choice of a suitable classification algorithm is an ultimate need. For the classification task, the "Mobile Phone Network Satisfaction" real world dataset has been collected from mobile phone users. In today's world, the mobile network chosen by users has a great impact on the individual user's day-to-day activities and also on the business of network providers. Hence, the performance and accuracy of the mentioned machine learning algorithms have been investigated and analyzed on the described dataset. The proposed work analyses the performance of the KNN, SVM and Neural Network classifiers and also analyses the mobile users' affinity and usage nature based on different age groups.
1 Introduction
2 Related Work
Bansal V et al. [1] in their research focused on customer satisfaction of mobile phone service users operating in the Malwa region of Punjab. The analysis was carried out using Cronbach's Alpha, weighted average, ranking, Chi-square and the percentage method. The satisfaction of mobile users at Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, Bangladesh was studied by [2]. The results indicate that network coverage, internet offers and tariff offers are the main factors affecting customers, which help to retain customers and to create customer loyalty. In [3], the researchers analyzed customer satisfaction on mobile datasets. The results found that image and perceived quality have a significant impact on customer satisfaction; image and customer satisfaction were also found to be significantly related to customer loyalty.
been adopted. The results conclude that optimal K values produce classification with good accuracy. Considering all the above factors, it is obvious that for a KNN classifier the value of K and the distance metric have a vital role in better classification and also in a higher accuracy rate.
1. Architecture – neurons.
2. Determining weights on the connections.
3. Activation function.
The learning rules are categorized as Hebbian, Perceptron, Widrow-Hoff or delta, Competitive, Outstar, Boltzmann and memory based. Neural Networks are used in
3 Proposed Work
Input Data:
The mobile phone network satisfaction dataset is collected from 200 users in the form
of questionnaires. The dataset contains 32 attributes.
Preprocessing of Data:
Data preprocessing is the first and most important step. Most computational tools are unable to deal with missing values. To overcome this problem, we simply removed from the dataset the corresponding columns (features) or rows (samples) that contain missing values.
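A minimal sketch of this missing-value handling with pandas is shown below; the file name and the choice between dropping rows or columns are illustrative assumptions.

import pandas as pd

# Load the questionnaire responses (hypothetical file name)
df = pd.read_csv("mobile_network_satisfaction.csv")   # 200 rows x 32 attributes

# Drop samples (rows) that contain any missing value ...
df_rows_clean = df.dropna(axis=0)

# ... or, alternatively, drop features (columns) that contain any missing value
df_cols_clean = df.dropna(axis=1)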
Experimental Analysis:
The experimental analysis was carried out using the Orange visual programming tool. The conventional algorithms and the performance metrics are used in the visual programming tool, and the Test and Score widget is used for comparison. The classifiers are analyzed with the tool and the best performing one is selected for the learning outcome in the proposed work. The overall flow diagram is represented in Fig. 2.
K(x, y) = exp(−‖x − y‖² / (2σ²))    (1)
w_ij are the weights of the connections and a refers to the activation function, which can be a sine function, a sigmoid function or a softmax function.
Table 2 represents the confusion matrix; 'S' represents satisfied customers and 'US' represents unsatisfied customers. The performance metrics analysis shows that KNN gives the highest classification accuracy when compared with SVM and NN. The number of instances is 200. For KNN, the classification rate is 167/200 = 0.835 and the unclassified data rate is 33/200 = 0.165; for the Neural Network, the classification rate is 164/200 = 0.820 and the unclassified data rate is 36/200 = 0.18; and for SVM, the classification rate is 160/200 = 0.8 and the unclassified data rate is 40/200 = 0.20. Based on the KNN classifier, the network provider is classified depending on the age group.
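The paper performs this comparison inside Orange's Test and Score widget; the snippet below is an equivalent sketch in scikit-learn, producing cross-validated accuracy and a confusion matrix per classifier. The file name, the "satisfaction" label column and the classifier hyper-parameters are illustrative assumptions.

import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical file and label column for the 200-respondent questionnaire data
df = pd.read_csv("mobile_network_satisfaction.csv").dropna()
y = df["satisfaction"]                                  # satisfied / unsatisfied
X = pd.get_dummies(df.drop(columns=["satisfaction"]))   # encode categorical attributes

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),               # RBF kernel as in Eq. (1)
    "NN":  MLPClassifier(max_iter=1000),
}

for name, model in models.items():
    y_pred = cross_val_predict(model, X, y, cv=10)
    print(name, accuracy_score(y, y_pred))
    print(confusion_matrix(y, y_pred))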
Table 3 represents the classification based on age group with respect to the network provider. The age group below 30.5 prefers the Category-D network provider; Category-B is preferred by 61 users in the age group below 30.5. Table 4 represents the mobile usage of the respondents based on the number of hours of usage, the number of calls and internet usage in particular. Table 5 represents the classification of the network providers based on age group. Recharge purpose is divided into three categories, namely main balance, internet and SMS; for the age group below 30.5, the main balance recharge is high (Table 6).
(Charts: preferred network provider counts by age group (below 30.5 and 30.5–42.0), and the percentage of respondents reporting coverage problems, low data speed, and receiving unwanted messages and calls.)
4 Conclusion
References
1. Bansal V, Bansal B (2013) A study on customer satisfaction of mobile phone service users
operating in the Malwa region of the Punjab. ABAC J 33(3):30–40
2. Tapas B et al (2018) Measurement of customer satisfaction of different mobile operators in
Bangladesh; a study on Bangabandhu Sheikh Mujibur Rahman Science and Technology
University, Gopalganj, Bangladesh. IOSR J Bus Manag 20(3):38–47
3. Jamil JM, Nawawi MKM, Ramli R (2016) Customer satisfaction model for mobile phone
service providers in Malaysia. J Telecommun, Electron Comput Eng (JTEC) 8(8):165–169
4. Hu L-Y, Huang M-W, Ke S-W, Tsai C-F (2016) The distance function effect on k-nearest
neighbor classification for medical datasets. SpringerPlus 5(1):1304
5. Ganesan K, Rajaguru H (2019) Performance analysis of KNN classifier with various distance
metrics measures for MRI images. In: Soft computing and signal processing. Springer,
Singapore, pp 673–682
6. Zhang S, Li X, Zong M, Zhu X, Wang R (2017) Efficient KNN classification with different
numbers of nearest neighbours. IEEE Trans Neural Netw Learn Syst 29(1):1774–1785
7. Preetha V, Chitra K (2018) Authentication in mobile Adhoc network using neural network.
J Adv Res Dyn Control Syst 10(special issue 2):901–909
Performance Evaluation of SVM and Neural
Network Classification Methods for Diagnosis
of Breast Cancer
Abstract. Breast cancer is a major and detrimental ailment, and females are regularly affected by this disease. Data mining is a knowledge discovery process for detecting disease among enormous quantities of information. We propose an approach for the prognostication of tumors, presented through support vector machine and neural network classification methods. 10-fold and 5-fold cross validations are applied in the intended system to obtain precise results. The breast cancer database used in this procedure is from the UCI machine learning repository. Using the WEKA tool, we studied both classification techniques, the support vector machine and neural network classification models, with 5 and 10-fold cross validation. The support vector machine with 5-fold cross validation obtained the highest accuracy.
1 Introduction
In India, for every two women diagnosed with breast cancer, one dies [1]. It is the most common cancer among women in India [2]. In metropolitan regions, one in twenty-two women is diagnosed with this malignancy during her lifespan [3]. In the medical province, patients' data is extremely important; this data can be mined and interpreted into valuable information. Data mining on medical data gives a way to discover hidden associations existing in the data.
If we detect a tumor at an early stage, we can reduce the risk factor. A lot of research has been done on the prediction of tumor threat with data mining classification techniques. In real life, data mining applications play a major role, particularly in the medical health field, and the patient's dataset plays a vital part in detecting the disease. In this study, the UCI machine learning repository breast cancer dataset has been taken to conduct the experiments.
The document is organized as follows. The study of the authors' tests with different classification techniques applied to the assessment of cancer tumors is explained in Sect. 2. Section 3 gives a detailed explanation of the supervised learning algorithms that are designed for the examination of the cancer disease. Section 4 provides the details of the dataset. The comparative study of the support vector machine and neural network classification procedures is analyzed in Sect. 5. Section 6 presents the conclusion of the whole evaluation.
2 Literature Survey
Building a model that assigns class labels, learned from a dataset that includes class tags, is known as supervised learning.
Support vector machine:
It is one of the machine learning algorithms used for classification problems. In this algorithm, we plot each data item (the value of each feature) in an n-dimensional space, where n refers to the number of features [9]. As supervised learning mainly works on the categorization of the initial data provided, the SVM does this by introducing a hyperplane between the various labels. The SVM is effective even in high dimensional spaces. So, as breast cancer may have a number of attributes, the SVM can be used for categorizing different attributes [10]. The SVM model is even effective in cases where the number of dimensions is greater than the
number of samples. It requires only a small amount of memory, as it uses only a subset of the training points in the decision function.
Neural Networks:
In machine learning, these are one of the main tools. Neural networks are brain-inspired systems designed to replicate the way that we humans learn. Neural networks consist of input and output layers, as well as a hidden layer of units that transform the given input into something the output layer can use. Neural networks are excellent tools for finding patterns that are too complex or too numerous for a programmer to extract and teach the machine to recognize. The ANN could be used as a screening tool, to send more serious and probable cases of cancer for further diagnosis such as mammography or MRI. The intention of the ANN is to help the radiologist discern, giving a result as accurate as possible to distinguish tumors from the rest. If the number of benign patients being subjected to unnecessary and possibly harmful tests of mammography and breast MRI falls as a result of this computer based screening, then the ANN will have served as a useful tool [12].
Patient’s information plays a crucial role to prognosis tumor in females. The dataset has
the following attributes with 286 instances [13].
Age: If a women’s age is more than 40, the chances of breast cancer are high.
So age is one of the best attribute to detect abnormality in women.
Menopause: the risk of tumor may increase at the time of menopause stage in
women’s life. It doesn’t cause cancer directly, but it is one of the reason
based on families history.
tumor size: This attribute plays an important role to identify cancer in females. If we
find the tumor in early stages, we can reduce the risk of cancer.
inv nodes: the number of axillary lymph nodes which gives histological exami-
nation. node-caps: this attribute gives the information about the outer
most layer of a lymph node.
deg-malig: degree of malignancy plays a Vitol role to detect cancer. Breast: this
attribute gives the information about the rashes on the nibble. breast-
quad: quadrants of breast find out the abnormality on the breast.
irradiat: examination of irradiation plays a crucial role. Most women suffer from
this.
5 Experimental Results
The WEKA tool is a useful piece of software for conducting experiments with different classification techniques. We conducted experiments with 5-fold and 10-fold cross validation on the support vector machine and neural network classification methods.
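The authors run these experiments in WEKA; purely as an illustrative alternative, the sketch below reproduces the same protocol (SVM and a neural network, 5-fold and 10-fold cross validation) with scikit-learn. The CSV file name, the "class" label column and the one-hot encoding of the categorical attributes are assumptions for illustration.

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Hypothetical CSV export of the UCI breast cancer dataset (286 instances)
data = pd.read_csv("breast-cancer.csv")
y = data["class"]                                  # recurrence / no-recurrence label (assumed column)
X = pd.get_dummies(data.drop(columns=["class"]))   # one-hot encode categorical attributes

models = {"SVM": SVC(), "NeuralNetwork": MLPClassifier(max_iter=2000)}

for name, model in models.items():
    for folds in (5, 10):
        scores = cross_val_score(model, X, y, cv=folds)
        print(f"{name}, {folds}-fold accuracy: {scores.mean():.3f}")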
Approach:
1. Correctly classified instances
2. Kappa statistic
3. Mean absolute error
4. Root mean squared error
5. Relative absolute error
6. Error of the ZeroR baseline classifier
Performance evaluation of support vector machine classification:
Performance table of support vector machine classification with cross validation 10:
Comparative Analysis:
The support vector machine with 5-fold cross validation correctly identified 200 instances, whereas neural network classification with 5-fold cross validation identified 191 instances. In the support vector machine performance table, 176 instances are assigned to the class no-recurrence-events, whereas in the neural network performance table 160 instances are.
A variety of methods have been studied for abnormal tumors in women. This paper examines support vector machine and neural network classification techniques with 5-fold and 10-fold cross validation. The support vector machine with 5-fold cross validation obtained higher accuracy compared to the neural network classification method. These two techniques are powerful classification techniques: the support vector machine algorithm is very useful on large databases, whereas neural networks are adaptive. The analysis of these algorithms has been examined. In future work, tumor expert methods are expected to achieve a higher accuracy rate.
References
1. http://www.breastcancerindia.net/bc/statistics/stat_global.htm
2. Ferlay J, Soerjomataram I, Ervik M, et al (2013) GLOBOCAN 2012 v1.0, Cancer incidence
and mortality worldwide: IARC Cancerbase no. 11 [Internet]. International Agency for
Research on Cancer, Lyon, France
3. Chaurasia V, Pal S (2014) A novel approach for breast cancer detection using data mining
techniques. Int J Innov Res Comput Commun Eng 2(1):2456–2465
4. Navyasri M, Haripriyanka J, Sailaja D, Ramakrishna M (2018) A comparative analysis of
breast cancer dataset using different classification methods, SCI
5. Islam MM, Iqbal H, Haque MR, Hasan MK (2017) Prediction of breast cancer using support
vector machine and K-nearest neighbors. IEEE Region, pp 226–229. ISBN no: 978-1-5386-
2175-2
6. Chang M, Dalpatadul RJ, Phanord D, Ashok K (2018) Breast cancer prediction using
Bayesian logistic regression, vol 2, issue 3. Crimson Publishers. ISSN: 2578-0247
7. Nithya B, Ilango V (2017) Comparative analysis of classification methods in R environment
with two different data sets. Int J Sci Res Comput Sci, Eng Inf Technol 2(6). ISSN: 2456-
3307
8. Aavula R, Bhramaramba R (2018) A survey on latest academic thinking of breast cancer
prognosis. Int J Appl Eng Res 13:5207–5215. ISSN: 0973-4562
Performance Evaluation of SVM and Neural Network Classification 349
9. Ramakrishna Murty M, Murthy JVR, Prasad Reddy PVGD (2011) Text document
classification based on a least square support vector machines with singular value
decomposition. Int J Comput Appl (IJCA) 27(7):21–26
10. Akay MF (2009) Support vector machines combined with feature selection for breast cancer
diagnosis. Expert Syst Appl 36(2):3240–3247
11. Attya Lafta H, Kdhim Ayoob N, Hussein AA (2017) Breast cancer diagnosis using genetic
algorithm for training feed forward back propagation. In: 2017 annual conference on new
trends in information and communications technology applications (NTICT). Baghdad,
pp 144–149
12. Aličković E, Subasi A (2017) Breast cancer diagnosis using GA feature selection and
Rotation Forest. Neural Comput Appl 28(4):753–763
13. Rajesh K, Anand S (2012) Analysis of SEER dataset for breast cancer diagnosis using C4.5
classification algorithm. Int J Adv Res Comput Commun Eng 1(2):72–77
Data Aware Distributed Storage (DAS)
for Performance Improvement Across
a Hadoop Commodity Cluster
Abstract. Big Data is the order of the day and has found in-roads into many
areas of working other than just the internet, which has been the breeding
ground for this technology. The Remote Sensing domain has also seen growth in
volumes and velocity of spatial data and thus the term Spatial Big Data has been
coined to refer to this type of data. Processing spatial data for applications such as urban mapping, object detection and change detection has undergone changes for the sake of computational efficiency, from single monolithic centralized processing to distributed processing, and from single core CPUs to multicore CPUs and further to GPUs and specific hardware architectures. The two major problems faced in this regard are the size of the data to be processed per unit of memory/time and the storage and retrieval of data for efficient processing. In this paper, we discuss a method of distributing data across an HDFS cluster, which aids in fast retrieval and faster processing per unit of available memory in the image processing domain. We evaluate our technique and compare it with the traditional approach on a 4-node HDFS cluster. A significant improvement is found while performing edge detection on large spatial data, which is tabulated in the results section.
1 Introduction
incomprehensible even by the best GP-GPU based systems, and the world is moving towards more complex hybrid system architectures. The variety in this remote sensing data comes from the varied data formats, varied sensor resolutions, and various sources ranging from traditional optical satellites to microwave satellites. Spatial Big Data is the operative word for many GIS based solutions in order to deliver better systems.
Block/Window based Processing
Matrix based processing has been adopted by many commercial tools such as Matlab, which exploit the inherent features of spatial raster data embedded in the neighbourhood pixels. The blockproc functionality provided by Matlab is based on the premise that images can be either too large to load into a unit of memory, or loadable into memory but too large in terms of the units of time [10] consumed for processing. Due to the independent processing of the various blocks, there can be problems of local and global thresholds. While performing edge detection on raster data, Matlab uses global thresholds for processing the different blocks in order to match the results obtained by full in-memory processing of the data. It may be noted that the blockproc functionality of MATLAB provides a zero padding interface to block processing.
Processing images in smaller chunks called blocks is a more conducive approach, as many of the linear filtering and morphological functions use block or neighborhood processing. This approach has the advantage of processing less information, which makes it faster and suitable for the MapReduce paradigm. It has the advantage of processing data per unit of available memory and per unit of time, thus improving the efficiency of processing while considering the limitations of commodity hardware.
A cluster computing environment in the form of cloud computing has led to many systems for spatial data processing. Zhao et al. [5] proposed a cloud computing based system providing an application as a MapReduce service for satellite imagery analysis. Yang et al. [6] implemented a privacy preserving cloud based Medical Image File Accessing System to improve reliable analysis and storage of medical images. Shelly and Raghava's [7] system based on Hadoop and cloud computing demonstrates effective speedup and efficiency gains for iris template matching as compared to a sequential process; the size of the input file used was as high as 13.2 GB. Alonso-Calvo et al. [8] proposed a distributed image processing approach based on images transformed to region based graphs, thus allowing parallel execution of image processing algorithms.
T_t = 67 × T_r    (1)

However, if the data were distributed based on a block size of 32 MB, then the total time taken would be

T_t = 32 × T_r + T_nl    (2)

which is the time taken to read the 32 * 1000000 pixels on both systems plus the network latency T_nl for the data to reach the process. Though the reading of data happens in parallel on both systems, which is half of the earlier time, the network latency adds extra time to the processing. Two parameters are important while processing the data in this manner:
– Heterogeneous systems
– Stability of the network and of the systems in the network
In an HDFS commodity cluster, the I/O on multiple machines is not uniform and hence the slowest reader/writer is the weakest link; secondly, any instability in the network or in a system would lead to a restart of the reading process from a redundant node, thus consuming even more time.
2.3 Motivation
The MapReduce environment, as described in the literature survey, is a candidate for implementation of image processing algorithms and provides an efficient platform for parallel processing of data. The increase in accurate spatial data and its usage on GIS platforms serving the common person necessitates the usage of modern methods to improve the processing of voluminous, high velocity, multi variant satellite imagery.
In this paper, we discuss a method to distribute data across nodes keeping the image processing semantics in view. The storage of meta information along with the data enables one to store data as chunks of various sizes, in contrast to the HDFS block size, which always enforces the same block size across the cluster. [9] shows the efficiency of data retrieval using the data distribution method presented here, on a read block of an image.
3 Implementation
3.1 Data Pre-processing
The imagery data is inserted into the HDFS cluster by copying it from the local disk to the cluster; copyFromLocal is used to accomplish this job. The images are distributed across the cluster based on the hdfs.blocksize parameter, which does not take any other constraints into consideration. In the current implementation, the data placement is of paramount importance and is effected based on the chunk size chosen by the user.
The input image is processed to divide it into chunks of user defined size, based on the image and the application, and these chunks are handed over to the HDFS default writer in the given size and flushed into HDFS (Fig. 1).
Fig. 1. Data distribution across HDFS cluster
The image data is split as a matrix to be stored on the existing cluster. The image can be visualized as a set of boxes, each of which is stored in a separate node of the cluster. The image is divided into square chunks with a reasonable overlap to avoid any artifacts in the resulting image after performing image processing operations such as edge detection, histogram equalization etc. Figure 2 shows the division of data with overlaps at every chunk. We also observe that in this division
Table 1. Description of symbols for chunk size calculation
Description | Symbol
# of scanlines | s
# of pixels per scanline | p
Default block size (hadoop) | D
# of overlap pixels | L
# of bytes per pixel | B
# of blocks | R = p/D
# of pixels per block in a scanline | M = D/B
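A minimal sketch of this overlapping chunking step is given below; the chunk size, the overlap value and the way chunks are handed to the HDFS writer are illustrative assumptions based on the symbols in Table 1, not the exact DAS implementation.

import numpy as np

def square_chunks(image: np.ndarray, chunk: int, overlap: int):
    """Divide a 2-D raster into square chunks of side `chunk` pixels, with
    `overlap` pixels shared between neighbouring chunks (cf. Table 1: L)."""
    step = chunk - overlap
    rows, cols = image.shape[:2]
    for r in range(0, rows, step):
        for c in range(0, cols, step):
            block = image[r:r + chunk, c:c + chunk]
            # (r, c) lets the reader place results back in the right position
            yield (r, c), block

# Example: a 10000 x 10000 pixel raster divided into 512-pixel chunks with 16-pixel overlap
img = np.zeros((10000, 10000), dtype=np.uint8)
chunks = list(square_chunks(img, chunk=512, overlap=16))
print(len(chunks))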
The time taken to write the file into HDFS is equivalent to the direct copy of the file
into the cluster.
For the evaluation of the above schema, an HDFS cluster of 4 nodes was used; their configurations are shown in Fig. 3. The cluster is a commodity cluster with varying sizes of memory and vcores. The experiment is conducted with three images of different sizes, namely 600 MB, 1.5 GB and 5.1 GB. The nodes of the cluster, shown in Fig. 3, represent a Hadoop commodity cluster, which consists of systems with varying sets of resources.
Fig. 3. Nodes of the cluster
4.3 Results
The experiment consisted of processing images downloaded from the USGS explorer, using montage to stitch them together, and running an edge detection algorithm. The images were sub-divided into chunks keeping the overlap factor intact across the images in both the scan and pixel directions. In addition, the experiment was conducted by varying the chunk size within a file from 32 MB to 128 MB, with the hdfs blocksize at 128 MB.
The graphs shown in Fig. 4 reflect the time taken for an image of a given size to be processed using the default strategy of distribution and the data aware strategy of distribution. The data aware storage (DAS) improves the performance by almost 30% for files of smaller size and up to 50% for larger files. We also observe that there is not much variation in the processing time with the change in chunk size of the data.
(Charts: processing time for file sizes of 600 MB, 1.5 GB and 5.1 GB at chunk sizes of 128 MB, 64 MB and 32 MB, comparing Default and Data Aware Storage.)
Fig. 4. Comparison of the edge detection algorithm performance on images of different sizes using default and DAS based distribution of data
This improvement can be attributed to the fact that the matrix method of storing an
image brings out the variation in the neighbouring pixel’s value without having to
traverse the HDFS for the neighbouring pixels.
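This point can be made concrete: if the overlap L is at least the radius of the processing kernel, a chunk can be filtered entirely from its own bytes. The following is a small check of that condition, our own illustration rather than the authors' code, with assumed values.

// If the overlap covers the kernel radius, every pixel of the chunk's own region
// has all of its neighbours inside the same chunk, so no cross-block HDFS reads
// are needed during edge detection (values below are assumed).
public class OverlapCheck {
    public static void main(String[] args) {
        int L = 16;              // overlap pixels stored with each chunk (assumed)
        int kernelSize = 3;      // e.g. a 3x3 Sobel edge-detection kernel
        int radius = kernelSize / 2;
        System.out.println("edge detection without cross-chunk reads: " + (L >= radius));
    }
}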
References
1. Lee CA, Gasster SD, Plaza A, Chang C-I, Huang B (2011) Recent developments in high
performance computing for remote sensing: a review. IEEE J Sel Top Appl Earth Obs
Remote Sens 4(3):508–527
2. Lv Z, Hu Y, Zhong H, Wu J, Li B, Zhao H (2010) Parallel k-means clustering of remote
sensing images based on mapreduce. In: Proceedings of the 2010 international conference on
web information systems and mining, ser. WISM 2010. Springer-Verlag, Berlin, Heidelberg,
pp 162–170
3. Li Y, Crandall DJ, Huttenlocher DP (2009) Landmark classification in large-scale image
collections. In: ICCV, 1957–1964
4. Bajcsy P, Vandecreme A, Amelot J, Nguyen P, Chalfoun J, Brady M (2013) Terabyte sized
image computations on hadoop cluster platforms. In: Big Data, 2013 IEEE international
conference, October 2013, pp 729–737
5. Zhao JY, Li Q, Zhou HW (2011) A cloud-based system for spatial analysis service. In: 2011
international conference on remote sensing, environment and transportation engineering
(RSETE), Nanjing, 24–26 June 2011, pp 1–4
6. Yang C-T, Chen L-T, Chou W-L, Wang K-C (2010) Implementation of a medical image file
accessing system on cloud computing. In: 2010 IEEE 13th international conference on
computational science and engineering (CSE), Hong Kong, 11–13 December 2010, pp 321–
326. http://dx.doi.org/10.1109/CSE.2010.48
7. Shelly, Raghava NS (2011) Iris recognition on hadoop: a biometrics system implementation
on cloud computing. In: 2011 IEEE international conference on cloud computing and
intelligence systems (CCIS), Beijing, 15–17 September 2011, pp 482–485. http://dx.doi.org/
10.1109/CCIS.2011.6045114
8. Alonso-Calvo R, Crespo J, Maojo V, Muñoz A, Garcia-Remesal M, Perez-Rey D (2011)
Cloud computing service for managing large medical Image data-sets using balanced
collaborative agents. Adv Intell Soft Comput 88:265–270. https://doi.org/10.1007/978-3-642-19875-5_34
9. Phani Bhushan R, Somayajulu DVLN, Venkatraman S et al (2018) A raster data framework
based on distributed heterogeneous cluster. J Indian Soc Remote Sens. https://doi.org/10.
1007/s12524-018-0897-5
10. https://kr.mathworks.com/examples/image/mw/images-ex86052154-blockprocessing-large-
images
Dodecahedron Model for Storage
of Unstructured Data
Abstract. Nowadays, due to the development of the internet and social media, unstructured data is growing exponentially at a high rate. With the growth of a variety of unstructured data comes the problem of organizing and querying this unstructured data. In this paper, we present a way of organizing unstructured data efficiently. We call this model the Dodecahedron Data Model (DDM). DDM stores a variety of unstructured data with the help of a distributed hash map. DDM provides various operations to store and retrieve unstructured data, and to interlink related and unrelated data.
1 Introduction
The development of the internet and social networks has led to an exponential increase in unstructured data. However, the lack of a generalized model for unstructured data is a major challenge for big data. Many solutions have been proposed to manage unstructured data, but most of them focus on a limited set of data types, which limits their scalability and usability. Therefore, there is a need for a generalized model to store a variety of unstructured data. There is also a need to manage diverse kinds of interrelated data, so the data model should evolve as per the user's wish. In this paper, we present a generalized data model which is not only used to store different types of unstructured data but also performs operations which are efficient and simple compared with others. A data type is a particular kind of data which is defined by its values. Data structures are used to store and organize data of a particular data type or of multiple data types. At present there are data structures such as arrays, linked lists, queues, graphs etc.; in these data structures, if we want to store any new data type, we need to modify the definition in the code or write extensions at the point of use. In order to simplify these changes to the code, in the dodecahedron model we can store data of any datatype without any extension or modification to the code. In our proposed model, we use a 3-dimensional figure for the storage of different types of data. This figure belongs to the family of polyhedra. A polyhedron is a 3-dimensional figure with flat polygon faces. A polyhedron is named based on the number of faces present in the shape. There are different types of polyhedra, and some of the regular polyhedra are shown in Fig. 1 given below.
2 DDM Modeling
A Dodecahedron (DDH) and a Face can be connected in four ways: DDH-DDH, DDH-FACE, FACE-FACE and FACE-DDH. For more details see Fig. 2.
3 Salient Features
• It is used to store unstructured data without any restrictions on storing the data.
• The stored data can be easily retrieved and connected between various data types
without any restrictions.
• Though the structure becomes complex as the data is stored randomly, the scala-
bility of the structure increases due to the hash key value.
• While retrieving the data, the user can retrieve the whole chain of data sequentially
even though it is stored randomly.
4 Implementation
The overall process is that unstructured data such as images, videos, audio, etc. is organized in the Dodecahedron Data Model (DDM). In DDM, any unstructured data can be stored as the user wishes and interlinked as the user wishes between any data types. Further, this unstructured data is stored in the graph database. The DDM process includes 3 stages:
Stage 1: Load the data into the Dodecahedron Data Model. In this step, the user submits the unstructured data to the DDM. DDM specifies a standard, and only data items conforming to that order will be added. Any data that can be stored in DDM will go to one of the faces of the dodecahedron.
Stage 2: Add interlinks among the data objects once the data has been added to the DDM model in Stage 1. The relationships among the data can be formed by specifying the links. DDM provides the option of specifying the interlinks in a flexible way. The user just has to provide a pair of data objects for which the link is to be made. Here a data object can be anything between a Face and a Dodecahedron.
Stage 3: Perform operations on the DDM model using the DDM Query Language.
The Stage 1 and Stage 2 processes can be better understood from the algorithms mentioned in reference [2].
DDM supports the following operations: Search, Insert, Delete and Update. Each of the above operations is explained in detail in the DDM Query Language section.
Properties

Dodecahedron
Properties:
  Label: String. Contains the label of the Dodecahedron.
  HedronSize: Integer. Contains the number of faces of the polyhedron, e.g. 12 for a Dodecahedron. It can be customized also.
  Polyhedron: ConcurrentHashMap. Contains the faces and their data; Polyhedron = (VariableName, DataType) => Face.
  Next: Object. Contains the address of the next Face/Dodecahedron.
Methods:
  Dodecahedron(Label, HedronSize): constructor for initializing a Dodecahedron.
  pushData(VariableName, DataType, DataValue): adds data to one of the available faces of the Dodecahedron.
  removeData(VariableName, Type): removes the data from the face containing the variable name and type which is already stored on the Dodecahedron.
  displayData(): displays all the connected data of a Dodecahedron.

Face
Properties:
  variable_name: String. Contains the variable name of the Face.
  datatype: String. Contains the datatype of the Face.
  data: T. Contains the data value of the variable.
  next: Object. Contains the address of the next Face/Dodecahedron.
Methods:
  Face(variablename, datatype, data): constructor for initializing a face's data.
  linkFace(): adds a face to the linked list of a Face.
  linkDodecahedron(): adds a Dodecahedron to the linked list of a Face.
  removeLinkFace(): removes the link of a face from the linked list of a face of a Dodecahedron.
  removeLinkDodecahedron(): removes the link of a Dodecahedron from the linked list of a face.
  displayData(): displays all the connected data of a Face.
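A minimal Java sketch of these two classes, as we read the description above, is given below; the field and method names follow the paper, while the bodies, the generic DataItem type and the use of a list for the links are our own simplifying assumptions.

import java.util.concurrent.ConcurrentHashMap;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the DDM classes described above; implementations are assumptions.
class Face<T> {
    String variableName;          // variable name of the face
    String dataType;              // datatype of the stored value
    T data;                       // the data value itself
    List<Object> next = new ArrayList<>();   // links to other Faces/Dodecahedra

    Face(String variableName, String dataType, T data) {
        this.variableName = variableName;
        this.dataType = dataType;
        this.data = data;
    }
    void linkFace(Face<?> f)              { next.add(f); }
    void linkDodecahedron(Dodecahedron d) { next.add(d); }
}

class Dodecahedron {
    String label;                 // label of the dodecahedron
    int hedronSize;               // number of faces, e.g. 12
    // (VariableName, DataType) => Face, as in the description above
    ConcurrentHashMap<String, Face<?>> polyhedron = new ConcurrentHashMap<>();
    List<Object> next = new ArrayList<>();   // links to other Faces/Dodecahedra

    Dodecahedron(String label, int hedronSize) {
        this.label = label;
        this.hedronSize = hedronSize;
    }
    <T> void pushData(String variableName, String dataType, T value) {
        if (polyhedron.size() >= hedronSize)
            throw new IllegalStateException("no free face");
        polyhedron.put(variableName + " " + dataType,
                       new Face<>(variableName, dataType, value));
    }
    void removeData(String variableName, String dataType) {
        polyhedron.remove(variableName + " " + dataType);
    }
    void displayData() {
        polyhedron.values().forEach(f ->
            System.out.println(f.variableName + " (" + f.dataType + ") = " + f.data));
    }
}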
Example:
Let us see how DDM models unstructured data with an example. One such unstructured data item can be a PowerPoint presentation of the most general form. It contains the title and presenter details in slide 1, the contents in slide 2, and the subsequent slides contain
6 Comparisons
We have compared popular data models for unstructured data with our DDM in different aspects, shown in the table below.

Feature                       Neo4j               JSON                                         DDM
Modelling                     Uses graph model    Uses key-value pairs                         Combination of key-value pairs and graph models
Scalability                   Scaling is hard     Not good for complex data                    Scalable
Sharding                      Not possible        Possible                                     Possible
Version history               Not supported       Not supported                                Supported
Partial key sequence search   Possible            Not possible/key sequence must be provided   Possible
7 Future Scope
• In this paper, we have mentioned only one-to-one relations. This can be further extended to many-to-many relations, and a user can use it at will.
• The data storage can be changed to a distributed database, and the version history we used can be changed to maintain an on-display record of the data present in the sequence.
• The data sequence can be authenticated for modifications using an authenticated data structure. The data can be encrypted using SHA algorithms in order to hide the data from public view.
• This can be further developed to directly identify and assess data and the relations among the data for given unstructured data.
• Labeling of the links can be done such that the user can retrieve the data by mentioning the link which connects the two data types, without the user needing to know the data or data types connected to the link.
Algorithms
Insertion

InsertFace(ConcurrentHashMap ddh, String variableName, String dataType, T data)
Input: Unstructured data will be added to one of the dodecahedrons
begin
  Face f = new Face(variableName, dataType, parseData(data));
  if f != null then
    faceKey = variableName + " " + dataType
    ddh.appendLast(faceKey, f);
  end
end

CreateLink(ConcurrentHashMap polyhedrons, String s1, String s2, String s3, String s4)
Input: Collection of dodecahedrons polyhedrons and four strings are provided
begin
  if s3.isEmpty and s4.isEmpty then
    /* create link between dodecahedron and dodecahedron */
    ddh1 = polyhedrons.get(s1)
    ddh2 = polyhedrons.get(s2)
    if ddh1.link == null then
      ddh1.link = ddh2
    else
      ddh1.appendLast(ddh2)
    end
  else if s4.isEmpty then
    if polyhedrons.get(s1) != null then
      face1 = findAllFaces(s2, s3)
      if face1 != null then
        /* link between dodecahedron and face */
        polyhedrons.get(s1).appendLast(face1)
      end
    else
      face1 = findAllFaces(s1, s2)
      if polyhedrons.get(s3) != null then
        /* link between face and dodecahedron */
        face1.appendLast(polyhedrons.get(s3))
      end
    end
  else
    /* link between face and face */
    face1 = findAllFaces(s1, s2)
    face2 = findAllFaces(s3, s4)
    if face1 != null and face2 != null then
      face1.appendLast(face2)
    end
  end
end

Search

findFaceInCluster(String variableName, String dataType, String[] queryValues)
Input: strings variable name, data type and the queryValues string array will be given
begin
  result = []
  i = 1
  for each Dodecahedron ddh in polyhedrons
    result = []
    for each Face f in ddh
      if queryValues.split(",").include?(f.variableName) then
        result.put(i++, f.data)
        if f.faceKey.match(faceKey) then
          result.remove(i--)
        end
        while f.list.hasLink() do
          tempFace = f.list.nextFace
          if queryValues.split(",").include?(tempFace.variableName) then
            result.put(0, tempFace.data)
            if tempFace.faceKey.match(faceKey) then
              result.remove(i--)
            end
          end
        end
      end
    end
  end
  return result
end
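For illustration, a short usage sequence against the Java class sketch given earlier (again our own assumption, not the paper's DDM Query Language; the slide contents are invented example values):

// Hypothetical usage of the sketched Dodecahedron/Face classes (not the DDM QL).
public class DdmDemo {
    public static void main(String[] args) {
        Dodecahedron slide1 = new Dodecahedron("slide-1", 12);
        slide1.pushData("title", "String", "A sample talk");       // assumed example values
        slide1.pushData("presenter", "String", "Jane Doe");

        Dodecahedron slide2 = new Dodecahedron("slide-2", 12);
        slide2.pushData("contents", "String", "agenda of the talk");

        slide1.next.add(slide2);     // a DDH-DDH link between the two slides
        slide1.displayData();
        slide2.displayData();
    }
}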
8 Conclusion
In this paper, we present a combination of the key-value and graph models to create a structure which is capable of handling any data type and interlinking data, as a solution towards a generalized model for unstructured data. In the proposed model, unstructured data can be randomly stored, interconnected at different levels as per usage, and retrieved in the same way. The model uses the positive aspects of both the hash key and the graph to increase scalability. The structure can become complex as the data is stored randomly, but it remains connected.
References
1. Chen, L., et al.: RAISE: a whole process modeling method for unstructured data management.
In: 2015 IEEE International Conference on Multimedia Big Data. IEEE (2015)
2. Jeon, S., Yohanes, K., Bonghee, H.: Making a graph database from unstructured text. In:
IEEE 16th International Conference on Computational Science and Engineering. IEEE (2013)
3. Al-Aaridhi, R., Kalman, G.: Sets, lists and trees: distributed data structures on distributed hash
tables. In: 2016 IEEE 35th International Performance Computing and Communications
Conference (IPCCC). IEEE (2016)
4. Kim, J., et al.: RUBA: real-time unstructured big data analysis framework. In: 2013
International Conference on ICT Convergence (ICTC). IEEE (2013)
5. Abdullah, M.F., Kamsuriah, A.: The mapping process of unstructured data to structured data.
In: 2013 International Conference on Research and Innovation in Information Systems
(ICRIIS). IEEE (2013)
6. https://neo4j.com/developer/graph-db-vs-rdbms/
A Smart Home Assistive Living Framework
Using Fog Computing for Audio
and Lighting Stimulation
1 Introduction
Dementia is a chronic disorder that causes permanent and gradual cognitive decline
which is more likely to occur in elderly people. Dementia has affected an estimated 50
million people across the globe and is likely to rise to about 152 million people by 2050
[1]. The main side effects of dementia are loss of memory of where objects were placed, forgetfulness of recent context and circumstances, and even loss of recognition of individuals. Sensory stimulation is the activation of various senses like vision, hearing,
smell, taste, and touch. Multi-Sensory Stimulation (MSS) has been an increasingly
popular approach to care used by several dementia care centers in recent times [2].
There are several types of sensory stimulation each with their own benefits. One of the
more effective methods to combat the impacts of dementia is music and light treatment
as we explain in the following.
Studies have shown that colors influence memory [3]. Color cues can be used to
provide a powerful information channel to the human cognitive system and play an
important role in enhancing memory performance [4]. Audio stimulation is also
effective for enhancing mood, relaxation, and cognition [5]. Music can help individuals who are experiencing dementia to recall their past; it also helps in decreasing the sentiments of dread and tension that they frequently feel. The impact of music and light treatment is useful for individuals with dementia, as it helps them to distinguish their environment more effectively and keep up their feeling of identity.
Design of smart home has evolved into a significant research area with the
development of the Internet of Things (IoT) technology. The health care based smart
home environment assists elderly or disabled people living independently in several
ways. Sensing, reasoning, and acting are the primary tasks involved in modeling a
smart home [6]. The growing amount of enormous data and substantial processing
capabilities of devices between cloud and source of data in the IoT environment paved
the way to a new paradigm, fog computing, wherein a portion of the computation is
shifted towards the origin of data from the traditional cloud-centric computation. In fog
computing, computation occurs using the infrastructure on the way to the cloud from
the source of data [7]. It offers several benefits like improved response time, data
protection and security, reduced bandwidth consumption etc. [8].
In this work, we propose a framework for audio and lighting stimulation program
for the elderly using a fog computing model in a smart home. Training the model to identify familiar persons and daily-use objects, and classifying them, is done using fog devices. This gives a quicker response time when compared to training and
classification being done on the cloud. When any person or object from this trained
group is sensed and classified using the trained model, audio and lighting stimulation
are given, which triggers the associative recall mechanism to help the elderly in rec-
ognizing the person or object better. The organization of the paper is as follows.
Section 2 presents related work done in this field. Section 3 describes the proposed
system and implementation details with experimental results are discussed in Sect. 4.
Finally, Sect. 5 concludes the work suggesting future enhancements.
2 Related Work
The problem of designing smart home environments to assist elderly and disabled
people has been addressed earlier in the literature. Wang et al. proposed an enhanced
fall detection system for monitoring the elderly person using wearable smart sensors
which can detect the accidental falls in the smart home environment [9]. Suryadevara
et al. proposed a wireless sensor network based home monitoring system for elderly
activity behavior that involves functional assessment of daily activities [10]. The
proposed mechanism tries to estimate the wellness of elderly people based upon the
usage of household items connected through various sensing units.
3 System Description
The generic architecture of our fog computing based smart home environment is shown
in Fig. 1. The sensor node, which is shown in the lower level of the architecture,
gathers information from several sensors connected to it, performs some processing, and communicates with other connected nodes in the network. Typically all these sensor
nodes communicate with the gateway device, which has higher processing and storage
capability compared to the sensor node.
The sensor nodes could use different protocols, like Zigbee, Wi-Fi, Z-Wave etc., to
communicate with the gateway device, which depends upon the sensor nodes and
gateway device being chosen based on the application requirement. The edge gateway
collects the data from various sensor nodes, which would be further processed by fog
devices as per the application requirement. The summary information from fog devices
could be optionally sent to the cloud for further storage and analysis.
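As a rough sketch of this edge-versus-cloud split (our own illustration; the class names, placeholder logic and summary interval are assumptions, not the paper's implementation), a fog device might act on each reading locally and push only periodic summaries upstream:

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the fog-computing split described above: raw readings are
// handled at the fog device, only summaries go to the cloud. Names are assumptions.
public class FogNode {
    private final List<Double> window = new ArrayList<>();
    private static final int SUMMARY_INTERVAL = 100;   // readings per cloud summary (assumed)

    // Called by the gateway for every raw sensor reading.
    public void onReading(String sensorId, double value) {
        actLocally(sensorId, value);        // low-latency decision at the edge
        window.add(value);
        if (window.size() == SUMMARY_INTERVAL) {
            sendSummaryToCloud(sensorId, average(window));
            window.clear();
        }
    }

    private void actLocally(String sensorId, double value) {
        // e.g. react to a PIR sensor firing (placeholder logic)
        if (value > 0) System.out.println(sensorId + ": motion detected, trigger stimuli");
    }

    private static double average(List<Double> xs) {
        return xs.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    private void sendSummaryToCloud(String sensorId, double avg) {
        System.out.println("cloud <- summary for " + sensorId + ": " + avg);
    }
}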
4 Implementation Details
Fig. 3. (a) Deployment of various sensor nodes and devices (1 - Sensor unit type#1, 2 - Sensor unit type#2, 3 - Gateway, 4 - Fog device) (b) PIR and camera sensor connected to a Raspberry Pi, and one of the hue lights employed in the smart home
The fog node should trigger the appropriate audio and visual stimuli, in the room in which the person with dementia is present, when any visitor arrives. As shown in Fig. 3 (a), we
have used three units of type#2 sensor node (each with hue light, speaker to play the
music, and PIR sensor) placed in kitchen, living room and bedroom. The PIR sensor of
this type#2 sensor node is associated with one domestic object in each room (living
room – Sofa Set, Bedroom – Bed, Kitchen – Oven). This PIR sensor serves two
purposes, one to identify the reach of the resident to that object and the other to identify
the location (room) of the resident when any visitor arrives. The fog node, through the
gateway device, triggers the appropriate audio and lighting stimulation on the speaker
and hue light in the appropriate room, when any visitor arrives or the resident nears the
domestic object under consideration. Figure 3 (b) shows various sensing nodes used in
our smart home environment.
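The triggering step described above can be sketched roughly as follows. The hue parameters and audio-file assignments for a few entries are taken from Table 2 below; the class, method names and device API are our own assumptions, not the authors' implementation.

import java.util.Map;

// Sketch of the stimulation trigger: when the classifier recognises a visitor or the
// resident nears a tagged object, set the hue light and play the mapped audio file
// in the room where the PIR sensors locate the resident.
public class StimulationTrigger {
    record Stimulus(int c, int s, int b, String audioFile) {}

    // person/object -> stimulus mapping (subset of Table 2)
    private static final Map<String, Stimulus> STIMULI = Map.of(
        "Person 2", new Stimulus(35, 40, 60, "File 2"),
        "Person 3", new Stimulus(45, 59, 72, "File 3"),
        "Person 4", new Stimulus(143, 29, 40, "File 4"),
        "Person 5", new Stimulus(219, 61, 59, "File 5"),
        "Sofa Set", new Stimulus(55, 62, 59, "File 6"),
        "Bed",      new Stimulus(95, 25, 35, "File 7"));

    public void trigger(String recognised, String residentRoom) {
        Stimulus s = STIMULI.get(recognised);
        if (s == null) return;                       // unknown person/object
        setHueLight(residentRoom, s.c(), s.s(), s.b());
        playAudio(residentRoom, s.audioFile());
    }

    private void setHueLight(String room, int c, int s, int b) {
        System.out.printf("hue light in %s -> C=%d S=%d B=%d%n", room, c, s, b);
    }
    private void playAudio(String room, String file) {
        System.out.printf("speaker in %s -> play %s%n", room, file);
    }
}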
Fig. 4. (a) Visitor identification by fog node (b) Fragment of PhilipsHueLight database table
which stores hue light values
The developed system was used continuously for one month, and sensing infor-
mation along with the corresponding stimulated audio, lighting information is stored in
the database for verifying the functionality of the system. Figure 4 (b) shows the
snapshot of the fragment of PhilipsHueLight database table, which stores hue light
values. Table 1 shows the result of the classifier algorithm running on fog node for one
particular day. Out of five trained persons, the classifier could classify four persons
correctly based on the camera picture taken by sensing unit type#1. It wrongly clas-
sified person#1 as person#4 at time stamp 2018-12-03 19:15:32.
Table 2 shows the stimulated audio and lighting information given to sensing unit
type#2 when the system recognizes the visitors and the resident nears the domestic
Table 1. Classification of the persons based on the images captured by sensing unit type#1
Person arrived Arrival timestamp Classifier output
Person 4 2018-12-03 10:05:03 Person 4
Person 2 2018-12-03 10:20:15 Person 2
Person 4 2018-12-03 11:03:16 Person 4
Person 3 2018-12-03 11:30:21 Person 3
Person 5 2018-12-03 12:30:45 Person 5
Person 2 2018-12-03 15:02:32 Person 2
Person 4 2018-12-03 15:16:45 Person 4
Person 1 2018-12-03 19:15:32 Person 4
Person 3 2018-12-03 19:25:22 Person 3
objects under consideration. The system could produce correct audio and lighting
stimulation for four persons (#2, #3, #4, #5), and failed to give for person#1, as he was
incorrectly classified as person#4 by the classifier in the fog node, and the audio,
lighting stimulation for person#4 was triggered when person#1 visits the house.
Additionally, when person#3 visits the house second time, even though he was rec-
ognized correctly by fog gateway, the exact location of the resident was wrongly
identified as the living room by PIR motion sensors. Hence, audio and lighting stim-
ulation for person#3 was given wrongly in the living room instead of the bedroom. The
audio and lighting stimulation for sofa set and bed was triggered correctly when the
dementia person approaches these objects.
Fig. 5. (a) An instance of the configuration panel in openHAB (b) Visual stimuli given for
dementia person in the kitchen for person#4, in the living room for person#5, in the living room
for sofa set
Table 2. Stimulated audio and lighting information for different visitors and objects in the smart home on one particular day (C - Color, S - Saturation, B - Brightness)

Exact location    Location identified   Person/Household   Hue light parameters   Audio file   Triggering action
of the resident   by the PIR sensor     object             C    S    B            played       correctness
Kitchen           Kitchen               Person 4           143  29   40           File 4       Yes
Kitchen           Kitchen               Person 2           35   40   60           File 2       Yes
Living room       Living room           Sofa Set           55   62   59           File 6       Yes
Living room       Living room           Person 4           143  29   40           File 4       Yes
Kitchen           Kitchen               Person 3           45   59   72           File 3       Yes
Living room       Living room           Sofa Set           55   62   59           File 6       Yes
Living room       Living room           Person 5           219  61   59           File 5       Yes
Living room       Living room           Person 2           35   40   60           File 2       Yes
Living room       Living room           Person 4           143  29   40           File 4       Yes
Bed room          Bed room              Bed                95   25   35           File 7       Yes
Bed room          Bed room              Person 1           143  29   40           File 4       No
Bed room          Living room           Person 3           45   59   72           File 3       No
Fog computing extends the concept of cloud computing, by pushing the data pro-
cessing towards the network edge, making it ideal for the internet of things (IoT) and
other applications that require quick response times. In this research work, we proposed
a framework for audio and lighting stimulation program for the elderly using fog
computing model in the smart home environment. The machine learning classification
model is trained on the fog node using images of known persons. When anyone from
this trained group is sensed or the resident nears domestic objects under consideration,
appropriate audio and lighting stimulation is given, which helped the elderly to rec-
ognize them quickly using music and lighting therapy.
At present, the system could classify the people from the trained group, and give
the relevant music and lighting stimuli. The proposed system could be extended to get
the model trained to recognize the unfamiliar persons and automatically train itself to
trigger the relevant stimuli.
References
1. World Alzheimer Report 2018: the state of the art of dementia research: new frontiers.
https://www.alz.co.uk/research/WorldAlzheimerReport2018.pdf. Last accessed 20 Jan 2019
2. Samvedna Senior Care. https://www.samvednacare.com/blog/2018/04/09/5-types-of-multi-
sensory-stimulation-for-dementia-patients/. Last accessed 10 Jan 2019
3. Dzulkifli MA, Mustafar MF (2013) The influence of colour on memory Performance: a
review. Malays J Med Sci 20(2):3–9
4. Wichmann FA, Sharpe LT, Gegenfurtner KR (2002) The contributions of color to
recognition memory for natural scenes. J Exp Psychol: Learn, Mem, Cogn 28(3):509–520
5. Auditory Stimulation for Alzheimer’s Disease and Dementia. https://best-alzheimers-
products.com/auditory-stimulation.html. Last accessed 18 Nov 2018
6. Gayathri KS, Easwarakumar KS (2016) Intelligent decision support system for dementia
care through smart home. In: 6th international conference on advances in computing &
communications, Cochin, pp 947–955
7. Shi W, Cao J, Zhang Q, Li Y, Xu L (2016) Edge computing: vision and challenges. IEEE
Internet Things J 3(5):637–646
8. Aazam M, Zeadally S, Harrass KA (2018) Fog computing architecture, evaluation, and
future research directions. IEEE Commun Mag 56(5):46–52
9. Wang J, Zhang Z, Li B, Lee S, Sherratt RS (2014) An enhanced fall detection system for
elderly person monitoring using consumer home networks. IEEE Trans Consum Electron
60(1):23–29
10. Suryadevara NK, Mukhopadhyay SC (2012) Wireless sensor network based home
monitoring system for wellness determination of elderly. IEEE Sens J 12(6):1965–1972
11. Jean-Baptiste EMD, Mihailidis A (2017) Benefits of automatic human action recognition in
an assistive system for people with dementia. In: IEEE Canada international humanitarian
technology conference (IHTC), pp 61–65
12. Zhang S, McClean S, Scotney B, Hong X, Nugent C, Mulvenna M (2008) Decision support
for alzheimers patients in smart homes. In: 21st IEEE international symposium on computer-
based medical systems, pp 236–241
13. Amiribesheli M, Bouchachia A (2016) Towards dementia-friendly smart homes. In: 40th
IEEE computer software and applications conference (COMPSAC), pp 638–647
14. Khan SS, Zhu T, Ye B, Mihailidis A, Iaboni A, Newman K, Wang AH, Martin LS (2017)
DAAD: a framework for detecting agitation and aggression in people living with dementia
using a novel multi-modal sensor network. In: IEEE international conference on data mining
workshops (ICDMW), pp 703–710
15. Mulvenna M, Zheng H, Bond R, McAllister P, Wang H, Riestra R (2017) Participatory
design-based requirements elicitation involving people living with dementia towards a
home-based platform to monitor emotional wellbeing. In: IEEE international conference on
bioinformatics and biomedicine (BIBM), pp 2026–2030
16. Schinle M, Papantonis I, Stork W (2018) Personalization of monitoring system parameters to
support ambulatory care for dementia patients. In: IEEE sensors applications symposium
(SAS), pp 1–6
17. Ritcher J, Findeisen M, Hirtz G (2015) Assessment and care system based on people
detection for elderly suffering from dementia. In: 5th IEEE international conference on
consumer electronics (ICCE), Berlin, pp 59–63
18. Kasliwal MH, Patil HY (2017) Smart location tracking system for dementia patients. In:
International conference on advances in computing, communication and control (ICAC3),
pp 1–6
1 Introduction
Genetic algorithms [1–4] are a type of metaheuristic optimization algorithm and a subset of evolutionary computation [5]. Combinatorial optimization problems such as TSP [6] have often been considered solvable by genetic algorithms for effectively large and complex data spaces [7]. Genetic algorithms leverage the ideas of biological evolution by considering members of the data space as DNA strands containing a genetic encoding, which are then selected as parents, crossed and mutated to produce offspring for subsequent generations.
The basic principles of a genetic algorithm are selection and reproduction. The objective of selection is to reduce the search space, while the objective of reproduction is to expand it. The selection process selects two parents from the population for reproduction. Reproduction includes crossover and mutation. Crossover is the process of selecting two parents and producing a new child or offspring. The mutation operator tweaks one or more genes on a chromosome to obtain a new solution.
With regard to this, several selection techniques have been devised such as
Tournament, Ranking and Best Solution etc. However, these strategies have their own
individual trade-offs and the ideal strategy to be applied is not only problem-specific
but also dependent on the crossover and mutation operators that are applied alongside
them [8, 11].
2 Theoretical Background
3 Proposed Approach
The proposed approach involves the combination of selection, crossover and mutation procedures as per traditional gGAs. However, in our methodology we select two selection, two crossover and two mutation operators that are applied on the dataset according to a threshold value (Ts). Before the threshold point one particular operator is applied, and then a switch is made to the second operator after the threshold point. Experimentally, it has been found that this Ts is universally applicable to all three operators.
Selection of the suitable operators considered for the hybrid is based on the following criteria:
• Maintaining diversity of the population
• Allowing for elitist selection
• Preventing early convergence
• Obtaining a high convergence rate
Random points c2–c5 are chosen from parent-1 and the genes 2, 3, 1 and 4 at those positions are copied into the offspring. Then the missing chromosome positions are filled from parent-2: 6, 8, 7, 9 and 5 are filled, as sketched below.
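The following is a minimal sketch of such an ordered crossover; it is our own illustration (it fills the child left to right outside the copied slice, one common variant of the operator), and the example tours are assumed values chosen so that the filled genes come out as 2, 3, 1, 4 and 6, 8, 7, 9, 5.

import java.util.Arrays;

// Ordered-crossover sketch: copy a slice from parent-1, then fill the remaining
// positions with parent-2's genes in order, skipping genes already present.
public class OrderedCrossover {
    static int[] crossover(int[] p1, int[] p2, int cut1, int cut2) {
        int n = p1.length;
        int[] child = new int[n];
        Arrays.fill(child, -1);
        boolean[] used = new boolean[n + 1];          // genes assumed to be 1..n
        for (int i = cut1; i <= cut2; i++) {          // copy slice from parent-1
            child[i] = p1[i];
            used[p1[i]] = true;
        }
        int pos = 0;
        for (int gene : p2) {                         // fill the rest from parent-2
            if (used[gene]) continue;
            while (pos >= cut1 && pos <= cut2) pos++; // jump over the copied slice
            child[pos++] = gene;
        }
        return child;
    }

    public static void main(String[] args) {
        int[] p1 = {5, 2, 3, 1, 4, 6, 8, 7, 9};       // assumed example tours
        int[] p2 = {6, 8, 2, 7, 9, 3, 5, 4, 1};
        // slice 2,3,1,4 (positions c2-c5) is kept; 6,8,7,9,5 fill the other positions
        System.out.println(Arrays.toString(crossover(p1, p2, 1, 4)));
    }
}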
[Figure: convergence rate (90-100%) of GA vs. HGA for threshold values from 10 to 60, with the chosen threshold marked.]
The experiment is run with different threshold values on TSP, and it was found that the convergence rate is better when the threshold is at twenty or thirty percent of the number of operations.
4 Algorithm
First, a solution of the TSP is coded as an array of integers. This array is called a chromosome and its length is the number of cities. Solutions are generated and the fitness of individuals is evaluated as the sum of the Euclidean distances between the cities of the solution. After that, tournament selection (s = 3) is used until the selection threshold, and then best-two selection is used. Then PMX crossover is performed for some iterations, after which ordered crossover is performed. Similarly, inverse and then swap mutation are performed, as sketched below.
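The operator switch at the threshold can be sketched as follows; this is our own illustration of the scheme described above, with the concrete operators left as placeholders.

// Sketch of the threshold-based operator switch: before the threshold generation one
// operator of each kind is used, after it the second operator is used.
public class HybridGa {
    interface Selection { int[] select(int[][] population); }
    interface Crossover { int[] cross(int[] p1, int[] p2); }
    interface Mutation  { void mutate(int[] tour); }

    int[][] evolve(int[][] population, int generations,
                   Selection tournament, Selection bestTwo,
                   Crossover pmx, Crossover ordered,
                   Mutation inverse, Mutation swap,
                   double threshold /* e.g. 0.3 for Ts = Tc = Tm = 30% */) {
        int switchPoint = (int) (generations * threshold);
        for (int g = 0; g < generations; g++) {
            boolean beforeThreshold = g < switchPoint;
            Selection sel = beforeThreshold ? tournament : bestTwo;
            Crossover xo  = beforeThreshold ? pmx        : ordered;
            Mutation  mut = beforeThreshold ? inverse    : swap;

            int[] parent1 = sel.select(population);
            int[] parent2 = sel.select(population);
            int[] child   = xo.cross(parent1, parent2);
            mut.mutate(child);
            // ... evaluate fitness and insert the child into the next generation
        }
        return population;
    }
}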
The hybrid system was tested against standard TSPLIB data sets with the following set
parameters:
Tournament Size (s) = 3
Population Size = 100
Threshold Values (Ts, Tc, Tm): 30%
Crossover Probability (Pc) = 0.95
Mutation Probability (Pm) = 0.1
The tournament size (s) is set to 3 although a maximum tournament size of 5 can be
set when combined with two crossovers [8]. This size is set to prevent any early
convergence instances. The threshold value (Ts) is set to the experimentally determined optimum. The population size is set to ten times the chromosome length with a
relatively high Pc and a low Pm values. Due to the stochastic nature of GA, the
algorithm is run ten times and the best result in the ten runs is noted.
In the experiment the convergence rate is calculated as follows:

Convergence Rate = (1 - |fitness - optimal| / optimal) × 100
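As a small check of the formula as reconstructed above (the numbers are taken from the results table below):

// Convergence rate applied to two rows of the results table.
public class ConvergenceRate {
    static double rate(double fitness, double optimal) {
        return (1.0 - Math.abs(fitness - optimal) / optimal) * 100.0;
    }
    public static void main(String[] args) {
        System.out.printf("berlin52: %.2f%%%n", rate(7773, 7542));  // about 96.9%
        System.out.printf("eil51:    %.2f%%%n", rate(426, 435));    // about 97.9%
    }
}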
Sn. no. Dataset Optimal solution Obtained best solution Convergence rate
1. burma14 30.8 30.8 100.00%
2. dantzig42 679 685 99.11%
3. berlin52 7542 7773 96.90%
4. eil51 435 426 97.89%
5. eil101 629 673 93.10%
6. eil76 538 553 97.20%
The obtained results depict that a HGA has the potential to be worked upon using
different operator selection strategies and threshold values to design a highly accurate
gGA. Further work on the topic would include testing TSPLIB against varying tour-
nament values and operator selection strategies. It would also be interesting to see the
results that are obtained when this method is applied against other problem statements.
Acknowledgments. The author is grateful to Rishab Rakshit, a student of SMIT, who did the simulation in the summer project '16 at MUJ.
References
1. Goldberg DE (2006) Genetic algorithms. Pearson Education, India
2. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multi-objective genetic
algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
3. Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a
genetic algorithm for flexible docking. J Mol Biol 267(3):727–748
4. Chambers L (1995) Practical handbook of genetic algorithms: applications. CRC Press, Boca
Raton
5. Thomas B, Fogel DB, Michalewicz Z (1997) Handbook of evolutionary computation. IOP
Publishing Ltd and Oxford University Press, Oxford
6. Mühlenbein H, Gorges-Schleuter M, Krämer O (1988) Evolution algorithms in combina-
torial optimization. Parallel Comput 7(1):65–85
7. Katayama K, Sakamoto H, Narihisa H (2000) The efficiency of hybrid mutation genetic
algorithm for the travelling salesman problem. Math Comput Model 31(10-12):197–203
8. Miller BL, Goldberg DE (1995) Genetic algorithms, tournament selection, and the effects of
noise. Complex Syst 9(3):193–212
9. Reinelt G (1991) TSPLIB a traveling salesman problem library. ORSA J Comput 3(4):
376–384
10. Holland JH (1975) Adaptation in natural and artificial systems. An introductory analysis
with application to biology, control, and artificial intelligence. University of Michigan Press,
USA
11. Padmavathi K, Yadlapalli P (2017) Crossover operators in genetic algorithms: a review. Int J
Comput Appl 162(10):34–36
12. Sivanandam SN, Deepa SN (2007) Introduction to genetic algorithms. Springer, Heidelberg.
https://doi.org/10.1007/978-3-540-73190-0
Critical Evaluation of Predictive Analytics
Techniques for the Design of Knowledge Base
Abstract. The present diagnosis methods in medical fields are aided very much by cluster analysis methods. Data summarization techniques are used to discover the hidden patterns in huge datasets. They may be used for future interpretation in diverse aspects in different environments. In the context of medical databases, the enormous growth of medical information and its corresponding use for disease diagnosis is a strenuous process. Therefore, disease diagnosis systems require conventional data analysis combined with proficient knowledge of different diseases. Recent developments in data segmentation techniques may be used to analyze the reports of liver patients together with trends of the diseases and standard processes for resource utilization in health care problems. Development of a new system based on the above analysis in turn assists the physician in better diagnosis of disease. In the present paper, various classification techniques are applied to predict disorders in liver function accurately. The present paper is aimed at proposing a new method for the prediction of the diseases with better accuracy than the existing traditional classification algorithms. It was found that these results are very promising and more accurate.
1 Introduction
Popular Data mining techniques are applied in diverse applications in the areas of
Medical diagnosis and Information Retrieval [1]. Artificial Intelligence is a branch of
Computer Science and it helps the computers to provide intelligent behavior. Learning
is one of the basic requirements for any system to exhibit intelligence. Machine
learning, indeed, is the most rapidly developing branch in AI research. Machine
Learning methods are developed to analyze the large medical datasets. In review of
these methodologies, it was found that several Clustering Techniques are built for
addressing the various problems in the classification systems developed for medical
resources analysis [2]. It is obvious to mention that decisions are made based on the
recent developments in clustering methods on medical datasets. These developments have not yet been incorporated into the literature of wide-scale medical applications. The
present study is focused on cluster analysis for the development of medical diagnosis
systems to assist physicians. The primary idea of cluster analysis is not only separating
and grouping of distinguish compatible units from differing units but also developing a
new system for the thinking process using the logical conclusions deducted from them.
Rai and Ramakrishna Murthy [3, 4] worked on cluster analysis to group large numbers of objects (persons or jobs) into smaller numbers of mutually exclusive classes whose members have similar properties. They developed cluster configurations with unique clusters based on a similarity metric, such that each object would be classified into only a single unit. In the present paper, Clustering
methods are used to develop an efficient and reliable algorithm to derive a training
sample to deduce accurate predictions in the pathological status of the patient and to
design an advisory system based on MLC methods.
2 Related Work
Currently, massive data is collected by specialists from clinical laboratories and stored in large databases using imaging equipment and the examination of blood and other samples.
The Data must be evaluated for extracting the available valuable information in the
form of patterns. The extracted information should be matched to particular pathology
during the diagnosis.
In the design of Computer Aided Diagnosis Systems, AI algorithms are used for extracting information and regularities and for predicting disease trends while avoiding wrong diagnoses in routine practice. These systems are also used for dealing with special cases from
patient records stored in medical databases.
Intelligent techniques used in the data analysis include Time Series Analysis, Data
Visualization, Clustering and other Machine learning techniques.
Lavrace et al. [1] explained different data mining techniques applied in various
application areas. They developed some applications on selected data mining tech-
niques in medicine. They stated that those techniques are appropriate for the analysis of
medical databases. Clustering plays a major role among all the data analysis methods
for dealing with non-label massive datasets.
Xu et al. [2] and Rai et al. [3] discussed different cluster analysis methods in several applications. Ramakrishna Murthy [4] proposed a method to improve accuracy and to
reduce dimensionality of a large text data in cluster analysis. They used Least Square
methods and Support Vector Machines using singular value decomposition.
Dilts [6] discussed the critical issues faced by investigators in the medical resource-utilization domain and also their importance in validation [6, 7].
Hayashi et al. [7] evaluated the performance of various cluster analysis methods by
applying on various datasets based on cross validation technique. Kononenko et al. [8]
discussed the Medical diagnosis application using machine learning techniques. They
provided an overview data analysis and classification methods in the design and
development of Medical diagnosis systems that possesses intelligence.
Gaal [9] proposed some cluster and classification methods for use in the development of Medical Expert Systems to assist physicians by providing automated advice.
Ramana et al. [11] worked on the analysis of patient data stored in medical records
using the concept of learning from past experiences. They opined that this analysis
improves medical diagnosis more accurately. They applied several classification
algorithms on ILPD Dataset [10]. They include, Decision Tree Classification, Bayesian
Classification and Back Propagation, Support Vector Machines (SVM) and Association
rule mining.
Ramana et al. [12] also applied the above classifier on UCI dataset [10] and
evaluated the parameters: Sensitivity, Precision, Accuracy and Specificity. They con-
sidered some classification algorithms among the different traditional classification
strategies and found their performances. They opined that relevant data extracted from
liver patient datasets may be used in medical diagnosis and other applications. Finally,
they concluded that Naive Bayesian Classifier can be applied for processing the
numerical attributes and it is very useful for the prediction of labels.
3 Problem Definition
The proposed model is aimed at obtaining clean data from the raw data by the
application of preprocessing techniques with increased accuracy in cluster analysis and
also the prediction of class values. Priority is given here to preprocessing to improve
the data grouping and consequently the classification results. The proposed model is
shown in Fig. 1. The proposed model is divided into four phases. In the first phase, the data is divided into a number of clusters using clustering techniques, where each cluster has a manageable number of elements, such as sub-clusters. A variety of clusters are identified as the end result of this phase. In the second phase, classification techniques are applied to form new clusters for identifying/assigning individual objects to the predetermined classes based on specific criteria. The outcome of this phase is that a variety of classes are formed. In the third phase, feature selection techniques are applied to find the dominating attributes by minimizing the dimensionality of each class. In the fourth phase, these classes are used for deriving the expert system rules and for taking the expert doctor's advice on disease identification for particular classes. In the above phases of clustering, classification, and feature selection, the selected techniques are tested and the best techniques are applied.
4 Experimental Dataset
In this paper two data sets are combined to form a new dataset with 1083 patients’
records of which 651 are liver disease patients and 432 are non-liver disease patients.
The first dataset is ILPD dataset which is a part of UCI Machine Learning Repository
data set [8] comprising of 583 liver patient’s records with 10 attributes (obtained from
eight blood tests). The second data set is Physically collected dataset comprising of 500
records collected from various pathological labs in south India. It contains 13 attributes
(obtained from ten blood tests). The common attributes in these two datasets are: Age,
Gender, TB, DB, ALB, SGPT, SGOT, TP, A/G ratio and Alkphos. Out of these
attributes TB (Total Bilirubin), DB (Direct Bilirubin), TP (Total Proteins), ALB
(albumin), A/G ratio, SGPT, SGOT and Alkphos are related to liver function tests and
used to measure the levels of enzymes, proteins and bilirubin. These attributes help in the diagnosis of liver disease. The description of the Liver Dataset attributes and
Normal values of attributes are shown in Table 1.
5 Methodology
5.1 Clustering Analysis
The proposed Improved Bisecting k-Means (IBKM) algorithm is considered for the analysis. This algorithm is implemented and compared with selected clustering algorithms, namely the k-Means, Agglomerative Nesting, DBSCAN, OPTICS, Expectation Maximization, COBWEB, Farthest First and Bisecting k-Means algorithms. The evaluation of these clustering algorithms is performed based on Accuracy, Entropy, F-Measure, and Purity on the Liver Dataset, and it was shown that IBKM performed better than the selected clustering algorithms in this study [5]. For this reason, the IBKM clustering algorithm is used for clustering the Liver Dataset into manageable clusters.
5.2 Classification
Selected Classification algorithms are used for prediction of the class label. They are
Naïve Bayes’ Classification algorithm, C4.5 Decision tree Classification algorithm, k-
Nearest Neighbor Classification algorithm, Support Vector Machines (SVM) Classifi-
cation algorithm, ID3 Classification algorithm, and Random Forest Classification
algorithm.
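A rough sketch of this cluster-then-classify pipeline in Weka follows. The proposed IBKM algorithm is not a stock Weka clusterer, so the sketch substitutes SimpleKMeans with k = 6 purely for illustration; the file name and the position of the class attribute are assumptions.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Illustrative Weka pipeline: cluster the liver data into 6 groups, then evaluate a
// Random Forest classifier. SimpleKMeans stands in for the paper's IBKM algorithm.
public class LiverPipeline {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("liver.arff").getDataSet();  // assumed file name

        // Phase 1: clustering (no class attribute is set while clustering)
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(6);
        km.buildClusterer(data);
        System.out.println(km);

        // Phase 2: classification with 10-fold cross-validation
        data.setClassIndex(data.numAttributes() - 1);   // class label assumed to be last
        RandomForest rf = new RandomForest();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(rf, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}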
6 Experimental Results
Weka is a tool that has been used to test the proposed strategy. It is a collection of machine learning algorithms for data mining. The Weka tools cover data preprocessing, classification, regression, clustering, association rules and visualization. When the dataset was preprocessed, 32 outliers were eliminated from the total dataset, and the cleaned dataset was used in the implementation of the proposed model, consisting of three major phases, namely clustering, classification, and feature selection. In the clustering phase, after testing six algorithms, the IBKM clustering algorithm was selected as the best and applied on the liver dataset of 1051 records; the data was clustered into six clusters, i.e. k was set to six. After the clustering, 6 clusters were formed. Results are presented in Table 2.
In the above clustering, C0, C1, C2, C3, C4, C5, C6, C7, C8, C9 are taken as class labels, and the number of records in each class for the different clustering algorithms is shown in Table 2. It can be observed that the Random Forest Classifier gives the highest performance, so the Random Forest Classifier is considered for further analysis. A graphical representation of the results is given in Fig. 2.
7 Conclusion
In this paper a model is proposed for using predictive analytics techniques in effective liver disease diagnosis. The main objective of this study is to compare the results of liver diagnosis using standard classifiers against the proposed method, which gives efficient accuracy. The proposed model has three phases. In the first phase, clustering analysis is done by the IBKM clustering algorithm, which clusters the total liver dataset into 6 clusters; after clustering, clusters 2, 4 and 5 have a larger number of records, so those clusters are split into sub-clusters, finally yielding 10 clusters. In the second phase, classification is done by the Random Forest classification algorithm, using the output of the clustering as input and giving 10 classes. In the third phase, using the output of the classification as input, feature selection algorithms are applied to every class. After obtaining the dominating attributes of every class, the common dominating attributes are taken as the dominating attributes for every class; based on these classes, liver disease diagnosis is done with the help of liver expert doctors (gastroenterologists), who also give suggestions and precautions for those liver diseases. The above expert advice is used to prepare the knowledge base for every class and to label the liver diseases. The knowledge base built with the above model is very important for the efficient development of automatic liver disease diagnosis expert systems.
References
1. Lavrac N (1999) Selected techniques for data mining in medicine. Artif Intell Med 16(1):3–23
2. Xu, R, Wunsch II, D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw
(1045–9227) 16(3):645–678
3. Rai, P, Singh, S: A survey of clustering techniques. Int J Comput Appl 7(12):1–5 (0975 -
8887)
4. Ramakrishna Murty M, Murthy, JVR, Prasad Reddy, PVGD (2011) Text document
classification based on a least square support vector machines with singular value
decomposition. Int J Comput Appl (IJCA) 27(7):21–26
5. Swapna K, Prasad Babu MS (2017) A framework for outlier detection using improved
bisecting k-means clustering algorithm. Int J Electr Comput Sci IJECS-IJENS 17(2):8–12
(161102-5858)
6. Dilts D, Khamalah J, Plotkin A (1995) Using cluster analysis for medical resource decision
making. J Med Decis Mak 15(4):333–346
7. Hayashi C, Yajima K, Bock HH, Ohsumi N, Tanaka Y, Baba Y (eds) (1996) Classification,
and related methods. In: Proceedings of the fifth conference of the international federation of
classification societies (IFSC 1996), Kobe, Japan, 27–30 March 1996
8. Kononenko I (2001) Machine learning for medical diagnosis history state of the art and
perspective. J Artif Intell Med 23(1):89–109
9. Gaal B (2009) Multi-level genetic algorithms and expert system for health promotion. PhD
thesis, University of Pannonia
10. Machine Learning Repository http://archive.ics.uci.edu/ml/about.html
11. Ramana BV, Prasad Babu MS, Venkateswarlu NB (2012) A critical comparative study of
liver patients from USA and INDIA: an exploratory analysis. Int J Comput Sci 9(2). ISSN:
1694-0784. ISSN (Online): 1694-0814
12. Ramana BV, Prasad Babu MS, Venkateswarlu, NB (2011) A critical study of selected
classification algorithms for liver disease diagnosis. Int J Database Manag Syst (IJDMS) 3
(2):101–114. ISSN: 0975-5705
Beyond the Hype: Internet of Things Concepts,
Security and Privacy Concerns
The Internet emerged in 1990 (with the creation of the World Wide Web (W3), a network of networks, a concept created by Tim Berners-Lee [1]) but has received attention from the majority of customers only during the past decade. Today's Internet has become the richest source of information and is utilized by multiple devices, for example for finding a path or a hotel over the road network (with the help of the Global Positioning System (GPS)). The Internet saw its first revolution from 1990 to 2000 and its second revolution from 2001 to 2010, while its third revolution has been under way since 2011. Till 2010, the Internet was a universe of interlinked humans creating new generations of interactive experiences, but today it has moved to a new level, i.e., providing or sharing information with the internet of things or smart devices. Internet of Things (IoT) or Internet-connected devices are devices (with a sensing feature) that use the Internet network to communicate or to provide a better life experience to
human beings. In general, IoT or smart devices embody the concept of connecting smart objects that operate together to solve specific real-world problems. With respect to this, IoT or Internet-connected devices have become popular since 2011, i.e., with the rapid development of small low-cost sensors, wireless communication technologies, and new Internet techniques. Today, several applications use IoT devices:
intelligent transportation, communication, e-healthcare, smart home, smart animal
farming, finance, retail, environmental monitoring, etc. [2]. Hence we can say that (as discussed previously) nowadays we are living in the third revolution of the Internet, where the internet is connected with IoT, "a world of networked smart devices equipped with sensors, connected to the Internet, all sharing information with each other without human intervention". Note that the term "Internet of Things" was first coined by the cofounder and Executive Director of MIT's Auto-ID lab, Kevin Ashton, in the mid 1990s [3]. Various definitions have been given by various scientists and researchers in the past decade, but the most accepted definition is "Intelligent interactivity between
Hence, with the rise of connected devices (with internet of things) and connected
individuals (systems/devices with sensor), we received combination of four terms (i.e.,
big data, cloud, social media, and mobile devices and things) or technologies (tech-
nology pillars) which works as fuel to industries and help IoTs to re-shape [5]. All these
four pillars (of technology) (big data, cloud, social media, and mobile devices and
things) are interconnected with each other and work efficiently (with respect to cost and
accessing to find/search records over records or a database). But, IoTs have different
views/aspects for these pillars, i.e., tracking or leaking information of user to malicious
users (by malicious systems/devices, for example, in a Hollywood movie “Eagle Eye”,
computer systems or army personal track a user or its location everywhere with the help
of small drone) [5]. Basically, this issue (with people) is a long (complex) process/task
to discuss and highly critical to taking care of/focus, so this issue is being discussed in
further sections with several suggestions and technology (also refer Table 1). Apart
that, discussing IoT features by providing total number of connected
devices/connections makes IoT explanation easier. In general, IoT is a complex eco-
system encompassing all aspects of the Internet, including analytics, the cloud,
application, security, etc. In technical words, connecting devices/things with internet
used three main technology components (i.e., physical devices with sensors/connected
things, connection and infrastructure, and analytics and applications) [5]. Today’s
Internet of Things are providing several benefits to users/customers through several
applications like smart farming, smart parking, smart homes, etc. IoT devices have the
potential to change the ways of communication completely (with people and tech-
nology). In future, IoTs is likely to mix the virtual and physical worlds together to
provide an easy and comfort experience to everyone (organizations or people). But, the
items which will contain sensors to produce data (home, the car, and with wearables
Beyond the Hype: Internet of Things Concepts 395
and ingestible, even the body puts particular) will put several security and privacy
challenges. Now day’s physical objects are increasing with a high rate in developed
countries (including developing countries) to provide growth/progress to respective
nation/country. Human in these countries uses IoT devices to increasing productivity of
their business (e.g., retail, e-commerce, etc.) or protect themselves (e.g., smart home),
etc. These increasingly internet connected devices detect and share each and every
observations about us, and store this information in the respective database/server (to
which these devices are interconnected). But, note that here these devices comes with a
critical issue, i.e., ‘privacy’. All users require/want privacy all the time or want to
protect their personal information from outside world/unknown user/strangers.
Nowadays Internet of Things (IoT) devices are used much more in our (human) lives and real-world applications than mobile phones, but both kinds of devices contain our personal data like contacts, messages, social security numbers and banking information, and even record every activity made by us online (on the internet or by devices) or offline. They also access records which run offline on our systems/mobiles/devices (in the backend). Also, the various security concerns that apply to a single device, for example a mobile phone, can quickly turn into 50 or 60 concerns [6] when considering multiple IoT devices in an interconnected home or business (e.g., cost, time, security, privacy, etc.). Importantly, we need to find out what IoT devices have access to (i.e., we need to understand their security risk). Note that the growth in IoT/connected devices has accelerated over the last decade. With this, IoT has also increased the potential attack surface for hackers and other cyber criminals (e.g., ransomware attacks affected millions of computers in 2017 and stole terabytes of data). More devices connected online (to the Internet) means more devices require protection, whereas IoT systems are not usually designed for cyber-security. Currently, the number of cyber-criminals/attackers is increasing every day, the data breaches caused by them are increasing every day, and this will continue in the future. Several other issues (for mobile security) are already a challenge with respect to connected devices (i.e., IoT) and will continue in the future. For example, 10 connected IoT devices may not create problems for a user, but if IoT devices number in the billions and are connected together, then each one represents a potential doorway into our IT infrastructure and our company or personal data. Think: "How much data can these connected devices collect?" Note that when internet of things devices connect together, a lot of data will be generated and collected at several locations for making valuable decisions in the future for various applications/areas like automated home appliances, defence, smart grids and high-resolution assets. Storing similar data at several locations works as a backup in case of emergency. These concerns require new methods, strategies and regulations to protect the IoT ecosystem (i.e., by incorporating security and privacy into its design and implementation (refer to Sects. 3 and 4)). Note that the IoT ecosystem is a network of many internet-connected things which connect together and share information with each other.
Hence, the remaining part of this work is organized as follows: Sect. 2 investigates several threats in the Internet of Things ecosystem and discusses an analysis of these attacks with possible solutions/countermeasures. Some critical challenges are then included in Sect. 3. Section 4 provides many suggestions and techniques (methods) to secure an IoT ecosystem. Section 5 tells us what we can do, and which solutions exist, to avoid being tracked or trapped by IoT devices. Finally, this work is concluded (in brief) with some future remarks in Sect. 6.
The number of IoT/connected devices is growing and will keep growing over the next decades. Many things/devices are connected to the Internet nowadays (and this number keeps increasing); these IoT devices provide a virtual counterpart to a human being or a physical object, but when they are used as services within applications, this virtual form starts to interact and exchange essential or important information (about the respective users of these devices), and these devices make useful decisions based on this collected information/data. IoT threats can be categorized into four types: privacy, security, trust and safety. Under security, denial of service and other attacks are possible in IoT. Under privacy, attacks on personal information such as background-knowledge, timing or transition attacks can be carried out by cyber criminals. IoT also leads to several physical threats in national projects/departments/areas, for example the automation industry (cars and homes), the environment, power, water and food supply, etc. Note that when many applications interconnect with these devices to make a smart environment (device to device or machine to machine), we need to consider security (physical) and privacy (data, identity and location). When IoT devices are located or used in sensitive areas like e-healthcare, they may get tampered with or accessed by an individual attacker or a group of attackers for financial gain (to read or change data) [6]. With such attacks/access, an attacker could control a system built on IoT and change its functionality accordingly. For example, in 2010 the Stuxnet worm was spread by attackers in Iran to control and damage its nuclear enrichment facilities. Internet of Things security is no longer a foggy future issue [6, 7], as more and more such devices enter the market and our lives, from self-parking cars to home automation systems to wearable smart devices. Nowadays there are (and in future there will be) so many sensors and so many devices that they are constantly sensing you; they are always around you to track your footprint, tracing your every movement/task (made online or offline) all the time. So, we need to be aware of such types of attacks/tracking.
systems). For example, in the Hollywood movie "I.T" (released in 2016), an attacker tracks every movement of a victim (from a remote location) and tries to blackmail him. In the same movie, the attacker also tries to control the victim's home automation, phone and other Internet-connected devices. Through this, the attacker puts pressure on the victim to accept his proposal, or blackmails another person for financial gain. Moreover, nowadays we are already seeing hacked TV sets, power meters (used to steal electric power), smart phones, video cameras and child monitors [5, 12]. Hacking such Internet-connected devices has raised serious privacy concerns. Today we can imagine a worm that would compromise large numbers of these Internet-connected devices (on a large scale) and control them via a botnet [15] or a collection of infected computer systems (e.g., the WannaCry ransomware attack, HTTP bots, etc.). It is not just the value or power of the device that an attacker/malicious user wants; an attacker may want to slow down network bandwidth through a DDoS (Distributed Denial of Service) attack. Note that the biggest issue here is not only the security of IoT devices: the privacy issue (collected information leaked by devices to other connected devices) is also a great concern. Also, with low bandwidth, an attacker can compromise a device and use it against a user or to attack a third party. Now imagine a botnet of 100,000,000 IoT devices all making legitimate requests to your corporate website at the same time. As a result, the website will slow down and will not work properly. With such incidents, in the near future IoT will create unique and complex security and privacy challenges for several industries/organizations.
Also, machines are becoming autonomous, so they will be able to interact with other machines (in the near future) and are free to make decisions that may impact the physical world. Consider, for example, problems with automatic trading software, which can get trapped in a loop causing market drops. Such systems may have fail-safes built in, but these are coded by humans who are fallible (i.e., error-prone), especially when writing code that works at the speed and frequency at which computer programs can operate. Suppose a power system were hacked by attackers and they turned off the lights of an area or a city. This is not a big problem for many users, but it matters for the thousands of people present in subway stations (hundreds of feet underground in pitch darkness); for them this issue becomes highly critical. Such issues really require attention from research communities in the near future. Hence, Internet-Connected Things allow the virtual world to interact with the physical world to do things smartly, and that comes with several safety issues.
alters the integrity of data/modifies the data) and an extremely high-level attack (where an intruder/attacker/eavesdropper attacks a network with unauthorized access and performs an illegal operation, i.e., making the respective network unavailable, sending bulk messages to other users, or jamming the network). Apart from such attacks, the IoT faces various other types of attack, including active attacks and passive attacks [10, 11], which may easily disturb the functionality and abolish the services of a communication link/network. Note that in a passive attack an attacker just senses messages (passing through) or may steal information, but never attacks physically (this is similar to a medium-level attack). On the other side, in the case of active attacks, the attacker physically disturbs the performance of a network/communication (this is similar to an extremely high-level attack). In general, active attacks can be classified into two categories, i.e., internal attacks and external attacks [11]. Devices can be protected against such attacks through proper awareness and by communicating smartly through these devices. Hence, security constraints must be applied to prevent devices from malicious attacks.
Different types of attack, the nature/behavior of each attack and its threat level, together with possible solutions, are summarized in Table 1. Hence, this section has discussed several threats investigated in IoT, such as route diversion, eavesdropping, DoS, etc., with their behavior, threat level and possible solutions. The next section will discuss several common challenges in the Internet of Things in detail.
Table 1. Different types of attack with possible solutions for the respective attacks

Type | Threat level | Behavior | Possible solution
Passive | Low | Used to identify information about the target node; examples include passive eavesdropping and traffic analysis. The intruder silently monitors the communication for his own benefit without modifying the data | Ensure confidentiality of data and do not allow an attacker to fetch information, using symmetric encryption techniques
Man in the middle | Low to medium | Examples include alteration and eavesdropping. An eavesdropper can silently monitor the transmission medium and, if no encryption is used, can also modify and manipulate the data | Ensure integrity by applying data confidentiality and proper integration; encryption can also be applied to avoid data modification
Eavesdropping | Low to medium | Causes loss of information; for example, in a medical environment the privacy of a patient may be leaked by sensing the transmission medium | Apply an encryption technique on all the devices used in communication
Gathering | Medium to high | Occurs when data is gathered from different wireless or wired media; the collected message may be altered by the intruder. Examples are skimming, tampering and eavesdropping | Encryption, identity-based methods and message authentication codes can be applied in order to protect the network from this type of malicious attack
Active | High | Affects confidentiality and integrity of data. The intruder can alter message integrity, block messages, or re-route messages; it could be an internal attacker | Ensure both confidentiality and integrity of data; symmetric encryption can be applied to preserve data confidentiality, and an authentication mechanism may be applied to avoid unauthorized access
Imitation | High | Impersonation for unauthorized access; spoofing and cloning are examples of this attack. In a spoofing attack a malicious node impersonates another device and launches attacks to steal data or to spread malware; cloning re-writes or duplicates data | To avoid spoofing and cloning attacks, apply identity-based authentication protocols; an un-clonable function can be used as a countermeasure for cloning attacks
ePrivacy | High | Intruders fetch the sensitive information of an individual or group; such attacks may be correlated with a gathering attack or may cause an imitation attack that can further lead to exposure of privacy | Anonymous data transmission, or transmission of sample data instead of actual data, can help to achieve privacy; techniques like ring signatures and blind signatures can also be applied
Interruption | High | Affects availability of data; makes the network unavailable | Access to data and usage of data is restricted by some authorization technique
Routing diversion | High | Alters the route of transmission to create huge traffic, and hence the response time is increased | Apply connection-oriented services to avoid route diversions
Blocking | Extremely high | A type of DoS, jamming, or malware attack; creates congestion in the network by sending huge streams of data. Similarly, different types of viruses like Trojan horses, worms, and other programs can disturb the network | Firewall protection, packet filtering, anti-jamming, active jamming, and updated antivirus programs protect the network from such attacks
Fabrication | Extremely high | The authenticity of information is destroyed by injecting false data | Data authenticity can be applied to ensure that no information is changed during data transmission
Denial of Service | Extremely high | To disturb the normal functionality of a device, a malicious node creates traffic in the network by retransmitting the same data and injecting bulk messages into the network | Cryptographic techniques help to ensure security of the network; authenticity helps to detect malicious users and block them permanently
As discussed above (in Sect. 2), security, privacy, safety, etc., issues are the biggest challenges to rectify/solve in the IoT ecosystem. In general, challenges are the problems on which research is still going on or questions that still require answers (i.e., still require attention from research communities). These issues and challenges require the attention of researchers and need to be solved to provide trust in devices (through which industry will gain new competencies and capacities) and a higher growth rate of IoT devices. Some major challenges (identified from various areas in the IoT application ecosystem) are:
a. Infrastructure: Today's smart infrastructure, such as smarter cities, smart grids, smart buildings, smart homes, Intelligent Transport Systems (ITS), and ubiquitous healthcare [2], requires safety (it needs to be trustable, mobile, distributed, valuable, and a powerful enabler for these applications) as an essential component. For this, we need to move to the IPv6 addressing mechanism for each IoT device, so that the large number of sensors and smart devices/things can be connected to the Internet. Note that IPv6 is the addressing scheme considered most suitable for IoT, as it offers scalability, flexibility, maturity, extensibility, ubiquity, openness, and end-to-end connectivity. Hence, moving to this new addressing scheme (IPv6) is a major challenge for IoT devices.
b. Data and Information: The large volume of data generated by IoT devices presents a big challenge for service providers in an IoT ecosystem. Big Data has become very important and useful to organizations [13]. To benefit from it, we need to overcome challenges like storing information at a secure place and by a secure mechanism, which will give IoT service providers a boost in analyzing this data and discovering relevant trends and patterns.
c. Computer Attacks: These attacks are the most common threats in an IoT/cloud environment. Attacks such as Denial of Service (DoS), DDoS, etc. spread malware among IoT devices. With such attacks, the attacker exploits or attacks the user's privacy or even modifies the electronic components of the device. Note that server and computer security come under this challenge.
security due to lack of experience and the human factor. Apart from the above points, some other challenges in IoT are: insufficient testing and updating; brute-forcing and the issue of default passwords; IoT malware and ransomware such as WannaCry; IoT botnets aiming at cryptocurrency; data security and privacy issues (mobile, web, cloud); small IoT attacks that escape efficient detection; artificial intelligence and automation; ubiquitous data collection; and the potential for unexpected uses of consumer data. Generally, these Internet-connected devices have the capability to make human lives easier, better and longer. So, if these issues/challenges are not addressed or solved in the near future, then these IoT devices may cause more problems than the benefits they give to human beings.
Hence, this section has discussed several challenges faced by the Internet of Things, such as preserving privacy and maintaining security, the lack of good standards for IoT devices, etc. Continuing from this, the next section will provide some solutions to secure an IoT ecosystem.
In the near future, the Internet of Things will be a game changer for several applications, including business. But along with this, security and privacy issues will also arise on a larger scale and will require attention from manufacturers and research communities. In general, IoT security depends on the ability to identify devices, protect the IoT hosting platform, and protect the data collected by smart/IoT devices, sharing this data only with a Trusted IoT Device (a trusted device is required to be reliably identifiable and associated with a manufacturer/provider; IoT devices should be able to communicate with the intended/authorized hosting services) and a Trusted IoT Master. Here, a trusted master has the knowledge about secure communication with the embedded sensors (in devices/products) and about software issues (i.e., when it needs to be updated and when not). Note that such updates keep these devices secure, with assurances that the code/services in use are authentic, unmodified and non-malicious. Sharing information with trusted entities only increases trust among users and in the technology. We now discuss the necessary tasks/components required to secure an IoT ecosystem.
(across multiple data nodes) in an unprotected manner. So, storing this data in a protected manner and avoiding any possible entry point for malicious users/insiders is an essential issue to focus on in the near future. To overcome this issue of protecting stored data, firms/organizations need to use a sufficient encryption mechanism (after compression of data) to lock down sensitive data at rest in big data clusters (without affecting system/device performance). This requires transparent and automated file-system-level encryption that is capable of protecting sensitive data at rest on these distributed nodes (a small sketch of this, together with the in-motion protection below, is given after the next item).
ii. Protection of Data in Motion: Encrypting communicated data (moving through the IoT ecosystem) presents a unique challenge because of its high variety and its increasing rate. As data moves from one location (device) to another, it is highly vulnerable to attacks like fibre-tapping attacks, man-in-the-middle attacks, etc. Note that an attacker can listen to a communication between two parties/devices by tampering with or attaching a cable (with a fibre-coupling device), and no device (or mechanism) can detect it. This attack looks like an insider attack (a type of active attack). In it, the attacker can record all activity that runs across a network, and data is captured and stolen without the owner's knowledge (even without the sender's and receiver's knowledge). In the worst case, this type of attack can also be used to change data, and it has the potential to override the controls on the entire system. IoT communication (over public networks) will need to be secured in ways similar to how we protect other communications over the Internet, i.e., using Transport Layer Security (TLS). Note that encryption is also required at the back-end infrastructure level of manufacturers, cloud service providers, and IoT solution providers.
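As a concrete illustration of the two protections above (encryption of data at rest from the previous item, and TLS for data in motion), a minimal Python sketch is given below. It is only a sketch under assumed names: the gateway host, port, file names and key handling are hypothetical placeholders, not part of any specific IoT product or of this work; a real deployment would add key management, certificate provisioning and error handling.

    # Sketch only: symmetric encryption of stored sensor data plus a
    # TLS-wrapped upload. Host, port and file names are hypothetical.
    import socket
    import ssl
    from cryptography.fernet import Fernet

    def encrypt_at_rest(plain_path, cipher_path, key):
        """Encrypt a sensor-data file before it is written to shared storage."""
        f = Fernet(key)
        with open(plain_path, "rb") as src:
            token = f.encrypt(src.read())
        with open(cipher_path, "wb") as dst:
            dst.write(token)

    def send_over_tls(payload, host="iot-gateway.example.com", port=8883):
        """Send a telemetry payload over a TLS-protected channel (data in motion)."""
        context = ssl.create_default_context()          # verifies the server certificate
        with socket.create_connection((host, port)) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(payload)

    if __name__ == "__main__":
        key = Fernet.generate_key()                     # in practice, fetched from a key store
        with open("readings.json", "wb") as f:
            f.write(b'{"temperature": 21.5}')
        encrypt_at_rest("readings.json", "readings.json.enc", key)
        # send_over_tls(open("readings.json.enc", "rb").read())  # needs a reachable gateway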
Data can also be protected using the blockchain concept. Security can be provided to any type of data by creating blocks and storing encrypted data/information in them, where each block holds information with respect to the previous and next blocks' records. This chain is practically impossible to compromise by any attack (except when a majority of the blocks/network is controlled). Hence, this work presents several suggestions and techniques for securing an IoT ecosystem in an efficient manner. The next section will discuss several possible ways of preventing a user from being tracked or trapped by IoT devices (with a real-world example).
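The block-chaining idea sketched above can be illustrated with a minimal hash-linked log, shown below. This is a toy Python sketch of the general principle only (no consensus, no distribution), not the mechanism of any particular IoT or blockchain platform.

    # Toy sketch: each block stores the hash of the previous block, so
    # silently altering an old record breaks the chain on verification.
    import hashlib
    import json

    def block_hash(data, prev_hash):
        body = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def make_block(data, prev_hash):
        return {"data": data, "prev_hash": prev_hash, "hash": block_hash(data, prev_hash)}

    def chain_is_valid(chain):
        for i, block in enumerate(chain):
            if block["hash"] != block_hash(block["data"], block["prev_hash"]):
                return False                      # the block's own content was altered
            if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
                return False                      # the link to the previous block is broken
        return True

    if __name__ == "__main__":
        genesis = make_block("sensor boot", prev_hash="0" * 64)
        chain = [genesis, make_block({"temperature": 21.5}, genesis["hash"])]
        print(chain_is_valid(chain))              # True
        chain[0]["data"] = "tampered reading"     # any later verification now fails
        print(chain_is_valid(chain))              # False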
Today’s IoTs are creating environment like cyber physical systems, where researcher
are looking for cyber security but they do not look over the physical security of
systems/devices. When attacks are happening on any IoT devices, they we need to
protect these devise with possible encryption mechanism and efficient symmetric or
asymmetric cryptography key to strengthen the security of IoT devices/environments.
Also, we can use security tools like data encryption, strong user authentication, resilient
coding and standardized and tested APIs (Application Programming Interface). Also,
we need to look over security of physical space (including cyber space), i.e., Physical
security is also an issue here, since these devices are usually used in open (like in smart
406 A. K. Tyagi et al.
metering, smart transportation, etc.) or in remote locations and anyone can get physical
access to it. This kind of issue requires much attention form research community. Note
that some security tools need to be applied directly to the connected IoTs devices. In this
era, traditional computers, the IoT and its cousin BYOD (Bring Your Own Device) have
similar security issues. These IoT devices do not have any sufficient capability to defend
themselves (automatically) and need to protected via some external software like fire-
walls and intrusion detection/prevention systems. Creating a separate network like
virtual private network or nay private network is also a solution, but with large number
of devices, it fails. Also, protecting devices with firewalls also fails in case of software
updating for next version (due to timely security updates on the devices). At updating
time, any attacker can sense or enter in a device. Hence, securing IoT is more difficult
from other types of security initiatives (like physical security). When someone has
physical access to the device once, the security concerns raise automatically. When we
evaluate security of IoT or protect data in IoT, then we get that this technology is still in
progress very much. In summary, loosing of privacy, security or trust is always start
with user’s permission only. Hence, using/at the time of configuring IoT devices, a user
need to be more careful and aware about not to every location/information of himself.
Hence in this section, we discuss the ways, through which, we can protected
ourselves in this smart worlds/era (in connection of IoTs), i.e., provide several solutions
for avoiding tracking by IoTs or not being trapped with IoTs devices. Now next,
sections will conclude this work in brief with few future remarks.
6 Conclusion
Today’s Internet of Things is emerging as a big revolution (third wave) in the devel-
opment of the Internet. Note that in the past, in 1990s’ (as first wave), Internet wave
connected 1 billion users, while in 2000s’, mobile connected another 2 billion users (as
another wave). The Internet of Things has the potential to connect 10X as many (28
billion) “things” to the Internet by 2020, ranging from bracelets to cars. This paper
reveals that due to the decreasing the cost of sensors, processing power, smart things
are getting cheaper and cheaper. Also, several governments (like Japan, Australia,
Dubai, India) are pushing to use the applications of IoTs devices like smart home,
smart cities, smart transportation, smart grid, etc. Dubai will fully upgraded before
2022 with smart things/devices. In India, concept of smart cities is already launched
and Amravati city is going to be the first India’s smart city before 2022. Apart that, we
also reveal that now days several smart objects/things like smart watches, smart specs,
and thermostats (Nest), etc., are already getting attention from public users. But, using
such devices/rising of IoTs creates several serious issues about privacy, security, safety,
etc. Now, this work worried about user’s privacy, i.e., IoT devices/smart gadgets
(which is configured badly) might provide a backdoor for hackers/strangers to look/in
to break into corporate networks/personal life of respective user. Hence, preserving
user’s privacy, security at the device level, protecting the master, and encrypting
communication links are critical to the secure operations of IoTs. In summary, security
needs to be built in as the foundation of IoT systems, with rigorous validity checks,
Beyond the Hype: Internet of Things Concepts 407
authentication, data verification, and all the data needs to be encrypted. Also, user’s
privacy needs to be persevered with new algorithms/mechanism.
References
1. https://en.wikipedia.org/wiki/Tim_Berners-Lee
2. Tyagi AK, Nandula A, Rekha G, Sharma S, Sreenath N (2019) How a user will look at the
connection of internet of things devices?: a smarter look of smarter environment. In:
ICACSE: 2019: 2nd international conference on advanced computing and software
engineering, KNIT Sultanpur, India, 8–9 February 2019
3. TELEFÓNICA I + D: Internet of Things + Internet of Services (2008)
4. https://wikisites.cityu.edu.hk/sites/netcomp/articles/Pages/InternetofThings.aspx
5. Tyagi AK, Shamila M (2019) Spy in the crowd: how user’s Privacy is getting affected with
the integration of internet of thing’s devices. In: SUSCOM-2019: International conference
on sustainable computing in science, technology & management (SUSCOM-2019). Amity
University Rajasthan, India, 26–28 February 2019
6. https://datafloq.com/read/internet-of-things-iot-security-privacy-safety/948
7. https://www.pcworld.com/article/2884612/security/internet-of-things-security-check-how-3-
smart-devices-can-be-dumb-about-the-risks.html
8. https://www.theguardian.com/technology/2015/aug/12/hack-car-brakes-sms-text
9. Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manag 35(2):137–144
10. https://techdifferences.com/difference-between-active-and-passive-attacks.html
11. Hunt R (2004) Network security: the principles of threats, attacks and intrusions, part 1 and part 2. APRICOT
12. Veerendra GG, Hacking Internet of Things (IoT), A Case Study on DTH Vulnerabilities,
SecPod Technologies
13. https://datafloq.com/read/big-data-history/239
14. Nakamoto S, Bitcoin: A peer-to-peer electronic cash system. http://bitcoin.org/bitcoin.pdf
15. Tyagi AK, Aghila G (2011) A wide scale survey on botnet. Int J Comput Appl 34(9):9–22
A Route Evaluation Method Considering
the Subjective Evaluation on Walkability,
Safety, and Pleasantness by Elderly
Pedestrians
1 Introduction
Population aging is progressing in Japan [1]. Improvement of the quality of life (QOL) of elderly people is considered important [2]. In a report [3], more than 50% of elderly people pointed out the problem of "getting tired easily when going out." Therefore, support methods for enriching their outdoor activities are drawing attention.
Conventional pedestrian navigation systems only provide the shortest path and are inappropriate as aids for the elderly to go out. An empirical study of a personalized tourist route advice system mentioned that the shortest or minimum-cost path does not fit what tourists need [4]. Tourists would like to follow routes that give them the most satisfaction by including as many of the features they like as possible. This concept should also be useful for improving the QOL of elderly pedestrians. Toward realizing a route guidance method effective for improving the QOL of elderly people, mechanisms considering their physical difficulty, mental weakness, feeling of security, and preferences can be useful.
The aims of this study are to confirm the factors that can take the mental and physical situation of the user into consideration and to acquire quantitative cost functions for these factors. This study consists of two stages: (1) construction of the revised cost functions, and (2) evaluation of the basic validity of the proposed method.
2 Related Work
Novack et al. proposed a system for generating pedestrian routes considering the pleasantness of green areas and social places as well as streets with less traffic noise [5]. They developed a way to integrate these factors into a routing cost function. The factors and the integration method are theoretically defined based on results from general studies. Torres et al. proposed a routing method for a personalized route assistant based on multiple criteria, which can design accessible and green pedestrian routes [6]. The factors are selected preliminarily, only to show the usefulness of the method itself. The common issue of these studies is that the necessity and sufficiency of the factors, and the appropriateness of each cost quantification, were not confirmed.
Matsuda et al. proposed the acceptable time delay, used as the cost for route planning to consider the users' preference for safety and walkability [7]. The delay refers to the time actually acceptable to users for bypassing a spot or for walking by a spot (see details in Sect. 4). A questionnaire survey was conducted with respondents ranging from youths to elderly people, and seven factors and the values of the delay were acquired based on the subjective data. The issue is that the sufficiency of the factors was not evaluated.
3 Proposed Method
Two situations are assumed for the acceptable time delay [7]. Figure 1 (a) shows the first situation, where the delay is the additional time accepted by the user to avoid a place with a high physical load (e.g., a steep slope) or high risk (e.g., a road without a sidewalk). In the situation shown in Fig. 1 (b), the delay is defined as the additional time that the user can accept to select the route with a preferred spot that is easier to walk or lower in risk (e.g., an intersection with traffic signals).
Fig. 1. The situations assumed for the definition of the acceptable time delay: (a) a longer path to avoid a spot; (b) a longer path to walk by or through a spot.
Based on the concept of the acceptable time delay, the cost considering the user's preference is defined as in Eqs. (1) and (2). When the value of the detour route for a user increases (that is, the acceptable time delay becomes longer), the revised cost decreases. The revised cost is used instead of the original cost when the path contains one of the spots (described in Table 1 for this study). This cost function makes it possible to take pedestrian preferences into consideration in route planning.
a = (time for the shortest path) / (time for the shortest path + the acceptable time delay), (1)
(the revised cost of the longer path) = a × (the original cost: physical distance). (2)
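As a hypothetical illustration of Eqs. (1) and (2): if the shortest path takes 10 minutes and a user accepts a 5-minute delay to avoid a steep slope, then a = 10 / (10 + 5) ≈ 0.67, so the cost of the longer (detour) path is scaled down to about two thirds of its physical distance and becomes more likely to be chosen by the route planner. The same computation as a minimal Python sketch (the numbers are invented):

    # Revised cost of Eqs. (1) and (2); times in minutes, cost in metres.
    def revised_cost(original_cost, shortest_path_time, acceptable_delay):
        a = shortest_path_time / (shortest_path_time + acceptable_delay)   # Eq. (1)
        return a * original_cost                                           # Eq. (2)

    print(revised_cost(original_cost=800.0, shortest_path_time=10.0,
                       acceptable_delay=5.0))                              # about 533.3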
The cost functions were constructed in three steps. In the first step, potential factors
(spots) were selected as candidates in each category. In the second step, subjective data
for the acceptable time delay was acquired for each candidate factor. In the third step, a
cost function for each factor was constructed based on the acquired data.
routes. We obtained data from a total of 27 people over the age of 60 in Tsukuba City and Shinjuku Ward, Tokyo. The former is in the countryside, and the latter is a typical urban area. In addition to the factors used in the reference documents, additional factors were added (for details see 5.2).
Table 1. The candidates of factors (spots) which may have relationships with the user's preferences on routes.

Category | Aim | Factors (spots)
Walkability | "a walk" & "to a goal" | R1: steep slope/stairway; R2: crowded street
Safety | "a walk" & "to a goal" | S1: a sidewalk; S2: an intersection with a traffic signal; S3: road with guardrails; S4: an intersection with a crosswalk; S5: a pedestrian-vehicle separated traffic signal; S6: a pedestrian overpass; S7: a bright path
Pleasantness | "a walk" | T1: school; T2: a park; T3: waterfront; T4: a police box; T5: clear wide road; T6: quiet road
Pleasantness | "to a goal" | T1: a guide map; T2: a police box; T3: clear wide road; T4: road with less cars
Fig. 2. The average values of the acceptable time delay, compared between the environment conditions daytime, nighttime, and bad weather: (b) at the aim "move to a near destination"; (c) at the aim "move to a distant destination".
Figure 2 (b) and (c) show the values in the situations where the aims are "to walk to a near destination" and "to a distant destination," respectively. The differences in some conditions are significant between daytime and nighttime, and also between daytime and bad weather, but not between nighttime and bad weather. Therefore, the values can be set in common for nighttime and bad weather.
Statistical Analysis on Differences Between the Distance Conditions. Figure 3 (a), (b), and (c) show the average values of the acceptable time delay with the aim of walking to a near or distant destination in the different environment conditions: daytime, nighttime and bad weather. The differences in some conditions are significant. Therefore, the values should be set separately for each environment condition.
Fig. 3. The average values of the acceptable time delay, compared between the distance conditions (near and distant destinations): (a) at the environment condition "daytime"; (b) at the environment condition "nighttime".
Revised Cost Functions for Spots under the Assumed Conditions. As a result of the analysis in the previous two sections, three sets of values of the acceptable time delay were obtained for the proposed method (Figs. 4 and 5). For the aim "a walk", 15 factors (spots) were set for the condition "daytime", and 15 for "nighttime", as indicated in Fig. 4. For the aims "to a near destination" and "to a distant destination," 13 factors were set for the condition "daytime", and 13 for "nighttime or bad weather" (Fig. 5). In this study, the revised cost functions are defined for each of the 82 factors by using Eqs. (1) and (2). Therefore, the total number of revised cost functions is 82.
Fig. 4. The values of the acceptable time delay for the aim "a walk", which are used for the revised cost functions of the proposed method.
Fig. 5. The values of the acceptable time delay for the aim "move to a destination", which are used for the revised cost functions of the proposed method: (a) at the aim "move to a near destination"; (b) at the aim "move to a distant destination".
In order to evaluate the basic validity of the proposed method, subjective evaluation experiments with elderly people were conducted, in which the shortest-distance route and the route given by the proposed method were compared.
For each of the three types of aim, we selected the shortest route and the route given by the proposed method corresponding to each condition, in Tsukuba City and Shinjuku Ward, Tokyo. Because of limited resources and time, we assumed "daytime" as the only environment condition in this experiment. Participants were asked to select the route they appraised more highly, from three viewpoints, after watching a video moving through each route. The viewpoints were "easier to walk", "safer to walk", and "more favorite". The participants were 40 elderly people over the age of 60. Table 2 shows the results. "3/37" indicates that three of the participants selected the shortest route and 37 selected the route given by the proposed method. The marks indicate that the difference is significant, as confirmed by the binomial test (***: p < 0.001, **: p < 0.01, and *: p < 0.05).
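The significance marks in Table 2 come from a binomial test against the null hypothesis that both routes are equally likely to be chosen (p = 0.5). As an illustration, the two-sided p-value for the 3/37 split can be computed directly from the binomial distribution; the short Python sketch below is a generic check, not the authors' analysis script.

    # Exact two-sided binomial test of a k-vs-(n-k) split against p = 0.5.
    from math import comb

    def binomial_two_sided(k, n, p=0.5):
        pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
        # sum the probabilities of all outcomes no more likely than the observed one
        return sum(prob for prob in pmf if prob <= pmf[k] * (1 + 1e-12))

    print(binomial_two_sided(3, 40))   # roughly 2e-8, far below 0.001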
Table 2. Results of participants' selection of the more highly appraised route from the three viewpoints after watching the video moving through each route.

Aim | Route | "easier to walk" | "safer to walk" | "more favorite"
"a walk" | Tsukuba | 3/37 *** | 3/37 *** | 3/37 ***
"a walk" | Shinjuku | 11/29 ** | 12/28 * | 15/25
"to a near goal" | Tsukuba | 4/36 *** | 5/35 *** | 4/36 ***
"to a near goal" | Shinjuku | 9/30 ** | 7/32 *** | 5/34 ***
"to a distant goal" | Tsukuba | 11/29 ** | 12/28 * | 15/25
"to a distant goal" | Shinjuku | 11/28 ** | 7/32 *** | 5/34 ***
As a result of the experiment, it was found that the preference for the route given by the proposed method is higher than that for the shortest route from the viewpoints of "easier to walk", "safer to walk", and "more favorite". It can be concluded that this result shows that the basic concept of the proposed method is appropriate.
7 Conclusion
Our target is the improvement of the quality of life (QOL) of elderly people. Toward realizing a route guidance method effective for this improvement, mechanisms considering their physical difficulty, mental weakness, feeling of security, and preferences can be useful.
We propose a route planning method considering the subjective evaluation of walkability, safety, and pleasantness by elderly pedestrians. To quantify their preferences, the acceptable time delay is used in the cost functions.
The aims of this study are to confirm the factors that can take the mental and physical situation of the user into consideration and to acquire quantitative cost functions for these factors. The cost functions were constructed based on the subjective evaluation data on the acceptable time delay in several different conditions.
The basic validity of the method was confirmed by a subjective evaluation experiment comparing the routes given by the proposed method with the shortest routes. The participants were asked to select the route they appraised more highly after watching the video moving through the routes, and they selected the former routes in most of the conditions.
By using this method, it is possible to plan a route with a lower physical load, higher safety, and more enjoyment for each elderly user. The next goal is to carry out experiments in all the different conditions with more participants, and to make a reliable assessment of the usefulness of the method.
Acknowledgments. This work was supported in part by Grants-in-Aid for Scientific Research 17K00436 of the Japanese Ministry of Education, Science, Sports and Culture.
References
1. Ministry of Internal Affairs and Communication, Statistics Bureau: Japan Statistical
Yearbook, Chapter 2: Population and Households. http://www.stat.go.jp/english/data/
nenkan/1431-02.html
2. Muramatsu N, Akiyama H (2011) Japan: super-aging society preparing for the future.
Gerontol 51(4):425–432
3. Mizuno E (2011) Research on anxiety and intention of elderly people to go out. Research
notes of The Dai-ichi Life Research Institute. http://group.dai-ichi-life.co.jp/dlri/ldi/note/
notes1107a.pdf
4. Sun Y, Lee L (2004) Agent-based personalized tourist route advice system. In: ISPRS
congress Istanbul 2004, pp 319–324
5. Novack T, Wang Z, Zipf A (2018) A system for generating customized pleasant pedestrian
routes based on OpenStreetMap data. Sensors 18:3794
6. Torres M, Pelta DA, Verdegay JL (2018) PRoA: an intelligent multi-criteria Personalized
Route Assistant. Eng Appl Artif Intell 72:162–169
7. Matsuda M, Sugiyama H, Doi M (2004) A personalized route guidance system for
pedestrians. IEICE transactions on fundamentals of electronics, communications and
computer sciences, vol 87, pp 132–139
Multi Controller Load Balancing in Software
Defined Networks: A Survey
1 Introduction
Today the requirements on networks are rapidly increasing. Traditional networking technology has its own limitations in terms of new technological innovations, complex configuration management and operational costs, so there is a need for a new networking architecture to overcome these drawbacks. Over the last decade Software Defined Networking (SDN) has come into existence, decoupling the control logic (intelligence) from the data plane of the networking devices. With SDN the network can provide augmented automation, centralized provisioning, reduced hardware management cost, enhanced security, vendor independence and a cloud-ready infrastructure. In SDN, the networking elements (switches) follow the instructions given by the controller to forward packets from source to destination. The controller reactively or proactively [1] inserts flow entries into the flow tables of the switches upon arrival of a PACKET-IN message from the switches.
SDN plays a prominent role in large-scale networks, enterprise networks, data center networks and wide area networks. As the size of the network increases, a single centralized controller may face difficulties in handling flow processing events; it will produce poor response times and be highly unreliable. These shortcomings can be overcome by introducing the concept of multiple controllers, distributing the work of the single controller among several controllers. Handling multiple controllers [2] raises different challenges in terms of scalability, consistency, reliability and load balancing. In distributed SDN the switches are statically assigned to the controllers; the set of switches under one controller is called the domain of that controller. But this static assignment may result in variation of load among controllers. Load imbalance among controllers leads to degradation of network performance in terms of controller throughput and packet loss rate. We require a dynamic mapping between controllers and switches so that the load is evenly distributed among controllers. Load imbalance among controllers occurs due to a large number of flows generated at one particular switch at runtime. There are two solutions to this problem.
1. Increase the capacity of the controller by providing more resources (processing speed, memory, bandwidth).
2. Shift switches from the overloaded controller to an underloaded controller.
In the former case, the load on a single controller can be avoided by increasing the capacity of the controller, but the network may not utilize the given resources efficiently. In the latter case the allotted resources can be utilized efficiently in the network. There are many models that address this issue.
There is a recent survey [3] on SDN with multiple controllers, covering all aspects related to multiple controllers. The contribution of our paper is focused mainly on load balancing of multiple controllers in SDN, and it does not include any of the models presented in that survey paper. Our paper covers the latest models/proposals published in the area of multi-controller load balancing in SDN.
The remaining part of the paper is organized as follows. Section 2 describes the process of switch migration in the case of uneven load distribution among controllers. In Sect. 3, we explain the existing models for controller load balancing. Section 4 gives a comparative analysis of the models presented in Sect. 3. Finally, the conclusion follows.
2 Switch Migration
Handing over the switch functionality managed by one controller to another controller is called switch migration. The process of switch migration is shown in Fig. 1. The first controller is called the initial controller and the second controller is called the target controller. After migration of a switch from the initial controller to the target controller, all asynchronous messages [4] generated by the switch will be received and processed by the target controller. The role of the initial controller changes from master to slave and the role of the target controller changes from slave to master, according to the OpenFlow specification [4]. There are three reasons for switch migration to happen in software defined networks. The first is when a new controller is added to the existing controller pool; in this case switches from other controllers are migrated towards the new controller. The second is when one of the controllers goes down; in this case all switches of the failed controller are migrated to other controllers. The third is when the load of one controller exceeds its capacity; in this case switches from the overloaded controller are migrated to an underloaded controller. The challenging task is the selection of the switch to be migrated from the overloaded controller and the selection of the target controller from the existing controller pool.
This paper presents the state of the art in controller load balancing to fully utilize the allotted resources of a network. Switch migration should be performed when the load of any one of the controllers increases. So the task is: how do we compute the controller load? There are many metrics to be considered while calculating the load of the controller. Different models use different metrics to calculate the controller load, and based on it, load balancing can be performed by switch migration. Applications that perform controller load balancing communicate with the controller through the northbound API [5].
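Most of the schemes surveyed in the next section share the same skeleton: periodically estimate each controller's load from metrics such as the PACKET-IN arrival rate or CPU utilization, detect an overloaded controller against a threshold, and pick a switch and a target controller for migration. The generic Python sketch below shows that skeleton only; the threshold, metric weights and selection rules are illustrative assumptions and are not taken from any single model.

    # Generic threshold-based migration planning; weights/threshold are illustrative.
    def controller_load(packet_in_rate, cpu_util, w_msg=0.7, w_cpu=0.3):
        """Combine normalized metrics into a single load value in [0, 1]."""
        return w_msg * packet_in_rate + w_cpu * cpu_util

    def plan_migration(controllers, threshold=0.8):
        """controllers: name -> {"load": value from controller_load,
                                 "switches": {switch: per-switch load}}"""
        overloaded = max(controllers, key=lambda c: controllers[c]["load"])
        if controllers[overloaded]["load"] <= threshold:
            return None                                       # no migration needed
        target = min(controllers, key=lambda c: controllers[c]["load"])
        # migrate the most heavily loaded switch of the overloaded controller
        switch = max(controllers[overloaded]["switches"],
                     key=controllers[overloaded]["switches"].get)
        return switch, overloaded, target

    demo = {"c1": {"load": 0.95, "switches": {"s1": 0.5, "s2": 0.3}},
            "c2": {"load": 0.40, "switches": {"s3": 0.2}}}
    print(plan_migration(demo))                               # ('s1', 'c1', 'c2')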
3 Existing Proposals
In the literature there are several proposals for controller load balancing. Almost all the models use the Mininet [6] emulator as an experimental testbed. Many open source controllers are available, like the RYU [7] Python-based controller and the OpenDayLight [8] and ONOS [9] Java-based controllers.
3.1 ElastiCon
The Elastic Controller (ElastiCon) [10] is the initial model for all switch migration techniques. First the controller load is calculated based on statistics such as CPU usage and the average message arrival rate from each switch. It provides a global network view through the concept of a distributed data store. Once the load on a controller goes beyond a given threshold, a neighboring switch and the nearest controller are selected for switch migration, to reduce inter-controller communication in terms of migration time. The authors present a switch migration protocol as a series of message exchanges between controllers that possesses the properties of liveness and safety according to the OpenFlow standard. The messages include start-migration, Role-Request, Role-Reply, Flow-Mod and barrier messages. Because of the message exchange between controllers before migration, the response time may increase significantly. The authors showed experimentally that it takes 20 ms to complete the switch migration process, and that throughput is lower when a two-core CPU is used rather than a quad-core processor. Migration can be performed in a limited amount of time with little impact on response time. However, to minimize the migration time, this model does not consider the load of the target controller; if the target controller is itself overloaded, the model does not work well.
3.2 DHA
Cheng et al. addressed the problem of controller load balancing as the Switch Migration Problem (SMP). To maximize network utilization, the Distributed Hopping Algorithm (DHA) [11] was designed based on a time-reversible Markov chain process, with the objective of serving more requests under the available resources. The load of the controller in this model is calculated based on the number of PACKET-IN messages from switches to the controller. According to this algorithm, when there is a large variation in the load of controllers, a switch and a controller are selected randomly for migration, and the migration activity is broadcast to all other neighboring controllers to stop them from starting another migration. After migration, the controller updates its utilization ratios of switches and broadcasts the update to its neighbors for state synchronization among controllers, so as to maintain a global network view. This model increases the average utilization ratio of all available resources. Compared with ElastiCon, DHA takes a longer migration time but reduces the response time.
3.3 SMDM
To improve migration efficiency, the Switch Migration Decision-Making scheme (SMDM) [12] was proposed. Uneven load distribution among controllers is detected by a switch migration trigger metric based on the load diversity of controllers. A greedy algorithm was designed that gives the possible migration actions if load imbalance occurs among controllers. The load is calculated from the number of PACKET-IN messages and the minimal path cost from switch to controller. This model works in three steps. First, the load diversity is measured for each controller and the decision to perform switch migration is made based on it; the result of this step is a set of out-migration controllers (the controllers which are overloaded) and in-migration controllers (the controllers which are underloaded). Next, the migration cost and migration efficiency are calculated for all possible actions generated in the previous step. In the last step, the migration probability of the switches managed by the out-migration controller set is measured; the switch with the maximum probability is selected, together with the controller from the in-migration set that gives the maximum migration efficiency. The simulation results show that response time, migration time and migration cost are lower for this model than for the schemes discussed above (ElastiCon and DHA), because the migrating switch and target controller are selected based on the efficiency formulation.
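A rough Python sketch of such a greedy selection step is given below. The simplified efficiency used here (load moved divided by a hop-count migration cost) is an illustrative stand-in, not the exact formulation of SMDM.

    # Illustrative greedy choice of (switch, target controller): maximize
    # a simplified efficiency = load moved / migration cost (hop count).
    def best_migration(in_controllers, switch_loads, hop_count):
        """switch_loads: switch -> load it puts on the overloaded controller;
        hop_count: (switch, controller) -> path length used as migration cost."""
        best, best_eff = None, float("-inf")
        for sw, load in switch_loads.items():
            for ctrl in in_controllers:
                eff = load / max(hop_count[(sw, ctrl)], 1)
                if eff > best_eff:
                    best, best_eff = (sw, ctrl), eff
        return best

    print(best_migration(["c2", "c3"],
                         switch_loads={"s1": 0.5, "s2": 0.3},
                         hop_count={("s1", "c2"): 2, ("s1", "c3"): 4,
                                    ("s2", "c2"): 1, ("s2", "c3"): 2}))   # ('s2', 'c2')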
3.4 HeS-CoP
This scheme [13] provides a heuristic switch-controller placement (HeS-CoP) for data center networks (DCNs), with the intention of distributing load well among controllers and reducing packet delay. Two parameters, the number of OpenFlow messages and the CPU load, are considered to compute the load on a controller. The model uses discrete time slots to decide whether to change the master role for switches, based on the load in the previous time slot. In every time slot the standard deviation of the control traffic load is calculated. If it is less than in the previous slot, there is no need to change the master roles; if it is greater, the average CPU load is checked. If the CPU load has increased, the master role is changed first for the switch with the lowest traffic; if it has decreased, the master role is changed first for the switch with the highest traffic, to reduce the packet delay. The scheme is also based on a greedy strategy in which an orchestrator is used, with two algorithms: the decision-maker algorithm decides whether to perform switch migration, and the forward-and-backward algorithm selects switches and controllers for migration and sends the changed topology to the decision-maker procedure so that it can forward the changed topology to all other controllers. By making use of the REST API [14] and SSH, the controllers and the orchestrator are able to exchange information. The main extra consideration of this scheme compared to DHA and SMDM is the characteristics of DCNs and the CPU load. However, the execution time of HeS-CoP is greater than that of SMDM, while the switch migration times are almost similar.
3.5 BalCon
The Balanced Controller (BalCon) [15] proposed an algorithm to migrate a cluster of switches when load imbalance occurs in the network. The load on the controller is calculated from the path computation load and the rule installation load for a new flow. Once the controller load goes beyond a predefined threshold, the algorithm generates a set of switches for migration, ordered by new flow generation from highest to lowest. From that set it finds the best cluster, in which switches are strongly connected according to traffic patterns. Afterwards, cluster migration to a new controller takes place. This scheme balances the load among controllers by migrating a small number of switches and reduces the load on the overloaded controller by about 19 percent. As the model is based on cluster migration, it reduces the message exchanges for migration, which results in lower migration cost.
3.6 EASM
The main objectives of Efficiency-Aware Switch Migration (EASM) [16] are highly efficient migration and quick control of load imbalance. Data interaction load, flow entry installation and state synchronization are considered as the main parameters to compute the overall load on a controller. The migration cost is measured from the number of hops between controller and switch. It constructs a load difference matrix, similar to the load diversity matrix in SMDM, to avoid the local optimization problem. Compared to ElastiCon and DHA, it gives reduced response time and increased throughput.
3.7 DCSM
Dynamic Controller Switch Mapping (DCSM) [17] not only performs controller load balancing but also handles the network in case of controller failure. The model uses a hierarchical architecture of controllers in which one controller is selected as the root controller based on the lowest controller ID. All remaining controllers send load statistics to this root controller. Here the load of a controller is calculated in terms of CPU load and memory load. Based on the load information given by the other controllers, the root controller compares that load with the total load percentage considered as overload, and sends messages to add or remove switches in the underloaded and overloaded controllers, respectively. If the root controller fails, the controller with the next lowest ID is chosen as the root controller, to handle the single point of failure.
3.8 MQPC
The main goal of Load Balancing with a Minimum Quota of Processing Capacity (MQPC) is to reduce the response time of controllers and to balance the load among controllers when load imbalance occurs. This model [18] proposes a solution based on a minimum utilization of the processing capacity of the controllers, using a matching game so that every controller carries a minimum load. The load is calculated from the number of PACKET-IN messages and the number of hops between switches and controllers. The controllers elect a preferred list of switches based on the processing capacity of the controllers; at the same time, the switches elect a preferred list of controllers based on the response time of the controllers. According to the preference lists, a mapping between controllers and switches is established to maintain load balance. The authors showed experimentally that the load is balanced more evenly than with a static mapping, and that the response time is also reduced, with maximum utilization of resources.
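The matching-game idea can be illustrated with a deferred-acceptance style assignment in which switches propose to controllers in their order of preference and each controller keeps proposals only up to its capacity (its processing-capacity quota). The Python sketch below uses made-up preference lists and capacities; it illustrates the mechanism generically and is not the exact algorithm of the MQPC paper.

    # Deferred-acceptance sketch: switches propose in preference order; a
    # controller keeps at most `capacity` switches, preferring higher-ranked ones.
    def match(switch_prefs, controller_prefs, capacity):
        assigned = {c: [] for c in controller_prefs}     # controller -> accepted switches
        free = list(switch_prefs)                        # switches still to be placed
        next_choice = {s: 0 for s in switch_prefs}
        while free:
            s = free.pop(0)
            c = switch_prefs[s][next_choice[s]]          # next controller s proposes to
            next_choice[s] += 1
            assigned[c].append(s)
            if len(assigned[c]) > capacity[c]:
                worst = max(assigned[c], key=controller_prefs[c].index)
                assigned[c].remove(worst)                # reject the lowest-ranked switch
                free.append(worst)
        return assigned

    print(match(switch_prefs={"s1": ["c1", "c2"], "s2": ["c1", "c2"], "s3": ["c1", "c2"]},
                controller_prefs={"c1": ["s2", "s1", "s3"], "c2": ["s3", "s1", "s2"]},
                capacity={"c1": 2, "c2": 2}))            # {'c1': ['s1', 's2'], 'c2': ['s3']}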
4 Comparative Analysis
Table 1 gives a comparative analysis of different models for controller load balancing in SDN. As already mentioned in this paper, many open source controllers are available, like OpenDayLight, FloodLight, Beacon, RYU, ONOS, etc., each having its own features according to the OpenFlow specification. Each model uses a different controller according to its requirements and implementation. In multi-controller load balancing there are mainly two kinds of architecture: flat and hierarchical. In the flat architecture the controllers communicate with each other via the east-west bound interface [3]; there is no root controller to maintain all other controllers. In the hierarchical architecture a root controller is used for communication to maintain a hierarchy of controllers, but this may again lead to a single point of failure.
To balance the load among controllers, the important consideration is how to calculate the load, i.e., which parameters are used to calculate the load on the controller so as to obtain a minimum response time. The number of PACKET-IN messages coming from the switches, the rule installation load, the CPU load, the memory and the cost of completing an operation all affect the load on controllers. Afterwards, the switch and controller are selected in such a way as to minimize migration time.
5 Conclusion
To efficiently utilize the network resources in SDN, there is a requirement for controller load balancing in a distributed environment. The load on a controller is measured in terms of the number of PACKET-IN messages, the rule installation load, the CPU load, the memory load and the path between controller and switch. The switch migration process should lead to an effective load balance among controllers and should not trigger further migrations under the same control traffic. Efficient switch migration increases the network throughput. In wide area networks, data center networks and cloud networks this load balancing strategy is important given the available resources. A couple of improvements to the models presented in this survey are being developed.
References
1. Nunes BAA, Mendonca M, Nguyen X, Obraczka K, Turletti T (2014) A survey of software-
defined networking: Past, present, and future of programmable networks. IEEE Commun
Surv Tutor 16(3):1617–1634, Third 2014
2. Hu T, Guo Z, Yi P, Baker T, Lan J (2018) Multi-controller based software-defined
networking: A survey. IEEE Access 6:15980–15996
3. Zhang Y, Cui L, Wang W, Zhang Y (2018) A survey on software defined networking with
multiple controllers. J Netw Comput Appl 103:101–118
4. Open network foundation (2015). https://www.opennetworking.org/ (OpenFlow Switch
Specification (Version1.5.0))
5. Stallings W (2016) Foundations of modern networking: SDN, NFV, QoE, IoT, and Cloud.
Pearson Education, USA
6. Mininet. mininet.org
7. Ryu controller. https://ryu.readthedocs.io/en/latest/writingryuapp.html
8. Opendaylight controller. https://www.opendaylight.org/
9. Onos controller. https://onosproject.org/
10. Dixit A, Hao F, Mukherjee S, Lakshman TV, Kompella R (2013) Towards an elastic
distributed sdn controller. In: Proceedings of the second ACM SIGCOMM workshop on hot
topics in software defined networking, HotSDN 2013, ACM, New York, NY, USA, pp 7–12
11. Cheng G, Chen H, Wang Z, Chen S (2015) DHA: distributed decisions on the switch
migration toward a scalable SDN control plane. In: IFIP networking conference (IFIP
Networking), IEEE, pp 1–9
12. Wang C, Hu B, Chen S, Li D, Liu B (2017) A switch migration-based decision making
scheme for balancing load in SDN. IEEE Access 5:4537–4544
13. Kim W, Li J, Hong JWK, Suh YJ (2018) Hes-cop: heuristic switch-controller placement
scheme for distributed SDN controllers in data center networks. Int J Netw Manag 28(3):
e2015
14. Rest api. https://restfulapi.net/
15. Cello M, Xu Y, Walid A, Wilfong G, Chao HJ, Marchese M (2017) Balcon: A distributed
elastic SDN control via efficient switch migration. In: 2017 IEEE international conference on
cloud engineering (IC2E), IEEE, pp 40–50
16. Hu T, Lan J, Zhang J, Zhao W (2017) EASM: Efficiency-aware switch migration for
balancing controller loads in software-defined networking. In: Peer-to-Peer networking and
applications, pp 1–13
17. Ammar HA, Nasser Y, Kayssi A (2017) Dynamic SDN controllers-switches mapping for
load balancing and controller failure handling. In: 2017 international symposium on wireless
communication systems (ISWCS), IEEE, pp 216–221
18. Filali A, Kobbane A, Elmachkour M, Cherkaoui S (2018) SDN controller assignment and
load balancing with minimum quota of processing capacity. In: 2018 IEEE international
conference on communications (ICC), May, pp 1–6
Interesting Pattern Mining Using Item
Influence
1 Introduction
mining. For example, frequency-based measures may generate interesting patterns with
high dissociation [3–6], which is not desirable. Dissociation (d) of an itemset refers to
the percentage of transactions in which one or more of its items are absent, but not all.
Note that dissociation is not applicable to 1-itemsets. In most cases, patterns with high
dissociation generate pessimistic association rules [11], which have less significance in
knowledge discovery.
The formal definitions of frequent pattern and association rule are as follows. Let DB
be a database consisting of n transactions T = {t1, t2, t3, …, tn} and let
I = {i1, i2, i3, …, im} be a set of m items, where each transaction is a subset of
I, i.e. T ⊆ I. Support (s) is a metric that refers to the percentage of appearances of an
itemset in the database and is used for finding frequent patterns w.r.t. a user-defined
minimum support threshold parameter minsup. An itemset having at least minsup
support is referred to as a frequent pattern (FP). An association rule (AR) is
an expression of the form A → B where the itemsets A, B ⊆ I and A ∩ B = ∅. Confi-
dence (c) is an interestingness measure used for finding association rules (AR) from
the set of frequent patterns w.r.t. a user-defined minimum confidence threshold parameter
minconf. Confidence indicates the conditional probability of B given that A has
occurred and expresses the strength of a rule. A rule having at least minconf
confidence is referred to as an association rule (AR).
Example 1. A synthetic dataset is presented in Table 1. Consider 10% minsup. Table 2
shows the extracted patterns along with their dissociation.
The result shows the generation of frequent patterns such as AB and AD with high
dissociation, although patterns with lower dissociation are much more interesting. In
addition, AB and AD possess equal support and equal dissociation, so it is difficult to
identify which of the two is the more interesting pattern.
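Both measures can be computed directly from a transaction database. The sketch below, written against a small hypothetical database (not the paper's Table 1), shows one way to evaluate support and dissociation for an itemset.

# Sketch of the support and dissociation measures defined above, on a small
# hypothetical transaction database (not the paper's Table 1).

def support(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def dissociation(itemset, transactions):
    """Fraction of transactions where some, but not all, items of the itemset
    are absent (not applicable to 1-itemsets)."""
    itemset = set(itemset)
    partial = sum(bool(itemset & t) and not itemset <= t for t in transactions)
    return partial / len(transactions)

db = [{"A", "B"}, {"A", "C"}, {"B", "D"}, {"A", "B", "D"}, {"C"}]
print(support("AB", db), dissociation("AB", db))   # -> 0.4 0.4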
Moreover, isolated items [12] may be frequent due to their high support but do not
participate in association rule mining, which is a serious contradiction in ARM.
One possible solution to these problems is the application of weighted support (ws)
[9, 10], but unfortunately in most cases the item weights are chosen arbitrarily or
require domain knowledge of the database. Also, support is unable to find valuable
patterns with low frequency [8].
In this paper we introduce a new concept of interesting pattern mining based on item
influence. The major contributions are: (a) introduction of non-frequency-based pattern
mining using item influence, (b) automatic item weight fixation and (c) rejection of
isolated patterns that have no contribution to rule generation.
The rest of the paper is organized as follows. Section 2 presents related work. The
proposed method is described in Sect. 3. Experimental analysis is shown in Sect. 4.
Finally, Sect. 5 concludes the paper.
2 Related Work
Several frequency-based techniques have been introduced in the literature for frequent
itemset mining (FIM) [2, 7, 15] throughout the last decades. Due to the limitations of
frequency-based measures, some scholars have suggested alternative concepts for
mining interesting patterns without the support parameter. Tang et al. [17] introduced
an occupancy-based interesting pattern mining concept. Utility-based pattern mining [18]
is another variant of the interesting pattern mining process without support pruning.
Schaus et al. [19] argued for constraint-based pattern mining. Preti et al. [16] discussed
the options for pattern mining beyond frequencies in detail. In [20] the authors ignore
the support threshold. In [9, 13, 14] the authors adopt weighted-support-based pruning
strategies.
3 Proposed Method
The proposed method consists of three major steps including the measurement of item
influence for 1-itemset, transaction influence for transactions and influential weights for
all itemsets. The detailed flow chart of the proposed method is furnished below.
For example, we have chosen two itemsets, say A and AC, from Table 1, where
A ∈ {t1, t2, t4, t9} and AC ∈ {t2, t9}. Considering Table 3 and Definition 3.3, the
influential weights for A and AC are

iw(A) = [ti(t1) + ti(t2) + ti(t4) + ti(t9)] / Σ(k=1..10) ti(tk) = 0.48214

iw(AC) = [ti(t2) + ti(t9)] / Σ(k=1..10) ti(tk) = 0.25

In a similar way the influential weights of all itemsets are calculated. The list of
interesting patterns extracted from Table 1 using our method is shown in Table 4.
Patterns are pruned with respect to a user-defined minimum influential weight
threshold miniw: the patterns that possess at least miniw influential weight are treated
as interesting patterns.
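A minimal sketch of this pruning step is given below. It assumes the transaction influence values ti(t) are already available (the paper derives them from item influence in Definitions 3.1–3.2, which are not reproduced here); the database and influence values shown are hypothetical.

# Sketch of influential-weight pruning: iw(X) is the sum of transaction
# influence ti(t) over transactions containing X, divided by the total
# transaction influence. The ti() values here are hypothetical placeholders.

def influential_weight(itemset, transactions, ti):
    """transactions: list of sets; ti: list of transaction-influence values."""
    itemset = set(itemset)
    covered = sum(w for t, w in zip(transactions, ti) if itemset <= t)
    return covered / sum(ti)

def prune(itemsets, transactions, ti, miniw):
    """Keep only itemsets whose influential weight reaches the miniw threshold."""
    return [x for x in itemsets if influential_weight(x, transactions, ti) >= miniw]

db = [{"A", "C"}, {"A", "B"}, {"B", "C"}, {"A"}]
ti = [0.3, 0.2, 0.4, 0.1]          # assumed transaction influences
print(prune([{"A"}, {"A", "C"}, {"B", "C"}], db, ti, miniw=0.35))
# -> [{'A'}, {'B', 'C'}] (display order of set elements may vary)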
3.4 Algorithm
4 Experimental Analysis
We have tested the proposed method on the standard real datasets shown in Table 5. The
result in Table 6 shows the average influential weight (Avg. iw) of the top 5 interesting
patterns along with their average dissociation (Avg. d). Table 7 shows the number of
patterns generated from the specified database with different miniw values, which
illustrates the effect of miniw on pattern mining.
A comparative study between Apriori [1] and our method is presented in Figs. 2 and 3.
Figure 2 clearly shows that our method extracts fewer patterns at an equal minimum
threshold, while Fig. 3 shows that it mines patterns with less dissociation. Patterns with
less dissociation are more associative.
In this paper we have introduced a new technique of interesting pattern mining using
the concept of item influence. It is a non-frequency-based weighted pattern mining
technique that follows the downward closure property. The method relies on a strong
pruning process based on influential weight, and the initial weight assignment is
automatic. The proposed method not only controls the generation of a huge number of
patterns but also generates interesting patterns with lower dissociation. Our method is
efficient in pruning isolated items.
Our future efforts will concentrate on improving algorithmic efficiency in terms of
time and memory and on mining significant association rules.
Acknowledgement. This research was partially supported by the DST PURSE II program,
Kalyani University, West Bengal, India.
References
1. Agarwal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in
large datasets. In: Proceedings ACM SIGMOD 1993, pp 207–216
2. Chee CH, Jaafar J, Aziz IA, Hasan MH, Yeoh W (2018) Algorithms for frequent itemset
mining: a literature review. In: Artificial Intelligence Review, pp 1–19
3. Pal S, Bagchi A (2005) Association against dissociation: some pragmatic consideration for
frequent Itemset generation under fixed and variable thresholds. ACM SIGKDD Explor 7
(2):151–159
4. Datta S, Bose S (2015) Mining and ranking association rules in support, confidence,
correlation and dissociation framework. In: Proceedings of FICTA, AISC, vol 404,
Durgapur, India, pp 141–152
5. Datta S, Bose S (2015) Discovering association rules partially devoid of dissociation by
weighted confidence. In: Proceedings of IEEE ReTIS, Kolkata, India, pp 138–143
6. Datta S, Mali K (2017) Trust: a new objective measure for symmetric association rule
mining in account of dissociation and null transaction. In: Proceedings of IEEE ICoAC,
Chennai, India, pp 151–156
7. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In:
Proceedings ACM SIGMOD, Dallas, USA, pp 1–12
8. Wu JM-T, Zhan J, Chobe S (2018) Mining association rules for low-frequency itemsets.
PLoS ONE 13(7):e0198066
9. Vo B, Coenen F, Le B (2013) A new method for mining frequent weighted itemsets based on
WIT-trees. Expert Syst Appl 40:1256–1264
10. Datta S, Bose S (2015) Frequent pattern generation in association rule mining using
weighted support. In: proceedings of IEEE C3IT, Hooghly, India, pp 1–5
11. Datta S, Chakraborty S, Mali K, Banerjee S, Roy K, Chatterjee S, Chakraborty M,
Bhattacharjee S (2017) Optimal usages of pessimistic association rules in cost effective
decision making. In: Proceedings of IEEE Optronix, Kolkata, India, pp 1–5
12. Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high
utility itemsets. Data Knowl Eng 64:198–217
13. Bui H, Vo B, Nguyen H, Nguyen-Hoang TA, Hong TP (2018) A weighted N-list- based
method for mining frequent weighted itemsets. Expert Syst Appl 96:388–405
14. Lee G, Yun U, Ryu KH (2017) Mining frequent weighted itemsets without storing
transaction IDs and generating candidates. Int J Uncertain, Fuzziness Knowl-Based Syst
25(1):111–144
15. Annapoorna V, Rama Krishna Murty M, Hari Priyanka JSVS, Chittineni S (2018)
Comparative analysis of frequent pattern mining for large data using FP-tree and CP-tree
methods. In: Proceedings of the 6th FICTA, AISC, vol 701, pp 59–67, Bhubaneswar, India
16. Preti G, Lissandrini M, Mottin D, Velegrakis Y (2018) Beyond frequencies: graph pattern
mining in multi-weighted graphs. In: Proceedings of the 21st EDBT, pp 169–180
17. Tang L, Zhang L, Luo P, Wang M (2012) Incorporating occupancy into frequent pattern
mining for high quality pattern recommendation. In: Proceedings of the 21st ACM CIKM
2012, Hawaii, USA, pp 75–84
18. Gan W, Lin JC-W, Fournier-Viger P, Chao H-C, Tseng VS, Yu PS (2018) A survey of
utility-oriented pattern mining. arXiv preprint arXiv:1805.10511
19. Schaus P, Aoga JOR, Guns T (2017) CoverSize: a global constraint for frequency-based
itemset mining. In: Proceedings of the international conference on principles and practice of
constraint programming, LNCS, vol 10416, pp 529–546
20. Cheung Y-L, Fu AW-C (2004) Mining frequent itemsets without support threshold: with or
without item constraints. In: IEEE TKDE, vol 16, no 9
Search Engines and Meta Search Engines
Great Search for Knowledge: A Frame Work
on Keyword Search for Information Retrieval
1 Introduction
Relevance issues: effective ranking, evaluation, testing and measuring, information needs, user interaction.
Performance issues: efficient search and indexing, data coverage and freshness, scalability, growth of data, adaptability.
SEs are classified into different categories on the basis of their indexing, retrieval sys-
tems and other characteristics.
Meta Search Engines search for information across several search engines simultane-
ously to answer a user query. MSEs are also known as multiple search engines.
Examples of MSEs are Metacrawler, Mamma, Dogpile, Excite, Webcrawler etc.
1. Query Processing: when a query is submitted to the SE, it searches and creates the
URLs for the keywords related to the query.
2. Web Crawling: mainly used to create a copy of all visited pages. The Web Crawler
visits the links on the web and updates the search engine's index periodically.
3. Indexing: the indexer requests the page from the server; the server scans the page and
prepares the URLs along with the keywords in the relevant page.
4. Ranking: the word frequency is used to determine the relevance of a web page. The
rank is assigned on the basis of the number of times the word appears in the web
page.
5. Search and Display: finally, the sorted words are searched on the basis of their
occurrence in the page and the results are displayed to the user.
import java.io.*;
import java.sql.*;
import javax.servlet.*;
import javax.servlet.http.*;
/*
 * Searches the keyword tables and renders the matching URLs.
 */
public class SearchUrl extends HttpServlet {
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException, ServletException {
    String keyword = request.getParameter("keyword");
    response.setContentType("text/html");
    PrintWriter out = response.getWriter();
    Connection con = null;
    Statement st = null;
    ResultSet rs = null;
    try {
      // Open the JDBC connection to the search_engine database
      Class.forName("com.mysql.jdbc.Driver");
      con = DriverManager.getConnection("jdbc:mysql://localhost/search_engine", "root", "root");
      st = con.createStatement();
      rs = st.executeQuery("select b.keyword, b.heading, b.url_desc, b.url from "
          + "search_url a, search_url_desc b where a.keyword='" + keyword
          + "' and a.keyword=b.keyword");
      out.println("<html>");
      out.println("<body>");
      out.println("<p align='center'><font size='20'>SEARCH ENGINE</font></p>");
      // Render every matching URL as a heading link followed by its description
      while (rs.next()) {
        String heading = rs.getString(2);
        String urlDesc = rs.getString(3);
        String url = rs.getString(4);
        out.println("<a href='/SearchEngine/" + url + "'><h2>" + heading + "</h2></a>");
        out.println("<br>");
        out.println(urlDesc);
      }
      out.println("</body>");
      out.println("</html>");
    } catch (Exception e) {
      e.printStackTrace();
      out.println("<body><h4><font color='red'>Not able to display: "
          + e.getMessage() + "</font></h4></body></html>");
    }
  }
}
The GUI in Fig. 6 shows the keyword search framework, which searches for the keyword
in the database. The output gives the search results on the basis of the occurrence of the
keywords in the database.
Search Engines are significant and necessary tools that help users find relevant
information on the World Wide Web. SEs find information as per the user's query and
present the most relevant information to the user. A keyword search framework with a
GUI and the output of the Java program are discussed. The query searches the database
as per the occurrence of the keywords. SEs and MSEs are going to play a very crucial role
in IR on the emerging Semantic Web. It is not merely retrieval efficiency that will matter
in future. Future search engines will be more interactive; they will be talking and thinking
search engines that facilitate information retrieval by the knowledge workers of tomorrow.
No SE or MSE covers the entire web.
The future work on SEs and MSEs is that there is an immediate need to dig into and
discover the deep web and to develop a new ranking method to find exact search results
for the user's query from the web.
Model Based Approach for Design
and Development of Avionics
Display Application
1 Introduction
Modern avionics display systems mainly consist of the Primary Flight Display (PFD),
Navigation Display, Multifunctional Display and Engine Indicating and Crew Alerting
System. The PFD is designed to provide the pilot with visual information for overall
situational awareness, such as the aircraft's attitude, airspeed, vertical speed, heading,
altitude etc. The PFD is the most critical and most frequently referenced display. The
display computer mainly comprises display hardware and application software. The
display application software is safety-critical software that presents information in
standard graphical and alphanumeric forms such as text, numerals, scales, dials, tapes
and symbols, together termed "Display Symbologies". The development and certification
of airworthy application software for an avionics display system is a very long process,
as it is safety-critical and involves several stages of verification and validation,
compliance and traceability. To reduce development time and cost, a model-based
approach is used for the design and code generation of graphics for the avionics display.
A model-based approach to display symbology design can greatly reduce the modeling
workload and improve work efficiency in the design of display-system user interfaces.
In this paper, the OpenGL SC (Safety Critical) based SCADE Display tool is used. The
tool provides a platform to design and simulate interactive graphical interfaces; it also
features a target-independent code generator which allows generating C code using the
OpenGL library. This flight-worthy application software is capable of executing both on
the flight simulator and on the target hardware [1].
2. Design the graphical interface. This includes planning an appropriate layout to accommodate
all the modules of the display. The PFD mainly includes airspeed, altitude, attitude, heading
and autopilot modules. Further, appropriate display elements are defined in each module to
represent the relevant flight information, for example the airspeed pointer, airspeed readout
etc. Finally, all the display elements are added into the appropriate module group for better
identification.
3. Appropriate movement is added to the symbology so that it can be driven by the simulation
input (a small illustrative sketch follows this list). This includes Transition, Rotation or
Conditional groups. Transition properties are implemented to move a symbology in the
horizontal or vertical direction, for example the movement of the airspeed tape in the vertical
direction based on an increase/decrease of the airspeed value. Rotation properties are
implemented to rotate a symbol in the clockwise or anticlockwise direction, for example
rotating the heading dial based on the current heading of the aircraft. A Conditional group is
mainly implemented to replace a specific symbol with a failure annunciation due to an
equipment or system failure. In order to provide dynamic movement to the symbols, an
appropriate variable is plugged into the group. Figure 3 shows the dynamic movement of the
airspeed tape based on an increase/decrease of the airspeed value.
4. Simulation of the display symbology is performed using the SCADE Display simulator to
verify the symbol motion based on the simulation input. Design optimization and correction
are performed using the Integrated Design Checker, which enforces compliance of a display
specification with methodological, naming and graphical design rules. Figure 4 shows the
SCADE Display simulation environment.
5. The SCADE Display KCG code generator is used to generate OpenGL SC (Safety Critical)
based C source code consisting of a resource file, a symbology layer file, a target
configuration file and a log file [9] (Fig. 5).
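As a rough, tool-independent illustration of step 3, the sketch below maps an airspeed value to a vertical tape offset (Transition), a heading value to a dial angle (Rotation), and an invalid-data condition to a failure annunciation (Conditional group). The scale factor, reference speed and labels are assumptions for illustration only; the actual symbology is designed and generated in SCADE Display as described above.

# Illustrative sketch (not SCADE/OpenGL output): how a transition group could map
# an airspeed value to a tape offset, a rotation group could map heading to a
# dial angle, and a conditional group could swap in a failure annunciation.

PIXELS_PER_KNOT = 2.0        # assumed tape scale

def airspeed_tape_offset(airspeed_kt, ref_speed_kt=200.0):
    """Vertical translation (pixels) of the airspeed tape around a reference speed."""
    return (airspeed_kt - ref_speed_kt) * PIXELS_PER_KNOT

def heading_dial_angle(heading_deg):
    """Clockwise rotation (degrees) applied to the heading dial symbol."""
    return -(heading_deg % 360.0)

def airspeed_symbol(airspeed_kt, valid=True):
    """Conditional group: show the readout, or a failure annunciation if data is invalid."""
    return f"{airspeed_kt:5.1f} KT" if valid else "SPD FAIL"

print(airspeed_tape_offset(250), heading_dial_angle(90), airspeed_symbol(250, valid=False))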
However, the great majority of PFDs follow a similar layout convention. Other
information displayed on the PFD includes ILS glide slope indicators, navigational
marker information, course deviation indicators, display configuration settings and
much more. If no valid data is available to the display system due to an equipment or
subsystem failure, an appropriate failure annunciation is displayed on screen. The figure
illustrates an airspeed failure: the airspeed indicator is assigned a failure flag; during
airspeed failure the flag is set high and the failure annunciation is displayed [12] (Fig. 7).
4 Conclusion
In this paper, the implementation of the display system shows that the use of the SCADE
Display tool fundamentally changes the HMI development process. The goal of this
effort was to design, develop and assess the user interface for the Primary Flight Display
and the Engine Indicating and Crew Alerting System. The model-based approach to
display HMI development provides a cost-effective solution and reduces the certification
effort.
References
1. Yang Z (2009) Aircraft cockpit displays and visual simulation software implementation.
University of Electronic Science and Technology, pp 48–53
2. Marvin, Gao X, Wu Y (2006) Design and implementation of integrated avionics display and
control simulation system. Fire Control and Command Control 31(2): 40–43
3. Liu J (2009) GL-Studio based virtual cockpit flight simulator development. Harbin: Harbin
Institute of Mechanical and Electrical Engineering, University of Technology, pp 23–25
4. Luo C, Shen W, Zishan S Flight simulation system of multi-function display and
Implementation. Computer Simulation
5. Beijing Hua Li Chuang Tong Technology Co., Ltd.. GL Studio: real instrument panel
development tools. Software world
6. Fan S (2002) GL Studio software in visual simulation modeling. Comput Eng 3:260–261
7. Lefebvre Y (2008) Presagis, Montreal, Quebec (Canada). A flexible solution to deploy
avionics displays to multiple embedded platforms. In: 27th digital avionics systems
conference October 26–30
8. "Efficient Development of Safe Avionics Display Software with DO-178B Objectives
Using Esterel SCADE®" Methodological Handbook, Esterel Technologies web site
9. Yang Z, Wang J Research and implementation of display system in an avionics integrated
simulation system
10. Getting started with SCADE display. Esterel Technologies
11. Esterel Technologies Web Site. http://www.esterel-technologies.com
12. https://www.skybrary.aero/index.php/Primary_Flight_Display_(PFD)
13. https://en.wikipedia.org/wiki/Engine-indicating_and_crew-alerting_system
Thyroid Diagnosis Using Multilayer
Perceptron
Abstract. Thyroid disease is one of the main origins of serious medical issues for
human beings. Therefore, proper diagnosis of thyroid disease is an important step in
determining treatment for patients. A new approach based on a Multi-Layer
Perceptron (MLP) using the back-propagation learning algorithm to classify thyroid
disease is presented. It consists of an input layer with 4 neurons, 10 hidden layers with
3 neurons and an output layer with just 1 neuron. The appropriate choice of activation
function, the number of neurons in the hidden layer and the number of layers are
determined using MLP trial and error. The proposed method shows better performance
in terms of classification accuracy. For simulation results the MATLAB tool is used.
1 Introduction
Currently, artificial intelligence addresses a huge number of issues in developing pro-
fessional systems to diagnose various kinds of defects with high precision [1]. These
systems assist staff in hospitals and medical centers to quickly diagnose patients and
provide them essential treatments without the need for a medical expert. As a result,
these systems reduce the cost and time of diagnosis [2, 3]. The Artificial Neural Network
is the most important artificial intelligence technique that has been used to design diag-
nostic rules for distinct diseases such as diabetes, heart disease, breast cancer, skin
disease and thyroid [4].
The rest of the paper is organized as follows: Sect. 2 describes the literature review of
neural networks, Sect. 3 explains the proposed methodology, Sect. 4 details the results
and discussion, and finally Sect. 5 gives the concluding remarks on the proposed algorithm.
2 Literature Review
3 Proposed Methodology
In this work, a multilayer feed-forward ANN is exploited to recognize the type of thyroid
cases. The architecture and operation of an ANN mimic the biological nervous system of
human beings. A multilayer ANN has an input layer, an output layer, and one or more
hidden layers. Each layer consists of processing elements called neurons or nodes. The
number of neurons in each layer is chosen to be sufficient to solve a particular problem.
Except for the neurons of the output layer, each neuron of a given layer in a feed-forward
network is connected to all neurons of the next layer by synaptic weights [7]. The synaptic
weights are initialized with random values. During the training procedure, the synaptic
weights are altered via the learning algorithm to make the inputs produce the desired
output. The structure of a multilayer feed-forward neural network is shown in Fig. 1.
w(i+1) = w(i) + Δw          (1)

where w(i+1) is the updated value of the synaptic weights, w(i) is their current value, and
Δw is the change of weights, determined as in Eq. (2):

Δw = −η · ∂E/∂W          (2)

where η is the learning rate parameter and ∂E/∂W is the derivative of the error with respect
to the synaptic weights.
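A minimal sketch of this update rule is shown below for a single-hidden-layer network with sigmoid units, using plain gradient descent (the paper itself trains with scaled conjugate gradient back-propagation). The layer sizes, learning rate and random data are illustrative assumptions, not the paper's configuration.

import numpy as np

# Sketch of Eqs. (1)-(2): gradient-descent back-propagation for a single
# hidden layer with sigmoid units.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, W2, eta=0.1):
    """One forward/backward pass; returns weights updated as w <- w - eta * dE/dw."""
    h = sigmoid(W1 @ x)                       # hidden activations
    o = sigmoid(W2 @ h)                       # output activations
    delta_o = (o - y) * o * (1 - o)           # output-layer local gradient
    delta_h = (W2.T @ delta_o) * h * (1 - h)  # hidden-layer local gradient
    W2 = W2 - eta * np.outer(delta_o, h)      # Eq. (1)-(2) for output weights
    W1 = W1 - eta * np.outer(delta_h, x)      # Eq. (1)-(2) for hidden weights
    return W1, W2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(10, 21)), rng.normal(size=(3, 10))   # 21 inputs, 3 classes
x, y = rng.random(21), np.array([1.0, 0.0, 0.0])
W1, W2 = train_step(x, y, W1, W2)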
The ANN performance is then computed by calculating the classification rate.
Training uses scaled conjugate gradient back-propagation. Training automatically stops
when generalization stops improving, as indicated by an increase in the cross-entropy
error of the validation samples; training multiple times generates different results due to
different initial conditions and sampling (Tables 1, 2 and 3).
'Thyroid Inputs' is a 21×7200 matrix of static data representing 7200 samples of
21 elements, and the target 'Thyroid Targets' is a 3×7200 matrix of static data representing
7200 samples of 3 elements (Figs. 2, 3, 4, 5 and 6).
These operations are repeated until the error reaches a very small value (approximately
zero). At this point the algorithm converges, and the training process is stopped. The
flowchart of the back-propagation algorithm is shown in Fig. 1. After that, a test process
is conducted to evaluate the performance of the trained ANN by applying test samples
that were not used in the training process.
The results in Table 4 also show an increasing trend in classifier accuracy as the number
of coefficients is increased. For 10 coefficients, the MLP accuracy with F-Ratio increased
by 3.65% over the accuracy without F-Ratio, and the AUC increased by 3.05%. From this
work it is observed that an AUC of 99.6% is obtained with F-Ratio.
5 Conclusion
In this work we have presented a novel approach for the classification of thyroid cancer
using a multilayer perceptron model, which classifies thyroid cases as cancerous or
non-cancerous. The obtained results were analyzed with F-Ratio and without F-Ratio at
different numbers of hidden layers. The use of F-Ratio analysis to rank the significance
of coefficients increases the classification accuracy and the sensitivity of the MLP. The
results obtained show that the multilayer perceptron with F-Ratio analysis has better
classification accuracy.
References
1. Ramakrishna Murty M, Murthy JVR, Prasad Reddy PVGD (2011) Text document classifi-
cation based on a least square support vector machines with singular value decomposition.
Int J Comput Appl 27(7):21–26. https://doi.org/10.5120/3312-4540
2. Himabindu G, Ramakrishna Murty M et al (2018) Classification of kidney lesions using bee
swarm optimization. Int J Eng Technology 7(2.33):1046–1052
3. Himabindu G, Ramakrishna Murty M et al (2018) Extraction of texture features and
classification of renal masses from kidney images. Int J Eng Technology 79(2.33):1057–1063
4. Navya M, Ramakrishna Murty M et al (2018) A comparative analysis of breast cancer data set
using different classification methods. International Conference and published the proceedings
in AISC, Springer, SCI-2018
5. Lederman D (2002) Automatic classification of infants cry. M.Sc. Thesis, Department of
Electrical and Computer Engineering: Ben-Gurion University of The Negev. Negev, Israel
6. Ham FM, Kostanic I (2001) Principles of neurocomputing for science and engineering.
McGraw Hill, New York
7. Protopapasa V, Eimas, PD (1997) Perceptual differences in infant cries revealed by
modifications of acoustic features. Acoust Soc Am 102:3723–3734
8. Dey R, Bajpai V, Gandhi G, Dey B (2008) Application of artificial neural network
(ANN) technique for diagnosing diabetes mellitus. In: IEEE third international Conference on
Industrial and Information Systems (ICIIS) Kharagpur, India, pp 1–4
Optimal Sensor Deployment Using Ant Lion
Optimization
1 Introduction
A significant concern of research in the WSN area is the coverage rate of the network.
It must be ensured that the monitored field is entirely covered and sensed over the full
lifespan of the whole network. Improper placement of sensors in the ROI is the main
contributing factor to the coverage problem. Many linear techniques related to sensor
deployment have been proposed in the past literature [4–8].
K. Chakrabarty et al. proposed a virtual force algorithm for sensor deployment [9]. The
sensor field is depicted by a grid. Initially sensors are placed randomly and then the
sensors are divided into clusters and cluster heads randomly. Then VFA is executed on
the cluster heads to obtain new locations for the sensors. Li-Hsing Yen et al. proposed
a K-means clustering approach to improve network coverage, in which clusters are
formed based on proximity and cluster heads are then elected based on energy.
Using optimization techniques for solving real-world problems has become a
new paradigm in the diverse field of applications. The optimization techniques are a
combination of mathematical theories and collective intelligence that can solve the
problem quickly and efficiently. The first metaheuristic optimization technique called
Particle Swarm Optimization (PSO) [10] for improving the coverage rate of the net-
work was proposed by Wu Xiaoling et al. [11]. The PSO technique was used to
maximize coverage accuracy based on probability sensor model.
The above limitations motivate us to plan a system that optimizes the sensor
deployment process. Our work solely focuses on solving the coverage problem. We
consider the coverage as a single objective problem and ant lion optimization algorithm
is used to maximize the coverage rate of the sensor network. This paper is formulated
as follows: Section 2 illustrates the ALO in detail. Section 3 explains the WSN cov-
erage. Section 4 proposes a methodology to solve the coverage problem. Section 5.1
depicts the experimental setup and Section 5.2 shows the performance evaluation.
Lastly, Section 6 ends with the conclusion.
The antlions and ants move in an N-dimensional search landscape during the foraging
process. Ants walk randomly in the landscape searching for food, and this movement
behavior can be formulated as:
where ai is the lower boundary of the random walk of ith variable, bi is the upper
boundary of the random walk in ith variable, cli is the minimum of ith variable at lth
iteration, and bli indicates the maximum of ith variable at lth iteration. The antlion traps
affect the random walk movement of the ants and is modelled as:
where cl is the minimum vector of all variables at lth iteration, bl is the maximum vector
of all variables at lth iteration, cli is the minimum vector of all variables for ith ant, bli is
the maximum vector of all variables for the ith ant, and Antlion_j^l is the position of the
jth antlion selected at the lth iteration using the roulette wheel mechanism.
The vectors in (3) and (4) defines the ants random walk around a selected antlion. The
ants move within a hypersphere around antlion. When the antlions sense that ants are
trapped in the pit, the sliding of ants towards antlions and throwing of sand outwards
when the ants try to escape is modelled with decreasing radius as:
c^l = c / (10^w · (l/L))          (5)

b^l = b / (10^w · (l/L))          (6)
where l is the current iteration, L is the maximum number of iterations, and w is a
constant that controls the exploitation process, given as:
w = 2 if l > 0.1L;  w = 3 if l > 0.5L;  w = 4 if l > 0.75L;  w = 5 if l > 0.9L;  w = 6 if l > 0.95L          (7)
The best solution (antlion) obtained during the process is referred to as the elite. An ant
moves around the antlion selected by the roulette wheel mechanism and around the elite,
and its new position is modelled as:

Ant_i^l = (R_A^l + R_E^l) / 2          (8)

where R_A^l is the random walk around the antlion selected by the roulette wheel at the
lth iteration, R_E^l is the random walk around the elite at the lth iteration, and Ant_i^l
indicates the position of the ith ant at the lth iteration.
The stepwise details of this algorithm are as follows (a minimal code sketch follows the list):
1. Initialize parameters: no. of ants (A), no. of antlions (AL), iterations (L)
2. Evaluate the fitness of ants and antlions
3. Determine the elite (the best antlion)
4. For iterations 1 to L
5. for every ant A
   a. Choose an antlion using the roulette wheel mechanism
   b. Update the trapping of ants in antlion pits using Eqs. (3) and (4)
   c. Create a random walk using Eq. (1) and normalize it using Eq. (2)
   d. Update the ant position using Eq. (8)
   e. Evaluate the fitness of all ants
   f. Update the elite if a superior antlion is found
6. Return the best antlion (elite).
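The sketch below illustrates the main ingredients of these steps: the piecewise constant w of Eq. (7), the shrinking boundaries of Eqs. (5)–(6), a normalized random walk, and the ant update of Eq. (8). The dimensions, bounds and iteration numbers are illustrative assumptions.

import numpy as np

# Minimal sketch of the ALO ingredients listed above.

def ratio_w(l, L):
    """Piecewise constant w of Eq. (7) controlling exploitation."""
    if l > 0.95 * L: return 6
    if l > 0.90 * L: return 5
    if l > 0.75 * L: return 4
    if l > 0.50 * L: return 3
    if l > 0.10 * L: return 2
    return 1

def shrunk_bounds(c, b, l, L):
    """Eqs. (5)-(6): boundaries divided by 10^w * (l / L)."""
    I = (10 ** ratio_w(l, L)) * (l / L)
    return c / I, b / I

def random_walk(steps, rng):
    """Cumulative sum of +/-1 steps, min-max normalised to [0, 1]."""
    walk = np.cumsum(2 * (rng.random(steps) > 0.5) - 1).astype(float)
    return (walk - walk.min()) / (walk.max() - walk.min())

def update_ant(RA, RE):
    """Eq. (8): the ant moves to the average of its walks around antlion and elite."""
    return (RA + RE) / 2.0

rng = np.random.default_rng(1)
c, b = np.full(2, -10.0), np.full(2, 10.0)       # assumed search bounds, 2 variables
print(shrunk_bounds(c, b, l=800, L=1000))
print(update_ant(random_walk(5, rng), random_walk(5, rng)))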
For a WSN to be operable, the sensors must sense, process and transmit information.
A lack of sensing ability leads to a coverage problem. According to [13], there are three
main reasons for the coverage problem: random deployment, limited sensing range and an
inadequate number of sensors to cover the ROI. The sensors can be deployed in the ROI
either manually or randomly. In manual deployment, sensors are placed by hand at
locations known in advance; in contrast, sensors are placed stochastically in random
deployment. In [14], the author discussed two models to evaluate the sensing range and
coverage area of a network: the binary model and the probability model.
d(S, P) = √((xs − xi)² + (ys − yi)²)          (10)

In Eq. (10), d(S, P) is the Euclidean distance between a point P(xi, yi) and the sensor
node S(xs, ys). The main shortcoming of the binary sensing model is that it ignores the
imprecision in sensor detection due to interference or hardware malfunctioning; hence
sensor coverage is not a perfect circle.
Different values of a1, b1, d1 and a2, b2, d2 yield different detection probabilities for
the sensors. The values of a1, b1 and a2, b2 fall in the range [0, 1]. The d1 and d2 are
calculated as:

d1 = rc + rs + d(ci, p)          (12)

d2 = rc − rs + d(ci, p)          (13)
The joint coverage cov, when more than one sensor node measures the target, is given by
Eq. (14).
The network coverage rate is defined and calculated as the ratio of the area covered to the
total area of the grid:

Area_cov(cov) = Σ P_cov(cov) / TotalArea          (15)
The coverage of the sensor field is calculated as the fraction of grid points that
exceed the threshold Cth [15].
4 Proposed Methodology
ALO based strategy is proposed to solve the coverage problem. ALO helps in deter-
mining the optimal sensor node location that can maximize the coverage rate of the
sensor network. The sensor nodes are deployed in a landscape which is a two-
dimensional area. All the sensor nodes know their respective positions. The target area
is divided into an equal number of grid points. The base station is located at a fixed
point on the grid.
Initially, the sensors are placed randomly on the ROI. Then the ALO is executed at
the base station. The ALO algorithm determines the optimal location of the sensors.
The base station then transmits the optimal location points to the sensor nodes. Upon
receiving this information from the base station, the sensor nodes move to the new
optimal positions. These location coordinates are obtained based on the coverage
objective function. Given a set of N sensors S = {s1, s2, …, sN} that are to be placed on a
grid, the coverage problem is to optimally deploy the sensors so that maximum coverage
is achieved, and the objective function can be formulated as:

f = Maximize(Area Coverage Ratio) = N_effective / N_all          (16)
Here N_effective is the number of grid points covered by the sensors and N_all is the total
number of grid points in the entire area. The coverage ratio can be calculated as follows
(see the sketch after the steps):
1. Calculate the coverage rate using Eq. (11).
2. Calculate the joint coverage rate using Eq. (14).
3. Repeat steps 1 and 2 to calculate the joint rate of each grid point.
4. Calculate the area coverage rate using Eq. (15).
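Under the simpler binary sensing model, the area coverage ratio of Eq. (16) can be evaluated as in the sketch below: a grid point counts as covered when its Euclidean distance (Eq. (10)) to at least one sensor is within the sensing radius. The grid size, sensing radius and sensor positions are assumptions made for illustration.

import numpy as np

# Sketch of the area-coverage-ratio objective under a binary sensing model.

def coverage_ratio(sensors, grid_size=20, rs=3.0):
    """Fraction of grid points within sensing radius rs of at least one sensor."""
    xs, ys = np.meshgrid(np.arange(grid_size), np.arange(grid_size))
    points = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
    covered = np.zeros(len(points), dtype=bool)
    for sx, sy in sensors:
        d = np.hypot(points[:, 0] - sx, points[:, 1] - sy)   # Eq. (10)
        covered |= d <= rs
    return covered.mean()                                     # N_effective / N_all

sensors = [(4, 4), (10, 10), (16, 5), (6, 15), (15, 16)]      # assumed positions
print(round(coverage_ratio(sensors), 3))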
Figure 1 depicts the flowchart for optimal sensor deployment using ALO. The
final sensor deployment takes place after coverage optimization.
5 Simulations
5.1 Experimental Setup
The setup is carried out on MATLAB 2018a software. The sensing field is assumed to be a
20×20 m² grid area, and 10 sensors are to be deployed on the 2D plane. The number of
search agents assumed for our experiment is 40 and the halting criterion is 1000 iterations.
The ant positions are taken as the sensor node positions and the elite is taken as the
maximum coverage rate value.
Figure 2 shows the uniform sensor deployment using ALO. The red square denotes the
base station, the red stars denote the cluster heads, and the blue stars denote the sensors.
Figure 3a shows the coverage rate achieved using different algorithms; ALO clearly
performed better than the other algorithms. Figure 3b shows the average execution time
taken by the algorithms to deploy the sensors; ALO's execution time was lower than that
of the other algorithms. However, execution time may vary depending on the processor
and cache speed. In conclusion, ALO was found to provide better optimal coverage with
minimum execution time.
Fig. 3. Bar graphs for coverage rate and average execution time.
6 Conclusion
The ALO optimization technique used for optimal sensor deployment was discussed in
detail in this work. The main aim of our work was to solve the coverage problem in WSNs.
The results show that ALO outperformed all the other algorithms discussed in terms of
convergence rate, performance and objective value. In later work we will try to extend this
approach by taking load balancing and the routing paradigm into consideration.
References
1. Zhao F, Guibas LJ, Guibas L (2004) Wireless sensor networks: an information processing
approach. Morgan Kaufmann, San Francisco
2. Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a
survey. Comput networks 38:393–422
3. Hoebeke J, Moerman I, Dhoedt B, Demeester P (2004) An overview of mobile ad hoc
networks: applications and challenges. Journal-Communications Netw 3:60–66
4. Chakrabarty K, Iyengar SS, Qi H, Cho E (2002) Grid coverage for surveillance and target
location in distributed sensor networks. IEEE Trans Comput 51:1448–1453
5. Wang Y-C, Hu C-C, Tseng, Y-C (2005) Efficient deployment algorithms for ensuring
coverage and connectivity of wireless sensor networks. In: First International Conference on
Wireless Internet (WICON’05), pp. 114–121
6. Heo N, Varshney PK (2005) Energy-efficient deployment of intelligent mobile sensor
networks. IEEE Trans. Syst., Man, Cybern.-Part A: Syst. Humans 35:78–92
7. Yen L-H, Yu CW, Cheng Y-M (2006) Expected k-coverage in wireless sensor networks. Ad
Hoc Networks. 4:636–650
8. Wu CH, Lee KC, Chung YC (2007) A Delaunay Triangulation based method for wireless
sensor network deployment. Comput Commun 30:2744–2752
9. Zou Y, Chakrabarty K (2003) Energy-aware target localization in wireless sensor networks.
In: Pervasive Computing and Communications, 2003.(PerCom 2003). In: Proceedings of the
First IEEE International Conference on, pp 60–67
10. Lei S, Cho J, Jin W, Lee S, Wu X (2005) Energy-efficient deployment of mobile sensor
networks by PSO, pp 373–382
11. Xiaoling W, Lei S, Jie Y, Hui X, Cho J, Lee S (2005) Swarm based sensor deployment
optimization in ad hoc sensor networks. In: International Conference on Embedded Software
and Systems, pp 533–541
12. Mirjalili S (2015) The ant lion optimizer. Adv Eng Softw 83:80–98
13. Heinzelman WB, Chandrakasan AP, Balakrishnan H (2002) An application-specific protocol
architecture for wireless microsensor networks. IEEE Trans Wirel Commun 1:660–670
14. Megerian S, Koushanfar F, Qu G, Veltri G, Potkonjak M (2002) Exposure in wireless sensor
networks: theory and practical solutions. Wirel Networks 8:443–454
15. Chaudhary DK, Dua RL (2012) Application of multi objective particle swarm optimization
to maximize coverage and lifetime of wireless sensor network. Int J Comput Eng Res
2:1628–1633
16. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
17. Saremi S, Mirjalili S, Lewis A (2017) Grasshopper optimisation algorithm: theory and
application. Adv Eng Softw 105:30–47
18. Mirjalili S (2016) Dragonfly algorithm: a new meta-heuristic optimization technique for
solving single-objective, discrete, and multi-objective problems. Neural Comput Appl
27:1053–1073
Text Steganography: Design
and Implementation of a Secure and Secret
Message Sharing System
1 Introduction
Secret sharing of messages is an art practiced since ages. However, techniques like
cryptography and steganography have made it much more secure in the information age.
As explored in [1, 3, 5, 16], text steganography has its merits over other forms where the
cover is audio, video or image. Complexity is reduced with text steganography: no
compression process is required and there is no need for high bandwidth consumption.
As adversaries gain knowledge of the different methods of text steganography, upgrading
systems with novel methods is a continuous process. Many approaches exist in the
literature. A combination of different abbreviation methods is employed in [1], while a
Vigenere cipher and a lossless compression technique are used in [2] for sending secret
messages through mail. Certain techniques were also defined based on the secret messages
or the nature of the data. For instance, for sharing financial data, the concept of adding
additional zeros is used in the method proposed in [7], which proved good enough to
ensure secure transmission of data to the desired destination. The usage of diacritics is
studied in [10]
and [12]. From the literature it is understood that more lightweight approaches that do
not compromise security are required in text steganography.
In this paper, we propose a lightweight approach for text embedding and extraction.
The proposed approach is presented in Fig. 1. The contributions of this paper are as
follows: a methodology for the text embedding and extraction process is proposed, and a
prototype is implemented to demonstrate proof of concept. The remainder of the paper is
structured as follows. A review of the literature on related works is provided in Sect. 2.
The proposed methodology is presented in Sect. 3. Section 4 shows the results of the
empirical study, while Sect. 5 concludes the paper and provides directions for future
work.
2 Related Work
This section provides review of literature on text Steganography. Shivani, Yadav and
Batham [1] proposed a novel approach for hiding secret data using text Steganography.
Text file is used as cover as it consumes less space and needs low bandwidth. It also
achieves less time and consumes minimal overhead. They employed abbreviation
methods in combination with Zero Distortion Technique (ZDT). In order to have higher
security, encryption is carried out with a technique known as Index Based Chaotic
Sequence (IBCS). It shows better performance in terms of time consumption and
hiding capacity. Tutuncu and Hassan [2] on the other hand proposed a text
Steganography method which involves Vigenere cipher and lossless compression
technique to achieve email-based text Steganography. Different lossless compression
algorithms are used in a proper sequence such as Run Length Encoding, Burrows
Wheeler Transform, and Move to Front, Run Length Encoding and Arithmetic
Encoding. Stego key is generated using Latin Square. In order to increase complexity
and security Vigenere cipher is employed. Finally, the secret message is embedded into
an email message.
Osman et al. [3] explored capacity performance of text Steganography. Capacity is
one of the metrics used to know the performance of it. Other metrics include saving
space ratio and embedding ratio provides further performance details. There are many
format based Steganography techniques that are evaluated. The methods analyzed for
capacity performance include Quadruple Categorization (QUAD), Vertical Straight
Line (VERT) and Changing in Alphabet Letter Patterns (CALP). Other performance
measures employed are Saving Space Ratio (SSR) and Embedding Ratio (ER). Shi
et al. [4] explored the notion of searching the Internet to achieve text steganography.
Known as search-based text steganography, it uses features of web pages, with a web
page serving as the cover data for the method.
Shutko [5] explored the aprosh and kerning concepts in text steganography; aprosh
and kerning are two parameters for changing text. Lwin and Phyo [6] combined text and
image steganography: a word mapping method is used to embed the secret message into
cover text, and the cover is then embedded into an image using the LSB technique. A text
steganography system for financial statements is built in [7]; the concept of systematically
adding additional zeros is employed to achieve this. Stojanov et al. [8] proposed a concept
known as property coding to embed secret text into an MS Word document.
3 Proposed Approach
This section presents the methodology for text steganography using differently spelt
US and UK keywords. The original content is plain text which, when passed through the
network, raises security issues. To overcome these problems, the plain text is converted to
cipher text by encryption. The encryption mechanism uses keywords that are spelt
differently in US and UK English and involves two steps. First, the ASCII value of each
character of the plain text is found. Based on the index position of the character, a set is
selected from the sets framed from the US and UK words of Table 1; the number of sets
of 256 words depends on the number of differently spelt US/UK words. The second step
is to find the word in the selected set whose index equals the ASCII value of the character,
for example selecting the word from set1 based on the ASCII values of each character of
the plain text. This replacement, matching the ASCII value of the character with an index
position in the sets, is repeated until all characters of the plain text have been replaced
with words from the US and UK sets. This is the procedure for converting plain text to
cipher text.
At the other end the cipher text is converted back to the original plain text. The
decryption mechanism works as follows. The index position of each keyword is found in
its set (set1, set2 and so on, selected according to the word). From that index we get the
ASCII value, and from the ASCII values the equivalent characters are obtained. Thus the
original plain text is recovered. This is the methodology used for the encryption and
decryption mechanism.
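A minimal sketch of this character-to-word mapping is given below. The placeholder word lists are padded to 256 entries so the example runs; a real deployment would use the full 256-word UK/US sets (A1, A2) described above.

# Sketch of the described encryption/decryption: each plaintext character's
# ASCII value selects the word at that index in a 256-word set built from
# differently spelt UK/US words; decryption recovers the character from the
# index of the received word. The padded lists below are placeholders.

UK_SEED = ["aeroplane", "aesthetic", "ageing", "aluminium"]
UK_SET = UK_SEED + ["uk%03d" % i for i in range(len(UK_SEED), 256)]   # placeholder padding

def encrypt(plaintext, word_set=UK_SET):
    """Replace each character by the word whose index equals its ASCII value."""
    return [word_set[ord(ch)] for ch in plaintext]

def decrypt(cipher_words, word_set=UK_SET):
    """Recover each character from the index of its word in the selected set."""
    return "".join(chr(word_set.index(w)) for w in cipher_words)

cipher = encrypt("Hi")
print(cipher)               # -> ['uk072', 'uk105']
print(decrypt(cipher))      # -> 'Hi'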
The following subsections provide the methodology for both the embedding and extrac-
tion processes. Embedding includes inserting the secret message into a cover text and
then encrypting the cover file; the reverse process is followed in the extraction phase. As
shown in Fig. 1, a typical scenario of secure and secret sharing of information between a
sender and a receiver is considered. This kind of modus operandi is common in
As shown in Listing 1, the given secret text is embedded into a cover text file and sent
to the destination. At the receiver side, the extraction process then takes place.

Decryption:
  For all values of i from 0 to n:
    Value = index of the i-th cipher word in the selected set
    Original text = ASCII-to-character(Value)

Here we first find the index of the cipher word; depending on the index, either the UK
series (set A1) or the US series (set A2) is selected and the value is taken. This process
continues until the last index of the plain text; afterwards, the ASCII values obtained from
the respective US or UK sets give back the original text.

Parameters: T – text, I – index, A1 – set of the UK series, A2 – set of the US series.
4 Results
This section provides the results of the proposed system, including a comparison based
on execution time and memory allocation. The sample sets of UK and US words are
shown in Table 1.
The following A1 and A2 show the lists of words used for framing the sets of UK and
US words:
A1 = {Set of UK Series} => UK words: {æroplane, æsthetic, ageing, aluminium, ……}
A2 = {Set of US Series} => US words: {airplane, aesthetic, aging, aluminum, ……}
Fig. 2. UI for loading input file for embedding process
Fig. 3. Input file has been loaded
Fig. 4. Get the message of file loaded
Fig. 5. Get the message of file encryption
Fig. 7. Memory usage comparison for encryption (using UK keywords)
Fig. 8. Memory usage comparison for decryption (using UK keywords)
Fig. 9. Execution time comparison for encryption (using UK keywords)
Fig. 10. Execution time comparison for decryption (using UK keywords)
Fig. 11. Memory usage comparison for encryption (using US keywords)
Fig. 12. Memory usage comparison for decryption (using US keywords)
Fig. 13. Execution time comparison for encryption (using US keywords)
Fig. 14. Execution time comparison for decryption (using US keywords)
In this paper text steganography is studied, and embedding and extraction procedures are
proposed for the secure and secret sharing of textual data. The cover medium is text,
which keeps the communication between sender and receiver lightweight. The proposed
system is implemented as a prototype developed in the Java programming language and
demonstrates proof of concept. However, further research and development are needed to
obtain a robust system leveraging information forensics to safeguard sensitive real-world
communications. The proposed system can be used in the secret communication module
of existing information sharing systems.
References
1. Shivani, Yadav VK, Batham S (2015). A novel approach of bulk data hiding using text
steganography. Procedia Comput Sci 57: 1401–1410
2. Tutuncu K, Hassan AA (2015) New approach in E-mail based text steganography. Int J Intell
Syst Appl Engineering 3(2):54–56
3. Osman B, Din R, Idrus MR (2013) Capacity performance of steganography method in text
based domain. ARPN J Eng Appl Sci 10:1–8
4. Shi S, Qi Y, Huang Y (2016) An approach to text steganography based on search in internet.
IEEE, pp 1–6
5. Shutko N (2016) The use of aprosh and kerning in text steganography, pp 1–4
6. Lwin T, Su Wai P (2014) Information hiding system using text and image steganography.
Int J Sci Eng Technol Research 3(10):1972–1977
7. Khairullah Md (2014) A novel text steganography system in financial statements. Int J
Database Theory Application 7(5):123–132
8. Stojanov I, Mileva A, Stojanovic´ I (2014) A new property coding in text steganography of
microsoft word documents, pp 1–6
9. Al-Nofaie SM, Fattani MM, Gutub AAA (2016) Capacity improved arabic text steganog-
raphy technique utilizing ‘Kashida’ with whitespaces. In: The 3rd International Conference
on Mathematical Sciences and Computer Engineering, pp 1–7
10. Ahmadoh EM (2015) Utilization of two diacritics for arabic text steganography to enhance
performance. Lect Notes Inf Theory 3(1):1–6
11. Kumar R, Chand S, Singh S (2015) An efficient text steganography scheme using unicode
space characters. Int J Forensic Comput Science 1:8–14
12. Shah R, Chouhan YS (2014) Encoding of hindi text using steganography technique. Int J Sci
Res Comput Sci Eng 2(1):1–7
13. Mersal S, Alhazmi S, Alamoudi R, Almuzaini N (2014) Arabic text steganography in
smartphone. Int J Comput Inf Technology 3(2):1–5
14. Iyer SS, Lakhtaria K (2016) New robust and secure alphabet pairing text steganography
algorithm. Int J Curr Trends Eng Res 2(7):15–21
15. Kour J, Verma D (2014) Steganography techniques –a review paper. Int J Emerg Res Manag
Technology 3(5):1–4
16. Aditya Kumar K, Pabboju S (2018) An optimized text steganography approach using
differently spelt english words. Int J Pure Appl Math 118(16):653–666
Commercial and Open Source Cloud
Monitoring Tools: A Review
Abstract. Cloud computing has become most popular due to its advantages.
More and more organizations are migrating to the cloud to reduce the com-
plexity of maintaining resources, so cloud management becomes a most challenging
task. To reduce management complexity and improve the overall perfor-
mance of the cloud, an efficient cloud monitoring tool is required. The cloud mon-
itoring tool helps to improve overall performance and reduce management
complexity. The major functions of cloud monitoring are to track the QoS
parameters of virtualized and physical resources and of the applications hosted on the
cloud. Hence cloud monitoring tools monitor all resources and events and
perform dynamic reconfiguration of the cloud for better performance. In this review
paper, we discuss the basic concept of a cloud monitoring tool and discuss
various commercial and open-source cloud monitoring tools and their taxonomy.
1 Introduction
number users and their data. Hence there has been a significant rise in the implementation of cloud monitoring and management tools.
The cloud monitoring tool collects data from different probes installed in various parts of the cloud. We found several data collection methods in our literature survey: push-based, pull-based, hybrid, and adaptive push-pull. In the push method, cloud components pass information to a central server. In the pull method, the central server asks cloud components to send the information. The hybrid method combines push-based and pull-based collection. The adaptive push-pull method uses either push or pull at any one time, based on the User Tolerate Rate and the percentage of data change, and improves performance by consuming less computational and communication power.
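For illustration, a minimal Python sketch of the adaptive push-pull decision might look as follows (the 5% User Tolerate Rate threshold and the change metric are assumptions for the example, not values taken from any particular tool):

def percent_change(old, new):
    # Relative change between two successive metric readings.
    if old == 0:
        return 100.0 if new != 0 else 0.0
    return abs(new - old) / abs(old) * 100.0

def choose_mode(change_pct, user_tolerate_rate=5.0):
    # Push when the monitored value changes faster than the user tolerates;
    # otherwise let the central server pull at its own (cheaper) interval.
    return "push" if change_pct > user_tolerate_rate else "pull"

previous, current = 40.0, 43.5                           # e.g. CPU utilization samples (%)
print(choose_mode(percent_change(previous, current)))    # -> push (8.75% change > 5% tolerance)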
Redundant, invalid, conflicting, and irrelevant data increase time and space complexity. To reduce this overhead, the collected data is filtered. A good filtering algorithm delivers more relevant information, and data cleaning is an important activity for removing unwanted data from the cloud system. Filtering decreases the impact of monitoring data transfer on the network load and increases the performance of the cloud. The agents in the data filtering phase reduce unwanted data using well-known intelligent data mining techniques that adapt machine learning algorithms.
Data aggregation is a process in which information is gathered and expressed in a summary form for statistical analysis. Its main purposes are to reduce network traffic and secure private data [6]. Data aggregation can be implemented in cloud monitoring by adopting data mining techniques such as clustering and classification. Once data has been aggregated, it must be processed and analyzed in order to extract useful information. Data analysis is a process of reviewing, transforming, and modeling data with the goal of extracting useful information, suggesting conclusions, and supporting decision making [15]. Data analysis improves performance by identifying the status of resources, predicting their future status, and detecting critical conditions.
The agents interact with other agents to take intelligent decisions. The collector agent collects up-to-date data and passes it to the data filter agents. The agents work collaboratively and exchange messages and information among themselves. The agent manager periodically checks the health and working functionality of all agents in the cloud monitoring system and stores all agent details in a database. It also stores the intermediate data generated from the data collection through to the decision-making process, and collects the analyzed or processed data to take control decisions that improve performance. Connections between agents and the agent manager are made only from time to time, because continuous monitoring is very costly; to reduce communication cost, the agents update the data only when a certain amount of data change occurs.
Elasticity: the size of the cloud increases and decreases dynamically based on its usage.
Extensibility: resources/services can be extended as per new user requirements.
Non-Intrusive: the monitoring system accommodates significant modifications as requirements change.
Scalability: services are provided even when a large number of users and organizations are added.
Timeliness: the system responds within a time limit.
Resilient: a monitoring system must be persistent in delivering services when the cloud changes as per new requirements.
Reliable: the system performs the required task at any point in time under the stated conditions.
Portable: the cloud environment incorporates heterogeneous platforms and services.
Multi-tenancy: the CMS should maintain concurrency in serving different data or information to many users at a particular point in time.
Customizability: requirements change from customer to customer, so the CMS should support customization of all cloud operations.
In this section, we discuss the most popular commercial cloud monitoring tools, which make the management task simpler for cloud providers.
a. Amazon CloudWatch [8, 35] is a monitoring service intended for a variety of users, such as developers, system operators, site reliability engineers, and IT managers. It delivers data and actionable insights to monitor applications, respond to system-wide performance changes, improve resource utilization, and obtain a unified view of operations and activities.
b. CloudMonix [9] is an enhanced cloud monitoring and automation solution for the Microsoft Azure Cloud. CloudMonix's live monitoring dashboard allows Azure Cloud administrators to understand the state of cloud resources in advance, to be informed through signals about warnings and exceptions, and to organize automated recovery and restoration activities [7].
c. CA Unified Infrastructure Management (CA UIM) [10] provides a single, analytics-driven solution for proactive, efficient, and effective management and monitoring of modern hybrid cloud infrastructures. It is an IT monitoring solution that adopts artificial intelligence techniques to provide intelligent analytical reports, broad coverage, and an extensible, portable architecture.
d. AppDynamics APM [11] is an application intelligence monitoring tool that monitors operational insight, application performance, user experience, and the business impact of software applications. It provides application performance management and helps in mapping applications automatically.
e. New Relic Cloud Monitoring [12] is used to monitor dynamic cloud applications and infrastructure in an intelligent manner. It monitors applications in one place, which helps in viewing error rates, page load times, slow transactions, and the list of running servers.
f. PagerDuty [13] offers more freedom in customizing parameters and the alert mechanism. It also integrates with other services such as New Relic and AWS, and provides an incident management tool that helps cloud monitoring systems trigger alarms.
g. Bitnami Stacksmith [14] is an independent, easy custom-delivery tool with a single goal: to make it simple to get started with AWS services from the command line. It employs artificial intelligence algorithms to improve the performance of cloud activities.
h. Microsoft Cloud Monitoring (OMS) [15] provides greater visibility and control across the hybrid cloud with simple operations management and security. It is a group of cloud-based services for managing on-premises and cloud environments from one single place.
i. Datadog [16] helps in monitoring events and performance metrics for IT and DevOps organizations. The tool supports scalability and works efficiently even as the data size increases. It offers real-time customizable consoles with slice-and-dice displays and alerts by labels, characteristics, etc.
j. Nimsoft [17] supports multi-layer monitoring and monitors both virtual and physical cloud resources. Nimsoft allows its consumers to view and monitor resources that are hosted on different cloud infrastructures.
k. Monitis [18] is a multi-agent based cloud monitoring tool. The agents are installed on the network before the firewall, and they collect all the data from network devices using plugins.
l. RevealCloud [19] is used to monitor different types of clouds. Consumers can monitor across different cloud layers, e.g., SaaS, PaaS, and IaaS. It is not tied to a specific cloud; rather, it can monitor all types of clouds to get the most benefit from popular clouds.
m. LogicMonitor [20] also enables consumers to monitor across the different layers of the cloud, such as IaaS, PaaS, and SaaS. SSL and SNMP protocols are used for communication.
n. Cloudkick [21] is used to monitor and manage server instances running on the Amazon EC2 cloud and other cloud providers through a single, unified interface. Small, low-overhead agents installed on each instance collect different metrics from the CPU, memory, and network.
Open source monitoring tools can offer a number of advantages over cloud providers' native options. In this section, we briefly discuss the most popular and powerful open-source cloud monitoring tools. These are:
a. Nagios [22] monitors all types of cloud components, such as network protocols, operating systems, system performance metrics, applications, web servers, web services, websites, etc. Nagios provides a high level of performance while consuming fewer server resources by using the Nagios Core 4 monitoring engine.
[Table: comparison of open-source cloud monitoring tools [22]–[33] against monitoring properties (Accuracy, Adoptability, Autonomic, Availability, Comprehensiveness, Elasticity, Extensibility, Non-Intrusive, Scalability, Timeliness, Resilient, Reliable, Portable, Multi-tenancy, Customizability, Public Cloud, Private Cloud). Tools [22]–[29] and [31] target XaaS, while [30], [32], and [33] target SaaS.]
demand for real-time reporting of performance measurements is increasing while monitoring operations are performed. Therefore, cloud monitoring systems need to be advanced and customized to suit the diversity, scalability, and highly dynamic nature of cloud environments.
5 Conclusion
References
1. The NIST Definition of Cloud Computing. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf
2. Alhamazani K, Ranjan R, Mitra K, Rabhi F, Jayaraman PP, Khan SU, Guabtni A,
Bhatnagar V (2014) An overview of the commercial cloud monitoring tools: research
dimensions, design issues, and state-of-the-art. J Comput 97:357–377
3. Fatemaa K, Emeakarohaa VC, Healy PD, Morrison JP, Lynn (2014) A survey of cloud
monitoring tools: taxonomy, capabilities and objectives. J Parallel Distrib Comput
74(10):2918–2933
4. Aceto G, Botta A, de Donato W, Pescape A (2013) Cloud monitoring: a survey. J Comput
Netw 57:2093–2115
5. Stephen A, Benedict S, Anto Kumar RP (2018) Monitoring IaaS using various cloud
monitors. Cluster Comput. (22):1–13. Springer
6. Bulla C, Bhojannavar S, Danawade V (2013) Cloud computing: research activities and
challenges. Int J Emerg Trends Technol Comput Sci 2:206–214
7. Al-Ayyoub M, Daraghmeh M, Jararweh Y, Althebyan Q (2016) Towards improving
resource management in cloud systems using a multi-agent framework. Int J Cloud Comput
5(2):112–133
8. CloudWatch Document. https://docs.aws.amazon.com/AmazonCloudWatch/latest/
monitoring/acw-ug.pdf
9. CloudMonix. https://cloudmonix.com/features/cloud-monitoring/
10. CA Unified Infrastructure Management. https://docops.ca.com/ca-unified-infrastructure-
management/9-0-2/en
11. AppDynamics. https://docs.appdynamics.com/download/attachments/34272519/
12. New Relic. https://docs.newrelic.com/docs/apm/new-relic-apm/guides
13. PagerDuty. https://v2.developer.pagerduty.com/docs
Developing Social-Media Based Text Corpus for San'ani Dialect (SMTCSD)
Abstract. This paper aims at designing and developing a social-media-based text corpus of the San'ani Dialect (SMTCSD). The corpus is considered the first in this research area to codify one of the most popular and widely spoken dialects in Yemen, representing nearly 30% of Yemeni speakers. Our primary objective is the compilation of authentic and unmodified texts gathered from different open-source social media platforms, mainly the Facebook and Telegram apps. As a result, we obtained a corpus of 447,401 tokens and 51,073 types with an 11.42% Type:Token Ratio (TTR), compiled under entirely manual and non-experimental conditions. The corpus represents daily natural conversations found in the form of fictional dialogues, covering different situations and topics during the years 2017 and 2018. The data is preprocessed and normalized and then classified into ten different categories. The analysis of the corpus is carried out using LancsBox, and different statistical analyses are performed.
1 Introduction
Arabic Language is one of the six main languages of the world with approximately
thirty dialects. It has three major varieties. The first form is classical Arabic which is the
form of the Holy Quran and historical literature. The second form is Modern Standard
Arabic (henceforth MSA) which covers the written form mostly and rarely formal
speech that is used in media, academics, and news. The third form is Colloquial Arabic
or Dialectal Arabic (DA) that presents the regional dialects used as informal speech. So
Arabic Language is a good example of diglossia where two varieties of the same
language are used by the speakers for formal and informal interaction. MSA is the high
variety that represents the official language in all the Arab countries while Colloquial
Arabic or DA is the low variety that is used for informal speech.
Arabic dialects are classified into many broad categories based mostly on their
regional locations. The broad regional dialects of Arabic are Egyptian Arabic (EGYA),
Gulf Arabic (GFA), Levantine Arabic (LVA), Hassaniya Arabic (HSNA), Iraqi Arabic
(IRQA), Sudanese Arabic (SDNA), Maghrebi Arabic (MGHBA), and Yemeni Arabic
(YMNA). EGYA includes all the Arabic dialects spoken in Egypt. GFA includes the
Arabic dialects in KSA, UAE, Kuwait, Oman, Bahrain, and Qatar. LVA contains
Arabic dialects spoken in Syria, Palestine, Lebanon, and Jordan. HSNA includes the
dialects in Mauritania, Western Sahara, south western Algeria, and Southern Morocco.
IRQA covers dialects spoken in eastern Syria and Iraq. SDNA contains dialects in
Sudan, and Southern Egypt. MGHBA includes dialects in Tunisia, Libya, Algeria, and
Morocco. Finally YMNA covers the dialects of Arabic spoken in Yemen and South-
ern KSA [1–3]. Further division of the above categories is based on regional and social
status.
Most of the available Arabic dialect corpora are directed to certain Arabic dialects
namely Egyptian, Gulf, Levantine, Iraqi and Maghrebi, while the rest of the dialects
have few resources or data. One of these Arabic dialects with a shortage of available
data is Yemeni Arabic which is the aim and focus of this paper. As mentioned earlier,
Yemeni Arabic covers Arabic dialects used in Yemen. It can be further divided into
three main dialects, which are Tazi, San'ani, and Hadrami. San'ani Yemeni, spoken in north Yemen, covers almost 30% of the population; the number of San'ani Yemeni speakers is approximately 9 million.
This paper describes the design and collection of San’ani Yemeni Arabic corpus. It
tries to cover a gap in research providing an authentic resource of Yemeni Arabic. The
corpus is collected from social media platforms namely Facebook and Telegram. We
present our method of data extraction and pre-processing. This study is structured in the
following section headings. 1. Introduction and related work; 2. Data Collection and
Selection; 3. Data Preprocessing (cleaning and normalizing); 4. Corpus Design and
construction; 5. Corpus processing; 6. Results; 7. Conclusion.
Another study [1] presents a list of Arabic corpora divided into five main types
which are Speech Corpora, Arabic handwriting Recognition Corpora and Evaluations,
Text Corpora, Evaluation Corpora, and Lexical Databases. [1] lists corpora regardless
of their free accessibility. The list contains a number of Arabic dialectal resources that
cover the following Arabic dialects, i.e., Gulf, Iraqi, Levantine, Egyptian, Jordanian,
Tunisian, and Moroccan. No resources are available for Yemeni colloquial Arabic.
The quantity, quality, coverage, and accessibility of available Arabic corpora are the main motives for Arabic researchers to opt for better resources [6, 7]. Many of the available Arabic resources are criticized for focusing on the two formal forms of Arabic
– Classical Arabic (CA) and Modern Standard Arabic (MSA) [2, 5, 7]. The reason for this is related to the nature of dialectal Arabic, which is mostly used in informal spoken situations; this leads to a paucity of written texts in Colloquial Arabic. However, the
recent advancement in technology and the vast spread of social media platforms
generate the required colloquial text. Some of the social media colloquial Arabic
resources are the Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic [7]
and Creating an Arabic Dialect Text Corpus by Exploring Twitter, Facebook, and
Online Newspapers [2]. [7] was collected from Twitter representing five Arabic dialects
which are Maghrebi, Egyptian, Levantine, Iraqi, and Gulf. [2], on the other hand, used
Facebook and on-line newspapers in addition to Twitter to obtain the data.
As far as our investigation of available Arabic corpora shows, Yemeni colloquial Arabic seems nowhere to be found. All the Arabic colloquial resources cover some dialects of
Arabic but not all of them. The aim of this work is to produce an authentic corpus of
San’ani Yemeni Arabic making use of available social media data. The result of this
study is an original and authentic text corpus of San’ani dialect that is preprocessed,
designed, and prepared for further NLP applications.
3 Preprocessing
Our main objective is to compile raw and authentic texts, representing the spoken variety, in a machine-readable form. These texts should be useful for any computer processing. However, not all raw data is valid for processing; it needs to undergo a number of steps to make it suitable for practical use. Extracting such data with automatic tools yields poor results, since it contains many ill-formed texts that have to be preprocessed to make them valuable and valid for further analysis.
According to [9], preprocessing has a direct impact on the quality of the results returned by an analysis. Any data collected from social media typically contains a number of non-alphanumerical and non-keyboard characters and hence will vary slightly in terms of the number of words and their frequencies before and after cleaning (see Table 2), which shows the number of corpus tokens and their types before and after pre-processing.
3.1 Cleaning
We developed a small Python script for corpus cleaning that eliminates all noisy data affecting processing operations such as sorting, frequency counting, and pattern finding. This noise includes non-alphanumerical characters such as symbols, emojis, shapes, and other non-printable computer characters.
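A minimal sketch of such a cleaning step is given below (the character ranges kept here are an assumption of what counts as valid Arabic/Latin text for this illustration; the actual script used in the study is not reproduced):

import re

# Keep Arabic letters, Latin letters, digits and basic punctuation;
# drop emojis, symbols, shapes and other non-printable characters.
NOISE = re.compile(r"[^\u0600-\u06FF0-9A-Za-z\s\.\,\?\!\:]")

def clean_line(line):
    line = NOISE.sub(" ", line)                 # remove noisy characters
    return re.sub(r"\s+", " ", line).strip()    # collapse extra whitespace

print(clean_line("ok 👍 تشتي ***"))             # emojis and symbols removed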
3.2 Normalization
As our data is collected from different platforms and written by different authors and social media users, it is difficult to deal with the inconsistent and non-standardized orthographical variations across our corpus. These variations are of multiple types and may, in one way or another, change the meaning. We observed some morpho-phonemic processes such as synthesis, epenthesis, deletion, and insertion. The data contains many variant forms, such as written laughter as used in social media contexts (e.g., /haaa/ or /haaaaaaa/), where inconsistent letter lengthening or repetition occurs. For the data to be analyzed, a normalization process is required to produce only correct, standard, machine-readable forms. This means we have to remove all abnormal forms and outliers. As a solution, we developed a many-to-one module for data normalization, in which multiple written variations of the same word are mapped to one standardized form (a minimal sketch of this mapping is given after the list below). This was done using Python and a Django API. The following is a list of some cases where a normalizer was required.
1. Adding space between the letters of the word making it a problem while counting
the frequency as in ‘ ’ﺗﺶ ﺗﻲ/tash ti/ instead of ‘ ’ﺗﺸﺘﻲ/tashti/ meaning ‘to want.’
2. Substitution of one letter for another which brings about a word with no meaning as
in ‘ ’ﺗﻌﺮﻝ/ta’aril/ instead of ‘ ’ﺗﻌﺮﻑ/ta’arif/ ‘to know.’
3. Some words are written with some letter lengthening making them either new
words with different meanings as in ‘ ’ﺟﺪﺩﻱ/djaddi/ ‘renew (you)’ instead of ‘ ’ﺟﺪﻱ/
djadi/ ‘my grandfather’ or new types of the same word as in ‘ ’ﺗﻌﺒﺒﺒﺘﻚ/ta’abbbtak/
instead of ‘ ’ﺗﻌﺒﺘﻚ/ta’abtak/ ‘made you tired’.
4. The word ‘ ’ﺗﻔﺎﺟﺄ/tafa:j’a?/ ‘was surprised’ is written in different wrong spellings by
different writers as ‘ ’ ﺗﻔﺎﺟﺄء/ tafa:j’a??/ or ‘ ’ﺗﻔﺎﺟﺎ/ tafa:j’a/ or ‘ ’ﺗﻔﺎﺟﻰ/tafa:ja/ or
‘ ’ﺗﻔﺎﺟﻰء/tafa:ja?/.
5. Deletion as in ‘ ’ﺗﻔﻮﻧﻬﺎ/tafu:naha:/ instead of ‘ ’ﺗﻠﻔﻮﻧﻬﺎ/talafu:naha:/ ‘her phone.’
6. Two words are mistakenly connected with each other, with no space to separate them.
7. Morpheme swapping as in ‘ ’ﺗﻮﻗﺎﺗﻌﻜﻢ/tawaqa:ta’akum/ for ‘ ’ﺗﻮﻗﻌﺎﺗﻜﻢ/tawaqu’a:takum/
‘your expectations’.
8. Romanized texts are found ‘ ’ﺍﻭﻛﻲ ﺑﺎﻱ ﺗﻴﻚ ﻛﻴﺮ/u:ki ba:i tai:k ki:r/ ‘Ok bye take care’.
9. Some words are written letter by letter with space in between ‘ ’ﺃ ﺡ ﺏ ﻙ/?a Ha ba
ka/instead of ‘ ’ﺃﺣﺒﻚ/?aHibaka/ ‘I love you.’
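A minimal sketch of the many-to-one normalization idea is shown below (only a few illustrative entries; the real module, exposed through the Django API, covers many more variants):

import re

# Map observed written variants to one standardized form (illustrative entries only).
VARIANTS = {
    "تش تي": "تشتي",     # space inserted inside the word (case 1)
    "تعرل": "تعرف",       # letter substitution (case 2)
    "تفاجا": "تفاجأ",     # spelling variants of 'was surprised' (case 4)
    "تفاجى": "تفاجأ",
}

def normalize(token):
    token = re.sub(r"(.)\1{2,}", r"\1", token)   # collapse letter lengthening (case 3)
    return VARIANTS.get(token, token)

print(normalize("haaaaaa"))   # -> ha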
5 Corpus Processing
In the processing stage, we used the LancsBox tool, which we found suitable for data analysis and visualization. Different methods were applied, such as the word-list method to sort words and count their frequencies. We also used the KWIC and n-gram methods to disambiguate word categories and parts of speech (POS), as well as to find the patterns and structures in which such words occur.
As shown in Table 1, the differences between tokens and types in the pre- and post-cleaning stages are 8,019 and 1,711 respectively, while the total type:token ratios (TTR) are 11.59% and 11.42% respectively. However, if we look at our corpus file by file, as shown in Table 2 and visualized in Fig. 3, the type:token relationship is found to be inverse: the more tokens there are, the lower the TTR.
Fig. 2. It shows the total size of SDC with its token-type frequencies
Table 2. Statistical variation between the raw and cleaned corpus per category
San'ani Stories | Tokens (pre- / post-cleaning) | Types (pre- / post-cleaning) | TTR
ﺣﺐ ﺗﺤﺖ ﺍﻟﻘﺼﻒ /Hub taHat-alqasf/ 'Love under war' | 72,515 / 72,115 | 15,648 / 15,099 | 21.58
ﻓﻲ ﺑﻴﺖ ﺟﺪﻱ /fi bayti djadi/ 'in my grandpa house' | 69,674 / 69,299 | 14,667 / 14,580 | 21.05
ﻋﻮﺩﺓ ﺍﻟﻤﺎﺿﻲ /؟awda?alma:dhi/ 'Back of the past' | 64,779 / 62,794 | 13,904 / 13,518 | 21.46
ﺿﺤﺎﻳﺎ ﺍﻟﻘﺪﺭ /dhaha:ya al-qadar/ 'Destiny Victims' | 129,580 / 124,398 | 21,483 / 20,439 | 16.58
ﺇﺭﺍﺩﺓ ﻗﻠﺐ /?iradat qalb/ 'The will of Heart' | 66,195 / 66,012 | 14,643 / 14,338 | 22.12
ﺃﻭﺟﻌﺘﻨﻲ ﻛﺜﺮ ﺍﻟﻠﻪ ﺧﻴﺮﻙ /?awj'atani kathar allahu khairak/ 'You Hurt me God bless you' | 14,423 / 14,418 | 3,593 / 3,575 | 24.91
ﺑﻴﻦ ﺍﻟﺤﺐ ﻭﺍﻟﺜﺄﺭ /baiyna-al-Hubi wa-tha'ar/ 'between love and revenge' | 7,203 / 7,203 | 2,163 / 2,161 | 30.03
ﻛﻞ ﺍﻟﺤﻜﺎﻳﺔ ﺃﺷﺘﻘﺖ ﻟﻚ /kullal-Hika:yata ?ashtaqtu lak/ 'all the tale I miss you' | 8,058 / 8,056 | 2,328 / 2,318 | 28.89
ﻻ ﺗﺨﻠﻴﻨﻲ ﻳﺘﻴﻢ ﻣﺮﺗﻴﻦ /la: takhalayini yat:mun marataini/ 'don't make me an orphan twice' | 10,404 / 10,387 | 3,070 / 3,039 | 29.51
ﻣﺎ ﺩﺭﻳﺖ /ma: darayt/ 'I didn't know' | 12,759 / 12,711 | 4,008 / 3,969 | 31.41
6 Results
The main result of this paper is a social-media-based text corpus for the San'ani Dialect (SD) that is well organized, cleaned, machine-readable, and searchable. It will be useful as a base for developing NLP applications as well as a resource for related future research. We obtained 447,401 tokens and 51,073 types with a Type:Token Ratio (TTR) of 11.42%, representing ten different daily and fictional conversations of SD posted on social media platforms in the years 2017 and 2018. Our corpus is considered the first of its kind in the research area addressed, filling the gap left by the lack of any reference corpus for SD.
7 Conclusion
In this paper, we prepared and developed a new corpus for the San'ani Dialect, collected from the most popular social media platforms in Yemen, namely the Facebook and Telegram apps. The corpus is manually gathered and designed in a way that makes it useful, searchable, and accessible for developing further natural language processing applications. Not only can the corpus serve as a base resource for NLP applications, but the way it was constructed and designed gives it additional benefits. These benefits are the product of the corpus normalizer, which is used to map written variants of the dialect to one normalized and standardized form. Such mis-writing errors occur as a result of carelessness as well as of the educational and social backgrounds of the authors and social media users. These variations can be (a) mistyping, or mis-keyboarding, which includes errors occurring at the morpho-phonemic and phonological level; (b) short forms and abbreviations as well as
References
1. Habash NY (27 Aug 2010) Introduction to Arabic natural language processing. Synth Lect
Hum Lang Technol 3(1):1–87
2. Alshutayri A, Atwell E (May 2018) Creating an Arabic dialect text corpus by exploring
Twitter, Facebook, and online newspapers. In: OSACT 3: The 3rd Workshop on Open-
Source Arabic Corpora and Processing Tools, p 54
3. Biadsy F, Hirschberg J, Habash N, (March 2009) Spoken Arabic dialect identification using
phonotactic modeling. In: Proceedings of the eacl 2009 workshop on computational
approaches to semitic languages, pp 53–61. Association for Computational Linguistics
4. Zaghouani W (25 Feb 2017) Critical survey of the freely available Arabic corpora. arXiv
preprint arXiv:1702.07835
5. Al-Sulaiti L, Atwell ES (Jan 2006) The design of a corpus of contemporary Arabic. Int J
Corpus Linguist 11(2):135–171
6. Saad, MK, Ashour WM (2010) Osac: open source arabic corpora. In: 6th ArchEng Int.
Symposiums, EEECS, vol. 10. http://hdl.handle.net/20.500.12358/25195
7. Cotterell R, Callison-Burch C (May 2014) A multi-dialect, multi-genre corpus of informal written Arabic. In: LREC, pp 241–245
8. Sinclair J (2004) Corpus and text — basic principles. In Wynne M (ed) Developing
linguistic corpora: a guide to good practice. Oxbow Books, Oxford, pp. 1–16. http://ahds.ac.
uk/linguistic-corpora
9. Zimmermann T, Weißgerber P (25 May 2004) Preprocessing CVS data for fine-grained
analysis. In: 26th International Conference on Software Engineering Conference Proceed-
ings. Edinburgh, UK. https://doi.org/10.1049/ic:20040466
10. Atkins S, Clear J, Ostler N (1992) Corpus design criteria. Lit Linguist Comput 7(1):1–16
11. Sadjirin R, Aziz RA, Nordin NM, Ismail MR, Baharum ND (2018) The development of
Malaysian Corpus of Financial English (MaCFE). GEMA Online® J Lang Stud 18(3)
A Survey on Data Science Approach to Predict
Mechanical Properties of Steel
N. Sandhya
1 Introduction
In this information era there is a huge amount of data, such as feedback, customer data, medical data, materials data, and share-market data; data science helps us keep all this data simple and easy to understand and supports quality decision making. Over the last few years data science has changed our technology considerably. It has succeeded in adding value to business models with the help of statistics, machine learning, and deep learning. The main aim of data science is to develop novel approaches, algorithms, tools, methods, and the associated infrastructure to extract high-value information from the available data and resources. Data science techniques are broadly classified into machine learning, regression, logistic regression, pattern recognition, feature selection, attribute modelling, clustering, association analysis, anomaly detection, social network analysis, time series forecasting, classification, etc.
In recent years data science has become popular in the field of materials science and engineering. Its main goal there is to reduce cost and save time in designing materials and characterizing their behaviour. Practitioners of advanced materials science and engineering have commonly relied on observations made from cleverly designed controlled experiments and on sophisticated physics-based models to identify the mechanical properties of a material based on its composition and temperature. More recently, experts in the field have identified an integrated data science analytics approach to establish the desired causal relations between the chemical composition, processing parameters, and properties of a material. Considerable progress has been made in recent years by using data science techniques in materials science and engineering to discover the design, structure, and physical and mechanical properties of materials.
2 Classification Techniques
Machine learning, a data science technique, consists of both supervised and unsupervised learning. Supervised learning includes classification and regression techniques. In supervised learning, the computer is trained with the available data and the program then predicts the possible values for new data, which is called classifying a new observation from the available data. Classification is applicable to both structured and unstructured data. In these techniques the given data is classified into different classes based on the requirement, and the technique predicts which category or class new data falls under. The most widely used classification algorithms are Linear Regression, Naïve Bayes, Decision Trees, Neural Networks, SVM, Random Forest, the Nearest Neighbour classification algorithm, etc. A few classification methods are detailed below.
SVM kernels are classified into three types: Linear kernel, Polynomial kernel,
Radial Basis Function Kernel.
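As an illustration of how such a kernel classifier could be applied to composition data, the following sketch uses scikit-learn with a few synthetic (C%, Mn%, Cr%) values and made-up class labels; it is not the implementation used in any of the surveyed papers:

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X = np.array([[0.05, 0.4, 18.0], [0.08, 0.5, 17.5],   # class 0: stainless-like compositions
              [0.60, 0.8,  0.3], [0.75, 0.9,  0.2]])  # class 1: tool-steel-like compositions
y = np.array([0, 0, 1, 1])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
clf.fit(X, y)
print(clf.predict([[0.07, 0.45, 17.8]]))   # expected to fall in class 0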
Gini Index = 1 - \sum_{j} p_j^2
[Figure: a feed-forward neural network with an input layer, a hidden layer, and an output layer]
The basic element in a neural network is the node, which has three components: weight, bias, and activation function. The weight of a node determines how the signals received at the input are multiplied and added up in the node. The bias is a constant attached to a neuron and added to the weighted input before the activation function is applied. The activation function lies inside the layers of the neural network and modifies the data the layers receive before passing it to the next layer.
Neural network algorithms are used for both classification and regression [7–12]. They identify non-linear patterns: when there is no direct or one-to-one relation between input and output, the neural network identifies patterns in input-output combinations. They are mostly used in pattern recognition applications because of their ability to respond to unexpected input patterns and generate related output patterns.
The architecture of a neural network has three layers: the input layer, the hidden layer, and the output layer.
Input layer: this layer receives the input information of the neural network from the external environment. The inputs are normalized within limit values by the activation function.
Hidden layer: the neurons in this layer are responsible for extracting the patterns associated with the process being analysed; most of the internal processing of the network is performed here.
Output layer: this layer also consists of neurons and produces the final network outputs, which result from the processing performed by the neurons in the previous layers.
There are different types of neural network models, of which the most commonly used are back-propagation, radial basis function, feed-forward, feedback, convolutional, recurrent, and modular neural networks.
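A minimal sketch of a feed-forward (back-propagation) network for a property-prediction task is given below, using scikit-learn and synthetic data; the dataset and target values are illustrative assumptions only:

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3))                                        # synthetic, already scaled inputs
y = 300 + 400 * X[:, 0] + 50 * X[:, 1] - 30 * X[:, 2]                # made-up strength-like target
y_scaled = MinMaxScaler().fit_transform(y.reshape(-1, 1)).ravel()    # normalize target to 0-1

net = MLPRegressor(hidden_layer_sizes=(6,), max_iter=5000, random_state=0)
net.fit(X, y_scaled)
print(round(net.score(X, y_scaled), 3))                              # R^2 on the training data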
2.4 Regression
Regression originated in statistics, but in the context of data science it is mostly used for prediction and forecasting. The technique models and analyses correlations between variables and predicts continuous output variables based on the training data. This predictive modelling technique mainly looks at the relationship between dependent and independent variables. Regression algorithms are trained to predict real-numbered outputs. Regression depends on a hypothesis, which may be linear, non-linear, quadratic, or polynomial. The hypothesis function depends on the input variables and hidden parameters; after the hypothesis parameters are trained, they are used to predict the output variables for new input variables.
Steel has a vital impact on our daily life. It is used in our houses for household purposes, in the construction of buildings, in manufacturing cars, electricity tower lines, steel pipelines, and tools such as hammers and knives, and in railroads and train compartments. Steel is an alloy, a mixture of several elements of which the major part is iron; steels are iron alloys with 0.002% to 2% carbon. Steel may also contain other elements such as manganese, phosphorus, sulphur, silicon, copper, and nickel, whose proportions differ based on the type of steel.
Stainless steels are widely used in products such as home appliances, hardware instruments, and medical instruments. Based on their crystalline structure, stainless steels are classified into five types:
i. Austenitic steels
ii. Ferritic steels
iii. Martensitic steels
iv. Duplex
v. Precipitation hardening
Tool Steels: They are called tool steels because they are used to manufacture metal tools such as stamping tools and cutting tools. Tool steels contain tungsten, molybdenum, cobalt, and vanadium in differing amounts to increase heat resistance and durability. The carbon content in tool steels is between 0.5% and 1.5%. There are six types of tool steels: cold-work, hot-work, water-hardening, shock-resistant, special purpose, and high-speed steels.
Alloy Steels: Alloy steels contain alloying elements other than carbon, such as nickel, copper, aluminium, manganese, titanium, and chromium. The presence of alloying elements in varying compositions can manipulate properties of the steel such as brittleness, toughness, hardness, corrosion resistance, strength, and formability. These steels are used for different applications such as pipelines, electric motors, power generators, transformers, car parts, etc.
4 Literature Study
[1] In this paper single and multilayer feed-forward back-propagation models are used to predict the reduction in mechanical properties of metallic materials due to the presence of hydrogen, based on their elemental composition. To train and validate the models, 40 readings were collected, giving the properties of different aluminium alloys before and after the effect of hydrogen at varying strain rates, temperatures, and current densities. The inputs to the model are the alloying elements (aluminium, copper, magnesium, manganese, iron, lithium, zirconium, and zinc) and the processing parameters (strain rate, time, current density, and temperature). The model predicts mechanical properties such as tensile strength, yield strength, and elongation percentage as output. Initially all the collected data is normalized into the range 0–1. The equation used to normalize the data readings is
NV = \frac{2\,(N_i - N_{min})}{(N_{max} - N_{min})}
Initially the input values are normalized from 0.05 to 0.95 using the below
equation.
X_n = 0.05 + 0.9\,\frac{x - x_{min}}{x_{max} - x_{min}}
Here x_min is the minimum value of the input x, x_max is the maximum value of the input x, and X_n represents the normalized value of the input x. After the network is trained, the data is mapped back to its actual values using the inverse of the above equation.
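A minimal sketch of this scaling and its algebraic inverse (our derivation, with an illustrative value range, not copied from the cited paper) is:

def scale(x, x_min, x_max):
    # Map x into [0.05, 0.95] as described for the ANN inputs.
    return 0.05 + 0.9 * (x - x_min) / (x_max - x_min)

def unscale(x_n, x_min, x_max):
    # Inverse mapping back to the original units.
    return x_min + (x_n - 0.05) * (x_max - x_min) / 0.9

x_min, x_max = 20.0, 900.0                                    # illustrative range
x_n = scale(450.0, x_min, x_max)
print(round(x_n, 4), round(unscale(x_n, x_min, x_max), 1))    # -> 0.4898 450.0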
The experimental data is split into training and testing sets: 85% of the data is used for training the ANN with the Levenberg-Marquardt training function, and the remaining 15% is used for testing. The ANN used in this paper is implemented in MATLAB 2012. In the ANN architecture for 304L, the input layer has 2 neurons, the middle layer 6 neurons, and the output layer 5 neurons; for 316L, the input layer has 2 neurons, the middle layer 17 neurons, and the output layer 5 neurons. The recommended ANN model is validated using the coefficient of correlation, standard deviation, and average absolute error, computed between the experimental and predicted values. The coefficient of correlation for the developed model is 0.94, except for the strain hardening component (n); for ASS 316L the correlation coefficient is above 0.95, again except for the strain hardening component (n). The average absolute errors for ASS 304L and 316L are less than 7.5% and 2.82%, and the standard deviations are below 9.23% and 6.9% respectively. t-tests, f-tests, and Levene's tests are performed using Minitab v16 software; the p-values for the paired t-test of the means are above 0.05.
[5] The model implemented in this paper predicts mechanical properties of a low carbon steel using radial basis function and back-propagation models. The developed model predicts the hardness of low carbon steel and the relation between the chemical composition and the mechanical property of the steel. The normalized values of the alloy elements C, Si, P, S, Cu, Cr, V, Mn, N, Sn, Nr, Sc, Mo, and Al are given as input, and the hardness value of the steel is predicted from these alloy elements. The data considered for the model consists of 70 samples, categorized into three subsets: the training set, consisting of fifty percent of the total data, is used for modifying the neuron weights; the validation set, comprising one fourth of the total data, is used for validating prediction errors during the training process; and the remaining data forms the testing set used to test the trained network. The quality of the model is assessed using the standard deviation ratio, Pearson correlation, and average absolute error. For the RBF network the correlation coefficient is 0.987, the maximum error is 1.505, the minimum error is −2.5368, and the standard estimation error is 1.087. For the back-propagation network the correlation coefficient is 0.9712, the maximum error is 1.60, the minimum error is −2.94, and the standard estimation error is 1.099.
[6] The model implemented in this paper predicts the tensile strength of steel using FCM clustering based on rough sets. A total of 63 objects are considered for training and testing the model. Initially there are 13 attributes (C, Cr, Mn, Si, Ni, P, S, tapping temperature, temperatures before and after fine rolling, roller way speed, opening degree of the air damper, and spinning temperature) for each object. Attribute reduction is then carried out using Johnson's algorithm, and only the three attributes C, Cr, and P are retained. The total set of 63 objects is divided into 40 objects for training and 23 objects for testing. Two models are considered: model 1 with all attributes, i.e. 13 input neurons, and model 2 with only the reduced attributes, i.e. 3 input neurons. The accuracy of model 2 is higher than that of model 1: the average relative error for model 1 is 4.51% and for model 2 is 1.62%, and the computation times are 15.6 s for model 1 and 2.52 s for model 2.
5 Proposed System
Practitioners of advancing materials science and engineering have traditionally depended on observations made from conventional tests and on physics-based models to know the properties of steel. The most widely used technique to determine mechanical properties such as tensile strength and yield point is the Universal Tensile Testing Machine (UTM). However, it is complex to conduct such experiments every time the mechanical properties of a steel are needed. The literature survey shows that several data science techniques, such as artificial neural networks, clustering, and regression, can achieve significant results in predicting the mechanical properties of steels or other metals from parameters such as composition, temperature, stress and strain rates, distance from the ground surface, etc. The main objective of this paper is to propose a method that predicts mechanical properties such as the tensile strength and yield point of British standard stainless steels, considering low and medium carbon grades, using the carbon content, temperature, manufacturing process, and size of the test piece. The aim is also to make a comparative study of different data science algorithms and find the most accurate algorithm for predicting the mechanical properties of steel. Unlike the works in the literature survey, where only one algorithm is trained for steel property prediction, the focus may be shifted to several data science algorithms, to be developed in Python and R, for predicting the mechanical properties of steel. The accuracy between the actual and predicted values of the various classification and regression models will be evaluated using metrics such as the correlation coefficient (R), explained variance (R²), and Root Mean Square Error (RMSE). Predictive performance will be evaluated using the confusion matrix, gain and lift charts, the F-measure, cross validation, the Gini coefficient, etc. The future objective is that the data science algorithms which accurately predict the mechanical properties of steel will be incorporated into a user-friendly GUI that serves as a prediction tool for steel mechanical properties.
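As a rough sketch of the kind of comparative evaluation proposed here, the snippet below trains a few scikit-learn regressors on synthetic composition/temperature data and reports R² and RMSE; the actual algorithms, dataset, and steel grades remain future work:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(size=(120, 3))                                     # carbon %, temperature, test-piece size (scaled)
y = 400 + 300 * X[:, 0] - 150 * X[:, 1] + rng.normal(0, 10, 120)   # synthetic tensile-strength-like target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
for name, model in [("Linear", LinearRegression()),
                    ("RandomForest", RandomForestRegressor(random_state=1)),
                    ("SVR", SVR())]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:12s}  R2 = {r2_score(y_te, pred):.3f}  RMSE = {rmse:.1f}")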
6 Conclusion
Data science and analytics will definitely affect current materials science by maximizing the accuracy and reliability of predicting material properties, with the help of large ensembles of datasets from different material databases and different data science techniques. The user-friendly GUI incorporating different data science techniques, proposed as part of future work, may once developed yield better results in different applications such as manufacturing industries and construction. The proposed idea would be a breakthrough compared with the conventional approach of conducting experiments such as tensile tests using a Universal Tensile Testing Machine (UTM) to find the mechanical properties of steel. This paper also aims to help the research community by presenting the consolidated findings summarised from a deep literature study of various papers, and proposes to develop a predictive tool as future work.
References
1. Thankachan T, Prakash KS, Pleass CD, Rammasamy D, Prabhakaran B, Jothi S (2017)
Artificial neural network to predict the degraded mechanical properties of metallic materials
due to the presence of hydrogen. Elsevier
2. Lakshmi AA, Rao CS, Srikanth M, Faisal K, Fayaz K, Puspalatha, Singh SK (2017)
Prediction of mechanical properties of ASS 304 in superplastic region using Artificial neural
networks. Elsevier
3. Senussi GH (2017) Prediction of mechanical properties of stainless steel using an artificial
neural network model. Elsevier
4. Desu RK, Krishnamurthy HN, Balu A, Gupta AK, Singh SK (2015) Mechanical properties
of Austenitic Stainless steel 304L and 316L at elevated temperatures. Elsevier
5. Vandana Somkuwar (2013) Use of artificial neural network for predicting the mechanical
property of low carbon steel. Blue Ocean Res J
6. Wang L, Zhou X, Zhang G (2010) Application of FCM clustering based rough sets on steel
rolling process. IEEE
7. Bhadeshia HKDH (1999) Neural networks in materials science. ISIJ Int 39(10):966–979
8. Schmidhuber J (January 2015) Deep learning in neural networks: An overview. Neural Netw
61:85–117
9. Reddy NS, Krishnaiah J, Hong SG, Lee JS (2009) Modelling medium carbon steels by using
artificial neural networks. Mater Sci Eng 508(1–2):93–105
10. Fu LM (1990) Building expert systems on neural architecture. In: 1st IEE Int. Conf. on Artificial Neural Networks, London, 16–18 October 1990, pp 221–225
11. Gupta AK (2010) Predictive modelling of turning operations using response surface
methodology, artificial neural networks and support vector regression. Int J Prod Res
48:763–778
12. Singh S, Mahesh K, Gupta A Prediction of mechanical properties of extra deep drawn steel
in blue brittle region using artificial neural network
Image Steganography Using Random Image
1 Introduction
Cryptography is the study and practice of encrypting and decrypting data using mathematical concepts. Cryptographic methods make it possible to store and/or transfer data across insecure networks so that no person or machine other than the intended receiver can read it. Cryptanalysis is the method of analysing secure data communication; traditional cryptanalysis involves combinations of analytical reasoning, pattern finding, mathematical tools, tolerance, and determination. Attackers are also called cryptanalysts [1]. Data that is easy to read and understand without any additional processing is called plaintext; it is also called the original data. Encryption is the process that converts the plaintext into a different form; ciphertext, the result of encryption, is unreadable garbled text. Encryption ensures that data is hidden from everyone except the intended parties. Decryption converts the ciphertext back into its original form; it is the reverse of encryption. Figure 1 illustrates this process.
1.2 Steganography
Data can be hidden within data using steganography. Steganography can be used along with cryptography as an additional layer of security to protect data. An image, a video file, or an audio file can be used as the carrier for hiding data.
2 Existing Method
In the existing method [3], bits of the message are hidden in the least significant bits (LSBs) of the RGB color channels of a 24-bit color image. Two rounds of operations are performed in this method.
In round 1, each plaintext character is first converted into its ASCII code and the length of the plaintext is found. A 3-digit random number is generated and, using a folding technique, reduced to a single-digit key. The ciphertext is generated by performing an XOR operation between the key and the ASCII values of the plaintext.
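A minimal sketch of this first round (the digit-folding step is our interpretation of the folding technique) is:

import random

def fold_to_digit(n):
    # Repeatedly sum the digits of n until a single digit remains.
    while n > 9:
        n = sum(int(d) for d in str(n))
    return n

plaintext = "HELLO"
key = fold_to_digit(random.randint(100, 999))       # 3-digit random number -> single-digit key
cipher = [ord(ch) ^ key for ch in plaintext]        # XOR key with ASCII codes of the plaintext
recovered = "".join(chr(c ^ key) for c in cipher)   # XOR is its own inverse
print(key, cipher, recovered)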
In round 2, the selected color cover image is split into its RGB color channels. For each ciphertext character, the first two bits are embedded in the R channel using modified LSB substitution and an XOR operation with the cover image bits. The next two bits are embedded in the G channel using a raster scan and an XOR operation with the cover image bits. The remaining four bits are embedded in the B channel using a raster scan and an XOR operation with the cover image bits. The stego image is generated by combining the RGB color channels after the embedding process.
The embedding capacity and the cover image size are the two limitations: embedding depends on the size of the cover image, so if there is more data, the cover image size needs to be increased.
3 Proposed Method
In the proposed method, a random image is used to hide the secret message. Since the pixels of a random image [4, 5] are not arranged sequentially, it is very difficult to identify the hidden information. The data is converted into a grayscale image, and the grayscale image is embedded in the random image. The three components required for image steganography [5] are the cover image, the secret message, and the stego image.
Cover image: the image in which the secret message is going to be hidden.
Secret message: the message to be hidden in the cover image; it can be anything, such as an image or a text message.
Stego image: the image generated after embedding the secret message.
The stego image is transmitted to the receiver; at the receiver side, decryption is applied to the stego image to retrieve the embedded hidden message. Figure 4 illustrates the overview of image steganography.
For embedding a grayscale image into the random image [5, 6], two pixels of the cover image and one pixel of the source image are considered.
Pixels of cover image: 10111100 (L1) and 01111100 (L2); pixel of source image: 00110001, with higher-order bits (HOB) 0011 and lower-order bits (LOB) 0001.
The four least significant bits of cover pixel L1 are replaced with the higher-order bits of the source pixel, and those of L2 with the lower-order bits. After the replacement, the final output is:
10110011 01110001
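A small sketch of this nibble-level replacement, using plain bit operations (variable names are ours), reproduces the example above and also shows the reverse extraction:

def embed_pixel(l1, l2, src):
    # Hide one 8-bit source pixel in two cover pixels: the high nibble of src
    # goes into the 4 LSBs of l1, the low nibble into the 4 LSBs of l2.
    hob, lob = (src >> 4) & 0x0F, src & 0x0F
    return (l1 & 0xF0) | hob, (l2 & 0xF0) | lob

def extract_pixel(s1, s2):
    # Recover the hidden pixel from the two stego pixels.
    return ((s1 & 0x0F) << 4) | (s2 & 0x0F)

s1, s2 = embed_pixel(0b10111100, 0b01111100, 0b00110001)
print(format(s1, "08b"), format(s2, "08b"))    # -> 10110011 01110001
print(format(extract_pixel(s1, s2), "08b"))    # -> 00110001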
Fig. 5. (a) Grayscale image of size 91×91. (b) RGB random image of size 91×91. (c) Stego image after embedding the grayscale image
Fig. 6. (a) The stego image. (b) Grayscale image of size 91×91
4 Result Analysis
Three standard parameters, the peak signal-to-noise ratio (PSNR), the mean square error (MSE), and correlation, are used for evaluating the performance of image compression methods [7, 8]. PSNR and MSE are error metrics used to compare an original image with a compressed image. For measuring the level of security of encrypted information, the entropy parameter is used. The quantitative as well as qualitative analysis is shown in the tables. Generally, three parameters are used for evaluating steganography techniques: hiding capacity, distortion, and security. Hiding capacity is expressed in two ways, as the bit rate and as the maximum hiding capacity. The number of bits that can be hidden in a cover image pixel is called the bit rate, also expressed as bits per pixel (bpp). The maximum possible amount of data that can be embedded in the cover image is the maximum hiding capacity. The data in the stego image is imperceptible: after hiding data in the cover image, one cannot identify any distortion. MSE, RMSE, and PSNR are the parameters used to measure distortions in the image. The complexity of the steganography method may also be considered as a fourth parameter. The mean square error [8] is the cumulative squared error between the cover image and the stego image. The peak signal-to-noise ratio measures the peak error: it is the ratio of the power of a signal to the power of the noise that affects the fidelity of its representation, and it is expressed on a logarithmic decibel scale [8]. The comparison of MSE and PSNR is shown in Fig. 7 and Tables 1 and 2.
MSE = \frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ I(x,y) - I'(x,y) \right]^2    (1)

RMSE = \sqrt{\frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ I(x,y) - I'(x,y) \right]^2} = \sqrt{MSE}    (2)

PSNR = 20 \log_{10}\!\left( 255 / \sqrt{MSE} \right)    (3)
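A short NumPy sketch of Eqs. (1)–(3), assuming 8-bit images stored as equally shaped arrays (the flat test image is an illustrative assumption):

import numpy as np

def mse(cover, stego):
    # Eq. (1): mean squared error between cover image I and stego image I'.
    return float(np.mean((cover.astype(float) - stego.astype(float)) ** 2))

def psnr(cover, stego):
    # Eq. (3): peak signal-to-noise ratio in dB for 8-bit images.
    m = mse(cover, stego)
    return float("inf") if m == 0 else 20 * np.log10(255.0 / np.sqrt(m))

cover = np.full((91, 91), 120, dtype=np.uint8)   # illustrative flat image
stego = cover.copy()
stego[0, 0] += 3                                  # one slightly altered pixel
print(round(mse(cover, stego), 5), round(psnr(cover, stego), 2))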
The results in Tables 1 and 2 and the bar graphs in Fig. 7 show that the proposed method achieves better MSE and PSNR values than the existing method.
Fig. 7. (a) Histogram of MSE for the existing and proposed methods. (b) Histogram of PSNR for the existing and proposed methods.
random images. First the secret data is converted into an image, then it is embedded into the random cover image. The proposed method ensures more security for the data, since the data is embedded into a random image and it is very difficult to perform steganalysis on a random image. Further, the work can be extended to embed text and images into audio and video files.
References
1. Stallings W (2005) Cryptography and network security: principles and practices, 4th edn,
p 592, November
2. Gonzalez RC (2009) Digital image processing, 3rd edn, Pearson Education India, p 954
3. Kiran S, Pradeep Kumar Reddy R, Subramanyan N, Naga Sravanthi D (2017) A novel high
capacity data embedding image steganography using spiral scan, 4(12), December
4. Joseph P, Vishnu Kumar S (2015) A study on stenographic techniques. In: Proceedings of
IEEE global conference on communication technologies (GCCT), pp 206–210
5. Samidha D, Agrawa D (2013) Random image steganography in spatial domain. In: IEEE
international conferences on emerging trends in VLSI, embedded system, nano electronics
and telecommunications system (ICEVENT), pp 1–3
6. Singh A, Singh H (2015) An improved LSB based image steganography techniques for RGB
color images. In: IEEE international conferences on electrical computer and communication
technologies, pp 1–4
7. Suri S, Joshi H, Minocha V, Tyagi A (2014) Comparative analysis of steganography for
colored images. Int J Comput Sci Eng (IJCSE) 2(4):180–184
8. Kumar V, Kumar A, Bhardwaj A (2012) Performance evaluation of image compression
techniques. In: 2012 international conference on devices, circuits and systems (ICDCS)
Method Level Code Smells:
Chernoff Face Visualization
Abstract. Software that is badly written and prone to design problems often smells. Code smells result in design anomalies that make software hard to understand and maintain. Several tools and techniques available in the literature help in the detection of code smells, but the severity of the smells in the code is often not known immediately because visualization is lacking. In this paper, two method level code smells, namely long method and feature envy, are visualized using Chernoff faces. Techniques proposed in the literature use either a knowledge-driven or a data-driven approach for code smell detection; in the proposed approach a fusion of both is used to identify the most relevant features. These most relevant features are mapped to the 15 desired features of Chernoff faces to visualize the behavior of the code. The results show that almost 95% of the smells are visualized correctly. This helps in analyzing the programmer's capability in maintaining the quality of the source code.
1 Introduction
One of the essential elements of agile-based approaches is refactoring. It is one of the extreme programming principles, which helps in modifying the existing source code without affecting its behavior. Refactoring is used to improve the design of existing code [10] and is possible on working source code with test cases in place. Refactoring of code is needed whenever a design anomaly is detected in the existing source code, since such anomalies hinder the maintenance process. Hence there is a need to detect these smells; once a smell is detected, the code can be refactored to ensure that it follows proper design principles.
There exist several tools and techniques to refactor source code and to detect code smells [8]. As there is no formal definition of code smells, each tool uses its own approach to detect them. Further, because of the pressure of rapid application development, the process of smell detection is often ignored, as it is a time-consuming activity to make use of third-party tools. Hence there is a need for a better way to expose the anomalies in the source code.
In this paper two method level code smells are considered and mapped to facial expressions using Chernoff faces. The facial features represent the existence or non-existence of the smells in the source code. Visualization of bad quality code is a difficult task in real-time systems development. As "the face is the index of the mind", a facial representation of code features clearly indicates its design anomalies. The visualization helps in quickly judging the severity of a particular smell so that it can be refactored quickly. Each new version of the source code may produce a different face, which also helps in visualizing the type of changes made to the source code from one version to another. The data set is built from project metrics as features, including package, class, and method metrics. Tirupathi et al. [3] give a ranking of the most relevant features based on gain ratio for the long method and feature envy code smells. The priority of these features is rearranged based on the rules available in the literature for detecting the same smells [4]. A total of 15 features are used to plot the faces representing the smells in a method. The results show that more than 95% of the method level smells were represented correctly using Chernoff faces.
The remaining part of this paper is organized as follows: Section 2 gives the background on the need for Chernoff faces in visualization and the state of the art in smell detection; Sect. 3 describes the long method and feature envy smells, highlights the metrics and rules available for detecting them, and specifies the proposed methodology used in visualizing the faces; Sect. 4 presents the results; and Sect. 5 concludes the paper.
2 Related Work
the facial features. Hai Lee [2] presents the ten basic face patterns and also provides
mechanisms to map characteristics to face features.
One of the major issues with smell detection is the proper visualization of the
anomalies in the existing code. This is possible by mapping the features of a method or
a class to facial characteristics, which makes the smell easier to see. In the
proposed approach, the most relevant characteristics are computed and arranged so that
they can be mapped to facial features for better visualization of the smells.
• LOC (Lines of Code): the total number of lines of code in the method,
including comments and blank lines.
• CYCLO (Cyclomatic Complexity): the number of linearly independent paths
in the method.
• MAXNESTING (Maximum Nesting): the maximum nesting level of a
control structure in the method.
• NOP (Number of Parameters): the number of arguments of the
method.
• NOAV (Number of Accessed Variables): the number of identifiers
that are accessed directly or via accessor methods.
• NOLV (Number of Local Variables): the total number of local variables accessed
directly or via accessor methods, including global variables.
• ATFD (Access to Foreign Data): the number of attributes of other classes accessed
directly or by invoking accessor methods.
• FDP (Foreign Data Providers): the total number of classes in which the foreign data
are defined, counting each class only once.
• CINT (Coupling Intensity): the number of distinct operations called by the given
method.
• LAA (Locality of Attribute Access): the ratio of the number of attributes of the
method's own class to the total number of variables accessed.
In addition to the above metrics, other metrics are candidates for identifying a given
method smell. For each source code element, the metrics are evaluated to obtain the
required conditional attributes. Several tools are used to decide whether the code
smells or not. An approach is presented to identify the most relevant features using
information gain [8]. Further, the data-driven approaches [5–7] specify rules for the
identification of the long method and feature envy smells, listed below; a sketch of
applying such rules programmatically follows the list.
Long Method:
(a) ((LOC_method >= 33) & (CYCLO_method >= 7) & (MAXNESTING_method >= 6)) & ((NOLV_method >= 6) | (ATLD_method >= 5))
(b) (LOC_method >= 33) & ((NOP_method > 4) | (NOLV_method > 4)) & (MAXNESTING_method > 4)
(c) (LOC_method > 65) & (CYCLO_method >= 0.24) & (MAXNESTING_method >= 5) & (NOAV_method > 8)
Feature envy:
(a) (FDP_method <= 5) & (ATFD_method > 5) & (LAA_method < 0.33)
(b) (ATFD_method > 3) & (LAA_method < 0.33)
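A minimal sketch of how such rules can be evaluated programmatically is shown below. This is an illustration only, assuming a pandas DataFrame whose columns carry the metric names used above; it is not the authors' tool, and only rule (b) of each smell is encoded:

import pandas as pd

# Example metrics table; each row is one method (values are illustrative only).
df = pd.DataFrame({
    "LOC_method":        [120, 18],
    "NOP_method":        [6, 1],
    "NOLV_method":       [9, 2],
    "MAXNESTING_method": [5, 1],
    "ATFD_method":       [7, 0],
    "LAA_method":        [0.20, 0.90],
})

# Long method, rule (b): long, with many parameters/locals and deep nesting.
df["long_method"] = ((df.LOC_method >= 33)
                     & ((df.NOP_method > 4) | (df.NOLV_method > 4))
                     & (df.MAXNESTING_method > 4))

# Feature envy, rule (b): many foreign attribute accesses, low locality.
df["feature_envy"] = (df.ATFD_method > 3) & (df.LAA_method < 0.33)

print(df[["long_method", "feature_envy"]])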
Fig. 1. Phases of data set construction: define the desired smell, collect heterogeneous systems, evaluate code metrics, labelling process.
In the labelling process, rules are derived to decide whether a piece of code smells or
not. Several tools are used to identify whether a source code element smells.
Initially, a polling process among the tools is adopted to decide the label. Later, the
data set is validated by manual inspection of the labels (decision attributes) [3, 11].
In the feature ordering phase, the conditional attributes (metrics) are arranged to
find the most relevant attributes for a particular smell. The ordering of features is the
most important phase of the visualization, because these features are to be
mapped onto the Chernoff faces. The most relevant features, initially arranged in
decreasing order of information gain (the data-driven component), are then re-ordered
using the knowledge-driven component: the metrics that appear in the detection rules
are used to reorder the conditional attributes. The resulting minimal set of 15 features
is mapped to the facial features and visualized using Chernoff faces. The characteristics
of the faces help in visualizing the smells and their severity.
In the proposed approach, 74 projects are initially considered, which yield 4,19,995
objects representing the methods of the classes in those projects [4]. Stratified sampling
is applied to obtain a sample of 1900 objects. To balance the data sets, 1/3 positive and
2/3 negative instances are taken. The resulting data sets consist of 417
objects for long method and 420 objects for feature envy, respectively. 57 features
(metrics) are initially computed for each object, and the decision attribute specifies the
existence of the smell.
The most relevant features are initially computed using information gain [3].
The resulting order of the features is then adjusted using the features that appear in the
rule set of each smell. This fusion of the knowledge-driven and data-driven strategies
helps in selecting the best possible features.
These features are mapped to 15 facial parameters to visualize the quality of
the method. The experiment is realized by providing these 15 features to the
faces() method in R. For long method and feature envy, 837 methods in total are
visualized to check the correctness of smelly and non-smelly methods. Manual
validation shows that almost 95% of the smells are visualized effectively.
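Before plotting, the 15 selected metrics must be brought onto a common scale so that each one can drive a single facial parameter (face width, eye size, mouth curvature, and so on). The sketch below is a hypothetical pre-processing step, not the authors' code; the plotting itself was done with the faces() method in R, so the scaled table would simply be exported for that call:

import pandas as pd

def scale_for_faces(metrics: pd.DataFrame) -> pd.DataFrame:
    """Min-max normalize each of the 15 metric columns to [0, 1]."""
    lo, hi = metrics.min(), metrics.max()
    span = (hi - lo).replace(0, 1)   # guard against constant columns
    return (metrics - lo) / span

# scaled = scale_for_faces(metrics.iloc[:, :15])   # 15 most relevant metrics
# scaled.to_csv("faces_input.csv")                 # plotted with faces() in R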
Figure 2 shows the visualization of smelly long methods.
A few instances of smelly long methods are shown in Fig. 2. Their common features
are horns, pointed caps and a spread mouth. Figure 3 shows the non-smelly long
methods.
The common feature of the non-smelly long methods is the dominance of red color;
they contain neither horns nor pointed caps. It is observed that 396 out of 417 smelly
and non-smelly instances were recognized correctly based on the respective facial
features. Figure 4 shows the visualization of the feature envy smell and Fig. 5 shows
the non-smelly feature envy methods.
A few methods that are prone to the feature envy smell are depicted in Fig. 4. It is
observed that these smelly methods are visualized with a pointed nose and a broad
open mouth. For feature envy, 399 out of 420 smelly and non-smelly instances were
recognized correctly based on the respective facial features.
5 Conclusion
Code smell detection is an important activity for refactoring source code.
Agile methods require instant feedback about design problems in the existing
code. In this paper, both the knowledge-driven and the data-driven strategies are used
to identify the most relevant features, which are mapped to facial features for
visualization. It is observed that one can easily identify smelly code through this
visualization, which may also help in judging the severity of design-level anomalies in
the source code. In future work, class-level, package-level, project-level and other
method-level smells can be visualized, and a video of such faces may help in assessing
the capability of the programmer.
References
1. Chernoff H (1973) The use of faces to represent points in K-dimensional space graphically.
J Am Stat Assoc 68(342):361–368
2. Yang HHL (2000) Mian Xiang: the Chinese art of face-reading made easy. Element, London
3. Guggulothu T, Moiz SA (2019) An approach to suggest code smell order for refactoring. In:
Somani A, Ramakrishna S, Chaudhary A, Choudhary C, Agarwal B (eds) Emerging
technologies in computer engineering: microservices in big data analytics. ICETCE 2019.
Communications in computer and information science, vol 985. Springer, Singapore
4. Fontana FA et al (2012) Automatic detection of bad smells in code: an experimental
assessment. J Object Technol 11(2):5:1–38
5. Li W, Shatnawi R (2007) An empirical study of the bad smells and class error probability in
the post-release object-oriented system evolution. J Syst Softw 80:1120–1128
6. Stefen et al (2010) Are all code smells harmful? a study of God classes and Brain classes in
the evolution of three open source systems. In: 26th IEEE international conference of
software maintenance
7. Fontana FA et al (2015) Automatic metric threshold deviation for code smell detection. In:
6th international workshop on emerging trends in software metrics, pp 44–53
8. Paiva T et al (2017) On the evaluation of code smells and detection tools. J Softw Eng Res
Dev 5:7
9. Kessentini WA (2014) A cooperative parallel search based software engineering approach
for code smells detection. IEEE Trans Softw Eng 40:841–861
10. Fowler MA (1999) Refactoring: improving the design of existing code. Addison-Wesley
Professional, Boston
11. Azadi U, Fontana FA, Zanoni M (2018) Poster: machine learning based code smell detection
through WekaNose. In: ICSE, pp 288–289
Fingerprint Identification and Matching
Abstract. Fingerprinting is considered one of the best and quickest methods of
biometric identification. Fingerprints are secure to use, unique to each individual
and do not change over the course of a lifetime. In humans, fingerprints contain
important points of interest called minutiae, which can be used as identification
marks for security purposes. This paper presents a study and implementation of
fingerprint recognition using image processing in MATLAB. The approach mainly
involves extracting the minutiae points from sample fingerprints and then
performing matching based on the number of minutiae that match between the
two fingerprints in question. For each task, some classical and specialized
techniques from the literature are analyzed. Based on this analysis, an integrated
solution for fingerprint recognition is developed for demonstration. It finally
produces a percentage score that indicates whether the two fingerprints match
or not.
1 Introduction
2 Fingerprint
A fingerprint image is the characteristic ridge pattern of a human finger (Fig. 1).
It is the impression formed by the friction ridges of the skin of the fingers and thumbs.
Fingerprints have long been used for identification because of their permanence and
individuality. Permanence refers to the lasting, unchanging nature of human
fingerprints. Individuality refers to the uniqueness of ridge details across
individuals; the uniqueness of a fingerprint is determined by the pattern of
ridges and furrows as well as by features called minutiae, which are anomalous
points on the ridges (Fig. 1). However, as shown by intensive research on
fingerprint recognition, fingerprints are distinguished not by their ridges as such, but
by the minutiae points.
3 Recognition of Fingerprint
Fingerprint identification is used to establish the identity of a person from his
fingerprint without prior knowledge of the person's identity. The fingerprint
identification system matches the fingerprint against a database containing
all enrolled fingerprints. Fingerprint identification is used in many
criminal investigations and follows the Automatic Fingerprint Identification System
(AFIS) model. Different strategies are used for acquiring fingerprints; among them,
the inked impression method is the most widely used. Inkless fingerprint
scanners are also available, eliminating the separate digitization step. These methods
have high efficiency and acceptable accuracy except in some cases where the user's
finger is dry. Fingerprint image quality is essential because it directly affects the
minutiae extraction algorithm. Two sorts of degradation commonly affect
fingerprint images: (1) ridge lines are not strictly continuous, containing
small breaks (gaps); (2) parallel ridge lines are not well separated owing to
cluttering noise. The scanned fingerprints should be at 500 dpi with a size of about
300 × 300 pixels.
Two representation forms for fingerprints separate the two approaches to fingerprint
recognition. The first approach, which uses image-based techniques [3, 4],
attempts to perform matching based on the global features of the whole fingerprint
image. It is an advanced, recently emerging method for fingerprint
recognition, and it is helpful for addressing some intractable problems of
the other approach. The second approach, which is minutiae-based, represents the
fingerprint by its local features, such as ridge endings and bifurcations.
This method has been studied intensively and is also the basis of most available
fingerprint recognition products. Given two sets of minutiae from two fingerprint
images, the minutiae matching algorithm determines whether the two sets come
from the same finger or not. An exact match is never possible in a verification or
identification system, because every time a biometric is captured the resulting
template is likely to differ. Consequently, biometric systems are designed to make a
match decision with respect to a threshold: an acceptance level of similarity is set
between the trial template and the selected reference template. Once the comparison is
done, a score expressing the similarity level is computed, and the obtained score is
used to make the match decision.
6 Algorithm Implementation
Implementation of the fingerprint verification system is broken down into four distinct
phases, described in the following sections:
i. Acquisition of images
ii. Detection of edges
iii. Comparison of images
iv. Decision making
A. Image Acquisition
The fingerprint images are captured using an inkless fingerprint sensor
(scanner). The quality of the fingerprint images is essential since it directly
influences the minutiae extraction algorithm. The resolution of the scanned images is
within the acceptable range (500 dpi), while the size is about
300 × 300 pixels in JPG format.
B. Detection of Edges
An edge represents the boundary between two regions with distinct gray-level
properties. Edge-detection techniques compute a local derivative operator, using, for
example, the 'Sobel', 'Prewitt' or 'Roberts' operators. In practice, the set of pixels
produced by an edge-detection computation seldom describes a boundary completely,
because of noise, breaks in the boundary and other effects that introduce spurious
intensity discontinuities [7]. Therefore, edge-detection algorithms are typically
followed by linking procedures that assemble the detected edge pixels into meaningful
boundaries. Basic edge detection, that is, detecting intensity changes to locate edges,
can be accomplished using first-order or second-order derivatives. Edges are
computed using the difference between corresponding pixel intensities of the image.
Second-Order Derivative: The second-order derivative of the image is normally computed using the Laplacian of f(x, y):

∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y²        (4)
The Prewitt operator provides two masks: the first determines the edges in the
horizontal direction and the second determines the edges in the vertical direction. The
masks used for detecting the edges are called derivative masks.

Vertical mask:
-1  0  1
-1  0  1
-1  0  1

Horizontal mask:
-1 -1 -1
 0  0  0
 1  1  1
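The following sketch (an assumed illustration, not the authors' MATLAB code) shows how the two Prewitt masks above can be convolved with a grayscale fingerprint image to obtain an edge map:

import numpy as np
from scipy.ndimage import convolve

# Prewitt derivative masks (as given above).
prewitt_vertical = np.array([[-1, 0, 1],
                             [-1, 0, 1],
                             [-1, 0, 1]], dtype=float)   # responds to vertical edges
prewitt_horizontal = prewitt_vertical.T                  # responds to horizontal edges

def prewitt_edges(gray: np.ndarray, threshold: float = 50.0) -> np.ndarray:
    """Return a binary edge map from the Prewitt gradient magnitude."""
    gx = convolve(gray.astype(float), prewitt_vertical)
    gy = convolve(gray.astype(float), prewitt_horizontal)
    return np.hypot(gx, gy) > threshold

# edges = prewitt_edges(fingerprint_image)   # fingerprint_image: 2-D grayscale array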
C. Image Based Comparison
The image-based comparison computes and manages the black-and-white points
available in the fingerprint image and then compares them using MATLAB scripts
that analyze the high-contrast dots. Fingerprint verification is the process of
comparing two fingerprints against each other to check whether they belong to the
same person. If a fingerprint matches an impression of the same person, the outcome
is called a genuine accept, otherwise a false reject. Similarly, if the fingerprint of a
different individual matches, it is called a false accept; if it is rejected, it is a genuine
reject. The False Reject Rate (FRR) and False Accept Rate (FAR) are the error rates
used to express matching trustworthiness [3]. FAR is
characterized by the equation:
where (x_i … x_n) and (X_i … X_N) denote the minutiae sets of the individual fingerprints and m
refers to the smaller of n and N. If the similarity score is > 0.8, the algorithm moves
to step 2; otherwise it continues with the next ridges. Every fingerprint image undergoes
translation and rotation of all minutiae with respect to a reference minutia using the
following expression:
(x_i_new)        (x_i − x)
(y_i_new) = TM · (y_i − y)        (7)
(θ_i_new)        (θ_i − θ)

       ( cos θ    sin θ   0)
TM =   (−sin θ    cos θ   0)
       (   0        0     1)
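A compact sketch of the alignment in Eq. (7) is given below (an assumed illustration; the variable names are hypothetical): each minutia (x_i, y_i, θ_i) is translated relative to the chosen reference minutia (x, y, θ) and then rotated by TM.

import numpy as np

def align_minutiae(minutiae: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """minutiae: (N, 3) array of (x, y, theta); ref: (3,) reference minutia."""
    theta = ref[2]
    tm = np.array([[ np.cos(theta), np.sin(theta), 0.0],
                   [-np.sin(theta), np.cos(theta), 0.0],
                   [           0.0,           0.0, 1.0]])
    return (tm @ (minutiae - ref).T).T   # rows are (x_new, y_new, theta_new)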
D. Making Decisions
The decision is made on the basis of the matched-image level: if more than 90% is
matched, the images are declared a match; below 90%, the images are considered
different. Depending on the threshold setting, identification systems sometimes use
small reference templates that are matched against the trial template, with higher
scores corresponding to better matches. The final match score for two fingerprints is
the count of total matched minutiae divided by the number of minutiae of the template
fingerprint. The score is reported as 100 times this ratio, ranging from 0 to 100. If the
score is greater than a predetermined threshold (normally 90%), the two fingerprints
are declared to belong to the same finger.
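The decision step itself reduces to a threshold test on the match percentage; a minimal sketch (with the threshold value taken from the 90% figure above) follows:

def match_decision(num_matched: int, num_template_minutiae: int,
                   threshold: float = 90.0) -> bool:
    """Return True if the two fingerprints are declared to be from the same finger."""
    score = 100.0 * num_matched / num_template_minutiae
    return score >= threshold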
In this section, the pictorial representations of the simulation results for the two
fingerprint-matching cases are depicted in Figs. 2, 3, 4 and 5. The two sample
fingerprints of the same image after applying the edge detection algorithm are depicted
in Fig. 3. It can be clearly seen from the plots that both the vertical and horizontal
edges of the ridges are more visible than in the sample images shown in Fig. 5.
It can be seen from the consolidated plots in Fig. 5 that the two fingerprints are
indistinguishable. The result additionally shows a total matched level of 100;
consequently the images are declared matched. With different fingerprints, a total
matched level of 7.5049 was obtained (under 90%); consequently those images are
not matched.
8 Conclusion
The above implementation was an effort to study and understand how a fingerprint
verification system is used as a form of biometrics to recognize individuals. It
integrates all the stages specified in the preceding sections. The outcome of the
study demonstrates that the proposed method can be adopted on extensive
databases, for example that of a country such as Nigeria. The reliability of automatic
fingerprint verification depends strongly on the accuracy attained by
the minutiae extraction process. Different components of the framework can affect
the correct location of minutiae; amongst them, poor image quality is the most
common. The minutiae matching computation is capable of determining
correspondences between minutiae without exhaustive search. Further improvements
in efficiency and accuracy can be achieved by improving the image capture hardware
or by improving the image enhancement strategies.
References
1. Amand E, Anju G (2012) Simulink model based image segmentation. Int J Adv Res
Comput Sci Softw Eng 2(6)
2. Jain A, Hong L, Bolle R (1997) Online fingerprint verification. IEEE Trans Pattern Anal
Mach Intell 19(4):302–314
3. Leung WF, Leung SH, Lau WH, Luk A (2000) Fingerprint recognition using neural network.
In: Proceedings of the IEEE workshop neural network for signal processing, pp 226–235
4. Lee CJ, Wang SD (1999) Fingerprint feature extraction using Gabor filters. Electron Lett
35(4):288–290
5. Raymond T (1991) Fingerprint image enhancement and minutiae extraction. Technical report,
The University of Western Australia
6. Tico M, Kuosmanen P, Saarinen J (2001) Wavelet domain features for fingerprint recognition.
Electron Lett 37(1):21–22
7. Yang S, Verbauwhede I (2003) A secure fingerprint matching technique. Wanda Lee, Hong
Kong
Review of Semantic Web Mining in Retail
Management System Using Artificial Neural
Network
Abstract. Nowadays, online shopping has become one of the most common activities
in daily life. To satisfy customers' requirements, knowing consumer
behaviour and interests is important in the e-commerce environment.
Generally, user behaviour information is stored on the website server.
Data mining approaches are widely preferred for the analysis of user behaviour,
but static characterization and the sequence of actions are not considered
in conventional techniques. In a retail management system, such considerations
are essential. Based on these considerations, this paper gives a detailed
review of semantic web mining based on Artificial Neural Networks
(ANN) for the retail management system. For this review, many sentiment
analysis and prediction techniques are analyzed and compared based on their
performance. The survey also focuses on dynamic data about user behaviour.
Furthermore, future directions in the big data analytics field are also discussed.
1 Introduction
Big data analytics is among the most critical applications for the next generation of
distributed systems. The data to be mined for such applications presently exceeds
exabytes and is rapidly increasing in size (Kambatla et al. 2014). Recently, Big Data
has been used extensively in retail management systems. The data generated in retail
databases is characterized by variety, veracity, velocity, volume and value, and
processing and managing these databases requires capabilities beyond conventional
mining methods. Most E-commerce companies use different approaches to attract
consumers away from retail outlets by providing offers such as cash back, secure
exchange and cash on delivery. So, to survive in this competitive business environment,
retailers must identify the problems of their consumers and solve them. Retailers must
also track the different trends in social media on a regular basis. This paper reviews the
prediction techniques used for predicting customer behaviour with machine learning
and deep learning methods. Every
transaction made by the customer is stored for analyzing the purchase pattern of the
consumer. Purchase patterns play a vital role in promotion and product placement
policies that satisfy the customer and raise retailer revenue
(Verma et al. 2015). The Apriori association algorithm is mostly used to detect frequent
item sets in the databases (Verma and Singh 2015). However, this method has many
limitations, such as being resource intensive and requiring multiple scans of the
database. It is also not capable of extracting unique buying patterns from big databases
(Malhotra and Rishi 2017). So, sentiment analysis and prediction methods are
compared and analysed based on their performance.
Sentiment analysis, also known as opinion mining, is an important
Natural Language Processing (NLP) task that has received much attention in recent
years, where deep learning based neural network models have achieved huge success.
Sentiment analysis refers to the procedure of computationally recognizing and
classifying opinions expressed in a piece of text, in order to determine whether
the writer's attitude towards a specific subject or product is positive, negative or
neutral. In a sentence, not all words convey sentiment information. More
precisely, only the adjectives, adverbs, conjunctions and specific nouns are useful
for sentiment analysis. For instance, consider the following three sentences: (i) "I
feel very happy about the quality of the product"; (ii) "I also felt extremely happy after
seeing the price of the product"; (iii) "Saying the truth, I have not been pleasant since I
bought this particular product". Both sentence (i) and sentence (ii) contain
the sentiment keyword "happy", which indicates a positive sentiment; "happy"
appears in two different positions of different sentences. Sentence (iii) contains
two sentiment keywords, "not" and "pleasant", which are separated by another word,
"been". These two keywords together accurately reveal the sentiment polarity of the
sentence. Thus, the essential step in classifying the sentiment of a
sentence is to locate the sentiment keywords precisely. Sentiment trends are predicted
by analyzing the sentiment of the content for keywords of a specific event
and applying a prediction algorithm to the analysis results to forecast
the next sentiment. Although sentiment can be predicted highly accurately
when a machine learning algorithm is used, in situations where data on sentiment
trends is not sufficient, the accuracy of the prediction model becomes much lower.
Due to this issue, we predict sentiment trends through a computation strategy using
weighted values instead of a machine learning algorithm.
As Fig. 1 shows, the concept of a time window is used for sentiment trend prediction.
The size of the time window can be set in multi-day units such as 7, 14, 21 or 28 days.
We set the window size to 7 days for the analysis. Once the time window has been
observed for the set size, the next sentiment is predicted from the resulting sentiment
for the relevant period.
(1 / window_size) · Σ (sentiment_pos × cont_weight)        (1)
Equation (1) is used for sentiment trend prediction. Sentiments within the set time
window are analyzed to extract the means of the positive and negative
sentiments. In this case, moving averages may be obtained by applying
weights, and the resulting values are used for predicting the following sentiments.
Weights (cont_weight) take values between 0 and 1 to account for
the continuity of the sentiments. However, there is a limitation in using a
moving average to predict the next sentiment: since the average only aggregates counts
within the window, it cannot react to the dynamic evolution of the values.
Consequently, we attempted prediction using LSTM (Long Short-Term Memory)
(Greff et al. 2017). Figure 2 shows the LSTM used for sentiment prediction. LSTM is
a recurrent neural network architecture that is well suited to predicting long time-series
data. It also has an advantage over conventional RNNs owing to its relative
insensitivity to gap length (Greff et al. 2017; Hochreiter and Schmidhuber 1997). We
therefore conducted experiments and used LSTM for our prediction, as the experimental
outcome showed that the LSTM-based technique was superior to the moving average.
A short sketch of the weighted moving-average predictor follows.
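A minimal sketch of the windowed, weighted prediction of Eq. (1) is shown below (the values are illustrative only; cont_weight is the continuity weight between 0 and 1 mentioned above):

def predict_next_sentiment(daily_sentiment, window_size=7, cont_weight=0.8):
    """Predict the next day's positive-sentiment ratio from the last window."""
    window = daily_sentiment[-window_size:]
    return (1.0 / window_size) * sum(s * cont_weight for s in window)

# Example: seven days of positive-sentiment ratios.
# predict_next_sentiment([0.60, 0.70, 0.65, 0.80, 0.75, 0.70, 0.72])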
In Fig. 2, the user gives feedback on the retail site, sentiment analysis is performed,
and the LSTM artificial neural network prediction algorithm is used to
predict the positive or negative value of the feedback; based on that, suggestions can
be made (a sketch of such a predictor is given below). Many papers describe the
prediction techniques employed for understanding customer behaviour and reaction to
a product; these are explained in the next section.
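The LSTM predictor of Fig. 2 can be sketched as follows (a minimal Keras example; the layer sizes and training settings are assumptions, not taken from the paper):

import tensorflow as tf

window_size = 7
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window_size, 1)),  # one week of daily sentiment
    tf.keras.layers.Dense(1, activation="sigmoid"),          # predicted positive ratio
])
model.compile(optimizer="adam", loss="mse")

# X: (num_samples, window_size, 1) sliding windows of past sentiment values
# y: (num_samples,) next-day sentiment
# model.fit(X, y, epochs=20, batch_size=32)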
Artificial Neural Network (ANN) is a computational model inspired by the structure
and functional aspects of biological neural networks (Coello 2006). ANNs are
useful for learning complex relationships or patterns hidden in large-scale semantic data.
Researchers have used ANNs to enhance ontology alignment (Chortaras et al. 2005;
Mao et al. 2010), ontology enrichment (Chifu and Letia 2010), concept mining
(Honkela and Pöllä 2009), automatic ontology construction (Chen and Chuang 2008),
etc. Supervised ANNs are extensively used for learning semantic mappings amongst
heterogeneous ontologies. The Recursive Neural Network (RNN) model (Mandic and
Chambers 2001) was designed to process structured data and is therefore well suited
for use with ontologies, which are themselves a structured data representation; RNNs
were preferred for modelling automatic ontology alignment (Chortaras et al. 2005).
One issue in ontology alignment is to discover the best configuration that satisfies the
ontology constraints. The Projective Adaptive Resonance Theory neural network
(PART) (Cao and Wu 2004) was correspondingly used to support automatic ontology
construction from web pages (Chen and Chuang 2008). PART clusters the web
pages that are gathered in order to find representative terms for every group of web
pages; the representative terms are then input to a Bayesian network. The most
representative benefit of CI techniques for the Semantic Web is their ability to tackle
difficult issues in a highly dynamic and decentralized setting.
2 Related Works
Yu et al. (2018) demonstrated an online big-data-based model for oil consumption
forecasting based on Google Trends. The method also investigated the feasibility of
Google Trends for online big data prediction of oil usage. For that, the
method involved two steps, namely relationship investigation and improvement of
the prediction. Granger causality and cointegration tests were used to statistically
verify the predictive power of Google Trends in the related study. In the prediction
improvement step, several classification techniques were introduced for oil
consumption prediction: logistic regression, decision trees, Support Vector Machines
(SVM) and Back Propagation Neural Networks (BPNN).
Johnson and Ni (2015) presented an approach that leverages online social networks
and recommended a dynamic pricing strategy that adapts to variation in customer
valuation. The approach described a dynamic pricing mechanism that estimates the
customers' interest in the vendor's product. This interest partially reflects the
sentiment of the customers towards the products expressed through social media.
Based on this, the emotion-aware pricing method used demand forecasting that
includes the temporary fluctuation in the customer choice parameters, which were
derived from the sentiment dynamics of the social media data. The approach was
successfully combined with the demand forecasting module of existing pricing
mechanisms, and the sensitivity of the mechanism's performance to errors in
sentiment forecasting was explored through simulation. These simulations
showed that the forecasting errors underestimated customer sentiment.
Al-Obeidat et al. (2018) presented an Opinion Management Framework that
integrates topic extraction, project management and sentiment analysis.
Comments are placed into clusters during topic extraction, and each cluster
is associated with an issue to be resolved. One or many tasks can be identified from
each cluster, and the measured sentiment expression represents the significance of
each cluster. The framework recommends collecting comments about every
issue and measuring sentiment from these comment sets. If any subtasks are identified
within a task, these are also considered. The merchant considers these tasks and
subtasks, and the person who selects a task or subtask is identified. For these tasks
and subtasks, project management features such as duration and cost, shared resource
constraints and earliest start times are provided by the vendor. The
work also considered task combination and the selection of tasks that compensate for
the performance cost; the optimal selections are based on the sentiment improvement
in the merchant's place value and relation. With this framework, the merchant can
immediately respond to customer comments online.
Day and Lin (2017) applied a deep learning method for sentiment analysis and
focused on consumer reviews of mobile phones. For the evaluation and analysis of
the consumer reviews, a deep learning method, an opinion dictionary and a sentiment
dictionary were used in the smartphone domain. In this approach, consumer reviews
were collected for polarity analysis of smartphone applications, and deep
learning was used to achieve higher accuracy. Compared with general machine
learning methods, the polarity analysis results were best when using the deep learning
method.
Owing to its economic uniqueness, automation in retail trade is very difficult
for many business processes. Consider one such business process: forming the
assortment of a vending machine based on a fuzzy-logic algorithm. The main problem
with fuzzy-logic-based algorithms is the effort involved: a large amount of data is
needed to form the solution in fuzzy logic systems. Generally, the fuzzy logic
algorithm requires period selection analysis and product information such as
purchasing and selling prices, the number of items sold and the number of products
in the machine. This type of analysis needs many hours from a professional marketer
and takes considerable time, so it is not acceptable.
Semenov et al. (2017) analyzed these assortment-forming problems in customer
demand forecasting. Initially, the history of the product was examined, then the
future behaviour of the product was detected, and finally the future profit of the
machine was predicted using Artificial Intelligence technologies. In this approach, an
Artificial Neural Network was employed to solve the machine assortment problem.
Wang et al. (2016) compared various predictive methods for house price prediction.
ANN performance was compared with Multiple Regression Analysis
(MRA) augmented with an autoregressive integrated moving average (ARIMA)
approach. The presented model gave high accuracy in the prediction of future prices.
Here, housing prices were represented as time series, and the method was evaluated
on housing prices in different parts of the world and on financial markets. For
modelling the relationship between prices and quantities, the ARIMA model was used
on the time series: an autoregressive model was used to relate the variable to its past
values, and moving-average models examined the relationship between the variable
and past-period residuals.
Ak et al. (2016) compared two machine learning methods for the estimation of
prediction intervals (PIs) in time series prediction. For measuring prediction quality,
PI coverage probability (PICP) and PI width (PIW) were taken. In the first method, a
Multi-Objective Genetic Algorithm was used to train a multilayer perceptron NN
(MLPNN); PI estimation was integrated into learning, and the MLPNN was trained
to minimize the width while maximizing the coverage probability of the PI estimate.
The second method combined Extreme Learning Machines (ELMs) with a nearest-
neighbour approach: the trained ELMs produced the point estimates, and then PIs were
quantified from the training dataset depending on the characteristics of the ELM.
These two methods were selected because of their different approaches to PI
estimation. For identifying Pareto-front solutions in PIW and PICP, a
multi-objective optimization framework was used.
Malhotra and Rishi (2018) presented an RV-MapReduce big data analytics outline
for market basket analysis. Using this framework, E-commerce websites can easily be
assessed and ranked. It is scalable and robust, as well as an open-source platform for
big-data-based E-commerce processing. The Hadoop cluster consists of parallel
machines, so big data sets can easily be stored and processed, and a large number of
customers can quickly submit their workloads to the distributed Hadoop cluster from
various locations. The framework suggests that Hadoop and MapReduce cloud
computing can be preferred for practical deployment in E-commerce ranking systems.
The primary purpose of the framework was customer assessment in ranking
E-commerce websites, easy searching, and accurate ranking of E-commerce websites.
Chen et al. (2015) demonstrated the need for artificial neural networks in retail
management systems in comparison with other methods. However, this approach did
not address the issues associated with ANNs: different combinations of setting
parameters, such as the structure of the input neurons, the initial weight values and the
number of hidden neurons, give different results. Therefore, the proposed method used
only the stock closing price as input, and different settings of the parameters were
taken for the experiments. The paper also enhanced the Back-Propagation Neural
Network (BPN) with a new normalization function; the BPN minimized the error in
the system. MSE and mean absolute percentage error were used for model evaluation.
The result provided by this system was better than other systems in terms of accuracy.
Lu et al. (2015) analyzed the variable structure of vegetable prices, including
optimization of the weights and threshold values of a BPNN. A Particle Swarm
Optimization (PSO) algorithm was utilized to predict the retail price of the vegetables.
The experimental outcomes verified that the PSO-BPNN method handled the
overfitting problem better than the traditional back-propagation method. The
proposed PSO-BP efficiently reduced the training error and improved the precision of
the prediction.
Thakur et al. (2015) presented a combined approach for gas price prediction with an
ANN and moving-average methods. An input layer, activation function and hidden
layer were employed to produce the output, and the neural network trained the number
of neurons in the hidden layer. Neural networks and moving averages were used to
measure the nonlinear and linear series values, respectively. However, the neurons in
the hidden layer can introduce error, reduce stability and cause overfitting. The model
mainly focused on the selection of the hidden-layer neurons, so it resulted in a smaller
root mean square error.
Heinrich et al. (2015) showed the dynamic capabilities of big data analytics in the
prediction of customer behaviour, adaptive skills, key performance measurement and
maintaining a temporary competitive advantage. In this way, the value of big data can
be deployed for radical and incremental innovations: the incremental changes enhance
both the current and existing marketing strategies, while the radical innovations define
a new method such as an anticipatory shipping strategy. Bekmamedova
and Shanks (2014) described a bank's social media marketing approach in which the
actions and insights derived from big social data were efficiently embedded in the
existing business operations and in the decision-making legacy of marketing managers
and business analysts.
Dutta and Bose (2015) studied how a generic business can derive value from big data
analytics. For this, a genetic algorithm and a BPNN were used to support the
deployment and business model enhancement across nine building blocks. The study
highlighted the complexity of social big data and the need for a change in mindset
among the marketing heads and employees of any organization. It also presented the
application of social big data analytics at different levels of the production cycle, the
groundwork identification plan, and the strategies in data mining.
Malik and Hussain (2017) investigated a prediction method based on the impact of
negative and positive reviews of a product. From the review content, the positive
and negative emotions were predicted using a deep neural network (DNN). This
approach also helps E-commerce retailers and managers minimize the processing cost
of obtaining improved reviews. The results showed that the DNN-based review
prediction technique performed better than the existing prediction techniques.
Wang et al. (2018) suggested a technique based on convolutional neural networks
(CNN) for evaluating the economic sustainability of geographic units. The method
was introduced to fill the gap in small-market estimation and to provide a sustainable
business strategy for retail shops. It estimated the market demand of the retailers from
actual sales data and social media, and it formed a market potential map. To account
for spatial proximity, a kernel density method was implemented, and the market
potential was estimated by the established model without requiring knowledge from
the retailers. To verify the estimation accuracy, the presented technique was compared
with ANN and least-squares regression using cross-validation. The outcomes of the
proposed technique had greater precision than the existing techniques, and it could
also be applied to the estimation of micro-scale market potential.
Krebs et al. (2017) proposed reaction prediction for Facebook posts using neural
networks. For that, a data set was built to capture Facebook post reactions, useful
both for marketing users and for machine learning. Sentiment analysis and emotion
mining of Facebook posts were then performed through the prediction of user
reactions: initially, emotion mining techniques and emotional analysis were applied
to Facebook comments and posts, and then neural networks with pre-trained word
embeddings were used to approximate the post reactions accurately.
Wehrmann et al. (2018) proposed an innovative approach for classifying both the
sentiment and the language of tweets. The proposed architecture includes a
Convolutional Neural Network (ConvNet) with two different outputs, each of which is
trained to reduce the classification error of either sentiment assignment or language
identification. Results showed that the suggested method outperforms both single-task
and multi-task state-of-the-art approaches for classifying multilingual tweets.
Jianqiang et al. (2018) introduced a word embedding technique obtained by
unsupervised learning on large Twitter corpora, exploiting latent contextual semantic
relationships and co-occurrence statistics between words in tweets.
These word embeddings are combined with n-gram features and
word sentiment polarity score features to form a sentiment feature set of tweets. The
feature set is fed into a deep CNN for training and predicting sentiment classification
labels. The performance of the proposed model was experimentally compared with a
baseline word n-grams model on five Twitter data sets; the outcomes indicated that the
proposed model achieved improved accuracy and F1-measure for Twitter sentiment
classification.
Poria et al. (2017) presented a multimodal data analysis framework. It incorporated
the extraction of salient features, the development of unimodal classifiers, and the
construction of feature-level and decision-level fusion structures. The deep CNN-SVM
based sentiment analysis component was observed to be the key factor in beating the
accuracy of the best conventional models, and MKL played a critical role in the fusion
experiment. The proposed decision-level fusion design was likewise an essential
contribution of this research: in the decision-level fusion experiment, coupling
semantic patterns to decide the weight of the textual modality improved the
performance of the multimodal sentiment analysis system significantly. Interestingly,
a lower precision was obtained for the task of emotion recognition, which may indicate
that extracting emotions from video is more difficult than inferring polarity.
While text was the most important factor for determining polarity, the visual modality
showed the best performance for emotion analysis. The most interesting aspect of
this paper is that a common multimodal data analysis framework was well suited to
extracting emotion and sentiment from various datasets.
Table 1 below summarizes the different methods involved in big data
analytics for retail management systems.
From the literature and Table 1, it is understood that, compared with the other
prediction methods, a semantic-web-based back-propagation neural network has better
performance for big data analytics in the retail management system. The proposed
S-ANN technique provides a lower mean square value compared with the other predictive
tools for big data analytics.
Table 1. (continued)

Authors              | Method                                                                        | Inferences
Chen et al. (2015)   | Backpropagation neural network (BPN)                                          | 1. Setting parameters in BPN. 2. Better prediction accuracy compared with other systems
Thakur et al. (2015) | 1. Backpropagation Neural Network. 2. Multilayer Levenberg-Marquardt algorithm | Neural network showed flexibility between the inputs and outputs

Experiment on Sentiment Analysis:
The suggested sentiment analysis is based on sentiment models. We consider a
sentiment analysis model that demonstrates an accuracy of approximately 84% under
the aforementioned experimental conditions. Although training with more data is
needed to enhance the accuracy, improved sentence analysis outcomes can be achieved
by allowing for features of social media content such as social relations. We also
investigated traditional machine learning methods alongside, to validate the suggested
model. We chose Naïve Bayes, SVM and Random Forest as the traditional machine
learning models, used the same datasets as for the suggested model, and trained them
with the modules of scikit-learn; a sketch of these baselines is given below.
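A minimal sketch of such scikit-learn baselines is given below (the TF-IDF features and the tiny toy corpus are assumptions for illustration; they are not the datasets used in the review):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

reviews = ["I feel very happy about the quality of the product",
           "I have not been pleasant since I bought this particular product"]
labels = [1, 0]   # 1 = positive, 0 = negative

for clf in (MultinomialNB(), LinearSVC(), RandomForestClassifier()):
    model = make_pipeline(TfidfVectorizer(), clf)   # TF-IDF features + classifier
    model.fit(reviews, labels)
    print(type(clf).__name__, model.predict(reviews))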
3 Conclusion
This paper presented a comprehensive review of the different data mining techniques
available for retail management systems. Various methods for predicting user
behaviour were considered for analysis. From the investigation, it was identified that
the ANN-based semantic web mining method has better accuracy and a lower mean
square value compared with the other conventional predictive tools. The existing
techniques generate numerous iterative overheads in the analysis, and their pattern
extraction efficiency is also very low. Most of the existing mining techniques do not
consider the feedback from the user in retail management systems. In the literature,
an online big-data-driven oil consumption forecasting model was described which
utilizes Google Trends, which reveal various related factors built on a myriad of
search results. This model includes two key steps, relationship analysis and prediction
enhancement, but it still has some limitations. First, it needs the selection of the most
suitable Google Trends, and therefore a complete study of all Google Trends
associated with the oil market is a significant concern. Second, some recently emerging
forecasting tools, particularly the decomposition-and-ensemble methods, might also be
introduced to improve the prediction accuracy. Third, the relations between Google
Trends and oil consumption will change in extent over time, and could even vanish.
References
Ak R, Fink O, Zio E (2016) Two machine learning approaches for short-term wind speed time-
series prediction. IEEE Trans Neural Netw Learn Syst 27(8):1734–1747
Al-Obeidat F, Spencer B, Kafeza E (2018) The Opinion Management Framework: identifying
and addressing customer concerns extracted from online product reviews. Electron Commer
Res Appl 27:52–64
Bekmamedova N, Shanks G (2014) Social media analytics and business value: a theoretical
framework and case study. In: 2014 47th Hawaii international conference on system sciences
(HICSS). IEEE, pp 3728–3737
Cao Y, Wu J (2004) Dynamics of projective adaptive resonance theory model: the foundation of
PART algorithm. IEEE Trans Neural Netw 15(2):245–260
Chen CC, Kuo C, Kuo SY, Chou YH (2015) Dynamic normalization BPN for stock price
forecasting. In: 2015 IEEE international conference on systems, man, and cybernetics (SMC).
IEEE, pp 2855–2860
Chen RC, Chuang CH (2008) Automating construction of a domain ontology using a projective
adaptive resonance theory neural network and Bayesian network. Expert Syst 25(4):414–430
Chifu ES, Letia IA (2010) Self-organizing maps in Web mining and semantic Web. In: Self-
organizing maps. InTech
Chortaras A, Stamou G, Stafylopatis A (2005, September) Learning ontology alignments using
recursive neural networks. In: International conference on artificial neural networks. Springer,
Berlin, Heidelberg, pp 811–816
Coello CC (2006) Evolutionary multi-objective optimization: a historical view of the field. IEEE
Comput Intell Mag 1(1):28–36
Day MY, Lin YD (2017) Deep learning for sentiment analysis on Google Play consumer review.
In: 2017 IEEE international conference on information reuse and integration (IRI). IEEE,
pp 382–388
Dutta D, Bose I (2015) Managing a big data project: the case of Ramco Cements Limited. Int J
Prod Econ 165:293–306
Greff K, Srivastava RK, Koutník J, Steunebrink BR, Schmidhuber J (2017) LSTM: a search
space odyssey. IEEE Trans Neural Netw Learn Syst 28(10):2222–2232
Guille A, Hacid H, Favre C, Zighed DA (2013) Information diffusion in online social networks: a
survey. ACM Sigmod Rec 42(2):17–28
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Honkela T, Pöllä M (2009, June) Concept mining with self-organizing maps for the semantic
web. In: International workshop on self-organizing maps. Springer, Berlin, Heidelberg,
pp 98–106
Jianqiang Z, Xiaolin G, Xuejun Z (2018) Deep convolution neural networks for Twitter
sentiment analysis. IEEE Access 6:23253–23260
Johnson SD, Ni KY (2015) A pricing mechanism using social media and web data to infer
dynamic consumer valuations. In: 2015 IEEE international conference on big data (Big Data).
IEEE, pp 2868–2870
Kambatla K, Kollias G, Kumar V, Grama A (2014) Trends in big data analytics. J Parallel Distrib
Comput 74(7):2561–2573
Krebs F, Lubascher B, Moers T, Schaap P, Spanakis G (2017) Social emotion mining techniques
for Facebook posts reaction prediction. arXiv preprint arXiv:1712.03249
Lu YE, Yuping L, Weihong L, Qidao S, Yanqun LIU, Xiaoli Q (2015) Vegetable price
prediction based on PSO-BP Neural Network. In: 2015 8th international conference on
intelligent computation technology and automation (ICICTA). IEEE, pp 1093–1096
Malhotra D, Rishi OP (2017) IMSS: a novel approach to design of adaptive search system using
second generation big data analytics. In: Proceedings of international conference on
communication and networks. Springer, Singapore, pp 189–196
Malhotra D, Rishi OP (2018) An intelligent approach to the design of E-Commerce metasearch
and ranking system using next-generation big data analytics. J King Saud Univ-Comput Inf
Sci
Malik MSI, Hussain A (2017) Helpfulness of product reviews as a function of discrete positive
and negative emotions. Comput Hum Behav 73:290–302
Mandic DP, Chambers J (2001) Recurrent neural networks for prediction: learning algorithms,
architectures and stability. Wiley, Hoboken
Mao M, Peng Y, Spring M (2010) An adaptive ontology mapping approach with neural network
based constraint satisfaction. Web Semant: Sci, Serv Agents World Wide Web 8(1):14–25
Poria S, Peng H, Hussain A, Howard N, Cambria E (2017) Ensemble application of
convolutional neural networks and multiple kernel learning for multimodal sentiment
analysis. Neurocomputing 261:217–230
Semenov VP, Chernokulsky VV, Razmochaeva NV (2017) Research of artificial intelligence in
the retail management problems. In: 2017 IEEE II international conference on control in
technical systems (CTS). IEEE, pp 333–336
Thakur A, Kumar S, Tiwari A (2015) Hybrid model of gas price prediction using moving average
and neural network. In: 2015 1st international conference on next generation computing
technologies (NGCT). IEEE, pp. 735–737
Verma N, Singh J (2015) Improved web mining for e-commerce website restructuring. In: 2015
IEEE international conference on computational intelligence & communication technology
(CICT). IEEE, pp. 155–160
Verma N, Singh J (2017) An intelligent approach to big data analytics for sustainable retail
environment using Apriori-MapReduce framework. Ind Manag Data Syst 117(7):1503–1520
Verma N, Malhotra D, Malhotra M, Singh J (2015) E-commerce website ranking using semantic
web mining and neural computing. Procedia Comput Sci 45:42–51
Wang F, Zhang Y, Xiao H, Kuang L, Lai Y (2015, November) Enhancing stock price prediction
with a hybrid approach based extreme learning machine. In: 2015 IEEE international
conference on data mining workshop (ICDMW). IEEE, pp 1568–1575
Wang L, Fan H, Wang Y (2018) Sustainability analysis and market demand estimation in the
retail industry through a convolutional neural network. Sustainability 10(6):1762
Wang L, Wang Y, Chang Q (2016) Feature selection methods for big data bioinformatics: a
survey from the search perspective. Methods 111:21–31
Wehrmann J, Becker WE, Barros RC (2018) A multi-task neural network for multilingual
sentiment classification and language detection on Twitter. Mach Transl 2(32):37
Yu L, Zhao Y, Tang L, Yang Z (2018) Online big data-driven oil consumption forecasting with
Google trends. Int J Forecast 35:213–223
Yuce B, Rezgui Y (2017) An ANN-GA semantic rule-based system to reduce the gap between
predicted and actual energy consumption in buildings. IEEE Trans Autom Sci Eng 14
(3):1351–1363
Zhou ZH, Chawla NV, Jin Y, Williams GJ (2014) Big data opportunities and challenges:
discussions from data analytics perspectives [discussion forum]. IEEE Comput Intell Mag 9
(4):62–74
Real Time Gender Classification Based
on Facial Features Using EBGM
1 Introduction
2 Review of Literature
We present a system for recognizing individual faces from single images taken from a
large database containing one image per person. The task is challenging because of
image variation in terms of position, size, expression and pose. The system collapses
most of this variance by extracting concise face descriptions in the form of image
graphs. In these, fiducial points on the face (eyes, mouth, etc.) are described by sets of
wavelet components (jets). Image graph extraction is based on a novel approach, the
bunch graph, which is constructed from a small set of sample image graphs.
Recognition is based on a straightforward comparison of image graphs. We report
recognition experiments on the FERET database as well as the Bochum
database, including recognition across pose.
We have set ourselves the task of recognizing persons from single images by
reference to a gallery, which likewise contains only one image per
person. Our problem is to handle image variation due to differences in facial
expression, head pose, position and size (to name only the most important). Our
approach is thus a typical discrimination-in-the-presence-of-variance problem, where
one has to try to collapse the variance and to emphasize discriminating
capabilities. This is generally only possible with the help of information about the
nature of the variations to be expected. Classification systems differ greatly
in the nature and origin of their knowledge about image changes.
Systems in Artificial Intelligence [4] and Computer Vision frequently stress
specific designer-provided structures, for example explicit models of three-dimensional
objects or of the image-generation process, whereas neural network designs mostly
stress the learning of structure from examples with the help of statistical
estimation techniques. Both of these extremes are expensive in their own
way and fall painfully short of the ease with which natural systems pick up essential
information from just a few examples. Part of the success of natural systems must be
due to general properties and laws governing how object images transform under
natural conditions. Our system has a fundamental core of structure which reflects the
fact that the images of coherent objects tend to translate, scale, rotate and deform in
the image plane. Our basic object representation is the labeled graph; edges are
labeled with distance information and nodes are labeled with wavelet responses
locally bundled in jets. Stored model graphs can be matched to new images to
generate image graphs, which can then be incorporated into a gallery and become
model graphs. Wavelets as we use them are robust to moderate lighting changes
and small shifts and deformations. Model graphs can easily be translated, scaled,
oriented or deformed during the matching process, thus compensating for a large part
of the variance of the images. Unfortunately, having only one image per person in the
gallery does not provide sufficient information to handle rotation in depth
analogously. However, we present results on recognition across different poses. This
general structure is useful for handling any kind of coherent object and may be
sufficient for discriminating between structurally different object types. However, for
in-class discrimination of objects, of which face recognition is an example, it is
necessary to have information specific to the structure common to all objects in the
class. In our system, class-specific information has the form of bunch graphs, one for
each pose, which are stacks of a moderate number (70 in our experiments) of
different faces, jet-sampled at an appropriate set of fiducial points (placed
Real Time Gender Classification Based on Facial Features 553
over sight, oral hole, shape, and so on). Cluster outlines are dealt with as combinatorial
associations in which, for each fiducially point, a stream from an alternate illustration
experience can be chosen, in this way making a very helpful plan. This plan is printed
to new face pictures to successfully discover the fiducially factors in the photo. Planes
at these variables and their relative parts are delivered and are blended into a photo
outline, an impression of the experience which has no residual contrast because of
measurement, put (or in-plane arrangement, not connected here). An accumulation
outline is made in two phases. Its subjective system as a diagram (an arrangement of
hubs in addition to edges) and in addition the task of comparing names (flies and
separations) for one starting picture is architect given, though the greater part of the
gathering outline is created semi-naturally from illustration pictures by related the
embryonic accumulation graph to them, less and less regularly all the encompassing to
revise wrongly distinguished fiducially factors. Picture graphs are fairly powerful to
little top to bottom changes of the best. Bigger turning points of view, i.e. diverse
presents, are taken care of with the assistance of accumulation graphs with an alternate
diagram structure and architect gave correspondences between hubs in various pre-
sents. After these plans our program can draw out from singular pictures brief invariant
experience clarifications in the best possible execution of picture outlines (called
configuration diagrams when in a display). They contain all subtle elements applicable
for the experience tastefulness system. With the end goal of acknowledgment, picture
diagrams can be as opposed to configuration outlines at small handling cost by dis-
secting the mean stream similarity. We gave a speculatively and computationally
simple yet proficient multiresolution technique to grayish range and turning invariant
structure grouping fixated on ‘uniform’ local paired styles and nonparametric class of
case and model withdrawals. ‘Uniform’ styles were perceived to be an essential
architecture, as they offer a larger part of local structure styles, relating to structure
microstructures, for example, sides. By computing the withdrawals of these
microstructures, we blended compositional and scientific structure explore. We built up
a general grayish range and turning invariant proprietor LBPP,R riu2, which considers
finding ‘uniform’ styles in round networks of any quantization of the precise zone and
at any spatial determination. We additionally gave a simple intends to blending
responses of different suppliers for multi-determination examine, by accepting that the
proprietor responses are partitioned. Phenomenal preliminary results gained in two
issues of genuine turning invariance, where the classifier was prepared at one specific
turning position and tried with tests from other turning points of view, show that great
tastefulness can be accomplished with the episode research of ‘uniform’ turning
invariant provincial paired styles. Face acknowledgment advancements can funda-
mentally affect confirmation, following and picture posting applications. This report
exhibits a criteria to gauge similarity of experiences all in all. The system is to question
an information source utilizing the photo of an ordeal and after that have the program
either find out its personality, or recuperate the best indistinguishable matches. All
things considered, the system is run of the mill and has already been utilized effectively
in picture recuperation assignments, for example, finding indistinguishable minutes,
pictures, double shapes and plans. The methodology is focused on the two specula-
tions; first that general look of an ordeal assumes an imperative part in breaking down
resemblance and second, multi-scale differential well known elements of the photo
554 D. K. Kishore Galla and B. Mukamalla
lighting zone compose effective general look capacities. The principal hypothesis is
focused on the announcement that general look is a basic sign with which we evaluate
similarity. We promptly perceive things that offer a general look as indistinguishable,
and without other proof, are probably going to decay those that don’t. An exact
meaning of general look is testing. The physical and perceptual marvels that decide
general look are not outstanding, and notwithstanding when there is understanding, for
example, the impact of thing (3D)shape, zone structure, lighting, albedo and point of
view, it is non-insignificant to separate a photo along these components.
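As an illustration of the graph-comparison step described above (matching a probe image graph against gallery model graphs by mean jet similarity), the following is a minimal Python sketch; the jet dimensionality, the toy data and the similarity measure are illustrative assumptions rather than the authors' implementation.

import numpy as np

def jet_similarity(j1, j2):
    # Normalized dot product of two jets (vectors of Gabor magnitude responses).
    return float(np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2) + 1e-12))

def graph_similarity(probe_graph, model_graph):
    # Mean jet similarity over corresponding fiducial points of two image graphs.
    sims = [jet_similarity(a, b) for a, b in zip(probe_graph, model_graph)]
    return sum(sims) / len(sims)

def identify(probe_graph, gallery):
    # Return the gallery identity whose model graph is most similar to the probe graph.
    return max(gallery, key=lambda name: graph_similarity(probe_graph, gallery[name]))

# Toy usage: random 40-dimensional jets at 5 fiducial points per graph.
rng = np.random.default_rng(0)
gallery = {"person_a": rng.random((5, 40)), "person_b": rng.random((5, 40))}
probe = gallery["person_b"] + 0.05 * rng.random((5, 40))
print(identify(probe, gallery))  # expected output: person_b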
formed from the combination of features from the frontal and side views of a person, after eliminating feature redundancy. SIFT feature sets from the database and query images are matched using the Euclidean-distance and point-configuration matching strategies. A graph matching method is applied to the SIFT descriptors to handle inexact pair assignment and to reduce the number of SIFT features. SIFT features are evaluated by a discriminative criterion based on Fisher's Discriminant Analysis, so that the selected features have minimum within-class variance and maximum between-class variance. Both global and local matching strategies are used. In order to reduce identification errors, the Dempster-Shafer decision theory is used to fuse the two matching techniques.
3.2 Modules
A. Enrolment Stage
The image is captured using a web camera and stored in a database. The human face in the image is then detected and processed. During training, the face image is preprocessed using geometric and photometric normalization. The features of the face image are extracted using several feature extraction methods. The feature details are then saved together with the user identity in the database.
B. Recognition/Confirmation Stage
The user's face biometric details are acquired again, and the system uses them either to identify who the user is or to confirm the stated identity of the user. While identification involves evaluating the acquired biometric details against templates corresponding to all users in the database, confirmation involves comparison only with those templates corresponding to the stated identity. Identification and confirmation are therefore two distinct problems, each with its own inherent challenges. The recognition/confirmation stage includes several components: image acquisition, face detection, and face recognition/verification.
C. Image Acquisition/Face Detection Module
Face detection is used to detect the face and to extract the relevant details related to facial features. The image is then resized and geometrically normalized so that it is suitable for recognition/confirmation. In this module, the background and regions irrelevant to the task are removed. The program can detect a face in real time. The face detection component is also robust against lighting variation and works well with different skin tones and with occlusions such as facial hair and head cover.
Face detection includes an image acquisition component whose goal is to locate and then extract a region that contains only the face. The implementation relies on rectangle features using the AdaBoost algorithm. Its outputs are the rectangle that contains the facial features and the image containing the extracted face region.
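A minimal sketch of such an AdaBoost rectangle-feature detector, using OpenCV's pretrained Haar-cascade interface, is given below; the cascade file, camera index and crop size are assumptions for illustration and not the system described here.

import cv2

# Pretrained frontal-face cascade shipped with OpenCV (AdaBoost-trained rectangle features).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # web camera; device index 0 is an assumption
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Returns rectangles (x, y, w, h) that contain detected faces.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Crop and resize the face region so it is suitable for recognition/verification.
        face = cv2.resize(gray[y:y + h, x:x + w], (100, 100))
cap.release()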
[Block diagram: enrolment and recognition/verification stages. Both paths pass through image acquisition, face detection, preprocessing and feature extraction; enrolment (training) stores a template together with the client ID, while recognition/verification classifies the extracted features against the stored template and applies a threshold to give a Pass/Fail decision.]
i. Preprocessing
The purpose of the preprocessing module is to reduce or eliminate some of the variations in appearance due to illumination. It normalizes and enhances the face image to improve the recognition performance of the system. Preprocessing is crucial, as the robustness of a face recognition system greatly depends on it. By performing explicit normalization processes,
4 Results
This section describes the face recognition results obtained with the proposed approach on different real-time human facial images. Four types of facial databases are available in the outside environment. For verification of faces from real-time facial images there are two basic error measures, the False Alarm Rate (FAR) and the False Rejection Rate (FRR). The FAR and FRR are given by the ratios below.
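In the usual formulation (standard biometric error-rate definitions are assumed here, not values specific to this system):

\mathrm{FAR} = \frac{\text{number of impostor attempts falsely accepted}}{\text{total number of impostor attempts}}, \qquad
\mathrm{FRR} = \frac{\text{number of genuine attempts falsely rejected}}{\text{total number of genuine attempts}}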
Table 2. Homomorphic filter based approach with updated verification results [20]
Feature extractor  Classifier  FAR (%)  FRR (%)  HTER (%)
EBGM D.E. 5.488 7.655 6.5715
C.N. 4.998 7.125 5.5615
N.N 5.985 7.322 6.1535
LDA D.E. 7.480 7.726 7.726
C.N. 6.350 7.635 6.775
N.N 7.030 7.400 7.290
Graph 1. Comparison between EBGM and PCA (FAR, FRR and HTER percentages)
The third experiment applied a combination of homomorphic filtering and histogram equalization to the face images. The results arranged in Table 3 reveal that the N.N. classifier has the lowest HTER. Overall, therefore, for face verification the N.N. classifier can be regarded as the best among the three classifiers, since it performs consistently in all of the tests using both PCA and LDA feature extractors.
Table 5. Equalization based homomorphic filtering with updated recognition results [20]
Feature extractor Classifier Recognition
(%)
EBGM D.E. 92.84
C.N. 93.20
N.N 89.88
LDA D.E. 94.66
C.N. 93.22
N.N 87.87
Table 6. Histogram equalization filtering procedure with updated face recognition results [20]
Extraction of feature Classification approach Updated recognition results
(%)
EBGM D.E. 92.94
C.N. 92.98
N.N 89.76
LDA D.E. 90
C.N. 92.22
N.N 85.56
5 Conclusion
References
1. Wiskott L, Fellous JM, Krüger N, von der Malsburg C (1997) Face recognition by elastic
bunch graph matching. IEEE Trans Pattern Anal Mach Intell 19:775–779
2. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant
texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell
24(7):971–987
3. Ravela S, Hanson A (2001) On multi-scale differential features for face recognition. Proc.
Vision Interface, pp 15–21
4. Yanushkevich S, Hurley D, Wang P (2008) Editorial. Special Issue on Pattern Recognition
and Artificial Intelligence in Biometrics (IJPRAI) 22(3):367–369
5. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput
Vision 60(2):91–110
6. Lowe D (1999) Object recognition from local scale-invariant features. Int Conf Comput
Vision, pp 1150–1157
7. Lowe D (2001) Local feature view clustering for 3d object recognition. IEEE Conf Comput
Vision Pattern Recognit 1:682–688
8. Ke Y, Sukthankar R (2004) PCA-SIFT: a more distinctive representation for local image
descriptors. IEEE Conf Comput Vision Pattern Recognit 4:506–513
9. Brown M, Lowe D (2003) Recognising panoramas. IEEE Int. Conf. Comput Vision 3:
1218–1225
10. Abdel-Hakim A, Farag A (2006) CSIFT: A SIFT descriptor with color invariant
characteristics. In: Proceedings of the 2006 IEEE computer society conference on computer
vision and pattern recognition (CVPR’06), vol 2, pp 1978–1983
11. Bicego M, Lagorio A, Grosso E, Tistarelli M (2006) On the use of SIFT features for face
authentication. In: Proceedings of IEEE Int Workshop on Biometrics, in Association with
CVPR, pp 35–41, NY
12. Luo J, Ma Y, Takikawa E, Lao SH, Kawade M, Lu BL (2007) Person-specific SIFT features
for face recognition. In: International conference on acoustic, speech and signal processing
(ICASSP 2007), Hawaii, pp 563–566
13. KreBel U (1999) Pairwise classification and support vector machines. In: Advances in kernel
methods: support vector learning. MIT Press, Cambridge, pp 255–268
14. Hen YM, Khalid M, Yusof R (2007) Face verification with Gabor representation and support
vector machines. In: Proceedings of the first Asia international conference on modelling &
simulation, pp 451–459
15. Vapnik V (1995) The nature of statistical learning theory. Springer-Verlag, New York
16. Olivetti Research Labs, Face Dataset. www.cl.cam.ac.uk/research/dtg/attarchive/
facedatabase.html
17. Sanguansat P, Asdornwised W, Jitapunkul S, Marukatat S (2006) Class-specific subspace-
based two-dimensional principal component analysis for face recognition. In: Proceedings of
the 18th international conference on pattern recognition (ICPR), vol 2, pp 1246–1249
18. http://www.csie.ntu.edu.tw/˜cjlin/libsvm (2001)
19. Zheng YJ, Yang JY, Yang J, Wu XJ, Yu DJ (2006) A complete and rapid feature extraction
method for face recognition. In: Proceedings of the 18th international conference on pattern
recognition (ICPR), vol 3, pp 469–472
20. Nazeer SA, Omar N, Khalid M (2007) Face recognition system using artificial neural
networks approach. In: IEEE conference on ICSCN 2007, MIT Campus, Anna University,
Chennai, India, February 22–24, pp 420–425
21. Kishore GDK (2017) A literature survey on object classification techniques. Int J Adv
Technol Eng Sci 5(3):779–786
22. Kishore GDK, Babu Reddy M (2017) Comparative analysis between classification
algorithms and data sets (1: N & N:1) through WEKA. Open Access Int J Sci Eng
2(5):23–28
23. Kishore GDK, Babu Reddy M (2018) Analysis and prototype sequences of face recognition
techniques in real-time picture processing. Intelligent engineering informatics, advances in
intelligent systems and computing, vol 695. Springer, Singapore
Integrating Netnographic Analysis and Text
Mining for Assessing Satisfaction of Travellers
Visiting to India - A Review of Literature
patients visited in India for treatment (“Indian medical tourism industry to touch $8
billion by 2020”: Grant Thornton - The Economic Times, n.d.).
The Travel and Tourism Competitiveness Report of 2017 ranks India 40th out of 136 countries. The report also ranks the price competitiveness of the country's tourism industry 10th out of 136 countries. It further mentions that India has good air transport infrastructure (ranked 32nd), particularly considering India's stage of development, and good ground transport infrastructure (ranked 29th) (Travel & Tourism Economic Impact 2017 India, n.d.).
2 What Is Netnography?
3. Data collection: in this kind of study, data are collected from internet data, field notes and interview data.
4. Interpretation or data analysis: here a data analysis technique such as analytical coding is carried out in the following steps:
   a. Coding: labelling of the general phenomena observed in the data.
   b. Noting: recording reflections on the data.
   c. Abstracting: identifying sequences, similarities and also differences in the interactions.
   d. Checking/refining: returning to the site or field to check, confirm and refine the existing understanding or interpretation of patterns, commonalities, differences, etc.
   e. Generalizing: elaborating a small set of generalizations that covers or summarizes the consistent features in the dataset.
   f. Theorizing: constructing or deriving theory from the results or findings.
5. Ensuring ethical standards: the ethical concerns in netnographic studies are whether various online forums are to be considered private or public sites, and what constitutes informed consent in online/cyber space. Netnography provides a set of guidelines about when and how to cite authors and online posters, what constitutes an ethical online ethnographic representation, when to ask for permission and in which cases permission is not required.
6. Representation of research: one application of this type of marketing research is as an important tool to describe consumer behaviour by understanding consumers and listening to their voice.
Text mining is the process of analysing textual data in a manner that identifies patterns and derives insights from it. The method is widely used by retailers in e-commerce to understand more about consumers. To target specific individuals with personalized offers and discounts that improve sales and loyalty, it is essential to identify consumer purchase styles or patterns, which is very much possible with the help of mining textual information. Text mining has become a popular field of research because it attempts to discover or explore meaningful information from text that is unstructured in nature and contains a large amount of information that cannot otherwise be processed by computers (Dang and Ahmad 2014) (Fig. 1).
This mining approach is multidisciplinary and covers various tasks such as information retrieval, text analysis, information extraction, information categorization, visualization, etc.
The basic steps under this approach are the following (a minimal sketch follows the list):
a. Collection of information from unstructured sources.
b. Conversion of the information into a structured form of data.
c. Identification of useful patterns from the structured form.
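A minimal Python sketch of these three steps on traveller review text; the sample reviews, stop-word list and frequency criterion are illustrative assumptions, not the framework itself.

import re
from collections import Counter

# a. Collect unstructured text (two made-up traveller reviews for illustration).
reviews = [
    "Booking was easy and the support team responded quickly.",
    "Refund took too long and the support team never responded.",
]

# b. Convert to a structured form: lowercase tokens with punctuation and stop words removed.
stop_words = {"the", "and", "was", "a", "to", "too"}
tokens = [
    [w for w in re.findall(r"[a-z]+", review.lower()) if w not in stop_words]
    for review in reviews
]

# c. Identify useful patterns: the most frequent tokens across all reviews.
frequencies = Counter(word for doc in tokens for word in doc)
print(frequencies.most_common(5))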
4 Review of Literature
Kozinets (1998) introduced the market research literature to netnographic studies, an interpretive method used to investigate the behaviour of consumers, their cultures and the communities present on the worldwide web.
Kozinets (2002) provides guidelines that recognize online environments, preserve the inherent flexibility and openness of ethnography, and ensure ethics in performing marketing research, illustrated with the example of an online coffee newsgroup and its implications for marketing.
Langer et al. (2005) comment on the suitability of the online ethnographic methodology for a better understanding of some sensitive research topics.
5 Methodology
The approach to achieve the defined objectives has been divided into the following steps:
a. An intensive literature survey has been done to understand the research work carried out in the fields of netnography, electronic commerce, the tourism sector and netnography applications using text mining.
b. Understand the overall market structure and consumer base of online travel agents in India.
c. Perform netnography on selected online travel services in India:
   1. define and formulate research questions by referring to travel-related sites;
   2. collect data from various sources while ensuring ethical guidelines;
   3. perform analysis and draw interpretations or meanings from the findings.
d. Understand the consumer experience with respect to online travel services in India by combining text mining and netnographic analysis.
Competitive Advantage of Proposed Research Framework:
Online travel service providers can gain an advantage from this framework by mining meaningful information from consumer reviews and comments, which can further be used to derive patterns and to understand positive and negative comments. Companies can take corrective actions according to the reviews/feedback provided by consumers and thereby gain competitive advantage.
6 Conclusion
One of the limitations of the studies conducted on consumer satisfaction with online travel services in various countries, including the United States, is that they cannot be generalized to all travel services in all countries. This creates the scope and opportunity to conduct a similar study in India using text-based mining, which was missing in the studies conducted earlier. The present framework will help online travel service providers in India to understand the voice of consumers, understand their levels of satisfaction and dissatisfaction, and take corrective measures at the appropriate time. Netnographic analysis would be carried out on data collected from various sources, including corporate websites, advertisements (sponsored links), news sites, rating and referral sites (Mouthshut.com) and community sites. Integrating the netnographic analysis results with text mining would help to identify key tokens using the tokenizing process in RapidMiner, and further interpretations can be derived from the results to measure the levels of satisfaction and dissatisfaction of consumers.
References
Barbier G, Liu H (2011) Data mining in social media. Social network data analytics. Springer,
Boston. https://doi.org/10.1007/978-1-4419-8462-3_12
Belinda C, Camiciottoli S (2012) The integration of netnography and text mining for the
representation of brand image in fashion blogs
Berezina K, Bilgihan A, Cobanoglu C, Okumus F (2016) Understanding satisfied and dissatisfied
hotel consumers: text mining of online hotel reviews. J Hosp Mark Manage 25(1):1–24.
https://doi.org/10.1080/19368623.2015.983631
Dang S, Ahmad PH (2014) Text mining : techniques and its application. Int J Eng Technol Innov
1(4):22–25
Deka G, Rathore S, Panwar A (2016) Developing a research framework to assess online
consumer behaviour using netnography in India: a review of related research. IGI Global.
https://doi.org/10.4018/978-1-4666-9449-1.ch009:154-167
Derya K (2013) Reconceptualising fieldwork in a netnography of an online community of
English language teacher. Ethnography Edu 8(2):224–238
Griggs G (2011) Ethnographic study of alternative sports by alternative means-list mining as a
method of data collection. J Empirical Res Hum Res Ethics: JERHRE 6(2):85–91. https://doi.
org/10.1525/jer.2011.6.2.85
Hamilton K, Hewer P (2009) Salsa magic: an exploratory netnographic analysis of the salsa
experience. In: Advances in Consumer Research
Hernandez S, Alvarez P, Fabra J, Ezpeleta J (2017) Analysis of users’ behavior in structured e-
Commerce websites. IEEE Access. https://doi.org/10.1109/ACCESS.2017.2707600
Indian medical tourism industry to touch $8 billion by 2020: Grant Thornton - The Economic
Times (n.d.) The Economic Times. http://economictimes.indiatimes.com/industry/healthcare/
biotech/healthcare/indian-medical-tourism-industry-to-touch-8-billion-by-2020-grant-
thornton/articleshow/49615898.cms
Kozinets R (1998) On netnography: initial reflection on consumer research investigations of
cyberculture, In: Alba J, Hutchinson W (eds.) Advances in consumer research, vol 25,
pp 366–371
Kozinets RV (2002) The field behind the screen: using netnography for marketing research in
online communities. J Market Res 39(1):61–72. https://doi.org/10.1509/jmkr.39.1.61.18935
Kozinets RV (2006) Click to connect: netnography and tribal advertising. J Adv Res 46(3):279–
288. https://doi.org/10.2501/S0021849906060338
Kozinets RV (2010b) Netnography: doing etnographic research online. Sage, London
Kozinets RV (2012) Marketing netnography: prom/ot(Ulgat)ing a new research method.
Methodol Innov Online. https://doi.org/10.4256/mio.2012.004
Kozinets RV (2010a) Netnography: the marketer’s secret ingredient, pp. 4–6
Langer R, Beckman SC, Barnes SJ, Bauer HH, Neumann MM, Huber F (2005) Qual Mark Res
Int J 8(2):189–203. http://dx.doi.org/10.1108/13522750510592454
Loanzon E, Provenzola J, Siriwannangkul B, Al Mallak M (2013) Netnography: evolution,
trends, and implications as a fuzzy front end tool. In: 2013 Proceedings of technology
management in the IT-driven services (PICMET), PICMET’13, pp 1572–1593
Mkono M (2012) Netnographic tourist research: the internet as a virtual fieldwork site. Tourism
Anal. https://doi.org/10.3727/108354212X13473157390966
Murthy D (2008) Digital ethnography. Sociology 42(5):837–855. https://doi.org/10.1177/
0038038508094565
Nusair K, Kandampully J (2008) The antecedents of consumer satisfaction with online travel
services: a conceptual model. Eur Bus Rev 20(1):4–19. https://doi.org/10.1108/
09555340810843663
McKenna BJA, Myers M, Gardner L (2015) Analyzing qualitative data from virtual words: using
images and text mining
Regan R, Tiong-Thye G (2012) Textual factors in online product reviews: a foundation for a
more influential approach to opinion mining. Electron Commer Res 12(3):301–330
RapidMiner-Wikipedia (n.d.). https://en.wikipedia.org/wiki/RapidMiner. Accessed 11 Dec 2017
Sandlin JA (2007) Netnography as a consumer education research tool. Int J Consum Stud 31
(3):288–294. https://doi.org/10.1111/j.1470-6431.2006.00550.x
Travel & Tourism Economic Impact 2017 India (n.d.) World Travel and Tourism Council.
https://www.wttc.org/-/media/files/reports/economic-impact-research/countries-2017/
india2017.pdf
Tye R (2006) Ethno-mining: integrating numbers and words from the ground up. Technical
Report No.UCB/EECS-2006125, Department of Electrical and computer science, University
of California, Berkeley
Verma T, Renu R, Gaur D (2014, n.d.) Tokenization and filtering process in Rapid Miner. Int J
Appl Inf Syst. Research.ijais.org. http://research.ijais.org/volume7/number2/ijais14-451139.
pdf
Wan S, Cecile P (2015) Social media data aggregation and mining for internet-scale customer
relationship management. In: IEEE 16th International conference on information reuse and
integration, pp 39–48
Wu MY, Pearce PL (2014) Appraising netnography: towards insights about new market in the
digital tourist era. Current Issues in Tourism 17(5):463–474
Xun J, Reynolds J (2010) Applying netnography to market research - the case of the online
forum. J Target Meas Anal Mark 18(1):17–31. https://doi.org/10.1057/jt.2009.29
Zhang KZK, Benyoucef M (2016) Consumer behavior in social commerce: a literature review.
Decis Support Syst 86:95–108. https://doi.org/10.1016/j.dss.2016.04.001
Performance Evaluation of STSA Based
Speech Enhancement Techniques for Speech
Communication System
Abstract. Researchers have presented noise suppression models for reducing the spectral effects of acoustically added noise in speech. Background noise that is acoustically added to speech may degrade the performance of digital voice processors used for applications such as speech compression, recognition, and authentication [6, 7]. In this paper, different Short Time Spectral Amplitude (STSA) [1, 17] based methods for noise reduction are explained. Spectral subtraction offers a computationally efficient, processor-independent approach to effective digital speech analysis, but as an artifact the algorithm may produce another synthetic noise, called musical noise. In spectral subtraction methods the trade-off between residual and musical noise is limited, so the quality and intelligibility of the signal are not maximized to the required level [8]. To overcome the problem of musical noise, the Wiener filter and statistical-model-based methods were developed, and some modifications [7–11] are suggested for each method to make it more effective.
1 Introduction
Speech enhancement aims to improve speech quality and intelligibility using various algorithms. Background noise is a serious problem because it degrades the quality and intelligibility of the original clean voice signal [1]. The speech enhancement techniques considered here are single-channel, are based on the short-time discrete Fourier transform (STDFT), and use an analysis-modify-synthesis approach [8]. The analysis window length is kept fixed and frame-based processing is used. These techniques work on the principle that the clean spectral amplitude can be properly estimated from the noisy speech signal at a generally accepted level of output speech quality, and hence they are called short-time spectral amplitude (STSA) based methods [9, 16]. In the enhanced output speech, the phase of the noisy speech is adopted. For the synthesis process, the overlap-add method is selected [10]. In this paper, the simulation and
2 Literature Review
Speech enhancement techniques fall into two categories: (i) single channel and (ii) multiple channel (array processing), depending on whether the speech is received from a single microphone or from multiple microphones [11]. The short-time spectral amplitude (STSA) methods are single-channel methods and are the best known and most thoroughly investigated, as they do not require complex or large implementations [8, 9, 14]. STSA methods are the most conventional transform-domain methods; the assumption is that the noisy signal contains additive white noise that is stationary within one frame and changes only gradually in comparison with the input speech signal. These methods are based on the analysis-modify-synthesis approach. They use a fixed analysis window length and frame-based processing [3, 4, 5] (Table 1).
\gamma(K) — SNR after applying the algorithm at frequency bin K: \gamma(K) = \dfrac{|Y(K)|^2}{|\hat{D}(K)|^2}

\phi_y(K) — preserved phase of the signal Y(n) at frequency bin K

The original speech estimate is given by preserving the noisy speech phase \phi_y(K).
\hat{X}(K) = \frac{\xi(K)}{1 + \xi(K)} \, |Y(K)| \qquad (3)
Decision-Directed (DD) Approach: as a solution, the decision-directed rule was proposed by Ephraim and Malah [4] to compute this ratio, and it was used by Scalart et al. [5] together with the Wiener filter. According to Scalart, the rule for frame t is given by Eq. 4.
\xi^{(t)}(K) = \eta \, \frac{|\hat{X}^{(t-1)}(K)|^2}{|\hat{D}^{(t)}(K)|^2} + (1 - \eta) \, \max\!\left(\gamma^{(t)}(K) - 1, \, 0\right) \qquad (4)
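As a concrete illustration of Eqs. 3 and 4, the following is a minimal NumPy sketch of a per-frame Wiener gain driven by the decision-directed a priori SNR estimate; the frame layout, the fixed noise PSD estimate and the value η = 0.98 are illustrative assumptions rather than the implementation evaluated in this paper.

import numpy as np

def dd_wiener_gain(noisy_power, noise_psd, eta=0.98):
    # noisy_power: array (n_frames, n_bins) of |Y(K)|^2 per STDFT frame
    # noise_psd:   array (n_bins,) estimate of the noise power |D(K)|^2
    # Returns the gain G(t, K) such that |X_hat(K)| = G * |Y(K)| (Eq. 3).
    n_frames, n_bins = noisy_power.shape
    gains = np.zeros_like(noisy_power)
    prev_clean_power = np.zeros(n_bins)  # |X_hat(K)|^2 of the previous frame
    noise_psd = np.maximum(noise_psd, 1e-12)
    for t in range(n_frames):
        gamma = noisy_power[t] / noise_psd                      # a posteriori SNR
        xi = eta * prev_clean_power / noise_psd \
             + (1.0 - eta) * np.maximum(gamma - 1.0, 0.0)       # Eq. 4 (decision-directed)
        gain = xi / (1.0 + xi)                                  # Eq. 3 (Wiener gain)
        gains[t] = gain
        prev_clean_power = (gain ** 2) * noisy_power[t]         # feeds the next frame
    return gains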
The statistical-model-based methods, such as the minimum mean square error (MMSE) estimator given by Ephraim and Malah and its later version, the MMSE log-spectral amplitude (LSA) estimator [1], generally use noise-estimation methods. They model the spectral components of the speech signal and the noise process as independent Gaussian variables [12]. Many published papers mention that the performance of the Wiener filter and of MMSE-LSA is excellent in terms of both practical and mathematical evaluation [18].
The method presented here is a statistical-model-based method [1] named the minimum mean square error logarithmic spectral amplitude modified, or MMSE85_modified, estimator. The clean speech estimate is given [13] as
|\hat{X}(K)| = \frac{\xi(K)}{1 + \xi(K)} \, \exp\!\left( \frac{1}{2} \int_{V(K)}^{\infty} \frac{e^{-t}}{t} \, dt \right) |Y(K)| \qquad (5)
The decision-directed rule for frame t for this method is also written as Eq. 4.
Proposed Modification of the a Priori SNR: in the equation given by MMSE-STSA-LSA [6], the choice of η is critical. Usually, for every method the value obtained is close to 1. It can be seen that if the value of the smoothing factor remains near 1, the synthetic (artifact) noise will be small, but more "transient distortion" occurs in the output speech signal. To balance these two effects, the results reported in the literature almost always keep a constant value in the range 0.95–0.99, with a few exceptions. But using this constant has some drawbacks.
\eta(K)_t = \frac{1}{1 + \left( \dfrac{\xi(K)_t - \xi(K)_{t-1}}{\xi(K)_t + 1} \right)^{2}} \qquad (6)
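Under the reconstruction of Eq. 6 above, the adaptive smoothing factor can be sketched as a one-line function (an illustration, not the authors' code):

def adaptive_eta(xi_t, xi_prev):
    # Eq. 6: the smoothing factor stays close to 1 when xi changes slowly
    # between frames and drops when xi changes quickly.
    return 1.0 / (1.0 + ((xi_t - xi_prev) / (xi_t + 1.0)) ** 2)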
Fig. 1. .
Fig. 2. .
Table 2. Objective quality evaluation with street noise, restaurant noise and station noise
0 dB 5 dB 10 dB
Street noise
SSNR WSS LLR PESQ SSNR WSS LLR PESQ SSNR WSS LLR PESQ
MSSBoll79 −1.351 84.6089 1.0555 1.663 0.2108 68.916 0.8763 2.0632 2.1157 56.1325 0.6477 2.4959
WienerScalart96 −0.972 118.2388 1.3972 1.6009 0.2629 95.8322 1.1828 1.9547 1.7746 74.9996 0.8795 2.4198
SSMultibandKamath02 −3.0104 70.6299 0.9921 1.7235 −1.4438 58.6753 0.7972 2.0858 0.2581 47.3427 0.5989 2.4732
Restaurant noise
MSSBoll79 −2.1791 97.2494 0.9992 1.6782 −0.2867 79.0998 0.7826 2.113 1.8766 62.717 0.6157 2.4885
WienerScalart96 −2.016 122.7877 1.356 1.4891 −0.4357 98.5382 1.0725 1.9893 1.4381 76.1611 0.8663 2.4148
MMSESTSA85_modified −3.463 101.5653 1.0019 1.7814 −1.6211 84.1034 0.8287 2.0997 0.4594 66.8071 0.6716 2.4786
Station noise
MSSBoll79 −1.355 87.7056 1.0241 1.6836 0.747 71.2343 0.7916 2.1252 2.8388 59.5597 0.6034 2.553
WienerScalart96 −1.0319 120.2635 1.35 1.5471 0.5771 97.8194 1.0681 1.9713 2.5065 75.8354 0.8545 2.4995
MMSESTSA85_modified −2.9173 97.0670 1.0052 1.8553 97.8194 1.0681 0.7876 2.2213 0.8399 64.8758 0.6552 2.5271
For comparison, the graphical representation of the SSNR, LLR, WSS and PESQ results for all conditions is shown in the charts of Figs. 3, 4, 5 and 6 respectively. The results of all methods at the different noise conditions are compared. The SSNR value for the MSS and Wiener filters is high, but in some cases the SSNR is highest for the MMSE85_modified algorithm. The MMSE85_modified algorithm has less spectral distortion and gives the best WSS results, since WSS should be as small as possible, as shown in the charts. In some cases the Wiener and MSS methods have lower WSS, but they do not have a high SSNR compared with the MMSE STSA methods. From the LLR comparison, the MMSE STSA algorithms have values less than one for nearly all cases; ideally the LLR should be zero. A PESQ score above 2.5 is acceptable, and in this regard the MMSE STSA85 modified algorithm works satisfactorily.
5 Conclusions
The implementation and simulation of the STSA methods are straightforward and were done here in MATLAB. According to the results, a synthetic noise artifact is generated by the spectral subtraction methods. The MSS method is not as good as the Wiener filtering methods, as shown in the experiments. The trade-off between speech distortion and residual noise is not fully, but to an acceptable level, resolved by the MMSE-LSA_modified algorithm. In the proposed algorithm we make the smoothing constant adaptive, and as a result there is less transient distortion. In future work, other parameters can also be made adaptive for better results or further enhanced output.
References
1. Li J, Deng L, Haeb-Umbach R, Gong Y (2016) Robust automatic speech recognition,
Chapter 6.3: Sampling-based methods, ISBN: 978-0-12802398-3. Elsevier
2. Boll SF (1979) Suppression of acoustic noise in speech using spectral subtraction. IEEE
Trans. Acoust. Speech Signal Process. ASSP-27:113–120, April
3. Wang Y, Brookes M (2016) Speech enhancement using an MMSE spectral amplitude
estimator based on a modulation domain Kalman filter with a Gamma prior. In: Proc. IEEE
intl. conf. on Acoustics, Speech and Signal Processing (ICASSP), pp 5225–5229, March
4. Scalart P, Filho JV (1996) Speech enhancement based on a priori signal to noise ratio
estimation. In: Proc. IEEE international conference on Acoustics, Speech and Signal
Processing ICASSP 96, pp 629–632, May
5. Ephrahim Y, Malah D (1985) Speech enhancement using a minimum mean square error log
spectral amplitude estimator. IEEE trans. on Acoustics, Speech and Signal Processing, vol.
ASSP-33, no. 2, pp 443–445, April
6. Xie D, Zhang W (2014) Estimating speech spectral amplitude based on the Nakagami
approximation. IEEE Signal Process. Lett. 21(11):1375–1379
7. Doire CSJ (2016) Single-channel enhancement of speech corrupted by reverberation and
noise. Ph.D. dissertation, Imperial College London
8. Liang D, Hoffman MD, Mysore GJ (2015) Speech dereverberation using a learned speech
model. IEEE international conference on Acoustic, Speech and Signal Processing (ICASSP)
9. Wang Y, Brookes M (2016) Speech enhancement using an MMSE spectral amplitude
estimator based on a modulation domain Kalman filter with a Gamma prior. IEEE
international conference on Acoustics, Speech and Signal Processing (ICASSP)
10. Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and
stochastic optimization. J. Mach. Learn. Res. 12(Jul):2121–2159
11. Erkelens JS, Hendriks RC, Heusdens R, Jensen J (2007) Minimum mean-square error
estimation of discrete fourier coefficients with generalized gamma priors. IEEE Trans.
Speech Audio Process. 15(6):1741–1752
12. Navya Sri. Ramakrishna Murty MM, et al (2017) Robust features from speech by using
Gaussian mixture model classification. International conference and published proceeding in
SIST series, vol 2. Springer, pp 437–444, August
13. Wang Y, Brookes M (2018) IEEE/ACM transactions on audio, speech
and language processing, vol 26, no 3, March
14. Dionelis N, Brookes M (2017) 25th European Signal Processing Conference (EUSIPCO)
15. Brookes M (1998–2016) VOICEBOX: A speech processing toolbox for MATLAB. http://
www.ee.imperial.ac.uk/hp/staff/dmb/voicebox/voiceboxhtml
16. Wang Y, Narayanan A, Wang D (2014) On training targets for supervised speech separation.
IEEE/ACM trans. on audio, speech and language processing 22(12):1849–1858
17. Boulanger-Lewandowski N, Mysore GJ, Hoffman M (2014) Exploiting long-term temporal
dependencies in NMF using recurrent neural networks with application to source separation.
IEEE international conference on Acoustic, Speech and Signal Processing (ICASSP)
Optimal Unit Commitment for Thermal Power
Systems Using Penguin Search Algorithm
Abstract. Electricity networks deliver hundreds of gigawatt hours (GWh) to consumers every day. These interconnected systems have become hard to maintain and to keep in operation, so the need arises to generate and supply electricity in a smart manner that is expected to be highly economical. Unit Commitment is an important economic optimization problem that must be solved to obtain optimal cost savings over a specific period of time; it is based purely on determining the combination of available generators in an electrical industry that deals mainly with meeting the load demand of all the generators. A generating unit has various limitations such as minimum up-time, minimum down-time, minimum and maximum power generation limits, prohibited zones, etc. This paper presents a general approach to scheduling the generating units using the bio-inspired Penguin Search Optimization Algorithm (PeSOA). PeSOA is a recent metaheuristic based on the collaborative hunting strategy of penguins, specially designed for optimizing non-linear systems. A significant feature of PeSOA is that the distribution of penguins is balanced between local minima and the global minimum. The implementation was carried out for various types of systems, including problem instances of a 10-unit 24-hour system, a 3-unit 24-hour system, a 4-unit 8-hour system and a 4-unit 24-hour system, and each case is compared with results from the literature.
1 Introduction
Unit commitment plays an important role in the electricity network domain; optimizing the operational cost in unit commitment can lead to highly effective, more reliable and more economical operation of power plants. An important task before solving the UCP is to decide the number of time units of power generation, either for a day or for a week. After careful planning of the operation schedule of the generators for the chosen time units, the same style of allocating generators to produce electric power, and of fixing costs, can be applied for the whole year to achieve an economical annual operating plan for the power plants. The pattern of cost calculation is an important factor that must be considered in the operation of power plants, especially thermal and hydro power plants. The operating cost of all generators for each hour is calculated from production, and after including the cost involved in unit start-up, the overall operational cost is fixed for the given time unit, either a day or a week. It is necessary to schedule the resources of the power system to ensure proper functioning of the system and reliable delivery of power. In addition to achieving minimum cost, UC must satisfy a variety of operating constraints. The constraints that must be satisfied in the Unit Commitment problem are system power balance, minimum up-time and minimum down-time of each unit, power production limits, spinning reserve, etc. There are two variations of UC: (i) unit commitment for cost minimization, and (ii) unit commitment for energy-production profit maximization, commonly called profit-based UC (PBUC). UC plays an important role in the power system economic plan, scheduling the generators with fuel and cost savings as the goal. Many
algorithms were invented for the optimization of UC problem. Some of the popular
algorithms solved UC in the literature are mixed integer programming [5], lagrangian
relaxation [2], simulated annealing [4], fuzzy logic [8], particle swarm optimization [3],
cuckoo search algorithm [7], invasive weed optimization [1], and binary fireworks
algorithm [6].
The main motivation of this research work is to achieve minimum cost while satisfying various technical constraints. The work is based on 4 benchmark datasets of thermal power generators and is compared with UC results from the literature. The benchmark data include the number of units, the load, the maximum power, the minimum power and the cost-calculation constants considered. The minimum cost is calculated by implementing a swarm-based metaheuristic based on the interesting hunting behaviour of penguins, termed the Penguin Search Optimization Algorithm.
To minimize: Total Cost = Start-up cost + Fuel cost + Shut-down cost

\min \sum_{t=1}^{T} \sum_{i=1}^{N} \left\{ c_i(p_i)\, u(i,t) + su(i,t) + sd(i,t) \right\} \qquad (1)
where,
c(pi)u(i, t) – fuel cost of the production units
su(i, t) – cost incurred in unit start up
sd(i, t) – cost incurred in unit shut down
Thermal unit Fuel cost is expressed as a second order estimated function of its
output Pi [2].
where Pi is the power output of each generator i, and a0, a1, a2 are the cost coefficients used for calculating the production cost.
Start-up cost: the minimum cost needed to start a generator from the cold state.
Shut-down cost: the minimum cost needed to stop a generator which is in the ON state.
Generation Limits: each generating unit has its own minimum and maximum limit of power generation. This constraint is called the generation limits or power limits constraint and is given in Eq. 4:

P_i^{\min} \le P_i \le P_i^{\max} \qquad (4)

Minimum Up-Time and Down-Time: once committed, a unit must remain ON for at least its minimum up-time T_i^U, and once decommitted it must remain OFF for at least its minimum down-time T_i^D:

MT_i^{ON} \ge T_i^{U} \qquad (5)

MT_i^{OFF} \ge T_i^{D} \qquad (6)
Fuel Constraints: this constraint reflects the limited availability of fuel, or a requirement to burn a certain amount of fuel.
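A compact sketch of evaluating the total cost of Eq. 1 for a candidate ON/OFF schedule, assuming the quadratic fuel cost with coefficients a0, a1, a2 described above; the toy data, the zero shut-down cost and the start-up handling are illustrative assumptions, not the benchmark instances used later.

def total_cost(schedule, power, units, startup_cost):
    # Eq. 1: fuel cost of committed units plus start-up cost on OFF -> ON transitions
    # (shut-down cost is taken as zero here).
    # schedule[t][i]  : 1 if unit i is ON in hour t, else 0
    # power[t][i]     : dispatched output P_i (MW) of unit i in hour t
    # units[i]        : (a0, a1, a2) cost coefficients of unit i
    # startup_cost[i] : cost charged when unit i goes from OFF to ON
    cost = 0.0
    prev = [0] * len(units)  # all units assumed initially OFF (cold state)
    for u_t, p_t in zip(schedule, power):
        for i, (a0, a1, a2) in enumerate(units):
            if u_t[i]:
                cost += a0 + a1 * p_t[i] + a2 * p_t[i] ** 2  # quadratic fuel cost
                if not prev[i]:
                    cost += startup_cost[i]                  # start-up cost
        prev = u_t
    return cost

# Toy usage: two units over two hours.
print(total_cost(schedule=[[1, 0], [1, 1]],
                 power=[[300, 0], [250, 100]],
                 units=[(1000, 16.19, 0.00048), (700, 16.60, 0.002)],
                 startup_cost=[4500, 550]))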
The Penguin Search Optimization Algorithm is a swarm-based metaheuristic modelled on penguins and their interesting collaborative hunting behaviour. The primary equation of the penguin search optimization algorithm is formulated here for solving UC so that the electric power output of the generating units equals the stated load demand. Initially a random power output is selected, and based on that output the rest of the power outputs can be estimated. In each iteration the best power output is identified, the most economical schedule is selected as the optimal result, the operational cost is calculated from it, and cost savings are thereby achieved (Table 1).
Initially a random power output is selected for a generator and its cost is computed. The selected random power output is taken as Xid; then, using Xid and a rand() function, n costs are calculated for the given generators. From these, Xbest is selected on the basis of the minimum cost. The selected value is considered the cheapest (best minimum) cost for the particular iteration. Then, by substituting Xbest into the primary equation, Xnew is computed. Finally, Xbest provides the optimal cost value. This step-by-step procedure of calculating the cost is repeated over all iterations, and the final cost value is obtained.
Xid – initial random power output (in megawatts) for any power generator
rand() – a function used for generating random numbers
Xbest – the best power output of the current generation
Xnew – the final optimal power output
The algorithm chooses the best power to be scheduled for all the generators. A power output schedule that satisfies all the important technical constraints must be selected; that schedule is an economic dispatch of the available generating units. This implementation is performed on the 10-generator, 24-hour data set, so the power output schedule is of size 10 × 24 for the whole day. For the selected schedule the cost is then calculated for the 24 hours. After estimating the operational cost for all 24 hours, the given start-up costs are added and finally the total operating cost is computed.
Start-up cost is added for a particular generator when its initial status is the cold state and it needs to be brought from the cold state to the normal state, i.e. start-up cost is added whenever a generator is brought to the ON state. In other cases, start-up cost need not be included. Including the cost spent on shut-down activity is also part of the UC task; usually the shut-down cost is taken as 0. Finally, after fixing the optimal schedule of the electric power dispatch, the cost is calculated in parallel for all the given hours.
The implementation flow of PeSOA is depicted in Fig. 1: a step-by-step process of selecting optimal holes by sending multiple penguin groups for hunting. Finally, the optimal minimum cost is calculated by solving the primary constraints, including prohibited operating zones. According to the flow graph, a random power output is generated initially, and by applying the primary equation three different solutions are computed and updated as the result set of every iteration.
4 Results
Various UCP systems were taken for implementation in this study. The implementation was performed with the Penguin Search Optimization Algorithm (PeSOA). The implementation for solving the UCP is based on fixed data instances, including the number of power generators, the number of units of time (either a day or a week), the cost coefficients (a0, a1, a2), the initial status, the minimum down-time, the load limit for each power generator, the start-up cost for each generator, and the minimum up-time.
The number of units is simply the total number of power generators operated for a day or a week. The number of units of time is based on the given hours or days, e.g. 24 units of time if the problem is solved for a day, or 7 days as the units of time if it is solved on a weekly basis. For calculating the cost, the UCP depends primarily and entirely on the cost coefficients a0, a1, a2. These coefficients must be included every time the production cost is calculated for each power generator. If in any hour a particular power generator is turned ON from the OFF state, then for that hour the start-up cost is calculated and added to the production cost, and the final calculated cost is termed the operational cost. The aim of Unit Commitment is to minimize the operational cost of the power plant system.
It is also important to frame the schedule considering the minimum down-time and minimum up-time of every generator, so that each power generator can be used effectively. The minimum down-time and minimum up-time differ from one generator to another based on the capacity of each generator. It is therefore important to keep the minimum up-time and minimum down-time in mind before deciding which power generator is set to the ON state and which is set to the OFF state for a certain number of hours. This selection procedure is also termed the optimum dispatch of the power generators, and it is an important part of solving the UCP.
Pseudocode of PeSOA:
  Generate a random population of P solutions in groups;
  Initialize the probabilities of fish existence in the holes;
  For g = 1 to number of generations
    For each individual i in P do
      While the reserved oxygen is not depleted do
        - Choose a random hole
        - Improve the penguin position using Eq. (1)
        - Update the quantity of fish eaten by the penguin
      EndWhile
    EndFor
    - Update the quantity of eaten fish in holes, levels and the best group
    - Redistribute the probabilities of penguins in holes and levels (based on eaten fish)
    - Update the best solution
  EndFor
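The following is a simplified Python sketch of a penguin-search-style dispatch for a single hour, illustrating the population, oxygen-limited local search and best-solution update of the pseudocode above; the unit data, repair rule and update step are assumptions for illustration and not the implementation whose results are reported below.

import numpy as np

# Illustrative unit data (assumed, not the benchmark instances): Pmin, Pmax, a0, a1, a2.
units = np.array([
    [150.0, 455.0, 1000.0, 16.19, 0.00048],
    [150.0, 455.0,  970.0, 17.26, 0.00031],
    [ 20.0, 130.0,  700.0, 16.60, 0.00200],
])
demand = 850.0  # MW load for one hour

def fuel_cost(p):
    # Quadratic production cost a0 + a1*P + a2*P^2 summed over the units.
    a0, a1, a2 = units[:, 2], units[:, 3], units[:, 4]
    return float(np.sum(a0 + a1 * p + a2 * p ** 2))

def repair(p):
    # Clip to generation limits, then nudge the dispatch toward the power balance.
    p = np.clip(p, units[:, 0], units[:, 1])
    for _ in range(20):
        diff = demand - p.sum()
        if abs(diff) < 1e-6:
            break
        p = np.clip(p + diff / len(p), units[:, 0], units[:, 1])
    return p

def pesoa_dispatch(pop_size=20, generations=200, oxygen=5, seed=1):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(units[:, 0], units[:, 1], size=(pop_size, len(units)))
    pop = np.array([repair(x) for x in pop])
    best = min(pop, key=fuel_cost)
    for _ in range(generations):
        for i in range(pop_size):
            for _ in range(oxygen):  # limited "oxygen reserve" per dive
                step = rng.uniform(0, 1, len(units)) * (best - pop[i])
                cand = repair(pop[i] + step + rng.normal(0, 2, len(units)))
                if fuel_cost(cand) < fuel_cost(pop[i]):
                    pop[i] = cand    # the penguin keeps the better position
        best = min(min(pop, key=fuel_cost), best, key=fuel_cost)
    return best, fuel_cost(best)

dispatch, cost = pesoa_dispatch()
print("dispatch (MW):", np.round(dispatch, 1), " fuel cost ($/h):", round(cost, 2))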
Table 2. (continued)
Hour/Unit UN1 UN 2 UN 3 UN 4 UN 5 UN 6 UN 7 UN 8 UN 9 UN10
HR13 ON ON ON ON ON ON OFF OFF OFF ON
HR14 ON ON ON ON ON ON OFF OFF OFF OFF
HR15 ON ON ON ON OFF ON OFF OFF OFF OFF
HR16 ON ON ON OFF OFF OFF OFF ON OFF OFF
HR17 ON ON ON ON OFF OFF OFF OFF OFF OFF
HR18 ON ON ON ON OFF ON OFF OFF OFF OFF
HR19 ON ON ON ON OFF ON ON OFF OFF OFF
HR20 ON ON ON ON OFF ON ON ON ON ON
HR21 ON ON ON ON OFF OFF ON OFF OFF ON
HR22 ON ON ON ON OFF OFF ON OFF OFF OFF
HR23 ON ON OFF OFF OFF OFF OFF OFF OFF OFF
HR24 ON ON OFF OFF OFF OFF OFF OFF OFF OFF
Table 3. (continued)
Hour/Unit UN1 UN2 UN3 UN4 UN5 UN6 UN7 UN8 UN9 UN10 Total limit (MW) Fuel cost ($) Startup cost ($)
HR15 455 455 130 130 0 30 0 0 0 0 1200 24097 0
HR16 445 445 110 0 0 0 0 50 0 0 1050 21475 0
HR17 435 425 100 40 0 0 0 0 0 0 1000 20228 0
HR18 455 355 130 130 0 30 0 0 0 0 1100 22371 0
HR19 437 434 127 90 0 75 37 0 0 0 1200 25242 580
HR20 455 455 130 130 0 80 48 53 10 49 1400 31251 580
HR21 455 455 130 130 0 0 80 0 0 50 1300 27921 0
HR22 455 445 100 80 0 0 20 0 0 0 1100 22554 0
HR23 455 445 0 0 0 0 0 0 0 0 900 17125 0
HR24 450 350 0 0 0 0 0 0 0 0 800 15379 0
Based on the results obtained from the implementation of PeSOA on test data set 1, the total production cost obtained is 552573 and the total start-up cost obtained is 3870. The total cost for the 10-generator, 24-hour case is 556443. By comparing the results of PeSOA with those obtained with other implementations in the literature, it can be observed that the total operational cost is reduced considerably. This is justified by the comparison in Table 4.
The implementations of many algorithms for solving unit commitment using the same benchmark data instances as in case study 1 are compared with the proposed implementation using PeSOA. Researchers have been trying to reduce the operational cost of thermal power plants for many years. Various techniques have been implemented to solve UC, such as the genetic algorithm, particle swarm optimization, Lagrangian relaxation, the stochastic priority list, ant colony optimization, extended
[Comparison chart: hourly operational cost ($) versus time (hr1–hr23) for BRGA, ABC and PeSOA.]
The above comparison chart shows the operational cost obtained by the implementation of the Penguin search optimization algorithm for dataset 1, compared with the Artificial Bee Colony algorithm implementation [81] and the Binary Real coded Genetic Algorithm [82] on the same data instances. The cost for each hour in PeSOA is compared with the two other implementations, and a reasonable reduction in the computed fuel cost is found for some hours of the PeSOA implementation (Table 5).
In the above table, the implementation results of this paper are compared on the basis of the case studies carried out in this research. The costs obtained by implementing PeSOA are compared with works already reported in the literature for similar sets of benchmark data. From the table it is clear that PeSOA obtains a reasonably low cost.
5 Conclusion
In this research work, the operation of the generators of a thermal power plant is economically scheduled by the proposed implementation of the penguin search optimization algorithm for unit commitment optimization. The technique was used to fix the operational cost of the committed generators for each given time unit. Different test cases were evaluated and compared with the results of dynamic programming, particle swarm optimization and fuzzy logic found in the literature. The proposed implementation was successfully demonstrated on test systems ranging from a 10-unit and a 3-unit 24-hour system to a 4-unit 8-hour system. The test cases were taken from various works in the literature, and the results obtained were optimal. The optimal results were compared with many implementations of solving the UCP reported over several years in the literature. It is clear from the results obtained on the benchmark tests that PeSOA performs best in solving the UCP, and the results of the proposed approach compare favourably with other methods. An attempt can be made in future to improve PeSOA in other respects for other problems.
Privacy Preserving Location Monitoring
System Using Score Based K-Anonymity
1 Introduction
With the growing demand for Location Based Services (LBS), analyzing user location
data has become increasingly important, since this data includes sensitive information.
Hiding sensitive user data is therefore an emerging concern in networking [1], and
providing security and privacy to users of LBS services is treated as a major issue.
When a user requests a service through a trusted third party, the user, the Trusted
Third Party Server and the Location Based Server must all be validated using proper
security measures. The physical location of the user, when submitted to the Location
Provider, has to be protected from malicious users, while users must still be able to
obtain their own location information from the Location Provider. The dynamic data
generated by location-aware devices is collected repeatedly by a trusted third party
server, which poses a serious threat to user privacy.
The privacy of user location information, together with the associated query
information, must be addressed whenever the data is shared. The study of this leakage
of sensitive user information is known as Location Privacy.
Considerable research has been carried out on location privacy, where user anonymity
is the central concern. Spatial cloaking combined with k-anonymity has been widely
used to achieve privacy: user locations are distorted by creating cloaked regions [5].
This technique relies on location cloaking algorithms to generate cloaked regions, and
several location anonymization algorithms exist for this purpose. However, other
parameters must be considered alongside privacy, such as overhead cost, query
response time and accuracy of the result [2]. The main aim of this paper is therefore to
minimize the size of the cloaked region while reducing the transmission overhead.
2 Related Work
Several methods have been proposed for preserving privacy in Location Based
Services. Among them, anonymization is the most popular technique: performing
activities without revealing identity, that is, hiding the user identity. A traditional
anonymization technique for protecting privacy, proposed by Sweeney and known as
k-anonymity, uses generalization and suppression [8]. Under k-anonymity, a user's
location information is made indistinguishable from that of at least k-1 other users by
defining a set of quasi-identifiers. To achieve privacy using k-anonymity, a trusted
third party called a Location Anonymizer is incorporated [11].
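As a rough illustration of the cloaking idea behind k-anonymity, the sketch below (a generic construction, not the specific anonymizer of the cited works) forms a cloaked region as the bounding box covering a querying user and its k-1 nearest neighbours.

```python
import math

def cloaked_region(users, query_idx, k):
    """Bounding box covering the query user and its k-1 nearest users.

    users: list of (x, y) positions known to the trusted anonymizer.
    query_idx: index of the user issuing the query.
    k: desired anonymity level.
    """
    qx, qy = users[query_idx]
    # Sort all users by distance to the querying user and keep the closest k
    # (the query user itself plus k-1 neighbours).
    nearest = sorted(users, key=lambda p: math.hypot(p[0] - qx, p[1] - qy))[:k]
    xs = [p[0] for p in nearest]
    ys = [p[1] for p in nearest]
    return (min(xs), min(ys), max(xs), max(ys))  # cloaked rectangle

users = [(1, 1), (2, 3), (5, 4), (6, 1), (3, 2), (8, 8)]
print(cloaked_region(users, query_idx=0, k=3))
```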
K-Nearest Neighbours is an anonymization technique that partitions user locations into
spatial cloaks of arbitrary polygonal shape. Another approach selects dummy locations
using entropy to obtain the k-1 additional users [4, 9]. Gruteser and Grunwald
proposed a quadtree-based spatial and temporal cloaking algorithm that reduces the
risk of location disclosure by defining a set of quasi-identifiers for users of Location
Based Services [6]. By generating k queries from k different users, it becomes difficult
to single out a specific record from a set of k records and identify the actual user.
A higher value of k yields higher anonymity and keeps the user safer, but the
disadvantage is that increasing the anonymity level reduces query accuracy and hence
the Quality of Service [3]. Among all location anonymization techniques, the most
widely used is spatial cloaking. Several location anonymization algorithms exist, and
their purpose is to hide the actual location of the user by sending information about the
cloaked region (CR) instead of the exact location of the user.
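The entropy criterion used for dummy selection in [4, 9] is commonly stated as follows (this is the standard definition, not a formula quoted from those papers):

$$H = -\sum_{i=1}^{k} p_i \log_2 p_i,$$

where $p_i$ is the probability that the $i$-th candidate location (the real location or a dummy) is the true user position; dummies are chosen so that $H$ approaches its maximum value $\log_2 k$, which makes the k locations nearly indistinguishable.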
A Location Anonymizer is available in which the size of the cloaked region is
calculated from the minimum and maximum distances between the user nodes within
that region. Doohee Song et al. proposed an Anonymized Motion Vectors method that
uses distance vectors to construct smaller cloaked regions and thereby reduce query
processing time [12]. They later proposed an Adaptive Fixed K-Anonymity method
that calculates all path movements of the user nodes to minimize query processing
time [1]. However, the cost overhead is high and the accuracy of the query results is
not guaranteed. All the factors affecting network performance also depend on the
capability of the devices, such as limited power, storage and connectivity among the
users.
The proposed location-based k-anonymity technique addresses the accuracy of the
location information by generating minimized cloaked regions based on a score
function. When two different paths yield the same distance among the nodes, the
better path is obtained by calculating the score value of each. To determine the best
cloaked region among the user nodes, a score function is computed as follows.
Here, E is the residual energy, C is the node connectivity and I is the node identifier;
wi1, wi2 and wi3 are the weights of the node at the ith iteration. The best score value
after successive iterations is taken as the optimum, which identifies the nearest node to
which the message is transmitted. Hence the location-based k-anonymity technique
computes the best cloaked region using a score function calculated from the energy
level of the nodes, the connectivity among the nodes and the node identifier value.
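The score function itself does not survive in this text. Based on the variables defined above, one plausible reading (an assumption, not the paper's exact formula) is a weighted combination of the three quantities:

$$S_i = w_{i1}\,E + w_{i2}\,C + w_{i3}\,I,$$

with E, C, I and the weights as described above.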
Thus the proposed approach is given in the following steps (Fig. 1):
4 Experimental Analysis
In this section, we evaluate the proposed algorithm for anonymity values ranging from
k = 2 to k = 10 (the maximum considered) to achieve privacy. The simulation is
implemented in MATLAB to test its efficiency. The efficiency of the system is
evaluated using the following parameters: transmission overhead, size of the cloaked
region and query accuracy.
Figure 2 shows the variation of transmission overhead of the proposed technique in
comparison with Adaptive Fixed k-Anonymity, Anonymized Motion Vector
(AMV) and Minimum Cycle Region (MCR). The proposed work shows a reduction in
the overhead, i.e., the time taken to transmit the data across the network.
Figure 3 shows the generation of cloaked regions with respect to the anonymity level;
the graph compares the proposed work with the existing techniques.
With the inclusion of the score function in the proposed work, the size of the
cloaked region is minimized. There is also an improvement in the accuracy of the
query information when compared with the previous techniques, as shown in Fig. 4.
5 Conclusion
References
1. Song D, Park K (2016) A privacy-preserving location based system for continuous spatial
queries. Mob Inf Syst 2016:1–9. https://doi.org/10.1155/2016/6182769
2. Zhang W, Song B, Bai E (2016) A trusted real time scheduling model for wireless sensor
networks. J Sens 1:9. https://doi.org/10.1155/2016/8958170
3. Wang D, Cheng H, Wang P (2016) On the challenges in designing identity-based privacy-
preserving authentication schemes for mobile devices. IEEE Syst J 12(1):916–925
4. Niu B, Li Q, Zhu X, Cao G, Li H (2014) Achieving k-anonymity in privacy-aware location-
based services. In: Proceedings in IEEE INFOCOM. https://doi.org/10.1109/infocom.2014.
6848002
5. Priya Iyer KB, Shanthi V (2013) Study on privacy aware location based service. J Sci Ind
Res 72:294–299
6. Wang Y, Xu D, He X, Zhang C, Li F, Xu B (2012) L2P2: location-aware location privacy
protection for location-based services. In: Proceedings in IEEE INFOCOM
7. Tyagi AK, Sreenath N (2015) A comparative study on privacy preserving techniques for
location based services. Br J Math Comput Sci 10(4):1–25. ISSN: 2231-0851
8. El Emam K, Dankar FK (2008) Protecting privacy using k-Anonymity. J Am Med Inf Assoc
15(5):627–637
9. Kido H, Yanagisawa Y, Satoh T (2005) Protection of location privacy using dummies for
location-based services. In: ICDEW ‘05, proceedings of the 21st international conference on
data engineering workshops, April 05–08, pp 12–48
10. Pan X, Jianliang X, Meng X (2012) Protecting location privacy against location-dependent
attacks in mobile services. J IEEE Trans Knowl Data Eng 24(8):1506–1519
11. Vu K, Zheng R, Gao J (2012) Efficient algorithms for K-anonymous location privacy in
participatory sensing. In: Proceedings in IEEE INFOCOM
12. Song D, Sim J, Park K, Song M (2015) A privacy-preserving continuous location monitoring
system for location-based services. Int J Distrib Sens Netw 11(8):815613. https://doi.org/10.
1155/2015/815613
(2, 3) - Remoteness Association of Corridor
Network Pn Under IBEDE and SIBEDE Loom
Abstract. This article deals with Incident Binary Equivalent Decimal Edge
(IBEDE) graceful labeling and Strong Incident Binary Equivalent Decimal Edge
(SIBEDE) graceful labeling of the (2,3)-remoteness network of a path network.
These approaches find applications in complex network traffic, for example in
identifying likely types of traffic jams.
1 Introduction
Network labeling [1, 2, 9] plays a vital role in network-theory research and has a wide
range of applications in coding theory, communication networks, mobile and
telecommunication systems, optimal circuit layout, network decomposition problems
and resolving ambiguities in X-ray crystallographic analysis.
such that the edges are labeled with the values obtained from binary equivalent
decimal coding. Equivalently, $f(e_k = ij) = 2^{n-i-1} + 2^{n-j-1}$, where
$k \in \{1, 2, 3, \ldots, q\}$, $i, j$ are the integer labels of the vertices incident to edge $e_k$,
and $n$ is the number of vertices in G.
2.1.2 Definition
Let G(V(G), E(G)) be a connected network. A network G′ is a (p,q)-remoteness
network of G [4] if V(G) = V(G′), and for v, w ∈ V(G), v and w are adjacent in G′ if
d(v,w) = p or q.
2.1.3 Example
2.1.4 Example
2.1.5 Theorem
Every (2,3)-remoteness network of Pn is an IBEDE graceful labeling network if n > 4,
where n is the number of nodes.
Proof:
Let the vertices of the (2,3)-remoteness network of Pn be v1, v2, ..., vn.
The vertices of the (2,3)-remoteness network of Pn are labeled as follows. A bijective
mapping of the vertex set is f : V(Pn) → {0, 1, 2, ..., (n−1)} with
f(v1) = 0.
The vertices of the (2,3)-remoteness network of Pn are thus labeled with distinct
integers from 0 to n−1.
Now we define an induced edge function f : E(Pn) → {1, 2, ..., m} (where m is finite).
The binary equivalent decimal coding obtained from the incident vertices is used to
label the edges of the (2,3)-remoteness network of Pn. Equivalently,
$f(e_k = ij) = 2^{n-i-1} + 2^{n-j-1}$, where $k \in \{1, 2, 3, \ldots, (2n-5)\}$ and $i, j$ are the
integer labels of the vertices incident to $e_k$.
This vertex labeling of the (2,3)-remoteness network of Pn induces an edge labeling
whose labels are distinct.
⇒ every (2,3)-remoteness network of Pn is an IBEDE graceful labeling network if
n > 4, where n is the number of nodes.
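A small Python sketch of the construction above, under two stated assumptions: the vertex labels follow the natural assignment f(v_i) = i−1 (the extracted text only shows f(v_1) = 0), and the edge formula is the reconstruction $2^{n-i-1} + 2^{n-j-1}$ given earlier.

```python
def remoteness_edges(n, p=2, q=3):
    """Edges of the (p,q)-remoteness network of the path P_n.

    Vertices carry labels 0..n-1 (assumed: v_1 -> 0, v_2 -> 1, ...); two
    vertices are adjacent when their distance on the path equals p or q.
    """
    return [(i, j) for i in range(n) for j in range(i + 1, n) if j - i in (p, q)]

def ibede_label(i, j, n):
    # Assumed binary-equivalent decimal coding: an n-bit word with 1s at the
    # two incident vertex labels i and j, read as a decimal number.
    return 2 ** (n - i - 1) + 2 ** (n - j - 1)

n = 6
edges = remoteness_edges(n)
labels = [ibede_label(i, j, n) for i, j in edges]
print(edges)                                     # 2n - 5 = 7 edges for n = 6
print(labels, len(set(labels)) == len(labels))   # edge labels are pairwise distinct
```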
2.2 Definition
A network G is said to have Strong Incident Binary Equivalent Decimal Edge graceful
labeling (SIBEDE) [5–8] if the vertices of G are labeled with distinct integers from
{0, 1, 2, ..., (n−1)} such that the labels induced on the edges by binary equivalent
decimal coding are distinct from the vertex labels.
2.2.1 Example
2.2.2 Example
2.2.3 Theorem
Every (2,3)-remoteness network of Pn is a SIBEDE graceful labeling network if n > 4,
where n is the number of nodes.
Proof:
Let the vertices of the (2,3)-remoteness network of Pn be v1, v2, ..., vn.
The vertices of the (2,3)-remoteness network of Pn are labeled as follows. A bijective
mapping of the vertex set is f : V(Pn) → {0, 1, 2, ..., (n−1)}.
Case (i): when n ≢ 0 (mod 2),
f(v1) = (n−1),
f(vn) = (n−2).
Case (ii): when n ≡ 0 (mod 2),
f(v1) = (n−2),
f(vn−1) = (n−3),
f(vn) = (n−1).
Now the vertices of the (2,3)-remoteness network of Pn are labeled with distinct
integers from 0 to n−1.
We define the induced edge function f : E(Pn) → {1, 2, ..., m} (where m is finite).
The binary equivalent decimal coding obtained from the incident vertices is used to
label the edges of the (2,3)-remoteness network of Pn. Equivalently,
$f(e_k = ij) = 2^{n-i-1} + 2^{n-j-1}$, where $k \in \{1, 2, 3, \ldots, (2n-5)\}$ and $i, j$ are the
integer labels of the vertices incident to $e_k$.
This vertex labeling of the (2,3)-remoteness network of Pn induces an edge labeling in
which both labelings are distinct.
Therefore every (2,3)-remoteness network of Pn is a SIBEDE graceful labeling
network if n > 4, where n is the number of nodes.
3 Observation
4 Conclusion
In this paper, the (2,3)-remoteness network of the path network Pn is proved, with
examples, to be an Incident Binary Equivalent Decimal Edge graceful labeling network
and a Strong Incident Binary Equivalent Decimal Edge graceful labeling network if
n > 4, where n is the number of nodes.
References
1. Gallian JA (2015) A dynamic survey of network labeling. Electron J Comb
2. Bondy JA, Murty USR (1976) Graph theory with applications. Macmillan, London
3. Rajeswari V, Thiagarajan K (2016) Study on binary equivalent decimal edge graceful labeling.
Indian J Sci Technol 9(S1), December. https://doi.org/10.17485/ijst/2016/v9iS1/108356.
ISSN (Print): 0974-6846, ISSN (Online): 0974-5645
4. Thiagarajan K, Satheesh Babu R, Saranya K (2015) Construction of network and (i,j)-distance
graph. J Appl Sci Res, September. ISSN: 1819-544X, EISSN: 1816-157X
5. Rajeswari V, Thiagarajan K (2018) Study on strong binary equivalent decimal edge graceful
labeling. Int J Pure Appl Math 119(10):1021–1030, special issue. ISSN: 1311-8080 (printed
version), ISSN: 1314-3395 (on-line version)
6. Rajeswari V, Thiagarajan K (2017) SIBEDE approach for total graph of path and cycle graphs.
Middle-East J Sci Res 25(7):1553–1558. ISSN 1990-9233, IDOSI Publications.
https://doi.org/10.5829/idosi.mejsr.2017.1553.1558
7. Rajeswari V, Thiagarajan K (2018) SIBEDE approach for total graph of path and cycle graphs.
Int J Pure Appl Math 119(10):1013–1020, special issue. ISSN: 1311-8080 (printed version),
ISSN: 1314-3395 (on-line version). http://www.ijpam.eu
8. Rajeswari V, Thiagarajan K (2018) Graceful labeling of wheel graph and middle graph of
wheel graph under IBEDE and SIBEDE approach. J Phys Conf Ser 1000:012078.
https://doi.org/10.1088/1742-6596/1000/1/012078
9. Harary F. Graph theory. Narosa Publishing House Pvt Ltd. ISBN 978-81-85015-55-2
Cultivar Prediction of Target Consumer Class
Using Feature Selection with Machine
Learning Classification
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology,
Chennai, India
[email protected]
1 Introduction
In machine learning classification problems, the final classification results are based on
a number of input features. Since many of the features are correlated, they may be
redundant. Redundant features increase the storage space and the computation time.
Moreover, when the number of features is high, it is hard to visualize the data prior to
analysis. This establishes the need for dimensionality reduction algorithms. The
classification performance of machine learning algorithms depends on various factors.
The independent variables in the dataset are called features; if their number is high, it
is difficult to visualize the training set, and many of them may be redundant and
correlated, which again necessitates dimensionality reduction.
The paper is organized as follows: Sect. 2 deals with existing work, Sect. 3 discusses
dimensionality reduction, Sect. 4 presents the system architecture, Sect. 5 covers the
implementation and performance analysis, and Sect. 6 concludes the paper.
2 Related Work
3 Dimensionality Reduction
4 Proposed Work
In our proposed work, machine learning algorithms are used to predict the customer
cultivar of wine access. Our contribution in this paper is twofold.
(i) First, dimensionality reduction is performed using three feature selection methods,
which yields a reduced set of components for predicting the dependent variable
cultivar.
(ii) Second, the prediction of the customer class is carried out with various classifiers
so that their accuracy can be compared (a sketch of this two-step pipeline is given
below).
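A compact, hedged sketch of the two-step pipeline using scikit-learn's bundled Wine data; the feature-selection settings and classifier configurations are illustrative and do not reproduce the exact setup of the paper.

```python
# Hedged sketch of the two-step pipeline: feature selection, then classification.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: feature selection via random-forest importances.
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0))
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

# Step 2: train a classifier on the reduced feature set and inspect its accuracy.
clf = SVC(kernel="rbf").fit(X_tr_sel, y_tr)
pred = clf.predict(X_te_sel)
print(confusion_matrix(y_te, pred))
print("accuracy:", accuracy_score(y_te, pred))
```

Forward selection or backward elimination can be substituted for the random-forest selector in step 1 without changing the rest of the pipeline.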
Fig. 2. Selected variables from (a) Forward selection (b) Random forest
The optimized Wine dataset {Ash alcanity, Phenols, Flavanoids, NonFlavanoids, Hue,
OD280, Proline} obtained from feature selection is run with 7 classifiers, and the
resulting confusion matrix is shown in Fig. 4. The optimized Wine dataset {Flavanoids,
OD280, Color, Proline, Alcohol, Magnesium, NonFlavanoids, Ash alcanity, Malic
Acid} obtained from the random forest is run with the same 7 classifiers, and the
accuracy details can be read from the confusion matrix shown in Fig. 5.
The performance metrics Precision, Recall, F-Score and Accuracy for each of the
feature selection methods are shown in Tables 3, 4 and 5. The performance of the
different classifiers is assessed and compared using these metrics, and the details are
shown in Fig. 6.
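For reference, these metrics are computed in the usual way from the confusion-matrix counts (TP, FP, FN, TN):

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN},$$
$$F\text{-Score} = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$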
Fig. 6. Performance metric analysis (a) Precision (b) Recall (c) F-Score (d) Accuracy of feature
selection for various classifiers
6 Conclusion
This paper proposes to predict the customer cultivar for the Wine dataset, which
decreases the manual computation time and thereby increases accuracy. Dimensionality
reduction is implemented for the Wine dataset using feature selection methods, namely
forward feature selection, backward elimination and random forest projection. The
optimized dataset obtained from each of these feature selection methods is trained with
different classifiers, namely Logistic Regression, Random Forest, KNN, SVM, Naive
Bayes, Decision Tree and Kernel SVM, and the accuracy is read from the confusion
matrix generated after predicting the cultivar from the test dataset. The experimental
results show that a maximum accuracy of 97.2% is obtained for random forest
projection with the SVM, Decision Tree and Random Forest classifiers.
Prediction of Customer Attrition Using Feature
Extraction Techniques and Its Performance
Assessment Through Dissimilar Classifiers
1 Introduction
In machine learning classification problems, the final classification results are based on
a number of input features. Since many of the features are correlated, they may be
redundant. Redundant features increase the storage space and the computation time.
Moreover, when the number of features is high, it is hard to visualize the data prior to
analysis. This establishes the need for dimensionality reduction algorithms.
Dimensionality reduction is the process of minimizing the number of dimensions of the
data while preserving the information.
The generation of datasets has grown exponentially in recent years. For example, in the
biological domain, standard microarray datasets have more than a thousand variables
per instance. An explosion of variables is also found in image processing, time series
analysis, automatic text analysis and internet search engines. Statistical and machine
learning algorithms used in these domains face challenges in handling high-dimensional
data.
This paper discusses four feature extraction techniques in detail, PCA, SVD, ICA and
FA, and compares their performance using different classifiers. Section 2 surveys
applications of these techniques in various domains, followed by the details of the
feature extraction techniques in Sect. 3. The experimental setup and results are
discussed in Sect. 4, and the research findings are concluded in Sect. 5.
2 Related Work
(i) Observed data $x_i^{(t)}$ can be modeled using source variables $s_j^{(t)}$:
$$x_i^{(t)} = \sum_j u_{ij}\, s_j^{(t)}, \qquad i = 1, 2, \ldots, n \qquad (1)$$
$$A = U D V^{T} \qquad (3)$$
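A brief sketch of both decompositions on a synthetic data matrix (FastICA from scikit-learn and NumPy's SVD); the dimensions and parameters are arbitrary illustrations, not the churn-dataset settings of the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S_true = rng.laplace(size=(200, 3))          # non-Gaussian source signals
A_mix = rng.normal(size=(3, 12))
X = S_true @ A_mix                            # observed mixtures, as in Eq. (1)

# ICA: recover 3 statistically independent components from the mixtures.
ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X)                  # estimated sources, shape (200, 3)

# SVD: A = U D V^T, as in Eq. (3); keep the top-3 right singular vectors
# as a reduced basis and project the data onto them.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
X_reduced = X @ Vt[:3].T
print(S_est.shape, X_reduced.shape)
```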
(iii) Arrange the eigenvalues in non-increasing order and select the 'p' eigenvectors
that correspond to the p largest eigenvalues, where p < m.
(iv) The projection matrix W is constructed from the 'p' eigenvectors.
(v) The original dataset X is transformed with W to obtain the new feature subspace Y.
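The listed steps correspond to the standard eigen-decomposition route to PCA; a minimal NumPy sketch (array sizes chosen arbitrarily for illustration):

```python
import numpy as np

def pca_project(X, p):
    """Project X onto its top-p principal components (steps (iii)-(v) above)."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # covariance of the m features
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigen-decomposition (ascending order)
    order = np.argsort(eigvals)[::-1][:p]      # indices of the p largest eigenvalues
    W = eigvecs[:, order]                      # projection matrix from p eigenvectors
    return Xc @ W                              # new feature subspace Y

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
print(pca_project(X, p=3).shape)               # (100, 3)
```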
The performance metrics Precision, Recall, F-Score and Accuracy for each of the
feature extraction methods are shown in Tables 1, 2, 3 and 4.
6 Conclusion
This paper explores feature extraction techniques for dimensionality reduction. The
procedures in applying these techniques are discussed in detail. The performance of
these techniques is tested using customer churn dataset. The dataset has 12 independent
variables and these are transformed into 3 components using FA, ICA, SVD and PCA
independently. These reduced components of each feature extraction technique are fed
to six different classifiers to evaluate their performance. Performance metrics such as
Precision, Recall, F-Score and Accuracy are used for final assessment. Results show
that dimensionality reduction using PCA performs well compared to other techniques
for the implemented dataset. The performance of PCA on test dataset for every
classifier is visualized using scatterplots. The above work exhibits the retention rate of
customers in a bank. The above technique can be extended for other applications in
different domains.
An Appraisal on Assortment of Optimization
Factors in Cloud Computing Environment
1 Introduction
The term Cloud, with the Internet as its metaphor, shifts computing from individual
application servers to a cloud of computers or resources [1]. In a cloud environment,
resources are geographically distributed and linked through a network, creating a
natural cloud with a vast amount of computing capacity for solving complex problems.
Since much of the world's work is processed through the Internet, the 'Cloud' in Cloud
Computing refers to the Internet; in other words, it is computing through the Internet.
Cloud computing has emerged to satisfy high-performance computing needs with a
large number of varied computing resources. "A Cloud is a computational environment,
a type of distributed system: a collection of interconnected, virtualized computers that
are dynamically provisioned and presented as one or more unified computing resources
based on service level agreements established through negotiation between the service
provider and consumers" [2]. Data stored on an ordinary desktop machine or in a local
network cannot be accessed from anywhere in the world and therefore cannot provide
timely solutions; storing data in the Cloud yields meaningful, efficient and timely
responses. Users of the Cloud can avail themselves of resources on a pay-per-use basis,
with services matched to their demands and needs.
The National Institute of Standards and Technology (NIST) gives the formal
specification of Cloud Computing as: "Cloud can be termed as a type of computing
architecture which enables on-demand and convenient services over an assortment of
resources by providing computing capabilities like storage, processing servers, network
connection and applications, which can be provisioned and released rapidly with
minimal effort and interaction with the service provider" [3].
2 Cloud Architecture
3.1 Scheduling
Scheduling is a major activity in a cloud computing environment. The scheduler is the
component that receives user requests and resources, and its main task is to allocate
jobs to the appropriate resources. Job scheduling takes place at several layers, such as
the application, virtualization and infrastructure (platform) layers. Application-layer
scheduling takes place on the user side; since the premise of the cloud is access
anywhere and anytime, scheduling becomes necessary to process the jobs of a huge
number of users. The next layer of scheduling is virtualization, which focuses on
mapping virtual resources onto physical resources; scheduling at this layer is needed to
map requests effectively. The final level of scheduling is at the infrastructure layer,
which is concerned with optimal and strategic use of the infrastructure.
3.3 Energy
The next key issue in cloud computing is energy consumption. Cloud computing uses a
large number of hosts with high-specification computers, so the energy consumption of
even a single data center is high. Enterprises spend about $40 billion annually on
energy [8], which includes energy wasted while resources are idle. It is therefore
essential to improve load balancing, scheduling and resource utilization in order to
reduce operational cost and energy usage.
The studies surveyed are summarized below (table continued from the previous page; each
entry gives authors, year, objective, technique used, experimental environment, type of
data, comparison algorithms and outcome):

9. Thiago A. L. Genez et al. [17], 2015. Objective: scientific workflow scheduling in a public cloud. Technique: proposed PSO-based procedure. Environment: developed simulator. Data: Cybershake, Sphit, Ligo. Compared with: naive approach. Outcome: reduces workflow makespan.
10. Atul Vikas Lakra et al. [18], 2015. Objective: task scheduling. Technique: multi-objective task scheduling. Environment: CloudSim 3.0.2. Data: author-generated (workload 1–6, VMs 3–10, tasks 20–100). Compared with: FCFS, priority scheduling. Outcome: minimum overall execution time.
11. Nindhi Bansal et al. [19], 2015. Objective: task scheduling. Technique: proposed quality-of-service-driven scheduling. Environment: CloudSim 3.0, host configuration: bandwidth 10000, RAM 16384, storage 1000000. Data: VM size 10000, task length 40000, file size 300. Compared with: traditional FCFS. Outcome: algorithm gives good performance and resource utilization.
12. Awad A. I. et al. [20], 2015. Objective: task scheduling; search for an optimal and predictive scheduler. Technique: mathematical model used with particle swarm optimization for load balancing. Environment: CloudSim. Data: 6 data centers, 3–6 hosts, 50 VMs, 1000–2000 tasks. Compared with: LCFP. Outcome: model provides high makespan and saves transmission cost and round-trip time.
13. Stelios Sotiriadis et al. [21], 2015. Objective: optimization of IaaS performance. Technique: inter-cloud meta-scheduling. Environment: Simulating the Inter-Cloud (SimIC). Data: different cases. Outcome: makespan and turnaround time are improved.
14. Piotr Bryk et al. [22], 2016. Objective: data-intensive workflows; IaaS cloud scheduling. Technique: workflow-aware Dynamic Provisioning Dynamic Scheduling (DPDS). Environment: Cloud Workflow Simulator. Data: Cybershake, Sphit, Montage, Epigenomics. Compared with: Static Provisioning Static Algorithm (SPSA). Outcome: reduces file transfer during execution.
15. Xiao-Lan Xie et al. [23], 2016. Objective: trust in scheduling. Technique: proposed particle swarm simulated annealing. Environment: CloudSim, CPU core 2.50 GHz, HD 500 GB, memory 6 GB. Data: 150 virtual nodes, 100–600 tasks. Compared with: genetic algorithm, TD Min-Min. Outcome: reduces total task completion time.
16. Mohammed Abdullahi et al. [24], 2016. Objective: task scheduling in cloud. Technique: symbiotic organism search proposed in a discrete fashion. Environment: CloudSim. Data: instances of 100, 500, 1000. Compared with: PSO, SA-PSO. Outcome: minimum makespan.
17. Weiwei Kong et al. [25], 2016. Objective: resource scheduling. Technique: dynamic VM resource allocation and supply based on auction. Environment: CloudSim. Data: CPU 2.4 GHz, memory 4 GB, hard disk 512 MB, 40000 VMs, 20000 clients. Compared with: fixed-price VM resource allocation, VM resource dynamic scheduling. Outcome: proposed algorithm effectively enhances quality of service.
18. Woo-Chan Kim et al. [26], 2016. Objective: cost optimization. Technique: minimizing the price of IaaS service. Environment: author-developed simulation tool. Data: 60-month instances with six classes of data. Compared with: optimal, basement. Outcome: cost saving.
19. Suvendu Chandan Nayak et al. [27], 2016. Objective: lease scheduling. Technique: uses the Analytical Hierarchy Process (AHP). Environment: MATLAB R2010. Data: experiments 1–10, VMs 4, 6, 8, number of leases 5–50. Compared with: backfilling algorithm, backfilling algorithm using AHP. Outcome: minimizes lease rejection.
20. Shafi Muhammed Abdulhamid et al. [28], 2016. Objective: fault-tolerance-aware scheduling. Technique: Dynamic Clustering League Championship Algorithm (DCLCA) scheduling. Environment: CloudSim 3.0.3 on Eclipse IDE. Data: first scenario 5 brokers, 2 data centers and 10 VMs; second scenario 10 users, 10 brokers, 5 data centers, 25 VMs. Compared with: MTCT, MAXMIN, ant colony optimization, genetic algorithm. Outcome: produces lower makespan.
21. Isreal Cases et al. [29], 2017. Objective: workflow, task allocation and scheduling. Technique: balanced scheduling algorithm with file reuse and replication (BaRRS). Environment: VMware ESXi based private cloud. Data: Montage, Cybershake, Epigenomics, Ligo. Compared with: Provenance. Outcome: BaRRS shows superior performance.
22. Hancong Duan et al. [30], 2017. Objective: optimizing power consumption. Technique: proposed PreAntPolicy. Environment: author simulator, a light and powerful simulator referencing CloudSim. Data: production compute cluster of Google. Compared with: MM, FF, RR. Outcome: exhibits excellent energy efficiency and resource utilization.
23. Weihong Chen et al. [31], 2017. Objective: satisfying a budget constraint. Technique: minimizing scheduling level using budget constraint (MSLBL). Environment: simulator in Java. Data: 77, 299, 665, 1175, 1829 and 2627 tasks. Compared with: HBCS, DBCS. Outcome: obtains shorter schedule length under budget constraints.
24. Ladislau Boloni et al. [32], 2017. Objective: scheduling cloud resources. Technique: proposed computation scheduling (VOL). Environment: Amazon EC2. Compared with: no-cost approach, data center, cloud computing. Outcome: VOL outperforms the other approaches.
25. Wei Zhu et al. [33], 2017. Objective: energy saving, virtual resource scheduling. Technique: virtual resource scheduling through three dimensions (TVRSM). Environment: simulation experiment based on CloudSim. Data: VMs S1–S4, CPU 500–2500, RAM 613, 1740, 870, bandwidth 100 Mbps. Compared with: MVBPP, HVRAA. Outcome: TVRSM effectively reduces energy consumption.
26. Hend Gamel El Din Hassen Ali et al. [34], 2017. Objective: job/task scheduling. Technique: proposed grouped task scheduling. Environment: 3 simulation programs in Java. Data: 4 task classes (urgent user, urgent task, long task, normal task); 200, 400, 800, 1200, 2400 tasks. Compared with: TS, Min-Min algorithm. Outcome: minimum execution time.
27. Shymaa Elsherbiny et al. [35], 2017. Objective: workflow scheduling. Technique: proposed extended nature-based Intelligent Water Drops. Environment: CloudSim, WorkflowSim. Data: Sphit, Inspiral, Cybershake, Montage, Epigenomic, Workflow100. Compared with: FCFS, MCT, MIN-MIN, RR, MAX-MIN. Outcome: enhancements in performance and cost in most situations.
28. Tom Guerout et al. [36], 2017. Objective: quality of service optimization. Technique: proposed multi-objective optimization of four relevant cloud QoS objectives. Data: 5–110 hosts, 15–400 VMs, instances 1–9. Compared with: genetic algorithm (7 generations), mixed integer linear programming. Outcome: effective response time.
29. Dinh-Mao Bui et al. [37], 2017. Objective: E2M. Technique: orchestrating the cloud resource using energy efficiency (E2M). Environment: Google trace, Montage workflow. Data: 16 homogeneous servers, a 29-day period of the Google trace, Montage. Compared with: default scheme, greedy first-fit decreasing, E2M, optimal energy-aware. Outcome: reduces energy consumption via system performance.
30. Preeta Sarkhel et al. [38], 2017. Objective: task scheduling algorithms. Technique: Minimum Level Priority Queue (MLPQ) algorithm, Min-Median, Mean-Min-Max algorithm. Environment: Core i3 processor with Windows and Dev-C++ IDE. Data: N clouds C = {C1, C2, ..., CN}, M applications A = {A1, A2, ..., AN}, DAG representation. Compared with: Minimum Completion Cloud (MCC) scheduling, Cloud List Scheduling (CLS). Outcome: higher resource utilization with minimum makespan.
31. Yibin Li et al. [39], 2017. Objective: dynamic task scheduling and power consumption for smart phones, Software Development Kit (SDK). Technique: novel Energy-aware Dynamic Task Scheduling (EDTS). Environment: CPU 1.7 GHz, RAM 2 GB, mobile device emulator: Android. Data: benchmarks WDF, 2D, MDFG, BR, Floyd, All-Pole. Compared with: dynamic version of the Parallelism-based Scheduling (PS) algorithm, Critical Path Dynamic Scheduling (CPDS). Outcome: reduces energy consumption.
32. Hongyan Cui et al. [40], 2017. Objective: task/service scheduling in cloud. Technique: combined ant colony optimization and genetic algorithm. Environment: Core i3 processor at 2.10 GHz, 10 GB RAM. Data: 1 data center, 20–100 cloud network nodes, task size 10–100, task length 500–1000. Compared with: ACO, GA. Outcome: optimal objective function value and convergence speed.
33. George-Valentin Iordache et al. [41], 2017. Objective: service level agreements in cloud scheduling. Technique: SLA_and_Weight_Aware_Broker_Based scheduling. Environment: CloudSim. Data: 30 cloudlets of 100,000 MIPS; 3 virtual machines with processor capacities 2000, 3000, 6000. Compared with: first-fit scheduling, weight-aware broker-based scheduling, SLA-aware broker-based scheduling. Outcome: achieved profit optimization.
34. Fredy Juarez et al. [42], 2018. Objective: energy-aware parallel task scheduling. Technique: real-time dynamic scheduling system. Environment: Dell notebook, Intel i7-2760QM 2.40 GHz, 8 GB memory. Data: DAG representation, 800 tasks, DAGs and resources. Compared with: EP, MT, GT, SG. Outcome: aims to minimize a normalized bi-objective function and the energy consumption.
35. Bhavin Fataniya et al. [43], 2018. Objective: multi-objective task scheduling. Technique: RR scheduling algorithm using a dynamic time quantum. Data: different cases with different arrival times. Compared with: Round Robin algorithm, MRRA. Outcome: reduced waiting time and turnaround time.
36. Stelios Sotiriadis et al. [44], 2018. Objective: virtual machine scheduling. Technique: real-time virtual resource monitoring through self-managed VM placement. Environment: MongoDB, YCSB, Elasticsearch. Data: time stamps 2000–10000. Compared with: OpenStack. Outcome: major improvements in the VM node placement process.
37. Sukhpal Singh Gill et al. [45], 2018. Objective: matching cloud resources to cloud workloads. Technique: scheduling and resource provisioning in a cloud environment. Environment: CloudSim; first resource with 160 GB HDD, 1 GB RAM, Core 2 Duo and Windows; second resource with 160 GB HDD, 1 GB RAM, Core i5 and Linux; third resource with 320 GB HDD, 2 GB RAM, Xeon and Linux. Compared with: PSO-HPC, PSO-SW, PSO-DVFS. Outcome: executes workloads effectively on the available resources.
38. Sayantani Basu et al. [46], 2018. Objective: scheduling cloud tasks for IoT applications. Technique: coalesced genetic algorithm and ant colony algorithm. Environment: 1–10 processors. Data: DAG representation. Compared with: GA, ACO. Outcome: minimizes total execution time (makespan).
39. Zong-Gan Chen et al. [47], 2018. Objective: cloud workflow scheduling. Technique: Multiobjective Ant Colony System (MOACS). Environment: Amazon EC2 cloud platform. Data: workflow instances. Compared with: HEFT. Outcome: the proposed algorithm shows better search ability.
40. Sathya Chinnathambi et al. [48], 2018. Objective: Byzantine-fault-tolerant scheduling. Technique: Workload Sensitive Server Scheduling (WSSS), Tactical Coordinated Checkpointing (TCC). Environment: CloudSim with workflowSim-01, supported Java versions. Data: 800 hosts, 1052 VMs. Compared with: Most Efficient Server First (MESF). Outcome: fault tolerance reduced through TCC; effective virtual resource allocation through WSSS.
6 Conclusion
References
1. Gupta A, Gupta G (2016) A survey on load balancing algorithms in cloud computing
environment. Int J Innovative Eng Res 4(6)
2. Buyya R, Chee SY, Venugopal S, Roberg J, Brandic I (2009) Cloud computing and
emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility.
Future Gener Comput Syst 25(6):599–616
3. Mell P, Grance T (2009) Draft NIST working definition of cloud computing, V15
4. Beloglazov A, Abawajy J, Buyya R (2012) Energy-aware resource allocation heuristics for
efficient management of data centers for cloud computing. Future Gener Comput Syst.
Elsevier
5. Akilandeswari P, Srimathi H Deepli (2016) Dynamic scheduling in cloud computing using
particle swarm optimization. Indian J Sci Technol 9(37)
6. Vinotina V (2012) A survey on resource allocation strategies in cloud computing. Int J Adv
Comput Sci Appl 3(6)
7. Singh A, Tiwari VK, Dr. Gour B (2014) A survey on load balancing in cloud computing
using soft computing techniques. Int J Adv Res Comput Commun Eng 3(9)
8. Ranganathan P (2010) Recipe for efficiency: principles of power-aware computing. Commun
ACM
9. Chang H, Tang X (2010) A load-balance based resource-scheduling algorithm under cloud
computing environment. International conference on web-based learning. Springer
10. Li K, Xu G, Zhao G, Dong Y, Wang D (2011) Cloud task scheduling based on load
balancing ant colony optimization. Sixth annual China Grid conference, IEEE
11. Goudarzi H, Ghasemazar M, Pedram M (2012) SLA-based optimization of power and
migration cost in cloud computing. IEEE Xplore
12. Lin W, Liang C, Wang JZ, Buyya R (2012) Bandwidth-aware divisible task scheduling for
cloud computing. Software-Practice and Experience, John Wiley & Sons, Ltd
13. Wang W, Zeng G, Tang D, Yao J (2012) Cloud-DLS: dynamic trusted scheduling for cloud
computing. Expert Syst Appl. Elsevier
14. Vignesh V, Sendhil Kumar KS, Jaisankar N (2013) Resource management and scheduling in
cloud environment. Int J Sci Res Publ 3(6), June
15. Ghribi C, Hadji M, Zeghlache D (2013) Energy efficient VM scheduling for cloud data
centers: exact allocation and migration algorithms. Conference paper, research gate, May
16. Wu X, Mengqing D, Zhang R, Zeng B, Zhou S (2013) A task scheduling algorithm based on
QoS-driven in cloud computing. Procedia Comput Sci. Elsevier
17. Genez TAL, Pietri I, Sakellariou R, Bittencourt LF, Madeira ERM (2015) A particle swarm
optimization approach for workflow scheduling on cloud resources priced by CPU
frequency. IEEE Xplore Digital Library
18. Lakra AV, Yadav DK (2015) Multi-objective tasks scheduling algorithm for cloud
computing throughput optimization. International conference on intelligent computing,
communication & convergence. Procedia Comput Sci. Elsevier
19. Bansal N, Maurya A, Kumar T, Singh M, Bansal S (2015) Cost performance of QoS driven
task scheduling in cloud computing. Procedia Comput Sci, ScienceDirect. Elsevier
20. Awad AI, El-Hefnewy NA, Abdel Kader HM (2015) Enhanced Particle swarm optimization
for task scheduling in cloud computing environments. Procedia Comput Sci ScienceDirect.
Elsevier
21. Sotiriadis S, Bessis N, Anjum A, Buyya R (2015) An inter-cloud meta scheduling (ICMS)
simulation framework: architecture and evaluation. IEEE Trans Software Eng
22. Bryk P, Malawski M, Juve G, Deelman E (2016) Storage-aware algorithm for scheduling of
workflow enables in clouds. J Grid Comput. Springer
23. Xie XL, Guo XJ. Research on task scheduling algorithm based on trust in cloud computing.
J Database Theory Appl 9(6)
24. Abdullahi M, Ngadi MA, Abdulhamid SM (2016) Symbiotic organism search optimization
based task scheduling in cloud computing environment. Future Gener Comput Syst. Elsevier
25. Kong W, Lei Y, Ma J (2016) Virtual machine resource scheduling algorithm for cloud
computing based on auction mechanism. Optik. Elsevier
26. Kim W, Jo O (2016) Cost-optimized configuration of computing instances for large sized
cloud systems. ScienceDirect, KICS, Elsevier
27. Nayak SC, Tripathy C (2016) Deadline sensitive lease scheduling in cloud computing
environment using AHP. J King Saud University - Computer and Information Sciences
28. Abdulhamid SM, Latiff MSA et al (2016) Fault tolerance aware scheduling technique for
cloud computing environment using dynamic clustering algorithm. Neural Comput Appl.
Springer
29. Isreal C, Taheri J, Ranjan R, Wang L, Zomaya AY (2017) A balanced scheduler with data
reuse and replication for scientific workflows in cloud computing systems. Future Gener
Comput Syst. Elsevier
30. Duan H, Chen C, Min G, Wu Y (2017) Energy-aware scheduling of virtual machines in
heterogenous cloud computing systems. Future Gener Comput Syst. Elsevier
31. Chen W, Xie G, Li R, Bai Y, Fan C, Li K (2017) Efficient task scheduling for budget
constrained parallel applications on heterogeneous cloud computing system. Future Gener
Comput Syst. Elsevier
32. Boloni L, Turgut D (2017) Value of information based scheduling of cloud computing
resources. Future Gener Comput Syst. Elsevier
33. Zhu W, Zhuang Y, Zhang L (2017) A three-dimensional virtual resource scheduling method
for energy saving in cloud computing. Future Gener Comput Syst. Elsevier
34. Ali HGEDH, Saroit IA, Koth AM (2017) Grouped task scheduling algorithm based on QoS
in cloud computing network. Egypt Inform J
35. Elsherbiny S, Eldaydamony E, Alrahmawy M, Reyad AE (2017) An extended intelligent
water drops algorithm for workflow scheduling in cloud computing environment. Egypt
Inform J. Elsevier
36. Guerout T, Gaoua Y, Artigues C, Da Costa G, Lopez P (2017) Mixed integer linear
programming for quality of service optimization in clouds. Future Gener Comput Syst.
Elsevier
37. Bui DM, Yoon Y, Huh EN, Jun S, Lee S (2017) Energy efficiency for cloud computing
system based on predictive optimization. J Parallel Distrib Comput. Elsevier
38. Sarkhel P, Das H, Vashishtha LK (2017) Task-scheduling algorithms in cloud environment.
Adv Intell Syst Comput. Springer, May
39. Li Y, Chen M, Dai W, Qiu M (2017) Energy optimization with dynamic task scheduling
mobile computing. IEEE Syst J 11(1)
40. Cui H, Liu X, Yu T, Zhang H et al (2017) Cloud service scheduling algorithm research and
optimization. Hindawi Secur Commun Netw. Wiley, Volume
41. Iordache GV, Pop F, Esposito C, Castiglione A (2017) Selection-based scheduling
algorithms under service level agreement constraints. 21st international conference on
control systems and computer science
42. Juarez F, Ejarque J, Badia RM (2018) Dynamic energy-aware scheduling for parallel task-
based application in cloud computing. Future Gener Comput Syst. Elsevier
43. Fataniya B, Patel M (2018) Dynamic time quantum approach to improve round Robin
scheduling algorithm in cloud environment. IJSRSET 4(4)
44. Sotiriadis S, Bessis N, Buyya R (2018) Self managed virtual machine scheduling in cloud
systems. Inf Sci. Elsevier
45. Gill SS, Buyya R, Chana I, Singh M, Abraham A (2018) BULLET: particle swarm
optimization based scheduling technique for provisioned cloud resources. J Netw Syst
Manage. Springer
46. Basu S, Karuppiah M, Selvakumar K, Li KC et al (2018) An intelligent/cognitive model of
task scheduling for IoT applications in cloud computing environment. Future Gener Comput
Syst. 88:254–261
47. Chen ZG, Gong YJ, Chen X (2018) Multiobjective cloud workflow scheduling: a multiple
populations ant colony system approach. IEEE Trans Cybern. IEEE
48. Chinnathambi S, Dr. Santhanam A (2018) Scheduling and checkpointing optimization
algorithm for Byzantine fault tolerance in cloud clusters. Cluster Computing, Springer
49. Madni SHH, Latiff MSA, Coulibaly Y (2016) Resource scheduling for infrastructure as a
service (IaaS) in cloud computing: challenges and opportunities. J Netw Computer Appl.
Elsevier
A Modified LEACH Algorithm for WSN:
MODLEACH
Abstract. Communication has its roots at the very beginning of the human race.
As humans evolved, the need for communication increased tremendously and
communication technology progressed with it. The era of communication has
advanced from analog to digital and further to wireless, with sensor networks
now occupying most of the communication landscape. The communication
community depends heavily on wireless sensor networks, and they are a key
technology in communications. Even though these networks are widely used,
research is still needed so that their features and advantages can be fully
exploited. The basic concerns of these networks are energy and security, since
energy is limited and the networks are wireless; hence many algorithms have
been developed for energy efficiency, the most popular being the Low Energy
Adaptive Clustering Hierarchy (LEACH) algorithm. In this paper, a modifica-
tion of LEACH with cluster-based routing is developed as Modified LEACH
(MODLEACH), in which the energy efficiency is increased considerably.
A method of alternate cluster-head selection is introduced by allotting threshold
power levels. The results are compared with LEACH and found to be
appreciably good. The network metrics studied are cluster head formation,
energy efficiency, throughput and network lifetime, and an analysis is performed
with hard and soft threshold levels for the cluster head.
1 Introduction
Data transmission for various tasks is needed in our day-to-day life in a fast and
reliable manner. While data is being transmitted, the user wants it to be secure and
efficient. Security personnel want to transmit data at high speed, efficiently and in a
secured fashion, so that no intruder can access the information passed. Similarly, in
circuit design, data about various quantities is passed between different workbenches
to develop and simulate a design step by step. Hazards due to rapid changes in climate
can be averted with fast data transmission, avoiding damage. Hence fast data
transmission has become a part of routine life. Over the course of technology
development many networks have been built to fulfil the need for effective data
transmission, from classic analog transmission systems to digital telephony and then to
cellular and mobile networks. As the demand for data transmission increases, parallel
networks and ad hoc networks are developed for rapid transmission without fixed
infrastructure. In a network where the movement of data is unrestricted, multi-hop
networks are well suited for this purpose. As technology progressed further, data
transmission between machines was introduced, where data is acquired, processed into
an understandable form and presented to humans. The mote, also termed a sensor or
node, is the basic unit of a wireless sensor network; it performs the basic operations of
sensing data, processing it and communicating. Nodes need power to operate, and
power is a limited resource in wireless sensor networks. Hence the power resource has
to be managed so that it is utilized fully and gives maximum life to the sensors in the
network. Since power is one of the constraints in wireless sensor networks, more and
more algorithms are being developed to increase the lifetime of the nodes, with
emphasis on network protocols and sophisticated circuits to overcome the power
constraint. Multi-hop networks carry a lot of data which is transmitted and received by
the sensors in the network, so algorithms involving data fusion and data aggregation
have to be evolved along with multi-hop transmission algorithms and cluster-based
algorithms.
Since a wireless sensor network handles a lot of data communicated among the many
nodes present, routing protocols must be used efficiently so that network metrics such
as throughput, mean lifetime and efficiency can be increased while keeping power
consumption in view. In this paper, an algorithm is proposed in which the power
consumed by the nodes in the network is considerably decreased, and a comparison is
made with the existing LEACH algorithm using these network parameters.
2 Literature Survey
As wireless sensor networks have worked their way into the world's communication
technology, many types of sensor nodes are already available in the market at low
cost, capable of fast transmission and reception with little power consumption. As the
number of nodes in the network increases, so does the power required for operation;
but, as already discussed, power in wireless sensor networks is limited and has to be
utilized precisely, which in turn can increase the lifetime of the nodes. Hence the
chosen routing protocol plays a vital role not only in utilizing power effectively but
also in increasing throughput and in securing the data. A survey was made to
understand the disadvantages of the direct transmission algorithm, in which the data is
read and sent directly to the base station. In this scheme the power consumption is
high, which reduces the lifetime of a sensor when the base station is far away from the
source [1]. If the available power is insufficient and the base station is very far, a node
may die while propagating, decreasing the network's efficiency. As this was a major
concern, multi-hop transmission evolved, in which the minimum-transmission-energy
concept is used. The advantage of this concept is that the nodes far away from the
source remained alive longer than the nodes near the base station; this is because all
the data is routed towards the base station, so the nodes near it drain first. Transmitting
the large amount of data sensed by every node also consumes a lot of power. To
overcome this problem, hierarchical clustering, which deals with asymmetric
communication in wireless sensor networks, was introduced and had a considerable
effect on power consumption. Data processing and dissemination methods using the
directed diffusion concept were also introduced [2, 3]. A further improvement of data
transmission in wireless sensor networks uses the Cluster Based Routing Protocol
(CBRP), in which a two-hop cluster method covers all the nodes [4]. This method
forms the basis for hierarchical clustering algorithms, even though it is not itself
energy-efficient. Later hierarchical clustering algorithms, such as LEACH, were found
to be energy-efficient [5, 6]. In these algorithms, clusters of sensor nodes are formed
and a cluster head is chosen among them, responsible for receiving the sensed data
from the other nodes in the cluster. The data received by the cluster head is aggregated
and then transmitted to the base station. This method improves the network throughput
and also increases the mean lifetime of the sensors [7]. Another method of data
transmission in wireless sensor networks is communication between cluster nodes,
called inter-cluster communication [8]: the aggregated data is transmitted in multiple
hops from one cluster head to another until it reaches the base station, improving the
network lifetime appreciably. With the help of the algorithms discussed above,
numerous protocols with enhanced features have been developed.
As seen above, the basic aim behind the developed algorithms and the effective use of network protocols is to enhance the network lifetime by efficiently utilizing the limited available energy. Models have been developed in which a cluster is divided into minimum-energy regions, with some sensors in idle mode and some in operational mode [21, 22]. For heterogeneous networks, algorithms have been developed that make one of the nodes a high-energy cluster head based on initial conditions [9, 10]. As reported in [11–13], a node in the cluster can be selected as the cluster head by an election protocol that assigns a weighted probability of becoming cluster head depending on the energy the node possesses [11]. In the DEEC protocol [12], the residual energy of a node is the election criterion for becoming a cluster head. The predominant routing protocols available for wireless sensor networks are LEACH [1], TEEN [14], SPIN [11], AODV [12] and PEGASIS [15]. The LEACH algorithm gave the basic principle of selecting a node as cluster head, and modifications of this algorithm led to SPIN, TEEN and DEEC through the concept of threshold levels and by making the nodes reactive. Using thresholds in routing protocols, the performance and efficiency of the network can be enhanced. In the LEACH algorithm the data flow has three phases: the advertising phase, the cluster set-up phase and the scheduling phase. For homogeneous wireless sensor networks, the algorithm that optimizes energy and lifetime is the Q-LEACH algorithm [16]. A comparison of different LEACH variants used in wireless networks is given in [18], which compares network performance parameters such as throughput and efficiency in different applications. A similar comparative analysis between LEACH, Multi-level Hierarchical
LEACH and Multi-hop LEACH is presented in [23]. How the SEP algorithm enhances the features of a heterogeneous network is discussed in [17]. In [19] and [20], models have been proposed that modify previous versions of these protocols and show improved stability, throughput and mean node lifetime compared with SEP and DEEC.
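The election threshold itself is not reproduced in the survey above; for reference, the standard LEACH threshold from the cited literature [5, 6] (given here as background, not as this paper's contribution) is

\[
T(n) =
\begin{cases}
\dfrac{p}{1 - p\left(r \bmod \frac{1}{p}\right)}, & n \in G,\\[1.5ex]
0, & \text{otherwise},
\end{cases}
\]

where \(p\) is the desired fraction of cluster heads per round, \(r\) is the current round, and \(G\) is the set of nodes that have not served as cluster head during the last \(1/p\) rounds.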
The LEACH protocol gained due importance in sensor networks, and many variants or modified forms of LEACH have since been developed. This algorithm is well suited to homogeneous networks, as it is compact and well defined. In this protocol, a new cluster head is elected in every round of data transmission, giving rise to repeated cluster formation; in doing so, the limited energy is used extensively. The procedure for selecting the cluster head depends on the energy used in each round: nodes that used little energy in the current round may become cluster heads in the next round. Energy is therefore wasted in every round on the formation of new cluster heads, so an energy-efficient algorithm is needed in which this wastage is avoided. Furthermore, in many protocols such as LEACH, the nodes use the same amplification energy for data transmission irrespective of the distance between source and sink. The energy level required to transmit packets to the cluster head should differ when the distances differ; if the same power level is maintained, power is certainly wasted. To avoid such a situation, global knowledge of the network would be required, with each node deciding for itself how much energy is needed for transmission and amplification; however, it is cumbersome to maintain knowledge of all the nodes in the network and to calculate the energy levels required for efficient utilization. In this paper, an efficient cluster-head election scheme with transmission power at two different levels is proposed, which overcomes the above problems and increases the efficiency. The functioning of the protocol for different models is shown in Fig. 1. The distance-dependent radio energy model that motivates this design is sketched below.
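The following is a minimal sketch of the widely used first-order radio model; it is not taken from this paper. The constant values are the typical ones quoted in the LEACH literature, and the function name is an assumption made for illustration.

```python
# First-order radio model (typical LEACH-literature constants, assumed here).
E_ELEC = 50e-9        # J/bit, electronics energy per bit
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = (EPS_FS / EPS_MP) ** 0.5   # crossover distance (about 87 m)

def tx_energy(bits, distance):
    """Transmit energy for one packet: amplification grows with d^2 for short
    links (e.g. intra-cluster) and with d^4 for long links (e.g. cluster head
    to base station), which is why a single amplification level wastes power."""
    if distance < D0:
        return bits * (E_ELEC + EPS_FS * distance ** 2)
    return bits * (E_ELEC + EPS_MP * distance ** 4)
```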
4 Proposed Algorithm
Along with effective utilization of energy through cluster-head selection, the transmission power is switched between different levels depending on the type of data transmission.
In wireless sensor networks that use cluster-based protocols, different types of transmission take place. They can be classified as transmission within a cluster, transmission between clusters, and transmission from a cluster head to the base station. The sensors in the network sense data and transmit it to the cluster head within their cluster; this is termed intra-cluster transmission. The type of transmission in which the data collected by a cluster head is forwarded to the cluster head of another cluster is called inter-cluster transmission; here transmission and reception take place between cluster heads. The third form of communication is transmission from the cluster head to the base station. Of these three, intra-cluster transmission requires the least energy, while inter-cluster and cluster-head-to-base-station transmissions require more. In LEACH, however, the amplification energy used is the same for all three kinds of transmission. If the energy is lowered and a threshold is applied for intra-cluster transmission, much energy can be saved compared with the other modes. A further advantage of using multi-hop transmission is that the efficiency of the network increases because the packet-drop ratio and collisions are reduced. For the simulation of our model an area of 15 × 15 m² is taken, and a field area of 120 × 120 m² is
considered. The routing protocol switches nodes in and out of the cluster-head role depending on their energy levels in each round. A node that becomes a cluster head uses high power amplification, and in the next round it switches to a low energy level. Soft and hard threshold schemes are employed in this model, thereby increasing the power efficiency. A sketch of this round-wise cluster-head handling follows.
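The sketch below illustrates the idea of retaining a cluster head while its residual energy stays above a threshold instead of re-electing in every round. The node representation, probability p and threshold fraction are assumptions made for illustration; the paper's own experiments are implemented in MATLAB, not in this code.

```python
import random

def update_cluster_heads(nodes, p, threshold_fraction=0.5):
    """Per-round cluster-head handling in the MODLEACH style (illustrative sketch).

    Each node is a dict with 'energy', 'initial_energy' and 'is_ch'.
    A node that is already a cluster head keeps the role while its residual
    energy stays above a threshold; otherwise a head is (re-)elected with a
    LEACH-like probability p."""
    for node in nodes:
        threshold = threshold_fraction * node["initial_energy"]
        if node["is_ch"] and node["energy"] >= threshold:
            continue                                   # retain the current head
        node["is_ch"] = node["energy"] > 0 and random.random() < p
    return [n for n in nodes if n["is_ch"]]
```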
The designed model is simulated using MATLAB (R9a). Network performance parameters such as throughput, mean lifetime, and the numbers of dead and live nodes are calculated. It is found that the network throughput, efficiency and optimal cluster-head formation improve considerably when compared with other algorithms (Table 1).
Mean Lifetime of the Network: The mean lifetime of a network is defined as the time during which the network remains dedicatedly operational, which is possible only while a sufficient number of nodes are alive. With the proposed MODLEACH algorithm, it is found that, owing to the threshold-based cluster-head changing scheme and the design of dual transmission power levels, the mean lifetime is increased appreciably compared with algorithms such as LEACH. Figures 2 and 3 show this increase in mean lifetime: the number of live nodes increases and the number of dead nodes decreases. The soft-threshold concept, as used in MODLEACHST, helps the network maintain a longer node lifetime than the other protocols. Another advantage is that fewer transmissions occur in this protocol, which supports energy-efficient data transfer in both intra- and inter-cluster transmission. Since the energy of a sensor depends on the number of transmissions, and lifetime is inversely related to them, the lifetime of a node increases as the number of transmissions decreases, because the node preserves energy in each round.
The cluster head elected in each round may or may not change, depending on the threshold energy level, whereas in the LEACH algorithm a new cluster head is selected every round, so much more energy is wasted than in MODLEACH, as shown in Figs. 6 and 7; the cluster heads are stable in MODLEACH (Fig. 5).
6 Conclusion
This paper has focused on a modified form of the LEACH algorithm. It is shown that a separate cluster-head election in each round can be avoided by using a threshold energy level. Using network performance metrics such as throughput, node lifetime and packet-drop ratio, it is shown that the designed model performs better than the basic LEACH model. With MODLEACH the energy is utilized efficiently, which is the basic advantage of the proposed model. The dual transmission power has also considerably increased the network throughput by decreasing the packet-drop ratio. This mechanism can also be implemented in other routing protocols for wireless networks.
References
1. Heinzelman W, Chandrakasan A, Balakrishnan H (2000) Energy-efficient communication
protocols for wireless microsensor networks. In: Proceedings of Hawaiian international
conference on systems science, January
2. Intanagonwiwat C, Govindan R, Estrin D (2000) Directed diffusion: a scalable and robust
communication paradigm for sensor networks. In: Proceedings of the 6th annual ACM/
IEEE international conference on mobile computing and networking(MOBICOM), August,
pp 56–67
3. Estrin D, Govindan R, Heidemann J, Kumar S (1999) Next century challenges: scalable
coordination in wireless networks. In: Proceedings of the 5th annual ACM/IEEE
international conference on mobile computing and networking(MOBICOM), pp 263–270
4. Jiang M, Li J, Tay YC (1999) Cluster based routing protocol. Internet draft
5. Chong CY, Kumar SP (2003) Sensor networks: evolution, opportunities and challenges.
Proc. IEEE 91(8):1247–1256
6. Younis M, Munshi P, Gupta G, Elsharkawy SM (2006) On efficient clustering of wireless
sensor networks. Second IEEE workshop on dependability and security in sensor networks
and systems, pp 78–91
7. Arboleda LMC, Nasser N (2006) Comparison of clustering algorithms and protocols for
wireless sensor networks. Canadian conference on electrical and computer engineering,
May, pp 1787–1792
8. Mhatre V, Rosenberg C (2004) Design guidelines for wireless sensor networks: commu-
nication, clustering and aggregation. Ad Hoc networks 2(1):45–63
9. Duarte-Melo Enrique J, Mingyan Liu (2002) Analysis of energy consumption and lifetime of
heterogeneous wireless sensor networks. Proc of IEEE Globecom, Taipei, Taiwan, pp 21–25
10. Rui-Hua Zhang, Zhi-Ping Jia, Dong-Feng Yuan (2008) Lifetime analysis in heterogeneous
wireless sensor networks. Jilin Daxue Xuebao 38(5):1136–1140
11. Smaragdakis G, Matta I, Bestavros A (2004) SEP: a stable election protocol for clustered
heterogeneous wireless sensor networks. Proc of the Int’1 workshop on SANPA 2004.
pp 251–261
12. Qing Li, Zhu Qingxin, et al. (2006) Design of a distributed energy-efficient clustering
algorithm for heterogeneous wireless sensor networks. Computer Communications, August,
vol. 29, no. 12, pp 2230–2237
13. Li Xiaoya, Huang Daoping, Yang Jian (2007) Energy efficient routing protocol based
residual energy and energy consumption rate for heterogeneous wireless sensor networks.
Proceedings of the 26th Chinese control conference. Zhangjiajie, China, pp 587–590
14. Manjeshwar A, Agrawal DP (2001) TEEN: a routing protocol for enhanced efficiency in
wireless sensor networks. Proc. of 1st Intl. Workshop on parallel and distributed computing,
Apr.
15. Lindsey S, Raghavendra CS (2002) PEGASIS: power-efficient gathering in sensor
information systems. Aerospace conference proceedings, 2002. IEEE 3(3):3–1125, 3–1130
16. Manzoor B, Javaid N, Rehman O, Akbar M, Nadeem Q, Iqbal A, Ishfaq M (2013) Q-
LEACH: a new routing protocol for WSNs. International workshop on body area sensor
networks (BASNet-2013) in conjunction with 4th international conference on Ambient
Systems, Networks and Technologies (ANT 2013), 2013, Halifax, Nova Scotia, Canada,
Procedia Computer Science, vol. 19, pp 926–931, ISSN 1877-0509
17. Aasia Kashaf, Nadeem Javaid, Zahoor Ali Khan, Imran Ali Khan (2012) TSEP: Threshold-
sensitive stable election protocol for WSNs, 10th IEEE international conference on frontiers
of information technology (FIT’ 12), Pakistan
18. Aslam M, Javaid N, Rahim A, Nazir U, Bibi A, Khan ZA (2012) Survey of extended
LEACH-based clustering routing protocols for wireless sensor networks, 5th international
symposium on advances of high performance computing and networking (AHPCN- 2012) in
conjunction with 14th IEEE international conference on high performance computing and
communications (HPCC-2012), 25–27 June, Liverpool, UK
19. Khan AA, Javaid N, Qasim U, Lu Z, Khan ZA (2012) HSEP: heterogeneity-aware
hierarchical stable election protocol for WSNs, 3rd international workshop on advances in
sensor technologies, systems and applications (ASTSA-2012) in conjunction with 7th IEEE
international conference on broadband and wireless computing, communication and
applications (BWCCA 2012), Victoria, Canada
20. Qureshi TN, Javaid N, Khan AH, Iqbal A, Akhtar E, Ishfaq M (2013) BEENISH: balanced
energy efficient network integrated super heterogenous protocol for wireless sensor
networks. The 4th international conference on Ambient Systems, Networks and Technolo-
gies (ANT 2013), 2013, Halifax, Nova Scotia, Canada, Procedia Computer Science, vol. 19,
pp 920–925, ISSN 1877-0509
21. Tahir M, Javaid N, Khan ZA, Qasim U, Ishfaq M (2013) EAST: energy-efficient adaptive
scheme for transmission in wireless sensor networks. 26th IEEE Canadian conference on
electrical and computer engineering (CCECE 2013). Regina, Saskatchewan, Canada
22. Tauseef Shah, Nadeem Javaid, Talha Naeem Qureshi (2012) Energy efficient sleep awake
aware (EESAA) intelligent sensor network routing protocol. 15th IEEE international multi
topic conference (INMIC’12), Pakistan
23. Fareed MS, Javaid N, Ahmed S, Rehman S, Qasim U, Khan ZA (2012) Analyzing energy-
efficiency and route-selection of multi-level hierarchal routing protocols in WSNs,
broadband, wireless computing, communication and applications (BWCCA)
A Hybrid Approach of Wavelet Transform
Using Lifting Scheme and Discrete Wavelet
Transform Technique for Image Processing
Abstract. In modern times, many areas such as business, medicine, research and document archiving require a large number of images for general-purpose applications to solve complex problems. Images contain a lot of information and therefore require large storage space and transmission bandwidth, so image compression is needed to store only the important information and to reduce the different types of redundancy in an image for efficient storage and transmission, because an uncompressed image requires more storage capacity and transmission time. In the present work the storage space used is much smaller, which also helps in reducing the processing time. Different transform techniques are used for image compression. An image can be represented as a matrix of pixel values, and after compression by applying different methods there is no change, or only a small change, between pixel values. The present work uses the Haar method and the Lifting Wavelet Transform for image compression to increase the efficiency of the Discrete Wavelet Transform (DWT).
1 Introduction
Uncompressed images require large storage space, but storage space and transmission time (resource requirements) are limited. The solution is therefore to compress the image for quick transmission. Image compression is an application of digital image processing performed on a digital image to reduce its size by eliminating redundancy from it, without degrading the quality of the image to an undesirable level [1].
Original images can occupy a huge amount of memory in both RAM and storage. Because of this, the probability of losing an image while sending it increases, and transmission takes a lot of time. For these reasons image compression is preferred, and indeed required; across a network, it removes the need for undesirable amounts of storage space.
There are various types of redundancy in images, and all of them should be removed for efficient storage and transmission. Image compression depends on the degree of redundancy present in the image [2].
(a) Coding Redundancy: A code contains a number of symbols to represent a body of information, and each piece of information is assigned a sequence of code symbols. The number of symbols defines the length of the code word.
(b) Spatial and Temporal Redundancy: Adjacent data points and intensity values are spatially correlated, and temporally correlated pixels contain duplicate information. Temporal redundancy means that successive image frames are correlated.
(c) Irrelevant Information: Parts of the 2-D intensity array are ignored by the human visual system and are mostly unused, so they are effectively redundant.
Since an image is formed of many pixels and all pixels are correlated with each other, it contains a lot of redundant information that occupies memory unnecessarily. Therefore, to avoid this redundancy and irrelevancy, different techniques are utilized.
There are two parts to compression:
• Finding properties of the image data, such as the grey-level histogram, image entropy and correlation functions.
• Finding an appropriate compression technique for the image.
Two types of image compression are used [3–5].
(a) Lossy Image Compression
An image compressed with the introduction of some errors or loss of data is said to be lossily compressed. Lossy compression is based on irrelevancy-reduction strategies (some information is ignored by the HVS) but usually also employs redundancy-reduction strategies. In this type of compression, the bits required for transmission and storage are reduced without emphasis on the resolution of the image; after compression, the image carries less information than the original.
(b) Lossless Image Compression
An image or file compressed without the introduction of errors or loss of data, although only up to a certain extent, is said to be losslessly compressed. Lossless compression is based on redundancy reduction and concentrates mainly on the encoding of the image.
An image compression system needs to have at least the following two components [6, 7]:
a. Encoding system
b. Decoding system
Encoding and decoding are used for compression and decompression of the image respectively (Figs. 1 and 2). An illustrative sketch of such an encode/decode pipeline is given below.
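The following sketch uses the PyWavelets library to illustrate a DWT-based encode/decode pipeline; it is an assumption made for illustration, not the authors' implementation (the paper's experiments use a MATLAB GUI). It performs a multilevel 2-D DWT, keeps only the largest coefficients, and reconstructs.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_compress(img, wavelet="haar", level=2, keep=0.05):
    """Encode: multilevel 2-D DWT, hard-threshold all but the largest
    `keep` fraction of coefficients.  Decode: inverse DWT."""
    coeffs = pywt.wavedec2(img, wavelet=wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    thresh = np.quantile(np.abs(arr), 1.0 - keep)            # magnitude cut-off
    arr = pywt.threshold(arr, thresh, mode="hard")            # zero small coeffs
    coeffs = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, wavelet=wavelet)             # recovered image
```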
At the time of transmission and acquisition, images are often corrupted by noise. Denoising is therefore a common issue in image processing applications. The purpose of denoising is to decrease the noise level while preserving image features such as textures and edges as accurately as possible [9, 10]. To eliminate noise in an image region, image matching is used together with spatial filtering or frequency filtering; these techniques can remove noise quickly and precisely. However, shape-based image matching in the presence of partial noise incurs a heavy computational overhead, because the partial noise in a boundary image must be located and removed [11]. Because texture edges are highly sensitive, directional and geometric wavelets for image denoising have become a popular subject, since multidirectional-wavelet-based denoising approaches can produce improved visual quality for highly structured image patterns. In this paper we propose a new DWT technique based on lifting. The proposed technique gives a more efficient representation of sharp features in the given image [12]; a one-level lifting sketch is given below.
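The lifting scheme can be illustrated with the (unnormalised) Haar wavelet: the signal is split into even and odd samples, a predict step forms the detail coefficients, and an update step forms the approximation. The code below is a generic one-level sketch, not the authors' implementation.

```python
import numpy as np

def haar_lift(x):
    """One level of the Haar transform via lifting (split / predict / update).
    Assumes an even-length 1-D signal; returns (approximation, detail)."""
    s = x[0::2].astype(float).copy()   # even samples
    d = x[1::2].astype(float).copy()   # odd samples
    d -= s                              # predict: detail = odd - even
    s += d / 2.0                        # update: approx = pairwise average
    return s, d

def haar_unlift(s, d):
    """Invert the lifting steps in reverse order to recover the signal exactly."""
    s = s - d / 2.0
    d = d + s
    x = np.empty(s.size + d.size)
    x[0::2], x[1::2] = s, d
    return x

# Example: [4, 6, 10, 12] -> approx [5, 11], detail [2, 2], then exact recovery.
```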
2 Proposed Methodology
After the collection of this large image set, experiments are performed. For better operation and understanding, a Graphical User Interface (GUI) was created, in which different types of buttons are provided for ease of use. The screenshots below make the main experiment clearer. In this experiment, an input image is inserted (Figs. 6, 7, 8, 9, 10, 11, 12, 13, 14).
Fig. 6. Opening of the GUI for performing the experiment (a GUI for adaptive lifting wavelet image compression with six buttons: open, region, proposed, compress, decompress and result.)
Fig. 7. Original image displayed from the browsed database (the input images already present in the database are displayed.)
Fig. 8. Partition of the image in 4 × 4 mode (the selected image is partitioned in 4 × 4 mode.)
Fig. 9. Coherence chart (the image is divided into blocks, and the coherence chart indicates which blocks are homogeneous and which are heterogeneous.)
Fig. 10. Compression by the proposed algorithm (the proposed hybrid algorithm is applied for compression.)
Fig. 11. Compressed image (the compressed version of the original image is displayed.)
Fig. 12. Processing of decompression (decompression is performed to produce the recovered image.)
Fig. 13. Recovered image (the image recovered from the compressed data is displayed.)
Fig. 14. Recovered image with MSE and PSNR values (the recovered image is displayed together with its MSE and PSNR values.)
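Figure 14 reports MSE and PSNR values for the recovered image. These quality metrics can be computed as shown below; this is a generic sketch, with a peak value of 255 assumed for 8-bit images.

```python
import numpy as np

def mse_psnr(original, recovered, peak=255.0):
    """Mean squared error and peak signal-to-noise ratio (dB) between two images."""
    diff = original.astype(float) - recovered.astype(float)
    mse = np.mean(diff ** 2)
    psnr = float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
    return mse, psnr
```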
For further study, the Discrete Wavelet Transform combined with other methods can be used for better transmission over networks in the field of computer vision. The developed algorithm is intended to be very flexible: it not only works for the present research problem but can also be beneficial for other research problems. The present work can also be extended with a hybrid algorithm for better image compression. In future, the image characteristics and image status can be analysed further for proper compression, using proper threshold computation to estimate the directional information of the image. We also aim to further increase the PSNR value for efficient image compression with less transmission bandwidth.
References
1. Fang Z, Xiong N (2011) Interpolation-based direction-adaptive lifting DWT and modified
SPIHT for image compression in multimedia communication. IEEE Syst J 5(4):584–593
2. Grgic S, Grgic M, Zovko-Cihla B (2001) Performance analysis of image compression using
wavelet. IEEE Trans Indust Electron 48(3):682–695
3. Hilton ML, Jawerth BO, Sengupta A (1994) Compressing still and moving image with
wavelet. Multimedia Syst 2(5):218–227
4. AI-Kamali FS, Dessouky MI, Sallam BM, Shawki F, EI-Samie FEA (2010) Transceiver
scheme for single-carrier frequency division multiple access implementing the wavelet
transform and peak to –average-power ratio reduction method. IET Commun 4(1):69–79
5. Taubman D, Marcellin MW (2002) JPEG 2000 image compression fundamentals standard
and practice. Kluwer, Dordrecht, The Netherlands
6. Chen N, Wan W, Xiao HD (2010) Robust audio hashing based on discrete-wavelet-transform and nonnegative matrix factorisation. IET Commun 4(14):1722–1731
7. Candès EJ, Donoho DL (1999) Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In: Curve and surface fitting: Saint-Malo. University Press, Nashville, TN, pp 105–120
8. Jangde K, Raja R (2013) Study of a image compression based on adaptive direction lifting
wavelet transform technique. Int J Adv Innov Res (IJAIR) 2(8):ISSN: 2278 – 7844
9. Jangde K, Raja R (2014) Image compression based on discrete wavelet and lifting wavelet
transform technique. Int J Sci, Eng Technol Res (IJSETR) 3(3):ISSN: 2278 – 7798
10. Rohit R, Sinha TS, Patra RK, Tiwari S (2018) Physiological trait based biometrical
authentication of human-face using LGXP and ANN techniques. Int. J. of Inf Comput Secur
10(2/3):303–320 (Special Issue on: Multimedia Information Security Solutions on Social
Networks)
11. Raja R, Mahmood MR, Patra RK (2018) Study and analysis of different pose invariant for
face recognition under lighting condition. Sreyas Int J Sci Technocr 2(2):11–17
12. Raja R, Agrawal S (2017) An automated monitoring system for tourist/safari vehicles inside
sanctuary. Indian J Sci Res 14(2):304–309, ISSN: 2250-0138
Proportional Cram on Crooked Crisscross
for N-Hop Systems
1 Introduction
A computer network can easily be described by a graph in terms of nodes (or vertices) and edges [1]. Graph theory [2] and graph coloring can be applied to a network in order to optimize [4] its performance.
Preliminaries and Definitions:
1. Graph:
A graph [3] is an ordered pair G = (V, E), where V is the set of vertices or nodes of the graph and E is the set of edges or connections between vertices.
2. Undirected Graph:
An undirected graph is a graph in which edges have no orientation, i.e., the edge (a, b) is identical to (b, a).
3. Irregular Graph:
An irregular graph is a graph in which, for every vertex, all neighbors of that vertex have distinct degrees.
4. Degree:
The degree of a vertex of a graph is the number of edges incident to the vertex.
5. Chromatic Number:
The chromatic number of a graph is the smallest number of colors needed to color the vertices of the graph so that no two adjacent vertices share the same color (see the sketch that follows these definitions).
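These notions can be checked mechanically; the sketch below uses the NetworkX library on a small hypothetical graph (not the paper's G1). Note that greedy coloring only gives an upper bound on the chromatic number.

```python
import networkx as nx

# Hypothetical undirected graph (not the paper's G1).
G = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "B"), ("D", "E")])

degrees = dict(G.degree())                                    # degree of each vertex
coloring = nx.coloring.greedy_color(G, strategy="largest_first")
colors_used = len(set(coloring.values()))                     # upper bound on chromatic number

print(degrees)       # {'A': 1, 'B': 3, 'C': 2, 'D': 3, 'E': 1}
print(colors_used)   # 3, because the odd circuit B-C-D forces a third color
```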
2 Impediment Sketch
Figure 1 explains the block diagram of the irregular-graph approach for N-hop networks.
(Fig. 1 block diagram: Network → Equivalent Graphical Representation → Maximum & Minimum Degree based Level Graph → Possible hop-network identification.)
Example 1: Consider the following graph G1 with 11 vertices and 12 edges, as given in Fig. 2.
Fig. 2. Set-up G1
Now, fixing B as the source node (since it is at the 0th level) and D, E, I and H as the sink nodes (because they are at the nth level), the possible N-hop networks (required levels) are listed in Table 1.
Similarly, fixing E as the source node (since it is at the 0th level) and H and I as the sink nodes (because they are at the nth level), Table 2 lists the different levels of nodes [4] identified along with the n-hop networks (the level computation is sketched below).
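The level identification used in these examples amounts to grouping vertices by their hop distance (BFS level) from the chosen source. The sketch below uses NetworkX on a hypothetical graph (not the paper's G1), with the source fixed at the 0th level.

```python
import networkx as nx

def hop_levels(G, source):
    """Group vertices by hop distance from `source`; a vertex at level n
    is the sink of an n-hop network rooted at the source."""
    dist = nx.single_source_shortest_path_length(G, source)
    levels = {}
    for node, d in dist.items():
        levels.setdefault(d, []).append(node)
    return levels

# Hypothetical example, source "B" at the 0th level.
G = nx.Graph([("B", "A"), ("B", "C"), ("A", "D"), ("C", "E"), ("E", "F")])
print(hop_levels(G, "B"))   # {0: ['B'], 1: ['A', 'C'], 2: ['D', 'E'], 3: ['F']}
```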
Observation:
It is observed from Figs. 3 and 4 that this is a three-colorable network, since there is an odd circuit.
Example 2: Consider the following graph G2 with 11 vertices and 12 edges, as shown in Fig. 5.
Figure 6 shows the maximum-level-degree based Program Dependence Graph (PDG), or level-based structure, of G2.
Fig. 5. Organization G2
Now, fixing F as the source node (since it is at the 0th level) and E and D as the sink nodes (since they are at the nth level), the possible n-hop networks are listed in Table 3.
Figure 7 shows the minimum-level-degree based Program Dependence Graph (PDG), or level-based structure, of G2.
Now, fixing E as the source node (since it is at the 0th level) and K and J as the sink nodes (since they are at the nth level), the possible n-hop networks are listed in Table 4.
Observation:
It is observed from Figs. 6 and 7 that the network is 3-colorable, since it has an odd circuit.
Example 3: Consider the following graph G3 with 15 vertices and 16 edges, as shown in Fig. 8.
Fig. 8. Network G3
Now, fixing C as the source node (since it is at the 0th level) and E and O as the sink nodes (since they are at the nth level), the possible 1-hop, 2-hop and 3-hop networks are shown in Table 5.
Now, fixing B as the source node (since it is at the 0th level) and O as the sink node (since it is at the nth level), the possible 1-hop, 2-hop and 3-hop networks are shown in Table 6.
Observation:
It is observed from Figs. 9 and 10 that the network is 2-colorable, since it has an even circuit.
Example 4: Consider the following graph G4 with 15 vertices and 16 edges, as shown in Fig. 11.
Observation:
It is observed from Figs. 12 and 13 that the network is 2-colorable, since it has an even circuit.
4 Inference
To sum up, it is observed that, even though the number of vertices and edges in a grid increases, grid coloring can be done efficiently with at most three colors. The comparative analysis of the maximum- and minimum-degree level-based dependence graphs shows that the maximum hop counts they yield differ.
5 Propositions
1. In a given irregular network, if there are n levels in the level-degree based program dependence graph, then at most an (n−2)-hop network can be constructed.
2. In a given irregular network,
if there is an odd number of circuits in the level-degree based program dependence graph, then it is three-colorable;
if it has an even number of circuits, then it is two-colorable.
3. There is a difference in maximum hop count between the maximum-degree level-based program dependence graph and the minimum-degree level-based program dependence graph.
A proportional study on maximum- and minimum-level-degree based program dependence graphs for irregular networks has been carried out. In future, the work can be extended to analyse some connected ancestor graphs.
Acknowledgement. The authors would like to thank Dr. Ponnammal Natarajan, Former
Director of Research and Development, Anna University, Chennai, India for her intuitive ideas
and fruitful discussions with respect to the paper’s contribution and support to complete this
paper.
References
1. Gallian JA (2016) A dynamic survey of graph labeling. Electron J Comb 18:42–91
2. Bondy JA, Murty USR (1976) Graph theory with applications. Macmillan, London
3. Harary F (1969) Graph theory. Addison Wesley, Reading, MA
4. Chelali M, Volkmann L (2004) Relation between the lower domination parameters and the
chromatic number of a graph. Discret Math 274:1–8
5. Thiagarajan K, Mansoor P (2017) Expansion of network through seminode. IOSRD Int J
Netw Sci 1(1):7–11
Assessment of Cardiovascular Disorders Based
on 3D Left Ventricle Model of Cine
Cardiac MR Sequence
1 Introduction
Cardiovascular disease (CVD) is one of the leading causes of mortality in India [1]. Among the spectrum of CVDs, ischemic heart disease and stroke are the primary disorders. In recent years, premature mortality as a result of these diseases has increased rapidly; hence more emphasis is required on providing appropriate therapeutic interventions. A reduced ejection fraction in many cases indicates that the heart does not squeeze properly to pump blood. This reduction can be caused by various pathologies such as coronary artery disease, cardiomyopathy and aortic stenosis. These subjects suffer from fatigue, nausea, loss of appetite and shortness of breath. In clinical routine, cardiac function is assessed with imaging modalities such as echocardiography, computed tomography and cardiac magnetic resonance (CMR) imaging [2]. The CMR technique is non-invasive and provides better soft-tissue contrast [3]; hence it can be used to analyse the vitality of the myocardium and the contraction ability of the heart.
The evaluation of functional parameters such as ventricular volume plays a significant role in the prognosis of CVDs [4]. Quantitative cardiac MR measures such as end-diastole volume, end-systole volume, stroke volume and ejection fraction of the left ventricle (LV) are important predictors of cardiac abnormalities. Further, the LV volume variation over a cardiac cycle indicates the extent of abnormal behaviour in CVDs [5].
Fig. 1. Representative short-axis view CMR images: (a) Normal, (b) Mild, (c) Moderate, and
(d) Severe
boundary. Here, the 3D left ventricle structure is reconstructed from the segmented regions in the stack. This is followed by 3D reconstruction using a polygon surface rendering technique that employs the marching cubes algorithm. The algorithm considers eight neighbouring locations at a time while proceeding through the scalar field to form a cubic cell; the polygons required to represent the part of the isosurface that passes through this cube are then found, and these separate polygons are combined into the desired surface. The surface of the 3D reconstructed LV is smoothed. The pixel resolution, slice thickness and other relevant information from the DICOM images are taken into account for the 3D reconstruction. The surface model can also be viewed at arbitrary angles and in different dimensions. Finally, the volume is calculated from the 3D reconstructed structure. The entire procedure is repeated for every time frame of a cardiac cycle; hence, for every individual subject, 30 volumes are estimated over the cardiac cycle. The volume at the start of the contraction phase denotes the end-diastole volume (EDV) and that at the start of the relaxation phase the end-systole volume (ESV). The ejection fraction (EF) is calculated from the EDV and ESV for every individual as follows.
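The formula itself is lost to the page break in the source; the standard clinical definition implied by the quantities above is

\[
\mathrm{SV} = \mathrm{EDV} - \mathrm{ESV}, \qquad
\mathrm{EF}\,(\%) = \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}} \times 100 .
\]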
3 Results
The experiment is carried out on an i7-7700 CPU @ 3.60 GHz with 16 GB RAM. In total, 20 subjects, covering the normal, mild, moderate and severe categories, have been considered for this work. The left ventricle segmented using a threshold technique is shown in Fig. 3 for different slices of a representative subject. Here, the ventricle has been segmented in 9600 slices of the 20 subjects available in the dataset.
These segmented regions from different slices are reconstructed to create the 3D model shown in Fig. 4. Thirty surface models of the left ventricle are created per subject to cover a cardiac cycle; hence, in total, 600 LV surface models have been created in this study. The ventricle surface models for a representative subject at different angles of rotation are illustrated in Fig. 5. This enables examination of the ventricular geometry in diverse dimensions, which assists the effective planning of surgical interventions.
Fig. 4. Reconstruction of 3D LV model from stack of slices: (a) segmented stack of slices,
(b) 3D LV model
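A minimal sketch of the reconstruction-and-volume step follows, assuming scikit-image's marching cubes and a binary LV mask; the function name, the voxel-counting volume estimate and the spacing handling are assumptions made for illustration, not the authors' code.

```python
import numpy as np
from skimage import measure

def lv_surface_and_volume(mask, spacing):
    """mask: 3-D binary array of the segmented LV (stacked slices).
    spacing: (z, y, x) voxel size in mm, taken from the DICOM header.
    Returns a triangulated surface (for rendering and smoothing) and the
    volume in ml estimated by counting segmented voxels."""
    verts, faces, normals, _ = measure.marching_cubes(
        mask.astype(float), level=0.5, spacing=spacing)
    voxel_volume_ml = float(np.prod(spacing)) / 1000.0   # mm^3 -> ml
    volume_ml = float(mask.sum()) * voxel_volume_ml
    return (verts, faces), volume_ml
```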
These models reflect the possible shape of the left ventricle in a 3D view. The end-diastole (EDV) and end-systole (ESV) volumes calculated by surface rendering are correlated with the manual volumes provided in the database, and both are found to correlate significantly with them. Further, a residual analysis has been carried out to validate the quality of the calculated volumes. Figure 6(a) and (b) shows the residual plots for the EDV and ESV of the LV respectively; here, the difference between the measured and manual volume is plotted against the manual volume. The residual plots follow a constant-variance pattern. The calculated EDV varies only by about −1.7 ml to +3.9 ml with respect to the manual volume; similarly, the calculated ESV deviates by approximately −4.2 ml to +3.9 ml from the manual volume. Tan et al. used a convolutional neural network for segmentation of the LV [20] and obtained absolute differences for EDV and ESV of 11.8 ± 9.8 ml and 8.7 ± 7.6 ml respectively. Khened et al. utilized residual DenseNets for cardiac segmentation and obtained a mean standard deviation of ±5.501 ml for EDV [21]. However, in the
Fig. 5. Surface models of left ventricle in different dimensions: (a) Anterior long-axis, (b) Basal,
(c) Apical, and (d) posterior long-axis
performed study, a maximum deviation of ±3.9 ml and ±4.2 ml has been obtained for EDV and ESV respectively. Hence the considered framework performs better, as the volume is estimated directly from the reconstructed 3D models.
Further, the variation of the calculated left ventricle volume over a cardiac cycle is analysed to predict the severity of cardiovascular dysfunction. In general, the LV volume decreases during the systolic phase and increases during the diastolic phase, and the rate of inflation and decline can be used to predict the level of abnormality. Figure 7 shows the variation of LV volume for the considered normal, mild, moderate and severe subjects; the plot depicts the variation of left ventricle volume over a cardiac cycle for each individual subject. The top and bottom ends of each box plot represent the EDV and ESV of that subject. The normal subjects show lower ventricular volumes than the abnormal ones. It can also be observed that relaxation and contraction are greatest in normal subjects, indicating the best possible LV deformation during a cardiac cycle.
Fig. 7. Variation of left ventricle volume for normal and abnormal subjects
A lower ESV indicates better contraction by the LV and hence efficient blood pumping by the heart. Although the volume change is better in mild than in moderate subjects, it is slightly lower than in normal subjects. The plots also show that the change in volume from the ED to the ES frame is minimal in moderate subjects compared with mild ones; however, it is visible from the plots that in moderate subjects there is a more significant change in LV volume in consecutive frames than in the severe category. Severe subjects have higher volumes at ED and ES, as the heart muscles weaken and elongate as the disease progresses, and their contraction is the lowest. Although slightly higher contraction is observed in one subject, the maximum level of ventricular contraction is not reached, which is noticeable from the higher ESV of all the severe subjects.
In summary, a considerable LV volumetric variation has been observed among the mild, moderate and severe categories. Normal and severe subjects have distinctly different volumes, and the volumetric discrimination between moderate and normal is more prominent than that between mild and normal. Thus, effective LV surface models created through 3D reconstruction result in improved detection of the severity of cardiac abnormalities. This is also evident from the high accuracy achieved in the estimated LV volume. The created 3D LV model helps in better understanding of the complex anatomical structure. This methodology can also be used for visualization and analysis of other biological structures in medical diagnosis.
4 Conclusion
In this work, 3D LV models have been created to study the severity of cardiac abnormalities. A total of 600 surface models of the LV have been created to analyse heart functionality over a cardiac cycle. The calculated diastolic and systolic volumes correlate significantly with the manual volumes, and the deviation between the measured and manual volumes is minimal. The results show that the rate of variation in LV volume over a cardiac cycle is able to differentiate the severity levels of cardiovascular abnormalities better. A noticeable volumetric deviation has been observed between normal and moderate subjects. The developed 3D LV models enhance the understanding of anatomical variations in the heart. Thus, the 3D reconstructed surface models of the LV could aid the diagnosis of different types of cardiovascular disorders.
References
1. Prabhakaran D, Singh DK, Roth GA, Banerjee A, Pagidipati NJ, Huffman MD (2018)
Cardiovascular diseases in India compared with the United States. J Am Coll Cardiol
72(1):79–95
2. Lu X, Yang R, Xie Q, Ou S, Zha Y, Wang D (2017) Nonrigid registration with
corresponding points constraint for automatic segmentation of cardiac DSCT images.
BioMedical Eng OnLine 16(1):39
3. Irshad M, Muhammad N, Sharif M, Yasmeen M (2018) Automatic segmentation of the left
ventricle in a cardiac MR short axis image using blind morphological operation. Eur Phys J
Plus 133(4):133–148
4. Fathi A, Weir-McCall JR, Struthers AD, Lipworth BJ, Houston G (2018) Effects of contrast
administration on cardiac MRI volumetric, flow and pulse wave velocity quantification using
manual and software-based analysis. Br J Radiol 91(1084):1–14
5. Punithakumar K, Ben Ayed I, Afshin M, Goela A, Islam A, Li S, Boulanger P, Becher H
(2016) Detecting left ventricular impaired relaxation in cardiac MRI using moving mesh
correspondences. Comput Methods Programs Biomed 124:58–66
6. Zhang D, Icke I, Dogdas B, Parimal S, Sampath S, Forbes J, Bagchi A, Chin C, Chen A
(2018) Segmentation of left ventricle myocardium in porcine cardiac cine MR images using
a hybrid of fully convolutional neural networks and convolutional LSTM. In: SPIE 10574,
medical imaging 2018: Image processing, 105740A, Texas
7. Tao Q, Yan W, Wang Y, Paiman EHM, Shamonin DP, Garg P, Plein S, Huang L, Xia L,
Sramko M, Tintera J, de Roos A, Lamb HJ, van der Geest RJ (2018) Deep learning–based
method for fully automatic quantification of left ventricle function from cine MR images: A
multivendor, multicenter study. Radiology 290(1):81–88
8. Tan LK, McLaughlin RA, Lim E, Abdul Aziz YF, Liew YM (2018) Fully automated
segmentation of the left ventricle in cine cardiac MRI using neural network regression.
J Magn Reson Imaging 48(1):140–152
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology,
Avadi, Chennai, India
[email protected],
[email protected]
1 Introduction
Cloud computing is an emerging technology in the field of IT. It provides users with on-demand services, scalability, multitenancy, self-service computation, storage resources and reliability. Load balancing, resource scheduling, the data lock-in problem, energy consumption and performance monitoring are major challenges in cloud computing [1], and load balancing is one of the main ones [1]. To overcome it, researchers have proposed many scheduling algorithms. A set of policies is used to control the order of work, and scheduling algorithms are used to achieve high-performance computing and system throughput. This paper compares three scheduling algorithms, namely First Come First Serve (FCFS), Shortest Job First (SJF) and Generalized Priority (GP); the performance parameter used to evaluate them is execution time. The paper is structured as follows: the introduction to scheduling is presented in Sect. 2; related work is presented in Sect. 3; analysis of existing scheduling algorithms is presented in Sect. 4; the experimental setup is presented in Sect. 5; performance analysis of the algorithms is presented in Sect. 6; and Sect. 7 concludes the work carried out.
2 Introduction to Scheduling
Scheduling is the technique of mapping a set of jobs to the available resources [3]. System throughput, load balance and maximized resource utilization are the aims of scheduling. In the scheduling architecture, the datacentre broker acts as an intermediary between the datacentre and the user. A datacentre contains a number of hosts, and each host runs a number of virtual machines. First, the user submits tasks to the datacentre broker; the broker communicates with the cloud controller and schedules the submitted tasks. According to the scheduling policies, tasks are scheduled on VMs. Scheduling is performed at two levels: VM-level and host-level. At VM-level, a task/job scheduler maps tasks to VMs [4]; this is also referred to as task scheduling. In task scheduling, each task is assigned to a node (VM) for a specific time, which ensures that all tasks are executed in a minimum span of time [5]; the focus is on effectively mapping tasks to appropriate VMs [4]. At host-level, a VM scheduler is used to schedule VM requests onto the physical machines of a particular datacentre; this is also called VM scheduling. Space-shared and time-shared are the two types of scheduling policy. At host-level, in the space-shared policy one VM is assigned to a CPU core at a time; after the completion of its task, another VM is scheduled. At VM-level, one task is scheduled on a VM at a time; after its completion, another task is scheduled on the VM. The space-shared scheduling policy behaves like the First Come First Serve scheduling algorithm [3]. At host-level, in the time-shared policy all VMs are scheduled on the CPU cores at the same time; at VM-level, all tasks are scheduled on a VM at the same time. The time-shared scheduling policy behaves like the Round Robin scheduling algorithm [3]. In existing systems there are various types of scheduling algorithm, such as Multilevel Queue, Multilevel Feedback, Shortest Job First, Round Robin, First Come First Serve and Priority Queue. This paper analyses the First Come First Serve (FCFS), Shortest Job First (SJF) and Generalized Priority (GP) scheduling algorithms using performance parameters such as execution time. The analysis of existing scheduling algorithms considering execution time, throughput, resource utilization, makespan, waiting time and response time is described in the next section.
3 Related Work
In this section, work done on task scheduling is described. Two-level task scheduling based on load balancing was proposed by Sudha Sadhasivam [6]; it provides high resource utilization. An optimized task-scheduling algorithm based on genetic simulated annealing was proposed by G. Guo-Ning [7]; various evaluation parameters were considered for the QoS requirement, and in the genetic algorithm annealing is applied after selection, crossover and mutation. Rajkumar Rajavel [8] presented hierarchical scheduling, in which better response time is achieved by executing high-priority jobs first; high priority is estimated from the job completion time. Optimized task scheduling based on Activity Based Costing (ABC) was proposed by Q. Cao [9]; in this algorithm, a priority and user cost driver is assigned to each task, and the object cost and performance of the activity are measured by ABC. For allocating incoming jobs to virtual machines, Medhat A. [10] proposed an ant colony optimization in which a positive feedback mechanism is used. Monica
Gahlawat [11] analysed CPU scheduling algorithms in CloudSim and tested the performance of different scheduling policies. Priority-based resource allocation was proposed by Pawar C. [12]; various SLA parameters were considered for resource utilization, and it provides dynamic resource provisioning. M. Kumar [13] proposed a new algorithm that determines the makespan based on task priority.
In this section, the scheduling algorithms available in cloud computing are analysed. The major parameters considered for analysing the scheduling algorithms are task length, task deadline and resource utilization.
A priority-based job scheduling algorithm: This algorithm reduces an important performance parameter, the makespan. Priorities are considered for scheduling, and each job requests resources according to some priority. In addition to makespan, other performance parameters such as consistency and complexity are also considered [14].
A priority-based scheduling strategy for VM allocation: This algorithm maximizes the benefit to the service provider and improves resource utilization. It proposes scheduling virtual machines on the basis of priority; requests are ranked according to the profit they bring. With this approach it has been observed that the benefits can be increased [15].
Generalized Priority Based Algorithm (GP): This algorithm reduces the execution time required to complete the tasks. VM priorities are assigned according to their millions of instructions per second (MIPS), and tasks are prioritized according to their size and length. The highest-priority task is scheduled on the VM with the highest priority. Cloudlet size and priority are also considered as scheduling parameters [16].
Improved Priority based Job Scheduling Algorithm using Iterative Methods: This algorithm reduces the makespan. Because tasks differ in nature, the Analytic Hierarchy Process (AHP) is used for decision making and task execution [17].
Priority Based Earliest Deadline First Scheduling Algorithm: By combining two scheduling algorithms, earliest deadline first and priority-based scheduling, the average waiting time of tasks is reduced. The main focus of this algorithm is to improve resource allocation and reduce memory utilization [18].
Round Robin (RR): This algorithm improves response time and resource utilization. It maintains a queue of jobs, each of which is given the same execution time, and executes them one after another; if a job is not completed, it is placed back in the queue and waits for its next turn. The drawback of round robin is that the largest job takes more time to complete [19].
Modified Round Robin: This algorithm reduces response time. It is based on divisible load-balancing theory, in which a master–slave relationship is maintained. Jobs are subdivided into smaller jobs by the master processor and the VMs are initialized; the smaller jobs are assigned to VMs for execution. After execution, jobs are dispatched to the user and new jobs are assigned to the VMs [20].
First Come First Serve (FCFS): This algorithm improves scalability and reliability. Jobs are executed in order of their arrival time [21].
Modified First Come First Serve: This algorithm improves response time, throughput and resource utilization. It is based on two-level scheduling with foreground VMs and background VMs: VMs are scheduled in the foreground on the basis of FCFS and in the background on the basis of SJF. If the current allocation of a process is less than the threshold, a new process is accommodated [22].
Shortest Job First (SJF): This algorithm improves resource utilization, throughput and response time. The load is managed by checking the length of each request, and the smallest request is executed first; the smallest job has the highest priority in SJF [23].
Opportunistic Load Balancing Algorithm: This algorithm improves resource utilization and performance. It is a static load-balancing algorithm in which the current workload of the VMs is not considered, and unexecuted tasks can be handled in random order [22].
From the above analysis, it is observed that execution time and throughput are not handled by many of these algorithms. Therefore, this paper presents an experimental analysis of three algorithms, FCFS, SJF and GP, using execution time as the evaluation parameter. A simplified sketch of how the three policies differ is given below.
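The following is a toy model, an assumption made for illustration rather than the CloudSim implementation used in Sect. 5: task lengths are in million instructions (MI), VM capacities in MIPS, tasks are ordered by the policy and assigned round-robin, and per-task execution time is length divided by the MIPS of its VM.

```python
def execution_times(task_lengths_mi, vm_mips, policy):
    """Toy per-task execution times (ms) under FCFS, SJF and GP ordering."""
    vms = sorted(vm_mips, reverse=True) if policy == "GP" else list(vm_mips)
    if policy == "SJF":
        tasks = sorted(task_lengths_mi)                  # shortest job first
    elif policy == "GP":
        tasks = sorted(task_lengths_mi, reverse=True)    # longest job to fastest VM
    else:
        tasks = list(task_lengths_mi)                    # FCFS: arrival order
    return [length / vms[i % len(vms)] * 1000.0
            for i, length in enumerate(tasks)]

lengths = [4000, 1000, 6000, 2500]   # hypothetical cloudlet lengths (MI)
mips = [1000, 500, 250]              # hypothetical VM capacities (MIPS)
for p in ("FCFS", "SJF", "GP"):
    print(p, execution_times(lengths, mips, p))
```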
5 Experimental Setup
This section provides details of the simulation procedure used to compare the three algorithms, FCFS, SJF and GP. The algorithms are compared for both homogeneous and heterogeneous tasks. The simulator used for the implementation is CloudSim 3.0.3. In CloudSim, tasks are modelled as cloudlets and nodes as virtual machines (VMs). The sets of parameters varied in the simulation for virtual machines, datacentres and cloudlets are shown in Table 1.
6 Performance Analysis
Case 1: In this simulation, the number of VMs was set to 15, 25 and 45 for all algorithms, and the number of cloudlets was changed to 10, 30 and 60. Figure 1a–c shows a comparison of the execution time required by the three algorithms as the number of cloudlets varies for different numbers of VMs.
For the FCFS algorithm, for all VM counts, execution time increases as the number of cloudlets increases (Fig. 1a). It is also observed that for 15 VMs the execution time of FCFS increases across the three cloudlet counts, while for 25 and 45 VMs it is almost the same at all three cloudlet counts. For Shortest Job First, for all VM counts, execution time also increases with the number of cloudlets (Fig. 1c); for 15 VMs the execution time of SJF increases across the three cloudlet counts, and no variation is observed for 25 and 45 VMs. For the Generalized Priority algorithm with 15 VMs, the execution time is higher at lower numbers of cloudlets and decreases at higher numbers (Fig. 1b); with 25 VMs the observed execution time is higher than with 45 VMs.
Case 2: Here, for all the algorithms, the number of VMs was fixed at 20 and the number of cloudlets was varied over 20, 40, 60, 80 and 100. It is observed that for all three algorithms the execution time increases as the number of cloudlets increases.
Case 3: In this case, the variation in execution time of the three algorithms at 20 and 100 cloudlets is presented. Figure 2a and b shows the change in execution time for the three algorithms at these cloudlet counts.
Fig. 2(a). Comparison for 20 cloudlets Fig. 2(b). Comparison for 100 cloudlets
Of the three algorithms, the Generalized Priority algorithm has the least execution time (ms) over the whole range of cloudlets (20 to 100), because tasks are executed according to the priority assigned to the jobs. The Shortest Job First algorithm has the highest execution time (ms) over the whole range, as only the length of the jobs is considered and not their priority. The execution time of First Come First Serve lies between the other two, because jobs are executed according to their arrival time rather than their length or priority.
Case 4: This case compares the three algorithms as the nature of the cloudlets changes between homogeneous and heterogeneous. Figure 3a and b shows execution time versus number of cloudlets for homogeneous and heterogeneous tasks.
Fig. 3(a). For homogeneous tasks Fig. 3(b). For heterogeneous tasks
It is observed that for homogeneous tasks FCFS has the least execution time and SJF the highest, while GP lies in between. For heterogeneous tasks, GP has the least execution time (ms) and SJF the highest, while FCFS lies in between.
7 Conclusion
The three algorithms, FCFS, SJF and GP, were compared considering variation in both VMs and cloudlets, a constant number of VMs with increasing cloudlets, and varying task nature. The comparison shows that, as the number of cloudlets increases, the execution time increases for all three algorithms. The Generalized Priority (GP) algorithm has a lower execution time than both First Come First Serve (FCFS) and Shortest Job First (SJF). For homogeneous tasks the execution time of FCFS is lower, and for heterogeneous tasks the execution time of GP is lower. Homogeneous tasks require less execution time than heterogeneous tasks.
References
1. Ghomi EJ, Rahmani AM, Qader NN (2017) Load- balancing algorithms in cloud computing:
a survey. J Netw Comput Appl 88:50–71
2. Agarwal A, Jain S (2014) Efficient optimal algorithm of task scheduling in cloud computing
environment. Int J Comput Trends and Technol (IJCTT) 9
3. Manglani V, Jainv A, Prasad V (2017) Task scheduling in cloud computing. Int J Adv Res
Comput Sci 8
4. Almezeini N, Hafex A (2018) Review on scheduling in cloud computing. Int J Comput Sci
Netw Secur 18
5. Kumar M, Sharma SC (2017) Dynamic load balancing algorithm for balancing the workload
among virtual machine in cloud computing, Cochin: 7th international conference on
advances in computing and communication
6. Sadhasivam S, Jayarani R, Nagaveni N, Ram RV (2009) Design and implementation of an
efficient two-level scheduler for cloud computing environment. Proceeding of international
conference on advances in recent technology in communication and computing
7. Guo-Ning G, Ting-Lei H (2010) Genetic simulated annealing algorithm for task scheduling
based on cloud computing environment. In: Proceedings of international conference on
intelligent computing and integrated systems, pp 60–63
8. Rajavel R, Mala T (2012) Achieving service level agreement in cloud environment using job
prioritization in hierarchical scheduling. Proceeding of international conference on
information system design and intelligent application 132, pp 547–554
9. Cao Q, Gong W, Wei Z (2009) An optimized algorithm for task scheduling based on activity
based costing in cloud computing. In: Proceedings of third international conference on
bioinformatics and biomedical engineering
10. Tawfeek MA, El-Sisi A, Keshk AE, Torkey FA (2013) Cloud task scheduling based on ant
colony optimization. In: Proceeding of IEEE international conference on computer
engineering & systems (ICCES)
11. Gahlawat M, Sharma P (2013) Analysis and performance assessment of CPU scheduling
algorithm in cloud sim. Int J Appl Inf Syst 5(9)
12. Pawar, CS, Wagh, RB (2012) Priority based dynamic resource allocation in cloud
computing. International symposium on cloud and services computing. IEEE
13. Kumar M, Sharma SC (2016) Priority aware longest job first algorithm for utilization of the
resource in cloud environment. 3rd international conference on. IEEE
14. Ghanbari S, Othman M (2012) A priority-based job scheduling algorithm in cloud
computing. Procedia engineering 50
15. Xiao J, Wang Z (2012) A priority based scheduling strategy for virtual machine allocation in
cloud computing environment. Cloud and service computing (CSC) international conference
on IEEE
16. Patel SJ, Bhoi UR (2014) Improved priority based job scheduling algorithm in cloud
computing using iterative method. Fourth international conference on advances in
computing and communication (ICACC). IEEE
17. Gupta G (2014) A simulation of priority based earliest deadline first scheduling for cloud
computing system. First international conference on networks and soft computing (ICNSC).
IEEE
18. Khan DH, Kapgate D, Prasad PS (2013) A review on virtual machine management
techniques & scheduling in cloud computing. Int J Adv Res Comput Sci Softw Eng 3
19. Yeboah A, Abilimi CA. Utilizing divisible load sharing theorem in round robin algorithm for
load balancing in cloud environment. IISTE J Comput Eng Intell Syst 6
20. Shokripour A, Mohamed O (2012) New method for scheduling heterogeneous multi-
installment systems. Future Gener Comput 28
21. Liu X, Chen B, Qiu X, Cai Y, Huang K (2012) Scheduling parallel jobs using migration and
consolidation in the cloud. Scheduling parallel jobs using migration and consolidation in the
cloud
22. Kaur S, Kinger S (2014) A survey of resource scheduling algorithm in green computing.
Int J Comput Sci Inf Technol 5
23. Hung CL, Wang HH, Hu YC (2012) Efficient load balancing algorithm for cloud computing.
IEEE 9
Safety Assessment with SMS Approach
for Software-Intensive Critical Systems
1 Introduction
Systems that are prone to incidents and accidents triggered by errors intrinsic or external to the software that controls or drives them are regarded as software-intensive critical systems. Such systems, also called safety-critical computer systems, e.g. biomedical devices, airplanes and nuclear power plants, are expected, when controlled by software, to function in a safe manner even in the presence of single or multiple, simple or complex failures, because human lives as well as significant financial assets are involved. Any error, failure or mistake resulting in a malfunction can end in catastrophic accidents whose consequences span life, the environment and property.
Software is harmless in isolation; however, it is applied across varying domains, in most industries, to control, operate and monitor critical activities where safety plays an important role. The problem at this juncture is how safety pertaining to software can be identified, measured and scaled, since software has an indirect effect on the overall safety of the system; practicing safety professionals make efforts to cope with such scenarios, but the contribution of software to safety receives little consideration. The software presently used in computer systems has become so complex that it cannot be fully relied upon and has caused human injury and death as a result [1, 3]. With the expanding reliance on software to realize complex functions in modern aeronautic systems, software has become the main determinant of system reliability and safety [13]. In general, software-intensive critical systems are applications in which failure can result in serious injury, significant property damage, or damage to the environment. The design of these systems should fulfill the intended functional requirements as well as the non-functional requirements that define qualities of a system such as safety, reliability and execution time. System safety represents the principal non-functional requirement for a software-intensive embedded system. It is defined by the MIL-STD-882D standard [2] as “freedom from those conditions which can cause loss of life, harm, loss of equipment, or damage to the environment”. Therefore, several design techniques, concepts, safety strategies and standards have been proposed and used to cover the development life cycle and to improve the non-functional requirements of such safety-critical systems [12]. In managing safety problems, the world community has worked on, and continues to work on, definitions, techniques, tools, guidelines, methodologies and standards in order to satisfy the demand for more advanced systems such as software-controlled medical systems, weapon systems and aircraft systems. With the steadily increasing penetration of IT into business and service sectors, the number of critical systems is increasing and there is more demand for safer systems [4, 5]. Usually, safety analysis techniques rely entirely on the skill and experience of the safety engineer. Software safety analysis can be done in various ways; the two most common fault modeling techniques are Failure Mode and Effects Analysis (FMEA) and Software Failure Mode and Effects Analysis (SFMEA) [3]. These techniques are means of finding problems and of constructing plans to deal with failures, as in probabilistic risk assessment. FMEA is, in fact, an error-prone technique, as “to err is human”, and has limitations in investigating safety issues [6, 16]. FMEA and SFMEA may be used standalone, without a combination of other safety techniques, for successful and complete analysis of safety-critical systems [8, 14, 15]. By applying the SFMEA procedure across the various phases of a product's lifecycle, the approach offers an efficient system for examining all the routes by which software can fail. SFMEA is a commonly used methodology to improve the safety of software [11]. In applying an SFMEA, analysts gather lists of module failure modes and try to deduce the effects of these failure modes on the system [7]. Experimental platforms are key components for the practical validation of hypotheses and theories in control systems; one of their main contributions is that they are the main tool for technological transfer and for the innovation process. The SFMEA process has three main focuses: the recognition and assessment of potential failures
and their effects. It is used to identify potential design shortcomings so that they can be qualified in the early phases of a design program. The OSEL annual report [9] describes software failures. Early research presented restricted experimental ideas and delineated mostly theoretical concepts. This paper describes how safety is assessed for software-intensive critical systems.
The remaining sections of this paper are organized as follows. Section 2 discusses the proposed Safety Management System (SMS). The experimental setup and implementation are described in Sect. 3. Section 4 gives the conclusion.
[Figure: Safety Management System (SMS) structure; the safety assessment system comprises performance monitoring, internal safety assessment and management review.]
The goal of this research work is to demonstrate the use of SFMEA for an embedded control system through the development of an experiment with a laboratory prototype. Goddard [10] discussed the methodology for SFMEA, including recognizing the types of variables and their failure modes. Experimentation of the proposed procedure is based on the Ball Position Controller System (BPCS), which is shown in Fig. 2. The control objective of this work is to regulate the flow of air into a plastic tube so as to keep a small lightweight ball suspended at a predetermined height called the set-point: increasing the flow raises the ball and decreasing the flow lowers it. The BPCS experiment comprises a 2-foot-long white plastic tube, a lightweight ball, a DC motor fan, an infrared sensor circuit and an 89S52 microcontroller. The vertical 2-foot-long clear plastic tube is attached to a stand and contains the lightweight ball, with the DC motor fan at the base to lift the ball and the infrared sensor at the top to sense the ball's height. The tube is connected to the DC motor fan inlets by means of an input manifold, which has an inlet at the bottom as shown. There is an output manifold at the top of the plastic tube with an outlet as shown. The presence of the manifolds is a key part of the experiment.
The infrared sensor hardware identifies the position of the lightweight ball, and the microcontroller regulates the power supply connected to the DC motor fan so as to control the air flow into the white plastic tube, keeping the lightweight ball at the desired height.
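As an illustration of the closed loop just described, the sketch below shows a minimal proportional controller in Python that nudges the fan power toward the set-point from the sensed ball height; the function name, gains and signal ranges are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of the BPCS control cycle: read height, adjust fan power (hypothetical gains).
def control_step(height_reading: float, set_point: float,
                 kp: float = 0.8, base_power: float = 0.5) -> float:
    """Return the fan power command in [0, 1] for one control cycle."""
    error = set_point - height_reading        # positive when the ball is below the set-point
    power = base_power + kp * error           # proportional correction around a nominal power
    return max(0.0, min(1.0, power))          # clamp to the actuator's range

# Example: ball sensed at 0.30 m with a 0.45 m set-point raises the fan power above nominal.
print(control_step(height_reading=0.30, set_point=0.45))
```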
BPCS Explanation
The lightweight ball position system experiment is composed of five modules, one of which includes a DC motor fan that blows air into the white plastic tube, moving a polystyrene lightweight ball inside it. A diagram of the BPCS system appears in Fig. 3. Each module is coupled with the others through a common assembly. The bottom box corresponds to the input manifold. The air flows into the manifold through the single input located at the left side of the box. The air in the input manifold is distributed over each module in parallel. Depending on the force applied by the DC motor fan and the input side of the manifold, the air flux output and its direction move the ball inside the tube. The air from the plastic tube is collected again in the output manifold and expelled through the output, in the right half of the box. This reconfigurable structure has input and output manifolds in individual boxes that can be connected together, by design, like building blocks. The BPCS model involves an energy exchange, via the air flow, from the DC motor fan to the lightweight ball; this exchange is often nonlinear.
When the signal is zero, the DC motor fan speed decreases and the fan stops. This results in the ball falling to the bottom of the plastic tube. If there is no output, the effect is the same as if the signal is low: the system loses response, stops, and the ball falls to the bottom of the plastic tube. The functional SFMEA considers the failure modes, the component effect, the associated hazard effect and the category formed by the software [17]. The monitoring section recommends installing a redundant sensor to observe the ball's location and restart the system.
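For illustration, the sketch below records one of the failure modes discussed above in a small SFMEA-style worksheet and ranks it with a conventional risk priority number (severity x occurrence x detection); the field names, scales and example values are assumptions for illustration, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class SfmeaEntry:
    """One row of an SFMEA-style worksheet (illustrative fields and scales)."""
    module: str
    failure_mode: str
    local_effect: str
    system_hazard: str
    severity: int      # 1 (negligible) .. 10 (catastrophic), assumed scale
    occurrence: int    # 1 (remote) .. 10 (frequent), assumed scale
    detection: int     # 1 (certain to detect) .. 10 (undetectable), assumed scale

    def rpn(self) -> int:
        # Conventional Risk Priority Number used to rank failure modes for attention
        return self.severity * self.occurrence * self.detection

# Hypothetical entry for the sensor-signal failure described above
entry = SfmeaEntry(
    module="infrared sensor",
    failure_mode="output signal stuck at zero",
    local_effect="fan speed decreases and the fan stops",
    system_hazard="ball falls to the bottom of the tube",
    severity=6, occurrence=4, detection=5)
print(entry.rpn())   # higher values are addressed first
```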
4 Conclusions
There has been continuous research in this arena aimed at building a completely safe system, so far in vain. Alternatively, it is entirely possible to bring the behavior of software-intensive systems to function within acceptable risk limits. A rigorous safety assessment, safety analysis and effective risk management would be advantageous. This paper mainly focuses on techniques available for safety analysis and risk reduction in software-intensive critical systems. An attempt is made to develop an approach to the safety assessment of software based on the model techniques prevalent in the literature, such as Software Failure Modes and Effects Analysis (SFMEA), together with risk assessment to implement safety. The proposed system would be helpful for a resilient, straightforward safety assessment of software and usable by industrial safety practitioners in the field of software-intensive critical systems and applications.
References
1. Bowen O, Stavridou V (1982) Safety critical systems, formal methods and standards. Softw
Eng J
2. MIL-STD-882D (2000)
3. Haider AA, Nadeem A (2013) A survey of safety analysis techniques for safety critical
systems. Int J Future Comput Commun 2(2)
4. Leveson NG (1986) Software safety: why, what and how. Comput Surv 18(2)
5. Sindre G (2007) A look at misuse cases for security concerns. In: Proceedings of Henderson
Sellers, IFIP WG8.1 working conference on situational method engineering: fundamental of
experiences (ME 07). Geneva, Switzerland, IFIP Series, Heidelberg–Springer
6. Dev R (1990) Software system failure mode and effects analysis (SSFMEA) - a tool for
reliability growth. In: Proceedings of the international symposium on reliability and
maintainability
7. Jayasri K, Ramaiah PS (2016) An experimental safety analysis using SFMEA for a small
embedded computer control system. Int J Innovations in Eng Technol (IJIET) 7(3):342–351
8. Sundararajan A, Selvarani R (2012) Case study of failure analysis techniques for safety
critical systems. Advances in Comput Sci Eng Appl AISC 166:367–377. springerlink.com.
Springer-Verlag Berlin Heidelberg
9. Office of Science and Engineering Laboratories (OSEL) (2011)
10. Goddard PL (2000) Software FMEA techniques. Proc Ann Reliab Maintainability Symp,
pp 118–123
11. Jayasri K, Seetharamaiah P (2016) The quantitative safety assessment and evaluation for
safety-critical computer systems. ACM SIGSOFT software engineering notes. ACM New
York, vol. 41, no. 1, pp 1–8
12. Jayasri K, Venkata Ramana A (2017) The research framework for quantitative safety
assessment for safety-critical computer systems. Indian J Sci Technol 10(9)
13. Huang F, Liu B (2017) Software defect prevention based on human error theories. Chinese J
Aeronaut 39(3):1054–1070
14. Price CJ, Taylor N (2002) Automated multiple failure FMEA. Reliab Eng Syst Saf 76:1–10
15. Bruns G, Anderson S (1993) Validating safety models with fault trees. In: Proc. of 12th
international conference on computer safety, reliability, and security. Springer-Verlag,
pp 21–30
16. McDermott RE, Mikulak RJ, Beauregard MR (1996) The basics of FMEA, quality resources
17. Bowles JB, Wan C (2001) Software failure modes and effects analysis for a small embedded
control system, annual reliability and maintainability symposium. Proceedings of interna-
tional symposium on product quality and integrity, cat. no. 01CH37179
Lung Cancer Detection with FPCM
and Watershed Segmentation Algorithms
Abstract. Lung cancer leads the causes of disease-related deaths around the world; WHO data showed 1.69 million deaths in 2015. An early diagnosis can enhance the effectiveness of therapy and also improves the patient's chance of survival. The precision of diagnosis, the speed and the level of automation decide the success of CAD systems. In this paper, we worked with a few existing frameworks and identified the best methodology for the detection of tumors. This article discusses segmentation with the FPCM and Watershed Transform algorithms. Computer-aided diagnosis includes six stages: a. Image Acquisition, b. Image Pre-processing, c. Lung Region Extraction, d. Segmentation, e. Feature Extraction and f. Classification. First, the RGB image is converted to grayscale so that image noise is further removed from the original image. The next essential task is segmentation, which is performed using the Watershed Transform model; the watershed method operates on the grayscale image. After segmentation, feature extraction is carried out using the mean of the segmented lung region, and finally lung nodules are classified with the help of an SVM. Using this method we achieved an accuracy of 99% and a processing time of less than 2 s. The proposed frameworks were implemented in MATLAB.
1 Introduction
Cancer is a disease of the cells, which are said to be the body's basic building blocks. The body constantly makes new cells to grow, replace worn-out tissue and heal wounds; normally, cells multiply and die in an orderly way [8]. In some instances cells do not grow, divide and die in the standard way. This may also cause blood or lymph fluid in the body to become abnormal, or form a lump called a tumor. A tumor can turn out to be benign or malignant. The cancer that first develops in a body tissue or organ is referred to as the primary cancer [10].
To overcome these problems, the authors suggested a CAD system for processing the lung region [6]. This study first performs image preparation with the following eight methods: a. Bit-Plane Slicing, b. Erosion, c. Median Filter, d. Custom Filter, e. Dilation, f. Outlining, g. Lung Border Extraction and h. Flood-Fill algorithms for extraction of the lung region. For segmentation, the Fuzzy Possibilistic C-Means (FPCM) algorithm is used, and a Support Vector Machine (SVM) is used for classification.
Watershed segmentation is another segmentation framework, used to extract the boundaries of interest, in this case the tumor, from the given MRI image. This segmentation technique is very useful for separating objects even when they touch each other. The algorithm helps in finding the catchment basins and ridge lines in the image; in this representation, the ridge line represents the height that separates two catchment basins. For this, the bwdist() function computes the distance from each pixel to the nearest non-zero pixel, and the watershed transform is computed with the watershed() function.
The term watershed refers to a ridge that divides areas drained by different river systems; a catchment basin is the ground area draining into a river or reservoir. Computer analysis of image objects starts with finding them, that is, making sense of which pixels belong to each object. This is known as image segmentation: the operation of separating objects from the background and from each other.
2 Literature Review
Armato et al. (2001) proposed a prototype in which multiple gray-level thresholds are applied to the segmented lung volume to create a series of thresholded lung volumes. An 18-point connectivity scheme is used to identify contiguous 3D structures within each thresholded lung volume, and the structures that satisfy a volume criterion are selected as initial lung nodule candidates. The automated system reported an overall nodule detection sensitivity of 70% with an average of 1.5 false-positive detections per section when applied to the complete 43-case database [1].
Armatur et al. (1992) noted that lung cancers are among the most common cancer types, causing a high mortality rate. The best protection against lung cancer, particularly carcinomas, is its early detection and prediction. Detecting lung cancer at an early stage is a hard problem because of the structure of the diseased cells, where most of the cells overlap one another. Here, histogram equalization is used for pre-processing of the images and for the feature extraction path, and a Support Vector Machine classifier is used to test the condition of a patient at an early stage, whether normal or abnormal [2].
Cheran et al. (2005) proposed a model noting that a pulmonary nodule is the most common sign of lung cancer. Lung nodules are roughly round regions of relatively higher density that are visible in X-ray images of the lung. Large nodules (commonly described as more than 1 cm in diameter) can be easily detected with conventional imaging equipment and can be diagnosed by needle biopsy or bronchoscopy techniques [3].
Image Acquisition: To diagnose a disease, images of internal slices of the body are first required. A CT scan, also called X-ray computed tomography, makes use of X-rays to capture images from different angles and combines them to form cross-sectional tomographic images of specific regions of the scanned tissue, i.e. it enables one to see the state inside the body without invasive procedures (Fig. 1).
Lung Region Extraction: The initial stage is the extraction of the lung region from the CT scan image. Basic image processing techniques are applied for this purpose. The image processing techniques used in the proposed approach are: a. Bit-Plane Slicing, b. Erosion, c. Median Filter and d. Dilation (a small sketch of these steps follows). The basic objectives of this step are: i. identification of the lung region and ii. CT scan Regions of Interest (ROIs).
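The following minimal sketch, assuming SciPy is available, strings together the pre-processing steps named above (bit-plane slicing, median filter, erosion and dilation) on an 8-bit grayscale slice; the kernel sizes, threshold and synthetic input are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy import ndimage as ndi

def preprocess_ct_slice(img: np.ndarray) -> np.ndarray:
    """Illustrative pre-processing of an 8-bit grayscale CT slice."""
    msb_plane = ((img >> 7) & 1).astype(bool)        # bit-plane slicing: keep the most significant bit
    smoothed = ndi.median_filter(img, size=3)        # median filter suppresses speckle noise
    mask = smoothed > 128                            # rough foreground mask (threshold is illustrative)
    mask = ndi.binary_erosion(mask, iterations=2)    # erosion removes thin artefacts
    mask = ndi.binary_dilation(mask, iterations=2)   # dilation restores the lung border
    return mask & msb_plane

# Usage with a synthetic slice standing in for a real CT image
slice_ = (np.random.rand(64, 64) * 255).astype(np.uint8)
print(preprocess_ct_slice(slice_).sum(), "foreground pixels")
```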
Lung Region Segmentation: After the lung region is recognized, the next step is segmentation of the lung region in order to find the cancer nodules. This step determines the Regions of Interest (ROIs) that help locate the cancerous region. In this work the following two algorithms are applied: a. Fuzzy Possibilistic C-Means (FPCM) and b. Watershed Transformation [12].
1. FPCM (Fuzzy Possibilistic C-Means): This is a clustering algorithm that combines the characteristics of fuzzy and possibilistic C-means. The objective function of FPCM, depending on both memberships and typicalities, can be expressed as:
$$J_{FPCM}(U,T,V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right) d^{2}(X_j, V_i) \qquad (1)$$

Constraints:

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j \in \{1, \ldots, n\} \qquad (2)$$

$$\sum_{j=1}^{n} t_{ij} = 1, \quad \forall i \in \{1, \ldots, c\} \qquad (3)$$

A solution of the objective function can be obtained via an iterative procedure in which the degree of membership is

$$u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{d^{2}(X_j, V_i)}{d^{2}(X_j, V_k)} \right)^{\frac{1}{m-1}} \right]^{-1}$$

and the typicality is

$$t_{ij} = \left[ \sum_{k=1}^{n} \left( \frac{d^{2}(X_j, V_i)}{d^{2}(X_k, V_i)} \right)^{\frac{1}{\eta-1}} \right]^{-1}$$
FPCM produces memberships and possibilities (typicalities) simultaneously, along with the usual point prototypes or cluster centers for every cluster, and automatically avoids various problems of its constituent methods. Essentially, this method is a hybridization of two methods: a. Possibilistic C-Means (PCM) and b. Fuzzy C-Means (FCM) [9]. A minimal implementation sketch is given below.
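A compact NumPy sketch of the FPCM iteration implied by Eq. (1) is shown below; the fuzzifier m, the typicality exponent, the cluster count and the toy data are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def fpcm(X, c=2, m=2.0, eta=2.0, n_iter=50, seed=0):
    """Minimal FPCM sketch for Eq. (1): returns centers V, memberships U (c x n)
    and typicalities T (c x n) for data X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=c, replace=False)]                       # initial cluster centers
    for _ in range(n_iter):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12   # squared distances, shape (c, n)
        # membership of sample j in cluster i, normalized over clusters (constraint (2))
        U = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1))).sum(axis=1)
        # typicality of sample j for cluster i, normalized over samples (constraint (3))
        T = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (eta - 1))).sum(axis=2)
        W = U ** m + T ** eta                                          # combined weights from Eq. (1)
        V = (W @ X) / W.sum(axis=1, keepdims=True)                     # weighted center update
    return V, U, T

# Toy usage: two separated blobs stand in for feature vectors of a segmented slice
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
centers, U, T = fpcm(X, c=2)
print(np.round(centers, 2))
```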
2. Watershed Transformation: The watershed transform is a common procedure for image segmentation. Using prior knowledge has demonstrated strong improvements to medical image segmentation algorithms, and we propose a scheme for improving watershed segmentation by using prior shape and appearance information. Internal markers are used to obtain the watershed lines of the gradient of the image to be segmented, and the obtained watershed lines are then used as external markers. Every region delineated by the external markers contains one internal marker and a part of the background; watershed regions without markers are not allowed (Fig. 2). A marker-based sketch is given below.
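The sketch below, assuming SciPy and scikit-image are available, shows marker-controlled watershed segmentation in the spirit described above; distance_transform_edt plays the role of MATLAB's bwdist(), and the synthetic mask of two touching discs is purely illustrative.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def watershed_labels(mask: np.ndarray) -> np.ndarray:
    """Marker-controlled watershed on a binary mask (sketch)."""
    distance = ndi.distance_transform_edt(mask)                                 # analogue of MATLAB bwdist()
    coords = peak_local_max(distance, labels=mask.astype(int), min_distance=5)  # internal markers
    marker_mask = np.zeros_like(mask, dtype=bool)
    marker_mask[tuple(coords.T)] = True
    markers, _ = ndi.label(marker_mask)                                         # label each marker
    return watershed(-distance, markers, mask=mask)                             # flood from markers over -distance

# Toy usage: two touching discs are separated into two labelled regions
yy, xx = np.mgrid[:80, :80]
mask = ((xx - 28) ** 2 + (yy - 40) ** 2 < 180) | ((xx - 52) ** 2 + (yy - 40) ** 2 < 180)
print(np.unique(watershed_labels(mask)))
```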
SVM for classification: SVM is a valuable methodology for data classification. Even though Neural Networks are often considered easier to use, they sometimes give unsatisfactory results. The purpose of SVM is to build a model that predicts the target value of the data instances in the test set for which only the attributes are known [5]. Classification in SVM is an instance of supervised learning: a step in SVM classification involves labelled examples that are associated with known classes. Feature selection and SVM classification together are useful even when prediction of unknown samples is no longer the main goal; they can be used to identify the key features that are involved in how the classifier separates the classes.
Feature extraction: The features used in this study in order to build the diagnosis rules are:
a. Candidate region area
b. Candidate region mean intensity value
c. Candidate region area
d. Elimination of isolated pixels
The mean intensity value of a candidate region: for this feature, the mean intensity value of the candidate region is determined, which allows rejecting candidate regions that can never contain a malignant nodule. The mean intensity value represents the average intensity of all the pixels that belong to the same region and is determined using:

$$\text{Mean}(j) = \frac{\sum_{i=1}^{n} \text{intensity}(i)}{n} \qquad (7)$$

where j denotes the region index and ranges from 1 to the total number of candidate regions in the whole image, intensity(i) denotes the CT intensity value of pixel i, which ranges from 1 to n, and n is the total number of pixels belonging to region j. A small sketch combining this feature with SVM classification is given below.
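The following minimal sketch, assuming SciPy and scikit-learn, computes the per-region area and the mean-intensity feature of Eq. (7) and feeds them to an SVM classifier; the synthetic slice, label image and ground-truth labels are placeholders, not data from the paper.

```python
import numpy as np
from scipy import ndimage as ndi
from sklearn.svm import SVC

def region_features(ct_slice: np.ndarray, region_labels: np.ndarray) -> np.ndarray:
    """Per-candidate-region features: area and mean intensity (Eq. 7)."""
    ids = [r for r in np.unique(region_labels) if r != 0]              # 0 is background
    area = ndi.sum_labels(np.ones_like(ct_slice), region_labels, ids)  # pixel count per region
    mean_int = ndi.mean(ct_slice, region_labels, ids)                  # Eq. (7) evaluated per region
    return np.column_stack([area, mean_int])

# Synthetic stand-ins for a CT slice, its segmented candidate regions and their labels
ct = np.random.randint(0, 255, (64, 64))
labels = np.random.randint(0, 4, (64, 64))            # three candidate regions, label 0 = background
X = region_features(ct, labels)
y = np.array([0, 1, 0])                               # hypothetical nodule / non-nodule ground truth
clf = SVC(kernel="rbf").fit(X, y)                     # SVM classifier, as used for the final decision
print(clf.predict(X))
```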
In this paper, we have implemented two algorithms, FPCM (Fuzzy Possibilistic C-Means) and the Watershed Transformation. We observed that watershed provides better outcomes than FPCM in terms of accuracy, mean value of the segmented region, detection of lung cancer tumors, processing time and speed of execution. FPCM takes a long time to execute, over 30 s, while watershed takes under 2 s to execute the entire procedure. FPCM is not precise since it has a high false positive rate and low sensitivity, whereas watershed delivers more precise outcomes. We have investigated 50 CT images, and proper assessment showed that watershed is the most suitable method for detecting lung nodules at an early stage (Figs. 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13).
Figs. 5, 6 and 7. Mean value and accuracy for the segmented image by FPCM and SVM training.
Figs. 9, 10 and 11. Mean value and accuracy for the watershed-segmented image and SVM training.
Figs. 12 and 13. Cancer cells (nodules) detected using the FPCM and watershed algorithms.
5 Conclusion
In this article, we propose the most suitable technique to detect lung cancer. The first step in recognizing lung cancer is applying filters to the CT scan images. Next, once the lung region is identified, segmentation is carried out with the help of the Fuzzy Possibilistic C-Means clustering and watershed transform algorithms. Features are then extracted and diagnosis rules are generated, which are handled with the aid of a Support Vector Machine. Watershed is the most suitable process for the detection of lung tumors in the early stages. Moreover, the Support Vector Machine improves the precision of the classification and handles most cases. The proposed technique can also be applied to other cancer types such as breast cancer, skin cancer and so on. Additionally, it finds applications in medical research as well.
References
1. Armato SG, Giger ML, MacMahon H (2001) Automated detection of lung nodules in CT
scans: preliminary results. Med. Phys. 28(8):1552–1561. http://www.ncbi.nlm.nih.gov/
pubmed/11548926
2. Armatur SC, Piraino D, Takefuji Y (1992) Optimization neural networks for the
segmentation of magnetic resonance images. IEEE Trans. Med. Image 11(2):215–220.
http://www.neuro.sfc.keio.ac.jp/publications/pdf/sundar.pdf
3. Cheran SC, Gargano G (2005) Computer aided diagnosis for lung CT using artificial life
models. Proceeding of the seventh international symposium on symbolic and numeric
algorithms for scientific computing, Sept 25–29, IEEE Computer Society, Romania. ISBN:
0-7695-2453-2, pp 329–332
4. Wiemker R, Rogalla P, Zwartkruis R, Blaffert T (2002) Computer aided lung nodule
detection on high resolution CT data. Med. Image. Proc. SPIE, vol. 4684, pp 677–688. http://
adsabs.harvard.edu/abs/2002SPIE.4684.677W
5. Gomathi M, Thangaraj P (2010) A computer aided diagnosis system for lung cancer
detection using support vector machine. Am. J. Appl. Sci. 7(12):1532–1538, ISSN 1546-
9239
6. El-Baz A, Gimel’farb G, Falk R, El-Ghar MA (2007) A new CAD system for early diagnosis
of detected lung nodules. Proceeding of the IEEE international conference on image
processing, Sept 16-Oct 19. San Antonio, TX. ISSN: 1522-4880, ISBN: 978-1-4244-1436-
9, pp 461–464
7. Fiebich M, Wormanns D, Heindel W (2001) Improvement of method for computer-
assisted detection of pulmonary nodules in CT of the chest. Proc. SPIE Med. Image Conf.
4322:702–709
8. Ginneken BV, Romeny BM, Viergever MA (2001) Computer-aided diagnosis in chest
radiography: a survey. IEEE Trans. Med. Imaging 20(12):1228–1241, ISSN: 0278-0062
9. Gomathi M, Thangaraj P (2010) A new approach to lung image segmentation using fuzzy
possibilistic C-means algorithm. IJCSIS 7(3):222–228, ISSN: 1947 5500
10. Gurcan MN, Sahiner B, Petrick N, Chan H, Kazerooni EA et al (2002) Lung nodule
detection on thoracic computed tomography images: preliminary evaluation of a computer-
aided diagnosis system. Med. Phys. 29(11):2552–2558. http://www.ncbi.nlm.nih.gov/
pubmed/12462722
11. Kanazawa K, Kawata Y, Niki N, Satoh H, Ohmatsu H et al (1998). Computer-aided
diagnosis for pulmonary nodules based on helical CT images. Comput. Med. Imaging
Graph. 22(2):157–167. http://www.ncbi.nlm.nih.gov/pubmed/9719856
12. Penedo MG, Carreira MJ, Mosquera A, Cabello D (1998) Computer-aided diagnosis: a
neural-network based approach to lung nodule detection. IEEE Trans. Med. Imaging 17
(6):872–880, ISSN: 0278-0062
Early Prediction of Non-communicable
Diseases Using Soft Computing Methodology
1 Introduction
Non-communicable diseases (NCDs) are the leading cause of death globally; nearly 68% of all deaths worldwide were due to them in 2016 [1]. An NCD is a non-infectious disease that does not spread directly from one person to another and persists for a long period without being cured. Many NCDs may not be curable and lead to death. However, with the advancement of technology and medical research, especially in the domain of NCD prediction and treatment, these diseases can be controlled or cured completely if they are detected at an early stage. When an NCD is recognized at an early stage, it is more likely to respond to effective treatment, which can result in a higher likelihood of survival, less morbidity and more affordable treatment costs. The quality of life of NCD patients can be significantly improved by providing adequate and effective treatment at the right time.
The five major categories of NCDs are cardiovascular diseases, different cancers, respiratory diseases, digestive diseases and diabetes. Cardiovascular diseases and cancers are the world's biggest killers, accounting for 44.84% of all deaths in 2016. These diseases have remained the leading causes of death globally over the last two decades. The respiratory disease death rate dropped approximately 30% and the digestive disease death rate increased 42% over the last 15 years. Even though the diabetes mortality rate was 4.29% in 2016, it increased by approximately 150% compared to the year 2002. This growth rate is notably important and precautions must be taken to control it; otherwise, diabetes could become the number one killer in 50 years (Table 1).
Cardiovascular diseases (CVDs) are a group of disorders affecting the heart and blood vessels. Ischemic heart disease and stroke are the major causes of death in the category of CVDs. Ischemic heart disease, also called coronary artery disease, occurs when the blood vessels that supply blood to the heart become narrowed, interrupting blood flow and leading to chest pain, heart failure, arrhythmias and death. A stroke occurs when an artery in the brain is blocked or leaks; as a result, brain cells die within minutes due to loss of oxygen and nutrients. A stroke is a medical emergency that requires immediate treatment to minimize brain damage and potential complications.
Cancer is a disease in which a few of the cells within the body become abnormal, divide exponentially without any control, form tumor cells and spread to nearby cells. Breast cancer in women and lung cancer in men are the most death-causing diseases in this category. In the past there was no outcome other than death for cancer patients. However, due to advancements in cancer research and medicine, cancer can be cured when recognized in the early development stage. Knowing the cancer stage is very important, as it helps doctors to fully understand the patient's condition and to work out the best possible treatment options.
Diabetes is one of the most prevalent chronic diseases around the world. Diabetes is a group of diseases that influence insulin production and usage. Insulin is a hormone produced by a gland called the pancreas; it balances blood sugar levels, keeping them in the narrow range that our body requires. Glucose, or sugar, is the source of energy for the functioning of body cells. Glucose comes from the food we eat, is absorbed into the blood, moves through the bloodstream and enters body cells with the help of insulin. Long-term diabetes includes type 1 and type 2 diabetes. In type 1 diabetes, the pancreas does not generate insulin; due to the lack of insulin, sugar is not transported into the body cells and builds up in the bloodstream. This results in higher sugar in the blood and can lead to serious health problems. In type 2 diabetes, the pancreas does not produce enough insulin, or insulin cannot be used effectively: body cells are resistant to insulin action and the pancreas is unable to produce enough insulin to compensate.
2 Review of Literature
Many studies have been done that focus on the prediction of non-communicable diseases. Researchers have applied different data mining and soft computing techniques for predicting NCDs. Khera et al. [2] developed a polygenic risk score to identify the likelihood of developing non-communicable diseases such as coronary artery disease, breast cancer and type 2 diabetes, well before any symptoms appear in a person, using genome
analysis. Cinetha et al. [3] proposed a decision support system for predicting coronary
heart disease using fuzzy rules generated from decision tree by clustering the dataset.
This system compares normal and coronary heart disease patients to predict the possi-
bility of heart disease in a normal patient for the next ten years. Anderson et al. [4]
developed Framingham prediction equation for CVD mortality risk in next 10 years.
Dolatabadi et al. [5] used Heart Rate Variability (HRV) signal extracted from electro-
cardiogram (ECG) for the prediction of Coronary Artery Disease. They applied Prin-
cipal Component Analysis (PCA) to reduce the dimension of the extracted features and
later Support Vector Machine (SVM) classifier has been utilized. Kahramanli et al. [6]
proposed a hybrid prediction model by combining an artificial neural network (ANN) and a fuzzy neural network (FNN) to predict diabetes and heart disease. Adalı et al. [7]
developed a nonlinear empirical model with Back-Propagation Multi Layer perceptron
to predict cancer by using micro-array data. Geman et al. [8] proposed a hybrid Adaptive
Neuro-Fuzzy Inference System (ANFIS) model for prediction of diabetes.
Data mining is the process of searching for hidden information that can be turned into knowledge, which could then be used for strategic decision making or answering fundamental research questions. It aims at discovering knowledge from data and presenting it in a form that is easily comprehensible to humans. Data mining in health care
has become increasingly popular because it offers benefits to Doctors, patients and
healthcare organizations. Doctors can use data analysis to identify effective treatments
and best practices. By comparing causes, symptoms, treatments, and their adverse
effects, data mining can analyze which courses of action are most effective for specific
patient groups.
When the dataset contains missing, irrelevant, noisy and redundant data, the discovered knowledge is not trustworthy. Data preprocessing methods are applied to enhance the data quality. Missing values in health datasets may exist mainly because doctors sometimes feel that a few of the medical test attributes are not required. The simplest way to deal with missing values is to ignore the tuples that contain them. However, this approach is not very beneficial, as eliminating instances may introduce a bias into the learning process, and the most useful information is often lost. The best method to handle missing values is to use imputation methods [9]: each missing value in a tuple is imputed by an approximation derived from the most similar tuples in the dataset. Simple mean-mode imputation, cluster-based imputation methods, probability-based imputation methods, regression-based imputation methods and machine learning approaches can be used to handle missing values.
A disease can be identified with the help of a set of symptoms (attributes); these symptoms vary among patients with the same disease, but the main symptoms are common to all the patients. Attribute subset selection methods are used to identify this main set of symptoms. The main objective is to obtain a subset of attributes from the original dataset that still appropriately describes it. It removes irrelevant and redundant attributes, which might induce unintentional correlations during the learning process and thereby diminish generalization ability; removing them decreases the chances of over-fitting the model and makes the learning process faster and less memory-consuming.
Machine learning methods can be used to classify, recognize, or distinguish disease
stages. In other words machine learning models can assist the doctors in disease
diagnosis and detection [10, 11]. Even for the most skilled clinician, this job is not easy due to complex molecular, cellular and clinical parameters. In these situations, human perception and standard statistics may not suffice, but machine learning
models can. Machine learning is a branch of artificial intelligence that utilizes a variety
of statistical, logical, probabilistic and optimization tools to “learn” from past data
(training data) and to then use this knowledge to classify new data, identify new
patterns or anticipate novel trends. Machine learning methods can employ Boolean
logic (AND, OR, NOT), decision trees, IF-THEN-ELSE rules, conditional probabili-
ties, artificial neural networks, support vector machines and unconventional opti-
mization strategies to model data or classify patterns.
4 Proposed Model
Every disease has a set of symptoms, and from the appearance of these symptoms in a human being a deadly disease can be identified at an early stage and treated with minimum risk. Unfortunately, not all the symptoms of a disease appear in its primary stage: only a few of the symptoms appear in the beginning, and the remaining symptoms appear in the later stages. Additionally, symptoms of a disease vary based on geographical location and, in rare cases, also differ from person to person. Under these circumstances identification of a disease becomes difficult. Data mining and soft computing techniques can be applied to make the task simpler.
By studying datasets collected from the UCI repository [13], other public sources available on the internet and also from nearby hospitals, a model has to be developed. This model identifies the primary symptoms of a disease (attributes) and the rise in intensity of these attributes from the primary stage to the final stage of the disease, and generates rules for identifying the disease at an early stage. The early prediction model for deadly diseases is developed in three stages, with the disease dataset given as input. In the first stage the dataset is pre-processed. The initial step in our dataset preprocessing is to identify the missing values and fill them with suitable values. Handling missing values is crucial because their presence can influence the model developed. There exist many ways to handle missing values; we need to study these missing value handling techniques, select the best one for the disease dataset and impute the missing values with it. Most probably a cluster-based imputation technique is used. The final step in our pre-processing stage is to apply an attribute subset selection method, which generates the subset of attributes that are crucial in disease development. Different attribute subset selection methods such as filter methods (LDA, ANOVA, chi-square), wrapper methods (forward selection, backward selection, recursive feature elimination) and embedded methods (Lasso regression, Ridge regression) are studied and applied on the disease dataset.
In the second stage, the pre-processed disease dataset is given as input to the
classification. From the previous literature it was found that SVM and Neural Network
methods are predicting disease more accurately than other methods. Hence SVM and
Neural Network classification techniques are applied independently and the best one
for our dataset is chosen. The importance of the classifier in the model is to predict the
disease stage for a newly diagnosed patient.
In the third stage, the pre-processed dataset is combined with healthy data in almost ideal condition and clustered using fuzzy clustering. When the data of a new person (with unknown disease stage) is given to these fuzzy clusters, the fuzzy value with which this new data belongs to each cluster is calculated. The higher the fuzzy value for the unhealthy clusters, the higher the chance of getting the disease in the future; in other words, the calculated fuzzy values determine the probability of getting the disease in the near future. Next, association rules for each fuzzy cluster are generated. Based on the fuzzy values, these association rules are analyzed to identify the risk factors (attributes) that lead to the disease, so that the person can be warned to minimize these risk factors. The changes in intensity level in the data of the primary attributes (risk factors) and their dependent attributes from the beginning stage to the last stage are studied. In this process, the variation in the data of the key attributes and its pattern is analyzed. Once the increasing pattern is understood, the primary causes of the disease are identified. Finally, the basic symptoms for identifying the disease are obtained. The model for early prediction of disease is given below (Fig. 1).
Fig. 1. Model for the early prediction of disease: the pre-processed disease dataset feeds a classification stage (soft computing classification algorithm and disease classification model) and an analysis stage in which new person data is assessed with fuzzy clustering.
The experiments are conducted using Python on the Cleveland heart dataset (CHD) collected from the UCI repository [13]. In the initial stage, the CHD is pre-processed by handling missing values and generating an attribute subset. The missing values in the CHD are first imputed with the k-nearest neighbor imputation method, choosing k as 5. However, the usage of this method created a gender reorder problem in the dataset. To overcome this problem, the CHD is divided into groups based on gender, and the missing values in each group are then imputed from samples of the same group with the weighted k-nearest neighbor imputation method, again choosing k as 5 (a sketch is given below). Later, Principal Component Analysis is applied on the CHD to simplify the data set complexity while retaining trends and patterns. It simplifies the CHD by obtaining 8 attributes {age, sex, blood pressure, ST depression, vessels, thal, fasting blood sugar, chest pain and num or class} out of 14 attributes.
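A minimal sketch of this pre-processing stage, assuming scikit-learn and pandas are available, is shown below; the file name 'cleveland.csv', the named columns and the use of PCA components as the reduced attributes are assumptions for illustration, not details taken from the paper.

```python
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.decomposition import PCA

# Hypothetical CSV export of the UCI Cleveland heart dataset with '?' marking missing entries
chd = pd.read_csv("cleveland.csv", na_values="?")

# Impute missing values within each gender group with a weighted 5-nearest-neighbor imputer
imputed_groups = []
for _, group in chd.groupby("sex"):
    imputer = KNNImputer(n_neighbors=5, weights="distance")
    imputed_groups.append(pd.DataFrame(imputer.fit_transform(group), columns=group.columns))
chd_imputed = pd.concat(imputed_groups, ignore_index=True)

# Reduce the feature space with PCA, keeping 8 components as in the text ('num' is the target)
X = chd_imputed.drop(columns=["num"]).to_numpy()
X_reduced = PCA(n_components=8).fit_transform(X)
print(X_reduced.shape)
```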
In the second stage, soft computing methods such as a Neural Network (NN) and a Support Vector Machine (SVM) are applied separately on the pre-processed CHD to develop the disease prediction model. The Scikit-Learn library [14] is used to implement the SVM with a Gaussian kernel in Python. The NN is implemented with the NumPy library [15] using the sigmoid activation function. The prediction accuracies of the models developed with SVM and NN are 92.6% and 94.5% respectively, so the NN model gives better prediction than the SVM model (a sketch of this stage follows). This prediction model predicts the disease stage.
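The sketch below illustrates this classification stage on the reduced attributes from the previous step; the RBF-kernel SVC matches the Gaussian-kernel SVM mentioned above, while the logistic-activation MLPClassifier is only a scikit-learn stand-in for the paper's NumPy sigmoid network, and the binary label derived from 'num' is a simplification.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# X_reduced and chd_imputed are assumed to come from the pre-processing sketch above
y = (chd_imputed["num"] > 0).astype(int)                 # presence of heart disease (simplified label)
X_tr, X_te, y_tr, y_te = train_test_split(X_reduced, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)                  # SVM with Gaussian (RBF) kernel
nn = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                   max_iter=2000, random_state=0).fit(X_tr, y_tr)   # sigmoid-activation network

print("SVM accuracy:", accuracy_score(y_te, svm.predict(X_te)))
print("NN  accuracy:", accuracy_score(y_te, nn.predict(X_te)))
```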
In the final stage, fuzzy clustering is performed to predict the possibility of getting heart disease. The nominal attributes in the dataset are converted into numerical values to simplify the graphical analysis. The pre-processed CHD is divided into 5 cluster groups using fuzzy clustering. Later, the data of 20 new persons is collected and each person's data is plotted in the fuzzy cluster space to get the fuzzy values associated with each of them. A person with attribute values such as {41, male, 124, 1.0, 1.0, normal, false, atypical angina} got the fuzzy values {healthy, stage1, stage2, stage3, stage4} = {0.632, 0.174, 0.094, 0.056, 0.043}. The first fuzzy value in the set shows the probability that the person belongs to the healthy group, and the sum of the remaining fuzzy values shows the probability of belonging to the unhealthy group. In the above case, 37% belongs to the unhealthy group; the chances of getting heart disease may increase when the attribute values change. The changes in the risk factors (attributes) that lead to heart disease are identified by varying one attribute value while keeping the other attributes fixed, in all cases, and finding the fuzzy values associated with each case. If the fuzzy value belonging to the unhealthy group increases, then the chances of getting heart disease also increase. The following observations are made by varying each attribute value (a fuzzy clustering sketch is given after the list).
• The higher the blood pressure, the higher the heart disease stage.
• When ST depression increases, the heart disease stage also increases.
• If fasting blood sugar is false, then the probability of heart disease is high.
• The greater the number of vessels, the higher the heart disease stage.
• The probability of getting heart disease in men is higher than in women.
• When chest pain is asymptomatic, the probability of heart disease is high.
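For the clustering step above, the sketch below uses the fuzzy c-means routines of the scikit-fuzzy package (an assumption; any fuzzy clustering implementation would do) to form five clusters and to read off the fuzzy values of a new person; mapping each cluster to "healthy" or a disease stage has to be done afterwards by inspecting the cluster centers.

```python
import numpy as np
import skfuzzy as fuzz   # assumes the scikit-fuzzy package is installed

# X_reduced comes from the pre-processing sketch; scikit-fuzzy expects (features, samples)
data = X_reduced.T
cntr, u, _, _, _, _, fpc = fuzz.cluster.cmeans(data, c=5, m=2.0, error=0.005, maxiter=1000)

# Fuzzy values for one new person (here simply the first record, standing in for new data)
new_person = X_reduced[:1].T
u_new, *_ = fuzz.cluster.cmeans_predict(new_person, cntr, m=2.0, error=0.005, maxiter=1000)
membership = u_new[:, 0]                      # one value per cluster, summing to 1
print(np.round(membership, 3))                # e.g. a {healthy, stage1, ..., stage4}-style vector
# The unhealthy share is the sum of memberships in the clusters identified as disease stages.
```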
Finally, other supporting attributes that influence the above observed attributes are
to be identified. For example, blood pressure increases due to supporting attributes such
as Smoking, obesity, sedentary life, too much salt in the diet, too much alcohol con-
sumption and stress. If these supporting attributes are present in a person and these are
in a stage to influence the observed attributes, then we can predict the person may get
heart disease. When the unhealthy fuzzy value > 30%, periodic health checkups need
to be conducted and the generated time series data is analyzed to predict the probability
of getting heart disease. This early prediction can alert the person to nullify these supporting attributes and decrease the chances of getting heart disease in the future.
6 Conclusion
Non-communicable diseases and their risk factors pose a serious threat to global health.
The mortality rate due to NCDs can be reduced by predicting them at an early stage. The proposed model predicts the probability of getting a disease in the near future and also identifies the risk factors that lead to the disease. The threat of NCDs can be minimized by maintaining a balanced diet, regular exercise, reducing obesity and quitting smoking and alcohol. Future enhancements can be done by conducting experiments on the proposed
model using other NCD datasets.
References
1. Global Health Estimates 2016: (2018) Disease burden by Cause, Age, Sex, by Country and
by Region, 2000–2016. Geneva, World Health Organization
2. Khera AV et al (2018) Genome-wide polygenic scores for common diseases identify
individuals with risk equivalent to monogenic mutations. Nat. Genet. 50(9):1219
3. Cinetha K, Dr. Uma Maheswari P, Mar.-Apr. (2014) Decision support system for precluding
coronary heart disease using fuzzy logic. Int. J. Comput. Sci. Trends Technol. (IJCST)
2(2):102–107
4. Anderson KM et al (1991) Cardiovascular disease risk profiles. Am. Heart J. 293–298
5. Dolatabadi AD, Khadem SEZ, Asl BM (2017) Automated diagnosis of coronary artery
disease (CAD) patients using optimized SVM. Comput. Methods Programs Biomed.
138:117–126
6. Kahramanli H, Allahverdi N (2008) Design of a hybrid system for the diabetes and heart
diseases. Expert Syst. Appl. 35(1–2):82–89
7. Adalı T, Şekeroğlu B (2012) Analysis of micrornas by neural network for early detection of
cancer. Procedia Technol. 1:449–452
8. Geman O, Chiuchisan I, Toderean R (2017) Application of adaptive neuro-fuzzy inference
system for diabetes classification and prediction. E-health and bioengineering conference
(EHB), 2017. IEEE
9. Troyanskaya O et al (2001) Missing value estimation methods for DNA microarrays.
Bioinformatics 17(6):520–525
10. Cruz JA, Wishart DS (2006) Applications of machine learning in cancer prediction and
prognosis. Cancer Inf. 2:117693510600200030
11. Krishnaiah V, Narsimha DG, Chandra DNS (2013) Diagnosis of lung cancer prediction
system using data mining classification techniques. Int. J. Comput. Sci. Inf. Technol.
4(1):39–45
12. Cortes C, Vapnik V (1995) Support-vector networks. Mach. Learn. 20(3):273–297
13. Asuncion A, Newman D (2007) UCI machine learning repository
14. Pedregosa F et al (2011) Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12
(Oct):2825–2830
15. Peirce JW (2009) Generating stimuli for neuroscience using PsychoPy. Frontiers in
Neuroinformatics 2:10
High Throughput VLSI Architectures
for CRC-12 Computation
Abstract. This paper presents high-speed VLSI architectures, from serial to parallel, with improved throughput and low latency. It introduces an IIR-filter-based design architecture for the implementation of parallel CRC, and a comparison is made between implementations of the CRC-12 polynomial equation. An LFSR is used as the main component of these implementations. The proposed design consists of single-level and multi-level parallel architectures. The architectures have been implemented in Verilog and simulated using the Xilinx 14.1 tool.
1 Introduction
An LFSR is one of the major components used to compute CRC in many DSP and communication systems [1]. CRC is used in communication receivers to find transmission data errors; Cyclic Redundancy Check operations are performed by an LFSR. CRC is a very useful technique for finding errors and is easily implemented using an LFSR component to obtain the data correctly [2]. In CRC, a common Generating Polynomial (GP) is used for both encoding and decoding operations, at the transmitter and receiver respectively [3]. CRC implementation using serial architectures is not suitable for practical high-speed communications, due to the distribution of the clock. In order to improve the throughput by reducing the latency, a serial-to-parallel transformation is the best solution [4]. However, parallel architectures may increase the critical path [5–7]. Another disadvantage is that parallel processing techniques require more XOR gates and delay elements, which need to be reduced [8, 9]. Various researchers have proposed different designs to obtain good performance in terms of latency and throughput [9–15]. The proposed architectures start from the LFSR, which is generally used for serial cyclic redundancy checking.
The paper is organized as follows. A brief analysis of the serial and proposed parallel implementations is presented in Sect. 2. Section 3 contains the RTL schematic diagrams, the output waveforms of the CRC-12 serial and parallel architectures, and the comparison of the proposed architectures. Finally, Sect. 4 concludes the paper.
y(n) = y(n − 12) + y(n − 11) + y(n − 10) + y(n − 9) + y(n − 1) + f(n)    (1)

where

f(n) = u(n − 12) + u(n − 11) + u(n − 10) + u(n − 9) + u(n − 1)    (2)

The critical path of the single-stage parallel architecture is 2 TXOR, and the latency and throughput are 12 clock cycles and 1 respectively.
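As a software reference for the serial LFSR that these recurrences describe in hardware, the sketch below models a bit-serial CRC-12 encoder in Python; the generator value 0x80F (the commonly used CRC-12 polynomial x^12 + x^11 + x^3 + x^2 + x + 1) and the zero initial state are assumptions for illustration, since the section above fixes only the filter-style update.

```python
def crc12_serial(message_bits, poly=0x80F, init=0x000):
    """Bit-serial LFSR model of a CRC-12 encoder: one message bit per clock cycle.

    poly encodes the generator taps with the x^12 term implicit; 0x80F is the
    widely used CRC-12 polynomial and is an assumption, not taken from the paper.
    """
    reg = init
    for bit in message_bits:                  # message bits, most significant first
        feedback = ((reg >> 11) & 1) ^ bit    # XOR of the register MSB with the incoming bit
        reg = (reg << 1) & 0xFFF              # shift the 12-bit register left by one position
        if feedback:
            reg ^= poly                       # feed the generator taps back into the register
    return reg

# Usage: append 12 zero bits so the register holds the final CRC of the message
msg = [1, 0, 1, 1, 0, 0, 1, 0]
print(hex(crc12_serial(msg + [0] * 12)))
```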
y(n − 1) = y(n − 13) + y(n − 12) + y(n − 11) + y(n − 10) + y(n − 2) + f(n − 1)    (3)
Fig. 3. Loop update equations for CRC-12 two level parallel architecture
where

f(3k) = u(3k − 12) + u(3k − 11) + u(3k − 10) + u(3k − 9) + u(3k − 1)    (7)

The critical path, latency and throughput for the two-stage architecture are 10 TXOR, 6 clock cycles and 2 respectively.
y(n − 1) = y(n − 13) + y(n − 12) + y(n − 11) + y(n − 10) + y(n − 2) + f(n − 1)    (9)

Applying a unit delay to Eq. (3), the equation can be written as

y(n − 2) = y(n − 14) + y(n − 13) + y(n − 12) + y(n − 11) + y(n − 3) + f(n − 2)    (10)

Substituting Eqs. (9) and (10) into Eq. (1), the final equations for y(n) can be written as

y(n) = y(n − 14) + y(n − 12) + y(n − 11) + y(n − 9) + y(n − 3) + f(n − 2) + f(n − 1) + f(n)    (12)

y(n) = y(n − 12) + y(n − 11) + y(n − 10) + y(n − 9) + y(n − 1) + f(n)    (13)

Using a look-ahead technique, the loop update equations are shown in Fig. 5.

Fig. 5. Loop update equations for CRC-12 two level parallel architecture

where

f(3k + 3) = u(3k − 10) + u(3k − 6) + u(3k + 1)    (17)

The critical path for the three-stage parallel architecture is 10 TXOR. The latency and throughput for the three-stage architecture are 4 clock cycles and 3 respectively (Fig. 6).
3 Simulation Results
The results presented in this section were simulated using Xilinx 14.1; the detailed analysis of all architectures along with the simulation results is given below.
Figure 8 shows the CRC-12 single-level parallel architecture simulation waveform, in which the output sequence is obtained after 12 clock cycles; for this single-level parallel architecture the latency is 12 and the throughput obtained is 1.
Figure 10 shows the CRC-12 three-stage parallel simulation waveform, in which the output sequence is obtained after 4 clock cycles; for the three-level parallel architecture the latency is 4 and the throughput obtained is 3.
Fig. 10. Computation of CRC with proposed three stage parallel architecture.
4 Conclusion
This paper has proposed new high-speed VLSI architectures, from serial to parallel, with improved throughput and low latency. From the simulation and synthesis reports, the proposed three-stage parallel architecture shows improved performance over the other structures. In the future, the performance can be improved further with retiming and unfolding techniques along with the folding transformation.
References
1. Cheng C, Parhi KK (2009) High speed VLSI architecture for general linear feedback shift
register (LFSR) structures. In: Proceedings of 43rd Asilomar conference on signals, systems,
and computers, Monterey, CA, November 2009, pp 713–717
2. Ayinala M, Parhi KK (2011) High-speed parallel architectures for linear feedback shift
registers. IEEE Trans Signal Process 59(9):4459–4469
3. Derby JH (2001) High speed CRC computation using state-space transformation.
In: Proceedings of the global telecommunication conference, GLOBECOM’01, vol 1,
pp 166–170
4. Zhang X, Parhi KK (2004) High-speed architectures for parallel long BCH encoders. In:
Proceedings of the ACM Great Lakes symposium on VLSI, Boston, MA, April 2004, pp 1–6
5. Ayinala M, Parhi KK (2010) Efficient parallel VLSI architecture for linear feedback shift
registers. In: IEEE workshop on SiPS, October 2010, pp 52–57
6. Campobello G, Patane G, Russo M (2003) Parallel CRC realization. IEEE Trans Comput 52
(10):1312–1319
7. Parhi KK (2004) Eliminating the fan-out bottleneck in parallel long BCH encoders. IEEE
Trans Circuits Syst I Regul Pap 51(3):512–516
8. Cheng C, Parhi KK (2006) High speed parallel CRC implementation based on unfolding,
pipelining, retiming. IEEE Trans Circuits Syst II, Express Briefs 53(10):1017–1021
9. Ayinala M, Brown MJ, Parhi KK (2012) Pipelined parallel FFT architectures via folding
transformation. IEEE Trans VLSI Syst 20(6):1068–1081
10. Garrido M, Parhi KK, Grajal J (2009) A pipelined FFT architecture for real-valued signals.
IEEE Trans Circuits and Syst I Regul Pap 56(12):2634–2643
11. Cheng C, Parhi KK (2007) High-throughput VLSI architecture for FFT computation. IEEE
Trans Circuits and Syst II Express Briefs 54(10):863–867
12. Mukati V (2014) High-speed parallel architecture and pipelining for LFSR. In: International
Journal of Scientific Research Engineering & Technology (IJSRET), IEERET-2014
Conference Proceeding, pp 39–43
13. Huo Y, Li X, Wang W, Liu D (2015) High performance table-based architecture for parallel
CRC calculation. In: The 21st IEEE international workshop on local and metropolitan area
networks, Beijing, pp 1–6
14. Mathukiya HH, Patel NM (2012) A novel approach for parallel CRC generation for high
speed application. In: 2012 international conference on communication systems and network
technologies, Rajkot, pp 581–585
15. Jung J, Yoo H, Lee Y, Park I (2015) Efficient parallel architecture for linear feedback shift
registers. IEEE Trans Circuits Syst II Express Briefs 62(11):1068–1072
Software Application Test Case Generation
with OBDM
1 Introduction
Testing is an activity of evaluating a system or its components with the intent of finding out whether it satisfies the specified requirements or not. This activity reveals the gaps and variances between expected and actual outcomes; in some cases it can be characterized as the “activity of investigating a software item to detect the differences between existing and required conditions and to evaluate the features of the software item”.
Importance of Testing
During design and development, software is tested to reveal mistakes [13]. The world has seen numerous disasters caused by the failure of software products. In industry, ensuring the quality and reliability of software products has therefore become a vital issue. Thus, testing to guarantee software reliability is one of the most demanding tasks in software development. It finds defects and ensures quality and acceptability. The objective of testing is to discover defects, not to demonstrate correctness. The importance of testing is explained in [13].
Manual Testing: This type involves testing the software by hand. Here the tester takes the role of an end user and uses test plans, test scenarios or test cases to exercise the software and ensure the completeness of testing [11, 12].
Automation Testing: It is also known as "Test Automation". This approach involves the automation of a manual process and is used to re-run test scenarios that were performed manually, quickly and repeatedly [13].
When to Automate: It is best suited in the following cases: large and critical projects; stable (unchanging) software; stable requirements; availability of time; accessing the application with numerous users [11].
It outlines the approach that will be used to test an application. Typically, the Quality Assurance Team Lead is accountable for writing this plan. A test plan contains, among other items, the test scenarios and test cases described below.
Test Scenario
It clarifies what area will be tested and ensures that all process flows are tested from end to end. The terms test scenario and test case are often used interchangeably; seen from this point of view, test scenarios are test cases [15].
Test Case
It includes the set of steps, conditions and inputs which can be used while executing the testing tasks [16]. There are various types of test cases, such as functional, negative, error, logical and UI test cases, and so on. Moreover, test cases are written to keep track of the testing coverage of the software. Usually there is no formal template used during test case writing; the main components are: Test case ID, Product module, Product version, Revision history, Purpose, Assumptions, Pre-conditions, Steps, Post-conditions, Expected outcome, and Actual outcome.
to go for random testing? It gives us a simple method for verifying the test outcomes; test inputs are randomly generated as per the operational profile. Random inputs may save some time and effort compared with careful test input selection techniques (a small illustrative sketch follows this list):
1. It can be used to gauge reliability
2. It is extremely valuable in discovering low-frequency bugs
3. It can give about 80% accuracy for the delivered product
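As a rough illustration of profile-driven random test input generation (not the paper's tooling), the sketch below draws operations according to an assumed operational profile; the operation names and value ranges are hypothetical.

```python
import random

def random_test_inputs(operational_profile, n_cases, seed=42):
    """Generate random test inputs weighted by an operational profile.

    operational_profile: dict mapping an operation name to its relative
    usage frequency (the names below are illustrative, not from the paper).
    """
    rng = random.Random(seed)
    operations = list(operational_profile)
    weights = [operational_profile[op] for op in operations]
    cases = []
    for _ in range(n_cases):
        op = rng.choices(operations, weights=weights, k=1)[0]
        amount = rng.randint(-100, 10_000)  # include invalid (negative) amounts
        cases.append((op, amount))
    return cases

# Example: a toy banking profile; deposits are exercised most often.
profile = {"deposit": 0.4, "withdraw": 0.3, "balance": 0.2, "delete_account": 0.1}
for case in random_test_inputs(profile, 5):
    print(case)
```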
A sequence diagram has a set of nodes representing objects (Kb) and a set of edges that indicate the functions (J), where J ∈ Sj represents a synchronous function. A function has the following six attributes:
Jsource ∈ Kb – source of the function; Jdest ∈ Kb – destination of the function, where Jsource ≠ Jdest; Jname – name of the function; JBW ∈ Sj – backward navigable function, where JBW ≠ J; if there is none, it is denoted as "-".
JER – probabilistic execution rate of a function in a sequence diagram, where 0 ≤ JER ≤ 1 and the default value is 1.
JEER – expected execution rate of a function in a sequence diagram, where 0 ≤ JEER ≤ 1 and the default value is 1.
The researcher considers the control structure of the source code, in which the execution rate of a function may be affected. A function inside an alt combined fragment is executed only when the condition in the fragment is satisfied. If the function is executed within this condition fragment, then the probabilistic execution rate of the function is 0.5; otherwise, the default value is 1. The expected execution rate of a function is the probability of the execution rate within a sequence diagram. In other words, it is the ratio of the execution time for the total number of functions in a particular class to the execution time for the entire number of functions in the whole input application. A function in a sequence diagram is executed only when it is activated. The default value of JEER is also 1 (Fig. 2). A small data-structure sketch of these attributes is given below.
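The following minimal sketch only mirrors the six attributes listed above as a Python data structure; the class, field and object names are assumptions for illustration, not part of the OBDM method itself.

```python
from dataclasses import dataclass

@dataclass
class Function:
    """One edge (function call) of a sequence diagram, following the six
    attributes described above. Field names are illustrative only."""
    source: str          # Jsource: calling object (node in Kb)
    dest: str            # Jdest: called object, must differ from source
    name: str            # Jname: function name
    backward: str = "-"  # JBW: backward navigable function, "-" if none
    er: float = 1.0      # JER: probabilistic execution rate, 0 <= er <= 1
    eer: float = 1.0     # JEER: expected execution rate, 0 <= eer <= 1

    def __post_init__(self):
        if self.source == self.dest:
            raise ValueError("Jsource and Jdest must differ")
        if not (0.0 <= self.er <= 1.0 and 0.0 <= self.eer <= 1.0):
            raise ValueError("execution rates must lie in [0, 1]")

# A call guarded by an 'alt' fragment gets a probabilistic rate of 0.5.
guarded_call = Function(source="Abak1", dest="Adp", name="deposit", er=0.5)
print(guarded_call)
```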
The suggested method produces test cases based on the OBDM, which is one of the efficient test case generation techniques. Any software application code can be taken; here the researcher took a banking application code. The software application under test is taken as the input for the object behavior dependence model in software testing.
Each application has a number of classes and functions specifically employed for test case generation. The recommended OBDM method principally focuses on the functions and coverage metrics of the application applied for test case generation. This avoids generating duplicate and insignificant test cases. The function name is represented as a variable in the proposed method. The flowchart illustrates the overall process of test case generation, and it is made clear below with the input application and resulting values.
Figure 3 illustrates the test case generation process. Consider each function as a source function. For each source function, a variable name is assigned, and then every function is checked to find whether it has already been covered. The computed coverage values for the class files are, for example:
0.07121:skc\univ.\Abak1.java ,
0.045789::skc\univ.\Ater.java
0.053123:: skc\univ.ADeletAc.java
……
…...0.0876 :skc\univ.\ADp.java
J2 -->Abak1 (- ,1.0, 0.07561) --> J2, J3 --> Adp.(- ,1.0, 0.058301) --> J5
3.7 LC Coverage
It is also well documented as statement coverage or segment coverage. It also measures the quality of the program and makes sure the flow of different paths is exercised, for example:
[Abk1, dep, 0, 0], [Abk1, wdl, 0, 0], [Abk1, dispy, 0, 0], [Abk1, bais, 0, 0]
3.8 LP Coverage
This coverage metric reports whether each loop body is executed multiple times, exactly once, or more than once. For do-while loops, the metric reports whether the loop body is executed exactly once or more than once; in addition, while-loops and for-loops may execute more than once. This information is not reported by other coverage metrics. For instance, consider one application with two classes and four functions, where the function names are symbolized as variables. The specified process is illustrated in Table 1.
Class A              Class B
A1                   B1
  If {                 {
    A2                   B2
    B1                   C1
  }                    }
A2                   B2
  {                    If {
    C1                   A1
  }                    }
In this work the researcher took the bank application as input and applied the OBDM procedure as stated above in the methodology. For the banking software application with 48 classes and 108 functions, the total number of generated test cases is 661, which may contain redundant, illegal, similar and failure test cases, i.e., interactive faults (influencing errors). To overcome these issues, in future work the OBDM-generated test cases will be given to GA, AGA and PSO to generate optimal test cases, i.e., to reduce the interactive fault proneness in the software application.
References
1. Darab MAD, Chang CK (2014) Black-box test data generation for GUI testing. In:
Proceeding of IEEE international conference on quality software, pp 133–138
2. Arts T, Gerdes A, Kronqvist M (2013) Requirements on automatically generated random test
cases. In: Proceedings of IEEE federated conference on computer science and information
systems, pp 1347–1354
3. Tahbildar H, Kalita B (2011) Automated software test data generation: direction of research.
Int J Comput Sci Eng Surv (IJCSES) 2(1):99–120
4. Campos J, Abreu R, Fraser G, d’Amorim M (2013) Entropy-based test generation for
improved fault localization. In: IEEE international conference on automated software
engineering (ASE), pp 257–267
5. Ahmed BS, Sahib MA, Potrus MY (2014) Generating combinatorial test cases using
Simplified Swarm Optimization (SSO) algorithm for automated GUI functional testing. Int J
Eng Sci Technol 17:218–226
6. Han AR (2010) Measuring behavioral dependency for improving change proneness
prediction in UML based model. J Syst Softw 83:222–234
7. Arcuri A, Briand L (2012) Formal analysis of the probability of interaction fault detection
using random testing. IEEE Trans Softw Eng 38(5):1088–1099
8. McMinn P, Harman M, Lakhotia K, Hassoun Y, Wegener J (2012) Input domain reduction
through irrelevant variable removal and its effect on local, global, and hybrid search-based
structural test data generation. IEEE Trans Softw Eng 38(2):453–477
9. Arcur A (2012) A theoretical and empirical analysis of the role of test sequence length in
software testing for structural coverage. IEEE Trans Softw Eng 38(3):497–519
10. Yu B, Pang Z (2012) Generating test data based on improved uniform design strategy. In:
International conference on solid state devices and materials science, vol 25, pp 1245–1252
11. Pressman RS (2005) Software engineering; a practitioner approach, 6th edn. Mc Graw-Hill
International Edition, Boston ISBN 0071240837
12. Sommerville I (1995) Software engineering. Addison-Wesley, Reading ISBN 0201427656
13. Beizer B (1990) Software testing techniques, vol 2. Van Nostrand Reinhold, New York
ISBN-10: 0442206720
14. Rao KK, Raju G, Nagaraj S (2013) Optimizing the software testing efficiency by using a
genetic algorithm; a design methodology. ACM SIGSOFT 38(3):1–15
15. Rao KK, Raju G (2015) Developing optimal directed random testing technique to reduce
interactive faults-systematic literature and design methodology. Indian J Sci Technol 8(8):715–719. ISSN 0974-6846
16. Rao KK, Raju G (2015) Theoretical investigations to random testing variants and its
implications. Int J Softw Eng Appl 9(5):165–172
17. Kumar JR, Rao KK, Ganesh D (2015) Empirical investigations to find illegal and its
equivalent test cases using RANDOM-DELPHI. Int J Softw Eng Appl 9(10):107–116
18. Rao KK, Raju G (2015) Random testing: the best coverage technique: an empirical proof.
IJSEIA 9(12):115–122
Health Care Using Machine Learning-Aspects
Abstract. In this IT world, people work day and night at their jobs and lead busy lives using gadgets and smartphones; due to these hectic schedules people develop many health issues. These days, a vast amount of data is accessible everywhere. Hence, it is essential to analyze this data in order to extract some useful information and to develop an algorithm based on this analysis. This can be accomplished through data mining and machine learning, a vital part of artificial intelligence, which is used to design algorithms based on data trends and historical relationships between data. Machine learning is used in various fields, for example bioinformatics, intrusion detection, information retrieval, game playing, marketing, malware detection, image deconvolution, and so on. This paper explains how machine learning is applicable to health care issues in different application areas.
1 Primer
4. Reinforcement Learning
5. Supervised Learning
Diagnostic radiology: Consider the work of a diagnostic radiologist. These doctors spend a large proportion of their time analyzing many images to perceive abnormalities in patients, and much more. They are frequently critical in making a diagnosis, and their decisions depend on what they find, for instance identifying a tumor. Artificial intelligence can be used to help a diagnostic radiologist. For instance, Project InnerEye describes itself this way [4].
Project InnerEye is a research-based, AI-powered software tool for radiotherapy planning developed by Microsoft Research. Project InnerEye develops machine learning methods for the automatic delineation of tumors as well as healthy anatomy in 3D radiological images. This enables: (1) extraction of targeted radiomics measurements for quantitative radiology, (2) fast radiotherapy planning, and (3) accurate surgery planning and navigation.
The software helps the radiologist by automatically tracing the outline of a tumor. Radiology produces an extensive number of scans of a region (for example, from the top to the bottom of a brain). The radiologist typically goes through each scan and traces the outline of the tumor. After this is done, a 3D composite of the tumor can be produced. This task takes hours; using ML, Project InnerEye does it in minutes [5] (Fig. 1).
texts from the doctors' notes, diagnostics, and vital signs records. A phenotyping algorithm is a special procedure that filters through a number of clinical data points using coding systems with specific billing codes, radiology results, and natural language processing of the huge amount of text from the doctors. Machine learning algorithms with support vector machines, combined with the medication records of the patients, can be applied to recognize rheumatoid arthritis and improve the precision of predictive models of the disease [8].
Decision trees in the healthcare field: Decision trees are heavily used in the diagnosis of diseases in healthcare. In specific cases, the diagnosis requires continuous monitoring of autonomic neuropathy. In the healthcare field, sensors continually gather big data from the subject to recognize patterns in chunks of data sets and for further processing of this data through machine learning algorithms. Identification of cardiovascular autonomic neuropathy through sensor data is the way to understand the essential indications of diabetes [9].
International classification of diseases: The World Health Organization formally maintains coding standards as a major aspect of the United Nations' efforts to classify chronic diseases, pandemics, morbidity statistics and infections through connected network systems integrated with hospital systems across the globe [10].
Data mining of sensor data in medical information systems: In the medical field, large-scale big data is created through sensor data. There are several sources of such sensor data streaming into medical information systems, for example scientific sensors, wearables, physiological sensors, and human sensors. The tools and strategies for diagnosing illnesses through the data mining of sensor data can be characterized into broader classes, for example data collection, pre-processing of the data by separating the noise from the signals, data transformation through ETL, and data modeling by applying association rules, knowledge discovery algorithms, classification models, clustering techniques, regression models, and a final summary of the KPIs obtained through data mining by executing the outcomes.
Bayesian networks: Big data analysis can help in recognizing worldwide outbreaks, for example influenza, based on anonymized electronic health histories.
Advantages and Disadvantages of Machine Learning
a. Advantages of ML
i. ML has numerous wide applications, for example in the banking and financial sector, healthcare, retail, publishing and so forth.
ii. Google and Facebook use ML to push relevant advertisements; those advertisements depend on the users' past search behavior.
iii. It is used to handle multi-dimensional and multi-variety data in dynamic environments.
iv. It permits time cycle reduction and efficient utilization of resources.
v. Several ML-based tools are available when one needs to provide continuous quality in large and complex process environments.
vi. A large number of things come under the practical benefits of ML; they include the scope of autonomous computers and software programs. Hence, it includes processes that can lead to the automation of tasks.
b. Disadvantages of ML
i. ML has a real challenge called data acquisition. Depending on the algorithm, the data needs to be processed, and it must be prepared before being given as input to a particular algorithm. This substantially affects the results to be achieved.
ii. Interpretation of the results is also a significant challenge; it is needed to determine the effectiveness of ML algorithms.
iii. The uses of machine learning algorithms are limited, and there is no guarantee that the algorithms will work in every possible case. As we have seen, ML fails in many cases; thus, it requires some understanding of the problem at hand to apply the right algorithm.
iv. Like deep learning algorithms, ML also needs a great deal of training data. Fortunately, there is a great deal of training data for image recognition tasks.
v. One outstanding limitation of ML is its vulnerability to errors. When the algorithms do make mistakes, diagnosing and correcting them can be troublesome, because it requires going through the underlying complexities.
vi. There are fewer possibilities to make quick predictions with an ML system. Also, remember that it learns from historical data; accordingly, the bigger the data and the longer it is exposed to this data, the better it will perform.
vii. Lack of changeability is another ML limitation.
c. Limitations
Even though ML has been transformative in certain fields, effective ML is difficult because discovering patterns is hard and often insufficient training data is available; consequently, many ML programs regularly fail to deliver the expected value. Reasons for this are numerous: lack of (suitable) data, lack of access to the data, data bias, privacy issues, badly chosen tasks and algorithms, wrong tools and people, lack of resources, and evaluation problems.
In 2018, a self-driving vehicle from Uber failed to detect a pedestrian, who was killed in the resulting accident. Attempts to use AI in healthcare with the IBM Watson system failed to deliver even after years of effort and billions of investment.
Machine learning approaches in particular can suffer from different data biases. In healthcare data, measurement errors can often result in bias of ML applications. A machine learning system trained only on your present customers will in all probability be unable to predict the needs of new customer groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same ingrained and unconscious biases already present in society.
3 Future of ML
Machine learning is an innovative technology that currently forms a basic part of various thriving and established industries. This technology enables computers to access hidden insights and anticipate outcomes, leading to remarkable changes to organizations. Here are five key estimates about the future of ML. Bias of machine learning applications: a machine learning system trained only on your present clients will most likely be unable to foresee the requirements of new client groups that are not represented in the training data. When trained on man-made data, machine learning is likely to pick up the same ingrained and unconscious biases already present in society.
4 Conclusion
Machine learning is one of the most disruptive technologies of the 21st century. Even though this technology can still be viewed as nascent, its future is bright. The above five predictions have only scratched the surface of what could be possible with ML. In the coming years, we are likely to see more advanced applications that extend its abilities to unimaginable levels.
References
1. Auffray C, Balling R, Barrosso I, Bencze L, Benson M, Bergeron J, Bock C (2016) Making
sense of big data in health research: towards an EU action plan. US National Library of
Medicine National Institute of Health, 23 June, pp 8–71. http://dx.doi.org/10.1186/s13073-016-0323-y
1 Introduction
A grid is a collection of 'M' micro grids. At each micro grid 'mi', the energy $E_G^{t_i}(m_i)$ can be generated by using primary or secondary sources or both [9]. The total energy generated at the grid during a timeslot 'ti' in a day [9] can be expressed as
$E_G^{t_i} = \sum_{i=1}^{M} E_G^{t_i}(m_i)$  (1)
The generated energy from the grid is transmitted and distributed via substations
[10]. The energy distribution in grid to users is unidirectional via lines, and can be
modeled as a tree network model [11, 12] as shown in Fig. 1. In the Fig. 1, micro grids
are represented by mi, i = 1 to M; ‘G’ is grid; Pp is primary substations under grid, p = 1
to P where P is the number of primary substations; Sp,s is secondary substations, s = 1 to
S, where S is the number of secondary substations under p; Up,s,u indicates users under
each 's', u = 1 to U, where U is the number of users. The users with different loads can be represented as $\{C_1, C_2, \ldots, C_C\}$. The entire tree network is connected via wireless communication systems [10], and DSM handles the energy distribution.
The learning automaton (LA) is an intelligent learning model that determines future actions through acquired knowledge. The energy distribution of a smart grid can be modeled using LA (Fig. 2). It has a random environment, a learning automaton, and a reward or penalty structure. The load request from the users at the substations and at the grid is computed by the smart grid environment. The demand varies dynamically, and the smart grid environment passes these responses to the LA unit. After adequate interactions, the LA seeks to grasp the optimal actions provided by the random environment. The rewards and penalties related to actions are assumed to be '0' and '1' respectively.
In LA, $\langle Q, A, R, T, O \rangle$ is a quintuple: $Q = \{q_1, q_2, \ldots, q_{|Q|}\}$ are the internal states, $A = \{a_1, a_2, \ldots, a_{|A|}\}$ are the actions performed, $R = \{r_1, r_2, \ldots, r_{|R|}\}$ are the responses from the environment, $T : Q \times R \rightarrow Q$ is the transition function which maps the current state and response to the automaton's next state, and $O : Q \times R \rightarrow A$ is the output function which maps the current state and response to the automaton's next action. The environment can be abstracted by the triple $\langle A, R, P' \rangle$, where $P' = \{p'_1, p'_2, \ldots, p'_{|P'|}\}$ is the penalty probability set, $p'_l = \Pr\{r(t) = r_{p'} \mid a(t) = a_l\}$, and $p'_i \in P'$ corresponds to an input action $a_i$.
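As a rough illustration of the quintuple and the environment triple described above, a minimal Python sketch follows; the two-state transition and output tables and all names are assumptions for illustration, not the paper's PLA implementation.

```python
import random

class LearningAutomaton:
    """Minimal sketch of the <Q, A, R, T, O> quintuple."""
    def __init__(self, states, actions, transition, output, start):
        self.states = states          # Q
        self.actions = actions        # A
        self.transition = transition  # T: (state, response) -> next state
        self.output = output          # O: (state, response) -> next action
        self.state = start

    def step(self, response):
        """Apply one environment response (0 = reward, 1 = penalty)."""
        action = self.output[(self.state, response)]
        self.state = self.transition[(self.state, response)]
        return action

class RandomEnvironment:
    """Environment triple <A, R, P'>: penalises action a_l with probability p'_l."""
    def __init__(self, penalty_prob):
        self.penalty_prob = penalty_prob
    def respond(self, action):
        return 1 if random.random() < self.penalty_prob[action] else 0

# Two-state demo: stay in the state on a reward, flip it on a penalty.
Q = ["qL", "qH"]; A = ["allot_low", "allot_high"]
T = {("qL", 0): "qL", ("qL", 1): "qH", ("qH", 0): "qH", ("qH", 1): "qL"}
O = {("qL", 0): "allot_low", ("qL", 1): "allot_high",
     ("qH", 0): "allot_high", ("qH", 1): "allot_low"}
la = LearningAutomaton(Q, A, T, O, start="qL")
env = RandomEnvironment({"allot_low": 0.7, "allot_high": 0.2})
action = la.output[(la.state, 0)]
for _ in range(5):
    action = la.step(env.respond(action))
    print(la.state, action)
```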
The pipelined LA (PLA) model of energy distribution in a smart grid is shown in Fig. 3. The LA is divided into various parallel blocks to reduce the computational complexity, equipment complexity, and computational delay. This PLA model has learning automaton blocks at each level. The LA block at a secondary substation is represented by LA(pp, ss), at a primary substation by LA(pp), and at the grid by LAG, as shown in Fig. 3. In Fig. 3, Ss indicates the secondary substation environment, Pp the primary substation environment, and 'G' the grid.
The entire process of energy flow control in a smart grid using PLA is divided into three phases: (i) the load request evaluation phase, (ii) the learning automaton phase, and (iii) the energy calibration phase. The operation of each phase is explained below.
The control system evaluates the load request from the users at various levels during the load request (LR) evaluation phase. Let $L_R^{t_i}(p_j, s_k, u)$ be the LR from a user under secondary substation $s_k$, $L_R^{t_i}(p_j, s_k)$ the request from $s_k$ to primary substation $p_j$, and $L_R^{t_i}(p_j)$ the request from $p_j$ to the grid. The total LR at $s_k$ from its users is given by:

$L_R^{t_i}(p_j, s_k) = \sum_{n=1}^{U} L_R^{t_i}(p_j, s_k, u_n)$  (2)

$L_R^{t_i}(p_j) = \sum_{k=1}^{S} L_R^{t_i}(p_j, s_k)$  (3)

$L_R^{t_i}(G) = \sum_{j=1}^{P} L_R^{t_i}(p_j)$  (4)
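The bottom-up aggregation in Eqs. (2)–(4) can be sketched as follows; the nested-dictionary layout and the toy numbers are assumptions made only for illustration.

```python
def aggregate_load_requests(user_requests):
    """Bottom-up aggregation of load requests, mirroring Eqs. (2)-(4).

    user_requests[p][s] is a list of user-level requests under secondary
    substation s of primary substation p (structure assumed for illustration).
    Returns (per-secondary, per-primary, grid-level) totals.
    """
    lr_secondary = {}   # L_R(p_j, s_k), Eq. (2)
    lr_primary = {}     # L_R(p_j),      Eq. (3)
    for p, secondaries in user_requests.items():
        lr_secondary[p] = {s: sum(reqs) for s, reqs in secondaries.items()}
        lr_primary[p] = sum(lr_secondary[p].values())
    lr_grid = sum(lr_primary.values())   # L_R(G), Eq. (4)
    return lr_secondary, lr_primary, lr_grid

# Toy example: 2 primary substations, 2 secondaries each, 3 users per secondary.
demo = {"p1": {"s1": [4, 5, 6], "s2": [3, 3, 2]},
        "p2": {"s1": [7, 1, 2], "s2": [5, 5, 5]}}
print(aggregate_load_requests(demo))
```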
In the learning automaton phase of the PLA model, the LA is first performed at the highest level (the grid), then at the primary substations, and lastly at the secondary substations. The grid has two internal states: (i) a high demand state and (ii) a low demand state. The internal states at the various levels are represented by $Q = \{q_H, q_L\}$.

$q_H^{t_i} = \begin{cases} 1, & \text{when } E_G^{t_i} < L_{RG}^{t_i} \\ 0, & \text{when } E_G^{t_i} \ge L_{RG}^{t_i} \end{cases}$  (6)

$q_L^{t_i} = \begin{cases} 0, & \text{when } E_G^{t_i} < L_{RG}^{t_i} \\ 1, & \text{when } E_G^{t_i} \ge L_{RG}^{t_i} \end{cases}$  (7)

The penalty probability set is $Pr_X^{t_i} = \{pr_{HX}^{t_i}, pr_{LX}^{t_i}\}$. The probability of low demand is $pr_{LX}^{t_i} = 1$; the probability of high demand $pr_{HX}^{t_i}$ at the $LA_X$ block satisfies $0 \le pr_{HX}^{t_i} < 1$ and

$pr_{HX}^{t_i} = \dfrac{E^{t_i}(X)}{L_R^{t_i}(X)}$  (8)
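A tiny sketch of Eqs. (6)–(8), computing the demand-state flags and the high-demand penalty probability at one LA block, is shown below; the numbers are illustrative only.

```python
def demand_state_and_penalty(energy_available, load_request):
    """Sketch of Eqs. (6)-(8): high/low demand flags and the high-demand
    penalty probability at one LA block (toy numbers, not from the paper)."""
    q_high = 1 if energy_available < load_request else 0   # Eq. (6)
    q_low = 1 - q_high                                      # Eq. (7)
    # Eq. (8): pr_H = E(X) / L_R(X); clamp at 1.0 for the low-demand case.
    pr_high = energy_available / load_request if load_request > 0 else 1.0
    return q_high, q_low, min(pr_high, 1.0)

print(demand_state_and_penalty(energy_available=90.0, load_request=120.0))
```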
The next state depends on $E_G^{t_i}$ and the load requests from the primary substations.
LA Phase at the Grid {LA_G block}: The LA triple at the grid (LA_G block) is represented by $\langle A_G^{t_i}, R_G^{t_i}, P_G^{t_i} \rangle$. The responses of the grid environment to the LA_G block are given by $R_G^{t_i} = \{E_G^{t_i}, L_{RG}^{t_i}, L_R^{t_i}(p_1), L_R^{t_i}(p_2), \ldots, L_R^{t_i}(p_P)\}$. During the LA phase of the grid, the energy distributed to the primary substations under the grid is evaluated based on the responses. The actions from the LA_G block are given by $A_G^{t_i} = \{E^{t_i}(p_1), E^{t_i}(p_2), \ldots, E^{t_i}(p_P)\}$. The actions from the grid, $E^{t_i}(p_j)$, represent the energy distribution to the primary substations and are evaluated from these responses.
After the LA phase at the grid, the LA(p_j) blocks start their automaton processes in parallel.
$E^{t_i}(p_j, s_2), \ldots, E^{t_i}(p_j, s_S)\}$. The actions from a primary substation, $E^{t_i}(p_j, s_k)$, represent the energy distribution to $s_k$ and are given, for the high demand state ($q_H^{t_i} = 1$, $k = 1$ to $S$), by

$E^{t_i}(p_j, s_k) = \left\lfloor Pr_{p_j}^{t_i} \, L_R^{t_i}(p_j, s_k) \right\rfloor$  (13)
Such a distribution in (13) improves the fairness of the system. Once the learning automaton phase at the primary substation is completed, all the secondary substation LA blocks under it start their automaton processes.
LA phase at a secondary substation {LA(p_j, s_k) block}: The LA triple at the LA(p_j, s_k) block is $\langle A_{p_j,s_k}^{t_i}, R_{p_j,s_k}^{t_i}, P_{p_j,s_k}^{t_i} \rangle$. The responses of the $s_k$ environment to the LA(p_j, s_k) block are given by $R_{p_j,s_k}^{t_i} = \{E^{t_i}(p_j, s_k), L_R^{t_i}(p_j, s_k), L_R^{t_i}(p_j, s_k, u_1), \ldots, L_R^{t_i}(p_j, s_k, u_U)\}$. During the LA phase at a secondary substation, the energy is distributed to the categories of users based on these responses.
The actions or automaton outputs from the LA(p_j, s_k) block are given by $A_{p_j,s_k}^{t_i} = \{E^{t_i}(p_j, s_k, C_1), E^{t_i}(p_j, s_k, C_2), \ldots, E^{t_i}(p_j, s_k, C_C)\}$. The actions from a secondary substation, $E^{t_i}(p_j, s_k, C_c)$ where $c = 1$ to $C$, represent the energy distribution to the different category users and can be derived as follows.
$E^{t_i}(p_j, s_k, C_c) = \begin{cases} \left\lfloor \dfrac{Pr_{p_j,s_k}^{t_i}\sum_{n=1}^{U} L_R^{t_i}(p_j, s_k, u_n \in C_c) + E_c^{t_i}}{C_c} \right\rfloor C_c, & \text{for } c = C \text{ to } 2 \\ E^{t_i}(p_j, s_k) - \sum_{c=2}^{C} E^{t_i}(p_j, s_k, C_c), & \text{for } c = 1 \end{cases}$  (15)

The energy remaining at a higher category is adjusted among the users of the next category to minimize the energy wastage. The adjustment factor $E_c^{t_i}$ is given by

$E_c^{t_i} = \begin{cases} 0, & \text{for } c = C \\ Pr_{p_j,s_k}^{t_i}\sum_{n=1}^{U} L_R^{t_i}(p_j, s_k, u_n \in C_{c+1}) + E_{c+1}^{t_i} - E^{t_i}(p_j, s_k, C_{c+1}), & \text{for } c = C-1 \text{ to } 2 \end{cases}$  (16)
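The category-wise allocation of Eqs. (15)–(16) can be sketched roughly as below, under the stated assumptions that each category has a known load quantum and a summed request; the loop carries the leftover (the adjustment factor) from each category down to the next, and category 1 receives the remainder. All names and numbers are illustrative, not the paper's implementation.

```python
import math

def distribute_by_category(pr_high, energy_allotted, category_load, category_requests):
    """Rough sketch of Eqs. (15)-(16): category_load[c] is the assumed load
    quantum of category c and category_requests[c] is the summed request of
    its users. Grants go from the highest category C down to 2 in multiples
    of the category load; the leftover (Ec) is carried to the next category,
    and category 1 receives whatever remains of the allotted energy."""
    C = max(category_load)
    allocation = {}
    carry = 0.0                                   # Ec = 0 for c = C (Eq. 16)
    for c in range(C, 1, -1):
        granted = pr_high * category_requests[c] + carry
        allocation[c] = math.floor(granted / category_load[c]) * category_load[c]  # Eq. (15)
        carry = granted - allocation[c]           # Eq. (16): leftover passed on
    allocation[1] = energy_allotted - sum(allocation.values())  # c = 1 case of Eq. (15)
    return allocation

# Toy numbers: three categories with load quanta of 1, 2 and 4 units.
print(distribute_by_category(0.8, 100.0, {1: 1, 2: 2, 3: 4},
                             {1: 40.0, 2: 30.0, 3: 30.0}))
```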
In the energy calibration phase, the energy left at each 's' under 'p' is evaluated and distributed to the other 's' under it. Later, if some energy is still left at the primary substations under the grid, it is adjusted among the other primary substations, which occurs very rarely. In the calibration phase, the computations are evaluated from the secondary substations up to the grid.

$E_L^{t_i}(p_j, s_k) = E^{t_i}(p_j, s_k) - \sum_{c=1}^{C} E^{t_i}(p_j, s_k, C_c)$  (18)

The total energy left at all substations connected to a primary substation is given by

$E_L^{t_i}(p_j) = \sum_{k=1}^{S} E_L^{t_i}(p_j, s_k)$  (19)

This energy is adjusted among the users under $s_k$ from $c = 1$ to $C$ until $E_L^{t_i}(p_j) = 0$; the energy left after this adjustment, if any, is denoted by $E_{LA}^{t_i}(p_j)$.

The energy left at the grid is $E_{LG}^{t_i} = \sum_{j=1}^{P} E_{LA}^{t_i}(p_j)$  (20)

This energy is adjusted among the secondary substations under the grid from $j = 1$ to $P$ until $E_{LG}^{t_i} = 0$. These minute adjustments further improve the efficiency of the system.
The fairness index [11] of the system is given by

$\text{Fairness Index} = \dfrac{\left(\sum_{q=1}^{Q} E_q\right)^{2}}{Q \sum_{q=1}^{Q} E_q^{2}}$  (21)

where $E_q$ is the energy distributed to child node $q$ from a node and $Q$ is the number of child nodes.
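Equation (21) is Jain's fairness index and is straightforward to compute; the sketch below uses the (p1, *) allotted energies from Table 1 as toy inputs.

```python
def fairness_index(energies):
    """Fairness index of Eq. (21): (sum E_q)^2 / (Q * sum E_q^2),
    where energies lists the energy given to each of the Q child nodes."""
    q = len(energies)
    total = sum(energies)
    sum_sq = sum(e * e for e in energies)
    return (total * total) / (q * sum_sq) if sum_sq > 0 else 0.0

print(fairness_index([380, 316, 382]))   # close to 1.0: nearly fair sharing
print(fairness_index([380, 0, 0]))       # 1/3: all energy to one child node
```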
8 Simulation Results
Fig. 5. Energy distribution and load request during LA phase at grid and primary substation
For more clarity, the total energy wastage at each substation without Ec is shown in Table 1. This wastage is nullified by considering the adjustment factor 'Ec'. The total energy distributed to the users with and without Ec is also shown in Table 1. The number of users to whom energy is distributed increases with the adjustment factor 'Ec', and no energy is left, which yields 100% system distribution efficiency.
Table 1. Energy left, allotted and distributed; number of users allotted with and without Ec
Substation | Energy left without Ec | Energy distributed to number of users without Ec | Energy distributed to number of users with Ec | Total energy distributed with Ec / Total energy allotted
(p1,s1) | 3.76 | 80 | 83 | 380/380
(p1,s2) | 3.9  | 80 | 83 | 316/316
(p1,s3) | 5.64 | 80 | 84 | 382/382
(p2,s1) | 3.96 | 80 | 83 | 356/356
(p2,s2) | 3.78 | 72 | 75 | 340/340
(p2,s3) | 6    | 80 | 84 | 320/320
Energy Calibration Phase in the PLA Model: With Ec, no energy is left at 's' and all the energy is distributed to the users based on the demand. During the calibration phase, if any energy is left at an 's', it is adjusted among the other secondary substations. Similarly, any energy left at a 'p' is adjusted among the other primary substations, and the total wastage is minimized to zero, which leads to 100% distribution efficiency.
Fairness Index of the smart grid: The fairness index at the grid is given in Table 2. From Table 2, it is observed that the proposed system yields a high fairness index at the various levels, which results in fair energy distribution.
9 Conclusion
The energy distribution in a grid can be modeled as a tree network. In this work, a pipelined-architecture-based LA approach for energy distribution is developed. It simplifies the energy distribution flow, pipelines the process, reduces the waiting time, and speeds up the process. The load request is evaluated during the evaluation phase, and the energy distribution to the primary and secondary substations is calculated based on the penalty probability. The energy is distributed to the different categories of users using the adjustment factor, such that the energy remaining at a higher category is adjusted to the lower-category users, which minimizes the energy wastage to zero.
From the simulation results it is observed that the proposed pipelined-architecture-based LA approach minimizes the energy wastage and yields high fairness and high distribution efficiency.
References
1. Amin SM (2011) Smart grid: overview, issues and opportunities: advances and challenges in
sensing, modeling, simulation, optimization and control. Eur J Control 5–6:547–567
2. Kumar N, Misra S, Obaidat MS (2015) Collaborative learning automata-based routing for
rescue operations in dense urban regions using vehicular sensor networks. IEEE Syst J
9:1081–1091
3. Thathachar MAL, Sastry PS (2002) Varieties of learning automata: an overview. IEEE Trans
Syst Man Cybern Part B (Cybern) 32:711–722
4. Misra S, Krishna PV, Saritha V, Obaidat MS (2013) Learning automata as a utility for power
management in smart grids. IEEE Commun Mag 51:98–104
5. Thapa R, Jiao L, Oommen BJ, Yazidi A (2017) A learning automaton-based scheme for
scheduling domestic shiftable loads in smart grids. IEEE Access 6:5348–5361
6. Thathachar MAL, Arvind MT (1998) Parallel algorithms for modules of learning automata.
IEEE Trans Syst Man Cybern 28:24–33
7. Rama Devi B, Edla S (2018) Thermometer approach-based energy distribution with fine
calibration in a smart grid tree network. J Comput Electr Eng 72:393–405
8. Rama Devi B (2018) Thermometer approach based energy distribution in a smart grid tree
network. In: 2018 International conference on electronics technology (ICET 2018)
9. Rama Devi B (2018) Load factor optimization using intelligent shifting algorithms in a smart
grid tree network. Cluster Comput J Networks Softw Tools Appl 2018:1–12
10. Rama Devi B, Bisi M, Rout RR (2017) Fairness index of efficient energy allocation schemes
in a tree based smart grid. Pak J Biotechnol 14(2):120–127
11. Rama Devi B, Srujan Raju K (2017) Energy distribution using block based shifting in a
smart grid tree network. In: Advances in intelligent systems and computing book series
(AISC), vol 712, pp 609–618
12. Rama Devi B (2017) Dynamic weight based energy slots allocation and pricing with load in
a distributive smart grid. J Adv Res Dyn Control Syst 11:419–433
13. MATLAB. https://www.mathworks.com/products/matlab.html. Accessed 31 May 2017
Extracting Buildings from Satellite Images
Using Feature Extraction Methods
1 Introduction
Satellite technology is coupled with human life in many ways, such as critical analysis, planning, decision making and preparedness. Analyzing and interpreting satellite images is of paramount importance, as their features are very complex in nature. Extracting buildings from a satellite image is interesting and helps in planning, analysis and estimation. With the help of segmentation algorithms, these spatial features can be extracted from the image. Extracting the buildings in a satellite image helps in urban planning, 3D viewing and construction, restoration and change detection, and in analyzing built-up areas, vegetation cover, bare soil, drought, etc. Furthermore, we can analyze and quantify the level of coverage of each attribute in the image. In this paper, five feature extraction algorithms are applied to the satellite images and their performance is compared.
Oriented Gradient (HOG) and Local Binary Pattern (LBP) methods are used to detect the buildings. The SVM classifier is used to classify buildings and non-buildings. The output of HOG-LBP is further refined by a segmentation algorithm to extract the building regions in the image. Xin et al. proposed a morphological building index (MBI) framework for detecting buildings in satellite images [2]. In [3], an integrated approach for extracting buildings in a satellite image is presented; the SVM approach is used to identify the building regions in the image, and a threshold approach is used to extract the buildings. Tahmineh et al. presented a framework for extracting the built-up areas in the satellite image; the geometrical features of the buildings are classified by an SVM classifier, and the Scale Invariant Feature Transform (SIFT) algorithm is used to extract the primitive features of the image [4]. In [5], a DSM interpolation algorithm is proposed to retain the grid features in the image; the method employs a graph cut algorithm and neighbor contexture information, which enhances the accuracy of building extraction. Shaobo et al. presented a framework for extracting buildings in mobile LiDAR data; it makes use of a localize-then-segment method to extract the buildings in the image [6]. In [7], a CNN-based neural network architecture is proposed to extract the buildings and roads in a remotely sensed image; in post-processing, Simple Linear Iterative Clustering (SLIC) is applied to the image to detect the continuous regions of the road network. Mustafa et al. presented an integrated model for detecting rectangular-structure and circular-type buildings with the help of an SVM classifier, the Hough transform and perceptual grouping; the building patches are classified by the SVM classifier, and the boundaries of the buildings are extracted by edge detection, Hough transform and perceptual grouping methods [8].
Extracting the built-up areas in a satellite image is complex due to the fact that it contains hidden features. The satellite image considered for processing is taken as input. Figure 1 shows the flow diagram of extracting the buildings in the image.
The image contains speckle noise by default, and it should be removed. The image is registered, calibrated and geometrically corrected. Feature extraction techniques such as edge extraction, grey-scale morphological operations, Haralick texture extraction, morphological classification and local statistics extraction are then applied to the image.
Fig. 1. Flow diagram of building extraction: removing speckle noise followed by the feature extraction methods.
mean, variance, skewness and kurtosis are considered for computation. Therefore, the output image comprises four features for each band. The red, blue, green and yellow bands represent the built-up structure, road, soil and building edges respectively.
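As a rough illustration of this local-statistics step (not the authors' implementation or tooling), the NumPy/SciPy sketch below computes the four neighbourhood statistics per pixel; the window size and library choice are assumptions.

```python
import numpy as np
from scipy.ndimage import generic_filter
from scipy.stats import skew, kurtosis

def local_statistics(image, window=5):
    """Compute per-pixel neighbourhood mean, variance, skewness and kurtosis,
    yielding a four-band feature image as described above. The 5x5 window is
    an assumption; the paper does not state one."""
    bands = [
        generic_filter(image, np.mean, size=window),
        generic_filter(image, np.var, size=window),
        generic_filter(image, lambda v: skew(v, bias=True), size=window),
        generic_filter(image, lambda v: kurtosis(v, bias=True), size=window),
    ]
    return np.stack(bands, axis=-1)   # shape: (rows, cols, 4)

# Toy usage on a random "image"; a real run would load the satellite scene.
toy = np.random.default_rng(0).random((64, 64))
print(local_statistics(toy).shape)
```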
3 Experimental Results
This work considers the Coimbatore settlement as a test image for the experiment. The image is openly available and downloadable from the Bhuvan website [11]. Figure 2(a) shows the original image of the Coimbatore settlement. The feature extraction techniques, namely edge extraction, grey-scale morphological operations, Haralick texture extraction, morphological classification and local statistics extraction, are applied to the image. For the edge extraction technique, the three edge extraction operators gradient, Sobel and Touzi are applied to the image, and the results of these methods are shown in Fig. 2(b) to 2(d). The gradient operator detects the line edges in the image, and the Sobel operator fine-tunes the edges. The Touzi operator enhances the edges in the image and gives a better visualization of the edges; its output is easier for the human eye to interpret than that of the other two operations.
The grey-scale morphological operations on the image are shown in Fig. 2(f) to 2(i): the outputs of the dilate, erode, opening and closing operations respectively. The Haralick texture extraction technique is applied to the image and the output is shown in Fig. 2(j) to 2(l). Figure 2(j) shows the output of the simple Haralick operation applied to the image; the building structure is extracted and shown as green segments. Figure 2(k) shows the advanced Haralick operation; the edges of the buildings are extracted and outlined in yellow, giving a clear separation of buildings from other features. The higher-order Haralick operation is applied to the image and the outcome is shown in Fig. 2(l); the buildings are identified in red and are demarcated. The morphological classification technique is also applied to the image. Figure 2(m) shows the output of the classified image using morphological classification with the ball structure chosen as the structuring element. Figure 2(n) is the classified image of the Coimbatore settlement using morphological classification with the cross structure selected as the structuring element. Figure 2(o) shows the output of the local statistics extraction method; the statistical features mean, variance, skewness and kurtosis are computed for each pixel in the image along with its neighborhood pixels. The four-band image is the outcome of this extraction method; the built-up area, vacant land, road and vegetation are clearly demarcated in the image.
Fig. 2. (a) original image (Coimbatore settlement) 2(b)–2(d) edge detection operations 2(b)
gradient 2(c) Sobel 2(d) Touzi 2(f)–2(i) grey scale morphological operations. 2(f) Dilate 2(g)
Erode 2(h) opening 2(i) closing 2(j)–2(l) Haralick texture extraction. 2(j) Simple 2(k) Advanced
2(l) Higher order. 2(m)–2(n) Morphological classification – ball structure and cross structure.
2(o) local statistics extraction.
Table 1 shows the comparison of the MAE and PSNR values of the given image with the resultant images. For the MAE metric, lower is better; for the PSNR metric, higher means better accuracy. For the edge detection method, the gradient, Sobel and Touzi operations are compared; Touzi gives better performance than the other two operations. Among the grey-scale morphological operations, we compare dilate, erode, opening and closing; the closing operation gives better performance than the other operations. Haralick texture extraction is compared across its three operations, namely simple, advanced and higher order; the advanced Haralick method performs better than the other two, and Fig. 2(k) clearly shows the extraction of buildings in the image. The morphological classification method is applied to the image; the ball structure returns better results than the cross structure classification. Finally, the local statistics extraction performance is given.
The feature extraction techniques are compared in Table 1. All of these algorithms fall under the feature extraction category. For this kind of image, the advanced method of Haralick texture extraction returns better results than all the other algorithms.
Table 1. Comparison of five feature extraction methods with their MAE and PSNR values.
Feature extraction methods Operations MAE PSNR
Edge detection Gradient 95.19 7.23
Sobel 95.27 7.22
Touzi 94.98 7.24
Grey scale morphological operation Dilate 81.35 8.20
Erode 86.30 7.82
Opening 82.21 8.14
Closing 80.34 8.28
Haralick texture extraction Simple 102.04 6.78
Advanced 80.30 8.30
Higher order 85.88 7.86
Morphological classification Structure – ball 129.12 5.19
Structure – cross 129.22 5.18
Local statistic extraction Default 80.75 8.27
Its MAE value of 80.30 is lower than that of the other methods, and its PSNR value of 8.30 is higher than that of all the other methods. The Haralick texture extraction captures about 80 percent of the built-up areas in the image.
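The two comparison metrics used in Table 1 can be computed with the standard formulas sketched below; an 8-bit peak value of 255 is assumed, and the random arrays merely stand in for the satellite image and one feature-extraction output.

```python
import numpy as np

def mae(original, result):
    """Mean absolute error between the input image and a resultant image."""
    return float(np.mean(np.abs(original.astype(float) - result.astype(float))))

def psnr(original, result, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 assumed for 8-bit images)."""
    mse = float(np.mean((original.astype(float) - result.astype(float)) ** 2))
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Toy usage with random 8-bit arrays in place of real imagery.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, (128, 128)).astype(np.uint8)
out = rng.integers(0, 256, (128, 128)).astype(np.uint8)
print(round(mae(img, out), 2), round(psnr(img, out), 2))
```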
4 Conclusion
Extracting buildings in a satellite image is a complex process, as its features differ from those of other images. This work concentrated on applying five feature extraction algorithms to the satellite image and implementing them successfully. The satellite image of the Coimbatore settlement from Tamil Nadu is taken for feature extraction. The algorithms work well on the image, and their performance is compared and shown. The Haralick texture extraction algorithm works well on the image, and the building features are correctly extracted from it. The algorithms were also tested on mobile LiDAR images.
References
1. Konstantinidis D, Stathaki T (2017) Building detection using enhanced HOG–LBP features
and region refinement processes. IEEE J Sel Top Appl Earth Obs Remote Sens 10(3):
888–905
2. Huang X (2017) A new building extraction postprocessing framework for high-spatial-
resolution remote-sensing imagery. IEEE J Sel Top Appl Earth Obs Remote Sens 10(2):
654–668
3. Manandhar P, Aung Z, Marpu PR (2017) Segmentation based building detection in high
resolution satellite images. IEEE international geoscience and remote sensing symposium
(IGARSS), pp 3783–3786
1 Introduction
A wireless sensor network (WSN) is a network consisting of a group of wireless sensor nodes with resource constraints such as limited battery power and inadequate communication. The nodes have the ability to sense an event, convert the analog information to digital, and transmit it to a destination node; such a node is usually called the sink. Every sensor node communicates wirelessly with the other nodes within its radio communication range. Wireless sensor nodes have limited battery power, and they are regularly deployed in areas where it is difficult to replace or restore their power source [1]. Therefore, data transmitted from a source reaches the sink through a hop-by-hop approach. To extend the network lifetime of WSNs, the design and algorithms must respect these constraints.
Congestion control in WSNs is categorized into two control methods: traffic control and resource control. In traffic control, congestion is handled by adjusting the incoming traffic rate; the rate is increased or decreased based on the condition of the congestion. In resource control, to increase the network capacity, the nodes which did not initially participate in transmitting packets in the network are considered as additional resources [2]. Traffic and resource control methods each have advantages and disadvantages depending on the conditions.
Congestion in WSNs is categorized with reference to where and how packets are lost in
the network.
2.1 Regarding the Reason of Congestion Where the Packets Are Lost [4]
Source (transient) Hotspot: A network with a large number of sensor nodes generating information through a critical point produces a persistent hotspot near the source, within one or two hops. In this state, providing backpressure from the point of congestion towards the source gives effective results.
Sink (persistent) Hotspot: A network with a smaller number of sensor nodes causing low data rates can form a transient hotspot anywhere in the sensor field, but most probably on the path from the source to the sink.
Intersection Hotspot Congestion: In a sensor network which has more than one flow between the sources and the sink, these flows crisscross each other; thus, the area around a crossing point is likely to become a hotspot. In tree-like communication, the nodes at each crossing experience forwarder congestion. Compared with source and sink hotspots, intersection hotspots are demanding because it is hard to predict the intersection points due to network dynamics.
2.2 Regarding the Reason of Congestion How the Packets Are Lost
Node-level congestion: Congestion takes place once the packet arrival rate exceeds the packet service rate within a node; due to this there is packet loss, delay, and an increase in queuing. Packet loss causes retransmissions, which leads to additional energy loss. This occurs near the sink node due to the heavier upstream traffic.
Link-level congestion: It is generated due to contention, interference and bit error rate. In WSNs, the nodes communicate through wireless channels using CSMA protocols; the channels are shared by several nodes, and collisions take place when several active sensor nodes try to seize the channel at the same time.
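A minimal sketch of the arrival-rate versus service-rate comparison described above is given below; the ratio-based congestion indicator is a common formulation used here only for illustration, not a protocol taken from the surveyed papers, and the numbers are hypothetical.

```python
def congestion_degree(arrival_rate, service_rate):
    """Node-level congestion indicator: ratio of packet arrival rate to
    packet service rate at a node. A value above 1.0 means the queue grows
    and packets will eventually be dropped."""
    return arrival_rate / service_rate if service_rate > 0 else float("inf")

# Toy numbers for a node near the sink: 50 packets/s arriving, 40 served.
d = congestion_degree(arrival_rate=50.0, service_rate=40.0)
print(f"congestion degree = {d:.2f} -> {'congested' if d > 1.0 else 'ok'}")
```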
3 Control Methods
In WSNs, the algorithms that handle congestion can be differentiated into congestion control, congestion avoidance and reliable data transmission. Congestion mitigation algorithms take reactive actions; these algorithms are considered at the MAC, network and transport layers. Algorithms designed to avoid the occurrence of congestion are called congestion avoidance algorithms; these usually operate at the MAC and network layers. Further, reliable data transmission attempts to recover all or part of the lost data.
Swarm Optimization (PSO) is used to improve C-LV by minimizing the end-to-end delay; the resulting improvement in quality of service is verified by simulation results.
The main scope of this study is to determine whether all data packets created during a crisis state can be forwarded to the sink without reducing the data rate of the source. The congestion control methods are compared according to traffic rate control (Table 1) and resource management (Table 2) with respect to their key ideas, advantages and disadvantages. Significant and characteristic metrics used by the congestion protocols include throughput, packet delivery ratio, delay, end-to-end delay, node energy consumption, packet loss and network lifetime.
6 Future Work
Future work can be a hybrid algorithm that combines the advantages and avoids the disadvantages of the traffic and resource control methods. Concerning the algorithms, it is interesting to study whether mobile nodes can move to the points where alternative paths are needed. Mobile nodes can become a very effective solution to congestion control problems.
7 Conclusion
In this paper, the congestion control problem in WSNs is outlined, and different protocols are reviewed based on their merits and demerits. The main objective of this paper is to study how to extend the network lifetime in WSNs and reduce the end-to-end delay. All the algorithms aim to control congestion and extend the network lifetime by effectively utilizing the limited available resources. However, improving congestion control mechanisms remains an open issue.
References
1. Zhao F, Guibas L (2004) Wireless sensor networks: an information processing approach. dl.acm.org. eBook ISBN 9780080521725
2. Kang J, Zhang Y, Nath B (2007) TARA: topology-aware resource adaptation to alleviate
congestion in sensor networks. IEEE Trans Parallel Distrib Syst 18:919–931. https://doi.org/
10.1109/TPDS.2007.1030
3. Sergiou C, Vassiliou V (2010) Poster abstract: alternative path creation vs. data rate
reduction for congestion mitigation in wireless sensor networks. In: 9th ACM/IEEE
international conference on information processing in sensor networks, pp 394–395
4. Woo A, Culler DE A transmission control scheme for media access in sensor networks:
congestion detection and avoidance in sensor networks. https://doi.org/10.1145/381677.
381699. ISBN:1-58113-422-3
5. Ee CT, Bajcsy R (2004) Congestion control and fairness for many-to-one routing in sensor
networks. In: Paper presented at the proceedings of the 2nd international conference on
embedded networked sensor systems. https://doi.org/10.1145/1031495.1031513. ISBN 1-58113-879-2
6. Hull B, Jamieson K, Balakrishnan H Mitigating congestion in wireless sensor networks. In:
Paper presented at the proceedings of the 2nd international conference on embedded
networked sensor systems. https://doi.org/10.1145/1031495.1031512. ISBN:1-58113-879-2
7. Wang C, Li B, Sohraby K, Daneshmand M, Hu Y (2007) Upstream congestion control in wireless sensor networks through cross-layer optimization. https://doi.org/10.1109/JSAC.2007.070514
8. Vedantham R, Sivakumar R, Park S-J (2007) Sink-to-sensors congestion control. Ad Hoc
Netw 5(4):462–485
9. Scheuermann B, Lochert C, Mauve M (2008) Implicit hop-by-hop congestion control in
wireless multi hop networks. Ad Hoc Netw. https://doi.org/10.1016/j.adhoc.2007.01.001
10. Yin X, Zhou X, Huang R, Fang Y, Li S (2009) A fairness-aware congestion control scheme
in wireless sensor networks. https://doi.org/10.1109/TVT.2009.2027022
11. Fang W-W, Chen J-M, Shu L, Chu T-S, Qian D-P (2014) Congestion avoidance, detection
and alleviation in wireless sensor networks. www.ijerd.com 10(5):56–69. e-ISSN 2278-067X, p-ISSN 2278-800X
12. Tao LQ, Yu FQ (2010) ECODA: enhanced congestion detection and avoidance for multiple,
pp 1387–1394. https://doi.org/10.1109/TCE.2010.5606274
13. Li G, Li J, Yu B (2012) Lower bound of weighted fairness guaranteed congestion control
protocol for WSNs. In: Proceedings of the IEEE INFOCOM, pp 3046–3050
14. Brahma S, Chatterjee M, Kwiat K, Varshney PK (2012) Traffic management in wireless
sensor networks 35(6):670–681. https://doi.org/10.1016/j.comcom.2011.09.014
15. Hua S (2014) Congestion control based on reliable transmission in wireless sensor networks.
https://doi.org/10.4304/jnw.9.3.762-768
16. Joseph Auxilius Jude M, Diniesh VC (2018) DACC: Dynamic agile congestion control
scheme for effective multiple traffic wireless sensor networks. https://doi.org/10.1109/WiSPNET.2017.8299979
17. Misra S, Tiwari V, Obaidat MS (2009) Lacas: learning automata-based congestion avoid-
ance scheme for healthcare wireless sensor networks. IEEE J Sel Areas Commun. https://doi.org/10.1109/JSAC.2009.090510
18. Royyan M, Ramli MR, Lee JM Kim DS (2018) Bio-inspired scheme for congestion con- trol
in wireless sensor networks. https://doi.org/10.1109/WFCS.2018.8402366
19. Alam MM, Hong CS CRRT: congestion-aware and rate-controlled reliable transport. https://doi.org/10.1587/transcom.E92.B.184
20. Pilakkat R, Jacob L (2009) A cross-layer design for congestion control in UWB-based
wireless sensor networks. https://doi.org/10.1504/IJSNET.2009.027630
21. Antoniou P, Pitsillides A, Blackwell T, Engelbrecht A (2013) Congestion control in wireless
sensor networks based on bird flocking behavior. https://doi.org/10.1007/978-3-642-10865-5_21
22. Sergiou C, Vassiliou V, Paphitis A Hierarchical Tree Alternative Path (HTAP) algorithm for
congestion control in wireless sensor networks. https://doi.org/10.1016/j.adhoc.2012.05.010
23. Aghdam SM, Khansari M, Rabiee HR, Salehi M WCCP: a congestion control protocol for
wireless multimedia communication in sensor networks. https://doi.org/10.1016/j.adhoc.2013.10.006
24. Domingo MC Marine communities based congestion control in underwater wireless sensor
networks. https://doi.org/10.1016/j.ins.2012.11.011
25. Luha AK, Vengattraman T, Sathya M (2014) Rahtap algorithm for congestion control in
wireless sensor network. Int J Adv Res Comput Commun Eng 3(4)
26. Sergiou C, Vassiliou V, Paphitis A (2014) Congestion control in wireless sensor networks
through dynamic alternative path selection. Comput Net. https://doi.org/10.1016/j.comnet.
2014.10.007
27. Chand T, Sharma B, Kour M TRCCTP: a traffic redirection based congestion control
transport protocol for wireless sensor networks. https://doi.org/10.1109/ICSENS.2015.7370452
28. Ding W, Tang L, Ji S (2016) Optimizing routing based on congestion control for wireless
sensor network 22(3):915–925. https://doi.org/10.1007/s11276-015-1016-y
29. Ding W, Niu Y, Zou Y (2016) Congestion control and energy balanced scheme based on the
hierarchy for WSNs. IET Wirel Sens Syst. https://doi.org/10.1049/iet-wss.2015.0097
EHR Model for India: To Address Challenges
in Quality Healthcare
1 Introduction
India is among the most highly populated nations. There are various conventional and non-conventional as well as advanced medical practices carried out by a huge number of medical practitioners and health centers. Health care is a challenging issue in India. In our country, due to the high population, most regions, even those near cities, lack medical facilities. Medical data is generated in various forms by a large number of medical practitioners and hospitals, and health-related information has witnessed exponential growth. Traditional data applications are not appropriate to analyze and process this huge amount of data. Data is being generated from various sources, and data capturing, analysis, searching, sharing, storage, visualization and security are emerging as major challenges. New techniques are emerging to counter these challenges and to use the data for improved decision making. The data stored in most text databases are semi-structured, in that they are neither completely unstructured nor completely structured. For example, a document may contain a few structured fields, such as Patient Name, Age, Gender and Physician Information, but also contain some largely unstructured text components, such as Medical History, Treatment, Reports and Precautions. There have been a great many studies on the modeling and implementation of semi-structured data in recent database research. Moreover, information retrieval techniques, such as text indexing methods, have been developed to handle unstructured documents.
It is the need of the hour to implement and use an "EHR" as per the guidelines framed by the Ministry of Health and Family Welfare [20]. It is a major task to provide a suitable architecture for an Electronic Health Record system along with an acceptance model for the EHR system. Towards the betterment of medical services in India, additional efforts are being made to provide a solution with a subsystem that uses IoT for medical facilities [8].
2 Scope of Work
In this section we focus on major areas such as the early use of computer systems to maintain the treatment channel using medical transcription, the health service infrastructure in India, organizational challenges in India, efforts taken by the Government, some issues of EHR, the importance of standards, and a few success stories about the adoption of EHR systems; finally we discuss the tasks that need to be carried out for the successful adoption of EHR in India.
Earlier, patient records were maintained in paper format, preserved in huge cabinets and used whenever required for reference or verification. It is necessary to preserve the treatment plan for future use and also to keep track of patient health. In most super-specialty hospitals there is a practice of creating clinical notes based on case papers or treatment plans given by doctors. There is also a practice of doctors sending their dictation as sound files, which are converted into computer files by a medical transcription team.
Apart from the above-mentioned health care system, in each sector there are general practitioners who provide their services through a personal dispensary, a government polyclinic or private polyclinics. Health records are generated at every health service encounter. Most of the records are either lost or just lying in physical form with the medical service unit or with the patient, and some records are destroyed after a certain period.
OPD records are normally handed over to patients; even if they are maintained by the hospital, they are destroyed after 5 years. IPD records are destroyed after 5 years, and some units retain them for 10 years. As per the Deesha Act, data of expired patients and data of medico-legal cases are retained for a lifetime. This method is followed only in properly organized hospitals.
General practitioners normally do not keep records, although there are some successful cases of properly framed treatment. If records are not maintained, most of the data cannot be made available for further research, and patient referral becomes critical. To serve these varied purposes, an EHR is necessary in India: it can be made available to various users for various purposes, and to all direct and indirect stakeholders.
A recent article in the Hindustan Times mentions that India has about 1 million doctors of modern medicine to treat its population of 1.3 billion, and hardly 1.5 lakh doctors in public service to serve patients. Patient-centric care is largely absent in our country. All private hospitals and medical facility centers have their own computer systems, but these are limited to the registration of patients and the collection of fees towards treatment; interoperability is not provided in any of these systems. Discussions on the "Future of Healthcare and Medical Records" note that digital medical records make it possible to improve the quality of patient care in numerous ways. A similar system has been adopted by various central organizations for their employees and is a success in the area of healthcare; the CHSS adopted by BARC, described below, is one example.
CHSS stands for Contributory Health Service Scheme, in which employees contribute according to their cadre level. They are given a CHSS membership number, which they can use to avail healthcare for themselves as well as their family members at dispensaries, primary health centers and the hospitals of BARC. All hospitals and dispensaries are linked through a dedicated network, and all medical transactions are recorded on a central server. Doctors at BARC can refer the patient if extraordinary healthcare is essential. All expenses are taken care of by the CHSS, and there is a central CHSS administration office to manage all necessary procedures in this regard.
There is still a need to make the storage system compatible with Health Level 7 (HL7), an internationally recognized health record standard; HL7 is discussed further in this paper. While implementing the EHR system we look forward to focusing on various ways technology can help to transform health care [3].
1. Better diagnosis and treatment: doctors could potentially rely on computer systems to keep track of patient history, stay up to date on medical research and analyze treatment options.
2. Helping doctors communicate with patients: technology can help doctors create a dialogue with patients who do not understand medical terms or hesitate to communicate.
3. Linking doctors with other doctors: doctors may need to refer patients to specialists; this can be made easier by creating interoperability of medical information.
4. Connecting doctors and patients: it is always beneficial to create a dialogue between patients and doctors, and an effective communication system is suggested by the author.
5. Helping patients stay healthy: trends in technology may add value in helping patients stay healthy by promoting awareness systems.
After a detailed literature review we have identified various areas in which information technology can provide quality health care [4]: (i) mobile health care, (ii) cloud computing and virtualization, (iii) big data analytics, (iv) medical standards, and (v) meaningful use of technology. Healthcare IT offers many opportunities and requires various skill sets, such as privacy, data security, infrastructure and legal practice. Analytics will offer huge scope for IT professionals to provide technologies for healthcare. Maintaining the quality of data is also a vital factor in the acceptance of the technology. Major problems [8] with respect to scalability, heterogeneity, verification
and authentication should also be considered while designing the EHR system. Big data analytics in healthcare is evolving into a promising field for providing insight from very large data sets and improving outcomes while reducing costs. Its potential is great; however, challenges remain to be overcome.
Health data volume is expected to grow dramatically in the years ahead. In addi-
tion, healthcare reimbursement models are changing; meaningful use and pay for
performance are emerging as critical new factors in today’s healthcare environment.
Although profit is not and should not be a primary motivator, it is vitally important for
healthcare organizations to acquire the available tools, infrastructure, and techniques to
leverage big data effectively. Electronic health records go hand in hand [9] with the characteristics of big data, namely volume, variety, velocity and, with respect specifically to healthcare, veracity. By digitizing, combining and effectively using big data, healthcare organizations ranging from single-physician offices and multi-provider groups to large hospital networks and care organizations stand to realize significant benefits. Researchers have discussed the minimum elements of the EHR and presented work on meeting EHR requirements [11]; system standards such as openEHR and HL7 (Health Level 7) are analysed along with sample patient data in their research project. The EHR has a futuristic approach towards better diagnostics and treatment decisions, assures healthcare, and also provides quality in other healthcare services.
It is necessary to study the work done so far by various professionals, from the historical period to date. This section describes the efforts taken in the field of EHR by various countries and their people.
4 Future Scope
LOINC: Logical Observation Identifiers Names and Codes is a database and universal
standard for identifying medical laboratory observations. First developed in 1994, it
was created and is maintained by the Regenstrief Institute, a US non profit medical
research organization.
DICOM: Digital Imaging and Communications in Medicine is a standard for han-
dling, storing, printing, and transmitting information in medical imaging. It includes a
file format definition and a network communications protocol.
HL7: Health Level 7 refers to a set of international standards for transfer of clinical
and administrative data between software applications used by various healthcare
providers.
openEHR: This is a virtual community working on means of turning health data from
the physical form into electronic form and ensuring universal interoperability among
all forms of electronic data. The primary focus of its endeavour is on electronic health
records (EHR) and related systems.
A technology standard will help to identify a whole record as belonging to a similar category of records. Users must be provided with a pick list or drop-down menu that helps them enter health-related information in the standard form.
Interoperability will facilitate improved data exchange for patient safety in clinical decision support and analysis. The model will further be transformed into a cloud-based system with a user interface customised for mobile as well as web apps. The patient's data will be made accessible only with the patient's consent. A research test bed can be created from easy-to-use system platforms.
5 Proposed Model
It is observed that most general practitioners do not maintain patients' information and also use locally coined names for diseases. A patient's health information record plays a vital role in deciding the treatment plan. Patients' demographic as well as treatment history needs to be maintained in electronic form and should be made available to authorized persons (the patients themselves and the attending doctors). The major aim is to establish the practice of recording information at the root level.
In our proposed model the first phase comprises a data collection module. This module consists of an EHR system, either a cloud-based mobile app or a web portal, with a state-of-the-art user interface. All stakeholders can enter data in the specific format suggested by the National Electronic Health Authority (NeHA). The system provides a facility that first registers all the stakeholders who will use the system and who are authentic users of it. The users are assigned access rights that assure privacy and security. The whole patient record is transferred to the cloud in secured form.
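To make the data collection module more concrete, the following minimal R sketch shows one way a patient record and role-based access rights could be represented. The field names, roles and values are illustrative assumptions only and do not follow the official NeHA/EHR format referred to above.

# Illustrative only: field names and roles are assumptions, not the NeHA format.
patient_record <- list(
  patient_id   = "P-0001",
  demographics = list(name = "A. Kumar", age = 42, gender = "M"),
  encounters   = list(
    list(date = "2019-01-12", provider = "D-017",
         diagnosis = "Type 2 diabetes", treatment = "Metformin 500 mg")
  )
)

# Hypothetical access-rights table: which stakeholder role may read which parts.
access_rights <- list(
  patient   = c("demographics", "encounters"),
  doctor    = c("demographics", "encounters"),
  registrar = c("demographics")
)

# Return only the parts of a record that a given role is allowed to read.
view_record <- function(record, role) {
  allowed <- access_rights[[role]]
  record[names(record) %in% c("patient_id", allowed)]
}

str(view_record(patient_record, "registrar"))   # registrar sees demographics only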
6 Conclusions
In this paper we have identified initiatives that can be taken to provide a model for recording health data with minimum effort. A cost-effective system is intended and will be the major expected outcome. The Government of India has taken initiatives towards EHR by providing the National Health Portal and various web services to upload and store patient information. The following are a few areas in which there is scope for improvement and which help to identify the objectives of this research.
1. There is a need to test the existing system with respect to HL7 compliance and other standards specified by the Ministry of Health and Family Welfare.
2. A health information system that has a patient-centric approach needs to be developed.
3. It is also necessary to build communication portals, connections to various platforms and help aids for the patients.
4. There is wide scope to adopt business intelligence and analytics in healthcare [15]; intelligence needs to be applied along with the analytics to assure quality in health care.
5. Security and privacy issues should not be ignored given the growing demand for healthcare data. The challenges of EHR, namely privacy, security, user friendliness, portability and interoperability, need to be addressed.
References
1. Bajaj D, Bharati U, Ahuja R, Goel A (2017) Role of Hadoop in big data analytics, CSI
communication April, pp 14–18
2. Blumenthal D, Future of healthcare and medical records
3. Bhattachrya I, Muju S (2012) Need of interoperability standards for healthcare in India. CSI
Journal of Computing 1(4)
4. Cyganek B, Selected aspects of EHR analysis from big data perspective. In: IEEE 2015
international conference on bioinformatics and biomedicine, pp 1391–1396
5. EHR Standards for India, guidelines given by Ministry of Health and Family welfare, April-
2016, e-governance Division
6. El-Gayar O, Timsina P (2014) Opportunities of business intelligence and big data analytics in evidence based medicine. In: IEEE 47th Hawaii international conference on system science, pp 749–757
7. Sinha P, Sunder G, Mantri M, Bendale P, Dande A Electronic health record: standards, coding systems, frameworks, and infrastructures. IEEE Press, published by John Wiley
8. Fisher D, DeLine R, Czerwinski M, Drucker S (2012) Interactions with big data analytics,
ACM interactions microsoft research May–June 2012 pp 50–59
9. Gholap J, Janeja VP, Yelena Y (2015) Unified framework for Clinical Data Analysis
(U-CDA). In: IEEE international conference on big data 2015, pp 2939–2941
10. Grana M, Jackwoski K (2015) EHR: A review. In: IEEE international conference on
bioinformatics and biomedicine, pp 1375–1379
11. Hein R (2012) IT healthcare trends for technology job seekers in 2013. CIO, November 2012
12. How to Build a successful data lake Data lake whitepaper on http://www.knowledgent.com
(2017)
13. Humm BG, Walsh P (2015) Flexible yet efficient management of health records. In: IEEE
conference on computational science and computational intelligence, pp 771–775
14. Karkouch A (2015) Data quality enhancement in IOT environment. In: IEEE international
conference on computer systems and applications, Morocco
15. Lee E (2013) 5 ways technology is transforming health care, IBM Forbes Brandvoice
Contribution January 2013 http://onforb.es/Y16Mno
16. Lu J, Keech M (2015) Emerging technologies for health data analytics research: a conceptual
architecture. In: Jing Lu, Malcom Keech (eds.) IEEE workshop on data base and expert
system applications 2015, pp 225–229
In this study, hybrid classifiers are used and their results are analysed; the results demonstrate that the hybrid classifiers give better outcomes.
1 Introduction
The purpose of gender prediction from text is to predict the gender of individuals from scanned images of their handwriting. The procedure used in this work analyses the text according to some geometrical, structural and textural features of the handwriting. Gender prediction is carried out by extracting features from writing samples and training a classifier to learn and discriminate between them.
In this work there are three modules used for predicting gender from handwriting:
Feature Extraction
Data Cleaning
Classification
Feature Extraction: [1] Images are first converted into binary form using the Otsu thresholding algorithm. For gender identification of the writer, probability distribution functions (PDFs) are generated from the handwriting images to capture the writer's individuality. Direction feature: this feature estimates the tangential direction of the central axis of the text; it uses a probability density function of 10 dimensions. Curvature feature: this attribute, commonly used in forensic analysis, considers the curvature as a distinguishing feature; it uses a probability density function of 100 dimensions representing the values of the curvature at the outline pixels. Tortuosity feature: this feature makes it possible to distinguish between two kinds of writers, one whose handwriting is complex and written slowly and another whose handwriting is smooth and written fast; for every pixel in the foreground, we search for the longest straight line possible that is contained in the foreground. Chain code feature: a chain code is generated by following the outline of the given handwriting and assigning a number to every pixel according to its location with respect to the previous pixel; for this, we consider the 8 pixels around a given pixel and consider its location with respect to these pixels. Edge direction feature: edge-based directional features give a generalized configuration of directions, and they can also be applied at several scales by aligning a window centered at each outline pixel and counting the occurrences of each direction. This feature has been computed from size 1 to size 10.
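As an illustration of the binarization step, the following R sketch implements Otsu's threshold selection directly on a grayscale matrix with levels 0–255; the random image used here is a placeholder rather than a QUWI handwriting sample.

# Otsu's method: pick the threshold that maximizes the between-class variance.
otsu_threshold <- function(img) {
  counts <- tabulate(as.integer(img) + 1L, nbins = 256)   # histogram of levels 0..255
  p      <- counts / sum(counts)
  levels <- 0:255
  best_t <- 0; best_var <- -Inf
  for (t in 1:254) {
    w0 <- sum(p[1:(t + 1)]); w1 <- 1 - w0                  # class weights
    if (w0 == 0 || w1 == 0) next
    mu0 <- sum(levels[1:(t + 1)] * p[1:(t + 1)]) / w0      # class means
    mu1 <- sum(levels[(t + 2):256] * p[(t + 2):256]) / w1
    between <- w0 * w1 * (mu0 - mu1)^2                     # between-class variance
    if (between > best_var) { best_var <- between; best_t <- t }
  }
  best_t
}

set.seed(1)
img    <- matrix(sample(0:255, 100 * 100, replace = TRUE), 100, 100)  # placeholder image
thresh <- otsu_threshold(img)
binary <- img > thresh    # logical matrix: pixels above the Otsu threshold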
2 Related Work
In 2012, Al Maadeed et al. [13] proposed QUWI, an Arabic and English handwriting dataset for offline writer identification, which is used to evaluate offline writer identification systems. It shows that handwriting can be used to identify the gender and handedness of a specific writer, as well as his or her age and nationality. In the same year, Hassaïne et al. [14] proposed a set of geometrical features for writer identification, describing how geometrical features such as directions, curvatures and tortuosities can characterize writers. In a 2015 survey, Patel and Thakkar [Patel 2015] pointed out that a 100% success rate is still unreachable in the problem of connected handwriting identification; holistic approaches reduce the obligation to perform complicated segmentation operations on handwriting. In 2016, Blucher and his partners introduced a system that uses a modification of a Long Short-Term Memory (LSTM) neural network to process and identify whole paragraphs. However, these systems restrict the vocabulary that may arise in the text, so only limited identification results are obtained in cases of restricted vocabularies. To break this chain of reduced vocabularies, some authors successfully use recurrent networks with Connectionist Temporal Classification (CTC). In 2017, Morera et al. [17] proposed a research article on gender and handedness prediction from offline handwriting using convolutional neural networks, presenting an experimental study based on a neural network for several automatic demographic classifications from handwriting. Sah et al. [7] used a weighted-average technique in the computation of the normal act, where weight values are calculated according to the distances between the given data and each selected cluster. Sah et al. [4] note that data mining offers various algorithms for the purpose of mining; the bucket of data mining techniques includes association rule mining, clustering, classification and regression. Padhy et al. [11] discussed a cost-effective and fault-resilient reusability prediction model using an adaptive genetic-algorithm-based neural network for web-of-service applications and proposed the corresponding algorithms and models. The authors' primary focus was reusability prediction, taking 100 web service projects.
4 Methodology
Three main methodologies are used for predicting gender from handwriting.
Fig. 2. (a) Predefined order of traversing shape. (b) Example of an ordered shape. (c) Estimating directions by neighboring pixels. (d) Binary image and its corresponding Zhang skeleton
A Hybrid Mining Approach: Gender Prediction from Handwriting 777
Fig. 4. (a) Longest traversing segment for 4 different pixels. (b) Length of maximum traversing segment: red corresponds to the maximum length and blue to the minimum one.
Fig. 5. (a) Order followed to generate chain code. (b) Example shape. (c) Corresponding chain code of the example
precision for the image and predict the gender of the user. For the above-mentioned purpose, we use all of the above attributes and generate some new attributes to enhance the efficiency of the existing system.
f1: DirectionPerpendicular5Hist10
f2: CurvatureAli5Hist100
f3: tortuosityDirectionHist10
f4: chaincodeHist_8
f5: chaincode8order2_64
f6: chaincode8order3_512
f7: chaincode8order4_4096
f8: directions_hist1_4
f9: directions_hist2_8
f10: directions_hist3_12
f11: directions_hist4_16
f12: directions_hist5_20
f13: directions_hist6_24
f14: directions_hist7_28
f15: directions_hist8_32
f16: directions_hist9_36
f17: directions_hist10_40
f18: directions_hist1a2_12
f19: directions_hist1a2a3_24
f20: directions_hist1a2a3a4_40
f21: directions_hist1a2a3a4a5_60
f22: directions_hist1a2a3a4a5a6_84
f23: directions_hist1a2a3a4a5a6a7_112
f24: directions_hist1a2a3a4a5a6a7a8_144
f25: directions_hist1a2a3a4a5a6a7a8a9_180
f26: directions_hist1a2a3a4a5a6a7a8a9a10_220
[14–16] All the features listed above are used for specific purposes. Feature f1 is used for calculating the direction of the text, i.e. in which direction the text is aligned. Feature f2 is used for calculating the curvature of the handwritten text. Feature f3 is used for calculating the tortuosity of the handwritten text; this feature is useful when the handwritten text is twisted or complex. Features f4–f7 are used for calculating the chain code of the handwritten text, and correspond to chain code histograms of increasing order. We can generate a chain code just by following the outline of the given script and then assigning a number to each pixel according to its location with respect to the previous pixel; for implementing this process, we consider the 8 pixels around a given pixel and consider its location with respect to these pixels. Features f8–f26 are used for calculating the edge-based direction of the handwritten text. Edge-based directional features give a generalized configuration of directions and can also be applied at many scales by aligning a window focused at each outline pixel and computing the occurrences of each direction. This feature has been computed from size 1 to size 10.
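The chain code feature (f4) can be illustrated with the short R sketch below: given an already extracted, ordered list of outline pixel coordinates, each step to the next pixel is mapped to one of 8 directions and the codes are tabulated into a histogram. The boundary extraction step and the exact code numbering are assumptions made for illustration, not the authors' implementation.

# Map each move between consecutive outline pixels to an 8-connected chain code
# and build the order-1 histogram (analogous to feature f4). The code numbering
# (0 = east, then counter-clockwise) is an assumption for illustration.
chain_code_hist <- function(boundary) {            # boundary: n x 2 matrix (row, col)
  dr <- diff(boundary[, 1])
  dc <- diff(boundary[, 2])
  codes <- mapply(function(r, c) {
    key <- paste(r, c)
    switch(key,
           "0 1" = 0, "-1 1" = 1, "-1 0" = 2, "-1 -1" = 3,
           "0 -1" = 4, "1 -1" = 5, "1 0" = 6, "1 1" = 7,
           NA)                                     # non-adjacent step -> NA
  }, dr, dc)
  h <- tabulate(codes + 1L, nbins = 8)
  h / sum(h)                                       # normalised 8-bin histogram
}

# Tiny example: a closed square outline traversed from the top-left corner.
square <- rbind(c(1, 1), c(1, 2), c(1, 3), c(2, 3), c(3, 3),
                c(3, 2), c(3, 1), c(2, 1), c(1, 1))
chain_code_hist(square)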
5.1. Data Cleaning
To detect and remove corrupted data from the data set, we use a data cleaning process. Data cleaning essentially belongs to data quality, and data quality is assessed by checking the following properties of the data:
Accuracy
Completeness
Uniqueness
Timeliness
Consistency
Binning: this method is used to smooth sorted data by consulting its neighborhood. The sorted values are distributed into a number of buckets (bins). For example, consider the dataset
D = {4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34}
Equal-frequency (equal-depth) bins:
Bin 1: 4, 8, 9, 15
Bin 2: 21, 21, 24, 25
Bin 3: 26, 28, 29, 34
Smoothing by bin means:
Bin 1: 9, 9, 9, 9
Bin 2: 23, 23, 23, 23
Bin 3: 29, 29, 29, 29
Smoothing by bin boundaries:
Bin 1: 4, 4, 4, 15
Bin 2: 21, 21, 25, 25
Bin 3: 26, 26, 26, 34
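The same binning example can be reproduced with a few lines of R; the three-bin split and the rounding of bin means follow the worked example above.

# Equal-frequency binning with smoothing by bin means and by bin boundaries.
d      <- c(4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34)   # already sorted
n_bins <- 3
bin_id <- rep(1:n_bins, each = length(d) / n_bins)
bins   <- split(d, bin_id)

smooth_means      <- lapply(bins, function(b) rep(round(mean(b)), length(b)))
smooth_boundaries <- lapply(bins, function(b) {
  # replace each value by the nearer of the two bin boundaries
  ifelse(abs(b - min(b)) <= abs(b - max(b)), min(b), max(b))
})

smooth_means        # 9 9 9 9 | 23 23 23 23 | 29 29 29 29
smooth_boundaries   # 4 4 4 15 | 21 21 25 25 | 26 26 26 34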
5.2. Dataset
We use the dataset publicly available from Qatar University (QUWI). This data set is based on handwriting scripts and is used to predict gender. It stores the scripts of 282 writers, in both English and Arabic. Each writer produced, in each language, one text that is the same for all the writers and one text that is distinct for each writer. Writers in this data set are of both genders, male and female. We take 3 distinct data sets. The first stores the test data of 282 writers, in which the script written by the writers is distributed over 7070 columns of different features such as direction, curvature, tortuosity, chain code and edge direction. The second dataset has the training data of 200 writers; here too the script written by the writers is distributed over 7070 columns of the same features, and additional columns are present for the writer's id, page id, language, same or different text, and other attributes. The third dataset stores the answers for the
test dataset: two columns are present, the first holding the writer's id and the second a binary label 0 or 1, where 0 indicates that the writer is female and 1 indicates that the writer is male. We compute all the features on the dataset [5, 6]. The features are combined using different classification methods; we use two ensemble methods for predicting gender from the script, a random forest classifier and a gradient boosting classifier. For gender classification, the chance-level probability of predicting a single writer correctly is 50%, since two genders, male and female, are available for training the model; hence we have to predict whether the writer is male or female.
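A minimal sketch of the two classifiers named above is given below, using the randomForest and gbm packages; the file names and the label column name are placeholders and do not reflect the actual layout of the QUWI feature files.

library(randomForest)   # random forest classifier
library(gbm)            # gradient boosting classifier

train <- read.csv("train.csv")          # placeholder: feature columns plus label
train$gender <- factor(train$gender)    # 0 = female, 1 = male

rf_model <- randomForest(gender ~ ., data = train, ntree = 500)

train_gbm <- train
train_gbm$gender <- as.integer(train$gender) - 1     # gbm "bernoulli" needs a 0/1 label
gbm_model <- gbm(gender ~ ., data = train_gbm, distribution = "bernoulli",
                 n.trees = 500, interaction.depth = 3, shrinkage = 0.05)

test     <- read.csv("test.csv")
rf_pred  <- predict(rf_model, newdata = test)
gbm_prob <- predict(gbm_model, newdata = test, n.trees = 500, type = "response")
gbm_pred <- ifelse(gbm_prob > 0.5, 1, 0)              # 1 = male, 0 = female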
Table 2. Writer of English and Arabic represented by their Gender
(The original table is a spreadsheet excerpt: each row is one handwriting sample identified by writer id, page id (pages 1–2 in Arabic, pages 3–4 in English) and a same-text flag, followed by the tortuosity histogram feature values computed for that sample. The numeric feature values are truncated in the source and are omitted here.)
5.3. Result
5.4. Conclusion
The run-time experimental results show that four types of ensemble classifiers are used; for this balanced binary problem the chance-level accuracy is 50%. Handwriting recognition can also help in biometric security enhancement. The most common binary problems are the prediction of the user's gender and the prediction of whether the user is left-handed or right-handed. A property of all the above-mentioned problems is that they can be either balanced or unbalanced: for example, the problem is balanced in the case of gender prediction and unbalanced in the case of handedness prediction. Technically these problems are complex, even for a human, since it is quite difficult to predict which handwriting features properly identify each class. The results table shows that the multiple classifiers/methods [12], SVM, Random Forest, Naïve Bayes and AdaBoost, produced better results than any single classifier. In terms of AUC, SVM, Random Forest and AdaBoost reach 1.000 and Naïve Bayes 0.973; in terms of classification accuracy (CA), SVM and Random Forest reach 0.993, Naïve Bayes 0.968 and AdaBoost 1.000. The precision and recall values are the same, so after applying the classification techniques the precision of AdaBoost, SVM and Random Forest is overall higher than that of the other classifiers.
References
1. De Pauw W, Jensen E, Mitchell N, Sevitsky G, Vlissides J, Yang J (2002) Visualizing the
execution of Java programs. In: software visualization, Springer. pp 151–162
2. Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery
in databases. AI Mag 17(3):37
3. Ghorashi SH, Ibrahim R, Noekhah S, Dastjerdi NS (2012) A frequent pattern mining
algorithm for feature extraction of customer reviews. Int J Comput Sci Issues (IJCSI) 9:29–
35
4. Sah RD et al (2017) Pattern extraction and analysis of health care data using rule based classifier and neural network model. Int J Comput Technol Appl 8(4):551–556. ISSN: 2229-6093
5. Jiang J-Y, Liu R-J, Lee S-J (2011) A fuzzy self-constructing feature clustering algorithm for
text classification. IEEE Trans Knowl Data Eng 23(3):335–349
1 Introduction
Recommender systems are systems that filter information. They give product and service suggestions tailored to the user's needs and habits. Recommender systems are intelligent, adaptive applications that suggest information items or, more generally, "interests" that best suit the user's needs and preferences in a given situation and context [1, 2]. The essential task of a recommender system is to predict the rating a user will give to an interest, using suitable models. These models exploit the ratings given by users for previously viewed or purchased interests and produce the required suggestions.
The most fundamental and useful recommendation schemes are collaborative filtering and content-based filtering. Collaborative systems predict item ratings for the current user based on the ratings given by other users whose preferences are highly correlated with those of the current user [3]. Content-based systems predict ratings for an unseen interest based on how much its description (content) resembles interests which the user has rated highly in the past [4]. Different issues are associated with the use of the various recommender system approaches. The most prominent issues are over-specialization (lack of serendipity), associated with content-based recommender systems; rating sparsity, associated with content-based and collaborative recommender systems; and ramp-up (cold start), associated with collaborative recommender systems.
Over-specialization arises when users are offered only interests similar to the ones they have already seen, while new ones that they might like are overlooked. The issue of rating sparsity arises when the current user's ratings do not overlap with other users' ratings. Ramp-up arises either because there are not enough ratings for a new user or because there are not enough ratings on an item. To address these issues, it is appealing to combine recommender techniques so as to build on the advantages offered by the individual approaches and improve recommendation accuracy; hence the need for a hybrid approach, which is the premise of this work.
2 Methodology
The new system uses both user-based and interest-based approaches. In the user-based approach, the users perform the crucial activity: if a majority of the users have a similar taste, they are joined into one group, and suggestions are given to users based on the evaluation of interests by other users from the same group, with whom they share common or persistent preferences [5]. In the interest-based approach, the taste of users may remain constant or change gradually; similar interests build neighbourhoods based on the appreciation of users, and the system then produces recommendations with interests in the neighbourhood that a user would prefer. The system is a hybrid framework which combines both content-based and collaborative filtering strategies for recommendation. This is done to deal with the problem of over-specialization in content-based recommender systems and the problem of ramp-up in collaborative recommender systems. The architecture of the system is shown in Fig. 1. It has two kinds of user: those accessing the system through the web front end and those accessing it from mobile phones. Requests from the two interfaces are routed through a common host running a web server shared by the two interfaces. Inputs are obtained from the user and also from the database, and the hybrid recommender produces the most suitable suggestion as a response, through the web server, back to the individual mobile and web users.
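The user-based step described above can be sketched in R as follows: similarities between the active user and other users are computed over a small rating matrix and an unseen rating is predicted as a similarity-weighted average. The rating matrix, item names and the use of cosine similarity are illustrative assumptions, not the authors' exact formulation.

# Find users similar to the active user and predict an unrated item's score
# as a similarity-weighted average of the other users' ratings (0 = unrated).
ratings <- matrix(c(5, 3, 0, 4,
                    4, 0, 0, 5,
                    1, 1, 5, 2,
                    0, 4, 4, 1), nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("user", 1:4),
                                  c("wheat", "rice", "cotton", "maize")))

cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

predict_rating <- function(r, active, item) {
  others <- setdiff(rownames(r), active)
  sims   <- sapply(others, function(u) cosine(r[active, ], r[u, ]))
  rated  <- others[r[others, item] > 0]          # only users who rated the item
  sum(sims[rated] * r[rated, item]) / sum(abs(sims[rated]))
}

predict_rating(ratings, "user1", "cotton")       # predicted interest of user1 in cotton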
2.1 Implementation
Providing recommendations to users requires more than just a recommender algorithm: the algorithm needs a data set to work on. Many recommender systems, in both academic and commercial settings, acquire information by having users explicitly state their preferences for items [6]. These stated
preferences are then used to estimate the user's preference for items they have not rated. From a system-building point of view this is a straightforward technique and avoids potentially difficult inference of user preferences. It suffers, however, from the drawback that there can, for various reasons, be a gap between what users say and what they do. In the Usenet domain, this has been examined by using newsgroup subscriptions and user activities, for example time spent reading, saving or replying, and copying content into new posts and related replies (Fig. 2).
Fig. 2. Similarity computation: description similarity, functionality similarity and characteristic similarity are combined to produce the output data
D_sim(a, b) = |D_a ∩ D_b| / |D_a ∪ D_b|   (1)
It can be seen from Eq. (1) that the larger D_a ∩ D_b is, the more similar the two items are. Dividing by D_a ∪ D_b is a scaling factor which guarantees that the description similarity lies in the range of 0 to 1.
F_sim(a, b) = |F_a ∩ F_b| / |F_a ∪ F_b|   (2)
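Equations (1) and (2) are Jaccard-style set similarities; the small R sketch below computes the description similarity for two invented term sets (the term sets themselves are placeholders).

# Jaccard similarity from Eqs. (1) and (2): |A ∩ B| / |A ∪ B| over term sets.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

desc_a <- c("rice", "kharif", "irrigation", "clay", "transplantation")
desc_b <- c("rice", "rabi", "irrigation", "loam", "broadcasting")

d_sim <- jaccard(desc_a, desc_b)   # description similarity, Eq. (1)
d_sim                              # 2 shared terms / 8 distinct terms = 0.25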
Table 1. (continued)
User context | Preferred terms | Finding context | Temporal context
Rice requires hot and damp climatic conditions for its successful cultivation; the ideal temperature over the lifetime of the crop ranges from 20 °C to 40 °C. Soil requirements for rice cultivation: rice can be grown on a wide variety of soils, such as silts, loams and gravel; deep, rich clayey or loamy soils are considered ideal for the rice crop.
Cultivation methods in rice farming:
1. Broadcasting method: seeds are sown by hand; suitable in regions where the soil is not fertile and the land is dry; requires the least labour and inputs; gives a lower yield.
2. Drilling method: ploughing of the land and sowing of the seeds can be done by two people.
3. Transplantation method: used where the soil has good fertility and abundant rainfall/irrigation; paddy seeds are sown in nursery beds, germinated and uprooted (after 5 weeks), and the seedlings are transplanted into the main field; heavy labour and inputs; best yield.
4. Japanese method: high-yielding varieties can be included in this method, which requires a heavy dose of fertilizers; seeds should be sown on raised nursery beds and the seedlings transplanted in rows; useful for high-yielding hybrid crops.
Major rice-producing states: West Bengal, UP, AP, Telangana, Punjab, Haryana, Bihar, Orissa, Assam, Tamil Nadu, Kerala.
(Preferred-term, finding and temporal keywords interleaved in the original column layout: sowing, seeds, transplantation, fertility, rainfall/irrigation, germinated seeds, nursery beds, hybrid crops, high yielding, heavy labour, production.)
3 Conclusion
This work proposed a hybrid strategy for recommendation. The system was used to recommend agriculture-related information tailored to the user's needs and preferences by combining both collaborative and content-based approaches. The recommendation is also made based on suggestions given by experienced users and agricultural technology specialists, and on the user-interest model: the information is retrieved from the database and checked for the greatest similarity to the expressed preferences using the similarity computation method, so as to give the most fitting and best suggestion to the users.
References
1. Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17:734–749. http://dx.doi.org/10.1109/TKDE.2005.99
2. Burke R (2002) Hybrid recommender systems: survey and experiments. User Model User-Adapt Interact 12:331–370. http://dx.doi.org/10.1023/A:1021240730564
3. Herlocker JL, Konstan JA, Borchers A, Riedl J (1999) An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, pp 230–237
4. Pazzani M, Billsus D (1997) Learning and revising user profiles: the identification of interesting web sites. Machine Learning: Special Issue on Multistrategy Learning 27:313–331
5. Ye M, Tang Z, Xu JB, Jin LF (2015) Recommender system for e-learning based on semantic relatedness of concepts. Information 6:443–453. http://www.mdpi.com/2078-2489/6/3/443
6. Sandeep RS, Vinay C, Hemant SM (2013) Strength and accuracy analysis of affix removal stemming algorithms. Int J Comput Sci Inf Technol 4(2):265–269
7. Gupta V, Lehal GS (2013) A survey of common stemming techniques and existing stemmers for Indian languages. J Emerg Technol Web Intell 5(2):157–161
8. Bart PK, Martijn CW, Zeno G, Hakan S, Chris W (2012) Explaining the user experience of recommender systems. User Model User-Adapt Interact 22:441–504. http://dx.doi.org/10.1007/s11257-011-9118-4
9. Rodriguez A, Chaovalitwongse WA, Zhe L et al (2010) Master defect record retrieval using network-based feature association. IEEE Trans Syst Man Cybern Part C: Appl Rev 40(3):319–329
10. Ujwala HW, Sheetal RV, Debajyoti M (2013) A hybrid web recommendation system based on the improved association rule mining algorithm. J Softw Eng Appl 6:396–404. www.scirp.org/journal/jsea
11. Monteiro E, Valante F, Costa C, Oliveira JL (2015) A recommendation system for medical imaging diagnostic. Stud Health Technol Inform 210:461–463. http://person.hst.aau.dk/ska/MIE2015/Papers/SHTI210-0461.pdf
12. Pagare R, Shinde A (2013) Recommendation system using bloom filter in map reduce. Int J Data Min Knowl Manag Process (IJDKP) 3:127–134. https://doaj.org/article/72be16a4732148ccaa346fbdfead3bf7
Identification of Natural Disaster Affected Area
Using Twitter
Abstract. Nowadays any social network activity can be posted on Twitter. People reach out to Twitter during natural disasters for help, by tweeting the areas that are affected and the type of natural disaster that has occurred. As social media is heavily relied upon at the time of natural disasters, it is very important to have an efficient method to analyze disaster-related tweets and find out the areas most affected by the natural disaster. In this paper we classify natural-disaster-based tweets from users using machine learning classification algorithms such as Naïve Bayes, Logistic Regression, KNN and Random Forest, and determine the best machine learning algorithm (based on metrics like accuracy, kappa, etc.) that can be relied upon to ascertain the severity of the natural disaster in a desired area.
1 Introduction
Large amounts of information of different types are generated due to the advent of new technologies. To handle these huge amounts of data we need to follow big data analysis techniques rather than traditional methods of analysis. Data mining is one such technique, used to discover interesting knowledge from large amounts of data. This project identifies the areas affected by a natural disaster using real-time tweets, by finding the locations' latitude and longitude and then mapping the locations onto a geographical map.
The Twitter data regarding the areas affected by natural disasters is obtained by first creating a Twitter app and requesting authentication from R Studio using the consumer key and consumer secret key of the created app. Then the keywords which are synonyms of the natural disaster keyword are grouped, and all the tweets containing those keywords are obtained. Only the tweet texts are retained, by removing the retweet entities, user mentions, HTML links, punctuation, numbers and unnecessary spaces from the Twitter data.
The locations in the tweet text are obtained by mapping the text to existing locations, and the locations' latitude and longitude are saved in Excel format. The latitude and longitude are then mapped onto the geographical map with the percentage effect of the natural disaster on each location, which is also obtained through the tweets. This map is helpful in identifying the areas most prone to a natural disaster, so that those areas can be alerted; the less prone areas, which could turn into most prone areas, can also be found and kept under observation.
The identification of natural-disaster-affected areas helps in minimizing the loss which could occur during natural calamities, and the effect of the natural disaster on people can also be minimized. Even after a natural disaster has occurred at a certain location, people there need to know about it so that relief operations can be carried out more efficiently and medical facilities can be provided in those areas. A visual representation of the affected areas is more effective, as it helps in identifying the affected areas with greater ease.
2 Related Work
Recent literature provides methods about how natural disasters can be notified to the
users through social media. Several authors used twitter as media to develop such
models using machine learning and deep learning algorithms.
Himanshu Shekhar et al. [1] explained disaster analysis using various attributes
such as user’s emotions, frequency and geographical distributions of various locations.
The article categorizes users’ emotions during natural disaster by building a sentiment
classifier.
Tahora H. Nazer et al. [2] developed a model that detects requests in Twitter content based on certain features such as URLs, hashtags and mentions. The request-identification performance of the algorithms used (SVM, Random Forest, AdaBoost, Decision Tree) is measured by comparing their precision, recall and F-measure.
Andrew Ilyas et al. [3] explained the usage of MicroFilters, a system designed to scrape the tweets and the image links in Twitter image data; machine learning is then used to classify the images, eliminating those that do not show direct damage to places from the disaster and are therefore not useful for rescue efforts. The paper also discusses the technical problems involved, such as data sparseness, feature engineering and classification.
Himanshu Verma et al. [4] analyzed disaster-affected areas by collecting tweets from users. After the preprocessing task, the tweets are evaluated using a Naïve Bayes classifier, and a Chi-Square test is conducted to identify the best features among the tweets given by various users, which are then used to generate a polarity score.
Nikhil Dhavase et al. [5] used geoparsing of Twitter data to identify the location of disaster or crisis situations through Twitter analysis. The article makes use of Natural Language Processing (NLP) methods, and the ZeroR, Filtered Classifier, Multi Scheme and Naïve Bayes Multinomial text classifiers are used to classify the tweets and obtain the event that occurred; the accuracies of the algorithms were then compared, and the Naïve Bayes Multinomial text classifier gave the best accuracy.
Harvinder Jeet Kaur et al. [6] experimented with a model for sentiment analysis based on Part of Speech (POS) tagging features implemented with a Naïve Bayes classifier to classify areas based on severity. The authors achieved an accuracy of 66.8%, which can be improved further.
Mironela Pirnau et al. [7] considered word associations in social media posts for analyzing the content of tweets. The article makes use of the Apriori algorithm, a data mining technique, to identify the points on earth most frequently prone to earthquakes.
Nicholas Chamansingh et al. [8] developed a sentiment analysis framework where
Naïve Bayes classifier and Support Vector Machines (SVM) are used. The experi-
mental results are compared by reducing the feature set in each experiment.
Arvind Singh et al. [9] evaluated a framework using a linear and probabilistic
classifier for sentiment polarity classification over the tweets obtained from twitter and
the algorithms Naïve Bayes, SVM (Support Vector Machines), Logistic regression
were compared using accuracy metric. SVM had produced the highest accuracy of the
three algorithms compared.
Lopamudra Dey et al. [10] used a hotel review dataset and a movie review dataset to compare Naïve Bayes and K-NN classifiers using accuracy, precision and recall. Naïve Bayes gave better results for movie reviews, whereas for hotel reviews both algorithms gave similar results.
Saptarsi Goswami et al. [11] gave an extensive and in-depth literature study on current techniques for disaster prediction, detection and management, summarizing the results disaster-wise. They proposed a framework for building, in a phased manner, a disaster management database for India hosted on an open-source big data platform such as Hadoop, as India is among the top 5 countries in terms of the absolute number of human lives lost.
Hyo Jin Do et al. [12] investigated people's emotional responses expressed on Twitter during the 2015 Middle East Respiratory Syndrome (MERS) outbreak in South Korea. They first presented an emotion analysis method to classify fine-grained emotions in Korean Twitter posts and then conducted a case study of how Korean Twitter users responded to the MERS outbreak using this method. Experimental results on a Korean benchmark dataset demonstrated the superior performance of the proposed emotion analysis approach on real-world data.
3 Proposed Method
3.2 Classification
The data obtained from Twitter is processed and cleaned by removing the HTML links, retweet entities, new lines, etc.; this data is then classified using the sentiment package, which contains an emotion classifier and a polarity classifier.
The dataset is classified with four different machine learning algorithms, namely Random Forest, K Nearest Neighbours, Naïve Bayes and Logistic Regression. The accuracy of the classified datasets is determined to detect the best algorithm for the classification, and the output of the best algorithm is used to find the severity of the natural disaster based on the negative polarity of each city in the dataset.
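A hedged sketch of this model comparison step using the caret package is shown below. The feature matrix and labels are random placeholders standing in for the features derived from the cleaned tweets, and the caret method names map to the four classifiers listed above ("rf" needs the randomForest package and "nb" the klaR package).

library(caret)

set.seed(42)
feats  <- data.frame(matrix(runif(200 * 10), nrow = 200))      # placeholder features
labels <- factor(sample(c("disaster", "normal"), 200, replace = TRUE))

ctrl    <- trainControl(method = "cv", number = 5)              # 5-fold cross-validation
methods <- c(RandomForest = "rf", KNN = "knn", NaiveBayes = "nb", Logistic = "glm")

models <- lapply(methods, function(m)
  train(x = feats, y = labels, method = m, trControl = ctrl))

# Compare accuracy and kappa across the four classifiers, as in Table 1.
summary(resamples(models))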
3.3 Algorithm
3.3.1 Algorithm for Extraction and Cleaning Data
Create a twitter application
Get the API keys and Access Tokens
Authenticate using the API keys and Access tokens
Keyword search to access twitter data
for city in cities
for disaster in disasters
SearchTwitter with the keywords city and disaster joined by the separator '+', up to 5000 tweets
clean_tweets function
remove retweet entities of tw
remove people of tw
remove punctuation of tw
remove numbers of tw
remove html links of tw
remove unnecessary spaces of tw
remove emojis or special characters of tw
convert all the text of tw into lower case letters
remove new line and carriage return of tw
tw_vector concatenate an empty vector with the tweets tw passed to
clean_tweets function
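The extraction and cleaning steps above can be made concrete with the R sketch below, using the twitteR package and base-R regular expressions; the API keys, keyword lists and the exact cleaning order are placeholders that only approximate the pseudocode.

library(twitteR)

setup_twitter_oauth("CONSUMER_KEY", "CONSUMER_SECRET",
                    "ACCESS_TOKEN", "ACCESS_SECRET")    # placeholder credentials

clean_tweets <- function(tw) {
  tw <- gsub("(RT|via) @\\w+:?", "", tw)        # retweet entities
  tw <- gsub("@\\w+", "", tw)                   # people (@mentions)
  tw <- gsub("http\\S+", "", tw)                # html links
  tw <- gsub("[[:punct:]]", "", tw)             # punctuation
  tw <- gsub("[[:digit:]]", "", tw)             # numbers
  tw <- iconv(tw, "UTF-8", "ASCII", sub = "")   # emojis / special characters
  tw <- gsub("[\r\n]", " ", tw)                 # new lines and carriage returns
  tw <- gsub("\\s+", " ", trimws(tw))           # unnecessary spaces
  tolower(tw)
}

cities    <- c("chennai", "mumbai")             # example keyword lists
disasters <- c("flood", "cyclone")

tweets <- list()
for (city in cities)
  for (disaster in disasters) {
    found <- searchTwitter(paste(city, disaster, sep = "+"), n = 5000)
    tweets[[paste(city, disaster)]] <- clean_tweets(twListToDF(found)$text)
  }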
4 Environmental Setup
The following tools are used in this work to identify natural disasters.
R: this paper uses R as the scripting language.
twitteR: this R package is used to get access to the Twitter API.
sentiment: this package is used to obtain the opinions (emotion and polarity) in the dataset.
plotly: it is used to convert graphs into interactive web-based versions.
leaflet: this package is used for map plots.
stringr: it provides easy string-manipulation functions used while cleaning data; these functions can also handle zero-length characters and NAs.
ROAuth: it is used for the authentication request to Twitter.
ggplot2: graphics in R can be implemented using ggplot2 functions; it supports multiple data sources and is useful for both base and lattice graphs.
RColorBrewer: this package provides colour palettes used to shade maps according to a variable.
devtools: it provides functions that simplify many common development tasks.
Caret: The caret package contains tools for data splitting, pre-processing, feature
selection and model tuning using resampling.
5 Results
Metrics such as precision, accuracy and recall are used to find the best model among the four models used to predict natural disasters. The accuracies of the different models are as follows (Table 1):
Table 1. Shows accuracy scores obtained by running different machine learning algorithms on
the data.
S. No. Name of the algorithm Highest accuracy obtained
1 Random Forest Classifier 0.735891648
2 K Nearest Neighbors 0.732505643
3 Naïve Bayes 0.581828442
4 Logistic Regression 0.735891648
From the above table it can be inferred that the highest accuracy is obtained by the Random Forest classifier and Logistic Regression. The data of either of these two algorithms can be used to calculate the negative polarity of the cities, i.e. the effect of natural disasters on the considered cities (Figs. 2, 3, 4, and 5).
Fig. 5. Depicts the negative polarity values of each considered city over a map based on latitude
and longitude
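A sketch of the Fig. 5 style map is given below using the leaflet package; the city coordinates and negative-polarity values are invented placeholders rather than values computed from the collected tweets.

library(leaflet)

city_polarity <- data.frame(
  city = c("Chennai", "Mumbai", "Visakhapatnam"),
  lat  = c(13.08, 19.08, 17.69),
  lng  = c(80.27, 72.88, 83.22),
  neg  = c(0.62, 0.35, 0.48)        # fraction of tweets with negative polarity
)

leaflet(city_polarity) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat,
                   radius = ~neg * 20,            # larger circle = more affected
                   label  = ~paste0(city, ": ", neg),
                   color  = "red", fillOpacity = 0.6)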
6 Conclusion
7 Future Scope
This research can be further improved by also including data that is geoparsed for the required locations where natural disasters have occurred.
References
1. Shekhar H, Setty S (2015) Vehicular traffic analysis from social media data. In: 2015
International conference on advances in computing, communications and informatics
(ICACCI)
2. Nazer TH, Morstatter F, Dani H, Liu H (2016) Finding requests in social media for disaster
relief. In: 2016 IEEE/ACM International conference on advances in social networks analysis
and mining (ASONAM)
3. Ilyas A (2014) MicroFilters: harnessing Twitter for disaster management. In: IEEE 2014
Global humanitarian technology conference. 978-1-4799-7193-0/14
4. Verma H, Chauhan N (2015) MANET based emergency communication system for natural
disasters. In: 2015 International conference on computing, communication & automation
(ICCCA 2015)
5. Dhavase N, Bagade AM (2014) Location identification for crime & disaster events by
geoparsing Twitter. In: International conference for convergence of technology – 2014.
IEEE. 978-1-4799-3759-2/14
6. Kaur HJ, Kumar R (2015) Sentiment analysis from social media in crisis situations. In:
International conference on computing, communication and automation (ICCCA2015).
IEEE. ISBN: 978-1-4799-8890-7/15
7. Pirnau M (2017) Word associations in media posts related to disasters — a statistical
analysis. In: 2017 International conference on speech technology and human-computer
dialogue (SpeD), IEEE
8. Chamansingh N, Hosein P (2016) Efficient sentiment classification of Twitter feeds. In: 2016
IEEE International conference on knowledge engineering and applications
9. Raghuwanshi AS, Pawar SK Polarity classification of Twitter data using sentiment analysis.
Int J Recent Innov Trends Comput Commun 5(6):434–439. ISSN: 2321-8169
10. Dey L, Chakraborty S, Biswas A, Bose B, Tiwari S (2018) Sentiment analysis of review
datasets using naive bayes and K-NN classifier. Int J Inf Eng Electron Bus
11. Goswami S, Chakraborty S, Ghosh S, Chakrabarti A, Chakraborty B (2016) A review on
application of data mining techniques to combat natural disasters. Ain Shams Eng J. https://
doi.org/10.1016/j.asej.2016.01.012
12. Do HJ, Lim C-G, Kim YJ, Choi H-J (2016) Analyzing emotions in twitter during a crisis: a
case study of the 2015 middle east respiratory syndrome outbreak in Korea. In: BigComp
2016. IEEE. 978-1-4673-8796-5/16
13. Kishore RR, Narayan SS, Lal S, Rashid MA (2016) Comparative accuracy of different
classification algorithms for forest cover type prediction. In: 2016 3rd Asia-pacific world
congress on computer science and engineering. IEEE. 978-1-5090-5753-5/16
14. Nair MR, Ramya GR, Sivakumar PB (2017) Usage and analysis of Twitter during 2015
Chennai flood towards disaster management. In: 7th International conference on advances in
computing & communications, ICACC-2017. Cochin, India, 22–24 August 2017
15. Sangameswar MV, Nagabhushana Rao M, Satyanarayana S (2015) An algorithm for
identification of natural disaster affected area. J Big Data. https://doi.org/10.1186/s40537-
017-0096-1
Author Index
A C
Adada, Vishal, 261 Chakravarthy, A. S., 435
Aditya Kumar, K., 470 Chandana, G., 74
Ahmed, Kazi Arman, 48 Chandra Mohan, Ch., 712
Akhila, Nibhanapudi, 231 Chandra Prakash, S., 97
Akula, Rajani, 107 Chandrasekaran, A. D., 599
Ali, Sara, 190 Chatterjee, Chandra Churh, 39
Al-Shehabi, Sabah, 491 Chaugule, Dattatray, 254
Amudha, T., 581, 621 Chekka, Jaya Gayatri, 82
Anand, Dama, 528 Chekka, Srujan, 198
Anil Kumar, P., 712 Chetankumar, P., 1, 9
Anji Reddy, K., 785 Chillarige, Raghavendra Rao, 520
Anu Gokhale, A., 318 Chinthoju, Anitha, 82
Anuradha, T., 277 Chitra, C., 581
Arunesh, K., 335 Chittineni, Suresh, 344
Ashok Chaitanya Varma, R., 704
Ashrafuzzaman, Md., 31, 48 D
Das, Sayantan, 39
B Das, Sourav, 426
Babu, A. Sudhir, 723 Datta, Subrata, 426
Balamurali, Venkata, 301 Dayal, Abhinav, 198, 231
Bandu, Sasidhar, 147 Deepan Babu, P., 621
Bansal, Shivani, 161 Dhananjay, B., 23
Basu, S. K., 57 Dutta, Raja Ram, 772
Belly, Chandana, 66
Bhaskar, G., 452 F
Bhaskar, N., 687 Fasihuddin, Md, 169
Bhatnagar, Divya, 564, 759 Fatima, S. Sameen, 147
Birje, Mahantesh N., 480 Furukawa, Hiroshi, 408
Bollepalli, Nagaraju, 139
Bommala, Harikrishna, 222 G
Bonula, Sai Poojitha, 66 G, Pradeep Reddy, 177
Bulla, Chetan, 480 Ganashree, T. S., 687
V W
Vamsi, Sai, 301 Wang, Zhiping, 408
Vasavi, S., 318, 326
Venkata Subbarao, M., 704
Venkateswara Rao, M., 91 Y
Venkatraman, S., 350 Yellasiri, Ramadevi, 206
Vivekavardhan, J., 435 Yeluri, Lakshmi Prasanna, 592