
International Journal of Computer Applications Technology and Research
Volume 3 – Issue 12, 756 - 761, 2014, ISSN: 2319-8656

Automatic License Plate Recognition using OpenCV

Pratiksha Jain, Department of CSE, IGIT, GGSIPU, New Delhi, India
Neha Chopra, Department of ECE, IGIT, GGSIPU, New Delhi, India
Vaishali Gupta, Department of CSE, IGIT, GGSIPU, New Delhi, India

Abstract: An Automatic License Plate Recognition (ALPR) system is a real-time embedded system which automatically recognizes the license plates of vehicles. There are many applications, ranging from complex security systems to common areas, and from parking admission to urban traffic control. ALPR has complex characteristics due to diverse effects such as lighting and vehicle speed. Most ALPR systems are built using proprietary tools like Matlab. This paper presents an alternative method of implementing ALPR systems using free software, including Python and the Open Computer Vision Library.

Keywords: License plate, Computer Vision, Pattern Recognition, Python, OCR

1. INTRODUCTION
The scientific world is deploying research in intelligent transportation systems, which have a significant impact on people's lives. Automatic License Plate Recognition (ALPR) is a computer vision technology to extract the license numbers of vehicles from images. It is an embedded system which has numerous applications and challenges. Typical ALPR systems are implemented using proprietary technologies and hence are costly. This closed approach also prevents further research and development of the system. With the rise of free and open source technologies, the computing world has been lifted to new heights. People from different communities interact in a multi-cultural environment to develop solutions for man's never-ending problems. One of the notable contributions of the open source community to the scientific world is Python. Intel's research in computer vision bore fruit in the Open Computer Vision (OpenCV) library, which supports computer vision development.

2. PROPOSED SYSTEM
In India, basically, there are two kinds of license plates: black characters on a white plate and black characters on a yellow plate, the former for private vehicles and the latter for commercial and public service vehicles. The system tries to address these two categories of plates [Reference 1].

2.1 Capture
The image of the vehicle is captured using a high resolution photographic camera. A better choice is an infrared (IR) camera. The camera may be rolled and pitched with respect to the license plates.

Figure 1: Example of a number plate with acceptable resolution

2.2 Preprocess
Preprocessing is the set of algorithms applied to the image to enhance its quality. It is an important and common phase in any computer vision system. For the present system, preprocessing involves two processes. Resize: the image size from the camera might be large and can slow the system down, so it is resized to a feasible size. Convert color space: images captured using IR or photographic cameras will be either in raw format or encoded into some multimedia standard. Normally, these images will be in RGB mode, with three channels (viz. red, green and blue).

Figure 2: Image converted in RGB mode
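A minimal OpenCV sketch of these two preprocessing steps; the file name and the 640-pixel target width are illustrative assumptions, not values from the paper:

import cv2

# Load a captured vehicle image (the path is hypothetical).
image = cv2.imread("vehicle.jpg")

# Resize: shrink large camera frames to a feasible size, preserving the aspect ratio.
scale = 640.0 / image.shape[1]
resized = cv2.resize(image, None, fx=scale, fy=scale)

# Convert color space: collapse the three color channels into one grayscale channel.
gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)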


After performing steps 1 and 2, the image is passed to the next component.

2.3 License Plate Extractor
This is the most critical process in the license plate recognition system. In this process we apply different techniques to the image to detect and extract the license plate. The process is divided into two parts.

2.3.1 License Plate Detection through Haar-like Features
In image processing, Haar-like features are used to recognize objects in an image. If the proposed system were required only to detect license plates, then Haar-like features would be used for this purpose and no further processing would be done. This technique is old and laborious, and moreover needs a large database to store the collected samples, nearly 10,000 images of plates and characters.

2.3.2 License Plate Detection through Edge Detection
In the other case, since our proposed system has to recognize license plates, a binary image is created from the input image. After that, the following steps are performed to extract the license plate from the binary image:
1. Four connected points are searched in the binary image.
2. The width/height ratio is matched against those connected points.
3. The license plate region is extracted from the image.
4. A transformation of the extracted license plate is performed.
Then the extracted license plate is passed to the next component for further processing. This approach is quick, and takes less execution time and memory with a high efficiency ratio. That is why we have adopted this technique in our project.

Figure 3: License Plate Extraction

2.4 Character Segmentation
In this part, further image processing is done on the extracted license plate to remove unnecessary data. After character segmentation, the extracted license plate contains only those characters that belong to the license number. This is also achieved by matching width/height ratios against the contours detected on the extracted number plate.

Figure 4: Character Segmentation
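A hedged OpenCV sketch of the edge/contour-based extraction and ratio check described in Sections 2.3.2 and 2.4; the Otsu binarization and the 2-6 width/height bounds are illustrative assumptions, not values given in the paper:

import cv2

def extract_plate(image):
    # Create a binary image from the input.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # Search connected regions (contours) in the binary image.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        # Match the width/height ratio of each region against plausible plate shapes.
        if h > 0 and 2.0 <= w / float(h) <= 6.0:
            # Extract the license plate region from the image.
            return image[y:y + h, x:x + w]
    return None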
2.5 Optical Character Recognition
Finally, the selected blobs are sent to an Optical Character Recognition (OCR) engine, which returns the ASCII text of the license number.
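The paper does not name a specific OCR engine; as one concrete possibility (our choice, not the authors'), Tesseract via the pytesseract wrapper can read the segmented plate:

import cv2
import pytesseract

# Read the segmented plate image and ask Tesseract to treat it as a single text line.
plate = cv2.imread("plate.png")
text = pytesseract.image_to_string(plate, config="--psm 7")
print(text.strip())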

3. WHY OPENCV?
Advantages of OpenCV over MATLAB:

• Speed: Matlab is built on Java, and Java is built upon C. So when you run a Matlab program, your computer is busy trying to interpret all that Matlab code; it then turns it into Java, and finally executes the code. OpenCV, on the other hand, is basically a library of functions written in C/C++. You are closer to directly providing machine language code to the computer, so ultimately you get more image processing done for your computer's processing cycles, and not more interpreting. As a result, programs written in OpenCV run much faster than similar programs written in Matlab. For example, we might write a small program to detect people's smiles in a sequence of video frames. In Matlab, we would typically get 3-4 frames analyzed per second; in OpenCV, we would get at least 30 frames per second, resulting in real-time detection.

• Resources needed: Due to the high-level nature of Matlab, it uses a lot of your system's resources. Matlab code requires over a gigabyte of RAM to run through video. In comparison, typical OpenCV programs only require about 70 MB of RAM to run in real time. The difference, as you can easily see, is huge [Reference 5].

• Cost: The list price for the base (no toolboxes) MATLAB (commercial, single-user license) is around USD 2150. OpenCV (BSD license) is free.

• Portability: MATLAB and OpenCV run equally well on Windows, Linux and MacOS. However, any device that can run C can, in all probability, run OpenCV.


• Specific: OpenCV was made for image processing. Each function and data structure was designed with the image processing coder in mind. Matlab, on the other hand, is quite generic: you get almost anything in the world in the form of toolboxes, all the way from financial toolboxes to highly specialized DNA toolboxes.

Despite all these features, OpenCV does lose out to MATLAB on some points:

• Ease of use: Matlab is a relatively easy language to get to grips with. It is a pretty high-level scripting language, meaning that you don't have to worry about libraries, declaring variables, memory management or other lower-level programming issues. As such, it can be very easy to throw together some code to prototype an image processing idea, say, for example, reading in an image from file and displaying it.

• Memory management: OpenCV is based on C. As such, every time you allocate a chunk of memory you have to release it again. If you have a loop in your code where you allocate a chunk of memory in that loop and forget to release it afterwards, you get what is called a "leak", where the program uses a growing amount of memory until it crashes from no remaining memory. Due to the high-level nature of Matlab, it is "smart" enough to automatically allocate and release memory in the background. Matlab's memory management is pretty good; unless you are careful with OpenCV memory allocation and releasing, you can still be frustrated beyond belief.

• Development environment: Matlab comes with its own development environment. For OpenCV, there is no particular IDE that you have to use; instead, you have a choice of any C programming IDE depending on whether you are using Windows, Linux, or OS X. For Windows, Microsoft Visual Studio or NetBeans is the typical IDE used for OpenCV; in Linux, it is Eclipse or NetBeans; and in OS X, Apple's Xcode.

• Debugging: Many of the standard debugging operations can be used with both Matlab and OpenCV: breakpoints can be added to code, the execution of lines can be stepped through, variable values can be viewed during code execution, etc. Matlab, however, offers a number of additional debugging options over OpenCV. One great feature is that if you need to quickly see the output of a line of code, the semi-colon at the end can be omitted. Also, as Matlab is a scripting language, when execution is stopped at a particular line, the user can type and execute their own lines of code on the fly and view the resulting output without having to recompile and link again. Added to this are Matlab's powerful functions for displaying data and images, making Matlab the easiest development environment for debugging code.

3.1 Conclusion

Figure 5: Neckbeard Index Scores

From the final scores we can see that OpenCV has the edge over Matlab for image and video processing development. Although Matlab has an easy learning curve, built-in memory management and a great help section, it is very slow to execute code and is expensive to get started in. While OpenCV can be difficult to debug and requires much "housework" code for memory management, header files, etc., it wins out due to its free cost, the magnitude of sample code available on the internet, the short development path from prototype code to embedded code, the useful programming skills learnt from its use, and its super-fast speed. Matlab is a more generic programming language in that it was designed for many uses, demonstrated by its numerous toolboxes ranging from financial to specialized DNA analysis tools. On the other hand, OpenCV was made for image processing: each function and data structure was designed with the image processing coder in mind.

4. PROPOSED SOLUTION

4.1 Assumptions
The objective of our thesis is to detect and recognize license plates. Our application is based on the following assumptions:

1. Maximum expected distance between car and camera: 5 meters.
2. Minimum camera angle: 90 degrees (looking straight at the license plate).
3. Images should be captured in daylight.
4. Minimum camera resolution: 3 megapixels.

It is expected that the system would not work efficiently during night time or on rainy and cloudy days, because mobile cameras are not equipped with proper lighting. It is also expected that it will give results with decreasing accuracy as angles deviate significantly from the 90-degree (ideal) angle.


5. The new algorithm proposed for character recognition would give results with a considerable percentage of errors on implementation.

6. The efficiency of the proposed system can be measured only in terms of the number of license plates successfully and correctly recognized, which can only be measured upon implementation.

7. The efficiency and performance of the new system may decline due to discarding the OCR library, but the memory requirements will decrease, and the effort for installing, configuring and running the system would also decrease.

4.2 New Components of the Proposed System as Compared to the Traditional System

• DETECTION ALGORITHM: We are designing this system specifically for the newly proposed high security number plates, which have a black boundary across the number plate and a uniform font all across the country. We are going to utilize this black boundary in our system by using an edge-based license plate detection method. Traditionally, Haar-like features are used for detection. That algorithm needs a large number of license plate images which are manually obtained from a number of images including the backgrounds. It requires a larger memory to run, which is not suitable for embedded systems. Another problem with systems using AdaBoost is that they are slower than the edge-based methods. Such a system is also very sensitive to the distance between the camera and the license plate as well as to the view angle. We can eliminate all the above problems by using the edge-based detection method in our system. However, the detection rate of the edge-based method is slightly lower than that of Haar-like features. This is supported by a study conducted by research students of Linnaeus University: Haar-like features were 96% accurate, while the edge-based method was 87% accurate.

• OCR LIBRARY NOT USED: In the traditional system an OCR library is used, which has to be installed, configured and run, and which actually recognizes the characters. We are not using this library; instead, we are developing our own algorithm for character reading. OCR engines occupy more than 25 MB of space, and configuration of the OCR engine has to be done in the source code. The compiler takes quite a long time to compile typical OCR code because of its specific quality checks, spell checks, valid word checks, etc. These checks are not required in the ALPR case, because spell checks and valid word checks are useless for number plates. So our algorithm is simple, fast, and occupies less memory than an OCR engine, and it is expected to provide correct results upon implementation.

4.3 Proposed Algorithm

Description of the new algorithm for character recognition: in this part, the character-segmented license plate is passed to an optical character recognition algorithm designed by us, which uses a matrix transformation of the pixel values of the binary image and applies various filtration and extraction techniques which uniquely identify the characters (a code sketch follows the list below). The OCR algorithm returns the license plate in text format, which is later stored in a text file, thus reducing the space in memory storage [Reference 3].

• Our algorithm uses a 3-4 MB database of 36 files (images).
• These 36 images are samples containing the capital alphabets (A-Z) and numerals (0-9).
• These images are colored images, but only of one color, say red. So the pixel value where there is a character is 255,0,0, and where the space is empty the value is 255,255,255.
• The characters obtained after character segmentation are mapped against the characters in the database one by one: the character obtained from segmentation is mapped to a matrix, and then this matrix is compared with the sample images in the database one by one.
• If the character matches, then the value of the character is returned; else the next character is matched.
• If none of the 36 characters matches the image, then either the image is distorted or the number plate is invalid. In this condition a message will be returned.
• The matrix used will preferably be 20x20.
• For mapping between the sample image and the actual character we use green intensity pixels, because their value is 0 at every point where there is a character and 255 where there is white background. We could have used blue intensity as well.
• This algorithm will thus possibly be able to distinguish similar characters like 8 and B, because the percentage match of one character will be higher than the other.
• It is assumed that if an image matches on 70-80% of the pixel intensities, the character matches.
• Then the matrix is refreshed and the next character is copied into the matrix. The process continues until all the characters in the license plate have been matched.
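A compact Python rendering of this matching scheme (the paper's own pseudocode follows in Section 4.3.1). The file layout, the 20x20 matrix size and the 70-80% acceptance threshold come from the description above; binarizing the green channel before comparison is our adaptation to make the pixel-by-pixel match tolerant of small intensity differences:

import cv2
import numpy as np

# The 36 sample images: capital letters A-Z and numerals 0-9, one file each.
SAMPLES = [chr(c) for c in range(ord('A'), ord('Z') + 1)] + [str(n) for n in range(10)]

def recognize(segment, folder="samples", size=(20, 20), threshold=0.75):
    # Map the segmented character (a color image) to a fixed-size matrix
    # of green-channel values.
    seg = cv2.resize(segment, size)[:, :, 1]
    best, best_score = None, 0.0
    for name in SAMPLES:
        sample = cv2.imread(f"{folder}/{name}.jpg")
        ref = cv2.resize(sample, size)[:, :, 1]
        # Green intensity is 0 on character strokes and 255 on the white
        # background, so count the fraction of pixels on which the two agree.
        score = np.mean((seg > 127) == (ref > 127))
        if score > best_score:
            best, best_score = name, score
    # Accept the best match only if roughly 70-80% of the pixels agree;
    # otherwise the image is distorted or the plate is invalid.
    return best if best_score >= threshold else None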

4.3.1 Algorithm for OCR Reader

OCRReader(image temp)
{
    int mat[30][30]
    for (y = 0; y < temp->height; y++)
    {
        for (x = 0; x < temp->width; x++)
        {
            /* value is a structure */
            value = get RGB values of temp
            /* b, g, r are integers */
            b = value.val[0]
            g = value.val[1]
            r = value.val[2]
            mat[y][x] = g
        }
    }
    stringcopy(file, "folder of 36 files")
    for (int j = 0; j < 36; j++)
    {
        count = 0
        stringcopy(file, "folder of 36 files")
        ext = get file j
        l = length of string(file)
        file[l] = ext
        file[l+1] = '\0'
        file = file + ".jpg"
        lchar = create image of frame of given size
        lchar = load image of name file + ".jpg"
        for (y = 0; y < lchar->height; y++)
        {
            for (x = 0; x < lchar->width; x++)
            {
                value = get RGB values of lchar
                b = value.val[0]
                g = value.val[1]
                r = value.val[2]
                l_mat[y][x] = g
            }
        }
        for (y = 0; y < 30; y++)
        {
            for (x = 0; x < 30; x++)
            {
                if (mat[y][x] == l_mat[y][x])
                    count++
            }
        }
        if (count > 400)
        {
            cout << ext << "in"
        }
    }
}

4.4 Asymptotic Analysis of the Algorithm
The complexity of the above code is O(mn²), where m = 36 (A-Z, 0-9) and n is the pixel resolution; this is the same as the complexity of an OCR reader. But in the traditional system the OCR engine has a database of 2^16 symbols (Unicode), so there m = 2^16. Hence there is a significant reduction in time complexity. Also, since the database holds 36 symbols instead of 2^16, there is a significant reduction in space complexity.

5. CONCLUSION
The message of this research is to show that free and open source technologies are mature enough for scientific computing domains. The system works satisfactorily for wide variations in illumination conditions and the different types of number plates commonly found in India. It is definitely a better alternative to the existing proprietary systems, even though there are known restrictions.

5.1 Future Work
Currently we have proposed the algorithms for our ALPR system. In future we would implement this system on the OpenCV library and would also check the performance of the system designed. We would do the performance analysis in terms of the number of plates successfully recognized. So far the algorithms look good and suitable, but if the OCR algorithm does not work, then we will try to give some new algorithm, or do a comparative study of the different OCR engines present in the market, choose the best among them, and implement the system.

6. ACKNOWLEDGMENTS
Our sincere thanks to Dr Kalpana Yadav, our mentor, for providing her guidance and cooperation in this research.

7. REFERENCES
[1] A. Conci, J. E. R. de Carvalho, T. W. Rauber, "A Complete System for Vehicle Plate Localization, Segmentation and Recognition in Real Life Scene", IEEE Latin America Transactions, Vol. 7, No. 5, September 2009.

[2] Ahmed Gull Liaqat, "Real Time Mobile License Plate Recognition System", IEEE white paper, California, Vol. 2, 2011-12-05, Linnaeus University.

[3] Ondrej Martinsky (2007). "Algorithmic and mathematical principles of automatic number plate recognition systems" (PDF). Brno University of Technology. http://javaanpr.sourceforge.net/anpr.pdf.


[4] P. Kreling, M. Hatsonn, "A License Plate Recognition algorithm for Intelligent Transportation System applications". University of the Aegean and National Technical University of Athens, 2006. Archived from the original on 2008-04-20.

[5] K. M. Sajjad, "ALPR Using Python and OpenCV", Dept. of CSE, M.E.S. College of Engineering, Kuttipuram, Kerala, 2008-06-21.

[6] Nicole Ketelaars, "Final Project: ALPR", 2007-12-11.

[7] Steven Zhiying Zhou, Syed Omer Gilani and Stefan Winkler, "Open Source framework Using Mobile Devices", Interactive Multimedia Lab, Department of Electrical and Computer Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 117576.

[8] Yunggang Zhang, Changshui Zhang, "A new Algorithm for Character Segmentation of License Plate", Beijing University, China, 2007-5-8.

International Journal of Computer Applications Technology and Research
Volume 3 – Issue 12, 762 - 768, 2014, ISSN: 2319-8656

Analysis of the Effect of an Educational Package on Promotion of Protective Behaviors in Exposure to the Dust Phenomenon by SPSS Software

Ali Ramezankhani (1), Kobra Doostifar (2)*, Saeed Motesaddi Zarandi (3), Tayebeh Marashi (4), Nezhat Shakeri (5), Maryam Parsanahad (6)

(1) Department of Public Health, Faculty of Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
(2) Department of Public Health, Shushtar Faculty of Medical Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
(3) Department of Environmental Health, Faculty of Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
(4) Department of Public Health, Faculty of Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
(5) Department of Biostatistics, Faculty of Paramedicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
(6) Department of Nutrition, Shushtar Faculty of Medical Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran

*Corresponding Author: Kobra Doostifar, Department of Public Health, Shushtar Faculty of Medical Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran.

ABSTRACT
Background: The dust phenomenon, especially in the summer, is a serious problem in Khuzestan province and has adverse effects on health, the environment and the economy. Behavior change is the basis of prevention of health-associated risks, and one of the models for behavior change at the individual level is the Health Belief Model. The aim of this study was to analyze the effect of an educational package on promotion of protective behaviors in exposure to the dust phenomenon in Ahvaz teachers by SPSS software.
Methods: This was an experimental study in which 200 teachers were randomly divided into two groups, case and control [n=100 in each group]. The reliability of the questionnaire was confirmed by Cronbach's alpha test. Before the educational intervention, the questionnaire was completed by the two groups, the educational requirements of the subjects were detected, and an educational package was designed and implemented for 4 weeks. The control group received no intervention. After a month the effect of the educational package on the study variables was evaluated. Data were analyzed with SPSS statistical software version 17, by descriptive and analytical tests.
Result: The mean ages of the case and control groups were 39.75±6.95 and 39.78±7.02 years, respectively. There was no significant association between marriage and behavior, but there was a significant association between number of years of employment and behavior [p=0.03], and between education and behavior [p=0.03]. Based on the findings of this study there was a significant association between the knowledge, health belief model components and behavior of the study subjects, before and after the intervention [p<0.001].
Conclusion: Design and implementation of an educational package based on the health belief model can effectively promote knowledge and protective behaviors in exposure to dust particles.

Keywords: education, educational package, protective behaviors, dust phenomenon

1. INTRODUCTION
Scientific research in the past two decades has shown that particles are one of the specific pollutants [1]. Results of a study by the World Health Organization in Berlin, Copenhagen and Rome showed that particles smaller than 2.5 microns in diameter seriously affect health and increase deaths due to respiratory disease, cardiovascular disease and lung cancer [2]. According to a World Health Organization report, over 2 million people suffer premature death every year [3]. The most important effects of dust are allergy of the eyes, nose and throat, respiratory tract infections, headache, nausea, allergic reactions, chronic respiratory disease, lung cancer, heart disease and damage to other organs of the body. In the long term, dust can change the mood; aggression and depression are also effects of dust [4]. Annette Peters showed an association between heart disease and air particles based on epidemiological evidence. This study showed that daily changes in air particle concentration are closely associated with cardiovascular-related deaths, hospital admissions, and cardiovascular disease symptom exacerbation [5]. Chinese researchers in 2007 investigated the impact of particles smaller than 2.5 microns, in samples collected from Asian dust, on macrophages and lung cell DNA of mice.


The results showed that the extract of these particles led to DNA damage in these cells [6]. In a study of students from 850 schools in the United Arab Emirates, the prevalence of asthma and allergy was 13.6% and 73%, respectively, and there was a significant association between dust and the mentioned diseases [7]. Some recent epidemiological studies suggested that long-term transport of dust particles is associated with increased daily mortality in Seoul [8] and in Taipei, Taiwan [9], and with respiratory and cardiovascular diseases [8]. Dust particles also contribute to drinking water contamination and therefore gastrointestinal disease [10].

Dust occurrence increases the concentrations of some heavy metals, such as lead, up to 3-fold [11]. The concentrations of the toxic metals mercury and arsenic also greatly increase [12]. Airborne microorganism concentrations increase on dusty days, and most of these microorganisms are pathogens and cause disease [13].

Iran is located in an area with a dry climate, and over 30% of the country is arid and semi-arid [14]. In the past few years the country has been exposed to the dust phenomenon. Because it neighbors a wide expanse of desert, the country is adversely affected by this phenomenon. One of the affected areas is Khuzestan province, located in the southwest of Iran [15]. This phenomenon has been associated with various problems and a rise in adverse effects on health, the environment and the economy [15]. On dusty days, admissions of patients with pulmonary disease to health centers in Ahvaz have shown a 70 percent increase [1]. One way to reduce the incidence of diseases caused by dust is educational intervention. Health education experts use appropriate models to design health education interventions; one of these models is the Health Belief Model. The aim of this study was the design and implementation of an educational package based on the health belief model and the evaluation of its effects on protective behaviors in teachers, using SPSS software. In this study the educational package was an educational program designed based on the educational needs of the subjects, in order to prepare them for the implementation of protective behaviors in exposure to the dust phenomenon.

2. METHODS
This was an experimental and analytical [before and after] study conducted in Ahvaz. Two hundred teachers were randomly divided into two groups, case and control [n=100 in each group]. The inclusion criteria were: employment for at least three years, lack of respiratory and cardiovascular disease, and consent to participate in the study. The exclusion criteria were: refusal to participate in the study and nonparticipation in the educational sessions. Data were collected by a questionnaire designed according to the health belief model constructs. The questionnaire contained 78 questions in four parts. These parts included questions regarding individual characteristics [19 questions], knowledge [14 questions, score range = 14-32], health belief model constructs [34 questions] and protective behaviors in exposure to the dust phenomenon [11 questions, score range = 11-55], respectively. In part 3, the questions covered: perceived susceptibility, perceived severity and perceived benefits, each with 7 questions [score range = 7-35]; perceived barriers, with 6 questions [score range = 6-36]; cue to action, with 2 questions [score range = 2-10]; and self-efficacy, with 4 questions [score range = 4-20].

The validity of the questionnaire was evaluated by means of face validity and content validity methods. Face validity was evaluated by means of the relevance, simplicity and clarity of the questions. The questionnaire was evaluated by 10 experts [including 5 experts in health education, 5 experts in environmental health, 1 epidemiologist and 2 experts in biostatistics]. Questionnaire reliability was evaluated in a group of subjects who were the same as the study population in demographic characteristics, and the following Cronbach's alpha coefficients were detected: knowledge: 0.76, perceived susceptibility: 0.73, perceived severity: 0.88, perceived benefits: 0.72, perceived barriers: 0.77, cue to action: 0.71, perceived self-efficacy: 0.71 and protective behaviors: 0.71.
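Cronbach's alpha, reported above for each scale, can be computed directly from an items matrix; a minimal NumPy sketch with hypothetical responses (the function is ours, not part of the study's SPSS workflow):

import numpy as np

def cronbach_alpha(items):
    # items: subjects x questions matrix of item scores.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses of five subjects to the four self-efficacy items.
scores = [[4, 3, 4, 5], [2, 2, 3, 2], [5, 4, 4, 4], [3, 3, 2, 3], [4, 5, 4, 4]]
print(round(cronbach_alpha(scores), 2))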
The questionnaire was used before and after the educational package implementation to determine the knowledge, perceived susceptibility, severity, benefits, barriers, self-efficacy and behavior of the subjects. Data were collected by questionnaire, in interviews, before the intervention in the case and control groups. Then the data were analyzed, the educational needs of the subjects were detected, and the educational package was designed. The educational package included an educational booklet, a pamphlet and a CD that presented essential information in relation to dust particles, disease prevention and protective behaviors. The researcher then presented the educational package to the case group in four sessions [each session lasted 90 minutes]. The educational methods were lectures, questioning and responding, and showing a video clip. Immediately and two months after the educational intervention, subject data were collected by questionnaire and analyzed. The control group received no intervention. Data were analyzed with SPSS statistical software version 17, using frequency distribution, correlation coefficient, Student's t, Chi-square, Mann-Whitney and repeated measures tests.
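The comparisons reported below were run in SPSS 17. As a rough Python illustration of the same two-group tests, a minimal sketch (the score arrays are hypothetical; SciPy is our assumption, not a tool used by the authors):

import numpy as np
from scipy import stats

# Hypothetical post-intervention knowledge scores, for illustration only.
case = np.array([58.1, 57.4, 59.2, 58.8, 56.9, 58.3])
control = np.array([53.2, 52.8, 54.0, 53.6, 53.1, 52.5])

# Independent-samples t-test (case vs. control).
t, p = stats.ttest_ind(case, control)

# Mann-Whitney U test, the non-parametric alternative.
u, p_mw = stats.mannwhitneyu(case, control, alternative="two-sided")

print(f"t = {t:.2f} (p = {p:.4f}); U = {u:.1f} (p = {p_mw:.4f})")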
3. RESULTS
Two hundred teachers participated in this study. The mean ages of the case and control groups were 39.75±6.95 and 39.78±7.02 years, respectively. The 40-49 year age group had the highest frequency in the case group [46%] and the control group [45%]. In both groups most subjects were married [82% in the case group and 81% in the control group], and most subjects had a Bachelor's degree. Most subjects had two children [47% in the case group and 46.3% in the control group], and the fewest subjects had four children. Most subjects had no previous education about the dust phenomenon and protective behaviors.


Age, marriage, education, number of children and previous education about the dust phenomenon were not significantly different between cases and controls.

There was no significant association between marriage and behavior, but there was a significant association between number of years of employment and behavior (p=0.03), and also between education and behavior (p=0.03). In the two groups the most used sources of information about protective behaviors in exposure to dust particles were radio and television, and there was no significant difference between the two groups (Table 1).

Table 1. Information sources regarding the dust phenomenon in teachers, Ahvaz

Information source                     Cases (yes / no)   Controls (yes / no)   p-value
Radio & television                     93 / 7             95 / 5                0.552
Newspaper & magazine                   43 / 57            40 / 60               0.667
Family                                 58 / 42            58 / 42               1
Coworkers                              57 / 43            58 / 42               0.886
Friends                                55 / 45            57 / 43               0.776
Book & booklet                         32 / 68            31 / 69               0.115
Physician and staff of health center   36 / 64            35 / 65               0.077
Internet                               47 / 53            45 / 55               0.777

The mean knowledge, perceived susceptibility, perceived severity, perceived benefits, perceived barriers, perceived self-efficacy, cue to action and behavior scores were not significantly different between cases and controls before the intervention, whereas immediately and two months after the educational intervention there was a significant difference between cases and controls in the mentioned variables [p=0.001] [Tables 2, 3]. Before the intervention, 16% of cases stayed at home on dusty days, but after the intervention 57% of cases did so. Before the intervention, 70% of cases sometimes educated their students in relation to air pollution, but after the intervention 75% of cases often educated their students. Before the intervention, only 2% of cases ate a greater amount of fruit and vegetables on dusty days, but after the intervention this rate increased to 41%. Before the intervention, only 3% of cases drank a greater amount of milk on dusty days.

4. DISCUSSION
One of the most important air pollutants is dust particles, and high concentrations of particles in dust storms cause sinusitis, bronchitis, asthma, allergy and damage to the defensive function of macrophages, thereby leading to an increase in hospital infections [19]. The purpose of the present study was the implementation of protective behaviors when the dust phenomenon occurs. To the best of our knowledge, the effect of this educational method on protective behaviors in exposure to dust particles has not been investigated in previous studies.

Before the intervention, protective behaviors of teachers in exposure to the dust phenomenon were at an intermediate level, but the significant difference between the behavior scores of cases and controls after the intervention showed the positive effect of the educational package on the promotion of protective behaviors in the case group. In the Araban et al. study, the behavior score after the intervention was significantly different between the case and control groups [20]. The results of the Giles et al. meeting in Canada on strategies for reducing the adverse effects of air pollution on health, entitled "The decision to effective intervention", showed that personal behavior modification and reduction of pollutant exposure are appropriate approaches for reducing the adverse effects of air pollution [21]. Sexton's study showed that on dusty days persons changed their behavior, reducing time spent outdoors by 18%, or 21 minutes [22].

In the present study, the most used sources of information about protective behaviors in exposure to dust particles were radio, television and family. The significant difference between the knowledge scores of the two groups after the educational intervention was due to the educational sessions about protective behaviors in exposure to the dust phenomenon, and these sessions promoted the knowledge of the case group about protective behaviors. These results are in line with the use of the Health Belief Model in research about diabetes control and self-care and the promotion of knowledge after educational intervention [23, 24]. Boonkuson et al. showed that protective behaviors in exposure to health problems depend on knowledge and attitude [25]. Pazira et al. reported that the knowledge of a part of the Tehran population about air pollution and protective behaviors was at a low level [26].

Among the health belief model constructs, the perceived susceptibility score before the intervention was the same in both groups. After the intervention, the perceived susceptibility score was significantly different between the case and control groups [p=0.001]. This finding is consistent with the increased perceived susceptibility found in research about osteoporosis prevention [27] and diet care [24].

Also, the perceived severity scores before the intervention in the two groups showed that the subjects' perceived severity of illnesses caused by dust particles was above average, probably due to the illness of friends or coworkers or damage caused by dust particles. The dramatic increase in the perceived severity score of the case group seems to be due to the educational sessions

and the provision of the educational package, including showing the video clip, booklet and pamphlet, mention of the importance of protective behaviors on dusty days, and the high cost of pulmonary, cardiovascular and gastrointestinal tract diseases. In other studies, perceived severity increased similarly [23, 27]. Also, in the Praphant et al. study, perceived severity was at a moderate level [60.6%] [28].

Table 2. Comparing knowledge and behavior scores regarding protective behaviors in exposure to the dust phenomenon in teachers, Ahvaz

Variable    Group                       Before intervention   Immediately after      2 months after         Repeated measures
                                        (mean±SD)             intervention (mean±SD) intervention (mean±SD) test
Knowledge   Case                        53.81±3.43            58.77±1.44             58.13±2.54             p<0.001
            Control                     53.65±3.5             53.15±2.66             53.48±3.64             p=0.2
            Independent sample t-test   p=0.745               p<0.001                p<0.001
Behavior    Case                        33±4.14               37.81±3.77             38.98±2.97             p<0.001
            Control                     34.13±4.7             34.99±4.08             34.22±4.66             p=0.176
            Independent sample t-test   p=0.073               p<0.001                p<0.001

Table 3. Comparing health belief model construct scores regarding protective behaviors in exposure to the dust phenomenon in teachers, Ahvaz

Variable                   Group                       Before intervention   Immediately after      2 months after         Repeated measures
                                                       (mean±SD)             intervention (mean±SD) intervention (mean±SD) test
Perceived susceptibility   Case                        27.02±2.58            29.6±2.29              29.6±2.28              p<0.001
                           Control                     27.38±2.74            27.35±2.74             27.41±2.78             p=0.988
                           Independent sample t-test   p=0.341               p<0.001                p<0.001
Perceived severity         Case                        28.75±1.97            31.7±2.43              31.46±2.23             p<0.001
                           Control                     29.03±2.4             28.84±2.47             29.04±2.44             p=0.792
                           Independent sample t-test   p=0.369               p<0.001                p<0.001
Perceived benefits         Case                        28.09±2.87            30.54±3.02             30.39±2.95             p<0.001
                           Control                     28.75±2.91            28.71±2.94             28.66±2.89             p=0.978
                           Independent sample t-test   p=0.109               p<0.001                p<0.001
Perceived barriers         Case                        16±4.51               18.12±4.73             18.19±4.51             p=0.002
                           Control                     16.37±4.62            16.36±4.57             16.27±4.55             p=0.943
                           Independent sample t-test   p=0.557               p=0.011                p=0.006
Perceived self-efficacy    Case                        12.51±2.15            15.19±2.2              15.07±2.27             p<0.001
                           Control                     13.07±2.49            13.02±2.57             12.98±2.52             p=0.97
                           Independent sample t-test   p=0.091               p<0.001                p<0.001
Cue to action              Case                        7±1.22                7.4±1.1                7.45±1.11              p=0.007
                           Control                     7.06±1.48             7.06±1.48              7.04±1.32              p=0.13
                           Independent sample t-test   p=0.755               p=0.041                p=0.049

The teachers' perceived benefits of protective behaviors on dusty days in both groups were in good condition before the intervention, but after the intervention the perceived benefits score increased in the case group. Because the protective behaviors in exposure to the dust phenomenon are not time consuming and do not impose expenses such as a physician's visit, this can be useful in the promotion of perceived benefits. Araban et al. showed that perceived benefits increased with improvement in stage of change [20]. Qaderi et al. reported that the perceived benefits score increased in the case group after the intervention [29].

Perceived barriers to protective behaviors in both the case and control groups were moderate before the intervention, but there were significant differences between the perceived barriers of the two groups after the intervention, due to the effect of the education. Most of the teachers' perceived barriers to protective behaviors in exposure to the dust phenomenon included the unavailability of respiratory masks; discomfort, shortness of breath and nose sweating because of the mask; financial difficulties in buying more fruit and vegetables on dusty days; financial difficulties due to staying at home; and problems related to communication with coworkers. In the Araban et al. study, two barriers, delay in doing things and the need to enter crowded areas of the city, changed after education in the intervention group, while the change in these barriers in the control group was not significant [20]. Koch showed that elimination of perceived barriers increased walking in diabetic patients [30].

Another Health Belief Model construct was perceived self-efficacy. Self-efficacy is a person's belief about their ability to control events that affect their life [31]. The teachers' self-efficacy score was at a low level in the case and control groups before the intervention. The teachers' self-efficacy score was significantly different between cases and controls after the intervention; this was due to the effect of education on self-efficacy and the promotion of protective behaviors in the case group. Araban et al. reported that self-efficacy was significantly higher in the case group after the intervention [20].

Education based on the Health Belief Model promoted the teachers' protective behaviors in exposure to the dust phenomenon by promoting perceived susceptibility, severity, benefits and barriers, using a variety of educational methods and the educational package. On the other hand, stimuli or cues to action encouraged teachers toward the protective behaviors. The present study also showed that the media played an important role in attracting teachers to protective behaviors. Drakshyani et al., in their study of school and college teachers in India, showed the necessity of public health education programs through the mass media [32]. Khorsandi et al. reported that radio and television programs are the most important cues to action in reducing the risk of osteoporosis [33]. The present study designed an educational package in order to promote teachers' behaviors, but similar research should be conducted in other parts of the country.


5. CONCLUSION
The findings of this study showed that the designed educational package was effective in promoting knowledge and protective behaviors in teachers. Therefore, health behavior education in other people, especially in high-risk groups, is important for the maintenance of protective behaviors in exposure to the dust phenomenon.

6. ACKNOWLEDGEMENT
The source of the data used in this paper was an MSc thesis. The authors express sincere thanks to the teachers for their participation and cooperation in the study.

7. REFERENCES
1. Colls J. Air pollution. 2nd ed. Taylor & Francis, Inc., London and New York. 2003. p. 4.
2. World Health Organization. Particulate matter air pollution: how it harms health. 2005 April. Fact sheet EURO/04/05. Available from: http://www.euro.who.int/document/mediacentre/fs0405e.pdf. Accessed July 16, 2013.
3. Department of Public Health and Environment. World Health Organization, Geneva, Switzerland. Urban outdoor air pollution database. 2012. Available from: http://www.who.int/phe. Accessed Jul 30, 2012.
4. Griffin DW, Kellogg CA. Dust storms and their impact on ocean and human health: dust in Earth's atmosphere. EcoHealth. 2004;1[3]:284-95.
5. Peters A. Particulate matter and heart disease: evidence from epidemiological studies. Toxicology and Applied Pharmacology. 2005;207[2]:477-82.
6. Meng Z, Zhang Q. Damage effects of dust storm PM2.5 on DNA in alveolar macrophages and lung cells of rats. Food and Chemical Toxicology. 2007;45[8]:1368-74.
7. Bener A, Abdulrazzaq Y, Al-Mutawwa J, Debuse P. Genetic and environmental factors associated with asthma. Human Biology. 1996:405-14.
8. Kwon H-J, Cho S-H, Chun Y, Lagarde F, Pershagen G. Effects of the Asian dust events on daily mortality in Seoul, Korea. Environmental Research. 2002;90[1]:1-5.
9. Ichinose T, Yoshida S, Hiyoshi K, Sadakane K, Takano H, Nishikawa M, et al. The effects of microbial materials adhered to Asian sand dust on allergic lung inflammation. Archives of Environmental Contamination and Toxicology. 2008;55[3]:348-57.
10. Kellogg CA, Griffin DW, Garrison VH, Peak KK, Royall N, Smith RR, et al. Characterization of aerosolized bacteria and fungi from desert dust events in Mali, West Africa. Aerobiologia. 2004;20[2]:99-110.
11. Viana M, Kuhlbusch T, Querol X, Alastuey A, Harrison R, Hopke P, et al. Source apportionment of particulate matter in Europe: a review of methods and results. Journal of Aerosol Science. 2008;39[10]:827-49.
12. Wang Y, Zhang X, Arimoto R, Cao J, Shen Z. Characteristics of carbonate content and carbon and oxygen isotopic composition of northern China soil and dust aerosol and its application to tracing dust sources. Atmospheric Environment. 2005;39[14]:2631-42.
13. Schlesinger P, Mamane Y, Grishkan I. Transport of microorganisms to Israel during Saharan dust events. Aerobiologia. 2006;22[4]:259-73.
14. Modarres R. Regional maximum wind speed frequency analysis for the arid and semi-arid regions of Iran. Journal of Arid Environments. 2008;72[7]:1329-42. [In Persian]
15. Zarasvandi A, Moore F, Nazarpour A. Mineralogy and morphology of dust storm particles in Khuzestan province: XRD and SEM analysis. Iranian Journal of Crystallography and Mineralogy. 2011;19[3]:511-8. [In Persian]
16. Hossein Gholizadeh N. The effect of intervention based on HBM on improving of knowledge, attitude and practice among students in Tehran [dissertation]. School of Public Health: Tehran University of Medical Sciences, 2010. [In Persian]
17. Mirzaei E. Health education and health promotion in textbook of public health. Tehran: Rakhshan. 2004. [In Persian]
18. Taheri Aziz M. Effectiveness of Designed Health Education Package on Healthy Behaviors of Patients with Tuberculosis at Pasteur Institute of Iran [dissertation]. Tehran: Tarbiat Modares University of Medical Sciences; 2004. p. 67-8. [In Persian]
19. Al-Hurban AE, Al-Ostad AN. Textural characteristics of dust fallout and potential effect on public health in Kuwait City and suburbs. Environmental Earth Sciences. 2010;60[1]:169-81.
20. Araban M. Design and Evaluation of a Theory-Based Educational Intervention on Behavioral Improvement in Pregnant Women in Terms of Exposure to Air Pollution [dissertation]. Tehran: Tarbiat Modares University, Faculty of Medical Sciences; 2013. [In Persian]
21. Giles LV, Barn P, Kunzli N, Romieu I, Mittleman MA, van Eeden S, et al. From good intentions to proven interventions: effectiveness of actions to reduce the health impacts of air pollution. Environmental Health Perspectives. 2011;119[1]:29.
22. Sexton AL. Responses to Air Quality Alerts: Do Americans Spend Less Time Outdoors? [dissertation]. Minnesota: Department of Applied Economics, University of Minnesota; 2011.
23. Mohebi S, Sharifirad G, Hazaveyee S. The effect of educational program based on Health Belief Model on diabetic foot care. Int J Diab Dev Ctries. 2007;27:18-23. [In Persian]


24. Kamrani A. The effect of educational diet on nutrition of type 2 diabetes based on Health Belief Model [dissertation]. Faculty of Public Health, Isfahan University of Medical Sciences, 2006. [In Persian]
25. Boonkuson T. Comparisons of behavior on protection of health problems caused by rock dust of the population with difference on personal factors and social and economic factors in the rock crusher plants, Saraburi province [dissertation]. Project joint research of nursing college attached to institute of development of public health personnel, 1994.
26. Pazira M, Ghanbari R, Askari E. Survey of knowledge, attitude and practice about air pollution among people living in Tehran and some activity of emergency. Conference on air pollution and effects on health. 2005 Feb: 1-2; Tehran, Iran.
27. Saeedi M. The survey of educational program based on health belief model on preventive osteoporosis [dissertation]. School of Public Health: Isfahan University of Medical Sciences, 2005. [In Persian]
28. Praphant A. Preventive behaviors from dust among workers in lime factories and stone crushing mills, Nakhon Si Thammarat province [dissertation]. College of Public Health: Chulalongkorn University. 2003.
29. Amal KA, Dalal MAR, Ibrahim KL. Effect of educational film on the health belief model and self-examination practice. East Mediterr Health J. 1997;3[3]:435-44.
30. Koch J. The role of exercise in the African-American woman with type 2 diabetes mellitus: application of the health belief model. J Am Acad Nurse Pract. 2002;14[3]:126-9.
31. Kazdin AE. Encyclopedia of Psychology. New York: Oxford University Press; 2000. p. 1-3.
32. Drakshyani Devi K, Venkata Ramaiah P. Teacher's knowledge and practice of breast self examination. Indian J Med Sci. 1994;48[12]:84-7.
33. Khorsandi M, Shamsi M, Jahani F. The Survey of Practice About Prevention of Osteoporosis Based on Health Belief Model in Pregnant Women in Arak City. Journal of Rafsanjan University of Medical Sciences. 2013;12[1]:35-46.

International Journal of Computer Applications Technology and Research
Volume 3 – Issue 12, 769 - 773, 2014, ISSN: 2319-8656

Illumination Invariant Face Recognition System using Local Directional Pattern and Principal Component Analysis

Latha B, Department of Computer Science and Engineering, Roever College of Engineering and Technology, Perambalur, Tamilnadu, India – 621 220.
Dr. Punidha R, Department of Computer Science and Engineering, Roever College of Engineering and Technology, Perambalur, Tamilnadu, India – 621 220.

Abstract: In this paper, we propose an illumination-robust face recognition system using local directional pattern images. Local pattern descriptors, including the local binary pattern and the local directional pattern, have been used in the fields of face recognition and facial expression recognition, since they have the important properties of robustness against illumination changes and computational simplicity. Thus, this paper presents a face recognition approach that employs the local directional pattern descriptor and two-dimensional principal component analysis algorithms to achieve enhanced recognition accuracy. In particular, we propose a novel methodology that utilizes the transformed image obtained from the local directional pattern descriptor as the direct input image of the two-dimensional principal component analysis algorithm, unlike most previous works, which employed local pattern descriptors to acquire histogram features. The performance evaluation of the proposed system was performed against well-known approaches such as principal component analysis and Gabor wavelets based on the local binary pattern, using publicly available databases including the Yale B database and the CMU-PIE database.

Keywords: Face Recognition; Local Directional Pattern; Principal Component Analysis.

1. INTRODUCTION
Face recognition has become one of the most popular research areas in the fields of image processing, pattern recognition, computer vision, and machine learning, because it spans numerous applications [1, 2], such as biometric systems, access control systems, surveillance systems, security systems, credit-card verification systems, and content-based video retrieval systems. Up to now, the main algorithms applied to describe faces have been principal component analysis (PCA) [3], linear discriminant analysis (LDA) [4], independent component analysis (ICA) [5], and so on. Generally, face recognition systems can achieve good performance under controlled environments. However, face recognition systems tend to suffer when variations in different factors, such as illumination, pose, expression and occlusion, are present. In particular, illumination variation on face images drastically degrades recognition accuracy. To overcome the problems caused by illumination variation, various approaches have been introduced, such as preprocessing and illumination normalization techniques [6], illumination invariant feature extraction techniques [7], and 3D face modeling techniques [8]. Among the abovementioned approaches, the local binary pattern (LBP) [9] has received increasing interest for face representation in general [10]. The LBP is a non-parametric kernel which summarizes the local spatial structure of an image. Moreover, it has the important properties of tolerance against monotonic illumination changes and computational simplicity. More recently, the local directional pattern (LDP) method was introduced by Jabid et al. for a more robust facial representation [11]. Because LBP is sensitive to non-monotonic illumination variation and also shows poor performance in the presence of random noise, they proposed the LDP descriptor as a face representation and demonstrated better performance compared to LBP.

In this paper, we present a novel approach for achieving illumination invariant face recognition via the LDP image. Most previous face recognition research based on LBP utilized the descriptor for histogram feature extraction from the face image. Similar to LBP, the LDP descriptor has also been utilized to extract histogram facial features in previous research [11]. However, this paper uses the LDP image as the direct input image of the 2D-PCA algorithm for an illumination-robust face recognition system. The proposed approach has the advantage that the illumination effects can be degraded by using the binary pattern descriptor, and 2D-PCA is more robust against illumination variation than global features such as PCA and LDA, since 2D-PCA is a line-based local feature. The performance evaluation of the proposed system was carried out using the Yale B database [12] and the CMU-PIE illumination/light database [13]. Consequently, we demonstrate the effectiveness of the proposed approach by comparing our experimental results to those obtained with other approaches.

2. PROPOSED APPROACH
This paper aims to improve face recognition accuracy under illumination-variant environments by using the LDP image and the 2D-PCA algorithm. The LDP image is derived from the edge response values in eight different directions. Next, the LDP image is directly input into the 2D-PCA algorithm, and a nearest neighbor classifier is applied to recognize the unknown user. Remark that the proposed face recognition system is a very different approach when compared to previous works, because


most previous works used local pattern descriptors to extract histogram features. However, we utilize the transformed image from the local pattern descriptor, i.e. the LDP image, as the input image for the further feature extraction procedure, i.e. the 2D-PCA algorithm. The advantage of the proposed approach is that the illumination effects on the face can be degraded by using the binary pattern descriptor, and also 2D-PCA is more robust against illumination variation than global features such as PCA and LDA, since 2D-PCA is a line-based local feature. In fact, we will show that the recognition accuracy of the proposed system outperforms that of conventional approaches in the experimental results.

2.1 Local Directional Pattern
The LBP operator labels the pixels of an image by thresholding a (3x3) neighborhood of each pixel with the center value and considering the result as a binary number, of which the corresponding decimal number is used for labeling. The derived binary numbers are called local binary patterns or LBP codes. While the LBP operator uses the information of intensity changes around pixels, the LDP operator uses the edge response values of neighborhood pixels and encodes the image texture. The LDP is computed as follows. The LDP assigns an 8-bit binary code to each pixel of an input image. This pattern is calculated by comparing the relative edge response values of a pixel, computed using the Kirsch edge detector. Given a central pixel in the image, the eight directional edge response values m_i (i = 0, 1, ..., 7) are computed by Kirsch masks as shown in Figure 1. Since the presence of a corner or an edge shows high response values in some particular directions, the k most prominent directions with high response values are selected to generate the LDP code. In other words, the top-k directional bit responses b_i are set to 1, and the remaining (8 - k) bits are set to 0. Finally, the LDP code is derived by

LDP_k = \sum_{i=0}^{7} b_i(m_i - m_k) \cdot 2^i, \quad b_i(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}    (1)

where m_k denotes the k-th most significant directional response.
center value and considering the results as a binary number, of To overcome these problems, a new technique called 2D-
which the corresponding decimal number is used for labeling. PCA was proposed, which directly computes eigenvectors of
The derived binary numbers are called local binary patterns or the so-called image covariance matrix without matrix-to-
LBP codes. While the LBP operator uses the information of vector conversion. Because the size of the image covariance
intensity changes around pixels, LDP operator use the edge matrix is equal to the width of images, which is quite small
response values of neighborhood pixels and encode the image compared with the size of a covariance matrix in PCA, 2D-
texture. The LDP is computed as follow. The LDP assigns an 8 PCA evaluates the image covariance matrix more accurately
bit binary code to each pixel of an input image. This pattern is and computes the corresponding eigenvectors more efficiently
then calculated by comparing the relative edge response values than PCA. It was reported that the recognition accuracy of
of a pixel by using Kirsch edge detector. Given a central pixel 2D-PCA on several face databases was higher than that of
in the image, the eight-directional edge response values mi (i = PCA, and the feature extraction method of 2D-PCA is
0, 1, ….,7) are computed by Kirsch masks as shown in Figure computationally more efficient than PCA. Unlike PCA, which
1. Since the presence of a corner or an edge shows high treats 2D images as 1D image vectors, 2D-PCA views an
response values in some particular directions, thus, most image as a matrix. Consider an m by n image matrix A.
prominent directions of k number with high response values ( n d )

are selected to generate the LDP code. In other words, top-k Let  R be a matrix with orthonormal columns, n  d.
directional bit responses, bi , are set to 1, and the remaining (8 Projecting A onto X yields m by d matrix Y= AX. In 2D-
- k) bits are set to 0. Finally, the LDP code is derived by PCA, the total scatter of the projected samples is used to
determine a good projection matrix X . Suppose that there are
M training face images, denoted m by n matrices Ak (k =1,
7
1 , x  0 A  1/ M  Ak
 b (m i  m k )  2 , bi ( x )  
i
LDP k i
x  0, 2, …,M), and the average image is denoted as k
.
i0 0, (1) Then, the image covariance matrix, G is given by

where mk is the kth most significant directional response. Figure M T

2 shows an example of LDP code with k-3. 1


G 
M
 ( Ak  A ) ( Ak  A )
k 1 (2)

m3 m2 m1 b3 b2 b1
It has been proven that the optimal value for the projection
matrix X opt is composed by the orthonormal eigenvectors
m4 X m0 b4 X b0 X1, X2 , …, Xd of G corresponding to the d largest
eigenvalues, i.e., Xopt = [ X1, X2 ,…, Xd]. Since the size of
m5 m6 m7 b5 b6 b7 G is only n by n, computing its eigenvectors is very efficient.
The optimal projection vectors of 2D-PCA, X1, X2,…,Xd are
used for feature extraction. For a given face image A, the
Figure 1. Edge Response and LDP Binary Bit Positions feature vector Y = [Y1, Y2, …,Yd] , in which Y has a
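For illustration, Eq. (1) can be computed in a few lines of NumPy/SciPy. The following is a minimal sketch of our own, not part of the original method description; the function and variable names, and the use of scipy.ndimage.convolve, are our assumptions:

    import numpy as np
    from scipy.ndimage import convolve

    # Eight Kirsch masks m0..m7 (E, NE, N, NW, W, SW, S, SE).
    KIRSCH = [np.array(m) for m in (
        [[-3, -3, 5], [-3, 0, 5], [-3, -3, 5]],    # m0
        [[-3, 5, 5], [-3, 0, 5], [-3, -3, -3]],    # m1
        [[5, 5, 5], [-3, 0, -3], [-3, -3, -3]],    # m2
        [[5, 5, -3], [5, 0, -3], [-3, -3, -3]],    # m3
        [[5, -3, -3], [5, 0, -3], [5, -3, -3]],    # m4
        [[-3, -3, -3], [5, 0, -3], [5, 5, -3]],    # m5
        [[-3, -3, -3], [-3, 0, -3], [5, 5, 5]],    # m6
        [[-3, -3, -3], [-3, 0, 5], [-3, 5, 5]],    # m7
    )]

    def ldp_image(gray, k=3):
        """Return the LDP-coded image of a 2-D grayscale array (Eq. (1))."""
        g = gray.astype(np.float64)
        # Stack of the eight directional edge responses, shape (8, H, W).
        m = np.stack([convolve(g, mask) for mask in KIRSCH])
        # k-th most significant response per pixel (3rd largest for k = 3).
        mk = np.sort(m, axis=0)[-k]
        # b_i = 1 where m_i - m_k >= 0, i.e. the top-k directions.
        bits = (m - mk >= 0).astype(np.uint8)
        weights = (2 ** np.arange(8)).reshape(8, 1, 1)
        return (bits * weights).sum(axis=0).astype(np.uint8)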
2.2 2-D Principal Component Analysis
Principal component analysis is a well-known feature extraction and data representation technique widely used in the areas of pattern recognition, computer vision, signal processing, and so on. The central underlying concept is to reduce the dimensionality of a data set while retaining the variations in the data set as much as possible. In the PCA-based face recognition method, 2D face image matrices must first be transformed into 1D image vectors in a column-by-column or row-by-row fashion. However, concatenating 2D matrices into a 1D vector often leads to a high-dimensional vector space, where it is difficult to evaluate the covariance matrix accurately due to its large size. Furthermore, computing the eigenvectors of a large covariance matrix is very time-consuming.

To overcome these problems, a new technique called 2D-PCA was proposed, which directly computes the eigenvectors of the so-called image covariance matrix without matrix-to-vector conversion. Because the size of the image covariance matrix is equal to the width of the images, which is quite small compared with the size of a covariance matrix in PCA, 2D-PCA evaluates the image covariance matrix more accurately and computes the corresponding eigenvectors more efficiently than PCA. It was reported that the recognition accuracy of 2D-PCA on several face databases was higher than that of PCA, and that the feature extraction method of 2D-PCA is computationally more efficient than PCA. Unlike PCA, which treats 2D images as 1D image vectors, 2D-PCA views an image as a matrix. Consider an m by n image matrix A. Let X \in R^{n \times d} be a matrix with orthonormal columns, n \ge d. Projecting A onto X yields the m by d matrix Y = AX. In 2D-PCA, the total scatter of the projected samples is used to determine a good projection matrix X. Suppose that there are M training face images, denoted by m by n matrices A_k (k = 1, 2, ..., M), and that the average image is denoted as \bar{A} = \frac{1}{M}\sum_{k} A_k. Then, the image covariance matrix G is given by

    G = \frac{1}{M} \sum_{k=1}^{M} (A_k - \bar{A})^T (A_k - \bar{A})    (2)

It has been proven that the optimal value for the projection matrix X_opt is composed of the orthonormal eigenvectors X_1, X_2, ..., X_d of G corresponding to the d largest eigenvalues, i.e., X_opt = [X_1, X_2, ..., X_d]. Since the size of G is only n by n, computing its eigenvectors is very efficient. The optimal projection vectors of 2D-PCA, X_1, X_2, ..., X_d, are used for feature extraction. For a given face image A, the feature vector Y = [Y_1, Y_2, ..., Y_d], in which Y has a dimension of m by d, is obtained by projecting the image onto the eigenvectors as follows:

    Y_k = (A - \bar{A}) X_k, \qquad k = 1, 2, \ldots, d    (3)

After feature extraction by 2D-PCA, the Euclidean distance is used to measure the similarity between the training and test features. Suppose that each training image A_k is projected onto X_opt to obtain the respective 2D-PCA feature F^k. Also, let A be a given image for testing and its 2D-PCA feature be F. Then, the Euclidean distance between F and F^k is computed by


    d(F, F^k) = \sum_{i=1}^{m} \sum_{j=1}^{d} \left( f_{i,j} - f_{i,j}^k \right)^2    (4)

where k = 1, 2, ..., M, and M is the total number of training images. This distance measurement between 2D-PCA features is further employed to classify an unknown user.
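A minimal NumPy sketch of the 2D-PCA pipeline of Eqs. (2)-(4), written by us for illustration (all names are hypothetical; in the proposed system the input images would be the LDP images of Section 2.1 rather than raw grayscale faces):

    import numpy as np

    def fit_2dpca(train_images, d):
        """train_images: array of shape (M, m, n). Returns (mean image, X_opt)."""
        A_bar = train_images.mean(axis=0)                   # average image
        diffs = train_images - A_bar
        # Image covariance matrix G (Eq. (2)), size n x n.
        G = sum(D.T @ D for D in diffs) / len(train_images)
        eigvals, eigvecs = np.linalg.eigh(G)                # ascending eigenvalues
        return A_bar, eigvecs[:, ::-1][:, :d]               # top-d eigenvectors

    def features(A, A_bar, X_opt):
        """2D-PCA feature of one image (Eq. (3)), shape (m, d)."""
        return (A - A_bar) @ X_opt

    def classify(test_image, A_bar, X_opt, train_feats, labels):
        """Nearest-neighbor classification with the distance of Eq. (4)."""
        F = features(test_image, A_bar, X_opt)
        dists = [np.sum((F - Fk) ** 2) for Fk in train_feats]
        return labels[int(np.argmin(dists))]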
3. EXPERIMENTAL RESULTS
To evaluate the robustness of the proposed method, we used images from the Yale B database and the CMU-PIE database. In the Yale B database, we employ 2,414 face images of 38 subjects representing 64 illumination conditions under the frontal pose, in which the subjects comprise 10 individuals from the original Yale face database B and 28 individuals from the extended Yale B database. The CMU-PIE database contains more than 40,000 facial images of 68 individuals, with 21 illumination conditions, 22 light conditions, 13 poses, and four different expressions. Among them, we selected the illumination and light images of the 68 individuals with frontal pose (c27). So, the CMU-PIE illumination set consists of 21 images of 68 individuals (21x68 images in total), and the CMU-PIE light set consists of 22 images of 68 individuals (22x68 images in total). All face images of the two databases were converted to grayscale and were cropped and normalized to a resolution of (48x42) pixels. Figure 3 shows examples of the raw, histogram-equalized, LBP, and LDP images in the CMU-PIE illumination database, respectively. Remark that the LDP images are divided into different groups according to the k number. The performance evaluation was carried out on each of the Yale B database and the CMU-PIE illumination/light database with each type of pre-processed image.

[Figure 3. Input Images for CMU-PIE Illumination Database: example face images for Raw, Histogram, LBP, LDP(K=1), LDP(K=2), LDP(K=3), LDP(K=4), and LDP(K=5).]

3.1 Yale B Database
To evaluate the performance of the proposed method, we partitioned the Yale B database into training and testing sets. Each training set comprised seven images per subject, and the remaining images were used to test the proposed method. We selected the illumination-invariant images for training, and the remaining images with varying illumination were employed for testing. Next, we compared the recognition performance of the proposed approach with conventional recognition algorithms such as PCA and Gabor-wavelets based on LBP. For the Yale B database, the recognition results in terms of the different pre-processed images and algorithms are shown in Figure 4. To further disclose the relationship between the recognition rate and the dimension of the feature vectors, we show the recognition results along different dimensions in Figure 4. Also, we summarize the maximum recognition rates of the various approaches in Table 1. As a result, the proposed approach using LDP and 2D-PCA shows a maximum recognition rate of 96.43% when k is 3. However, the maximum recognition rates revealed 81.34% and 69.50% for the PCA and Gabor-wavelets based on LBP approaches, respectively. Consequently, the recognition accuracy of the proposed method was better than that of the conventional methods, and it also shows a performance improvement ranging from 15.09% to 29.63% in comparison to the conventional methods.


[Figure 4. Recognition Rates of Yale B Database as Feature Dimensions.]

Table 1. Maximum Recognition Rates on Yale B Database.

    Input Images | PCA    | 2D-PCA | Gabor-wavelets based LBP
    Raw          | 30.03% | 30.78% | 57.14%
    Histogram    | 50.61% | 54.09% | 69.50%
    LBP          | 72.09% | 91.54% | X
    LDP (K=1)    | 70.77% | 94.60% | X
    LDP (K=2)    | 77.16% | 95.72% | X
    LDP (K=3)    | 78.85% | 96.43% | X
    LDP (K=4)    | 77.96% | 96.10% | X
    LDP (K=5)    | 81.34% | 95.49% | X

3.2 CMU-PIE Database
For the CMU-PIE illumination/light database, each training set comprised only one image per subject, and the remaining images were used for testing. Similar to the Yale B database, we selected an illumination-invariant image for training, and the remaining illumination-variant images were employed for testing. The recognition results for the CMU-PIE illumination database are shown in Figure 5. For the CMU-PIE illumination database, the recognition results of the various approaches are shown in Table 2. In Table 2, the proposed method shows a maximum recognition rate of 100.0% when k is 2, 3, 4, and 5, while the PCA and Gabor-wavelets based on LBP approaches were 99.85% and 82.20%, respectively. As a result, the recognition accuracy of the proposed method showed better performance compared to the other methods, and it provides a performance improvement of 17.80% in comparison to the Gabor-wavelets based on LBP approach. Similar to the results for the CMU-PIE illumination database, the recognition rate of the proposed method on the CMU-PIE light database was 100.0%. Consequently, we confirmed the effectiveness of the proposed method under varying lighting conditions through these experimental results.

[Figure 5. Recognition Rates of CMU-PIE Illumination Database as Feature Dimensions.]

Table 2. Maximum Recognition Rates on CMU-PIE Illumination Database.

    Input Images | PCA    | 2D-PCA | Gabor-wavelets based LBP
    Raw          | 23.38% | 25.22% | 63.23%
    Histogram    | 49.12% | 83.24% | 82.20%
    LBP          | 98.97% | 100.0% | X
    LDP (K=1)    | 84.71% | 99.93% | X
    LDP (K=2)    | 98.09% | 100.0% | X
    LDP (K=3)    | 99.71% | 100.0% | X
    LDP (K=4)    | 99.85% | 100.0% | X
    LDP (K=5)    | 99.49% | 100.0% | X


4. CONCLUSIONS
In this paper, we proposed a novel approach for achieving illumination-invariant face recognition via the LDP image. Especially, we presented a face recognition methodology that utilizes the transformed image obtained from LDP as the direct input image of 2D-PCA, unlike most previous works, which used the local pattern descriptors to acquire histogram features. The proposed method has the advantage that the illumination effects can be degraded by the LDP descriptor, and 2D-PCA is also more robust against illumination variation than global features. The performance evaluation was performed on the Yale B database and the CMU-PIE database, and the proposed method showed the best recognition accuracy compared to the different approaches. Through the experimental results, we confirmed the effectiveness of the proposed method under illumination-varying environments.

5. REFERENCES
[1] S. N. B. Kachare and V. S. Inamdar, Int. J. Comput. Appl., vol. 1, no. 1, 2010.
[2] T. Gong, "High-precision Immune Computation for Secure Face Recognition", International Journal of Security and Its Applications (IJSIA), vol. 6, no. 2, SERSC, pp. 293-298, 2012.
[3] L. R. Rama, G. R. Babu and L. Kishore, "Face Recognition Based on Eigen Features of Multi Scaled Face Components and Artificial Neural Network", International Journal of Security and Its Applications (IJSIA), vol. 5, no. 3, SERSC, pp. 23-44, 2012.
[4] W. Xu and E. J. Lee, "Human Face Recognition Based on Improved D-LDA and Integrated BPNNs Algorithms", International Journal of Security and Its Applications (IJSIA), vol. 6, no. 2, SERSC, pp. 121-126, 2012.
[5] M. S. Bartlett, J. R. Movellan and S. Sejnowski, "Face Recognition by Independent Component Analysis", IEEE T. Neural Networ., vol. 13, no. 6, pp. 1450-1464, 2002.
[6] S. Lawrence, C. L. Giles, A. C. Tsoi and A. D. Back, "Face recognition: A convolutional neural-network approach", IEEE T. Neural Networ., vol. 8, no. 1, pp. 98-113, 1997.
[7] W. Chen, M. J. Er and S. Wu, "Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain", IEEE T. Syst. Man Cy. B., vol. 36, no. 2, pp. 458-466, 2006.
[8] C. Sanderson and K. K. Paliwal, "Fast features for face authentication under illumination direction changes", Pattern Recogn. Lett., vol. 24, no. 14, pp. 2409-2419, 2003.
[9] R. Basri and D. W. Jacobs, "Illumination Modeling for Face Recognition", IEEE T. Pattern Anal., vol. 25, no. 2, pp. 89-111, 2003.
[10] C. Shan, S. Gong and P. W. McOwan, "Facial expression recognition based on Local Binary Patterns: A comprehensive study", Image Vision Comput., vol. 27, no. 6, pp. 803-816, 2009.
[11] T. Jabid, M. H. Kabir, and O. S. Chae, "Robust Facial Expression Recognition Based on Local Directional Pattern", ETRI Journal, vol. 32, no. 5, pp. 784-794, 2010.
[12] A. Georghiades, P. Belhumeur and D. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001.
[13] T. Sim, S. Baker and M. Bsat, "The CMU Pose, Illumination, and Expression Database", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1615-1618, 2003.


Efficient Resource Management Mechanism with Fault Tolerant Model for Computational Grids
R. Kohila
Department of Computer Science and Engineering
V.S.B Engineering College
Tamilnadu, India.

Abstract- Grid computing provides a framework and deployment environment that enables resource
sharing, accessing, aggregation and management. It allows coordinated use of various resources in
dynamic, distributed virtual organizations. The grid scheduler is responsible for resource discovery,
resource selection and job assignment over a decentralized heterogeneous system. In the existing
system, the primary-backup approach is used for fault tolerance in a single environment. In this
approach, each task has a primary copy and a backup copy on two different processors. For dependent
tasks, the precedence constraint among tasks must be considered when scheduling backup copies and
overloading backups. Then, two algorithms have been developed to schedule backups of dependent and
independent tasks. The proposed work is to manage resource failures in grid job scheduling. In this
method, data sources and resources are integrated from different geographical environments. Fault-
tolerant scheduling with the primary-backup approach is used to handle job failures in the grid
environment. The impact of communication protocols is considered. Communication protocols such as
the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are used to distribute
the messages of each task to grid resources.

Key Words: Grid Computing, Primary Backup, Communication Protocols, TCP- Transmission Control Protocol, UDP- User Datagram Protocol.

1. INTRODUCTION

1.1 Grid Computing
Grid Computing is distributed, large-scale cluster computing; it has emerged as the next-generation parallel and distributed computing methodology, which aggregates dispersed heterogeneous resources for solving various kinds of large-scale parallel applications in science, engineering and commerce. It can integrate and utilize heterogeneous computational resources from different networks or regional areas into a high-performance computational platform and can solve complex computing-intensive problems efficiently. Grid service represents the convergence between high-performance computation and web services. Grid aims ultimately to turn the global network of computers into a vast computational resource.

1.2 Grid Computing Overview
A distributed heterogeneous computing system consists of a distributed suite of different high-performance machines, interconnected by high-speed networks, to perform different computationally intensive applications that have various computational requirements. Heterogeneous computing systems range from diverse elements or paradigms within a single computer, to a cluster of different types of personal computers, to coordinated, geographically distributed machines with different architectures. Job scheduling is one of the major difficult tasks in a computational grid.


2. RELATED WORK

2.1 Scheduling
Ian Foster and Carl Kesselman (2004) [3] develop a fault-tolerant job scheduling strategy in order to tolerate faults gracefully in an economy-based grid environment. They propose a novel adaptive task checkpointing-based fault-tolerant job scheduling strategy for an economy-based grid. They present a survey with the grid community. The survey reveals that users have to be highly involved in diagnosing failures, that most failures are due to configuration problems, and that solutions for dealing with failures are mainly application-dependent.

2.2 Heuristic Algorithms
Heuristic algorithms are used for the static and dynamic task assignment problem. Many of these algorithms apply only to the special case where the tasks are independent, i.e., with no precedence constraints. Heuristic scheduling algorithms are used in heterogeneous computing environments. These algorithms use historical data of execution time and system load, and explicit constraints, to schedule jobs.

2.3 Non-Evolutionary Random Scheduling Algorithm
The non-evolutionary random scheduling (RS) algorithm is used for efficient matching and scheduling of inter-dependent tasks in a distributed heterogeneous computing (DHC) system. RS is a succession of randomized task orderings and a heuristic mapping from task order to schedule. Randomized task ordering is effectively a topological sort where the outcome may be any possible task order for which the task precedence constraints are maintained.

2.4 Fault Tolerant Dynamic Scheduling Algorithm
Manimaran and Murthy (1997) [4] proposed an algorithm for dynamically scheduling arriving real-time tasks with resource and primary-backup-based fault-tolerant requirements in a multiprocessor system. This algorithm can tolerate more than one fault at a time and employs techniques such as the distance concept, flexible backup overloading, and resource reclaiming to improve the guarantee ratio of the system.

They address the problem of building a reliable and highly available grid service by replicating the service on two or more hosts using the primary-backup approach. The primary goal is to evaluate the ease and efficiency with which this can be done, by first designing a primary-backup protocol using the Open Grid Services Infrastructure (OGSI).

2.5 Primary-Backup Approach
The primary-backup approach is also called the passive replication strategy. In this approach a backup is executed when its primary cannot complete execution due to a processor failure. It does not require fault diagnosis and is guaranteed to recover all tasks affected by a processor failure. Most works using the primary-backup approach consider scheduling of independent tasks.

2.5.1 Backup Overloading and Overlapping
Backup overloading is used to reduce the replication cost of independent tasks; it allows scheduling backups of multiple primaries on the same or overlapping time intervals on a processor. In backup overlapping, for example, two primary copies are scheduled on processor 1 and processor 3, and their backups are scheduled in an overlapping manner on processor 2.

2.6 Backup Schedules
After the earliest possible start time for a backup on each processor is determined, the time window in which this backup can be scheduled on each processor is determined, which is between this time and its deadline. Primary schedules and non-overloadable backup schedules that fall in the time window can be identified. These backup schedules could be scheduled for independent tasks or dependent tasks, as the interleaving technique is allowed.

3 PROPOSED WORK
The proposed system integrates resources and data sources from different geographical environments. In this system, the location of resources and data sources is identified. There exists a fault-detection mechanism, such as fail-signal and acceptance


test, to detect processor and task failures. If a failure is detected in the primary, the backup will execute. Backup resources are designed with replication factors. The impact of communication protocols is considered. Communication protocols are used to distribute the messages of each task to grid resources.

3.1 Scheduling Strategies
The resources and data sources are managed from different environments. The location of resources and data sources is identified. There exists a fault-detection mechanism, such as fail-signal and acceptance test, to detect processor and task failures. If a failure is detected in the primary, the backup will execute. Backup resources are designed with replication factors. Backup overloading is used for scheduling backups of multiple primaries on the same or overlapping time intervals on a processor. Resource reclaiming is also invoked when the primary completes earlier than its estimated execution time. It is necessary so that the backup slot can be released in a timely manner for new tasks.

The MRC-ECT algorithm is used to schedule the backups of independent jobs. The MCT-LRC algorithm is used to schedule the backups of dependent jobs. For independent tasks, scheduling of backups is independent, and backups can overload as long as their primaries are scheduled on different processors. Backup scheduling and overloading of dependent tasks are nontrivial, and the constraint is that the backup of the second task can only start after the backup of the first task finishes, and must not be scheduled on the processor where the primary of the first task is located.

3.1.1 MRC-ECT Algorithm
The MRC-ECT algorithm is used for scheduling backups of independent tasks. The objective is to improve resource utilization. For every processor besides the one where the primary is scheduled, the boundary schedules within the time window are considered and their replication costs are compared. This algorithm first considers the left boundary schedules of the time window. It is guaranteed to find an optimal backup schedule in terms of replication cost for a task.

3.1.2 MCT-LRC Algorithm
The MCT-LRC algorithm is used for scheduling backups of dependent tasks. The objective is to reduce job rejection. For every processor besides the one where the primary is scheduled, the boundary schedules within the time window are considered and the boundary schedule which can complete earliest is chosen. This algorithm first considers the left boundary schedules of the time window. Then, all existing schedules within or overlapping with the time window are examined one by one. The algorithm calculates the replication cost of the earliest schedule on the current processor and records it.
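To make the time-window search concrete, the following simplified Python sketch (our own illustration, not the paper's implementation; the replication-cost comparison of the actual MRC-ECT/MCT-LRC algorithms is reduced here to an earliest-completion rule, and all names are hypothetical) scans boundary schedules on every candidate processor:

    from collections import namedtuple

    Slot = namedtuple("Slot", "start end")      # an occupied interval on a processor

    def free_boundary_slots(busy, window, length):
        """Candidate backup start times inside 'window' that butt against
        existing schedules (boundary schedules), for a task of given length."""
        starts = [window.start] + [s.end for s in busy
                                   if window.start <= s.end <= window.end]
        for t in sorted(set(starts)):
            if t + length <= window.end and \
               all(t + length <= s.start or t >= s.end for s in busy):
                yield Slot(t, t + length)

    def schedule_backup(processors, primary_proc, window, length):
        """Pick the (processor, slot) whose backup completes earliest,
        skipping the processor that hosts the primary copy."""
        candidates = [
            (slot.end, p, slot)
            for p, busy in processors.items() if p != primary_proc
            for slot in free_boundary_slots(busy, window, length)
        ]
        return min(candidates, default=None)

    # Example: primary on processor 1; backup window [0, 10], task length 3.
    procs = {1: [Slot(0, 4)], 2: [Slot(2, 5)], 3: []}
    print(schedule_backup(procs, primary_proc=1, window=Slot(0, 10), length=3))
    # -> (3, 3, Slot(start=0, end=3)): processor 3, completing at time 3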

3.2 Communication Protocols
Different communication protocols are used in the grid environment. The Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) are used for data transmission. The Grid File Transfer Protocol (GridFTP) is also used for data transmission; it is used to transfer files in a parallel manner. These protocols are used to distribute the messages of each task to the grid resources. The system analyses the data transmission under task failures.
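As a small illustration of distributing a task message over these two protocols, a sketch using Python's standard socket module (our own example; the host names, ports and message format are assumptions):

    import socket

    def send_task_tcp(host, port, message):
        """Reliable, connection-oriented delivery of a task message (TCP)."""
        with socket.create_connection((host, port)) as s:
            s.sendall(message.encode("utf-8"))

    def send_task_udp(host, port, message):
        """Connectionless, best-effort delivery of a task message (UDP)."""
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(message.encode("utf-8"), (host, port))

    # Example: announce task 42 to a (hypothetical) grid resource.
    # send_task_tcp("resource1.example.org", 9000, "TASK 42 PRIMARY")
    # send_task_udp("resource1.example.org", 9001, "TASK 42 HEARTBEAT")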
4. CONCLUSION
In this paper, we addressed the problem of fault-tolerant scheduling of jobs in a heterogeneous grid environment. We considered the impact of communication protocols. The algorithms MRC-ECT and MCT-LRC, for independent and dependent tasks respectively, do not require sampling. These algorithms can schedule backups in a much faster way in a heterogeneous environment.

REFERENCES
[1] Aikebaier, A., Makoto Takizawa, Abawajy, J.H. (2004), "Fault-Tolerant Scheduling Policy for Grid Computing Systems", Proceedings on Parallel and Distributed Processing Symposium (IPDPS).
[2] Al-Omari, R., Somani, A.K., and Manimaran, G. (2001), "A New Fault-Tolerant Technique for Improving Schedulability in Multiprocessor Real-Time Systems", Proceedings on Parallel and Distributed Processing Symposium (IPDPS).


[3] Foster, I. and Kesselman, C. (2004), "Grid: Blueprint for a Future Computing Infrastructure", Morgan Kaufmann.
[4] Subbiah, A. and Blough, D. (2004), "Distributed Diagnosis in Dynamic Fault Environments", Parallel and Distributed Systems.
[5] Qin, X. and Jiang, H. (2006), "A Novel Fault-tolerant Scheduling Algorithm for Precedence Constrained Tasks in Real-Time Heterogeneous Systems", Parallel Computing.

Author:
Mrs. R. Kohila received the M.E (CSE) degree from Kongu Engineering College (Affiliated to Anna University, Autonomous), Perundurai, India, in 2011, the MCA degree from Bannari Amman Institute of Technology (Affiliated to Anna University), Sathyamangalam, India, in 2009, and the B.Sc. degree from Trinity College of Arts and Science for Women (Affiliated to Periyar University), Namakkal, India, in 2006. She has teaching experience of 3+ years. She is currently working as an Assistant Professor in V.S.B Engineering College, Karur, Tamil Nadu, India. Her research interests include Data Mining, Advanced Databases, Computer Networks, etc. She has presented papers in 2 National Conferences so far.


Hybrid Based Resource Provisioning in Cloud

N.Karthika, K.Prabhakar, R.Sangeetha
Vivekanandha College of Engineering For Women
Tiruchengode, India

Abstract: Data centres comprise machines with different capacities and different energy consumption characteristics. When we analysed public cloud workloads of different priorities and the performance requirements of various applications, we noted some invariant reports about the cloud, and cloud data centres become capable of sensing an opportunity to present a different program. In our proposed work, we use a hybrid method for resource provisioning in data centres. This method is used to allocate the resources according to the working conditions and also to account for the energy drawn in power consumption. The proposed method is used to allocate the process behind the cloud storage.

Keywords: Cloud workload, Hybrid resource provisioning, Cloud storage and Invariant reports.

1. INTRODUCTION
Cloud Computing is the common buzzword in today's Information Technology. Cloud computing platforms are rapidly emerging as the preferred option for hosting applications in many business contexts [5]. An important feature of the cloud that differentiates it from traditional services is its apparently infinite amount of resource capacity (e.g. CPU, storage, network) offered at a competitive rate. It eliminates the need for setting up infrastructure, which takes several months. Start-up companies need not invest in infrastructure because the resources are available in the cloud [6]. Cloud Computing enables users to acquire resources dynamically and elastically.

A major challenge in resource provisioning is to determine the right amount of resources required for the execution of work, in order to minimize the financial cost from the perspective of users and to maximize the resource utilization from the perspective of service providers [4]. So, cloud computing is one of the preferred options in today's enterprise. Resource provisioning means the selection, deployment, and run-time management of software (e.g., database management servers, load balancers) and hardware resources (e.g., CPU, storage, and network) for ensuring guaranteed performance for applications. This resource provisioning takes the Service Level Agreement (SLA) into consideration for providing service to the cloud users. This is an initial agreement between the cloud users and cloud service providers which ensures Quality of Service (QoS) parameters like performance, availability, reliability, response time, etc.

Based on the application needs, static/dynamic provisioning and static/dynamic allocation of resources have to be made in order to efficiently make use of the resources without violating the SLA while meeting these QoS parameters. Over-provisioning and under-provisioning of resources must be avoided. Another important constraint is power consumption. Care should be taken to reduce power consumption and power dissipation, and also in VM placement. There should be techniques to avoid excess power consumption. So the ultimate goal of the cloud user is to minimize cost by renting the resources, and, from the cloud service provider's perspective, to maximize profit by efficiently allocating the resources. In order to achieve this goal, the cloud user has to request the cloud service provider to make a provision for the resources, either statically or dynamically, so that the cloud service provider will know how many instances of the resources, and what resources, are required for a particular application. By provisioning the resources, the QoS parameters like availability, throughput, security, response time, reliability, performance, etc. must be achieved without violating the SLA.

Platform as a Service is a way to rent hardware, operating systems, storage and network capacity over the internet. It delivers a computing platform or software stack as a service to run applications. This can broadly be defined as an application development environment offered as a 'service' by the vendors. The development community can use these platforms to code their applications and then deploy the applications on the infrastructure provided by the cloud vendor. Here again, the responsibility of hosting and managing the required infrastructure will be with the cloud vendor. AppEngine, Bungee Connect, LongJump, Force.com and WaveMaker are all instances of PaaS.

2. RELATED WORKS
Over the last few years, cloud computing has evolved into delivering software and hardware services over the internet. Extensive research is going on to extend the capabilities of cloud computing. Given below is related work in the area of the cloud's scalability and resource provisioning in cloud computing.

In 2010, Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen and Zhenghu discussed the characteristics of cloud computing. This paper summarizes the general characteristics of cloud computing, which will help the development and adoption of this rapidly evolving technology. The key characteristics of cloud computing are low cost, high reliability, high scalability and security. To make clear the essentials of cloud computing, it proposes the


characteristics of this area which make cloud computing what it is and distinguish it from other research areas. Cloud computing has its own technical, economic and user-experience characteristics. Service orientation, loose coupling, strong fault tolerance, the business model and ease of use are the main characteristics of cloud computing. Abstraction and accessibility are two keys to achieving the service-oriented conception. In loose coupling, cloud computing runs in a client-server model: the clients, or cloud users, connect loosely with the servers, or cloud providers. Strong fault tolerance stands for the main technical characteristic. The ease-of-use user-experience characteristic helps cloud computing become widely accepted by non-computer experts. These characteristics expose the essentials of cloud computing. [1]

In 2010, Pushpendra Kumar Pateria and Neha Marria discussed resource provisioning in the sky environment. A resource manager is used for resource provisioning and allocation of resources as users request. They offer a rule-based resource manager in the sky environment for utilization of private cloud resources and for the security requirements of resources of critical applications and data. The decision is made on the basis of rules. The performance of the resource manager is also evaluated using CloudSim on the basis of resource utilization and cost in the sky environment. It sets priorities for requests and allocates resources accordingly. Sky computing provides concurrent access to multiple clouds according to user requirements. They define the cloud services like Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service. [2]

In 2010, Zhang Yu Hua, Zhang Jian and Zhang Wei Hua presented an argument about the intelligent cloud computing system and a data warehouse that records the inside and outside data of the Cloud Computing System for data analysis and data mining. The management problems of a CCS are: balance between capacity and demand, capacity development planning, performance optimization and system safety management. The architecture of the intelligent cloud computing system is defined with a data source, a data warehouse and a cloud computing management information system. [3]

In 2008, Jianfeng Zhan, Lei Wang, Bipo Tu, Yong Li, Peng Wang, Wei Zhou and Dan Meng discussed Phoenix. This paper discusses the design and implementation of the cloud management system software Phoenix Cloud. Different departments of a large organization often maintain dedicated cluster systems for different computing loads. The departments of big organizations have operated cluster systems with independent administration staffs and found many problems: the resource utilization rates of cluster systems vary, dedicated cluster systems cannot provision enough resources, and the number of administration staff for cluster systems is high. So they designed and implemented the cloud management system software Phoenix Cloud to consolidate high-performance computing jobs and Web service applications on a shared cluster system. Phoenix Cloud decreases the scale of the required cluster system for a large organization, improves the benefit of the scientific computing department, and provisions resources. [4]

In 2010, Shu-Ching Wang, Kuo-Qin Yan, Wen-Pin Liao and Shun-Sheng Wang discussed load balancing in a three-level cloud computing network. Cloud computing utilizes low-power hosts to achieve high reliability; here, cloud computing utilizes the computing resources on the network to facilitate the execution of complicated tasks that require large-scale computation. The OLB scheduling algorithm is used to attempt to keep each node busy, with the goal of load balance. The proposed LBMM (Load Balance Min-Min) scheduling algorithm can achieve the minimum execution time of each task in the cloud computing environment, and this improves the load imbalance of Min-Min. In order to reach load balance and decrease the execution time of each node in the three-level cloud computing network, the OLB and LBMM scheduling algorithms are integrated. For the load balancing of the three-level cloud computing network, all calculated results can be integrated first by the second-level nodes. [5]

On January 31, 2011, Sivadon Chaisiri, Bu-Sung Lee and Dusit Niyato discussed the optimization of resource provisioning cost. Under resource provisioning, the optimal cloud provisioning algorithm illustrates virtual machine management that considers multiple provisioning stages with demand and price uncertainty. In this work, the system model of the cloud computing environment has been thoroughly explained using various entities such as the cloud consumer, virtual machines and the cloud broker, in detail. [8]

Agent-based adaptive resource allocation was discussed in 2011 by Gihun Jung and Kwang Mong Sim. In this paper, the provider needs to allocate each consumer request to an appropriate data center among the distributed data centers, so that consumers can be satisfied with the service in terms of fast allocation time and execution response time. Service providers offer their resources under the Infrastructure as a Service model; for IaaS, the service provider delivers its resources at the request of consumers in the form of VMs. To find an appropriate data center for the consumer request, they propose an adaptive resource allocation model that considers both the geographical distance between the location of the consumer and the data centers, and the workload of the data center. In experiments, the adaptive resource allocation model shows higher performance. An agent-based test bed was designed and implemented to demonstrate the proposed adaptive resource allocation model. The test bed was implemented using JAVA with JADE (Java Agent Development Framework). [9]

3. SYSTEM ARCHITECTURE
We dynamically adjust the number of machines of each type to minimize total energy consumption and the performance penalty in terms of scheduling delay. In our proposed work, we use a hybrid method for resource provisioning in data centers. This method is used to allocate the resources according to the working conditions and also to account for the energy drawn in power consumption. The proposed method is used to allocate the process behind the cloud storage.

3.1 User Interface Design
In this module we design the windows for the project. These windows are used to send a message from one user to another. In this module we mainly focus on the login design page with the partial knowledge information. Application users who need to view the application need to login through the user interface; the GUI is the media connecting the user and the media database.


3.2 Dynamic Capacity Provisioning in Data Centers
In this section we address the simulation of heterogeneous active machines. Here we create different machines for storage based on the client demand; that is, production data centers often comprise heterogeneous machines with different capacities and energy consumption characteristics. The energy-level consumption is updated by the cloud service provider to which the data center belongs. The data center has the right to route among the heterogeneous active machines. This area regulates heterogeneous active machine creation. It also regulates memory consumption; a key challenge that has often been overlooked, or considered difficult to address, is heterogeneity.

3.3 Machine Heterogeneity Model Approach
This simulation addresses the fact that production data centers often comprise several types of machines from multiple updates. They have heterogeneous processor architectures and speeds, hardware features, and memory and disk capacities. Consequently, they have different runtime energy consumption rates. Here we create different machines for storage based on the client demand; that is, production data centers often comprise heterogeneous machines with different capacities and energy consumption characteristics. The energy-level consumption is updated by the cloud service provider to which the data center belongs.

3.4 Resource Monitoring and Management System
Production data centers receive a vast number of heterogeneous resource requests with diverse resource demands, durations, priorities and performance objectives. The heterogeneous nature of both machines and workloads in production cloud environments has profound implications for the design of DCP schemes. Here we address accurate characterization of both workload and machine heterogeneities. Using standard K-means clustering, we show that the heterogeneous workload can be divided into multiple task classes with similar characteristics in terms of resource and performance objectives.
expense to operate it. They need not be concerned about over-
3.5 Dynamic Capacity Provisioning Approach
The workload traces contain scheduling events and resource demand and usage records. A job is an application that consists of one or more tasks. Each task is scheduled on a single physical machine. When a job is submitted, the user can specify the maximum allowed resource demand for each task in terms of the required CPU and memory size. We dynamically adjust the number of active machines in a data center in order to reduce energy consumption while meeting the service level objectives (SLOs) of the workloads. The coordinates of each point in these figures correspond to a combination of CPU and memory requirements.
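A minimal sketch of this idea (our own illustrative code, not the paper's algorithm; the machine capacities, energy rates and the greedy selection rule are assumptions): activate just enough machines, cheapest energy rate first, to cover the current CPU and memory demand:

    def provision(machines, cpu_demand, mem_demand):
        """machines: list of (name, cpu_capacity, mem_capacity, energy_rate).
        Returns the machines to keep active, chosen greedily by energy rate."""
        active, cpu, mem = [], 0.0, 0.0
        for name, c, m, rate in sorted(machines, key=lambda x: x[3]):
            if cpu >= cpu_demand and mem >= mem_demand:
                break                  # demand met; remaining machines stay off
            active.append(name)
            cpu, mem = cpu + c, mem + m
        return active

    # Example: two small low-power machines and one large machine.
    fleet = [("m1", 4, 8, 120.0), ("m2", 4, 8, 120.0), ("big1", 16, 64, 350.0)]
    print(provision(fleet, cpu_demand=6, mem_demand=12))   # -> ['m1', 'm2']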

[Figure 1. Architecture of the proposed framework.]

Resource provisioning in cloud computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Developers with innovative ideas for new Internet services no longer require large capital outlays in hardware to deploy their service, or the human expense to operate it. They need not be concerned about over-provisioning for a service whose popularity does not meet their predictions, thus wasting costly resources, or under-provisioning for one that becomes wildly popular, thus missing potential customers and revenue. The methodology is based on the Infrastructure as a Service layer to access resources on demand. A rule-based resource manager is proposed to scale up the private cloud; it presents a cost-effective solution in terms of money spent to scale up the private cloud on demand by taking the public cloud's resources, and it never permits secure information to cross the organization's firewall in the hybrid cloud. It also sets the time for the public cloud and the private cloud to fulfill the request.

4. CONCLUSION
Users' usage generates a large number of processes in an environment, so a large number of problems occur in the cloud. Resource provisioning problems can be overcome by the hybrid method. The proposed method is used to allocate the resources under the working conditions. It shows that the energy use is very efficient, and it overcomes the workload with good performance.


5. REFERENCES
[1] http://www.youtube.com/yt/press/statistics.html
[2] http://nlp.stanford.edu/software/corenlp.shtml
[3] Collins English Dictionary, entry for "lemmatise".
[4] L. Ratinov and D. Roth, "Design Challenges and Misconceptions in Named Entity Recognition", CoNLL, 2009.
[5] G. A. Miller, "Wordnet: A lexical database for english", (11):39-41.
[6] Chengde Zhang, Xiao Wu, Mei-Ling Shyu and Qiang Peng, "Adaptive Association Rule Mining for Web Video Event Classification", 2013 IEEE 14th International Conference on Information Reuse and Integration (IRI), pp. 618-625.
[7] Y. Song, M. Zhao, J. Yagnik, and X. Wu, "Taxonomic classification for web-based videos", in CVPR, 2010.
[8] Z. Wang, M. Zhao, Y. Song, S. Kumar, and B. Li, "Youtube-cat: Learning to categorize wild web videos", in CVPR, 2010.
[9] http://www.ranks.nl/resources/stopwords.html
[10] http://cs.nyu.edu/grishman/jet/guide/PennPOS.html
[11] Roth and D. Zelenko, "Part of Speech Tagging Using a Network of Linear Separators", Coling-ACL, The 17th International Conference on Computational Linguistics, 1998, pp. 1136-1142.
[12] O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce, "Automatic annotation of human actions in video", in Proc. of ICCV, 2009.
[13] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies", in Proc. of CVPR, 2008.
[14] M. Everingham, J. Sivic, and A. Zisserman, "Hello! My name is... Buffy: automatic naming of characters in TV video", in Proc. of BMVC, 2006.
[15] A. F. Smeaton, P. Over, and W. Kraaij, "Evaluation campaigns and TRECVid", in Proc. of ACM Workshop on Multimedia Information Retrieval, 2006.
[16] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs", in Proc. of ACM MM, 2007.
[17] M. E. Sargin, H. Aradhye, P. J. Moreno, and M. Zhao, "Audiovisual celebrity recognition in unconstrained web videos", in Proc. of ICASSP, 2009.
[18] J. Liu, J. Luo, and M. Shah, "Recognizing realistic actions from videos", in Proc. of CVPR, 2009.
[19] S. Zhang, C. Zhu, J. K. O. Sin, and P. K. T. Mok, "A novel ultrathin elevated channel low-temperature poly-Si TFT", IEEE Electron Device Lett., vol. 20, pp. 569-571, Nov. 1999.


Authentic Data Access Scheme for Variant Disruption-Tolerant Networks

S.Raja Rajeshwari, K. Prabhakar, S.Fowjiya
Vivekanandha College Of Engineering For Women
Tiruchengode, India

Abstract: Mobile nodes in military environments such as a battlefield or a hostile region are likely to suffer from intermittent network connectivity and frequent partitions. Disruption-tolerant network (DTN) technologies are becoming successful solutions that allow wireless devices carried by soldiers to communicate with each other and access confidential information or commands reliably by exploiting external storage nodes. However, the problem of applying CP-ABE in decentralized DTNs introduces several security and privacy challenges with regard to attribute revocation, key escrow, and coordination of attributes issued from different authorities. In this paper, we propose a secure data retrieval scheme using CP-ABE for decentralized DTNs where multiple key authorities manage their attributes independently. We demonstrate how to apply the proposed mechanism to securely and efficiently manage the confidential data distributed in the disruption-tolerant military network. Since some users may change their associated attributes at some point (for example, moving their region), or some private keys might be compromised, key revocation (or update) for each attribute is necessary in order to make systems secure. This implies that revocation of any attribute or any single user in an attribute group would affect the other users in the group. It may result in a bottleneck during the rekeying procedure, or in security degradation due to the windows of vulnerability if the previous attribute key is not updated immediately.

Keywords: Disruption-tolerant network (DTN), CP-ABE, secure data retrieval, attribute revocation, key escrow

1. INTRODUCTION
Delay-tolerant networking (DTN) is an approach to computer network architecture that seeks to address the technical issues in heterogeneous networks that may lack continuous network connectivity. Examples of such networks are those operating in mobile or extreme terrestrial environments, or planned networks in space. Recently, the term disruption-tolerant networking has gained currency in the United States due to support from DARPA, which has funded many DTN projects. Disruption may occur because of the limits of wireless radio range, sparsity of mobile nodes, energy resources, attack, and noise.

Roy [4] and Chuah [5] introduced storage nodes in DTNs where data is stored or replicated such that only authorized mobile nodes can access the necessary information quickly, and a selection of confidential data, including access control methods that are cryptographically enforced [6], [7]. In many cases, it is desirable to provide differentiated access services such that data access policies are defined over user attributes or roles, which are managed by the key authorities. In this case, it is a reasonable assumption that multiple key authorities are likely to manage their own dynamic attributes for soldiers in their deployed regions or echelons, which could be frequently changed (e.g., the attribute representing the current location of moving soldiers) [4], [8], [9]. We refer to this DTN architecture, where multiple authorities issue and manage their own attribute keys independently, as a decentralized DTN [10].

The concept of attribute-based encryption (ABE) is a promising approach that fulfills the requirements for secure data retrieval in DTNs. ABE features a mechanism that enables an access control over encrypted data using access policies and ascribed attributes among private keys and ciphertexts. Especially, ciphertext-policy ABE (CP-ABE) provides a scalable way of encrypting data such that the encryptor defines the attribute set that the decryptor needs to possess in order to decrypt the ciphertext [13]. Thus, different users are allowed to decrypt different pieces of data per the security policy.

2. RELATED WORKS
In CP-ABE, the ciphertext is encrypted with an access policy chosen by an encryptor, but a key is simply created with respect to an attribute set. CP-ABE is more appropriate to DTNs than KP-ABE because it enables encryptors such as a commander to choose an access policy on attributes and to encrypt confidential data under the access structure via encrypting with the corresponding public keys or attributes [4], [7], [15].

Most of the existing ABE schemes are constructed on the architecture where a single trusted authority has the power to generate the whole private keys of users with its master secret information [11], [13], [14]. Thus, the key escrow problem is inherent, such that the key authority can decrypt every ciphertext addressed to users in the system by generating their secret keys at any time. Chase et al. presented a distributed KP-ABE scheme that solves the key escrow problem in a multiauthority system. In this approach, all (disjoint) attribute authorities participate in the key generation protocol in a distributed way, such that they cannot pool their data and link


multiple attribute sets belonging to the same user. One disadvantage of this fully distributed approach is the performance degradation. Since there is no centralized authority with master secret information, all attribute authorities should communicate with each other in the system to generate a user's secret key.

3. SYSTEM DESIGN

3.1 Existing System
When multiple authorities manage and issue attribute keys to users independently with their own master secrets, it is very hard to define fine-grained access policies over attributes issued from different authorities.

The problem of applying ABE to DTNs introduces several security and privacy challenges. Since some users may change their associated attributes at some point (for example, moving their region), or some private keys might be compromised, key revocation (or update) for each attribute is necessary in order to make systems secure. However, this issue is even more difficult, especially in ABE systems, since each attribute is conceivably shared by multiple users (henceforth, we refer to such a collection of users as an attribute group).

Another challenge is the key escrow problem. In CP-ABE, the key authority generates private keys of users by applying the authority's master secret keys to users' associated set of attributes. The last challenge is the coordination of attributes issued from different authorities. When multiple authorities manage and issue attribute keys to users independently with their own master secrets, it is very hard to define fine-grained access policies over attributes issued from different authorities.

3.2 Proposed System
First, immediate attribute revocation enhances backward/forward secrecy of confidential data by reducing the windows of vulnerability.

Second, encryptors can define a fine-grained access policy using any monotone access structure under attributes issued from any chosen set of authorities.

Third, the key escrow problem is resolved by an escrow-free key issuing protocol that exploits the characteristic of the decentralized DTN architecture. The key issuing protocol generates and issues user secret keys by performing a secure two-party computation (2PC) protocol among the key authorities with their own master secrets. The 2PC protocol deters the key authorities from obtaining any master secret information of each other, such that none of them could generate the whole set of user keys alone. Thus, users are not required to fully trust the authorities in order to protect the data to be shared. The data confidentiality and privacy can be cryptographically enforced against any curious key authorities or data storage nodes in the proposed scheme.

3.2.1 Data Confidentiality
Unauthorized users who do not have enough credentials satisfying the access policy should be deterred from accessing the plain data in the storage node. In addition, unauthorized access from the storage node or the key authorities should also be prevented.

3.2.2 Collusion-Resistance
If multiple users collude, they may be able to decrypt a ciphertext by combining their attributes, even if each of the users cannot decrypt the ciphertext alone.

3.2.3 Backward and Forward Secrecy
In the context of ABE, backward secrecy means that any user who comes to hold an attribute (that satisfies the access policy) should be prevented from accessing the plaintext of the previous data exchanged before he holds the attribute. On the other hand, forward secrecy means that any user who drops an attribute should be prevented from accessing the plaintext of the subsequent data exchanged after he drops the attribute, unless the other valid attributes that he is holding satisfy the access policy.
4. SYSTEM IMPLEMENTATION

4.1 Key Authorities
They are key generation centers that generate public/secret parameters for CP-ABE. The key authorities consist of a central authority and multiple local authorities. We assume that there are secure and reliable communication channels between the central authority and each local authority during the initial key setup and generation phase. Each local authority manages different attributes and issues corresponding attribute keys to users. They grant differential access rights to individual users based on the users' attributes. The key authorities are assumed to be honest-but-curious. That is, they will honestly execute the assigned tasks in the system; however, they would like to learn as much information of the encrypted contents as possible.

4.2 Storage Node
This is an entity that stores data from senders and provides corresponding access to users. It may be mobile or static. Similar to the previous schemes, we also assume the storage node to be semi-trusted, that is, honest-but-curious.

4.3 Sender
This is an entity who owns confidential messages or data (e.g., a commander) and wishes to store them in the external data storage node for ease of sharing or for reliable delivery to users in the extreme networking environments. A sender is responsible for defining an (attribute-based) access policy and enforcing it on its own data by encrypting the data under the policy before storing it to the storage node.

4.4 User
This is a mobile node who wants to access the data stored at the storage node (e.g., a soldier). If a user possesses a set of attributes satisfying the access policy of the encrypted data defined by the sender, and is not revoked in any of the attributes, then he will be able to decrypt the ciphertext and obtain the data.
the plain data in the storage node. In addition, unauthorized obtain the data.


5. CONCLUSION
The concept of attribute-based encryption (ABE) is a promising approach that fulfills the requirements for secure data retrieval in DTNs. ABE features a mechanism that enables access control over encrypted data using access policies and ascribed attributes among private keys and ciphertexts. In particular, ciphertext-policy ABE (CP-ABE) provides a scalable way of encrypting data such that the encryptor defines the attribute set that the decryptor needs to possess in order to decrypt the ciphertext; thus, different users are allowed to decrypt different pieces of data per the security policy. When multiple authorities manage and issue attribute keys to users independently with their own master secrets, it is very hard to define fine-grained access policies over attributes issued from different authorities; the decentralized, escrow-free approach described above is designed to address this challenge.

Reverse Engineering for Documenting Software Architectures, a Literature Review

Hind Alamin Mohamed
College of Computer Science and Information Technology,
Sudan University of Science and Technology (SUST), Sudan

Hany H Ammar
Lane Computer Science and Electrical Engineering Department,
College of Engineering and Mineral Resources,
West Virginia University, USA

Abstract: Recently, much research in software engineering has focused on reverse engineering of software systems, which has become one of the major engineering trends for software evolution. The objective of this survey paper is to provide a literature review of the existing reverse engineering methodologies and approaches for documenting the architecture of software systems. The survey process was based on selecting the most common approaches that form the current state of the art in documenting software architectures. We discuss the limitations of these approaches, highlight the main directions for future research, and describe specific open issues for research.

Keywords: Reverse Engineering; Software Architecture; Documenting Software Architectures; Architectural Design Decisions.

1. INTRODUCTION
Reverse engineering has become one of the major engineering trends for software evolution. Reverse engineering is defined as the process of analyzing an existing system to determine its current components and the relationships between them. This process extracts and creates the design information and new forms of system representations at a higher level of abstraction [1, 2]. Garg et al. categorized engineering into forward engineering and reverse engineering. Both of these types are essential in the software development life cycle. Forward engineering refers to the traditional process for developing software, which includes gathering requirements, designing, and coding, through to the testing phase that ensures the developed software satisfies the required needs [1]. Reverse engineering, in contrast, is defined as the way of analyzing an existing system to identify its current components and the dependencies between these components in order to recover the design information, and it creates other forms of system representations [1, 2].

Legacy systems are old existing systems which are important for business processes. Companies rely on these legacy systems and keep them in operation [2]. Therefore, reverse engineering is used to support software engineers in the process of analyzing and recapturing the design information of complex and legacy systems during the maintenance phase [2, 3].

In addition, the main objectives of reverse engineering are focused on generating alternative views of a system's architecture, recovering the design information, re-documentation, detecting limitations, representing the system at higher abstractions, and facilitating reuse [1, 2, 4].

The main purpose of this survey paper is to achieve the following objectives: provide a literature review on the existing reverse engineering methodologies for documenting the architecture of software systems, and highlight the open issues and the directions for future research.

The rest of the paper is organized as follows: Section 2 presents a literature review of the common existing research on reverse engineering from different perspectives. Section 3 highlights the new research areas as open issues for future work. Finally, the paper concludes with a summary of the main contribution and the future research.

2. LITERATURE REVIEW
Program understanding plays a vital role in most software engineering tasks. In fact, developers use the software documentation to understand the structure and behavior of existing systems [4, 5]. However, the main problem that developers face is that the design document or other software artifacts are out-of-date and do not reflect the system's changes. As a result, more effort and time is needed for understanding the software rather than modifying it [4, 5]. The following sections introduce the most common reverse engineering approaches that focus on documenting the architecture of software from different perspectives.

2.1 Reverse Engineering for Understanding Software Artifacts
Kumar explained that developers should understand the source code based on static information and dynamic information [5]. The static information explains the structural characteristics of the system, while the dynamic information explains the dynamic characteristics or behaviors of the system. Hence, these details help the developers understand the source code in order to maintain or evaluate the system. However, Kumar clarified that few reverse engineering tools support both dynamic and static information [5]. Therefore, he presented an alternative methodology to extract the static and dynamic information from existing source code. This methodology focused on using one of the reverse engineering tools, namely Enterprise Architect (EA), to extract the static and dynamic views.

Additionally, all of the extracted information was represented in the form of Unified Modeling Language (UML) models. The main purpose was to get complementary views of the software in the form of state diagrams and communication diagrams. The stages of this methodology are summarized as shown in Figure 1.

Figure 1. Reverse Engineering thorough Complementary Software Views [5]

This proposed methodology was very useful for supporting developers in understanding the software artifacts of existing software systems. However, the methodology needs to support additional stakeholders besides the developers in order to identify the stakeholders' concerns and their decisions about the whole system.

2.2 Model Driven Reverse Engineering
Model driven reverse engineering (MDRE) was proposed, as described in [6], to improve the traditional reverse engineering activities and legacy technologies. It is used to describe the representation of models derived from legacy systems in order to understand their contents. However, most MDRE solutions focused on addressing several types of legacy system scenarios; these solutions are not complete, and they do not cover the full range of legacy systems. The work also introduced several reverse engineering processes, such as the technical/functional migration processes of MDRE [6].

Recently, Hugo et al. presented a generic and extensible MDRE framework called "MoDisco". This framework is applicable to different refactoring and re-documentation techniques [6]. The architecture of MoDisco is represented in three layers, and each layer is comprised of one or more components (see Figure 2). The components of each layer provide high adaptability because they are based on the nature of the legacy system technologies and the scenario based on reverse engineering.

Figure 2. MoDisco Framework's Architecture [6, p9]

However, the MoDisco framework was limited to traditional technologies such as JAVA, JEE (including JSP) and XML. This framework needs to be extended to support additional technologies and to add more advanced components to improve system comprehension and expose the key architecture design decisions.

2.3 Documenting of Architectural Design Decisions (ADDs)
Historically, Shaw and Garlan introduced the concepts of software architecture and defined the system in terms of computational components and the interactions between these components, as indicated in [7]. Moreover, Perry and Wolf defined software architecture in terms of elements, their properties, and the relationships among these elements. They suggested that the software architecture description is the consequence of early design decisions [7].

Software architecture is defined by the recommended practice (ANSI/IEEE Std 1471-2000) as: the fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution. Software architecture development is based on a set of architectural design decisions (ADDs). This is considered one of the important factors in achieving the functional and non-functional requirements of the system [8].

Che explained that the process of capturing and representing ADDs is very useful for organizing the architecture knowledge and reducing the possibility of missing this knowledge [8]. Furthermore, the previous research focused on developing tools and approaches for capturing, representing and sharing the ADDs.

However, Che clarified that most of the previous research proposed different methods for documenting ADDs, and these methods rarely support architecture evaluation and knowledge evaluation in practice [8]. Accordingly, Che et al. presented an alternative approach for documenting and evaluating ADDs. This approach proposed the solutions described in the following subsections [8, 9]:

2.3.1 Collecting of Architectural Design Decisions
The first solution focused on creating a general architectural framework for documenting ADDs called the Triple View Model (TVM). The framework includes three different views for describing the notation of ADDs, as shown in Figure 3. It also covers the features of the architecture development process [8, 9].

Figure 3. Triple View Model Framework [8, p1374]

As shown in Figure 3, the Element View describes the elements that should be defined to develop the architecture, such as computation elements, data elements, and connector elements. The Constraint View explains how the elements interact with each other by defining what the system should and should not do, the constraint(s) on each element of the Element View, and, additionally, the constraints on the interaction and configuration among the elements. Finally, the Intent View includes the rationale for the decision made after analyzing all the available decisions, as well as the selection of styles and patterns for the architecture and the design of the system.

2.3.2 Scenario-Based Documentation and Evaluation Method
The second solution, called SceMethod, is based on the TVM framework. The main purpose is to apply the TVM framework by specifying its views through end-user scenarios, and then to manage the documentation and the evaluation needs for ADDs [8, 10].

2.3.3 UML Metamodel
The third solution is focused on developing a UML metamodel for the TVM framework. The main purpose was to have each view of TVM specified by classes and a set of attributes for describing ADD information. Accordingly, this solution provided the following features [8]: a) establish traceable evaluation of ADDs, b) apply the evaluation related to the specified attributes, c) support multiple ways of documenting during the architecture process and allow explicit evaluation knowledge of ADDs.
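As a rough illustration of such a class-based encoding of the three TVM views, the sketch below uses Python dataclasses. The class and attribute names are assumptions made for illustration; they are not the classes defined in the actual metamodel of [8].

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ElementView:
    """Elements defined to develop the architecture."""
    computation_elements: List[str] = field(default_factory=list)
    data_elements: List[str] = field(default_factory=list)
    connector_elements: List[str] = field(default_factory=list)

@dataclass
class ConstraintView:
    """What the system should/should not do, per element and interaction."""
    element_constraints: List[str] = field(default_factory=list)
    interaction_constraints: List[str] = field(default_factory=list)

@dataclass
class IntentView:
    """The rationale behind the decision and the selected styles/patterns."""
    rationale: str = ""
    styles_and_patterns: List[str] = field(default_factory=list)

@dataclass
class ArchitecturalDesignDecision:
    """One ADD documented through the three TVM views."""
    element_view: ElementView
    constraint_view: ConstraintView
    intent_view: IntentView
```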
Furthermore, the TVM and SceMethod solution was validated using a case study to ensure its applicability and effectiveness. Supporting the ADD documentation and evaluation in geographically separated software development (GSD) is currently work in progress.

2.4 Comparison of Existing Architectural Design Decisions Models
Researchers have made a great deal of effort to present tools and models for capturing, managing, and sharing the ADDs. These proposed models were based on the concept of architectural knowledge to promote the interaction between the stakeholders and improve the architecture of the system [8, 11].

Accordingly, in [11], Shahin et al. presented a comparison study based on surveying and comparing the existing architectural design decision models. Their comparison included nine ADD models and used six criteria based on desired features [11, 12]. The main reason was to investigate the ADD models to decide if there are similarities and differences in capturing the ADDs. Moreover, the study aimed at finding the desired features that were missed according to the architecture needs [11]. The authors in [11] classified the ADD elements into two categories: major elements and minor elements. The major elements refer to the elements captured and documented with consensus, based on the constraints, rationale, and alternative decisions, while the minor elements refer to the elements used without consensus on capturing and documenting the ADDs, such as stakeholders, problem, group, status, dependency, artifacts, and phase/iteration.

The main observations of this comparison study are highlighted as follows: 1) all of the selected ADD models included the major elements and used different terms to express similar concepts of the architecture design; 2) most ADD models used different minor elements for capturing and documenting ADDs; 3) all the selected ADD models deal with the architecture design as a decision-making process; 4) not all of them are supported by tools, and some were based only on textual templates for capturing and documenting ADDs; 5) the most important observation was that most existing ADD tools do not provide support for ADD personalization, which refers to the ability of stakeholders to communicate with the stored knowledge of ADDs based on their own profile [11, 12].

We summarize the approaches and methodologies described in this section in Table 1. The main observation is that existing methods are focused on the developer's concerns and viewpoints as the main stakeholder. Recent approaches such as the Triple View Model (TVM) [8], the scenario-based method (SceMethod) [9], and managing ADDs [10] suggested the need for alternative solutions for supporting ADD personalization for different stakeholders.

3. OPEN ISSUES
We describe in this section the open issues that require further research, based on the research work described in the previous section. These issues are listed as follows:

• There is a significant need to develop alternative approaches of reverse engineering for documenting the architectures that simplify and classify all of the available information based on identifying the stakeholders' concerns and their decisions about the system.

• Improve the system's comprehension by establishing more advanced approaches for understanding the software artifacts. These approaches should help in documenting the architecture at different levels of abstraction and granularity based on the stakeholders' concerns.

• Finally, it is important to support multiple methods and guidelines on how to use the general ADDs framework in the architecting process. These methods should be based on the architecture needs, context and challenges in order to evaluate the ADDs in the architecture development and evolution processes.

Table 1. Examples of some Methodologies and Approaches for Documenting Software Architecture

1. Kumar (2013)
Problem Statement: Reverse engineering for understanding the software artifacts.
Proposed Solution(s): An alternative methodology to extract the static and dynamic information from the source code. The main purpose is to get complementary views of software systems.
Results and Findings: This methodology supports developers in achieving the reverse engineering goals in order to understand the artifacts of software systems.
Limitation(s): This methodology needs to support additional stakeholders besides the developers in order to identify the stakeholders' concerns and their decisions about the whole system.

2. Hugo et al. (2014)
Problem Statement: Understanding the contents of legacy systems using model driven reverse engineering (MDRE).
Proposed Solution(s): A generic and extensible MDRE framework called "MoDisco". This framework is applicable to different types of legacy systems.
Results and Findings: MoDisco provided high adaptability because it is based on the nature of legacy system technologies and the scenario(s) based on reverse engineering.
Limitation(s): MoDisco should be extended to support additional technologies and include more advanced components to improve system comprehension.

3. Che et al. (2011)
Problem Statement: Collecting architectural design decisions (ADDs).
Proposed Solution(s): The Triple View Model (TVM), an architecture framework for documenting ADDs. TVM covers the main features of the architecture process.
Results and Findings: The TVM framework includes three different views for describing the notation of ADDs.
Limitation(s): The TVM framework should be extended to manage the evaluation and documentation of ADDs by specifying its views through the stakeholders' scenarios.

4. Che et al. (2012)
Problem Statement: Managing the documentation and evolution of the architectural design decisions.
Proposed Solution(s): A scenario-based method (SceMethod) for documenting and evaluating ADDs. This solution is based on TVM; the main purpose is to apply TVM by specifying its views through end-user scenario(s).
Results and Findings: Manages the documentation and the evaluation needs for ADDs through stakeholders' scenario(s).
Limitation(s): There is a need to support multiple ways of managing and documenting the ADDs during the architecture process.

5. Che (2013)
Problem Statement: Documenting and evolving the architectural design decisions.
Proposed Solution(s): Developed a UML metamodel for the TVM framework. The main purpose was to have each view of TVM specified by classes and a set of attributes for describing ADD information.
Results and Findings: Applies the evaluation related to the specified attributes and establishes traceable evaluation of ADDs; allows explicit evaluation knowledge of ADDs; supports multiple ways of documenting ADDs during the architecture process.
Limitation(s): This solution is focused on the developer's viewpoint, and the work to support the ADD documentation and evaluation in geographically separated software development (GSD) is currently in progress.

6. Shahin et al. (2009)
Problem Statement: A survey of architectural design decision models and tools.
Proposed Solution(s): The purpose of this survey was to investigate ADD models to decide if there are any similar concepts or differences in capturing ADDs, and to clarify the desired features that are missed according to the architecture needs. The survey classified ADD concepts into two categories: major elements, which refer to the consensus on capturing and documenting ADDs based on the constraint, rationale and alternatives of a decision; and minor elements, which refer to the elements used without consensus on capturing and documenting ADDs.
Results and Findings: All of the selected ADD models include the major elements. Most ADD models are based on using different minor elements for capturing and documenting the ADDs. All of the selected ADD models deal with the architecture design as a decision-making process. Not all models were supported by tools; some were based on textual templates for capturing and documenting ADDs. Moreover, most of the existing ADD tools do not support the ability of stakeholders to communicate with the stored knowledge of ADDs.
Limitation(s): There is a need to focus on the stakeholders communicating with the stored knowledge of ADDs. This could be achieved by applying the scenario-based documentation and evaluation methods through stakeholders' scenario(s) to manage the documentation and the evaluation needs for ADDs.

4. CONCLUSIONS
This paper presented a survey on the current state of the art in documenting the architectures of existing software systems using reverse engineering techniques. We compared existing methods based on their findings and limitations. The main observation is that existing methods are focused on the developer's concerns and viewpoints as the main stakeholder. We outlined several open issues for further research to develop alternative approaches of reverse engineering for documenting the architectures for development and evolution. These issues show the need to simplify and classify available information based on identifying the stakeholders' concerns and viewpoints about the system, to improve comprehension by documenting the architecture at different levels of abstraction and granularity based on the stakeholders' concerns, and to support multiple methods and guidelines on how to use the ADDs framework based on the architecture needs, context and challenges in order to evaluate these ADDs during the architecture development and evolution processes.

5. ACKNOWLEDGMENTS
This research work was funded in part by the Qatar National Research Fund (QNRF) under the National Priorities Research Program (NPRP) Grant No.: 7 - 662 - 2 - 247.

6. REFERENCES
[1] Mamta Garg and Manoj Kumar Jindal. 2009. Reverse Engineering – Roadmap to Effective Software Design. International Journal of Recent Trends in Engineering, vol. 1 (May 2009).
[2] Rosenberg, Linda H. and Lawrence E. Hyatt. 1996. Software Re-engineering. Software Assurance Technology Center. http://www.scribd.com/doc/168304435/Software-Re-Engineering1, visited on 26 April 2014.
[3] M. Harman, W. B. Langdon and W. Weimer. 2013. Genetic Programming for Reverse Engineering. In R. Oliveto and R. Robbes, editors, Proceedings of the 20th Working Conference on Reverse Engineering (WCRE'13), Koblenz, Germany (14-17 October 2013), IEEE, 2013.
[4] M. Harman, Yue Jia, W. B. Langdon, Justyna Petke, Iman H. Moghadam, Shin Yoo and Fan Wu. 2014. Genetic Improvement for Adaptive Software Engineering. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS'14), Hyderabad, India (2-3 June 2014), ACM, 2014.
[5] Niranjan Kumar. 2013. An Approach for Reverse Engineering thorough Complementary Software Views. In Proceedings of the International Conference on Emerging Research in Computing, Information, Communication and Applications (ERCICA'13), 2013, 229-234.
[6] Hugo Brunelière, Jordi Cabot, Grégoire Dupé and Frédéric Madiot. 2014. MoDisco: A Model Driven Reverse Engineering Framework. Information and Software Technology 56, no. 8, 2014, 1012-1032.
[7] May Nicholas. 2005. A survey of software architecture viewpoint models. In Proceedings of the 6th Australasian Workshop on Software and System Architectures, 2005, 13-24.
[8] Meiru Che. 2013. An Approach to Documenting and Evolving Architectural Design Decisions. In Proceedings of the International Conference on Software Engineering (ICSE'13), San Francisco, CA, USA, IEEE, 2013, 1373-1376.
[9] Meiru Che and Dewayne E. Perry. 2011. Scenario-based architectural design decisions documentation and evolution. In Proceedings of Engineering of Computer Based Systems (ECBS'11), Las Vegas, NV (27-29 April 2011), IEEE, 2011, 216-225.
[10] Meiru Che and Dewayne E. Perry. 2012. Managing architectural design decisions documentation and evolution. International Journal of Computers, vol. 6, 2012, 137-148.
[11] M. Shahin, P. Liang and M.R. Khayyambashi. 2009. Architectural design decision: Existing models and tools. In Proceedings of the Joint Working IEEE/IFIP Conference on Software Architecture & European Conference on Software Architecture (WICSA/ECSA 2009), IEEE, 2009, 293-296.
[12] M. Shahin, P. Liang, and M.R. Khayyambashi. 2009. A Survey of Architectural Design Decision Models and Tools. Technical Report SBU-RUG-2009-SL-01. http://www.cs.rug.nl/search/uploads/Publications/shahin2009sad.pdf, visited on 8 July 2014.

7. AUTHORS' BIOGRAPHIES
Hind Alamin Mohamed, BSIT and MSCS, is a lecturer in the Software Engineering department, College of Computer Science and Information Technology at Sudan University of Science and Technology (SUST). She participated in the Scientific Forum for Engineering and Computer Students (December 2005) in Sudan, and received the first prize for Innovation and Scientific Excellence for the best graduation project in computer science in 2005. She has been teaching in the areas of Software Engineering and Computer Science since 2006. From 2010 until December 2012 she was the head of the Software Engineering Department. She has been a PhD candidate in Software Engineering since 2013.

Hany H. Ammar, BSEE, BSPhysics, MSEE, and PhD EE, is a Professor of Computer Engineering in the Lane Computer Science and Electrical Engineering department at West Virginia University. He has published over 170 articles in prestigious international journals and conference proceedings. He is currently the Editor in Chief of the Communications of the Arab Computer Society On-Line Magazine. He is serving and has served as the Lead Principal Investigator in projects funded by the Qatar National Research Fund under the National Priorities Research Program. In 2010 he was awarded a Fulbright Specialist Scholar Award in Information Technology funded by the US State Department - Bureau of Education and Cultural Affairs. He has been the Principal Investigator on a number of research projects on Software Risk Assessment and Software Architecture Metrics funded by NASA and NSF, and projects on Automated Identification Systems funded by NIJ and NSF. He has been teaching in the areas of Software Engineering and Computer Architecture since 1987. In 2004, he co-authored a book entitled Pattern-Oriented Analysis and Design: Composing Patterns to Design Software Systems, Addison-Wesley. In 2006, he co-authored a book entitled Software Engineering: Technical, Organizational and Economic Aspects, an Arabic textbook.

Dynamic Resource Provisioning with Authentication in Distributed Database

Anju Aravind K, Dr. T. Senthil Prakash, and M. Rajesh
Shree Venkateshwara Hi-Tech Engineering College, Gobi, India

Abstract: Data centers are among the largest consumers of energy. Public cloud workloads have different priorities and performance requirements for various applications [4], and cloud data centers must be capable of serving these different programs. This paper proposes a distributed cloud system with a defined security level that guards against privacy leakage while handling persistent workloads, yielding information that can be used to augment profit, reduce overhead, or both. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information. Three empirical algorithms are proposed; their estimation ratios are dissected theoretically and compared using real Internet latency data to test the methods.

Keywords: Mining, MD5, green computing, workload imitation, power consumption

1. INTRODUCTION
Most data centers today have a three- or four-tier hierarchical networking structure. Three-tier network architectures were designed around client-server applications and single-purpose application servers. Client-server applications cause traffic to flow primarily in one pattern: from a server up to the data center core, from where it moves out to the internet. The large core switches usually contain the vast majority of the intelligence in the network. Building and operating a large computing platform with a focus on service quality and cost efficiency requires cost estimation and capacity planning for processing and storage.

The cloud consists of:
Private Cloud: The infrastructure is provisioned for exclusive use by a single organization.
Public Cloud: The infrastructure is provisioned for open use by the general public.
Hybrid Cloud: The infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability.
Community Cloud: The infrastructure is provisioned for use by a particular community of consumers who share common concerns.

While cloud computing is not equivalent to virtualization, virtualization technology is heavily used to operate a cloud environment. With virtualization, a host that once ran a single operating system now has the ability to run multiple guest operating systems as virtual machines (VMs). The VMs are created for fast and easy data storage in a cloud environment; the underlying infrastructure is abstracted from, and invisible to, the consumer. The firmware that provides the virtual machines allows them to operate directly on the underlying hardware, but within specified constraints; it is the software that manages communication between the physical server's memory, CPU or processing capability, and the virtual machines that are running. This software allows VMs to be quickly provisioned or decommissioned. The data center must therefore handle delay, security, mining, and cost efficiency.

First, the data center offers the different perspectives and useful information of data mining (DM). DM is the process of summarizing data into information that can be used to increase revenue, reduce costs, or both. It allows the user to analyze the data along different dimensions or angles, and to categorize, summarize, and identify the relationships in the data. The proposed Support Vector Machine based data mining system (DMS) is designed to take advantage of powerful processors and shared pools: the calculation is performed using the message passing paradigm on data that is distributed across processors; the calculation results are then collected, and the process is repeated with new data on each processor.

Second, data center security concerns securing all aspects of cloud data [2]. Many of these features are not unique to cloud systems: irrespective of where data is stored, it is vulnerable to attack. Therefore, cloud computing must provide security, access control, malware protection structures, reduced attack surfaces, and safe design and implementation for all data. In the proposed method, MD5 is used to secure the data in the data center.

Third, resource provisioning in cloud computing has, over the last few years, emerged as a new computing model allowing utility-based delivery of services to end users [1]. Cloud computing relies on virtualization technologies to provide on-demand resources according to end user needs, but problems with resource allocation pose an issue for the management of several data centers. In the proposed approach, a gossip protocol with green-computing-based virtualization can be used to increase energy efficiency in substrate networks, allowing consolidation through the virtual hosting of different resources on the same substrate resource. Migrating resources virtually allows the network to balance its overall energy load and reduce the total power consumption of the data center. The dynamic provisioning concept is used to allocate resources dynamically.

2. RELATED WORK
2.1 Datacenter
The datacenter is the collection of servers and the information technology that, through cloud computing, migrates services between physical servers; virtualized data centers are an emerging paradigm shift even for the largest independent providers [7]. VM virtualization and migration capabilities integrate computing services onto the minimum number of physical servers, and the data center must handle processes such as mining, security, load balancing, server establishment, online maintenance, proactive fault tolerance, and VM migration. Cloud computing is used to provide these services to users worldwide. Consumer, scientific, and commercial domains hosting pervasive applications add cost and environmental contribution to the carbon footprint of data centers, and cloud-hosted applications consume huge amounts of electrical energy. Therefore, the proposed approach reduces the environmental impact of cloud computing with the help of green computing and a gossip protocol.

2.2 Database
Database systems serving cloud platforms must serve large numbers of applications. In addition to managing tenants with small data footprints and with variable load patterns, such data platforms must minimize their operating costs by efficient resource sharing. In the persistent database, the files are stored in network-attached storage. VM migration moves the database cache and the state of active transactions to ensure minimal impact on transaction execution, allowing transactions active during migration to continue executing, and also guarantees concurrency while ensuring correctness during failures [8].

2.3 Resource Allocation
Dynamic resource management for the cloud computing paradigm is an active area of research. The cost varies considerably depending upon the configuration of the resources used [6]. Efficient management of resources is of prime interest to both cloud providers and users: depending upon the size, capacity and flexibility of the cloud, the management software must be able to use the hardware resources effectively, and it has been argued that this alone is critical to providing the desired performance [5]. Successful resource management in the context of resource constraints is key to a good private cloud solution for initial placement and load balancing when a rich set of resources is on offer. For example, during peak hours, banking applications may, based on customer needs, access a number of servers from the cloud; a moment later the servers can be shut down to save energy.

3. METHODOLOGY
3.1 Support Vector Machine
The support vector machine (SVM) is a training algorithm for learning classification and regression rules from data [3]. The SVM is used to find the best classification functions to distinguish between members of the two classes in the training data. The metric for the best classification function can be realized geometrically: a linear classification function corresponds to a separating hyperplane f(x) that passes through the middle of the two classes, separating them. Once this function is determined, a new data instance x_n can be classified by simply testing the sign of the function f(x_n); x_n belongs to the positive class if f(x_n) > 0.
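The sign test above can be sketched with scikit-learn's linear SVM. This is an illustrative example with toy data, not the classifier configuration used in the proposed system:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy training data: two linearly separable classes of 2-D points.
X = np.array([[0.0, 0.5], [1.0, 1.0], [0.2, 0.1],
              [3.0, 3.5], [4.0, 3.0], [3.5, 4.2]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = LinearSVC().fit(X, y)  # learns the hyperplane f(x) = w.x + b

# Classify a new instance x_n by the sign of f(x_n).
x_n = np.array([[3.2, 3.1]])
f_xn = clf.decision_function(x_n)[0]
print("positive class" if f_xn > 0 else "negative class")
```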
3.2 MD5 Method
Step 1 – Append padding bits:
– The message is padded so that its length is congruent to 448 modulo 512; that is, the message is extended until it is just 64 bits shy of being a multiple of 512 bits long.
– A single 1 bit is appended to the message, and then 0 bits are appended so that the length in bits equals 448 modulo 512.

Step 2 – Append length:
– A 64-bit representation of b (the length of the original message) is appended to the result of the previous step.
– The resulting message has a length that is an exact multiple of 512 bits.

Step 3 – Initialize MD buffer:
– A four-word buffer (A, B, C, D) is used to compute the message digest.
– Each of A, B, C, D is a 32-bit register. These registers are initialized to fixed hexadecimal values (word A: 01 23 45 67, and so on).

Step 4 – Process the message in 16-word blocks:
– Four auxiliary functions are used; each takes three 32-bit words as input and produces one 32-bit word as output. For example:
F(X,Y,Z) = (X AND Y) OR (NOT(X) AND Z)
– If the bits of X, Y and Z are independent and unbiased, then each bit of F(X,Y,Z), G(X,Y,Z), H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased.

Step 5 – Output:
– The message digest is the output A, B, C, D.
– The output starts with the low-order byte of A and ends with the high-order byte of D.
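Steps 1–5 are implemented by any standard MD5 library, so the digest the system stores or compares can be obtained in a few lines. A minimal sketch using Python's hashlib follows (illustrative only; the paper does not specify its implementation):

```python
import hashlib

def md5_digest(data: bytes) -> str:
    """Return the 128-bit MD5 digest of `data` as 32 hex characters.

    hashlib performs the padding, length append, buffer
    initialization and 16-word block processing of Steps 1-5.
    """
    return hashlib.md5(data).hexdigest()

print(md5_digest(b"confidential record"))
```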

3.3 GREEN AND GOSSIP
Green computing initiates more environmentally friendly computing practices, and there are several steps that can be taken toward a green computing strategy. The Green Resource Allocator acts as the interface between the cloud infrastructure and the consumers, and interacts with the following components to support energy-efficient resource management: Arbitrator, Overhaul-Prediction, Client Focused, Budget Processing, Power Consuming, Overhaul-Router, Storage Leadership and Gauge.

Figure 3.1: Green Cloud Computing Architecture

The gossip-based resource allocation for large green-computing clouds aims to reduce server power consumption through consolidation; with these specific objectives in mind, a gossip-based resource allocation protocol is proposed. Under load, it achieves a fair allocation of CPU resources to clients. The simulation results for the key performance metrics of the resource allocation process suggest that they do not change with increasing system size.
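The core of such a gossip protocol is that random pairs of servers repeatedly exchange and average their state, so the load evens out without any central coordinator. The following is a minimal sketch under that assumption; it is not the concrete protocol evaluated in [5]:

```python
import random

def gossip_balance(loads, rounds=200):
    """Gossip-style load averaging: in each round a random pair of
    servers exchanges state and splits their combined CPU load
    evenly, driving every server toward the global mean."""
    loads = list(loads)
    for _ in range(rounds):
        i, j = random.sample(range(len(loads)), 2)
        loads[i] = loads[j] = (loads[i] + loads[j]) / 2.0
    return loads

# Five servers with uneven CPU demand converge toward the mean (46.0).
print(gossip_balance([90, 10, 40, 70, 20]))
```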
4. SYSTEM IMPLEMENTATION
4.1 General
In general, the user interface is designed for consumers to send requests and get responses from the server. If the consumer is a valid user, the further processing proceeds. The interface is used for client and server interaction, for requests to and responses from the cloud storage.

4.2 Implementation of Physical Machines
In this module, n physical servers are implemented and interconnected with the data storage. Each physical server has its own identification, such as server IP, server port and server instance name. All the data is stored in the storage device, and the data can be sourced through a physical server.

4.3 Cloud Environment Mining Setup System
This provides the dynamic allocation of consumer requests to the particular physical services, and the physical server retrieves the information for the consumer's request from the storage device. The response of the physical server covers available data only. The cloud environment uses the concept of SVM to populate the data from the data centers.

4.4 Highly Secure Client Processing
This system uses an MD5-based concept to achieve secure data transmission between the consumer and the server. MD5 converts the data from its normal format to an undefined format. The application applies this high security to data transmission, client request processing, physical machine data population, server mining, and user identification by the server using the request name.

4.5 Efficient Server Provisioning
Dynamic allocation of user requests to the physical servers is done in the cloud environment by using the gossip protocol [5]. This protocol is sufficient for dynamic resource allocation: it gives responses to the client with exact query matching, and the server provisioning approach minimizes the cost, reduces the time and quickens the response. Assembling the cloud environment setup and the physical server storage devices is very expensive, but applying the mining setup shows that it need not be.

5. EXPERIMENTAL RESULT
The implementation of the proposed concept creates n virtual machines and physical machines, in which n pieces of information are stored. The physical machine contains Java connection-based classes and service-based concepts. The cloud environment is a distributed setup of mined servers, and the cloud servers show all the information of the physical servers in the data center.

The data center has a mining setup for retrieving the data from the data storage, which stores the files, with the help of query processing from the server. This method uses SVM to classify the data for the user query searched from the server.

The data center is a large storage network. The network needs to secure the information stored on the storage devices by using a cryptographic technique. In this concept, the MD5 method is used to create a number of keys to secure the data on the storage devices. Only valid users can view the information for their queries; the key-based concept achieves stronger security for the storage devices.

Finally, the server-client communication is a very large process, with many resources and intermediate processing for file transfers to the users and the data storage. Since these processes are nonstop, the gossip protocol is used here for the green computing process: it automatically allocates the resources for file sharing in the data center.

6. CONCLUSION
To conclude, the concept involves several processes in the cloud environment. The cloud environment has a number of virtual machines and physical machines. These machines are used to store data in the storage devices, and the storage devices support a number of processes for mining the data.

In the first process, the data is found in the storage device; users retrieve the data from the cloud storage devices by sending a request and getting the response from the server. The SVM method is used to mine the data from the storage device; this method classifies the data and is very efficient for gathering information without any unknown data.

The second process secures the data by using the MD5 method. This cryptographic technique secures the information storage by using key values, so the stored information becomes more secure, without any leakage, in the cloud environment.

The third process addresses the delay in provisioning for the data center. The data center has many request and response

processes for users from the storage devices. Since the devices run continuous processes, the machines need to overcome the processing delay by using the gossip protocol method for green computing.

7. REFERENCES
[1] Qi Zhang, Mohamed Faten Zhani, Raouf Boutaba, and Joseph L. Hellerstein, "Dynamic Heterogeneity-Aware Resource Provisioning in the Cloud," IEEE Trans. Cloud Computing, 2014.
[2] Jun Zhou, Xiaodong Lin, Xiaolei Dong, and Zhenfu Cao, "PSMPA: Patient Self-controllable and Multi-level Privacy-preserving Cooperative Authentication in Distributed m-Healthcare Cloud Computing System," IEEE Trans. Parallel and Distributed Systems, 2014.
[3] Lu Zhang and Xueyan Tang, "The Client Assignment Problem for Continuous Distributed Interactive Applications: Analysis, Algorithms, and Evaluation," IEEE Trans. Parallel and Distributed Systems, 2014.
[4] Q. Zhang, M.F. Zhani, R. Boutaba, and J.L. Hellerstein, "HARMONY: Dynamic Heterogeneity-Aware Resource Provisioning in the Cloud," Proc. IEEE Int'l Conf. Distributed Computing Systems (ICDCS), 2013.
[5] Q. Zhang, M.F. Zhani, Q. Zhu, S. Zhang, R. Boutaba, and J.L. Hellerstein, "Dynamic Energy-Aware Capacity Provisioning for Cloud Computing Environments," Proc. ACM Int'l Conf. Autonomic Computing (ICAC), 2012.
[6] Lu Zhang and Xueyan Tang, "Optimizing Client Assignment for Enhancing Interactivity in Distributed Interactive Applications," IEEE/ACM Transactions on Networking, 2012.
[7] P. Morillo, J. Orduna, M. Fernandez, and J. Duato, "Improving the performance of distributed virtual environment systems," IEEE Trans. Parallel Distrib. Syst., vol. 16, no. 7, pp. 637–649, Jul. 2005.
[8] J. Sun and Y. Fang, "Cross-domain Data Sharing in Distributed Electronic Health Record Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 6, 2010.
Authors

Ms. Anju Aravind K received the Bachelor of Engineering degree from Anna University, Tamil Nadu, India in 2011. She is a PG scholar, currently pursuing her M.E. CSE degree at Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamil Nadu, India.

Dr. T. Senthil Prakash received the Ph.D. degree from PRIST University, Thanjavur, India in 2013, the M.E. (CSE) degree from Vinayaka Mission's University, Salem, India in 2007, and the M.Phil., MCA and B.Sc. (CS) degrees from Bharathiar University, Coimbatore, India, in 2000, 2003 and 2006 respectively, all in Computer Science and Engineering. He is a member of ISTE New Delhi, India; IAENG, Hong Kong; IACSIT, Singapore; and SDIWC, USA. He has 10+ years of teaching experience and 2 years in industry. He is currently working as a Professor and Head of the Department of Computer Science and Engineering at Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamil Nadu, India. His research interests include Data Mining, Databases, Artificial Intelligence, Software Engineering, etc. He has published several papers in 17 international journals and 43 international and national conferences.

Mr. M. Rajesh received the Bachelor of Engineering degree from Anna University, Tamil Nadu, India in 2007 and the Master of Engineering degree from Kongu Engineering College, India in 2012. He is currently doing his Ph.D. at Bharath University, Chennai. His research interests include resource provisioning in cloud computing.

Guarding Against Large-Scale Scrabble In Social Network

Geerthidevi K G, Dr. T. Senthil Prakash, and Prakadeswaran M.E.
Shree Venkateshwara Hi-Tech Engineering College, Gobi, India

Abstract: Generally, the botnet is one of the most dangerous threats in the network, harboring a number of attackers. The attacks include DDoS attacks, remote attacks, etc. Bots perform repetitive tasks automatically or on a schedule over the internet, tasks that would be too mundane or time-consuming for an actual person. However, botnets exhibit stealthy behavior and are very difficult to identify. These botnets have to be identified, the internet has to be protected, and the activity of botnets must be prevented in order to provide users a reliable service. Past botnet detection relied on a transaction process which is not secure. An efficient statistical data classifier is required to train the botnet prevention system. To provide the above features, clustering-based analysis is done. Our approach can detect and profile various P2P applications rather than identifying a specific P2P application. An anomaly-based detection technique is used to achieve this goal.

Keywords: Botnet, anomaly-based detection, hash function, DDoS

1. INTRODUCTION
A botnet is a collection of Internet-connected programs communicating with other similar programs in order to perform tasks. Botnets sometimes compromise computers whose security defenses have been breached and whose control has been conceded to a third party [1]. A botnet is remotely controlled by an attacker through a command and control (C&C) channel. Botnets serve as the infrastructure responsible for a variety of cyber-crimes, such as spamming, distributed denial-of-service (DDoS) attacks, identity theft, click fraud, etc. The C&C channel is an essential component of a botnet because botmasters rely on it to issue commands to their bots and receive information from the compromised machines.

Bots perform repetitive tasks automatically or on a schedule over the internet, tasks that would be too mundane or time-consuming for an actual person. Search engines use them to surf the web and methodically catalogue information from websites, trading sites make them look for the best bargains in seconds, and some websites and services employ them to deliver important information like weather conditions, news and sports, and currency exchange rates.

Unfortunately, not all bots roaming the internet are useful and harmless. Cyber crooks have also noticed their potential and have come up with malicious bots – programs designed to secretly install themselves on unprotected or vulnerable computers and carry out whatever actions they demand. And that could be anything from sending spam to participating in a distributed denial of service attack (DDoS) that brings down entire websites.

Once infected, your computer becomes part of a botnet – a network of infected or zombie computers controlled from a distance by a cybercriminal who rents it out to carry out his illegal plans. So not only is your computer infected and your internet security compromised, but your system resources and your bandwidth are rented out to the highest bidder to help them attack other unsuspecting users or even legitimate businesses. This huge potential for cybercrime makes these botnets what some security experts believe to be the most dangerous threat on the internet today.

Such networks comprising hundreds or thousands of infected devices have the resources needed to perform high-scale malicious actions such as: (1) mass-spam delivery that floods millions of inboxes in a matter of seconds; (2) DoS and DDoS attacks that crash entire websites and can put legitimate businesses in serious trouble; (3) brute-force hacking attacks that crack passwords and other internet security measures; (4) identity theft and internet fraud by collecting private information from infected users.

Bots can sneak up on you in many ways. They can use the vulnerabilities and outdated software in your system to infect it while you're casually surfing the web. They can be delivered by Trojans or questionable software you get tricked into downloading (like rogue antivirus programs). Or they can be sent directly to your inbox as an email attachment by spammers.

Botnets perform many malicious activities on the internet, like sending spam emails, increasing network traffic and even taking control of the system by running Trojans. But botnets exhibit stealthy behavior and are very difficult to identify. These botnets have to be identified and the internet has to be protected. The information shared on social media is sensitive and personal; hence the activity of botnets must be prevented to provide the users a reliable service.

To provide the above features, clustering-based analysis is done. Our approach can detect and profile various P2P applications rather than identifying a specific P2P application. An anomaly-based detection technique is used to achieve this goal.

2. RELATED WORKS
Many approaches to detect botnets have been proposed. For example, BotMiner [7] identifies a group of hosts as bots belonging to the same botnet if they share similar communication patterns and meanwhile perform similar malicious activities, such as scanning, spamming,

exploiting, etc. [4]. Unfortunately, the malicious activities may be stealthy and non-observable. An efficient statistical data classifier is required to train the botnet prevention system. Acquiring such information is a challenging task, thereby drastically limiting the practical use of these methods. Some of the older approaches involve content signatures, encryption, profiling, and fixed source ports. Our approach does not need any content signature, and our analysis can estimate the active time of a P2P application, which is critical for botnet detection.

3. SYSTEM DESIGN
A botmaster has to be designed with a P2P protocol. Therefore P2P bots exhibit some network traffic patterns that are common to other P2P client applications, either legitimate or malicious. Hence our system is divided into two phases. In the first phase, we aim at detecting all hosts within the monitored network that engage in P2P communications. We analyze raw traffic collected at the edge of the monitored network and apply a pre-filtering step to discard network flows that are unlikely to be generated by P2P applications [1]. We then analyze the remaining traffic and extract a number of statistical features to identify flows generated by P2P clients. In the second phase, our system analyzes the traffic generated by the P2P clients and classifies them into either legitimate P2P clients or P2P bots. Specifically, we investigate the active time of a P2P client and identify it as a candidate P2P bot if it is persistently active on the underlying host. We further analyze the overlap of peers contacted by two candidate P2P bots to finalize detection. After this anomaly-based analysis, the network has to be cleansed of the detected malware.

Fig 1: System architecture (network traffic passes through the traffic filter, then P2P client detection, then P2P bot detection, and finally revocation from malware)

3.1 Detecting P2P client
The traffic filter is used to sort out traffic that is unlikely to belong to P2P networks. In this first phase, fine-grained detection of P2P clients is implemented. This component is responsible for detecting P2P clients by analyzing the network flows remaining after the Traffic Filter component. For each host h within the monitored network we identify two flow sets, denoted Stcp(h) and Sudp(h), which contain the flows related to successful outgoing TCP and UDP [6] connections, respectively.

To identify flows corresponding to P2P control messages, we first apply a flow clustering process intended to group together similar flows for each candidate P2P node h. Given the sets of flows Stcp(h) and Sudp(h), we characterize each flow using a vector of statistical features v(h) = [Pkts, Pktr, Bytes, Byter], in which Pkts and Pktr represent the number of packets sent and received, and Bytes and Byter represent the number of bytes sent and received, respectively.

The distance between two flows is subsequently defined as the Euclidean distance of their two corresponding vectors. We then apply a clustering algorithm to partition the set of flows into a number of clusters. Each of the obtained clusters of flows, Cj(h), represents a group of flows with similar size.

Flows corresponding to ping/pong and peer-discovery messages share similar sizes, and hence they are grouped into two clusters (FC1 and FC2), respectively. Since the number of destination BGP prefixes involved in each cluster is large, we take FC1 and FC2 as fingerprint clusters. A fingerprint cluster summary, (Pkts, Pktr, Bytes, Byter, proto), represents the protocol and the average number of sent/received packets/bytes for all the flows in this fingerprint cluster. We implemented the flow analysis component and identified fingerprint clusters for the sample P2P traces, including two traces.
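As a minimal sketch of this flow-clustering step (not the authors' implementation), the snippet below groups flow feature vectors with k-means from scikit-learn and prints a fingerprint-style summary per cluster; the algorithm choice, the sample numbers and the cluster count are illustrative assumptions, since the text only says that a clustering algorithm is applied to the four statistical features.

    # Sketch of the flow-clustering step: group flows by the Euclidean
    # distance of their [Pkts, Pktr, Bytes, Byter] vectors. The use of
    # k-means and the sample values are assumptions for illustration.
    import numpy as np
    from sklearn.cluster import KMeans

    flows = np.array([
        [2, 2, 120, 132],     # ping/pong-sized flows
        [2, 2, 118, 130],
        [10, 9, 900, 1400],   # peer-discovery-sized flows
        [11, 9, 905, 1380],
    ])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(flows)

    # Summarize each fingerprint cluster as the average sent/received
    # packets/bytes of its member flows, as in (Pkts, Pktr, Bytes, Byter, proto).
    for c in range(2):
        members = flows[kmeans.labels_ == c]
        print("cluster", c, "summary:", members.mean(axis=0))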
3.2 Detecting P2P bots
To detect the bots, a coarse-grained detection method is used. Since bots are malicious programs used to perform profitable malicious activities, they represent valuable assets for the botmaster, who will intuitively try to maximize the utilization of his bots. This is particularly true for P2P bots [5], because in order to have a functional overlay network (the botnet), a sufficient number of peers needs to be always online. In other words, the active time of a bot should be comparable with the active time of the underlying compromised system.

The distance between each pair of hosts is computed. We apply hierarchical clustering and group together hosts according to the distance defined above. In practice, the hierarchical clustering algorithm will produce a dendrogram (a tree-like data structure). The dendrogram expresses the "relationship" between hosts: the closer two hosts are, the lower the level at which they are connected in the dendrogram. Two P2P bots in the same botnet should have a small distance and thus be connected at a lower level. In contrast, legitimate P2P applications tend to have large distances and consequently are connected at the upper level. We then classify hosts in dense clusters as P2P bots, and discard all other clusters and the related hosts, which we classify as legitimate P2P clients.
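The host-level clustering just described can be sketched with SciPy's hierarchical clustering as below; the host coordinates are placeholders (in the real system the distances would come from comparing fingerprint clusters and peer overlap), so this only illustrates the dendrogram-and-cut mechanics, not the paper's actual distance function.

    # Build a dendrogram over hosts and keep only dense, low-level clusters.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    hosts = np.array([
        [0.10, 0.20],   # two bots of the same botnet: small mutual distance
        [0.12, 0.19],
        [0.90, 0.80],   # legitimate P2P clients: large distances
        [0.40, 0.95],
    ])

    # linkage() produces the dendrogram (a tree-like structure); hosts that
    # are close get joined at a low height, as described in the text.
    tree = linkage(hosts, method="average", metric="euclidean")

    # Cutting the tree at a small height keeps only dense clusters; hosts in
    # them would be flagged as P2P bots, the rest as legitimate P2P clients.
    labels = fcluster(tree, t=0.2, criterion="distance")
    print(labels)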
4. SYSTEM IMPLEMENTATION
Out of the four components in our system, "Traffic Filter" and "Coarse-Grained Detection of P2P Bots" have linear complexity, since they need to scan flows only once to identify flows with destination addresses resolved from DNS queries or to calculate the active time. The other two components, "Fine-Grained Detection of P2P Clients" and "Fine-Grained Detection of P2P Bots", require pairwise comparison for distance calculation.

We use a two-step clustering approach to reduce the time complexity of fine-grained P2P client detection. For the first-step clustering, we use an efficient clustering algorithm to aggregate network flows into K sub-clusters, where each sub-cluster contains flows that are very similar to each other. For the second-step clustering, we investigate the global


distribution of the sub-clusters and further group similar sub-clusters into clusters.

The distance between two flows is defined as the Euclidean distance of their corresponding vectors, where each vector [Pkts, Pktr, Bytes, Byter] represents the number of packets/bytes that are sent/received in a flow.

For the second-step clustering, we use hierarchical clustering with Davies-Bouldin validation [8] to group sub-clusters into clusters. Each sub-cluster is represented using a vector ([Pkts, Pktr, Bytes, Byter]), which is essentially the average of all flow vectors in this sub-cluster.

Hierarchical clustering is used to build a dendrogram. Finally, Davies-Bouldin validation is employed to assess the global distribution of inter- and intra-cluster distances resulting from various clustering decisions and to yield the best cut of the dendrogram. The two-step clustering algorithm has a time complexity of O(nKI + K^2).
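A compact sketch of this two-step procedure is given below: k-means compresses the n flows into K sub-clusters (the O(nKI) step), hierarchical clustering then runs on only the K sub-cluster centers (the O(K^2) step), and scikit-learn's davies_bouldin_score stands in for the Davies-Bouldin validation of candidate dendrogram cuts. The data, K and the grid of cuts are assumptions for illustration.

    # Two-step clustering: k-means first, hierarchical clustering on the
    # K sub-cluster centers second, Davies-Bouldin score to pick the cut.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import davies_bouldin_score
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    flows = rng.normal(size=(1000, 4))      # n flow vectors [Pkts, Pktr, Bytes, Byter]

    K = 20
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(flows)  # step 1: O(nKI)
    sub_centers = km.cluster_centers_

    tree = linkage(sub_centers, method="average")                    # step 2: O(K^2)

    best_cut, best_score = None, np.inf
    for t in np.linspace(0.5, 3.0, 6):      # candidate dendrogram cuts
        labels = fcluster(tree, t=t, criterion="distance")
        if not 1 < len(set(labels)) < len(sub_centers):
            continue                        # the score needs 2..K-1 clusters
        score = davies_bouldin_score(sub_centers, labels)            # lower is better
        if score < best_score:
            best_cut, best_score = t, score
    print("best cut:", best_cut, "Davies-Bouldin score:", best_score)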
4.1 Modules
The goal of guarding the network against large-scale compromise is implemented by the following modules:

1. User interface design
2. Implementing the peer network
3. Botnet minor approach
4. Attacking model of malware
5. Revoking from malware

4.1.1 User Interface Design
The user interface enables effective operation and control of the machine on the user's side. The user interface module has login and registration phases. The registration phase gets details from the user and stores them in the database. It also checks whether the details are valid or not.

4.1.2 Implementing the peer network
The peer network is a decentralized network. Each node has a separate IP address and a separate port number. Each peer node stores a separate list of files, which resides in the global repository.

4.1.3 Botnet minor approach
The global repository contains the decentralized network details. The botnet minor stores and retrieves the information about port and IP details from the database. In the identification scenario the botnet minor is always visible; if there is any dispute in the identification scenario, the overall network may crash.

4.1.4 Attacking model of malware
The botnet minor contains all the details about the peer network and handles all the requests processed by the decentralized network. In the attack scenario, the botnet major spreads worm data to the peer network; any node connected to an attacked node also receives the worm data.

4.1.5 Revoking the network from malware
Data matching compares the raw data and the original data. The proposed technical approach can identify the worm data spread by the botnet, restore the original data in place of the worm data, and thereby identify the problem and revoke the botnet minor from the attacking model.

5. EXPERIMENTAL RESULTS
We prepared a data set (D) for evaluation. Specifically, we randomly selected half [8] of the P2P bots from NETbots. Then, for each of the 5 P2P applications we ran, we randomly selected one of its two traces from NETP2P and overlaid its traffic onto the traffic of a randomly selected host. We applied our detection system to data set D. The traffic filter drastically reduced the workload for the whole system. As indicated in Figure 4, it reduced the number of hosts subject to analysis by 67% (from 953 to 316) but retained all P2P clients.

Among the 26 P2P clients identified in the previous step, 25 exhibit persistent P2P behaviors. We further evaluate the similarity of fingerprint clusters and peer IPs for each pair of persistent P2P clients and derive a dendrogram.

If botmasters get to know about our detection algorithm, they could attempt to modify their bots' network behavior to evade detection. This situation is similar to evasion attacks against other intrusion detection systems.

6. CONCLUSION
To summarize, although our system greatly enhances and complements the capabilities of existing P2P botnet detection systems, it is not perfect. We should definitely strive to develop more robust defense techniques, and the aforementioned discussion outlines the potential improvements of our system.

In this paper, we presented a novel botnet detection system that is able to identify stealthy P2P botnets whose malicious activities may not be observable. To accomplish this task, we derive statistical fingerprints of the P2P communications to first detect P2P clients and then distinguish between those that are part of legitimate P2P networks (e.g., file-sharing networks) and P2P bots. We also identify the performance bottleneck of our system and optimize its scalability. The evaluation results demonstrated that the proposed system achieves high accuracy in detecting stealthy P2P bots and great scalability.


7. REFERENCES
[1] S. Stover, D. Dittrich, J. Hernandez, and S. Dietrich, "Analysis of the storm and nugache trojans: P2P is here," in Proc. USENIX, vol. 32, 2007, pp. 18-27.
[2] J. Zhang, R. Perdisci, W. Lee, X. Luo, and U. Sarfraz, "Building a scalable system for stealthy P2P-botnet detection," IEEE Transactions on Information Forensics and Security, vol. 9, no. 1, January 2014.
[3] P. Narang, S. Ray, C. Hota, and V. Venkatakrishnan, "PeerShark: Detecting peer-to-peer botnets by tracking conversations," in Proc. 2014 IEEE Security and Privacy Workshops.
[4] P. Porras, H. Saidi, and V. Yegneswaran, "A multi-perspective analysis of the storm (peacomm) worm," Comput. Sci. Lab., SRI Int., Menlo Park, CA, USA, Tech. Rep., 2007.
[5] P. Porras, H. Saidi, and V. Yegneswaran. (2009). Conficker C Analysis [Online]. Available: http://mtc.sri.com/Conficker/addendumC/index.html
[5] R. Lemos. (2006). Bot Software Looks to Improve Peerage [Online]. Available: http://www.securityfocus.com/news/11390
[6] G. Sinclair, C. Nunnery, and B. B. Kang, "The waledac protocol: The how and why," in Proc. 4th Int. Conf. Malicious Unwanted Softw., Oct. 2009, pp. 69-77.
[6] Y. Zhao, Y. Xie, F. Yu, Q. Ke, and Y. Yu, "Botgraph: Large scale spamming botnet detection," in Proc. 6th USENIX NSDI, 2009, pp. 1-14.
[7] G. Gu, R. Perdisci, J. Zhang, and W. Lee, "Botminer: Clustering analysis of network traffic for protocol- and structure-independent botnet detection," in Proc. USENIX Security, 2008, pp. 139-154.
[8] T.-F. Yen and M. K. Reiter, "Are your hosts trading or plotting? Telling P2P file-sharing and bots apart," in Proc. ICDCS, Jun. 2010, pp. 241-252.

Authors:

Ms. Geerthidevi K G, PG Scholar, is currently pursuing her M.E. (CSE) degree at Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamil Nadu, India. Her research interests include networking and network security.

Dr. T. Senthil Prakash received the Ph.D. degree from PRIST University, Thanjavur, India in 2013, the M.E. (CSE) degree from Vinayaka Mission's University, Salem, India in 2007, and the M.Phil., MCA and B.Sc. (CS) degrees from Bharathiar University, Coimbatore, India, in 2000, 2003 and 2006 respectively, all in Computer Science and Engineering. He is a member of ISTE New Delhi, India; IAENG, Hong Kong; IACSIT, Singapore; and SDIWC, USA. He has 10+ years of teaching experience and 2 years of industry experience. He is currently working as Professor and Head of the Department of Computer Science and Engineering at Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamil Nadu, India. His research interests include data mining, databases, artificial intelligence and software engineering. He has published several papers in 17 international journals and 43 international and national conferences.

Mr. S. Prakadeswaran received the Bachelor of Engineering from Anna University, Chennai, Tamil Nadu in 2008 and the Master of Engineering from Anna University, Chennai, Tamil Nadu in 2013. He has 6+ years of teaching experience. He is currently working as Assistant Professor at Shree Venkateshwara Hi-Tech Engineering College, Gobichettipalayam, Tamil Nadu. His research interests include wireless networks and pervasive computing. He has published several papers in 4 international journals.


A Secure, Scalable, Flexible and Fine-Grained Access Control Using Hierarchical Attribute-Set-Based Encryption (HASBE) in Cloud Computing
Prashant A. Kadam Avinash S. Devare
Department of Computer Engineering Department of Computer Engineering
JSPM Narhe Technical Campus JSPM Narhe Technical Campus
Pune, India Pune, India

Abstract: Cloud computing is becoming a very popular technology in IT enterprises. For any enterprise, the data stored is huge and invaluable. Since all tasks are performed through the network, it has become vital to ensure the secure use of legitimate data. In cloud computing, the most important concerns are data security and privacy, with flexibility, scalability and fine-grained access control of data being the other requirements to be maintained by cloud systems. Access control is one of the prominent research topics, and various schemes have been proposed and implemented. However, most of them do not provide flexibility, scalability and fine-grained access control of the data on the cloud. In order to address these issues for remotely stored data on the cloud, we have proposed hierarchical attribute-set-based encryption (HASBE), an extension of attribute-set-based encryption (ASBE) with a hierarchical structure of users. The proposed scheme achieves scalability by delegating authority to the appropriate entity in the hierarchical structure, and inherits flexibility by allowing easy transfer of and access to the data in case of a location switch. It provides fine-grained access control of data by showing only the requested and authorized details to the user, thus improving the performance of the system. In addition, it provides efficient user revocation within an expiration time, requests to view extra attributes, and privacy within the intra-level hierarchy. The scheme is implemented to show that it is efficient in access control as well as security of data stored on the cloud, with comprehensive experiments.

Keywords: Fine-grained access control, attribute-set-based encryption, hierarchical attribute-set-based encryption.

1. INTRODUCTION
Cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the data centers that provide those services. The services themselves have long been referred to as Software as a Service (SaaS). The datacenter hardware and software is what we will call a cloud.

Cloud computing is a web-based model that provides computation, software, infrastructure, platform, devices and other resources to users on a pay-as-you-use basis. Clients can use cloud services without any installation, and the data uploaded to the cloud is accessible from anywhere in the world; the only requirement is a computer with an active internet connection. As customizable computing resources and a huge amount of storage space are provided by internet-based online services, the shift to online storage has contributed greatly to eliminating the overhead of local machines in the storage and maintenance of data. The cloud provides a number of benefits like flexibility, disaster management and recovery, pay-per-use and an easy-to-access-and-use model, which contribute to the reasons for switching to the cloud. The cloud provides for storage of users' important data and thus helps to free up space on the local disk.

Cloud computing has emerged as one of the most influential paradigms in the IT industry. Almost all companies and organizations store their valuable data on the cloud and access it there. Due to this, security of the cloud is a major concern, as are flexibility and scalability of data stored on the cloud, which are system performance parameters that degrade system response time and must be handled by cloud systems. Cloud systems should provide a secure environment and maintenance of data in a hierarchy.

The prominent security concern is data storage security and privacy in cloud computing, due to its Internet-based data storage and management. The data security issue becomes vital when the data is confidential. In cloud computing, users have to give up their data to the cloud service provider for storage and business operations, while the cloud service provider is usually a commercial enterprise which cannot be totally trusted. So the integrity and privacy of the data are at risk.

Flexible and fine-grained access control is strongly desired in the service-oriented cloud computing model. Various schemes which provide access control models have been proposed, but the problem with these schemes is that they are limited to data owners and service providers that exist in the same trusted domain.


2. EXISTING SYSTEM
2.1 Vipul et al., "(ABE) Attribute based encryption" [1]
As more sensitive data is shared and stored by third-party sites on the Internet, there will be a need to encrypt data stored at these sites. One drawback of encrypting data is that it can be selectively shared only at a coarse-grained level (i.e., giving another party your private key). We develop a new cryptosystem for fine-grained sharing of encrypted data that we call Key-Policy Attribute-Based Encryption (KP-ABE). In our cryptosystem, ciphertexts are labeled with sets of attributes and private keys are associated with access structures that control which ciphertexts a user is able to decrypt. We demonstrate the applicability of our construction to the sharing of audit-log information and to broadcast encryption. Our construction supports delegation of private keys, which subsumes Hierarchical Identity-Based Encryption (HIBE).

2.2 Rakesh et al., "Attribute-Sets: A Practically Motivated Enhancement to Attribute-Based Encryption", University of Illinois at Urbana-Champaign, July 27, 2009 [2]
In distributed systems users need to share sensitive objects with others based on the recipients' ability to satisfy a policy. Attribute-Based Encryption (ABE) is a new paradigm where such policies are specified and cryptographically enforced in the encryption algorithm itself. Ciphertext-Policy ABE (CP-ABE) is a form of ABE where policies are associated with encrypted data and attributes are associated with keys. In this work we focus on improving the flexibility of representing user attributes in keys. Specifically, we propose Ciphertext-Policy Attribute-Set-Based Encryption (CP-ASBE) - a new form of CP-ABE which, unlike existing CP-ABE schemes that represent user attributes as a monolithic set in keys, organizes user attributes into a recursive set-based structure and allows users to impose dynamic constraints on how those attributes may be combined to satisfy a policy. We show that the proposed scheme is more versatile and supports many practical scenarios more naturally and efficiently. We provide a prototype implementation of our scheme and evaluate its performance overhead.

2.3 Pankaj et al., "Cloud Computing Security Issues in Infrastructure as a Service", 2012 [3]
Cloud computing is the current buzzword in the market. It is a paradigm in which resources can be leveraged on a per-use basis, thus reducing the cost and complexity of service providers. Cloud computing promises to cut operational and capital costs and, more importantly, it lets IT departments focus on strategic projects instead of keeping datacenters running. It is much more than the simple internet. It is a construct that allows users to access applications that actually reside at a location other than the user's own computer or other Internet-connected device. There are numerous benefits of this construct. For instance, another company hosts the user application. This implies that they handle the cost of servers, they manage software updates, and, depending on the contract, the user pays less, i.e., for the service only. Confidentiality, integrity, availability, authenticity, and privacy are essential concerns for both cloud providers and consumers as well. Infrastructure as a Service (IaaS) serves as the foundation layer for the other delivery models, and a lack of security in this layer will certainly affect the other delivery models, i.e., PaaS and SaaS, that are built upon the IaaS layer. This paper presents an elaborate study of IaaS components' security and determines vulnerabilities and countermeasures. The Service Level Agreement should also be given great importance.

2.4 John et al., "(CP-ABE) Ciphertext-Policy Attribute-Based Encryption" [4]
In several distributed systems a user should only be able to access data if the user possesses a certain set of credentials or attributes. Currently, the only method for enforcing such policies is to employ a trusted server to store the data and mediate access control. However, if any server storing the data is compromised, then the confidentiality of the data will be compromised. In this paper we present a system for realizing complex access control on encrypted data that we call Ciphertext-Policy Attribute-Based Encryption. By using our techniques encrypted data can be kept confidential even if the storage server is not trusted; moreover, our methods are secure against collusion attacks. Previous Attribute-Based Encryption systems used attributes to describe the encrypted data and built


policies into users' keys, while in our system attributes are used to describe a user's credentials, and a party encrypting data determines a policy for who can decrypt. Thus, our methods are conceptually closer to traditional access control methods such as Role-Based Access Control (RBAC). In addition, we provide an implementation of our system and give performance measurements.

2.5 Ayad et al., "Enabling Data Dynamic and Indirect Mutual Trust for Cloud Computing Storage System", 2012 [6]
In this paper, we propose a cloud-based storage scheme that allows the data owner to benefit from the facilities offered by the CSP and enables indirect mutual trust between them. The proposed scheme has four important features: (i) it allows the owner to outsource sensitive data to a CSP and perform full block-level dynamic operations on the outsourced data, i.e., block modification, insertion, deletion, and append; (ii) it ensures that authorized users (i.e., those who have the right to access the owner's file) receive the latest version of the outsourced data; (iii) it enables indirect mutual trust between the owner and the CSP; and (iv) it allows the owner to grant or revoke access to the outsourced data. We discuss the security issues of the proposed scheme. Besides, we justify its performance through theoretical analysis and experimental evaluation of storage, communication, and computation overheads.

2.6 Guojun et al., "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers", 2011 [8]
With the rapid development of cloud computing, more and more enterprises will outsource their sensitive data for sharing in a cloud. To keep the shared data confidential against untrusted cloud service providers (CSPs), a natural way is to store only the encrypted data in a cloud. The key problems of this approach include establishing access control for the encrypted data, and revoking the access rights from users when they are no longer authorized to access the encrypted data. This paper aims to solve both problems. First, we propose a hierarchical attribute-based encryption scheme (HABE) by combining a hierarchical identity-based encryption (HIBE) system and a ciphertext-policy attribute-based encryption (CP-ABE) system, so as to provide not only fine-grained access control, but also full delegation and high performance. Then, we propose a scalable revocation scheme by applying proxy re-encryption (PRE) and lazy re-encryption (LRE) to the HABE scheme, so as to efficiently revoke access rights from users.

2.7 Qin et al., "Hierarchical Attribute-Based Encryption for Fine-Grained Access Control in Cloud Storage Services" [9]
Cloud computing, as an emerging computing paradigm, enables users to remotely store their data in a cloud so as to enjoy scalable services on demand. Especially for small and medium-sized enterprises with limited budgets, they can achieve cost savings and productivity enhancements by using cloud-based services to manage projects, to make collaborations, and the like. However, allowing cloud service providers (CSPs), which are not in the same trusted domains as enterprise users, to take care of confidential data may raise potential security and privacy issues. To keep the sensitive user data confidential against untrusted CSPs, a natural way is to apply cryptographic approaches, by disclosing decryption keys only to authorized users. However, when enterprise users outsource confidential data for sharing on cloud servers, the adopted encryption system should not only support fine-grained access control, but also provide high performance, full delegation, and scalability, so as to best serve the needs of accessing data anytime and anywhere, delegating within enterprises, and achieving a dynamic set of users. In this paper, we propose a scheme to help enterprises efficiently share confidential data on cloud servers. We achieve this goal by first combining the hierarchical identity-based encryption (HIBE) system and the ciphertext-policy attribute-based encryption (CP-ABE) system, then making a performance-expressivity tradeoff, and finally applying proxy re-encryption and lazy re-encryption to our scheme.

2.8 Patrick et al., "Methods and Limitations of Security Policy Reconciliation" [10]
A security policy is a means by which participant session requirements are specified. However, existing frameworks provide limited facilities for the automated reconciliation of


participant policies. This paper considers the limits and methods of reconciliation in a general-purpose policy model. We identify an algorithm for efficient two-policy reconciliation and show that, in the worst case, reconciliation of three or more policies is intractable. Further, we suggest efficient heuristics for the detection and resolution of intractable reconciliation. Based upon the policy model, we describe the design and implementation of the Ismene policy language. The expressiveness of Ismene, and indirectly of our model, is demonstrated through the representation and exposition of policies supported by existing policy languages. We conclude with brief notes on the integration and enforcement of Ismene policy within Antigone.

3. PROPOSED SYSTEM
In our proposed system, instead of showing the complete data from the cloud, we fetch only the data which is essential for that user. Since we do not fetch all the data, fetching takes less time, so the system response time is very low and system performance increases. We perform encryption before storing data, so even if the data is obtained by a hacker it cannot be easily understood. We employ a hierarchical structure, so even if a lower authority is absent for particular days, a higher authority can handle all the work of the lower authority and the work of the company will not be stopped. The HASBE scheme realizes scalable, flexible and fine-grained access control in cloud computing. The HASBE scheme seamlessly incorporates a hierarchical structure of system users by applying a delegation algorithm to ASBE. HASBE not only supports compound attributes due to flexible attribute set combinations, but also achieves efficient user revocation because of multiple value assignments of attributes. We formally proved the security of HASBE based on the security of CP-ABE. Finally, we completed a detailed analysis of the proposed scheme and conducted comprehensive performance analysis and evaluation, which showed its efficiency and advantages over existing schemes.

3.1 Project Scope
1. This system is designed to provide security to data stored on the cloud and to improve system performance by showing only the details requested by an employee.
2. Security is provided by generating a secret key from the various attributes stated in the form which is filled in by the employee at the time of registration.
3. This system is designed to provide flexibility of the data: in case of transfer of an employee, his data can be transferred to the respective location with ease.
4. It also provides scalability: when an employee is absent, his work can be handled by a senior employee securely.

Figure 1. General Architecture of the System

3.1 Methodology
1. Registration and login by user:
The user fills in his/her complete data. A request is sent to the CEO for confirmation. The CEO confirms the request and assigns attributes and a time period for that user. Once the account is confirmed, a password and key are sent to the user by email so he/she can access the account.

2. Approve user and assign attributes:
Out of the selected attributes, the attribute visibility access is decided according to the roles defined in the hierarchy of the system. Each attribute is encrypted.

3. Key generation and verification:
A key is generated based on the attributes filled in by the user in the registration form (a rough sketch of this step appears after this list). In attribute key verification, when a key is used for login, it is first checked against the key stored in the database. If a match is found then the user is allowed to proceed; otherwise the user is rejected.

4. Encryption and decryption of data:
The user fills in his/her data during registration. Once the submit button is clicked, the data is sent to the encryption algorithms, RSA and AES. After encryption the data is stored in encrypted form in the database.

5. Access right:
The user can view the selected attributes of the same level as well as other levels according to the access authority, using the attribute key.
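A rough sketch of the key generation and verification steps is given below; deriving the key as a SHA-256 hash over the sorted attributes is an illustrative assumption, since the paper does not specify the derivation.

    # Hypothetical attribute-key derivation and verification (the hashing
    # scheme is an assumption; the paper does not define it).
    import hashlib

    def attribute_key(attributes: dict) -> str:
        # Sort so that the same attribute set always yields the same key.
        canonical = "|".join(f"{k}={v}" for k, v in sorted(attributes.items()))
        return hashlib.sha256(canonical.encode()).hexdigest()

    stored = attribute_key({"name": "Alice", "dept": "CSE", "branch": "Pune"})

    def verify(presented_key: str) -> bool:
        # Login proceeds only if the presented key matches the stored one.
        return presented_key == stored

    print(verify(stored))     # True
    print(verify("0" * 64))   # False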


6. Fine-grained access:
In our proposed system, instead of showing the complete data, only the necessary data is fetched. Due to this, the system provides a quick response time.

7. Request for extra attributes:
The user can access attributes of the same level as his inter-level counterparts. He can request extra attributes in case of emergency as well as for ease of work.

8. Flexibility:
Suppose a user is transferred from one location to another, and at the new location that user's data is not accessible; the authority then requests access to that user's data from the old location. Once the request is granted, the data becomes accessible from the new location and is no longer visible to the old location.

9. Scalability:
We employ a hierarchical structure, so even if a lower authority is absent for particular days, a higher authority handles all the work of the lower authority, so the work of the company will not be stopped.

10. Efficient user revocation:
It is done in two steps: a request to the admin, and a response to the user from the admin within the expiration time.

11. Privacy:
By default attributes are public, but a user can set intra-level privacy by restricting access to attributes.

3.2 Process Summary
The following processes are involved in the project:

1. Encrypt data before insert:
After the user clicks the submit button, the data is encrypted using the RSA and AES algorithms. Once encrypted, it is stored in the database; when the user wants to retrieve the data, it is decrypted again and shown in its original form.

2. Request for new attributes:
When a lower authority is absent, a higher authority may temporarily handle two sets of attributes: its own attributes and the attributes of the lower authority who is absent for that time period. A user can also request a new attribute if needed.

3. Getting information of another user:
When a user is transferred from one location to another, the new location does not initially have the right to access that user's data, so a grant for accessing the data is obtained. Once the user's data is accessible from the new location, it can no longer be accessed from the old location.

3.3 System Functional Features
The cloud server provides six main functions to the user.

1. Fine-grained access:
In our proposed system, instead of showing the complete data, only the necessary data is fetched. Due to this, the system provides a quick response time.

2. Scalability:
We employ a hierarchical structure, so even if a lower authority is absent for particular days, a higher authority handles all the work of the lower authority, so the work of the company will not be stopped.

3. Flexibility:
When an employee gets transferred, his data is made accessible to the branch to which he is transferred and not to the older branch. The data is transferred safely on the request of the CEO. Hence data can be transferred easily between branches.

4. Encryption:
Encryption is a process in which data is hidden in a way that makes it accessible to the authorized user only. In this system we provide encryption (conversion into unreadable form) so that the data is not accessible to any illegal user such as a hacker.

5. Decryption:
Decryption is a process in which encrypted data, i.e. the unreadable format, is converted back into readable format.

6. Key generation and verification:
A key is generated based on the attributes filled in by the user in the registration form. In attribute key verification, when a key is used for login, it is first checked against the key stored in the database. If a match is found then the user is allowed to proceed; otherwise the user is rejected.

4. ALGORITHM AND MATHEMATICAL MODEL
4.1 Algorithm
4.1.1 RSA (Rivest Shamir Adleman)
Key generation
RSA involves a public key and a private key. The public key can be known by everyone and is used for encrypting


messages. Messages encrypted with the public key can only be decrypted in a reasonable amount of time using the private key.

Encryption
User1 transmits her public key (n, e) to User2 and keeps the private key secret. User2 then wishes to send message M to User1. He first turns M into an integer m, such that 0 <= m < n, by using an agreed-upon reversible protocol known as a padding scheme. He then computes the ciphertext c corresponding to

c = m^e mod n

This can be done quickly using the method of exponentiation by squaring. User2 then transmits c to User1. Note that at least nine values of m will yield a ciphertext c equal to m, but this is very unlikely to occur in practice.

Decryption
User1 can recover m from c by using her private key exponent d:

m = c^d mod n

Given m, she can recover the original message M by reversing the padding scheme.
4.1.2 Advanced Encryption Standard (AES) Algorithm
The AES algorithm is also used, to improve the searching and access mechanism.
4.2 Mathematical Model
We use an NP-complete formulation because it gives output within a fixed interval of time.

Set Theory Analysis

A] Identify the employees:
E = {e1, e2, e3, ...}
where E is the main set of employees e1, e2, e3, ...

B] Identify the attributes:
AT = {at1, at2, at3, ...}
where AT is the main set of registered attributes at1, at2, at3, ...

C] Identify the employees who requested another attribute:
RAA = {raa1, raa2, raa3}
where RAA is the main set of requests for another attribute raa1, raa2, raa3.

D] Identify the employees who requested another employee's information:
REI = {rei1, rei2, rei3}
where REI is the main set of requests for another employee's information rei1, rei2, rei3.

E] Identify the attribute keys of new employees:
AK = {ak1, ak2, ak3, ...}
where AK is the main set of attribute keys of users ak1, ak2, ak3, ...

F] Identify the processes as P:
P = {set of processes}
P = {P1, P2, P3, P4, ...}
where P1 = {e1, e2, e3},
{e1 = upload data on server}
{e2 = make the entry in the database using different encryption algorithms}
{e3 = get new attribute after request}
{e4 = get new employee information when an employee gets transferred}

G] Identify failure cases as FL:
Failure occurs when:
FL = {F1, F2, F3, ...}
a) F1 = {f | 'f' if error in uploading due to interrupted Internet connection}

H] Identify success cases as SS:
Success is defined as:
SS = {S1, S2, S3, S4}


a) S1 = {s | 's' if fast and uninterrupted Internet connection}
b) S2 = {s | 's' if data is added into the database}
c) S3 = {s | 's' if data is retrieved from the database}

I] Initial conditions as I0:
a) The user has a good internet connection.
b) The admin has a good internet connection.

H is the universal set, i.e. the cloud:
H = {E, B, U, R}
E = employee set
B = attribute set
U = user set
R = registered

INITIAL STATE:
U = {R, UR}
R = registered user
UR = unregistered user

INTERMEDIATE STATE:
A] Identify the employees:
E = {e1, e2, e3, ...}
where E is the main set of employees e1, e2, e3, ...

B] Identify the attributes:
B = {at1, at2, at3, ...}
where B is the main set of registered attributes at1, at2, at3, ...

C] Identify the employees who requested another attribute:
A = {raa1, raa2, raa3}
where A is the main set of requests for another attribute raa1, raa2, raa3.

Request for a new attribute:
A = request for new attribute
B = contains all the attributes
R = provide requested attribute
S1 = A B

Hierarchy:
H = {H1, H2, H3, H4}
where
H is the cloud,
H1 is the CEO,
H2 is the general manager,


H3 is the list of managers,
H4 is the list of employees.

Scalability:
H = {H1, H2, H3, H4}
U = {H1, H2}
U' = {H3, H4}
U = present users
U' = absent users

FLEXIBILITY:
H = {C1, C2, C3}
where
C1 is the old branch of the company where the employee worked before transfer,
C2 is the employee being transferred,
C3 is the new branch where the employee got transferred to.

S2 is the rule that employee data should be accessible to the new branch only, not the old branch (a toy evaluation of this rule appears after the final state below):
S2 = (C1 - C2) ∪ C3

FINAL STATE:
Identify the processes as P:
P = {set of processes}
P = {P1, P2, P3, P4, ...}
where P1 = {S1, S2, S3},
{S1 = get new attribute after request}
{S2 = get new employee information when an employee gets transferred}
{S3 = get access of lower authority}
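To make the flexibility rule concrete, the toy snippet below evaluates S2 = (C1 - C2) ∪ C3 with Python sets, reading each set as a collection of data records; the record names are invented purely for illustration.

    # Toy evaluation of S2 = (C1 - C2) U C3 with Python sets.
    C1 = {"alice_record", "bob_record"}   # data at the old branch
    C2 = {"bob_record"}                   # the transferring employee's data
    C3 = {"carol_record"}                 # data already at the new branch

    S2 = (C1 - C2) | C3                   # post-transfer accessible view
    print(S2)                             # {'alice_record', 'carol_record'}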


5. CONCLUSION
Our system efficiently provides fine-grained access control with flexibility and scalability through a hierarchical structure in our HASBE system. Our work also provides security to the users against outsiders and intruders by implementing session hijacking and session fixation protection along with SQL injection attack prevention. The core is, of course, cloud-based, giving us a choice of multi-user access together with security from intruder attacks. Hence we benefit the users with attack handling and many advantages over the existing systems.

6. REFERENCES
[1] Vipul et al., "(ABE) Attribute based encryption".
[2] Rakesh et al., "Attribute-Sets: A Practically Motivated Enhancement to Attribute-Based Encryption", University of Illinois at Urbana-Champaign, July 27, 2009.
[3] Pankaj et al., "Cloud Computing Security Issues in Infrastructure as a Service", 2012.
[4] John et al., "(CP-ABE) Ciphertext-Policy Attribute-Based Encryption".
[5] Suhair et al., "Designing a Secure Cloud-Based EHR System using Ciphertext-Policy Attribute-Based Encryption", 2011.
[6] Ayad et al., "Enabling Data Dynamic and Indirect Mutual Trust for Cloud Computing Storage System", 2012.
[7] Chandana et al., "GASBE: A Graded Attribute-Based Solution For Access Control In Cloud Computing", 2011.
[8] Guojun et al., "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers", 2011.
[9] Qin et al., "Hierarchical Attribute-Based Encryption for Fine-Grained Access Control in Cloud Storage Services".
[10] Patrick et al., "Methods and Limitations of Security Policy Reconciliation".
[11] http://searchwindowsserver.techtarget.com/definition/IIS
[12] http://en.wikipedia.org/wiki/Microsoft_Visual_Studio
[13] http://en.wikipedia.org/wiki/.NET_Framework


CMS Website Security Threat Protection Oriented Analyzer System
Pritesh Taral Balasaheb Gite
Department of Computer Engineering Department of Computer Engineering
Sinhagad Academy of Engineering, Sinhagad Academy of Engineering
Kondhwa (University of Pune) Kondhwa (University of Pune)
Pune, Maharashtra, India Pune, Maharashtra, India

Abstract - Website security is a critical issue that needs to be considered on the web in order to run your online business healthily and smoothly. It is a very difficult situation when the security of a website is compromised because a brute-force or other kind of attacker attacks your web creation. It not only consumes all your resources but creates heavy log dumps on the server, which cause your website to stop working.

Recent studies have suggested backup and recovery modules that should be installed into your website, which can take timely backups of your website to third-party servers that are not within the attacker's reach. These studies also suggested different types of recovery methods such as incremental backups, decremental backups, differential backups and remote backup.

Moreover, these studies suggested that Rsync be used to reduce the transferred data efficiently. The experimental results show that a remote backup and recovery system can work fast and meet the requirements of website protection. An automatic backup and recovery system for a website not only plays an important role in the web defence system but is also the last line for disaster recovery.

This paper suggests different kinds of approaches that can be incorporated into the WordPress CMS to make it healthy, secure and prepared for web attacks. The paper discusses various possible attacks that can be made on a CMS and some of the possible solutions as well as preventive mechanisms.

Some of the proposed security measures are:
1. Secret login screen
2. Blocking bad bots
3. Changing DB prefixes
4. Protecting configuration files
5. Two-factor security
6. Flight mode in web servers
7. Protecting the .htaccess file itself
8. Detecting vulnerabilities
9. Checking for unauthorized access made to the system

However, this is to be done by balancing the trade-off between website security and the backup and recovery modules of a website, as measures taken to secure a web page should not affect the user's experience or the recovery modules.

Keywords - WordPress, Rsync, Web Security

1. INTRODUCTION
As the WWW becomes more and more complex, many challenges related to the security of webpages are arising. Website security is the most important part of the post-development phase of web creation. A web publisher needs to make regular check-ups and audits of the website to avoid unexpected surprises. A website should be ready to withstand any attack made on it. Moreover, the website should not affect the user's experience and revenue by compromising security.

It becomes a difficult situation when the security of a website is compromised because a brute-force attacker attacks your creation. The attacker tries different permutations of passwords and usernames and, in doing so, consumes all your resources and creates heavy log dumps on the server, which cause your website to stop working. Sometimes an attacker might get access to your website by injecting code through open areas of the webpages, such as a comment box or any text field that is processed on the server side through PHP or any server-side scripting language. During holidays, when you don't have access to the administrator panel, you can put your website's admin


panel into sleep mode so that no one can attack your login page.

Some of the proposed security measures are as follows:
1. Security for user accounts
2. Security for the login module
3. Security while registering a user
4. Security related to the database module
5. .htaccess and configuration file backup and restore
6. Functionality to blacklist and whitelist
7. Firewall protection and prevention of brute-force login attacks
8. Whois lookup and security checker
9. Security against comment spam
10. Disabling access to source code and selection of text in the UI

Backups can be taken using different approaches such as incremental backup, selective backup and complete backup, and the user can also recover from a hacking attack by using the restore mechanism, which will restore the system to a previous working state. A backup can be a complete database backup.

This paper basically deals with the mechanisms mentioned above to secure a website from bad bots and hackers and to keep your server healthy by removing possible security threats. The paper also presents different backup and restore mechanisms.

2. RELATED WORK
There have been extensive efforts to understand web security by considering network traffic, encryption techniques, etc. But very few efforts have been made to understand the security needs of a CMS and the techniques to deal with them. Some of the important work related to this study is as follows:

A web site protection oriented remote backup and recovery method:
He Qian, Guo Yafeng and Wang Yong describe how incremental and decremental backups of a website can be taken, to be used to recover the site during a disaster [1].

Website Regional Rapid Security Detection Method:
Yong Fang and Liang Liu suggested that, with a distributed design, a website regional rapid security detection method can conduct security detection for a whole region by adding detection modules dynamically as needed and recording the results of detection [2].

Research and solution of existing security problems in current internet website system:
This work analyses the common threats faced by network systems and the typical attack methods and means, and sums up a website security system that addresses these problems and formulates the corresponding protection measures [3].

4. SECURITY MEASURES
Security for user accounts
Sometimes a CMS might have a user account with the default user name 'admin', which makes it easier for an attacker to predict and attack or query your CMS. It is considered bad security practice, as it makes the attacker's task 50% easier: the attacker already knows one of the credentials required to log in. Besides this, a password strength tool can be used to help you create very strong passwords.

Security for the login module
This protects the CMS against brute-force login attacks with the login lockdown feature, so that users within a certain IP range can be locked out of the system for a predetermined amount of time based on the configuration settings. It also forces logout of all users after a configured period of time.
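An illustrative sketch of such a login lockdown is given below; the thresholds, the in-memory storage and the function names are assumptions (a real WordPress plugin would persist this state in the database), so treat it as a sketch of the idea rather than any plugin's implementation.

    # Lock an IP out for a configured period after too many failed logins.
    import time
    from collections import defaultdict

    MAX_ATTEMPTS = 5
    LOCKOUT_SECONDS = 15 * 60

    failed = defaultdict(list)          # ip -> timestamps of failed logins

    def allow_login(ip: str) -> bool:
        now = time.time()
        # Keep only the failures that fall inside the lockout window.
        failed[ip] = [t for t in failed[ip] if now - t < LOCKOUT_SECONDS]
        return len(failed[ip]) < MAX_ATTEMPTS

    def record_failure(ip: str) -> None:
        failed[ip].append(time.time())

    # After 5 failures the IP is locked out until the window expires.
    for _ in range(5):
        record_failure("203.0.113.7")
    print(allow_login("203.0.113.7"))   # False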
1. Security while registering a user
Enabling a manual approval feature for registered accounts can minimize spam and bogus registrations. A captcha can also help prove that the user is valid.

2. Security related to the database module
Table prefixes can be changed from the defaults to raise the security level against the attacker: the attacker cannot easily predict the table prefix.

3. .htaccess and configuration file backup and restore
Configuration files which are needed to run the website should be protected from attacks. The .htaccess file is the main file which provides security to the other CMS modules.

4. Functionality to blacklist and whitelist
This is used to blacklist and whitelist the IP addresses of web surfers. It is recommended to identify search engine bots and spam bots.

4. ANALYZE AND SUGGEST TOOL
The analyze and suggest tool is used to scan the CMS website to check for possible threats inside the system. It then analyzes the website, generates security


reports and suggests some possible solutions, and also provides the option to incorporate them into the current CMS system.

Figure 1: General Architecture of the System

5. CONCLUSION
CMS security is quite different from the traditional notions of website security. A CMS has a predefined structure and is used by millions of people to create websites. This fact makes the attacker's task easy, as he already knows the predefined structure of the CMS. Our concept would modify the traditional CMS structure into a new customized CMS so that the structure of the system does not remain the default. Thus it becomes difficult for an attacker to predict the DB and configuration structure of the CMS, which would eventually boost the security level of the CMS.

6. REFERENCES
[1] He Qian, Guo Yafeng, Wang Yong, Qiang Baohua, "A web site protection oriented remote backup and recovery method", INSPEC Accession Number: 14022497, 2014 IEEE.
[2] Yong Fang, Liang Liu, "Website Regional Rapid Security Detection Method", 978-1-4799-0587-4, 2013 IEEE.
[3] Gaoqi Wei, "Research and solution of existing security problems in current internet website system", 978-1-4244-2584-6, 2008 IEEE.
[4] Wenping Liu, Xiaoying Wang, Li Jin, "Design and implementation of a website security monitoring system from users' perspective", 978-0-7695-4817-3/12, 2012 IEEE.

Pritesh A. Taral received the B.E. degree in Computer Engineering from S.A.O.E. Pune, India in 2011 and is pursuing the M.E. degree in Computer Engineering at S.A.O.E., Pune.

Prof. Balasaheb B. Gite is working as Head of the Department of Computer Engineering at S.A.O.E. Pune, India. He received the B.E. degree in Computer Engineering from P.R.E.C. Loni, India and the M.E. degree from W.C.E., Pune.

Sentence Validation by Statistical Language Modeling and Semantic Relations
Lakshay Arya
Guru Gobind Singh Indraprastha University
Maharaja Surajmal Institute Of Technology
New Delhi, India

Abstract: This paper deals with sentence validation, a sub-field of Natural Language Processing. It finds applications in many different areas, as it deals with understanding natural language (English in most cases) and manipulating it. So the effort is on understanding and extracting the important information delivered to the computer, to make efficient human-computer interaction possible. Sentence validation is approached in two ways: a statistical approach and a semantic approach. In both approaches the database is trained with the help of sample sentences from the Brown corpus of NLTK. The statistical approach uses a trigram technique based on the N-gram Markov model and modified Kneser-Ney smoothing to handle zero probabilities. As another statistical test, tagging and chunking of the sentences containing named entities is carried out using pre-defined grammar rules and semantic tree parsing, and the chunked-off sentences are fed into another database, upon which testing is carried out. Finally, semantic analysis is carried out by extracting entity-relation pairs, which are then tested. After the results of all three approaches are compiled, graphs are plotted and the variations are studied. Hence, a comparison of three different models is calculated and formulated. Graphs pertaining to the probabilities of the three approaches are plotted, which clearly demarcate them and throw light on the findings of the project.

Keywords: language modeling, smoothing, chunking, statistical, semantic

1. INTRODUCTION
NLP is a field of computer science and linguistics concerned with interactions between computers and human languages. NLP is referred to as an AI-complete problem. Research into modern statistical NLP algorithms requires understanding of various disparate fields like linguistics, computer science, statistics, linear algebra and optimization theory.

To understand NLP, we have to keep in mind that we have several types of languages today: natural languages such as English or Hindi, descriptive languages such as DNA or chemical formulas, and artificial languages such as Java or Python. We define a natural language as the set of all possible texts, wherein each text is composed of a sequence of words from the respective vocabulary. In essence, a vocabulary consists of the set of possible words allowed in that language. NLP works on several layers of language: phonology, morphology, lexical, syntactic, semantic, discourse, pragmatic, etc. Sentence validation finds its applications in almost all fields of NLP - information retrieval, information extraction, question answering, visualization, data mining, text summarization, text categorization, machine and language translation, dialogue and speech based systems, and many others one can think of.

Statistical analysis of data is the most popular method for applications aiming at validating sentences. N-gram techniques make use of the Markov model. For convenience, we restrict our study to trigrams, which are preceded by bigrams. Results of this approach are compared with the results of the Chunked-Off Markov Model. Extending our study and moving towards semantic analysis, we find the entity-relation pairs from the chunked-off bigrams and trigrams. Finally, we calculate the results for a comparison of the three above models.

2. SENTENCE VALIDATION
Sentence validation is the process in which the computer tries to calculate the validity of a sentence and gives the cumulative probability. Validation refers to the correctness of the sentence, in dimensions such as statistical and semantic. A good validation program can verify whether a sentence is correct at all levels. The Python language and its NLTK [5] suite of libraries are well suited to NLP problems. They are used as a tool for most NLP-related research areas - empirical linguistics, cognitive science, artificial intelligence, information retrieval and machine learning. NLTK provides easily-guessable method names for word tokenizing, sentence tokenizing, POS tagging, chunking, bigram and trigram generation, frequency distribution, and many more. Oracle connectivity with Python is used to store the bigrams, trigrams and entity-relation pairs required to test the three different models and finally to compare their results.

The first model is the purely statistical Markov model, i.e. bigrams and trigrams are generated from the sample files of the Brown corpus of NLTK and then fed into the database. Testing yields some results and raises some disadvantages, which will be discussed later. The second model is the Chunked-Off Markov Model - an extension of the first model in that it makes use of tagging and chunking operations wherein all the proper nouns are categorized as PERSON, PLACE, ORGANIZATION, FACILITY, etc. This replacement solves some issues which the purely statistical model could not deal with. Moving from the statistical to the semantic approach, we also aim to validate a sentence on a semantic basis, i.e. whether the sentence has some meaning and makes sense or not. For example, 'PERSON eats' is a valid sentence whereas 'PLACE eats' is an invalid one. So the latter input sentence must result in a lower probability of correctness compared to the former. In order to show this demarcation between sentences, we extract the entity-relation pairs from sample sentences using named entity recognition and chunking and store them in the ER database. Whenever a sentence comes up for testing, we
extract the E-R pairs in this sentence and match them from
database entries to calculate probability for semantic validity.
The same corpus data and test data for the above three
approaches are taken for comparison purposes. Graphs
pertaining to the results are plotted and major differences and
improvements are seen which are later illustrated and
analyzed.
3. HOW DOES IT WORK?
The first two statistical approaches use the N-gram technique and Markov Model [2] building. In the pure statistical Markov N-gram model, corpus data is fed into the database in the form of bigrams and trigrams with their respective frequencies (i.e., how many times they occur in the whole data set of sample sentences). When an input sentence is to be validated, it is tokenized into bigrams and trigrams, which are then matched with database values, and a cumulative probability is calculated after application of the Kneser-Ney smoothing technique, which handles new words and zero-count events whose zero probability could otherwise cause a system crash.
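As a rough sketch of this step, the following Python/NLTK fragment counts Brown-corpus trigrams and scores a test sentence with NLTK's Kneser-Ney distribution. The Oracle tables used in our system are replaced here by an in-memory frequency distribution, the probability floor for unseen events is an illustrative choice of ours, and the NLTK data packages (brown, punkt) are assumed to be downloaded.

import nltk
from nltk.corpus import brown
from nltk.probability import FreqDist, KneserNeyProbDist

# Train: count trigrams over a sample of Brown corpus sentences.
train_sents = brown.sents()[:5000]
trigram_fd = FreqDist(t for sent in train_sents
                      for t in nltk.trigrams([w.lower() for w in sent]))
kn = KneserNeyProbDist(trigram_fd)

def sentence_probability(sentence):
    """Cumulative (product) probability of a sentence's trigrams,
    smoothed so unseen trigrams do not zero out the score."""
    words = [w.lower() for w in nltk.word_tokenize(sentence)]
    prob = 1.0
    for tri in nltk.trigrams(words):
        p = kn.prob(tri)
        prob *= p if p > 0 else 1e-10  # floor for events outside the model
    return prob

print(sentence_probability("The jury said the election was conducted."))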
The Chunked-Off Markov Model makes use of our own defined replace function implemented through the pos_tag and ne_chunk functionality of NLTK. Every sentence is first tagged according to part-of-speech using pos_tag. Whenever a 'NN', 'NNP' or in general 'NN*' chunk is encountered, it is passed to ne_chunk, which replaces the named entity with its type and returns a modified sentence whose bigrams and trigrams are generated and fed into the database. The testing procedure of this approach follows the above methodology: it modifies the sentence entered by the user in the same way, calculates the probabilities of the bigrams and trigrams by matching them with database entries and finally smoothes off to yield the final results.
The above two approaches are statistical in nature, but we need to validate sentences on a semantic and syntactic basis as well, i.e. whether sentences actually make sense or not. To bring this into the picture, we extract all entities (again NN* chunks) and relations (VB* chunks). We define our own set of grammar rules as a context-free grammar to generate a parse tree from which E-R pairs are extracted.

Figure. 1 Parse Tree Generated by CFG
4. COMPLETE STRUCTURE
We have trained the database with 85% of the corpus and test with the remaining 15%. This has two advantages: firstly, we use the same ratio in all other approaches so that we can compare them easily; secondly, it provides a threshold value for probability which helps us distinguish between correct and incorrect test sentences, depicting regions above and below the threshold respectively. Graphs are plotted between probability (exponential, in order of 10) and length of the sentence (number of words).

Figure. 2 Complete flowchart of Sentence Validation process

4.1 N-Gram Markov Model
The first module is the Pure Markov Model [1]. In the pure statistical Markov N-gram model, corpus data is fed into the database in the form of bigrams and trigrams with their respective frequencies (i.e., how many times they occur in the whole data set of sample sentences). When an input sentence is to be validated, it is tokenized into bigrams and trigrams, which are then matched with database values, and a cumulative probability is calculated after application of the Kneser-Ney smoothing technique. The main disadvantage of this purely statistics-based model is that it is not able to deal with proper nouns and named entities. Whenever a new proper noun is encountered with the same relation, it will result in a lower probability even though the sentence might be valid. This shortcoming of the Markov Model is overcome by the next module - the Chunked-Off Markov Model. Markov modeling is the most common method to perform statistical analysis on any type of data, but it cannot be the sole model for testing of NLP applications.

Figure. 3 Testing results for Pure Statistical Markov Model

4.2 Chunked-Off Markov Model
The second module is the Chunked-Off Markov Model [3] - training the database with corpus sentences in which all the nouns and named entities are replaced with their respective type. This is implemented using the tagging and chunking operations of NLTK. This solves the problem of the pure statistical model, namely that it is not able to deal with proper nouns. For example, suppose a corpus sentence has the trigram 'John eats pie'. If a test sentence occurs like 'Mary eats pie', it will result in a very low trigram probability. But if the trigram 'John eats pie' is modified to 'PERSON eats pie', it will result in a better comparison.
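A minimal sketch of this replace step is given below. Note that NLTK's default chunker uses labels such as GPE where we write PLACE; the example assumes the standard NLTK tagger and chunker data packages are installed.

import nltk

def chunk_off(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)         # POS tags, e.g. ('John', 'NNP')
    tree = nltk.ne_chunk(tagged)          # named-entity chunk tree
    out = []
    for node in tree:
        if isinstance(node, nltk.Tree):   # an NE subtree, e.g. (PERSON John)
            out.append(node.label())      # keep only the entity type
        else:
            out.append(node[0])           # plain (word, tag) pair
    return out

print(chunk_off("John eats pie"))   # -> ['PERSON', 'eats', 'pie']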
Figure. 4 Testing results for Chunked-Off Markov Model

4.3 Entity-Relation Model
The third module is the E-R model [4]: extraction of the entities (all NN* chunks) and relations (all VB* chunks). We define a set of grammar rules as a context-free grammar to generate a parse tree from which E-R pairs are extracted and entered into the database. For convenience, we have taken the first main entity and the first main relation, because compound entities are difficult to deal with.
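The extraction step can be sketched with a small RegexpParser grammar as follows; the two rules here are an illustrative stand-in for our full CFG, following the first-main-entity and first-main-relation convention described above.

import nltk

grammar = r"""
  ENT: {<NN.*>+}     # entity: one or more nouns
  REL: {<VB.*>+}     # relation: one or more verbs
"""
parser = nltk.RegexpParser(grammar)

def er_pair(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = parser.parse(tagged)
    ent = rel = None
    for sub in tree.subtrees():
        if sub.label() == "ENT" and ent is None:
            ent = " ".join(w for w, _ in sub.leaves())
        elif sub.label() == "REL" and rel is None:
            rel = " ".join(w for w, _ in sub.leaves())
    return ent, rel

print(er_pair("John eats pie"))   # -> ('John', 'eats')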
Figure. 5 Testing results for E-R Model

4.4 Comparison of the three models
The fourth module is comparison. As expected, the modified Chunked-Off Markov Model performs better. We can also see that there are no sharp dips in the modified model, which are present in the pure statistical model due to a sharp decrease in the probability of trigrams and bigrams. The modified model is consistent due to its ability to deal with proper nouns and named entities.

Figure. 6 Comparison of the two models

5. WHY SENTENCE VALIDATION?
Sentence Validation finds its use in the following fields:
1. Information systems
2. Question-answering systems
3. Query-based information extraction systems
4. Text summarization applications
5. Language and machine translation applications
6. Speech and dialogue based systems
For example, Wolfram-Alpha, Google Search Engine, Text Compactor, SDL Trados, Siri, S-Voice, etc. all integrate sentence validation as an important module.

6. CHALLENGES AND FUTURE OF SENTENCE VALIDATION
As mentioned earlier, NLP is still in the earliest stage of adoption. Research work in this field is still emerging. It is a difficult task to train a computer and make it understand the complex and ambiguous nature of natural languages. The statistical approach is a well proven approach for statistical calculations. But the data obtained from the ER approach is inconclusive. We may have to improve our approach and scale the data to make the ER model work. The ER Model offers very substantial advantages over the Statistical Model, which makes this approach worth looking into. Even if it cannot reach the levels of the Markov Model, the ER Model could be a powerful tool in complementing the Markov Model as well as for a variety of other NLP applications.
We see Sentence Validation as the single best method available to process any Natural Language application. All languages have their own set of rules, which are not only difficult to feed into a computer, but are also ambiguous in nature and complex to comprehend and generalize. Thus, different approaches have to be studied, analyzed and integrated for accurate results.
Our three approaches validate a sentence in an overall manner, both statistically and semantically, making this system an efficient one. Also, the graphs show clearly that chunking of the training data yields better testing of data. The testing will become even more accurate if the database is expanded with more sentences.

7. REFERENCES
[1] Chen, Stanley F. and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Computer Speech & Language 13.4: 359-393.
[2] Goodman, Joshua. 2001. A bit of progress in language modeling.
[3] Rosenfeld, Roni. 2000. Two decades of statistical language modeling: Where do we go from here?
[4] Nguyen Bach and Sameer Badaskar. 2005. A review of relation extraction.
[5] NLTK Documentation. 2014. Retrieved from http://www.nltk.org/book
Implementation of Adaptive Digital Beamforming using Cordic

Azra Jeelani, VTU, M S Engineering College, Bangalore, Karnataka, India
Dr. Veena. M.B, VTU, B M S College of Engineering, Bangalore, Karnataka, India
Dr. Cyril Prasanna Raj, Coventry University, M S Engineering College, Bangalore, Karnataka, India
Abstract: Sonar imaging is one of the simplest techniques for the detection of underwater drowned bodies. There is a need for the design of conventional beamformers that are robust and simple. An adaptive beamformer is used to improve the quality of the sonar image. As a result we get an image containing more useful and correct information. The CORDIC computing technique is a highly efficient method to compute elementary functions like sine, cosine, translate and rotate values using the CORDIC algorithm. The system simulation was carried out using ModelSim and Xilinx ISE Design Suite 9.2i. Matlab code is used to implement sin and cos using CORDIC angles and the amplitude response of beamformed data by the optimized method, in order to enlarge the validity region of beamforming. Synthesis results of the CORDIC show reduced memory requirement and less power consumption.
Keywords: Beamforming, CORDIC, sonar imaging, validity region
1. INTRODUCTION
Beamforming is a type of signal processing technique used in sensor arrays for directional signal transmission or reception. Here the elements are combined in such a way that signals at particular angles experience constructive interference while others experience destructive interference. 3D sonar imaging has been one of the main innovations in underwater applications over the years [1]. There are two critical issues in the development of high-resolution 3D sonar systems: 1) the cost of hardware, which is associated with the huge number of sensors that compose the planar array, and 2) the computational burden in processing the signals. Palmese and Trucco also propose an algorithm to perform chirp zeta transform beamforming on the wideband signals collected by an evenly spaced planar array and generated by a scene placed in both the far field and the near field [4],[6]. Works in [8]-[10] have proposed to use the COordinate Rotation DIgital Computer (CORDIC) in implementing frequency domain beamforming on Field Programmable Gate Arrays - the CORDIC algorithm is an iterative arithmetic algorithm given by Volder [11] and Walther [12]. This paper describes a data path using CORDIC for the algorithm.
Digital signal processing has long been dominated by microprocessors with enhancements such as single-cycle multiply-accumulate instructions and special addressing modes. While these processors are low cost and offer extreme flexibility, they are not fast enough for truly demanding DSP tasks. The advent of high-speed dedicated hardware solutions has brought costs that are competitive with the traditional software approach. Unfortunately, algorithms optimized for these microprocessor-based systems do not always map well into hardware. While hardware-efficient solutions often exist, the dominance of the software systems has kept those solutions out of the spotlight. Among these hardware-efficient algorithms is a class of iterative solutions for trigonometric and other functions that use only shifts and adds. The trigonometric functions are based on vector rotations, while other functions like square root are implemented using an incremental expression of the desired function. The trigonometric algorithm is called CORDIC, an acronym for COordinate Rotation DIgital Computer. The incremental functions are performed with a very simple extension to the hardware architecture, and while not CORDIC in the strict sense, are often included because of the close similarity. The CORDIC algorithms generally produce one additional bit of accuracy for each iteration. The trigonometric CORDIC algorithms were originally developed as a digital solution for real-time navigation problems. The original work is credited to Jack Volder [4,9]. Extensions to the CORDIC theory based on work by John Walther [1] and others provide solutions to a broader class of functions. This paper attempts to survey the existing CORDIC and CORDIC-like algorithms and then moves towards implementation in Field Programmable Gate Arrays (FPGAs).
An approximation used in near-field beamforming is presented in [13],[14], enlarging the validity region. Beamforming can be used at both the transmitting and receiving ends in order to achieve spatial selectivity. To change the directionality of the array when transmitting, a beamformer controls the phase and relative amplitude of the signal at each transmitter, in order to create a pattern of constructive and destructive interference in the wavefront. When receiving, information from different sensors is combined in a way where the expected pattern of radiation is preferentially observed. Conventional beamformers use a fixed set of weightings and time-delays (or phasings) to combine the signals from the sensors in the array, primarily using only information about the location of the sensors in space and the wave directions of interest. In contrast, adaptive beamforming techniques generally combine this information with properties of the signals actually received by the array, typically to improve rejection of unwanted signals from other directions. This process may be carried out in either the time or the frequency domain. Hardware implementation of a bio-inspired algorithm for motion detection takes less processing time, and integration of the motion detection model improves the performance of autonomous visual navigation. For resolving navigation problems, the processing time of the two existing approaches, optical flow (non-bio-inspired) and bio-inspired, needs to be reduced. To minimize the size of the system, the algorithm should be implemented on ASIC, and functionality should be verified on FPGA before taking it to ASIC.
2. BACKGROUND THEORY
2.1. Beamforming
Beamforming is a type of signal processing technique used in sensor arrays for directional signal transmission or reception. Here the elements are combined in such a way that signals at particular angles experience constructive interference while others experience destructive interference [1]. Beamformers are classified as either data independent or statistically optimum, depending on how the weights are chosen. The weights in a data independent beamformer do not depend on the array data and are chosen to present a specified response for all signal and interference scenarios. The weights in a statistically optimum beamformer are chosen based on the statistics of the array data to optimize the array response. The statistics of the array data are not usually known and may change over time, so adaptive algorithms are typically used to determine the weights. The adaptive algorithm is designed so the beamformer response converges to a statistically optimum solution [6].
The weights in a data independent beamformer are designed so that the beamformer response approximates a desired response independent of the array data or data statistics. This design objective is the same as that for a classical FIR filter design. The simple delay-and-sum beamformer is an example of data independent beamforming.
In a statistically optimum beamformer the weights are chosen based on the statistics of the data received at the array. The goal is to optimize the beamformer response so that the output signal contains minimal contributions due to noise and signals arriving from directions other than the desired direction. The Frost beamformer is a statistically optimum beamformer. Other statistically optimum beamformers are the Multiple Sidelobe Canceller and maximization of the signal-to-noise ratio.
2.2. Sonar Imaging
Sonar (an acronym for SOund Navigation and Ranging) is a technique that uses sound propagation (usually underwater, as in submarine navigation) to navigate, communicate with or detect objects on or under the surface of the water, such as other vessels. Two types of technology share the name "sonar": passive sonar is essentially listening for the sound made by vessels; active sonar is emitting pulses of sound and listening for echoes. Sonar may be used as a means of acoustic location and of measurement of the echo characteristics of "targets" in the water. Acoustic location in air was used before the introduction of radar. Sonar may also be used in air for robot navigation, and SODAR (upward looking in-air sonar) is used for atmospheric investigations. The term sonar is also used for the equipment used to generate and receive the sound.
2.3. Active and Passive Sonar System
With active sonar or passive sonar, when receiving the acoustic signal reflected from the target, the information included in the signal cannot be directly collected and used without technical signal processing. To extract the efficient and useful information from the mixed signal, some steps should be taken to transfer sonar data from raw acoustic data reception to the detection output needed during the signal processing system, as shown in Fig.1.

Fig.1 Passive and Active Sonar System

2.4. Cordic Theory
Coordinate Rotational Digital Computer (CORDIC) is a set of shift-add algorithms known for computing a wide range of trigonometric, hyperbolic, linear and logarithmic functions, as well as multiplication, division, data type conversion and square root. It is highly efficient and of low complexity. The CORDIC algorithm has found use in various applications, from pocket calculators and numerical co-processors to high-performance radar signal processing. Vector rotation can also be used for polar to rectangular and rectangular to polar conversions, for vector magnitude, and as a building block in certain transforms such as the DFT and DCT. The CORDIC algorithm provides an iterative method of performing vector rotations by arbitrary angles using only shifts and adds. The algorithm, credited to Volder [4], is derived from the general (Givens) rotation transform:

x' = x·cos(φ) − y·sin(φ) ----(1)
y' = y·cos(φ) + x·sin(φ) ----(2)

These can be rearranged so that:

x' = cos(φ)·[x − y·tan(φ)] ----(3)
y' = cos(φ)·[y + x·tan(φ)] ----(4)

Fig 2. Rotation of sin and cos
Fig 3. Input and output of rotation for rotation mode

So far, nothing is simplified. However, if the rotation angles are restricted so that tan(φ) = ±2^-i, the multiplication by the tangent term is reduced to a simple shift operation. Arbitrary angles of rotation are obtainable by performing a series of successively smaller elementary rotations. If the decision at each iteration, i, is which direction to rotate rather than whether or not to rotate, then the cos(φi) term becomes a constant (because cos(φi) = cos(−φi)). The iterative rotation can now be expressed as:

x(i+1) = Ki·[xi − di·yi·2^-i] ----(5)
y(i+1) = Ki·[yi + di·xi·2^-i] ----(6)

where

Ki = cos(arctan(2^-i)) = 1/√(1 + 2^-2i) ----(7)
di = ±1 ----(8)

Removing the scale constant from the iterative equations yields a shift-add algorithm for vector rotation. The product of the Ki's can be applied elsewhere in the system or treated as part of a system processing gain. That product approaches 0.6073 as the number of iterations goes to infinity. Therefore, the rotation algorithm has a gain, An, of approximately 1.647. The exact gain depends on the number of iterations, and obeys the relation

An = ∏ √(1 + 2^-2i) ----(9)

The angle of a composite rotation is uniquely defined by the sequence of the directions of the elementary rotations. That sequence can be represented by a decision vector. The set of all possible decision vectors is an angular measurement system based on binary arctangents. Conversions between this angular system and any other can be accomplished using look-up. A better conversion method uses an additional adder-subtractor that accumulates the elementary rotation angles at each iteration. The elementary angles can be expressed in any convenient angular unit. Those angular values are supplied by a small lookup table (one entry per iteration) or are hardwired, depending on the implementation. The angle accumulator adds a third difference equation to the algorithm:

z(i+1) = zi − di·arctan(2^-i) ----(10)

Obviously, in cases where the angle is useful in the arctangent base, this extra element is not needed. The CORDIC rotator is normally operated in one of two modes. The first, called rotation by Volder [4], rotates the input vector by a specified angle (given as an argument). The second mode, called vectoring, rotates the input vector to the x axis.
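As a quick software check of equations (5)-(10), the following sketch runs the rotation-mode iterations in plain Python, with the shifts modelled as multiplication by 2^-i and the inverse of the gain An applied once at the end. It is an illustrative model of the arithmetic only, not of the hardware datapath.

import math

def cordic_rotate(x, y, angle, n=16):
    """Rotate (x, y) by 'angle' radians using n shift-add iterations."""
    z = angle
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0          # decision di from the sign of z
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)          # angle accumulator, eq. (10)
    K = 1.0
    for i in range(n):                       # 1/An, inverse of the gain (9)
        K *= 1.0 / math.sqrt(1.0 + 2.0**(-2 * i))
    return x * K, y * K

# Rotating the unit vector (1, 0) yields (cos, sin) of the angle:
c, s = cordic_rotate(1.0, 0.0, math.radians(30))
print(c, s, math.cos(math.radians(30)), math.sin(math.radians(30)))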
2.5. Implementation in an FPGA
There are a number of ways to implement a CORDIC processor. The ideal architecture depends on the speed versus area tradeoffs in the intended application. First we will examine an iterative architecture that is a direct translation from the CORDIC equations. From there, we will look at a minimum hardware solution and a maximum performance solution.
2.6. Iterative CORDIC Processors
An iterative CORDIC architecture can be obtained simply by duplicating each of the three difference equations in hardware, as shown in Figure 1. The decision function, di, is driven by the sign of the y or z register depending on whether it is operated in rotation or vectoring mode. In operation, the initial values are loaded via multiplexers into the x, y and z registers. Then on each of the next n clock cycles, the values from the registers are passed through the shifters and adder-subtractors and the results placed back in the registers. The shifters are modified on each iteration to cause the desired shift for the iteration. Likewise, the ROM address is incremented on each iteration so that the appropriate elementary angle value is presented to the z adder-subtractor. On the last iteration, the results are read directly from the adder-subtractors. Obviously, a simple state machine is required to keep track of the current iteration, and to select the degree of shift and ROM address for each iteration. The design depicted in Figure 1 uses word-wide data paths (called a bit-parallel design). The bit-parallel variable shift shifters do not map well to FPGA architectures because of the high fan-in required. If implemented, those shifters will typically require several layers of logic (i.e., the signal will need to pass through a number of FPGA cells). The result is a slow design that uses a large number of logic cells.

3. PROPOSED WORK
A digital input pulse is passed to find the angle, or for detection of an object under water. As shown in Fig.4, beamforming can be used at both the transmitting and receiving ends in order to achieve spatial selectivity: the data is transmitted to the underwater sonar system, where the sonar is used to detect the underwater objects and find the angle of elevation. The beamformed data is transmitted; at the receiver end, beam formation data is generated. The generated beam formation data will have interference and noise error, which will be reduced by using an optimization technique. Optimized CORDIC beamforming will eliminate all the interference generated at the receiver end. Finally, the optimized beamforming data is obtained.

Fig 4. Beamforming for underwater sonar
3.1. Program Flow Chart
Fig.5 shows the flow chart: initially the CORDIC values are sampled, and the antennas are used to detect the angles and objects of the beamformed underwater sonar with the sampled bit rate. The detected angles are taken as input data from which the beam data is formed; the obtained beam data are sampled according to mathematical calculations under the CORDIC algorithm, and the obtained beam data samples are computed as quad phase and in phase. The received beam data contains noise and interference, which are reduced and eliminated using the underwater noise model. The beam data is computed for each antenna and its angles, and the error beam data is finally computed to obtain noiseless beam data. The obtained output is in the form of optimized beamform data.

Fig.5 Program Flow Chart

3.2. Architecture
The architecture is shown in Fig.6, in which input signals are given to memory. The memory is used to store the data of the input signals. The signals are transmitted to detect the target or object in underwater beamform data. Once the target is detected, beamform data is generated. The received beamform data is up sampled and down sampled. The adder is used to combine the images received and stored in memory. The generated beamform data signals from the sonar are given to the CORDIC algorithm.

Fig.6. Data path algorithm

The received data is sampled according to CORDIC algorithm calculations. The angle is measured using CORDIC. The sin and cos angles are generated and calculated using the CORDIC algorithm. Both the in-phase and quad-phase components are added and given to the CORDIC using an adder. The CORDIC performs vector rotation, and the vector data are summed to produce the array beam (B). Angles which are detected are measured using CORDIC. The obtained samples are stored in the register.

The rotation vector is given by equations (3) and (4).

To find the iterations, the following equations are used:

x(i+1) = xi − di·yi·2^-i ----(11)
y(i+1) = yi + di·xi·2^-i ----(12)
z(i+1) = zi − di·arctan(2^-i) ----(13)
To find the magnitude and phase, the following equations are used:

x' = An·√(x² + y²) ----(14)
y' = 0 ----(15)
θ' = arctan(y/x) ----(16)

Advantages and Disadvantages of CORDIC
- Simple shift-and-add operation (2 adders + 2 shifters vs. 4 multipliers + 2 adders).
- It needs n iterations to obtain n-bit precision.
- Slow carry-propagate addition.
- Low throughput rate and area-consuming shifting operations.
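A matching sketch of the vectoring mode behind equations (14)-(16) is given below: di is now driven by the sign of y, the final x (scaled by 1/An) yields the magnitude, and the angle accumulator collects arctan(y/x). This is again an illustrative software model, valid for inputs in the right half-plane.

import math

def cordic_vector(x, y, n=16):
    """Drive (x, y) onto the x axis; return (magnitude, phase)."""
    z = 0.0
    for i in range(n):
        d = -1.0 if y >= 0 else 1.0          # drive y towards zero
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)          # accumulated rotation angle
    An = math.prod(math.sqrt(1.0 + 2.0**(-2 * i)) for i in range(n))
    return x / An, z                          # eq. (14) and eq. (16)

mag, ph = cordic_vector(3.0, 4.0)
print(mag, ph, math.hypot(3, 4), math.atan2(4, 3))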
Here m and n are the input coordinates, p and q are pre-computed values, ro is the rotation value, and ψx and ψy are the phase shifts. By using an L-point DFT, the sample data are calculated by

Sm,n(l) = Σ (t = 0 to L−1) sm,n(t)·exp(−j·2π·t·l / L) ----(17)

The Sm,n(l) are stored in memory with indexing parameters m and n. The phase shift parameters ψx and ψy are added to the phase term of the data Wm,n·Sm,n(l) by the CORDIC, which performs a vector rotation. The vector data is summed to produce the array beam B(ro, θap, θep), where l is the frequency and θap and θep are the time delays.
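The following fragment illustrates equation (17) and the beam sum in plain Python, with the per-sensor phase rotation written with cmath in place of the CORDIC rotation stage. The array size, sample values and phase shifts are invented placeholders, not parameters of the actual system.

import cmath

def beamform_bin(samples, l, psi):
    """samples[(m, n)] = list of L time samples for sensor (m, n);
    psi[(m, n)] = phase shift for the steering direction; returns B for bin l."""
    B = 0.0 + 0.0j
    for (m, n), s in samples.items():
        L = len(s)
        S = sum(s[t] * cmath.exp(-2j * cmath.pi * t * l / L)   # eq. (17)
                for t in range(L))
        B += S * cmath.exp(1j * psi[(m, n)])                   # phase rotation step
    return B

# Tiny 2x2 array with constant samples and zero phase shifts:
samples = {(m, n): [1.0] * 8 for m in range(2) for n in range(2)}
psi = {(m, n): 0.0 for m in range(2) for n in range(2)}
print(abs(beamform_bin(samples, 0, psi)))   # -> 32.0 (4 sensors x 8 samples)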
4. RESULTS & DISCUSSIONS
4.1 Results of Direct Method Using MATLAB

Fig.7 Amplitude of Transmitted Data
Fig.7 shows the variation based on the size of the data from the transmitter side. Fig.8 shows the phase-wise changes from -10 degrees to -40 degrees based on the optimized algorithm.

Fig.8. Input Data
In Fig.8 the phase graph of the input data is transmitted.

Fig.9 Amplitude of Transmitted Data
As shown in Fig.9, the amplitude variation is based on the size of the data from the transmitter side and on the amplitude variation generated from the beamforming data.

Fig.10 Output Data
As shown in Fig.10, the amplitude response of the beamformed data based on CORDIC angles is formed as output.
4.2. Results of Optimized Method Using MATLAB

Fig.11. Phase Graph of Input Data
Fig.11 shows the input pulse sent from the transmitter side underwater to detect the target.

Fig.12. Beamformed Data
Fig.12 shows the input data sent in the form of samples from the transmitter side.

Fig.13. Amplitude Response Data
Fig.13 shows the amplitude response of the optimized data from the transmitter to the receiver.

Fig.14. Error in Beamformed Transmitted Data
Fig.14 shows the loss of data, which is less compared to the direct method.

Fig.15. Amplitude Response of Beamformed Data Based on Cordic Angles
Fig.15 shows the amplitude response of the beamformed data. At the transmitter side, the signal is up sampled, and at the receiver side the signal is down sampled using the CORDIC algorithm to get an accurate result.
5. COMPARISON OF OPTIMIZED METHOD AND DIRECT METHOD

Table 1. Comparison of direct method and optimized method

Parameter | Direct Method | Optimized Method | Memory reduced (Optimized Method)
Number of delays per focusing distance | 10^7 bytes | 10^3 bytes | 59 kB
Validity range | Does not enlarge | Enlarged by 4 degrees in azimuth and elevation angle | Enlarged by 4 degrees
Computational requirement | More number of sensors | Less number of sensors | Reduced by a factor of 2

6. CONCLUSION
This paper has illustrated that the proposed approximation enlarges the validity region of the system's view scene. Under the preferred definition of the steering direction condition, the validity region is enlarged by at least 4° in both azimuth and elevation angles. The optimized algorithm has the advantage of reducing the memory and computational requirements as compared with DM beamforming. In high-resolution sonar systems, where more than ten thousand beams are produced, the required memory for parameter storage is reduced.
Digital antennas have the potential of satisfying the requirements of many systems simultaneously. They are flexible, capable of handling wide bandwidths, and can perform multiple functions. The bandwidth of the modulator and demodulator must match the bandwidth of the signal for efficient operation. The effects of the phase slope and amplitude variations on the pattern of a linear array were determined by simulations that incorporated the measured data. The simulation showed unacceptable beam squint with frequency.

7. REFERENCES
[1] V. Murino and A. Trucco, "Three-dimensional image generation and processing in underwater acoustic vision," vol. 88, no. 12, Dec. 2000.
[2] A. Davis and A. Lugsdin, "High speed underwater inspection for port and harbour security using Coda Echoscope 3D sonar," 2005, pp. 2006-2011.
[3] R. K. Hansen and P. A. Andersen, "The application of real time 3D acoustical imaging," OCEANS 1998, pp. 738-741.
[4] M. Palmese and A. Trucco, "Digital near field beamforming for efficient 3-D underwater acoustic image generation," in Proc. IEEE Int. Workshop Imaging Syst. Tech., 2007, pp. 1-5.
[5] M. Palmese and A. Trucco, "From 3-D sonar images to augmented reality models for objects buried on the seafloor," IEEE Trans. Instrum. Meas., vol. 57, no. 4, pp. 820-828, Apr. 2008.
[6] M. Palmese, G. De Toni, and A. Trucco, "3-D underwater acoustic imaging by an efficient frequency domain beamforming," in Proc. IEEE Int. Workshop Imaging Syst. Tech., 2006, pp. 86-90.
[7] B. E. Nelson, "Configurable computing and sonar processing - architectures and implementations," 2001, pp. 56-60.
[8] B. L. Hutchings and B. E. Nelson, "Gigaop DSP on FPGA," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2001, pp. 885-888.
[9] G. Hampson and A. Paplinski, "Phase shift beamforming using CORDIC," in Proc. Int. Symp. Signal Process. Appl., 1996, pp. 684-687.
[10] A. Trucco, "A least-squares approximation for the delays used in focused beamforming," J. Acoust. Soc. Amer., vol. 104, no. 1, pp. 171-175, Jul. 1998.
[11] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput., vol. EC-8, no. 3, pp. 330-334, Sep. 1959.
[12] J. S. Walther, "A unified algorithm for elementary functions," in Proc. Spring Joint Comput. Conf., 1971, pp. 379-385.
[13] A. Trucco, "Enlarging the scanning region of a focused beamforming system," Electron. Lett., vol. 33, no. 17, pp. 1502-1504, Aug. 1997.
[14] B. O. Odelowo, "A fast beamforming algorithm for planar/volumetric arrays," in Proc. 39th Asilomar Conf. Signals, Syst. Comput., 2005, pp. 1707-1710.
[15] M. Palmese and A. Trucco, "Acoustic imaging of underwater embedded objects: Signal simulation for three-dimensional sonar instrumentation," IEEE Trans. Instrum. Meas., vol. 55, no. 4, pp. 1339-1347, Aug. 2006.
Local Restoration in Metro Ethernet Networks for Multiple Link Failures

Shibu. V, Department of Computer Applications, Cochin University College of Engineering, Pulincunnoo, Alappuzha, Kerala, India
Preetha Mathew K, Department of Computer Applications, Cochin University College of Engineering, Pulincunnoo, Alappuzha, Kerala, India
Jabir.K.V.T, Department of Information Technology, Cochin University College of Engineering, Pulincunnoo, Alappuzha, Kerala, India
Abstract: Ethernet is a popular choice for metropolitan-area networks (MAN) due to its simplicity, cost effectiveness and scalability. The Spanning-Tree based switching mechanism, which is considered to be very efficient at avoiding switching loops in a LAN environment, is a performance bottleneck in the metro network context. Handling of link failures is an important issue in metro Ethernet networks. A link failure may result in serious service disruptions. A local restoration method for metro Ethernet with multiple spanning trees, which aims at fast handling of single link failures in a distributed manner, has been proposed in the literature. In this paper, we propose a local restoration mechanism that uses the MULTILINK algorithm for solving multiple link failures.

Keywords: Metropolitan Area Networks (MAN), Ethernet, Spanning Tree Protocol, RSTP.
1. INTRODUCTION
Ethernet is a family of computer networking technologies for local area networks (LANs). Systems communicating over Ethernet divide a stream of data into individual packets called frames. Each frame contains source and destination addresses and error-checking data so that damaged data can be detected and re-transmitted. Ethernet has evolved over the past decade from a simple shared medium access protocol to a full-duplex switched network. Ethernet dominates current local area network (LAN) realizations. It has been estimated that more than 90 percent of IP traffic originates from Ethernet LANs. Efforts are underway to make Ethernet an end-to-end technology spanning across LANs, metropolitan area networks (MANs), and possibly wide area networks (WANs) [2].
A Metro Ethernet [1] is a computer network that covers a metropolitan area and that is based on the Ethernet standard. It is commonly used as a metropolitan access network to connect subscribers and businesses to a larger service network or the Internet. A metro Ethernet network is a set of interconnected LANs and access networks that work together using Ethernet technologies to provide access and services within a metro region. Metro Ethernet networks are built from Ethernet switches/bridges interconnected by fiber links. A spanning tree protocol is used to establish one or more trees spanning every access point that connects LANs.
Failure handling is a key issue in metro Ethernet networks. A component failure may result in serious service disruptions. To support carrier-grade services in MANs using Ethernet, it is a critical requirement to have a fast, reliable, and efficient failure-handling mechanism [3]. Current Ethernet switched networks use the spanning tree protocol family without any fast failure recovery mechanism. The IEEE 802.1d Spanning Tree Protocol (STP) [4] establishes a single spanning tree to guarantee a unique path between any two switches. It suffers from low convergence speed and inefficient bandwidth usage in case of a failure.
The spanning tree approach fails to exploit all the physical network resources, because in any network of N nodes there are at most N−1 links actively forwarding traffic. This produces an imbalance of load in the network. This scenario is impractical in large scale networks like metro networks. Further, switch and link failures require rebuilding of the spanning tree, which is a lengthy process. IEEE 802.1w [5], the Rapid Spanning Tree Protocol (RSTP), mitigates this problem by providing mechanisms to detect failures and quickly reconfigure the spanning tree. However, the recovery period can still range from an optimistic 10 milliseconds to a more realistic multiple seconds after failure detection, which is still not adequate for many applications.
In this paper, we propose a local restoration mechanism for metro Ethernet, which aims at fast handling of multiple link failures in an efficient manner. Multiple link failures come from the fact that when a connection is switched to a backup spanning tree, it has no record of the original working spanning tree. Therefore, when the connection encounters another failure on the backup spanning tree, there is a possibility that it would be switched back to the original working spanning tree and form a loop in the network. Moreover, when a multiple link failure happens in the network and both the primary and backup spanning trees fail simultaneously, some packets would be dropped when they encounter the failure on the backup spanning tree. We propose a possible approach to handle these multiple link failures. This approach is to allow multiple VLAN switching and add more information in the header of the frames, e.g., the VLAN ID of the original working spanning tree, when the frames are switched to a backup tree. Thus, they are able to select a backup tree without forming a loop when they are affected by a second failure in the network.
2. PROBLEM DEFINITION
The simplicity and the low cost provided by Ethernet make it an attractive network technology choice in networking application deployments. The Spanning-Tree Protocol (STP), which was proposed in the initial version of IEEE 802.1D [4], is responsible for building a loop-free logical forwarding topology over the physical one, providing connectivity among all nodes. The links that are not part of this tree are blocked. In case of a failure, the blocked links are activated, providing a self-healing restoration mechanism. All information propagated between the switches is embedded in Bridge Protocol Data Units (BPDUs). These packets are exchanged only between adjacent bridges, and protocol events (e.g., port state changes) are invoked by timers, so rebuilding the topology takes considerable time. This timer-based operation, which is an STP property, results in reconfiguration times of up to 50 seconds and, thus, affects network performance. The existing system defines a local restoration mechanism for metro Ethernet using multiple spanning trees, which is distributed and fast and does not need failure notification. Upon failure of a single link, the upstream switch locally restores traffic to preconfigured backup spanning trees. There are two restoration approaches, connection-based and destination-based, to select backup trees. When a multiple link failure happens in the network and both the primary and backup spanning trees fail simultaneously, some packets would be dropped when they encounter the failure on the backup spanning tree. The proposed system defines a possible approach to handle these multiple link failures. This approach is to allow multiple VLAN switching and add more information in the header of the frames, e.g., the VLAN ID of the original working spanning tree, when the frames are switched to a backup tree. Thus, they are able to select a backup tree without forming a loop when they are affected by a second failure in the network.

3. EXISTING METHODS
3.1 Metro Ethernet Local Restoration Framework
The existing system defines a local restoration mechanism [1] in metro Ethernet that selects appropriate backup spanning trees for rerouting traffic on a working spanning tree, and then restores the traffic to the backup spanning trees locally in case of failure. The path on a backup tree to reroute traffic runs from the immediate upstream node of the failed link to the destination node and should exclude the failed link. The local restoration mechanism in metro Ethernet can be implemented using the current Ethernet protocols. For this, an additional module should be maintained in the Ethernet switch for checking the appropriate backup spanning tree and restoring frames to the backup tree after failure.
The existing system has implemented the local restoration mechanism in the following three steps.
3.1.1 Per VLAN Spanning Tree:
Local restoration from one spanning tree to another can be implemented by assuming that each spanning tree is assigned a dedicated virtual LAN (VLAN) ID [6]. The pre-calculated spanning tree topologies are implemented in the network by means of VLANs, which do not change during network operation and ensure that there are no loops in the Ethernet network. Therefore, STP is disabled, as it is not needed to provide a loop-free topology. A unique VLAN ID is assigned to each spanning tree, which is used by the edge routers to forward traffic over the appropriate trees [3]. By changing the VLAN ID in the Ethernet header, Ethernet frames can be switched among spanning trees. Frames that frequently switch among spanning trees may form unexpected loops. To allow VLAN switching only once, a bit in the frame's class-of-service (CoS) field is set as the restoration bit. The Ethernet switch will check a frame's restoration bit before restoration and drop those frames that have already been restored once.
3.1.2 Local Restoration Mechanism:
Local restoration doesn't require the convergence of spanning trees after failure. To inform that the switch is alive, each switch periodically sends a message to its neighbors. Within a predefined interval, if a switch does not receive any message from a port, it changes the port's status to "failed". The restoration module is activated when an incoming frame is forwarded to the failed port; a preconfigured backup VLAN ID replaces the frame's original VLAN ID. At the same time, its restoration bit is set to 1. Then, the modified frame is forwarded to the alternative output port according to its new VLAN ID.
3.1.3 Pre-Configuration:
The network manager will perform the pre-configuration operation, which includes three parts: multiple spanning trees generation, working spanning tree assignment, and backup spanning tree configuration.
Multiple Spanning Trees Generation: The network manager is responsible for generating multiple spanning trees [8] a priori. The trees should satisfy the condition to handle a link failure: for each link, there is at least one spanning tree that does not include that particular link [7]. Commonly, more spanning trees should be generated to utilize network resources efficiently.
Working Spanning Tree Assignment: The network manager should assign a VLAN ID to each source and destination (s-d) pair based on the long-term traffic demand matrix. The frames entering the network are attached with VLAN IDs at the ingress switches according to their source and destination addresses and are forwarded to the proper working spanning trees.
Backup Spanning Tree Configuration: Frames traversing the failed link should be restored to a preconfigured backup spanning tree locally when a failure happens. Backup trees at each switch should be carefully configured according to the traffic demand such that there is enough spare capacity on the backup tree for restoration.
Backup Tree Selection Strategy
The Ethernet switch selects the backup tree for each frame according to the backup tree configuration by the network manager. The existing system uses two backup spanning tree selection strategies: connection-based and destination-based. The traffic between a source-destination pair is termed a connection. In the connection-based strategy, an Ethernet switch determines the incoming frame's backup VLAN ID according to its source address, destination address, and original VLAN ID. Therefore, traffic between different source-destination pairs traversing the same failed link may be restored to different backup trees.
Fig.1 Backup tree selection strategy. (a) Two connections on ST1 before failure of link 1-2. (b) Two connections are restored to different STs after failure in the connection-based strategy. (c) Two connections are restored to the same ST in the destination-based strategy.

Connection 1 from node 0 to 2 and connection 2 from node 1 to 2 use ST1 as the working spanning tree before failure [Fig. 1(a)]. When the link between 1-2 fails, node 1 restores connection 1 to ST2 and connection 2 to ST3 according to the preconfiguration. Frames are restored according to their source and destination MAC addresses, and different connections are assigned independent backup spanning trees. Connection-based backup tree selection requires a complex computation during preconfiguration, and per-(source-destination) pair information should be maintained by each switch.
The existing system uses a destination-based backup tree selection strategy, in which the frame's backup VLAN ID is determined by its destination address and original VLAN ID, regardless of its source address. Frames with the same VLAN ID and destination address would use the same backup tree in a local Ethernet switch. Fig. 1(c) shows an example of the destination-based backup tree selection strategy. Connections 1 and 2 have to use the same backup spanning tree in node 1 upon failure of link 1-2 since they have the same destination, which is different from the connection-based strategy. Node 1 can only restore the two connections with the same destination to the same spanning tree.

4. PROPOSED METHODOLOGY
When a connection is switched to a backup spanning tree and the connection encounters another failure on the backup spanning tree, a multiple link failure happens in the network. Both the primary and backup spanning trees fail simultaneously, and some packets would be dropped when they encounter the failure on the backup spanning tree.
The existing system only handles single link failures in the metro Ethernet. The proposed system is an enhancement of the existing system. This system defines a possible approach to handle these multiple link failures. The approach is to allow multiple VLAN switching and add more information in the header of the frames, e.g., the VLAN ID of the original working spanning tree, when the frames are switched to a backup tree. Thus, they are able to select a backup tree without forming a loop when they are affected by a second failure in the network. It also uses the concept of the local restoration mechanism provided in the existing system in its restoration module. The restoration module uses an algorithm namely MULTILINK. The working of this proposed algorithm is demonstrated in the next section.

The MULTILINK algorithm
This algorithm considers three connections. We name these connections TLink, Blink and RLink respectively. In the first case, consider that all three links are up. Two bits in the Ethernet IP packet header are used in this algorithm. The first is the restoration bit (RB) that is used in the existing system. The second bit is termed the new bit (NB) in the proposed restoration module, which will be assigned a value of 1 or 0 according to various conditions in the algorithm. After considering that all three links are up, assign the value zero to RB and NB in the second step. If the TLink is down, then set the value of RB to one and set the value of NB to zero. Then switch to the Blink by activating the restoration module as in the existing system. Now the traffic is going through the Blink. Check whether the TLink is up frequently, within a fixed time interval of 5 ms. If the TLink is up, then set the value of the bit NB to one; otherwise set the value of the bit NB to zero. If the Blink is up, repeat the said operations by checking whether the TLink is down or not. If the Blink is down, then check the value of NB. If the value of NB is equal to one, then route the packet through the TLink; otherwise route the packet through the RLink by using the Rapid Spanning Tree Protocol (RSTP). The working of the MULTILINK algorithm is demonstrated in the following figures.

5. EXPECTED RESULTS
The most important design objectives of a failure handling mechanism are fast failover, simplicity, robustness and low protocol processing and transport overheads. Ethernet has built-in functionalities for failure handling developed in standardization bodies. When a connection is switched to a backup spanning tree and the connection encounters another failure on the backup spanning tree, a multiple link failure happens in the network: both the primary and backup spanning trees fail simultaneously, and some packets would be dropped when they encounter the failure on the backup spanning tree. When an Ethernet switch finds a packet that has been rerouted once and its output port on the backup tree also fails, the switch should notify the network manager or broadcast the failure message asking for spanning tree re-convergence by RSTP. The algorithm invokes RSTP only when the last condition, NB equal to one, is not satisfied. The MULTILINK algorithm described in the proposed system, on implementation, can solve the multiple link failure problem efficiently.
The algorithm is defined as follows:

Algorithm MULTILINK
1. Initially all the three links TLink, Blink and RLink are considered to be in the up state; set RB=0 and NB=0.
2. If TLink is down then
   A) set RB=1 and NB=0
   B) switch to Blink by activating the restoration module
   C) check whether TLink is up frequently, with a time limit of 5 ms:
      (i) if TLink is up then set NB=1
      else
         (a) set NB=0
         (b) if Blink is down then go to step 3, else go to step 2.
3. If NB=1 then route the packet through TLink, else route the packet through RLink.

Fig.2 Algorithm
Route the packet through RLink.
Fig.2 Algorithm Y
IF TLink
is up?

Set NB=0 Set NB=1

N
IF Blink
is
down?

N Y
IF
NB=1?

Route the packet Route the packet


through RLink through TLink

Stop

Fig.3 Flowchart working of multi link algorithm
6. CONCLUSION
The existing system only handles single link failures in the metro Ethernet. The proposed system is an enhancement of the existing system. This system defines a possible approach to handle multiple link failures. It also uses the concept of the local restoration mechanism provided in the existing system in its restoration module. The restoration module uses an algorithm namely MULTILINK. Two bits in the Ethernet IP packet header are used in this algorithm. The occurrence of a multiple link failure is a rare event. When implemented properly, the proposed system solves the problem of multiple link failures in metro Ethernet networks.
7. REFERENCES
[1] Jian Qiu, Mohan Gurusamy, ―Local Restoration With
Multiple Spanning Trees in Metro Ethernet Networks‖,
IEEE/ACM Transactions On Networking, Vol. 19, No. 2,
April 2011
[2] A. Meddeb, ―Why Ethernet WAN transport,‖ IEEE
Commun. Mag.,vol. 43, no. 11, pp. 136–141, Nov. 2005.
[3] C. Antal, L. Westberg, A. Paradisi, T. R. Tronco, and V.
G. Oliveira, ―Fast failure handling in Ethernet network,‖
in Proc IEEE ICC, 2006, vol. 2, pp. 841–846.
[4] Standard for Local and Metropolitan Area Networks—
Media Access Control (MAC) Bridges, IEEE 802.1d,
1998.
[5] Standard for Local and Metropolitan Area Networks—
Rapid Reconfiguration of Spanning Tree, IEEE 802.1w,
2001.
[6] Standard for Local and Metropolitan Area Networks—
Virtual Bridged Local Area Networks, IEEE 802.1q,
1999.
[7] J. Farkas, C. Antal, G. Toth, and L.Westberg,
―Distributed resilient architecture for Ethernet networks,‖
in Proc. DRCN, 2005, pp. 515–522.
[8] K. Goplan, S. Nanda, and T. Chiueh, ―Viking: A
multiple-spanning tree Ethernet architecture for
metropolitan area and cluster networks,‖ in Proc. IEEE
INFOCOM, 2004, pp. 2283–2294.
Prediction Model Using Web Usage Mining Techniques

Priyanka Bhart
U.I.E.T, Kurukshetra University
Kurukshetra, India
Abstract: The popularity of the WWW is increasing day by day, which results in an increase of web-based services, due to which the web is now the largest data repository. In order to handle this incremental nature of data, various prediction techniques are used. If the prefetched pages are not visited by the user in their subsequent accesses, there will be wastage of network bandwidth, which is available only in a limited amount. So there is a critical requirement for an accurate prediction method. As the data present on the web is heterogeneous and incremental in nature, a hierarchical clustering technique is used during the pre-processing step. Then, using a Markov model, category and page prediction is done, and lastly page filtering is done using keywords.

Keywords: hierarchical clustering; Markov model; page prediction; category prediction
1. INTRODUCTION
With the continued growth and proliferation of e-commerce, there is a need to predict users' behavior. These predictions help in implementing personalization, building proper websites, improving marketing strategy and promotion, getting marketing information, forecasting market trends, and increasing the competitive strength of enterprises, etc. [1].
Web prediction is a classification problem in which a set of web pages a user may visit is predicted on the basis of previously visited pages, which are stored in web log files. Such knowledge of a user's navigation history within a slot of time is referred to as a session. This data is extracted from the log files of the web server, which contain the sequence of web pages that a user visits along with the visit date and time. This data is fed as the training data.
All the user's browsing behavior is recorded in the web log file with the user's name, IP address, date, and request time, etc.

Table 1. Common web log
S.No | IP address | Request | Timestamp | Protocol | Total bytes

Hierarchical clustering is a classification technique. It is an agglomerative (bottom-up) clustering method; as its name suggests, the idea of this method is to build a hierarchy of clusters, showing relations between the individual members and merging clusters of data based on similarity.
In order to analyze user web navigation data, a Markov model is used. It is used in category and page prediction. Only the set of pages which belong to the category predicted in the first phase is used in the second phase of page prediction. Here each web page represents a state, and every pair of pages viewed in sequence represents a state transition in this model. The transition probability is calculated as the ratio of the number of times a particular transition is visited to the number of times the first state in the pair was visited.
This paper introduces an efficient four stage prediction model model with webpage keywords as a feature to give more
in order to analyze Web user navigation behavior .This model accurate results in Web prediction.
is used in identification of navigation patterns of users and to
anticipate next choice of link of a user. It is expected that that


3. OVERVIEW OF THE PROPOSED PREDICTION FRAMEWORK
The prediction model is designed by combining clustering and the Markov model technique. During the preprocessing step, hierarchical clustering is performed to group users' browsing behaviors, and many different clusters are acquired. The information relevant to any cluster can be seen as a cluster view; that means every cluster has its own relevance matrix instead of having these matrices for every user, so here the global view is replaced by the cluster view. After preprocessing, category prediction is done by using the Markov model: in this phase the category at time t is predicted, which depends upon the user's categories at times t-1 and t-2. In the same way, page prediction is done to predict the most probable web pages at time t according to the user's state at time t-1. The set of predicted pages is then fed to keyword based filtering. Finally, after this phase, the predicted results are released.

Firstly, the training data is fed for clustering, from which k cluster views are obtained; these include k data similarity matrices S, k first-order transition matrices P, and k second-order transition matrices between categories. Therefore, we get K relevance matrices R to represent K cluster views [4]. In step two, these matrices are released for creating the index table, which is used for view selection based on the user's browsing behavior at that time. In step three, after view selection, the testing data is fed into the prediction model and the prediction results are released as output.

Figure 1. Proposed prediction framework (training data is preprocessed by hierarchical clustering into cluster views; an index table supports view selection; category prediction, page prediction, and keyword based page filtering then produce the output)
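To make the preprocessing step concrete, the following minimal sketch (not part of the paper; the user feature vectors and the number of clusters are illustrative assumptions) groups users by their browsing features with agglomerative clustering in SciPy:

```python
# Minimal sketch of the preprocessing step: group users by browsing features
# with agglomerative (hierarchical) clustering. The feature matrix below is
# hypothetical; in the paper each row would encode one user's page/category usage.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Each row: one user's visit counts over 4 page categories (illustrative data).
features = np.array([
    [5, 0, 2, 1],
    [4, 1, 2, 0],
    [0, 6, 1, 3],
    [1, 5, 0, 4],
])

# Build the cluster tree using Euclidean distance (the metric used in the
# paper) and average linkage, then cut it into k = 2 cluster views.
tree = linkage(features, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 2 2]: users 1-2 and 3-4 form separate cluster views
```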

3.1 HIERARCHICAL CLUSTERING ALGORITHM
It is an agglomerative clustering method. The idea of this method is to build a hierarchy of clusters, showing relations between members and merging clusters of data based on similarity. For the clustering algorithm to work, there must be some means by which similarity can be judged; this is generally called a distance measurement. Two commonly used metrics for measuring correlation (similarity/distance) are the Euclidean and the Pearson correlations. The type of correlation metric used depends largely on what is to be measured. Here we have used the Euclidean correlation.

In the first step of clustering, the algorithm looks for the two most similar data points and merges them to create a new "pseudo-data point", which represents the average of the two data points. Each iterative step takes the next two closest data points and merges them. This process is generally continued until there is one large cluster covering all original data points. This clustering technique results in a "tree" showing the relationship of all the original points. Here every user starts as a cluster and users are grouped into clusters by their most similar browsing features [12].

3.2 MARKOV MODEL
A Markov model is a probability based model which is represented by three parameters <A, S, T>, where A is the set of all possible actions performed by any user; S is the set of all possible states for which the Markov model is built; and T is a |A| x |S| transition probability matrix, where T_ij represents the probability of performing action j when the process is in state i. The Markov model predicts the user's next action by looking at the actions previously performed by the user.

Here, assume that D is the given database, which consists of users' usage records; that is, users' sessions are recorded and D = {session1, session2, ..., sessionp}. Each user session is a set of web pages recorded in a time ordered sequential pattern, sessionp = {page1, page2, ..., pagen}, where pagej represents the user's visited page at time j. If a website has K categories, then the user session can be represented as sessionc = {c1, c2, ..., ck}.

3.3 TRANSITION MATRIX
P is the transition probability matrix, represented as in equation (1), where Pij represents the transition probability between any two pages/categories, i.e., from Pi to Pj. It is calculated as the ratio of the number of transitions between category/page i and category/page j to the total number of transitions between category/page i and every category/page k:

$$P_{ij} = \frac{N(i \rightarrow j)}{\sum_{k} N(i \rightarrow k)} \qquad (1)$$

where N(i -> j) denotes the number of observed transitions from i to j.
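As a short illustration of equation (1), the sketch below builds the first-order transition probability matrix from two of the sample sessions of Figure 2 (the counting scheme follows the definition above; the session data is only illustrative):

```python
# Illustrative computation of the first-order transition matrix of equation (1):
# P[i][j] = count(i -> j) / total transitions leaving page i.
from collections import defaultdict

sessions = [["P3", "P2", "P1"], ["P3", "P5", "P2", "P1", "P4"]]  # cf. Figure 2

counts = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for src, dst in zip(session, session[1:]):  # consecutive page pairs
        counts[src][dst] += 1

P = {
    src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
    for src, dsts in counts.items()
}
print(P["P3"])  # {'P2': 0.5, 'P5': 0.5}: from P3 the user moves to P2 or P5 equally
```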

Web sessions:
WS1 = {P3, P2, P1}
WS2 = {P3, P5, P2, P1, P4}
WS3 = {P4, P5, P2, P1, P5, P4}
WS4 = {P3, P4, P5, P2, P1}
WS5 = {P1, P4, P2, P5, P4}

Figure 2. Sample web sessions with corresponding 1st and 2nd order transition probability matrices [7].

3.4 SIMILARITY MATRIX
The similarity between any two users, user i and user j, can be calculated using the Euclidean distance given in equation (3), where each user is represented by the browsing-feature vector of equation (2):

$$u_i = (f_{i1}, f_{i2}, \dots, f_{in}) \qquad (2)$$

$$d(u_i, u_j) = \sqrt{\sum_{m=1}^{n} (f_{im} - f_{jm})^2} \qquad (3)$$

The Euclidean distance is further normalized by equation (4); from this, the k x k similarity matrix given in equation (5) is obtained:

$$s_{ij} = 1 - \frac{d(u_i, u_j)}{\max_{p,q} d(u_p, u_q)} \qquad (4)$$

$$S = [s_{ij}]_{k \times k} \qquad (5)$$

3.5 RELEVANCE MATRIX
The last matrix to create is the relevance matrix, represented as in equation (6), which is equal to the product of the transition and similarity matrices. Here relevance is an important factor of prediction between any two categories and pages; it captures the behavior between pages and categories. It is represented as follows:

$$R = P \times S \qquad (6)$$

where

$$r_{ij} = \sum_{k} p_{ik} \, s_{kj} \qquad (7)$$
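A minimal sketch of equations (2)-(7) as reconstructed above (the feature vectors and the transition matrix hold hypothetical values, and the normalization follows the assumed form of equation (4)):

```python
# Sketch of the similarity matrix (equations (2)-(5)) and the relevance
# matrix R = P * S (equations (6)-(7)). All numeric values are illustrative.
import numpy as np

users = np.array([[5.0, 0.0, 2.0], [4.0, 1.0, 2.0], [0.0, 6.0, 1.0]])

# Pairwise Euclidean distances between user feature vectors (equation (3)).
d = np.linalg.norm(users[:, None, :] - users[None, :, :], axis=2)

# Normalize distances to [0, 1] and convert to similarities (equations (4)-(5)).
S = 1.0 - d / d.max()

# Given a transition matrix P of matching dimension, the relevance matrix is
# the matrix product of transition and similarity matrices (equation (6)).
P = np.array([[0.0, 0.5, 0.5], [0.7, 0.0, 0.3], [0.2, 0.8, 0.0]])
R = P @ S
print(R.round(2))
```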
4. CONCLUSION
As there is a large amount of data in the web pages of many websites, it is better to place pages according to their category. In this paper, users' browsing behavior is first preprocessed using hierarchical clustering, and then prediction is done in three phases. In the first phase, category prediction is done using the Markov model; in the second phase, page prediction is done; and lastly, keyword based filtering is done, which gives more accurate results.

5. REFERENCES
[1] Agrawal R, Imielinski T and Swami A, "Mining Association Rules between Sets of Items in Large Databases", ACM SIGMOD Conference on Management of Data, pp. 207-216.
[2] Trilok Nath Pandey, Ranjita Kumari Dash, Alaka Nanda Tripathy, Barnali Sahu, "Merging Data Mining Techniques for Web Page Access Prediction: Integrating Markov Model with Clustering", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 6, No 1, November 2012.
[3] Chu-Hui Lee, Yu-Hsiang Fu, "Web Usage Mining based on Clustering of Browsing Features", Eighth International Conference on Intelligent Systems Design and Applications, IEEE, 2008.
[4] Chu-Hui Lee, Yu-lung Lo, Yu-Hsiang Fu, "A Novel Prediction Model based on Hierarchical Characteristic of Web Site", Expert Systems with Applications 38, 2011.
[5] V. Sujatha, Punithavalli, "Improved User Navigation Pattern Prediction Technique From Web Log Data", Procedia Engineering 30, 2012.
[6] Sonal Vishwakarma, Shrikant Lade, Manish Kumar Suman and Deepak Patel, "Web User Prediction by: Integrating Markov with Different Features", Vol. 2, IJERST, 2013.


[7] Deshpande M and Karypis G (2004), "Selective Markov Models for Predicting Web-Page Accesses", ACM Transactions on Internet Technology (TOIT), Vol. 4, No. 2, pp. 163-184.
[8] UCI KDD archive, http://kdd.ics.uci.edu/
[9] V.V.R. Maheswara Rao, Dr. V. Valli Kumari, "An Efficient Hybrid Predictive Model to Analyze the Visiting Characteristics of Web User using Web Usage Mining", 2010 International Conference on Advances in Recent Technologies in Communication and Computing, IEEE.
[10] A. Anitha, "A New Web Usage Mining Approach for Next Page Access Prediction", International Journal of Computer Applications, Volume 8, No. 11, October 2010.
[11] Mehrdad Jalali, Norwati Mustapha, Md. Nasir Sulaiman, Ali Mamat, "WebPUM: A Web-Based Recommendation System to Predict User Future Movements", Expert Systems with Applications 37, 2010.
[12] www.microarrays.ca/services/hierarchical_clustering.pdf

International Journal of Computer Applications Technology and Research
Volume 3– Issue 12, 831 - 835, 2014, ISSN:- 2319–8656

Performance Prediction of Service-Oriented Architecture - A Survey

Haitham A. Moniem
College of Graduate Studies,
Sudan University of Science and Technology,
Khartoum, Sudan

Hany H. Ammar
Lane Department of Computer Science and Electrical Engineering,
College of Engineering and Mineral Resources,
West Virginia University,
Morgantown, USA

Abstract: Performance prediction and evaluation for SOA based applications assist software consumers in estimating their applications based on service specifications created by service developers. Incorporating traditional performance models such as Stochastic Petri Nets, Queuing Networks, and Simulation presents drawbacks for SOA based applications due to special characteristics of SOA such as loose coupling, self-containment, and interoperability. Although researchers have suggested many methods in this area during the last decade, none of them has obtained popular industrial use. Based on this, we have conducted a comprehensive survey of these methods to estimate their applicability. This survey classifies these approaches according to the performance metrics analyzed, the performance models used, and the applicable project stage. Our survey helps SOA architects to select the appropriate approach based on the target performance metric, and researchers to identify the state of the art in SOA performance prediction.

Keywords: Service; Service-Oriented Architecture; Performance; Prediction; Evaluation

1. INTRODUCTION
Service-Oriented Architecture (SOA) is an architectural style as well as a technology for delivering services to either users or other services through a network. SOA was created in order to satisfy business goals that include easy and flexible integration with other systems. SOA has many advantages such as reducing development costs, creative services to customers, and agile deployment [1].

There are many definitions for SOA, but they all point to the same core idea: that SOA is simply a collection of application services. A service is defined as "a function or some processing logic or business processing that is well-defined, self-contained, and does not depend on the context or state of other services" [2]. It is also stated that "Generally SOA can be classified into two terms: Services and Connectors."

The Object Management Group (OMG) defines SOA as "an architectural style that supports service orientation". It goes further to define service orientation: "service orientation is a way of thinking in terms of services and services-based development and the outcomes of services". Moreover, SOA is communication between services and applications which sometimes involves data transfer. But the communication between applications does not happen as a point-to-point interaction; instead, it happens through a platform-independent, general purpose middleware that handles all communications by the use of web services [2].

The main goal of this paper is to report a detailed survey of performance prediction for SOA. Section 2 lays out the important concepts of SOA. Section 3 explains by a diagram an example of an SOA based application and how the system exchanges messages. Section 4 presents the performance metrics of SOA based applications; we considered three important metrics, which are response time, throughput, and resource utilization. Section 5 summarizes the previous work in Table 1, mentioning the date of the published paper, the name of the authors, objectives, performance metrics, performance model, and applicability stage. Section 6 concludes the paper.

2. SERVICE-ORIENTED ARCHITECTURE CONCEPTS
This part briefly describes some important concepts related to SOA.

2.1 Enterprise Service Bus (ESB)
An ESB is a standard infrastructure that combines messaging, web services, data transformation, and intelligent routing in a highly distributed and heterogeneous environment [7] [9].

2.2 Business Process Execution Language (BPEL)
BPEL is a language for designing SOA based systems. It contains many facilities such as web services composition, publishing available services, organizing service execution, and handling exceptions.

2.3 ACME
ACME is a generic language for describing software architecture. It presents constructs for describing systems as graphs of components interacting through connectors [11].

3. EXAMPLE
Figure 1 presents an example of SOA architecture. The example explains in steps the flow of requests and responses between the service provider, the service consumer, and the directory. Step 1: the service provider publishes its service description on a directory. Step 2: the consumer performs queries to the directory to locate a service and find out how to communicate with the provider. Step 3: the service description is written in a special language called Web Service Description Language (WSDL). Step 4: messages are sent to and received from the directory in a special language called Simple Object Access Protocol (SOAP). Step 5: the consumer formulates its message to the provider using a tag based language called Extensible Markup Language (XML); the message is generated in XML, but it is

www.ijcat.com 831
International Journal of Computer Applications Technology and Research
Volume 3– Issue 12, 831 - 835, 2014, ISSN:- 2319–8656

based on specifications defined in WSDL. Step 6: the response generated by the provider is also in tag based XML format.

Figure 1. Example of SOA architecture (the provider registers/publishes its WSDL service description on the directory; the consumer finds/locates the service, binds via SOAP, and exchanges XML service requests and responses with the provider)

Based on ISO 9126, the performance metrics are response time, throughput, and resource utilization [12]. Therefore, accurate measuring of an SOA application plays an important role in business success. If the application has efficient performance, this will lead to high productivity, good hardware utilization, and customer satisfaction. Otherwise, the SOA based application's capability will have limited benefits, resource wasting, low productivity, and unsatisfied customers.

Applying performance to SOA applications is one of the challenging non-functional quality attributes, because of the physical geographic distribution of services, communication overhead, use of a standard message format, and varying service workload [3]. Performance evaluation and analysis differ in each situation of an SOA based application. However, previous works on service performance are not accurate and practical enough to effectively understand and diagnose the reasons behind performance degradation.

4. SERVICE-ORIENTED ARCHITECTURE PERFORMANCE METRICS
4.1 Service Response Time
Service response time is the measure of the time between the end of a request to a service and the beginning of the service provider's response. There are many considerations in measuring service response time [4], as Figure 2 shows. The main reasons that cause low performance of SOA based applications are:

o Service provider and service requester are positioned at different geographical areas, mostly at different machines.
o The potential problems of XML, which is the standard message format, increase the time needed to process a request.
o The time needed to discover the services through the directory, either at design time or at run time.
o Rules that govern the services contained in a business process, according to the business process's needs.
o Adaptation of the service composition by adding new services or adapting existing services.
o Think time, the elapsed time between the end of a response generated by a service and the beginning of an end user's request [4].

4.2 Throughput
Throughput is defined as the number of requests an SOA application can process in a given period of time. There are two metrics for throughput: throughput of a service and throughput of a business process [4], as Figure 3 shows. The value range of these two metrics, service throughput and business process throughput, must be greater than zero; higher values indicate better SOA application performance.

4.3 Resource Utilization
To analyze the performance of SOA based applications in terms of resource utilization, three basic pieces of information are needed: firstly, workload information, which consists of concurrent users and request arrival rates; secondly, software specification, which consists of the execution path, the components to be executed, and the protocol of contention used by the software [5]; and finally, environmental information, which consists of system specification such as configuration and device service rates, and scheduling policies.
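For illustration, the three metrics of this section can be estimated directly from a request log. The sketch below is not from any surveyed approach; the log format, the observation window, and the single-server utilization simplification are assumptions:

```python
# Illustrative computation of the Section 4 metrics from a hypothetical
# request log of (request_start, response_end) timestamps in seconds.
log = [(0.0, 0.4), (0.5, 1.1), (1.0, 1.3), (2.0, 2.9)]

# Service response time: time between the end of a request and the response.
response_times = [end - start for start, end in log]
mean_response = sum(response_times) / len(response_times)

# Throughput: completed requests per unit of observation time (must be > 0).
window = max(end for _, end in log) - min(start for start, _ in log)
throughput = len(log) / window

# Resource utilization: fraction of the window the service was busy
# (a simplification that ignores overlapping requests and multiple devices).
utilization = sum(response_times) / window

print(f"response={mean_response:.2f}s throughput={throughput:.2f} req/s "
      f"utilization={utilization:.0%}")
```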

Figure 2. Sub-metrics of SOA response times (business process response time decomposed into request/response message processing times, business process execution waiting time, service discovery time, service adaptation time, service composition time, transmission times, service logic execution time, and secondary storage processing and transmission times)

Figure 3. Sub-metrics of SOA throughput (service throughput and business process throughput)

Figure 4. Sub-metrics of SOA resource utilization (CPU usage, memory usage, input/output activity, communication devices, secondary storage processing time, and number of database calls)


5. SOA PREDICTION AND EVALUATION APPROACHES
Several approaches have been created to evaluate and predict the performance of SOA based applications. In the following, we provide summaries of the SOA performance prediction approaches in the scope of the survey. We have divided the approaches into seven columns: author name and year of publication, main objective, prediction approach, analyzed metrics, performance model, validation method, and applicable project stage.
Table 1. Comparison of Several Prediction Approaches

Kounev, Samuel, et al. [6], 2010. Main objective: designing systems with built-in self-aware performance and resource management capabilities. Approach: use of a dynamic architecture-level performance model at run-time for online performance and resource management. Metrics analyzed: response time and resource utilization. Performance model: Queuing Petri Net model. Validation: compared with PCM model results. Applicable project stage: runtime.

Liu, et al. [7], 2007. Main objective: develop a performance model for predicting runtime performance based on a COTS ESB (Enterprise Service Bus). Approach: measure primitive performance overheads of service routing activities in the ESB. Metrics analyzed: throughput and response time. Performance model: Queuing Network model. Validation: compared with the results of the Microsoft Web Stress Tool. Applicable project stage: runtime.

Tribastone, et al. [8], 2010. Main objective: present a method for performance prediction of SOA at an early stage of development. Approach: modeling the system using UML and two profiles, UML4SOA and MARTE. Metrics analyzed: response time and processor utilization. Performance model: Layered Queuing Network model. Validation: compared with Mobile Payment case study performance results. Applicable project stage: design time.

Teixeira, et al. [9], 2009. Main objective: propose an approach to estimate the performance of SOA. Approach: the model uses the Petri Net formalism to represent the process and estimates its performance using simulation. Metrics analyzed: resource consumption and service level degradation. Performance model: Stochastic Petri Nets model. Validation: compared with the (Rud et al.) analytical method and values from real applications. Applicable project stage: design time.

Punitha, et al. [11], 2008. Main objective: developing an architectural performance model for SOA. Approach: building and measuring the performance model using the ACME language. Metrics analyzed: response time, throughput, load capacity, heavily loaded components. Performance model: Queuing Network model. Validation: a prototype SOA application has been implemented and measured. Applicable project stage: design time.

Brüseke, et al. [12], 2014. Main objective: developing PBlaman (Performance Blame Analysis). Approach: comparing the observed response time of each component in a failed test case to the expected response time from the contract. Metrics analyzed: response time. Performance model: Palladio Component Model (PCM). Validation: applied on two case studies. Applicable project stage: design time.

Reddy, et al. [13], 2011. Main objective: modeling web services using UML. Approach: simulate the model using Simulation of Multi-tiered Queuing Applications (SMTQA). Metrics analyzed: response time and server utilization. Performance model: SMTQA model. Validation: applied on a case study. Applicable project stage: design time.

Marzolla, et al. [14], 2007. Main objective: present a multi-view approach for performance prediction of SOA based applications for users and providers. Approach: performance assessment of web service workflows described using annotated BPEL and WSDL specifications. Metrics analyzed: response time and throughput. Performance model: Queuing Network model. Validation: a prototype tool called bpel2qnbound. Applicable project stage: both design time and runtime.
6. CONCLUSION
We have surveyed the state of the art in the research of performance prediction methods for service-oriented architecture based applications. The survey categorized the approaches according to the performance metrics analyzed, the performance model, the method validation, and the approach's applicable stage.

The field of performance evaluation and prediction for service-oriented architecture based applications has developed and matured over the last decade. Many tools and ideas have been implemented as good software engineering practice and should lead to the creation of new approaches.

Our survey helps both architects and researchers. Architects can obtain a complete view of the performance evaluation and prediction approaches proposed, to transfer them to industry; researchers, on the other hand, can align themselves with the proposed approaches and add more features in the future to enhance and enrich the area.

7. REFERENCES
[1] Bianco, P., Kotermanski, R., & Merson, P. F. (2007). Evaluating a service-oriented architecture.
[2] Krafzig, D., Banke, K., & Slama, D. (2005). Enterprise SOA: service-oriented architecture best practices. Prentice Hall Professional.
[3] Erl, T. (2004). Service-Oriented Architecture: Concepts, Technology, and Design.
[4] Her, J. S., Choi, S. W., Oh, S. H., & Kim, S. D. (2007, October). A framework for measuring performance in service-oriented architecture. In Next Generation Web Services Practices, 2007. NWeSP 2007. Third International Conference on (pp. 55-60). IEEE.
[5] Abowd, G., Bass, L., Clements, P., Kazman, R., & Northrop, L. (1997). Recommended Best Industrial Practice for Software Architecture Evaluation (No. CMU/SEI-96-TR-025). Carnegie Mellon University, Pittsburgh, PA: Software Engineering Institute.
[6] Kounev, S., Brosig, F., Huber, N., & Reussner, R. (2010, July). Towards self-aware performance and resource management in modern service-oriented systems. In Services Computing (SCC), 2010 IEEE International Conference on (pp. 621-624). IEEE.
[7] Liu, Y., Gorton, I., & Zhu, L. (2007, July). Performance prediction of service-oriented applications based on an enterprise service bus. In Computer Software and Applications Conference, 2007. COMPSAC 2007. 31st Annual International (Vol. 1, pp. 327-334). IEEE.
[8] Tribastone, M., Mayer, P., & Wirsing, M. (2010). Performance prediction of service-oriented systems with layered queueing networks. In Leveraging Applications of Formal Methods, Verification, and Validation (pp. 51-65). Springer Berlin Heidelberg.
[9] Teixeira, M., Lima, R., Oliveira, C., & Maciel, P. (2009, October). Performance evaluation of service-oriented architecture through stochastic Petri nets. In Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference on (pp. 2831-2836). IEEE.
[10] Balsamo, S., Mamprin, R., & Marzolla, M. (2004). Performance evaluation of software architectures with queuing network models. Proc. ESMc, 4.
[11] Punitha, S., & Babu, C. (2008, September). Performance prediction model for service oriented applications. In High Performance Computing and Communications, 2008. HPCC'08. 10th IEEE International Conference on (pp. 995-1000). IEEE.
[12] Brüseke, F., Wachsmuth, H., Engels, G., & Becker, S. (2014). PBlaman: performance blame analysis based on Palladio contracts. Concurrency and Computation: Practice and Experience.
[13] Reddy, C. R. M., Geetha, D. E., Srinivasa, K. G., Kumar, T. S., & Kanth, K. R. (2011). Predicting performance of web services using SMTQA. International Journal of Computer Science Information Technology, 1(2), 58-66.
[14] Marzolla, M., & Mirandola, R. (2007). Performance prediction of web service workflows. In Software Architectures, Components, and Applications (pp. 127-144). Springer Berlin Heidelberg.

International Journal of Computer Applications Technology and Research
Volume 3– Issue 12, 836 - 838, 2014, ISSN:- 2319–8656

Location Based Tracking System for Emergency Services
T. Swathi
Aurora's Technological and Research Institute
Uppal, Hyderabad, India

B.S. Malleswari
Aurora's Technological and Research Institute
Uppal, Hyderabad, India

Abstract: Transmitting the geo-location information of a target via wireless networks is effective when both the target and the tracker are within a Wi-Fi coverage area; however, 802.11 wireless networks are not always accessible, and when the target or the tracker is unable to access Wi-Fi, it is impossible to perform location tracking. SMS is therefore a relatively more reliable and flexible solution because of its widespread use. In such a system, a device is equipped with a global system for mobile communications (GSM) modem and a GPS unit, and it transmits short messages containing its GPS coordinates to the server at 30-s intervals. In this system, a novel method called location-based delivery (LBD), which combines the short message service (SMS) and the global positioning system (GPS), is proposed. LBD reduces the number of short message transmissions while maintaining the location tracking accuracy within the acceptable range. The proposed approach consists of three primary features: short message format, location prediction, and dynamic threshold. The defined short message format is proprietary.

Key Words: Short Message Service (SMS), Location Tracking, Mobile Phones, Prediction Algorithms, Global Positioning System (GPS).

1. INTRODUCTION
Location based tracking and handling of devices based on the global positioning system (GPS) is common in the growing world, and therefore several location tracking applications have been developed, including continuous location based transport, vehicle based intelligent transport, monitoring vehicles, and tracking elders, children, and women employees for their safety or to prevent them from being lost. The GPS is mainly used to obtain the geographical location of an object (e.g., a transmitter device or mobile device). However, most of the above-cited works used either an 802.11 wireless network or the short message service (SMS) to transmit the location information of a target to a tracker. Real time tracking systems are majorly used for care management applications for children and mentally challenged people; the main aim of such a system is to transfer the location and position of the target's mobile device to a central GPS application server through the 802.11 wireless networks. This application allows the server to simultaneously monitor multiple targets (e.g., elders or children); this is in line with Lee et al. Further, Choi et al. assumed that the location information of a target is transmitted through wireless networks; their work focused on proposing a geolocation update scheme to decrease the update frequency. Lita et al. proposed an automobile localization system using SMS. The proposed system, which is interconnected with the car alarm system, transmits alerts to the owner's mobile phone in the event of a car theft (e.g., activation of the car alarm, starting of the engine) or provides information for monitoring adolescent drivers (e.g., exceeding the speed limit or leaving a specific area). Hameed et al. proposed a car monitoring and tracking system that uses both SMS and GPS to prevent car theft. Anderson et al. proposed a transportation information system. In this system, a hardware device called Star Box, which is equipped with a global system for mobile communications (GSM) modem and a GPS unit, is installed in a vehicle to track the vehicle's location. Star Box transmits short messages containing its GPS coordinates to the server at 30-s intervals. The users can send short messages to the server to determine the expected arrival time of buses at their locations. Although transmitting the geo-location information of a target via wireless networks is effective when both the target and the tracker are within a Wi-Fi coverage area, the 802.11 wireless networks are not always accessible. When the target or the tracker is unable to access Wi-Fi, it is impossible to perform location tracking. Therefore, SMS is a relatively more reliable and flexible solution because of its widespread use (i.e., well-structured worldwide) [6], [8]. However, SMS is a user-pay service. The objective of this study is to minimize the transmission cost of a tracking system by minimizing the number of SMS transmissions while maintaining the location tracking accuracy.

In this paper, a novel method called location-based delivery (LBD), which combines SMS and GPS, is proposed, and further, a realistic system to perform precise location tracking is developed. LBD mainly applies the following two proposed techniques: location prediction and dynamic threshold. Location prediction is performed by using the current location, moving speed, and bearing of the target to predict its next location. When the distance between the predicted location and the actual location exceeds a certain


threshold, the target transmits a short message to the tracker to update its current location. The dynamic threshold maintains the location tracking accuracy and the number of short messages on the basis of the moving speed of the target. The simulations performed to test the performance of LBD show that, compared with other related works, the proposed LBD minimizes the number of short message transmissions while maintaining the location prediction accuracy within the acceptable range.

Figure 1. Overview of location based tracking.

2. SHORT MESSAGE SERVICE
SMS is a text messaging service component of phone, web, or mobile communication systems, using standardized communications protocols that allow the exchange of short text messages between fixed line or mobile phone devices. SMS text messaging is the most widely used data application in the world, with 3.6 billion active users, or 78% of all mobile phone subscribers. A short message is transmitted from the mobile station (MS) to the GSM base station (BTS) through a wireless link and is received in the backbone network of the service provider. The mobile switching center (MSC), home location register (HLR), and visitor location register (VLR) determine the appropriate short message service center (SMSC), which processes the message by applying the "store and forward" mechanism. The term SMS is used as a synonym for all types of short text messaging, as well as the user activity itself, in many parts of the world. SMS is also used as a form of direct marketing known as SMS marketing. SMS as used on modern handsets originated from radio telegraphy in radio memo pagers using standardized phone protocols, and was later defined as part of the GSM (Global System for Mobile Communications) series of standards in 1985 as a means of sending messages of up to 160 characters to and from GSM mobile handsets. Since then, support for the service has expanded to include other mobile technologies such as ANSI CDMA networks and digital AMPS, as well as satellite and landline networks. Most SMS messages are mobile-to-mobile text messages, though the standard supports other types of broadcast messaging as well.

3. LOCATION BASED DELIVERY
Location based services (LBS) are a class of computer applications which combine the short message service (SMS) and the global positioning system (GPS). As such, LBS is an informative service with a number of uses in social networking today, including as an entertainment service; it is accessible with mobile devices through the mobile network and uses information on the geographical position of the mobile device.

Figure 2. Structure of the LBD system.

The proposed approach, LBD, consists of three primary features: short message format, location prediction, and dynamic threshold. The defined short message format is proprietary. Location prediction is performed by using the current location, moving speed, and bearing of the target to predict its next location. When the distance between the predicted location and the actual location exceeds a certain threshold, the target transmits a short message to the tracker to update its current location. The threshold is dynamically adjusted to maintain the location tracking accuracy and the number of short messages on the basis of the moving speed of the target.
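A minimal sketch of the two LBD techniques described above, assuming a simple dead-reckoning prediction and a speed-dependent threshold (the threshold constants and the trigger are illustrative assumptions; the paper's proprietary short message format is not reproduced):

```python
# Illustrative sketch of LBD location prediction: dead-reckon the target's next
# position from its current position, speed, and bearing, then send an SMS
# update only when the prediction error exceeds a speed-dependent threshold.
import math

R_EARTH = 6371000.0  # mean Earth radius in meters

def predict(lat, lon, speed_mps, bearing_deg, dt):
    """Project a position forward dt seconds along a great circle."""
    d = speed_mps * dt / R_EARTH           # angular distance
    b = math.radians(bearing_deg)
    lat1, lon1 = math.radians(lat), math.radians(lon)
    lat2 = math.asin(math.sin(lat1) * math.cos(d) +
                     math.cos(lat1) * math.sin(d) * math.cos(b))
    lon2 = lon1 + math.atan2(math.sin(b) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (cf. reference [15])."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R_EARTH * math.asin(math.sqrt(a))

def threshold(speed_mps):
    # Hypothetical dynamic threshold: allow more error at higher speed.
    return 50.0 + 2.0 * speed_mps  # meters

pred = predict(24.95, 121.16, speed_mps=15.0, bearing_deg=90.0, dt=30.0)
actual = (24.95, 121.17)  # position reported by the target's GPS unit
if haversine(*pred, *actual) > threshold(15.0):
    print("send SMS location update")  # stand-in for the proprietary message
```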
4. CONCLUSION
In this system, a novel method called location-based delivery (LBD), which combines the short message service (SMS) and the global positioning system (GPS), has been proposed. LBD reduces the number of short message transmissions while maintaining the location tracking accuracy within the acceptable range. The proposed approach consists of three primary features: short message format, location prediction, and dynamic threshold. The defined short message format is proprietary. Location prediction is performed by using the current location, moving speed, and bearing of the target to predict its next location. When the distance between the predicted location and the actual location exceeds a certain threshold, the target transmits a short message to the tracker to update its current location. The threshold is dynamically adjusted to maintain the location tracking accuracy and the number of short messages on the basis of the moving speed of the target.


5. REFERENCES
[1] H. H. Lee, I. K. Park, and K. S. Hong, "Design and implementation of a mobile devices-based real-time location tracking," in Proc. UBICOMM, 2008, pp. 178–183.
[2] Z. Tian, J. Yang, and J. Zhang, "Location-based services applied to an electric wheelchair based on the GPS and GSM networks," in Proc. ISA, 2009, pp. 1–4.
[3] I. Lita, I. B. Cioc, and D. A. Visan, "A new approach of automobile localization system using GPS and GSM/GPRS transmission," in Proc. ISSE, 2006, pp. 115–119.
[4] P. Perugu, "An innovative method using GPS tracking, WINS technologies for border security and tracking of vehicles," in Proc. RSTSCC, 2010, pp. 130–133.
[5] S. A. Hameed, O. Khalifa, M. Ershad, F. Zahudi, B. Sheyaa, and W. Asender, "Car monitoring, alerting, and tracking model: Enhancement with mobility and database facilities," in Proc. ICCCE, 2010, pp. 1–5.
[6] R. E. Anderson, A. Poon, C. Lustig, W. Brunette, G. Borriello, and B. E. Kolko, "Building a transportation information system using only GPS and basic SMS infrastructure," in Proc. ICTD, 2009, pp. 233–242.
[7] W. J. Choi and S. Tekinay, "Location-based services for next-generation wireless mobile networks," in Proc. IEEE VTC, 2003, pp. 1988–1992.
[8] R. E. Anderson, W. Brunette, E. Johnson, C. Lustig, A. Poon, C. Putnam, O. Salihbaeva, B. E. Kolko, and G. Borriello, "Experiences with a transportation information system that uses only GPS and SMS," in Proc. ICTD, 2010.
[9] A. Civilis, C. S. Jensen, and S. Pakalnis, "Techniques for efficient road network-based tracking of moving objects," IEEE Trans. Knowl. Data Eng., vol. 17, no. 5, pp. 698–712, 2005.
[10] M. Zahaby, P. Gaonjur, and S. Farajian, "Location tracking in GPS using Kalman filter through SMS," in Proc. IEEE EUROCON, 2009, pp. 1707–1711.
[11] A. Civilis, C. S. Jensen, J. Nenortaite, and S. Pakalnis, "Efficient tracking of moving objects with precision guarantees," in Proc. MOBIQUITOUS, 2004, pp. 164–173.
[12] Y. Y. Xiao, H. Zhang, and H. Y. Wang, "Location prediction for tracking moving objects based on grey theory," in Proc. FSKD, 2007, pp. 390–394.
[13] P. H. Tseng, K. T. Feng, Y. C. Lin, and C. L. Chen, "Wireless location tracking algorithms for environments with insufficient signal sources," IEEE Trans. Mobile Comput., vol. 8, no. 12, pp. 1676–1689, 2009.
[14] R. Bajaj, S. L. Ranaweera, and D. P. Agrawal, "GPS: Location-tracking technology," Computer, vol. 35, no. 4, pp. 92–94, 2002.
[15] Movable Type Scripts. (2012, June). [Online]. Available: http://www.movable-type.co.uk/scripts/latlong.html

International Journal of Computer Applications Technology and Research
Volume 3– Issue 12, 839 - 843, 2014, ISSN:- 2319–8656

Spam filtering by using Genetic based Feature Selection

Sorayya Mirzapour Kalaibar
Department of Computer, Shabestar Branch,
Islamic Azad University,
Shabestar, Iran

Seyed Naser Razavi
Computer Engineering Department,
Faculty of Electrical and Computer Engineering,
University of Tabriz, Iran

Abstract:
Spam is defined as redundant and unwanted electronic mail, and nowadays it has created many problems in business life, such as occupying network bandwidth and the space of users' mailboxes. Due to these problems, much research has been carried out in this regard using classification techniques. Recent research shows that feature selection can have a positive effect on the efficiency of machine learning algorithms. Most algorithms try to present a data model depending on the reliable detection of a small set of features. Unrelated features in the process of building the model result in weak estimation and more computation. In this research, we evaluate spam detection against legitimate electronic mail, and the effect of feature selection on several machine learning algorithms, by presenting a feature selection method based on a genetic algorithm. Bayesian network and KNN classifiers have been taken into account in the classification phase, and the spam base dataset is used.

Keywords: Email spam, feature selection, genetic algorithm, classification.

1. INTRODUCTION
Nowadays, e-mail is widely becoming one of the fastest and most economical forms of communication. Thus, e-mail is prone to be misused. One such misuse is the posting of unsolicited, unwanted e-mails known as spam or junk e-mails [1]. Spam is becoming an increasingly large problem. Many Internet Service Providers (ISPs) receive over a billion spam messages per day. Much of this e-mail is filtered before it reaches end users. Content-based filtering is a key technological method of e-mail filtering. The spam e-mail contents usually contain common words called features. The frequency of occurrence of these features inside an e-mail gives an indication of whether the e-mail is spam or legitimate [2, 3, 4]. There are various purposes in sending spam, such as economic ones. Some spam messages are unwanted advertising and commercial messages, while others deceive the users into disclosing their private information (phishing), or they temporarily disable the mail server by sending malicious software to the user's computer. They also create traffic or distribute immoral messages. Therefore, it is necessary to find some way to filter these troublesome and annoying emails automatically. In order to detect spam, some methods such as parameter optimization and feature selection have been proposed in order to reduce processing overhead and to guarantee a high detection rate [16]. Spam filtering is a highly sensitive application of the text classification (TC) task. A main problem in text classification tasks, which is more serious in email filtering, is the existence of a large number of features. To solve this issue, various feature selection methods are considered, which extract a lower dimensional feature space from the original one and offer it as input to the classifier [5]. In this paper, we incorporate a genetic algorithm to find an optimal subset of features of the spam base data set. The selected features are used for classification of the spam base.

2. LITERATURE REVIEW
Feature selection approaches are usually employed to reduce the size of the feature set and to select a subset of the original features. Over the past years, population based algorithms have been considered to select important features and to remove irrelevant and redundant features, such as the genetic algorithm (GA), particle swarm optimization (PSO), and the ant colony algorithm (ACO). Some algorithms have been developed to classify and filter e-mails. The RIPPER algorithm [6] is a rule-based algorithm for filtering e-mails. Drucker, et al. [7] proposed an SVM algorithm for spam categorization. Sahami, et al. [8] proposed a Bayesian junk e-mail filter using the bag-of-words representation and the Naïve Bayes algorithm. Clark, et al. [9] used the bag-of-words representation and an ANN for an automated spam filtering system. Branke, J. [10] discussed how the genetic algorithm can be used to assist in designing and training. Riley, J. [11] described a method of utilizing genetic algorithms to train fixed architecture feed-forward and recurrent neural networks. Yao, X. and Liu, Y. [12] reviewed the different combinations of ANN and GA, and used GA to evolve ANN connection weights, architectures, learning rules, and input features. Wang et al. presented feature selection based on a genetic algorithm incorporated with a support vector machine based on SRM to distinguish spam and legitimate emails; the presented method had better results than plain SVM [13]. Zhu developed a new method based on rough sets and SVM in order to improve the level of classification: the rough set was used for feature selection to decrease the number of features, and SVM as a classifier [14]. Fagboula et al. considered GA to select an appropriate subset of features, and they used SVM as a classifier; in order to improve the classification accuracy and computation time, experiments were carried out on the Spam Assassin data set [15]. Patwadhan and Ozarkar presented the random forest algorithm and partial decision trees for spam classification. Some feature selection methods have been used as a preprocessing stage, such as correlation based feature selection, Chi-square, Entropy, Information Gain, Gain Ratio, Mutual Information, Symmetrical Uncertainty, One R and Relief. Using the above mentioned methods results in selecting more efficient and useful features, decreases time complexity, and increases accuracy [17].

3. GENETIC ALGORITHMS
A genetic algorithm (GA) is one of a number of heuristic techniques that are based on natural selection; from the population members it attempts to find high-quality solutions to large and complex optimization problems. This algorithm can identify and exploit regularities in the environment, and converges on solutions (which can also be regarded as locating the local maxima) that are globally optimal [18]. This method is very effective and widely used to find optimal or near-optimal solutions to a wide variety of problems. The genetic algorithm repeatedly modifies a population of individual solutions. At each step the genetic algorithm tries to select the best individuals. From the current "parent" population, the genetic algorithm creates "children", who constitute the next generation. Over successive generations, the population evolves toward an optimal solution. The genetic algorithm uses three main rules at each step to create the next generation: selection rules select the individuals, called parents, that contribute to the population at the next generation; crossover rules combine two parents to form children for the next generation; and mutation rules apply random changes to individual parents to form children.

4. FEATURE SELECTION
Feature selection approaches are usually employed to reduce the size of the feature set and to select a subset of the original features. We use the proposed genetic algorithm to optimize the features that contribute significantly to the classification.

4.1 Feature Selection Using the Proposed Genetic Algorithm
In this section, the method of feature selection using the proposed genetic algorithm is presented. The procedure of the proposed method is stated in detail in the following sections.

4.1.1 Initialize population
In the genetic algorithm, each solution to the feature selection problem is a string of binary numbers, called a chromosome. In this algorithm the initial population is generated randomly. In the feature representation as a chromosome, if the value of chromosome[i] is 1, the ith feature is selected for classification, while if it is 0, then this feature is removed [19, 20]. Figure 1 shows the feature representation as a chromosome.

Chromosome: 1 0 1 ... 1 0
Figure 1. Feature subset: {F1, F3, ..., Fn-1}

In this research, we used the weighted F-score to calculate the fitness value of each chromosome. The algorithm starts by randomly initializing a population of N initial chromosomes.

4.1.2 Crossover
The crossover is the most important operation in GA. Crossover, as the name suggests, is a process of recombination of bit strings via an exchange of segments between pairs of chromosomes. There are various kinds of crossover. In one-point crossover, a bit position that needs to change is randomly selected: a random number is generated (less than or equal to the chromosome length) and used as the crossover position [21]. Here one crossover point is selected; the binary string from the beginning of the chromosome to the crossover point is copied from one parent, and the rest is copied from the second parent [22].

4.1.3 Proposed mutation
Mutation has the effect of ensuring that all possible chromosomes can maintain good genes in the newly generated chromosomes. In our approach, the mutation operator is a two-step process and is a combination of random and substitution mutation operators; it also occurs on the basis of two different mutation rates. The mutation operator firstly performs the substitution step with a probability of 0.03: in each generation, the best chromosome, involving better features and higher fitness, is selected, and it substitutes for the weakest chromosome, the one having lower fitness than the others. (This step transfers the superior chromosome of the current generation to the next generation, which also leads to faster convergence of the algorithm.) Otherwise, it enters the second mutation step with a probability of 0.02. This step changes some genes of the chromosome randomly by inverting their binary cells. In fact, the second step is considered in order to avoid reducing the exploration capability over the search space and to keep diversity among the other chromosomes. Overall, the mutation probability is equal to 0.05.
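The operators of Sections 4.1.1-4.1.3 can be sketched as follows. This is a condensed, illustrative sketch: the fitness function is a toy stub standing in for the weighted F-score of a trained classifier, and the parameter values are those of Table 1:

```python
# Minimal sketch of the GA operators of Sections 4.1.1-4.1.3: random binary
# chromosomes, one-point crossover, and two-step (elitist substitution +
# random bit inversion) mutation.
import random

N_FEATURES = 57   # spam base features
POP_SIZE = 80     # initial population (Table 1)

def fitness(chrom):
    # Placeholder: in the paper this is the weighted F-score of a classifier
    # trained on the features where chrom[i] == 1. Toy objective: ~20 features.
    return -abs(sum(chrom) - 20)

def one_point_crossover(a, b):
    # One-point crossover [21, 22]: head from one parent, tail from the other.
    point = random.randrange(1, N_FEATURES)
    return a[:point] + b[point:]

def mutate(pop):
    # Two-step mutation (Section 4.1.3).
    if random.random() < 0.03:          # step 1: elitist substitution
        best = max(pop, key=fitness)
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        pop[worst] = best[:]
    elif random.random() < 0.02:        # step 2: random bit inversion
        chrom = random.choice(pop)
        chrom[random.randrange(N_FEATURES)] ^= 1

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]
for _ in range(100):                    # generations (Table 1)
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP_SIZE // 2]      # simple truncation selection
    pop = [one_point_crossover(*random.sample(parents, 2))
           if random.random() < 0.7     # crossover rate (Table 1)
           else random.choice(parents)[:]
           for _ in range(POP_SIZE)]
    mutate(pop)

print(sum(max(pop, key=fitness)), "features selected")
```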
5. RESULTS SIMULATION
In order to investigate the impact of our approach on email spam classification, the spam base data set downloaded from the UCI Machine Learning Repository is used [23]. The spam base data set, involving 4601 emails, was proposed by Mark Hopkins and his colleagues. The class label in this data set is divided into two parts: 1 indicates spam, and zero indicates non-spam. The data set involves 57 features with continuous values. In the simulation of the proposed method, a training set involving 70% of the main data set and two test sets have been separately considered for feature selection and classification; each test set involves 15% of the main data set. After performing feature selection using the training set, the test set was used to evaluate the selected subset of features. The evaluation of the overall process was based on the weighted F-score, which is a suitable measure for the spam classification


problem. The performance of spam filtering techniques is determined by two well-known measures used in text classification: precision and recall [24, 25]. Here four metrics have been used for evaluating the performance of the proposed method: precision, accuracy, recall and F1 score. These metrics are computed as follows:

$$\pi_i = \frac{TP_i}{TP_i + FP_i} \qquad (1)$$

$$\rho_i = \frac{TP_i}{TP_i + FN_i} \qquad (2)$$

$$F_1 = \frac{2\,\pi\,\rho}{\pi + \rho} \qquad (3)$$

$$Accuracy = \frac{TP_i + TN_i}{TP_i + FP_i + TN_i + FN_i} \qquad (4)$$

where:
TP_i = the number of test samples that have been properly classified in class c_i.
FP_i = the number of test samples that have been incorrectly classified in class c_i.
TN_i = the number of test samples belonging to class c_i that have been correctly classified in other classes.
FN_i = the number of test samples belonging to class c_i that have been incorrectly classified in other classes.
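For concreteness, equations (1)-(4) can be computed directly from confusion-matrix counts, as in the sketch below (the counts are hypothetical):

```python
# Evaluation metrics of equations (1)-(4) for the spam class, computed from
# hypothetical confusion-matrix counts on a test set.
tp, fp, tn, fn = 250, 20, 400, 30

precision = tp / (tp + fp)                          # equation (1)
recall = tp / (tp + fn)                             # equation (2)
f1 = 2 * precision * recall / (precision + recall)  # equation (3)
accuracy = (tp + tn) / (tp + fp + tn + fn)          # equation (4)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```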
FNi = the number of test samples belonging to ci class, results were presented in comparison to KNN.
and have been incorrectly classified in other classes.
The methods of Bayesian network and K nearest Table 2: comparing feature selection methods in
neighbors algorithm (KNN) have been used for
classification. The executed program and the obtained terms of accuracy
average have been compared 8 times to investigate the
performance of each classifier. The results obtained
from the proposed method of feature selection have Algorithms
been compared without considering feature selection. All Feature GA FS
The obtained results show that when the parameters are classifier
presented in tables 1 the best performance is observed
in terms of GAFS.
Bayesian network 0.891 0.918
Table 1: the parameters of feature selection by using
genetic algorithm
KNN (N=1) 0.9 0.891

Initial population
80

Mutation rate1
0.03

Mutation rate2
0.02

Crossover
0.7

Generations
100


Figure 3. Column graph comparing the number of selected features.

Table 3. Comparing feature selection methods
precision: KNN (N=1) All Feature 0.892, GA FS 0.886; Bayesian network All Feature 0.89, GA FS 0.935
recall: KNN (N=1) All Feature 0.871, GA FS 0.860; Bayesian network All Feature 0.851, GA FS 0.869
F1 score: KNN (N=1) All Feature 0.882, GA FS 0.871; Bayesian network All Feature 0.87, GA FS 0.900

7. CONCLUSION

In this paper, the proposed GA-based feature selection method has been presented and evaluated on the Spambase data set. The results obtained from the proposed method were compared with those obtained without feature selection. The obtained results show that, considering the number of removed features, the proposed method achieves accuracy comparable with methods that use no feature selection. In addition, the Bayesian network classifier gave better results than the other classifier, and all evaluation criteria were considerably improved; the proposed method therefore has a considerable effect on reducing the number of features and improving accuracy. Parameter optimization can be applied to this work, and the proposed algorithm can be combined with other classification algorithms in the future.