Securemail PDF
net/publication/288165928
Malicious Email Detection & Filtering System Using Bayesian Machine Learning
Algorithm
1 author:
Khurram Hameed
The Islamia University of Bahawalpur
All content following this page was uploaded by Khurram Hameed on 26 December 2015.
Authors
Rao Muhammad Umer 10CS61
2010-IU-967
Tayyeb Islam 10CS76
2010-IU-972
Awais Anwar 10CS119
2010-IU-994
Hussain Ahmed Madni 10CS67
2010-IU-970
Supervisor
2010-2014
ABSTRACT
users are identified by determining the keywords they use. Keywords such as "bomb" and "RDX", as well as harmful attached files, are found in the mails sent by the user. All these blocked mails are checked by the administrator, who identifies the users who
Malicious e-mail is the term used to describe any code in any part of a software system or script that is intended to cause undesired effects, security breaches or damage to a system. Malicious code describes a broad category of system security terms that includes attack scripts, viruses, worms, Trojan horses, backdoors, and malicious active content.
UNDERTAKING
I certify that research work titled “Malicious Email Detection & Filtering System
Using Bayesian Machine Learning Algorithm” is my own work. The work has not
been presented elsewhere for assessment. Where material has been used from other
Signature of Students
ACKNOWLEDGEMENTS
We are grateful to Allah Almighty, who provided us with this wonderful concept and bestowed upon us the ability to complete it successfully. We are thankful to our parents for their constant support and their financial help, as the project wouldn’t have been
We are grateful to our supervisor, Engr. Khurram Hameed, who helped us out of many hitches, guided us on many occasions and motivated us to carry on our work, which resulted in the successful completion of our project. Finally, we thank all the faculty members of this great institution who helped us over the course of our studies. This project wouldn’t have been possible without their support.
TABLE OF CONTENTS
Undertaking
Acknowledgement
3.2.1 Model-1 (MVC)
3.2.2 Model-2 (MVC)
4.1.2 Comparison between Waterfall and Agile Software Process Model
4.2 Scrum
4.2.1 User Story
5.3.2.1 Admin
5.3.2.2 User
5.3.3.1 Admin
5.3.3.2 User
5.3.4 Sequence Diagram
5.3.4.1 Admin
5.3.4.2 User
5.3.5.1 Admin
5.3.5.2 User
5.3.6.1 Admin
5.3.6.2 User
5.3.7.1 Admin
5.3.7.2 User
5.3.8.1 Admin
5.3.8.2 User
5.3.9.1 Admin
5.3.9.2 User
6.1 Java
6.1.2 Java Platform
6.1.4 Java As Emerging Technology
6.2 Servlets
6.2.1 Introduction
6.2.2 Attractiveness of Servlets
6.2.4 Features of Servlets
6.2.5 Loading Servlets
6.2.6 Invoking Servlets
6.3 JSP
6.4.1 Simple Syntax
6.5 JSTL
6.6 Java Beans
6.8 HTML
6.8.1 Objectives
6.8.2 Prerequisites
6.9 CSS
6.9.2 Main Facilities
6.9.3 Browser Support
6.9.4 CSS Specification
7.1 Database
7.2 Relational DBMS
7.2.1 Integrity Rules
7.2.2 Select Statements
7.2.3 Where Clauses
7.2.4 Join
7.2.7 Transactions
7.2.8 Stored Procedures
7.4 JDBC Introduction
7.4.2 JDBC Architecture
8.1 Bayes Theorem
8.1.1 Anti-Spam Filter
8.2 UNICODE Encryption Algorithm
8.2.1 Algorithm
9.1 Testing in Strategies
9.1.1 Unit Testing
9.1.2 Integration Testing
9.1.3 System Testing
9.1.4 Acceptance Testing
9.2 Test Approach
9.2.1 Bottom Up Approach
9.3 Validation
10.1 Hardware Requirements
10.2 Software Requirements
11.1 Eclipse JEE
11.1.1 Package Description
11.1.2 Package Includes
11.3 MySQL Workbench
11.3.1 Design
11.3.2 Develop
11.3.3 Administrator
11.3.5 Database Migration
11.4 Apache Tomcat
11.4.1 Terminology
11.5 Adobe Dreamweaver
12.1 HelloWorld Application
13.1 Conclusion
13.2 Future Enhancements
List of Abbreviations
References
LIST OF FIGURES
Fig 3.4 Model-2 (MVC)
Fig 5.6 Sequence Diagram (User)
Fig 12.5 Set Default Values
CHAPTER 1
INTRODUCTION
Several articles, industry reports and congressional testimonies document the existence of targeted malicious email (TME) sent by malicious threat actors who are not necessarily motivated by profit alone. These malicious emails have been targeted at company executives, government personnel and other individuals with access to sensitive information that an opposing party could use to advance a cause. Current research and commercial methods for detecting illegitimate email are limited to addressing Internet-scale email abuse such as spam; none seek to address targeted malicious emails.
For organizations targeted by these emails, detection is critically important, since these emails can enable the installation of malicious software on the targeted user’s computer system. This malicious software can contain a backdoor that allows a malicious threat actor to gain entrance to an organization’s network and its sensitive information. Whereas conventional unwanted email, such as spam, is sent in bulk to a large number of people on the Internet, TME is sent to very specific individuals.
The techniques that malicious threat actors use to craft and send these targeted emails are different from the techniques used by spammers. Furthermore, since the targeted emails are sent to specific individuals, the characteristics of the recipient are relevant, whereas with spam they are less relevant. This dissertation exploits the differences between spam and TME by capturing features of TME and TME recipients and incorporating them into a decision classifier. The classifier is an algorithm that categorizes a given email as either TME or non-targeted malicious email (NTME).
All organizations allow email to enter their networks. Some attackers target a single user or a small group and extract important information by injecting malicious code into the email or its attachment, which creates a backdoor in the system. If we rely on current conventional detection methods, a targeted email attack goes undetected, and file attachments carrying malicious code also cause trouble in the network. A malicious executable is defined to be a program that performs a malicious function, such as compromising a system’s security, damaging a system or obtaining sensitive information without the user’s permission. Using data mining methods. Every day some malicious programs are created, and most cannot be accurately detected until signatures have been generated for them.
1.2 Objectives
Malicious E-mail is the term used to describe any code in any part of a software
system or script that is intended to cause undesired effects, security breaches or
damage to a system. Malicious code describes a broad category of system security
terms that includes attack scripts, viruses, worms, Trojan horses, backdoors, and
malicious active content.
1.3 Block Diagram
Figure 1.1
CHAPTER 2
PROJECT OVERVIEW
This module is used by the administrator and by authenticated users to log in to Colors mail. Once the login details of the person are entered, that person can enter Colors mail.
Figure 2.1
2.2 Registration Module
This module is used by unregistered, and therefore unauthenticated, users. They must register so that they can log in to Colors mail.
Figure 2.2
This module is used by the administrator to manage the keywords, enter new keywords and check the block list of discarded mails.
Figure 2.3
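The keyword management described for this module can be sketched in plain Java. This is only an illustration under stated assumptions: the class name KeywordFilter, its method names and the plain substring match are hypothetical and are not taken from the project's code, and the sketch does not involve the Bayesian classifier.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the keyword check behind the administrator module.
class KeywordFilter {
    // Keywords managed by the administrator, stored in lower case.
    private final Set<String> keywords = new LinkedHashSet<>();
    // Messages that were discarded because they matched a keyword.
    private final List<String> blockList = new ArrayList<>();

    // Corresponds to "entering new keywords".
    void addKeyword(String keyword) {
        keywords.add(keyword.toLowerCase(Locale.ROOT));
    }

    // Returns true and records the message if it contains any keyword.
    // Matching is a case-insensitive substring test (an assumption).
    boolean checkAndBlock(String message) {
        String text = message.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (text.contains(keyword)) {
                blockList.add(message);
                return true;
            }
        }
        return false;
    }

    // Corresponds to "checking the block list of discarded mails".
    List<String> getBlockList() {
        return new ArrayList<>(blockList);
    }
}
```

With the keyword "RDX" added, a call such as checkAndBlock("Shipment of rdx arrives today") would return true and place that message on the block list, while a message without any keyword would pass through.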
This module is used by the administrator to encrypt the keywords. The encrypted words are sent to the database, where they are managed by the administrator.
Figure 2.4
2.5 User Module
This module is used by the users to compose mail, check the mails in the inbox and send mails, with a message attached, to authenticated users.
Figure 2.5
CHAPTER 3
System Architecture
3.1 N-Tier Architecture
Figure 3.1
architecture then you can place any middleware, such as WebLogic or WebSphere, between your Web Server and Web Browsers.
Figure 3.2
Advantages of Tiers
Layering helps you to maximize maintainability of the code, optimize the way that
the application works when deployed in different ways, and provide a clear
delineation between locations where certain technology or design decisions must be
made.
Placing your layers on separate physical tiers can help performance by distributing
the load across multiple servers. It can also help with security by segregating more
sensitive components and layers onto different networks or on the Internet versus an
intranet.
3.2.1 Model-1 (MVC)
In Model 1, a request is made to a JSP or servlet and then that JSP or servlet handles all
responsibilities for the request, including processing the request, validating data, handling
the business logic, and generating a response. The Model 1 architecture is commonly
used in smaller, simple task applications due to its ease of development.
Also, the Model 1 architecture unnecessarily ties together the business logic and
presentation logic of the application.
Figure 3.3
3.2.2 Model-2 (MVC)
MVC stands for Model View Controller. Models are plain old Java objects (POJOs). Views can use any view technology, such as JSP, HTML or Velocity, and the controller contains the actual business logic.
MVC is a design pattern used to separate data from the view and the business logic: the Model (data) never imports from the view or controller packages.
Model 2 is a complex design pattern used in the design of Java Web applications which separates the display of content from the logic used to obtain and manipulate the content. Since Model 2 drives a separation between logic and display, it is usually associated with the model–view–controller (MVC) paradigm. While the exact form of the MVC "Model" was never specified by the Model 2 design, a number of publications recommend a formalized layer to contain MVC Model code. The Java BluePrints, for example, originally recommended using EJBs to encapsulate the MVC Model.
In a Model 2 application, requests from the client browser are passed to the controller. The controller performs any logic necessary to obtain the correct content for display. It then places the content in the request (commonly in the form of a JavaBean or POJO) and decides which view it will pass the request to. The view then renders the content passed by the controller. Model 2 is recommended for medium- and large-sized applications.
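The request flow just described (controller obtains content, stores it in the request, selects a view) can be sketched without a servlet container. The following is a plain-Java stand-in, not servlet code: every class name, the use of a Map as the "request", and the hard-coded unread count are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Model: a simple POJO carrying the content to display (hypothetical class).
class InboxSummary {
    private final String owner;
    private final int unreadCount;

    InboxSummary(String owner, int unreadCount) {
        this.owner = owner;
        this.unreadCount = unreadCount;
    }

    String getOwner() { return owner; }
    int getUnreadCount() { return unreadCount; }
}

// View: renders whatever the controller placed in the request.
interface View {
    String render(Map<String, Object> request);
}

class InboxView implements View {
    @Override
    public String render(Map<String, Object> request) {
        InboxSummary summary = (InboxSummary) request.get("summary");
        return summary.getOwner() + " has " + summary.getUnreadCount() + " unread mails";
    }
}

class LoginView implements View {
    @Override
    public String render(Map<String, Object> request) {
        return "Please log in";
    }
}

// Controller: obtains the content, stores it in the request, and picks the view.
class InboxController {
    String handle(Map<String, String> params) {
        Map<String, Object> request = new HashMap<>();
        String user = params.get("user");
        if (user == null || user.isEmpty()) {
            return new LoginView().render(request);
        }
        // A real application would query the database here; the count is hard-coded.
        request.put("summary", new InboxSummary(user, 3));
        return new InboxView().render(request);
    }
}
```

In a servlet-based application the controller would typically be a servlet and the views JSP pages; the sketch only mirrors the division of responsibilities.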
Figure 3.4
MVC is usually a model for a single application, in which all three parts are more or less connected to each other, whereas a 3-tier architecture means something like three separate modules or projects. Each of them does its own part and has its own architecture, probably MVC.
Advantages of MVC
MVC is a design pattern for developing web applications. Model, View and Controller are three components that separate the different kinds of logic so that they can be reused across the application. Because each layer represents one kind of logic, there is a clean separation of roles.
Modifying one kind of logic does not affect the others.
The project is easy to maintain and enhance.
Parallel development is possible, which improves productivity.
The project leader divides the team into two parts:
1. Web Authors: These programmers use JSP to develop the presentation logic of the application.
2. J2EE Developers: These programmers use J2EE technologies such as servlets and EJB to develop the integration logic, business logic and persistence logic of the application.
CHAPTER 4
Figure 4.1
4.2 Scrum
Figure 4.2
• Description:
As a <role>
I want <feature>
So that <reason/value>
• Acceptance Criteria:
Given <Pre-condition>
• Definition of Done
Passed testing per acceptance criteria items
Accepted by UI team
Scenario 2:
User is unauthorized
Id or password isn’t correct
Relevant website or internet isn’t working well
Relevant website and Internet is working well
A person should be registered
• Any roadblocks?
2. You are not allowed to write any more of a unit test than is sufficient to fail; and
compilation failures are failures.
3. You are not allowed to write any more production code than is sufficient to pass the
one failing unit test.
Figure 4.3
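The two rules quoted above can be illustrated with a very small test-first step. This is a sketch only: LoginValidator, its isValid method and the test class are hypothetical names chosen for the example, and a plain AssertionError stands in for a unit-testing framework.

```java
// Step 1: the test is written first. Before LoginValidator exists the test
// does not compile, which already counts as a failing test.
class LoginValidatorTest {
    static void rejectsEmptyPassword() {
        if (LoginValidator.isValid("alice", "")) {
            throw new AssertionError("empty password must be rejected");
        }
    }
}

// Step 2: only enough production code to make that one failing test pass.
// Further behavior (for example, checking the id) waits for its own test.
class LoginValidator {
    static boolean isValid(String id, String password) {
        return !password.isEmpty();
    }
}
```

The production method is deliberately minimal: it ignores the id because no test yet demands otherwise.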
CHAPTER 5
SYSTEM DESIGN
5.1 Bottom-Up VS Top-Down Approach
There are two approaches for developing any database, the top-down method and
the bottom-up method. While these approaches appear radically different, they share the
common goal of uniting a system by describing all of the interaction between the
processes. Let's examine each approach:
The top-down method starts from the general and moves to the specific. Basically, you
start with a general idea of what is needed for the system and then ask the end-users what
data they need to store. The analyst will then work with the users to determine what data
should be kept in the database. Using the top-down method requires that the analyst has
a detailed understanding of the system. The top-down method also can have
shortcomings. In some cases, top-down design can lead to unsatisfactory results because
the analyst and end-users can miss something that is important and is necessary for the
system.
The bottom-up approach begins with the specific details and moves up to the general. To
begin a bottom-up design, the system analyst will inspect all the interfaces that the
system has, checking reports, screens, and forms. The analyst will work backwards
through the system to determine what data should be stored in the database.
To understand the differences between these approaches, let's consider some jobs that are bottom-up in nature. In statistical analysis, analysts are taught to take a small sample from a population and then generalize the results to the overall population. Physicians are also trained in the bottom-up approach: doctors examine specific symptoms and then infer the general disease that causes the symptoms.
Examples of jobs that require the top-down approach include project management and engineering tasks, where the overall requirements must be specified before the details can be understood. For example, an automobile manufacturer must follow a top-down approach to meet the overall specifications for the car. Suppose a car is required to cost less than 15,000 dollars, get 25 miles per gallon, and seat five people. To meet these requirements, the designers must start by creating a specification document and then drill down into the details.
The analyst will have no choice but to talk and work with the users to determine what is
important to the users and as a result determines what data should be stored in the
database. What the analyst usually does is create some prototype reports, screens, and
forms to help the users visualize what the system will look like and how the system will
work.
This model view models the static structures.
5.3.1.3 BEHAVIORAL MODEL VIEW
It represents the dynamic or behavioral aspects of the system, depicting the interactions between the various structural elements described in the user model and structural model views.
5.3.1.4 IMPLEMENTATION MODEL VIEW
In this view, the structural and behavioral aspects of the system are represented as they are to be built.
5.3.1.5 ENVIRONMENTAL MODEL VIEW
In this view, the structural and behavioral aspects of the environment in which the system is to be implemented are represented.
UML modeling is divided into two domains:
UML analysis modeling, which focuses on the user model and structural model views of the system.
UML design modeling, which focuses on the behavioral model, implementation model and environmental model views.
Use case diagrams represent the functionality of the system from a user’s point of view. Use cases are used during requirements elicitation and analysis, and they focus on the behavior of the system from an external point of view.
5.3.2.1 ADMIN
Figure 5.1
5.3.2.2 USER
Figure 5.2
5.3.3.1 ADMIN
Diagram elements: Admin; Keywords (Manage Keywords, Add Keywords(), Display Keywords(), Enter keywords()); Alert Mails (check mails()); Informative Mails (check mails()); Block List.
Figure 5.3
5.3.3.2 USER
Diagram elements: User; Compose Mails (composing the mail, Compose(), send()); Sent Mails (Sent items, check the sent items()); Inbox (Received mails); Mails.
Figure 5.4
5.3.4.1 ADMIN
Manage Keywords
Figure 5.5
5.3.4.2 USER
Figure 5.6
5.3.5.1 ADMIN
Diagram elements: Admin; Keywords; Alert Mails; Informative Mails.
Figure 5.7
5.3.5.2 USER
Diagram elements: User; Compose Mail; Inbox; Sent Items.
Figure 5.8
5.3.6 Object Diagram
5.3.6.1 ADMIN
Figure 5.9
5.3.6.2 USER
Figure 5.10
5.3.7 Use Case Diagram
5.3.7.1 ADMIN
Keywords
Alert Mails
Admin
Informative Mails
Figure 5.11
5.3.7.2 USER
Compose Mail
Inbox
User
Sent Items
Figure 5.12
5.3.8 Component Diagram
5.3.8.1 ADMIN
Diagram elements: Keywords; Informative Mails.
Figure 5.13
5.3.8.2 USER
Diagram elements: User; Compose Mail; Inbox; Sent Items.
Figure 5.14
5.3.9 Deployment Diagram
5.3.9.1 ADMIN
Diagram elements: Admin; Keywords; Alert Mails; Informative Mails.
Figure 5.15
5.3.9.2 USER
Diagram elements: User; Compose Mails; Inbox; Sent Items.
Figure 5.16
CHAPTER 6
DEVELOPMENT LANGUAGES
6.1 Java
In the Java programming language, all source code is first written in plain text files ending with the .java extension. Those source files are then compiled into .class files by the javac compiler. A .class file does not contain code that is native to your processor; it instead contains bytecodes, the machine language of the Java Virtual Machine (Java VM). The java launcher tool then runs your application with an instance of the Java Virtual Machine.
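As a minimal sketch of this cycle, the program below could be saved as HelloWorld.java, compiled with `javac HelloWorld.java` (producing HelloWorld.class) and started with `java HelloWorld`. The separate greeting() method is an addition for illustration, so that the text can be checked without capturing console output.

```java
// Source file: HelloWorld.java
// Compile:     javac HelloWorld.java   (produces HelloWorld.class, i.e. bytecode)
// Run:         java HelloWorld         (the launcher starts a Java VM)
class HelloWorld {
    // Kept separate from main so the text can be reused or checked.
    static String greeting() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```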
Figure 6.1
Figure 6.2
Through the Java VM, the same application is capable of
running on multiple platforms.
6.1.2 Java Platform
A platform is the hardware or software environment in which a program runs. We've
already mentioned some of the most popular platforms like Microsoft Windows, Linux,
Solaris OS, and Mac OS. Most platforms can be described as a combination of the
operating system and underlying hardware. The Java platform differs from most other
platforms in that it's a software-only platform that runs on top of other hardware-based
platforms.
You've already been introduced to the Java Virtual Machine; it's the base for the Java
platform and is ported onto various hardware-based platforms.
The API is a large collection of ready-made software components that provide many
useful capabilities. It is grouped into libraries of related classes and interfaces; these
libraries are known as packages. The next section, What Can Java Technology Do?, highlights some of the functionality provided by the API.
Figure 6.3
As a platform-independent environment, the Java platform can be a bit slower than native
code. However, advances in compiler and virtual machine technologies are bringing
performance close to that of native code without threatening portability.
The terms "Java Virtual Machine" and "JVM" mean a Virtual Machine for the Java platform.
Development Tools: The development tools provide everything you'll need for
compiling, running, monitoring, debugging, and documenting your applications.
As a new developer, the main tools you'll be using are the javac compiler,
the java launcher, and the javadoc documentation tool.
Application Programming Interface (API): The API provides the core
functionality of the Java programming language. It offers a wide array of useful
classes ready for use in your own applications. It spans everything from basic
objects, to networking and security, to XML generation and database access, and
more. The core API is very large; to get an overview of what it contains, consult
the Java Platform Standard Edition 7 Documentation.
Deployment Technologies: The JDK software provides standard mechanisms
such as the Java Web Start software and Java Plug-In software for deploying your
applications to end users.
User Interface Toolkits: The Swing and Java 2D toolkits make it possible to
create sophisticated Graphical User Interfaces (GUIs).
Integration Libraries: Integration libraries such as the Java IDL API, JDBC™
API, Java Naming and Directory Interface™ (JNDI) API, Java RMI, and Java
Remote Method Invocation over Internet Inter-ORB Protocol Technology (Java
RMI-IIOP Technology) enable database access and manipulation of remote
objects.
6.1.4 Java as an Emerging Technology
We can't promise you fame, fortune, or even a job if you learn the Java programming
language. Still, it is likely to make your programs better and requires less effort than
other languages. We believe that Java technology will help you do the following:
6.2 SERVLET
6.2.1 Introduction
A servlet is a generic server extension: a Java class that can be loaded dynamically to expand the functionality of a server. Servlets are commonly used with web servers, where they can take the place of CGI scripts.
A servlet is similar to a proprietary server extension, except that it runs inside a Java Virtual Machine (JVM) on the server, so it is safe and portable. Servlets operate solely within the domain of the server.
Unlike CGI and FastCGI, which use multiple processes to handle separate programs or separate requests, servlets are all handled by separate threads within the web server process. This means that servlets are efficient and scalable. Servlets are also portable, both across operating systems and across web servers. Java servlets offer the best possible platform for web application development.
Although servlets are most often used as a replacement for CGI scripts on a web server, they can extend any sort of server. For example, a mail server that allows servlets could extend its functionality by performing a virus scan on all attached documents or by handling mail filtering tasks.
Servlets provide a Java-based solution to the problems currently associated with server-side programming, including inextensible scripting solutions, platform-specific APIs and incomplete interfaces. Servlets are objects that conform to a specific interface and can be plugged into a Java-based server.
Servlets are to the server side what applets are to the client side: object bytecodes that can be dynamically loaded off the net. They differ from applets in that they are faceless objects (without graphics or a GUI component). They serve as platform-independent, dynamically loadable, pluggable helper bytecode objects on the server side that can be used to dynamically extend server-side functionality.
For example, an HTTP servlet can be used to generate dynamic HTML content. When you use servlets to produce dynamic content, you get the following advantages:
They are faster and cleaner than CGI scripts.
They use a standard API (the servlet API).
They provide all the advantages of Java (they run on a variety of servers without needing to be rewritten).
6.2.4 Features of Servlets
Servlets are persistent. Servlets are loaded only once by the web server and can maintain services between requests.
Servlets are fast. Since servlets only need to be loaded once, they offer much better performance than their CGI counterparts.
Servlets are platform independent.
Servlets are extensible. Java is a robust, object-oriented programming language, which can easily be extended to suit your needs.
Servlets are secure.
Servlets can be used with a variety of clients.
Servlets use classes and interfaces from two packages, javax.servlet and javax.servlet.http. The javax.servlet package contains classes that support generic, protocol-independent servlets. The classes in the javax.servlet.http package extend these classes to add HTTP-specific functionality.
Every servlet must implement the javax.servlet.Servlet interface. Most servlets implement it by extending one of two classes, javax.servlet.GenericServlet or javax.servlet.http.HttpServlet. A protocol-independent servlet should subclass GenericServlet, while an HTTP servlet should subclass HttpServlet, which is itself a subclass of GenericServlet with added HTTP-specific functionality.
Unlike a regular Java program, a servlet does not have a main() method. Instead, the server invokes certain methods of the servlet in the process of handling requests. Each time the server dispatches a request to a servlet, it invokes the servlet's service() method.
A generic servlet should override its service() method to handle requests as appropriate for the servlet. The service() method accepts two parameters, a request object and a response object. The request object tells the servlet about the request, while the response object is used to return a response.
In contrast, an HTTP servlet usually does not override the service() method. Instead, it overrides doGet() to handle GET requests and doPost() to handle POST requests. An HTTP servlet can override either or both of these methods. The service() method of HttpServlet handles the setup and dispatching to all the doXXX() methods, which is why it usually should not be overridden.
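The dispatch from service() to the doXXX() methods can be mirrored in a few lines of plain Java. This is an analogue for illustration only: MiniHttpServlet, InboxServlet and the String-based method signatures are invented here and are not part of the javax.servlet API, where the corresponding methods take request and response objects.

```java
// Plain-Java analogue of HTTP servlet dispatch; not the real javax.servlet classes.
abstract class MiniHttpServlet {
    // Plays the role of service(): inspects the HTTP method and dispatches.
    final String service(String method, String body) {
        switch (method) {
            case "GET":
                return doGet(body);
            case "POST":
                return doPost(body);
            default:
                return "405 Method Not Allowed";
        }
    }

    // Default handlers reject the request; subclasses override what they support.
    protected String doGet(String body) {
        return "405 Method Not Allowed";
    }

    protected String doPost(String body) {
        return "405 Method Not Allowed";
    }
}

// A servlet that only handles GET, so only doGet is overridden.
class InboxServlet extends MiniHttpServlet {
    @Override
    protected String doGet(String body) {
        return "<html><body>Inbox</body></html>";
    }
}
```

Because service() is left untouched in the subclass, adding POST support later would only require overriding doPost(), which is the reason the text advises against overriding service() itself.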
The remainder of the javax.servlet and javax.servlet.http packages consists largely of support types. The ServletRequest and ServletResponse interfaces in javax.servlet provide access to generic server requests and responses, while HttpServletRequest and HttpServletResponse in javax.servlet.http provide access to HTTP requests and responses. The javax.servlet.http package also contains an HttpSession interface that provides built-in session tracking functionality and a Cookie class that allows you to quickly set up and process HTTP cookies.
servlet (either from local disk or from the network) and then invokes the “service” method. Also like applets, local servlets in the server can be identified by just the class name. In other words, if a servlet name is not absolute, it is treated as local.
A Client can Invoke Servlets in the Following Ways:
The client can ask for a document that is served by the servlet.
The client (browser) can invoke the servlet directly using a URL, once it has been mapped using the SERVLET ALIASES section of the admin GUI.
The servlet can be invoked through server-side include tags.
The servlet can be invoked by placing it in the servlets/ directory.
The servlet can be invoked by using it in a filter chain.
6.3 JSP
The first JavaServer Pages specification was released in 1999. Originally JSP was modeled after other server-side template technologies to provide a simple method of embedding dynamic code with static markup. When a request is made for the content of a JSP, a container interprets the JSP, executes any embedded code, and sends the results in a response. At the time this type of functionality was nothing terribly new, but it was and still is a helpful enhancement to Servlets.
JSP has been revised several times since the original release, each adding functionality, and is currently in version 2.0. The JSP specifications are developed alongside the Servlet specifications and can be found on Sun Microsystems’
The functionality defined by the JSP 2.0 specifications can be broken down as follows:
JSP
The JSP specifications define the basic syntax and semantics of a JavaServer Page. A basic JavaServer Page consists of plain text and markup and can optionally take advantage of embedded scripts and other functionality for creating dynamic content.
JSP includes a mechanism for defining dynamic attributes for custom tags. Any scripting language can be used for this purpose; usually Java is implemented, but the JSP specification defines a custom expression language designed specifically for the task. Often the JSP EL is a much simpler and more flexible solution, especially when combined with JSP design patterns that do not use embedded scripts.
Discussing the basics of JSP is the focus of this chapter. JavaBeans, Custom Tags, and the JSP Expression Language are all fully discussed in later chapters after a proper foundation of JSP is established.
Figure 6.4
6.4 JSP - EXPRESSION LANGUAGE (EL)
JSP Expression Language (EL) makes it possible to easily access application data stored
in JavaBeans components. JSP EL allows you to create expressions both (a) arithmetic
and (b) logical. Within a JSP EL expression, you can use integers, floating point numbers,
strings, the built-in constants true and false for Boolean values, and null.
6.4.1 Simple Syntax
Typically, when you specify an attribute value in a JSP tag, you simply use a string. For
example:
JSP EL allows you to specify an expression for any of these attribute values. A simple
syntax for JSP EL is as follows:
${expr}
Here expr specifies the expression itself. The most common operators in JSP EL
are . and []. These two operators allow you to access various attributes of Java Beans and
built-in JSP objects.
For example, the <jsp:setProperty> tag above can be written with an expression like:
When the JSP compiler sees the ${} form in an attribute, it generates code to evaluate the
expression and substitutes the value of expression.
You can also use JSP EL expressions within template text for a tag. For example, the
<jsp:text> tag simply inserts its content within the body of a JSP. The following
<jsp:text> declaration inserts <h1>Hello JSP!</h1> into the JSP output:
<jsp:text>
<h1>Hello JSP!</h1>
</jsp:text>
You can include a JSP EL expression in the body of a <jsp:text> tag (or any other tag)
with the same ${} syntax you use for attributes. For example:
<jsp:text>
Box Perimeter is: ${2*box.width + 2*box.height}
</jsp:text>
EL expressions can use parentheses to group subexpressions. For example, ${(1 + 2) *
3} equals 9, but ${1 + (2 * 3)} equals 7.
The evaluation of EL expressions can be switched off with the isELIgnored attribute of the
page directive. The valid values of this attribute are true and false. If it is true, EL
expressions are ignored when they appear in static text or tag attributes. If it is false,
EL expressions are evaluated by the container.
6.5 JSTL
The JavaServer Pages Standard Tag Library (JSTL) is a collection of useful JSP tags
that encapsulates core functionality common to many JSP applications.
JSTL has support for common, structural tasks such as iteration and conditionals, tags for
manipulating XML documents, internationalization tags, and SQL tags. It also provides a
framework for integrating existing custom tags with JSTL tags.
The JSTL tags can be classified, according to their functions, into the following JSTL tag
library groups that can be used when creating a JSP page:
Core Tags
Formatting tags
SQL tags
XML tags
JSTL Functions
Figure 6.5
6.6 JavaBeans:
JavaBeans are reusable software components for Java. They are classes that
encapsulate many objects into a single object (the bean). They are serializable,
have a zero-argument constructor, and allow access to properties using getter and
setter methods.
JavaBeans Conventions
In order to function as a JavaBean class, an object class must obey certain conventions
about method naming, construction, and behavior. These conventions make it possible to
have tools that can use, reuse, replace, and connect JavaBeans.
The required conventions are as follows:
The class must have a public default constructor (with no arguments). This allows
easy instantiation within editing and activation frameworks.
The class properties must be accessible using get, set, is (can be used for Boolean
properties instead of get), and other methods (so-called accessor methods and mutator
methods) according to a standard naming convention. This allows easy automated
inspection and updating of bean state within frameworks, many of which include
custom editors for various types of properties. Setters can have one or more than one
argument.
The class should be serializable. This allows applications and frameworks to reliably
save, store, and restore the bean's state in a manner independent of the VM and of the
platform.
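package player;

// The class declaration and the "deceased" field are missing from the extracted
// source; the class name PersonBean is assumed here for illustration.
public class PersonBean implements java.io.Serializable {

    /** Property <code>name</code> (note capitalization) readable/writable. */
    private String name = null;

    /** Property <code>deceased</code>. */
    private boolean deceased = false;

    /** Public no-argument constructor, as the conventions require. */
    public PersonBean() {
    }

    /**
     * Getter for property <code>name</code>
     */
    public String getName() {
        return name;
    }

    /**
     * Setter for property <code>name</code>.
     * @param value the new name
     */
    public void setName(final String value) {
        name = value;
    }

    /**
     * Getter for property <code>deceased</code>.
     * Different syntax for a boolean field (is vs. get)
     */
    public boolean isDeceased() {
        return deceased;
    }

    /**
     * Setter for property <code>deceased</code>.
     * @param value the new flag
     */
    public void setDeceased(final boolean value) {
        deceased = value;
    }
}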
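The term "POJO" (Plain Old Java Object) was coined by Martin Fowler, Rebecca Parsons, and
Josh MacKenzie in September 2000. Fowler explained the choice as follows:
"We wondered why people were so against using regular objects in their systems and
concluded that it was because simple objects lacked a fancy name. So we gave them one,
and it's caught on very nicely."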
The term "POJO" is mainly used to denote a Java object which does not follow any of the
major Java object models, conventions, or frameworks. The term continues the pattern of
older terms for technologies that do not use fancy new features, such as POTS (Plain Old
Telephone Service) in telephony, PODS (Plain Old Data Structures) that are defined in
C++ but use only C language features, and POD (Plain Old Documentation) in Perl. The
equivalent to POJO on the .NET framework is Plain Old CLR Object (POCO). For PHP,
it is Plain Old PHP Object (POPO).
The POJO phenomenon has most likely gained widespread acceptance because of the
need for a common and easily understood term that contrasts with complicated object
frameworks.
Ideally speaking, a POJO is a Java object not bound by any restriction other than those
forced by the Java Language Specification. That is, a POJO should not have to extend a
prespecified class, as in:
public class Foo extends javax.servlet.http.HttpServlet
However, due to technical difficulties and other reasons, many software products or
frameworks described as POJO-compliant actually still require the use of prespecified
annotations for features such as persistence to work properly. The idea is that if the object
(actually class) was a POJO before any annotations were added, and would return to
POJO status if the annotations are removed then it can still be considered a POJO. Then
the basic object remains a POJO in that it has no special characteristics (such as an
implemented interface) that makes it a "Specialized Java Object" (SJO or (sic) SoJO).
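As a minimal sketch of the idea, the class below is a POJO in the sense described above: it extends no framework class, implements no framework interface, and carries no framework-specific annotation. The class name MailMessage and its two fields are hypothetical and chosen only for illustration; they are not taken from the project's code.

```java
import java.util.Objects;

/** A plain Java object: no framework superclass, interface, or annotation. */
class MailMessage {
    private final String sender;
    private final String subject;

    MailMessage(String sender, String subject) {
        // Reject nulls early so that the object is always in a valid state.
        this.sender = Objects.requireNonNull(sender, "sender");
        this.subject = Objects.requireNonNull(subject, "subject");
    }

    String getSender() {
        return sender;
    }

    String getSubject() {
        return subject;
    }

    @Override
    public String toString() {
        return sender + ": " + subject;
    }
}
```

Because the class depends only on the Java language itself, it can be created and tested with a plain constructor call, without a servlet container or any other framework.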
6.8 HTML
Welcome to HTML Basics. This workshop leads you through the basics of Hyper Text
Markup Language (HTML). HTML is the building block for web pages. You will learn
to use HTML to author an HTML page to display in a web browser.
6.8.1 Objectives:
By the end of this workshop, you will be able to:
Use a text editor to author an HTML document.
Use basic tags to denote paragraphs, emphasis, or special type.
Create hyperlinks to other documents.
Create an email link.
Add images to your document.
Use a table for layout.
Apply colors to your HTML document.
6.8.2 Prerequisites:
You will need a text editor, such as Notepad, and an Internet browser, such as Internet
Explorer or Netscape.
Q: What is Notepad and where do I get it?
A: Notepad is the default Windows text editor. On most Windows systems, click the
Start button and choose Programs, then Accessories. Its icon is a little blue notebook.
Mac users: SimpleText is the default text editor on the Mac. In OS X, use TextEdit and
change the following preferences: select Plain text instead of Rich text (in the
preferences window), and then select Ignore rich text commands in HTML files. This is very
important because otherwise HTML codes probably won't work. One thing you
should avoid is using a word processor (like Microsoft Word) for authoring your HTML
documents.
6.8.3 What is an HTML File?
HTML is a format that tells a computer how to display a web page. The documents
themselves are plain text files with special "tags" or codes that a web browser uses to
interpret and display information on your computer screen.
HTML stands for Hyper Text Markup Language
An HTML file is a text file containing small markup tags
The markup tags tell the Web browser how to display the page
An HTML file must have an .htm or .html file extension
Open your text editor and type the following text:
<html>
<head>
<title>My First Webpage</title>
</head>
<body>
This is my first homepage. <b>This text is bold</b>
</body>
</html>
Save the file as mypage.html. Start your Internet browser. Select Open (or Open Page) in
the File menu of your browser. A dialog box will appear. Select Browse (or Choose File)
and locate the HTML file you just created, mypage.html. Select it and click Open. Now
you should see an address in the dialog box, for example
C:\MyDocuments\mypage.html. Click OK, and the browser will display the page.
What you just made is a skeleton HTML document. This is the minimum required
information for a web document, and all web documents should contain these basic
components. The first tag in your HTML document is <html>. This tag tells your browser
that this is the start of an HTML document. The last tag in your document is </html>. This
tag tells your browser that this is the end of the HTML document. The text between the
<head> tag and the </head> tag is header information. Header information is not displayed
in the browser window. The text between the <title> tags is the title of your document.
The <title> tag is used to uniquely identify each document and is also displayed in the title
bar of the browser window. The text between the <body> tags is the text that will be
displayed in your browser. The text between the <b> and </b> tags will be displayed in a
bold font.
6.8.4 HTM or HTML Extension:
When you save an HTML file, you can use either the .htm or the .html extension.
The .htm extension comes from the past, when some commonly used software only
allowed three-letter extensions. It is perfectly safe to use either .html or .htm, but be
consistent: mypage.htm and mypage.html are treated as different files by the browser.
How to View HTML Source
A good way to learn HTML is to look at how other people
have coded their HTML pages. To find out, simply click on the View option in your
browser's toolbar and select Source or Page Source. This will open a window that shows
you the actual HTML of the page.
6.8.5 HTML Tags
Web pages are created using a language called HTML. Don't be intimidated at the
thought of having to learn HTML; the basics you'll need to make a web page are simple.
HTML uses tags to control the look and feel of your web page. Tags are enclosed in
< > characters. Many tags have a closing tag, which is marked by a forward
slash "/" before the tag name. This closing tag tells the browser to cease whatever
instruction began with the initial tag. The general form for a tag is: <tag_name> Your
Text </tag_name>. There are many tags in HTML, and each one tells the browser a piece
of information about how it should display the text between the tags. Only basic HTML
tags are addressed in this document. The HTML language continues to develop. If
you are interested in learning more about HTML, search the web for HTML courses
and/or tutorials.
The purpose of Cascading Style Sheets (CSS) is to give the page developer much more control
over how a page should be displayed by all browsers. A style sheet is a set of rules that
controls the formatting of HTML elements on one or more Web pages. Thus, the appearance of a
Web page can be changed by changing the style sheet associated with it. There is no need to
make detailed changes within the Web page to change how it looks. One
advantage of using style sheets is accessibility: different styling can be provided
for different users depending on their requirements. Separating style and content is good
design and will normally produce a better and more consistent web site. As one style sheet
can be used for a whole web site, the overall size of the web site is normally
smaller, and the downloads required for each page can be decreased by up to 40%. Allowing
browsers to make overall decisions on styling often means that the rendering time by the
browser is also shorter. Styling can radically affect a page's layout, and
this allows important information to appear early in the HTML markup of the page even
if the design requires it to appear later. This can be of use to search engines.
A set of Web pages may use a common style sheet. A Web page may have its own style
sheet that refines the information in the common style sheet. Readers may define their
own style sheet indicating their preferences. Thus style sheets cascade, and decisions need
to be made as to which style sheet is in control when there is a conflict.
A style can be attached to an HTML page in four ways:
1. By defining a link from the HTML page to the style sheet (normally stored in an
xxx.css file). This allows you to have a single style sheet that changes the appearance of
many Web pages.
2. By specifying an HTML style element in the head of the page. This allows you to
define the appearance of a single page.
3. By adding inline styles to specific elements in the HTML file using the style attribute.
This allows you to change a single element or set of elements. This should only be used
when absolutely necessary, as it negates some of the advantages of having style sheets. It
effectively overrides the overall style for the page. Also, it is likely that it will be
removed from CSS 3.
4. By importing a style sheet stored externally into the current style. The style element for
the page can consist of a set of imports plus some rules specific to the page.
Initially, while you are experimenting, the simplest method is to use the HTML
style element to add a style sheet to an HTML page. The style element, consisting of a set
of rules, is placed in the document head. When you have decided on your house style, it is
likely that you will move to having the style sheet external to an individual page and
either linked in or imported.
6.9.2 Main Facilities
The facilities provided by style sheets are much as you would expect:
3. Better control of inline layout, particularly with regard to diagrams and related text
To achieve these effects, the Web browser has to know what to do with style sheets.
Really old browsers will not have this capability, but the recent offerings from Microsoft,
Opera and Netscape have good support now and are committed to full support in the
future. HTML editors are beginning to have style sheet additions, and separate style
sheet editors are appearing. Given the way CSS has been implemented, it is possible to
design your web pages so that they still present the HTML information even when CSS
support is not there. So there is little excuse for not adding style to your web pages now,
even if you expect your pages to be viewed by really old browsers.
This Primer will not tell you everything about CSS. If you get to the position where you
really need to know precisely what happens in some subtle situation, you need to consult
the formal specification that can be found on the World-Wide Web Consortium's web site.
To make it as easy as possible to relate the specification to the Primer, the order of
presentation here is similar to the order in the Specification.
CHAPTER 7
DATABASE
7.1 Database
A database is a place where data is stored in the form of tables. Data here means
useful information. A database is thus a collection of one or more tables.
Table 7.1 (structure of a table: columns and cells)
Examples of database software include Oracle, SQL Server, DB2, Sybase, Informix, MySQL,
MS Access, Foxbase, and FoxPro. Some of these are DBMSs and some are RDBMSs.
Oracle is widely used today. Its versions, from the earliest to the latest, are Oracle 2,
Oracle 3, Oracle 4, Oracle 5, Oracle 6, Oracle 7, Oracle 8i, Oracle 9i, Oracle 10g, and,
most recently at the time of writing, Oracle 11g. Here 'i' stands for Internet and 'g'
stands for Grid computing.
7.2 Relational DBMS
A database is a means of storing information in such a way that information can be
retrieved from it. In simplest terms, a relational database is one that presents information
in tables with rows and columns. A table is referred to as a relation in the sense that it is a
collection of objects of the same type (rows). Data in a table can be related according to
common keys or concepts, and the ability to retrieve related data from a table is the basis
for the term relational database. A Database Management System (DBMS) handles the
way data is stored, maintained, and retrieved. In the case of a relational database, a
Relational Database Management System (RDBMS) performs these tasks. DBMS as
used in this document is a general term that includes RDBMS.
7.2.1 Integrity Rules
Relational tables follow certain integrity rules to ensure that the data they contain stay
accurate and are always accessible. First, the rows in a relational table should all be
distinct. If there are duplicate rows, there can be problems resolving which of two
possible selections is the correct one. For most DBMSs, the user can specify that
duplicate rows are not allowed, and if that is done, the DBMS will prevent the addition of
any rows that duplicate an existing row.
A second integrity rule of the traditional relational model is that column values must not
be repeating groups or arrays. A third aspect of data integrity involves the concept of a
null value. A database takes care of situations where data may not be available by using a
null value to indicate that a value is missing. It does not equate to a blank or zero. A
blank is considered equal to another blank, a zero is equal to another zero, but two null
values are not considered equal.
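The null rule above can be sketched in Java as a three-valued comparison. This is only an illustration under stated assumptions: the class SqlNull and the method sqlEquals are hypothetical names, and a Java null result stands in for SQL's "unknown" outcome.

```java
/** Illustrates SQL-style equality, where comparing with NULL is "unknown". */
class SqlNull {
    /**
     * Returns TRUE or FALSE when both operands are present, and null
     * (standing for "unknown") when either operand is NULL.
     */
    static Boolean sqlEquals(Object a, Object b) {
        if (a == null || b == null) {
            return null; // unknown: two NULLs are not considered equal
        }
        return a.equals(b);
    }
}
```

A blank compared with a blank, or a zero with a zero, yields TRUE, while a null compared with a null yields unknown. A WHERE clause keeps a row only when its condition is true, so an unknown result excludes the row.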
When each row in a table is different, it is possible to use one or more columns to
identify a particular row. This unique column or group of columns is called a primary
key. Any column that is part of a primary key cannot be null; if it were, the primary key
containing it would no longer be a complete identifier. This rule is referred to as entity
integrity.
The Employees table illustrates some of these relational database concepts. It has five
columns and six rows, with each row representing a different employee.
The primary key for this table would generally be the employee number because each one
is guaranteed to be different. (A number is also more efficient than a string for making
comparisons.) It would also be possible to use First_Name and Last_Name because the
combination of the two also identifies just one row in our sample database. Using the last
name alone would not work because there are two employees with the last name of
"Washington." In this particular case the first names are all different, so one could
conceivably use that column as a primary key, but it is best to avoid using a column
where duplicates could occur. If Elizabeth Yamaguchi gets a job at this company and the
primary key is First_Name, the RDBMS will not allow her name to be added (if it has
been specified that no duplicates are permitted). Because there is already an Elizabeth in
the table, adding a second one would make the primary key useless as a way of
identifying just one row. Note that although using First_Name and Last_Name is a
unique composite key for this example, it might not be unique in a larger database. Note
also that the Employees table assumes that there can be only one car per employee.
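The primary-key behaviour described above can be sketched with an in-memory map keyed by employee number. This is a simplified illustration, not how an RDBMS is implemented; the class EmployeeTable and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** In-memory stand-in for a table whose primary key is the employee number. */
class EmployeeTable {
    private final Map<Integer, String> rowsByKey = new HashMap<>();

    /**
     * Inserts a row and reports whether it was accepted. A null key is
     * rejected (entity integrity), and so is a key that already exists.
     */
    boolean insert(Integer employeeNumber, String fullName) {
        Objects.requireNonNull(fullName, "fullName");
        if (employeeNumber == null) {
            return false;
        }
        // putIfAbsent returns null only when the key was not present before.
        return rowsByKey.putIfAbsent(employeeNumber, fullName) == null;
    }

    /** Looks up the single row identified by the key, or null if absent. */
    String find(int employeeNumber) {
        return rowsByKey.get(employeeNumber);
    }
}
```

With this sketch, a second insert using an existing employee number is refused and the original row is left unchanged, which mirrors the RDBMS refusing a duplicate primary key.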
7.2.2 SELECT Statements
SQL is a language designed to be used with relational databases. There is a set of basic
SQL commands that is considered standard and is used by all RDBMSs. For example, all
RDBMSs use the SELECT statement.
A SELECT statement, also called a query, is used to get information from a table. It
specifies one or more column headings, one or more tables from which to select, and
some criteria for selection. The RDBMS returns rows of the column entries that satisfy
the stated requirements. A SELECT statement such as the following will fetch the first
and last names of employees who have company cars:
SELECT First_Name, Last_Name
FROM Employees
WHERE Car_Number IS NOT NULL
The result set (the set of rows that satisfy the requirement of not having null in
the Car_Number column) follows. The first name and last name are printed for each row
that satisfies the requirement because the SELECT statement (the first line) specifies the
columns First_Name and Last_Name. The FROM clause (the second line) gives the table
from which the columns will be selected.
Table 7.3
FIRST_NAME LAST_NAME
Axel Washington
Florence Wojokowski
The following code produces a result set that includes the whole table because it asks for
all of the columns in the table Employees with no restrictions (no WHERE clause). Note
that SELECT * means "SELECT all columns."
SELECT *
FROM Employees
7.2.3 WHERE Clauses
The WHERE clause in a SELECT statement provides the criteria for selecting values. For
example, in the following code fragment, values will be selected only if they occur in a
row in which the column Last_Name begins with the string 'Washington'.
SELECT First_Name, Last_Name
FROM Employees
WHERE Last_Name LIKE 'Washington%'
The keyword LIKE is used to compare strings, and it offers the feature that patterns
containing wildcards can be used. For example, in the code fragment above, there is a
percent sign (%) at the end of 'Washington', which signifies that any value containing the
string 'Washington' plus zero or more additional characters will satisfy this selection
criterion. So 'Washington' or 'Washingtonian' would be matches, but 'Washing' would not
be. The other wildcard used in LIKE clauses is an underbar (_), which stands for any one
character. For example,
WHERE Last_Name LIKE 'Ba_man'
would match 'Batman', 'Barman', 'Badman', 'Balman', 'Bagman', 'Bamman', and so on.
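The two wildcards can be illustrated by translating a LIKE pattern into a Java regular expression. This is a rough sketch under stated assumptions: the class SqlLike is a hypothetical helper, matching is case-sensitive (real DBMSs differ on this), and the ESCAPE clause is not handled.

```java
import java.util.regex.Pattern;

/** Translates simple SQL LIKE patterns into regular expressions. */
class SqlLike {
    /** Converts % to "any run of characters" and _ to "exactly one character". */
    static Pattern toRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                // Quote every other character so it is matched literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true when the whole value matches the LIKE pattern. */
    static boolean matches(String value, String likePattern) {
        return toRegex(likePattern).matcher(value).matches();
    }
}
```

For instance, matches("Washingtonian", "Washington%") is true and matches("Washing", "Washington%") is false, in line with the examples in the text.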
The code fragment below has a WHERE clause that uses the equal sign (=) to compare
numbers. It selects the first and last name of the employee who is assigned car 12.
SELECT First_Name, Last_Name
FROM Employees
WHERE Car_Number = 12
The next code fragment selects the first and last names of employees whose employee
number is greater than 10005:
SELECT First_Name, Last_Name
FROM Employees
WHERE Employee_Number > 10005
WHERE clauses can get rather elaborate, with multiple conditions and, in some DBMSs,
nested conditions. This overview will not cover complicated WHERE clauses, but the
following code fragment has a WHERE clause with two conditions; this query selects the
first and last names of employees whose employee number is less than 10100 and who do
not have a company car.
SELECT First_Name, Last_Name
FROM Employees
WHERE Employee_Number < 10100 AND Car_Number IS NULL
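To make the semantics of a two-condition WHERE clause concrete, the sketch below applies the same selection to an in-memory list using Java streams. It is an analogy rather than database code; the classes EmployeeQuery and Employee and the method namesWithoutCar are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Mimics a SELECT with a two-condition WHERE clause over in-memory rows. */
class EmployeeQuery {
    /** One row of an Employees table; carNumber is null when there is no car. */
    static final class Employee {
        final int number;
        final String firstName;
        final String lastName;
        final Integer carNumber;

        Employee(int number, String firstName, String lastName, Integer carNumber) {
            this.number = number;
            this.firstName = firstName;
            this.lastName = lastName;
            this.carNumber = carNumber;
        }
    }

    /**
     * Selects "First_Name Last_Name" for rows whose employee number is below
     * the given limit AND whose car number is null.
     */
    static List<String> namesWithoutCar(List<Employee> rows, int numberLimit) {
        return rows.stream()
                .filter(e -> e.number < numberLimit && e.carNumber == null) // WHERE
                .map(e -> e.firstName + " " + e.lastName)                  // SELECT
                .collect(Collectors.toList());
    }
}
```

The filter step plays the role of the WHERE clause and the map step plays the role of the column list after SELECT.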
A special type of WHERE clause involves a join, which is explained in the next section.
7.2.4 Joins
A distinguishing feature of relational databases is that it is possible to get data from more
than one table in what is called a join. Suppose that after retrieving the names of
employees who have company cars, one wanted to find out who has which car, including
the make, model, and year of car. This information is stored in another table, Cars.
There must be one column that appears in both tables in order to relate them to each
other. This column, which must be the primary key in one table, is called the foreign key
in the other table. In this case, the column that appears in two tables is Car_Number,
which is the primary key for the table Cars and the foreign key in the table Employees. If
the 1996 Honda Civic were wrecked and deleted from the Cars table, then Car_Number 5
would also have to be removed from the Employees table in order to maintain what is
called referential integrity. Otherwise, the foreign key column (Car_Number) in
the Employees table would contain an entry that did not refer to anything in Cars. A
foreign key must either be null or equal to an existing primary key value of the table to
which it refers. This is different from a primary key, which may not be null. There are
several null values in the Car_Number column in the table Employees because it is
possible for an employee not to have a company car.
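The relationship between the foreign key and the primary key can be sketched with two in-memory maps. This is an illustration only: the class CarJoin and its methods are hypothetical, one map stands in for the Car_Number column of Employees, and the other stands in for the Cars table keyed by its primary key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of a join on Car_Number and of a referential integrity check. */
class CarJoin {
    /** Inner join: one line per employee whose car number matches a Cars row. */
    static List<String> join(Map<String, Integer> carByEmployee,
                             Map<Integer, String> cars) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> row : carByEmployee.entrySet()) {
            Integer carNumber = row.getValue(); // foreign key, may be null
            if (carNumber != null && cars.containsKey(carNumber)) {
                result.add(row.getKey() + ": " + cars.get(carNumber));
            }
        }
        return result;
    }

    /** True when every non-null foreign key refers to an existing Cars row. */
    static boolean isReferentiallyIntact(Map<String, Integer> carByEmployee,
                                         Map<Integer, String> cars) {
        for (Integer carNumber : carByEmployee.values()) {
            if (carNumber != null && !cars.containsKey(carNumber)) {
                return false;
            }
        }
        return true;
    }
}
```

If car 5 is removed from the cars map while an employee still refers to it, isReferentiallyIntact returns false. This is the situation the text describes for the wrecked 1996 Honda Civic.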
The following code asks for the first and last names of employees who have company
cars and for the make, model, and year of those cars. Note that the FROM clause lists
both Employees and Cars because the requested data is contained in both tables. Using
the table name and a dot (.) before the column name indicates which table contains the
column.
SELECT Employees.First_Name, Employees.Last_Name, Cars.Make, Cars.Model, Cars.Year
FROM Employees, Cars
WHERE Employees.Car_Number = Cars.Car_Number
This returns a result set that will look similar to the following:
Table 7.5
SQL commands are divided into categories, the two main ones being Data Manipulation
Language (DML) commands and Data Definition Language (DDL) commands. DML
commands deal with data, either retrieving it or modifying it to keep it up-to-date. DDL
commands create or change tables and other database objects such as views and indexes.
UPDATE: changes an existing value in a column or group of columns in a table.
CREATE TABLE: creates a table with the column names the user provides. The user also
needs to specify a type for the data in each column. Data types vary from one RDBMS to
another, so a user might need to use metadata to establish the data types used by a
particular database. CREATE TABLE is normally used less often than the data manipulation
commands because a table is created only once, whereas adding or deleting rows or
changing individual values generally occurs more frequently.
DROP TABLE: deletes all rows and removes the table definition from the database. A JDBC
API implementation is required to support the DROP TABLE command as specified by SQL92,
Transitional Level. However, support for the CASCADE and RESTRICT options of DROP TABLE
is optional. In addition, the behavior of DROP TABLE is implementation-defined when there
are views or integrity constraints defined that reference the table being dropped.
ALTER TABLE: adds or removes a column from a table. It also adds or drops table
constraints and alters column attributes.
The rows that satisfy the conditions of a query are called the result set. The number of
rows returned in a result set can be zero, one, or many. A user can access the data in a
result set one row at a time, and a cursor provides the means to do that. A cursor can be
thought of as a pointer into a file that contains the rows of the result set, and that pointer
has the ability to keep track of which row is currently being accessed. A cursor allows a
user to process each row of a result set from top to bottom and consequently may be used
for iterative processing. Most DBMSs create a cursor automatically when a result set is
generated.
Earlier JDBC API versions added new capabilities for a result set's cursor, allowing it to
move both forward and backward and also allowing it to move to a specified row or to a
row whose position is relative to another row.
7.2.7 Transactions
When one user is accessing data in a database, another user may be accessing the same
data at the same time. If, for instance, the first user is updating some columns in a table at
the same time the second user is selecting columns from that same table, it is possible for
the second user to get partly old data and partly updated data. For this reason, DBMSs
use transactions to maintain data in a consistent state (data consistency) while allowing
more than one user to access a database at the same time (data concurrency).
A transaction is a set of one or more SQL statements that make up a logical unit of work.
A transaction ends with either a commit or a rollback, depending on whether there are
any problems with data consistency or data concurrency. The commit statement makes
permanent the changes resulting from the SQL statements in the transaction, and the
rollback statement undoes all changes resulting from the SQL statements in the
transaction.
A lock is a mechanism that prohibits two transactions from manipulating the same data at
the same time. For example, a table lock prevents a table from being dropped if there is
an uncommitted transaction on that table. In some DBMSs, a table lock also locks all of
the rows in a table. A row lock prevents two transactions from modifying the same row,
or it prevents one transaction from selecting a row while another transaction is still
modifying it.
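A transaction that ends in a commit or a rollback might look like the following JDBC sketch. It assumes an Employees table with Employee_Number and Car_Number columns, as used elsewhere in this chapter; the class CarReassignment and the method reassignCar are hypothetical, and the caller is assumed to supply an open Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Moves a company car between two employees as one logical unit of work. */
class CarReassignment {
    static void reassignCar(Connection con, int carNo, int fromEmp, int toEmp)
            throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // group the following statements into one transaction
        try (PreparedStatement clear = con.prepareStatement(
                 "UPDATE Employees SET Car_Number = NULL "
                 + "WHERE Employee_Number = ? AND Car_Number = ?");
             PreparedStatement assign = con.prepareStatement(
                 "UPDATE Employees SET Car_Number = ? WHERE Employee_Number = ?")) {
            clear.setInt(1, fromEmp);
            clear.setInt(2, carNo);
            clear.executeUpdate();

            assign.setInt(1, carNo);
            assign.setInt(2, toEmp);
            assign.executeUpdate();

            con.commit();   // make both changes permanent together
        } catch (SQLException e) {
            con.rollback(); // undo any partial change
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Either both UPDATE statements take effect or neither does, so other users never observe a state in which the car belongs to nobody or to two employees.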
A stored procedure is a group of SQL statements that can be called by name. In other
words, it is executable code, a mini-program, that performs a particular task that can be
invoked the same way one can call a function or method. Traditionally, stored procedures
have been written in a DBMS-specific programming language. The latest generation of
database products allows stored procedures to be written using the Java programming
language and the JDBC API. Stored procedures written in the Java programming
language are bytecode portable between DBMSs. Once a stored procedure is written, it
can be used and reused because a DBMS that supports stored procedures will, as its name
implies, store it in the database.
The following code is an example of how to create a very simple stored procedure using
the Java programming language. Note that the stored procedure is just a static Java
method that contains normal JDBC code. It accepts two input parameters and uses them
to change an employee's car number.
Do not worry if you do not understand the example at this point. The code example
below is presented only to illustrate what a stored procedure looks like. You will learn
how to write the code in this example in the tutorials that follow.
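import java.sql.*;

// The class and method declarations are missing from the extracted source;
// the names UpdateCar and updateCarNum are assumed here for illustration.
public class UpdateCar {

    public static void updateCarNum(int carNo, int empNo) throws SQLException {
        Connection con = null;
        PreparedStatement pstmt = null;
        try {
            // Use the connection of the session that invoked the procedure.
            con = DriverManager.getConnection(
                "jdbc:default:connection");
            pstmt = con.prepareStatement(
                "UPDATE EMPLOYEES " +
                "SET CAR_NUMBER = ? " +
                "WHERE EMPLOYEE_NUMBER = ?");
            pstmt.setInt(1, carNo);
            pstmt.setInt(2, empNo);
            pstmt.executeUpdate();
        }
        finally {
            if (pstmt != null) pstmt.close();
        }
    }
}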
7.4 JDBC Introduction
The JDBC API is a Java API that can access any kind of tabular data, especially data
stored in a Relational Database.
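JDBC helps you to write Java applications that manage these three programming
activities:
1. Connect to a data source, such as a database
2. Send queries and update statements to the database
3. Retrieve and process the results received from the database in answer to your
query
The following code fragment gives a simple example of these three steps:
import java.sql.*;
// The method name, connection URL, table name, and column names are placeholders.
public void connectToAndQueryDatabase(String user, String password) throws SQLException {
    try (Connection con = DriverManager.getConnection("jdbc:myDriver:myDatabase", user, password);
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT a, b, c FROM Table1")) {
        while (rs.next()) {
            int x = rs.getInt("a");
            String s = rs.getString("b");
            float f = rs.getFloat("c");
        }
    }
}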
JDBC includes four components:
1. The JDBC API: The JDBC™ API provides programmatic access to relational
data from the Java™ programming language. Using the JDBC API, applications
can execute SQL statements, retrieve results, and propagate changes back to an
underlying data source. The JDBC API can also interact with multiple data
sources in a distributed, heterogeneous environment.
The JDBC API is part of the Java platform, which includes the Java™ Standard
Edition (Java™ SE ) and the Java™ Enterprise Edition (Java™ EE). The JDBC
4.0 API is divided into two packages: java.sql and javax.sql. Both packages are
included in the Java SE and Java EE platforms.
2. JDBC Driver Manager: The JDBC DriverManager class defines objects which
can connect Java applications to a JDBC driver. DriverManager has traditionally
been the backbone of the JDBC architecture. It is quite small and simple.
The Standard Extension packages javax.naming and javax.sql let you use
a DataSource object registered with a Java Naming and Directory Interface™
(JNDI) naming service to establish a connection with a data source. You can use
either connecting mechanism, but using a DataSource object is recommended
whenever possible.
3. JDBC Test Suite: The JDBC driver test suite helps you to determine that JDBC
drivers will run your program. These tests are not comprehensive or exhaustive,
but they do exercise many of the important features in the JDBC API.
4. JDBC-ODBC Bridge: The Java Software bridge provides JDBC access via
ODBC drivers. Note that you need to load ODBC binary code onto each client
machine that uses this driver. As a result, the ODBC driver is most appropriate on
a corporate network where client installations are not a major problem, or for
application server code written in Java in a three-tier architecture.
This Trail uses the first two of these four JDBC components to connect to a database and
then build a Java program that uses SQL commands to communicate with a test
relational database. The last two components are used in specialized environments to
test web applications, or to communicate with ODBC-aware DBMSs.
7.4.2 JDBC Architecture
The JDBC API supports both two-tier and three-tier processing models for database
access.
Figure 7.1
In the two-tier model, a Java applet or application talks directly to the data source. This
requires a JDBC driver that can communicate with the particular data source being
accessed. A user's commands are delivered to the database or other data source, and the
results of those statements are sent back to the user. The data source may be located on
another machine to which the user is connected via a network. This is referred to as a
client/server configuration, with the user's machine as the client, and the machine housing
the data source as the server. The network can be an intranet, which, for example,
connects employees within a corporation, or it can be the Internet.
In the three-tier model, commands are sent to a "middle tier" of services, which then
sends the commands to the data source. The data source processes the commands and
sends the results back to the middle tier, which then sends them to the user. MIS directors
find the three-tier model very attractive because the middle tier makes it possible to
maintain control over access and the kinds of updates that can be made to corporate data.
Another advantage is that it simplifies the deployment of applications. Finally, in many
cases, the three-tier architecture can provide performance advantages.
Figure 7.2
Until recently, the middle tier has often been written in languages such as C or C++,
which offer fast performance. However, with the introduction of optimizing compilers
that translate Java bytecode into efficient machine-specific code and technologies such as
Enterprise JavaBeans™, the Java platform is fast becoming the standard platform for
middle-tier development. This is a big plus, making it possible to take advantage of Java's
robustness, multithreading, and security features.
With enterprises increasingly using the Java programming language for writing server
code, the JDBC API is being used more and more in the middle tier of a three-tier
architecture. Some of the features that make JDBC a server technology are its support for
connection pooling, distributed transactions, and disconnected rowsets. The JDBC API is
also what allows access to a data source from a Java middle tier.
CHAPTER 8
ALGORITHM DESIGN
8.1.1 Anti-Spam Filter
Figure 8.1
The following section describes some of the basic probability formulas that will be used:
Conditional probability: The probability of an event may depend on the occurrence or
non-occurrence of another event. This dependency is written as the conditional
probability P(A|B), read as "the probability that A will happen given that B already has"
or "the probability of selecting A among B".
Notice that B is given first, and we then find the proportion of A among B.
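In symbols, and assuming P(B) > 0, the standard definition of conditional probability can be written as follows:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
```

As a purely hypothetical illustration, if 40 out of 100 messages contain the word "free" and 30 of those 40 are spam, then P(spam | free) = 30/40 = 0.75.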
Bayes' theorem relates three quantities. The prior probability is the unconditional
probability of our hypothesis before we get any data or new evidence. Simply
speaking, it is the state of our knowledge before the data is observed.
The posterior probability is the conditional probability of our hypothesis
(our state of knowledge) after it has been revised based on the new data.
The likelihood is the conditional probability of the observed data given that our
hypothesis holds.
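Bayes' theorem ties these three quantities together. In the sketch below, H stands for the hypothesis and D for the observed data; both symbols are introduced here for illustration only.

```latex
\underbrace{P(H \mid D)}_{\text{posterior}}
  = \frac{\overbrace{P(D \mid H)}^{\text{likelihood}} \;\; \overbrace{P(H)}^{\text{prior}}}{P(D)}
```

The denominator P(D) is the overall probability of the data and does not depend on the hypothesis.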
The mathematical formalism and its application to a spam filter follow; keep the basic
idea above in mind.
Consider each attribute and the class label as random variables. Given a record with
attributes (A1, A2, …, An), the goal is to predict the class C. Specifically, we want to find
the value of C that maximizes P(C | A1, A2, …, An).
The approach taken is to compute the posterior probability P(C | A1, A2, …, An) for all
values of C using Bayes' theorem.
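We then choose the value of C that maximizes P(C | A1, A2, …, An). Because the
denominator P(A1, A2, …, An) is the same for every class, this is equivalent to
choosing the value of C that maximizes P(A1, A2, …, An | C) P(C).
To simplify the estimation of P(A1, A2, …, An | C), the Naïve Bayesian classifier assumes
that the attributes are independent of one another given the class.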
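Under this independence assumption the joint likelihood factorizes into per-attribute terms. The predicted class, written here as \hat{C} (a symbol introduced for illustration), follows directly:

```latex
P(A_1, A_2, \ldots, A_n \mid C) = \prod_{i=1}^{n} P(A_i \mid C)
\qquad\Longrightarrow\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(A_i \mid C)
```

Each factor P(A_i | C) can be estimated from simple counts, which is what makes the classifier fast and space efficient.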
The Naïve Bayes classifier has the following advantages and disadvantages:
8.1.2 Advantages:
Handles quantitative and discrete data
Robust to isolated noise points
Handles missing values by ignoring the instance during probability estimate calculations
Fast and space efficient
Not sensitive to irrelevant features
Quadratic decision boundary
8.1.3 Disadvantages:
Fails when a conditional probability is zero
Assumes independence of features
Naïve Bayesian prediction requires each conditional probability to be non-zero. Otherwise,
the predicted probability will be zero.
To overcome this, the probability is estimated using one of the following:
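Two estimators commonly used for this purpose are the Laplace estimate and the m-estimate. They are given here as a sketch under assumed notation: N_c is the number of training records of class c, N_ic the number of those records in which attribute A_i takes the value in question, k the number of distinct values A_i can take, p a prior estimate of the probability, and m a weight given to that prior.

```latex
\text{Laplace:}\quad P(A_i \mid C) = \frac{N_{ic} + 1}{N_c + k}
\qquad\qquad
m\text{-estimate:}\quad P(A_i \mid C) = \frac{N_{ic} + m\,p}{N_c + m}
```

Both estimates stay strictly positive even when N_ic = 0, so a single unseen attribute value no longer forces the whole product to zero.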
To classify an email as spam or non-spam, the following techniques and assumptions
are used:
Sort by class (spam or non-spam), then by word, then by count
If a word does not exist, approximate P(word|class) using the Laplacian estimate
Use the following table for the analysis
The antispam-table.txt file is the file that contains each word that content filtering uses to
determine if a message is spam. Beside each word there are three numbers. The first
number is an identifier assigned by the anti-spam engine. The second number is the
number of times that the word has occurred in non-spam e-mail messages. The third
number is the number of times that the word has occurred in spam e-mail messages.
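The way such a table can drive a Naïve Bayes decision is sketched below in Java. The row layout (word, identifier, non-spam count, spam count, separated by whitespace) follows the description above, but the exact file syntax, the class name SpamFilterSketch, the tokenization and the prior passed to isSpam are assumptions made for illustration. This is not the anti-spam engine's own code.

```java
import java.util.HashMap;
import java.util.Map;

public class SpamFilterSketch {
    // Per-word counts: index 0 = occurrences in non-spam, index 1 = in spam.
    private final Map<String, long[]> counts = new HashMap<>();
    private long totalNonSpam = 0;
    private long totalSpam = 0;

    // Parses one row, assumed to look like "word id nonSpamCount spamCount".
    public void addRow(String line) {
        String[] f = line.trim().split("\\s+");
        long nonSpam = Long.parseLong(f[2]);
        long spam = Long.parseLong(f[3]);
        counts.put(f[0].toLowerCase(), new long[] {nonSpam, spam});
        totalNonSpam += nonSpam;
        totalSpam += spam;
    }

    // Laplace-smoothed log P(word | class); cls is 0 for non-spam, 1 for spam.
    private double logLikelihood(String word, int cls) {
        long[] c = counts.get(word);
        long n = (c == null) ? 0 : c[cls]; // unseen words count as zero
        long total = (cls == 0) ? totalNonSpam : totalSpam;
        return Math.log((n + 1.0) / (total + counts.size()));
    }

    // Compares log P(spam) + sum of log-likelihoods against the non-spam score.
    public boolean isSpam(String message, double priorSpam) {
        double spamScore = Math.log(priorSpam);
        double nonSpamScore = Math.log(1.0 - priorSpam);
        for (String w : message.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            spamScore += logLikelihood(w, 1);
            nonSpamScore += logLikelihood(w, 0);
        }
        return spamScore > nonSpamScore;
    }
}
```

Working with sums of logarithms instead of products of probabilities avoids numeric underflow on long messages, and the "+ 1" in the numerator is the Laplacian correction for words that never occurred in one of the classes.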
8.2 Unicode Encryption Algorithm
Unicode is a computing industry standard for the consistent representation and
handling of text expressed in most of the world's writing systems. Developed
in conjunction with the Universal Character Set standard and published in book form as
The Unicode Standard, the latest version of Unicode consists of a repertoire of more
than 107,000 characters covering 90 scripts, a set of code charts for visual reference, an
encoding methodology and set of standard character encodings, an enumeration of
character properties such as upper and lower case, a set of reference data computer files,
and a number of related items, such as rules for normalization,
decomposition, collation, rendering, and bidirectional display order (for the correct
display of text containing both right-to-left scripts, such as Arabic or Hebrew, and
left-to-right scripts). This project introduces a cryptographic technique that uses
Unicode and the colors supported by the computer.
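8.2.1 Algorithm
public String doEncryption(String actualWord) {
    short shiftKey = 7; // fixed shift added to every character code
    StringBuilder encryptedWord = new StringBuilder();
    for (int i = 0; i <= (actualWord.length() - 1); i++) {
        char c = actualWord.charAt(i);
        short k = (short) c; // UTF-16 code unit of the character
        short k2 = (short) (k + shiftKey); // shifted code unit
        encryptedWord.append((char) k2);
    }
    return encryptedWord.toString();
}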
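To show that the shift is reversible, the following self-contained sketch pairs the encryption with an inverse. The class name ShiftCipherSketch and the decrypt method are hypothetical additions; only the shift key of 7 and the character-by-character shifting come from the listing above.

```java
public class ShiftCipherSketch {
    private static final int SHIFT_KEY = 7; // same key value as in doEncryption

    // Adds the key to every UTF-16 code unit (wraps around past 0xFFFF).
    public static String encrypt(String plain) {
        StringBuilder out = new StringBuilder(plain.length());
        for (int i = 0; i < plain.length(); i++) {
            out.append((char) (plain.charAt(i) + SHIFT_KEY));
        }
        return out.toString();
    }

    // Hypothetical inverse: subtracts the key from every code unit.
    public static String decrypt(String cipher) {
        StringBuilder out = new StringBuilder(cipher.length());
        for (int i = 0; i < cipher.length(); i++) {
            out.append((char) (cipher.charAt(i) - SHIFT_KEY));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encrypted = encrypt("bomb");
        System.out.println(encrypted);          // prints "ivti"
        System.out.println(decrypt(encrypted)); // prints "bomb"
    }
}
```

Because encryption is deterministic, an encrypted keyword list can be matched against mail text that was encrypted with the same key, without first decrypting either side.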
CHAPTER 9
SOFTWARE TESTING
Testing is a process that reveals errors in the program. It is the major quality
measure employed during software development. During
testing, the program is executed with a set of test cases, and the output of the program for
the test cases is evaluated to determine whether the program is performing as it is expected to
perform.
9.1.1.2 WHITE BOX TESTING
In this approach, test cases are generated from the logic of each module by drawing the flow graph of that
module, and the logical decisions are tested in all cases. It has been used to generate test cases that:
Guarantee that all independent paths have been executed.
Execute all logical decisions on their true and false sides.
Execute all loops at their boundaries and within their operational bounds.
Exercise internal data structures to ensure their validity.
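The criteria above can be illustrated with a small Java sketch. The method countKeyword is a hypothetical unit invented for this example, not a module of the project; the checks exercise its loop at zero, one and many iterations and its decision on both the true and the false side.

```java
public class WhiteBoxSketch {
    // Hypothetical unit under test: counts the words that equal the keyword.
    public static int countKeyword(String[] words, String keyword) {
        int hits = 0;
        for (String w : words) {                 // loop under test
            if (w.equalsIgnoreCase(keyword)) {   // decision under test
                hits++;
            }
        }
        return hits;
    }

    private static void check(boolean ok, String label) {
        if (!ok) {
            throw new AssertionError(label);
        }
    }

    public static void main(String[] args) {
        check(countKeyword(new String[] {}, "bomb") == 0, "loop skipped");
        check(countKeyword(new String[] {"bomb"}, "bomb") == 1, "one pass, true side");
        check(countKeyword(new String[] {"hello"}, "bomb") == 0, "one pass, false side");
        check(countKeyword(new String[] {"Bomb", "x", "BOMB"}, "bomb") == 2, "many passes");
        System.out.println("all white-box cases passed");
    }
}
```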
the module and provides the needed data so that the module is asked to perform the way
it will when embedded within the larger system. When the bottom-level modules have been
tested, attention turns to those on the next level that use the lower-level ones; they are tested
individually and then linked with the previously examined lower-level modules.
9.3 VALIDATION
The system has been tested and implemented successfully, which ensures that
all the requirements listed in the software requirements specification are completely
fulfilled. In case of erroneous input, corresponding error messages are displayed.
CHAPTER 10
SYSTEM REQUIREMENTS
• Services :: JDBC
CHAPTER 11
TOOLS
Tools for Java developers creating Java EE and Web applications, including a Java IDE,
tools for Java EE, JPA, JSF, Mylyn, EGit and others.
11.1.2 Package includes:
The Java Development Kit (JDK) is an implementation of one of the Java
SE, Java EE or Java ME platforms released by Oracle Corporation in the form of a binary
product aimed at Java developers on Solaris, Linux, Mac OS X or Windows. The JDK
includes a private JVM and a few other resources needed to complete the development of a Java
application. Since the introduction of the Java platform, it has been by far the most
widely used Software Development Kit (SDK). On 17 November 2006, Sun announced
that it would be released under the GNU General Public License (GPL), thus making
it free software. This happened in large part on 8 May 2007, when Sun contributed the
source code to the OpenJDK.
11.3.1 Design:
11.3.2 Develop:
MySQL Workbench delivers visual tools for creating, executing, and optimizing
SQL queries. The SQL Editor provides color syntax highlighting, auto-complete, reuse of
SQL snippets, and execution history of SQL. The Database Connections Panel enables
developers to easily manage standard database connections, including MySQL Fabric.
The Object Browser provides instant access to database schema and objects.
11.3.3 Administer:
IO hotspots, high cost SQL statements, and more. Plus, with 1 click, developers can see
where to optimize their query with the improved and easy to use Visual Explain Plan.
11.3.5 Database Migration:
MySQL Workbench now provides a complete, easy to use solution for migrating
Microsoft SQL Server, Microsoft Access, Sybase ASE, PostgreSQL, and other RDBMS
tables, objects and data to MySQL. Developers and DBAs can quickly and easily convert
existing applications to run on MySQL both on Windows and other platforms. Migration
also supports migrating from earlier versions of MySQL to the latest releases.
11.4.1 Terminology:
each instance. If multiple instances are not configured, $CATALINA_BASE is the same
as $CATALINA_HOME.
/bin - Startup, shutdown, and other scripts. The *.sh files (for Unix systems) are
functional duplicates of the *.bat files (for Windows systems). Since the Win32
command-line lacks certain functionality, there are some additional files in here.
/conf - Configuration files and related DTDs. The most important file in here is
server.xml. It is the main configuration file for the container.
/logs - Log files are here by default.
/webapps - This is where your webapps go.
CHAPTER 12
WEB APPLICATIONS
To run this tutorial, as a minimum you will be required to have installed the following
prerequisite software:
Geronimo version 2.1.x, Java 1.5 runtime, and Eclipse Ganymede are used in this
tutorial, but other versions can be used instead (e.g., Geronimo version 2.2, Java 1.6,
Eclipse Europa).
Details on installing Eclipse are provided in the Development environment section. This
tutorial is organized in the following sections:
Figure 12.1
Figure 12.2
1. Right-click in the Project Explorer and select Dynamic Web Project as shown in the
figure.
Figure 12.3
2. Name the project as HelloWorld.
Figure 12.4
3. Keep default values for all the fields and select Finish.
Figure 12.5
1. Right-click on the project HelloWorld and create a new JSP as shown in the
figure.
Figure 12.6
2. Name the file hello.jsp and select Next. Select Finish on the next screen.
Figure 12.7
hello.jsp
<html>
<body>
Hello World!!
</body>
</html>
web.xml
Run and deploy
Figure 12.8
2. Launch the application using http://localhost:8080/HelloWorld/hello.jsp
Figure 12.9
CHAPTER 13
13.1 Conclusion
Email has become an efficient and popular communication mechanism as the
number of Internet users increases. In many security informatics applications it is
important to detect deceptive communication in email. In this application, mails are
classified as suspicious or normal using keywords and encrypted keywords.
Mails containing these keywords or encrypted keywords are classified as suspicious;
they can be blocked and verified by the administrator. The proposed work will
help to identify suspicious email and will also assist investigators in getting the
information in time to take effective action to reduce criminal activities.
13.2 Future Enhancements
Even though the project fulfills the requirements of the present application, there is
always scope for further work. As requirements change and new versions emerge,
further work can be done to improve the application, since the project is designed as
flexible software.
This web-based application currently runs as a standalone application. It can be
deployed on the Internet by buying hosting space and creating a website.
When deployed on the Internet, the application requires a large database as the
backend; the MySQL database can be used for this purpose.
As encrypted keywords are used in this application, many encryption
algorithms such as RSA, DES/3DES, Blowfish, AES, IDEA and others can be implemented.
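As one possible direction, the sketch below encrypts a keyword with AES using the standard javax.crypto API. The class name AesKeywordSketch, the choice of GCM mode, the 128-bit key size and the layout of the output (IV followed by ciphertext) are assumptions made for illustration and are not part of the present application.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class AesKeywordSketch {
    private static final int IV_BYTES = 12;  // IV length commonly used with GCM
    private static final int TAG_BITS = 128; // authentication tag length

    // Generates a fresh random 128-bit AES key.
    public static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns a random IV followed by the ciphertext and its tag.
    public static byte[] encrypt(SecretKey key, String keyword) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(keyword.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the data, then decrypts the remainder.
    public static String decrypt(SecretKey key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike a fixed character shift, this scheme uses a fresh random IV for every call, so the same keyword encrypts to different bytes each time. Matching encrypted keywords against mail text would therefore require decrypting the keyword list first.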
ABBREVIATIONS
RDX: Research Department Explosive
TME: Targeted Malicious Email
NTME: Non-Targeted Malicious Email
SSL: Secure Sockets Layer
MVC: Model View Controller
JSP: JavaServer Pages
POJO: Plain Old Java Object
HTML: Hypertext Markup Language
J2EE: Java 2 Platform, Enterprise Edition
TDD: Test Driven Development
UML: Unified Modeling Language
JVM: Java Virtual Machine
OS: Operating System
API: Application Programming Interface
XML: Extensible Markup Language
JDK: Java Development Kit
GUI: Graphical User Interface
JDBC: Java Database Connectivity
JNDI: Java Naming and Directory Interface
RMI: Remote Method Invocation
RMI-IIOP: Remote Method Invocation over Internet Inter-ORB Protocol
CGI: Common Gateway Interface
HTTP: Hypertext Transfer Protocol
URL: Uniform Resource Locator
JSP-EL: JavaServer Pages Expression Language
JSTL: JavaServer Pages Standard Tag Library
SQL: Structured Query Language
POTS: Plain Old Telephone Service
PODS: Plain Old Data Structure
POD: Plain Old Documentation
PHP: Hypertext Preprocessor
POCO: Plain Old CLR Object
POPO: Plain Old PHP Object
SJO: Specialized Java Object
RDBMS: Relational Database Management System
DML: Data Manipulation Language
DDL: Data Definition Language
TCL: Transaction Control Language
CSS: Cascading Style Sheets
SDK: Software Development Kit
GPL: General Public License