3.1 External Interfaces
Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
1.6 Glossary
2. The Overall Description
2.1 Product Perspective
2.1.1 System Interfaces
2.1.2 Interfaces
2.1.3 Hardware Interfaces
2.1.4 Software Interfaces
2.1.5 Communications Interfaces
2.1.6 Memory Constraints
2.1.7 Operations
2.1.8 Prototyping
2.1.9 Site Adaptation Requirements
2.2 Product Functions
2.3 User Characteristics
2.4 Constraints
2.5 Assumptions and Dependencies
2.6 Apportioning of Requirements
3. Specific Requirements
3.1 External interfaces
3.2 Functions
3.3 Performance Requirements
3.4 Quality Requirements
3.4.1 Quality Factors
3.4.2 Correctness
3.4.3 Efficiency
3.4.4 Usability
3.4.5 Testability
3.4.6 Flexibility
3.4.7 Reusability
3.4.8 Interoperability
3.4.9 Additional Factors
3.4 Logical Database Requirements
3.5 Design Constraints
3.6 Software System Attributes
3.6.1 Reliability
3.6.2 Availability
3.6.3 Security
3.6.4 Maintainability
3.6.5 Portability
3.7 Organizing the Specific Requirements
3.7.1 System Mode
3.7.2 User Class
3.7.3 Objects
3.7.4 Use cases
3.7.4 Feature
3.7.5 Stimulus
3.7.6 Response
3.7.7 Functional Hierarchy
3.8 Additional Comments
4. Change Management Process
1 Introduction
This software requirements specification (SRS) addresses the user requirements for
the software to be developed. It describes the purpose, scope, and functional and
non-functional requirements of ‘Smart city’, and it also specifies the user interface,
performance, and related requirements. Using J2EE and XML as the framework for
developing the system reduces the communication gap between participant and
system.
1.1 Purpose
The purpose of this software requirements specification is to document the
requirements needed to build this project. The project is to develop a desirable
web-based platform for business executives and tourists arriving in the city. It
provides reliable service, faster access, and flexibility to its users.
1.2 Scope
The project “DISCOVERING E-MAIL SPAM AND ANTICIPATION
VIA COLLABORATION AND AI TECHNIQUES” is a web application.
Spam is unsolicited email, and it is a serious concern because it wastes the resources
of the users and ISPs who have to handle it. Multiple techniques have been
proposed to contain spam. The collaborative technique is widely used: mail
servers update each other on recently discovered spam. Each server stores
digests of identified spam, against which digests of incoming mail are matched.
This project implements spam detection algorithms based on identifying spam
by ‘fingerprints’, and uses a collaborative approach to update other servers in
the network. Collaborative identification of spam exploits the fact that a
spam message is usually sent by an automated system to many recipients.
Fingerprint processing compares fingerprint values to detect spam. Since a
fingerprint is smaller than the email message it represents, fingerprint vectors
can be compared much more easily and efficiently than the messages themselves.
Important Modules:
1. Authentication.
2. Finger Print Processing.
3. Finger Print Techniques:
Collaborative.
Bayesian.
4. Legal Word Verification.
5. Illegal Word Verification.
1.4 References
• E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, P2P systems. IEEE
Transactions on Knowledge and Data Engineering, 15(4):840–854, July/August
2003.
• E. Damiani, S. De Capitani di Vimercati, S. Paraboschi, and P. Samarati. Using digests to
identify spam messages. Technical report, University of Milan, 2004.
• F. Zhou, L. Zhuang, B. Y. Zhao, L. Huang, A. D. Joseph, and J. Kubiatowicz. Approximate
Object Location and Spam Filtering on Peer-to-peer Systems.
1.5 Overview
1.6 Glossary
1.2. PROBLEM DEFINITION:
Important Modules:
1. Authentication.
2. Finger Print Processing.
3. Finger Print Techniques:
Collaborative.
Bayesian.
4. Legal Word Verification.
5. Illegal Word Verification.
MODULE DESCRIPTION:
Authentication:
Each user is provided with an ID and password, and only authorized users are allowed
to log in. In this module the server first checks the user name and password. If they do not
match, the client is disconnected. If they match, the user is allowed to send and transfer mail.
A user who does not yet have an account can also register as a new user.
Spam can be detected by comparing fingerprint values. A fingerprint is a digest value that
is unique for a string; a unique “fingerprint” is developed for each specific spam e-mail that is
identified. It is calculated by a fingerprint algorithm that works on substrings, and it is
smaller than the email itself.
• Divide the document into all possible consecutive substrings of length L.
• A document of n characters will have (n-L+1) substrings.
• Rank the substrings by frequency of occurrence.
• Based on the rank, create the fingerprint from the unique substrings.
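The substring steps listed above can be sketched in Java as follows. This is only an illustrative sketch, not the project's actual algorithm: the class and method names are hypothetical, the fingerprint is assumed to be the digests of the most frequent substrings, and String.hashCode() stands in for whatever digest function the real system uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubstringFingerprint {

    // Every consecutive substring of length len; a text of n characters yields n - len + 1 of them.
    static List<String> substrings(String text, int len) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + len <= text.length(); i++) {
            out.add(text.substring(i, i + len));
        }
        return out;
    }

    // Ranks the distinct substrings by frequency and keeps the digests of the top `size` entries.
    static List<Integer> fingerprint(String text, int len, int size) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : substrings(text, len)) {
            counts.merge(s, 1, Integer::sum);
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most frequent first; ties are broken alphabetically so the result is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        List<Integer> digests = new ArrayList<>();
        for (String s : ranked.subList(0, Math.min(size, ranked.size()))) {
            digests.add(s.hashCode()); // assumption: hashCode() stands in for the real digest function
        }
        return digests;
    }

    public static void main(String[] args) {
        // "abcabc" has 6 characters, so with L = 3 there are 6 - 3 + 1 = 4 substrings.
        System.out.println(substrings("abcabc", 3));
        System.out.println(fingerprint("abcabc", 3, 2));
    }
}
```

Two messages that share most of their frequent substrings produce overlapping fingerprints, so a server can compare short lists of integers instead of whole message bodies.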
Bayesian filters are personalized to each user and adapt automatically to changes in
spam. To determine the likelihood that an email is spam, these filters use Bayesian analysis to
compare the words or phrases in the email in question with the frequency of the same words or
phrases in the intended recipient's previous emails (both legitimate and spam). The filter checks
the received mail against the previous mails, i.e. against the words and characters for which
digest values have already been created.
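As an illustration of the Bayesian comparison described above, the following sketch trains on a user's previous legitimate and spam messages and estimates the probability that a new message is spam. It assumes a simple word-level naive Bayes model with add-one smoothing; the class name, the tokenization, and the smoothing are assumptions made for the example, not details taken from the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class BayesianSpamFilter {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamWords, hamWords, spamMessages, hamMessages;

    // Lower-cases the message and splits it into words on non-word characters.
    static List<String> tokenize(String message) {
        List<String> words = new ArrayList<>();
        for (String w : message.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Records the words of one previously classified message of this user.
    void train(String message, boolean isSpam) {
        for (String w : tokenize(message)) {
            vocabulary.add(w);
            if (isSpam) { spamCounts.merge(w, 1, Integer::sum); spamWords++; }
            else { hamCounts.merge(w, 1, Integer::sum); hamWords++; }
        }
        if (isSpam) spamMessages++; else hamMessages++;
    }

    // Estimated probability that the message is spam, computed in log space to avoid underflow.
    double spamProbability(String message) {
        double total = spamMessages + hamMessages;
        double logSpam = Math.log((spamMessages + 1.0) / (total + 2.0));
        double logHam = Math.log((hamMessages + 1.0) / (total + 2.0));
        int v = vocabulary.size();
        for (String w : tokenize(message)) {
            // Add-one smoothing so that unseen words do not force a probability of zero.
            logSpam += Math.log((spamCounts.getOrDefault(w, 0) + 1.0) / (spamWords + v + 1.0));
            logHam += Math.log((hamCounts.getOrDefault(w, 0) + 1.0) / (hamWords + v + 1.0));
        }
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    public static void main(String[] args) {
        BayesianSpamFilter filter = new BayesianSpamFilter();
        filter.train("win free money now", true);
        filter.train("meeting agenda for monday", false);
        System.out.println(filter.spamProbability("free money"));
    }
}
```

Because each user trains a separate filter instance, the same message can score differently for different recipients, which matches the personalized behaviour described above.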
[Figure: clients log in to their mail servers; the mail servers communicate with each other.]
2. SYSTEM ANALYSIS
What is spam for one user can be a legitimate email for another, which adds to the
complexity of the problem. Many systems are evolving to deal with this menace. Some of them
use user reports and automated learning systems that block spam as and when it is reported.
Unless spam is detected in the first place, monitoring and eliminating it is not possible. With
this in mind, the implemented system explores a combination of methods to detect spam. Spam
is a problem because it wastes resources; spam detection systems reduce the waste of mailbox
capacity, time, and money.
The proposed system recognizes the need to integrate information about spam from
every available source. Whereas some existing systems take a piecemeal approach, the
proposed system gathers spam information from all legitimate sources, such as peer servers,
client-side reporting, server-side reporting, and global spam declarations.
It also applies fingerprint processing, which is a recent development in spam detection
methods. It uses a combination of these approaches rather than the single-dimension focus of
many current systems. In essence, it tries to leverage the best practices of as many approaches
as possible and integrate them into a unified approach. Received mail is also compared with
existing mail to find spam messages that were sent simultaneously.
The implemented system utilizes the following techniques to detect spam:
• Fingerprint checking
• Pattern Matching
• Space count filter
• Scoring
The implemented system aims to use filter-based spam detection methods to detect spam
and classify it as such. It uses this approach in unison with fingerprint processing, which
increases efficiency and the spam detection hit ratio. The system also makes use of multiple
sources of spam reporting: mail server side reporting, peer-to-peer reporting by other servers,
in-house client-side reporting to mail servers, and server-side reporting to clients.
Furthermore, care should be taken not to filter clients' legitimate emails in the haste to
contain spam. The implemented system uses these approaches to optimize performance.
The implemented system uses fingerprint checking to mark a message as spam. Scoring
is a further stage that adds to the previous steps.
Spam detection remains an active area of research, and no single solution has been found
to be completely satisfactory. The implemented system aims to fulfill some of the objectives
pertaining to spam detection.
Rule-based filtering.
Rule-based filters assign a spam “score” to each email based on whether the email
contains features typical of spam messages, such as fake SMTP components, keywords, HTML
formatting like fancy fonts and background colors. A major problem with rule-based scores is
that since their semantics is not well-defined, it is difficult to aggregate them and to establish a
threshold that can actually limit the number of false positives. Also, experience has shown that
spammers quickly learn feature-based rules and freely investigate ways to overcome them.
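A rule-based score of the kind described above can be sketched as a list of weighted rules whose weights are summed and compared with a threshold. The rules, weights, and threshold below are hypothetical values chosen for illustration; the source does not specify them.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

class RuleBasedScorer {
    // Each rule is a test on the raw email text, paired with the score it contributes when it fires.
    private final Map<Predicate<String>, Double> rules = new LinkedHashMap<>();

    void addRule(Predicate<String> rule, double weight) {
        rules.put(rule, weight);
    }

    // Sum of the weights of all rules that match the email.
    double score(String email) {
        double total = 0.0;
        for (Map.Entry<Predicate<String>, Double> entry : rules.entrySet()) {
            if (entry.getKey().test(email)) {
                total += entry.getValue();
            }
        }
        return total;
    }

    boolean isSpam(String email, double threshold) {
        return score(email) >= threshold;
    }

    // Hypothetical example rules: a keyword and two kinds of HTML formatting.
    static RuleBasedScorer withExampleRules() {
        RuleBasedScorer scorer = new RuleBasedScorer();
        scorer.addRule(e -> e.toLowerCase(Locale.ROOT).contains("free money"), 2.0);
        scorer.addRule(e -> e.toLowerCase(Locale.ROOT).contains("<font"), 1.0);
        scorer.addRule(e -> e.toLowerCase(Locale.ROOT).contains("bgcolor"), 1.0);
        return scorer;
    }

    public static void main(String[] args) {
        RuleBasedScorer scorer = withExampleRules();
        System.out.println(scorer.isSpam("<font color=red>FREE MONEY</font>", 2.5));
    }
}
```

The sketch also makes the weakness noted above visible: the weights have no defined semantics, so the threshold has to be tuned by hand, and a spammer who learns the rules can simply avoid the listed features.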
• Preferred list
A preferred list of e-mail addresses is maintained separately for each client. Incoming
mail is checked against this list before it is granted access to the client's inbox: if the
preferred list that the client submitted to the service provider does not contain the sender's
e-mail ID, the incoming email is filtered.
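The preferred-list check can be sketched as a per-client set of accepted sender addresses. The class and method names are hypothetical, and the case-insensitive comparison of addresses is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class PreferredListFilter {
    // One preferred list per client, keyed by the client's ID.
    private final Map<String, Set<String>> preferredByClient = new HashMap<>();

    void addPreferred(String client, String sender) {
        preferredByClient
            .computeIfAbsent(client, k -> new HashSet<>())
            .add(sender.toLowerCase(Locale.ROOT));
    }

    // An incoming mail is accepted only if its sender is on the recipient's preferred list.
    boolean accepts(String client, String sender) {
        Set<String> preferred = preferredByClient.get(client);
        return preferred != null && preferred.contains(sender.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        PreferredListFilter filter = new PreferredListFilter();
        filter.addPreferred("alice", "bob@example.com");
        System.out.println(filter.accepts("alice", "bob@example.com"));
        System.out.println(filter.accepts("alice", "unknown@example.com"));
    }
}
```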
This is a comprehensive report that contains the list of spam reported across geographic
regions and domains. The two most important sources are:
Source 1: Clients of the server who report spam. These can be within the same network
or on another network.
Source 2: A global spam report from another server, also called an ALERT. The
illustration depicts some of the spam reports that the system recognizes.
[Figure: spam reports received by the mail server.]
Collaborative identification of spam exploits the fact that a spam message is usually
sent by an automated system to many recipients. In general, the “spam/ham” function is not a
computable function, and an accurate determination can only be based on evaluating the
collective opinion of the user population.
• When a spam report is received, the server updates its spam dictionary, creates the related
digest, stores it, and forwards it to the other servers. This is known as the collaborative
approach because the servers share spam digests among themselves.
• The server uses this dictionary to compare fingerprint values.
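The store-and-forward behaviour of the collaborative approach can be sketched with in-memory server objects. This is a simplification under stated assumptions: real servers would exchange digests over the network, and an exact SHA-256 hash is used here in place of the project's fingerprint digest, so only identical messages match. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollaborativeServer {
    private final Set<String> spamDigests = new HashSet<>();
    private final List<CollaborativeServer> peers = new ArrayList<>();

    void addPeer(CollaborativeServer peer) {
        peers.add(peer);
    }

    // A client reports a spam message: create its digest, store it, and share it.
    void reportSpam(String message) {
        receiveDigest(digest(message));
    }

    // Stores a digest and forwards it only if it is new, so mutually connected servers do not loop.
    void receiveDigest(String digest) {
        if (spamDigests.add(digest)) {
            for (CollaborativeServer peer : peers) {
                peer.receiveDigest(digest);
            }
        }
    }

    // Incoming mail is checked by comparing its digest with the stored digests.
    boolean isKnownSpam(String message) {
        return spamDigests.contains(digest(message));
    }

    // Assumption: SHA-256 over the whole message stands in for the fingerprint digest.
    static String digest(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(message.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is available on every standard Java platform
        }
    }

    public static void main(String[] args) {
        CollaborativeServer a = new CollaborativeServer();
        CollaborativeServer b = new CollaborativeServer();
        a.addPeer(b);
        b.addPeer(a);
        a.reportSpam("Buy now");
        System.out.println(b.isKnownSpam("Buy now"));
    }
}
```

Only the digests travel between servers, never the message bodies, which keeps the exchanged data small.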
FINGERPRINT PROCESSING
Spam can be detected by comparing fingerprint values. A fingerprint is a digest value that
is unique for a string; a unique “fingerprint” is developed for each specific spam e-mail that is
identified. It is calculated by a fingerprint algorithm that works on substrings. Because
fingerprints are smaller than the email messages they represent, they can be compared
efficiently as fingerprint vectors.
A special case arises when a spammer inserts spaces between the characters of a particular
string to avoid detection by dictionary checking. In such cases, the intra-string spaces are
eliminated to re-form the word, which is then compared with the spam dictionary. The
dictionary contains unwanted words, and each word is given an individual percentage. Mail
can also be filtered on user input, where the user marks particular IDs as spam IDs.
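The space-elimination step can be sketched with a regular expression that finds runs of single characters separated by single spaces and joins them back into a word before the dictionary lookup. The pattern, the minimum run length of three characters, and the idea of summing the per-word percentages are assumptions made for this example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpacedWordFilter {
    // Three or more single characters separated by single spaces, e.g. "f r e e".
    private static final Pattern SPACED = Pattern.compile("\\b(?:\\w ){2,}\\w\\b");

    // Removes the intra-string spaces so that "f r e e" becomes "free".
    static String collapseSpacedWords(String text) {
        Matcher m = SPACED.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replace(" ", "")));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Sums the percentage assigned to each dictionary word found in the text.
    static int spamPercentage(String text, Map<String, Integer> dictionary) {
        int total = 0;
        for (String word : collapseSpacedWords(text).toLowerCase(Locale.ROOT).split("\\W+")) {
            total += dictionary.getOrDefault(word, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(collapseSpacedWords("buy f r e e pills"));
    }
}
```

A run of legitimate one-letter words would also be joined by this pattern, so a real filter would need an additional check before treating the joined word as spam evidence.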
[Figure: substrings are sorted and selected to produce digest values such as 1303805399 and
7515253209. Column labels: Substring, Digest.]
APPLICATIONS
Current systems follow different approaches to detect spam, but without much
success, as the spread of spam to alarming levels shows. Most systems rely on the traditional
filtering mechanisms. The proposed system recognizes the need to integrate information about
spam from every available source. Whereas some existing systems take a piecemeal approach,
the proposed system gathers spam information from all legitimate sources, such as peer
servers, client-side reporting, server-side reporting, and global spam declarations. It also
applies fingerprint processing, which is a recent development in spam detection methods. It
uses a combination of these approaches rather than the single-dimension focus of many
current systems. In essence, it tries to leverage the best practices of as many approaches as
possible and integrate them into a unified approach.
Such solutions consist of software plug-ins installed on every computer that needs spam
filtering. This approach is mainly used by organizations with few computers, whose users are
in charge of managing the filters; it is awkward for larger organizations to adopt. Web-based
or mobile access to email is not protected by plug-ins, because filtering only operates on the
computer where the plug-in is running.
GATEWAY FILTERING
In this approach, all inbound email is routed through a filtering gateway before being
delivered to the mail server. Gateway services work well with web based and mobile access to
email, and may increase robustness since they queue emails if the client network or server is
offline. On the other hand, the gateway itself is a single point of failure and may be difficult to
manage in the presence of multiple mail servers within an organization. A correct approach to spam
filtering should not mandate any of the above choices. P2P architectures can provide high
flexibility, because they smoothly adapt themselves to the underlying network and emerging
application architectures. Also, component-based design should be used to deploy anti-spam
filters on single-user mailers residing on personal computers as well as on organization-wide
mailers running on server clusters.
INTRODUCTION TO JAVA
Java is two things: a programming language and a platform. Java is a high-level
programming language that is all of the following:
• Simple
• Architecture-neutral
• Object-oriented
• Portable
• Distributed
• High-performance
• Interpreted
• Multithreaded
• Robust
• Dynamic
• Secure
Java is also unusual in that each Java program is both compiled and interpreted.
With a compiler, you translate a Java program into an intermediate language called Java byte
codes; with an interpreter, each platform-independent byte code instruction is parsed and run
on the computer. Compilation happens just once; interpretation occurs each time the
program is executed. The figure illustrates how this works.
[Figure: the compiler translates My Program into byte codes, which the interpreter runs.]
You can think of Java byte codes as the machine code instructions for the Java
Virtual Machine (Java VM). Every Java interpreter, whether it’s a Java development tool or a
Web browser that can run Java applets, is an implementation of the Java VM. The Java VM
can also be implemented in hardware.
Java byte codes help make “write once, run anywhere” possible. You can compile
your Java program into byte codes on any platform that has a Java compiler. The byte codes
can then be run on any implementation of the Java VM. For example, the same Java program
can run on Windows NT, Solaris, and Macintosh.
JAVA PLATFORM
You’ve already been introduced to the Java VM. It’s the base for the Java
platform and is ported onto various hardware-based platforms.
The Java API is grouped into libraries (packages) of related components. The next
section, “What can Java do?”, highlights each area of functionality provided by the packages
in the Java API.
The following figure depicts a Java program, such as an application or applet, that’s
running on the Java platform. A special kind of application known as a server serves and
supports clients on a network. Examples of servers include Web servers, proxy servers,
mail servers, print servers, and boot servers. Another specialized program is a servlet.
Servlets are similar to applets in that they are runtime extensions of an application. Instead
of working in browsers, though, servlets run within Java Web servers, configuring or
tailoring the server.
How does the Java API support all of these kinds of programs? With packages of
software components that provide a wide range of functionality. The core API is the API
included in every full implementation of the platform.
[Figure: a Java program runs on the Java platform (Java API and Java Virtual Machine), which
sits on top of the hardware.]
The API and the Virtual Machine insulate the Java program from hardware dependencies.
As a platform-independent environment, Java can be a bit slower than native code. However,
smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring
Java’s performance close to that of native code without threatening portability.
However, Java is not just for writing cute, entertaining applets for the
World Wide Web (WWW). Java is a general-purpose, high-level programming language and
a powerful software platform. Using the Java API, you can write many types of
programs.
The most common types of program are probably applets and applications, where a
Java application is a standalone program that runs directly on the Java platform.
SECURITY:
The most popular and widely accepted database connectivity interface, Open Database
Connectivity (ODBC), is used to access relational databases. It offers the ability to
connect to almost all databases on almost all platforms. Java applications can also use
ODBC to communicate with a database. Why, then, do we need JDBC? There are several
reasons:
• The ODBC API was written entirely in C and makes extensive use of pointers. Calls
from Java to native C code have a number of drawbacks for the security,
implementation, robustness, and automatic portability of applications.
• ODBC is hard to learn. It mixes simple and advanced features together, and it has
complex options even for simple queries.
• ODBC drivers must be installed on the client’s machine.
Architecture of JDBC:
[Figure: JDBC architecture, showing the JDBC drivers.]
1. Application Layer: The Java program wants to get a connection to a database. It needs
information from the database to display on the screen, to modify existing data,
or to insert data into a table.
2. Driver Manager: This layer is the backbone of the JDBC architecture. When it
receives a connection request from the JDBC application layer, it tries to find the
appropriate driver by iterating through all the available drivers that are currently
registered with the Driver Manager. After finding the right driver, it connects the
application to the appropriate database.
3. JDBC Driver Layer: This layer accepts the SQL calls from the application and
converts them into native calls to the database, and vice versa. A JDBC driver is
responsible for ensuring that an application has consistent and uniform access to
any database.
When a request is received from the application, the JDBC driver passes the request to the
ODBC driver; the ODBC driver communicates with the database, sends the request,
and gets the results. The results are passed to the JDBC driver and in turn to the
application. So the JDBC driver has no knowledge of the actual database; it only knows
how to pass the application's request to ODBC and get the results from ODBC.
How do JDBC and ODBC interact with each other? Both the JDBC API and ODBC
are built on an interface called the “Call Level Interface” (CLI). Because of this, the
JDBC driver can translate the request into an ODBC call. ODBC then converts the
request again and presents it to the database. The results of the request are then fed
back through the same channel in reverse.
JDBC Connection
connection.close();
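A minimal sketch of how an application asks the Driver Manager for a connection and closes it again is shown below. The JDBC URL is a hypothetical placeholder, since the source does not name the database or driver; with no matching driver registered, the attempt fails and the method reports false. The try-with-resources block calls connection.close() automatically.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JdbcConnectionSketch {

    // Class names of the drivers currently registered with the DriverManager.
    static List<String> registeredDrivers() {
        List<String> names = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            names.add(driver.getClass().getName());
        }
        return names;
    }

    // Asks the DriverManager for a connection; it picks the first registered driver that accepts the URL.
    static boolean canConnect(String url) {
        try (Connection connection = DriverManager.getConnection(url)) {
            return connection.isValid(2); // wait at most 2 seconds for the validity check
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Registered drivers: " + registeredDrivers());
        // Hypothetical URL: replace with the URL of the database actually used.
        System.out.println(canConnect("jdbc:hypothetical://localhost/spamdb"));
    }
}
```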
JDBC Summary
Benefits of JDBC
• No proprietary DB code
• Don’t have to rely too much on single vendor
• Don’t need to make DB vendor decision early
• Easier for DB vendors
– Don’t need to provide a query language, only implement API
– Only low-level support
The design goal for the RMI architecture was to create a Java distributed object
model that integrates naturally into the Java programming language and the local object model.
RMI architects have succeeded; creating a system that extends the safety and robustness of the
Java architecture to the distributed computing world.
This fits nicely with the needs of a distributed system where clients are concerned
about the definition of a service and servers are focused on providing the service.
Remember that a Java interface does not contain executable code. RMI supports two classes that
implement the same interface. The first class is the implementation of the behavior, and it runs
on the server. The second class acts as a proxy for the remote service and it runs on the client.
A client program makes method calls on the proxy object; RMI sends the request to
the remote JVM and forwards it to the implementation. Any return values provided by the
implementation are sent back to the proxy and then to the client's program.
RMI ARCHITECTURE LAYERS
With an understanding of the high-level RMI architecture, take a look under the covers
to see its implementation.
The RMI implementation is essentially built from three abstraction layers. The first is
the Stub and Skeleton layer, which lies just beneath the view of the developer. This layer
intercepts method calls made by the client to the interface reference variable and redirects these
calls to a remote RMI service.
The next layer is the Remote Reference Layer. This layer understands how to interpret
and manage references made from clients to the remote service objects. In JDK 1.1, this layer
connects clients to remote service objects that are running and exported on a server. The
connection is a one-to-one (unicast) link. In the Java 2 SDK, this layer was enhanced to support
the activation of dormant remote service objects via Remote Object Activation.
The stub and skeleton layer of RMI lies just beneath the view of the Java developer. In
this layer, RMI uses the Proxy design pattern as described by Gamma, Helm, Johnson and
Vlissides. In the Proxy pattern, an object in one context is represented by another (the proxy) in a
separate context. The proxy knows how to forward method calls between the participating
objects. The following class diagram illustrates the Proxy pattern.
In RMI's use of the Proxy pattern, the stub class plays the role of the proxy, and the
remote service implementation class plays the role of the RealSubject.
A skeleton is a helper class that is generated for RMI to use. The skeleton understands
how to communicate with the stub across the RMI link. The skeleton carries on a conversation
with the stub; it reads the parameters for the method call from the link, makes the call to the
remote service implementation object, accepts the return value, and then writes the return value
back to the stub.
In the Java 2 SDK implementation of RMI, the new wire protocol has made skeleton
classes obsolete. RMI uses reflection to make the connection to the remote service object.
You only have to worry about skeleton classes and objects in JDK 1.1 and JDK 1.1 compatible
system implementations.
The stub objects use the invoke() method in RemoteRef to forward the method call.
The RemoteRef object understands the invocation semantics for remote services.
The JDK 1.1 implementation of RMI provides only one way for clients to connect to
remote service implementations: a unicast, point-to-point connection. Before a client can use a
remote service, the remote service must be instantiated on the server and exported to the RMI
system. (If it is the primary service, it must also be named and registered in the RMI Registry).
The Java 2 SDK implementation of RMI adds a new semantic for the client-server
connection. In this version, RMI supports activatable remote objects. When a method call is
made to the proxy for an activatable object, RMI determines if the remote service
implementation object is dormant. If it is dormant, RMI will instantiate the object and restore its
state from a disk file. Once an activatable object is in memory, it behaves just like JDK 1.1
remote service implementation objects.
Other types of connection semantics are possible. For example, with multicast, a single
proxy could send a method request to multiple implementations simultaneously and accept the
first reply (this improves response time and possibly improves availability). In the future, Sun
may add additional invocation semantics to RMI.
Transport Layer
The Transport Layer makes the connection between JVMs. All connections are
stream-based network connections that use TCP/IP.
Even if two JVMs are running on the same physical computer, they connect through
their host computer's TCP/IP network protocol stack. (This is why you must have an operational
TCP/IP configuration on your computer to run the Exercises in this course). The following
diagram shows the unfettered use of TCP/IP connections between JVMs.
TCP/IP provides a persistent, stream-based connection between two machines based on
an IP address and port number at each end. Usually a DNS name is used instead of an IP address;
this means you could talk about a TCP/IP connection between flicka.magelang.com:3452 and
rosa.jguru.com:4432. In the current release of RMI, TCP/IP connections are used as the
foundation for all machine-to-machine connections.
On top of TCP/IP, RMI uses a wire level protocol called Java Remote Method Protocol
(JRMP). JRMP is a proprietary, stream-based protocol that is only partially specified and now
exists in two versions. The first version was released with the JDK 1.1 version of RMI and required the
use of Skeleton classes on the server. The second version was released with the Java 2 SDK. It
has been optimized for performance and does not require skeleton classes. (Note that some
alternate implementations, such as BEA Weblogic and NinjaRMI do not use JRMP, but instead
use their own wire level protocol. ObjectSpace's Voyager does recognize JRMP and will
interoperate with RMI at the wire level.)
The RMI transport layer is designed to make a connection between clients and server,
even in the face of networking obstacles.
While the transport layer prefers to use multiple TCP/IP connections, some network
configurations only allow a single TCP/IP connection between a client and server (some
browsers restrict applets to a single network connection back to their hosting server).
In this case, the transport layer multiplexes multiple virtual connections within a single
TCP/IP connection.
Naming Remote Objects
During the presentation of the RMI Architecture, one question has been repeatedly
postponed: how does a client find a remote service? Clients find remote services by using a
naming or directory service. This may seem like circular logic, but a naming or directory
service is run on a well-known host and port number.
RMI can use many different directory services, including the Java Naming and
Directory Interface (JNDI). RMI itself includes a simple service called the RMI Registry,
rmiregistry. The RMI Registry runs on each machine that hosts remote service objects and
accepts queries for services, by default on port 1099.
On a host machine, a server program creates a remote service by first creating a local
object that implements that service. Next, it exports that object to RMI. When the object is
exported, RMI creates a listening service that waits for clients to connect and request the service.
After exporting, the server registers the object in the RMI Registry under a public name.
On the client side, the RMI Registry is accessed through the static methods of the class
Naming. It provides the method lookup(), which a client uses to query a registry. The method
lookup() accepts a URL that specifies the server host name and the name of the desired service.
The method returns a remote reference to the service object. The URL takes the form:
rmi://<host_name>[:<name_service_port>]/<service_name>
where the host_name is a name recognized on the local area network (LAN) or a DNS name on
the Internet. The name_service_port needs to be specified only if the naming service is
running on a port other than the default 1099.
Using RMI
It is now time to build a working RMI system and get hands-on experience. In this section,
you will build a simple remote calculator service and use it from a client program.
A working RMI system is composed of several parts.
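As a hedged illustration of those parts, the sketch below puts a remote interface, its server-side implementation, a registry, and a client call into one JVM. It is not the calculator from the original exercise: the names Calculator, CalculatorImpl, and CalculatorService are hypothetical, the sketch assumes a Java version that generates stubs dynamically (Java 5 or later), and it assumes that the default registry port 1099 is free on the local machine.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: the service definition shared by client and server.
interface Calculator extends Remote {
    long add(long a, long b) throws RemoteException;
}

// Server-side implementation of the behavior.
class CalculatorImpl implements Calculator {
    public long add(long a, long b) {
        return a + b;
    }
}

class RmiCalculatorDemo {

    // Exports the service, registers it, looks it up as a client would, and calls it through the stub.
    static long addRemotely(long a, long b) {
        // Keep the demo on the loopback interface.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        CalculatorImpl impl = new CalculatorImpl();
        Registry registry = null;
        try {
            // Port 0 lets RMI choose an anonymous port for the service object.
            Calculator stub = (Calculator) UnicastRemoteObject.exportObject(impl, 0);
            registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT); // default port 1099
            registry.rebind("CalculatorService", stub);

            // Client side: obtain the proxy from the registry and invoke the remote method.
            Calculator proxy = (Calculator) registry.lookup("CalculatorService");
            return proxy.add(a, b);
        } catch (RemoteException | NotBoundException e) {
            throw new IllegalStateException("RMI demo failed", e);
        } finally {
            // Unexport everything so that the JVM can exit.
            try {
                if (registry != null) {
                    UnicastRemoteObject.unexportObject(registry, true);
                }
                UnicastRemoteObject.unexportObject(impl, true);
            } catch (NoSuchObjectException ignored) {
                // The object was never exported; nothing to clean up.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(addRemotely(2, 3));
    }
}
```

In a real deployment the client runs in a different JVM and would obtain the proxy with Naming.lookup and an rmi:// URL of the form shown earlier, instead of using the local registry reference.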
JAVA AWT
SWING
Preliminary investigations involve examining the project's feasibility, the likelihood that
the system will be useful to the organization. Feasibility is the determination of whether or not a
project is worth doing. The process followed in making this determination is called a feasibility
study. This type of study determines whether a project can and should be undertaken. Once it
has been determined that a project is feasible, the analyst can go ahead and prepare the project
specification. Generally, feasibility studies are undertaken within tight time constraints and
normally culminate in written and oral feasibility reports. In conducting a feasibility study,
the analyst should consider:
➢ Technical Feasibility
➢ Operational Feasibility
➢ Economic Feasibility
A system that can be developed technically and that will be used if installed must still be
a good investment for the organization. The financial and economic questions raised during
the investigation are answered as follows:
The cost to conduct a full system investigation is almost negligible as