A Correlation-Based Feature Weighting Filter For Naive Bayes
Abstract
Due to its simplicity, efficiency and efficacy, naive Bayes (NB) has continued to be one of the
top 10 algorithms in the data mining and machine learning community. Of numerous approaches
to alleviate its conditional independence assumption, feature weighting has placed more
emphasis on highly predictive features than those that are less predictive. In this paper, we argue
that for NB highly predictive features should be highly correlated with the class (maximum
mutual relevance), yet uncorrelated with other features (minimum mutual redundancy). Based on
this premise, we propose a correlation-based feature weighting (CFW) filter for NB. In CFW, the
weight for a feature is proportional to the difference between the feature-class correlation
(mutual relevance) and the average feature-feature inter-correlation (average mutual
redundancy). Experimental results show that NB with CFW significantly outperforms NB and all
the other existing state-of-the-art feature weighting filters used for comparison. Compared to
feature weighting wrappers for improving NB, the main advantages of CFW are its low
computational complexity (no search involved) and the fact that it maintains the simplicity of the
final model. In addition, we apply CFW to text classification and achieve remarkable
improvements.
Acknowledgment
List of figures
List of abbreviations
- System analysis
- Existing system
- Proposed system
- Feasibility study
- Technical feasibility
- Operational feasibility
- Economical feasibility
- System requirements
- Modules description
- SDLC methodology
- Software requirement
- Hardware requirement
- System design
- UML
- Technology description
- Coding
- Testing
- Output screens
- Conclusion
- Bibliography
- References
INTRODUCTION
Objectives
Many feature weighting methods directly compute feature weights from general data
characteristics before the feature weighted naive Bayes is run, and therefore are feature
weighting filters. In addition to them, there exists another category of feature weighting
methods which use the performance feedback from the feature weighted naive Bayes to
optimize feature weights as a whole. We call them feature weighting wrappers. For example,
Wu and Cai proposed a differential evolution-based feature weighting wrapper (DEFW),
which conducts a differential evolution search to optimize feature weights by maximizing the
classification accuracy of the final model. Zaidi et al. proposed a conditional log
likelihood-based feature weighting wrapper (CLLFW) and a mean squared error-based feature
weighting wrapper (MSEFW), which employ gradient descent searches to optimize feature
weights by maximizing the conditional log likelihood or minimizing the mean squared error of
the final model, respectively.
In feature weighting methods, each feature is assigned a different weight according to its
importance. This kind of feature weighting corresponds to stretching the axes in the feature
space, and the amount by which each axis is stretched is determined by the importance of
each feature. This process of stretching the axes in order to optimize the performance of
naive Bayes (NB) provides a mechanism for emphasizing more relevant features and
suppressing less relevant features. Appropriate feature weights can reduce the error that
results from violations of the feature independence assumption required by NB. Obviously, if
a set of training data includes a set of features that are identical to one another, the error due
to the violation of the feature independence assumption can be removed by assigning weights
that sum to 1.0 to the features in that set.
Feature weighting is strictly more powerful than feature selection, as it is possible to obtain
results identical to feature selection by setting the weights of selected features to 1.0 and of
unselected features to 0.0, while the assignment of other weights can result in a version of
NB that cannot be expressed using feature selection. In a word, feature weighting assigns a
continuous positive weight to each feature, and thus is a more flexible approach than feature
selection.
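Formally, the feature weighted naive Bayes (FWNB) classifier discussed here can be written as
follows, where w_i is the weight assigned to feature A_i. Setting every w_i = 1 recovers
standard NB, and restricting each w_i to {0, 1} recovers feature selection:

```latex
c(\mathbf{x}) \;=\; \arg\max_{c \in C} \; P(c) \prod_{i=1}^{m} P(a_i \mid c)^{w_i}
```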
Feature weighting methods alleviate naive Bayes' feature independence assumption by
assigning greater weights to highly predictive features than to those that are less predictive.
In this paper, we argue that for naive Bayes highly predictive features should be highly
correlated with the class (maximum mutual relevance), yet uncorrelated with other features
(minimum mutual redundancy). Based on this premise, we propose a correlation-based
feature weighting (CFW) filter for naive Bayes. In CFW, the weight for a feature is
proportional to the difference between the feature-class correlation (mutual relevance) and
the average feature-feature inter-correlation (average mutual redundancy).
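The weight definition above can be sketched in code. The following is a simplified
illustration, not the paper's exact formulation (which additionally normalizes correlations and
rescales the resulting weights): it measures both relevance and redundancy with raw mutual
information over discrete features, and sets each weight to relevance minus average
redundancy. All class, method, and variable names are ours.

```java
import java.util.*;

public class CfwSketch {
    // Mutual information I(X;Y), in bits, between two discrete variables
    // given as parallel arrays of integer codes.
    static double mutualInfo(int[] x, int[] y) {
        int n = x.length;
        Map<Integer, Integer> px = new HashMap<>(), py = new HashMap<>();
        Map<Long, Integer> pxy = new HashMap<>();
        for (int k = 0; k < n; k++) {
            px.merge(x[k], 1, Integer::sum);
            py.merge(y[k], 1, Integer::sum);
            // pack the (x, y) pair into one long key for the joint counts
            pxy.merge(((long) x[k] << 32) | (y[k] & 0xffffffffL), 1, Integer::sum);
        }
        double mi = 0.0;
        for (Map.Entry<Long, Integer> e : pxy.entrySet()) {
            int xv = (int) (e.getKey() >> 32), yv = e.getKey().intValue();
            double pJoint = e.getValue() / (double) n;
            double pX = px.get(xv) / (double) n, pY = py.get(yv) / (double) n;
            mi += pJoint * Math.log(pJoint / (pX * pY)) / Math.log(2);
        }
        return mi;
    }

    // CFW-style weight: feature-class relevance minus average feature-feature redundancy.
    static double[] cfwWeights(int[][] features, int[] cls) {
        int m = features.length;
        double[] w = new double[m];
        for (int i = 0; i < m; i++) {
            double relevance = mutualInfo(features[i], cls);
            double redundancy = 0.0;
            for (int j = 0; j < m; j++)
                if (j != i) redundancy += mutualInfo(features[i], features[j]);
            if (m > 1) redundancy /= (m - 1);
            w[i] = relevance - redundancy;
        }
        return w;
    }

    public static void main(String[] args) {
        // Tiny hand-made dataset: f0 predicts the class, f1 duplicates f0 (redundant),
        // f2 is weakly related noise.
        int[][] f = { {0, 0, 1, 1, 0, 1}, {0, 0, 1, 1, 0, 1}, {0, 1, 0, 1, 1, 0} };
        int[] c  =   {0, 0, 1, 1, 0, 1};
        System.out.println(Arrays.toString(cfwWeights(f, c)));
    }
}
```

On this toy data the duplicated predictive features still outscore the noise feature, but their
weights are pulled down by their mutual redundancy, which is exactly the behavior the text argues
for.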
DISADVANTAGES
In our experiments, two Monk’s problems are chosen to generate two artificial datasets.
These two Monk’s problems are challenging artificial domains that have been widely used to
compare the performance of machine learning algorithms.
They involve irrelevant features and high degrees of feature interaction. Each Monk’s
problem uses the same representation and contains 432 instances described by six nominal
features. For each problem there is a standard training and test set.
Proposed system
The study of feature weighting is a relatively mature field in the data mining and
machine learning community, and a large number of existing feature weighting
methods prevent us from presenting them exhaustively. Here we only provide a
compact survey on the state-of-the-art feature weighting methods specially designed
for naive Bayes. To the best of our knowledge, the earliest feature weighting method
specially designed for naive Bayes is by Ferreira et al. However, it assigns a weight to
each feature value rather than each feature and therefore is not strictly a feature
weighting method but a feature value weighting method. In order to strictly perform
feature weighting, Zhang and Sheng proposed a gain ratio-based feature weighting
method (GRFW) at first. They argued that a feature with higher gain ratio deserves a
larger weight and therefore set the weight of each feature to the gain ratio of the feature
relative to the average gain ratio across all features.
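A minimal sketch of the GRFW idea described above, assuming discrete features: each
feature's gain ratio (information gain divided by split information) is divided by the average
gain ratio across all features. The dataset and all identifiers are illustrative, not taken from
Zhang and Sheng's implementation.

```java
import java.util.*;

public class GrfwSketch {
    // Shannon entropy of a discrete variable, in bits.
    static double entropy(int[] x) {
        Map<Integer, Integer> cnt = new HashMap<>();
        for (int v : x) cnt.merge(v, 1, Integer::sum);
        double h = 0.0, n = x.length;
        for (int c : cnt.values()) { double p = c / n; h -= p * Math.log(p) / Math.log(2); }
        return h;
    }

    // Conditional entropy H(C | A): expected class entropy within each feature value.
    static double condEntropy(int[] a, int[] cls) {
        Map<Integer, List<Integer>> groups = new HashMap<>();
        for (int k = 0; k < a.length; k++)
            groups.computeIfAbsent(a[k], v -> new ArrayList<>()).add(cls[k]);
        double h = 0.0, n = a.length;
        for (List<Integer> g : groups.values())
            h += (g.size() / n) * entropy(g.stream().mapToInt(Integer::intValue).toArray());
        return h;
    }

    // GRFW-style weight: each feature's gain ratio relative to the average gain ratio.
    static double[] grfwWeights(int[][] features, int[] cls) {
        int m = features.length;
        double[] gr = new double[m];
        double avg = 0.0;
        for (int i = 0; i < m; i++) {
            double gain = entropy(cls) - condEntropy(features[i], cls);
            double split = entropy(features[i]);          // SplitInfo
            gr[i] = split > 0 ? gain / split : 0.0;
            avg += gr[i] / m;
        }
        double[] w = new double[m];
        for (int i = 0; i < m; i++) w[i] = avg > 0 ? gr[i] / avg : 1.0;
        return w;
    }

    public static void main(String[] args) {
        int[][] f = { {0, 0, 1, 1}, {0, 1, 0, 1} };  // f0 matches the class, f1 is noise
        int[] c  =   {0, 0, 1, 1};
        System.out.println(Arrays.toString(grfwWeights(f, c)));
    }
}
```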
FEASIBILITY STUDY
PRELIMINARY INVESTIGATION
The development of a project starts with the idea of designing a mail-enabled platform for a
small firm in which it is easy and convenient to send and receive messages. The platform also
includes a search engine, an address book, and some entertaining games. Once the idea is
approved by the organization and our project guide, the first activity, i.e. the preliminary
investigation, begins. This activity has three parts:
Request Clarification
Feasibility Study
Request Approval
REQUEST CLARIFICATION
After the approval of the request by the organization and the project guide, and with an
investigation under consideration, the project request must be examined to determine precisely
what the system requires. Our project is basically meant for users within the company whose
systems can be interconnected by a Local Area Network (LAN). In today's busy schedule,
people expect everything to be readily available. Taking into consideration the widespread use
of the Internet in day-to-day life, the corresponding portal was developed.
FEASIBILITY ANALYSIS
An important outcome of preliminary investigation is the determination that the system request
is feasible. This is possible only if it is feasible within limited resource and time. The different
feasibilities that have to be analyzed are
Operational Feasibility
Economic Feasibility
Technical Feasibility
Operational Feasibility
Operational feasibility deals with the study of the prospects of the system to be developed.
This system operationally eliminates all the tensions of the admin and helps him in effectively
tracking the project progress. This kind of automation will surely reduce the time and energy
previously consumed by manual work. Based on the study, the system proves to be
operationally feasible.
Economic Feasibility
Technical Feasibility
According to Roger S. Pressman, technical feasibility is the assessment of the technical
resources of the organization. The organization needs IBM-compatible machines with a
graphical web browser connected to the Internet and intranet. The system is developed for a
platform-independent environment. Java Server Pages, JavaScript, HTML, SQL Server and
WebLogic Server are used to develop the system. The technical feasibility study has been
carried out. The system is technically feasible for development and can be developed with the
existing facilities.
SYSTEM REQUIREMENTS
Modules description
Correlation-based feature weighting:
Before discussing the performance of our proposed correlation-based feature weighting (CFW)
on standard benchmark datasets, it could be helpful to get some intuitive feeling and illustrative
results on CFW through two artificial examples. Thus, this section compares the behavior of our
proposed correlation-based feature weighting (CFW) to that of standard naive Bayes (NB) on
two artificially generated datasets. In particular, we are interested in how sensitive the proposed
CFW is to different levels of correlated features. We also experimentally argue that for naive
Bayes highly predictive features should be highly correlated with the class, yet uncorrelated with
other features. In our experiments, two Monk’s problems are chosen to generate two artificial
datasets. These two Monk’s problems are challenging artificial domains that have been widely
used to compare the performance of machine learning algorithms. They involve irrelevant
features and high degrees of feature interaction. Each Monk’s problem uses the same
representation and contains 432 instances described by six nominal features. For each problem
there is a standard training and test set.
Now, the only question left to answer is how to define the weight of each predictive feature,
which is crucial in constructing FWNB and has attracted more and more attention from
researchers. The study of feature weighting is a relatively mature field in the data mining and
machine learning community, and a large number of existing feature weighting methods prevent
us from presenting them exhaustively. Here we only provide a compact survey on the state-of-
the-art feature weighting methods specially designed for naive Bayes. To the best of our
knowledge, the earliest feature weighting method specially designed for naive Bayes is by
Ferreira et al. However, it assigns a weight to each feature value rather than each feature and
therefore is not strictly a feature weighting method but a feature value weighting method. In
order to strictly perform feature weighting, Zhang and Sheng first proposed a gain ratio-based
feature weighting method (GRFW).
Differential evolution-based feature weighting:
It can be seen that all above feature weighting methods directly compute feature weights
according to the heuristics based on general data characteristics before the feature weighted
naive Bayes is run and therefore are feature weighting filters. In addition to them, there exists
another category of feature weighting methods which use the performance feedback from the
feature weighted naive Bayes to optimize feature weights as a whole. We call them feature
weighting wrappers. For example, Wu and Cai proposed a differential evolution-based feature
weighting wrapper (DEFW), which conducts a differential evolution search to optimize feature
weights by maximizing the classification accuracy of the final model. Zaidi et al. proposed a
conditional log likelihood-based feature weighting wrapper (CLLFW) and a mean squared error-
based feature weighting wrapper (MSEFW), which employ gradient descent searches to optimize
feature weights by maximizing the conditional log likelihood or minimizing the mean squared
error of the final model, respectively.
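To make the filter/wrapper distinction concrete, here is a toy wrapper in the same spirit as the
methods above, but it is not DEFW, CLLFW, or MSEFW themselves: it greedily tunes one weight at
a time over a small grid, using resubstitution accuracy of a weighted naive Bayes as the
performance feedback. All names, the dataset, and the search strategy are our own
simplifications.

```java
import java.util.*;

public class WrapperSketch {
    // Weighted naive Bayes for binary features/classes with Laplace smoothing.
    // score(c | x) is proportional to P(c) * product over i of P(a_i | c)^{w_i}.
    static int predict(int[][] X, int[] y, double[] w, int[] q) {
        int n = X.length, m = q.length;
        double best = Double.NEGATIVE_INFINITY;
        int arg = 0;
        for (int c = 0; c <= 1; c++) {
            int nc = 0;
            for (int v : y) if (v == c) nc++;
            double logp = Math.log((nc + 1.0) / (n + 2.0));   // smoothed prior
            for (int i = 0; i < m; i++) {
                int match = 0;
                for (int k = 0; k < n; k++) if (y[k] == c && X[k][i] == q[i]) match++;
                logp += w[i] * Math.log((match + 1.0) / (nc + 2.0));
            }
            if (logp > best) { best = logp; arg = c; }
        }
        return arg;
    }

    // Resubstitution accuracy: the performance feedback the wrapper optimizes.
    static double accuracy(int[][] X, int[] y, double[] w) {
        int ok = 0;
        for (int k = 0; k < X.length; k++) if (predict(X, y, w, X[k]) == y[k]) ok++;
        return ok / (double) X.length;
    }

    // Wrapper loop: greedily tune one weight at a time over a small grid.
    static double[] tuneWeights(int[][] X, int[] y) {
        int m = X[0].length;
        double[] w = new double[m];
        Arrays.fill(w, 1.0);
        double[] grid = {0.0, 0.5, 1.0, 2.0};
        for (int pass = 0; pass < 3; pass++)
            for (int i = 0; i < m; i++) {
                double bestAcc = accuracy(X, y, w), bestW = w[i];
                for (double g : grid) {
                    w[i] = g;
                    double a = accuracy(X, y, w);
                    if (a > bestAcc) { bestAcc = a; bestW = g; }
                }
                w[i] = bestW;
            }
        return w;
    }
}
```

Note how the classifier must be retrained and re-evaluated inside the search loop; this feedback
loop is what makes wrappers more expensive than filters such as CFW, which compute all weights
once from the data before the classifier is ever run.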
Naïve Bayes:
In feature weighting methods, each feature is assigned a different weight according to its
importance. This kind of feature weighting corresponds to stretching the axes in the feature
space, and the amount by which each axis is stretched is determined by the importance of each
feature. This process of stretching the axes in order to optimize the performance of naive Bayes
(NB) provides a mechanism for inspiring more relevant features and suppressing less relevant
features. Appropriate feature weights can reduce the error that results from violations of the
feature independence assumption required by NB. Obviously, if a set of training data includes a
set of features that are identical to one another, the error due to the violation of the feature
independence assumption can be removed by assigning weights that sum to 1.0 to the features
in that set. For example, the weight for one of the features, Ai, could be set to 1.0, and the
weights of the remaining identical features to 0.0.
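As a worked example of this claim, suppose two features A_1 and A_2 are identical, so
P(a_1 | c) = P(a_2 | c) = P(a | c). Unweighted NB counts this evidence twice, while any weights
satisfying w_1 + w_2 = 1 restore the correct single contribution:

```latex
\underbrace{P(a_1 \mid c)\,P(a_2 \mid c)}_{\text{unweighted: } P(a \mid c)^2}
\qquad\text{vs.}\qquad
P(a_1 \mid c)^{w_1}\,P(a_2 \mid c)^{w_2} \;=\; P(a \mid c)^{w_1 + w_2} \;=\; P(a \mid c)
```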
SDLC methodology
INPUT DESIGN
Input design plays a vital role in the life cycle of software development and requires very
careful attention from developers. The goal of input design is to feed data to the application as
accurately as possible, so inputs are supposed to be designed effectively so that the errors
occurring while feeding them are minimized. According to software engineering concepts, the
input forms or screens are designed to provide validation control over the input limit, range
and other related validations.
This system has input screens in almost all the modules. Error messages are developed to
alert the user whenever he commits a mistake, and guide him in the right way so that invalid
entries are not made. Let us look at this more deeply under module design.
Input design is the process of converting the user-created input into a computer-based
format. The goal of the input design is to make the data entry logical and free from errors.
Errors in the input are controlled by the input design. The application has been developed in a
user-friendly manner. The forms have been designed in such a way that during processing the
cursor is placed in the position where data must be entered. The user is also provided with an
option to select an appropriate input from various alternatives related to the field in certain
cases.
Validations are required for each data item entered. Whenever a user enters erroneous data,
an error message is displayed, and the user can move on to the subsequent pages after
completing all the entries on the current page.
OUTPUT DESIGN
The output from the computer is mainly required to create an efficient method of
communication within the company, primarily between the project leader and his team
members, in other words, the administrator and the clients. The output of the system allows
the project leader to manage his clients in terms of creating new clients and assigning new
projects to them, maintaining a record of project validity, and providing folder-level access to
each client on the user side depending on the projects allotted to him. After completion of a
project, a new project may be assigned to the client. User authentication procedures are
maintained at the initial stages itself. A new user may be created by the administrator himself,
or a user can register himself as a new user, but the task of assigning projects and validating a
new user rests with the administrator only.
The application starts running when it is executed for the first time. The server has to be
started, and then Internet Explorer is used as the browser. The project will run on the local
area network, so the server machine will serve as the administrator while the other connected
systems act as the clients. The developed system is highly user friendly and can be easily
understood by anyone using it, even for the first time.
FUNCTIONAL REQUIREMENTS:
Input: The system requires dataset information as input, which is used for evaluation.
Process: The processing depends on the analysis performed by the algorithms; the system
analyses the data and searches items using the chosen algorithms.
NON-FUNCTIONAL REQUIREMENTS:
Usability: This should be given the leading priority. The user should be able to log into the
system with ease and should be able to access all granted functions. A user can learn to
operate the system, prepare inputs for it, and interpret its outputs.
Reliability: This is the ability of a system component to perform its required functions under
stated conditions for a specified period of time. Reliability includes the mean time to security
attacks or failures, and is one of the main factors used to determine the important
requirements of any application.
Performance: It is concerned with the quantifiable attributes of the system. The system must
have an Internet facility to maintain an accurate date and time and to perform transfer
operations.
Implementation: The client is implemented in Java, so it can run in any browser where the
user will be able to operate the system.
Operations: The operations requirements are constraints on the Boolean keywords and query
conditions.
Extensibility: This system should be flexible in such a way that it can be easily extended in
order to add some more modules in the future.
Hardware Constraints:
RAM : 128 MB
Software Constraints:
Techniques : Java
IDE : NetBeans
Database : MySQL
SYSTEM DESIGN
There are several reasons to identify the design goals of any system. These goals help to
design the system in an efficient manner. There are several criteria by which to identify these
goals; some of them are explained below:
Performance criteria:
a) Response time: The response time of the method is very low because of the system's simple
design, developed on a high-performance system.
Dependability criteria:
a) Robustness: the system should be designed to work efficiently on images of any format
without any problem.
b) Availability: the system should be ready to accept commands from the user at any point of
time.
c) Fault tolerance: the system should not allow the user to work with faulty input. It displays
error messages for every specific fault that occurs.
Maintenance criteria:
a) Portability: the system should work on all platforms, such as Linux and Windows.
b) Readability: the generated code should make the purpose of the project easy to understand,
so that the user can make modifications easily.
c) Traceability: the generated code should be easy to map to the functions and operations
selected by the user.
End-user criteria:
a) Utility: the system should be made to operate on all inputs of the end-user under any kind
of circumstances. It should complete all the commands or instructions given by the user
without any interruptions.
b) Usability: the user interface is to be defined with all options which make the work of the
end-user easier.
UML Diagrams
UML stands for Unified Modeling Language. This object-oriented system of notation has
evolved from the work of Grady Booch, James Rumbaugh, Ivar Jacobson, and the Rational
Software Corporation. These renowned computer scientists fused their respective technologies
into a single, standardized model. Today, UML is accepted by the Object Management Group
(OMG) as the standard for modeling object-oriented programs.
There are two broad categories of diagrams, and these are again divided into sub-categories:
• Structural Diagrams
• Behavioral Diagrams
Structural Diagrams:
The structural diagrams represent the static aspect of the system. These static aspects
represent those parts of a diagram which form the main structure and are therefore stable.
These static parts are represented by classes, interfaces, objects, components and nodes. The
four structural diagrams are:
• Class diagram
• Object diagram
• Component diagram
• Deployment diagram
Class Diagram:
Class diagrams are the most common diagrams used in UML. A class diagram consists of
classes, interfaces, associations and collaborations. Class diagrams basically represent the
object-oriented view of a system, which is static in nature. An active class is used in a class
diagram to represent the concurrency of the system. Since a class diagram represents the
object orientation of a system, it is generally used for development purposes. This is the most
widely used diagram at the time of system construction.
Object Diagram:
Component Diagram:
During the design phase, software artifacts (classes, interfaces, etc.) of a system are
arranged in different groups depending upon their relationships. These groups are known as
components. Finally, component diagrams are used to visualize the implementation.
Deployment Diagram:
Deployment diagrams are a set of nodes and their relationships. These nodes are
physical entities where the components are deployed. Deployment diagrams are used for
visualizing deployment view of a system. This is generally used by the deployment team.
Behavioral Diagrams: Any system can have two aspects, static and dynamic. So a model is
considered as complete when both the aspects are covered fully. Behavioral diagrams basically
capture the dynamic aspect of a system. Dynamic aspect can be further described as the
changing/moving parts of a system.
• Use case diagram
• Sequence diagram
• Collaboration diagram
• Activity diagram
Use case diagrams are a set of use cases, actors and their relationships. They represent
the use case view of a system. A use case represents a particular functionality of a system. So
use case diagram is used to describe the relationships among the functionalities and their
internal/external controllers. These controllers are known as actors.
Sequence Diagram:
A sequence diagram is an interaction diagram. From the name, it is clear that the diagram
deals with sequences, namely the sequences of messages flowing from one object to another.
Interaction among the components of a system is very important from an implementation and
execution perspective. A sequence diagram is thus used to visualize the sequence of calls in a
system that perform a specific functionality.
Collaboration Diagram:
The purpose of collaboration diagram is similar to sequence diagram. But the specific purpose
of collaboration diagram is to visualize the organization of objects and their interaction.
State chart Diagram:
Activity Diagram:
Architecture Diagram
USE CASE DIAGRAM:
To model a system, the most important aspect is to capture its dynamic behaviour. To
clarify in a bit more detail, dynamic behaviour means the behaviour of the system when it is
running/operating. Static behaviour alone is not sufficient to model a system; dynamic
behaviour is more important than static behaviour.
In UML there are five diagrams available to model the dynamic nature of a system, and the
use case diagram is one of them. Since the use case diagram is dynamic in nature, there
should be some internal or external factors for making the interaction. These internal and
external agents are known as actors. Use case diagrams thus consist of actors, use cases and
their relationships.
The diagram is used to model the system/subsystem of an application. A single use case
diagram captures a particular functionality of a system, so to model the entire system a
number of use case diagrams are used. A use case diagram at its simplest is a representation
of a user's interaction with the system, depicting the specifications of a use case. A use case
diagram can portray the different types of users of a system and the cases, and will often be
accompanied by other types of diagrams as well.
[Use case diagram: actors Client and Admin; use cases Login, Register, View Uploads]
In software engineering, a class diagram in the Unified Modeling Language (UML) is a type of
static structure diagram that describes the structure of a system by showing the system's classes,
their attributes, operations (or methods), and the relationships among the classes. It explains
which class contains information.
[Class diagram: Client View (Registration, Login, View Uploads(), Search Upload Files(),
View Images(), Download Search Files(), View Image Information on Browser(),
View Download File()); Admin View (Login); Server View (Registration, Login)]
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order.
[Sequence diagram: Client, Server, Admin and Database exchange Register, Login, Search Files,
View Files, Download Image and View Image messages in order]
COLLABORATION DIAGRAM
[Collaboration diagram: Client, Server, Admin and Database collaborate through Register,
Login, Upload Files on Peers, Search Files, View Files, Download Image and View Image
messages]
ACTIVITY DIAGRAM:
Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
[Activity diagram: overall flow of control from login through processing to logout]
Technology description
Java Technology
The Java programming language is a high-level language that can be characterized by all
of the following buzzwords:
i. Simple
ii. Architecture neutral
iii. Object oriented
iv. Portable
v. Distributed
vi. High performance
vii. Interpreted
viii. Multithreaded
ix. Robust
With most programming languages, you either compile or interpret a program so that you
can run it on your computer. The Java programming language is unusual in that a program is
both compiled and interpreted. With the compiler, you first translate a program into an
intermediate language called Java bytecode, the platform-independent code interpreted by
the interpreter on the Java platform. The interpreter parses and runs each Java bytecode
instruction on the computer. Compilation happens just once; interpretation occurs each time
the program is executed. The following figure illustrates how this works.
A platform is the hardware or software environment in which a program runs. The Java
platform differs from most other platforms in that it’s a software-only platform that runs on top
of other hardware-based platforms.
You’ve already been introduced to the Java VM. It’s the base for the Java platform and is
ported onto various hardware-based platforms. The Java API is a large collection of ready-made
software components that provide many useful capabilities, such as graphical user interface
(GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these
libraries are known as packages. The following figure depicts a program that's running on the
Java platform. As the figure shows, the Java API and the virtual machine insulate the program
from the hardware.
Native code is code that, after compilation, runs on a specific hardware platform. As a
platform-independent environment, the Java platform can be a bit slower than native code.
However, smart compilers, well-tuned interpreters, and just-in-time bytecode compilers can
bring performance close to that of native code without threatening portability.
Every full implementation of the Java platform gives you the following features:
i. The essentials: Objects, strings, threads, numbers, input and output, data structures,
system properties, date and time, and so on.
ii. Networking: URLs, TCP (Transmission Control Protocol), UDP (User Datagram
Protocol) sockets, and IP (Internet Protocol) addresses.
iii. Internationalization: Help for writing programs that can be localized for users
worldwide. Programs can automatically adapt to specific locales and be displayed in the
appropriate language.
iv. Security: Both low level and high level, including electronic signatures, public and
private key management, access control, and certificates.
v. Software components: Known as JavaBeans™, which can plug into existing component
architectures.
vi. Object serialization: Allows lightweight persistence and communication via Remote
Method Invocation (RMI).
vii. Java Database Connectivity (JDBC™): Provides uniform access to a wide range of
relational databases.
The Java platform also has APIs for 2D and 3D graphics, accessibility, servers,
collaboration, telephony, speech, animation, and more. The following figure depicts what is
included in the Java 2 SDK.
ODBC
Through the ODBC Administrator in Control Panel, you can specify the particular
database that is associated with a data source that an ODBC application program is written to
use. Think of an ODBC data source as a door with a name on it. Each door will lead you to a
particular database. For example, the data source named Sales Figures might be a SQL Server
database, whereas the Accounts Payable data source could refer to an Access database. The
physical database referred to by a data source can reside anywhere on the LAN.
The ODBC system files are not installed on your system by Windows 95. Rather, they
are installed when you set up a separate database application, such as SQL Server Client or
Visual Basic 4.0. When the ODBC icon is installed in Control Panel, it uses a file called
ODBCINST.DLL. It is also possible to administer your ODBC data sources through a stand-
alone program called ODBCADM.EXE. There is a 16-bit and a 32-bit version of this program,
and each maintains a separate list of ODBC data sources.
The advantages of this scheme are so numerous that you are probably thinking there must
be some catch. The only disadvantage of ODBC is that it isn’t as efficient as talking directly to
the native database interface. ODBC has had many detractors make the charge that it is too slow.
Microsoft has always claimed that the critical factor in performance is the quality of the driver
software that is used. In our humble opinion, this is true. The availability of good ODBC drivers
has improved a great deal recently. And anyway, the criticism about performance is somewhat
analogous to those who said that compilers would never match the speed of pure assembly
language. Maybe not, but the compiler (or ODBC) gives you the opportunity to write cleaner
programs, which means you finish sooner. Meanwhile, computers get faster every year.
JDBC Goals:
1. SQL Level API
The designers felt that their main goal was to define a SQL interface for Java. Although
not the lowest database interface level possible, it is at a low enough level for higher-level
tools and APIs to be created. Conversely, it is at a high enough level for application
programmers to use it confidently. Attaining this goal allows future tool vendors to
"generate" JDBC code and to hide many of JDBC's complexities from the end user.
2. SQL Conformance
SQL syntax varies as you move from database vendor to database vendor. In an effort to
support a wide variety of vendors, JDBC will allow any query statement to be passed through it
to the underlying database driver. This allows the connectivity module to handle non-standard
functionality in a manner that is suitable for its users.
3. JDBC must be implementable on top of common database interfaces
The JDBC SQL API must “sit” on top of other common SQL level APIs. This goal allows
JDBC to use existing ODBC level drivers by the use of a software interface. This interface
would translate JDBC calls to ODBC and vice versa.
4. Provide a Java interface that is consistent with the rest of the Java system
Because of Java’s acceptance in the user community thus far, the designers feel that they
should not stray from the current design of the core Java system.
5. Keep it simple
This goal probably appears in all software design goal listings. JDBC is no exception.
Sun felt that the design of JDBC should be very simple, allowing for only one method of
completing a task per mechanism. Allowing duplicate functionality only serves to confuse the
users of the API.
6. Use strong, static typing wherever possible
Strong typing allows for more error checking to be done at compile time; also, fewer errors
appear at runtime.
7. Keep the common cases simple
Because more often than not, the usual SQL calls used by the programmer are simple
SELECT, INSERT and UPDATE statements, these queries should be simple to perform with
JDBC. However, more complex SQL statements should also be possible.
NetBeans:
The NetBeans IDE is primarily intended for development in Java, but also supports other
languages, in particular PHP, C/C++ and HTML5.
NetBeans is cross-platform and runs on Microsoft Windows, Mac OS X, Linux, Solaris and
other platforms supporting a compatible JVM.
History:
NetBeans began in 1996 as Xelfi (word play on Delphi),[7][8] a Java IDE student project under the
guidance of the Faculty of Mathematics and Physics at Charles University in Prague. In 1997,
Roman Staněk formed a company around the project and produced commercial versions of the
NetBeans IDE until it was bought by Sun Microsystems in 1999. Sun open-sourced the
NetBeans IDE in June of the following year. Since then, the NetBeans community has continued
to grow.[9] In 2010, Sun (and thus NetBeans) was acquired by Oracle Corporation.
NetBeans Platform:
The NetBeans Platform is a framework for simplifying the development of Java Swing desktop
applications. The NetBeans IDE bundle for Java SE contains what is needed to start developing
NetBeans plugins and NetBeans Platform based applications; no additional SDK is required.
Applications can install modules dynamically. Any application can include the Update Center
module to allow users of the application to download digitally signed upgrades and new features
directly into the running application. Reinstalling an upgrade or a new release does not force
users to download the entire application again. The platform offers reusable services common to
desktop applications, allowing developers to focus on the logic specific to their application.
Among the features of the platform are:
i. User interface management (e.g. menus and toolbars)
ii. User settings management
iii. Storage management (saving and loading any kind of data)
iv. Window management
v. Wizard framework (supports step-by-step dialogs)
NetBeans IDE :
All the functions of the IDE are provided by modules. Each module provides a well-defined
function, such as support for the Java language, editing, or support for the CVS and SVN
versioning systems. NetBeans contains all the modules needed for Java development in a single
download, allowing the user to start working immediately. Modules also allow NetBeans to be
extended. New features, such as support for other programming languages, can be added by
installing additional modules. For instance, Sun Studio, Sun Java Studio Enterprise, and Sun
Java Studio Creator from Sun Microsystems are all based on the NetBeans IDE.
MIME Type or Content Type: The HTTP response header contains a “Content-Type” tag. It is
also called the MIME type, and the server sends it to the client to let it know the kind of data
being sent. It helps the client render the data for the user. Some of the most commonly used
MIME types are text/html, text/xml, application/xml, etc.
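The idea of choosing a Content-Type for a resource can be sketched with a small hypothetical helper. The extension table below is illustrative only, not what any real server ships; real containers read such mappings from configuration (e.g. Tomcat's web.xml):

```java
import java.util.Map;

class MimeTypes {
    // Illustrative extension-to-MIME-type table (not a server's real table).
    static final Map<String, String> BY_EXTENSION = Map.of(
            "html", "text/html",
            "xml", "text/xml",
            "txt", "text/plain",
            "json", "application/json");

    // Picks a MIME type from the file extension, falling back to a
    // generic binary type when the extension is unknown.
    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = (dot < 0) ? "" : fileName.substring(dot + 1).toLowerCase();
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("hello.html")); // text/html
        System.out.println(contentTypeFor("README"));     // application/octet-stream
    }
}
```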
Understanding URL
URL is an acronym for Uniform Resource Locator, and it is used to locate the server and
resource. Every resource on the web has its own unique address. Let's see the parts of a URL
with an example.
http://localhost:8080/FirstServletProject/jsps/hello.jsp
http:// – The first part of the URL, specifying the communication protocol to be used in
server-client communication.
localhost – The unique address of the server; most of the time it is the hostname of the server,
which maps to a unique IP address. Sometimes multiple hostnames point to the same IP address,
and the web server's virtual host configuration takes care of routing the request to the particular
server instance.
8080 – The port on which the server is listening. It is optional; if we don't provide it in the
URL, the request goes to the default port of the protocol. Port numbers 0 to 1023 are reserved
for well-known services, for example 80 for HTTP, 443 for HTTPS, 21 for FTP, etc.
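The URL parts described above can be inspected programmatically with the JDK's java.net.URL class. This sketch uses the example URL from the text and shows the fallback to the protocol's default port when the URL omits one:

```java
import java.net.URL;

class UrlParts {
    // Returns "protocol host port path" for a URL string; the port falls
    // back to the protocol default (e.g. 80 for http) when omitted.
    static String describe(String spec) throws Exception {
        URL url = new URL(spec);
        int port = (url.getPort() != -1) ? url.getPort() : url.getDefaultPort();
        return url.getProtocol() + " " + url.getHost() + " " + port + " " + url.getPath();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe("http://localhost:8080/FirstServletProject/jsps/hello.jsp"));
        // http localhost 8080 /FirstServletProject/jsps/hello.jsp
        System.out.println(describe("http://localhost/hello.jsp"));
        // http localhost 80 /hello.jsp
    }
}
```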
Web Container
Tomcat is a web container. When a request is made from the client to the web server, the server
passes the request to the web container, and it is the web container's job to find the correct
resource to handle the request (a servlet or JSP), then use the response from that resource to
generate the response and provide it to the web server. The web server then sends the response
back to the client.
When the web container gets a request for a servlet, it creates two objects,
HttpServletRequest and HttpServletResponse. It then finds the correct servlet based on the
URL and creates a thread for the request. Next it invokes the servlet's service() method and,
based on the HTTP method, service() invokes the doGet() or doPost() method. The servlet
methods generate the dynamic page and write it to the response. Once the servlet thread is
complete, the container converts the response to an HTTP response and sends it back to the
client.
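The service()-to-doGet()/doPost() dispatch described above can be modeled with a toy sketch. MiniServlet and HelloServlet here are hypothetical stand-ins for the real HttpServlet API, which additionally handles request parsing, threading and lifecycle:

```java
// Toy model of the container's dispatch: service() inspects the HTTP
// method and routes to doGet()/doPost(); unhandled methods get a 405.
abstract class MiniServlet {
    String service(String method, String body) {
        if ("GET".equals(method))  return doGet();
        if ("POST".equals(method)) return doPost(body);
        return "405 Method Not Allowed";
    }
    // Defaults mimic HttpServlet: unimplemented methods are rejected.
    String doGet()            { return "405 Method Not Allowed"; }
    String doPost(String body) { return "405 Method Not Allowed"; }
}

class HelloServlet extends MiniServlet {
    @Override String doGet() { return "<html><body>Hello</body></html>"; }

    public static void main(String[] args) {
        MiniServlet s = new HelloServlet();
        System.out.println(s.service("GET", ""));  // <html><body>Hello</body></html>
        System.out.println(s.service("POST", "")); // 405 Method Not Allowed
    }
}
```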
Some of the important tasks performed by the web container are:
Communication Support – Container provides easy way of communication between
web server and the servlets and JSPs. Because of container, we don’t need to build a
server socket to listen for any request from web server, parse the request and generate
response. All these important and complex tasks are done by container and all we need to
focus is on our business logic for our applications.
Lifecycle and Resource Management – Container takes care of managing the life cycle
of servlet. Container takes care of loading the servlets into memory, initializing servlets,
invoking servlet methods and destroying them. Container also provides utility like JNDI
for resource pooling and management.
Multithreading Support – The container creates a new thread for every request to the servlet,
and the thread dies once the request is processed. So servlets are not re-initialized for each
request, which saves time and memory.
JSP Support – JSPs don't look like normal Java classes, so the web container provides
support for them. Every JSP in the application is compiled by the container and converted to
a servlet, and the container then manages it like any other servlet.
Miscellaneous Tasks – The web container manages the resource pool, performs memory
optimizations, runs the garbage collector, provides security configurations, and supports
multiple applications, hot deployment and several other tasks behind the scenes that make
our life easier.
Coding
package reformance.evaluation;
import weka.classifiers.Classifier;
import weka.classifiers.Sourcable;
import weka.classifiers.trees.j48.BinC45ModelSelection;
import weka.classifiers.trees.j48.C45ModelSelection;
import weka.classifiers.trees.j48.C45PruneableClassifierTree;
import weka.classifiers.trees.j48.ClassifierTree;
import weka.classifiers.trees.j48.ModelSelection;
import weka.classifiers.trees.j48.PruneableClassifierTree;
import weka.core.AdditionalMeasureProducer;
import weka.core.Capabilities;
import weka.core.Drawable;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Matchable;
import weka.core.Option;
import weka.core.OptionHandler;
import weka.core.RevisionUtils;
import weka.core.Summarizable;
import weka.core.TechnicalInformation;
import weka.core.TechnicalInformationHandler;
import weka.core.Utils;
import weka.core.WeightedInstancesHandler;
import weka.core.TechnicalInformation.Field;
import weka.core.TechnicalInformation.Type;
import java.util.Enumeration;
import java.util.Vector;
/**
<!-- globalinfo-start -->
* Class for generating a pruned or unpruned C4.5 decision tree. For more information, see<br/>
* <br/>
* Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers,
San Mateo, CA.
* <p/>
<!-- globalinfo-end -->
*
<!-- technical-bibtex-start -->
* BibTeX:
* <pre>
* @book{Quinlan1993,
* address = {San Mateo, CA},
* author = {Ross Quinlan},
* publisher = {Morgan Kaufmann Publishers},
* title = {C4.5: Programs for Machine Learning},
* year = {1993}
*}
* </pre>
* <p/>
<!-- technical-bibtex-end -->
*
<!-- options-start -->
* Valid options are: <p/>
*
* <pre> -U
* Use unpruned tree.</pre>
*
* <pre> -C <pruning confidence>
* Set confidence threshold for pruning.
* (default 0.25)</pre>
*
* <pre> -M <minimum number of instances>
* Set minimum number of instances per leaf.
* (default 2)</pre>
*
* <pre> -R
* Use reduced error pruning.</pre>
*
* <pre> -N <number of folds>
* Set number of folds for reduced error
* pruning. One fold is used as pruning set.
* (default 3)</pre>
*
* <pre> -B
* Use binary splits only.</pre>
*
* <pre> -S
* Don't perform subtree raising.</pre>
*
* <pre> -L
* Do not clean up after the tree has been built.</pre>
*
* <pre> -A
* Laplace smoothing for predicted probabilities.</pre>
*
* <pre> -Q <seed>
* Seed for random data shuffling (default 1).</pre>
*
<!-- options-end -->
*
* @author Eibe Frank ([email protected])
* @version $Revision: 1.9 $
*/
public class C45
extends Classifier
implements OptionHandler, Drawable, Matchable, Sourcable,
WeightedInstancesHandler, Summarizable, AdditionalMeasureProducer,
TechnicalInformationHandler {
/** The decision tree. */
protected ClassifierTree m_root;
/** Unpruned tree? */
protected boolean m_unpruned = false;
/** Confidence level for pruning. */
protected float m_CF = 0.25f;
/** Minimum number of instances per leaf. */
protected int m_minNumObj = 2;
/** Use reduced error pruning? */
protected boolean m_reducedErrorPruning = false;
/** Number of folds for reduced error pruning. */
protected int m_numFolds = 3;
/** Binary splits on nominal attributes? */
protected boolean m_binarySplits = false;
/** Subtree raising to be performed? */
protected boolean m_subtreeRaising = true;
/** Do not clean up after the tree has been built. */
protected boolean m_noCleanup = false;
/** Random seed for reduced-error pruning. */
protected int m_Seed = 1;
/** Whether to use Laplace smoothing for predicted probabilities. */
protected boolean m_useLaplace = false;
/**
* Returns a string describing classifier
* @return a description suitable for
* displaying in the explorer/experimenter gui
*/
public String globalInfo() {
return "Class for generating a pruned or unpruned C4.5 decision tree. For more "
+ "information, see\n\n"
+ getTechnicalInformation().toString();
}
/**
* Returns an instance of a TechnicalInformation object, containing
* detailed information about the technical background of this class,
* e.g., paper reference or book this class is based on.
*
* @return the technical information about this class
*/
public TechnicalInformation getTechnicalInformation() {
TechnicalInformation result = new TechnicalInformation(Type.BOOK);
result.setValue(Field.AUTHOR, "Ross Quinlan");
result.setValue(Field.YEAR, "1993");
result.setValue(Field.TITLE, "C4.5: Programs for Machine Learning");
result.setValue(Field.PUBLISHER, "Morgan Kaufmann Publishers");
result.setValue(Field.ADDRESS, "San Mateo, CA");
return result;
}
/**
* Returns default capabilities of the classifier.
*
* @return the capabilities of this classifier
*/
public Capabilities getCapabilities() {
Capabilities result;
try {
if (!m_reducedErrorPruning)
result = new C45PruneableClassifierTree(null, !m_unpruned, m_CF, m_subtreeRaising, !
m_noCleanup).getCapabilities();
else
result = new PruneableClassifierTree(null, !m_unpruned, m_numFolds, !m_noCleanup,
m_Seed).getCapabilities();
}
catch (Exception e) {
result = new Capabilities(this);
}
result.setOwner(this);
return result;
}
/**
* Generates the classifier.
*
* @param instances the data to train the classifier with
* @throws Exception if classifier can't be built successfully
*/
public void buildClassifier(Instances instances)
throws Exception {
ModelSelection modSelection;
if (m_binarySplits)
modSelection = new BinC45ModelSelection(m_minNumObj, instances);
else
modSelection = new C45ModelSelection(m_minNumObj, instances);
if (!m_reducedErrorPruning)
m_root = new C45PruneableClassifierTree(modSelection, !m_unpruned, m_CF,
m_subtreeRaising, !m_noCleanup);
else
m_root = new PruneableClassifierTree(modSelection, !m_unpruned, m_numFolds,
!m_noCleanup, m_Seed);
m_root.buildClassifier(instances);
if (m_binarySplits) {
((BinC45ModelSelection)modSelection).cleanup();
} else {
((C45ModelSelection)modSelection).cleanup();
}
}
/**
* Classifies an instance.
*
* @param instance the instance to classify
* @return the classification for the instance
* @throws Exception if instance can't be classified successfully
*/
public double classifyInstance(Instance instance) throws Exception {
return m_root.classifyInstance(instance);
}
/**
* Returns class probabilities for an instance.
*
* @param instance the instance to calculate the class probabilities for
* @return the class probabilities
* @throws Exception if distribution can't be computed successfully
*/
public final double [] distributionForInstance(Instance instance)
throws Exception {
return m_root.distributionForInstance(instance, m_useLaplace);
}
/**
* Returns the type of graph this classifier
* represents.
* @return Drawable.TREE
*/
public int graphType() {
return Drawable.TREE;
}
/**
* Returns graph describing the tree.
*
* @return the graph describing the tree
* @throws Exception if graph can't be computed
*/
public String graph() throws Exception {
return m_root.graph();
}
/**
* Returns tree in prefix order.
*
* @return the tree in prefix order
* @throws Exception if something goes wrong
*/
public String prefix() throws Exception {
return m_root.prefix();
}
/**
* Returns tree as an if-then statement.
*
* @param className the name of the Java class
* @return the tree as a Java if-then type statement
* @throws Exception if something goes wrong
*/
public String toSource(String className) throws Exception {
StringBuffer[] source = m_root.toSource(className);
return "class " + className + " {\n\n"
+ "  public static double classify(Object[] i)\n"
+ "    throws Exception {\n\n"
+ "    double p = Double.NaN;\n"
+ source[0]  // assignment code
+ "    return p;\n"
+ "  }\n"
+ source[1]  // support code
+ "}\n";
}
/**
* Returns an enumeration describing the available options.
*
* Valid options are: <p>
*
* -U <br>
* Use unpruned tree.<p>
*
* -C confidence <br>
* Set confidence threshold for pruning. (Default: 0.25) <p>
*
* -M number <br>
* Set minimum number of instances per leaf. (Default: 2) <p>
*
* -R <br>
* Use reduced error pruning. No subtree raising is performed. <p>
*
* -N number <br>
* Set number of folds for reduced error pruning. One fold is
* used as the pruning set. (Default: 3) <p>
*
* -B <br>
* Use binary splits for nominal attributes. <p>
*
* -S <br>
* Don't perform subtree raising. <p>
*
* -L <br>
* Do not clean up after the tree has been built.
*
* -A <br>
* If set, Laplace smoothing is used for predicted probabilites. <p>
*
* -Q <br>
* The seed for reduced-error pruning. <p>
*
* @return an enumeration of all the available options.
*/
public Enumeration listOptions() {
Vector newVector = new Vector(10);
newVector.
addElement(new Option("\tUse unpruned tree.",
"U", 0, "-U"));
newVector.
addElement(new Option("\tSet confidence threshold for pruning.\n" +
"\t(default 0.25)",
"C", 1, "-C <pruning confidence>"));
newVector.
addElement(new Option("\tSet minimum number of instances per leaf.\n" +
"\t(default 2)",
"M", 1, "-M <minimum number of instances>"));
newVector.
addElement(new Option("\tUse reduced error pruning.",
"R", 0, "-R"));
newVector.
addElement(new Option("\tSet number of folds for reduced error\n" +
"\tpruning. One fold is used as pruning set.\n" +
"\t(default 3)",
"N", 1, "-N <number of folds>"));
newVector.
addElement(new Option("\tUse binary splits only.",
"B", 0, "-B"));
newVector.
addElement(new Option("\tDon't perform subtree raising.",
"S", 0, "-S"));
newVector.
addElement(new Option("\tDo not clean up after the tree has been built.",
"L", 0, "-L"));
newVector.
addElement(new Option("\tLaplace smoothing for predicted probabilities.",
"A", 0, "-A"));
newVector.
addElement(new Option("\tSeed for random data shuffling (default 1).",
"Q", 1, "-Q <seed>"));
return newVector.elements();
}
/**
* Parses a given list of options.
*
<!-- options-start -->
* Valid options are: <p/>
*
* <pre> -U
* Use unpruned tree.</pre>
*
* <pre> -C <pruning confidence>
* Set confidence threshold for pruning.
* (default 0.25)</pre>
*
* <pre> -M <minimum number of instances>
* Set minimum number of instances per leaf.
* (default 2)</pre>
*
* <pre> -R
* Use reduced error pruning.</pre>
*
* <pre> -N <number of folds>
* Set number of folds for reduced error
* pruning. One fold is used as pruning set.
* (default 3)</pre>
*
* <pre> -B
* Use binary splits only.</pre>
*
* <pre> -S
* Don't perform subtree raising.</pre>
*
* <pre> -L
* Do not clean up after the tree has been built.</pre>
*
* <pre> -A
* Laplace smoothing for predicted probabilities.</pre>
*
* <pre> -Q <seed>
* Seed for random data shuffling (default 1).</pre>
*
<!-- options-end -->
*
* @param options the list of options as an array of strings
* @throws Exception if an option is not supported
*/
public void setOptions(String[] options) throws Exception {
// Other options
String minNumString = Utils.getOption('M', options);
if (minNumString.length() != 0) {
m_minNumObj = Integer.parseInt(minNumString);
} else {
m_minNumObj = 2;
}
m_binarySplits = Utils.getFlag('B', options);
m_useLaplace = Utils.getFlag('A', options);
// Pruning options
m_unpruned = Utils.getFlag('U', options);
m_subtreeRaising = !Utils.getFlag('S', options);
m_noCleanup = Utils.getFlag('L', options);
if ((m_unpruned) && (!m_subtreeRaising)) {
throw new Exception("Subtree raising doesn't need to be unset for unpruned tree!");
}
m_reducedErrorPruning = Utils.getFlag('R', options);
if ((m_unpruned) && (m_reducedErrorPruning)) {
throw new Exception("Unpruned tree and reduced error pruning can't be selected " +
"simultaneously!");
}
String confidenceString = Utils.getOption('C', options);
if (confidenceString.length() != 0) {
if (m_reducedErrorPruning) {
throw new Exception("Setting the confidence doesn't make sense " +
"for reduced error pruning.");
} else if (m_unpruned) {
throw new Exception("Doesn't make sense to change confidence for unpruned "
+"tree!");
} else {
m_CF = (new Float(confidenceString)).floatValue();
if ((m_CF <= 0) || (m_CF >= 1)) {
throw new Exception("Confidence has to be greater than zero and smaller " +
"than one!");
}
}
} else {
m_CF = 0.25f;
}
String numFoldsString = Utils.getOption('N', options);
if (numFoldsString.length() != 0) {
if (!m_reducedErrorPruning) {
throw new Exception("Setting the number of folds" +
" doesn't make sense if" +
" reduced error pruning is not selected.");
} else {
m_numFolds = Integer.parseInt(numFoldsString);
}
} else {
m_numFolds = 3;
}
String seedString = Utils.getOption('Q', options);
if (seedString.length() != 0) {
m_Seed = Integer.parseInt(seedString);
} else {
m_Seed = 1;
}
}
/**
* Gets the current settings of the Classifier.
*
* @return an array of strings suitable for passing to setOptions
*/
public String [] getOptions() {
String [] options = new String [16];
int current = 0;
if (m_noCleanup) {
options[current++] = "-L";
}
if (m_unpruned) {
options[current++] = "-U";
} else {
if (!m_subtreeRaising) {
options[current++] = "-S";
}
if (m_reducedErrorPruning) {
options[current++] = "-R";
options[current++] = "-N"; options[current++] = "" + m_numFolds;
options[current++] = "-Q"; options[current++] = "" + m_Seed;
} else {
options[current++] = "-C"; options[current++] = "" + m_CF;
}
}
if (m_binarySplits) {
options[current++] = "-B";
}
options[current++] = "-M"; options[current++] = "" + m_minNumObj;
if (m_useLaplace) {
options[current++] = "-A";
}
while (current < options.length) {
options[current++] = "";
}
return options;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String seedTipText() {
return "The seed used for randomizing the data " +
"when reduced-error pruning is used.";
}
/**
* Get the value of Seed.
*
* @return Value of Seed.
*/
public int getSeed() {
return m_Seed;
}
/**
* Set the value of Seed.
*
* @param newSeed Value to assign to Seed.
*/
public void setSeed(int newSeed) {
m_Seed = newSeed;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String useLaplaceTipText() {
return "Whether counts at leaves are smoothed based on Laplace.";
}
/**
* Get the value of useLaplace.
*
* @return Value of useLaplace.
*/
public boolean getUseLaplace() {
return m_useLaplace;
}
/**
* Set the value of useLaplace.
*
* @param newuseLaplace Value to assign to useLaplace.
*/
public void setUseLaplace(boolean newuseLaplace) {
m_useLaplace = newuseLaplace;
}
/**
* Returns a description of the classifier.
*
* @return a description of the classifier
*/
public String toString() {
if (m_root == null) {
return "No classifier built";
}
if (m_unpruned)
return "J48 unpruned tree\n------------------\n" + m_root.toString();
else
return "J48 pruned tree\n------------------\n" + m_root.toString();
}
/**
* Returns a superconcise version of the model
*
* @return a summary of the model
*/
public String toSummaryString() {
return "Number of leaves: " + m_root.numLeaves() + "\n"
+ "Size of the tree: " + m_root.numNodes() + "\n";
}
/**
* Returns the size of the tree
* @return the size of the tree
*/
public double measureTreeSize() {
return m_root.numNodes();
}
/**
* Returns the number of leaves
* @return the number of leaves
*/
public double measureNumLeaves() {
return m_root.numLeaves();
}
/**
* Returns the number of rules (same as number of leaves)
* @return the number of rules
*/
public double measureNumRules() {
return m_root.numLeaves();
}
/**
* Returns an enumeration of the additional measure names
* @return an enumeration of the measure names
*/
public Enumeration enumerateMeasures() {
Vector newVector = new Vector(3);
newVector.addElement("measureTreeSize");
newVector.addElement("measureNumLeaves");
newVector.addElement("measureNumRules");
return newVector.elements();
}
/**
* Returns the value of the named measure
* @param additionalMeasureName the name of the measure to query for its value
* @return the value of the named measure
* @throws IllegalArgumentException if the named measure is not supported
*/
public double getMeasure(String additionalMeasureName) {
if (additionalMeasureName.compareToIgnoreCase("measureNumRules") == 0) {
return measureNumRules();
} else if (additionalMeasureName.compareToIgnoreCase("measureTreeSize") == 0) {
return measureTreeSize();
} else if (additionalMeasureName.compareToIgnoreCase("measureNumLeaves") == 0) {
return measureNumLeaves();
} else {
throw new IllegalArgumentException(additionalMeasureName
+ " not supported (j48)");
}
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String unprunedTipText() {
return "Whether pruning is performed.";
}
/**
* Get the value of unpruned.
*
* @return Value of unpruned.
*/
public boolean getUnpruned() {
return m_unpruned;
}
/**
* Set the value of unpruned. Turns reduced-error pruning
* off if set.
* @param v Value to assign to unpruned.
*/
public void setUnpruned(boolean v) {
if (v) {
m_reducedErrorPruning = false;
}
m_unpruned = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String confidenceFactorTipText() {
return "The confidence factor used for pruning (smaller values incur "
+ "more pruning).";
}
/**
* Get the value of CF.
*
* @return Value of CF.
*/
public float getConfidenceFactor() {
return m_CF;
}
/**
* Set the value of CF.
*
* @param v Value to assign to CF.
*/
public void setConfidenceFactor(float v) {
m_CF = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String minNumObjTipText() {
return "The minimum number of instances per leaf.";
}
/**
* Get the value of minNumObj.
*
* @return Value of minNumObj.
*/
public int getMinNumObj() {
return m_minNumObj;
}
/**
* Set the value of minNumObj.
*
* @param v Value to assign to minNumObj.
*/
public void setMinNumObj(int v) {
m_minNumObj = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String reducedErrorPruningTipText() {
return "Whether reduced-error pruning is used instead of C.4.5 pruning.";
}
/**
* Get the value of reducedErrorPruning.
*
* @return Value of reducedErrorPruning.
*/
public boolean getReducedErrorPruning() {
return m_reducedErrorPruning;
}
/**
* Set the value of reducedErrorPruning. Turns
* unpruned trees off if set.
*
* @param v Value to assign to reducedErrorPruning.
*/
public void setReducedErrorPruning(boolean v) {
if (v) {
m_unpruned = false;
}
m_reducedErrorPruning = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String numFoldsTipText() {
return "Determines the amount of data used for reduced-error pruning. "
+ " One fold is used for pruning, the rest for growing the tree.";
}
/**
* Get the value of numFolds.
*
* @return Value of numFolds.
*/
public int getNumFolds() {
return m_numFolds;
}
/**
* Set the value of numFolds.
*
* @param v Value to assign to numFolds.
*/
public void setNumFolds(int v) {
m_numFolds = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String binarySplitsTipText() {
return "Whether to use binary splits on nominal attributes when "
+ "building the trees.";
}
/**
* Get the value of binarySplits.
*
* @return Value of binarySplits.
*/
public boolean getBinarySplits() {
return m_binarySplits;
}
/**
* Set the value of binarySplits.
*
* @param v Value to assign to binarySplits.
*/
public void setBinarySplits(boolean v) {
m_binarySplits = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String subtreeRaisingTipText() {
return "Whether to consider the subtree raising operation when pruning.";
}
/**
* Get the value of subtreeRaising.
*
* @return Value of subtreeRaising.
*/
public boolean getSubtreeRaising() {
return m_subtreeRaising;
}
/**
* Set the value of subtreeRaising.
*
* @param v Value to assign to subtreeRaising.
*/
public void setSubtreeRaising(boolean v) {
m_subtreeRaising = v;
}
/**
* Returns the tip text for this property
* @return tip text for this property suitable for
* displaying in the explorer/experimenter gui
*/
public String saveInstanceDataTipText() {
return "Whether to save the training data for visualization.";
}
/**
* Check whether instance data is to be saved.
*
* @return true if instance data is saved
*/
public boolean getSaveInstanceData() {
return m_noCleanup;
}
/**
* Set whether instance data is to be saved.
* @param v true if instance data is to be saved
*/
public void setSaveInstanceData(boolean v) {
m_noCleanup = v;
}
/**
* Returns the revision string.
*
* @return the revision
*/
public String getRevision() {
return RevisionUtils.extract("$Revision: 1.9 $");
}
/**
* Main method for testing this class
*
* @param argv the commandline options
*/
public static void main(String [] argv){
// runClassifier expects Weka command-line options; pass the training
// file with the -t flag.
argv = new String[]{"-t", "weather3.arff"};
runClassifier(new C45(), argv);
}
}
TESTING
Software testing can be stated as the process of validating and verifying that a computer
program/application/product:
• works as expected,
• can be implemented with the same characteristics, and
• satisfies the needs of stakeholders.
Software testing, depending on the testing method employed, can be implemented at any time in
the software development process.
Testing levels
There are generally four recognized levels of tests: unit testing, integration testing,
system testing, and acceptance testing. Tests are frequently grouped by where they are added in
the software development process, or by the level of specificity of the test.
Unit testing
Unit testing, also known as component testing, refers to tests that verify the functionality of a
specific section of code, usually at the function level. In an object-oriented environment, this is
usually at the class level, and the minimal unit tests include the constructors and destructors.
These types of tests are usually written by developers as they work on code (white-box style), to
ensure that the specific function is working as expected. One function might have multiple tests,
to catch corner cases or other branches in the code. Unit testing alone cannot verify the
functionality of a piece of software; rather, it is used to ensure that the building blocks of the
software work independently of each other.
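As a minimal illustration of the white-box style described above, a hand-rolled unit test for a small function might look like this. TemperatureConverter is a made-up example; real projects would typically use a framework such as JUnit:

```java
// A toy "unit" and a test for it, checking a typical value and a
// corner case. Enable JDK assertions with java -ea to run the test.
class TemperatureConverter {
    static double celsiusToFahrenheit(double c) {
        return c * 9.0 / 5.0 + 32.0;
    }
}

class TemperatureConverterTest {
    public static void main(String[] args) {
        // typical value: boiling point of water
        assert TemperatureConverter.celsiusToFahrenheit(100.0) == 212.0;
        // corner case: -40 is where the two scales coincide
        assert TemperatureConverter.celsiusToFahrenheit(-40.0) == -40.0;
        System.out.println("all tests passed");
    }
}
```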
Integration testing
Integration testing is any type of software testing that seeks to verify the interfaces
between components against a software design. Software components may be integrated in an
iterative way or all together. Normally the former is considered a better practice since it allows
interface issues to be located more quickly and fixed.
Integration testing works to expose defects in the interfaces and interaction between integrated
components (modules). Progressively larger groups of tested software components
corresponding to elements of the architectural design are integrated and tested until the software
works as a system.
System testing
System testing tests a completely integrated system to verify that it meets its requirements.
Testing Types:
Installation testing
An installation test assures that the system is installed correctly and working on the actual
customer's hardware.
Regression testing
Regression testing focuses on finding defects after a major code change has occurred.
Specifically, it seeks to uncover software regressions, as degraded or lost features, including old
bugs that have come back. Such regressions occur whenever software functionality that was
previously working correctly stops working as intended. Typically, regressions occur as an
unintended consequence of program changes, when the newly developed part of the software
collides with the previously existing code. Common methods of regression testing include
rerunning previous sets of test-cases and checking whether previously fixed faults have
reemerged.
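The idea of rerunning previous test cases against recorded results can be sketched as follows. RegressionSuite and its "golden" outputs are hypothetical; real regression suites record expected behavior from a known-good build and re-check it after every change:

```java
import java.util.Map;

class RegressionSuite {
    // Hypothetical "golden" input-output pairs recorded from a previous,
    // known-good build of the function under test.
    static final Map<Integer, Integer> GOLDEN = Map.of(0, 0, 3, 9, 5, 25);

    // The function under test; a regression appears if a later change
    // makes any previously recorded case fail.
    static int square(int x) { return x * x; }

    // Re-run all stored cases and report whether any behavior regressed.
    static boolean noRegressions() {
        return GOLDEN.entrySet().stream()
                .allMatch(e -> square(e.getKey()) == e.getValue());
    }

    public static void main(String[] args) {
        System.out.println(noRegressions() ? "no regressions" : "REGRESSION DETECTED");
    }
}
```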
Acceptance Testing
1. A smoke test is used as an acceptance test prior to introducing a new build to the main
testing process, i.e. before integration or regression.
2. Acceptance testing performed by the customer, often in their lab environment on their
own hardware, is known as user acceptance testing(UAT). Acceptance testing may be
performed as part of the hand-off process between any two phases of development.
Alpha testing
Alpha testing is simulated or actual operational testing by potential users or an independent
test team at the developers' site, often employed as a form of internal acceptance testing
before the software goes to beta testing.
Beta Testing
Beta testing comes after alpha testing and can be considered a form of external user acceptance
testing. Versions of the software, known as beta versions, are released to a limited audience
outside of the programming team. The software is released to groups of people so that further
testing can ensure the product has few faults or bugs. Sometimes, beta versions are made
available to the open public to increase the feedback field to a maximal number of future users.
Functional testing refers to activities that verify a specific action or function of the
code. These are usually found in the code requirements documentation, although some
development methodologies work from use cases or user stories. Functional tests tend to answer
the question of "can the user do this" or "does this particular feature work."
Non-functional testing refers to aspects of the software that may not be related to a
specific function or user action, such as scalability or other performance, behavior under certain
constraints, or security. Testing will determine the breaking point, the point at which extremes
of scalability or performance lead to unstable execution.
REFERENCES
[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Francisco: Morgan Kaufmann,
1988.
[5] P. Langley, W. Iba, and K. Thompson, “An analysis of Bayesian classifiers,” in Proceedings of
the Tenth National Conference on Artificial Intelligence. AAAI Press, 1992, pp. 223–228.
[6] P. Domingos and M. Pazzani, “Beyond independence: Conditions for the optimality of the
simple Bayesian classifier,” Machine Learning, vol. 29, pp. 103–130, 1997.
[7] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, and Q. Yang, “Top 10 algorithms in data mining,”
Knowledge and Information Systems, vol. 14(1), pp. 1–37, 2008.
[9] L. Jiang, H. Zhang, and Z. Cai, “A novel Bayes model: Hidden naive Bayes,” IEEE
Transactions on Knowledge and Data Engineering, vol. 21, pp. 1361–1371, 2009.
[11] C. Qiu, L. Jiang, and C. Li, “Not always simple classification: Learning superparent for
class probability estimation,” Expert Systems with Applications, vol. 42(13), pp. 5433–5440,
2015.
[12] J. Wu, S. Pan, X. Zhu, P. Zhang, and C. Zhang, “SODE: Self-adaptive one-dependence
estimators for classification,” Pattern Recognition, vol. 51, pp. 358–377, 2016.
[13] P. Langley and S. Sage, “Induction of selective Bayesian classifiers,” in Proceedings of the
Tenth Conference on Uncertainty in Artificial Intelligence, 1994, pp. 399–406.
[14] C. A. Ratanamahatana and D. Gunopulos, “Feature selection for the naive Bayesian
classifier using decision trees,” Applied Artificial Intelligence, vol. 17, pp. 475–487, 2003.
[15] L. Jiang, H. Zhang, Z. Cai, and J. Su, “Evolutional naive Bayes,” in Proceedings of the 1st
International Symposium on Intelligent Computation and its Applications, 2005, pp. 344–350.
[16] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: Criteria of
max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 27(8), pp. 1226–1238, 2005.
[17] L. Jiang, Z. Cai, H. Zhang, and D. Wang, “Not so greedy: Randomly selected naive Bayes,”
Expert Systems with Applications, vol. 39(12), pp. 11022–11028, 2012.
[18] B. Tang, S. Kay, and H. He, “Toward optimal feature selection in naive Bayes for text
categorization,” IEEE Transactions on Knowledge and Data Engineering, vol. 28(9), pp.
2508–2521, 2016.
[19] H. Zhang and S. Sheng, “Learning weighted naive Bayes with accurate ranking,” in
Proceedings of the 4th International Conference on Data Mining. Brighton, UK: IEEE, 2004, pp.
567–570.
[20] M. Hall, “A decision tree-based attribute weighting filter for naive Bayes,” Knowledge-
Based Systems, vol. 20(2), pp. 120–126, 2007.
[21] C. H. Lee, F. Gutierrez, and D. Dou, “Calculating feature weights in naive Bayes with
Kullback-Leibler measure,” in Proceedings of the 11th IEEE International Conference on Data
Mining. Vancouver, BC: IEEE, 2011, pp. 1146–1151.
[22] J. Wu and Z. Cai, “Attribute weighting via differential evolution algorithm for attribute
weighted naive Bayes (WNB),” Journal of Computational Information Systems, vol. 7(5), pp.
1672–1679, 2011.
[23] N. A. Zaidi, J. Cerquides, M. J. Carman, and G. I. Webb, “Alleviating naive Bayes attribute
independence assumption by attribute weighting,” Journal of Machine Learning Research, vol.
14, pp. 1947–1988, 2013.
[24] L. Jiang, C. Li, S. Wang, and L. Zhang, “Deep feature weighting for naive Bayes and its
application to text classification,” Engineering Applications of Artificial Intelligence, vol. 52, pp.
26–39, 2016.