Clustering With Multiviewpoint-Based Similarity Measure: Abstract
Abstract:
All clustering methods have to assume some cluster relationship among the data objects
that they are applied to. Similarity between a pair of objects can be defined either explicitly or
implicitly. In this paper, we introduce a novel multiviewpoint-based similarity measure and two
related clustering methods. The major difference between a traditional dissimilarity/similarity
measure and ours is that the former uses only a single viewpoint, which is the origin, while the
latter utilizes many different viewpoints: objects assumed not to be in the same cluster as the
two objects being measured. Using multiple viewpoints, a more informative assessment of
similarity can be achieved. Theoretical analysis and an empirical study are conducted to support
this claim. Two criterion functions for document clustering are proposed based on this new
measure. We compare them with several well-known clustering algorithms that use other popular
similarity measures on various document collections to verify the advantages of our proposal.
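To make the viewpoint idea concrete, here is a minimal, self-contained Java sketch (the class and method names MvsDemo/mvs and the toy vectors are ours, not from the paper): for each viewpoint dh assumed to lie outside the cluster of di and dj, similarity is assessed from the displaced vectors (di - dh) and (dj - dh), then averaged over the viewpoints.

```java
public class MvsDemo {
    // Multiviewpoint similarity of di and dj: instead of measuring from the
    // origin alone, measure the inner product of (di - dh) and (dj - dh) for
    // each viewpoint dh assumed to be outside their cluster, then average.
    static double mvs(double[] di, double[] dj, double[][] viewpoints) {
        double sum = 0;
        for (double[] dh : viewpoints) {
            double s = 0;
            for (int k = 0; k < di.length; k++)
                s += (di[k] - dh[k]) * (dj[k] - dh[k]);
            sum += s;
        }
        return sum / viewpoints.length;
    }

    public static void main(String[] args) {
        double[] di = {1, 0};
        double[] dj = {0.8, 0.6};
        // Toy viewpoints assumed to lie outside the cluster of di and dj.
        double[][] outside = {{0, 1}, {-1, 0}};
        System.out.println(mvs(di, dj, outside)); // prints 2.4
    }
}
```

The single-viewpoint case (origin only) reduces to the ordinary dot product; the averaging over external viewpoints is what makes the assessment more informative.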
INTRODUCTION
CLUSTERING is one of the most interesting and important topics in data mining. The aim of
clustering is to find intrinsic structures in data, and organize them into meaningful subgroups for
further study and analysis. Many clustering algorithms are published every year. They are
proposed for very distinct research fields, and developed using totally different techniques and
approaches. Nevertheless, according to a recent study, more than half a century after it was
introduced, the simple algorithm k-means still remains one of the top 10 data mining algorithms.
It is the most frequently used partitional clustering algorithm in practice. Another recent
scientific discussion [2] states that k-means is the favorite algorithm that practitioners in the
related fields choose to use. Needless to say, k-means has more than a few basic drawbacks, such
as sensitivity to initialization and to cluster size, and its performance can be worse than that of
other state-of-the-art algorithms in many domains. In spite of that, its simplicity,
understandability, and scalability are the reasons for its tremendous popularity. An algorithm
with adequate performance and usability in most application scenarios could be preferable to one
with better performance in some cases but limited usage due to high complexity. While offering
reasonable results, k-means is fast and easy to combine with other methods in larger systems.

A common approach to the clustering problem is to treat it as an optimization process. An
optimal partition is found by optimizing a particular function of similarity (or distance) among
the data. Basically, there is an implicit assumption that the true intrinsic structure of the data can
be correctly described by the similarity formula defined and embedded in the clustering criterion
function. Hence, the effectiveness of clustering algorithms under this approach depends on the
appropriateness of the similarity measure to the data at hand. For instance, the original k-means
has a sum-of-squared-error objective function that uses Euclidean distance. In a very sparse and
high-dimensional domain like text documents, spherical k-means, which uses cosine similarity
(CS) instead of Euclidean distance as the measure, is deemed to be more suitable. Banerjee et al.
showed that Euclidean distance was indeed one particular form of a class of distance measures
called Bregman divergences. They proposed the Bregman hard-clustering algorithm, in which
any kind of Bregman divergence can be applied. Kullback-Leibler divergence is a special case of
Bregman divergence that was said to give good clustering results on document data sets.
Kullback-Leibler divergence is a good example of a nonsymmetric measure. Also on the topic of
capturing dissimilarity in data, Pekalska et al. found that the discriminative power of some
distance measures could increase when their non-Euclidean and nonmetric attributes were
increased. They concluded that non-Euclidean and nonmetric measures can be informative for
statistical learning of data. Pelillo even argued that the symmetry and nonnegativity assumptions
of similarity measures are actually a limitation of current state-of-the-art clustering approaches.
Simultaneously, clustering still requires more robust dissimilarity or similarity measures; recent
works such as [8] illustrate this need.

The work in this paper is motivated by investigations from the above and similar research
findings. It appears to us that the nature of the similarity measure plays a very important role in
the success or failure of a clustering method. Our first objective is to derive a novel method for
measuring similarity between data objects in sparse and high-dimensional domains, particularly
text documents. From the proposed similarity measure, we then formulate new clustering
criterion functions and introduce their respective clustering algorithms, which are fast and
scalable like k-means, but are also capable of providing high-quality and consistent performance.
Existing System
A common approach to the clustering problem is to treat it as an optimization process. An
optimal partition is found by optimizing a particular function of similarity (or distance) among
the data. Basically, there is an implicit assumption that the true intrinsic structure of the data can
be correctly described by the similarity formula defined and embedded in the clustering criterion
function. Hence, the effectiveness of clustering algorithms under this approach depends on the
appropriateness of the similarity measure to the data at hand. For instance, the original k-means
has a sum-of-squared-error objective function that uses Euclidean distance. In a very sparse and
high-dimensional domain like text documents, spherical k-means, which uses cosine similarity
(CS) instead of Euclidean distance as the measure, is deemed to be more suitable.
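As a quick illustration of the measure just mentioned, here is a minimal cosine-similarity sketch in Java (the class name and toy term-frequency vectors are ours, for illustration only):

```java
public class CosineDemo {
    // Cosine similarity: dot(a, b) / (|a| * |b|). For document vectors with
    // nonnegative term frequencies, the result lies in [0, 1]; 1 means the
    // two vectors point in the same direction regardless of length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        // Two sparse term-frequency vectors over a tiny toy vocabulary.
        double[] d1 = {2, 1, 0, 0};
        double[] d2 = {1, 1, 0, 0};
        System.out.println(cosine(d1, d2));
    }
}
```

Because cosine similarity ignores vector length, it suits sparse, high-dimensional text data where documents of very different lengths should still compare as similar.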
Proposed System:
The work in this paper is motivated by investigations from the above and similar research
findings. It appears to us that the nature of the similarity measure plays a very important role in
the success or failure of a clustering method. Our first objective is to derive a novel method for
measuring similarity between data objects in sparse and high-dimensional domains, particularly
text documents. From the proposed similarity measure, we then formulate new clustering
criterion functions and introduce their respective clustering algorithms, which are fast and
scalable like k-means, but are also capable of providing high-quality and consistent performance.
Software Requirement Specification

Software Specification
Operating System : Windows XP
Technology : JAVA 1.6, JFreeChart

Hardware Specification
Processor : Pentium IV
RAM : 512 MB
Hard Disk : 80 GB
Modules
Select File
Process
Histogram
Clusters
Similarity
Result
TECHNOLOGIES USED
4.1 Introduction To Java:
Java has been around since 1991, developed by a small team of Sun Microsystems
developers in a project originally called the Green project. The intent of the project was to
develop a platform-independent software technology that would be used in the consumer
electronics industry. The language that the team created was originally called Oak.
The first implementation of Oak was in a PDA-type device called Star Seven (*7) that
consisted of the Oak language, an operating system called GreenOS, a user interface, and
hardware. The name *7 was derived from the telephone sequence that was used in the team's
office and that was dialed in order to answer any ringing telephone from any other phone in the
office.
Around the time the First Person project was floundering in consumer electronics, a new
craze was gaining momentum in America; the craze was called "Web surfing." The World Wide
Web, a name applied to the Internet's millions of linked HTML documents was suddenly
becoming popular for use by the masses. The reason for this was the introduction of a graphical
Web browser called Mosaic, developed by NCSA. The browser simplified Web browsing by
combining text and graphics into a single interface to eliminate the need for users to learn many
confusing UNIX and DOS commands. Navigating around the Web was much easier using
Mosaic.
It has only been since 1994 that Oak technology has been applied to the Web. In 1994,
two Sun developers created the first version of HotJava, then called WebRunner, a
graphical browser for the Web that still exists today. The browser was coded entirely in the Oak
language, by this time renamed Java. Soon after, the Java compiler was rewritten in the Java
language from its original C code, thus proving that Java could be used effectively as an
application language. Sun introduced Java in May 1995 at the SunWorld 95 convention.
Web surfing has become an enormously popular practice among millions of computer
users. Until Java, however, the content of information on the Internet has been a bland series of
HTML documents. Web users are hungry for applications that are interactive, that users can
execute no matter what hardware or software platform they are using, and that travel across
heterogeneous networks and do not spread viruses to their computers. Java can create such
applications.
The Java programming language is a high-level language that can be characterized by all
of the following buzzwords:
Simple
Architecture neutral
Object oriented
Portable
Distributed
High performance
Interpreted
Multithreaded
Robust
Dynamic
Secure
With most programming languages, you either compile or interpret a program so that you
can run it on your computer. The Java programming language is unusual in that a program is
both compiled and interpreted. With the compiler, you first translate a program into an
intermediate language called Java bytecode: the platform-independent code that is interpreted by
the interpreter on the Java platform. The interpreter parses and runs each Java bytecode
instruction on the computer. Compilation happens just once; interpretation occurs each time the
program is executed. The following figure illustrates how this works.
Every Java application contains a method with the name main, which will sound familiar to C
programmers. The main method is passed an array of strings as a parameter (similar to the
argv[] of C), and is declared as a static method. To output text from the program, we execute the
println method of System.out, which is Java's output stream. UNIX users will appreciate the
theory behind such a stream, as it is actually standard output. For those who are instead used to
the Wintel platform, it will write the string passed to it to the user's screen.
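The points above can be seen in one minimal program (the class name Hello is ours): javac compiles the file once to bytecode, and the JVM interprets that bytecode each time the program runs.

```java
// Compiled once by javac to platform-independent bytecode (Hello.class),
// then interpreted/JIT-compiled by the JVM on any platform.
public class Hello {
    // The entry point, analogous to C's main; args mirrors C's argv[].
    public static void main(String[] args) {
        // System.out is Java's standard output stream.
        System.out.println("Hello from Java");
    }
}
```

Running `javac Hello.java` followed by `java Hello` prints the greeting; the same .class file runs unchanged on any platform with a JVM.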
4.2 Swing:
Introduction To Swing:
Swing contains all the components of the modern Java GUI toolkit, and it offers
complexity appropriate to the task at hand: if something is simple, you don't have to write
much code, but as you try to do more, your code becomes increasingly complex. This means an
easy entry point, but you've got the power if you need it.
Swing has great depth. This section does not attempt to be comprehensive, but instead
introduces the power and simplicity of Swing to get you started using the library. Please be
aware that what you see here is intended to be simple. If you need to do more, Swing can
probably give you what you want if you're willing to do the research by hunting through the
online documentation from Sun.
Benefits Of Swing:
Swing components are Beans, so they can be used in any development environment that
supports Beans. Swing provides a full set of UI components. For speed, all the components are
lightweight, and Swing is written entirely in Java for portability.
Another benefit of Swing is its orthogonality of use: once you pick up the general ideas
about the library, you can apply them everywhere, primarily because of the Beans naming
conventions.
Keyboard navigation is automatic: you can use a Swing application without the mouse,
and you don't have to do any extra programming. Scrolling support is effortless: you simply
wrap your component in a JScrollPane as you add it to your form. Other features, such as tool
tips, typically require a single line of code to implement.
Swing also supports something called pluggable look and feel, which means that the
appearance of the UI can be dynamically changed to suit the expectations of users working under
different platforms and operating systems. It is even possible to invent your own look and feel.
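A minimal sketch of the two conveniences just described, wrapping a component in a JScrollPane and adding a one-line tool tip (the class and method names are ours; the enclosing frame is omitted so the snippet stays headless-friendly):

```java
import javax.swing.JScrollPane;
import javax.swing.JTextArea;

public class SwingSketch {
    // Scrolling support comes "for free" by wrapping the component
    // in a JScrollPane as it is added to the form.
    static JScrollPane makeScrollableArea() {
        JTextArea area = new JTextArea(10, 40);
        area.setToolTipText("Type here"); // tool tips: a single line of code
        return new JScrollPane(area);
    }

    public static void main(String[] args) {
        JScrollPane pane = makeScrollableArea();
        JTextArea area = (JTextArea) pane.getViewport().getView();
        System.out.println(area.getToolTipText());
    }
}
```

In a real application the returned JScrollPane would simply be added to a JFrame's content pane, exactly as the Hier class below does with its panels.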
Domain Description:
Data mining involves the use of sophisticated data analysis tools to discover previously
unknown, valid patterns and relationships in large data sets. These tools can include statistical
models, mathematical algorithms, and machine learning methods (algorithms that improve their
performance automatically through experience, such as neural networks or decision trees).
Consequently, data mining consists of more than collecting and managing data; it also includes
analysis and prediction.
Data mining can be performed on data represented in quantitative, textual, or multimedia
forms. Data mining applications can use a variety of parameters to examine the data. They
include association (patterns where one event is connected to another event, such as purchasing a
pen and purchasing paper), sequence or path analysis (patterns where one event leads to another
event, such as the birth of a child and purchasing diapers), classification (identification of new
patterns, such as coincidences between duct tape purchases and plastic sheeting purchases),
clustering (finding and visually documenting groups of previously unknown facts, such as
geographic location and brand preferences), and forecasting (discovering patterns from which
one can make reasonable predictions regarding future activities, such as the prediction that
people who join an athletic club may take exercise classes).
The medical community sometimes uses data mining to help predict the effectiveness of a
procedure or medicine.
Pharmaceutical firms use data mining of chemical compounds and genetic material to
help guide research on new treatments for diseases.
Retailers can use information collected through affinity programs (e.g., shoppers' club cards,
frequent-flyer points, contests) to assess the effectiveness of product selection and placement
decisions, coupon offers, and which products are often purchased together.
DESIGN ANALYSIS
UML Diagrams:
UML is a method for describing the system architecture in detail using the blueprint.
UML represents a collection of best engineering practices that have proven successful in the
modeling of large and complex systems.
UML is a very important part of developing object-oriented software and the software
development process.
UML uses mostly graphical notations to express the design of software projects.
Using the UML helps project teams communicate, explore potential designs, and validate
the architectural design of the software.
Definition:
UML is a general-purpose visual modeling language that is used to specify, visualize, construct,
and document the artifacts of the software system.
UML is a language:
It provides a vocabulary and rules for communication, and functions on both conceptual and
physical representations. So it is a modeling language.
UML Specifying:
Specifying means building models that are precise, unambiguous, and complete. In particular, the
UML addresses the specification of all the important analysis, design, and implementation
decisions that must be made in developing and deploying a software-intensive system.
UML Visualization:
The UML includes both graphical and textual representations. This makes it easy to visualize the
system and aids better understanding.
UML Constructing:
UML models can be directly connected to a variety of programming languages and it is
sufficiently expressive and free from any ambiguity to permit the direct execution of models.
UML Documenting:
UML provides a variety of documents in addition to raw executable code.
The use case view of a system encompasses the use cases that describe the behavior of the
system as seen by its end users, analysts, and testers.
The design view of a system encompasses the classes, interfaces, and collaborations that form the
vocabulary of the problem and its solution.
The process view of a system encompasses the threads and processes that form the system's
concurrency and synchronization mechanisms.
The implementation view of a system encompasses the components and files that are used to
assemble and release the physical system.The deployment view of a system encompasses the
nodes that form the system's hardware topology on which the system executes.
Uses of UML :
The UML is intended primarily for software-intensive systems. It has been used
effectively for such domains as
Enterprise Information Systems
Banking and Financial Services
Telecommunications
Transportation
Defense/Aerospace
Retail
Medical Electronics
Scientific Fields
Distributed Web
Building blocks of UML:
The vocabulary of the UML encompasses three kinds of building blocks:
Things
Relationships
Diagrams
Things:
Things are the data abstractions that are first-class citizens in a model. Things are of four types:
Structural Things, Behavioral Things, Grouping Things, Annotational Things
Relationships:
Relationships tie the things together. Relationships in the UML are
Dependency, Association, Generalization, Realization
UML Diagrams:
A diagram is the graphical presentation of a set of elements, most often rendered as a connected
graph of vertices (things) and arcs (relationships).
There are two types of diagrams, they are:
Structural and Behavioral Diagrams
Structural Diagrams:
The UML's four structural diagrams exist to visualize, specify, construct, and document
the static aspects of a system. We can view the static parts of a system using one of the following
diagrams. Structural diagrams consist of the Class Diagram, Object Diagram, Component
Diagram, and Deployment Diagram.
Behavioral Diagrams :
The UML's five behavioral diagrams are used to visualize, specify, construct, and
document the dynamic aspects of a system. The UML's behavioral diagrams are roughly
organized around the major ways in which the dynamics of a system can be modeled.
Behavioral diagrams consist of the
Use Case Diagram, Sequence Diagram, Collaboration Diagram, Statechart Diagram, Activity
Diagram
Use Case Diagram
An actor represents a user or another system that will interact with the system you are
modeling. A use case is an external view of the system that represents some action the user
might perform in order to complete a task.
Contents:
Use cases
Actors
System boundary
select path
process
histogram
clusters
similarity
Result
Class Diagram
Class diagrams describe the static structure of a system in terms of classes, packages, and
objects. Class diagrams describe three different perspectives when designing a system:
conceptual, specification, and implementation. These perspectives become evident as the
diagram is created and help solidify the design. Class diagrams are arguably the most used UML
diagram type. The class diagram is the main building block of any object-oriented solution. It
shows the classes in a system, the attributes and operations of each class, and the relationships
between classes. In most modeling tools a class has three parts: the name at the top, attributes in
the middle, and operations or methods at the bottom. In large systems with many classes, related
classes are grouped together to create class diagrams. Different relationships between classes are
shown by different types of arrows. Below is an image of a class diagram.
select file
+file()
Process
+process()
Histogram
+histogram()
result
Clusters
+cluster()
Similarity
+similarity()
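The boxes above could be sketched as Java skeletons (hypothetical names and empty bodies, for illustration only; each box in the diagram maps to a class with its listed operation):

```java
// Hypothetical skeletons mirroring the class diagram's boxes.
class SelectFile { void file()      { /* choose the input document  */ } }
class Process    { void process()   { /* parse and index the file   */ } }
class Histogram  { void histogram() { /* plot pairwise similarities */ } }
class Clusters   { void cluster()   { /* group similar documents    */ } }
class Similarity { double similarity() { return 0.0; /* placeholder score */ } }

public class ModuleSketch {
    public static void main(String[] args) {
        // The workflow follows the module order: select, process,
        // histogram, cluster, similarity.
        new SelectFile().file();
        new Process().process();
        new Histogram().histogram();
        new Clusters().cluster();
        System.out.println(new Similarity().similarity());
    }
}
```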
Sequence Diagram
Sequence diagrams in UML show how objects interact with each other and the order in which
those interactions occur. It is important to note that they show the interactions for a particular
scenario. The processes are represented vertically and interactions are shown as arrows. This
section explains the purpose and the basics of sequence diagrams.
/ process
/ select file
/ histogram
/ clusters
/ similarity
/ result
1 : select file()
3 : divide histograms()
4 : divide clusters()
5 : no of similarities()
6 : result()
Collaboration diagram
The communication diagram was called the collaboration diagram in UML 1. It is similar to a
sequence diagram, but the focus is on the messages passed between objects. The same
information can be represented using a sequence diagram and different objects.
/ process
/ histogram
/ select file
/ similarity
/ clusters
/ result
select file
process
histograms
clusters
similarity
result
State Machine diagram in UML, sometimes referred to as a State or Statechart diagram
select file
process
histogram
clusters
similarity
result
Component diagram
A component diagram displays the structural relationship of the components of a software
system. These are mostly used when working with complex systems that have many components.
Components communicate with each other using interfaces. The interfaces are linked using
connectors. The image below shows a component diagram.
Deployment Diagram
A deployment diagram shows the hardware of your system and the software running on that
hardware. Deployment diagrams are useful when your software solution is deployed across
multiple machines, each with a unique configuration. Below is an example deployment diagram.
SAMPLE CODE
//Bit.java
import java.io.*;
import java.lang.*;
import java.util.*;
////////////////////Bit class
class Bit
{
//bit operations
public static int Power(int tBase,int tExponent)
{
int tAns=1,t;
for(t=1;t<=tExponent;t++)
{
tAns=tAns*tBase;
}
return(tAns);
}
public static int GetBit(int tValue,int tPos)
{
int tBit=0;
tBit=tValue&Power(2,tPos);
if(tBit>0) tBit=1;
return(tBit);
}
}
//Dict.java
import java.util.*;
class Dict
{
String words[];
int nwords;
int iwords[];
//constructor
Dict()
{
int maxWords=150000;
nwords=0;
words=new String[maxWords];
iwords=new int[26];
for(int t=0;t<26;t++) iwords[t]=0;
}
//methods
public void read_dictionary()
{
try
{
//System.out.println("Reading...");
//System.out.println("GNU Collaborative International Dictionary of English (GCIDE)\n");
for(int i=0;i<26;i++)
{
String tfpath="dict\\gcide\\words_"+(char)(97+i)+".txt";
FileInputStream fin;
fin=new FileInputStream(tfpath);
int ch=0;
String tmp="";
while((ch=fin.read())!=-1)
{
if(ch==13)
{
addWord(tmp,i);
tmp="";
fin.read();
continue;
}
tmp+=(char)ch;
}
//System.out.println("gcide_"+(char)(97+i)+": "+iwords[i]+" words");
}
}
catch(Exception e)
{
//System.out.println("Error: "+e.getMessage());
}
}
public void addWord(String tword,int alphabetIndex)
{
words[nwords]=tword;
nwords++;
iwords[alphabetIndex]++;
}
public boolean isWord(String tword)
{
boolean flag=false;
tword=tword.toLowerCase();
for(int t=0;t<nwords;t++)
{
if(tword.compareTo(words[t])==0)
{
flag=true;
break;
}
}
return(flag);
}
public String toString()
{
String tstr="";
tstr="\nTotal: "+nwords+" words";
return(tstr);
}
}
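The linear scan in isWord above compares the query against every stored word, which is O(n) per lookup. As a hedged alternative sketch (ours, not part of the project code), a HashSet gives constant-time average lookup while preserving the same case-insensitive behavior:

```java
import java.util.HashSet;
import java.util.Set;

// Alternative dictionary sketch: HashSet membership instead of a linear
// scan over a String array. Names are ours, for illustration only.
public class FastDict {
    private final Set<String> words = new HashSet<>();

    // Store words lowercased so lookups are case-insensitive.
    public void addWord(String w) {
        words.add(w.toLowerCase());
    }

    // O(1) average lookup instead of scanning all nwords entries.
    public boolean isWord(String w) {
        return words.contains(w.toLowerCase());
    }

    public int size() {
        return words.size();
    }
}
```

With roughly 150,000 dictionary words (the maxWords bound used above), the difference per call is substantial when isWord is invoked for every token of every document.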
//DocumentIndexGraph.java
import java.lang.*;
import java.io.*;
public class DocumentIndexGraph
{
Itemset V;
ItemsetCollection E;
//constructor
public DocumentIndexGraph()
{
V=new Itemset();
E=new ItemsetCollection();
}
//get functions
public Itemset getV()
{
return(V);
}
public ItemsetCollection getE()
{
return(E);
}
//set functions
public void setV(Itemset tItemset)
{
V.clear();
V.appendItemset(tItemset);
}
public void setE(ItemsetCollection tItemsetCollection)
{
E.clear();
E.appendItemsetCollection(tItemsetCollection);
}
//methods
public void addNode(String tWord)
{
V.addItem(tWord);
}
public void addEdge(Itemset tEdge)
{
E.addItemset(tEdge);
}
public boolean isEdge(String str1,String str2)
{
boolean flag=false;
for(int t=0;t<=E.get_nItemsets()-1;t++)
{
String tstr1=E.getItemset(t).getItem(0);
String tstr2=E.getItemset(t).getItem(1);
if(str1.compareToIgnoreCase(tstr1)==0&&str2.compareToIgnoreCase(tstr2)==0)
{
flag=true;
break;
}
}
return(flag);
}
public boolean isPath(String str)
{
String tarr[]=StringUtils.split(str," ");
boolean flag=true;
for(int t=0;t<=tarr.length-2;t++)
{
if(isEdge(tarr[t],tarr[t+1])==false)
{
flag=false;
break;
}
}
return(flag);
}
public double findPhrasePathWeight(String str)
{
String tarr[]=StringUtils.split(str," ");
int tCount=0;
for(int t=0;t<=tarr.length-2;t++)
{
if(isEdge(tarr[t],tarr[t+1])==true)
{
tCount+=1;
}
}
double weight=(double)tCount/(double)(V.get_nItems());
return(weight);
}
}
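To see the graph operations in action without the project's Itemset classes, here is a simplified, self-contained analogue (all names are ours; an edge "a b" stands for word a followed by word b, and phraseWeight mirrors findPhrasePathWeight's count-of-edges-over-node-count formula):

```java
import java.util.HashSet;
import java.util.Set;

// Simplified analogue of DocumentIndexGraph using standard collections.
public class SimpleDig {
    private final Set<String> nodes = new HashSet<>();
    private final Set<String> edges = new HashSet<>(); // "a b" = edge a -> b

    void addEdge(String a, String b) {
        nodes.add(a);
        nodes.add(b);
        edges.add(a + " " + b);
    }

    boolean isEdge(String a, String b) {
        return edges.contains(a + " " + b);
    }

    // Count how many consecutive word pairs of the phrase are edges,
    // normalized by the number of graph nodes (as in findPhrasePathWeight).
    double phraseWeight(String phrase) {
        String[] t = phrase.split(" ");
        int count = 0;
        for (int i = 0; i + 1 < t.length; i++)
            if (isEdge(t[i], t[i + 1])) count++;
        return (double) count / nodes.size();
    }
}
```

For example, after adding the edges data->mining and mining->methods, the phrase "data mining methods" contributes two matching edges over three nodes, giving a weight of 2/3.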
//Hierarchical Clustering.java
import java.io.*;
import java.net.*;
import java.awt.*;
import java.awt.event.*;
import java.util.*;
import javax.swing.*;
import javax.swing.filechooser.*;
import org.jfree.ui.RefineryUtilities;
class Hier extends JFrame implements ActionListener
{
JFrame frmRootPath = new JFrame("Root Path : Clustering with Multi-Viewpoint based Similarity Measure");
JFrame frmButton = new JFrame("Functions : Clustering with Multi-Viewpoint based Similarity Measure");
WebDocument documents[];
WebDocument CumulativeDocument;
ItemsetCollection Clusters=new ItemsetCollection();
Hier()
{
//Root path frame
frmRootPath.setDefaultLookAndFeelDecorated(true);
frmRootPath.setResizable(false);
frmRootPath.setBounds(50,50,400,400);
frmRootPath.getContentPane().setLayout(null);
//Functions frame
frmButton.setDefaultLookAndFeelDecorated(true);
frmButton.setResizable(false);
frmButton.setBounds(50,50,201,380);
frmButton.getContentPane().setLayout(null);
//Result frame
frmResult.setDefaultLookAndFeelDecorated(true);
frmResult.setResizable(false);
frmResult.setBounds(50,50,600,580);
frmResult.getContentPane().setLayout(null);
//Root path Design
lblRootPath.setBounds(50,15,100,20);
frmRootPath.getContentPane().add(lblRootPath);
spLinks.setBounds(48,35,270,200);
frmRootPath.getContentPane().add(spLinks);
//Process button Design
btProcess.setBounds(50,65,100,20);
btProcess.addActionListener(this);
frmButton.getContentPane().add(btProcess);
//Histogram button Design
btHistogram.setBounds(50,125,100,20);
btHistogram.addActionListener(this);
frmButton.getContentPane().add(btHistogram);
//Cluster button Design
btCluster.setBounds(50,185,100,20);
btCluster.addActionListener(this);
frmButton.getContentPane().add(btCluster);
//Similarity button Design
btSimilarity.setBounds(50,245,100,20);
btSimilarity.addActionListener(this);
frmButton.getContentPane().add(btSimilarity);
//Result Design
lblResult.setBounds(17,35,100,20);
frmResult.getContentPane().add(lblResult);
spResult.setBounds(15,55,540,450);
frmResult.getContentPane().add(spResult);
txtResult.setEditable(false);
//initialize lstRootPath
FileSystemView fv=FileSystemView.getFileSystemView();
File files[]=fv.getFiles(new File("data"),true);
Vector tvector=new Vector();
for(int t=0;t<files.length;t++)
{
String tFileName=fv.getSystemDisplayName(files[t]);
tvector.add(tFileName);
}
lstRootPath.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
lstRootPath.setListData(tvector);
lstRootPath.setSelectedIndex(0);
frmRootPath.setVisible(true);
frmButton.setVisible(true);
frmResult.setVisible(true);
}
//dispatch button clicks (handlers for the other buttons, truncated in the original listing, are restored from the methods below)
public void actionPerformed(ActionEvent evt)
{
if(evt.getSource()==btProcess)
{
process();
}
if(evt.getSource()==btHistogram)
{
Histogram();
}
if(evt.getSource()==btCluster)
{
Cluster();
}
if(evt.getSource()==btSimilarity)
{
Similarity();
}
}
public void process()
{
try
{
dict.read_dictionary();
//starting-urls
String tRootPath=(String)lstRootPath.getSelectedValue();
frontier.enqueue(tRootPath);
//breadth-first-search
nVisited=0;
txtResult.setText("");
logText="";
while(nVisited<maxPages&&frontier.isEmpty()==false)
{
String tstrFrontier=frontier.toString();
String tPath=frontier.dequeue();
if(isVisitedPage(tPath)==false)
{
logText+="Frontier: "+tstrFrontier+"\n";
addVisitedPage(tPath);
logText+="Downloading ["+tPath+"]..."+"\n";
parser1.setFilePath("data\\"+tPath);
Queue q=parser1.findLinks();
logText+="ExtractedLinks: "+q.toString()+"\n";
printVisitedPages();
logText+="\n";
while(q.isEmpty()==false)
{
String tPath1=q.dequeue();
if(isVisitedPage(tPath1)==false)
frontier.enqueue(tPath1);
}
}
}
//write visitlog
FileOutputStream foutlog=new FileOutputStream(logPath);
foutlog.write(logText.getBytes());
foutlog.close();
//construct webdocuments and its DIG
nDocuments=nVisited;
documents=new WebDocument[nDocuments];
CumulativeDocument=new WebDocument();
//find metas and construct cumulative document index graph
addResultText(" Clustering with Multi-Viewpoint based Similarity Measure:\n\n");
ItemsetCollection icWords=new ItemsetCollection();
ItemsetCollection icEdges=new ItemsetCollection();
for(int t=0;t<nDocuments;t++) //for each document
{
documents[t]=new WebDocument();
documents[t].setDocName(visitedPages[t]);
parser1.setFilePath("data\\"+visitedPages[t]);
Queue q=parser1.findMetas(); //get meta-data
String tstr=q.toString();
tstr=StringUtils.replaceString(tstr,",","");
tstr=StringUtils.replaceString(tstr,"{","");
tstr=StringUtils.replaceString(tstr,"}","");
//get unique words in this document
String tarr[]=StringUtils.split(tstr," ");
Itemset tItemset=new Itemset(tarr);
ItemsetCollection ic1=new ItemsetCollection(tItemset);
tItemset=ic1.getUniqueItemset();
simalpha=0.3;
//suppress non-dictionary words
for(int t1=0;t1<tItemset.get_nItems();t1++)
{
if(dict.isWord(tItemset.getItem(t1))==false)
{
//tItemset.removeItem(t1);
}
}
icWords.addItemset(tItemset);
documents[t].DIG.setV(tItemset);
//get unique edges in this document
tstr=q.toString();
tstr=StringUtils.replaceString(tstr,"{","");
tstr=StringUtils.replaceString(tstr,"}","");
tarr=StringUtils.split(tstr,", ");
for(int j=0;j<tarr.length;j++)
{
documents[t].addPhrase(tarr[j]);
CumulativeDocument.addPhrase(tarr[j]);
String[] tarr1=StringUtils.split(tarr[j]," ");
if(tarr1.length>1)
{
for(int k=0;k<=tarr1.length-2;k++)
{
//for(int k1=k+1;k1<=tarr1.length-1;k1++) //if word-(k+1) appears before word-k
//{
Itemset i1=new Itemset(); //if word(k+1) appears next to word-k
i1.addItem(tarr1[k]);
i1.addItem(tarr1[k+1]);
icEdges.addItemset(i1);
documents[t].DIG.addEdge(i1);
//}
}
}
}
}
//set graph nodes and edges
for(int t=0;t<nDocuments;t++)
{
ItemsetCollection ic1=new ItemsetCollection();
ic1=documents[t].DIG.getE();
documents[t].DIG.setE(ic1.getUniqueItemsetCollection());
}
CumulativeDocument.DIG.setV(icWords.getUniqueItemset());
CumulativeDocument.DIG.setE(icEdges.getUniqueItemsetCollection());
double HRmin=1.0;
double HRmax=0.0;
/*for(int t=0;t<=nDocuments-2;t++)
{
for(int j=t+1;j<=nDocuments-1;j++)
{
double tsim=findSimilarity(documents[t],documents[j]);
if(HRmin>tsim) HRmin=tsim;
if(HRmax<tsim) HRmax=tsim;
}
}*/
//clustering
addResultText("\nSimilarities and its Corresponding OLP:\n");
Similarities=new ItemsetCollection();
double similarityThreshold=0.3;
sim=new double[nDocuments][nDocuments][1];
sim_perc=new double[nDocuments][nDocuments][1];
for(int t=0;t<=nDocuments-1;t++)
{
for(int j=0;j<=nDocuments-1;j++)
{
double hratio=findSimilarity(documents[t],documents[j]);
Itemset i1=new Itemset();
i1.addItem(""+t);
i1.addItem(""+j);
i1.addItem(""+hratio);
Similarities.addItemset(i1);
addResultText(" sim("+t+","+j+") : OLP --> "+hratio+"\n");
sim[t][j][0]=hratio;
sim_perc[t][j][0]=hratio*100;
if(hratio>=similarityThreshold)
{
String tstr1=""+t;
String tstr2=""+j;
int tNewClusterIndex=-1;
int tOldClusterIndex=-1;
for(int i=0;i<=Clusters.get_nItemsets()-1;i++)
{
if(Clusters.getItemset(i).isContains(tstr1)==true)
{
tNewClusterIndex=i;
}
if(Clusters.getItemset(i).isContains(tstr2)==true)
{
tOldClusterIndex=i;
}
}
if(tNewClusterIndex!=-1&&tOldClusterIndex!=-1)
{
Clusters.getItemset(tOldClusterIndex).removeItem(tstr2);
Clusters.getItemset(tNewClusterIndex).addItem(tstr2);
}
}
}
}
}
catch(IOException e)
{
System.out.println(e);
}
}
//display histogram
public void Histogram()
{
try
{
for(int i=0;i<nDocuments;i++)
{
Histogram hist = new Histogram("Document "+i+" Similiarity",sim_perc[i]);
hist.pack();
RefineryUtilities.centerFrameOnScreen(hist);
hist.setVisible(true);
}
/*txtResult.setText("");
//ItemsetCollection Hist=new ItemsetCollection();
//ItemsetCollection Similarities=new ItemsetCollection();
addResultText("\nHistogram:\n");
double tstart=0.0f;
double tinterval=0.1f;
for(int t=0;t<10;t++)
{
int tCount=0;
for(int j=0;j<=Similarities.get_nItemsets()-1;j++)
{
double tsim=Double.parseDouble(Similarities.getItemset(j).getItem(2));
if(tsim>=tstart&&tsim<=tstart+tinterval)
{
tCount++;
}
}
Hist.addItemset(new Itemset(""+tCount));
addResultText("("+tstart+","+(tstart+tinterval)+"): "+Hist.getItemset(Hist.get_nItemsets()-1)+"\n");
tstart+=tinterval;
}*/
}
catch(Exception e)
{
System.out.println(e);
}
}
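The commented-out block above bins the pairwise similarities into ten intervals of width 0.1. A compact standalone version of that binning (illustrative; it uses half-open bins, which avoids the double counting of boundary values that the `tsim<=tstart+tinterval` test in the original would cause):

```java
public class SimilarityBins {
    // Count how many similarity values fall into each of ten
    // half-open bins [0.0,0.1), [0.1,0.2), ..., [0.9,1.0].
    public static int[] bin(double[] sims) {
        int[] counts = new int[10];
        for (double s : sims) {
            int idx = (int) (s * 10);
            if (idx < 0) idx = 0;
            if (idx > 9) idx = 9; // clamp s == 1.0 into the last bin
            counts[idx]++;
        }
        return counts;
    }
}
```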
//display clusters
public void Cluster()
{
try{
txtResult.setText("");
//ItemsetCollection Clusters=new ItemsetCollection();
addResultText("\nClusters With the Obtained OLP :\n");
int nClusters=0;
for(int t=0;t<=Clusters.get_nItemsets()-1;t++)
{
if(Clusters.getItemset(t).get_nItems()!=0)
{
addResultText("Cluster"+(nClusters+1)+": "+Clusters.getItemset(t).toString()+"\n");
nClusters+=1;
}
}
}
catch(Exception e)
{
System.out.println(e);
}
}
public void Similarity()
{
try
{
txtResult.setText("");
ItemsetCollection Clusters=new ItemsetCollection();
addResultText("\nSimilarities and its Corresponding OLP:\n");
ItemsetCollection Similarities=new ItemsetCollection();
double similarityThreshold=0.3;
for(int t=0;t<=nDocuments-1;t++)
{
for(int j=0;j<=nDocuments-1;j++)
{
double hratio=findSimilarity(documents[t],documents[j]);
Itemset i1=new Itemset();
i1.addItem(""+t);
i1.addItem(""+j);
i1.addItem(""+hratio);
Similarities.addItemset(i1);
addResultText(" sim("+t+","+j+") : "+hratio+"\n"); //OLP
if(hratio>=similarityThreshold)
{
String tstr1=""+t;
String tstr2=""+j;
int tNewClusterIndex=-1;
int tOldClusterIndex=-1;
for(int i=0;i<=Clusters.get_nItemsets()-1;i++)
{
if(Clusters.getItemset(i).isContains(tstr1)==true)
{
tNewClusterIndex=i;
}
if(Clusters.getItemset(i).isContains(tstr2)==true)
{
tOldClusterIndex=i;
}
}
if(tNewClusterIndex!=-1&&tOldClusterIndex!=-1)
{
Clusters.getItemset(tOldClusterIndex).removeItem(tstr2);
Clusters.getItemset(tNewClusterIndex).addItem(tstr2);
}
}
}
}
}
catch(Exception e)
{
System.out.println(e);
}
}
//find phrase similarity (the method header below is inferred; the original listing is truncated here)
double findPhraseSimilarity(WebDocument d1,WebDocument d2)
{
WebDocument doc1=CombineDocument(d1,d2);
//find sigmaj
double sigmaj=0.0;
for(int t=0;t<d1.getPhrases().get_nItems();t++)
{
double s1j=StringUtils.split(d1.getPhrase(t)," ").length;
double tweight=doc1.DIG.findPhrasePathWeight(d1.getPhrase(t));
sigmaj+=s1j*tweight;
}
//find sigmak
double sigmak=0.0;
for(int t=0;t<d2.getPhrases().get_nItems();t++)
{
double s2k=StringUtils.split(d2.getPhrase(t)," ").length;
double tweight=doc1.DIG.findPhrasePathWeight(d2.getPhrase(t));
sigmak+=s2k*tweight;
}
double fragmentationFactor=1.2; //proposed constant
//find sigmap
double sigmap=0.0;
for(int t=0;t<doc1.getPhrases().get_nItems();t++)
{
double li=StringUtils.split(doc1.getPhrase(t)," ").length;
double si=doc1.getPhrases().get_nItems();
double gi=java.lang.Math.pow(li/si,fragmentationFactor);
double f1i=d1.findPhraseFrequency(doc1.getPhrase(t));
double w1i=doc1.DIG.findPhrasePathWeight(doc1.getPhrase(t));
double f2i=d2.findPhraseFrequency(doc1.getPhrase(t));
double w2i=doc1.DIG.findPhrasePathWeight(doc1.getPhrase(t));
double tsum=(f1i*w1i)+(f2i*w2i);
sigmap+=java.lang.Math.pow(gi*tsum,2.0);
}
//find sim_p
double simp=java.lang.Math.sqrt(sigmap);
simp/=(sigmaj+sigmak);
return(simp);
}
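Written out, the phrase score computed above is sim_p = sqrt(sum_i (g_i (f1_i w1_i + f2_i w2_i))^2) / (sigma_j + sigma_k), where g_i = (l_i / s_i)^1.2 and the f and w values are phrase frequencies and path weights. A standalone numeric version of this formula (illustrative names; the per-document sums are passed in precomputed):

```java
public class PhraseSimSketch {
    // simp = sqrt(sum_i (g[i] * (f1[i]*w1[i] + f2[i]*w2[i]))^2) / (sigmaJ + sigmaK)
    public static double phraseSim(double[] g, double[] f1, double[] w1,
                                   double[] f2, double[] w2,
                                   double sigmaJ, double sigmaK) {
        double sigmaP = 0.0;
        for (int i = 0; i < g.length; i++) {
            double t = g[i] * (f1[i] * w1[i] + f2[i] * w2[i]);
            sigmaP += t * t; // squared weighted contribution of phrase i
        }
        return Math.sqrt(sigmaP) / (sigmaJ + sigmaK);
    }
}
```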
double findTermSimilarity(WebDocument d1,WebDocument d2)
{
WebDocument doc1=CombineDocument(d1,d2);
double sigma1=0.0;
double sigma21=0.0,sigma22=0.0;
for(int t=0;t<doc1.DIG.V.get_nItems();t++)
{
double tfidf1=findTFIDF(doc1.DIG.V.getItem(t),d1);
double tfidf2=findTFIDF(doc1.DIG.V.getItem(t),d2);
sigma1+=tfidf1*tfidf2;
sigma21+=tfidf1*tfidf1;
sigma22+=tfidf2*tfidf2;
}
//cosine similarity
double simt=sigma1/java.lang.Math.sqrt(sigma21*sigma22);
return(simt);
}
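findTermSimilarity above is the standard cosine measure over TF-IDF vectors: cos(a, b) = (a . b) / (|a| |b|). A self-contained vector form of the same formula (illustrative, not the project's class):

```java
public class CosineSketch {
    // cos(a,b) = dot(a,b) / (norm(a) * norm(b))
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];  // sigma1 in the listing above
            na  += a[i] * a[i];  // sigma21
            nb  += b[i] * b[i];  // sigma22
        }
        return dot / Math.sqrt(na * nb);
    }
}
```

Orthogonal vectors score 0, and any vector scores 1 against a positive multiple of itself.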
double findTFIDF(String term,WebDocument d1)
{
//find tf
double n1=d1.findTermFrequency(term);
double tsum=0.0;
for(int t=0;t<d1.DIG.V.get_nItems();t++)
{
tsum+=d1.findTermFrequency(d1.DIG.V.getItem(t));
}
double tf=n1/tsum;
//find idf
int tDocCount=0;
for(int t=0;t<nDocuments;t++)
{
if(documents[t].DIG.V.isContains(term)==true)
{
tDocCount+=1;
}
}
double tval=(double)nDocuments/(double)tDocCount;
double idf=java.lang.Math.log(tval);
double tfidf=tf*idf;
return(tfidf);
}
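findTFIDF implements tfidf = tf * idf, with tf the term's relative frequency in the document and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term. A self-contained sketch of the same formula (names are illustrative):

```java
public class TfIdfSketch {
    // tf  = termCount / docLength
    // idf = ln(nDocs / docsWithTerm)
    public static double tfidf(int termCount, int docLength,
                               int nDocs, int docsWithTerm) {
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) nDocs / docsWithTerm);
        return tf * idf;
    }
}
```

A term that appears in every document gets idf = ln(1) = 0, so it contributes nothing to the similarity, which is the intended down-weighting of common terms.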
WebDocument CombineDocument(WebDocument d1,WebDocument d2)
{
//construct combined doc to find matching phrases
DocumentIndexGraph dig1=new DocumentIndexGraph();
dig1.V.appendItemset(d1.DIG.V);
dig1.V.appendItemset(d2.DIG.V);
//the listing is truncated here; the isVisited header below is inferred from its body
private boolean isVisited(String tStr)
{
boolean visited=false;
for(int t=0;t<nVisited;t++)
{
if(tStr.compareToIgnoreCase(visitedPages[t])==0)
{
visited=true;
}
}
return(visited);
}
private void printVisitedPages()
{
logText+="visited:"+"\n";
for(int t=0;t<nVisited;t++)
{
logText+="["+visitedPages[t]+"]"+"\n";
}
}
static public void main(String[] args)
{
try {
UIManager.setLookAndFeel("com.sun.java.swing.plaf.windows.WindowsLookAndFeel");
} catch (Exception e) {
e.printStackTrace();
}
new Hier();
}
}
//Histogram
import java.awt.*;
import org.jfree.chart.*;
import org.jfree.chart.axis.*;
import org.jfree.chart.plot.*;
import org.jfree.chart.renderer.category.*;
import org.jfree.data.category.*;
import org.jfree.data.general.*;
import org.jfree.ui.*;
/**
* A simple demonstration application showing how to create a bar chart.
*
*/
public class Histogram extends ApplicationFrame {
/**
* Creates a new demo instance.
*
* @param title the frame title.
*/
public Histogram(final String title,double[][] sim) {
super(title);
final CategoryDataset dataset = createDataset(sim);
final JFreeChart chart = createChart(title,dataset);
final ChartPanel chartPanel = new ChartPanel(chart);
chartPanel.setPreferredSize(new Dimension(500, 270));
setContentPane(chartPanel); //without this the frame would display empty
}
/**
* Creates a sample chart.
*
* @param dataset the dataset.
*
* @return The chart.
*/
private JFreeChart createChart(String title,final CategoryDataset dataset) {
// create the chart...
final JFreeChart chart = ChartFactory.createBarChart(
title,
// chart title
"",
// domain axis label
"Similarity",
// range axis label
dataset,
// dataset
PlotOrientation.VERTICAL, // orientation
true,
// include legend
true,
// tooltips?
false
// URLs?
);
// NOW DO SOME OPTIONAL CUSTOMISATION OF THE CHART...
// set the background color for the chart...
chart.setBackgroundPaint(Color.white);
// get a reference to the plot for further customisation...
final CategoryPlot plot = chart.getCategoryPlot();
plot.setBackgroundPaint(Color.lightGray);
plot.setDomainGridlinePaint(Color.white);
plot.setRangeGridlinePaint(Color.white);
// set the range axis to display integers only...
final NumberAxis rangeAxis = (NumberAxis) plot.getRangeAxis();
rangeAxis.setStandardTickUnits(NumberAxis.createIntegerTickUnits());
// disable bar outlines...
final BarRenderer renderer = (BarRenderer) plot.getRenderer();
renderer.setDrawBarOutline(false);
// set up gradient paints for series...
final GradientPaint gp0 = new GradientPaint(
0.0f, 0.0f, Color.blue,
0.0f, 0.0f, Color.lightGray
);
final GradientPaint gp1 = new GradientPaint(
0.0f, 0.0f, Color.green,
0.0f, 0.0f, Color.lightGray
);
final GradientPaint gp2 = new GradientPaint(
0.0f, 0.0f, Color.red,
0.0f, 0.0f, Color.lightGray
);
renderer.setSeriesPaint(0, gp0);
renderer.setSeriesPaint(1, gp1);
renderer.setSeriesPaint(2, gp2);
final CategoryAxis domainAxis = plot.getDomainAxis();
domainAxis.setCategoryLabelPositions(
CategoryLabelPositions.createUpRotationLabelPositions(Math.PI / 6.0)
);
// OPTIONAL CUSTOMISATION COMPLETED.
return chart;
}
/**
* Starting point for the demonstration application.
*
* @param args ignored.
*/
/*public static void main(final String[] args) {
final Histogram demo = new Histogram("Bar Chart Demo");
demo.pack();
RefineryUtilities.centerFrameOnScreen(demo);
demo.setVisible(true);
}*/
}
//ItemsetCollection.java
import java.lang.*;
import java.io.*;
import java.util.*;
////////////////////ItemsetCollection class
class ItemsetCollection
{
ArrayList Itemsets;
//the constructor and some methods are missing from this listing; the header below is inferred from the loop it encloses
public void appendItemsetCollection(ItemsetCollection tItemsetCollection)
{
int t;
for(t=0;t<=tItemsetCollection.get_nItemsets()-1;t++)
{
addItemset(tItemsetCollection.getItemset(t));
}
}
public void removeItemset(Itemset tItemset)
{
for(int i=0;i<=Itemsets.size()-1;i++)
{
if(getItemset(i).isEquals(tItemset)==true)
{
Itemsets.remove(i);
break;
}
}
}
public void removeItemset(int tIndex)
{
if(tIndex>=0&&tIndex<=Itemsets.size()-1)
{
removeItemset(getItemset(tIndex));
}
}
public void removeItemsetCollection(ItemsetCollection tItemsetCollection)
{
for(int t=0;t<=tItemsetCollection.get_nItemsets()-1;t++)
{
removeItemset(tItemsetCollection.getItemset(t));
}
}
public void removeEmptyItemsets()
{
for(int t=0;t<=Itemsets.size()-1;t++)
{
if(getItemset(t).get_nItems()==0)
{
removeItemset(t);
t-=1; //re-check this index: removal shifts later elements left
}
}
}
public void clear()
{
Itemsets.clear();
}
public Itemset getUniqueItemset()
{
Itemset tItemset=new Itemset();
for(int i=0;i<=Itemsets.size()-1;i++)
{
for(int j=0;j<=getItemset(i).get_nItems()-1;j++)
{
if(tItemset.isContains(getItemset(i).getItem(j))==false)
{
tItemset.addItem(getItemset(i).getItem(j));
}
}
}
return(tItemset);
}
public ItemsetCollection getUniqueItemsetCollection()
{
ItemsetCollection ic1=new ItemsetCollection();
for(int i=0;i<=Itemsets.size()-1;i++)
{
if(ic1.isContains(getItemset(i))==false)
{
ic1.addItemset(getItemset(i));
}
}
return(ic1);
}
public double getSupport(String tItem)
{
int t,tCount=0;
double tSupport;
for(t=0;t<=Itemsets.size()-1;t++)
{
if(getItemset(t).isContains(tItem)==true)
{
tCount=tCount+1;
}
}
tSupport=((double)tCount/(double)Itemsets.size())*100.0;
tSupport=Math.round(tSupport);
return(tSupport);
}
public double getSupport(Itemset tItemset)
{
int t,tCount=0;
double tSupport;
for(t=0;t<=Itemsets.size()-1;t++)
{
if(getItemset(t).isContains(tItemset)==true)
{
tCount=tCount+1;
}
}
tSupport=((double)tCount/(double)Itemsets.size())*100.0;
tSupport=Math.round(tSupport);
return(tSupport);
}
public int getSupportCount(Itemset tItemset)
{
int t,tCount=0;
for(t=0;t<=Itemsets.size()-1;t++)
{
if(getItemset(t).isContains(tItemset)==true)
{
tCount=tCount+1;
}
}
return(tCount);
}
public boolean isContains(Itemset tItemset)
{
boolean found=false;
for(int t=0;t<=Itemsets.size()-1;t++)
{
if(getItemset(t).isContains(tItemset)==true)
{
found=true;
break;
}
}
return(found);
}
public String toString()
{
String tStr="";
for(int t=0;t<=Itemsets.size()-1;t++)
{
tStr=tStr+getItemset(t).toString()+"\n\r\n\r";
if(printStatus==true)
{
System.out.print(t+" transactions, "+(tStr.length()/1024)+"k...\r");
}
}
return(tStr);
}
public String toString1()
{
String tStr="";
for(int t=0;t<=Itemsets.size()-1;t++)
{
tStr=tStr+getItemset(t).toString()+"\n";
if(printStatus==true)
{
System.out.print(t+" transactions, "+(tStr.length()/1024)+"k...\r");
}
}
return(tStr);
}
}
//WebPageRetrieval.java
import java.io.*;
import java.net.*;
class WebPageRetrieval
{
public static void openWebpage(String tstrURL) throws Exception
{
URL target=new URL(tstrURL);
URLConnection con=target.openConnection();
byte b[]=new byte[1024];
int n=0;
System.out.println("Reading: ["+tstrURL+"]:");
BufferedInputStream in=new BufferedInputStream(con.getInputStream(),8192);
while((n=in.read(b,0,1024))!=-1)
{
System.out.print(new String(b,0,n));
}
System.out.println("\nContentType: "+con.getContentType());
System.out.println("ContentLength: "+con.getContentLength());
}
public static void main(String args[]) throws Exception
{
openWebpage("http://www.yahoo.com/");
}
}
SCREEN SHOTS
TESTING
Testing is the process of executing a program with the intent of finding errors. A good test
case is one that has a high probability of finding an as-yet undiscovered error, and a successful
test is one that actually uncovers such an error. System testing is the stage of implementation
aimed at ensuring that the system works accurately and efficiently as expected before live
operation commences; it verifies that the whole set of programs works together. System testing
involves several key activities and steps (program, string, and system testing) and is important
in adopting a successful new system. It is the last chance to detect and correct errors before the
system is installed for user acceptance testing.
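A unit-level check of a similarity routine can be written as a plain-Java test harness. The sketch below uses a simple set-overlap (Jaccard) score as a hypothetical stand-in for the project's findSimilarity; all names are illustrative. The properties tested are the ones any similarity measure here should satisfy: identical inputs score 1.0, disjoint inputs 0.0, and partial overlap falls in between.

```java
public class SimilarityTest {
    // Hypothetical unit under test: Jaccard overlap of two term sets.
    static double jaccard(java.util.Set<String> a, java.util.Set<String> b) {
        java.util.Set<String> inter = new java.util.HashSet<>(a);
        inter.retainAll(b);
        java.util.Set<String> union = new java.util.HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    // Each check exercises one property of the measure.
    public static boolean runAll() {
        java.util.Set<String> x = java.util.Set.of("data", "mining");
        java.util.Set<String> y = java.util.Set.of("web", "search");
        boolean ok = jaccard(x, x) == 1.0;                   // identity
        ok &= jaccard(x, y) == 0.0;                          // disjoint sets
        ok &= jaccard(x, java.util.Set.of("data")) == 0.5;   // partial overlap
        return ok;
    }
}
```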
The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; without it, the program or project cannot be considered complete. Software
testing is the critical element of software quality assurance and represents the ultimate review
of specification, design, and coding. Any engineering product can be tested in one of two ways:
6.1 Unit Testing:
6.1.1 White Box Testing:
This testing is also called glass box testing. Knowing the specific functions a product has
been designed to perform, tests can be conducted that demonstrate each function is fully
operational while also searching for errors in each function. White box testing is a test case
design method that uses the control structure of the procedural design to derive test cases.
Basis path testing is a form of white box testing.
Basis path testing:
Flow graph notation
Cyclomatic complexity
Equivalence partitioning
Comparison testing
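For basis path testing, cyclomatic complexity gives the number of linearly independent paths through a routine: V(G) = number of simple decisions + 1. The method below (an illustrative example, not project code) has two decisions, so V(G) = 3, and three inputs are enough to cover every basis path:

```java
public class BasisPathExample {
    // Two decision points => cyclomatic complexity V(G) = 2 + 1 = 3.
    public static String classify(double sim) {
        if (sim > 1.0) return "invalid";            // decision 1
        if (sim >= 0.3) return "same-cluster";      // decision 2
        return "different-cluster";                 // fall-through path
    }
}
```

One test input per basis path (one out-of-range value, one above the threshold, one below) exercises all three paths.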
Testcase number | Testcase    | Input     | Expected output | Obtained output
----------------+-------------+-----------+-----------------+-----------------
                | Select file | File name |                 |
REFERENCES
1. X. Wu, V. Kumar, J.R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Ng, B.
Liu, P.S. Yu, Z.-H. Zhou, M. Steinbach, D.J. Hand, and D. Steinberg, Top 10 Algorithms in Data
Mining, Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37, 2007.
2. I. Guyon, U.V. Luxburg, and R.C. Williamson, Clustering: Science or Art?, Proc. NIPS
Workshop Clustering Theory, 2009.
3. I.S. Dhillon and D.S. Modha, Concept Decompositions for Large Sparse Text Data Using
Clustering, Machine Learning, vol. 42, nos. 1/2, pp. 143-175, Jan. 2001.
4. S. Zhong, Efficient Online Spherical K-means Clustering, Proc.IEEE Intl Joint Conf.
Neural Networks (IJCNN), pp. 3180-3185, 2005.
5. A. Banerjee, S. Merugu, I. Dhillon, and J. Ghosh, Clustering with Bregman Divergences, J.
Machine Learning Research, vol. 6,pp. 1705-1749, Oct. 2005.
6. E. Pekalska, A. Harol, R.P.W. Duin, B. Spillmann, and H. Bunke, Non-Euclidean or
Non-Metric Measures Can Be Informative, Structural, Syntactic, and Statistical Pattern
Recognition, vol. 4109, pp. 871-880, 2006.
7. M. Pelillo, What Is a Cluster? Perspectives from Game Theory,Proc. NIPS Workshop
Clustering Theory, 2009.
8. D. Lee and J. Lee, Dynamic Dissimilarity Measure for Support Based Clustering, IEEE
Trans. Knowledge and Data Eng., vol. 22,no. 6, pp. 900-905, June 2010.
9. A. Banerjee, I. Dhillon, J. Ghosh, and S. Sra, Clustering on the Unit Hypersphere Using Von
Mises-Fisher Distributions,J. Machine Learning Research, vol. 6, pp. 1345-1382, Sept. 2005.
10. W. Xu, X. Liu, and Y. Gong, Document Clustering Based on Non-Negative Matrix
Factorization, Proc. 26th Ann. Intl ACM SIGIR Conf. Research and Development in
Information Retrieval, pp. 267-273, 2003.
11. I.S. Dhillon, S. Mallela, and D.S. Modha, Information-Theoretic Co-Clustering, Proc.
Ninth ACM SIGKDD Intl Conf. Knowledge Discovery and Data Mining (KDD), pp. 89-98,
2003.
12. C.D. Manning, P. Raghavan, and H. Schütze, An Introduction to Information Retrieval.
Cambridge Univ. Press, 2009.
13. C. Ding, X. He, H. Zha, M. Gu, and H. Simon, A Min-Max Cut Algorithm for Graph
Partitioning and Data Clustering, Proc.IEEE Intl Conf. Data Mining (ICDM), pp. 107-114,
2001.
14. H. Zha, X. He, C. Ding, H. Simon, and M. Gu, Spectral Relaxation for K-Means
Clustering, Proc. Neural Information Processing Systems (NIPS), pp. 1057-1064, 2001.
15. J. Shi and J. Malik, Normalized Cuts and Image Segmentation,IEEE Trans. Pattern
Analysis Machine Intelligence, vol. 22, no. 8,pp. 888-905, Aug. 2000.
16. I.S. Dhillon, Co-Clustering Documents and Words Using Bipartite Spectral Graph
Partitioning, Proc. Seventh ACM SIGKDD Intl Conf. Knowledge Discovery and Data Mining
(KDD), pp. 269-274, 2001.
17. Y. Gong and W. Xu, Machine Learning for Multimedia Content Analysis. Springer-Verlag,
2007.
18. Y. Zhao and G. Karypis, Empirical and Theoretical Comparisons of Selected Criterion
Functions for Document Clustering,Machine Learning, vol. 55, no. 3, pp. 311-331, June 2004.
19. G. Karypis, CLUTO a Clustering Toolkit, technical report, Dept.of Computer Science,
Univ. of Minnesota, http://glaros.dtc.umn.edu/~gkhome/views/cluto, 2003.
20. Clustering, Proc. 17th Natl Conf. Artificial Intelligence: Workshop of Artificial
Intelligence for Web Search (AAAI), pp. 58-64, July 2000.