1. INTRODUCTION
1.1 PURPOSE
The Web has evolved into a data-rich repository containing a large amount of
structured content spread across millions of sources. The usefulness of Web data
increases exponentially (e.g., building knowledge bases, Web-scale data analytics)
when it is linked across numerous sources. Structured data on the Web resides in Web
databases and Web tables. Web data integration is an important component of many
applications collecting data from Web databases, such as Web data warehousing (e.g.,
Google and Bing Shopping; Google Scholar), data aggregation (e.g., product and
service reviews), and metasearching. Integration systems at Web scale need to
automatically match records from different sources that refer to the same real-world
entity, find the true matching records among them, and turn this set of records into a
standard record for the consumption of users or other applications. There is a large
body of work on the record matching problem and the truth discovery problem. The
record matching problem is also referred to as duplicate record detection, record
linkage, object identification, entity resolution, or deduplication, and the truth
discovery problem is also called truth finding or fact finding, a key problem in
data fusion. In this paper, we assume that the tasks of record matching and truth
discovery have been performed and that the groups of true matching records have thus
been identified. Our goal is to generate a uniform, standard record for each group of
true matching records for end-user consumption. We call the generated record the
normalized record. We call the problem of computing the normalized record for a
group of matching records the record normalization problem (RNP), and it is the
focus of this work.
1.2 SCOPE
present some random record from the group, to just name a couple of ad-hoc
approaches. Either of these choices can lead to a frustrating experience for a user,
because in (i) the user needs to sort/browse through a potentially large number of
duplicate records, and in (ii) we run the risk of presenting a record with missing or
incorrect pieces of data. Record normalization is a challenging problem because
different Web sources may represent the attribute values of an entity in different ways
or even provide conflicting data. Conflicting data may occur because of incomplete
data, different data representations, missing attribute values, and even erroneous data.
They are extracted from different websites. Record Rnorm is constructed by hand for
illustration purposes.
This paper focuses on the privacy and security of data stored in the cloud.
Computing techniques are introduced to increase the efficiency, optimization, and
effectiveness of the cloud environment. The authors therefore introduce a Privacy Preserving
Model to Prevent Digital Data Loss in the Cloud. This proposal helps cloud
requesters and users to trust the cloud with their proprietary information and data.
2.1.3 An efficient public auditing protocol with novel dynamic structure for
cloud data
This paper presents an efficient method for structuring the data.
The authors propose a public auditing scheme that supports dynamic operations.
The scheme relies on hashing: a Merkle Hash Tree is used to perform the
dynamic data operations. A ring signature stores the information of the
user.
Normalization Of Duplicate Records From Multiple Sources
This paper explains Verifiable Secret Sharing Schemes. Using the
metric, the authors form a set of codes known as error-correcting codes. They then
consider burst-error interleaving codes and introduce an efficient burst-error-
correcting scheme. With these methods, error correction and secret sharing of files can
be performed.
This paper explains security issues in the cloud from various
aspects, such as insider attacks, outsider attacks, loss of control, data loss, multi-
tenancy, network security, elasticity, and availability. It also covers the available
security schemes and methods for securing a cloud. The paper conveys the
different security issues and tools to researchers and professionals.
As sharing personal media online becomes easier and widely spread, new privacy
concerns emerge – especially when the persistent nature of the media and associated
context reveals details about the physical and social context in which the media items were
created. In a first-of-its-kind study, we use context-aware cameraphone devices to examine
privacy decisions in mobile and online photo sharing. Through data analysis on a corpus of
privacy decisions and associated context data from a real-world system, we identify
relationships between location of photo capture and photo privacy settings. Our data
analysis leads to further questions which we investigate through a set of interviews with 15
users. The interviews reveal common themes in privacy considerations: security, social
disclosure, identity and convenience. Finally, we highlight several implications and
opportunities for design of media sharing applications, including using past privacy patterns
to prevent oversights and errors.
[2] J. Bonneau, J. Anderson, and L. Church, “Privacy suites: Shared privacy for social
networks,” in Proc. Symp. Usable Privacy Security, 2009.
Creating privacy controls for social networks that are both expressive and usable is a
major challenge. Lack of user understanding of privacy settings can lead to unwanted
disclosure of private information and, in some cases, to material harm. We propose a new
paradigm which allows users to easily choose “suites” of privacy settings which have been
specified by friends or trusted experts, only modifying them if they wish. Given that most
users currently stick with their default, operator-chosen settings, such a system could
dramatically increase the privacy protection that most users experience with minimal time
investment.
[3] J. Bonneau, J. Anderson, and G. Danezis, “Prying data out of a social network,” in Proc.
Int. Conf. Adv. Soc. Netw. Anal. Mining., 2009, pp.249–254.
Online photo albums have been prevalent in recent years and have resulted in more
and more applications developed to provide convenient functionalities for photo sharing. In
this project, we propose a system named SheepDog to automatically add photos into
appropriate groups and recommend suitable tags for users on Flickr. We adopt concept
detection to predict relevant concepts of a photo and probe into the issue about training
data collection for concept classification. From the perspective of gathering training data by
web searching, we introduce two mechanisms and investigate their performances of
concept detection. Based on some existing information from Flickr, a ranking-based method
is applied not only to obtain reliable training data, but also to provide reasonable group/tag
recommendations for input photos. We evaluate this system with a rich set of photos and
the results demonstrate the effectiveness of our work.
[4] H.-M. Chen, M.-H. Chang, P.-C. Chang, M.-C. Tien, W. H. Hsu, and J.-L. Wu, “Sheepdog:
Group and tag recommendation for flickr photos by automatic search-based learning,” in
Proc. 16th ACM Int. Conf. Multimedia, 2008, pp. 737–740.
The social media site Flickr allows users to upload their photos, annotate them with
tags, submit them to groups, and also to form social networks by adding other users as
contacts. Flickr offers multiple ways of browsing or searching it. One option is tag search,
which returns all images tagged with a specific keyword. If the keyword is ambiguous, e.g.,
“beetle” could mean an insect or a car, tag search results will include many images that are
not relevant to the sense the user had in mind when executing the query. We claim that
users express their photography interests through the metadata they add in the form of
contacts and image annotations. We show how to exploit this metadata to personalize
search results for the user, thereby improving search performance.
3.1.1. Disadvantages
In this paper, we assume that the tasks of record matching and truth discovery have
been performed and that the groups of true matching records have thus been identified.
Our goal is to generate a uniform, standard record for each group of true matching records
for end-user consumption. The system calls the generated record the normalized record. We
call the problem of computing the normalized record for a group of matching records the
record normalization problem (RNP), and it is the focus of this work. RNP is another specific
interesting problem in data fusion. The system proposes three levels of granularities for
record normalization along with methods to construct normalized records according to
them.
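As an illustration of field-level granularity, the following is a minimal sketch and not the system's actual implementation. It assumes the frequency strategy named in the conclusion of this report: for each field, the value that occurs most often in the group of matching records is chosen. The tie-break on value length, the class name FieldLevelNormalizer, and the representation of a record as a map from field name to value are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a normalized record field by field.
final class FieldLevelNormalizer {

    // Returns the most frequent non-empty value. Ties go to the longer
    // value (an assumption), then to the value seen first.
    static String pickByFrequency(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) {
                continue; // missing attribute values do not vote
            }
            counts.merge(v.trim(), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            String v = e.getKey();
            if (c > bestCount || (c == bestCount && v.length() > best.length())) {
                best = v;
                bestCount = c;
            }
        }
        return best; // null when every record lacks this field
    }

    // Collects the values of each field across the group and picks one per field.
    static Map<String, String> normalize(List<Map<String, String>> group) {
        Map<String, List<String>> byField = new LinkedHashMap<>();
        for (Map<String, String> record : group) {
            for (Map.Entry<String, String> e : record.entrySet()) {
                byField.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        Map<String, String> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : byField.entrySet()) {
            normalized.put(e.getKey(), pickByFrequency(e.getValue()));
        }
        return normalized;
    }
}
```

Because each field is decided independently, the resulting record may combine values from several source records, which is the main difference from record-level normalization, where one whole record is selected.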
3.2.1. Advantages
The system is fast because it identifies three levels of normalization granularity:
record, field, and value component.
Exact duplicate records are detected by mining template collocation and
sub-collocation pairs.
The record-level normalization assumes that each record ... The assumption, while
intuitively appealing and useful for building the theoretical underpinnings for constructing
normalized records, needs to be taken with a grain of salt in practice. Re contains a mixture
of candidate normalized records and records with incomplete or arcane representations of
e, which may be difficult for ordinary users to understand.
1) The typical normalization framework has two paths: record-level and field-level. The
former works with whole records from Re. It includes a number of record-level
rankers (RL rankers) to rank the records in Re according to their fitness to represent
the normalized record for entity e. In the single-strategy approach, each ranker
recommends the top-1 candidate in its ranked list as the normalized record. In RL
TSNRi denotes the normalized record recommended by the ith ranker. If we instead
use the multistrategy approach, then we employ rank merging methodologies to
select the final normalized record. In the multistrategy approach each ranker acts as
a voter and the records in Re are the candidates (for the normalized record). Each
ranker ranks the records in descending order of preference.
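The voting view of the multistrategy approach can be sketched as follows. The report does not state which rank merging method is used, so this sketch assumes a Borda count, which is one common choice. The class name BordaRankMerger and the record identifiers used in the example are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rank merger: each ranker is a voter, each record in Re is a candidate.
final class BordaRankMerger {

    // Single-strategy approach: a ranker recommends the top-1 record of its own list.
    static String topOne(List<String> ranking) {
        return ranking.isEmpty() ? null : ranking.get(0);
    }

    // Multistrategy approach (Borda count, assumed): a record at position pos in a
    // ranking of n records earns n - 1 - pos points; the highest total wins.
    // Ties are resolved in favor of the record that was encountered first.
    static String merge(List<List<String>> rankings) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int pos = 0; pos < n; pos++) {
                scores.merge(ranking.get(pos), n - 1 - pos, Integer::sum);
            }
        }
        String winner = null;
        int best = -1;
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            if (e.getValue() > best) {
                best = e.getValue();
                winner = e.getKey();
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        // Three rankers, each ranking the same records in descending order of preference.
        List<List<String>> rankings = Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r2", "r1", "r3"),
                Arrays.asList("r2", "r3", "r1"));
        System.out.println("Merged choice: " + merge(rankings));
    }
}
```

In this example r2 collects 1 + 2 + 2 = 5 points against 3 for r1 and 1 for r3, so the merged choice is r2 even though the first ranker alone would have recommended r1.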
User constraints for the project are analyzed in this phase, and a business proposal is
put forth with a very general plan for the project and some cost estimates. During
system analysis, the feasibility study of the proposed system is carried out. This
ensures that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.
ECONOMICAL CONSTRAINTS
TECHNICAL CONSTRAINTS
SOCIAL CONSTRAINTS
ECONOMICAL CONSTRAINTS
This study is carried out to check the economic impact that the system will
have on the organization. The amount of fund that the company can pour into the
research and development of the system is limited. The expenditures must be justified.
Thus the developed system is well within the budget, and this was achieved because
most of the technologies used are freely available. Only the customized products had
to be purchased.
TECHNICAL CONSTRAINTS
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on
the available technical resources, since this would in turn place high demands on the
client. The developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.
SOCIAL CONSTRAINTS
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system efficiently. The
user must not feel threatened by the system, but must instead accept it as a necessity. The
level of acceptance by the users depends solely on the methods that are employed to
educate the users about the system and to make them familiar with it. Their level of
confidence must be raised so that they are also able to offer constructive criticism,
which is welcomed, as they are the final users of the system.
Existing Algorithms
Several existing algorithms have been developed to handle duplicate record normalization.
These methods generally fall into the categories of rule-based, probabilistic, and machine
learning-based approaches.
1. Rule-Based Approaches
Exact Matching: Compares records based on exact matches of key attributes (e.g.,
name, address).
Fuzzy Matching: Uses approximate string matching techniques like Levenshtein
distance and Jaro-Winkler.
Custom Heuristics: Uses domain-specific rules for deduplication.
Example Algorithms:
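As one example of the fuzzy matching mentioned above, the following is a minimal sketch of Levenshtein distance computed with dynamic programming. The class name, the normalized similarity score, and the duplicate threshold passed to isLikelyDuplicate are assumptions made for illustration and are not taken from the implemented system.

```java
// Hypothetical helper for approximate string matching between attribute values.
final class Levenshtein {

    // Minimum number of single-character insertions, deletions, and
    // substitutions needed to turn a into b (two-row dynamic programming).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means the strings are identical.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) distance(a, b) / max;
    }

    // Case-insensitive check against a caller-chosen threshold (assumed parameter).
    static boolean isLikelyDuplicate(String a, String b, double threshold) {
        return similarity(a.toLowerCase(), b.toLowerCase()) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
        System.out.println(isLikelyDuplicate("Jon Smith", "John Smith", 0.8));
    }
}
```

Exact matching is the special case in which the distance is 0. A suitable threshold depends on the attribute: short codes usually need a stricter threshold than long names or addresses.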
Proposed Algorithms
1. Hybrid Approaches
HARDWARE REQUIREMENTS
Processor : I3 or higher
Speed : 2.9 GHz
RAM : 4 GB (min)
Hard Disk : 160 GB
SOFTWARE REQUIREMENTS
Functional Requirements
Functional requirements describe what the system should do. The functional
requirements can be further categorized as follows:
The input design is the link between the information system and the user. It
comprises developing the specifications and procedures for data preparation and the
steps that are necessary to put transaction data into a usable form for processing. This can
be achieved by instructing the computer to read data from a written or printed
document, or it can occur by having people key the data directly into the system.
The design of input focuses on controlling the amount of input required, controlling
the errors, avoiding delay, avoiding extra steps, and keeping the process simple. The
input is designed so that it provides security and ease of use while
retaining privacy. Input design considered the following things:
Non-Functional Requirements
User Interfaces
Software Interfaces
Manpower Requirements
5 members can complete the project in 2 – 4 months if they work fulltime on it.
3. SYSTEM DESIGN
UML analysis modeling focuses on the user model and structural model
views of the system.
The input controls provide ways to ensure that only authorized users access
the system, guarantee valid transactions, validate the data for accuracy, and
determine whether any necessary data has been omitted. The primary input medium
chosen is the display. Screens have been developed for input of data using HTML. The
validations for all important inputs are handled through various events using JSP
controls.
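The check for omitted data can be sketched on the server side as follows. This is only an illustrative sketch under stated assumptions: the class name InputValidator, the representation of a submitted form as a map, and the error message wording are hypothetical and are not taken from the project's JSP pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which required inputs are missing or blank.
final class InputValidator {

    // Returns one message per required field that is absent or contains only
    // whitespace; an empty list means the form passed this check.
    static List<String> validate(Map<String, String> form, List<String> requiredFields) {
        List<String> errors = new ArrayList<>();
        for (String field : requiredFields) {
            String value = form.get(field);
            if (value == null || value.trim().isEmpty()) {
                errors.add(field + " is required");
            }
        }
        return errors;
    }
}
```

A request handler could call validate before touching the database and redisplay the input screen with the returned messages when the list is not empty.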
3.2.3. Design of Output
Output layout
The output of this system is given in an easily understandable, user-friendly manner.
The layout of the output is decided through discussions with the different users.
The system should offer the means of detecting and handling errors.
All entries to the system will be validated, and updating of tables is allowed
only for valid entries. Means have been provided for correction: if incorrect
entries have been entered into the system by chance, they can be edited.
As the strategic value of software increases for many companies, the industry
looks for techniques to automate the production of software and to improve quality
and reduce cost and time-to-market. These techniques include component technology,
visual programming, patterns and frameworks. Businesses also seek techniques to
manage the complexity of systems as they increase in scope and scale. In particular,
they recognize the need to solve recurring architectural problems, such as physical
distribution, concurrency, replication, security, load balancing and fault tolerance.
Additionally, the development for the World Wide Web, while making some things
simpler, has exacerbated these architectural problems. The Unified Modeling
Language (UML) was designed to respond to these needs. Put simply, systems design
refers to the process of defining the architecture, components, modules, interfaces,
and data for a system to satisfy specified requirements, which can be done readily
through UML diagrams.
In this project, four basic UML diagrams from the following list are
explained:
Class Diagram
Use Case Diagram
Sequence Diagram
Activity Diagram
Collaboration Diagram
Deployment Diagram
State Chart Diagram
Component Diagram
Class Diagram
This is one of the most important diagrams in development. The diagram
breaks the class into three layers: the first holds the name, the second describes its attributes, and
the third its methods. A padlock to the left of a name marks a private attribute. The
relationships are drawn between the classes. Developers use the class diagram to develop
the classes. Analysts use it to show the details of the system.
Architects look at class diagrams to see whether any class has too many functions
and needs to be split.
Use Case Diagram
Activity Diagram
Collaboration Diagram
Deployment Diagram
Component Diagram
1. The DFD is also called a bubble chart. It is a simple graphical formalism that
can be used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the
system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is
used to model the system components. These components are the system process,
the data used by the process, an external entity that interacts with the system and
the information flows in the system.
3. DFD shows how the information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
4. A DFD may be used to represent a system at any level of abstraction. A DFD may
be partitioned into levels that represent increasing information flow and
functional detail.
DFD NOTATIONS
To represent an attribute
Data Store
Login Master
Validation Data
UML DIAGRAMS
The goal is for UML to become a common language for creating models of
object-oriented computer software. In its current form, UML comprises two
major components: a meta-model and a notation. In the future, some form of method
or process may also be added to, or associated with, UML.
GOALS:
Figure: Use case diagram. Actors: User, Public Cloud. Use cases: Register, Login, Upload File, Download File, List of Files, Logout.
Figure: Class diagram. Classes: Public-Cloud, DataUser, Private-Cloud. Attributes shown: userid, password, filestorage, files, fileid, fileblocks, rights, ownername, permissions. Operations shown: login(), register(), upload(), duplicatecheck(), storefiles(), encrypt(), decrypt(), duplicate(), download(), activation(), permissions(), logout().
Figure: Sequence diagram. Lifelines: Owner Login, Received Permission from admin, File Upload, View User Details, Receive File Attribute. Messages: uid,pwd; verify; receive permission; file upload; change key.
ER-DIAGRAM
4. TESTING
Testing cannot show the absence of defects; it can only show that software
errors are present.
This testing is also called glass box testing. In this testing, by knowing the
specified function that a product has been designed to perform, tests can be conducted
that demonstrate each function is fully operational while at the same time searching for
errors in each function. It is a test case design method that uses the control structure of
the procedural design to derive test cases. Basis path testing is a white box testing technique.
Condition testing
Data flow testing
Loop testing
A strategy for software testing integrates software test cases into a series of
well-planned steps that result in the successful construction of software. Software
testing is part of a broader topic that is referred to as Verification and Validation.
Verification refers to the set of activities that ensure that the software correctly
implements a specific function. Validation refers to the set of activities that ensure that
the software that has been built is traceable to the customer’s requirements.
Unit testing focuses verification effort on the smallest unit of software design
that is the module. Using procedural design description as a guide, important control
paths are tested to uncover errors within the boundaries of the module. The unit test
is normally white box testing oriented and the step can be conducted in parallel for
multiple modules.
objective is to take unit tested methods and build a program structure that has been
dictated by design.
Top-Down Integration
Bottom-up Integration
This method as the name suggests, begins construction and testing with atomic
modules i.e., modules at the lowest level. Because the modules are integrated in the
bottom up manner the processing required for the modules subordinate to a given
level is always available and the need for stubs is eliminated.
Regression Testing
be conducted, and a test procedure defines specific test cases that will be used in an
attempt to uncover errors in conformity with requirements. Both the plan and
procedure are designed to ensure that all functional requirements are satisfied, all
performance requirements are achieved, documentation is correct and human-
engineered; and other requirements are met.
After each validation test case has been conducted, one of two possible
conditions exists: (1) The function or performance characteristics conform to
specification and are accepted, or (2) a deviation from specification is uncovered and
a deficiency list is created. Deviation or error discovered at this stage in a project can
rarely be corrected prior to scheduled completion. It is often necessary to negotiate
with the customer to establish a method for resolving deficiencies.
Configuration Review
When custom software is built for one customer, a series of acceptance tests
are conducted to enable the customer to validate all requirements. Conducted by the
end user rather than the system developer, an acceptance test can range from an
informal “test drive” to a planned and systematically executed series of tests. In fact,
acceptance testing can be conducted over a period of weeks or months, thereby
uncovering cumulative errors that might degrade the system over time.
The beta test is conducted at one or more customer sites by the end user of the
software. Unlike alpha testing, the developer is generally not present. Therefore, the
beta test is a “live” application of the software in an environment that cannot be
controlled by the developer. The customer records all problems that are encountered
during beta testing and reports these to the developer at regular intervals. As a result
of problems reported during beta test, the software developer makes modification and
then prepares for release of the software product to the entire customer base.
5.
S. No. | TEST CASES | INPUT | EXPECTED RESULT | ACTUAL RESULT | STATUS
Password
6 | Upload file | Add/select the file to upload | Upload to the database | Post uploaded successfully | Pass
6. IMPLEMENTATION
Java Technology
Simple
Architecture neutral
Object oriented
Portable
Distributed
High performance
Interpreted
Multithreaded
Robust
Dynamic
Secure
You can think of Java byte codes as the machine code instructions for the Java Virtual
Machine (Java VM). Every Java interpreter, whether it’s a development tool or a Web
browser that can run applets, is an implementation of the Java VM. Java byte codes help
make “write once, run anywhere” possible. You can compile your program into byte codes on
any platform that has a Java compiler. The byte codes can then be run on any implementation
of the Java VM. That means that as long as a computer has a Java VM, the same program
written in the Java programming language can run on Windows 2000, a Solaris workstation,
or on an iMac.
You’ve already been introduced to the Java VM. It’s the base for the Java platform
and is ported onto various hardware-based platforms.
Native code is code that, once compiled, runs on a
specific hardware platform. As a platform-independent environment, the Java
platform can be a bit slower than native code. However, smart compilers, well-tuned
interpreters, and just-in-time byte code compilers can bring performance close to that
of native code without threatening portability.
Feasibility Study
Technical Feasibility
The GUI is developed using HTML to capture information from the customer.
HTML is used to display the content in the browser. Pages are delivered over the TCP/IP
protocol, and the markup is interpreted by the browser. It is very easy to develop a
page or document using HTML, and some RAD (Rapid Application Development) tools are
provided to quickly design and develop our application. Many objects, such as buttons,
text fields, and text areas, are provided to capture information from the customer.
Economical Feasibility
The economic issues that usually arise during the economical feasibility stage are
whether the system will be used if it is developed and implemented, and whether the financial
benefits equal or exceed the costs. The cost of developing the project includes the cost
of conducting a full system investigation and the cost of hardware and software for the class
of application being considered; the benefits take the form of reduced costs or fewer costly
errors. The project is economically feasible if it is developed and installed, since it reduces
the workload. Keeping the class of application in view, the cost of hardware and software is
considered to be economically feasible.
Operational Feasibility
In our application, the front end is developed as a GUI, so it is very easy for the
customer to enter the necessary information. However, the customer must have some knowledge
of using web applications before using our application.
1. Installation of Java:
Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html.
Click on the JDK DOWNLOAD button. Run the exe file and then follow the
instructions given in the wizard.
To set up the path:
o Right-click on My PC and then go to Properties
Click on install with port number 8090 and with both the username and the password
set to aits.
Mention the connection port as 8090, then click on next, and finally
click on finish.
Confirm the type as typical, then click on next and follow the instructions.
Now confirm the password as root in the system settings field and then
click on finish.
Home Page
Admin Menu
Normalized Records
User Login
User Menu
7. CONCLUSION
In this paper, we studied the problem of record normalization over a set of matching
records that refer to the same real-world entity. We presented three levels of normalization
granularities (record-level, field-level, and value-component level) and two forms of
normalization (typical normalization and complete normalization). For each form of
normalization, we proposed a computational framework that includes both single-strategy
and multi-strategy approaches. We proposed four single-strategy approaches: frequency,
length, centroid, and feature-based to select the normalized record or the normalized field
value. For the multi-strategy approach, we used result merging models inspired from
metasearching to combine the results from a number of single strategies. We analyzed the
record and field level normalization in the typical normalization. In the complete
normalization, we focused on field values and proposed algorithms for acronym expansion
and value component mining to produce much improved normalized field values. We
implemented a prototype and tested it on a real-world dataset. The experimental results
demonstrate the feasibility and effectiveness of our approach. Our method outperforms the
state-of-the-art by a significant margin.
In the future, we plan to extend our research as follows. First, conduct additional
experiments using more diverse and larger datasets. The lack of appropriate datasets
currently has made this difficult. Second, investigate how to add an effective human-in-the-
loop component into the current solution as automated solutions alone will not be able to
achieve perfect accuracy. Third, develop solutions that handle numeric or more complex
values.
BIBLIOGRAPHY
[1] K. C.-C. Chang and J. Cho, “Accessing the web: From search to integration,” in SIGMOD,
2006, pp. 804–805.
[2] M. J. Cafarella, A. Halevy, D. Z. Wang, E. Wu, and Y. Zhang, “Webtables: Exploring the
power of tables on the web,” PVLDB, vol. 1, no. 1, pp. 538–549, 2008.
[3] W. Meng and C. Yu, Advanced Metasearch Engine Technology. Morgan & Claypool
Publishers, 2010.
[4] A. Gruenheid, X. L. Dong, and D. Srivastava, “Incremental record linkage,” PVLDB, vol. 7,
no. 9, pp. 697–708, May 2014.
[6] W. Su, J. Wang, and F. Lochovsky, “Record matching over query results from multiple
web databases,” TKDE, vol. 22, no. 4, 2010.
[7] H. Köpcke and E. Rahm, “Frameworks for entity matching: A comparison,” DKE, vol. 69,
no. 2, pp. 197–210, 2010.
[8] X. Yin, J. Han, and P. S. Yu, “Truth discovery with multiple conflicting information
providers on the web,” ICDE, 2008.
[10] P. Christen, “A survey of indexing techniques for scalable record linkage and
deduplication,” TKDE, vol. 24, no. 9, 2012.
[11] S. Tejada, C. A. Knoblock, and S. Minton, “Learning object identification rules for
information integration,” Inf. Sys., vol. 26, no. 8, pp. 607–633, 2001.
[12] L. Shu, A. Chen, M. Xiong, and W. Meng, “Efficient spectral neighborhood blocking for
entity resolution,” in ICDE, 2011.
APPENDIX – A
URL LISTING
o www.google.co.in
o www.Java.org
o www.w3schools.com
o www.Javatutorial.com
REFERENCE BOOKS
Java Crash Course 2nd Edition - a basic-level book for
beginners.
Learning Java 5th Edition - a practical learning book for
basic to advanced level.
Java Cookbook - a book for advanced programmers interested in
learning about modern Java development tools.
Automating Boring Stuff With Java - in this book you will learn to
write programs in Java.
Head First Java - this book covers the fundamentals of Java.
Think Java - covers the basics of programming concepts and advanced
topics like data structures and object-oriented design.
APPENDIX – B
GLOSSARY
o GUI : Graphical User Interface
APPENDIX – C
List of Figures
List of Screens
List of Tables
APPENDIX – D
Coding
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Home Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="css/style.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="css/coin-slider.css" />
<script type="text/javascript" src="js/cufon-yui.js"></script>
<script type="text/javascript" src="js/cufon-titillium-250.js"></script>
<script type="text/javascript" src="js/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="js/script.js"></script>
<script type="text/javascript" src="js/coin-slider.min.js"></script>
<style type="text/css">
<!--
.style1 {font-size: 20px}
.style2 {
color: #FF0000;
font-size: 25px;
}
.style4 { color: #FF0000;
font-weight: bold;
}
-->
</style>
</head>
<body>
<div class="main">
<div class="header">
<div class="header_resize">
<div class="slider">
<div id="coin-slider"> <a href="#"><img src="images/slide1.jpg"
width="960" height="399" alt="" /> </a></div>
</div>
<div class="menu_nav">
<ul>
<li class="active"><a href="index.html"><span>Home
Page</span></a></li>
<li><a href="a_login.jsp"><span>Admin</span></a></li>
<li><a href="u_login.jsp"><span>User</span></a></li>
<li><a href="p_login.jsp"><span>Publisher</span></a></li>
</ul>
</div>
<div class="logo">
<h1 class="style1"><a href="index.html" class="style2">Normalization of
Duplicate Records <br />
from Multiple Sources</a></h1>
</div>
<div class="clr"></div>
</div>
</div>
<div class="content">
<div class="content_resize">
<div class="mainbar">
<div class="article">
<h2 align="center"><span> Welcome </span></h2>
<p align="center"><img src="images/Home.png" width="566"
height="190" /></p>
<p align="justify"><span class="style4">Data consolidation is a
challenging issue in data integration. The usefulness of data increases when it is
linked and fused with other data from numerous (Web) sources. The promise of
Big Data hinges upon addressing several big data integration challenges, such as
record linkage at scale, real-time data fusion, and integrating Deep Web. Although
much work has been conducted on these problems, there is limited work on
creating a uniform, standard record from a group of records corresponding to the
same real-world entity. We refer to this task as record normalization. Such a
record representation, coined normalized record, is important for both front-end
and back-end applications. In this paper, we formalize the record normalization
problem, present in-depth analysis of normalization granularity levels (e.g.,
record, field, and value-component) and of normalization forms (e.g., typical
versus complete). We propose a comprehensive framework for computing the
normalized record. The proposed framework includes a suite of record
normalization methods, from naive ones, which use only the information gathered
from records themselves, to complex strategies, which globally mine a group of
duplicate records before selecting a value for an attribute of a normalized record.
We conducted extensive empirical studies with all the proposed methods. We
indicate the weaknesses and strengths of each of them and recommend the ones to
be used in practice.</span></p>
<div class="clr"></div>
</div>
</div>
<div class="sidebar">
<div class="clr"></div>
<div class="gadget">
<h2 class="star"><span>Sidebar</span> Menu</h2>
<div class="clr">
<p> </p>
</div>
<ul class="sb_menu"><li><a href="index.html"><span>Home
Page</span></a></li>
<li class="active"><a href="a_login.jsp"><span>Admin</span></a></li>
<li><a href="u_login.jsp"><span>User</span></a></li>
</ul>
<p><img src="images/img2.jpg" width="180" height="229" /></p>
<p> </p>
<p> </p>
<p> </p>
</div>
</div>
<div class="clr"></div>
</div>
</div>
<div class="fbg"></div>
<div class="footer">
<div class="footer_resize">
<div style="clear:both;"></div>
</div>
</div>
</div>
<div align=center></div>
</body>
</html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title> Bookmark Details</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="css/style.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="css/coin-slider.css" />
<script type="text/javascript" src="js/cufon-yui.js"></script>
<script type="text/javascript" src="js/cufon-titillium-250.js"></script>
<script type="text/javascript" src="js/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="js/script.js"></script>
<script type="text/javascript" src="js/coin-slider.min.js"></script>
<script language="javascript" type="text/javascript">
</script>
<style type="text/css">
<!--
.style1 {font-size: 20px}
.style2 {
color: #FF0000;
font-size: 25px;
}
.style4 {font-family: "Times New Roman", Times, serif}
.style5 {color: #FF0000}
.style6 {font-size: 15px}
.style7 {font-weight: bold}
.style8 {color: #000000}
-->
</style>
</head>
<body>
<div class="main">
<div class="header">
<div class="header_resize">
<div class="slider">
<div id="coin-slider"> <a href="#"><img src="images/slide1.jpg"
width="960" height="399" alt="" /> </a></div>
</div>
<div class="menu_nav">
<ul>
<li><a href="index.html"><span>Home Page</span></a></li>
<li><a href="a_login.jsp"><span>Admin</span></a></li>
<li class="active"><a href="u_login.jsp"><span>User</span></a></li>
</ul>
</div>
<div class="logo">
<h1 class="style1"><a href="index.html" class="style2">Normalization of
Duplicate Records from Multiple Sources</a></h1>
</div>
<div class="clr"></div>
</div>
</div>
<div class="content">
<div class="content_resize">
<div class="mainbar">
<div class="article">
<h2 align="center"> Bookmark Details </h2>
<p> </p>
<%
String s1 = "", s2 = "", s3 = "", s4 = "", s5 = "", s6 = "", s7 = "", s8, s9 = "", s10,
s11, s12, s13,s14,s15,s16,s17,s33 = "", s44 = "", s55 = "", s66 = "";
String ss2 = "", ss3 = "", ss4 = "", ss5 = "", ss6 = "", ss7 = "", ss8, ss9 = "";
int i = 0, j = 0, k = 0,i2 = 0;
String bk=request.getParameter("bk");
String rk=request.getParameter("rank");
String rk2=request.getParameter("rank2");
String keyword=request.getParameter("key");
String user=(String)application.getAttribute("user");
try
{
String task="Searched";
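String strQuery222 = "insert into transaction_bk(user,bname,task,dt) values(?,?,?,?)";
// Bind the values as parameters instead of concatenating them into the SQL text
java.sql.PreparedStatement ps222 = connection.prepareStatement(strQuery222);
ps222.setString(1, user);
ps222.setString(2, bk);
ps222.setString(3, task);
ps222.setString(4, String.valueOf(dt));
ps222.executeUpdate();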
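s10 = rs22.getString(1);
//int UpdateRank1=Integer.parseInt(s10)+1;
String strQuery12 = "update transaction3 set rank=? where user=? and bname=?";
// Bind the values as parameters instead of concatenating them into the SQL text
java.sql.PreparedStatement ps12 = connection.prepareStatement(strQuery12);
ps12.setString(1, rk2);
ps12.setString(2, user);
ps12.setString(3, bk);
ps12.executeUpdate();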
}
else{
String rank="1";
String strQuery22 = "insert into transaction3(user,bname,rank) values(?,?,?)";
java.sql.PreparedStatement ps22 = connection.prepareStatement(strQuery22);
ps22.setString(1, user);
ps22.setString(2, bk);
ps22.setString(3, rank);
ps22.executeUpdate();
}
i = rs.getInt(1);
s2 = rs.getString(2);
s3 = rs.getString(3).toLowerCase();//bk name
s4 = rs.getString(4).toLowerCase();//url
s5 = rs.getString(5).toLowerCase();//tag
s6 = rs.getString(6);//descr
s7 = rs.getString(7);//img
s8 = rs.getString(8);//rank
s9 = rs.getString(9);
String keys="q2e34rrfgfgfgg2a";
Cipher c1 = Cipher.getInstance("AES");
c1.init(Cipher.DECRYPT_MODE, key1);
//int UpdateRank=Integer.parseInt(s8)+1;
connection.createStatement().executeUpdate(strQuery2);
%>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>Bookmark
Image</strong></div></td>
<td width="116" rowspan="1" ><div class="style7" style="margin:10px
13px 10px 13px;">
<input name="image" type="image" src="bk_Pic.jsp?id=<%=i%>"
style="width:90px; height:90px;">
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>Bookmark
Name</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6
style4" style="margin-left:20px;">
<%out.println(s3);%>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5"
style="margin-left:20px;"><strong>URL</strong></div></td>
<td width="252" valign="middle" height="40"><div align="left"
class="style23 style9 style10 style6 style4" style="margin-left:20px;">
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>
User(Uploader)</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6
style4" style="margin-left:20px;">
<%out.println(s2);%>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>
Date</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6
style4" style="margin-left:20px;">
<%out.println(s9);%>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5"
style="margin-left:20px;"><strong>Tag</strong></div></td>
<td width="252" valign="middle" height="40"><div align="left"
class="style23 style9 style10 style6 style4" style="margin-left:20px;">
<textarea name="text" cols="25" rows="7" readonly><%= s5
%></textarea>
</div></td>
</tr>
<tr>
<td width="139" height="40" align="left" valign="middle"
bgcolor="#FFFF00" style="color: #2c83b0;"><div align="left" class="style14
style15 style20 style9 style4 style6 style5" style="margin-left:20px;"><strong>Description</strong></div></td>
<td width="252" valign="middle" height="40"><div align="left"
class="style23 style9 style10 style6 style4" style="margin-left:20px;">
<textarea name="textarea" cols="25" rows="7" readonly><%= decrys6
%></textarea>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>
Rank</strong></div></td>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong> Ratings
</strong></div></td>
<td><span class="style8">
<%
int rank=Integer.parseInt(s8);
if(rank==3)
{
%>
<input name="image2" type="image" src="Gallery/1.png" width="30"
height="30" />
<%
}
if(rank>3 && rank<=6)
{
%>
<input name="image2" type="image" src="Gallery/2.png" width="80"
height="30" />
<%
}
if(rank>6 && rank<=9)
{
%>
</table>
<%
catch(Exception e)
{
out.println(e.getMessage());
}
%>
</table>
<p> </p>
<p align="right"> </p>
<p align="right"><a href="u_search_bk.jsp">Back</a></p>
<p> </p>
</div>
</div>
<div class="sidebar">
<div class="clr"></div>
<div class="gadget">
<h2 class="star"><span>User</span> Menu</h2>
<div class="clr">
<p> </p>
</div>
<ul class="sb_menu">
<li><a href="u_main.jsp"><span>User Main </span></a></li>
<li><a href="u_login.jsp"><span>Log Out</span></a></li>
</ul>
</div>
</div>
<div class="clr"></div>
</div>
</div>
<div class="fbg"></div>
<div class="footer">
<div class="footer_resize">
<div style="clear:both;"></div>
</div>
</div>
</div>
<div align=center></div>
</body>
</html>
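The page above creates a cipher with Cipher.getInstance("AES") in DECRYPT_MODE and later prints a variable named decrys6, but the steps in between are not shown in this listing. The following is only a minimal sketch of how such a step could look, assuming the description is stored as Base64-encoded AES ciphertext. The class name DescriptionCrypto, its methods, and the key value are hypothetical, and java.util.Base64 is used here in place of whatever encoder the project actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class DescriptionCrypto {
    // Hypothetical 16-byte key, used only for this illustration.
    // A real deployment would load the key from protected configuration.
    private static final byte[] KEY = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);

    // Runs one AES operation in the given mode with the provider's default
    // mode and padding, mirroring the plain "AES" transformation in the listing.
    private static byte[] run(int mode, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(mode, new SecretKeySpec(KEY, "AES"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES operation failed", e);
        }
    }

    // Encrypts text and returns it as a Base64 string suitable for a text column.
    static String encrypt(String plainText) {
        byte[] cipherBytes = run(Cipher.ENCRYPT_MODE, plainText.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(cipherBytes);
    }

    // Decodes the Base64 string and decrypts it back to the original text.
    static String decrypt(String base64CipherText) {
        byte[] plainBytes = run(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(base64CipherText));
        return new String(plainBytes, StandardCharsets.UTF_8);
    }
}
```

Under these assumptions, a page would call DescriptionCrypto.decrypt on the stored description value before printing it.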
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title> Publication Details</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="css/style.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="css/coin-slider.css" />
<script type="text/javascript" src="js/cufon-yui.js"></script>
<script type="text/javascript" src="js/cufon-titillium-250.js"></script>
<script type="text/javascript" src="js/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="js/script.js"></script>
<script type="text/javascript" src="js/coin-slider.min.js"></script>
<script language="javascript" type="text/javascript">
</script>
<style type="text/css">
<!--
.style1 {font-size: 20px}
.style2 {
color: #FF0000;
font-size: 25px;
}
.style4 {font-family: "Times New Roman", Times, serif}
.style5 {color: #FF0000}
.style6 {font-size: 15px}
.style7 {font-weight: bold}
-->
</style>
</head>
<body>
<div class="main">
<div class="header">
<div class="header_resize">
<div class="slider">
<div id="coin-slider"> <a href="#"><img src="images/slide1.jpg"
width="960" height="399" alt="" /> </a></div>
</div>
<div class="menu_nav">
<ul>
<li><a href="index.html"><span>Home Page</span></a></li>
<li><a href="a_login.jsp"><span>Admin</span></a></li>
<li class="active"><a href="u_login.jsp"><span>User</span></a></li>
</ul>
</div>
<div class="logo">
<h1 class="style1"><a href="index.html" class="style2">Normalization of
Duplicate Records from Multiple Sources</a></h1>
</div>
<div class="clr"></div>
</div>
</div>
<div class="content">
<div class="content_resize">
<div class="mainbar">
<div class="article">
<h2 align="center" class="style5"> Publication Details !!! </h2>
<p> </p>
<%
String s1 = "", s2 = "", s3 = "", s4 = "", s5 = "", s6 = "", s7 = "", s8, s9 = "", s10,
s11, s12, s13,s14,s15,s16,s17,s33 = "", s44 = "", s55 = "", s66 = "";
String ss2 = "", ss3 = "", ss4 = "", ss5 = "", ss6 = "", ss7 = "", ss8, ss9 = "";
int i = 0, j = 0, k = 0,i2 = 0;
String pub=request.getParameter("pub");
String rk=request.getParameter("rank");
String rk2=request.getParameter("rank2");
String keyword=request.getParameter("key");
String user=(String)application.getAttribute("user");
try
{
String task="Searched";
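String strQuery222 = "insert into transaction_pub(user,pname,task,dt) values(?,?,?,?)";
// Bind the values as parameters instead of concatenating them into the SQL text
java.sql.PreparedStatement ps222 = connection.prepareStatement(strQuery222);
ps222.setString(1, user);
ps222.setString(2, pub);
ps222.setString(3, task);
ps222.setString(4, String.valueOf(dt));
ps222.executeUpdate();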
s10 = rs22.getString(1);
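//int UpdateRank1=Integer.parseInt(s10)+1;
String strQuery12 = "update transaction4 set rank=? where user=? and pname=?";
// Bind the values as parameters instead of concatenating them into the SQL text
java.sql.PreparedStatement ps12 = connection.prepareStatement(strQuery12);
ps12.setString(1, rk2);
ps12.setString(2, user);
ps12.setString(3, pub);
ps12.executeUpdate();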
}
else{
String rank="1";
String strQuery22 = "insert into transaction4(user,pname,rank) values(?,?,?)";
java.sql.PreparedStatement ps22 = connection.prepareStatement(strQuery22);
ps22.setString(1, user);
ps22.setString(2, pub);
ps22.setString(3, rank);
ps22.executeUpdate();
}
i = rs.getInt(1);
s2 = rs.getString(2);
s3 = rs.getString(3);//pub name
s4 = rs.getString(4);
s5 = rs.getString(5);//tag
s6 = rs.getString(6);//descr
s7 = rs.getString(7);//img
s8 = rs.getString(8);//rank
s9 = rs.getString(9);
String keys="q2e34rrfgfgfgg2a";
Cipher c1 = Cipher.getInstance("AES");
c1.init(Cipher.DECRYPT_MODE, key1);
//int UpdateRank=Integer.parseInt(s8)+1;
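String strQuery2 = "update publication set rank=? where name=?";
// Bind the values as parameters instead of concatenating them into the SQL text
java.sql.PreparedStatement ps2 = connection.prepareStatement(strQuery2);
ps2.setString(1, rk);
ps2.setString(2, s3);
ps2.executeUpdate();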
%>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>Title
Image</strong></div></td>
<td width="116"><div class="style7" style="margin:10px 13px 10px
13px;">
<input name="image" type="image" src="pub_Pic.jsp?id=<%=i%>"
style="width:90px; height:90px;">
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>Publication
Name</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6"
style="margin-left:20px;">
<%out.println(s3);%>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>
User(Uploader)</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6
style4" style="margin-left:20px;">
<%out.println(s2);%>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>
Date</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6
style4" style="margin-left:20px;">
<%out.println(s9);%>
</div></td>
</tr>
<tr>
<tr>
<td width="139" height="40" align="left" valign="middle"
bgcolor="#FFFF00" style="color: #2c83b0;"><div align="left" class="style14
style15 style20 style9 style4 style6 style5" style="margin-left:20px;"><strong>Release Date </strong></div></td>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong>
Rank</strong></div></td>
<td width="252" valign="middle" height="40"
style="color:#000000;"><div align="left" class="style23 style9 style10 style6
style4" style="margin-left:20px;">
<%out.println(rk);%>
</div></td>
</tr>
<tr>
<td width="139" height="40" valign="middle" bgcolor="#FFFF00"
style="color: #2c83b0;"><div align="left" class="style14 style15 style20 style9
style4 style6 style5" style="margin-left:20px;"><strong> Ratings
</strong></div></td>
<td><span class="style8">
<%
int rank=Integer.parseInt(s8);
if(rank==3)
{
%>
<input name="image2" type="image" src="Gallery/1.png" width="30"
height="30" />
<%
}
if(rank>3 && rank<=6)
{
%>
<input name="image2" type="image" src="Gallery/2.png" width="80"
height="30" />
<%
}
if(rank>6 && rank<=9)
{
%>
<input name="image2" type="image" src="Gallery/3.png"
width="100" height="30" />
<%
}
if(rank>9 && rank<=12)
{
%>
<input name="image2" type="image" src="Gallery/4.png"
width="120" height="30" />
<%
}
if(rank>12 && rank<=15)
{
%>
<input name="image2" type="image" src="Gallery/5.png"
width="140" height="30" />
<%
}
if(rank>15)
{
%>
<input name="image2" type="image" src="Gallery/6.png"
width="170" height="30" />
<%
}
%>
</span></td>
</tr>
</table>
<%
catch(Exception e)
{
out.println(e.getMessage());
}
%>
</table>
<p> </p>
<p align="right"> </p>
<p align="right"><a href="u_search_pub.jsp">Back</a></p>
<p> </p>
</div>
</div>
<div class="sidebar">
<div class="clr"></div>
<div class="gadget">
<h2 class="star"><span>User</span> Menu</h2>
<div class="clr">
<p> </p>
</div>
<ul class="sb_menu">
<li><a href="u_main.jsp"><span>User Main </span></a></li>
<li><a href="u_login.jsp"><span>Log Out</span></a></li>
</ul>
</div>
</div>
<div class="clr"></div>
</div>
</div>
<div class="fbg"></div>
<div class="footer">
<div class="footer_resize">
<div style="clear:both;"></div>
</div>
</div>
</div>
<div align=center></div>
</body>
</html>
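The ratings block in the page above repeats a separate if test for every image. As a minimal sketch, the same thresholds can be kept in one helper so that each page only decides whether to emit the image tag. The class name RatingImages and the method forRank are hypothetical and not part of the project; the image widths remain in the page markup.

```java
final class RatingImages {
    // Maps a rank to the rating image path, using the thresholds from the listing:
    // 3 -> 1.png, 4..6 -> 2.png, 7..9 -> 3.png, 10..12 -> 4.png, 13..15 -> 5.png, above 15 -> 6.png.
    // Returns null for ranks below 3, where the listing shows no image.
    static String forRank(int rank) {
        if (rank < 3) {
            return null;
        }
        if (rank == 3) {
            return "Gallery/1.png";
        }
        if (rank <= 6) {
            return "Gallery/2.png";
        }
        if (rank <= 9) {
            return "Gallery/3.png";
        }
        if (rank <= 12) {
            return "Gallery/4.png";
        }
        if (rank <= 15) {
            return "Gallery/5.png";
        }
        return "Gallery/6.png";
    }
}
```

A page using this sketch would call RatingImages.forRank(rank) once and write a single image input when the result is not null.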
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>All Bookmarks Cluster Format </title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href="css/style.css" rel="stylesheet" type="text/css" />
<link rel="stylesheet" type="text/css" href="css/coin-slider.css" />
<script type="text/javascript" src="js/cufon-yui.js"></script>
<script type="text/javascript" src="js/cufon-titillium-250.js"></script>
<script type="text/javascript" src="js/jquery-1.4.2.min.js"></script>
<script type="text/javascript" src="js/script.js"></script>
<script type="text/javascript" src="js/coin-slider.min.js"></script>
<script language="javascript" type="text/javascript">
</script>
<style type="text/css">
<!--
.style1 {font-size: 20px}
.style2 {
color: #FF0000;
font-size: 25px;
}
.style4 {font-size: 15px}
.style5 {font-family: "Times New Roman", Times, serif}
.style6 {color: #FF0000}
.style12 {color: #000000}
.style13 {
font-family: "Times New Roman", Times, serif;
font-size: 20px;
color: #0000FF;
}
-->
</style>
</head>
<body>
<div class="main">
<div class="header">
<div class="header_resize">
<div class="slider">
<div id="coin-slider"> <a href="#"><img src="images/slide1.jpg"
width="960" height="399" alt="" /> </a></div>
</div>
<div class="menu_nav">
<ul>
<li><a href="index.html"><span>Home Page</span></a></li>
<li class="active"><a href="a_login.jsp"><span>Admin</span></a></li>
<li><a href="u_login.jsp"><span>User</span></a></li>
</ul>
</div>
<div class="logo">
<h1 class="style1"><a href="index.html" class="style2">Normalization of
Duplicate Records from Multiple Sources</a></h1>
</div>
<div class="clr"></div>
</div>
</div>
<div class="content">
<div class="content_resize">
<div class="mainbar">
<div class="article">
<h2 align="center">View Bookmark Cluster Format Based on Name
</h2>
<p> </p>
<%@page import="java.io.BufferedInputStream"%>
<%@page import="java.security.DigestInputStream"%>
<%@page import="java.io.FileInputStream"%>
<%@page import="java.io.PrintStream"%>
<%@page import="java.io.FileOutputStream"%>
<%@page import="java.math.BigInteger"%>
<%@ page import="java.security.Key,java.security.KeyPair,java.security.KeyPairGenerator,javax.crypto.Cipher"%>
<%@ include file="connect.jsp"%>
<%@page import="java.util.*,java.security.Key,java.util.Random,javax.crypto.Cipher,javax.crypto.spec.SecretKeySpec,org.bouncycastle.util.encoders.Base64"%>
<%@page import="java.security.MessageDigest"%>
<%@page import="java.sql.Statement"%>
<%@page import="java.sql.ResultSet"%>
<%@page import="java.text.SimpleDateFormat"%>
<%@page import="java.util.Date"%>
<%
String s1 = "", s2 = "", s3 = "", s4 = "", s5 = "", s6 = "", s7 = "", s8, s9 = "", s10,
s11, s12, s13,s14,s15,s16,s17;
int i = 0, j = 1, k = 0;
try {
%>
<span class="style13">Name: <%=s1%></span>
<table width="846" border="1" align="center" cellspacing="0" cellpadding="5">
<tr>
<td width="17" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Id</div></td>
<td width="65" bgcolor="#FFFF00"><div align="center"
class="style3 style4 style9 style5 style6">Uploader Name </div></td>
<td width="72" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Bookmark Name </div></td>
<td width="92" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Bookmark Image </div></td>
<td width="81" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">URL</div></td>
<td width="82" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Tag</div></td>
<td width="83" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Description</div></td>
<td width="58" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Upload Date</div></td>
<td width="45" bgcolor="#FFFF00"><div align="center" class="style3
style4 style9 style5 style6">Rank</div></td>
s8 = rs.getString(8);//rank
s9 = rs.getString(9);
String keys="q2e34rrfgfgfgg2a";
Cipher c1 = Cipher.getInstance("AES");
c1.init(Cipher.DECRYPT_MODE, key1);
%>
<tr>
<td><div align="center" class="style9 style10 style5 style4
style12"><%=j%></div></td>
<td><div align="center" class="style9 style10 style5 style4
style12"><%=s2%></div></td>
<td><div align="center" class="style9 style10 style5 style4 style12"><%=s3%></div></td>
<td><div align="center" class="style9 style10 style5 style4 style12">
<input name="image" type="image" src="bk_Pic.jsp?id=<%=i%>"
style="width:90px; height:90px;" />
</div></td>
<td><div align="center" class="style9 style10 style5 style4
style12"><input type="button" value="<%=s4%>" onClick="window.open('<%=s4%>')"></div></td>
<td><div align="center" class="style9 style10 style5 style4 style12">
<textarea name="text" cols="10" rows="5" readonly><%= s5
%></textarea>
</div></td>
<td><div align="center" class="style9 style10 style5 style4 style12">
<textarea name="text" cols="10" rows="5" readonly><%= decrys6
%></textarea>
</div></td>
<td><div align="center" class="style9 style10 style5 style4 style12"><%=s9%></div></td>
if(rank==3)
{
%>
<input name="image2" type="image" src="Gallery/1.png" width="30"
height="30" />
<%
}
if(rank>3 && rank<=6)
{
%>
<input name="image2" type="image" src="Gallery/2.png" width="80"
height="30" />
<%
}
if(rank>6 && rank<=9)
{
%>
<input name="image2" type="image" src="Gallery/3.png"
width="100" height="30" />
<%
}
if(rank>9 && rank<=12)
{
%>
<input name="image2" type="image" src="Gallery/4.png"
width="120" height="30" />
<%
}
<%
j=j+1;}
%>
</table>
<p> </p>
<%
j=1;}
connection.close();
}
catch (Exception e) {
// out.println(e.getMessage());
}
%>
<p> </p>
<p align="right"><a href="a_all_bk.jsp">Back</a></p>
<div class="clr"></div>
</div>
</div>
<div class="sidebar">
<div class="clr"></div>
<div class="gadget">
<h2 class="star"><span>Admin</span> Menu</h2>
<div class="clr"><p> </p>
</div>
<ul class="sb_menu">
<li><a href="a_main.jsp">Admin Main </a></li>