


International Journal of Computer Theory and Engineering, Vol. 9, No. 5, October 2017

A Suggestion-Based RDF Instance Matching System


Mehmet Aydar and Serkan Ayvaz


Abstract—This paper presents a semi-automatic, recommendation-based instance matching system for RDF graph data. Based on a graph node similarity algorithm, our instance matching system detects instance nodes with similarities higher than an input threshold value and returns the subject node pairs to the user. The system merges a matched node pair when the user confirms the match in the results. After a merge, the merged node is also considered as an entity for the following candidate pair generation cycle. The procedure continues until no new matching candidate pairs are recommended by the algorithm and no more feedback is provided by the user.

Index Terms—Instance matching, RDF, Semantic Web, similarity metrics.

Manuscript received February 5, 2017; revised April 15, 2017.
M. Aydar is with the Department of Computer Science, Kent State University, Kent, OH 44240 USA (e-mail: maydar@kent.edu).
S. Ayvaz is with the Department of Software Engineering, Bahcesehir University, Besiktas 34353, Istanbul, Turkey (corresponding author; e-mail: [email protected]).
DOI: 10.7763/IJCTE.2017.V9.1170

I. INTRODUCTION

In this study, we utilize an RDF entity similarity algorithm for matching instances that may be merged if confirmed by the user. Our assumption is that two graph entities are similar if their neighbor entities are also similar. Our semi-automatic instance matching algorithm runs in iterations with input from the user. At each subsequent iteration, the algorithm generates more precise results based on that input. In addition, our technique reduces the size of the RDF graph, since we merge identical or very similar RDF nodes. As a result, the process reduces the complexity of the similarity algorithm at each iteration.

A. Semantic Web and RDF

The Resource Description Framework (RDF) [1] is a general-purpose language for representing information on the Semantic Web [2] in a way that the meaning (or semantics) is unambiguous to a machine or software process. RDF describes resources through statements in the form of (subject, predicate, object) expressions, which are known as triples.

The development of Semantic Web technologies has led to significant progress in recent years, including explicit semantics for data on the Web. As an increasing number of organizations adopt Semantic Web technologies, publishing data in a standard model and interlinking the data available on the Web provide a Web of data that is machine accessible and can be utilized by applications through semantic queries.

The collection of interlinked datasets on the Web is also referred to as Linked Data [3]. With the contribution of Linked Open Data along with several other Semantic Web projects, the structured data available in the Semantic Web have been increasing exponentially. Many datasets in various domains, such as publications, life sciences, media, social web, and geography, have been incorporated into the Linked Open Data.

B. Data Mapping and Linking

Data mapping [4] is the process of creating the linkages and relations between data elements of distinct data models. Data mapping creates connections between different data elements. The connectivity of the data elements increases data interoperability and data reusability while reducing redundancy. Moreover, it is a key task for data integration processes, including data transformation, data lineage analysis, discovery of new data details within connected data sources, and consolidation of multiple data sources into a single data source. Furthermore, data mapping is needed for standardization of the data. For instance, healthcare institutions often need to map their local data to an accepted medical standard such as ICD-9 [5] or SNOMED CT [6] to be able to share their data with other medical facilities.

In essence, the task of data linking is connecting semantically related instances from multiple data sources. For our purposes, we use the term instance matching for finding the semantically matching instances between multiple RDF graphs. The matched instances do not necessarily have to be identical or equivalent; they might also be hierarchically related with subset or superset relations.

C. Instance Matching for Semantic Interoperability

Instance matching is an essential task in achieving semantic interoperability on the Semantic Web. As the amount of publicly available heterogeneous data on the Semantic Web grows continually, applications require creating more and more linkages between the data sources. For instance, in [7], the authors present a method for the translation of data models from one format to another. Fig. 1 below shows how their system works.

Fig. 1. Translation of instance data (taken from [7]).
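The (subject, predicate, object) triple model described in Section A can be illustrated with a minimal sketch. The resources below are hypothetical, and plain Python tuples stand in for a dedicated RDF library such as rdflib:

```python
# A tiny RDF-style graph: each statement is a (subject, predicate, object)
# triple. All identifiers below are hypothetical, for illustration only.
triples = {
    ("ex:Alice", "rdf:type", "ex:Patient"),
    ("ex:Alice", "ex:diagnosedWith", "ex:Condition42"),
    ("ex:Condition42", "ex:mappedTo", "icd9:250.00"),
}

def objects(graph, subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

print(objects(triples, "ex:Alice", "rdf:type"))  # {'ex:Patient'}
```

A real pipeline would parse Turtle or RDF/XML into such a triple set; the set-of-tuples view is enough for the matching ideas discussed in this paper.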


As shown in the figure, the approach defined a number of steps required to perform the translation to support heterogeneous data interoperability. In step 3 of the approach, the mapping and the translation rules between the various schemata are manually defined a priori by a domain expert. Manually defining the mappings can be time-consuming and costly due to the size of the data. The RDF instance matching system introduced in this study contains a semi-automatic instance matching framework to help experts find linkages between different data elements.

D. Semi-automatic Instance Matching Technique

The computation of entity similarity is essential for the instance matching task, as our instance matching system utilizes entity similarities to link the same or similar real-world objects. Our system uses a pairwise entity similarity algorithm for RDF graph data, as explained in Section IV. As the similarity algorithm generates the entity similarity results, our system processes the results and returns the subject node pairs with a similarity score higher than a threshold. If the user confirms a matched node pair in the results, the system merges the nodes. After a merge, the merged node is also considered as an entity for the following candidate pair generation cycle.

The system then reruns the similarity algorithm with the merged RDF node pairs. Based on the common predicates and neighbor similarity, a merged node can be matched with another instance and presented to the user as a candidate pair. This process continues until there is no more feedback from the user. Each time, the similarity algorithm produces more accurate results with the input from the user. The size of the input RDF graph data is reduced by the merging process, yielding less complexity each time.

II. CONTRIBUTION AND OUTLINE

This study investigates the problem of discovering linkages between semantically related entities that could be classified as the same entity within and among different data sources. Thus, the same or very similar entities can be represented by a single entity. In the literature, this problem has been studied by the research community as the task of instance matching or concept matching. In this context, we consider this problem an instance matching of RDF entities, as we represent the data instances and data model details in an RDF model. Our approach is semi-automatic: it utilizes a pairwise graph node similarity computation algorithm to find similar entities and present them to the domain experts.

Our main contributions:
• We present a suggestion-based semi-automatic instance matching system for the RDF representation of data that contributes to the information translation process.
• We use the neighborhood similarity idea to infer possible connections between the entities, such that if two entities are matched, then other nodes with similar predicates and in similar neighborhoods are considered.
• We merge the matched entities and run the process in iterations, producing more accurate results and yielding less complexity after each iteration.

The paper is organized as follows. We discuss the application of instance matching in an RDF representation of data elements. Subsequently, we review the computation of entity similarity that is used for the matching. Then, we describe the user interaction for the semi-automatic instance matching process. The subsequent section presents the results of the evaluations. In the following section, the related work is reviewed, followed by our conclusion.

III. INSTANCE MATCHING IN RDF GRAPH

The source data for an RDF graph may exist in heterogeneous data sources with different formats and data models. In this work, we assume that the data elements are represented in RDF: i.e., the instance data, the data dictionary elements that belong to the instance data, and the mapping between them are represented in RDF. For data sources that do not have a pre-defined data dictionary in place, a summary graph generation algorithm as in [8] can be exploited to extract the data dictionary elements. The main data dictionary classes can be linked to the instance data elements with the rdf:type [9] predicate. The RDF relations allow linking the data elements from different systems, constituting an extensive and connected RDF graph.

In such a data ecosystem, it is common to have redundant instances and concepts between different data sources. Thus, the instance matching technique explained in this study covers both de-duplication and data concept matching. Merging the redundant nodes helps to reduce the size of the dataset. Also, linking the concepts between diverse data models assists in the information translation process and the semantic data interoperability of different systems.

IV. RDF ENTITY SIMILARITY FOR INSTANCE MATCHING

In the studies [8], [10], we introduced an effective algorithm for the computation of pairwise graph node similarity using graph locality, neighborhood similarity, and the Jaccard measure. In the similarity algorithm, the computation of entity similarity is studied as a pairwise RDF graph node similarity problem. The algorithm is based on an efficient graph node similarity metric. Our instance matching system uses this RDF entity similarity algorithm for pairing entities.

An RDF entity is described through a set of predicates, the collection of literal neighboring nodes that it references, and the neighbor nodes with which it interacts. The predicates of the subject nodes are treated as the dimensions of the entities. We utilize the common descriptors within the Jaccard measure context when calculating the similarity of an RDF node pair, along with the similarities of their neighbors. Thus, the direct similarities of the entities are taken into account along with the similarities of the neighbors with which they interact.

Each descriptor of an RDF node may have a different impact in the similarity calculation; in other words, each descriptor has a different importance weight.
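The Jaccard-plus-neighbors idea described above can be sketched in simplified form: a Jaccard score over the shared descriptors (predicates), blended with the average similarity of already-paired neighbors. The blending parameter `alpha` and the overall shape are illustrative assumptions, not the exact metric of [8], [10]:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two descriptor (predicate) sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def node_similarity(preds_a, preds_b, neighbor_sims, alpha=0.5):
    """Blend direct (Jaccard) similarity with the mean neighbor similarity.

    `neighbor_sims` holds similarity scores of already-paired neighbors;
    `alpha` is an assumed weighting between the two components.
    """
    direct = jaccard(preds_a, preds_b)
    neighbor = sum(neighbor_sims) / len(neighbor_sims) if neighbor_sims else 0.0
    return alpha * direct + (1 - alpha) * neighbor

# Two nodes sharing 2 of 4 distinct predicates, with two matched neighbors:
score = node_similarity({"p1", "p2", "p3"}, {"p2", "p3", "p4"}, [1.0, 0.5])
print(round(score, 3))  # 0.625
```

In the authors' actual metric, each descriptor additionally carries an auto-generated importance weight rather than contributing equally, as the next paragraph explains.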
Therefore, we presented an importance weighting metric for the descriptors of the RDF nodes and enhanced the similarity metric by incorporating the auto-generated importance weights of the descriptors.

V. USER INTERACTION

The size of the data in semantic interoperability tasks often requires an automated instance matching method. Nonetheless, fully automated techniques can be error-prone. Therefore, a semi-automated instance matching technique with user interactions yields more accurate results. Since our goals include matching instances belonging to heterogeneous data sources, our semi-automatic instance matching system allows the user to provide initial matches between the source and target graph elements. The system then runs the entity similarity algorithm introduced in Section IV. After the entity similarities converge, it follows the steps below:

• The system extracts the subject IRI node pairs which have similarities higher than a user-defined and configurable parameter (threshold), and then presents them to the user.
• A subject node pair that is presented to the user is denoted by (s1,s2), where s1 and s2 are two subject IRI nodes having similarity greater than the defined threshold. Our system merges these two nodes if their match is approved by the user. The merged node is considered as a single subject node, denoted by [s1,s2], that retains all the predicates from both s1 and s2.
• Our system then checks the common object nodes of s1 and s2, and generates (p1,p2) as a candidate instance pair if s1 and s2 are connected to a common object node by p1 and p2, respectively.
• In addition, the system checks the common predicates of s1 and s2, and generates (o1,o2) as a candidate instance pair if s1 and s2 connect to the object nodes o1 and o2, respectively, by a common predicate.
• The instance matching candidates (p1,p2) and (o1,o2) are then presented to the user and merged if their match is approved by the user. The merged entities are denoted by [p1,p2] and [o1,o2].
• The system then reruns the RDF entity similarity algorithm based on the new RDF graph generated by the merged graph entity pairs.
• Our system repeats the steps explained above until no new matching pairs are generated by our algorithm and no more feedback comes from the user.

An example of how our system handles the instance matching and merging process is shown in Fig. 2. In the figure, the nodes v1, v2, v3, v4 and the predicates p1, p2, p3 belong to the source graph, while the nodes v2, v5, v6 and the predicates p3, p4 belong to the target graph. Our goal is to find the matching instances between the source and the target graph.

At first, the RDF entity similarity algorithm runs, and our instance matching system generates the first instance matching candidates based on the results of the similarity algorithm. As shown in Fig. 2, our system pairs the subject nodes (v1,v5) as a candidate instance matching pair and presents them to the user in phase 1. The user approves that the candidates match, and the subject nodes v1, v5 are merged to get [v1,v5].

Fig. 2. Instance matching process.

Then, in phase 2, the common predicates of the merged node [v1,v5] are checked by the algorithm. As the new node [v1,v5] connects to the subject nodes v4 and v6 by the common predicate p3, our system presents (v4,v6) to the user, and it gets [v4,v6] once the user approves that they match.

In phase 3, our system checks the common object nodes of the new node [v1,v5]. As [v1,v5] connects to a common object node v2 with the predicates p1 and p4, the algorithm presents the predicate pair (p1,p4) to the user, and it merges them to get [p1,p4] upon approval by the user.

At phase 4, compared to phase 1, the source and target graphs are merged, and the total graph size is reduced by 40% (from five triples to three triples). As the instance matching system runs in iterations, in the next iteration the output graph of phase 3 becomes the input of phase 1. The iterations continue until the optimum instance matching pairs are obtained.

VI. EVALUATION

In the empirical evaluations, the following datasets were used: a subset of DBpedia [11] and a subset of SemanticDB [12]. SemanticDB is a Semantic Web data repository developed by the Cleveland Clinic for clinical research and quality reporting. To evaluate the effectiveness of the instance matching algorithm, we generated a validation dataset by replicating the original dataset and syntactically changing the names of the instances. We transformed instance names in the validation dataset using a specific naming pattern. The original dataset was considered as the source and the validation dataset as the target in the instance matching step. The instance node naming pattern was used for validation.

In summary, the instance matching evaluation with DBpedia as the source dataset included 90 triples with 60 distinct subject and predicate nodes. The algorithm semi-automatically matched 100% of the nodes to a target graph node. The algorithm achieved 85% accuracy and generated 20 instance matching candidates.
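The Fig. 2 walkthrough from Section V can be made concrete with a short sketch. The triples below are hypothetical (the paper shows the graph only pictorially), but they are chosen to be consistent with the described phases and with the reported reduction from five triples to three:

```python
# Hypothetical source and target triples, consistent with the Fig. 2 phases.
source = {("v1", "p1", "v2"), ("v1", "p2", "v3"), ("v1", "p3", "v4")}
target = {("v5", "p4", "v2"), ("v5", "p3", "v6")}

def merge_nodes(triples, a, b):
    """Replace nodes a and b with a single merged node [a,b]."""
    m = f"[{a},{b}]"
    r = lambda x: m if x in (a, b) else x
    return {(r(s), p, r(o)) for (s, p, o) in triples}

def merge_predicates(triples, a, b):
    """Replace predicates a and b with a single merged predicate [a,b]."""
    m = f"[{a},{b}]"
    return {(s, m if p in (a, b) else p, o) for (s, p, o) in triples}

def object_candidates(triples, node):
    """Section V rule: objects reached from `node` via a common predicate."""
    by_pred = {}
    for s, p, o in triples:
        if s == node:
            by_pred.setdefault(p, set()).add(o)
    return {frozenset(v) for v in by_pred.values() if len(v) > 1}

def predicate_candidates(triples, node):
    """Section V rule: predicates linking `node` to a common object."""
    by_obj = {}
    for s, p, o in triples:
        if s == node:
            by_obj.setdefault(o, set()).add(p)
    return {frozenset(v) for v in by_obj.values() if len(v) > 1}

g = merge_nodes(source | target, "v1", "v5")   # phase 1: user approves (v1,v5)
print(object_candidates(g, "[v1,v5]"))         # suggests v4 and v6, via p3
g = merge_nodes(g, "v4", "v6")                 # phase 2: user approves (v4,v6)
print(predicate_candidates(g, "[v1,v5]"))      # suggests p1 and p4, via v2
g = merge_predicates(g, "p1", "p4")            # phase 3: user approves (p1,p4)
print(len(g))                                  # 3 triples, down from 5
```

Under this reading, the approved merges collapse the five input triples to three, matching the 40% reduction reported for phase 4.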
The evaluations with SemanticDB as the source dataset contained 2500 triples with 520 distinct subject and predicate nodes. 86% of the nodes were matched to a target graph node semi-automatically. The accuracy of the algorithm was 95%, and 310 instance matching candidates were generated by the algorithm.

VII. RELATED WORK

In the literature, there has been much research on the subject of ontology mapping. The terminology used for defining the problem has varied: matching, alignment, merging, articulation, fusion, integration, morphism, etc. Accordingly, many tools and methods have emerged in the field.

Many of these approaches are based solely on the use of string similarity mechanisms for finding matching entities between two ontologies [13], [14]. Approaches that have studied the ontology mapping problem from the schema matching perspective in the context of data integration have also been explored, since schemata can be considered as ontologies with restricted relation types. These include [15], [16].

Some others have suggested asking the user for feedback, as in our work, to perform the mapping generation interactively by providing proper visualization to support the decision. We think this is a useful feature for generating high-quality links. However, in this work we minimize the need for user feedback to reduce the load on the users.

Doan et al. [17] provided a system, GLUE, which employs learning techniques to semi-automatically find mappings between two given ontologies. For each concept in one ontology, their framework uses a multi-learning strategy to predict a similar concept in the other using probabilistic definitions of several similarity measures. GLUE calculates a joint probability distribution to measure the overlap between two input sets. The authors offer two learners: a content learner and a name learner for learning information such as word frequencies, instance names, and value formats. The content learner uses naive Bayesian learning, a text classification method, for the instance content, whereas the name learner uses the full name instead of its content. They then combine the predictions of the two learners and assign weights to them using a meta-learner. Additionally, they use a technique, relaxation labelling, which assigns labels to the nodes of a graph based on a set of constraints. Similar to their system, we also propose a machine learning framework. In contrast to them, we do not employ an active learning technique, which relies on the quality of the training dataset.

Jain et al. offered a framework called BLOOMS [18] and later an improved version, BLOOMS+ [19]. They propose a metric to determine which classes to align between two ontologies and a technique for using contextual information to support the alignment process. However, they rely on the existence of a human-generated upper ontology and concept categorization in the form of a tree structure. Our approach does not rely on an upper ontology, as we think these assumptions are problematic: the quality of the mapping strictly depends on the categorization of the concepts by humans, and any potential categorization issue in the upper ontology will have an impact on the whole context.

The notion of instance matching has also been studied by the Semantic Web community. From an ontology instance matching perspective, some research studies have investigated comparing instances based on their properties and roles. However, they primarily focus on the ontology mapping [20], [21] or ontology population [22] tasks.

On the other hand, [23] studies an automatic instance matching problem in RDF graphs with a focus on property weights, where property weights give precedence to properties that make the instances more unique. The similarity metric utilized in this work also employs auto-generated property weights, a notion similar to term frequency-inverse document frequency (tf-idf) [24], [25]. Our work is also different in the sense that it is semi-automatic, allowing for user feedback.

A matching algorithm called similarity flooding (SF) is proposed in [16]. SF matches two directed and labeled graphs to produce a multi-mapping of analogous nodes. For the similarity computation, SF relies on the intuition that elements of two graphs are similar when their adjacent elements are similar, exploiting the neighboring structure of a concept map, the semantic meaning of the content of the graph nodes, and the labels of the relations between the nodes. In SF, the similarity of the node pairs starts either with a string similarity between the contents of the nodes or with a similarity of 1. It then propagates the initial similarity of any two nodes through the graphs. The algorithm runs in multiple iterations until the similarity values converge, or until a pre-defined maximum number of iterations is reached. SF also does not distinguish between a schema node and an instance data node.

In SF, filters are utilized to select the best mappings. The mappings are then manually reviewed. The accuracy of the algorithm is measured by the estimated human labor savings obtained by utilizing the algorithm for the matching tasks. In our work, we make an assumption for similarity computation like that of SF: nodes connected to similar neighbors with similar predicates are similar. We also run our similarity algorithm in iterations, and, like SF, we do not distinguish between the data model elements and the instance data elements. However, by the end of the iterations, we suggest the similar node pairs to the user for matching, and we merge the triples of the approved matched nodes. Consequently, we suggest matching of the nodes and predicates based on the common predicates and neighbors of the already matched nodes, and we rerun the similarity iterations. In this sense, our technique requires more user interactions, but the subsequent similarity iterations produce more accurate results, assuming the user provides accurate feedback.

VIII. CONCLUSION

In this study, we provided an instance matching system using RDF graphs. Our instance matching technique is semi-automatic and does not distinguish between the instance data elements and the data model elements. The system makes use of an efficient similarity algorithm, and it makes smart suggestions to the user by utilizing the neighborhood concept to infer possible connections between the graph entities for finding the linkages between different data elements.
The linkages found can further be utilized in a data translation framework to support data interoperability. Additionally, we performed exploratory evaluations that demonstrated significant results in matching similar graph elements.

ACKNOWLEDGMENT

A special note of thanks to Prof. Austin Melton for his invaluable help and guidance during the study. Also, the authors would like to thank the Cleveland Clinic Cardiology Application Group members for their valuable feedback.

REFERENCES

[1] G. Klyne and J. J. Carroll, "Resource description framework (RDF): Concepts and abstract syntax," 2006.
[2] T. Berners-Lee, J. Hendler, and O. Lassila, "The semantic web," Sci. Am., vol. 284, no. 5, pp. 28–37, 2001.
[3] C. Bizer, T. Heath, and T. Berners-Lee, "Linked data: The story so far," Int. J. Semantic Web Inf. Syst., vol. 5, no. 3, pp. 1–22, 2009.
[4] Wikipedia. (2017). Data mapping. [Online]. Available: https://en.wikipedia.org/wiki/Data_mapping
[5] Centers for Disease Control and Prevention. ICD-9: International Classification of Diseases, Ninth Revision. [Online]. Available: http://www.cdc.gov/nchs/icd/icd9cm.htm
[6] U.S. National Library of Medicine. SNOMED Clinical Terms (SNOMED CT). [Online]. Available: http://www.snomed.org/snomed-ct
[7] M. Aydar and A. C. Melton, "Translation of instance data using RDF and structured mapping definitions," in Proc. 14th International Semantic Web Conference (ISWC), 2015.
[8] S. Ayvaz, M. Aydar, and A. C. Melton, "Building summary graphs of RDF data in semantic web," in Proc. 2015 IEEE 39th International Computer Software and Applications Conference (COMPSAC), 2015.
[9] D. Brickley and R. V. Guha, "RDF vocabulary description language 1.0: RDF Schema," 2004.
[10] M. Aydar, S. Ayvaz, and A. C. Melton, "Automatic weight generation and class predicate stability in RDF summary graphs," 2015.
[11] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, "DBpedia: A nucleus for a web of open data," Springer, 2007.
[12] C. D. Pierce, D. Booth, C. Ogbuji, C. Deaton, E. Blackstone, and D. Lenat, "SemanticDB: A Semantic Web infrastructure for clinical research and quality reporting," Curr. Bioinforma., vol. 7, no. 3, pp. 267–277, 2012.
[13] D. Spohr, L. Hollink, and P. Cimiano, "A machine learning approach to multilingual and cross-lingual ontology matching," The Semantic Web–ISWC 2011, pp. 665–680, 2011.
[14] G. Stoilos, G. Stamou, and S. Kollias, "A string metric for ontology alignment," The Semantic Web–ISWC 2005, Springer, pp. 624–637, 2005.
[15] J. Madhavan, P. A. Bernstein, and E. Rahm, "Generic schema matching with Cupid," VLDB, vol. 1, pp. 49–58, 2001.
[16] S. Melnik, H. Garcia-Molina, and E. Rahm, "Similarity flooding: A versatile graph matching algorithm and its application to schema matching," in Proc. 18th International Conference on Data Engineering, 2002, pp. 117–128.
[17] A.-H. Doan, J. Madhavan, P. Domingos, and A. Halevy, "Ontology matching: A machine learning approach," Handbook on Ontologies, Springer, pp. 385–404, 2004.
[18] P. Jain, P. Hitzler, A. P. Sheth, K. Verma, and P. Z. Yeh, "Ontology alignment for linked open data," The Semantic Web–ISWC 2010, Springer, pp. 402–417, 2010.
[19] P. Jain et al., "Contextual ontology alignment of LOD with an upper ontology: A case study with Proton," The Semantic Web: Research and Applications, Springer, pp. 80–92, 2011.
[20] A. Isaac, L. van der Meij, S. Schlobach, and S. Wang, "An empirical study of instance-based ontology matching," Springer, 2007.
[21] C. Wang, J. Lu, and G. Zhang, "Integration of ontology data through learning instance matching," in Proc. International Conference on Web Intelligence, 2006, pp. 536–539.
[22] S. Castano, A. Ferrara, S. Montanelli, and D. Lorusso, "Instance matching for ontology population," SEBD, pp. 121–132, 2008.
[23] M. H. Seddiqui, R. P. D. Nath, and M. Aono, "An efficient metric of automatic weight generation for properties in instance matching technique," Int. J. Web Semantic Technol., vol. 6, no. 1, p. 1, 2015.
[24] H. P. Luhn, "A statistical approach to mechanized encoding and searching of literary information," IBM J. Res. Dev., vol. 1, no. 4, pp. 309–317, 1957.
[25] K. Spärck Jones, "A statistical interpretation of term specificity and its application in retrieval," J. Doc., vol. 28, no. 1, pp. 11–21, 1972.

Mehmet Aydar was born in Iskenderun, Turkey, in 1986. He received his bachelor's degree in computer engineering in 2005 from Bahcesehir University, Istanbul, Turkey. He then received his master's degree in computer technology in 2008 from Kent State University, OH, USA. He also received his Ph.D. degree in computer science from Kent State University, OH, USA, in 2015.
He worked as a PLC programmer from 2005 to 2006 at Hipertech Ltd., Istanbul, Turkey. He was a graduate assistant from 2007 to 2008 at Kent State University, OH, USA. Between 2008 and 2011, he worked as a programmer/analyst at Visual Evidence LLC, Cleveland, OH. He then worked at the Cleveland Clinic Heart and Vascular Institute as a senior system analyst between 2011 and 2016. He is currently the owner of multiple e-commerce websites. His research interests and previous publications are in the field of the Semantic Web and its applications in healthcare and the life sciences, as well as data mining.

Serkan Ayvaz received his bachelor's degree in mathematics and computer science in 2006 from Bahçeşehir University, Istanbul, Turkey. Later, he received his master's degree in technology with a specialization in computer technology from Kent State University in 2008. He completed his Ph.D. in computer science at Kent State University in 2015. He has over 8 years of industry work experience in the USA, most recently as lead systems analyst at the eResearch Department at the Cleveland Clinic Foundation between 2011 and 2016. In this role, he served on multidisciplinary research teams focusing on medical research projects. Prior to joining the Cleveland Clinic, he worked as a software engineer at Hartville Group for three years.
He is currently a faculty member at the Department of Software Engineering and serves as the coordinator of the big data analytics and management graduate program at Bahcesehir University.
His research interests include semantic search, machine learning, and scalable knowledge discovery in Big Data, as well as the Semantic Web and its applications, particularly in healthcare and the life sciences.
