Recap Through Exercise
X  x1 : {Block, Jeevan, Cinema, Old, Agra}
   x2 : {House, Jeevan, Vihar, Delhi}
Y  x3 : {Block, Jeevan, Cinema, Agra}
   x4 : {Building, Ravi, Vihar, New, Delhi}
   x5 : {Block, Jeevan, Vihar, New, Delhi}
   x6 : {Mandir, Ravi, Vihar, Delhi}

IDX(S)
Token    DID
Agra     3
Block    3, 5
Cinema   3
Delhi    4, 5, 6
Jeevan   3, 5
Mandir   6
New      4, 5
Ravi     4, 6
Vihar    4, 5, 6

Candidate set for x1: {3, 5} + {3, 5} + {3} + {3}
ID   Freq
3    4
5    2

Size Filtering
Given a string x ∈ X, only strings y such that |x|/t ≥ |y| ≥ |x|*t need to be considered
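The exercise above can be replayed with a minimal Python sketch. It builds an inverted index over x3..x6, looks up the tokens of x1 to get the ID/Freq counts, and applies the size filter. The function names are illustrative, and t is assumed to be a similarity threshold with 0 < t ≤ 1.

```python
from collections import Counter, defaultdict

# Token sets from the exercise; Y holds documents 3..6.
X = {
    1: {"Block", "Jeevan", "Cinema", "Old", "Agra"},
    2: {"House", "Jeevan", "Vihar", "Delhi"},
}
Y = {
    3: {"Block", "Jeevan", "Cinema", "Agra"},
    4: {"Building", "Ravi", "Vihar", "New", "Delhi"},
    5: {"Block", "Jeevan", "Vihar", "New", "Delhi"},
    6: {"Mandir", "Ravi", "Vihar", "Delhi"},
}

def build_inverted_index(docs):
    """Map each token to the sorted list of document IDs that contain it."""
    index = defaultdict(list)
    for did in sorted(docs):
        for token in docs[did]:
            index[token].append(did)
    return dict(index)

def candidate_counts(x, index):
    """Count, per document ID, how many tokens of x appear in that document."""
    counts = Counter()
    for token in x:
        counts.update(index.get(token, []))
    return counts

def size_filter(x, docs, t):
    """Keep only documents y with |x|*t <= |y| <= |x|/t (assumes 0 < t <= 1)."""
    lo, hi = len(x) * t, len(x) / t
    return {did for did, y in docs.items() if lo <= len(y) <= hi}

index = build_inverted_index(Y)
counts_x1 = candidate_counts(X[1], index)
print(sorted(counts_x1.items()))           # [(3, 4), (5, 2)]
print(sorted(size_filter(X[1], Y, 0.9)))   # [4, 5]
```

With t = 0.9 (a value chosen only for illustration), x1 has five tokens, so only documents with 4.5 to about 5.56 tokens survive the size filter.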
Prefix-Inverted Index
Algorithm
• reorder terms in each x ∈ X and y ∈ Y in increasing order of their frequencies
• for each y ∈ Y, create y’, the prefix of size |y| - (k – 1) of y
• build an inverted index over all prefixes y’
• for each x ∈ X, create x’, the prefix of size |x| - (k – 1) of x, then use above index to find all y such that x’ overlaps with y’
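The steps above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: token frequencies are counted over X and Y together and ties are broken alphabetically (both assumptions), and the function names are made up for this example.

```python
from collections import Counter, defaultdict

X = {
    1: {"Block", "Jeevan", "Cinema", "Old", "Agra"},
    2: {"House", "Jeevan", "Vihar", "Delhi"},
}
Y = {
    3: {"Block", "Jeevan", "Cinema", "Agra"},
    4: {"Building", "Ravi", "Vihar", "New", "Delhi"},
    5: {"Block", "Jeevan", "Vihar", "New", "Delhi"},
    6: {"Mandir", "Ravi", "Vihar", "Delhi"},
}

def frequency_order(X, Y):
    """Return a function that sorts a token set rarest-first (ties: alphabetical)."""
    freq = Counter(tok for s in list(X.values()) + list(Y.values()) for tok in s)
    return lambda s: sorted(s, key=lambda tok: (freq[tok], tok))

def prefix(tokens, k):
    """First |tokens| - (k - 1) tokens of an already ordered token list."""
    return tokens[: max(0, len(tokens) - (k - 1))]

def prefix_candidates(X, Y, k):
    """Pairs (x id, y id) whose prefixes share at least one token."""
    order = frequency_order(X, Y)
    index = defaultdict(set)              # inverted index over the prefixes y'
    for did, y in Y.items():
        for tok in prefix(order(y), k):
            index[tok].add(did)
    candidates = set()
    for xid, x in X.items():              # probe the index with each prefix x'
        for tok in prefix(order(x), k):
            candidates.update((xid, did) for did in index.get(tok, ()))
    return candidates

candidates = prefix_candidates(X, Y, k=3)
print(sorted(candidates))  # [(1, 3), (2, 5)]
```

Any pair with overlap of at least k must share a token in its prefixes, so the candidate set can only contain extra pairs, never miss a true match. Candidates still need to be verified by computing the actual overlap.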
Prefix Inverted Index for Y
O(x,y) ≥ 3, so Prefix(x) = |x| - (3 - 1) = |x| - 2
Token    DID
Agra     3
Cinema   3
Delhi    5
X  x1 : {Old, Agra, Cinema, Block, Jeevan}
   x2 : {House, Delhi, Jeevan, Vihar}
What would be the candidate set?
More on Schema Matching and Mapping
Recap --
creating semantic matches between schemas
name = title, location = concat(city, state, zipcode)
Defining semantic mapping between schemas using SQL
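As a small illustration of the two matches above, the sketch below applies them to one record. The record values and the space used as the concatenation separator are assumptions made only for this example.

```python
# Hypothetical source record; the values are made up for illustration.
source = {"title": "Example Title", "city": "Agra", "state": "UP", "zipcode": "282001"}

# Each target attribute is paired with a function of the source record.
matches = {
    # one-to-one match: name = title
    "name": lambda r: r["title"],
    # one-to-many match: location = concat(city, state, zipcode)
    "location": lambda r: " ".join([r["city"], r["state"], r["zipcode"]]),
}

target = {attr: fn(source) for attr, fn in matches.items()}
print(target)
```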
Schemas
BOOK-VENDOR
Books(ISBN, title, category, PubYear)
Shop(SID, BID, markedPrice, Vloc)
Semantic Matches
Semantic Mappings
Semantic mappings between AGGREGATOR, BOOK-VENDOR, and DVD-VENDOR
The Global-as-View (GAV) mapping shows how to obtain an entire tuple for the Items table of AGGREGATOR
CREATE VIEW Items AS
SELECT title AS name, releaseDate AS releaseInfo, rating AS classification,
       basePrice * (1 + taxRate) AS price
FROM Movies, Products, Locations
WHERE Movies.id = Products.mid AND Products.saleLocID = Locations.lid
UNION
SELECT title AS name, PubYear AS releaseInfo, category AS classification,
       markedPrice AS price
FROM Books, Shop
WHERE Books.ISBN = Shop.BID;
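To see the view produce Items tuples, the sketch below runs it in an in-memory SQLite database from Python. The Books and Shop tables follow the schema listed earlier. Which DVD-VENDOR table holds each column is an assumption for this sketch, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed column placement for the DVD-VENDOR tables (not given in the text).
CREATE TABLE Movies(id INTEGER, title TEXT);
CREATE TABLE Products(mid INTEGER, releaseDate TEXT, rating TEXT,
                      basePrice REAL, saleLocID INTEGER);
CREATE TABLE Locations(lid INTEGER, taxRate REAL);
-- BOOK-VENDOR tables as listed in the schema.
CREATE TABLE Books(ISBN TEXT, title TEXT, category TEXT, PubYear INTEGER);
CREATE TABLE Shop(SID INTEGER, BID TEXT, markedPrice REAL, Vloc TEXT);

-- Made-up sample rows.
INSERT INTO Movies VALUES (1, 'Sample Movie');
INSERT INTO Products VALUES (1, '2001-01-01', 'PG', 20.0, 7);
INSERT INTO Locations VALUES (7, 0.25);
INSERT INTO Books VALUES ('isbn-1', 'Sample Book', 'Fiction', 1999);
INSERT INTO Shop VALUES (1, 'isbn-1', 12.0, 'Delhi');

CREATE VIEW Items AS
SELECT title AS name, releaseDate AS releaseInfo, rating AS classification,
       basePrice * (1 + taxRate) AS price
FROM Movies, Products, Locations
WHERE Movies.id = Products.mid AND Products.saleLocID = Locations.lid
UNION
SELECT title AS name, PubYear AS releaseInfo, category AS classification,
       markedPrice AS price
FROM Books, Shop
WHERE Books.ISBN = Shop.BID;
""")

rows = conn.execute(
    "SELECT name, releaseInfo, classification, price FROM Items ORDER BY name"
).fetchall()
for row in rows:
    print(row)
```

Each source contributes one branch of the UNION, so every Items tuple is defined entirely in terms of the source tables.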
If we use the Local-as-View (LAV) approach to relate the schemas
for each table in DVD-VENDOR and BOOK-VENDOR, we must create a semantic mapping that specifies how to obtain tuples for that table from schema AGGREGATOR (i.e., from table Items)
If we use the GLAV approach
there are semantic mappings going in both directions
Challenges of Schema Matching and Mapping
Matching and mapping systems must reconcile semantic heterogeneity between the schemas
Such semantic heterogeneity arises in many ways
same concept, but different names for tables and attributes
rating vs classification
multiple attributes in 1 schema relate to 1 attribute in the other
basePrice and taxRate relate to price
tabular organization of schemas can be quite different
one table in AGGREGATOR vs three tables in DVD-VENDOR
coverage and level of details can also differ significantly
DVD-VENDOR also models releaseDate and releaseCompany
Challenges of Schema Matching and Mapping -- contd
Why do we have semantic heterogeneity?
schemas are created by different people whose tastes and styles are different
disparate databases are rarely created for exactly the same purposes
Why is reconciling semantic heterogeneity hard?
the semantics is not fully captured in the schemas
schema clues can be unreliable
intended semantics can be subjective
correctly combining the data is difficult
Matching System Architecture
Matchers
Example
Instance-Based Matchers
Measuring the Overlap of Values
Using Classifiers
Using Classifiers: An Example
si is address, tj is location
Sim scores are 0.9, 0.7, and 0.5, respectively, for the three instances of T.location
return the average score of 0.7 as the sim score between address and location
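The arithmetic in this example can be checked with a short sketch. The function name is illustrative, and only the averaging step is shown, not the classifier that produces the per-instance scores.

```python
def average_similarity(scores):
    """Combine per-instance similarity scores by taking their mean."""
    return sum(scores) / len(scores)

# Scores from the example: three instances of T.location scored against address.
scores = [0.9, 0.7, 0.5]
print(f"sim(address, location) = {average_similarity(scores):.1f}")
# prints: sim(address, location) = 0.7
```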
Reminder: Matching System Architecture
Combining Match Predictions
Combining Match Predictions: Another Example of the Average Combiner
Enforcing Domain Integrity Constraints
Enforcing Domain Integrity Constraints -- contd
Schema mapping
Overview of Mapping Systems
Multiple Join Paths
Address(id, Addr)
UNION ALL
select Sal
from Professor
Could also do an outer-union, if Personnel(id,name, sal)…
and even a join.