7-Database Integration Nhom4
Systems
TS. Phan Thị Hà
◼ Physical integration
❑ Source databases are integrated and the integrated database is
materialized
❑ Data warehouses
◼ Logical integration
❑ Global conceptual schema is virtual and not materialized
❑ Enterprise Information Integration (EII)
[Figure: ETL tools load the source databases into a materialized global database]
◼ Local-as-view
❑ The GCS definition is assumed to exist, and each LCS is treated
as a view definition over it
◼ Global-as-view
❑ The GCS is defined as a set of views over the LCSs
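A minimal sketch of the global-as-view idea, assuming two hypothetical local relations EMP1 and EMP2 (the rows are invented for illustration, and the attribute ENO stands in for E#, which is not a plain SQL identifier). SQLite is used only as a convenient stand-in for the component DBMSs. The global relation EMP is defined as a view over the local relations, so nothing is materialized.

```python
import sqlite3

def build_gav_demo():
    """Create two hypothetical local relations and a global view over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Local conceptual schemas (assumed for illustration)
        CREATE TABLE EMP1 (ENO TEXT, ENAME TEXT, TITLE TEXT, CITY TEXT);
        CREATE TABLE EMP2 (ENO TEXT, ENAME TEXT, TITLE TEXT, CITY TEXT);
        INSERT INTO EMP1 VALUES ('E1', 'J. Doe', 'Programmer', 'Paris');
        INSERT INTO EMP2 VALUES ('E2', 'M. Smith', 'Analyst', 'Lyon');
        -- Global-as-view: the global relation EMP is a view over the LCSs
        CREATE VIEW EMP AS
            SELECT ENO, ENAME, TITLE, CITY FROM EMP1
            UNION
            SELECT ENO, ENAME, TITLE, CITY FROM EMP2;
    """)
    return conn

conn = build_gav_demo()
# A query on the global relation is answered by unfolding the view definition.
rows = conn.execute("SELECT ENO, CITY FROM EMP ORDER BY ENO").fetchall()
print(rows)  # [('E1', 'Paris'), ('E2', 'Lyon')]
```

Under local-as-view the direction is reversed: each local relation would be described as a view over the global schema, and answering a global query requires rewriting it in terms of those views.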
◼ Schema heterogeneity
❑ Structural heterogeneity
◼ Type conflicts
◼ Dependency conflicts
◼ Key conflicts
◼ Behavioral conflicts
❑ Semantic heterogeneity
◼ More important and harder to deal with
◼ Synonyms, homonyms, hypernyms
◼ Different ontology
◼ Imprecise wording
◼ Other complications
❑ Insufficient schema and instance information
❑ Unavailability of schema documentation
❑ Subjectivity of matching
◼ Issues that affect schema matching
❑ Schema versus instance matching
❑ Element versus structure level matching
❑ Matching cardinality
◼ 3-grams of string “Responsibility” are
❑ Res, esp, spo, pon, ons, nsi
❑ sib, ibi, bil, ili, lit, ity
◼ 3-grams of string “Resp” are
❑ Res
❑ esp
◼ 3-gram similarity: 2/12 = 0.17
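The computation can be sketched as follows. This assumes the longer string is "Responsibility" (consistent with the 12 in the denominator) and that similarity is the number of shared 3-grams divided by the 3-gram count of the longer string; the function names are hypothetical.

```python
def ngrams(s, n=3):
    """Return the set of contiguous n-character substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by the n-gram count of the longer string.

    The normalisation is an assumption chosen to reproduce the 2/12 above.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / max(len(ga), len(gb))

sim = ngram_similarity("Resp", "Responsibility")
print(sorted(ngrams("Resp")))  # ['Res', 'esp']
print(round(sim, 2))           # 0.17
```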
© 2020, M.T. Özsu & P. Valduriez
Edit Distance Example
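The worked example from this slide is not preserved in the text. As a stand-in, here is a minimal sketch of the standard Levenshtein edit distance (insertions, deletions, and substitutions each cost 1), applied to "Resp" and, as an assumption, "Responsibility". The function name is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

d = edit_distance("Resp", "Responsibility")
print(d)  # 10
```

"Resp" is a prefix of "Responsibility", so the distance is just the 10 characters that must be inserted.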
❑ Schema Integration
◼ One-pass
◼ Iterative
❑ Schema Mapping
Given
❑ A source LCS: 𝒮 = {𝑆𝑖 }
❑ A target GCS: 𝒯 = {𝑇𝑖 }
❑ A set of value correspondences discovered during schema
matching phase: 𝒱 = {𝑉𝑖 }
Produce a set of queries that, when executed, will create
GCS data instances from the source data.
For each 𝑇𝑘, we look for a query 𝑄𝑘, defined on a
(possibly proper) subset of the relations in 𝒮, that,
when executed, generates data for 𝑇𝑘 from the source
relations
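A rough sketch of the problem statement above, not the mapping creation algorithm itself. It assumes each value correspondence is a tuple (source relation, source attribute, target relation, target attribute) and that each target relation draws on a single source relation, so no join paths have to be discovered. All relation and attribute names are hypothetical.

```python
from collections import defaultdict

def mapping_queries(correspondences):
    """Build one SELECT per target relation from value correspondences.

    Simplifying assumption: each target relation draws on exactly one
    source relation, so join discovery is outside this sketch.
    """
    by_target = defaultdict(list)
    for s_rel, s_attr, t_rel, t_attr in correspondences:
        by_target[t_rel].append((s_rel, s_attr, t_attr))

    queries = {}
    for t_rel, items in by_target.items():
        sources = {s_rel for s_rel, _, _ in items}
        if len(sources) != 1:
            raise ValueError(f"{t_rel}: join discovery is outside this sketch")
        cols = ", ".join(f"{s_attr} AS {t_attr}" for _, s_attr, t_attr in items)
        queries[t_rel] = f"SELECT {cols} FROM {sources.pop()}"
    return queries

# Hypothetical value correspondences V (names invented for illustration)
V = [
    ("EMPLOYEE", "EMP_NAME", "EMP", "ENAME"),
    ("EMPLOYEE", "JOB", "EMP", "TITLE"),
]
queries = mapping_queries(V)
print(queries["EMP"])  # SELECT EMP_NAME AS ENAME, JOB AS TITLE FROM EMPLOYEE
```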
Mapping Creation Algorithm
General idea:
❑ Query Rewriting
◼ Mediator/wrapper architecture
◼ MDB query processing architecture
◼ Query rewriting using views
◼ Query optimization and execution
◼ Query translation and execution
◼ Communication autonomy
❑ The ability to terminate services at any time
❑ How to answer queries completely?
◼ Design autonomy
❑ The ability to restrict the availability and accuracy of information
needed for query optimization
❑ How to obtain cost information?
◼ Execution autonomy
❑ The ability to execute queries in unpredictable ways
❑ How to adapt to this?
[Figure: MDB query processing architecture; inputs to the layers: global/local correspondences, allocation and capabilities, local/DBMS mappings]
EMP(E#,ENAME,TITLE,CITY)
WORKS(E#,P#,RESP,DUR)
Q(E#,TITLE,P#) :- EMP(E#,ENAME,"Programmer",CITY),
WORKS(E#,P#,RESP,DUR).
Q(E#,TITLE,P#) :- EMP(E#,ENAME,TITLE,CITY),
WORKS(E#,P#,RESP,24).
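The two rules define Q as a union of two conjunctive queries that both join EMP and WORKS on E#. A small sketch of that reading, evaluated over hypothetical tuples (the data and the function name are invented for illustration):

```python
# Hypothetical instances of EMP(E#, ENAME, TITLE, CITY) and WORKS(E#, P#, RESP, DUR)
EMP = [
    ("E1", "J. Doe", "Programmer", "Paris"),
    ("E2", "M. Smith", "Analyst", "Lyon"),
]
WORKS = [
    ("E1", "P1", "Developer", 12),
    ("E2", "P2", "Manager", 24),
    ("E2", "P3", "Consultant", 6),
]

def evaluate_q(emp, works):
    """Union of the two conjunctive queries; both join EMP and WORKS on E#."""
    answers = set()
    for eno, _ename, title, _city in emp:
        for w_eno, pno, _resp, dur in works:
            if eno != w_eno:
                continue
            if title == "Programmer":  # first rule: TITLE bound to "Programmer"
                answers.add((eno, title, pno))
            if dur == 24:              # second rule: DUR bound to 24
                answers.add((eno, title, pno))
    return answers

result = evaluate_q(EMP, WORKS)
print(sorted(result))
# [('E1', 'Programmer', 'P1'), ('E2', 'Analyst', 'P2')]
```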
Step 2 produces:
Q′(e,p) :- EMP1(e,ENAME,TITLE,"Paris"),
WORKS1(e,p,DUR). (q1)
Q′(e,p) :- EMP2(e,ENAME,TITLE,"Paris"),
WORKS1(e,p,DUR). (q2)
❑ Optimization Issues
SELECT ENAME, PNAME, DUR
FROM EMPASG
WHERE CITY = 'Paris' AND DUR > 24