Schema Mapping Polymorphism

This document presents an approach to improving schema mapping by applying techniques from functional programming and type theory. It describes problems with current mapping systems, such as the inability to reuse mappings across related schemas. The approach treats mappings as typed objects classified by schemas, so that techniques such as polymorphism can address these problems: mapping expressions can be reused, dependencies between mappings can be expressed, and mappings can be inferred. The work develops a formal theory of mapping polymorphism and a Haskell-like domain-specific mapping language, implemented as an extension to Clio, that incorporates these ideas.

Ryan Wisnesky
Harvard University

This poster presents type-theoretic, functional-programming inspired enhancements to the theory and practice of schema mapping.

Overview

Schema mappings are logical expressions in carefully crafted formalisms that express invariants between data represented in different schemas (Popa et al. 2002). These expressions are often created automatically by a “mapping generator” that uses as input source and target schemas and a set of “correspondences” between source and target schema elements (Miller et al. 2000; Melnik et al. 2005; Bonifati et al. 2005). Figure 1 shows IBM’s Clio mapping tool (Haas et al. 2005) in action.

Figure 1. A schema mapping in Clio

In this screenshot, the user has loaded a source and a target schema and has entered a number of correspondences between atomic-level elements of both schemas. Clio generates a set of schema mapping expressions from this simple input of schemas and correspondences. The generated (schema) mapping expressions can then be converted into a semantics-preserving query that transforms data from the source schema to the target schema. Our notion of schema can describe both (non-recursive) XML and relational data, and queries can be generated in a number of target languages (SQL, XSLT, etc). When query generation is viewed as compilation, mapping languages correspond to intermediate forms.

Mapping tools are typically used when semantics-preserving data transformation is needed but users cannot or do not want to write queries themselves (Miller et al. 2000); for instance, in a business context where non-programmers need to migrate information between departmental databases. Moreover, it can be difficult to manually create semantics preserving queries in the presence of schema integrity constraints (e.g. foreign keys). Models and semantics of schema mappings for data exchange (Fagin et al. 2003) and operations over mappings (Melnik et al. 2003; Fagin et al. 2005, 2007) have been extensively studied by the database community.

Motivation

Current mapping systems suffer from a number of mapping-reuse related drawbacks that are similar to challenges previously encountered by the functional programming community:

• Reliance on concrete schemas, so users that have created mapping expressions from S to T cannot re-use them at related schemas S′, T′ even when that is a theoretically valid use. This is the mapping polymorphism problem.
• Mapping formalisms cannot express mappings that depend on other mappings. For instance, a user may map from S to T and then copy the mapping and add an extra correspondence; if the original mapping changes, this change is not propagated to the new mapping. This is the mapping dependence problem.
• Given a set of mapping expressions M and schemas S and T, mapping tools cannot (in general) construct a mapping from S to T. Moreover, there is no way to determine if M can be used with any schemas. This is the mapping inference problem.

The problems are particularly acute when mappings are used within larger dataflow systems (Dessloch et al. 2008). For instance, we may need to infer mappings between dataflow nodes or would like to express mappings that depend on mappings defined earlier in the flow. Work on typed SQL combinators has helped alleviate similar challenges when exchanging purely relational data using Haskell (Leijen and Meijer 1999) and C# (Microsoft 2005–2006; Meijer et al. 2006).

Contributions

The primary idea behind this ongoing work is that schemas classify mapping expressions in much the same way that “types classify terms.” Treating mappings as typed objects allows us to use classical type-theoretic and functional programming techniques to address the above challenges. Using this principle, we:

• Develop a formal theory of mapping polymorphism, including algorithms for type-checking mappings and inferring schemas from mapping expressions, and investigate polymorphism’s connection to mapping systems, semantics, and re-use.
• Create a Haskell-ish, Trex-style (i.e. making essential use of extensible records, qualified types, and row-polymorphism (Gaster and Jones 1996)) domain-specific schema mapping language well-suited for “schema mapping in the large,” implemented as an extension to Clio.

ICFP 2008 Poster

Highlights

Nested relational (NR) schema,

    Row ::= − | ⦇ Row, L : NR ⦈
    NR  ::= ATOMIC A | RCD Row | SETRCD Row | CHC Row

describe the shape of data that consist of atomic elements, records, sets of records, and choices/variants/sums. We consider only rows without duplicate labels, and identify rows up to permutation. Example NR schema are

    Src :: RCD ⦇ school : String,
                 date : String,
                 depts : SETRCD ⦇ dept : String ⦈ ⦈
    Dst :: RCD ⦇ projects : SETRCD ⦇ projectId : String,
                                     taskId : String ⦈ ⦈

Constraints between instances of data conforming to NR schema are captured by nested mapping expressions. These expressions generate queries that materialize target instances satisfying the constraints. Mapping expressions resemble formulae of set-theory, like

    forall p in Src.depts exists q in Dst.projects
    s.t. Src.school = q.projectId ∧ p.dept = q.taskId

but with syntactic restrictions that ensure solutions can always be computed.

A distinguishing feature of the mapping language is its CHC eliminator, which has a single fixed branch. Supposing that e :: CHC ⦇ r, l : t ⦈, elimination of CHC is done with a binding construct v of l from e. φ(v), where φ(v) is typed assuming v :: t. (This behavior differs from many programming languages where choice elimination is required to have as many branches as choices.) Sets of mapping expressions may thus be required to specify constraints between variants.

Schema as types

The intuitive typing discipline of the mapping language (loosely speaking, as given by a naive encoding into O’Caml or Trex / Hugs / Haskell) guarantees the meaning of sets of mapping expressions as satisfiable constraints. For us, and for Clio, a mapping is a set of mapping expressions M and a source and target schema S, T for which ⊢ M :: S → T.

The principal type of M in this discipline can be expressed in the NR schema language extended with qualifiers and row (ρ) and schema (σ) variables, following the tradition of qualified types. The qualifiers enable row polymorphism and enforce a restriction that equalities must be between atomic schema elements. An example extended schema is

    atomic? σ, ρ lacks? l ⇒ RCD ⦇ ρ, l : σ ⦈

We can infer the unique principal satisfiable types of a set of mapping expressions by employing the complete row unification algorithm of (Gaster and Jones 1996). The main technical issue here is that we need to unify (pre-)row expressions like

    ⦇ l1 : t1, l2 : t2 ⦈ ∼ ⦇ l2 : t2, l1 : t1 ⦈

but traditional unification distinguishes these permutations. The inference algorithm decides mapping expression set satisfiability.

Implementation

Using qualified types allows us to embed the mapping language into a λ-calculus closely resembling (Gaster and Jones 1996). The result of the embedding is an intermediate form usable by mapping engines for representing programs over mappings. A simple use of the form is to express mappings that depend on other mappings, which, for instance, occur in dataflow graphs of mappings (Dessloch et al. 2008). Our implementation, which automatically populates (adds arcs to) mapping graph skeletons (nodes are schema and arcs are mapping expressions) using schema-as-types techniques, represents updates to mapping graphs as programs.

Semantics

Principal types let us investigate the semantics of sets of mapping expressions M. Given a mapping meaning function ⟦(M, S, T)⟧ (e.g. taking mappings to queries), we can define a simple meaning for M as ⟦(M, P_S, P_T)⟧ where the P are concretizations of M’s principal type. In practice, meanings defined in this way are most often encountered when integrating mappings with other systems; for instance, when mapping language embeddings are used by query generators.

Further type-directed analysis sheds light on mapping expression reuse. For instance, it is useful to automatically rewrite mapping expressions to lift them to apply to a “larger” schema, as in the example of reusing M :: S → Book as M′ :: S → RCD ⦇ book : Book, loanedTo : String ⦈. Re-use of mapping expressions in this way can be studied as type coercion, and we have investigated lifting in particular.

References

Angela Bonifati, Elaine Qing Chang, Terence Ho, Laks V. S. Lakshmanan, and Rachel Pottinger. HePToX: Marrying XML and Heterogeneity in Your P2P Databases. In VLDB (demo), pages 1267–1270, 2005.

S. Dessloch, M. A. Hernández, R. Wisnesky, A. Radwan, and J. Zhou. Orchid: Integrating Schema Mapping and ETL. In ICDE, pages 1307–1316, 2008.

R. Fagin, P. G. Kolaitis, and L. Popa. Data exchange: getting to the core. In PODS, pages 90–101, 2003.

R. Fagin, P. G. Kolaitis, L. Popa, and W. Tan. Composing Schema Mappings: Second-Order Dependencies to the Rescue. TODS, 30(4):994–1055, 2005.

Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. Quasi-inverses of schema mappings. In PODS, pages 123–132, 2007.

Benedict R. Gaster and Mark P. Jones. A polymorphic type system for extensible records and variants. Technical Report NOTTCS-TR-96-3, Department of Computer Science, University of Nottingham, November 1996. URL http://www.cse.ogi.edu/~mpj/pubs/96-3.ps.gz.

L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio Grows Up: From Research Prototype to Industrial Tool. In SIGMOD, pages 805–810, 2005.

Daan Leijen and Erik Meijer. Domain specific embedded compilers. In PLAN ’99: Proceedings of the 2nd conference on Domain-specific languages, volume 35, pages 109–122, New York, NY, USA, January 1999. ACM Press. doi: 10.1145/331960.331977. URL http://portal.acm.org/citation.cfm?id=331977.

Erik Meijer, Brian Beckman, and Gavin M. Bierman. LINQ: reconciling object, relations and XML in the .NET framework. In SIGMOD, page 706, 2006.

S. Melnik, E. Rahm, and P. A. Bernstein. Rondo: A programming platform for generic model management. In SIGMOD, pages 193–204, 2003.

S. Melnik, P. A. Bernstein, A. Halevy, and E. Rahm. Supporting Executable Mappings in Model Management. In SIGMOD, pages 167–178, 2005.

Microsoft Corp. The LINQ Project, 2005–2006. http://msdn.microsoft.com/netframework/future/linq/.

Renée J. Miller, Laura M. Haas, and Mauricio A. Hernández. Schema Mapping as Query Discovery. In VLDB, pages 77–88, 2000.

L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernández, and R. Fagin. Translating Web Data. In VLDB, pages 598–609, 2002.
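
As a rough illustration of the NR schema grammar given under Highlights, and of the remark under Schema as types that rows are identified up to permutation, the following Haskell sketch encodes schemas as an ordinary data type and compares rows after sorting them by label. It is not the poster's implementation, and it is not the row unification algorithm of Gaster and Jones: it only checks equality of ground rows with no row or schema variables. The names NR, Row, canonical, sameNR, sameRow, lacks and extend are all assumptions introduced for this example.

```haskell
module Main where

import Data.List (sortOn)

-- Nested relational (NR) schemas, mirroring the grammar in the text.
-- The constructor names are hypothetical.
data NR
  = Atomic String   -- ATOMIC A, e.g. Atomic "String"
  | Rcd Row         -- RCD Row: a record
  | SetRcd Row      -- SETRCD Row: a set of records
  | Chc Row         -- CHC Row: a choice (variant/sum)
  deriving (Show)

-- A row is a list of labelled schemas; labels are assumed to be distinct.
type Row = [(String, NR)]

-- Sort a row's fields by label so that permutations of a row coincide.
canonical :: Row -> Row
canonical = sortOn fst

-- Schema equality that identifies rows up to permutation.
sameNR :: NR -> NR -> Bool
sameNR (Atomic a) (Atomic b) = a == b
sameNR (Rcd r)    (Rcd s)    = sameRow r s
sameNR (SetRcd r) (SetRcd s) = sameRow r s
sameNR (Chc r)    (Chc s)    = sameRow r s
sameNR _          _          = False

sameRow :: Row -> Row -> Bool
sameRow r s =
  length r == length s
    && and (zipWith sameField (canonical r) (canonical s))
  where
    sameField (l1, t1) (l2, t2) = l1 == l2 && sameNR t1 t2

-- A ground reading of the "lacks" qualifier: label l does not occur in the row.
lacks :: Row -> String -> Bool
lacks r l = l `notElem` map fst r

-- Extend a row with l : t only when the row lacks l.
extend :: String -> NR -> Row -> Maybe Row
extend l t r
  | r `lacks` l = Just ((l, t) : r)
  | otherwise   = Nothing

string :: NR
string = Atomic "String"

-- The Src and Dst example schemas from the text.
src, dst :: NR
src = Rcd
  [ ("school", string)
  , ("date", string)
  , ("depts", SetRcd [("dept", string)])
  ]
dst = Rcd
  [ ("projects", SetRcd [("projectId", string), ("taskId", string)]) ]

main :: IO ()
main = do
  let r1 = [("l1", Atomic "t1"), ("l2", Atomic "t2")]
      r2 = [("l2", Atomic "t2"), ("l1", Atomic "t1")]
  print (sameRow r1 r2)                        -- True: equal up to permutation
  print (sameNR src dst)                       -- False: different schemas
  print (r1 `lacks` "l1")                      -- False: l1 already occurs in r1
  print (fmap length (extend "l3" string r1))  -- Just 3
```

Sorting by label is one simple way to make permuted rows compare equal once all labels are known. A principal-type inference procedure such as the one the text describes would additionally have to handle row variables, which this sketch deliberately leaves out.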
