Schema Mapping Polymorphism

Ryan Wisnesky
Harvard University

This poster presents type-theoretic, functional-programming-inspired enhancements to the theory and practice of schema mapping.

Overview
Schema mappings are logical expressions in carefully crafted formalisms that express invariants between data represented in different schemas (Popa et al. 2002). These expressions are often created automatically by a “mapping generator” that uses as input source and target schemas and a set of “correspondences” between source and target schema elements (Miller et al. 2000; Melnik et al. 2005; Bonifati et al. 2005). Figure 1 shows IBM’s Clio mapping tool (Haas et al. 2005) in action.

Figure 1. A schema mapping in Clio

In this screenshot, the user has loaded a source and a target schema and has entered a number of correspondences between atomic-level elements of both schemas. Clio generates a set of schema mapping expressions from this simple input of schemas and correspondences. The generated (schema) mapping expressions can then be converted into a semantics-preserving query that transforms data from the source schema to the target schema. Our notion of schema can describe both (non-recursive) XML and relational data, and queries can be generated in a number of target languages (SQL, XSLT, etc). When query generation is viewed as compilation, mapping languages correspond to intermediate forms.

Mapping tools are typically used when semantics-preserving data transformation is needed but users cannot or do not want to write queries themselves (Miller et al. 2000); for instance, in a business context where non-programmers need to migrate information between departmental databases. Moreover, it can be difficult to manually create semantics-preserving queries in the presence of schema integrity constraints (e.g. foreign keys). Models and semantics of schema mappings for data exchange (Fagin et al. 2003) and operations over mappings (Melnik et al. 2003; Fagin et al. 2005, 2007) have been extensively studied by the database community.

Motivation
Current mapping systems suffer from a number of mapping-reuse related drawbacks that are similar to challenges previously encountered by the functional programming community:

• Reliance on concrete schemas, so users that have created mapping expressions from S to T cannot re-use them at related schemas S′, T′ even when that is a theoretically valid use. This is the mapping polymorphism problem.
• Mapping formalisms cannot express mappings that depend on other mappings. For instance, a user may map from S to T and then copy the mapping and add an extra correspondence; if the original mapping changes, this change is not propagated to the new mapping. This is the mapping dependence problem.
• Given a set of mapping expressions M and schemas S and T, mapping tools cannot (in general) construct a mapping from S to T. Moreover, there is no way to determine if M can be used with any schemas. This is the mapping inference problem.

The problems are particularly acute when mappings are used within larger dataflow systems (Dessloch et al. 2008). For instance, we may need to infer mappings between dataflow nodes or would like to express mappings that depend on mappings defined earlier in the flow. Work on typed SQL combinators has helped alleviate similar challenges when exchanging purely relational data using Haskell (Leijen and Meijer 1999) and C# (Microsoft 2005–2006; Meijer et al. 2006).

Contributions
The primary idea behind this ongoing work is that schemas classify mapping expressions in much the same way that “types classify terms.” Treating mappings as typed objects allows us to use classical type-theoretic and functional programming techniques to address the above challenges. Using this principle, we:

• Develop a formal theory of mapping polymorphism, including algorithms for type-checking mappings and inferring schemas from mapping expressions, and investigate polymorphism’s connection to mapping systems, semantics, and re-use.
• Create a Haskell-ish, Trex-style (i.e. making essential use of extensible records, qualified types, and row-polymorphism (Gaster and Jones 1996)) domain-specific schema mapping language well-suited for “schema mapping in the large,” implemented as an extension to Clio.
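
As a rough illustration of the mapping polymorphism problem, the sketch below treats a mapping's source type as just the fields it actually reads, so the same mapping can be checked against any record that contains those fields. It is written in Python rather than the poster's Haskell-based language, and it only approximates row polymorphism; the names `usable_at`, `required_source`, and the sample schemas are hypothetical and are not taken from Clio or from the poster's algorithms.

```python
from typing import Dict

# A row is modelled as a dict from field label to atomic type name, so the
# order of fields is irrelevant. This representation is an assumption.
Row = Dict[str, str]


def usable_at(required: Row, schema: Row) -> bool:
    """Return True if `schema` has every field in `required` at the same type.

    The fields of `schema` that `required` does not mention play the role of
    whatever a row variable would be instantiated to.
    """
    return all(schema.get(label) == ty for label, ty in required.items())


# Fields that a hypothetical mapping expression reads from its source.
required_source: Row = {"school": "String"}

# The concrete schema S the mapping was written for, a related schema S'
# with an extra field, and a schema whose `school` field has another type.
s: Row = {"school": "String", "date": "String"}
s_prime: Row = {"date": "String", "school": "String", "campus": "String"}
unrelated: Row = {"school": "Int"}

print(usable_at(required_source, s))
print(usable_at(required_source, s_prime))
print(usable_at(required_source, unrelated))
```

Under these assumptions the mapping is accepted at both S and S′ and rejected at the schema whose `school` field has a different type, which is the kind of reuse that a mapping tied to one concrete schema cannot offer.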

ICFP 2008 Poster


Highlights
Nested relational (NR) schemas,

    Row ::= − | ⦇ Row, L : NR ⦈
    NR  ::= ATOMIC A | RCD Row | SETRCD Row | CHC Row

describe the shape of data that consist of atomic elements, records, sets of records, and choices/variants/sums. We consider only rows without duplicate labels, and identify rows up to permutation. Example NR schemas are

    Src :: RCD ⦇ school : String,
                 date : String,
                 depts : SETRCD ⦇ dept : String ⦈ ⦈
    Dst :: RCD ⦇ projects : SETRCD ⦇ projectId : String,
                                      taskId : String ⦈ ⦈

Constraints between instances of data conforming to NR schemas are captured by nested mapping expressions. These expressions generate queries that materialize target instances satisfying the constraints. Mapping expressions resemble formulae of set theory, like

    forall p in Src.depts exists q in Dst.projects
    s.t. Src.school = q.projectId ∧ p.dept = q.taskId

but with syntactic restrictions that ensure solutions can always be computed.

A distinguishing feature of the mapping language is its CHC eliminator, which has a single fixed branch. Supposing that e :: CHC ⦇ r, l : t ⦈, elimination of CHC is done with a binding construct v of l from e. φ(v), where φ(v) is typed assuming v :: t. (This behavior differs from many programming languages where choice elimination is required to have as many branches as choices.) Sets of mapping expressions may thus be required to specify constraints between variants.

Schema as types
The intuitive typing discipline of the mapping language (loosely speaking, as given by a naive encoding into O’Caml or Trex / Hugs / Haskell) guarantees the meaning of sets of mapping expressions as satisfiable constraints. For us, and for Clio, a mapping is a set of mapping expressions M and a source and target schema S, T for which ⊢ M :: S → T.

The principal type of M in this discipline can be expressed in the NR schema language extended with qualifiers and row (ρ) and schema (σ) variables, following the tradition of qualified types. The qualifiers enable row polymorphism and enforce a restriction that equalities must be between atomic schema elements. An example extended schema is

    atomic? σ, ρ lacks? l ⇒ RCD ⦇ ρ, l : σ ⦈

We can infer the unique principal satisfiable types of a set of mapping expressions by employing the complete row unification algorithm of Gaster and Jones (1996). The main technical issue here is that we need to unify (pre-)row expressions like

    ⦇ l_1 : t_1, l_2 : t_2 ⦈ ∼ ⦇ l_2 : t_2, l_1 : t_1 ⦈

but traditional unification distinguishes these permutations. The inference algorithm decides mapping expression set satisfiability.

Implementation
Using qualified types allows us to embed the mapping language into a λ-calculus closely resembling that of Gaster and Jones (1996). The result of the embedding is an intermediate form usable by mapping engines for representing programs over mappings. A simple use of the form is to express mappings that depend on other mappings, which, for instance, occur in dataflow graphs of mappings (Dessloch et al. 2008). Our implementation, which automatically populates (adds arcs to) mapping graph skeletons (nodes are schemas and arcs are mapping expressions) using schema-as-types techniques, represents updates to mapping graphs as programs.

Semantics
Principal types let us investigate the semantics of sets of mapping expressions M. Given a mapping meaning function ⟦(M, S, T)⟧ (e.g. taking mappings to queries), we can define a simple meaning for M as ⟦(M, P_S, P_T)⟧ where the P are concretizations of M’s principal type. In practice, meanings defined in this way are most often encountered when integrating mappings with other systems; for instance, when mapping language embeddings are used by query generators.

Further type-directed analysis sheds light on mapping expression reuse. For instance, it is useful to automatically rewrite mapping expressions to lift them to apply to a “larger” schema, as in the example of reusing M :: S → Book as M′ :: S → RCD ⦇ book : Book, loanedTo : String ⦈. Re-use of mapping expressions in this way can be studied as type coercion, and we have investigated lifting in particular.

References
Angela Bonifati, Elaine Qing Chang, Terence Ho, Laks V. S. Lakshmanan, and Rachel Pottinger. HePToX: Marrying XML and Heterogeneity in Your P2P Databases. In VLDB(demo), pages 1267–1270, 2005.
S. Dessloch, M. A. Hernández, R. Wisnesky, A. Radwan, and J. Zhou. Orchid: Integrating Schema Mapping and ETL. In ICDE, pages 1307–1316, 2008.
R. Fagin, P. G. Kolaitis, and L. Popa. Data exchange: getting to the core. In PODS, pages 90–101, 2003.
R. Fagin, P. G. Kolaitis, L. Popa, and W. Tan. Composing Schema Mappings: Second-Order Dependencies to the Rescue. TODS, 30(4):994–1055, 2005.
Ronald Fagin, Phokion G. Kolaitis, Lucian Popa, and Wang Chiew Tan. Quasi-inverses of schema mappings. In PODS, pages 123–132, 2007.
Benedict R. Gaster and Mark P. Jones. A polymorphic type system for extensible records and variants. Technical Report NOTTCS-TR-96-3, Department of Computer Science, University of Nottingham, November 1996. URL http://www.cse.ogi.edu/~mpj/pubs/96-3.ps.gz.
L. M. Haas, M. A. Hernández, H. Ho, L. Popa, and M. Roth. Clio Grows Up: From Research Prototype to Industrial Tool. In SIGMOD, pages 805–810, 2005.
Daan Leijen and Erik Meijer. Domain specific embedded compilers. In PLAN ’99: Proceedings of the 2nd conference on Domain-specific languages, volume 35, pages 109–122, New York, NY, USA, January 1999. ACM Press. doi: 10.1145/331960.331977. URL http://portal.acm.org/citation.cfm?id=331977.
Erik Meijer, Brian Beckman, and Gavin M. Bierman. LINQ: reconciling object, relations and XML in the .NET framework. In SIGMOD, page 706, 2006.
S. Melnik, E. Rahm, and P. A. Bernstein. Rondo: A programming platform for generic model management. In SIGMOD, pages 193–204, 2003.
S. Melnik, P. A. Bernstein, A. Halevy, and E. Rahm. Supporting Executable Mappings in Model Management. In SIGMOD, pages 167–178, 2005.
Microsoft. Microsoft Corp. The LINQ Project, 2005–2006. http://msdn.microsoft.com/netframework/future/linq/.
Renée J. Miller, Laura M. Haas, and Mauricio A. Hernández. Schema Mapping as Query Discovery. In VLDB, pages 77–88, 2000.
L. Popa, Y. Velegrakis, R. J. Miller, M. A. Hernández, and R. Fagin. Translating Web Data. In VLDB, pages 598–609, 2002.
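
The NR schema grammar, the atomic-equality restriction, and the row-permutation issue from the Highlights can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the class names, `field_schema`, and `equality_ok` are hypothetical, the equality check additionally assumes both sides must have the same atomic type, and rows are ground (no row or schema variables), so this is not the row unification algorithm of Gaster and Jones.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Atomic:
    name: str        # ATOMIC A


@dataclass(frozen=True)
class Rcd:
    row: "Row"       # RCD Row


@dataclass(frozen=True)
class SetRcd:
    row: "Row"       # SETRCD Row


@dataclass(frozen=True)
class Chc:
    row: "Row"       # CHC Row


NR = Union[Atomic, Rcd, SetRcd, Chc]
# Rows as dicts: labels are unique and dict equality ignores field order,
# which matches "identify rows up to permutation".
Row = Dict[str, NR]

STRING = Atomic("String")
SRC = Rcd({"school": STRING, "date": STRING,
           "depts": SetRcd({"dept": STRING})})
DST = Rcd({"projects": SetRcd({"projectId": STRING, "taskId": STRING})})


def field_schema(schema: NR, label: str) -> NR:
    """Schema of `label` in a record, or in the elements of a set of records."""
    if isinstance(schema, (Rcd, SetRcd)):
        return schema.row[label]
    raise TypeError(f"{schema!r} has no fields")


def equality_ok(left: NR, right: NR) -> bool:
    """Accept an equality only between atomic elements of the same type."""
    return isinstance(left, Atomic) and left == right


# forall p in Src.depts exists q in Dst.projects
# s.t. Src.school = q.projectId and p.dept = q.taskId
p = field_schema(SRC, "depts")     # p ranges over elements of Src.depts
q = field_schema(DST, "projects")  # q ranges over elements of Dst.projects
checks = [
    equality_ok(field_schema(SRC, "school"), field_schema(q, "projectId")),
    equality_ok(field_schema(p, "dept"), field_schema(q, "taskId")),
]
print(all(checks))

# The same fields written in two orders: an ordered comparison tells them
# apart, while the label-keyed comparison identifies them.
t1, t2 = Atomic("t1"), Atomic("t2")
pre_row_1 = [("l1", t1), ("l2", t2)]
pre_row_2 = [("l2", t2), ("l1", t1)]
print(pre_row_1 == pre_row_2)
print(Rcd(dict(pre_row_1)) == Rcd(dict(pre_row_2)))
```

Keying rows by label sidesteps the permutation problem only in this ground setting; once row variables such as ρ appear, a dedicated unification procedure of the kind the poster cites is needed.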
