Schema Mapping Polymorphism
Ryan Wisnesky
Harvard University
[email protected]
This poster presents type-theoretic, functional-programming-inspired enhancements to the theory and practice of schema mapping.

Overview
Schema mappings are logical expressions in carefully crafted formalisms that express invariants between data represented in different schemas (Popa et al. 2002). These expressions are often created automatically by a "mapping generator." A mapping generator takes as input a source schema, a target schema, and a set of "correspondences" between source and target schema elements (Miller et al. 2000; Melnik et al. 2005; Bonifati et al. 2005). Figure 1 shows IBM's Clio mapping tool (Haas et al. 2005) in action.

Figure 1. A schema mapping in Clio

In this screenshot, the user has loaded a source and a target schema and has entered a number of correspondences between atomic-level elements of both schemas. From this simple input of schemas and correspondences, Clio generates a set of schema mapping expressions. The generated mapping expressions can then be converted into a semantics-preserving query that transforms data from the source schema to the target schema. Our notion of schema can describe both (non-recursive) XML and relational data. Queries can be generated in a number of target languages (SQL, XSLT, etc.). When query generation is viewed as compilation, mapping languages correspond to intermediate forms.

Mapping tools are typically used when semantics-preserving data transformation is needed but users cannot or do not want to write queries themselves (Miller et al. 2000). One example is a business context where non-programmers need to migrate information between departmental databases. Moreover, it can be difficult to write semantics-preserving queries by hand in the presence of schema integrity constraints (e.g. foreign keys). The database community has studied extensively both the models and semantics of schema mappings for data exchange (Fagin et al. 2003) and operations over mappings (Melnik et al. 2003; Fagin et al. 2005, 2007).

Motivation
Current mapping systems suffer from a number of drawbacks related to mapping reuse. These drawbacks resemble challenges previously encountered by the functional programming community:
• Reliance on concrete schemas. Users who have created mapping expressions from S to T cannot reuse them at related schemas S′, T′, even when such reuse is theoretically valid. This is the mapping polymorphism problem.
• Mapping formalisms cannot express mappings that depend on other mappings. For instance, a user may map from S to T, then copy the mapping and add an extra correspondence. If the original mapping later changes, the change is not propagated to the new mapping. This is the mapping dependence problem.
• Given a set of mapping expressions M and schemas S and T, mapping tools cannot (in general) construct a mapping from S to T. Moreover, there is no way to determine whether M can be used with any schemas at all. This is the mapping inference problem.

These problems are particularly acute when mappings are used within larger dataflow systems (Dessloch et al. 2008). For instance, we may need to infer mappings between dataflow nodes. We may also want to express mappings that depend on mappings defined earlier in the flow. Work on typed SQL combinators has helped alleviate similar challenges when exchanging purely relational data using Haskell (Leijen and Meijer 1999) and C# (Microsoft 2005–2006; Meijer et al. 2006).

Contributions
The primary idea behind this ongoing work is that schemas classify mapping expressions in much the same way that "types classify terms." Treating mappings as typed objects lets us apply classical type-theoretic and functional programming techniques to the challenges above. Using this principle, we:
• Develop a formal theory of mapping polymorphism. This includes algorithms for type-checking mappings and for inferring schemas from mapping expressions. We also investigate polymorphism's connection to mapping systems, semantics, and reuse.
• Create a Haskell-ish, Trex-style domain-specific schema mapping language, well suited for "schema mapping in the large" and implemented as an extension to Clio. "Trex-style" means the language makes essential use of extensible records, qualified types, and row polymorphism (Gaster and Jones 1996).
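The idea that schemas classify mappings, and that a mapping need not be tied to one concrete schema, can be illustrated with a small sketch. The Python code below is a minimal illustration under loudly stated assumptions. It is not the poster's formalism and not Clio's API.

- A schema is assumed to be a flat record of field names and atomic type names.
- A mapping is assumed to be nothing more than a tuple of field-to-field correspondences.
- The names `Mapping`, `infer_source_schema`, `accepts`, `run_mapping`, and the example fields are all hypothetical.

Under these assumptions, the code infers the least source schema a mapping needs for a given target. It then checks candidate source schemas against that inferred schema with an open-record check: extra fields are allowed. This is a rough stand-in for row polymorphism, not a faithful encoding of Trex-style qualified types.

```python
from dataclasses import dataclass

# Hypothetical encoding: a schema is a flat record type, field name -> atomic type name.
Schema = dict[str, str]


@dataclass(frozen=True)
class Mapping:
    # Assumed encoding: each correspondence copies one source field to one target field.
    correspondences: tuple[tuple[str, str], ...]


def infer_source_schema(m: Mapping, target: Schema) -> Schema:
    """Infer the least source schema (an open row) that m needs to populate `target`."""
    inferred: Schema = {}
    for src, tgt in m.correspondences:
        if tgt not in target:
            raise TypeError(f"target schema lacks field {tgt!r}")
        # One source field may feed several target fields, but only at a single type.
        if inferred.get(src, target[tgt]) != target[tgt]:
            raise TypeError(f"conflicting types for source field {src!r}")
        inferred[src] = target[tgt]
    return inferred


def accepts(required: Schema, source: Schema) -> bool:
    """Open-record check: `source` may carry extra fields but must match every required one."""
    return all(source.get(f) == t for f, t in required.items())


def run_mapping(m: Mapping, rows: list[dict]) -> list[dict]:
    """Apply m to source rows; fields not mentioned by m are ignored."""
    return [{tgt: row[src] for src, tgt in m.correspondences} for row in rows]


# Usage: one mapping, written once, reused at two related source schemas S and S'.
emp_to_staff = Mapping((("ename", "name"), ("salary", "pay")))
T = {"name": "string", "pay": "int"}
required = infer_source_schema(emp_to_staff, T)
S = {"ename": "string", "salary": "int"}
S_prime = {"ename": "string", "salary": "int", "dept": "string"}

print(required)                                                   # {'ename': 'string', 'salary': 'int'}
print(accepts(required, S), accepts(required, S_prime))           # True True
print(accepts(required, {"ename": "string", "salary": "float"}))  # False
print(run_mapping(emp_to_staff, [{"ename": "Ann", "salary": 10, "dept": "R&D"}]))
# [{'name': 'Ann', 'pay': 10}]
```

In this toy setting, a mapping is checked against the fields it actually uses rather than against one fixed schema. That is why the same `emp_to_staff` value is accepted at both S and S′, which is the kind of reuse the mapping polymorphism problem asks for.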
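The mapping dependence problem contrasts two ways of defining a new mapping from an existing one: copying it, or depending on it. The sketch below makes that contrast concrete in a purely hypothetical Python encoding; `Mapping`, `DerivedMapping`, and the field names are illustrative only. It assumes a mapping is just a list of (source field, target field) correspondences.

The "copy and add a correspondence" workflow is modeled by building a new list. The dependent alternative stores a reference to the base mapping and recomputes its correspondences on access. Only the dependent version sees a later edit to the original.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    # Hypothetical encoding: (source field, target field) correspondences.
    correspondences: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class DerivedMapping:
    """A mapping defined as `base` plus extra correspondences.

    It holds a reference to `base` rather than a copy, so the dependence is
    explicit and later edits to `base` are visible through the derived mapping.
    """
    base: Mapping
    extra: list[tuple[str, str]]

    @property
    def correspondences(self) -> list[tuple[str, str]]:
        # Recomputed on every access from the current state of `base`.
        return self.base.correspondences + self.extra


base = Mapping([("ename", "name")])
derived = DerivedMapping(base, [("dept", "department")])
snapshot = Mapping(base.correspondences + [("dept", "department")])  # copy-and-edit

base.correspondences.append(("salary", "pay"))  # the original mapping changes

print(derived.correspondences)
# [('ename', 'name'), ('salary', 'pay'), ('dept', 'department')]
print(snapshot.correspondences)
# [('ename', 'name'), ('dept', 'department')]
```

Late binding here is the simplest possible device. A typed mapping language would presumably also re-check the derived mapping against its schemas whenever the base changes, which this sketch does not attempt.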