Rough Sets in Spatio-Temporal Data Mining
Thomas Bittner
Centre de recherche en geomatique, Laval University, Quebec, Canada
Abstract. In this paper I define spatio-temporal regions as pairs consisting of a spatial and a temporal component, and I define topological relations between them. Using the notion of rough sets I define approximations of spatio-temporal regions and relations between those approximations. Based on relations between approximated spatio-temporal regions, configurations of spatio-temporal objects can be characterized even if only approximate descriptions of the objects forming them are available.
1 Introduction
Rough set theory [Paw82] provides a way of approximating subsets of a set when the set is equipped with a partition or equivalence relation. Rough sets have been used extensively in the context of data mining, e.g., [Lin95,LC97]. So far, however, they have been used mainly in non-spatio-temporal contexts, for example, to classify and analyze phenomena such as diseases given a finite number of observations or symptoms, e.g., [NSR92,BNSNT 95]. It is the purpose of this paper to apply rough sets in a spatio-temporal context, i.e., to describe and classify (configurations of) spatio-temporal objects.
An important task in spatio-temporal data mining is to discover characteristic configurations of spatial objects. Characterizing spatial configurations is important, for example, in order to retrieve your new ideal home from a property database such that it has access to a highway, is located by the shore of a lake, within a beautiful forest, and far away from the nearest nuclear power station. Another important task is to find classes of configurations that characterize molecules like amino acids and proteins [GFA93]. Spatio-temporal relations are often important for identifying causal relationships between events in which spatio-temporal objects are involved: in order to interact with each other, things often need to be at the same place at the same time.
There are three major aspects characterizing spatio-temporal objects: (1) aspects characterizing what they are, e.g., the class of things they belong to; (2) aspects characterizing where they are, i.e., their spatial location; (3) aspects characterizing when they existed and when they have been, are, or will be where, i.e., their temporal location. Spatio-temporal relations hold between spatio-temporal objects, such as being in the same place at the same time, or having been in a place before something else. Sets of spatio-temporal objects form spatio-temporal configurations that are characterized by sets of spatial, temporal, and spatio-temporal relations that hold between the objects forming the configuration. In this paper I concentrate on topological spatio-temporal relations (like being in the same place at the same time). Topological relations between regions of space and time play a major role in characterizing spatial and temporal configurations [EFG97,All83].
Today the classification of spatio-temporal configurations is based on relations between objects and on relations between the spatio-temporal regions they occupy. Unfortunately, it is often impossible to identify the exact region of space and time those objects occupy, i.e., the exact location of spatio-temporal objects is often indeterminate [BF95]. [Bit99] argued that often the approximate location of spatial objects is known. The notion of approximate location is based on the notion of rough sets, i.e., the approximation of (exact) location with respect to a regional partition of space and time. In this paper I discuss how rough sets can be used to describe approximate location in space and time and how to derive possible relations between objects given their approximations.
This paper is structured as follows. In Section 2 I define the notions of spatio-temporal object, location, region, and the relationships between them. I define topological relations between spatio-temporal regions in Section 3. The notion of a rough set is used in Section 4 in order to approximate spatio-temporal regions with respect to regional partitions of space and time. In Section 5 binary topological relations between those approximations are defined. These relations can be used to characterize configurations of spatio-temporal objects even if we know only their approximate location. In Section 6 the conclusions are given.
The financial support from the Canadian GEOID network is gratefully acknowledged.
We say that the object is located in the (spatial, temporal or spatio-temporal) region in order to stress the exact fit of object and region (the object matches the region). It is important to distinguish the exact match from the case of an object being located within a region, whose intuitive meaning allows the region to be bigger than the object, and from the case of the object covering a region, which intuitively implies that the region is smaller than the object. Notice that this implies a four-dimensional ontology of spatio-temporal objects [Sim87].
2-dimensional plane. In the remainder I concentrate on regions of time and space and topological relations between them.
Spatio-temporal objects may be at rest, i.e., be located in the same region of space for a period of time, or they may change, i.e., be located in different regions of space at each moment of time (footnote 3). Spatial change may be continuous, i.e., the regions of consecutive moments of time are topologically close, as in the case of change of bona fide objects [SV97] like cars, planets, and human beings, or discontinuous, as (sometimes) in the case of change of fiat objects [SV97] like land property.
Consider an object at rest. I assume that the exact region, $x^s$, of a spatio-temporal object, $o$, always has a corresponding time interval (footnote 4), $x^t$, which is bounded by the moment of time, $t_1$, at which $o$ stopped at $x^s$ and the moment of time, $t_2$, at which $o$ leaves $x^s$ (or the current moment of time if $o$ is currently resting at $x^s$), i.e., $x^t = [t_1, t_2]$. I define the spatio-temporal region of the resting object as a pair, $(x^t, x^s)$. The region $x^t$ is a part of the exact temporal region of $o$.
Spatial change causes spatio-temporal objects to be located in different regions of space at different moments of time. Consider a changing (moving, growing, shrinking, ...) spatio-temporal object, $o$, within the time interval $x^t$. Let $x^s$ be the sum of the regions in which $o$ was located during $x^t$. I define the spatio-temporal region of the spatially changing object as a pair, $(x^t, x^s)$. Notice that in doing so we no longer know where exactly $o$ is located during $x^t$: it can be anywhere within $x^s$, but it cannot be anywhere else. In the special case of continuous movement the region $x^s$ can be thought of as the path of the object's movement during $x^t$. In the remainder I will use the metaphor "path of change during $x^t$" to refer to the sum of the spatial regions of a spatially changing object during the interval $x^t$.
Footnote 3: This implies an ontology of absolute space and time, i.e., regions do not change.
Footnote 4: A maximal connected region of time.
The formula $x \wedge y \neq \bot$ is true if the intersection of $x$ and $y$ is not the empty region; the formula $x \wedge y = x$ is true if the intersection of $x$ and $y$ is identical to $x$; the formula $x \wedge y = y$ is true if the intersection of $x$ and $y$ is identical to $y$. The correspondence between such triples of boolean values and the RCC5 classification is given in the table below. Possible geometric interpretations are given in Figure 1 [BS00].
x ∧ y ≠ ⊥    x ∧ y = x    x ∧ y = y    RCC5
F            F            F            DR
T            F            F            PO
T            T            F            PP
T            F            T            PPi
T            T            T            EQ
The set of triples is partially ordered by defining $(a_1, a_2, a_3) \le (b_1, b_2, b_3)$ iff $a_i \le b_i$ for $i = 1, 2, 3$, where the boolean values are ordered by F < T. [BS00] refer to the Hasse diagram of the partially ordered set (the right diagram in Figure 1) as the RCC5 lattice.
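[Figure 1: left, possible geometric interpretations of DR(x,y), PO(x,y), PP(x,y), PPi(x,y), and EQ(x,y); right, the RCC5 lattice with the triples FFF (DR), TFF (PO), TTF (PP), TFT (PPi), and TTT (EQ).]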
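As an illustration, the triple-based classification can be sketched in Python. The sketch assumes that regions are modelled as non-empty finite sets of points, so that the meet is set intersection and the empty set plays the role of the empty region. The names `meet_triple`, `rcc5`, and `triple_leq` are hypothetical and are not taken from [BS00].

```python
# Sketch under assumptions: regions are non-empty frozensets of points,
# the meet is set intersection, and the empty set is the empty region.
RCC5_TABLE = {
    (False, False, False): "DR",
    (True, False, False): "PO",
    (True, True, False): "PP",
    (True, False, True): "PPi",
    (True, True, True): "EQ",
}


def meet_triple(x: frozenset, y: frozenset) -> tuple:
    """Return the triple (meet is non-empty, meet equals x, meet equals y)."""
    meet = x & y
    return (bool(meet), meet == x, meet == y)


def rcc5(x: frozenset, y: frozenset) -> str:
    """Look up the RCC5 relation that corresponds to the meet triple."""
    if not x or not y:
        raise ValueError("regions must be non-empty")
    return RCC5_TABLE[meet_triple(x, y)]


def triple_leq(a: tuple, b: tuple) -> bool:
    """Component-wise ordering of triples, with F < T."""
    return all(p <= q for p, q in zip(a, b))
```

For instance, `rcc5(frozenset({1}), frozenset({1, 2}))` evaluates the triple (T, T, F) and therefore yields PP, and `triple_leq` reproduces the ordering that underlies the RCC5 lattice.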
3.2 Relations between temporal regions
Consider the maximal connected one-dimensional regions $x$ and $y$, i.e., intervals. Boundary-insensitive topological relations between intervals $x$ and $y$ on a directed line (RCC relations) can be determined by considering a triple of values belonging to the set {FLO, FLI, T, FRI, FRO}.
[Equations not recoverable from the source: the case definitions of the three components of the triple, which take the values FLO, FRO, FLI, FRI, or T, and of the auxiliary operations they refer to.]
Here [...] ([...]) is the one-dimensional region occupying the whole line left (right) of [...] (footnote 5). The intuition behind FLO (FRO) is that [...] is false because of parts of [...] sticking out to the left (right) of [...]. The intuition behind FLI (FRI) is that [...] is false because of parts of [...] sticking out to the right (left).
The triples formally describe jointly exhaustive and pairwise disjoint relations under the assumption that $x$ and $y$ are intervals in a one-dimensional directed space. The correspondence between the triples and the boundary-insensitive relations between intervals is given in the table below. Possible geometric interpretations of the defined relations are given in Figure 2.
1st component    2nd component    3rd component    RCC
FLO              FLO              FLO              DRL
FRO              FRO              FRO              DRR
T                FLO              FLO              POL
T                FRO              FRO              POR
T                T                FLI              PPL
T                T                FRI              PPR
T                FLI              T                PPiL
T                FRI              T                PPiR
T                T                T                EQ
For example, the relation DRL holds if $x$ and $y$ do not overlap and $x$ is left of $y$; POL holds if $x$ and $y$ partly overlap and the non-overlapping parts of $x$ are left of $y$; PPL holds if $x$ is contained in $y$ but does not cover the very right parts of $y$; PPiL holds if $y$ is a part of $x$ and there are parts of $x$ sticking out to the left of $y$; PPR holds if $x$ is a part of $y$ and does not cover the very left parts of $y$; PPiR holds if $y$ is a part of $x$ and there are parts of $x$ sticking out to the right of $y$. Assuming the ordering FLO < FLI < T < FRI < FRO, a lattice is formed, which has (FLO, FLO, FLO) as minimal element and (FRO, FRO, FRO) as maximal element. The ordering is indicated by the arrows in Figure 2.
Footnote 5: I use the spatial metaphor of a line extending from the left to the right rather than the time-line extending from the past to the future in order to focus on the aspects of the time-line as a one-dimensional directed space. Time itself is much more difficult.
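[Figure 2: possible geometric interpretations of DRL(x,y), POL(x,y), PPL(x,y), PPiL(x,y), EQ, PPiR(x,y), PPR(x,y), POR(x,y), and DRR(x,y), with arrows indicating the ordering.]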
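The verbal descriptions of the nine interval relations can be turned into a small classifier. The following Python sketch is only an illustration under stated assumptions: an interval is a pair `(start, end)` with `start < end`, boundaries are ignored (intervals that merely touch count as disconnected), and an interval lying strictly inside the other is reported as PPL or PPiL, which is a tie-break chosen here and not taken from the paper. The function name `interval_relation` is hypothetical.

```python
# Sketch under assumptions: an interval is a pair (start, end) with start < end;
# boundaries are ignored, so intervals that merely touch count as disconnected.
def interval_relation(x, y):
    """Classify two intervals into one of the nine boundary-insensitive relations."""
    xs, xe = x
    ys, ye = y
    if xe <= ys:
        return "DRL"  # no overlap, x lies left of y
    if ye <= xs:
        return "DRR"  # no overlap, x lies right of y
    if (xs, xe) == (ys, ye):
        return "EQ"
    if ys <= xs and xe <= ye:
        # x is a proper part of y.
        # Assumption: an x strictly inside y is reported as PPL.
        return "PPL" if xe < ye else "PPR"
    if xs <= ys and ye <= xe:
        # y is a proper part of x.
        # Assumption: a y strictly inside x is reported as PPiL.
        return "PPiL" if xs < ys else "PPiR"
    # Partial overlap: the non-overlapping part of x lies left or right of y.
    return "POL" if xs < ys else "POR"
```

Sliding `x` from left to right over a fixed `y` passes through DRL, POL, the part relations, POR, and DRR, which matches the left-to-right reading of the ordering shown in Figure 2.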
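Let $o_1$ and $o_2$ be two spatio-temporal objects at rest with spatio-temporal locations $(x^t, x^s)$ and $(y^t, y^s)$, where $x^t$ and $y^t$ are time intervals, i.e., maximally connected temporal regions, and $x^s$ and $y^s$ are arbitrary, possibly scattered, 2D regions. The spatio-temporal relation between $o_1$ and $o_2$ can be described using the following pair of triples: the triple of Section 3.2 evaluated for the time intervals $x^t$ and $y^t$, and the RCC5 triple $(x^s \wedge y^s \neq \bot,\; x^s \wedge y^s = x^s,\; x^s \wedge y^s = y^s)$ evaluated for the spatial regions.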
The relationship between those pairs of triples and spatio-temporal relations is given in the following table:
temporal triple    spatial triple    (RCC, RCC5)
FLO FLO FLO        F F F             (DRL, DR)
FLO FLO FLO        T F F             (DRL, PO)
...
T FLO FLO          T F F             (POL, PO)
T FLO FLO          T T F             (POL, PP)
T FLO FLO          T F T             (POL, PPi)
T FLO FLO          T T T             (POL, EQ)
T T FLI            F F F             (PPL, DR)
T T FLI            T F F             (PPL, PO)
...
T T T              T T T             (EQ, EQ)
FRO FRO FRO        F F F             (DRR, DR)
...
The Hasse diagram of the partially ordered set is called the (RCC, RCC5) lattice. For example, the relation (DRL, PO) is interpreted as follows: the spatial regions $x^s$ and $y^s$ partially overlap, and the relation between the time interval $x^t$ when $o_1$ rested in $x^s$ and the time interval $y^t$ when $o_2$ rested in $y^s$ is DRL. The scenario, i.e., the sequence of events, could be described as: $o_1$ changes to location $x^s$ and rests there during $x^t$. At some time in the future ($o_1$ has already left $x^s$), $o_2$ changes to $y^s$ such that PO($x^s$, $y^s$) holds.
The relation (POL, PO) is interpreted as follows: the spatial regions $x^s$ and $y^s$ partially overlap, and the relation between the time interval $x^t$ when $o_1$ rested in $x^s$ and the time interval $y^t$ when $o_2$ rested in $y^s$ is POL. The scenario is: $o_1$ changes to its location $x^s$ and rests there during $x^t$. While $o_1$ is resting in $x^s$, $o_2$ changes to $y^s$ such that PO($x^s$, $y^s$) holds. While $o_2$ is still resting in $y^s$, $o_1$ changes to another region. This new region may or may not overlap $y^s$.
Formally, for changing objects the same style of definition applies. Only the exact regions of rest of $o_1$ and $o_2$ during $x^t$ and $y^t$ are replaced by the paths of change of $o_1$ and $o_2$ during $x^t$ and $y^t$. In this case we do not describe the relation between the location of rest of $o_1$ during $x^t$ and the location of rest of $o_2$ during $y^t$, but the relation between the path of change of $o_1$ during $x^t$ and the path of change of $o_2$ during $y^t$. The relation (POL, PO) is then interpreted as follows: the path of change of $o_1$ during $x^t$ and the path of change of $o_2$ during $y^t$ partially overlap, and the relation POL holds between the time intervals $x^t$ and $y^t$. The interpretation of POL is that we started monitoring the path of $o_1$ earlier than the path of $o_2$ and finished monitoring the path of $o_1$ earlier than the path of $o_2$.
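For two objects at rest, the pair of relations can be computed component by component. The following Python sketch is a minimal illustration and assumes that a spatio-temporal region is a pair `(interval, cells)`, where `interval` is a `(start, end)` pair with `start < end` and `cells` is a non-empty finite set of points. All function names are hypothetical, and the treatment of an interval lying strictly inside the other (reported as PPL or PPiL) is an assumption made for this sketch.

```python
# Sketch under assumptions: a spatio-temporal region is a pair
# (time interval, spatial region) = ((start, end), frozenset of points).
RCC5_TABLE = {
    (False, False, False): "DR",
    (True, False, False): "PO",
    (True, True, False): "PP",
    (True, False, True): "PPi",
    (True, True, True): "EQ",
}


def temporal_relation(x, y):
    """Boundary-insensitive relation between two intervals (start, end)."""
    (xs, xe), (ys, ye) = x, y
    if xe <= ys:
        return "DRL"
    if ye <= xs:
        return "DRR"
    if (xs, xe) == (ys, ye):
        return "EQ"
    if ys <= xs and xe <= ye:
        return "PPL" if xe < ye else "PPR"    # assumption: strictly inside -> PPL
    if xs <= ys and ye <= xe:
        return "PPiL" if xs < ys else "PPiR"  # assumption: strictly inside -> PPiL
    return "POL" if xs < ys else "POR"


def spatial_relation(x, y):
    """RCC5 relation between two non-empty finite regions."""
    meet = x & y
    return RCC5_TABLE[(bool(meet), meet == x, meet == y)]


def spatio_temporal_relation(a, b):
    """Return the pair (temporal relation, spatial relation)."""
    (a_time, a_space), (b_time, b_space) = a, b
    return (temporal_relation(a_time, b_time), spatial_relation(a_space, b_space))
```

With `a = ((0, 2), frozenset({1, 2}))` and `b = ((3, 5), frozenset({2, 3}))` the first object has already left its region when the second one arrives at an overlapping region, which corresponds to the (DRL, PO) scenario described above.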
4 Rough approximations
Rough set theory [Paw82] provides a way of approximating subsets of a set when the set is equipped with a partition or equivalence relation. Given a set $S$ with a partition $\{a_i \mid i \in I\}$, an arbitrary subset $b \subseteq S$ can be approximated by a function $\varphi_b : I \to \{fo, po, no\}$. The value of $\varphi_b(i)$ is defined to be fo if $a_i \subseteq b$, it is no if $a_i \cap b = \emptyset$, and otherwise the value is po. The three values fo, po, and no stand respectively for "full overlap", "partial overlap" and "no overlap"; they measure the extent to which $b$ overlaps the elements of the partition of $S$.
4.1 Approximating spatial and temporal regions
[BS00] showed that regions of space and time can be described by specifying how they relate to a partition of space and time into cells which may share boundaries but which do not overlap. A region can then be described by giving the relationship between the region and each cell. Suppose a space $R$ of precise regions. By imposing a partition, $G$, on $R$ we can approximate elements of $R$ by elements of $\Omega^G$. That is, we approximate regions in $R$ by functions from $G$ to the set $\Omega = \{fo, po, no\}$. The function which assigns to each region $r \in R$ its approximation is denoted $\alpha : R \to \Omega^G$. The value of $(\alpha\, r)\, g$ is fo if $r$ covers all of the cell $g$, it is po if $r$ covers some but not all of the interior of $g$, and it is no if there is no overlap between $r$ and $g$. Each approximate region $X \in \Omega^G$ stands for a set of precise regions, i.e., all those precise regions having the approximation $X$. This set, which will be denoted $[[X]]$, provides a semantics for approximate regions: $[[X]] = \{r \in R \mid \alpha\, r = X\}$ [BS00].
4.2 The meet operation
The domain of regions is equipped with a meet operation interpreted as the intersection of regions. In the domain of approximation functions the meet operation between regions is approximated by a pair of greatest minimal, $\underline{\wedge}$, and least maximal, $\overline{\wedge}$, meet operations on approximation mappings [BS98]. Consider the operations $\underline{\wedge}$ and $\overline{\wedge}$ on the set {fo, po, no} that are defined as follows:
Greatest minimal meet:
      no    po    fo
no    no    no    no
po    no    no    po
fo    no    po    fo
Least maximal meet:
      no    po    fo
no    no    no    no
po    no    po    po
fo    no    po    fo
These operations extend to elements of $\Omega^G$ (i.e., the set of functions from $G$ to $\Omega$) by $(X \underline{\wedge} Y)\, g = (X g) \underline{\wedge} (Y g)$ and $(X \overline{\wedge} Y)\, g = (X g) \overline{\wedge} (Y g)$.
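The approximation function and the two meet tables can be illustrated with a short Python sketch. It assumes a finite space of points and a partition given as a dictionary from cell names to non-empty, pairwise disjoint sets of points. The names `approximate`, `min_meet`, and `max_meet` are hypothetical and chosen only for this illustration.

```python
# Sketch under assumptions: space is a finite set of points and the partition
# maps hypothetical cell names to non-empty, pairwise disjoint frozensets.
VALUES = ("no", "po", "fo")

# Rows are indexed by the first argument, columns by the second (order: no, po, fo).
MIN_ROWS = {"no": ("no", "no", "no"), "po": ("no", "no", "po"), "fo": ("no", "po", "fo")}
MAX_ROWS = {"no": ("no", "no", "no"), "po": ("no", "po", "po"), "fo": ("no", "po", "fo")}


def approximate(region, partition):
    """Approximate a region by one of fo, po, no per partition cell."""
    approx = {}
    for name, cell in partition.items():
        common = region & cell
        if common == cell:
            approx[name] = "fo"  # the region covers the whole cell
        elif common:
            approx[name] = "po"  # the region covers part of the cell
        else:
            approx[name] = "no"  # the region misses the cell
    return approx


def _meet(rows, x, y):
    """Apply a meet table cell by cell to two approximations."""
    return {g: rows[x[g]][VALUES.index(y[g])] for g in x}


def min_meet(x, y):
    """Greatest minimal meet of two approximations."""
    return _meet(MIN_ROWS, x, y)


def max_meet(x, y):
    """Least maximal meet of two approximations."""
    return _meet(MAX_ROWS, x, y)
```

The two meets differ only where both approximations have the value po for the same cell: two regions that each partially overlap a cell may or may not intersect inside it, so the minimal meet reports no and the maximal meet reports po.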
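4.3 Approximating spatio-temporal regions
Spatio-temporal regions are pairs, $(x^t, x^s)$, consisting of a spatial component, $x^s$, and a temporal component, $x^t$. Both components can be approximated separately by approximation functions, $X^t$ and $X^s$, with respect to partitions $G^t$ and $G^s$ of time and space, as described above. Consequently, an approximate spatio-temporal region is a pair $(X^t, X^s)$. Each approximate spatio-temporal region stands for a set of precise spatio-temporal regions, i.e., all those precise regions having the approximation $(X^t, X^s)$. This set, which will be denoted $[[(X^t, X^s)]]$, provides a semantics for approximate spatio-temporal regions:
$[[(X^t, X^s)]] = \{(x^t, x^s) \mid x^t \in R^t,\ x^s \in R^s,\ x^t \in [[X^t]] \text{ and } x^s \in [[X^s]]\}$
where $R^t$ denotes the set of regions of the time-line and $R^s$ denotes the regions of the plane. The greatest minimal and least maximal meet operations between approximations of spatial and temporal regions generalize in the natural way to approximations of spatio-temporal regions, i.e., component by component:
$(X^t, X^s) \underline{\wedge} (Y^t, Y^s) = (X^t \underline{\wedge} Y^t,\ X^s \underline{\wedge} Y^s)$ and $(X^t, X^s) \overline{\wedge} (Y^t, Y^s) = (X^t \overline{\wedge} Y^t,\ X^s \overline{\wedge} Y^s)$.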
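In the context of approximate regions, the bottom element, $\bot$, is the function from the partition $G$ to the set $\Omega = \{fo, po, no\}$ which takes the value no for every element of $G$. Each of the above triples provides an RCC5 relation, so the relation between $X$ and $Y$ can be measured by a pair of RCC5 relations. These relations will be denoted by $R_{\min}(X, Y)$ and $R_{\max}(X, Y)$. The pairs $(R_{\min}(X, Y), R_{\max}(X, Y))$ which can occur are all pairs $(a, b)$ where $a \le b$, with the exception of (PP, EQ) and (PPi, EQ) [BS00].
Consider the ordering of the RCC5 lattice. The relation $R_{\min}(X, Y)$ is the minimal relation and the relation $R_{\max}(X, Y)$ is the maximal relation that can hold between $x \in [[X]]$ and $y \in [[Y]]$. For all relations $R$ with $R_{\min}(X, Y) \le R \le R_{\max}(X, Y)$ there are $x \in [[X]]$ and $y \in [[Y]]$ such that $R(x, y)$ [BS00].
5.2 Syntactic generalization of relations between temporal intervals
In order to generalize the above formulation of RCC relations to relations between approximations of temporal intervals we need to define operations on approximations, written $X \ll Y$ and $X \gg Y$, corresponding to the operations on intervals used in Section 3.2. The behavior of $X \ll Y$ is shown in Figure 3. Formally, $X \ll Y$ takes one of the values T, M, or F:
[Equation not recoverable from the source: case definition of $X \ll Y$ of the form "T if ..., M if ..., F otherwise".]
The operation $X \gg Y$ is defined similarly. The definitions use two auxiliary operations, which yield the approximation of the part of the time-line left of $X$ and the approximation of the part of the time-line right of $X$, respectively. Formally, these are defined as follows. Firstly, we define the complement operation $'$ with $no' = fo$, $po' = po$, and $fo' = no$. Assuming that partition cells are numbered in increasing order in the direction of the underlying space, we secondly define the two auxiliary operations cell by cell:
[Equations not recoverable from the source: two case definitions in which cells receive the value no under a condition on the cell index.]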
[Figure 3: behavior of the operation $\ll$ on approximations, with panels X << Y = T, X << Y = T, X << Y = M, and X << Y = F.]
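The pair of minimal and maximal RCC5 relations between two approximate regions, discussed before Section 5.2, can be sketched in Python. The sketch assumes, following [BS00], that the two triples are obtained by evaluating the three meet formulas once with the greatest minimal meet and once with the least maximal meet. It further assumes that approximations are dictionaries over the same cell names and that neither approximation is the bottom element. The name `rcc5_pair` and the helpers are hypothetical.

```python
# Sketch under assumptions: approximations are dicts over the same cell names
# with values in {"no", "po", "fo"}; neither argument is the bottom element.
VALUES = ("no", "po", "fo")
MIN_ROWS = {"no": ("no", "no", "no"), "po": ("no", "no", "po"), "fo": ("no", "po", "fo")}
MAX_ROWS = {"no": ("no", "no", "no"), "po": ("no", "po", "po"), "fo": ("no", "po", "fo")}
RCC5_TABLE = {
    (False, False, False): "DR",
    (True, False, False): "PO",
    (True, True, False): "PP",
    (True, False, True): "PPi",
    (True, True, True): "EQ",
}


def _meet(rows, x, y):
    """Apply a meet table cell by cell."""
    return {g: rows[x[g]][VALUES.index(y[g])] for g in x}


def _triple(rows, x, y):
    """Evaluate (meet is not bottom, meet equals X, meet equals Y)."""
    m = _meet(rows, x, y)
    bottom = {g: "no" for g in x}
    return (m != bottom, m == x, m == y)


def rcc5_pair(x, y):
    """Return (minimal relation, maximal relation) for two approximations."""
    bottom = {g: "no" for g in x}
    if x == bottom or y == bottom:
        raise ValueError("approximations of non-empty regions expected")
    return (RCC5_TABLE[_triple(MIN_ROWS, x, y)], RCC5_TABLE[_triple(MAX_ROWS, x, y)])
```

For two regions that both partially overlap a single cell, `rcc5_pair({"g1": "po"}, {"g1": "po"})` gives the most indeterminate pair, with DR as minimal and EQ as maximal relation.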
We need two more operations on approximations. For the first, the value T means that $X$ is contained in $Y$ and does not cover the very right parts of $Y$; for the second, the value T is interpreted as $X$ is contained in $Y$ and does not cover the very left parts of $Y$. The behavior is shown in Figure 4. Formally we define a set [...], containing elements of the co-domain of [...], and the operation [...]:
[Equation not recoverable from the source: case definition of the form "T if ... fo, M if ... po, F otherwise".]
The second operation is defined respectively by replacing [...] by [...] in the definition of [...].
[Figure 4: behavior of the operation written X |> Y, with panels X |> Y = T, X |> Y = T, X |> Y = M, and X |> Y = F.]
We are now able to generalize the above formulation of RCC relations to relations between approximations. Let $X$ and $Y$ be boundary-insensitive approximations of temporal intervals. We can consider two triples of values.
[Equations not recoverable from the source: the two triples, the case definitions of their components, and the definitions of the auxiliary check functions, including leftCheck.]
Both functions assume that [...] is contained in [...]. The behavior of leftCheck is shown in Figure 5. The definitions of the remaining functions are obtained by replacing [...] by [...] in the definitions of [...].
[Figure 5: behavior of leftCheck, with panels leftCheck(Y,X)=T, leftCheck(Y,X)=T, leftCheck(Y,X)=F, and leftCheck(Y,X)=T.]
Each of the above triples defines an RCC relation, so the relation between $X$ and $Y$ can be measured by a pair of RCC relations. These relations will be denoted by [...] and [...].
Theorem 1 The pairs that can occur are all pairs where [...] EQ and EQ [...], with the exception of (PPL, EQ), (PPR, EQ), (PPiL, EQ), (PPiR, EQ), and (EQ, DRR).
Proof The pairs (PPL, EQ), (PPR, EQ), (PPiL, EQ), and (PPiR, EQ) cannot occur since RCC relations are refinements of RCC5 relations and the pairs (PP, EQ) and (PPi, EQ) cannot occur in the RCC5 case [BS00]. The pair (EQ, DRR) cannot occur due to the non-symmetry of the underlying definitions. In order to generate all remaining pairs, approximations of time intervals in regional partitions consisting of at least three elements need to be considered. A Haskell [Tho99] program generating all remaining pairs of relations between approximations with respect to a partition consisting of three intervals can be found at [Bit00b].
5.3 Semantic generalization of relations between temporal intervals
At the semantic level we consider how the syntactically generated pairs (footnote 7) relate to relations between the approximated regions $x$ and $y$. The aim is that the syntactically generated pairs constrain the possible relations that can hold between the approximated intervals $x$ and $y$ [BS00]: [...]
We proceed by considering all pairs containing the relation EQ. Consider configuration (a) in Figure 6, which represents the most indeterminate case. The syntactic approach described above yields the pair (DRL, EQ). Since in this kind of configuration the pair (DRL, EQ) is consistent with (EQ, DRR) and (DRL, EQ) was chosen arbitrarily, (DRL, EQ) is corrected syntactically to (DRL, DRR).
Consider configuration (b) in Figure 6. The syntactic approach yields the pair (DRL, EQ), which is not correct if $x$ and $y$ are intervals as depicted. Notice that the meet operations were originally defined for arbitrary regions, not for one-dimensional intervals. Assuming $x$ and $y$ to be (time) intervals, the outcome of the minimal meet must not be empty. This needs to be taken into account in the definition of the minimal relation. Let $X$ and $Y$ be boundary-insensitive approximations of time intervals:
[Equation not recoverable from the source: corrected case definition of the form "... if ..., otherwise ...", with conditions on approximation values PO.]
Applying [...] to $X$ and $Y$ in Figure 6 (b) yields [...] EQ as minimal relation. But [...] EQ still does not characterize Figure 6 (b) correctly, since between $x$ and $y$ the relations POL [...] EQ [...] POR can hold. Consider also Figure 6 (c), for which the operations defined above yield (POL, EQ), but the approximations $X$ and $Y$ are also consistent with POR for $x$ and $y$. Consequently, if [...] EQ and [...] DRL and the leftmost or the rightmost non-empty approximation values of $X$ and $Y$ have the value PO, then the RCC relation between $X$ and $Y$ is (POL, POR). This also applies to the configuration in Figure 6 (d). The corrected relations are denoted [...] and [...].
Footnote 7: In [...] write [...] instead of [...].
[Fig. 6. Configurations characterized by pairs containing the relation EQ, with panels (a) <DRL,EQ>(X,Y), (b) <DRL,EQ>(X,Y), (c) <POL,EQ>(X,Y), and (d) <POL,EQ>(X,Y); one approximation is shown above the time-line and the other below the time-line.]
Finally, consider the configuration (a) in Figure 7. Our definitions yield (PPiL, PPiL), but the approximations $X$ and $Y$ are also consistent with PPiR but not with EQ for $x$ and $y$. The interval $y$ can cover the very right part of $x$, or the very left part of $x$, or some part in the middle. The same holds if we switch $X$ and $Y$ (Figure 7 (b)): our definitions yield (PPL, PPL), but the approximations $X$ and $Y$ are also consistent with PPR but not with EQ. Consider Figure 7 (c) and (d). These cases are different: assuming that $x$ and $y$ are intervals, $x$ can cover neither the very left of $y$ nor the very right of $y$. Consequently, the configuration is consistent with both (PPL, PPL) and (PPR, PPR), but in these cases it is acceptable to choose one, since $x$ must cover parts in the middle of $y$.
[Fig. 7. Configurations characterized by pairs (PPL, PPL) or (PPiL, PPiL), with panels (a) <PPiL,PPiL>(X,Y), (b) <PPL,PPL>(X,Y), (c) <PPL,PPL>(X,Y), and (d) <PPL,PPL>(X,Y); one approximation is shown above the time-line and the other below the time-line.]
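The cases depicted in Figure 7 (a) and (b) need to be handled separately. For them the theorem below does not hold. The problem disappears if boundary-sensitive approximations are used. For all other cases we state Theorem 2.
Theorem 2 The corrected minimal relation is the minimal relation and the corrected maximal relation is the maximal relation that can hold between intervals $x$ and $y$ approximated by $X$ and $Y$. For every relation $R$ that lies between the corrected minimal and the corrected maximal relation in the ordering, there are intervals $x$ and $y$ with the approximations $X$ and $Y$ such that $R(x, y)$ holds.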
Proof RCC relations are refinements of RCC5 relations. Figure 2 shows that the RCC lattice can be separated into the left and the right RCC5 sub-lattices (from DRL to EQ and from EQ to DRR). Theorem 1 tells us that our syntactic procedure yields minimal and maximal relation pairs that belong either to the left RCC5 sub-lattice or to the right RCC5 sub-lattice. It also tells us that the generated pairs are the same pairs that occur in the RCC5 case. Consequently, with the exception of the special cases discussed above, Theorem 2 of [BS00] applies, stating that the syntactic approach constrains the right set of relations. What remains to be shown is that the theorem holds for the special cases (DRL, DRR) and (POL, POR).
The case (DRL, DRR) occurs in configurations where the syntactic procedure yields (DRL, EQ), i.e., in configurations that are equivalent to the configuration in Figure 6 (a). DRL and DRR are trivially minimal and maximal, and it is easy to verify that all relations $R$ with DRL $\le R \le$ DRR can actually occur for intervals $x$ and $y$ approximated by $X$ and $Y$. The case (POL, POR) occurs in configurations where the syntactic procedure yields [...] EQ and [...] DRL and where the leftmost or the rightmost non-empty approximation values of $X$ and $Y$ have the value PO, i.e., in configurations that are similar to the configurations in Figure 6 (b-d). It is easy to verify that exactly the relations $R$ with POL $\le R \le$ POR can actually occur for $x$ and $y$.
5.4 Approximating topological relations between spatio-temporal objects
Based on relations between approximations of spatial regions and relations between approximations of temporal regions we now define relations between approximations of spatio-temporal regions. Let $o_1$ and $o_2$ be two spatio-temporal objects at rest with spatio-temporal locations $(x^t, x^s)$ and $(y^t, y^s)$, with approximations $(X^t, X^s)$ and $(Y^t, Y^s)$. Consider the following structure: [...]
Each component of the above pair of pairs of triples defines a spatio-temporal relation, (RCC, RCC5). So the relation between the two approximations can be measured by a pair of spatio-temporal relations, a minimal and a maximal one. The pairs that can occur are exactly those that can occur in the separate treatment of approximations of RCC5 and RCC relations. Consequently, relations between approximations of spatio-temporal regions are represented by pairs of minimal and maximal spatio-temporal relations such that the minimal relation is the least spatio-temporal relation and the maximal relation is the largest spatio-temporal relation that can hold between spatio-temporal regions having these approximations.
6 Conclusions
In this paper I defined spatio-temporal regions as pairs consisting of a spatial and a temporal component. I defined topological relations between spatio-temporal regions based on topological relations between the spatial and temporal components. Approximations of spatio-temporal regions were defined using approximations of their spatial and temporal components. I defined topological relations between approximations of spatio-temporal regions based on a specific style that allows relations between spatio-temporal regions to be defined exclusively in terms of constraints on the outcome of the meet operation. The proposed framework can be used to describe spatial configurations based on approximate descriptions of spatio-temporal objects and relations between those approximations. Such approximate descriptions can be obtained much more easily from observations of reality than exact descriptions. The formalism discussed in this paper deals only with boundary-insensitive topological relations between spatio-temporal regions. It can easily be extended to boundary-sensitive relations using the formalisms proposed in [BS00] and [Bit00a].
References
[All83] J.F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, 1983.
[BF95] Peter Burrough and Andrew U. Frank, editors. Geographic Objects with Indeterminate Boundaries. GISDATA Series II. Taylor and Francis, London, 1995.
[Bit99] T. Bittner. On ontology and epistemology of rough location. In Spatial information theory - Cognitive and computational foundations of geographic information science, COSIT 99, number 1661 in Lecture Notes in Computer Science, Hamburg, Germany, 1999. Springer Verlag.
[Bit00a] T. Bittner. Approximate temporal reasoning. In Workshop proceedings of the Seventeenth National Conference on Artificial Intelligence, AAAI 2000, 2000.
[Bit00b] T. Bittner. A Haskell program generating all possible relations between boundary insensitive approximations of time intervals. http://www.cs.queensu.ca/~bittner, 2000.
[BNSNT 95] J. Bazan, H. Nguyen Son, T. Nguyen Trung, A. Skowron, and J. Stepaniuk. Application of modal logics and rough sets for classifying objects. In M. De Glas and Z. Pawlak, editors, Proceedings of the Second World Conference on Fundamentals of Artificial Intelligence (WOCFAI95), pages 15-26, Paris, 1995.
[BS98] T. Bittner and J. G. Stell. A boundary-sensitive approach to qualitative location. Annals of Mathematics and Artificial Intelligence, 24:93-114, 1998.
[BS00] T. Bittner and J. Stell. Rough sets in approximate spatial reasoning. In Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing (RSCTC2000). Springer Verlag, 2000.
[CV95] R. Casati and A. Varzi. The structure of spatial localization. Philosophical Studies, 82(2):205-239, 1995.
[EF91] Max J. Egenhofer and Robert D. Franzosa. Point-set topological spatial relations. International Journal of Geographical Information Systems, 5(2):161-174, 1991.
[EFG97] M. J. Egenhofer, D.M. Flewelling, and R.K. Goyal. Assessment of scene similarity. Technical report, University of Maine, Department of Spatial Information Science and Engineering, 1997.
[Gea66] P. Geach. Some problems about time. Proceedings of the British Academy, 11, 1966.
[GFA93] J. Glasgow, S. Fortier, and F.H. Allen. Molecular scene analysis: Crystal structure determination through imagery. In L. Hunter, editor, Artificial Intelligence and Molecular Biology. AAAI/MIT Press, 1993.
[LC97] T.Y. Lin and N. Cercone, editors. Rough Sets and Data Mining. Analysis of Imprecise Data. Kluwer Academic Publishers, Boston, Dordrecht, 1997.
[Lin95] T.Y. Lin, editor. Proceedings of the Workshop on Rough Sets and Data Mining at 23rd Annual Computer Science Conference, Nashville, Tennessee, 1995.
[NSR92] R. Nowicki, R. Slowinski, and J. Stefanowski. Rough sets analysis of diagnostic capacity of vibroacoustic symptoms. Journal of Computers and Mathematics with Applications, 1992.
[Paw82] Z. Pawlak. Rough sets. Internat. J. Comput. Inform., 11:341-356, 1982.
[RCC92] D. A. Randell, Z. Cui, and A. G. Cohn. A spatial logic based on regions and connection. In 3rd Int. Conference on Knowledge Representation and Reasoning. Boston, 1992.
[Sim87] P. Simons. Parts, A Study in Ontology. Clarendon Press, Oxford, 1987.
[SV97] B. Smith and A. Varzi. Fiat and bona fide boundaries: Towards an ontology of spatially extended objects. In S. Hirtle and A. Frank, editors, Spatial Information Theory: A Theoretical Basis for GIS, International Conference COSIT 97, Laurel Highlands, PA, volume 1329 of Lecture Notes in Computer Science, pages 103-119. Springer-Verlag, Berlin, 1997.
[Tho99] Simon Thompson. Haskell: The Craft of Functional Programming. Addison-Wesley, 2nd edition, 1999.