14-Temporal and Spatial Mining-13!11!2024
Outline
• Introduction
• Active Spatial Data Mining
– Spatial data mining trigger
• Temporal Association Rule with Numerical
Attributes
– Correlation among object evolutions
• Conclusions and Future Work
Introduction
• Huge amounts of spatial data are generated every day.
– Earth Observing System
– National Spatial Data Infrastructure
– National Imagery and Mapping Agency
– One meter resolution data
– Digital earth
• Users are usually interested in the hidden information.
– Aggregate information
– Clustering
– Patterns
Introduction
• Introduction
• STING+: An approach to active spatial data
mining
• Temporal association rules with numerical
attributes
• Conclusions and Future Work
STING+
• Since data evolves over time, interesting patterns are likely
to emerge or change.
• Goal: identify and find (most) interesting patterns
• Problems:
– Knowledge discovery processes are expensive.
It is not feasible to re-process the entire data set for every change.
– Periodically examining the data instead has drawbacks:
• Long delays
• Transient patterns might be missed
STING+
• STING+ was designed to introduce and support spatial
triggers efficiently.
• Observation (spatial locality): Only objects added to the
shaded area will contribute to the growth of cluster size at
this moment.
[Figure: a cluster of points with the shaded area around it]
STING+
• STING+ Strategy: Monitor only the area occupied by
potential clusters and their neighborhoods.
[Figure: monitored area shown at Level 1 and Level 2 of the hierarchy]
STING+
– STING+ decomposes a trigger into a set of sub-triggers
associated with individual cells in the hierarchical
structure to monitor the cumulative effect of data
changes within the cell.
[Figure: a sub-trigger on a cell (Level 4) and a higher level sub-trigger on a cell (Level 3)]
STING+
– Updates/insertions are suspended at various levels in
the hierarchy until such time that the cumulative effect
of these insertions might cause the trigger condition to
become satisfied.
[Figure: inserted points (+) shown at Level 0 and Level 1]
STING+
[Figure: inserted points (+) shown at Level 2 and Level 3]
No update of cluster!
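The suspension of insertions can be illustrated with a minimal Python sketch. The Cell class, its slack value, and the cell names are assumptions made for illustration; they are not STING+'s actual data structures. Each cell absorbs insertions silently and passes them to its parent only once the accumulated count reaches the cell's slack.

```python
class Cell:
    """One cell of a hypothetical grid hierarchy (illustrative names only)."""

    def __init__(self, name, slack, parent=None):
        self.name = name
        self.slack = slack    # insertions this cell can absorb silently
        self.pending = 0      # insertions currently suspended at this cell
        self.parent = parent

    def insert(self, n=1):
        """Record n insertions; return the names of cells whose sub-trigger fired."""
        fired = []
        self.pending += n
        if self.pending >= self.slack:
            fired.append(self.name)
            batch, self.pending = self.pending, 0
            if self.parent is not None:
                # Pass the accumulated insertions up one level.
                fired.extend(self.parent.insert(batch))
        return fired


root = Cell("root", slack=5)
leaf = Cell("leaf", slack=2, parent=root)
print(leaf.insert())    # [] : suspended at the leaf
print(leaf.insert())    # ['leaf'] : passed up, suspended at the root
print(leaf.insert(3))   # ['leaf', 'root'] : the trigger condition must now be checked
```

In this sketch the trigger condition would be re-evaluated only when the root's sub-trigger fires; every other insertion costs a single counter update.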
STING+
• Primitive event: insertion, deletion, update
• Composite event: a set of primitive events
• Evaluating a trigger T usually involves two aspects:
– Find a set of composite events E(s) that may cause the trigger
condition CT to become true.
– Each time some composite event in E(s) occurs, check the status
(false or true) of CT (given that CT was false previously).
• Observation: As a side effect of the occurrence of some
composite event, E(s) might also evolve over time.
STING+
• STING+ Strategy: Two sets of composite events are considered:
– the set of composite events E(s) that can cause CT to become true
• need to re-evaluate CT
– the set of composite events F(s) that can cause a change to E(s)
• need to update E(s)
– The sub-triggers are used to monitor composite events in E(s) and F(s) and
change accordingly when E(s) and F(s) evolve.
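The two event sets can be sketched in Python under strong simplifying assumptions: a 2-D grid, a trigger that fires when the cluster holds at least fire_at points, and a cell that joins the cluster once it holds dense_at points. The ClusterTrigger class and all of its parameter names are hypothetical. Insertions into watched cells play the role of E(s), since they force a re-check of the condition; a border cell turning dense plays the role of F(s), since it changes which cells are watched.

```python
def neighbors(cell):
    """Four grid cells adjacent to `cell` (a hypothetical 2-D grid)."""
    x, y = cell
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}


class ClusterTrigger:
    """Fires when the number of points in the cluster reaches `fire_at`."""

    def __init__(self, cluster_cells, counts, dense_at, fire_at):
        self.cluster = set(cluster_cells)  # cells currently in the cluster
        self.counts = dict(counts)         # points per cell
        self.dense_at = dense_at           # points needed for a cell to join
        self.fire_at = fire_at             # cluster size that fires the trigger

    def watched(self):
        """Cells where an insertion can matter: the cluster and its border."""
        border = set().union(*(neighbors(c) for c in self.cluster))
        return self.cluster | border

    def insert(self, cell):
        self.counts[cell] = self.counts.get(cell, 0) + 1
        if cell not in self.watched():
            return "ignored"               # no re-evaluation needed
        if cell not in self.cluster and self.counts[cell] >= self.dense_at:
            self.cluster.add(cell)         # the watched area itself changes
        size = sum(self.counts.get(c, 0) for c in self.cluster)
        return "fired" if size >= self.fire_at else "checked"


t = ClusterTrigger({(0, 0)}, {(0, 0): 3}, dense_at=2, fire_at=6)
print(t.insert((5, 5)))   # far from the cluster
print(t.insert((1, 0)))   # border cell, not yet dense
```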
STING+
• Introduction
• STING+: An approach to active spatial data
mining
• Temporal association rules with numerical
attributes
• Conclusions and Future Work
Temporal Association Rules
• Now we are considering general databases with evolving
numerical attributes.
• Interesting patterns exhibited in the data are often
numerous and complicated.
– Customer churn: If a customer's phone bill increases by at
least $10 each month for six months, then the customer is likely to
change long distance telephone carriers.
– Real estate: People who receive a raise of at least 20% of their
salary are likely to move away from big cities.
• Such patterns can be represented by association rules of
the form X ⇒ Y, which indicates that the occurrences of X
and Y are highly correlated.
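As a small illustration of what such a pattern looks like in data, the left-hand side of the customer example can be checked against one customer's monthly bills. This is a sketch only: the function name and the reading of "six months" as six consecutive month-to-month increases are assumptions.

```python
def rises_each_month(bills, min_increase=10, months=6):
    """True if the bill rose by at least `min_increase` in each of
    `months` consecutive month-to-month steps."""
    run = 0
    for prev, cur in zip(bills, bills[1:]):
        # Extend the current run of qualifying increases, or reset it.
        run = run + 1 if cur - prev >= min_increase else 0
        if run >= months:
            return True
    return False


print(rises_each_month([30, 40, 52, 65, 75, 90, 100]))  # True
print(rises_each_month([30, 40, 45, 60, 75, 90, 100]))  # False
```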
Temporal Association Rules
• Earlier work on association rules mainly focused on binary
attributes and intra-transaction relationship.
– E.g., ham ⇒ bread
– Support and strength are two metrics used to qualify
interesting rules.
• support: number of instances that follow the rule
– N(ham, bread)
• strength: how strong the correlation is
– N(ham, bread) / N(ham)
– N(ham, bread) / (N(ham) · N(bread))
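The metrics can be computed on a toy set of transactions. This is a minimal sketch: the transactions are invented, the function names are illustrative, and N(...) is read as a raw count of transactions.

```python
def count(items, transactions):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(x, y, transactions):
    """N(x, y): transactions containing both sides of the rule."""
    return count(x | y, transactions)


def strength_conditional(x, y, transactions):
    """N(x, y) / N(x). Assumes x occurs at least once."""
    return count(x | y, transactions) / count(x, transactions)


def strength_ratio(x, y, transactions):
    """N(x, y) / (N(x) * N(y)). Assumes x and y each occur at least once."""
    return count(x | y, transactions) / (
        count(x, transactions) * count(y, transactions)
    )


transactions = [
    {"ham", "bread"},
    {"ham", "bread", "milk"},
    {"ham"},
    {"bread"},
    {"milk"},
]
ham, bread = {"ham"}, {"bread"}
print(support(ham, bread, transactions))               # 2
print(strength_conditional(ham, bread, transactions))  # 2 / 3
print(strength_ratio(ham, bread, transactions))        # 2 / 9
```

If N is read as a relative frequency instead of a raw count, the last ratio is multiplied by the total number of transactions, so that a value above 1 indicates positive correlation.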
Temporal Association Rules
• Consider a set of objects, each of which has a unique ID
and a set of time-varying numerical attributes. A
sequence of snapshots is taken at some frequency.
– E.g., in an employee database, two attributes are considered: salary
and monthly housing expense.
– For a given snapshot, each employee can be mapped to a point in a
two dimensional space.
[Figure: a sequence of snapshots over time; in each snapshot, objects are plotted by salary and monthly_housing_expense]
Temporal Association Rules
• Observation: The domain of a numerical attribute might contain a
large number of distinct values and might even be continuous.
– E.g., domain(salary) = [50000, 60000].
– Any sub-range can appear in a rule.
– The number of possible rules may be very large if not infinite.
• Strategy: Each attribute domain is quantized into a set of equi-length
base intervals.
– The domain of salary could be quantized into base intervals of length
$2000, with boundaries at 50000, 52000, 54000, 56000, 58000, and 60000.
E1(salary) = [52000, 54000] → [52000, 54000] → [54000, 56000]
E2(salary) = [52000, 56000] → [52000, 54000] → [52000, 56000]
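The quantization step can be sketched in Python as follows. The function names are illustrative, and treating a value on an interior boundary as belonging to the upper interval is an assumption; the slide does not say how boundary values are assigned.

```python
def base_interval(value, low=50000, high=60000, width=2000):
    """Return the (start, end) base interval that contains `value`."""
    if not low <= value <= high:
        raise ValueError("value outside the attribute domain")
    n_intervals = (high - low) // width
    # Clamp so that value == high falls into the last interval.
    index = min(int((value - low) // width), n_intervals - 1)
    start = low + index * width
    return (start, start + width)


def evolution(values, low=50000, high=60000, width=2000):
    """Map one object's values across snapshots to base intervals."""
    return [base_interval(v, low, high, width) for v in values]


# A salary observed in three consecutive snapshots.
print(evolution([53000, 53500, 55000]))
# [(52000, 54000), (52000, 54000), (54000, 56000)]
```

The printed sequence has the same shape as E1(salary): one interval per snapshot.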
Temporal Association Rules
[Figure: evolution space, with labels Snapshot 1 and Snapshot 3]
Temporal Association Rules
– The subcube-supercube relationship defines a lattice
among all evolution cubes within the evolution space.
– This also holds for the evolution space of more than
one attribute.
[Figure: axes are salary (50000 to 60000) and monthly housing expense (1000 to 2000)]
Temporal Association Rules
• Some properties of the metrics enable us to search
efficiently through the lattice in a bottom-up manner.
[Figure: data points plotted against monthly housing expense (1000 to 2000); the vertical axis starts at 50000]
Temporal Association Rules
• Strategy: An interval can be included in a rule only if
at least some minimum number of objects have
attribute values that fall into that interval.
– The density of each base cube within the evolution cube of a rule
has to meet some threshold.
[Figure: base cubes with the objects that fall into them; min_density = 2]
– In the previous example, the second rule can be eliminated.
• Property of density: An evolution cube could satisfy the
density threshold only when all of its subcubes satisfy the
density threshold.
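The density test can be sketched in Python as follows. The representation is an assumption made for illustration: each object's history is a tuple of base-interval indices (one per snapshot), and an evolution cube is a tuple of inclusive index ranges. The data are invented.

```python
from collections import Counter
from itertools import product


def is_dense(cube, counts, min_density):
    """True if every base cube inside `cube` holds at least
    `min_density` objects.

    cube:   tuple of (lo, hi) inclusive index ranges, one per snapshot
    counts: Counter mapping a base cube (tuple of indices) to its object count
    """
    ranges = [range(lo, hi + 1) for lo, hi in cube]
    return all(counts[base] >= min_density for base in product(*ranges))


# Five objects observed over three snapshots of a single attribute.
histories = [(1, 1, 2), (1, 1, 2), (1, 1, 1), (1, 1, 1), (2, 1, 2)]
counts = Counter(histories)

narrow = ((1, 1), (1, 1), (2, 2))   # a single base cube
wide = ((1, 2), (1, 1), (1, 2))     # contains the empty base cube (2, 1, 1)
print(is_dense(narrow, counts, min_density=2))  # True
print(is_dense(wide, counts, min_density=2))    # False
```

Because the check requires every contained base cube to pass, a cube that fails can never be rescued by enlarging it, which is the property the search relies on.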
Temporal Association Rules
• General Model:
– Data set D
– Language L
• express properties or define subgroups of data
– Selection predicate q
• evaluate whether a sentence φ ∈ L defines a potentially interesting
class of D
– Task: find the set {φ ∈ L | q(D, φ) is true}
• If
– a lattice can be formed on sentences in L and
– partial order exists on selection predicate
• then the level-wise algorithm can be used to prune search
space efficiently.
Temporal Association Rules
• Temporal Association Rule:
– Language L: each sentence φ ∈ L is a temporal association rule.
– The selection predicate q(D, φ) is true iff
• support(D, φ) ≥ min_support (q1), and
• strength(D, φ) ≥ min_strength (q2), and
• density(D, φ) ≥ min_density (q3)
– Task: find the set of temporal association rules which satisfy all
three predicates.
• Specialization relation <: a lattice on the sentences in L
– subcube/supercube relationship
Temporal Association Rules
• partial order on qi with respect to < (θ < φ means that the evolution
cube of θ is a subcube of that of φ)
– support(D, θ) ≤ support(D, φ) if θ < φ
– if strength(D, θ) < min_strength for all θ < φ, then strength(D, φ)
< min_strength
– density(D, θ) ≥ density(D, φ) if θ < φ
• level-wise algorithm
– basic scheme: start from the most special (general) sentences,
then evaluate more and more general (special) sentences,
excluding those sentences that cannot be interesting given all the
information obtained in earlier iterations.
• Efficient space pruning
– Starting point
– Random sampling
– Order of predicate evaluation
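The basic level-wise scheme can be sketched for the simplest case: one attribute, one snapshot, so that every sentence is a contiguous range of base intervals. This is an illustration under stated assumptions, not the algorithm evaluated in these slides: the function name is invented, and the predicate q is assumed to hold for a range only if it holds for all of its sub-ranges.

```python
def levelwise(n_base, q):
    """Return every index range (lo, hi) over `n_base` base intervals for
    which q(lo, hi) is true, searching bottom-up from single intervals."""
    level = {(i, i) for i in range(n_base) if q(i, i)}
    found = set(level)
    while level:
        # A range one interval longer is a candidate only if both of its
        # immediate sub-ranges passed at the previous level.
        candidates = {
            (lo, hi + 1) for (lo, hi) in level if (lo + 1, hi + 1) in level
        }
        level = {c for c in candidates if q(*c)}
        found |= level
    return found


counts = [3, 2, 0, 4, 5]   # objects per base interval (invented data)
min_density = 2


def dense(lo, hi):
    """Every base interval in the range meets the density threshold."""
    return min(counts[lo:hi + 1]) >= min_density


print(sorted(levelwise(len(counts), dense)))
# [(0, 0), (0, 1), (1, 1), (3, 3), (3, 4), (4, 4)]
```

Ranges that span the empty interval at index 2 are never generated, so q is never evaluated on them; this is the pruning effect the scheme is after.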
Temporal Association Rules
• Efficiency of space pruning
– SR algorithm: after quantization, base intervals are combined as long as
their density satisfies the threshold. The original base intervals and the
combined intervals are treated as a set of items.
– Setting: 100000 objects, 100 snapshots, 5 attributes, 500 rules of length 5
– Thresholds: density = 2, support = 5%, strength = 1.4
Conclusions and Future Work
• Selected future work
– Data mining over the Internet
• data types
• networking issues
– Analytical model
• classify data mining problems
• devise efficient general approach
– Applications
• compiler/programming language
• WWW