TT02 Data, Methods, and Scenarios
Bonchi, F., Castillo, C., Donato, D., & Gionis, A. (2009). Taxonomy-driven lumping for sequence mining.
Data Mining and Knowledge Discovery, 19(2), 227-244.
Example: Functional regions in cities
H. Assem, B. Caglayan, T.S. Buda, D. O’Sullivan. ST-DenNetFus: Deep Spatio-Temporal Dense Networks for Network Demand Prediction.
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018
Problem types
Data mining methods try to find relationships
●Between columns
− Find associations, correlations, …
− If there is one key column: classification, prediction, ...
●Between rows
− Find clusters
− Detect outliers
Example: Association pattern mining
●Sparse binary databases representing, e.g., items a person is interested in
●The relative frequency of a pattern is its support
https://cs.nju.edu.cn/zlj/Course/DM_15.html
Association pattern mining (cont.)
●Given a binary n × d data matrix D,
− determine all subsets of columns such that all the values in these columns take on the value True for at least a fraction min_support of the rows in the matrix.
●The relative frequency of a pattern is referred to as its support
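The definition above can be read directly as a brute-force search. The Python sketch below is only an illustration, not a method given in the course: it assumes D is a list of rows of 0/1 values (1 meaning True), and the helper names `support` and `frequent_itemsets` are hypothetical. It enumerates column subsets by size and stops as soon as no subset of a given size is frequent, which is safe because adding a column can never increase support.

```python
from itertools import combinations


def support(matrix, cols):
    """Fraction of rows in which every column in `cols` is True (1)."""
    hits = sum(1 for row in matrix if all(row[c] for c in cols))
    return hits / len(matrix)


def frequent_itemsets(matrix, min_support):
    """Return {column subset: support} for every subset with support >= min_support."""
    d = len(matrix[0])
    result = {}
    for k in range(1, d + 1):
        found_any = False
        # Test every subset of k columns (exponential in d; fine for toy data).
        for cols in combinations(range(d), k):
            s = support(matrix, cols)
            if s >= min_support:
                result[cols] = s
                found_any = True
        # Downward closure: if no k-subset is frequent, no larger subset can be.
        if not found_any:
            break
    return result


# Hypothetical 4 x 3 binary matrix (rows = records, columns = items).
D = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]
print(frequent_itemsets(D, min_support=0.5))
```

On this toy matrix every single column and every pair of columns reaches support 0.5 or more, while the full triple appears in only one of the four rows. Because the number of column subsets grows as 2^d, practical miners such as Apriori or FP-growth prune the search far more aggressively than this sketch.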
Association pattern mining (cont.)
●The confidence of a rule A→B is
− support(A ∪ B) / support(A)
●Example:
− { Chips, Olives } → { Beer }
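To make the formula concrete, here is a small Python sketch that computes support and confidence over transactions stored as sets of item names. The function names and the bread/milk/butter baskets are hypothetical, chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """confidence(A -> B) = support(A ∪ B) / support(A)."""
    a = set(antecedent)
    s_a = support(transactions, a)
    if s_a == 0:
        # Assumption of this sketch: the rule is undefined if A never occurs.
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(transactions, a | set(consequent)) / s_a


# Hypothetical shopping baskets, one set of items per customer.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(confidence(baskets, {"bread"}, {"milk"}))
```

Here bread appears in three of the four baskets and bread together with milk in two, so confidence({bread} → {milk}) = (2/4) / (3/4) = 2/3. The zero-support guard is a choice made for this sketch; the formula itself is simply undefined when A never occurs.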
Exercise
The confidence of a rule A→B is support(A ∪ B) / support(A)
Suppose
10 people buy only Chips and Beer
20 people buy only Chips and Olives
30 people buy only Olives and Beer
40 people buy all three: Chips, Olives, and Beer.
What is the confidence of the rule { Chips, Olives } → { Beer }?
Clustering
●Partition records/rows in a way that
− elements in the same partition are similar
− elements in different partitions are different
●Applications:
− Segmentation, summarization, …
− Sometimes a step in a larger DM algorithm
Image credit: http://www.sthda.com/english/articles/tag/pam-clustering/
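The slide does not commit to a particular algorithm. As one common choice, the Python sketch below implements k-means (Lloyd's algorithm) on rows given as tuples of numbers. Note that it is not PAM, the k-medoids method named in the image credit, and the function name, parameters, and toy data are assumptions made for illustration. It alternates between assigning each row to its nearest centroid (Euclidean distance) and moving each centroid to the mean of the rows assigned to it.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Partition `points` (tuples of numbers) into k clusters; return (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input rows
    for _ in range(iters):
        # Assignment step: each row joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its assigned rows.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return labels, centroids


# Hypothetical 2-D rows forming two well-separated groups.
rows = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(rows, k=2)
print(labels, centroids)
```

The sketch makes several choices for you: similarity means Euclidean distance, the number of clusters k is fixed in advance, and every row lands in exactly one cluster.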
Clustering is not easy
●What does it mean to be similar?
●How many sets?
●Can a record/row belong to more than one set?
●Can a record/row belong to no set at all? ...