Multi-Dimensional Model-Based Clustering For User-Behavior Mining in Telecommunications Industry
0-7803-8403-2/04/$20.00 ©2004 IEEE
Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004
accumulate for each customer in the data warehouse system. Third, while in other applications the sequences have clear boundaries, such as idle time between Web page clicks or between user sessions, in our work we do not have such clear indicators. Finally, our dataset is very large, making it important to search for efficient clustering methods. These new challenges motivate us to explore novel sequence mining techniques.

In this paper, we present a novel model-based clustering technique for high-dimensional sequences to analyze users' churn behaviors in the telecommunications industry. We show that, although such an industry typically has very large data sets whose sequences have the features pointed out above, our model-based clustering method is very effective in finding out who is more likely to churn, along with their corresponding churning behaviors.

This paper is organized as follows. In the next section, we describe the problem domain. In Section 3, we review previous work on model-based clustering, and in Section 4 we describe the main techniques. In Section 5, we show the experimental setup and results. Finally, we conclude our contribution with a discussion of future work in Section 6.

2. Problem Domain Description

We solve a special problem from a telecom company in China, which has a data warehouse that accumulates historical customer behavior data at the terabyte level every month. The data holds records that represent dozens of services, from the company itself to its competitors. In mainland China, there are at least six telecom companies providing over 20 services of the same kind to the end user, causing the so-called customer-churn problem. The telecom industry is very interested in discovering customer transitional behavior through data mining.

In order to study user behavior, the traditional approach of the telecom company is to use SQL scripts in their analysis. For example, if they 'think' that people using services S_a and S_b form a group of customers who tend to leave from S_a to S_b or from S_b to S_a, then they would probe the data warehouse with a SQL script like this:

SELECT * FROM record_table WHERE service_id = S_a AND service_id = S_b
AND count_S_a > 50 AND count_S_b > 103 AND service_time > ...;

Clearly, this approach has several shortcomings. For example, how could we know that services like S_a and S_b should be grouped together, but not some other pair of services? How could we know ahead of time that the boundary for each group or cluster should be set at 50 and not at some other value? What is the period of validity of those values? Without data mining techniques, these answers are hard to come by.

Taking a closer look, telecommunications users often jump from one industry to another within a certain period of time, following some hidden motivations. The telecom databases log the services of a user along with the time stamp and other information, such as the duration, total cost, and registration information, for each use of a service. The log data can be treated as sequences of elements, where each element is a multi-dimensional data record. Hence, the telecom problem can be considered a sequential-mining problem. Specifically, in our research we are interested in finding user clusters from these sequences.

3. Related Work

For the sequence mining problem, researchers in the past have applied probabilistic approaches. These approaches use the most frequently occurring patterns from all the possible patterns as a basis for analyzing user behaviors. Network systems researchers have been using Markov models and N-gram models to construct sequential classifiers. For example, Su et al. [13] performed an empirical study on the trade-offs between precision and applicability of different N-gram models and showed that longer N-gram models could make more accurate predictions than shorter ones, at the expense of lower coverage. Pitkow et al. [15] suggested a way to make predictions based on Kth-order Markov models.

In clustering, Mobasher et al. [16] applied a point-based transaction clustering method to the Web personalization problem by grouping together transactions that are similar, based on co-occurrence patterns of URL references. Han et al. [17], [18] developed an association-rule-based hypergraph clustering algorithm. Nasraoui et al. [19] presented a hierarchical clustering technique based on the concept of genetic niches, called Hierarchical Unsupervised Niche Clustering (HUNC). Their work is a point-based approach rather than a path-based approach, and the dimension n is the total number of valid URLs. Some researchers take the sequential relationship in the data into account. Hay et al. [20] used an alignment method to solve the navigation clustering problem. They illustrated how to cluster navigation patterns using a Sequence Alignment Method (SAM) [21], instead of clustering users by means of a Euclidean distance measure. Web researchers such as Cadez et al. [8] grouped user behaviors by first-order Markov models. These models are trained by an EM algorithm [22]. These clustering algorithms reveal insight into users' Web-browsing behavior through visualization.

The traditional clustering algorithms such as K-means or K-Medoids suffer from the fact that they model each data item as a point; thus they cannot handle sequences well, nor can they deal with irregular shapes and differing sizes of different clusters in the data. Consider a data set D
consisting of N sequences, $D = \{S_1, \ldots, S_i, \ldots, S_N\}$, where $S_i = \{x_1^i, \ldots, x_{L_i}^i\}$ is a sequence of length $L_i$ that is composed of potentially multivariate items $x$. The problem of clustering sequences is how to discover the natural groupings in the sequential data. This is analogous to clustering in multivariate feature space, which is normally handled by methods such as K-means and Gaussian mixtures. However, we want to cluster the sequences S rather than the features x. Moreover, the sequences can be of different lengths. It is not clear what a meaningful distance metric is for sequence comparison.

4. Our Markov Model-Based Clustering Algorithm

4.1. Problem Statement

More formally, a sequence is defined as $S_i = P_1 P_2 \ldots P_n$, where the $P_k$ are telecom services that a user used in a certain period. If we use discrete integers to denote the individual services, the sequences from the service log are as exemplified in the accompanying table.

In our work, we use a first-order Markov model for clustering. Formally speaking, a first-order Markov model assumes that the probability distribution over the next state depends only on the current state (and not on the previous ones). Let $S_t$ be the system's state at time step t; in our telecommunications dataset a state corresponds to a customer record. A first-order Markov model is a triple $(Q, A, \pi)$, where $Q = \{q_1, q_2, \ldots, q_n\}$ is a set of states; A is the transition probability matrix, where $a_{ij} = P(S_t = q_j \mid S_{t-1} = q_i)$ is the probability of transition from a state $q_i$ to a state $q_j$, which is assumed to be stationary for all t > 0; and $\pi$ is the initial probability vector, where $\pi_i = P(S_0 = q_i)$ is the probability that the initial state is $q_i$.

In our data, the states are customer records, which are complex structures. However, the original model assumes that each state is a simple symbol. In the next section, we describe our state model in more detail.

4.2. The Feature Vector State Model

We first introduce the concept of a session. A session is defined as a bounded sequence taken from the whole history log of a user. It is a trace of states (called services in the telecommunications industry) with a unique index number for each user during the service period. We define the period to be one month in our application. More formally, a user session S is defined as $S = \langle P_1 P_2 \ldots P_n \rangle$, where n is an integer that stands for the number of services in session S, $P_i$ is an arbitrary service, and $P_{i-1}$ is used just before $P_i$.

For ease of representation, a state $P_i$ can be converted to a feature vector $v_i$. We use a capital letter P to represent the whole state space and V to represent the whole feature-vector space. Thus, for any sequence S, we have a corresponding feature-vector sequence v for its representation: $S = \langle P_1 P_2 \ldots P_n \rangle$, $P_i \in P$, and $v = \langle v_1 v_2 \ldots v_n \rangle$, $v_i \in V$.

Furthermore, we define a feature vector $v_i$ as a record consisting of several features. For example, a vector $v_i$ can be a triple:
<{numeric-feature}, {categorical-feature}, {range-feature}>.
The numeric-feature set holds all the features that are declared as numeric features; the categorical-feature set contains all the features that are declared as discrete features; and a range-feature is a special numeric feature that has a lower and an upper bound.

As an example, we have two sessions from users $u_1$ and $u_2$ in the accompanying table. We treat the attribute 'age' as a numeric feature; 'gender', 'service', 'weekday', and 'month' as categorical features; and 'duration' as a range feature.
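To make the first-order Markov model $(Q, A, \pi)$ of Section 4.1 concrete, the following is a minimal sketch of how $\pi$ and A could be estimated from sessions by counting, assuming each service has already been mapped to an integer id. The function name `fit_first_order_markov`, the 0-based ids, the toy sessions, and the additive smoothing constant `alpha` are illustrative assumptions and are not taken from the paper.

```python
def fit_first_order_markov(sessions, n_states, alpha=1.0):
    """Estimate (pi, A) by counting, for sessions of integer ids in [0, n_states).

    alpha is an additive (Laplace) smoothing constant. It is an assumption of
    this sketch; with alpha = 0 a state that never appears as a predecessor
    would produce a division by zero.
    """
    init_counts = [alpha] * n_states
    trans_counts = [[alpha] * n_states for _ in range(n_states)]
    for session in sessions:
        if not session:
            continue  # skip empty sessions
        init_counts[session[0]] += 1
        # Count each (previous state, current state) pair in the session.
        for prev, curr in zip(session, session[1:]):
            trans_counts[prev][curr] += 1
    total_init = sum(init_counts)
    pi = [c / total_init for c in init_counts]
    A = [[c / sum(row) for c in row] for row in trans_counts]
    return pi, A


# Hypothetical toy data: three sessions over three services (ids 0, 1, 2).
sessions = [[0, 1, 1, 2], [0, 2, 2], [1, 0, 1]]
pi, A = fit_first_order_markov(sessions, n_states=3)
```

Each row of `A` is a probability distribution over the next service given the current one, and `pi` is the distribution over the first service of a session. The sketch treats a state as a plain symbol; it does not use the feature-vector states of Section 4.2.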
For a different feature set, we use a different similarity function to measure any two feature vectors, as follows:

a) For numeric features:

$s = \sum^{d} \frac{(gap_i - gap_j)}{(gap_i + gap_j)}$, where $gap_k = value_k - Mean$.

b) For categorical features:

Each state $v_i$ takes values from a discrete alphabet, $v_i \in \{1, \ldots, M\}$. We write $\theta = \langle \pi, \theta^I, \theta^T \rangle$, where:
- $\pi$ is a vector of K mixture weights.
- $\theta^I$ is a set of K initial state probability vectors.
- $\theta^T$ is a set of K transition matrices.
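As an illustration of how the parameters $\theta = \langle \pi, \theta^I, \theta^T \rangle$ could be used, the sketch below computes the posterior probability that a sequence belongs to each of the K components. It assumes the standard mixture-of-Markov-chains likelihood, in which component k assigns a sequence the probability $\theta^I_k(v_1) \prod_t \theta^T_k(v_{t-1}, v_t)$; the surviving text does not spell this formula out. The function names, the 0-based state ids, and the toy parameter values are hypothetical.

```python
import math


def component_log_likelihood(seq, init, trans):
    """log P(seq | one Markov chain) with initial vector init and matrix trans.

    Assumes every probability that is looked up is strictly positive.
    """
    logp = math.log(init[seq[0]])
    for prev, curr in zip(seq, seq[1:]):
        logp += math.log(trans[prev][curr])
    return logp


def cluster_posteriors(seq, weights, inits, transs):
    """Posterior probability of each of the K components given one sequence."""
    log_joint = [
        math.log(w) + component_log_likelihood(seq, init, trans)
        for w, init, trans in zip(weights, inits, transs)
    ]
    m = max(log_joint)  # subtract the maximum before exponentiating (log-sum-exp)
    unnorm = [math.exp(lj - m) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical toy parameters: K = 2 components over M = 2 states (ids 0 and 1).
weights = [0.5, 0.5]                 # pi: mixture weights
inits = [[0.9, 0.1], [0.1, 0.9]]     # theta^I: one initial vector per component
transs = [
    [[0.9, 0.1], [0.5, 0.5]],        # theta^T, component 0: tends to stay in state 0
    [[0.5, 0.5], [0.1, 0.9]],        # theta^T, component 1: tends to stay in state 1
]
post = cluster_posteriors([0, 0, 0], weights, inits, transs)
```

In an EM-style training loop, these posteriors would serve as the soft cluster memberships in the E-step; assigning each sequence to the component with the largest posterior gives a hard clustering.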
References

[4] P. Hingston, "Using finite state automata for sequence mining," in The Twenty-Fifth Australasian Conference on Computer Science, vol. 24, no. 1. Darlinghurst, Australia: Australian Computer Society, Inc., January 2002, pp. 105-110.
[5] C.-N. Hsu and M.-T. Dung, "Generating finite-state transducers for semi-structured data extraction from the web," Information Systems, vol. 23, no. 8, pp. 521-538, 1998.
[6] R. L. Rivest and R. E. Schapire, "Diversity-based inference of finite automata," in Proc. 28th Annu. IEEE Sympos. Found. Comput. Sci. Los Alamitos, CA: IEEE Computer Society Press, 1987, pp. 78-87.
[7] C. R. Anderson and P. Domingos, "Relational Markov models and their application to adaptive web navigation," in The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). Edmonton, Alberta, Canada: ACM Press, July 2002, pp. 143-152.
[8] I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S. White, "Visualization of navigation patterns on a web site using model-based clustering," in Knowledge Discovery and Data Mining, March 2000, pp. 280-284.
[9] P. Smyth, "Clustering sequences with hidden Markov models," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds., vol. 9. The MIT Press, 1997, p. 648.
[10] R. Agrawal and R. Srikant, "Mining sequential patterns," in Eleventh International Conference on Data Engineering, P. S. Yu and A. S. P. Chen, Eds. Taipei, Taiwan: IEEE Computer Society Press, 1995, pp. 3-14.
[11] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo, "Fast discovery of association rules," Advances in Knowledge Discovery and Data Mining, pp. 307-328, 1996.
[12] K.-F. Lee, Automatic Speech Recognition: The Development of the SPHINX System. Boston, MA: Kluwer Academic Publishers, 1989.
[13] Z. Su, Q. Yang, Y. Lu, and H.-J. Zhang, "What next: A prediction system for web request using n-gram sequence models," in Web Information Systems Engineering, 2000, pp. 214-221.
[14] M. Perkowitz and O. Etzioni, "Towards adaptive Web sites: conceptual framework and case study," Computer Networks, vol. 31, no. 11-16, pp. 1245-1258, 1999.
[15] J. Pitkow and P. Pirolli, "Mining longest repeating subsequences to predict world wide web surfing," in Second USENIX Symposium on Internet Technologies and Systems. Boulder, CO, Oct 1999, pp. 139-150.
[16] B. Mobasher, R. Cooley, and J. Srivastava, "Automatic personalization based on Web usage mining," Communications of the ACM, vol. 43, no. 8, pp. 142-151, 2000.
[17] E.-H. Han, G. Karypis, V. Kumar, and B. Mobasher, "Hypergraph based clustering in high-dimensional data sets: A summary of results," Data Engineering Bulletin, vol. 21, no. 1, pp. 15-22, 1998.
[18] E.-H. Han, G. Karypis, V. Kumar, and B. Mobasher, "Clustering based on association rule hypergraphs," in Research Issues on Data Mining and Knowledge Discovery, 1997, pp. 9-13.
[19] O. Nasraoui and R. Krishnapuram, "Robust multi-resolution web usage data mining using hierarchical unsupervised niche clustering," in ANNIE (Artificial Neural Networks In Engineering) Conference, St. Louis, Nov 2001, pp. 369-374 (won the Best Paper Award).
[20] B. Hay, G. Wets, and K. Vanhoof, "Clustering navigation patterns on a website using a sequence alignment," in Intelligent Techniques for Web Personalization: IJCAI 2001 17th International Joint Conference on Artificial Intelligence, Seattle, Wash., USA, August 2001, pp. 1-6.
[21] D. Sankoff, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, J. Kruskal, Ed. Addison-Wesley, 1983.
[22] I. Cadez and P. Smyth, "Probabilistic clustering using hierarchical models," 1999.