Discovering Interest Navigation Patterns Based on Hidden Markov Models
CHINESE J. COMPUTERS, Feb. 2001
Abstract An important research direction in Web mining is the discovery of users' navigation patterns. In general, users navigate with some purpose, and this purpose shows up as the user's interest in a certain concept. This paper proposes a discovery method for interest navigation patterns based on hidden Markov models, which finds user navigation patterns that carry a certain interest. Such a pattern is essentially a special kind of association rule. In this method, the authors first define a hidden Markov model based on user access records, then propose a new incremental discovery algorithm, Increase_R, to discover interest navigation patterns, and give a proof that the algorithm finds all interest navigation patterns.
Key words Web data mining, hidden Markov models, association rules, navigation patterns
CLC: TP18
Abstract Mining navigation patterns is an important research direction in web mining. The discovered navigation patterns can be used to help designers understand users' access actions, improve structure design, carry out advertising, and obtain users' characteristics. In general, a user accesses a web site with some intentions. These intentions represent interest in some concepts, so the user's interest is related to his navigation path. The users' interest navigation paths compose the users' interest navigation patterns. In this paper, we present a new method for mining interest navigation patterns based on the hidden Markov model in order to discover users' interest navigation patterns. These patterns are essentially a special kind of association rule. In our approach, we first build a hidden Markov model from web server logs, then present a new incremental discovery algorithm, Increase_R, to discover the interest navigation patterns, and we prove that the algorithm finds all interest navigation patterns.
Keywords web mining, hidden Markov model, association rule, navigation patterns
Received: 2000-01-18; revised: 2000-11-06. Wang Shi, male, born in 1971, Ph.D.; his main research interest is data mining. Gao Wen, male, born in 1956, Ph.D., professor and doctoral supervisor; his main research areas are multimedia data compression, image processing, computer vision, multimodal interfaces, artificial intelligence, and virtual reality. Li Jintao, male, born in 1962, researcher at the Research Center for Intelligent Information Platform for Family Health; his main research area is the research and application of digital home appliances. Huang Tiejun, male, born in 1970, postdoctoral researcher; his main research area is virtual reality.

The World-Wide Web is currently developing rapidly, and some of the major work on it, for example Web site design, Web service design, Web site navigation design, and e-commerce, is becoming more and more complex. Site operators need good automated design-assistance tools that can dynamically adjust page structure according to user groups' access interests, access frequency, and access time, improve service, and carry out targeted e-commerce so as to better meet visitors' needs. A powerful tool for meeting this demand is Web data mining, that is, applying the ideas and methods of data mining to the Web in order to extract useful information. An important research direction in Web mining is the discovery of users' navigation patterns, which can be used to solve these problems.

When a user accesses a Web site, he is in fact browsing with a purpose, that is, there is something he is interested in. Because different users have different interests, they access the site along different paths.

Existing commercial analysis tools [1] can be used to analyze logs, but these tools produce only simple statistics, such as page access frequency.

Reference [2] gave the first definition of Web mining and presented WEBMINER, a system for mining Web access information. The idea described in the literature [3, 4] is to preprocess Web site logs, organize the data into a transaction form that conventional data mining methods can handle, and then apply those conventional methods (such as association rule discovery algorithms [5]); the results obtained are those of conventional data mining.

Reference [6] was the first to apply data mining technology to the e-commerce environment in order to discover market intelligence. The mined objects include not only logs and Web pages but also market data. It also gives a general framework for mining in an e-commerce environment. However, its methods are still limited to traditional mining methods.

"Footprints" [7] is based on the following idea: visitors leave "footprints" when they access a Web site; over time, the most frequently visited areas form paths, and new visitors can navigate along these paths. Footprints are left automatically, and visitors need not provide any information about themselves. WUM [8] is an improvement on Footprints; it defines g-sequences for mining navigation patterns and provides a mining language, MINT.

Reference [9] maps log data into relational tables and uses standard data mining methods to discover users' navigation patterns.

Another approach uses a probabilistic grammar to discover users' navigation patterns and uses entropy to evaluate the patterns mined with the grammar.

In general, these methods do not consider the purpose of a user's visit; they mine only the sequence of pages the user browses.

In this paper we propose a new discovery method for interest navigation patterns based on hidden Markov models, with which we can find users' navigation patterns that carry a certain interest. Such a navigation pattern is essentially an association rule. In this method, we first define a hidden Markov model based on user access records, then propose a new incremental discovery algorithm, Increase_R, to discover interest navigation patterns, and give a proof that the algorithm finds all interest navigation patterns.

Section 2 gives some definitions and the basic model. Section 3 discusses the objects to be mined. Section 4 outlines the first-order, discrete-output hidden Markov model that we use. Section 5 presents the hidden Markov model with interest and shows how to use it to mine navigation patterns. Section 6 describes the experimental procedure and illustrates the advantages of the method through simulated and real experiments.

2 Definitions and basic model

Definition 1. User access concept e: the goal of a user's visit when accessing a Web site, that is, the thing or concept he is interested in, such as a certain kind of book, a certain kind of goods, or an academic concept that interests him.

Definition 2. User access concept set E: the set of the one or more concepts that users access: $E = \{e_1, e_2, \ldots, e_M\}$.

Web site designers generally design a site following a Web concept distribution model. We define a concept distribution model of a Web site below.

Definition 3. Concept model of a Web site, CG = (W, E): ... the set of hyperlinks between pages, in which each page may hold different concepts and a concept may be distributed over different pages, as shown in Fig. 1.
The question, then, is that we hope to find the paths related to a particular concept: along these paths, user groups are more likely to access that concept and less likely to access other concepts. Such a path is the users' navigation pattern in the sense of that concept.

The mining objects are the log files on the server, which follow the W3C standard format [11]:

Protocol version    Version of used transfer protocol
...                 ...

Each log record l contains the user's IP address l.ip, the user identifier l.uid, the URL of the accessed page l.url, and the access time l.time. A user access transaction t is defined as

$$t = \langle ip_t,\ uid_t,\ \{(l^t_1.url,\ l^t_1.time), \ldots, (l^t_m.url,\ l^t_m.time)\} \rangle$$

where, for $1 \le k \le m$, $l^t_k \in L$, $l^t_k.ip = ip_t$, ...

3. Divide each visitor's access record set according to C to find the record set of each of that visitor's visits; each visit of each visitor then constitutes one record set.
4. Finally, sort all access transactions by time to form the access transaction set T.

T constitutes the basis for our mining; each user access transaction is a sequence that characterizes one visit of the user.

Hidden Markov models are widely used in areas such as speech recognition. In this paper, we use a first-order hidden Markov model with discrete output:

1. A set of states Q, with a specified initial state $q_I$ and final state $q_F$.
...

... observed, and so forth, until the final state is reached, which generates a symbol string $X = x_1, x_2, \ldots, x_l$. Each transition has a transition probability $P(q \to q')$. The probability of a state q emitting a symbol x is $P(x \mid q)$. The most probable state sequence for X under the model M is

$$V(X \mid M) = \arg\max_{q_1, \ldots, q_l \in Q^l} \prod_{k=1}^{l+1} P(q_{k-1} \to q_k)\, P(x_k \mid q_k)$$

Here $q_0$ and $q_{l+1}$ are the initial state $q_I$ and the final state $q_F$, and $x_{l+1}$ is the termination symbol.

5 Hidden Markov models with interest

For any two nodes $q_i$, $q_j$ in the transaction set T, the transition probability $P(q_i \to q_j)$ is calculated as

$$P(q_i \to q_j) \approx \frac{\mathrm{count}(q_i \to q_j)}{\mathrm{count}(q_i)} \qquad (3)$$

where $\mathrm{count}(q_i \to q_j)$ is the number of times in the transaction set T that $q_i$ and $q_j$ occur together with $q_j$ immediately following $q_i$.
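The estimate in Eq. (3) can be illustrated with a minimal Python sketch. It assumes that each transaction is simply a list of page identifiers and that count(q_i) is the total number of occurrences of q_i in T; the function name `estimate_transitions` and the toy transactions are hypothetical and are not taken from the paper.

```python
from collections import Counter

def estimate_transitions(transactions):
    """Estimate P(q_i -> q_j) as count(q_i -> q_j) / count(q_i), as in Eq. (3)."""
    pair_counts = Counter()   # count(q_i -> q_j): q_j immediately follows q_i
    state_counts = Counter()  # count(q_i): assumed to be all occurrences of q_i in T
    for t in transactions:
        state_counts.update(t)
        pair_counts.update(zip(t, t[1:]))
    return {
        (qi, qj): n / state_counts[qi]
        for (qi, qj), n in pair_counts.items()
    }

# Hypothetical toy transaction set T (page identifiers are made up).
T = [
    ["N1", "N2", "N4"],
    ["N1", "N3", "N5"],
    ["N1", "N2", "N5"],
]
P = estimate_transitions(T)
print(P[("N1", "N2")])
```

Under this reading of count(q_i), the outgoing probabilities of a node need not sum to 1 when the node also appears as the last page of a transaction; a different normalization would be an equally reasonable assumption.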
$P(e_{jt} \mid q_j)$ is the observation probability at a state node in the standard HMM. Each $P(e_{jt'} \mid q_j)$ ($e_{jt'}$ is abbreviated as e) has the following meaning: user groups passed through $q_j$ in k visits, these visits accessed the concepts E' in total (repetition allowed), and the proportion of e among them is approximately $P(e_{jt'} \mid q_j)$.

Formally, let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of n transactions and $Q = \{q_1, q_2, \ldots, q_m\}$ a set of m states, where the concepts placed on state $q_j$ are $(e_{j1}, \ldots, e_{jt})$. The i-th transaction of T is

$$t_i = \langle t_i[1], t_i[2], \ldots, t_i[f] \rangle, \quad t_i[f'] \in Q,\ f' = 1, \ldots, f \qquad (4)$$

$t'_i$ denotes the set of components of transaction $t_i$, that is, the set of nodes visited: ...

$S_{i,j}$ then denotes the set of all nodes visited after $q_j$ (including $q_j$) in transaction $t_i$.

... (6)

This gives the probability that user groups at node $q_j$ are interested in e. By analyzing log files, we can build such an HMM.

5.2 Discovering navigation patterns

Definition 4. Access sequence $S_k$: $S_k = (q_1, q_2, q_3, \ldots, q_{k-1}, q_k)$, a sequence of states accessed by users.

Definition 5. Interest association rule $R(e \mid S_k)$: given an access sequence $S_k$ and a concept e,

$$R(e \mid S_k) = \big(P(q_1 \to q_2) \times P(e \mid q_2)\big) \times \big(P(q_2 \to q_3) \times P(e \mid q_3)\big) \times \cdots \times \big(P(q_{k-1} \to q_k) \times P(e \mid q_k)\big)$$

and $R(e \mid S_k) \ge C$ (C is a given confidence threshold).

Definition 6. Interest association rule set $R_k$: $R_k$ is the set of interest association rules $R(e \mid S_k)$.

Interest association rules reflect the navigation patterns of user groups interested in a certain concept or concepts. In order to find these interest association rules, we present an incremental discovery algorithm, Increase_R.

Algorithm: Increase_R
Input: Q, E, C
Begin:
    k := 1; j := 1; S_k := E;
    While j = 1 do
        j := 0;
        For each s ∈ S_k
            For each q ∈ Q
                If R(e | (s, q)) ≥ C then
                    S_{k+1} := S_{k+1} + (s, q);
                    R_{k+1} := R_{k+1} + R(e | (s, q)); j := 1;
                End If;
            End For;
        End For;
        k := k + 1;
    End While;
End.
Output: the interest association rules.

... the algorithm will be able to find it.

Let $R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}, q_k)) \ge C$. Because every factor $P(q_{i-1} \to q_i) \times P(e \mid q_i)$ is at most 1, each prefix satisfies $R(e \mid (q_1, \ldots, q_i)) \ge R(e \mid (q_1, \ldots, q_k)) \ge C$, so no prefix is pruned. According to the algorithm, $R(e \mid (q_1, q_2))$, $R(e \mid (q_1, q_2, q_3))$, ..., $R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}))$, $R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}, q_k))$ must therefore be found in turn. QED.

Note: in operation, make sure that the value of C is small enough to allow $q_1$ to be the first node of the sequences.

6 Experiments

The experiments were divided into two steps. First, in the simulation stage, the method is compared with methods based on the Markov model (MM). Second, experiments are carried out in a real environment, using the logs of the Institute of Computing Technology to illustrate the operation of the algorithm.

6.1 Comparison with methods based on the Markov model (MM)
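As an aid to following the experiments, the Increase_R procedure of Section 5.2 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: level 1 is taken to be every single state with R = 1 (the empty product), the concept e is fixed, `max_len` is a safety bound added here to guarantee termination, and all probabilities in the toy data are made up.

```python
def increase_r(states, trans, obs_e, threshold, max_len=10):
    """Level-wise search for sequences S_k with R(e | S_k) >= threshold.

    trans[(qi, qj)] is P(qi -> qj); obs_e[q] is P(e | q) for one fixed concept e.
    Returns a dict mapping each qualifying sequence (length >= 2) to its R value.
    """
    # Assumption: level 1 holds every single state with the empty product R = 1.
    level = {(q,): 1.0 for q in states}
    rules = {}
    # max_len is a safety bound added for this sketch (not in the source).
    while level and len(next(iter(level))) < max_len:
        next_level = {}
        for seq, r in level.items():
            for q in states:
                # Extend R by one factor P(last -> q) * P(e | q), as in Definition 5.
                r_new = r * trans.get((seq[-1], q), 0.0) * obs_e.get(q, 0.0)
                if r_new >= threshold:
                    next_level[seq + (q,)] = r_new
        rules.update(next_level)
        level = next_level
    return rules

# Hypothetical toy model: three nodes and one concept "B".
states = ["N1", "N2", "N3"]
trans = {("N1", "N2"): 0.6, ("N1", "N3"): 0.4, ("N2", "N3"): 0.5}
obs_B = {"N1": 0.2, "N2": 0.5, "N3": 1.0}
rules = increase_r(states, trans, obs_B, threshold=0.1)
for seq, r in sorted(rules.items()):
    print(seq, r)
```

Because each extension multiplies R by a factor of at most 1, a sequence that falls below the threshold can never recover, which is why only the sequences kept at level k are extended at level k + 1. The threshold is expected to be strictly positive in this sketch.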
... = 0.5.

According to the distribution of concepts over the nodes and these visits, the calculated P(e | q) is shown in Table 2.

Table 2  Observation probability of each concept on each node
      A  B  C  D
N4    0  1  0  0
N5    0  0  0  1

With the confidence threshold set to 0, the algorithm finds 3 navigation patterns, as shown in Table 3.

Table 3  The 3 navigation patterns in which interest in item B occurs, at confidence 0

Here $R(B \mid S_{33})$ and $R(B \mid S_{21})$ are greater than $R(B \mid S_{22})$. Clearly, interest association rules better reflect users' access preferences.

6.2 Experiments with a real background

We selected the logs of the Web server of the Institute of Computing Technology of the Chinese Academy of Sciences (http://www.ict.ac.cn) as the experimental subject. The experimental data comprise one year of monthly user visit data for the Institute's Web site. The whole site includes more than 352 HTML pages. The user access logs total 147 MB and include 1,749,934 records. The transaction identification algorithm identified 10399 user access transactions, with an average transaction length of 8.8, that is, on average each user ...

... subset of E; a hidden Markov model is built from the logs, and navigation patterns are then discovered. Taking the concept "University" as an example, the navigation patterns found ...

... ly.html)
(/cjc/cjccw.html, /cjc/cjcc.html, /cjc/cjcw2.html, /cjc/introc.html, /cjc/abstc.html)    1.635 × 10^-6

In this paper we introduce, for the first time, the hidden Markov model into the discovery of interest navigation patterns, extending the applications of HMMs, so that navigation patterns that carry user interest can be found. Such a navigation pattern is essentially a special kind of association rule that reflects users' access preferences. In this method, we first define a hidden Markov model based on user access records, then propose a new incremental discovery algorithm, Increase_R, to discover interest navigation patterns.

The features of our approach are: 1) it discovers users' navigation patterns that carry interest; 2) mining is periodic and offline; 3) the mined objects are the navigation behaviors of all users, what is mined is the access interests of all users, and the results apply to all users, with no specific information about any one user or class of users; 4) it automatically spans different sets of Web page categories, so the navigation patterns found need not follow direct links within the Web site.
... want to access.

Our further work will focus on applying this method to predict the ...

References
1 Stout R. Web Site Stats: Tracking Hits and Analyzing Traffic. ...
2 Cooley R, Mobasher B et al. Grouping Web page references into transactions for mining World Wide Web browsing patterns. ...
6 ... intelligence through online analytical Web usage mining. SIGMOD Record, 1998, 27(4): 54-61
7 Wexelblat A, Maes P. Footprints: History-rich web brows...
8 ... mining. Int Journal of Computer Systems Science and Engineering, Special Issue on "Semantics of the Web", 1999, 3(1): 105-113
9 Chen M S, Park J S, Yu P S. Efficient data mining for traversal patterns. IEEE Trans Knowledge and Data Engineering, ...