Found Interest in Migration Patterns Based On Hidden Markov Models

This document proposes a new method for discovering user navigation patterns based on interest using hidden Markov models. It defines a hidden Markov model based on user access records to model user behavior. It then presents an incremental discovery algorithm called Increase_R to discover all interest-based navigation patterns in the hidden Markov model. Experimental results using both simulations and real web logs demonstrate that the algorithm can effectively discover such patterns.


Chinese Journal of Computers (CHINESE J. COMPUTERS), Vol. 24, No. 2, Feb. 2001

Mining Interest Navigation Patterns Based on Hidden Markov Models

WANG Shi  GAO Wen  LI Jintao  HUANG Tiejun

(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080)

Abstract  An important research direction in web mining is the discovery of user navigation patterns. In general, users navigate with some purpose, and this purpose shows up as an interest in a certain concept. This paper proposes a method, based on hidden Markov models, for discovering navigation patterns that carry such an interest. These patterns are essentially a special kind of association rule. In this method, the authors first define a hidden Markov model based on user access records, then propose a new incremental discovery algorithm, Increase-R, to discover interest navigation patterns, and prove that the algorithm finds all interest navigation patterns.

Keywords  web data mining, hidden Markov model, association rules, navigation patterns

CLC number: TP18

Mining Interest Navigation Patterns Based on Hidden Markov Model

WANG Shi  GAO Wen  LI Jin-Tao  HUANG Tie-Jun

(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080)

Abstract  Mining navigation patterns is an important research direction in web mining. The discovered navigation patterns can help designers understand users' access behavior, improve site structure, carry out advertising, and learn users' characteristics. In general, a user accesses a web site with some intentions, and these intentions represent an interest in certain concepts, so a user's interest is related to his navigation path. Users' interest navigation paths make up their interest navigation patterns. In this paper, we present a new method, based on the hidden Markov model, for mining interest navigation patterns. These patterns are essentially a special kind of association rule. In our approach, we first build a hidden Markov model from web server logs, then present a new incremental discovery algorithm, Increase-R, to discover the interest navigation patterns, and we prove that the algorithm can find all interest navigation patterns.

Keywords  web mining, hidden Markov model, association rule, navigation patterns

Received 2000-01-18; revised 2000-11-06. WANG Shi, male, born 1971, Ph.D.; his main research interest is data mining. GAO Wen, male, born 1956, Ph.D., professor and doctoral supervisor; his main research areas are multimedia data compression, image processing, computer vision, multimodal interfaces, artificial intelligence, and virtual reality. LI Jintao, male, born 1962, researcher; his main research area is the application of digital home appliances. HUANG Tiejun, male, born 1970, postdoctoral researcher; his main research area is virtual reality.

1 Introduction

The World Wide Web is developing rapidly, and the major work carried out on it, such as web site design, web service design, web site navigation design, and e-commerce, is becoming increasingly complex and onerous. Site operators need good automated design-aid tools that can dynamically adjust page structure according to user groups' access interests, access frequency, and access time, so as to improve service and carry out targeted e-commerce that better meets visitors' needs. A powerful tool for meeting this demand is web data mining: applying the ideas and methods of data mining to the Web in order to extract useful information. An important research direction in web mining is the discovery of user navigation patterns, which can be used to address these problems.

When a user accesses a web site, he browses with a purpose, that is, there is something he is interested in. Because users have different interests, they follow different access paths.

Existing commercial analysis tools [1] can be used to analyze logs, but they produce only simple statistics such as page access frequency.

Reference [2] first gave a definition of web mining and presented WEBMINER, a system for mining web access information. The idea in [3, 4] is to process web site logs so that the data is organized into transactions that conventional data mining methods can handle, and then to apply those methods (such as the association rule discovery algorithm [5]). The results obtained are those of conventional data mining.

Reference [6] first applied data mining to the e-commerce environment to discover marketing intelligence. The mined objects include not only logs and web pages but also marketing data. The paper also gives a general framework for mining in an e-commerce environment, but its methods are still limited to traditional mining methods.

The idea of "Footprints" [7] is that visitors leave "footprints" when they access a web site. Over time, the most frequently visited areas form paths, and new visitors can navigate along these paths. Footprints are left automatically, and visitors need not provide any information about themselves. WUM [8] is an improvement on "Footprints": it defines g-sequences for mining navigation patterns and provides a mining language, MINT.

Reference [9] maps log data into relational tables and uses standard data mining methods to discover user navigation patterns.

Reference [10] uses a hypertext probabilistic grammar to discover user navigation patterns and uses the entropy of the grammar to evaluate the mined patterns.

In general, these methods do not take the purpose of a user's visit into account. They mine only according to the user's browsing sequence.

In this paper we propose a new method, based on the hidden Markov model, for discovering interest navigation patterns, so that navigation patterns carrying a certain interest can be found. Such a pattern is essentially an association rule. In this method, we first define a hidden Markov model based on user access records, then propose a new incremental discovery algorithm, Increase-R, to discover interest navigation patterns, and we give a proof that the algorithm finds all interest navigation patterns.

Section 2 gives definitions and the basic model. Section 3 discusses the data to be mined. Section 4 outlines the first-order, discrete-output hidden Markov model used in this paper. Section 5 presents the hidden Markov model with interest and shows how to use it to mine navigation patterns. Section 6 describes the experiments and illustrates the advantages of the method with simulated and real data.

2 Definitions and Basic Model

Definition 1. User access concept $e$: when a user accesses a web site, the target of the visit is a thing or concept he is interested in, such as a certain kind of book, a certain kind of goods, or an academic concept. It is denoted by $e$.

Definition 2. Set of user access concepts $E$: the set of one or more concepts that users access, $E = \{e_1, e_2, \ldots, e_M\}$.

Web site designers generally design a site according to a model of how concepts are distributed over it. We define the concept distribution model of a web site below.

Definition 3. Concept model of a web site $CG = (W, E)$: $CG$ is a directed graph, where $W$ is the set of web pages and $E$ is the set of hyperlinks between pages. Each page may carry different concepts, and a concept may be distributed over different pages, as shown in Fig. 1.


What we hope to find are paths related to a particular concept: along such a path, a user group is more likely to access that concept and less likely to access other concepts. In this sense, the paths are the user group's interest navigation patterns for the concept.

3 Data Preparation

The data to be mined resides in the log files on the server, which follow the W3C standard format [11]:

Table 1 User access log format

| Field | Description |
|---|---|
| Date | Date, time, and timezone of request |
| Client IP | Remote host IP and/or DNS entry |
| Bytes | Bytes transferred (sent and received) |
| Server | Server name, IP address and port |
| Request | URI query and stem |
| Service name | Requested service name |
| Time taken | Time taken for transaction to complete |
| Protocol version | Version of used transfer protocol |
| ... | ... |

Before mining, the user access log for a period of time is first organized into user access transactions. Let $L$ be the user access log. An entry $l \in L$ includes the user's IP address $l.ip$, the user identifier $l.uid$, the URL of the accessed page $l.url$, and the access time $l.time$. An access transaction is defined as

$$t = \langle ip_t,\ uid_t,\ \{(l_{t1}.url,\ l_{t1}.time), \ldots, (l_{tm}.url,\ l_{tm}.time)\} \rangle$$

where, for $1 \le k \le m$: $l_{tk} \in L$, $l_{tk}.ip = ip_t$, $l_{tk}.uid = uid_t$, and $l_{tk}.time - l_{t(k-1)}.time \le C$. Here $C$ is a fixed time window.

The algorithm for processing the log to find the access transactions is as follows:
1. Preprocess the log.
2. Partition the log by visitor IP, that is, find the set of access records of each visitor in the log.
3. Partition each visitor's record set by $C$ to find the record set of each visit. The record set of each visit of each visitor constitutes an access transaction.
4. Finally, sort all access transactions by time to form the access transaction set $T$.

$T$ is the basis for mining. Each access transaction is a sequence that characterizes one visit of a user.

4 Hidden Markov Model

Hidden Markov models (HMMs) [12] are widely used in fields such as language and speech recognition. In this paper we use a first-order HMM with discrete output:
1. A set of states $Q$, with a designated initial state $q_I$ and final state $q_F$.
2. A set of state transitions, each element of the form $(q \to q')$.
3. A discrete set of output symbols $E = \{e_1, e_2, \ldots, e_M\}$.

Starting from the initial state, the model moves to a new state and an output symbol is observed, and so on until the final state is reached, generating a symbol string $X = x_1, x_2, \ldots, x_l$. Each transition has a transition probability $P(q \to q')$. The probability that a state $q$ emits a particular symbol $e$ is $P(e \mid q)$. The probability that string $X$ is observed under a hidden Markov model $M$ is therefore the sum of the probabilities over all possible paths:

$$P(X \mid M) = \sum_{q_1, \ldots, q_l \in Q^l} \prod_{k=1}^{l+1} P(q_{k-1} \to q_k)\, P(x_k \mid q_k) \tag{1}$$

Here $q_0$ and $q_{l+1}$ are the initial state $q_I$ and the final state $q_F$, and $x_{l+1}$ is the termination symbol.

A common goal when using an HMM is to find the state sequence $V(X \mid M)$ that has the greatest probability of producing the observed sequence:

$$V(X \mid M) = \arg\max_{q_1, \ldots, q_l \in Q^l} \prod_{k=1}^{l+1} P(q_{k-1} \to q_k)\, P(x_k \mid q_k) \tag{2}$$

5 Hidden Markov Model with Interest

5.1 Model Definition
1. The nodes of the web site are the state nodes $q$ of the HMM. A virtual initial state $q_I$ is given.
2. There is a set of concepts $E = \{e_1, e_2, \ldots, e_T\}$.
3. Node $q_j$ contains a subset of $E$, $(e_{j1}, \ldots, e_{jt})$.
4. Between two nodes $q_i$, $q_j$ there is a transition probability $P(q_i \to q_j)$. For any two nodes $q_i$, $q_j$, it is computed over the transaction set $T$ as

$$P(q_i \to q_j) \approx \frac{\mathrm{count}(q_i \to q_j)}{\mathrm{count}(q_i)} \tag{3}$$

where $\mathrm{count}(q_i \to q_j)$ is the number of transactions in $T$ in which $q_i$ and $q_j$ both occur and $q_j$ immediately follows $q_i$, and $\mathrm{count}(q_i)$ is the number of transactions in $T$ that contain $q_i$.
5. At node $q_j$, the user group's interest in the concepts on the node, $(e_{j1}, \ldots, e_{jt})$, has a probability distribution $P(e_{j1} \mid q_j), \ldots, P(e_{jt} \mid q_j)$.
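The transaction identification steps of Section 3 and the transition estimate of Eq. (3) can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Entry` record, the function names, the toy log, and the 1800-second window are all hypothetical, visitors are keyed by the pair (ip, uid) as in the transaction definition, and the virtual initial state $q_I$ is left out for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    """One preprocessed log entry l (hypothetical record layout)."""
    ip: str
    uid: str
    url: str
    time: float  # access time in seconds


def identify_transactions(log, window):
    """Steps 2-4 of Section 3: group by visitor, split on gaps > window, sort by time."""
    by_visitor = defaultdict(list)
    for entry in log:
        by_visitor[(entry.ip, entry.uid)].append(entry)
    transactions = []
    for entries in by_visitor.values():
        entries.sort(key=lambda e: e.time)
        current = [entries[0]]
        for prev, cur in zip(entries, entries[1:]):
            if cur.time - prev.time <= window:
                current.append(cur)        # same visit
            else:
                transactions.append(current)
                current = [cur]            # gap exceeds C: a new visit starts
        transactions.append(current)
    transactions.sort(key=lambda t: t[0].time)
    return [[e.url for e in t] for t in transactions]


def transition_probs(transactions):
    """Eq. (3): count(qi -> qj) / count(qi), each counted once per transaction."""
    follows, contains = defaultdict(int), defaultdict(int)
    for t in transactions:
        for q in set(t):
            contains[q] += 1
        for pair in set(zip(t, t[1:])):
            follows[pair] += 1
    return {pair: n / contains[pair[0]] for pair, n in follows.items()}


# Hypothetical toy log: visitor u1 makes two visits, visitor u2 makes one.
LOG = [
    Entry("10.0.0.1", "u1", "/a", 0), Entry("10.0.0.1", "u1", "/b", 10),
    Entry("10.0.0.1", "u1", "/d", 20), Entry("10.0.0.2", "u2", "/a", 100),
    Entry("10.0.0.2", "u2", "/b", 110), Entry("10.0.0.1", "u1", "/a", 5000),
    Entry("10.0.0.1", "u1", "/c", 5010),
]
TRANSACTIONS = identify_transactions(LOG, window=1800)
PROBS = transition_probs(TRANSACTIONS)
```

On this toy log the three transactions are `/a, /b, /d`, then `/a, /b`, then `/a, /c`, so `/a` occurs in three transactions and is immediately followed by `/b` in two of them, giving an estimate of 2/3 for that transition.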
The probabilities $P(e_{j1} \mid q_j), \ldots, P(e_{jt} \mid q_j)$ are the standard HMM observation probabilities of the state node. The meaning of each $P(e_{jt'} \mid q_j)$ (with $e_{jt'}$ abbreviated as $e$) is as follows: the user group passes through $q_j$ in $k$ visits, these visits access a total of $E'$ concepts (repetitions allowed), and the fraction of these that are $e$ is approximately $P(e_{jt'} \mid q_j)$.

Formally, let there be a set of $n$ transactions $T = \{t_1, t_2, \ldots, t_n\}$ and a set of $m$ states $Q = \{q_1, q_2, \ldots, q_m\}$, where the concept set on state $q_j$ is $(e_{j1}, \ldots, e_{jt})$. The $i$-th transaction in $T$ is

$$t_i = \langle t_i[1], t_i[2], \ldots, t_i[f] \rangle, \quad t_i[f'] \in Q,\ f' = 1, \ldots, f \tag{4}$$

$t'_i$ denotes the set of components of transaction $t_i$, that is, the set of nodes it visits:

$$t'_i = \{ t_i[1], t_i[2], \ldots, t_i[f] \}, \quad t_i[f'] \in Q,\ f' = 1, \ldots, f \tag{5}$$

$S_{i,j}$ denotes the set of all nodes accessed in transaction $t_i$ from $q_j$ onward (including $q_j$):

$$S_{i,j} = \begin{cases} \{ t_i[k+l] \mid t_i[k] = q_j,\ l = 0, \ldots, (f-k) \}, & q_j \in t'_i \\ \text{NULL}, & q_j \notin t'_i \end{cases} \tag{6}$$

$S_{i,j,e}$ denotes the set of nodes in $S_{i,j}$ that contain $e$:

$$S_{i,j,e} = \{ q \mid q \in S_{i,j},\ q \text{ contains } e \} \tag{7}$$

Then the probability that the user group at node $q_j$ is interested in $e$ is

$$P(e \mid q_j) \approx \frac{\sum_{i=1}^{n} \| S_{i,j,e} \|}{\sum_{e' \in E} \sum_{i=1}^{n} \| S_{i,j,e'} \|} \tag{8}$$

By analyzing the log files, we can build such an HMM.

5.2 Discovering Navigation Patterns

Definition 4. Access sequence $S_k$: $S_k = (q_1, q_2, q_3, \ldots, q_{k-1}, q_k)$ is a sequence of states accessed by users.

Definition 5. Interest association rule $R(e \mid S_k)$: given an access sequence $S_k$ and a user access interest $e$, the interest association rule is

$$R(e \mid S_k) = (P(q_1 \to q_2) \times P(e \mid q_2)) \times (P(q_2 \to q_3) \times P(e \mid q_3)) \times \cdots \times (P(q_{k-1} \to q_k) \times P(e \mid q_k))$$

and $R(e \mid S_k) \ge C$, where $C$ is a given confidence threshold.

Definition 6. Interest association rule set $R_k$: $R_k$ is the set of interest association rules $R(e \mid S_k)$.

Interest association rules reflect the navigation patterns of a user group interested in certain items or concepts. To find these rules, we present an incremental discovery algorithm:

Algorithm: Increase-R
Input: Q, E, C
Begin:
    k := 1; j := 1; S_k := E;
    While j = 1 do
        j := 0;
        For each s ∈ S_k
            For each q ∈ Q
                If R(e | (s, q)) ≥ C then
                    S_{k+1} := S_{k+1} + (s, q);
                    R_{k+1} := R_{k+1} + R(e | (s, q)); j := 1;
                End If;
            End For;
        End For;
        k := k + 1;
    End While;
End.
Output: R_k, k = 1, ..., n.

Theorem 1. Algorithm Increase-R finds all interest association rules.

Proof. Take an arbitrary $S_k = (q_1, q_2, q_3, \ldots, q_{k-1}, q_k)$. We show that if $R(e \mid S_k) \ge C$, then Increase-R finds it.

If $R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}, q_k)) \ge C$, then, because every probability is no greater than 1,

$$R(e \mid (q_1, q_2)) \ge R(e \mid (q_1, q_2, q_3)) \ge \cdots \ge R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1})) \ge R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}, q_k)) \ge C.$$

According to the algorithm, it must therefore find $R(e \mid (q_1, q_2))$, $R(e \mid (q_1, q_2, q_3))$, ..., $R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}))$, and $R(e \mid (q_1, q_2, q_3, \ldots, q_{k-1}, q_k))$. QED.

Note: in practice, the value of $C$ must be small enough for $q_1$ to form the initial sequences.

6 Experiments

The experiments have two parts. First, in the simulation stage, the method is compared with the Markov model (MM) based method. Second, in a real environment, the logs of the Institute of Computing Technology are used to illustrate the operation of the algorithm.

6.1 Comparison with the Markov Model (MM) Based Method

The structure of a site is given in Fig. 2.
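As a rough illustration of Eq. (8) and of an Increase-R style level-wise search, the sketch below works through the small simulated site of this subsection (four visits over nodes N1 to N5, concepts A to D). Several things here are assumptions rather than facts from the paper: the placement of concepts on nodes is inferred from the values in Table 2, since Fig. 2 is not reproduced; sequences are started from the virtual initial state `qI`; only observed transitions are extended (unobserved ones have probability 0); a `max_len` guard is added against cycles; and a small positive threshold of 1/100 is used instead of 0 so that zero-valued sequences are dropped. All function and variable names are hypothetical.

```python
from fractions import Fraction

# The four simulated visits of Section 6.1.
VISITS = [["N1", "N2", "N4"], ["N1", "N2", "N4"],
          ["N1", "N3", "N5"], ["N1", "N3", "N5"]]
# ASSUMPTION: concept placement inferred from Table 2, not read from Fig. 2.
CONCEPTS_ON = {"N1": "ABCD", "N2": "BCD", "N3": "ABD", "N4": "B", "N5": "D"}
# Transition probabilities as they appear in the products of Table 3;
# "qI" is the virtual initial state.
TRANS = {("qI", "N1"): Fraction(1), ("N1", "N2"): Fraction(1, 2),
         ("N1", "N3"): Fraction(1, 2), ("N2", "N4"): Fraction(1),
         ("N3", "N5"): Fraction(1)}


def observation_probs(visits, node, concepts="ABCD"):
    """Eq. (8): share of each concept over the nodes reached from `node` onward."""
    counts = dict.fromkeys(concepts, 0)
    for t in visits:
        if node not in t:
            continue  # S_{i,j} is NULL for this transaction
        for q in set(t[t.index(node):]):  # S_{i,j}: node and all later nodes
            for e in CONCEPTS_ON[q]:
                counts[e] += 1
    total = sum(counts.values())
    return {e: Fraction(c, total) for e, c in counts.items()}


def increase_r(trans, obs, concept, threshold, max_len=10):
    """Extend every kept sequence by one node per round while R >= threshold."""
    level = {("qI",): Fraction(1)}
    rules = {}
    for _ in range(max_len):  # guard against cycles; not part of the pseudocode
        next_level = {}
        for seq, r in level.items():
            for (qi, qj), p in trans.items():
                if qi == seq[-1]:
                    r_new = r * p * obs[qj][concept]
                    if r_new >= threshold:
                        next_level[seq + (qj,)] = r_new
        if not next_level:
            break
        rules.update(next_level)
        level = next_level
    return rules


OBS = {q: observation_probs(VISITS, q) for q in CONCEPTS_ON}
RULES = increase_r(TRANS, OBS, "B", Fraction(1, 100))
for seq, r in RULES.items():
    print(" -> ".join(seq), r)
```

Under these assumptions the computed observation probabilities agree with Table 2 (for example 10/32 for concept B at N1), and the values for (N1, N2), (N1, N3), and (N1, N2, N4) equal the products listed in Table 3. The sketch additionally returns the single-node sequence (N1), which Table 3 does not list.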


Users accessed the site four times:

u1: N1, N2, N4.
u2: N1, N2, N4.
u3: N1, N3, N5.
u4: N1, N3, N5.

The transition probabilities derived from these visits are marked in the figure. Under the Markov model:

$$P_1(N_1 \to N_2) = P_1(N_1 \to N_3) = P_1(N_1 \to N_2 \to N_4) = 0.5.$$

From the distribution of concepts over the nodes and these visits, the computed values of $P(e \mid q)$ are shown in Table 2.

Table 2 Observation probability of each concept at each node

| Node | A | B | C | D |
|---|---|---|---|---|
| N1 | 6/32 | 10/32 | 6/32 | 10/32 |
| N2 | 0 | 4/8 | 2/8 | 2/8 |
| N3 | 2/8 | 2/8 | 0 | 4/8 |
| N4 | 0 | 1 | 0 | 0 |
| N5 | 0 | 0 | 0 | 1 |

With the confidence threshold set to 0, the algorithm finds 3 navigation patterns, shown in Table 3.

Table 3 The 3 navigation patterns in which interest in concept B occurs, with confidence threshold 0

| Access sequence $S_k$ | Interest association rule $R(B \mid S_k) \ge 0$ |
|---|---|
| $S_{21} = (N_1, N_2)$ | $R(B \mid S_{21}) = 1 \times 10/32 \times 1/2 \times 4/8$ |
| $S_{22} = (N_1, N_3)$ | $R(B \mid S_{22}) = 1 \times 10/32 \times 1/2 \times 2/8$ |
| $S_{33} = (N_1, N_2, N_4)$ | $R(B \mid S_{33}) = 1 \times 10/32 \times 1/2 \times 4/8 \times 1 \times 1$ |

Thus $R(B \mid S_{33})$ and $R(B \mid S_{21})$ are greater than $R(B \mid S_{22})$, whereas the Markov model assigns all three paths the same probability of 0.5. Clearly, interest association rules better reflect the preferences behind users' access.

6.2 Experiments with Real Data

We selected the logs of the web server of the Institute of Computing Technology, Chinese Academy of Sciences (http://www.ict.ac.cn) as the experimental object. The experimental data covers one year of monthly user visits to the institute's web site, from November 1998 to November 1999. The whole site contains more than 352 HTML pages. The user access log is 147 MB and contains 1,749,934 entries. The transaction identification algorithm identified 10,399 user access transactions with an average transaction length of 8.8, that is, a user visits 8.8 pages per visit on average.

We defined four concepts: E = {academic, personnel, journals, agencies}. Based on its content, each page was labeled with a subset of E. The hidden Markov model was then built from the log and the navigation patterns were discovered. Taking the concept "University" as an example, the discovered navigation patterns are shown in Table 4.

Table 4 The 3 navigation patterns in which interest in the concept "University" occurs, with confidence threshold $10^{-6}$

| Access sequence $S_k$ | Interest association rule $R(\text{"University"} \mid S_k) \ge 10^{-6}$ |
|---|---|
| (/cjc/cjccw.html, /cjc/cjcc.html, /cjc/introc.html, /cjc/cjcw2.html, /cjc/contc.html, /cjc/cont98c.html) | $4.327 \times 10^{-6}$ |
| (/cjc/cjccw.html, /cjc/cjcc.html, /cjc/introc.html, /cjc/cjcw2.html, /cjc/othersc.html, /cjc/ly.html) | $2.019 \times 10^{-6}$ |
| (/cjc/cjccw.html, /cjc/cjcc.html, /cjc/cjcw2.html, /cjc/introc.html, /cjc/abstc.html) | $1.635 \times 10^{-6}$ |

7 Conclusions and Future Work

In this paper we introduce the hidden Markov model into interest navigation pattern discovery for the first time, extending the applications of HMMs, so that navigation patterns carrying user interest can be found. Such a pattern is essentially a special kind of association rule that reflects users' access preferences. In this method, we first define a hidden Markov model based on user access records, then propose a new incremental discovery algorithm, Increase-R, to discover interest navigation patterns, and we give a proof that the algorithm finds all interest navigation patterns.

The features of our approach are: 1) it discovers navigation patterns that carry users' access interests; 2) mining is periodic and offline; 3) the objects mined are the navigation behaviors of all users, what is mined is the access interests of all users, and the results apply to all users, with no information specific to one user or one class of users; 4) because different web pages are grouped automatically by the nature of their content, the navigation patterns found do not necessarily correspond to direct links in the web site.


This method can, on the one hand, help web site designers understand users' preferences and so improve the design of the site. On the other hand, it can help users better understand their own access behavior and reach the page content they want.

Our future work will focus on applying this method to predicting users' access behavior and making real-time personalized recommendations.

References

1 Stout R. Web Site Stats: Tracking Hits and Analyzing Traffic. Osborne: McGraw-Hill, 1997
2 Cooley R, Mobasher B et al. Grouping Web page references into transactions for mining World Wide Web browsing patterns. In: Proc Knowledge and Data Engineering Workshop, Newport Beach, CA, 1997. 245-253
3 Cooley R, Mobasher B et al. Web mining: Information and pattern discovery on the World Wide Web. In: Proc International Conference on Tools with Artificial Intelligence, Newport Beach, 1997. 312-320
4 Cooley R, Mobasher B et al. Data preparation for mining World Wide Web browsing patterns. Knowledge and Information Systems, 1999, 1(1): 207-213
5 Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Proc 20th VLDB Conference, Santiago, Chile, 1994. 487-499
6 Buchner A G, Mulvenna M D. Discovering internet marketing intelligence through online analytical Web usage mining. SIGMOD Record, 1998, 27(4): 54-61
7 Wexelblat A, Maes P. Footprints: History-rich web browsing. In: Proc Computer-Assisted Information Retrieval (RIAO), Boston, 1997. 75-84
8 Spiliopoulou M. The laborious way from data mining to web mining. Int Journal of Computer Systems, Science and Engineering, Special Issue on "Semantics of the Web", 1999, 3(1): 105-113
9 Chen M S, Park J S, Yu P S. Efficient data mining for traversal patterns. IEEE Trans Knowledge and Data Engineering, 1998, 10(2): 209-221
10 Borges J, Levene M. Data mining of user navigation patterns. In: Proc Web Usage Analysis and User Profiling Workshop, San Diego, California, 1999. 31-36
11 Luotonen A. The common log file format. http://www.w3.org/pub/WWW/, 1995
12 Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989, 77(2): 257-286
