Merging algorithm to reduce dimensionality in application to web-mining
V Nikulin, GJ McLachlan - AI 2007: Advances in Artificial Intelligence: 20th …, 2007 - Springer
Abstract
Dimensionality reduction can compress data without loss of essential information. It is proposed to reduce the dimension (the number of web-areas, or vroots, used) through an unsupervised learning process that maximizes a specially defined average log-likelihood divergence. Two web-areas are merged if they frequently appear together in the same sessions. Essentially, the roles of the web-areas in the merging process are not symmetrical: the web-area or cluster with the larger weight acts as an attractor and stimulates merging, whereas the smaller cluster resists and tries to keep its independence. In both cases, the strength of attraction or resistance depends on the weights of the corresponding clusters. This strategy prevents the creation of a single super-large cluster and helps reduce the number of insignificant clusters. The proposed method is illustrated with two synthetic examples. The first is based on an ideal vlink matrix, which characterizes the weights of the vroots and the relations between them; the vlink matrix for the second is generated by a specially designed web-traffic simulator.
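The weight-asymmetric merging idea from the abstract can be sketched as follows. The paper's actual criterion (the average log-likelihood divergence) is not reproduced here, so this sketch substitutes a simplified co-occurrence score; the function names, the scoring rule `co * big / (small + 1)`, and the greedy merge loop are all illustrative assumptions, not the authors' method.

```python
# Illustrative sketch of weight-asymmetric cluster merging over a vlink-style
# co-occurrence matrix. The scoring rule below is an assumption standing in
# for the paper's average log-likelihood divergence criterion.

def build_vlink(sessions, n):
    """Count pairwise co-occurrences of vroots within sessions.
    Diagonal entries hold each vroot's weight (number of sessions it occurs in)."""
    v = [[0] * n for _ in range(n)]
    for s in sessions:
        uniq = sorted(set(s))
        for a in uniq:
            v[a][a] += 1
        for i, a in enumerate(uniq):
            for b in uniq[i + 1:]:
                v[a][b] += 1
                v[b][a] += 1
    return v

def merge_step(v, clusters, min_score=1.0):
    """Greedily merge the cluster pair with the highest attraction score.
    The score grows with co-occurrence and with the bigger cluster's weight
    (its attraction), and shrinks with the smaller cluster's weight
    (its resistance). Returns the merged pair, or None if no merge qualifies."""
    best, pair = 0.0, None
    ids = sorted(clusters)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            co = v[a][b]
            if co == 0:
                continue
            big = max(v[a][a], v[b][b])
            small = min(v[a][a], v[b][b])
            score = co * big / (small + 1)  # assumed asymmetric score
            if score > best:
                best, pair = score, (a, b)
    if pair is None or best < min_score:
        return None
    a, b = pair
    # Absorb cluster b into cluster a: fold b's rows/columns and weight into a's.
    for c in clusters:
        if c not in (a, b):
            v[a][c] += v[b][c]
            v[c][a] += v[c][b]
    v[a][a] += v[b][b]
    clusters.discard(b)
    return pair
```

Calling `merge_step` repeatedly until it returns `None` yields a reduced set of clusters; because the score divides by the smaller cluster's weight, two heavy clusters resist merging with each other, which is one plausible reading of how the described strategy avoids a single super-large cluster.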