
Data Mining For Web Personalization

This document summarizes a presentation on data mining for web personalization given by the Highflyers group. It introduces the speakers and their topics, which include traditional approaches to web personalization, data collection and preprocessing, and pattern discovery techniques like clustering, association rule mining, and sequential pattern mining. These techniques are used to build predictive user models from web usage data to enable personalized recommendations.

Uploaded by

Bằng Nguyễn


Data Mining for Web Personalization

Presented by the Highflyers group

Who are the Highflyers?

Irfan Butt: Introduction and Traditional approaches to Web Personalization
Joel Gascoigne: Data Collection, Preprocessing and Modelling
James Silver: Pattern Discovery Predictive Web User Modelling Part 1
Aaron John-Baptiste: Pattern Discovery Predictive Web User Modelling Part 2
Asad Qazi: Evaluating Personalized Models and Conclusion

Introduction
Paper titled: Data Mining for Web Personalization
Author: Bamshad Mobasher

Irfan Butt Introduction and Traditional approaches to Web Personalization

Introduction to Web Personalization

Personalization
Delivery of content tailored to a particular user

Web Personalization
Delivery of dynamic content, such as text and links, tailored to a particular user or segment of users

Automatic Personalization Vs Customization

Similarity: Both refer to delivery of content
Difference: Creation and updating of user profile
Examples
Customization: My Yahoo, Dell Website
Automatic Personalization: Amazon

Personalization in Traditional Approaches

Two phases in the process of personalization
1) Data Collection Phase
2) Learning Phase

Classification based on learning from data

1. Memory Based Learning (Lazy)
Examples: User-based collaborative system, Content-based filtering system
2. Model Based Learning (Eager)
Examples: Item-based System

Memory Based Learning VS Model Based Learning

Memory Based Learning (Lazy)

Huge memory required
Scalability issue
Adaptable to changes

Model Based Learning (Eager)

Limited memory required
Easily scalable
Learning phase offline
Not adaptable to changes

Traditional Approaches to Web Personalization

Rule Based Personalization Systems
Rules are used to recommend items
Rules are based on personal characteristics of the user
Static profiles result in degradation of the system

Traditional Approaches to Web Personalization

Content-based Filtering Systems
User profile built on content descriptions of items
Profile based on previous ratings of items

Traditional Approaches to Web Personalization

Collaborative Filtering Systems
A single profile is built in the same way as in content-based filtering systems
Items from more than one profile are used to recommend a new item or content
These profiles are the K nearest neighbours, based on each profile's previous ratings of items
Poor results as the system grows

Data Mining Approach to Personalization

Data Mining (or Web Usage Mining)
The automatic discovery and analysis of patterns in click stream and associated data collected or generated as a result of user interactions with Web resources on one or more Web sites

Data Mining Cycle:

Data preparation and transformation phase
Pattern discovery phase
Recommendation phase

Joel Gascoigne Data Collection, Preprocessing and Modelling

Data Modelling and Representation

Assume the existence of a set of m users:
U = {u1, u2, ..., um}

Set of n items:
I = {i1, i2, ..., in}

Data Modelling and Representation

The profile for a user u ∈ U is an n-dimensional vector of ordered pairs:
u(n) = {(i1, su(i1)), (i2, su(i2)), ..., (in, su(in))}
where su(i) is the interest score that user u assigns to item i

Typically, such profiles are collected over time and stored

Can be represented as an m x n matrix, UP (one row per user, one column per item)
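
As a rough illustration of this representation, the sketch below builds the matrix UP from a handful of made-up profiles. The user names, item names and scores are hypothetical, and using None for items a user has not scored is an assumption of this sketch rather than something the slides specify.

```python
# Hypothetical users, items and interest scores (not taken from the paper).
users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]
profiles = {
    "u1": {"i1": 5.0, "i3": 2.0},
    "u2": {"i2": 4.0, "i3": 1.0, "i4": 3.0},
    "u3": {"i1": 1.0},
}

def build_up(users, items, profiles):
    """Return UP as a list of m rows with n columns each.

    None marks a user-item pair for which no interest score is defined.
    """
    col = {item: j for j, item in enumerate(items)}
    up = [[None] * len(items) for _ in users]
    for r, user in enumerate(users):
        for item, score in profiles.get(user, {}).items():
            up[r][col[item]] = score
    return up

UP = build_up(users, items, profiles)
```

The None cells are the pairs for which a personalisation system would have to predict a score.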

Data Modelling and Representation

A Personalisation System, PS, can be viewed as a mapping of user profiles and items to obtain a rating of interest
The mapping is not generally defined for the whole domain of user-item pairs
System must predict interest scores

Data Modelling and Representation

This general framework can be used with most approaches to personalisation
In the data mining approach:
A variety of machine learning techniques are applied to UP to discover aggregate user models
These user models are used to make a prediction for the target user

Data Sources for Web Usage Mining

Main data sources used in web usage mining are server log files
Clickstream data

Other data sources include the site files and meta-data

Data Sources for Web Usage Mining

This data needs to be abstracted
Pageview
Representation of a collection of web objects

Session
A sequence of pageviews by a single user

All sessions belonging to a user can be aggregated to create the profile for that user

Data Sources for Web Usage Mining

Content data
Collection of objects and relationships conveyed to the user
Text
Images

Also, semantic or structural meta-data embedded within the site

Domain ontology
Could use an ontology language such as RDF
Or a database schema

Data Sources for Web Usage Mining

Also, operational databases for the site may include additional information about users and items
Geographic information
User ratings

Primary Tasks in Data Preprocessing for Web Usage Mining

Data Preprocessing for Web Usage Mining

Goal:
Transform click-stream data into a set of user profiles

This sessionized data can be used as the input for a variety of data mining algorithms or further abstracted

Data Preprocessing for Web Usage Mining

Tasks in usage data preprocessing:
Data Fusion
Data Cleaning
Pageview Identification
Sessionization
Episode Identification

Data Preprocessing for Web Usage Mining

Data Fusion:
Merging of log files from web and application servers

Data Cleaning:
Tasks such as:
Removing extraneous references to embedded objects
Removing references due to spider navigations
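
The two cleaning tasks can be sketched as a simple filter over parsed log entries. The entry layout (a dict with "url" and "agent" keys), the list of file extensions and the crawler markers are all assumptions made for this illustration; a real site would need its own lists.

```python
import re

# Assumed extensions for embedded objects and assumed crawler markers.
EMBEDDED = re.compile(r"\.(gif|jpe?g|png|css|js|ico)(\?.*)?$", re.IGNORECASE)
BOT_MARKERS = ("bot", "crawler", "spider")

def clean(entries):
    """Keep only entries that look like page requests made by human visitors."""
    kept = []
    for entry in entries:
        if EMBEDDED.search(entry["url"]):
            continue  # extraneous reference to an embedded object
        agent = entry["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue  # reference produced by spider navigation
        kept.append(entry)
    return kept

# Hypothetical log entries.
log = [
    {"url": "/index.html", "agent": "Mozilla/5.0"},
    {"url": "/img/logo.gif", "agent": "Mozilla/5.0"},
    {"url": "/products.html", "agent": "Googlebot/2.1"},
    {"url": "/products.html?id=7", "agent": "Mozilla/5.0"},
]
cleaned = clean(log)
```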

Data Preprocessing for Web Usage Mining

Pageview Identification:
Aggregation of a collection of objects or pages that should be considered a unit
This process is dependent on the linkage structure of the site
In the simplest case, each HTML file has a one-to-one correlation with a pageview
Must distinguish between users
Authentication system or cookies

Data Preprocessing for Web Usage Mining

Sessionization:
Process of segmenting the user activity log of each user into sessions, each representing a single visit to the site

Episode Identification:
Episode is a subsequence of a session comprised of related pageviews
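
One common way to sessionize is to start a new session whenever the gap between two consecutive requests of the same user exceeds a timeout. The slides do not say which heuristic is used, so the 30-minute threshold and the (user, timestamp, pageview) tuple layout below are assumptions of this sketch.

```python
from collections import defaultdict

TIMEOUT = 30 * 60  # seconds; an assumed threshold, not taken from the slides

def sessionize(log, timeout=TIMEOUT):
    """Split each user's requests into sessions.

    log: iterable of (user_id, timestamp_in_seconds, pageview) tuples.
    Returns {user_id: [[pageview, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user, ts, page in log:
        by_user[user].append((ts, page))

    sessions = {}
    for user, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])
        user_sessions, last_ts = [], None
        for ts, page in hits:
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])  # a long gap starts a new visit
            user_sessions[-1].append(page)
            last_ts = ts
        sessions[user] = user_sessions
    return sessions

# Hypothetical activity log.
log = [
    ("u1", 0, "A"), ("u1", 60, "B"), ("u1", 4000, "C"),
    ("u2", 10, "A"), ("u2", 20, "D"),
]
result = sessionize(log)
```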

Data Preprocessing for Web Usage Mining

These tasks ultimately result in a set of n pageviews
P = {p1, p2, ..., pn}

A set of v user transactions

T = {t1, t2, ..., tv}

A user transaction captures the activity of a user during a particular session

Data Preprocessing for Web Usage Mining

Finally, one or more transactions or sessions associated with a given user can be aggregated to form the final profile for that user
If the profile is generated from a single session, it represents short-term interests
Aggregation of multiple sessions results in profiles that capture long-term interests

Data Preprocessing for Web Usage Mining

The collection of these profiles comprises the m x n matrix UP, which can be used to perform various data mining tasks
After basic clickstream preprocessing steps, data from other sources is integrated:
Content, structure and user data

James Silver Pattern Discovery Predictive Web User Modelling Part 1

Model-Based Collaborative Techniques

Two-stage recommendation process:
(A) Offline model-building
(B) Real-time scoring
(Explicit & Implicit user behavioural data used)

Offline model-building algorithms:

(1) Clustering, (2) Association Rule Discovery, (3) Sequential Pattern Discovery, (4) Latent Variable Models (part 2)

We also look at hybrid models (part 2)

(1) Clustering
Clustering divides data into groups where:
Inter-cluster similarities are minimised
Intra-cluster similarities are maximised

Generalization to Web usage mining
User-based vs. Item-based clustering
Efficiency and scalability improvements

(1) Clustering: User-based

Partitions the user profiles in the matrix UP
Clusters represent user segments based on common navigational behaviour

Recommendations (target user u, target item i)

Centroid vector vk is computed for each cluster Ck
Neighbourhood: all user segments that have a score for i and whose vk is most similar to u
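
The recommendation step can be sketched as follows, assuming the clustering itself has already been done offline. Cosine similarity and the similarity-weighted average are choices made for this illustration, the slides do not fix either, and the profile vectors are hypothetical.

```python
import math

def centroid(cluster):
    """Mean vector v_k of the profile vectors in cluster C_k."""
    n = len(cluster[0])
    return [sum(p[j] for p in cluster) / len(cluster) for j in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(u, item_idx, clusters, top=1):
    """Predict u's score for one item from the most similar cluster centroids."""
    centroids = [centroid(c) for c in clusters]
    # Only segments that have a score for the target item can take part.
    candidates = [v for v in centroids if v[item_idx] > 0]
    candidates.sort(key=lambda v: cosine(u, v), reverse=True)
    neighbours = candidates[:top]
    weights = [cosine(u, v) for v in neighbours]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * v[item_idx] for w, v in zip(weights, neighbours)) / total

# Hypothetical clusters of binary pageview vectors over four pages.
clusters = [
    [[1, 1, 1, 0], [1, 1, 0, 0]],
    [[0, 0, 1, 1], [0, 0, 1, 1]],
]
target = [1, 1, 0, 0]
score = predict(target, 2, clusters)
```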

(1) Clustering: Other

Fuzzy Clustering
Desirable when a user should be assigned to more than one category

Distance issues
Consider web-transactions as sequences

Association Rule Hypergraph Partitioning (ARHP)

(2) Association Rule Discovery

Finding groups of pages or items that are commonly accessed or purchased together

Originally for mining supermarket basket data
Discovering association rules involves:
1) Discovering frequent itemsets
Satisfying a minimum support threshold

2) Discovering association rules
Satisfying a minimum confidence threshold

(2) Association Rules: Concepts

Transactions set T
Itemsets I = {I1, I2, ..., Ik} over T
Association rule r has the form X => Y (sr, cr)
sr = the support of X ∪ Y (i.e. the probability that X and Y occur together in a transaction)
cr = the confidence of the rule r (i.e. the conditional probability that Y occurs in a transaction, given that X has occurred in that transaction)
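
Both measures can be estimated directly from a transaction set by counting. The sketch below is a minimal illustration; the function names and the four transactions are made up for the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Estimate P(Y | X) as support(X u Y) / support(X)."""
    sx = support(x, transactions)
    return support(set(x) | set(y), transactions) / sx if sx else 0.0

# Hypothetical transaction set T.
T = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
]
s = support({"A", "B"}, T)       # A and B appear together in 2 of 4 transactions
c = confidence({"A"}, {"B"}, T)  # B appears in 2 of the 3 transactions containing A
```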

(2) Recommendations
Matching rule antecedents with target user profiles
Sliding window solution
Naive approach
Frequent Itemset Graph

Finding Candidate pages:

Match current user session window with previously discovered frequent itemsets

Recommendation Value
Confidence of corresponding association rule
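
A minimal sketch of the naive variant: scan the frequent itemsets for those that extend the current session window by exactly one page, and score that page by the rule's confidence. The itemset supports, the window size and the minimum confidence are hypothetical, and the sketch assumes the itemsets were mined offline and include every subset of each frequent itemset. It does not build the Frequent Itemset Graph.

```python
def recommend(session, itemsets, window=2, min_conf=0.5):
    """Return (page, confidence) pairs for the last `window` pages of a session.

    itemsets: {frozenset_of_pages: support}, assumed to be mined offline.
    """
    active = frozenset(session[-window:])
    if active not in itemsets:
        return []
    scores = {}
    for itemset, sup in itemsets.items():
        # A match is a frequent itemset made of the window plus one extra page.
        if len(itemset) == len(active) + 1 and active < itemset:
            (candidate,) = itemset - active
            conf = sup / itemsets[active]  # confidence of: window => candidate
            if conf >= min_conf:
                scores[candidate] = conf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical frequent itemsets with their supports.
itemsets = {
    frozenset({"A"}): 0.75,
    frozenset({"B"}): 0.75,
    frozenset({"A", "B"}): 0.5,
    frozenset({"A", "B", "C"}): 0.4,
    frozenset({"A", "B", "D"}): 0.2,
}
recs = recommend(["X", "A", "B"], itemsets)
```

Here only C is returned, because the rule {A, B} => D falls below the assumed minimum confidence.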

(3) Sequential Models

Now we consider the order when discovering frequently occurring itemsets.
So: given the user transaction {i1,i2,i3}
Association rules (i1=>i2) and (i2=>i1) are fine
But the sequential pattern (i2=>i1) is not supported

Two types of sequences, for the pattern i1,i2 => i3:

Contiguous (closed) sequence: {i1,i2,i3}
Open sequence: {i1,i2,i4,i3}

Frequent Navigational Paths
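
The difference between the two sequence types can be checked with two small helper functions. The function names are made up for this sketch; the transactions are the ones from the slide.

```python
def is_open_subsequence(pattern, transaction):
    """True if `pattern` occurs in order, with other items allowed in between."""
    remaining = iter(transaction)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)

def is_contiguous_subsequence(pattern, transaction):
    """True if `pattern` occurs in order with no items in between."""
    k = len(pattern)
    return any(list(transaction[i:i + k]) == list(pattern)
               for i in range(len(transaction) - k + 1))

pattern = ["i1", "i2", "i3"]
t_closed = ["i1", "i2", "i3"]
t_open = ["i1", "i2", "i4", "i3"]
```

With these inputs, both transactions support the open pattern, but only the first supports the contiguous one.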

(3) Recommendations
Trie-structure (aggregate tree)
Each node is an item, root is the empty sequence

Recommendation Generation
Found in O(s) by traversing the tree
s = the length of the current user transaction deemed to be useful in recommending the next set of items

Sliding window w
Maximum depth of tree therefore is |w|+1
Controlling the size of the tree
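
A minimal sketch of such an aggregate tree, assuming contiguous patterns and a window of two items. The class and function names, the counting scheme and the transactions are assumptions of this illustration. Storing only subsequences of length |w|+1 or less is what bounds the depth of the tree.

```python
class Node:
    def __init__(self):
        self.count = 0
        self.children = {}

def build_trie(transactions, window=2):
    """Insert every contiguous subsequence of at most window + 1 items."""
    root = Node()  # the root stands for the empty sequence
    for t in transactions:
        for start in range(len(t)):
            node = root
            for item in t[start:start + window + 1]:
                node = node.children.setdefault(item, Node())
                node.count += 1
    return root

def recommend(root, session, window=2):
    """Follow the last `window` items from the root and rank the children there."""
    if not session:
        return []
    node = root
    for item in session[-window:]:
        node = node.children.get(item)
        if node is None:
            return []
    return sorted(((item, child.count / node.count)
                   for item, child in node.children.items()),
                  key=lambda kv: kv[1], reverse=True)

# Hypothetical user transactions.
transactions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
root = build_trie(transactions)
recs = recommend(root, ["X", "A", "B"])
```

The lookup visits one node per item in the window, which matches the O(s) cost noted on the slide.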

(3) Sequential Models: Contiguous

Contiguous sequence patterns are particularly restrictive
Valuable in page pre-fetching applications
Rather than in the general context of recommendation generation

(3) Sequential Models: Markov

Another approach for sequential modelling
Based on Stochastic methods

Modelling the navigational activity in the website as a Markov chain

(3) Sequential Models: Markov

A Markov model is represented by the 3-tuple <A,S,T>
A: set of possible actions (items)
S: set of n states for which the model is built (visitors' navigation history)
T = [pi,j], of size n x n: Transition Probability Matrix
pi,j: probability of a transition from state si to state sj

Order : Number of prior events used in predicting each future event
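
For a first-order model, where each state is simply the current page, the transition probabilities can be estimated by counting consecutive page pairs in the sessions. The sketch below assumes that simplification; the sessions are hypothetical, and the matrix is stored as nested dicts rather than as an n x n array.

```python
from collections import Counter, defaultdict

def transition_matrix(sessions):
    """Estimate first-order transition probabilities p[i][j] from sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for state, following in counts.items()
    }

def predict_next(T, state):
    """Most probable next state, or None if `state` was never left in the data."""
    following = T.get(state)
    return max(following, key=following.get) if following else None

# Hypothetical sessions.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
T = transition_matrix(sessions)
```

A higher-order model would use the last k pages as the state instead of a single page.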

(3) Markov for Web-mining

Designed to predict the next user action based on the user's previous surfing behaviour
Also used to discover high-probability user navigational paths in a website
User-preferred trails

Various optimization methods
Apart from Markov: Mixture Models

Aaron John-Baptiste Pattern Discovery Predictive Web User Modelling Part 2

(4) Latent Variable Models (LVMs)

Latent Variables are variables that haven't been directly observed but have rather been inferred.
E.g. Morale is not measured directly but inferred

Have more recently become popular as a modelling approach in web usage mining
Two commonly used LVMs
Finite Mixture Models (FMM)
Factor Analysis (FA)

(4) FA and FMM

Factor Analysis
Aims to summarise and find relationships within observed data (all data)
Used in pattern recognition, collaborative filtering and personalization based on web usage mining

Finite Mixture Models (FMM)

Use a finite number of components to model an observation (a pageview or a user rating)

(4) Drawbacks to pure usage based models

Pure usage based models have drawbacks
Process relies on user transactions or rating data
New items or pages are therefore never recommended (new item problem)
Also do not use knowledge from the underlying domain and so cannot make more complex recommendations

(5) Hybrid models

Uses a combination of user-based and content-based modelling.
Three main types used in web mining
Integrating content features
Integrating semantic knowledge
Using linkage structure

(5) Integrating content features with usage-based models

Solves new item problem
Use content characteristics of pages with user-based data
Extract keywords from content to be used to discover patterns
Not just using user data means new pages with relevant content can be recommended
Users' interests can be mapped to content (concepts or topics)

(5) Integrating structured semantic knowledge with usage-based models

Content feature integration is useful when pages are rich in text and keywords
However, it cannot capture more complex relationships where items have underlying properties
Idea is to take the underlying meanings of objects and add them to the user-based data
Recommendations can then be made to pages or items with similar semantic meanings

(5) Using Linkage structure for model learning and selection

Other semantic data can be used, such as relational databases and the hyperlink structure on a web page
Mobasher proposes a hybrid recommendation system that switches between different algorithms based on the degree of connectivity in the site and user
E.g. in a highly connected website with short paths, non-sequential models performed better

Asad Qazi Evaluating Personalized Models and Conclusion

Evaluating Personalization models

The Primary Goal of this section is to evaluate the accuracy and effectiveness of web personalization models

Why Evaluate?
More complex web-based applications and more complex user interactions require the selection of more sophisticated models
Need to further explore the impact of the recommended model on user behaviour
There are several different modelling approaches to web personalization
Evaluating personalized models is an inherently challenging task:
Firstly, different models require different evaluation metrics
Secondly, the required personalization actions may be quite different depending on the underlying domain, relevant data and intended application
Finally, there is a lack of consensus among researchers as to what factors affect quality of service in personalized systems and what elements contribute to user satisfaction

Common evaluation approaches

A number of metrics have been proposed in the literature for evaluating the robustness and predictive accuracy of a recommender system, including:
Mean Absolute Error (MAE)
Classification metrics (Precision and Recall)
Receiver Operating Characteristic (ROC)
Business metrics that measure customer loyalty and satisfaction, such as Recency Frequency Monetary (RFM)
Other key dimensions used alongside these metrics: Accuracy, Coverage, Utility, Explainability, Robustness, Scalability and User Satisfaction
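
The first two families of metrics are straightforward to compute. The sketch below uses their standard definitions; the function names and the sample ratings and recommendations are made up for the example.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def precision_recall(recommended, relevant):
    """Precision: share of recommended items that are relevant.
    Recall: share of relevant items that were recommended."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical predicted vs. actual ratings, and recommended vs. relevant items.
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
precision, recall = precision_recall(["A", "B", "C", "D"], ["B", "D", "E"])
```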

Conclusions
Web personalisation is viewed as an application of data mining which dynamically serves customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
We have also discussed the various sources of data available to personalization systems, the modelling approaches employed and the current approaches to evaluating these systems
Recent user studies have found that a number of issues can affect the perceived usefulness of personalization systems, including trust in the system, transparency of the recommendation logic, the ability for a user to refine the system-generated profile, and diversity of recommendations
Most personalization systems tend to use a static profile of the user. However, user interests are not static; they change with time and context. Few systems have attempted to handle the dynamics within the user profile.

Any Questions?
