Module 2 Web Usage Mining

Web usage mining involves the automatic discovery of patterns from user interactions with websites, aiming to analyze behavioral patterns and user profiles. It utilizes data from web server logs, site contents, and visitor information, and involves processes such as data cleaning, session identification, and integration with e-commerce events. The insights gained from web usage mining are essential for personalizing web services and improving user experience, particularly in e-commerce and search engine applications.

Uploaded by

bhavinjain408

Web Usage Mining

Dr. Jalpa Darshit Mehta
Introduction
◼ Web usage mining: automatic discovery of
patterns in clickstreams and associated data
collected or generated as a result of user
interactions with one or more Web sites.
◼ Goal: analyze the behavioral patterns and
profiles of users interacting with a Web site.
◼ The discovered patterns are usually
represented as collections of pages, objects,
or resources that are frequently accessed by
groups of users with common interests.
Introduction
◼ Data in Web Usage Mining:
❑ Web server logs
❑ Site contents
❑ Data about the visitors, gathered from external channels
❑ Further application data
◼ Not all these data are always available.
◼ When they are, they must be integrated.
◼ A large part of Web usage mining is about
processing usage/ clickstream data.
❑ After that, various data mining algorithms can be applied.

Web server logs
1 2006-02-01 00:08:43 1.2.3.4 - GET /classes/cs589/papers.html - 200 9221
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)
http://dataminingresources.blogspot.com/
2 2006-02-01 00:08:46 1.2.3.4 - GET /classes/cs589/papers/cms-tai.pdf - 200 4096
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)
http://maya.cs.depaul.edu/~classes/cs589/papers.html
3 2006-02-01 08:01:28 2.3.4.5 - GET /classes/ds575/papers/hyperlink.pdf - 200 318814
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)
http://www.google.com/search?hl=en&lr=&q=hyperlink+analysis+for+the+web+survey
4 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/announce.html - 200 3794
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
http://maya.cs.depaul.edu/~classes/cs480/
5 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/styles2.css - 200 1636
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
http://maya.cs.depaul.edu/~classes/cs480/announce.html
6 2006-02-02 19:34:45 3.4.5.6 - GET /classes/cs480/header.gif - 200 6027
HTTP/1.1 maya.cs.depaul.edu
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)
http://maya.cs.depaul.edu/~classes/cs480/announce.html
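
A minimal sketch of splitting one such record into named fields. The field names and their order (date, time, client IP, user name, method, URI, query, status, bytes, protocol, host, user agent, referrer) are inferred from the sample records above and are an assumption, not a confirmed server configuration.

```python
from datetime import datetime

# Field order inferred from the sample records (an assumption).
FIELDS = ["date", "time", "ip", "user", "method", "uri", "query",
          "status", "bytes", "protocol", "host", "agent", "referrer"]

def parse_log_line(line):
    """Split one space-separated log record into a dict of named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    rec = dict(zip(FIELDS, parts))
    # Combine date and time into one timestamp for later ordering.
    rec["timestamp"] = datetime.strptime(
        rec["date"] + " " + rec["time"], "%Y-%m-%d %H:%M:%S")
    rec["status"] = int(rec["status"])
    rec["bytes"] = int(rec["bytes"])
    return rec

# First sample record, with the leading record number dropped.
SAMPLE = ("2006-02-01 00:08:43 1.2.3.4 - GET /classes/cs589/papers.html - 200 9221 "
          "HTTP/1.1 maya.cs.depaul.edu "
          "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727) "
          "http://dataminingresources.blogspot.com/")

record = parse_log_line(SAMPLE)
print(record["ip"], record["uri"], record["status"])
```

Splitting on whitespace works here only because the user agent is logged with "+" in place of spaces; a log format that keeps literal spaces would need a different parser.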

Web usage mining process

Data preparation

Pre-processing of web usage data

Data cleaning

◼ Data cleaning
❑ remove irrelevant references and fields in server
logs
❑ remove references due to spider navigation
❑ remove erroneous references
❑ add missing references due to caching (done after
sessionization)
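
The first three cleaning steps can be sketched as a filter over parsed log records. The record layout (dicts with `uri`, `status`, and `agent` keys), the list of ignored file suffixes, the crawler keywords, and the sample records are all illustrative assumptions; a real site would tune them.

```python
# Suffixes treated as irrelevant embedded objects (assumed list).
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
# Substrings that suggest a crawler user agent (assumed list).
SPIDER_HINTS = ("bot", "crawler", "spider")

def is_relevant(rec):
    """Return True if a log record should survive cleaning."""
    uri = rec["uri"].lower()
    agent = rec["agent"].lower()
    if uri.endswith(IGNORED_SUFFIXES):
        return False          # irrelevant reference
    if uri == "/robots.txt" or any(h in agent for h in SPIDER_HINTS):
        return False          # reference due to spider navigation
    if rec["status"] >= 400:
        return False          # erroneous reference
    return True

def clean(records):
    """Keep only the records that pass every cleaning rule."""
    return [r for r in records if is_relevant(r)]

# Hypothetical records; only the first should remain after cleaning.
records = [
    {"uri": "/classes/cs480/announce.html", "status": 200, "agent": "Mozilla/4.0"},
    {"uri": "/classes/cs480/header.gif", "status": 200, "agent": "Mozilla/4.0"},
    {"uri": "/classes/cs480/old.html", "status": 404, "agent": "Mozilla/4.0"},
    {"uri": "/classes/cs480/", "status": 200, "agent": "ExampleBot/1.0"},
]
cleaned = clean(records)
print([r["uri"] for r in cleaned])
```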

Identify sessions (sessionization)

◼ In Web usage analysis, the basic units of data are the
sessions of the site visitors: the activities
performed by a user from the moment she
enters the site until the moment she leaves it.
◼ Difficult to obtain reliable usage data due to
proxy servers and anonymizers, dynamic IP
addresses, missing references due to
caching, and the inability of servers to
distinguish among different visits.
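
Since servers cannot tell visits apart directly, a common workaround is a time-oriented heuristic. The sketch below assumes a 30-minute inactivity threshold and assumes that each user is already identified by some key (here the IP address); both are illustrative choices, and the third request in the example is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, page) requests into per-user sessions.

    A new session starts when the gap since the user's previous
    request exceeds `timeout` (an assumed inactivity threshold).
    """
    by_user = defaultdict(list)
    for user, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, page))

    sessions = []
    for user, hits in by_user.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))  # close the old session
                current = []
            current.append(page)
        sessions.append((user, current))
    return sessions

requests = [
    ("1.2.3.4", datetime(2006, 2, 1, 0, 8, 43), "/classes/cs589/papers.html"),
    ("1.2.3.4", datetime(2006, 2, 1, 0, 8, 46), "/classes/cs589/papers/cms-tai.pdf"),
    ("1.2.3.4", datetime(2006, 2, 1, 2, 0, 0), "/classes/cs589/"),  # hypothetical later visit
]
sessions = sessionize(requests)
print(sessions)
```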

Sessionization strategies

Sessionization heuristics

Sessionization example

User identification

User identification: an example

Pageview

◼ A pageview is an aggregate representation of
a collection of Web objects contributing to the
display on a user’s browser resulting from a
single user action (such as a click-through).
◼ Conceptually, each pageview can be viewed
as a collection of Web objects or resources
representing a specific “user event,” e.g.,
reading an article, viewing a product page, or
adding a product to the shopping cart.
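
One simple way to approximate pageviews from a user's request stream is to fold embedded objects into the page request that precedes them. The suffix list and the "attach to the previous page" rule are assumptions made for this sketch; the URIs are taken from the sample log records for client 3.4.5.6.

```python
# Suffixes treated as embedded objects rather than pages (assumed list).
EMBEDDED_SUFFIXES = (".css", ".gif", ".jpg", ".png", ".js")

def build_pageviews(uris):
    """Fold embedded objects into the pageview of the preceding page request."""
    pageviews = []
    for uri in uris:
        if uri.lower().endswith(EMBEDDED_SUFFIXES) and pageviews:
            pageviews[-1]["objects"].append(uri)
        else:
            # A page request (or an object with no preceding page)
            # opens a new pageview.
            pageviews.append({"page": uri, "objects": []})
    return pageviews

uris = [
    "/classes/cs480/announce.html",
    "/classes/cs480/styles2.css",
    "/classes/cs480/header.gif",
]
pageviews = build_pageviews(uris)
print(pageviews)
```

Here the three requests collapse into a single pageview for announce.html, which matches the idea of one user action producing several object requests.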

Path completion
◼ Client- or proxy-side caching can often result
in missing access references to those pages
or objects that have been cached.
◼ For instance,
❑ if a user returns to a page A during the same
session, the second access to A will likely result in
viewing the previously downloaded version of A
that was cached on the client-side, and therefore,
no request is made to the server.
❑ This results in the second reference to A not being
recorded on the server logs.
Missing references due to caching

Path completion
◼ Path completion is the problem of inferring
missing user references due to caching.
◼ Effective path completion requires extensive
knowledge of the link structure within the site.
◼ Referrer information in server logs can also
be used in disambiguating the inferred paths.
◼ The problem gets much more complicated in
frame-based sites.
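
A minimal sketch of referrer-based path completion, assuming each request in a session is available as a (page, referrer) pair. It uses only the referrer field and the session's own history, not the site's link structure, so it is a simplification of the general problem; the page names A, B, C, E are hypothetical.

```python
def complete_path(requests):
    """Insert inferred cached page views into a session.

    `requests` is a list of (page, referrer) pairs in time order. When a
    request's referrer is not the page currently in view, the user is
    assumed to have pressed "back" through cached pages until reaching
    the referrer; those backtracked pages are re-inserted.
    """
    path = []   # completed sequence of page views
    stack = []  # navigation history used for backtracking
    for page, referrer in requests:
        if stack and referrer is not None and referrer != stack[-1] and referrer in stack:
            # Walk back through the history until the referrer is on top.
            while stack[-1] != referrer:
                stack.pop()
                path.append(stack[-1])
        path.append(page)
        stack.append(page)
    return path

# The user visits A, B, C, then goes back to A via the cache and clicks E.
session = [("A", None), ("B", "A"), ("C", "B"), ("E", "A")]
completed = complete_path(session)
print(completed)  # ['A', 'B', 'C', 'B', 'A', 'E']
```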

Integrating with e-commerce events
◼ E-commerce events are either product-oriented
or visit-oriented.
◼ Used to track and analyze conversion of
browsers to buyers.
❑ The major difficulty with e-commerce events is defining
and implementing the events for a site. In contrast
to clickstream data, however, getting reliable
preprocessed data is not a problem.
◼ Another major challenge is the successful
integration with clickstream data.

Product-Oriented Events

◼ Product View
❑ Occurs every time a product is displayed on a
page view
❑ Typical Types: Image, Link, Text
◼ Product Click-through
❑ Occurs every time a user “clicks” on a product to
get more information

Product-Oriented Events

◼ Shopping Cart Changes
❑ Shopping Cart Add or Remove
❑ Shopping Cart Change - quantity or other feature
(e.g. size) is changed
◼ Product Buy or Bid
❑ Separate buy event occurs for each product in the
shopping cart
❑ Auction sites can track bid events in addition to
the product purchases
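
Product-oriented events can be tallied per product to see how browsing converts into buying. The event labels (`view`, `click`, `cart_add`, `buy`), the product IDs, and the definition of conversion as buys per product view are assumptions made for this sketch.

```python
from collections import Counter

# Assumed event labels, in funnel order.
FUNNEL = ["view", "click", "cart_add", "buy"]

def funnel_counts(events):
    """Count, per product, how many events of each type occurred."""
    counts = {}
    for product, event in events:
        counts.setdefault(product, Counter())[event] += 1
    return counts

def conversion_rate(counter):
    """Buys per product view; 0.0 when the product was never viewed."""
    return counter["buy"] / counter["view"] if counter["view"] else 0.0

# Hypothetical event stream of (product, event) pairs.
events = [
    ("p1", "view"), ("p1", "view"), ("p1", "click"),
    ("p1", "cart_add"), ("p1", "buy"),
    ("p2", "view"), ("p2", "click"),
]
counts = funnel_counts(events)
for product, counter in counts.items():
    stages = [counter[stage] for stage in FUNNEL]
    print(product, stages, conversion_rate(counter))
```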

Web usage mining process

Integration with page content

Integration with link structure

E-commerce data analysis

Session analysis

◼ Simplest form of analysis: examine individual
or groups of server sessions and
e-commerce data.
◼ Advantages:
❑ Gain insight into typical customer behaviors.
❑ Trace specific problems with the site.
◼ Drawbacks:
❑ LOTS of data.
❑ Difficult to generalize.
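
Because individual sessions are numerous and hard to generalize from, a first step is often to reduce them to a few aggregate figures. The sketch below assumes sessions are lists of page URIs; the chosen statistics (session count, average length, most visited pages) and the sample sessions are illustrative only.

```python
from collections import Counter

def summarize(sessions, top_n=2):
    """Compute simple aggregate statistics over a list of sessions."""
    total = len(sessions)
    lengths = [len(s) for s in sessions]
    # Count each page at most once per session.
    page_counts = Counter(p for s in sessions for p in set(s))
    return {
        "sessions": total,
        "avg_length": sum(lengths) / total if total else 0.0,
        "top_pages": page_counts.most_common(top_n),
    }

# Hypothetical sessions.
sessions = [
    ["/", "/a"],
    ["/", "/b"],
    ["/", "/a", "/c"],
]
summary = summarize(sessions)
print(summary)
```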

Session analysis: aggregate reports

OLAP

Data mining

Data mining (cont.)

Some usage mining applications

Personalization application

Standard approaches

Summary
◼ Web usage mining has emerged as the essential
tool for realizing more personalized, user-friendly
and business-optimal Web services.
◼ The key is to use the user-clickstream data for
many mining purposes.
◼ Traditionally, Web usage mining is used by
e-commerce sites to organize their sites and to
increase profits.
◼ It is now also used by search engines to improve
search quality and to evaluate search results,
and by many other applications.