Web Mining

A web mining document, related to data mining, intended to help students and to serve as a reference.

Uploaded by

Akilesh Peethambaram

Overview
• Definition of web mining
• Challenges in Web Mining
• Classification of Web Mining
Web Mining
• The Web is the single largest data source in the world
• Due to the heterogeneity and lack of structure of web data,
  mining it is a challenging task
• Multidisciplinary field:
  data mining, machine learning, natural language
  processing, statistics, databases, information
  retrieval, multimedia, etc.
What is Web Mining?
Web Mining Definition:
• Web mining is a technique to automatically discover and
  extract useful information from the World Wide Web (WWW).
• It is the application of data mining and machine learning
  techniques to extract useful knowledge from the content,
  structure, and usage of Web resources.
Opportunities and Challenges
• Web offers an unprecedented opportunity and
  challenge to data mining
  • The amount of information on the Web is huge, and easily
    accessible.
  • The coverage of Web information is very wide and diverse. One
    can find information about almost anything.
  • Information/data of almost all types exist on the Web, e.g.,
    structured tables, texts, multimedia data, etc.
  • Much of the Web information is semi-structured due to the
    nested structure of HTML code.
  • Much of the Web information is linked. There are hyperlinks
    among pages within a site, and across different sites.
  • Much of the Web information is redundant. The same piece of
    information or its variants may appear in many pages.
Opportunities and Challenges
  • The Web is noisy. A Web page typically contains a mixture of
    many kinds of information, e.g., main contents,
    advertisements, navigation panels, copyright notices, etc.
  • The Web is also about services. Many Web sites and pages
    enable people to perform operations with input parameters,
    i.e., they provide services.
  • The Web is dynamic. Information on the Web changes
    constantly. Keeping up with the changes and monitoring the
    changes are important issues.
  • Above all, the Web is a virtual society. It is not only about
    data, information and services, but also about interactions
    among people, organizations and automatic systems, i.e.,
    communities.
Data Mining vs. Web Mining
• Traditional data mining
  • Data is structured and relational
  • Well-defined tables, columns, rows, keys, and
    constraints
• Web data
  • Semi-structured and unstructured
  • Readily available data
  • Rich in features and patterns
Web mining may be divided into three categories:

1. Web usage mining
2. Web content mining
3. Web structure mining

December 2008 ©GKGupta

Web-Usage Mining
• What is Usage Mining?
  • Discovering user ‘navigation patterns’ from web data.
  • Predicting user behavior while the user interacts
    with the web.
  • Helps to improve large collections of resources.
  • Extracting interesting patterns from user interactions
    with resources on one or more Web sites.
Web-Usage Mining
cont…
• Usage Mining Techniques
  • Data Preparation
  • Data Collection
  • Data Selection
  • Data Cleaning
• Web usage mining patterns:
  • Navigation Patterns
  • Sequential Patterns
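
The preparation steps listed above can be sketched on a server access log. This is a minimal illustration, not a method taken from the slides: it assumes the log is in Common Log Format, the sample lines are invented, and the cleaning rule (drop failed requests and embedded static files) is only one common convention.

```python
import re

# Hypothetical sample lines in Common Log Format (invented for illustration).
RAW_LOG = """\
10.0.0.1 - - [20/Jun/2005:10:15:00 +0000] "GET /company HTTP/1.1" 200 512
10.0.0.1 - - [20/Jun/2005:10:15:01 +0000] "GET /img/logo.gif HTTP/1.1" 200 2048
10.0.0.2 - - [20/Jun/2005:10:16:30 +0000] "GET /company/products HTTP/1.1" 200 1024
10.0.0.3 - - [20/Jun/2005:10:17:00 +0000] "GET /missing HTTP/1.1" 404 128
10.0.0.1 - - [20/Jun/2005:10:18:12 +0000] "GET /company/new HTTP/1.1" 200 900
"""

LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumed list of suffixes that mark embedded resources rather than page views.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def collect(raw):
    """Data collection: parse each log line into a dict, skipping malformed lines."""
    records = []
    for line in raw.splitlines():
        match = LINE_RE.match(line)
        if match:
            records.append(match.groupdict())
    return records


def clean(records):
    """Data cleaning: keep successful requests and drop static resources."""
    return [
        r for r in records
        if r["status"].startswith("2")
        and not r["path"].lower().endswith(STATIC_SUFFIXES)
    ]


def select(records, host):
    """Data selection: keep only the requests made by one host."""
    return [r for r in records if r["host"] == host]


cleaned = clean(collect(RAW_LOG))
paths_for_host = [r["path"] for r in select(cleaned, "10.0.0.1")]
print(paths_for_host)
```

The ordered page list per host is the kind of input that navigation and sequential pattern analysis would then work on.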
Web-Usage Mining
cont…
• Data Mining Techniques – Navigation Patterns Analysis:
Example:
  • 70% of users who accessed /company/product2 did so by
    starting at /company and proceeding through /company/new,
    /company/products and /company/product1
  • 80% of users who accessed the site started from
    /company/products
  • 65% of users left the site after four or fewer page
    references
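
Statistics of this kind can be computed directly from per-user sessions. The sketch below is illustrative only: the sessions are invented (so the resulting percentages differ from the ones quoted above), and the function names are hypothetical.

```python
# Hypothetical sessions: each is the ordered list of pages one user visited.
SESSIONS = [
    ["/company", "/company/new", "/company/products",
     "/company/product1", "/company/product2"],
    ["/company/products", "/company/product1"],
    ["/company/products", "/company/product2"],
    ["/company/products"],
    ["/company", "/company/new", "/company/products",
     "/company/product1", "/company/product2"],
]


def share_starting_at(sessions, page):
    """Fraction of sessions whose first page reference is `page`."""
    return sum(1 for s in sessions if s[0] == page) / len(sessions)


def share_short_sessions(sessions, max_refs):
    """Fraction of sessions that ended after at most `max_refs` page references."""
    return sum(1 for s in sessions if len(s) <= max_refs) / len(sessions)


def share_reaching_via(sessions, target, path):
    """Among sessions that accessed `target`, the fraction that got there
    by following exactly `path` from the start of the session."""
    reached = [s for s in sessions if target in s]
    if not reached:
        return 0.0
    via = sum(1 for s in reached if s[: s.index(target)] == path)
    return via / len(reached)


route = ["/company", "/company/new", "/company/products", "/company/product1"]
print(share_reaching_via(SESSIONS, "/company/product2", route))
print(share_starting_at(SESSIONS, "/company/products"))
print(share_short_sessions(SESSIONS, 4))
```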
Web-Usage Mining
cont…
• Data Mining Techniques – Sequential Patterns
Example: Supermarket

Customer   Transaction Time    Purchased Items
John       6/21/05 5:30 pm     Beer
John       6/22/05 10:20 pm    Brandy
Frank      6/20/05 10:15 am    Juice, Coke
Frank      6/20/05 11:50 am    Beer
Frank      6/20/05 12:50 pm    Wine, Cider
Mary       6/20/05 2:30 pm     Beer
Mary       6/21/05 6:17 pm     Wine, Cider
Mary       6/22/05 5:05 pm     Brandy
Web-Usage Mining
cont…
• Data Mining Techniques – Sequential Patterns
Example: Supermarket, cont…

Customer Sequences
Customer   Customer Sequence
John       (Beer) (Brandy)
Frank      (Juice, Coke) (Beer) (Wine, Cider)
Mary       (Beer) (Wine, Cider) (Brandy)

Mining Result
Sequential Patterns with Support >= 40%   Supporting Customers
(Beer) (Brandy)                           John, Mary
(Beer) (Wine, Cider)                      Frank, Mary
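
The support of a sequential pattern in the supermarket example can be checked with a short sketch. The customer sequences are the ones from the example; the function names are hypothetical, and the code only tests given candidate patterns rather than enumerating all frequent ones.

```python
# Customer sequences from the supermarket example: one set of items per transaction,
# in time order.
SEQUENCES = {
    "John": [{"Beer"}, {"Brandy"}],
    "Frank": [{"Juice", "Coke"}, {"Beer"}, {"Wine", "Cider"}],
    "Mary": [{"Beer"}, {"Wine", "Cider"}, {"Brandy"}],
}


def contains(sequence, pattern):
    """True if the itemsets of `pattern` occur, in order, within `sequence`."""
    position = 0
    for itemset in pattern:
        # Advance to the next transaction that includes every item of the itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True


def supporters(sequences, pattern):
    """Sorted names of the customers whose sequence contains `pattern`."""
    return sorted(name for name, seq in sequences.items() if contains(seq, pattern))


def support(sequences, pattern):
    """Fraction of customers whose sequence contains `pattern`."""
    return len(supporters(sequences, pattern)) / len(sequences)


for pattern in ([{"Beer"}, {"Brandy"}], [{"Beer"}, {"Wine", "Cider"}]):
    print(pattern, supporters(SEQUENCES, pattern), support(SEQUENCES, pattern))
```

With three customers, a pattern needs at least two supporting customers to reach the 40% threshold.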
• Applications:
  • user and customer behavior modeling
  • Web site optimization
  • e-customer relationship management
  • Web marketing
  • targeted advertising
Web Content Mining
• Extracting useful knowledge from the contents of
  Web documents or other semantic information
  about Web resources
• Content data may consist of text, images, audio,
  video, structured records from lists and tables, or
  item attributes from backend databases.
• Goes beyond keyword extraction, or some simple
  statistics of words and phrases in documents
Web Content Mining
• Pre-processing data before web content mining:
  feature selection
• Post-processing data can reduce ambiguous
  search results
• Web Page Content Mining
  • Mines the contents of documents directly
• Search Engine Mining
  • Improves on the content search of other tools like search
    engines.
Web Content Mining
• Web content mining is related to data mining and
  text mining. It is related to data mining because
  many data mining techniques can be applied in
  Web content mining.
• It is related to text mining because much of the web
  content is text.
• Web data are mainly semi-structured and/or
  unstructured, whereas data mining mainly deals with
  structured data and text mining with unstructured text.
Techniques for Web Content Mining
• Classification
• Clustering
• Association
Web Content Mining
:: example – clustered search results

Can drill down within clusters to view sub-topics or to
view the relevant subset of results

Web Content Mining
:: example – personalized content delivery

Google's personalized news is an example of a content-based
recommender system which recommends items (in part) based on
the similarity of their content to a user’s profile (gathered
from search and click history)
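
The idea of matching item content against a user profile can be sketched with bag-of-words vectors and cosine similarity. This is a generic illustration under stated assumptions, not a description of how Google's system works: the articles and profile text are invented, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: term -> number of occurrences."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(profile_text, articles, top_n=2):
    """Titles of the articles most similar to the profile, best first.
    Articles with no term in common with the profile are never returned."""
    profile = term_vector(profile_text)
    scored = [(cosine(profile, term_vector(body)), title)
              for title, body in articles.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical data: the profile stands in for terms from search and click history.
ARTICLES = {
    "Search ranking update": "search engine ranking algorithm update links",
    "Football final": "football final goal team match",
    "Web crawler tips": "web crawler links pages search index",
}
PROFILE = "search ranking links search engine"

print(recommend(PROFILE, ARTICLES))
```

A real system would typically weight terms (for example with TF-IDF) rather than use raw counts; raw counts keep the sketch short.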

• Applications:
  • document clustering or categorization
  • topic identification / tracking
  • concept discovery
  • focused crawling
  • content-based personalization
  • intelligent search tools
Web-Structure Mining
• Generate a structural summary of the Web site
  and Web page
  • Discovering the Web page structure.
  • Discovering useful patterns from the hyperlink
    structure connecting Web sites or Web resources.
  • Discovering the nature of the hierarchy of hyperlinks in
    the website and its structure.
Web-Structure Mining
cont…
• Finding information about web pages.
  • Retrieving information about the relevance and the
    quality of the web page.
• Inference on hyperlinks and content.
  • A web page contains not only information but also
    hyperlinks, which carry a huge amount of annotation.
  • A hyperlink identifies the author’s endorsement of the
    other web page.
  • Discovering useful patterns from the hyperlink structure
    connecting Web sites or Web resources
• Applications:
  • document retrieval and ranking (e.g., Google)
  • discovery of “hubs” and “authorities”
  • discovery of Web communities
  • social network analysis
Web Structure Mining
:: example – Google’s PageRank algorithm

• Basic idea:
  • Rank of a page depends on the ranks of pages
    pointing to it
  • Out-degree of a page is the number of edges
    pointing away from it – used to compute the
    contribution of the page to those to which it
    points
  • The final PageRank value represents the
    probability that a random surfer will reach the
    page
  • d is the prob. that a random surfer chooses the
    page directly rather than getting there via
    navigation

[Figure: Illustration of PageRank propagation]
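
The basic idea can be sketched as a power iteration. This is a simplified illustration, not Google's implementation. It follows the slide's definition of d as the probability of jumping directly to a page, so each page's rank is d/N plus (1 - d) times the shares passed on by the pages linking to it; many texts use d for the opposite probability. The value d = 0.15, the handling of pages without out-links, and the tiny link graph are all assumptions.

```python
def pagerank(links, d=0.15, iterations=50):
    """Approximate PageRank by repeated propagation.

    links: dict mapping each page to the list of pages it links to.
    d: probability that the random surfer jumps directly to a page
       (assumed value; the slides give none).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the direct-jump share first.
        new_rank = {p: d / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its out-degree.
                share = (1 - d) * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                share = (1 - d) * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page link graph.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Because the ranks always sum to 1, each value can be read as the probability that the random surfer is on that page.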

• In general, there are mainly four kinds of data
  mining techniques applied to the web mining
  domain to discover the user navigation pattern:
  • Association rule mining
  • Sequential pattern
  • Clustering
  • Classification
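
As an illustration of the first technique, association rules between pages can be derived from user sessions by computing support and confidence. The sketch below is a brute-force version limited to rules between two single pages; the sessions, thresholds and function names are invented for the example.

```python
from itertools import permutations

# Hypothetical sessions: the set of pages each user visited.
SESSIONS = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
    {"/home", "/products", "/cart"},
]


def support(sessions, items):
    """Fraction of sessions that contain every page in `items`."""
    items = set(items)
    return sum(1 for s in sessions if items <= s) / len(sessions)


def pair_rules(sessions, min_support, min_confidence):
    """Rules 'a -> b' between single pages that meet both thresholds.

    Returns (a, b, support, confidence) tuples, where confidence is
    support({a, b}) / support({a}).
    """
    pages = sorted(set().union(*sessions))
    rules = []
    for a, b in permutations(pages, 2):
        both = support(sessions, {a, b})
        if both < min_support:
            continue
        confidence = both / support(sessions, {a})
        if confidence >= min_confidence:
            rules.append((a, b, both, confidence))
    return rules


for a, b, sup, conf in pair_rules(SESSIONS, min_support=0.5, min_confidence=0.9):
    print(f"{a} -> {b}  support={sup:.2f}  confidence={conf:.2f}")
```

Algorithms such as Apriori avoid testing every combination by pruning candidates whose subsets are already infrequent; the exhaustive loop here is only practical for a handful of pages.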
Applications of Web Mining
• With the rapid growth of the World Wide Web, Web mining has
  become a very popular topic in Web research. E-commerce and
  e-services are claimed to be the killer applications for Web
  mining, and Web mining now also plays an important role in
  helping e-commerce websites and e-services understand how
  their websites and services are used and provide better
  services for their customers and users.

• A few applications are:
  • E-commerce Customer Behavior Analysis
  • E-commerce Transaction Analysis
  • E-commerce Website Design
  • E-banking
  • M-commerce
  • Web Advertisement
  • Search Engine
  • Online Auction
Thank you
