Web Mining


Overview

 Definition of Web Mining
 Challenges in Web Mining
 Classification of Web Mining
Web Mining
 The Web is the single largest data source in the world
 Due to the heterogeneity and lack of structure of Web data,
mining it is a challenging task
 Multidisciplinary field: data mining, machine learning, natural
language processing, statistics, databases, information
retrieval, multimedia, etc.
What is Web Mining?
Web Mining Definition:
 Web mining is a technique to automatically discover and extract
useful information from the World Wide Web (WWW).
 It is the application of data mining and machine learning techniques
to extract useful knowledge from the content, structure, and
usage of Web resources.
Opportunities and Challenges
 Web offers an unprecedented opportunity and
challenge to data mining
 The amount of information on the Web is huge, and easily
accessible.
 The coverage of Web information is very wide and diverse. One
can find information about almost anything.
 Information/data of almost all types exist on the Web, e.g.,
structured tables, texts, multimedia data, etc.
 Much of the Web information is semi-structured due to the
nested structure of HTML code.
 Much of the Web information is linked. There are hyperlinks
among pages within a site, and across different sites.
 Much of the Web information is redundant. The same piece of
information or its variants may appear in many pages.
Opportunities and Challenges
 The Web is noisy. A Web page typically contains a mixture of
many kinds of information, e.g., main contents,
advertisements, navigation panels, copyright notices, etc.
 The Web is also about services. Many Web sites and pages
enable people to perform operations with input parameters,
i.e., they provide services.
 The Web is dynamic. Information on the Web changes
constantly. Keeping up with the changes and monitoring the
changes are important issues.
 Above all, the Web is a virtual society. It is not only about
data, information and services, but also about interactions
among people, organizations and automatic systems, i.e.,
communities.
Data Mining vs. Web Mining
 Traditional data mining
 data is structured and relational
 well-defined tables, columns, rows, keys, and
constraints.
 Web data
 Semi-structured and unstructured
 readily available data
 rich in features and patterns
Web mining may be divided into three categories:
1. Web usage mining
2. Web content mining
3. Web structure mining
December 2008 ©GKGupta 8


Web-Usage Mining
 What is Usage Mining?
Discovering user ‘navigation patterns’ from Web data.
Predicting user behavior while the user interacts with the Web.
Helping to improve large collections of resources.
Extracting interesting patterns from user interactions with
resources on one or more Web sites.
Web-Usage Mining
cont…
 Usage Mining Techniques
Data Preparation
Data Collection
Data Selection
Data Cleaning
Web usage mining patterns:
Navigation Patterns
Sequential Patterns
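The data preparation steps above (collection, selection, cleaning) can be sketched in Python. This is only a minimal illustration, assuming the server writes Common Log Format lines; the sample lines, the `IGNORED_SUFFIXES` list and the function names are hypothetical and not part of the original slides.

```python
import re
from datetime import datetime

# Assumed input: Common Log Format, e.g.
# host ident user [time] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical cleaning rule: embedded resources are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def parse_line(line):
    """Data collection: turn one raw log line into a record, or None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "path": match.group("path"),
        "status": int(match.group("status")),
    }


def clean(lines):
    """Data selection and cleaning: keep successful requests for pages only."""
    records = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # drop malformed lines
        if record["status"] != 200:
            continue  # drop failed requests
        if record["path"].lower().endswith(IGNORED_SUFFIXES):
            continue  # drop images, stylesheets and scripts
        records.append(record)
    return records


sample_log = [
    '10.0.0.1 - - [21/Jun/2005:17:30:00 +0000] "GET /company HTTP/1.0" 200 1043',
    '10.0.0.1 - - [21/Jun/2005:17:30:01 +0000] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.2 - - [21/Jun/2005:17:31:10 +0000] "GET /missing HTTP/1.0" 404 0',
    'not a log line',
]
cleaned = clean(sample_log)
print([record["path"] for record in cleaned])
```

In practice the cleaned records would next be grouped into user sessions before any pattern mining is attempted.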
Web-Usage Mining
cont…
 Data Mining Techniques – Navigation Patterns
Analysis:
Example:
70% of users who accessed /company/product2 did so by
starting at /company and proceeding through /company/new,
/company/products and /company/product1
80% of users who accessed the site started from
/company/products
65% of users left the site after four or fewer page references
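Statistics of this kind can be computed directly from sessionized page sequences. The sketch below is an assumption-laden illustration: the `sessions` list is invented toy data (so the percentages differ from the slide), and the three helper functions are hypothetical names.

```python
# Toy sessions: each session is the ordered list of pages one user visited.
sessions = [
    ["/company", "/company/new", "/company/products",
     "/company/product1", "/company/product2"],
    ["/company/products", "/company/product1"],
    ["/company/products", "/company/product2"],
    ["/company", "/company/products"],
    ["/company/products", "/company/product1", "/company/product2",
     "/company/new", "/company"],
]


def share_starting_at(sessions, page):
    """Fraction of sessions whose first page is `page`."""
    return sum(1 for s in sessions if s and s[0] == page) / len(sessions)


def share_with_at_most(sessions, max_pages):
    """Fraction of sessions that ended after `max_pages` or fewer page references."""
    return sum(1 for s in sessions if len(s) <= max_pages) / len(sessions)


def share_reaching_via(sessions, target, path):
    """Among sessions that accessed `target`, the fraction that got there
    by visiting exactly the pages in `path`, in that order, beforehand."""
    reached = [s for s in sessions if target in s]
    if not reached:
        return 0.0
    via = sum(1 for s in reached if s[:s.index(target)] == path)
    return via / len(reached)


print(share_starting_at(sessions, "/company/products"))
print(share_with_at_most(sessions, 4))
print(share_reaching_via(
    sessions,
    "/company/product2",
    ["/company", "/company/new", "/company/products", "/company/product1"],
))
```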
Web-Usage Mining
cont…
 Data Mining Techniques – Sequential Patterns
Example: Supermarket

Customer   Transaction Time    Purchased Items
John       6/21/05 5:30 pm     Beer
John       6/22/05 10:20 pm    Brandy
Frank      6/20/05 10:15 am    Juice, Coke
Frank      6/20/05 11:50 am    Beer
Frank      6/20/05 12:50 pm    Wine, Cider
Mary       6/20/05 2:30 pm     Beer
Mary       6/21/05 6:17 pm     Wine, Cider
Mary       6/22/05 5:05 pm     Brandy
Web-Usage Mining
cont…
 Data Mining Techniques – Sequential Patterns
Example: Supermarket, cont…

Customer   Customer Sequence
John       (Beer) (Brandy)
Frank      (Juice, Coke) (Beer) (Wine, Cider)
Mary       (Beer) (Wine, Cider) (Brandy)

Mining Result

Sequential Patterns with Support >= 40%   Supporting Customers
(Beer) (Brandy)                           John, Mary
(Beer) (Wine, Cider)                      Frank, Mary
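The support counting behind the supermarket example can be sketched as follows. This is a deliberately simplified illustration, not a full sequential pattern mining algorithm: it only tests two-step patterns built from whole transactions, and the function names are hypothetical.

```python
from itertools import product

# Customer sequences from the supermarket example; each element is one
# transaction (an itemset), listed in time order.
sequences = {
    "John": [{"Beer"}, {"Brandy"}],
    "Frank": [{"Juice", "Coke"}, {"Beer"}, {"Wine", "Cider"}],
    "Mary": [{"Beer"}, {"Wine", "Cider"}, {"Brandy"}],
}


def contains(sequence, pattern):
    """True if the pattern's itemsets occur in order, each one inside a
    distinct, later transaction of the customer sequence."""
    position = 0
    for itemset in pattern:
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True


def supporters(pattern):
    """Names of the customers whose sequence contains the pattern."""
    return sorted(name for name, seq in sequences.items() if contains(seq, pattern))


def frequent_pairs(min_support):
    """Two-step patterns whose share of supporting customers meets min_support.
    Simplification: candidates are built only from whole transactions."""
    itemsets = {frozenset(t) for seq in sequences.values() for t in seq}
    result = {}
    for first, second in product(itemsets, repeat=2):
        names = supporters([first, second])
        if len(names) / len(sequences) >= min_support:
            result[(first, second)] = names
    return result


for (first, second), names in frequent_pairs(0.4).items():
    print(sorted(first), "->", sorted(second), "supported by", names)
```

With three customers, a 40% threshold means a pattern needs at least two supporting customers.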
 Applications:
• user and customer behavior modeling
• Web site optimization
• e-customer relationship management
• Web marketing
• targeted advertising
Web Content Mining
 Extracting useful knowledge from the contents of
Web documents or other semantic information
about Web resources
 Content data may consist of text, images, audio,
video, structured records from lists and tables, or
item attributes from backend databases.
 Goes beyond keyword extraction or simple
statistics of words and phrases in documents
Web Content Mining
 Pre-processing of data before Web content mining:
feature selection
 Post-processing of data can reduce ambiguous
search results
 Web Page Content Mining
 Mines the contents of documents directly
 Search Engine Mining
 Improves on the content search of other tools such as search
engines.
Web Content Mining
 Web content mining is related to data mining and
text mining. It is related to data mining because
many data mining techniques can be applied in
Web content mining.
 It is related to text mining because much of the
Web's content is text.
 Web data are mainly semi-structured and/or
unstructured, whereas data mining mainly deals with
structured data and text mining with unstructured text.
Techniques for Web Content Mining
 Classification
 Clustering
 Association
Web Content Mining
:: example – clustered search results

Users can drill down within clusters to view sub-topics or to view the
relevant subset of results.
Web Content Mining
:: example – personalized content delivery

Google's personalized news is an example of a content-based
recommender system, which recommends items (in part) based on the
similarity of their content to a user’s profile (gathered from search
and click history).
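The idea of matching item content against a user profile can be illustrated with a small sketch. It assumes a plain bag-of-words representation and cosine similarity, which is one common choice and not necessarily what any real news service uses; the history, the articles and the function names are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: word -> number of occurrences."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(history, candidates, top_n=2):
    """Rank candidate items by similarity of their text to the user's profile,
    where the profile is the summed word counts of the user's history."""
    profile = Counter()
    for text in history:
        profile.update(vectorize(text))
    scored = [(cosine(profile, vectorize(text)), title)
              for title, text in candidates.items()]
    scored.sort(reverse=True)
    return [title for score, title in scored[:top_n] if score > 0]


# Hypothetical search and click history, and hypothetical news items.
history = ["web mining search engine ranking", "pagerank link analysis search"]
articles = {
    "Search ranking update": "new search engine ranking algorithm uses link analysis",
    "Football results": "local team wins football match",
    "Mining logs": "web usage mining of server logs",
}
print(recommend(history, articles))
```

A production system would typically weight terms (for example with TF-IDF) and combine content similarity with other signals; this sketch keeps only the content-similarity step.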
 Applications:
• document clustering or
categorization
• topic identification / tracking
• concept discovery
• focused crawling
• content-based personalization
• intelligent search tools
Web-Structure Mining
 Generates a structural summary of the Web site
and its Web pages
• Discovering the Web page structure.
• Discovering useful patterns from the hyperlink
structure connecting Web sites or Web resources.
• Discovering the nature of the hierarchy of hyperlinks in
the Web site and its structure.
Web-Structure Mining
cont…
 Finding information about Web pages.
Retrieving information about the relevance and the
quality of a Web page.
 Inference on hyperlinks and content.
A Web page contains not only information but also
hyperlinks, which carry a large amount of annotation.
A hyperlink indicates the author’s endorsement of the page
it points to.
Discovering useful patterns from the hyperlink structure
connecting Web sites or Web resources.
 Applications:
• document retrieval and ranking (e.g., Google)
• discovery of “hubs” and “authorities”
• discovery of Web communities
• social network analysis
Web Structure Mining
:: example – Google’s PageRank algorithm

 Basic idea:
 The rank of a page depends on the ranks of the pages
pointing to it
 The out-degree of a page is the number of edges
pointing away from it; it is used to compute the
contribution of the page to the pages it points to
 The final PageRank value represents the
probability that a random surfer will reach the
page
 d is the probability that a random surfer chooses the
page directly rather than getting there via
navigation
[Figure: Illustration of PageRank propagation]
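The basic idea can be sketched as an iterative computation. The slides do not give a formula, so the update rule used here is an assumption: PR(p) = d/N + (1 - d) * sum of PR(q)/OutDegree(q) over the pages q that link to p, with d read as the probability of jumping directly to a page, as defined above. Many other presentations use d for the complementary probability of following a link. The tiny `links` graph, the default value of d and the handling of pages without outgoing links are also illustrative choices.

```python
def pagerank(links, d=0.15, iterations=50):
    """Iteratively compute PageRank for a graph given as page -> list of
    pages it links to. d is the probability of a direct jump (assumption)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the direct-jump share d / n.
        new_rank = {page: d / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its out-links.
                share = (1 - d) * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no out-links: spread its rank over all pages.
                share = (1 - d) * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 3))
```

Because each page's rank is divided by its out-degree before being passed on, the values always sum to 1 and can be read as probabilities.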
 In general, four main kinds of data mining techniques
are applied in the Web mining domain
to discover user navigation patterns:
 Association rule mining
 Sequential pattern mining
 Clustering
 Classification
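As an illustration of the first technique in this list, the sketch below derives simple association rules of the form "visitors of page A also visit page B" from sessions treated as sets of pages. The sessions, thresholds and function name are hypothetical, and the exhaustive pairwise search stands in for a real algorithm such as Apriori.

```python
from itertools import permutations

# Hypothetical sessions, each reduced to the set of pages visited.
page_sets = [
    {"/company", "/company/products", "/company/product1"},
    {"/company", "/company/products"},
    {"/company", "/company/new"},
    {"/company/products", "/company/product1"},
]


def association_rules(page_sets, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for every
    page pair that meets both thresholds."""
    total = len(page_sets)
    pages = sorted(set().union(*page_sets))
    rules = []
    for a, b in permutations(pages, 2):
        with_a = sum(1 for s in page_sets if a in s)
        with_both = sum(1 for s in page_sets if a in s and b in s)
        if with_a == 0:
            continue
        support = with_both / total        # share of sessions containing A and B
        confidence = with_both / with_a    # share of A's sessions that also have B
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


for a, b, support, confidence in association_rules(page_sets):
    print(f"{a} => {b} (support {support:.2f}, confidence {confidence:.2f})")
```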
Applications of Web Mining
 With the rapid growth of the World Wide Web, Web mining has become a
very popular topic in Web research. E-commerce and e-services
are claimed to be the killer applications for Web mining,
and Web mining now also plays an important role in helping
e-commerce websites and e-services understand how their
websites and services are used and provide better services for
their customers and users.

 A few applications are:
 E-commerce Customer Behavior Analysis
 E-commerce Transaction Analysis
 E-commerce Website Design
 E-banking
 M-commerce
 Web Advertisement
 Search Engine
 Online Auction
Thank you
