
Web Mining For BI - Part 2

Topic 4

Web Mining for Business Intelligence


Part 2: Web Mining
ITS 583 – BUSINESS INTELLIGENCE
Norizan Mohamad, UiTM Terengganu
Learning Objectives
 Describe Web mining, its objectives, and its benefits
 Understand the three different branches of Web mining
 Web content mining
 Web structure mining
 Web usage mining
 Understand the applications of these three mining paradigms
 Differentiate between text mining, Web mining and data mining
WHAT is Web Mining?
 the use of data mining techniques to automatically discover and extract information from Web documents and services.
WHY Web Mining?
 The Web holds data, information and answers
 Offers solutions to companies
 Assists in decision making
 Uses data mining techniques
 to make the Web more useful and more profitable
 to increase the efficiency of our interaction with the Web
Web Mining
 the process of discovering intrinsic
relationships from Web data
 Textual content
◼ Web content, for the data found on Web pages and
inside of documents
 Linkage structure
◼ Web graph, from links between pages, people and
other data
 Usage
◼ Web activity, from server logs and Web browser
activity tracking
Web Mining
 the process of discovering intrinsic relationships from Web data (textual content, linkage structure, usage)
 Web Content Mining
◼ Source: unstructured textual content of the Web pages (usually in HTML format)
 Web Structure Mining
◼ Source: the uniform resource locator (URL) links contained in the Web pages
 Web Usage Mining
◼ Source: the detailed description of a Web site’s visits (sequence of clicks by sessions)
Web Mining Taxonomy
Web Mining: aims
 Web content mining: discovering useful information or knowledge from web page contents
 Web structure mining: discovering and modeling the hyperlink structure of the web pages
 Web usage mining: understanding user access patterns from Web usage logs
Web Mining techniques
 Data mining techniques
 Employ clustering analysis, web link analysis, pattern analysis, association analysis and correlation analysis
Web Content Mining
 Mining of the textual content on the Web
 Convert raw data on Web pages into a database
 Data collection via the Web crawlers of search engines
 Search engines that run crawlers, such as
◼ Google
◼ Yahoo
 The crawler
◼ digs through individual web pages,
◼ pulls out keywords and then
◼ adds the pages to the search engine's database
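
The crawler behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary is a hypothetical in-memory stand-in for the Web (a real crawler would fetch pages over HTTP), and the "search engine database" is reduced to a simple keyword-to-pages dictionary.

```python
from collections import deque
from html.parser import HTMLParser
import re

# Hypothetical in-memory "Web": URL -> HTML (assumption, no network access).
PAGES = {
    "/home": '<a href="/books">Books</a> <p>Welcome to our online store</p>',
    "/books": '<a href="/home">Home</a> <p>Data mining books on sale</p>',
}


class LinkAndTextParser(HTMLParser):
    """Collects hyperlinks and visible text from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(start):
    """Breadth-first crawl that builds a keyword -> set-of-pages index."""
    index = {}
    seen = {start}
    frontier = deque([start])
    while frontier:
        url = frontier.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # unknown page, nothing to dig through
        parser = LinkAndTextParser()
        parser.feed(html)
        # Pull out keywords and add the page to the index.
        for word in re.findall(r"[a-z]+", " ".join(parser.text).lower()):
            index.setdefault(word, set()).add(url)
        # Queue newly discovered links.
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return index


index = crawl("/home")
print(sorted(index["books"]))
```

Looking up a keyword in `index` then returns the pages that contain it, which is the basic idea behind a search engine lookup.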
Steps in Web Content Mining
Collect
• Fetch the content from the web
Parse
• Extract usable data from formatted data (HTML, PDF, ...)
Analyze
• Tokenize, rate, classify, cluster, filter, sort
Produce
• Turn the results of analysis into a report or search index
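
The four steps above can be sketched as four small Python functions chained together. The page content in `RAW_PAGES`, the stop-word list and the function names are all hypothetical, chosen only to make the sketch self-contained. The "analysis" here is limited to tokenizing and counting terms.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Hypothetical stored page (assumption): stands in for content fetched from the Web.
RAW_PAGES = {
    "page1.html": "<html><body><h1>Web mining</h1>"
                  "<p>Mining the web for data.</p></body></html>",
}
STOPWORDS = {"the", "for", "a", "of"}  # illustrative stop-word list


class TextExtractor(HTMLParser):
    """Keeps only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect(name):
    """Collect: fetch the raw content (here, from the in-memory store)."""
    return RAW_PAGES[name]


def parse(html):
    """Parse: extract usable text from the formatted data."""
    extractor = TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.chunks)


def analyze(text):
    """Analyze: tokenize, filter stop words, and count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def produce(counts, top_n=3):
    """Produce: turn the analysis into a short report."""
    return [f"{term}: {n}" for term, n in counts.most_common(top_n)]


report = produce(analyze(parse(collect("page1.html"))))
print(report)
```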


Web Usage Mining
 Also known as Web Log Mining
 Extraction of information from data generated
through Web page visits and transactions…
 data stored in server access logs, referrer logs, agent
logs, and client-side cookies
 user characteristics and usage profiles
 metadata, such as page attributes, content attributes, and usage data
 Clickstream data
 Clickstream analysis
Clickstream Data & Analysis
 Clickstream data
 Data that provide a trail of the user’s activities and
show the user’s browsing patterns
◼ e.g. which sites are visited, which pages are viewed, and for how long
 Clickstream analysis
 The analysis of data that occurs in the Web environment
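
As a rough illustration, clickstream data can be rebuilt from server access logs. The sketch below assumes the logs are in Common Log Format and identifies a visitor by host address only. The sample log lines are made up for the example. The time spent on a page is estimated as the gap until the visitor's next request, so the last page of a trail gets no estimate.

```python
import re
from collections import defaultdict
from datetime import datetime

# Made-up sample entries in Common Log Format (assumption).
LOG_LINES = [
    '10.0.0.1 - - [10/Oct/2023:13:55:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:55:10 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:13:55:30 +0000] "GET /books HTTP/1.1" 200 2048',
    '10.0.0.1 - - [10/Oct/2023:13:57:00 +0000] "GET /cart HTTP/1.1" 200 128',
]

PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$'
)


def clickstreams(lines):
    """Group requests by host into time-ordered trails of (time, page)."""
    trails = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if not match:
            continue  # skip malformed entries
        host, stamp, _method, path, _status = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        trails[host].append((when, path))
    for trail in trails.values():
        trail.sort()
    return trails


def dwell_times(trail):
    """Estimate seconds spent on each page from the gap to the next request."""
    return [
        (path, (next_when - when).total_seconds())
        for (when, path), (next_when, _next_path) in zip(trail, trail[1:])
    ]


trails = clickstreams(LOG_LINES)
print([page for _when, page in trails["10.0.0.1"]])
print(dwell_times(trails["10.0.0.1"]))
```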
Web Usage Mining: clickstream analysis
 Input: Weblogs generated as the User / Customer interacts with the Website
 Pre-Process Data
 Collecting
 Merging
 Cleaning
 Structuring
◼ Identify users
◼ Identify sessions
◼ Identify page views
◼ Identify visits
 Extract Knowledge
 Usage patterns
 User profiles
 Page profiles
 Visit profiles
 Customer value
 Use of the extracted knowledge
 How to better the data
 How to improve the Web site
 How to increase the customer value
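
The "Identify users" and "Identify sessions" steps can be sketched as follows. Two assumptions are made that the slides do not state: a user is approximated by the pair (IP address, user agent), and a new session starts after 30 minutes of inactivity. Both are common heuristics, not fixed rules.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; a common heuristic, not a standard.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events):
    """Split page views into sessions per user.

    events: iterable of (user_key, timestamp, page).
    Returns a dict mapping user_key -> list of sessions (lists of pages).
    """
    sessions = {}
    last_seen = {}
    for user, when, page in sorted(events, key=lambda e: (e[0], e[1])):
        # Start a new session on first sight or after a long gap.
        if user not in last_seen or when - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(page)
        last_seen[user] = when
    return sessions


# Hypothetical cleaned log records: user key is (IP address, user agent).
u1 = ("10.0.0.1", "Firefox")
u2 = ("10.0.0.2", "Chrome")
events = [
    (u1, datetime(2023, 10, 10, 9, 0), "/home"),
    (u2, datetime(2023, 10, 10, 9, 1), "/home"),
    (u1, datetime(2023, 10, 10, 9, 5), "/books"),
    (u1, datetime(2023, 10, 10, 10, 0), "/home"),
]
sessions = sessionize(events)
print(sessions[u1])
```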
Web Usage Mining
 Web usage mining applications
 Determine the lifetime value of clients
 Design cross-marketing strategies across products
 Evaluate promotional campaigns
 Target electronic ads and coupons at user groups based on user access patterns
 Predict user behavior based on previously learned
rules and users' profiles
 Present dynamic information to users based on their
interests and profiles…
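
Several of these applications rest on rules learned from past sessions. One possible way to obtain such rules is association analysis. The sketch below computes support and confidence for page pairs only, on made-up session data. The 0.5 support threshold is an arbitrary assumption.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5):
    """Return rules (antecedent, consequent, support, confidence) for page pairs."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = set(session)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of sessions containing both pages
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Made-up sessions: the set of pages viewed in each visit.
sessions = [
    {"/books", "/cart"},
    {"/books", "/cart", "/music"},
    {"/books"},
    {"/music"},
]
rules = pair_rules(sessions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With this data, every session that contains `/cart` also contains `/books`, so the rule `/cart -> /books` has confidence 1.0, while the reverse rule is weaker.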
Web Usage Mining .. cont.
 Personalization: tracking of previously accessed pages.
 Determining frequent access behavior for users.
 Improve the structure of a site’s Web pages.
 Aid in caching and prediction of future page references.
 Improve the design of individual pages.
 Improve the effectiveness of e-commerce (sales and advertising).
 Gathering statistics on accessed pages, which may or may not be viewed as part of Web mining.
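
Prediction of future page references, for caching or prefetching, can be illustrated with a first-order Markov model: count which page most often follows the current one. This is one simple technique chosen for the sketch, not a method named in the slides, and the sessions are made up.

```python
from collections import Counter, defaultdict


def train(sessions):
    """Count page-to-next-page transitions over all sessions."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent successor of `page`, or None if unseen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]


# Made-up sessions: ordered pages viewed in each visit.
sessions = [
    ["/home", "/books", "/cart"],
    ["/home", "/books", "/reviews"],
    ["/home", "/books", "/cart"],
]
model = train(sessions)
print(predict_next(model, "/books"))
```

A cache could use such a prediction to load the likely next page in advance.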
Web Usage Mining Issues
 Identification of exact user not possible.
 Exact sequence of pages referenced by a user not
possible due to caching.
 Session not well defined.
 Security, privacy, and legal issues.
Web Structure Mining
 the process of discovering structure information from the Web.
 The structure of a typical Web graph consists of
◼ Web pages as nodes, and
◼ hyperlinks as edges connecting two related pages.
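
A Web graph of this kind can be represented as an adjacency list. The sketch below uses a hypothetical four-page graph and computes each page's out-degree (links it contains) and in-degree (links pointing to it).

```python
# Hypothetical Web graph: page -> pages it links to (assumption).
WEB_GRAPH = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def out_degree(graph):
    """Number of hyperlinks leaving each page."""
    return {page: len(links) for page, links in graph.items()}


def in_degree(graph):
    """Number of hyperlinks pointing at each page."""
    degree = {page: 0 for page in graph}
    for links in graph.values():
        for target in links:
            degree[target] = degree.get(target, 0) + 1
    return degree


print(out_degree(WEB_GRAPH))
print(in_degree(WEB_GRAPH))
```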
Web Structure Mining
 Structure of the Web
 Web graph
 The shape of the Web graph is more accurately represented by a daisy-looking graph.
 The Bow-Tie shape
 The shape of the Chinese Web graph
The Use of Web Structure
 A useful source for extracting information:
 Quality of Web Page
◼ The authority of a page on a topic
◼ Ranking of web pages
 Interesting Web Structures
◼ Graph patterns like Co-citation, Social choice, Complete bipartite graphs, etc.
 Web Page Classification
◼ Classifying web pages according to various topics
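
Ranking pages by the link structure can be illustrated with PageRank, a well-known link-analysis algorithm. The slides do not name a specific method, so PageRank is an assumption of this sketch. The graph is hypothetical, the damping factor 0.85 is the commonly used value, and every link target must also appear as a key of the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict: page -> list of linked pages."""
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[page] for page in pages if not graph[page])
        new_rank = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        # Each page shares its rank equally among the pages it links to.
        for page in pages:
            for target in graph[page]:
                new_rank[target] += damping * rank[page] / len(graph[page])
        rank = new_rank
    return rank


# Hypothetical Web graph (assumption): page -> pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C receives links from three pages and ends up with the highest score, while D, which nobody links to, ends up with the lowest.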


The Use of Web Structure
 Which pages to crawl
◼ Deciding which web pages to add to the collection
of web pages
 Finding Related Pages
◼ Given one relevant page, find all related pages
 Detection of duplicated pages
◼ Detection of near-mirror sites to eliminate duplication
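
Detecting duplicated or near-mirror pages is often done by comparing page content rather than links. The sketch below uses word shingles and Jaccard similarity, which is one common approach and an assumption here, since the slides do not say how the detection is done. The shingle size, the 0.5 threshold and the sample texts are illustrative only.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(text_a, text_b, threshold=0.5):
    """True if the two texts share at least `threshold` of their shingles."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


# Made-up page texts for illustration.
page_a = "web mining discovers useful patterns from web data"
page_b = "web mining discovers useful patterns from log data"
page_c = "completely different text about cooking pasta"

print(jaccard(shingles(page_a), shingles(page_b)))
print(near_duplicate(page_a, page_c))
```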
Web Mining Categories
Web Mining Success Stories
 Amazon.com, Ask.com, Scholastic.com, …
 Website Optimization Ecosystem
 Customer Interaction on the Web
 Analysis of Interactions
◼ Web Analytics
◼ Voice of Customer
◼ Customer Experience Management
 Knowledge about the Holistic View of the Customer
Example: Amazon.com
 All books on software at Amazon.com
(Database)
Example: Amazon.com
 Book cover images on Amazon.com
(File Base)
Web Mining Tools
Product Name                          URL
Angoss Knowledge WebMiner             angoss.com
ClickTracks                           clicktracks.com
LiveStats from DeepMetrix             deepmetrix.com
Megaputer WebAnalyst                  megaputer.com
MicroStrategy Web Traffic Analysis    microstrategy.com
SAS Web Analytics                     sas.com
SPSS Web Mining for Clementine        spss.com
WebTrends                             webtrends.com
XML Miner                             scientio.com
Web Mining vs Text Mining vs Data Mining

 Web mining
 use of data mining techniques to automatically
discover and extract information from Web
documents and services
 Text mining
 Deals with textual data
 Data mining
[Figure: Text Mining, Web Mining, Data Mining]
Web Mining Process

 END OF PART 2
