Web Mining

In detail about web mining and its uses, from NIT Raipur. This will be really helpful.

Uploaded by

t54dpsktvt

Mining the World-Wide Web
Mining the World-Wide Web

• The WWW is a huge, widely distributed, global information service centre for:
  – Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
  – Hyperlink information
  – Access and usage information
• The WWW provides rich sources for data mining
• Challenges
  – Too huge for effective data warehousing and data mining
  – Too complex and heterogeneous: no standards and no uniform structure
Mining the World-Wide Web

• Design of a Web log miner
  – The Web log is filtered to generate a relational database
  – A data cube is generated from the database
  – OLAP is used to drill down and roll up in the cube
  – OLAM (online analytical mining) is used for mining interesting knowledge

[Figure: Web log → (1) Data cleaning → Database → (2) Data cube creation → Data cube → (3) OLAP → Sliced and diced cube → (4) Data mining → Knowledge]
Mining the World-Wide Web

• Growing and changing very rapidly

[Figure: Internet growth, number of hosts (up to 40,000,000) from Sep-69 to Sep-99]

• Broad diversity of user communities
• Only a small portion of the information on the Web is truly relevant or useful
  – 99% of the Web information is useless to 99% of Web users
  – How can we find high-quality Web pages on a specified topic?
Web search engines

• Index-based: search the Web, index Web pages, and build and store huge keyword-based indices
• Help locate sets of Web pages containing certain keywords
• Deficiencies
  – A topic of any breadth may easily contain hundreds of thousands of documents
  – Many documents that are highly relevant to a topic may not contain the keywords defining it
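A keyword-based index of this kind is usually an inverted index that maps each keyword to the pages containing it. The sketch below is a toy illustration under assumed inputs (the three pages and their ids are made up); it answers Boolean AND queries and also shows the second deficiency: page `p3` is about knowledge discovery from the Web, yet it is not returned for the query "web mining" because it never uses those words.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: map each keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every query keyword (Boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

docs = {
    "p1": "Data mining discovers patterns in large databases",
    "p2": "Web mining applies data mining to the Web",
    "p3": "Knowledge discovery from hyperlinks and usage logs",
}
index = build_index(docs)
print(search(index, "data mining"))
print(search(index, "web mining"))
```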
Web Mining: A more challenging task

• Searches for
  – Web access patterns
  – Web structures
  – Regularity and dynamics of Web contents
• Problems
  – The “abundance” problem
  – Limited coverage of the Web: hidden Web sources, with the majority of data held in DBMSs
  – Limited query interface based on keyword-oriented search
  – Limited customization to individual users
What is Web Mining?

Discovering useful information from the World-Wide Web and its usage patterns.
Benefits of Web Mining
• Finding relevant information
• Discovering new knowledge from the Web
• Personalized Web page synthesis
• Learning about individual users
Cont…

• We use either a browser or a search service to find specific information on the Web.
• When we specify a keyword query, a Web search engine responds with a list of pages ordered by rank.
Cont…
• There are three major approaches to accessing information stored on the Web:
  – Keyword-based search
  – Topic-directory browsing with search engines such as Google or Yahoo
  – Querying deep Web sources, where information (such as that on amazon.com and realtor.com) sits behind searchable database query forms
Introduction
• Web mining provides a set of techniques that can be used to solve customer-related problems such as:
  – Understanding what customers do and want
  – Mass-customizing information for the intended customers, or personalizing it for individual users
  – Problems related to effective Web site design and management
  – Problems related to marketing, etc.
Other related techniques of Web mining
• Information retrieval (IR)
• Natural language processing (NLP)
• In data mining terms, there are three operations of interest:
1. Clustering (e.g., finding natural groupings of users, pages, etc.)
2. Associations (e.g., which URLs tend to be requested together)
3. Sequential analysis (e.g., the order in which URLs tend to be accessed)
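Operations 2 and 3 can be illustrated on a handful of sessions. This is a minimal sketch that assumes sessions have already been reconstructed from the logs; the sessions and URLs are hypothetical. Association strength is measured here as support (the fraction of sessions that contain both URLs) and sequential behavior as counts of consecutive URL pairs; full-scale systems typically use dedicated algorithms such as Apriori or sequential pattern mining instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical user sessions: the ordered list of URLs each visitor requested.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/checkout"],
    ["/home", "/about"],
    ["/products", "/cart", "/checkout"],
]

def pair_support(sessions):
    """Associations: fraction of sessions in which each unordered URL pair co-occurs."""
    counts = Counter()
    for s in sessions:
        for pair in combinations(sorted(set(s)), 2):
            counts[pair] += 1
    return {pair: c / len(sessions) for pair, c in counts.items()}

def transitions(sessions):
    """Sequential analysis: how often URL a is immediately followed by URL b."""
    counts = Counter()
    for s in sessions:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
    return counts

support = pair_support(sessions)
print(support[("/home", "/products")])  # co-requested in half of the sessions
print(transitions(sessions).most_common(2))
```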
Web Mining Tasks
Web Mining
• Web Content Mining
  – Web Page Content Mining
  – Search Result Mining
• Web Structure Mining
• Web Usage Mining
  – General Access Pattern Tracking
  – Customized Usage Tracking
Web Mining Category

• Web mining techniques are commonly categorized into three areas of interest:
1. Web Content Mining
2. Web Structure Mining
3. Web Usage Mining
Cont…
• Web Content Mining: application of data mining techniques to unstructured or semi-structured text, typically HTML documents.
• Web Structure Mining: use of the hyperlink structure of the Web as an additional information source.
• Web Usage Mining: analysis of user interaction with a Web server.
Usage patterns:
  – Number of visitors
  – Popularity, e.g., of products, movies, music
WEB CONTENT MINING
Web Content Mining
• Web content mining deals with several types of data, such as:
  – Textual
  – Image
  – Audio
  – Video
  – Metadata, as well as
  – Hyperlinks
Cont…
• Recent research on mining multiple types of data is termed multimedia data mining.
• The textual parts of Web content data consist of unstructured data such as free text, semi-structured data such as HTML documents, and more structured data such as data in tables or database-generated HTML.
• Most Web content data is unstructured, free-text data.
Cont…
• As a result, text mining techniques can be employed directly for Web content mining in such cases.
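Because most Web content is free text wrapped in HTML, a typical first step is to strip the markup and then apply ordinary text mining. The sketch below uses Python's standard `html.parser` module to extract visible text and weights terms with TF-IDF (term frequency times the log of the inverse document frequency), one common text mining weighting chosen here for illustration; the two pages and the helper names are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from an HTML document, skipping <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_terms(html):
    """Strip the markup and return the lowercase words of the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.parts).lower())

def tf_idf(docs):
    """Weight each term by its count in a page times log(N / number of pages containing it)."""
    term_lists = [html_to_terms(d) for d in docs]
    df = Counter(t for terms in term_lists for t in set(terms))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(terms).items()}
            for terms in term_lists]

pages = [
    "<html><body><h1>Web mining</h1><p>Mining web logs</p></body></html>",
    "<html><head><style>p {color: red}</style></head><body><p>Text mining of news</p></body></html>",
]
weights = tf_idf(pages)
print(sorted(weights[0].items(), key=lambda kv: -kv[1]))
```

A term that appears in every page, such as "mining" here, gets weight zero, while terms specific to one page, such as "web" in the first, get the highest weight.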
Web Content Mining
• It refers to the discovery of useful information from Web content.
Information
• The Web contains many kinds of data, for example:
• Government information, which has gradually been placed on the Web in recent years.
• Many commercial institutions are transforming their business and services to operate electronically.
Cont…
• Digital libraries that are also accessible from the Web.
• Another type of Web content cannot be ignored: Web applications, which users access through Web interfaces.
  – Many applications are being migrated to the Web, and
  – Many types of applications are emerging in the Web environment itself.
Web Mining
• Content: text & multimedia mining
• Structure: link analysis, graph mining
• Usage: log analysis, query mining
• Relate all of the above
  – Web characterization
  – Particular applications
WEB STRUCTURE MINING
Web Structure Mining
• Web structure mining is concerned with discovering the model underlying the link structure of the Web.
• It is used to study the topology of the hyperlinks, with or without the description of the links.
• This model can be used to categorize Web pages.
• It is useful for generating information such as the similarity and relationships between different Web sites.
Cont…
• Web structure mining is also used to discover authority sites for a subject and overview (hub) sites that point to many authorities.
• Web content mining attempts to explore the structure within a document (intra-document structure).
• Web structure mining, by contrast, studies the structure of documents within the Web itself (inter-document structure).
Cont…
• Some algorithms that model Web topology are:
1. HITS
2. PageRank
3. CLEVER
• These models are applied to calculate the rank (quality) or relevance of each Web page.
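HITS, the first algorithm listed, gives each page two scores: an authority score (the page is pointed to by good hubs) and a hub score (the page points to good authorities). The following is a minimal power-iteration sketch over a small hypothetical link graph given as a dictionary. It omits the query-dependent step in which HITS first builds its base set of pages from search results, and CLEVER is not shown.

```python
def hits(links, iterations=50):
    """Compute hub and authority scores; `links` maps each page to the pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize each score vector to unit Euclidean length so values stay bounded.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical graph: h1 and h2 point to both a1 and a2; h3 points only to a1.
links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(links)
print(sorted(auth.items(), key=lambda kv: -kv[1])[:2])
```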
Techniques used in modeling topology

• PageRank
  – In PageRank, the importance of a document is measured by counting citations, or backlinks, to it, with each backlink weighted by the importance of the citing page.
  – This gives an approximation of a document’s importance or quality.
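The backlink-counting idea behind PageRank can be written as a power iteration in which each page passes its current rank, split evenly over its out-links, to the pages it cites. This sketch uses the commonly quoted damping factor of 0.85 and spreads the rank of pages without out-links evenly over all pages; both choices, the example graph and the function name are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of out-linked pages."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Each backlink passes on a share of the citing page's own rank.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page (no out-links): spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical graph: C is cited by A, B and D; nobody cites D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```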
Cont…
• Social network analysis
  – It is another way of studying the Web link structure.
  – Web structure mining uses the hyperlink structure of the Web to apply social network analysis.
  – Social network analysis studies ways to measure the relative standing or importance of individuals in the network.
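One of the simplest social network measures of relative standing is degree centrality. Applied to a directed link graph, in-degree centrality is the fraction of the other nodes that link to a given node. A minimal sketch, with a hypothetical graph of four people and an assumed function name:

```python
def degree_centrality(links):
    """In-degree centrality: fraction of the other nodes that link to each node."""
    nodes = set(links) | {q for targets in links.values() for q in targets}
    indegree = {p: 0 for p in nodes}
    for source, targets in links.items():
        # Count each distinct link once and ignore self-links.
        for q in set(targets):
            if q != source:
                indegree[q] += 1
    n = len(nodes)
    return {p: indegree[p] / (n - 1) if n > 1 else 0.0 for p in nodes}

# Hypothetical network: everyone else links to carol.
links = {"alice": ["bob", "carol"], "bob": ["carol"], "dave": ["carol"]}
c = degree_centrality(links)
print(sorted(c.items(), key=lambda kv: -kv[1]))
```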
WEB USAGE MINING
Web usage mining
• Web usage mining deals with studying the data generated by Web surfers’ sessions or behavior.
• Web content and structure mining use the real, or primary, data on the Web, whereas Web usage mining mines the secondary data derived from users’ interactions with the Web.
Cont…
• The secondary data includes data from:
  – Web server access logs
  – Proxy server logs
  – Browser logs
  – User profiles
  – Registration data
  – User sessions or transactions
  – Cookies
Cont…
  – User queries
  – Bookmark data
  – Mouse clicks and scrolls, and
  – Any other data resulting from these interactions.
Cont…
• This data can be accumulated by the Web server.
• Analysis of the Web access logs of different Web sites can facilitate an understanding of user behavior and the Web structure.
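Before usage patterns can be mined, raw server log entries are usually grouped into sessions. A common heuristic, assumed here rather than taken from these notes, identifies a visitor by IP address and starts a new session when the gap between two consecutive requests exceeds 30 minutes. The records and the function name below are hypothetical, and the records are assumed to be already parsed.

```python
from datetime import datetime, timedelta

# Hypothetical parsed log records: (visitor id, timestamp, requested URL).
records = [
    ("10.0.0.1", "2000-10-10 13:55", "/index.html"),
    ("10.0.0.1", "2000-10-10 14:05", "/products.html"),
    ("10.0.0.2", "2000-10-10 14:06", "/index.html"),
    ("10.0.0.1", "2000-10-10 15:30", "/index.html"),
]

def sessionize(records, timeout=timedelta(minutes=30)):
    """Split each visitor's requests into sessions whenever the gap between
    consecutive requests exceeds `timeout`."""
    by_visitor = {}
    for visitor, ts, url in records:
        stamp = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        by_visitor.setdefault(visitor, []).append((stamp, url))
    result = []
    for visitor, hits in by_visitor.items():
        hits.sort()  # order each visitor's requests by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                result.append((visitor, [u for _, u in current]))
                current = []
            current.append(hit)
        result.append((visitor, [u for _, u in current]))
    return result

sessions = sessionize(records)
for visitor, urls in sessions:
    print(visitor, urls)
```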
Size of the Web
• Number of pages
  – Technically, infinite
  – Much duplication (30-40%)
  – The best estimate of “unique” static HTML pages comes from search engine claims
  – Until last year, Google claimed 8 billion(?), Yahoo claimed 20 billion
  – Google recently announced that its systems had found 1 trillion unique URLs (not all of which are indexed)
  – How to explain the discrepancy?
Trends in Data Mining

• Trends in data mining include further efforts toward the exploration of:
  – New application areas
  – Improved scalable and interactive methods (including constraint-based mining)
  – The integration of data mining with data warehousing and database systems
  – The standardization of data mining languages
  – Visualization methods and new methods for handling complex data types
Other trends include
• Biological data mining
• Mining software bugs
• Web mining
• Distributed and real-time mining
• Graph mining
• Social network analysis
• Multirelational and multidatabase data mining
• Data privacy, protection and data security
Summary

• Complex data types for mining include object data, spatial data, multimedia data, time-series data, text data, and Web data
• Object data can be mined by multidimensional generalization of complex structured data, such as plan mining for flight sequences
• Spatial data warehousing, OLAP, and mining facilitate multidimensional spatial analysis and finding spatial associations, classifications, and trends
• Multimedia data mining needs content-based retrieval and similarity search integrated with mining methods
Summary

• Time-series/sequential data mining includes trend analysis, similarity search in time series, and mining sequential patterns and periodicity in time sequences
• Text mining goes beyond keyword-based and similarity-based information retrieval and discovers knowledge from semi-structured data using methods such as keyword-based association and document classification
• Web mining includes mining Web link structures to identify authoritative Web pages, the automatic classification of Web documents, building a multilayered Web information base, and Weblog mining
Mining Multimedia Databases
• A multimedia database system stores and manages a large collection of multimedia objects, such as:
  – Audio data
  – Image data
  – Video data
  – Sequence data, and
  – Hypertext data, which contains text, text markups, and linkages
Cont…
• Multimedia database systems are increasingly common owing to the popular use of …
Important Questions
Write short notes on:
• Mining Spatial Databases
• Mining Multimedia Databases
• Mining Time-Series and Sequence Data
• Mining Text Databases
• Mining the World-Wide Web
• Data Mining Applications
• Social Impact of Data Mining
• Trends in Data Mining
UNIT-V
Important Questions

• What is privacy-preserving data mining, and what are its applications?
• What is distributed privacy-preserving data mining?
• What are the limitations of privacy-preserving DM?
• Explain the method of randomization.
• Text book: Charu C. Aggarwal and Philip S. Yu, “Privacy-Preserving Data Mining: Models and Algorithms”, Springer.
