Mining The Web Graph: Technical Seminar Presentation On

This document summarizes a technical seminar presentation on mining the web graph. It introduces web mining taxonomy, including web content mining, web structure mining, and web usage mining. It describes techniques such as analyzing hyperlink structure, discovering hubs and authorities, the HITS algorithm, PageRank, web data clustering, and web caching. The goal is to discover useful patterns and information from the structure and usage of the world wide web.

Uploaded by

lokseh
Copyright
© Attribution Non-Commercial (BY-NC)

MINING THE WEB GRAPH

TECHNICAL SEMINAR PRESENTATION ON

MINING THE WEB GRAPH


By

K. Lokesh Acharaya
Roll No. 0701229274

Under the guidance of


Mr. S. N. Samal


WEB MINING TAXONOMY

Web Mining
•Web Content Mining
•Web Structure Mining
•Web Usage Mining


WEB MINING TAXONOMY

•Web Content Mining: application of data mining techniques to unstructured or semi-structured text, typically HTML documents.

•Web Structure Mining: use of the hyperlink structure of the web as an information source.

•Web Usage Mining: analysis of user interactions with the web server.


WEB CONTENT MINING

•Discovery of useful information from web content, data, and documents.
  Web data contents: text, image, audio, video, metadata, and hyperlinks.

•Information retrieval view:
  Assist or improve information finding.
  Filter information for the user based on the user profile.

•Database view:
  Model data on the web.
  Integrate the data to support more sophisticated queries.


WEB STRUCTURE MINING


•Goal: discover the link structure of the hyperlinks at the inter-document level and generate a structural summary of the website and its web pages.
•Direction 1: categorizing web pages and generating information based on the hyperlinks.
•Direction 2: discovering the structure of the web document itself.
•Direction 3: discovering the nature of the hierarchy or network of hyperlinks in the website of a particular domain.
•Finding authoritative web pages.
•Retrieving pages that are not only relevant but also of high quality or authoritative on the topic.

WEB USAGE MINING

•Web usage mining is also known as web log mining.

•It is a mining technique for discovering interesting usage patterns from the interactions of users while they surf the web.

•It is used to enhance the quality and delivery of internet information services to the end user.

•It is also used to improve site design.



WEB AS A GRAPH

•Pages are nodes and hyperlinks are directed edges.

•Power law for in(out)-degree:
  The probability that a node has in(out)-degree $i$ is proportional to $1/i^{a}$ for some $a > 1$.
•Connectivity:
  Weakly connected components: pages are connected when links are treated as undirected.
  Strongly connected components: pages are connected by paths that follow the direction of the links.
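
The graph view above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation: the function names (`degree_histograms`, `same_component`) and the four-page example graph are hypothetical. It counts how many pages have each in-degree and out-degree (the quantity a power law describes) and checks whether two pages are connected in the weak sense (link direction ignored) or the strong sense (link direction followed both ways).

```python
from collections import Counter, defaultdict, deque


def degree_histograms(edges):
    """Return (in_hist, out_hist): number of pages having each in/out-degree."""
    in_deg, out_deg = Counter(), Counter()
    nodes = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    in_hist = Counter(in_deg[n] for n in nodes)
    out_hist = Counter(out_deg[n] for n in nodes)
    return in_hist, out_hist


def reachable(start, adjacency):
    """Breadth-first search: all nodes reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def same_component(u, v, edges, strong):
    """True if u and v share a strongly (or weakly) connected component."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        if not strong:
            adjacency[dst].add(src)  # weak connectivity ignores direction
    if strong:
        return v in reachable(u, adjacency) and u in reachable(v, adjacency)
    return v in reachable(u, adjacency)


# Hypothetical example: a cycle a -> b -> c -> a plus a dead-end page d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
in_hist, out_hist = degree_histograms(edges)
```

In this example, pages a and d are weakly connected but not strongly connected, because no path leads from d back to a.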


HUBS AND AUTHORITIES

•Authorities: pages that contain a lot of information about the query topic.
•Hubs: pages that contain a large number of links to pages that contain information about the topic.
•Mutual reinforcement:
  A good hub points to many good authorities.
  A good authority is pointed to by many good hubs.


HITS ALGORITHM
•HITS: an algorithm for identifying good hub and authority pages for a query. Each page is associated with a hub score and an authority score.
•Scores are computed based on the graph structure of the Web.
•Mutual reinforcement of hubs and authorities is exploited with an iterative algorithm.
•Hub score h(p): updated with the sum of the authority scores of all pages that p points to:
  $h(p) = \sum_{(p,q) \in E} a(q)$
•Authority score a(p): updated with the sum of the hub scores of all pages that point to p:
  $a(p) = \sum_{(q,p) \in E} h(q)$
•Drawback: the hub and authority scores are computed iteratively from the query result, so the computation has to run at query time.
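
The hub and authority updates above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the iteration count, and the example link graph are hypothetical, and the scores are rescaled to unit length after every round (a common convention that the slides do not mention) so that the values stay bounded.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (assumed convention)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(edges, iterations=50):
    """Iterate the mutual-reinforcement updates on a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # a(p) = sum of h(q) over all links (q, p)
        new_auth = {n: 0.0 for n in nodes}
        for q, p in edges:
            new_auth[p] += hub[q]
        # h(p) = sum of a(q) over all links (p, q), using the fresh authority scores
        new_hub = {n: 0.0 for n in nodes}
        for p, q in edges:
            new_hub[p] += new_auth[q]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical query result: three hub-like pages linking to two content pages.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1")]
hub, auth = hits(edges)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

Here a1 receives the most links and ends up as the top authority, while h1 links to both content pages and ends up as the top hub.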

PAGE RANK
•PageRank pr(p):
  $pr(p) = (1-d)\,\frac{1}{N} + d \sum_{(q,p) \in E} \frac{pr(q)}{o(q)}$
•Where:
  o(p) is the out-degree of page p
  d is the damping factor (0.85)
  N is the total number of pages
•PageRank prefers pages that have:
  a large in-degree.
  predecessors with a large PageRank.
  predecessors with a small out-degree.
•The PageRank values form a probability distribution over the pages.
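
A small power-iteration sketch can make the PageRank definition concrete. It assumes the standard formulation in which each predecessor q contributes pr(q)/o(q) to the pages it links to. The function name, the iteration count, the example graph, and the handling of pages without out-links (their rank is spread uniformly over all pages) are assumptions made for illustration; the slides do not specify them.

```python
def pagerank(edges, d=0.85, iterations=100):
    """Power iteration for PageRank on a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_degree = {p: 0 for p in nodes}
    for q, _ in edges:
        out_degree[q] += 1

    pr = {p: 1.0 / n for p in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # Every page receives the (1 - d) / N baseline.
        new_pr = {p: (1.0 - d) / n for p in nodes}
        # Each predecessor q passes on pr(q) / o(q) along each of its links.
        for q, p in edges:
            new_pr[p] += d * pr[q] / out_degree[q]
        # Assumption: rank held by pages with no out-links is shared by all pages.
        dangling = sum(pr[q] for q in nodes if out_degree[q] == 0)
        for p in nodes:
            new_pr[p] += d * dangling / n
        pr = new_pr
    return pr


# Hypothetical example: a cycle a -> b -> c -> a, plus page d linking to c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
top_page = max(ranks, key=ranks.get)
```

In this example page c has the largest in-degree and the highest rank, page d has no in-links and keeps only the baseline (1 - d)/N, and the ranks sum to 1.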


WEB DATA CLUSTERING

•Grouping web objects into classes so that similar objects are in the same class and dissimilar web objects are in different classes.

•Discovering distribution patterns and relations between data attributes.

•Organizing data circulated over the web into groups or collections in order to facilitate data availability and access while also meeting user preferences.
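
As one possible illustration of grouping similar web objects, the sketch below clusters pages by the overlap of their term sets. Everything here is an assumption chosen for brevity rather than a method from the presentation: pages are represented as sets of terms, similarity is the Jaccard coefficient, and two pages land in the same cluster whenever a chain of pairwise similarities at or above a hypothetical threshold connects them.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.5):
    """Group page names whose term sets are linked by similarity >= threshold."""
    names = list(pages)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if jaccard(pages[u], pages[v]) >= threshold:
                parent[find(u)] = find(v)  # merge the two clusters

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pages described by a few terms each.
pages = {
    "page_a": {"web", "mining", "graph"},
    "page_b": {"web", "mining", "links"},
    "page_c": {"cache", "latency"},
}
clusters = cluster_pages(pages, threshold=0.5)
```

With this threshold, page_a and page_b share two of four distinct terms and form one cluster, while page_c stays on its own.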


BENEFITS OF WEB CLUSTERING

•Increasing web information accessibility.
•Decreasing the length of web navigation pathways.
•Improving the servicing of usage requests.
•Improving information retrieval.
•Improving content delivery on the web.
•Integrating various data representation standards.

TYPES OF CLUSTERING

•Hierarchical clustering.
•Partitional clustering.
•Probabilistic clustering.
•Graph based clustering.
•Fuzzy clustering.
•Neural network based clustering.

WEB CACHING
•Caching can improve web traffic by reducing:
  the bandwidth consumption
  the network latency perceived by the client
  the server load.
•Caching can improve the network reliability perceived by the client.
•Evaluation measures:
  Hit rate: the fraction of requests fulfilled by the cache and therefore not handled by the web servers.
  Weighted hit rate: the fraction of bytes served to the client by the cache.
  Latency: the time that an end user waits to retrieve a resource.
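
The hit rate and weighted hit rate can be illustrated with a small cache simulation. This is a sketch under assumptions that are not in the slides: the cache uses least-recently-used (LRU) eviction, its capacity is measured in bytes, and the request trace, sizes, and function name are hypothetical.

```python
from collections import OrderedDict


def simulate_lru(requests, capacity_bytes):
    """Replay (url, size) requests through an LRU cache and report hit measures."""
    cache = OrderedDict()  # url -> size, least recently used first
    used = 0
    hits = hit_bytes = total_bytes = 0
    for url, size in requests:
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit, so it is not cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[url] = size
        used += size
    count = len(requests)
    return {
        "hit_rate": hits / count if count else 0.0,
        "weighted_hit_rate": hit_bytes / total_bytes if total_bytes else 0.0,
    }


# Hypothetical trace of (url, size in bytes) requests and a 500-byte cache.
trace = [("/a", 100), ("/b", 300), ("/a", 100), ("/b", 300), ("/c", 200), ("/a", 100)]
stats = simulate_lru(trace, capacity_bytes=500)
```

For this trace two of the six requests are served from the cache (hit rate 2/6), and those hits account for 400 of the 1100 requested bytes (weighted hit rate 4/11), which shows how the two measures can differ when object sizes vary.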

National Institute of Science & Technology

THANK YOU
