Web Mining Course
WWW: Facts
Discovering useful information from the World-Wide Web and its usage patterns
Problems
Web page content mining, also known as web text mining or web data mining, is the process of
extracting valuable information, patterns, and insights from unstructured web content. It involves
analyzing and extracting knowledge from the vast amount of text-based information available on
the internet, including web pages, articles, blog posts, forums, social media posts, and other
textual data.
Web content mining can encompass a wide range of tasks and techniques, including:
Text Preprocessing:
Text Extraction:
Keyword Extraction:
Sentiment Analysis:
Text Classification:
Opinion Mining: Identifying opinions, attitudes, and subjective information expressed in the
text.
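As a rough illustration of the preprocessing and keyword extraction tasks listed above, the sketch below lowercases a text, drops stop words, and ranks the remaining tokens by frequency. The stop-word list and the function names `preprocess` and `top_keywords` are assumptions made for this example; real systems typically use larger stop-word lists and weighting schemes such as TF-IDF.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on", "for", "from"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def top_keywords(text, k=3):
    """Return the k most frequent tokens that survive preprocessing."""
    counts = Counter(preprocess(text))
    return [word for word, _ in counts.most_common(k)]

if __name__ == "__main__":
    sample = "Web mining extracts patterns from the web. Mining the web is useful."
    print(top_keywords(sample, k=2))
```

Frequency counting is only the simplest possible scoring; it is used here because it keeps the example self-contained.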
Web structure mining is a branch of web mining that focuses on analyzing and discovering
patterns and knowledge from the structural components of the World Wide Web. It involves
examining the relationships and connections between web pages, websites, and other web-based
resources to gain insights into the organization, navigation, and interlinking of information on
the web.
Link Analysis: This type of web structure mining focuses on the analysis of hyperlinks
that connect web pages.
Web Usage Mining: Web usage mining analyzes user interactions with the web,
including clickstreams and navigation patterns.
Web Page Clustering: Web page clustering aims to group similar web pages based on
their content, structure, or link patterns.
Web usage mining
Web usage mining is a branch of web mining that focuses on the analysis of user interactions
and behavior on the World Wide Web. It involves discovering meaningful patterns, trends, and
insights from the vast amount of user-generated data, such as clickstreams, session data, and
navigation patterns. The goal of web usage mining is to understand how users navigate websites,
interact with web pages, and utilize web-based applications and services.
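A common first step when working with clickstreams is to split each user's page requests into sessions. The sketch below assumes the log has already been parsed into (user, timestamp in seconds, URL) records and assumes a 30-minute inactivity timeout; both the record layout and the threshold are illustrative choices, not something fixed by these notes.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumed inactivity threshold, in seconds

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) records into per-user lists of sessions."""
    # Collect each user's visits in chronological order.
    by_user = defaultdict(list)
    for user, ts, url in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_user[user].append((ts, url))

    sessions = defaultdict(list)
    for user, visits in by_user.items():
        last_ts = None
        for ts, url in visits:
            # Start a new session on the first click or after a long gap.
            if last_ts is None or ts - last_ts > timeout:
                sessions[user].append([])
            sessions[user][-1].append(url)
            last_ts = ts
    return dict(sessions)

if __name__ == "__main__":
    log = [
        ("u1", 0, "/home"),
        ("u1", 60, "/products"),
        ("u1", 4000, "/home"),
        ("u2", 10, "/about"),
    ]
    print(sessionize(log))
```

Each session is then a navigation path that can be fed into pattern discovery, for example counting frequent page sequences.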
Web structure mining is the process of extracting knowledge from the interconnections of
hypertext documents on the World Wide Web.
Interesting Questions:
PageRank: for discovering the most important pages on the Web (as used by Google)
Hubs and authorities: a more detailed evaluation of the importance of Web pages
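To make the idea of page importance concrete, here is a minimal sketch of PageRank computed by power iteration on a small link graph. The damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the function name `pagerank` are all assumptions of this example; it is not meant to describe how any search engine implements ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict mapping page -> list of linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumed handling of dangling pages: spread rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In this toy graph page C collects links from both A and B, so it ends up with the highest score, while B, linked only from A, ends up with the lowest.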
Intuition
Connectivity
SCC (strongly connected component): a maximal set of nodes such that for any pair (u, v) in the set there is a directed path from u to v
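The components of a small link graph can be found with Kosaraju's two-pass depth-first search, sketched below. The adjacency-dict representation and the function name `strongly_connected_components` are assumptions of this example, and the recursive traversal is only suitable for small graphs; a Web-scale graph would need an iterative or external-memory approach.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm on a dict mapping node -> list of successor nodes."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}

    # Pass 1: depth-first search, recording nodes in order of completion.
    visited, order = set(), []

    def visit(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in visited:
            visit(u)

    # Build the reversed graph (every edge u -> v becomes v -> u).
    reverse = {u: [] for u in nodes}
    for u, succs in graph.items():
        for v in succs:
            reverse[v].append(u)

    # Pass 2: search the reversed graph in decreasing completion order;
    # each search tree is one strongly connected component.
    assigned, components = set(), []

    def collect(u, comp):
        assigned.add(u)
        comp.add(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(order):
        if u not in assigned:
            comp = set()
            collect(u, comp)
            components.append(comp)
    return components

if __name__ == "__main__":
    web = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
    print(strongly_connected_components(web))
```

Here pages a, b, and c link to one another in a cycle and form one component, while d is reachable from them but links back to nothing, so it forms a component on its own.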