Webminingtextmining 160906165305

Uploaded by

rks teja
Prepared by :Rohini Talekar

Asst.Professor CSE
ACE Engg. college
Web Mining
 Web mining is the application of data mining techniques to the web.
 It is the process of discovering and extracting useful information directly from the
web.
 In other words, web mining is a branch of data mining that concentrates on the WWW as
the primary data source.
 The goal of web mining is to look for patterns in web data by collecting and
analysing information in order to gain insight into trends, user interests, booming
industries, etc.
 Web mining is very useful to e-commerce websites and e-services.
 Web mining is further divided into three different types:
1] Web content mining.
2] Web structure mining.
3] Web usage mining.
 Web data :

• Web content mining : text, images, audio, video, records, etc.

• Web structure mining : hyperlinks, tags, etc.

• Web usage mining : HTTP logs, app server logs, browser history, etc.
Web Content Mining

 Web content mining is the process of extracting useful information from the
content of a web page.

 The primary task of content mining is data extraction, i.e. extracting structured
data from unstructured web pages.

 In web content mining, all the content present in a web page, such as audio, video,
text and images, is scanned, collected and analysed to find useful data and patterns.

 In web content mining, each web page is considered as an individual document.

 It uses Natural Language Processing and Information Retrieval techniques for
mining the data.
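
The data extraction step described above can be sketched as follows. This is a minimal illustration, assuming the page is available as an HTML string and using only Python's standard `html.parser` module. The class name `ContentExtractor`, the function `extract_content`, and the sample page are hypothetical and not part of the original slides.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collects the title, visible text, and image sources of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.images = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if not stripped or self._skip_depth:
            return  # ignore whitespace and script/style bodies
        if self._in_title:
            self.title += stripped
        else:
            self.text_parts.append(stripped)


def extract_content(html):
    """Turn one unstructured page into a structured record (a dict)."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "text": " ".join(parser.text_parts),
        "images": parser.images,
    }


# Hypothetical sample page, treated as one individual document.
SAMPLE_PAGE = """<html><head><title>Shop</title><style>p {color: red}</style></head>
<body><h1>Laptops</h1><p>Price: 500 USD</p><img src="laptop.png"></body></html>"""

record = extract_content(SAMPLE_PAGE)
print(record)
```

The resulting record could then be passed to NLP or information retrieval tools; audio and video content would need separate handling that this sketch does not attempt.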
Web Structure Mining
 The structure of a typical Web graph consists of
Web pages as nodes and hyperlinks as edges, each connecting
two related pages.

 Web structure mining is the process of discovering structure information from the
web.

• This type of mining can be performed either at the (intra-page) document level or the
(inter-page) hyperlink level.

• The research at the hyperlink level is also called Hyperlink Analysis.


Web Structure Terminology

 Web-graph : A directed graph that represents the Web.

 Node : Each Web page represents a node.

 Edge : Each hyperlink on the Web is a directed edge.

 In-degree : The number of distinct links pointing to a particular node.

 Out-degree : The number of distinct links originating from a particular node.
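
To make the terminology concrete, the sketch below computes the in-degree and out-degree of every node in a small web graph. The edge list and the function name `degrees` are hypothetical; duplicate links are counted once, in line with the "distinct links" wording above.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    out_links = defaultdict(set)  # page -> distinct pages it links to
    in_links = defaultdict(set)   # page -> distinct pages linking to it
    nodes = set()
    for source, target in edges:
        nodes.update((source, target))
        out_links[source].add(target)
        in_links[target].add(source)
    return {
        node: (len(in_links.get(node, ())), len(out_links.get(node, ())))
        for node in sorted(nodes)
    }


# Hypothetical web graph: each pair is a hyperlink (from_page, to_page).
# The link A -> B appears twice but is counted as one distinct link.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]

result = degrees(EDGES)
for page, (in_degree, out_degree) in result.items():
    print(page, "in-degree:", in_degree, "out-degree:", out_degree)
```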


 The purpose of structure mining is to produce a structural summary of a website
and of similar web pages.
 Web structure mining can be very useful for determining the connection between two
commercial websites.
 The most popular example of structure mining is the PageRank algorithm used by
Google to rank search results.
 Techniques used for structure mining are :
1. Classification (link-based classification).
2. Clustering (link-based clustering).
3. PageRank algorithm.
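
As an illustration of the third technique, here is a simplified textbook form of PageRank computed by power iteration. It is a sketch under stated assumptions, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the uniform treatment of pages without out-links, and the sample graph are all choices made for this example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank scores for a graph given as {page: [linked pages]}."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

scores = pagerank(GRAPH)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this graph, page C is linked to by three pages and ends up with the highest score, while page D, which nothing links to, receives only the baseline share.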
Web Usage Mining
 A website is a collection of inter-related files on one or more Web servers.

 Web usage mining is the discovery of meaningful patterns from data generated by client-server transactions
on one or more Web localities.

 Typical Sources of Data :

• Automatically generated data stored in server access logs, referrer logs, agent logs, and
client-side cookies.

• User profiles.

• Metadata : page attributes, content attributes, usage data.


 Web servers, Web proxies, and client applications can quite easily capture Web
usage data.

 Web Server Log : It is a file created by the server to record all the
activities it performs.

 For example, the server logs a request when a user enters a URL into the browser's address bar or
clicks on a link.

 For each page request, the web server records information in its log such as
the requested URL, whether the request was successful, the user's IP address,
and the date and time.
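
The fields listed above can be pulled out of a log entry with a short parser. This sketch assumes the server writes entries in the widely used Common Log Format; the slides do not name a format, so this is an assumption, and the sample line is invented. Treating status codes from 200 to 399 as "successful" is also a simplification chosen for the example.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# ip identity user [timestamp] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Simplifying assumption: 2xx and 3xx responses count as successful.
    entry["successful"] = 200 <= entry["status"] < 400
    return entry


# Hypothetical log entry for illustration only.
SAMPLE_LINE = (
    '192.0.2.10 - - [06/Sep/2016:16:53:05 +0000] '
    '"GET /products/index.html HTTP/1.1" 200 5120'
)

entry = parse_log_line(SAMPLE_LINE)
print(entry["ip"], entry["url"], entry["time"], entry["successful"])
```

Parsed entries like this are the raw material for usage mining, for example counting requests per URL or grouping requests by IP address into sessions.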
Text Mining

 The objective of Text Mining is to exploit information contained in textual
documents in various ways, including discovery of patterns and trends in data,
associations among entities, predictive rules, etc.

 The results can be important both for :

• The analysis of the collection, and

• Providing intelligent navigation and browsing methods.


Text Mining Workflow
Data Mining vs Text Mining

 Both seek novel and useful patterns.

 Both are semi-automated processes.

 The difference is the nature of the data:

• Structured versus unstructured data

• Structured data: databases

• Unstructured data: Word documents, PDF files, XML files, and so on

 Text mining : first impose structure on the data, then mine the structured data.
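
The two-step idea of imposing structure and then mining it can be sketched as follows. Here the imposed structure is a term-document matrix of word counts (a bag-of-words model, chosen for this example and not specified in the slides), and the mining step simply finds terms shared by every document. The function names and sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Step 1: impose structure. One row per document, one column per term."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


def shared_terms(vocabulary, matrix):
    """Step 2: mine the structured data. Terms that occur in every document."""
    columns = zip(*matrix)
    return [term for term, column in zip(vocabulary, columns) if all(column)]


# Hypothetical unstructured input.
DOCUMENTS = [
    "Web mining finds patterns in web data.",
    "Text mining finds patterns in text.",
]

vocabulary, matrix = term_document_matrix(DOCUMENTS)
print(vocabulary)
print(matrix)
print(shared_terms(vocabulary, matrix))
```

Once the text is in matrix form, ordinary data mining methods for structured data can be applied to it.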
Technology premise of Text Mining
 Summarization : It is the process of producing a summary of a document containing
a large amount of information while preserving the theme or main idea of the document.

 Information Extraction : It exploits relations within the text and uses pattern
matching to find them.

 Categorization : It is a supervised learning technique which assigns each document to a category
according to its content. Document categorization is widely used in libraries.

 Visualization : It uses computer graphics to represent information and
reveal relationships.
 Clustering : It is an unsupervised technique, based on the textual similarity of documents,
which is used in data analysis to divide the text into mutually exclusive groups.

 Question Answering : Natural language question answering is
concerned with finding the most suitable answer to a particular question.

 Sentiment Analysis : Also known as opinion mining, it classifies the user's
emotion, mostly into several classes: positive, negative, neutral and
mixed. It is mainly used to gauge people's views or attitudes towards anything,
including services and products.
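
Two of the techniques above lend themselves to a very small sketch: information extraction by pattern matching, and sentiment analysis into the four classes named. Both parts are deliberately naive. The relation pattern, the word lists, and the function names are hypothetical, and the sentiment rule is a simple lexicon lookup that ignores negation and context, not a trained model.

```python
import re

# Hypothetical relation pattern: "<City> is the capital of <Country>".
CAPITAL_PATTERN = re.compile(r"\b([A-Z][a-z]+) is the capital of ([A-Z][a-z]+)\b")

# Tiny illustrative lexicons; a real system would use a far larger resource.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "fast"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "slow"}


def extract_capitals(text):
    """Information extraction: return (city, country) pairs found by the pattern."""
    return CAPITAL_PATTERN.findall(text)


def classify_sentiment(text):
    """Sentiment analysis: label text as positive, negative, neutral or mixed."""
    words = re.findall(r"[a-z]+", text.lower())
    has_positive = any(word in POSITIVE_WORDS for word in words)
    has_negative = any(word in NEGATIVE_WORDS for word in words)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


print(extract_capitals("Paris is the capital of France. Tokyo is the capital of Japan."))
for review in [
    "Great product, fast delivery",
    "Terrible support",
    "Great screen but poor battery",
    "It arrived on Monday",
]:
    print(review, "->", classify_sentiment(review))
```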
