Webminingtextmining 160906165305
Asst.Professor CSE
ACE Engg. college
Web Mining
Web mining is the application of data mining techniques to web data.
It is the process of discovering and extracting useful information directly from the
web.
In other words, web mining is a branch of data mining that treats the World Wide
Web (WWW) as its primary data source.
The goal of web mining is to look for patterns in web data by collecting and
analysing information in order to gain insight into trends, user interests, booming
industries, etc.
Web mining is very useful to e-commerce websites and e-services.
Web mining is further divided into three different types:
1] Web content mining.
2] Web structure mining.
3] Web usage mining.
Web data:
• Web usage mining: HTTP logs, app server logs, browser history, etc.
Web Content Mining
Web content mining is the process of extracting useful information from the
content of a web page.
The primary task of content mining is data extraction i.e. extracting structured
data from unstructured websites.
In web content mining, all the content present in a web page, such as audio, video,
text, and images, is scanned, collected, and analysed to find useful data and patterns.
It uses Natural Language Processing and Information Retrieval techniques to
mine the data.
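As a rough illustration of the data extraction task described above, the sketch below pulls structured records out of a small HTML fragment using Python's standard `html.parser` module. The sample page, the `name` and `price` class names, and the `ProductExtractor` and `extract_products` helpers are hypothetical and chosen only for this example; real pages usually need more robust handling.

```python
from html.parser import HTMLParser


class ProductExtractor(HTMLParser):
    """Collects the text of elements marked class="name" or class="price".

    The class names are assumptions made for this sketch, not a standard.
    """

    def __init__(self):
        super().__init__()
        self._field = None   # field whose text we expect next, if any
        self._current = {}   # record currently being assembled
        self.records = []    # completed structured records

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field is None:
            return
        self._current[self._field] = data.strip()
        self._field = None
        # A record is complete once both fields have been seen.
        if len(self._current) == 2:
            self.records.append(self._current)
            self._current = {}


# Hypothetical page content used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <div class="item"><span class="name">Laptop</span><span class="price">550</span></div>
  <div class="item"><span class="name">Phone</span><span class="price">300</span></div>
</body></html>
"""


def extract_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in the HTML."""
    parser = ProductExtractor()
    parser.feed(html)
    return parser.records


if __name__ == "__main__":
    for record in extract_products(SAMPLE_PAGE):
        print(record)
```

The result is a list of dictionaries, i.e. structured data that ordinary data mining techniques can then work on.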
Web Structure Mining
The structure of a typical Web graph consists of
Web pages as nodes and hyperlinks as edges connecting
two related pages.
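The Web graph just described can be sketched with a plain adjacency structure. In the example below, pages are nodes and each (source, target) pair is a hyperlink edge; counting incoming links is one simple structural signal. The page names and the `build_web_graph` and `in_degree` helpers are assumptions made for this illustration.

```python
def build_web_graph(links):
    """Build an adjacency map {page: set of pages it links to}."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())  # make sure the target is a node too
    return graph


def in_degree(graph):
    """Count how many distinct pages link to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical hyperlinks between pages of a small site.
LINKS = [
    ("home", "products"),
    ("home", "about"),
    ("products", "home"),
    ("about", "products"),
]

if __name__ == "__main__":
    graph = build_web_graph(LINKS)
    for page, count in sorted(in_degree(graph).items()):
        print(page, count)
```

Pages with many incoming links are often treated as more important; link analysis algorithms build on this kind of graph representation.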
Web structure mining is the process of discovering structure information from the
web.
• This type of mining can be performed either at the (intra-page) document level or the
(inter-page) hyperlink level.
• Automatically generated data stored in server access logs, referrer logs, agent logs, and
client-side cookies.
• User profiles.
Web Server Log: a file created by the server to record all the
activities it performs.
For example, a request is logged when a user enters a URL into the browser's
address bar or clicks on a link.
For each page request, the web server records details in its log, such as
the requested URL, whether the request was successful, the user's IP address,
and the date and time.
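To make the log fields above concrete, here is a minimal sketch that parses one log line into the URL, success flag, IP address, and timestamp. It assumes the server writes entries in the widely used Common Log Format; the sample line, the `parse_log_line` helper, and the rule that 2xx and 3xx status codes count as successful are assumptions made for this example.

```python
import re

# Pattern for one Common Log Format entry (an assumed format for this sketch):
# ip ident user [timestamp] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Assumption: treat 2xx and 3xx responses as successful requests.
    entry["success"] = 200 <= entry["status"] < 400
    return entry


# Hypothetical log entry used only for illustration.
SAMPLE_LINE = (
    '192.0.2.10 - - [06/Sep/2016:16:53:05 +0000] '
    '"GET /index.html HTTP/1.1" 200 1024'
)

if __name__ == "__main__":
    print(parse_log_line(SAMPLE_LINE))
```

The timestamp is kept as the raw string from the log; a usage mining pipeline would typically convert it to a date-time value and group entries by IP address or session.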
Text Mining
Text mining: first impose structure on the data, then mine the structured data.
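One common way to impose structure on text is a bag-of-words term-document matrix, where each row is a document and each column counts one term. The sketch below is only one possible approach; the sample documents and the `tokenize` and `term_document_matrix` helpers are assumptions made for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]
    return vocabulary, matrix


# Hypothetical documents used only for illustration.
DOCS = [
    "Web mining finds patterns in web data",
    "Text mining finds patterns in text",
]

if __name__ == "__main__":
    vocabulary, matrix = term_document_matrix(DOCS)
    print(vocabulary)
    for row in matrix:
        print(row)
```

Once the text is in this tabular form, standard data mining methods such as clustering or classification can be applied to the rows.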
Technology premise of Text Mining
Summarization: the process of producing a summary of a document that contains a
large amount of information while preserving its theme or main idea.
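A very simple way to approximate summarization is extractive: score each sentence by how frequent its words are in the whole document and keep the top-scoring sentences in their original order. This is a minimal sketch of that one technique, not the only or the best method; the stop-word list, the sample text, and the `split_sentences` and `summarize` helpers are assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "it", "for", "on"}


def content_words(text):
    """Lowercased alphabetic words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = split_sentences(text)
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        if not words:
            return 0.0
        return sum(frequency[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


# Hypothetical text used only for illustration.
SAMPLE_TEXT = (
    "Web mining extracts patterns from web data. "
    "The weather was pleasant yesterday. "
    "Web usage mining analyses web server logs."
)

if __name__ == "__main__":
    print(summarize(SAMPLE_TEXT, max_sentences=2))
```

Sentences that share the document's most frequent words are kept, which tends to preserve the main theme while dropping off-topic sentences.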