Unit 7

Unit 7 covers advanced applications of web mining, detailing its three main categories: web content mining, web usage mining, and web structure mining. It discusses the phases of web usage mining, including data collection, pattern discovery, and pattern analysis, as well as the challenges faced in web mining such as dynamic content and noise elimination. Additionally, it highlights various application areas for web mining, including website design, web traffic handling, e-business, and web personalization.


Unit 7: Advanced Applications LH 3

Presented By : Tekendra Nath Yogi


College Of Applied Business And Technology
Contd…
• Outline:
– 7.1. Web-mining:

• Web content mining

• Web usage mining

• Web structure mining

– 7.2. Time-series data mining

7/17/2019 By: Tekendra Nath Yogi 2


Introduction
• Web: a huge, widely distributed, highly heterogeneous, semi-structured,
interconnected information repository

• The Web is a huge collection of documents plus

– Hyper-link information

– Access and usage information



Contd..
• What is Web Mining?

– Web mining is the application of data mining techniques to
find interesting and potentially useful knowledge from web
data.

– Web data:
• Web content data: text, images, records, etc.

• Web structure data: hyperlinks

• Web usage data: server logs



Contd..
– Web mining is usually divided into the following three categories.
• Web content mining
• Web usage mining and
• Web structure mining

Fig: Types of web mining


Web usage mining
• Automatic discovery of patterns in clickstreams (usage data) and associated data
collected or generated as a result of user interactions with one or more Web
sites.

• Goal: analyze the behavioral patterns and profiles of users interacting with a
Web site.

• The discovered patterns are usually represented as collections of pages,
objects, or resources that are frequently accessed by groups of users with
common interests.



Contd..
• Application: analyzing clickstream data can help to:
– determine the lifetime value of clients,

– design cross-marketing strategies across products and services,

– evaluate the effectiveness of promotional campaigns,

– optimize the functionality of Web-based applications,

– provide more personalized content to visitors, and

– find the most effective logical structure for the Web space.



Contd..
• Phases of Web Usage Mining:

– There are generally three distinct phases in web usage mining:

• Data collection and preprocessing,

• Pattern (knowledge) discovery, and

• Pattern analysis, as shown in the figure below:



Contd..
• Data Collection and Pre-processing Phase:
– This phase generates and cleans the web data and transforms it into a
set of user transactions representing the activities of each user during a
website visit.

– This step influences the quality and results of pattern discovery and
analysis, so it must be done very carefully.
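The cleaning and transformation described above can be sketched as follows. This is a minimal illustration, not the method from the slides: it assumes the server writes Common Log Format lines, treats image, style, and script requests as noise, identifies a user by IP address, and starts a new transaction after 30 minutes of inactivity. The sample log lines and all names (`parse_line`, `sessionize`, `SESSION_TIMEOUT`) are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (hypothetical sample data below).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
# Assumed noise: embedded resources that do not represent page views.
NOISE_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def parse_line(line):
    """Return (ip, timestamp, url) for a clean page request, else None."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    if m["method"] != "GET" or not m["status"].startswith("2"):
        return None  # keep only successful page fetches
    url = m["url"]
    if url.lower().endswith(NOISE_SUFFIXES):
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return m["ip"], ts, url


def sessionize(lines):
    """Group cleaned requests into per-user transactions (lists of URLs)."""
    records = sorted(filter(None, map(parse_line, lines)),
                     key=lambda r: (r[0], r[1]))
    sessions = []
    last_ip, last_ts = None, None
    for ip, ts, url in records:
        # A new user, or a long pause by the same user, starts a new session.
        if ip != last_ip or ts - last_ts > SESSION_TIMEOUT:
            sessions.append([])
        sessions[-1].append(url)
        last_ip, last_ts = ip, ts
    return sessions


sample_log = [
    '10.0.0.1 - - [17/Jul/2019:10:00:00 +0545] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [17/Jul/2019:10:00:01 +0545] "GET /logo.png HTTP/1.1" 200 2048',
    '10.0.0.1 - - [17/Jul/2019:10:05:00 +0545] "GET /courses.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [17/Jul/2019:11:30:00 +0545] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [17/Jul/2019:10:01:00 +0545] "GET /missing.html HTTP/1.1" 404 0',
    '10.0.0.2 - - [17/Jul/2019:10:02:00 +0545] "GET /contact.html HTTP/1.1" 200 256',
]
sessions = sessionize(sample_log)
```

Identifying users by IP address alone is a simplification; real logs often need cookies or login data to separate users behind a shared address.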



Contd..
• Pattern Discovery Phase
– Knowledge or pattern discovery is the key component of Web mining;
it uses algorithms and techniques from data mining.

– The most commonly used data mining methods are clustering,
classification, and association rule mining.

– Each method has its own strengths and shortcomings; classification and
clustering are currently regarded as the most effective.
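As an illustration of one of the methods named above, association rule mining, the sketch below counts which pairs of pages occur together in the same session and reports rules of the form "visitors of A also visit B" with their support and confidence. It is a simplified, pair-only version, not a full algorithm such as Apriori; the sessions, thresholds, and the function name `page_pair_rules` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def page_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (A, B, support, confidence): 'visitors of A also visit B'."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))        # each page counts once per session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                 # share of sessions containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]   # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical sessions, each a list of visited pages.
example_sessions = [
    ["/home", "/courses", "/fees"],
    ["/home", "/courses"],
    ["/home", "/contact"],
    ["/courses", "/fees"],
]
rules = page_pair_rules(example_sessions)
```

With these sessions, every visitor of `/fees` also visited `/courses`, so that rule has confidence 1.0, while the reverse rule is weaker.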



Contd..
• Pattern Analysis Phase:
– Pattern analysis is the final stage of Web usage mining.

– The challenges of pattern analysis are to filter out uninteresting information and
to visualize and interpret the interesting patterns for the user.
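One common way to filter out uninteresting patterns is to score each rule by its lift and keep only rules that score above a threshold. The slides do not name a specific measure, so lift is an assumption here, as are the rule values, the threshold, and the function names.

```python
def lift(support_ab, support_a, support_b):
    """Lift > 1 means A and B co-occur more often than if they were independent."""
    return support_ab / (support_a * support_b)


def interesting_rules(rules, min_lift=1.1):
    """Keep rules whose lift reaches the threshold, strongest first."""
    scored = [(lift(r["support_ab"], r["support_a"], r["support_b"]), r)
              for r in rules]
    kept = [(score, r) for score, r in scored if score >= min_lift]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [(r["rule"], round(score, 2)) for score, r in kept]


# Hypothetical candidate rules with supports already measured.
candidate_rules = [
    {"rule": "/fees -> /courses", "support_ab": 0.50, "support_a": 0.50, "support_b": 0.75},
    {"rule": "/home -> /courses", "support_ab": 0.50, "support_a": 0.75, "support_b": 0.75},
    {"rule": "/home -> /contact", "support_ab": 0.25, "support_a": 0.75, "support_b": 0.25},
]
report = interesting_rules(candidate_rules)
```

Here `/home -> /courses` is dropped: both pages are so popular that seeing them together is less frequent than chance would predict, so the rule tells the analyst little.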



Web content mining
• Web content mining is the process of extracting useful information from the
contents of Web documents.

• Content data corresponds to the collection of facts a Web page was designed
to convey to its users.

• It may consist of text, images, audio, video, or structured records such as lists
and tables, as shown in the figure below.
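A small sketch of content extraction, using only Python's standard `html.parser` module: it separates the free text of a page from its structured records (table rows). The sample page and the class name `ContentExtractor` are hypothetical, and the parser ignores nested tables and other complications a real page may have.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect free text and table rows from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # unstructured content
        self.rows = []         # structured records (table rows)
        self._row = None       # cells of the row being read, if any
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_cell:
            self._row[-1] += text      # text inside a table cell
        else:
            self.text_parts.append(text)


# Hypothetical page mixing text and a table.
page = """
<html><body>
  <h1>Unit 7</h1>
  <p>Web mining has three categories.</p>
  <table>
    <tr><th>Category</th><th>Data</th></tr>
    <tr><td>Content</td><td>Text, images</td></tr>
    <tr><td>Usage</td><td>Server logs</td></tr>
  </table>
</body></html>
"""
extractor = ContentExtractor()
extractor.feed(page)
extractor.close()
```

After parsing, `extractor.text_parts` holds the heading and paragraph text, and `extractor.rows` holds one list of cell strings per table row.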





Web structure mining
• Web structure mining, one of the three categories of web mining, is
used to identify relationships between Web pages that are connected by related
information or by direct links.

• It studies the topology of hyperlinks, with or without the descriptions
of the links.
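To make the idea of hyperlink topology concrete, the sketch below represents a small site as a link graph and scores pages with a basic PageRank iteration. PageRank is not named in the slides; it is used here only as one well-known example of link analysis. The site graph, damping factor, and iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site structure: page -> outgoing hyperlinks.
site_links = {
    "/home": ["/courses", "/contact"],
    "/courses": ["/home", "/fees"],
    "/fees": ["/home"],
    "/contact": ["/home"],
}
ranks = pagerank(site_links)
```

In this graph `/home` receives links from every other page, so it ends up with the highest score.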



Contd..
• The main purpose of structure mining is to extract previously unknown
relationships between Web pages.

• Structure mining lets a business link the information
on its own Web site to enable navigation and to cluster information into site
maps.

• This gives users the ability to access the desired information through
keyword association and content mining.



Contd..
• According to the type of web structural data, web structure mining can be
divided into two kinds, based on hyperlinks and on document structure, as shown
in the figure below:



Issues and Challenges in Web Mining
• Web mining faces various issues and challenges. Some
of them are:
– Web pages are dynamic: their information changes constantly.
Coping with these changes and monitoring them is an important issue for many
applications.

– Noise elimination is another issue. Users face a noisy
environment when searching for content if the information comes from
different sources. A typical Web page contains many pieces of information,
for instance navigation links, the main content of the page, copyright
notices, advertisements, and privacy policies. Only part of this information
is useful for a particular application; the rest is considered noise.
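A rough sketch of noise elimination: keep the text of a page only when it lies outside containers that usually hold navigation, advertisements, or legal notices. The set of noise tags is an assumption, and real sites generally need site-specific rules; the sample page and the class name `MainContentExtractor` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed noise containers; real sites need site-specific rules.
NOISE_TAGS = {"nav", "footer", "aside", "script", "style"}


class MainContentExtractor(HTMLParser):
    """Keep text that lies outside the assumed noise containers."""

    def __init__(self):
        super().__init__()
        self.kept = []
        self._noise_depth = 0   # > 0 while inside a noise container

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._noise_depth == 0:
            self.kept.append(text)


# Hypothetical page with navigation, an advertisement, and a footer.
noisy_page = """
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<article>
  <h1>Web usage mining</h1>
  <p>Clickstream analysis finds user patterns.</p>
</article>
<aside>Advertisement: enroll today!</aside>
<footer>Copyright notice. Privacy policy.</footer>
"""
cleaner = MainContentExtractor()
cleaner.feed(noisy_page)
cleaner.close()
main_text = cleaner.kept
```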



Contd..
• Diversity of information: multiple pages show similar
information in different words or formats because Web pages have diverse authors,
which makes integrating information from multiple pages a
challenging problem.

• Handling Big Data on the web is one of the most important challenges; solutions must scale
in terms of volume, variety, variability, and complexity.



Contd..
• Maintaining the security and privacy of web data is not an easy task. Advanced
cryptographic algorithms are required for optimal service on the web.

• Discovering and managing advanced hyperlink topology is another
mining issue on the web.



Web Mining Application Areas
• Web mining is an important tool for gathering knowledge about the behavior of
Website visitors, allowing appropriate adjustments and
decisions with respect to a Website's actual users and traffic patterns.

• Four major application areas for Web mining are:

– Website Design

– Web Traffic Handling

– e-Business and

– Web Personalization



Contd..
• Website Design:
– The content and structure of a Website are important to the user's
experience and impression of the site and to the site's usability. The problem is
that different types of users have different preferences, backgrounds,
knowledge, etc., making it difficult (if not impossible) to find a design that
is optimal for all users.

– Web usage mining can be used to detect which types of users are
accessing the website and how they behave. This knowledge can then be
used to redesign the website manually, or to change its
structure and content automatically based on the profile of the visiting user.



Contd..
• Web Traffic Handling:
– The performance and service of Websites can be improved by using
knowledge of the Web traffic to predict the navigation path of the
current user. This may be used for caching, load balancing, or data
distribution to improve performance. Path prediction can also be
used to detect fraud, break-ins, intrusions, etc.
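Navigation path prediction can be illustrated with a first-order Markov model: count how often each page is followed by each other page in past sessions, then predict the most frequent follower of the current page. The slides do not prescribe a model, so this choice is an assumption, as are the sample sessions and function names. A server could, for example, pre-load the predicted page into its cache.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is followed by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequently observed next page, or None if unseen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical past sessions taken from cleaned log data.
past_sessions = [
    ["/home", "/courses", "/fees"],
    ["/home", "/courses", "/contact"],
    ["/home", "/courses", "/fees"],
    ["/home", "/contact"],
]
model = build_transitions(past_sessions)
next_after_home = predict_next(model, "/home")
next_after_courses = predict_next(model, "/courses")
```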



Contd..
• Web Personalization:
– Based on Web mining techniques, websites are designed so that their look
and feel and their contents are personalized to the needs of an individual end
user.

– Web personalization or customization is an attractive application area for
Web-based companies, allowing recommendations, marketing
campaigns, etc. to be customized for different categories of
users and, more importantly, to do this automatically in real time as the
user accesses the Website.



Contd..
• e-Business:
– For Web-based companies, Web mining is a powerful tool for collecting
business intelligence and gaining a competitive
advantage in electronic business.

– Patterns in customers' activities on the Website can be used as
important knowledge in the decision-making process, e.g. for predicting
customers' future behavior, recruiting new customers, and developing new
products.



Contd..
• E-Learning and Digital Library:
– Web mining can be used to improve the performance of electronic
learning. Applications of web mining to e-learning are usually based on web
usage. Machine learning and web usage mining together improve web-based
learning.



Contd..
• Security and Crime Investigation:
– With the rapid growth in popularity of the Internet, crime information on the
web is becoming increasingly rampant, and most of it is in the
form of text.

– Because much crime information in documents is described through
events, event-based semantic technology can be used to study the patterns
and trends of web-oriented crimes.



Time series data mining
• Sequential data (or time series) refers to data that appear in a specific order.

– The order defines a time axis, which differentiates this data from the other cases
we have seen so far.

• Examples

– The price of a stock (or of many stocks) over time

– Environmental data (pressure, temperature, precipitation, etc.) over time

– The sequence of queries in a search engine, or the frequency of a query
over time

– The words in a document in the order they appear, etc.
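A simple operation that depends on the time axis is a moving average, which smooths a series by averaging each run of consecutive observations. The sketch below is only an illustration; the prices and the window size are made up.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values, in time order."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical daily closing prices of one stock, oldest first.
closing_prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
smoothed = moving_average(closing_prices, window=3)
```

Shuffling `closing_prices` would change `smoothed`, which is exactly what distinguishes sequential data from an unordered collection.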



Contd..
• Why deal with sequential data?
– Because all data is sequential

• All data items arrive in the data store in some order

– In some (many) cases the order does not matter

• E.g., we can assume a bag of words model for a document

– In many cases the order is of interest

• E.g., stock prices do not make sense without the time information.
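The difference between ignoring and keeping order can be shown in a few lines. Under a bag-of-words model the two hypothetical sentences below are identical, although as sequences they say opposite things.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts only; the position of each word is discarded."""
    return Counter(text.lower().split())


first = "the dog bit the man"
second = "the man bit the dog"

same_bag = bag_of_words(first) == bag_of_words(second)   # order ignored
same_sequence = first.split() == second.split()          # order kept
```

`same_bag` is true while `same_sequence` is false, so any analysis that must tell these sentences apart has to treat the words as sequential data.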



Contd..

Fig: General time series data mining framework



Homework
• What is web data mining? In what situations can web data mining techniques
be useful?

• What are the aims of web data mining?

• Explain the difference between the three types of web data mining.

• What data mining techniques can be used for log data analysis?

• What are time series data? Explain time series data mining.



Thank You !

