Unit 7
– Hyper-link information
– Web data:
• Web content data : Text, image, records, etc.
• Goal: analyze the behavioral patterns and profiles of users interacting with a
Web site.
– Provide more personalized content to visitors, and find the most effective
logical structure for the Web site.
• Knowledge discovery,
– This step will influence the quality and result of the pattern discovery and
analysis. Therefore, it needs to be done very carefully.
– At present, the most commonly used data mining methods are clustering,
classification, and association rule mining.
– Each method has its own strengths and shortcomings, but classification and
clustering are currently the most effective.
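To make association rule mining concrete, here is a minimal sketch that computes support and confidence for page pairs across visitor sessions. The session data, the `pair_rules` function, and the 0.5 support threshold are illustrative assumptions, not part of the course material.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages each visitor viewed.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
]

def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent page pairs."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        item_counts.update(pages)
        pair_counts.update(combinations(sorted(pages), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

rules = pair_rules(sessions)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as cart -> products with high confidence suggests that visitors who reach the cart page have almost always viewed the products page in the same session.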
• Content data corresponds to the collection of facts a Web page was designed
to convey to the users.
• It may consist of text, images, audio, video, or structured records such as lists
and tables, as shown in the figure below.
• Web structure mining lets a business link the information on its own Web site
to enable navigation and to cluster information into site maps.
• This lets users reach the desired information through keyword association
and content mining.
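As a rough illustration of how hyperlink structure can be turned into a site map, the sketch below walks a small link graph breadth-first and records each page's click depth from the home page. The page paths, the `links` dictionary, and the `site_map` function are hypothetical and not taken from any real site or library.

```python
from collections import deque

# Hypothetical hyperlink graph: page -> pages it links to.
links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/a", "/"],
    "/blog": ["/blog/post-1"],
    "/products/a": [],
    "/blog/post-1": ["/products/a"],
}

def site_map(links, root="/"):
    """Return each reachable page's click depth from the root (breadth-first)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

depths = site_map(links)
for page, level in depths.items():
    print("  " * level + page)
```

Pages at the same depth form one level of the site map; pages that never appear in the result are unreachable from the home page.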
• Handling Big Data on the Web is the most important challenge: solutions must
scale in terms of volume, variety, variability, and complexity.
– Website Design
– e-Business and
– Web Personalization
– Web usage mining can then be used to detect which types of users are
accessing the website and how they behave. This knowledge can be used to
design or redesign the website manually, or to change its structure and
content automatically based on the profile of the visiting user.
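A common first step in web usage mining is grouping raw log records into per-user sessions. The sketch below assumes each record is a hypothetical `(user, timestamp_in_seconds, url)` tuple and starts a new session after 30 minutes of inactivity; the timeout is a widely used heuristic, not something prescribed here.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed heuristic

# Hypothetical log records: (user, timestamp in seconds, requested URL).
log = [
    ("10.0.0.1", 0, "/home"),
    ("10.0.0.1", 120, "/products"),
    ("10.0.0.2", 200, "/home"),
    ("10.0.0.1", 4000, "/home"),
    ("10.0.0.2", 260, "/blog"),
]

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group records by user, splitting sessions on gaps longer than timeout."""
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))
    result = {}
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        user_sessions = []
        last_ts = None
        for ts, url in hits:
            if last_ts is None or ts - last_ts > timeout:
                user_sessions.append([])  # start a new session
            user_sessions[-1].append(url)
            last_ts = ts
        result[user] = user_sessions
    return result

user_sessions = sessionize(log)
print(user_sessions)
```

The resulting sessions can then be fed to clustering, classification, or association rule mining to profile visitors.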
– The order defines a time axis that differentiates this data from other cases
• Examples
over time
• E.g., stock prices do not make sense without the time information.
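The following sketch illustrates why order matters in time series data: a simple moving average over the same set of prices gives a different result once the order is changed. The price values and the `moving_average` helper are made up for illustration.

```python
def moving_average(values, window):
    """Return the simple moving average over consecutive windows."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical daily closing prices, in time order.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
# The same values with the time order destroyed.
shuffled = [14.0, 10.0, 12.0, 11.0, 13.0]

trend = moving_average(prices, 3)
no_trend = moving_average(shuffled, 3)
print(trend)     # rising trend is visible
print(no_trend)  # same values, but the trend is gone
```

Both lists contain identical values, so any order-insensitive summary (mean, minimum, maximum) is the same; only the time axis reveals the upward trend.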
• Explain the difference between the three types of web data mining.
• What data mining techniques can be used for log data analysis?
• What are time series data? Explain time series data mining.