
Web usage mining

The document discusses the three categories of web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining, each focusing on different aspects of knowledge discovery from web data. Web Content Mining extracts useful information from web page content, Web Structure Mining analyzes hyperlink structures, and Web Usage Mining predicts user behavior based on interaction patterns. The document also highlights challenges, techniques, advantages, and ethical concerns associated with web mining.

Uploaded by rohitlohar18116

Copyright © All Rights Reserved
Web Content vs Web Structure vs Web Usage Mining - Javatpoint
https://www.javatpoint.com/web-content-vs-web-structure-vs-web-usage-mining

Difference between Web Content, Web Structure, and Web Usage Mining

Web mining is the application of data mining techniques to extract knowledge from web data, including web documents, hyperlinks between documents, usage logs of websites, etc. Like classic data mining, web mining aims to discover and retrieve useful and interesting patterns from large data sets; in web mining, big data act as the data sets. Web data includes information, documents, structure, and profiles. Web mining is based on two concepts: process-based and data-driven. In general, web mining involves several steps, such as collecting data, selecting data before processing, discovering knowledge, and analyzing the results.

The internet has become a crucial part of our lives nowadays, so techniques that extract data on the web are an interesting area of research. These techniques extract knowledge from web data, where at least one of structure (hyperlink) or usage (web log) data is used in the mining process, with or without other types of web data. Web mining tasks can be classified into three categories:

1. Web content mining
2. Web structure mining
3. Web usage mining

All three categories focus on the knowledge discovery of implicit, previously unknown, and potentially useful information from the web. Each of them focuses on different mining objects of the web. Let's briefly study each of the three categories.

What is Web Content Mining?

Web Content Mining can be used for the mining of useful data, information, and knowledge from web page content.
Web content mining scans and mines the text, images, and groups of web pages according to the content of the input, for example when a search engine displays a list of results. It is quite different from data mining because web data are mainly semi-structured or unstructured, while data mining deals primarily with structured data. Web content mining also differs from text mining because of the semi-structured nature of the web, while text mining focuses on unstructured texts. Thus, web content mining requires creative applications of data mining and text mining techniques as well as its own unique approaches.

In the past few years, there has been a rapid expansion of activity in the web content mining area. This is not surprising given the phenomenal growth of web content and the significant economic benefit of such mining. However, due to the lack of structure of web data, automated discovery of knowledge still presents many challenging research problems. Web content mining can be divided into two approaches:

1. Agent-based Approach

This approach involves intelligent systems. It aims to improve information finding and filtering. It usually relies on autonomous agents that can identify relevant websites. It can be placed into the following three categories:

- Intelligent Search Agents: These agents search for relevant information using domain characteristics and user profiles to organize and interpret the discovered information.
- Information Filtering or Categorization: These agents use information retrieval techniques and the characteristics of open hypertext web documents to automatically retrieve, filter, and categorize them.
- Personalized Web Agents: These agents learn user preferences and discover web information based on the preferences of other users with similar interests.
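To make the Information Filtering or Categorization agent above more concrete, here is a minimal Python sketch of content-based filtering. It assumes TF-IDF term weighting with cosine similarity against a plain-text user profile, which is one common information retrieval technique and not necessarily what any particular agent uses; the function names and the 0.1 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_pages(pages: dict[str, str], profile: str,
                 threshold: float = 0.1) -> list[tuple[str, float]]:
    """Keep pages whose TF-IDF vector is close to the user-profile vector."""
    names = list(pages)
    token_lists = [tokenize(pages[name]) for name in names]
    n = len(names)
    # Document frequency of each term across the page collection.
    df = Counter(term for tokens in token_lists for term in set(tokens))

    def idf(term: str) -> float:
        # Smoothed IDF so that terms found in every page keep a non-zero weight.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def vectorize(tokens: list[str]) -> dict[str, float]:
        return {t: count * idf(t) for t, count in Counter(tokens).items()}

    profile_vec = vectorize(tokenize(profile))
    scored = [(name, cosine(vectorize(tokens), profile_vec))
              for name, tokens in zip(names, token_lists)]
    kept = [(name, score) for name, score in scored if score >= threshold]
    # Most relevant pages first.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

For example, with the profile "web mining logs", a page about server logs and a page about association rules from web logs are both kept, while an unrelated recipe page scores 0 and is dropped. A real agent would also need stemming, stop-word removal, and a profile learned from user behavior.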
2. Data-based Approach

The data-based approach is used to organize the semi-structured data present on the internet into structured data. It aims to model web data in a more structured form so that standard database querying mechanisms and data mining applications can be used to analyze it.

Web Content Mining Challenges

Web content mining has the following problems or challenges:

- Data Extraction: Extraction of structured data, such as products and search results, from web pages. Extracting such data allows one to provide services. Two main types of techniques, machine learning and automatic extraction, are used to solve this problem.
- Web Information Integration and Schema Matching: Although the web contains a huge amount of data, each website (or even page) represents similar information differently. Identifying or matching semantically similar data is an important problem with many practical applications.
- Opinion extraction from online sources: There are many online opinion sources, e.g., customer reviews of products, forums, blogs, and chat rooms. Mining opinions is of great importance for marketing intelligence and product benchmarking.
- Knowledge synthesis: Concept hierarchies or ontologies are useful in many applications. However, generating them manually is very time-consuming. The main application is to synthesize and organize the pieces of information on the web to give the user a coherent picture of the topic domain. Some existing methods exploit the web's information redundancy for this purpose.
- Segmenting Web pages and detecting noise: In many web applications, one only wants the main content of a web page without advertisements, navigation links, or copyright notices. Automatically segmenting web pages to extract their main content is an interesting problem.

What is Web Structure Mining?
The challenge for web structure mining is to deal with the structure of the hyperlinks within the web itself. Link analysis is an old area of research. However, with the growing interest in web mining, research on structure analysis has increased. These efforts resulted in a newly emerging research area called Link Mining, which is located at the intersection of work in link analysis, hypertext and web mining, relational learning and inductive logic programming, and graph mining.

Web structure mining uses graph theory to analyze the node and connection structure of a website. According to the type of web structural data, web structure mining can be divided into two kinds:

- Extracting patterns from hyperlinks in the web: a hyperlink is a structural component that connects a web page to a different location.
- Mining the document structure: analysis of the tree-like structure of pages to describe HTML or XML tag usage.

The web contains a variety of objects with almost no unifying structure, with differences in authoring style and content much greater than in traditional collections of text documents. The objects in the WWW are web pages, and the links are in-links, out-links, and co-citations (two pages linked to by the same page). Attributes include HTML tags, word appearances, and anchor texts. Web structure mining uses the following terminology:

- Web graph: a directed graph representing the web.
- Node: a web page in the graph.
- Edge: a hyperlink.
- In-degree: the number of links pointing to a particular node.
- Out-degree: the number of links generated from a particular node.

An example of a web structure mining technique is the PageRank algorithm used by Google to rank search results. A page's rank is decided by the number and quality of the links pointing to it.

Link mining has changed how some traditional data mining tasks are approached.
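The web graph terms above (nodes, edges, in-degree, out-degree) and the PageRank idea can be illustrated with a short Python sketch. It models the web as an adjacency list of page names and assumes the classic textbook PageRank formulation with a damping factor of 0.85 and a fixed number of power iterations; this is not Google's production ranking, and the function names are hypothetical.

```python
def degrees(graph: dict[str, list[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (in-degree, out-degree) for every node of a directed web graph."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    out_degree = {node: len(graph.get(node, [])) for node in nodes}
    in_degree = {node: 0 for node in nodes}
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    return in_degree, out_degree


def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Basic PageRank by power iteration; the ranks sum to 1."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank
```

On the graph {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}, page C has in-degree 3 and receives the highest rank, while D, which no page links to, receives only the random-jump share.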
Below we summarize some link mining tasks that are applicable in web structure mining:

1. Link-based Classification: An upgrade of a classic data mining task to linked domains. The task is to predict the category of a web page based on the words that occur on the page, links between pages, anchor text, HTML tags, and other possible attributes found on the web page.
2. Link-based Cluster Analysis: The data is segmented into groups, where similar objects are grouped together and dissimilar objects are placed in different groups. Unlike the previous task, link-based cluster analysis is unsupervised and can be used to discover hidden patterns from data.
3. Link Type: There is a wide range of tasks concerning the prediction of links, such as predicting the type of link between two entities or predicting the purpose of a link.
4. Link Strength: Links could be associated with weights.
5. Link Cardinality: The main task is to predict the number of links between objects.

Some uses of web structure mining include:

- Page categorization.
- Finding related pages.
- Finding duplicated websites and measuring the similarity between them.

What is Web Usage Mining?

Web Usage Mining focuses on techniques that could predict the behavior of users while they are interacting with the WWW. Web usage mining discovers user navigation patterns from web data: it tries to discover useful information from the secondary data derived from users' interactions while surfing the web. It collects data from web log records to discover the access patterns of web pages. Several research projects and commercial tools analyze those patterns for different purposes. The resulting knowledge can be used in personalization, system improvement, site modification, business intelligence, and usage characterization.

The only information left behind by many users visiting a website is the path through the pages they have accessed.
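As a rough illustration of how web log records become user sessions, the Python sketch below parses lines in the Apache/NCSA Common Log Format and groups successful requests by client address. The log format, the use of the IP address as a stand-in for the user, and the 30-minute inactivity timeout are all assumptions (the timeout is a common heuristic, not something the text specifies); the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line: str):
    """Return (host, timestamp, path, status), or None for unparseable lines."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return match["host"], ts, match["path"], int(match["status"])


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group successful page requests into per-host sessions split on inactivity."""
    by_host: dict[str, list] = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        host, ts, path, status = parsed
        if status != 200:  # keep only successful requests
            continue
        by_host.setdefault(host, []).append((ts, path))

    sessions = []
    for host, hits in by_host.items():
        hits.sort()  # chronological order per host
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:  # long gap: close the session
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions
```

Two requests from the same host a few minutes apart fall into one session, and a request from that host an hour later starts a new one; failed requests and unparseable lines are skipped.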
Most web information retrieval tools use only the textual information and ignore the link information, which could be very valuable. There are mainly four kinds of data mining techniques applied in web usage mining to discover user navigation patterns:

1. Association Rule Mining

Association rule mining is the most basic data mining method and is used more than other methods in web usage mining. It enables a website to organize its content more efficiently or to provide recommendations for effective cross-selling of products.

These rules are statements of the form X => Y, where X and Y are sets of items available in a series of transactions. The rule X => Y states that transactions that contain the items in X may also include the items in Y. In web usage mining, association rules are used to find relationships between pages that frequently appear next to one another in user sessions.

2. Sequential Patterns

Sequential patterns are used to discover subsequences in a large volume of sequential data. In web usage mining, sequential patterns are used to find navigation patterns that frequently appear in sessions. Sequential patterns resemble association rules, but they include time: the order in which events occurred is defined in sequential patterns. Algorithms for extracting association rules can also be used to generate sequential patterns. Two types of algorithms are used for mining sequential patterns:

- The first type of algorithm is based on association rule mining. Many common sequential pattern mining algorithms are adaptations of algorithms for mining association rules. For example, GSP and AprioriAll are two variants of the Apriori algorithm, which is used to extract association rules.
  However, some researchers believe that association rule mining algorithms do not perform well enough when mining long sequential patterns.
- The second type of sequential pattern mining algorithm uses tree structures and Markov chains to represent navigation patterns. For example, one of these algorithms, called WAP-mine, uses a tree structure called WAP-tree to explore web access patterns. Evaluation results show that it performs better than algorithms such as GSP.

3. Clustering

Clustering techniques identify groups of similar items among high volumes of data. This is done with distance functions, which measure the degree of similarity between items. In web usage mining, clustering is used to group similar sessions. What matters in this type of analysis is the contrast between individual users and groups. Two types of clustering are of interest in this area: user clustering and page clustering. Clustering of user records is commonly used in web mining and web analytics tasks, and the knowledge derived from clustering is used to segment the market in e-commerce. Different methods and techniques are used for clustering, including:

- Using the similarity graph and the amount of time spent viewing a page to estimate the similarity of sessions.
- Using genetic algorithms and user feedback.
- Clustering matrices.
- The K-means algorithm, which is the most classic clustering method.

In other clustering methods, repetitive patterns are first extracted from users' sessions. These patterns are then used to construct a graph whose nodes are the visited pages. The edges of the graph connect two or more pages: if these pages appear in an extracted pattern, a weight is assigned to the edge to show the relationship between the nodes.
Then, for clustering, this graph is recursively divided until user behavior groups are detected.

4. Classification Mining

Discovering classification rules allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can classify new data items added to the database. In web mining, classification techniques allow one to develop a profile of clients who access particular server files, based on demographic information available about those clients or on their navigation patterns.

Advantages

Web usage mining has many advantages, making this technology attractive to corporations and government agencies:

- This technology has enabled e-commerce to do personalized marketing, resulting in higher trade volumes.
- Government agencies are using this technology to classify threats and fight against terrorism.
- Companies can establish better customer relationships by understanding customers' needs better and reacting to them faster. They can increase profitability by target pricing based on the profiles created. They can even find customers who might defect to a competitor and try to retain them by providing promotional offers to those specific customers, thus reducing the risk of losing a customer or customers.
- More benefits of web usage mining, particularly personalization, are outlined in specific frameworks like the probabilistic latent semantic analysis model, which offers additional features for modeling user behavior and access patterns. This is because the process provides the user with more relevant content through collaborative recommendations.
- There are also elements unique to web usage mining that show the technology's benefits.
  These include the way semantic knowledge is applied when interpreting, analyzing, and reasoning about usage patterns during the mining phase.

Disadvantages

Web usage mining by itself does not create issues, but when it is used on data of a personal nature, this technology might cause concerns.

- The most criticized ethical issue involving web usage mining is the invasion of privacy. Privacy is considered lost when information concerning an individual is obtained, used, or disseminated, especially if this occurs without the individual's knowledge or consent. The obtained data is analyzed, made anonymous, and then clustered to form anonymous profiles.
- These applications de-individualize users by judging them by their mouse clicks rather than by identifying information. De-individualization, in general, can be defined as a tendency to judge and treat people based on group characteristics instead of on their own characteristics and merits.
- Companies collecting data for a specific purpose might use it for totally different purposes, violating the user's interests.

Web Usage Mining Applications

The main objective of web usage mining is to collect information about users' navigation patterns. This information can be used to improve websites. The main applications of this mining are:

1. Personalization of web content

Web usage mining techniques can be used to personalize the web for its users. For example, a user's behavior can be predicted immediately by comparing their current navigation patterns with those extracted from the log files. Recommendation systems are a real-world application in this area: they suggest links that direct users to their favorite pages. Some sites also organize their product catalogs based on the predicted interests of a specific user and present them accordingly.

2. Prefetching

The results of web usage mining can be used to improve the performance of web servers and web-based applications. Web usage mining can be used to build prefetching and caching strategies and thus reduce the response time of web servers.

3. Improvement of website design

Usability is one of the most important issues in designing and implementing websites. The results of web usage mining can help in designing websites appropriately. Adaptive websites are an application of this type of mining: in these sites, content and structure are dynamically reorganized based on data derived from user behavior.

Difference between Web Content, Web Structure, and Web Usage Mining

Here are the differences between web content, web structure, and web usage mining:

| Parameter | Web Content Mining | Web Structure Mining | Web Usage Mining |
|---|---|---|---|
| View of data | Unstructured; Semi-structured; Structured; Website as DB | Link structure | Interactivity |
| Main data | Text documents; Hypertext documents | Link structure | Server logs; Browser logs |
| Method | Machine learning; Statistical (including NLP); Proprietary algorithm; Association rules | Proprietary algorithm | Machine learning; Statistical; Association rules |
| Representation | Bag of words, n-gram terms; Phrases, concepts, or ontology; Relational; Edge-labeled graph | Graph | Relational table; Graph |
| Application categories | Categorization; Clustering; Finding extraction rules; Finding patterns in text; Finding frequent substructures; Website schema discovery | Categorization; Clustering | Site construction, adaptation, and management |
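Association rules are listed above as a method of web usage mining. The Python sketch below shows the X => Y form from the Association Rule Mining section, restricted to rules between single pages mined from user sessions with support and confidence thresholds. The thresholds and the function name are hypothetical; Apriori-style algorithms generalize the same counting to larger page sets by growing candidate sets level by level.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions: list[list[str]], min_support: float = 0.5,
               min_confidence: float = 0.6) -> list[tuple[str, str, float, float]]:
    """Mine rules X => Y between single pages from a list of user sessions.

    support(X => Y)    = fraction of sessions containing both X and Y
    confidence(X => Y) = sessions containing X and Y / sessions containing X
    """
    n = len(sessions)
    page_sets = [set(session) for session in sessions]  # order and repeats ignored
    page_counts = Counter(page for pages in page_sets for page in pages)
    pair_counts = Counter(pair for pages in page_sets
                          for pair in combinations(sorted(pages), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Each frequent pair yields up to two rules: a => b and b => a.
        for x, y in ((a, b), (b, a)):
            confidence = together / page_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)
```

With the four sessions /home /products /cart, /home /products, /home /about, and /products /cart, and a minimum confidence of 0.7, only /cart => /products survives: every session that reaches the cart also views products, while only two of the three sessions with /products reach the cart.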
