Web Assignment1

Web Mining Techniques Comparison

Assignment No.1
Q1. Differentiate between the following in tabular form:
1. Jaccard vs Euclidean vs Cosine
2. Web content vs Web structure vs Web usage mining
3. Organic vs Direct traffic
Q2. Explain the types of visitors.
Q3. What do you understand by benchmarking? Explain the steps involved in this.
Q4. Explain the history of web mining.
Explain each question in detail

Certainly, here is a detailed breakdown of your assignment questions:

Q1. Differentiate between the following in tabular form:

1. Jaccard vs Euclidean vs Cosine

| Aspect | Jaccard Similarity | Euclidean Distance | Cosine Similarity |
|---|---|---|---|
| Definition | Measures similarity between sets by dividing the size of their intersection by the size of their union. | Measures the straight-line distance between two points in n-dimensional space. | Measures the cosine of the angle between two vectors in an n-dimensional space. |
| Formula | \( \text{Jaccard}(A, B) = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert} \) | \( d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2} \) | \( \cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \) |
| Range | 0 to 1 (0 = no similarity, 1 = identical) | 0 to ∞ (lower value indicates higher similarity) | -1 to 1 (-1 = opposite, 0 = orthogonal, 1 = same direction) |
| Data Type | Set-based data | Continuous data or numerical vectors | Numerical vectors |
| Usage | Common in document and text comparison, clustering | Used in geometric, spatial, or high-dimensional data comparisons | Used in NLP, recommendation systems, and text mining |
| Normalization | Not needed | Not normalized, highly scale-dependent | Invariant to vector magnitudes (normalized by nature) |
2. Web Content vs Web Structure vs Web Usage Mining

| Aspect | Web Content Mining | Web Structure Mining | Web Usage Mining |
|---|---|---|---|
| Definition | Involves extracting useful information from the contents of web pages. | Examines the link structure of the web to understand relationships. | Analyzes user behavior and patterns from web data (e.g., logs). |
| Data Type | Text, multimedia (images, audio, video) | Hyperlink structure (internal and external links) | User logs, clickstreams, server data |
| Goal | To retrieve information from page contents | To discover link patterns and structural connections | To understand user behavior for personalization |
| Applications | Content recommendations, sentiment analysis | Page ranking algorithms, community detection | Targeted marketing, session tracking, user profiling |
| Examples | Search engines, sentiment analysis | Google's PageRank algorithm, HITS algorithm | Amazon’s recommendation system, website analytics |
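
To make the web structure mining column more concrete, the following is a simplified PageRank-style power iteration in Python. It is a teaching sketch, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from link structure alone.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: D links to C, but nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects links from three pages and ends up with the highest score, while D, which nothing links to, ends up with the lowest.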

3. Organic vs Direct Traffic

| Aspect | Organic Traffic | Direct Traffic |
|---|---|---|
| Definition | Visitors who come from unpaid search engine results. | Visitors who come to a site directly by typing the URL or from bookmarks. |
| Source | Search engines like Google, Bing | Browser URL bar, saved bookmarks, email links |
| Cost | Typically free, through SEO efforts | Free, depends on brand recognition and user habits |
| Purpose | To acquire new users searching for relevant content | Typically indicates user familiarity or loyalty |
| SEO Impact | Heavily influenced by SEO efforts | Unaffected by SEO, relies on brand or domain recall |
| Examples | A user finds a website via Google search | A user types the website URL or clicks a bookmark |
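
One common way to separate these traffic types in practice is to look at the HTTP referrer of each visit. The Python sketch below assumes that an empty referrer means direct traffic and that a referrer whose host contains a known search engine name means organic traffic. The `SEARCH_ENGINE_NAMES` set and the `classify_visit` function are hypothetical names for this example, and the sketch does not try to distinguish paid search clicks from unpaid ones.

```python
from urllib.parse import urlparse

# Illustrative, incomplete list of search engine host labels (assumption).
SEARCH_ENGINE_NAMES = {"google", "bing", "duckduckgo", "yahoo"}


def classify_visit(referrer: str) -> str:
    """Label a visit as 'direct', 'organic', or 'referral' from its referrer."""
    if not referrer:
        # No referrer: typed URL, bookmark, or a link that sent no referrer.
        return "direct"
    host = urlparse(referrer).netloc.lower()
    labels = set(host.split("."))
    if labels & SEARCH_ENGINE_NAMES:
        return "organic"
    return "referral"


print(classify_visit(""))
print(classify_visit("https://www.google.com/search?q=web+mining"))
print(classify_visit("https://news.example.org/some-article"))
```

Real analytics tools apply more rules than this (for example, campaign tags on the landing URL), so the labels here are only a first approximation.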

Q2. Explain the types of visitors.

1. New Visitors: Users visiting a website for the first time, generally found through organic search,
paid ads, or referrals. Identifying and attracting new visitors is essential for audience growth.

2. Returning Visitors: Users who have visited the website before and are revisiting it. This indicates
interest and engagement with the site, often resulting in higher conversion rates.

3. Unique Visitors: A distinct count of individuals who visit a website within a specific period,
regardless of the number of visits they make. Unique visitor metrics help measure the true reach
of a website.

4. Repeat Visitors: Similar to returning visitors, but often measured in a more short-term context,
tracking users who frequently visit within a specific time frame. High repeat visits may indicate
strong user engagement.

5. Loyal Visitors: Users who visit a website frequently and engage regularly with its content. They
are often the most valuable segment, as they are more likely to convert and advocate for the
brand.
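
The visitor types above can be counted from a simple visit log. The following Python sketch assumes each log entry is a (visitor ID, date) pair, treats a visitor as "returning" if they were seen before the reporting period, and treats a visitor as "loyal" if they made at least three visits within the period. Those definitions, the threshold, and the sample log are assumptions for illustration; analytics tools define these segments in different ways.

```python
from collections import Counter
from datetime import date


def summarize_visitors(visits, period_start, period_end, loyal_min_visits=3):
    """Count unique, new, returning, and loyal visitors in a period."""
    # Visitors already seen before the period started.
    seen_before = {vid for vid, day in visits if day < period_start}
    # Visits per visitor inside the period.
    counts = Counter(
        vid for vid, day in visits if period_start <= day <= period_end
    )
    unique = set(counts)
    return {
        "unique": len(unique),
        "new": len(unique - seen_before),
        "returning": len(unique & seen_before),
        "loyal": sum(1 for c in counts.values() if c >= loyal_min_visits),
    }


# Hypothetical visit log.
visits = [
    ("u1", date(2024, 1, 5)),
    ("u1", date(2024, 2, 1)),
    ("u1", date(2024, 2, 3)),
    ("u1", date(2024, 2, 9)),
    ("u2", date(2024, 2, 2)),
    ("u3", date(2024, 1, 20)),
    ("u3", date(2024, 2, 10)),
]
summary = summarize_visitors(visits, date(2024, 2, 1), date(2024, 2, 29))
print(summary)
```

Note that "unique" counts each person once no matter how many visits they made, which is why it is smaller than the number of log entries in the period.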

Q3. What do you understand by benchmarking? Explain the steps involved in this.

Benchmarking is a systematic process of comparing a company's performance, products, or
processes with those of industry leaders or competitors to identify areas for improvement. The goal is
to adopt best practices and drive enhancements in efficiency, quality, and productivity.

Steps in Benchmarking:

1. Identify Objectives and Scope: Determine the specific area, metric, or process to benchmark
(e.g., customer satisfaction, product quality).

2. Select Benchmarking Partners: Identify organizations, either competitors or industry leaders,
that excel in the area being benchmarked. Often, non-competing companies in different sectors
are chosen to gain a broad perspective.

3. Data Collection: Gather data through surveys, interviews, or public resources to understand the
partner’s processes, standards, and performance levels.

4. Analyze Data: Compare the collected data with your own metrics to identify gaps, patterns, or
areas where improvements can be made.

5. Develop Action Plans: Based on insights, create actionable steps to bridge performance gaps.
This may involve adopting new practices, optimizing resources, or setting new targets.

6. Implement and Monitor: Put the action plan into practice and monitor its effectiveness over
time. Continuously track the results to ensure sustained improvement.

7. Review and Re-benchmark: Periodically reassess performance against updated benchmarks to
maintain competitive standards.
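
The "Analyze Data" step can be sketched as a simple gap calculation. The Python example below compares a site's own metrics with benchmark values and flags the metrics where the site is behind. The metric names and all numbers are invented for illustration, and the `higher_is_better` flags are an assumption needed because some metrics (such as load time) improve as they decrease.

```python
def benchmark_gaps(own, benchmark, higher_is_better):
    """Return a signed gap per metric; negative means behind the benchmark."""
    gaps = {}
    for metric, target in benchmark.items():
        diff = own[metric] - target
        if not higher_is_better[metric]:
            diff = -diff  # for "lower is better" metrics, flip the sign
        gaps[metric] = diff
    return gaps


# Hypothetical figures, not real measurements.
own = {"conversion_rate_pct": 2.0, "page_load_s": 3.5, "bounce_rate_pct": 40.0}
benchmark = {"conversion_rate_pct": 3.0, "page_load_s": 2.0, "bounce_rate_pct": 45.0}
higher_is_better = {
    "conversion_rate_pct": True,
    "page_load_s": False,
    "bounce_rate_pct": False,
}

gaps = benchmark_gaps(own, benchmark, higher_is_better)
behind = sorted(m for m, g in gaps.items() if g < 0)
print(gaps)
print("Needs an action plan:", behind)
```

The metrics listed as behind are the natural inputs to the "Develop Action Plans" step, and rerunning the same comparison later corresponds to the "Review and Re-benchmark" step.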

Q4. Explain the history of web mining.

Web mining is the process of discovering patterns, trends, and insights from data collected from the
World Wide Web. It emerged as the web expanded, creating vast amounts of data requiring advanced
tools and techniques for analysis.

Timeline of Web Mining Development:

1990s: With the rise of the internet, data collection from web logs began, giving early insights
into web usage patterns. Search engines like Lycos and Yahoo emerged, using basic information
retrieval techniques.

Late 1990s: Researchers identified three main types of web mining: Web Content Mining, Web
Structure Mining, and Web Usage Mining. Early algorithms for search ranking, such as PageRank,
were developed to improve search relevance.

Early 2000s: Search engines like Google utilized complex algorithms, while commercial tools for
web analytics (e.g., Omniture) grew. Personalized content delivery and recommendation
systems, like Amazon’s “Customers who bought this also bought,” were developed.

2010s: Social media platforms like Facebook and Twitter generated new forms of web data.
Sentiment analysis, NLP, and deep learning became integral to understanding web content,
while companies focused on enhancing personalization.

Present Day: Advances in AI and machine learning are integral to web mining, with
sophisticated models analyzing multimedia data, complex user interactions, and context.
Applications in e-commerce, security, and ad-tech leverage these developments for enhanced
user experience, security, and personalization.

Web mining continues to evolve, with a growing emphasis on ethical data usage and privacy.