
Main Challenges the Web Poses for Knowledge Discovery

The Web is vastly complex and not easily understood, which makes it difficult to search.
It is nearly impossible to capture the current state of the Web, so finding useful
information becomes a problem in itself. The speed at which new data is generated on the
Web means that it may not be possible to mine very deeply into it. The sheer size and
complexity of the Web also make it difficult for search engines to crawl it in its
entirety and return sensibly ranked (relevant) results. Trying to mine the full set of
available data in a single pass is not feasible because of its scale, the enormous number
of links, and other factors.
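
To make the scale concrete, here is a minimal, purely illustrative Python sketch (the figures are assumptions, not measurements): even with a modest assumed average number of outlinks per page, the set of pages a crawler would have to fetch and rank grows geometrically with link depth.

# Illustrative only: assumed average outlinks per page, not a measured value.
AVG_OUTLINKS = 50

pages_at_depth = 1   # start from a single seed page
total_pages = 1
for depth in range(1, 6):
    pages_at_depth *= AVG_OUTLINKS          # the crawl frontier expands at each hop
    total_pages += pages_at_depth
    print(f"depth {depth}: up to {pages_at_depth:,} new pages, "
          f"{total_pages:,} pages to crawl in total")

Even under these conservative assumptions, a crawl only five links deep already reaches hundreds of millions of pages, which is why search engines crawl selectively rather than exhaustively.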

The Web changes dynamically and rapidly, so no static index can keep up with new
content. Users make changes at a rapid rate, and it is therefore hard to maintain a
current index of what has already been published. Organizations might compile an
archive of some pages over time, but the problem with this approach is that as the
amount of content on those pages grows, so does the size of the archive.
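
As a rough back-of-the-envelope sketch (all numbers below are assumptions chosen only for illustration), archiving full snapshots of even a modest collection of pages shows how quickly such an archive grows:

# All figures are assumptions for illustration, not statistics about any real archive.
pages = 1_000_000          # pages being archived
avg_page_kb = 100          # assumed average page size in kilobytes
snapshots_per_year = 52    # assumed weekly snapshots of every page

archive_gb_per_year = pages * avg_page_kb * snapshots_per_year / 1_000_000  # KB -> GB
print(f"~{archive_gb_per_year:,.0f} GB of new archive data per year")

Storing only one million pages with weekly full snapshots already adds several terabytes per year, and the real Web is many orders of magnitude larger and changes far more often.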

More than 99% of Web pages have never been seen by human eyes or indexed, so there is
no efficient way to search for the information they contain, which makes human-driven
search even harder. Given the sheer size of the Web, even if a small fraction of its
pages could be indexed, that would still be incomparable to the total number of pages.
It is also hard to determine what users are actually looking for or need: there is no
feedback or negotiation over potentially relevant links that could help improve future
search results.
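
For contrast, the sketch below shows what such feedback could look like if it were available. It is a minimal illustration of Rocchio-style relevance feedback, a standard information-retrieval technique that the text above does not name; the vectors and weights are toy values.

import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Move the query vector toward documents the user marked relevant
    # and away from those marked non-relevant.
    new_q = alpha * np.asarray(query, dtype=float)
    if relevant:
        new_q += beta * np.mean(relevant, axis=0)
    if nonrelevant:
        new_q -= gamma * np.mean(nonrelevant, axis=0)
    return np.clip(new_q, 0.0, None)   # negative term weights are usually dropped

# Toy three-term vocabulary: one result judged relevant, one judged non-relevant.
q = [1.0, 0.0, 0.0]
print(rocchio(q, relevant=[[0.5, 0.8, 0.0]], nonrelevant=[[0.0, 0.0, 0.9]]))

With judgments like these, later queries could be reweighted toward what users actually found useful; without any such feedback loop, a search engine has to guess at user needs from the query alone.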
