
Main Challenges the Web Poses for Knowledge Discovery

The Web is vastly complex and not easily understood, which makes it difficult to search.
It is nearly impossible to capture the current state of the Web, so finding useful
information becomes a problem in itself. The speed at which new data is generated on the
Web means that it may not be possible to mine very deeply into it. The sheer size and
complexity of the Web also make it difficult for search engines to crawl it in its
entirety and return sensibly ranked (relevant) results. Trying to mine the full set of
available data in a single pass is not feasible because of its scale, the enormous number
of links, and other factors.
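
To make the scale concrete, here is a minimal, purely illustrative Python sketch (the figures are assumptions, not measurements): even with a modest assumed average number of outlinks per page, the set of pages a crawler would have to fetch and rank grows geometrically with link depth.

# Illustrative only: assumed average outlinks per page, not a measured value.
AVG_OUTLINKS = 50

pages_at_depth = 1   # start from a single seed page
total_pages = 1
for depth in range(1, 6):
    pages_at_depth *= AVG_OUTLINKS          # the crawl frontier expands at each hop
    total_pages += pages_at_depth
    print(f"depth {depth}: up to {pages_at_depth:,} new pages, "
          f"{total_pages:,} pages to crawl in total")

Even under these conservative assumptions, a crawl only five links deep already reaches hundreds of millions of pages, which is why search engines crawl selectively rather than exhaustively.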

The Web changes dynamically and rapidly, so no static index can keep up with new
content. Users make changes at a rapid rate, and it is therefore hard to maintain a
current index of what has already been published. Organizations might compile an
archive of some pages over time, but the problem with this approach is that as the
amount of content on those pages grows, so does the size of the archive.
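
As a rough back-of-the-envelope sketch (all numbers below are assumptions chosen only for illustration), archiving full snapshots of even a modest collection of pages shows how quickly such an archive grows:

# All figures are assumptions for illustration, not statistics about any real archive.
pages = 1_000_000          # pages being archived
avg_page_kb = 100          # assumed average page size in kilobytes
snapshots_per_year = 52    # assumed weekly snapshots of every page

archive_gb_per_year = pages * avg_page_kb * snapshots_per_year / 1_000_000  # KB -> GB
print(f"~{archive_gb_per_year:,.0f} GB of new archive data per year")

Storing only one million pages with weekly full snapshots already adds several terabytes per year, and the real Web is many orders of magnitude larger and changes far more often.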

More than 99% of Web pages have never been seen by human eyes or indexed, so there is
no efficient way to search for the information they contain, which makes human-driven
search even harder. Given the sheer size of the Web, even if a small fraction of its
pages could be indexed, that would still be incomparable to the total number of pages.
It is also hard to determine what users are actually looking for or need: there is no
feedback or negotiation over potentially relevant links that could help improve future
search results.
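
For contrast, the sketch below shows what such feedback could look like if it were available. It is a minimal illustration of Rocchio-style relevance feedback, a standard information-retrieval technique that the text above does not name; the vectors and weights are toy values.

import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Move the query vector toward documents the user marked relevant
    # and away from those marked non-relevant.
    new_q = alpha * np.asarray(query, dtype=float)
    if relevant:
        new_q += beta * np.mean(relevant, axis=0)
    if nonrelevant:
        new_q -= gamma * np.mean(nonrelevant, axis=0)
    return np.clip(new_q, 0.0, None)   # negative term weights are usually dropped

# Toy three-term vocabulary: one result judged relevant, one judged non-relevant.
q = [1.0, 0.0, 0.0]
print(rocchio(q, relevant=[[0.5, 0.8, 0.0]], nonrelevant=[[0.0, 0.0, 0.9]]))

With judgments like these, later queries could be reweighted toward what users actually found useful; without any such feedback loop, a search engine has to guess at user needs from the query alone.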
