World Wide Web: The Number of Documents On The Indexed Web Is Now On The Order
useful insights from data. A wide variation exists in terms of the problem domains,
applications, formulations, and data representations that are encountered in real
applications. Therefore, “data mining” is a broad umbrella term that is used to describe
these different aspects of data processing.
In the modern age, virtually all automated systems generate some form of data, either
for diagnostic or analysis purposes. This has resulted in a deluge of data, which has
reached the order of petabytes or exabytes. Some examples of different kinds of data are
as follows:
• World Wide Web: The number of documents on the indexed Web is now on the order
of billions, and the invisible Web is much larger. User accesses to such documents
create Web access logs at servers and customer behavior profiles at commercial sites.
Furthermore, the linked structure of the Web is referred to as the Web graph, which
is itself a kind of data. These different types of data are useful in various applications.
For example, the Web documents and link structure can be mined to determine associations
between different topics on the Web. On the other hand, user access logs can
be mined to determine frequent patterns of accesses or unusual patterns of possibly
unwarranted behavior.
• Financial interactions: Most common transactions of everyday life, such as using an
automated teller machine (ATM) card or a credit card, can create data in an automated
way. Such transactions can be mined for many useful insights, such as detecting fraud or
other unusual activity.
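
As a rough illustration of mining user access logs for frequent patterns of accesses, as mentioned in the Web example above, the following minimal sketch counts pairs of pages that are visited by the same user. The log records, the page paths, the function name frequent_page_pairs, and the min_support threshold are all hypothetical and chosen only for illustration. Real access logs are far larger and noisier, and the pair counting shown here is only one simple way to approach the problem.

```python
from collections import Counter
from itertools import combinations

# Hypothetical access log of (user_id, page) records, invented for illustration.
ACCESS_LOG = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
    ("u3", "/home"), ("u3", "/products"), ("u3", "/cart"),
    ("u4", "/home"), ("u4", "/help"),
]


def frequent_page_pairs(log, min_support):
    """Return page pairs visited together by at least min_support users."""
    # Group the distinct pages visited by each user.
    pages_by_user = {}
    for user, page in log:
        pages_by_user.setdefault(user, set()).add(page)

    # Count each unordered pair of pages once per user.
    pair_counts = Counter()
    for pages in pages_by_user.values():
        for pair in combinations(sorted(pages), 2):
            pair_counts[pair] += 1

    # Keep only the pairs that meet the support threshold.
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


if __name__ == "__main__":
    for pair, count in sorted(frequent_page_pairs(ACCESS_LOG, 2).items()):
        print(pair, count)
```

Raising min_support keeps only the most common co-visited pages. Pairs with very low counts could instead be inspected as candidates for unusual behavior, although that would need much more care in practice.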