Unit I
Module: I
detection.
Web Mining:
Aspect       | Data Mining                                                      | Web Mining
-------------|------------------------------------------------------------------|------------------------------------------------------------------
Definition   | Data Mining is the process that attempts to discover patterns and hidden knowledge in large data sets in any system. | Web Mining is the process of applying data mining techniques to automatically discover and extract information from web documents.
Application  | Data Mining is very useful for web page analysis.                | Web Mining is very useful for a particular website and e-service.
Target users | Data scientists and data engineers.                              | Data scientists along with data analysts.
Access       | Data Mining accesses data privately.                             | Web Mining accesses data publicly.
Tools        | Tools such as machine learning algorithms.                       | Special tools such as Scrapy (a web-crawling framework), the PageRank algorithm, and Apache server logs.
Skills       | Data cleansing, machine learning algorithms, statistics and probability. | Application-level knowledge and data engineering, plus mathematical skills such as statistics and probability.
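The Tools row lists PageRank, which ranks web pages by their link structure rather than by their content. The sketch below is a minimal power-iteration version, offered only as an illustration under these assumptions: the link graph is a plain dict mapping each page to the pages it links to, the damping factor is 0.85 (the value commonly quoted for PageRank), and the tolerance and iteration cap are arbitrary choices. The function name `pagerank` and the sample graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a link graph {page: [pages it links to]}."""
    # Collect every page, including ones that only appear as link targets.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Random-jump share: with probability 1 - damping the surfer jumps anywhere.
        new = {p: (1.0 - damping) / n for p in pages}
        # Pages with no out-links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        # Each page splits its rank equally among the pages it links to.
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the scores have stabilised
            break
    return rank


if __name__ == "__main__":
    # Hypothetical three-page site: A links to B and C, B to C, C back to A.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this small graph C ends up ranked highest because it receives links from both A and B, while B, which is linked only from A (and shares that link with C), ranks lowest. The scores always sum to 1, so they can be read as the long-run share of visits by a random surfer.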
Basic Concepts of Information Retrieval:
• Information retrieval (IR) is the study of helping users to find information
that matches their information needs.
• Technically, IR studies the acquisition, organization, storage, retrieval,
and distribution of information.
• Historically, IR is about document retrieval, emphasizing the document as
the basic unit.
General architecture of an IR system:
• A user with an information need issues a query (the user query) to the
retrieval system through the query operations module.
• The retrieval module uses the document index to retrieve the documents
that contain some of the query terms (such documents are likely to be
relevant to the query), computes relevance scores for them, and then ranks
the retrieved documents according to those scores.
• The ranked documents are then presented to the user.
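The retrieval flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular IR system: the query operations module is reduced to lowercasing and word splitting, the document index is an in-memory inverted index (term to per-document term frequency), and relevance is scored with a basic TF-IDF weighting, which is only one common choice among many. The names `tokenize`, `build_index`, `search`, and the sample `DOCS` collection are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sample collection (document id -> text).
DOCS = {
    "d1": "web mining discovers patterns in web documents",
    "d2": "data mining finds hidden patterns in large data sets",
    "d3": "information retrieval ranks documents for a user query",
}


def tokenize(text):
    """Stand-in for the query operations module: lowercase, keep alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Retrieve docs containing any query term, score them with TF-IDF, rank by score."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})  # documents that contain this query term
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # assumed weighting: log(N / df)
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = build_index(DOCS)
    # The ranked list is what would be "presented to the user".
    for doc_id, score in search("web documents", index, len(DOCS)):
        print(f"{doc_id}\t{score:.3f}")
```

For the query "web documents", only d1 and d3 contain a query term, so only they are retrieved; d1 ranks first because it contains "web" (a term found in no other document) twice as well as "documents".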