CSC425 Data Mining

The document discusses various techniques for web usage mining, including association rules, clustering, and classification, which analyze user behavior on the web. It also explains the concept of a transfer function in neural networks, providing examples such as linear, sigmoid, and ReLU functions. Additionally, it differentiates between supervised and unsupervised learning, describes information retrieval in text mining, and highlights pitfalls in data mining.

Uploaded by

filee010

2a. Web usage mining focuses on analyzing user behavior on the web.

List and explain briefly three (3) techniques that can be used to achieve
this.

Techniques for Web Usage Mining:

1. Association Rules – This technique discovers relationships between web pages visited
together frequently. It helps in identifying patterns in user navigation.
2. Clustering – Users with similar browsing behavior are grouped together to analyze trends. It
helps in personalization and recommendation systems.
3. Classification – This technique involves categorizing user behavior into predefined groups
based on their web activities. It helps in predicting future user actions.
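
As a rough illustration of the first technique, the sketch below computes the support and confidence of a page pair over a handful of browsing sessions. The session data, page paths, and function names are hypothetical and chosen only for the example; real web usage mining would start from server logs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each is the set of pages one user visited.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/blog"},
    {"/products", "/cart"},
]

# Count how often each page and each pair of pages occurs across sessions.
page_counts = Counter(page for s in sessions for page in s)
pair_counts = Counter(
    pair for s in sessions for pair in combinations(sorted(s), 2)
)

def support(a, b):
    """Fraction of all sessions that contain both pages."""
    return pair_counts[tuple(sorted((a, b)))] / len(sessions)

def confidence(a, b):
    """Of the sessions containing page a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / page_counts[a]

print(support("/products", "/cart"))     # 0.5
print(confidence("/cart", "/products"))  # 1.0
```

A rule such as "/cart implies /products" with high confidence is the kind of navigation pattern association rule mining is meant to surface.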

2b(i). What is “Transfer Function” in a Neural Network model?

A transfer function (also called an activation function) in a neural network is a
mathematical function applied at each neuron to convert the weighted sum of its inputs
into the neuron's output. It defines the activation level of the neuron.
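
In symbols, a common way to write this is shown below. The notation is an assumption made for illustration and is not taken from the course manual: x_i are the inputs to the neuron, w_i the corresponding weights, b a bias term, f the transfer function, and y the neuron's output.

```latex
y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)
```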

2b(ii). Give three (3) examples of a transfer function.

1. Linear Transfer Function – The output is directly proportional to the input.
2. Sigmoid Transfer Function – Produces an S-shaped curve and is commonly used in
classification problems.
3. ReLU (Rectified Linear Unit) Function – Outputs zero for negative inputs and the same
value for positive inputs, improving training efficiency.
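
The three functions can be sketched in a few lines of Python. The function names and the `slope` parameter are illustrative choices, not part of any particular library.

```python
import math

def linear(x, slope=1.0):
    """Linear transfer function: output proportional to the input."""
    return slope * x

def sigmoid(x):
    """Sigmoid transfer function: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """ReLU: zero for negative inputs, the input itself otherwise."""
    return max(0.0, x)

# Compare the three functions on a negative, a zero, and a positive input.
for x in (-2.0, 0.0, 2.0):
    print(x, linear(x), round(sigmoid(x), 3), relu(x))
```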

2c(i). What is Information Retrieval in the context of text mining?

Information Retrieval (IR) is the process of obtaining relevant textual data from a
large collection of unstructured text. It helps in retrieving useful information from
databases, documents, or search engines.

2c(ii). Illustrate with a diagram, the general Information Retrieval system architecture.

Your course manual contains a diagram of the General Information Retrieval System
Architecture, which includes:

• Document Collection (Source Data)
• Indexing System (Processing and Storage)
• Query Processor (User Interaction)
• Retrieval Engine (Matching and Ranking)
• User Interface (Results Display)

Refer to the diagram in your manual for the correct structure.
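
To make the roles of these components concrete, here is a minimal sketch of a retrieval pipeline. It assumes a tiny in-memory document collection, whitespace tokenization, and ranking by the number of matching query terms; the sample documents and function names are hypothetical, and real systems use far richer indexing and scoring.

```python
from collections import defaultdict

# Document collection (hypothetical sample texts).
documents = {
    1: "data mining finds patterns in large data sets",
    2: "text mining applies data mining to text",
    3: "neural networks learn from examples",
}

def tokenize(text):
    """Lower-case a string and split it on whitespace."""
    return text.lower().split()

# Indexing system: map each term to the ids of the documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search(query):
    """Query processor plus retrieval engine: rank documents by the
    number of distinct query terms they contain (ties broken by id)."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

# User interface: display the ranked results.
for doc_id in search("text mining"):
    print(doc_id, documents[doc_id])
```

Document 2 is listed first because it matches both query terms, while document 1 matches only "mining".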

3a. Differentiate between Supervised and Unsupervised Learning in a tabular form.

| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Goal | Learn from labeled data to predict outputs. | Identify patterns and structures in data. |
| Data | Uses labeled datasets (input-output pairs). | Uses unlabeled datasets without predefined outputs. |
| Algorithms | Examples: Decision Trees, Neural Networks, Support Vector Machines. | Examples: K-Means Clustering, DBSCAN, Hierarchical Clustering. |
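
The contrast can be sketched on one small data set. The code below is only an illustration under stated assumptions: the points and labels are made up, the supervised model is a simple nearest-centroid classifier (standing in for the algorithms listed above), and the unsupervised model is a bare-bones one-dimensional K-Means.

```python
# Hypothetical 1-D measurements; the labels are used only by the supervised part.
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["low", "low", "low", "high", "high", "high"]

def mean(values):
    return sum(values) / len(values)

# Supervised: learn one centroid per known label, then predict the label
# of the nearest centroid (a nearest-centroid classifier).
def fit_classifier(xs, ys):
    return {label: mean([x for x, y in zip(xs, ys) if y == label])
            for label in set(ys)}

def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Unsupervised: k-means discovers k groups without seeing any labels.
def kmeans(xs, k, iterations=10):
    centers = sorted(xs)[:: max(1, len(xs) // k)][:k]  # simple spread-out start
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in xs:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

model = fit_classifier(points, labels)
print(predict(model, 2.5))        # supervised: returns a label name
centers, groups = kmeans(points, k=2)
print(groups)                     # unsupervised: returns unnamed groups
```

The supervised model answers with one of the labels it was trained on, whereas K-Means only returns groups; naming those groups is left to the analyst.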

3b. In neural network training, when is a network said to be overfitting?

A neural network is overfitting when it learns the training data too well, including
noise and irrelevant details. This results in poor generalization to new data, meaning
the model performs well on training data but poorly on unseen data.
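
The gap between training and test performance can be shown with a toy experiment. This is a sketch under loud assumptions: instead of a neural network it uses a nearest-neighbour "memorizer" as a stand-in for an overfit model, the data and the true rule are invented, and two training labels are deliberately flipped to act as noise.

```python
def true_label(x):
    """The underlying rule the models should recover (assumed for this demo)."""
    return 1 if x >= 5 else 0

# Training set: x = 0..9, with two labels flipped to simulate noise.
train = [(float(x), true_label(x)) for x in range(10)]
train[2] = (2.0, 1)   # noisy label
train[7] = (7.0, 0)   # noisy label

# Unseen test set, labeled by the true rule.
test_xs = [0.5, 1.8, 2.2, 3.5, 4.4, 5.5, 6.8, 7.2, 8.5, 9.3]
test = [(x, true_label(x)) for x in test_xs]

def memorizer(x):
    """Overfit model: copy the label of the closest training point, noise included."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def simple_rule(x):
    """Simpler model: a single threshold that ignores the noisy points."""
    return 1 if x >= 5 else 0

def accuracy(model, data):
    """Fraction of (x, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, "train:", accuracy(model, train), "test:", accuracy(model, test))
```

The memorizer scores 1.0 on the training set but only 0.6 on the test set, while the simple rule scores 0.8 on the noisy training set and 1.0 on the test set. That pattern, high training accuracy paired with lower accuracy on unseen data, is the signature of overfitting.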

3c. State three (3) differences between Classification and Clustering.

| Feature | Classification | Clustering |
|---|---|---|
| Definition | Assigns predefined labels to data. | Groups data based on similarity without predefined labels. |
| Supervision | Supervised learning. | Unsupervised learning. |
| Example Algorithms | Decision Trees, SVM, Neural Networks. | K-Means, DBSCAN, Hierarchical Clustering. |

4a. k-means is not a suitable algorithm for clustering alphabetic data. Discuss.

• K-Means clustering relies on numerical distance measures (such as Euclidean distance) and
on computing the mean of each cluster, neither of which is defined for alphabetic data.
• Alphabetic data, such as words or text, have no natural numeric representation for
distance computation.
• Alternative approaches are better suited for text data: Hierarchical Clustering can work
from a string dissimilarity such as edit distance, and Latent Semantic Analysis (LSA) first
converts text into numeric vectors.
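
The following sketch illustrates the point. It uses the Levenshtein edit distance as one possible string dissimilarity (an illustrative choice, and the word list is hypothetical) and then shows that the arithmetic mean K-Means needs for its centroids is not defined for strings.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions, or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

words = ["cat", "cart", "dog"]

# A distance between words can still be defined...
print(edit_distance("cat", "cart"))   # 1
print(edit_distance("cat", "dog"))    # 3

# ...but the arithmetic mean that k-means needs for its centroids cannot.
try:
    centroid = sum(words) / len(words)
except TypeError as error:
    print("cannot average strings:", error)
```

Because a pairwise distance exists but a mean does not, methods that need only pairwise dissimilarities (such as hierarchical clustering) remain usable on such data.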

4b. List and discuss five (5) data mining pitfalls.

1. Overfitting – The model performs well on training data but fails on unseen data.
2. Ignoring Data Quality Issues – Poor data leads to inaccurate predictions.
3. Selection Bias – Using non-representative data can mislead conclusions.
4. Improper Feature Selection – Using irrelevant or redundant features reduces model
efficiency.
5. Misinterpretation of Results – Treating correlation as if it implied causation leads to
incorrect insights.

4c. Use the following statements to draw a neural network structure.

To complete this, refer to your course manual’s example of a Neural Network Structure
Diagram, which typically consists of:

• Input Layer (Features or Inputs)
• Hidden Layers (Processing Layers with Neurons)
• Output Layer (Final Decision or Classification)

Use the provided statements to accurately structure your diagram.
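
Since the statements for this question are not reproduced here, the sketch below does not attempt to answer it. It only shows, under assumed values, how data flows through the three kinds of layers: a hypothetical network with 2 inputs, 2 hidden neurons, and 1 output, with hand-picked weights and a sigmoid transfer function.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """Compute one fully connected layer: each neuron applies the sigmoid
    to the weighted sum of its inputs plus a bias."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

# Hypothetical weights for a 2-input, 2-hidden-neuron, 1-output network.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

def forward(inputs):
    hidden = layer(inputs, hidden_weights, hidden_biases)    # hidden layer
    return layer(hidden, output_weights, output_biases)      # output layer

print(forward([1.0, 1.0]))  # a single value between 0 and 1
```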
