CSC425 Data Mining
List and explain briefly three (3) techniques that can be used to achieve
this.
1. Association Rules – This technique discovers relationships between web pages that are
frequently visited together. It helps in identifying patterns in user navigation.
2. Clustering – Users with similar browsing behavior are grouped together to analyze trends. It
helps in personalization and recommendation systems.
3. Classification – This technique assigns users or sessions to predefined classes based on
their web activities. It helps in predicting future user actions.
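Technique 1 can be illustrated with a small sketch. It assumes hypothetical browsing sessions and uses support (the fraction of sessions that contain both pages) and confidence (how often a session containing page A also contains page B). This is a brute-force version limited to page pairs, not a full Apriori implementation, and the thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each set holds the pages one user visited.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart", "checkout"},
    {"home", "products", "cart", "checkout"},
]

def pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Return rules (A, B, support, confidence) for single pages A -> B."""
    n = len(sessions)
    # Number of sessions containing each page.
    item_counts = Counter(page for s in sessions for page in s)
    # Number of sessions containing each unordered pair of pages.
    pair_counts = Counter(
        pair for s in sessions for pair in combinations(sorted(s), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, sup, conf in pair_rules(sessions):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data every session that reaches "cart" also visits "products", so the rule cart -> products has confidence 1.0, which is the kind of navigation pattern a site could use to place links.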
Information Retrieval (IR) is the process of obtaining textual data relevant to a user's
query from a large collection of unstructured text. It underlies tools such as search
engines and document or database search.
2c(ii). Illustrate, with a diagram, the general Information Retrieval
system architecture.
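The diagram itself is not reproduced here. As a complement, the sketch below maps the usual architecture components to code: document collection, text operations, indexing into an inverted index, then user query, query operations (the same text processing), and searching/ranking to produce ranked results. Ranking by a simple TF-IDF score is an assumption; the document names and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Text operations: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(query, index, n_docs):
    """Query processing, matching and ranking with a simple TF-IDF score."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term not in the collection
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Ranked result list, best match first.
    return [doc_id for doc_id, _ in scores.most_common()]

# Hypothetical document collection.
docs = {
    "d1": "Data mining finds patterns in large data sets",
    "d2": "Information retrieval finds relevant documents for a query",
    "d3": "Web mining applies data mining to web usage data",
}
index = build_index(docs)
print(search("web mining", index, len(docs)))
```

Here "web" occurs only in d3, so its high IDF pushes d3 above d1, which matches only the more common term "mining".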
A neural network is overfitting when it learns the training data too well, including
noise and irrelevant details. This results in poor generalization to new data, meaning
the model performs well on training data but poorly on unseen data.
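The symptom described above (good on training data, poor on unseen data) can be shown without a neural network. The sketch swaps in two simpler models on hypothetical noisy data drawn from an assumed relation y = 2x + 1: a straight-line least-squares fit, and a 1-nearest-neighbour model that memorises every training point, noise included. The memoriser reaches zero training error but does worse on a held-out test set.

```python
import random

def make_data(n, rng):
    """Noisy samples of an assumed true relation y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in xs]

def fit_line(data):
    """Simple model: ordinary least-squares straight line."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_memorizer(data):
    """Overfitting model: 1-nearest neighbour, which copies the closest training label."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
train, test = make_data(200, rng), make_data(500, rng)
line, memo = fit_line(train), fit_memorizer(train)
print(f"line: train={mse(line, train):.2f} test={mse(line, test):.2f}")
print(f"1-NN: train={mse(memo, train):.2f} test={mse(memo, test):.2f}")
```

Comparing training and test error in this way is the usual check for overfitting: a large gap between the two is the warning sign.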
3c. State three (3) differences between Classification and Clustering.
1. Overfitting – The model performs well on training data but fails on unseen data.
2. Ignoring Data Quality Issues – Poor data leads to inaccurate predictions.
3. Selection Bias – Using non-representative data can mislead conclusions.
4. Improper Feature Selection – Using irrelevant or redundant features reduces model
accuracy and efficiency.
5. Misinterpretation of Results – Treating correlation as causation leads to incorrect
insights.
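Point 4 can be made concrete with a simple correlation filter, one of several possible feature-selection checks. The sketch below flags feature pairs whose absolute Pearson correlation meets a threshold, since one feature of such a pair adds little new information. The feature values and the 0.95 threshold are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / (sa * sb)

def redundant_pairs(features, threshold=0.95):
    """Flag feature pairs whose absolute correlation meets the threshold."""
    return [
        (f, g)
        for f, g in combinations(features, 2)
        if abs(pearson(features[f], features[g])) >= threshold
    ]

# Hypothetical features: height recorded twice in different units.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [34, 22, 45, 29, 51],
}
print(redundant_pairs(features))
```

Only the two height columns are flagged, so one of them could be dropped. Correlation only detects linear redundancy; it says nothing about whether a feature is relevant to the target.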
4c. Use the following statements to draw a neural network structure.