Most Detailed 4 Data Mining Answers
**Data Reduction:**
Data reduction minimizes the dataset size while retaining important features. Techniques include:
- **Dimensionality Reduction:** Projects the data onto fewer attributes, for example with Principal Component Analysis (PCA).
- **Data Compression:** Encodes the data in fewer bits (e.g., Huffman coding).
- **Sampling:** Analyzes a representative subset of the data instead of the full dataset.
- **Feature Selection:** Removes redundant or irrelevant attributes, for example by using correlation analysis.
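Two of the techniques above, feature selection by correlation analysis and sampling, can be sketched in a few lines of plain Python. This is a minimal illustration, not a definitive method: the column names, the toy values, the 0.95 threshold, and the helper names `pearson` and `drop_redundant` are all assumptions made for the example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: height_in repeats height_cm in other units, so it is redundant.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "weight_kg": [70.0, 52.0, 85.0, 60.0, 77.0],
}
selected = drop_redundant(columns)
print(selected)  # ['height_cm', 'weight_kg']

# Sampling: analyze a random subset of row indices instead of every row.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_rows = sorted(rng.sample(range(5), k=3))
```

The greedy pass keeps whichever of two correlated columns appears first, so the result depends on column order; that is a simplification chosen for brevity.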
**OLAP Operations:**
- **Roll-up:** Aggregates data to a higher level (e.g., from monthly sales to yearly sales).
- **Drill-down:** Moves from summarized to detailed data (e.g., from yearly sales to monthly sales).
- **Slice:** Fixes one dimension to a single value and extracts the matching data (e.g., filtering sales for 2023 only).
- **Dice:** Extracts a subset of data based on multiple dimensions (e.g., sales for 2023 and product category A).
- **Pivot:** Rotates data for different perspectives (e.g., switching rows and columns in a report).
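The roll-up, drill-down, slice, dice, and pivot operations can be tried out on a small in-memory table. The sketch below uses plain Python; the `sales` rows, the field names, and the `aggregate` helper are hypothetical and exist only to illustrate each operation.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (year, month, category) with a sales amount.
sales = [
    {"year": 2022, "month": "Jan", "category": "A", "amount": 100},
    {"year": 2022, "month": "Feb", "category": "B", "amount": 150},
    {"year": 2023, "month": "Jan", "category": "A", "amount": 200},
    {"year": 2023, "month": "Jan", "category": "B", "amount": 50},
    {"year": 2023, "month": "Feb", "category": "A", "amount": 120},
]

def aggregate(rows, *dims):
    """Sum 'amount' grouped by the given dimension names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)

# Roll-up: monthly figures aggregated to yearly totals.
yearly = aggregate(sales, "year")            # {(2022,): 250, (2023,): 370}

# Drill-down: return to the finer (year, month) level.
monthly = aggregate(sales, "year", "month")

# Slice: fix one dimension to a single value.
slice_2023 = [r for r in sales if r["year"] == 2023]

# Dice: restrict two or more dimensions at once.
dice_2023_a = [r for r in sales if r["year"] == 2023 and r["category"] == "A"]

# Pivot: view categories as rows and years as columns.
pivot = defaultdict(dict)
for (year, category), total in aggregate(sales, "year", "category").items():
    pivot[category][year] = total
```

Here `pivot["A"]` holds the per-year totals for category A, which is the same data as the year-by-category aggregate viewed from the other axis.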
**Interpretation:**
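- Lift > 1 indicates a positive association (customers who buy bread are more likely than average to buy milk); the further above 1, the stronger the association. Lift = 1 indicates independence, and Lift < 1 indicates a negative association.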
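To make the interpretation concrete, lift can be computed directly from transaction data using the standard definition lift(A -> B) = support(A and B) / (support(A) × support(B)). The baskets below are invented for illustration, and the helper names `support` and `lift` are assumptions of this sketch.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def lift(transactions, antecedent, consequent):
    """lift(A -> B) = support(A and B) / (support(A) * support(B))."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / (support(transactions, antecedent) * support(transactions, consequent))

# Hypothetical market-basket data: 8 transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread"},
    {"milk", "eggs"},
    {"eggs"},
    {"eggs", "butter"},
    {"butter"},
]

print(lift(transactions, {"bread"}, {"milk"}))  # 1.5 -> positive association
print(lift(transactions, {"bread"}, {"eggs"}))  # 0.5 -> negative association
```

In this toy data bread and milk each appear in half the baskets but occur together in 3 of 8, more often than the 2 of 8 expected under independence, which is why the lift exceeds 1.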
**Types:**
1. **Agglomerative Hierarchical Clustering:**
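   - Starts with each point as its own cluster and iteratively merges the two closest clusters.
   - Linkage methods:
     - **Single Linkage:** Cluster distance is the shortest distance between any two points, one from each cluster.
     - **Complete Linkage:** Cluster distance is the farthest distance between any two points, one from each cluster.
     - **Average Linkage:** Cluster distance is the average distance over all pairs of points, one from each cluster.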
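The agglomerative procedure and the three linkage rules can be sketched as a naive pure-Python implementation. It assumes Euclidean distance and stops at a requested number of clusters; the function names `cluster_distance` and `agglomerative` and the sample points are illustrative choices, and the brute-force search is only suitable for small inputs.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def cluster_distance(a, b, linkage):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    pairwise = [dist(p, q) for p in a for q in b]
    if linkage == "single":
        return min(pairwise)                   # shortest point-to-point distance
    if linkage == "complete":
        return max(pairwise)                   # farthest point-to-point distance
    if linkage == "average":
        return sum(pairwise) / len(pairwise)   # mean over all cross-cluster pairs
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, n_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        merged = clusters.pop(j)      # j > i, so index i is unaffected by the pop
        clusters[i].extend(merged)
    return clusters

# Hypothetical 2-D points forming two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, n_clusters=3, linkage="average")
print(result)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```

Recomputing every pairwise cluster distance on each pass keeps the code short but is slow; practical libraries cache and update a distance matrix instead.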