
Most Detailed Data Mining Answers with Diagrams

22. Explain data transformation and data reduction in detail.


**Data Transformation:**
Data transformation is the process of converting data into a suitable format for mining. It includes:
- **Normalization:** Adjusting values to a common scale (e.g., Min-Max scaling: (X - Min) / (Max - Min)).
- **Aggregation:** Summarizing data at a higher level (e.g., monthly sales → quarterly sales).
- **Smoothing:** Removing noise using moving averages or binning.
- **Discretization:** Converting continuous values into discrete categories (e.g., Age → Young, Middle-aged, Senior); a short sketch follows this list.
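
Below is a minimal Python sketch of min-max normalization and discretization using pandas; the column names, sample values, and age bin edges are illustrative assumptions, not part of the original answer.

```python
# Minimal sketch of min-max normalization and discretization with pandas.
# The columns ("age", "income") and the bin edges are assumed example values.
import pandas as pd

df = pd.DataFrame({"age": [23, 35, 47, 62, 58],
                   "income": [30000, 52000, 61000, 75000, 68000]})

# Normalization: rescale income to [0, 1] using (X - Min) / (Max - Min)
df["income_norm"] = (df["income"] - df["income"].min()) / (df["income"].max() - df["income"].min())

# Discretization: map continuous age into the categories used above
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 55, 120],
                         labels=["Young", "Middle-aged", "Senior"])

print(df)
```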

**Data Reduction:**
Data reduction minimizes the dataset size while retaining the important characteristics of the data. Techniques include (a short sketch follows this list):
- **Dimensionality Reduction:** Uses Principal Component Analysis (PCA) to reduce the number of attributes.
- **Data Compression:** Encodes data more compactly (e.g., Huffman coding).
- **Sampling:** Analyzes a representative subset of the data instead of the full dataset.
- **Feature Selection:** Removes redundant attributes using correlation analysis.
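
Below is a minimal Python sketch of dimensionality reduction with PCA plus simple random sampling, assuming scikit-learn and NumPy are available; the synthetic data and the choice of three components are illustrative assumptions.

```python
# Minimal sketch of PCA-based dimensionality reduction and random sampling.
# The synthetic dataset (100 records, 10 attributes) is an assumed example.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 records, 10 attributes

# Dimensionality reduction: keep 3 principal components
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                   # (100, 3)
print(pca.explained_variance_ratio_)     # variance retained by each component

# Sampling: analyze a random 20% subset instead of the full data
sample_idx = rng.choice(len(X), size=20, replace=False)
X_sample = X[sample_idx]
```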

23. Explain with diagrams, various OLAP operations.


**OLAP (Online Analytical Processing) Operations:**
OLAP is used in data warehousing to analyze multidimensional data effectively. Key operations
include:

- **Roll-up:** Aggregates data to a higher level (e.g., from monthly sales to yearly sales).
- **Drill-down:** Moves from summarized to detailed data (e.g., from yearly sales to monthly sales).
- **Slice:** Selects a single value along one dimension, producing a sub-cube (e.g., filtering sales for the year 2023 only).
- **Dice:** Extracts a subset of data based on multiple dimensions (e.g., sales for 2023 and product
category A).
- **Pivot:** Rotates the data to view it from a different perspective (e.g., switching rows and columns in a report); a sketch of these operations on a small table follows.
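
Below is a minimal Python sketch that mimics these OLAP operations on a small in-memory sales table with pandas; the table layout (year, month, product, sales) is an assumed example, not an actual OLAP cube.

```python
# Minimal sketch of OLAP-style operations on a small sales table using pandas.
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2023, 2024, 2024],
    "month":   ["Jan", "Feb", "Jan", "Jan", "Feb"],
    "product": ["A", "A", "B", "A", "B"],
    "sales":   [100, 120, 80, 130, 90],
})

# Roll-up: aggregate monthly figures up to yearly totals
rollup = sales.groupby("year")["sales"].sum()

# Drill-down: move back to the more detailed (year, month) level
drilldown = sales.groupby(["year", "month"])["sales"].sum()

# Slice: fix one dimension (year == 2023)
slice_2023 = sales[sales["year"] == 2023]

# Dice: restrict several dimensions at once (year 2023 and product A)
dice = sales[(sales["year"] == 2023) & (sales["product"] == "A")]

# Pivot: rotate the view so products become columns
pivot = sales.pivot_table(index="year", columns="product",
                          values="sales", aggfunc="sum")
print(pivot)
```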

24. Explain with an example, how to perform correlation using lift.


**Lift Calculation Formula:**
- Lift(A → B) = Confidence(A → B) / Expected Confidence = P(A and B) / (P(A) × P(B))
**Example:**
- Assume a supermarket dataset where:
- 20% of transactions include bread.
- 30% of transactions include milk.
- 10% of transactions include both bread and milk.

**Step 1: Calculate Confidence:**


- Confidence(Bread → Milk) = P(Bread and Milk) / P(Bread)
- Confidence = 10% / 20% = 0.5 (50%)

**Step 2: Calculate Expected Confidence:**


- Expected Confidence = P(Milk) = 30% (0.3)

**Step 3: Calculate Lift:**


- Lift = 0.5 / 0.3 ≈ 1.67

**Interpretation:**
- Lift > 1 indicates a positive correlation: customers buying bread are more likely than average to also buy milk (the short sketch below reproduces this calculation).
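
The same calculation can be written as a short Python sketch; the probabilities are taken directly from the worked example above.

```python
# Minimal sketch of the bread/milk lift calculation from the example above.
p_bread = 0.20          # P(Bread)
p_milk = 0.30           # P(Milk)
p_both = 0.10           # P(Bread and Milk)

confidence = p_both / p_bread             # 0.5
expected_confidence = p_milk              # 0.3
lift = confidence / expected_confidence   # ~1.67

print(f"Confidence = {confidence:.2f}, Lift = {lift:.2f}")
if lift > 1:
    print("Positive correlation: bread buyers are more likely to also buy milk.")
```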

25. Explain hierarchical method of clustering.


**Definition:**
Hierarchical clustering builds a tree-like structure (dendrogram) of nested clusters.

**Types:**
1. **Agglomerative Hierarchical Clustering:**
- Starts with individual points and merges the closest clusters iteratively.
- Linkage methods:
- **Single Linkage:** Merges clusters based on the shortest distance between any two points in the two clusters.
- **Complete Linkage:** Merges clusters based on the farthest distance between any two points in the two clusters.
- **Average Linkage:** Uses the average pairwise distance between points in the two clusters.

2. **Divisive Hierarchical Clustering:**
- Starts with a single large cluster and recursively splits it into smaller clusters (a short sketch of agglomerative clustering follows the applications below).

**Example Applications:**
- Used in bioinformatics for gene classification.
- Helps in customer segmentation for targeted marketing.
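
Below is a minimal Python sketch of agglomerative clustering and a dendrogram using SciPy; the six 2-D points and the single-linkage choice are illustrative assumptions.

```python
# Minimal sketch of agglomerative (bottom-up) hierarchical clustering with SciPy.
# The six 2-D points and the single-linkage method are assumed example choices.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
import matplotlib.pyplot as plt

X = np.array([[1, 2], [1, 3], [2, 2], [8, 8], [8, 9], [25, 30]])

# Agglomerative clustering: repeatedly merge the closest clusters
Z = linkage(X, method="single")    # single linkage = shortest distance

# Cut the tree into 3 flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)

# Draw the dendrogram (the tree of nested clusters)
dendrogram(Z)
plt.show()
```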
