Full Detailed Data Mining Answer Key

The document provides an answer key for data mining questions, covering topics such as data mining definitions, interestingness, data preprocessing categories, and various algorithms like SVM and k-NN. It also discusses data warehouse features, OLAP server comparisons, and challenges in knowledge discovery on the web. Additionally, it includes explanations of clustering methods, correlation using lift, and issues in classification and prediction.

Comprehensive Data Mining Answer Key

2-MARK QUESTIONS

1. What do you mean by data mining?


Data mining is the process of analyzing large datasets to identify hidden patterns, correlations, and
useful information. It combines techniques from statistics, machine learning, and databases.
Applications include fraud detection, market analysis, and customer segmentation.

2. What do you mean by interestingness?


Interestingness refers to how useful, novel, and significant a pattern discovered in data mining is.
Common objective measures are:
- **Support:** Fraction of transactions in the dataset that contain the pattern.
- **Confidence:** Conditional probability that the consequent occurs given the antecedent.
- **Lift:** Ratio of the observed confidence to the confidence expected if the items were independent.
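
The three measures can be computed directly from a list of transactions. The following is a minimal sketch, assuming a small hypothetical market-basket dataset; the item names and the function names `support`, `confidence`, and `lift` are illustrative and not part of the original answer.

```python
# Hypothetical market-basket data; item names are illustrative only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
    {"butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent); fails if the antecedent never occurs."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support (1.0 means independence)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(support({"milk", "bread"}, transactions))
print(confidence({"milk"}, {"bread"}, transactions))
print(lift({"milk"}, {"bread"}, transactions))
```

In this toy dataset the rule milk → bread has a lift slightly above 1, so the two items appear together a little more often than independence would predict.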

3. Mention the 4 categories of data preprocessing.


1. **Data Cleaning:** Removing errors, handling missing values.
2. **Data Integration:** Combining multiple sources.
3. **Data Transformation:** Normalization, feature scaling.
4. **Data Reduction:** PCA, feature selection.
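
As an illustration of the first and third categories, here is a minimal sketch, assuming a single numeric attribute with one missing value. The helper names `fill_missing_with_mean` and `min_max_normalize` and the sample values are hypothetical; mean imputation is only one of several possible cleaning strategies.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, None, 40, 60]                # hypothetical attribute with one missing value
cleaned = fill_missing_with_mean(ages)   # [20, 40, 40, 60]
scaled = min_max_normalize(cleaned)      # [0.0, 0.5, 0.5, 1.0]
print(cleaned, scaled)
```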

4. What is technical metadata in a data warehouse?


Technical metadata includes structural details about the data stored in a warehouse:
- **Data types:** Integer, string, float.
- **Indexes:** Speed up query performance.
- **Relationships:** Define connections between tables.
- **Data lineage:** Tracks transformation history.

5. What do you mean by scalability of a classifier?


Scalability refers to a classifier's ability to construct and apply the model efficiently as the amount
of data grows. A scalable classifier:
- Maintains accuracy on large datasets.
- Keeps training time and memory use manageable as the number of records and attributes increases.

6. What is the objective of SVM?


Support Vector Machine (SVM) aims to find the optimal hyperplane that best separates data
classes. It maximizes the margin, that is, the distance between the hyperplane and the nearest
points of each class (the support vectors), for better generalization.
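
For linearly separable data, this objective is commonly written as the hard-margin optimization problem below. The notation is an assumption added for illustration and does not appear in the original answer: training points $\mathbf{x}_i$ with labels $y_i \in \{-1, +1\}$, weight vector $\mathbf{w}$, and bias $b$.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n
```

The margin width equals $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert^{2}$ is equivalent to maximizing the margin.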

7. What is lazy learning? Give an example.


Lazy learning stores the training data and postpones generalization until a query is made, so no
explicit model is built in advance.
**Example:** k-Nearest Neighbors (k-NN) predicts a label from the closest stored data points.
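
A minimal k-NN sketch makes the "lazy" aspect visible: there is no training step, and all computation happens when a query arrives. The data points, labels, and the function name `knn_predict` are hypothetical, and Euclidean distance with majority voting is assumed.

```python
from collections import Counter
from math import dist

def knn_predict(training_data, query, k=3):
    """Classify `query` by majority vote among its k nearest stored points."""
    # No model is built in advance: the stored data is scanned at query time.
    neighbors = sorted(training_data, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points: (features, class label)
training_data = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 8.5), "B"),
]

print(knn_predict(training_data, (1.2, 1.4)))  # A
print(knn_predict(training_data, (8.5, 8.0)))  # B
```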

8. What is regression?
Regression is a statistical method used to predict continuous numerical values based on
independent variables.
**Example:** Predicting house prices based on square footage and location.
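
The sketch below fits a straight line by ordinary least squares, assuming a single predictor (square footage) for simplicity; location is left out. The prices are hypothetical and exactly linear, and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

square_feet = [1000, 1500, 2000, 2500]
prices = [200000, 250000, 300000, 350000]  # hypothetical data

slope, intercept = fit_line(square_feet, prices)
predicted = slope * 1800 + intercept       # predicted price for an 1800 sq ft house
print(slope, intercept, predicted)
```

With this data the fitted line is price = 100 × square feet + 100000, so the prediction for 1800 square feet is 280000.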

9. What is a continuous ordinal variable? Give an example.


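A continuous ordinal variable looks like continuous data on an unknown scale: the relative ordering
of the values matters, but their actual magnitudes do not.
**Example:** Finishing ranks in a race, where the order of the runners matters more than the exact
time gaps between them.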

10. What do you mean by partitioning methods of clustering?


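Partitioning methods divide n objects into k clusters so that objects in the same cluster are similar
and objects in different clusters are dissimilar. Examples include:
- **k-Means:** Assigns each data point to the nearest of k centroids and recomputes each centroid
as the mean of its cluster.
- **k-Medoids:** Uses actual data points as cluster centers, which makes it less sensitive to noise
and outliers.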
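
The assignment and update steps of k-means can be sketched as follows. This is a minimal illustration, assuming 2-D points, Euclidean distance, and initial centroids supplied by the caller (in practice they are usually chosen at random); the function name `k_means` and the sample points are hypothetical.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and updating the centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```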

11. What do you mean by feature descriptor?


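A feature descriptor is a compact representation, usually a numeric vector, of an object's
characteristics, used in pattern recognition and image processing.
**Example:** SIFT (Scale-Invariant Feature Transform) detects keypoints in an image and describes
each one with a vector that is robust to changes in scale and rotation.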

12. What is text mining?


Text mining extracts meaningful insights from unstructured text using NLP techniques.
**Applications:** Sentiment analysis, spam filtering, document classification.

5-MARK QUESTIONS

13. Explain tight coupling and semi-tight coupling in data mining systems.
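- **Tight Coupling:** The data mining system is smoothly integrated into the database or data
warehouse system and treated as one of its functional components, so mining benefits from its
data access methods and query optimization.
- **Semi-Tight Coupling:** The data mining system is linked to the database or data warehouse
system, and efficient implementations of a few essential mining primitives (such as sorting, indexing,
and aggregation) are provided inside it, while the remaining mining functions run outside.

14. Explain the features of a data warehouse.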
- **Subject-Oriented:** Focuses on business subjects (e.g., sales, customers).
- **Integrated:** Combines data from multiple heterogeneous sources into a consistent format.
- **Time-Variant:** Stores historical data for trend analysis.
- **Non-Volatile:** Data is loaded and queried but not updated in day-to-day operations, which
keeps it stable and consistent.

15. Compare and contrast ROLAP and MOLAP servers.


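**ROLAP (Relational OLAP):**
- Stores and manages data in relational databases.
- Supports complex queries and scales to large data volumes, but is slower because aggregates are
typically computed at query time.

**MOLAP (Multidimensional OLAP):**
- Stores data in specialized multidimensional array structures (data cubes).
- Answers queries faster through precomputed aggregates, but requires more storage space.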

16. Explain uniform support and reduced support in multi-level association rules.
- **Uniform Support:** Applies the same minimum support threshold at every level of the concept
hierarchy.
- **Reduced Support:** Uses a lower minimum support threshold at lower (more specific) levels,
because specific items occur less frequently than their general categories.
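
The difference between the two strategies shows up when a specific item falls just below the uniform threshold. The sketch below assumes a hypothetical two-level hierarchy (milk at level 1, two kinds of milk at level 2) with made-up support values and thresholds; `frequent_items` is an illustrative helper name.

```python
# Hypothetical support values (fraction of transactions), keyed by (item, level).
item_support = {
    ("milk", 1): 0.10,       # level 1: general category
    ("2% milk", 2): 0.06,    # level 2: specific items
    ("skim milk", 2): 0.04,
}

def frequent_items(item_support, min_support_by_level):
    """Return the items whose support meets the threshold set for their level."""
    return sorted(
        item
        for (item, level), sup in item_support.items()
        if sup >= min_support_by_level[level]
    )

uniform = frequent_items(item_support, {1: 0.05, 2: 0.05})  # same threshold everywhere
reduced = frequent_items(item_support, {1: 0.05, 2: 0.03})  # lower threshold at level 2

print(uniform)  # ['2% milk', 'milk']
print(reduced)  # ['2% milk', 'milk', 'skim milk']
```

Under uniform support, skim milk is missed even though it is a meaningful level-2 item; reduced support recovers it.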

17. Explain issues in classification and prediction.


- **Data Quality Issues:** Missing or noisy data affects accuracy.
- **Overfitting & Underfitting:** Poor model generalization.
- **Scalability:** Handling large datasets.
- **Imbalanced Data:** Some classes dominate others.
- **Model Interpretability:** Some complex models lack transparency.

15-MARK QUESTIONS

22. Explain the challenges in knowledge discovery in WWW.


Challenges include:
- **Huge Data Volumes:** Web data is vast and requires scalable solutions.
- **Dynamic & Heterogeneous Data:** Content is diverse and constantly changing.
- **Scalability Issues:** Data processing must be efficient.
- **Privacy & Security:** Compliance with regulations.
- **Web Spam & Irrelevant Data:** Need for quality filtering.

23. Explain with diagrams, various OLAP operations.


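- **Roll-up:** Aggregates data by climbing up a concept hierarchy (e.g., city to country) or by
dropping a dimension.
- **Drill-down:** The reverse of roll-up; moves from a summary to a more detailed view.
- **Slice:** Selects a single value on one dimension, producing a sub-cube.
- **Dice:** Selects values on two or more dimensions, producing a smaller sub-cube.
- **Pivot:** Rotates the axes of the cube to view the data from a different perspective.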

(Diagram will be provided separately).
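
The operations can also be illustrated on a tiny fact table. This is a minimal sketch, assuming hypothetical sales records and plain Python structures in place of a real OLAP server; the column layout and the helper `aggregate` are illustrative only.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, country, quarter, product, amount)
facts = [
    ("Delhi",  "India",  "Q1", "phone",  100),
    ("Delhi",  "India",  "Q2", "laptop", 150),
    ("Mumbai", "India",  "Q1", "phone",  200),
    ("Paris",  "France", "Q1", "laptop", 300),
    ("Paris",  "France", "Q2", "phone",  250),
]
CITY, COUNTRY, QUARTER, PRODUCT, AMOUNT = range(5)

def aggregate(rows, *dims):
    """Sum the amount measure, grouped by the given dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[AMOUNT]
    return dict(totals)

by_city = aggregate(facts, CITY, QUARTER)        # detailed view (drill-down level)
by_country = aggregate(facts, COUNTRY, QUARTER)  # roll-up: city -> country
slice_q1 = [r for r in facts if r[QUARTER] == "Q1"]  # slice: one dimension fixed
dice = [r for r in facts
        if r[QUARTER] == "Q1" and r[PRODUCT] == "phone"]  # dice: two dimensions
pivot = aggregate(facts, QUARTER, COUNTRY)       # pivot: swap the two axes

print(by_country)
print(pivot)
```

Going from `by_country` back to `by_city` corresponds to drill-down, and `pivot` holds the same totals as `by_country` with the axes exchanged.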

24. Explain with an example, how to perform correlation using lift.


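**Formula:** Lift(A → B) = Confidence(A → B) / Expected Confidence, where the expected confidence is
Support(B). Equivalently, Lift(A → B) = P(A and B) / (P(A) × P(B)).
**Interpretation:** Lift > 1 indicates positive correlation, lift = 1 indicates independence, and
lift < 1 indicates negative correlation.
**Example:** If the rule milk → bread has a lift of 1.5, customers who buy milk are 1.5 times as
likely to buy bread as customers in general, which indicates a positive correlation.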
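
A worked calculation shows how such a value arises. The counts are hypothetical and chosen only to reproduce a lift of 1.5: out of 1000 transactions, 400 contain milk, 500 contain bread, and 300 contain both. Here conf is the rule's confidence and sup is the support of the consequent.

```latex
\begin{aligned}
\text{conf}(\text{milk} \Rightarrow \text{bread}) &= \frac{300}{400} = 0.75 \\
\text{sup}(\text{bread}) &= \frac{500}{1000} = 0.50 \\
\text{lift}(\text{milk} \Rightarrow \text{bread}) &= \frac{0.75}{0.50} = 1.5
\end{aligned}
```

Because the lift exceeds 1, milk and bread are bought together more often than they would be if the purchases were independent.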

25. Explain hierarchical method of clustering.


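Hierarchical clustering builds a tree of nested clusters, either bottom-up or top-down:
- **Agglomerative (bottom-up):** Starts with each point as its own cluster and repeatedly merges the
two closest clusters.
- **Divisive (top-down):** Starts with one cluster containing all points and repeatedly splits it.
The resulting hierarchy of merges or splits is visualized as a dendrogram.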
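
The agglomerative variant can be sketched in a few lines. This is a minimal illustration, assuming 1-D points and single linkage (cluster distance is the smallest gap between any two members); other linkage criteria are equally valid, and the function name `agglomerative` and the sample data are hypothetical.

```python
def agglomerative(points, num_clusters):
    """Single-linkage agglomerative clustering on 1-D points."""
    clusters = [[p] for p in points]   # start: every point is its own cluster
    merges = []                        # merge history, as a dendrogram would record it
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

clusters, merges = agglomerative([1, 2, 6, 7, 15], num_clusters=2)
print(clusters)  # [[1, 2, 6, 7], [15]]
for left, right, distance in merges:
    print(left, right, distance)
```

The recorded merge distances (1, 1, then 4) are the heights at which the corresponding branches would join in a dendrogram.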
