Data Mining Long Answers
- Binning: Sort data and partition into equal-sized bins, then smooth by bin mean, median,
or boundaries.
- Regression: Fit a regression function (like linear) to the data and use the function to
smooth.
- Clustering: Detect and smooth outliers by grouping similar data and averaging within
clusters.
- Moving average: Average a window of nearby values to smooth fluctuations.
Together, these smoothing techniques reduce noise and improve data quality before mining algorithms are applied.
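The binning technique above can be sketched in a few lines. This is an illustrative example (the data values and bin size are chosen here, not taken from a textbook exercise): sort the data, split it into equal-sized bins, and replace every value in a bin with the bin mean.

```python
# Smoothing by bin means: sort, partition into equal-sized bins,
# then replace each value with its bin's mean (illustrative data).
data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bin_size = 3

smoothed = []
for i in range(0, len(data), bin_size):
    bin_values = data[i:i + bin_size]
    mean = sum(bin_values) / len(bin_values)  # smooth by bin mean
    smoothed.extend([mean] * len(bin_values))

# First bin (4, 8, 15) becomes 9.0; second (21, 21, 24) becomes 22.0;
# third (25, 28, 34) becomes 29.0.
```

Smoothing by bin medians or bin boundaries follows the same loop, changing only the value each bin member is replaced with.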
5. Explain slice and pivot operations on data cube with a neat sketch?
A. The slice operation selects a single value along one dimension of a data cube, producing a sub-cube (e.g., all sales for the year 2023).
Pivot operation (also called rotate) reorients the cube view, changing the dimensional
orientation to view data from different angles (e.g., swap rows and columns).
[Diagram: a 3-D data cube, showing a slice cut along one plane and the cube rotated (pivoted) to a new orientation.]
Choosing the right type affects performance and storage during aggregation.
Apriori first finds frequent 1-itemsets (e.g., milk, bread), then 2-itemsets (e.g., milk &
bread), and so on, pruning infrequent ones early.
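The level-wise search described above can be sketched as follows. This is a minimal illustration of the Apriori idea (the transactions and support threshold are made-up example data, and the candidate-generation step is simplified to a pairwise join):

```python
from itertools import combinations

# Made-up market-basket transactions and an absolute support threshold.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]
min_support = 2

def apriori(transactions, min_support):
    # Level 1: candidate 1-itemsets are all single items.
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count the support of each candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets to form (k+1)-candidates;
        # infrequent itemsets are pruned early (the Apriori property).
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == k + 1})
        k += 1
    return frequent

freq = apriori(transactions, min_support)
# e.g. {milk, bread} appears in 3 of the 4 transactions, so it survives;
# {eggs} appears only once and is pruned at level 1.
```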
8. Define the terms frequent item sets, closed item sets and association rules?
A. - Frequent Item Sets: Groups of items that appear together frequently in transactions
(e.g., {milk, bread}).
- Closed Item Sets: Itemsets that have no proper superset with the same support; a closed frequent itemset is both closed and frequent.
- Association Rules: Implication rules of the form X → Y, meaning if X occurs, Y is likely to
occur (e.g., milk → bread).
These are used in market basket analysis to find relationships among items.
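The strength of a rule X → Y is usually measured by its support and confidence. A small sketch of both measures, using made-up transactions (confidence(X → Y) = support(X ∪ Y) / support(X)):

```python
# Illustrative support/confidence computation for a rule X -> Y.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    # Fraction of transactions containing every item in the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y, transactions):
    # confidence(X -> Y) = support(X union Y) / support(X)
    return support(X | Y, transactions) / support(X, transactions)

s = support({"milk", "bread"}, transactions)      # 2/4 = 0.5
c = confidence({"milk"}, {"bread"}, transactions)  # 0.5 / 0.75 = 2/3
```

A rule such as milk → bread is reported only when both measures exceed user-specified minimum thresholds.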
9. Explain each of the following clustering algorithms in terms of the following
criteria: (i) shapes of clusters that can be determined; (ii) input parameters that
must be specified; and (iii) limitations. (a) k-means (b) k-medoids (c) CLARA
A. (a) K-Means:
(i) Assumes spherical clusters
(ii) Requires number of clusters (k)
(iii) Sensitive to outliers and initial seed
(b) K-Medoids (PAM):
(i) Tends to find compact, roughly spherical clusters; centers (medoids) are actual data points
(ii) Requires number of clusters (k)
(iii) More robust to noise and outliers than k-means, but computationally expensive on large datasets
(c) CLARA:
(i) Same cluster shapes as k-medoids, since it applies PAM to samples of the data
(ii) Requires number of clusters (k) and the sample size
(iii) Quality depends on the samples drawn; good medoids may be missed if the sample is not representative
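The k-means criteria above can be seen in a minimal implementation. This is an illustrative sketch on 2-D points (the data, k, and iteration count are chosen here): note that it takes k as an input parameter and that the result depends on the random initial seeds.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # k must be specified; k-means is sensitive to these initial seeds.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance favors spherical clusters).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster
        # (means are pulled by outliers, hence the outlier sensitivity).
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers

# Two well-separated groups around (1, 1) and (8, 8).
points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (8, 8), (8.2, 7.9), (7.8, 8.1)]
centers = kmeans(points, k=2)
```

K-medoids would differ only in the update step, choosing the cluster member that minimizes total dissimilarity instead of the mean; CLARA would run that procedure on random samples of the points.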
10. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
Compute (a) the Euclidean distance between the two objects; (b) the Manhattan distance between the two objects; (c) the Minkowski distance between the two objects, using p = 3.
A. Let A = (22, 1, 42, 10), B = (20, 0, 36, 8)
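The arithmetic can be checked with a short script (a verification aid, not part of the original answer). All three are instances of the Minkowski distance, (Σ|aᵢ − bᵢ|ᵖ)^(1/p), with p = 2, p = 1, and p = 3 respectively; the attribute differences are |2|, |1|, |6|, |2|.

```python
# Check the three distances for A = (22, 1, 42, 10), B = (20, 0, 36, 8).
A = (22, 1, 42, 10)
B = (20, 0, 36, 8)

def minkowski(a, b, p):
    # Minkowski distance: (sum of |a_i - b_i|^p) ** (1/p).
    # p = 2 gives Euclidean distance, p = 1 gives Manhattan distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

euclidean = minkowski(A, B, 2)  # sqrt(4 + 1 + 36 + 4) = sqrt(45) ~ 6.708
manhattan = minkowski(A, B, 1)  # 2 + 1 + 6 + 2 = 11
mink_p3 = minkowski(A, B, 3)    # (8 + 1 + 216 + 8)**(1/3) = 233**(1/3) ~ 6.153
```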