Assignment

The document poses 11 questions related to data mining concepts and techniques. It covers topics such as the differences and similarities between data warehouses and databases, challenges of mining large datasets, definitions of various data mining functionalities, examples of descriptive statistics, data normalization methods, receiver operating characteristic (ROC) curves, decision tree pruning, clustering algorithms and constraints, considerations for implementing real-world data mining applications, and frequent itemset mining.

Uploaded by

Yrga Weldegiwergs


1) How is a data warehouse different from a database? How are they similar?

2) What are the major challenges of mining a huge amount of data (e.g., billions of tuples) in
comparison with mining a small amount of data (e.g., a data set of a few hundred tuples)?
3) Define each of the following data mining functionalities: characterization, discrimination,
association and correlation analysis, classification, regression, clustering, and outlier analysis.
Give examples of each data mining functionality, using a real-life database that you are familiar
with.
4) Suppose that the data for analysis includes the attribute age. The age values for the data tuples
are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35,
35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.)
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
(g) How is a quantile–quantile plot different from a quantile plot?
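
The statistics asked for in parts (a) to (e) can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module, not a model answer; the variable names are illustrative, and the quartile convention (the module's default "exclusive" method) is an assumption, since textbooks differ on how Q1 and Q3 are interpolated.

```python
import statistics

# Age values from question 4, already in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)
median = statistics.median(ages)
modes = statistics.multimode(ages)        # every value tied for most frequent
midrange = (min(ages) + max(ages)) / 2    # average of smallest and largest value

# Assumption: quartiles use the "exclusive" method (the module default);
# other conventions can give slightly different Q1 and Q3.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(ages), q1, median, q3, max(ages))

print(f"mean={mean:.2f} median={median} modes={modes} midrange={midrange}")
print("five-number summary:", five_number_summary)
```

If `modes` contains two values, the data is bimodal; three values would make it trimodal. The boxplot in part (f) can be drawn directly from the five-number summary.
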

5) Use these methods to normalize the following group of data: 200, 300, 400, 600, 1000.
(a) min-max normalization by setting min = 0 and max = 1
(b) z-score normalization
(c) z-score normalization using the mean absolute deviation instead of standard deviation
(d) normalization by decimal scaling
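
The four normalization methods can be sketched as follows. This is an illustrative sketch rather than a reference solution. Two assumptions are made that the question does not state: the z-score uses the population standard deviation (dividing by n, not n - 1), and decimal scaling divides by the smallest power of ten that brings every absolute value below 1.

```python
import math

data = [200, 300, 400, 600, 1000]
n = len(data)
mean = sum(data) / n

# (a) Min-max normalization onto the new range [0, 1].
lo, hi = min(data), max(data)
min_max = [(v - lo) / (hi - lo) for v in data]

# (b) z-score. Assumption: population standard deviation (divide by n).
std = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
z_std = [(v - mean) / std for v in data]

# (c) z-score using the mean absolute deviation in place of the standard deviation.
mad = sum(abs(v - mean) for v in data) / n
z_mad = [(v - mean) / mad for v in data]

# (d) Decimal scaling: divide by 10**j, with j the smallest integer
# such that the largest absolute scaled value is below 1.
max_abs = max(abs(v) for v in data)
j = 0
while max_abs / 10 ** j >= 1:
    j += 1
decimal_scaled = [v / 10 ** j for v in data]

for name, values in [("min-max", min_max), ("z-score", z_std),
                     ("z-score (MAD)", z_mad), ("decimal", decimal_scaled)]:
    print(f"{name:14s}", [round(v, 3) for v in values])
```

Using the sample standard deviation (dividing by n - 1) in part (b) would change the z-scores slightly, so it is worth stating which convention is used.
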

6) The data tuples from the table below are sorted by decreasing probability value, as returned by a
classifier. For each tuple, compute the values for the number of true positives (TP), false
positives (FP), true negatives (TN) and false negatives (FN). Compute the true positive rate
(TPR) and false positive rate (FPR). Plot the ROC curve for the data.
Tuple  Class  Probability
1      P      0.95
2      N      0.85
3      P      0.78
4      P      0.66
5      N      0.60
6      P      0.55
7      N      0.53
8      N      0.52
9      N      0.51
10     P      0.40

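
The per-tuple counts can be tabulated with a short loop. This is a sketch under one assumption the question leaves implicit: for each row, the decision threshold sits at that tuple's probability, so that tuple and every tuple ranked above it are predicted positive. The variable names are illustrative, and plotting is left out to keep the example free of third-party libraries.

```python
# (class label, probability) pairs in the order given (decreasing probability).
tuples = [("P", 0.95), ("N", 0.85), ("P", 0.78), ("P", 0.66), ("N", 0.60),
          ("P", 0.55), ("N", 0.53), ("N", 0.52), ("N", 0.51), ("P", 0.40)]

total_pos = sum(1 for label, _ in tuples if label == "P")
total_neg = len(tuples) - total_pos

rows = []
tp = fp = 0
for label, prob in tuples:
    # Assumption: this tuple and all tuples above it are predicted positive.
    if label == "P":
        tp += 1
    else:
        fp += 1
    fn = total_pos - tp   # positives still below the threshold
    tn = total_neg - fp   # negatives still below the threshold
    rows.append((tp, fp, tn, fn, tp / total_pos, fp / total_neg))

# ROC points as (FPR, TPR), starting from the origin.
roc_points = [(0.0, 0.0)] + [(fpr, tpr) for *_, tpr, fpr in rows]

print("TP FP TN FN  TPR  FPR")
for tp_i, fp_i, tn_i, fn_i, tpr, fpr in rows:
    print(f"{tp_i:2d} {fp_i:2d} {tn_i:2d} {fn_i:2d}  {tpr:.1f}  {fpr:.1f}")
```

Plotting `roc_points` with FPR on the horizontal axis and TPR on the vertical axis, and joining consecutive points, gives the ROC curve.
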
7) Given a decision tree, you have the option of (a) converting the decision tree to rules and
then pruning the resulting rules, or (b) pruning the decision tree and then converting the
pruned tree to rules. What advantage does (a) have over (b)?

8) Briefly describe and give examples of each of the following approaches to clustering:
partitioning methods, hierarchical methods, density-based methods, and grid-based
methods.

9) Suppose that you are to allocate a number of automatic teller machines (ATMs) in a given
region so as to satisfy a number of constraints. Households or workplaces may be clustered
so that typically one ATM is assigned per cluster. The clustering, however, may be
constrained by two factors: (1) obstacle objects (i.e., there are bridges, rivers, and highways
that can affect ATM accessibility), and (2) additional user-specified constraints such as that
each ATM should serve at least 10,000 households. How can a clustering algorithm such as
k-means be modified for quality clustering under both constraints?

10) Choose any real-world data mining application. What major considerations will you
follow when implementing your model?

11) Using the database given below, generate all possible candidate itemsets and frequent
itemsets, where the minimum support count is 2.

TID  Items
100  1, 3, 4
200  2, 3, 5
300  1, 2, 3, 5
400  2, 5
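
A level-wise (Apriori-style) search is one way to enumerate the candidates and frequent itemsets. The sketch below is illustrative only and rests on two assumptions: each Items entry is read as a set of single-digit items (so transaction 100 contains items 1, 3 and 4), and candidates of size k are listed after Apriori pruning, meaning a candidate is kept only if all of its (k-1)-item subsets are frequent. The function and variable names are hypothetical.

```python
from itertools import combinations

# Assumption: each Items entry is a set of single-digit items.
transactions = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}
MIN_SUPPORT = 2  # minimum support count from the question


def support(itemset):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori():
    """Return one (candidate counts, frequent counts) pair per itemset size."""
    levels = []
    candidates = {frozenset([i]) for items in transactions.values() for i in items}
    k = 1
    while candidates:
        counts = {c: support(c) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= MIN_SUPPORT}
        levels.append((counts, frequent))
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
    return levels


levels = apriori()
for size, (counts, frequent) in enumerate(levels, start=1):
    print(f"C{size}:", sorted((sorted(c), n) for c, n in counts.items()))
    print(f"L{size}:", sorted((sorted(c), n) for c, n in frequent.items()))
```

Each `C` line lists the candidate itemsets of that size with their support counts, and each `L` line lists the ones that meet the minimum support count.
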
