DMBI Questions
Module 1
• Draw the data warehouse architecture.
• Compare and contrast OLTP and OLAP.
• What is data mining? Explain the KDD process with a diagram.
• Compare the star schema, snowflake schema and star constellation.
• Short note on Dimensional Modeling.
• Define a data warehouse. Describe the different OLAP operations in detail.
• Compare the star schema, snowflake schema and fact constellation.
• Explain OLAP operations with examples.
• Explain the knowledge discovery process with a diagram.
• What are the major issues in data mining?
• What is data mining? Explain the KDD process with a diagram.
• Demonstrate the KDD process with a diagram.
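For the schema and dimensional-modeling questions, a star schema can be sketched with Python's built-in sqlite3 module. All table names, columns and rows below are hypothetical: one fact table (fact_sales) holds the measures and the foreign keys, and each dimension is a single denormalised table joined directly to the fact table.

```python
import sqlite3

# Hypothetical star schema: fact_sales references three denormalised dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, category TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO dim_time VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO dim_item VALUES (1, 'Laptop', 'BrandX', 'Computer'), (2, 'Phone', 'BrandY', 'Mobile');
INSERT INTO dim_location VALUES (1, 'Mumbai', 'India'), (2, 'Pune', 'India');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 10, 5000.0), (1, 2, 2, 20, 4000.0),
    (2, 1, 2,  5, 2500.0), (2, 2, 1, 15, 3000.0);
""")

# A typical "star join": aggregate a fact measure grouped by dimension attributes.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t     ON f.time_key = t.time_key
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
for row in rows:
    print(row)
```

In a snowflake schema, dim_location would itself be normalised (for example, country moved into its own table referenced by a key), which adds one more join to the query. A fact constellation would add a second fact table that shares these same dimension tables.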
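For the OLAP-operations questions, the main operations (roll-up, drill-down, slice, dice, pivot) can be imitated on a tiny in-memory cube. This is a minimal sketch, assuming hypothetical sales facts and a hypothetical city-to-region concept hierarchy; a real OLAP server works on precomputed aggregates rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, quarter, item, dollars_sold).
facts = [
    ("Mumbai", "Q1", "Laptop", 5000), ("Mumbai", "Q2", "Phone", 3000),
    ("Pune",   "Q1", "Phone",  4000), ("Pune",   "Q2", "Laptop", 2500),
    ("Delhi",  "Q1", "Laptop", 6000),
]
# Hypothetical concept hierarchy for the location dimension: city -> region.
region_of = {"Mumbai": "West", "Pune": "West", "Delhi": "North"}

def aggregate(rows, key):
    """Sum dollars_sold for each group produced by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Roll-up: climb the location hierarchy from city to region.
rollup = aggregate(facts, lambda r: (region_of[r[0]], r[1]))
# Drill-down: the reverse, back to the finer (city, quarter) level.
drilldown = aggregate(facts, lambda r: (r[0], r[1]))
# Slice: fix a single dimension value (quarter = Q1).
slice_q1 = [r for r in facts if r[1] == "Q1"]
# Dice: select on two or more dimensions at once.
dice = [r for r in facts if r[0] in {"Mumbai", "Pune"} and r[2] == "Laptop"]
# Pivot (rotate): make quarter the row label and city the column label.
pivot = defaultdict(dict)
for (city, quarter), total in drilldown.items():
    pivot[quarter][city] = total

print("roll-up:", rollup)
print("slice:", slice_q1)
print("dice:", dice)
print("pivot:", dict(pivot))
```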
Module 2
• What is noisy data? How can noisy data be handled? (2)
• Suppose the ages of 29 survey participants are given in sorted order:
5, 10, 13, 15, 16, 16, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85
Explain how to calculate the mean, median, standard deviation, and first and third quartiles for this data, and compute each of them.
Show the box-and-whisker plot for this data.
• (2) Suppose the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70
i) What is the mean of the data? What is the median?
ii) What is the mode of the data? Comment on the data's modality.
iii) What is the midrange of the data?
iv) Give the five-number summary of the data.
v) Show a box plot of the data.
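For the noisy-data question, one common remedy is smoothing by binning (regression and clustering-based outlier removal are others). The sketch below uses equal-depth (equal-frequency) bins; the price values and the function names are hypothetical, chosen only to show the mechanics.

```python
def equal_depth_bins(values, depth):
    """Sort the values and cut them into bins of `depth` values each (last bin may be smaller)."""
    values = sorted(values)
    return [values[i:i + depth] for i in range(0, len(values), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to the min)."""
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # hypothetical data
bins = equal_depth_bins(prices, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
```

Smoothing by means flattens each bin to a single value, while smoothing by boundaries keeps the bin's extremes and snaps interior values to the nearer one.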
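For the 29-participant age question, the requested statistics can be checked with Python's standard statistics module. Two conventions are assumptions here: quartiles use the module's default "exclusive" method (position k(N+1)/4 with linear interpolation), and the whiskers use the common 1.5 × IQR rule. Both population and sample standard deviations are shown, since courses differ on which one they expect.

```python
import statistics

# Ages of the 29 participants, as listed in the question (already sorted).
ages = [5, 10, 13, 15, 16, 16, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85]

mean = statistics.mean(ages)          # sum(ages) / N = 890 / 29
median = statistics.median(ages)      # N is odd, so the 15th value
pop_sd = statistics.pstdev(ages)      # sqrt(sum((x - mean)^2) / N)
sample_sd = statistics.stdev(ages)    # same, but divided by N - 1

# Default 'exclusive' method: Qk sits at position k * (N + 1) / 4, interpolated.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Box-and-whisker values using the 1.5 * IQR rule for the fences.
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [a for a in ages if low_fence <= a <= high_fence]
whiskers = (min(inside), max(inside))
outliers = [a for a in ages if a < low_fence or a > high_fence]

print(f"mean={mean:.2f} median={median} Q1={q1} Q3={q3} IQR={iqr}")
print(f"population SD={pop_sd:.2f} sample SD={sample_sd:.2f}")
print(f"whiskers={whiskers} outliers={outliers}")
```

Here Q1 = 20 and Q3 = 35.5 (taking the medians of the lower and upper 14 values gives the same result), so IQR = 15.5 and the fences are -3.25 and 58.75. The box runs from 20 to 35.5 with the median line at 25, the whiskers end at 5 and 52, and 70 and 85 are drawn as individual outlier points.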
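For the 27-value age question, parts i) to v) can be worked the same way. This sketch assumes Q1 and Q3 are the medians of the lower and upper halves with the overall median left out (one common textbook convention; statistics.quantiles gives the same 20 and 35 for this data), and it uses the 1.5 × IQR rule for the box-plot whiskers.

```python
from collections import Counter
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)                  # 809 / 27
median = statistics.median(ages)              # N = 27 is odd, so the 14th value

counts = Counter(ages)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)   # every value tied for the top count

midrange = (min(ages) + max(ages)) / 2

# Q1 and Q3 as medians of the lower and upper 13 values (overall median excluded).
half = len(ages) // 2
q1 = statistics.median(ages[:half])
q3 = statistics.median(ages[-half:])
five_number = (min(ages), q1, median, q3, max(ages))

# Values beyond the 1.5 * IQR fences are drawn as separate points on the box plot.
iqr = q3 - q1
outliers = [a for a in ages if a < q1 - 1.5 * iqr or a > q3 + 1.5 * iqr]

print(f"mean={mean:.2f} median={median} modes={modes} midrange={midrange}")
print(f"five-number summary={five_number} outliers={outliers}")
```

Two values (25 and 35) share the highest count of 4, so the data is bimodal. With IQR = 15 the upper fence is 57.5, so the upper whisker stops at 52 and 70 is plotted as an outlier.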
Module 3
• Explain the concepts of information gain and the Gini index used in decision tree algorithms.
• Consider the training dataset given below. Use the Naive Bayes algorithm to determine whether it is advisable to play tennis on a day with hot temperature, rainy outlook, high humidity and no wind.
• Short note on the Random Forest technique.
• Short note on Decision tree induction.
• Short note on cross validation.
• Apply the Naive Bayes classifier algorithm to the dataset given below and classify the unknown data sample.
Given all the previous patients I've seen (their symptoms and diagnoses are listed below)
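For the information-gain and Gini question, both measures can be computed directly from class counts. The data below (14 tuples, 9 "yes" and 5 "no", with a hypothetical attribute wind) is illustrative only, and the split helper is a multiway split on every attribute value; CART itself evaluates binary splits for the Gini index.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum_i p_i * log2(p_i), with p_i the proportion of class i in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def partitions(values, labels):
    """Group the class labels by the attribute value of each tuple."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - sum_j |D_j| / |D| * Info(D_j)."""
    n = len(labels)
    info_a = sum(len(g) / n * entropy(g) for g in partitions(values, labels))
    return entropy(labels) - info_a

def gini_after_split(values, labels):
    """Weighted Gini of the partitions induced by attribute A (multiway split)."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partitions(values, labels))

# Hypothetical data: wind = weak for 8 tuples (6 yes, 2 no), strong for 6 (3 yes, 3 no).
labels = ["yes"] * 6 + ["no"] * 2 + ["yes"] * 3 + ["no"] * 3
wind = ["weak"] * 8 + ["strong"] * 6

print(f"Info(D) = {entropy(labels):.3f}, Gini(D) = {gini(labels):.3f}")
print(f"Gain(wind) = {information_gain(wind, labels):.3f}")
print(f"Gini after split on wind = {gini_after_split(wind, labels):.3f}")
```

ID3 picks the attribute with the largest gain, while a Gini-based learner picks the split with the largest drop from Gini(D) to the weighted Gini after the split. Here the gain is only about 0.048, so wind would be a weak splitting attribute.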
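For the play-tennis Naive Bayes question, the classifier only needs class counts and per-class value counts. The question's own table is not reproduced here, so the sketch uses a hypothetical 14-row table in the style of the familiar play-tennis example, and it assumes "no wind" means windy = False. No Laplace smoothing is applied, so an unseen value gives a zero score.

```python
from collections import Counter, defaultdict

# Hypothetical training table (not the question's): (outlook, temperature, humidity, windy, play)
data = [
    ("sunny", "hot", "high", False, "no"),      ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),  ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),  ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),  ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),   ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no"),
]

def train(rows):
    """Count class frequencies and, per class, the frequency of each attribute value."""
    priors = Counter(row[-1] for row in rows)
    counts = defaultdict(Counter)            # (attribute index, class) -> Counter of values
    for *features, label in rows:
        for i, value in enumerate(features):
            counts[(i, label)][value] += 1
    return priors, counts

def class_scores(priors, counts, query):
    """P(C) * prod_i P(x_i | C) for every class C (unnormalised posteriors)."""
    n = sum(priors.values())
    scores = {}
    for label, n_c in priors.items():
        score = n_c / n
        for i, value in enumerate(query):
            score *= counts[(i, label)][value] / n_c
        scores[label] = score
    return scores

priors, counts = train(data)
# Rainy outlook, hot temperature, high humidity, no wind.
scores = class_scores(priors, counts, ("rainy", "hot", "high", False))
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

With this hypothetical table the scores are about 0.0106 for "yes" and 0.0183 for "no", so the sketch predicts "no"; the table given in the question may lead to a different answer.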
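For the Random Forest short note, the two sources of randomness (a bootstrap sample per tree and a random subset of features) and the majority vote can be shown with a deliberately simplified forest. As an assumption for brevity, each "tree" here is a one-split decision stump and the feature subset is drawn once per tree; a real random forest grows deep trees and redraws candidate features at every node. All function names and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, features):
    """Depth-1 tree: the threshold split with the fewest training errors over `features`."""
    best = None
    for f in features:
        for t in sorted({x[f] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if not right:                    # threshold at the maximum: not a real split
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:                         # all sampled values identical: constant predictor
        label = majority(y)
        return (features[0], float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample (with replacement)
        feats = rng.sample(range(len(X[0])), n_features)       # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    return majority(predict_stump(s, x) for s in forest)       # majority vote over all trees

# Hypothetical, well-separated 2-D data.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5), (8, 8), (8, 9), (9, 8), (9, 9), (8.5, 8.5)]
y = ["a"] * 5 + ["b"] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```

Averaging many trees trained on different bootstrap samples and feature subsets reduces variance compared with a single tree, which is the main argument for the technique.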
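For the cross-validation short note, k-fold cross-validation splits the data into k folds, trains on k - 1 of them, tests on the remaining one, and averages the k test scores. This is a minimal sketch with contiguous, unshuffled folds and a hypothetical majority-class model; in practice the data is usually shuffled (or stratified by class) before splitting.

```python
from collections import Counter

def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]   # first n % k folds get one extra
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def cross_val_accuracy(fit, predict, X, y, k):
    """Mean test accuracy over the k folds; every sample is tested exactly once."""
    scores = []
    for train, test in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Hypothetical baseline model: always predict the majority class of the training folds.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

X = list(range(10))
y = ["a"] * 7 + ["b"] * 3
accuracy = cross_val_accuracy(fit_majority, predict_majority, X, y, k=5)
print(accuracy)
```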
Module 4
• What is an outlier? Explain various methods for performing outlier analysis.
• Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9).
Assume the initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2). The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as P(a, b) = |x2 - x1| + |y2 - y1|.
Use the K-means algorithm to find the three cluster centers after the second iteration.
• Short note on DBSCAN Algorithm.
• Suppose we have six objects named A, B, C, D, E and F. Apply single-linkage clustering to the given data and draw the dendrogram.
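For the outlier-analysis question, outlier detection approaches are commonly grouped into statistical, distance-based and density-based methods. The distance-based idea is easy to sketch: an object is a DB(pct, dmin) outlier if at least a fraction pct of the other objects lie farther than dmin from it. The points and parameter values below are hypothetical.

```python
import math

def distance_based_outliers(points, pct, dmin):
    """Return the points for which at least a fraction `pct` of the *other*
    points lie at a distance greater than dmin."""
    outliers = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        far = sum(math.dist(p, q) > dmin for q in others)
        if far >= pct * len(others):
            outliers.append(p)
    return outliers

points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8)]   # hypothetical data
print(distance_based_outliers(points, pct=0.9, dmin=3))
```

This nested-loop version is O(n²) in the number of points; index-based or cell-based variants are used for larger data sets.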
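For the K-means question with eight points, the two iterations can be traced with the exact points, initial centres and Manhattan distance P(a, b) given in the question. As in the usual textbook solution, each new centre is the coordinate-wise mean of its members. The helper names are hypothetical.

```python
# Points and initial centres exactly as given in the question.
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centres = [points["A1"], points["A4"], points["A7"]]

def manhattan(a, b):
    """P(a, b) = |x2 - x1| + |y2 - y1|, the distance defined in the question."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def kmeans_iteration(points, centres):
    """One K-means pass: assign each point to its nearest centre, then recompute the centres."""
    clusters = [[] for _ in centres]
    for name, p in points.items():
        nearest = min(range(len(centres)), key=lambda i: manhattan(p, centres[i]))
        clusters[nearest].append(name)
    new_centres = []
    for members in clusters:        # assumes no cluster becomes empty (true for this data)
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centres.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centres

for iteration in (1, 2):
    clusters, centres = kmeans_iteration(points, centres)
    print(f"after iteration {iteration}: {clusters} centres={centres}")
```

After the first iteration the clusters are {A1}, {A3, A4, A5, A6, A8} and {A2, A7}, with centres (2, 10), (6, 6) and (1.5, 3.5). In the second iteration A8 moves to the first cluster, giving the centres (3, 9.5), (6.5, 5.25) and (1.5, 3.5).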
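For the DBSCAN short note, the algorithm grows clusters from core points, which have at least MinPts neighbours within radius eps. Points reachable from a core point but not core themselves become border points, and the remaining points are noise. This is a minimal sketch, assuming MinPts counts the point itself (conventions vary) and using hypothetical 2-D points. The first point is initially marked as noise and later becomes a border point.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)          # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1                 # noise for now; may later become a border point
            continue
        cluster += 1                       # unvisited core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:            # former noise reached from a core point: border point
                labels[j] = cluster
                continue
            if labels[j] is not None:      # already assigned to a cluster
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:   # j is also a core point: keep expanding
                seeds.extend(j_neighbours)
    return labels

points = [(1.6, 1.0), (1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (5, 5), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9), (9, 1)]   # hypothetical data
print(dbscan(points, eps=0.5, min_pts=3))
```

Unlike K-means, DBSCAN does not need the number of clusters in advance and can find arbitrarily shaped clusters, but the result is sensitive to the choice of eps and MinPts.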
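For the single-linkage question, agglomerative clustering repeatedly merges the two clusters whose closest members are nearest, and the merge distances are the heights at which the dendrogram joins branches. The question's data for A to F is not reproduced here, so the coordinates below are hypothetical.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Agglomerative clustering with single linkage.
    Returns the merge history as (cluster, cluster, merge distance) tuples."""
    def link(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        left, right = sorted([sorted(c1), sorted(c2)])
        merges.append((left, right, link(c1, c2)))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
    return merges

# Hypothetical 2-D coordinates for the six objects.
points = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (7, 0), "E": (7, 3), "F": (12, 0)}
for left, right, height in single_linkage(points):
    print(f"merge {left} + {right} at height {height}")
```

Reading the merges from bottom to top gives the dendrogram: A and B join at 1, C and D at 2, E joins {C, D} at 3, {A, B} joins {C, D, E} at 4, and F joins last at 5.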
Module 6
• What is Business Intelligence (BI)? Explain its architecture in detail.
• How is data mining used in Business Intelligence (BI)?
• What is BI? Define a decision support system.
• Explain the issues in Business Intelligence.
• Define BI and give its architecture. Explain any business
application where data mining can be used.