
DMBI Questions

The document discusses data mining and business intelligence concepts across 6 modules. It covers topics such as data warehousing, OLAP, data preprocessing, classification algorithms like decision trees and naive bayes, clustering techniques including k-means and DBSCAN, association rule mining with the apriori algorithm, and applications of data mining in business intelligence.

Uploaded by

Manthan

DMBI

Module 1
• Draw the data warehousing architecture.
• Compare and contrast OLTP and OLAP.
• What is data mining? Explain the KDD process with a diagram.
• Compare star schema, snowflake schema and star
constellation.
• Short note on Dimensional Modeling.
• Define data warehouse. Describe different OLAP operations in
detail.
• Compare star schema, snowflake schema and fact
constellation.
• Explain OLAP operations with examples.
• Explain the knowledge discovery process with diagram.
• What are the major issues in data mining?
• What is data mining? Explain KDD process with diagram.
• Demonstrate with a diagram the process of KDD.
Module 2
• What is noisy data? How is noisy data handled? (2)
• Consider the ages of 29 participants in a survey, given
in sorted order: 5, 10, 13, 15, 16, 16, 20, 20, 21, 22, 22, 25, 25,
25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70, 85.
Explain how to calculate the mean, median, standard deviation, 1st
and 3rd quartiles for the given data, and compute them.
Show the box and whisker plot for this data.
• (2) Suppose the data for analysis includes the attribute age. The
age values for data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
35, 35, 35, 35, 36, 40, 45, 46, 52, 70
i) What is mean of data? What is median of data?
ii) What is mode of data? Comment on data's modality.
iii) What is mid range of data?
iv) Give the five point summary of the data.
v) Show box plot of the data.

• Describe any two methods of data reductions.


• Use the normalization methods to normalize the following group
of data: 200, 300, 400, 600, 1000.
Use min-max normalization with min = 0 and max = 1, and
z-score normalization.
• Give any two techniques of data preprocessing.
• Suppose a group of 12 sales price records has been stored as
follows: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110.
Find the mean, median, mode and interquartile range (IQR).
• What is an attribute? Explain its types.
• Describe the different types of attributes one may come across
in data mining with two examples of each.
• Find the mean, median and mode for the given data. Show the box plot.
11, 13, 13, 15, 15, 16, 19, 20, 20, 21, 21, 22, 23, 24, 30, 40, 45, 45, 45.
• What is the need for pre-processing? Explain the different steps
involved in data pre-processing.
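
The descriptive-statistics and normalization questions in this module can be checked with a short script. The following is a minimal sketch, not a prescribed solution: it assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (other textbook conventions give slightly different values), and it assumes the population standard deviation for z-scores. The function names are illustrative only.

```python
import statistics


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumption: the overall median is excluded from both halves
    when the number of values is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    return statistics.median(data[:half]), statistics.median(data[-half:])


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_scores(values):
    """Standardize values using the population standard deviation (assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# The 12 sales price records from the question above.
prices = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]
q1, q3 = quartiles(prices)
print("mean:", statistics.mean(prices))
print("median:", statistics.median(prices))
print("modes:", statistics.multimode(prices))
print("Q1, Q3, IQR:", q1, q3, q3 - q1)

# The group of data from the normalization question.
group = [200, 300, 400, 600, 1000]
print("min-max:", min_max(group))
print("z-score:", [round(z, 3) for z in z_scores(group)])
```

Under these assumptions the sales data has mean 58, median 54, two modes (52 and 70), and an IQR of 18. The same functions can be applied to the age data sets in this module.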

Module 3
• Explain concept of information gain and gini value used in
decision tree algorithm.
• Consider Training dataset as given below. Use Naive Bayes
Algorithm to determine whether it is advisable to play tennis on
a day with hot temperature, rainy outlook, high humidity and
no wind?
• Short note on Random Forest technique.
• Short note on Decision tree induction.
• Short note on cross validation.
• Apply Naive Bayes classifier algorithm to the dataset given
below, and classify the unknown data sample?
Given all the previous patients I've seen (below are their
symptoms and their diagnosis),

do I believe that a patient with the following symptoms has the flu?


• Briefly explain Bagging and Boosting of Classifiers.
• Write and explain Bayes Classification algorithm.
• Write the steps of Ada-boost algorithm.
• Describe the classification performance evaluation measures
that are obtained from a confusion matrix.
• Explain the confusion matrix with one example.
• Write a short note on Naïve Bayesian classification.
• Explain bagging technique.
• Explain Confusion Matrix. Calculate Accuracy, Precision and
Recall for the following Confusion Matrix.

• Explain regression. Explain linear regression with example.


• Using the given training dataset classify the following tuple
using Naïve Bayes Algorithm:
<Homeowner: No, Marital Status: Married, Job experience:3>
• Illustrate any one classification technique for the following
dataset. Show how we can classify new
tuple(HOMEOWNER=Yes, Status= Employed, Income=Average)

• Explain different methods that can be used to evaluate and
compare the accuracy of different classification algorithms.
• Explain simple linear regression with example.
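
For the confusion-matrix questions in this module, the three measures asked for can be sketched as below. The matrix from the original question is not reproduced in this document, so the counts used here are hypothetical and serve only to illustrate the formulas for a two-class problem.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # fraction of all predictions that are correct
    precision = tp / (tp + fp)        # fraction of predicted positives that are correct
    recall = tp / (tp + fn)           # fraction of actual positives that are found
    return accuracy, precision, recall


# Hypothetical counts, NOT taken from the question paper.
accuracy, precision, recall = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```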

Module 4
• What is an outlier? Explain various methods for performing
outlier analysis.
• Cluster the following eight points (with (x, y) representing
locations) into three clusters: A1(2, 10), A2(2, 5), A3(8, 4),
A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9). Assume the initial
cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2). The
distance function between two points a = (x1, y1) and b = (x2, y2)
is defined as P(a, b) = |x2 - x1| + |y2 - y1|.
Use the K-Means algorithm to find the three cluster centres after
the second iteration.
• Short note on DBSCAN Algorithm.
• Suppose we have six objects with name A, B, C, D, E, F. Apply
single linkage clustering and dendrogram for the given data.

• What is an outlier? Describe methods used for outlier analysis.


• Give the overview of partition clustering methods.
• Give the steps of K means clustering algorithm.
• Explain concept hierarchy with example.
• Explain density based clustering.
• What do you mean by outlier? Give the types of it.
• Apply K-means Algorithm to divide the given set of values
{2,3,6,8,9,12,15,18,22} into 3 clusters .
• Suppose we have five objects with name A, B, C, D & E. Apply
single linkage clustering and draw dendrogram for the given
data.

• What is an outlier? Describe methods that are used for outlier
analysis.
• Use k means clustering to cluster the following data into 2
clusters. 2,3,4,10,11,12,20,25,30.

• Explain DBSCAN algorithm with example.
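
The eight-point K-Means question in this module can be traced with the sketch below. It assumes the usual exercise convention: points are assigned to the nearest centre under the given distance |x2 - x1| + |y2 - y1|, and each centre is then recomputed as the mean of its cluster. Ties go to the lowest-numbered centre, and an empty cluster keeps its old centre; both choices are assumptions of this sketch.

```python
def manhattan(a, b):
    """Distance from the question: |x2 - x1| + |y2 - y1|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def k_means(points, centers, iterations):
    """Run a fixed number of assign-then-update K-Means iterations."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: manhattan(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers, clusters


points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
initial = [(2, 10), (5, 8), (1, 2)]  # A1, A4, A7

centers, clusters = k_means(points, initial, iterations=2)
print(centers)   # [(3.0, 9.5), (6.5, 5.25), (1.5, 3.5)]
print(clusters)
```

After the second iteration the clusters are {A1, A8}, {A3, A4, A5, A6} and {A2, A7}.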


Module 5
• Explain market basket analysis with an example. (3)
• Use the Apriori algorithm to identify the frequent item-sets in
the following database. Then extract the strong association
rules from these sets. Assume Min. Support = 50% Min.
Confidence = 75%

• Explain multi-level and multi-dimensional association rules with
example. (3)
• For the table given , apply Apriori algorithm and show frequent
item set and strong association rules. Assume Minimum
support of 30% and Minimum confidence of 70%.

• How can we further improve the efficiency of Apriori-based
mining?
• Explain how the efficiency of Apriori algorithm is improved.
• Consider the transaction database given in table below. Apply
Apriori Algorithm with minimum support of 50% and
confidence of 50%. Find all frequent itemsets and all the
association rules.

• What is market basket analysis? Give the Apriori algorithm.
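
The Apriori questions in this module refer to transaction tables that are not reproduced in this document. The sketch below therefore runs on a small illustrative database of four transactions, which is an assumption and not the table from the question paper. It shows the level-wise join and prune steps, followed by rule generation, with minimum support 50% and minimum confidence 75%.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    min_count = min_support * len(transactions)
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting minimum support.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))}
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


# Illustrative transactions only (assumed, not from the question paper).
transactions = [{"A", "C", "D"}, {"B", "C", "E"},
                {"A", "B", "C", "E"}, {"B", "E"}]

frequent = apriori(transactions, min_support=0.5)
rules = strong_rules(frequent, min_confidence=0.75)
for lhs, rhs, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), round(confidence, 2))
```

On this illustrative data the only frequent 3-itemset is {B, C, E}, and five rules meet the confidence threshold: A -> C, B -> E, E -> B, {B, C} -> E and {C, E} -> B.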

Module 6
• What is Business Intelligence (BI)? Explain its architecture in
detail.
• How is data mining used in Business Intelligence (BI)?
• What is BI? Define a decision support system.
• Explain Business Intelligence issues.
• Define BI and give its architecture. Explain any business
application where data mining can be used.
