Question Bank Semester: IV Sem Subject: Data Science Sub Code: 17MCA441

This document contains a question bank for the Data Science subject for Semester IV. It is divided into 5 modules. Module 1 covers topics related to data mining including classification and clustering. Module 2 discusses data types, attributes, preprocessing and dimensionality reduction. Module 3 focuses on association rule mining and frequent item set generation. Module 4 is about decision trees, rule-based classifiers, nearest neighbor algorithms and Bayesian classifiers for classification. Module 5 covers different clustering algorithms like K-means, hierarchical and DBSCAN clustering. Each module contains short answer and long answer questions related to the topics in that module.

Uploaded by

Achutha JC
Copyright
© All Rights Reserved

DAYANANDA SAGAR COLLEGE OF ENGINEERING

DEPARTMENT OF MCA

Question Bank

Semester: IV Sem Subject: Data Science Sub Code: 17MCA441

Module 1

Sl.No. Questions Marks


1 What is Data Mining? Explain the process of knowledge discovery in databases 8 marks
2 Explain the motivating challenges in data mining 8 marks
3 Describe data mining as a confluence of many disciplines 8 marks
4 What are data mining tasks? Explain 8 marks
5 Explain the market basket analysis in data mining 8 marks
6 Describe the application areas in data mining 8 marks
7 What is classification? Explain 10 marks
8 List out some applications in classification 10 marks
9 What is clustering? List out some applications in clustering 10 marks
10 10 marks

Module 2

Sl.No. Questions Marks


1 What is data? What are the types of data? 8 marks
2 Define an attribute. Explain the types of attributes 8 marks
3 Mention the general characteristics of data sets 5 marks
4 Discuss specific aspects of data quality 8 marks
5 Mention the data quality issues in mining the data 8 marks
6 What is data preprocessing? Explain some methods for preprocessing the data. 8 marks
7 What is dimensionality reduction? Explain the curse of dimensionality 10 marks
8 Explain the approaches in feature subset selection 10 marks
9 Explain discretization and binarization in data mining 10 marks
10 What are the measures of similarity and dissimilarity 10 marks
11 Explain the methods to measure the similarities between the data objects 8 marks
12 What are the methods to measure the dissimilarities between the data objects 8 marks
13 What is correlation? Explain 8 marks
14 What are the issues in proximity calculation 8 marks
15 Describe the general characteristics of selecting the right proximity measure 8 marks
16 Distinguish between noise and outliers 8 marks
17 Which approach, Jaccard or Hamming distance, is more similar to the simple matching coefficient? 8 marks
18 Which approach is more similar to the cosine measure? 8 marks
19 For the following vectors x and y, calculate the indicated similarity or distance measures 8 marks
x = (1, 1, 1, 1), y = (2, 2, 2, 2): cosine, correlation, Euclidean
x = (0, 1, 0, 1), y = (1, 0, 1, 0): cosine, correlation, Euclidean, Jaccard
20 Explain why computing the proximity between two attributes is often simpler than computing the similarity between two objects. 8 marks
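
For checking hand calculations in Question 19, the following is a minimal sketch in plain Python, not a model answer. The function names are chosen here for illustration and do not come from the question bank. Jaccard is computed for binary vectors as f11 / (f01 + f10 + f11), and the sketch returns None for correlation when a vector has zero variance, since the Pearson formula would then divide by zero.

```python
import math


def cosine(x, y):
    """Cosine similarity: dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def correlation(x, y):
    """Pearson correlation; None when either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    spread_x = math.sqrt(sum(a * a for a in dx))
    spread_y = math.sqrt(sum(b * b for b in dy))
    if spread_x == 0 or spread_y == 0:
        return None  # undefined: division by zero in the formula
    return sum(a * b for a, b in zip(dx, dy)) / (spread_x * spread_y)


def euclidean(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def jaccard(x, y):
    """Jaccard coefficient for binary vectors: f11 / (f01 + f10 + f11)."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)  # f01 + f10
    denom = f11 + mismatches
    return f11 / denom if denom else None


if __name__ == "__main__":
    x1, y1 = (1, 1, 1, 1), (2, 2, 2, 2)
    x2, y2 = (0, 1, 0, 1), (1, 0, 1, 0)
    print(cosine(x1, y1), correlation(x1, y1), euclidean(x1, y1))
    print(cosine(x2, y2), correlation(x2, y2), euclidean(x2, y2), jaccard(x2, y2))
```

Under these definitions the first pair gives cosine 1, an undefined correlation, and Euclidean distance 2. The second pair gives cosine 0, correlation -1, Euclidean distance 2, and Jaccard 0.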

Module 3

Sl.No. Questions Marks


1 Why use support and confidence? 5 marks
2 What is association rule mining? Explain 8 marks
3 How to generate the frequent item sets? 8 marks
4 Write the steps to generate the frequent item sets using the Apriori algorithm 10 marks
5 Explain about the candidate generation and pruning. 10 marks
6 Describe the support counting using a hash tree 10 marks
7 How to measure the computational complexity in data mining 8 marks
8 How to generate the rules in the Apriori algorithm 8 marks
9 Describe the maximal frequent item sets 5 marks
10 Explain the closed frequent item sets 5 marks
11 Describe all the alternative methods for generating frequent item sets. 10 marks
12 Construct the FP-tree algorithm with an example. 10 marks
13 How to generate the frequent item sets in the FP-growth algorithm 10 marks
14 How to evaluate the association patterns 8 marks
15 What are the objective measures of interestingness 5 marks
16 Write the limitations of the interest factor, correlation analysis, and IS measures. 8 marks
17 How to discover association rules using a hash tree? Explain 10 marks
18 Mention the factors affecting the complexity 5 marks
19 Differentiate between the maximal and closed frequent item sets. 5 marks
20 How to compute the interestingness measures. 5 marks
21 What is the effect of support-based pruning 5 marks
22 What is statistical independence and statistical based measures 5 marks
23 Distinguish between interestingness and unexpectedness 5 marks

Module 4

Sl.No. Questions Marks


1 Describe the general approach to solving a classification problem 8 marks
2 How a decision tree works. 8 marks
3 How to build a decision tree. 8 marks
4 Describe how a decision tree grows recursively using Hunt’s Algorithm. 10 marks
5 Describe the design issues of decision tree induction. 8 marks
6 Explain the methods for expressing attribute test conditions. 8 marks
7 What are the measures for selecting the best split? 8 marks
8 Explain how to split the binary, nominal and continuous attributes. 8 marks
9 Write an algorithm for decision tree induction. 8 marks
10 Explain the rule-based classifier. 8 marks
11 How does a rule-based classifier work? 8 marks
12 What are the rule ordering schemes? 8 marks
13 How to build a rule-based classifier? 8 marks
14 Explain the RIPPER algorithm used for rule induction. 8 marks
15 Describe indirect methods for rule extraction. 8 marks
16 Mention the characteristics of rule based classifiers. 8 marks
17 Describe the nearest neighbor algorithm. 8 marks
18 Write the k-nearest neighbor classification algorithm. 8 marks
19 What are the characteristics of nearest neighbor classifiers? 8 marks
20 Describe the Bayesian classifiers for classification. 8 marks
21 Explain the naïve Bayes classifier. 8 marks
22 How a naïve Bayes classifier works. 8 marks
23 How to estimate conditional probabilities for categorical attributes 8 marks
24 How to estimate conditional probabilities for continuous attributes 8 marks
25 Describe the M-estimate of conditional probability. 8 marks
26 What are the characteristics of naïve Bayes classifiers? 8 marks
27 How to measure the Bayes error rate. 8 marks
28 What are Bayesian Belief networks? Explain 8 marks

Module 5

1 What is cluster analysis? List out the application areas of cluster analysis to practical problems. 10 marks
2 Explain the different types of clustering. 8 marks
3 Describe the basic K-means algorithm with example. 8 marks
4 Mention the ways of choosing the initial centroids. 8 marks
5 Determine the time and space complexity of K-means algorithm. 5 marks
6 Explain Bisecting K-means algorithm. 8 marks
7 Mention the strengths and weaknesses of the K-means algorithm 5 marks
8 Describe the agglomerative hierarchical clustering algorithm. 8 marks
9 Write a basic agglomerative hierarchical clustering algorithm. 8 marks
10 Explain the different ways of defining the proximity between clusters. 8 marks
11 Determine the time and space complexity of the hierarchical clustering algorithm. 8 marks
12 Illustrate Ward’s method in finding the proximity between two clusters. 8 marks
13 Describe DBSCAN clustering algorithm with example. 8 marks
14 How to evaluate clusters? Explain 8 marks
15 What is anomaly detection? Illustrate applications for which anomalies are of interest. 8 marks
16 Mention the causes for anomalies. 8 marks
17 Explain different approaches to anomaly detection. 8 marks
18 Explain the different issues that need to be addressed when dealing with anomaly detection. 8 marks
19 Describe the statistical approaches to outlier detection. 8 marks
20 Explain the proximity based outlier detection. 8 marks
21 Explain the clustering based techniques for outlier detection. 8 marks
