Important Questions Related To Module-1 & Module-2

Uploaded by

prathammsr192003
Copyright
© All Rights Reserved
Important Questions Related to Module-1 & Module-2

Practice and Remember :


1. Explain KDD with the help of a diagram.
2. What are the challenges in the Data Mining Process?
3. Describe the different kinds of data in DM.
4. List the types of data that can be difficult to mine.
Ans: Spatial data, data streams, temporal data
5. Define the major issues in data mining.
6. How is missing data handled in the data mining process?
7. State the difference between missing data and noisy data. How is noisy data handled in DM?
8. Why is data pre-processing important? Explain the major tasks in data pre-processing.
9. What is Data Integration? List out the challenges in data integration.
10. Define Similarity and Dissimilarity between data objects.
11. A. Explain similarity/dissimilarity for objects with a single attribute. B. Describe different
distance measures. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0,
36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Minkowski distance between the two objects, using q = 3.
12. Explain the steps in Entropy Based Discretization. Use Entropy Based Discretization to find
the best split for the following data: (0,y), (4,y), (12,y), (16,n), (16,n), (18,y),
(24,n), (26,n), (28,n). 'S' has to be partitioned into 2 intervals 'S1' and 'S2' using the two
candidate split points '14' and '21'.
13. Compare the Jaccard Coefficient (JC) with the Simple Matching Coefficient (SMC), and find
the SMC and JC for the data given below.
M 1 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 1
14. What is noisy data in data cleaning? How do you handle noisy data in the given
dataset [5,10,11,13,15,35,50,55,72,92,204,215]? Apply binning techniques using
3 equal-width bins.
15. Suppose that a group of 1500 people was surveyed. The gender of each person was
noted. Each person was polled as to whether his or her preferred type of reading material
was fiction or non-fiction. Thus, we have two attributes, gender and preferred reading.
The data are as follows: (1) males who preferred fiction = 250; (2) males who preferred
non-fiction = 50; (3) females who preferred fiction = 200; (4) females who preferred
non-fiction = 1000. Find the correlation between these nominal attributes for the given values.
16. Let x1 = (1, 2) and x2 = (3, 5) represent two objects. Calculate the Euclidean distance
between the two objects.
17. Solve a problem related to the Chi-Square test.
18. Solve a numerical problem related to the cosine index.
19. Describe the different data mining techniques.
20. Describe the main features of data quality.
21. Describe data pre-processing techniques.
22. Given two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8):
(a) Compute the Euclidean distance between the two objects.
(b) Compute the Manhattan distance between the two objects.
(c) Compute the Supremum distance between the two objects.

23. For the following asymmetric binary attributes, calculate the Jaccard
coefficient (similarity) between following two objects:
X = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Y= (0, 0, 0, 0, 0, 0, 1, 0, 0, 1)

24. Calculate cosine similarity for the following two document vectors:
X = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
Y = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)

Solve : Q-11

Explanation:
Given,
P = (22, 1, 42, 10)
Q = (20, 0, 36, 8)
a. Formula for Euclidean Distance :
distance = ((p1-q1)^2 + (p2-q2)^2 + ... + (pn-qn)^2)^(1/2)
Now,
distance = ( (22-20)^2 + (1-0)^2 + (42-36)^2 + (10-8)^2 )^(1/2)
=( (2)^2 + (1)^2 + (6)^2 + (2)^2 )^(1/2)
=(4+1+36+4)^(1/2)
=45^(1/2)
Distance = 6.7082
b. Formula for Manhattan Distance :
distance = |p1-q1| + |p2-q2| + ... + |pn-qn|
Now,
distance = |22-20| + |1-0| + |42-36| + |10-8|
= 2 + 1 + 6 + 2
Distance = 11
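
The hand calculation for Q-11 can be cross-checked with a short Python sketch. This is only an illustration, not part of the original notes: the helper names `minkowski` and `supremum` are made up for this example. It relies on the fact that the Manhattan and Euclidean distances are the Minkowski distance with q = 1 and q = 2, and it also covers part (c) (q = 3) and the Supremum distance asked for in Q-22.

```python
def minkowski(p, q, order):
    """Minkowski distance of the given order between two equal-length tuples."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


def supremum(p, q):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    return max(abs(a - b) for a, b in zip(p, q))


P = (22, 1, 42, 10)
Q = (20, 0, 36, 8)

manhattan = minkowski(P, Q, 1)    # order 1: |2| + |1| + |6| + |2|
euclidean = minkowski(P, Q, 2)    # order 2: square root of 45
minkowski_3 = minkowski(P, Q, 3)  # order 3: cube root of 233
sup = supremum(P, Q)              # largest difference is |42 - 36|

print("Euclidean :", round(euclidean, 4))
print("Manhattan :", manhattan)
print("Minkowski (q=3):", round(minkowski_3, 4))
print("Supremum  :", sup)
```

The Euclidean value should agree with the 6.7082 obtained by hand in part (a).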

Q-12: The solved entropy PDF has already been shared. Refer to that PDF.
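
For checking a hand-worked answer to Q-12, here is a minimal Python sketch. It is not the solution from the shared PDF. It assumes the usual procedure: for each candidate split point, compute the size-weighted entropy of the two resulting intervals and prefer the split with the lowest value (equivalently, the highest information gain). The function names and the convention that values less than or equal to the split point go into S1 are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_entropy(data, split):
    """Size-weighted entropy of the intervals S1 (value <= split) and S2 (value > split)."""
    s1 = [label for value, label in data if value <= split]
    s2 = [label for value, label in data if value > split]
    n = len(data)
    return len(s1) / n * entropy(s1) + len(s2) / n * entropy(s2)


S = [(0, "y"), (4, "y"), (12, "y"), (16, "n"), (16, "n"),
     (18, "y"), (24, "n"), (26, "n"), (28, "n")]
candidates = [14, 21]

total_entropy = entropy([label for _, label in S])
info = {t: weighted_entropy(S, t) for t in candidates}
gain = {t: total_entropy - info[t] for t in candidates}
best_split = min(candidates, key=lambda t: info[t])

for t in candidates:
    print(f"split {t}: weighted entropy = {info[t]:.4f}, gain = {gain[t]:.4f}")
print("best split:", best_split)
```

Under these assumptions, splitting at 14 leaves S1 containing only 'y' labels, which is why it scores better than splitting at 21.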


Q-13: A similar type of problem is already solved in the PPT.
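
As a cross-check for Q-13 (and Q-23), the sketch below computes SMC and JC from the four match counts of two binary vectors. It is an illustrative example, not the PPT solution. The helper names are hypothetical, and returning 0.0 for the Jaccard coefficient when both vectors are all zeros is a convention assumed here.

```python
def match_counts(x, y):
    """Return (f11, f10, f01, f00) for two equal-length binary vectors."""
    f11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    f10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    f01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    f00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return f11, f10, f01, f00


def smc(x, y):
    """Simple Matching Coefficient: all matches (1-1 and 0-0) over all attributes."""
    f11, f10, f01, f00 = match_counts(x, y)
    return (f11 + f00) / (f11 + f10 + f01 + f00)


def jaccard(x, y):
    """Jaccard coefficient: 1-1 matches over attributes where at least one value is 1."""
    f11, f10, f01, _ = match_counts(x, y)
    denominator = f11 + f10 + f01
    return f11 / denominator if denominator else 0.0  # assumed convention for all-zero vectors


# Q-13
M = (1, 0, 0, 0, 0, 0, 0, 0)
N = (0, 0, 0, 0, 0, 0, 0, 1)
smc_mn = smc(M, N)
jc_mn = jaccard(M, N)
print("Q-13: SMC =", smc_mn, " JC =", jc_mn)

# Q-23
X = (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Y = (0, 0, 0, 0, 0, 0, 1, 0, 0, 1)
jc_xy = jaccard(X, Y)
print("Q-23: JC =", jc_xy)
```

The comparison the question asks for shows up directly: SMC counts the six 0-0 matches and so reports a high similarity, while JC ignores them and reports none.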
Try to solve the remaining problems on your own, referring to Google or other sources where needed.
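
After working the remaining numerical problems by hand, the results can be verified with a sketch like the one below. It is an illustration under stated assumptions: Q-15 is treated as a chi-square test on a 2x2 contingency table (rows are preferred reading, columns are gender), Q-14 uses equal-width bins of width (max - min) / 3 with the maximum value placed in the last bin, and all function names are hypothetical.

```python
from math import sqrt


def equal_width_bins(values, k):
    """Partition values into k bins of equal width (max - min) / k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        index = min(int((v - lo) / width), k - 1)  # the maximum goes in the last bin
        bins[index].append(v)
    return bins


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors: dot product over product of norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


# Q-14: three equal-width bins
data = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
bins = equal_width_bins(data, 3)
print("Q-14 bins:", bins)

# Q-15: rows = (fiction, non-fiction), columns = (male, female)
observed = [[250, 200],
            [50, 1000]]
chi2 = chi_square(observed)
print("Q-15 chi-square:", round(chi2, 2))

# Q-24: document vectors
X = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
Y = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
cos_xy = cosine_similarity(X, Y)
print("Q-24 cosine similarity:", round(cos_xy, 3))
```

For Q-15, the statistic still has to be compared against the chi-square critical value for 1 degree of freedom at the chosen significance level to decide whether the two attributes are correlated.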
