The document provides an internal assessment for a course on Introduction to Data Science. It includes two parts covering machine learning techniques such as decision trees, clustering algorithms, and performance metrics. Students are asked questions testing their understanding of concepts such as information gain, precision, and specificity, and applying techniques such as k-means clustering and constructing decision trees. Bloom's taxonomy is used to map the cognitive complexity of the questions.

Reg. No.

Division: Computer Science and Engineering


B.Tech. CSE - INTERNAL ASSESSMENT III - OCT 2023
2023-24 ODD SEMESTER
Date: Session: Semester: V
Course Code: 20CS2031 Maximum Marks: 40
Course Name: INTRODUCTION TO DATA SCIENCE Time: 2 Hours

Course Outcomes for Assessment in this IA:


COs   COURSE OUTCOME
CO3   discuss the principle of operation of various supervised and unsupervised machine learning techniques.
CO4   select appropriate mathematical machine learning techniques for solving real-world problems.
CO5   apply the relevant techniques for implementing solutions to solve real-world problems.
CO6   assess the performance of prediction, classification and recommendation machine learning techniques.

PART – A (20 Marks)
(Each question is annotated with its Bloom's Level, CO, and Marks.)
1.a) A smart traffic camera is trained on a dataset of various objects to detect the type of vehicles at a signal. Identify the type of machine learning technique used. (U, CO5, 1 mark)

Classification algorithm
b) Define information gain and formulate the equation. (R, CO3, 1 mark)
Information gain specifies how much information a particular predictor variable gives about the final outcome; it is used to choose the predictor variable that best splits the data.
IG = E(parent) - weighted average of E(children)
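As a quick illustration of the equation above (a sketch of ours, not part of the paper; the helper names are assumptions), information gain can be computed directly from class labels:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(labels.count(c) / n * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(parent_labels, child_label_groups):
    """IG = E(parent) - weighted average of E(children)."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# A perfectly pure split yields the maximum possible gain:
print(round(information_gain([1, 1, 0, 0], [[1, 1], [0, 0]]), 3))  # -> 1.0
```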
c) Sketch the decision tree for the following AND operation. (A, CO5, 2 marks)
d) Calculate the precision and specificity from the given confusion matrix. (An, CO6, 2 marks)

                 Predicted
Actual      Positive   Negative
Positive       45         20
Negative        5         30

Precision = TP / (TP + FP) = 45/50 = 0.9
Specificity = TN / (TN + FP) = 30/35 ≈ 0.86
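The two metrics above follow directly from the confusion-matrix cells; a minimal check in Python (helper names are ours, for illustration only):

```python
def precision(tp, fp):
    # Precision: of all positive predictions, how many were correct.
    return tp / (tp + fp)

def specificity(tn, fp):
    # Specificity: of all actual negatives, how many were identified.
    return tn / (tn + fp)

# Cells from the matrix above: TP = 45, FN = 20, FP = 5, TN = 30.
print(precision(45, 5))              # -> 0.9
print(round(specificity(30, 5), 3))  # -> 0.857
```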
e(i) State the purpose of constructing a decision tree. Explain the ID3 algorithm for constructing a decision tree. (U, CO3, 4 marks)
A decision tree is a hierarchical decision-support model that uses a tree-like model of decisions and their possible consequences.
Step-by-step procedure for building a decision tree:
• Step 1: Calculate the entropy of the predicted (target) attribute.
• Step 2: Compute the information gain (IG) for all predictor (input) variables.
• Step 3: Select the best attribute (A) based on the highest information gain — the predictor variable that separates the data into different classes most effectively, i.e., the feature that best splits the data.
• Step 4: Assign A as the decision variable for the root node.
• Step 5: For each value of A, build a descendant of the node.
• Step 6: Assign classification labels to the leaf nodes.
• Step 7: If the data is correctly classified, stop.
• Step 8: Else, iterate over the tree — keep changing the position of the predictor attributes (or change the root node) to get the correct output.
(ii) Determine the parent/root node of the decision tree for the following dataset and show all the intermediate steps. (A, CO5, 10 marks)

User Interest   User Occupation   Click
Tech            Professional      1
Fashion         Student           0
Fashion         Professional      0
Sports          Student           0
Tech            Student           1
Tech            Professional      0
Sports          Professional     1

E(Label) = -(4/7) log2(4/7) - (3/7) log2(3/7) = 0.985

E(User Interest = Tech) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.918
E(User Interest = Fashion) = -(2/2) log2(2/2) = 0
E(User Interest = Sports) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1
E(User Interest) = (3/7)(0.918) + (2/7)(0) + (2/7)(1) = 0.679
Info gain = 0.985 - 0.679 = 0.306

E(User Occupation = Professional) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1
E(User Occupation = Student) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.918
E(User Occupation) = (4/7)(1) + (3/7)(0.918) = 0.965
Info gain = 0.985 - 0.965 = 0.020

Root node = User Interest (highest information gain)
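The intermediate steps above can be reproduced programmatically; this is an illustrative sketch of ours (not part of the paper) over the seven rows of the question:

```python
import math
from collections import defaultdict

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)) if c)

# The seven rows from the question: (User Interest, User Occupation, Click).
rows = [("Tech", "Professional", 1), ("Fashion", "Student", 0),
        ("Fashion", "Professional", 0), ("Sports", "Student", 0),
        ("Tech", "Student", 1), ("Tech", "Professional", 0),
        ("Sports", "Professional", 1)]
labels = [r[2] for r in rows]

def info_gain(attr_index):
    """IG of splitting on the attribute at attr_index (0 or 1)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr_index]].append(r[2])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

print(round(info_gain(0), 3))  # User Interest   -> 0.306
print(round(info_gain(1), 3))  # User Occupation -> 0.02
```

Since User Interest has the larger gain, it is chosen as the root, matching the worked solution.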
PART – B (20 Marks)
(Each question is annotated with its Bloom's Level, CO, and Marks.)
2.a) Identify the method used to find the optimal number of clusters in K-Means clustering. (R, CO3, 1 mark)
Elbow Method
b) Name the two essential components of the graph clustering algorithm. (R, CO3, 1 mark)
Nodes and edges
c) Differentiate between the agglomerative and divisive clustering approaches. (U, CO3, 2 marks)
Agglomerative: a bottom-up (merging) approach - the algorithm starts by taking each data point as a single cluster and merges clusters until one cluster is left.
Divisive: a top-down (splitting) approach - the algorithm starts with all points in the dataset belonging to one cluster, and splits are performed recursively as one moves down the hierarchy.
d) State the advantages and disadvantages of the K-Means clustering algorithm. (R, CO3, 2 marks)
Advantages:
• Easy to implement
• Relatively fast and efficient
• Only one parameter to be tuned and you can easily see the direct
impact of adjusting the value of parameter K
Disadvantages:
• Sensitivity to Initial Centroid Selection
• Dependence on the Number of Clusters (K)
• Spherical Clusters
• Sensitive to Outliers
• Not Suitable for Non-Numeric Data
• Difficulties with High-Dimensional Data
e(i) Use the K-Means clustering algorithm to divide the following dataset into 2 clusters and calculate the updated cluster centroids after one iteration. Assume the initial cluster centroids are (185, 72) and (170, 56). (An, CO4, 10 marks)

ID   Height   Weight
1    185      72
2    170      56
3    168      60
4    179      68
5    182      72
6    188      77

Cluster 1 = {1, 4, 5, 6}; Cluster 2 = {2, 3}
Updated Centroid 1 = (183.5, 72.25)
Updated Centroid 2 = (169, 58)
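One K-Means iteration (assign, then recompute means) can be verified with a short sketch of ours, assuming Euclidean distance on the (height, weight) pairs:

```python
def one_kmeans_iteration(points, centroids):
    """Assign each point to its nearest centroid (squared Euclidean
    distance), then recompute each centroid as the mean of its points."""
    clusters = [[] for _ in centroids]
    for p in points:
        d2 = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
        clusters[d2.index(min(d2))].append(p)
    return [(sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            for cl in clusters]

# Height/weight rows from the question, with the given initial centroids.
points = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77)]
print(one_kmeans_iteration(points, [(185, 72), (170, 56)]))
# -> [(183.5, 72.25), (169.0, 58.0)]
```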
ii) Sketch a dendrogram for the following data using the agglomerative clustering algorithm. (A, CO5, 4 marks)

Distance matrix:
     E   A   C   B
E    0   1   2   2
A    1   0   2   5
C    2   2   0   1
B    2   5   1   0

A merges with E and B merges with C at distance 1; the resulting cluster distance matrix is:

      AE   BC
AE     0    2
BC     2    0

AE and BC then merge at distance 2, completing the dendrogram.
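The merge order behind the dendrogram can be traced with a small sketch of ours, assuming single linkage (minimum pairwise distance between clusters):

```python
def single_linkage_merge_order(labels, dist):
    """Repeatedly merge the two closest clusters (single linkage),
    recording each merged cluster and the distance of the merge."""
    clusters = [frozenset([l]) for l in labels]
    link = lambda a, b: min(dist[(x, y)] for x in a for y in b)
    merges = []
    while len(clusters) > 1:
        pairs = [(link(a, b), a, b) for i, a in enumerate(clusters)
                 for b in clusters[i + 1:]]
        h, a, b = min(pairs, key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a | b), h))
    return merges

# Pairwise distances from the matrix in the question (made symmetric).
raw = {("E", "A"): 1, ("E", "C"): 2, ("E", "B"): 2,
       ("A", "C"): 2, ("A", "B"): 5, ("C", "B"): 1}
dist = {**raw, **{(b, a): v for (a, b), v in raw.items()}}
merges = single_linkage_merge_order(["E", "A", "C", "B"], dist)
print(merges)  # -> [(['A', 'E'], 1), (['B', 'C'], 1), (['A', 'B', 'C', 'E'], 2)]
```

The two merges at height 1 and the final merge at height 2 are exactly the joins drawn in the dendrogram.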

Bloom’s Level-wise Mark Distribution:


Remember Understand Apply Analyze Evaluate Create Total
5 7 16 12 - - 40

Course Outcome wise Mark Distribution


COs Remember Understand Apply Analyze Evaluate Create Total
CO3 5 6 - - - - 11
CO4 - - - 10 - - 10
CO5 - 1 16 - - - 17
CO6 - - - 2 - - 2
Total 5 7 16 12 - - 40
