Machine Learning, (CS-3035), Online Spring End Semester Examination 2021
The document outlines the format for an online end semester examination for the Machine Learning course at KIIT Deemed to be University, detailing the structure, marking scheme, and types of questions. It includes multiple-choice questions (MCQs) and descriptive questions, covering various topics in machine learning such as Naive Bayes, SVM, PCA, and neural networks. The examination is divided into two sections, with specific time allocations and marks for each question.
Sample Question Format
(For all courses having end semester Full Marks = 50)
KIIT Deemed to be University
Online End Semester Examination (Spring Semester-2021)
Subject Name & Code: Machine Learning (CS-3035)
Applicable to Courses: B.Tech (IT)
Full Marks: 50    Time: 2 Hours
SECTION-A (Answer All Questions. Each question carries 2 Marks)
Time: 30 Minutes (7×2=14 Marks)
(All questions in this section are of MCQ type; the CO mapping is shown against each question number, and the answer key follows each item.)

Q.No:1 (MCQ, CO1)
i. Which of the following is the odd one out in the list below? [Answer: C]
   A) R-squared   B) RMSE   C) Kappa   D) All of the mentioned
ii. Point out the wrong statement. [Answer: C]
   A) Regression through the origin yields an equivalent slope if you center the data first
   B) Normalizing variables results in the slope being the correlation
   C) Least squares is not an estimation tool
   D) None of the mentioned
iii. Which of the following implies no relationship with respect to correlation? [Answer: A]
   A) Cor(X, Y) = 0   B) Cor(X, Y) = 1   C) Cor(X, Y) = 2   D) All of the mentioned
iv. Residual ______ plots investigate normality of the errors. [Answer: B]
   A) RR   B) QQ   C) PP   D) None of the mentioned

Q.No:2 (MCQ, CO2)
i. Which of the following is the correct formula for total variation? [Answer: B]
   A) Total Variation = Residual Variation - Regression Variation
   B) Total Variation = Residual Variation + Regression Variation
   C) Total Variation = Residual Variation * Regression Variation
   D) All of the mentioned
ii. Which of the following is required by K-means clustering? [Answer: D]
   A) a defined distance metric   B) the number of clusters   C) an initial guess as to the cluster centroids   D) All of these
iii. Minimizing the likelihood is the same as maximizing -2 log-likelihood. [Answer: A]
   A) True   B) False
iv. Suppose we have computed the gradient of our cost function and stored it in a vector g. What is the cost of one gradient-descent update, given the gradient? [Answer: B]
   A) O(N)   B) O(D)   C) O(ND)   D) O(ND^2)

Q.No:3 (MCQ, CO3)
i. Consider a linear-regression model with N = 3 and D = 1, with input-output pairs y1 = 2, x1 = 1; y2 = 3, x2 = 1; y3 = 3, x3 = 2. What is the gradient of the mean-square error (MSE) with respect to β1 when β0 = 0 and β1 = 1? Give your answer correct to two decimal digits. (A verification sketch follows this question group.) [Answer: A]
   A) -1.66   B) 1.66   C) 1.39   D) None of these
ii. The gradient of a continuous and differentiable function [Answer: D]
   A) is zero at a minimum   B) is non-zero at a maximum   C) decreases as you get closer to the minimum   D) All of these
iii. Which of the following statements is FALSE regarding regression? [Answer: D]
   A) It relates inputs to outputs.
   B) It is used for prediction.
   C) It may be used for interpretation.
   D) It discovers causal relationships.
iv. For the one-parameter model, the mean-square error (MSE) is defined as MSE(β0) = (1/(2N)) Σ_{n=1..N} (y_n - β0)^2. We put the half term in front because [Answer: C]
   A) scaling the MSE by a half makes gradient descent converge faster.
   B) the presence of the half makes it easy to do grid search.
   C) it does not matter whether the half is there or not.
   D) none of the above.
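The value in the first item of Q.No:3 can be reproduced in a few lines of Python. This is an illustrative sketch, assuming the convention MSE(β) = (1/(2N)) Σ (y_n - β0 - β1·x_n)^2 used elsewhere in the paper, so that ∂MSE/∂β1 = -(1/N) Σ x_n (y_n - β0 - β1·x_n):

```python
# Sketch: reproduce the MSE-gradient MCQ in Q.No:3 (not part of the paper).
xs = [1.0, 1.0, 2.0]
ys = [2.0, 3.0, 3.0]
beta0, beta1 = 0.0, 1.0

n = len(xs)
# dMSE/dbeta1 under MSE = 1/(2N) * sum_n (y_n - beta0 - beta1*x_n)^2
grad_beta1 = -sum(x * (y - beta0 - beta1 * x) for x, y in zip(xs, ys)) / n
print(grad_beta1)  # -5/3 = -1.666..., i.e. option A (-1.66) to two decimal digits
```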
Q.No:4 (MCQ, CO3)
i. Lasso can be interpreted as least-squares linear regression where [Answer: A]
   A) weights are regularized with the L1 norm
   B) weights are regularized with the L2 norm
   C) the weights have a Gaussian prior
   D) the solution algorithm is simpler
ii. Regarding bias and variance, which of the following statements is true? (Here "high" and "low" are relative to the ideal model.) [Answer: B]
   A) Models which overfit have a high bias.
   B) Models which overfit have a low bias.
   C) Models which underfit have a high variance.
   D) All of these
iii. In K-fold cross-validation, K is [Answer: B]
   A) a float (decimal) value   B) an integer value   C) a complex (imaginary) value   D) None of these
iv. The second principal component (PC2) is _______ to the first principal component (PC1). [Answer: A]
   A) Orthogonal   B) Inverse   C) Transpose   D) None of these

Q.No:5 (MCQ, CO1) (the numeric items below are checked in the sketch at the end of this section)
i. What is the cosine similarity between [4, 3, 3, 5] and [2, 0, 0, 0]? [Answer: A]
   A) 0.52   B) 0.62   C) 0.72   D) 0.74
ii. Find the odd one out in the following list: Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Stochastic Gradient Descent (SGD), and Gravitational Search Algorithm (GSA). [Answer: C]
   A) GA   B) PSO   C) SGD   D) None of these
iii. What is the cosine similarity between [5, 2, 0, 5] and [2, 0, 0, 0]? [Answer: D]
   A) 0.58   B) 0.55   C) 0.75   D) 0.68
iv. The Manhattan distance between the two points (10, 10) and (30, 30) is: [Answer: C]
   A) 20   B) 30   C) 40   D) 50

Q.No:6 (MCQ, CO5)
i. Suppose we train a hard-margin linear SVM on n > 100 data points in R^2, yielding a hyperplane with exactly 2 support vectors. If we add one more data point and retrain the classifier, what is the maximum possible number of support vectors for the new hyperplane (assuming the n + 1 points are linearly separable)? [Answer: C]
   A) 2   B) 3   C) n + 1   D) n
ii. A valid kernel function satisfies the _______ condition. [Answer: C]
   A) Voronoi   B) Minkowski   C) Mercer's   D) None of them
iii. In the soft-margin Support Vector Classifier, the _____ variable allows a few instances to fall within the margin. [Answer: A]
   A) Slack   B) Sigma   C) Beta   D) None of these
iv. _______ multipliers are used to integrate the inequality constraints into the main objective function. [Answer: B]
   A) Euler   B) Lagrangian   C) Lyapunov   D) None of these

Q.No:7 (MCQ, CO6)
i. Neural networks [Answer: B]
   A) optimize a convex cost function
   B) can be used for regression as well as classification
   C) always output values between 0 and 1
   D) None of these
ii. The sigmoidal activation function maps the neuron output to the range [Answer: B]
   A) -1 to +1   B) 0 to 1   C) either 0 or 1   D) None of these
iii. A single-input, single-output neuron has an input of 2, a weight of 6 and a bias of -5.5. What will be the output if the activation function is the bipolar sigmoid (slope s = 0.6; round off to 4 decimal places)? [Answer: A]
   A) 0.9802   B) 0.7806   C) 0.9881   D) None of these
iv. In a multi-layer neural network, the term "multi" suggests [Answer: B]
   A) only one hidden layer between the input and output layers
   B) one or more hidden layer(s) between the input and output layers
   C) always more than one hidden layer between the input and output layers
   D) None of these
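The numeric items in Q.No:5 can be checked with a short Python sketch; the helper functions below are illustrative, not part of the paper:

```python
# Sketch: check the cosine-similarity and Manhattan-distance items in Q.No:5.
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def manhattan(p, q):
    # L1 distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(p, q))

print(round(cosine_similarity([4, 3, 3, 5], [2, 0, 0, 0]), 2))  # 0.52 -> option A
print(round(cosine_similarity([5, 2, 0, 5], [2, 0, 0, 0]), 2))  # 0.68 -> option D
print(manhattan((10, 10), (30, 30)))                            # 40   -> option C
```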
SECTION-B (Answer Any Three Questions. Each Question carries 12 Marks)
Time: 1 Hour and 30 Minutes (3×12=36 Marks)
(The CO mapping is shown against each question number, and all parts of a question are from the same CO(s). Alternative versions of each question are separated by "OR".)

Q.No:8 (CO3)
A) Why do we call the Naive Bayes classifier "Naive"? State its advantages and disadvantages. [1+2=3]
B) Explain the following with examples: One-Against-All (OAA) and One-Against-One (OAO). [2+2=4]
C) Derive the Naive Bayes classification algorithm using the following toy dataset, and classify a "Red SUV Domestic" using the Naive Bayes classifier. [Toy dataset table not reproduced in this copy; an illustrative sketch follows.] [5]
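Since the toy dataset did not survive in this copy, the sketch below runs the Naive Bayes computation on a hypothetical car-theft dataset with the same Color/Type/Origin attributes; the rows, and therefore the resulting label, are stand-ins for illustration only:

```python
# Illustrative Naive Bayes sketch for Q.No:8(C). The rows below are a
# HYPOTHETICAL stand-in for the exam's toy dataset, not the actual data.
from collections import Counter

data = [  # ((Color, Type, Origin), Stolen?)
    (("Red", "Sports", "Domestic"), "Yes"),
    (("Red", "Sports", "Domestic"), "No"),
    (("Red", "Sports", "Domestic"), "Yes"),
    (("Yellow", "Sports", "Domestic"), "No"),
    (("Yellow", "Sports", "Imported"), "Yes"),
    (("Yellow", "SUV", "Imported"), "No"),
    (("Yellow", "SUV", "Imported"), "Yes"),
    (("Yellow", "SUV", "Domestic"), "No"),
    (("Red", "SUV", "Imported"), "No"),
    (("Red", "Sports", "Imported"), "Yes"),
]

def naive_bayes(query):
    class_counts = Counter(label for _, label in data)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(data)                    # prior P(c)
        for i, value in enumerate(query):          # conditional independence:
            n_match = sum(1 for x, label in data   # multiply P(x_i | c)
                          if label == c and x[i] == value)
            score *= n_match / n_c
        scores[c] = score                          # unnormalised posterior
    return max(scores, key=scores.get), scores

label, scores = naive_bayes(("Red", "SUV", "Domestic"))
print(scores, "->", label)
```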
OR
A) What is a slack variable, and what is its role in the soft-margin SVM classifier? Explain the concept using a suitable diagram. [2]
B) Find the entropy and the information gain, and draw the decision tree, for the following training dataset. [Training dataset table not reproduced in this copy.] [4]
C) Explain the following terms with examples: Confusion Matrix, Accuracy, Precision, Recall, F1-score and Area Under the Curve (AUC). (An illustrative metrics sketch follows Q.No:8.) [6]
OR
A) Explain the following with examples: One-Against-All (OAA) and One-Against-One (OAO). [4]
B) State Mercer's conditions for a valid kernel function. Name at least four valid kernel functions used in the SVM classifier, with appropriate equations. [4]
C) Discuss the merits and demerits of the SVM classifier. [4]
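For the evaluation-metric terms in the second alternative's part C, here is a minimal sketch (with made-up labels, purely for illustration) of how the confusion-matrix counts yield the other metrics:

```python
# Sketch: confusion-matrix counts and the derived metrics (illustrative labels).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
```

(AUC needs ranked scores rather than hard labels, so it is omitted from this sketch.)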
Q.No:9 (CO5)
A) What is the "curse of dimensionality"? State the possible remedies. [2+1=3]
B) What is a covariance matrix? State its limitation. [2+1=3]
C) Using PCA, reduce the dimension of the given data from 2 to 1. [6]

   Feature / Example    Ex1   Ex2   Ex3   Ex4
   F1                    12     4     8    17
   F2                     5     9    16     7

OR
A) Differentiate between lossy and lossless feature/attribute reduction, with suitable examples. [3]
B) What is a covariance matrix? State its limitation. [2+1=3]
C) Explain Principal Component Analysis (PCA) and reduce the following dataset, step by step, from 2 dimensions to 1. (A worked sketch for the first dataset follows this alternative.) [6]

   Feature / Example    Ex1   Ex2   Ex3   Ex4
   F1                     6    -3    -2     7
   F2                    -4     5     6    -3
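A worked sketch of the PCA reduction asked for above, applied to the first dataset (F1 = 12, 4, 8, 17; F2 = 5, 9, 16, 7); NumPy is assumed to be available, and the same steps apply to the second dataset:

```python
# PCA sketch for Q.No:9(C): project the 2-D data onto the first principal component.
import numpy as np

X = np.array([[12.0,  5.0],   # Ex1  (columns: F1, F2)
              [ 4.0,  9.0],   # Ex2
              [ 8.0, 16.0],   # Ex3
              [17.0,  7.0]])  # Ex4

Xc = X - X.mean(axis=0)               # 1. centre each feature
C = np.cov(Xc, rowvar=False)          # 2. 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # 3. eigen-decomposition (ascending eigenvalues)
pc1 = eigvecs[:, -1]                  # 4. eigenvector with the largest eigenvalue
X1d = Xc @ pc1                        # 5. 1-D projection, one value per example
print("PC1 direction:", pc1)
print("1-D data:", X1d)
```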
OR
A) Differentiate between lossy and lossless feature/attribute reduction, with suitable examples. [2]
B) What is a principal component? Explain the (mathematical) relationship between the first and the second principal components. [2+2=4]
C) Using PCA, reduce the dimension of the given data from 2 to 1. [6]
   Feature / Example    Ex1   Ex2   Ex3   Ex4
   F1                    12     4     8    17
   F2                     5     9    16     7

Q.No:10 (CO2, CO5)
A) What are over-fitting and under-fitting? Explain them with suitable examples. [4]
B) Explain the Elbow technique to determine an appropriate "K" value in the K-NN classifier. [2]
C) What are the intra-cluster and inter-cluster distances? Using the K-means clustering algorithm, cluster the following data points: P1=(75,102), P2=(201,16), P3=(68,80), P4=(188,36), P5=(165,55) and P6=(100,42), where K=2; use the Euclidean distance. (Start the computation with P2 and P3 as the two initial centroids; a sketch of this computation appears at the end of the paper.) [1+5=6]
OR
A) What is the significance of the bias in the decision boundary? Explain the bias-variance relationship in machine learning with suitable diagrams. [2+2=4]
B) Why do we use the term "regression" in the Logistic Regression classification algorithm? [2]
C) Using the KNN algorithm and the given dataset, predict the class label of the test data point (16, 8), where K=3 and the distance is Euclidean. [6]

   X     Y    Label
   10    5      0
   6.5   11     1
   7     15     1
   12    5      0
   8     10     1
   15    8      0
OR
A) What are over-fitting and under-fitting? Explain them with suitable examples. [3]
B) What are the intra-cluster and inter-cluster distances? Explain the Elbow technique to determine an appropriate "K" value in the K-NN classifier. [1.5+1.5=3]
C) Using the K-means clustering algorithm, cluster the following data points: P1=(75,102), P2=(201,16), P3=(68,80), P4=(188,36), P5=(165,55) and P6=(100,42), where K=2; use the Euclidean distance. (Start the computation with P1 and P3 as the two initial centroids.) [6]

Q.No:11 (CO6)
A) What is the Binary Activated Neuron model? State its limitations. [4]
B) A two-input, single-output neuron model has the weight values [1.5, -2.1] and a bias of -3. It is given the input [2.0, 2.5]. What will be the output if a binary step function with threshold = 1 is used? (A verification sketch appears at the end of the paper.) [4]
C) Describe the Multi-layer Perceptron Neural Network with suitable mathematical expressions and a diagram. [4]
OR
A) What is a perceptron? Explain it with an example. [4]
B) Differentiate between linearly and non-linearly separable datasets. [2]
C) Solve the XOR problem with a two-input artificial neuron model. [6]
OR
A) What are the learning rate and the momentum in an Artificial Neural Network (ANN) model? State the different learning rules used in ANN. [2+2=4]
B) Draw a diagram of a multiple-input, single-output artificial neural network and compute its input-output relationship. [4]
C) Describe the Backpropagation algorithm using appropriate mathematical expressions. [4]
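For the K-means exercise in Q.No:10(C), here is a plain-Python sketch of the computation for the variant that starts from P2 and P3 (the iteration cap and the data layout are illustrative choices, not part of the question):

```python
# K-means sketch for Q.No:10(C): K=2, Euclidean distance, P2 and P3 as
# the initial centroids (first variant of the question).
import math

points = {"P1": (75, 102), "P2": (201, 16), "P3": (68, 80),
          "P4": (188, 36), "P5": (165, 55), "P6": (100, 42)}

centroids = [points["P2"], points["P3"]]
for _ in range(10):  # a handful of iterations suffices for 6 points
    clusters = [[], []]
    for name, p in points.items():  # assignment step: nearest centroid
        k = min((0, 1), key=lambda i: math.dist(p, centroids[i]))
        clusters[k].append(name)
    # update step: each centroid becomes the mean of its cluster
    new_centroids = [tuple(sum(points[n][d] for n in c) / len(c) for d in (0, 1))
                     for c in clusters]
    if new_centroids == centroids:  # converged
        break
    centroids = new_centroids

print(clusters)   # e.g. [['P2', 'P4', 'P5'], ['P1', 'P3', 'P6']]
print(centroids)
```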
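Finally, a one-screen check of the neuron computation in Q.No:11(B): the net input is 1.5*2.0 + (-2.1)*2.5 + (-3) = -5.25, which is below the threshold of 1, so the binary step outputs 0. A sketch:

```python
# Sketch for Q.No:11(B): two-input neuron with a binary step activation.
weights = [1.5, -2.1]
bias = -3.0
x = [2.0, 2.5]
threshold = 1.0

net = sum(w * xi for w, xi in zip(weights, x)) + bias  # 3.0 - 5.25 - 3.0 = -5.25
output = 1 if net >= threshold else 0                  # binary step at threshold 1
print(net, "->", output)                               # -5.25 -> 0
```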