CST395 - ML Syllabus
Course Code: CST 395
Course Name: NEURAL NETWORKS AND DEEP LEARNING
Category: VAC | L: 3 | T: 1 | P: 0 | Credit: 4 | Year of Introduction: 2019
Preamble:
Course Outcomes: After the completion of the course the student will be able to

CO2. Illustrate the basic concepts of neural networks and its practical issues (Cognitive Knowledge Level: Apply)
CO3. Outline the standard regularization and optimization techniques for deep neural networks (Cognitive Knowledge Level: Understand)
CO5. Explain the concepts of modern RNNs like LSTM, GRU (Cognitive Knowledge Level: Understand)
Mapping of Course Outcomes with Program Outcomes:

      PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1
CO2
CO3
CO4
CO5
Assessment Pattern
Mark Distribution
Total Marks | CIE Marks | ESE Marks | ESE Duration
Part B contains 2 questions from each module, of which a student should answer any one. Each question can have a maximum of 2 subdivisions and carries 14 marks.
Syllabus
Module - 1 (Basics of Machine Learning)

Module - 2 (Neural Networks)
Introduction to neural networks - Single layer perceptrons, Multi Layer Perceptrons (MLPs),
Representation Power of MLPs, Activation functions - Sigmoid, Tanh, ReLU, Softmax. Risk
minimization, Loss function, Training MLPs with backpropagation, Practical issues in neural
network training - The Problem of Overfitting, Vanishing and exploding gradient problems,
Difficulties in convergence, Local and spurious Optima, Computational Challenges.
Applications of neural networks.
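The activation functions listed above can be sketched in a few lines of NumPy (a minimal illustration; the function names and the test vector are chosen only for this example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)          # zero for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))          # shift for numerical stability
    return e / e.sum()                 # outputs sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z))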
Module - 3 (Deep learning)
Introduction to deep learning, Deep feed forward network, Training deep models, Optimization
techniques - Gradient Descent (GD), GD with momentum, Nesterov accelerated GD,
Stochastic GD, AdaGrad, RMSProp, Adam. Regularization Techniques - L1 and L2
regularization, Early stopping, Dataset augmentation, Parameter sharing and tying, Injecting
noise at input, Ensemble methods, Dropout, Parameter initialization.
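A compact sketch of three of the update rules named above (plain GD, GD with momentum, Adam); the hyperparameter values are common defaults, not values prescribed by the syllabus:

import numpy as np

def gd_step(w, grad, lr=0.1):
    return w - lr * grad                              # plain gradient descent

def momentum_step(w, v, grad, lr=0.1, gamma=0.9):
    v = gamma * v + lr * grad                         # accumulate velocity
    return w - v, v

def adam_step(w, m, s, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                      # first-moment estimate
    s = b2 * s + (1 - b2) * grad ** 2                 # second-moment estimate
    m_hat = m / (1 - b1 ** t)                         # bias correction (t starts at 1)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Example: minimise f(w) = ||w||^2 (gradient 2w) with momentum
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(50):
    w, v = momentum_step(w, v, 2 * w)
print(w)  # close to the optimum at the origin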
Module - 5 (Recurrent Neural Network)
Recurrent neural networks – Computational graphs, RNN design, encoder – decoder sequence
to sequence architectures, deep recurrent networks, recursive neural networks, modern RNNs
LSTM and GRU, Practical use cases for RNNs. Case study - Natural Language Processing.
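For reference, the standard LSTM and GRU update equations (sigma is the logistic sigmoid, \odot the element-wise product):

% LSTM cell
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \quad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad
h_t = o_t \odot \tanh(c_t)

% GRU cell
z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z), \quad
r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h), \quad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t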
Text Books
1. Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, 2016.
2. Aggarwal, Charu C., Neural Networks and Deep Learning, Springer International Publishing, 2018.
3. Buduma, Nikhil and Locascio, Nicholas, Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms, 1st ed., O'Reilly Media, 2017.
Reference Books
1. Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw-Hill Education,
2004.
2. Yegnanarayana, B., Artificial Neural Networks, PHI Learning Pvt. Ltd, 2009.
3. Michael Nielsen, Neural Networks and Deep Learning, 2018.
1    man      woman
2    man      man
3    woman    woman
4    man      man
5    man      woman
6    woman    woman
7    woman    man
8    man      man
9    man      woman
10   woman    woman
3. In a deep CNN architecture, the feature map L1 was processed by the operations shown in the figure: first down-sampled using a max pool operation of size 2 and stride 2, then three convolution operations, and finally a max unpool operation followed by an element-wise sum. The feature maps L1 and L4 are given below. Compute the matrix L6.

L1 = [ 10  20  15  22 ]      L4 = [ 10  20 ]
     [ 20  16  28  30 ]           [ 20  30 ]
     [ 30  12  20  16 ]
     [ 20  20  40  12 ]
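The figure itself is not reproduced here, so the three convolution operations cannot be recovered; the max pool and max unpool steps can be sketched, though, assuming (as is conventional) that the unpool reuses the switch positions recorded while pooling L1 and that the final element-wise sum is taken with L1:

import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2; records argmax positions for later unpooling
    h, w = x.shape
    out, idx = np.zeros((h // 2, w // 2)), {}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            block = x[i:i + 2, j:j + 2]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            out[i // 2, j // 2] = block[r, c]
            idx[(i // 2, j // 2)] = (i + r, j + c)
    return out, idx

def max_unpool_2x2(y, idx, shape):
    # places each value back at its recorded argmax position; zeros elsewhere
    out = np.zeros(shape)
    for (pi, pj), (r, c) in idx.items():
        out[r, c] = y[pi, pj]
    return out

L1 = np.array([[10, 20, 15, 22], [20, 16, 28, 30],
               [30, 12, 20, 16], [20, 20, 40, 12]], dtype=float)
L4 = np.array([[10, 20], [20, 30]], dtype=float)

pooled, idx = max_pool_2x2(L1)          # switch positions come from pooling L1
L6 = L1 + max_unpool_2x2(L4, idx, L1.shape)   # element-wise sum, as in the figure
print(L6)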
4. Illustrate the working of an RNN with an example of a single sequence defined on a vocabulary of four words.
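A minimal NumPy sketch of such an illustration; the word indices, hidden size, and random weights here are arbitrary choices for the example:

import numpy as np

rng = np.random.default_rng(0)
V, H = 4, 3                        # vocabulary size, hidden size (illustrative)
Wxh = rng.normal(0, 0.1, (H, V))   # input-to-hidden weights
Whh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden (recurrent) weights
Why = rng.normal(0, 0.1, (V, H))   # hidden-to-output weights

def one_hot(i, n=V):
    v = np.zeros(n)
    v[i] = 1.0
    return v

sentence = [0, 2, 3, 1]            # word indices of a single 4-word sequence
h = np.zeros(H)                    # initial hidden state
for t, w in enumerate(sentence):
    h = np.tanh(Wxh @ one_hot(w) + Whh @ h)   # same weights reused at every step
    logits = Why @ h
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                       # softmax over the next word
    print(f"t={t}: P(next word) = {np.round(probs, 3)}")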
QP CODE:                                                          PAGES: 4
Reg No: _______________
Name: _________________

APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY
FIFTH SEMESTER B.TECH DEGREE EXAMINATION (HONORS), MONTH & YEAR
Course Code: CST 395
Course Name: Neural Networks and Deep Learning
Max. Marks: 100                                        Duration: 3 Hours
PART A
Answer all Questions. Each question carries 3 Marks
2. Suppose 10000 patients get tested for flu; out of them, 9000 are actually healthy and 1000 are actually sick. For the sick people, the test was positive for 620 and negative for 380. For the healthy people, the same test was positive for 180 and negative for 8820. Construct a confusion matrix for the data and compute the accuracy.
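A short Python check of these counts, treating "sick" as the positive class:

# Counts from the question: 1000 sick (620 test positive, 380 negative),
# 9000 healthy (180 positive, 8820 negative)
TP, FN = 620, 380      # sick patients: test positive / negative
FP, TN = 180, 8820     # healthy patients: test positive / negative

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 9440/10000 = 0.944
precision = TP / (TP + FP)                    # 620/800   = 0.775
recall    = TP / (TP + FN)                    # 620/1000  = 0.620
print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")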
5. Derive the weight update rule in gradient descent when the error function is a) mean squared error b) cross entropy.
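For a single sigmoid unit, the expected derivations take the following shape (a sketch; the notation is chosen here):

z = w^{\top}x, \quad \hat{y} = \sigma(z), \quad \sigma'(z) = \hat{y}(1 - \hat{y})

% (a) Mean squared error
E = \tfrac{1}{2}(y - \hat{y})^2, \quad
\frac{\partial E}{\partial w} = -(y - \hat{y})\,\hat{y}(1 - \hat{y})\,x
\;\Rightarrow\;
w \leftarrow w + \eta\,(y - \hat{y})\,\hat{y}(1 - \hat{y})\,x

% (b) Cross entropy
E = -\bigl[\,y \ln \hat{y} + (1 - y)\ln(1 - \hat{y})\,\bigr], \quad
\frac{\partial E}{\partial w} = (\hat{y} - y)\,x
\;\Rightarrow\;
w \leftarrow w + \eta\,(y - \hat{y})\,x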
7. What happens if the stride of the convolutional layer increases? What can be the
maximum stride? Justify your answer.
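For reference, the usual output-size relation for a convolutional layer with input width W, kernel width K, padding P, and stride S:

W_{\text{out}} = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1

so a larger stride produces a smaller output feature map.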
Part B
(Answer any one question from each module. Each question carries 14 Marks)
11. (a) Prove that the decision boundary of binary logistic regression is linear. (9)
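A sketch of the standard argument:

P(y = 1 \mid x) = \sigma(w^{\top}x + b), \qquad
\sigma(w^{\top}x + b) = \tfrac{1}{2} \iff w^{\top}x + b = 0

so the decision boundary \{x : w^{\top}x + b = 0\} is a hyperplane, i.e. linear in x.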
(b) Given the following data, construct the ROC curve of the data. Compute the AUC. (5)

Threshold   TP   TN   FP   FN
    1        0   25    0   29
    2        7   25    0   22
    3       18   24    1   11
    4       26   20    5    3
    5       29   11   14    0
    6       29    0   25    0
    7       29    0   25    0
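A short NumPy sketch of the computation (the table implies 29 positive and 25 negative samples):

import numpy as np

# (TP, TN, FP, FN) at each threshold, from the table above
rows = [(0, 25, 0, 29), (7, 25, 0, 22), (18, 24, 1, 11),
        (26, 20, 5, 3), (29, 11, 14, 0), (29, 0, 25, 0), (29, 0, 25, 0)]

tpr = np.array([tp / (tp + fn) for tp, tn, fp, fn in rows])  # sensitivity
fpr = np.array([fp / (fp + tn) for tp, tn, fp, fn in rows])  # 1 - specificity

order = np.argsort(fpr)                  # order ROC points left to right
fx, ty = fpr[order], tpr[order]
auc = np.sum(np.diff(fx) * (ty[1:] + ty[:-1]) / 2.0)  # trapezoidal rule, ~0.92 here
print(np.round(fx, 3), np.round(ty, 3), round(auc, 3))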
OR
12. (a) With an example classification problem, explain the following terms: a) Hyperparameters b) Training set c) Validation set d) Bias e) Variance (8)
13. (a) Update the parameter V11 in the given MLP using backpropagation, with learning rate 0.5 and sigmoid activation function. The initial weights are given as V11 = 0.2, V12 = 0.1, V21 = 0.1, V22 = 0.3, W11 = 0.5, W21 = 0.2. (10)

[Figure: a 2-2-1 MLP with input-to-hidden weights V and hidden-to-output weights W; inputs 0.6 and 0.8, target T = 0.9]
(b) Explain the importance of choosing the right step size in neural networks. (4)
OR
14. (a) Explain in detail any four practical issues in neural network training (8)
(b) Calculate the output of the following neuron Y with the activation function as a) binary sigmoid b) tanh c) ReLU. (6)
(b) Differentiate gradient descent with and without momentum. Give equations for weight update in GD with and without momentum. Illustrate plateaus, saddle points, and slowly varying gradients. (8)
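For reference, the update equations in one common convention, with eta the learning rate and gamma the momentum coefficient:

% plain gradient descent
w_{t+1} = w_t - \eta\,\nabla E(w_t)

% gradient descent with momentum
v_{t+1} = \gamma v_t + \eta\,\nabla E(w_t), \qquad w_{t+1} = w_t - v_{t+1}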
OR
16. (a) Suppose a supervised learning problem is given to model a deep feed forward neural network. Suggest solutions for the following: a) small sized dataset for training b) dataset with both labelled and unlabeled data c) large dataset but data from a different distribution. (9)
(b) Describe the effect on bias and variance when a neural network is modified with more hidden units followed by dropout regularization. (5)
17. (a) Draw and explain the architecture of Convolutional Neural Networks (8)
(b) Suppose that a CNN was trained to classify images into different categories. It performed well on a validation set that was taken from the same source as the training set, but not on a testing set. What could be the problem with the training of such a CNN? How will you ascertain the problem? How can those problems be solved? (6)
OR
18. (a) Explain the following convolution functions: a) tensors b) kernel flipping c) down sampling d) strides e) zero padding. (10)
19. (a) Describe how an LSTM takes care of the vanishing gradient problem. Use some hypothetical numbers for input and output signals to explain the concept. (8)
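The core of the usual argument, treating the gate activations as constants, as the informal explanation does:

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\;\Rightarrow\;
\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)

When the forget gate f_t stays close to 1, gradients flow through the cell state nearly unchanged over many time steps, instead of being repeatedly multiplied by derivatives smaller than 1 as in a vanilla RNN.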
OR
20. (a) Explain an LSTM based solution for any one of the problems in the Natural Language Processing domain. (8)
Teaching Plan
2.6 Practical issues in neural network training - The Problem of Overfitting, Vanishing and exploding gradient problems (1 hour)