
COMPUTER SCIENCE AND ENGINEERING

Course Code: CST414    Course Name: DEEP LEARNING
Category: PEC    L-T-P: 2-1-0    Credit: 3    Year of Introduction: 2019

Preamble: Deep Learning is a recently emerged branch of machine learning, particularly
suited to solving a wide range of problems in Computer Vision and Natural Language Processing.
This course introduces the building blocks of deep learning: neural networks, deep neural
networks, convolutional neural networks and recurrent neural networks. Learning and
optimization strategies such as Gradient Descent, Nesterov Accelerated Gradient Descent, Adam,
AdaGrad and RMSProp are also discussed. The course helps students attain a sound knowledge of
the deep architectures used for solving various Vision and NLP tasks, and prepares learners to
master modern techniques in deep learning such as attention mechanisms, generative models and
reinforcement learning.

Prerequisite: Basic understanding of probability theory, linear algebra and machine learning

Course Outcomes: After the completion of the course, the student will be able to

CO1 Illustrate the basic concepts of neural networks and their practical issues
(Cognitive Knowledge Level: Apply)

CO2 Outline the standard regularization and optimization techniques for deep neural
networks (Cognitive Knowledge Level: Understand)

CO3 Implement the foundation layers of CNN (pooling, convolutions)
(Cognitive Knowledge Level: Apply)

CO4 Implement a sequence model using recurrent neural networks
(Cognitive Knowledge Level: Apply)

CO5 Use different neural network/deep learning models for practical applications.
(Cognitive Knowledge Level: Apply)


Mapping of course outcomes with program outcomes

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12

CO1

CO2

CO3

CO4

CO5

Abstract POs defined by National Board of Accreditation

PO# Broad PO PO# Broad PO

PO1 Engineering Knowledge PO7 Environment and Sustainability

PO2 Problem Analysis PO8 Ethics

PO3 Design/Development of solutions PO9 Individual and team work


PO4 Conduct investigations of complex problems PO10 Communication

PO5 Modern tool usage PO11 Project Management and Finance

PO6 The Engineer and Society PO12 Life-long learning

Assessment Pattern

Bloom's Category   Continuous Assessment Test 1 (%)   Continuous Assessment Test 2 (%)   End Semester Examination Marks (%)

Remember           30                                 30                                 30

Understand         30                                 30                                 30

Apply              40                                 40                                 40


Analyze

Evaluate
Create

Mark Distribution

Total Marks   CIE Marks   ESE Marks   ESE Duration

150           50          100         3 hours

Continuous Internal Evaluation Pattern:


Attendance: 10 marks
Continuous Assessment Tests (Average of Internal Tests 1 & 2): 25 marks
Continuous Assessment Assignment: 15 marks

Internal Examination Pattern


Each of the two internal examinations has to be conducted out of 50 marks. The first series test
shall preferably be conducted after completing the first half of the syllabus, and the second
series test shall preferably be conducted after completing the remaining part of the syllabus.
There will be two parts: Part A and Part B. Part A contains 5 questions (preferably, 2 questions
each from the completed modules and 1 question from the partly completed module), each carrying
3 marks, adding up to 15 marks for Part A. Students should answer all questions from Part A. Part
B contains 7 questions (preferably, 3 questions each from the completed modules and 1 question
from the partly completed module), each carrying 7 marks. Out of the 7 questions, a student
should answer any 5.

End Semester Examination Pattern:


There will be two parts: Part A and Part B. Part A contains 10 questions, with 2 questions from
each module, each carrying 3 marks. Students should answer all questions. Part B contains 2 full
questions from each module, of which the student should answer any one. Each question can have a
maximum of 2 sub-divisions and carries 14 marks.


Syllabus
Module-1 (Neural Networks )

Introduction to neural networks - Single layer perceptrons, Multi Layer Perceptrons (MLPs),
Representation Power of MLPs, Activation functions - Sigmoid, Tanh, ReLU, Softmax, Risk
minimization, Loss function, Training MLPs with backpropagation, Practical issues in neural
network training - The Problem of Overfitting, Vanishing and exploding gradient problems,
Difficulties in convergence, Local and spurious optima, Computational challenges. Applications
of neural networks.
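
For illustration only (not prescribed by the syllabus; the toy data, layer sizes and learning rate below are arbitrary assumptions), the following NumPy sketch trains a one-hidden-layer MLP with sigmoid activations using backpropagation and gradient descent on a squared-error loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples with 3 features each, scalar targets (arbitrary values).
X = np.array([[2., 2., 1.], [0., 1., 1.], [1., 0., 2.], [1., 1., 0.]])
y = np.array([[1.], [0.], [1.], [0.]])

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden
W2, b2 = 0.1 * rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 0.5                                               # learning rate (step size)

for _ in range(1000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)            # hidden activations
    y_hat = sigmoid(h @ W2 + b2)        # network output

    # Backpropagation of the squared-error loss
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # output-layer error signal
    delta1 = (delta2 @ W2.T) * h * (1 - h)       # hidden-layer error signal

    # Gradient-descent updates (averaged over the batch)
    W2 -= lr * h.T @ delta2 / len(X); b2 -= lr * delta2.mean(axis=0)
    W1 -= lr * X.T @ delta1 / len(X); b1 -= lr * delta1.mean(axis=0)
```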

Module-2 (Deep learning)

Introduction to deep learning, Deep feed forward network, Training deep models, Optimization
techniques - Gradient Descent (GD), GD with momentum, Nesterov accelerated GD, Stochastic
GD, AdaGrad, RMSProp, Adam. Regularization Techniques - L1 and L2 regularization, Early
stopping, Dataset augmentation, Parameter sharing and tying, Injecting noise at input, Ensemble
methods, Dropout, Parameter initialization.
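
For reference, the sketch below writes out the standard per-parameter update rules behind the optimizers listed in this module (momentum, AdaGrad, RMSProp, Adam). It is a minimal NumPy illustration; the default hyperparameter values are common choices assumed here, not values fixed by the syllabus.

```python
import numpy as np

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    """GD with momentum: accumulate a velocity from past gradients, then step."""
    v = beta * v + g
    return w - lr * v, v

def adagrad_step(w, g, r, lr=0.01, eps=1e-8):
    """AdaGrad: scale each coordinate by the accumulated squared gradients."""
    r = r + g ** 2
    return w - lr * g / (np.sqrt(r) + eps), r

def rmsprop_step(w, g, r, lr=0.001, rho=0.9, eps=1e-8):
    """RMSProp: exponential moving average of squared gradients instead of a sum."""
    r = rho * r + (1 - rho) * g ** 2
    return w - lr * g / (np.sqrt(r) + eps), r

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first (m) and second (v) moment estimates."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# One illustrative momentum update on a toy parameter vector.
w, v = np.array([0.1, 0.2]), np.zeros(2)
w, v = momentum_step(w, g=np.array([0.5, -0.3]), v=v)
```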

Module-3 (Convolutional Neural Network)

Convolutional Neural Networks - convolution operation, motivation, pooling, Convolution and
Pooling as an infinitely strong prior, variants of convolution functions, structured outputs,
data types, efficient convolution algorithms.
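
A point worth stating explicitly alongside this module: for a convolution with input size n, filter size f, padding p and stride s, the output size is (n - f + 2p)/s + 1. The naive NumPy sketch below (illustrative only; the toy image and averaging filter are assumptions) implements the single-channel 2D convolution operation and obeys that relation.

```python
import numpy as np

def conv2d(x, k, stride=1, pad=0):
    """Naive single-channel 2D convolution (cross-correlation, as used in CNNs)."""
    x = np.pad(x, pad)                         # zero padding on both axes
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1       # output height
    ow = (x.shape[1] - kw) // stride + 1       # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)      # dot product of filter and patch
    return out

image = np.arange(36, dtype=float).reshape(6, 6)     # toy 6x6 input
kernel = np.ones((3, 3)) / 9.0                       # 3x3 averaging filter
print(conv2d(image, kernel, stride=1, pad=0).shape)  # (6 - 3 + 0)/1 + 1 = 4 -> (4, 4)
```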

Module- 4 (Recurrent Neural Network)

Recurrent neural networks – Computational graphs, RNN design, encoder – decoder sequence to
sequence architectures, deep recurrent networks, recursive neural networks, modern RNNs LSTM
and GRU.
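
The sketch below (illustrative; the layer sizes and random weights are assumptions) unrolls a vanilla RNN over a short input sequence, making explicit that the same weight matrices are reused at every time step.

```python
import numpy as np

def rnn_forward(x_seq, h0, Wxh, Whh, Why, bh, by):
    """Unroll a vanilla RNN over a sequence; return per-step outputs and final state."""
    h, outputs = h0, []
    for x_t in x_seq:                                # one iteration per time step
        h = np.tanh(Wxh @ x_t + Whh @ h + bh)        # shared weights at every step
        outputs.append(Why @ h + by)                 # per-step output (logits)
    return outputs, h

# Toy dimensions: 3-dim inputs, 5-dim hidden state, 2-dim outputs, sequence length 4.
rng = np.random.default_rng(0)
Wxh, Whh, Why = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(2, 5))
bh, by, h0 = np.zeros(5), np.zeros(2), np.zeros(5)
x_seq = [rng.normal(size=3) for _ in range(4)]
outputs, h_last = rnn_forward(x_seq, h0, Wxh, Whh, Why, bh, by)
```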

Module-5 (Application Areas)

Applications - computer vision, speech recognition, natural language processing, common word
embeddings: Continuous Bag-of-Words, Word2Vec, Global Vectors for Word Representation (GloVe).
Research Areas - autoencoders, representation learning, Boltzmann machines, deep belief
networks.
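
As a small illustration of the Continuous Bag-of-Words model listed above (a sketch only; the vocabulary size, embedding dimension and context indices are assumptions), the centre word is scored by averaging the context-word embeddings and applying a softmax over the vocabulary.

```python
import numpy as np

def cbow_forward(context_ids, U, V):
    """CBOW forward pass: average context embeddings, then softmax over the vocabulary."""
    h = U[context_ids].mean(axis=0)       # average of the context-word embeddings
    scores = V @ h                        # one score per vocabulary word
    e = np.exp(scores - scores.max())     # numerically stable softmax
    return e / e.sum()

vocab_size, dim = 10, 4                   # toy sizes
rng = np.random.default_rng(0)
U = rng.normal(size=(vocab_size, dim))    # input (context) embedding matrix
V = rng.normal(size=(vocab_size, dim))    # output (target) embedding matrix
probs = cbow_forward([1, 3, 5, 7], U, V)  # predict the centre word from 4 context words
print(probs.argmax(), round(probs.sum(), 6))   # most likely word id; probabilities sum to 1
```
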
Text Books
1. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
2. Charu C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer, 2018.
3. Nikhil Buduma and Nicholas Locascio, Fundamentals of Deep Learning: Designing Next-Generation
   Machine Intelligence Algorithms, 1st ed., O'Reilly Media, 2017.


Reference Books
1. Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw-Hill Education, 2004.
2. B. Yegnanarayana, Artificial Neural Networks, PHI Learning Pvt. Ltd., 2009.
3. Michael Nielsen, Neural Networks and Deep Learning, Determination Press, 2018.

Course Level Assessment Questions


Course Outcome1 (CO1):
1. Suppose you have a 3-dimensional input x = (x1, x2, x3) = (2, 2, 1) fully connected
to 1 neuron which is in the hidden layer with activation function sigmoid. Calculate
the output of the hidden layer neuron.

2. Design a single layer perceptron to compute the NAND (not-AND) function. This
function receives two binary-valued inputs x1 and x2, and returns 0 if both inputs are
1, and returns 1 otherwise.

3. Suppose we have a fully connected, feed-forward network with no hidden layer, and
5 input units connected directly to 3 output units. Briefly explain why adding a
hidden layer with 8 linear units does not make the network any more powerful.

4. Briefly explain one thing you would use a validation set for, and why you can’t just
do it using the test set.

5. Give a method to fight vanishing gradients in fully-connected neural networks.


Assume we are using a network with Sigmoid activations trained using SGD.

6. You would like to train a fully-connected neural network with 5 hidden layers, each
with 10 hidden units. The input is 20-dimensional and the output is a scalar. What is
the total number of trainable parameters in your network?

Course Outcome 2(CO2):


1. Derive a mathematical expression to show L2 regularization as weight decay.
Explain how L2 regularization improves the performance of deep feed forward
neural networks.

2. In stochastic gradient descent, each pass over the dataset requires the same number
of arithmetic operations, whether we use minibatches of size 1 or size 1000. Why
can it nevertheless be more computationally efficient to use minibatches of size
1000?


3. State how to apply early stopping in the context of learning using Gradient Descent.
Why is it necessary to use a validation set (instead of simply using the test set) when
using early stopping?

4. Suppose that a model does well on the training set, but only achieves an accuracy of
85% on the validation set. You conclude that the model is overfitting, and plan to
use L1 or L2 regularization to fix the issue. However, you learn that some of the
examples in the data may be incorrectly labeled. Which form of regularisation
would you prefer to use and why?

5. Describe one advantage of using Adam optimizer instead of basic gradient descent.

Course Outcome 3(CO3):


1. Draw and explain the architecture of convolutional neural networks.
2. Consider a convolution layer. The input consists of 6 feature maps of size 20×20.
The output consists of 8 feature maps, and the filters are of size 5 x 5. The
convolution is done with a stride of 2 and zero padding, so the output feature maps
are of size 10 x 10.
a. Determine the number of weights in this convolution layer.
b. Determine the number of weights if we made this a fully connected layer,
but the number of input and output units are kept the same as in the network.
3. Suppose two people A and B have implemented two neural networks for
recognizing handwritten digits from 16 x 16 grayscale images. Each network has a
single hidden layer, and makes predictions using a softmax output layer with 10
units, one for each digit class.
a. A’s network is a convolutional net. The hidden layer consists of three 16 x
16 convolutional feature maps, each with filters of size 5 x 5, and uses the
logistic nonlinearity. All of the hidden units are connected to all of the
output units.
b. B’s network is a fully connected network with no weight sharing. The
hidden layer consists of 768 logistic units (the same number of units as in
A’s convolutional layer).
4. Briefly explain one advantage of A’s approach and one advantage of B’s approach.
5. Why do the layers in a deep architecture need to be non-linear?
6. Give two benefits of using convolutional layers instead of fully connected ones for
visual tasks.
7. You have an input volume of 32 x 32 x 3. What are the dimensions of the resulting
volume after convolving a 5 x 5 kernel with zero padding, stride of 1, and 2 filters?


Course Outcome 4 (CO4):


1. Draw and explain the architecture of LSTM.
2. Name at least one benefit of the LSTM model over the bag-of-vectors model.
3. Give one advantage of GloVe over Skipgram/CBOW models.
4. What are two ways practitioners deal with having two different sets of word vectors
U and V at the end of training both GloVe and word2vec?
5. If we have a recurrent neural network (RNN), we can view it as a different
type of network by "unrolling it through time". Briefly explain what that
means.
6. Briefly explain how “unrolling through time” is related to “weight sharing” in
convolutional networks.

Course Outcome 5(CO5):


1. Develop a deep learning solution for a problem in the domain of i) natural
language processing or ii) computer vision.
2. Illustrate the workings of the RNN with an example of a single sequence defined on
a vocabulary of four words.
3. Is an autoencoder for supervised learning or for unsupervised learning?
Explain briefly.
4. Sketch the architecture of an autoencoder network.
5. Describe how to train an autoencoder network.


Model Question Paper

QP CODE:

Reg No: _______________

Name: _________________ PAGES : 4

APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

EIGHTH SEMESTER B.TECH DEGREE EXAMINATION, MONTH & YEAR

Course Code: CST414

Course Name: Deep Learning

Max. Marks : 100 Duration: 3 Hours

PART A

Answer All Questions. Each Question Carries 3 Marks

1. Discuss the limitation of a single layer perceptron with an example.

2. List the advantages and disadvantages of sigmoid and ReLU activation functions.

3. Derive weight updating rule in gradient descent when the error function is a)
mean squared error b) cross entropy.

4. Discuss methods to prevent overfitting in neural networks.

5. What happens if the stride of the convolutional layer increases? What can be the
maximum stride? Explain.

6. Draw the architecture of a simple CNN and write short notes on each block.

7. How does a recursive neural network work?

8. List down the differences between LSTM and RNN.

9. Illustrate the use of deep learning concepts in Speech Recognition.


10. What is an autoencoder? Give one application of an autoencoder.


(10x3=30)

Part B
(Answer any one question from each module. Each question carries 14 Marks)

11. (a) Update the parameters in the given MLP using gradient descent with learning (10)
rate 0.5 and activation function ReLU. Initial weights are given as
V = [0.1, 0.2, 0.1, 0.1] and W = [0.1, 0.1].

(b) Explain the importance of choosing the right step size in neural networks. (4)

OR

12. (a) Draw the architecture of a multi-layer perceptron. Derive update rules for (10)
parameters in the multi-layer neural network through the gradient descent

(b) Calculate the output of the following neuron Y if the activation function is a (4)
binary sigmoid.


13. (a) Explain what might happen in AdaGrad, where the update is expressed as (6)
Δw_t = −η g_t / √(Σ_{τ=1}^{t} g_τ²), where the denominator accumulates the L2
norm of all previous gradients on a per-dimension basis and η is a global
learning rate shared by all dimensions.

(b) Differentiate gradient descent with and without momentum. Give equations (8)
for weight updation in GD with and without momentum. Illustrate plateaus,
saddle points and slowly varying gradient.

OR

14. (a) Suppose a supervised learning problem is given to model a deep feed forward (9)
neural network. Suggest solutions for the following a) small sized dataset for
training b) dataset with unlabeled data c) large data set but data from different
distribution.

(b) Describe the effect on bias and variance when a neural network is modified to (5)
have a larger number of hidden units, followed by dropout regularization.

15. (a) Draw and explain the architecture of Convolutional Neural Networks (8)

(b) Suppose that a CNN was trained to classify images into different categories. (6)
It performed well on a validation set that was taken from the same source as
the training set but not on a testing set, which comes from another
distribution. What could be the problem with the training of such a CNN?
How will you ascertain the problem? How can those problems be solved?

OR

16. (a) What is the motivation behind convolutional neural networks? (4)

(b) Discuss all the variants of the basic convolution function. (10)

17. (a) Describe how an LSTM takes care of the vanishing gradient problem. Use (8)
some hypothetical numbers for input and output signals to explain the
concept.

(b) Draw and explain the architecture of Recurrent Neural Networks (6)

OR


18. (a) Explain the application of LSTM in Natural Language Processing. (8)

(b) Discuss the architecture of GRU. (6)

19. (a) Explain any two word embedding techniques (8)

(b) Explain the merits and demerits of using Autoencoders in Computer Vision. (6)

OR

20. (a) Illustrate the use of representation learning in object classification. (7)

(b) Compare Boltzmann Machine with Deep Belief Network. (7)

Teaching Plan
No.    Contents                                                No. of Lecture Hours (36 hrs)

Module-1 (Neural Networks ) (7 hours)

1.1 Introduction to neural networks -Single layer perceptrons 1


1.2 Multi Layer Perceptrons (MLPs), Representation Power of MLPs 1
1.3 Activation functions - Sigmoid, Tanh, ReLU, Softmax, Risk minimization, Loss function 1
1.4 Training MLPs with backpropagation 1
1.5 Illustration of back propagation algorithm 1
1.6 Practical issues in neural network training - The Problem of Overfitting, Vanishing and exploding gradient problems 1
1.7 Difficulties in convergence, Local and spurious optima, Computational challenges 1

Module-2 (Deep learning) (9 hours)

2.1 Introduction to deep learning, Deep feed forward network 1


2.2 Training deep models, Concepts of Regularization and optimization, 1


2.3 Gradient Descent (GD), GD with momentum, 1


2.4 Nesterov accelerated GD, Stochastic GD, 1
2.5 AdaGrad, RMSProp, Adam, 1
2.6 L1 and L2 regularization, Early stopping, Dataset augmentation, 1
2.7 Parameter sharing and tying, Injecting noise at input, Ensemble methods 1
2.8 Parameter sharing and tying, Injecting noise at input, Ensemble methods 1

2.9 Dropout, Parameter initialization. 1

Module-3 (Convolutional Neural Network) (6 hours)

3.1 Convolutional Neural Networks – convolution operation 1


3.2 motivation, pooling 1
3.3 Convolution and Pooling as an infinitely strong prior 1
3.4 Variants of convolution functions 1
3.5 structured outputs, data types. 1
3.6 Efficient convolution algorithms. 1

Module- 4 (Recurrent Neural Network) (5 hours)

4.1 Recurrent neural networks – Computational graphs, RNN design 1


4.2 Encoder – decoder sequence to sequence architectures 1
4.3 Deep recurrent networks, recursive neural networks 1
4.4 Modern RNNs LSTM 1
4.5 GRU 1

Module-5 (Application Areas) (9 hours)

5.1 Computer vision. (TB1: Section 12.2) 1


5.2 Speech recognition. (TB1: Section 12.3) 1
5.3 Natural language processing. (TB1: Section 12.4) 1
5.4 Common Word Embedding - Continuous Bag-of-Words, Word2Vec (TB3: Section 2.6) 1


5.5 Common Word Embedding - Global Vectors for Word Representation (GloVe) (TB3: Section 2.9.1 - Pennington 2014) 1
5.6 Brief introduction on current research areas - Autoencoders, Representation learning. (TB3: Section 4.10) 1
5.7 Brief introduction on current research areas - Representation learning. (TB3: Section 9.3) 1
5.8 Brief introduction on current research areas - Boltzmann Machines, Deep belief networks. (TB1: Section 20.1, TB3: Section 6.3) 1
5.9 Brief introduction on current research areas - Deep belief networks. (TB1: Section 20.3) 1
