
Deep Learning

Module 1
BITS Pilani
The course owner and author of this course deck, Prof. Seetha Parameswaran, gratefully acknowledges the authors who made their course materials freely available online.

Course Logistics

What We Learn… (Module Structure)

1. Fundamentals of Neural Networks
2. Multilayer Perceptron
3. Deep Feedforward Neural Networks
4. Improving DNN performance through Optimization and Regularization
5. Convolutional Neural Networks
6. Sequence Models
7. Attention Mechanism
8. Representation Learning
9. Generative Adversarial Networks

Books

Textbook
● Dive into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola. https://d2l.ai/chapter_introduction/index.html

Reference book
● Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville. https://www.deeplearningbook.org/

Evaluation Components and Schedule

| Evaluation Component | Mode | Duration | Weightage | Date and Time |
| EC1: Quiz I * | Online | 30 Mins | 10% | Refer to LMS |
| EC1: Quiz II * | Online | 30 Mins | 10% | Refer to LMS |
| EC1: Assignment 01 | Online | 4 Weeks | 10% | Refer to LMS |
| EC1: Assignment 02 | Online | 4 Weeks | 10% | Refer to LMS |
| EC2: Mid-sem exam (Closed book) | | 2 Hours | 30% | Refer to LMS |
| EC3: Comprehensive exam (Open book) | | 2.5 Hours | 40% | Refer to LMS |

* Best of Two

Lab Sessions

● L1 Introduction to PyTorch
● L2 Deep Neural Network with Back-propagation and optimization
● L3 CNN
● L4 RNN
● L5 LSTM
● L6 Auto-encoders

Course Logistics

● Refer to Taxila for the following:
○ Handout
○ Schedule for Webinars
○ Schedule of Quizzes and Assignments
○ Evaluation scheme
○ Session Slide Decks
○ Demo Lab Sheets
○ Quiz-I, Quiz-II
○ Assignment-I, Assignment-II
○ Sample QPs
● Lecture Recordings
○ Available on Microsoft Teams

Honour Code

All submissions for graded components must be the result of your original
effort. It is strictly prohibited to copy and paste verbatim from any sources,
whether online or from your peers. The use of unauthorized sources or
materials, as well as collusion or unauthorized collaboration to gain an unfair
advantage, is also strictly prohibited. Please note that we will not distinguish
between the person sharing their resources and the one receiving them for
plagiarism, and the consequences will apply to both parties equally.

Suspicious circumstances, such as identical verbatim answers or a significant overlap of unreasonable similarities in a set of submissions, will be investigated, and severe punishments will be imposed on all those found guilty of plagiarism.

In case of queries regarding the course…
Step 1: Post in the discussion forum.
● Read through the existing posts, and if you find a topic similar to your concern, add on to the existing discussion.
● Avoid duplicating queries or issues.
Step 2: Email the IC at [email protected] if the query or issue is not resolved within 1 week. The turnaround time for an email response is 48 hours.
● Please mention the phrase "DL" clearly in the subject line.
● Use your BITS email ID for correspondence. Emails from personal accounts will be ignored without a reply.

PATIENCE is highly APPRECIATED

What is Deep Learning?

Definitions of Deep Learning

● Deep Learning is a type of machine learning based on artificial neural networks, in which multiple layers of processing are used to extract progressively higher-level features from data.
● Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain.
● Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example.
● Deep learning is a subset of machine learning; it is essentially a neural network with three or more layers.
● Deep Learning gets its name from the fact that we add more "layers" to learn from the data.

Where does DL sit in AI?

AI is a general field that encompasses machine learning and deep learning, but it also includes many more approaches that don't involve any learning.

AI - ML - DL

AI: Artificial intelligence is the science of making things smart. The aim is to make machines perform human tasks. E.g., a robot cleaning a room.

ML: Machine learning is an approach to AI. The machine learns to perform tasks through experience.

DL: Deep Learning is a technique for implementing machine learning to recognise patterns.

Deep (Machine) Learning

● Deep learning is a specific subfield of machine learning: learning representations from data, with an emphasis on learning successive layers of increasingly meaningful representations.
● The "deep" in deep learning stands for this idea of successive layers of representations.
● The number of layers that contribute to modelling the data is called the depth of the model.
● In deep learning, the layered representations are learned via models called neural networks, structured in literal layers stacked on top of each other.

Why Deep Learning?

● Large amounts of data
● Lots of unstructured data such as images, text, audio, and video
● Cheap, high-quality sensors
● Cheap computation: CPUs, GPUs, distributed clusters
● Cheap data storage
● Learning by example
● Automated feature generation
● Better learning capabilities
● Scalability
● Advanced analytics can be applied

Deep Learning Timeline

Applications of Deep Learning

Breakthroughs with Neural Networks

Many more applications…

● A program that predicts tomorrow's weather given geographic information, satellite images, and a trailing window of past weather.
● A program that takes in a question, expressed in free-form text, and answers it correctly.
● A program that, given an image, can identify all the people it contains, drawing outlines around each.
● A program that presents users with products that they are likely to enjoy but unlikely, in the natural course of browsing, to encounter.

Key components of a DL problem

Core components of a DL problem

1. The data that we can learn from.
2. A model of how to transform the data.
3. An objective function that quantifies how well (or badly) the model is doing.
4. An algorithm to adjust the model's parameters to optimize the objective function.

A minimal code sketch tying these four components together is given below.

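As a concrete illustration (a sketch, not from the slides: the synthetic data, dimensions, and learning rate are all made-up assumptions), here is a minimal PyTorch example in which all four components are visible:

import torch

# 1. Data: 100 synthetic examples with 3 features each (made-up numbers)
X = torch.randn(100, 3)
true_w = torch.tensor([[2.0], [-1.0], [0.5]])
y = X @ true_w + 0.1 * torch.randn(100, 1)

# 2. Model: a single linear transformation of the data
model = torch.nn.Linear(3, 1)

# 3. Objective function: mean squared error (lower is better)
loss_fn = torch.nn.MSELoss()

# 4. Optimization algorithm: gradient descent on the model's parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()        # reset accumulated gradients
    loss = loss_fn(model(X), y)  # how badly is the model doing?
    loss.backward()              # compute gradients of the loss
    optimizer.step()             # adjust parameters to reduce the loss
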
1. Data

● A collection of examples.
● Data has to be converted into a useful and suitable numerical representation.
● Each example (or data point, data instance, sample) typically consists of a set of attributes called features (or covariates), from which the model must make its predictions.
● In supervised learning problems, the attribute to be predicted is designated as the label (or target).
● Mathematically, a dataset is a set of m examples {(x^(1), y^(1)), ..., (x^(m), y^(m))}, where x^(i) is the feature vector of example i and y^(i) is its label.
● We need the right data.

1. Data

● Dimensionality of data
○ When each example has the same number of numerical values, the data consist of fixed-length vectors (e.g., images of a fixed size).
○ The constant length of the vectors is called the dimensionality of the data.
○ Text data, in contrast, is varying-length data (see the illustration below).

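A small illustration (the shapes and token values are made up): a batch of same-sized images flattens to fixed-length vectors, while tokenized sentences vary in length.

import torch

# Fixed-length data: 4 grayscale 28x28 images flatten to 784-dim vectors,
# so every example has the same dimensionality (784).
images = torch.randn(4, 28, 28)
vectors = images.reshape(4, -1)  # shape: (4, 784)

# Varying-length data: tokenized sentences have different token counts,
# so they do not form fixed-length vectors without padding or truncation.
sentences = [[101, 2023, 2003, 102], [101, 2842, 102]]
print(vectors.shape, [len(s) for s in sentences])
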
2. Model

● A model denotes the computational machinery for ingesting data of one type and spitting out predictions of a possibly different type.
● Deep learning models consist of many successive transformations of the data that are chained together top to bottom, hence the name deep learning (see the sketch below).

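For example, a stack of successive transformations chained together might look like this hypothetical sketch (the layer sizes are arbitrary assumptions):

import torch.nn as nn

# Each layer transforms the representation produced by the previous one.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # first learned representation
    nn.Linear(256, 64), nn.ReLU(),   # deeper, more abstract representation
    nn.Linear(64, 10),               # output: scores for 10 classes
)
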
3. Objective Function

● Learning means improving at some task over time.
● A formal mathematical system of learning machines is defined using formal measures of how good (or bad) the models are. These formal measures are called objective functions.
● By convention, objective functions are defined so that lower is better.
● Because lower is better, these functions are sometimes called loss functions.

3. Loss Functions

● To predict numerical values (regression), the most common loss function is squared error.
● For classification, the most common objective is to minimize the error rate, i.e., the fraction of examples on which our predictions disagree with the ground truth. (Both measures are sketched below.)

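A quick numerical sketch of both measures (the predictions and labels are illustrative only):

import torch

# Regression: squared error, averaged over the examples
y_true = torch.tensor([3.0, -0.5, 2.0])
y_pred = torch.tensor([2.5, 0.0, 2.0])
mse = ((y_pred - y_true) ** 2).mean()  # (0.25 + 0.25 + 0) / 3 = 0.1667

# Classification: error rate = fraction of predictions that disagree
labels = torch.tensor([1, 0, 2, 1])
preds = torch.tensor([1, 2, 2, 0])
error_rate = (preds != labels).float().mean()  # 2 of 4 wrong -> 0.5
print(mse.item(), error_rate.item())
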
3. Loss Functions

● The loss function is defined with respect to the model parameters and depends upon the dataset.
● We learn the best values of our model parameters by minimizing the loss incurred on a set of examples collected for training. However, doing well on the training data does not guarantee that we will do well on unseen data, i.e., the model has to generalize well.
● When a model performs well on the training set but fails to generalize to unseen data, we say that it is overfitting.

4. Optimization Algorithms

● An optimization algorithm is an algorithm capable of searching for the best possible parameters for minimizing the loss function.
● Popular optimization algorithms for deep learning are based on an approach called gradient descent (sketched below).

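A minimal hand-rolled sketch of the idea on synthetic one-dimensional data (the data and learning rate are made up): repeatedly move the parameter against the gradient of the squared-error loss.

# Fit y = w * x by gradient descent on mean squared error.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with the "true" w = 2
w, lr = 0.0, 0.05

for step in range(100):
    # gradient of (1/m) * sum (w*x - y)^2 with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad    # step against the gradient
print(w)              # approaches 2.0
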
Example of the Framework

● We have to tell a computer explicitly how to map from inputs to outputs.
● We have to define the problem precisely, pinning down the exact nature of the inputs and outputs, and choose an appropriate model family.
● E.g., for wake-word recognition: collect a huge dataset containing examples of audio, and label those that do and those that do not contain the wake word.

Example of the Framework

● Create a model
○ Define a flexible program whose behavior is determined by a number of parameters.
○ To determine the best possible set of parameters, use the data. The parameters should improve the performance of the program with respect to some measure of performance on the task of interest.
○ After fixing the parameters, we call the program a model.
■ E.g., the model receives a snippet of audio as input, and the model generates a selection among {yes, no} as output.
○ The set of all distinct programs (input-output mappings) that we can produce just by manipulating the parameters is called a family of models.
■ E.g., we expect that the same model family should be suitable for "Alexa" recognition and "Hey Siri" recognition because they seem, intuitively, to be similar tasks.

Example of the Framework

● The meta-program that uses our dataset to choose the parameters is called a learning algorithm.
● In machine learning, learning is the process by which we discover the right setting of the parameters, coercing the desired behavior from our model.
● Train the model with data.

Reading from the textbook Dive into Deep Learning
● Chapter 1
● Chapter 2 for Python preliminaries, Linear Algebra, Calculus, Probability

Artificial Neural Network

What are Neural Networks?

● It begins with the human brain.
● Humans learn, solve problems, recognize patterns, create, think deeply about things, meditate, and much more.
● Humans learn through association. [Refer to Associationism for more details.]

Observation: The Brain

● The brain is a mass of interconnected neurons.
● The number of neurons is approximately 10^(10).
● The number of connections per neuron is approximately 10^(4) to 10^(5).
● Neuron switching time is approximately 0.001 second.
● Scene recognition time is about 1 second.
● 100 inference steps do not seem like enough, which suggests a lot of parallel computation.

Brain: Interconnected Neurons

● Many neurons connect in to each neuron.
● Each neuron connects out to many neurons.

Biological Neuron

Connectionist Machines

● A network of processing elements, called artificial neural units.
● The neurons are interconnected to form a network.
● All world knowledge is stored in the connections between these elements.
● Neural networks are connectionist machines.

What are Artificial Neurons?

● A neuron is a processing element inspired by how the brain works.
● Like a biological neuron, each artificial neuron does some computation, and each neuron is interconnected with other neurons.
● As in the brain, the interconnections between neurons store the knowledge the network learns. The knowledge is stored as parameters.

Properties of Artificial Neural Nets (ANNs)

● Many neuron-like threshold switching units.
● Many weighted interconnections among units.
● Highly parallel, distributed processing.
● Emphasis on tuning parameters or weights automatically.

When to consider Neural Networks?

● Input is high-dimensional, discrete or real-valued (e.g., raw sensor input).
● Data is possibly noisy, with lots of errors.
● Output is discrete or real-valued, or a vector of values.
● The form of the target function is unknown.
● Human readability (in other words, explainability) of the result is unimportant.
● Examples:
○ Speech phoneme recognition
○ Image classification
○ Financial prediction

Perceptron

● One type of ANN system is based on a unit called a perceptron.
● A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, and then outputs 1 if the result is greater than some threshold and -1 otherwise (see the sketch below).

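A minimal sketch of such a unit (the weights, bias, and inputs below are illustrative assumptions; the bias plays the role of the negated threshold):

import numpy as np

def perceptron(x, w, b):
    # Output 1 if the linear combination w . x + b exceeds 0, else -1.
    return 1 if np.dot(w, x) + b > 0 else -1

w = np.array([1.0, 1.0])  # illustrative weights
b = -1.5                  # illustrative bias (negated threshold)
print(perceptron(np.array([1.0, 1.0]), w, b))  # 1
print(perceptron(np.array([0.0, 1.0]), w, b))  # -1
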
Representing Logic Gates using Perceptron

● A perceptron can represent all of the primitive Boolean functions AND, OR, NAND, and NOR: the data are linearly separable.
● Some Boolean functions, such as XOR, cannot be represented by a single perceptron: the data are linearly nonseparable.
● The perceptron learning rule finds a successful weight vector when the training examples are linearly separable; it can fail to converge if the examples are not linearly separable.

NOT Logic Gate

Question:
● How do we represent a NOT gate using a perceptron?
● What are the parameters for the NOT perceptron?
● Data is given below.

Perceptron for NOT gate

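The slide's figure is not reproduced here, but one parameter set that works, and the one the learning-algorithm walkthrough later in this deck arrives at, is w1 = -2, w0 = 2 with bipolar inputs x1 ∈ {-1, +1}:

def not_perceptron(x1, w1=-2.0, w0=2.0):
    # NOT gate with bipolar encoding: -1 means false, +1 means true.
    z = w1 * x1 + w0
    return 1 if z > 0 else -1

print(not_perceptron(-1))  # +1: NOT(false) = true
print(not_perceptron(1))   # -1: NOT(true) = false
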
AND Logic Gate

Question:
● How do we represent an AND gate using a perceptron?
● What are the parameters for the AND perceptron?
● Data is given below.

Perceptron for AND gate

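The solution figure is likewise not reproduced; a sketch with one workable parameter set (an assumption, using 0/1 inputs and outputs in {-1, +1}):

def and_perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    # Fires (+1) only when both inputs are 1: 1 + 1 - 1.5 > 0.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1

for p in (0, 1):
    for q in (0, 1):
        print(p, q, and_perceptron(p, q))  # +1 only for (1, 1)
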
Exercise

1. Represent an OR gate using a perceptron. Compute the parameters of the perceptron.
2. Represent a NOR gate using a perceptron. Compute the parameters of the perceptron.
3. Represent a NAND gate using a perceptron. Compute the parameters of the perceptron.

Perceptron for OR gate

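As with AND, one workable (assumed) parameter set on 0/1 inputs: lower the bias so that a single active input is enough to fire.

def or_perceptron(x1, x2, w1=1.0, w2=1.0, b=-0.5):
    # Fires (+1) when at least one input is 1: 1 - 0.5 > 0.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else -1
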
Perceptron Learning Algorithm

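The algorithm's statement on the original slides is not reproduced here. The standard perceptron training rule it refers to is w_i ← w_i + η(t - o)·x_i, applied example by example until nothing changes; a sketch:

def sign(z):
    return 1 if z > 0 else -1

def train_perceptron(data, eta=1.0, epochs=10):
    # data: list of (inputs, target) pairs; a constant input 1 handles w0.
    n = len(data[0][0])
    w = [0.0] * (n + 1)          # weights; the last entry is the bias w0
    for _ in range(epochs):
        converged = True
        for x, t in data:
            xs = list(x) + [1]
            o = sign(sum(wi * xi for wi, xi in zip(w, xs)))
            if o != t:           # update weights only on mistakes
                for i, xi in enumerate(xs):
                    w[i] += eta * (t - o) * xi
                converged = False
        if converged:
            break
    return w

# NOT gate with bipolar encoding; matches the worked trace below.
print(train_perceptron([((-1,), 1), ((1,), -1)]))  # [-2.0, 2.0]
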
Convergence of Perceptron Learning Algorithm
It can be proved that the algorithm will converge if
● the training data is linearly separable, and
● the learning rate is sufficiently small.
○ The role of the learning rate is to moderate the degree to which weights are changed at each step.
○ It is usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number of weight-tuning iterations increases.

Perceptron Learning Algorithm for NOT gate
Initialization: w1 = w0 = 0, η = 1. Update rule: Δw = η(t - h)·x, with a constant input x0 = 1 for the bias w0.

Epoch 1:
| x1 | t  | w1 | w0 | z = w1·x1 + w0 | h  | t = h? | Δw                      | New w            |
| -1 | 1  | 0  | 0  | 0 + 0 = 0      | -1 | No     | Δw1 = 1·(1+1)·(-1) = -2 | w1 = 0 - 2 = -2  |
|    |    |    |    |                |    |        | Δw0 = 1·(1+1)·1 = 2     | w0 = 0 + 2 = 2   |
| 1  | -1 | -2 | 2  | -2 + 2 = 0     | -1 | Yes    | Δw1 = 1·(-1+1)·1 = 0    | w1 = -2 + 0 = -2 |
|    |    |    |    |                |    |        | Δw0 = 1·(-1+1)·1 = 0    | w0 = 2 + 0 = 2   |

Epoch 2:
| x1 | t | w1 | w0 | z = w1·x1 + w0 | h | t = h? | Δw                     | New w            |
| -1 | 1 | -2 | 2  | 2 + 2 = 4      | 1 | Yes    | Δw1 = 1·(1-1)·(-1) = 0 | w1 = -2 + 0 = -2 |
|    |   |    |    |                |   |        | Δw0 = 1·(1-1)·1 = 0    | w0 = 2 + 0 = 2   |

The weights no longer change, so the algorithm has converged with w1 = -2 and w0 = 2.

Demo Python Code

https://colab.research.google.com/drive/1DUVcOoUIWhl8GQKc6AWR1wi0LaMeNkgD?usp=sharing

Students, please note:
The Python notebook is shared with anyone who has the link, and access is restricted to BITS email IDs. So please do not access it from a non-BITS email ID and then send requests for access.

Exercise

1. Represent an OR gate using a perceptron. Compute the parameters of the perceptron using the perceptron learning algorithm.
2. Represent an AND gate using a perceptron. Compute the parameters of the perceptron using the perceptron learning algorithm.

Representational Power of Perceptrons

● A perceptron represents a hyperplane decision surface in the n-dimensional space of examples.
● The perceptron outputs 1 for examples lying on one side of the hyperplane and -1 for examples lying on the other side.

Single Layer Perceptron

Example of Linearly Separable Data

Non-linearly Separable Data

● Two groups of data points are non-linearly separable in a 2-dimensional space if they cannot be separated by a straight line.

Perceptron for XOR Gate

Solution for XOR

MLP for XOR

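The solution figures are not reproduced here. The standard construction (consistent with the slide titles, though the exact weights below are assumptions) composes XOR from linearly separable gates, XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)), giving a two-layer perceptron network:

def unit(x, w, b):
    # A single threshold unit on 0/1 inputs and outputs.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def xor(x1, x2):
    h_or = unit((x1, x2), (1, 1), -0.5)        # hidden unit 1: OR
    h_nand = unit((x1, x2), (-1, -1), 1.5)     # hidden unit 2: NAND
    return unit((h_or, h_nand), (1, 1), -1.5)  # output unit: AND

for p in (0, 1):
    for q in (0, 1):
        print(p, q, xor(p, q))  # 0, 1, 1, 0
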
Ref:
Chapter 4 of Machine Learning by Tom M. Mitchell

Thank You All!
