Deep Learning Networks / Large Language Models
Instructor: Dr. Umara Zahid
MSCS Fall 2022
Deep Learning
• Deep learning is a subfield of machine learning based on a set of
algorithms inspired by the structure and function of the brain.
• These algorithms are usually called Artificial Neural Networks (ANNs).
• Deep learning is one of the hottest fields in data science, with many
case studies showing astonishing results in robotics, image
recognition and Artificial Intelligence (AI)
• It is the foundation of Large Language Models (LLMs)
• (used by OpenAI in GPT)
Which Software Libraries You Need for Deep Learning
• One of the most powerful and easy-to-use Python libraries for
developing and evaluating deep learning models is Keras
• It wraps the efficient numerical computation libraries Theano and
TensorFlow.
• The main advantage is that you can get started with neural
networks in an easy and fun way, as the sketch below shows
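A minimal, hedged sketch of getting started with Keras; the layer sizes and the synthetic data are illustrative assumptions, not part of the lecture:

```python
# A tiny Keras model: define, compile, and fit on synthetic data.
import numpy as np
from tensorflow import keras

# Synthetic data (illustrative): 100 samples, 8 features, binary labels.
X = np.random.rand(100, 8)
y = np.random.randint(0, 2, size=(100,))

model = keras.Sequential([
    keras.layers.Dense(12, activation="relu", input_shape=(8,)),  # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),                  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=10, verbose=0)
```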
What is an Artificial Neural Network?
• A Neural Network is a system designed to operate like a
human brain
• Human information processing takes place through the
interaction of many billions of neurons connected to
each other, each sending signals to other neurons
• Similarly, a Neural Network is a network of artificial
neurons, modeled on those found in human brains, for solving
artificial intelligence problems such as image identification
• They may be physical devices or mathematical
constructs
• In other words, an Artificial Neural Network is a parallel
computational system consisting of many simple
processing elements connected together to perform a particular
task
Motivation (Replicating the Working of the Human Brain)
The human brain contains billions of neurons, each connected to many other neurons to form a network,
so that when it sees an image, it recognizes the image and produces an output.
Steps:
1. Dendrite receives signals from other neurons.
2. Cell body sums the incoming signals to generate input.
3. When the sum reaches a threshold value, neuron fires and the signal travels down the axon to the other
neurons.
4. The amount of signal transmitted depends upon the strength of the connections.
5. Connections can be inhibitory, i.e. decreasing strength, or excitatory, i.e. increasing strength, in nature.
Structure of Artificial Neuron
• An Artificial Neuron is also called a perceptron. It consists of the
following items:
• Input
• Weight
• Bias
• Activation Function
• Output
• A perceptron classifies linearly separable classes
• Often binary classification
How Does a Perceptron Work?
• All the inputs X1, X2, X3, …, Xn are multiplied
by their respective weights
• All the multiplied values are added (a linear
function)
• The sum of the values is applied to the
activation function
• Weights W1, W2, W3, …, Wn show the
strength of the corresponding connections
• Bias allows you to shift the curve
of the activation function (see the sketch below)
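A minimal sketch of a perceptron's forward pass in plain Python with NumPy; the input, weight, and bias values are illustrative assumptions:

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a unit step activation."""
    z = np.dot(x, w) + b          # linear function: sum of Xi * Wi, plus bias
    return 1 if z >= 0 else 0     # the neuron fires (1) or stays silent (0)

# Illustrative values (not from the lecture).
x = np.array([1.0, 0.0, 1.0])     # inputs X1..X3
w = np.array([2.0, 2.0, 6.0])     # weights W1..W3
b = -5.0                          # bias acts as a negative threshold

print(perceptron(x, w, b))        # -> 1, since 2 + 0 + 6 - 5 = 3 >= 0
```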
Input layer, Hidden layer and
Output layer
Input Layer
Input layer contains inputs and weights.
Example: X1, W1, etc.
Hidden Layer
In a neural network, there can be more than
one hidden layer. The hidden layer contains the
summation and activation function.
Output Layer
The output layer consists of the set of results
generated by the previous layer. During training
it also holds the desired values, i.e. target
values that are compared with the values
generated by the previous layer. This comparison
may also be used to improve the end results.
Understanding Layers with
examples
• Suppose you want to go to a food shop. Based on three factors, you will
decide whether to go out or not:
1. The weather is good or not, i.e. X1. Say X1=1 for good weather and X1=0 for bad
weather.
2. You have a vehicle available or not, i.e. X2. Say X2=1 for a vehicle available and X2=0
for no vehicle.
3. You have money or not, i.e. X3. Say X3=1 for having money and X3=0 for not having
money.
• Based on the conditions, you choose a weight for each factor, like W3=6 for
money, as money is the first important thing you must have, W2=2 for
the vehicle and W1=2 for the weather, and say you have set the threshold to 5.
In this way, the perceptron builds a decision-making model by calculating
X1W1 + X2W2 + X3W3 and comparing the sum with the threshold (see the sketch below).
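A minimal sketch of this decision rule in Python, using the weights and threshold from the slide (the function name is illustrative):

```python
# Food-shop decision as a perceptron; weights and threshold are from the slide.
def go_out(weather, vehicle, money, threshold=5):
    w1, w2, w3 = 2, 2, 6          # weights for weather (X1), vehicle (X2), money (X3)
    total = weather * w1 + vehicle * w2 + money * w3
    return total >= threshold     # fire (go out) only if the sum reaches the threshold

print(go_out(weather=1, vehicle=0, money=1))  # 2 + 0 + 6 = 8 >= 5 -> True
print(go_out(weather=1, vehicle=1, money=0))  # 2 + 2 + 0 = 4 <  5 -> False: no money, no outing
```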
Activation/ Transfer Function
• An Activation Function decides whether a neuron should be activated or not.
• That is, it decides whether the neuron's input to the network is important
in the process of prediction, using simple mathematical operations.
• Purpose: an activation function adds non-linearity to the neural network
• Consider a neural network working without activation functions:
• Although the network becomes simpler, learning any complex task is
impossible, and the neural network would be just a linear regression model.
Activation Function
• The activation function determines the kind of problem that will be
solved by the neural network
• The activation function translates the input signals into output signals.
The following transfer functions are commonly used:
1. Unit step/ Binary Step/ Threshold Function
2. Linear Activation Function
3. Non-Linear Activation Functions
1. Sigmoid
2. Hyperbolic Tangent
3. ReLU Function
4. Softmax
5. Swish (by Google Researchers)
Unit Step (Threshold)
• Binary step function depends on a
threshold value that decides whether
a neuron should be activated or not.
• not continuous, no smooth transition
• Good for Binary Classification (Like
SVM)
• Limitations:
• It cannot be used for multi-class
classification problems.
• The gradient (slope) of the step function
is zero, which causes a hindrance in the
backpropagation process.
Linear Activation Function
• The function doesn't do anything to the weighted sum of the input; it
simply outputs the value it was given.
• A Simple Linear Regression Model
• Limitations:
• It’s not possible to use backpropagation as the derivative of the function is a
constant and has no relation to the input x.
• All neural network layers will collapse into one if a linear activation function is
used. No matter the number of layers in the neural network, the last layer will still
be a linear function of the first layer. So, essentially, a linear activation function
turns the neural network into just one layer (demonstrated in the sketch below).
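As a small demonstration of this collapse, the following sketch shows that two stacked linear layers are equivalent to a single linear layer; the matrices are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)            # input vector
W1 = rng.random((3, 4))      # first linear layer
W2 = rng.random((2, 3))      # second linear layer

two_layers = W2 @ (W1 @ x)   # two stacked layers with no activation in between
one_layer = (W2 @ W1) @ x    # a single, equivalent linear layer

print(np.allclose(two_layers, one_layer))  # True: the layers collapse into one
```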
Non-Linear Activation Functions
• Non-linear activation functions solve the following limitations of
linear activation functions:
• They allow backpropagation
• It’s possible to go back and understand which weights in the input neurons
can provide a better prediction
• They allow the stacking of multiple layers of neurons
Sigmoid Function
• One of the most widely used activation functions
• This function takes any real value as input and outputs
values in the range of 0 to 1.
• The larger the input (more positive), the closer the
output value will be to 1, whereas the smaller the
input (more negative), the closer the output will be to
0
• Good for predicting a probability as the output.
• As probability of anything exists only between the range of
0 and 1, sigmoid is the right choice because of its range.
• The function is differentiable and provides a smooth
gradient, i.e., preventing jumps in output values.
• Works somewhat like the posterior probability in Naïve Bayes
Tanh Function (Hyperbolic
Tangent)
• Tanh function is very similar to the
sigmoid/logistic activation function and even has
the same S-shape with the difference in the
output range of -1 to 1.
• In Tanh, the larger the input (more positive), the
closer the output value will be to 1.0, whereas the
smaller the input (more negative), the closer the
output will be to -1.0.
• Advantages:
• It helps in centering the data and makes learning for
the next layer much easier.
• It maps the output values as strongly negative, neutral, or
strongly positive.
ReLU Function
• ReLU stands for Rectified Linear Unit.
• Allows for backpropagation
• ReLU function does not activate all the
neurons at the same time.
• Used in hidden layers
• The neurons will only be deactivated if the
output of the linear transformation is less
than 0.
• Since only a certain number of neurons are
activated, the ReLU function is far more
computationally efficient when compared to
the sigmoid and tanh functions.
Softmax Function
• Multi-class classification
• Used as the function at the last layer of the neural network
• Calculates the probability distribution of an event over 'n' different
classes.
• The main advantage of the function is its ability to handle multiple classes (see the sketch below).
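A minimal NumPy sketch of the non-linear activations discussed above; the formulas are the standard definitions, and the sample inputs are illustrative:

```python
import numpy as np

def step(z):    return np.where(z >= 0, 1, 0)   # unit step / threshold
def sigmoid(z): return 1 / (1 + np.exp(-z))     # outputs in (0, 1)
def tanh(z):    return np.tanh(z)               # outputs in (-1, 1)
def relu(z):    return np.maximum(0, z)         # zero for negative inputs

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()          # probabilities that sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))               # [0.119 0.5   0.953]
print(softmax(z))               # a probability distribution over 3 classes
```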
Can we use different activation functions in a neural network?
• All hidden layers usually use the same activation function.
• However, the output layer will typically use a different activation
function from the hidden layers.
• The choice depends on the goal or type of prediction made by the
model.
Important Points
• A few rules for choosing the activation function for your output layer
based on the type of prediction problem:
• Regression - Linear Activation Function
• Binary Classification—Sigmoid/Logistic Activation Function
• Multiclass Classification—Softmax
• Multilabel Classification—Sigmoid/Logistic
• The activation function used in hidden layers is typically chosen based
on the type of neural network architecture (see the Keras sketch below).
• Convolutional Neural Network (CNN): ReLU activation function.
• Recurrent Neural Network: Tanh and/or Sigmoid activation function.
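A hedged Keras sketch of these rules; the layer sizes and class counts are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Output layers matching the rules above (sizes are illustrative):
regression_out = layers.Dense(1, activation="linear")    # regression
binary_out     = layers.Dense(1, activation="sigmoid")   # binary classification
multiclass_out = layers.Dense(10, activation="softmax")  # multiclass (10 classes)
multilabel_out = layers.Dense(5, activation="sigmoid")   # multilabel (5 labels)

# Hidden layers typically use ReLU, e.g. in a small multiclass model:
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    multiclass_out,
])
```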
Types of Neural Networks
1. Feedforward neural networks
2. Convolutional neural networks (CNNs)
3. Recurrent neural networks (RNNs)
4. Long short-term memory (LSTM) networks
5. Autoencoders
6. Generative adversarial networks (GANs)
7. Reinforcement learning neural networks
Types of Neural Networks
• There are different types of neural networks, but they are generally classified into feed-
forward and feed-back networks:
• A feed-forward network is a non-recurrent network which contains inputs, outputs, and
hidden layers; the signals can only travel in one direction.
• Input data is passed onto a layer of processing elements where it performs calculations.
• Each processing element makes its computation based upon a weighted sum of its inputs.
• The new calculated values then become the new input values that feed the next layer.
• This process continues until it has gone through all the layers and determines the output.
• A threshold transfer function is sometimes used to quantify the output of a neuron in the
output layer.
• Feed-forward networks include Perceptron (linear and non-linear) and Radial Basis
Function networks. Feed-forward networks are often used in data mining.
• Radial Basis Function: classifies data points based on their distance from a center point.
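A minimal NumPy sketch of one feed-forward pass through a two-layer network, with signals traveling in one direction only; the weights and layer sizes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.random(4)                            # input signals

# Signals travel in one direction: input -> hidden -> output.
W_hidden, b_hidden = rng.random((5, 4)), rng.random(5)
W_out, b_out = rng.random((1, 5)), rng.random(1)

hidden = sigmoid(W_hidden @ x + b_hidden)    # weighted sum of inputs, then activation
output = sigmoid(W_out @ hidden + b_out)     # new values feed the next layer
print(output)
```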
Recurrent/Feedback Neural Network
• A feed-back network (e.g., recurrent neural network or RNN) has
feed-back paths meaning they can have signals traveling in both
directions using loops.
• All possible connections between neurons are allowed.
• Since loops are present in this type of network, it becomes a non-
linear dynamic system which changes continuously until it reaches a
state of equilibrium.
• Feed-back networks are often used in associative memories and
optimization problems where the network looks for the best
arrangement of interconnected factors
Types of Neural Networks
• Feedforward neural networks: These are the most basic type of neural
network and are used for simple classification and regression tasks. They
have a series of input nodes that feed into a series of hidden layers and
output nodes.
• Convolutional neural networks (CNNs): These are commonly used for image
classification tasks, but can also be used for other tasks involving spatial
data. They use convolutional layers to extract features from input data.
• Recurrent neural networks (RNNs): These are used for tasks that involve
sequences of data, such as time series data or natural language processing.
They have loops in their architecture that allow them to take into account
previous inputs.
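A minimal sketch of a recurrent step, showing how the loop lets the network take previous inputs into account through a hidden state; the tanh update is the standard simple-RNN form, and the weight values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
W_x = rng.random((3, 2)) - 0.5    # input-to-hidden weights
W_h = rng.random((3, 3)) - 0.5    # hidden-to-hidden weights (the feedback loop)
b = np.zeros(3)

h = np.zeros(3)                   # hidden state: memory of earlier inputs
sequence = [rng.random(2) for _ in range(4)]

for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)   # the same cell is reused at every step

print(h)   # final state depends on the whole sequence, not just the last input
```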
Types of Neural Networks
• Long short-term memory (LSTM) networks: These are a type of RNN
that are designed to handle the vanishing gradient problem that can
occur in standard RNNs. They are commonly used in natural language
processing tasks.
• Autoencoders: These are neural networks that are trained to
reconstruct input data from a compressed representation. They can
be used for data compression, anomaly detection, and image
denoising.
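A hedged Keras sketch of an autoencoder reconstructing input data from a compressed representation; the dimensions are illustrative assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Compress 784-dimensional inputs into a 32-dimensional code, then reconstruct.
autoencoder = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(784,)),  # encoder (compression)
    layers.Dense(784, activation="sigmoid"),                  # decoder (reconstruction)
])
autoencoder.compile(optimizer="adam", loss="mse")
# Training would use the input as its own target: autoencoder.fit(X, X, ...)
```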
Types of Neural Networks
• Generative adversarial networks (GANs): These are a type of neural
network that are used for generating new data that is similar to a
given dataset. They consist of two networks: a generator network
that generates data, and a discriminator network that tries to
distinguish between real and generated data.
• Reinforcement learning neural networks: These are used in
reinforcement learning tasks, where an agent learns to take actions in
an environment to maximize a reward signal. They can be used for
tasks such as game playing or robotics.
What is Tensorflow?
• TensorFlow is an open source machine learning framework for
all developers.
• It is used for implementing machine learning and deep learning
applications.
• TensorFlow is a software library or framework, designed by the
Google team to implement machine learning and deep learning
concepts in a straightforward manner.
• TensorFlow exposes a Python API (its core is implemented in C++),
hence it is considered an easy-to-understand framework.
• It combines the computational algebra of optimization
techniques for easy calculation of many mathematical
expressions.
• Official Website:
• www.tensorflow.org
Important Features of Tensorflow
• It includes a feature that defines, optimizes and calculates
mathematical expressions easily with the help of multi-dimensional
arrays called tensors.
• It includes programming support for deep neural networks and
machine learning techniques.
• It includes a highly scalable feature of computation with various data
sets.
• TensorFlow supports GPU computing with automatic management. It also
includes a unique feature of optimizing memory and the
data used.
Tensorflow Installation Guide
• Please see the installation steps on the following web link:
• TensorFlow - Installation – Tutorialspoint
• I will show a live demo as well at the end of the lecture; a quick verification sketch follows
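A minimal, hedged install-and-verify sketch; `pip install tensorflow` is the standard package name, and the version output will vary by machine:

```python
# Install first from a terminal:
#   pip install tensorflow

# Then verify the installation in Python:
import tensorflow as tf

print(tf.__version__)                      # e.g. 2.x
print(tf.constant([[1, 2], [3, 4]]) + 1)   # a tiny tensor computation
```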
Tensor Data Structure
• Tensors are used as the basic data structures in TensorFlow language
• Tensors represent the connecting edges in any flow diagram called
the Data Flow Graph
• Tensors are defined as multidimensional arrays or lists
• Tensors are identified by the following three parameters:
• Rank
• Shape
• Type
Tensor Parameters
1. Rank:
The unit of dimensionality described within a tensor is called its
rank. It identifies the number of dimensions of the tensor. The rank
of a tensor can be described as the order or n-dimensions of
the tensor defined
2. Shape:
The number of rows and columns together define the shape
of a tensor
3. Type:
• Type describes the data type assigned to the tensor's elements.
• A user needs to consider the following activities for
building a tensor:
• Build an n-dimensional array
• Convert the n-dimensional array into a tensor (see the sketch below)
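A minimal sketch of inspecting these three parameters in TensorFlow; the array values are illustrative:

```python
import numpy as np
import tensorflow as tf

# Build an n-dimensional array, then convert it into a tensor.
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
t = tf.convert_to_tensor(arr)

print(tf.rank(t))   # Rank: 2 (a matrix)
print(t.shape)      # Shape: (2, 3) -> 2 rows, 3 columns
print(t.dtype)      # Type: float64
```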
Tensorflow tutorial links
• TensorFlow Tutorial: Deep Learning for Beginner’s (guru99.com)
• TensorFlow 2 quickstart for beginners | TensorFlow Core
Large Language Model
• A language model consisting of a neural network with many
parameters (typically billions of weights or more)
• Trained on large quantities of unlabeled text
• Use self-supervised learning or semi-supervised learning
• LLMs emerged around 2018
• perform well at a wide variety of tasks (NLP, sentiment analysis,
named entity recognition, mathematical reasoning)
• recognize, summarize, translate, predict and generate text and other content
• built on deep learning models
Tasks performed by LLM
• Building conversational chatbots like ChatGPT.
• Generating text for product descriptions, blog posts and articles.
• Answering frequently asked questions (FAQs) and routing customer
inquiries to the most appropriate human.
• Analyzing customer feedback from email, social media posts and
product reviews.
• Translating business content into different languages.
• Classifying and categorizing large amounts of text data for more
efficient processing and analysis.
Datasets used by LLM
• Commonly used textual datasets include:
• Common Crawl, The Pile
• MassiveText
• Wikipedia
• GitHub
• The datasets run up to 10 trillion words in size
• The stock of high-quality language data is estimated at 4.6-17 trillion words,
which is within an order of magnitude of the largest textual datasets
• I have doubts about these dataset figures (Class Discussion)
Popular Language Models
• Some of the most popular large language models are:
• GPT-3 (Generative Pretrained Transformer 3) – developed by OpenAI.
• BERT (Bidirectional Encoder Representations from Transformers) –
developed by Google.
• RoBERTa (Robustly Optimized BERT Approach) – developed by
Facebook AI.
• T5 (Text-to-Text Transfer Transformer) – developed by Google.
• CTRL (Conditional Transformer Language Model) – developed by
Salesforce Research.
• Megatron-Turing – developed by NVIDIA
Are We Scared of LLMs?
• There are concerns about their impact on job markets, communication, and society
• One major concern about LLMs is their potential to disrupt job markets. Over time, Large
Language Models will be able to replace humans at tasks such as drafting legal documents,
running customer support chatbots, and writing news blogs. This could lead to job losses
for those whose work can be easily automated.
• New jobs will also be created as a result of the increased efficiency and productivity enabled
by LLMs. For example, businesses may be able to create new products or services that were
previously too time-consuming or expensive to develop.
• LLMs could be used to create personalized education or healthcare plans, leading to better
patient and student outcomes. LLMs can be used to help businesses and governments make
better decisions by analyzing large amounts of data and generating insights.
• Bill Gates' prediction regarding the world getting richer?
• Will robotics technology evolve to destroy humankind?
https://fortune.com/2023/05/04/geoffrey-hinton-godfather-ai-tech-will-get-smarter-than-humans-chatgpt/