Module 1: Introduction to Machine Learning
By Sneha Sureddy
Machine Learning, a term coined by Arthur Samuel in 1959, allows a machine to learn from examples and experience without being explicitly programmed.
Machine Learning is the study of algorithms that improve their performance at some task with experience.
Machine Learning and Deep Learning
How To Make A Machine Learn
Supervised Learning
[Figure: an apple and an orange separated by a learned decision function / hypothesis; companion diagrams contrast supervised classification and unsupervised classification, each built around a decision function / hypothesis.]
AlphaGo: machine vs. human (match against Ke Jie)
Learned Model Parameters
[Figure: a training set of pairs (x1, y1), (x2, y2), ..., (xN, yN), where each input xi consists of features xi1 ... xiM, is fed into a mathematical model to produce the learned parameters.]
Logistic Regression: Can be applied
when data is linearly separable.
[Figure: example of linearly separable data.]
Training Set
[Figure: the training set pairs each input x (the data) with an outcome y, giving (x1, y1), (x2, y2), ..., (xN, yN); the learned model is then used to make a PREDICTION for a new, unseen input.]
Linear Predictive Model
zi = (b1 × xi1) + (b2 × xi2) + ⋯ + (bM × xiM) + b0
where b0 is the bias term and b1 ... bM are the weights on the M features.
Example:
z1 = (b1 × 0.5) + (b2 × 0.8) + (b3 × 75) + (b4 × 1.2) + b0
z2 = (b1 × 0.2) + (b2 × 0.95) + (b3 × 83) + (b4 × 1.3) + b0, with label y2 = 0
Sigmoid function
The sigmoid function, also known as the logistic function, is a
mathematical function that maps any real-valued number to a
value between 0 and 1. It is defined as:
sigmoid(z) = 1 / (1 + exp(-z))
The sigmoid function has an S-shaped curve, where:
As z becomes very large and negative, sigmoid(z) approaches 0
As z becomes very large and positive, sigmoid(z) approaches 1
Sigmoid outputs probabilities, making it suitable for binary
classification problems where the target variable is 0 or 1, yes or
no, etc.
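A minimal Python sketch of the sigmoid function as defined above (NumPy is used only for the exponential):

import numpy as np

def sigmoid(z):
    # Map any real-valued z to a value between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))     # 0.5
print(sigmoid(100))   # ~1.0, matching the worked example below
print(sigmoid(-100))  # ~0.0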
The sigmoid of 100 is:
sigmoid(100) = 1 / (1 + e^(-100))
Using the approximate value of e (2.71828), we can calculate the sigmoid:
sigmoid(100) ≈ 1 / (1 + 2.71828^(-100))
sigmoid(100) ≈ 1 / (1 + 3.7e-44)
sigmoid(100) ≈ 1
So the sigmoid of 100 is approximately 1. Note that the sigmoid function approaches 1 as the input value increases and approaches 0 as the input value decreases. In this case, the input value of 100 is large enough that the sigmoid function outputs a value very close to 1.
Convert to a Probability
Example features: Cloud Cover = 0.5, Humidity = 80% (0.8), Temperature = 75, Air Pressure = 1.2.
zi = (b1 × 0.5) + (b2 × 0.8) + (b3 × 75) + (b4 × 1.2) + b0
Passing zi through the sigmoid function gives the prediction p(yi = 1 | xi) = σ(zi), the chance of rain.
The b parameters tell us how important each feature is to the prediction.
[Figure: the S-shaped sigmoid curve mapping zi to a value between 0 and 1.]
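A short sketch of this conversion in Python; the values chosen for b0 ... b4 are hypothetical stand-ins for learned parameters:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Example features: cloud cover, humidity, temperature, air pressure
x = np.array([0.5, 0.8, 75.0, 1.2])

# Hypothetical parameters; real values would come from training
b = np.array([1.5, 2.0, 0.01, -0.5])   # weights b1..b4
b0 = -2.0                               # bias

z = np.dot(b, x) + b0   # linear predictive model
p_rain = sigmoid(z)     # convert the score to a probability
print(f"z = {z:.3f}, chance of rain = {p_rain:.3f}")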
Learned Model Parameters
[Figure: logistic regression drawn as a model (or "network"): each training input xi with features xi1, xi2, ..., xiM is combined as zi = (b1 × xi1) + (b2 × xi2) + ⋯ + (bM × xiM) + b0 and passed through σ(zi); training on (x1, y1) ... (xN, yN) yields the learned parameters b0 ... bM. With K outputs, the network produces zi1 ... ziK.]
[Figure: a grayscale image of a handwritten digit shown as its grid of pixel intensity values, ranging from 0 to 255; each pixel becomes one input feature.]
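A small sketch of how such an image can be turned into a feature vector for a model like logistic regression; the 28×28 size is an assumption (typical of handwritten-digit datasets such as MNIST):

import numpy as np

# Assume a 28x28 grayscale image with pixel intensities 0..255;
# random values stand in here for a real digit image.
image = np.random.randint(0, 256, size=(28, 28))

# Flatten the grid into a single feature vector of length 784
# and scale intensities to the range 0..1.
x = image.reshape(-1) / 255.0

print(x.shape)  # (784,) -> one input feature per pixel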
Single Filter (Shallow Learning)
Only a single filter is used; the filter looks for the average shape.
Layers
1. Input Layer: This layer receives the input features or data. The number of nodes in this layer corresponds to the number of input features.
NEURAL NETWORK REPRESENTATION
Artificial Neural Networks (ANNs) are programs designed to solve problems by trying to mimic the structure and the function of our nervous system.
Neural networks are based on simulated neurons, which are joined together in a variety of ways to form networks (the first example is a 3-4-1 architecture, the second a 3-4-2 architecture).
Neuron
A neuron typically consists of:
1. Inputs: Receive data from other neurons or external inputs.
2. Weights: Assigns importance to each input.
3. Bias: Adds a constant value to the weighted sum.
4. Activation function: Determines the output.
5. Output: Sends the result to other neurons or to the output of the
network.
Activation function
The activation function is used to determine the output of a node given its input. It introduces non-linearity into the node's output.
Common activation functions include:
1. Sigmoid: Maps the input to a value between 0 and 1.
2. ReLU (Rectified Linear Unit): Maps all negative values to 0 and all positive
values to the same value.
3. Tanh (Hyperbolic Tangent): Maps the input to a value between -1 and 1.
4. Softmax: Used for multi-class classification problems, it maps the input to a
probability distribution over all classes.
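A minimal NumPy sketch of these four activation functions, following the descriptions above:

import numpy as np

def sigmoid(z):
    # Maps the input to a value between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Maps negative values to 0, keeps positive values unchanged
    return np.maximum(0, z)

def tanh(z):
    # Maps the input to a value between -1 and 1
    return np.tanh(z)

def softmax(z):
    # Maps a vector of scores to a probability distribution over classes
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), tanh(z), softmax(z), sep="\n")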
Neural networks & Deep learning
Neural networks are a type of machine learning model inspired by the
structure and function of the human brain.
They consist of layers of interconnected nodes (neurons) that process and
transmit information.
Deep learning is a subfield of machine learning that focuses on neural
networks with multiple layers, typically more than three.
Deep learning models are designed to learn complex patterns
and representations in data, such as images, speech, and text.
1. Forward pass: Input data flows through the network, layer by layer, to
produce an output.
2. Error calculation: The difference between the predicted output and the
actual output (target) is calculated, resulting in an error or loss.
3. Backward pass: The error is propagated backwards through the network,
layer by layer, to calculate the gradients of the loss function with respect to
each parameter.
4. Weight update: The gradients are used to update the model parameters
(weights and biases) to minimize the loss function.
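A compact NumPy sketch of these four steps for a single-layer logistic-regression "network" trained with gradient descent; the toy data and learning rate are made up for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 features, binary labels (illustrative only)
X = np.array([[0.1, 0.9], [0.8, 0.2], [0.9, 0.1], [0.2, 0.8]])
y = np.array([1.0, 0.0, 0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # weights
b = 0.0                  # bias
lr = 0.5                 # learning rate

for epoch in range(100):
    # 1. Forward pass
    p = sigmoid(X @ w + b)
    # 2. Error calculation (cross-entropy loss)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # 3. Backward pass: gradients of the loss w.r.t. each parameter
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # 4. Weight update
    w -= lr * grad_w
    b -= lr * grad_b

print(f"final loss: {loss:.4f}")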
Overfitting
The model performs well during training but does not perform well during
testing.
Causes of overfitting:
1. Model complexity: Using a model with too many parameters or layers.
2. Too much training: Training the model for too many epochs or iterations.
3. Small training dataset: Using a dataset that is too small to capture the
underlying patterns.
4. Noise in the data: Presence of noise or outliers in the training data.
To overcome overfitting, techniques like validation, early stopping, and regularization can be used.
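A minimal sketch of early stopping; train_one_epoch and evaluate are hypothetical placeholder helpers, not a real library API:

def early_stopping_train(model, train_data, val_data,
                         max_epochs=100, patience=5):
    # train_one_epoch(model, data) and evaluate(model, data) are assumed
    # to exist; evaluate returns the loss on the given dataset.
    best_val_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)    # fit on the training set
        val_loss = evaluate(model, val_data)  # check the validation set

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0    # still improving, keep going
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Stopping early at epoch {epoch}")
                break
    return model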
Validation methods
Validation methods in deep learning are essential techniques used to evaluate the
performance of a model during training and prevent overfitting.
Validation, in the context of machine learning and deep learning, refers to the
process of evaluating a model's performance on a separate dataset, called the
validation set, during training.
This dataset is not used to fit the model's parameters; rather, it is used for assessing the model's performance and making adjustments as needed.
Validation Process:
Split data: Divide the available data into training, validation, and testing sets
(e.g., 70% for training, 10% for validation, and 20% for testing).
Train model: Train the model on the training set.
Evaluate on validation set: Evaluate the model's performance on the validation
set during training.
Adjust hyperparameters: Based on validation performance, adjust hyperparameters or training parameters.
Repeat: Repeat steps until validation performance improves.
Final evaluation: Evaluate the final model on the testing set to estimate its
performance on new, unseen data.
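A sketch of the 70/10/20 split described above using scikit-learn's train_test_split applied twice; the dataset here is randomly generated just to show the resulting sizes:

import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative dataset: 1000 samples with 10 features each
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# First split off the 20% test set, then carve 10% of the total
# (1/8 of the remaining 80%) out as the validation set.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.125, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 100 200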
Split Data into Separate Groups
[Figure: all available data pairs (x1, y1) ... (xN, yN) are randomly assigned into training, validation, and testing groups; the training set is used to fit the model, the validation set to refine it, and the testing set for the final evaluation.]
[Figure: a plot of f(b) against b; each update moves b in the direction of the negative slope, downhill toward the minimum.]
Gradient Descent Optimizer
Visualize the entire hypothesis space of possible weight vectors and their associated E values.
W0, W1: weights of a linear unit
E: error for a fixed set of training examples
Gradient descent search determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps.
At each step, the weight vector is altered in the direction that produces the steepest descent along the error surface.
Derivation of the Gradient Descent Rule
How can we calculate the direction of steepest descent along the error surface?
This direction can be found by computing the derivative of E with respect to each component of the weight vector.
The learning rate controls the step size: if the step size is too big, we may miss the global minimum; if it is too small, it takes a lot of time to converge.
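Stated as a reference (standard notation; the slides' own symbols may differ), the direction of steepest descent is given by the negative gradient, and each weight is updated against its partial derivative:

∇E(w) = [ ∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wM ]
wi ← wi − η · ∂E/∂wi

where η is the learning rate.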
Gradient Descent (GD):
1. Batch processing: GD uses the entire training dataset to compute the gradient of
the loss function.
2. GD calculates the exact gradient of the loss function.
3. GD updates the model parameters after processing the entire dataset.
Comparison per weight update:
GD: more computation time per weight update; used with a larger step size per weight update.
SGD: less computation time per weight update; used with a smaller step size per weight update.
Summary
SGD has several advantages over GD:
1. SGD converges faster than GD, especially for large datasets.
2. SGD requires less memory since it only processes a single sample or small
batch at a time.
GD is more accurate but slower, while SGD is faster but less accurate. The
choice between GD and SGD depends on the specific problem, dataset size, and
computational resources.
In practice, Mini-Batch Gradient Descent is often used. It computes the gradient from a small batch of samples.
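A sketch contrasting a full-batch gradient step with mini-batch updates, reusing the logistic-regression gradient; the data, batch size, and learning rate are made up:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(X, y, w, b):
    # Gradient of the cross-entropy loss for logistic regression
    p = sigmoid(X @ w + b)
    return X.T @ (p - y) / len(y), np.mean(p - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))              # toy dataset
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b, lr = np.zeros(5), 0.0, 0.1

# Batch GD: one update per pass over the entire dataset
grad_w, grad_b = gradient(X, y, w, b)
w, b = w - lr * grad_w, b - lr * grad_b

# Mini-batch gradient descent: many cheaper updates per pass
batch_size = 32
for start in range(0, len(X), batch_size):
    Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
    grad_w, grad_b = gradient(Xb, yb, w, b)
    w, b = w - lr * grad_w, b - lr * grad_b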
Evaluating neural networks
Evaluating neural networks involves assessing their performance on a given task.
Here are some ways to evaluate neural networks:
1. Accuracy: Measure the proportion of correct predictions.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Where: TP = True Positives (correctly predicted instances)
TN = True Negatives (correctly predicted non-instances)
FP = False Positives (incorrectly predicted instances)
FN = False Negatives (incorrectly predicted non-instances)
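For example, computing accuracy directly from the four counts (the counts here are made up):

# Hypothetical confusion-matrix counts for a binary classifier
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"Accuracy: {accuracy:.2f}")  # 85 / 100 = 0.85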
By implementing early stopping, you can train more efficient and effective
neural networks.
Linear Regression and Logistic Regression are both supervised learning
algorithms used for prediction, but they differ in their approach and application:
Linear Regression
1. Continuous output: Predicts a continuous value (e.g., price, temperature).
2. Linear relationship: Assumes a linear relationship between inputs and output.
3. Mean squared error: Optimizes for mean squared error between predicted and
actual values.
4. Regression: Used for regression tasks, like predicting a continuous value.
No activation function is used here; the model captures linear relationships.
Logistic Regression
1. Binary output: Predicts a binary value (e.g., 0/1, yes/no).
2. Non-linear relationship: Uses a sigmoid function to model a non-linear
relationship between inputs and output.
3. Cross-entropy loss: Optimizes for cross-entropy loss between predicted
probabilities and actual labels.
4. Classification: Used for classification tasks, like predicting a binary label.
Here an activation function (sigmoid) is used to convert the linear regression equation into the logistic regression equation, i.e., the model captures non-linear relationships.
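A brief scikit-learn sketch of the two models side by side; the data is randomly generated for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Linear Regression: continuous target (e.g., a price or temperature)
y_cont = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
lin = LinearRegression().fit(X, y_cont)
print(lin.predict(X[:2]))          # continuous values

# Logistic Regression: binary target (0/1)
y_bin = (X[:, 0] + X[:, 2] > 0).astype(int)
log = LogisticRegression().fit(X, y_bin)
print(log.predict(X[:2]))          # class labels 0 or 1
print(log.predict_proba(X[:2]))    # class probabilities from the sigmoid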
Underfitting and overfitting
Underfitting and overfitting are two common problems in machine learning:
Underfitting
Occurs when a model is too simple or has too few parameters to capture the underlying patterns in the data.
The model fails to learn from the training data and performs poorly on both training and test data.
Symptoms: high bias, low variance, poor performance on training data, poor performance on test data.
Solutions: increase model complexity, add more features and samples, use a different algorithm.
Overfitting
Occurs when a model is too complex or has too many parameters, fitting the noise in the training data rather than the underlying patterns.
The model performs well on training data but poorly on test data.
Symptoms: low bias, high variance, good performance on training data, poor performance on test data.
Solutions: regularization, early stopping, data augmentation, cross-validation.
To avoid underfitting and overfitting, aim for a balance between model complexity
and data complexity. Use techniques like cross-validation, regularization, and early
stopping to find the optimal balance.
Bias-Variance