6 CNN

This document provides an overview of convolutional neural networks (CNNs). It discusses key concepts like convolution, sparse interaction, parameter sharing, and equivariance that make CNNs effective for processing grid-like data like images and time-series. The document also covers data preprocessing techniques for neural networks like vectorization and value normalization. It describes overfitting and underfitting issues and different regularization techniques to address overfitting like reducing the network size.

Uploaded by

SWAMYA RANJAN DAS


Neural Network and Deep Learning
Samatrix Consulting Pvt Ltd
Convolutional Neural Network
Convolutional Neural Network
• In this chapter, we will learn about a deep learning technique called
convolution.
• Convolution has become the standard method of classifying,
manipulating, and generating images.
• Modern deep learning libraries make convolution easy to implement.
Convolutional Neural Network
• In this chapter, we will focus on the key ideas behind convolution and
the related techniques that can be used to make convolution work on
images.
• Convolutional neural networks have been used to recognize the
people in a photograph, detect and classify different types of skin
cancers, repair image damage such as dust, scratches, and blur, and
classify people’s age and gender from their photos.
• Convolutional neural networks are also used in natural language
processing.
Convolutional Neural Network
• Convolutional neural networks are specialized for processing data
that has a grid-like topology.
• Examples include time-series data, which is a 1-D grid taking samples
at regular time intervals, and image data, which is a 2-D grid of pixels.
• The network uses a mathematical operation called convolution, hence
the name “convolutional neural network”.
What is Convolution
In the following equation, the convolution operation is denoted by an
asterisk:

𝑠(𝑡) = (𝑥 ∗ 𝑤)(𝑡)

The first argument, the function 𝑥, is often referred to as the input. The
second argument, the function 𝑤, is referred to as the kernel. The output 𝑠 is
referred to as the feature map.

Figure 6.1 illustrates an example of a simple convolution applied to a 2-D
tensor.
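For sampled data, the convolution above becomes a sum over discrete positions. The following is a minimal sketch in plain Python, assuming the standard discrete definition s(t) = Σ x(a) w(t - a) summed over a, and a "valid" output that keeps only positions where the kernel fully overlaps the input. The function name conv1d_valid and the sample values are illustrative and do not come from the slides.

```python
def conv1d_valid(x, w):
    """Discrete 1-D convolution of input x with kernel w ('valid' positions only)."""
    k = len(w)
    flipped = w[::-1]  # convolution flips the kernel before sliding it over the input
    return [
        sum(x[t + i] * flipped[i] for i in range(k))
        for t in range(len(x) - k + 1)
    ]


signal = [1, 2, 3, 4, 5]   # input x (hypothetical samples)
kernel = [1, 0, -1]        # kernel w (hypothetical)
feature_map = conv1d_valid(signal, kernel)
print(feature_map)         # [2, 2, 2]
```

Many deep learning libraries slide the kernel without flipping it (cross-correlation). Because the kernel values are learned, this usually makes no practical difference.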
What is Convolution
Motivation
• In order to improve a machine learning system, convolution uses
three important ideas: sparse interaction, parameter sharing, and
equivariant representations.
• Convolution also provides a means of working with inputs of variable
size.
Sparse Interaction
• In traditional neural networks, every output unit interacts with every input
unit.
• Convolutional networks, in contrast, have sparse interactions (also known
as sparse connectivity or sparse weights).
• We can accomplish this by making the kernel smaller than the input.
• For example, if the input image has thousands or millions of pixels, we can
use a kernel that has only tens or hundreds of pixels to detect small and
meaningful features such as edges.
• Therefore, we need to store fewer parameters.
• This reduces the memory requirements of the model and improves
its statistical efficiency. We also need fewer operations to compute the output.
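The savings described above can be made concrete with a rough count. This is only a sketch: it ignores bias terms, assumes a 1-D layout with as many outputs as inputs, and relies on the kernel being shared across all positions (the parameter sharing idea listed under Motivation). The sizes and function names are hypothetical.

```python
def dense_weight_count(n_inputs, n_outputs):
    # Fully connected: one weight for every (input, output) pair.
    return n_inputs * n_outputs


def conv_weight_count(kernel_width):
    # Convolution: a single small kernel is reused at every position.
    return kernel_width


def conv_multiply_count(kernel_width, n_outputs):
    # Each output unit only looks at kernel_width inputs.
    return kernel_width * n_outputs


n_pixels = 1_000_000   # hypothetical input size
kernel_width = 100     # hypothetical kernel size

print(dense_weight_count(n_pixels, n_pixels))        # 1000000000000
print(conv_weight_count(kernel_width))               # 100
print(conv_multiply_count(kernel_width, n_pixels))   # 100000000
```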
Sparse Interaction
• The graphical representation of sparse interaction is illustrated in
figure 6.2, in which we have highlighted one input unit, 𝑥3, and the output
units in 𝑠 that are affected by this unit.
• When 𝑠 is formed by convolution with a kernel of width 3, only three
outputs are affected by 𝑥3.
• For a fully connected network, connectivity is no longer sparse and all
the outputs are affected by 𝑥3.
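As a small illustration of the point above, the sketch below lists which output units depend on a given input unit. It assumes an odd kernel width, one output per input position, and zero-based indices (so 𝑥3 is index 2); the helper name is hypothetical.

```python
def affected_outputs(input_index, n_units, kernel_width):
    """Indices of output units whose kernel window covers the given input unit."""
    half = kernel_width // 2
    return [j for j in range(n_units) if abs(j - input_index) <= half]


n_units = 5
x3 = 2  # zero-based index of the third input unit

print(affected_outputs(x3, n_units, kernel_width=3))  # [1, 2, 3]
print(list(range(n_units)))  # fully connected layer: every output is affected
```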
Sparse Interaction
Equivariance
• Equivariance means that if the input changes, the output
changes in the same way.
• The convolution creates a 2-D map of where certain features appear
in the input.
• If we move the object in the input, its representation will also move
the same amount in the output.
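The shift behaviour described above can be checked on a tiny 1-D example. This is a sketch with made-up values: the kernel is slid over the input without flipping, and only positions where it fully overlaps the input are kept, so the comparison ignores the border element.

```python
def correlate_valid(x, w):
    """Slide kernel w over x (no flip) and keep fully overlapping positions."""
    k = len(w)
    return [sum(x[t + i] * w[i] for i in range(k)) for t in range(len(x) - k + 1)]


x = [0, 0, 1, 2, 1, 0, 0, 0]      # a small "bump" in the input (hypothetical)
x_shifted = [0] + x[:-1]          # the same bump moved one step to the right
w = [1, 0, -1]                    # hypothetical kernel

y = correlate_valid(x, w)
y_shifted = correlate_valid(x_shifted, w)

print(y)          # [-1, -2, 0, 2, 1, 0]
print(y_shifted)  # [0, -1, -2, 0, 2, 1]
print(y_shifted[1:] == y[:-1])  # True: the feature map moved by the same one step
```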
The Convolution Operation
• One of the major differences between a densely connected layer and
a convolution layer is as follows: the dense layers learn global
patterns in their input feature space, while the convolution layer
learns the local patterns.
• As shown in figure 6.3, the patterns are found in small 2-D windows
of the inputs.
• These windows could be, for example, 3 x 3. The image can be broken into local
patterns such as edges, textures, and so on.
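To make the idea of local 3 x 3 windows concrete, here is a minimal sketch that slides a hand-written vertical-edge kernel over a tiny image. The image, the kernel values, and the function name are assumptions for illustration; in a real convolution layer the kernel values are learned.

```python
def correlate2d_valid(image, kernel):
    """Apply the kernel to every window of the image that it fully covers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4 x 5 image: dark on the left, bright on the right (a vertical edge).
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# 3 x 3 kernel that responds to left-to-right increases in brightness.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = correlate2d_valid(image, kernel)
print(response)  # [[0, 3, 3], [0, 3, 3]]
```

The response is zero over the flat region and non-zero in the windows that contain the edge.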
Data Preprocessing for Neural Networks
Vectorization
• The input and labels of the neural network should be tensors of
floating-point data.
• In some cases, they can be tensors of integers.
• Every data type, such as sound, images, and text, must first be turned
into tensors, a step we call “data vectorization”.
Vectorization
• In our MNIST classification example, the image data was encoded as
integers in the 0-255 range.
• Before feeding the data into the network, we had to cast it to float32
and divide it by 255.
• Thus, we got the floating-point values in the 0-1 range.
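The scaling step above can be sketched in plain Python as follows. The tiny image is made up, and nested lists stand in for a tensor.

```python
def scale_pixels(image):
    """Convert integer pixel values in 0-255 to floats in the 0-1 range."""
    return [[pixel / 255.0 for pixel in row] for row in image]


image = [
    [0, 128, 255],
    [64, 32, 16],
]  # hypothetical 2 x 3 grayscale image with integer pixels

scaled = scale_pixels(image)
print(scaled[0][0], scaled[0][2])  # 0.0 1.0
```

With NumPy arrays, the same step is typically written as a cast with astype('float32') followed by a division by 255.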
Value Normalization
• Generally, we should not feed the network relatively large values, such as
multi-digit integers, that are much larger than the initial values of the
weights.
• We should also avoid heterogeneous data, for example, a dataset in
which one feature is in the 0-1 range whereas another feature is in
the 100-200 range.
• Such data can trigger large gradient updates, which will prevent
the network from converging.
• Therefore, the data should:
• Take “small” values. Most values should be in the 0-1 range.
• Be homogeneous. All the features should take values in roughly the same range.
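One simple way to meet both requirements is to rescale each feature independently to the 0-1 range. The sketch below uses min-max scaling, which is one possible choice rather than the method prescribed by the slides; the sample rows and the function name are hypothetical.

```python
def min_max_scale(rows):
    """Rescale every feature (column) independently to the 0-1 range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Feature 0 lies in the 0-1 range, feature 1 in the 100-200 range.
rows = [
    [0.0, 100.0],
    [0.5, 150.0],
    [1.0, 200.0],
]

print(min_max_scale(rows))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```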
Feature Engineering
• Feature engineering is the process of using your own knowledge about the
data and the machine learning algorithm at hand to apply hard-coded
(non-learned) transformations to the data before it is fed into the model.
• We use feature engineering in cases where we do not expect the machine
learning model to learn everything from the raw data.
• So, to make the job of the model easier, the data is transformed before it is
presented to the model.
• The essence of feature engineering is to make the problem easier by
expressing it in a simpler way. However, it needs an in-depth understanding
of the problem.
Feature Engineering
• Earlier some of the machine learning models could not learn useful
features from the data by themselves.
• Therefore, feature engineering was critical for the success of the
project.
• For example, before convolutional neural networks, solving the MNIST
digit classification problem required hard-coded features such as
the number of loops in the digit image, the height of each digit in
the image, a histogram of pixel values, and so on.
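As an illustration of one such hand-crafted feature, the sketch below computes a coarse histogram of pixel values. The number of bins, the sample image, and the function name are assumptions for the example.

```python
def pixel_histogram(image, bins=4, max_value=255):
    """Count how many pixels fall into each of `bins` equal-width intensity bands."""
    counts = [0] * bins
    band_width = (max_value + 1) / bins
    for row in image:
        for pixel in row:
            counts[int(pixel // band_width)] += 1
    return counts


image = [
    [0, 10, 200],
    [255, 128, 64],
]  # hypothetical 2 x 3 grayscale image

print(pixel_histogram(image))  # [2, 1, 1, 2]
```

The resulting short vector, rather than the raw pixels, would then be fed to the model.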
Feature Engineering
• Modern deep learning algorithms do not require most feature
engineering.
• The neural networks can extract useful features from raw data.
• Still, you may need feature engineering in certain cases for the
following reasons:
• Good features let you solve problems more elegantly while using fewer
resources.
• Good features let you solve a problem with far less data.
Deep learning models need lots of training data to learn features on their
own, so if we have few samples, feature engineering may be required.
Overfitting and Underfitting
• We have seen in the MNIST classification problem that the
performance of the model on the held-out validation data peaks after
a few epochs and then starts degrading.
• In other words, the model quickly starts to overfit the training data.
• Overfitting occurs in every machine learning
problem.
• In order to master machine learning, we need to learn how to deal
with overfitting.
Overfitting and Underfitting
• The fundamental issue in machine learning is to balance between
optimization and generalization.
• Optimization refers to the process in which we adjust the model to
get the best performance on the training data.
• Generalization refers to the performance of the model on the data it
has never seen before.
• The goal is to get good generalization but you do not control
generalization.
• You can only adjust the model based on training data.
Overfitting and Underfitting
• At the beginning of training, optimization and generalization are
correlated.
• The lower the loss on the training data, the lower the loss on the test data.
• While this is happening, we consider our model to be under-fit.
• This means that the model has not yet learned all the relevant patterns in the
training data set.
• After a certain number of iterations on the training data, generalization
stops improving, and the validation metrics stall and then start
degrading.
• This means that the model has started to over-fit.
• It has started to learn patterns that are specific to the training data.
• Such patterns are not relevant to the new data.
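In practice, the turning point is usually read off the validation loss curve. The sketch below finds the epoch with the lowest validation loss; the loss values are made up for illustration and are not results from the slides.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch at which the validation loss is lowest."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Hypothetical validation losses, one per epoch.
val_losses = [0.60, 0.45, 0.38, 0.35, 0.37, 0.42, 0.50]

print(best_epoch(val_losses))  # 4: under-fit before this epoch, over-fit after it
```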
Overfitting and Underfitting
• The best solution to address the overfitting problem is to get more
training data.
• A model that is trained on more data generalizes better.
• When it is not possible to get more data, the next best solution is
to limit the quantity of information that the model can store.
• We can also constrain what kind of information it is allowed to store.
• If the model can only afford to memorize a small number of patterns, it
will be forced to focus on the most prominent patterns.
• Therefore, it will have a better chance of generalizing well.
Overfitting and Underfitting
• The process of fighting overfitting in this way is known as
regularization.
• In the next section, we will review some of the most common
regularization techniques.
Addressing Overfitting
Reduce Network’s size
• The simplest way to address the overfitting is to reduce the size of the
model.
• This means that we reduce the number of learnable parameters, which are
decided by the number of layers and the number of units in each layer.
• A model with more parameters will have more memorization capacity.
• At the same time, we need to ensure that the model has enough
parameters; otherwise it will not have enough memorization
capacity and will underfit.
• We need to optimize between “too much capacity” and “not enough
capacity”.
Reduce Network’s size
• There is no magical formula that helps you decide the right number
of layers and the right number of units in each layer.
• Therefore, to find the right model for your data, you need to evaluate
an array of different architectures.
• You can start with a small number of layers and parameters and then
add new layers until you see diminishing returns with respect to the
validation loss.
Reduce Network’s size
Suppose our original model is as follows.

import tensorflow as tf

# Reference network: two hidden layers of 16 units and one sigmoid output
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Reduce Network’s size
We can replace it with a smaller network.

model = tf.keras.models.Sequential([
tf.keras.layers.Dense(4, activation='relu', input_shape=(1000,)),
tf.keras.layers.Dense(4, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
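To see how much capacity this removes, we can count the learnable parameters of both networks by hand. The sketch assumes each Dense layer has one weight per input-unit pair plus one bias per unit; the helper name is hypothetical.

```python
def dense_param_count(input_dim, layer_units):
    """Total weights and biases of a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += input_dim * units + units  # weights + biases of this layer
        input_dim = units                   # this layer feeds the next one
    return total


original = dense_param_count(1000, [16, 16, 1])
smaller = dense_param_count(1000, [4, 4, 1])

print(original)  # 16305
print(smaller)   # 4029
```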
Reduce Network’s size
• The comparison of the validation loss for the original network and a smaller
network is as follows.
• We can see that the smaller network starts overfitting later than the reference
one (after 6 epochs rather than 4).
• The performance of the smaller network degrades much more slowly after it
starts overfitting.
Reduce Network’s size
We can try a different network that has a bigger capacity.

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(512, activation='relu'),
    # Single sigmoid output unit, as in the original and smaller networks
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Reduce Network’s size
• The comparison of the validation loss for a bigger network with the original
network is as follows
• The bigger network starts overfitting after the first epoch. It overfits much more
severely.
Reduce Network’s size
• We can also compare the training losses of both networks.
• The training loss of the bigger network approaches zero very quickly.
• The more capacity the network has, the quicker the model will be able to model
the training data (which results in low training loss), but it is more vulnerable to
overfitting (resulting in a large difference between training and validation loss).
Adding Weight Regularization
• Given some training data and network architecture, simpler models are less likely
to overfit than complex ones.
• We can force the weights in the network to take only small values.
• This makes the distribution of weight values more regular.
• This is called “weight regularization”, which is done by adding a cost that is
associated with having large weights.
Adding Weight Regularization
• The cost comes in two flavors:
• L1 regularization: The added cost is proportional to the absolute value of the weight
coefficients (the “L1 norm” of the weights).
• L2 regularization: The added cost is proportional to the square of the value of the weight
coefficients (the “L2 norm” of the weights).
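The two penalties can be computed by hand for a small set of weights. This is a sketch under the assumption that the penalty is the regularization factor times the sum of absolute values (L1) or the sum of squares (L2); the weights and function names are made up.

```python
def l1_penalty(weights, factor):
    """Added cost proportional to the absolute values of the weights."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """Added cost proportional to the squared values of the weights."""
    return factor * sum(w * w for w in weights)


weights = [0.5, -1.0, 2.0]   # hypothetical weight coefficients
factor = 0.001               # hypothetical regularization factor

print(l1_penalty(weights, factor))  # about 0.0035
print(l2_penalty(weights, factor))  # about 0.00525
```

Note how the largest weight dominates the L2 penalty, since its contribution grows with the square of its value.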
Adding Weight Regularization
We can add the L2 weight regularization as follows

# Same architecture as the reference network, with an L2 penalty on the hidden-layer weights
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Adding Weight Regularization
• The impact of L2 regularization is as follows.
• We can see that even though both the models have the same number
of parameters, the model with L2 regularization has become more
resistant to overfitting than the reference model.
Adding Weight Regularization
As an alternative to L2 regularization, you can use one of the following
Keras weight regularizations.

# L1 regularization
tf.keras.regularizers.l1(0.001)

# L1 and L2 regularization at the same time
tf.keras.regularizers.l1_l2(l1=0.001, l2=0.001)
Dropout
• Dropout is a popular regularization method.
• We apply dropout in a deep network in the form of a dropout layer.
• The dropout layer is also called an accessory layer or a supplemental
layer because it does not do any computation on its own.
• We call it a layer because it allows us to include dropout in the
drawing of the network.
• However, it is not considered a real layer (hidden or otherwise).
• This layer is not counted when we describe the number of layers in a
particular network.
Dropout
• The dropout layer temporarily disconnects some of the neurons on
the previous layer.
• For example, a given layer would have returned a vector [0.2, 0.5, 1.1,
0.7, 1.3] for a given input sample during training.
• After we apply the dropout, the vector will have a few zero entries
distributed at random, e.g. [0, 0.5, 1.1, 0, 1.3].
• We provide a “dropout rate”, which is the fraction of the features that
are zeroed out; it is usually set between 0.2 and 0.5. At test
time, no units are dropped out.
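The behaviour described above can be sketched in a few lines of plain Python. One assumption goes beyond the slides: during training, the surviving values are scaled up by 1 / (1 - rate) so that the expected sum stays the same. This is a common variant (often called inverted dropout) and, to my understanding, matches what the Keras Dropout layer does. The function name and the seed are illustrative.

```python
import random


def dropout(values, rate, training, rng):
    """Zero out roughly a fraction `rate` of the values during training only."""
    if not training or rate == 0.0:
        return list(values)            # at test time nothing is dropped
    scale = 1.0 / (1.0 - rate)         # assumption: rescale the surviving values
    return [0.0 if rng.random() < rate else v * scale for v in values]


activations = [0.2, 0.5, 1.1, 0.7, 1.3]
rng = random.Random(0)                 # fixed seed so the run is repeatable

print(dropout(activations, rate=0.5, training=True, rng=rng))
print(dropout(activations, rate=0.5, training=False, rng=rng))  # unchanged
```

Each call during training draws a new random mask, which corresponds to choosing a new set of dropped units for every batch.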
Dropout
• The dropped-out units do not
participate in any forward calculations.
• They are also not included in backprop.
• The optimizer does not update their
weights.
• After the batch is completed and the
rest of the weights have been updated,
the dropped-out neuron connections are
restored.
• At the start of the next batch, the layer
again chooses a new random set of
neurons and temporarily removes them.
Dropout
In Keras, you can introduce dropout in a network via the Dropout layer, which gets applied to the
output of the layer right before it.

tf.keras.layers.Dropout(rate=0.2)

Let’s add two Dropout layers in our model and see how well they can reduce overfitting

model = tf.keras.models.Sequential([
tf.keras.layers.Dense(16,activation='relu', input_shape=(1000,)),
tf.keras.layers.Dropout(rate=0.2),
tf.keras.layers.Dense(16, activation='relu'),
tf.keras.layers.Dropout(rate=0.2),
tf.keras.layers.Dense(1, activation='sigmoid')
])
Dropout
• Let’s plot the results.
• We can see a clear improvement over the reference network.
Summary
• To summarize, the most common ways to prevent overfitting in neural
networks are:
• Getting more training data.
• Reducing the capacity of the network.
• Adding weight regularization.
• Adding dropout.
Universal Workflow – Deep Learning
1. Define the problem at hand and assemble a dataset
a. What is the input data? What do you want to predict?
b. What type of problem are you facing: binary classification, multi-class
classification, regression, or something else?
2. Pick a measure of success
a. How do you define success – accuracy? Precision-Recall? Customer retention
rate?
b. The metric of success will help you decide the loss function
Universal Workflow – Deep Learning
3. Prepare your data - The data should be formatted before it can be fed
into the machine learning model
a. The data should be formatted as tensors
b. The value taken by the tensors should be a small value. For example, in the [-1, 1]
range or [0, 1] range
c. For heterogeneous data, the data should be normalized
d. If the dataset is small, you may consider feature engineering
4. Develop the model – three key choices that you need to make
a. Choice of the last-layer activation
b. Choice of the loss function. This should match the problem that you are trying to
solve
c. Choice of an optimizer. What would be the optimizer? What would be the learning
rate? Can we go with the default optimizer for Keras, rmsprop, and its default learning
rate?
Universal Workflow – Deep Learning
You can also pick a last layer activation and a loss function from the
following table
Problem type                               Last-layer activation   Loss function
Binary classification                      sigmoid                 binary_crossentropy
Multi-class, single-label classification   softmax                 categorical_crossentropy
Multi-class, multi-label classification    sigmoid                 binary_crossentropy
Regression to arbitrary value              None                    mse
Regression to value between 0 and 1        sigmoid                 binary_crossentropy or mse
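The table can also be kept as a small lookup in code. The sketch below is only a convenience wrapper around the rows above; the dictionary keys and the function name are hypothetical, and for the last row it picks mse out of the two listed options.

```python
# (last-layer activation, loss function) for each problem type in the table.
LAST_LAYER_CHOICES = {
    "binary_classification": ("sigmoid", "binary_crossentropy"),
    "multiclass_single_label": ("softmax", "categorical_crossentropy"),
    "multiclass_multi_label": ("sigmoid", "binary_crossentropy"),
    "regression_arbitrary": (None, "mse"),
    "regression_0_to_1": ("sigmoid", "mse"),  # binary_crossentropy is also listed
}


def pick_last_layer(problem_type):
    """Return the (activation, loss) pair suggested for a problem type."""
    return LAST_LAYER_CHOICES[problem_type]


activation, loss = pick_last_layer("multiclass_single_label")
print(activation, loss)  # softmax categorical_crossentropy
```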
Universal Workflow – Deep Learning
5. Scale-up: develop a model that overfits – To figure out how big a model
you will need, you must develop a model that overfits using the following
methods. You should always monitor training loss and validation loss
a. Add layers
b. Make your layers bigger
c. Train for more epochs
6. Regularize Model and Tune Hyperparameters
a. Add dropout
b. Try different architectures by adding or removing layers
c. Add L1/L2 regularization
d. Try different hyperparameters (number of units per layer, the learning rate of the
optimizer, etc.)
e. Optionally, iterate on feature engineering
Thanks
Samatrix Consulting Pvt Ltd
