Lecture 07-08
Arpit Rana
16th / 17th January 2025
Deep Learning
● It uses guidance from a feedback signal to automatically find transformations that turn
input data into more useful representations.
For example,
○ in the case of supervised learning, the feedback comes from the loss function and
the algorithm seeks a representation that is closer to the target outputs.
Representations
Deep learning is about jointly finding successive layers of representations, usually in the form
of the layers of a neural network.
● The first layer in some sense transforms the input vectors into new vectors — a different
representation of the input examples.
● The second layer transforms again into new vectors — another representation.
● Since each layer produces a new representation, one way of thinking about this is that, for
the kinds of tasks on which it is successful, deep learning automates feature engineering.
Drivers of Deep Learning
Hardware:
● Faster CPUs, but then highly-parallel Graphics Processing Units (GPUs) and now
specially-designed Tensor Processing Units (TPUs).
Data:
● Sensors and the Internet have made vast datasets available: text, images, video, …
Algorithmic advances:
● The core ideas have been around a long time: Perceptrons (1950s), backpropagation
(1980s or earlier), convolutional networks (1980s), LSTMs (1990s), …
● But new ideas from 2010 onwards: better weight initialization, batch normalization,
different activation functions, variants of SGD, numerous ways to avoid overfitting, new
architectures,…
Freeware:
● Toolkits/APIs; Educational resources.
Money!
Applications of Deep Learning
In this lecture:
● We will use layered, dense, feedforward neural networks for regression, binary
classification and multi-class classification:
○ We'll use our two small datasets that contain structured data (sometimes called
tabular data): not necessarily ideal for deep learning.
● This will illustrate some of the different activation functions we can use:
TensorFlow and PyTorch are the two main libraries that support tensor computation, neural
networks and deep learning in Python.
We will use Keras, which is a high-level API for TensorFlow, first released in 2015 by François
Chollet of Google (https://keras.io):
● The downside is it gives less fine-grained control than TensorFlow itself. When
fine-grained control is needed, you can mix in TensorFlow functions, methods and
classes.
● This seems a suitable trade-off for us: our module is about AI, not the intricacies of
TensorFlow.
Keras Concepts
○ Try to avoid more than two hidden layers; otherwise the model complexity increases.
○ For very large datasets, gradually ramp up the number of hidden layers until you
start overfitting the training set.
● The number of hidden neurons should be between the size of the input layer and the size
of the output layer.
● The number of hidden neurons should be 2/3 the size of the input layer, plus the size of
the output layer.
● The number of hidden neurons should be less than twice the size of the input layer.
Source: An Introduction to Neural Networks for Java, Second Edition by Jeff Heaton
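As an illustrative calculation only (using the Iris setup that appears later in this lecture: 4 inputs, 3 outputs): the first rule suggests between 3 and 4 hidden neurons; the second gives 2/3 × 4 + 3 ≈ 6; and the third allows anything below 2 × 4 = 8. These are only rules of thumb; the examples later in this lecture use wider layers (64 neurons).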
The activation functions of hidden layers are open for you to choose, e.g. sigmoid or ReLU.
● But the activation functions of output layers are determined by the task:
● Regression: linear activation function (default);
● Binary classification: sigmoid activation function; and
● Multiclass classification: softmax activation function.
A loss function:
● Regression, e.g. mean-squared-error (mse);
● Binary classification, e.g. (binary) cross-entropy (binary_crossentropy );
● Multiclass classification, e.g. (categorical) cross-entropy
(sparse_categorical_crossentropy if the labels are encoded as integers, or
categorical_crossentropy if the labels are one-hot encoded).
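The original slides do not reproduce code here; as a hedged sketch (the layer sizes, the Adam optimizer and the three-class output are illustrative assumptions), the pairings above look like this in Keras:

import tensorflow as tf
from tensorflow.keras import layers

# Regression: one linear output neuron, mean-squared-error loss.
regressor = tf.keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(1)                                   # linear activation by default
])
regressor.compile(optimizer="adam", loss="mse")

# Binary classification: one sigmoid output neuron, binary cross-entropy.
binary_clf = tf.keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
binary_clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Multiclass classification with integer labels: softmax outputs and
# sparse categorical cross-entropy (categorical_crossentropy for one-hot labels).
multi_clf = tf.keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax")
])
multi_clf.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])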
Without going into details, many other variants of Gradient Descent have been devised (e.g.
RMSprop, Adam, Nadam, Adagrad, …):
● some may have better convergence behaviour in the case of local minima;
● Be aware that their default learning rate in Keras is 0.001. This is usually OK, but in some
cases you may need to change it.
● Be aware too that fit() has an argument called batch_size. If we set its value to
somewhere between 1 and the size of the training set, then we are getting Mini-Batch
Gradient Descent.
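A minimal sketch of overriding the default learning rate and choosing a batch size (the toy data, layer sizes and values below are illustrative assumptions, not the lecture's code):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy stand-in data so the snippet runs end to end.
X_train = np.random.rand(200, 8).astype("float32")
y_train = np.random.rand(200).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1)
])

# Overriding Adam's default learning rate of 0.001.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005), loss="mse")

# batch_size between 1 and the training-set size gives Mini-Batch Gradient Descent;
# batch_size=1 would be Stochastic GD, batch_size=len(X_train) would be Batch GD.
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)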
A Neural Network for Regression
For regression on structured/tabular data, we might use a network with the following
architecture:
● Output layer: just one output neuron (assuming we're predicting a single number).
○ Activation function for the output neuron should be the linear function: g(z) = z
There are also biases in each layer except the output layer — Keras will give us these 'for free'.
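A hedged sketch of such a regression network in Keras (the number of input features and the two 64-neuron hidden layers are illustrative assumptions; only the output layer is dictated by the task):

import tensorflow as tf
from tensorflow.keras import layers

n_features = 10    # assumed number of input features in the tabular data

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),     # hidden layer 1 (size is a design choice)
    layers.Dense(64, activation="relu"),     # hidden layer 2
    layers.Dense(1)                          # one output neuron, linear activation g(z) = z
])
model.compile(optimizer="adam", loss="mse")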
Example: House Rent Prediction
We don't want too many hidden layers, nor too many neurons in each hidden layer. Why?
We need to scale the features. But, since we are now not using scikit-learn's
ColumnTransformer to create a preprocessor, we need to take care of the scaling ourselves.
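One option, sketched below under the assumption that the encoded numeric features are already assembled into an array X_train, is Keras's Normalization layer, whose adapt() method learns the scaling statistics from the training data:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in for the already-encoded numeric feature matrix of the rent dataset.
X_train = np.random.rand(500, 12).astype("float32")

normalizer = layers.Normalization()    # standardizes each feature (mean 0, variance 1)
normalizer.adapt(X_train)              # learn the statistics from the training data only

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    normalizer,
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1)
])
model.compile(optimizer="adam", loss="mse")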
A Neural Network for Binary Classification
For binary classification, we might use a network with the following architecture:
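The slide's diagram is not reproduced here; as a minimal illustrative sketch (the number of inputs and the hidden-layer sizes are assumptions), such a network ends in a single sigmoid neuron and is trained with binary cross-entropy:

import tensorflow as tf
from tensorflow.keras import layers

n_features = 10    # assumed number of input features

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid")    # outputs an estimate of P(class = 1)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])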
For multi-class classification, we might use a network with the following architecture:
● An input layer with 4 inputs (petal width and length, and sepal width and length).
● Two hidden layers, with 64 neurons in each, and ReLU activation function.
● An output layer with three neurons (one for Setosa, Versicolor and Virginica) and
softmax activation function.
Example: Iris Dataset
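The lecture's own code for this example is not reproduced above; a hedged sketch of the described architecture, using integer labels with sparse_categorical_crossentropy (the optimizer, the number of epochs, the train/test split and the use of a Normalization layer are assumptions), might look like:

import tensorflow as tf
from tensorflow.keras import layers
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)    # 4 features, integer class labels 0-2
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

normalizer = layers.Normalization()  # scale the four features
normalizer.adapt(X_train)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    normalizer,
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax")    # one neuron per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=50, verbose=0)
print(model.evaluate(X_test, y_test, verbose=0))    # [loss, accuracy]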
Below, as an alternative, is code that illustrates one-hot encoding the target values using the
Keras function to_categorical, and then using categorical_crossentropy as the
loss function.
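A sketch of that alternative, reusing the model and data names from the sketch above (the number of epochs is an assumption):

from tensorflow.keras.utils import to_categorical

# One-hot encode the integer labels: 0 -> [1,0,0], 1 -> [0,1,0], 2 -> [0,0,1].
y_train_onehot = to_categorical(y_train, num_classes=3)
y_test_onehot = to_categorical(y_test, num_classes=3)

# Same architecture as above, but the loss now expects one-hot targets.
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train_onehot, epochs=50, verbose=0)
print(model.evaluate(X_test, y_test_onehot, verbose=0))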
Example: Iris Dataset
Reported output: 0.8999999761581421
Observations:
● Neural networks are often not the best-performing approaches for structured data.
● And, sure enough, the results here are not great. Of course, there is a lot we can tweak to
see if we can improve the results.
Example: Fashion MNIST Dataset
● Dataset: 70,000 images, so we can safely use holdout, and it is already partitioned:
○ 60,000 training images; 10,000 test images.
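Loading the dataset (this is the standard Keras loader for Fashion MNIST):

import tensorflow as tf

# Pre-partitioned: 60,000 training and 10,000 test images, each 28 x 28 grayscale pixels.
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(X_train.shape, X_test.shape)    # (60000, 28, 28) (10000, 28, 28)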
● One hidden layer with 300 neurons, using the ReLU activation function.
● Second hidden layer with 100 neurons, using the ReLU activation function.
● The output layer will have 10 neurons, one per class, and will use the softmax activation
function.
The features (pixel values) are all in the same range [0, 255], so we do not need to standardize
using a Normalization layer.
But it is a bad idea to feed into a neural network values that are much larger than the initial
weights, so we will rescale them to [0, 1] by dividing by 255. We can do this using a Rescaling layer.
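A hedged sketch of this architecture, reusing X_train, y_train, X_test and y_test from the loading snippet above (the optimizer, epoch count and validation split are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    layers.Rescaling(1.0 / 255),              # map pixel values from [0, 255] to [0, 1]
    layers.Flatten(),                         # 28 x 28 image -> vector of 784 inputs
    layers.Dense(300, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(10, activation="softmax")    # one neuron per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_split=0.1)
print(model.evaluate(X_test, y_test))         # [loss, accuracy]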
Remarks on Computer Vision Problems
In the 1960s, 70s, 80s and to some extent 90s, the typical pipeline for a computer vision (or
image processing) system was as follows:
● There would be a module that would extract features from the images.
○ They might include edges detected by some edge detection algorithm, for example.
(If you are interested, look up SIFT or SURF or HOG.)
● Then these features would be fed into a typical learning algorithm, e.g. logistic
regression.
Remarks on Computer Vision Problems
With deep learning, by contrast:
● There's no extraction of hand-crafted features. We feed in the raw pixel values (or
lightly-processed pixel values, e.g. scaled values).
● It is the layers of the neural network that automatically discover the features, and the
final layer that makes the classification.
○ In practice, computer vision (image processing) more often uses additional layer
types: convolutional layers, pooling layers, batch normalization layers, and so on.
We may study these in coming lectures.
Concluding Remarks
● A few decisions are constrained: number of inputs; number of output neurons; activation
function of output neurons; and (to some extent) loss function.
○ For the remaining choices (e.g. the number of hidden layers and neurons, the
hidden-layer activation functions, the optimizer, the learning rate and the batch
size), even making a good guess is more art than science, although this is changing.
○ On the other hand, grid search or randomized search will make things even slower
than they already are — and we still have to specify some sensible values for them
to search through.