7 Neural Networks - Lecture Slides
Neural networks
Wim De Keyser
Rony Baekeland
Tom Magerman
The brain
A human brain consists of about 86,100,000,000 nerve cells.

Reference points:
• 1,000,000 seconds = 11.57 days
• 1,000,000,000 seconds = 31.71 years
• 86,100,000,000 seconds ≈ 2.73 millennia

A nerve cell or neuron consists of three parts:
• a cell body (aka soma, aka cyton) containing
  o a nucleus
  o dendrites: inputs that receive signals
• an axon or output with branches
• a synapse: the connection between an axon and a dendrite
Neuron
The neuron explained
A neuron combines incoming signals from other neurons and in turn transmits
this signal to other neurons. Amplification or attenuation of the signal occurs
in the dendrites and axons.
[Figure: artificial neuron j with an integration function and an activation function]
Artificial Neuron – Activation function
Note: when the neural network needs to yield binary values, the logistic function is widely used: f(z) = 1 / (1 + e^(−z)).
It returns a value between 0 and 1 and is a special case of the sigmoid function.
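As a small illustration (a sketch, not from the slides), the logistic function is easy to write directly in Python:

import numpy as np

def logistic(z):
    # Logistic function: maps any real z to a value between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

print(logistic(-4.0))  # ~0.018: large negative inputs go towards 0
print(logistic(0.0))   # 0.5
print(logistic(4.0))   # ~0.982: large positive inputs go towards 1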
Artificial Neural Network (ANN or NN)
[Figure: network with an input layer, hidden layers and an output layer; each connection (synapse) contains a weight, and weights can be positive or negative]
ANN - Output neurons
[Figure: output neurons]

ANN topologies
Lots of topologies possible:
• Feed Forward NN
• Recurrent NN
• LSTM
• Autoencoders
• Convolutional NN
• Kohonen NN
• …
→ each has a specific goal
We limit ourselves to
Feed Forward NN
https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464
Perceptrons

x1 | x2 | y
0  | 0  | 0
1  | 0  | 0
0  | 1  | 0
1  | 1  | 1

[Figure: perceptron with inputs x1 and x2, weights w1 and w2, and bias b; plot of a separating line in the (x1, x2) plane]

Every line that separates (0,0), (1,0) and (0,1) from (1,1) will do fine.
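One hypothetical choice of weights that draws such a line: w1 = w2 = 1 with bias b = −1.5 (an assumption for illustration; any equivalent line works). A minimal sketch in Python:

def perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    # Step-activated perceptron: fires (1) when w1*x1 + w2*x2 + b > 0
    z = w1 * x1 + w2 * x2 + b   # integration function
    return 1 if z > 0 else 0    # step activation function

# Reproduces the truth table above: only (1,1) clears the threshold
for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, perceptron(x1, x2))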
Perceptrons

x1 | x2 | y
0  | 0  | 0
1  | 0  | 1
0  | 1  | 1
1  | 1  | 0

[Figure: neural network for XOR with constant (bias) neurons; the covariates or input variables on the left, the response or output variable on the right; 4 weights and 2 biases into the hidden layer, 2 weights and 1 bias into the output neuron]
A simple example - XOR
[Figures: trained XOR network with its fitted weights (reported error: 1.090252) and a plot of the two input variables P1 and P2]
A simple example - XOR
[Figure: XOR network with a constant neuron, hidden neurons n1 and n2, and output neuron n3]

n3:
Integration function: z = 0.44075 × 0.46983 + 0.79255 × 0.72891 − 0.00200 = 0.782781
Activation function: a = 1 / (1 + e^(−z)) = 0.68628

Note: here it was chosen -by means of a parameter- not to apply the activation function to the output neuron -see below-
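This computation can be checked in a few lines of Python (the weights and the activations of n1 and n2 are taken from the slide):

import math

w1, w2, bias = 0.44075, 0.79255, -0.00200   # weights and bias for n3 (as printed, rounded)
a1, a2 = 0.46983, 0.72891                   # activations of n1 and n2

z3 = w1 * a1 + w2 * a2 + bias               # integration function
a3 = 1 / (1 + math.exp(-z3))                # logistic activation function

print(round(z3, 6))   # 0.782775 (slide: 0.782781; the printed weights are rounded)
print(round(a3, 5))   # 0.68628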
What to use an ANN for?
Classification
Image recognition, OCR, fraud detection,
identification, logistic regression, binary classification,
multiclass classification, …
see also cluster analysis (Data & A.I. 2)
Regression
See also linear regression and Forecasting
(Data & A.I. 2)
Compression
See also PCA (Data & A.I. 2) by means of
autoencoder
Data Science Process
All topics from Data & A.I. 2 and 3 apply to the Data Science
Process.
• Understanding the Business: research phases, …
• Understanding the Data: frequency distributions, histogram,
center measures, distributions, …
• Data management: transformation and manipulation of data
• Modeling: linear regression, forecasting, decision trees,
clustering, association rules, naive bayes, metaheuristics, ANN
• Evaluation: Evaluation metrics
• Application: writing smart applications, research phases
The difference with regression is that here you get an output between 0 and 1.
You can look at that output as a probability.
How does an ANN
work?
How does an ANN learn?
We will use some notation to indicate elements of an ANN. Look carefully at the examples:
• x_i, y_i: input vector and target vector of training example i
• z^(3): weighted input vector of the 3rd layer
• a^(4): activation vector of the 4th layer
• δ^(3): error vector of the 3rd layer
Example 1
Training dataset with 7 input neurons required and 3 output neurons required:

input vector x                        target vector y
(0.3, 0.4, 0.8, 0.9, 0.2, 0.2, 0.5)   (0, 1, 0)
(0.5, 0.1, 0.5, 0.8, 0.3, 0.7, 0.3)   (1, 0, 0)
(0.9, 0.2, 0.1, 0.1, 0.3, 0.3, 0.8)   (1, 1, 1)
A. Feedforward pass
1. Put the input vector into the input neurons
2. Have the number values propagated through the ANN
3. The output neurons take a value
B. Backpropagation pass
4. Calculate the cost by comparing the output with the target vector
5. Use the cost to calculate the errors for all layers
6. Adjust the weights in all layers based on the errors
The backpropagation algorithm looks for the weights in the neural network that
provide a (local) minimum for the error function. Is this in effect an algorithm or is
it a heuristic (see later)?
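A minimal numpy sketch of both passes for a tiny 2-2-1 network with logistic activations and a squared-error cost (network size, cost and learning rate are illustrative assumptions, not the slides' values):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Initialize network weights (2 inputs -> 2 hidden -> 1 output)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)

x = np.array([0.0, 1.0])   # input vector
y = np.array([1.0])        # target vector
lr = 0.5                   # learning rate

for _ in range(1000):
    # A. Feedforward pass: propagate the input through the network
    z1 = W1 @ x + b1;  a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

    # B. Backpropagation pass: cost -> errors per layer -> weight updates
    delta2 = (a2 - y) * a2 * (1 - a2)          # error in the output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # error in the hidden layer
    W2 -= lr * np.outer(delta2, a1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1

print(a2)  # after training: close to the target 1.0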
How does an ANN learn? - Initialize network weights
[Figure: network with weight matrices W^(1), W^(2), W^(3), W^(4) between successive layers]
δ^(l) = the errors in layer l
How does an ANN learn? - Backpropagation pass (2/2)
[Figure: the errors δ are propagated backwards through the network via the weight matrices W^(4), W^(3), W^(2), W^(1)]
How does an ANN learn? - Example: MNIST

[Figure: a 28 × 28 pixel image of a handwritten digit gives 28 × 28 = 784 inputs. The target digit (here 5) is transformed via one-hot encoding into a target vector of ten values: (0, 0, 0, 0, 0, 1, 0, 0, 0, 0). The ten output neurons yield values between 0 and 1 (e.g. 0.01, 0.00, …, 0.64) that are compared with this target vector. One training example = input vector + target vector.]
How does an ANN learn?
One-Hot Encoding
One-hot encoding is a process by which categorical variables are
converted into a numerical form that can be provided to ML algorithms to
do a better job in prediction.
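For instance, Keras's to_categorical (imported in Step 0 below) performs this conversion; a small illustrative example:

from tensorflow.keras.utils import to_categorical

labels = [0, 5, 9]   # categorical labels (e.g. MNIST digits)
print(to_categorical(labels, num_classes=10))
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]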
Do we split the data into a training and a test dataset before or
after the normalization?
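One common argument, sketched below with scikit-learn purely for illustration: if you split first and compute the normalization parameters on the training data only, no information from the test set can leak into training.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X = np.random.rand(100, 5)   # illustrative feature matrix
y = np.random.rand(100)      # illustrative target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = MinMaxScaler()
X_train_norm = scaler.fit_transform(X_train)  # min/max computed on training data only
X_test_norm = scaler.transform(X_test)        # test set reuses the training parameters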
How does an ANN learn? - Scaling data
Step 0: Install the package & import the required libraries, functions,…
Step 1: Upload the dataset and inspect the data
Step 2: Perform the needed data management manipulations in order to
prepare the data for processing.
Step 3: Normalise the data (only if needed and normalisation is not part of
the chosen ANN-model – see example MNIST)
Step 4: If required, split the dataset into a training dataset and a test dataset
Step 5: Build the ANN-model
Step 6: Train the ANN-model
Step 7: Evaluate the quality of the ANN-model
Step 8: Apply the ANN-model to a new dataset
Note: Depending on the project at hand, some steps can be skipped
# Step 0: Install the packages & import the required libraries, functions,…
import math
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam, RMSprop
from livelossplot import PlotLossesKeras
from keras.utils.vis_utils import plot_model
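Only the evaluation call of the XOR example survives on this slide. A minimal sketch of how model_xor and its data could be built and trained (layer sizes, optimizer and epoch count are assumptions; the hidden Dense(2) layer gives the 4 weights and 2 biases of the earlier XOR figure):

import numpy as np
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense

x_xor_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor_data = np.array([[0], [1], [1], [0]], dtype=float)

inputs_xor = Input(shape=(2,))
hidden_xor = Dense(2, activation='sigmoid')(inputs_xor)    # 4 weights + 2 biases
outputs_xor = Dense(1, activation='sigmoid')(hidden_xor)   # 2 weights + 1 bias
model_xor = Model(inputs_xor, outputs_xor, name='XOR')

model_xor.compile(optimizer=keras.optimizers.Adam(learning_rate=0.1),
                  loss=keras.losses.binary_crossentropy,
                  metrics=['accuracy'])
model_xor.fit(x_xor_data, y_xor_data, epochs=500, verbose=0)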
model_xor.evaluate(x_xor_data, y_xor_data)
Change the model and add several hidden layers. Check if this would improve
the predictions.
Quiz question: Why can't a neural network with only linear activation
functions predict the XOR function?
Because the XOR function cannot be linearly separated, i.e. you can never
perfectly separate the two cases with one line, and a network that uses only
linear activation functions collapses to a single linear model, no matter how
many layers it has.
ANN in Python - Keras – ANN parameters
Dense parameters:
activation: sigmoid  Applies the sigmoid activation function: sigmoid(x) = 1 / (1 + exp(-x)).
                     Returns a value between 0 and 1; not useful in a regression ANN
            relu     Applies the rectified linear unit activation function: max(x, 0)
            linear   Linear activation function (pass-through). Useful in a regression ANN
            softmax  Converts a vector of values to a probability distribution. Useful when the
                     output layer consists of nodes for different outcome categories
Other alternatives: https://keras.io/api/layers/activations/
model.compile parameters:
optimizer: Adam(learning_rate=lr)     # lr ∈ {0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001}
                                      useful in a classification ANN
           RMSprop(learning_rate=lr)  useful in a regression ANN
Other alternatives: https://keras.io/api/optimizers/
loss: keras.losses.binary_crossentropy       when the ANN is aimed at binary classification
      keras.losses.categorical_crossentropy  when the ANN is aimed at multiclass classification
      keras.losses.MeanAbsoluteError()       when the expected outcome is a numerical value
                                             (regression)
Other alternatives: https://keras.io/api/losses/
metrics: ['accuracy']: useful in a classification ANN
         keras.metrics.MeanAbsolutePercentageError(): useful in a regression ANN
ANN in Python - Keras – ANN parameters
ANN in Python - Keras – ANN parameters
model.fit parameters:
epochs: number of times the training examples are offered to the ANN (= number of iterations)
batch_size: accumulate the errors of a number of examples before updating the weights → faster training
validation_split: percentage of the training set used as a validation set (≠ test set)
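Putting these parameters together, a hypothetical multiclass model could be compiled and trained as follows (model shape, data and parameter values are illustrative assumptions):

import numpy as np
from tensorflow import keras
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam

# Tiny illustrative model: 4 inputs, 3 outcome categories
inputs = Input(shape=(4,))
hidden = Dense(8, activation='relu')(inputs)
outputs = Dense(3, activation='softmax')(hidden)
model = Model(inputs, outputs)

model.compile(optimizer=Adam(learning_rate=0.001),          # optimizer + learning rate
              loss=keras.losses.categorical_crossentropy,   # multiclass classification
              metrics=['accuracy'])

x_train = np.random.rand(100, 4)                            # illustrative training data
y_train = keras.utils.to_categorical(np.random.randint(3, size=100), num_classes=3)

history = model.fit(x_train, y_train,
                    epochs=50,             # 50 passes over the training examples
                    batch_size=32,         # update the weights per 32 examples
                    validation_split=0.2)  # 20% of the training set for validation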
Cost function (binary cross-entropy):
It is defined as follows:
−(y · log(p) + (1 − y) · log(1 − p))
Where:
• y is the true binary label (0 or 1).
• p is the predicted probability that the data point belongs to class 1.
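A quick numeric check of this formula (values chosen purely for illustration):

import math

def binary_cross_entropy(y, p):
    # Cost of predicting probability p when the true label is y (0 or 1)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(binary_cross_entropy(1, 0.9))   # ~0.105: confident and correct -> low cost
print(binary_cross_entropy(1, 0.1))   # ~2.303: confident and wrong -> high cost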
ANN tuning - Metrics
Accuracy
Recall
Precision
F1
….
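These metrics can all be computed with scikit-learn; a small illustration with made-up class labels:

from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]   # actual classes (illustrative)
y_pred = [1, 0, 1, 0, 0, 1]   # predicted classes

print(accuracy_score(y_true, y_pred))    # correct predictions / all predictions
print(recall_score(y_true, y_pred))      # true positives / all actual positives
print(precision_score(y_true, y_pred))   # true positives / all predicted positives
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall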
ANN tuning – Popular optimizers based on SGD
[Figure: overview of popular optimizers based on SGD]
Note: np.argmax returns the indices of the maximum values along an axis.
[Figure: training curves showing accuracy increasing with the number of epochs]
ANN in Python - Keras – Example Cereals US (1/4)
We have data on American cereals (see ‘cereals US.csv’ on Canvas). There is
also a rating that gives an indication of how healthy these cereals are.
We want to construct, train and check a neural network and use it to predict the
rating of a new cereal.
Amongst other things, we have to split the data into a training and test dataset.
Place -at random- 80% of the data in the training dataset and the remaining 20%
in the test dataset.
We want to be able to predict the ‘Rating’ based on the data in the columns
‘Calories’, ‘Protein (g)’, ‘Fat’, ‘Sodium’ and ‘Dietary Fiber’.
So we will have 5 input neurons and 1 output neuron. In between, the code
below adds four hidden layers (with 32, 16, 8 and 4 neurons).
We will test the quality of the neural network by means of the test dataset.
The rating should be predicted for the following two cereals available on the
Belgian market:

                     Calories  Protein (g)  Fat  Sodium  Dietary Fiber
Kellogg's Coco pops  116       1.7          0.8  230     0.9
Boni Cereal flakes   124       2.6          2.1  260     1.2
ANN in Python - Keras – Example Cereals US (2/4)
# Step 1: Upload the dataset and inspect the data
cereals = pd.read_csv('cereals US.csv',delimiter=';')
cereals.info()
cereals.describe() # note: describe() only shows values for quantitative variables
cereals.isna().sum().sum()
# Step 2: Perform the needed data management manipulations
x_cereals = cereals[['Calories','Protein (g)', 'Fat', 'Sodium', 'Dietary Fiber']].copy()
y_cereals = cereals[['Rating']].copy()
# Step 3: Normalise the data
### min-max normalisation
def minmax_norm(col):
    minimum = col.min()
    col_range = col.max() - minimum
    return (col - minimum) / col_range

x_cereals_norm = pd.DataFrame()
for column in x_cereals:
    x_cereals_norm[column] = minmax_norm(x_cereals[column])
# Step 4: Split the dataset into a training dataset and a test dataset
from sklearn.model_selection import train_test_split
x_train_cer, x_test_cer, y_train_cer, y_test_cer = train_test_split(
    x_cereals_norm, y_cereals, test_size=0.2)  # 0.2 = 20%
# Step 5: Build the ANN-model
### Preparing the layers of the neural network
inputs_cer = Input(shape=(5,))
x_cer = Dense(32, activation='relu')(inputs_cer)
x_cer = Dense(16, activation='relu')(x_cer)
x_cer = Dense(8, activation='relu')(x_cer)
x_cer = Dense(4, activation='relu')(x_cer)
outputs_cer = Dense(1, activation='linear')(x_cer)
ANN in Python - Keras – Example Cereals US (3/4)
### Build the neural network model
model_cer = Model(inputs_cer, outputs_cer, name='Cereals')
model_cer.summary()
model_cer.compile(optimizer=RMSprop(learning_rate=0.01),
                  loss=keras.losses.MeanAbsoluteError(),
                  metrics=[keras.metrics.MeanAbsolutePercentageError()])
# Step 6: Train the ANN-model
history_cer = model_cer.fit(
    x_train_cer,  # training data
    y_train_cer,  # training targets
    epochs=200)
# Step 7: Evaluate the quality of the ANN-model
model_cer.evaluate(x_test_cer,y_test_cer)
predicted_values = model_cer.predict(x_test_cer)
pred = []
for i in range(predicted_values.size):
    pred = pred + [predicted_values[i][0]]
predicted = pd.Series(pred, name='predicted')
actual = y_test_cer['Rating'].copy()
actual = actual.reset_index()
actual = actual['Rating']
mape = ((predicted - actual).abs()/actual).mean()
rmse = math.sqrt(((predicted - actual)**2).mean())
ANN in Python - Keras – Example Cereals US (4/4)
# Step 8: Apply the ANN-model to a new dataset
cerealsBE = pd.DataFrame({'Calories': [116, 124], 'Protein (g)': [1.7, 2.6],
                          'Fat': [0.8, 2.1], 'Sodium': [230, 260],
                          'Dietary Fiber': [0.9, 1.2]})

def minmax_norm_2(col1, col2):
    minimum = col2.min()
    col_range = col2.max() - minimum
    return (col1 - minimum) / col_range

cerealsBE_norm = pd.DataFrame()
for column in cerealsBE:
    cerealsBE_norm[column] = minmax_norm_2(cerealsBE[column], x_cereals[column])

# The model has one linear output neuron, so the predicted value itself is the
# rating; np.argmax would only be appropriate for a classification output layer.
predicted_BE = pd.Series(model_cer.predict(cerealsBE_norm).flatten(),
                         name='predicted')
Neural networks in
the media
In the media
https://www.bbc.com/news/technology-50720823
It takes minutes for most new Minecraft players to work out how to dig up the
diamonds that are key to the game, but training artificial intelligence to do it has
proved harder than expected.
Over the summer, Minecraft publisher Microsoft and other organisations challenged coders
to create AI agents that could find the coveted gems. Most human players can crack it in their first session.
But out of more than 660 entries submitted, not one was up to the task. The results of the
MineRL - which is pronounced mineral - competition are due to be announced formally on
Saturday at the NeurIPS AI conference in Vancouver, Canada. The aim had been to see
whether the problem could be solved without requiring a huge amount of computing power.
Despite the lack of a winner, one of the organisers said she was still "hugely impressed" by
some of the participants.
…
The organisers wanted the coders to create programs that learned by example, through a
technique known as "imitation learning". This involves trying to get AI agents to adopt the
best approach by getting them to mimic what humans or other software do to solve a task. It
contrasts with relying solely on "reinforcement learning", in which an agent is effectively
trained to find the best solution via a process of trial and error, without drawing on past
knowledge. Researchers have found that using reinforcement learning alone can sometimes
deliver superior results. For instance, DeepMind's AlphaGo Zero program trumped one of the
research hub's earlier efforts, which used both reinforcement learning and the study of
labelled data from human play to learn the board game Go. But this "pure" approach
typically requires much more computing power, making it too expensive for researchers.
AlphaGo Zero (AGZ)
Source: https://hackernoon.com/the-3-tricks-that-made-alphago-zero-work-f3d47b6686ef
In the media
https://www.bbc.com/news/technology-48799045
An app that claimed to be able to digitally remove the clothes from pictures of
women to create fake nudes has been taken offline by its creators.
The $50 (£40) Deepnude app won attention and criticism because of an article by tech
news site Motherboard.
One campaigner against so-called revenge porn called the app "terrifying".
The developers have now removed the software from the web saying the world was not
ready for it.
"The probability that people will misuse it is too high," wrote the programmers in a
message on their Twitter feed. "We don't want to make money this way."
Anyone who bought the app would get a refund, they said, adding that there would be no
other versions of it available and withdrawing the right of anyone else to use it.
The developers also urged people who had a copy not to share it, although the app will
still work for anyone who owns it.
…
The program reportedly uses AI-based neural networks to remove clothing from images of
women to produce realistic naked shots.
The networks have been trained to work out where clothes are in an image, mask them
by matching skin tone, lighting and shadows and then fill in estimated physical features.
The technology is similar to that used to create so-called deepfakes, which manipulate
video to produce convincingly realistic clips. Early deepfake software was used to create
pornographic clips of celebrities.
Questionnaire