Machine Learning and Pattern Recognition Week 8 Neural Net Intro
We’ve seen that we can get a long way with linear models, and generalized linear models
(linear models combined with a non-Gaussian observation model).
Linear models are still widely used, and should still be implemented as baselines, even if
you’re convinced you need something more complicated. However, making a linear model
work well might require some insight into how to transform the inputs and outputs (“feature
engineering”). You can think of neural networks1 as linear models with additional parts,
where at least some of the feature transformations can also be learned.
Parameters are fitted for a series of stages of computation, rather than just the weights for a
single linear combination. The benefit of neural networks over linear models is that we can
learn more interesting functions. But fitting the parameters of a neural network is harder:
we might need more data, and the cost function is not convex.
Suppose we transform the inputs with K adaptive basis functions, or “hidden units”,

h_k(x) = g(v^(k)⊤ x + b^(k)),   k = 1, …, K,

where g is a fixed non-linearity such as a logistic sigmoid or a ReLU, and then take a linear combination of those to form our final function:

f(x) = w⊤ h(x) + b.

Here I’ve chosen to put a bias parameter in the final step, rather than adding a constant basis function. This function is a special case of a “neural network”. In particular, it is a “feedforward (artificial) neural network”, or “multilayer perceptron” (MLP).
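To make the shapes concrete, here is a minimal NumPy sketch of this one-hidden-layer function, assuming a logistic-sigmoid non-linearity g. The function name feedforward and the variable shapes are illustrative choices, not part of the notes.

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def feedforward(X, V, bk, w, b):
    """One-hidden-layer network: f(x) = w^T h(x) + b, h_k(x) = sigmoid(v^(k)^T x + b^(k)).

    X  : (N, D) matrix of inputs, one row per example
    V  : (K, D) hidden-unit weights, row k is v^(k)
    bk : (K,)   hidden-unit biases b^(k)
    w  : (K,)   output weights
    b  : scalar output bias
    """
    H = sigmoid(X @ V.T + bk)   # (N, K) hidden-unit values h(x)
    return H @ w + b            # (N,)  function values f(x)

# Tiny usage example with random parameters:
rng = np.random.default_rng(0)
N, D, K = 5, 3, 4
X = rng.standard_normal((N, D))
V, bk = rng.standard_normal((K, D)), rng.standard_normal(K)
w, b = rng.standard_normal(K), 0.1
print(feedforward(X, V, bk, w, b))   # five function values, one per input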
The function has many parameters θ = {{v^(k), b^(k)}_{k=1}^K, w, b}. What would make it a neural network is fitting all of these parameters θ to data. Rather than placing basis functions by hand, we pick the family of basis functions, and “learn” the locations and any other parameters from data. A neural network “learning algorithm” is simply an optimization procedure that fits the parameters to data, usually (but not always) a gradient-based optimizer that iteratively updates the parameters to reduce the cost. In practice, optimizers can usually only find a local optimum, and optimization is often terminated before it converges to even a local optimum.
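As an illustration of what such a learning algorithm can look like, here is a hedged sketch of batch gradient descent on a (halved) mean squared error cost for the one-hidden-layer model above, with the gradients written out by hand. The function name, initialization scale, step size, and iteration count are arbitrary illustrative choices, not recommendations from the notes.

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def fit_one_hidden_layer(X, y, K=10, lr=0.1, n_steps=1000, seed=0):
    """Fit f(x) = w^T sigmoid(V x + bk) + b by gradient descent on 0.5 * mean((f - y)^2)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    V = 0.1 * rng.standard_normal((K, D)); bk = np.zeros(K)
    w = 0.1 * rng.standard_normal(K); b = 0.0
    for _ in range(n_steps):
        # Forward pass:
        A = X @ V.T + bk            # (N, K) hidden-unit pre-activations
        H = sigmoid(A)              # (N, K) hidden-unit values
        f = H @ w + b               # (N,)  predictions
        # Backward pass, gradients of C = 0.5 * mean((f - y)**2):
        f_bar = (f - y) / N                       # dC/df
        w_grad, b_grad = H.T @ f_bar, f_bar.sum()
        A_bar = np.outer(f_bar, w) * H * (1 - H)  # dC/dA, using sigmoid's derivative H(1 - H)
        V_grad, bk_grad = A_bar.T @ X, A_bar.sum(axis=0)
        # Gradient-descent updates; at best this finds a local optimum:
        V -= lr * V_grad; bk -= lr * bk_grad
        w -= lr * w_grad; b -= lr * b_grad
    return V, bk, w, b

# Toy usage: fit a noisy sine curve with K=20 hidden units
rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 100)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
V, bk, w, b = fit_one_hidden_layer(X, y, K=20, lr=0.5, n_steps=5000)

Running the sketch with different seeds or step sizes will generally end in different local optima, which is exactly the behaviour described in the paragraph above.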
2. A natural question from keen students at this point is: “what non-linearity should I use?”. As with many questions in machine learning, the answer is “it depends” and “we don’t know yet”. ReLUs (named after Relu Patrascu, a friendly sysadmin at the University of Toronto) replaced logistic sigmoids in the generic hidden layers of many neural networks, partly because they are easier to fit. However, now I would always use a PReLU instead, which has worked better in the cases I’ve tried. There are several other variants, including GELUs and SELUs. The small differences between these non-linearities don’t tend to be where big advances come from. Fully differentiable non-linearities like the soft-plus, log(1 + e^a), which looks like a smoothed ReLU, will make some optimizers happier. Logistic sigmoids are still useful as switches, for example in mixtures of experts, LSTMs, and adapting models. Although some of this work is theoretically motivated, what cross-validates best is what ultimately wins in practice.
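For reference, here is a small NumPy sketch of the non-linearities mentioned in this footnote (GELUs and SELUs omitted). Fixing the PReLU slope alpha is an illustrative simplification: in a real PReLU it is a learned parameter.

import numpy as np

def logistic_sigmoid(a):
    return 1 / (1 + np.exp(-a))            # useful as a soft switch in [0, 1]

def relu(a):
    return np.maximum(0, a)                # rectified linear unit

def prelu(a, alpha=0.1):
    return np.where(a > 0, a, alpha * a)   # alpha would normally be learned per unit

def softplus(a):
    return np.logaddexp(0, a)              # log(1 + e^a), smooth and shaped like a ReLU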
5 Further reading
Bishop’s introduction to neural networks is Section 5.1. Bishop also wrote another book,
published in 1995: Neural Networks for Pattern Recognition. Despite being 25 years old, and so
missing out on more recent insights, it’s still a great introduction!
MacKay’s textbook Chapter 39 is on the “single neuron classifier”. The classifier in that chapter is precisely logistic regression, but described in neural network language. Maybe this alternative view will help.
Murphy’s quick description of Neural Nets is in Section 16.5, which is followed by a literature
survey of other variants.
Theoretical Neuroscience (Dayan and Abbott) has more detail about biological neural networks
and theoretical models of how they learn.