
CSCI218: Foundations of Artificial Intelligence

Classical stats/ML: Minimize loss function
§ Which hypothesis space H to choose?
§ E.g., linear combinations of features: h_w(x) = wᵀx
§ How to measure degree of fit?
§ Loss function, e.g., squared error Σ_j (y_j − wᵀx_j)²
§ How to trade off degree of fit vs. complexity?
§ Regularization: complexity penalty, e.g., ‖w‖²
§ How do we find a good h?
§ Optimization (closed-form, numerical); discrete search
§ How do we know if a good h will predict well?
§ Try it and see (cross-validation, bootstrap, etc.)
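
A minimal sketch (not from the slides) tying this recipe together: a linear hypothesis h_w(x) = wᵀx, squared-error loss, an ‖w‖² complexity penalty, and a closed-form optimizer. The toy data and the regularization strength lam are made up for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=0.1):
    """Minimize sum_j (y_j - w^T x_j)^2 + lam * ||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Made-up data: y is roughly a linear function of two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

w = fit_ridge(X, y)
print("learned weights:", w)   # should be close to [2, -1]
```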

Deep Learning/Neural Network

Image Classification
Very loose inspiration: Human neurons

[Figure: a biological neuron, showing dendrites, cell body (soma), nucleus, axon, axonal arborization, and synapses to other cells]

Simple model of a neuron (McCulloch & Pitts, 1943)
[Figure: unit j, with a fixed bias input a_0 = 1 (bias weight w_{0,j}), input links a_i with weights w_{i,j}, a summation producing in_j, an activation function g, the output a_j = g(in_j), and output links]

§ Inputs a_i come from the output of node i to this node j (or from “outside”)
§ Each input link has a weight w_{i,j}
§ There is an additional fixed input a_0 = 1 with bias weight w_{0,j}
§ The total input is in_j = Σ_i w_{i,j} a_i
§ The output is a_j = g(in_j) = g(Σ_i w_{i,j} a_i) = g(w · a)
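
A small sketch of a single unit following these definitions; the weights, inputs, and the two activation choices below are illustrative assumptions only.

```python
import numpy as np

def unit_output(a, w, g):
    """a_j = g(in_j), where in_j = sum_i w_{i,j} a_i and a_0 = 1 is the fixed bias input."""
    a = np.concatenate(([1.0], a))   # prepend the bias input a_0 = 1
    in_j = np.dot(w, a)              # total input in_j = w . a
    return g(in_j)

threshold = lambda x: 1.0 if x >= 0 else 0.0     # hard threshold activation
sigmoid   = lambda x: 1.0 / (1.0 + np.exp(-x))   # sigmoid activation

a = np.array([0.5, -1.0, 2.0])          # outputs of nodes feeding this unit (made up)
w = np.array([0.1, 0.8, 0.3, -0.5])     # w_{0,j} (bias weight) then w_{1,j}..w_{3,j}
print(unit_output(a, w, threshold), unit_output(a, w, sigmoid))
```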
Activation functions g
[Figure: (a) threshold activation; (b) sigmoid activation g(x) = 1/(1 + e^{−x})]
Reminder: Linear Classifiers

▪ Inputs are feature values


▪ Each feature has a weight
▪ Sum is the activation: activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)

▪ If the activation is:
    ▪ Positive, output +1
    ▪ Negative, output −1

[Figure: features f_1, f_2, f_3 weighted by w_1, w_2, w_3, summed (Σ), then thresholded (> 0?)]
How to get probabilistic decisions?

If the activation z = w · f(x) is very positive, want probability going to 1

If the activation is very negative, want probability going to 0

Sigmoid function: φ(z) = 1 / (1 + e^{−z})
Best w?
Maximum likelihood estimation:

  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:

  P(y^(i) = +1 | x^(i); w) = 1 / (1 + e^{−w · f(x^(i))})
  P(y^(i) = −1 | x^(i); w) = 1 − 1 / (1 + e^{−w · f(x^(i))})

= Logistic Regression
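
A sketch of this objective in numpy (the feature matrix F and labels y ∈ {−1, +1} are assumed inputs; the gradient is included because the optimization slides below need it):

```python
import numpy as np

def log_likelihood(w, F, y):
    """ll(w) = sum_i log P(y_i | x_i; w), with P(+1 | x; w) = 1 / (1 + exp(-w . f(x)))."""
    z = F @ w                                  # activations w . f(x_i)
    return np.sum(np.log(1.0 / (1.0 + np.exp(-y * z))))

def grad_log_likelihood(w, F, y):
    """d ll / d w: each example contributes y_i * f(x_i) * (1 - P(y_i | x_i; w))."""
    z = F @ w
    p = 1.0 / (1.0 + np.exp(-y * z))           # probability assigned to the true label
    return F.T @ (y * (1.0 - p))
```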
Multiclass Logistic Regression
Multi-class linear classification

A weight vector for each class: w_y

Score (activation) of a class y: w_y · f(x)

Prediction w/highest score wins: y = argmax_y w_y · f(x)

How to make the scores into probabilities? Softmax:

  z_i → e^{z_i} / Σ_k e^{z_k}

[Figure: original activations z_1, z_2, z_3 → softmax activations (probabilities that sum to 1)]
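
A quick sketch of the softmax mapping just described (example activation values only):

```python
import numpy as np

def softmax(z):
    """Map activations z_1..z_K to probabilities e^{z_i} / sum_k e^{z_k}."""
    z = z - np.max(z)            # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, -1.0])))   # ~[0.71, 0.26, 0.04], sums to 1
```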


Best w?
Maximum likelihood estimation:

  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:

  P(y^(i) | x^(i); w) = e^{w_{y^(i)} · f(x^(i))} / Σ_y e^{w_y · f(x^(i))}

= Multi-Class Logistic Regression


Optimization

i.e., how do we solve:

  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
Hill Climbing
A simple, general idea
Start wherever
Repeat: move to the best neighboring state
If no neighbors better than current, quit

What’s particularly tricky when hill-climbing for multiclass logistic regression?
• Optimization over a continuous space
• Infinitely many neighbors!
• How to do this efficiently?
1-D Optimization

Could evaluate g(w_0 + h) and g(w_0 − h)

Then step in best direction

Or, evaluate derivative:

  ∂g(w_0)/∂w = lim_{h→0} [g(w_0 + h) − g(w_0 − h)] / (2h)

Tells which direction to step in
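
A quick numerical illustration of both options on a made-up 1-D objective g (the step size h is also an assumption):

```python
def g(w):                       # example 1-D objective (made up)
    return -(w - 3.0) ** 2

w0, h = 1.0, 1e-4
# Option 1: evaluate g(w0 + h) and g(w0 - h), then step toward the larger value.
step_right = g(w0 + h) > g(w0 - h)
# Option 2: estimate the derivative; its sign tells us which direction to step in.
dg = (g(w0 + h) - g(w0 - h)) / (2 * h)
print(step_right, dg)           # True, ~4.0 -> step in the positive direction
```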
2-D Optimization

Source: offconvex.org
Gradient Ascent
Perform update in uphill direction for each coordinate
The steeper the slope (i.e. the higher the derivative), the bigger the step for that coordinate

E.g., consider: g(w_1, w_2)

Updates:
  w_1 ← w_1 + α · ∂g/∂w_1(w_1, w_2)
  w_2 ← w_2 + α · ∂g/∂w_2(w_1, w_2)

Updates in vector notation:
  w ← w + α · ∇_w g(w)
  with ∇_w g(w) = gradient
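
A minimal sketch of these updates on a made-up two-variable objective, with an assumed learning rate alpha:

```python
import numpy as np

def grad_g(w):
    """Gradient of g(w1, w2) = -(w1 - 1)^2 - 2 * (w2 + 2)^2 (an example objective)."""
    return np.array([-2.0 * (w[0] - 1.0), -4.0 * (w[1] + 2.0)])

w = np.zeros(2)                        # start wherever
alpha = 0.1                            # learning rate
for _ in range(100):
    w = w + alpha * grad_g(w)          # w <- w + alpha * grad g(w)
print(w)                               # approaches the maximizer (1, -2)
```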
Steepest Descent
o Idea:
o Start somewhere
o Repeat: Take a step in the steepest descent direction

Figure source: Mathworks


Steepest Direction
o Steepest Direction = direction of the gradient

  ∇g = [ ∂g/∂w_1,  ∂g/∂w_2,  …,  ∂g/∂w_n ]ᵀ
Optimization Procedure: Gradient Ascent

init w
for iter = 1, 2, …
    w ← w + α · ∇_w g(w)

▪ α: learning rate --- hyperparameter that needs to be chosen carefully
Batch Gradient Ascent on the Log Likelihood Objective

init w
for iter = 1, 2, …
    w ← w + α · Σ_i ∇ log P(y^(i) | x^(i); w)
Stochastic Gradient Ascent on the Log Likelihood Objective

Observation: once the gradient on one training example has been computed, might as well incorporate it before computing the next one

init w
for iter = 1, 2, …
    pick random j
    w ← w + α · ∇ log P(y^(j) | x^(j); w)
Mini-Batch Gradient Ascent on the Log Likelihood Objective

Observation: the gradient over a small set of training examples (= mini-batch) can be computed in parallel, might as well do that instead of a single example

init w
for iter = 1, 2, …
    pick random subset of training examples J
    w ← w + α · Σ_{j∈J} ∇ log P(y^(j) | x^(j); w)
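
Putting the three variants together, a sketch of mini-batch gradient ascent on the multi-class log-likelihood; the feature matrix F, integer labels y, batch size, learning rate, and iteration count are all assumed values.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def minibatch_gradient_ascent(F, y, num_classes, alpha=0.1, batch=32, iters=1000):
    """Maximize sum_i log P(y_i | x_i; W), with P a softmax over class scores W f(x)."""
    n, d = F.shape
    W = np.zeros((num_classes, d))                     # one weight vector per class
    rng = np.random.default_rng(0)
    for _ in range(iters):
        J = rng.choice(n, size=batch, replace=False)   # random subset of training examples
        P = softmax_rows(F[J] @ W.T)                   # predicted class probabilities
        Y = np.eye(num_classes)[y[J]]                  # one-hot encoding of the true labels
        grad = (Y - P).T @ F[J]                        # gradient of the log likelihood on the batch
        W = W + alpha * grad / batch                   # ascend
    return W
```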
Neural Networks
Multi-class Logistic Regression
= special case of neural network (single layer, no hidden layer)
[Figure: features f_1(x), f_2(x), f_3(x), …, f_K(x) feed class activations z_1, z_2, z_3; a softmax layer turns the z’s into class probabilities]
Multi-layer Perceptron

[Figure: inputs x_1, x_2, x_3, …, x_L pass through several hidden layers of units, each applying a nonlinear activation g, followed by a softmax output layer]

g = nonlinear activation function


Multi-layer Perceptron
Common Activation Functions

[source: MIT 6.S191 introtodeeplearning.com]
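
For reference, a sketch of a few widely used activation functions (the exact set shown in the cited figure may differ):

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))    # squashes to (0, 1)
def tanh(x):    return np.tanh(x)                  # squashes to (-1, 1)
def relu(x):    return np.maximum(0.0, x)          # 0 for negatives, identity otherwise

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```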


Multi-layer Perceptron
Training the MLP neural network is just like logistic regression:

just w tends to be a larger vector

just run gradient ascent ⇒ back-propagation algorithm
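
A minimal sketch of that claim: a one-hidden-layer network trained by gradient ascent on the log likelihood, with the gradient computed by back-propagation. All sizes, the made-up data, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # made-up inputs
y = (X[:, 0] * X[:, 1] > 0).astype(int)       # made-up binary labels
Y = np.eye(2)[y]                              # one-hot labels

H = 16                                        # hidden units
W1 = rng.normal(scale=0.5, size=(3, H))       # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(H, 2))       # hidden -> output weights
alpha = 0.5

for _ in range(500):
    # Forward pass: hidden layer with nonlinearity g (tanh here), then softmax output.
    A = np.tanh(X @ W1)
    Z = A @ W2
    P = np.exp(Z - Z.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    # Backward pass: back-propagate the log-likelihood gradient through the layers.
    dZ = (Y - P) / len(X)                     # gradient at the output scores
    dW2 = A.T @ dZ
    dA = dZ @ W2.T
    dW1 = X.T @ (dA * (1 - A ** 2))           # tanh'(u) = 1 - tanh(u)^2
    # Gradient ascent step on all weights.
    W1 += alpha * dW1
    W2 += alpha * dW2

print("training accuracy:", (P.argmax(axis=1) == y).mean())
```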


Neural Networks Properties
Theorem (Universal Function Approximators). A two-layer
neural network with a sufficient number of neurons can
approximate any continuous function to any desired accuracy.

Practical considerations
Can deal with more complex, nonlinear classification & regression
Large number of neurons and weights
Danger of overfitting
Deep Learning Model

Neural network as general computation graph

Krizhevsky, Sutskever, Hinton, 2012


Deep Learning Model
§ We need good features!

[Figure: classic pipeline: input image → Feature Extraction (using prior knowledge, experience) → Classification → “Panda”?]

Challenges: pose, occlusion, multiple objects, inter-class similarity

Image courtesy of M. Ranzato


Deep Learning Model

§ Directly learn feature representations from data.

§ Jointly learn the feature representation and the classifier.

[Figure: increasingly abstract representations: Low-level Features → Mid-level Features → High-level Features → Classifier → “Panda”?]

Deep Learning: train layers of features so that classifier works well.

Image courtesy of M. Ranzato


Deep Learning Model
Have we been here before?
➢ Yes.
  • Basic ideas common to past neural networks research
  • Standard machine learning strategies still relevant.
➢ No.
  • Today’s Deep Learning = Large-scale Data + Computational Power + New Algorithms
Deep Learning Model
Convolutional Neural Networks (CNNs)
§ A special multi-stage architecture inspired by visual system
§ Higher stages compute more global, more invariant features
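
To make “multi-stage” concrete, here is a sketch of the basic CNN building block, a 2-D convolution followed by a nonlinearity, in plain numpy; the image and filter values are made up.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (technically cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(8, 8))       # made-up single-channel image
edge_filter = np.array([[1.0, -1.0]])                      # responds to horizontal intensity changes
feature_map = np.maximum(0.0, conv2d(image, edge_filter))  # convolution + ReLU nonlinearity
print(feature_map.shape)                                   # (8, 7)
```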
Deep Learning Model

[Figure: LeNet-5, a classic CNN architecture. Source: https://www.datasciencecentral.com/lenet-5-a-classic-cnn-architecture/]
Different Neural Network Architectures
§ Exploration of different neural network architectures
§ ResNet: residual networks
§ Networks with attention
§ Transformer networks
§ Neural network architecture search
§ Really large models
§ GPT2, GPT3
§ CLIP

Acknowledgement

The lecture slides are based on materials from ai.berkeley.edu


Thank you. Questions?
