Machine Learning (ML) is the study of algorithms that improve their performance on tasks based on experience, typically through data. It differs from traditional programming by learning from data inputs and producing outputs through trained models. ML encompasses various types including supervised, unsupervised, and reinforcement learning, and has applications in fields such as image classification, natural language processing, and robotics.

Machine Learning Overview

What is Machine Learning (ML)?
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Tom Mitchell (Machine Learning, 1997)


Machine Learning is the study of algorithms that:
⬣ Improve their performance
⬣ On some task(s)
⬣ Based on experience (typically data)
How is it Different than Programming?

Programming: Input → Algorithm → Output

Machine Learning (Training): Data + Labels → Algorithm → Model
Machine Learning (Inference/Testing): Input → Model → Output

Machine learning thrives when it is difficult to design an algorithm to perform the task.
Applications:

Some tasks are easy to specify as an algorithm, e.g. sorting:

algorithm quicksort(A, lo, hi) is
    if lo < hi then
        p := partition(A, lo, hi)
        quicksort(A, lo, p - 1)
        quicksort(A, p + 1, hi)

algorithm partition(A, lo, hi) is
    pivot := A[hi]
    i := lo
    for j := lo to hi - 1 do
        if A[j] < pivot then
            swap A[i] with A[j]
            i := i + 1
    swap A[i] with A[hi]
    return i

Others are not, e.g. deciding whether an image shows a coffee cup.

Machine Learning Applications


Machine Learning and Artificial Intelligence

Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning

⬣ Deductive reasoning: deduce conclusions from rules; search (e.g. JESS, CLIPS, Drools, Esper, CEP engines, Spark Streams)
⬣ Inductive reasoning: reason from specific examples to general rules or a model (e.g. Scikit Learn, TensorFlow, Rapid Miner, Spark MLlib)
⬣ Abductive reasoning: inference to the best explanation/hypothesis for a set of observations

Adapted from: https://www.datanami.com/2018/03/20/u-s-pursues-abductive-reasoning-to-divine-intent/
Given an image, output a class label
⬣ Often output a probability distribution over labels

Applications:
⬣ Image → Model → class scores (e.g. Car / Coffee Cup / Bird)
⬣ Medical image → Model → class scores (e.g. Normal / Benign / Malignant)

Example: Image Classification


Given a series of measurements, output a prediction for the next time period

Application: Stock market data → Model → prediction (e.g. monthly prices from February through July, with earlier months as input and later months as the prediction)

Example: Time Series Prediction


Very large number of NLP sub-tasks:
⬣ Syntax Parsing
⬣ Parts of speech
⬣ Named entity recognition
⬣ Summarization
⬣ Similarity / paraphrasing

Different from classification: variable-length sequential inputs and/or outputs

Example: Natural Language Processing (NLP)


Sentiment Analysis: Text → Model → class scores (Negative / Neutral / Positive)

Example: Natural Language Processing (NLP)


Application: Decision-making tasks
⬣ Sequence of inputs/outputs
⬣ Actions affect the environment
⬣ Combination of perception and decision-making/controls

(Figure: observations → Model → probability distribution over actions, e.g. {left, right, up, down})

Example: Decision-Making Tasks


Robotics involves a combination of AI/ML techniques:
⬣ Sense: Perception
⬣ Plan: Planning
⬣ Act: Controls/Decision-Making

Some things are learned (perception), while others are programmed
⬣ Evolving landscape

Example: Robotics
Supervised Learning and Parametric Models
Supervised Learning

Dataset: X = {x_1, x_2, ..., x_N} where x ∈ ℝ^d (examples), with labels Y = {y_1, y_2, ..., y_N} where y ∈ ℝ^c

⬣ Train input: X, Y
⬣ Learning output: f : X → Y, e.g. P(y|x)

Terminology:
⬣ Model
⬣ Category / Class
⬣ Note inputs x_i and labels y_i are each represented as vectors

(Figure: dataset table of Example 1 / Label 1 through Example N / Label N)

Types of Machine Learning


Unsupervised Learning

Dataset: X = {x_1, x_2, ..., x_N} where x ∈ ℝ^d (examples only, no labels)

⬣ Input: X
⬣ Learning output: P(x)
⬣ Example: Clustering, density estimation, etc.

Types of Machine Learning


Reinforcement Learning
⬣ Supervision in the form of reward
⬣ No supervision on what action to take

(Agent–environment loop: the agent observes a state, takes an action, and the environment returns a reward and the next state.)

Adapted from: http://cs231n.stanford.edu/slides/2020/lecture_17.pdf

Types of Machine Learning


Supervised Learning:
⬣ Train input: X, Y
⬣ Learning output: f : X → Y, e.g. P(y|x)

Unsupervised Learning:
⬣ Input: X
⬣ Learning output: P(x)
⬣ Example: Clustering, density estimation, etc.

Reinforcement Learning:
⬣ Supervision in the form of reward
⬣ No supervision on what action to take

These are very often combined
⬣ Sometimes within the same model!

Types of Machine Learning


Non-Parametric – Nearest Neighbor

Non-parametric model: no explicit model for the function. Examples:
⬣ Nearest neighbor classifier
⬣ Decision tree

Procedure: take the label of the nearest example (in the figure, a query image is assigned the label of its closest stored example, e.g. "dog"). A small sketch follows below.

Supervised Learning
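A minimal NumPy sketch of the nearest-neighbor procedure above; the toy examples, feature dimension, and labels are made up for illustration:

import numpy as np

def nearest_neighbor_predict(train_X, train_y, query):
    """Return the label of the training example closest to the query (1-NN)."""
    # Euclidean distance from the query to every stored training example
    dists = np.linalg.norm(train_X - query, axis=1)
    return train_y[np.argmin(dists)]

# Toy dataset: 4 stored examples with 3 features each (hypothetical values)
train_X = np.array([[1.0, 0.0, 0.2],   # cat
                    [0.1, 1.0, 0.9],   # dog
                    [0.5, 0.5, 0.0],   # car
                    [0.2, 0.9, 1.0]])  # dog
train_y = np.array(["cat", "dog", "car", "dog"])

print(nearest_neighbor_predict(train_X, train_y, np.array([0.15, 0.95, 0.95])))  # -> "dog"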
Parametric – Linear Classifier

Parametric model: explicitly model the function f : X → Y in the form of a parametrized function f(x, W) = y, for example f(x, W) = Wx + b. Examples:
⬣ Logistic regression/classification
⬣ Neural networks

Procedure: calculate a score per class for the example, and return the label with the maximum score (argmax).

Supervised Learning
Data: Image → Model f(x, W) = Wx + b → Class scores (Car / Coffee Cup / Bird)

Input {X, Y} where:
⬣ X is an image
⬣ Y is a ground truth label annotated by an expert (human)
⬣ f(x, W) = Wx + b is our model, chosen to be a linear function in this case
⬣ W and b are the parameters (weights) of our model that must be learned

Example: Image Classification


Input image is high-dimensional
⬣ For example n = 512, so a 512×512 image = 262,144 pixels
⬣ Learning a classifier with high-dimensional inputs is hard

Before deep learning, it was typical to perform feature engineering
⬣ Hand-design algorithms for converting raw input into a lower-dimensional set of features

(Input image as an n×n matrix x with entries x_11 ... x_nn)

Input Representation: Feature Engineering


Example: Color histogram
⬣ Vector of numbers representing the number of pixels falling within each bin (a small sketch follows below)
⬣ We will later see that learning the feature representation itself is much more effective

(Data: Image → Features: Histogram)

Input Representation: Feature Engineering
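A small sketch of a color-histogram feature in NumPy, counting how many pixel values fall in each bin per channel; the bin count and the random image are illustrative assumptions, not from the slides:

import numpy as np

def color_histogram(image, bins=8):
    """Per-channel histogram of pixel intensities, concatenated into one feature vector."""
    feats = []
    for c in range(image.shape[2]):                   # loop over R, G, B channels
        counts, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        feats.append(counts)
    return np.concatenate(feats).astype(np.float32)   # length = 3 * bins

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3))          # fake 32x32 RGB image
print(color_histogram(img).shape)                     # (24,)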


⬣ Labels are categories, but we need a numerical representation
⬣ Assigning a number to each category is arbitrary
⬣ Instead, represent a probability distribution over categories
⬣ The ground truth label then becomes a probability distribution where the correct category's probability is 1 and all others are 0 (e.g. ground truth 'Coffee Cup' converts to scores: Car 0.0, Coffee Cup 1.0, Bird 0.0)
⬣ Note for regression this is not an issue, as the ground truth label (e.g. a housing price) is already a number

Output Representation: Representing Categories
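A minimal sketch of this one-hot (probability distribution) encoding of a category; the class names are the slide's example, the helper function is hypothetical:

import numpy as np

def one_hot(label_index, num_classes):
    """Probability-distribution representation: 1 at the correct class, 0 elsewhere."""
    y = np.zeros(num_classes)
    y[label_index] = 1.0
    return y

classes = ["Car", "Coffee Cup", "Bird"]
print(one_hot(classes.index("Coffee Cup"), len(classes)))  # [0. 1. 0.]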


Data: Image → Features: Histogram → Model f(x, W) = Wx + b → Class scores (Car / Coffee Cup / Bird)

Input {X, Y} where:
⬣ X is an image histogram
⬣ Y is a ground truth label represented as a probability distribution
⬣ f(x, W) = Wx + b is our model, chosen to be a linear function in this case
⬣ W and b are the weights of our model that must be learned

Example: Image Classification


Data: Text → Features: Word histogram → Model f(x, W) = Wx + b → Class scores (Negative / Neutral / Positive)

Input {X, Y} where:
⬣ X is a sentence, represented as a word histogram of word counts (e.g. this: 1, that: 0, is: 2, ..., extremely: 1, hello: 0, onomatopoeia: 0, ...)
⬣ Y is a ground truth label annotated by an expert (human)
⬣ f(x, W) = Wx + b is our model, chosen to be a linear function in this case
⬣ W and b are the weights of our model that must be learned

Example: Text Classification (Sentiment Analysis)
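A small sketch of the word-histogram (bag-of-words) feature described above; the vocabulary uses words from the slide's table, and the example sentence is made up:

from collections import Counter

def word_histogram(sentence, vocabulary):
    """Count how often each vocabulary word appears in the sentence (bag-of-words feature)."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["this", "that", "is", "extremely", "hello", "onomatopoeia"]
print(word_histogram("This is extremely good this is", vocab))  # [2, 0, 2, 1, 0, 0]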


Components of a Parametric Learning Algorithm

⬣ Input (and representation)
⬣ Functional form of the model
  ⬣ Including parameters
⬣ Performance measure to improve
  ⬣ Loss or objective function
⬣ Algorithm for finding the best parameters
  ⬣ Optimization algorithm

(Pipeline: Data: Image → Features: Histogram → Model f(x, W) = Wx + b → Class scores (Car / Coffee Cup / Bird) → Loss Function → Optimizer)

Components of a Parametric Model


f(x, w) = y, where x is the input (a vector), w are the classifier weights, and y is the output (a scalar or vector)

⬣ Input: continuous number or vector
⬣ Output: a continuous number
  ⬣ For classification, typically a score
  ⬣ For regression, the quantity we want to regress to (house prices, crime rate, etc.)
⬣ w is a vector of weights to optimize to fit the target function

Model: Discriminative Parameterized Function


What is the simplest function you can think of? A line:

    y = mx + b

Our model is:

    f(x, w) = w · x + b

where x is the input, w are the weights, b is the bias (a scalar), and f(x, w) is the classifier result.

(Note: if w and x are column vectors we often write this as wᵀx)

Image adapted from: https://en.wikipedia.org/wiki/Linear_equation#/media/File:Linear_Function_Graph.svg

Simple Function
Linear Classification and Regression

Simple linear classifier:
⬣ Calculate the score: f(x, w) = w · x + b
⬣ Binary classification rule (w is a vector): y = 1 if f(x, w) ≥ 0, and y = 0 otherwise
⬣ For a multi-class classifier, take the class with the highest (max) score: f(x, W) = Wx + b
⬣ Idea: separate classes via high-dimensional linear separators (hyperplanes)
⬣ One of the simplest parametric models, but surprisingly effective
⬣ Very commonly used!
⬣ Let's look more closely at each element (a small sketch follows below)

Linear Classification and Regression
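A minimal NumPy sketch of the two rules just described, the binary threshold rule and the multi-class argmax rule; the weights and inputs are made-up values:

import numpy as np

def binary_classify(w, b, x):
    """Binary rule: y = 1 if w·x + b >= 0, else 0."""
    return 1 if np.dot(w, x) + b >= 0 else 0

def multiclass_classify(W, b, x):
    """Multi-class rule: scores = Wx + b, predict the argmax class."""
    scores = W @ x + b
    return int(np.argmax(scores))

w = np.array([0.5, -1.0])
x = np.array([2.0, 0.5])
print(binary_classify(w, b=0.1, x=x))                 # 1, since 0.5*2 - 1*0.5 + 0.1 = 0.6 >= 0

W = np.array([[0.5, -1.0], [-0.2, 0.3], [1.0, 1.0]])  # 3 classes, 2 features (made up)
print(multiclass_classify(W, np.zeros(3), x))         # class 2: scores = [0.5, -0.25, 2.5]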


Data: Image → Model f(x, W) = Wx + b → Class scores (Car / Coffee Cup / Bird)

The n×n input image, with entries x_11 ... x_nn, is flattened into a single column vector [x_11, x_12, ..., x_21, x_22, ..., x_nn]ᵀ.

To simplify notation we will refer to the inputs as x_1 ... x_m, where m = n × n.

Input Dimensionality
Model: f(x, W) = Wx + b

Each row of W is the classifier for one class:

    [ w_11 w_12 ... w_1m ]   [ x_1 ]   [ b_1 ]   ← classifier for class 1
    [ w_21 w_22 ... w_2m ] · [ x_2 ] + [ b_2 ]   ← classifier for class 2
    [ w_31 w_32 ... w_3m ]   [  ⋮  ]   [ b_3 ]   ← classifier for class 3
              W              [ x_m ]      b
                                x

(Note that in practice, implementations can use xW instead, assuming a different shape for W. That is just a different convention and is equivalent.)

Weights
Model: f(x, W) = Wx + b

⬣ We can move the bias term into the weight matrix, and append a "1" at the end of the input vector
⬣ This results in a single matrix-vector multiplication!

    [ w_11 w_12 ... w_1m  b_1 ]   [ x_1 ]
    [ w_21 w_22 ... w_2m  b_2 ] · [  ⋮  ]
    [ w_31 w_32 ... w_3m  b_3 ]   [ x_m ]
                 W                [  1  ]
                                     x

Weights
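A quick NumPy sketch of this "bias trick"; the small weight matrix, bias, and input values are made up, and the point is only that the two forms give the same result:

import numpy as np

# Original form: f(x) = Wx + b
W = np.array([[0.2, -0.5], [1.5, 1.3]])   # 2 classes, 2 features (made up)
b = np.array([1.1, 3.2])
x = np.array([2.0, 3.0])

# Bias trick: append b as an extra column of W and a 1 at the end of x
W_aug = np.hstack([W, b[:, None]])        # shape (2, 3)
x_aug = np.append(x, 1.0)                 # shape (3,)

print(W @ x + b)                          # [ 0.  10.1]
print(W_aug @ x_aug)                      # identical result, one matrix-vector multiply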
Example with an image with 4 pixels and 3 classes (cat/dog/ship). Stretch the pixels into a column vector x = [56, 231, 24, 2]ᵀ:

    [ 0.2  -0.5   0.1   2.0 ]   [  56 ]   [  1.1 ]   [ -96.8 ]  Cat score
    [ 1.5   1.3   2.1   0.0 ] · [ 231 ] + [  3.2 ] = [ 437.9 ]  Dog score
    [ 0.0   0.25  0.2  -0.3 ]   [  24 ]   [ -1.2 ]   [ 60.75 ]  Ship score
               W                [   2 ]       b
                                   x

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Example
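The same computation written out in NumPy, as a quick check of the numbers above:

import numpy as np

W = np.array([[0.2, -0.5,  0.1,  2.0],
              [1.5,  1.3,  2.1,  0.0],
              [0.0,  0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])
x = np.array([56.0, 231.0, 24.0, 2.0])   # the 4 pixel values stretched into a column

scores = W @ x + b
print(scores)                            # [-96.8  437.9  60.75] -> cat, dog, ship scores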
Visual Viewpoint

We can convert the weight vector for each class back into the shape of the image and visualize it (shown for the ten classes: plane, car, bird, cat, deer, dog, frog, horse, ship, truck).

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Interpreting a Linear Classifier


Geometric Viewpoint

f(x, W) = Wx + b

Each image is an array of 32x32x3 numbers (3072 numbers total). Plot created using Wolfram Cloud.

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Interpreting a Linear Classifier


Case 1 — Class 1: number of pixels > 0 is odd; Class 2: number of pixels > 0 is even
Case 2 — Class 1: 1 <= L2 norm <= 2; Class 2: everything else
Case 3 — Class 1: three modes; Class 2: everything else

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Hard Cases for a Linear Classifier


⬣ Algebraic viewpoint: f(x, W) = Wx
⬣ Visual viewpoint: one template per class
⬣ Geometric viewpoint: hyperplanes cutting up space

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Linear Classifier: Three Viewpoints


Performance Measure for a Classifier

⬣ Input (and representation)
⬣ Functional form of the model
  ⬣ Including parameters
⬣ Performance measure to improve
  ⬣ Loss or objective function
⬣ Algorithm for finding the best parameters
  ⬣ Optimization algorithm

(Pipeline: Data: Image → Features: Histogram → Model f(x, W) = Wx + b → Class scores (Car / Coffee Cup / Bird) → Loss Function → Optimizer)

Components of a Parametric Model


⬣ The output of a classifier can be considered a score
⬣ For a binary classifier, use the rule: y = 1 if f(x, w) ≥ 0, and y = 0 otherwise
⬣ This can be used for many classes by considering one class versus all the rest (one-versus-all)
⬣ For a multi-class classifier, we can take the maximum score

(Figure: Model f(x, W) = Wx + b → class scores for Car / Coffee Cup / Bird)

Classification using Scores


Several issues with scores:
⬣ Not very interpretable (no bounded value)

We often want probabilities:
⬣ More interpretable
⬣ Can relate to a probabilistic view of machine learning

We use the softmax function to convert scores s = f(x, W) to probabilities:

    P(Y = k | X = x) = e^{s_k} / Σ_j e^{s_j}

Converting Scores to Probabilities
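A minimal NumPy sketch of the softmax function above; the max-subtraction is a standard numerical-stability trick that does not change the result, and the example scores are the cat/car/frog values used later in the slides:

import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities: P(Y=k|x) = exp(s_k) / sum_j exp(s_j)."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([3.2, 5.1, -1.7])))   # ~[0.13, 0.87, 0.00]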


We need a performance measure to optimize:
⬣ It penalizes the model for being wrong
⬣ It allows us to modify the model to reduce this penalty
⬣ It is known as an objective or loss function

In machine learning we use empirical risk minimization:
⬣ Reduce the loss over the training dataset
⬣ We average the loss over the training data

Given a dataset of examples {(x_i, y_i)}, i = 1..N, where x_i is an image and y_i is the (integer) label, the loss over the dataset is the average of the loss over the examples:

    L = (1/N) Σ_i L_i(f(x_i, W), y_i)

Performance Measure
Multiclass SVM loss (an example of a "hinge loss"):

Given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form:

    L_i = Σ_{j ≠ y_i} { 0                    if s_{y_i} ≥ s_j + 1
                      { s_j − s_{y_i} + 1    otherwise

        = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)

(The figure plots the hinge: each other class j is penalized when its score s_j comes within a margin ("delta") of 1 of the correct-class score s_{y_i}.)

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Performance Measure for Scores


Multiclass SVM loss example. Suppose we have 3 training examples and 3 classes. With some W the scores f(x, W) = Wx are:

            image 1   image 2   image 3
    cat        3.2       1.3       2.2
    car        5.1       4.9       2.5
    frog      -1.7       2.0      -3.1

Using L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1), the loss for the first example (correct class: cat) is:

    L_1 = max(0, 5.1 − 3.2 + 1) + max(0, −1.7 − 3.2 + 1)
        = max(0, 2.9) + max(0, −3.9)
        = 2.9 + 0
        = 2.9

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

SVM Loss Example
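A small NumPy sketch of the multiclass SVM (hinge) loss, reproducing the 2.9 value computed above for the first example:

import numpy as np

def multiclass_svm_loss(scores, correct_class):
    """L_i = sum over j != y_i of max(0, s_j - s_{y_i} + 1)."""
    margins = np.maximum(0, scores - scores[correct_class] + 1)
    margins[correct_class] = 0          # do not count the correct class itself
    return margins.sum()

print(multiclass_svm_loss(np.array([3.2, 5.1, -1.7]), correct_class=0))  # 2.9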


⬣ If we use the softmax function to convert scores to probabilities, the right loss function to use is cross-entropy:

    L_i = −log P(Y = y_i | X = x_i)

⬣ It can be derived by looking at the distance between two probability distributions (the output of the model and the ground truth)
⬣ It can also be derived from a maximum likelihood estimation perspective: choose the parameters that maximize the likelihood of the observed data

Performance Measure for Probabilities


Softmax Classifier (Multinomial Logistic Regression)

We want to interpret raw classifier scores as probabilities. With s = f(x_i; W):

    P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}   (softmax function)

Probabilities must be ≥ 0 and must sum to 1. The loss is L_i = −log P(Y = y_i | X = x_i).

            logits (unnormalized log-probabilities)   exp (unnormalized probs)   normalized probabilities
    cat                    3.2                               24.5                        0.13
    car                    5.1                              164.0                        0.87
    frog                  -1.7                                0.18                       0.00

For the correct class "cat": L_i = −log(0.13)

Adapted from slides by Fei-Fei Li, Justin Johnson, Serena Yeung, from CS 231n

Cross-Entropy Loss Example
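A minimal NumPy sketch of the cross-entropy loss computed directly from the scores (softmax followed by −log of the correct-class probability), using the same cat/car/frog scores as above:

import numpy as np

def cross_entropy_loss(scores, correct_class):
    """L_i = -log P(Y = y_i | X = x_i), with P given by the softmax of the scores."""
    shifted = scores - np.max(scores)                  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[correct_class]

print(cross_entropy_loss(np.array([3.2, 5.1, -1.7]), correct_class=0))  # ~2.04 = -log(0.13)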


If we are performing regression, we can directly optimize to match the ground truth value
⬣ Example: house price prediction

    L_i = |y − Wx_i|     (L1 loss)
    L_i = |y − Wx_i|²    (L2 loss)

⬣ For probabilities, the logistic/softmax form is used instead: P(Y = k | X = x_i) = e^{s_k} / Σ_j e^{s_j}

Source: https://raw.githubusercontent.com/rohan-varma/rohan-blog/gh-pages/images/loss3.jpg

Regression Example
Often, we add a regularization term to the loss function, e.g. an L2 loss with L1 regularization on the weights:

    L_i = |y − Wx_i|² + |W|

Example regularizations:
⬣ L1/L2 on the weights (encourage small values)

Regularization
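A small NumPy sketch of the regularized loss above; the regularization strength lam is an assumption (the slide's formula has no explicit weighting), and the weights/data are made-up values:

import numpy as np

def regularized_loss(W, x, y, lam=0.1):
    """L2 regression loss plus an L1 penalty on the weights (lam is a made-up value)."""
    pred = W @ x
    data_loss = np.sum((y - pred) ** 2)     # |y - Wx|^2
    reg_loss = lam * np.sum(np.abs(W))      # |W| (L1 regularization, encourages small weights)
    return data_loss + reg_loss

W = np.array([[0.5, -0.2]])
x = np.array([1.0, 2.0])
y = np.array([0.3])
print(regularized_loss(W, x, y))            # (0.3 - 0.1)^2 + 0.1*0.7 = 0.04 + 0.07 = 0.11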
Gradient Descent

⬣ Input (and representation)
⬣ Functional form of the model
  ⬣ Including parameters
⬣ Performance measure to improve
  ⬣ Loss or objective function
⬣ Algorithm for finding the best parameters
  ⬣ Optimization algorithm

(Pipeline: Data: Image → Features: Histogram → Model f(x, W) = Wx + b → Class scores (Car / Coffee Cup / Bird) → Loss Function → Optimizer)

Components of a Parametric Model


Given a model and loss function, finding the best set of weights is a search problem
⬣ Find the combination of weights that minimizes our loss function

Several classes of methods:
⬣ Random search
⬣ Genetic algorithms (population-based search)
⬣ Gradient-based optimization

In deep learning, gradient-based methods are dominant, although they are not the only possible approach.

Optimization
As the weights change, the loss changes as well
⬣ The loss is often somewhat smooth locally, so small changes in the weights produce small changes in the loss

We can therefore think about iterative algorithms that take the current values of the weights and modify them a bit.

Loss Surfaces
Strategy: Follow the Slope!

⬣ We can find the steepest descent direction by computing the derivative (gradient):

    f'(a) = lim_{h→0} [f(a + h) − f(a)] / h

⬣ The steepest descent direction is the negative gradient
⬣ Intuitively: the derivative measures how the function changes as the argument a changes by a small step size, as the step size goes to zero
⬣ In machine learning: we want to know how the loss function changes as the weights are varied
  ⬣ We can consider each parameter separately by taking the partial derivative of the loss function with respect to that parameter

Image and equation from: https://en.wikipedia.org/wiki/Derivative#/media/File:Tangent_animation.gif

Derivatives
This idea can be turned into an algorithm (gradient descent); a small sketch follows below:

⬣ Choose a model: f(x, W) = Wx
⬣ Choose a loss function: L_i = |y − Wx_i|²
⬣ Calculate the partial derivative for each parameter: ∂L/∂w_j
⬣ Update the parameters: w_j = w_j − ∂L/∂w_j
⬣ Add a learning rate to prevent too big of a step: w_j = w_j − α ∂L/∂w_j

Gradient Descent
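A minimal gradient-descent loop for the model and loss listed above (f(x, W) = Wx with squared error), using the analytic gradient; the synthetic data, learning rate, and step count are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 examples, 3 features (synthetic)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                           # targets generated from a known linear model

w = np.zeros(3)                          # initial parameters
alpha = 0.01                             # learning rate
for step in range(500):
    residual = y - X @ w                 # delta_i = y_i - w.x_i
    grad = -2 * X.T @ residual / len(X)  # dL/dw for the mean squared error
    w = w - alpha * grad                 # parameter update
print(w)                                 # approaches [1.0, -2.0, 0.5]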
Often, we only compute the gradients across a small subset of the data

⬣ Full batch gradient descent: L = (1/N) Σ_i L_i(f(x_i, W), y_i)
⬣ Mini-batch gradient descent: L = (1/M) Σ_i L_i(f(x_i, W), y_i)
  ⬣ Where M is the size of a mini-batch (subset) of the data
⬣ We iterate over mini-batches:
  ⬣ Get a mini-batch, compute the loss, compute the derivatives, and take a step

Mini-Batch Gradient Descent
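A small sketch of the mini-batch loop described above, again on synthetic linear-regression data; the batch size, learning rate, and epoch count are made-up choices:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5])                     # synthetic regression targets

w, alpha, batch_size = np.zeros(3), 0.05, 32
for epoch in range(20):
    order = rng.permutation(len(X))                    # shuffle, then walk through mini-batches
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]                        # get a mini-batch
        grad = -2 * Xb.T @ (yb - Xb @ w) / len(Xb)     # loss derivative on the mini-batch only
        w = w - alpha * grad                           # take a step
print(w)                                               # close to [1.0, -2.0, 0.5]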


Gradient descent is guaranteed to converge under some conditions

⬣ For example, the learning rate has to be appropriately reduced throughout training
⬣ It will converge to a local minimum
  ⬣ Small changes in the weights would not decrease the loss
⬣ It turns out that some of the local minima it finds in practice (if trained well) are still pretty good!

Gradient Descent Properties


We know how to compute the model output and the loss function.

Several ways to compute ∂L/∂w_i:
⬣ Manual differentiation
⬣ Symbolic differentiation
⬣ Numerical differentiation
⬣ Automatic differentiation

Computing Gradients
For some functions, we can analytically derive the partial derivative.

Example: derivation of the update rule for a linear model with squared-error loss.

Function: f(w, x_i) = wᵀx_i (assume w and x_i are column vectors, so this is the same as w · x_i)
Loss: L = Σ_{i=1..N} (y_i − wᵀx_i)²
Dataset: N examples (indexed by i)

Gradient descent tells us we should update w as follows to minimize L:

    w_j ← w_j − η ∂L/∂w_j

So what is ∂L/∂w_j?

    ∂L/∂w_j = ∂/∂w_j Σ_{i=1..N} (y_i − wᵀx_i)²
            = Σ_{i=1..N} 2 (y_i − wᵀx_i) ∂/∂w_j (y_i − wᵀx_i)
            = −2 Σ_{i=1..N} δ_i ∂/∂w_j (wᵀx_i)                where δ_i = y_i − wᵀx_i
            = −2 Σ_{i=1..N} δ_i ∂/∂w_j Σ_{k=1..m} w_k x_ik
            = −2 Σ_{i=1..N} δ_i x_ij

Update rule:  w_j ← w_j + 2η Σ_{i=1..N} δ_i x_ij
Manual Differentiation
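The derived update rule w_j ← w_j + 2η Σ_i δ_i x_ij, written out in vectorized NumPy; the synthetic dataset and the learning rate η are made-up values for illustration:

import numpy as np

def linear_regression_step(w, X, y, eta=0.001):
    """One manual-gradient update: w_j <- w_j + 2*eta*sum_i (y_i - w.x_i) * x_ij."""
    delta = y - X @ w                  # delta_i = y_i - w^T x_i, one per example
    return w + 2 * eta * X.T @ delta   # vectorized form of the per-component update

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = X @ np.array([0.7, -1.3])
w = np.zeros(2)
for _ in range(300):
    w = linear_regression_step(w, X, y)
print(w)                               # approaches [0.7, -1.3]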
If we add a non-linearity (sigmoid), the derivation is more complex:

    σ(x) = 1 / (1 + e^{−x})

First, one can derive that σ'(x) = σ(x)(1 − σ(x)).

    f(x) = σ(Σ_k w_k x_k)

    L = Σ_i ( y_i − σ(Σ_k w_k x_ik) )²

    ∂L/∂w_j = Σ_i 2 ( y_i − σ(Σ_k w_k x_ik) ) ( −∂/∂w_j σ(Σ_k w_k x_ik) )
            = Σ_i −2 ( y_i − σ(Σ_k w_k x_ik) ) σ'(Σ_k w_k x_ik) ∂/∂w_j Σ_k w_k x_ik
            = Σ_i −2 δ_i σ(d_i)(1 − σ(d_i)) x_ij

    where d_i = Σ_k w_k x_ik  and  δ_i = y_i − f(x_i)

The sigmoid perceptron update rule:

    w_j ← w_j + 2η Σ_{i=1..N} δ_i σ_i (1 − σ_i) x_ij

    where σ_i = σ(Σ_{k=1..m} w_k x_ik)  and  δ_i = y_i − σ_i

Adding a Non-Linear Function
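A small NumPy sketch of the sigmoid-unit update rule above; the binary targets, learning rate, and iteration count are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_unit_step(w, X, y, eta=0.1):
    """w_j <- w_j + 2*eta*sum_i delta_i * sigma_i * (1 - sigma_i) * x_ij."""
    s = sigmoid(X @ w)                  # sigma_i for every example
    delta = y - s                       # delta_i = y_i - sigma_i
    return w + 2 * eta * X.T @ (delta * s * (1 - s))

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)   # binary targets from a linear rule
w = np.zeros(2)
for _ in range(2000):
    w = sigmoid_unit_step(w, X, y)
print((sigmoid(X @ w).round() == y).mean())         # prediction accuracy, close to 1.0 on most runs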


Given a library of simple functions (sin(x), cos(x), log(x), exp(x), x³, ...), we can compose them into a complicated function, e.g.:

    −log( 1 / (1 + e^{−w·x}) )

which decomposes into a chain of simpler pieces:

    u = w·x   →   p = 1 / (1 + e^{−u})   →   L = −log(p)

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Decomposing a Function
Linear Algebra View: Vector and Matrix Sizes
With the bias folded into the weight matrix:

    [ w_11 w_12 ... w_1m  b_1 ]   [ x_1 ]
    [ w_21 w_22 ... w_2m  b_2 ] · [ x_2 ]
    [ w_31 w_32 ... w_3m  b_3 ]   [  ⋮  ]
                 W                [ x_m ]
                                  [  1  ]
                                     x

Sizes: W is c × (d + 1) and x is (d + 1) × 1, where c is the number of classes and d is the dimensionality of the input.

Closer Look at a Linear Classifier


Conventions:
⬣ Size of derivatives for scalars, vectors, and matrices: assume we have a scalar s ∈ ℝ¹, a vector v ∈ ℝ^m, i.e. v = [v_1, v_2, ..., v_m]ᵀ, and a matrix M ∈ ℝ^{k×ℓ}
⬣ What is the size of ∂v/∂s? ℝ^{m×1} (a column vector of size m): [∂v_1/∂s, ∂v_2/∂s, ..., ∂v_m/∂s]ᵀ
⬣ What is the size of ∂s/∂v? ℝ^{1×m} (a row vector of size m): [∂s/∂v_1, ∂s/∂v_2, ..., ∂s/∂v_m]

Dimensionality of Derivatives
Conventions:
⬣ What is the size of ∂v¹/∂v²? A matrix whose entry at row i, column j is ∂v¹_i / ∂v²_j
⬣ This matrix of partial derivatives is called a Jacobian

(Note this is a slightly different convention than on Wikipedia)

Dimensionality of Derivatives
Conventions:
⬣ What is the size of ∂s/∂M? A matrix of the same shape as M, whose entry at row i, column j is ∂s/∂m[i,j]

Dimensionality of Derivatives
⬣ What is the size of ∂L/∂W?
⬣ Remember that the loss is a scalar and W is a matrix (with the bias folded in):

    W = [ w_11 w_12 ... w_1m  b_1 ]
        [ w_21 w_22 ... w_2m  b_2 ]
        [ w_31 w_32 ... w_3m  b_3 ]

The Jacobian is also a matrix, of the same shape as W:

    [ ∂L/∂w_11  ∂L/∂w_12  ...  ∂L/∂w_1m  ∂L/∂b_1 ]
    [ ∂L/∂w_21  ∂L/∂w_22  ...  ∂L/∂w_2m  ∂L/∂b_2 ]
    [ ∂L/∂w_31  ∂L/∂w_32  ...  ∂L/∂w_3m  ∂L/∂b_3 ]

Dimensionality of Derivatives
Batches of data are matrices or tensors (multi-dimensional matrices). Examples:
⬣ Each instance is a vector of size m; our batch is of size [B×m]
⬣ Each instance is a matrix (e.g. a grayscale image) of size W×H; our batch is [B×W×H]
⬣ Each instance is a multi-channel matrix (e.g. a color image with R, G, B channels) of size C×W×H; our batch is [B×C×W×H]

Jacobians become tensors, which is complicated
⬣ Instead, flatten the input to a vector and get a vector of derivatives (the figure shows an n×n matrix flattened into the vector [x_11, x_12, ..., x_nn])
⬣ This can also be done for partial derivatives between two vectors, two matrices, or two tensors

Jacobians of Batches
How is Deep Learning Different? Hierarchical Compositionality
So What is Deep (Machine) Learning?

⬣ Representation Learning
⬣ Neural Networks
⬣ Deep Unsupervised / Reinforcement / Structured / <insert-qualifier-here> Learning
⬣ Simply: Deep Learning

(Hierarchical) Compositionality:
⬣ Cascade of non-linear transformations
⬣ Multiple layers of representations

End-to-End Learning:
⬣ Learning (goal-driven) representations
⬣ Learning feature extraction

Distributed Representations:
⬣ No single neuron "encodes" everything
⬣ Groups of neurons work together

So What is Deep (Machine) Learning?


VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → "car"

SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈd ē p\

NLP: "This burrito place is yummy and fun!" → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → "+"

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Traditional Machine Learning


VISION: pixels → edge → texton → motif → part → object

SPEECH: sample → spectral band → formant → motif → phone → word

NLP: character → word → NP/VP/... → clause → sentence → story

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Hierarchical Compositionality
Given a library of simple functions (sin(x), cos(x), log(x), exp(x), x³, ...), compose them into a complicated function.

Idea 1: Linear combinations
⬣ Boosting
⬣ Kernels
⬣ ...

    f(x) = Σ_i α_i g_i(x)

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Building A Complicated Function


Given a library of simple functions (sin(x), cos(x), log(x), exp(x), x³, ...), compose them into a complicated function.

Idea 2: Compositions
⬣ Deep Learning
⬣ Grammar models
⬣ Scattering transforms, ...

    f(x) = g_1(g_2(... g_n(x) ...))

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Building A Complicated Function


Given a library of simple functions (sin(x), cos(x), log(x), exp(x), x³, ...), compose them into a complicated function.

Idea 2: Compositions
⬣ Deep Learning
⬣ Grammar models
⬣ Scattering transforms, ...

    f(x) = log( cos( exp( sin³(x) ) ) )

    sin(x) → x³ → exp(x) → cos(x) → log(x)

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Building A Complicated Function


Image → Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → "car"

Feature visualization of a convolutional net trained on ImageNet, from [Zeiler & Fergus 2013]

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Deep Learning = Hierarchical Compositionality


How is Deep Learning Different? End-to-End Learning
(Hierarchical) Compositionality:
⬣ Cascade of non-linear transformations
⬣ Multiple layers of representations

End-to-End Learning:
⬣ Learning (goal-driven) representations
⬣ Learning feature extraction

Distributed Representations:
⬣ No single neuron "encodes" everything
⬣ Groups of neurons work together

So What is Deep (Machine) Learning?


VISION: image → hand-crafted features (SIFT/HOG, fixed) → your favorite classifier (learned) → "car"

SPEECH: audio → hand-crafted features (MFCC, fixed) → your favorite classifier (learned) → \ˈd ē p\

NLP: "This burrito place is yummy and fun!" → hand-crafted features (Bag-of-words, fixed) → your favorite classifier (learned) → "+"

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Traditional Machine Learning


SIFT, Spin Images, HoG, Textons, and many many more...

Feature Engineering
VISION: image → SIFT/HOG (fixed) → K-Means/pooling ("learned", unsupervised) → classifier (supervised) → "car"

SPEECH: audio → MFCC (fixed) → Mixture of Gaussians ("learned", unsupervised) → classifier (supervised) → \ˈd ē p\

NLP: "This burrito place is yummy and fun!" → Parse Tree Syntactic (fixed) → n-grams ("learned", unsupervised) → classifier (supervised) → "+"

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Traditional Machine Learning (more accurately)


VISION: image → SIFT/HOG (fixed) → K-Means/pooling ("learned", unsupervised) → classifier (supervised) → "car"

SPEECH: audio → MFCC (fixed) → Mixture of Gaussians ("learned", unsupervised) → classifier (supervised) → \ˈd ē p\

NLP: "This burrito place is yummy and fun!" → Parse Tree Syntactic (fixed) → n-grams ("learned", unsupervised) → classifier (supervised) → "+"

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

Deep Learning = End-to-End Learning


"Shallow" models: Hand-crafted Feature Extractor (fixed) → "Simple" Trainable Classifier (learned)

Deep models: Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier → Trainable Feature-Transform / Classifier (learned internal representations)

Adapted from slides by: Marc'Aurelio Ranzato, Yann LeCun

“Shallow” vs Deep Learning


How is Deep Learning Different? Distributed Representations
(Hierarchical) Compositionality:
⬣ Cascade of non-linear transformations
⬣ Multiple layers of representations

End-to-End Learning:
⬣ Learning (goal-driven) representations
⬣ Learning feature extraction

Distributed Representations:
⬣ No single neuron "encodes" everything
⬣ Groups of neurons work together

So What is Deep (Machine) Learning?


Local vs Distributed

(Figure: a toy example contrasting (a) a local representation with (b) a distributed one, using patterns described by orientation (horizontal/vertical) and shape (rectangle/ellipse); some units correspond to "no pattern".)

Adapted from slides by Moontae Lee

Distributed Representations Toy Example


Local = VR + HR + HE = ?

Distributed = V+H+E ≈

Adapted from slides by Moontae Lee

Power of Distributed Representations!


(Hierarchical) Compositionality:
⬣ Cascade of non-linear transformations
⬣ Multiple layers of representations

End-to-End Learning:
⬣ Learning (goal-driven) representations
⬣ Learning feature extraction

Distributed Representations:
⬣ No single neuron "encodes" everything
⬣ Groups of neurons work together

So What is Deep (Machine) Learning?
