Lec 24

Class Summary

Jia-Bin Huang
ECE-5424G / CS-5824, Virginia Tech, Spring 2019
• Thank you all for participating in this class!

• SPOT survey!

• Please give us feedback: lectures, topics, homework, exams, office hours, Piazza
Machine learning algorithms

           | Supervised Learning | Unsupervised Learning
Discrete   | Classification      | Clustering
Continuous | Regression          | Dimensionality reduction
k-NN (Classification/Regression)
• Model
None (non-parametric: simply store the training set)
• Cost function
None
• Learning
Do nothing
• Inference
$\hat{y} = y^{(k)}$, where $k = \arg\min_i d(x^{(i)}, x)$ (1-NN); for $k$-NN, take the majority vote (classification) or the average (regression) of the $k$ nearest neighbors
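To make the "do nothing at training, search at inference" recipe concrete, here is a minimal NumPy sketch (my own illustration, not course code); the toy arrays and the choice k = 3 are made-up placeholders.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data (placeholder values, not from the lecture)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # -> 1
```
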
Linear regression (Regression)
• Model
$h_\theta(x) = \theta^\top x$
• Cost function
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
• Learning
1) Gradient descent: Repeat { $\theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }
2) Solving the normal equation: $\theta = (X^\top X)^{-1} X^\top y$
• Inference
$\hat{y} = h_\theta(x) = \theta^\top x$
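A small NumPy sketch (illustrative only, with made-up data roughly following y ≈ 2 + 3x) showing both learning routes above: the closed-form normal equation and batch gradient descent on the squared-error cost.

```python
import numpy as np

# Toy data (placeholder values); first column is the intercept term x0 = 1
X = np.c_[np.ones(5), np.array([0., 1., 2., 3., 4.])]
y = np.array([2.1, 5.0, 7.9, 11.2, 13.8])

# 1) Normal equation: theta = (X^T X)^{-1} X^T y
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# 2) Batch gradient descent on J(theta) = 1/(2m) * sum (h(x) - y)^2
theta_gd = np.zeros(2)
alpha, m = 0.05, len(y)
for _ in range(5000):
    grad = X.T @ (X @ theta_gd - y) / m   # gradient of the cost
    theta_gd -= alpha * grad

print(theta_ne, theta_gd)   # both should land close to [2, 3]
```
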
Naïve Bayes (Classification)
• Model
$P(y, x_1, \dots, x_n) = P(y) \prod_{j} P(x_j \mid y)$ (features conditionally independent given the class)
• Cost function
Maximum likelihood estimation: $\max_\theta \sum_i \log P(x^{(i)}, y^{(i)}; \theta)$
Maximum a posteriori estimation: $\max_\theta \sum_i \log P(x^{(i)}, y^{(i)}; \theta) + \log P(\theta)$
• Learning
(Discrete $x_j$) $P(x_j = v \mid y = c)$ = fraction of class-$c$ examples with $x_j = v$ (count-based, optionally smoothed)
(Continuous $x_j$) fit a Gaussian per class: mean $\mu_{jc}$, variance $\sigma_{jc}^2$
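To illustrate the continuous-feature case above, here is a small Gaussian Naïve Bayes sketch in NumPy (an illustration under the slide's independence assumption, with toy data): fit a class prior plus a per-class, per-feature mean and variance, then score classes by log P(y) + Σⱼ log p(xⱼ | y).

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature mean/variance."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),        # prior P(y = c)
                     Xc.mean(axis=0),         # per-feature mean
                     Xc.var(axis=0) + 1e-9)   # per-feature variance (small smoothing)
    return params

def predict_gaussian_nb(params, x):
    """Pick the class maximizing log P(y) + sum_j log p(x_j | y)."""
    scores = {}
    for c, (prior, mu, var) in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores[c] = np.log(prior) + log_lik
    return max(scores, key=scores.get)

# Toy data (placeholder values)
X = np.array([[1.0, 2.1], [0.9, 1.9], [3.0, 0.2], [3.2, 0.1]])
y = np.array([0, 0, 1, 1])
print(predict_gaussian_nb(fit_gaussian_nb(X, y), np.array([1.1, 2.0])))  # -> 0
```
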
Logistic regression (Classification)
• Model
$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^\top x}}$
• Cost function
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
• Learning
Gradient descent: Repeat { $\theta_j \leftarrow \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }
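A minimal NumPy sketch (my own, not course code) of the gradient-descent loop above; note the update has the same form as linear regression but with the sigmoid hypothesis. The data and learning rate are toy placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data with an intercept column x0 = 1 (placeholder values)
X = np.c_[np.ones(6), np.array([-2., -1.5, -1., 1., 1.5, 2.])]
y = np.array([0, 0, 0, 1, 1, 1])

theta = np.zeros(2)
alpha, m = 0.5, len(y)
for _ in range(2000):
    h = sigmoid(X @ theta)
    theta -= alpha * X.T @ (h - y) / m   # gradient of the cross-entropy cost

print(sigmoid(X @ theta).round(2))       # predicted probabilities close to the labels
```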


Hard-margin SVM formulation
$\min_{\theta} \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$  subject to  $\theta^\top x^{(i)} \ge 1$ if $y^{(i)} = 1$,  $\theta^\top x^{(i)} \le -1$ if $y^{(i)} = 0$
[Figure: linearly separable data in the $(x_1, x_2)$ plane with the max-margin decision boundary and its margin]

Soft-margin SVM formulation
$\min_{\theta} C \sum_{i=1}^{m} \left[ y^{(i)} \mathrm{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)}) \mathrm{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$
[Figure: nearly separable data in the $(x_1, x_2)$ plane; a few examples are allowed inside the margin or on the wrong side]
SVM with kernels
• Hypothesis: Given $x$, compute features $f = [f_0, f_1, f_2, \dots, f_m]^\top$, where $f_0 = 1$ and $f_i = \mathrm{similarity}(x, l^{(i)}) = \exp\!\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$ for landmarks $l^{(1)}, \dots, l^{(m)}$ (the training examples)
• Predict $y = 1$ if $\theta^\top f \ge 0$
• Training (original)
$\min_{\theta} C \sum_{i=1}^{m} \left[ y^{(i)} \mathrm{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)}) \mathrm{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j} \theta_j^2$
• Training (with kernel)
$\min_{\theta} C \sum_{i=1}^{m} \left[ y^{(i)} \mathrm{cost}_1(\theta^\top f^{(i)}) + (1 - y^{(i)}) \mathrm{cost}_0(\theta^\top f^{(i)}) \right] + \frac{1}{2} \sum_{j} \theta_j^2$
SVM parameters
• $C$ (= $1/\lambda$)
Large $C$: lower bias, higher variance.
Small $C$: higher bias, lower variance.
• $\sigma^2$
Large $\sigma^2$: features $f_i$ vary more smoothly; higher bias, lower variance.
Small $\sigma^2$: features $f_i$ vary less smoothly; lower bias, higher variance.

Slide credit: Andrew Ng
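To make the kernel features concrete, here is a small sketch (an illustration, not the course's code) that computes the Gaussian similarity features $f_i$ against a set of landmarks; the landmarks, query point, and $\sigma^2$ values are made-up placeholders. Small $\sigma^2$ makes the features peaky (lower bias, higher variance), large $\sigma^2$ smooths them out.

```python
import numpy as np

def rbf_features(x, landmarks, sigma2=1.0):
    """Map x to similarity features f_i = exp(-||x - l^(i)||^2 / (2*sigma^2))."""
    sq_dists = np.sum((landmarks - x) ** 2, axis=1)
    return np.exp(-sq_dists / (2 * sigma2))

# Landmarks are typically the training points themselves (placeholder values here)
landmarks = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
x = np.array([0.9, 1.1])
print(rbf_features(x, landmarks, sigma2=1.0))    # smooth features
print(rbf_features(x, landmarks, sigma2=0.05))   # peaky: ~1 near a landmark, ~0 elsewhere
```
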


Neural network
[Figure: a three-layer network; inputs $x_0, x_1, x_2, x_3$ (Layer 1) feed hidden units $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ (Layer 2), which feed the output unit $h_\Theta(x)$ (Layer 3)]
Slide credit: Andrew Ng
Neural network
• $a_i^{(j)}$ = "activation" of unit $i$ in layer $j$
• $\Theta^{(j)}$ = matrix of weights controlling the function mapping from layer $j$ to layer $j+1$
• $s_j$ = number of units in layer $j$
• Size of $\Theta^{(j)}$? $s_{j+1} \times (s_j + 1)$
Slide credit: Andrew Ng
Neural network: "Pre-activation"
$a_1^{(2)} = g\!\left(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\right) = g\!\left(z_1^{(2)}\right)$
$a_2^{(2)} = g\!\left(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\right) = g\!\left(z_2^{(2)}\right)$
$a_3^{(2)} = g\!\left(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\right) = g\!\left(z_3^{(2)}\right)$
$h_\Theta(x) = g\!\left(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\right)$

Slide credit: Andrew Ng


Neural network: "Pre-activation" (vectorized)
$z^{(2)} = \Theta^{(1)} x$
$a^{(2)} = g\!\left(z^{(2)}\right)$

Add $a_0^{(2)} = 1$
$z^{(3)} = \Theta^{(2)} a^{(2)}$
$h_\Theta(x) = a^{(3)} = g\!\left(z^{(3)}\right)$

Slide credit: Andrew Ng


Neural network learning its own features
[Figure: the same three-layer network; the hidden activations $a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ serve as learned features for the output unit, in place of the raw inputs $x_1, x_2, x_3$]
Slide credit: Andrew Ng
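A compact NumPy sketch (illustrative only) of the forward-propagation equations above for one hidden layer: $z^{(2)} = \Theta^{(1)} x$, $a^{(2)} = g(z^{(2)})$, add the bias unit $a_0^{(2)} = 1$, then $h_\Theta(x) = g(\Theta^{(2)} a^{(2)})$. The weight matrices here are random placeholders, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))   # layer 1 -> layer 2: s_2 x (s_1 + 1) = 3 x 4
Theta2 = rng.normal(size=(1, 4))   # layer 2 -> layer 3: s_3 x (s_2 + 1) = 1 x 4

def forward(x):
    a1 = np.concatenate(([1.0], x))             # add bias unit x0 = 1
    z2 = Theta1 @ a1                             # pre-activation of layer 2
    a2 = np.concatenate(([1.0], sigmoid(z2)))    # activations, plus bias a0^(2) = 1
    z3 = Theta2 @ a2                             # pre-activation of the output layer
    return sigmoid(z3)                           # h_Theta(x)

print(forward(np.array([0.5, -1.0, 2.0])))
```
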
Bias / Variance Trade-off
• Training error: $J_{\mathrm{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
• Cross-validation error: $J_{\mathrm{cv}}(\theta) = \frac{1}{2 m_{\mathrm{cv}}} \sum_{i=1}^{m_{\mathrm{cv}}} \left( h_\theta(x_{\mathrm{cv}}^{(i)}) - y_{\mathrm{cv}}^{(i)} \right)^2$
[Figure: loss vs. degree of polynomial; training error keeps decreasing with degree, while cross-validation error is U-shaped]
Source: Andrew Ng
Bias / Variance Trade-off
• Training error: $J_{\mathrm{train}}(\theta)$
• Cross-validation error: $J_{\mathrm{cv}}(\theta)$
[Figure: same loss vs. degree-of-polynomial plot; low degrees (left) give high bias, high degrees (right) give high variance]
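A small sketch (not from the slides; toy sinusoidal data) that traces the curves described above: fit polynomials of increasing degree and compare training error with held-out (cross-validation) error, which typically turns back up once the model overfits.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=x.size)    # noisy toy target
x_tr, y_tr, x_cv, y_cv = x[::2], y[::2], x[1::2], y[1::2]      # split into train / validation

for degree in [1, 3, 5, 9, 12]:
    coeffs = np.polyfit(x_tr, y_tr, degree)                    # least-squares polynomial fit
    err_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)   # training error
    err_cv = np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2)   # cross-validation error
    print(f"degree={degree:2d}  train={err_tr:.3f}  cv={err_cv:.3f}")
```
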
Bias / Variance Trade-off with Regularization
• Training error: $J_{\mathrm{train}}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$ (without the regularization term)
• Cross-validation error: $J_{\mathrm{cv}}(\theta) = \frac{1}{2 m_{\mathrm{cv}}} \sum_{i=1}^{m_{\mathrm{cv}}} \left( h_\theta(x_{\mathrm{cv}}^{(i)}) - y_{\mathrm{cv}}^{(i)} \right)^2$
[Figure: loss vs. regularization strength $\lambda$]
Source: Andrew Ng
Bias / Variance Trade-off with Regularization
• Training error: $J_{\mathrm{train}}(\theta)$
• Cross-validation error: $J_{\mathrm{cv}}(\theta)$
[Figure: same loss vs. $\lambda$ plot; small $\lambda$ (left) gives high variance, large $\lambda$ (right) gives high bias]
Source: Andrew Ng
K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \dots, \mu_K$
Repeat {
  Cluster assignment step:
  for $i$ = 1 to $m$:
    $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
  Centroid update step:
  for $k$ = 1 to $K$:
    $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
Slide credit: Andrew Ng
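The loop above written out as a short NumPy sketch (an illustration, not the course's code); the blob data, K = 2, and the iteration count are placeholders.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), K, replace=False)]      # randomly initialize centroids
    for _ in range(n_iters):
        # Cluster assignment step: index of the closest centroid for each point
        c = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Centroid update step: mean of the points assigned to each cluster
        mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                       for k in range(K)])
    return mu, c

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc, 0.2, size=(20, 2)) for loc in ([0, 0], [3, 3])])  # two toy blobs
mu, c = kmeans(X, K=2)
print(mu)   # centroids near (0, 0) and (3, 3)
```
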
Expectation Maximization (EM) Algorithm
• Goal: Find $\theta$ that maximizes the log-likelihood
$\ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$
Jensen's inequality: $f(\mathbb{E}[X]) \ge \mathbb{E}[f(X)]$ for a concave $f$ (such as $\log$)
Expectation Maximization (EM) Algorithm
• Goal: Find $\theta$ that maximizes the log-likelihood
$\ell(\theta) = \sum_{i} \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \;\ge\; \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$

- The lower bound holds for every possible set of distributions $Q_i$
- We want a tight lower bound: equality in Jensen's inequality
- When will that happen? When $\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$ with probability 1 ($c$ is a constant)
How should we choose $Q_i$?
• $Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta)$ and $\sum_{z} Q_i(z) = 1$ (because it is a distribution), so $Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \theta)$
EM algorithm
Repeat until convergence {

  (E-step) For each $i$, set
  $Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)$
  (Probabilistic inference)

  (M-step) Set
  $\theta := \arg\max_{\theta} \sum_{i} \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$

}
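A compact sketch of the E/M loop above for a 1-D mixture of two Gaussians (illustrative only; the data, initialization, and iteration count are made up): the E-step computes the responsibilities $Q_i(z) = p(z \mid x; \theta)$, the M-step re-estimates the mixture weights, means, and variances.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 300)])  # toy 1-D data

# Initialize parameters of a 2-component Gaussian mixture
pi = np.array([0.5, 0.5])       # mixture weights
mu = np.array([-1.0, 1.0])      # means
var = np.array([1.0, 1.0])      # variances

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: responsibilities Q_i(z = k) = p(z = k | x_i; theta)
    w = pi * gauss(x[:, None], mu, var)          # shape (m, 2)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: parameters that maximize the lower bound given the responsibilities
    Nk = w.sum(axis=0)
    pi = Nk / len(x)
    mu = (w * x[:, None]).sum(axis=0) / Nk
    var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(pi.round(2), mu.round(2), var.round(2))    # close to the generating parameters (up to ordering)
```
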
Anomaly detection algorithm
1. Choose features $x_j$ that you think might be indicative of anomalous examples

2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$:
$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left( x_j^{(i)} - \mu_j \right)^2$

3. Given a new example $x$, compute
$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$

Anomaly if $p(x) < \varepsilon$
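The three steps above as a short NumPy sketch (illustrative; the data and the threshold ε are made-up placeholders): fit per-feature Gaussians on normal examples, then flag $x$ as anomalous when $p(x) = \prod_j p(x_j; \mu_j, \sigma_j^2)$ drops below ε.

```python
import numpy as np

def fit_gaussians(X):
    """Step 2: per-feature mean and variance from (assumed normal) training data."""
    return X.mean(axis=0), X.var(axis=0)

def p_x(x, mu, var):
    """Step 3: product of univariate Gaussian densities over the features."""
    dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(dens)

rng = np.random.default_rng(0)
X_train = rng.normal([0.0, 5.0], [1.0, 0.5], size=(500, 2))   # toy "normal" examples
mu, var = fit_gaussians(X_train)

epsilon = 1e-4   # in practice, chosen on a labeled validation set
for x in [np.array([0.2, 5.1]), np.array([4.0, 9.0])]:
    print(x, "anomaly" if p_x(x, mu, var) < epsilon else "ok")
```
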
Problem motivation
Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action)
Love at last         | 5         | 5       | 0         | 0        | 0.9          | 0
Romance forever      | 5         | ?       | ?         | 0        | 1.0          | 0.01
Cute puppies of love | ?         | 4       | 0         | ?        | 0.99         | 0
Nonstop car chases   | 0         | 0       | 5         | 4        | 0.1          | 1.0
Swords vs. karate    | 0         | 0       | 5         | ?        | 0            | 0.9
Problem motivation
Movie                | Alice (1) | Bob (2) | Carol (3) | Dave (4) | x1 (romance) | x2 (action)
Love at last         | 5         | 5       | 0         | 0        | ?            | ?
Romance forever      | 5         | ?       | ?         | 0        | ?            | ?
Cute puppies of love | ?         | 4       | 0         | ?        | ?            | ?
Nonstop car chases   | 0         | 0       | 5         | 4        | ?            | ?
Swords vs. karate    | 0         | 0       | 5         | ?        | ?            | ?

$\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix} \quad
\theta^{(2)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix} \quad
\theta^{(3)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix} \quad
\theta^{(4)} = \begin{bmatrix} 0 \\ 0 \\ 5 \end{bmatrix} \quad
x^{(1)} = \begin{bmatrix} ? \\ ? \\ ? \end{bmatrix}$
Collaborative filtering optimization objective
• Given $x^{(1)}, \dots, x^{(n_m)}$, estimate $\theta^{(1)}, \dots, \theta^{(n_u)}$:
$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i : r(i,j)=1} \left( (\theta^{(j)})^\top x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$

• Given $\theta^{(1)}, \dots, \theta^{(n_u)}$, estimate $x^{(1)}, \dots, x^{(n_m)}$ (the analogous objective over the movie features)

• Minimize $J\!\left(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}\right)$ over $x$ and $\theta$ simultaneously


Collaborative filtering algorithm
• Initialize $x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}$ to small random values
• Minimize $J\!\left(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}\right)$ using gradient descent (or an advanced optimization algorithm), updating every $x_k^{(i)}$ and $\theta_k^{(j)}$
• For a user with parameters $\theta$ and a movie with (learned) features $x$, predict a star rating of $\theta^\top x$
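A minimal sketch (my own illustration; the rating matrix, feature dimension, learning rate, and λ are toy placeholders) of the joint gradient-descent idea above: learn movie features X and user parameters Theta together on the observed entries only, then predict an unseen rating as θᵀx.

```python
import numpy as np

# Toy rating matrix (movies x users); np.nan marks a missing rating
Y = np.array([[5, 5, 0, 0],
              [5, np.nan, np.nan, 0],
              [np.nan, 4, 0, np.nan],
              [0, 0, 5, 4],
              [0, 0, 5, np.nan]], dtype=float)
R = ~np.isnan(Y)                 # R[i, j] = 1 if movie i was rated by user j
Y = np.nan_to_num(Y)

n_movies, n_users, n_feat = Y.shape[0], Y.shape[1], 2
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(n_movies, n_feat))       # movie features
Theta = rng.normal(scale=0.1, size=(n_users, n_feat))    # user parameters
alpha, lam = 0.01, 0.1

for _ in range(10000):
    E = (X @ Theta.T - Y) * R                             # errors on observed ratings only
    X -= alpha * (E @ Theta + lam * X)                    # gradient step on movie features
    Theta -= alpha * (E.T @ X + lam * Theta)              # gradient step on user parameters

print((X @ Theta.T).round(1))    # predicted ratings, including fill-ins for the missing entries
```
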
Semi-supervised Learning
Problem Formulation
• Labeled data $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m_l}$

• Unlabeled data $\{x^{(i)}\}_{i=1}^{m_u}$ (typically far more plentiful than the labeled data)

• Goal: Learn a hypothesis (e.g., a classifier) that has small error

Deep Semi-supervised Learning
Ensemble methods
• Ensemble methods
• Combine multiple classifiers to make a better one
• Committees, majority vote
• Weighted combinations
• Can use same or different classifiers
• Boosting
• Train sequentially; later predictors focus on mistakes by earlier
• Boosting for classification (e.g., AdaBoost; see the sketch after this list)
• Use results of earlier classifiers to know what to work on
• Weight hard examples so we focus on them more
• Example: Viola-Jones for face detection
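A short sketch of the boosting loop described above (AdaBoost-style reweighting with decision stumps), using scikit-learn only for the stump learner; the dataset and the number of rounds are made-up placeholders, not course material.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)            # toy labels in {-1, +1}

w = np.full(len(y), 1.0 / len(y))                     # example weights, start uniform
stumps, alphas = [], []
for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)         # weighted error of this round
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # weight of this weak learner
    w *= np.exp(-alpha * y * pred)                    # up-weight the examples it got wrong
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final classifier: weighted vote of the stumps
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(F) == y))
```
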
Generative models
Simple Recurrent Network
Reinforcement learning

• Markov decision process


• Q-learning (see the sketch below)
• Policy gradient
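A tabular Q-learning sketch (illustrative only; the toy chain MDP, reward, and hyperparameters are all made up) showing the update Q(s,a) ← Q(s,a) + α [r + γ maxₐ' Q(s',a') − Q(s,a)].

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0 = left, 1 = right, reward 1 on reaching state 4
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for _ in range(500):                          # episodes
    s = 0
    for _ in range(50):                       # steps per episode
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of the next state
        Q[s, a] += alpha * (r + gamma * (0 if done else np.max(Q[s_next])) - Q[s, a])
        s = s_next
        if done:
            break

print(Q.round(2))   # the greedy policy (argmax over actions) moves right toward the goal
```
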
Final exam sample questions
Conceptual questions
• [True/False] Increasing the value of k in a k-nearest neighbor classifier
will decrease its bias
• [True/False] Backpropagation helps neural network training get unstuck from local minima
• [True/False] Linear regression can be solved by either matrix algebra
or gradient descent
• [True/False] Logistic regression can be solved by either matrix algebra
or gradient descent
• [True/False] K-means clustering has a unique solution
• [True/False] PCA has a unique solution
Classification/Regression
• Given a simple dataset

• 1) Estimate the parameters

• 2) Compute training error

• 3) Compute leave-one-out cross-validation error

• 4) Compute testing error


Naïve Bayes
• Compute the individual probabilities $P(x_j \mid y)$ and the prior $P(y)$

• Compute $P(y \mid x_1, \dots, x_n)$ using the Naïve Bayes classifier
