ECE 449 Notes
1 K-Nearest Neighbors (Nonparametric)
• Distance metric: the Lp (Minkowski) distance between points, with parameter p
– L1 (Manhattan) Distance: p = 1
– L2 (Euclidean) Distance: p = 2
• Determine K using a validation set
– Small K is sensitive to noise and will overfit
– Large K includes examples that are too far away and will underfit
• Simple to implement (see the sketch at the end of this section), but has several issues
– Requires memory to store the entire training dataset
– Computationally expensive at inference time
– Sensitive to outliers
– Curse of dimensionality: high-dimensional data points spread far apart from each other, giving low performance
• Nonparametric models place mild assumptions on the data distribution and are good for complex data, but require storage/computation over the entire dataset
• Parametric models place strong modeling assumptions and require fitting the model, which gives more efficient storage/computation
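A minimal k-NN prediction sketch in Python/NumPy, assuming a stored training set `X_train`, `y_train`; the function name `knn_predict`, the majority-vote tie handling, and the toy data are illustrative choices, not from the notes:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, p=2):
    """Predict the label of x_query by majority vote of its k nearest neighbors.

    p selects the Minkowski distance: p=1 is L1 (Manhattan), p=2 is L2 (Euclidean).
    """
    # Lp distance from the query point to every stored training point
    dists = np.sum(np.abs(X_train - x_query) ** p, axis=1) ** (1.0 / p)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny toy example
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # -> 0
```

Note that all computation happens at query time: the entire training set must be stored and scanned for every prediction, which is the storage/inference cost mentioned above.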
2 Perceptron (Linear)
• Only applies to linearly separable data
• Perceptrons are linear classifiers trying to learn a hyperplane
• A hyperplane in $\mathbb{R}^d$ is represented as $w_0 + w^T x = 0$ where $w \in \mathbb{R}^d$
– $w$ is orthogonal to the hyperplane and points toward the positive half-space
• Predicted label: $\hat{y}(x) = \mathrm{sign}(w^T x + b)$
• The perceptron algorithm iterates through the data and updates $w$ on each misclassified sample until all data is correctly labeled (see the sketch at the end of this section)
– Update rule: $w_{new} = w + y\,x$ whenever $y\,(w^T x) \le 0$
• Theorem: Given $w^*$ that perfectly separates the data and margin $\gamma = \min_i |w^{*T} x^{(i)}|$ over all $x^{(i)} \in D$, the perceptron algorithm takes at most $1/\gamma^2$ updates to converge
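A minimal sketch of the perceptron training loop in Python/NumPy, assuming labels in {-1, +1}; folding a bias update in alongside `w` and capping the number of passes with `max_epochs` are illustrative additions, not part of the convergence theorem:

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Train a perceptron on labels y in {-1, +1}; returns weights w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # Misclassified (or on the boundary): y * (w^T x + b) <= 0
            if y[i] * (X[i] @ w + b) <= 0:
                w += y[i] * X[i]   # update rule: w_new = w + y x
                b += y[i]
                mistakes += 1
        if mistakes == 0:          # all samples correctly labeled
            break
    return w, b
```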
3 Probability and Estimation
• Useful Probability Properties:
– Conditional Probability: $P(A \mid B) = \frac{P(A, B)}{P(B)}$
– Bayes Rule: $P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$
• From a dataset giving the joint distribution $P(X_1, X_2, \ldots, X_d, Y)$ we can calculate $P(Y \mid X_1, X_2, \ldots, X_d)$
– It is intuitive to learn $P(Y \mid X)$ from the joint distribution, but accurately estimating the joint requires an amount of data that may not be attainable
• Estimate parameters from sparse data using Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) Estimation
– MLE: $\hat{\theta} = \arg\max_\theta P(D \mid \theta)$
– MAP: $\hat{\theta} = \arg\max_\theta P(\theta \mid D) = \arg\max_\theta \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$ according to Bayes Rule
– $\hat{\theta} = \arg\max_\theta P(D \mid \theta)\, P(\theta)$ as $P(D)$ does not depend on $\theta$
• MAP is better than MLE when the dataset has a small number of samples and the prior is accurate (see the worked sketch below)
• As the number of samples approaches infinity, the prior becomes irrelevant and the MAP estimate converges to the MLE
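A small worked sketch contrasting MLE and MAP, assuming a Bernoulli (coin-flip) likelihood with a Beta(α, β) prior; the counts and prior values are made-up illustrative numbers:

```python
# Estimate the heads probability theta of a coin from sparse data.
n_heads, n_tails = 3, 1            # small dataset D

# MLE: argmax_theta P(D | theta) = n_heads / (n_heads + n_tails)
theta_mle = n_heads / (n_heads + n_tails)

# MAP with a Beta(alpha, beta) prior on theta:
# argmax_theta P(D | theta) P(theta) = (n_heads + alpha - 1) / (n + alpha + beta - 2)
alpha, beta = 2.0, 2.0             # prior pulls the estimate toward 0.5
theta_map = (n_heads + alpha - 1) / (n_heads + n_tails + alpha + beta - 2)

print(theta_mle)   # 0.75
print(theta_map)   # 4/6 ~ 0.667 -- pulled toward the prior mean with few samples
```

With many more flips the counts dominate the fixed prior terms and the two estimates coincide, matching the limit stated above.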
4 Naive Bayes (Probabilistic)
• Aims to learn $P(Y \mid X)$ through $P(X \mid Y)$ and $P(Y)$ using Bayes rule, with a conditional independence assumption to reduce the number of parameters to estimate (see the sketch below)
– $P(Y \mid X_1, \ldots, X_d) = \frac{P(X_1, \ldots, X_d \mid Y)\, P(Y)}{P(X_1, \ldots, X_d)} \propto P(X_1, \ldots, X_d \mid Y)\, P(Y)$ ignoring normalization
– With conditional independence: $P(X_1, \ldots, X_d \mid Y) = \prod_{j=1}^{d} P(X_j \mid Y)$
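A minimal Bernoulli naive Bayes sketch for binary features in Python/NumPy, assuming the parameters $P(X_j \mid Y)$ and $P(Y)$ are estimated by counting; the add-one (Laplace) smoothing and the names `nb_fit`/`nb_predict` are illustrative choices:

```python
import numpy as np

def nb_fit(X, y):
    """Estimate P(Y) and P(X_j = 1 | Y) by counting, with add-one smoothing."""
    classes = np.unique(y)
    prior = np.array([np.mean(y == c) for c in classes])                 # P(Y = c)
    cond = np.array([(X[y == c].sum(axis=0) + 1) / (np.sum(y == c) + 2)  # P(X_j = 1 | Y = c)
                     for c in classes])
    return classes, prior, cond

def nb_predict(x, classes, prior, cond):
    """Pick the class maximizing log P(Y) + sum_j log P(X_j | Y)."""
    log_post = np.log(prior) + (x * np.log(cond) + (1 - x) * np.log(1 - cond)).sum(axis=1)
    return classes[np.argmax(log_post)]
```

Because of the conditional independence assumption, only one per-feature parameter per class is estimated rather than a full joint table over all feature combinations.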
5 Logistic Regression (Linear)
• Objective (the conditional log-likelihood) is concave, but does not have a closed form, so it requires iterative optimization techniques
• Logistic regression typically gives a better solution than naive Bayes, especially with lots of data and when conditional independence does not hold
6 Optimization
• Gradient Descent uses a first-order Taylor expansion to approximate the objective function $\ell$ as linear around the current weights $w$
– $\ell(w + s) \approx \ell(w) + g(w)^T s$ where $g(w) = \nabla \ell(w)$
• Gradient Descent update rule: $w_{new} = w - \alpha\, g(w)$ to minimize $\ell(w)$
– Step size $\alpha$ should decrease at a constant rate across updates for good convergence
• Batch gradient descent computes the gradient of the error over the entire training dataset, then updates $w$
• Stochastic gradient descent computes the gradient of the error on a single sample, then updates $w$
• Newton’s Method uses 2nd order Taylor expansion approximation
• Incorporating a prior for a MAP estimate results in a regularization term in the weight update (see the sketch at the end of this section)
– $w_{new} = w - \alpha\, g(w) - \alpha\lambda w$
– Helps reduce overfitting by keeping the weights near 0
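A minimal sketch of the regularized gradient-descent update applied to a simple least-squares objective; the particular objective, the step-size decay schedule, and the iteration count are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, lam=0.01, n_steps=100):
    """Minimize the mean squared error ||Xw - y||^2 / n with an L2 (MAP / Gaussian-prior) penalty."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(n_steps):
        g = 2.0 / n * X.T @ (X @ w - y)        # gradient of the data term
        w = w - alpha * g - alpha * lam * w    # w_new = w - alpha*g(w) - alpha*lambda*w
        alpha *= 0.99                          # slowly shrink the step size
    return w
```

Replacing the full-dataset gradient `g` with the gradient computed on a single randomly chosen sample gives the stochastic variant described above.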
7 Linear Regression
• Used to learn a function that linearly maps X onto Y, where Y is continuous (see the sketch below)
– First choose a parameterized form for $P(Y \mid X, w)$
– Then derive the MLE or MAP estimate of $w$
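A minimal sketch of the MLE estimate under the common Gaussian-noise assumption for $P(Y \mid X, w)$, which reduces to ordinary least squares with a closed-form solution; appending a bias column and using `np.linalg.lstsq` are illustrative details:

```python
import numpy as np

def linreg_mle(X, y):
    """MLE of w for P(y | x, w) = N(w^T x, sigma^2): ordinary least squares."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias feature
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)      # solves min ||Xb w - y||^2
    return w
```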
8 Support Vector Machine
• Separates positive and negative samples with as wide a margin as possible
• Hard margin SVM is for linearly separable data and expects perfect separation
– Objective is to minimize $\frac{1}{2}\|w\|_2^2$ such that $y^{(i)}(w^T x^{(i)} + b) \ge 1$ for all $i$
– Only the support vectors are needed for inference
∗ Support vectors satisfy $y^{(i)}(w^T x^{(i)} + b) = 1$
• Soft margin SVM allows for misclassified samples in non-linearly separable data (see the sketch at the end of this section)
– Objective is to minimize $\frac{1}{2}\|w\|_2^2 + C\sum_i \xi_i$ such that $y^{(i)}(w^T x^{(i)} + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$
– $w^* = \sum_i \alpha_i^* y^{(i)} x^{(i)}$
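A minimal sketch of soft-margin SVM training via subgradient descent on the equivalent unconstrained hinge-loss objective $\frac{1}{2}\|w\|_2^2 + C\sum_i \max(0,\, 1 - y^{(i)}(w^T x^{(i)} + b))$; the learning rate and iteration count are illustrative choices:

```python
import numpy as np

def svm_train(X, y, C=1.0, lr=0.01, n_steps=1000):
    """Soft-margin linear SVM via subgradient descent on the hinge-loss objective (labels in {-1, +1})."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_steps):
        margins = y * (X @ w + b)
        viol = margins < 1                      # samples with xi_i > 0 (inside or beyond the margin)
        # Subgradient of 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Only margin-violating samples contribute to the subgradient, which mirrors the fact that the optimal $w^*$ is a combination of support vectors.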