2024-SCU-ML-1-3-PLA

CSEN240

Machine Learning
Yen-Kuang Chen, Ph.D., IEEE Fellow
[email protected]

Outline
• Class 1
• What is machine learning?
• Why do we want to learn about machine learning? (part 1)
• When can machines learn? (The conditions under which machines can learn)
• Why can machines learn? (The underlying mechanisms that allow machines to learn)
• Class 2
• Why do we want to learn about machine learning? (part 2)
• How can machines learn? (Techniques/approaches)
• How can machines learn better? (Strategies to improve performance/efficiency)
Quiz
Why do we want to use machine learning?
• (A) To make complex decisions without data
• (B) To avoid processing a large amount of information
• (C) To build systems with simple programmable rules
• (D) To process a huge amount of information and handle complexity
Quiz
When can machines learn?
• (A) When there is no performance measurement to be improved
• (B) When there are no samples or observations about the relationship
• (C) When there is an underlying relationship to be learned
• (D) When machines have access to unlimited computational
resources
Class 1 Summary
• Why do we want to use machine learning?
• A huge amount of information must be processed
• However, no simple programmable rules
• ML can be an alternative route to building complicated systems

• When can machines learn?


• Exists some underlying relationship to be learned
• There is performance measurement to be improved
• There are samples (observations) about the relationship
• So, ML has some data to learn from

• Why can machines learn?


• The ability to learn comes from the models and algorithms used
Formalizing Machine Learning
• Input: x⃗ ∈ X
• Output: y⃗ ∈ Y
• Unknown relationship to be learned (aka the target function): f: X → Y
• Data (aka training examples): D = {(x⃗_1, y⃗_1), (x⃗_2, y⃗_2), …, (x⃗_N, y⃗_N)}
• Skill (hopefully with good performance): g: X → Y

Unknown target function f: X → Y
        ↓
Training examples D = {(x⃗_1, y⃗_1), …, (x⃗_N, y⃗_N)}
        ↓
ML algorithm (choosing from a hypothesis set)
        ↓
Learned formula g ≈ f
A Simple Hypothesis: the ‘Perceptron’
• For x⃗ = (x_1, x_2, …, x_d), the ‘features of the tumor’, compute a weighted score:
  • malignant (+1) if Σ_{i=1..d} w_i x_i ≥ threshold
  • benign (−1) if Σ_{i=1..d} w_i x_i < threshold
• Y: {+1 (malignant), −1 (benign)}
  • Note y is single-dimensional in this example
• Classification formula: g(x⃗) = sign(Σ_{i=1..d} w_i x_i − threshold)
  • (Ignore the case of the score being exactly 0 for now)
Vector Form of Perceptron Hypothesis
(For mathematical convenience)
• g(x⃗) = sign(Σ_{i=1..d} w_i x_i − threshold)
       = sign(Σ_{i=1..d} w_i x_i + (−threshold)·(+1))
       = sign(Σ_{i=0..d} w_i x_i), with w_0 = −threshold and x_0 = +1
       = sign(w⃗ · x⃗)
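In code, the bias trick above amounts to prepending a constant +1 to every input, so the threshold becomes just another weight. A minimal pure-Python sketch (the function names are mine, not from the slides):

```python
def sign(v):
    # Return +1 for positive, -1 for negative (the slides ignore the 0 case)
    return 1 if v > 0 else -1

def g(w, x):
    """Perceptron hypothesis g(x) = sign(w . x) on a bias-augmented input.

    w = (w_0, w_1, ..., w_d) with w_0 = -threshold; x = (x_1, ..., x_d).
    """
    x_aug = (1,) + tuple(x)  # prepend x_0 = +1 so w_0 absorbs the threshold
    return sign(sum(wi * xi for wi, xi in zip(w, x_aug)))
```

With the weights used in the numerical example later in the deck, g((-1, 0.75, 0), (3, 3)) evaluates sign(−1 + 2.25 + 0) = +1.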
Perceptron in ℝ²
(For visualization convenience)
• g(x⃗) = sign(w_0 + w_1 x_1 + w_2 x_2)

                        In ℝ²                       In ℝ^d
Features of tumor x⃗     Points on the 2D plane      Points in ℝ^d
Labels y                +1 (malignant), −1 (benign)
Hypothesis g(x⃗)         A line                      A hyperplane
                        Positive on one side of the line (hyperplane);
                        negative on the other side

• After having the model, how can we find {w_i} and the threshold?
  → ML algorithms
Math Prerequisites
• Vector inner product
  • Quiz: is w⃗_t · x⃗_{n(t)} > 0 or < 0?
• Vector addition
  • Quiz: where does w⃗_{t+1} = w⃗_t + x⃗_{n(t)} point?
[Figure: w⃗_t and x⃗_{n(t)} drawn at several relative angles]
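Both quiz facts follow from w⃗_{t+1} · x⃗ = w⃗_t · x⃗ + ‖x⃗‖²: adding x⃗ to w⃗_t always increases the inner product with x⃗, i.e., rotates w⃗ toward x⃗. A quick numeric check (the example vectors below are mine, chosen for illustration):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical vectors more than 90 degrees apart, so the inner product is negative
w = [1.0, 0.0]
x = [-1.0, 1.0]
assert dot(w, x) < 0                       # sign(w . x) would be -1

# The PLA-style update w + x moves w toward x: the inner product rises by |x|^2
w_next = [wi + xi for wi, xi in zip(w, x)]
assert dot(w_next, x) == dot(w, x) + dot(x, x)
assert dot(w_next, x) > dot(w, x)
```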
A Simple Learning Algorithm
• Start from some random w⃗_0, and correct its mistakes on D
• For t = 0, 1, …
  • Find a mistake, call it (x⃗_{n(t)}, y_{n(t)})
  • Based on the mistake, “correct” w⃗_t → w⃗_{t+1}
• Until no more mistakes
Perceptron Learning Algorithm (PLA)
• Start from some random w⃗_0, and correct its mistakes on D
• For t = 0, 1, …
  • Find a mistake, call it (x⃗_{n(t)}, y_{n(t)}):
    sign(w⃗_t · x⃗_{n(t)}) ≠ y_{n(t)}
  • (Try to) correct the mistake by
    w⃗_{t+1} ← w⃗_t + y_{n(t)} x⃗_{n(t)}
• Until no more mistakes
[Figure: when y_{n(t)} = +1, the update rotates w⃗ toward x⃗_{n(t)}; when y_{n(t)} = −1, away from it]
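The loop above translates almost line-for-line into code. A sketch in pure Python (the slides don’t specify how the mistake is searched for; this version scans D in order and restarts after each correction, sometimes called the cyclic variant):

```python
def sign(v):
    return 1 if v > 0 else -1

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def pla(data, w0):
    """PLA: repeatedly find a misclassified example and apply w <- w + y x.

    data: list of (x, y) pairs with x already bias-augmented (leading 1),
    y in {-1, +1}. Terminates only if the data are linearly separable.
    """
    w = list(w0)
    corrected = True
    while corrected:
        corrected = False
        for x, y in data:
            if sign(dot(w, x)) != y:                       # found a mistake
                w = [wi + y * xi for wi, xi in zip(w, x)]  # correct it
                corrected = True
                break                                      # rescan from the start
    return w
```

On the group-exercise data later in the deck, this returns (−2, 0.75, 0) after three corrections, matching the slides’ trace.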
Pictorial Example
PLA update: w⃗_{t+1} ← w⃗_t + y_{n(t)} x⃗_{n(t)}
[Figure: a sequence of slides showing the decision boundary rotating after each correction until all points are classified correctly]
Source: H.-T. Lin, Machine Learning Foundations
Group Exercise: Numerical Example
• D = {((1, 2), −1), ((2, 1), −1), ((3, 3), +1), ((4, 4), +1)}
• w⃗_0 = (−1, 0.75, 0)

• Quiz
  • What are the +1 and −1 in each example’s “third” component?
  • What is the −1 in w⃗_0’s first component?
  • Given w⃗_0, how do we classify D?
Numerical Example (PLA: w⃗_{t+1} ← w⃗_t + y_{n(t)} x⃗_{n(t)}, with x⃗_{n(t)} = (1, x_1, x_2))

• D = {((1, 2), −1), ((2, 1), −1), ((3, 3), +1), ((4, 4), +1)}
• w⃗_0 = (−1, 0.75, 0)
• Mistake: ((2, 1), −1)
[Figure: the four points and the boundary defined by w⃗_0 in the (x_1, x_2) plane]
Numerical Example
• w⃗_1 = (−2, −1.25, −1)
• Mistake: ((3, 3), +1)
Numerical Example
• w⃗_2 = (−1, 1.75, 2)
• Mistake: ((1, 2), −1)
Numerical Example
• w⃗_3 = (−2, 0.75, 0)
• No more mistakes: w⃗_3 classifies all four examples correctly
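The trace above can be replayed mechanically; each step is one application of w⃗_{t+1} ← w⃗_t + y_{n(t)} x⃗_{n(t)} with x⃗ augmented as (1, x_1, x_2). A pure-Python check:

```python
def update(w, x, y):
    """One PLA correction on a bias-augmented input: w <- w + y * (1, x1, x2)."""
    x_aug = (1,) + tuple(x)
    return tuple(wi + y * xi for wi, xi in zip(w, x_aug))

w0 = (-1, 0.75, 0)
w1 = update(w0, (2, 1), -1)   # correct the mistake on ((2, 1), -1)
w2 = update(w1, (3, 3), +1)   # correct the mistake on ((3, 3), +1)
w3 = update(w2, (1, 2), -1)   # correct the mistake on ((1, 2), -1)
print(w1, w2, w3)  # (-2, -1.25, -1) (-1, 1.75, 2) (-2, 0.75, 0)
```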
Why?
Why can machines learn?
• The ability to learn comes from the models and algorithms used, which humans design

Unknown target function f: X → Y
        ↓
Training examples D = {(x⃗_1, y⃗_1), …, (x⃗_N, y⃗_N)}
        ↓
ML algorithm (choosing from a hypothesis set)
        ↓
Learned formula g ≈ f
Summary
• Why do we want to use machine learning?
• A huge amount of information must be processed
• However, no simple programmable rules
• ML can be an alternative route to building complicated systems

• When can machines learn?


• Exists some underlying relationship to be learned
• There is performance measurement to be improved
• There are samples (observations) about the relationship
• So, ML has some data to learn from

• Why can machines learn?


• The ability to learn comes from the models and algorithms used
Extended Discussion
• Why “cannot” machines learn?
Outline
• Class 1
• What is machine learning?
• Why do we want to learn about machine learning? (part 1)
• When can machines learn? (The conditions under which machines can learn)
• Why can machines learn? (The underlying mechanisms that allow machines to learn)
• Class 2
• Why do we want to learn about machine learning? (part 2)
• How can machines learn? (Techniques/approaches)
• How can machines learn better? (Strategies to improve performance/efficiency)
Review of Perceptron Learning Algorithm
• What kinds of data were given?
  • Labeled (often created by humans)
• What kinds of labels?
  • Two classes → supervised classification
• Assumptions?
  • Linearly separable
• Learning algorithm?
  • Iteratively updates its weights in response to errors in its predictions
Course Atlas
• Supervised
  • Classification: Perceptron Learning Algorithm (PLA), Support Vector Machines (SVM), Decision Trees, Linear Discriminant Analysis (LDA)
  • Regression: Linear regression
• Unsupervised
  • Clustering: K-means
  • Dimension reduction: PCA
How?
Supervised vs. Unsupervised Learning
• Supervised Learning
• The algorithm is provided with a set of input/output pairs
• The correct output is pre-“labeled”
• Goal: to learn mapping function to predict outputs for new/unseen inputs

• Unsupervised Learning
• No pre-labeled outputs provided to the algorithm
• Instead, the algorithm must explore the structure of the data on its own and
identify meaningful patterns or relationships among the input variables
Classification vs. Regression
• Classification
• Goal: to predict a categorical or discrete output variable
• E.g., benign or malignant, hand-written digit recognition

• Regression
• Goal: to predict a continuous numerical output
• E.g., price of a house, price of a used car
Binary vs. Multiclass Classification
• Binary
  • Patient: sick / not sick
  • Email: spam / non-spam
  • Credit: approve / disapprove
  • Answer: correct / incorrect
• Multiclass
  • Written digits → 0, 1, …, 9
  • Emails → primary, social, promotion, spam
  • Pictures → apple, orange, strawberry

• Quiz: any other examples?
Clustering and Dimension Reduction
• Clustering
• Goal: to identify natural groupings in the data without any prior knowledge of the
correct labels or categories
• E.g., segment customers into groups based on their purchasing behaviors

• Dimension reduction
  • Goal: to transform the original data into a new representation that captures the most variation in the data with the fewest components
  • Applications: compression, feature extraction, data visualization
• Note that
  • Data reduction refers to the process of reducing the amount of data in a dataset by removing irrelevant or redundant information
  • Feature extraction, on the other hand, is the process of extracting useful information or features from the raw data
Quiz
Match each task to 1. classification, 2. regression, 3. clustering, or 4. dimension reduction:
• Predicting the stock price of a company based on historical data
• Identifying fraudulent transactions in a credit card dataset
• Segmenting customers into groups based on their purchasing behaviors
• Visualizing high-dimensional data in a lower-dimensional space
Course Atlas
• Supervised
  • Classification: Perceptron Learning Algorithm (PLA), Support Vector Machines (SVM), Decision Trees, Linear Discriminant Analysis (LDA)
  • Regression: Linear regression
• Unsupervised
  • Clustering: K-means
  • Dimension reduction: PCA
List of Key Questions About Each Machine
Learning Algorithm
• What kinds of data were given?
• Labeled?
• What kind of labels?
• Continuous or classification labels? Number of categories?
• Assumptions?
• Linearly separable?
• Learning algorithm?
• Direct optimization? Iterative optimization? Parameters?
• Inference computation?
• Linear? Polynomial? Non-linear?
• Error function?
• Class labels? Probabilities? Manual threshold?
• Overfitting or underfitting?
Summary
• How can machines learn?
• Different approaches to enable machines to learn, e.g.,

• Supervised
  • Classification: Perceptron Learning Algorithm (PLA), Support Vector Machines (SVM), Decision Trees, Linear Discriminant Analysis (LDA)
  • Regression: Linear regression
• Unsupervised
  • Clustering: K-means
  • Dimension reduction: PCA
Better?
Remaining Question
• How can machines learn better?
• What are the strategies to improve performance/efficiency?

• Review of PLA
  • Assumption: linearly separable data
  • Learning algorithm: iteratively updates its weights in response to errors in its predictions
• However,
  • Is PLA guaranteed to find a separating boundary?
  • How can we find the separating boundary faster?
  • Furthermore, if the data are linearly separable, is PLA the best algorithm?
Linear Separability
[Figure: linearly separable vs. non-separable datasets]
Source: H.-T. Lin, Machine Learning Foundations
Will PLA Find the Boundary if the Data Are Linearly Separable?
Notation: w⃗ is the weight vector, x⃗_n the n-th input, y_n its label.
• Linearly separable means there is a w⃗_f such that
  • sign(w⃗_f · x⃗_n) = y_n for all n
  • y_n w⃗_f · x⃗_n ≥ min_n y_n w⃗_f · x⃗_n > 0
• The inner product w⃗_f · w⃗_t after updating with (x⃗_{n(t−1)}, y_{n(t−1)}):
  w⃗_f · w⃗_t = w⃗_f · (w⃗_{t−1} + y_{n(t−1)} x⃗_{n(t−1)})
            = w⃗_f · w⃗_{t−1} + y_{n(t−1)} w⃗_f · x⃗_{n(t−1)}
            ≥ w⃗_f · w⃗_{t−1} + min_n y_n w⃗_f · x⃗_n
            ≥ w⃗_f · w⃗_0 + t · min_n y_n w⃗_f · x⃗_n
• So w⃗_f · w⃗_t grows with every correction. Normally, two unit vectors are close to each other when their inner product is close to 1, so we also need to check how fast ‖w⃗_t‖ grows.
‖w⃗_t‖ Does Not Grow Too Fast
• ‖w⃗_t‖² = ‖w⃗_{t−1} + y_{n(t−1)} x⃗_{n(t−1)}‖²
         = ‖w⃗_{t−1}‖² + 2 y_{n(t−1)} w⃗_{t−1} · x⃗_{n(t−1)} + ‖y_{n(t−1)} x⃗_{n(t−1)}‖²
• Updates happen only on mistakes, i.e., when sign(w⃗_{t−1} · x⃗_{n(t−1)}) ≠ y_{n(t−1)}, so y_{n(t−1)} w⃗_{t−1} · x⃗_{n(t−1)} ≤ 0
         ≤ ‖w⃗_{t−1}‖² + ‖x⃗_{n(t−1)}‖²    (since y² = 1)
         ≤ ‖w⃗_{t−1}‖² + max_n ‖x⃗_n‖²
         ≤ ‖w⃗_0‖² + t · max_n ‖x⃗_n‖²
Guarantee
• (w⃗_f · w⃗_t) / (‖w⃗_f‖ ‖w⃗_t‖) ≥ √t · (min_n y_n w⃗_f · x⃗_n) / (‖w⃗_f‖ · max_n ‖x⃗_n‖) after correcting t mistakes

• As long as the data are linearly separable and PLA keeps correcting mistakes:
  • The inner product of w⃗_f and w⃗_t grows fast: O(t)
  • The length of w⃗_t grows slowly: O(√t)
  • So PLA’s w⃗_t becomes more and more aligned with w⃗_f
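Both growth rates can be observed on the earlier numerical example. The weight trajectory below is the one traced in the slides; w⃗_f is a separating vector I picked (the final weights themselves), not something the slides specify:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Weight trajectory from the numerical example (inputs bias-augmented with a leading 1)
ws = [(-1, 0.75, 0), (-2, -1.25, -1), (-1, 1.75, 2), (-2, 0.75, 0)]
xs = [(1, 1, 2), (1, 2, 1), (1, 3, 3), (1, 4, 4)]
w_f = (-2, 0.75, 0)   # any separating vector works; here, the final weights

# w_f . w_t strictly increases with every correction (the O(t) growth)
inner = [dot(w_f, w) for w in ws]
assert all(a < b for a, b in zip(inner, inner[1:]))

# |w_t|^2 stays under |w_0|^2 + t * max_n |x_n|^2, so |w_t| grows at most like sqrt(t)
R2 = max(dot(x, x) for x in xs)
for t, w in enumerate(ws):
    assert dot(w, w) <= dot(ws[0], ws[0]) + t * R2
```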
Pros and Cons
• Pros
• Simple to implement
• Cons
  • Assumes the data are linearly separable (a property unknown in advance)
  • No guarantee in advance on how long it will take to halt
Variations of the Simple PLA
• Allow a small number of data points to be treated as noise:

  w⃗_g = argmin_w Σ_n ⟦y_n ≠ sign(w⃗ · x⃗_n)⟧

• Unfortunately, minimizing this error count exactly is NP-hard

• Modify the algorithm → the Pocket algorithm

Source: H.-T. Lin, Machine Learning Foundations


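A sketch of the pocket idea (the slides name the algorithm but give no code, so the details below are my assumptions): run PLA-style updates, but keep the best weight vector seen so far “in the pocket” and return it, so a non-separable dataset still yields a usable g.

```python
import random

def sign(v):
    return 1 if v > 0 else -1

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def errors(w, data):
    """Number of examples in data that w misclassifies."""
    return sum(1 for x, y in data if sign(dot(w, x)) != y)

def pocket(data, w0, max_updates=1000, seed=0):
    """Pocket algorithm: PLA updates on randomly chosen mistakes, but remember
    and return the weights with the fewest training errors, not the last ones."""
    rng = random.Random(seed)
    w = list(w0)
    best_w, best_err = list(w), errors(w, data)
    for _ in range(max_updates):
        mistakes = [(x, y) for x, y in data if sign(dot(w, x)) != y]
        if not mistakes:
            return w                                   # perfectly separated: done
        x, y = rng.choice(mistakes)
        w = [wi + y * xi for wi, xi in zip(w, x)]      # ordinary PLA correction
        err = errors(w, data)
        if err < best_err:                             # better? put it in the pocket
            best_w, best_err = list(w), err
    return best_w
```

On linearly separable data the pocket run reduces to PLA with a random mistake order; the pocket only matters when no perfect separator exists.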
Variations of the Simple PLA
• Randomize the order of the training data
  • Prevents the algorithm from getting stuck in local minima

• Use a more sophisticated learning rate
  • Learning rate: a hyperparameter that controls the size of the update steps
  • w⃗_{t+1} ← w⃗_t + y_{n(t)} x⃗_{n(t)}  →  w⃗_{t+1} ← w⃗_t + α y_{n(t)} x⃗_{n(t)}

• Modify the algorithm
  • The perceptron with momentum
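The learning-rate variant is a one-line change to the update; α = 1 recovers the plain PLA step. A sketch (the α values below are illustrative, not from the slides):

```python
def update_with_rate(w, x, y, alpha=1.0):
    """PLA update with a learning rate: w <- w + alpha * y * x."""
    return [wi + alpha * y * xi for wi, xi in zip(w, x)]

w = [-1, 0.75, 0]
x = [1, 2, 1]   # bias-augmented input from the numerical example
print(update_with_rate(w, x, -1, alpha=1.0))   # [-2.0, -1.25, -1.0] (plain PLA step)
print(update_with_rate(w, x, -1, alpha=0.5))   # [-1.5, -0.25, -0.5] (half-size step)
```

Note that when w⃗_0 = 0, scaling every update by the same α only rescales w⃗ and leaves sign(w⃗ · x⃗) unchanged, so the rate matters mainly with a nonzero w⃗_0 or in variants such as momentum.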
Extended Discussion
• Even if data are linearly separable, is PLA the best algorithm?
List of Key Questions About Each Machine
Learning Algorithm
• What kinds of data were given?
• Labeled?
• What kind of labels?
• Continuous or classification labels? Number of categories?
• Assumptions?
• Linearly separable?
• Learning algorithm?
• Direct optimization? Iterative optimization? Parameters?
• Inference computation?
• Linear? Polynomial? Non-linear?
• Error function?
• Class labels? Probabilities? Manual threshold?
• Overfitting vs. underfitting?
This Course: Learn to Choose the Right Tool
• Many ML approaches
• Each tailored to different needs
• Each works best under specific circumstances

Our goal: learn which tool to use and when


Summary
• How can machines learn?
• Different approaches to enable machines to learn, e.g.,
• Supervised learning (with labeled data)
• Unsupervised learning (identify patterns in unlabeled data)

• How can machines learn better?


• Strategies to improve performance/efficiency
• In the next few weeks, we will see a number of different algorithms that can
solve the same problems (with the same assumptions)
Learning Objectives
• Demonstrate knowledge of and ability to solve problems in foundational
topics in machine learning
• Including logistic regression, linear discriminant analysis, Bayesian classification, and
support vector machines.
• Implement supervised learning algorithms
• For example, decision trees and linear regression.
• Implement unsupervised learning algorithms
• Such as clustering algorithms or principal-component analysis.
• Work with real data sets, create training and test data, and analyze the
results of learning algorithms.
• Demonstrate knowledge of neural networks
• Particularly backpropagation.
