1 ML Overview
https://tung-dn.github.io/programming.html
What is machine learning?
• Arthur Samuel (1959): Machine learning is the field of study that gives the
computer the ability to learn without being explicitly programmed.
Taxonomy of ML
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Part II: Supervised Learning
Example 1: Predict whether a user likes a song or not
[Figure: songs for User Sharon plotted by Tempo and Intensity, each marked Like or DisLike; a model is fit to these points]
Example 2: Classify Images
• Experience/Data: images with labels (indoor, outdoor)
[Figure: three example images with labels outdoor, indoor, indoor]
x = [x1, x2]^T, where x1 = Tempo and x2 = Intensity
d: feature dimension (here d = 2)
Label y ∈ {0,1}: where “supervision” comes from
[Figure: Intensity vs. Tempo (Relaxed to Fast), with points labeled y=1 and y=0]
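To make the feature and label notation concrete, here is a minimal sketch of a classifier for the song example. The slides do not name a specific model, so this uses a nearest-centroid rule chosen only for brevity. The data values and the names `train_x`, `train_y`, `fit`, and `predict` are invented for illustration.

```python
# Hypothetical training data: each x = (tempo, intensity), each y in {0, 1}.
# The numbers are invented for illustration; 1 = Like, 0 = DisLike.
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
train_y = [1, 1, 1, 0, 0, 0]


def centroid(points):
    """Mean of a list of d-dimensional points."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def fit(xs, ys):
    """Compute one centroid per label from the labeled examples."""
    return {
        label: centroid([x for x, y in zip(xs, ys) if y == label])
        for label in set(ys)
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: squared_distance(model[label], x))


model = fit(train_x, train_y)
```

Calling `predict(model, (0.85, 0.75))` classifies a fast, intense song. The labels in `train_y` are the only place the "supervision" enters.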
Represent various types of data
• Image
- Pixel values
• Bank account
- Credit rating, balance, # deposits in last day, week,
month, year, #withdrawals
Two Types of Supervised Learning Algorithms
• Classification
• Regression
Example of regression: housing price prediction
Given: a dataset that contains n samples
(x1, y1), (x2, y2), (x3, y3), . . . , (xn, yn)
[Figure: Price plotted against Square feet for the n samples, with a query point x]
(credit: Stanford CS229)
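As a rough illustration of regression on samples (x1, y1), . . . , (xn, yn), the sketch below fits a straight line by ordinary least squares. The slides do not state which model is used, so the linear form is an assumption. The data and the names `fit_line` and `predict_price` are hypothetical.

```python
# Hypothetical samples (x_i, y_i): square feet and price in $1000s.
# The values are invented for illustration.
data = [(1000, 200.0), (1500, 300.0), (2000, 400.0), (2500, 500.0)]


def fit_line(samples):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(data)


def predict_price(square_feet):
    """Predict a real-valued price for a new input x."""
    return slope * square_feet + intercept
```

Unlike the classification case, the prediction is a real number rather than a label in {0,1}.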
Supervised Learning: More examples
• x = raw pixels of the image
• y = bounding boxes
Unsupervised Learning
• Given: a dataset with no labels: x1, x2, . . . , xn
• Goal: discover interesting patterns and structures in the data
[Figure: two Intensity vs. Tempo plots; the labels y=1 and y=0 appear in the figure]
Clustering
• Given: a dataset with no labels: x1, x2, . . . , xn
• Output: a division of the data into clusters such that points within a
cluster are similar and points in different clusters are dissimilar
[Figure: clustered points in the Intensity vs. Tempo plane]
[Arora-Li-Liang-Ma-Risteski, TACL’17,18]
How do we perform clustering?
• Many clustering algorithms. We will look at the two most
frequently used ones:
• K-means clustering: we specify the desired number of
clusters, and use an iterative algorithm to find them
• Hierarchical clustering: we build a binary tree over the
dataset
K-means clustering
• Very popular clustering method
Step 2: for each point x, determine its cluster: find the closest center in Euclidean space
Step 3: update all cluster centers as the centroids
Repeat steps 2 & 3 until convergence
Converged solution! No labels required!
[Figure: Intensity vs. Tempo plots showing the cluster assignments and centers at each step]
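Steps 2 and 3 above can be sketched in a few lines of plain Python. The initialization step is not shown in the text, so this sketch assumes the caller passes in starting centers. The function names, the `max_iters` cap, and the sample points are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centers):
    """Step 2: index of the closest center for each point."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def update(points, labels, centers):
    """Step 3: move each center to the centroid of its assigned points."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centers.append(old)  # an empty cluster keeps its old center
            continue
        d = len(old)
        new_centers.append(
            tuple(sum(p[j] for p in members) / len(members) for j in range(d))
        )
    return new_centers


def kmeans(points, initial_centers, max_iters=100):
    """Repeat steps 2 and 3 until the assignments stop changing."""
    centers = list(initial_centers)
    labels = assign(points, centers)
    for _ in range(max_iters):
        centers = update(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return centers, labels


# Hypothetical unlabeled points (tempo, intensity); no y values are used.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
centers, labels = kmeans(points, initial_centers=[points[0], points[3]])
```

The result depends on the starting centers, which is why this sketch takes them as an explicit argument rather than hiding the choice.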
K-means clustering: A demo
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Hierarchical Clustering (more to follow next lecture)
Quiz Break
Q2-1: Which is true about machine learning?
Google DeepMind
Reinforcement Learning Key Problems
1. Problem: actions may have delayed effects.
• Requires credit-assignment
2. Problem: maximal reward action is unknown
• Exploration-exploitation trade-off
- Peter Whittle
Multi-armed Bandit
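The exploration-exploitation trade-off can be seen in a small multi-armed bandit simulation. The slides do not name a strategy, so this sketch uses epsilon-greedy as one simple, commonly used option. The reward probabilities, `epsilon`, `steps`, and `seed` are invented for illustration.

```python
import random


def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit; return value estimates and pull counts."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical arms: the learner does not know these probabilities.
estimates, counts = epsilon_greedy([0.1, 0.5, 0.9])
```

With `epsilon=0` the learner can lock onto whichever arm pays off first. With a large `epsilon` it keeps pulling arms it already knows are poor. The value 0.1 here is just an example setting.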
Today’s recap
• What is machine learning?
• Supervised Learning
• Classification
• Regression
• Unsupervised Learning
• Reinforcement Learning
Thanks!