L01-intro-clustering

The document outlines the course structure and policies for a Pattern Recognition class, including sections for undergraduates and master's students, communication channels, and a plagiarism policy. It emphasizes the importance of understanding machine learning concepts, the workflow of machine learning, and the evaluation of models using various metrics. Additionally, it discusses the types of machine learning, the significance of feature extraction, and the course's philosophy of going beyond black box models.

INTRODUCTION

2110573 Pattern Recognition


Sections
• Section 21: Undergrad
• Section 1: Masters

• If you are in the wrong section, switch. If the section is full, please tell me and I will increase the cap.
MyCourseVille and Discord
Discord
https://discord.gg/2usmexXj
TA office hours: 10-11.30 pm Mondays, Tuesdays, Fridays – these are the official working hours
Private DMs unrelated to personal issues will be ignored
MyCourseVille
https://www.mycourseville.com?q=courseville/course/46136
Password: nya
For homework submission
GitHub
https://github.com/ekapolc/Pattern_2024
For slides and homework instructions
Playlist
https://www.youtube.com/playlist?list=PLcBOyD1N1T-OpGooU_P9nFL9I3I6IiDgu
Syllabus
Plagiarism Policy
• You shall not show other people your code or solutions
• Copying will result in a score of zero for both parties on the assignment
• Many of these algorithms have code available on the internet; do not copy-paste it
Grades
Notes regarding homework submission
• Please submit everything as a PDF
• The TAs will mostly grade from this
• If you have Colab/Python code, export it as a PDF
• Combine the materials into a single PDF file

• Additional materials
• Additional code or results can be submitted in a .zip file
Course project
• Teams of up to 5 people
• Topic of your choice
• Can be implementing a paper
• An extension of a homework
• A project for another course with an additional machine learning component
• Your current research (with additional scope)
• Or work on a new application
• Must already have existing data! No data collection!
• Topics need to be pre-approved
• Details about the procedure TBA
Why study machine learning?
The machine learning trend 2015

http://www.gartner.com/newsroom/id/3114217
The machine learning trend 2016

http://www.gartner.com/newsroom/id/3412017
The machine learning trend 2017

http://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emerging-technologies-2017/
The machine learning trend 2018

https://www.gartner.com/smarterwithgartner/5-trends-emerge-in-gartner-hype-cycle-for-emerging-technologies-2018/
The machine learning trend 2019

https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019/
The machine learning trend 2021

https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
The machine learning trend 2022

https://www.gartner.com/en/articles/what-s-new-in-the-2022-gartner-hype-cycle-for-emerging-technologies
The machine learning trend 2023

https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
• “If I were to guess like what our biggest existential threat is, it’s probably that. So we need to be very careful with the artificial intelligence. There should be some regulatory oversight maybe at the national and international level, just to make sure that we don’t do something very foolish.”
• “I think people who are naysayers and try to drum up these doomsday scenarios — I just, I don’t understand it. It’s really negative and in some ways I actually think it is pretty irresponsible”
Poll
What is Pattern Recognition?
• “Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data, although it is in some cases considered to be nearly synonymous with machine learning.” — Wikipedia

• What about
• AI
• Data mining
• Knowledge Discovery in Databases (KDD)
• Statistics
• Data science
What is AI?
• Classical definition
• A system that appears intelligent
• Populace definition: probably what the field means right now
• ML
• Narrow AI
• Specialized

https://techsauce.co/pr-news/tcas-use-cloud-computing-and-ai-for-admission
Artificial General Intelligence (AGI)
• “hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can.” — Wikipedia
• Can continue to learn new skills on its own.
• Probably not of much interest besides philosophical debates
• Work is done in baby steps
ML vs PR vs DM vs KDD
• “The short answer is: None. They are … concerned with the same question: how do we learn from data?”
Larry Wasserman – CMU Professor

• Nearly identical tools and subject matter
History
• Pattern Recognition started in the engineering community (mainly Electrical Engineering and Computer Vision)
• Machine learning came out of AI and is mostly considered a Computer Science subject
• Data mining started in the database community
Distinguishing things
• DM – data warehouses, ETL
• AI – search, swarm intelligence
• PR – signal processing (feature engineering)

http://www.deeplearningbook.org/
Different terminologies
http://statweb.stanford.edu/~tibs/stat315a/glossary.pdf
Merging communities and fields
• With the advent of deep learning, the fields are merging and the differences are becoming unclear
Course philosophy
• Going beyond the black box
• In this course you will
• Understand models on a deeper level
• Implement stuff from scratch
The danger zone

Driving a car analogy
- Just driving without knowing where you are going
- Getting there vs. getting there effectively
- Putting the wrong fuel into the car
Be better than autoML

https://towardsdatascience.com/ocr-for-scanned-numbers-using-googles-automl-vision-29d193070c64
Types of machine learning
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

0. Pre-machine learning: rule-based


Pre-machine learning: 7-segment display
• Input: 7 binary values (0,1) forming a display
• Given x = (A, B, C, D, E, F, G)
• Output: y, either 0, 1, …, 9 or not a number
• Task: write a program (a function F) that maps x to y; F(x) = y

Image from http://www.physics.udel.edu/~watson/scen103/colloq2000/7-seg.html
Mapping function
[Figure: a 7-segment display mapping input X (segment states) to output Y (digit). Image from http://www.instructables.com/id/DIY-7-Segment-Display/]

Mapping function

• IF A==1 && B==1 && C==1 && D==1 && E==1 && F==1 && G==0, THEN output(0).
• IF B==1 && C==1, THEN output(1)
• …..
• OTHERWISE, output(“not number”)
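
As an illustration, here is a minimal Python sketch of such a rule-based program, written as a lookup table instead of a chain of IF-rules (the patterns assume the standard 7-segment encoding; the names are ours, not from the slide):

    # Rule-based mapping F(x) = y for a 7-segment display.
    # x = (A, B, C, D, E, F, G), each 1 (segment on) or 0 (segment off).
    PATTERNS = {
        (1, 1, 1, 1, 1, 1, 0): 0,
        (0, 1, 1, 0, 0, 0, 0): 1,
        (1, 1, 0, 1, 1, 0, 1): 2,
        (1, 1, 1, 1, 0, 0, 1): 3,
        (0, 1, 1, 0, 0, 1, 1): 4,
        (1, 0, 1, 1, 0, 1, 1): 5,
        (1, 0, 1, 1, 1, 1, 1): 6,
        (1, 1, 1, 0, 0, 0, 0): 7,
        (1, 1, 1, 1, 1, 1, 1): 8,
        (1, 1, 1, 1, 0, 1, 1): 9,
    }

    def f(x):
        """Return the digit for segment tuple x, or "not number"."""
        return PATTERNS.get(tuple(x), "not number")

    print(f((1, 1, 1, 1, 1, 1, 0)))  # 0
    print(f((0, 1, 1, 0, 0, 0, 0)))  # 1
    print(f((1, 0, 1, 0, 1, 0, 1)))  # not number

Every input is handled by an explicit hand-written rule; nothing is learned from data, which is exactly what the following slides contrast with.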
Learning from data
• Machine learning requires identifying the same ingredients
• Input, Output, Task

Real world observations

This is the hardest part of data science and the last part to be replaced by machines.

https://cloud.google.com/blog/products/gcp/how-a-japanese-cucumber-farmer-is-using-deep-learning-and-tensorflow
An example
• Handwritten digit recognition
• Input: x = 28 x 28 pixel image
• Output: y = digit 0 to 9
• Task: find F(x) such that y ≈ F(x)

The goal of machine learning is to find the best F(x) automatically from data.
Supervised learning
• Learn a classifier F from a training set (input-output pairs)
• {(x1, y1), (x2, y2), (x3, y3), …, (xn, yn)}

Need a training set for training.
Training = finding (optimizing) a good function f.
Labeling (i.e., assigning y for each x in the training set) is typically done manually.
[Figure: handwritten digit images x paired with labels y = 0, 1, 2]
Types of machine learning
1. Supervised learning
Learn a model F from pairs of (x,y)
2. Unsupervised learning
Discover the hidden structure in unlabeled data x (no y)
3. Reinforcement learning
Train an agent to take appropriate actions in an environment by
maximizing rewards
Typical workflow of machine learning
1. Feature extraction (getting the x)
2. Modeling
• Training (getting the function F)
3. Evaluation
• Metrics (defining what’s the best function F)
• Testing (getting the y for unseen inputs)
Typical workflow of machine learning
• The typical workflow:

[Figure: real-world observations → sensors → feature extraction → feature vector x, e.g., (1, 5, 3.6, 1, 3, -1)]
How do we learn from data?

[Figure, training phase: a training set of feature vectors x with desired outputs y is fed to a learning algorithm, which produces a model h]
How do we learn from data?

[Figure, testing phase: a new input x is fed to the trained model h, which produces a predicted output y]
Feature extraction
• The process of extracting meaningful information related
to the goal
• A distinctive characteristic or quality
• Example features

[Figure: example features for data1, data2, data3]
Garbage in, garbage out
• The machine is only as intelligent as the data/features we put in
• “Garbage in, garbage out”
• Data cleaning is often done to reduce unwanted things
https://precisionchiroco.com/garbage-in-garbage-out/
The need for data cleaning

However, good models should be able to handle some dirtiness!

https://www.linkedin.com/pulse/big-data-conundrum-garbage-out-other-challenges-business-platform
Feature properties
• The quality of the feature vector is related to its ability to discriminate samples from different classes

• In this course, we won't talk much about data/feature issues, since these are domain specific. However, they can be more important than modeling.
Model evaluation

How do we compare h1 and h2?

[Figure: the same new input x fed to two models h1 and h2 in the testing phase, each producing its own predicted output y]
Metrics
• Compare the outputs of the models
• Errors/failures, accuracy/successes
• We want to quantify the error/accuracy of the models
• How would you measure the error/accuracy of the following?
Ground truths
• We usually compare the model's predicted answer with the correct answer.
• What if there is no real answer?
• How would you rate machine translation?

ไปไหน (Thai: “Where to?”)

Model A: Where are you going?
Model B: Where to?

Designing a metric can be tricky, especially when it's subjective.
Ground truths can be hard

Slides from https://github.com/goldmermaid/mlrs


Labelling
Labelling issues
Need labelled data
Metrics consideration 1
• Are there several metrics?

• Use the metric closest to your goal, but never disregard the other metrics.
• They may help identify possible improvements
Metrics consideration 2
• Are there sub-metrics?

http://www.ustar-consortium.com/qws/slot/u50227/research.html
Commonly used metrics
• Error rate
• Accuracy rate

• Precision
• True positive
• Recall
• False alarm
• F score
A detection problem
• Identify whether an event occurs
• A yes/no question
• A binary classifier
Smoke detector

Hotdog detector
Evaluating a detection problem
• 4 possible scenarios

                   Detector: Yes                  Detector: No
Actual: Yes        True positive                  False negative (Type II error)
Actual: No         False alarm (Type I error)     True negative

True positives + False negatives = # of actual yes
False alarms + True negatives = # of actual no
• The false alarm and true positive counts carry all the information about the performance.
Definitions (using the same table)

• True positive rate (recall, sensitivity) = # true positives / # of actual yes
• False positive rate (false alarm rate) = # false positives / # of actual no
• False negative rate (miss rate) = # false negatives / # of actual yes
• True negative rate (specificity) = # true negatives / # of actual no

• Precision = # true positives / # of predicted positives
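
As a sketch, these definitions translate directly into code (pure Python; the counts in the example are made up for illustration):

    def detection_metrics(tp, fp, fn, tn):
        """Compute the rates above from raw confusion counts."""
        recall      = tp / (tp + fn)  # true positive rate, sensitivity
        false_alarm = fp / (fp + tn)  # false positive rate
        miss_rate   = fn / (tp + fn)  # false negative rate = 1 - recall
        specificity = tn / (fp + tn)  # true negative rate = 1 - false alarm rate
        precision   = tp / (tp + fp)  # fraction of predicted positives that are correct
        return recall, false_alarm, miss_rate, specificity, precision

    # Example: 8 actual-yes events (6 detected, 2 missed), 20 actual-no (3 false alarms).
    print(detection_metrics(tp=6, fp=3, fn=2, tn=17))
    # (0.75, 0.15, 0.25, 0.85, 0.666...)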


Search engine example

A recall of 50% means? (Half of all the relevant documents were retrieved.)

A precision of 50% means? (Half of the retrieved documents are relevant.)
Recall/precision
• When do you want high recall?
• When do you want high precision?

• Initial screening for cancer
• Face recognition system for authentication
• Detecting possible suicidal postings on social media
• COVID screening: ATK vs PCR

Usually there's a trade-off between precision and recall. We will revisit this later.
Let’s consider a case
• A: a “no rain” predictor has 97% accuracy
• It always says no rain.

                   Predicted: Rain    Predicted: No rain
Actual: Rain       0                  1
Actual: No rain    0                  30

(Accuracy = 30/31 ≈ 97%, yet it never detects rain.)

• Accuracy might not be a good metric for biased (imbalanced) data
• A good model should be better than stupid baselines
Definitions 2
• F score (F1 score, F-measure)

• A single measure that combines both aspects
• The harmonic mean of precision and recall (an average of rates)

Note that precision and recall say nothing about the true negatives.
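
For reference, the standard harmonic-mean formula (the formula itself is not shown on the slide):

    F1 = 2 · precision · recall / (precision + recall)

For example, precision = 0.5 and recall = 1.0 give F1 = 1.0 / 1.5 ≈ 0.67, whereas the arithmetic mean would be 0.75; the harmonic mean is pulled toward the weaker of the two rates.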
Evaluating models
• We talked about the training set used to learn the model

• We use a different data set to test the accuracy/error of models – the “test set”

• We can still compute the error and accuracy on the training set

• Training error vs testing error

• We will discuss how we can use these to help guide us later
Other considerations when evaluating models
• Training time
• Testing time
• Memory requirement
• Parallelizability
• Latency
Course walkthrough

Traditional machine learning

Deep learning

Why anything else besides deep learning?

https://medium.com/analytics-vidhya/ongoing-kaggle-survey-picks-the-topmost-data-science-trends-7c19ec7606a1
KNN and K-means clustering

Our first model - unsupervised learning
Discover the hidden structure in unlabeled data X (no y)
• Customer/product segmentation
• Data analysis for ...
• Identify the number of speakers in a meeting recording
• Helps supervised learning in some tasks
Example - Customer analysis

[Scatter plot: customers plotted by brand loyalty vs. price sensitivity]
Example - Real Estate segmentation in Thailand
What should be the input feature for this?

[Figure: real-estate segmentation maps of Thailand]
K-means clustering
Clustering - a task that tries to automatically discover groups within the data

Grouping by hand is too hard…
[Scatter plot: brand loyalty vs. price sensitivity]
K-means clustering
Clustering - a task that tries to automatically discover groups within the data

Easier if we know the grouping beforehand (supervised): which cluster does a query point belong to? How?
[Scatter plot: brand loyalty vs. price sensitivity, with an unlabeled query point]
Nearest Neighbour classification
Find the closest training data point and assign the query the same label as that training point

Given query data:
  For every point in the training data, compute the distance to the query
  Assign the label of the point with the smallest distance

[Scatter plot: brand loyalty vs. price sensitivity – which cluster does the query belong to?]
K-Nearest Neighbour (kNN) classification
Nearest Neighbour is susceptible to noise in the training data
Use a voting scheme instead

[Scatter plot: a single noisy training point near the query can flip the nearest-neighbour label]
K-Nearest Neighbour (kNN) classification
Nearest Neighbour is susceptible to noise in the training data
Use a voting scheme instead (e.g., k = 4)

Given query data:
  For every point in the training data, compute the distance to the query
  Find the k closest points and assign the label by voting

The votes can be weighted by the inverse distance (weighted k-NN)
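
A minimal Python sketch of (weighted) k-NN following these steps, assuming Euclidean distance and a toy two-feature training set (in practice one might use sklearn.neighbors.KNeighborsClassifier):

    import math
    from collections import defaultdict

    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def knn_predict(train, query, k=4, weighted=False):
        # Distance from the query to every training point: O(N).
        neighbors = sorted(train, key=lambda xy: euclidean(xy[0], query))[:k]
        votes = defaultdict(float)
        for x, y in neighbors:
            # Plain voting, or inverse-distance weighting for weighted k-NN.
            votes[y] += 1.0 / (euclidean(x, query) + 1e-9) if weighted else 1.0
        return max(votes, key=votes.get)

    train = [((0.9, 0.8), "loyal"), ((0.8, 0.9), "loyal"),
             ((0.1, 0.2), "price-sensitive"), ((0.2, 0.1), "price-sensitive")]
    print(knn_predict(train, (0.7, 0.7), k=3, weighted=True))  # loyal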
Closest?
We need some kind of distance or similarity measure
F(x1, x2) = d

Euclidean distance
Cosine similarity = cos(angle between the vectors)
Many more distances: Jaccard distance, Earth mover's distance
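
Sketches of the two named measures in pure Python (numpy/scipy offer the same computations):

    import math

    def euclidean(x1, x2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

    def cosine_similarity(x1, x2):
        dot = sum(a * b for a, b in zip(x1, x2))
        norm1 = math.sqrt(sum(a * a for a in x1))
        norm2 = math.sqrt(sum(b * b for b in x2))
        return dot / (norm1 * norm2)  # cos of the angle between x1 and x2

    print(euclidean((0, 0), (3, 4)))          # 5.0
    print(cosine_similarity((1, 0), (1, 1)))  # ~0.707 = cos(45 degrees)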
KNN runtime
For every point in the training data, compute the distance to the query
Find the K closest data points
Assign the label by voting

O(N) per query
O(JN) if we have J queries
Expensive!

Ways to make it faster:
Kernelized KNN
Locality-Sensitive Hashing (LSH)
Use centroids
Centroids
Basically, the representative of the cluster
Find the mean location of the cluster by averaging its points
Can use the mode or median instead, depending on the data

Comparing a query against the centroids costs O(JL), where L is the number of clusters

[Scatter plot: brand loyalty vs. price sensitivity, each cluster represented by its centroid]
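
A sketch of nearest-centroid classification on toy data (the centroid here is the coordinate-wise mean; names are illustrative):

    def centroid(points):
        """Coordinate-wise mean of a list of points."""
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

    def nearest_centroid(centroids, query):
        """Label of the closest centroid; only L comparisons per query."""
        dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(centroids, key=lambda label: dist2(centroids[label], query))

    cents = {"loyal": centroid([(0.9, 0.8), (0.8, 0.9)]),
             "price-sensitive": centroid([(0.1, 0.2), (0.2, 0.1)])}
    print(nearest_centroid(cents, (0.7, 0.6)))  # loyal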
K-means clustering
1. Randomly initialize k centroids by picking from the data points
2. Assign each data point to its nearest centroid
3. Update the centroid of each cluster
4. Repeat 2-3 until the centroids do not change

[Scatter plot: brand loyalty vs. price sensitivity]
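
A minimal pure-Python sketch of steps 1-4 (for real work use sklearn.cluster.KMeans; the function and variable names here are ours):

    import random

    def kmeans(points, k, max_iter=100, seed=0):
        rng = random.Random(seed)
        centroids = rng.sample(points, k)           # 1. init by picking data points
        clusters = [[] for _ in range(k)]
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:                         # 2. assign to nearest centroid
                j = min(range(k),
                        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                clusters[j].append(p)
            new_centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl))
                             if cl else centroids[j]
                             for j, cl in enumerate(clusters)]  # 3. update centroids
            if new_centroids == centroids:           # 4. stop when unchanged
                break
            centroids = new_centroids
        return centroids, clusters

    pts = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
    print(kmeans(pts, k=2)[0])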
An Illustration of K-Means Clustering

Randomly select K=3 centroids → assign points to the nearest centroid → update centroids → update point assignments → update centroids → …

From https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Characteristics of K-means
▪ The number of clusters, K, is specified in advance.
▪ Always converges to a (local) minimum.
• Poor starting centroid locations can lead to poor minima.

Image from https://en.wikipedia.org/wiki/K-means_clustering

▪ The model has several implicit assumptions:
• Data points scatter around the cluster centers.
• The boundary between adjacent clusters is always halfway between the cluster centroids.
Effect of bad initializations

Solution: try different random initializations and pick the best one.

From https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Clustering Metrics?
Selecting K - using the elbow method

[Figure: K-means runs for K = 1, 2, 3, 4 with the all-data centroid marked. From https://www.naftaliharris.com/blog/visualizing-k-means-clustering/]

fraction of explained variance = between-cluster variance / all-data variance

(distances here are Euclidean)

[Plot: fraction of explained variance vs. number of clusters K]
The elbow method chooses the K where increasing model complexity doesn't yield much additional explained variance in return.
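
A sketch of the elbow computation built on the kmeans() sketch above (toy data; sklearn exposes the complementary within-cluster sum of squares as inertia_):

    def explained_variance_fraction(points, centroids, clusters):
        """Between-cluster variance divided by all-data variance."""
        n = len(points)
        grand = tuple(sum(dim) / n for dim in zip(*points))  # all-data centroid
        total = sum(sum((a - g) ** 2 for a, g in zip(p, grand)) for p in points)
        between = sum(len(cl) * sum((c - g) ** 2 for c, g in zip(cent, grand))
                      for cent, cl in zip(centroids, clusters))
        return between / total

    pts = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.5, 0.9)]
    for k in range(1, 5):
        cents, cls = kmeans(pts, k)
        print(k, round(explained_variance_fraction(pts, cents, cls), 3))
    # Pick the K where the fraction stops improving much (the "elbow").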
Selecting K - other methods

95% explained variance:
[Plot: fraction of explained variance vs. number of clusters K]
Choose the minimal K that explains at least 95% of the all-data variance.

Cross-validation:
Train a K-means clustering model for each K, then evaluate on held-out testing data.

K    Accuracy
2    50%
3    68%
4    83%

Choose the K that maximizes a certain objective (e.g., accuracy on testing data). This is the best method.
Summary
• Other clustering methods
• K-mode, K-median
• Spectral clustering (clustering in an embedding space)
• DBSCAN (clustering by “density” – very robust, no need to choose K)

https://github.com/NSHipster/DBSCAN
