L01-intro-clustering

The document outlines the course structure and policies for a Pattern Recognition class, including sections for undergraduates and master's students, communication channels, and a plagiarism policy. It emphasizes the importance of understanding machine learning concepts, the workflow of machine learning, and the evaluation of models using various metrics. Additionally, it discusses the types of machine learning, the significance of feature extraction, and the course's philosophy of going beyond black box models.

INTRODUCTION

2110573 Pattern Recognition


Sections
• Section 21: Undergrad
• Section 1: Masters

• If you are in the wrong section, switch. If the section is full, please tell me and I will increase the cap.
MyCourseVille and Discord
Discord
https://discord.gg/2usmexXj
TA office hours: 10-11.30 pm Mondays, Tuesdays, Fridays – these are the official working hours
Private DMs unrelated to personal issues will be ignored
MyCourseVille
https://www.mycourseville.com?q=courseville/course/46136
Password: nya
For homework submission
GitHub
https://github.com/ekapolc/Pattern_2024
For slides and homework instructions
Playlist
https://www.youtube.com/playlist?list=PLcBOyD1N1T-OpGooU_P9nFL9I3I6IiDgu
Syllabus
Plagiarism Policy
• You shall not show other people your code or solutions
• Copying will result in a score of zero for both parties on the assignment
• Many of these algorithms have code available on the internet; do not copy-paste it
Grades
Notes regarding homework submission
• Please submit everything as a PDF
• The TAs will mostly grade from this
• If you have Colab/Python code, export it as a PDF
• Combine the materials into a single PDF file

• Additional materials
• Additional code or results can be submitted in a .zip file
Course project
• Teams of up to 5 people
• Topic of your choice
• Can be implementing a paper
• An extension of a homework
• A project for another course with an additional machine learning component
• Your current research (with additional scope)
• Or work on a new application
• Must already have existing data! No data collection!
• Topics need to be pre-approved
• Details about the procedure TBA
Why study machine learning?
The machine learning trend 2015

http://www.gartner.com/newsroom/id/3114217
The machine learning trend 2016

http://www.gartner.com/newsroom/id/3412017
The machine learning trend 2017

http://www.gartner.com/smarterwithgartner/top-trends-in-the-gartner-hype-cycle-for-emerging-technologies-2017/
The machine learning trend 2018

https://www.gartner.com/smarterwithgartner/5-trends-emerge-in-gartner-hype-cycle-for-emerging-technologies-2018/
The machine learning trend 2019

https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019/
The machine learning trend 2021

https://www.gartner.com/en/articles/the-4-trends-that-prevail-on-the-gartner-hype-cycle-for-ai-2021
The machine learning trend 2022

https://www.gartner.com/en/articles/what-s-new-in-the-2022-gartner-hype-cycle-for-emerging-technologies
The machine learning trend 2023

https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2023-gartner-hype-cycle
• “If I were to guess like what our biggest existential threat is, it’s probably that. So we need to be very careful with the artificial intelligence. There should be some regulatory oversight maybe at the national and international level, just to make sure that we don’t do something very foolish.”
• “I think people who are naysayers and try to drum up these doomsday scenarios — I just, I don’t understand it. It’s really negative and in some ways I actually think it is pretty irresponsible”
Poll
What is Pattern Recognition?
• “Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data, although it is in some cases considered to be nearly synonymous with machine learning.” — Wikipedia

• What about
• AI
• Data mining
• Knowledge Discovery in Databases (KDD)
• Statistics
• Data science
What is AI?
• Classical definition
• A system that appears intelligent
• Populace definition: probably what the field means right now
• ML
• Narrow AI
• Specialized

https://techsauce.co/pr-news/tcas-use-cloud-computing-and-ai-for-admission
Artificial General Intelligence (AGI)
• “hypothetical ability of an intelligent agent to understand or learn any intellectual task that a human being can.” — Wikipedia
• Can continue to learn new skills on its own.
• Probably not of much interest besides philosophical debates
• Work is done in baby steps
ML vs PR vs DM vs KDD
• “The short answer is: None. They are … concerned with the same question: how do we learn from data?”
Larry Wasserman – CMU Professor

• Nearly identical tools and subject matter
History
• Pattern Recognition started in the engineering community (mainly Electrical Engineering and Computer Vision)
• Machine learning came out of AI and is mostly considered a Computer Science subject
• Data mining started in the database community
Distinguishing things
• DM – data warehouses, ETL
• AI – search, swarm intelligence
• PR – signal processing (feature engineering)

http://www.deeplearningbook.org/
Different terminologies
http://statweb.stanford.edu/~tibs/stat315a/glossary.pdf
Merging communities and fields
• With the advent of deep learning, the fields are merging and the differences are becoming unclear
Course philosophy
• Going beyond the black box
• In this course you will
• Understand models on a deeper level
• Implement stuff from scratch
The danger zone

Driving a car analogy
- Just driving without knowing where you are going
- Getting there vs. getting there effectively
- Putting the wrong fuel into the car
Be better than autoML

https://towardsdatascience.com/ocr-for-scanned-numbers-using-googles-automl-vision-29d193070c64
Types of machine learning
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

0. Pre-machine learning: rule-based


Pre-machine learning: 7-segment display
• Input: 7 binary values (0,1) forming a display
• Given x = (A, B, C, D, E, F, G)
• Output: y, either 0, 1, …, 9 or not a number
• Task: write a program (a function F) that maps x to y; F(x) = y

Image from http://www.physics.udel.edu/~watson/scen103/colloq2000/7-seg.html
Mapping function
[Figure: a 7-segment display mapping input X (segment states) to output Y (digit). Image from http://www.instructables.com/id/DIY-7-Segment-Display/]

Mapping function

• IF A==1 && B==1 && C==1 && D==1 && E==1 && F==1 && G==0, THEN output(0).
• IF B==1 && C==1, THEN output(1)
• …..
• OTHERWISE, output(“not number”)
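
As an illustration, here is a minimal Python sketch of such a rule-based program, written as a lookup table instead of a chain of IF-rules (the patterns assume the standard 7-segment encoding; the names are ours, not from the slide):

    # Rule-based mapping F(x) = y for a 7-segment display.
    # x = (A, B, C, D, E, F, G), each 1 (segment on) or 0 (segment off).
    PATTERNS = {
        (1, 1, 1, 1, 1, 1, 0): 0,
        (0, 1, 1, 0, 0, 0, 0): 1,
        (1, 1, 0, 1, 1, 0, 1): 2,
        (1, 1, 1, 1, 0, 0, 1): 3,
        (0, 1, 1, 0, 0, 1, 1): 4,
        (1, 0, 1, 1, 0, 1, 1): 5,
        (1, 0, 1, 1, 1, 1, 1): 6,
        (1, 1, 1, 0, 0, 0, 0): 7,
        (1, 1, 1, 1, 1, 1, 1): 8,
        (1, 1, 1, 1, 0, 1, 1): 9,
    }

    def f(x):
        """Return the digit for segment tuple x, or "not number"."""
        return PATTERNS.get(tuple(x), "not number")

    print(f((1, 1, 1, 1, 1, 1, 0)))  # 0
    print(f((0, 1, 1, 0, 0, 0, 0)))  # 1
    print(f((1, 0, 1, 0, 1, 0, 1)))  # not number

Every input is handled by an explicit hand-written rule; nothing is learned from data, which is exactly what the following slides contrast with.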
Learning from data
• Machine learning requires identifying the same ingredients
• Input, Output, Task

Real world observations

This is the hardest part of data science and the last part to be replaced by machines.

https://cloud.google.com/blog/products/gcp/how-a-japanese-cucumber-farmer-is-using-deep-learning-and-tensorflow
An example
• Handwritten digit recognition
• Input: x = 28 x 28 pixel image
• Output: y = digit 0 to 9
• Task: find F(x) such that y ≈ F(x)

The goal of machine learning is to find the best F(x) automatically from data.
Supervised learning
• Learn a classifier F from a training set (input-output pairs)
• {(x1, y1), (x2, y2), (x3, y3), …, (xn, yn)}

Need a training set for training.
Training = finding (optimizing) a good function f.
Labeling (i.e., assigning y for each x in the training set) is typically done manually.
[Figure: handwritten digit images x paired with labels y = 0, 1, 2]
Types of machine learning
1. Supervised learning
Learn a model F from pairs of (x,y)
2. Unsupervised learning
Discover the hidden structure in unlabeled data x (no y)
3. Reinforcement learning
Train an agent to take appropriate actions in an environment by
maximizing rewards
Typical workflow of machine learning
1. Feature extraction (getting the x)
2. Modeling
• Training (getting the function F)
3. Evaluation
• Metrics (defining what’s the best function F)
• Testing (getting the y for unseen inputs)
Typical workflow of machine learning
• The typical workflow:

[Figure: real-world observations → sensors → feature extraction → feature vector x, e.g., (1, 5, 3.6, 1, 3, -1)]
How do we learn from data?

[Figure, training phase: a training set of feature vectors x with desired outputs y is fed to a learning algorithm, which produces a model h]
How do we learn from data?

[Figure, testing phase: a new input x is fed to the trained model h, which produces a predicted output y]
Feature extraction
• The process of extracting meaningful information related
to the goal
• A distinctive characteristic or quality
• Example features

[Figure: example features for data1, data2, data3]
Garbage in, garbage out
• The machine is only as intelligent as the data/features we put in
• “Garbage in, garbage out”
• Data cleaning is often done to reduce unwanted things
https://precisionchiroco.com/garbage-in-garbage-out/
The need for data cleaning

However, good models should be able to handle some dirtiness!

https://www.linkedin.com/pulse/big-data-conundrum-garbage-out-other-challenges-business-platform
Feature properties
• The quality of the feature vector is related to its ability to discriminate samples from different classes

• In this course, we won't talk much about data/feature issues, since these are domain specific. However, they can be more important than modeling.
Model evaluation

How do we compare h1 and h2?

[Figure: the same new input x fed to two models h1 and h2 in the testing phase, each producing its own predicted output y]
Metrics
• Compare the outputs of the models
• Errors/failures, accuracy/successes
• We want to quantify the error/accuracy of the models
• How would you measure the error/accuracy of the following?
Ground truths
• We usually compare the model's predicted answer with the correct answer.
• What if there is no real answer?
• How would you rate machine translation?

ไปไหน (Thai: “Where to?”)

Model A: Where are you going?
Model B: Where to?

Designing a metric can be tricky, especially when it's subjective.
Ground truths can be hard

Slides from https://github.com/goldmermaid/mlrs


Labelling
Labelling issues
Need labelled data
Metrics consideration 1
• Are there several metrics?

• Use the metric closest to your goal, but never disregard the other metrics.
• They may help identify possible improvements
Metrics consideration 2
• Are there sub-metrics?

http://www.ustar-consortium.com/qws/slot/u50227/research.html
Commonly used metrics
• Error rate
• Accuracy rate

• Precision
• True positive
• Recall
• False alarm
• F score
A detection problem
• Identify whether an event occurs
• A yes/no question
• A binary classifier
Smoke detector

Hotdog detector
Evaluating a detection problem
• 4 possible scenarios

                   Detector: Yes                  Detector: No
Actual: Yes        True positive                  False negative (Type II error)
Actual: No         False alarm (Type I error)     True negative

True positives + False negatives = # of actual yes
False alarms + True negatives = # of actual no
• The false alarm and true positive counts carry all the information about the performance.
Definitions (using the same table)

• True positive rate (recall, sensitivity) = # true positives / # of actual yes
• False positive rate (false alarm rate) = # false positives / # of actual no
• False negative rate (miss rate) = # false negatives / # of actual yes
• True negative rate (specificity) = # true negatives / # of actual no

• Precision = # true positives / # of predicted positives
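
As a sketch, these definitions translate directly into code (pure Python; the counts in the example are made up for illustration):

    def detection_metrics(tp, fp, fn, tn):
        """Compute the rates above from raw confusion counts."""
        recall      = tp / (tp + fn)  # true positive rate, sensitivity
        false_alarm = fp / (fp + tn)  # false positive rate
        miss_rate   = fn / (tp + fn)  # false negative rate = 1 - recall
        specificity = tn / (fp + tn)  # true negative rate = 1 - false alarm rate
        precision   = tp / (tp + fp)  # fraction of predicted positives that are correct
        return recall, false_alarm, miss_rate, specificity, precision

    # Example: 8 actual-yes events (6 detected, 2 missed), 20 actual-no (3 false alarms).
    print(detection_metrics(tp=6, fp=3, fn=2, tn=17))
    # (0.75, 0.15, 0.25, 0.85, 0.666...)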


Search engine example

A recall of 50% means? (Half of all the relevant documents were retrieved.)

A precision of 50% means? (Half of the retrieved documents are relevant.)
Recall/precision
• When do you want high recall?
• When do you want high precision?

• Initial screening for cancer
• Face recognition system for authentication
• Detecting possible suicidal postings on social media
• COVID screening: ATK vs PCR

Usually there's a trade-off between precision and recall. We will revisit this later.
Let’s consider a case
• A: a “no rain” predictor has 97% accuracy
• It always says no rain.

                   Predicted: Rain    Predicted: No rain
Actual: Rain       0                  1
Actual: No rain    0                  30

(Accuracy = 30/31 ≈ 97%, yet it never detects rain.)

• Accuracy might not be a good metric for biased (imbalanced) data
• A good model should be better than stupid baselines
Definitions 2
• F score (F1 score, F-measure)

• A single measure that combines both aspects
• The harmonic mean of precision and recall (an average of rates)

Note that precision and recall say nothing about the true negatives.
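
For reference, the standard harmonic-mean formula (the formula itself is not shown on the slide):

    F1 = 2 · precision · recall / (precision + recall)

For example, precision = 0.5 and recall = 1.0 give F1 = 1.0 / 1.5 ≈ 0.67, whereas the arithmetic mean would be 0.75; the harmonic mean is pulled toward the weaker of the two rates.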
Evaluating models
• We talked about the training set used to learn the model

• We use a different data set to test the accuracy/error of models – the “test set”

• We can still compute the error and accuracy on the training set

• Training error vs testing error

• We will discuss how we can use these to help guide us later
Other considerations when evaluating models
• Training time
• Testing time
• Memory requirement
• Parallelizability
• Latency
Course walkthrough

Traditional machine learning

Deep learning

Why anything else besides deep learning?

https://medium.com/analytics-vidhya/ongoing-kaggle-survey-picks-the-topmost-data-science-trends-7c19ec7606a1
KNN and K-means clustering

Our first model - unsupervised learning
Discover the hidden structure in unlabeled data X (no y)
• Customer/product segmentation
• Data analysis for ...
• Identify the number of speakers in a meeting recording
• Helps supervised learning in some tasks
Example - Customer analysis

[Scatter plot: customers plotted by brand loyalty vs. price sensitivity]
Example - Real Estate segmentation in Thailand
What should be the input feature for this?

[Figure: real-estate segmentation maps of Thailand]
K-means clustering
Clustering - a task that tries to automatically discover groups within the data

Grouping by hand is too hard…
[Scatter plot: brand loyalty vs. price sensitivity]
K-means clustering
Clustering - a task that tries to automatically discover groups within the data

Easier if we know the grouping beforehand (supervised): which cluster does a query point belong to? How?
[Scatter plot: brand loyalty vs. price sensitivity, with an unlabeled query point]
Nearest Neighbour classification
Find the closest training data point and assign the query the same label as that training point

Given query data:
  For every point in the training data, compute the distance to the query
  Assign the label of the point with the smallest distance

[Scatter plot: brand loyalty vs. price sensitivity – which cluster does the query belong to?]
K-Nearest Neighbour (kNN) classification
Nearest Neighbour is susceptible to noise in the training data
Use a voting scheme instead

[Scatter plot: a single noisy training point near the query can flip the nearest-neighbour label]
K-Nearest Neighbour (kNN) classification
Nearest Neighbour is susceptible to noise in the training data
Use a voting scheme instead (e.g., k = 4)

Given query data:
  For every point in the training data, compute the distance to the query
  Find the k closest points and assign the label by voting

The votes can be weighted by the inverse distance (weighted k-NN)
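
A minimal Python sketch of (weighted) k-NN following these steps, assuming Euclidean distance and a toy two-feature training set (in practice one might use sklearn.neighbors.KNeighborsClassifier):

    import math
    from collections import defaultdict

    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def knn_predict(train, query, k=4, weighted=False):
        # Distance from the query to every training point: O(N).
        neighbors = sorted(train, key=lambda xy: euclidean(xy[0], query))[:k]
        votes = defaultdict(float)
        for x, y in neighbors:
            # Plain voting, or inverse-distance weighting for weighted k-NN.
            votes[y] += 1.0 / (euclidean(x, query) + 1e-9) if weighted else 1.0
        return max(votes, key=votes.get)

    train = [((0.9, 0.8), "loyal"), ((0.8, 0.9), "loyal"),
             ((0.1, 0.2), "price-sensitive"), ((0.2, 0.1), "price-sensitive")]
    print(knn_predict(train, (0.7, 0.7), k=3, weighted=True))  # loyal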
Closest?
We need some kind of distance or similarity measure
F(x1, x2) = d

Euclidean distance
Cosine similarity = cos(angle between the vectors)
Many more distances: Jaccard distance, Earth mover's distance
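
Sketches of the two named measures in pure Python (numpy/scipy offer the same computations):

    import math

    def euclidean(x1, x2):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

    def cosine_similarity(x1, x2):
        dot = sum(a * b for a, b in zip(x1, x2))
        norm1 = math.sqrt(sum(a * a for a in x1))
        norm2 = math.sqrt(sum(b * b for b in x2))
        return dot / (norm1 * norm2)  # cos of the angle between x1 and x2

    print(euclidean((0, 0), (3, 4)))          # 5.0
    print(cosine_similarity((1, 0), (1, 1)))  # ~0.707 = cos(45 degrees)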
KNN runtime
For every point in the training data, compute the distance to the query
Find the K closest data points
Assign the label by voting

O(N) per query
O(JN) if we have J queries
Expensive!

Ways to make it faster:
Kernelized KNN
Locality-Sensitive Hashing (LSH)
Use centroids
Centroids
Basically, the representative of the cluster
Find the mean location of the cluster by averaging its points
Can use the mode or median instead, depending on the data

Comparing a query against the centroids costs O(JL), where L is the number of clusters

[Scatter plot: brand loyalty vs. price sensitivity, each cluster represented by its centroid]
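
A sketch of nearest-centroid classification on toy data (the centroid here is the coordinate-wise mean; names are illustrative):

    def centroid(points):
        """Coordinate-wise mean of a list of points."""
        n = len(points)
        return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

    def nearest_centroid(centroids, query):
        """Label of the closest centroid; only L comparisons per query."""
        dist2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(centroids, key=lambda label: dist2(centroids[label], query))

    cents = {"loyal": centroid([(0.9, 0.8), (0.8, 0.9)]),
             "price-sensitive": centroid([(0.1, 0.2), (0.2, 0.1)])}
    print(nearest_centroid(cents, (0.7, 0.6)))  # loyal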
K-means clustering
1. Randomly initialize k centroids by picking from the data points
2. Assign each data point to its nearest centroid
3. Update the centroid of each cluster
4. Repeat 2-3 until the centroids do not change

[Scatter plot: brand loyalty vs. price sensitivity]
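
A minimal pure-Python sketch of steps 1-4 (for real work use sklearn.cluster.KMeans; the function and variable names here are ours):

    import random

    def kmeans(points, k, max_iter=100, seed=0):
        rng = random.Random(seed)
        centroids = rng.sample(points, k)           # 1. init by picking data points
        clusters = [[] for _ in range(k)]
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:                         # 2. assign to nearest centroid
                j = min(range(k),
                        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                clusters[j].append(p)
            new_centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl))
                             if cl else centroids[j]
                             for j, cl in enumerate(clusters)]  # 3. update centroids
            if new_centroids == centroids:           # 4. stop when unchanged
                break
            centroids = new_centroids
        return centroids, clusters

    pts = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
    print(kmeans(pts, k=2)[0])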
An Illustration of K-Means Clustering

Randomly select K=3 centroids → assign points to the nearest centroid → update centroids → update point assignments → update centroids → …

From https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Characteristics of K-means
▪ The number of clusters, K, is specified in advance.
▪ Always converges to a (local) minimum.
• Poor starting centroid locations can lead to poor minima.

Image from https://en.wikipedia.org/wiki/K-means_clustering

▪ The model has several implicit assumptions:
• Data points scatter around the cluster centers.
• The boundary between adjacent clusters is always halfway between the cluster centroids.
Effect of bad initializations

Solution: try different random initializations and pick the best one.

From https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Clustering Metrics?
Selecting K - using the elbow method

[Figure: K-means runs for K = 1, 2, 3, 4 with the all-data centroid marked. From https://www.naftaliharris.com/blog/visualizing-k-means-clustering/]

fraction of explained variance = between-cluster variance / all-data variance

(distances here are Euclidean)

[Plot: fraction of explained variance vs. number of clusters K]
The elbow method chooses the K where increasing model complexity doesn't yield much additional explained variance in return.
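
A sketch of the elbow computation built on the kmeans() sketch above (toy data; sklearn exposes the complementary within-cluster sum of squares as inertia_):

    def explained_variance_fraction(points, centroids, clusters):
        """Between-cluster variance divided by all-data variance."""
        n = len(points)
        grand = tuple(sum(dim) / n for dim in zip(*points))  # all-data centroid
        total = sum(sum((a - g) ** 2 for a, g in zip(p, grand)) for p in points)
        between = sum(len(cl) * sum((c - g) ** 2 for c, g in zip(cent, grand))
                      for cent, cl in zip(centroids, clusters))
        return between / total

    pts = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.5, 0.9)]
    for k in range(1, 5):
        cents, cls = kmeans(pts, k)
        print(k, round(explained_variance_fraction(pts, cents, cls), 3))
    # Pick the K where the fraction stops improving much (the "elbow").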
Selecting K - other methods

95% explained variance:
[Plot: fraction of explained variance vs. number of clusters K]
Choose the minimal K that explains at least 95% of the all-data variance.

Cross-validation:
Train a K-means clustering model for each K, then evaluate on held-out testing data.

K    Accuracy
2    50%
3    68%
4    83%

Choose the K that maximizes a certain objective (e.g., accuracy on testing data). This is the best method.
Summary
• Other clustering methods
• K-mode, K-median
• Spectral clustering (clustering in an embedding space)
• DBSCAN (clustering by “density” – very robust, no need to choose K)

https://github.com/NSHipster/DBSCAN
