
Machine Learning Crash Course: Part I

Ariel Kleiner
August 21, 2012

Machine learning exists at the intersection of
computer science and statistics.
Examples
• Spam filters
• Search ranking
• Click (and clickthrough rate) prediction
• Recommendations (e.g., Netflix, Facebook friends)
• Speech recognition
• Machine translation
• Fraud detection
• Sentiment analysis
• Face detection, image classification
• Many more
A Variety of Capabilities
• Classification
• Regression
• Ranking
• Clustering
• Dimensionality reduction
• Feature selection
• Structured probabilistic modeling
• Collaborative filtering
• Active learning and experimental design
• Reinforcement learning
• Time series analysis
• Hypothesis testing
• Structured prediction
For Today

Classification
Clustering

(with emphasis on implementability and scalability)
Typical Data Analysis Workflow

Obtain and load raw data
Data exploration
Preprocessing and featurization
Learning
Diagnostics and evaluation
Classification
• Goal: Learn a mapping from entities to discrete labels.
– Refer to entities as x and labels as y.
• Example: spam classification
– Entities are emails.
– Labels are {spam, not-spam}.
– Given past labeled emails, want to predict whether a new email is spam or not-spam.
Classification
• Examples
– Spam filters
– Click (and clickthrough rate) prediction
– Sentiment analysis
– Fraud detection
– Face detection, image classification
Classification
Given a labeled dataset (x1, y1), ..., (xN, yN):
1. Randomly split the full dataset into two disjoint parts:
– A larger training set (e.g., 75%)
– A smaller test set (e.g., 25%)
2. Preprocess and featurize the data.
3. Use the training set to learn a classifier.
4. Evaluate the classifier on the test set.
5. Use the classifier to predict in the wild.
Classification

[Diagram: the full dataset is split into a training set and a test set; the training set is used to learn a classifier, whose accuracy is measured on the test set; a new entity is then given to the classifier to produce a prediction.]
Example: Spam Classification

From: [email protected]
"Eliminate your debt by giving us your money..." → spam

From: [email protected]
"Hi, it's been a while! How are you? ..." → not-spam
Featurization
• Most classifiers require numeric descriptions of entities as input.
• Featurization: Transform each entity into a vector of real numbers.
– Straightforward if the data are already numeric (e.g., patient height, blood pressure, etc.)
– Otherwise, some effort is required. But, this provides an opportunity to incorporate domain knowledge.
Featurization: Text
• Often use "bag of words" features for text.
– Entities are documents (i.e., strings).
– Build vocabulary: determine the set of unique words in the training set. Let V be the vocabulary size.
– Featurization of a document:
• Generate a V-dimensional feature vector.
• Cell i in the feature vector has value 1 if the document contains word i, and 0 otherwise.
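As a rough sketch of this featurization (the function names here are my own, not from the slides):

```python
def build_vocabulary(documents):
    """Collect the sorted set of unique words seen in the training documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})

def featurize(document, vocab):
    """Return a V-dimensional binary vector: cell i is 1 if word i appears."""
    words = set(document.lower().split())
    return [1 if word in words else 0 for word in vocab]

training_docs = ["eliminate your debt", "hi how are you"]
vocab = build_vocabulary(training_docs)            # V = 7 unique words
vector = featurize("eliminate debt today", vocab)  # unseen words are ignored
```

Note that words outside the training vocabulary (here, "today") simply drop out of the feature vector.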
Example: Spam Classification

From: [email protected]
"Eliminate your debt by giving us your money..."

From: [email protected]
"Hi, it's been a while! How are you? ..."

Vocabulary: been, debt, eliminate, giving, how, it's, money, while
Example: Spam Classification

From: [email protected]
"Eliminate your debt by giving us your money..."

Feature vector: been → 0, debt → 1, eliminate → 1, giving → 1, how → 0, it's → 0, money → 1, while → 0
Example: Spam Classification
• How might we construct a classifier?
• Using the training data, build a model that will tell us the likelihood of observing any given (x, y) pair.
– x is an email's feature vector
– y is a label, one of {spam, not-spam}
• Given such a model, to predict the label for an email:
– Compute the likelihoods of (x, spam) and (x, not-spam).
– Predict the label which gives the highest likelihood.
Example: Spam Classification
• What is a reasonable probabilistic model for (x, y) pairs?
• A baseline:
– Before we observe an email's content, can we say anything about its likelihood of being spam?
– Yes: p(spam) can be estimated as the fraction of training emails which are spam.
– p(not-spam) = 1 − p(spam)
– Call this the "class prior." Written as p(y).
Example: Spam Classification
• How do we incorporate an email's content?
• Suppose that the email were spam. Then, what would be the probability of observing its content?
Example: Spam Classification
• Example: "Eliminate your debt by giving us your money" with feature vector (0, 1, 1, 1, 0, 0, 1, 0)
• Ignoring word sequence, the probability of the email is
  p(seeing "debt" AND seeing "eliminate" AND seeing "giving" AND seeing "money" AND not seeing any other vocabulary words | email is spam)
• In feature vector notation:
  p(x1=0, x2=1, x3=1, x4=1, x5=0, x6=0, x7=1, x8=0 | email is spam)
Example: Spam Classification
• Now, to simplify, model each word in the vocabulary independently:
– Assume that (given knowledge of the class label) the probability of seeing word i (e.g., eliminate) is independent of the probability of seeing word j (e.g., money).
– As a result, the probability of the email content becomes
  p(x1=0 | spam) p(x2=1 | spam) ... p(x8=0 | spam)
  rather than
  p(x1=0, x2=1, x3=1, x4=1, x5=0, x6=0, x7=1, x8=0 | spam)
Example: Spam Classification
• Now, we only need to model the probability of seeing (or not seeing) a particular word i, assuming that we knew the email's class y (spam or not-spam).
– But, this is easy!
– To estimate p(xi = 1 | y), simply compute the fraction of emails in the set {emails in training set with label y} which contain the word i.
Example: Spam Classification
• Putting it all together:
– Based on the training data, estimate the class prior p(y).
• i.e., estimate p(spam) and p(not-spam).
– Also estimate the (conditional) probability of seeing any individual word i, given knowledge of the class label y.
• i.e., estimate p(xi = 1 | y) for each i and y
– The (conditional) probability p(x | y) of seeing an entire email, given knowledge of the class label y, is then simply the product of the conditional word probabilities.
• e.g., p(x=(0, 1, 1, 1, 0, 0, 1, 0) | y) = p(x1=0 | y) p(x2=1 | y) ... p(x8=0 | y)
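A minimal training sketch for these estimates (function names are mine; I also add Laplace smoothing, which the slides do not discuss, so that words unseen in a class do not get probability zero):

```python
from collections import Counter

def train_naive_bayes(X, y):
    """Estimate the class prior p(y) and, for each class, the per-word
    conditional probabilities p(x_i = 1 | y) from binary feature vectors."""
    n = len(y)
    priors = {label: count / n for label, count in Counter(y).items()}
    conditionals = {}
    for label in priors:
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Fraction of class-y training emails containing word i,
        # with add-one (Laplace) smoothing to avoid zero probabilities.
        conditionals[label] = [
            (sum(r[i] for r in rows) + 1) / (len(rows) + 2)
            for i in range(len(X[0]))
        ]
    return priors, conditionals

X = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]      # toy binary feature vectors
y = ["spam", "spam", "not-spam"]
priors, conditionals = train_naive_bayes(X, y)
```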
Example: Spam Classification
• Recall: we want a model that will tell us the likelihood p(x, y) of observing any given (x, y) pair.
• The probability of observing (x, y) is the probability of observing y, and then observing x given that value of y:
  p(x, y) = p(y) p(x | y)
• Example:
  p("Eliminate your debt...", spam) = p(spam) p("Eliminate your debt..." | spam)
Example: Spam Classification
• To predict the label for a new email:
– Compute log[p(x, spam)] and log[p(x, not-spam)].
– Choose the label which gives the higher value.
– We use logs above to avoid the underflow which otherwise arises in computing the p(x | y), which are products of individual p(xi | y) < 1:
  log[p(x, y)] = log[p(y) p(x | y)]
               = log[p(y) p(x1 | y) p(x2 | y) ...]
               = log[p(y)] + log[p(x1 | y)] + log[p(x2 | y)] + ...
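A sketch of this prediction rule (the toy parameter values below are invented purely for illustration):

```python
import math

def predict(x, priors, conditionals):
    """Return the label maximizing log p(y) + sum_i log p(x_i | y)."""
    best_label, best_score = None, float("-inf")
    for label, prior in priors.items():
        score = math.log(prior)
        for xi, p in zip(x, conditionals[label]):
            # p(x_i = 1 | y) = p, so p(x_i = 0 | y) = 1 - p
            score += math.log(p if xi == 1 else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

priors = {"spam": 0.5, "not-spam": 0.5}
conditionals = {"spam": [0.9, 0.8], "not-spam": [0.1, 0.2]}
```

Working in log space keeps the sums well behaved even when an email has thousands of vocabulary words, where the raw product would underflow to 0.0.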
Classification: Beyond Text
• You have just seen an instance of the Naive Bayes classifier.
• Applies as shown to any classification problem with binary feature vectors.
• What if the features are real-valued?
– Still model each element of the feature vector independently.
– But, change the form of the model for p(xi | y).
Classification: Beyond Text
• If xi is a real number, often model p(xi | y) as Gaussian with mean μiy and variance σiy²:

  p(xi | y) = 1 / (σiy √(2π)) · exp(−(xi − μiy)² / (2σiy²))

• Estimate the mean and variance for a given i, y as the mean and variance of the xi in the training set which have corresponding class label y.
• Other, non-Gaussian distributions can be used if we know more about the xi.
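A sketch of the per-feature Gaussian model (helper names are my own):

```python
import math

def estimate_mean_variance(values):
    """Mean and (population) variance of feature i over the class-y training points."""
    mu = sum(values) / len(values)
    sigma2 = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, sigma2

def gaussian_conditional(xi, mu, sigma2):
    """p(x_i | y) under a Gaussian with mean mu and variance sigma2."""
    return math.exp(-((xi - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```

In the full classifier, `gaussian_conditional` simply replaces the binary p(xi = 1 | y) terms inside the same log-sum prediction rule.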
Naive Bayes: Benefits
• Can easily handle more than two classes and different data types
• Simple and easy to implement
• Scalable
Naive Bayes: Shortcomings
• Generally not as accurate as more sophisticated methods (but still generally reasonable).
• Independence assumption on the feature vector elements
– Can instead directly model p(x | y) without this independence assumption.
• Requires us to specify a full model for p(x, y)
– In fact, this is not necessary!
– To do classification, we actually only require p(y | x), the probability that the label is y, given that we have observed entity features x.
Logistic Regression
• Recall: Naive Bayes models the full (joint) probability p(x, y).
• But, Naive Bayes actually only uses the conditional probability p(y | x) to predict.
• Instead, why not just directly model p(y | x)?
– Logistic regression does exactly that.
– No need to first model p(y) and then separately p(x | y).
Logistic Regression
• Assume that class labels are {0, 1}.
• Given an entity's feature vector x, the probability that the label is 1 is taken to be

  p(y = 1 | x) = 1 / (1 + e^(−bᵀx))

  where b is a parameter vector and bᵀx denotes a dot product.
• The probability that the label is 1, given features x, is determined by a weighted sum of the features.
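As a sketch, computing this probability (the weight values below are arbitrary examples):

```python
import math

def predict_prob(x, b):
    """p(y = 1 | x) = 1 / (1 + exp(-b.x)) for features x and weights b."""
    dot = sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-dot))
```

When the weighted sum bᵀx is 0 the probability is exactly 0.5; large positive sums push it toward 1, large negative sums toward 0.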
Logistic Regression
• This is liberating:
– Simply featurize the data and go.
– No need to find a distribution for p(xi | y) which is particularly well suited to your setting.
– Can just as easily use binary-valued (e.g., bag of words) or real-valued features without any changes to the classification method.
– Can often improve performance simply by adding new features (which might be derived from old features).
Logistic Regression
• Can be trained efficiently at large scale, but not as easy to implement as Naive Bayes.
– Trained via maximum likelihood.
– Requires use of iterative numerical optimization (e.g., gradient descent, most basically).
– However, implementing this effectively, robustly, and at large scale is non-trivial and would require more time than we have today.
• Can be generalized to the multiclass setting.
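The most basic version, stochastic gradient ascent on the log-likelihood, can be sketched as follows (learning rate and step count are arbitrary choices here; this is a toy illustration, not a robust large-scale implementation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, steps=200, lr=0.5):
    """Fit weights b by maximum likelihood via stochastic gradient ascent."""
    b = [0.0] * len(X[0])
    for _ in range(steps):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(bj * xj for bj, xj in zip(b, xi)))
            # Gradient of the log-likelihood for one example is (y - p) * x.
            for j in range(len(b)):
                b[j] += lr * (yi - p) * xi[j]
    return b

# First feature acts as a constant bias term.
X = [[1.0, 0.0], [1.0, 1.0]]
y = [0, 1]
b = train_logistic(X, y)
```

Production implementations use careful step-size schedules, regularization, and parallelism; none of that appears in this sketch.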
Other Classification Techniques
• Support Vector Machines (SVMs)
• Kernelized logistic regression and SVMs
• Boosted decision trees
• Random Forests
• Nearest neighbors
• Neural networks
• Ensembles

See The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman for more information.
Featurization: Final Comments
• Featurization affords the opportunity to
– Incorporate domain knowledge
– Overcome some classifier limitations
– Improve performance
• Incorporating domain knowledge:
– Example: in spam classification, we might suspect that the sender is important, in addition to the email body.
– So, try adding features based on the sender's email address.
Featurization: Final Comments
• Overcoming classifier limitations:
– Naive Bayes and logistic regression do not model multiplicative interactions between features.
– For example, the presence of the pair of words [eliminate, debt] might indicate spam, while the presence of either one individually might not.
– Can overcome this by adding features which explicitly encode such interactions.
– For example, can add features which are products of all pairs of bag-of-words features.
– Can also include nonlinear effects in this manner.
– This is actually what kernel methods do.
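A sketch of adding pairwise product features (the helper name is mine):

```python
from itertools import combinations

def add_pairwise_features(x):
    """Append the product of every pair of features, so a linear classifier
    can pick up interactions like [eliminate, debt] co-occurring."""
    return list(x) + [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]

# A 3-word bag-of-words vector gains 3 interaction cells (one per word pair).
expanded = add_pairwise_features([1, 1, 0])
```

For binary features, the product x[i] * x[j] is 1 exactly when both words appear, which is the interaction signal described above.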
Classification
Given a labeled dataset (x1, y1), ..., (xN, yN):
1. Randomly split the full dataset into two disjoint parts:
– A larger training set (e.g., 75%)
– A smaller test set (e.g., 25%)
2. Preprocess and featurize the data.
3. Use the training set to learn a classifier.
4. Evaluate the classifier on the test set.
5. Use the classifier to predict in the wild.
Classification

[Diagram: the full dataset is split into a training set and a test set; the training set is used to learn a classifier, whose accuracy is measured on the test set; a new entity is then given to the classifier to produce a prediction.]
Classifier Evaluation
• How do we determine the quality of a trained classifier?
• Various metrics for quality; the most common is accuracy.
• How do we determine the probability that a trained classifier will correctly classify a new entity?
Classifier Evaluation
• Cannot simply evaluate a classifier on the same dataset used to train it.
– This will be overly optimistic!
• This is why we set aside a disjoint test set before training.
Classifier Evaluation
• To evaluate accuracy:
– Train on the training set without exposing the test set to the classifier.
– Ignoring the (known) labels of the data points in the test set, use the trained classifier to generate label predictions for the test points.
– Compute the fraction of predicted labels which are identical to the test set's known labels.
• Other, more sophisticated evaluation methods are available which make more efficient use of data (e.g., cross-validation).
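A sketch of the split-and-score procedure (function names are mine; the fixed seed just makes the toy example reproducible):

```python
import random

def split_train_test(examples, test_fraction=0.25, seed=0):
    """Randomly split labeled examples into disjoint training and test sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predictions, labels):
    """Fraction of predicted labels identical to the known test labels."""
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)

train, test = split_train_test(list(range(100)))  # 75 / 25 split
```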
