Machine Learning - Classification
CS102, Winter 2019
Big Data Tools and Techniques
§ Basic Data Manipulation and Analysis
Performing well-defined computations or asking
well-defined questions (“queries”)
§ Data Mining
Looking for patterns in data
§ Machine Learning
Using data to build models and make predictions
§ Data Visualization
Graphical depiction of data
§ Data Collection and Preparation
Regression
Using data to build models and make predictions
§ Supervised
§ Training data, each example:
• Set of predictor values - “independent variables”
• Numerical output value - “dependent variable”
§ Model is function from predictors to output
• Use model to predict output value for new
predictor values
§ Example
• Predictors: mother height, father height, current age
• Output: height
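As a concrete illustration of this setup, here is a minimal regression sketch in Python with scikit-learn; the column order and all numbers are invented for illustration, not real training data:

# Sketch: regression model from predictor values to a numerical output.
# Training rows are [mother_height_cm, father_height_cm, current_age];
# the output is height_cm. All numbers are made up for illustration.
from sklearn.linear_model import LinearRegression

X_train = [[160, 175, 30],
           [155, 180, 25],
           [170, 182, 40],
           [162, 170, 35]]
y_train = [172, 168, 181, 169]

model = LinearRegression().fit(X_train, y_train)
print(model.predict([[165, 178, 28]]))  # predicted height for new predictor values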
Classification
Using data to build models and make predictions
§ Supervised
§ Training data, each example:
• Set of feature values – numeric or categorical (“independent variables”)
• Categorical label (“dependent variable”)
§ Model is “function” (method) from feature values to label
• Use model to predict label for new feature values
§ Example
• Feature values: age, gender, income, profession
• Label: buyer or non-buyer
Other Examples
Medical diagnosis
• Feature values: age, gender, history,
symptom1-severity, symptom2-severity,
test-result1, test-result2
• Label: disease
Email spam detection
• Feature values: sender-domain, length,
#images, keyword1, keyword2, …, keywordN
• Label: spam or not-spam
Credit card fraud detection
• Feature values: user, location, item, price
• Label: fraud or okay
Algorithms for Classification
Despite similarity of problem statement to
regression, non-numerical nature of classification
leads to completely different approaches
§ K-nearest neighbors
§ Decision trees
§ Naïve Bayes
§ … and others
K-Nearest Neighbors (KNN)
For any pair of data items i1 and i2, from their
feature values compute distance(i1,i2)
Example:
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305)
person2 = (female, teacher, 43, $28K, 94309)
distance(person1, person2)
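The slides don't fix a particular distance for mixed categorical and numeric features; one simple possibility (an assumption for illustration, including the scaling constants) is:

# Sketch of one possible distance for mixed features: categorical
# mismatches contribute 1, numeric features contribute a scaled
# absolute difference. The scaling choices here are arbitrary.
def distance(p1, p2):
    gender1, prof1, age1, income1, postal1 = p1
    gender2, prof2, age2, income2, postal2 = p2
    d = 0.0
    d += 0 if gender1 == gender2 else 1
    d += 0 if prof1 == prof2 else 1
    d += abs(age1 - age2) / 100.0           # ages assumed to span roughly 100 years
    d += abs(income1 - income2) / 100000.0  # incomes assumed to span roughly $100K
    d += 0 if postal1 == postal2 else 1     # postal code treated as categorical
    return d

person1 = ("male", "teacher", 47, 25000, "94305")
person2 = ("female", "teacher", 43, 28000, "94309")
print(distance(person1, person2))  # 1 (gender) + 0.04 (age) + 0.03 (income) + 1 (postal) = 2.07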
K-Nearest Neighbors (KNN)
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305) buyer
person2 = (female, teacher, 43, $28K, 94309) non-buyer
Remember training data has labels
To classify a new item i : In the labeled data find
the K closest items to i, assign most frequent label
person3 = (female, doctor, 40, $40K, 95123)
KNN Example
§ City temperatures – France and Germany
§ Features: longitude, latitude
§ Distance is Euclidean distance
distance([o1,a1],[o2,a2]) = sqrt((o1−o2)² + (a1−a2)²)
= actual distance in x-y plane
§ Labels: frigid, cold, cool, warm, hot
[Figure: French and German cities plotted by longitude and latitude with their temperature labels]
KNN Summary
To classify a new item i : find K closest items to i
in the labeled data, assign most frequent label
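A minimal KNN sketch in Python with scikit-learn, in the spirit of the city-temperature example; the coordinates and labels below are invented placeholders, not the actual dataset:

# Sketch: KNN on (longitude, latitude) features with Euclidean distance.
# Training points and labels are made up for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[2.35, 48.85], [5.37, 43.30], [13.40, 52.52], [11.58, 48.14]]
y_train = ["cool", "warm", "cold", "cool"]

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn.fit(X_train, y_train)
print(knn.predict([[7.75, 48.58]]))  # most frequent label among the 3 closest points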
“Regression” Using KNN
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305) $250
person2 = (female, teacher, 43, $28K, 94309) $100
Remember training data has labels – here numeric values rather than categories
To predict a value for a new item i, find K closest items to i
in the labeled data, assign the average value of their labels
person3 = (female, doctor, 40, $40K, 95123)
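A matching sketch for regression using KNN, again with scikit-learn and made-up numbers; the prediction is the average label value of the K nearest items:

# Sketch: KNN "regression" - predict a numeric value (e.g., dollars spent)
# as the average over the K nearest neighbors. Data is made up.
from sklearn.neighbors import KNeighborsRegressor

X_train = [[47, 25000], [43, 28000], [50, 60000], [35, 45000]]  # [age, income]
y_train = [250, 100, 400, 180]                                  # numeric "labels"

knn_reg = KNeighborsRegressor(n_neighbors=2)  # K = 2
knn_reg.fit(X_train, y_train)
print(knn_reg.predict([[40, 40000]]))  # average value of the 2 closest items' labels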
Regression Using KNN - Example
[Figure omitted]
Decision Trees
§ Use the training data to construct a decision tree
§ Use the decision tree to classify new data
Decision Trees
Nodes: features (with apologies for binary gender)
Edges: feature values
Leaves: labels
[Example tree: Gender at the root; the male branch continues with Age (<20, 20-50, >50) and the female branch with Income (<$100K, ≥$100K); some branches test further features such as Profession or Postal Code; leaves carry the labels Buyer and Non-Buyer]
Decision Trees
Primary challenge is building good decision
trees from training data
• Which features and feature values to use
at each choice point
• HUGE number of possible trees even with
small number of features and values
Common approach: “forest” of many trees,
combine the results
• Still impossible to consider all trees
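A hedged sketch of training a single tree and a forest with scikit-learn; the feature encoding and data are hypothetical:

# Sketch: fit one decision tree and a random forest ("forest" of many trees
# whose results are combined). Rows are [gender_encoded, age, income];
# all values are made up for illustration.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X_train = [[0, 25, 40000], [1, 45, 120000], [0, 60, 80000], [1, 30, 30000]]
y_train = ["non-buyer", "buyer", "buyer", "non-buyer"]

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(tree.predict([[0, 50, 110000]]))
print(forest.predict([[0, 50, 110000]]))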
Naïve Bayes
Given new data item i, based on i’s feature values
and the training data, compute the probability of
each possible label. Pick highest one.
Efficiency relies on conditional independence
assumption:
Given any two features F1,F2 and a label L, the
probability that F1=v1 for an item with label L is
independent of the probability that F2=v2 for that
item
Examples:
gender and age? income and postal code?
The conditional independence assumption often doesn't hold, which is why the approach is “naïve”. Nevertheless, the approach works very well in practice.
Naïve Bayes Example
Predict temperature category for a country based
on whether the country has a coastline and
whether it is in the EU
Naïve Bayes Preparation
Step 1: Compute fraction (probability) of items in
each category
cold .18
cool .38
warm .24
hot .20
Naïve Bayes Preparation
Step 2: For each category and each feature value, compute
the fraction of items in that category having that value
cold (.18): coastline=yes .83, coastline=no .17, EU=yes .67, EU=no .33
cool (.38): coastline=yes .69, coastline=no .31, EU=yes .77, EU=no .23
warm (.24): coastline=yes .5, coastline=no .5, EU=yes .5, EU=no .5
hot (.20): coastline=yes 1.0, coastline=no .0, EU=yes .71, EU=no .29
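A small sketch of both preparation steps and the prediction rule in Python with pandas; the tiny table and the column names ("temp", "coastline", "EU") are assumptions for illustration, not the actual country data:

# Sketch: Naive Bayes preparation and prediction from a labeled table.
# The DataFrame below is a tiny made-up stand-in for the country data.
import pandas as pd

df = pd.DataFrame({
    "temp":      ["cold", "cool", "cool", "warm", "hot"],
    "coastline": ["no",   "yes",  "yes",  "no",   "yes"],
    "EU":        ["yes",  "yes",  "no",   "no",   "yes"],
})

def predict_temp(data, coastline, eu):
    scores = {}
    for label, group in data.groupby("temp"):
        prior = len(group) / len(data)                      # Step 1: fraction per category
        p_coast = (group["coastline"] == coastline).mean()  # Step 2: fraction with this coastline value
        p_eu = (group["EU"] == eu).mean()                   # Step 2: fraction with this EU value
        scores[label] = prior * p_coast * p_eu              # product of probabilities
    return max(scores, key=scores.get), scores

print(predict_temp(df, coastline="yes", eu="yes"))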
Naïve Bayes Prediction
New item: France, coastline=yes, EU=yes
For each category: probability of category times
product of probabilities of new item’s features
in that category. Pick highest.
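Working this out from the Step 1 and Step 2 fractions above (values rounded):
cold: .18 × .83 × .67 ≈ .10
cool: .38 × .69 × .77 ≈ .20
warm: .24 × .5 × .5 = .06
hot: .20 × 1.0 × .71 ≈ .14
Highest product: cool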
Naïve Bayes Prediction
New item: Serbia, coastline=no, EU=no
For each category: probability of category times
product of probabilities of new item’s features
in that category. Pick highest.
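Working this out from the Step 1 and Step 2 fractions above (values rounded):
cold: .18 × .17 × .33 ≈ .01
cool: .38 × .31 × .23 ≈ .03
warm: .24 × .5 × .5 = .06
hot: .20 × .0 × .29 = .0
Highest product: warm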
Naïve Bayes Prediction
New item: Austria, coastline=no, EU=yes
For each category: probability of category times
product of probabilities of new item’s features
in that category. Pick highest.
category   prob.   coastline=no   EU=yes   product
cold       .18     .17            .67      .02
cool       .38     .31            .77      .09
warm       .24     .5             .5       .06
hot        .20     .0             .71      .0
Highest product: cool

Note: many presentations of Naïve Bayes include an additional normalization step so the final products are probabilities that sum to 1.0. The choice of label is unchanged, so we've omitted that step for simplicity.
Feature Selection
Real applications often have thousands of features
§ Naïve Bayes typically uses only some of the
features, those most affecting the label
§ Decision trees also rely on choosing features
that most affect the label
§ Feature selection is a key part of machine
learning – an art and a science
Training and Test
You've created a machine learning model from training data.
How do you know whether it's a good model?
§ Try it on known data
[Diagram: a held-out table of feature values and labels, labeled “Test Data”]
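A minimal sketch of this train/test split in Python with scikit-learn; the data and the choice of KNN as the model are made up for illustration:

# Sketch: hold out part of the labeled data as "test data", fit on the rest,
# and measure accuracy on the held-out part. All values are made up.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X = [[2.35, 48.85], [5.37, 43.30], [13.40, 52.52], [11.58, 48.14],
     [7.75, 48.58], [3.88, 43.61], [9.99, 53.55], [4.83, 45.76]]
y = ["cool", "warm", "cold", "cool", "cool", "warm", "cold", "cool"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))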
Other Terms You Might Hear
Logistic regression
• Recall regression model is function f from
predictor values to numeric output value
• For classification: from training data obtain one
regression function fL for each label L
fL(feature-values) = probability of item having label L
Support Vector Machine
• Two labels only (“binary classifier”)
• Features = multidimensional space
• From training data SVM finds
hyper-plane that best divides
space according to labels
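A brief sketch of both in scikit-learn on made-up two-label data; predict_proba gives the per-label probability described above:

# Sketch: logistic regression (per-label probabilities) and a linear SVM
# (separating hyperplane) on the same made-up buyer/non-buyer data.
# Rows are [age, income in $K]; all values are for illustration only.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X_train = [[25, 30], [45, 120], [60, 80], [30, 35], [50, 90]]
y_train = ["non-buyer", "buyer", "buyer", "non-buyer", "buyer"]

logreg = LogisticRegression().fit(X_train, y_train)
print(logreg.predict_proba([[40, 70]]))  # probability of each label

svm = SVC(kernel="linear").fit(X_train, y_train)
print(svm.predict([[40, 70]]))  # which side of the hyperplane the item falls on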
Other Terms You Might Hear
Deep Learning
• Complex, mysterious (the ultimate “black box”
software), becoming extremely popular
• Multiple layers, each layer uses classification
techniques to reduce complexity for next layer
and further classification
• Important plus: identifies features from raw data
Neural Network
• Precursor to deep learning, typically two layers
• Leap to deep learning enabled by massive
amounts of data, powerful computing
Classification Summary
§ Supervised machine learning
§ Training data, each example:
• Set of feature values – numeric or categorical
• Categorical output value – label
§ Model is “function” from feature values to label
• Use model to predict label for new feature values
§ Approaches we covered
• K-nearest neighbors – relies on distance (or
similarity) function
• Decision trees – relies on finding good trees/forests
• Naïve Bayes – relies on conditional independence
assumption