
Machine Learning - Classification

CS102
Winter 2019

Big Data Tools and Techniques
§ Basic Data Manipulation and Analysis
Performing well-defined computations or asking
well-defined questions (“queries”)
§ Data Mining
Looking for patterns in data
§ Machine Learning
Using data to build models and make predictions
§ Data Visualization
Graphical depiction of data
§ Data Collection and Preparation

Regression
Using data to build models and make predictions
§ Supervised
§ Training data, each example:
• Set of predictor values - “independent variables”
• Numerical output value - “dependent variable”
§ Model is function from predictors to output
• Use model to predict output value for new
predictor values
§ Example
• Predictors: mother height, father height, current age
• Output: height
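
A minimal sketch of this in Python (the tiny height data set and the choice of scikit-learn's LinearRegression are illustrative assumptions, not part of the slides):

# Sketch: fit a regression model from predictors to a numerical output
from sklearn.linear_model import LinearRegression

# Predictors: mother height (cm), father height (cm), current age (years)
X_train = [[160, 175, 14], [170, 180, 16], [155, 170, 15], [165, 185, 17]]
y_train = [168, 178, 162, 181]          # output: person's height (cm), made-up values

model = LinearRegression().fit(X_train, y_train)   # model = function from predictors to output
print(model.predict([[162, 178, 15]]))             # predict output value for new predictor values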

Classification
Using data to build models and make predictions
§ Supervised
§ Training data, each example:
• Set of feature values – numeric or categorical (“independent variables”)
• Categorical “label” (“dependent variable”)
§ Model is function from feature values to label
• Use model to predict label for new feature values
§ Example
• Feature values: age, gender, income, profession
• Label: buyer, non-buyer

Other Examples
Medical diagnosis
• Feature values: age, gender, history,
symptom1-severity, symptom2-severity,
test-result1, test-result2
• Label: disease
Email spam detection
• Feature values: sender-domain, length,
#images, keyword1, keyword2, …, keywordn
• Label: spam or not-spam
Credit card fraud detection
• Feature values: user, location, item, price
• Label: fraud or okay

Algorithms for Classification
Despite similarity of problem statement to
regression, non-numerical nature of classification
leads to completely different approaches
§ K-nearest neighbors
§ Decision trees
§ Naïve Bayes
§ … and others

K-Nearest Neighbors (KNN)
For any pair of data items i1 and i2, from their
feature values compute distance(i1,i2)
Example:
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305)
person2 = (female, teacher, 43, $28K, 94309)
distance(person1, person2)

distance() can be defined as inverse of similarity()
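
One way distance() might be written for these mixed categorical/numeric features (a sketch; the per-feature scaling and the postal-code rule are assumptions, not something the slides specify):

def distance(p1, p2):
    # p = (gender, profession, age, income, postal_code); smaller result = more similar
    gender1, prof1, age1, income1, postal1 = p1
    gender2, prof2, age2, income2, postal2 = p2
    d = 0.0
    d += 0 if gender1 == gender2 else 1            # categorical: 0 if equal, else 1
    d += 0 if prof1 == prof2 else 1
    d += abs(age1 - age2) / 50                     # numeric: scaled absolute difference
    d += abs(income1 - income2) / 100_000
    d += 0 if postal1[:2] == postal2[:2] else 1    # same postal region or not
    return d

person1 = ("male", "teacher", 47, 25_000, "94305")
person2 = ("female", "teacher", 43, 28_000, "94309")
print(distance(person1, person2))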

K-Nearest Neighbors (KNN)
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305)
person2 = (female, teacher, 43, $28K, 94309)
Remember training data has labels

K-Nearest Neighbors (KNN)
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305) buyer
person2 = (female, teacher, 43, $28K, 94309) non-buyer
Remember training data has labels
To classify a new item i : In the labeled data find
the K closest items to i, assign most frequent label
person3 = (female, doctor, 40, $40K, 95123)

KNN Example
§ City temperatures – France and Germany
§ Features: longitude, latitude
§ Distance is Euclidean distance
distance([o1,a1],[o2,a2]) = sqrt((o1−o2)² + (a1−a2)²)
= actual distance in x-y plane
§ Labels: frigid, cold, cool, warm, hot

Predict temperature category from longitude and latitude:
Nice (7.27, 43.72) cool
Toulouse (1.45, 43.62) warm
Frankfurt (8.68, 50.1) cold
...
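
A sketch with scikit-learn's KNeighborsClassifier, using just the three labeled cities above as training data (a real run would use many more):

from sklearn.neighbors import KNeighborsClassifier

# (longitude, latitude) -> temperature category
X_train = [[7.27, 43.72], [1.45, 43.62], [8.68, 50.1]]
y_train = ["cool", "warm", "cold"]

knn = KNeighborsClassifier(n_neighbors=1)   # Euclidean distance is the default metric
knn.fit(X_train, y_train)
print(knn.predict([[2.35, 48.85]]))         # a new city, e.g. Paris's longitude/latitude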

KNN Summary
To classify a new item i : find K closest items to i
in the labeled data, assign most frequent label

§ No hidden complicated math!


§ Once distance function is defined, rest is easy
§ Though not necessarily efficient
Real examples often have thousands of features
§ Medical diagnosis: symptoms (yes/no), test results
§ Email spam detection: words (frequency)
Database of labeled items might be enormous

“Regression” Using KNN
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305) buyer
person2 = (female, teacher, 43, $28K, 94309) non-buyer
Remember training data has labels
To classify a new item i, find K closest items to i
in the labeled data, assign most frequent label
person3 = (female, doctor, 40, $40K, 95123)

“Regression” Using KNN
Features - gender, profession, age, income, postal-code
person1 = (male, teacher, 47, $25K, 94305) $250
person2 = (female, teacher, 43, $28K, 94309) $100
Remember training data has labels
To classify a new item i, find K closest items to i
in the labeled data, assign average value of labels
person3 = (female, doctor, 40, $40K, 95123)

Regression Using KNN - Example

Can refine by weighting the average by distance.
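
A sketch of that refinement with scikit-learn's KNeighborsRegressor (the numeric features and spending amounts are made up):

from sklearn.neighbors import KNeighborsRegressor

# (age, income) -> numeric label, e.g. amount spent
X_train = [[47, 25_000], [43, 28_000], [35, 60_000], [52, 45_000]]
y_train = [250, 100, 300, 180]

# weights="distance": closer neighbors count more in the average
knn_reg = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn_reg.fit(X_train, y_train)
print(knn_reg.predict([[40, 40_000]]))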

Decision Trees
§ Use the training data to construct a decision tree
§ Use the decision tree to classify new data

Decision Trees
Nodes: features (with apologies for binary gender)
Edges: feature values
Leaves: labels

[Example tree: the root splits on Gender (male / female); one branch then splits on Age (<20, 20-50, >50) and the other on Income (<$100K, ≥$100K); lower nodes split on Profession and Postal Code (92*** vs. other); each leaf is Buyer or Non-Buyer.]

New data item to classify:
Navigate tree based on feature values

Decision Trees
Primary challenge is building good decision
trees from training data
• Which features and feature values to use
at each choice point
• HUGE number of possible trees even with
small number of features and values
Common approach: “forest” of many trees,
combine the results
• Still impossible to consider all trees
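
A sketch with scikit-learn, which can build a single tree (DecisionTreeClassifier) or a forest of trees whose results are combined (RandomForestClassifier); the tiny numerically encoded data set is made up:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Columns: gender (0 = male, 1 = female), age, income -- made-up training data
X_train = [[0, 25, 40_000], [1, 45, 120_000], [0, 60, 80_000], [1, 30, 30_000]]
y_train = ["non-buyer", "buyer", "buyer", "non-buyer"]

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

new_item = [[1, 40, 90_000]]
print(tree.predict(new_item), forest.predict(new_item))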

Naïve Bayes
Given new data item i, based on i’s feature values
and the training data, compute the probability of
each possible label. Pick highest one.
Efficiency relies on conditional independence
assumption:
Given any two features F1,F2 and a label L, the
probability that F1=v1 for an item with label L is
independent of the probability that F2=v2 for that
item
Examples:
gender and age? income and postal code?

Naïve Bayes
The conditional independence assumption often doesn’t hold, which is why the approach is “naïve”.
Nevertheless, the approach works very well in practice.

Naïve Bayes Example
Predict temperature category for a country based
on whether the country has coastline and
whether it is in the EU

Naïve Bayes Preparation
Step 1: Compute fraction (probability) of items in
each category

cold .18
cool .38
warm .24
hot .20
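
With the labeled countries in a pandas DataFrame (the column names and rows here are hypothetical), Step 1 is a one-liner:

import pandas as pd

# Hypothetical labeled training data: one row per country
countries = pd.DataFrame({
    "coastline": ["yes", "yes", "no", "yes", "no"],
    "EU":        ["yes", "no", "yes", "yes", "no"],
    "temp":      ["cool", "hot", "cold", "warm", "cold"],
})

# Fraction (probability) of items in each temperature category
print(countries["temp"].value_counts(normalize=True))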

Naïve Bayes Preparation
Step 2: For each category, compute fraction of
items in that category for each feature and value
cold (.18): coastline=yes .83, coastline=no .17, EU=yes .67, EU=no .33
cool (.38): coastline=yes .69, coastline=no .31, EU=yes .77, EU=no .23
warm (.24): coastline=yes .5, coastline=no .5, EU=yes .5, EU=no .5
hot (.20): coastline=yes 1.0, coastline=no .0, EU=yes .71, EU=no .29
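
Step 2 can be computed with a normalized cross-tabulation (same hypothetical DataFrame as in the previous sketch):

import pandas as pd

countries = pd.DataFrame({
    "coastline": ["yes", "yes", "no", "yes", "no"],
    "EU":        ["yes", "no", "yes", "yes", "no"],
    "temp":      ["cool", "hot", "cold", "warm", "cold"],
})

# For each category (row): fraction of its items with each feature value
print(pd.crosstab(countries["temp"], countries["coastline"], normalize="index"))
print(pd.crosstab(countries["temp"], countries["EU"], normalize="index"))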

Naïve Bayes Prediction
New item: France, coastline=yes, EU=yes
For each category: probability of category times
product of probabilities of new item’s features
in that category. Pick highest.

category prob. coastline=yes EU=yes product


cold .18 .83 .67 .10
cool .38 .69 .77 .20
warm .24 .5 .5 .06
hot .20 1.0 .71 .14
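
The same computation in plain Python, with the fractions taken directly from the preparation tables above:

# Priors and per-category feature fractions from the preparation slides
prior           = {"cold": .18, "cool": .38, "warm": .24, "hot": .20}
p_coastline_yes = {"cold": .83, "cool": .69, "warm": .5,  "hot": 1.0}
p_eu_yes        = {"cold": .67, "cool": .77, "warm": .5,  "hot": .71}

# New item: France, coastline=yes, EU=yes
scores = {cat: prior[cat] * p_coastline_yes[cat] * p_eu_yes[cat] for cat in prior}
print(scores)                          # cool has the largest product (about .20)
print(max(scores, key=scores.get))     # predicted category: 'cool'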

Naïve Bayes Prediction
New item: Serbia, coastline=no, EU=no
For each category: probability of category times
product of probabilities of new item’s features
in that category. Pick highest.

category prob. coastline=no EU=no product


cold .18 .17 .33 .01
cool .38 .31 .23 .03
warm .24 .5 .5 .06
hot .20 .0 .29 .00

Naïve Bayes Prediction
New item: Austria, coastline=no, EU=yes
For each category: probability of category times
product of probabilities of new item’s features
in that category. Pick highest.

category prob. coastline=no EU=yes product


cold .18 .17 .67 .02
cool .38 .31 .77 .09
warm .24 .5 .5 .06
hot .20 .0 .71 .0

Naïve Bayes Prediction
New item: Austria, coastline=no, EU=yes
Note: many presentations of Naïve Bayes include an additional normalization step so that the final products are probabilities that sum to 1.0. The choice of label is unchanged, so we’ve omitted that step for simplicity.

Feature Selection
Real applications often have thousands of features
§ Naïve Bayes typically uses only some of the
features, those most affecting the label
§ Decision trees also rely on choosing features
that most affect the label
§ Feature selection is a key part of machine
learning – an art and a science

Training and Test
Created a machine learning model from training data.
How do you know whether it’s a good model?
➢ Try it on known data
[Diagram: the labeled data (feature values + labels) is split into training data, used to build the model, and held-out “test data”, used to check the model’s predictions.]
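
A sketch of the usual recipe with scikit-learn: hold out part of the labeled data as test data, fit the model on the rest, and measure accuracy on the held-out part (the feature matrix and labels below are made up):

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Made-up labeled data: X = feature values, y = labels
X = [[i, i % 5] for i in range(100)]
y = ["buyer" if i % 5 >= 3 else "non-buyer" for i in range(100)]

# Hold out 25% of the labeled data as "test data"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))   # fraction of test labels predicted correctly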

Other Terms You Might Hear
Logistic regression
• Recall regression model is function f from
predictor values to numeric output value
• For classification: from training data obtain one
regression function fL for each label L
fL(feature-values) = probability of item having label L
Support Vector Machine
• Two labels only (“binary classifier”)
• Features = multidimensional space
• From training data SVM finds
hyper-plane that best divides
space according to labels
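
Both are available in scikit-learn; a sketch on a made-up two-label data set:

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Made-up binary-labeled training data (two numeric features)
X_train = [[1, 2], [2, 1], [8, 9], [9, 8], [1, 1], [9, 9]]
y_train = ["non-buyer", "non-buyer", "buyer", "buyer", "non-buyer", "buyer"]

logreg = LogisticRegression().fit(X_train, y_train)
print(logreg.predict_proba([[7, 8]]))              # probability of each label for the new item

svm = SVC(kernel="linear").fit(X_train, y_train)   # finds the separating hyperplane
print(svm.predict([[7, 8]]))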

Other Terms You Might Hear
Deep Learning
• Complex, mysterious (the ultimate “black box”
software), becoming extremely popular
• Multiple layers, each layer uses classification
techniques to reduce complexity for next layer
and further classification
• Important plus: identifies features from raw data
Neural Network
• Precursor to deep learning, typically two layers
• Leap to deep learning enabled by massive
amounts of data, powerful computing

Classification Summary
§ Supervised machine learning
§ Training data, each example:
• Set of feature values – numeric or categorical
• Categorical output value – label
§ Model is “function” from feature values to label
• Use model to predict label for new feature values
§ Approaches we covered
• K-nearest neighbors – relies on distance (or
similarity) function
• Decision trees – relies on finding good trees/forests
• Naïve Bayes – relies on conditional independence
assumption
