0% found this document useful (0 votes)
27 views16 pages

Week 11 - PROG 8510 Week 11

This class describes various classification algorithms including logistic regression, decision trees, random forests, and support vector machines. It discusses how classification predicts categorical outcomes like spam/not-spam rather than continuous values. Common steps in classification problems include estimating probabilities that records belong to a class, setting a cutoff probability, and assigning records above the cutoff to that class. Specific algorithms covered are support vector machines, which find boundaries to separate classes; logistic regression which assigns classes based on model output; decision trees which classify using a series of questions; and random forests which combine predictions from many decision trees. Python implementations of these algorithms using Scikit-Learn are also demonstrated. The next class will cover evaluating classification model performance.

Uploaded by

Vineel Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views16 pages

Week 11 - PROG 8510 Week 11

This class describes various classification algorithms including logistic regression, decision trees, random forests, and support vector machines. It discusses how classification predicts categorical outcomes like spam/not-spam rather than continuous values. Common steps in classification problems include estimating probabilities that records belong to a class, setting a cutoff probability, and assigning records above the cutoff to that class. Specific algorithms covered are support vector machines, which find boundaries to separate classes; logistic regression which assigns classes based on model output; decision trees which classify using a series of questions; and random forests which combine predictions from many decision trees. Python implementations of these algorithms using Scikit-Learn are also demonstrated. The next class will cover evaluating classification model performance.

Uploaded by

Vineel Kumar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 16

PROG 8510 : Programming Statistics for Business

Classification

Week 11
This class
• Describe diverse types of classification algorithms.
• Build classification models, such as Logistics Regression, Decision Trees,
Random Forests
What are classification algorithms ?
• Classification is perhaps the most important form of prediction: the goal is to predict
whether a record is a 1 or a 0 (phishing/not-phishing, click/don’t click, churn/don’t
churn), or in some cases, one of several categories.
• Often, we need more than a simple binary classification: we want to know the
predicted probability that a case belongs to a class.
• Rather than having a model simply assign a binary classification, most algorithms
can return a probability score (propensity) of belonging to the class of interest.
• For example; Predicting the amount of loan for a potential client is a regression
problem whereas predicting whether they will qualify or not qualify for the loan is a
classification problem.
What are classification algorithms ?

Basic Image for spam filtration classifier


Steps involved in classification
A sliding cutoff can then be used to convert the propensity score to a decision. The general approach is as
follows:

1. Establish a cutoff probability for the class of interest, above which we consider a record as belonging to
that class.

2. Estimate (with any model) the probability that a record belongs to the class of interest.

3. If that probability is above the cutoff probability, assign the new record to the class of interest.

4. In the next section you will learn the inner working of some of the most common classification
algorithms.
Support Vector Machines
• Support vector machines or SVM are a types of discriminant classification
algorithm. Here rather than modeling each class, we simply find a line or
curve (in two dimensions) or manifold (in multiple dimensions) that divides
the classes from each other.
• A linear discriminative classifier would attempt to draw a straight line
separating the two sets of data, and thereby create a model for classification.
SVM Models
• SVM fits a line that maximizes the
margin between the two sets of
points.
• Notice that a few of the training
points just touch the margin: they
are indicated by the black circles in
this figure. These points are the
pivotal elements of this fit, and are
known as the support vectors, and
give the algorithm its name.

SVM model dividing red and yellow class


SVM in Python
Notice since it is classification problem
hence we changed the y value for a
continuous variable to a categorical or
binary variable. i.e Was the tip value
more than 2$ or not ?

Support vector classifier model is


imported from the relevant
submodule in sklearn package.
Rest of the coding steps are in line
with Sklearn API.
SVM in Python Continued

SVM model created is now used to


predict the values by musing predict
method.

The resultant output is an array of


Boolean data types depicting whether
for each X record whether the tip
value exceeded 2$ or not.
Logistic Regression
• Logistic regression is analogous to
multiple linear regression except
the outcome is binary. Class 1

• Based on the value of model output


- logit(p), class 1 or zero is
assigned.

Class 0
Logistic Regression in Python

The sklearn.linear_model submodule


contains LogisticRegression class
similar to LinearRegression that is used
to perform logistic regression in python.

Notice that rest of the steps in model


building and classification remains the
same.
Decision Trees and Random Forests
• Decision trees are extremely
intuitive ways to classify or label
objects: you simply ask a series of
questions designed to zero-in on
the classification.
• For example, if you wanted to build
a decision tree to classify an animal
you come across while on a hike,
you might construct the one shown
here:
Fig Source : Data Science Handbook, Jake Vanderplaus
Decision Tress in Python

• The sklearn.tree submodule


contains DecisionTreeClassifier
class similar that is used to
perform logistic regression in
python.

• Notice that rest of the steps in


model building and classification
remains the same.
Random Forests in Python

• When the predictive power of


several individual trees are
combined in a single model such
model is called random forest.

• Models that rely on outputs from


several individual component
models is called ensemble
modeling.
Please refer to the notebook file
PROG 8510 Classification Modeling and Evaluation (Week 11
and 12)
Next Class :
• Classification Model Evaluation

You might also like