Week 11 - PROG 8510 Week 11
Week 11 - PROG 8510 Week 11
Classification
Week 11
This class
• Describe diverse types of classification algorithms.
• Build classification models, such as Logistics Regression, Decision Trees,
Random Forests
What are classification algorithms ?
• Classification is perhaps the most important form of prediction: the goal is to predict
whether a record is a 1 or a 0 (phishing/not-phishing, click/don’t click, churn/don’t
churn), or in some cases, one of several categories.
• Often, we need more than a simple binary classification: we want to know the
predicted probability that a case belongs to a class.
• Rather than having a model simply assign a binary classification, most algorithms
can return a probability score (propensity) of belonging to the class of interest.
• For example; Predicting the amount of loan for a potential client is a regression
problem whereas predicting whether they will qualify or not qualify for the loan is a
classification problem.
What are classification algorithms ?
1. Establish a cutoff probability for the class of interest, above which we consider a record as belonging to
that class.
2. Estimate (with any model) the probability that a record belongs to the class of interest.
3. If that probability is above the cutoff probability, assign the new record to the class of interest.
4. In the next section you will learn the inner working of some of the most common classification
algorithms.
Support Vector Machines
• Support vector machines or SVM are a types of discriminant classification
algorithm. Here rather than modeling each class, we simply find a line or
curve (in two dimensions) or manifold (in multiple dimensions) that divides
the classes from each other.
• A linear discriminative classifier would attempt to draw a straight line
separating the two sets of data, and thereby create a model for classification.
SVM Models
• SVM fits a line that maximizes the
margin between the two sets of
points.
• Notice that a few of the training
points just touch the margin: they
are indicated by the black circles in
this figure. These points are the
pivotal elements of this fit, and are
known as the support vectors, and
give the algorithm its name.
Class 0
Logistic Regression in Python