Assignment #2 Introduction To Classification
Questions
1. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of
naïve Bayesian classification.
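For reference, the standard formulation can be written compactly. Assuming the attributes $x_1, \dots, x_n$ are conditionally independent given the class $C$ (the “naïve” assumption), the posterior factorizes as below. The notation here is a common textbook convention, not necessarily the one used in this course.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The evidence term $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when only the most probable class is needed.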
Consider the following dataset from a credit card promotion database. The credit card
company has authorized a new life insurance promotion similar to the existing one. We are
interested in building a classification data mining model to decide whether to send a
customer the promotional material.
1. Build a Naive Bayes classifier for this dataset by filling in the following table with
counts and probabilities.
Magazine Promotion    Life Insurance Promotion = Y    Life Insurance Promotion = N
Y
N
2. Use the Naive Bayes classifier obtained in Question 1 to determine the value of Life
Insurance Promotion for the following instance:
Magazine Promotion = Y; Watch Promotion = Y; Credit Card Insurance = N; Sex = F;
Life Insurance Promotion = ?
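Both steps can be checked with a minimal Python sketch, assuming categorical attributes and plain relative-frequency estimates with no smoothing. The credit card promotion table is not reproduced here, so the six training records below are hypothetical placeholders, not the assignment's data; substitute the real rows to check hand-computed counts.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL training records, for illustration only (not the assignment's data).
# Columns: Magazine Promotion, Watch Promotion, Credit Card Insurance, Sex,
# and the class label Life Insurance Promotion (last column).
TRAIN = [
    ("Y", "N", "N", "M", "N"),
    ("Y", "Y", "Y", "F", "Y"),
    ("N", "N", "N", "M", "N"),
    ("Y", "Y", "N", "F", "Y"),
    ("Y", "N", "N", "F", "Y"),
    ("N", "Y", "N", "F", "N"),
]


def fit(records):
    """Count class frequencies and, per class, how often each attribute value occurs."""
    class_counts = Counter(r[-1] for r in records)
    value_counts = defaultdict(Counter)  # key: (attribute index, class label)
    for r in records:
        label = r[-1]
        for i, value in enumerate(r[:-1]):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def class_scores(instance, class_counts, value_counts):
    """Unnormalized posteriors P(C) * prod_i P(x_i | C), estimated by relative frequency."""
    total = sum(class_counts.values())
    result = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(C)
        for i, value in enumerate(instance):
            score *= value_counts[(i, label)][value] / n_label  # P(x_i | C)
        result[label] = score
    return result


class_counts, value_counts = fit(TRAIN)

# Counts for the Magazine Promotion rows of the table, split by class (Y, N).
for value in ("Y", "N"):
    row = [value_counts[(0, label)][value] for label in ("Y", "N")]
    print(f"Magazine Promotion = {value}: Life Insurance Y/N counts = {row}")

# Query: Magazine = Y, Watch = Y, Credit Card Insurance = N, Sex = F
query = ("Y", "Y", "N", "F")
posterior = class_scores(query, class_counts, value_counts)
prediction = max(posterior, key=posterior.get)
print(posterior, "->", prediction)
```

With these toy records the score for Y is 2/9 and for N is 1/54, so the sketch predicts Y; dividing each score by their sum gives normalized posteriors. A zero count anywhere collapses the whole product to 0, which is why smoothing is often added in practice; the sketch deliberately omits it.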
Problem #2
Consider the set of training examples in the diagram below. A plus (+) indicates a positive
example and a star (*) indicates a negative example. Use the Euclidean distance to answer
the following questions:
1. How will the point (8, 1) be classified by the 1-nearest-neighbor classifier?
2. How will the point (8, 8) be classified by the 3-nearest-neighbor classifier?
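The k-NN rule used in these questions can be sketched in a few lines of Python: sort the training points by Euclidean distance to the query and take a majority vote among the k closest. The diagram is not reproduced here, so the training points below are hypothetical placeholders; replace them with the coordinates read off the diagram. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

# HYPOTHETICAL training points (not the assignment's diagram): "+" positive, "*" negative.
TRAINING = [
    ((2, 2), "+"),
    ((3, 6), "+"),
    ((7, 2), "+"),
    ((9, 4), "+"),
    ((9, 7), "*"),
    ((6, 8), "*"),
    ((2, 9), "*"),
]


def knn_classify(query, training, k):
    """Return the majority label among the k training points closest to query."""
    # Sort by Euclidean distance; sorted() is stable, so distance ties keep list order.
    by_distance = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_classify((8, 1), TRAINING, k=1))
print(knn_classify((8, 8), TRAINING, k=3))
```

With these placeholder points the calls return "+" and "*" respectively. With two classes and an odd k the vote cannot tie; the K = 3 question about Lisa's customer follows the same procedure once her customers' attributes are expressed as numbers.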
Lisa has lost the gender information for one of her customers and does not know whether to
make a skirt or trousers. She is planning to toss a coin. Can you help her make a better
decision using a KNN classifier (K = 3)? Use the Euclidean distance. The customer who is
missing gender information:
The following table contains a small data set of 10 records excerpted from the ClassifyRisk
data set, with predictors age, marital status, and income, and target variable risk.
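Because age and income sit on very different scales and marital status is categorical, the predictors usually need preprocessing before a Euclidean distance is meaningful. Below is a minimal Python sketch, assuming min-max normalization for the numeric predictors and a 0 (same) / 1 (different) term for marital status; this is one common convention, and the assignment does not specify one. The three records are hypothetical placeholders, not rows from ClassifyRisk, and Python is used only to show the arithmetic (the assignment itself asks for R).

```python
import math

# HYPOTHETICAL records (age, marital_status, income); not the ClassifyRisk excerpt.
RECORDS = [
    (25, "single", 30000),
    (40, "married", 60000),
    (55, "single", 45000),
]


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def normalize(records):
    """Min-max normalize age and income, leaving marital status as a category."""
    ages = min_max([r[0] for r in records])
    incomes = min_max([r[2] for r in records])
    return [(a, r[1], inc) for a, r, inc in zip(ages, records, incomes)]


def mixed_euclidean(x, y):
    """Euclidean distance with a 0 (same) / 1 (different) term for marital status."""
    age_term = (x[0] - y[0]) ** 2
    marital_term = 0.0 if x[1] == y[1] else 1.0
    income_term = (x[2] - y[2]) ** 2
    return math.sqrt(age_term + marital_term + income_term)


normalized = normalize(RECORDS)
print(normalized)
print(mixed_euclidean(normalized[0], normalized[1]))
print(mixed_euclidean(normalized[0], normalized[2]))
```

Without normalization, income differences in the tens of thousands would swamp age differences of a few years, so the neighbors would be chosen almost entirely by income.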
1. Using R, find the k nearest neighbors of Record #10, using k = 3.
2. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Euclidean distance.
3. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Minkowski distance.
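Euclidean distance is the order-2 case of the Minkowski distance, so a single function covers both items 2 and 3. A minimal Python sketch, assuming the predictors are already numeric (normalized and encoded) and treating the order p for item 3 as a free choice, since the assignment does not state it; p = 1 (Manhattan) is used below purely as an example. The five vectors are hypothetical placeholders, with index 0 standing in for Record #1.

```python
# Minkowski distance of order p: (sum_i |x_i - y_i|^p)^(1/p); p = 2 is Euclidean.
def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest_neighbors(target_index, vectors, k, p):
    """Indices of the k vectors closest to vectors[target_index], excluding itself."""
    target = vectors[target_index]
    others = [i for i in range(len(vectors)) if i != target_index]
    others.sort(key=lambda i: minkowski(target, vectors[i], p))
    return others[:k]


# HYPOTHETICAL normalized, numeric feature vectors; index 0 plays "Record #1".
VECTORS = [
    (0.0, 0.0, 0.0),
    (0.3, 0.0, 0.3),
    (0.5, 0.0, 0.0),
    (0.0, 0.0, 0.55),
    (1.0, 1.0, 1.0),
]

print(nearest_neighbors(0, VECTORS, k=2, p=2))  # Euclidean
print(nearest_neighbors(0, VECTORS, k=2, p=1))  # Minkowski with p = 1
```

With these placeholder vectors the two metrics disagree: p = 2 returns indices [1, 2] while p = 1 returns [2, 3]. Larger p weights the single largest coordinate difference more heavily, which is why the choice of metric can change which records count as nearest.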