
Assignment #2 Introduction To Classification

This document contains an assignment for a data mining course. The assignment includes 4 problems related to classification techniques like naive Bayes classification, k-nearest neighbors (KNN) classification, and evaluating classification models. Specifically, it asks students to: 1) Build a naive Bayes classifier and make predictions on new data using the classifier. 2) Perform KNN classification using 1-nearest neighbor and 3-nearest neighbors on 2D data points. 3) Make a gender prediction for a customer using 3-nearest neighbors classification. 4) Find the k-nearest neighbors for different records in a sample dataset using KNN with Euclidean and Minkowski distances.

Uploaded by

Rania Saoud

Assignment #2

Introduction to classification (part 1)


Course Title: Data Mining

Instructor: Dr. Amor Messaoud

Questions

1. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of
naïve Bayesian classification.

Questions

Use the three-class confusion matrix below to answer questions 1 through 3.

1. What percent of the instances were correctly classified?


2. How many class 2 instances are in the dataset?
3. How many instances were incorrectly classified with class 3?
4. Sometimes a data set is partitioned such that a validation set is provided. What is the
purpose of the validation set?
5. If we build a classifier and evaluate it on the training set and the test set:
a. Which data set would we expect to have the higher accuracy: the training set or the
test set?
b. Which data set provides the best estimate of accuracy on new data: the training set or
the test set?
6. Consider the one-dimensional data shown in the following table. Classify the data
point x = 5.0 according to its 1-, 3-, and 5-nearest neighbors (using majority vote).
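
For questions 1 through 3 above, the quantities asked for can all be read off a confusion matrix with a few sums. The sketch below is only an illustration: the matrix values are hypothetical placeholders (the assignment's matrix is not reproduced here), the helper names are made up for this example, and it assumes the common convention that rows are actual classes and columns are predicted classes.

```python
# HYPOTHETICAL 3x3 confusion matrix (placeholder values, not the assignment's).
# Assumed convention: rows = actual class, columns = predicted class.
matrix = [
    [10, 2, 1],
    [3, 12, 2],
    [0, 4, 16],
]


def accuracy(cm):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def actual_count(cm, cls):
    """Number of instances whose actual class is `cls` (row sum, 0-based index)."""
    return sum(cm[cls])


def wrongly_predicted_as(cm, cls):
    """Instances predicted as `cls` whose actual class is different."""
    return sum(row[cls] for row in cm) - cm[cls][cls]


print(f"Accuracy: {accuracy(matrix):.0%}")
print("Actual class 2 instances:", actual_count(matrix, 1))
print("Incorrectly classified as class 3:", wrongly_predicted_as(matrix, 2))
```

If the matrix in the assignment uses the opposite convention (rows = predicted), the row and column sums swap roles.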

ASSIGNMENT #1 (FEBRUARY 2019) 1


Problem #1

Consider the following dataset of a credit card promotion database. The credit card
company has authorized a new life insurance promotion similar to the existing one. We are
interested in building a classification data mining model for deciding whether to send the
customer promotional material.

1. Build a Naive Bayes classifier for this dataset, by filling in the following with counts
and probabilities.
                              Life insurance promotion
                              Y        N
Magazine promotion       Y
                         N

                              Life insurance promotion
                              Y        N
Watch promotion          Y
                         N

                              Life insurance promotion
                              Y        N
Credit card insurance    Y
                         N



                              Life insurance promotion
                              Y        N
Sex                      M
                         F

2. Use the Naive Bayes classifier obtained in question 1 to determine the value of Life
Insurance Promotion for the following instance:
Magazine Promotion = Y; Watch Promotion = Y; Credit Card Insurance = N; Sex = F;
Life Insurance Promotion = ?
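
The two steps above (fill in counts, then score a new instance) can be sketched as follows. This is a minimal illustration, not the assignment's solution: the counts are hypothetical placeholders because the dataset is not reproduced here, only two of the four attributes are shown, and the function name `naive_bayes_scores` is made up for this example. No smoothing is applied, so a zero count gives a zero score.

```python
from math import prod


def naive_bayes_scores(class_counts, feature_counts, instance):
    """Return the unnormalized score P(class) * prod P(value | class) per class."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        prior = n_cls / total
        likelihoods = [
            feature_counts[feature][cls].get(value, 0) / n_cls
            for feature, value in instance.items()
        ]
        scores[cls] = prior * prod(likelihoods)
    return scores


# HYPOTHETICAL counts for illustration only; replace them with the counts
# taken from the credit card promotion dataset.
class_counts = {"Y": 4, "N": 6}  # Life insurance promotion = Y / N
feature_counts = {
    # feature -> class -> attribute value -> count
    "Magazine Promotion": {"Y": {"Y": 3, "N": 1}, "N": {"Y": 2, "N": 4}},
    "Sex": {"Y": {"M": 1, "F": 3}, "N": {"M": 4, "F": 2}},
}

instance = {"Magazine Promotion": "Y", "Sex": "F"}
scores = naive_bayes_scores(class_counts, feature_counts, instance)
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

Dividing each score by the sum of all scores turns them into posterior probabilities; the predicted class is the same either way.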

Problem #2

Consider the set of training examples in the diagram below. A plus indicates a positive
example and a star indicates a negative example. Use the Euclidean distance to answer the
following questions:
1. How will the point (8, 1) be classified by the 1-nearest neighbor classifier?
2. How will the point (8, 8) be classified by the 3-nearest neighbors?



Problem #3

Lisa has lost the gender information for one of her customers and does not know whether to
make a skirt or trousers. She is planning to toss a coin. Can you help her make a better
decision using a KNN classifier (K = 3)? Use the Euclidean distance. The customer with the
missing gender information is the first row (marked "?") of the table below:

Gender    Waist    Hip
?         28       34
Male      28       32
Male      33       35
Female    27       33
Female    31       36
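
One way to check a hand calculation for this problem is a short script. The sketch below uses the four known customers from the table, Euclidean distance, and a majority vote over the K = 3 nearest rows; the function name `knn_predict` is chosen for this example only. It is written in Python rather than R purely for illustration.

```python
import math
from collections import Counter

# Known customers from the table above: (gender, waist, hip).
customers = [
    ("Male", 28, 32),
    ("Male", 33, 35),
    ("Female", 27, 33),
    ("Female", 31, 36),
]


def knn_predict(query, records, k):
    """Return (majority label, k nearest neighbors) under Euclidean distance."""
    ranked = sorted(
        (math.dist(query, (waist, hip)), gender)
        for gender, waist, hip in records
    )
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest


# The customer with unknown gender has waist 28 and hip 34.
label, nearest = knn_predict((28, 34), customers, k=3)
for distance, gender in nearest:
    print(f"{gender}: {distance:.3f}")
print("Predicted gender:", label)
```

The script lists the three closest customers with their distances before printing the majority label, so each distance can be compared against the manual computation.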

Problem #4 (Larose and Larose, 2015, p. 312)

The following table contains a small data set of 10 records excerpted from the ClassifyRisk
data set, with predictors age, marital status, and income, and target variable risk.

1. Using R, find the k nearest neighbors of Record #10, using k = 3.
2. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Euclidean distance.
3. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Minkowski distance.
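
Questions 2 and 3 differ only in the distance function. The Minkowski distance of order p between vectors x and y is (sum of |x_i - y_i|^p)^(1/p); p = 2 gives the Euclidean distance and p = 1 the Manhattan distance. The sketch below shows this relationship on two illustrative points that are not taken from ClassifyRisk; the function name is made up for this example, and Python is used only for illustration since the assignment asks for R.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Illustrative points only (not records from the ClassifyRisk data set).
a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski_distance(a, b, p=1)  # order 1
euclidean = minkowski_distance(a, b, p=2)  # order 2
print(manhattan, euclidean)
```

Before applying either distance to the ClassifyRisk records, a categorical predictor such as marital status needs a numeric encoding, and predictors on very different scales (age, income) are usually normalized so that one of them does not dominate the distance.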

