Assignment #2 Introduction To Classification

This document contains an assignment for a data mining course. The assignment includes 4 problems related to classification techniques like naive Bayes classification, k-nearest neighbors (KNN) classification, and evaluating classification models. Specifically, it asks students to: 1) Build a naive Bayes classifier and make predictions on new data using the classifier. 2) Perform KNN classification using 1-nearest neighbor and 3-nearest neighbors on 2D data points. 3) Make a gender prediction for a customer using 3-nearest neighbors classification. 4) Find the k-nearest neighbors for different records in a sample dataset using KNN with Euclidean and Minkowski distances.

Uploaded by

Rania Saoud
Copyright
© All Rights Reserved

Assignment #2

Introduction to classification (part 1)


Course Title: Data Mining

Instructor: Dr. Amor Messaoud

Questions

1. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of
naïve Bayesian classification.
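For reference, the core of naïve Bayesian classification fits in one line. The notation below (class C, attribute values x_1 to x_n) is the standard textbook one and is not taken from the assignment itself. Bayes' theorem gives the posterior of each class, and the "naïve" step assumes the attributes are conditionally independent given the class, so the joint likelihood factors into per-attribute terms:

$$
P(C \mid x_1, \dots, x_n) = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)} \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$

The denominator is the same for every class, so it can be dropped when comparing classes; each P(x_i | C) is estimated from simple counts in the training data.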

Questions

Use the three-class confusion matrix below to answer questions 1 through 3.

1. What percent of the instances were correctly classified?


2. How many class 2 instances are in the dataset?
3. How many instances were incorrectly classified with class 3?
4. Sometimes a data set is partitioned such that a validation set is provided. What is the
purpose of the validation set?
5. If we build a classifier and evaluate it on the training set and the test set:
a. Which data set would we expect to have the higher accuracy: the training set or the
test set?
b. Which data set provides the best estimate of accuracy on new data: the training set or
the test set?
6. Consider the one-dimensional data shown in the following table. Classify the data
point x = 5.0 according to its 1-, 3-, and 5-nearest neighbors (using majority vote).
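The arithmetic behind questions 1 to 3, 5, and 6 can be sketched in Python. The assignment's confusion matrix and one-dimensional table are not reproduced in this copy, so the sketch uses made-up numbers, labeled hypothetical in the code. It also assumes that confusion-matrix rows are actual classes and columns are predicted classes (swap the indices if your matrix uses the opposite convention), and that k-NN distance ties are broken by training-set order.

```python
from collections import Counter

# ---- Questions 1-3: reading a three-class confusion matrix ----
# Hypothetical counts; the assignment's own matrix is not reproduced here.
# Assumed convention: rows = actual class, columns = predicted class.
cm = [
    [12, 2, 1],   # actual class 1
    [3, 20, 2],   # actual class 2
    [1, 4, 15],   # actual class 3
]
n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
pct_correct = 100 * correct / total          # main diagonal over all instances
class2_count = sum(cm[1])                     # row total of actual class 2
# "Incorrectly classified with class 3" can be read two ways:
predicted_as_3_wrongly = sum(cm[i][2] for i in range(n_classes) if i != 2)  # column 3, off-diagonal
actual_3_misclassified = sum(cm[2][j] for j in range(n_classes) if j != 2)  # row 3, off-diagonal

print(f"{pct_correct:.1f}% correct, {class2_count} class 2 instances")
print(predicted_as_3_wrongly, actual_3_misclassified)


# ---- Questions 5-6: k-nearest neighbors in one dimension ----
# Hypothetical (x, class) training data; the assignment's table is not reproduced here.
train = [(2.0, "-"), (4.3, "+"), (4.6, "-"), (4.9, "+"), (5.25, "-"), (5.55, "+"), (8.5, "-")]
test = [(4.4, "+"), (5.1, "-"), (7.0, "-"), (3.0, "-")]  # hypothetical held-out points


def knn_vote(data, x, k):
    """Majority label among the k points of data closest to x (absolute distance).

    sorted() is stable, so points at equal distance keep their original order.
    """
    nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(data, k):
    """Share of points in data that a k-NN classifier built on train labels correctly."""
    return sum(knn_vote(train, x, k) == y for x, y in data) / len(data)


for k in (1, 3, 5):
    print(f"x = 5.0, k = {k}: {knn_vote(train, 5.0, k)}")

# Each training point is its own nearest neighbor, so 1-NN is perfect on the training set.
print(f"1-NN accuracy: train {accuracy(train, 1):.2f}, test {accuracy(test, 1):.2f}")
```

With these hypothetical numbers the sketch prints 78.3% correct and 25 class 2 instances. The 1-, 3-, and 5-NN votes for x = 5.0 come out +, -, +, which shows how the answer can change with k. 1-NN scores 1.00 on its own training set but only 0.75 on the held-out points.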

ASSIGNMENT #1 (FEBRUARY 2019) 1


Problem #1

Consider the following dataset from a credit card promotion database. The credit card
company has authorized a new life insurance promotion similar to the existing one. We are
interested in building a classification data mining model for deciding whether to send
promotional material to a customer.

1. Build a Naive Bayes classifier for this dataset by filling in the following tables with
counts and probabilities.
                          Life insurance promotion
                          Y           N
Magazine promotion    Y
                      N

                          Life insurance promotion
                          Y           N
Watch promotion       Y
                      N

                          Life insurance promotion
                          Y           N
Credit card insurance Y
                      N

                          Life insurance promotion
                          Y           N
Sex                   M
                      F

2. Use the Naive Bayes classifier obtained in question 1 to determine the value of Life
Insurance Promotion for the following instance:
Magazine Promotion = Y; Watch Promotion = Y; Credit Card Insurance = N; Sex = F;
Life Insurance Promotion = ?
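Here is a minimal Python sketch of both steps: counting the conditional tables (question 1) and scoring the query instance (question 2). The credit card promotion table is not reproduced in this copy, so the eight records below are hypothetical placeholders and should be replaced with the real data. Probabilities are raw relative frequencies with no smoothing, so a zero count forces a class score to zero.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical training records, NOT the assignment's credit card promotion table.
# Columns: magazine promotion, watch promotion, credit card insurance, sex, life insurance promotion
ATTRIBUTES = ["Magazine promotion", "Watch promotion", "Credit card insurance", "Sex"]
records = [
    ("Y", "N", "N", "M", "N"),
    ("Y", "Y", "Y", "F", "Y"),
    ("N", "N", "N", "M", "N"),
    ("Y", "Y", "N", "F", "Y"),
    ("Y", "N", "N", "F", "Y"),
    ("N", "N", "N", "F", "N"),
    ("Y", "Y", "N", "M", "Y"),
    ("N", "Y", "N", "M", "N"),
]

class_counts = Counter(r[-1] for r in records)
# cond_counts[(i, value, cls)] = number of records of class cls whose attribute i equals value
cond_counts = Counter((i, r[i], r[-1]) for r in records for i in range(len(ATTRIBUTES)))

# Question 1: one count/ratio table per attribute, laid out like the tables above.
for i, name in enumerate(ATTRIBUTES):
    print(f"{name:<22} Life=Y  Life=N")
    for value in sorted({key[1] for key in cond_counts if key[0] == i}):
        cells = [f"{cond_counts[(i, value, c)]}/{class_counts[c]}" for c in ("Y", "N")]
        print(f"  {value:<20} {cells[0]:<7} {cells[1]}")


def score(instance, cls):
    """P(cls) times the product of P(value | cls), from raw counts (no smoothing)."""
    p = Fraction(class_counts[cls], len(records))
    for i, value in enumerate(instance):
        p *= Fraction(cond_counts[(i, value, cls)], class_counts[cls])
    return p


# Question 2: Magazine = Y, Watch = Y, Credit card insurance = N, Sex = F
query = ("Y", "Y", "N", "F")
scores = {c: score(query, c) for c in ("Y", "N")}
prediction = max(scores, key=scores.get)
for c, s in scores.items():
    print(f"Life={c}: score {s} (normalized {float(s / sum(scores.values())):.3f})")
print("Life insurance promotion =", prediction)
```

On the placeholder data the scores are 27/128 for Y and 1/128 for N, so the sketch predicts Life Insurance Promotion = Y. Exact `Fraction` arithmetic keeps each ratio in the same count/total form used in the tables.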

Problem #2

Consider the set of training examples in the diagram below. A plus indicates a positive
example and a star indicates a negative example. Use the Euclidean distance to answer the
following questions:
1. How will the point (8, 1) be classified by the 1-nearest neighbor classifier?
2. How will the point (8, 8) be classified by the 3-nearest neighbor classifier?
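The Euclidean distance used in both questions is written out below for the query point of question 1. The training coordinates (x, y) come from the diagram, which is not reproduced in this copy.

$$
d\big((8, 1), (x, y)\big) = \sqrt{(8 - x)^2 + (1 - y)^2}
$$

The 1-nearest neighbor classifier assigns (8, 1) the label (plus or star) of the training point with the smallest d. For (8, 8), substitute (8, 8) for (8, 1), keep the three smallest distances, and take the majority label.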

Problem #3

Lisa has lost the gender information for one of her customers and does not know whether to
make a skirt or trousers. She is planning to toss a coin. Can you help her make a better
decision using a KNN classifier (K = 3)? Use the Euclidean distance. The customer whose
gender is missing is listed first (Gender = ?):

Gender    Waist   Hip
?         28      34
Male      28      32
Male      33      35
Female    27      33
Female    31      36
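The following Python sketch applies 3-NN to the table above. It assumes waist and hip are two coordinates in the same unit, so no normalization is applied. It also uses `math.dist` (Python 3.8 or later) for the Euclidean distance.

```python
import math
from collections import Counter

# Customers with known gender, from the table above: (gender, waist, hip)
customers = [("Male", 28, 32), ("Male", 33, 35), ("Female", 27, 33), ("Female", 31, 36)]
query = (28, 34)  # waist and hip of the customer whose gender is missing


def knn_classify(train, point, k):
    """Sort by Euclidean distance to point, then majority-vote over the k closest."""
    ranked = sorted(train, key=lambda c: math.dist(point, c[1:]))
    nearest = ranked[:k]
    votes = Counter(gender for gender, _, _ in nearest)
    return votes.most_common(1)[0][0], nearest


label, nearest = knn_classify(customers, query, k=3)
for gender, waist, hip in nearest:
    print(f"{gender:<6} waist={waist} hip={hip} distance={math.dist(query, (waist, hip)):.3f}")
print("Predicted gender:", label)
```

The three nearest customers are Female (distance about 1.414), Male (2.000), and Female (about 3.606). The vote is therefore 2 to 1 for Female, which points Lisa toward the skirt rather than a coin toss.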

Problem #4 (Larose and Larose, 2015, p. 312)

The following table contains a small data set of 10 records excerpted from the ClassifyRisk
data set, with predictors age, marital status, and income, and target variable risk.

1. Using R, find the k nearest neighbors of Record #10, using k = 3.
2. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Euclidean distance.
3. Using the ClassifyRisk data set with predictors age, marital status, and income, and
target variable risk, find the k nearest neighbors of Record #1, using k = 2 and
Minkowski distance.
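The assignment asks for R. The sketch below shows the same computation for questions 2 and 3 in Python. The ten ClassifyRisk records are not reproduced in this copy, so the six records below are hypothetical. Two further assumptions apply. First, age and income are min-max normalized and marital status uses a 0/1 "different from" comparison, a common treatment of mixed attributes. Second, the Minkowski order p = 3 is chosen only for illustration, since the assignment does not state it.

```python
# Hypothetical records standing in for the ClassifyRisk excerpt, which is not reproduced here.
# record id -> (age, marital status, income, risk)
records = {
    1: (25, "single", 30000, "bad"),
    2: (40, "married", 60000, "good"),
    3: (30, "single", 35000, "bad"),
    4: (55, "married", 80000, "good"),
    5: (28, "other", 32000, "bad"),
    6: (45, "single", 70000, "good"),
}


def min_max_scaler(values):
    """Return a function mapping v to (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return lambda v: (v - lo) / (hi - lo)


scale_age = min_max_scaler([r[0] for r in records.values()])
scale_income = min_max_scaler([r[2] for r in records.values()])


def minkowski(a, b, p):
    """Minkowski distance of order p over age, marital status, and income.

    Age and income are min-max normalized; marital status contributes 0 if the
    two records match and 1 otherwise. p = 2 gives the Euclidean distance.
    """
    diffs = [
        abs(scale_age(a[0]) - scale_age(b[0])),
        0.0 if a[1] == b[1] else 1.0,
        abs(scale_income(a[2]) - scale_income(b[2])),
    ]
    return sum(d ** p for d in diffs) ** (1 / p)


def nearest_records(target_id, k, p):
    """IDs of the k records closest to target_id (the target itself excluded)."""
    target = records[target_id]
    others = [rid for rid in records if rid != target_id]
    return sorted(others, key=lambda rid: minkowski(target, records[rid], p))[:k]


print("Record 1, k = 2, Euclidean (p = 2):", nearest_records(1, k=2, p=2))
print("Record 1, k = 2, Minkowski (p = 3):", nearest_records(1, k=2, p=3))
```

On these placeholder records, the two nearest neighbors of Record 1 are [3, 5] under the Euclidean distance and [3, 6] under Minkowski with p = 3. The choice of order can therefore change the neighbor set.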
