Lab3 Form

This document outlines a lab session on simple classifiers in data mining using Weka, focusing on OneR, overfitting, Naïve Bayes, decision trees, pruning, and nearest neighbor methods. It includes instructions for running various algorithms on datasets, comparing their performance, and understanding key concepts like entropy and information gain. The lab emphasizes practical application and evaluation of classifiers through accuracy metrics and model building.

Uploaded by

Khang Le


Introduction to Data Mining

Lab 3 – Simple Classifiers

3.1. Simplicity first!

In the third class, we learn how to examine some data mining algorithms on datasets using
Weka (see the Class 3 lectures by Ian H. Witten [1]).

In this section, we learn how OneR ("one attribute does all the work") works. Open weather.nominal.arff,
run OneR, and look at the classifier model. What does it look like?

- Remarks:

Use OneR to build a one-level decision tree for some datasets. How does OneR perform compared with ZeroR?

Dataset | OneR accuracy | ZeroR accuracy
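To make the OneR/ZeroR comparison concrete, here is a minimal Python sketch of both ideas outside Weka. It assumes the 14-instance nominal weather data below matches weather.nominal.arff; the function names `zero_r` and `one_r` are made up for this illustration, and the accuracies are measured on the training data, not by cross-validation as Weka reports them.

```python
from collections import Counter, defaultdict

# Nominal weather data, transcribed by hand on the assumption that it
# matches weather.nominal.arff. Columns: outlook, temperature, humidity,
# windy, play (class).
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
DATA = [line.split(",") for line in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]


def zero_r(data):
    """ZeroR idea: always predict the majority class."""
    label, hits = Counter(row[-1] for row in data).most_common(1)[0]
    return label, hits / len(data)


def one_r(data, attributes):
    """OneR idea: one rule set per attribute, keep the one with fewest errors."""
    best = None
    for col, name in enumerate(attributes):
        by_value = defaultdict(Counter)  # attribute value -> class counts
        for row in data:
            by_value[row[col]][row[-1]] += 1
        # Each attribute value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        # Strict "<" keeps the first attribute when error counts tie.
        if best is None or errors < best[2]:
            best = (name, rule, errors)
    return best


majority, zero_r_acc = zero_r(DATA)
attribute, rule, errors = one_r(DATA, ATTRIBUTES)
one_r_acc = (len(DATA) - errors) / len(DATA)
print(f"ZeroR: always '{majority}', training accuracy {zero_r_acc:.3f}")
print(f"OneR: split on '{attribute}' {rule}, training accuracy {one_r_acc:.3f}")
```

On this data the sketch picks outlook (4 errors out of 14, tied with humidity), while ZeroR gets 9 of 14 right by always answering "yes".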

3.2. Overfitting
What is “overfitting”? - Overfitting occurs when a statistical model describes random error or noise
instead of the underlying relationship. It can be caused by an overly complex model, noise or errors in
the data, or an unsuitable fitting criterion, and it leads to poor prediction on new data. To guard
against it, use cross-validation or pruning. [ref: http://en.wikipedia.org/wiki/Overfitting]
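Cross-validation guards against overfitting by always scoring a model on instances it was not trained on. The sketch below shows one plausible way to generate the train/test index splits in Python. The helper name `k_fold_indices` is hypothetical, and the split is a plain shuffle, not the stratified variant that Weka is understood to use by default.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set = every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(14, k=7))
for train, test in splits:
    print(f"train on {len(train)} instances, test on {sorted(test)}")
```

Each instance lands in exactly one test fold, so the averaged test accuracy uses every instance once without ever testing on training data.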

[1] http://www.cs.waikato.ac.nz/ml/weka/mooc/dataminingwithweka/
Follow the instructions in [1] and run OneR on the weather.numeric and diabetes datasets…

Write down the results in the following table (cross-validation used):

Dataset                          | OneR              | ZeroR

weather.numeric                  | Classifier model: | Classifier model:
                                 | Accuracy:         | Accuracy:

weather.numeric w/o outlook att. | Classifier model: | Classifier model:
                                 | Accuracy:         | Accuracy:

diabetes                         | Classifier model: | Classifier model:
                                 | Accuracy:         | Accuracy:

diabetes w/ minBucketSize 1      | Classifier model: | Classifier model:
                                 | Accuracy:         | Accuracy:

MinBucketSize? -

Remark? -

3.3. Using probabilities

Lecture on Naïve Bayes: [1]

- All attributes contribute equally and independently → no identical attributes

Follow the instructions in [1] to examine NaiveBayes on weather.nominal
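As a hand check on the "equally and independently" assumption, the following Python sketch multiplies per-attribute likelihoods by the class prior for one new day. It assumes the data below matches weather.nominal.arff, uses raw frequency counts with no smoothing (so its numbers may differ slightly from Weka's NaiveBayes output), and the helper names `train` and `posterior` are invented for this example.

```python
from collections import Counter, defaultdict

# Nominal weather data, assumed to match weather.nominal.arff.
# Columns: outlook, temperature, humidity, windy, play (class).
DATA = [line.split(",") for line in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]


def train(data):
    """Count classes and, per (attribute column, class), attribute values."""
    class_counts = Counter(row[-1] for row in data)
    value_counts = defaultdict(Counter)
    for row in data:
        for col, value in enumerate(row[:-1]):
            value_counts[(col, row[-1])][value] += 1
    return class_counts, value_counts


def posterior(instance, class_counts, value_counts):
    """Normalised P(class | instance) under the independence assumption.

    No smoothing: a value never seen with a class gives that class a
    score of zero.
    """
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for col, value in enumerate(instance):
            score *= value_counts[(col, label)][value] / n  # P(value | class)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


class_counts, value_counts = train(DATA)
new_day = ["sunny", "cool", "high", "TRUE"]
probs = posterior(new_day, class_counts, value_counts)
print({label: round(p, 3) for label, p in probs.items()})
```

For this sunny, cool, humid, windy day the sketch gives roughly 80% for play = no and 20% for play = yes.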

Classifier model | Performance (what percentage of total instances are classified correctly?)

3.4. Decision Trees

Lecture on decision trees: [1]

How to calculate entropy and information gain?

Entropy measures the impurity of a collection.

\[ Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i \]

Information Gain measures the Expected Reduction in Entropy.

Info. Gain = (Entropy of distribution before the split) – (Entropy of distribution after the split)

\[ Gain(S, A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v) \]

Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which
attribute A has value v.
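The two definitions above can be checked numerically. The Python sketch below computes Entropy(S) and Gain(S, A) for each attribute of the weather data, which is the first step of building the tree by hand. It assumes the data matches weather.nominal.arff; the function names `entropy` and `information_gain` are illustrative and not part of Weka.

```python
from collections import Counter
from math import log2

# Nominal weather data, assumed to match weather.nominal.arff.
# Columns: outlook, temperature, humidity, windy, play (class).
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
DATA = [line.split(",") for line in """
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
""".split()]


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(data, col):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [row[-1] for row in data]
    remainder = 0.0
    for value in {row[col] for row in data}:
        subset = [row[-1] for row in data if row[col] == value]  # class labels of S_v
        remainder += len(subset) / len(data) * entropy(subset)
    return entropy(labels) - remainder


base = entropy([row[-1] for row in DATA])
gains = {name: information_gain(DATA, col) for col, name in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)
print(f"Entropy(S) = {base:.3f} bits")
for name, gain in gains.items():
    print(f"Gain(S, {name}) = {gain:.3f}")
print(f"Attribute with the highest gain: {root}")
```

With 9 "yes" and 5 "no" instances, Entropy(S) is about 0.940 bits, and outlook has the highest gain (about 0.247), so it is selected as the root. Repeating the same computation on each outlook subset gives the next level of the tree.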

Build a decision tree for the weather data step by step:

Compute Entropy and Info. Gain Selected attribute

Final decision tree

Use Weka to examine J48 on the weather data.

3.5. Pruning decision trees
Follow the lecture on pruning decision trees in [1] …

Why pruning? - Prevent overfitting to noise in the data.

In Weka, look at the J48 learner. What are the parameters minNumObj and confidenceFactor?

-
-

Follow the instructions in [1] to run J48 on the two datasets, then fill in the following table:

Dataset            | J48 (default, pruned) | J48 (unpruned)

diabetes.arff      |                       |

breast-cancer.arff |                       |

3.6. Nearest neighbor

Follow the lecture in [1]

“Instance‐based” learning = “nearest‐neighbor” learning

What is k‐nearest‐neighbors (K-NN)? –
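The idea behind k-nearest-neighbors can be sketched in a few lines of Python: find the k training instances closest to the query and take a majority vote over their classes. The toy 2-D points, the labels "a" and "b", and the function name `knn_predict` are all hypothetical; this is not the glass dataset and not Weka's IBk implementation.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=1):
    """Classify query by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (point, class label).
TRAIN = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.3), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"),
]
query = (2.4, 2.4)
predictions = {k: knn_predict(TRAIN, query, k) for k in (1, 3, 5)}
print(predictions)
```

The query sits nearer the two "b" points, so k = 1 and k = 3 predict "b". With k = 5 every training point votes and the larger class "a" wins, which illustrates why a very large k drifts toward the majority class.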

Follow the instructions in [1] to run lazy>IBk on the glass dataset with k = 1, 5, and 20, then record
the accuracy for each k in the following table:

Dataset | IBk, k = 1 | IBk, k = 5 | IBk, k = 20

Glass   |            |            |
