Uploaded by arun neupane

Assignment 3

Introduction to Data Analytics (University of Technology Sydney)

Downloaded by arun neupane ([email protected])
Introduction to Data Analytics

Assessment Task 3: Data mining in action

Table of Contents

Data Mining
  The Task
  Input
  Output
Preprocessing
  Column Filter
  Missing Value
  Number to String
  Normalizer
  Partitioning
Classifiers
  Decision Trees
  Random Forest
  K Nearest Neighbor (KNN)
  SVM
  Neural Networks
  Tree Ensemble
Best Classifier
  Result Summary
  Conclusion

Data Mining

The Task

Following the previous assignment, this assignment focuses on building several classifiers and choosing the best one to predict the attribute “QUALIFIED” for the property data set. There are a number of methods for doing this. KNIME, a tool with a graphical interface, was chosen so that the process can be shown visually.

Input

There are three files for this assignment: the training data set, the unknown data set, and the sample prediction data set. The training data set contains the attribute “QUALIFIED”, but the unknown data set does not. The last file, the sample prediction, is filled with random values and shows the submission format that Kaggle expects.

In this assignment, KNIME processes the training and unknown data sets to predict the “QUALIFIED” value for each row of the unknown data set.

Output

Uploading the predictions to Kaggle is not mandatory. However, once the predictions are created, Kaggle can score them, which shows how effective the process is.

Preprocessing

Column Filter

In the data set, the attribute “GIS_LAST_MOD_DTTM” (column 37) has the same value in every row, so it carries no information for prediction. A Column Filter node is therefore used to remove this column from the data set.
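The Python sketch below illustrates what the Column Filter step achieves. It finds the columns whose value is identical in every row and drops them. It assumes the rows are held as a list of dictionaries. The function name `drop_constant_columns` and the sample values (including the “QUALIFIED” labels "Q" and "U") are hypothetical and are not taken from the data set or from KNIME.

```python
from typing import Any


def drop_constant_columns(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of `rows` without columns whose value is identical in every row."""
    if not rows:
        return []
    columns = rows[0].keys()
    # A column is constant if the set of its values has exactly one element.
    constant = {c for c in columns if len({row[c] for row in rows}) == 1}
    return [{c: v for c, v in row.items() if c not in constant} for row in rows]


# Hypothetical rows: GIS_LAST_MOD_DTTM has the same value everywhere, as in the report.
rows = [
    {"AYB": 1910, "GIS_LAST_MOD_DTTM": "2018-07-22", "QUALIFIED": "Q"},
    {"AYB": 1955, "GIS_LAST_MOD_DTTM": "2018-07-22", "QUALIFIED": "U"},
]
filtered = drop_constant_columns(rows)
```

Detecting constant columns automatically catches any similar attribute without naming it. The report instead removes the known column by name, which is equally valid once the column has been identified.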

Missing Value

Missing values, which may disturb the prediction, are removed.
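Removing whole rows is only one way to handle gaps; filling in a value is another. The sketch below shows the row-removal strategy in plain Python. What counts as “missing” here (None, blank strings, NaN) is an assumption about how the exported data encodes gaps. The helper names and sample rows are hypothetical.

```python
import math
from typing import Any


def is_missing(value: Any) -> bool:
    """Treat None, blank strings and NaN floats as missing (an assumed encoding)."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, float):
        return math.isnan(value)
    return False


def drop_rows_with_missing(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the rows in which every attribute has a value."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


rows = [
    {"AYB": 1910, "ROOF": "2"},
    {"AYB": None, "ROOF": "1"},
    {"AYB": 1955, "ROOF": ""},
    {"AYB": float("nan"), "ROOF": "3"},
]
complete = drop_rows_with_missing(rows)
```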

Number to String

Some attributes, such as “HEAT”, “STYLE”, “STRUCT”, “GRADE”, “CNDTN”, “EXTWALL”, “ROOF”, “INTWALL” and “USECODE”, contain numbers that are category codes rather than numeric quantities. These attributes are converted to strings so that the learners treat them as categories, which improves their performance.
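When a learner treats a code as a number, it assumes that code 13 is “larger” than code 1 and that codes 12 and 13 are similar. Neither assumption usually holds for category labels. The sketch below converts the attributes listed above to strings. The column set comes from the report. The helper names are hypothetical, and collapsing float codes such as 4.0 to "4" is an assumption about how the codes were exported.

```python
from typing import Any

# Attributes listed in the report as numeric codes rather than quantities.
CODED_COLUMNS = {"HEAT", "STYLE", "STRUCT", "GRADE", "CNDTN",
                 "EXTWALL", "ROOF", "INTWALL", "USECODE"}


def code_to_string(value: Any) -> Any:
    """Turn a numeric code into a string label, e.g. 4.0 -> "4"."""
    if value is None:
        return None
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # avoid labels such as "4.0"
    return str(value)


def codes_to_strings(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `row` with the coded attributes stored as strings."""
    return {k: code_to_string(v) if k in CODED_COLUMNS else v for k, v in row.items()}


converted = codes_to_strings({"HEAT": 13, "GRADE": 4.0, "AYB": 1910})
```

Genuinely numeric attributes such as “AYB” pass through unchanged.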

Normalizer

The Normalizer node applies min-max normalization to the attribute “AYB”.
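Min-max normalization rescales a value x to x' = (x - min) / (max - min), so the smallest training value maps to 0 and the largest to 1. The sketch below splits this into a fit step and an apply step. The minimum and maximum learned from the training data are then reused for new data such as the unknown data set. The function names and the sample years are hypothetical.

```python
from typing import Sequence


def fit_min_max(values: Sequence[float]) -> tuple[float, float]:
    """Learn the minimum and maximum from the training values."""
    return min(values), max(values)


def apply_min_max(values: Sequence[float], lo: float, hi: float) -> list[float]:
    """Rescale values with x' = (x - lo) / (hi - lo) using the learned bounds."""
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical AYB values for illustration only.
lo, hi = fit_min_max([1900, 1950, 2000])
scaled = apply_min_max([1900, 1950, 2000], lo, hi)
```

Because the bounds come from the training data, a value outside the training range (e.g. 2025 here) maps outside [0, 1]. This is expected and keeps training and unknown data on the same scale.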

Partitioning

The Partitioning node splits the training data 70-30: 70% is used for training and 30% for testing.
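A minimal sketch of a 70-30 split, assuming simple random sampling with a fixed seed so the split is reproducible. The report does not say which sampling option was used in KNIME's Partitioning node, which also offers alternatives such as stratified sampling. The function name `partition` and the seed value are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def partition(rows: Sequence[T], train_fraction: float = 0.7,
              seed: int = 42) -> tuple[list[T], list[T]]:
    """Shuffle the rows with a fixed seed and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = partition(range(100))
```

The same function with `train_fraction=0.9` gives a 90-10 split.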

Classifiers

Decision Trees

The data is passed to the decision tree nodes, which build the model and make the predictions. Decision trees are well suited to categorical data. The accuracy is 83.074%, with 1544 rows classified wrongly.
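The two figures quoted above are linked. Accuracy is the share of test rows classified correctly, so accuracy = 1 - wrong / total. The sketch below computes both from lists of actual and predicted labels. The function name and the labels are made up for illustration.

```python
from typing import Sequence


def accuracy_and_errors(actual: Sequence[str], predicted: Sequence[str]) -> tuple[float, int]:
    """Return (accuracy, number of wrongly classified rows)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return 1 - wrong / len(actual), wrong


# Made-up labels: one of the four predictions is wrong.
accuracy, wrong = accuracy_and_errors(["Q", "Q", "U", "U"], ["Q", "U", "U", "U"])
```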

Random Forest

The preprocessed data is passed to the Random Forest Learner with default settings. The accuracy is 88.043%.
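Random forests combine two sources of randomness:

- Each tree is trained on a bootstrap sample of the training rows.
- Each split considers only a random subset of the attributes.

The sketch below shows only these two sampling steps, not the tree building itself. Using the square root of the number of attributes as the subset size is a common heuristic for classification. It is an assumption here, not a statement about KNIME's default settings. The function names are hypothetical.

```python
import math
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(rows: Sequence[T], rng: random.Random) -> list[T]:
    """Draw len(rows) rows with replacement; each tree is trained on one such sample."""
    return [rng.choice(rows) for _ in range(len(rows))]


def random_feature_subset(features: Sequence[str], rng: random.Random) -> list[str]:
    """Pick about sqrt(p) distinct attributes; a tree would call this at every split."""
    k = max(1, math.isqrt(len(features)))
    return rng.sample(list(features), k)


rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
features = ["HEAT", "STYLE", "STRUCT", "GRADE", "CNDTN",
            "EXTWALL", "ROOF", "INTWALL", "AYB"]
subset = random_feature_subset(features, rng)
```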

K Nearest Neighbor (KNN)

The preprocessed data is passed to the KNN node. The “Number of Neighbors to consider (K)” setting was changed from its original value of 3 to 5. The accuracy is 85.855%.
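A minimal k-nearest-neighbour classifier with k = 5, written in plain Python to show what the K setting controls. A new row takes the majority label among the k training rows closest to it in Euclidean distance. The points, the labels "Q" and "U", and the function name are hypothetical. The points are assumed to be already normalized, since otherwise attributes with large ranges would dominate the distance.

```python
import math
from collections import Counter
from typing import Sequence

# Each training example is (feature vector, class label).
Example = tuple[Sequence[float], str]


def knn_predict(train: Sequence[Example], query: Sequence[float], k: int = 5) -> str:
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Sort training examples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-normalized two-attribute points.
train = [
    ((0.0, 0.0), "Q"), ((0.0, 0.1), "Q"), ((0.1, 0.0), "Q"),
    ((1.0, 1.0), "U"), ((1.0, 0.9), "U"), ((0.9, 1.0), "U"),
]
```

With two classes, an odd k such as 5 cannot produce a tied vote.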

SVM

The SVM Learner was still running after more than 24 hours without completing, so no results were produced.

Neural Networks

The preprocessed data is passed to the PNN Learner, a probabilistic neural network, with default settings. The accuracy is 87.01%.
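To show the idea behind a probabilistic neural network, the sketch below implements the classic Parzen-window form:

1. Every training row contributes a Gaussian kernel.
2. The kernels are averaged per class.
3. The class with the highest average wins.

This is a swapped-in textbook version for illustration, not the training procedure used by KNIME's PNN Learner. The smoothing width `sigma`, the data points and the function name are hypothetical.

```python
import math
from typing import Sequence

Example = tuple[Sequence[float], str]


def pnn_predict(train: Sequence[Example], query: Sequence[float], sigma: float = 0.5) -> str:
    """Classic Parzen-window PNN: pick the class with the highest average Gaussian kernel."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for features, label in train:
        squared = sum((a - b) ** 2 for a, b in zip(features, query))
        # Pattern layer: one Gaussian kernel per training example.
        totals[label] = totals.get(label, 0.0) + math.exp(-squared / (2 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    # Summation layer averages per class; output layer takes the largest score.
    return max(totals, key=lambda label: totals[label] / counts[label])


# Hypothetical, already-normalized two-attribute points.
train = [
    ((0.0, 0.0), "Q"), ((0.0, 0.1), "Q"), ((0.1, 0.0), "Q"),
    ((1.0, 1.0), "U"), ((1.0, 0.9), "U"), ((0.9, 1.0), "U"),
]
```

A smaller `sigma` makes each training row's influence more local. A larger one smooths the class scores.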

Tree Ensemble

The preprocessed data is passed to the Tree Ensemble Learner with default settings. The one exception is the partitioning, which is 90-10 instead of the 70-30 split described under Partitioning. The accuracy is 88.662%.
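One common way for an ensemble of trees to combine its members is majority voting: each tree predicts a label, and the most frequent label wins. The sketch below shows that aggregation step with three hand-written stand-in “trees”. The rules, labels and function name are hypothetical, and the report does not state which aggregation KNIME's Tree Ensemble Predictor used.

```python
from collections import Counter
from typing import Callable, Sequence

# A "model" here is any function mapping one row of features to a class label.
Model = Callable[[Sequence[float]], str]


def ensemble_predict(models: Sequence[Model], row: Sequence[float]) -> tuple[str, float]:
    """Return the majority label and the share of models that voted for it."""
    votes = Counter(model(row) for model in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)


# Three hypothetical stand-in "trees" with hand-written rules.
trees = [
    lambda row: "Q" if row[0] > 0.5 else "U",
    lambda row: "Q" if row[1] > 0.2 else "U",
    lambda row: "U",
]
label, share = ensemble_predict(trees, [0.8, 0.9])
```

The vote share can be read as a rough confidence for each prediction.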

Best Classifier

Result Summary

The results of each method are as follows:

Decision Tree: 83.074%
Random Forest: 88.043%
K Nearest Neighbor: 85.855%
SVM: no result (did not finish)
Neural Networks: 87.01%
Tree Ensemble: 88.662% (90-10 partitioning)

Conclusion

Based on the result summary above, Tree Ensemble has the highest accuracy of all the classifiers. The Tree Ensemble method is therefore used to make predictions for the unknown data set, and these predictions were uploaded to Kaggle.
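The selection rule in the conclusion is to take the highest accuracy among the classifiers that produced a result. The sketch below encodes the reported figures and applies that rule. The function names are hypothetical, and `None` marks the SVM run that did not finish.

```python
from typing import Optional

# Accuracies (%) from the result summary; None marks the SVM run that did not finish.
RESULTS: dict[str, Optional[float]] = {
    "Decision Tree": 83.074,
    "Random Forest": 88.043,
    "K Nearest Neighbor": 85.855,
    "SVM": None,
    "Neural Networks": 87.01,
    "Tree Ensemble": 88.662,
}


def best_classifier(results: dict[str, Optional[float]]) -> str:
    """Return the name of the classifier with the highest reported accuracy."""
    finished = {name: acc for name, acc in results.items() if acc is not None}
    return max(finished, key=lambda name: finished[name])


def ranking(results: dict[str, Optional[float]]) -> list[str]:
    """Finished classifiers ordered from highest to lowest accuracy."""
    finished = {name: acc for name, acc in results.items() if acc is not None}
    return sorted(finished, key=lambda name: finished[name], reverse=True)
```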

[Figure: the complete KNIME workflow]
