Assignment 2

This document provides instructions for an assignment on analyzing the Titanic dataset using decision trees. Students are asked to: 1) explore and preprocess the dataset to create new attributes; 2) load the preprocessed data into Weka and visualize distributions; 3) build a decision tree model on the training set and test it on the test set; 4) analyze and discuss the results compared to actual outcomes of the Titanic incident. The assignment must be submitted in a professional report with all code and outputs.

Uploaded by

Erick Menjivar


CST8390 Assignment 2

Due: June 26, 2022 at 11:59 PM Sharp!!!

(Late submissions will not be accepted)
Goal: The goal of this lab is to explore and analyze the Titanic dataset and perform classification
using Decision Trees.
References:
1. http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/problem12.html
2. https://towardsdatascience.com/predicting-the-survival-of-titanic-passengers-30870ccc7e8
3. https://www.kaggle.com/c/titanic
4. http://csis.pace.edu/~ctappert/srd2014/d3.pdf
5. https://titanicfacts.net/titanic-survivors/
Steps:
Data Understanding
Explore and analyse the Titanic dataset provided with this assignment (both train and test sets
are provided). Read the pages listed in the references. You must include a brief description of
the dataset (what the dataset is about, what the purpose of analysing it is, etc.; about 10 lines).
You must also provide a table listing all attribute names and their descriptions.
Data Preparation
1. Identify and record the attributes that are relevant for classification on this dataset. Remove
irrelevant attributes from the dataset.
2. Create a new attribute to represent the age group
(if age is not given, record it as NK;
if age < 2, then Baby;
else if age < 12, then Child;
else if age < 18, then Youth;
else if age ≤ 60, then Adult;
else if age > 60, then Senior).
3. Create a new attribute “Relatives”. To create this column, calculate the total number of
relatives, counting siblings, spouses, parents and children.
If the number of relatives is 0, record it as “None”;
else if the number is less than 3, record it as “Few”;
else if the number is less than 5, record it as “Some”;
else if the number is 5 or greater, record it as “Many”.
4. Save the new file as Titanic_train_processed.csv. Provide a screenshot of this file (header
and a few rows should be visible.)
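The two binning rules in steps 2 and 3 can be sketched in Python as below. This is only an illustrative sketch, not a required part of the assignment: the column names Age, SibSp and Parch are assumptions taken from the Kaggle version of the dataset (reference 3), and the output column name AgeGroup is hypothetical.

```python
from typing import Dict, Optional


def age_group(age: Optional[float]) -> str:
    """Map an age in years to the age group defined in step 2."""
    if age is None:
        return "NK"  # age not known
    if age < 2:
        return "Baby"
    if age < 12:
        return "Child"
    if age < 18:
        return "Youth"
    if age <= 60:
        return "Adult"
    return "Senior"


def relatives_group(siblings_spouses: int, parents_children: int) -> str:
    """Map the total number of relatives to the label defined in step 3."""
    total = siblings_spouses + parents_children
    if total == 0:
        return "None"
    if total < 3:
        return "Few"
    if total < 5:
        return "Some"
    return "Many"


def add_derived_columns(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of one CSV row with the two new attributes added.

    Assumes the row has 'Age', 'SibSp' and 'Parch' keys (Kaggle column
    names); an empty Age cell is treated as a missing age.
    """
    age_text = (row.get("Age") or "").strip()
    age = float(age_text) if age_text else None
    out = dict(row)
    out["AgeGroup"] = age_group(age)
    out["Relatives"] = relatives_group(int(row["SibSp"]), int(row["Parch"]))
    return out
```

Applying add_derived_columns to every row read with csv.DictReader, and writing the result with csv.DictWriter, would be one way to produce Titanic_train_processed.csv.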
Load the data into Weka. Double-check the types of your attributes. If they are not as expected,
apply filters to convert them to the right types. (For example, the class attribute should be
nominal. Also, as you will be using decision trees, it is meaningful to have nominal attributes.)
Paste a screenshot of (a) the distribution of the class attribute and (b) the distribution of the
Age Group attribute, selecting the class attribute from the dropdown list for visualization.
Include the screenshots in the report.
Save the file as titanic_train_processed.arff. Include a screenshot of the file. (Open the file in
Notepad++; the header and a few instances should be visible.)
Modeling & Evaluation
Prepare the test set in the same way that you prepared the training set. The test file should have
the same format as the training file. In addition to all the other preprocessing steps, create a
column for Survived with ‘?’ as the value. Save it as an ARFF file. Then use the same header from
the training file in the test file too (change only the relation name). Now, perform classification
using Decision Trees with 10-fold cross-validation. Copy your confusion matrix and include it in
your report. Visualize the tree and paste it into the report. Explain the meaning of the obtained tree.
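To make the "same header, only the relation name changes" requirement concrete, here is a minimal Python sketch that writes train and test ARFF text from one shared attribute list. It is an illustration under stated assumptions, not the expected workflow (Weka can save ARFF files itself): the attribute list below is hypothetical, and you must substitute the attributes you actually selected and their real nominal values.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical attribute list (an assumption for illustration only).
# Sharing one list guarantees identical headers in train and test files.
ATTRIBUTES: List[Tuple[str, Sequence[str]]] = [
    ("Pclass", ["1", "2", "3"]),
    ("Sex", ["female", "male"]),
    ("AgeGroup", ["NK", "Baby", "Child", "Youth", "Adult", "Senior"]),
    ("Relatives", ["None", "Few", "Some", "Many"]),
    ("Survived", ["0", "1"]),  # class attribute, nominal
]


def to_arff(relation: str, rows: List[Dict[str, str]]) -> str:
    """Build ARFF text with nominal attributes from a list of row dicts."""
    lines = ["@relation %s" % relation, ""]
    for name, values in ATTRIBUTES:
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines += ["", "@data"]
    for row in rows:
        # A missing or empty value is written as '?', which is how ARFF
        # marks unknown values (e.g. Survived in the test set).
        cells = [row.get(name) or "?" for name, _ in ATTRIBUTES]
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


train_text = to_arff(
    "titanic_train",
    [{"Pclass": "3", "Sex": "male", "AgeGroup": "Adult",
      "Relatives": "Few", "Survived": "0"}],
)
test_text = to_arff(
    "titanic_test",
    [{"Pclass": "1", "Sex": "female", "AgeGroup": "NK", "Relatives": "None"}],
)
```

Because the test rows carry no Survived value, every test instance ends in '?', while the @attribute lines are identical in both files.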
Now open another Explorer window and open your test file. This is just to ensure that the test
file is in the right format to be used for testing. If there is an issue in opening the file, you
need to make changes in the test file. Once the test file opens in the new Explorer window, close
that window. Now, in the first Explorer window, set the test set for testing: click on Supplied
test set and select the test file.

Run the decision tree on the test set. As there is no actual Survived information, you will not get
a valid confusion matrix. Right-click on the execution and visualize classifier errors. Save the
file from there as res.arff. Include a screenshot of this file. The new file will have a new column
named “predicted Survived”. Fill in the following information:
a. Total instances in the test file:
b. Number of persons predicted to survive (1):
c. Number of persons predicted not to survive (0):
d. Percentage of predicted survival:
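Items a to d can be read off res.arff by hand or counted with a short script. The sketch below is one possible way to do it in Python, assuming the file is plain comma-separated ARFF with no commas inside quoted values and that the prediction column is named “predicted Survived” as stated above; the helper names are hypothetical.

```python
from typing import Dict, List


def read_arff_column(text: str, attribute: str) -> List[str]:
    """Return the values of one attribute from simple ARFF text.

    Assumes dense, comma-separated data rows without embedded commas.
    """
    names: List[str] = []
    values: List[str] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            cell = line.split(",")[names.index(attribute)]
            values.append(cell.strip("'\" "))
        elif lower.startswith("@attribute"):
            rest = line[len("@attribute"):].strip()
            if rest[0] in "'\"":
                # Quoted name, e.g. 'predicted Survived'
                name = rest[1:rest.index(rest[0], 1)]
            else:
                name = rest.split()[0]
            names.append(name)
        elif lower.startswith("@data"):
            in_data = True
    return values


def summarize(predictions: List[str]) -> Dict[str, float]:
    """Compute items a-d from a list of predicted class labels."""
    total = len(predictions)
    survived = sum(1 for p in predictions if p == "1")
    not_survived = sum(1 for p in predictions if p == "0")
    percent = 100.0 * survived / total if total else 0.0
    return {
        "total": total,
        "survived": survived,
        "not_survived": not_survived,
        "percent_survived": percent,
    }
```

For example, summarize(read_arff_column(open("res.arff").read(), "predicted Survived")) would return the four values for your own file.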
Discussion of Results
Using reference 5, check the actual figures for the incident. Explain how well your predicted
results match the actual incident. List a few reasons why you think your results differ from the
actual results. You also need to compare the results in detail based on various features. This
question is worth 5 marks, so a detailed analysis and comparison of results with tables and
charts is expected.

Note: Make sure that you have selected relevant attributes. Otherwise, the analysis
will be completely wrong, and if you select totally irrelevant attributes that do not
have any effect on survival, you will not get any marks.
Submission Details:
This is a partner assignment. The report should have a cover page with the names (last name,
first name) and student numbers. You must paste all screenshots in the report. The report should
have a table of contents, images, etc., and sections titled Introduction, Data Understanding,
Data Preparation, Modeling and Evaluation, Discussion of Results, Conclusion, References, etc.
Font: Times New Roman. Font size: 12, justified.
Now, create a zipped folder named
<LastNameFirstStudent>_<FirstNameFirstStudent>_<LastnameSecondStudent>_<FirstNameSecondStudent>.zip
containing the
• report in professional style,
• processed train and test ARFF files,
• model files and
• res.arff file.
Upload the zipped folder to Brightspace.

Marks:
This assignment is worth a total of 30 marks. Marks will be deducted if you omit the explanation
for any of the steps. Every step/question should be answered with an explanation.
Prepare your assignment in a professional report style.
