Titanic Classification Disaster Kaggle

This document summarizes the steps taken to clean and prepare the Titanic dataset for machine learning analysis. It describes how fields such as PassengerId, Pclass, and Age were removed or recoded into new fields such as Class and AgeGroup to simplify the data. A decision tree model generated with the Weka tool showed that sex and passenger class were the most important factors in predicting survival: females in first or second class were very likely to survive, and third-class females were more likely to survive if they embarked at Cherbourg or Queenstown. The model achieved 81% accuracy on the training data.


[2] “Titanic: Machine Learning from Disaster,” Kaggle.com. [Online]. Available: https://www.kaggle.com/c/titanic-gettingStarted. [Accessed: 13-Dec-2013].
[3] Wikipedia, “Titanic.” [Online]. Available: http://en.wikipedia.org/wiki/Titanic. [Accessed: 13-Dec-2013].
http://www.cs.waikato.ac.nz/ml/weka/
https://www.kaggle.com/c/titanic-gettingStarted
Field | Action | Comment
PassengerId | Removed | Not needed for analysis; it is just an identifier.
Survived | Converted to No/Yes | Needed a nominal identifier.
Pclass | Removed -> created a Class column instead | Needed a nominal identifier.
Class | New column | Simple calculation based upon Pclass.
Age | Removed -> created an AgeGroup column instead | Wanted a simple classification coding.
AgeGroup | Formula based; some values were not supplied, but ended up with four groups besides Unknown (Child, Adolescent, Adult, Old) | Arbitrarily used: =IF(F2="", "Unk", IF(F2<10, "Child", IF(F2<20, "Adolescent", IF(F2<50, "Adult", "Old"))))
Ecode | Removed -> created an Embarked column instead | Needed a nominal identifier.
Embarked | New column | Converts Ecode to the real name of the passenger's departure point.
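As a rough illustration of the recoding described in the table above (a sketch only; the original preparation used the spreadsheet formula shown, and the class and method names here are hypothetical), the same mappings could be written in Java:

import java.util.Map;

public class Recode {
    // Pclass 1/2/3 -> Class label, as in the table above.
    static final Map<Integer, String> CLASS = Map.of(1, "1st", 2, "2nd", 3, "3rd");

    // Ecode letter -> full departure-point name; anything else becomes Unk.
    static final Map<String, String> PORTS =
            Map.of("S", "Southampton", "C", "Cherbourg", "Q", "Queenstown");

    static String embarked(String ecode) {
        return PORTS.getOrDefault(ecode == null ? "" : ecode, "Unk");
    }

    // AgeGroup recoding mirroring the spreadsheet formula:
    // blank -> Unk, <10 Child, <20 Adolescent, <50 Adult, otherwise Old.
    static String ageGroup(String age) {
        if (age == null || age.isEmpty()) return "Unk";
        double a = Double.parseDouble(age);
        if (a < 10) return "Child";
        if (a < 20) return "Adolescent";
        if (a < 50) return "Adult";
        return "Old";
    }
}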
@relation 'train4-weka.filters.unsupervised.attribute.Remove-R1,3,6,8'

@attribute Survived {No,Yes}


@attribute Class {1st,2nd,3rd}
@attribute Sex {male,female}
@attribute AgeGroup {Child,Adolescent,Adult,Old,Unk}
@attribute Embarked {Southampton,Cherbourg,Queenstown,Unk}

@data
No,3rd,male,Adult,Southampton
Yes,1st,female,Adult,Cherbourg
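The relation name above records that Weka's Remove filter dropped attributes 1, 3, 6 and 8 from the prepared train4 file, leaving only Survived, Class, Sex, AgeGroup and Embarked. A minimal sketch of the same filtering step with the Weka Java API (the file name train4.arff and the class name are assumed, not taken from the slides):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class PrepareArff {
    public static void main(String[] args) throws Exception {
        // Load the prepared training data (file name assumed).
        Instances data = new DataSource("train4.arff").getDataSet();

        // Drop the unused attributes, mirroring the Remove-R1,3,6,8 step
        // recorded in the @relation name above.
        Remove remove = new Remove();
        remove.setAttributeIndices("1,3,6,8");
        remove.setInputFormat(data);
        Instances filtered = Filter.useFilter(data, remove);

        // Survived is then the first remaining attribute, as in the ARFF above.
        filtered.setClassIndex(0);
        System.out.println(filtered.numAttributes() + " attributes, "
                + filtered.numInstances() + " instances");
    }
}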
J48 pruned tree
------------------

Sex = male: No (577.0/109.0)


Sex = female
| Class = 3rd
| | Embarked = Southampton: No (88.0/33.0)
| | Embarked = Cherbourg: Yes (23.0/8.0)
| | Embarked = Queenstown
| | | AgeGroup = Child: Yes (0.0)
| | | AgeGroup = Adolescent: Yes (5.0/1.0)
| | | AgeGroup = Adult: No (5.0/1.0)
| | | AgeGroup = Old: Yes (0.0)
| | | AgeGroup = Unk: Yes (23.0/4.0)
| | Embarked = Unk: No (0.0)
| Class = 1st: Yes (94.0/3.0)
| Class = 2nd: Yes (76.0/6.0)

Number of Leaves : 11

Size of the tree : 15

=== Summary ===

Correctly Classified Instances 722 81.0325 %


Incorrectly Classified Instances 169 18.9675 %
Kappa statistic 0.5714
Mean absolute error 0.2911
Root mean squared error 0.385
Relative absolute error 61.5359 %
Root relative squared error 79.1696 %
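The pruned tree and the summary above are Weka Explorer output; the figures are measured on the training data itself, so they are optimistic compared with cross-validation or a held-out test set. A minimal sketch of an equivalent run with the Weka Java API (file and class names assumed, and J48 left at its default settings):

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainJ48 {
    public static void main(String[] args) throws Exception {
        // Load the filtered ARFF shown earlier (file name assumed).
        Instances data = new DataSource("train4-filtered.arff").getDataSet();
        data.setClassIndex(0); // Survived is the class attribute

        // Build a J48 (C4.5) decision tree with default settings.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree); // prints the pruned tree, as shown above

        // Evaluate on the training data itself, matching the summary above.
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(tree, data);
        System.out.println(eval.toSummaryString());
    }
}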
Information gain
Amount of information gained by knowing the value of the attribute:
(Entropy of distribution before the split) – (entropy of distribution after it)
Claude Shannon, American mathematician and scientist 1916–2001
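A small sketch of that formula in code (helper and class names are mine, not from the slides): the entropy of the class distribution before the split, minus the size-weighted entropy of the partitions after it.

public class InfoGain {
    // Shannon entropy (in bits) of a class-count distribution.
    static double entropy(double... counts) {
        double total = 0, h = 0;
        for (double c : counts) total += c;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain = entropy before the split minus the weighted
    // average entropy of the partitions created by the split.
    static double infoGain(double[] before, double[][] partitions) {
        double total = 0, weightedAfter = 0;
        for (double c : before) total += c;
        for (double[] part : partitions) {
            double size = 0;
            for (double c : part) size += c;
            weightedAfter += (size / total) * entropy(part);
        }
        return entropy(before) - weightedAfter;
    }
}

For example, taking the 891 training instances (549 perished / 342 survived, the standard Kaggle training split, not stated in the slides) and the male/female partition implied by the tree's male leaf (577 males, 109 of whom survived), infoGain(new double[]{549, 342}, new double[][]{{468, 109}, {81, 233}}) works out to roughly 0.22 bits, consistent with Sex being chosen at the root of the tree.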
